Project author: ondata

Project description:
ETL scripts and issue tracking for the AppaltiPOP project.
Main language: Jupyter Notebook
Repository: git://github.com/ondata/appaltipop.git
Created: 2020-02-02T11:10:32Z
Project page: https://github.com/ondata/appaltipop

License: MIT License

AppaltiPOP

This repository is intended for project tracking. Here you can also find raw data and utilities for validation and indexing.

Data

Pipeline:

  • start: JSON files (one array of objects per source) in the json folder
  • then: JSONL files (the same data, but one object per line) in the jsonl folder
  • finally: indexing, handled in the elasticsearch folder
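The pipeline steps above could be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the repository's actual scripts. The function names `json_to_jsonl` and `jsonl_to_bulk` are hypothetical, and so is any index name you pass in. The second function only builds the newline-delimited body that the Elasticsearch `_bulk` endpoint expects: an action line before each document, with the body terminated by a newline. It does not send anything to a cluster.

```python
import json
from pathlib import Path

# Hypothetical helpers illustrating the json -> jsonl -> elasticsearch steps;
# they are not taken from the AppaltiPOP repository.


def json_to_jsonl(src: Path, dst: Path) -> int:
    """Rewrite a JSON file holding an array of objects as JSON Lines.

    Returns the number of records written.
    """
    with src.open(encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError(f"{src} does not contain a JSON array")
    with dst.open("w", encoding="utf-8") as f:
        for record in records:
            # One compact object per line; keep accented characters readable.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(records)


def jsonl_to_bulk(src: Path, index: str) -> str:
    """Build an Elasticsearch _bulk request body from a JSONL file.

    Each document line is preceded by an "index" action line, and the
    whole body ends with a newline, as the bulk format requires.
    """
    lines = []
    with src.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            lines.append(json.dumps({"index": {"_index": index}}))
            lines.append(line)
    return "\n".join(lines) + "\n"
```

Storing one object per line means each JSONL file can be read record by record, and it is close to the layout the bulk endpoint expects. Only the action lines need to be interleaved.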

Schema

All files can be validated against the JSON Schema definitions in the schema folder. Refer to the README file in each folder for further information. You need Python 3, with virtual environments managed by pipenv.
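As a rough illustration of what validation checks, the sketch below validates each line of a JSONL file against a schema. It uses only the standard library and deliberately covers a small subset of JSON Schema: it checks that each record is an object, and it applies `required` and the `type` keyword under `properties`. The helper names are hypothetical and are not taken from the schema folder. For real validation, a complete implementation such as the `jsonschema` package (`jsonschema.validate(instance, schema)`) is the safer choice.

```python
import json
from pathlib import Path

# Hypothetical helpers: they check only a SUBSET of JSON Schema
# (object records, "required", and "type" under "properties").

# JSON Schema type names mapped to Python types ("boolean" is handled apart).
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "null": type(None),
}


def _matches(value, type_name: str) -> bool:
    """Return True if a decoded JSON value has the given JSON Schema type."""
    if type_name == "boolean":
        return isinstance(value, bool)
    if isinstance(value, bool):
        return False  # bool subclasses int in Python, but is not a JSON number
    py_type = _TYPES.get(type_name)
    return py_type is not None and isinstance(value, py_type)


def check_record(record, schema: dict) -> list:
    """Return a list of error messages for one record (empty if valid)."""
    if not isinstance(record, dict):
        return ["record is not a JSON object"]
    errors = []
    for key in schema.get("required", []):
        if key not in record:
            errors.append(f"missing required property {key!r}")
    for key, spec in schema.get("properties", {}).items():
        if key not in record or "type" not in spec:
            continue
        expected = spec["type"]
        # "type" may be a single name or a list of allowed names.
        names = expected if isinstance(expected, list) else [expected]
        if not any(_matches(record[key], name) for name in names):
            errors.append(f"property {key!r} is not of type {expected!r}")
    return errors


def validate_jsonl(path: Path, schema: dict) -> list:
    """Validate every non-blank line of a JSONL file.

    Returns (line_number, message) pairs, numbered from 1.
    """
    problems = []
    with path.open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue
            for message in check_record(json.loads(line), schema):
                problems.append((line_no, message))
    return problems
```

An empty list from `validate_jsonl` means every record passed the checks this subset covers. It does not prove the file conforms to the full schema.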

General usage:

  • cd [folder]
  • pipenv shell
  • pipenv install (only the first time)
  • python [script] [...args] (inside the virtual environment), or pipenv run python [script] [...args] from outside it