Project author: yennanliu

Project description:
Collections of POC/dev data infrastructure. | #DE
Primary language: Python
Repository: git://github.com/yennanliu/data_infra_repo.git
Created: 2019-01-23T19:54:49Z
Project page: https://github.com/yennanliu/data_infra_repo

License:


data_infra_repo

Build Status
PRs

As the data-infrastructure part of the “DaaS (Data as a Service)” repo, this project shows how to build DS/DE environments via Docker from scratch. It focuses on: 1) system design through practical use cases, 2) environment setup for Docker, packages, and libraries, and 3) development of test, staging, and production develop/deploy workflows (CI/CD style).

File Structure

```
# main projects
├── airflow_in_docker_compose
├── celery_redis_flower_infra
├── deploy_dockerhub.sh
├── hadoop_yarn_spark
├── kafka-zookeeper
├── kafka_zookeeper_redis_infra
├── mysql-master-slave
```
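
Each directory above is a self-contained docker-compose stack. As a minimal sketch of the pattern (the image names, ports, and environment values here are illustrative assumptions, not the repo's actual files), a zookeeper + kafka stack would look roughly like:

```yaml
# Illustrative docker-compose sketch only -- not the repo's actual file.
version: "3"
services:
  zookeeper:
    image: zookeeper:3.6        # assumed image/tag
    ports:
      - "2181:2181"
  kafka:
    image: bitnami/kafka:2.8    # assumed image; any Kafka image works similarly
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_CFG_ZOOKEEPER_CONNECT: zookeeper:2181
```

Bringing the stack up is then a single `docker-compose up -d` inside the directory, which is what makes the "from scratch" workflow reproducible across the test/staging/production stages mentioned above.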

TODO

  • Hadoop
    • hadoop_yarn_spark (batch)
    • hadoop_yarn_spark (stream)
    • hadoop namenode, datanode
    • hadoop_yarn_flink
  • Kafka
    • Kafka producer, consumer, zk
    • Kafka mirror
    • Kafka-ELK-DB
  • Airflow
    • Airflow app in docker-compose
  • DB
    • DB sharding (partition)
    • DB replica
    • DB master-follower
    • DB master-master
    • DB binlog stream (Kafka) to BigQuery/DW
    • DB binlog stream to ELK
  • Microservice
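
The `DB sharding (partition)` item above can be sketched with the simplest routing scheme, modulo partitioning; the `db_shard_N` names and the `shard_for` helper are hypothetical, for illustration only:

```python
def shard_for(user_id: int, num_shards: int = 4) -> str:
    """Route a user_id to one of num_shards databases via modulo partitioning.

    Shard names (db_shard_0 .. db_shard_{num_shards-1}) are hypothetical;
    real setups often use consistent hashing instead, to ease re-sharding.
    """
    return f"db_shard_{user_id % num_shards}"

# Keys spread evenly across shards: ids 0..7 over 4 shards -> 2 ids per shard.
routes = {uid: shard_for(uid) for uid in range(8)}
print(routes[5])  # db_shard_1
```

The same routing function would sit in front of the `mysql-master-slave` stack: writes go to the master of the selected shard, reads can fan out to its replicas.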

Test

Ref