Project author: anskarl

Project description:
Dockerized Apache Druid for testing and development
Primary language: Shell
Repository: git://github.com/anskarl/druid-docker-cluster.git
Created: 2019-05-25T13:13:32Z
Project page: https://github.com/anskarl/druid-docker-cluster

License: Apache License 2.0


Dockerized Apache Druid cluster

This project demonstrates how to set up a Dockerized example/development Apache Druid cluster.

The cluster is composed of the following components:

  • MinIO (S3-compatible object storage) for deep storage
  • PostgreSQL for metadata storage
  • Zookeeper for internal service discovery, coordination, and leader election
  • Apache Druid platform:

    • Middle Manager to handle the ingestion of data into the cluster
    • Historical to handle the storage and querying on “historical” data
    • Broker to receive queries from external clients
    • Coordinator to assign segments to Historical nodes
    • Overlord to assign ingestion tasks to Middle Managers and to coordinate segment publishing
    • Router to provide a unified API gateway in front of Brokers, Overlords and Coordinators

Instructions to build Druid image

  1. make image

or by using docker-compose

  1. docker-compose build

You can also specify the version of Druid to build, for example:

  1. make DRUID_VERSION=0.14.1-incubating image

or by using docker-compose

  1. docker-compose build --build-arg ARG_DRUID_VERSION=0.14.1-incubating

Run the cluster

  1. docker-compose up

or to run in the background:

  1. docker-compose up -d

After a while, the Druid console should be available at http://localhost:8888
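
Since "after a while" depends on the machine, a script can poll until the console answers instead of sleeping for a fixed time. The following is a minimal sketch, not part of this repository: the function name wait_for_http and its defaults are hypothetical, and it assumes curl is installed on the host.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): poll a URL until it
# answers with a successful HTTP status, or give up after a number of attempts.
# Usage: wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_http() {
  url=$1
  attempts=${2:-30}   # default: 30 attempts
  delay=${3:-5}       # default: 5 seconds between attempts
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors as failures, -sS: quiet but still report errors
    if curl -fsS -o /dev/null "$url"; then
      echo "ready: $url"
      return 0
    fi
    echo "waiting for $url ($i/$attempts)" >&2
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timed out: $url" >&2
  return 1
}
```

For example, `wait_for_http http://localhost:8888/status/health` would wait for the Router, assuming the standard Druid /status/health endpoint is reachable on the same port as the console in this setup.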

Load example data

The example data is a subset of the NYC Taxi & Limousine Commission Trip Record Data, specifically the months 2015-01 to 2015-03.

  1. cd dataset
  2. ./03-load_to_druid.sh

Note that you can download data for different months and change the sample size by adjusting the parameters of ./dataset/01-download.sh and ./dataset/02-create_sample_tripdata.sh.

The schema of the dataset and the indexing task are defined in ./dataset/yellow_tripdata-index.json
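
The contents of ./03-load_to_druid.sh are not shown here, so the following is only a sketch of how an indexing task file like this one is typically submitted by hand. It assumes the Router at http://localhost:8888 proxies the standard Druid task endpoint /druid/indexer/v1/task to the Overlord; the function name submit_index_task is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): POST an indexing task
# spec to Druid's task endpoint.
# Usage: submit_index_task SPEC_FILE [ROUTER_URL]
submit_index_task() {
  spec_file=$1
  router_url=${2:-http://localhost:8888}   # assumed Router address
  if [ ! -f "$spec_file" ]; then
    echo "spec file not found: $spec_file" >&2
    return 1
  fi
  # --data-binary sends the JSON file unchanged as the request body
  curl -fsS -X POST \
    -H 'Content-Type: application/json' \
    --data-binary "@$spec_file" \
    "$router_url/druid/indexer/v1/task"
}
```

From the repository root, `submit_index_task ./dataset/yellow_tripdata-index.json` would submit the task under these assumptions. The sample data must already be present where the spec expects it.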

…enjoy :)