Author: zzt93

Description:
Sync & Manipulate data from MySQL/MongoDB to Elasticsearch/MySQL/Http/Kafka Endpoint
Language: Java
Repository: git://github.com/zzt93/syncer.git
Created: 2017-09-08T09:50:34Z

License: BSD 3-Clause "New" or "Revised" License


Syncer: MySQL/MongoDB => Elasticsearch/MySQL/Kafka/HBase



Use Syncer


  • MySQL config
    • binlog_format: row
    • binlog_row_image: full
  • MongoDB config:
    • (optional) update bind_ip so that mongod listens for application connections on the configured addresses.
    • enable a replica set:
      • mongod --replSet myapp
      • Or use docker: docker run -d --name mongodb -p 27017:27017 -v /root/mongodb-container/db:/data/db mongo:3.2 mongod --replSet chat
    • initialize the replica set in the mongo shell: rs.initiate()
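
The two MySQL settings above can be verified before starting Syncer. The following is a minimal sketch, not part of Syncer: `BinlogConfigCheck` is a hypothetical helper that assumes you have already collected the server variables (for example from `SHOW VARIABLES LIKE 'binlog%'`) into a map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Syncer: checks the two MySQL settings listed above. */
final class BinlogConfigCheck {

  // Required values taken from the list above; compared case-insensitively.
  private static final Map<String, String> REQUIRED = new LinkedHashMap<>();
  static {
    REQUIRED.put("binlog_format", "ROW");
    REQUIRED.put("binlog_row_image", "FULL");
  }

  /**
   * @param variables name -> value pairs, e.g. collected from SHOW VARIABLES LIKE 'binlog%'
   * @return one message per setting that is missing or has a different value
   */
  static List<String> problems(Map<String, String> variables) {
    List<String> problems = new ArrayList<>();
    for (Map.Entry<String, String> required : REQUIRED.entrySet()) {
      String actual = variables.get(required.getKey());
      if (actual == null || !actual.equalsIgnoreCase(required.getValue())) {
        problems.add(required.getKey() + " should be " + required.getValue()
            + " but is " + actual);
      }
    }
    return problems;
  }

  public static void main(String[] args) {
    // Example input: binlog_format is wrong, binlog_row_image is fine.
    Map<String, String> variables = new LinkedHashMap<>();
    variables.put("binlog_format", "MIXED");
    variables.put("binlog_row_image", "FULL");
    problems(variables).forEach(System.out::println);
  }
}
```

An empty result means both settings match the values listed above.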


      git clone https://github.com/zzt93/syncer
      cd syncer/ && mvn package
      # /path/to/config/: producer.yml, consumer.yml, password-file
      # use `-XX:+UseParallelOldGC` if you have less memory and lower input pressure
      # use `-XX:+UseG1GC` if you have at least 4g memory and event input rate larger than 2*10^4/s
      java -server -XX:+UseG1GC -jar ./syncer-core/target/syncer-core-1.0-SNAPSHOT.jar [--debug] [--port=40000] [--config=/absolute/path/to/syncerConfig.yml] --producerConfig=/absolute/path/to/producer.yml --consumerConfig=/absolute/path/to/consumer1.yml,/absolute/path/to/consumer2.yml

Full, usable sample configs can be found under test/config/, for example test/config/simplest.
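
The launch command takes one producer config and a comma-separated list of consumer configs. As a rough sketch of how that command line could be assembled programmatically, the hypothetical `SyncerCommand` class below (not part of Syncer) uses only the flags shown above; treating `--port` and `--debug` as optional and requiring at least one consumer config are assumptions read off the bracket notation in the command.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Syncer: assembles the launch command shown above. */
final class SyncerCommand {

  /**
   * @param jar             path to the built syncer-core jar
   * @param producerConfig  absolute path to producer.yml
   * @param consumerConfigs absolute paths to one or more consumer yml files
   * @param port            value for --port, or null to omit the flag
   * @param debug           whether to pass --debug
   */
  static List<String> build(String jar, String producerConfig,
                            List<String> consumerConfigs, Integer port, boolean debug) {
    if (consumerConfigs.isEmpty()) {
      // Assumption: the command above shows --consumerConfig without brackets, so it is required.
      throw new IllegalArgumentException("at least one consumer config is required");
    }
    List<String> cmd = new ArrayList<>();
    cmd.add("java");
    cmd.add("-server");
    cmd.add("-XX:+UseG1GC");
    cmd.add("-jar");
    cmd.add(jar);
    if (debug) {
      cmd.add("--debug");
    }
    if (port != null) {
      cmd.add("--port=" + port);
    }
    cmd.add("--producerConfig=" + producerConfig);
    // Multiple consumer configs are passed as one comma-separated value.
    cmd.add("--consumerConfig=" + String.join(",", consumerConfigs));
    return cmd;
  }

  public static void main(String[] args) {
    List<String> consumers = new ArrayList<>();
    consumers.add("/absolute/path/to/consumer1.yml");
    consumers.add("/absolute/path/to/consumer2.yml");
    List<String> cmd = build("./syncer-core/target/syncer-core-1.0-SNAPSHOT.jar",
        "/absolute/path/to/producer.yml", consumers, 40000, false);
    System.out.println(String.join(" ", cmd));
  }
}
```

The resulting list can be handed to `new ProcessBuilder(cmd).inheritIO().start()` if you want to launch Syncer from another Java process.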

How to?

If you have trouble using Syncer or find a bug, open an issue.
I will handle it as soon as I can.


  • Q: “Got error produce response in correlation id xxx on topic-partition xxx.xxPartition-0, splitting and retrying (5 attempts left). Error: MESSAGE_TOO_LARGE”?
    • A: Reduce the producer's batch.size, or configure Kafka to accept larger messages.
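
To make the answer above concrete, here is a minimal sketch of the Kafka producer properties involved. `batch.size` and `max.request.size` are standard Kafka producer settings; the class name `ProducerSizeProps` is hypothetical, and where these properties go in a Syncer consumer config is not shown here, so check the sample configs for the actual location.

```java
import java.util.Properties;

/** Hypothetical helper, not part of Syncer: builds size-related Kafka producer properties. */
final class ProducerSizeProps {

  /**
   * @param batchSizeBytes      upper bound, in bytes, for one producer batch (batch.size)
   * @param maxRequestSizeBytes upper bound, in bytes, for one produce request (max.request.size)
   */
  static Properties build(int batchSizeBytes, int maxRequestSizeBytes) {
    if (batchSizeBytes <= 0 || maxRequestSizeBytes <= 0) {
      throw new IllegalArgumentException("sizes must be positive");
    }
    Properties props = new Properties();
    // Smaller batches make MESSAGE_TOO_LARGE less likely for batched records.
    props.setProperty("batch.size", Integer.toString(batchSizeBytes));
    props.setProperty("max.request.size", Integer.toString(maxRequestSizeBytes));
    return props;
  }

  public static void main(String[] args) {
    // Example values only: 8 KiB batches, 1 MiB requests.
    Properties props = build(8 * 1024, 1024 * 1024);
    props.forEach((key, value) -> System.out.println(key + "=" + value));
  }
}
```

The other option from the answer is on the Kafka side: raising the broker setting `message.max.bytes` (or the per-topic `max.message.bytes`) lets Kafka accept larger messages.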

Used In Production

  • Search system: search data sync
  • Micro-service: auth/recommend/chat data sync
    • Sync Requirement: low latency, high availability
  • Join table: avoid joins in the production environment by pre-joining tables, trading space for speed
    • Sync Requirement: low latency, high availability
  • Kafka: sync data to Kafka for other heterogeneous systems to consume
  • For data recovery: in case an entity was dropped by mistake, or when you know where to start and end
  • For alter table sync:
  • For data warehouse sync


See Issue 1


Implementation details can be found in the doc.