Project author: stdatalabs

Project description:
Twitter sentiment analysis using Spark and Stanford CoreNLP, with visualization using Elasticsearch and Kibana
Language: Scala
Repository: git://github.com/stdatalabs/sparkNLP-elasticsearch.git
Created: 2017-09-03T17:10:13Z
Project community: https://github.com/stdatalabs/sparkNLP-elasticsearch

License:



SparkTwitterPopularHashTags

A Spark Streaming project that analyzes popular hashtags in live Twitter data streams. Data is ingested from several input sources (the Twitter source, Flume, and Kafka) and processed downstream with Spark Streaming.
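To make the core computation concrete, here is a plain-Scala sketch of the counting step only, written without Spark so that it runs on its own. It assumes a hashtag is any whitespace-separated token that starts with `#`, and the names `PopularHashTagsSketch`, `hashtags`, and `topHashTags` are hypothetical, not taken from the project. In a Spark Streaming job the same shape would typically be expressed with `flatMap`, `map`, and a keyed reduction such as `reduceByKey` over each batch or window.

```scala
// Hypothetical plain-Scala sketch (no Spark): extract hashtags from a batch of
// tweet texts and rank them by frequency, mirroring a
// flatMap -> map -> reduceByKey -> sortBy pipeline on a single batch.
object PopularHashTagsSketch {

  // Assumption: hashtags are whitespace-separated tokens starting with '#'.
  // A bare "#" is ignored.
  def hashtags(text: String): Seq[String] =
    text.split("\\s+").toSeq.filter(t => t.startsWith("#") && t.length > 1)

  // Count each hashtag across all tweets and return the top `n`,
  // most frequent first; ties are broken alphabetically.
  def topHashTags(tweets: Seq[String], n: Int): Seq[(String, Int)] =
    tweets
      .flatMap(hashtags)
      .groupBy(identity)
      .map { case (tag, occurrences) => (tag, occurrences.size) }
      .toSeq
      .sortBy { case (tag, count) => (-count, tag) }
      .take(n)

  def main(args: Array[String]): Unit = {
    val batch = Seq(
      "Learning #Spark and #Kafka today",
      "#Spark Streaming is fun",
      "Hello #Flume #Spark"
    )
    topHashTags(batch, 2).foreach { case (tag, count) => println(s"$tag: $count") }
  }
}
```

Sorting on `(-count, tag)` keeps the ranking deterministic when several hashtags share the same count, which makes the output of successive batches easier to compare.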

Requirements

  • IDE
  • Apache Maven 3.x
  • JVM 6 or 7

General Info

The source folder is organized into two packages, Kafka and Streaming. Each class in the Streaming package explores a different approach to consuming data from the Twitter source. The classes are listed below:

  • com/stdatalabs/Kafka
    • KafkaTwitterProducer.java — A Kafka producer that publishes Twitter data to a Kafka broker
  • com/stdatalabs/Streaming
    • SparkPopularHashTags.scala — Receives data from the Twitter data source
    • FlumeSparkPopularHashTags.scala — Receives data from the Flume Twitter producer
    • KafkaSparkPopularHashTags.scala — Receives data from the Kafka producer
    • RecoverableKafkaPopularHashTags.scala — Spark-Kafka receiver-based approach. Ensures at-least-once semantics
    • KafkaDirectPopularHashTags.scala — Spark-Kafka direct approach. Ensures exactly-once semantics
  • TwitterAvroSource.conf — Flume configuration for running the Twitter Avro source
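The difference between the at-least-once and exactly-once entries above comes down to where consumed offsets are recorded relative to the output. The toy model below uses no Spark or Kafka APIs, and all of its names are hypothetical. It assumes a crash that hits after the first batch's counts reach the sink but before that batch's offset is recorded. Committing offsets separately from the output replays the batch and double-counts it, while keeping the applied offset in the sink alongside the counts lets the replayed batch be skipped.

```scala
// Hypothetical toy model (not the Spark or Kafka APIs) contrasting the two
// delivery guarantees. A log of hashtags is consumed in fixed-size batches and
// a single simulated crash occurs on the first batch, after its output is
// written but before its offset is committed, so that batch is read again.
object DeliverySemanticsSketch {

  // Add one occurrence of each hashtag in `batch` to the running counts.
  def addCounts(counts: Map[String, Int], batch: Seq[String]): Map[String, Int] =
    batch.foldLeft(counts)((acc, tag) => acc.updated(tag, acc.getOrElse(tag, 0) + 1))

  // At-least-once: results go to the sink first and the offset is committed
  // afterwards, so the replayed first batch is counted twice.
  def atLeastOnce(log: Vector[String], batchSize: Int): Map[String, Int] = {
    var committedOffset = 0
    var sinkCounts = Map.empty[String, Int]
    var crashed = false
    while (committedOffset < log.length) {
      val end = math.min(committedOffset + batchSize, log.length)
      sinkCounts = addCounts(sinkCounts, log.slice(committedOffset, end)) // output written
      if (!crashed) crashed = true   // crash: offset commit lost, batch will be replayed
      else committedOffset = end     // offset committed after the output
    }
    sinkCounts
  }

  // Exactly-once as an end-to-end effect: the sink records the offset it has
  // applied up to together with the counts (standing in for one transactional
  // write) and ignores any batch it already holds.
  def exactlyOnce(log: Vector[String], batchSize: Int): Map[String, Int] = {
    var readOffset = 0
    var sinkCounts = Map.empty[String, Int]
    var sinkOffset = 0
    var crashed = false
    while (readOffset < log.length) {
      val end = math.min(readOffset + batchSize, log.length)
      if (end > sinkOffset) {        // skip a replayed batch the sink already applied
        sinkCounts = addCounts(sinkCounts, log.slice(readOffset, end))
        sinkOffset = end             // stored alongside the counts
      }
      if (!crashed) crashed = true   // same crash: the batch is read again
      else readOffset = end
    }
    sinkCounts
  }

  def main(args: Array[String]): Unit = {
    val log = Vector("#spark", "#kafka", "#spark", "#flume")
    println(atLeastOnce(log, batchSize = 2)) // first batch's #spark and #kafka double-counted
    println(exactlyOnce(log, batchSize = 2))
  }
}
```

With the sample log, `atLeastOnce` counts `#spark` three times and `#kafka` twice, whereas `exactlyOnce` returns the true counts. The Spark Streaming Kafka integration guide makes a related point: the direct approach tracks offsets itself, but exactly-once output still requires the write to be idempotent or to store results and offsets atomically.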

Description