Author: odfalik

Description:
Twitter geolocation and sentiment analysis with PySpark, Elasticsearch, and a Kibana dashboard
Language: Python
Repository: git://github.com/odfalik/tagshark.git
Created: 2020-11-09T04:31:11Z
Project page: https://github.com/odfalik/tagshark

License:


Tagshark

Getting Started

  1. pip install -r requirements.txt
  2. sudo systemctl restart elasticsearch
  3. sudo systemctl restart kibana

In one terminal, run python3 stream.py, and in another, run python3 spark.py.

To import Kibana dashboard/maps:

  1. Go to Kibana
  2. Click on Management
  3. Click on Saved Objects
  4. Click on the Import button
  5. Load export.ndjson (found in the root of this repo)
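
The same import can also be scripted against Kibana's saved objects import endpoint (POST /api/saved_objects/_import). The following is a minimal sketch, not part of this repo. It assumes Kibana is reachable at http://localhost:5601 without authentication; KIBANA_URL, build_multipart, and import_saved_objects are illustrative names.

```python
import json
import os
import urllib.request
import uuid

# Assumption: a local, unauthenticated Kibana on its default port.
KIBANA_URL = "http://localhost:5601"


def build_multipart(field_name, filename, content):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        (f'Content-Disposition: form-data; name="{field_name}"; '
         f'filename="{filename}"\r\n').encode(),
        b"Content-Type: application/ndjson\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return body, f"multipart/form-data; boundary={boundary}"


def import_saved_objects(path="export.ndjson", overwrite=True):
    """Upload an .ndjson export to Kibana and return the parsed JSON response."""
    with open(path, "rb") as f:
        content = f.read()
    body, content_type = build_multipart("file", os.path.basename(path), content)
    url = f"{KIBANA_URL}/api/saved_objects/_import"
    if overwrite:
        url += "?overwrite=true"  # replace saved objects that already exist
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        # Kibana rejects API writes that lack the kbn-xsrf header.
        headers={"Content-Type": content_type, "kbn-xsrf": "true"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(import_saved_objects())
```

Run it from the root of the repo so that export.ndjson is found. If your Kibana requires a login, the request would also need credentials, which this sketch does not handle.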

Useful (default) ports

  • 4040 Spark
  • 9200 Elasticsearch
  • 5601 Kibana
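
To confirm that the services are listening before starting the pipeline, a quick TCP check along these lines may help. This is a sketch under the assumption that everything runs on localhost with the default ports listed above; DEFAULT_PORTS, is_port_open, and check_services are illustrative names. Note that the Spark UI on 4040 is normally only available while a Spark application is running.

```python
import socket

# Default ports from the list above; change these if your setup differs.
DEFAULT_PORTS = {"Spark": 4040, "Elasticsearch": 9200, "Kibana": 5601}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(host="localhost", ports=None):
    """Map each service name to whether its port accepts connections."""
    ports = DEFAULT_PORTS if ports is None else ports
    return {name: is_port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, reachable in check_services().items():
        print(f"{name}: {'reachable' if reachable else 'not reachable'}")
```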

Useful Elasticsearch queries

  // Delete index
  DELETE tagshark

  // Create tagshark index
  PUT tagshark

  // Set geo_point mapping for location field
  PUT tagshark/_mappings
  {
    "properties": {
      "location": {
        "type": "geo_point"
      }
    }
  }

  // Get number of indexed documents
  GET tagshark/_count

  // Get a random document from tagshark index
  GET tagshark/_search
  {
    "size": 1,
    "query": {
      "function_score": {
        "query": { "match_all": {} },
        "random_score": {}
      }
    }
  }
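
The same requests can be issued from Python instead of the Kibana console. The sketch below uses only the standard library and assumes an unauthenticated Elasticsearch at http://localhost:9200; ES_URL, build_request, es_request, and the other helper names are illustrative and do not come from this repo.

```python
import json
import urllib.error
import urllib.request

# Assumption: a local, unauthenticated Elasticsearch on its default port.
ES_URL = "http://localhost:9200"
INDEX = "tagshark"


def geo_point_mapping():
    """Mapping body that types the location field as geo_point."""
    return {"properties": {"location": {"type": "geo_point"}}}


def random_document_query():
    """Search body that returns one randomly scored document."""
    return {
        "size": 1,
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "random_score": {},
            }
        },
    }


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for Elasticsearch."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        f"{ES_URL}/{path}",
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def es_request(method, path, body=None):
    """Send a request and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(method, path, body)) as response:
        return json.loads(response.read().decode("utf-8"))


def reset_index():
    """Delete the index if present, recreate it, and set the mapping."""
    try:
        es_request("DELETE", INDEX)
    except urllib.error.HTTPError as err:
        if err.code != 404:  # 404 only means the index did not exist yet
            raise
    es_request("PUT", INDEX)
    es_request("PUT", f"{INDEX}/_mappings", geo_point_mapping())


if __name__ == "__main__":
    print(es_request("GET", f"{INDEX}/_count"))
    # _search accepts POST as well as GET; POST avoids sending a body with GET.
    print(es_request("POST", f"{INDEX}/_search", random_document_query()))
```

Calling reset_index() discards every indexed document, so it is only appropriate when you want to start from an empty index.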