Project author: avt-computer-education-center

Project description:
avtMonExp is a Python 3 project for finding experts in a given domain on Twitter
Primary language: Python
Project URL: git://github.com/avt-computer-education-center/avt-mon-exp.git
Created: 2020-12-17T17:38:18Z
Project community: https://github.com/avt-computer-education-center/avt-mon-exp

License: MIT License


avtMonExp project on Python 3

1. Introduction

avtMonExp is a Python 3 project for finding experts in a given domain on Twitter.

Work on avtMonExp began in August 2016. The project implements some ideas from the E-Government Monitor project.

The avtMonExp project includes the following main stages:

  1. Search and retrieve data based on pre-defined criteria from Twitter.
  2. Data analysis.
  3. Evaluation of relevance and ranking of data by a unique algorithm developed by our company’s specialists.
  4. Save data that corresponds to the specified criteria in the relational database.
  5. Visualization of results in the browser on Google Maps.

2. Requirements

2.1 The avtMonExp project requires the following main components:

3. How to prepare and start using this project step by step

3.1 Fork, Clone or Download the project

3.2 Install the requirements

3.3 Create a MySQL database called monexp_db

3.4 Replace the data placeholders in the following project files with your real data:

3.4.1 Describe the domain model and the experts in JSON. This data is used to search Twitter and to analyze the search results

3.4.1.1 Format of the domains_data.json file

    // avtMonExp/avtMonExp/domains_data.json
    {
      "domains": [
        {
          "domain": "your_domain_1",
          "tags": {
            // Tags are strings with no spaces that describe the domain.
            // Each tag is used in the following three forms:
            //   1. "your_tag"
            //   2. "#your_tag"
            //   3. "@your_tag"
            "your_tag_1_1": your_tag_score_1_1, // Your score for the tag, from 1 to 5
            "your_tag_1_2": your_tag_score_1_2,
            "your_tag_1_3": your_tag_score_1_3,
            ...
            "your_tag_1_n": your_tag_score_1_n
          },
          "phrases": {
            // Phrases are strings with spaces that describe the domain.
            "your phrase_1_1": your_phrase_score_1_1, // Your score for the phrase, from 1 to 5
            "your phrase_1_2": your_phrase_score_1_2,
            "your phrase_1_3": your_phrase_score_1_3,
            ...
            "your phrase_1_m": your_phrase_score_1_m
          },
          "expert_keywords": {
            // Expert keywords are strings without spaces that describe experts
            // in the specified domain.
            "your_expert_keywords_1_1": your_expert_keywords_score_1_1, // Your score for the expert keyword, from 1 to 5
            "your_expert_keywords_1_2": your_expert_keywords_score_1_2,
            "your_expert_keywords_1_3": your_expert_keywords_score_1_3,
            ...
            "your_expert_keywords_1_k": your_expert_keywords_score_1_k
          }
        },
        {
          "domain": "your_domain_2",
          "tags": {
            // Tags are strings with no spaces that describe the domain.
            // Each tag is used in the following three forms:
            //   1. "your_tag"
            //   2. "#your_tag"
            //   3. "@your_tag"
            "your_tag_2_1": your_tag_score_2_1, // Your score for the tag, from 1 to 5
            "your_tag_2_2": your_tag_score_2_2,
            "your_tag_2_3": your_tag_score_2_3,
            ...
            "your_tag_2_n": your_tag_score_2_n
          },
          "phrases": {
            // Phrases are strings with spaces that describe the domain.
            "your phrase_2_1": your_phrase_score_2_1, // Your score for the phrase, from 1 to 5
            "your phrase_2_2": your_phrase_score_2_2,
            "your phrase_2_3": your_phrase_score_2_3,
            ...
            "your phrase_2_m": your_phrase_score_2_m
          },
          "expert_keywords": {
            // Expert keywords are strings without spaces that describe experts
            // in the specified domain.
            "your_expert_keywords_2_1": your_expert_keywords_score_2_1, // Your score for the expert keyword, from 1 to 5
            "your_expert_keywords_2_2": your_expert_keywords_score_2_2,
            "your_expert_keywords_2_3": your_expert_keywords_score_2_3,
            ...
            "your_expert_keywords_2_k": your_expert_keywords_score_2_k
          }
        }
      ]
    }
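
Before running the application, it may help to check a filled-in domains_data.json against the rules stated in the comments above. The following is a minimal sketch and not part of the project: the function name validate_domains is hypothetical, and it assumes that scores are integers from 1 to 5. Note also that Python's standard json module does not accept // comments, so the sketch assumes the explanatory comments have been removed from the real file.

```python
import json


def validate_domains(data):
    """Return a list of problems found in parsed domains_data.json content.

    Hypothetical helper: checks only the rules stated in the format
    description (tags and expert keywords have no spaces, phrases have
    spaces, scores are integers from 1 to 5).
    """
    problems = []
    sections = (("tags", False), ("phrases", True), ("expert_keywords", False))
    for entry in data.get("domains", []):
        name = entry.get("domain", "<missing domain>")
        for section, wants_space in sections:
            for term, score in entry.get(section, {}).items():
                if (" " in term) != wants_space:
                    expected = "contain spaces" if wants_space else "not contain spaces"
                    problems.append(
                        "{}/{}: {!r} should {}".format(name, section, term, expected)
                    )
                # bool is a subclass of int, so it is rejected explicitly.
                is_int = isinstance(score, int) and not isinstance(score, bool)
                if not is_int or not 1 <= score <= 5:
                    problems.append(
                        "{}/{}: score for {!r} must be an integer from 1 to 5".format(
                            name, section, term
                        )
                    )
    return problems


def validate_file(path):
    """Load a comment-free JSON file and validate it."""
    with open(path, encoding="utf-8") as handle:
        return validate_domains(json.load(handle))
```

An empty list means no problems were found; each string in a non-empty list names the domain, the section, and the offending term.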

3.4.1.2 Example of the domains_data.json file for the Wireless_Communications domain

    // avtMonExp/avtMonExp/domains_data.json
    {
      "domains": [
        {
          "domain": "Wireless_Communications",
          "tags": {
            "Wireless": 5,
            "Infrared": 3,
            "Bluetooth": 4,
            "Wi-Fi": 4,
            "ZigBee": 4,
            "Cellular": 5,
            "Mobile": 5,
            "Satellite": 4
          },
          "phrases": {
            "Wireless Networking": 4,
            "Wireless Communication Networks": 5,
            "Wireless Communication Systems": 5
          },
          "expert_keywords": {
            "Expert": 5,
            "Leader": 4,
            "Engineer": 4,
            "CEO": 5,
            "CTO": 5,
            "PhD": 4,
            "Magazine": 3,
            "Journalist": 4,
            "Reviewer": 4,
            "Analyst": 5,
            "Blogger": 5,
            "Researcher": 5
          }
        }
      ]
    }
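
To make the role of the scores more concrete, here is a small illustrative sketch. It expands a tag into the three forms mentioned in the format description and sums the scores of the terms found in a piece of text. This is NOT the project's ranking algorithm, which is not described in this README; the function names tag_forms and naive_score and the matching rules (case-insensitive, whitespace-separated tokens for tags, substring match for phrases) are assumptions made for illustration only.

```python
def tag_forms(tag):
    """Return the three forms of a tag: plain, hashtag, and mention."""
    return [tag, "#" + tag, "@" + tag]


def naive_score(text, tags, phrases):
    """Sum the scores of tags and phrases that occur in the text.

    Hypothetical illustration, not the avtMonExp ranking algorithm.
    Tags match whole whitespace-separated tokens in any of their three
    forms; phrases match as case-insensitive substrings.
    """
    lowered = text.lower()
    tokens = set(lowered.split())
    total = 0
    for tag, score in tags.items():
        if any(form.lower() in tokens for form in tag_forms(tag)):
            total += score
    for phrase, score in phrases.items():
        if phrase.lower() in lowered:
            total += score
    return total


if __name__ == "__main__":
    tags = {"Wireless": 5, "Bluetooth": 4}
    phrases = {"Wireless Networking": 4}
    print(naive_score("Excited about #Bluetooth and Wireless Networking", tags, phrases))
```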

3.4.2 To interact with the MySQL database

    # avtMonExp/avtMonExp/mysql_monexp_db_config.py
    # Dictionary that holds the connection info for the <monexp_db> database
    monexp_db_config = {
        'user': '<your-user>',
        'password': '<your-password>',
        'host': '127.0.0.1',
        'charset': 'utf8mb4'
    }
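
A configuration dictionary of this shape can be passed directly to a MySQL driver. The sketch below assumes the mysql-connector-python package (the README does not say which driver the project uses) and shows one way to create the monexp_db database if it does not exist yet. The function names create_database_sql and ensure_database are hypothetical.

```python
def create_database_sql(db_name, charset="utf8mb4"):
    """Build a CREATE DATABASE statement for the given name.

    Identifiers cannot be passed as bound query parameters, so the name
    and charset are restricted to letters, digits, and underscores.
    """
    for value in (db_name, charset):
        if not value.replace("_", "").isalnum():
            raise ValueError("unsafe identifier: {!r}".format(value))
    return "CREATE DATABASE IF NOT EXISTS `{}` DEFAULT CHARACTER SET {}".format(
        db_name, charset
    )


def ensure_database(config, db_name="monexp_db"):
    """Connect with the given config and create the database if needed."""
    # Imported lazily so create_database_sql works without the driver.
    # Assumption: the mysql-connector-python package is installed.
    import mysql.connector

    connection = mysql.connector.connect(**config)
    try:
        cursor = connection.cursor()
        cursor.execute(create_database_sql(db_name, config.get("charset", "utf8mb4")))
        cursor.close()
    finally:
        connection.close()
```

With the configuration file above, the call would be ensure_database(monexp_db_config).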

3.4.3 To interact with your Twitter account through the TwitterSearch library, create a Twitter App and obtain your application tokens

    # avtMonExp/avtMonExp/tw_search_experts.py
    def init_tw_search_lib(self, domain_keyword):
        # ...
        # it's about time to create a TwitterSearch object with our secret tokens
        ts = TwitterSearch(
            consumer_key='<your-CONSUMER_KEY>',
            consumer_secret='<your-CONSUMER_SECRET>',
            access_token='<your-ACCESS_TOKEN>',
            access_token_secret='<your-ACCESS_TOKEN_SECRET>'
        )
        # ...
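
If you prefer not to store the tokens in the source file, one option is to read them from environment variables and pass them to the constructor as keyword arguments. This is a sketch of that alternative, not how the project itself is written; the environment variable names and the function load_twitter_tokens are hypothetical.

```python
import os

# Hypothetical environment variable names, keyed by the keyword
# arguments used in the TwitterSearch(...) call shown above.
TOKEN_ENV_VARS = {
    "consumer_key": "TW_CONSUMER_KEY",
    "consumer_secret": "TW_CONSUMER_SECRET",
    "access_token": "TW_ACCESS_TOKEN",
    "access_token_secret": "TW_ACCESS_TOKEN_SECRET",
}


def load_twitter_tokens(environ=os.environ):
    """Return the four tokens as keyword arguments, or fail loudly."""
    missing = sorted(var for var in TOKEN_ENV_VARS.values() if not environ.get(var))
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {arg: environ[var] for arg, var in TOKEN_ENV_VARS.items()}
```

The object could then be created with ts = TwitterSearch(**load_twitter_tokens()).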

3.4.4 To use the python-gmaps package to get the latitude and longitude of the expert's location stored in the database

    # avtMonExp/avtMonExp/tw_search_experts.py
    def tw_expert_location_geocoding(self, tw_user_location):
        # ...
        gmaps_request = Geocoding(api_key='<your-api_key>')
        # ...
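
The Google Geocoding API reports coordinates in each result under geometry.location as lat and lng. The helper below is a sketch of how the first match could be reduced to a coordinate pair. It assumes that the geocoding call returns a list of such result dictionaries (check the python-gmaps documentation for the exact return type); the function name first_lat_lng is hypothetical.

```python
def first_lat_lng(results):
    """Return (lat, lng) of the first geocoding result, or None.

    Assumes `results` is a list of dictionaries in the Google Geocoding
    API result format: {"geometry": {"location": {"lat": ..., "lng": ...}}}.
    """
    if not results:
        return None
    location = results[0].get("geometry", {}).get("location", {})
    lat, lng = location.get("lat"), location.get("lng")
    if lat is None or lng is None:
        return None
    return float(lat), float(lng)
```

Returning None for an empty or incomplete result lets the caller skip experts whose free-text location cannot be geocoded.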

3.5 Run the avtMonExp application

Run the main application module (avtmonexp.py) from the avtMonExp package with the following console command:

$ python avtmonexp.py

3.6 Example of the results of the first launch of the avtMonExp application

3.6.1 A fragment of the application's console output

    ---------------------------------------------------------------------
    The avtMonExp app began to search and analyze experts on Twitter ...
    ---------------------------------------------------------------------
    ---
    Timestamp (UTC): 2018-Apr-03 14:05:04
    ---
    Prepare data from <domains_data.json> file...
    ---
    Create <monexp_db> database...
    ---
    Create tables in <monexp_db> database...
    ---
    Search and analysis experts from Twitter users...
    ---
    Current processing domain: Wireless_Communications
    Queries done: 1. Tweets received: 100
    Queries done: 2. Tweets received: 200
    Queries done: 3. Tweets received: 300
    Queries done: 4. Tweets received: 400
    Queries done: 5. Tweets received: 500
    ---
    Current time(UTC): 14:05:26
    Elapsed time: 00:00:22
    ---
    Now the avtMonExp app is suspended for 60 seconds to avoid rate-limitation by Twitter...
    ---
    Resume processing and analysis...
    Queries done: 6. Tweets received: 600
    Queries done: 7. Tweets received: 700
    Queries done: 8. Tweets received: 800
    Queries done: 9. Tweets received: 900
    Queries done: 10. Tweets received: 1000
    ---
    Current time(UTC): 14:06:33
    Elapsed time: 00:01:29
    ---
    Now the avtMonExp app is suspended for 60 seconds to avoid rate-limitation by Twitter...
    ---
    Resume processing and analysis...
    Queries done: 11. Tweets received: 1100
    Queries done: 12. Tweets received: 1200
    Queries done: 13. Tweets received: 1300
    Queries done: 14. Tweets received: 1400
    Queries done: 15. Tweets received: 1500
    ---
    Current time(UTC): 14:07:41
    Elapsed time: 00:02:37
    ---
    Now the avtMonExp app is suspended for 60 seconds to avoid rate-limitation by Twitter...
    ...
    ...
    ...
    ---
    Resume processing and analysis...
    Queries done: 46. Tweets received: 4600
    Queries done: 47. Tweets received: 4700
    Queries done: 48. Tweets received: 4800
    Queries done: 49. Tweets received: 4900
    Queries done: 50. Tweets received: 5000
    ---
    Current time(UTC): 14:53:40
    Elapsed time: 00:48:35
    ---
    Now the avtMonExp app is suspended for 60 seconds to avoid rate-limitation by Twitter...
    ...
    ...
    ...
    ---
    Resume processing and analysis...
    Queries done: 91. Tweets received: 9100
    Queries done: 92. Tweets received: 9200
    Queries done: 93. Tweets received: 9300
    Queries done: 94. Tweets received: 9400
    Queries done: 95. Tweets received: 9500
    ---
    Current time(UTC): 15:03:51
    Elapsed time: 00:58:47
    ---
    Now the avtMonExp app is suspended for 60 seconds to avoid rate-limitation by Twitter...
    ---
    Resume processing and analysis...
    ...
    ...
    ...
    ---
    Generate HTML and display experts for each domain on Google Maps in default browser...
    ---
    Copy exists <Wireless_Communications_experts.html> file to <Wireless_Communications_experts.bak> file...
    ---
    New <Wireless_Communications_experts.html> file was successfully generated...
    ---
    Open new <Wireless_Communications_experts.html> file in default browser...
    ---
    Timestamp (UTC): 2018-Apr-03 15:06:08
    ---------------------------------------------------------------------
    The avtMonExp app successfully completed.
    ---------------------------------------------------------------------
    Elapsed time: 01:01:03
    ---------------------------------------------------------------------
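
The log above shows a fixed pattern: five queries, then a 60-second pause to stay under Twitter's rate limit. A throttling loop of that shape can be sketched as follows. This is an illustration of the pattern visible in the output, not the project's actual code; the function name run_throttled and its parameters are hypothetical, and skipping the pause after the final batch is a design choice of this sketch.

```python
import time


def run_throttled(queries, run_query, batch_size=5, pause_seconds=60, sleep=time.sleep):
    """Run each query in order, pausing after every full batch.

    `queries` is a list; `run_query` is any callable that takes one
    query. `sleep` is injectable so the loop can be tested without
    actually waiting. No pause is made after the last query.
    """
    results = []
    for index, query in enumerate(queries, start=1):
        results.append(run_query(query))
        if index % batch_size == 0 and index < len(queries):
            sleep(pause_seconds)
    return results
```

For example, run_throttled(my_queries, my_search_function) would pause for 60 seconds after queries 5, 10, 15, and so on, where my_queries and my_search_function stand for your own list and callable.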

3.6.2 Displaying experts for each domain on Google Maps in default browser

Results of data processing for each domain are saved in the experts_data_viz_html project folder. The file name corresponds to the following pattern: domain_experts.html.
The avtMonExp app automatically opens this file in the default browser.

For example, for the Wireless_Communications domain, the result is as follows:
avtMonExp/avtMonExp/experts_data_viz_html/Wireless_Communications_experts.html

NOTE: If a file with the same name already exists in the experts_data_viz_html folder when a new *.html file is created, the existing file is first copied to a file with the *.bak extension and then overwritten by the new *.html file.
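
The backup behavior described in the note can be sketched with the standard library alone. This is a minimal illustration, assuming the backup replaces the .html suffix with .bak (as the file names in the console output suggest); the function name write_with_backup is hypothetical.

```python
import shutil
from pathlib import Path


def write_with_backup(html_path, html_text):
    """Write html_text to html_path, keeping the previous version as *.bak."""
    path = Path(html_path)
    if path.exists():
        # Wireless_Communications_experts.html -> Wireless_Communications_experts.bak
        shutil.copyfile(path, path.with_suffix(".bak"))
    path.write_text(html_text, encoding="utf-8")
    return path
```

Only one backup generation is kept: each run overwrites the previous *.bak file.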

[Figure: Expert data from the Wireless_Communications domain plotted on Google Maps as a heatmap in the default browser]