Project author: giuseppegambino

Project description:
Sentiment analysis of Italian tweets with Python and Spark
Primary language: Python
Project URL: git://github.com/giuseppegambino/Italian-Sentiment-Analysis-with-Spark.git


ItalianSentimentAnalysis

This is the project for my Computer Science thesis at the University of Palermo, carried out under the
supervision of Professor Roberto Pirrone.

The goal was to build a data analysis pipeline using Big Data technologies, covering these steps:

  • Data collection
  • Data pre-processing
  • Data labeling
  • Machine Learning model tuning
  • Application of the Naive Bayes algorithm
  • Model evaluation
  • Insight extraction
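
To make the Naive Bayes step concrete, here is a minimal pure-Python sketch of a multinomial Naive Bayes classifier with Laplace smoothing. It is an illustration only, not the code in this repository: the project runs this step on Spark, while the function names (`train_naive_bayes`, `predict`), the `alpha` parameter and the toy training data below are assumptions made up for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, alpha=1.0):
    """Fit a multinomial Naive Bayes model on (tokens, label) pairs."""
    label_counts = Counter()            # number of documents per label
    word_counts = defaultdict(Counter)  # per-label word frequencies
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "label_counts": label_counts,
        "word_counts": word_counts,
        "vocab": vocab,
        "alpha": alpha,  # Laplace smoothing constant (assumed default)
        "n_docs": len(docs),
    }


def predict(model, tokens):
    """Return the label with the highest log-posterior for the tokens."""
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, n_label in model["label_counts"].items():
        counts = model["word_counts"][label]
        total = sum(counts.values())
        # Start from the log prior of the class.
        score = math.log(n_label / model["n_docs"])
        for tok in tokens:
            if tok not in model["vocab"]:
                continue  # ignore words never seen in training
            # Smoothed log likelihood of the word given the class.
            score += math.log((counts[tok] + alpha) / (total + alpha * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy labeled data, invented purely for illustration.
train = [
    (["bella", "canzone"], "positive"),
    (["concerto", "bellissimo"], "positive"),
    (["canzone", "brutta"], "negative"),
    (["album", "noioso"], "negative"),
]
model = train_naive_bayes(train)
```

Calling `predict(model, ["bella", "concerto"])` on this toy model picks the class whose training tweets contained those words more often. Working in log space avoids numeric underflow when a tweet has many tokens.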

The technologies used are:

  • Python 3.7
  • Tweepy, Twitter API
  • Pandas, Python Data Analysis Library
  • NLTK, Natural Language Toolkit Library
  • Apache Spark 2.4

The project consists of four Python scripts:

  • tweetSave.py collects the tweets; it is configured to collect Italian tweets that match music-related keywords
  • tweetClean.py cleans and pre-processes the data
  • tweetSentimentRadici.py labels each tweet with a positive, negative or neutral sentiment
  • tweetSpark.py applies the machine learning tools (RUNS ON SPARK)
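
The cleaning and labeling steps could look roughly like the sketch below. It is not taken from tweetClean.py or tweetSentimentRadici.py. The cleaning rules, the function names and the tiny root lexicon are all hypothetical. The idea of labeling by word roots is only a guess based on the file name ("radici" is Italian for "roots").

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
# Keep only lowercase letters (including Italian accented vowels) and spaces.
NON_LETTER_RE = re.compile(r"[^a-zàèéìíòóùú\s]")

# Hypothetical word roots; the project's real lexicon is not shown here.
POSITIVE_ROOTS = ("bell", "fantast", "ador")
NEGATIVE_ROOTS = ("brutt", "noios", "orribil")


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, mentions, the RT marker and symbols."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = re.sub(r"^\s*rt\b", " ", text)  # leading retweet marker
    text = NON_LETTER_RE.sub(" ", text)    # punctuation, digits, '#', emoji
    return " ".join(text.split())          # collapse repeated whitespace


def label_tweet(cleaned):
    """Label text by counting words that start with a positive or negative root."""
    words = cleaned.split()
    pos = sum(w.startswith(POSITIVE_ROOTS) for w in words)
    neg = sum(w.startswith(NEGATIVE_ROOTS) for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


raw = "RT @user: Che bella canzone! https://t.co/abc #musica"
cleaned = clean_tweet(raw)
label = label_tweet(cleaned)
```

Matching on prefixes is a crude stand-in for stemming: it catches inflected forms such as "bella" and "bellissimo" with one entry, but it can also match unrelated words that share the same prefix.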

Contact me if you have questions or suggestions for improving the solution.