Project author: kenypatel233

Project description:
Multiclass classification task to perform sentiment analysis of tweets using NLP

Primary language: Jupyter Notebook

Project homepage:

Repository URL: git://github.com/kenypatel233/SentimentAnalysis.git

Created: 2021-08-28T06:27:57Z
Project community: https://github.com/kenypatel233/SentimentAnalysis
License:

About the code:

This project solves a sentiment analysis problem using a dataset from Kaggle:
https://www.kaggle.com/datatattle/covid-19-nlp-text-classification

The main aim was multiclass classification of tweets using NLP.

Both Machine Learning and Deep Learning approaches were explored.
ML models include:

Multinomial Naive Bayes Classifier
Gradient Boosting Classifier
Random Forest Classifier (the best of the three, with training accuracy around 77%)
As expected, they performed poorly on the true test data (only 35% accuracy).
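
To make the train-versus-test comparison concrete, here is a minimal, dependency-free sketch of a Multinomial Naive Bayes classifier with add-one smoothing, scored separately on training and held-out text. It is not the notebook's code (which presumably relies on scikit-learn); the class name `TinyMultinomialNB`, the whitespace tokenizer, and the toy sentences are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class TinyMultinomialNB:
    """Illustrative Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict_one(self, text):
        best_label, best_score = None, -math.inf
        vocab_size = len(self.vocab)
        for label, doc_count in self.class_counts.items():
            # Start from the log prior of the class.
            score = math.log(doc_count / self.total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # skip words never seen during training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def accuracy(self, texts, labels):
        correct = sum(self.predict_one(t) == y for t, y in zip(texts, labels))
        return correct / len(labels)


# Toy data invented for this sketch; the real tweets come from the Kaggle CSVs.
train_texts = [
    "great prices at the store today",
    "so happy the shelves are full",
    "panic buying is terrible",
    "empty shelves and awful queues",
    "the supermarket opens at nine",
    "store hours are posted online",
]
train_labels = ["Positive", "Positive", "Negative", "Negative", "Neutral", "Neutral"]
test_texts = ["happy with the great store", "terrible queues", "hours posted"]
test_labels = ["Positive", "Negative", "Neutral"]

model = TinyMultinomialNB().fit(train_texts, train_labels)
train_acc = model.accuracy(train_texts, train_labels)
test_acc = model.accuracy(test_texts, test_labels)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

On real tweets the two numbers can diverge sharply, as the figures above show, so it is worth reporting both rather than the training score alone.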

Deep Learning approaches include:

A simple RNN model (accuracy around 75%)
An LSTM-based model (accuracy around 82%)
A Bidirectional LSTM model (accuracy around 84%, but it suffers from overfitting)

How To Use

This folder contains 3 files:

Sentiment Analysis.ipynb
Corona_NLP_train.csv
Corona_NLP_test.csv

—————About the module———————

Tools used: Jupyter Notebook in an Anaconda environment
Dependencies: Python 3, TensorFlow version 2.5.0, Keras, nltk
Libraries used: NumPy, scikit-learn, Seaborn, Keras, TensorFlow, Matplotlib, gensim
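
Before opening the notebook, it can help to check which of these libraries are installed. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). The helper `installed_versions` is hypothetical, and the names in `REQUIRED` are assumed PyPI distribution names for the libraries listed above, not something the project specifies.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumed PyPI distribution names for the libraries listed above.
REQUIRED = ["tensorflow", "keras", "nltk", "numpy", "scikit-learn",
            "seaborn", "matplotlib", "gensim"]

for name, version in installed_versions(REQUIRED).items():
    print(f"{name:15s} {version or 'NOT INSTALLED'}")
```

Any entry reported as not installed would need to be added to the environment first; for TensorFlow, the version to compare against is the 2.5.0 named above.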

=====Instructions to run the code======

1. IN JUPYTER NOTEBOOK:

The folder contains the train and test data as .csv files (‘Corona_NLP_train.csv’ and ‘Corona_NLP_test.csv’).
Download the whole folder and do not change the relative paths of the .ipynb and .csv files.
Run the code cells sequentially.
NOTE: Some models may take time to execute.
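
Since the notebook depends on the .csv files sitting next to it, a quick check along the following lines can catch a misplaced file before any model runs. This is a sketch, not part of the project: `missing_files` and `label_counts` are hypothetical helpers, and both the `Sentiment` column name and the `latin-1` encoding are assumptions about the Kaggle files that should be verified against the actual data.

```python
import csv
from collections import Counter
from pathlib import Path

DATA_FILES = ["Corona_NLP_train.csv", "Corona_NLP_test.csv"]


def missing_files(folder, names=DATA_FILES):
    """Return the names from `names` that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in names if not (folder / name).is_file()]


def label_counts(csv_path, label_column="Sentiment", encoding="latin-1"):
    """Count rows per label in a CSV file that has a header row.

    The column name and encoding defaults are assumptions, not confirmed
    by the project.
    """
    with open(csv_path, newline="", encoding=encoding) as handle:
        return Counter(row[label_column] for row in csv.DictReader(handle))


if __name__ == "__main__":
    absent = missing_files(".")
    if absent:
        print("Missing next to the notebook:", ", ".join(absent))
    else:
        for name in DATA_FILES:
            print(name, dict(label_counts(name)))
```

Run from the folder that holds the notebook, it either names the missing files or prints the number of rows per sentiment class in each file.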

2. IN GOOGLE COLABORATORY

Open the .ipynb file.
Upload both .csv files using the file upload option (usually in the left-hand sidebar).
Ensure the upload has completed.
Execute the cells sequentially.
NOTE: Some models may take time to execute.


