Project author: kenypatel233
Project description:
Multiclass classification task performing sentiment analysis of tweets using NLP
Language: Jupyter Notebook
Project URL: git://github.com/kenypatel233/SentimentAnalysis.git
About the code:
This project tackles a sentiment analysis problem using a dataset from Kaggle:
https://www.kaggle.com/datatattle/covid-19-nlp-text-classification
The main aim is multiclass classification of tweets using NLP.
Both Machine Learning and Deep Learning approaches were explored.
ML models include:
- Multinomial Naive Bayes classifier
- Gradient Boosting classifier
- Random Forest classifier (relatively the best, with training accuracy around 77%)
- As expected, these models performed poorly on the true test data (only 35% accuracy)
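To make the first model in the list concrete, here is a minimal pure-Python sketch of a multinomial Naive Bayes text classifier with add-one smoothing. It is not the notebook's code, which relies on Sklearn; the function names, the whitespace tokenizer, and the toy tweets and labels below are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on whitespace-tokenized text."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    total = len(labels)
    return {
        "alpha": alpha,
        "vocab": vocab,
        "log_prior": {c: math.log(n / total) for c, n in class_counts.items()},
        "word_counts": word_counts,
        "token_totals": {c: sum(word_counts[c].values()) for c in class_counts},
    }


def predict(model, text):
    """Return the label with the highest log posterior for one text."""
    alpha, vocab = model["alpha"], model["vocab"]
    best_label, best_score = None, -math.inf
    for label, log_prior in model["log_prior"].items():
        # Smoothed denominator: tokens seen in this class plus alpha per vocab word
        denom = model["token_totals"][label] + alpha * len(vocab)
        score = log_prior
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            score += math.log((model["word_counts"][label][token] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data (hypothetical, not taken from the Kaggle dataset)
toy_docs = [
    "great helpful store staff",
    "love the fast delivery",
    "terrible empty shelves",
    "awful panic buying",
    "store opens at nine",
    "delivery scheduled today",
]
toy_labels = ["Positive", "Positive", "Negative", "Negative", "Neutral", "Neutral"]

model = train_multinomial_nb(toy_docs, toy_labels)
print(predict(model, "love the great staff"))
print(predict(model, "awful empty shelves"))
```

A model like this only counts words per class, which is one plausible reason such baselines score well on training data and much worse on unseen tweets.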
Deep Learning approach includes:
- A simple RNN model (accuracy around 75%)
- An LSTM-based model (accuracy around 82%)
- A Bidirectional LSTM model (accuracy around 84%, but it suffers from overfitting)
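Recurrent models such as these consume fixed-length sequences of integer ids rather than raw text. The sketch below shows one simple way to build that input in pure Python. It is an assumption about the general preprocessing step, not a copy of the notebook's pipeline; `build_vocab`, `encode`, and the values of `max_words` and `max_len` are hypothetical.

```python
from collections import Counter


def build_vocab(texts, max_words=10000):
    """Map the most frequent words to integer ids; 0 is padding, 1 is unknown."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word: i + 2 for i, (word, _) in enumerate(counts.most_common(max_words))}


def encode(text, vocab, max_len=8):
    """Convert one text to a fixed-length id list.

    Keeps the first `max_len` tokens and pads shorter texts on the left with 0.
    """
    ids = [vocab.get(word, 1) for word in text.lower().split()][:max_len]
    return [0] * (max_len - len(ids)) + ids


# Toy example (made-up tweets)
vocab = build_vocab(["stay home stay safe", "stay calm"])
encoded = encode("stay home", vocab, max_len=4)
unseen = encode("lockdown stay", vocab, max_len=4)
print(encoded)
print(unseen)
```

Every encoded tweet has the same length, so a batch of them can be stacked into a single array and passed to an embedding layer in front of the RNN or LSTM.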
How To Use
This folder contains 3 files:
- Sentiment Analysis.ipynb
- Corona_NLP_train.csv
- Corona_NLP_test.csv
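As a quick way to inspect the two data files outside the notebook, the sketch below reads a CSV and counts the sentiment labels using only the standard library. The column names `OriginalTweet` and `Sentiment` and the `latin-1` encoding are assumptions about the Kaggle files; adjust them if your copy differs. The helper names are hypothetical.

```python
import csv
from collections import Counter


def load_tweets(path, text_col="OriginalTweet", label_col="Sentiment",
                encoding="latin-1"):
    """Read (text, label) pairs from a CSV file that has a header row."""
    with open(path, newline="", encoding=encoding) as handle:
        return [(row[text_col], row[label_col]) for row in csv.DictReader(handle)]


def label_distribution(pairs):
    """Count how many tweets carry each sentiment label."""
    return Counter(label for _, label in pairs)
```

For example, `label_distribution(load_tweets("Corona_NLP_train.csv"))` would show how balanced the classes are before any model is trained.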
—————About the module———————
- Tools used: Jupyter Notebook in an Anaconda environment
- Dependencies: Python 3, TensorFlow version 2.5.0, Keras, nltk
- Libraries used: NumPy, Sklearn, Seaborn, Keras, TensorFlow, Matplotlib, gensim
=====Instructions to run the code======
1. In JUPYTER NOTEBOOK:
- The folder contains the train and test data as .csv files (‘Corona_NLP_train.csv’ and ‘Corona_NLP_test.csv’)
- Download the whole folder and do not change the relative paths of the .ipynb and .csv files
- Run the code cells sequentially
NOTE: Some models may take time to execute
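Before running the cells, it can help to confirm that both CSV files sit next to the notebook. The following is a small sketch of such a check; the helper name `missing_data_files` is hypothetical, and it assumes the notebook reads the files from its working directory.

```python
from pathlib import Path

DATA_FILES = ("Corona_NLP_train.csv", "Corona_NLP_test.csv")


def missing_data_files(folder=".", names=DATA_FILES):
    """Return the expected CSV names that are not present in `folder`."""
    base = Path(folder)
    return [name for name in names if not (base / name).is_file()]


# Check the current working directory (assumed to be the notebook's folder)
missing = missing_data_files()
if missing:
    print("Missing data files:", ", ".join(missing))
else:
    print("Both CSV files found.")
```
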
2. In GOOGLE COLABORATORY:
- Open the .ipynb file
- Upload both .csv files using the file upload option (usually in the left-hand side menu bar)
- Ensure the upload has completed
- Execute the cells sequentially
NOTE: Some models may take time to execute