智慧物流-abusive-text-detection-PROSAGA-码农传奇

Description:
This program predicts whether a given comment contains derogatory or abusive content or not.
Two machine learning algorithms: support vector machines and multinomial naive bayes are
used for this purpose.

Prerequisites:
The program is written in Python 3.6 . The following libraries are required to run this program :

sklearn
numpy
pandas
os
re
csv

Installation:

Pip can be installed by using the following command:
sudo easy_install pip
Scikit-learn can be installed by using the following command:
sudo pip install -U numpy scipy scikit-learn
Pandas can be installed by using the following command:
pip install pandas
Other libraries can be installed by using the pip command in a similar way shown above.

Instructions to run:

The program requires the dataset to be present in the directory which can be found here .
The files needed are train.csv, impermium_verification_labels.csv and
test_with_solutions.csv (Note: Remove the column Usage from the file
test_with_solutions.csv)
To increase the number of training instances, we have merged the files train.csv and
impermium_verification_labels.csv. train.csv can also be used individually.
Create two directories where the python code is located. Name them data and
cleaned_data/data and store the files from the given link in the directory: data.
Run the program by using the following command: python abusive_content_detection.py

Authors:
The authors for this program are:
Prajakta Gaydhani(pag3862), Virtee Parekh(vvp2639), Vaibhav Nagda(vjn4006).