Scientific Paper Summarization
This project is based on the paper "Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status" (Teufel and Moens, 2002).
The main idea of the paper is to classify the sentences of a scientific paper by their rhetorical role, an approach known as argumentative zoning.
This project builds towards automatic summarisation of scientific papers. We aim to classify each sentence of a research paper into one of the 7 rhetorical categories listed below.
Each sentence in the paper is assigned to one of the following categories, taken from the original paper: Aim, Textual, Own, Background, Contrast, Basis, and Other.
On the basis of the above rhetorical categories, we perform argumentative zoning of the sentences in each paper.
We used an existing argumentative zoning dataset, built a feature vector for each sentence, and trained a Naive Bayes classifier on the result. We used an 80/20 train-test split.
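As an illustration of what a per-sentence feature vector might look like, here is a minimal sketch. The feature names, the cue-phrase list, and the `sentence_features` helper are all assumptions made for this example; the project's actual feature set is not listed in this README. The output is a plain dict, a shape that NLTK's `NaiveBayesClassifier` accepts and that scikit-learn's `DictVectorizer` can turn into a matrix.

```python
import re

# Hypothetical cue phrases, chosen only for illustration.
CUE_PHRASES = ("in this paper", "we propose", "in contrast", "based on", "previous work")


def sentence_features(sentence, index, total, title_words):
    """Map one sentence to a dict of simple features (illustrative, not the project's)."""
    lowered = sentence.lower()
    tokens = re.findall(r"[a-z]+", lowered)
    # Relative position in the paper: 0.0 is the first sentence, 1.0 the last.
    position = index / max(total - 1, 1)
    features = {
        "location": "start" if position < 0.2 else "end" if position > 0.8 else "middle",
        "is_long": len(tokens) > 20,
        "has_title_word": any(token in title_words for token in tokens),
        # Rough check for an author-year citation such as "(Smith, 1999)".
        "has_citation": bool(re.search(r"\(\w+[^()]*\d{4}\)", sentence)),
    }
    for cue in CUE_PHRASES:
        features["cue(" + cue + ")"] = cue in lowered
    return features


title_words = {"summarization", "rhetorical"}
example = sentence_features(
    "In this paper we propose a rhetorical classifier (Teufel and Moens, 2002).",
    index=0,
    total=10,
    title_words=title_words,
)
print(example)
```

Each sentence then becomes one `(features, category)` pair for training.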
We used NLTK and scikit-learn to write the classifier. Because scikit-learn provides several Naive Bayes variants, we were able to test our model with multiple distributions.
The dataset was split as follows:

Type | Number of papers |
---|---|
Train Dataset | 64 ( 80 % ) |
Test Dataset | 15 ( 20 % ) |

We used Naive Bayes with the following distributions:

Distribution | Accuracy (%) |
---|---|
Bernoulli | 84.64 |
Gaussian | 100 |
Multinomial | 80.89 |
Complement | 81.28 |
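The comparison above can be sketched with scikit-learn roughly as follows. This is a minimal example on synthetic stand-in data, not the project's `src/naive_bayes.py`: the random feature matrix, the seed, and the variable names are assumptions, so the printed accuracies will not match the table. It also splits individual rows, whereas the split reported above is by paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, ComplementNB, GaussianNB, MultinomialNB

rng = np.random.default_rng(0)

# Synthetic stand-in data: 500 "sentences", 20 binary features, 7 categories.
y = rng.integers(0, 7, size=500)
X = (rng.random((500, 20)) < 0.2).astype(int)
# Make feature column k always fire for category k so there is signal to learn.
X[np.arange(500), y] = 1

# 80/20 split, mirroring the proportions reported above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Bernoulli": BernoulliNB(),
    "Gaussian": GaussianNB(),
    "Multinomial": MultinomialNB(),
    "Complement": ComplementNB(),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # score() returns mean accuracy on the held-out rows.
    scores[name] = model.score(X_test, y_test)
    print(f"{name:12s} {100 * scores[name]:.2f}")
```

All four variants accept the same non-negative feature matrix here, which is what makes swapping distributions a one-line change.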
- To generate a summary of a given file
- $ python src/summary.py {relative_path_of_file_from_summary.py}
- Example:
- $ python src/summary.py ../data/tagged/9405001.az-scixml
- To train, test, and get the accuracy of the sentence classifier
- $ python src/naive_bayes.py
- To run the Flask app locally
- $ sudo apt-get install python-pip
- $ sudo pip install virtualenv
- $ virtualenv -p python venv
- $ source venv/bin/activate
- $ pip install -r requirements.txt
- $ export PORT=5000
- $ gunicorn -b :$PORT --chdir src app:app
- After running the above commands, go to the following URL
- http://0.0.0.0:5000/