Project author: ApoorvTyagi

Project description: Summarize the paragraphs
Primary language: Jupyter Notebook
Repository: git://github.com/ApoorvTyagi/Text-Summariser.git
Created: 2019-04-04T19:40:32Z
Project page: https://github.com/ApoorvTyagi/Text-Summariser

License: not specified



Text-Summariser

Required Modules:
1. Beautiful Soup (bs4)
2. urllib (Python standard library)
3. lxml

We scrape all the paragraphs of a Wikipedia article and build a summary of it from its highest-scoring sentences.

STEPS:

We use the urlopen function from the urllib.request module to fetch (scrape) the article's HTML.
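A minimal sketch of the fetching step. The article URL and the User-Agent value are illustrative assumptions, not taken from the project; the header is set because Wikipedia may reject requests that carry a generic default agent string.

```python
from urllib.request import Request, urlopen

# Hypothetical example URL; substitute the article you want to summarize.
ARTICLE_URL = "https://en.wikipedia.org/wiki/Artificial_intelligence"


def fetch_article(url, timeout=10):
    """Download a page with urllib.request.urlopen and return its raw bytes."""
    # Assumed User-Agent value; some sites refuse urllib's default agent string.
    request = Request(url, headers={"User-Agent": "Text-Summariser-example/0.1"})
    with urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling fetch_article(ARTICLE_URL) returns the page as bytes, which can be handed straight to the parser.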

To parse the data, we create a BeautifulSoup object, passing it the scraped data (the article) and the lxml parser, and then collect the text of the article's paragraphs.
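A sketch of the parsing step, assuming bs4 and lxml are installed. The function name is hypothetical; find_all("p") gathers every paragraph element, which matches the goal of scraping all the paragraphs of the article.

```python
from bs4 import BeautifulSoup


def extract_paragraph_text(article):
    """Parse raw HTML and return the text of all <p> elements as one string."""
    # Build the parse tree with the lxml parser, as described above.
    parsed_article = BeautifulSoup(article, "lxml")
    # Article prose on Wikipedia sits in <p> tags; collect every one of them.
    paragraphs = parsed_article.find_all("p")
    return " ".join(paragraph.text for paragraph in paragraphs)
```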

Remove Square Brackets (the reference markers such as [1] that Wikipedia inserts into the text) and Extra Spaces
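A minimal sketch of this cleanup using the standard re module. It assumes that every bracketed span is a citation marker, so [1], [23] and [citation needed] are all dropped; the function name is hypothetical.

```python
import re


def clean_text(text):
    """Drop bracketed reference markers such as [1] and collapse whitespace."""
    # Assumption: any [...] span is a citation marker ([1], [23], [citation needed]).
    without_brackets = re.sub(r"\[[^\]]*\]", "", text)
    # Replace runs of spaces, tabs and newlines with a single space.
    return re.sub(r"\s+", " ", without_brackets).strip()
```

Deleting the markers outright (rather than replacing them with a space) avoids leaving a stray space before the following punctuation.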

Remove Special Characters and Digits
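A sketch of this step, assuming it produces a separate letters-only copy of the text that is used only for counting words. Stripping punctuation would also erase the full stops needed to find sentence boundaries, so one reasonable design keeps the bracket-free text for sentence splitting. Note that this pattern keeps ASCII letters only, so accented characters are dropped as well.

```python
import re


def letters_only(text):
    """Replace everything except ASCII letters with spaces, then collapse spaces."""
    # Digits, punctuation and other symbols all become spaces.
    formatted = re.sub(r"[^a-zA-Z]", " ", text)
    return re.sub(r"\s+", " ", formatted).strip()
```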

Convert Text to Sentences
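The required modules do not include a sentence tokenizer, so this sketch assumes a simple regular-expression split after ".", "!" or "?". It will also split after abbreviations such as "Dr."; NLTK's sent_tokenize (which needs its punkt data) is a common, more robust alternative.

```python
import re


def split_sentences(text):
    """Split text after '.', '!' or '?' when followed by whitespace."""
    # Naive rule: abbreviations such as "e.g. " or "Dr. " will also trigger a split.
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    return [piece for piece in pieces if piece]
```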

Find the Weighted Frequency of Occurrence of Each Word
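The README does not define "weighted frequency". This sketch assumes a common definition: each word's count divided by the count of the most frequent word, so weights fall between 0 and 1. It also assumes that stop words are ignored; the tiny stop-word set below is hypothetical and only for illustration.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOPWORDS = {"a", "an", "and", "in", "is", "it", "of", "the", "to"}


def weighted_frequencies(letters_only_text):
    """Map each non-stop-word to its count divided by the highest count."""
    words = [w for w in letters_only_text.lower().split() if w not in STOPWORDS]
    counts = Counter(words)
    if not counts:
        return {}
    max_count = max(counts.values())
    # The most frequent word gets weight 1.0; every other word gets a fraction of it.
    return {word: count / max_count for word, count in counts.items()}
```

For example, "The cat sat the cat ran" gives cat a weight of 1.0, and sat and ran 0.5 each.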

Calculate the score of each sentence by adding up the weighted frequencies of the words that occur in it.
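A sketch of sentence scoring, assuming the word weights come from the previous step and that sentences are tokenized the same way (lower-case runs of ASCII letters). The function name is hypothetical.

```python
import re


def score_sentences(sentences, word_weights):
    """Score each sentence as the sum of the weights of the words it contains."""
    scores = {}
    for sentence in sentences:
        # Tokenize like the frequency step: lower-case runs of ASCII letters.
        words = re.findall(r"[a-z]+", sentence.lower())
        # Words without a weight (e.g. stop words) contribute nothing.
        scores[sentence] = sum(word_weights.get(word, 0.0) for word in words)
    return scores
```

Because scores are plain sums, long sentences tend to score higher simply by containing more words; some implementations skip sentences above a word-count limit or divide by sentence length to counter this.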

To summarize the article, we take the N sentences with the highest scores.
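A sketch of the final selection with heapq.nlargest. The default of seven sentences is an arbitrary assumption, not a value from the project.

```python
import heapq


def summarize(sentence_scores, n=7):
    """Join the n highest-scoring sentences into a summary string."""
    # nlargest returns the sentences ordered from highest to lowest score.
    top_sentences = heapq.nlargest(n, sentence_scores, key=sentence_scores.get)
    return " ".join(top_sentences)
```

This orders the summary by score. For a summary that reads more naturally, the selected sentences can instead be re-sorted into their original order in the article before joining.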