Use a Python spider to scrape a website automatically, get clean JSON output, and store the data in MongoDB.
This assignment is from https://github.com/rafikahmed/web-scraping-course.
In this third assignment you will:
- Scrape this website: http://quotes.toscrape.com
- From each page, scrape: the quote, the author, and the tags
- Initiate a new Scrapy project and call it: third_assignment
- Create a new spider class called QuotesToScrapeSpider
- Your spider should be named: quotes
- Build the parse method (a regular callback) and yield all the fields
- Follow the pagination links
- Execute your spider and save the items in a JSON file
Notice that each quote contains the Unicode escape sequences \u201c and \u201d, which refer to the opening and closing curly quotation marks.
For other item exporters, such as CSV, Scrapy uses UTF-8 encoding by default,
which is why CSV files don't show escaped Unicode sequences like \u201c and \u201d the way JSON files do.
Your job is to go through this link from Scrapy's documentation
and figure out how you can force Scrapy to use UTF-8 encoding for the JSON feed as well.
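Try to find the answer in the documentation yourself first. As a hint, Scrapy's feed exports expose an encoding setting; one way to apply it project-wide is in `settings.py`:

```python
# settings.py
# Force the feed exporters (including JSON) to write UTF-8 instead of
# escaping non-ASCII characters like \u201c and \u201d.
FEED_EXPORT_ENCODING = "utf-8"
```

The same setting can also be passed on the command line with `-s FEED_EXPORT_ENCODING=utf-8` for a single run.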
Now it's your job to build it.
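For the MongoDB part of the title, a common approach is a Scrapy item pipeline. Below is a minimal sketch using pymongo, assuming a local MongoDB instance; the URI, database name, collection name, and the `MongoPipeline` class name are all illustrative, not part of the assignment's starter code:

```python
class MongoPipeline:
    """Hypothetical item pipeline that stores each scraped quote in MongoDB."""

    def __init__(self, mongo_uri="mongodb://localhost:27017", mongo_db="quotes_db"):
        # Illustrative defaults; in a real project read these from settings.
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    def open_spider(self, spider):
        # Imported here so the module loads even without pymongo installed.
        import pymongo

        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # Insert a plain dict copy of the item into the "quotes" collection.
        self.db["quotes"].insert_one(dict(item))
        return item
```

To activate it, you would register the pipeline in `settings.py`, e.g. `ITEM_PIPELINES = {"third_assignment.pipelines.MongoPipeline": 300}` (path assumed; adjust it to wherever you define the class).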