Project author: annieqt

Project description:
Use Scrapy to crawl text reviews & images from Dianping.com and generate pretty static pages!
Language: Python
Repository: git://github.com/annieqt/Dianping-Gallery.git
Created: 2017-04-14T06:36:20Z
Project community: https://github.com/annieqt/Dianping-Gallery

License:


Dianping-Gallery

Use Scrapy to crawl text reviews & images from Dianping.com and generate pretty static pages!

Features

1. Crawling text reviews & images from a specific user’s Dianping account and storing them locally

Images

  • Downloaded images are stored under ../imgs/, organized into /user/shop/ subdirectories
  • You can customize the image path by changing IMAGES_STORE in settings.py
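
As a rough illustration of that setting, the fragment below shows how an image store is typically configured in a Scrapy settings.py. It is a sketch under assumptions: the pipeline entry uses Scrapy's built-in ImagesPipeline, whereas this project may register its own subclass to produce the /user/shop/ layout, and the path value simply mirrors the default mentioned above.

```python
# settings.py (sketch; values are illustrative assumptions,
# not necessarily the project's actual settings)

# Enable Scrapy's built-in image pipeline. The project may use its own
# subclass here to sort images into /user/shop/ subdirectories.
ITEM_PIPELINES = {
    "scrapy.pipelines.images.ImagesPipeline": 1,
}

# Root directory for downloaded images. Change this value to store
# the images somewhere else.
IMAGES_STORE = "../imgs"
```

A relative IMAGES_STORE is resolved against the directory the crawl is started from, so an absolute path may be the safer choice.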

Text Reviews

  • Text reviews are exported as JSON to review.json
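
Once the crawl has finished, the exported file can be inspected with a few lines of Python. The sketch below assumes review.json holds a JSON array of objects, which is what Scrapy's JSON feed export writes. The field name "shop" is hypothetical, since the actual item fields are not listed here.

```python
import json
from collections import Counter


def load_reviews(path="review.json"):
    """Load the exported reviews (assumed to be a JSON array of objects)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_by_field(reviews, field="shop"):
    """Count reviews per value of `field`.

    "shop" is a hypothetical field name; replace it with a key that
    actually appears in your review.json.
    """
    return Counter(review.get(field, "<missing>") for review in reviews)


# Example usage, once review.json exists:
#   reviews = load_reviews("review.json")
#   print(count_by_field(reviews).most_common(5))
```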

2. Generating pretty static pages from the crawled data to visualize the user’s food preferences!

To be done.

How to use

1. Dependencies

2. Configurations

  • Set start_urls in dianping_spider.py to the URL of the review page that you want to crawl (for example, the author's own Dianping reviews page)
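
For orientation, a spider's start_urls is a class attribute holding a list of URLs. The fragment below is only a sketch: the class name, spider name, and URL pattern are placeholders rather than the project's actual values, so copy the real URL from your browser.

```python
import scrapy


class DianpingSpider(scrapy.Spider):  # hypothetical class name
    name = "dianping"  # hypothetical spider name

    # Replace with the URL of the review page you want to crawl.
    # The pattern below is an assumed placeholder, not a verified URL.
    start_urls = [
        "http://www.dianping.com/member/<member-id>/reviews",
    ]

    def parse(self, response):
        # The project's own parsing logic lives here; omitted in this sketch.
        pass
```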

3. Run

Under ../Dianping-Gallery/dianping_gallery/dianping_gallery/spiders, run:
scrapy runspider dianping_spider.py -o review.json

Download progress is then shown in the terminal.