Project author: l-portet

Project description:
Data scraper for the French yellow pages (Pages Jaunes)
Language: JavaScript
Repository: git://github.com/l-portet/yellow-scraper.git
Created: 2019-05-26T09:54:18Z
Project page: https://github.com/l-portet/yellow-scraper



yellow-scraper

Scrape the French yellow pages (Pages Jaunes) with Puppeteer

:warning: MAY BE DEPRECATED: The Pages Jaunes pages and data structure may change, and this scraper won’t be updated automatically to follow them.

Installation

    npm install

Usage

Set up the config.js file

Sample config

    module.exports = {
      query: {
        keyword: 'luthier',
        location: 'Rennes'
      }, // Will search all 'luthier' businesses in 'Rennes'
      headless: true, // Use Chrome in headless mode
      userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
      acceptLanguage: 'fr-FR,fr;q=0.9,en-US;q=0.8,en;q=0.7,la;q=0.6',
      outputFilename: 'output',
      outputFormat: 'csv', // Supported formats: 'json', 'csv'
      maxResults: -1, // -1 => all results, or N => at most N results (the scraper stops once the limit is reached)
      puppeteerArgs: [], // Additional args for Puppeteer (a proxy, for example)
      baseURL: 'https://www.pagesjaunes.fr', // Only target this domain if you have the proper rights
      safeMode: true // Safe mode sets a delay between each query
    };
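
To make a few of these options more concrete, here is a minimal sketch of how `headless`, `puppeteerArgs` and `maxResults` could be interpreted. The helper names `buildLaunchOptions` and `limitReached` are hypothetical and are not taken from the yellow-scraper source, and the proxy address is only a placeholder. The `headless` and `args` keys are the ones `puppeteer.launch()` accepts, and `--proxy-server` is a standard Chromium flag.

```javascript
// Hypothetical helpers: illustrative only, not the scraper's actual code.

// Sample values mirroring the config above, with a proxy added as an example.
const config = {
  headless: true,
  maxResults: 50,
  puppeteerArgs: ['--proxy-server=http://127.0.0.1:8080'], // assumed local proxy address
};

// Build an options object that could be passed to puppeteer.launch().
function buildLaunchOptions(cfg) {
  return {
    headless: cfg.headless !== false, // default to headless when unset
    args: Array.isArray(cfg.puppeteerArgs) ? [...cfg.puppeteerArgs] : [],
  };
}

// Return true once `count` collected results reach the configured limit.
// A maxResults of -1 means "no limit".
function limitReached(count, maxResults) {
  return maxResults !== -1 && count >= maxResults;
}

console.log(buildLaunchOptions(config));
console.log(limitReached(50, config.maxResults)); // true
console.log(limitReached(50, -1)); // false
```

With a shape like this, the scraping loop would check `limitReached` after each result and stop as soon as it returns true.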

Run the scraper

    npm start
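
The results are written to the file named by `outputFilename`, in the format chosen with `outputFormat`. As a rough illustration of what the two supported formats involve, the sketch below serializes a list of records to JSON or CSV. The record fields (`name`, `phone`) and the functions `escapeCsvField` and `serialize` are assumptions made for this example; the fields actually produced by the scraper may differ.

```javascript
// Hypothetical record shape: the real scraper's fields may differ.
const results = [
  { name: 'Atelier Exemple', phone: '02 00 00 00 00' },
  { name: 'Lutherie "Demo", Rennes', phone: '' },
];

// Quote a CSV field when it contains a comma, a quote, or a line break,
// doubling any embedded quotes (RFC 4180 style).
function escapeCsvField(value) {
  const text = String(value ?? '');
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Serialize records to 'json' or 'csv', the two formats listed in the config.
function serialize(records, format) {
  if (format === 'json') return JSON.stringify(records, null, 2);
  if (format !== 'csv') throw new Error(`Unsupported format: ${format}`);
  if (records.length === 0) return '';
  const headers = Object.keys(records[0]);
  const rows = records.map((r) => headers.map((h) => escapeCsvField(r[h])).join(','));
  return [headers.join(','), ...rows].join('\n');
}

console.log(serialize(results, 'csv'));
```

The CSV branch takes its header row from the keys of the first record, so it assumes every record has the same fields.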

Todo

Export as Excel format (xls)

Issues

If you find an issue, feel free to contact me or open an issue on GitHub. You can also contribute by creating a pull request.

Disclaimer

I can’t be held responsible for any abusive use of this software or for any problem it causes. Make sure you have the proper rights before you run it.