Project author: Mediashare

Project description:
:dizzy: Crawl URLs from a webpage and provide a DomCrawler with Scraper Library
Language: PHP
Repository: git://github.com/Mediashare/crawler.git
Created: 2019-12-22T15:17:26Z
Project community: https://github.com/Mediashare/crawler

License: MIT License


Crawler

:dizzy: Crawl URLs from a web page and get a DomCrawler for each page through the Scraper library.

DomCrawler

The Scraper uses the DomCrawler library, a Symfony component for navigating the DOM of HTML and XML documents. See the Symfony DomCrawler documentation for details.
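
As a rough illustration of what DOM navigation with DomCrawler looks like, here is a minimal sketch that uses the Symfony component directly. How mediashare/crawler exposes its DomCrawler instance is not covered here, so the HTML fragment, the variable names and the `DomCrawler` alias are assumptions made for the example. It also assumes `symfony/dom-crawler` is installed through Composer.

```php
<?php
require 'vendor/autoload.php'; // assumes symfony/dom-crawler is installed via Composer

use Symfony\Component\DomCrawler\Crawler as DomCrawler;

// Hypothetical HTML fragment, used only for illustration.
$html = <<<'HTML'
<html>
  <body>
    <h1>Example page</h1>
    <a href="/docs">Docs</a>
    <a href="https://example.com/">External</a>
  </body>
</html>
HTML;

$dom = new DomCrawler($html);

// XPath queries work out of the box; CSS selectors through filter()
// additionally require the symfony/css-selector package.
$title = $dom->filterXPath('//h1')->text();

// Collect the href attribute of every link in the document.
$links = $dom->filterXPath('//a')->each(
    function (DomCrawler $node) {
        return $node->attr('href');
    }
);

echo $title, PHP_EOL;
print_r($links);
```

The alias avoids a name clash with `Mediashare\Crawler\Crawler` when both classes are used in the same file.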

Installation

    composer require mediashare/crawler

Usage

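    <?php
    require 'vendor/autoload.php';

    use Mediashare\Crawler\Crawler;

    // Crawl the given URL with the default configuration.
    $crawler = new Crawler("https://mediashare.fr");
    $crawler->run();

    // Inspect the result; dump() is provided by symfony/var-dumper.
    dump($crawler);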

With Config

    <?php
    require 'vendor/autoload.php';

    use Mediashare\Crawler\Crawler;
    use Mediashare\Crawler\Config;

    $config = new Config();
    $config->setWebspider(true); // Crawl the whole website
    $config->setVerbose(true); // Show a progress bar
    $config->setPathRequires(['/Kernel/']); // Only crawl paths matching these entries
    $config->setPathExceptions(['/CodeSnippet/']); // Skip paths matching these entries

    $crawler = new Crawler("https://mediashare.fr", $config);
    $crawler->run();
    dump($crawler);
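
To make the interaction between `setPathRequires` and `setPathExceptions` more concrete, the sketch below shows one way such filters could be combined. It is not the library's implementation: `shouldCrawl` is a hypothetical helper, the sample URLs are invented, and it assumes that entries such as `'/Kernel/'` are treated as PCRE patterns, which the library may or may not do.

```php
<?php
/**
 * Hypothetical helper, not part of mediashare/crawler: decides whether a URL
 * should be crawled, assuming both lists contain PCRE patterns.
 */
function shouldCrawl(string $url, array $pathRequires, array $pathExceptions): bool
{
    // Any matching exception rejects the URL.
    foreach ($pathExceptions as $pattern) {
        if (preg_match($pattern, $url) === 1) {
            return false;
        }
    }

    // With no required patterns, every remaining URL is accepted.
    if ($pathRequires === []) {
        return true;
    }

    // Otherwise at least one required pattern has to match.
    foreach ($pathRequires as $pattern) {
        if (preg_match($pattern, $url) === 1) {
            return true;
        }
    }

    return false;
}

$requires = ['/Kernel/'];
$exceptions = ['/CodeSnippet/'];

// Invented URLs, for illustration only.
var_dump(shouldCrawl('https://mediashare.fr/Kernel/Config', $requires, $exceptions));
var_dump(shouldCrawl('https://mediashare.fr/Kernel/CodeSnippet/1', $requires, $exceptions));
var_dump(shouldCrawl('https://mediashare.fr/Blog', $requires, $exceptions));
```

In this sketch exceptions are checked first, so a URL that matches both lists is skipped. Whether the library applies the same precedence is an open assumption.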