我找到了解决这个问题的方法!下面说明我的做法。
我是通过像这样处理蜘蛛(spider)的请求错误来实现的:
import time

import scrapy  # fix: original code used scrapy.Spider / scrapy.Request without importing scrapy


class mySpider(scrapy.Spider):
    """Spider that retries a failed request after a fixed delay.

    Every request is issued with ``handle_error`` as its errback; on
    failure the spider logs the error, waits 60 seconds, and re-issues
    the same URL.
    """

    name = "myspider"
    allowed_domains = ["google.com"]
    start_urls = [
        "http://www.google.com",
    ]

    def handle_error(self, failure):
        """Errback: log the failure, wait 60 s, then retry the request.

        :param failure: the twisted ``Failure`` passed to Scrapy errbacks;
            only ``failure.request`` is read here (for logging).

        NOTE(review): ``time.sleep(60)`` blocks the Twisted reactor, so the
        entire crawler stalls for the full minute. Consider Scrapy's
        RETRY_TIMES / RETRY_HTTP_CODES settings or a deferred-based delay
        instead — left as-is to preserve the author's behavior.
        """
        self.log("Error Handle: %s" % failure.request)
        self.log("Sleeping 60 seconds")
        time.sleep(60)
        url = 'http://www.google.com'
        # dont_filter=True lets the retried URL pass the duplicate filter,
        # which would otherwise drop a request for an already-seen URL.
        yield scrapy.Request(url, self.parse, errback=self.handle_error,
                             dont_filter=True)

    def start_requests(self):
        """Issue the initial request, wiring handle_error as its errback."""
        url = 'http://www.google.com'
        yield scrapy.Request(url, self.parse, errback=self.handle_error)
关键点在于重试请求时传入的 `dont_filter=True` 参数(绕过去重过滤器,否则重复 URL 会被丢弃),以及 `errback=self.handle_error` 回调——请求失败时 Scrapy 会调用 `handle_error` 进行重试。