Robots.txt - What is the proper format for a Crawl-delay for multiple user agents?

Author: 狗头军师
Published: 2024-01-02 10:07:34 (3 months ago)

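Below is a sample robots.txt file that allows several user agents and gives each one its own crawl delay. The Crawl-delay values are for illustration only and will be different in a real robots.txt file. I have searched the web for a proper answer but could not find one. There are too many conflicting suggestions, and I do not know which method is correct.

Questions:
(1) Can each user agent have its own Crawl-delay? (I assume yes.)
(2) Where does the Crawl-delay line for each user agent go: before or after the Allow / Disallow lines?
(3) Does there have to be a blank line between user agent groups?

References:
<a href="http://www.seopt.com/2013/01/robots-text-file/">http://www.seopt.com/2013/01/robots-text-file/</a>
<a href="http://help.yandex.com/webmaster/?id=1113851#1113858">http://help.yandex.com/webmaster/?id=1113851#1113858</a>

Essentially, I want to find out how the final robots.txt file should look using the values in the sample below. Thanks in advance.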
<pre><code># Allow only major search spiders 
User-agent: Mediapartners-Google
Disallow:
Crawl-delay: 11

User-agent: Googlebot
Disallow:
Crawl-delay: 12

User-agent: Adsbot-Google
Disallow:
Crawl-delay: 13

User-agent: Googlebot-Image
Disallow:
Crawl-delay: 14

User-agent: Googlebot-Mobile
Disallow:
Crawl-delay: 15

User-agent: MSNBot
Disallow:
Crawl-delay: 16

User-agent: bingbot
Disallow:
Crawl-delay: 17

User-agent: Slurp
Disallow:
Crawl-delay: 18

User-agent: Yahoo! Slurp
Disallow:
Crawl-delay: 19

# Block all other spiders
User-agent: *
Disallow: /

# Block Directories for all spiders
User-agent: *
Disallow: /ads/
Disallow: /cgi-bin/
Disallow: /scripts/
</code></pre>
(4) If I want to give every user agent a crawl delay of 10 seconds, is the following correct?
<pre><code># Allow only major search spiders
User-agent: *
Crawl-delay: 10

User-agent: Mediapartners-Google
Disallow:

User-agent: Googlebot
Disallow:

User-agent: Adsbot-Google
Disallow:

User-agent: Googlebot-Image
Disallow:

User-agent: Googlebot-Mobile
Disallow:

User-agent: MSNBot
Disallow:

User-agent: bingbot
Disallow:

User-agent: Slurp
Disallow:

User-agent: Yahoo! Slurp
Disallow:

# Block all other spiders
User-agent: *
Disallow: /

# Block Directories for all spiders
User-agent: *
Disallow: /ads/
Disallow: /cgi-bin/
Disallow: /scripts/
</code></pre>
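One way to sanity-check files like the two samples above is to feed them to a parser and see which delay each crawler would get. The following is a minimal sketch using Python's standard-library `urllib.robotparser`. The samples are shortened, and `SomeOtherBot` and `https://example.com/page` are placeholders that are not part of the question. It only shows how this one parser reads the file. Crawl-delay is a non-standard directive, so real crawlers may treat it differently or ignore it.

```python
from urllib.robotparser import RobotFileParser

# Shortened version of the first sample: one group per crawler, groups
# separated by blank lines, Crawl-delay placed after the Disallow line.
PER_AGENT = """\
User-agent: Googlebot
Disallow:
Crawl-delay: 12

User-agent: bingbot
Disallow:
Crawl-delay: 17

User-agent: *
Disallow: /
"""

# Shortened version of the second sample: the delay appears only in the
# "*" group, and the named group has no Crawl-delay of its own.
GLOBAL_DELAY = """\
User-agent: *
Crawl-delay: 10

User-agent: Googlebot
Disallow:
"""


def load(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching a URL."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


per_agent = load(PER_AGENT)
global_delay = load(GLOBAL_DELAY)

# Each named group carries its own delay in this parser.
print("Googlebot delay:", per_agent.crawl_delay("Googlebot"))
print("bingbot delay:", per_agent.crawl_delay("bingbot"))

# An empty Disallow allows everything; unlisted bots fall through to "*".
print("Googlebot allowed:", per_agent.can_fetch("Googlebot", "https://example.com/page"))
print("SomeOtherBot allowed:", per_agent.can_fetch("SomeOtherBot", "https://example.com/page"))

# A named group does not pick up the delay from the "*" group here.
print("Googlebot delay (delay only under *):", global_delay.crawl_delay("Googlebot"))
```

With this parser, a crawler that matches a named group uses only that group, so a Crawl-delay placed solely under `User-agent: *` is not applied to it. The parser also matches user-agent names by substring and takes the first matching group, so with the full sample a `Googlebot-Image` lookup would hit the earlier `Googlebot` group. It keeps only the first `User-agent: *` group as well, which is one reason to merge the two `*` groups in the samples into one.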