Google has open-sourced the code it uses to parse crawler rules. It looks like they found that many robots.txt files are written by hand, so a dedicated chunk of code exists just to cope with the various misspellings of "user-agent", "disallow", and "sitemap"… https://github.com/google/robotstxt/blob/59f3643d3a3ac88f613326dd4dfc8c9b9a545e45/robots.cc#L680-L704
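
For illustration, here is a minimal C++ sketch of what that kind of typo tolerance can look like: case-insensitive prefix matching against a small list of accepted spellings for each directive key. The helper names (`StartsWithIgnoreCase`, `KeyIsUserAgent`, `KeyIsDisallow`) and the particular misspellings are my own illustrative choices, not the exact contents of the linked robots.cc.

```cpp
// Minimal sketch of typo-tolerant robots.txt key matching. This is NOT the
// verbatim code from google/robotstxt; the helper names and the list of
// misspellings below are illustrative only.
#include <algorithm>
#include <cctype>
#include <initializer_list>
#include <iostream>
#include <string>

// Case-insensitive check: does `text` start with `prefix`?
static bool StartsWithIgnoreCase(const std::string& text, const std::string& prefix) {
  if (prefix.size() > text.size()) return false;
  return std::equal(prefix.begin(), prefix.end(), text.begin(),
                    [](unsigned char a, unsigned char b) {
                      return std::tolower(a) == std::tolower(b);
                    });
}

// True if `key` starts with any of the accepted spellings.
static bool MatchesAny(const std::string& key,
                       std::initializer_list<const char*> spellings) {
  for (const char* s : spellings) {
    if (StartsWithIgnoreCase(key, s)) return true;
  }
  return false;
}

// Accept the canonical "user-agent" plus common hand-written variants.
static bool KeyIsUserAgent(const std::string& key) {
  return MatchesAny(key, {"user-agent", "useragent", "user agent"});
}

// Accept "disallow" plus a few plausible misspellings.
static bool KeyIsDisallow(const std::string& key) {
  return MatchesAny(key, {"disallow", "dissallow", "dissalow", "disalow"});
}

int main() {
  std::cout << KeyIsUserAgent("User Agent: Googlebot") << "\n";  // prints 1
  std::cout << KeyIsDisallow("Dissallow: /private") << "\n";     // prints 1
  std::cout << KeyIsDisallow("Allow: /public") << "\n";          // prints 0
}
```

The design point is simply that the parser is forgiving at the key-matching step, so hand-written files with near-miss directive names still get interpreted rather than silently ignored.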

Please be advised that this post was written or last updated a while ago and may therefore contain outdated information or opinions I no longer hold.