Googlebot hits each page twice, presumably to cope with dynamic content. I understand the reasoning, but it would be nice if that “feature” were disabled once the ‘bot notices the content-length is identical across the repeated requests.
Surely, we expect robots.txt to be static?
216.239.46.226 - - [25/May/2002:21:20:45 -0700] "GET /robots.txt HTTP/1.0" 200 0 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
216.239.46.226 - - [25/May/2002:21:20:45 -0700] "GET /robots.txt HTTP/1.0" 200 0 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
216.239.46.226 - - [25/May/2002:21:20:45 -0700] "GET /movabletype/archives/000057.html HTTP/1.0" 200 6115 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
216.239.46.226 - - [25/May/2002:21:20:45 -0700] "GET /movabletype/archives/000057.html HTTP/1.0" 200 6115 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
216.239.46.220 - - [25/May/2002:21:23:41 -0700] "GET /movabletype/archives/000004.html HTTP/1.0" 200 5388 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
216.239.46.220 - - [25/May/2002:21:23:41 -0700] "GET /movabletype/archives/000004.html HTTP/1.0" 200 5388 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
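Counting the doubled hits is easy enough, since each pair of log lines is byte-for-byte identical. Here’s a quick Python sketch that reads an access log on stdin; it assumes the duplicate arrives immediately after the original, which is what my log shows, but isn’t guaranteed in general:

import sys

# Flag doubled Googlebot requests in an Apache combined log read from
# stdin: two identical consecutive lines (same IP, timestamp, request,
# and byte count) mean the hit came in twice.
prev = None
total = 0
dupes = 0
for line in sys.stdin:
    if "Googlebot" not in line:
        prev = None
        continue
    total += 1
    if line == prev:
        dupes += 1
    prev = line
print(f"{dupes} of {total} Googlebot hits were exact repeats")

Run it as python dupes.py < access_log (the script name is whatever you like, of course).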
More than half my hits today are from Googlebot, but once I factor in that each of its hits is a duplicate, its real share is only about a third.
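The arithmetic, as a quick sketch (the hit count is made up; only the proportions matter):

# Rough check of the "half becomes a third" claim, assuming Googlebot
# is exactly half of today's hits and every one of its hits is doubled.
total = 1000                        # hypothetical hit count for the day
googlebot = total // 2              # 500 raw Googlebot hits
duplicates = googlebot // 2         # 250 of those are the repeat copies
deduped_total = total - duplicates  # 750 hits once repeats are dropped
share = (googlebot - duplicates) / deduped_total
print(share)                        # 0.333... -- one third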