I recently found this in the web server logs of one of the websites I look after:
22.214.171.124 - - [26/Jan/2010:05:01:44 -0800] "GET /application/json HTTP/1.1" 404 763 "-" "panscient.com"
126.96.36.199 - - [26/Jan/2010:05:01:47 -0800] "GET /following-sibling::* HTTP/1.1" 404 763 "-" "panscient.com"
188.8.131.52 - - [26/Jan/2010:05:01:55 -0800] "GET /AppleWebKit/ HTTP/1.1" 404 763 "-" "panscient.com"
184.108.40.206 - - [26/Jan/2010:05:01:58 -0800] "GET /following-sibling::* HTTP/1.1" 404 763 "-" "panscient.com"
In case you are not familiar with web server log files, these lines mean that someone or something at IP address 220.127.116.11 requested the pages named after “GET” on the website, for example a page named “following-sibling::*”.
Does it need to be said that no such pages exist (that’s what the “404” in each log line means)?
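For readers who want to pick such requests out of their own logs, here is a minimal sketch in Python. It assumes the lines follow the Apache "combined" log format, which the excerpt above appears to use. The names `LOG_LINE`, `parse_line` and `missing_pages` are made up for this illustration, and the sample lines are copied from the excerpt.

```python
import re

# Regular expression for one line in the Apache "combined" log format
# (assumed here because the excerpt matches it).
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def missing_pages(lines, agent):
    """Collect the paths that the given user agent requested and that returned 404."""
    paths = []
    for line in lines:
        entry = parse_line(line)
        if entry and entry["status"] == "404" and entry["agent"] == agent:
            paths.append(entry["path"])
    return paths


# Two of the lines from the excerpt above.
sample = [
    '22.214.171.124 - - [26/Jan/2010:05:01:44 -0800] "GET /application/json HTTP/1.1" 404 763 "-" "panscient.com"',
    '126.96.36.199 - - [26/Jan/2010:05:01:47 -0800] "GET /following-sibling::* HTTP/1.1" 404 763 "-" "panscient.com"',
]

print(missing_pages(sample, "panscient.com"))
# prints ['/application/json', '/following-sibling::*']
```

Matching on the exact user agent string keeps the sketch short; a real log would probably call for a looser match and for reading the lines from a file.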
When I saw this I was rather puzzled and looked up
Why is your web crawler trying to access pages that don’t exist on my website?
(Emphasis mine) Oh ok. They are looking into
Looks like a pretty competitive business when people start grasping at straws like this. Also, I take it bandwidth is easier to come by than crawling software that avoids such silly attempts.