Stephan Wehner
Blog and Homepage

January 27, 2010

webcrawlers desperate for content

Filed under: internet,programming — sw @ 9:42 am

I recently found this in the web server logs of one of the websites I look after:

38.100.8.50 - - [26/Jan/2010:05:01:44 -0800] "GET /application/json HTTP/1.1" 404 763 "-" "panscient.com"
38.100.8.50 - - [26/Jan/2010:05:01:47 -0800] "GET /following-sibling::* HTTP/1.1" 404 763 "-" "panscient.com"
38.100.8.50 - - [26/Jan/2010:05:01:55 -0800] "GET /AppleWebKit/ HTTP/1.1" 404 763 "-" "panscient.com"
38.100.8.50 - - [26/Jan/2010:05:01:58 -0800] "GET /following-sibling::* HTTP/1.1" 404 763 "-" "panscient.com"

In case you are not familiar with web server log files, these lines mean that someone or something at IP address 38.100.8.50 requested the pages named after “GET” on the website, for example a page named “following-sibling::*”.

Does it need to be said that no such pages exist (that’s what the “404” indicates)?
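For anyone who wants to pick such lines apart programmatically, here is a minimal sketch in Python. It assumes the lines follow the common Apache "combined" log layout, which they appear to. The names `LOG_PATTERN` and `parse_line` are made up for this illustration.

```python
import re

# Assumed layout (Apache "combined" format):
# ip ident user [time] "method path protocol" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])  # 404 means "page not found"
    return entry


line = ('38.100.8.50 - - [26/Jan/2010:05:01:47 -0800] '
        '"GET /following-sibling::* HTTP/1.1" 404 763 "-" "panscient.com"')
entry = parse_line(line)
print(entry["ip"], entry["path"], entry["status"], entry["agent"])
```

The requested page ends up in `path`, the 404 in `status`, and the crawler's self-reported name in `agent`.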

When I saw this I was rather puzzled, and looked up panscient.com (the last item on each line). Their home page says they provide some kind of vertical search service, whatever that is. On their FAQ page, I found this:

Why is your web crawler trying to access pages that don’t exist on my website?

Our web crawler attempts to extract links to valid web pages from javascript and other scripting languages. The crawler may misinterpret the information in these scripts and request a page that does not actually exist. These requests are attempts to retrieve valid web content, and are not an attempt to circumvent your webserver security.

(Emphasis mine) Oh, OK. They look into the JavaScript files on the website and try to extract names of pages that might have content for the “vertical search”, without success in this case. As a web developer, I can tell you that JavaScript files very rarely contain interesting links to web pages.
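To illustrate how such requests could come about, here is a purely hypothetical sketch in Python. I do not know how the panscient.com crawler actually works. The heuristic (treat every whitespace-free string literal in a script as a page name), the function `guess_paths`, and the sample script are all assumptions made up for this example.

```python
import re

# Hypothetical heuristic: any double-quoted string without whitespace
# is taken to be the name of a page on the site.
STRING_LITERAL = re.compile(r'"([^"\s]+)"')


def guess_paths(script):
    """Return the site-relative paths a naive crawler might request."""
    return ["/" + literal for literal in STRING_LITERAL.findall(script)]


# Made-up script containing the kind of strings seen in the log lines above.
sample_script = '''
var mime = "application/json";
var next = xpath(node, "following-sibling::*");
var isWebKit = ua.indexOf("AppleWebKit/") > -1;
'''

for path in guess_paths(sample_script):
    print("GET", path)
```

None of these strings is a link: one is a MIME type, one an XPath expression, and one a piece of a browser's user-agent string. A heuristic this crude would request all three anyway.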

Looks like a pretty competitive business when people start grasping at straws like this. Also, I take it bandwidth is easier to come by than crawling software that avoids such silly attempts.

January 2, 2010

the police: competence and responsibility

Filed under: bc — sw @ 12:25 pm

Today there was news about Danish cartoonist Kurt Westergaard being attacked in his home. I don’t want to talk about the details which you can find covered at

In short, a man broke into his home, Kurt Westergaard hid in a special “panic room”, and called police. The police came and managed to arrest the invader.

What I want to highlight is that

  • the attacker threw an axe at one of the police officers,
  • but, the police officers did not kill the attacker.

They managed to arrest the man after shooting him in the arm and leg.

It’s hard to tell whether the police were simply lucky with this successful outcome or not.

For now, their actions look much more competent and responsible than those of the police officers on this continent, who use Tasers on young and old, pregnant women, and people with mental problems, and who, for example, killed Robert Dziekański. The explanation or excuse for using the Taser in Robert Dziekański’s case was that he was “armed” with a stapler – compare that to a flying axe.
