My thrackle.org website is alive again. It’s about a nice math problem that I worked on 10 – 18 years ago.
March 2, 2010
January 27, 2010
webcrawlers desperate for content
I recently found this in the web server logs of one of the websites I look after:
38.100.8.50 - - [26/Jan/2010:05:01:44 -0800] "GET /application/json HTTP/1.1" 404 763 "-" "panscient.com"
38.100.8.50 - - [26/Jan/2010:05:01:47 -0800] "GET /following-sibling::* HTTP/1.1" 404 763 "-" "panscient.com"
38.100.8.50 - - [26/Jan/2010:05:01:55 -0800] "GET /AppleWebKit/ HTTP/1.1" 404 763 "-" "panscient.com"
38.100.8.50 - - [26/Jan/2010:05:01:58 -0800] "GET /following-sibling::* HTTP/1.1" 404 763 "-" "panscient.com"
In case you are not familiar with web server log files, these lines mean that someone or something at IP address 38.100.8.50 requested the pages named after “GET” on the website, for example a page named “following-sibling::*”.
Does it need to be said that no such pages exist (that’s what the “404” indicates)?
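For readers who want to pick such lines apart programmatically, here is a minimal sketch in Python. It assumes the log uses the common Apache "combined" layout, which the lines above appear to follow; the regular expression and the field names (`ip`, `path`, `status`, `agent` and so on) are my own labels, not anything taken from the server configuration.

```python
import re

# Assumed layout: Apache "combined" log format. Field names are my own labels.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

sample = ('38.100.8.50 - - [26/Jan/2010:05:01:47 -0800] '
          '"GET /following-sibling::* HTTP/1.1" 404 763 "-" "panscient.com"')

entry = parse_line(sample)
print(entry["ip"], entry["path"], entry["status"], entry["agent"])
# prints: 38.100.8.50 /following-sibling::* 404 panscient.com
```

The status field is where the “404” shows up, and the last quoted field is the user agent the crawler announced itself with.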
When I saw this I was rather puzzled, and looked up panscient.com (the last item on each line). Their home page says they provide some kind of vertical search service, whatever that is. On their FAQ page, I found this:
Why is your web crawler trying to access pages that don’t exist on my website?
Our web crawler attempts to extract links to valid web pages from javascript and other scripting languages. The crawler may misinterpret the information in these scripts and request a page that does not actually exist. These requests are attempts to retrieve valid web content, and are not an attempt to circumvent your webserver security.
(Emphasis mine) Oh ok. They are looking into javascript files on the web site and attempting to extract names of pages that might have content for the “vertical search”, without success in this case. As a web developer, I can tell you that javascript files very rarely contain interesting links to web pages.
Looks like a pretty competitive business when people start grasping at straws like this. Also I take it bandwidth is easier to come by than crawling software that avoids such silly attempts.
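To illustrate how a crawler could end up requesting such pages, here is a purely hypothetical sketch in Python. I have no idea how panscient's crawler actually works; the heuristic below (treat any quoted string without spaces that contains “/”, “.” or “:” as a relative link) and the sample script are my own inventions, chosen only because they reproduce the three paths from the log.

```python
import re

# Hypothetical heuristic, NOT panscient's actual method: any quoted string
# with no whitespace that contains "/", "." or ":" is taken to be a link.
CANDIDATE = re.compile(r'''["']([^"'\s]*[/.:][^"'\s]*)["']''')

def guess_paths(script_text):
    """Return the request paths a naive crawler might derive from a script."""
    return ["/" + s.lstrip("/") for s in CANDIDATE.findall(script_text)]

# Made-up javascript of the kind found on many sites: a MIME type,
# an XPath expression and a browser check. None of these are links.
script = '''
xhr.setRequestHeader("Accept", "application/json");
var nodes = xpath("following-sibling::*");
if (navigator.userAgent.indexOf("AppleWebKit/") > -1) { init(); }
'''

for path in guess_paths(script):
    print("GET", path)
# prints:
# GET /application/json
# GET /following-sibling::*
# GET /AppleWebKit/
```

Every string the heuristic picks up here is something other than a link, which is about what I would expect from real javascript files.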
December 27, 2009
truste.org ssl certificate problems
Today, a little note about a problem with https that I ran into with https://www.truste.org
When visiting that site my Firefox (Version 3.0) warned me that
Secure Connection Failed
www.truste.org uses an invalid security certificate.
The certificate is only valid for *.truste.com
Visiting https://www.truste.com instead simply timed out: “The server at www.truste.com is taking too long to respond.”
Looks like they didn’t configure their web server properly. A bit odd, since they describe themselves “as the leading internet privacy services provider.”
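Why the browser complains can be shown with a simplified sketch of wildcard hostname matching, written in Python. This is not Firefox's actual code, and the real rules have more cases; the function name `matches` is my own. The assumption here is only that a leading “*” stands for exactly one label and everything else must match literally.

```python
def matches(pattern, hostname):
    """Simplified check of a hostname against a certificate name.

    Assumption: a "*" is allowed only as the whole leftmost label and
    stands for exactly one label. Real browsers apply further rules.
    """
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False
    # Everything after the leftmost label must be identical.
    if p_labels[1:] != h_labels[1:]:
        return False
    return p_labels[0] == "*" or p_labels[0] == h_labels[0]

print(matches("*.truste.com", "www.truste.org"))  # False: .org is not .com
print(matches("*.truste.com", "www.truste.com"))  # True
```

So the certificate would have been fine for www.truste.com, had that host answered.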
May 7, 2009
slashdot down
Website administrators fear the slashdot effect (“slashdotting” / “being slashdotted”) — now slashdot.org, “News for nerds. Stuff that matters.”, is down itself. Here is a screen shot:

Unclear what “Guru Meditation” refers to, but in case you’re wondering, the Varnish link generated by the slashdot web server goes to http://www.varnish-cache.org. Which takes you to http://varnish.projects.linpro.no, which says,
Welcome to the Varnish project
Varnish is a state-of-the-art, high-performance HTTP accelerator
(The slashdot site was working again an hour later)
April 23, 2009
on github
Joined github today, you can look up my (future) public software at
Added a little project which should make Rails development a little easier when it comes to working with the database directly. For now only for mysql. See my_sql.rb under http://github.com/stephanwehner/railsgoodies.
Thanks to my friend Sam for encouraging me.
January 31, 2009
google broken
This morning Google’s search results don’t work.
Let’s say you search Google for water. Then:
Each result has a warning under its link “This site may harm your computer”:

Clicking on the link doesn’t take your browser to the page as usual, but brings up an error message.

Clicking on the “This site may harm your computer” link produces a help page with the title “Concerns About Web Search Results: Results labeled ‘This site may harm your computer’”:

So now I search Google for “Concerns About Web Search Results: Results labeled ‘This site may harm your computer’”. The first result is a Google support page at
http://www.google.com/support/websearch/bin/answer.py?hl=en&answer=45449
But accessing that page leads to another error (no screen shot):
We’re sorry…
… but your query looks similar to automated requests from a computer virus or spyware application. To protect our users, we can’t process your request right now.
What a mess!
Questions
- Should visiting any web page really “harm your computer”?
- On what basis would Google think that a web page is going to “harm your computer”? Does it take into account or even know what kind of computer you are using?
- If Google had reason to believe a web page were to “harm your computer”, should the page be really listed as a search result? (Less is more?)
- If a search result page is not marked with the warning, would you blame Google if you then visited the search result page and your computer came out “harmed”?
- Are these search result pages getting too crowded altogether? craigslist has barely changed their listing format and they’re doing just fine.
Update
Of course, this was a temporary glitch. According to Google’s blog, “the errors began appearing between 6:27 a.m. and 6:40 a.m. and began disappearing between 7:10 and 7:25 a.m. [PST]”. (So I ran into this just towards the end, around 7:20). The problem’s root cause is given as:
“Unfortunately (and here’s the human error), the URL of ‘/’ was mistakenly checked in [to a list of bad URL's] as a value to the file and ‘/’ expands to all URLs”.
And it wasn’t StopBadWare.org’s list as Google had originally posted. (The two organizations work together on this list). Well, mistakes happen …
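How a single “/” can flag everything is easy to see in a toy sketch in Python. Google has not published the matching rule as far as I know, so the substring test below is an assumption of mine, and the list entries and URLs are made up.

```python
def is_flagged(url, bad_patterns):
    """Flag a URL if any entry of the bad list occurs in it.

    Assumption: entries are matched as plain substrings. The actual
    matching rule Google uses is not described in the blog post.
    """
    return any(pattern in url for pattern in bad_patterns)

# Made-up list entry and URLs, for illustration only.
bad_patterns = ["badsite.example/malware"]
urls = ["http://www.example.org/water", "http://badsite.example/malware/x"]

print([is_flagged(u, bad_patterns) for u in urls])  # [False, True]

# The human error: "/" is checked in as one more entry.
bad_patterns.append("/")
print([is_flagged(u, bad_patterns) for u in urls])  # [True, True]
```

Since every URL contains a “/”, every search result gets the warning.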