<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Stephan Wehner &#187; internet</title>
	<atom:link href="http://stephan.sugarmotor.org/category/internet/feed/" rel="self" type="application/rss+xml" />
	<link>http://stephan.sugarmotor.org</link>
	<description>Blog and Homepage</description>
	<lastBuildDate>Mon, 23 Aug 2010 04:50:59 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>thrackle.org alive again</title>
		<link>http://stephan.sugarmotor.org/2010/03/thrackle-org-alive-again/</link>
		<comments>http://stephan.sugarmotor.org/2010/03/thrackle-org-alive-again/#comments</comments>
		<pubDate>Tue, 02 Mar 2010 19:13:36 +0000</pubDate>
		<dc:creator>sw</dc:creator>
				<category><![CDATA[internet]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://stephan.sugarmotor.org/?p=362</guid>
		<description><![CDATA[My thrackle.org website is alive again. It&#8217;s about a nice math problem that I worked on 10 – 18 years ago.
]]></description>
			<content:encoded><![CDATA[<p>My <a href="http://www.thrackle.org/">thrackle.org</a> website is alive again. It&#8217;s about a nice math problem that I worked on 10 – 18 years ago.</p>
]]></content:encoded>
			<wfw:commentRss>http://stephan.sugarmotor.org/2010/03/thrackle-org-alive-again/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>webcrawlers desperate for content</title>
		<link>http://stephan.sugarmotor.org/2010/01/webcrawlers-desperate-for-content/</link>
		<comments>http://stephan.sugarmotor.org/2010/01/webcrawlers-desperate-for-content/#comments</comments>
		<pubDate>Wed, 27 Jan 2010 17:42:39 +0000</pubDate>
		<dc:creator>sw</dc:creator>
				<category><![CDATA[internet]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://stephan.sugarmotor.org/?p=348</guid>
		<description><![CDATA[I recently found this in the web server logs of one of the websites I look after:
38.100.8.50 - - [26/Jan/2010:05:01:44 -0800] "GET /application/json HTTP/1.1" 404 763 "-" "panscient.com"
38.100.8.50 - - [26/Jan/2010:05:01:47 -0800] "GET /following-sibling::* HTTP/1.1" 404 763 "-" "panscient.com"
38.100.8.50 - - [26/Jan/2010:05:01:55 -0800] "GET /AppleWebKit/ HTTP/1.1" 404 763 "-" "panscient.com"
38.100.8.50 - - [26/Jan/2010:05:01:58 -0800] "GET [...]]]></description>
			<content:encoded><![CDATA[<p>I recently found this in the web server logs of one of the websites I look after:</p>
<pre>38.100.8.50 - - [26/Jan/2010:05:01:44 -0800] "GET /application/json HTTP/1.1" 404 763 "-" "panscient.com"
38.100.8.50 - - [26/Jan/2010:05:01:47 -0800] "GET /following-sibling::* HTTP/1.1" 404 763 "-" "panscient.com"
38.100.8.50 - - [26/Jan/2010:05:01:55 -0800] "GET /AppleWebKit/ HTTP/1.1" 404 763 "-" "panscient.com"
38.100.8.50 - - [26/Jan/2010:05:01:58 -0800] "GET /following-sibling::* HTTP/1.1" 404 763 "-" "panscient.com"</pre>
<p>In case you are not familiar with web server log files, these lines mean that someone/something from IP address 38.100.8.50 requested the pages named after &#8220;GET&#8221; on the website, for example, a page named &#8220;following-sibling::*&#8221; etc.</p>
<p>Does it need to be said that no such pages exist (that&#8217;s what the &#8220;<a href="http://en.wikipedia.org/wiki/HTTP_404">404</a>&#8221; indicates)?</p>
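<p>For readers who want to pick such lines apart programmatically, here is a minimal Python sketch. It assumes the lines follow the Apache &#8220;combined&#8221; log format, which is what they appear to be; the names <code>LOG_PATTERN</code> and <code>parse_log_line</code> are made up for this illustration.</p>

```python
import re

# Assumed layout (Apache "combined" format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of the fields in one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


line = ('38.100.8.50 - - [26/Jan/2010:05:01:47 -0800] '
        '"GET /following-sibling::* HTTP/1.1" 404 763 "-" "panscient.com"')
entry = parse_log_line(line)
print(entry["host"], entry["path"], entry["status"], entry["agent"])
# prints: 38.100.8.50 /following-sibling::* 404 panscient.com
```

<p>The <code>status</code> field is the &#8220;404&#8221;, and the <code>agent</code> field is the name the requester gave for itself.</p>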
<p>When I saw this I was rather puzzled, and looked up <a href="http://panscient.com">panscient.com</a> (the last item on each line). Their home page says they provide some kind of vertical search service, whatever that is. On their <a href="http://panscient.com/faq.htm#5">FAQ</a> page, I found this:</p>
<blockquote><p>Why is your web crawler trying to access pages that don&#8217;t exist on my website?</p>
<p>Our web crawler attempts to extract links to valid web pages from javascript and other scripting languages.<strong> The crawler may misinterpret the information in these scripts and request a page that does not actually exist</strong>. These requests are attempts to retrieve valid web content, and are not an attempt to circumvent your webserver security.</p></blockquote>
<p>(Emphasis mine.) Oh, OK. They are looking into <a href="http://en.wikipedia.org/wiki/JavaScript">javascript</a> files on the web site and attempting to extract names of pages that might have content for the &#8220;vertical search&#8221;, but without success in this case. As a web developer, I can tell you that javascript files very rarely contain interesting links to web pages.</p>
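<p>To illustrate how this kind of misreading could happen, here is a hypothetical Python sketch. It is not panscient&#8217;s actual algorithm, which I do not know. It assumes a naive rule (&#8220;any quoted string containing a slash is a path&#8221;) and a made-up javascript snippet; <code>guess_paths</code> is an invented name.</p>

```python
import re

# Hypothetical heuristic: treat any quoted string containing a slash as a path.
STRING_LITERAL = re.compile(r'''["']([^"'\s]*/[^"'\s]*)["']''')


def guess_paths(javascript):
    """Return quoted strings that contain a slash, rewritten as site-relative paths."""
    return ["/" + s.lstrip("/") for s in STRING_LITERAL.findall(javascript)]


# Made-up script; none of these strings is a link to a page.
script = '''
xhr.setRequestHeader("Accept", "application/json");
if (navigator.userAgent.indexOf("AppleWebKit/") != -1) { init(); }
'''
print(guess_paths(script))
# prints: ['/application/json', '/AppleWebKit/']
```

<p>Under that assumed rule, a MIME type and a browser name turn into exactly the kind of requests seen in the log above.</p>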
<p>Looks like a pretty competitive business when people start grasping at straws like this. Also, I take it bandwidth is easier to come by than crawling software that avoids such silly attempts.</p>
]]></content:encoded>
			<wfw:commentRss>http://stephan.sugarmotor.org/2010/01/webcrawlers-desperate-for-content/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>truste.org ssl certificate problems</title>
		<link>http://stephan.sugarmotor.org/2009/12/truste-org-ssl-certificate-problems/</link>
		<comments>http://stephan.sugarmotor.org/2009/12/truste-org-ssl-certificate-problems/#comments</comments>
		<pubDate>Mon, 28 Dec 2009 01:31:24 +0000</pubDate>
		<dc:creator>sw</dc:creator>
				<category><![CDATA[internet]]></category>

		<guid isPermaLink="false">http://stephan.sugarmotor.org/?p=332</guid>
		<description><![CDATA[Today, a little note about a problem with https that I ran into with https://www.truste.org
When visiting that site my Firefox (Version 3.0) warned me that
Secure Connection Failed
www.truste.org uses an invalid security certificate.
The certificate is only valid for *.truste.com
Visiting https://www.truste.com instead simply timed out: &#8220;The server at www.truste.com is taking too long to respond.&#8221;
Looks like they [...]]]></description>
			<content:encoded><![CDATA[<p>Today, a little note about a problem with https that I ran into with <a href="https://www.truste.org">https://www.truste.org</a></p>
<p>When visiting that site my Firefox (Version 3.0) warned me that</p>
<blockquote><p>Secure Connection Failed<br />
<a class="linkification-ext" title="Linkification: http://www.truste.org" href="http://www.truste.org">www.truste.org</a> uses an invalid security certificate.<br />
The certificate is only valid for *.truste.com</p></blockquote>
<p>Visiting <a href="https://www.truste.com">https://www.truste.com</a> instead simply timed out: &#8220;The server at <a class="linkification-ext" title="Linkification: http://www.truste.com" href="http://www.truste.com">www.truste.com</a> is taking too long to respond.&#8221;</p>
<p>Looks like they didn&#8217;t configure their web server properly. A bit odd since they specialize &#8220;as the leading internet privacy services provider.&#8221;</p>
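<p>The check behind that warning can be sketched in a few lines of Python. This is a simplified version of the wildcard rule, not Firefox&#8217;s actual code: it assumes the &#8220;*&#8221; may stand for exactly one leftmost label, and <code>hostname_matches</code> is a name invented for this example.</p>

```python
def hostname_matches(pattern, hostname):
    """Simplified wildcard match: '*' may stand for exactly one leftmost label."""
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    if pattern_labels[0] == "*":
        # Everything after the wildcard label must agree exactly.
        return pattern_labels[1:] == host_labels[1:]
    return pattern_labels == host_labels


print(hostname_matches("*.truste.com", "www.truste.org"))  # False
print(hostname_matches("*.truste.com", "www.truste.com"))  # True
```

<p>The certificate names the .com domain, so a visit to the .org host fails the comparison no matter how the wildcard is filled in.</p>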
]]></content:encoded>
			<wfw:commentRss>http://stephan.sugarmotor.org/2009/12/truste-org-ssl-certificate-problems/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>slashdot down</title>
		<link>http://stephan.sugarmotor.org/2009/05/slashdot-down/</link>
		<comments>http://stephan.sugarmotor.org/2009/05/slashdot-down/#comments</comments>
		<pubDate>Fri, 08 May 2009 04:44:59 +0000</pubDate>
		<dc:creator>sw</dc:creator>
				<category><![CDATA[internet]]></category>

		<guid isPermaLink="false">http://stephan.sugarmotor.org/?p=246</guid>
		<description><![CDATA[Website administrators fear the slashdot effect (&#8220;slashdotting&#8221; / &#8220;being slashdotted&#8221;) &#8212; now slashdot.org, &#8220;News for nerds. Stuff that matters.&#8221;, is down itself. Here is a screen shot:

Unclear what &#8220;Guru Meditation&#8221; refers to, but in case you&#8217;re wondering, the Varnish link generated by the slashdot web server goes to http://www.varnish-cache.org. Which takes you to http://varnish.projects.linpro.no, which [...]]]></description>
			<content:encoded><![CDATA[<p>Website administrators fear the <a href="http://www.google.ca/search?q=slashdot+effect">slashdot effect</a> (&#8220;slashdotting&#8221; / &#8220;being slashdotted&#8221;) &#8212; now <a href="http://slashdot.org">slashdot.org</a>, &#8220;News for nerds. Stuff that matters.&#8221;, is down itself. Here is a screen shot:</p>
<p style="text-align: center;"><img class="aligncenter" style="border: 1px solid silver;" title="Screenshot" src="http://stephan.sugarmotor.org/slashdot-down.jpg" alt="Screenshot" width="601" height="295" /></p>
<p>Unclear what &#8220;Guru Meditation&#8221; refers to, but in case you&#8217;re wondering, the Varnish link generated by the slashdot web server goes to <a href="http://www.varnish-cache.org">http://www.varnish-cache.org</a>. Which takes you to <a href="http://varnish.projects.linpro.no">http://varnish.projects.linpro.no</a>, which says,</p>
<blockquote><p><strong>Welcome to the Varnish project</strong><br />
Varnish is a state-of-the-art, high-performance HTTP accelerator</p></blockquote>
<p>(The slashdot site was working again an hour later)</p>
]]></content:encoded>
			<wfw:commentRss>http://stephan.sugarmotor.org/2009/05/slashdot-down/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>on github</title>
		<link>http://stephan.sugarmotor.org/2009/04/on-github/</link>
		<comments>http://stephan.sugarmotor.org/2009/04/on-github/#comments</comments>
		<pubDate>Fri, 24 Apr 2009 05:21:41 +0000</pubDate>
		<dc:creator>sw</dc:creator>
				<category><![CDATA[internet]]></category>

		<guid isPermaLink="false">http://stephan.sugarmotor.org/?p=213</guid>
		<description><![CDATA[Joined github today, you can look up my (future) public software at
http://github.com/stephanwehner
Added a little project which should make Rails development a little easier when it comes to working with the database directly. For now only for mysql. See my_sql.rb under http://github.com/stephanwehner/railsgoodies.
Thanks to my friend Sam for encouraging me.
Learn about git if you haven&#8217;t heard about [...]]]></description>
			<content:encoded><![CDATA[<p>Joined <a href="http://github.com">github</a> today, you can look up my (future) public software at</p>
<blockquote><p><a href="http://github.com/stephanwehner">http://github.com/stephanwehner</a></p></blockquote>
<p>Added a little project which should make Rails development a little easier when it comes to working with the database directly. For now only for mysql. See <strong>my_sql.rb</strong> under <a href="http://github.com/stephanwehner/railsgoodies">http://github.com/stephanwehner/railsgoodies</a>.</p>
<p>Thanks to my friend <a href="http://twitter.com/samvincent">Sam</a> for encouraging me.</p>
<p><a href="http://git-scm.com/">Learn about git if you haven&#8217;t heard about it</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://stephan.sugarmotor.org/2009/04/on-github/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>google broken</title>
		<link>http://stephan.sugarmotor.org/2009/01/google-broken/</link>
		<comments>http://stephan.sugarmotor.org/2009/01/google-broken/#comments</comments>
		<pubDate>Sat, 31 Jan 2009 16:38:23 +0000</pubDate>
		<dc:creator>sw</dc:creator>
				<category><![CDATA[internet]]></category>

		<guid isPermaLink="false">http://stephan.sugarmotor.org/?p=134</guid>
		<description><![CDATA[This morning Google&#8217;s search results don&#8217;t work.
Let&#8217;s say you search Google for water. Then:
Each result has a warning under its link &#8220;This site may harm your computer&#8221;:


Clicking on the link doesn&#8217;t take your browser to the page as usual, but brings up an error message.


Clicking on the &#8220;This site may harm your computer&#8221; link produces [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: left;">This morning Google&#8217;s search results don&#8217;t work.</p>
<p style="text-align: left;">Let&#8217;s say you search Google for <a href="http://www.google.ca/search?hl=en&amp;q=water&amp;btnG=Google+Search&amp;meta=">water</a>. Then:</p>
<p style="text-align: left;">Each result has a warning under its link &#8220;This site may harm your computer&#8221;:</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-141" title="Google search results page for &quot;water&quot;" src="http://stephan.sugarmotor.org/wp-content/uploads/2009/01/pix-google-harms1.png" alt="Google search results page for &quot;water&quot;" width="620" /></p>
<p style="text-align: left;">Clicking on the link doesn&#8217;t take your browser to the page as usual, but brings up an error message.</p>
<p style="text-align: left;"><img class="aligncenter size-full wp-image-142" title="Error page when clicking on a search result link" src="http://stephan.sugarmotor.org/wp-content/uploads/2009/01/google-harms-forbidden1.png" alt="Error page when clicking on a search result link" width="628" height="269" /></p>
<p>Clicking on the &#8220;This site may harm your computer&#8221; link produces a help page with the title &#8220;Concerns About Web Search Results: Results labeled &#8216;This site may harm your computer&#8217;&#8221;:</p>
<p><img class="aligncenter size-full wp-image-147" title="Google help page about harmful search result pages" src="http://stephan.sugarmotor.org/wp-content/uploads/2009/01/google-harms-concerns1.png" alt="Google help page about harmful search result pages" width="624" height="654" /></p>
<p style="text-align: left;">So now I search Google for &#8220;<a href="http://www.google.com/search?q=Concerns+About+Web+Search+Results%3A+Results+labeled+'This+site+may+harm+your+computer'">Concerns About Web Search Results: Results labeled &#8216;This site may harm your computer</a>&#8217;&#8221;. The first result is a Google support page at</p>
<p style="text-align: center;"><a href="http://www.google.com/support/websearch/bin/answer.py?hl=en&amp;answer=45449">http://www.google.com/support/websearch/bin/answer.py?hl=en&amp;answer=45449</a></p>
<p style="text-align: left;">But accessing that page leads to another error (no screen shot):</p>
<blockquote>
<h4>We&#8217;re sorry&#8230;</h4>
<p>&#8230; but your query looks similar to automated requests from a computer virus or spyware application.  To protect our users, we can&#8217;t process your request right now.</p></blockquote>
<p>What a mess!</p>
<h2>Questions</h2>
<ul>
<li>Should visiting any web page really &#8220;harm your computer&#8221;?</li>
<li>On what basis would Google think that a web page is going to &#8220;harm your computer&#8221;? Does it take into account or even know what kind of computer you are using?</li>
<li>If Google had reason to believe a web page were to &#8220;harm your computer&#8221;, should the page be really listed as a search result? (Less is more?)</li>
<li>If a search result page is <strong>not</strong> marked with the warning, would you blame Google if you then visited the search result page and your computer came out &#8220;harmed&#8221;?</li>
<li>Are these search result pages getting too crowded altogether? <a href="http://craigslist.org">craigslist</a> has barely changed their listing format and they&#8217;re doing just fine.</li>
</ul>
<h2>Update</h2>
<p>Of course, this was a temporary glitch. According to <a href="http://googleblog.blogspot.com/2009/01/this-site-may-harm-your-computer-on.html">Google&#8217;s blog</a>, &#8220;<span style="background-color: white;"><span style="color: #0b5394;"><span style="border-collapse: collapse;"><span style="color: black;">the errors began appearing between 6:27 a.m. and 6:40 a.m. and began disappearing between 7:10 and 7:25 a.m. [PST]&#8221;. (So I ran into this just towards the end, around 7:20.) The problem&#8217;s root cause is given as:<br />
</span></span></span></span></p>
<p style="padding-left: 30px;">&#8220;<span style="background-color: white;"><span style="color: #0b5394;"><span style="border-collapse: collapse;"><span style="color: black;">Unfortunately (and here&#8217;s the human error), the URL of &#8216;/&#8217; was mistakenly checked in [to a list of bad URL's] as a value to the file and &#8216;/&#8217; expands to all URLs&#8221;.</span></span></span></span></p>
<p><span style="background-color: white;"><span style="color: #0b5394;"><span style="border-collapse: collapse;"><span style="color: black;">And it wasn&#8217;t <a href="http://blog.stopbadware.org/2009/01/31/google-glitch-causes-confusion">StopBadWare.org&#8217;s</a> list as Google had originally posted. (The two organizations work together on this list). Well, mistakes happen &#8230; </span></span></span></span></p>
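<p>Why a single &#8220;/&#8221; would flag everything can be shown with a small Python sketch. How Google actually matches URLs against its list is not stated in the blog post; this assumes a simple &#8220;path starts with the pattern&#8221; rule, and <code>is_flagged</code>, the example URLs and the <code>/malware/</code> pattern are all invented for illustration.</p>

```python
from urllib.parse import urlsplit


def is_flagged(url, bad_patterns):
    """Flag a URL if its path starts with any pattern (assumed prefix rule)."""
    path = urlsplit(url).path or "/"
    return any(path.startswith(pattern) for pattern in bad_patterns)


urls = ["http://example.com/", "http://example.org/water/facts.html"]

# With an ordinary entry, neither URL is flagged.
print([is_flagged(u, ["/malware/"]) for u in urls])       # [False, False]

# Once "/" is on the list, every path starts with it.
print([is_flagged(u, ["/malware/", "/"]) for u in urls])  # [True, True]
```

<p>Every URL path begins with a slash, so under a prefix rule that one entry matches the whole web.</p>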
]]></content:encoded>
			<wfw:commentRss>http://stephan.sugarmotor.org/2009/01/google-broken/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
