Let me start by mentioning that I’m writing this post on March 24, 2013. This kind of stuff is sure to change over time.
One of the things about the Internet is that it doesn’t have location built in. You don’t get to know where someone was when they wrote you an email; website operators don’t get to know where their visitors are when they visit the site. People don’t get to know where the craigslist servers are which put together all those bytes that make up a listing, and so on, and so on.
So people try to approximate, because location is deemed to be useful information: country, region, city, latitude, longitude, ZIP code, time zone data and more. Even if it is not absolutely reliable, you can still derive some information, some picture. Different services are available: some free, some quite inexpensive, and some quite expensive. Most come in the form of a database file that you download after paying for a license to use the data. Periodic updates to the database are made available. Web services are also available; with these, an IP address is submitted, over the Internet itself, to the service provider, and they immediately respond with their location data for this address. You can read more about IP Address Location at Wikipedia.
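To make the database-file approach concrete, here is a minimal sketch in Python of how such a lookup might work. It assumes the database boils down to a table of address ranges, each mapped to a place; the three rows below are made up for illustration (they use address ranges reserved for documentation), and the name `locate` is my own, not any vendor’s API.

```python
import bisect
import ipaddress

# Hypothetical rows: (first address, last address, location).
# Real databases hold a great many ranges; these are invented examples.
RANGES = [
    ("192.0.2.0", "192.0.2.255", "Vancouver, BC"),
    ("198.51.100.0", "198.51.100.255", "Toronto, ON"),
    ("203.0.113.0", "203.0.113.255", "Seattle, WA"),
]

# Convert addresses to integers and sort by the start of each range.
TABLE = sorted(
    (int(ipaddress.ip_address(first)), int(ipaddress.ip_address(last)), place)
    for first, last, place in RANGES
)
STARTS = [row[0] for row in TABLE]


def locate(ip):
    """Return the location for ip, or None if no range covers it."""
    value = int(ipaddress.ip_address(ip))
    # Find the last range whose start is <= value.
    index = bisect.bisect_right(STARTS, value) - 1
    if index < 0:
        return None
    first, last, place = TABLE[index]
    return place if first <= value <= last else None


print(locate("198.51.100.7"))
print(locate("10.0.0.1"))
```

The answer is only as good as the table: if an address range has been reassigned since the last database update, the lookup will happily return the old place.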
Of course, these services don’t work when people use so-called proxies and VPNs. In fact, many proxies are set up precisely for the purpose of fooling services which are ordinarily restricted by location (e.g. for accessing Netflix or Spotify from non-US locations) into handing over the goods. Such proxies are not particularly costly to use.
Now, when you look at determining location for mobile devices, these models become quite questionable. Surely the network is not aligned with city boundaries, and surely a mobile device’s Internet address does not change smoothly as you move about. Of course, you can move faster than a database update.
So I thought I’d try out different IP address location services – with my mobile device, using the “Data Plan,” not our home’s router or Wi-Fi. I was in Vancouver, British Columbia, most definitely, the whole time.
Vancouver, BC (this is said to be based on the free version of MaxMind, the paid version of which placed me in Toronto)
“Location: … actually we haven’t a clue.”
So, you see, some got it right, and some did not. (I didn’t even bother to read or record the latitude/longitude that were given in some cases.)
In case you are not familiar with web server log files, these lines mean that someone or something at IP address 184.108.40.206 requested the pages named after “GET” on the website, for example a page named “following-sibling::*”, and so on.
Does it need to be said that no such pages exist (that’s what the “404” indicates)?
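If you want to pick such lines apart with a script, here is a minimal sketch in Python. It assumes the server writes the common Apache “combined” log format; the sample line is made up for illustration (placeholder address, invented date and size, and a hypothetical crawler name), and `parse_line` is my own helper name.

```python
import re

# Fields of the "combined" format: client address, identity, user,
# timestamp, request line, status code, response size, referrer, user agent.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Invented sample line, not copied from a real log.
SAMPLE = (
    '192.0.2.10 - - [01/Jan/2000:00:00:00 +0000] '
    '"GET /following-sibling::* HTTP/1.1" 404 512 '
    '"-" "ExampleBot/1.0 (+http://crawler.example/)"'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


fields = parse_line(SAMPLE)
print(fields["ip"], fields["path"], fields["status"], fields["agent"])
```

The status field is where the “404” shows up, and the user agent at the end of the line is where a crawler typically identifies itself.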
When I saw this I was rather puzzled, and looked up panscient.com (the last item on each line). Their home page says they provide some kind of vertical search service, whatever that is. On their FAQ page, I found this:
Why is your web crawler trying to access pages that don’t exist on my website?
Looks like a pretty competitive business when people start grasping at straws like this. Also, I take it bandwidth is easier to come by than crawling software that avoids such silly attempts.
But accessing that page leads to another error (no screen shot):
… but your query looks similar to automated requests from a computer virus or spyware application. To protect our users, we can’t process your request right now.
What a mess!
Should visiting any web page really “harm your computer”?
On what basis would Google think that a web page is going to “harm your computer”? Does it take into account or even know what kind of computer you are using?
If Google had reason to believe a web page were to “harm your computer”, should the page be really listed as a search result? (Less is more?)
If a search result page is not marked with the warning, would you blame Google if you then visited the search result page and your computer came out “harmed”?
Are these search result pages getting too crowded altogether? craigslist has barely changed their listing format and they’re doing just fine.
Of course, this was a temporary glitch. According to Google’s blog, “the errors began appearing between 6:27 a.m. and 6:40 a.m. and began disappearing between 7:10 and 7:25 a.m. [PST]”. (So I ran into this just towards the end, around 7:20.) The problem’s root cause is given as:
“Unfortunately (and here’s the human error), the URL of ‘/’ was mistakenly checked in [to a list of bad URLs] as a value to the file and ‘/’ expands to all URLs”.
And it wasn’t StopBadWare.org’s list, as Google had originally posted. (The two organizations work together on this list.) Well, mistakes happen …
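To see why a lone ‘/’ on such a list would flag everything, here is a minimal sketch in Python. It assumes the list is matched by prefix, which is only my reading of “expands to all URLs”; Google’s actual mechanism isn’t described beyond the quote above. The list entries and the name `is_flagged` are made up for illustration.

```python
def is_flagged(url_path, bad_prefixes):
    """Return True if url_path starts with any entry in bad_prefixes."""
    return any(url_path.startswith(prefix) for prefix in bad_prefixes)


# Hypothetical entries, for illustration only.
intended = ["/malware/", "/downloads/bad.exe"]
mistaken = intended + ["/"]  # the one stray entry

print(is_flagged("/index.html", intended))
print(is_flagged("/index.html", mistaken))
```

Every URL path begins with ‘/’, so under prefix matching that single entry matches every page on every site.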