webcrawlers desperate for content

January 27, 2010 internet, programming No comments

I recently found this in the web server logs of one of the websites I look after:

38.100.8.50 - - [26/Jan/2010:05:01:44 -0800] "GET /application/json HTTP/1.1" 404 763 "-" "panscient.com"
38.100.8.50 - - [26/Jan/2010:05:01:47 -0800] "GET /following-sibling::* HTTP/1.1" 404 763 "-" "panscient.com"
38.100.8.50 - - [26/Jan/2010:05:01:55 -0800] "GET /AppleWebKit/ HTTP/1.1" 404 763 "-" "panscient.com"
38.100.8.50 - - [26/Jan/2010:05:01:58 -0800] "GET /following-sibling::* HTTP/1.1" 404 763 "-" "panscient.com"

In case you are not familiar with web server log files, these lines mean that someone/something from IP address 38.100.8.50 requested the pages named after “GET” on the website, for example, a page named “following-sibling::*”.

Does it need to be said that no such pages exist (that’s what the “404” indicates)?
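To make the fields of such a line explicit, here is a minimal Python sketch that takes one of the lines above apart. It assumes the log is in the common "combined" format (IP, two unused fields, time, request, status, size, referrer, user agent); the regular expression and the field names are my own choices, not something the web server provides.

```python
import re

# Assumed layout: the "combined" access log format.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

line = ('38.100.8.50 - - [26/Jan/2010:05:01:47 -0800] '
        '"GET /following-sibling::* HTTP/1.1" 404 763 "-" "panscient.com"')

# groupdict() returns the named groups as a dictionary of strings.
fields = LOG_LINE.match(line).groupdict()
print(fields["ip"], fields["path"], fields["status"], fields["agent"])
```

The status field is the “404”, and the last quoted field is what the client sent as its user agent.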

When I saw this I was rather puzzled, and looked up panscient.com (the last item on each line). Their home page says they provide some kind of vertical search service, whatever that is. On their FAQ page, I found this:

Why is your web crawler trying to access pages that don’t exist on my website?

Our web crawler attempts to extract links to valid web pages from javascript and other scripting languages. The crawler may misinterpret the information in these scripts and request a page that does not actually exist. These requests are attempts to retrieve valid web content, and are not an attempt to circumvent your webserver security.

(Emphasis mine) Oh ok. They are looking into javascript files on the web site and attempting to extract names of pages that might have content for the “vertical search”. But not successful in this case. As a web developer, I can tell you that javascript files very rarely contain interesting links to web pages.
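I do not know how their crawler actually works. Purely as a guess, the following Python sketch shows how a naive extractor could arrive at exactly the requests seen in the log: it treats every double-quoted string in a script as a path on the site. The script text, the regular expression and the function name are all made up for illustration.

```python
import re

# Hypothetical script content; the first three strings are the ones from the log.
SCRIPT = '''
var type = "application/json";
var xpath = "following-sibling::*";
if (navigator.userAgent.indexOf("AppleWebKit/") != -1) { go("/about.html"); }
'''

# Matches simple double-quoted string literals (no escape handling).
STRING_LITERAL = re.compile(r'"([^"\\]*)"')

def guess_paths(script):
    """Naively treat every double-quoted string as a path on the site."""
    return ["/" + s.lstrip("/") for s in STRING_LITERAL.findall(script)]

paths = guess_paths(SCRIPT)
print(paths)
```

Only the last of the four guesses is a real link; the other three are a MIME type, an XPath expression and part of a browser name.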

Looks like a pretty competitive business when people start grasping at straws like this. Also I take it bandwidth is easier to come by than crawling software that avoids such silly attempts.

the police: competence and responsibility

January 2, 2010 bc No comments

Today there was news about Danish cartoonist Kurt Westergaard being attacked in his home. I don’t want to talk about the details which you can find covered at

In short, a man broke into his home, Kurt Westergaard hid in a special “panic room”, and called police. The police came and managed to arrest the invader.

What I want to highlight is that

  • the attacker threw an axe at one of the police officers,
  • but, the police officers did not kill the attacker.

They managed to arrest the man after shooting him in the arm and leg.

It’s hard to tell whether the police were simply lucky with this successful outcome or not.

For now, their actions look much more competent and responsible than those of the police officers on this continent, who use tasers on young and old, pregnant women, people with mental problems, and for example, killed Robert Dziekański. The explanation or excuse for using the Taser in Robert Dziekański’s case was that he was “armed” with a stapler – compare that to a flying axe.

truste.org ssl certificate problems

December 27, 2009 internet No comments

Today, a little note about a problem with https that I ran into with https://www.truste.org

When visiting that site my Firefox (Version 3.0) warned me that

Secure Connection Failed
www.truste.org uses an invalid security certificate.
The certificate is only valid for *.truste.com

Visiting https://www.truste.com instead simply timed out: “The server at www.truste.com is taking too long to respond.”

Looks like they didn’t configure their web server properly. A bit odd since they specialize “as the leading internet privacy services provider.”
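The mismatch Firefox complains about can be sketched in a few lines of Python. This is a simplified version of the check, assuming the “*” may only stand for the whole leftmost part of the name; real browsers apply more rules, and the function name is my own.

```python
def hostname_matches(pattern, hostname):
    """Simplified check: '*' may only stand for the whole leftmost label."""
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        # Everything after the wildcard label has to match exactly.
        return p_labels[1:] == h_labels[1:]
    return p_labels == h_labels

print(hostname_matches("*.truste.com", "www.truste.org"))  # False
print(hostname_matches("*.truste.com", "www.truste.com"))  # True
```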

an email about the HandyDart strike

December 26, 2009 bc No comments

I just sent an email to Martin Lay, director in charge of accessibility at TransLink. HandyDart is an accessible door-to-door transit service in all of British Columbia’s larger centres, as well as in many smaller communities. It uses vans and small buses to transport disabled or elderly passengers who cannot use the normal transit system.

I was not aware, but the service has actually been provided by independent contractors. Tim Louis takes credit for founding the system, and its previous operator, the Pacific Transit Cooperative. As far as I can make out, it went bankrupt because it was paying high wages (but you know that that is always just one side of the story). Translink then went out to find a new contractor, and awarded the contract to MVT Canadian Bus in late 2008.

Some useful background links

I’m writing this from the perspective of a Vancouver citizen; I find it difficult to get a good picture beyond that.

The text of my email

(Dated: December 26, 2009)

Dear Mr. Lay,

I just learned that the operator of the HandyDart, MVT Canadian Bus, is not providing the service they were contracted for. As I understand, they were awarded the contract and took over operations at the beginning of 2009. After ten months a labor dispute led to a strike which has been ongoing since October 26.

I understand that the HandyDart drivers are paid less than “ordinary bus drivers”. Due to the nature of the service, I would have expected them to be paid more! There is more responsibility, and the job is more demanding and requires more skill. Furthermore the users of the service are more dependent on it.

While the HandyDart drivers have a right to strike, I feel that Translink is obliged to provide the service. This is simply an issue of whether Canada is a civilized nation or not. If they have made a poor choice by sub-contracting to MVT Canadian Bus company, that means that Translink needs to urgently work on a more reliable operation. Urgent as in there needs to be a solution tomorrow.

For the time being, the blurb on Translink’s website, “A truly great transit system opens its doors for everyone,” is empty and shallow.

Yours sincerely,

Stephan Wehner

Updates (December 31)

(Not part of the email). Three updates:

  1. From CBC News: “Five hundred striking Metro Vancouver HandyDART workers will start returning to work on Monday after more than two months of picketing. [...] The union began taking down its picket lines on Thursday morning after the union and employer MVT Canadian Bus agreed to binding arbitration.” — good news for those who depend on HandyDart.
  2. I received a lengthy reply from Martin Lay, two days after sending my email.
  3. I have heard in the meantime that the Pacific Transit Co-op still exists (I had written that it went bankrupt). A comment to that effect was made on Facebook – but the comment is now gone (Facebook not being reliable for communication).

a sign-up-form lesson

September 7, 2009 systems 1 comment

I came across mail.yeah.net, from a Chinese Internet provider, and thought I’d try out their free email service, and see how that would work out.

The sign up form was full of Chinese characters, naturally, still I could type in my name, fill in the password, date of birth etc. But after about 10 fields or so, I got to this section:

Screenshot from the yeah.net sign up form

Now I was kind of stuck. I went to Google Translate, pasted in the label, and clicked the “Translate” button:

Screen shot from Google Translate

Please type in the characters above! — a CAPTCHA field — of course! Now I was definitely stuck, and gave up! (I don’t know Chinese characters, nor do I know how to produce them with my keyboard)

I’ve been putting CAPTCHAs on some websites myself, to keep out spam and abuse; see

My friends tell me often that they don’t like it when they have to tackle those CAPTCHAs (this is why I thought of the CAPTCHA for the stephansmap sign up form: it is supposed to entertain, as far as my entertainment talents go in terms of computer graphics.) But it definitely stops the spammers.

So with yeah.net I got to see this all from a different perspective.

the critical mass of the cbc

July 31, 2009 cbc 1 comment

Well tonight, the Canadian Broadcasting Corporation, the “CBC”, is reporting

An estimated 1,000 bicycle-riding members of Critical Mass again disrupted Friday’s rush hour traffic in Vancouver in the latest of the group’s planned monthly protests to promote urban bike use.

See http://www.cbc.ca/canada/british-columbia/story/2009/07/31/bc-critical-mass-bike-ride-vancouver-mayor.html

Instead of sending the CBC a letter to point out the inaccuracy, I thought I’d write it up here.

The unusual, the special nature, what you cannot miss about Critical Mass rides, and what is really quite easy to find out, and what makes it different from many other similar happenings, is that:

  • they are not a group with membership
  • they are not organized
  • they are not a protest (movement)
  • they are not demonstrations.

They are just rides that happen in the evening of the last Friday of each month. There is no agreed upon route. There is no leader. There are no “members of Critical Mass”.

And if a journalist writes about Critical Mass and doesn’t find that out then they haven’t done their homework. It’s really not that hard!

Here’s what wikipedia says about Critical Mass Rides.

Here’s an informal wiki about critical mass rides.

the newspaper deliveryman and the policeman

July 29, 2009 bc No comments

Here’s a clever comment from Rex Mundi on the story of a newspaper deliveryman being viciously attacked by a number of drunk policemen. The policemen were charged, and one of them “has been given a conditional sentence without jail time after pleading guilty” today. (The others’ trials are not complete yet)

His comment is:

So does this mean that off-duty newspaper deliverymen may anticipate no jail time if they get drunk and assault an off-duty police officer?

See http://www.cbc.ca/canada/british-columbia/story/2009/07/29/bc-west-vancouver-policeman-gillan-assault-sentence.html

cryptography: a note on cipher block chaining

July 25, 2009 programming No comments

I’ve been looking into encryption methods recently, and came across this little surprise about cipher block chaining, or CBC, as it is used for block ciphers.

Block ciphers only encrypt messages of a fixed length, which depends on the cipher. To encrypt longer messages one breaks them up into blocks with the block cipher’s length and then individually encrypts these blocks. The receiver decrypts all the encrypted blocks and pastes the original message together. So for example, if your message is 2 kilobytes long (one ordinary page of writing), and the block cipher length is 32 bytes, then 2 kilobytes / 32 bytes = 2 * 1024 / 32 = 64 blocks of 32 bytes each will be encrypted. (Padding may or may not be necessary)
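The arithmetic above can be checked with a few lines of Python. This is only a sketch of the splitting step; the function name is made up, the 32-byte block length is the example value from the text, and no encryption or padding happens here.

```python
def split_into_blocks(message, block_len):
    """Cut a byte string into consecutive blocks of block_len bytes.

    The last block is shorter if the length is not a multiple of
    block_len; that is where padding would come in.
    """
    return [message[i:i + block_len] for i in range(0, len(message), block_len)]

message = bytes(2 * 1024)              # a 2 kilobyte message (all zero bytes)
blocks = split_into_blocks(message, 32)
print(len(blocks))                     # 64
```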

Encrypting each block on its own has a weakness, and this is what cipher block chaining addresses: if such a long message contains identical blocks, or two messages contain identical blocks, then you can tell that from the encrypted parts: they will be the same. Anyone who has access to the encrypted message and knows which block cipher was employed can split it into these blocks. While they cannot decrypt the individual blocks, they can compare them. Such is the world of cryptography that there are cases where it should be made difficult to tell that one message contains parts of a different message, or repeats itself.

Cipher Block Chaining

One solution, and the most commonly used “mode of operation” for a block cipher (see 1 , 2 , 3 ) is called Cipher Block Chaining. The idea is to introduce an additional block, called “initial vector”. This block is XOR-ed with the first block to be encrypted. The result is encrypted, and yields the first encrypted block to be sent. This block is however also XOR-ed with the next block to be encrypted. The result is encrypted, and yields the second encrypted block to be sent, and so on. Let’s generalize, and describe more accurately:

Suppose our numbering is such that the first block has number 1 (not 0 as is common).

  • Let P(i) be the i-th block of the plain text message.
  • Let E(X) be the result of encrypting the (plain text) block X.
  • Let D(Y) be the result of decrypting the (encrypted) block Y.
  • Let C(i) be the i-th encrypted (cipher) block.

Then encryption with Cipher Block Chaining can be formalized as:

C(0) := IV, the initial vector
C(i) := E( P(i) XOR C(i-1))

If the receiver knows the initial vector as well as the block cipher’s encryption key they can completely decrypt the message. Decryption is formalized like this:

C(0) := IV, the initial vector
P(i) := D( C(i) ) XOR C(i-1)

Decrypting with a Different Initial Vector

Finally I can point out what surprised me: it is that when decrypting, the blocks P(2), P(3), P(4), and so on do not depend on the initial vector IV that was used for encryption! Only P(1), the first decrypted block, depends on IV, while the other parts of the decrypted message will be the same regardless of IV.

In this way, the contribution of the initial vector is very different from the encryption key! And it is rather nice to see that it need not be any stronger, since it provides the function it is designed for: to hide the information about identical blocks.

And so, if the encrypter prepends some arbitrary initial block to the message, the receiver does not need to know the initial vector used for encryption. After decrypting with some arbitrarily chosen initial vector (all 0′s, for example) they can just throw away the first block; the remaining blocks will be the original message.
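The formulas can also be tried out in a few lines of Python. This is only a sketch: E and D below are a toy Feistel construction built on SHA-256 that I made up so the example needs nothing beyond the standard library. It stands in for a real block cipher such as AES and must not be used for actual encryption. The key, the initial vectors and the message blocks are arbitrary example values.

```python
import hashlib

BLOCK = 16  # block length in bytes for this toy cipher

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def _round(key, i, half):
    # Round function: keyed hash of one half, cut down to half a block.
    return hashlib.sha256(key + bytes([i]) + half).digest()[:BLOCK // 2]

def E(key, block):
    """Toy block cipher (4-round Feistel network). NOT secure."""
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for i in range(4):
        left, right = right, xor(left, _round(key, i, right))
    return left + right

def D(key, block):
    """Inverse of E: the same rounds, undone in reverse order."""
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for i in reversed(range(4)):
        left, right = xor(right, _round(key, i, left)), left
    return left + right

def cbc_encrypt(key, iv, plain_blocks):
    out, prev = [], iv                    # C(0) := IV
    for p in plain_blocks:
        prev = E(key, xor(p, prev))       # C(i) := E( P(i) XOR C(i-1) )
        out.append(prev)
    return out

def cbc_decrypt(key, iv, cipher_blocks):
    out, prev = [], iv                    # C(0) := IV
    for c in cipher_blocks:
        out.append(xor(D(key, c), prev))  # P(i) := D( C(i) ) XOR C(i-1)
        prev = c
    return out

key = b"toy key"
iv = bytes(range(16))
plain = [b"identical block!", b"identical block!", b"something else.."]

cipher = cbc_encrypt(key, iv, plain)
right = cbc_decrypt(key, iv, cipher)         # the initial vector used for encryption
wrong = cbc_decrypt(key, bytes(16), cipher)  # all 0's instead

print(cipher[0] != cipher[1])    # the two identical plain text blocks are hidden
print(right == plain)            # full round trip
print(wrong[0] != plain[0], wrong[1:] == plain[1:])  # only P(1) is affected
```

With the wrong initial vector, the first block comes out as P(1) XOR IV XOR IV', because D( C(1) ) equals P(1) XOR IV. Every later block is XOR-ed with a cipher block, which the receiver has anyway.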

Sample Code with AES and openssl

Here is some rather simple code to illustrate the effect. It is based on one of the Rijndael block ciphers, AES-256 (see Advanced Encryption Standard), and the openssl library. The openssl options for enc, “symmetric cipher routines”, are available through man enc

echo "The symmetric cipher commands allow data to be encrypted or decrypted using various block and stream ciphers" > msg.in
# Encrypt msg.in with some key and an initial vector
openssl enc -aes-256-cbc -K 1234567890123456 -iv 1234567890123456 -in msg.in -out msg.crypt
echo Decrypt with both the right key and the right iv
openssl enc -d -aes-256-cbc -K 1234567890123456 -iv 1234567890123456 -in msg.crypt
echo Decrypt with the right key but a different iv
# Pipe into 'od -cx' because there will likely be non-displayable characters. msg.crypt is a properly binary file
openssl enc -d -aes-256-cbc -K 1234567890123456 -iv ABCDEF1234560FED -in msg.crypt | od -cx
echo Compare with the output with the right key and the right iv
openssl enc -d -aes-256-cbc -K 1234567890123456 -iv 1234567890123456 -in msg.crypt | od -cx

When executed in a UNIX shell, and all the required programs are available, the output is:

Decrypt with both the right key and the right iv
The symmetric cipher commands allow data to be encrypted or decrypted using various block and stream ciphers
Decrypt with the right key but a different iv
0000000 355 221 334   J 327   =   V 326   e   t   r   i   c       c   i
        91ed 4adc 3dd7 d656 7465 6972 2063 6963
0000020   p   h   e   r       c   o   m   m   a   n   d   s       a   l
        6870 7265 6320 6d6f 616d 646e 2073 6c61
0000040   l   o   w       d   a   t   a       t   o       b   e       e
        6f6c 2077 6164 6174 7420 206f 6562 6520
0000060   n   c   r   y   p   t   e   d       o   r       d   e   c   r
        636e 7972 7470 6465 6f20 2072 6564 7263
0000100   y   p   t   e   d       u   s   i   n   g       v   a   r   i
        7079 6574 2064 7375 6e69 2067 6176 6972
0000120   o   u   s       b   l   o   c   k       a   n   d       s   t
        756f 2073 6c62 636f 206b 6e61 2064 7473
0000140   r   e   a   m       c   i   p   h   e   r   s  \n
        6572 6d61 6320 7069 6568 7372 000a
0000155
Compare with the output with the right key and the right iv
0000000   T   h   e       s   y   m   m   e   t   r   i   c       c   i
        6854 2065 7973 6d6d 7465 6972 2063 6963
0000020   p   h   e   r       c   o   m   m   a   n   d   s       a   l
        6870 7265 6320 6d6f 616d 646e 2073 6c61
0000040   l   o   w       d   a   t   a       t   o       b   e       e
        6f6c 2077 6164 6174 7420 206f 6562 6520
0000060   n   c   r   y   p   t   e   d       o   r       d   e   c   r
        636e 7972 7470 6465 6f20 2072 6564 7263
0000100   y   p   t   e   d       u   s   i   n   g       v   a   r   i
        7079 6574 2064 7375 6e69 2067 6176 6972
0000120   o   u   s       b   l   o   c   k       a   n   d       s   t
        756f 2073 6c62 636f 206b 6e61 2064 7473
0000140   r   e   a   m       c   i   p   h   e   r   s  \n
        6572 6d61 6320 7069 6568 7372 000a
0000155

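As you can see, only the first 8 bytes differ when using the "wrong initial vector". It is 8 bytes rather than a whole 16-byte block because the two -iv values are hex strings of 8 bytes each, and openssl fills the rest of the initial vector with zeros in both cases.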

Just for future reference, here is my system information when running the above code:

$ uname -a
Linux myosin 2.6.24-19-generic #1 SMP Wed Aug 20 22:56:21 UTC 2008 i686 GNU/Linux
$ bash --version
GNU bash, version 3.2.39(1)-release (i486-pc-linux-gnu)
Copyright (C) 2007 Free Software Foundation, Inc.
$ openssl version
OpenSSL 0.9.8g 19 Oct 2007

slashdot down

May 7, 2009 internet No comments

Website administrators fear the slashdot effect (“slashdotting” / “being slashdotted”) — now slashdot.org, “News for nerds. Stuff that matters.”,  is down itself. Here is a screen shot:

Screenshot

Unclear what “Guru Meditation” refers to, but in case you’re wondering, the Varnish link generated by the slashdot web server goes to http://www.varnish-cache.org. Which takes you to http://varnish.projects.linpro.no, which says,

Welcome to the Varnish project
Varnish is a state-of-the-art, high-performance HTTP accelerator

(The slashdot site was working again an hour later)

democratic alternative action now

May 5, 2009 systems No comments

Over here in British Columbia (“B.C.” – also known as “Bring Cash”), it’s election time. (I can’t vote for lack of citizenship, but that is another story.) This time, like last time, there is also a referendum on “Electoral Reform”, to switch to the Single Transferable Vote system (oh boy, they even have a video).

On the weekend I remembered a few ideas I had some years ago about alternatives to the ordinary democratic arrangement. I could recall two but I knew there were three; it took a visit to the Wise Hall to recover the third one: it was a friend’s favourite from when I passed it by her at the time.

None of these are likely to work as such; I think it’s nice to ponder though. Here they are, enjoy:

1. “Copy and Paste”

Instead of maintaining a parliament for your city, province or country, just copy the laws some other parliament comes up with and make them your own. They pass a new law, it becomes yours too. They remove one, it’s gone for you. Why would you think you can do better than they? Save the time and effort! Usually no one is happy with their parliament anyway.

2. “Everyone is a Minister”

Instead of maintaining a government, divide up all its functions among the constituents. There will be a long list of small areas and responsibilities. Assign each of these areas and responsibilities to one person only: no arguments, they have all the say in their area. If you see something that’s wrong there’s exactly one person to complain to.

Ordinarily, ministers are appointed because of who they know, instead of what they know. After ten years every one of the mini-ministers will be an expert in their field, and do a much better job.

3. “The Less Power, the More Votes”

Usually each person gets one vote. However, some people already have a lot of power over other people’s lives. For example, a store manager has eight hours a day to have things their way. CEOs of big companies may have thousands of people follow their lead.

At election time, this is reversed. For each person, add up the number of hours times the number of people, for an election period, that they control those other people. If a person controls people that control other people, add the hours from the middle person to the one at the top. The more hours a person is assigned, the less their vote counts.