Deep linking and bandwidth theft

Fri 26 July 2002

Some sites will go to court because other sites link directly to pages deep
inside their site, so that you can get to the interesting bits without
clicking through loads of index pages with all their banner ads.
Newspaper sites especially seem to be infected with this disease.

But with the Apache webserver it is perfectly possible to set the rules for
linking to articles on a site without having to look EXTREMELY STUPID
because you need lawyers to prevent links to your site instead of someone
who configures your webserver. Last time I checked, a technician who can
configure special access rules in Apache was still cheaper than a lawyer.

But hey, if you want to become the laughing stock of the Internet community,
go ahead. They will have a laugh at your expense and post on places like
Slashdot that you don't "get it".

If you want some opinions on the subject of deep linking, search Google.
Jakob Nielsen has written a good article about deep linking being good
linking. And Jakob Nielsen isn't just another name when it comes to web
usability and user interfaces.

On my other site, The Virtual Bookcase, I promote deep linking.
Please link to any book or author on the site; the URLs are made to stay the
same.

This is all about linking to interesting content on your site. There is
a different bit of 'deep linking' which is more commonly known as
'bandwidth theft'.

The first time I had to deal with this, it was not meant as 'bandwidth
theft': someone linked to all kinds of images on my site from some
discussion board to 'get back at me' (I had reported a portscan from an IP
address unknown to me to its ISP, which decided to kill the account).

This showed in my web-logs as an interesting new set of external
referrers. So I visited the referring pages and found all kinds of
interesting name-calling.
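
Spotting such referrers by hand gets tedious, so here is a minimal Python
sketch of how they could be pulled out of an access log. It assumes the log
is in Apache's 'combined' format (the Referer as the second quoted field
after the status code); the function name external_referers and the
OWN_HOSTS set are made up for this illustration and not part of any tool I
actually used.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumption: Apache "combined" log format, e.g.
# host - - [date] "GET /x.jpg HTTP/1.0" 200 1234 "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+ "(?P<referer>[^"]*)"'
)

# Hosts that count as "my own site" (illustrative values).
OWN_HOSTS = {"idefix.net", "www.idefix.net"}


def external_referers(lines, own_hosts=OWN_HOSTS, status=None):
    """Count Referer values that point to a host outside own_hosts.

    If status is given (for example "403"), only log lines with that
    HTTP status code are counted.
    """
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # not a combined-format line, skip it
        if status is not None and m.group("status") != status:
            continue
        referer = m.group("referer")
        host = urlsplit(referer).hostname  # None for "-" (no Referer sent)
        if host is None or host in own_hosts:
            continue
        counts[referer] += 1
    return counts
```

Feeding it the lines of an access log gives a Counter whose most_common()
list shows which outside pages pull in the most files. Passing status="403"
limits the count to requests that were refused.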

So I dug around the Apache website and found the
Apache 1.3 URL rewriting guide, which has special 'cookbook'
entries for cases like this. Using the Apache mod_rewrite module
it's possible to do almost anything based on the URL and the environment
variables. HTTP_REFERER (spelled with a single R, like the HTTP header
itself) is one of those.

So, copying and adapting a cookbook entry from that guide, I got to a
.htaccess file with:

RewriteEngine on

RewriteBase /~koos

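# Let requests without a Referer through (typed URLs, some proxies)
RewriteCond %{HTTP_REFERER} !^$
# Let my own pages and Google Images use the files
RewriteCond %{HTTP_REFERER} !^http://idefix\.net/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www\.idefix\.net/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://images\.google\..*/.*$ [NC]
# Any other page asking for an image or sound file gets a 403 Forbidden
RewriteRule .*\.(jpg|gif|au|wav|mp3)$ - [F]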

And that stopped most of the stupidity. The images will just give a 403
when requested through a direct link, or show up as a broken image when
embedded in a page.
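
To see which requests such a rule set lets through, the logic can be
mirrored in a few lines of Python. This is only a rough sketch of my reading
of the conditions, not how mod_rewrite works internally: the names
is_forbidden, ALLOWED_REFERERS and PROTECTED are made up for the
illustration, the dots in the hostnames are escaped, and example.org stands
in for any outside site.

```python
import re

# Referers that may use the files; [NC] in the RewriteCond lines
# corresponds to re.IGNORECASE here.
ALLOWED_REFERERS = [
    re.compile(r"^http://idefix\.net/.*$", re.IGNORECASE),
    re.compile(r"^http://www\.idefix\.net/.*$", re.IGNORECASE),
    re.compile(r"^http://images\.google\..*/.*$", re.IGNORECASE),
]

# File types covered by the RewriteRule pattern (case-sensitive, no [NC]).
PROTECTED = re.compile(r".*\.(jpg|gif|au|wav|mp3)$")


def is_forbidden(path, referer):
    """Return True when the rules would answer the request with a 403."""
    if not PROTECTED.match(path):
        return False  # rule pattern does not match, request is untouched
    if referer == "":
        return False  # empty Referer is let through (RewriteCond !^$)
    # Forbidden only when none of the allowed Referer patterns match.
    return not any(p.match(referer) for p in ALLOWED_REFERERS)
```

For example, is_forbidden("pics/cat.jpg", "http://example.org/board") is
True, while the same path with an empty Referer or with a Referer on
www.idefix.net is let through.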

A few months later I grepped my logs for 403 errors and found that other
places also linked to my images and never bothered to check the results,
because their pages were filled with linked images anyway. So a slight
update was needed to give a bigger hint:

RewriteEngine on

RewriteBase /~koos

# Same conditions as before
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://idefix\.net/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www\.idefix\.net/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://images\.google\..*/.*$ [NC]
# Serve stealing.gif in place of the requested image
RewriteRule .*\.(jpg|gif)$ stealing.gif [T=image/gif,L]

'stealing.gif' is an animated gif with the text 'stealing bandwith is lame'.

Looking in the logs, it seems the message is getting through. A bit.

The Irregular is an irregular column-like something which I write. Any opinion in The Irregular is my own personal opinion and has nothing to do with any current, past or future employers or any other person/company I may have contact with.

I consider what I write here my copyright; please get in touch with me if you want to copy or republish it.

Koos van den Hout, koos@kzdoos.xs4all.nl