Deep linking and bandwidth theft / 2002-07-26

Some sites go to court because other sites link directly to pages deep
inside them, so visitors can get to the interesting bits without clicking
through loads of index pages with all their banner ads.
Newspaper sites in particular seem to be infected with this disease.

But with the Apache webserver it is perfectly possible to set rules for
linking to articles on a site, without looking EXTREMELY STUPID because
you hired lawyers to stop links to your site instead of someone to
configure your webserver. Last time I checked, a technician who can
configure special access rules in Apache was still cheaper than a lawyer.

But hey, if you want to become the laughing stock of the Internet
community, go ahead. People will have a laugh at your expense and post in
places like Slashdot that you don't "get it".

If you want some opinions on the subject of deep linking, search Google.
Jakob Nielsen has written a good article about deep linking being good
linking, and Jakob Nielsen isn't just another name when it comes to web
usability and user interfaces.

On my other site, The Virtual Bookcase, I promote deep linking.
Please link to any book or author on the site; the URLs are made to stay the same.

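This is all about linking to interesting content on your site. There is
a different kind of 'deep linking', more commonly known as 'bandwidth
theft': embedding images or other files hosted on someone else's site in
your own pages, so their server pays for the traffic.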

The first time I had to deal with this, it was not set up as 'bandwidth
theft': someone linked to all kinds of images on my site from some
discussion board to 'get back at me' (I had reported a portscan from an
IP address unknown to me to the ISP, where they decided to kill the …)

This showed up in my web server logs as an interesting new set of external
referrers. So I visited the referring pages and found all kinds of
interesting name-calling.
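Spotting such a new set of referrers does not need anything fancy. As a minimal sketch, assuming the webserver writes Apache's 'combined' log format and using www.example.org and example.org as hypothetical stand-ins for your own hostnames, this Python function counts referring URLs that come from other sites. The function name and the optional status filter (useful for the "grep for 403 errors" step later on) are illustrative, not part of any existing tool.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Apache "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical hostnames of "my own site"; replace with your own.
OWN_HOSTS = {"www.example.org", "example.org"}


def external_referers(lines, own_hosts=OWN_HOSTS, status=None):
    """Count referring URLs whose host is not one of own_hosts.

    If status is given (e.g. "403"), only log lines with that HTTP
    status code are counted.
    """
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # not a combined-format line, skip it
        if status is not None and m.group("status") != status:
            continue
        referer = m.group("referer")
        if referer in ("", "-"):
            continue  # the browser sent no referer
        try:
            host = (urlsplit(referer).hostname or "").lower()
        except ValueError:
            continue  # malformed referer URL
        if host and host not in own_hosts:
            counts[referer] += 1
    return counts
```

For example, `external_referers(open("access_log"), status="403").most_common(20)` lists the 20 outside pages with the most refused requests; the log file name is an assumption, since it depends on the server setup.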

So I dug around the Apache website and found the
Apache mod_rewrite manual, which has special 'cookbook'
entries for cases like this. Using the Apache mod_rewrite module
it's possible to do almost anything based on the URL and the server and
environment variables. HTTP_REFERER (the page the visitor came from, as
sent by the browser in the Referer header) is one of those.

So, copying and adapting a cookbook entry from that guide, I ended up with
a .htaccess file containing:

RewriteEngine on

RewriteBase /~koos

RewriteCond %{HTTP_REFERER} !^$
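# www.example.org and example.org are placeholders for the site's own hostname
RewriteCond %{HTTP_REFERER} !^http://www\.example\.org/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://example\.org/.*$ [NC]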
RewriteCond %{HTTP_REFERER} !^http://images\.google\..*/.*$ [NC]
RewriteRule .*\.(jpg|gif|au|wav|mp3)$ - [F]

And that stopped most of the stupidity for the time being. The images now
return a 403 Forbidden when linked to directly from another site, or show
up as a broken image when embedded in someone else's page.
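To see why a given request ends up with a 403, the rule set can be modelled in a few lines of Python. This is a simplified sketch, not mod_rewrite itself: it covers only this one rule, uses Python's re module instead of Apache's regex engine, and assumes www.example.org and example.org as the site's own hostnames. The names REFERER_CONDS, cond_holds and status_for are made up for the illustration.

```python
import re

# One entry per RewriteCond on %{HTTP_REFERER}: (regex, negated with "!", [NC] flag).
# www.example.org / example.org are hypothetical stand-ins for the own-site hostnames.
REFERER_CONDS = [
    (r"^$", True, False),
    (r"^http://www\.example\.org/.*$", True, True),
    (r"^http://example\.org/.*$", True, True),
    (r"^http://images\.google\..*/.*$", True, True),
]
# The RewriteRule pattern (case-sensitive, as in the original rule).
RULE = re.compile(r".*\.(jpg|gif|au|wav|mp3)$")


def cond_holds(pattern, negated, nocase, value):
    """Evaluate one RewriteCond against a value."""
    hit = re.search(pattern, value, re.IGNORECASE if nocase else 0) is not None
    return not hit if negated else hit


def status_for(path, referer):
    """Return 403 if the rule would fire ([F]), otherwise 200.

    mod_rewrite tests the RewriteRule pattern first and then the
    RewriteConds, which are ANDed together unless [OR] is used.
    """
    if RULE.search(path) and all(
        cond_holds(p, neg, nc, referer) for p, neg, nc in REFERER_CONDS
    ):
        return 403
    return 200
```

Requests without any referer (someone typing the image URL, or a proxy that strips the header) still get through; that is what the `!^$` condition is for.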

A few months later I grepped my logs for 403 errors and found that other
places were also linking to my images and never bothered to check the
results, because their pages were full of linked images. So a slight update
was needed to give a bigger hint:

RewriteEngine on

RewriteBase /~koos

RewriteCond %{HTTP_REFERER} !^$
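# www.example.org and example.org are placeholders for the site's own hostname
RewriteCond %{HTTP_REFERER} !^http://www\.example\.org/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://example\.org/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://images\.google\..*/.*$ [NC]
# don't rewrite requests for the replacement image itself
RewriteCond %{REQUEST_URI} !/stealing\.gif$
RewriteRule .*\.(jpg|gif)$ stealing.gif [T=image/gif,L]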

'stealing.gif' is an animated GIF with the text 'stealing bandwidth is lame'.

Looking in the logs, it seems the message is getting through. A bit.

Koos van den Hout, PGP encrypted e-mail preferred. PGP key 5BA9 368B E6F3 34E4.