Looking at web caching options / 2023-06-14

On IRC the phrase "don't host your website on a wet newspaper" is sometimes used when a URL that gets a bit of serious traffic starts responding really slowly or giving errors.

So I looked at my own webservers at home and at what would happen if one of the sites got hit by the Slashdot effect. As I don't like guessing, I played with ab (the Apache HTTP server benchmarking tool) to get some idea of what happens under high load and/or highly concurrent access.

Highly concurrent access in particular turns out to be an issue, because only a limited number of database connections is available to the webservers. The load average does go up, but the main problem is clients getting a database connection error.
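
To make that failure mode concrete, here is a toy model in Python. It assumes a database that refuses new connections once a fixed limit is reached; the names `ConnectionPool`, `PoolExhausted` and `simulate`, and the numbers 50 and 20, are made up for illustration and are not my actual configuration. Every simulated request tries to grab a connection at the same moment, and any request that finds no free connection fails immediately, which is what the clients saw as a database connection error.

```python
import threading


class PoolExhausted(Exception):
    """Hypothetical error: no database connection is free."""


class ConnectionPool:
    """Toy stand-in for a database server with a fixed connection limit."""

    def __init__(self, max_connections: int) -> None:
        self._slots = threading.BoundedSemaphore(max_connections)

    def acquire(self) -> None:
        # Fail right away, like a database refusing one connection too many.
        if not self._slots.acquire(blocking=False):
            raise PoolExhausted("too many connections")

    def release(self) -> None:
        self._slots.release()


def simulate(concurrent_requests: int, max_connections: int) -> tuple[int, int]:
    """Fire all requests at the same moment and return (served, failed)."""
    pool = ConnectionPool(max_connections)
    # Each request waits here after its connection attempt, so all attempts
    # overlap like a burst of concurrent traffic.
    barrier = threading.Barrier(concurrent_requests)
    lock = threading.Lock()
    counts = {"served": 0, "failed": 0}

    def handle_request() -> None:
        try:
            pool.acquire()
        except PoolExhausted:
            barrier.wait()
            with lock:
                counts["failed"] += 1
            return
        barrier.wait()  # hold the connection until every request has tried
        pool.release()
        with lock:
            counts["served"] += 1

    threads = [threading.Thread(target=handle_request) for _ in range(concurrent_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts["served"], counts["failed"]


print(simulate(concurrent_requests=50, max_connections=20))  # (20, 30)
```

Adding CPU or memory to the webserver does not change this outcome: as long as more requests arrive at once than there are connections, the excess fails.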

I started looking at caching options that allow the dynamic pages to be cached for short periods. Under high traffic this would give the benefit of serving a cached version without losing the advantages of dynamic pages.
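
Caching dynamic pages for a short period (sometimes called micro-caching) boils down to a cache with a time-to-live. The Python sketch below is only an illustration of the idea, not how Varnish or any other tool in this post implements it; `MicroCache`, `render_page` and the 10-second TTL are hypothetical. Within the TTL, a burst of requests for the same URL costs one page render, and therefore one database connection.

```python
import time
from typing import Callable


class MicroCache:
    """Keep rendered pages for a short time-to-live (not thread-safe)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the example can use a fake clock
        self._entries: dict[str, tuple[float, str]] = {}

    def get(self, key: str, render: Callable[[], str]) -> str:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: serve the cached copy
        body = render()  # missing or expired: render the dynamic page again
        self._entries[key] = (now, body)
        return body


# Usage with a fake clock: six requests for one URL spread over 12 seconds.
fake_now = [0.0]
cache = MicroCache(ttl_seconds=10, clock=lambda: fake_now[0])
renders: list[float] = []


def render_page() -> str:
    renders.append(fake_now[0])
    return f"page rendered at t={fake_now[0]}"


for t in (0, 1, 5, 9.9, 10, 12):
    fake_now[0] = t
    cache.get("/index", render_page)

print(renders)  # [0, 10]
```

A real caching proxy also has to deal with many simultaneous misses for the same URL (for example by letting one request render the page while the others wait), which this sketch leaves out.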

By now this has cost me more time and energy than surviving a burst of legitimate traffic is ever likely to be worth. And to be honest, the chance of a DDoS attack on my site because someone didn't like something I wrote is higher than the chance of a lot of people suddenly liking something I wrote.

This was all tested on the test and development servers, so actual production traffic was never affected by the tests.

Apache built-in memory cache with memcached

I first tried the Apache module socache_module with socache_memcache_module as the backend. This did not cache the dynamic pages, only .css and other static files, which are served from the disk cache or SSD storage anyway. No amount of fiddling with the caching headers made this work. With debugging enabled, all I could see was that the dynamic pages coming from cgid or mod_perl were not candidates for caching.

I could have used memcached from the web applications directly, but that would mean rewriting every script to handle caching. I was hoping to add the caching as a layer between the outside world and the web applications, so I could simply detour the traffic via a caching proxy when needed.

Haproxy cache

There is a haproxy installation between the outside world and the webservers anyway, so I looked at that option. The haproxy cache will not cache pages that have a Vary: header, and even after removing that header in Apache, the next problem is that the Content-Length: HTTP header has to be set in the response from the webserver. With my current setup that header is missing from dynamic pages.
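
The two obstacles can be written down as a simple check. This Python sketch only encodes the conditions described above (no Vary: header, Content-Length: present); `proxy_would_cache` and `add_content_length` are hypothetical names, and this is not haproxy's complete caching logic, which has more rules and may differ between versions. The second function shows the usual way around a missing Content-Length: buffer the whole generated page and count its bytes before sending it.

```python
def proxy_would_cache(headers: dict[str, str]) -> tuple[bool, str]:
    """Apply the two conditions from this post to a response's headers."""
    names = {name.lower() for name in headers}  # header names are case-insensitive
    if "vary" in names:
        return False, "response has a Vary: header"
    if "content-length" not in names:
        return False, "response has no Content-Length: header"
    return True, "cacheable"


def add_content_length(body: bytes, headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of the headers with Content-Length set from the buffered body."""
    fixed = {k: v for k, v in headers.items() if k.lower() != "content-length"}
    fixed["Content-Length"] = str(len(body))  # length in bytes, not characters
    return fixed


page = "<html><body>Hello, world</body></html>".encode("utf-8")
dynamic = {"Content-Type": "text/html; charset=utf-8", "Vary": "Accept-Encoding"}

print(proxy_would_cache(dynamic))  # (False, 'response has a Vary: header')
without_vary = {k: v for k, v in dynamic.items() if k != "Vary"}
print(proxy_would_cache(without_vary))  # (False, 'response has no Content-Length: header')
print(proxy_would_cache(add_content_length(page, without_vary)))  # (True, 'cacheable')
```

Buffering means the first byte only leaves the webserver once the whole page is generated, which is the trade-off behind dynamic pages often being sent without a Content-Length.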

Varnish cache

Using Varnish cache means I really have to 'detour' web traffic through another application before it reaches the final webserver. This turned out to be the combination that works. But it caused confusion, because Varnish appends to the X-Forwarded-For header, and my entire setup relies on haproxy setting this header to the correct external IP address as seen by haproxy. It took a few tries and some reading to find the right incantation to mangle the X-Forwarded-For header back to the right state in the outgoing request to the backend server. Varnish runs on the same virtual machine as the test haproxy, so haproxy connects to it from ::1, and the rule was to delete ", ::1" from the header.
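
The header fix itself is plain string manipulation. In Varnish this is done in the VCL configuration; the Python sketch below only illustrates the transformation, assuming (as in this setup) that haproxy talks to Varnish over ::1, so Varnish appends ", ::1" to the address haproxy put there. `strip_local_hop` is a hypothetical name. It removes only that one trailing hop and leaves any other value untouched.

```python
def strip_local_hop(xff: str, local_addr: str = "::1") -> str:
    """Drop the last X-Forwarded-For hop if it is the local proxy address."""
    hops = [hop.strip() for hop in xff.split(",")]
    # Only remove the hop Varnish appended; never empty the header completely.
    if len(hops) > 1 and hops[-1] == local_addr:
        hops.pop()
    return ", ".join(hops)


print(strip_local_hop("203.0.113.7, ::1"))  # 203.0.113.7
print(strip_local_hop("2001:db8::42, ::1"))  # 2001:db8::42
print(strip_local_hop("203.0.113.7"))  # 203.0.113.7 (unchanged)
```

Splitting on commas instead of doing a plain substring replace avoids accidentally touching an IPv6 address that merely ends in "::1".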

Tuning haproxy to avoid overloading a backend

While looking at things and testing, I also found out that haproxy has a maxconn parameter for backend servers, which sets the maximum number of open connections to that backend. With this number set lower than the maximum number of database connections, the site starts to respond slowly under a high number of concurrent requests, because the excess requests wait in haproxy's queue, but it keeps working and doesn't give database errors.
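
The effect of maxconn can be shown with a toy model: a limit in front of the backend where excess requests wait their turn, sitting in front of a database that refuses connections once it is full. This is a Python sketch with made-up numbers, not haproxy code; `run_behind_maxconn` and its parameters are hypothetical, and the blocking semaphore only mimics the queueing haproxy does when a server has reached its maxconn.

```python
import threading
import time


def run_behind_maxconn(requests: int, maxconn: int, db_connections: int) -> dict[str, int]:
    """Send a burst of requests through a maxconn-style gate and report the outcome."""
    gate = threading.BoundedSemaphore(maxconn)  # proxy limit: extra requests block (queue)
    db = threading.BoundedSemaphore(db_connections)  # database limit: extra connections fail
    lock = threading.Lock()
    stats = {"served": 0, "failed": 0, "active": 0, "peak": 0}

    def handle_request() -> None:
        with gate:  # waits here while maxconn requests are already in progress
            if not db.acquire(blocking=False):
                with lock:
                    stats["failed"] += 1
                return
            try:
                with lock:
                    stats["active"] += 1
                    stats["peak"] = max(stats["peak"], stats["active"])
                time.sleep(0.01)  # stand-in for rendering a dynamic page
                with lock:
                    stats["active"] -= 1
                    stats["served"] += 1
            finally:
                db.release()

    threads = [threading.Thread(target=handle_request) for _ in range(requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    del stats["active"]  # only report the totals and the peak
    return stats


# Every request is served, and the peak never exceeds maxconn.
print(run_behind_maxconn(requests=50, maxconn=15, db_connections=20))
```

Because the gate never lets more than maxconn requests reach the database at once, the database never runs out of connections; the price is that requests spend time waiting in line, which is exactly the "slower but working" behaviour.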

Reachable as koos+website@idefix.net. PGP encrypted e-mail preferred. PGP key 5BA9 368B E6F3 34E4 (local copy, or via keyservers).

My opinions are my own, and what I write is protected by copyright. Some publications contain an explicit license statement which allows sharing without asking permission.
Other webprojects: Camp Wireless, wireless Internet access at campsites