2023-09-20 Adding an RSS feed to my amateur radio site
A remark on mastodon.radio triggered me: "Quick reminder that there is a public RSS aggregator that combines all the ham radio blog feeds into one web site: blogs.radio". I wanted to add the latest from my site PE4KH Amateur Radio, but there was no RSS feed available. I've been wanting to add such a feed for a while but never got around to it, so this was the trigger I needed. The perl script that generates the feed for idefix.net has now been updated to allow for 'filtered' feeds for pe4kh.idefix.net and other sites that cover a specific part of the main feed. I also moved the script to version control so I can work on it on the development server and deploy to production when it's working fine.
Update: you can now find my posts over there.
The feed is now accepted and imported on blogs.radio: PE4KH amateur radio - Blogs.radio.
2023-06-14 Looking at web caching options
Somewhere on irc the phrase "don't host your website on a wet newspaper" comes up when a URL that gets a bit of serious traffic starts responding really slowly or giving errors. So I looked at my own webservers at home and what would happen if one of the sites got hit with the Slashdot Effect. As I don't like guessing I played with ab - the Apache HTTP server benchmarking tool - to get some idea of what happens under load and/or highly concurrent access. Especially highly concurrent access turns out to be an issue, because there are only so many database connections available for the webservers. The load average does go up, but the main problem is clients getting a database connection error. I started looking at caching options to allow the dynamic pages to be cached for short periods. This would give high amounts of traffic the advantages of a cached version without losing the advantages of dynamic pages. By now this has cost me more time and energy than the advantage of ever surviving a high amount of valid traffic. And to be honest the chance of a DDoS attack on my site because someone didn't like something I wrote is higher than the chance of a lot of people suddenly liking something I wrote. This was all tested with the test and development servers, so actual production traffic was never affected by the tests.
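For reference, this is roughly the kind of ab run I mean: a fixed number of requests at a chosen concurrency against a test host. The hostname and numbers here are just an example, not my actual test setup.
$ ab -n 1000 -c 100 https://developer.example.net/
The -c parameter is the interesting one: pushing it past the number of available database connections is what makes the database errors show up.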
Apache built-in memory cache with memcached
I first tried the Apache module socache_module with socache_memcache_module as backend. This did not cache the dynamic pages, just .css and other static files which originate from disk cache or ssd storage anyway. All kinds of fiddling with the caching headers did not make this work. With debugging enabled all I could see was that the dynamic pages coming from cgid or mod_perl were not candidates for caching. I could have used memcached from the web applications directly, but that would mean rewriting every script to handle caching. I was hoping to add the caching in a layer between the outside world and the web applications, so I can just detour the traffic via a caching proxy when needed.
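For context, this is roughly the kind of configuration I was experimenting with; treat it as a sketch of the idea (module paths and the memcached address are examples, and on Debian-style systems you would enable the modules with a2enmod instead), not my exact config. Even with something like this in place the dynamic pages were not cached.
# mod_cache plus mod_cache_socache, with memcached as the shared object cache provider
LoadModule cache_module modules/mod_cache.so
LoadModule cache_socache_module modules/mod_cache_socache.so
LoadModule socache_memcache_module modules/mod_socache_memcache.so

# use memcached running on localhost as the cache backend
CacheSocache memcache:localhost:11211
# try to cache everything below the document root
CacheEnable socache /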
Haproxy cache
Between the outside world and the webservers is a haproxy installation anyway, so I looked at that option. But the haproxy cache will not cache pages that have a Vary: header, and even after removing that header in Apache the next problem is that the Content-Length: http header has to be set in the answer from the webserver. With my current setup that header is missing in dynamic pages.
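Enabling the haproxy cache itself is simple enough; something along these lines, where the cache name, backend name and server address are made up for the example:
# small in-memory cache, size in megabytes, objects valid for 10 seconds
cache smallcache
    total-max-size 64
    max-age 10

backend webservers
    http-request cache-use smallcache
    http-response cache-store smallcache
    server web1 192.0.2.10:80
In my case it never got to cache the dynamic pages, for the reasons above.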
Varnish cache
Using varnish cache means I really have to 'detour' web traffic through another application before it goes on to the final webserver. This turned out to be the working combination. But it caused confusion, as Varnish adds to the X-Forwarded-For header while I had an entire setup based on that header being added by haproxy and listing the correct external IP address as seen by haproxy. It took a few tries and some reading to find the right incantation to mangle the X-Forwarded-For header back to the right state in the outgoing request to the backend server. The varnish cache runs on the same virtual machine as the test haproxy, so the rule was to delete , ::1 from the header.
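The end result was a small bit of VCL along these lines. This is a sketch of the idea rather than my exact configuration; the backend address is a placeholder.
vcl 4.0;

backend default {
    # hypothetical backend address
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_backend_fetch {
    # varnish appended the local address (::1 in this setup) to X-Forwarded-For;
    # strip it again so the backend only sees what haproxy put there
    if (bereq.http.X-Forwarded-For) {
        set bereq.http.X-Forwarded-For =
            regsub(bereq.http.X-Forwarded-For, ", ::1$", "");
    }
}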
Tuning haproxy to avoid overloading a backend
While looking at things and testing I also found out haproxy has a maxconn parameter for backend servers, setting the maximum number of open connections to that backend server. By changing this number to something lower than the maximum number of database connections, the site starts to respond slowly under a high number of concurrent requests, but it keeps working and doesn't give database errors.
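In haproxy configuration terms this is just a parameter on the server line; an example with made-up names and numbers:
backend webservers
    # allow at most 40 concurrent connections to this server,
    # kept below the database connection limit of the webserver
    server web1 192.0.2.10:80 maxconn 40
Requests above that limit wait in the haproxy queue (up to the configured queue timeout) instead of hitting the backend, which gives exactly the 'slower but no errors' behaviour described above.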
2023-05-16 Maybe YouTube isn't completely on to me...
I sometimes think YouTube is quite good at suggesting new videos on interesting subjects to me. For a while I've been seeing Tom Scott videos and Connections Museum videos. But only today did YouTube suggest this video to me: Tom Scott at the Connections Museum! So maybe YouTube isn't completely on to me. Of course with Sarah from the Connections Museum explaining things.
2023-01-11 Working around broken urls for my website
If you're bored enough to look at the sources for my webpages you'll notice I make a lot of use of
<base href="https://idefix.net/~koos/">
This changes the base for all relative urls from https://idefix.net/ to https://idefix.net/~koos/ because my whole site is based on being in my userdir, but https://idefix.net/ is the easy url. I use a lot of relative urls for local things, because why make them longer, and it eases developing and debugging on the developer site. All browsers support the base href tag, but some bots ignore it. And a few years ago there was a case where a bug in one script made all urls seem 'below' other urls. The net result is that my logs are currently filled with entries like:
[11/Jan/2023:17:09:34 +0100] "GET /~koos/irregular.php/morenews.cgi/2022/newstag.cgi/morenews.cgi/draadloosnetwerk/morenews.cgi/newsitem.cgi/morenews.cgi/morenews.cgi/newstag.cgi/asterisk/morenews.cgi/morenews.cgi/morenews.cgi/morenews.cgi/morenews.cgi/morenews.cgi/morenews.cgi/morenews.cgi/newstag.cgi/newstag.cgi/kismet/morenews.cgi/newstag.cgi/newsitem.cgi/morenews.cgi/morenews.cgi/2023 HTTP/1.1" 410
All those entries seem to be for http:// versions of the urls, so I adjusted the http to https redirect function to stop at urls that look like ^\/~koos/irregular.php\/.+\.cgi and give a status 410 immediately. This 'saves' a bit of traffic because such a request never gets the redirect to the https version. While checking this I see multiple stupid bots, like:
35.209.99.100 - - [11/Jan/2023:17:02:14 +0100] "GET /homeserver.html HTTP/1.1" 404 972 "-" "Buck/2.3.2; (+https://app.hypefactors.com/media-monitoring/about.html)"
This one clearly doesn't parse the base href tag.
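My redirect function isn't written exactly like this, but expressed as Apache mod_rewrite rules in the http virtual host the idea would look roughly like the sketch below; the [G] flag makes Apache answer 410 Gone.
<VirtualHost *:80>
    ServerName idefix.net
    RewriteEngine On
    # the broken self-referencing urls get 410 Gone right away
    RewriteRule ^/?~koos/irregular\.php/.+\.cgi - [G]
    # everything else still gets redirected to the https version
    RewriteRule ^/?(.*)$ https://idefix.net/$1 [R=301,L]
</VirtualHost>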
2023-01-08 Time to stop with The Virtual Bookcase
Recently I was looking at some reports of the affiliate income generated by The Virtual Bookcase and it hasn't generated a cent in a few years. This is probably fully related to the fact I haven't paid any attention to the site, both in code and content, for years. The only commits in 2022 were due to a vulnerability found in the site. Most commits to the code for the site were before 2010. Time to admit to myself I need to stop doing this. There are other things that take my time and give me joy. If someone else wants to take over: get in touch. I'm not sure which parts of the database are of any use to people and which parts I shouldn't transfer due to Dutch privacy laws, but we'll figure it out. If nobody wants it, I will start giving a 410 gone status from 1 September 2023 and end the domain registration in November 2023. The original announcement of starting the site, dated 28 March 1999: I've created a virtual bookcase with an overview of books I like/read.. visit the site too! which is also the oldest newsitem in my archive.
2022-11-18 SSL scans showing up in the log
A comment on irc made me have a look at the logs for my haproxy system to get an idea whether any weird vulnerability scan came by. No special vulnerability scan showed up, but my attention was drawn to a number of lines like:
Nov 18 08:05:01 wozniak haproxy[13987]: 2001:470:1:332::28:37618 [18/Nov/2022:08:05:01.900] https-in/1: SSL handshake failure
Nov 18 08:05:44 wozniak haproxy[13987]: 2001:470:1:332::28:27286 [18/Nov/2022:08:05:44.328] https-in/1: SSL handshake failure
Nov 18 08:06:22 wozniak haproxy[13987]: 2001:470:1:332::2e:3137 [18/Nov/2022:08:06:21.962] https-in/1: SSL handshake failure
Nov 18 08:06:22 wozniak haproxy[13987]: 2001:470:1:332::2d:33085 [18/Nov/2022:08:06:22.278] https-in/1: SSL handshake failure
Nov 18 08:06:22 wozniak haproxy[13987]: 2001:470:1:332::2d:17531 [18/Nov/2022:08:06:22.593] https-in/1: SSL handshake failure
Nov 18 08:06:22 wozniak haproxy[13987]: 2001:470:1:332::30:58869 [18/Nov/2022:08:06:22.915] https-in/1: SSL handshake failure
Nov 18 08:06:23 wozniak haproxy[13987]: 2001:470:1:332::2e:46537 [18/Nov/2022:08:06:23.228] https-in/1: SSL handshake failure
Nov 18 08:06:23 wozniak haproxy[13987]: 2001:470:1:332::29:20027 [18/Nov/2022:08:06:23.544] https-in/1: SSL handshake failure
Nov 18 08:06:24 wozniak haproxy[13987]: 2001:470:1:332::31:13423 [18/Nov/2022:08:06:23.872] https-in/1: SSL handshake failure
Nov 18 08:06:24 wozniak haproxy[13987]: 2001:470:1:332::28:56683 [18/Nov/2022:08:06:24.197] https-in/1: SSL handshake failure
Nov 18 08:06:24 wozniak haproxy[13987]: 2001:470:1:332::31:5055 [18/Nov/2022:08:06:24.524] https-in/1: SSL handshake failure
Nov 18 08:06:24 wozniak haproxy[13987]: 2001:470:1:332::2e:20907 [18/Nov/2022:08:06:24.841] https-in/1: SSL handshake failure
If there are one or two of these lines from one address, it is a sign of a client which can't finish the SSL negotiation. With my site that probably means an old client which doesn't understand LetsEncrypt certificates without an extra certification path. But this is quite a number of SSL errors from the same IPv6 range in a short time. I wondered what was behind this and did a bit of testing, until I found it's simple to cause this by doing an SSL test, for example with the famous Qualys SSL test or with an ssl scan tool. This is logical: such a test uses a lot of different negotiations to find out what actually works.
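Reproducing it is as simple as pointing such a tool at the site; for example with sslscan, as one example of such a tool:
$ sslscan idefix.net
Every protocol and cipher probe the server refuses shows up in the haproxy log as one of those SSL handshake failure lines.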
2022-10-31 Trying mastodon for amateur radio
All the news about twitter makes me wonder if I want to stay there in the long run. But changing social networks is always a negative experience: you lose contacts. I still remember several people I haven't heard much from since google+ and wonder how they are doing! For amateur radio I'm having a look at mastodon as @PE4KH@mastodon.radio. One conclusion is that my own site is more permanent than any social media. My own website survived the rise and fall of google+ while importing my posts, so those are still available here. But interaction on my own site is complex and needs constant maintenance to avoid spam.
2022-08-26 Limiting URLs to scan with wapiti
I wanted to use wapiti as a scanner to check for other vulnerabilities in The Virtual Bookcase after receiving a report about a cross-site scripting vulnerability. Wapiti is open source and free, which is a fitting price for scanning a hobby project site. I quickly ran into wapiti taking hours to scan because of the URL structure of the site: all /book/detail/x/y URLs map to one handler that deals with the X and Y parameters in SQL queries. Yes, those queries are surrounded by very defensive checking and I use positional parameters. Everything to avoid SQL injection and becoming the next Little Bobby Tables. Wapiti has no simple method that I could find to just crawl the site for a list of URLs and stop there, so I could select which URLs to scan. But it has an option to minimize crawling and import a list of additional URLs to scan, so I used that to get the same result. Gathering URLs was done with wget:
$ wget --spider -r http://developer.virtualbookcase.com 2>&1 | grep '^--' | egrep -v '\.(css|jpg|gif|png)' | awk '{ print $3}' > developer.virtualbookcase.com-urls.txt
After that I sorted the file with URLs and threw out a lot of them, making sure all the scripts with several variants of input were still tested. With that list I start wapiti with some special options. It still needs a starting url at -u so I give it the root, but I limit the crawling with the depth parameter -d 1 and the max files parameter --max-files-per-dir 50. Then I add the additional urls from the earlier crawl with the -s parameter. It's a lot of tweaking but it does the trick.
$ wapiti -u http://developer.virtualbookcase.com/ -d 1 --max-files-per-dir 50 -s developer.virtualbookcase.com-urls.txt -o ~/wapiti/ -v 2
No vulnerabilities were found. I found one PHP warning which only triggered in the kind of corner case a web vulnerability scanner, or an attacker, causes. So I fixed that corner case too.
2022-08-25 D'oh!!! A cross-site scripting vulnerability in one of my own sites
I received a responsible disclosure report of a vulnerability in The Virtual Bookcase. I will directly admit I haven't done a lot of maintenance on this site in the past few years, but I want to keep my sites secure. The report came via openbugbounty.org and has no details about the vulnerability, so I am not 100% sure where the reported vulnerability is. But based on the report text XSS (Cross Site Scripting) and a peek in the access-log looking for specific requests, I found I made a beginner mistake in dealing with a search query: displaying it as-is within an HTML context. I immediately fixed that error in the site. Now I wonder why it took so long for me to realize the error of my ways or for someone to notice it! Checking the logs some more turns up huge numbers of attempts at SQL injection, which is a vulnerability I am very aware of and where I put up standard defenses. But this is the first time a security researcher made me aware of the cross-site scripting vulnerability. Update: I contacted the reporter about the vulnerability, who responded quickly inquiring about the possible bounty for finding the bug. As this is a site that hasn't delivered any income in years, the best I can do is a mention in the credits of the site or on a separate hall of fame. Update: I also started a vulnerability scanner on the site myself, to find any other vulnerabilities I might have missed. This scanner is going through the development site at the moment. Like many other scanners it doesn't see by default how certain urls all map to the same PHP script. I already committed a few minor updates to improve handling of corner cases with unset variables and other things popping up in the scan. Update 2022-09-23: I realized the reporter never responded with the actual bug information.
2022-07-20 I redid my 'recent QSO map' with leafletjs and openstreetmap tiles
My todo-list for hobby projects has had an entry 'redo maps in sites using leaflet' for a while, and on an otherwise calm evening I got around to it. The first thing to upgrade was the recent contact map for PE4KH, which shows the places of my last 150 contacts plotted on a map, with some details per contact. I'm not good at javascript programming at all, so I just look for examples that come close to what I want and adjust them until they do what I want. Luckily I found some good geojson examples and I managed to get the points on the map. After a bit of massaging, trying and reading I managed to add the popup with the location. The next and harder bit was adding default and non-default icons. Eventually I got my brain wrapped around the bits needed for that too. After that the test version got deployed to production and you can look at it now.
Documentation and code snippets used:
The main reasons for switching to leaflet are that google maps was limiting free access to maps, although they seem to have mostly reverted this plan, and I wanted to promote openstreetmap. The general conclusion is that sites with maps do need regular maintenance: if hosted leaflet goes away or stops serving this version, if the rules for using hosted openstreetmap tiles change, or if something else happens, I have to adapt the site, maybe even quite fast.
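For anyone wanting to do something similar, the core of such a leaflet page comes down to something like the sketch below. This is not my actual code: the element id, coordinates, geojson URL, icon path and property names are all made up for the example, and it assumes leaflet.js and leaflet.css are already loaded in the page.
// assumes a <div id="qsomap"></div> in the page to hold the map

// centre the map somewhere in the Netherlands (example coordinates and zoom)
var map = L.map('qsomap').setView([52.1, 5.1], 5);

// openstreetmap tile layer, with the required attribution
L.tileLayer('https://tile.openstreetmap.org/{z}/{x}/{y}.png', {
    maxZoom: 19,
    attribution: '&copy; OpenStreetMap contributors'
}).addTo(map);

// a non-default icon for some contacts (hypothetical image path)
var specialIcon = L.icon({
    iconUrl: 'icons/special-marker.png',
    iconSize: [25, 41],
    iconAnchor: [12, 41]
});

// load the geojson with the recent contacts (hypothetical URL and properties)
fetch('recent-qso.json')
    .then(function (response) { return response.json(); })
    .then(function (data) {
        L.geoJSON(data, {
            // use the special icon when a feature asks for it, default marker otherwise
            pointToLayer: function (feature, latlng) {
                if (feature.properties.special) {
                    return L.marker(latlng, { icon: specialIcon });
                }
                return L.marker(latlng);
            },
            // popup with the contact details from the geojson properties
            onEachFeature: function (feature, layer) {
                layer.bindPopup(feature.properties.callsign + '<br>' +
                                feature.properties.locator);
            }
        }).addTo(map);
    });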