Two things are usually in short supply when you run a Mastodon instance: funding and public attention. Both are important if the instance is to keep operating and - if desired - achieve growth or reach.
My goal with metalhead.club is to offer a professionally hosted platform for everyone who feels at home in the metal music genre. For such a theme-based instance to be viable, and for its users to benefit from it as much as possible, it must achieve a certain level of popularity. However, the usual advertising media are only of limited use - either because they do not match my idea of data protection or because I cannot readily afford them at the moment. Precious donations have to be used sparingly.
Recently I ran into an issue with acme.sh / Let’s Encrypt and a failing ACME validation.
Error 404 when running acme.sh --renew -d mydomain.tld
[Wed May  3 15:31:45 UTC 2023] Pending, The CA is processing your order, please just wait. (1/30)
[Wed May  3 15:31:49 UTC 2023] mydomain.tld:Verify error:<ipaddress> Invalid response from https://mydomain.tld/.well-known/acme-challenge/5GmSwd0P0ukTtX302yHHhAuZMCEDJx7MmAaBBoPIKtk: 404
[Wed May  3 15:31:49 UTC 2023] Please add '--debug' or '--log' to check more details.
[Wed May  3 15:31:49 UTC 2023] See: https://github.com/acmesh-official/acme.sh/wiki/How-to-debug-acme.sh
My Mastodon instance metalhead.club has existed since summer 2016 and has seen several waves of new users - but never as many new users as in early November 2022. This has not only led to heavy CPU load on the servers (see my post about scaling up Mastodon’s Sidekiq Workers), but also to greater demand for storage space. Mastodon uses a media cache that not only stores copies of preview images for posts containing links - but also copies of all media files that the server knows of. Before the user wave of late 2022, metalhead.club’s media cache was about 350 GB in size with a cache retention time of 60 days. The numbers escalated quickly: after a few days we were already at 400 GB - and after about 3 weeks we had more than 550 GB of cached media files. And that was not with 60 days of retention time - but with only 30.
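To make the term "cache retention time" more concrete: a minimal sketch of how cached remote media older than a given number of days can be pruned with Mastodon's `tootctl` command line tool. The install path `/home/mastodon/live` and the function name `prune_media_cache` are assumptions for illustration, and this is not necessarily how metalhead.club enforces its retention time.

```shell
#!/bin/sh
# Assumption: Mastodon is installed in /home/mastodon/live and this
# script runs as the user that owns the installation.
MASTODON_DIR="${MASTODON_DIR:-/home/mastodon/live}"
# Retention time in days; 30 matches the value mentioned in the text.
RETENTION_DAYS="${RETENTION_DAYS:-30}"

# Hypothetical helper: report storage usage, then prune old cached media.
prune_media_cache() {
    if [ ! -x "$MASTODON_DIR/bin/tootctl" ]; then
        echo "tootctl not found in $MASTODON_DIR/bin" >&2
        return 1
    fi
    cd "$MASTODON_DIR" || return 1
    # Show how much disk space Mastodon's media storage currently uses.
    RAILS_ENV=production bin/tootctl media usage
    # Remove locally cached copies of remote media older than the retention time.
    RAILS_ENV=production bin/tootctl media remove --days "$RETENTION_DAYS"
}
```

Calling `prune_media_cache` from a daily cron job would keep the cache at roughly the configured age. It does nothing about the amount of media that arrives within that window, which is what grew here.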
Although I added hundreds of GB of new storage space, the cache showed no signs of shrinking in the near future, so I decided to offload the storage to an S3 storage provider. Otherwise the local disks would have been full a few days later.
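For orientation, this is roughly what the S3 part of a Mastodon `.env.production` file can look like. It is a sketch only: bucket name, region, endpoint, hostnames and credentials are placeholders, not the settings used for metalhead.club, and the exact set of variables you need depends on your storage provider.

```shell
# .env.production (excerpt) - all values are placeholders
S3_ENABLED=true
S3_BUCKET=my-mastodon-media
AWS_ACCESS_KEY_ID=replace-me
AWS_SECRET_ACCESS_KEY=replace-me
S3_REGION=eu-central-1
S3_PROTOCOL=https
S3_ENDPOINT=https://s3.example.com
# Optional: serve media under your own hostname, e.g. via a caching proxy
S3_ALIAS_HOST=media.example.com
```

Media files that already exist on the local disk are not moved by changing these settings; they have to be copied to the bucket separately.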
Situation: One of my servers is located at my home. It’s connected to the internet via two different interfaces at the same time:
- Physical Interface: Connected to Deutsche Telekom ISP via DSL / landline.
- Virtual WireGuard VPN interface: Connected to one of my data center servers, has a public IPv6 address.
The “data center server” acts as a gateway for my home server and routes a static IPv6 address to it. This setup lets me reach my home server via a static and public IP address, although my DSL provider does not assign a static IPv6 subnet to my landline. (But that is a subject for another story … ;-) ).
After finishing the setup, I ran into the problem of asymmetric routing: Packets addressed to my static IPv6 address (and thus routed via the WireGuard VPN) did arrive at my home server, but the response packets were not sent back the same way: They were routed via my Deutsche Telekom landline and therefore originated from another source IP address, which the original requester did not expect.
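A common way to deal with this kind of asymmetric routing on Linux is source-based policy routing: replies that carry the static address as their source are looked up in a separate routing table whose default route points into the VPN. The following is a minimal sketch under assumptions, not necessarily the fix used here: the address `2001:db8:1::2` (documentation prefix), the interface name `wg0`, the table number `100` and the helper names are all made up for illustration. By default the script only prints the commands.

```shell
#!/bin/sh
# Hypothetical values - replace with your own.
STATIC_ADDR="2001:db8:1::2"   # static IPv6 address routed via the VPN
VPN_IF="wg0"                  # name of the WireGuard interface
TABLE=100                     # any unused routing table number

# Dry run by default: print commands instead of executing them.
# Set DRY_RUN=0 and run as root to actually apply the rules.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_source_routing() {
    # Extra routing table: everything leaves through the VPN interface.
    run ip -6 route add default dev "$VPN_IF" table "$TABLE"
    # Packets sent FROM the static address use that table
    # instead of the main table (which points to the DSL line).
    run ip -6 rule add from "$STATIC_ADDR/128" lookup "$TABLE"
}

setup_source_routing
```

With such a rule in place, traffic that the home server initiates itself still leaves via the landline, while answers to requests that came in through the VPN go back through the VPN.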
A few days ago I noticed that I could not use my OPNsense firewall as an SSH jump host to my other servers. I’m not sure how long this issue has existed, or if it has always existed, but now that I have IPv6 connectivity again after a long time of IPv4-only internet, I could definitely feel the consequences.
While ssh root@opnsense worked perfectly, ssh -6 root@opnsense failed with a timeout. Verbose output of the ssh command showed that the client was trying to access the correct IPv6 address of my firewall, but obviously it did not receive any response.
It happened again - this time on my Fedora machine! I ended up with a laptop that would not boot after some package changes. The last time that happened was about 4 years ago, when Arch Linux could not decrypt my main partitions due to some changes in a crypto library. This time the accident was caused by a simple dnf command:
dnf autoremove
I intended to remove dangling packages from my system - expecting my package manager to know which packages are needed and which are not. Unfortunately some really important packages (amongst some legacy packages) were removed. My laptop was not even able to start any boot loader - it booted straight to the device diagnosis application that the hardware manufacturer ships.
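A cautious way to use `dnf autoremove` is to look at the list of candidates first. The sketch below only previews what would be removed and changes nothing. The function name `preview_autoremove` is made up for illustration, and the second command typically needs root privileges.

```shell
#!/bin/sh
# Hypothetical helper: show what `dnf autoremove` would remove,
# without removing anything.
preview_autoremove() {
    if ! command -v dnf >/dev/null 2>&1; then
        echo "dnf not found; nothing to preview" >&2
        return 0
    fi
    # List packages that dnf considers no longer needed.
    dnf repoquery --unneeded || echo "repoquery failed" >&2
    # Show the full transaction, but answer "no" to the confirmation prompt.
    # dnf exits non-zero when the transaction is declined, hence `|| true`.
    dnf autoremove --assumeno || true
}

preview_autoremove
```

If anything boot-related (kernel, boot loader, firmware tools) shows up in that list, it is worth stopping and checking why dnf regards it as unneeded before running the real command.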