The failure seems to have been in the main firewall. If it had been the server itself, we could easily have restored it on another server from the backups kept on another machine. As it stands, though, remote access is cut off entirely.
There is usually another person with physical access to the hardware, but they are away on summer holidays. This seemed like an acceptable risk at the time…
An off-site backup would have been nice, of course, but given the cost of running a Lemmy instance of that size on a rented server, it would not have been a great option either.
I have plans to add a KVM to the main firewall via a secondary connection, but even that might not have helped in this case. I'll know more when I have physical access again.
I've done a lot of SysAdmin and DCOps work in the past, so I thought I'd give you some plausible suggestions (I haven't dug deep into Lemmy's database or the DNS/federation side of the stack, so I'm not sure all of this is practical).
Scenario 1 - Preserve and merge when access is restored
Setup
Spin up two VMs/VPS (or one that has enough grunt for two Lemmy servers). Call them robak.slrpnk.net and slrpnk.net and point DNS appropriately.
Pull federated content from other instances, place it on robak, and set robak to read-only.
Sync important communities (comms) to (new) slrpnk.net, without their content.
Allow users to sign up, vetting them where possible (all mods). Keep a list of those who are vetted (call it vetted.list). Inform all users that content from non-vetted users will be dropped when access is restored.
Merge!
Once access is restored, ensure that (old) slrpnk.net is set to read-only.
Schedule a maintenance window (announce more time than you are likely to need).
During the maintenance window, put (new) slrpnk.net into R/O, or just block external access.
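The vetting rule in the setup steps implies a filtering pass during the merge: content on (new) slrpnk.net is only carried over if its author appears in vetted.list. A minimal Python sketch of that pass, assuming vetted.list holds one username per line; the `Post` record, its fields and the function names are hypothetical and do not reflect Lemmy's actual database schema or API.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Post:
    """Hypothetical, simplified stand-in for a post/comment row (not Lemmy's real schema)."""
    author: str
    community: str
    body: str


def parse_vetted(text: str) -> set[str]:
    """Parse vetted.list contents: one username per line; blank lines and '#' comments are ignored."""
    names: set[str] = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.add(line)
    return names


def load_vetted(path: Path) -> set[str]:
    """Read vetted.list from disk and parse it."""
    return parse_vetted(path.read_text(encoding="utf-8"))


def split_by_vetting(posts: list[Post], vetted: set[str]) -> tuple[list[Post], list[Post]]:
    """Return (kept, dropped): posts by vetted authors are kept, everything else is dropped."""
    kept: list[Post] = []
    dropped: list[Post] = []
    for post in posts:
        if post.author in vetted:
            kept.append(post)
        else:
            dropped.append(post)
    return kept, dropped
```

The `kept` list would still have to be imported into (old) slrpnk.net by whatever route the Lemmy database allows, which this sketch does not attempt. Holding on to `dropped` until the merge has been checked is a cheap safety net before anything is discarded for good.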
Is it in a data centre or someone’s house? If the latter, would they let a stranger in?
Surely they would need a backup and a replicated database, so that in case of hardware failure they can switch over.
Sounds like they could improve their setup.
Too much of a single point of failure.
Slrpnk.net admin here.
Appreciate the answer and the detail. Good luck getting it all resolved.
Scenario 2 - Server is in DC or Admin able to facilitate access
Probably quite expensive, and when doing something as a hobby it’s often hard to get the funds.