Yikes! We had some downtime today - here's what happened.

Who knew that changing a hostname was so difficult? I was doing some renaming of the aliases in the hosts file, along with the hostname itself. The hosts file was looking pretty generic, like so:

127.0.0.1 localhost lo-monty
209.[IP] pve01 monty monty.thegeekbin.com pve01.can.thegeekbin.com
209.[IP] pve02 cleo cleo.thegeekbin.com pve02.can.thegeekbin.com

Seems pretty straightforward, right? So I renamed and rebooted. But nothing came back up: Proxmox could no longer connect to the containers, and I had some serious downtime on my hands.

74 minutes of downtime

After some initial testing, I still wasn’t sure why the VMs weren’t switching over to the new hostname. It turns out Proxmox is never clean when you change the hostname: you’ve got to update a database file and a bunch of other files for a successful transfer. I didn’t have enough time for that, so I began to revert my changes systematically. But all of a sudden both Monty and Cleo (the hypervisors) stopped responding because pveproxy wouldn’t start (it failed to find an IP address), so I headed off to the data center.

Once I arrived at the data center, I noticed the services weren’t starting because one of my reverts in the hosts file had clobbered the original hostname… double yikes! A few keystrokes and a solid reboot later, we were back online. I then rushed home and expanded my database storage, and back online we went!

I had to run the following commands to get both systems back online:

# Diagnose the issues
systemctl status pvestatd
systemctl status pvedaemon
systemctl status pveproxy
systemctl start pveproxy # errors
# Google the error messages
echo "$OLDHOSTNAME" > /etc/hostname
vim /etc/hosts # replace new entries with old
systemctl start pvestatd # no errors? great!
systemctl start pveproxy # no errors? great!
# Reboot and hope it'll still work
reboot
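
In hindsight, a quick sanity check on the hosts file before that reboot might have saved me the drive. Here’s a minimal sketch, assuming the pveproxy failure came from the hostname no longer mapping to a non-loopback address in the hosts file (that’s my reading of the "failed to find an IP address" error, not something I’ve confirmed). check_hosts_entry is a made-up helper name, not a Proxmox tool:

```shell
#!/bin/sh
# Hypothetical helper: report whether a name maps to a usable
# (non-loopback) address in a hosts file. Defaults to /etc/hosts.
check_hosts_entry() {
    name="$1"
    hosts_file="${2:-/etc/hosts}"

    # Drop comments, then print the address of the first line listing the name.
    addr=$(awk -v n="$name" '
        { sub(/#.*/, "") }
        { for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit } }
    ' "$hosts_file")

    if [ -z "$addr" ]; then
        echo "missing"      # no line lists this name at all
        return 1
    fi

    case "$addr" in
        127.*|::1)
            echo "loopback" # name only points at the loopback interface
            return 1
            ;;
    esac

    echo "ok $addr"
}
```

Running check_hosts_entry "$(cat /etc/hostname)" after editing both files, and only rebooting when it prints "ok", would be one way to use it.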

Takeaways

  1. Always read the manual entirely — don’t skip ahead to the “important” bits, because you’ll miss something
  2. Set up a backup system to access servers remotely. My current iLO sits behind a firewall and depends on at least one server responding to give me an interface to it

That’s today’s adventure!