Before rebooting lizard, a few preparatory steps are needed:
Check for convenience of rebooting lizard
Check the release calendar to find out whether developers will be working on a release at the time you plan to reboot.
Avoid rebooting:
- During the 2 days a release takes (this could screw up the RM's whole schedule for these 2 days).
- Until a couple of days after a release, as lots of users might be upgrading during this period.
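The rule above can be sketched as a small date check. This is only an illustration under stated assumptions, not part of the documented procedure: dates are given as YYYY-MM-DD, GNU date is available, the 2 days a release takes are assumed to be the day before the release and the release day itself, and "a couple of days" is assumed to mean 2 days. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: decide whether a planned reboot date falls in a window to avoid.
# Assumptions (not from the runbook): GNU date, YYYY-MM-DD input, release work
# covers the day before the release plus the release day, and "a couple of
# days" after the release means 2 days.

# Convert YYYY-MM-DD to a day number (days since the epoch, UTC).
day_number() {
    echo $(( $(date -u -d "$1" +%s) / 86400 ))
}

# Usage: reboot_allowed RELEASE_DATE REBOOT_DATE
# Returns 0 if the reboot date is outside the window, 1 if it is inside.
reboot_allowed() {
    release=$(day_number "$1")
    reboot=$(day_number "$2")
    offset=$(( reboot - release ))
    # Window to avoid: from 1 day before the release to 2 days after it.
    if [ "$offset" -ge -1 ] && [ "$offset" -le 2 ]; then
        return 1
    fi
    return 0
}
```

For example, `reboot_allowed 2024-01-10 2024-01-15 && echo "OK to reboot"` would print the message, since the (made-up) reboot date is 5 days after the (made-up) release date. The release calendar remains the authoritative source; adjust the window bounds to match what the RM actually needs.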
Icinga2
lizard has many systems and services monitored by Icinga2. We don't want to receive hundreds of notifications because they are down during the reboot. Icinga2 lets you schedule Downtimes so that failures within a given time window are ignored.
XXX Setting downtimes as described above also causes a flood of messages.
If the Icinga2 master host (ecours) has to be rebooted too, the easiest solution is to reboot it first and wait until lizard's reboot is over before typing the ecours passphrase. Otherwise, set up a Downtime for lizard:
- Visit the list of hosts that contain "lizard" in their names.
- Select the first host with a left-click.
- In the left split of the main content (where the host list moved), scroll down and SHIFT+click the last service to select them all.
- In the right split of the main content, click Schedule downtime.
- Set the downtime start and end time.
- Enable "All Services".
- You can check results in Overview → Downtimes.
Now that the downtime is scheduled, you can proceed with the reboot.
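As an alternative to the web interface, the same downtime could in principle be scheduled through Icinga2's REST API, using its `schedule-downtime` action. The sketch below is hedged: it assumes the API is enabled on the master, that you hold API credentials, and that your Icinga2 version supports the `all_services` attribute (2.11 or later). The environment variables `ICINGA_API_HOST`, `ICINGA_API_USER` and `ICINGA_API_PASS`, as well as both function names, are hypothetical; 5665 is Icinga2's default API port.

```shell
#!/bin/sh
# Sketch: schedule a downtime for all hosts matching "*lizard*" and their
# services via the Icinga2 API. Whether the API is reachable in this setup
# is an assumption; the web UI steps above are the documented method.

# Usage: downtime_payload START_EPOCH END_EPOCH AUTHOR COMMENT
# Prints the JSON request body. AUTHOR and COMMENT must not contain
# characters that need JSON escaping (this sketch does not escape them).
downtime_payload() {
    printf '{
  "type": "Host",
  "filter": "match(\\"*lizard*\\", host.name)",
  "start_time": %s,
  "end_time": %s,
  "author": "%s",
  "comment": "%s",
  "all_services": true
}\n' "$1" "$2" "$3" "$4"
}

# Usage: schedule_downtime START_EPOCH END_EPOCH AUTHOR COMMENT
# Sends the request; relies on the hypothetical ICINGA_API_* variables.
schedule_downtime() {
    curl -s -u "$ICINGA_API_USER:$ICINGA_API_PASS" \
        -H 'Accept: application/json' \
        -X POST "https://$ICINGA_API_HOST:5665/v1/actions/schedule-downtime" \
        -d "$(downtime_payload "$@")"
}
```

A one-hour downtime starting now could then be requested with `now=$(date +%s); schedule_downtime "$now" "$((now + 3600))" "$USER" "Planned reboot of lizard"`. Depending on how the master's TLS certificate is set up, curl may also need a `--cacert` option. Check Overview → Downtimes afterwards, as with the manual procedure.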
Boot the machine
- Start the machine. It usually takes ~2m30s for the Dropbear prompt to appear in the IPMI console and ~3m10s until Dropbear starts responding to pings.
- Connect to the IPMI console if curious (see [[lizard/hardware]]).
- Log in as root to the initramfs SSHd (dropbear, see fingerprint in the notes):
      ssh -o UserKnownHostsFile=/path/to/lizard-known_hosts.reboot \
          root@lizard.tails.net
- Get a LUKS passphrase prompt:
      /lib/cryptsetup/askpass 'P: ' > /lib/cryptsetup/passfifo
- Enter the LUKS passphrase.
- Do the LUKS passphrase dance two more times (we have 3 PVs to unlock). If you need to wait a long time between each passphrase prompt, it means #12589 is still not fixed; in that case:
  - report on the ticket
  - kill all pvscan processes

  Note: It usually takes 35s after all LUKS passphrases were entered until the system starts responding to pings.
- Reconnect to the real SSHd (as opposed to the initramfs' dropbear).
- Make sure the libvirt guests start:
      virsh list --all
- Make sure the various iso{builders,testers} Jenkins Agents are connected to the master; restart the jenkins-slave service for those that aren't:
      https://jenkins.tails.net/computer/
- Check on our monitoring that everything looks good.
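Two of the waiting and checking steps above lend themselves to small helpers. The following is a minimal sketch, not an established part of this procedure: `wait_until` and `guests_not_running` are hypothetical names, and the second function assumes the usual tabular layout of `virsh list --all` (two header lines, then Id, Name and State columns).

```shell
#!/bin/sh
# Sketch: helpers for the waiting and checking steps of the reboot.

# Usage: wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND every INTERVAL_SECONDS until it succeeds (returns 0),
# or gives up and returns 1 once TIMEOUT_SECONDS of sleeping has elapsed.
wait_until() {
    timeout=$1
    interval=$2
    shift 2
    elapsed=0
    while ! "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        elapsed=$(( elapsed + interval ))
    done
    return 0
}

# Usage: virsh list --all | guests_not_running
# Prints the name of every guest whose state is not "running".
# Skips the two header lines and any blank line.
guests_not_running() {
    awk 'NR > 2 && NF >= 3 && $3 != "running" { print $2 }'
}
```

For instance, `wait_until 300 5 ping -c 1 -W 2 lizard.tails.net` would wait for the host to answer pings; the 300-second limit is an assumption chosen to exceed the ~3m10s mentioned above. After reconnecting to the real SSHd, `virsh list --all | guests_not_running` should print nothing once every guest has started.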