27 September 2006

betaBUMM... so, what happened?

Hard Disk failure

If you visited this site between Saturday and Tuesday evening, you may have noticed a friendly but unhelpful outage notice. The server was down. What happened? The server was moved to a proper server housing, which accounts for a few hours of planned downtime from Saturday morning till noon. But when the server was started up at the new location, it wouldn't boot. The machine printed something about a disk consistency failure, which is always a bad sign. On Tuesday evening, with a lot of help from friends, I was finally able to diagnose the problem further and bring the machine back up.

One of the hard disks in the Apple PowerMac G3 B&W failed. It happened during the move, either by coincidence or because of the physical movement. I'm lucky to have multiple disks in the machine, and the one that failed wasn't the one holding the main operating system partition.

What I've lost are the /home partitions and the partition for /var/mail, which is where incoming mail is stored. That's pretty bad. Again I'm lucky (or I was wise, depending on your point of view) to have made nightly backups of /home. I didn't do this for /var/mail. I move my own mail to my mail directory in /home once I've read it, but some other people on the machine don't do that. For them it is very important to make local copies of their mail in their mail client before reconnecting. Otherwise their mail client will delete its local copies once it finds they are gone from the server. This is how IMAP works: when a message is gone from the server, the client assumes that you deleted it.

Future outlook

The server started as a fun project with no particular purpose and no professional service attitude. But after some time it turned out that I like what it's doing so much that I would miss it if it were gone. That was the reason for moving it to proper housing when the previous location wasn't feasible any more. In the near future I will have to find a suitable replacement SCSI hard disk. I will then revise my backup strategy to cover more possibilities. But in any case, the machine is a hobby server and failures may happen again; that's just life. As they say, with hard disks it's not a question of "if" they fail, but "when".
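
As a rough illustration of what a nightly backup covering both /home and /var/mail might look like, here is a minimal Python sketch. It is not the script used on this server: the function name backup_directories and the /backup destination are assumptions made up for the example, and a real setup would want the archives written to a different physical disk or machine.

```python
import tarfile
import time
from pathlib import Path


def backup_directories(sources, dest_dir, stamp=None):
    """Write one dated, gzip-compressed tar archive per source directory.

    Returns the list of archive paths that were created.
    """
    stamp = stamp or time.strftime("%Y-%m-%d")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    archives = []
    for source in sources:
        source = Path(source)
        # e.g. /var/mail -> mail-2006-09-27.tar.gz
        archive = dest / f"{source.name}-{stamp}.tar.gz"
        with tarfile.open(archive, "w:gz") as tar:
            # Store paths relative to the directory name, not the full path.
            tar.add(source, arcname=source.name)
        archives.append(archive)
    return archives


if __name__ == "__main__":
    # Hypothetical destination; it should live on a different disk.
    backup_directories(["/home", "/var/mail"], "/backup")
```

Run once a night (from cron, for instance), this would keep one archive per directory per day. Old archives are never removed here, so some form of rotation would still be needed.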

Big thanks go to Charlott, Peter, Martin, and Jerome for all their help!

