Back-up directories on shutdown: advice sought


(J) #1

I’m setting up a file server that will be booted only intermittently: it will be powered on only when I want to back up some file or directory to it from another computer on my network, so it will probably be powered off more than 90% of the time. I hope to implement Wake-on-LAN on this machine so that I can power it on remotely, but that’s an issue for another day. What I’d like to ask about in this thread is my plan to run automated back-ups of certain folders on this machine each time it is powered off.

First, an explanation of the drives/folders on the machine. The file server contains 3 independent hard drives and one RAID array. One hard drive runs the OS (Void); I don’t care about backing up any files/folders belonging to the OS on that drive. The RAID array contains one of the back-up folders, while a larger SATA drive houses another such folder. Both the RAID and that SATA drive get mounted whenever the server is powered on, say under /mnt/folder1 and /mnt/folder2. The remaining, even larger SATA drive is to stay unmounted except when the process of shutting down the server is initiated. At shutdown, I want the system to mount that remaining drive and back up to it the contents of /mnt/folder1 and /mnt/folder2. rsync is likely to be used for that task, since I aim to use something that transfers only the changes made within /mnt/folder1 and /mnt/folder2.

Some reading of various documents and internet forums leads me to believe that, on Void, the way to trigger the mounting and rsync’ing as part of the power-down procedure is to edit the file /etc/rc.shutdown: is that correct? So I would cobble together a simple bash script (not that my bash abilities allow me to do anything more advanced) that does those things, then put the path to that script in /etc/rc.shutdown? Some experiments I’ve done indicate that this should work.
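
If that’s the right mechanism, I imagine the addition to /etc/rc.shutdown could be as small as a single line calling the script; the script name and location here are just placeholders I’ve made up:

# appended to the existing /etc/rc.shutdown
/usr/local/bin/backup-on-shutdown.sh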

In its simplest form I suppose the script could look something like the following:
#!/bin/sh
# mount the back-up drive (sdX# stands for whatever the actual device turns out to be)
mount /dev/sdX# /mnt/back-up
# mirror both folders onto it; --delete removes files that no longer exist in the source
rsync -a --delete /mnt/folder1 /mnt/back-up &&
rsync -a --delete /mnt/folder2 /mnt/back-up

I have a couple of considerations. First, the rsync commands could take a really long time, especially on a first run; there are currently tens of gigabytes of data on each of the two drives in question. So, can I trust that the system will not power off before those rsync commands complete? I’m just not sure how rc.shutdown works and where it fits into the shutdown scheme. Can anyone offer clarification on that?

As far as the script itself goes, would there be any benefit to turning the rsync commands into a “for” loop? And maybe adding an explicit exit sequence, or some feedback sent to a terminal about the script’s success or failure?

And for extra points, someone has recommended that I do something like hashing the source and destination directories to confirm that they are, indeed, exact copies of one another. I see some sense in that plan, but the bit of research I’ve done indicates that hashing directories containing sub-directories is not a simple affair. For example, I don’t want much in the way of metadata checking in such a routine, since access times or file ownership are not really important for verifying file integrity in this case. Some sources, on the other hand, seem to indicate that getting a diff of the two could serve better. Does anyone have input on such a data-integrity enhancement to my back-up plan?

And just as importantly, what exactly would a hash do for me? OK, it could tell me whether the source and destination are accurate copies of one another. But say the hashes turn out to be different: what can be done then? The two folders under discussion contain somewhere between 80GB and 200GB of data, and mismatching hashes surely cannot tell me anything about where the mismatch (the error) between the two directories lies, can they? If not, then what good would the hashing do? A diff, on the other hand, seems like it could be more informative about where a problem lies, wouldn’t it?
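
To make that concrete, here is the sort of thing I’ve been imagining, untested and only to illustrate the question (the paths assume rsync has copied folder1 to /mnt/back-up/folder1):

# content-only comparison: hash every file on each side, sort, and diff the lists
(cd /mnt/folder1 && find . -type f -exec sha256sum {} + | sort -k 2) > /tmp/src.sha256
(cd /mnt/back-up/folder1 && find . -type f -exec sha256sum {} + | sort -k 2) > /tmp/dst.sha256
diff /tmp/src.sha256 /tmp/dst.sha256

# or: a recursive diff of the two trees, which reports the exact files that differ
diff -rq /mnt/folder1 /mnt/back-up/folder1

# or: ask rsync itself to re-compare everything by checksum without copying anything
rsync -ancv --delete /mnt/folder1 /mnt/back-up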


#2

I think you’re on the right track with rc.shutdown. With just two folders I wouldn’t use a for loop, just a sequence of commands (but I guess that’s a matter of taste). E.g.

mount ...
rsync ... folder1 ...
rsync ... folder2 ...
umount ...

The shutdown continues after rc.shutdown is done. You can see in /etc/runit/3 how rc.shutdown is invoked.
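
For reference, the relevant bit of /etc/runit/3 is something along these lines (paraphrased, check your own copy):

# run the user's shutdown hook, if it exists and is executable
[ -x /etc/rc.shutdown ] && /etc/rc.shutdown

So stage 3 simply waits for rc.shutdown to finish before it carries on with the rest of the shutdown (unmounting filesystems and finally powering off).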

When you create checksums for your backup you can detect corrupted data, but you cannot restore it once it is corrupted (unless you use some error-correction code).

But instead of rsync, you could also try one of the more advanced backup tools in the repo, e.g. borg.
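
A minimal borg workflow for your setup could look roughly like the following. This is only a sketch; the repository path and archive naming are examples, and you should read the borg docs on encryption and pruning:

# one-time: create a repository on the backup drive
borg init --encryption=none /mnt/back-up/borg-repo

# at each backup: create a new archive named after the current date and time
borg create --stats /mnt/back-up/borg-repo::'{now:%Y-%m-%d_%H:%M}' /mnt/folder1 /mnt/folder2

Because borg deduplicates, each archive only stores data that changed since the previous one, and borg check can verify the repository’s integrity later on.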


(J) #3

Thanks for your input, bluemoon. Glad to know it seems I am on the right track in figuring out how/where to trigger this planned back-up routine.

I’m now looking into borg. Incidentally, why specify the umount command in the script? Isn’t that part of the shutdown sequence that will resume once the script is finished? Maybe you’re thinking it will provide additional insurance against the back-up disk possibly being unmounted uncleanly?

Btw, here’s what I came up with so far in terms of a for loop for this process:

#!/bin/bash
mount /dev/sdX# /mnt/back-up
for command in 'rsync -avz /mnt/folder1 /mnt/back-up' 'rsync -avz /mnt/folder2 /mnt/back-up'
do
    echo
    echo "*** The output of $command command >"
    # run command
    $command
    echo
done

Not saying I will or should do that. But trying to figure out how to do it provided an opportunity to slightly improve my rather paltry bash scripting knowledge/ability.


#4

Well, filesystems are unmounted in /etc/runit/3 after rc.shutdown is run, but in my eyes it’s just “cleaner” to unmount something I mounted previously.

Regarding the for-loop I’d do something like

for folder in folder1 folder2; do
    ...
    rsync -avz /mnt/$folder  /mnt/back-up
    ...
done
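
Filled in with the mount/umount and some terminal feedback, the whole script might look something like this (untested sketch; sdX# is still a placeholder for the real device):

#!/bin/sh
mount /dev/sdX# /mnt/back-up

for folder in folder1 folder2; do
    echo "*** backing up /mnt/$folder"
    rsync -avz /mnt/$folder /mnt/back-up || echo "*** rsync for $folder failed"
done

# unmount what we mounted above
umount /mnt/back-up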

#5

I suggest just running a script whenever you want to back things up, and calling shutdown/poweroff at the end of it. Why hardcode it into the shutdown process?
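
Roughly what I mean (the script name is just an example):

#!/bin/sh
# back-up-and-poweroff: run the backups, then shut the machine down
mount /dev/sdX# /mnt/back-up
rsync -a --delete /mnt/folder1 /mnt/back-up
rsync -a --delete /mnt/folder2 /mnt/back-up
umount /mnt/back-up
poweroff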


#6

If you consider switching to btrfs, you can use btrfs features like snapshots of subvolumes and btrfs send | btrfs receive to back them up. This even works to a remote machine over ssh. In combination with snapper and cron jobs this is a wonderful setup, which I have been using successfully for more than two years. :slight_smile:
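
As a rough illustration (assuming /mnt/folder1 were a btrfs subvolume; all paths and names here are only examples):

# take a read-only snapshot of the subvolume
btrfs subvolume snapshot -r /mnt/folder1 /mnt/snapshots/folder1-snap

# send it to the backup filesystem, locally or over ssh
btrfs send /mnt/snapshots/folder1-snap | btrfs receive /mnt/back-up
btrfs send /mnt/snapshots/folder1-snap | ssh user@backuphost btrfs receive /backups

With a parent snapshot and btrfs send -p, only the differences get transferred on later runs.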


(J) #7

I’d initially dismissed the option of making a special script that calls shutdown/poweroff at the end because I’m in the habit of issuing those commands from the terminal when I want to shut off a machine. Making a new command to initiate the shutdown sequence would mean having to remember to do something different whenever I wanted to power off this particular machine, which is why I started looking into Void’s shutdown routine to see whether I could automatically trigger the back-up script whenever I issue the standard command. I’m not ruling out doing something along the lines of your suggestion, AnachronGuy; I’m just spelling out my initial rationale. I’m still working out how to accomplish what I want, so I’m taking all options into consideration.

Thanks to all contributors for the additional input offered in this thread.