After going to all the trouble of ripping and encoding my CDs to a lossless format, I want to:
- Ensure integrity of the music library, i.e. at any point be able to validate that all the files exist, their contents haven't changed and that there are no extra files.
- Have a recovery strategy should there be a problem with the files.
- I tried out Git, but after the initial commit of a music file, the repository took up twice the size of the music file on the filesystem. Furthermore, changing metadata, such as fixing a spelling mistake in a track name, and committing again increases the repository by the full size of the file. I assume this is because the files are binary and already compressed. I didn't try out Mercurial, but I expect it would be the same.
- The music files are already large, even without the extra overhead of the previous point, and the data transfer costs here in Australia are just too high.
My current solution:
- Store the music library on a removable drive on the Mac at home.
- Keep a copy of the music library on my computer at work, either by periodically taking in the removable drive and using rsync, or by copying newer music onto a USB drive when physical space is at a premium, such as when cycling.
- Put checksums of the files in a Git repository stored on both machines. I can then verify the integrity of a music library at any time. Currently I use md5deep because it can recursively process a directory tree and is available for both Linux and Mac OS X. The default md5 program on the Mac does not seem to have the same feature set as md5sum on Linux.
- I also store FLAC fingerprints in the Git repository. FLAC files store a checksum of the uncompressed audio in the metadata and various tools, such as xAct on the Mac, can verify the file against that. I am not sure how useful storing the fingerprints is, but I can think of a few unlikely situations where it might be helpful, plus it is small and easy to generate anyway.
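For the curious, the FLAC fingerprint can also be read without any special tool. This is only a rough sketch, not what xAct or any other program actually does: it assumes the file starts directly with the "fLaC" marker (no ID3v2 tag in front) and that the first metadata block is STREAMINFO, whose last 16 bytes hold the MD5 of the uncompressed audio. The function name `flac_audio_md5` is made up for this example.

```python
def flac_audio_md5(path):
    """Return the audio MD5 stored in a FLAC file's STREAMINFO block, as hex.

    Assumes the file begins with the 'fLaC' marker and that STREAMINFO is
    the first metadata block. The stored value is not recomputed here.
    """
    with open(path, "rb") as f:
        if f.read(4) != b"fLaC":
            raise ValueError("no fLaC marker at start of file")
        header = f.read(4)
        if len(header) != 4:
            raise ValueError("truncated metadata block header")
        block_type = header[0] & 0x7F            # top bit is the last-block flag
        length = int.from_bytes(header[1:4], "big")
        if block_type != 0 or length < 34:
            raise ValueError("first metadata block is not STREAMINFO")
        streaminfo = f.read(length)
        if len(streaminfo) < 34:
            raise ValueError("truncated STREAMINFO block")
    # Bytes 18..33 of STREAMINFO are the MD5 of the unencoded audio.
    return streaminfo[18:34].hex()
```

A result of all zeros means the encoder did not record a fingerprint. Note that this only reads the stored value; actually verifying a file still requires decoding the audio and comparing, which is what the dedicated tools do.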
To verify a music library, I do:
$ cd "$MUSIC_LIBRARY"
$ md5deep -rl * | sort | diff "$GIT_REPO/md5deep.txt" -
where $MUSIC_LIBRARY and $GIT_REPO represent appropriate file paths.
I originally tried the matching feature of md5deep instead:
$ cd "$MUSIC_LIBRARY"
$ md5deep -rX "$GIT_REPO/md5deep.txt" *
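However, this does not catch the case where a file has been deleted in the music library but is still present in the Git checksum file. Negative matching only reports files on disk whose hashes are not in the list, so a hash with no corresponding file goes unnoticed.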
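The same three-way check (missing, extra and changed files) can be sketched in plain Python for machines where md5deep is not installed. This is not what I actually run, just an illustration of the comparison that the sort and diff pipeline performs. It assumes the checksum file has one "hash, whitespace, relative path" entry per line, as produced by the md5deep command above, and all the function names are invented for the example.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large music files are not read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_library(root):
    """Return {relative path: md5} for every file below root."""
    sums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            sums[os.path.relpath(full, root)] = md5_of_file(full)
    return sums


def load_manifest(manifest_path):
    """Parse lines of the assumed form '<hash>  <relative path>'."""
    sums = {}
    with open(manifest_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            digest, path = line.split(None, 1)  # split on the first whitespace run
            sums[path] = digest
    return sums


def compare(expected, actual):
    """Return (missing, extra, changed) lists of relative paths."""
    missing = sorted(set(expected) - set(actual))   # in the manifest, not on disk
    extra = sorted(set(actual) - set(expected))     # on disk, not in the manifest
    changed = sorted(p for p in set(expected) & set(actual)
                     if expected[p] != actual[p])
    return missing, extra, changed
```

Calling `compare(load_manifest(manifest), scan_library(library))` with the two paths then gives the three lists. One difference to be aware of: the shell `*` skips hidden files at the top level, whereas `os.walk` visits everything, so files such as .DS_Store would show up as extra here.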