After going to all the trouble of ripping and encoding my CDs to a lossless format, I want to:
- Ensure the integrity of the music library, i.e. be able at any point to verify that all the files exist, that their contents haven't changed and that there are no extra files.
- Have a recovery strategy should there be a problem with the files.
- I tried out Git, but after the initial commit of a music file, the repository took up twice the size of the music file on disk. Furthermore, changing metadata, such as fixing a spelling mistake in the track name, and committing again grows the repository by the full size of the file. I assume this is because the files are binary and already compressed. I didn't try Mercurial, but I expect it would behave the same way.
- The music files are already large, even without the extra overhead of the previous point, and the data transfer costs here in Australia are just too high.
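The doubling and regrowth described above can be reproduced with a small experiment. Git writes each committed version of a file as a separate zlib-compressed loose object, so until objects are packed (for example by git gc) even a few changed bytes add roughly another full copy, on top of the working copy itself. The sketch below is hypothetical: it uses a 1 MiB file of random bytes as a stand-in for already-compressed audio and assumes git, head and dd are available.

```shell
#!/bin/sh
# Hypothetical demo: how much does committing a binary file twice cost?
# Random bytes stand in for already-compressed FLAC audio (an assumption).
set -eu

demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

# 1 MiB of random data, which zlib cannot shrink.
head -c 1048576 /dev/urandom > track.flac
git add track.flac
git commit -q -m "Add track"

# Overwrite a few bytes in place, as a tag edit might, without changing the size.
printf 'Track 01 (fixed)' | dd of=track.flac bs=1 seek=100 conv=notrunc 2>/dev/null
git commit -q -a -m "Fix track name"

# Compare the object store with the file itself (both in KiB).
file_kb=$(du -sk track.flac | cut -f1)
objects_kb=$(du -sk .git/objects | cut -f1)
echo "track.flac: ${file_kb} KiB, .git/objects: ${objects_kb} KiB"
```

Running it should report .git/objects at roughly twice the size of track.flac, i.e. one full copy per commit.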
My current solution:
- Store the music library on a removable drive on the Mac at home.
- Keep a copy of the music library on my computer at work, either by periodically taking in the removable drive and using rsync, or, when physical space is at a premium (such as when cycling), by copying newer music onto a USB drive.
- Put checksums of the files in a Git repository stored on both machines. I can then verify the integrity of a music library at any time. Currently I use md5deep because it can recursively process a directory tree and is available for both Linux and Mac OS X. The default md5 program on the Mac does not seem to have the same feature set as md5sum on Linux.
- I also store FLAC fingerprints in the Git repository. FLAC files store a checksum of the uncompressed audio in their metadata, and various tools, such as xAct on the Mac, can verify the file against it. I am not sure how useful storing the fingerprints is, but I can think of a few unlikely situations where it might help, plus they are small and easy to generate anyway.
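The checksum and fingerprint files from the list above can be regenerated with a script along these lines. This is a minimal sketch under several assumptions: make_manifests, hash_tree and the flac-md5.txt file name are hypothetical, it falls back to md5sum or md5 -r when md5deep is missing, and it relies on metaflac (from the reference FLAC tools) and its --show-md5sum option to read the audio checksum stored in each FLAC file.

```shell
#!/bin/sh
# Sketch: rebuild the checksum manifest and FLAC fingerprints, then commit them.
# Assumptions: MUSIC_LIBRARY and GIT_REPO are set, GIT_REPO is an existing Git
# repository, and "flac-md5.txt" is a hypothetical file name.
set -eu

# Print "hash  relative/path" for every file below the current directory.
# Every machine must use the same tool, or the manifests will not diff cleanly.
hash_tree() {
    if command -v md5deep >/dev/null 2>&1; then
        md5deep -rl *
    elif command -v md5sum >/dev/null 2>&1; then
        find * -type f -exec md5sum {} +      # GNU coreutils
    else
        find * -type f -exec md5 -r {} +      # Mac OS X: "hash path", one space
    fi
}

make_manifests() {
    cd "$MUSIC_LIBRARY"
    # A fixed collation keeps the sort order identical on the Mac and on Linux.
    hash_tree | LC_ALL=C sort > "$GIT_REPO/md5deep.txt"

    # metaflac prints the MD5 of the decoded audio stored in each file's
    # STREAMINFO block, prefixed with the file name.
    if command -v metaflac >/dev/null 2>&1; then
        find * -type f -name '*.flac' \
            -exec metaflac --with-filename --show-md5sum {} + \
            | LC_ALL=C sort > "$GIT_REPO/flac-md5.txt"
    fi

    cd "$GIT_REPO"
    git add -A .
    # Only commit when something actually changed.
    git diff --cached --quiet || git commit -q -m "Update music library checksums"
}
```

Run it in a subshell, e.g. (make_manifests), because it changes directory. Whichever tool produces the manifest has to be the same on both machines, since md5 -r separates the hash and path with one space where md5deep and md5sum use two.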
To verify a music library, I do:
$ cd $MUSIC_LIBRARY
$ md5deep -rl * | sort | diff $GIT_REPO/md5deep.txt -
where $MUSIC_LIBRARY and $GIT_REPO represent the appropriate file paths.
I originally tried the matching feature of md5deep instead:
$ cd $MUSIC_LIBRARY
$ md5deep -rX $GIT_REPO/md5deep.txt *
However, this does not catch the case where a file has been deleted from the music library but is still present in the Git checksum file, because md5deep only reports on the files it finds on disk.
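The diff-based check can be wrapped in a function whose exit status is usable from other scripts. This is a minimal sketch: verify_library and hash_tree are hypothetical names, md5sum (or md5 -r on the Mac) stands in when md5deep is not installed, and LC_ALL=C is an added assumption so that sort produces the same order on both machines. The demo at the end deletes one file and adds another, and both show up in the diff, including the deletion that md5deep -X misses.

```shell
#!/bin/sh
# Sketch: check a library against a stored manifest and report every difference.
# Assumption: the manifest was built with the same hash_tree and sort as below.
set -eu

# Same "hash  relative/path" listing as used to build the manifest.
hash_tree() {
    if command -v md5deep >/dev/null 2>&1; then
        md5deep -rl *
    elif command -v md5sum >/dev/null 2>&1; then
        find * -type f -exec md5sum {} +
    else
        find * -type f -exec md5 -r {} +
    fi
}

# verify_library LIBRARY_DIR MANIFEST: exit status 0 if they match exactly.
# MANIFEST should be an absolute path, because the function changes directory.
# "<" lines: in the manifest but not found (deleted, or old hash of a changed file).
# ">" lines: on disk but not listed (new file, or new hash of a changed file).
verify_library() {
    (cd "$1" && hash_tree | LC_ALL=C sort | diff "$2" -)
}

# Demo on a throwaway library.
lib=$(mktemp -d)
manifest=$(mktemp)
mkdir -p "$lib/Album"
printf 'one' > "$lib/Album/01.flac"
printf 'two' > "$lib/Album/02.flac"
(cd "$lib" && hash_tree | LC_ALL=C sort) > "$manifest"

verify_library "$lib" "$manifest" && echo "library matches manifest"

rm "$lib/Album/02.flac"                 # a deletion, which md5deep -X cannot see
printf 'three' > "$lib/Album/03.flac"   # an extra file
verify_library "$lib" "$manifest" || echo "library differs from manifest"
```

A changed file produces both a "<" and a ">" line with the same path, so the output covers all three cases: deletions, additions and modifications.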