dirhash: create, verify and update md5 checksum files recursively, but per directory or per file
https://github.com/felf/dh
Copyright © 2013–2024 Frank Steinmetzger
Dh creates, verifies and updates file hashes recursively, storing them per
directory or per single file. It is written in Python 3.
There is nothing fancy to do: dh is a simple one-file program. You can copy it
wherever you want, such as ~/bin or /usr/local/bin.
dh has a built-in help:
$ dh -h
Dh has three modes; in all of them, it works recursively.
In check mode (the default), dh looks for a file called Checksums.md5 (the
default name). It verifies the hash of every file listed in that file, and it
checks whether any files exist that are not listed in the checksum file. With
the --paths option, it skips the hashing and only checks the file names. Use
this to quickly check your hash files and prune them of cruft.
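The following sketch illustrates the idea behind check mode for a single
directory (illustrative Python only, not dh's actual code; dh applies this to
every directory it recurses into). It assumes the standard md5sum line format
of 32 hex digits, two separator characters and the file name:

    import hashlib
    from pathlib import Path

    def md5_of(path: Path) -> str:
        """Hash a file in chunks so that large files do not fill memory."""
        h = hashlib.md5()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def check_dir(directory: Path, sumfile_name: str = "Checksums.md5") -> None:
        """Verify one directory against its checksum file."""
        sumfile = directory / sumfile_name
        listed = {}
        for line in sumfile.read_text(encoding="utf-8").splitlines():
            if len(line) > 34:          # 32 hex digits, two separators, a name
                listed[line[34:]] = line[:32]

        for name, digest in sorted(listed.items()):
            target = directory / name
            if not target.is_file():
                print("MISSING ", target)
            elif md5_of(target) != digest:
                print("FAILED  ", target)

        # files present on disk but not listed in the checksum file
        for entry in sorted(directory.iterdir()):
            if entry.is_file() and entry.name != sumfile_name \
                    and entry.name not in listed:
                print("UNLISTED", entry)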
In write mode, dh will hash all files in a directory and write the hashes to
the checksum file. It ignores the content of pre-existing checksum files, but
warns when it is about to overwrite one.
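In outline, write mode for one directory might look like this (again just an
illustrative sketch, reusing md5_of from the check-mode sketch above); the
sorted() call mirrors dh's habit of ordering entries by file name:

    def write_dir(directory: Path, sumfile_name: str = "Checksums.md5") -> None:
        """Hash every file in one directory and (over)write its checksum file."""
        files = sorted(p for p in directory.iterdir()
                       if p.is_file() and p.name != sumfile_name)
        lines = [md5_of(p) + "  " + p.name for p in files]
        (directory / sumfile_name).write_text("\n".join(lines) + "\n",
                                              encoding="utf-8")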
In update mode, dh first does a quick paths-mode check, but it will also
update a directory's checksum file: files that are not yet listed are hashed
and added. To remove entries from existing checksum files because a file no
longer exists, use the --update or --paths option in combination with
--delete.
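Conceptually, updating a single directory amounts to something like this
sketch (illustrative only, reusing md5_of from above). The point is that
entries which are already listed keep their hash, so unchanged files are not
re-hashed:

    def update_dir(directory: Path, delete: bool = False,
                   sumfile_name: str = "Checksums.md5") -> None:
        """Add hashes for new files; optionally drop entries for vanished ones."""
        sumfile = directory / sumfile_name
        listed = {}
        if sumfile.is_file():
            for line in sumfile.read_text(encoding="utf-8").splitlines():
                if len(line) > 34:
                    listed[line[34:]] = line[:32]

        if delete:  # like dh's --delete: prune entries whose file is gone
            listed = {n: d for n, d in listed.items()
                      if (directory / n).is_file()}

        for entry in sorted(directory.iterdir()):
            if (entry.is_file() and entry.name != sumfile_name
                    and entry.name not in listed):
                listed[entry.name] = md5_of(entry)   # hash only the new files

        lines = [d + "  " + n for n, d in sorted(listed.items())]
        sumfile.write_text("\n".join(lines) + "\n", encoding="utf-8")

For example, to prune stale entries and add new ones in one go:

$ dh --update --delete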
Due to my own usage experience over the years, I am thinking about removing
write mode in favour of the more flexible update mode (or at least changing
the default behaviour), because I usually do not need the all-or-nothing
principle of write mode. Instead, the normal use case is either to hash a
completely new tree (in which case --update does the same as write mode
anyway), or to update existing checksum files because a small portion of the
data files has changed.
Dh has a range of options to alter its behaviour; see dh -h for the full list.
I like to hash my media files for long-term storage, or when I suspect that a
storage medium has seen better days. In such cases I often need to hash
recursively, for example when dealing with a collection of music albums. But
unlike md5deep, which uses a single md5 file for everything, I want a separate
checksum file for every directory. Thus dh was born.
Over time, I might re-tag some of my files, which makes the existing checksums
obsolete. Thus, dh's second task is to prune cruft from checksum files and to
add files that are not yet listed. At the same time, it does not re-hash files
that are already listed, which saves a lot of time when dealing with gigabytes
of files. Dh also sorts the checksum files' entries by file name.
Over time, new use cases emerged. One of those is a single directory of big,
independent files (such as ISOs or movies). I wanted to hash those, too, but
keep their checksum files separate so it is easier to copy a single file out of
the directory without having to edit checksum files. Thus dh gained a mode to
write one checksum file per input file.
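In that per-file mode, each data file gets its own sidecar checksum file.
Conceptually it looks like this sketch (illustrative only; the <name>.md5
sidecar naming is my assumption here, not necessarily the scheme dh uses):

    def write_per_file(directory: Path) -> None:
        """Write one sidecar checksum file next to each data file."""
        for entry in sorted(directory.iterdir()):
            if entry.is_file() and entry.suffix != ".md5":
                # naming scheme assumed for illustration: <name>.md5
                sidecar = entry.with_name(entry.name + ".md5")
                sidecar.write_text(md5_of(entry) + "  " + entry.name + "\n",
                                   encoding="utf-8")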
Feel free to add new stuff or clean up the code mess that I created. :o) Dh is
written following standard Python formatting (PEP 8), although flake8 and
pylint contradict each other in certain areas, such as hanging indents. You
can use Github's facilities, drop me a mail or submit a pull request with your
own fix. ;-)
Here are some notable ToDos:
checksumtests.py