As we increasingly work in the digital world there are new areas of work for information and records managers. We need to understand our agency’s capability to participate in these. Archives New Zealand is working towards being able to accept born-digital records and fundamental to this is the ability to produce checksums.
A checksum is a string of numbers and letters that acts as a fingerprint for a file, against which later comparisons can be made to detect errors in the data. Checksums are important because we use them to check files for integrity.
Our digital preservation policy uses the UNESCO definition of integrity.
“Digital content is information encapsulated in one or more digital objects. Within this context, integrity of a digital object is the quality of its content remaining ‘uncorrupted and free of unauthorized and undocumented changes’” (National Library of Australia/UNESCO. (2003). Guidelines for the Preservation of Digital Heritage. Retrieved from http://unesdoc.unesco.org/images/0013/001300/130071e.pdf)
Checksums are useful when moving files from one environment to another (e.g. validation after migration), and also for uniquely identifying the files we are working with.
Checksums will bridge the gap, quite literally, between agency and permanent preservation at Archives New Zealand during transfer or deposit. When you extract a file, it must remain identical to the copy held in your Content Management System. We will attempt to prove that unchanged state when we store the file in the Archives New Zealand digital repository. An exception procedure is triggered if anything unexpected has happened.
The procedure that yields the checksum is called checksum generation. Generation uses one of a collection of checksum functions, or algorithms, such as MD5 or SHA-1. These algorithms usually output a significantly different value even for small changes to the data. Comparing checksums therefore lets us confirm that a transmission was free of corruption. A mismatch also indicates that a file may have been tampered with; an important by-product of integrity is security.
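To make this concrete, here is a minimal sketch of checksum generation using Python's standard hashlib module. The function name checksum and the sample text are ours, chosen purely for illustration; your agency's tools may work quite differently.

```python
import hashlib


def checksum(data: bytes, algorithm: str = "md5") -> str:
    """Return the hexadecimal checksum of data using the named algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


# Illustrative sample data only: the two values differ by one character.
original = b"Archives New Zealand"
altered = b"Archives New Zealanc"

print("MD5 original:", checksum(original))
print("MD5 altered: ", checksum(altered))
print("SHA-1 original:", checksum(original, "sha1"))

# The same input always gives the same checksum; a changed input does not.
print("Unchanged copy matches:", checksum(original) == checksum(bytes(original)))
print("Altered copy matches:  ", checksum(original) == checksum(altered))
```

Running this prints two quite different MD5 values for the two inputs, even though only one character was changed.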
Once we can officially accept transfers on a regular basis, we will need to monitor checksums throughout the transfer or deposit lifecycle. There are two important points where we must confirm integrity. The first is when we receive the files (including checksums) from your agency and compare the supplied checksums to new ones that we generate. The second is when we deposit the files into the permanent repository and check them against the original transfer sent to us by your agency. Once the files are in the Archives New Zealand repository, we will continue to monitor the checksums to ensure the files remain unchanged in perpetuity.
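The comparison at each of these points might look something like the sketch below. It is not a description of the Archives New Zealand workflow; the function names file_checksum and find_mismatches, the choice of MD5, and the idea of holding the supplied checksums in a dictionary keyed by file name are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "md5") -> str:
    """Generate a checksum for a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(folder: Path, supplied: dict[str, str],
                    algorithm: str = "md5") -> list[str]:
    """Return the names of files that are missing or whose checksum differs.

    supplied maps a file name (relative to folder) to the checksum
    that accompanied the transfer (a hypothetical layout).
    """
    problems = []
    for name, expected in supplied.items():
        target = folder / name
        if not target.is_file():
            problems.append(name)  # file listed but not received
        elif file_checksum(target, algorithm) != expected.lower():
            problems.append(name)  # file received but changed
    return problems
```

An empty list from find_mismatches means every listed file arrived unchanged; anything else is a candidate for the exception procedure. Note that this sketch does not report extra files that were received but never listed.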
Checksums are generated with a tool such as: Free Commander (Windows); an online tool (http://www.md5.cz/); SHA1SUM, MD5SUM (Linux); or DROID (a cross-platform tool from The National Archives, UK). Then for validation we make comparisons using various tools: a spreadsheet; SHA1SUM, MD5SUM (Linux); AVPreserve Fixity http://vimeo.com/100311241; Checksum-comparator https://github.com/exponential-decay/checksum-comparator
There are many other tools out there and many internet links!
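If none of these tools is to hand, a short script can produce a basic checksum list. The sketch below is only one possibility: the function name write_checksum_list is hypothetical, and the layout (checksum, two spaces, then the file path, one file per line) is the one MD5SUM prints, which we are assuming is acceptable for your purposes.

```python
import hashlib
from pathlib import Path


def write_checksum_list(folder: Path, output: Path, algorithm: str = "md5") -> int:
    """Write one '<checksum>  <relative path>' line per file under folder.

    Returns the number of files listed. Keep output outside folder so the
    list itself is not checksummed on a later run.
    """
    lines = []
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        digest = hashlib.new(algorithm)
        with path.open("rb") as handle:
            # Read in chunks so large files do not need to fit in memory.
            for chunk in iter(lambda: handle.read(65536), b""):
                digest.update(chunk)
        relative = path.relative_to(folder).as_posix()
        lines.append(f"{digest.hexdigest()}  {relative}")
    output.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)
```

For example, write_checksum_list(Path("transfer"), Path("transfer-checksums.md5")) would list every file beneath a folder called transfer; both names are placeholders.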
To assess your own capability, here are some questions for you and/or your agency…
Does your agency use checksums and if so what type?
Has your agency used checksums in any other scenario e.g. for de-duplication?
Would your agency be able to create a checksum list like the one described?
If you have any questions or comments please use the blog comments space or email us at firstname.lastname@example.org