Metadata is loosely defined as “data about data”. In electronic records management its definition can be refined to mean the factual information related to records such as who took a photograph, what the title of a publication is, where a video was shot, when a speech was recorded, and why a report is security classified. This information describes the content, context, and structure of records and supports their appropriate management for as long as they are needed.
While metadata can be used to describe any record whether it is paper, film-based, or electronic, it is especially important in electronic records management where there aren’t physical items to interact with or provide visual clues. While the term metadata is relatively new and is usually associated with electronic records, the concept has been around as long as there have been archives. The box lists, indexes, and finding aids traditionally used with paper and analog records provide information analogous to metadata but for physical records. Even the notes written on boxes, folders, binders, and envelopes would be considered metadata if they provided information used to locate or interpret records.
It is easy to distinguish among a glass plate, a 35mm slide, and an 8” x 10” print even if they carry the same image, but digital versions of each of these may appear identical to the naked eye when viewed on a computer monitor. Additional information, metadata, is required to ensure that digital files are distinguishable and that they can be maintained and accessed appropriately.
Categories of Metadata:
Metadata can describe almost any aspect of an electronic record. For convenience, individual metadata elements that work together to describe a particular aspect of a record can be referred to as belonging to categories. Anyone with a digital camera and a collection of images is probably familiar with administrative and descriptive metadata, but they may not be familiar with more specialized categories such as technical, preservation, and use metadata. Metadata viewed out of context is easy to misunderstand. It is important to note that while it is possible for a metadata label such as date to appear in more than one category, it may not have the same intent. It may indicate the date a photograph was originally taken, the date it was digitized, or the date it was migrated to a new format for preservation. As a result, metadata is often organized into schemas that define what labels mean and how they should be used.
Administrative Metadata is used to manage collections of records. Examples include the Transfer Request (TR) Number, the Record Group, and the name of the person authorized to transfer custody.
Descriptive Metadata identifies and describes records. Examples include a photograph’s caption, the title of a book, or the composer of a song.
Preservation Metadata is the specialized set of information required to preserve and provide access to electronic records. Examples include the file format used to encode a file, the software necessary to view it, or an action taken to maintain it such as the results of a virus scan.
Technical Metadata describes aspects of electronic records important to their proper interpretation, rendering, or playback. The type of compression used with a digital image, the audio codec contained in a digital video, or the algorithm used to digitally sign an email are all examples of technical metadata. Technical metadata is frequently included as preservation metadata because it is necessary for the maintenance of electronic records.
Use Metadata includes information that describes how records can be accessed or circulated. Metadata identifying copyright status or security classification are examples of use metadata.
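The point that one label can carry different meanings in different categories can be sketched in code. The following is only an illustration, not a real schema: the `Element` class, the `SCHEMA` list, the `definitions_for` helper, and the assignment of each date to a category are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One schema entry: a label plus the meaning it has within a category."""
    category: str
    label: str
    definition: str


# Hypothetical schema. The category chosen for each date is an assumption
# made for illustration; a real schema would define this itself.
SCHEMA = [
    Element("descriptive", "date", "Date the photograph was originally taken"),
    Element("technical", "date", "Date the photograph was digitized"),
    Element("preservation", "date", "Date the file was migrated to a new format"),
]


def definitions_for(label):
    """Return every meaning a label has, keyed by metadata category."""
    return {e.category: e.definition for e in SCHEMA if e.label == label}


for category, definition in definitions_for("date").items():
    print(f"{category}: {definition}")
```

Looking a label up together with its category removes the ambiguity that a bare "date" would otherwise have.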
Metadata examples:
Image 1. From NARA
Image 2. From Library of Congress
At first glance, these images appear to be simply versions of the same photograph. The metadata for each presents a much more interesting picture (pun intended).
If we consider the “who”, “what”, “where”, and “when” for each, we would expect to see a similar if not identical set of metadata elements. The photographer, geographic location, date, and caption should match. Let’s take a look, keeping in mind that the digital images we view are representations of physical originals:
Element | Image 1 | Image 2
Photographer | Mathew Brady | Mathew Brady
Caption | An interior view of Union breastworks on Little Round Top, Gettysburg, PA | Battle-field of Gettysburg. Temporary entrenchments thrown up by the Federal troops on Little Round Top. Big Round Top in the background
Date | ca. 1860 – ca. 1865 | Photographed 1863, July
Place | Little Round Top, Gettysburg, PA | Battle-field of Gettysburg.
So far so good. Now let’s examine some administrative metadata for each:
Element | Image 1 | Image 2
ID | National Archives Identifier: 530424 | LC Call Number LOT 4167-A, no. 15 [P&P]
URL | https://catalog.archives.gov/id/530424 | http://www.loc.gov/pictures/item/2012647707/
Collection | Record Group 111 Records of the Office of the Chief Signal Officer, 1860 – 1985 | Civil War Glass Negatives and Related Prints
And technical metadata:
Element | Image 1 | Image 2
Original Media | Negative | Albumen Print
Digital Format | GIF | JPEG
Resolution | 600 X 454 Pixels | 640 X 474 Pixels
File Name | union-breastworks-gettysburg.gif | 32841r.jpeg
File Size | 110 KB | 93.1 KB
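The comparison above can also be done programmatically. As a rough sketch, the snippet below holds a few of the elements listed for each image in plain dictionaries and reports the ones that differ. The values come from the tables; the dictionary layout and the `differing_elements` helper are assumptions made for illustration and do not reflect how either institution stores its metadata.

```python
# A few elements from the tables above, one dictionary per image.
image_1 = {
    "Photographer": "Mathew Brady",
    "Original Media": "Negative",
    "Digital Format": "GIF",
    "Resolution": "600 X 454 Pixels",
    "File Name": "union-breastworks-gettysburg.gif",
}
image_2 = {
    "Photographer": "Mathew Brady",
    "Original Media": "Albumen Print",
    "Digital Format": "JPEG",
    "Resolution": "640 X 474 Pixels",
    "File Name": "32841r.jpeg",
}


def differing_elements(a, b):
    """Hypothetical helper: map each element whose values differ to both values."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


for element, (first, second) in differing_elements(image_1, image_2).items():
    print(f"{element}: {first} | {second}")
```

The photographer matches, so it is not reported; every other element listed here distinguishes the two files.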
Both of these images originate from physical objects, a negative and an albumen print, rather than having been “born digital” from a digital camera. The originals, not these copies, are therefore the subject of ongoing preservation. The copies were produced solely to provide access to users, so there isn’t any preservation metadata to examine. But preservation metadata is extremely important, so important that there is an entire data dictionary, called PREMIS, dedicated to it. It identifies the information that an organization interested in preserving digital content should maintain. We’ll discuss PREMIS in detail in a future blog post.
Metadata in action:
The metadata for the images above provides a great deal of information that is useful to researchers and makes the images searchable and discoverable. Most importantly, anyone interested in the negative knows that it is in the holdings of the National Archives, while the albumen print is part of the collection of the Library of Congress. Equipped with the identifier or call number, they will be able to request additional information. On a practical level, each example has a caption that can be used for citing it in a paper, as well as technical information, including the format and resolution, that would allow a researcher to decide whether it could be used for printing. These images are both low resolution access copies, so while they might look fine in a presentation, a higher resolution copy would be needed to print a poster.
Metadata for Federal agencies:
Other administrative metadata is certain to exist at both NARA and the Library of Congress that permits both institutions to distinguish between the different representations of these images that they manage. While in both examples the record copy or original source image is a physical print or negative, multiple scanned copies will exist and be maintained. It is important for agencies to maintain metadata that clearly identifies and distinguishes between copies of records so that they can execute the appropriate disposition at the correct time. Additional metadata requirements for permanent records are identified in NARA Bulletin 2015-04, Metadata Guidance for the Transfer of Permanent Electronic Records.
Conclusion:
Hopefully we’ve been able to demonstrate the importance of metadata for records management. While two electronic records may appear to be visually identical, there may be differences that can only be determined through their metadata.
This is a complicated subject so please feel free to contact us at acps@nara.gov with any questions that you may have.
This can be used to determine data authenticity as well, yes? A fake image’s metadata would be significantly less? How do you prevent the creation of fake metadata?
Leah, thank you for your question. I’ve checked with some of our experts here. This was the response:
Dear Ms. Wolfe,
I am writing in response to your excellent question, “This can be used to determine data authenticity as well, yes? A fake image’s metadata would be significantly less? How do you prevent the creation of fake metadata?”.
Institutions responsible for preserving electronic records employ different methods to ensure that it is possible to determine the authenticity of copies of files. With digital images it is possible to make use of watermarks, while with text files digital signatures are often used. This page (https://www.gpo.gov/authentication/) on the Government Printing Office’s website explains how they use digital signatures as a part of their authenticity strategy.
Another approach that can be used with all file and record types is to make use of a checksum utility to generate a fixity value for a file. If you aren’t familiar with fixity, it is a bit like a very fine measurement of a file. If you change a single character or even insert a blank space in a text document, it will alter the fixity value. The Digital Preservation Coalition describes the use of checksums to determine fixity on their site (http://handbook.dpconline.org/technical-solutions-and-tools/fixity-and-checksums). Institutions often generate fixity values for the copies of records that they make available and record them as metadata. If someone alters a file in any way, including its metadata, its fixity value will change as well, making it possible to recognize that something is wrong with the altered copy.
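For readers who would like to see what a fixity check looks like in practice, here is a minimal sketch using Python's standard `hashlib` module. The choice of SHA-256 and the function names `fixity_value` and `is_unaltered` are assumptions made for this example; an institution may use a different algorithm or tool.

```python
import hashlib


def fixity_value(path, algorithm="sha256", chunk_size=65536):
    """Compute a checksum for the file at `path`, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unaltered(path, recorded_value, algorithm="sha256"):
    """Compare a freshly computed checksum with the value recorded as metadata."""
    return fixity_value(path, algorithm) == recorded_value
```

In use, the value returned by `fixity_value` would be recorded when a copy is made available. Running `is_unaltered` later against that recorded value returns `False` if even one byte of the file has changed.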