The last session before lunch was about metadata, or ‘data about data’, or ‘structured information about other information’. Pretty straightforward, really. One good example we were given was the label on the side of a can of beans. The beans are the ‘object’, which the label describes in a structured and predictable way, usually listing the weight, ingredients, nutritional values, company details, etc. Tesco’s archivists have got it easy.
The tutor discussed how there are different types of metadata in archiving that serve different purposes and users.
He started with metadata that assists ‘resource discovery’. The ‘most famous’ of all metadata standards for resource discovery is Dublin Core. ADAM v1.0 draws on a mixture of this standard and another called IPTC. We’ll be expanding the metadata we capture, and the standards we comply with, in ADAM v1.4 and v2.2, but for now we rely on just these two. Dublin Core is often criticised for being too simple, and with its mere 15 elements it is, but it can be expanded to 55 elements (‘rich’ or ‘qualified’ Dublin Core), and as a standard designed for interoperability on the web it’s very useful. We were given an exercise to catalogue the Mind The Gap report, which was handed out in class. It showed just how easy the standard is to apply, and therefore how useful it is.
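To make the ‘mere 15 elements’ concrete, here’s a minimal sketch of a simple (unqualified) Dublin Core record held as a Python dictionary. The element names are the real Dublin Core ones; the function name `make_dc_record` and the field values are my own placeholders, not the record we produced in class.

```python
from __future__ import annotations

# The 15 elements of the simple (unqualified) Dublin Core element set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def make_dc_record(**fields: str | list[str]) -> dict[str, list[str]]:
    """Build a simple Dublin Core record (hypothetical helper).

    Every element is optional and repeatable, so each value is stored
    as a list. Names outside the 15 elements are rejected.
    """
    record: dict[str, list[str]] = {}
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name!r}")
        record[name] = [value] if isinstance(value, str) else list(value)
    return record


# Placeholder values for illustration only.
report = make_dc_record(
    title="Mind The Gap",
    type="Text",
    subject=["placeholder subject A", "placeholder subject B"],
)
print(sorted(report))  # ['subject', 'title', 'type']
```

Because every element is optional and repeatable, cataloguing really is just a matter of filling in whichever of the 15 boxes apply.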
We’ll be using Dublin Core specifically for interoperability on the web when we make a selection of ADAM content publicly available. We’ll use the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), which makes excellent use of Dublin Core’s strengths as a standard.
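As a rough illustration of what harvesting involves, the sketch below serialises a simple Dublin Core record in the `oai_dc` format that OAI-PMH repositories use, and builds a `ListRecords` request for it. The namespaces, the `verb` and the `metadataPrefix` are from the OAI-PMH specification; the endpoint URL is hypothetical, the function names are my own, and I’ve left out the `xsi:schemaLocation` attribute a production repository would normally include.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Use the conventional prefixes when serialising.
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def to_oai_dc(record: dict[str, list[str]]) -> str:
    """Serialise a simple Dublin Core record as an oai_dc:dc element."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in record.items():
        for value in values:  # repeatable elements become repeated tags
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


def list_records_url(base_url: str) -> str:
    """Build an OAI-PMH ListRecords request asking for Dublin Core."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})
    return f"{base_url}?{query}"


record_xml = to_oai_dc({"title": ["Mind The Gap"], "type": ["Text"]})
print(record_xml)
print(list_records_url("https://example.org/oai"))  # hypothetical endpoint
```

A harvester only needs to know the base URL; everything else is fixed by the protocol, which is exactly why a simple, widely shared standard like Dublin Core suits it.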
We then moved on to the OAIS standard and its concept of an Information Package. This is a package that contains all the metadata needed for the Ingest, Preservation or Dissemination of a digital object (in OAIS terms, a Submission, Archival or Dissemination Information Package). It can even contain the digital object itself. “Incredible!” I hear you cry.
It’s a useful concept, though, and again I found it helpful to discuss its real-world application.
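Here’s a minimal sketch of the idea as a Python data structure. The three package types are the OAIS ones; the field names and the `ingest` function are my own simplification (OAIS divides the contents more finely), so treat this as an illustration rather than a model of how ADAM or any real repository stores packages.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class PackageKind(Enum):
    # OAIS names for the package at each stage.
    SIP = "Submission Information Package"     # used at Ingest
    AIP = "Archival Information Package"       # held for Preservation
    DIP = "Dissemination Information Package"  # delivered to users


@dataclass
class InformationPackage:
    kind: PackageKind
    # Descriptive metadata used for discovery (e.g. Dublin Core fields).
    descriptive: dict[str, str] = field(default_factory=dict)
    # Preservation Description Information: provenance, context, fixity, etc.
    preservation: dict[str, str] = field(default_factory=dict)
    # Paths of the digital object's files, if the package carries them.
    content_files: list[str] = field(default_factory=list)


def ingest(sip: InformationPackage) -> InformationPackage:
    """Turn a SIP into an AIP, copying its metadata and content list."""
    if sip.kind is not PackageKind.SIP:
        raise ValueError("ingest expects a SIP")
    return InformationPackage(
        PackageKind.AIP,
        dict(sip.descriptive),
        dict(sip.preservation),
        list(sip.content_files),
    )


sip = InformationPackage(PackageKind.SIP, {"title": "Photograph"},
                         content_files=["scan.tif"])
aip = ingest(sip)
print(aip.kind.value)  # Archival Information Package
```

In a real archive, ingest would also add preservation metadata (checksums, provenance) before the AIP is stored, which is where PREMIS comes in.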
Since this is a course about digital preservation, we concentrated on preservation metadata for the next hour or so. Preservation metadata typically covers the provenance, authenticity, preservation activity, technical environment and rights associated with a digital object. These need recording because digital objects are technology-dependent, mutable, and bound by intellectual property rights.
Fortunately, there’s a recently developed standard for recording preservation metadata about any digital object, and it’s called PREMIS. Apparently, it’s so good it won an award sponsored by Sir Paul McCartney. I bet John Lennon would have better things to do were he alive.
PREMIS is meant to be implementation-neutral: it represents the core information that might need to be preserved for any digital object, and it supports the automatic capture of information about that object. We looked at the PREMIS data model and learned that it breaks down into five main entities: the Intellectual Entity (e.g. a photo of a human rights victim), the Objects (the digital image file(s)), the Rights associated with the object, the Agents those Rights instruct, and the Events, i.e. what an Agent can do to the object for the purposes of preservation. These Events, in turn, can change the nature of the digital object. For example, we might have a photo (Intellectual Entity) which we scan to create a TIFF (Object). We apply Rights to the TIFF that permit ADAM or AVR (Agents) to transform (Event) it into a JPEG 2000 file (a newly recommended preservation format for digital images).
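To see how the five entities hang together, here’s a minimal sketch of the photo example in Python. It is not the PREMIS schema: the class names mirror the PREMIS entities, but the fields, the `"migrate"` permission string and the `migrate` function are hypothetical simplifications of my own.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IntellectualEntity:
    title: str


@dataclass
class PObject:  # PREMIS "Object", renamed to avoid clashing with Python's object
    identifier: str
    file_format: str
    represents: IntellectualEntity


@dataclass
class Agent:
    name: str


@dataclass
class Rights:
    # Actions this (hypothetical) rights statement permits on an Object.
    permitted_actions: set[str] = field(default_factory=set)


@dataclass
class Event:
    event_type: str
    agent: Agent
    source: PObject   # the Object the Event acted on
    outcome: PObject  # the new Object the Event produced


def migrate(obj: PObject, rights: Rights, agent: Agent, new_format: str) -> Event:
    """Create a new Object in new_format, but only if the Rights allow it."""
    if "migrate" not in rights.permitted_actions:
        raise PermissionError("rights do not permit migration")
    new_obj = PObject(obj.identifier + "-migrated", new_format, obj.represents)
    return Event("migration", agent, obj, new_obj)


photo = IntellectualEntity("Photograph")
tiff = PObject("img-0001", "TIFF", photo)
event = migrate(tiff, Rights({"migrate"}), Agent("ADAM"), "JPEG 2000")
print(event.outcome.file_format)  # JPEG 2000
```

Note that the new JPEG 2000 Object still points at the same Intellectual Entity: the photo stays the same thing even though its digital form has changed.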
For the Object, we’d record information such as file size, ‘significant properties’, date created, file format, creating application, etc. For the Rights, we’d record the agreement or licence we have, who granted that agreement, and the exact permissions granted over the preservation of the Object. For the Agent, we’d record the name, role and organisation of the person undertaking the Event. We’d also record the software name, version number, OS type, etc. For the Event, we’d record the type of event, the date/time, the outcome of the Event and the link to the Agent. These are just examples; the data dictionary goes into more detail.
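Some of this can be captured automatically, which is one of the points of PREMIS. The sketch below gathers a few Object properties from a file and stamps an Event record. The dictionary keys are my own shorthand, not PREMIS semantic unit names, and guessing the format from the file extension is a deliberate simplification (a real system would use a format identification tool). I’ve left out ‘date created’ because file creation times aren’t reported consistently across operating systems.

```python
from __future__ import annotations

import mimetypes
import os
from datetime import datetime, timezone


def describe_object(path: str, creating_application: str) -> dict[str, object]:
    """Collect a few Object properties (hypothetical key names)."""
    mime_type, _ = mimetypes.guess_type(path)  # guessed from the extension only
    return {
        "size_bytes": os.path.getsize(path),
        "format": mime_type or "unknown",
        "creating_application": creating_application,
    }


def record_event(event_type: str, outcome: str, agent_id: str) -> dict[str, str]:
    """Record an Event with its date/time (UTC), outcome and Agent link."""
    return {
        "event_type": event_type,
        "event_datetime": datetime.now(timezone.utc).isoformat(),
        "outcome": outcome,
        "linking_agent": agent_id,
    }
```

For example, `record_event("migration", "success", "agent-001")` returns a dictionary whose `linking_agent` is `"agent-001"`, with the current UTC time filled in; the agent identifier here is a made-up placeholder.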
Finally, we discussed METS, a metadata standard for structuring all the varied types of metadata captured during the archival process. Damon, Fiona and I have already looked at this in some detail, and we intend ADAM to be able to export METS packages. A METS package is meant to be a practical implementation of the OAIS Information Package, and it appears to do the job very well.
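For a flavour of what an export might produce, here’s a minimal sketch that builds an empty METS skeleton with Python’s standard library. The element names, namespaces and section order follow the METS schema as I understand it, but the IDs, the file URL and the `mets_skeleton` function are hypothetical, the metadata sections are left as empty placeholders, and I haven’t validated the output against the schema.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def mets_skeleton(file_href: str) -> ET.Element:
    """Build an empty METS document describing a single file."""
    root = ET.Element(m("mets"))
    # Descriptive metadata (e.g. a Dublin Core record) would be wrapped here.
    ET.SubElement(root, m("dmdSec"), ID="DMD1")
    # Administrative metadata: technical, rights and digital provenance
    # (where PREMIS-style preservation metadata would go).
    amd = ET.SubElement(root, m("amdSec"), ID="AMD1")
    for section in ("techMD", "rightsMD", "digiprovMD"):
        ET.SubElement(amd, m(section), ID=f"{section.upper()}1")
    # Inventory of the files that make up the digital object.
    file_sec = ET.SubElement(root, m("fileSec"))
    grp = ET.SubElement(file_sec, m("fileGrp"))
    file_el = ET.SubElement(grp, m("file"), ID="FILE1", ADMID="TECHMD1 DIGIPROVMD1")
    ET.SubElement(file_el, m("FLocat"),
                  {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": file_href})
    # structMap is the one mandatory section: it ties metadata and files together.
    smap = ET.SubElement(root, m("structMap"))
    div = ET.SubElement(smap, m("div"), DMDID="DMD1")
    ET.SubElement(div, m("fptr"), FILEID="FILE1")
    return root


print(ET.tostring(mets_skeleton("file:///placeholder/img-0001.tif"), encoding="unicode"))
```

The ID/IDREF links (`DMDID`, `ADMID`, `FILEID`) are what let one XML file hold descriptive, administrative and structural metadata side by side, which is why METS maps so neatly onto the OAIS Information Package.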
March 22, 2006 at 12:38 pm
Interesting. But what about the metadata CONTEXT?
- When were the metadata statements made?
- Who made these statements?
- Where were they made?
- etc.
Context is very important for reinterpreting the metadata when some of the underlying assumptions behind the metadata statements have to be re-evaluated:
- Is this a TRUSTED person?
- Geo-political references (e.g. East Germany)
- Names as Unique Identifiers (e.g. a person/city changing his/its name)
- General Assumptions (e.g. ‘Iraq is in civil war’)
- etc.
Amnesty International case modelling is a great example of how changes in your underlying assumptions can ripple through your metadata model.
March 22, 2006 at 6:12 pm
Yes, the metadata context is always expected to be recorded, too. So when Joss catalogues an image, the system should record that it was me doing so, on March 22nd 2006, using the ADAM system (ADAM doesn’t actually do this right now, but it soon will). In an archival system, adding metadata is always a controlled activity, so the users who do it are trusted.
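As a sketch of what recording that context could look like, assuming a simple whitelist of trusted cataloguers, each statement can carry its own who/when/where stamp. The class, the `catalogue` function and the trusted-user set are hypothetical; as noted above, ADAM doesn’t do this yet.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MetadataStatement:
    element: str            # e.g. a Dublin Core element name
    value: str
    recorded_by: str        # who made the statement
    recorded_at: datetime   # when it was made (UTC)
    system: str             # which system it was made in


def catalogue(element: str, value: str, user: str, trusted_users: set[str],
              system: str = "ADAM") -> MetadataStatement:
    """Accept a statement only from a trusted user, stamped with its context."""
    if user not in trusted_users:
        raise PermissionError(f"{user} may not add metadata")
    return MetadataStatement(element, value, user,
                             datetime.now(timezone.utc), system)


statement = catalogue("title", "Photograph", "Joss", {"Joss"})
print(statement.recorded_by, statement.system)  # Joss ADAM
```

Making the statement immutable (`frozen=True`) means a correction becomes a new, separately stamped statement rather than a silent overwrite, so the history of who said what, and when, survives.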
I think you’re thinking of open, social systems, which are not seen as part of the archival tradition and, as far as I know, are not catered for by existing standards. Archiving, and particularly the preservation process this course is about, is not an individual activity; it takes place in a controlled, institutional environment. In that environment, the metadata context is very clear (it is standards-compliant).
And the reason so many archival standards exist is so that we can reliably interpret metadata later by understanding the particular standard the object was catalogued against (the standard defines the ‘archive environment’). Personal names are indeed difficult to deal with, but there are recognised rules for handling them that librarians and archivists apply. Subjective statements are identified as such in the metadata structure. Geo-political references are interpreted according to the standard you choose to refer to; in our case, that’s AI’s country keyword list. Others may use the ISO 3166-1 standard.
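The key idea is that a coded value is only meaningful alongside the name of the scheme it came from. Here’s a minimal sketch, assuming each geographic value is stored as a (scheme, code) pair. The two ISO 3166-1 alpha-2 codes are real, but the table is a tiny partial extract; AI’s country keyword list would be registered as another scheme, and its contents aren’t shown here. The `resolve` function and registry are hypothetical.

```python
from __future__ import annotations

# A tiny, partial extract of ISO 3166-1 alpha-2 codes, for illustration only.
ISO_3166_1 = {"DE": "Germany", "FR": "France"}

# Registry of schemes a value may be catalogued against.
SCHEMES: dict[str, dict[str, str]] = {"ISO 3166-1": ISO_3166_1}


def resolve(scheme: str, code: str) -> str:
    """Interpret a coded value using the scheme it was catalogued against."""
    try:
        vocabulary = SCHEMES[scheme]
    except KeyError:
        raise ValueError(f"unknown scheme: {scheme}") from None
    return vocabulary[code]  # raises KeyError if the code isn't in the scheme


print(resolve("ISO 3166-1", "DE"))  # Germany
```

Because the scheme name travels with the code, a later reader can look up what that code meant in that version of the standard, even if the country or its name has since changed.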