Archive for the ‘Preservation’ Category

Certification & Outsourcing

March 24, 2006

The last hour of the course was spent discussing the Trusted Digital Repositories (PDF) certification document and the related matter of outsourcing digital preservation to another institution.

Two of the tutors were part of the group that drafted the TDR document, and another of the tutors was involved in the peer review process. They gave some background to the document, which grew out of a recommendation from a 1996 Task Force on Archiving of Digital Information. They explained the basic difference between auditing a repository and certifying a repository, essentially saying that an audit is an internal or external process evaluating you against ideals. An audit is about continual improvement and does not necessarily have a pass/fail process. Certification, on the other hand, does result in a pass or fail, usually of an audit-like process. Certification may have elements of both processes such as ISO 9000 mandatory elements.

The authors of the TDR document were tasked with creating a standard certification process or framework that can be implemented across domains or types of digital repositories.

The tutor who has been involved in the peer review process criticised the draft document on the following grounds:

  • It sets the bar too high for many archives
  • The drafting of the document was not led by archivists
  • The authors claim that they are an international body, but in fact they are mostly N. American with some Europeans but no-one east of the Netherlands
  • The OAIS standard is flexible, but the TDR certification, which is based on OAIS, is very specific about what the archive should achieve

However, before we throw the document out on the above criticisms, it is worth remembering that it’s only a draft and that it still offers a very useful tool for internal audit and self-improvement. As one of the tutors listed all the reasons why we should care about certification, I realised that none of them specifically apply to AI but are really aimed at public institutions who have obligations to funding bodies, regulatory bodies, service purchasers, and external depositors. Of course, certification would still give a number of IRP staff a warm glow inside, smug with the satisfaction of international recognition of our work and the service we provide to our organisation, but I think the real value in the TDR document is that it provides practical guidance on how we can improve. Also, having audited our digital preservation archives using the TDR document, we may come to the conclusion that we’re just not up to or interested in developing an archive for long-term preservation and that we might want to outsource some or all of the work to another institution. Preferably one that is certified!

Outsourcing was touched upon at the very end of the hour and we were offered some step-by-step guidelines on approaching the possibility of outsourcing. Here they are, straight from the PPT slides:

We can’t all be experts in everything
We need to carry out some tasks we are not well-equipped to do
We may have resource reasons (money but no space)
We may have policy reasons
It may be cheaper

Understand your needs
Define your problem before you look for a solution
Otherwise you will buy the answer to someone else’s question
Specify mechanisms to monitor and measure performance
Look at the DPC document on outsourcing

Outsourcing digital preservation requirements may be an attractive option – particularly for smaller organisations
But these are also the most vulnerable in terms of what they should expect to ask for and receive
Having a system of certified repositories can help to provide assurance
The checklist of requirements can help organisations find a good match between what they think they asked for and what they receive.

That’s it really. I should add that the University of London Computing Centre, which is a ten-minute walk from AI, will almost certainly be certified as a TDR because one of the authors of the report runs the preservation programme there. And, yes, they take on consultancy work and are willing to discuss any outsourcing we might decide we want to do with them.

That’s the end of this blog. I hope you’ve enjoyed reading it and found at least some parts of it thought-provoking. As I said early in the week, I wanted to do it because Chris sent me on the course on the condition that I gave a presentation to interested staff when I returned, which I’m happy to do, although a presentation is probably not the way to go about it. Hopefully this blog has provided some background reading (and light entertainment) for us to discuss in the near future.

See you on Monday.


NDAD – The National Digital Archive of Datasets.

March 22, 2006

The last session of the day before getting on with our class project about the Internet Archive was on NDAD.

It’s a service that the ULCC perform for TNA by preserving UK government databases and records no longer in use. Sounds dull but of course these databases include the National Inventory of Woodland and Trees, a survey of British Bats, statistics on how many accidents there are in the home, crime statistics and the names and assets of victims of Nazi persecution who were compensated by the UK government. These databases are all transferred to NDAD who migrate them into sustainable formats, document them with good metadata and make them available to search online. It sounds like a great place to work if you’re interested in the history of computing as some of these databases were the largest of their kind at the time and represent significant historical moments in the history of computing. They also have to deal with all the legacy software and hardware issues, data analysis, system design and digital conversion as well as the development of emulators, data recovery and so on. Hacker heaven and it’s only a ten minute walk from AI.

We assessed NDAD against the OAIS standard and it does pretty well. Since TNA essentially do the selection of the databases, NDAD have no negotiation in the Ingest stage, but together TNA and NDAD are a functioning OAIS archive. And it’s only a ten minute walk from AI! It feels a bit like saying you live only ten minutes from Buckingham Palace.

Records Management and Digital Preservation

March 22, 2006

I wondered whether the next two sessions would be a bit dry, but not so. They were both led by an Archivist from the University of London Computing Centre who, in his free time, has a Friday evening radio show on Resonance FM.

This was a brief intro to RM discussing how it fits in with the digital preservation process. I’ve never been trained in RM so it was useful for me and these are the highlights of what I learned:

RM is the efficient control of the creation, receipt, maintenance, use, retention and disposition of records. It’s an archival skill but overlaps with business analysis and it’s assisted by international standards such as ISO 15489.

Why do we need records? Well, they might be evidence, required for accountability, for decision-making and to record institutional ‘memory’. Good records are authentic, accurate, accessible, complete and comprehensive. They are compliant, effective and secure. I was told that RM assists and supports an organisation’s business processes; it identifies and protects vital records, ensures legal and regulatory compliance, provides protection against litigation, and allows compliance with Freedom of Information legislation.

With the growth of digital records, there’s obviously been a massive quantitative increase in information. Digital records share the same issues as paper records, such as acquisition, preservation, storage and retrieval, but also present additional challenges. Digital records are characterised by being easy to create, copy, share, modify and store in multiple locations. They can be complex, transient, vulnerable, and software and hardware dependent.

Good digital records management is an underlying framework to good digital curation.

A sound migration plan is essential to good digital RM and inherent in the preservation planning recommended by OAIS. A migration plan is an essential part of ensuring that formats are retained and readable throughout their lifecycle.

Of course, digital records have metadata which also needs to be managed and assists with the authentication of a record. I was told that good electronic records management policy should cover the creation and capture of all corporate records within the RM system. It should cover the design and management of indexing and naming schemes. It should offer policy on the automated management of metadata, for retrieval and retention. It should ensure that records are ‘locked down’ to ensure their integrity and security. It should also provide guidance on the retention, preservation and destruction of digital records.
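
One common way to check that a ‘locked down’ record has not changed since capture is to store a checksum alongside it and recompute it later. The sketch below is only an illustration of that idea, not something shown on the course: the function names and the choice of SHA-256 are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large records need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: Path, recorded_digest: str) -> bool:
    """Compare the file's current digest with the one recorded at capture.

    `recorded_digest` is assumed to have been stored in the record's
    metadata when the record was first captured.
    """
    return sha256_of(path) == recorded_digest
```

A mismatch only says that the bytes have changed; it does not say why, so the recorded digest itself needs to be kept somewhere the record’s editors cannot alter.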

We discussed which types of records are selected for management: vital records needed to sustain the organisation’s business; records essential for legal compliance; and records with mid to long-term administrative value. These kinds of things should form the basis of a selection policy.

It all sounds like archiving to me with the exception that there’s more destruction in RM. I do see how RM is more business focused though and the issues of preservation are not always so difficult when you might be retaining records for shorter periods of time. Still, there’s no reason why RM shouldn’t fit into an OAIS environment. The main characteristics of selection, validation, fixity, preservation planning, metadata standards and retrieval/access are clearly very similar. Perhaps Fiona or Lynda can explain more to me when I get back.

A whole hour discussing file formats!

March 22, 2006

I departed from earth this afternoon. I’m not sure where I went but this session on file formats and then a further session on digital records management took me places I never thought I’d go.

The title of this class was ‘File Formats: Matters to Consider’, and I found it fascinating.

First, we were shown where file formats fit in the hierarchy of the IT system:

Semantic Layer
Actions Layer
Format Layer (Alright!)
Filesystem Layer
Media Layer

Then an anecdote about how some file formats and their creating applications are better used for some tasks and not for others. The tutor knew someone who wrote a novel in Excel because he didn’t have any other software to hand and I guess curiosity didn’t get the better of him either.

We did a quick exercise in what features to look for in a file format for preservation purposes. Not too difficult:

Open, documented, widely used and therefore supported, interoperable over different Operating Systems, lossless/no compression, metadata support, etc. etc.
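
The checklist above could be written down as a simple yes/no profile per format, which makes it easy to see which criteria a candidate format fails. This is a minimal sketch under stated assumptions: the class name, field names and function are hypothetical, and the answers for any real format would have to come from its own documentation.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FormatProfile:
    """Yes/no answers to the checklist questions for one file format."""
    open_specification: bool
    documented: bool
    widely_used: bool
    cross_platform: bool
    lossless: bool
    metadata_support: bool


def unmet_criteria(profile: FormatProfile) -> list[str]:
    """Return the names of the checklist criteria the format does not meet."""
    return [f.name for f in fields(profile) if not getattr(profile, f.name)]


# Hypothetical answers for an imaginary format, for illustration only.
example = FormatProfile(
    open_specification=True,
    documented=True,
    widely_used=True,
    cross_platform=True,
    lossless=False,
    metadata_support=True,
)
print(unmet_criteria(example))
```

An empty list would mean the format passes every question; anything else is a prompt for discussion rather than an automatic rejection.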

Another anecdote was that ten years ago, two men wrote a book detailing over 3000 graphic file formats. As the number of formats grew, it was revised and issued on a CD-ROM. Now it’s updated on the web. I’m sure Tim would love it.

I’ll state this here: ADAM handles two graphic file formats for a reason. They are both open, documented, widely used, well supported, interoperable and have metadata support. The list of supported graphic file formats may double or triple over time, but 3000+ demonstrates what an industry digital archives are having to deal with.

If you want guidance on file formats (and who doesn’t?), then look no further than these fine institutions:

FCLA Digital Archive
Harvard University formats registry
ERPAnet file formats
Library of Congress (my favourite).

We finished up by looking at the conversion of file formats, something which presents problems when you want to preserve the original integrity of the file’s content but in a more suitable or non-obsolete file format.

I could go on about file formats but let’s face it, we’ve both had enough for one day. Let’s talk at length in the ‘breakout’ area when I get back, OK?

Institutional Repositories

March 22, 2006

Maybe I should think up snappier title headings to these blogs. Believe me, occasionally I’m sitting in the class wondering how the hell I got here. Though I should say that the quality of the training programme so far has been very high and I’m finding it very engaging. The tutors are decent, down-to-earth people with practical advice. In other news, the stats for this blog suggest that most of ITP and IRP looked at it yesterday. Tomorrow’s stats should be interesting… 😉

Basically, this was a discussion on DSpace and the OCLC. Fedora was mentioned but only briefly. That’s OK, because Fiona, Damon and I attended a conference on Fedora last year. Damon’s an expert so ask him all the questions about Fedora… The implementation of a ‘trusted repository’ is central to digital archiving and the two main course documents are the OAIS standard and the follow-up document, Trusted Digital Repositories. The TDR document basically goes through all the attributes and responsibilities that an OAIS-compliant repository should have. The report defines a TDR as:

A trusted digital repository is one whose mission is to provide reliable, long-term access to managed digital resources to its designated community, now and in the future.

It’s a useful document for testing how well your institution is doing.

DSpace is a repository system that’s been developed at MIT. It’s very popular (OK, so that’s a relative term…) in the USA and some UK institutions use it too. From what I could see, it provides a customisable ‘repository out of the box’ and shares some functionality with a Content Management System.

DSpace has three preservation service levels, providing functional preservation through ‘supported’ (1) and ‘recognised’ (2) file formats and bit-level (3) preservation. I don’t think it is ‘OAIS compliant’ but clearly it follows the basic OAIS functional model of Ingest of Submission Information Packages, creation of Archival Information Packages and the creation of Dissemination Information Packages. The example we were shown worked very well for the submission and archiving of a document by an academic writer. From the Fedora conference we attended, I’d got the impression DSpace was a bit crap, but there’s some competition between the two systems so that shouldn’t be surprising. Fedora is a different animal really as it provides a suite of repository services which developers are expected to work with while DSpace is useable out of the box by people without programming skills.
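
The three service levels described above amount to a lookup from file format to a preservation promise, with bit-level preservation as the fallback. The sketch below illustrates that idea only. It is not DSpace’s API or its actual format registry: the names are hypothetical, the extension entries are invented placements, and falling back to bit-level for anything unlisted is an assumption.

```python
from enum import Enum
from pathlib import PurePath


class ServiceLevel(Enum):
    SUPPORTED = 1   # functional preservation
    RECOGNISED = 2  # functional preservation
    BIT_LEVEL = 3   # only the bits are kept safe


# Hypothetical registry: these entries are made up for illustration.
FORMAT_LEVELS = {
    ".tif": ServiceLevel.SUPPORTED,
    ".doc": ServiceLevel.RECOGNISED,
}


def service_level(filename: str) -> ServiceLevel:
    """Look up the service level by file extension, case-insensitively.

    Anything not in the registry falls back to bit-level preservation.
    """
    suffix = PurePath(filename).suffix.lower()
    return FORMAT_LEVELS.get(suffix, ServiceLevel.BIT_LEVEL)
```

A real repository would identify formats by more than the file extension, which is easy to get wrong or to change.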

OCLC, the Online Computer Library Center, provides a repository service for other institutions, so it is really an outsourcing solution. It is accessible over the web with OAIS-like functionality but, of course, still requires that you prepare your collection for Ingest.