Archive for the ‘OAIS’ Category

NDAD – The National Digital Archive of Datasets.

March 22, 2006

The last session of the day before getting on with our class project about the Internet Archive was on NDAD.

It’s a service that the ULCC perform for TNA, preserving UK government databases and records that are no longer in use. It sounds dull, but these databases include the National Inventory of Woodland and Trees, a survey of British bats, statistics on accidents in the home, crime statistics, and the names and assets of victims of Nazi persecution who were compensated by the UK government. The databases are all transferred to NDAD, who migrate them into sustainable formats, document them with good metadata and make them available to search online. It sounds like a great place to work if you’re interested in the history of computing, as some of these databases were the largest of their kind at the time and represent significant moments in that history. They also have to deal with all the legacy software and hardware issues, data analysis, system design and digital conversion, as well as the development of emulators, data recovery and so on. Hacker heaven, and it’s only a ten-minute walk from AI.

We assessed NDAD against the OAIS standard and it does pretty well. Since TNA essentially do the selection of the databases, NDAD have no say in negotiation at the Ingest stage, but together TNA and NDAD are a functioning OAIS archive. And it’s only a ten-minute walk from AI! It feels a bit like saying you live only ten minutes from Buckingham Palace.

Real life OAIS

March 21, 2006

Woke at 6.30, disorientated by furious birdsong. I am more used to a furious John Humphrys waking me in the morning.

Following on from yesterday’s introduction to the OAIS standard, we’ve just been discussing how the standard has been implemented at the ADS. Very useful to hear from a relatively small archive how they have tested basic compliance with the standard. The class followed the ‘digital pipeline’ of Submission Information Package (SIP), Archival Information Package (AIP) and Dissemination Information Package (DIP), more traditionally known as Deposit, Preservation and Access.
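The pipeline can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not how the ADS (or ADAM) actually does it: the class fields, the `ingest` and `disseminate` function names and the use of SHA-256 checksums for fixity are all hypothetical choices made for the example.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class SIP:
    """Submission Information Package: what the depositor hands over (Deposit)."""
    depositor: str
    files: dict[str, bytes]


@dataclass
class AIP:
    """Archival Information Package: what the archive stores (Preservation)."""
    depositor: str
    files: dict[str, bytes]
    checksums: dict[str, str] = field(default_factory=dict)


@dataclass
class DIP:
    """Dissemination Information Package: what a user receives (Access)."""
    files: dict[str, bytes]


def ingest(sip: SIP) -> AIP:
    """Turn a SIP into an AIP, recording a checksum for every file."""
    checksums = {
        name: hashlib.sha256(data).hexdigest()
        for name, data in sip.files.items()
    }
    return AIP(sip.depositor, dict(sip.files), checksums)


def disseminate(aip: AIP, wanted: list[str]) -> DIP:
    """Build a DIP from an AIP, re-checking fixity before anything is released."""
    for name in wanted:
        if hashlib.sha256(aip.files[name]).hexdigest() != aip.checksums[name]:
            raise ValueError(f"fixity check failed for {name}")
    return DIP({name: aip.files[name] for name in wanted})
```

A deposit would then flow through as `disseminate(ingest(sip), ["report.txt"])`. The point of keeping three separate package types is that what is deposited, what is stored and what is handed out need not be identical, and each step is a place where the archive can validate and audit what it holds.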

I’ve tried doing a similar exercise for AVR/ADAM in the past and was again reassured that our current situation is getting close to compliant in the broadest sense and that ADAM v2.0 should be pretty much there. The auditing features being implemented in ADAM v1.2 are important, as is the new Upload/Submission interface we’re developing for AI Teams. The discussion reminded me that there are a few more things to include in the services ADAM provides, but nothing that should give Merlin, Damon and co. too much of a headache.

OAIS Introduction.

March 20, 2006

I’ve been wrestling with the OAIS document/standard (Open Archival Information System) for about 18 months and have only recently, finally been able to understand its real-life application in detail. It’s a ‘functional model’ for digital preservation archives, developed collaboratively by institutions from all over the world, led by NASA and originally meant to provide a model for the preservation of their own space programme data. It became an approved ISO standard in 2002.

Here is an OAIS on the most general functional level:

[Figure: OAIS Functional Entities]

The tutor did an excellent job of showing how this relates to real world archiving and then discussed the model on a slightly deeper level, breaking down each ‘actor’, ‘object’ and ‘action’. It was reassuring to hear that the model is not meant to dictate each and every function of a ‘compliant’ archive, but rather to serve as a very thorough checklist for the design and functionality of a digital archive of any size (it’s a ‘standard’ after all). It is obvious when you read the standard that some of the functions are essential and any archivist would naturally expect to find them in any archive. Other functions might be useful to some archives and not to others, often depending on the size and remit, but each function does serve to stimulate archive managers into questioning whether their digital archive is serving the ‘designated community’ (users) correctly. When thinking of how ADAM v2.0 should function, I’ve used the OAIS standard as a model of ‘best practice’ and though we’ve got some way to go, it remains a useful benchmark to work with. At the highest level, it’s just acquisition, storage and access. Then, as you drill deeper into the detail, it raises many questions about workflow, authentication and validation of digital objects, and the ability to audit each service correctly and fully, and ultimately ensures the archive is designed to serve the community of people it functions for.
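Using the model as a checklist can be sketched like this. The six names are the functional entities of the OAIS reference model; the `gap_analysis` function and the example set of covered entities are hypothetical, made up purely for illustration and not a statement of what any real archive covers.

```python
from __future__ import annotations

# The six functional entities of the OAIS reference model.
OAIS_ENTITIES = (
    "Ingest",
    "Archival Storage",
    "Data Management",
    "Administration",
    "Preservation Planning",
    "Access",
)


def gap_analysis(covered: set[str]) -> list[str]:
    """Return the OAIS functional entities an archive does not yet cover.

    `covered` is the (hypothetical) set of entities the archive already
    implements. Unknown names are rejected so that typos are not silently
    counted as coverage.
    """
    unknown = covered - set(OAIS_ENTITIES)
    if unknown:
        raise ValueError(f"not OAIS functional entities: {sorted(unknown)}")
    return [entity for entity in OAIS_ENTITIES if entity not in covered]


# An imaginary archive that so far only does acquisition, storage and access.
missing = gap_analysis({"Ingest", "Archival Storage", "Access"})
print(missing)
```

A real assessment would go much further down into each entity’s individual functions, but even at this level the list of what is missing is a useful prompt for questions.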

It’s also about people, the archive staff and users. Clearly some areas of the functional model suggest a level of automation through the use of computing but other sections of the standard are about decision making processes and strategic planning.

We finished with a quick exercise to test our understanding of the OAIS model at its highest level. I scored four out of five. Not bad. I learn from my mistakes.

More OAIS tomorrow (and the next day and the day after that…)

Introductions & Why Bother?

March 20, 2006

Ten people are attending including me. Most are from publicly funded organisations such as the Scottish National Archives, Scottish National Library, The National Archives, N. Ireland Public Records Office, Natural History Museum, Durham University and The University of London. Reassuringly, most people said that, like AI, they are in the early stages of developing a digital archive and digital preservation policies.

The first presentation is entitled ‘Why bother? Incentives and risks in digital preservation.’ Mostly familiar stuff explaining why the preservation of digital data is important. The speakers acknowledge that they are preaching to the converted but I suppose it’s important we get it out of the way. Perhaps because many delegates are from publicly funded organisations, they’ve highlighted that one of the stimuli for developing preservation strategies and implementing digital preservation archives is that funding bodies expect it, so that data can be re-used. A later PPT slide (oh, sweet lamb of God! a week of PowerPoint presentations…) showed how, over the last decade, the re-use of existing data has increased at a rate far greater than the production of data. So it speaks for itself. If people want and expect to be able to re-use existing data, then there’s a need to preserve it.

Another interesting (and unverifiable) fact to come out of this first session is that the world produces the equivalent of 37,000 Libraries of Congress each year. I’m glad it wasn’t my job to work out the figures for that. Lots of data is being created each year, with 75% of it being digital. Yet three times that amount of data flows unrecorded, 99% of it via the telephone. So let’s not get too hung up on preserving every last piece of information we exchange. Like much in life, it’s about prioritising and selecting the right type of information to be recorded. The Curator’s job.

The now infamous BBC Domesday Project was discussed, and also the more interesting NASA Viking mission, where information from a Mars landing was ‘preserved’ on data tape which, when needed 30 years later, was found to be deteriorating despite being kept in what were considered decent environmental conditions. Significantly, it was this incident that prompted NASA to lead the creation of the OAIS Reference Model (see links to the right), the main subject of this week’s training programme.

Finally, this first session discussed Mind The Gap, the new report on the state of digital preservation in the UK. Highlights include:

  • 84% of respondents to a questionnaire for the report agreed there were legal drivers to preserve in their organisations.
  • 73% recognised that if they failed to comply they would be failing to meet legal requirements.
  • 70% of UK companies use email for contract negotiations, HR letters and financial transactions.
  • 81% were able to specify a lifetime for the digital information and had to keep some of it for at least 50 years. How?
  • 64% need to preserve digital data in order to protect intellectual property.
  • 22% preserve to support patent applications.
  • Over 80% recognised that their organisations would benefit from improved access to information brought about by having a suitably catalogued and searchable digital repository.
  • 50% of respondents still print out documents in order to preserve them!