Certification & Outsourcing

March 24, 2006 by

The last hour of the course was spent discussing the Trusted Digital Repositories (PDF) certification document and the related matter of outsourcing digital preservation to another institution.

Two of the tutors were part of the group that drafted the TDR document and another of the tutors was involved in the peer review process. They gave some background to the document, which grew out of a recommendation from a 1996 Task Force on Archiving of Digital Information. They explained the basic difference between auditing a repository and certifying a repository, essentially saying that an audit is an internal or external process evaluating you against ideals. An audit is about continual improvement and does not necessarily have a pass/fail process. Certification, on the other hand, does result in a pass or fail, usually of an audit-like process. Certification may have elements of both processes such as ISO 9000 mandatory elements.

The authors of the TDR document were tasked with creating a standard certification process or framework that can be implemented across domains or types of digital repositories.

The tutor who has been involved in the peer review process criticised the draft document on the following grounds:

  • It sets the bar too high for many archives
  • The drafting of the document was not led by archivists
  • The authors claim that they are an international body, but in fact they are mostly N. American with some Europeans but no-one east of the Netherlands
  • The OAIS standard is flexible but the TDR certification, which is based on OAIS, is very specific about what the archive should achieve

However, before we throw the document out on the above criticisms, it is worth remembering that it’s only a draft and that it still offers a very useful tool for internal audit and self-improvement. As one of the tutors listed all the reasons why we should care about certification, I realised that none of them specifically apply to AI but are really aimed at public institutions who have obligations to funding bodies, regulatory bodies, service purchasers, and external depositors. Of course, certification would still give a number of IRP staff a warm glow inside, smug with the satisfaction of international recognition of our work and the service we provide to our organisation, but I think the real value in the TDR document is that it provides practical guidance on how we can improve. Also, having audited our digital preservation archives using the TDR document, we may come to the conclusion that we’re just not up to or interested in developing an archive for long-term preservation and that we might want to outsource some or all of the work to another institution. Preferably one that is certified!

Outsourcing was touched upon at the very end of the hour and we were offered some step-by-step guidelines on approaching the possibility of outsourcing. Here they are, straight from the PPT slides:

Rationale
We can’t all be experts in everything
We need to carry out some tasks we are not well-equipped to do
We may have resource reasons (money but no space)
We may have policy reasons
It may be cheaper

Understand your needs
Define your problem before you look for a solution
Otherwise you will buy the answer to someone else’s question
Specify mechanisms to monitor and measure performance
Look at the DPC document on outsourcing

Outsourcing
Outsourcing digital preservation requirements may be an attractive option – particularly for smaller organisations
But these are also the most vulnerable in terms of what they should expect to ask for and receive
Having a system of certified repositories can help to provide assurance
The checklist of requirements can help organisations find a good match between what they think they asked for and what they receive.

That’s it really. I should add that the University of London Computing Centre, which is a ten minute walk from AI, will almost certainly be certified as a TDR because one of the authors of the report runs the preservation programme there. And, yes, they take on consultancy work and are willing to discuss any outsourcing we might decide we want to do with them.

That’s the end of this blog. I hope you’ve enjoyed reading it and found at least some parts of it thought provoking. As I said early in the week, I wanted to do it because Chris sent me on the course on the condition that I gave a presentation to interested staff when I returned, which I’m happy to do, although a presentation is probably not the way to go about it. Hopefully this blog has provided some background reading (and light entertainment) for us to discuss in the near future.

See you on Monday.

Access

March 23, 2006 by

The last class of the day, before getting on with our Internet Archive project (which has been very instructive), was about providing access to archival collections. Good, common-sense advice was dished out which we all pretty much knew but were pleased to hear again:

We preserve because we expect access.
We must be able to derive one from the other.
We don’t have to let one dictate the other.
Access needs may drive decisions at ingest e.g. on metadata.
There are many ways to provide access.

We were advised to ‘preserve enough to tell the story’ and that good preservation refers to ‘preserving meaningful information through time’. Quotes like that can come in handy sometimes.

Of course, depending on the archive’s remit, there are various reasons why we might have to provide access (FOI) or restrict access (DPA). Fortunately only the latter applies to us right now. There was also a brief discussion about redacting certain information before providing access, something that we do in a way with MAV’s products and transcripts for security reasons before they are put on the public database. I don’t know if we do it to documents, too. Do we?

Finally, the OAIS standard clearly has a strong element dedicated to ensuring access. It’s based around knowing your ‘designated community’ which may be as broad as ‘people who can read English and use the internet’ (The National Archives) or as narrow as ‘my friends and family’ (imagine an online photo service like Flickr). As usual some of the best advice was about planning ahead, being proactive about offering ways to access material, separating the preservation infrastructure from the access infrastructure and collaborating with other repositories.

Legalities

March 23, 2006 by

This was a post-lunch crash course in Intellectual Property Rights, Copyright, Digital Rights Management, Freedom of Information Act, Data Protection Act and Legal Deposit. No depth but some useful highlights and another interesting case study from the ADS concerning the misuse of images and how they dealt with it.

Basically, the ADS received a collection of data, including images, from the excavation of Christ Church, Spitalfields. The images in this very interesting collection show the remains of bodies buried in the 18th-century crypt. These images were found by a web site for necrophilia enthusiasts and some images were copied from the ADS site and republished on the sex-with-dead fan site. They also provided a link through to the ADS for enthusiasts to grab more images for themselves.

This was their mistake, because the ADS noticed an unexpected spike in the use of its website and traced it back to the link from the other website. It was the first time they’d had to deal with the misuse of their digital collections and they sought advice from the JISC legal team. They were advised to follow a six-point plan spread over 70 days. The first step was simply to contact the website, tell them they had broken the licence they agreed to on the ADS website and that they must take the images down or else face further legal action. And they did take them down. End of story.

This was a satisfactory result for the ADS because, despite the misuse of the images, it would have been a long and difficult legal process had the web site not taken them down.

We’ve been thinking about such things for ADAM and intend to introduce a ‘handshake’ agreement prior to the download of ADAM images. Having seen how the ADS handle this ‘contract’ with its users, I’m now inclined to just have users agree to a licence when they first enter an ADAM session rather than each time they click to download. Legally it would appear to cover us. Our present system is based on authentication into the AI Intranet and then trusting that the AI staff member will respect the terms and conditions that are displayed with each image, but we think we can do better than this with little inconvenience to users. There will also be more changes to the way ADAM handles rights management and licence agreements.
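To make the ‘agree once per session’ idea concrete, here is a minimal sketch in Python. It is purely illustrative: the class and method names (`AdamSession`, `accept_licence`, `download`) are hypothetical and do not describe how ADAM is actually built. It only shows the logic of checking for a one-off licence acceptance rather than prompting on every download.

```python
class AdamSession:
    """Hypothetical model of one user's ADAM session (illustration only)."""

    def __init__(self, user):
        self.user = user
        # The licence has not been agreed to when the session starts.
        self.licence_accepted = False

    def accept_licence(self):
        # Recorded once, when the user first enters the session.
        self.licence_accepted = True

    def download(self, image_id):
        # Every download checks the session flag instead of re-prompting.
        if not self.licence_accepted:
            raise PermissionError("Licence must be accepted before downloading")
        return f"{self.user} downloaded {image_id}"


session = AdamSession("staff")
session.accept_licence()
print(session.download("img-1"))
print(session.download("img-2"))  # no second prompt needed
```

The design choice is simply where the check lives: on the session rather than on each click, which keeps the inconvenience to users low.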

The main piece of advice that the ADS gave from this example was that archives should not wait for the abuse of their content before forming a response but rather formulate a strategy for dealing with a potential incident so we can react quickly, methodically and legally. Wayne, Claire and Tim will know more about whether we’ve had to deal with this already. I’m not aware of such a strategy being in place though. In late May, an IPR expert from the Open University will be giving a one-day workshop on IPR issues for AI staff, something we intend to run each year. Having spent just an hour touching on such issues, I feel a day’s course would be well spent ensuring IS staff are informed of the risks and responsibilities involved in this area of our work. Not least because the European Copyright Directive, which applies to the UK, now makes breaking copyright protection a criminal offence rather than a civil offence, so theoretically someone could go to jail whereas it used to be that the individual/organisation would be fined based on the ‘loss’ (financial, of reputation, of relationships, etc) to the rights owner.

Costs, risk management and business planning

March 23, 2006 by

Not the way I would have chosen to start the day but it ended up being a useful morning discussing how to identify the organisational costs of running a digital archive and how to justify those costs and identify the benefits. We also discussed risk management, the implications of lifecycle management and how to cost elements of an OAIS compliant archive.

We did an interesting exercise comparing the costs of running an e-prints archive at Cornell University and The National Archives’ digital archive. Not surprisingly, the two archives’ costs are radically different because their remit and services provided are radically different. It costs TNA £18.76 to ingest/acquire a single file into their archive. A huge sum compared to Cornell’s £0.56-£2.84. This is not only because TNA’s remit is so much wider and therefore the ingest/acquisition process is much more complex, but because TNA operate in an environment where they catalogue the material themselves whereas Cornell have no cataloguers but require the Professor submitting her document to provide and verify all the information/metadata. Also, TNA have huge preservation costs because they are dealing with legacy digital material which is 20-30 years old, created when no preparation was made for long-term preservation of these materials. Cornell, on the other hand, are archiving simple, modern digital materials and their preservation activities are relatively easy and predictable.
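To put those per-file figures side by side, here is a quick back-of-the-envelope sketch in Python. The three costs are the ones quoted in the exercise; the variable names and the idea of expressing the gap as a simple ratio are my own illustration and were not part of the course material.

```python
# Per-file ingest/acquisition costs quoted in the exercise (GBP).
TNA_COST = 18.76
CORNELL_LOW = 0.56
CORNELL_HIGH = 2.84


def cost_ratio(a, b):
    """Return how many times larger cost a is than cost b."""
    return a / b


# Compare TNA against both ends of Cornell's range.
ratio_vs_high = cost_ratio(TNA_COST, CORNELL_HIGH)
ratio_vs_low = cost_ratio(TNA_COST, CORNELL_LOW)

print(f"TNA is roughly {ratio_vs_high:.1f}x to {ratio_vs_low:.1f}x Cornell's per-file cost")
```

A ratio like this says nothing about why the costs differ, of course; it just makes the size of the gap easier to see.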

This raised a familiar and interesting question for me because we will be developing a facility in ADAM for staff to upload images to a team catalogue and provide metadata for the image. In an ideal world, the member of staff would provide full and accurate metadata which would require no validation and could be entered directly into ADAM and immediately available on the Intranet. Of course, this is almost certainly impossible for AI. It works for Cornell because the Professor has a vested and very personal interest to ensure that her article is made widely available and correctly cited through the submission of complete and accurate metadata. Even then, an example was given where an academic catalogued their article with a single keyword representing their sole academic interest, disregarding the other subject areas which the article related to. I asked people if they had any advice on how we could have AI staff more involved in the cataloguing process but no miracle answers were forthcoming. Basically, while staff are essential providers of information about the digital object, supplying information only they might know, it’s an unacceptable organisational risk to then make those images directly available for other staff to reuse before AVR have checked and verified the metadata and, as is always the case, enriched it with further information. And of course, staff might justifiably argue that they could be making better use of their time than extensively cataloguing images and checking copyright and licence agreements. There will be ways that we can ensure that the information provided to us is formed in a way that is easy to validate and enrich though and that’s the approach we’ll be taking with ADAM.
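As a rough illustration of what ‘easy to validate’ might mean in practice, here is a small Python sketch. Everything in it is an assumption made for the sake of the example: the field names, the status values and the function names are hypothetical and are not ADAM’s actual design. The point is only that a submission with gaps is bounced back straight away, and a complete one still waits for AVR to check it before anyone else can reuse the image.

```python
# Hypothetical required fields for a staff image submission.
REQUIRED_FIELDS = ("title", "creator", "date_taken", "rights_holder")


def missing_fields(metadata):
    """List required fields that are absent or blank."""
    return [
        field for field in REQUIRED_FIELDS
        if not str(metadata.get(field) or "").strip()
    ]


def submit(metadata):
    """Reject incomplete submissions; queue complete ones for review."""
    missing = missing_fields(metadata)
    if missing:
        return {"status": "rejected", "missing": missing}
    # Complete submissions are never published directly: they wait
    # for a cataloguer to verify and enrich the metadata.
    return {"status": "pending_review", "missing": []}


print(submit({"title": "Demonstration", "creator": "Staff member"}))
```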

At one point, while trying to break down the cost elements of a digital archive, I realised that we were a room full of archivists trying to do the job that IT professionals have been doing for years. The element costs involved in digital archiving such as hardware, software, licences, support, development, fixtures and fittings, etc. are costs that we share with ITP. Where IRP need to demonstrate costs is by detailing the work processes and therefore the staff time involved and the business reasons why archival preservation might require three or four times the storage requirements, a different approach to risk management, changes in data management, etc. But, with the exception of staff time, a digital archive uses readily available IT solutions in a specific way. I tried to make the point that we (archivists) are not the people best placed to cost IT systems but rather need to work with IT professionals and draw on their existing experience in planning, purchasing and maintaining systems. I think that to an IT department, a digital archive is just another application of IT hardware, software and processes. Do you agree?

This wasn’t the first time I’ve found that archivists tend to look at a digital archive infrastructure as something new and peculiar to them and completely alien to IT professionals. Sure, there might be different requirements that some IT staff might not be familiar with but it’s the archivist’s role to explain and justify these in business terms and in return, let the IT staff deliver the infrastructure requirements to meet the business case. It’s just data that needs to be treated a bit differently, that’s all.

Despite this frustration, this class had real practical value for me and was a morning well spent.

NDAD – The National Digital Archive of Datasets.

March 22, 2006 by

The last session of the day before getting on with our class project about the Internet Archive was on NDAD.

It’s a service that the ULCC perform for TNA by preserving UK government databases and records no longer in use. Sounds dull but of course these databases include the National Inventory of Woodland and Trees, a survey of British Bats, statistics on how many accidents there are in the home, crime statistics and the names and assets of victims of Nazi persecution who were compensated by the UK government. These databases are all transferred to NDAD who migrate them into sustainable formats, document them with good metadata and make them available to search online. It sounds like a great place to work if you’re interested in the history of computing as some of these databases were the largest of their kind at the time and represent significant historical moments in the history of computing. They also have to deal with all the legacy software and hardware issues, data analysis, system design and digital conversion as well as the development of emulators, data recovery and so on. Hacker heaven and it’s only a ten minute walk from AI.

We assessed NDAD against the OAIS standard and it does pretty well. Since TNA essentially do the selection of the databases, NDAD have no negotiation in the Ingest stage, but together TNA and NDAD are a functioning OAIS archive. And it’s only a ten minute walk from AI! It feels a bit like saying you live only ten minutes from Buckingham Palace.