Archive for the ‘Risks’ Category

Legalities

March 23, 2006

This was a post-lunch crash course in Intellectual Property Rights, Copyright, Digital Rights Management, the Freedom of Information Act, the Data Protection Act and Legal Deposit. No depth, but some useful highlights and another interesting case study from the ADS concerning the misuse of images and how they dealt with it.

Basically, the ADS received a collection of data, including images, from the excavation of Christ Church, Spitalfields. The images in this remarkable collection show the remains of bodies buried in the 18th-century crypt. The collection was found by a website for necrophilia enthusiasts, who copied some of the images from the ADS site and republished them on their sex-with-the-dead fan site. They also provided a link through to the ADS for enthusiasts to grab more images for themselves.

This was their mistake, because the ADS noticed an unexpected spike in the use of its website and traced it back to the link from the other site. It was the first time the ADS had had to deal with the misuse of its digital collections, so they sought advice from the JISC legal team, who recommended a six-point plan spread over 70 days. The first step was simply to contact the website, tell them they had broken the licence they had agreed to on the ADS website, and demand that they take the images down or face further legal action. And they did take them down. End of story.

This was a satisfactory result for the ADS because, despite the misuse of the images, it would have been a long and difficult legal process had the website not taken them down.

We’ve been thinking about such things for ADAM and intend to introduce a ‘handshake’ agreement prior to the download of ADAM images. Having seen how the ADS handles this ‘contract’ with its users, I’m now inclined to have users agree to a licence once, when they first enter an ADAM session, rather than each time they click to download. Legally, it would appear to cover us. Our present system is based on authentication into the AI Intranet and then trusting that the AI staff member will respect the terms and conditions displayed with each image, but we think we can do better than this with little inconvenience to users. There will also be more changes to the way ADAM handles rights management and licence agreements.
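To make the idea concrete, here’s a minimal sketch of how a once-per-session licence gate might work. I’ve used Python and Flask purely for illustration; the route names, session key and licence text are hypothetical, not how ADAM is or will be built:

    # Sketch of a once-per-session licence agreement gate.
    # Everything here is illustrative, not ADAM's real implementation.
    from flask import Flask, redirect, request, session, url_for

    app = Flask(__name__)
    app.secret_key = "replace-with-a-real-secret"

    EXEMPT_ENDPOINTS = {"licence", "accept_licence", "static"}

    @app.before_request
    def require_licence_agreement():
        # Let the licence pages themselves through; redirect anyone
        # else who hasn't yet agreed during this session.
        if request.endpoint in EXEMPT_ENDPOINTS:
            return None
        if not session.get("licence_agreed"):
            return redirect(url_for("licence"))

    @app.route("/licence")
    def licence():
        return 'Terms and conditions... <a href="/accept">I agree</a>'

    @app.route("/accept", endpoint="accept_licence")
    def accept_licence():
        session["licence_agreed"] = True  # recorded once per session
        return redirect(url_for("download", image_id="example"))

    @app.route("/download/<image_id>")
    def download(image_id):
        return f"Serving image {image_id} under the agreed licence"

The point is simply that the agreement is recorded once in the session and then checked on every request, rather than interrupting each download.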

The main piece of advice the ADS drew from this example was that archives should not wait for their content to be abused before forming a response, but should formulate a strategy for dealing with a potential incident in advance, so they can react quickly, methodically and legally. Wayne, Claire and Tim will know more about whether we’ve had to deal with this already; I’m not aware of such a strategy being in place, though. In late May, an IPR expert from the Open University will be giving a one-day workshop on IPR issues for AI staff, something we intend to run each year. Having spent just an hour touching on such issues, I feel a day’s course would be time well spent ensuring IS staff are informed of the risks and responsibilities involved in this area of our work. Not least because the European Copyright Directive, which applies to the UK, now makes breaking copyright protection a criminal offence rather than a civil one. Theoretically, someone could now go to jail, whereas previously the individual or organisation would be fined based on the ‘loss’ (financial, reputational, relational) to the rights owner.

Costs, risk management and business planning

March 23, 2006

Not the way I would have chosen to start the day, but it ended up being a useful morning discussing how to identify the organisational costs of running a digital archive, how to justify those costs and how to identify the benefits. We also discussed risk management, the implications of lifecycle management and how to cost the elements of an OAIS-compliant archive.

We did an interesting exercise comparing the costs of running an e-prints archive at Cornell University with those of The National Archives’ digital archive. Not surprisingly, the two archives’ costs are radically different because their remits and the services they provide are radically different. It costs TNA £18.76 to ingest/acquire a single file into its archive, a huge sum compared to Cornell’s £0.56–£2.84. This is not only because TNA’s remit is so much wider, making the ingest/acquisition process much more complex, but also because TNA catalogue the material themselves, whereas Cornell employ no cataloguers and instead require the professor submitting her document to provide and verify all the metadata. TNA also face huge preservation costs because they are dealing with legacy digital materials, 20–30 years old, created when no preparation was made for their long-term preservation. Cornell, on the other hand, are archiving simple, modern digital materials, so their preservation activities are relatively easy and predictable.
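Scaled up to a whole collection, the gap is striking. Here’s a back-of-envelope calculation using the per-file figures quoted in the session; the 10,000-file collection size is my own invention for illustration:

    # Per-file ingest costs quoted in the session; collection size invented.
    tna_per_file = 18.76                    # GBP per file, The National Archives
    cornell_low, cornell_high = 0.56, 2.84  # GBP per file, Cornell

    files = 10_000
    print(f"TNA:     £{tna_per_file * files:,.0f}")
    print(f"Cornell: £{cornell_low * files:,.0f} to £{cornell_high * files:,.0f}")
    # TNA:     £187,600
    # Cornell: £5,600 to £28,400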

This raised a familiar and interesting question for me, because we will be developing a facility in ADAM for staff to upload images to a team catalogue and provide metadata for each image. In an ideal world, the member of staff would provide full and accurate metadata which would require no validation, could be entered directly into ADAM and be immediately available on the Intranet. Of course, this is almost certainly impossible for AI. It works for Cornell because the professor has a vested and very personal interest in ensuring that her article is made widely available and correctly cited through the submission of complete and accurate metadata. Even then, an example was given where an academic catalogued their article with a single keyword representing their sole academic interest, disregarding the other subject areas the article related to. I asked people if they had any advice on how we could get AI staff more involved in the cataloguing process, but no miracle answers were forthcoming. Basically, while staff are essential providers of information about the digital object, supplying information only they might know, it’s an unacceptable organisational risk to make those images directly available for other staff to reuse before AVR have checked and verified the metadata and, as is always the case, enriched it with further information. And of course, staff might justifiably argue that they could be making better use of their time than extensively cataloguing images and checking copyright and licence agreements. There will, though, be ways to ensure that the information staff provide is structured so that it is easy to validate and enrich, and that’s the approach we’ll be taking with ADAM.
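As a sketch of the staged workflow I have in mind, something like the following, where staff-supplied metadata is held for AVR review before an image becomes visible for reuse. All the names here are illustrative, not ADAM’s actual schema:

    # Staged upload workflow: staff submit, AVR validates and enriches,
    # and only then is the image published to the Intranet.
    from dataclasses import dataclass
    from enum import Enum

    class Status(Enum):
        SUBMITTED = "submitted"    # staff upload with draft metadata
        IN_REVIEW = "in review"    # AVR checking and enriching
        PUBLISHED = "published"    # visible to other staff for reuse

    @dataclass
    class ImageRecord:
        filename: str
        metadata: dict
        status: Status = Status.SUBMITTED

        def start_review(self) -> None:
            self.status = Status.IN_REVIEW

        def publish(self, enriched: dict) -> None:
            # AVR's verified metadata supplements the staff draft;
            # only now is the record exposed for reuse.
            self.metadata.update(enriched)
            self.status = Status.PUBLISHED

    record = ImageRecord("dig_site_042.tif", {"keywords": ["excavation"]})
    record.start_review()
    record.publish({"copyright": "cleared", "keywords": ["excavation", "crypt"]})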

At one point, while trying to break down the cost elements of a digital archive, I realised that we were a room full of archivists trying to do a job that IT professionals have been doing for years. The cost elements involved in digital archiving, such as hardware, software, licences, support, development, fixtures and fittings, etc., are costs that we share with ITP. Where IRP need to demonstrate costs is in detailing the work processes, and therefore the staff time involved, and the business reasons why archival preservation might require three or four times the storage, a different approach to risk management, changes in data management, etc. With the exception of staff time, though, a digital archive uses readily available IT solutions in a specific way. I tried to make the point that we archivists are not the people best placed to cost IT systems; rather, we need to work with IT professionals and draw on their existing experience in planning, purchasing and maintaining systems. I think that to an IT department, a digital archive is just another application of IT hardware, software and processes. Do you agree?

This wasn’t the first time I’ve found that archivists tend to look at a digital archive infrastructure as something new and peculiar to them, and completely alien to IT professionals. Sure, there might be requirements that some IT staff are not familiar with, but it’s the archivist’s role to explain and justify these in business terms and, in return, let the IT staff deliver the infrastructure to meet the business case. It’s just data that needs to be treated a bit differently, that’s all.

Despite this frustration, this class had real practical value for me and was a morning well spent.

Preservation Approaches to Technological Obsolescence

March 22, 2006

At 9am sharp we went straight into issues surrounding the obsolescence of digital file formats and their supporting digital hardware and software. What better way to begin the day!

Generally there are three or four ways of dealing with this:

Migration: Changing a file from one format to another, e.g. a Word 2.0 file to a Word XP file. Migration changes the data, but hopefully in a way which retains the integrity of the digital object.

Things to consider might be whether the new format can still represent the ‘significant properties’ of the original. Can the migration be done automatically? How long will it take (and therefore how much will it cost)? On what basis is the new format chosen? How do we know the migration is 100% successful?

Refreshing: Moving files from one storage medium to another, e.g. moving a document from a 5¼″ floppy disk to a networked server. The object itself remains unchanged (there’s a small sketch of a fixity-checked refresh after this list).

Emulation: Writing software that runs on a modern operating system and emulates the environment of the original application, e.g. writing a ZX Spectrum emulator to run my favourite game of all time: ‘Elite’.

Preservation of hardware and software: Basically, you keep a museum of old computers with the original software running on them.
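To give a flavour of what the simplest of these, refreshing, involves in practice, here’s a minimal fixity-checked refresh: copy the file to its new storage and verify, via checksums, that the bitstream is bit-for-bit unchanged. The paths and function names are illustrative:

    # Refresh a file onto new storage and verify the bitstream survived.
    import hashlib
    import shutil
    from pathlib import Path

    def sha256(path: Path) -> str:
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def refresh(source: Path, destination: Path) -> str:
        before = sha256(source)
        shutil.copy2(source, destination)   # the move to new media
        after = sha256(destination)
        if before != after:
            raise IOError(f"Fixity check failed for {source}")
        return after   # worth storing alongside the object for next time

    # refresh(Path("/media/floppy/report.wp"), Path("/archive/report.wp"))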

Each approach can be useful depending on the circumstances, although emulation and the museum approach are generally regarded as the most inconvenient. Archives aren’t museums, and approaching preservation this way runs contrary to the digital archival process, which is to move conservatively with changing technology rather than cling to it.

Software emulation is invaluable some of the time, but it can be expensive to undertake because of the development resources required, and it is often a black art of reverse engineering, as older technologies tend to be poorly specified and the programming skills required are, like many skills, lost over generations as technology moves on. Also, if you emulate the original software faithfully, you get the older, more difficult interfaces that came with it. For a large collection in a single file format, a single emulator might be a useful method of access to multiple objects, and it also helps retain our understanding of older systems. The BBC Domesday Project is a good example of emulation being the most successful method of bringing data back to life.

In most situations, though, migration of file formats and refreshing of storage media are what archivists rely on. At the IS, for example, we already take these approaches, migrating paper to microfilm and WordPerfect files to Word or PDF, and incrementally upgrading our hardware and software environments. I think it would be useful for ITP and IRP to discuss a joint strategy for this, recognising that traditional IT migration strategies do not always reflect the archivist’s needs and expectations. Digital archiving is part of both the IT world and the archiving world, and our digital preservation requirements need to be reflected in a joint ITP and IRP agreement. AVR are already starting to feel the need for this as we archive 200GB of images on IT’s servers and require 1TB/year for the storage of video. The storage of this data is not static; it requires continual backup, migration and refreshing over time, and the two departments clearly need to acknowledge this formally and make resources available.
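To put rough numbers on that, here’s a quick projection from the figures above. The three-copies multiplier echoes this morning’s point about preservation needing three or four times the raw storage; it’s an assumption for illustration, not IS policy:

    # Storage projection: 200GB of images today plus 1TB/year of video.
    images_gb = 200
    video_gb_per_year = 1000
    copies = 3   # assumed: primary plus two preservation/backup copies

    for year in range(1, 6):
        raw = images_gb + video_gb_per_year * year
        print(f"Year {year}: {raw:,} GB raw, {raw * copies:,} GB with copies")
    # Year 1: 1,200 GB raw, 3,600 GB with copies
    # Year 5: 5,200 GB raw, 15,600 GB with copies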

The next session followed on from this, discussing how archivists might live with obsolescence. I was hoping for personal spiritual guidance, but instead we discussed particular examples where the above approaches might be useful.

I won’t go into detail here but, predictably enough, it focused on the need to develop organisational strategies: analyse and evaluate the collections, create inventories, determine preferred file formats and storage media, assess how market conditions affect the longevity of IT systems, adopt metadata standards, work with IT departments on joint strategies (as above), watch technological changes and developments (and actually write this into someone’s role responsibilities), and be prepared for hard work and headaches.

Introductions & Why Bother?

March 20, 2006

Ten people are attending, including me. Most are from publicly funded organisations such as the Scottish National Archives, the Scottish National Library, The National Archives, the Northern Ireland Public Records Office, the Natural History Museum, Durham University and the University of London. Reassuringly, most people said that, like AI, they are in the early stages of developing a digital archive and digital preservation policies.

The first presentation is entitled ‘Why bother? Incentives and risks in digital preservation.’ Mostly familiar stuff explaining why the preservation of digital data is important. The speakers acknowledge that they are preaching to the converted, but I suppose it’s important we get it out of the way. Perhaps because many delegates are from publicly funded organisations, they highlighted how one of the stimuli for developing preservation strategies and implementing digital preservation archives is that their funding bodies expect them to do so, so that data can be re-used. A later PPT slide (oh, sweet lamb of God! a week of PowerPoint presentations…) showed how, over the last decade, the re-use of existing data has increased at a rate far greater than the production of data. So it speaks for itself: if people want, and expect to be able, to re-use existing data, then there’s a need to preserve it.

A more interesting (and unverifiable) fact to come out of this first session is that the world produces the equivalent of 37,000 Libraries of Congress each year. I’m glad it wasn’t my job to work out the figures for that. Lots of data is being created each year, 75% of it digital. Yet three times this amount of data flows by unrecorded, 99% of it via the telephone. So let’s not get too hung up on preserving every last piece of information we exchange. Like much in life, it’s about prioritising and selecting the right type of information to record. The curator’s job.

The now infamous BBC Domesday Project was discussed, as was the more interesting NASA Viking mission, where data from a Mars landing was ‘preserved’ on tape which, when needed 30 years later, was found to be deteriorating despite being kept in what were considered decent environmental conditions. Significantly, it was this incident that prompted NASA to lead the creation of the OAIS Reference Model (see links to the right), the main subject of this week’s training programme.

Finally, this first session discussed Mind The Gap, the new report on the state of digital preservation in the UK. Highlights include:

  • 84% of respondents to a questionnaire for the report agreed there were legal drivers to preserve in their organisations.
  • 73% recognised that if they failed to comply they would be failing to meet legal requirements.
  • 70% of UK companies use email for contract negotiations, HR letters and financial transactions.
  • 81% were able to specify a lifetime for the digital information and had to keep some of it for at least 50 years. How?
  • 64% need to preserve digital data in order to protect intellectual property.
  • 22% preserve to support patent applications.
  • Over 80% recognised that their organisations would benefit from improved access to information brought about by having a suitably catalogued and searchable digital repository.
  • 50% of respondents still print out documents in order to preserve them!