Archiving the World Wide Web

March 21, 2006

After a heavy lunch (I was longing for just a sandwich but it’s a residential course, so you eat what you are given), we discussed the archiving of web sites.

Frankly, I think this is a doomed project. You only have to look at the Internet Archive to see its limitations. It’s great for harvesting flat HTML files, but faced with JavaScript, dynamic sites, database-driven sites, and pretty much any Web 2.0 technology, you’re screwed. Still, there’s a new group in the UK called UKWAC that is determined to archive selected UK web sites. I can see it working when you have the close co-operation of the web site owner, but if you’re trying to capture sites on an ad hoc basis, you’re going to end up with a lot of style and no content.

Archiving web sites seems like a curious legacy exercise to me, one that will be abandoned eventually, I’m sure. A web page is increasingly about presenting dynamically changing information to users based upon their selections, not just a set of predetermined, static information. Web sites these days are often serving up information that is stored in Content Management Systems rather than as flat files in a directory on a web server.

The Internet Archive has been ‘archiving’ since 1996. Click here to see a front page from 1997. Not bad. Now look at 2005 here. Looks good at first. Now click on the interactive links such as this. Doh! SVAW’s been ‘lost’. How about AI’s reports? Click on the ‘Library’ page and try finding a report from any country and you get sent out of the Internet Archive’s site and into AI’s original site! This demonstrates to me how superficial their efforts are increasingly becoming.

It tells me that we shouldn’t rely on other organisations such as the Internet Archive to take responsibility for the general archiving of our web site. It’s something only we can do. And in a way, we do archive a lot of the important content on the web site. We’ve got ADAM (AV and multimedia) and AIDOC (indexed reports and press releases), after all. And a new Content Management System will make it easier to manage the other content we’re creating for the web and help us organise the specific web content we might wish to archive.

But do we need to archive the user experience? Whether it’s worth putting our resources into doing this is up for discussion, I guess. It doesn’t interest me. The way information is presented changes over time as design and technology shift together. Perhaps this has value to some cultural institutions, such as a design museum. I just see it as a shallow exercise in vanity rather than something of historical value.

If we’re putting content on the web that is of genuine archival importance to the organisation, then that content should be archived, but not each and every style sheet that displays it. I don’t think we should get too drawn in by developing and changing web technologies. It’s fun to look back a few years and see how web presentation has changed, but that’s all.
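
For what it’s worth, the ‘leak’ I describe above, where a link inside an archived page sends you out of the archive and back to the live site, is easy enough to test for. Here is a minimal Python sketch of the idea. It is not how the Internet Archive or UKWAC actually work; the names `LinkCollector` and `split_links` are made up for illustration, and the hosts used in the note after the code are placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def split_links(html, snapshot_url, archive_host):
    """Return (kept, leaked) lists of absolute link URLs.

    kept   -- links that still resolve to archive_host
    leaked -- links that would send the reader to some other host
    """
    collector = LinkCollector()
    collector.feed(html)

    kept, leaked = [], []
    for href in collector.hrefs:
        # Resolve relative links against the snapshot's own address.
        absolute = urljoin(snapshot_url, href)
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # ignore mailto:, javascript: and similar links
        if parsed.netloc == archive_host:
            kept.append(absolute)
        else:
            leaked.append(absolute)
    return kept, leaked
```

Given a saved snapshot at a placeholder address such as `http://archive.example/snapshots/2005/index.html`, calling `split_links(html, snapshot_url, "archive.example")` sorts a relative link like `library.html` into `kept` and an absolute link to the original site into `leaked`. It only looks at plain `<a href>` links, so anything generated by scripts would slip past it, which is rather the point of this post.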

On a related note, the final session of the day was the start of our group project, which is to look at the Internet Archive and decide whether it lives up to its mission as an ‘archive’. You can probably tell what I think already, but I’ve still got three days of research to do before we present our conclusions. Actually, there’s more to the Internet Archive than just crawling web sites. They collect books, software, films and music, and I think that’s where their value will lie in the future, in addition to having collected a few years’ worth of functional early web sites.