
Web Archive Policy

We need two backup copies:

  1. Current backup copy: A working copy, refreshed weekly, to be available for fixing messes and replacing lost files.
  2. Historical archive copy: An archival copy for "the record."

    Two preliminary approaches to this: burn a CD-ROM with the site on it once a month, and create a PDF "hard copy" of the site. This 3.2MB "copy" of the About directory is an example.
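One way the weekly working copy could be produced is sketched below. This is only an illustration, not a procedure we have adopted: the function name `snapshot_site`, the directory arguments, and the dated file-name pattern are all assumptions made for the example. It writes one gzip-compressed tar file per run, so each week's copy sits beside the earlier ones.

```python
import tarfile
from datetime import date
from pathlib import Path


def snapshot_site(site_root, backup_dir, today=None):
    """Write a gzip-compressed tar of site_root into backup_dir.

    Hypothetical helper for illustration; returns the path of the new archive.
    backup_dir should live outside site_root so the archive does not include itself.
    """
    site_root = Path(site_root)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # Date-stamp the file name so weekly copies do not overwrite each other.
    stamp = (today or date.today()).isoformat()
    archive_path = backup_dir / f"{site_root.name}-{stamp}.tar.gz"

    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores paths relative to the site folder, not absolute paths.
        tar.add(site_root, arcname=site_root.name)
    return archive_path
```

Run once a week from a scheduler, this would cover the "current backup copy"; the same file written to a CD-ROM once a month could serve as a starting point for the historical copy.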

Reports from other places:

I asked the CLAC web list:
"Now that we're on the third "edition" of our web site, we're starting to think seriously about The Historical Record. We want to have a copy of the site (or selected portions) for future study, etc. Has anyone developed policies and procedures for archiving web sites?"

Davidson College
"We have archived some of our web information. We decided to print off the home page and top levels and archive those. We worked with our archivist on this idea."

Carleton College
"We don't have a formal plan, though the college archivist has been talking about setting something up. We have a couple of archived captures, but nothing consistent."

Occidental College
"We don't keep "old" copies of the site (except on backup tapes)."

Reed
"doesn't have a policy or procedure."

Alma College
"We do have a minimal procedure for the archival of our web site. We just maintain a week to week archival, as the site continues to grow rapidly. Currently, we do a tar-zip on the directories, and store them on a zip disk. For any long term archivals, we have not yet done anything with that."

Union College
"the only kind of archiving we do are with backup tapes (Daily, Weekly, and end of term). We do not archive for historical reference for future study."

I was corresponding with Kevin Lowey at the University of Saskatchewan about something else, so I asked him about archives as well. His response:
"We do nightly backups of the main campus web server (www.usask.ca). However, that is intended for file recovery, not archival purposes.

We currently do not have an "archive" of the information on the web site. However, that may be of interest for historical purposes. We may look into doing this in association with the University Archives department.

There are some significant technical issues:

  1. The U of S web site is actually housed on about 60 different web servers. Most are on "www.usask.ca", but some are on individual servers operated by departments or colleges, including commerce, engineering, computer science, extension, etc. Many of these use different software (such as Macintosh for Extension, Unix for others, NT for others, etc.)
  2. Many of the pages use server specific features, like "server side includes", that when viewed out of context without the server will give unexpected results. So, to accurately archive all the pages, we would have to archive the server software too.
  3. Many of our features are stored in "databases". Again, to properly archive these databases, we would have to archive the database programs, the programs that provide the web interface between the web server and the database, etc.
  4. The volume of information is staggering. Just on the single www.usask.ca web server, the web area takes 2.25 Gigabytes of information, or roughly four CD-ROMs. That's not counting the information on the other 60 servers around campus.

However, we might be able to make a limited archive of just the files on the system. I'll talk this over with the other web coordinators. If this is done, it would be for the University Archives department (if they decide it would be useful).

Thanks for the suggestion."
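
Two of the issues raised above, server side includes and sheer volume, can at least be measured before any archive is attempted. The sketch below is a rough illustration under stated assumptions: the helper names `survey_site` and `discs_needed` are invented for the example, a disc is assumed to hold 650 MB, and a page is flagged if it contains anything that looks like an SSI directive (`<!--#include ...`, `<!--#exec ...`, and so on). It does not process the includes; it only lists the files that would not display correctly without the server.

```python
import math
import re
from pathlib import Path

CD_ROM_BYTES = 650 * 1024 * 1024  # assumed capacity of one disc
SSI_DIRECTIVE = re.compile(rb"<!--#\w+")  # e.g. <!--#include virtual="..." -->
PAGE_SUFFIXES = {".html", ".htm", ".shtml"}


def survey_site(root):
    """Return (total size in bytes, list of pages containing SSI directives)."""
    total_bytes = 0
    ssi_files = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        total_bytes += path.stat().st_size
        # Only scan page files for directives; images and the like are just counted.
        if path.suffix.lower() in PAGE_SUFFIXES:
            if SSI_DIRECTIVE.search(path.read_bytes()):
                ssi_files.append(path)
    return total_bytes, ssi_files


def discs_needed(total_bytes, disc_bytes=CD_ROM_BYTES):
    """Number of discs required to hold total_bytes, rounding up."""
    return math.ceil(total_bytes / disc_bytes)
```

Under the 650 MB assumption, `discs_needed(int(2.25 * 1024**3))` gives 4, which agrees with the "roughly four CD-ROMs" estimate for the main server.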