Progress Report No. 3: 25 June - 4 November 2002

Anna Sexton

25 October 2002

 

1. Staffing

From 27 May 2002 to 6 October 2002 Chris Turner was employed on a part-time (60%) contract. On 7 October 2002 Chris’s contract was changed to full-time, and he will continue to be employed on this basis until the end of the project.

2. Research and Development

2.1 Categorisation of archive users

In the draft paper ‘Towards an Analysis of User Needs’ (Version 14, 20 December 2001), the team put forward a model for ‘categorising’ or ‘segmenting’ archive users. This model has been used to form the basis of a questionnaire that the team are using in a survey of archive users across different repositories. The questionnaire survey will provide a means of establishing a profile of the ‘typical’ archive user. This profile will then act as a basis for selecting a sample of users who can provide more detailed information about their needs and feedback on our work.

Initially the team felt that the in-depth surveying of archive users should be carried out across three archive repositories in the UK. However, the team sought advice from experts in the market research field, who suggested that the surveying needed to be conducted in at least six different repositories. These repositories should be representative of the diversity in types of archives and should be spread across the UK. The team contacted a number of repositories to ask for participation in the user survey. In doing so, the team were looking to encourage participation from the following repository types:

* National Archives
* Local Government Archives
* Business Archives
* Specialist repositories (including university archives)

The final list of participants is:

* The Public Record Office (National Archive)
* Gloucestershire Record Office (Local Government Archive)
* Birmingham Record Office (Local Government Archive)
* University of Glasgow Archive Service (A Specialist Repository that specialises in collecting Business Archives)
* University College London Special Collections (Specialist Repository)
* Wellcome Institute Archive and Manuscript Service (Specialist Repository)

It has proved difficult to find any business archives that are willing to participate in the survey. Those that were invited to participate felt that their external user numbers were so small that the results would be of little use. In a business archive the users are mainly internal members of staff, and much of the archival research is conducted by the archivists on the users’ behalf and reported back by telephone or email. The closest the team have come to enlisting a business archive is the involvement of the University of Glasgow Archive Service, which maintains and administers its own institutional archive as well as proactively collecting business archives from other institutions and corporations.

The team had planned to carry out all the surveying over a two-month period running from 7 October 2002 to 6 December 2002. However, this time frame clashed with the annual Survey of Visitors to British Archives run by the Public Services Quality Group (PSQG). Many of the repositories that expressed an interest in participating in the LEADERS survey were also involved with the PSQG survey, and some felt unable to run two user surveys at the same time. The LEADERS survey has therefore had to be divided into two phases. In the first phase, which runs from October to December 2002, the following repositories are being surveyed:

* The Public Record Office
* University College London Special Collections
* Wellcome Institute Archive and Manuscript Service

The Public Record Office have incorporated the LEADERS questions into their own survey that they give to new readers. The PRO have offered to input and analyse the answers to the LEADERS questions as part of their larger survey. LEADERS is extremely grateful to the PRO for their enthusiasm, support and practical assistance.

The other repositories that have been enlisted will be surveyed over a two-month period from January 2003 (exact dates still to be arranged).

2.2 Building the demonstrator application

LEADERS are developing a model system that will serve as a demonstrator to show what can be produced from the encoded materials and the LEADERS toolset. The demonstrator will incorporate basic search and retrieval functions and alternative presentations to show the possibilities of TEI/EAD encoded resources. The application will be used to gather feedback from users, which will guide further design and development. The demonstrator is being developed on the Microsoft .NET framework using ASP.NET and, if necessary, C#. We are in discussion with Bookmarc, an organisation based in Portugal, which has used similar techniques to build bibliographic applications from XML files. We have a direct contact with Bookmarc via Maira Ines Cordeiro, a PhD student at SLAIS.

Digitisation of the selected samples for the demonstrator is now complete. The team opted to employ UCL’s Photographic Service to carry out the work. The originals have been copied using a high-resolution camera, which has produced a file of under 18 MB per image. The photography complies with the Association of Photographers guidelines for the supply of digital images, which stipulate that the images are saved in an uncompressed TIFF format with an embedded Adobe RGB (1998) colour profile. All the pictures are neutralised to a Kodak greyscale and no editing at all has been carried out on the original high-resolution files. The high-resolution files will act as the archive from which all surrogate images for the prototype will derive. The images have been supplied to us on a CD in ISO-9660 format. We will be using the NISO Metadata for Images in XML (NISO MIX) Schema to record image metadata, as recommended by VADS and HEDS. Rosamund Cummings has been given a CD of the UCL material and Gill Furlong has been given a copy of the Orwell material for their own use.

2.3 The use of TEI markup for textual representations of archival documents

The LEADERS team are working towards producing a subset of the TEI which can be used to encode a wide range of archival material. In the research conducted so far, we have found that the TEI contains a number of tags that can deal with many of the commonly occurring features of archival material, such as complex additions, deletions and gaps in the text, and changes in the hand, style or character of the writing. Most of these tags can be invoked through the use of the additional TEI tagset for the ‘Transcription of Primary Sources’. Much valuable work has been done by others to set out the encoding options available for dealing with such features, and this work will undoubtedly act as a foundation for some of the models and rules developed by LEADERS.
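As an illustration of the kind of features mentioned above, the following minimal sketch parses a small transcription fragment and derives a reading text from it. The fragment, its wording and its attribute values are invented for illustration and are not taken from the LEADERS material. The element names (`add`, `del`, `gap`, `handShift`) are those of the TEI tagset for the transcription of primary sources, but the function names and the `[...]` gap marker are assumptions of this sketch, not part of the project's toolset.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the wording and attribute values are invented.
SAMPLE = """<p>The committee <del rend="strikethrough">rejected</del>
<add place="supralinear">approved</add> the grant of
<gap reason="illegible" extent="1 word"/> pounds.
<handShift new="h2"/>Signed by the secretary.</p>"""


def reading_text(root):
    """Return the final reading: keep additions, drop deletions, mark gaps."""
    parts = []

    def walk(node):
        if node.tag == "gap":
            parts.append("[...]")  # assumed marker for missing text
        elif node.tag != "del":  # deleted text is left out of the reading
            parts.append(node.text or "")
            for child in node:
                walk(child)
                # Text following a child belongs to the parent's content,
                # so it is kept even when the child itself is skipped.
                parts.append(child.tail or "")

    walk(root)
    # Collapse line breaks and runs of spaces into single spaces.
    return " ".join("".join(parts).split())


def deleted_text(root):
    """Return the text of every deletion, in document order."""
    return [(d.text or "").strip() for d in root.iter("del")]


if __name__ == "__main__":
    fragment = ET.fromstring(SAMPLE)
    print(reading_text(fragment))
    print(deleted_text(fragment))
```

Because the deletion is encoded rather than discarded, the same fragment can yield both the final reading and a record of what was struck out, which is one of the alternative presentations a demonstrator might offer.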

However, we have also identified the need to build models and rules that can deal with structures and features such as ‘overlaid data’, textual and numerical data presented in complex tables, and the presence of formulae and mathematical expressions within the text. Data within archive documents can be described as ‘overlaid’ when an underlying layer of data is used as the basic structure onto which one or more further layers of data are applied. The underlying layer of data is usually printed, and the overlaying layers are usually handwritten on top of the printed structure. Such structures and features are often found in archival material (particularly administrative archives), but the TEI’s current encoding scheme will need to be developed if they are to be comprehensively dealt with. The team are looking at a variety of examples of these structures and features across a range of documents from the UCL Archive. Rules and models for encoding are currently being formulated and tested by the team.

2.4 Overlaps between TEI and EAD

The team have spent time identifying areas of overlap in the metadata provided by the EAD encoding framework and that provided by TEI. This research has shown that overlaps occur in relation to metadata that:

* Identifies, locates and gives details about the creation of the original object
* Describes the physical characteristics of the object
* Provides contextual information about the creator and the participants within the original object
* Interprets/describes the data in the original object

The team have identified that solutions to these overlaps and our final integration method must be capable of:

* Avoiding repetition of information.
  Archivists and other related professionals will not appreciate having to record a second time information that has already been recorded in one place. Furthermore, the potential for confusion in a system that is trying to integrate the two encoding frameworks is increased when the same data is held in different places according to different principles.
* Allowing re-use of EAD finding aids and TEI transcripts as stand-alone objects.
  It is also important that the EAD finding aids and the TEI transcripts are not integrated to the extent that one cannot be reused independently of the other. Each should be able to be exported and used in other systems and applications for other purposes as a stand-alone digital object. Otherwise the potential for the reuse of data that comes from working with non-proprietary tools will not be fully exploitable.
* Supporting meaningful search and retrieval.
  As a primary use of metadata is to facilitate resource discovery, our integration solution must support meaningful search and retrieval, and the presentation of results.

We are looking at the possibility of developing a Schema which will incorporate the EAD DTD and the TEI DTD subset. However, several different Schema languages have been developed by the XML community. The most notable of these are the W3C XML Schema Language, RELAX (Regular Language Description for XML) and Schematron. We are currently researching which language is the most suitable for the project’s requirements.
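The three requirements listed above can be illustrated with a minimal sketch in which the transcript carries only an identifier and the descriptive metadata is looked up in the finding aid when needed. This is one possible approach under stated assumptions, not the integration method chosen by the project. The sample records, the `ead-unitid` type value and the function names are hypothetical. The element names used (`did`, `unitid`, `unittitle`, `unitdate` in EAD; `idno` and `body` in TEI) exist in the respective DTDs.

```python
import xml.etree.ElementTree as ET

# Hypothetical finding-aid component: values are invented for illustration.
EAD = """<c level="item"><did>
  <unitid>EX/1/2</unitid>
  <unittitle>Letter concerning a grant</unittitle>
  <unitdate>1902</unitdate>
</did></c>"""

# Hypothetical transcript: it holds the identifier, not a copy of the metadata.
TEI = """<TEI.2><teiHeader><fileDesc>
  <titleStmt><title>Transcript</title></titleStmt>
  <publicationStmt><idno type="ead-unitid">EX/1/2</idno></publicationStmt>
  <sourceDesc><p>Described in the finding aid.</p></sourceDesc>
</fileDesc></teiHeader>
<text><body><p>Dear Sir, I write concerning the grant.</p></body></text></TEI.2>"""


def index_finding_aid(ead_root):
    """Map each unitid in the finding aid to its descriptive metadata."""
    index = {}
    for did in ead_root.iter("did"):
        unitid = (did.findtext("unitid") or "").strip()
        if unitid:
            index[unitid] = {
                "title": (did.findtext("unittitle") or "").strip(),
                "date": (did.findtext("unitdate") or "").strip(),
            }
    return index


def describe_transcript(tei_root, index):
    """Combine the transcript text with metadata held only in the finding aid."""
    unitid = (tei_root.findtext(".//idno") or "").strip()
    body = tei_root.find(".//body")
    text = " ".join("".join(body.itertext()).split()) if body is not None else ""
    # An unmatched identifier simply yields a record without title or date.
    return {"unitid": unitid, "text": text, **index.get(unitid, {})}


if __name__ == "__main__":
    index = index_finding_aid(ET.fromstring(EAD))
    print(describe_transcript(ET.fromstring(TEI), index))
```

Each file still parses and can be reused on its own. The title and date are recorded once, in the finding aid, and the combined record gives a search function both the description and the full text to work with.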

3. Text Encoding Initiative (TEI) Consortium Membership

In August/September 2002, SLAIS supported the project in becoming a paid member of the TEI Consortium. As the deliverables from the project involve adaptation and development of TEI, consortium membership is vital as it entitles the project to:


* The right to vote on consortium issues and in elections: Members can help determine the work and priorities of the Consortium through voting for members of the Board of Directors and on other Consortium-wide resolutions.
* Access to information not available to the public: All individual constituents of TEI Consortium members are given access, on a restricted-access website, to pre-release drafts of Consortium working documents and technical reports; announcements and news; and a database of Members, Sponsors and Subscribers, with contact information.
* Discounts on training, consulting services and software: Constituents of Members receive discounts and reservation priority on TEI Consortium sponsored training, workshops, seminars, summer schools, project consulting, and discount pricing on software tools.
* Certification: Members who wish to offer TEI training courses or other services, TEI software tools, or publish data in TEI formats, may have these reviewed and certified by the TEI Council at discounted rates.
* Annual Meeting: Members are invited to send representatives to an annual TEI Executive Briefing, where leaders in the text encoding community will discuss current critical issues in text encoding and digital libraries.
* Affiliation: Members are prominently identified on TEI Consortium materials (such as TEI Consortium Web Pages), and may use the TEI Consortium logo to express their affiliation on their own promotional material.

4. Dissemination

4.1 Conferences and meetings

Susan Hockey and Anna Sexton attended the Association for Literary and Linguistic Computing and the Association for Computing in the Humanities (ALLC/ACH) Conference, which was held from 24-28 July 2002 at Tübingen University, Germany.

Chris Turner and Anna Sexton delivered the team’s paper entitled “TEI, EAD and Integrated User Access to Archives: Towards a Generic Toolset” at the Digital Resources in the Humanities (DRH) Conference which took place at Edinburgh University from 8-11 September 2002. The full paper has been submitted to be considered for publication in the forthcoming conference proceedings.

Chris Turner attended the Society of Archivists’ Annual Conference in Jersey from 1-4 October 2002 and delivered a paper entitled “Love Match or Shotgun Wedding?: Archivists and IT Vendors”. Chris did not deliver the paper officially in his capacity as LEADERS Project Manager, but he did refer to our work within his talk, which has proved useful in the promotion and dissemination of the project.

Susan Hockey and Chris Turner attended the TEI Consortium’s Annual Meeting on 12 October 2002. Susan was invited to deliver the opening keynote paper which she entitled “Markup, TEI, Digital Libraries & Humanities Scholarship” and Chris gave an introduction to the LEADERS Project in the ‘Reports from Members’ section of the meeting.

The team have submitted a proposal for running a session (three related papers) at the Society of Archivists’ Annual Conference 2003, which will be held next September in Southampton, as well as a proposal to run a special focus session at the Society of American Archivists Annual Conference 2003, which will be held on 18-24 August in Los Angeles, USA. The team are also working on a submission for the ALLC/ACH 2003 Conference, which will take place on 29 May - 2 June in Athens, Georgia, USA.

4.2 Talks

Chris Turner and Anna Sexton have been invited to give a talk on LEADERS at the Society of Archivists EAD/Data Exchange Group Meeting on 14 November 2002.

The team have also been invited by Mark Greengrass to speak at the Humanities Research Institute at the University of Sheffield. The exact date is still to be arranged.

4.3 Website

The LEADERS website (http://www.ucl.ac.uk/leaders-project) was officially launched on 22 July 2002. The website was last updated on 9 October 2002 when the references page was substantially updated and PowerPoint slides from the talk delivered at DRH were added to the site. Some general statistics indicating website usage (analysed from 22 July – 20 October) are given below:

Table 1: Monthly breakdown of requests for pages

| Month | No of requests for website |
|---|---|
| July 2002 | 885 |
| August 2002 | 770 |
| September 2002 | 561 |
| October 2002 | 218 |

Table 2: Domains accessing website (domains with at least 100 requests are listed)

| Domain | No of requests | % of total |
|---|---|---|
| .com (commercial) | 1111 | 46% |
| .uk (United Kingdom) | 555 | 23% |
| Unresolved numerical addresses | 245 | 10% |
| .edu (USA educational) | 202 | 8% |
| .net (Network) | 106 | 4% |
| Not listed: 25 domains | 215 | 9% |
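The percentages in Table 2 can be checked against the request counts with a short calculation. The sketch below uses only the figures reported in Tables 1 and 2. The assumption is that each percentage is the domain's share of all requests, rounded to the nearest whole number; the variable and function names are illustrative.

```python
# Request counts as reported in Table 1 (by month) and Table 2 (by domain).
requests_by_month = {
    "July 2002": 885,
    "August 2002": 770,
    "September 2002": 561,
    "October 2002": 218,
}
requests_by_domain = {
    ".com": 1111,
    ".uk": 555,
    "Unresolved numerical addresses": 245,
    ".edu": 202,
    ".net": 106,
    "Not listed": 215,
}


def percentage_shares(counts):
    """Return each entry's share of the total, rounded to a whole percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


if __name__ == "__main__":
    # Both tables should account for the same overall number of requests.
    print(sum(requests_by_month.values()), sum(requests_by_domain.values()))
    for name, share in percentage_shares(requests_by_domain).items():
        print(f"{name}: {share}%")
```

Under that assumption the two tables agree: both sum to the same total number of requests, and the rounded shares match the percentages given in Table 2.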

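Table 3: List of top 20 organisations accessing website (ordered by no of requests: highest first)

| Rank | Organisation |
|---|---|
| 1 | googlebot.com |
| 2 | ucl.ac.uk |
| 3 | Unresolved numerical addresses |
| 4 | aol.com |
| 5 | lsu.edu |
| 6 | pol.co.uk |
| 7 | directhit.com |
| 8 | pro.gov.uk |
| 9 | fastsearch.net |
| 10 | umd.edu |
| 11 | btopenworld.com |
| 12 | oclc.org |
| 13 | uu.net |
| 14 | av.com |
| 15 | clara.net |
| 16 | cableinet.co.uk |
| 17 | wellcome.ac.uk |
| 18 | mcc.ac.uk |
| 19 | looksmart.net |
| 20 | rutgers.edu |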

4.4 Leaflets

The team have had 1000 leaflets printed for the project. The leaflets provide general information about the project with an introduction to our aims, objectives, deliverables and research questions alongside contact details for the project team.

5. Training

| Attendees | Title and date of course | Provider |
|---|---|---|
| Chris Turner | Putting Your Databases on the Web | Oxford University Computing Service |
| Chris Turner & Anna Sexton | The TEI Framework and How to Use It | Oxford University Computing Service |
| Chris Turner & Anna Sexton | Publishing TEI Documents | Oxford University Computing Service |
| Chris Turner | AHDS Digitisation Workshop | Oxford University Computing Service |
| Chris Turner | Online Resource Discovery and Use - Humbul Humanities Hub | Oxford University Computing Service |


Copyright © UCL 2002