EASTER - Evaluating Automated Subject Tools for Enhancing Retrieval
This is a JISC-funded project hosted by UKOLN at the University of Bath.
The purpose of the project is to:
- test and evaluate existing tools for automated subject metadata generation;
- better understand what is possible and what the limitations of current solutions are; and
- make subsequent recommendations for services employing subject metadata in the JISC community.
The Department of Information Studies, University College London, is a supporting partner, and Vanda Broughton is a project consultant.
FATKS - Facet analytical theory in knowledge structures
This recent research project was funded by a grant from the (then) Arts and Humanities Research Board under its Innovation award scheme, which was designed to support speculative and highly theoretical research. Under the formal title "Towards a knowledge structure for high performance subject access and retrieval within managed digital collections", the research investigated the feasibility of creating a fully faceted indexing language for use with digital resources in the humanities.
The immediate aim of the research was to develop and evaluate a prototype classification and implementation in collaboration with the Arts and Humanities Data Service (AHDS) and the Humbul Humanities Hub in order to fulfil the following objectives:
- to make a major contribution to the development of facet analytical theory
- to test an innovative method for accessing digital content, taking into account the complexity and variety of digital resources
- to test an innovative method for accessing digital content in a cross-disciplinary framework
- to develop a working prototype of a knowledge structure extensible across the arts and humanities
- to provide a model for such schemes for other disciplines and the wider community
- to provide the capacity for mapping between this knowledge structure and recognized international standards to ensure interoperability
- to disseminate the results of the research.
The project was felt to have very significant implications for the broad community of users of the AHDS, Humbul, and, more generally, for developments within the JISC IE (Joint Information Systems Committee Information Environment) and other information discovery activities. It would make it possible to carry out cross-collection searches in ways that are much more effective than can be achieved by current linear indexing schemes.
The process of building a classification as experienced in FATKS was three-fold. Initially, work focused on the classification itself: conceptual content, coverage, structure, syntax, humanities subject-related matters, the level of specificity and a plan for incremental development. The second stage involved the design of a data model and editorial tool, and addressed functionality and interface issues for both a desktop and a Web application. The third stage concerned the facilitation of verbal access to the classification (keywords, chain index, thesaurus) and mapping. The development of the prototype classification demonstrated the feasibility of building a system that translates the conceptual approach of facet analysis into a manageable data structure that can support all the semantic and syntactic features of a fully faceted vocabulary.
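As a rough illustration of how verbal access such as a chain index might be derived from a hierarchical data structure, the following Python sketch generates chain index entries by walking from a concept up through its broader terms. It is not the FATKS data model: the Concept class, its fields, and the sample terms and facet labels are all hypothetical and chosen only to show the general technique.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Concept:
    """A hypothetical node in a faceted classification (not the FATKS model)."""
    term: str                          # verbal label for the concept
    facet: str                         # illustrative facet label
    parent: Optional["Concept"] = None  # next broader concept, if any


def chain_of(concept: Concept) -> List[str]:
    """Return the terms from the given concept up to the broadest class."""
    chain = []
    node = concept
    while node is not None:
        chain.append(node.term)
        node = node.parent
    return chain  # most specific term first


def chain_index_entries(concept: Concept) -> List[str]:
    """Build one index entry per link in the chain.

    Each entry leads with one term and qualifies it with its broader
    terms, so every step of the hierarchy gets its own access point.
    """
    chain = chain_of(concept)
    return [": ".join(chain[i:]) for i in range(len(chain))]


# Sample data, invented for illustration only.
religion = Concept("Religion", facet="Discipline")
christianity = Concept("Christianity", facet="Thing", parent=religion)
liturgy = Concept("Liturgy", facet="Activity", parent=christianity)

for entry in chain_index_entries(liturgy):
    print(entry)
```

Sorting the entries produced for every concept alphabetically would give a simple A-Z index in which each term appears in the context of its broader classes.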
Full details of the project, demonstrators and project documentation can be found on the project website.
Automatic metadata generation for resource discovery
This JISC-funded project was carried out in conjunction with the Arts & Humanities Data Service.
Resource discovery metadata is a crucial component of the lifecycle of digital resources. Without appropriate metadata, resources remain hidden and unused and much of the original investment is wasted. Metadata is a powerful tool that enables relevant digital resources to be discovered and selected quickly and easily, and standardising it is crucial to interoperability. Poor-quality or non-existent metadata, on the other hand, is equally effective at rendering resources unusable: without it, a resource is essentially invisible within a repository or archive and remains undiscovered and inaccessible.
The JISC IE operates on the underlying assumption that quality metadata is necessary for the effective discovery of learning, teaching and research materials across the Web. Effective discovery, in turn, enhances and promotes the re-use of digital materials, an issue of increasing significance now that value-adding is seen to be essential for post-creation services (data centres, portals, etc.) and for ensuring an adequate return on investment in research and digitisation projects. However, there is a general view that metadata creation is expensive (a view only partially borne out by the very few investigations of costs).
Author-created metadata has been proposed as a solution, but concerns remain over the quality of metadata produced by non-professionals. Fortunately, the features of digital resources that caused the problem in the first place (their amenability to automated production, copying, and manipulation) may come to the rescue. Automated metadata generation is still in its infancy, but several approaches have emerged:
- metatag harvesting;
- content extraction;
- automatic indexing or classification;
- text and data mining;
- social tagging;
- auto-generation of extrinsic metadata.
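To make the first of these approaches concrete, the sketch below shows one simple way metatag harvesting could work: it reads an HTML page and collects the title and any name/content pairs found in meta elements. It uses only Python's standard html.parser module. The class name MetaTagHarvester, the harvest function, and the sample page are assumptions made for illustration and do not describe any of the tools evaluated in the report.

```python
from html.parser import HTMLParser
from typing import Dict, List


class MetaTagHarvester(HTMLParser):
    """Collect <title> text and <meta name=... content=...> pairs."""

    def __init__(self) -> None:
        super().__init__()
        self.metadata: Dict[str, List[str]] = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        name = attr.get("name")
        content = attr.get("content")
        if tag == "meta" and name and content is not None:
            # Lower-case the name so "Keywords" and "keywords" are merged.
            self.metadata.setdefault(name.lower(), []).append(content)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_title and text:
            self.metadata.setdefault("title", []).append(text)


def harvest(html: str) -> Dict[str, List[str]]:
    """Return the embedded metadata found in an HTML string."""
    parser = MetaTagHarvester()
    parser.feed(html)
    parser.close()
    return parser.metadata


# Sample page, invented for illustration only.
sample_page = """
<html><head><title>Sample page</title>
<meta name="DC.subject" content="Facet analysis">
<meta name="keywords" content="classification, metadata">
</head><body><p>Body text.</p></body></html>
"""

print(harvest(sample_page))
```

Harvesting of this kind can only return what the page author embedded, so the quality of its output depends entirely on the quality of the original tags.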
The report aimed to provide a state-of-the-art survey and evaluation of currently available metadata generation tools, an analysis of trends in automatic classification and indexing research, and an investigation of the role of folksonomies and other democratic tagging tools in the creation of subject metadata. Full report.