EASTER - Evaluating Automated Subject Tools for Enhancing Retrieval
Project website
This is a JISC-funded project hosted by UKOLN at the University of Bath.
The purpose of the project is to:
- test and evaluate existing tools for automated subject metadata generation;
- better understand what is possible and what the limitations of current solutions are; and
- make subsequent recommendations for services employing subject metadata in the JISC community.
The Department of Information Studies, University College London, is a supporting partner, and Vanda Broughton is a project consultant.
FATKS - Facet analytical theory in knowledge structures
This recent research project was funded by a grant from the (then) Arts and Humanities Research Board under its Innovation award scheme, designed to support speculative and highly theoretical research. Under the formal title "Towards a knowledge structure for high performance subject access and retrieval within managed digital collections", the research investigated the feasibility of creating a fully faceted indexing language for use with digital resources in the humanities.
The immediate aim of the research was to develop and evaluate a prototype classification and implementation in collaboration with the Arts and Humanities Data Service (AHDS) and the Humbul Humanities Hub in order to fulfil the following objectives:
- to make a major contribution to the development of facet analytical theory
- to test an innovative method for accessing digital content, taking into account the complexity and variety of digital resources
- to test an innovative method for accessing digital content in a cross-disciplinary framework
- to develop a working prototype of a knowledge structure extensible across the arts and humanities
- to provide a model for such schemes for other disciplines and the wider community
- to provide the capacity for mapping between this knowledge structure and recognized international standards to ensure interoperability
- to disseminate the results of the research.
The project was felt to have very significant implications for the broad
community of users of the AHDS, Humbul, and, more generally, for
developments within the JISC IE (Joint Information Systems Committee
Information Environment) and other information discovery activities. It
would make it possible to carry out cross-collection searches in ways
that are much more effective than can be achieved by current linear
indexing schemes.
The process of building a classification as experienced in FATKS was
three-fold. Initially, work focused on the classification of conceptual
content, coverage, structure, syntax, humanities subject-related
matters, the level of specificity and a plan for incremental
development. The second stage involved the design of a data model and
editorial tool, and addressed functionality and interface issues for
both desktop and Web applications. The third stage concerned the
facilitation of verbal access to the classification (keywords, chain
index, thesaurus) and mapping. The development of the prototype
classification demonstrated the feasibility of building a system that
translates the conceptual approach of facet analysis into a manageable
data structure that can support all the semantic and syntactic features
of a fully faceted vocabulary.
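As an illustration of the verbal access mentioned above, a chain index can be derived mechanically from a classification hierarchy: each step in a chain of terms yields one index entry, qualified by its broader terms. The following is only a minimal sketch of the general chain indexing technique, not the FATKS data model or editorial tool; the function name and the sample chain are hypothetical.

```python
def chain_index(chain):
    """Build chain index entries from a hierarchy path.

    `chain` lists terms from broadest to most specific. Each entry is a
    term followed by its broader terms, so every level is searchable.
    """
    entries = []
    # Walk from the full chain down to the single broadest term.
    for depth in range(len(chain), 0, -1):
        entries.append(": ".join(reversed(chain[:depth])))
    return entries


# Hypothetical chain, invented for illustration (not taken from FATKS).
chain = ["Arts", "Music", "Opera"]
entries = chain_index(chain)

for entry in entries:
    print(entry)
```

For the sample chain this yields the entries "Opera: Music: Arts", "Music: Arts" and "Arts", so a user looking up any of the three terms is led to the right place in the classification.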
Full details of the project, demonstrators and project documentation can be found on the project website.
Automatic metadata generation for resource discovery
This JISC-funded project was carried out in conjunction with the Arts & Humanities Data Service.
Resource discovery metadata is a crucial component of the lifecycle of
digital resources. Without appropriate metadata, resources remain hidden
and unused and much of the original investment is wasted. Standardising
metadata is crucial to interoperability, since metadata is a powerful
tool that enables the discovery and selection of relevant digital
resources quickly and easily. Poor-quality or non-existent metadata, on
the other hand, is equally effective at rendering resources unusable,
since without it a resource is essentially invisible within a repository
or archive and thus remains undiscovered and inaccessible.
The JISC IE operates on the basis of an underlying assumption that
quality metadata is necessary for the effective discovery of learning,
teaching and research materials across the Web. Effective discovery, in
turn, enhances and promotes the re-use of digital materials, an issue of
increasing significance now that value-adding is seen to be essential
for post-creation services (data centres, portals, etc.), and for
ensuring an adequate return on investment in research and digitisation
projects. However, there is a general view that metadata creation is
expensive (only partially borne out by the very few investigations of
costs).
Author-created metadata has been proposed as a solution but concerns
remain over the quality of metadata produced by non-professionals.
Fortunately, the features of digital resources that caused the problem
in the first place - their amenability to automated production, copying,
and manipulation - may come to the rescue. Automated metadata
generation is still in its infancy but several approaches have emerged:
- metatag harvesting;
- content extraction;
- automatic indexing or classification;
- text and data mining;
- social tagging;
- auto-generation of extrinsic metadata.
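
The first of these approaches, metatag harvesting, can be sketched in a few lines. The example below is a minimal illustration using only the Python standard library, not one of the tools evaluated in the report; the class name, the `harvest` helper and the sample page are all made up for the purpose.

```python
from html.parser import HTMLParser


class MetaTagHarvester(HTMLParser):
    """Collect <meta name=... content=...> pairs and the <title> text."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") and attrs.get("content"):
            # Lower-case the name so "DC.Subject" and "dc.subject" merge.
            key = attrs["name"].lower()
            self.metadata.setdefault(key, []).append(attrs["content"])
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.metadata.setdefault("title", []).append(data.strip())


def harvest(html):
    """Return a dict mapping metadata field names to lists of values."""
    parser = MetaTagHarvester()
    parser.feed(html)
    parser.close()
    return parser.metadata


# Invented sample page; a real harvester would fetch pages over HTTP.
SAMPLE = """<html><head>
<title>Sample resource</title>
<meta name="DC.subject" content="Classification">
<meta name="DC.subject" content="Indexing">
<meta name="description" content="A made-up page.">
</head><body></body></html>"""

record = harvest(SAMPLE)
print(record)
```

A harvester of this kind only recovers what page authors have already embedded, so the quality of the result depends entirely on the quality of the source tags.
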
The report aimed to provide a state-of-the-art survey and evaluation of
currently available metadata generation tools, an analysis of trends in
automatic classification and indexing research, and an investigation of
the role of folksonomies and other democratic tagging tools in the
creation of subject metadata. Full report.