in knowledge representation and organization for the 21st century: integration
of knowledge across boundaries: proceedings of the Seventh International
ISKO Conference, 10-13 July 2002, Granada, Spain. Eds. María
J. López-Huertas with the assistance of Francisco J. Muñoz-Fernández.
Würzburg: Ergon Verlag, 2002. (Advances in Knowledge Organization;
Vol 8). pp. 135-142.
Facet analytical theory as a basis for a knowledge organization
tool in a subject portal
and background to the project
The object of the research is to develop a subject tool for the
management of the new portal with two functions envisaged. Firstly
the system will be used to organize the front end of the portal,
and will provide a structure for the first point of entry to the
site. A directory style layout will be provided with subject headings
developed from the system, together with browsable indexes derived
from the inversion of the subject headings. The knowledge tool
has the additional function of providing a vocabulary for use
in subject metadata.
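As a rough illustration of how browsable index entries might be derived from subject headings, the sketch below rotates each heading so that every component appears once in the lead position. The rotation rule, the " - " separator and the sample headings are assumptions made for the example; the project's actual inversion rules are not specified here.

```python
def rotated_index(headings, separator=" - "):
    """Return sorted (lead_term, rotated_heading) pairs for a browsable index.

    Each component of a heading is brought to the front once, with the
    remaining components following in their original cyclic order.
    """
    entries = []
    for heading in headings:
        parts = [part.strip() for part in heading.split(separator)]
        for i, lead in enumerate(parts):
            rotated = parts[i:] + parts[:i]
            entries.append((lead, separator.join(rotated)))
    # Alphabetical filing of the rotated forms, ignoring case.
    return sorted(entries, key=lambda entry: entry[1].lower())


# Hypothetical headings, not taken from the project's vocabulary.
sample_headings = [
    "History - Britain - 19th century",
    "Literature - Poetry - France",
]

for lead, entry in rotated_index(sample_headings):
    print(f"{lead:15} {entry}")
```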
Currently the two gateways create catalogue data for all of the digital objects in their collections using the Dublin Core. For the subject field (and for organization of the display of resources) Humbul uses the Dewey Decimal Classification, and AHDS uses Library of Congress Subject Headings. Neither of these systems has proved entirely satisfactory, and for the combined portal both organizations decided that they wanted to explore the possibilities of a new and more satisfactory indexing tool, logically structured and based on modern theories of subject organization, and designed with the indexing needs of digital materials specifically in mind.
A second indexing problem related to the digital nature of the
materials is the need to identify non-semantic properties of the
objects i.e. properties of electronic format, media, and so on,
since these may well be sought terms.
There is also a desire to exploit the major tool for 'cross-referencing' in a digital environment - hypertext. Hypertext is able to facilitate retrieval in at least two ways. Firstly within the visual display where it can be used to expose successive layers of a hierarchy (see below) maintaining the detail of the developed structure without exposing too much of it to the user at once. Secondly it supports the notion of multiple points of access to the network of resources and reinforces the regularity of the structure of the system.
During discussions with staff at SLAIS it was decided that a new indexing tool, built on faceted classification principles, might meet the specific indexing needs of the situation better than the conventional systems currently being employed.
nature of classificatory structures
Despite this, the classification scheme is often seen as ineffective as an organizer for electronic resources (Koch, 1998). Some obvious reasons for this can be found in the lack of logical structure in older systems of classification. Traditional classification schemes are built on the basis of a tree structure, with the emphasis on downward subdivision into smaller and more specific classes; often, the only relationships acknowledged are those of super- and subordination and there is no provision for syntactic relations. As a result the classification is usually relatively broad and there may be limited facilities for combining between classes, or for expressing complex semantic content. As seen in the example given above, this can cause difficulties when searching in a multi-disciplinary environment, or when dealing with objects of a complex nature.
Classifications built on the facet analytical model work on different principles, starting from the constituent concepts of the subject, rather than from a view of knowledge in its entirety. It is important to establish what precisely we mean by facet analysis, since there are a number of interpretations of this theory, some of them very different from the original work of S. R. Ranganathan. The situation is complicated by the current interest shown by computer scientists in this area, and the frequent use of the term faceted classification to mean any system of subject organization in which analytico-synthesis is used. This notion is also often employed in the World Wide Web literature on facet analysis (Maple, 1995), and also in the United States usage of the term when compared with the British or European tradition. A recent commentator on types of classification scheme (Koehler, 2000) states that "Universal classification schemes ... tend to classify works by a single characteristic", contrasting them with faceted schemes which are considered to express more than one aspect of a subject.
In this paper, and in the accompanying project, facet analysis is interpreted in a much more restricted way; it is taken to mean that rigorous process of terminological analysis whereby the vocabulary of a given subject is organized into facets and arrays, resulting in a complex knowledge structure with both semantic and syntactic relationships clearly delineated.
Classifications of this type provide effective tools for vocabulary management, document description and retrieval. Facet analysis provides a method which is in theory appropriate for the management of terminologies and concepts in a variety of environments, although its applications have so far mainly been limited to the conventional print-based document collection.
As an ordering device for initial access to resources in a managed site the faceted classification has some advantages over traditional schemes. The predictable format of the classification is usually very evident to the casual searcher, and the structure can also be used to generate subject headings and a browsable index, as alternatives to, or in support of, a directory style front page.
Although facet analysis has been used primarily for the physical
organization of print-based materials, its potential for the management
of documents in an automated environment has already been recognized.
More than ten years ago Godert (Godert, 1991)
and Ingwersen and Wormell (Ingwersen & Wormell,
1992) testified to the performance of faceted schemes when used in
conjunction with online databases for both document description and
the framing of queries. Ingwersen and Wormell (Ingwersen &
Wormell 1992, p.199) were able to state with confidence that
"the discussion demonstrates the suitability of the
faceted categorization, not only for textual documents, but also
with other forms of carriers of information. Faceted categorization
may provide multi-dimensional and hence structured entry points
to document contents, and thus give intellectual access to generated
and stored knowledge." More recently the specific application
of facet analysis to searching the World Wide Web has been discussed
by Ellis and Vasconcelos (Ellis & Vasconcelos,
2000) who conclude that "
is derived inductively from the concepts or terms used in the
subject field, it can alleviate some problems in searching the
World Wide Web by being applied to using subject directories or
The system currently under development at UCL is built on those classificatory principles developed by the Classification Research Group in the mid twentieth century out of the original work of S. R. Ranganathan. The internal logic of the system is based on rigorous analysis of vocabulary, whereby terms are sorted into a standard set of functional categories. Within these categories a range of semantic relations are acknowledged, and problems of vocabulary control addressed. A sophisticated system syntax provides for the ordering and combination of terms both intra- and inter-category. There is consequent improved performance in the accommodation of complex subjects, the predictability of location, and in the effectiveness of retrieval. This methodology has been brought to a high level of sophistication in the second edition of the Bliss Bibliographic Classification (Mills & Broughton, 1977-) where it has been used to create a new general scheme of classification (albeit built on the infrastructure of the original scheme (Bliss, 1940-1953)).
Taking the BC2 methodology as a standard, the new system begins from the literary warrant of the objects to be classified. The digital objects within the humanities collections are used to establish the terminology to be worked on. A set of standard categories is used for analysis which follows the 'standard citation order'. These categories are functional in nature; they comprise thing/entity - kind - part - property - material - processes - operations - patient - product - agents - space - time. Although developed initially within the areas of science and technology (so that they resemble a production process), in the process of the BC2 work these standard categories have been applied successfully to the vocabulary of subject fields in almost all areas of knowledge. As one moves across the spectrum of disciplines from science to the arts it is true that the number of these categories that are used becomes smaller; notions such as materials, products and agents are less often found in the humanities. It is also true that some domain specific categories can occur. Commonly recognised examples include those of genre and form in literature and music. We expect that new categories can be determined and that the techniques of facet analysis can be applied in new ways and to new material. It is the methodology that is here being tested, and not any specific structure that has been created in the past.
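The first step of the analysis, sorting terms into the standard categories, can be sketched as follows. This is a minimal illustration only: the category names are those listed above, but the function name, the data layout and the sample terms with their category assignments are hypothetical and do not come from the project's schedules.

```python
# Standard categories in citation order, as listed in the text.
CITATION_ORDER = [
    "thing/entity", "kind", "part", "property", "material", "processes",
    "operations", "patient", "product", "agents", "space", "time",
]


def sort_into_categories(term_categories):
    """Group terms by category, returning only the categories actually used.

    term_categories maps each term to the category assigned by the analyst.
    The result preserves citation order, with terms listed alphabetically.
    """
    facets = {category: [] for category in CITATION_ORDER}
    for term, category in term_categories.items():
        if category not in facets:
            raise ValueError(f"unknown category {category!r} for term {term!r}")
        facets[category].append(term)
    return {c: sorted(terms) for c, terms in facets.items() if terms}


# Hypothetical terms from a history collection (assignments are illustrative).
sample = {
    "castles": "thing/entity",
    "stone": "material",
    "restoration": "operations",
    "masons": "agents",
    "Wales": "space",
    "13th century": "time",
}

for category, terms in sort_into_categories(sample).items():
    print(f"{category}: {', '.join(terms)}")
```

Only six of the twelve categories are populated in this sample, which is consistent with the observation that fewer categories tend to be needed in the humanities. A domain-specific category such as genre could be accommodated by extending the list.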
Once the terms have been sorted into categories, further analysis structures each category and clusters the terms into arrays or groups of terms which share a common property or characteristic (known as the characteristic of division). Various principles of ordering are used to determine the sequence of arrays within the category, and the collocation of terms within the array: chronological, developmental, spatial and physical contiguity, and the commonly used classificatory principles of increasing concreteness, increasing speciality, and increasing complexity. The organization of the categories or facets into a single sequence normally follows the inverse of standard citation order, i.e. the order given above, reversed; this brings the more concrete facets to the end of the sequence, and facilitates the location of compounded terms.
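The arrangement just described, facets laid out in the reverse of citation order with terms grouped into arrays by characteristic of division, might be sketched like this. The name FILING_ORDER, the nested dictionary layout and the sample terms are assumptions for illustration; the order of terms within each array is taken as given, standing in for whichever ordering principle the compiler has chosen.

```python
CITATION_ORDER = [
    "thing/entity", "kind", "part", "property", "material", "processes",
    "operations", "patient", "product", "agents", "space", "time",
]
# The schedule sequence is the citation order inverted.
FILING_ORDER = list(reversed(CITATION_ORDER))


def build_schedule(facets):
    """Lay out facets, arrays and terms as indented schedule lines.

    facets maps category -> {characteristic of division -> ordered terms}.
    """
    lines = []
    for category in FILING_ORDER:
        arrays = facets.get(category)
        if not arrays:
            continue
        lines.append(category)
        for characteristic, terms in arrays.items():
            lines.append(f"  (by {characteristic})")
            lines.extend(f"    {term}" for term in terms)
    return lines


# Hypothetical facets; period terms are in chronological order.
sample_facets = {
    "thing/entity": {"function": ["castles", "churches"]},
    "space": {"country": ["Wales"]},
    "time": {"period": ["medieval", "early modern"]},
}

print("\n".join(build_schedule(sample_facets)))
```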
The classificatory structure built in this way will consist of single simple terms arranged in a systematic way according to the logic of the system - a pure faceted structure in Ranganathan's understanding. In practice we find that the terminology does not always lend itself to absolute reduction of this kind. There are constant occurrences of terms which represent compounds and combinations of all sorts, but which require to be included in the schedules (and perhaps more significantly in the index) because of their status within the literature and their existence as sought concepts. Consequently the classification consists of rather more than the 'skeleton' of simple concepts in mutually exclusive sets. As examples of compound concepts are used to populate the classes the structure grows in complexity. Where, as is usually the case, the compound represents the intersection of two or more categories, the system syntax (or rules of combination) controls the precise location of the compound and a much expanded classification is generated by the interaction of the syntax and the conceptual structure. The basic citation order can be repeated as necessary at any point within the classification, and subjects of considerable complexity can be accommodated in a highly logical, predictable and regular fashion. A fully developed faceted classification on the model of BC2 can by this means generate a very complex knowledge structure of n-dimensionality and great logical regularity, and with deep levels of hierarchy. The resultant structure can be utilised in a number of ways: as an ordering device, as a source of index terms and subject headings, and as the basis for a thesaurus.
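The role of the system syntax in fixing the location of a compound can be illustrated with a small sketch that combines terms from different categories in citation order. This is a simplification under stated assumptions: it allows one term per category, does not model repetition of the citation order within a class, and the function name, separator and sample terms are hypothetical.

```python
CITATION_ORDER = [
    "thing/entity", "kind", "part", "property", "material", "processes",
    "operations", "patient", "product", "agents", "space", "time",
]


def synthesize(components, separator=" - "):
    """Combine one term per category into a compound, in citation order.

    components maps category -> term. The input order is irrelevant, so the
    same set of concepts always yields the same compound.
    """
    unknown = set(components) - set(CITATION_ORDER)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    ordered = [components[c] for c in CITATION_ORDER if c in components]
    return separator.join(ordered)


# The same concepts supplied in two different orders (illustrative terms).
first = synthesize({
    "time": "13th century", "thing/entity": "castles",
    "space": "Wales", "operations": "restoration",
})
second = synthesize({
    "operations": "restoration", "space": "Wales",
    "thing/entity": "castles", "time": "13th century",
})
print(first)
print(first == second)
```

Because the combination order is fixed by rule rather than by the indexer, a searcher who knows the categories can predict where a compound subject will be found.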
The principles of categorization implicit in facet analytical theory can be extended to organize any set of properties of objects in any domain, and of the domain itself. The theory is sufficiently well established to allow variation in the classical form, and the compiler of a faceted structure need not feel restricted to the categories and combinatorial rules of standard citation order. Viewed in this way, facet analysis should be thought of as a powerful methodological tool rather than a specific arrangement of topics in a given field. Any attributes of objects which are significant for retrieval can be incorporated into the categorical formulation; hence facet analytical theory can accommodate concepts descriptive of a digital environment and the objects in it when building a structure to function as a retrieval tool in that environment. Its capacity to cope with objects with complex combinations of attributes in a logical and predictable fashion permits the creation of structures which reflect the multidimensionality of such objects. There also appears to be potential to exploit hypertext to generate links across such structures, and to increase the points of entry to the semantic network in an innovative manner.
As we have seen above, the effectiveness of browsing across a series of semantically discrete databases is important to the success of the combined portal; therefore experiments will be carried out to test the use of the knowledge structure in cross-disciplinary browsing and retrieval of digital resources within the digital collections. In a test-bed implementation for the research, AHDS and Humbul are applying the knowledge structure to the Portal's planned metadata repository for all the digital objects in their collection. Initially a pilot project will build a structure for the discipline of history and test its effectiveness over this part of the collection. This area has been chosen because of the pervasiveness of the discipline, and its particular suitability for testing cross-disciplinary searching and retrieval. Extensible markup language (XML) will be the vehicle for implementing the knowledge structure, and the potential of this combination of the knowledge structure and the markup language seems to be very considerable.
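One way in which a faceted structure might be carried in XML is sketched below, using Python's standard xml.etree.ElementTree module. The element and attribute names (classification, facet, array, term, category, characteristic) are invented for this example and should not be read as the project's schema, which is not described here; the sample terms are likewise hypothetical.

```python
import xml.etree.ElementTree as ET


def facets_to_xml(subject, facets):
    """Build an XML tree from {category: {characteristic: [terms]}}."""
    root = ET.Element("classification", {"subject": subject})
    for category, arrays in facets.items():
        facet_el = ET.SubElement(root, "facet", {"category": category})
        for characteristic, terms in arrays.items():
            array_el = ET.SubElement(
                facet_el, "array", {"characteristic": characteristic}
            )
            for term in terms:
                ET.SubElement(array_el, "term").text = term
    return root


# Hypothetical fragment of a history schedule.
history = facets_to_xml("history", {
    "time": {"period": ["medieval", "early modern"]},
    "space": {"country": ["Wales", "Scotland"]},
    "thing/entity": {"function": ["castles", "churches"]},
})

# Serialize for storage or exchange.
xml_text = ET.tostring(history, encoding="unicode")
print(xml_text)

# Retrieve every term in one facet using ElementTree's limited XPath support.
space_terms = [
    el.text for el in history.findall("./facet[@category='space']/array/term")
]
print(space_terms)
```

Keeping the category and the characteristic of division as explicit attributes means that both the semantic grouping and the position of each term in the structure remain available to any application that reads the file.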
It is certainly true that, to date, markup languages have not been utilised to index the intellectual content of documents. Hypertext markup language is engaged only with the format and display of documents. Extensible markup language has taken things a stage further in considering document structure and provides more information to the searcher in terms of the constituent parts of the document, although from the viewpoint of traditional retrieval this leaves us only with the equivalent of the author-title catalogue organized by a descriptive cataloguing standard. The problems of intellectual content of the object have yet to be addressed.
There is currently much interest in the development of the 'Semantic web'. In an article in Scientific American Tim Berners-Lee (Berners-Lee, 2001) looks at how the emphasis in web searching might be switched from a provenance based way of handling documents to one linked to the semantic content; in this new approach to document analysis the ontology has an essential role to play, and it is clear that Berners-Lee's understanding of the ontology bears a close relationship to the kind of structure that we are developing at SLAIS. He says (Berners-Lee, 2001 p 34) "The most typical kind of ontology for the Web has a taxonomy and a set of inference rules. The taxonomy defines classes of objects and relations among them ... Classes, subclasses and relations among entities are a very powerful tool for Web use ... Inference rules in ontologies supply further power." He also describes (Berners-Lee, 2001 p 37) what is essentially the process of mapping and of the use of a classification with a notation to enable information exchange: "An essential process is the joining together of subcultures when a wider common language is needed. ... The relations allow communication and collaboration even where the commonality of concept has not yet led to a commonality of terms."
The capacity of the classification scheme or controlled indexing language to perform this function of intermediary in the exchange of information across languages and cultures is well represented in the professional literature. We are also clear that the use of a faceted structure has advantages in terms of its efficiency as a retrieval tool, and that it has the potential to handle complex digital objects and facilitate cross-disciplinary searching. It remains to be seen whether the implementation of a system with these properties using XML might provide us with a tool with far wider application.
1. Berners-Lee, T.; Hendler, J.; Lassila, O. (2001). The semantic web. Scientific American, May (2001) [online]. Available at URL: http://www.sciam.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21
5. HILT (2001) Interim report on the High Level Thesaurus http://www.hilt.strath.ac.uk/Reports/Consultation.htm (Accessed 14.02.2002)
7. Koch, T. (1998) The role of classification schemes in Internet resource description and discovery; work package 3 of Telematics for Research project DESIRE (RE 1004) http://www.ukoln.ac.uk/metadata/desire/classification/ (Accessed 29.08.01)
8. Koehler, W. (2000) Web document management
9. Maple, A. (1995) Faceted access; a review
of the literature Paper presented at the Music Library Association
Annual Meeting 10 February Also available at