UCL DEPARTMENT OF ENGLISH

The Survey of English Usage

Widely recognised for its research on present-day English, the Survey was founded a full half century ago. Its outputs include A Comprehensive Grammar of the English Language, by four internationally renowned linguists: Randolph Quirk, Sidney Greenbaum, Geoffrey Leech, and Jan Svartvik (Quirk et al. 1985). It is also well known for its pioneering work in the area of corpus linguistics.

A linguistic corpus is a collection of written and spoken material, compiled for the purposes of language research. The SEU houses three corpora: the Survey of English Usage Corpus, the British component of the International Corpus of English (ICE-GB), and the Diachronic Corpus of Present-Day Spoken English (DCPSE). ICE-GB and DCPSE are parsed collections of transcribed speech and writing, which means that every sentence is fully grammatically analysed in the form of so-called tree diagrams, i.e. graphical representations of the structure of sentences, using a grammar based on Quirk et al. (1985). Together these corpora amount to over 1.4 million words of fully parsed English sentences. They span a range of genres and contexts, comprising spoken transcriptions and written text from a variety of situations, as well as sound recordings of the later material. The corpora have been used all over the world for scholarly research, and are state-of-the-art resources. The SEU also has established experience in building web resources for grammar teaching, e.g. the JISC-funded Internet Grammar of English.
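
As a rough illustration of what "fully parsed" means, the sketch below stores one sentence as a nested tree and prints it as an indented outline. The node labels (CL, NP, VP and so on) and the tuple layout are assumptions made for this example only; they are not the ICE-GB annotation scheme or the format used by the Survey's software.

```python
# Toy parse tree: an inner node is (label, [children]); a leaf is (label, word).
# Labels are illustrative only and do not reproduce the ICE-GB tag set.
tree = (
    "CL", [
        ("NP", [("PRON", "I")]),
        ("VP", [("V", "love")]),
        ("NP", [("DET", "this"), ("N", "film")]),
    ],
)


def leaves(node):
    """Return the words of the sentence in left-to-right order."""
    label, content = node
    if isinstance(content, str):
        return [content]
    words = []
    for child in content:
        words.extend(leaves(child))
    return words


def render(node, depth=0):
    """Return the tree as indented lines, one node per line."""
    label, content = node
    indent = "  " * depth
    if isinstance(content, str):
        return [f"{indent}{label}: {content}"]
    lines = [f"{indent}{label}"]
    for child in content:
        lines.extend(render(child, depth + 1))
    return lines


print("\n".join(render(tree)))
```

Because every word sits inside labelled constituents, a researcher can search for structures (for example, every noun phrase containing a determiner) and not only for strings of words.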

The Survey has a five-pronged research strategy:

  a. The creation of new corpora
  b. The further development of current corpora
  c. The development of computer software for creating and exploiting corpora
  d. The development of educational resources using corpora
  e. Carrying out research using SEU resources

a. The creation of new corpora

The Survey intends to create parsed corpora of literary works (e.g. Shakespeare) in the future, with the aim of building bridges between language and literature.

b. The further development of current corpora

The Survey’s three corpora (the ‘Quirk corpus’, ICE-GB and DCPSE) can be further annotated, e.g. at the prosodic, pragmatic, and morphological levels of linguistic analysis. ICE-GB and DCPSE are already fully parsed, and extending the analysis would open up new avenues of research (for example, exploring interactions between grammar and pragmatics). The intention is to seek funding for this further development.

c. The development of computer software for creating and exploiting corpora

The SEU has developed tools for automatic annotation, including part-of-speech tagging (automatically classifying words as verb, noun, etc.) and parsing. Central to all our work is ICECUP, exploration software designed for working with a parsed corpus and developed by the Senior Research Fellow, Sean Wallis. This software has been used both for building corpora and for carrying out research with them, and is distributed freely with our corpora.

The SEU was recently funded by the ESRC to build a corpus experimentation platform (the Next Generation Tools for Linguistic Research in Grammatical Treebanks project) which allows scholars to define viable research projects into grammar and lexis of far greater complexity than was previously possible.

Sean Wallis’s research is in the area of corpus linguistics methodology. This is the ‘how do we do research?’ question: how to design and annotate a corpus, how to use a new corpus for your linguistic research (and learn the grammar of the corpus), what kinds of experiments we can undertake (and what the results mean), and how to develop new methods for linguistic analysis.

d. The development of educational resources using corpora

One of the biggest potential exploitation routes of corpora is in education. SEU staff have used corpora to develop educational material ranging from the famous Comprehensive Grammar to the Internet Grammar of English.

The SEU has recently secured funding to develop web-based teaching and learning resources for schools. The intention is to construct a web platform for teaching English at Key Stages 3-5 (secondary schools and equivalent). Our ICECUP server technology will then be used to select examples from a corpus and publish them online. The corpus provides a vast supply of natural language examples for teaching, and can supply the context for any example provided.

The SEU’s corpora are also used as an integral element of both undergraduate and postgraduate teaching in the department, for example in the undergraduate course Literary Linguistics and the postgraduate course Corpus Linguistics (both run in alternate years).

e. Carrying out research using SEU resources

As noted above, the SEU’s corpora are used throughout the world to carry out English language research. For decades, however, the Survey has also carried out its own English language research. A recently completed project is The English Noun Phrase, funded by the AHRC, which resulted in a monograph published by Cambridge University Press and authored by the researcher working on the project, Evelien Keizer, now at the University of Amsterdam.

Currently, Bas Aarts and a researcher are working on an AHRC-funded project entitled The Changing English Verb Phrase, which will use DCPSE as its database to investigate short-term changes in the English verbal system, such as the increased use of progressive constructions (e.g. I love this film vs. I’m loving this film). Bas Aarts is also using the SEU’s resources to write a grammar of English for Oxford University Press.

The Survey’s research on recent change in English is complemented by Kathryn Allan’s work on change over a longer timespan, which focuses on lexical semantic change and lexicology. The possible use of diachronic corpora and other electronic resources to investigate semantic change is a major theme of a volume on current methods in historical semantics, which Kathryn Allan is co-editing for Mouton de Gruyter’s ‘Topics in English Linguistics’ series.

See also:

» The Survey of English Usage website

Department of English - University College London - Gower Street - London - WC1E 6BT - Telephone: +44 (0)20 7679 3134 - Copyright © 1999-2005 UCL

