About the Survey of English Usage
The Survey of English Usage carries out research in English language Corpus Linguistics, and was the first centre in Europe to undertake this type of research. From its inception in 1959, the Survey collected samples of naturally-occurring language for the purposes of description and analysis.
Corpus Linguistics studies the structure and use of naturally-occurring language by first collecting samples of spoken or written language and then analysing those samples.
The first corpus projects predated cheap computer power and mass storage. The original Survey Corpus was recorded on reel-to-reel Revox tape recorders, transcribed by hand, and then typed up, stored and annotated on paper cards or ‘slips’.
The advent of modern desktop computing has completely changed all that. From the 1990s onwards, major corpus projects, such as ICE-GB and DCPSE, were digitised, transcribed, annotated and indexed on computers. We have developed software tools to help us undertake this work and, partly as a result, the sophistication of annotation, search facilities and research potential has grown.
Once such a resource is constructed, what can you do with it? A significant theme in our research concerns how best one may exploit computerised corpora for linguistic purposes.
Our current and recent projects are summarised on our Research Projects pages. The results of our research, including reference material, downloadable software and corpora, and software sales, are available from the Resources pages.
History
The Survey of English Usage (‘the Survey’) was founded in 1959 by Randolph Quirk. Many well-known linguists have spent time doing research at the Survey, including Valerie Adams, John Algeo, Dwight Bolinger, Noël Burton-Roberts, David Crystal, Derek Davy, Jan Firbas, Sidney Greenbaum, Liliane Haegeman, Robert Ilson, Ruth Kempson, Geoffrey Leech, Terttu Nevalainen, Jan Rusiecki, Jan Svartvik and Joe Taglicht, among many others.

The ‘Gang of Four’ in the 1970s (left to right): Quirk, Greenbaum, Svartvik and Leech.

The ‘Gang of Five’ in 1983 (left to right): Svartvik, Crystal, Greenbaum, Leech and Quirk.
The ‘Quirk Corpus’
The million-word Survey Corpus, now complete, samples written and spoken British English produced between c.1955 and 1985. It comprises 200 texts, each of 5,000 words. The spoken texts include both dialogue and monologue, while the written texts include not only printed and manuscript material but also examples of English read aloud, as in broadcast news and scripted speeches.

The Survey Corpus was originally compiled on paper, in the form of many thousands of slips, with detailed grammatical annotations. This has now been computerized and each lexical item has been automatically tagged for wordclass. (It is available on the network of computers at the Survey premises. The original sound recordings may also be consulted at the Survey.)
Hundreds of publications have used, and continue to use, material from the Survey Corpus, either in its original printed form on slips of paper or in the later computerized spoken form, which became known as the London-Lund Corpus (LLC).

Foster Court in the late 1940s – what would become the Survey premises are on the top right of the picture.
The International Corpus of English
Randolph Quirk was succeeded in 1983 by Professor Sidney Greenbaum, who was Director until 1996. The ICE project began in 1990, with the Survey responsible for the international coordination of the project and for the compilation of ICE-GB, its British component. The Survey has produced the grammatical and syntactic annotation schemes for the ICE corpora, as well as numerous software packages to support their compilation.
Bas Aarts took over as Director of the Survey in January 1997. The first release of ICE-GB took place in 1998. ICE-GB was distributed with ICECUP, software for searching and exploring the parsed corpus. Release 2 of ICE-GB is now available for purchase (optionally with sound files).
The Diachronic Corpus of Present-Day Spoken English
A recent project at the Survey parsed a large (400,000-word) selection of the spoken part of the LLC in a manner directly comparable with ICE-GB. Together with a similar amount of spoken material from ICE-GB, this forms a new 800,000-word diachronic corpus, the Diachronic Corpus of Present-Day Spoken English (DCPSE). DCPSE has now been released and is also available for purchase.
For more about our current research, see our Research Projects pages.
Funding
We gratefully acknowledge support for research from:
- University College London
- The Arts and Humanities Research Board
- The Arts and Humanities Research Council
- The British Academy
- The Economic and Social Research Council
- The Engineering & Physical Sciences Research Council
- The Leverhulme Trust
- The Joint Information Systems Committee
- HM Government (DSIR, OSTI, SSRC, DES)
- Longman Publishers
- Oxford University Press
- Cambridge University Press
- Ford Foundation
- Naturmetodens Sproginstitut
- The Gulbenkian Foundation
- Bank of Sweden Tercentenary Foundation
- Sir Sigmund Sternberg Foundation
- IBM
- ESPRIT
- The Michael Marks Charitable Trust
- British Telecom
- The British Sasakawa Foundation
Full contact details
The Survey of English Usage
Department of English Language and Literature
University College London
Gower Street
London WC1E 6BT
UK
phone: +44 20 7679 3119
email: ucleseu@ucl.ac.uk
Credits
All material on this web site is copyright © the UCL Survey of English Usage.
Photographs are used with permission from staff personal collections.