The Survey of English Usage carries out research in English-language Corpus Linguistics. Since its inception, the Survey has gathered samples of naturally occurring language for the purposes of description and analysis. Research results can also be found on our Resources pages.

Constructing Corpora

The first corpus compiled at the Survey was the 'Quirk Corpus', which later became better known as the 'London' part of the London-Lund Corpus (LLC). This corpus was the first of its kind in Europe.

In 1988 Sidney Greenbaum proposed a new project, ICE, the International Corpus of English. ICE was to be an international project, carried out at research centres around the world, to compile corpora of varieties of English in countries where English is a first language or an official second language. Each national component would contain a balanced one-million-word sample of spoken and written English, so that the components could be compared in a wide variety of ways. The ICE project continues around the world to the present day.

ICE-GB, the British component of ICE, was compiled at the Survey. ICE-GB was annotated in great detail, including a full grammatical analysis (parse) of every sentence in the corpus. ICE-GB was first released in 1998 and was distributed with ICECUP, software for searching and exploring the parsed corpus. Release 2 of ICE-GB is now available on CD.

Besides comparing varieties of English, many researchers are interested in how language develops and changes over time. A recent project at the Survey parsed a large (400,000-word) selection of the spoken part of the LLC in a manner directly comparable with ICE-GB. Together with comparable spoken material from ICE-GB, this forms a new 800,000-word diachronic corpus, the Diachronic Corpus of Present-Day Spoken English (DCPSE). DCPSE is now available on CD.

Exploring Corpora

Parsed corpora are large databases containing detailed grammatical tree structures. Building large collections of valuable linguistic data creates a pressing need for methods and tools that help researchers and other users make the most of them. So, in parallel with parsing natural language data, we have researched and developed software tools to help linguists use our corpora.

The Corpus Queries project developed an intuitive and robust grammatical query system called Fuzzy Tree Fragments (FTFs). FTFs are approximate models of grammatical structure that are easy to understand and have a variety of applications. Our ICECUP software uses FTFs to carry out grammatical searches on parsed corpora in an exploratory environment.

A project called Next Generation Tools for Linguistic Research in Grammatical Treebanks extends the exploratory platform of ICECUP to support cycles of linguistic experiments on parsed corpora. Using this system, linguists can carry out focused experiments to unpick how one grammatical choice affects another. See also the new pages on ICECUP IV.

Linguistic Research with Corpora

As well as distributing our corpora and tools to the Corpus Linguistics research community, we carry out our own research into the English language. Recent projects include research on the English Noun Phrase and on Subordination in Spoken and Written English. A new research project on the English Verb Phrase has started.

We also have a number of PhD students who carry out corpus-based research. More information is available on request.

Applying Corpus Research

A great deal of linguistic research is unashamedly ‘pure’ research, dedicated to understanding how language works and changes over time, or between speaker communities. However, corpora also have a number of potentially very useful applications.

One important area of application is education. A new Knowledge Transfer project, Teaching English Grammar in Schools, aims to develop a complete system for teaching English grammar at secondary-school level. The system draws on a corpus to provide many real-world examples in context, selected and presented dynamically, something no traditional grammar book can do.

The first publication from this project is a spin-off, the interactive Grammar of English app for the iPhone and other hand-held devices.

