UCL English

The Survey of English Usage – Projects

The Survey of English Usage carries out research in English-language Corpus Linguistics. From its inception, the Survey has gathered samples of naturally occurring language for the purposes of description and analysis. Further results of our research can be found on our Resources pages.
The ICE-GB Corpus

ICE-GB is the British component of the International Corpus of English. Published by the Survey of English Usage, it contains over one million words of fully parsed written and spoken English collected c. 1990–92.

The DCPSE corpus

DCPSE is the Diachronic Corpus of Present-day Spoken English. Published by the Survey of English Usage, it contains over 800,000 words of fully parsed spoken English from 1957 to 1993.

Pregnancy Loss Language

Ongoing linguistic research that engages stakeholders to explore the linguistic challenges of communicating about pregnancy loss.

Teaching English Grammar in Schools

This project, Creating a Web-Based Platform for English Language Teaching and Learning, led to the development of the Englicious website for schools and teachers.

Corpus Queries

The development of an effective grammatical query methodology in the context of a parsed corpus. This project supported the development of Fuzzy Tree Fragments and ICECUP III.

Next Generation Tools

Developing next-generation tools for linguistic research in grammatical treebanks. Alongside a new system, ICECUP IV, this project contributed to the further development of ICECUP.

The English Noun Phrase

Corpus Linguistics research into the English Noun Phrase that has helped enrich our understanding of a core aspect of English grammar, leading to a published monograph.

The English Verb Phrase

A Corpus Linguistics research project into how the English Verb Phrase system continues to change in recent British English.

Subordination in English

A Corpus Linguistics study into the concept of ‘subordination’ and grammatical complexity in spoken and written English.

Our Research

Constructing Corpora

The first corpus compiled at the Survey was the 'Quirk Corpus', which comprises spoken and written English. The spoken component eventually became better known as the 'London' part of the London-Lund Corpus (LLC). This corpus was the first of its kind in Europe.

In 1988 Sidney Greenbaum proposed a new project, ICE, the International Corpus of English. ICE was to be an international project, carried out at research centres around the world, to compile corpora of the varieties of English in countries where English was the first or second official language. Each ICE component would contain spoken and written English in a balanced sample of one million words, so that the components could be compared in a wide variety of ways. The ICE project continues around the world to the present day.

ICE-GB, the British component of ICE, was compiled at the Survey. ICE-GB was annotated in great detail, including a full grammatical analysis (parse) of every sentence in the corpus. The first release of ICE-GB took place in 1998. ICE-GB was distributed with ICECUP, software for searching and exploring the parsed corpus. Release 2 of ICE-GB is now available for download.

As well as contrasting varieties of English, many researchers are interested in language development and change over time. A recent project at the Survey parsed a large (400,000-word) selection of the spoken part of the LLC in a manner directly comparable with ICE-GB. Combined with spoken material from ICE-GB, this formed a new 800,000-word diachronic corpus, the Diachronic Corpus of Present-Day Spoken English (DCPSE). DCPSE has now been released and is available on CD.


Exploring Corpora

Parsed corpora are large databases containing detailed grammatical tree structures. One of the consequences of forming large collections of valuable linguistic data is a pressing need for methods and tools to help researchers and other users make the most of them. So in parallel with the parsing of natural language data, we have carried out research and development of software tools to help linguists use our corpora.

The Corpus Queries project concerned the development of an intuitive and robust grammatical query system called Fuzzy Tree Fragments (FTFs). FTFs are approximate models that can be readily understood and have a variety of applications. Our ICECUP software uses FTFs to carry out grammatical searches on parsed corpora in an exploratory environment.

A project called Next Generation Tools for Linguistic Research in Grammatical Treebanks extended ICECUP's exploratory platform to support cycles of linguistic experiments on parsed corpora.


Linguistic Research with Corpora

As well as distributing our corpora and tools to the Corpus Linguistics research community, we carry out our own research into the English language. Recent projects include research on the English Noun Phrase and the English Verb Phrase.

We also have a number of PhD students who carry out research using corpora. More information is available on request.


Applying Corpus Research

A great deal of linguistic research is unashamedly ‘pure’ research, dedicated to understanding how language works and changes over time, or between speaker communities. Corpora also have a number of potentially very useful applications.

One important area of application is education. A new Knowledge Transfer project, Teaching English Grammar in Schools, aims to develop a complete system for teaching English grammar at secondary-school level. The system uses a corpus to provide many real-world examples in context, selectively and dynamically, which no traditional grammar book can do.

The first publication from this project is the spin-off interactive Grammar of English App for the iPhone and other hand-held devices.


Linguistic Research Using Corpus Linguistics Methods

Even where general corpora are not being constructed, linguists often use methods from Corpus Linguistics alongside other types of linguistic engagement.

Work led by Dr Beth Malory concerns the language used by healthcare professionals and patients in the context of pregnancy loss. The project Engaging Stakeholders to Explore Linguistic Challenges in Communicating about Pregnancy Loss (EStELC) constructed a corpus of interviews with two research cohorts. One cohort represented people with lived experience of pregnancy loss. The other comprised people whose professional role involves providing care for people experiencing pregnancy loss.