UCL English

The Survey of English Usage – Resources

The Survey of English Usage carries out research in English language Corpus Linguistics. We construct corpora, develop tools and methodologies, and carry out original research into the English language itself.

The ICE-GB Corpus

ICE-GB is the British Component of the International Corpus of English. Published by the Survey of English Usage, it contains over one million words of fully parsed written and spoken English from c. 1990-92.

The DCPSE corpus

DCPSE is the Diachronic Corpus of Present-day Spoken English. Published by the Survey of English Usage, it contains over 800,000 words of fully parsed spoken English from 1957 to 1993.

ICECUP 3.1

ICECUP 3.1 is a state-of-the-art corpus exploration program designed for parsed corpora such as ICE-GB and DCPSE. It is distributed free with our corpora and includes extensive online help.

The TOSCA/ICE Grammar

The TOSCA/ICE Grammar is a detailed grammatical framework based on Quirk et al.’s (1985) A Comprehensive Grammar of the English Language. It is used throughout our parsed corpora.

Fuzzy Tree Fragments

Fuzzy Tree Fragments (FTFs) are structured grammatical queries designed for searching parsed corpora. FTFs were initially developed in the Corpus Query project and are implemented in ICECUP.

Statistics Resources

A range of statistics resources developed by Sean Wallis in the course of his research. See also his corp.ling.stats blog.

Parsed Corpora

The Research Projects section of our site summarises two major parsed corpora of English.

ICE-GB is the British Component of the International Corpus of English, containing samples of written and spoken contemporary (early 1990s) English. ICE-GB is available in a Release 2 version with updated software and, optionally, aligned digital audio.

DCPSE is the Diachronic Corpus of Present-day Spoken English, a parsed corpus of spoken English containing samples spanning the late 1950s to the early 1990s. This resource is ideal for studying language change in the recent past.

Both corpora are sampled from very large numbers of participants in a wide range of contexts. They are fully parsed and richly annotated using the TOSCA/ICE Grammar, and the parse trees are fully searchable with our ICECUP software.

These resources are available to order.


Corpus Research Tools

ICECUP 3.1 is our state-of-the-art corpus exploration platform, designed from the ground up for disseminating, exploiting and carrying out research with parsed corpora.


Grammatical Schema

A glossary for the grammatical framework employed in ICE-GB and DCPSE is on the TOSCA/ICE Grammar page.


Grammatical Query Methodologies

Fuzzy Tree Fragments (FTFs for short) are a powerful and intuitive way of specifying a grammatical query in a parsed corpus or treebank. An FTF is a visual representation of part of a tree: potential tree nodes (clauses, phrases, wordclasses) are combined by a flexible set of relationships (links and edges), mirroring the structures found in the corpus.
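The general idea of matching a partially specified tree fragment against a parse tree can be illustrated with a toy sketch in Python. This is not ICECUP's implementation or query format. The class names (Node, Pattern), the function names, the example labels and the single "immediate or eventual" parent-child link are all assumptions made for illustration, and the sketch ignores word order, sibling relationships and features.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    """A corpus tree node with a function and a category label (illustrative)."""
    function: str
    category: str
    children: List["Node"] = field(default_factory=list)


@dataclass
class Pattern:
    """A hypothetical query node; None means 'left unspecified'."""
    function: Optional[str] = None
    category: Optional[str] = None
    children: List["Pattern"] = field(default_factory=list)
    # Link to the parent pattern node: True = must be a direct child,
    # False = may be any descendant (a looser, "fuzzy" relationship).
    immediate: bool = True


def descendants(node: Node) -> Iterator[Node]:
    """Yield every node below `node`, depth first."""
    for child in node.children:
        yield child
        yield from descendants(child)


def matches(pattern: Pattern, node: Node) -> bool:
    """True if `node` fits `pattern` and each pattern child is found below it."""
    if pattern.function not in (None, node.function):
        return False
    if pattern.category not in (None, node.category):
        return False
    for sub in pattern.children:
        candidates = node.children if sub.immediate else descendants(node)
        if not any(matches(sub, c) for c in candidates):
            return False
    return True


def find_all(pattern: Pattern, root: Node) -> List[Node]:
    """Return every node in the tree (root included) that matches `pattern`."""
    return [n for n in [root, *descendants(root)] if matches(pattern, n)]


# A small hand-built tree; the labels are illustrative only.
tree = Node("PU", "CL", [
    Node("SU", "NP", [Node("NPHD", "PRON")]),
    Node("VB", "VP", [Node("MVB", "V")]),
    Node("OD", "NP", [
        Node("DT", "DTP"),
        Node("NPHD", "N"),
        Node("NPPO", "PP", [
            Node("P", "PREP"),
            Node("PC", "NP", [Node("NPHD", "N")]),
        ]),
    ]),
])

# Query: a noun phrase that directly contains a postmodifying PP.
np_with_pp = Pattern(category="NP",
                     children=[Pattern(function="NPPO", category="PP")])
print([n.function for n in find_all(np_with_pp, tree)])
```

Leaving a label unspecified, or relaxing the link from "immediate" to "eventual", widens the set of matching trees. In this sketch that is what makes the fragment "fuzzy".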

Our FTF pages on this site describe this system in detail, including the links and edges available and how they work in practice when matching examples in the corpus. We also provide an introduction to using FTFs in experimental research with parsed corpora.

FTFs are integrated at the core of ICECUP's database technology, along with lexical wildcards, and are also used for the dynamic retrieval of examples from ICE-GB in our teaching platform, Englicious.