The Survey of English Usage – Resources
Parsed Corpora
The Research Projects section of our site summarises two major parsed corpora of English.
ICE-GB is the British Component of the International Corpus of English, containing samples of written and spoken contemporary (early 1990s) English. ICE-GB is available in a Release 2 version with updated software and, optionally, aligned digital audio.
DCPSE is the Diachronic Corpus of Present-day Spoken English, a parsed corpus containing spoken samples spanning the late 1960s to the early 1990s. This resource is well suited to studying language change in the recent past.
Both corpora are sampled from very large numbers of participants in a wide range of contexts. They are fully parsed and richly annotated using the TOSCA/ICE Grammar, and the parse trees are fully searchable with our ICECUP software.
These resources are available to order.
Corpus Research Tools
ICECUP 3.1 is our state-of-the-art corpus exploration platform, designed from the ground up for disseminating, exploiting and carrying out research with parsed corpora.
Grammatical Schema
A glossary for the grammatical framework employed in ICE-GB and DCPSE is on the TOSCA/ICE Grammar page.
Grammatical Query Methodologies
Fuzzy Tree Fragments (FTFs for short) are a powerful and intuitive way of specifying a grammatical query in a parsed corpus or treebank. They are a visual representation of parts of trees in which potential tree nodes (clauses, phrases, word classes) are combined by a flexible set of relationships (links and edges), mirroring the structures found in the corpus.
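To make the idea of matching a partial tree against a parsed corpus more concrete, here is a minimal Python sketch. It is not ICECUP's implementation: the `Node` and `Pattern` classes, the `search` function and the category labels in the example tree are hypothetical. It models only two kinds of flexibility, wildcard nodes and child nodes that must appear in order but need not be adjacent, rather than the full set of FTF links and edges.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    """A parse-tree node: a category label and its ordered children (hypothetical model)."""
    label: str
    children: List["Node"] = field(default_factory=list)


@dataclass
class Pattern:
    """A hypothetical, simplified tree-fragment node.

    label=None is a wildcard that matches any category. Child patterns must
    match the tree node's children in left-to-right order, but other children
    may intervene (i.e. siblings need not be adjacent).
    """
    label: Optional[str] = None
    children: List["Pattern"] = field(default_factory=list)


def matches_at(node: Node, pattern: Pattern) -> bool:
    """True if the fragment matches with its root placed on this node."""
    if pattern.label is not None and pattern.label != node.label:
        return False
    return match_children(node.children, pattern.children)


def match_children(nodes: List[Node], patterns: List[Pattern]) -> bool:
    """True if the patterns match an ordered, not necessarily contiguous, subsequence of nodes."""
    if not patterns:
        return True
    first, rest = patterns[0], patterns[1:]
    for i, candidate in enumerate(nodes):
        if matches_at(candidate, first) and match_children(nodes[i + 1:], rest):
            return True
    return False


def search(tree: Node, pattern: Pattern) -> Iterator[Node]:
    """Yield every node in the tree (pre-order) at which the fragment matches."""
    if matches_at(tree, pattern):
        yield tree
    for child in tree.children:
        yield from search(child, pattern)


# Illustrative tree: a clause containing a noun phrase, a verb phrase and a second noun phrase.
tree = Node("CL", [
    Node("NP", [Node("ART"), Node("N")]),
    Node("VP", [Node("V")]),
    Node("NP", [Node("N")]),
])

# The two NPs are not adjacent (a VP intervenes), yet the fragment still matches the clause.
print([n.label for n in search(tree, Pattern("CL", [Pattern("NP"), Pattern("NP")]))])
# A wildcard parent with a single N child matches both noun phrases.
print([n.label for n in search(tree, Pattern(None, [Pattern("N")]))])
```

Real FTFs express further distinctions between immediate and eventual relationships, first and last children, and lexical constraints, which this sketch leaves out.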
Our FTF pages on this site describe this system in detail, including the available links and edges and how, in practice, they match examples in the corpus. We also provide an introduction to using FTFs in experimental research with parsed corpora.
FTFs are integrated at the core of ICECUP's database technology along with lexical wildcards, and are also used for the dynamic retrieval of examples from ICE-GB in our teaching platform Englicious.