ICE-GB is the British component of the International Corpus of English (ICE).
ICE began in 1990 with the primary aim of providing material for comparative studies of varieties of English throughout the world.
More than twenty centres around the world are preparing corpora of their own national or regional variety of English. These include:
- East Africa (Kenya, Malawi, Tanzania)
- Great Britain (parsed)
- Trinidad and Tobago
ICE-GB was first released in 1998 with ICECUP 3.0. Since then it has been used for research and education in universities, colleges and schools all over the world.
ICE-GB Release 2 on CD-ROM
Release 2 is aligned with the 300 audio recordings, which are available as an optional extra. You can also upgrade from ICE-GB R1 at a low cost. See the order form for details, and the reasons for upgrading further down this page.
- One million words of spoken and written British English from the 1990s.
- Tagged, parsed and checked.
- Bundled with the latest ICECUP 3.1 exploration software designed for parsed corpora.
- Supplied with Getting Started with ICECUP 3.1 (40pp) and extensive on-line help.
- Option: Audio recordings for 300 spoken texts.
64-bit note: ICECUP 3.1 is a 16-bit program (as is its installer), and has run successfully on PCs running every version of Windows from 3.1 to 7. It runs fine on 32-bit Windows 7 on 64-bit hardware.
Unfortunately, 64-bit versions of Windows cannot currently run ICECUP directly. If you wish to run ICECUP 3.1 under 64-bit Windows, we suggest you investigate using Microsoft Virtual PC. For more information see this blog. We have not been able to test this ourselves, so whether you succeed or fail, do let us know!
There are numerous English corpora available. Many are available on CD from ICAME.
ICE-GB is fully grammatically analysed. Like all the ICE corpora, ICE-GB consists of a million words of spoken and written English and adheres to the common corpus design. 200 written and 300 spoken texts make up the million words. Every text is grammatically annotated, permitting complex and detailed searches across the whole corpus.
ICE-GB contains 83,394 parse trees, including 59,640 in the spoken part of the corpus. This is the biggest collection of parsed spoken material anywhere with the exception of DCPSE (which only contains spoken material). The picture below shows ICECUP 3.1 displaying a single tree from the spoken part of the corpus.
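The figures quoted above imply a few derived quantities that are easy to check. The following is a minimal sketch in Python: the constants are taken from this page, the one-million word count is the nominal corpus size rather than an exact token count, and the variable names are ours.

```python
# Figures quoted on this page.
TOTAL_TREES = 83_394      # parse trees in the whole corpus
SPOKEN_TREES = 59_640     # parse trees in the spoken part
SPOKEN_TEXTS = 300
WRITTEN_TEXTS = 200
NOMINAL_WORDS = 1_000_000  # nominal size; the exact token count is not given here

# Derived quantities.
written_trees = TOTAL_TREES - SPOKEN_TREES
total_texts = SPOKEN_TEXTS + WRITTEN_TEXTS
spoken_share = SPOKEN_TREES / TOTAL_TREES
words_per_text = NOMINAL_WORDS / total_texts
words_per_tree = NOMINAL_WORDS / TOTAL_TREES

print(f"Written parse trees: {written_trees}")
print(f"Total texts: {total_texts}")
print(f"Spoken share of trees: {spoken_share:.1%}")
print(f"Average words per text (nominal): {words_per_text:.0f}")
print(f"Average words per tree (nominal): {words_per_tree:.1f}")
```

On these figures the written part contributes 23,754 trees, and a little over seven in ten trees come from the spoken part.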
ICE-GB has been fully checked. Linguists checked it at several stages in its completion, using both a traditional post-checking strategy and cross-sectional error-based searches. We do not believe that the analysis in the corpus is perfect, but it is not systematically imperfect, unlike the output of even the best parser.
ICE-GB comes complete with ICECUP. ICECUP allows you to perform a variety of different queries, including Fuzzy Tree Fragments (FTFs), which you construct from the parse analysis in the corpus and then use to search it.
Release 2 of ICE-GB is now available. This includes, as an optional paid-for extra, the digitised speech recordings of the spoken part of the corpus, aligned with the text. This allows researchers to play back the original source of the text that they can see on their screen.
A sample corpus from ICE-GB Release 2 and ICECUP 3.1 is now available for download. We also invite linguists to contribute to the development of cutting-edge corpus linguistics tools by participating in our beta programme.
A book about ICE-GB and ICECUP was published in 2002.
More information about ICE-GB
ICE-GB Release 1 was a milestone in corpus linguistics when it was released in 1998. Why should any current user of ICE-GB consider upgrading to Release 2? Here are three reasons.
- An enhanced corpus. We have reinstated some missing material and corrected the transcription (and thus the parse analysis) when we reviewed the recordings.
- More facilities. ICECUP 3.1 contains many more facilities for search and exploration. These include lexical wild-card queries, enhanced Fuzzy Tree Fragments (FTFs), an integrated lexicon and grammaticon for ICE-GB, and the ability to compute and extract statistical tables.
- Synchronous audio. If you want to play the audio aligned with the transcription you will need to upgrade. This facility means that if you search for a word in the corpus, you can hear the passage containing that word.
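For readers unfamiliar with the idea behind a lexical wild-card query, the sketch below illustrates the general technique in Python. It is not ICECUP's query syntax or implementation: the function name `wildcard_search`, the shell-style `*` pattern and the sample sentence are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def wildcard_search(tokens, pattern):
    """Return (position, token) pairs whose token matches a shell-style pattern.

    Matching is case-insensitive. '*' stands for any run of characters and
    '?' for a single character (Python fnmatch conventions, not ICECUP's).
    """
    lowered = pattern.lower()
    return [
        (i, tok)
        for i, tok in enumerate(tokens)
        if fnmatchcase(tok.lower(), lowered)
    ]


# Hypothetical sample text, not taken from the corpus.
tokens = "She walked while he was walking and they talk".split()
hits = wildcard_search(tokens, "walk*")
for position, token in hits:
    print(position, token)
```

A pattern such as `walk*` retrieves every token beginning with "walk", so one query can cover several inflected forms of a word.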
The ICE-GB project is coordinated by the Survey of English Usage. Enquiries should be sent to the Survey (firstname.lastname@example.org).
This page last modified 28 May, 2015 by Survey Web Administrator.