ICECUP 3.1 is a state-of-the-art corpus exploration program designed for parsed corpora such as ICE-GB and DCPSE.

“The second release of the ICE-GB corpus with accompanying sound files, the first release of the DCPSE corpus and ICECUP Version 3.1 deserve serious examination from those involved in all flavours of corpus linguistics. They offer considerable opportunities for the analysis of recent language change in British English and for the analysis of discourse patterns (linguistic and extra-linguistic) from a syntactical perspective. They demonstrate how the Survey of English Usage continues to make new contributions to the use of corpora in linguistic analysis and theory.” – O’Donnell (2008).
Preview





“The possibilities offered by the Utility Program are impressive, especially with respect to syntactic queries.” – The Year’s Work in English Studies, 2004, 83.1
Corpus Map

The Corpus Map provides an overview of the corpus. Here we can see part of the spoken division of ICE-GB and its structure. The currently selected point in the corpus map is a text, S1A-002. This has two subtexts, each consisting of a dialogue with two speakers, labelled A and B. More information about the text is available on the right-hand side if the window is expanded.

You can also view and extract frequency data as a table of statistics.
Lexical wild cards
ICECUP 3.1 supports lexical searches using wild cards.
Wild cards are a way of specifying only part of a word in a query. For example, you can search for all words starting with the letter P by searching for ‘p*’.

As well as indicating missing characters, ICECUP also lets the user define sets or insert predefined sets of characters (e.g. ^v = a vowel, ^c = a consonant). The example above shows a search for a three-letter word consisting of two consonants followed by the letter 'a', such as spa, bra and PTA.
wild card | description | examples | explanation |
---|---|---|---|
* | Multiple | a* *ing b*ing | Any number of characters |
? | 1 character | a??? b?c?u?e | Any one character |
{ } | Set | w{0123} t{a-z??} | User defined set |
^ | Escape | b^vd be^c^v | Predefined set |
^ | Escape | ^? ^' ^{ ^- ^& ^^ | Literal ?, {, etc. |
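As a rough illustration of how such patterns behave, the following Python sketch translates a simplified subset of the wild card syntax into regular expressions. It is not ICECUP's implementation. It assumes that `*` may match zero characters, that `^v` and `^c` mean English vowels and consonants, and that matching ignores case; user defined sets in braces are left out. The names `wildcard_to_regex` and `matches` are hypothetical.

```python
import re

# Predefined sets; assumed here to mean English vowels and consonants.
PREDEFINED = {
    "v": "[aeiou]",
    "c": "[b-df-hj-np-tv-z]",
}

def wildcard_to_regex(pattern):
    """Translate a simplified wild card pattern into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":
            parts.append(".*")          # any number of characters (assumed: zero or more)
        elif ch == "?":
            parts.append(".")           # exactly one character
        elif ch == "^" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # ^v and ^c insert a predefined set; ^ before anything else is a literal.
            parts.append(PREDEFINED.get(nxt, re.escape(nxt)))
            i += 1
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.IGNORECASE)

def matches(pattern, word):
    """Return True if the whole word matches the wild card pattern."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None
```

Under these assumptions, `matches("^c^ca", "PTA")` and `matches("b^vd", "bed")` both succeed, while `matches("a???", "apple")` fails because the pattern requires exactly four characters.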
ICECUP also lets users search for sets of words and wild card patterns. The set '{apple banana chocolate}' searches for all words matching apple, banana, or chocolate.
You can also create exclusion lists. For example the search '{*ing ~thing ~morning ~evening}+<N>' (a noun ending in -ing but not thing, morning, or evening) might be used to search for gerunds (this list would be extended).
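The logic of a word set with exclusions can be sketched in a few lines of Python. This is only an approximation of the behaviour described above: it uses the standard `fnmatch` module for `*` and `?`, ignores case, and leaves out the word class restriction (`+<N>`). The function names `parse_word_set` and `in_word_set` are hypothetical.

```python
from fnmatch import fnmatchcase

def parse_word_set(spec):
    """Split a set such as '{a b ~c}' into (included, excluded) pattern lists."""
    items = spec.strip().strip("{}").split()
    included = [p for p in items if not p.startswith("~")]
    excluded = [p[1:] for p in items if p.startswith("~")]
    return included, excluded

def in_word_set(word, spec):
    """True if the word matches an included pattern and no excluded pattern."""
    included, excluded = parse_word_set(spec)
    w = word.lower()
    if any(fnmatchcase(w, p.lower()) for p in excluded):
        return False
    return any(fnmatchcase(w, p.lower()) for p in included)
```

With the set `{*ing ~thing ~morning ~evening}`, the word "running" is accepted and "thing" is rejected. In this sketch "something" is still accepted, because `~thing` excludes only the exact word.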
Finally, ICECUP is designed to work with a tagged and parsed corpus. This means that users can restrict the lexical item by part of speech, as above, or insert the wild card into a Fuzzy Tree Fragment.
Fuzzy Tree Fragments
This is a simple Fuzzy Tree Fragment, or ‘FTF’. We believe that FTFs are intuitive grammatical queries. A whole section of this website explains FTFs in more detail.

The idea is that you use a simplified fragment of a grammatical tree to specify a search across the corpus.
FTFs contain nodes, words, links and edges. Nodes and words match the nodes and words in corpus trees. Links and edges specify relationships between nodes and words.
FTFs are very similar to syntactic trees in the corpus. Each node contains a category part, specifying the element's type, including its word class; a function part, defining the node's relationship in the phrase or clause that it is part of; and a set of features which refine the category.
There are three ways of constructing an FTF.
- You can build an FTF from scratch with the editor. This is where you make a very abstract query more specific: what is known as “top-down” editing.
- Alternatively, you can make an FTF from a tree in the corpus, working from the bottom up. ICECUP 3.1 lets you simply mark nodes in a tree, and these are incorporated into an FTF that matches them.
- You can also use Text Fragment and simple Node searches to make simple FTFs.
ICECUP 3.1 also enhances Fuzzy Tree Fragments to make them more expressive. The main way this is done is with logic. For example, the element at the top right in the figure above is defined as “N v PRON” – noun or pronoun.
In this FTF, all the links are immediate, that is, the nodes they match against must be directly connected to one another in the specified arrangement. The direct object : object complement relationship is also ordered.
More information on Fuzzy Tree Fragments and their topology is on our FTF pages.
Fuzzy Tree Fragments work on parsed corpora. To see the results of applying an FTF we provide new ways of concordancing the corpus.
Logic in Fuzzy Tree Fragments
To help you work with enhanced FTFs, ICECUP 3.1 has a vastly enhanced FTF editor interface. One of its key elements is a floating ‘edit node’ window.
Below you can see just the options for editing the function and category elements in a single FTF node.
Using this set of options alone (called Func:Cats), you can say that the node may match one of a set of possible categories (N v PRON) or cannot possess a particular function (¬NPHD). The window shows the set of functions and categories compatible with its complement.

The edit node window has two other ‘tabs’: Features and Logic.
With the Features tab you can allow a node to match any set of alternative features. You can mark a feature class as unspecified (useful in formal experiments), or individually negate any feature (to say ‘CL(~ditr)’, a clause that is not ditransitive, for example).
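To make the idea of a node pattern concrete, here is a minimal Python sketch of a node with function, category and features, and a pattern that allows alternative categories and negated functions or features. The class and field names are hypothetical and the model is far simpler than ICECUP's editor; the patterns `N v PRON`, `¬NPHD` and `CL(~ditr)` are the ones mentioned above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    function: str
    category: str
    features: frozenset = frozenset()

@dataclass(frozen=True)
class NodePattern:
    categories: frozenset = frozenset()          # empty means "any category"
    excluded_functions: frozenset = frozenset()  # functions the node must not have
    excluded_features: frozenset = frozenset()   # features the node must not have

    def matches(self, node):
        if self.categories and node.category not in self.categories:
            return False
        if node.function in self.excluded_functions:
            return False
        return not (self.excluded_features & node.features)

# The three patterns discussed in the text.
noun_or_pronoun = NodePattern(categories=frozenset({"N", "PRON"}))           # N v PRON
not_np_head = NodePattern(excluded_functions=frozenset({"NPHD"}))            # ¬NPHD
not_ditransitive = NodePattern(categories=frozenset({"CL"}),
                               excluded_features=frozenset({"ditr"}))        # CL(~ditr)
```

For example, `noun_or_pronoun.matches(Node("NPHD", "PRON"))` is true, while `not_np_head` rejects the same node because its function is NPHD.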
The Node Logic Editor
You can also edit any logical combination of patterns using the Node Logic editor. This is built into the Logic tab.
With this editor you can manipulate any propositional expression of node patterns. For example, you can specify that a node in an FTF may not match a particular pattern or may match one or other of two (or more) patterns, as shown below. (You can also combine node patterns with ‘and’, although this is usually less useful!)
The following diagram shows the Edit Node window with the logic editor enabled.

The currently selected node pattern is reflected in the other controls above the tabs.
To edit a pattern, you can either use these pull-down controls or select another tab and then flip back to the logic editor. To edit the overall logical expression you can click on the buttons below the tab or use the keyboard.
The simplify option, shown as a light bulb, applies logical reasoning to draw out the implications of the expression. This can be useful in preventing your expressions from becoming too convoluted!
FTF Concordancing

For concordancing to operate with FTFs, we define the concept of an FTF focus node or nodes. These were shown highlighted with a yellow border in the FTF diagram (the direct object and its following object complement). Note also that we may have multiple matches of an FTF in the same text unit.
Traditional key word in context concordancing works with plain text or tagged corpora. ICECUP supports this kind of concordancing, letting you see part of speech information covered by, or in the vicinity of, matching cases.
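For readers unfamiliar with key word in context (KWIC) concordancing, the following Python sketch shows the basic idea on a plain list of tokens. It is a generic illustration, not ICECUP's concordancer; the function name `kwic` and the `width` parameter are hypothetical.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, match, right context) for each hit of keyword."""
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows

# An invented sentence, used only to show the output shape.
tokens = "I think it is pretty good and pretty cheap".split()
for left, match, right in kwic(tokens, "pretty", width=2):
    print(f"{left:>10} [{match}] {right}")
```

Each row lines up the matching word between its left and right context, which is what makes patterns in the surrounding words easy to scan.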
Moreover, the corpus is parsed. ICECUP can search the grammar at any level. ICECUP 3.1 therefore provides more powerful ways of exploring the grammatical context of sets of matching cases. These grammatical concordancing modes let you view adjacent phrases and clauses, or browse constituents.
ICECUP also allows us to view individual parse trees in the corpus. This view also allows us to see how an FTF matches against the grammar in the corpus.
Viewing Syntactic Trees and FTF matches

This picture shows one of the 83,394 grammatical trees in the corpus, matched against the FTF that we have already seen.
You can navigate around the tree, expand or contract branch structure, or use the mouse to zoom in on parts of the tree to see more detail.

ICECUP 3.1 supports zooming and dragging of many windows, including tree views.
While exploring the corpus you may find that some other part of the tree is interesting, and wish to create an FTF using this as a base. You can use the Wizard tool to create your FTF from any tree in the corpus.
FTFs are designed around the observation that the greatest difficulty that people have searching a parsed corpus is learning the grammar, both in the abstract and in its practical expression in the corpus. ICECUP lets you perform complex grammatical searches quickly. FTF searches are efficient and operate in the background. If you make a mistake, you will find out quickly.
You can use the corpus map and the other viewers to find examples of utterances in a particular linguistic context and then use their parse analysis to build your Fuzzy Tree Fragment. We turn to this question next.
Creating FTFs from Trees – the Wizard tool
The simplicity of FTFs is that it is quite easy to see how the FTF query matches cases in the corpus.
Moreover, this matching process can be reversed. We can take a tree and abstract a query from it. The slogan is
“Use the corpus to search the corpus.”
ICECUP 3.0 introduced an innovative Wizard facility for making FTFs from trees. You selected a branch of the tree and then pressed the Wizard button. This then presented the user with a large window with a series of options. However, this was rather confusing.
In ICECUP 3.1, we provided a new “Version 2” Wizard. This works slightly differently. The idea is that you first mark the nodes to include in the FTF, and then ask the Wizard to relate them together.
To mark the nodes you can right click them with the mouse, right-drag a box around them (as above) or use the ‘Select nodes for Wizard’ button. When you are happy you hit the Wizard button as before.

Suppose you mark the nodes in the sentence Let's just stop there [S1A-001 #84] as above and hit the Wizard button.
The Wizard options control the process. Depending on the status of the option 'Make tree links immediate' (yes / no), ICECUP will either
- construct an FTF with the same number of intervening nodes as the tree (immediate = yes), or
- delete intermediate nodes and insert an 'eventual' ancestor link (immediate = no).
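The difference between an immediate link and an 'eventual' ancestor link can be sketched with a toy tree in Python. This is an illustration of the two link types only, not of the Wizard itself; `TreeNode`, `descendants` and `link_matches` are hypothetical names, and nodes are reduced to a single label.

```python
from dataclasses import dataclass, field

@dataclass
class TreeNode:
    label: str
    children: list = field(default_factory=list)

def descendants(node):
    """Yield every node below the given node, at any depth."""
    for child in node.children:
        yield child
        yield from descendants(child)

def link_matches(parent, child_label, immediate):
    """Check a parent-child link: direct children only, or any descendant."""
    candidates = parent.children if immediate else descendants(parent)
    return any(c.label == child_label for c in candidates)

# A toy tree: CL dominates VP, which dominates V.
tree = TreeNode("CL", [TreeNode("VP", [TreeNode("V")])])
```

Here `link_matches(tree, "V", immediate=True)` is false because VP intervenes, whereas `link_matches(tree, "V", immediate=False)` is true. Keeping the intervening VP node in the query corresponds to the first Wizard option; dropping it and relaxing the link corresponds to the second.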


You can also ask ICECUP to set the edges of the FTF or to mark word order in the FTF. For more information on what the different links and edges do, see Links and Edges in FTFs.
You can edit the FTF and apply it to the corpus. ICECUP is extremely forgiving in its interface, allowing linguists to experiment with the grammar.
Drag & Drop Logic

Every text browsing window stores the results of a query. The “query” may be simple, like the whole corpus, a random sample from the corpus, or a text. Or it may be complex, like an FTF. Regardless, each query can be thought of as a distinct atomic element, which may be combined with another using a very simple kind of logic: propositional logic.
Drag and Drop logic has two parts: the drag and drop interface, which allows you to pick up elements from one query and copy or move them into another query (also possible using Copy and Paste commands) and modify the logical relationships between these elements; and the rapid recalculation of results, which shows the impact of each logic change across the entire corpus near-instantaneously.
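One way to picture this is to treat each query result as a set of matching text units and each logical operator as a set operation. The Python sketch below does exactly that; it assumes results can be represented as sets of text unit identifiers, and all names and numbers in it are invented for illustration.

```python
def q_and(a, b):
    """Text units matching both queries."""
    return a & b

def q_or(a, b):
    """Text units matching either query."""
    return a | b

def q_not(a, universe):
    """Text units in the corpus that do not match the query."""
    return universe - a

# Hypothetical text unit ids: the whole corpus, one text, and the hits of an FTF.
corpus = frozenset(range(1, 11))
one_text = frozenset({1, 2, 3, 4})
ftf_hits = frozenset({2, 3, 5, 8})

in_text_and_ftf = q_and(one_text, ftf_hits)                  # FTF hits inside the text
in_text_not_ftf = q_and(one_text, q_not(ftf_hits, corpus))   # text units the FTF misses
```

Because each combination is just a set operation over stored results, changing an operator only requires recombining existing sets rather than searching the corpus again.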
The Lexicon and Grammaticon
ICECUP 3.1 has two new overview tools, the Lexicon and Grammaticon. Like the Corpus Map, they are a way of viewing different corpus queries as a hierarchical structure.
You can define your own lexicon view of any lexical or word class element, e.g. a lexicon of pronouns or of words matching ‘prett*’ (below). You can ask ICECUP to subdivide the lexicon into sections, e.g. by part of speech or initial letter.
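The idea of a lexicon view subdivided into sections can be sketched as a grouping of word frequencies. The Python example below is an illustration under simple assumptions: the input is a list of (word, part of speech) pairs, patterns use `fnmatch` wild cards, and the function name `build_lexicon` and the sample data are hypothetical.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

def build_lexicon(tagged_tokens, pattern, key="pos"):
    """Count words matching pattern, grouped by part of speech or initial letter."""
    sections = defaultdict(lambda: defaultdict(int))
    for word, pos in tagged_tokens:
        w = word.lower()
        if not fnmatchcase(w, pattern):
            continue
        section = pos if key == "pos" else w[0]
        sections[section][w] += 1
    return {s: dict(words) for s, words in sections.items()}

# Invented tagged tokens, for illustration only.
sample = [("pretty", "ADJ"), ("Pretty", "ADV"), ("prettier", "ADJ"),
          ("pretty", "ADV"), ("press", "N")]
lexicon = build_lexicon(sample, "prett*")
```

With this sample, `lexicon` holds one section per part of speech, each listing the matching word forms and their counts.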

Statistical tables
One interesting extension in the new version of ICECUP is its support for tables of statistics derived from the corpus. ICECUP 3.1 lets you define tables in the Corpus Map, Lexicon and Grammaticon.

You can add query columns to the table by Drag and Drop and insert columns to calculate ratios and evaluate statistical significance. For example, the last column in the table above shows the proportion in ICE-GB of utterances of ‘pretty/ier/iest’ made by male participants.
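The arithmetic behind a ratio column and a simple significance check can be sketched as follows. This is not a description of the tests ICECUP applies: it assumes a plain proportion and a 2 × 2 chi-square test without continuity correction, and the function names are hypothetical.

```python
def proportion(hits, total):
    """Share of cases in one column out of the row total."""
    return hits / total

def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

def significant(chi2, critical=3.841):
    """Compare against the critical value for p < 0.05 with one degree of freedom."""
    return chi2 > critical
```

A table row for a word would supply `a` and `b` (for example, its frequency among male and female speakers), and a baseline row would supply `c` and `d`. The statistic then indicates whether the word's distribution differs from the baseline by more than chance would suggest.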
You can also organise the lexicon and contract the hierarchy to view statistics at an appropriate level of generality. The table below shows baseline statistics for different parts of speech for words starting with ‘work’.

Corpus Map tables can be used to explore whether sociolinguistic variation predicts other changes. Lexicon and grammaticon tables, like these here, can be used to see if lexical or grammatical alternation can predict sociolinguistic variation (a kind of 'stylistic' prediction).
Note: If you want to explore whether one lexico-grammatical choice interacts with another (two text-internal variables), then ICECUP's tables will not give you the results you need. Instead, you should use the methods outlined for Grammatical predictors in the FTF experiments webpages.
Sound playback (ICE-GB, with ICE-GB R2 Sound)
ICECUP 3.1 also supports simultaneous synchronised audio playback using sound recordings (available as an optional extra). ICE-GB R2 Sound is the digital audio for the 300 spoken texts in the ICE-GB corpus, subdivided into individual text units or short series of text units, and accurately aligned with the texts. You can download the audio for the ICE-GB sample corpus by following the links below.
Once the audio is installed you can use ICECUP to hear the audio. Playback controls will appear on the button bar.

Simply press play to hear the sound recording for a particular text unit in the corpus.
The following examples are also included in the sample corpus package.

Sometimes it is not possible to subdivide the audio into a single text unit. In the audio for S1A-012, in the first sentence we hear speaker A, with speaker B overlapping.

Of course, ICE-GB does not simply consist of text but of grammatically annotated trees. With audio installed you can open a spy window while a text is playing. You can browse trees and hear speakers talk.
Using the continuous play option, when one section has been played back ICECUP will move on to the start of the next one and play this.
