How FTFs match trees


Introduction

In order to understand FTFs you need to understand how their components work together. How does a program like ICECUP decide that an FTF matches (part of) a tree in the corpus?

Recall that an FTF is declarative. In other words, all aspects of the FTF must be true together, and the order in which they are evaluated is not important. (We will not concern ourselves here with how a program might implement this.)
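As a purely illustrative aside, the declarative reading can be sketched in a few lines of Python. This is not how ICECUP is implemented: the dictionary representation of a node and the function names `is_direct_object`, `is_clause` and `matches` are assumptions made for the example.

```python
from itertools import permutations

# Hypothetical constraints, each a simple test on a node.
def is_direct_object(node):
    return node["function"] == "OD"

def is_clause(node):
    return node["category"] == "CL"

def matches(node, constraints):
    # Declarative reading: every constraint must hold at once.
    return all(check(node) for check in constraints)

node = {"function": "OD", "category": "CL"}
constraints = [is_direct_object, is_clause]

# Evaluate the constraints in every possible order and collect the outcomes.
results = {matches(node, list(order)) for order in permutations(constraints)}
```

Whatever order the constraints are tried in, `results` holds a single outcome, which is the sense in which evaluation order is not important.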

All the examples are based on ICECUP, although, as we comment elsewhere, other programs could use the same principle. So that anyone can follow along, examples of matching cases are taken from the freely available sample corpus download (which is itself a sample of ICE-GB).

Single elements

We will start with some simple examples. First we will consider FTFs consisting of a single node, and then FTFs consisting of a single word.

Single node FTFs

Our first examples are single node FTFs.

You can generate a single node FTF in ICECUP by choosing the ‘(inexact) Nodal’ query, typing an expression such as “OD,CL”, and pressing the ‘Edit’ button.

The FTF you get should look like the example below.

Next, find an example of a matching case.

If you then press the key ‘F4’ or the ‘Start!’ button, you will get a complete list of examples of (in this case) direct object clauses. Then double-click on an example to open a tree window.

The resulting tree should look something like the one below (this is S1A-010 #149). The matching case is the node in brown, and the part of the text dominated by the node is shaded. This is because the node also has the ‘focus’ of the FTF.

The FTF contains a number of unspecified edges apart from the ‘OD,CL’ designation. But because these are unspecified, they do not limit the position of the matching case. So this query will match direct object clauses in the last position in the branch, or in other positions, e.g., as in “I think you will agree because... they were dumbfounded” [S1A-094 #52], where the direct object clause is followed by “because... they were dumbfounded”, which is analysed as an adverbial clause.

The second point to notice is that the FTF explicitly contains a ‘word’ element, which is unspecified. If we did specify the word, the FTF would only match examples that contained that word. Note that the position of the word would not be fixed: it could appear anywhere among the words covered by the node.

Finally, note that the FTF can match more than once within a single corpus tree.
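The behaviour described in this section can be sketched as follows. This is a minimal illustration, not ICECUP's implementation: the classes `Node` and `NodePattern`, the functions `node_matches` and `find_matches`, and the toy tree are all invented for the example. An unspecified feature is represented by `None` and places no restriction on the match.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    function: str
    category: str
    children: List["Node"] = field(default_factory=list)

@dataclass
class NodePattern:
    function: Optional[str] = None   # None means "unspecified"
    category: Optional[str] = None   # None means "unspecified"

def node_matches(pattern, node):
    # An unspecified feature matches anything.
    return ((pattern.function is None or pattern.function == node.function)
            and (pattern.category is None or pattern.category == node.category))

def find_matches(pattern, root):
    # Visit every node in the tree, so the pattern may match more than once.
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        if node_matches(pattern, node):
            found.append(node)
        stack.extend(reversed(node.children))
    return found

# Invented toy tree: a direct object clause that itself contains another,
# followed by an adverbial clause, so the OD,CL node is not in last position.
inner = Node("OD", "CL", [Node("SU", "NP"), Node("VB", "VP"), Node("OD", "CL")])
tree = Node("PU", "CL", [Node("SU", "NP"), Node("VB", "VP"), inner, Node("A", "CL")])

hits = find_matches(NodePattern(function="OD", category="CL"), tree)
```

Here `hits` contains two nodes from the same tree, and neither match depends on where the node sits among its siblings, because no edge of the pattern has been specified.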

Single word FTFs

In the case of single word FTFs, that is, FTFs that search for a single text unit element, we must still specify some aspects of the otherwise unspecified node.

Using the ‘Text Fragment’ command, type the word “work” and press the ‘Edit’ button.

The result should be the FTF shown. If you then perform the query, you will get examples like this one (W1B-001 #179).

The empty node still has the focus but is specified as a leaf. There is no white ‘stub’ between node and word, and there is a black dotted line, meaning that word and node are immediately connected. The node must ‘tag the word’. No other edges have been specified.
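The requirement that the node is a leaf and must ‘tag the word’ can be sketched like this. The `Node` class, the `tags_word` function and the example nodes are assumptions made for illustration, and words are compared by exact equality, which may not reflect how ICECUP treats case or spelling variants.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    category: str
    word: Optional[str] = None            # only a leaf node carries a word
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self):
        return not self.children

def tags_word(node, word, category=None):
    # The node must be a leaf that is immediately connected to the word.
    if not node.is_leaf() or node.word != word:
        return False
    # An unspecified category (None) places no further restriction.
    return category is None or node.category == category

noun_leaf = Node("N", word="work")
verb_leaf = Node("V", word="work")
phrase = Node("NP", children=[Node("ART", word="the"), noun_leaf])

results = [tags_word(n, "work") for n in (noun_leaf, verb_leaf, phrase)]
```

Both leaves match whatever their category, but the phrase node does not: it covers the word without tagging it.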

Tagged word FTFs

The previous FTF finds examples of “work” as a noun or (more rarely) as a verb, as in “We will have to work very hard.” [W2C-009 #44]. If you want to find examples of “work” only as a verb, you can also use the ‘Text Fragment’ command.

In the ‘Text Fragment’ window, type the word “work”. Then, without pressing <SPACE>, press the ‘Node’ button. Position the input caret (blinking cursor) between the angled brackets and type “V”. The query should look like this: “work+<V>”. Then press the ‘Edit’ button.

If you have done this successfully, the FTF will look like the one shown here. In fact, the only difference from the previous example is that the node, still in the ‘tag position’, has the category ‘V’, meaning “verb”. An example match is shown below.

You may have noticed another difference. ICECUP 3.0 performs this search using a background search. This is because, although it has a table of indexes for words like “work” and another for elements like ‘all verbs’, it does not have one for “work as a verb”. ICECUP has to work out whether the FTF matches the tree by looking at the trees in the corpus, one-by-one. In the examples that follow, ICECUP has to do this kind of search.
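To see why separate indexes for a word and for a category are not enough on their own, consider the following sketch. It assumes that an index maps a word or a tag to the set of text units containing it; this is a simplification for illustration, not a description of ICECUP's index format, and the corpus, unit identifiers and helper `build_index` are invented.

```python
# Invented toy corpus: each text unit is a list of (word, tag) leaves.
corpus = {
    "u1": [("we", "PRON"), ("work", "V"), ("hard", "ADV")],
    "u2": [("the", "ART"), ("work", "N"), ("is", "V"), ("done", "V")],
    "u3": [("they", "PRON"), ("left", "V")],
}

def build_index(key):
    # Map each key value (a word or a tag) to the units that contain it.
    index = {}
    for unit_id, leaves in corpus.items():
        for leaf in leaves:
            index.setdefault(key(leaf), set()).add(unit_id)
    return index

word_index = build_index(lambda leaf: leaf[0])
tag_index = build_index(lambda leaf: leaf[1])

# Units that contain the word "work" and, somewhere, a verb.
candidates = word_index["work"] & tag_index["V"]

# Only inspecting each unit shows whether "work" itself is tagged as a verb.
confirmed = sorted(
    unit_id for unit_id in candidates
    if any(leaf == ("work", "V") for leaf in corpus[unit_id])
)
```

In this toy corpus unit `u2` contains both “work” and a verb, yet “work” is a noun there, so only `u1` is confirmed. Combining the two indexes cannot settle the question; each tree has to be examined.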

FTF home pages by Sean Wallis and Gerry Nelson.
Comments/questions to s.wallis@ucl.ac.uk.

This page last modified 12 June, 2013 by Survey Web Administrator.