Fuzzy Tree Fragments

This is the FTF home page at the Survey of English Usage.

Introduction

Fuzzy Tree Fragments (FTFs for short) are a handy way of specifying a grammatical query in a parsed corpus or treebank. The idea is that you represent your query by drawing the kind of structure that you are looking for. FTFs were initially developed under the auspices of the Corpus Query project (the report summarises FTFs quite well from a technical point of view), but we are continuing to work on, and with, the system. The latest developments, included in the ICECUP 3.1 release, are summarised here.

At present, FTFs are used in the ICECUP software, which you can download for free from our website, together with 20,000 words from the ICE-GB corpus. Alternatively, if you are interested in recent change in grammar, you may prefer to download the DCPSE sample corpus.

The idea of these pages is to explain FTFs: to help users understand and use them, and to show what they can do and what (at present) they can’t. Although FTFs are designed to be easy to use - see comments on evaluating FTFs and other query systems - we have found that some users find the more sophisticated options difficult to understand, or cannot see how to use them to form the query that they want. On the principle that, if a user is making a mistake, it’s probably our fault for not explaining the method properly, we have decided to set up this resource. (You may wish to bookmark this page in your browser.)

We expect the site to grow, particularly in response to feedback. So if you think that there is a question that is not properly covered here, email us and we will respond and possibly add to or modify these pages. You may see the answer to your query up on the site (don’t worry, we won’t put your name up in lights!).

What are FTFs?

Fuzzy Tree Fragments are approximate ‘models’, ‘diagrams’ or ‘wild-cards’ for grammatical queries on a parsed (tree-analysed) corpus. Because they are models, they are essentially declarative: there is no right or wrong order for evaluating the elements. As with logical statements, all elements must be true together.

FTFs are generalised grammatical subtrees representing a model of the grammatical structure sought, with only the essential elements retained - a ‘wild-card’ model for grammar. The idea is fairly intuitive to linguists while retaining a high degree of flexibility. Nodes and text unit elements may be approximately specified, as may links between elements, and ‘edges’ (unary structural properties such as ‘first child’).

FTFs are diagrammatic representations: they make sense as drawings of partial trees rather than as a set of logical predicates. Such diagrams have the property of structural coherence, that is, it is immediately apparent if an FTF is feasible and sufficient (grammatically and structurally). You can’t draw a tree containing two nodes where each one is the parent of the other, but you might write this expression in logic by mistake.
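The point about structural coherence can be illustrated with a small sketch. It assumes a query written as a flat set of parent(x, y) predicates, which is not how FTFs are specified, and shows the extra check such a notation would need. The function name and the pair representation are hypothetical and are not part of ICECUP.

```python
def check_parent_predicates(predicates):
    """Check a list of (parent, child) pairs written as logical predicates.

    Returns a list of problems; an empty list means the pairs could be
    drawn as a tree (or a set of trees). Hypothetical helper for illustration.
    """
    problems = []
    parent_of = {}
    for parent, child in predicates:
        if child in parent_of and parent_of[child] != parent:
            problems.append(
                f"{child} has two parents: {parent_of[child]} and {parent}"
            )
        else:
            parent_of[child] = parent

    # Walk upwards from each child; meeting a node twice means a cycle.
    for start in parent_of:
        seen = {start}
        node = parent_of[start]
        while node is not None:
            if node in seen:
                problems.append(f"cycle through {start}")
                break
            seen.add(node)
            node = parent_of.get(node)
    return problems


# Two nodes that are each the parent of the other: easy to write, impossible to draw.
mistake = check_parent_predicates([("A", "B"), ("B", "A")])

# A structure that can be drawn: one node with two children.
drawable = check_parent_predicates([("P", "X"), ("P", "Y")])
```

Here `mistake` contains two cycle reports and `drawable` is empty. A diagram never needs this check, because the mistaken structure cannot be drawn in the first place.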

Components of FTFs

FTFs contain the following elements.

  • ‘Nodes’, which are drawn as white ‘boxes’ divided into function, category and feature partitions (see ICE grammar). At least one node must have a yellow border indicating that it is the ‘focus’ of the FTF. ICECUP employs this focal point to indicate the portion of text ‘covered by’ the FTF, and to organise concordancing displays.
  • ‘Words’, including all lexical items and pauses (strictly, we should call them ‘text unit elements’). These are drawn on the other side of the divider from the tree structure. In the example above the words are unspecified.
  • ‘Links’ joining two elements together. There are two kinds of link between two nodes (called ‘Parent’ and ‘Next’) and one type of link between two words (‘Next word’).
  • ‘Edges’, which are properties of single nodes or words. An edge might specify, for example, that a node is a leaf node, or that a word is the first in the sentence.

Each link and edge can be set to one of a set of different ‘values’ or ‘statuses’, which are summarised on the next page. The status of a link or edge can be set by clicking with the mouse on the ‘dot’ or ‘cool spot’ in the middle of the element. To keep the distinction between them clear, blue dots are used for node edges and links, while green dots are used for words.

In this example, both ‘Parent’ links are set to ‘immediate parent’, the ‘Next (child)’ link is ‘immediately after’ (hence the arrow), and the ‘Next word’ link is <unknown>. All edges are set to <unknown>, i.e., they are unspecified.
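For readers who think in terms of data structures, the components listed above can be sketched roughly as follows. This is a minimal illustration only: every class and field name is hypothetical, it is not ICECUP's internal representation, and the choice of focus node and the number of words are assumptions, since the settings described above do not determine them.

```python
from dataclasses import dataclass, field
from typing import Optional

UNKNOWN = "<unknown>"  # status of an unspecified link or edge


@dataclass
class Node:
    # The three partitions of a node box; None or empty means unspecified.
    function: Optional[str] = None
    category: Optional[str] = None
    features: frozenset = frozenset()
    focus: bool = False                        # drawn with a yellow border
    edges: dict = field(default_factory=dict)  # edge name -> status


@dataclass
class Word:
    text: Optional[str] = None                 # None = unspecified word
    edges: dict = field(default_factory=dict)


@dataclass
class Link:
    kind: str      # "Parent" or "Next" (between nodes), "Next word" (between words)
    source: int    # index of the upper/earlier element
    target: int    # index of the lower/later element
    status: str = UNKNOWN


@dataclass
class FTF:
    nodes: list
    words: list
    links: list

    def validate(self):
        """Check the two constraints stated in the component list."""
        if not any(n.focus for n in self.nodes):
            raise ValueError("at least one node must be the focus")
        for link in self.links:
            if link.kind not in ("Parent", "Next", "Next word"):
                raise ValueError(f"unknown link kind: {link.kind}")
        return True


# One parent node with two children, mirroring the link settings described above.
ftf = FTF(
    nodes=[Node(focus=True), Node(), Node()],  # focus placement is an assumption
    words=[Word(), Word()],                    # words left unspecified
    links=[
        Link("Parent", 0, 1, "immediate parent"),
        Link("Parent", 0, 2, "immediate parent"),
        Link("Next", 1, 2, "immediately after"),
        Link("Next word", 0, 1),               # defaults to <unknown>
    ],
)
ftf.validate()
```

Keeping links and edges as separate records with a status value reflects the description above: each one is set independently, and anything left at <unknown> places no constraint on a match.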

Links are coded for adjacency, order and connectedness and depicted so as to exploit this notion of structural coherence. Thus, the ‘Parent’ parent:child relation in an FTF can be either immediately or eventually adjacent (called ‘parent’ and ‘ancestor’ respectively, and coloured black or white).

On the other hand, the ‘Next (child)’ sibling child:child relation may be set to one of a number of options, from ‘immediately following’ (depicted by a black directional arrow), through ‘before or after’ (a white bi-directional arrow), to ‘<unknown>’ (no arrow). Bear in mind that two ‘siblings’ (sisters, if you prefer) in an FTF need not match sibling nodes in a corpus tree (as in ‘Text fragment’ examples). Links and edges are summarised in the next section.
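The difference between the ‘parent’ and ‘ancestor’ settings of a ‘Parent’ link can be sketched in a few lines. The function and the toy corpus tree below are hypothetical; the sketch assumes that ‘ancestor’ also accepts an immediate parent, and it does not model the ‘Next’ link or ICECUP's actual matching procedure.

```python
def parent_link_holds(parent_of, upper, lower, status):
    """Test a 'Parent' link between two corpus nodes.

    parent_of maps each corpus node to its parent (the root maps to None).
    status is "parent" (immediately adjacent) or "ancestor" (eventually
    adjacent, assumed here to include the immediate parent).
    """
    if status == "parent":
        return parent_of.get(lower) == upper
    if status == "ancestor":
        node = parent_of.get(lower)
        while node is not None:
            if node == upper:
                return True
            node = parent_of.get(node)
        return False
    raise ValueError(f"unsupported status: {status}")


# Toy corpus tree with illustrative labels: CL -> NP, VP; VP -> V
corpus = {"CL": None, "NP": "CL", "VP": "CL", "V": "VP"}

direct = parent_link_holds(corpus, "CL", "V", "parent")      # False: VP intervenes
eventual = parent_link_holds(corpus, "CL", "V", "ancestor")  # True: CL dominates V

# Two FTF 'siblings' under one FTF parent, both attached by 'ancestor' links,
# can match NP and V, although NP and V are not siblings in the corpus tree.
both_match = (
    parent_link_holds(corpus, "CL", "NP", "ancestor")
    and parent_link_holds(corpus, "CL", "V", "ancestor")
)
siblings_in_corpus = corpus["NP"] == corpus["V"]             # False
```

The last two values show why siblings in an FTF need not correspond to siblings in the corpus: with ‘ancestor’ links, both conditions hold even though the matched nodes sit at different depths.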

The final benefit of the graphical approach is that it is relatively easy to see the relationship between an FTF and corpus trees. This applies both to matching and to abstraction (creating an FTF from an example in the corpus).

On the FTF links page we list the links and edges used in ICECUP and make some points about how these work together in common situations. This is explained in further detail, using a series of examples, in the FTF matching pages.

FTF home pages by Sean Wallis and Gerry Nelson.
Comments/questions to s.wallis@ucl.ac.uk.

This page last modified 12 June, 2013 by Survey Web Administrator.