Extensions to FTFs in ICECUP 3.1

The new ICECUP 3.1 software provides a number of enhancements to Fuzzy Tree Fragments (FTFs). ICECUP 3.1 is available with the DCPSE and ICE-GB R2 corpora.

These enhancements affect the definition of nodes and words in FTFs. No changes have been made to links and edges, or to the topology or matching of FTFs.

Nodes

In ICECUP 3.0 an FTF node could have an optional single function or category label. It could also have any number of feature labels provided that they were consistent with the category, with only one feature per feature class.

In ICECUP 3.1 an FTF node can now contain a logical combination of node patterns. Each pattern can contain:

  • sets of possible functions and categories (optionally, negated)
  • any number of features, positive or negative (plus any feature class can be marked as ‘unspecified’)

The following examples illustrate these extensions.

              function   category       feature          version
Simple        SU         SU,CL          CL(cop)          3.0
Unspecified   0,         0,0            CL(!transy)      3.1
Sets          {OD,SU}    {OD,SU},CL     CL(intr,cop)
Negation      ¬SU        {¬OD,¬SU},CL   CL(¬cop)
Logic         ¬(SU)      (SU ∧ CL)      (CL(cop) ∨ SU)
Levels of FTF node complexity in ICECUP

Unspecified

You can search for an unspecified function, category or feature class. In a complete corpus, functions and categories should only be unspecified if the tree is empty, but unmarked feature classes are quite common. A feature class may be unspecified because the feature is optional, because the element is ambiguous, or because it was left unmarked in error.

Searching for an unspecified feature class is particularly useful when you want to list exhaustively all subtypes of a particular node pattern. The unspecified case is labelled IV=0 or DV=0 in the experiment pages. In ICECUP 3.0 you had to calculate the remaining, unspecified elements yourself. In ICECUP 3.1 you can query them directly, e.g. “CL(!transy)” finds all clauses whose transitivity has not been marked.
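To illustrate why an explicit 'unspecified' value makes exhaustive listing straightforward, here is a minimal Python sketch. It is not ICECUP code: the node representation (a dict with a `features` mapping from feature class to value), the toy data, the `UNSPECIFIED` label and the function name `count_by_feature_class` are all assumptions made for illustration. Only the class name `transy` and the values `intr` and `cop` are taken from the examples on this page.

```python
from collections import Counter

UNSPECIFIED = "!"  # stand-in label for an unmarked feature class (assumption)

def count_by_feature_class(nodes, category, feature_class):
    """Tally nodes of `category` by their value for `feature_class`.

    Nodes with no value for the class are counted under UNSPECIFIED, so the
    tallies always sum to the number of nodes of that category.
    """
    counts = Counter()
    for node in nodes:
        if node["category"] != category:
            continue
        value = node["features"].get(feature_class, UNSPECIFIED)
        counts[value] += 1
    return counts

# Toy data, invented for illustration only.
nodes = [
    {"category": "CL", "features": {"transy": "intr"}},
    {"category": "CL", "features": {"transy": "cop"}},
    {"category": "CL", "features": {}},  # transitivity unmarked
    {"category": "CL", "features": {"transy": "cop"}},
    {"category": "NP", "features": {}},
]

counts = count_by_feature_class(nodes, "CL", "transy")
total_clauses = sum(1 for n in nodes if n["category"] == "CL")
```

Because the unmarked clauses get their own tally, the values in `counts` add up to `total_clauses` without any separate subtraction step.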

Sets and negation

Function and category sets let you define broader groupings than those defined by the grammar. For example, you may want to cover all types of direct object, “{PROD, NOOD, OD}”, within a single query; the easiest way to do this is with a set. Likewise, a negated set can be used to remove possibilities from a node. Thus “{¬OD,¬SU},CL” matches a clause whose function is anything other than direct object or subject.

Feature sets are useful where the distinction between subtypes of a feature is not of interest, or where individual frequencies are very low (this is known as ‘collapsing values’). Intransitive and copular are both transitivity features of clauses, so the query “CL(intr,cop)” searches for clauses which are either intransitive or copular.

NB. Features that belong to different feature classes, as in “N(com,plu)”, are independent. Features that are members of the same class, e.g. “N(com,prop)”, are treated as members of a set.
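The behaviour of sets, negated sets and feature classes described above can be sketched in Python as follows. This is an illustrative model only, not ICECUP's implementation: the `NodePattern` class, its field names and the `FEATURE_CLASS` table are assumptions, the class names `noun_type` and `number` are hypothetical labels (only `transy` appears on this page), and negative features are left out to keep the sketch short.

```python
from dataclasses import dataclass

# Feature -> feature class. Only `transy` appears on this page; the other
# class names are hypothetical labels chosen for this sketch.
FEATURE_CLASS = {
    "intr": "transy", "cop": "transy",
    "com": "noun_type", "prop": "noun_type",
    "plu": "number",
}

def _in_set(label, labels, negated):
    """An empty set means 'any label'; a negated set excludes its members."""
    if not labels:
        return True
    return (label in labels) != negated

@dataclass
class NodePattern:
    functions: frozenset = frozenset()   # empty = any function
    negate_functions: bool = False
    categories: frozenset = frozenset()  # empty = any category
    negate_categories: bool = False
    features: tuple = ()                 # positive features only

    def matches(self, function, category, node_features):
        if not _in_set(function, self.functions, self.negate_functions):
            return False
        if not _in_set(category, self.categories, self.negate_categories):
            return False
        # Group the requested features by class: features in the same class
        # are alternatives, and each class is an independent requirement.
        by_class = {}
        for feat in self.features:
            by_class.setdefault(FEATURE_CLASS[feat], set()).add(feat)
        return all(node_features & wanted for wanted in by_class.values())

# "{¬OD,¬SU},CL": a clause that is neither direct object nor subject.
not_od_su_clause = NodePattern(functions=frozenset({"OD", "SU"}),
                               negate_functions=True,
                               categories=frozenset({"CL"}))
# "CL(intr,cop)": same feature class, so either value will do.
intr_or_cop = NodePattern(categories=frozenset({"CL"}),
                          features=("intr", "cop"))
# "N(com,plu)": different feature classes, so both must be present.
common_plural = NodePattern(categories=frozenset({"N"}),
                            features=("com", "plu"))
```

In this model, `intr_or_cop` accepts a clause marked `cop` or one marked `intr`, whereas `common_plural` rejects a noun that is marked `com` but not `plu`.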

Logic

The introduction of propositional logic into nodes is most useful for wholesale negation (where you say that a particular node in an FTF may not conform to pattern A) and disjunction (where you say that a node could be either pattern B or pattern C).

The node logic editor in ICECUP 3.1 lets you edit these expressions. It also includes a simplify command which draws out the logical consequences of a particular expression.
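As a rough illustration of how node patterns combine under negation, conjunction and disjunction, the following Python sketch builds the three ‘Logic’ examples from the table above as predicates. The helper names `pattern`, `NOT`, `AND` and `OR`, and the dict-based node representation, are assumptions made for this sketch. It does not attempt to reproduce the editor's simplify command.

```python
def pattern(function=None, category=None, features=()):
    """Build a predicate for a simple node pattern; None means 'any'."""
    required = set(features)

    def test(node):
        return ((function is None or node["function"] == function)
                and (category is None or node["category"] == category)
                and required <= node["features"])
    return test

def NOT(p):
    return lambda node: not p(node)

def AND(*ps):
    return lambda node: all(p(node) for p in ps)

def OR(*ps):
    return lambda node: any(p(node) for p in ps)

# The three 'Logic' examples: ¬(SU), (SU ∧ CL), (CL(cop) ∨ SU)
not_su = NOT(pattern(function="SU"))
su_and_cl = AND(pattern(function="SU"), pattern(category="CL"))
cop_clause_or_su = OR(pattern(category="CL", features=["cop"]),
                      pattern(function="SU"))

# Two toy nodes, invented for illustration.
od_cop_clause = {"function": "OD", "category": "CL", "features": {"cop"}}
su_np = {"function": "SU", "category": "NP", "features": set()}
```

Here `od_cop_clause` satisfies `not_su` and `cop_clause_or_su` but not `su_and_cl`, while `su_np` satisfies only `cop_clause_or_su`.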

Two further extensions

In addition you can specify:

  • structural pseudo-features such as “ditto” (‘ditto-tagged’), and
  • that any pattern is exactly matched, e.g. “=SU,NP”.

Exact matching works by replacing all unstated features with the explicit unmarked feature class (see above) and removing features which fall within the same feature class.

Words

The second major extension in ICECUP 3.1 is an extensive wild card system for the ‘word’ slot in an FTF.

In ICECUP 3.0 you could optionally include a lexical item, which could be matched ambiguously with respect to case or accent.

ICECUP 3.1 lets you specify sets of wild card patterns (including negated patterns). Each wild card consists of a string of characters optionally including the following special characters.

          description   examples             explanation
*         Multiple      a* *ing b*ing        Any number of characters
?         1 character   a??? b?c?u?e         Any one character
{ }       Set           w{0123} t{a-z??}     User defined set
^         Escape        b^vd be^c^v          Predefined set
                        ^? ^' ^{ ^- ^& ^^    Literal ?, {, etc.
Lexical wild cards in ICECUP

For more information see ICECUP 3.1: lexical wild cards.
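To make the `*`, `?` and literal-escape rows of the table concrete, here is a small Python sketch that translates such a wild card into a regular expression. It is an approximation under stated assumptions, not ICECUP's matcher: `*` is assumed to match zero or more characters, matching is case-sensitive, `^` followed by a non-alphanumeric character is treated as a literal, and user-defined sets and predefined sets (such as `^v`) are deliberately not modelled because their exact semantics are not given on this page. The function names are hypothetical.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a simplified wild card into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":
            parts.append(".*")          # any number of characters
        elif ch == "?":
            parts.append(".")           # exactly one character
        elif ch == "^":
            i += 1
            if i >= len(pattern):
                raise ValueError("dangling ^ at end of pattern")
            nxt = pattern[i]
            if nxt.isalnum():
                # Predefined sets are outside the scope of this sketch.
                raise NotImplementedError(f"predefined set ^{nxt} not modelled")
            parts.append(re.escape(nxt))  # escaped literal, e.g. ^? or ^^
        elif ch in "{}":
            raise NotImplementedError("user-defined sets not modelled")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts))

def wildcard_match(pattern, word):
    """Return True if the whole word matches the wild card."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None
```

For example, under these assumptions `wildcard_match("b?c?u?e", "because")` holds, while `wildcard_match("a???", "apple")` does not, because `a???` requires exactly four characters.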

The set representation lets you list alternatives or define a wild card and delete specific alternatives. Moreover, because they are part of an FTF, any lexical pattern can be constrained by the node which tags it. Thus you can write “{*ing ~thing}+<N>” meaning “any -ing noun except ‘thing’.”
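The combination of a pattern set, a negated alternative and a tag constraint in “{*ing ~thing}+<N>” can be sketched in Python as below. The parsing rules are inferred from that single example and are assumptions: patterns inside the braces are taken to be separated by spaces, and a leading `~` is taken to mark an excluded pattern. Python's `fnmatch.fnmatchcase` stands in for ICECUP's own wild card matcher; it handles `*` and `?` but also gives `[` and `]` a special meaning that ICECUP may not share. The function names `parse_query` and `query_matches` are hypothetical.

```python
import re
from fnmatch import fnmatchcase

def parse_query(query):
    """Split '{p1 p2 ~p3}+<CAT>' into (include, exclude, category)."""
    m = re.fullmatch(r"\{([^}]*)\}\+<(\w+)>", query)
    if m is None:
        raise ValueError(f"unsupported query: {query!r}")
    items = m.group(1).split()
    include = [p for p in items if not p.startswith("~")]
    exclude = [p[1:] for p in items if p.startswith("~")]
    return include, exclude, m.group(2)

def query_matches(query, word, category):
    """True if the word has the right tag, fits an included pattern,
    and fits none of the excluded patterns."""
    include, exclude, wanted = parse_query(query)
    if category != wanted:
        return False
    if not any(fnmatchcase(word, p) for p in include):
        return False
    return not any(fnmatchcase(word, p) for p in exclude)

QUERY = "{*ing ~thing}+<N>"
```

With this reading, `query_matches(QUERY, "building", "N")` is true, while the word ‘thing’ tagged as a noun is rejected, as is ‘building’ tagged with any category other than N.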

FTF home pages by Sean Wallis and Gerry Nelson.
Comments/questions to s.wallis@ucl.ac.uk.

This page last modified 12 June, 2013 by Survey Web Administrator.