Performing experiments using FTFs
Predicting one grammatical variable with another
We now turn to the study of the interaction between two grammatical variables, i.e., how one aspect of grammatical structure might interact with another.
This is a bit more complicated than studying how sociolinguistic context might affect a grammatical outcome. The central practical problem is that you cannot complete a contingency table by computing intersections with logical ‘and’ in the normal way.
If you are not quite sure that you understand the simpler example on the previous page, please reread the discussion.
The concept of a ‘case’ in a grammatical sample
A corpus consists of independently (if not randomly) sampled “texts”. If we find two examples of a phenomenon in separate texts, we can readily assume that these examples have arisen independently. However, what if the phenomenon appears in the same text, in the same flow of utterances or in the same single utterance? We discussed this previously.
Note that a corpus text is not like a regular database, where records are typically independent from one another. Where records in a regular database are related (e.g., samples collected over time), they should be analysed differently.
In grammar, a case could be a single constituent, like a clause, or a group of associated constituents expressed as a more complex FTF.
If we want to investigate how two aspects of a grammatical phenomenon interact, we should note the following.
 The two variables must apply to the same phenomenon. That is, both variables must be based on the same fundamental definition of the case in question. Note that this fundamental definition could be specified as a set of alternative FTFs (as in the example below), but these alternatives should form a meaningful group (e.g., with/without a constituent). We will adopt the convention of specifying a definition of the case in the top left of the contingency tables.
 In practice, this means that we specify FTFs for every cell in the contingency table, not just for every column. We cannot use drag-and-drop logic to calculate the intersections. We must also avoid ambiguity in FTF relations (avoid unordered or eventual relations if at all possible).
 We should enumerate all alternatives of the case. Where an FTF cannot be used directly (ICECUP 3.1 allows a search for unspecified features but ICECUP 3.0 did not) we may infer these values by subtracting from a more general case, representing the total.
Extending the basic approach to grammatical interactions
Suppose that we are interested in investigating aspects of clause structure and we want to find out whether one grammatical variable (say, the mood {exclamative, interrogative, etc.} = IV) affects another (say, the transitivity feature = DV).
 We construct a contingency table as before. Instead of performing FTF queries for each grammatical outcome, we must define FTFs for each combination of dependent and independent variable (the inner cells, shown in green). As before, each total is the sum of the preceding cells in its row or column.
Case: CL. Columns: dependent variable (transitivity). Rows: independent variable (mood).

| CL | DV = m | DV = d | ... | TOTAL |
|---|---|---|---|---|
| IV = e (exclam) | CL (exclam, montr) | CL (exclam, ditr) | ... | e and (m or d or ...) |
| IV = i (inter) | CL (inter, montr) | CL (inter, ditr) | ... | i and (m or d or ...) |
| ... | ... | ... | ... | ... |
| TOTAL | (e or i or ...) and m | (e or i or ...) and d | ... | (e or i or ...) and (m or d or ...) |
A contingency table (DV x IV) for the example
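The bookkeeping behind this table can be sketched in a few lines of Python. This is only an illustration: the frequencies are invented, each one stands in for the hit count of a separate FTF query such as CL (exclam, montr), and the function name `add_totals` is hypothetical and not part of ICECUP.

```python
# Invented frequencies, one per cell. Each value stands in for the number of
# hits of one FTF query, e.g. CL (exclam, montr). Not real corpus data.
cells = {
    ("exclam", "montr"): 30, ("exclam", "ditr"): 10,
    ("inter", "montr"): 50, ("inter", "ditr"): 20,
}


def add_totals(cells):
    """Sum cell frequencies into row (IV), column (DV) and grand totals."""
    row_totals, col_totals = {}, {}
    for (iv, dv), freq in cells.items():
        row_totals[iv] = row_totals.get(iv, 0) + freq
        col_totals[dv] = col_totals.get(dv, 0) + freq
    grand_total = sum(cells.values())
    return row_totals, col_totals, grand_total


row_totals, col_totals, grand_total = add_totals(cells)
print(row_totals)   # totals per mood (IV)
print(col_totals)   # totals per transitivity (DV)
print(grand_total)  # all cases where both variables are stated
```

Note that every cell is supplied as its own count. The totals are derived from the cells, never the other way round, which mirrors the point that intersections cannot be computed by combining column queries with logical ‘and’.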
 If you want to ensure that mood and transitivity are fully enumerated, you may also allow for them to be unmarked. This makes the experiment more robust. The way to do this with ICECUP 3.0 is to add a new column or row for the unmarked element, compute an FTF for the total (e.g., for the first column, simply ‘CL(montr)’) and then subtract the frequencies of the marked cells from that total. The result will be the total number of all those monotransitive clauses whose mood is not marked, which you put in the ‘unmarked’ cell. In ICECUP 3.1 you can retrieve these values directly.
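The subtraction step is simple arithmetic, sketched here in Python. The numbers are invented for illustration and the helper name `unmarked_by_subtraction` is hypothetical.

```python
def unmarked_by_subtraction(total, marked):
    """Infer the 'unmarked' frequency as total minus all marked frequencies."""
    remainder = total - sum(marked)
    if remainder < 0:
        # The marked cells cannot exceed the total if the FTFs are consistent.
        raise ValueError("marked frequencies exceed the total")
    return remainder


# Invented counts: CL(montr) as the column total, then the marked cells
# CL(exclam, montr) and CL(inter, montr).
total_montr = 100
marked_montr = [30, 50]

unmarked_montr = unmarked_by_subtraction(total_montr, marked_montr)
print(unmarked_montr)  # monotransitive clauses whose mood is not marked
```

A negative remainder would suggest that the cell FTFs overlap or do not match the definition of the total, so the sketch treats it as an error rather than returning a value.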
Note that there is an important difference between the mood and transitivity of clauses. All clauses should be classified by transitivity, so if the feature is absent, the clause is incomplete or an error. Mood, on the other hand, is optional (and meaningful): if unmarked, it is assumed to be indicative.
The grand total is then simply the result of performing a query for ‘CL’. If you can write an explicit FTF here, this FTF defines the case. In the table above, the grand total is the set of all clauses where both transitivity and mood are stated (which is not always the case).
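Case: CL. Columns: dependent variable (transitivity). Rows: independent variable (mood).

| CL | DV = m | DV = d | ... | DV = 0 | TOTAL |
|---|---|---|---|---|---|
| IV = e (exclam) | CL (exclam, montr) | CL (exclam, ditr) | ... | | CL (exclam) |
| IV = i (inter) | CL (inter, montr) | CL (inter, ditr) | ... | | CL (inter) |
| ... | ... | ... | ... | | ... |
| IV = 0 | | | | | |
| TOTAL | CL (montr) | CL (ditr) | ... | | CL |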
A fully enumerated contingency table (DV x IV)
The empty (white) cells are filled in with the result of subtracting all other values in the row or column from its total.
How does including ‘unmarked’ elements increase the robustness of the experiment?
Including an ‘unmarked’ column or row increases the background noise in the experimental design slightly but makes your claims more general. For example, if you want to see if the mood interacts with the monotransitive case (DV = m), it is preferable to say that ‘the probability that the clause is marked as monotransitive is affected by mood’ rather than ‘if the transitivity and mood are stated, the probability that the clause is marked as monotransitive is affected by mood’. Note that predicting the unmarked outcome (DV = 0) may not be very useful (except to detect errors).
 We set up a simple chi-square test for each outcome of the dependent variable as before. The chi-square test compares an observed distribution for one outcome (DV = m, DV = d, etc.) with an expected distribution based on the total (DV = <any>). You scale the expected distribution as before.
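Case: CL. Columns: dependent variable (transitivity). Rows: independent variable (mood).

| CL | DV = m | DV = d | ... | TOTAL |
|---|---|---|---|---|
| IV = e | CL(exclam, montr) | CL(exclam, ditr) | ... | CL(exclam) |
| IV = i | CL(inter, montr) | CL(inter, ditr) | ... | CL(inter) |
| ... | ... | ... | ... | ... |
| IV = 0 | | | | |
| | observed | | | expected |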
Observed and expected distributions for DV = m in the first contingency table
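A minimal sketch of this test for a single outcome follows. It assumes that scaling the expected distribution means multiplying each row total by the ratio of the observed sum to the sum of the totals, so that both distributions add up to the same number. The frequencies are invented, and `chi_square_for_outcome` is a hypothetical helper name.

```python
def chi_square_for_outcome(observed, totals):
    """Compare one DV column (observed) with the scaled TOTAL column."""
    sum_o = sum(observed)
    sum_t = sum(totals)
    # Scale the totals so the expected distribution sums to sum(observed).
    expected = [t * sum_o / sum_t for t in totals]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return chi2, df, expected


# Invented counts for three values of the IV (e.g. e, i and 0).
observed_m = [30, 50, 20]    # the DV = m column
row_totals = [60, 150, 90]   # the TOTAL column

chi2, df, expected = chi_square_for_outcome(observed_m, row_totals)

# Critical values of chi-square at the 0.05 level, by degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815}
significant = chi2 > CRITICAL_05[df]
print(round(chi2, 4), df, significant)
```

With these invented numbers the expected distribution is 20, 50, 30, and the statistic exceeds the critical value for two degrees of freedom, so the sketch would report a significant effect of the IV on this outcome.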
You can perform a single chi-square test for the entire table, as before, to see if there is an interaction going on, without specifying where.
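The whole-table test can be sketched as follows, assuming a standard r × c test of independence in which each expected cell is the row total times the column total divided by the grand total. The counts are invented and the function name `chi_square_independence` is hypothetical.

```python
def chi_square_independence(table):
    """Chi-square test of independence for an r x c table of frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Invented counts. Rows: IV = e, i. Columns: DV = m, d.
table = [
    [30, 10],
    [50, 20],
]

chi2_table, df_table = chi_square_independence(table)
print(round(chi2_table, 4), df_table)
```

For this invented table the statistic is far below 3.841, the 0.05 critical value for one degree of freedom, so no interaction would be reported.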
In summary: we define what we mean by a case, either explicitly (“it’s a clause”) or implicitly (“here are x alternative types of a case”), and collect frequency statistics separately for each cell in the table. A variable is completely enumerated for the dataset if the cell frequencies in every row or column add up to the separately obtained total for that row or column.
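The enumeration condition can be checked mechanically. The sketch below uses invented counts and a hypothetical helper, `enumeration_gaps`, which reports any row whose cells fall short of its separately queried total.

```python
def enumeration_gaps(cells_by_row, stated_totals):
    """Return {row: shortfall} for rows whose cells do not reach the total."""
    gaps = {}
    for row, cells in cells_by_row.items():
        shortfall = stated_totals[row] - sum(cells)
        if shortfall != 0:
            gaps[row] = shortfall
    return gaps


# Invented counts. Cells are per-cell FTF results; totals are the results of
# the more general queries, e.g. CL(exclam) and CL(inter).
cells_by_row = {"exclam": [30, 10], "inter": [50, 20]}
stated_totals = {"exclam": 45, "inter": 70}

gaps = enumeration_gaps(cells_by_row, stated_totals)
print(gaps)  # rows that still need an 'unmarked' cell
```

An empty result means the variable is completely enumerated for those rows. A positive shortfall is the frequency that belongs in the ‘unmarked’ cell.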
We will work through some real examples on the following page.
FTF home pages by Sean Wallis and Gerry Nelson.
Comments/questions to s.wallis@ucl.ac.uk.
This page last modified 28 May, 2015 by Survey Web Administrator.