How FTFs can be used to perform natural experiments with a parsed corpus like ICE-GB and DCPSE.
Part 5. Predicting one grammatical variable with another
We now turn to the study of the interaction between two grammatical variables, i.e., how one aspect of grammatical structure might interact with another.
This is slightly more complicated in practice than studying how sociolinguistic context might affect a grammatical outcome. The central practical problem in ICECUP is that you cannot complete a contingency table by computing intersections with logical ‘and’ in the normal way.
If you are not quite sure that you understand the simpler example in Part 2, please reread the discussion.
The concept of a ‘case’ in a grammatical sample
A corpus consists of independently – if not randomly – sampled “texts”. If we find two examples of a phenomenon in separate texts, we can readily assume that these examples have arisen independently. However, what if the phenomenon appears in the same text, in the same flow of utterances or in the same single utterance? We discussed this previously.
Note that a corpus text is not like a regular database, where records are typically independent from one another. Where records in a regular database are related (e.g., samples collected over time), they should be analysed differently.
In grammar, a case could be a single constituent, like a clause, or a group of associated constituents expressed as a more complex FTF.
If we want to investigate how two aspects of a grammatical phenomenon interact, we should note the following.
- The two variables must apply to the same phenomenon. That is, both variables must be based on the same fundamental definition of the case in question. Note that this fundamental definition could be specified as a set of alternative FTFs (as in the example below), but these alternatives should form a meaningful group (e.g., with/without a constituent). We will adopt the convention of specifying a definition of the case in the top left of the contingency tables.
- In practice, this means that we specify FTFs for every cell in the contingency table, not just for every column. We cannot use drag and drop logic to calculate the intersections. However, we must avoid ambiguity in FTF relations (avoid unordered or eventual relations if at all possible).
- We should enumerate all alternatives of the case. Where an FTF cannot be used directly (ICECUP 3.1 allows a search for unspecified features but ICECUP 3.0 did not) we may infer these values by subtracting from a more general case, representing the total.
Some examples will make these points clearer.
Extending the basic approach to grammatical interactions
Suppose that we are interested in investigating aspects of clause structure and we want to find out whether one grammatical variable (say, the mood {exclamative, interrogative, etc.} = IV) affects another (say, the transitivity feature = DV).
- We construct a contingency table as before. Instead of performing FTF queries for each grammatical outcome, we must define FTFs for each combination of dependent and independent variable. As before, each total is the sum of the cells in its row or column.
The case (CL) is specified top left; rows represent the independent variable (mood) and columns the dependent variable (transitivity).

| CL | DV = m (montr) | DV = d (ditr) | ... | TOTAL |
|---|---|---|---|---|
| IV = e (exclam) | CL(exclam, montr) | CL(exclam, ditr) | ... | e and (m or d or ...) |
| IV = i (inter) | CL(inter, montr) | CL(inter, ditr) | ... | i and (m or d or ...) |
| ... | | | | |
| TOTAL | (e or i or ...) and m | (e or i or ...) and d | ... | (e or i or ...) and (m or d or ...) |
- If you want to ensure that mood and transitivity are fully enumerated, you may also allow for them to be unmarked. This makes the experiment more robust. In ICECUP 3.1 you can retrieve these cases directly. Alternatively, you can obtain results by subtraction – see the table below. Essentially, you do a search for CL(inter) and then for CL(inter, montr), CL(inter, ditr), and so on. Subtracting these more specific clause instances from CL(inter) will eventually give you the unmarked set. (This is the old ICECUP 3.0 method.)
Incidentally, there is an important distinction between the meaning of unmarked mood and unmarked transitivity. All clauses should be classified by transitivity, so if the feature is absent, the clause is incomplete or an error. Mood, on the other hand, is optional (and meaningful): if the mood is unmarked, it is assumed to be indicative.
The grand total is then simply the result of performing a query for ‘CL’. If you can write an explicit FTF here, this FTF defines the case. In the table above, the grand total is the set of all clauses where both transitivity and mood are stated (which is not always the case).
As before, rows represent the independent variable (mood) and columns the dependent variable (transitivity), now with unmarked (0) values included.

| CL | DV = m (montr) | DV = d (ditr) | ... | DV = 0 | TOTAL |
|---|---|---|---|---|---|
| IV = e (exclam) | CL(exclam, montr) | CL(exclam, ditr) | ... | | CL(exclam) |
| IV = i (inter) | CL(inter, montr) | CL(inter, ditr) | ... | | CL(inter) |
| ... | | | | | |
| IV = 0 | | | | | |
| TOTAL | CL(montr) | CL(ditr) | ... | | CL |
The unmarked cells contain the result after subtracting all other values from the total.
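The subtraction step can be sketched in a few lines of code. This is a Python sketch, not part of ICECUP; the figures are the interrogative-clause frequencies from Example 1 below.

```python
# Completing an 'unmarked' cell by subtraction: the unmarked frequency is the
# row (or column) total minus the sum of all marked cells.

def unmarked(total, marked):
    """Frequency of the unmarked case, given the total and all marked cells."""
    return total - sum(marked)

# A search for CL(inter) gives the row total; searches for CL(inter, montr),
# CL(inter, ditr), ... give the marked cells (figures from Example 1 below).
cl_inter = 5793
cl_inter_marked = [2199, 72, 17, 132, 90, 1350, 1869]  # montr ... cop

cl_inter_unmarked = unmarked(cl_inter, cl_inter_marked)  # the DV = 0 cell
```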
Q. How does including ‘unmarked’ elements increase the robustness of the experiment?
A. Including an ‘unmarked’ column or row increases the background noise in the experimental design slightly but makes your claims more general. For example, if you want to see if the mood interacts with the monotransitive case (DV = m), it is preferable to say that “the probability that the clause is marked as monotransitive is affected by mood” rather than “if the transitivity and mood are stated, the probability that the clause is marked as monotransitive is affected by mood.”
Note that predicting the unmarked outcome (DV = 0) may not be very useful (except to detect errors).
- You can set up a simple goodness of fit chi-square test for each outcome of the dependent variable. A goodness of fit chi-square compares an observed distribution (column) for DV = m, with an expected distribution based on the total column (DV = <any>). We scale the expected distribution as before.
You can also perform a single homogeneity chi-square test for the entire table, as before, to see if there is an interaction going on, without specifying where.
You can plot proportions of any value of the dependent variable, in the form p = CL(montr)/CL, for each value of the independent variable, e.g. p1 = CL(montr, exclam)/CL(exclam). With more than 2 values of the DV, you can plot rates with Wilson score intervals for each value of the DV.
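Wilson score intervals are straightforward to compute directly. The following Python sketch (not part of ICECUP) implements the standard formula:

```python
from math import sqrt

def wilson(p, n, z=1.96):
    """Wilson score interval for an observed proportion p out of n cases.
    z = 1.96 gives a 95% interval."""
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    spread = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - spread, centre + spread

# e.g. the proportion of interrogatives among monotransitive clauses:
p = 2199 / 61907
lower, upper = wilson(p, 61907)
```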
In summary: first we define what we mean by a case, either explicitly – “it’s a clause” – or implicitly – “here are x alternative types of a case,” and collect frequency statistics separately for each cell in the table. The variable is fully enumerated for the dataset if the total number of cases always adds up to the total for each separate column or row in the table.
Examples
1. Two features in the same constituent
- Q. Does transitivity affect mood in clauses?
Our first example is achieved by simply completing the table in the basic approach. Using ICECUP and the complete ICE-GB corpus, we perform queries for each combination of mood and transitivity, including where features are absent. (We assume that the features within one clause are independent from the features within another.) Note that the overwhelming majority of clauses are not marked for mood (labelled 0 below), i.e., they are declarative (also termed ‘indicative’). Clauses whose transitivity is unmarked are in error, or the transitivity cannot be determined, possibly as a result of ellipted material. In this case the clause should be labelled ‘CL(incomp)’.
Rows show the DV (mood); columns the IV (transitivity).

| CL | montr | ditr | dimontr | cxtr | trans | intr | cop | 0 | TOTAL |
|---|---|---|---|---|---|---|---|---|---|
| exclam | 6 | 0 | 0 | 0 | 0 | 2 | 14 | 1 | 23 |
| inter | 2,199 | 72 | 17 | 132 | 90 | 1,350 | 1,869 | 64 | 5,793 |
| imp | 1,139 | 62 | 25 | 128 | 112 | 697 | 54 | 4 | 2,221 |
| subjun | 61 | 2 | 1 | 10 | 3 | 85 | 70 | 1 | 233 |
| 0 | 58,502 | 1,589 | 199 | 3,803 | 2,373 | 30,281 | 29,867 | 10,295 | 136,909 |
| TOTAL | 61,907 | 1,725 | 242 | 4,073 | 2,578 | 32,415 | 31,874 | 10,365 | 145,179 |
We can now investigate, for example, whether the mood distribution of monotransitive clauses differs from that of clauses in general.
| DV (mood) | observed | expected | χ² |
|---|---|---|---|
| exclamative | 6 | 10 | 1.4782 |
| interrogative | 2,199 | 2,470 | 29.7834 |
| imperative | 1,139 | 947 | 38.8935 |
| subjunctive | 61 | 99 | 14.8069 |
| 0 (unmarked) | 58,502 | 58,381 | 0.2528 |
| TOTAL | 61,907 | 61,907 | 85.21 |
The calculation is as follows:
- observed O = {6, 2199, 1139, 61, 58502},
- expected E = {10, 2470, 947, 99, 58381} (approx.).
- chi-square χ² = Σ(E – O)²/E = 4²/10 + 271²/2470 + 192²/947 + 38²/99 + 121²/58381 ≈ 85.
This score is greater than the critical value of χ² for 5-1 = 4 degrees of freedom, crit(4, 0.05) = 9.4877. So we can report that there is a significant difference.
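The calculation can be checked in a few lines of plain Python (a sketch, using the frequencies from the tables above):

```python
# Goodness of fit: does the montr column fit the distribution of all clauses?
observed   = [6, 2199, 1139, 61, 58502]     # montr clauses, by mood
row_totals = [23, 5793, 2221, 233, 136909]  # all clauses, by mood
col_total, grand_total = 61907, 145179

# Scale the expected distribution from the TOTAL column, then sum (O - E)^2 / E.
expected = [r * col_total / grand_total for r in row_totals]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# chi_sq is approximately 85.2, well above crit(4, 0.05) = 9.4877
```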
The null hypothesis is that monotransitive distribution by mood correlates with (closely fits) all clauses. This may be rejected.
The table summarises the observed and expected distributions and the contribution that each pair of values (i.e., (E – O)²/E) makes to the overall χ². Note that although there are many cases of ‘mood = 0/unmarked’ (indicative), this contributes the least variation (121²/58381 = 0.2528). The largest contributions to the chi-square are imperative and interrogative.
What do these individual chi-square scores represent? They may be thought of as estimates of “the robustness of the claim of a deviation from the mean.” The mean is the expected score, and each observed score is subtracted, squared and divided by it. It used to be recommended to compare particular contributory scores (or their signed square root), but today it is considered preferable to plot data points with confidence intervals. See Chapter 7 in Wallis (2021).
The goodness of fit assessment matches uncertainty in the distribution of each transitivity class against the TOTAL column. Confidence intervals may be assigned to each based on the total amount of data, as shown in the figure below. Thus there are n = 61,907 monotransitive clauses (first column), of which 6 (~0%) are exclamative, 2,199 (3.6%) are interrogative, and so on. Our goodness of fit assessment compares the monotransitive cluster (left, with confidence intervals) with the TOTAL (right).

A visual inspection suggests interrogative and imperative are the most visibly different, reflecting our observation about chi-square contributions.
You can repeat the goodness of fit test for the other columns.
Wallis (2021: 157) takes this type of analysis further, proposing that a more meaningful approach would involve subdividing transitivity classes (complementation patterns) into speaker or writer decisions, such as to add or not to add an object. Our data includes the copular class (cop), but this is really a special case, so should be excluded. Thus we might compare monotransitive and intransitive columns in a 5 × 2 homogeneity chi-square, and plot the probability that a complementation pattern contains a direct object with and without an indirect object.
2. Two features within a structure
- Q. Does a phrasal adverb affect the following preposition?
What if we want to look at interactions within a group of related constituents rather than within a single node?
Consider clauses containing two adverbial elements: an adverb phrase followed by a prepositional phrase, expressed by the FTF below left. A matching example ‘No, I load up [AVP] with fast film [PP]’ (S1A-009 #19) is seen below.

Now, suppose that we want to establish if the fact that the adverb in the adverb phrase is or is not phrasal (as in “finding out”, “moving on”, etc.) affects whether the preposition in the following prepositional phrase is also marked as phrasal.
We might write something like “ADV(?phras) → PREP(?phras)” as a shorthand for this idea (the question mark in our notation indicates that the feature is optional).

The example below right contrasts with cases such as S1A-006 #60, “It’s only just come out [ADV] in the cinema [PP]” where the preposition “in” is classed as general rather than phrasal.
We therefore introduce two constituents (for the adverb and preposition) into the FTF.
We can construct four FTFs like the one above, identical in all respects except that each has a different combination of features.
- The simplest way to get data in this situation is to create the pattern without specifying the features. This gives us the grand total (bottom right of the table), 4,189.
- Then add 'phrasal' to 'adverb type' for the adverb, and run the search again. This will give us the total for ADV(phras), i.e. the first column total, 2,034.
- Then add 'phrasal' to 'preposition type' for the preposition, and repeat. Now we have the first cell, 611.
- Finally, remove the 'phrasal' feature from the adverb and repeat the search, yielding the total for the first row, 929.
You can now obtain the remaining cells by subtraction. A handy 2 × 2 chi-square spreadsheet does this automatically for you.
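The subtraction can be sketched as follows (a Python sketch, using the four search totals above):

```python
# Four ICECUP searches give us enough to complete the 2 x 2 table.
grand_total = 4189  # FTF with no features specified
adv_phras   = 2034  # ADV(phras) column total
both_phras  = 611   # ADV(phras) with PREP(phras): the first cell
prep_phras  = 929   # PREP(phras) row total

# The remaining cells follow by subtraction.
adv_only  = adv_phras - both_phras               # ADV(phras), PREP(not phras)
prep_only = prep_phras - both_phras              # ADV(not phras), PREP(phras)
neither   = grand_total - adv_phras - prep_only  # ADV(not phras), PREP(not phras)
```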
| DV \ IV | ADV(phras) | ADV(¬phras) | TOTAL |
|---|---|---|---|
| PREP(phras) | 611 | 318 | 929 |
| PREP(¬phras) | 1,423 | 1,837 | 3,260 |
| TOTAL | 2,034 | 2,155 | 4,189 |
Note that the IV and DV are both Boolean (the presence or absence of a single feature). Entering this data into a 2 × 2 chi-square spreadsheet reveals a homogeneity chi-square score of 141, which is significant.
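For a 2 × 2 table the homogeneity chi-square and Cramér's ϕ can be computed directly; a Python sketch of what the spreadsheet does:

```python
from math import sqrt

# 2 x 2 homogeneity chi-square from the table above.
a, b = 611, 318     # PREP(phras) row:     ADV(phras), ADV(not phras)
c, d = 1423, 1837   # PREP(not phras) row
n = a + b + c + d

chi_sq = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
phi = sqrt(chi_sq / n)  # Cramer's phi for a 2 x 2 table
# chi_sq is approximately 141.6 (crit(1, 0.05) = 3.8415); phi is about 0.1839
```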
Correlations do not tell us the order speakers make decisions
The homogeneity test is associative, i.e. we can see a bi-directional correlation.
- The tendency to employ a phrasal preposition increases if it is preceded by a phrasal adverb.
- But it is equally true to report that the tendency to employ a phrasal adverb is greater if it is followed by a phrasal preposition!
We should not assume, from the fact that one construction follows the other in word order, that the speaker made language production decisions in that order, and that we can therefore say ‘choice 1 influences choice 2’, especially when constructions are small and adjacent.
- Consider attributive adjectives modifying a common noun head, like the blue fish or a long hard look.
It seems safe to assume that usually, the choice of adjective is made with the target concept (if not the exact word specifying the head) in mind. The fact that this word is uttered after the adjective should not distract us from the realistic assessment that our cognitive language production processes are more likely to select the concept we wished to express first!
We may be interested in whether the IV predicts the DV, but in fact the same test evaluates whether the DV predicts the IV.
We might also report Cramér’s ϕ as 0.1839 ∈ (0.1541, 0.2131).
Alternatively we may carry out an evaluation of the difference of proportions:
- p1 = 0.3004 ∈ (0.2809, 0.3207),
- p2 = 0.1476 ∈ (0.1332, 0.1632).
We may visualise this as a binomial proportion difference graph (below).

The difference between the proportions is
- d = p2 − p1 = -0.1528 ∈ (-0.1777, -0.1278).
Note that the difference has the opposite sign to ϕ, and may be larger or smaller in absolute terms. Both intervals exclude zero and are significant.
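The quoted difference interval matches the Newcombe-Wilson method, which combines the two single-proportion Wilson intervals. A Python sketch, assuming that formula:

```python
from math import sqrt

def wilson(p, n, z=1.96):
    """Wilson score interval for proportion p out of n (95% if z = 1.96)."""
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    spread = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - spread, centre + spread

n1, n2 = 2034, 2155
p1, p2 = 611 / n1, 318 / n2  # PREP(phras) rate after ADV(phras) / ADV(not phras)
l1, u1 = wilson(p1, n1)
l2, u2 = wilson(p2, n2)

# Newcombe-Wilson interval for d = p2 - p1: combine the inner interval widths.
d = p2 - p1
lower = d - sqrt((p2 - l2) ** 2 + (u1 - p1) ** 2)
upper = d + sqrt((u2 - p2) ** 2 + (p1 - l1) ** 2)
# d is approximately -0.1528, with interval (-0.1777, -0.1278): excludes zero
```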
3. A feature and a constituent
- Q. Does verb transitivity affect the presence of a following adverbial clause?
As well as representing aspects of a node (features, function, category), grammatical variables can represent the presence of a structural item, such as a node, in the first place. In the following example we ask, does the transitivity of a verb predict whether or not the verb phrase might be followed by an adverbial clause?
Consider examples like ‘We can get that out [if you want]’, where ‘if you want’ is a grammatically optional element (termed an Adverbial by Quirk, more commonly called an Adjunct). The speaker is free to include or exclude this clause.
![Example case with an optional constituent: ‘We can get that out [if you want]’](https://www.ucl.ac.uk/english/sites/english/files/styles/non_responsive/public/optional-m.png?itok=48XMM5_6)
Suppose we wish to ask whether the transitivity of the preceding clause correlates in some way with the presence of this element. Do certain complementation patterns tend to exclude or include them more than the average?
Again, our first step is to define our FTFs. We will use the following template. We will create two sets of FTFs, one with the optional blue node (A, CL) included, and one with it removed. The white arrow means the node must be eventually following the VP in the same clause.

If we wished to simply test one feature, say 'trans' against the average, then we could enumerate a 2 × 2 table, as above, and perform a goodness of fit test. But we will list all transitivity types and the total, as in Example 1.
- We enumerate FTFs for the first column (A,CL) and the TOTAL column (without the clause), and obtain the middle column by subtraction.
- Note that FTFs do not allow us to specify that something is absent, i.e. that the VP node must not have a following A,CL. But we can still get data!
| | A,CL | ¬A,CL | TOTAL |
|---|---|---|---|
| montr | 4,644 | 54,428 | 59,072 |
| ditr | 117 | 1,554 | 1,671 |
| dimontr | 19 | 236 | 255 |
| cxtr | 284 | 3,611 | 3,895 |
| trans | 130 | 2,384 | 2,514 |
| intr | 2,746 | 26,691 | 29,437 |
| cop | 2,147 | 27,003 | 29,150 |
| 0 | 59 | 791 | 850 |
| TOTAL | 10,146 | 116,698 | 126,844 |
An 8 × 1 goodness of fit test for A,CL against the TOTAL column obtains χ² = 112, which is greater than the critical value of 14.0671 at a 0.05 error level. But as previously noted, with very large amounts of data, many differences are likely to be significant. Corresponding confidence intervals will also be small.
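This test can be verified in plain Python (a sketch using the frequencies in the table above):

```python
# 8 x 1 goodness of fit: does the A,CL column fit the TOTAL column?
acl    = [4644, 117, 19, 284, 130, 2746, 2147, 59]
totals = [59072, 1671, 255, 3895, 2514, 29437, 29150, 850]
n_acl, n = sum(acl), sum(totals)  # 10,146 and 126,844

expected = [t * n_acl / n for t in totals]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(acl, expected))
# chi_sq is approximately 112, exceeding crit(7, 0.05) = 14.0671
```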
Plotting the proportion of A,CL cases over the total column with 95% Wilson score intervals gives us the following graph. The mean is slightly less than 0.08, and if copular and unmarked cases are left out, it is about 0.082.

The transitive, intransitive and copular cases stand out. Copular cases (mostly forms of BE) are a special case, so arguably should not be included in a mean.
The monotransitive pattern is the most frequent overall, contributing almost half of the data. It is just significantly less than the mean excluding copular and unmarked cases (recall these are likely incapable of being determined or were left blank in error).
There are a number of ways in which one could take this research. For example, one could consider patterns which include direct objects and those that do not, or cases including indirect objects and those that don't, and so on. In addition one might subdivide the data by whether it is spoken or written (see below).

The problem of overlapping cases
The example above can fall victim to two different kinds of overlapping.
- Embedding. The optional adverbial clause is the superordinate clause of another case.
- Multiple matching. There is more than one adverbial clause within the same superordinate clause. (In principle, according to the structure there could be more than one VP, but this is prevented by the grammar.)
The second type usually arises because there is a white arrow in the FTF. We can construct an FTF to detect both types of overlap.
Type 1: Below is an FTF for the first type of construction. We find 9,000 matches in ICE-GB out of 10,146. Nearly 9 out of 10 adverbial clauses in such structures contain a VP and a verb, which is not altogether surprising! Do these clauses interact with one another, and if so, does this undermine the sampling assumptions of the experiment?

Most adverbial clauses will also be clauses of the first kind.
In a narrow sense, we are obviously referring to two distinct constituents. However, an adverbial clause which contains another clause is called a complex clause. Moreover, when we examine the results of the query above left, we find that out of the 9,000 cases where the adverbial clause contains a verb, a mere 472 contain a second adverbial clause! The 8% proportion of all cases becomes 4%. This type of repeated embedding interaction is discussed in Wallis (2019).
Therefore, it seems that if the clause we are investigating is itself an adverbial within another clause, then it decreases the likelihood that it contains a second adverbial clause. (We can perform a mini-experiment to check this, such that the IV is the presence of the preceding structure and the DV is the presence of the second adverbial clause.)
However, this does not tell us whether or how the fact that the clause is an adverbial in another clause affects our experiment.
We may set up a further experiment, placing the first in a framing constituent. We subdivide these 9,000 cases (below) to examine the interaction between the transitivity feature of the second verb and the presence of the second adverbial clause. We might find that this sample behaves differently from the original one, an important rider to the original experiment. The FTF pattern below then replaces the previous one.

Type 2: There can be more than one adverbial clause within the superordinate clause. If there is a second adverbial clause, the ‘adverbial FTF’ will be counted twice within the same structure. The more general FTF without the adverbial is only counted once, because it only has one matching arrangement.
To address this, we first search for structures containing two following adverbial clauses (see below). This FTF yields 328 cases out of 10,146, i.e., around 3% of cases containing one adverbial clause also contain a second, such as “Now use [VP] the brake if necessary [CL1] to stop it [CL2] <,>” (S2A-054 #22).
We can either
- decrease the total number of hits by subtracting each additional adverbial clause, saying, in effect, that the ‘case’ is the overall clause, with the dependent variable being whether or not there is any following adverbial clause, or
- increase the total, which we might do if the preceding part of the FTF was treated as a mere contextual constraint.
We have to consider what our hypothesis is. In this case our focus is on the relationship between the transitivity of a previous VP node and the presence of an adverbial clause; apart from its existence, we are not interested in how many adverbial clauses there are. We adopt the first method.
We construct an FTF like the following, and then review each case. It is also possible to create a series enumerating the transitive feature class of the verb.

We classify the results as follows.
- First, if the FTF matches only once in the same text unit, the case can be subtracted directly. That accounts for 307 cases out of 328.
- The remainder reflect cases of multiple matching which we should examine manually. (Hint: ICECUP’s text browser has a red ‘number of matches’ button which identifies all cases of multiple matching in the same text unit.)
- If they are independent or embedded (as before), they are simply subtracted from the totals.
- If they match clauses with three or more adverbial clauses we subtract the extra nodes.
Finally, note that it is important to put these problems into some kind of numerical context. This type of overlap only occurs in 3% of cases, and subtracting excess matches from the total is unlikely to make much difference overall.
References
Wallis, S.A. (2019). Investigating the additive probability of repeated language production decisions. International Journal of Corpus Linguistics 24:4, 490-521.
Wallis, S.A. (2021). Statistics in Corpus Linguistics Research: A New Approach. New York and London: Routledge.