The Survey of English Usage publishes a range of statistics resources by Sean Wallis.

Introduction
This page includes links to a number of papers, spreadsheets and PowerPoint slides by Sean Wallis on Experimental Design and Statistics (EDS) methods for corpus linguistics.
The arguments in these papers apply well beyond Survey corpora! They are written to help corpus linguists get to grips with the basic statistical methods they need.
Papers are listed by date of first e-publication, although some of these are now being published in print elsewhere.
A comprehensive discussion can be found on Sean’s corp.ling.stats blog.
Articles
Listed by their original date of online publication.
2009 Binomial confidence intervals and contingency tests. London: Survey of English Usage. » corp.ling.stats » ePublished (JQL, 2013)
2009 Grammatical Noriegas: interaction in corpora and treebanks. Paper presented at ICAME 2009, Lancaster. » PowerPoint slides
2010 Competition between choices over time. London: Survey of English Usage. » corp.ling.stats » ePublished (SICLR)
2010 z-squared: The origin and use of χ². London: Survey of English Usage. » corp.ling.stats » ePublished » PowerPoint slides (JQL, 2013)
2011 Comparing χ² tests. London: Survey of English Usage. » corp.ling.stats » ePublished (JQL, 2020)
2012 Goodness of fit measures for discrete categorical data. London: Survey of English Usage. » corp.ling.stats » ePublished (SICLR*)
2012 Measures of association for contingency tables. London: Survey of English Usage. » corp.ling.stats » ePublished (SICLR*)
2012 (with J. Bowie) That vexed problem of choice. Presented at ICAME 33. London: Survey of English Usage. » corp.ling.stats » ePublished » PowerPoint slides (SICLR)
2012 Capturing patterns of linguistic interaction in a parsed corpus: an insight into the empirical evaluation of grammar? London: Survey of English Usage. » corp.ling.stats » ePublished » PowerPoint slides » PowerPoint slides (ICAME) » Handouts (ICAME) » Data and spreadsheets (IJCL, 2019)
2012 Tagging ICE Philippines and other corpora. London: Survey of English Usage. » ePublished
2012 A statistics crib sheet. London: Survey of English Usage. » ePublished
2014 What might a corpus of parsed spoken data tell us about language? Presented at Olinco 2014. London: Survey of English Usage. » corp.ling.stats » ePublished » PowerPoint slides (SICLR)
2015 Adapting random-instance sampling variance estimates and Binomial models for random-text sampling. London: Survey of English Usage. » corp.ling.stats » ePublished (SICLR)
2017 Detecting direction in interaction evidence. London: Survey of English Usage. » corp.ling.stats » ePublished
2018 Plotting the Wilson distribution. London: Survey of English Usage. » corp.ling.stats » ePublished (SICLR)
2020 Interval arithmetic ‘cheat sheet’. » ePublished
2020 Further evaluation of Binomial confidence intervals and difference intervals. London: Survey of English Usage. » corp.ling.stats » ePublished
2022 Accurate confidence intervals on Binomial proportions, functions of proportions and other related scores. London: Survey of English Usage. » corp.ling.stats » ePublished » PowerPoint slides
2022 Are embedding decisions independent? Evidence from preposition(al) phrases. London: Survey of English Usage. » corp.ling.stats » ePublished » Spreadsheet
2022 Directional evidence revisited: End weight bias and templating in conjoined phrase postmodification. London: Survey of English Usage. » corp.ling.stats » ePublished » Spreadsheet
JQL = Journal of Quantitative Linguistics, IJCL = International Journal of Corpus Linguistics, SICLR = Statistics in Corpus Linguistics Research, * = abridged.
Spreadsheets
- 2 × 2 χ² (multiple 2 × 2 contingency tests and 2 × 1 goodness of fit calculations)
- χ² separability test (tests whether two 2 × 2, 2 × 1, 3 × 2 or 3 × 1 tables significantly differ)
- Direction test (compares sizes of effect in different directions in 2 × 2 tables)
- Wilson score intervals for a small population (use when the population is finite, or when analysing subsamples)
- Single-sample z test for comparing two competing frequencies for significant difference
- Interaction trend analysis (evaluates a series of repeating decisions)
- Binomial demonstrator (classroom demonstrator for Binomial distribution)
- Random-text sample recalibration (example computation)
- Plotting confidence intervals on ϕ (Note: contains macros)
- Plotting confidence intervals on algebraic functions of proportions (−, +, ÷, ×, ^, log, %diff, odds ratio)
- Bootstrap demonstrator for the single proportion (simulates a range of bootstrap methods)
- Cohen’s h confidence interval
- Diversity confidence interval (a measure similar to entropy)
- Results of evaluating tests (Ratio tests vs. Fisher, Newcombe-Wilson and 2 × Clopper-Pearson)
- Plotting distribution curves (pdfs)
- Wilson score interval, logit-Wilson and Clopper-Pearson (precalculated up to n = 10)
- Wilson score interval, logit-Wilson, Clopper-Pearson and ‘mid-p’ (precalculated up to n = 20)
- Newcombe-Wilson and algebraic functions (−, +, ÷, ×, ^, log)
- 2 × 2 ϕ effect size
- Entropy
- Example resources
- Incidental resources
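
Several of the spreadsheets listed above compute Wilson score intervals on an observed proportion. For readers who would rather script the calculation, the following is a minimal Python sketch of the standard Wilson score interval formula. It is not taken from the spreadsheets themselves: the function name `wilson_interval`, its parameters and the default critical value z ≈ 1.96 (a two-tailed 95% interval) are assumptions made for illustration, and it omits the finite population and continuity corrections that some of the spreadsheets provide.

```python
import math


def wilson_interval(p, n, z=1.959964):
    """Return (lower, upper) Wilson score bounds.

    p: observed proportion, between 0 and 1
    n: sample size (number of cases the proportion is based on)
    z: critical value of the Normal distribution (assumed default: 95%)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")

    z2 = z * z
    denom = 1 + z2 / n
    # The interval is centred on an adjusted proportion, not on p itself.
    centre = (p + z2 / (2 * n)) / denom
    spread = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    return centre - spread, centre + spread


# Example: 5 hits out of 10 cases, i.e. p = 0.5, n = 10.
lower, upper = wilson_interval(0.5, 10)
print(round(lower, 4), round(upper, 4))
```

For p = 0.5 and n = 10 this gives an interval of roughly (0.237, 0.763). Unlike the simple Normal approximation, the bounds stay within the range 0 to 1 even when p is 0 or 1.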
© Sean Wallis 2009-. All rights reserved.