Statistics Resources

The Survey of English Usage publishes a range of statistics resources by Sean Wallis.

Introduction

This page includes links to a number of papers, spreadsheets and PowerPoint slides by Sean Wallis on Experimental Design and Statistics (EDS) methods for corpus linguistics.

The arguments in these papers apply well beyond Survey corpora. They are written to help corpus linguists get to grips with the basic statistical methods they need.

Papers are listed by date of first e-publication, although some have since been published in print elsewhere.

A comprehensive discussion can be found on Sean’s corp.ling.stats blog.


Articles

Listed by their original date of online publication.

2009  Binomial confidence intervals and contingency tests. London: Survey of English Usage. » corp.ling.stats » ePublished (JQL, 2013)

2009  Grammatical Noriegas: interaction in corpora and treebanks. Paper presented at ICAME 2009, Lancaster. » PowerPoint slides

2010  Competition between choices over time. London: Survey of English Usage. » corp.ling.stats » ePublished (SICLR)

2010  z-squared: The origin and use of χ². London: Survey of English Usage. » corp.ling.stats » ePublished » PowerPoint slides (JQL, 2013)

2011  Comparing χ² tests. London: Survey of English Usage. » corp.ling.stats » ePublished (JQL, 2020)

2012  Goodness of fit measures for discrete categorical data. London: Survey of English Usage. » corp.ling.stats » ePublished (SICLR*)

2012  Measures of association for contingency tables. London: Survey of English Usage. » corp.ling.stats » ePublished (SICLR*)

2012  (with J. Bowie) That vexed problem of choice. Presented at ICAME 33. London: Survey of English Usage. » corp.ling.stats » ePublished » PowerPoint slides (SICLR)

2012  Capturing patterns of linguistic interaction in a parsed corpus: an insight into the empirical evaluation of grammar? London: Survey of English Usage. » corp.ling.stats » ePublished » PowerPoint slides » PowerPoint slides (ICAME) » Handouts (ICAME) » Data and spreadsheets (IJCL, 2019)

2012  Tagging ICE Philippines and other corpora. London: Survey of English Usage. » ePublished

2012  A statistics crib sheet. London: Survey of English Usage. » ePublished

2014  What might a corpus of parsed spoken data tell us about language? Presented at Olinco 2014. London: Survey of English Usage. » corp.ling.stats » ePublished » PowerPoint slides (SICLR)

2015  Adapting random-instance sampling variance estimates and Binomial models for random-text sampling. London: Survey of English Usage. » corp.ling.stats » ePublished (SICLR)

2017  Detecting direction in interaction evidence. London: Survey of English Usage. » corp.ling.stats » ePublished

2018  Plotting the Wilson distribution. London: Survey of English Usage. » corp.ling.stats » ePublished (SICLR)

2020  Interval arithmetic ‘cheat sheet’. » ePublished

2020  Further evaluation of Binomial confidence intervals and difference intervals. London: Survey of English Usage. » corp.ling.stats » ePublished

2022  Accurate confidence intervals on Binomial proportions, functions of proportions and other related scores. London: Survey of English Usage. » corp.ling.stats » ePublished » PowerPoint slides

2022  Are embedding decisions independent? Evidence from preposition(al) phrases. London: Survey of English Usage. » corp.ling.stats » ePublished » Spreadsheet

2022  Directional evidence revisited: End weight bias and templating in conjoined phrase postmodification. London: Survey of English Usage. » corp.ling.stats » ePublished » Spreadsheet

JQL = Journal of Quantitative Linguistics, IJCL = International Journal of Corpus Linguistics, SICLR = Statistics in Corpus Linguistics Research, * = abridged.


Spreadsheets

© Sean Wallis 2009-. All rights reserved.