

English Corpus Linguistics Summer School

  • 18 hours
  • 3 days


The Summer School in English Corpus Linguistics is a three-day introduction to corpus linguistics.

You'll gain experience with a state-of-the-art corpus and an understanding of basic statistical ideas.

It's aimed at students of language and linguistics and teachers of English. You'll need a basic knowledge of English linguistics and grammar.

This course is taught by staff at the Survey of English Usage at UCL.

Course content

Over the three days, you'll learn about:

  • the scope of corpus linguistics, and how we can use it to study the English language
  • key issues in corpus linguistics methodology
  • how to use corpora to analyse issues in syntax and semantics
  • basic elements of statistics
  • how to navigate large and small corpora, particularly ICE-GB and DCPSE

Who this course is for

The summer school is for students and teachers of the English language in colleges and universities who want to acquire a knowledge of basic concepts and methodologies used in English corpus linguistics.


You'll be expected to have a basic knowledge of English linguistics and grammar at undergraduate level. 

Before the course begins, you'll be given access to a reading list and set of materials on UCL's Moodle site.

Structure and teaching

This is a three-day course in which:

  • mornings consist of a 'theory lecture' and a 'practical lecture'
  • afternoons consist of a practical session where you can put what you've learned into practice

The theory session on the first day covers English grammar, the one on the second day covers corpus linguistics methodologies, and the one on the third day introduces the basic principles of statistics.

The course is practical and hands-on.

A certificate of attendance will be issued on request.

The corpora studied

You'll also learn about a wide variety of corpora. Most of the practical teaching focuses on two particular corpora, both developed at UCL. These are the British component of the International Corpus of English (ICE-GB), and the Diachronic Corpus of Present-day Spoken English (DCPSE).

These corpora consist of authentic samples of written and spoken English and are unusual in that they are fully parsed, i.e. they contain a complete grammatical tree analysis for every sentence. You'll use the state-of-the-art software developed for research with grammatical treebanks - ICECUP - to explore these rich resources.

You'll be taught statistics fundamentals from the ground up, from probability theory to distributions, confidence intervals and statistical tests. No prior knowledge of statistics is assumed.

Learning outcomes

At the end of the course, you'll have:

  • acquired a basic but solid knowledge of the terminology, concepts and methodologies used in English corpus linguistics
  • had practical experience working with two state-of-the-art corpora and a corpus exploration tool (ICECUP)
  • gained an understanding of the breadth of corpus linguistics and its potential application to projects
  • learned about the fundamental concepts of inferential statistics and their practical application to corpus linguistics

Benefits of taking this course

After attending the course you'll be able to:

  • confidently use the ICE-GB and DCPSE corpora in your research
  • apply basic statistical procedures and, most importantly, understand the results
  • understand core analytical concepts that apply beyond the study of language, including research methods and principles of statistical inference

As a teacher, you'll be able to design a course in English corpus linguistics for use in your own institution. You'll gain an understanding of the broad range of possibilities that corpus linguistics has to offer.

You'll also have an opportunity to meet other students and teachers to discuss ideas and issues, and the chance to stay in London and explore its museums and theatres.

Costs and concessions

The standard fee is £350.

This fee does not cover accommodation.

Full programme information

View the full programme for more detailed information.

Course team

Professor Bas Aarts


Bas teaches English linguistics to undergraduate and postgraduate students at UCL. Since January 1997 he's been the Director of the Survey of English Usage (SEU) at UCL - an internationally recognised and highly regarded centre of excellence for research in the area of English Language and Linguistics. From this research he and his team have developed 'Englicious', an extensive online platform containing original English language teaching resources closely tailored to the New 2014 UK National Curriculum, which includes professional development materials for teachers.

Sean Wallis


Sean is a Principal Research Fellow in corpus linguistics at the Survey of English Usage at UCL. He's the developer of the ICECUP research software, oversaw the completion of ICE-GB and DCPSE, and has written on many aspects of corpus linguistics methodology and statistics. He runs a blog on statistics, corp.ling.stats, which discusses how statistics should be used for research in corpus linguistics.

Rachele de Felice


Rachele is a Senior Teaching Fellow in English linguistics in the Department of English Language and Literature at UCL. Her research focuses on corpus pragmatics, which looks at how the use of corpora can further our understanding of pragmatics and communication.

Learner reviews

"I am amazed by your availability and helpfulness throughout the summer school. 4 tutors for 21 students - that is a support ratio I have never had in my (university) education before. Maybe I should go to more summer schools."

"[Deciding what was most useful for me was a] difficult choice because each session offered a link to the next (admirably cohesive) and all were very useful - perhaps I'll opt for what I was least comfortable with, statistics."

"I'm studying statistics in my university. So I think this lecture is very useful and I want to study hard."

Course information last modified: 23 Oct 2019, 12:04