The Survey of English Usage
Widely recognised for its research on present-day English, the Survey was
founded a full half century ago. Its outputs include A Comprehensive
Grammar of the English Language (1985), by four internationally renowned linguists:
Randolph Quirk, Sidney Greenbaum, Geoffrey Leech, and Jan Svartvik (Quirk et
al. 1985). It is also well known for its pioneering work in the area of corpus
linguistics.
A linguistic corpus is a collection of written and spoken material, compiled
for the purposes of language research. The SEU houses three corpora: the Survey
of English Usage Corpus, the British component of the International
Corpus of English (ICE-GB), and the Diachronic Corpus of Present-Day
Spoken English (DCPSE). ICE-GB and DCPSE are parsed collections of transcribed speech and
writing, which means that every sentence is fully grammatically analysed in
the form of so-called tree diagrams, i.e. graphical representations of the
structure of sentences, using a grammar based on Quirk et al. (1985). These
corpora amount to over 1.4m words of fully parsed English sentences and span
genres and contexts, comprising spoken transcriptions and written text in a
variety of situations, as well as sound recordings of the later material. The
corpora have been used all over the world for scholarly research, and are
state-of-the-art resources. The SEU also has established experience in building
web resources for grammar teaching, e.g. the JISC-funded Internet Grammar of English.
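To make the idea of a fully parsed sentence concrete, the following is a minimal sketch in Python of a tree diagram held as a nested data structure. It is an illustration only: the `Node` class, its methods, and the category labels (`CL`, `NP`, `VP` and so on) are assumptions made for this example, and they do not reproduce the annotation scheme or file format of ICE-GB or DCPSE.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of a toy parse tree: a label plus either a word or children."""
    label: str                      # hypothetical category label, e.g. "NP"
    word: Optional[str] = None      # set on leaf nodes only
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        """Return the words under this node in left-to-right order."""
        if self.word is not None:
            return [self.word]
        words: List[str] = []
        for child in self.children:
            words.extend(child.leaves())
        return words

    def find(self, label: str) -> List["Node"]:
        """Return every node (including this one) that carries the given label."""
        hits = [self] if self.label == label else []
        for child in self.children:
            hits.extend(child.find(label))
        return hits


# Illustrative analysis of "I love this film" (the labels are assumptions).
sentence = Node("CL", children=[
    Node("NP", children=[Node("PRON", word="I")]),
    Node("VP", children=[Node("V", word="love")]),
    Node("NP", children=[Node("DET", word="this"), Node("N", word="film")]),
])

# Collect the text of every noun phrase in the sentence.
noun_phrases = [" ".join(np.leaves()) for np in sentence.find("NP")]
```

Because every word sits inside a labelled structure, a search can ask for grammatical units (here, every noun phrase) rather than for strings of words, which is what distinguishes a parsed corpus from plain text.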
The Survey has a five-pronged research strategy:
- The creation of new corpora
- The further development of current corpora
- The development of computer software for creating and exploiting corpora
- The development of educational resources using corpora
- Carrying out research using SEU resources
a. The creation of new corpora
The Survey intends to create parsed corpora of literary works (e.g. Shakespeare)
in the future, with the aim of building bridges between language and literature.
b. The further development of current corpora
The Survey’s three corpora (the ‘Quirk corpus’, ICE-GB and DCPSE) can be
further annotated, e.g. at the prosodic, pragmatic, and morphological levels
of linguistic analysis. ICE-GB and DCPSE are already fully parsed, and
extending the analysis would open new avenues of research (for example,
exploring interactions between grammar and pragmatics). The intention is to
seek funding for further development.
c. The development of computer software for creating and exploiting corpora
The SEU has developed tools for automatic annotation, including part-of-speech
tagging (automatically classifying words as verb, noun, etc.) and
parsing. Central to all our work is ICECUP, exploration software designed
for working with a parsed corpus and developed by the Senior Research Fellow,
Sean Wallis. This software has been used for building corpora and
carrying out research with them, and is distributed freely
with our corpora.
The SEU was recently funded by the ESRC to build a corpus experimentation
platform (the Next Generation Tools for Linguistic Research
in Grammatical Treebanks project) which allows scholars to
define viable research projects into grammar and lexis of far greater
complexity than was previously possible.
Sean Wallis’s research is in the area of corpus linguistics methodology. This
is the ‘how do we do research?’ question: how to design and annotate a corpus,
how to use a new corpus for linguistic research (including learning the
grammar of the corpus), what kinds of experiments can be undertaken (and what
the results mean), and how to develop new methods for linguistic analysis.
d. The development of educational resources using corpora
One of the biggest potential exploitation routes of corpora is
in education. SEU staff have used corpora to develop educational
material ranging from the famous Comprehensive Grammar
to the Internet Grammar of English.
The SEU has recently secured funding to develop web-based teaching
and learning resources for schools. The intention is to construct
a web platform for teaching English at Key Stages 3-5 (secondary
schools and equivalent). Our ICECUP server technology will then be used
to select examples from a corpus and publish them online. The corpus
provides a vast supply of natural language examples for teaching,
and can supply context for any example provided.
The SEU’s corpora are also used as an integral element of
both undergraduate and postgraduate teaching in the department,
for example in the undergraduate course Literary Linguistics and
the postgraduate course Corpus Linguistics (both run in alternate
years).
e. Carrying out research using SEU resources
As noted above, the SEU’s corpora are used throughout the world to carry
out English language research. However, for decades the Survey has
also carried out its own English language research. A recently completed
project is The English Noun Phrase, funded by the AHRC,
which has resulted in a monograph published by Cambridge University
Press, authored by the researcher working on the project, Evelien
Keizer, now at the University of Amsterdam.
Bas Aarts and a researcher are currently
working on an AHRC-funded project entitled The Changing English
Verb Phrase, which will use DCPSE as its database to investigate
short-term changes in the English verbal system, such as the increased
use of progressive constructions (e.g. I love this
film vs. I’m loving this film). Bas Aarts is also using the SEU’s
resources to write a grammar of English for Oxford University Press.
The Survey’s research on recent change in English is complemented
by Kathryn Allan’s work on change over a longer timespan, which focuses
on lexical semantic change and lexicology. The possible uses of diachronic
corpora and other electronic resources to investigate semantic change are a
major theme of a volume on current methods in historical semantics which
Kathryn Allan is co-editing for Mouton de Gruyter’s ‘Topics in English
Linguistics’ series.
See also: The Survey of English Usage website