The Survey of English Usage
Annual Report 2003

1. General

Sean Wallis and Bas Aarts presented the new ESRC project on current change (see below) at the 2003 ICAME conference in Guernsey. In linguistics a distinction is traditionally made between diachronic and synchronic approaches to the study of language. The former considers language through time, whereas the latter takes a ‘snapshot’ look at language viewed from the present. This dichotomy has recently been questioned by some linguists, who have argued that the distinction is an artificial one. They claim that languages change all the time, even synchronically. As a result of these new attitudes to language development there is an emerging research impetus in linguistics which concerns itself with recent change. We hope to provide a useful research tool for linguists working in this field.

Apart from the work on our research projects we continue to contribute to specialised areas in the field of corpus linguistics; see the papers by Sean Wallis on research methods and the methodology of corpus annotation listed at the end of this report.

2. Research

Creating a parsed and searchable diachronic corpus of present-day spoken English (ESRC R000239643)

At the core of this two-year research project are two corpora of Modern British English, both founded at the Survey of English Usage (SEU) at University College London: the London-Lund Corpus (LLC), compiled in the 1960s, and the British Component of the International Corpus of English (ICE-GB), compiled in the 1990s. The aim of the project is to construct a fully parsed and searchable diachronic corpus of spontaneous spoken English, containing carefully selected, directly comparable texts from the LLC and ICE-GB. This corpus will be a unique resource for linguists studying the spoken English of a period spanning 25-30 years. There is currently no such resource available, and the corpus will be the first of its kind to enable research into current change in spoken language.

The central aim of this project is to annotate 400,000 words of the LLC in a manner directly comparable with ICE-GB. The first task is to ensure that segmentation, the treatment of sub-lexical utterances and part-of-speech annotation are carried out in a manner consistent with ICE-GB, and then to reparse the LLC portion. This parsed analysis is now being manually corrected. We have extended our ICECUP software (see below) to permit cross-sectional analysis and correction and to permit the wholesale insertion of structure. New facilities in ICECUP 3.1, such as the lexicon and grammaticon, will be used to identify further inconsistencies. For full details, see here.

ICE-GB

ICECUP 3.1 is in the process of being beta-tested and finalised for the publication of ICE-GB Release 2. The software has been revised in a number of important ways from the initial release (3.0), and provides more powerful search facilities including logical combinations of lexical wild cards and logical expressions within FTF nodes, new integrated lexicon and ‘grammaticon’ tools, and an improved user interface supporting parallel searching. More information will be published on the Survey website.

Sound recordings for ICE-GB’s 300 spoken texts, consisting of approximately 75 hours of speech, are available on request from the Survey of English Usage. They are available in three formats, all 16kHz mono:

  1. uncompressed wave files, one per text.
  2. uncompressed wave files, split into text units (or groups where overlapping, etc. prevents subdivision).
  3. compressed mp3 files, split into text units or groups.

Uncompressed files take 13 CDs. The compressed files currently require 6 CDs.

The Bayreuth-UCL Morphology Corpus

Phase I of this project has been carried out in conjunction with Professor Dr Hans-Jörg Schmid at the University of Bayreuth. A sample of ICE-GB was annotated using a morphological analysis scheme developed at Bayreuth. We are now planning Phase II, which will involve extending our corpus research platform, ICECUP, to support the browsing and searching of grammatical and morphological layers of analysis. We will shortly be submitting a research proposal to the ESRC to this end.

The English Noun Phrase: an empirical study (AHRB B/RG/AN5308/APN10614)

This project has now been assessed by the AHRB and has been given the highest grading. Comments made by the assessors include: “The project has offered extremely good value for money. The output of a major book-length monograph has been produced in only two years”, “This [project] seems to me to be an excellent use of the AHRB’s resources in that it has enabled the researcher to realize their potential to improve the breadth and depth of our knowledge in this area.” The principal researcher on this project, Dr Evelien Keizer, is now at the University of Amsterdam.

3. Staff

Dr Dirk Bury and Leslie Kirk joined Yordanka Kostadinova-Kavalova and Gabriel Ozón on the ESRC project team as Research Assistants. Sean Wallis continues as Principal Senior Research Fellow. Toshihiko Kubota joined the Survey for two years as a Visiting Scholar.

Marie Gibney is the SEU Administrator and Isaac Hallegua continues as Systems Administrator.

4. Publications, conference presentations, talks, theses and other studies using Survey material

Please let us know if you would like us to include your publications based on SEU material. We would appreciate it if you could send us offprints of any such publications.

Aarts, Bas (2003) English Language and Linguistics. (With David Denison and Richard Hogg.) Cambridge University Press. Volumes 7.1 and 7.2.

Aarts, Bas (2003) Argumentation. In: The guide to good practice for learning and teaching in languages, linguistics and area studies. 2003. Published on the web: http://152.78.89.51/resources/guidecontents.aspx.

Aarts, Bas (2003, with Sean Wallis) Tracking the development of spoken English across the decades: the Diachronic Corpus of Present-day Spoken English. Paper presented at the annual ICAME conference, Guernsey.

Aarts, Bas (2002/2003; with Evelien Keizer) The English noun phrase: an empirical study. Final Report. AHRB Project B/RG/AN5308/APN10614. Available at http://www.ucl.ac.uk/english-usage/projects/noun-phrase/index.htm.

Crystal, David (2003) The Cambridge encyclopedia of the English language. Second edition. Cambridge: Cambridge University Press.

De Clerck, Bernard (2003) The multifunctionality of the imperative in present-day British English in contrast with Dutch: fundamental research into the functional differentiation and the semantic core meaning(s) on the basis of corpus analysis. Paper presented at the annual ICAME conference, Guernsey.

Depraetere, Ilse (2003) On verbal concord with collective nouns in British English. English Language and Linguistics 7.1, 85-127.

Facchinetti, Roberta (2003) Pragmatic and sociological constraints on the functions of may in contemporary British English. In: Roberta Facchinetti, Manfred Krug and Frank Palmer (eds.) Modality in contemporary English. Topics in English Linguistics 44. Berlin and New York: Mouton de Gruyter. 301-327.

Gilquin, Gaëtanelle (2003) Automatic retrieval of syntactic structures: the quest for the holy grail. International Journal of Corpus Linguistics 7.2, 183-214.

Gilquin, Gaëtanelle (2003) Causative get and have. Journal of English Linguistics 31.2, 125-148.

Kaltenböck, Gunther (2003) On the syntactic status of anticipatory it. English Language and Linguistics 7.2, 235-255.

Keizer, M.E. (2002/2003, with Bas Aarts) The English noun phrase: an empirical study. Final report. Available at http://www.ucl.ac.uk/english-usage/projects/noun-phrase/index.htm.

Kirk, John, Jeffrey Kallen, Orla Lowry and Anne Rooney (2003) The compilation of ICE-Ireland. Paper presented at the annual ICAME conference, Guernsey.

Leech, Geoffrey (2003) Modality on the move: the English modal auxiliaries 1961-1992. In: Roberta Facchinetti, Manfred Krug and Frank Palmer (eds.) Modality in contemporary English. Topics in English Linguistics 44. Berlin and New York: Mouton de Gruyter. 223-240.

Leech, Geoffrey (2003) Recent grammatical change in written English: comparing American and British English. Paper presented at the annual ICAME conference, Guernsey.

Mair, Christian (2003) Tracking ongoing grammatical change and recent diversification in present-day standard English: the complementary role of small and large corpora. Paper presented at the annual ICAME conference, Guernsey.

Ozón, Gabriel (2003) English ditransitives: a corpus-based study. Paper presented at the annual ICAME conference, Guernsey.

Smith, Nicholas (2003) Changes in the modals and semi-modals of strong obligation and epistemic necessity in recent British English. In: Roberta Facchinetti, Manfred Krug and Frank Palmer (eds.) Modality in contemporary English. Topics in English Linguistics 44. Berlin and New York: Mouton de Gruyter. 241-266.

Spinillo, Mariangela (2003) On such. English Language and Linguistics 7.2, 195-210.

Spinillo, Mariangela (2003) Such is such. CamLing 2003: Proceedings of the University of Cambridge First Postgraduate Conference in Language Research. 169-174.

Wallis, Sean (2003) Completing parsed corpora: from correction to evolution. In Anne Abeillé (ed.), Treebanks: building and using parsed corpora. Boston: Kluwer. 61-71.

Wallis, Sean (2003) Scientific experiments in parsed corpora: an overview. In: Sylviane Granger and Stephanie Petch-Tyson (eds.) Extending the scope of corpus-based research: new applications, new challenges. Language and Computers 48. Amsterdam/New York: Rodopi. 27-38.

Wallis, Sean (2003, with Bas Aarts) Tracking the development of spoken English across the decades: the Diachronic Corpus of Present-day Spoken English. Paper presented at the annual ICAME conference, Guernsey.

Bas Aarts
Director

January 2004

This page last modified 21 July, 2014 by Survey Web Administrator.