The Survey of English Usage
Annual Report 2006

News
Research
Staff
Publications, conference presentations, etc.

1. News

During the summer the Survey released the Diachronic Corpus of Present-Day Spoken English (DCPSE). This new resource will allow researchers to investigate changes in the grammar and usage of Present-Day English from two time periods, namely the 1960s and 1990s. More details follow in section 2.1.

The Survey also released version 3.1 of ICECUP, bundled with the ICE-GB corpus. The sound material (aligned audio recordings) is now also available to researchers. See sections 2.2 and 2.3 for further details.

The project Next generation tools for linguistic research in grammatical treebanks, funded by the Economic and Social Research Council (ESRC), has now been running for a year. Progress is described in section 2.4.


2. Research

2.1 The Diachronic Corpus of Present-Day Spoken English

DCPSE is unique in containing exclusively spontaneous spoken English. Unlike ICE-GB, the corpus is based entirely on the spoken word, and is sampled over time, from the early 1960s to the early 1990s. It contains over 400,000 words of spoken English from the London-Lund Corpus and from the British component of the International Corpus of English (ICE-GB), sampled under comparable headings. The corpus is parsed, which will permit research into synchronic and diachronic grammatical variation, and it is fully searchable using the International Corpus of English Corpus Utility Program (ICECUP), software that we developed for ICE-GB and have modified to operate on the new data. In due course we intend to provide a playback facility enabling linguists to listen to the original recordings. We hope that DCPSE will be a major new resource for linguists interested in 'current change'. The DCPSE project was rated as 'outstanding research' by the ESRC.

DCPSE is available on CD with ICECUP 3.1, including some 87,000 syntactic trees, a complete lexicon and grammaticon, and all the latest search tools. A freely distributed sample corpus of 20,000 words is also available from our website. DCPSE can be ordered for education and research purposes by completing a printable order form on our website.

2.2 ICECUP 3.1

The second release of the ICE Corpus Utility Program (ICECUP 3.1) represents, we believe, a major achievement.

The first release of ICECUP (v. 3.0) was developed with support from the ESRC and released with ICE-GB in September 1998. It has stood the test of time.

Fuzzy Tree Fragments (FTFs) made grammatical queries on a parsed corpus easy. With an intuitive user interface, a Wizard for creating FTFs, and the ability to combine queries to form more complex ones, ICECUP 3.0 offers many tools linguists need to carry out corpus-based research on English. What could we do to improve on ICECUP? In recent years Senior Research Fellow Sean Wallis extended ICECUP in a number of ways (summarised here), taking into account requests for improvements from users.

ICECUP 3.1 is an 'evolutionary' development of ICECUP. Users of ICECUP 3.0 will find apparently subtle changes, rather than radical changes to the software. But under the surface the program is completely revised.

The most fundamental changes are in the way that the corpus is indexed. In ICECUP 3.0 the word 'work' simply referred to a list of cases in the corpus. In ICECUP 3.1, 'work' will be found first in a 'second order' lexical index which lists the distinct grammatical instances of the word 'work', each of which then refers to cases in the corpus.

This approach requires more processing. It also means that ICECUP 3.1 is not compatible with the old ICE-GB Release 1 index. But the new index makes the lexicon possible and lexical wild card queries (see below) efficient. The lexicon lets users explore these 'second order' indexes. ICECUP 3.1 also makes similar node lists visible in the new grammaticon tool.
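
The difference between the two indexing schemes can be sketched in a few lines of Python. This is only an illustration of the idea, not the ICECUP implementation: the toy tokens, tags, case numbers and function names below are all invented for the example.

```python
from collections import defaultdict

# Hypothetical toy corpus: (word, grammatical tag, case id) triples.
# The tags and ids are invented for illustration, not taken from ICE-GB.
TOKENS = [
    ("work", "N(com,sing)", 1),
    ("work", "V(intr,infin)", 2),
    ("work", "N(com,sing)", 3),
    ("works", "V(intr,pres)", 4),
]


def build_flat_index(tokens):
    """First-order index: each word points directly at a list of cases."""
    index = defaultdict(list)
    for word, _tag, case_id in tokens:
        index[word].append(case_id)
    return dict(index)


def build_second_order_index(tokens):
    """Second-order index: each word points at its distinct grammatical
    instances, and each instance points at a list of cases."""
    index = defaultdict(lambda: defaultdict(list))
    for word, tag, case_id in tokens:
        index[word][tag].append(case_id)
    return {word: dict(tags) for word, tags in index.items()}


flat = build_flat_index(TOKENS)
second = build_second_order_index(TOKENS)

# A lexicon-style view: the distinct grammatical instances of 'work'.
for tag, cases in second["work"].items():
    print("work", tag, len(cases))
```

In this sketch the extra level costs more work at build time, but listing the grammatical instances of a word becomes a simple lookup rather than a scan of every case.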

Queries have been enhanced in a number of ways. For a complete list of extensions see here.

Lexical wild cards are familiar to linguists in many different 'flavours'. The usual '*' and '?' symbols stand for any number of characters or a single character, respectively. In addition we decided that the wild card system should support some plausible pre-defined sets, letting the user substitute a general code for a specific character, e.g. the code '^v' means 'any vowel'. Users can also define their own sets of characters at any point in the string. Finally, lexical wild cards can themselves now be combined using a kind of 'set notation with exceptions'. This means users can list irregular verbs, or define a pattern and exclude exceptions to the rule. Taken together, although this is not a substitute for a morphological query system, it is fair to say that (with some work) it can come close.
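
As a rough illustration of how such wild cards behave, the following Python sketch translates a pattern into a regular expression and applies a list of exceptions. It covers only '*', '?' and '^v' as described above. The vowel set, the function names and the way exceptions are passed in are assumptions made for the example, and do not reproduce ICECUP's own notation or code.

```python
import re

VOWELS = "aeiou"  # assumption: the characters covered by '^v' in this sketch


def wildcard_to_regex(pattern):
    """Translate a simple wild card pattern into a compiled regex.

    '*'  -> any number of characters
    '?'  -> exactly one character
    '^v' -> any vowel
    Everything else is matched literally.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        elif pattern.startswith("^v", i):
            parts.append("[" + VOWELS + "]")
            i += 1  # skip the 'v' of the two-character code
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.IGNORECASE)


def match_words(pattern, words, exceptions=()):
    """Return the words matching the pattern, minus listed exceptions."""
    regex = wildcard_to_regex(pattern)
    excluded = {e.lower() for e in exceptions}
    return [w for w in words
            if regex.fullmatch(w) and w.lower() not in excluded]


# 'Define a pattern and exclude exceptions to the rule':
print(match_words("*ed", ["walked", "red", "bed", "talked"],
                  exceptions=["red", "bed"]))
```

The last call shows the 'pattern with exceptions' idea: a crude past-tense pattern with the obvious non-verbs removed by hand.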

Node queries, i.e., query elements that stand for a node in a corpus tree, have been extended in a similar way to lexical queries. Node queries can be applied individually, or as part of a Fuzzy Tree Fragment or a lexical query.

In ICECUP 3.0 users could include an empty node in an FTF, standing for any node. They could label the function or category with a single code, or leave it blank, or list syntactic features that the node should have. But they could not say e.g. 'this node is either a direct object or an indirect object'. ICECUP 3.1 changes this. The function and category slots can contain a set of alternatives, e.g. '{OD,OI}', or can be negated, e.g. '¬SU' (not a subject) or '¬{OD,OI}' (neither a direct object nor an indirect object). In the ICE grammar, features (e.g. singular, plural) are organised into sets of mutually exclusive elements (number), dependent on the category (noun or pronoun). Feature sets can also be included in a node query, so a node can now specify a clause which is either copular or intransitive ('CL(cop,intr)'). A number of other improvements, such as the ability to specify that a feature set is unspecified, have also been made. Finally, any node in an FTF can now consist of a logical combination of these 'node patterns'. This means that one can say that an FTF or text fragment could contain one of two distinct patterns at a particular location in the tree.
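
The logic of these extended node queries can be modelled in a short Python sketch. It is a simplified illustration under our own assumptions: the class names, the treatment of an empty slot as 'any', and the example nodes are invented here and are not taken from the ICECUP source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A corpus tree node reduced to its labels (illustrative only)."""
    function: str
    category: str
    features: frozenset = frozenset()


@dataclass(frozen=True)
class Slot:
    """A set of alternative labels, optionally negated.

    An empty set stands for 'any label', like a blank slot in an FTF.
    """
    labels: frozenset = frozenset()
    negated: bool = False

    def matches(self, value):
        if not self.labels:
            return True
        return (value in self.labels) != self.negated


@dataclass(frozen=True)
class NodePattern:
    function: Slot = Slot()
    category: Slot = Slot()
    # Each entry is a set of alternative features; the node must carry
    # at least one feature from every entry.
    feature_sets: tuple = ()

    def matches(self, node):
        return (self.function.matches(node.function)
                and self.category.matches(node.category)
                and all(node.features & alts for alts in self.feature_sets))


def any_of(*patterns):
    """Logical OR over node patterns."""
    return lambda node: any(p.matches(node) for p in patterns)


# 'Either a direct object or an indirect object'
obj = NodePattern(function=Slot(frozenset({"OD", "OI"})))
# 'Not a subject'
not_subject = NodePattern(function=Slot(frozenset({"SU"}), negated=True))
# 'A clause which is either copular or intransitive'
cop_or_intr = NodePattern(category=Slot(frozenset({"CL"})),
                          feature_sets=(frozenset({"cop", "intr"}),))

print(obj.matches(Node("OD", "NP")))
print(not_subject.matches(Node("SU", "NP")))
```

Treating each slot as a small set with an optional negation keeps the single-code case of ICECUP 3.0 as a special case: a set with one member and no negation.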

All of this additional power would come to nought if ICECUP's user interface did not help users master it. In Nelson, Wallis and Aarts (2002) Exploring Natural Language, we described much of the aforementioned expressivity. But we were conscious that actually defining and editing what users wanted was difficult. We therefore put significant effort into enhancing the FTF editor. The new FTF editor works similarly to that of ICECUP 3.0, so it is familiar to current users, but allows them to switch on new options. To define a set of categories, rather than select one category at a time, the 'multiple category' switch must be on. In addition to editing with pop-up menus and key strokes, users have a new 'property inspector' window, which replaces the old Edit Node window. This floats over the screen, and allows users to select and modify any property of the FTF node, including editing logical expressions.

A further major extension in ICECUP 3.1 is the provision of statistical tables in the corpus map, grammaticon and lexicon. Tables in the corpus map show frequency distributions across the chosen sociolinguistic variable. Using drag and drop, the distribution of any query over any variable can be easily found, and some simple statistical calculations (ratio, log likelihood) allow users to contrast any pair of distributions. This means they can consider, for example, how the proportion of clauses that are interrogative varies across text categories. Similarly, dropping a sociolinguistic query into the lexicon allows users to contrast the total lexical frequency with the frequency in any subcorpus.
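
To make the kind of calculation involved concrete, here is a minimal Python sketch of a ratio and a log likelihood (G-squared) statistic for a contingency table of counts. The formula is the standard one; whether ICECUP computes its figures in exactly this way is an assumption on our part, and the counts in the example are invented.

```python
import math


def proportion(hits, total):
    """Simple ratio, e.g. interrogative clauses out of all clauses."""
    return hits / total


def log_likelihood(table):
    """G-squared for a contingency table given as rows of counts.

    G2 = 2 * sum(O * ln(O / E)), where E is the expected count under
    independence of rows and columns. Cells with O == 0 contribute 0.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if observed > 0:
                g2 += observed * math.log(observed / expected)
    return 2.0 * g2


# Invented counts: [interrogative clauses, other clauses] in two
# hypothetical text categories.
dialogue = [30, 70]
monologue = [10, 90]

print(proportion(dialogue[0], sum(dialogue)))
print(proportion(monologue[0], sum(monologue)))
print(log_likelihood([dialogue, monologue]))
```

For a two-by-two table, a G-squared value above 3.84 corresponds to significance at the 0.05 level with one degree of freedom.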

Other tools and extensions include a new Wizard tool which allows users to mark parts of a corpus tree and then make an FTF query out of the nodes they highlighted. A 'selection list' can be formed by marking sentences manually, which is useful if what is required is to extract just a few selected sentences from the corpus (or omit them). Sentences can be viewed in 'word wrap' mode; there are several new grammatical concordancing modes, and context sentences before or after the current one may be shown. ICECUP supports the playback of recorded speech (see below).

The software is supplied with a completely rewritten online help manual, and the CD comes with a printed Getting Started manual. Prospective users can try out ICECUP 3.1 by downloading it with the ICE-GB Release 2 sample corpus (20,000 words) from our website.

2.3 ICE-GB R2 and Sound Files

This summer we published Release 2 of the ICE-GB corpus alongside DCPSE. ICE-GB R2 is an extended and improved version of ICE-GB Release 1, indexed to be compatible with ICECUP 3.1 (see above).

We had always intended to publish the audio recordings for the 300 spoken texts alongside ICE-GB, in a format that would allow the simultaneous playback of the spoken word. ICECUP was extended to support the playback of speech in 2000.

This new release has been re-aligned with the split sound recordings. Two conversation texts had to be partially reparsed when we spotted errors in the orthographic transcription. There were other cases of missing text and we have located some recordings we had not digitised previously.

The result is a resource which allows users to browse or search the grammar and then click to hear the recording, which expands the applications of the corpus, especially in the domain of teaching.

More details about the corpus, a sample corpus download and sample audio files are available from our website. The sample download package includes the complete help file for the software and corpus.

Order forms for the full ICE-GB Release 2 plus the ICE-GB R2 Sound Files are also available from our website. Note that users who have a licence for ICE-GB R1 are entitled to a discounted upgrade price.

2.4 Next Generation Tools

Senior Research Fellow Sean Wallis is working full time on this new research project. He is extending the ICECUP software platform to support experimental research in linguistics. This means that the platform will support existing exploration options and tools, like the lexicon. Once a linguist has identified an interesting linguistic phenomenon she can then use this as a basis for a much more careful and detailed investigation.

At the moment the project is at what Sean calls an 'intensive programming stage' with new tools and facilities being grafted onto existing viewers. We will be releasing beta versions of the software compatible with our current corpora (DCPSE, ICE-GB Release 2) and with sample download corpora. Full details are described on our website.


3. Staff

Bas Aarts lectured on gradience in various places, and his Handbook of English linguistics (co-edited with April McMahon in Edinburgh) was published during the summer. His book Syntactic gradience: the nature of grammatical determinacy is now in press (Oxford University Press 2007).

Christine Bowles, the Survey Administrator, had twins, and is on maternity leave until August 2007. Marie Gibney has kindly agreed to cover.

Isaac Hallegua continues as Systems Administrator.

Yordanka Kavalova organised a workshop with Nicole Dehé on parentheticals in English as part of the 28th Annual Conference of the German Linguistics Society (DGfS), 22-24 February 2006, University of Bielefeld, Germany. The proceedings will be published as a book by John Benjamins.

Sean Wallis represented the Survey at ICAME 27 in Helsinki by presenting a paper on Experimental Corpus Linguistics, demonstrating ICECUP 3.1 and the DCPSE corpus and taking part in a plenary debate on the future of corpora. For more information see the ICAME 27 website.


4. Publications, conference presentations, talks, theses and other studies using Survey material

Please let us know if you would like us to include your publications based on SEU material. We will appreciate it if you send us offprints of any such publications.

Aarts, Bas and April McMahon (2006) The handbook of English linguistics. Malden MA: Blackwell Publishers.

Aarts, Bas and April McMahon (2006) Introduction. In: Bas Aarts and April McMahon The handbook of English linguistics. Malden MA: Blackwell Publishers. 1-5.

Aarts, Bas and Liliane Haegeman (2006) English word classes and phrases. In: Bas Aarts and April McMahon The handbook of English linguistics. Malden MA: Blackwell Publishers. 117-145.

Aarts, Bas (2006) Conceptions of categorization in the history of linguistics. Language Sciences 28. 361-385.

Aarts, Bas and Sean Wallis (2006) Recent developments in the syntactic annotation of corpora. In: Eloína Miyares Bermúdez and Leonel Ruiz Miyares (eds.) Linguistics in the twenty-first century. 2006. Cambridge: Cambridge Scholars Press. 197-202.

Aarts, Bas (2006) Boundaries in language: fuzzy or sharp? Plenary lecture at the 15th Postgraduate Conference in Linguistics. University of Manchester.

Aarts, Bas, Mariangela Spinillo and Sean Wallis (2006) Researching recent change in English. Paper presented at the conference Directions in English Language Studies. University of Manchester.

Aarts, Bas and Sean Wallis (2006) The British component of the International Corpus of English (ICE-GB), Release 2, London: Survey of English Usage, UCL.

Aarts, Bas and Sean Wallis (2006) The Diachronic Corpus of Present-Day Spoken English (DCPSE). CD-ROM. London: Survey of English Usage, UCL.

Aarts, Bas, Gerald Nelson and Sean Wallis (2006) Getting started: for use with the British component of the International Corpus of English (ICE-GB) and The Diachronic Corpus of Present-Day Spoken English (DCPSE). London: Survey of English Usage, UCL.

Aijmer, Karin (2006) Modal adverbs in spoken interaction - Some recent developments in adolescent language. Paper presented at the 27th conference of the International Computer Archive of Modern and Medieval English (ICAME), Helsinki, Finland.

Algeo, John (2006) British or American English: a handbook of word and grammar patterns. Cambridge: Cambridge University Press.

Collins, Peter (2006) Modal wars: Some ascendant semi-modals in World Englishes. Paper presented at the 27th conference of the International Computer Archive of Modern and Medieval English (ICAME), Helsinki, Finland.

De Clerck, Bernard (2006) Imperatives as conversational and textual managers: a pragmatic, corpus-based analysis. Paper presented at the 27th conference of the International Computer Archive of Modern and Medieval English (ICAME), Helsinki, Finland.

Dehé, Nicole and Yordanka Kavalova (2006) Introduction to parentheticals. Paper presented at the workshop 'Parenthetical Constructions', Annual Meeting of the Deutsche Gesellschaft für Sprachwissenschaft, Bielefeld, Germany.

Dehé, Nicole (2006) Prosodic aspects of parentheticals in English. Paper presented at the workshop 'Parenthetical Constructions', Annual Meeting of the Deutsche Gesellschaft für Sprachwissenschaft, Bielefeld, Germany.

Dehé, Nicole and Yordanka Kavalova (2006) The syntax, pragmatics and prosody of parenthetical what. English Language and Linguistics 10.2. 289-320.

Denison, David (2006) Playing tag with category boundaries. Paper presented at the 27th conference of the International Computer Archive of Modern and Medieval English (ICAME), Helsinki, Finland.

Depraetere, Ilse and Susan Reed (2006) Mood and modality in English. In: Bas Aarts and April McMahon The handbook of English linguistics. Malden MA: Blackwell Publishers. 269-290.

Gries, Stefan Th. (2006a) Resampling corpora: Investigating the ranges and sources of variation within and between corpora. Paper presented at the 27th conference of the International Computer Archive of Modern and Medieval English (ICAME), Helsinki, Finland.

Gries, S. Th. (2006b) Some proposals towards more rigorous corpus linguistics. Zeitschrift für Anglistik und Amerikanistik 54.2. 191-202.

Kaltenböck, Gunther (2006) Parenthetical clauses in spoken English. Paper presented at the workshop 'Parenthetical Constructions', Annual Meeting of the Deutsche Gesellschaft für Sprachwissenschaft, Bielefeld, Germany.

Kaltenböck, Gunther (2006) '...That is the question': complementizer omission in extraposed that-clauses. English Language and Linguistics 10.2. 371-396.

Kaltenböck, Gunther (2006) Zur Verwendung von that und Asyndeton in extraponierten Subjektsätzen des Englischen, in: Ketteman, B. and Marko, G. 2006. Planing and gluing corpora. Inside the applied corpus linguist's workshop. Frankfurt: Peter Lang, 69-99.

Kavalova, Yordanka (2006) Parenthetical clauses. Paper presented at the workshop 'Parenthetical Constructions', Annual Meeting of the Deutsche Gesellschaft für Sprachwissenschaft, Bielefeld, Germany.

Kirk, John M. and Jeffrey L. Kallen (2006) The pseudo-perfect. Paper presented at the 27th conference of the International Computer Archive of Modern and Medieval English (ICAME), Helsinki, Finland.

Mair, Christian (2006) Twentieth-century English: history, variation and standardization. Cambridge: Cambridge University Press.

Mair, Christian and Geoffrey Leech (2006) Current changes in English syntax. In: Bas Aarts and April McMahon The handbook of English linguistics. Malden MA: Blackwell Publishers. 318-342.

McEnery, Tony and Costas Gabrielatos (2006). English corpus linguistics. In: Bas Aarts and April McMahon The handbook of English linguistics. Malden MA: Blackwell Publishers. 33-71.

McEnery, Tony, Richard Xiao and Yukio Tono (2006) Corpus-based language studies. Routledge Applied Linguistics. London: Routledge.

Meyer, Charles F. and Gerald Nelson (2006) Data collection. In: Bas Aarts and April McMahon The handbook of English linguistics. Malden MA: Blackwell Publishers. 93-113.

Miller, Jim (2006) Spoken and written English. In: Bas Aarts and April McMahon The handbook of English linguistics. Malden MA: Blackwell Publishers. 670-691.

Nelson, Gerald (2006) The core and periphery of world Englishes: a corpus-based exploration. World Englishes 25.1. 115-29.

Nelson, Gerald (2006) Review of Gunnel Melchers and Philip Shaw (2003) World Englishes. The English Language Series. London: Arnold. English Language and Linguistics 10.1. 219-21.

Nelson, Gerald, Bas Aarts and Sean Wallis (2006) Getting started: for use with the British component of the International Corpus of English (ICE-GB) and The Diachronic Corpus of Present-Day Spoken English (DCPSE). London: Survey of English Usage, UCL.

Ozón, Gabriel (2006) Ditransitives, the Given Before New principle, and textual retrievability: a corpus-based study using ICECUP. In: Antoinette Renouf and Andrew Kehoe (Eds.) The Changing Face of Corpus Linguistics. Amsterdam: Rodopi. 243-262.

Quaglio, Paulo and Douglas Biber (2006) The grammar of conversation. In: Bas Aarts and April McMahon The handbook of English linguistics. Malden MA: Blackwell Publishers. 692-723.

Rosenbach, Anette (2006) Animacy versus weight as determinants of grammatical variation in English. Language 81.3. 613-644.

Spinillo, Mariangela, Bas Aarts and Sean Wallis (2006) Researching recent change in English. Paper presented at the conference Directions in English Language Studies. University of Manchester.

Tanskanen, Sanna-Kaisa (2006) Collaborating towards coherence. Amsterdam: John Benjamins.

Tao, Hongyin and Charles F. Meyer (2006) 'Gapped coordinations in English: form, usage and implications for linguistic theory'. Corpus Linguistics and Linguistic Theory 2.2. 129-163.

Wallis, Sean (2006) Experimental corpus linguistics: the next generation? Paper presented at the 27th conference of the International Computer Archive of Modern and Medieval English (ICAME), Helsinki, Finland.

Wallis, Sean (2006) Software demonstrations of ICECUP 3.1 and DCPSE at the 27th conference of the International Computer Archive of Modern and Medieval English (ICAME), Helsinki, Finland.

Wallis, Sean, Bas Aarts and Mariangela Spinillo (2006) Researching recent change in English. Paper presented at the conference Directions in English Language Studies. University of Manchester.

Wallis, Sean (2006) The International Corpus of English Corpus Utility Program (ICECUP), version 3.1. CD-ROM. London: Survey of English Usage, UCL.

Wallis, Sean and Bas Aarts (2006) The British component of the International Corpus of English (ICE-GB), Release 2, London: Survey of English Usage, UCL.

Wallis, Sean and Bas Aarts (2006) The Diachronic Corpus of Present-Day Spoken English (DCPSE). CD-ROM. London: Survey of English Usage, UCL.

Wallis, Sean, Bas Aarts and Gerald Nelson (2006) Getting started: for use with the British component of the International Corpus of English (ICE-GB) and The Diachronic Corpus of Present-Day Spoken English (DCPSE). London: Survey of English Usage, UCL.

Wichmann, Anne, Anne-Marie Simon Vandenbergen and Karin Aijmer (2006) The role of prosody in semantic change: corpus evidence for the multiple meanings of 'of course'. Paper presented at the 27th conference of the International Computer Archive of Modern and Medieval English (ICAME), Helsinki, Finland.


Bas Aarts
Director

January 2006

This page last modified 21 July, 2014 by Survey Web Administrator.