The Survey of English Usage
This is the second in a series of quarterly newsletters from the Survey of English Usage, intended to keep the academic community and other interested parties informed about research at the Survey. The newsletter will be sent out in March, June, September and December. The March issue is the Survey's Annual Report.
We are very pleased to announce that the beta release of ICECUP IV, the new research platform for experimental research in parsed corpora, is available for download from our website. ICECUP IV is compatible with ICE-GB Release 2 and DCPSE.
- Users possessing a full CD copy of ICE-GB Release 2 or DCPSE can download the software entirely free of charge to carry out research on their corpus.
- Anyone can download the two 20,000-word sample corpora with ICECUP IV.
This is a beta version. The software will continue to be developed to a full release in Summer 2008. ICECUP IV was developed under the ESRC Next Generation Tools research project.
The remainder of this newsletter gives details of a number of grant applications submitted by the SEU that are currently under consideration by the ESRC and the AHRC.
Partially parsing the Hong Kong Component of the International Corpus of English (ICE-HK)
This project will support a team at University College London and a team based at the Chinese University of Hong Kong. It will start the process of parsing the ICE-HK corpus, using the same grammar as ICE-GB and ensuring that the analysis is applied as consistently as possible.
Within this project, parsing will consist of two stages. The first is to ensure that every word in the corpus is correctly classified by its part of speech (verb, noun, etc.), including subtypes (copular verb, common singular noun, etc.). The second is to construct trees that group words under phrases, clauses and other constituents. A third stage, beyond the scope of this project, will ensure that the full structure of every sentence in the corpus is correctly analysed. We estimate that completing the parsing of the Hong Kong corpus using the methods described would take around ten person-years in total.
This one-year project will provide the training, infrastructure, tools and support to create a partially parsed corpus where self-contained non-recursive clauses and phrases are annotated. It will further ensure that the ICE-HK team will be in a position to complete the parsing of their corpus consistently with ICE-GB.
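As a rough illustration of what partial parsing of this kind involves, the sketch below groups already-tagged words into flat, non-recursive noun phrases. It is a minimal sketch under stated assumptions, not the project's actual tooling: the tag names (DET, ADJ, N, V), the `Token` and `Chunk` classes and the `chunk_noun_phrases` function are all hypothetical and far simpler than the ICE tagset and grammar.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical, simplified tags for illustration only (not the ICE tagset).
NP_TAGS = {"DET", "ADJ", "N"}


@dataclass
class Token:
    word: str
    tag: str  # stage one: part-of-speech label for each word


@dataclass
class Chunk:
    label: str           # "NP" for a noun phrase, "O" for an ungrouped word
    tokens: List[Token]


def chunk_noun_phrases(tokens: List[Token]) -> List[Chunk]:
    """Stage two (partial): group tagged words into flat noun phrases."""
    chunks: List[Chunk] = []
    current: List[Token] = []

    def flush() -> None:
        # Close the noun phrase being built, if there is one.
        if current:
            chunks.append(Chunk("NP", list(current)))
            current.clear()

    for tok in tokens:
        if tok.tag in NP_TAGS:
            # A determiner starts a new phrase, so close any open one first.
            if tok.tag == "DET":
                flush()
            current.append(tok)
        else:
            flush()
            chunks.append(Chunk("O", [tok]))
    flush()
    return chunks


if __name__ == "__main__":
    sentence = [
        Token("the", "DET"), Token("students", "N"), Token("read", "V"),
        Token("a", "DET"), Token("long", "ADJ"), Token("play", "N"),
    ]
    for chunk in chunk_noun_phrases(sentence):
        print(chunk.label, [t.word for t in chunk.tokens])
```

The phrases produced here are self-contained and never nested inside one another, which is what distinguishes a partial parse from the full tree for each sentence that the later stage would provide.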
This ambitious project, to be carried out at the Survey of English Usage (SEU) at UCL, lies at the interface between language, literature and computing.
We propose to construct a fully grammatically parsed and searchable corpus of all of William Shakespeare's First Folio plays. This amounts to approximately 700,000 words. These will need to be prepared and checked; some basic standardisation will be necessary. This process involves a scholarly review of the text of each play, including a consideration of variant formes, and will feed into the tagging and parsing procedure. Consideration will be given to all textual and grammatical material available in the standard modern editions in their various series. All texts and their readings will be freshly checked and verified.
The works will be fully segmented into 'text units' ('sentences'), tagged and parsed. The grammatical analysis will be presented to users in the form of a phrase structure tree. Stage directions and portions of plays that are thought not to be by Shakespeare (e.g. in 1 Henry VI, Titus Andronicus and Macbeth) will also be parsed. The linguistic analysis will be based on a detailed grammatical scheme, derived from Quirk et al. (1985), which the SEU has already applied to the British Component of the International Corpus of English (ICE-GB) and the Diachronic Corpus of Present-Day Spoken English (DCPSE).
How English works: Developing dynamic and interactive facilities for the exploration of the English language
In this partnership with the English Project (EP) and a number of other institutions we aim to make the SEU's research on the English language available to two audiences: the general public and secondary schools. We propose to do so by creating a platform of English language learning facilities using data-driven learning techniques.
For the English Project we will create an Exploring English Zone containing a set of 'hands-on' exhibits that will allow visitors to explore the English language dynamically and interactively, especially its usage, lexis and grammar. We will also develop an Exploring English Website which will form part of the EP's larger web presence. There is a real need for these facilities, given that the English language is part of the UK's heritage and generates enormous interest among the general public. This interest is reflected in the letters columns of the press, in radio and television broadcasts on the English language and the reactions to them (Crystal 1984, 2004, 2006), and in the many popular publications that deal with English language issues, e.g. Truss (2003), which sold millions of copies, and books by Radio 4 presenter John Humphrys (2004, 2006). It is also evident from the success of the English Wiktionary, an online dictionary which has almost 800,000 words written by users, and from the popularity of other language museums, such as the Museum of the Portuguese Language and the National Museum of Language in the US, which has just opened.
For secondary schools, including sixth forms, we will develop the deliverables of the project so that they can be used for the teaching of the English language. The impetus for this is that in the past the study of the English language, especially its grammar, has often been perceived as boring and irrelevant. As a result, English language teaching was withdrawn from the curriculum in the 1960s for several decades, with disastrous results for generations of schoolchildren (see Hudson 2003 for discussion). While much has been achieved by bringing the study of language back into schools, there is still room for improvement (see Hudson and Walmsley 2005, Hudson 2008). In recent years the government has recognised the importance of modern languages in general, and of the English language in particular, for UK society and its economy. This is reflected in the National Curriculum at both primary and secondary levels, especially regarding the teaching of literacy skills, communication skills and grammar in schools. Teachers badly need tools for English language teaching, and the project will develop these.
Crystal, David (1984) Who cares about English usage? Penguin.
Crystal, David (2004) The Stories of English. Allen Lane.
Crystal, David (2006) The fight for English. OUP.
Hudson, R. (2003) Linguistics at school. e-Published.
Hudson, R. and J. Walmsley (2005) The English Patient: English grammar and teaching in the twentieth century. Journal of Linguistics 41.3, 593-622.
Hudson, R. (2008) The language crisis. Forthcoming in Languages, Linguistics and Area Studies Magazine. e-Published.
Humphrys, J. (2004) Lost for words. Hodder & Stoughton.
Humphrys, J. (2006) Beyond words. Hodder & Stoughton.
Truss, L. (2003) Eats, shoots and leaves. Profile.
This page last modified 11 February, 2014 by Survey Web Administrator.