Serendipitously, browsing through the departmental library [at Uppsala], my eyes one day fell on a copy of English Studies with an article called "Relative clauses in educated spoken English" by a certain R. Quirk (1957). I was instantly hooked by this approach: studying English that was contemporary - and spoken at that, based on a corpus of audio recordings, both surreptitious and non-surreptitious - it even smelled of cloak and dagger. Armed with a British Council scholarship, largely thanks to the support of my two Uppsala professors, Erik Tengstrand in language and H. W. Donner in literature, I left for north-east England to spend the 1959-1960 academic session at the University of Durham under the guidance of R. Quirk.

It's an amusing coincidence that the location of both my American and British universities carries the same name (but of course with different pronunciations). My private life at Durham, UK, was also different from that of Durham, USA. I was now married, and this created a problem for university administration. The solution was to make me "a member of the Senior Common Room", as the term went. Fortunately, suitable lodgings were found at Lumley Castle in Chester-le-Street, halfway between Durham and Newcastle. My wife Gunilla (aged 21) and I (aged 28) found ourselves the inhabitants of a medieval castle dating back to 1348 ("that's a hundred years before Columbus", American visitors were told), surrounded by a ha-ha and adjoining a golf course, which led to a long but unrequited love affair with the game of golf. Academically, on festive college occasions on Palace Green, I experienced circulating (clockwise of course) a port decanter, and weekly enjoying tutorials with Randolph Quirk, an inspiring mentor with an enviable zest for work, who taught me the virtues of close corpus observation and a broad grounding in theoretical approaches, including British Firthian, American structuralist (Sapir, Bloomfield, Zelig Harris, Gleason, Hockett, Pike, Archibald Hill and James Sledd), Jan Firbas and other members of the Prague School and, of course, the great Dane Jespersen (who lived in Helsingör, a town which, on a clear day, I can now glimpse from the Swedish shoreline). Leaving Durham with the embryo of a thesis which was delivered in Uppsala in the spring of 1961, I was decorated with a licentiate degree. A week after submitting the thesis, I received a letter from Randolph Quirk offering me an assistantship on the Survey of English Usage at University College London where he had moved from Durham - an offer which I naturally couldn't refuse.

A four-year London period gave me the opportunity of working with many other young linguists attached to the English Department: one was Sidney Greenbaum, who later held academic posts in the United States before returning to London as holder of the Quain Chair, the oldest English language chair in Britain; another was Geoffrey Leech, then a postgraduate somewhat unenthusiastically studying the language of television commercials, who later left for Lancaster University where he successfully contributed to making his department one of the world-leading centres of stylistics, pragmatics and corpus linguistics; a third was David Crystal, who later began a career at the Universities of Bangor and Reading and, with his unique penchant for both scholarship and popularization, ended up as an extremely successful international speaker and writer on linguistic topics. This was indeed a period when University College was rich in inspiring linguists, and I was lucky to be in the right place at the right time. Upstairs from the Survey, Michael Halliday had set up a linguistics department with colleagues like Bob Dixon, Rodney Huddleston and Dick Hudson, and next door was the Phonetics Department with A. C. Gimson, J. D. ("Doc") O'Connor and John Wells. Just down the road at the School of Oriental and African Studies, R. H. Robins lectured on general linguistics. We could widen our horizons still further by attending meetings of the London University Linguistics Circle at Bedford College. I particularly remember a talk there by John Lyons who, returning from a period at Indiana University, declared he was "a transformational linguist" - a bold confession to make to a largely Firth-inspired audience.

Our first computer project on the Survey was based on some of the early collected spoken texts and written up in a paper submitted to the Ninth International Congress of Linguists, in Cambridge, Massachusetts (Quirk et al. 1964). It was a study of the correspondence of prosodic to grammatical features, an exciting topic I was later to return to in my work on spoken English, which I found strikingly different from the written variety. This research also gave me the first idea of how a computer's ability to store and manage information could be put to good use for a linguist.

One day, I believe it was in 1963, Nelson Francis from Brown University turned up at UCL, walked into the office and dumped one of those huge computer tapes on Quirk's desk with the accompanying words "Habeas corpus". This was the Brown Corpus which Nelson Francis and Henry Kucera had just completed - the first large computerized text collection of American English for linguistic analysis. Its size of one million words and its principles of text selection set a pattern for the creation of later corpora. Over the years, the Brown Corpus has been used in English departments all over the world as a unique source of data and a means of exploring the ways in which computers can be employed in language research.

One of the milestones in the history of twentieth-century linguistics was of course the publication in 1957 of Syntactic Structures by Noam Chomsky, which imposed a constricting religiosity upon, especially, American linguistics departments and set off, for decades, the worldwide dominance of transformational generative (TG) linguistics. His view of the inadequacy of corpora and the adequacy of intuition became the orthodoxy of succeeding generations of theoretical linguists:

Any natural corpus will be skewed. Some sentences won't occur because they are obvious, others because they are false, still others because they are impolite. The corpus, if natural, will be so wildly skewed that the description would be no more than a mere list. (Chomsky, 1962)

I have sometimes been asked why, in the unsupportive linguistic environment at the time, I chose to become "a corpus linguist". One reason for my choice was no doubt that, in the long Scandinavian philological tradition in English studies, the text was central. Another reason was of course that, to a non-native speaker of the language, the armchair approach of introspection is effectively ruled out. This may help to explain why certain parts outside the Anglo-Saxon world, such as northern Europe, were early strongholds of corpus linguistics. Yet the word "corpus" was not a common term in the early days of the Survey of English Usage. In his plan for the Survey, Randolph Quirk (1968) talks instead about "Descriptive Register", "primary material" and "texts". I recall one discussion over morning coffee in the UCL common room about the correct plural of corpus: should it be corpuses or corpora? The session reached an abrupt impasse when somebody suggested: "I think it's corpi."

While we didn't buy Chomsky's ideas wholesale, they nevertheless inspired us to undertake some related research. One basic concept in his theory was grammaticality: "the fundamental aim" of a grammar, he wrote, is to account for "all and only the grammatical sentences of a language" (Chomsky, 1957). To us on the Survey, surrounded as we were by masses of real language data, both spoken and written, drawing the line between grammatical and ungrammatical sentences seemed a huge problem. You will realize that our detailed analysis of a corpus of real-life language was very much swimming against the tide - there might indeed have been moments when you, being named a corpus linguist, felt like discovering your name on the passenger list for the Titanic.

The goal of the Survey of English Usage was to describe the grammatical repertoire of adult educated native speakers of British English: "their linguistic activity ranges from writing love letters or scientific lectures to speaking upon a public rostrum or in the relaxed atmosphere of a private dinner party. Since native speakers include lawyers, journalists, gynaecologists, school teachers, engineers, and a host of other specialists, it follows that [...] no grammarian can describe adequately the grammatical and stylistic properties of the whole repertoire from his own unsupplemented resources: 'introspection' as the sole guiding star is clearly ruled out" (Svartvik & Quirk, 1980). Like the Brown Corpus for American English, the Survey of English Usage corpus for British English was to total one million words collected from a planned programme of varieties but, unlike Brown, it was to include both spoken and written material. There was of course no question of attempting to match proportions with the statistical distribution of the varieties according to normal use: this would obviously have obliged us to assign over ninety-nine per cent of the corpus to the preponderant variety of conversation between people on an intimate footing. Instead, we saw the overwhelming criterion as being the relative amount of data that would be required to represent the grammatical/stylistic potential of a given variety. The most difficult and time-consuming part of the work was the transcription of the audio recordings, especially those of spontaneous, interactive speech. Yet any transcription, however "delicate", can be no more than a rough representation of its original spoken performance. Since we believed that prosody is part of grammar, the decision was taken to include a transcription which was sensitive to a wide range of prosodic and paralinguistic features in the spoken realization as heard on the recording. This system was documented by David Crystal and Randolph Quirk (1964) and further elaborated by Crystal in his PhD thesis (1969).

It was of course never envisaged that any corpus, necessarily finite (at least in those pre-web days), would itself be adequate for a comprehensive description of English grammar. From the outset, elicitation tests with native subjects were envisaged as an essential tool for enlarging on corpus-derived information and for investigating features perhaps not found in the corpus at all. We undertook informant-based studies trying to devise a technique for establishing degrees and kinds of acceptability of different English sentences (Quirk & Svartvik, 1966; Quirk & Greenbaum, 1970). We had found that direct questioning (such as "Is this a grammatical sentence?") was the least reliable technique. Our way of improving on the direct question technique (called "judgement test") was to present informants with sentences on which they were required to carry out one of several operations (hence called "operation tests") which were easy to understand and to perform. An example would be to turn the verb in the present tense of a sentence into the past tense, and it would be left to informants to make any consequential changes they then deemed necessary. For instance, when asked to turn the verb in "They don't want some cake" into the past tense, 24 of the 76 informants replaced "some" with "any", and several others showed obvious discomfort over "some", with hesitations and deletions. The results indicated that clear-cut categorization is futile and may actually inhibit our understanding of the nature of linguistic acceptability. In fact, the judgements of informant groups occurred anywhere throughout the entire range between unanimous acceptance and unanimous rejection. Testing acceptability in this way is not basic corpus linguistics but rather an extension of it, so as to investigate not only linguistic performance but also linguistic attitudes, and both techniques were part of the original Survey plan.

Most of my work consisted of grammatical analysis of texts by marking paper slips which were placed in various grammatical categories and stored in filing cabinets. In those days computers were rare, expensive, unreliable and not readily accessible to ordinary folk, but located behind glass doors and operated by engineers in white coats. In the company of Henry Carvell, a Cambridge mathematician turned programmer on joining the Survey, I spent many late nights in Gordon Square to get inexpensive off-peak access to the Atlas machine, programmed by punched paper tape. When the tape broke, which happened not infrequently, we had to start punching a new tape all over again! The topic was the pursuit of suitable computational methods of analyzing real-language data, where we used a program primarily intended for the classification of bacteria - looking in the rear-view mirror, a pretty bold undertaking, considering the negative attitude to taxonomy in the dominant linguistic climate at the time. We found the concept of gradience to be true in the data: the results gave us a few fairly distinct classes with some partially overlapping classes (Carvell & Svartvik, 1969).

Also the topic of my (1966) PhD thesis, On Voice in the English Verb, can be said to have been inspired by TG. In his early theory, Chomsky derived passive sentences from kernel active sentences, claiming that every active sentence with a transitive verb can be transformed into a corresponding passive sentence. The idea of representing the active-passive relation in terms of transformations was not new - Jespersen talked about the "turning" and Poutsma about the "conversion" of the verb form from one voice to another - but it was only in TG theory that the use of transformation was extended, formalized and systematically incorporated into a unified grammatical framework. The validity of this huge claim for active-passive transformation seemed worth investigating. After all, a pair of active-passive sentences like "We play football" and "Football is played by us" is enough to make any native speaker dubious. Also considerations other than linguistic may influence informant judgements: in one informant test, English students were asked to mark on an answer sheet the acceptability of, among others, the pair "I have a black Bentley" and "A black Bentley is had by me". One student, in addition to rejecting the passive submitted to him by a threadbare postgraduate student speaking in a sing-song accent, wrote this marginal comment: "And I don't think you can afford a black Bentley in the first place" - which was of course a correct observation. Over three hundred thousand words in some coexisting varieties of present-day English, spoken and written, were subjected to a variety of analyses which indicated that syntactic relationships can, and should, be expected to be multidimensional rather than binary and, in order to find this network of relations, it was best to cast the net wide. The conclusions state that there is in fact "a passive scale" with a number of passive clause classes that have different affinities with each other and with actives, including both transformational and serial relations.

A Grammar of Contemporary English (Quirk et al, 1972) was written by a foursome (by one reviewer referred to as "the Gang of Four"). When work on this grammar began, all four collaborators were on the staff of the English Department, University College London. This association happily survived a dispersal which put considerable distances between us (at the extreme, the 5000 miles between Wisconsin and Lund). In those days with no personal computers, email or faxes (in fact, even electric typewriters were thin on the ground), physical separation made collaboration arduous and time-consuming. Still, the book appeared in 1972. The original plan was to write a rather unpretentious undergraduate handbook but, with ambitions soaring as the job got underway, the printed book came to a total of 1120 pages. No wonder our Longman editor Peggy Drinkwater, equally professional and patient, used to refer to the ever-growing manuscript as "the pregnant brick". So the publishers were keen to have two smaller, more marketable grammars. As a result the foursome split up into twosomes with the idea of writing grammars also for language learners (Greenbaum & Quirk, 1973; Leech & Svartvik, 1975).

Our big grammar was well received but, in the early eighties, we felt it was time to embark on an updated edition: this culmination of our joint work resulted in a grammar that is considerably larger and richer, A Comprehensive Grammar of the English Language (Quirk et al, 1985). Contacts with international scholars have always been important to us, and in the preparation of this book we enjoyed welcome expert advice from, among others, two prominent American linguists: Dwight Bolinger and John Algeo. Apart from good reviews from colleagues, the grammar earned the distinction of being awarded "First Prize in the English-Speaking Union's Duke of Edinburgh English Language Competition", which all four of us received at Buckingham Palace from the hands of HRH Prince Philip. An amusing episode occurred after the photo session at the Palace. A couple of weeks later, the publishers wrote to say that no picture had emerged since the photographer had somehow managed to insert the film incorrectly into the camera. However, Prince Philip, finding a free spot on a December morning in his diary, had kindly agreed to pose with the Gang of Four for a second photo op so, Longman asked, "Could I manage to fly to London on that date?" Yes, I could. What our host said to the photographer is not fit to print in the European English Messenger.

The grammarian is beset with a number of problems. One is the question of descriptive adequacy, as indicated in the opening lines of a review which appeared in The Times:

Writing a grammar of a living language is as muddy an undertaking as mapping a river. By the time you have finished, the rain has fallen, the water has moved on, the banks have crumbled, the silt has risen. With English having become the world language, in silt and spate with hundreds of different grammars, the project of making a comprehensive grammar of it is as Quixotic as trying to chart the Atlantic precisely.

It's typical for some reviewers to focus on the changing language rather than general descriptive problems. It's not the grammar of English that has changed a lot - for instance, pronouns have largely stayed the same for over four hundred years. The grammarian's real problem is rather choosing adequate descriptive categories, finding data and presenting them in an appropriate form of organization. Another problem for the grammarian is of course to find an audience for the book. The Times reviewer, Philip Howard (1985), concludes by saying:

It is a prodigious undertaking. It is just a bit difficult to see who it is for, other than other writers of English grammars. You would be ill-advised to try to learn English from it.

Here Philip Howard was right. A pedagogical grammar has to be different from a descriptive grammar.

Public attitudes to grammar are interesting and unpredictable. One particularly spontaneous and illuminating reaction to grammar from a native speaker occurred when I was drafting my part of the Communicative Grammar. Being a keen sailor and boat lover, I found myself sitting on the deck of the Queen Elizabeth en route from Southampton to New York. Difficult as it was in the rough weather, I was trying to keep my yellow notepad sheets from falling overboard. An American lady sitting in a deck chair next to me, clutching a highball (as those tall drinks used to be called in those days), said:

"May I ask what are you writing, I'm just dying to find out?"

"A grammar" I said.

No other reply could probably have startled my companion as much. After taking a big gulp from her highball she asked:

"A grammar - you mean to say you are a grammarian?"

"Yes, ma'am" I truthfully replied, realizing that, at this stage in our encounter, it was too late to cover up what was after all my academic and economic lifeline. I can still remember the exact words of her succinct comment, drowning both the din of the engine and the roar of the Atlantic:

"Gee, you've made my day, I've met a grammarian!"

Even now, thirty years after the event, I'm still not sure how to interpret this reaction, but I fear it was not meant to be a particularly flattering remark. I suspect that, to my deck chair companion, "grammar" was a manual giving advice on how to avoid "bad language": not mixing up imply and infer, disinterested and uninterested, who and whom, and of course how to avoid, at all costs, the imminent dangers of the passive voice, the dangling participle and the split infinitive.

This little story reflects, I think, the discrepancy between the popular and scholarly notions of what "grammar" is, or should be, and the widespread distrust of professional statements based on documented language use. While there is considerable public interest in questions of usage, it seems hard work for linguists to convince the public of the validity of their advice, even when supported by actual usage, and to bring home the notion that grammar is not synonymous with "linguistic etiquette". Also, many of the practitioners who give advice on usage lack both linguistic competence and real data, as Dwight Bolinger (1980) has pointed out:

In language there are no licensed practitioners, but the woods are full of midwives, herbalists, colonic irrigationists, bonesetters, and general-purpose witch doctors, some abysmally ignorant, others with a rich fund of practical knowledge ... They require our attention not only because they fill a lack but because they are almost the only people who make the news when language begins to cause trouble and someone must answer the cry for help. Sometimes their advice is sound. Sometimes it is worthless, but still it is sought because no one knows where else to turn.

In the mid-seventies, the London-Lund Corpus project was launched, with the aim of making the spoken part of the Survey of English Usage corpus available in electronic form. Thanks to a generous grant from the Bank of Sweden Tercentenary Foundation it was possible to employ a group of postgraduate students and get secretarial help to transfer the spoken material typed on paper slips in London to electronic form in Lund - a vast undertaking, considering the size of the material and the problem of finding ways of representing the detailed prosodic transcription in digital form. In 1980 we published a printed book, A Corpus of English Conversation, including thirty-four spoken texts from the magnetic tape (Svartvik & Quirk, 1980). Later, the corpus became available on CD-ROM - still one of the largest and most widely used corpora of spoken English, not least because it's prosodically annotated. The detailed annotation has facilitated numerous studies of lexis, grammar and, especially, the unique structure of spoken discourse. Under the present director of the Survey of English Usage, Bas Aarts, the corpus has recently been enhanced by the addition of wordclass tags using the ICE-GB scheme. In addition, the Survey has plans to digitize the original sound recordings to be supplied as a new resource.

Backtracking to the mid-seventies when the London-Lund project was launched, I had three main reasons for opting for research in spoken language. First, the Brown Corpus was a resource of machine-readable text exclusively for the medium of written language, and this was also the case with the on-going Lancaster/Oslo-Bergen Corpus (LOB) project, which was the first of the Brown Corpus clones and designed to be a British counterpart of the American corpus. Furthermore, the then available grammatical descriptions of English were almost exclusively based on written language. Yet the vast majority of English language use takes place in the spoken channel. Second, it seemed a pity that the unique prosodic transcriptions of the Survey of English Usage should be restricted to the small number of research scholars who had physical access to the filing cabinets at University College London. Third, in the 1970s, computers were becoming more widespread and efficient, opening up new exciting approaches to corpus-driven research in spoken English. Today anybody anywhere in the world with a laptop, a CD unit and some off-the-shelf software can study selected aspects of spoken English. The final product, the London-Lund Corpus, offered at cost to all interested colleagues in all parts of the world, was the result of research, recording, analysis and compilation extending over many years and involving a great number of colleagues on the Survey of English Usage at University College London and on the Survey of Spoken English at Lund University. But for the dedication and arduous teamwork by our students there would be no London-Lund Corpus. Many Lund colleagues contributed to the corpus and made extensive use of it - including Karin Aijmer (now professor at Göteborg University), Bengt Altenberg (later professor at Lund), Anna Brita Stenström (later professor at Bergen University), and Gunnel Tottie (later professor at the University of Zurich).


©Jan Svartvik. Edited extract from 'A Life in Linguistics' which first appeared in The European English Messenger, volume 14.1, 2005, 34-44. Reprinted by kind permission of the author.

