Exploring Natural Language:
Working with the British Component of the International Corpus of English


“This book is a must for anyone who wants to explore the immense possibilities of the ICE-GB.” - The Year’s Work in English Studies, 2004, 83.1

Gerald Nelson, Sean Wallis and Bas Aarts, 2002, Amsterdam: John Benjamins. 355 pages hbk/pbk.

ISBN 90 272 4889 3 (Europe) / 1 58811 271 3 (US)

Number G29 in the series Varieties of English Around the World (series editor: Edgar Schneider).

The book can be purchased from the John Benjamins website. The table of contents is listed below.

ICE-GB is a one-million-word corpus of contemporary British English. It is fully parsed and contains over 83,000 syntactic trees. Together with the dedicated retrieval software, ICECUP, ICE-GB is an unprecedented resource for the study of English syntax.

Exploring Natural Language is a comprehensive guide to both the corpus and the software. It contains a full reference guide for ICE-GB. The chapters on ICECUP give complete instructions for using the software's many features, including concordancing, lexical and grammatical searches, sociolinguistic queries, random sampling, and searching for syntactic structures using ICECUP's Fuzzy Tree Fragment models. Special attention is given to the principles of experimental design in a parsed corpus.

Six case studies provide step-by-step illustrations of how the corpus and software can be used to explore real linguistic issues, from simple lexical studies to more complex syntactic topics, such as noun phrase structure, verb transitivity, and voice.

Keywords: Corpus Linguistics; International Corpus of English (ICE); ICE-GB; ICECUP; Grammar; Parsing; Fuzzy Tree Fragments (FTFs); Research Methods; Corpus Exploration; Experimental Design

See also: Review by Lea Cyrus on the Linguist List

CONTENTS

SERIES EDITOR’S INTRODUCTION

FOREWORD

PREFACE

PART 1: Introducing the corpus

1. INTRODUCING ICE-GB

1.1 AIMS AND BACKGROUND

1.2 CORPUS DESIGN

1.3 EXTRA-CORPUS MATERIAL

1.4 COPYRIGHT

1.5 TRANSCRIPTION AND MARKUP

1.6 PART-OF-SPEECH TAGGING

1.7 SYNTACTIC PARSING

1.8 CROSS-SECTIONAL CHECKING

1.9 DIGITIZATION

1.10 EXAMINING ICE-GB TEXTS

2. THE ICE-GB GRAMMAR

2.1 INTRODUCTION

2.2 ICE WORD CLASSES

    1. Adjective (ADJ)
    2. Adverb (ADV)
    3. Article (ART)
    4. Auxiliary verb (AUX)
    5. Cleft it (CLEFTIT)
    6. Conjunction (CONJUNC)
    7. Connective (CONNEC)
    8. Existential there (EXTHERE)
    9. Formulaic expression (FRM)
    10. Genitive marker (GENM)
    11. Interjection (INTERJEC)
    12. Noun (N)
    13. Nominal Adjective (NADJ)
    14. Numeral (NUM)
    15. Preposition (PREP)
    16. Proform (PROFM)
    17. Pronoun (PRON)
    18. Particle (PRTCL)
    19. Reaction signal (REACT)
    20. Verb (V)
    21. Miscellaneous tags

2.3 FUNCTIONS AND CATEGORIES

    1. Adverbial (A) [Function]
    2. Adjective Phrase (AJP) [Category]
    3. Adjective Phrase Head (AJHD) [Function]
    4. Adjective Phrase Postmodifier (AJPO) [Function]
    5. Adjective Phrase Premodifier (AJPR) [Function]
    6. Adverb Phrase Head (AVHD) [Function]
    7. Adverb Phrase (AVP) [Category]
    8. Adverb Phrase Postmodifier (AVPO) [Function]
    9. Adverb Phrase Premodifier (AVPR) [Function]
    10. Auxiliary Verb (AVB) [Function]
    11. Central Determiner (DTCE) [Function]
    12. Clause (CL) [Category]
    13. Cleft Operator (CLOP) [Function]
    14. Conjoin (CJ) [Function]
    15. Coordinator (COOR) [Function]
    16. Detached Function (DEFUNC) [Function]
    17. Determiner (DT) [Function]
    18. Determiner Phrase (DTP) [Category]
    19. Determiner Postmodifier (DTPO) [Function]
    20. Determiner Premodifier (DTPR) [Function]
    21. Direct Object (OD) [Function]
    22. Discourse Marker (DISMK) [Function]
    23. Disparate (DISP) [Category]
    24. Element (ELE) [Function]
    25. Empty (EMPTY) [Category]
    26. Existential Operator (EXOP) [Function]
    27. Floating Noun Phrase Postmodifier (FNPPO) [Function]
    28. Focus (FOC) [Function]
    29. Focus Complement (CF) [Function]
    30. Genitive function (GENF) [Function]
    31. Imperative Operator (IMPOP) [Function]
    32. Indeterminate (INDET) [Function]
    33. Indirect Object (OI) [Function]
    34. Interrogative Operator (INTOP) [Function]
    35. Inverted Operator (INVOP) [Function]
    36. Main Verb (MVB) [Function]
    37. Nonclause (NONCL) [Category]
    38. Notional Direct Object (NOOD) [Function]
    39. Notional Subject (NOSU) [Function]
    40. Noun Phrase (NP) [Category]
    41. Noun Phrase Head (NPHD) [Function]
    42. Noun Phrase Postmodifier (NPPO) [Function]
    43. Noun Phrase Premodifier (NPPR) [Function]
    44. Object Complement (CO) [Function]
    45. Operator (OP) [Function]
    46. Parataxis (PARA) [Function]
    47. Parsing Unit (PU) [Function]
    48. Postdeterminer (DTPS) [Function]
    49. Predeterminer (DTPE) [Function]
    50. Predicate Element (PREDEL) [Category]
    51. Predicate Group (PREDGP) [Function]
    52. Prepositional (P) [Function]
    53. Prepositional Complement (PC) [Function]
    54. Prepositional Modifier (PMOD) [Function]
    55. Prepositional Phrase (PP) [Category]
    56. Provisional Direct Object (PROD) [Function]
    57. Provisional Subject (PRSU) [Function]
    58. Stranded Preposition (PS) [Function]
    59. Subject (SU) [Function]
    60. Subject Complement (CS) [Function]
    61. Subordinator Phrase Head (SBHD) [Function]
    62. Subordinator Phrase Modifier (SBMO) [Function]
    63. Subordinator (SUB) [Function]
    64. Subordinator Phrase (SUBP) [Category]
    65. Tag Question (TAGQ) [Function]
    66. Particle To (TO) [Function]
    67. Transitive Complement (CT) [Function]
    68. Verbal (VB) [Function]
    69. Verb Phrase (VP) [Category]

2.4 FEATURE LABELS

2.5 SPECIAL TOPICS IN THE ICE-GB GRAMMAR

    1. Inversion
    2. Interrogative
    3. Imperative
    4. Coordination
    5. Direct Speech

PART 2: Exploring the corpus

3. INTRODUCING THE ICE CORPUS UTILITY PROGRAM (ICECUP)

3.1 FIRST IMPRESSIONS

3.2 THE CORPUS MAP

3.3 BROWSING THE RESULTS OF QUERIES

3.4 VIEWING TREES IN THE CORPUS

3.5 VARIABLE QUERIES

3.6 ‘SINGLE GRAMMATICAL NODE’ QUERIES

3.7 MARKUP QUERIES

3.8 RANDOM SAMPLING

3.9 TEXT FRAGMENT QUERIES

3.10 FUZZY TREE FRAGMENT SEARCHES

3.11 OPEN FILE

3.12 SAVE TO DISK

3.13 SEARCH OPTIONS

4. BROWSING THE CORPUS

4.1 THE IDEA OF CORPUS EXPLORATION

4.2 NAVIGATING THE CORPUS MAP

4.3 BROWSING SINGLE TEXTS

4.4 THE TEXT BROWSER WINDOW

4.5 VIEWING WORD CLASS TAGS

4.6 CONCORDANCING A QUERY

4.7 DISPLAYING TREES IN THE TEXT

4.8 GRAMMATICAL CONCORDANCING IN ICECUP 3.1

4.9 DISPLAYING TREES IN A SEPARATE WINDOW

4.10 CONCORDANCING, MATCHING AND VIEWING TREES

4.11 LISTENING TO SPEAKERS IN THE CORPUS

4.12 SELECTING TEXT UNITS IN ICECUP 3.1

5. FUZZY TREE FRAGMENTS AND TEXT QUERIES

5.1 THE TEXT FRAGMENT QUERY WINDOW

5.2 SEARCHING FOR WORDS, TAGS AND TREE NODES

5.3 MISSING WORDS AND SPECIAL CHARACTERS

5.4 EXTENDING THE QUERY INTO THE TREE

5.5 INTRODUCING FUZZY TREE FRAGMENTS

5.6 AN OVERVIEW OF COMMANDS TO CONSTRUCT FTFS

5.7 CREATING A SIMPLE FTF

5.8 ADDING A FEATURE AND RELATING A WORD TO THE TREE

5.9 MOVING NODES AND BRANCHES

5.10 APPLYING A MULTIPLE SELECTION AND SETTING THE FOCUS OF AN FTF

5.11 TEXT-ORIENTED FTFS REVISITED

5.12 THE GEOMETRY OF FTFS

5.13 HOW FTFS MATCH AGAINST THE CORPUS

5.14 THE FTF CREATION WIZARD: A TOOL FOR MAKING FTFS FROM TREES

6. COMBINING QUERIES

6.1 A SIMPLE EXAMPLE

6.2 VIEWING THE QUERY EXPRESSION

6.3 MODIFYING THE LOGIC OF QUERY COMBINATIONS

6.4 USING DRAG AND DROP TO MANIPULATE QUERY EXPRESSIONS

6.5 REMOVING PARTS OF THE QUERY

6.6 LOGIC AND FUZZY TREE FRAGMENTS

6.7 EDITING QUERY ELEMENTS

6.8 MODIFYING THE FOCUS OF AN FTF DURING BROWSING

6.9 BACKGROUND FTF SEARCHES AND THE QUERY EDITOR

6.10 SIMPLIFYING THE QUERY

7. ADVANCED FACILITIES IN ICECUP 3.1

7.1 INTRODUCING ICECUP 3.1

7.2 THE LEXICON

7.3 THE GRAMMATICON

7.4 STATISTICAL TABLES

7.5 LEXICAL WILD CARDS

7.6 EXTENSIONS TO FUZZY TREE FRAGMENT NODES

    1. Performing exact matching in FTFs
    2. Specifying missing features and pseudo-features
    3. Specifying sets of functions, categories and features
    4. Specifying a logical formula

PART 3: Performing research with the corpus

8. CASE STUDIES USING ICE-GB

8.1 CASE STUDY 1: PRETTY MUCH AN ADVERB

8.2 CASE STUDY 2: EXPLORING THE LEXEME BOOK WITH THE LEXICON

8.3 CASE STUDY 3: TRANSITIVITY AND CLAUSE TYPE

8.4 CASE STUDY 4: WHAT SIZE FEET HAVE YOU GOT? WH-DETERMINERS IN NOUN PHRASES

8.5 CASE STUDY 5: ACTIVE AND PASSIVE CLAUSES

8.6 CASE STUDY 6: THE POSITIONS OF IF-CLAUSES

9. PRINCIPLES OF EXPERIMENTAL DESIGN WITH A PARSED CORPUS

9.1 WHAT IS A SCIENTIFIC EXPERIMENT?

9.2 WHAT IS AN EXPERIMENTAL HYPOTHESIS?

9.3 THE BASIC APPROACH: CONSTRUCTING A CONTINGENCY TABLE

9.4 WHAT MIGHT SIGNIFICANT RESULTS MEAN?

9.5 HOW CAN WE MEASURE THE ‘SIZE’ OF A RESULT?

    1. Relative size
    2. Relative swing
    3. Chi-square contribution
    4. Cramer’s phi

9.6 COMMON ISSUES IN EXPERIMENTAL DESIGN

    1. Have we specified the null hypothesis incorrectly?
    2. Are all the relevant values listed together?
    3. Are we really dealing with the same linguistic choice?
    4. Have we counted the same thing twice?

9.7 INVESTIGATING GRAMMATICAL INTERACTIONS

9.8 THREE STUDIES OF INTERACTION IN THE GRAMMAR

    1. Two features within a single constituent
    2. Two features in a structure
    3. A feature and an optional constituent
    4. Footnote: dealing with overlapping cases

PART 4: The future of the corpus

10. FUTURE PROSPECTS

10.1 EXTENDING THE ANNOTATION IN THE CORPUS

10.2 EXTENDING THE EXPRESSIVITY OF FUZZY TREE FRAGMENTS

10.3 INCORPORATING EXPERIMENTS IN SOFTWARE

10.4 KNOWLEDGE DISCOVERY IN CORPORA

10.5 AIDING THE ANNOTATION OF CORPORA

10.6 TEACHING GRAMMAR WITH CORPORA

REFERENCES

APPENDIX 1. ICE TEXT CATEGORIES AND CODES

A1.1 SPOKEN CATEGORIES

A1.2 WRITTEN CATEGORIES

APPENDIX 2. SOURCES OF ICE-GB TEXTS

A2.1 S1A-001 TO S1A-090: DIRECT CONVERSATIONS

A2.2 S1A-091 TO S1A-100: TELEPHONE CALLS

A2.3 S1B-001 TO S1B-020: CLASSROOM LESSONS

A2.4 S1B-021 TO S1B-040: BROADCAST DISCUSSIONS

A2.5 S1B-041 TO S1B-050: BROADCAST INTERVIEWS

A2.6 S1B-051 TO S1B-060: PARLIAMENTARY DEBATES

A2.7 S1B-061 TO S1B-070: LEGAL CROSS-EXAMINATIONS

A2.8 S1B-071 TO S1B-080: BUSINESS TRANSACTIONS

A2.9 S2A-001 TO S2A-020: SPONTANEOUS COMMENTARIES

A2.10 S2A-021 TO S2A-050: UNSCRIPTED SPEECHES

A2.11 S2A-051 TO S2A-060: DEMONSTRATIONS

A2.12 S2A-061 TO S2A-070: LEGAL PRESENTATIONS

A2.13 S2B-001 TO S2B-020: NEWS BROADCASTS

A2.14 S2B-021 TO S2B-040: BROADCAST TALKS (SCRIPTED)

A2.15 S2B-041 TO S2B-050: NON-BROADCAST SPEECHES (SCRIPTED)

A2.16 W1A-001 TO W1A-010: UNTIMED STUDENT ESSAYS

A2.17 W1A-011 TO W1A-020: STUDENT EXAMINATION SCRIPTS

A2.18 W1B-001 TO W1B-015: SOCIAL LETTERS

A2.19 W1B-016 TO W1B-030: BUSINESS LETTERS

A2.20 W2A-001 TO W2A-040: ACADEMIC WRITING

A2.21 W2B-001 TO W2B-040: POPULAR WRITING

A2.22 W2C-001 TO W2C-020: NEWSPAPER REPORTS

A2.23 W2D-001 TO W2D-010: ADMINISTRATIVE/REGULATORY WRITING

A2.24 W2D-011 TO W2D-020: SKILLS AND HOBBIES

A2.25 W2E-001 TO W2E-010: PRESS EDITORIALS

A2.26 W2F-001 TO W2F-020: FICTION

APPENDIX 3. BIBLIOGRAPHICAL AND BIOGRAPHICAL VARIABLES

APPENDIX 4. STRUCTURAL MARKUP SYMBOLS

APPENDIX 5. A QUICK REFERENCE GUIDE TO THE ICE GRAMMAR

APPENDIX 6. SPECIAL CHARACTERS USED IN ICE-GB

INDEX

This page last modified 12 June, 2013 by Survey Web Administrator.