PLIN0068 Constructed Languages

2023-24

Lecture 7: Syntax 1

Syntax

How to put words together to form complex phrases

Obviously not every sequence of words should be grammatical, e.g.

  • Colourless green ideas sleep furiously
  • *Sleep green furiously ideas colourless

A language can be finite or infinite, but your language should be infinite

An infinite language has a finite set of grammatical rules that decide which strings of words are grammatical and which ones are not

Languages and sets

A language can be understood to be a set of strings of words

  • Let's call the entire set of words
  • We assume to be finite

We denote the set of all possible strings of words in by (Kleene Closure of )

  • is an empty string

If a string of words is a member of () we say is grammatical in ; otherwise () is ungrammatical in

Syntax as a function

The syntax of should tell us for any whether or

The syntax of can be seen as a single function : for any string :

NB: This assumes that grammaticality is binary

  • We stick to this assumption, as per most syntactic theories
  • But one could understand grammaticality as a more gradient notion, in which case should return values in
  • Cf. grammaticality vs. acceptability

Decision problems and computability

The problem of determining whether for an arbitrary is a decision problem

Some decision problems are undecidable = there is no general way of answering them = is not computable (connection to Computability Theory)

The fact that we can (quickly) determine (un)grammataticality suggests that the syntax function of any natural language is computable

Chomsky hierarchy

Some computable functions are 'simpler' than others

Correspondingly there are different levels of syntaxes/languages

The Chomsky hierarchy
By J. Finkelstein - CC BY-SA 3.0, Link

Regular Languages

Example 1: A finite regular language

The syntactic rules of :

  • A grammatical sentence must contain at least one and at most two 's.
  • must be preceded by and followed by .
  • can be preceded by but not by or .

Production rules

The syntax of regular languages is described by a finite set of production rules

  • : set of terminal symbols =
  • : set of non-terminal symbols

A production rule of a regular language has to look like:

  • ; or
  • ; or

where and

One member (usually ) of is a start symbol

Production rules for

Production rules

is the set of all strings that can be generated from the start symbol by applying the production rules

Example 2: An infinite regular language

Production rules

This grammar involves recursion

One non-terminal node (namely ) appears on the left and right of

Recusion yields infinite languages

Remarks

  • Every finite language is a regular language (proof omitted)

  • What we defined above is right regular languages

Right regular

  • ; or
  • ; or
Left regular
  • ; or
  • ; or
  • For any right regular language, there is an equivalent left regular language, and vice versa (proof omitted)

Remarks (cont.)

  • Every regular language is recognised by a finite state automaton, and every language recognised by a finite state automaton is a regular language (proof omitted)

  • A finite state automaton is a kind of machine that does not have long-term memory

  • Regular languages can encode certain types of dependencies

    • In both and , must be immediately followed by
    • Such dependency can be long-distance, e.g., if there is a , there must be a following it at some point

Non-regular languages

  • However, regular languages cannot count (with no limit), so there is no way to formulate dependencies like the following (proof omitted)

    • There can be any number of 's, but there must be the same number of 's following all the 's
    • Similarly for non-terminal levels, e.g., if there are -many 's, there must be -many 's
  • Natural languages show such dependencies, e.g., centre embedding

  • It's been claimed that birdsongs are regular languages

Context Free Languages

Context Free Languages

Context free languages are generated by productions rules of the form:

where is any sequence of terminal and/or non-terminal symbols

  • Every regular language is a context free language
  • There are context free languages that are not regular, e.g.

Remarks

  • Context free languages can have center embedding dependencies, so more powerful than regular languages

  • Every context free language is recognised by a push-down automaton, and every language recognised by a push-down automaton is a context free language (proof omitted)

  • A push-down automaton is a machine that has a stack memory

Non-context free languages

  • There are languages that are not context free, e.g., (proof omitted)

  • Syntacticians argue that some natural languages have unbounded numbers of cross-serial dependencies, so are not context free

  • Chomsky's Transformational Grammar:

    • Prase-Structural Rules are context free and generate D-structures
    • D-structures are mapped to S-structures by Transformations

More powerful languages

  • The next class of languages in the Chomsky hierarchy is context sensitive languages, whose production rules are either of the following:


    • where are any strings of terminal or non-terminal symbols and
  • Natural language syntax does not have the full power of context-free languages

    • E.g., no rules that say "the same number of determiners, nouns, and verbs with in the same sentence but in any order"

More powerful languages

  • Natural language is mostly context free, but it goes a bit outside of it

  • Mildly context free languages

    • Tree Adjoining Grammar
    • Multiple Context Free Grammar
    • Minimalist Grammar

For your language

  • No need to make syntactic rules for the entire language

  • Focus on some constructions, e.g.

    • Simple intransitive and transitive sentences name + predicate (+ name)
    • Structures of DPs
  • Decide whether to have a regular language or a context free language (or something stronger), and write down some rules

  • Make sure to have recursion so that your language will be infinite

Example: Context free rules for DPs in English

DP recursion

  • [DP[DP the scientist]'s equipment]
  • [DP[DP[DPthe scientist]'s mother]'s equipment]
  • [DP[DP[DP[DPthe scientist]'s mother]'s mother]'s equipment]

Note that there will be unacceptable cases, e.g. "a horrible equipment's mother's expensive scientist"

Example: Mirror English

  • Equipment expensive scientist every's
  • Scientist smart every
  • Mother mother scientist the's's

Word order of transitive sentences

Natural languages differ in word order

  • Some languages have flexible word order; other languages are more rigid
  • Many languages, if not all, have a dominant word order (see Dryer 2013 for discussion)
  • See Dryer 2013: Ch. 81 for data on simple transitive sentences