
Principles of Natural Language, Logic and Statistics

We conduct research on mathematical models of natural language that take both logical structure and statistical data into account. The models are applied to textual understanding in a range of domains from human parsing processes to media recommendations.

[Image: two trees resembling faces, with birds flying towards each other; an abstract representation of language]

Research activities

Like nature, natural language abides by a set of logical rules, and its behaviour can be described statistically. We work on unified models that bring the two together. Our aim is to lead the development of next-generation Artificial Intelligence systems that are compositional and transparent.

Historically, logic and statistics have been studied in different areas of mathematics, pure and applied respectively. The logical models we consider lend themselves to statistical representations. Prominent examples are substructural logics and their computational variants, such as the Lambek Calculus, the modal Lambek Calculus and Combinatory Categorial Grammar (CCG). We also consider higher-order sheaf-theoretic semantics of these logics.
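As an illustration of the kind of typed analysis these calculi support (our own example, not drawn from the lab's materials), the Lambek Calculus assigns directional types to words and derives the sentence type s by eliminating slashes; for a transitive sentence such as “Alice likes Bob”:

```latex
% Lambek Calculus typing of "Alice likes Bob":
%   Alice : np,   likes : (np\s)/np,   Bob : np
\[
  np,\; (np\backslash s)/np,\; np \;\vdash\; s
\]
% The verb first consumes its object on the right, then its subject on the left:
\[
  \big((np\backslash s)/np\big)\cdot np \;\to\; np\backslash s,
  \qquad
  np\cdot(np\backslash s) \;\to\; s.
\]
```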

The statistical models we consider use vectors, matrices and tensors as learning vessels. Matrices and tensors are natural inhabitants of quantum mechanics. The parameters of our models can thus also be learnt via quantum simulations and on quantum computing devices.
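To make the tensor picture concrete, here is a minimal sketch (toy dimensions and random values, not the lab's code) of how a transitive sentence is composed in compositional distributional semantics: nouns are vectors, a transitive verb is a third-order tensor, and the sentence meaning is obtained by tensor contraction.

```python
import numpy as np

rng = np.random.default_rng(0)

N, S = 4, 3  # toy dimensions for the noun space and the sentence space

# Toy word representations: nouns are vectors in the noun space, and a
# transitive verb is a third-order tensor in N (x) S (x) N.
alice = rng.random(N)
bob = rng.random(N)
likes = rng.random((N, S, N))

# Compose "Alice likes Bob": contract the verb tensor with the subject on
# its first index and with the object on its last index, leaving a vector
# in the sentence space.
sentence = np.einsum("i,isj,j->s", alice, likes, bob)

print(sentence.shape)  # (3,): the sentence meaning as a vector in S
```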

We model a variety of real-world problems, from lexical semantics, to paraphrasing and inference, to human parsing processes and textual understanding. Some of the challenges we have considered are garden path phenomena in human parsing (e.g. “The horse raced past the barn fell”), the Winograd Schema Challenge as a test of machine intelligence, and media recommendations.

The Principles of Natural Language, Logic and Statistics Lab is supported by a Royal Academy of Engineering Research Chair on “Engineered mathematics for modelling typed structures”, awarded to Professor Mehrnoosh Sadrzadeh in March 2022.

People and Projects

  • Compositional distributional audio representations, PhD student: Saba Nazir
  • Multimodal similarity-based media recommendations, PhD students: Taner Cagali and Saba Nazir
  • Contextuality and sheaf theory in quantum mechanics and in natural language, PhD students: Lo Ian Kin and Daphne Wang
  • Large-scale quantum variational learning for pronoun resolution, PhD student: Hadi Wazni
  • Vector space semantics for the modal Lambek Calculus and applications to discourse, PhD student: Lachlan McPheat

Collaborations

Events

Projects for prospective MRes and PhD students

Logic

  • Complete vector space and probabilistic semantics (see the interpretation sketch after this list) for:
    • Lambek Calculus
    • Modal Lambek Calculus
    • Fuzzy logic
    • Separation logic
  • Weighted relational semantics for modal Lambek Calculus
  • Relational algebras for modal Lambek Calculus
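For orientation on the first item, a minimal sketch of the standard vector space interpretation of Lambek types, as it appears in the compositional distributional semantics literature (our illustration, not a specification of the intended semantics):

```latex
% Atomic types are interpreted as finite-dimensional vector spaces,
% and the two slash types as tensor products of the interpretations.
\[
  \llbracket np \rrbracket = N, \qquad \llbracket s \rrbracket = S,
\]
\[
  \llbracket A \backslash B \rrbracket \;=\; \llbracket A \rrbracket \otimes \llbracket B \rrbracket,
  \qquad
  \llbracket B / A \rrbracket \;=\; \llbracket B \rrbracket \otimes \llbracket A \rrbracket.
\]
% A transitive verb of type (np\s)/np is therefore a tensor in N (x) S (x) N,
% matching the third-order verb tensor in the composition sketch above.
```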

Quantum

  • Circuit-based text learning
  • Grover’s algorithm for recommendation generation
  • Encoding classical word embeddings into quantum circuits (a minimal encoding sketch follows this list)
  • Non-local games and linguistic protocols
  • Quantum contextuality in large language models
  • Classical vs quantum compositional distributional semantics (DisCoCat)
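As a concrete starting point for the embedding-encoding project above, here is a minimal, framework-free sketch of amplitude encoding (our own NumPy illustration; a real project would prepare the state with a quantum SDK and parametrised circuits), one common way to load a classical word embedding into the state of a quantum register:

```python
import numpy as np

def amplitude_encode(embedding: np.ndarray) -> np.ndarray:
    """Encode a classical word embedding as the amplitudes of a quantum state.

    The embedding is padded to the next power of two and L2-normalised, so
    the squared amplitudes form a probability distribution over the
    computational-basis states of a register of log2(dim) qubits.
    """
    dim = 1 << (len(embedding) - 1).bit_length()  # next power of two
    state = np.zeros(dim)
    state[: len(embedding)] = embedding
    norm = np.linalg.norm(state)
    if norm == 0:
        raise ValueError("cannot encode the zero vector")
    return state / norm

# Toy 300-dimensional "embedding"; in practice this would come from a
# pre-trained model such as word2vec or GloVe.
vec = np.random.default_rng(1).standard_normal(300)
state = amplitude_encode(vec)

n_qubits = int(np.log2(len(state)))
print(n_qubits, np.isclose(np.sum(state ** 2), 1.0))  # 9 qubits, normalised
```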

Events and Sheaves

Sheaf-theoretic models of natural language

  • Lexical ambiguities, polysemy/homonymy
  • Discourse ambiguities
  • Grammatical ambiguities
    • Local ambiguities in garden path sentences
    • Global spurious ambiguities
  • Combinations of the above