UCL Centre for Artificial Intelligence


Best Short Paper Award EACL 2021

11 May 2021


We are proud to congratulate Patrick Lewis, Pontus Stenetorp and Sebastian Riedel on their recent Best Short Paper Award at the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021) for their paper, "Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets".

This paper focusses on the task of open-domain question answering, which involves training NLP models to answer short, factoid questions expressed in natural language.

Models for open-domain question answering have access to a training set of question-answer pairs and are tested on how many questions they can answer correctly from a set of testing questions (supposedly) different from the training questions.

In this paper, we are interested in understanding what kinds of questions our current models can answer correctly, or, put another way, what kinds of "behaviours" or "competencies" the models are capable of.

We identify three kinds of behaviours:

1) Can our models memorize their training set, and answer test questions which are paraphrases of training questions? (this should be easy)

2) Can our models answer novel testing questions whose answers they have seen in the training set? (this is harder)

3) Can our models answer totally novel testing questions with novel answers? (this is the hardest)


Currently, we only know overall test performance, which doesn't tell us which behaviours the models display, or how well the set of testing questions tests for each of these behaviours.


To address this, we perform a detailed study and annotation of the test sets of three popular open-domain QA datasets. We find that a large fraction of test answers (60%) occur in the training sets, and 30% of test questions have paraphrases in the training sets.
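As a rough illustration of what measuring answer overlap involves, the Python sketch below computes the fraction of test answers that also appear as training answers. The function names, the normalisation rules, and the toy data are assumptions made for this example; they are not taken from the paper, and paraphrase overlap between questions required manual annotation that a string match like this cannot reproduce.

```python
import re
import string


def normalize(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace.

    This normalisation is an assumption for the sketch, not the paper's exact procedure.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def answer_overlap_fraction(train_pairs, test_pairs):
    """Return the fraction of test answers that also occur as a training answer."""
    train_answers = {normalize(answer) for _, answer in train_pairs}
    hits = sum(normalize(answer) in train_answers for _, answer in test_pairs)
    return hits / len(test_pairs)


# Toy (question, answer) pairs, invented for illustration only.
train = [
    ("who wrote hamlet", "William Shakespeare"),
    ("capital of france", "Paris"),
]
test = [
    ("who is the author of hamlet", "william shakespeare"),
    ("capital of peru", "Lima"),
]

print(answer_overlap_fraction(train, test))  # 0.5
```

In this toy example one of the two test answers has been seen in training, so a model could in principle answer that question without producing any answer it has not already memorised.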

This means that relatively strong overall performance is possible for models that only exhibit the simple, easy behaviours 1 and 2. Using these insights, we were able to demonstrate that some recent models behave very similarly to simple nearest neighbour baselines.
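To give a feel for what a simple nearest neighbour baseline looks like, the Python sketch below answers a test question by finding the most similar training question and returning its answer. The Jaccard token-overlap similarity and all names here are assumptions chosen for brevity; they are not necessarily the similarity measure or setup used in the paper.

```python
def tokens(question):
    """Split a question into a set of lowercase whitespace-separated tokens."""
    return set(question.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_neighbour_answer(question, train_pairs):
    """Return the answer of the training question most similar to `question`."""
    query = tokens(question)
    _, best_answer = max(
        train_pairs, key=lambda pair: jaccard(query, tokens(pair[0]))
    )
    return best_answer


# Toy (question, answer) pairs, invented for illustration only.
train = [
    ("who wrote hamlet", "William Shakespeare"),
    ("what is the capital of france", "Paris"),
]

print(nearest_neighbour_answer("who was hamlet written by", train))
# William Shakespeare
```

A baseline like this can only ever return answers seen in training, so it can succeed on behaviours 1 and 2 but never on behaviour 3.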
