
UCL Policy Lab


Designing tests to out-game cheaters

26 May 2022

Vasiliki Skreta suggests a novel way of beating those who cheat on tests.


In modern economies, decisions are increasingly guided by tests, ratings, and algorithms.

These systems, however, are vulnerable to input manipulation, or falsification. When regulating vehicle emissions, for example, compliance with emission standards must be checked with a test. These tests have been infamously manipulated through so-called “defeat devices”, interventions that artificially reduce vehicles’ emissions under testing conditions.

Accounting for these possible manipulations is an integral part of designing tests that provide valuable and accurate information. 

In a new paper co-authored with Eduardo Perez-Richet at Sciences Po, we propose a theory of test design under input manipulation. Interestingly, optimal tests do not discourage falsification. Instead, they induce “productive falsification” that serves the interests of the test designer. The nature of these tests and how they perform depend on the cost of input falsification. Financial institutions, for instance, may face fines when they are caught hiding assets or misreporting their holdings in stress tests.

Students who increase their standardised test scores through tutoring pay the cost of time and effort. Online shoppers incur a time cost when they adapt their browsing behaviour to get better deals from pricing algorithms.

If these costs are sufficiently high, the designer incurs no loss. As costs fall, however, optimal tests become noisy: first, non-compliers are approved with some probability; as costs fall further, even compliers are rejected with some probability. While higher falsification costs always benefit the designer, they can initially benefit but eventually hurt the agent.

We show how the availability of a falsification-detection technology can be leveraged to improve test performance.

Falsification detection leads to grade devaluation, which amounts to an implicit cost of cheating. This cost increases the finer the set of possible test scores. We demonstrate that even when the final decision is binary (deeming a vehicle compliant versus non-compliant), optimal tests have continuous signals that lead to approvals. This is because tests with rich signals maximize the implicit devaluation costs. Taken together, these results contribute to practical test design by conceptualizing two levers to improve test performance: productive falsification and devaluations.

In the emissions cheating scandal, falsification by car manufacturers was detrimental as it enabled vehicles with noncompliant emission levels to pass the environmental test. Our analysis suggests that tests designed without accounting for falsification are overly informative and, because of that, can provide very unreliable results in the presence of cheating. 


The results point to practical and simple features that can significantly improve the performance of emissions and other tests. When falsification is undetectable, the structure of the optimal test suggests raising the operational standard above the baseline standard.

Doing so both deters detrimental falsification and relies on productive falsification to generate approvals of compliant states. With high falsification costs, simply raising the standard suffices to eliminate approvals of noncompliant states. With lower falsification costs, optimality additionally requires randomly approving a fringe of noncompliant states to deter detrimental falsification. When falsification costs are even lower, randomly rejecting compliant states becomes necessary to prevent extremely low states from falsifying to the standard. 


When a falsification-detection technology is available, the threat of devaluation provides a powerful channel to improve test performance, which is especially appealing when falsification costs are low.

In practice, a testing agency could accompany test outputs with a report on the detected amount of falsification, or even perform the devaluation on the decision maker’s behalf by directly reporting the expectation she should form following each output. While a very rich set of test outputs best harnesses this tool, adding only a few signals may already yield a significant improvement in the reliability of test results in practice.
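As an illustration of how such a devaluation report might work, the sketch below subtracts the detected amount of falsification from the raw score and reports the highest available score not above the devalued value. Both the rule and the score grids are hypothetical, not the paper’s mechanism.

```python
def devalue(score, detected, grid):
    """Hypothetical devaluation rule: subtract the detected amount of
    falsification, then report the highest available grid score that
    does not exceed the devalued value."""
    target = max(0.0, score - detected)
    return max((g for g in grid if g <= target), default=min(grid))

coarse = [0.0, 1.0]                 # binary pass/fail scores
fine = [i / 10 for i in range(11)]  # eleven evenly spaced scores

# A finer score grid lets the reported expectation fall smoothly with
# the detected amount, so every unit of cheating carries a cost; a
# coarse grid can only punish in large jumps.
print(devalue(1.0, 0.25, fine))    # -> 0.7
print(devalue(1.0, 0.25, coarse))  # -> 0.0
```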

Dr Vasiliki Skreta is Professor and Chair of Microeconomics.