UCLIC - UCL Interaction Centre


Where are movement-sensing, emotion inferring technologies in the real world?

Human life revolves around movement. Cells reproduce, hairs make their way through follicles, food and drink go down tubes, humans wiggle, reach, crawl, limp, jump...

It is unsurprising, then, that human movement is a fascinating subject of study, all the more because the study of movement, how it works, when it fails, and what it can intimate, can be valuable in addressing needs in society.

For instance, knowledge about movement is fundamental to physiotherapy, which helps people regain movement and function when illness or injury affects their normal functioning.

In social interactions, at work or home, in public spaces or with a clinician, movement plays a role in our perception of what another human is doing, experiencing, or intending. As digital technology becomes more available, accessible, and ubiquitous, there is increasing interest in equipping technology with the ability to interpret movement, or cues encoded in movement (intentionally or not), for the purpose of supporting, augmenting, guiding, or mediating, to name a few uses, human activity. A special class of such technologies are affect recognition technologies (ARTs), used here as an umbrella term for systems that can infer types or levels of emotions, mood, other subjective experiences, traits, or cognitive states from behavioural (e.g. facial expression, body movement, speech) or physiological (e.g. sweat response, cardiac activity, brain activity) expressions. While ARTs can use one or more of this wider range of expression modalities, there are certain use cases where body movement expressions have complementary value or are even more critical than the others. However, of the ART-based solutions currently on the market, less than 0.1% use body movement, compared with the more than 40% that are based on facial expressions or the 22% that use text data, and those that do seem to capture only upper-body movement behaviour (this analysis is based on the AAAC database). What are the barriers keeping movement-based ARTs in research labs and out of the real world?

This was the topic of discussion at the recent AffectMove workshop with a panel of experts with experience covering human motor action/perception, machine learning/AI, affective computing, and human-computer interaction research as well as industry experience in delivering ART, other AI, and body movement sensor solutions. The barriers and needs highlighted in that discussion are summarised here under two main themes:

Limited by Data

As with other technologies largely based on machine learning, ARTs require data at three different levels: to build the affect recognition system, to evaluate its performance, and to make real-time inferences about a person's (or group's) affective experience.

Data used to build and evaluate the system needs to be representative of both the experience and the context of interest. Yet capturing such representative data is far from trivial. In addition, the closer to real-world settings one goes in acquiring data, the noisier the data (the sensor data, the subjective-experience labels, or both) will be, due to confounding human factors, environmental conditions, and the various other elements of the real world that one may not be able to control.

The lines are rotated plots of lateral trunk movement profiles of people with chronic low back pain during sit-to-stand movements where they felt confident of their ability to complete it despite pain. (Image credit: Temitayo Olugbade)

Further, human movement in particular has a very high number of degrees of freedom, each with a large range of possible values/states, and it is extremely difficult to obtain a single dataset that covers the wide range of human movement; yet a better understanding of the human movement space is critical to technology that aims to infer affect from body movement in the real world. While the use of additional modalities could help disentangle confounding factors as well as add complementary information about the affective experience, fusion of multiple modalities can be challenging because each modality has its own time resolution, degrees of freedom, and expression time scale, and such fusion is itself an ongoing area of research.
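To make the time-resolution problem concrete, one common first step before fusing modalities is to resample each stream onto a shared time base. The sketch below is illustrative only; the sampling rates (a 120 Hz movement signal versus a 4 Hz physiological signal) and the signals themselves are made-up placeholders, not data from any real system:

```python
import numpy as np

def align_modalities(t_a, x_a, t_b, x_b, fs=30.0):
    """Resample two modality streams onto a shared time base.

    t_a, x_a: timestamps (s) and 1-D signal for modality A
    t_b, x_b: timestamps and signal for modality B
    fs: target sampling rate of the common time base (Hz)
    Returns the shared timestamps and an (N, 2) feature matrix.
    """
    t0 = max(t_a[0], t_b[0])           # use only the overlapping window
    t1 = min(t_a[-1], t_b[-1])
    t = np.arange(t0, t1, 1.0 / fs)    # common time base
    a = np.interp(t, t_a, x_a)         # linear interpolation per modality
    b = np.interp(t, t_b, x_b)
    return t, np.column_stack([a, b])

# toy example: 120 Hz "movement" signal vs 4 Hz "physiological" signal
t_move = np.arange(0, 2, 1 / 120)
t_phys = np.arange(0, 2, 1 / 4)
t, feats = align_modalities(t_move, np.sin(t_move), t_phys, np.cos(t_phys))
```

Linear interpolation is only the simplest option; in practice, the choice of target rate and interpolation scheme itself affects what affective cues survive the resampling.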

A more primary impediment to the use of movement-based ARTs in real world settings is the constraints of sensor technology currently available. Unlike facial expressions, which can be captured in reasonable spatial resolution using a camera on a phone, or certain physiological signals, which can be recorded with sensors embedded in smart watches or bracelets, continuous recording of body movement data in ubiquitous settings is still in its infancy. Many of the body movement sensor technologies that currently exist are expensive and/or only practical for lab settings where the activity of the human, the environment, and other factors that make the real world challenging can be well controlled. They often fail in some form, or cannot function at all, when they are put in minimally constrained or everyday settings.

Beyond the sensor data itself, there is also the necessity, general to all ARTs, to obtain annotations for building the recognition system. Perhaps unique to body movement is its encapsulation of additional information about context, e.g. action tendency (i.e. coping strategy or adaptive action) and interaction settings (is the person with colleagues or with friends? are they at a meeting or a party?), that is important for helpfully interpreting the experience of the human and delivering appropriate personalisation or intervention. The difficulty is in collecting good quality annotations that capture the affective experience as well as contextual information. While self-report may usually be the fairest means of capturing subjective experience, it can be burdensome (and sometimes impossible) for a person to provide annotations in real time at a valuable time resolution, across representative scenarios, and in real world settings. Observer annotation at a later time is an alternative that is more widely used; however, it is expensive and there are no established frameworks to standardise body movement annotation for affect recognition across annotators.
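On the cross-annotator point, a standard way to quantify how far two observers' labels agree beyond chance is Cohen's kappa. A minimal sketch follows; the per-movement labels ("protective"/"neutral") are hypothetical examples, not from any published annotation scheme:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two annotators over the same items."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # observed agreement: fraction of items both raters labelled identically
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # expected agreement if raters labelled independently at their base rates
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# hypothetical labels from two observers watching the same four movements
obs_1 = ["protective", "protective", "neutral", "neutral"]
obs_2 = ["protective", "neutral", "neutral", "neutral"]
kappa = cohens_kappa(obs_1, obs_2)  # 0.5: moderate agreement beyond chance
```

Agreement statistics like this can flag where an annotation protocol is ambiguous, but they presuppose a shared label vocabulary, which is exactly what standardised frameworks for movement annotation would need to supply.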

Design, A Need

To bring any ART to market, return on investment will typically be a critical consideration for a company or other organisation considering that space, and as such, beyond fantastic technological possibilities, there is the critical need to create marketable designs that people will be willing to accept, as both valuable and ethical, and pay for. User studies where technology needs and specific application contexts are properly investigated are necessary to understand pursuits of benefit to the user and/or society as a whole, and much academic ART research rightly follows this approach. Generally, people may be more open to exploring ARTs that give the user both agency and useful, appropriate feedback. There may of course be conflict between what is profitable for a company and what is beneficial to relevant user groups. Still, ethical considerations will need to be well addressed. While movement-based ARTs may not raise the same concerns as technologies that record and assess the face, ARTs in general are sensitive due to their analysis of personal and intimate experiences, and body movement further encodes information that could be used to deduce additional private characteristics about a person or their activities. There are currently a number of endeavours towards creating guidelines, standards, and/or regulations that support and foster ethical ART designs.

The difficulties in obtaining data in the wild to train ARTs highlight the additional need for hybrid ART designs that initially have limited capabilities but provide some functionality to the user while capturing personalised data (in real scenarios) to improve affect recognition performance for that user. In such a somewhat symbiotic relationship, the ART will need to be transparent with the user about its capabilities, especially while they are limited, and be able to create a (metaphorical or literal) dialogue with the user with the aim of obtaining the input (i.e. annotations) it needs from them while at the same time providing some level of utility to them. Recent findings suggest that for certain use cases, literal dialogue with, and the process of providing annotations to, a technological system could in itself be valuable to a user.
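One way such a hybrid design could be structured is an active-learning-style loop: the system defers to the user whenever its confidence is low and banks the answer for later personalised retraining. The sketch below is purely illustrative; the class, the confidence threshold, and the toy stand-in model are all assumptions, not a description of any deployed ART:

```python
class HybridART:
    """Illustrative hybrid affect-recognition loop (not a real system).

    Starts with a limited model; when prediction confidence falls below
    `threshold`, it asks the user for a label and stores the example for
    later personalised retraining.
    """

    def __init__(self, model, threshold=0.7):
        self.model = model          # callable: features -> (label, confidence)
        self.threshold = threshold
        self.labelled = []          # (features, user_label) pairs gathered in use

    def step(self, features, ask_user):
        label, confidence = self.model(features)
        if confidence < self.threshold:
            # be transparent with the user about why input is needed
            answer = ask_user(
                f"I'm only {confidence:.0%} sure - how did that movement feel?")
            self.labelled.append((features, answer))
            return answer, "user"
        return label, "model"

# toy stand-in model: confident only for high-energy movements (hypothetical rule)
def toy_model(features):
    energy = sum(abs(f) for f in features)
    return ("guarded", 0.9) if energy > 1.0 else ("unclear", 0.4)

art = HybridART(toy_model)
res_low = art.step([0.1, 0.2], ask_user=lambda prompt: "anxious")  # model unsure
res_high = art.step([0.9, 0.8], ask_user=lambda prompt: "unused")  # model confident
```

The design choice to surface the confidence value in the prompt reflects the transparency point above: the user sees why they are being asked, which may make the annotation burden feel like a fair exchange for utility.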

Finally, there is clearly a need for cheaper and more ubiquitous body movement sensors. The clothing and fashion industry has, as an example, been exploring washable sensors embedded in everyday clothing. There are also designs based on multiple wearable sensing units, each worn on a separate anatomical segment, although these still mainly target research, elite sport, and clinical settings.

Far from being the litany of bad news that it may seem at first glance, our aim with the discussion summarised above is to spur on interest in closing the gap between research and technology deployment in the real world, as well as to direct research questions toward more critical problems, which may perhaps be best investigated as academia-industry collaborations. We conclude with the note that, if movement is so central to human existence, technology designed for humans will need to make sense of and adapt to human movement in more depth than it already does. This indeed is the wheel that drives our own interest in body movement.