Title: Interpretable features to compare distributions in linear time
Abstract: The goal of this talk is to describe an efficient method for testing whether two sets of samples come from the same probability distribution (a two-sample test). I will present an adaptive two-sample test that learns interpretable features to optimize testing power, i.e., to increase the number of true positives. These features are used to construct the mean embedding (ME) distance, which can be shown to be a metric on distributions under appropriate assumptions. The empirical ME statistic can be computed in linear time, making it much more efficient than earlier quadratic-time kernel test statistics. The key point in choosing meaningful features is that variance matters: it is not enough to have a large empirical divergence; we must also have high confidence in its value. We use the linear-time ME test to distinguish positive and negative emotions on a facial expression database, showing that a distinguishing feature reveals the facial areas most relevant to emotion.
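As a rough illustration of the idea (a sketch, not the talk's exact construction), a linear-time ME-style statistic can be computed by evaluating kernel features at a small set of test locations, averaging the per-sample differences between the two populations, and normalizing by their covariance so that variance is taken into account. The function name `me_statistic`, the Gaussian kernel with bandwidth parameter `gamma`, the fixed test locations `V`, and the regularizer `reg` are all assumptions of this sketch; in the adaptive test described above, the locations and kernel parameters would instead be optimized to maximize testing power.

```python
import numpy as np

def me_statistic(X, Y, V, gamma=0.5, reg=1e-5):
    """Sketch of a linear-time ME-style two-sample statistic.

    X, Y : (n, d) arrays of samples from the two distributions (equal sizes).
    V    : (J, d) test locations (the interpretable "features").
    Returns a nonnegative scalar; large values are evidence against
    the null hypothesis that X and Y share a distribution.
    """
    def feat(A):
        # Gaussian-kernel features of each sample, evaluated at the J locations.
        sq = ((A[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)  # (n, J)
        return np.exp(-gamma * sq)

    Z = feat(X) - feat(Y)              # per-sample feature differences, (n, J)
    zbar = Z.mean(axis=0)              # empirical mean-embedding difference
    # Regularized covariance: normalizing by it rewards low-variance features,
    # not just a large mean difference. Cost is O(n J^2): linear in n.
    S = np.cov(Z, rowvar=False) + reg * np.eye(V.shape[0])
    n = X.shape[0]
    return n * zbar @ np.linalg.solve(S, zbar)
```

Under the null, a statistic of this Hotelling-like form is asymptotically chi-squared with J degrees of freedom, which gives a cheap rejection threshold.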