Statistical Science Seminars
Usual time: Thursdays 16:00 - 17:00
Location: Room 102, Department of Statistical Science, 1-19 Torrington Place (1st floor).
Some seminars are held in different locations at different times. Click on the abstract for more details.
Vector-valued Distribution Regression: A Simple and Consistent Approach
We address the distribution regression problem (DRP): regression on the domain of probability measures, in the two-stage sampled setup where only samples from the distributions are given. The DRP formulation offers a unified framework for several important tasks in statistics and machine learning, including multi-instance learning (MIL) and point estimation problems without analytical solutions. Despite the large number of MIL heuristics, there is essentially no theoretically grounded approach to the DRP in the two-stage sampled case. To the best of our knowledge, the only existing technique with consistency guarantees requires kernel density estimation as an intermediate step (which often scales poorly in practice) and requires the domain of the distributions to be compact Euclidean. We analyse a simple (analytically computable) ridge regression alternative for the DRP: we embed the distributions into a reproducing kernel Hilbert space and learn the regressor from the embeddings to the outputs. We show that this scheme is consistent in the two-stage sampled setup under mild conditions, for probability measure inputs defined on separable topological domains endowed with kernels, with vector-valued outputs belonging to an arbitrary separable Hilbert space. In particular, choosing the kernel on the space of embedded distributions to be linear and the output space to be the real line, we obtain the consistency of set kernels in regression, which was a 15-year-old open question. In our talk we will present (i) the main ideas and results of consistency, (ii) concrete kernel constructions on mean embedded distributions, and (iii) two applications (supervised entropy learning, and aerosol prediction based on multispectral satellite images) demonstrating the efficiency of our approach.
Joint work with Arthur Gretton (UCL), Barnabas Poczos (CMU) and Bharath K. Sriperumbudur (PSU).
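The two-stage scheme described in the abstract (embed each sampled distribution into an RKHS, then run ridge regression on the embeddings) can be illustrated with a minimal sketch. This is not the speakers' code: the Gaussian base kernel, the bandwidth, the regularisation constant, the function names and the toy task (predicting the entropy of a zero-mean Gaussian from a bag of its samples) are all assumptions chosen for illustration. With a linear kernel on the embeddings, the kernel between two bags reduces to the average of the base kernel over all pairs of points, which is the set kernel mentioned in the abstract.

```python
import numpy as np


def gaussian_gram(X, Y, bandwidth):
    """Gaussian base kernel between the rows of X and the rows of Y."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def set_kernel(bags_a, bags_b, bandwidth):
    """Inner products of empirical mean embeddings (linear kernel on embeddings)."""
    K = np.empty((len(bags_a), len(bags_b)))
    for i, A in enumerate(bags_a):
        for j, B in enumerate(bags_b):
            # Average of the base kernel over all pairs of points in the two bags.
            K[i, j] = gaussian_gram(A, B, bandwidth).mean()
    return K


def fit_ridge(K, y, lam):
    """Kernel ridge coefficients: solve (K + n * lam * I) alpha = y."""
    n = K.shape[0]
    return np.linalg.solve(K + n * lam * np.eye(n), y)


def make_bags(rng, n_bags, bag_size):
    """Toy data (assumed): bags drawn from N(0, s^2), labelled with the Gaussian entropy."""
    scales = rng.uniform(0.5, 2.0, size=n_bags)
    bags = [rng.normal(0.0, s, size=(bag_size, 1)) for s in scales]
    labels = 0.5 * np.log(2.0 * np.pi * np.e * scales**2)
    return bags, labels


rng = np.random.default_rng(0)
train_bags, y_train = make_bags(rng, n_bags=100, bag_size=80)
test_bags, y_test = make_bags(rng, n_bags=30, bag_size=80)

bandwidth, lam = 1.0, 1e-3  # illustrative values, not tuned
K_train = set_kernel(train_bags, train_bags, bandwidth)
alpha = fit_ridge(K_train, y_train, lam)

K_test = set_kernel(test_bags, train_bags, bandwidth)
y_pred = K_test @ alpha

mse = float(np.mean((y_pred - y_test) ** 2))
baseline = float(np.var(y_test))  # error of always predicting the test mean
print(f"test MSE: {mse:.4f}  (constant-predictor baseline: {baseline:.4f})")
```

The regressor only ever sees the bags through their pairwise set-kernel values, so the same code applies to bags of different sizes or to multivariate samples. In practice the bandwidth and the regularisation constant would be chosen by cross-validation.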
Directional inference for vector parameters
We consider inference on a vector-valued parameter of interest in a linear exponential family, in the presence of a finite-dimensional nuisance parameter. Based on higher-order asymptotic theory for likelihood, we propose a directional test whose p-value is computed using one-dimensional integration. The work simplifies and develops earlier research on directional tests for continuous models and on higher-order inference for discrete models, and the examples include contingency tables and logistic regression. Examples and simulations illustrate the high accuracy of the method, which we compare with the usual likelihood ratio test and with an adjusted version due to Skovgaard. In high-dimensional settings, such as covariance selection, the approach works essentially perfectly, whereas its competitors can fail catastrophically. Extensions to non-linear exponential families and to general models are also sketched.
(Joint work with A.C. Davison, D.A.S. Fraser, N. Reid.)
Page last modified on 08 oct 12 11:11