List of talks at LSIT 2023, with titles, abstracts, and speaker bios.

## Learning to Compress

**Title:** Learned Data Compression – New Solutions to Old Problems in IT

**Abstract:** Since its emergence roughly six years ago, the field of learned data compression has attracted considerable attention from both the machine learning and information theory communities. Data-driven source coding promises faster innovation cycles, as well as better adaptation to novel types of sources and unconventional models of distortion. In this talk, I review nonlinear transform coding (NTC), a framework of techniques which over the past few years has surpassed the state of the art of hand-crafted image compression methods in terms of subjective quality vs. rate. I demonstrate the empirical rate–distortion performance of NTC with the help of toy sources, for which the optimal performance of a vector quantizer is easier to estimate than for natural data sources such as images. I touch upon applications to compression of not just images, but also neural network parameters and activations, two unconventional data sources. My talk concludes with recent findings about limitations as well as future directions of NTC, such as its potential to bridge a long-standing gap between theory and practice in the multi-terminal case.

**Bio:** Johannes Ballé (they/them) is a Staff Research Scientist at Google. They defended their master's and doctoral theses on signal processing and image compression under the supervision of Jens-Rainer Ohm at RWTH Aachen University in 2007 and 2012, respectively. This was followed by a brief collaboration with Javier Portilla at CSIC in Madrid, Spain, and a postdoctoral fellowship at New York University’s Center for Neural Science with Eero P. Simoncelli, studying the relationship between perception and image statistics. While there, Johannes pioneered the use of variational Bayesian models and deep learning techniques for end-to-end optimized image compression. They joined Google in early 2017 to continue working in this line of research. Johannes has served as a reviewer for top-tier publications in both machine learning and image processing, such as NeurIPS, ICLR, ICML, Picture Coding Symposium, and several IEEE Transactions journals. They have been a co-organizer of the annual Workshop and Challenge on Learned Image Compression (CLIC) since 2018.

**Title:** Universal Rate-Distortion-Perception Representations for Lossy Compression

**Abstract:** In recent years, deep learning methods have been increasingly used in lossy data compression. Deep generative models, when applied to lossy image compression tasks, can reconstruct realistic-looking outputs even at very low bit-rates where traditional compression methods suffer from significant artifacts. In this talk, we will revisit the rate-distortion-perception (RDP) tradeoff that underlies the performance of learned image compression and study the quadratic-Gaussian source model in detail. Motivated by the Gaussian case, we will introduce a notion of universal encoded representations, where the same compressed representation can be used to simultaneously achieve different operating points on the distortion-perception tradeoff. We will demonstrate through both theoretical analysis and experimental results involving deep learning models that near-optimal encoding representations can be constructed for a broad class of source models.

**Bio:** Jun Chen received the B.S. degree in Electronic Engineering from Shanghai Jiao Tong University in 2001, and the M.S. and Ph.D. degrees in Electrical and Computer Engineering from Cornell University in 2004 and 2006, respectively. He was a Postdoctoral Research Associate in the Coordinated Science Laboratory at the University of Illinois at Urbana-Champaign from September 2005 to July 2006, and a Postdoctoral Fellow at the IBM Thomas J. Watson Research Center from July 2006 to August 2007. Since September 2007 he has been with the Department of Electrical and Computer Engineering at McMaster University, where he is currently a Professor. He held the title of the Barber-Gennum Chair in Information Technology from 2008 to 2013 and the Joseph Ip Distinguished Engineering Fellow from 2016 to 2018. He was a recipient of the Josef Raviv Memorial Postdoctoral Fellowship (2006), the Early Researcher Award from the Province of Ontario (2010), the IBM Faculty Award (2010), the ICC Best Paper Award (2020), and the JSPS Invitational Fellowship (2021). He served as an Editor of the IEEE Transactions on Green Communications and Networking from 2020 to 2021 and is currently an Associate Editor of the IEEE Transactions on Information Theory.

**Title:** Neural Estimation of the Rate-Distortion Function With Applications to Operational Source Coding

**Abstract:** A fundamental question in designing lossy data compression schemes is how well one can do in comparison with the rate-distortion function, which describes the known theoretical limits of lossy compression. Motivated by the empirical success of deep neural network (DNN) compressors on large, real-world data, we investigate methods to estimate the rate-distortion function on such data, which would allow comparison of DNN compressors against the optimum. While one could use the empirical distribution of the data and apply the Blahut-Arimoto algorithm, this approach presents several computational challenges and inaccuracies when the datasets are large and high-dimensional, as is the case for modern image datasets. Instead, we reformulate the rate-distortion objective, and solve the resulting functional optimization problem using neural networks. We apply the resulting rate-distortion estimator, called NERD, on popular image datasets, and provide evidence that NERD can accurately estimate the rate-distortion function. Using our estimate, we show that the rate-distortion achievable by DNN compressors is within several bits of the rate-distortion function for real-world datasets. Additionally, NERD provides access to the rate-distortion-achieving channel, as well as samples from its output marginal. Therefore, using recent results in reverse channel coding, we describe how NERD can be used to construct an operational one-shot lossy compression scheme with guarantees on the achievable rate and distortion. Experimental results demonstrate competitive performance with DNN compressors. Finally, we discuss several future research directions pertaining to techniques to make neural compressors optimal and scalable.
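For reference, the Blahut-Arimoto baseline mentioned in the abstract can be sketched in a few lines for a small discrete source. This is a toy illustration (the source, distortion matrix, and multiplier `beta` are illustrative choices), not the NERD estimator itself:

```python
import numpy as np

def blahut_arimoto(p_x, dist, beta, n_iter=200):
    """Trace one point of R(D) for a discrete source via Blahut-Arimoto.

    p_x  : source distribution over the input alphabet
    dist : |X| x |Xhat| distortion matrix
    beta : Lagrange multiplier selecting the slope of the R(D) curve
    Returns (rate in bits, expected distortion).
    """
    q = np.full(dist.shape[1], 1.0 / dist.shape[1])   # output marginal q(xhat)
    for _ in range(n_iter):
        # Tilted test channel: Q(xhat|x) proportional to q(xhat) exp(-beta d)
        Q = q[None, :] * np.exp(-beta * dist)
        Q /= Q.sum(axis=1, keepdims=True)
        q = p_x @ Q                                   # re-estimate the marginal
    rate_nats = np.sum(p_x[:, None] * Q * np.log(Q / q[None, :]))
    distortion = np.sum(p_x[:, None] * Q * dist)
    return rate_nats / np.log(2), distortion

# Bernoulli(1/2) source with Hamming distortion, where R(D) = 1 - H_b(D)
R, D = blahut_arimoto(np.array([0.5, 0.5]),
                      np.array([[0.0, 1.0], [1.0, 0.0]]), beta=3.0)
```

On this toy source the returned point matches the closed form R(D) = 1 - H_b(D). For large image datasets, each iteration would require a sum over the full empirical alphabet, which is exactly the kind of computational obstacle the abstract describes.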

**Bio:** Shirin Saeedi Bidokhti is an assistant professor in the Department of Electrical and Systems Engineering at the University of Pennsylvania (UPenn). She received her M.Sc. and Ph.D. degrees in Computer and Communication Sciences from the Swiss Federal Institute of Technology (EPFL). Prior to joining UPenn, she was a postdoctoral scholar at Stanford University and the Technical University of Munich. She has also held short-term visiting positions at ETH Zurich, the University of California at Los Angeles, and the Pennsylvania State University. Her research interests broadly include the design and analysis of network strategies that are scalable, practical, and efficient for use in Internet of Things (IoT) applications, information transfer on networks, as well as data compression techniques for big data. She is a recipient of the 2022 IEEE Information Theory Society Goldsmith Lecturer Award, the 2021 NSF CAREER award, the 2019 NSF CRII award, and the prospective researcher and advanced postdoctoral fellowships from the Swiss National Science Foundation.

**Title:** Relative Entropy Coding

**Abstract:** Relative entropy coding (REC) is an approach to data compression which does not require quantization. A REC algorithm uses samples from a proposal distribution P to produce a random code representing a sample from a target distribution Q, with expected length of approximately KL[Q||P] bits, i.e., the KL divergence measured in bits rather than nats. REC algorithms can be applied even when Q and P are continuous, and can be naturally applied to perform compression with models trained via gradient descent for applications including but not limited to: (1) data compression with variational autoencoders, where Q corresponds to a variational posterior over latent variables and P to a prior over these latent variables; (2) model compression, where Q corresponds to an approximate posterior over parameters, and P to a prior over those parameters. In this talk, we will review some REC algorithms and their applications to compression problems.
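To make the idea concrete, here is a sketch of one simple REC-style scheme, sometimes called minimal random coding (shown for illustration; not necessarily one of the algorithms covered in the talk). Encoder and decoder share samples from the proposal P = N(0, 1) through a common random seed, so only the index of a sample selected with probability proportional to the density ratio Q/P needs to be transmitted. The target Q and the number of shared samples below are illustrative choices:

```python
import numpy as np

def normal_logpdf(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

def rec_encode(mu_q, sigma_q, n_samples, rng):
    """Select the index of a shared P-sample whose law is approximately
    the target Q = N(mu_q, sigma_q^2); only the index is transmitted."""
    x = rng.standard_normal(n_samples)            # shared samples from P = N(0,1)
    logw = normal_logpdf(x, mu_q, sigma_q) - normal_logpdf(x, 0.0, 1.0)
    w = np.exp(logw - logw.max())                 # stabilized importance weights
    idx = rng.choice(n_samples, p=w / w.sum())    # pick index with prob ~ Q/P
    return idx, x[idx]

# Q = N(1, 0.5^2) gives KL[Q||P] of roughly 1.2 bits, so a modest number
# of shared samples already yields a good approximation of Q.
rng = np.random.default_rng(0)
decoded = [rec_encode(1.0, 0.5, 64, rng)[1] for _ in range(5000)]
```

The naive index code here costs log2(64) = 6 bits per sample; practical REC algorithms bring the expected length down to roughly KL[Q||P] bits plus a small overhead.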

**Bio:** Dr. Lobato is a University Lecturer (equivalent to US Assistant Professor) in Machine Learning at the Department of Engineering in the University of Cambridge, UK. He was previously a postdoctoral fellow in the Harvard Intelligent Probabilistic Systems group at the School of Engineering and Applied Sciences of Harvard University, working with the group leader Prof. Ryan Adams. Before that, he was a postdoctoral research associate in the Machine Learning Group at the Department of Engineering in the University of Cambridge (UK) from June 2011 to August 2014, working with Prof. Zoubin Ghahramani. From December 2010 to May 2011, he was a teaching assistant at the Computer Science Department in Universidad Autónoma de Madrid (Spain), where he completed his Ph.D. and M.Phil. in Computer Science in December 2010 and June 2007, respectively. He also obtained a B.Sc. in Computer Science from this institution in June 2004, with a special prize for the best academic record on graduation. His research revolves around model-based machine learning with a focus on probabilistic learning techniques and with a particular interest in Bayesian optimization, matrix factorization methods, copulas, Gaussian processes and sparse linear models.

## Massive Access

**Speaker:** Giuseppe Durisi

**Title:** Short packets over a massive random-access channel

**Abstract:** To support future distributed autonomous systems operating in real time, we need a new wireless infrastructure, able to provide connectivity to a large number of sporadically active devices transmitting short data packets. In this talk, I will present an overview of recent information-theoretic results that provide a characterization of the largest energy efficiency achievable in the so-called unsourced massive multiple access scenario, where a large number of sporadically active devices access the wireless medium in an uncoordinated fashion. I will also discuss extensions of this scenario to the case in which the number of active devices is random and unknown to the receiver, and to the case in which the devices transmit heterogeneous traffic consisting of both standard and critical messages.

**Bio:** Giuseppe Durisi received the Laurea degree summa cum laude and the Doctor degree both from Politecnico di Torino, Italy, in 2001 and 2006, respectively. From 2002 to 2006, he was with Istituto Superiore Mario Boella, Torino, Italy. From 2006 to 2010 he was a postdoctoral researcher at ETH Zurich, Zurich, Switzerland. In 2010, he joined Chalmers University of Technology, Gothenburg, Sweden, where he is now full professor with the Communication Systems Group. At Chalmers, he served as co-director of the Information and Communication Technology Area of Advance, and of the Artificial Intelligence Research Center. He is currently director of the master program in information and communication technologies. Dr. Durisi is a senior member of the IEEE. He is the recipient of the 2013 IEEE ComSoc Best Young Researcher Award for the Europe, Middle East, and Africa Region, and is co-author of a paper that won a student paper award at the 2012 International Symposium on Information Theory, and of a paper that won the 2013 IEEE Sweden VT-COM-IT joint chapter best student conference paper award. From 2011 to 2014, he served as publications editor for the IEEE Transactions on Information Theory. From 2015 to 2021, he served as associate editor for the IEEE Transactions on Communications. His research interests are in the areas of communication and information theory and machine learning.

**Speaker:** Maxime Guillaud

**Title:** Towards Practical Massive Random Access

**Abstract:** In multi-user wireless communications, random access departs from the classical set-up of multiple access in that the number and identity of the active transmitters are unknown to the receiver (typically because the sporadic nature of the traffic does not allow for coordination between transmitters). This is a significant departure from the well-understood multiple access schemes (including non-orthogonal multiple access, NOMA). Random access arises, e.g., in massive Internet-of-Things, ultra-reliable and low-latency communications, and non-terrestrial network applications. This talk will outline why, compared to multiple access, multi-user decoding in random access scenarios is markedly more difficult, and requires revisiting some of the basic assumptions that underpin modern multi-user communications systems, such as the pre-existence of synchronization and timing offset compensation, or centralized assignment of pilot sequences. We will focus on massive random access, which explicitly considers the high-contention regime, i.e. the case where the number of simultaneously active transmitters can be very large, and discuss some of the waveforms and coding approaches that have been proposed in practice to solve this problem.

**Bio:** Dr. Maxime Guillaud is a senior researcher at Inria, the French national research institute for digital science and technology. He received his PhD from EURECOM in 2005. Before joining Inria in 2023, he held research positions in both academia and industry, with Lucent Bell Laboratories (now Nokia) in the USA, the FTW research center and Vienna University of Technology in Austria, and Huawei Technologies in France. He is an expert on the physical layer of radio access networks, including transceiver algorithms, channel modeling, machine learning, and modulation design for non-coherent and multiple access communications. He has authored over 90 research papers and holds 18 patents.

**Speaker:** Ramji Venkataramanan

**Title:** Near-optimal communication over many-user channels

**Abstract:** We consider communication over the Gaussian multiple-access channel in the asymptotic regime where the number of users grows linearly with the code length. We propose efficient coding schemes based on random linear models with approximate message passing (AMP) decoding, and study the tradeoff between energy-per-bit and achievable user density for a fixed target error rate. It is demonstrated that in the large system limit, a spatially coupled coding scheme with AMP decoding achieves near-optimal tradeoffs for a wide range of user densities. We also present achievability bounds and coding schemes for many-user channels with random user activity.

**Bio:** Ramji Venkataramanan is Professor of Information Engineering at the University of Cambridge. He received his PhD in Electrical Engineering from the University of Michigan, Ann Arbor in 2008, and his undergraduate degree from the Indian Institute of Technology, Madras in 2002. Before joining the University of Cambridge in 2013, he held post-doctoral positions at Stanford University and Yale University. His research interests span statistical learning, information theory, and communications. He is an Associate Editor of the IEEE Transactions on Information Theory, and from 2018-21 was a Fellow of the Alan Turing Institute and an Associate Editor of the IEEE Transactions on Communications.

**Speaker:** Petar Popovski

**Title:** On Control Signaling in Uplink and Downlink Massive Access

**Abstract:** This talk will present solutions to two control signaling problems in massive access. The first problem is related to user identification and authentication in unsourced access. When the number of devices is massive and relatively small packets are transmitted, the overhead of including identification/authentication fields becomes restrictive. This has led to schemes based on unsourced random access (U-RA) that completely drop those fields. I will present a scheme that builds upon an underlying U-RA protocol and solves the problem of user identification and authentication. Second, the talk will focus on the problem of sending feedback in the downlink to a massive set of devices. This is a necessary functionality, since reliable communication relies on two-way exchange for acknowledgement and retransmission. I will introduce the massive ARQ (Automatic Retransmission reQuest) protocol and present a scheme for joint encoding of multiple acknowledgements in the downlink.

**Bio:** Petar Popovski is a Professor at Aalborg University, where he heads the section on Connectivity, and a Visiting Excellence Chair at the University of Bremen. He received his Dipl.-Ing. (1997) and M.Sc. (2000) degrees in communication engineering from the University of Sts. Cyril and Methodius in Skopje and the Ph.D. degree (2005) from Aalborg University. He is a Fellow of the IEEE. He received an ERC Consolidator Grant (2015), the Danish Elite Researcher award (2016), the IEEE Fred W. Ellersick prize (2016), the IEEE Stephen O. Rice prize (2018), the Technical Achievement Award from the IEEE Technical Committee on Smart Grid Communications (2019), the Danish Telecommunication Prize (2020) and the Villum Investigator Grant (2021). He is currently Editor-in-Chief of the IEEE Journal on Selected Areas in Communications and Chair of the IEEE Communication Theory Technical Committee. His research interests are in the area of wireless communication and communication theory. He authored the book "Wireless Connectivity: An Intuitive and Fundamental Guide", published by Wiley in 2020.

## Mathematics of Learning

**Speaker:** Gitta Kutyniok

**Title:** Reliable AI: From Mathematical Foundations to Analog Computing

**Abstract:** Artificial intelligence is currently leading to one breakthrough after the other, in the sciences, in public life, and also in industry.

However, one current major drawback is the lack of reliability of such methodologies. In this lecture we will first provide an introduction to this vibrant research area, focussing specifically on deep neural networks. We will then survey recent advances, in particular concerning generalization guarantees and explainability methods. Finally, we will discuss fundamental limitations of deep neural networks and related approaches in terms of computability, which seriously affects their reliability, and reveal a connection with analog computing such as neuromorphic or quantum computing.

**Bio:** Gitta Kutyniok (https://www.ai.math.lmu.de/kutyniok) currently has a Bavarian AI Chair for Mathematical Foundations of Artificial Intelligence at the Ludwig-Maximilians-Universität München. She received her Diploma in Mathematics and Computer Science as well as her Ph.D. degree from the Universität Paderborn in Germany, and her Habilitation in Mathematics in 2006 at the Justus-Liebig Universität Gießen. From 2001 to 2008 she held visiting positions at several US institutions, including Princeton University, Stanford University, Yale University, Georgia Institute of Technology, and Washington University in St. Louis. In 2008, she became a full professor of mathematics at the Universität Osnabrück, and moved to Berlin three years later, where she held an Einstein Chair in the Institute of Mathematics at the Technische Universität Berlin and a courtesy appointment in the Department of Computer Science and Engineering until 2020. In addition, Gitta Kutyniok has held an Adjunct Professorship in Machine Learning at the University of Tromsø since 2019. Gitta Kutyniok has received various awards for her research such as an award from the Universität Paderborn in 2003, the Research Prize of the Justus-Liebig Universität Gießen and a Heisenberg-Fellowship in 2006, and the von Kaven Prize by the DFG in 2007. She was invited as the Noether Lecturer at the ÖMG-DMV Congress in 2013, a plenary lecturer at the 8th European Congress of Mathematics (8ECM) in 2021, and the lecturer of the London Mathematical Society (LMS) Invited Lecture Series in 2022. She was also honored by invited lectures at both the International Congress of Mathematicians 2022 (ICM 2022) and the International Congress on Industrial and Applied Mathematics (ICIAM 2023). Moreover, she was elected as a member of the Berlin-Brandenburg Academy of Sciences and Humanities in 2017 and of the European Academy of Sciences in 2022, and became a SIAM Fellow in 2019. She is currently the main coordinator of the Research Focus "Next Generation AI" at the Center for Advanced Studies at LMU and the DFG-Priority Program "Theoretical Foundations of Deep Learning", serves as Vice President-at-Large of SIAM, and acts as Co-Director of the Konrad Zuse School of Excellence in Reliable AI (relAI) in Munich.

**Title:** Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes

**Abstract:** Policy Mirror Descent (PMD) is a general family of algorithms that covers a wide range of novel and fundamental methods in reinforcement learning. In this work we show that PMD with exact policy evaluation achieves a dimension-free $\gamma$-rate of convergence under an adaptive step-size. We show both that this rate is unimprovable for PMD and that the adaptive step-size is necessary to achieve it. Our study of the convergence of PMD avoids the use of the performance difference lemma, which leads to a direct analysis of independent interest. We also extend the analysis to the inexact setting and establish the first dimension-optimal sample complexity for unregularized PMD under a generative model, improving upon the best-known result.

This is joint work with Emmeran Johnson and Patrick Rebeschini.

**Bio:** Dr Ciara Pike-Burke is a lecturer in Statistics at Imperial College London. Her research is in statistical machine learning, specialising in sequential decision making problems. She is interested in decision making under uncertainty and potentially limited feedback, and has worked on multi-armed bandit, reinforcement learning, and online learning problems. Ciara's research often focuses on the mathematical foundations of such models, providing formal guarantees that algorithms will behave in the desired way, but she is also interested in applications of these algorithms in settings such as education.

**Speaker:** Anders Hansen

**Title:** On Smale’s 9th problem, generalised hardness of approximation and the limits of AI

**Abstract:** Alchemists wanted to create gold, Hilbert wanted an algorithm to solve Diophantine equations, researchers want to make deep learning robust in AI without hallucinations, MATLAB wants (but fails) to detect when it provides wrong solutions to linear programs, etc. Why do we fail in so many of these fundamental cases? The reason is typically methodological barriers. The history of science is full of methodological barriers: reasons why we never succeed in reaching certain goals. In many cases, this is due to barriers in the foundations of mathematics. This talk introduces new such barriers from foundations: the phenomenon of generalised hardness of approximation (GHA). GHA grows out of our work on Smale’s 9th problem on the list of mathematical problems for the 21st century. This phenomenon is not a rare issue; it happens on a daily basis in computations (in particular in AI), and causes modern software such as MATLAB to fail on basic problems, and even to certify nonsensical solutions as correct. GHA is close in spirit to hardness of approximation (HA) in computer science. Assuming $P \neq NP$, HA is the phenomenon that one can easily compute an $\epsilon$-approximation to the solution of a discrete computational problem for $\epsilon > \epsilon_0 > 0$, but for $\epsilon < \epsilon_0$ it suddenly becomes intractable. HA was discovered decades ago and has been transformative, being the subject of several Gödel, Nevanlinna and ACM prizes. GHA is a similar but distinct mathematical phenomenon that requires a new set of mathematical tools in computation rather than computer science (NB: GHA is independent of P vs. NP). GHA grows out of the Solvability Complexity Index (SCI) hierarchy framework, and has so far been detected in optimisation, inverse problems, deep learning and AI, as well as computer-assisted proofs. We will discuss in particular the connections to AI.

**Bio:** Anders C. Hansen is professor of mathematics at the University of Cambridge, where he leads the Applied Functional and Harmonic Analysis Group, and a Royal Society University Research Fellow. He is the recipient of the Leverhulme Prize in mathematics (2017), the IMA Prize in Mathematics and its Applications (2018), and the Whitehead Prize in mathematics (2019). His research spans foundations of computational mathematics, mathematical analysis, the SCI hierarchy, foundations of AI, inverse problems, and imaging. His book with Ben Adcock, *Compressive Imaging: Structure, Sampling, Learning* (CUP 2021), was a finalist for the 2022 PROSE Award given by the Association of American Publishers.

**Speaker:** Yao Xie

**Title:** Kernel two-sample tests for manifold data have no curse-of-dimensionality

**Abstract:** We study a kernel two-sample test statistic (related to the popular Maximum Mean Discrepancy (MMD) in statistics and machine learning) when data are noisy observations sampled from distributions supported on low-dimensional sub-manifolds. We characterize the two-sample test performance in the non-asymptotic regime for manifold data, revealing that the test has no curse of dimensionality when the data lie on or close to a low-dimensional manifold. This result suggests that the kernel two-sample statistic can adapt to the underlying data manifold structure. In particular, our analysis shows that the test is powerful when the number of samples $n$ is on the order of $D^{-2-d/(2\beta)}$, where $d$ is the intrinsic dimensionality, $D$ is the squared $L^2$-divergence between the true densities $p$ and $q$, and the densities are Hölder with order $\beta$. We validate our theory with numerical experiments on manifold data.
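For readers unfamiliar with the statistic, a minimal unbiased estimator of squared MMD with a Gaussian kernel looks as follows (toy data in ambient dimension 2; the bandwidth and sample sizes are illustrative choices):

```python
import numpy as np

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimate of MMD^2 between samples X and Y using the
    Gaussian kernel k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    def gram(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * sigma**2))
    Kxx, Kyy, Kxy = gram(X, X), gram(Y, Y), gram(X, Y)
    n, m = len(X), len(Y)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))  # drop diagonal terms
    term_y = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return term_x + term_y - 2.0 * Kxy.mean()

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2))
Y_same = rng.standard_normal((500, 2))                      # same law as X
Y_shift = rng.standard_normal((500, 2)) + np.array([1.0, 0.0])  # mean-shifted
```

Here `mmd2_unbiased(X, Y_same)` concentrates around zero while `mmd2_unbiased(X, Y_shift)` is clearly positive; the talk's result concerns how the sample size needed for this separation scales with the intrinsic rather than the ambient dimension.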

**Bio:** Yao Xie is an Associate Professor, Harold R. and Mary Anne Nash Early Career Professor at Georgia Institute of Technology in the H. Milton Stewart School of Industrial and Systems Engineering and Associate Director of the Machine Learning Center. She received her Ph.D. in Electrical Engineering (minor in Mathematics) from Stanford University and was a Research Scientist at Duke University. Her research lies at the intersection of statistics, machine learning, and optimization in providing theoretical guarantees and developing computationally efficient and statistically powerful methods for problems motivated by real-world applications. She received the National Science Foundation (NSF) CAREER Award in 2017, INFORMS Wagner Prize Finalist in 2021, and the INFORMS Gaver Early Career Award for Excellence in Operations Research in 2022. She is currently an Associate Editor for IEEE Transactions on Information Theory, IEEE Transactions on Signal Processing, Journal of the American Statistical Association, Theory and Methods, Sequential Analysis: Design Methods and Applications, INFORMS Journal on Data Science, and an Area Chair of NeurIPS in 2021, 2022, 2023, and ICML in 2023.

## Online Learning

**Title:** Online-to-PAC conversions: Generalization Bounds via Online Learning

**Abstract:** We present a new framework for deriving bounds on the generalization error of statistical learning algorithms from the perspective of online learning. Specifically, we construct an online learning game called the Generalization Game, where an online learner is trying to compete with a fixed statistical learning algorithm in predicting the sequence of generalization gaps on a training set of i.i.d. data points. We establish a connection between the online and statistical learning settings by showing that the existence of an online learning algorithm with bounded regret in this game implies a bound on the generalization error of the statistical learning algorithm, up to a martingale concentration term that is independent of the complexity of the statistical learning method. This technique allows us to recover several standard generalization bounds, including a range of PAC-Bayesian and information-theoretic guarantees, as well as generalizations thereof.

**Bio:** Gergely Neu is a research assistant professor at the Pompeu Fabra University, Barcelona, Spain. He has previously worked with the SequeL team of INRIA Lille, France and the RLAI group at the University of Alberta, Edmonton, Canada. He obtained his PhD degree in 2013 from the Budapest University of Technology and Economics, where his advisors were András György, Csaba Szepesvári and László Györfi. His main research interests are in machine learning theory, with a strong focus on sequential decision making problems. Dr. Neu was the recipient of a Google Faculty Research award in 2018, the Bosch Young AI Researcher Award in 2019, and an ERC Starting Grant in 2020.

**Title:** Semi-Decentralized Federated Learning with Collaborative Relaying in Unreliable Networks

**Abstract:** Intermittent client connectivity is one of the major challenges in centralized federated edge learning frameworks. Intermittently failing uplinks to the central parameter server (PS) can induce a large generalization gap in performance, especially when the data distribution among the clients exhibits heterogeneity. To mitigate communication blockages between clients and the central PS, this talk introduces the concept of knowledge relaying, wherein the successfully participating clients collaborate in relaying their neighbors' local updates to the central PS in order to boost the participation of clients with intermittently failing connectivity. I will accomplish this vision by presenting a collaborative relaying-based semi-decentralized federated edge learning framework where, at every communication round, each client first computes a local consensus of the updates from its neighboring clients and eventually transmits a weighted average of its own update and those of its neighbors to the PS. Further, we will discuss how to appropriately optimize these averaging weights to reduce the variance of the global update at the PS while ensuring that the global update is unbiased, consequently improving the convergence rate. I will validate the theoretical results and demonstrate that the proposed scheme is superior to the federated averaging (FedAvg) benchmark, especially when the data distribution among clients is non-i.i.d. Finally, if time permits, I will also discuss the privacy ramifications of the presented scheme and how to adapt the collaborative relaying to adhere to privacy constraints.
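The unbiasedness requirement described above can be checked in a small simulation. The sketch below is a toy scalar model with made-up connectivity probabilities and a ring of relayers, not the scheme's actual optimized weights; the averaging weights are simply chosen so that the expected global update equals the plain average of all local updates:

```python
import numpy as np

n = 4
u = np.array([1.0, 2.0, 3.0, 4.0])     # local updates (scalars for simplicity)
tau = np.array([0.9, 0.5, 0.7, 0.3])   # uplink success probability per client
# Client i is relayed by itself and its next neighbour (toy ring topology).
relayers = [(i, (i + 1) % n) for i in range(n)]

# Unbiasedness requires sum_j tau_j * alpha[i, j] = 1/n for every client i;
# splitting the weight equally among i's relayers satisfies this.
alpha = np.zeros((n, n))
for i, js in enumerate(relayers):
    for j in js:
        alpha[i, j] = 1.0 / (n * tau[j] * len(js))

c = alpha.T @ u                        # what client j forwards when connected
rng = np.random.default_rng(0)
up = rng.random((200_000, n)) < tau    # simulated uplink outcomes per round
estimate = (up.astype(float) @ c).mean()
```

Here `estimate` matches `u.mean()` = 2.5 up to Monte Carlo noise; minimizing the variance of the per-round update over the weights `alpha`, subject to this unbiasedness constraint, is the optimization the abstract refers to.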

**Bio:** Michal Yemini is an assistant professor at Bar-Ilan University, Ramat-Gan, Israel, and a visiting research collaborator at Princeton University, Princeton, USA. Prior to that, she was an associate research scholar at Princeton University, a postdoctoral researcher at Stanford University, Stanford, USA, and a visiting postdoctoral researcher at Princeton University. Her main research interests include distributed optimization, sequential decision-making, learning theory, information theory, and percolation theory.

She received the Eric and Wendy Schmidt Postdoctoral Award for Women in Mathematical and Computing Sciences, the Council of Higher Education’s Postdoctoral Fellowships Program for Outstanding Women in Science, and the Bar-Ilan University’s Postdoctoral Fellowship for Women. She obtained her BSc in computer engineering from the Technion-Israel Institute of Technology, Haifa, Israel, in 2011. In 2017 she received her PhD degree in the joint MSc-PhD program from the Faculty of Engineering, Bar-Ilan University, Ramat-Gan, Israel.

**Title:** Looking beyond the worst-case adversaries in online decision making

**Abstract:** Robustness to changes in data is one of the main challenges faced by sequential machine learning and decision-making algorithms. Yet, most efficient and highly optimized deployed algorithms today were designed to work well on fixed data sets and ultimately fail when data evolves in unpredictable or adversarial ways. Even more concerning, for most fundamental problems in machine learning and optimization it is impossible to provide any performance guarantees that are not completely diminished in the presence of all-powerful adversaries. We will explore the smoothed analysis perspective on adaptive adversaries in machine learning and optimization, which goes beyond the worst-case scenario. We will examine both information-theoretic and computational perspectives and present general-purpose techniques that provide strong robustness guarantees in practical domains for a wide range of applications, such as online learning, differential privacy, discrepancy theory, sequential probability assignment, and learning-augmented algorithm design. Our conclusion is that even small perturbations to worst-case adaptive adversaries can make learning in their presence as easy as learning over a fixed data set.

**Bio:** Nika Haghtalab is an Assistant Professor in the Department of Electrical Engineering and Computer Sciences at UC Berkeley. She works broadly on the theoretical aspects of machine learning and algorithmic economics. Prof. Haghtalab’s work builds theoretical foundations for ensuring both the performance of learning algorithms in the presence of everyday economic forces and the integrity of social and economic forces that are born out of the use of machine learning systems. Previously, Prof. Haghtalab was an Assistant Professor in the CS department of Cornell University in 2019-2020. She received her Ph.D. from the Computer Science Department of Carnegie Mellon University. She is a co-founder of Learning Theory Alliance (LeT-All). Among her honors are the CMU School of Computer Science Dissertation Award, a SIGecom Dissertation Honorable Mention, and a NeurIPS Outstanding Paper Award.

**Title:** Kernelized Reinforcement Learning

**Abstract:** Reinforcement Learning (RL) has shown great empirical success in various settings with complex models and large state-action spaces. However, the existing analytical results typically focus on settings with a small number of state-actions or simple models, such as linearly modeled state-action value functions. To derive RL policies that efficiently handle large state-action spaces with more general value functions, some recent works have explored nonlinear function approximation using kernel ridge regression. In this talk, we examine existing results in this RL setting, analytical tools, their limitations and some advances towards optimal performance guarantees.
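
To make the kernel-based function approximation concrete, here is a minimal NumPy sketch of kernel ridge regression for a state-action value function, together with a UCB-style exploration bonus of the kind that appears in kernelized RL analyses. The data, kernel, and hyperparameters are illustrative assumptions, not any specific algorithm from the talk.

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf_kernel(X, Y, lengthscale=0.5):
    """Gaussian (RBF) kernel matrix between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * lengthscale ** 2))

# Toy data: state-action feature vectors and noisy regression targets
# standing in for Bellman backups (assumed).
Z = rng.uniform(-1, 1, size=(50, 3))
y = np.sin(Z[:, 0]) + 0.1 * rng.normal(size=50)

# Kernel ridge regression: coefficients solve (K + lam*I) a = y.
lam = 1e-2
K = rbf_kernel(Z, Z)
a = np.linalg.solve(K + lam * np.eye(len(Z)), y)
K_reg_inv = np.linalg.inv(K + lam * np.eye(len(Z)))

def q_hat(z):
    """Estimated state-action value at query point(s) z."""
    return rbf_kernel(np.atleast_2d(z), Z) @ a

def bonus(z):
    """UCB-style exploration bonus: the regularized posterior std at z."""
    k = rbf_kernel(np.atleast_2d(z), Z)
    var = 1.0 - np.einsum('ij,jk,ik->i', k, K_reg_inv, k)
    return np.sqrt(np.maximum(var, 0.0))
```

An optimistic policy would then act greedily with respect to `q_hat(z) + beta * bonus(z)`; the analytical difficulty the talk discusses is bounding the cumulative effect of these bonuses for general kernels.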

**Bio:** Sattar Vakili is a senior AI researcher at MediaTek Research, the research arm of MediaTek, the 4th largest global fabless semiconductor company. He specializes in problems involving sequential decision-making in uncertain environments, with a focus on optimization, bandit and reinforcement learning, kernel-based modeling, and neural networks. Before joining MediaTek Research, Sattar worked at Secondmind.ai, a research lab in Cambridge, UK, led by Professor Carl Rasmussen, Cambridge University. There, he gained expertise in kernel-based and Gaussian process models. Prior to that, he was a postdoc at Princeton University, and he earned his PhD under the supervision of Professor Qing Zhao at Cornell University in 2017.

## Optimal Transport and Machine Learning

**Title:** On the Monge gap and the MBO feature-sparse transport estimator

**Abstract:** This talk will cover two recent works aimed at estimating Monge maps from samples. In the first part (in collaboration with Théo Uscidda) I will present a novel approach to train neural networks so that they mimic Monge maps for the squared-Euclidean cost. In that field, a popular approach has been to parameterize dual potentials using input convex neural networks, and estimate their parameters using SGD and a convex conjugate approximation. We present in this work a regularizer for that task that is conceptually simpler (as it does not require any assumption on the architecture) and which extends to non-Euclidean costs. In the second part (in collaboration with Michal Klein and Pierre Ablin), I will show that when adding to the squared-Euclidean distance an extra translation-invariant cost, the Brenier theorem translates into the application of the proximal mapping of that extra term to the derivative of the dual potential. Using an entropic map to parameterize that potential, we obtain the Monge-Bregman-Occam (MBO) estimator, which has the defining property that its displacement vectors T(x) - x are sparse, resulting in interpretable OT maps in high dimensions.
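
The sparsity mechanism can be illustrated with a toy computation. Assuming the extra translation-invariant cost is tau * ||.||_1 (so the total ground cost is h(x - y) with h = 0.5||.||^2 + tau ||.||_1), the relevant proximal/conjugate-gradient step is elementwise soft-thresholding; applying it to a randomly generated stand-in for the dual potential's gradient zeroes out small coordinates of the displacement:

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

rng = np.random.default_rng(2)
x = rng.normal(size=10)
grad_f = rng.normal(scale=0.5, size=10)  # stand-in for the dual potential's gradient
tau = 0.4

# Generalized Brenier map for cost h(x - y) with h = 0.5||.||^2 + tau||.||_1:
# the map applies grad h* to grad_f, and grad h* here is soft-thresholding,
# so every coordinate of grad_f smaller than tau produces zero displacement.
displacement = -soft_threshold(grad_f, tau)
T_x = x + displacement

print(np.count_nonzero(displacement), "of", displacement.size, "coordinates moved")
```

In high dimensions this is exactly what makes the resulting maps interpretable: each point moves only along the few features whose dual gradient exceeds the threshold.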

**Bio:** Marco Cuturi received his Ph.D. in 11/2005 from Ecole des Mines de Paris. Before that, he graduated from ENSAE with a master's degree from ENS Cachan. He worked as a post-doctoral researcher at the Institute of Statistical Mathematics, Tokyo, between 11/2005 and 03/2007. Between 04/2007 and 09/2008 he worked in the financial industry. After working at the ORFE department of Princeton University between 02/2009 and 08/2010 as a lecturer, he was at the Graduate School of Informatics of Kyoto University between 09/2010 and 09/2016 as an associate professor (tenured in 11/2013). He joined ENSAE in 09/2016, working there part-time from 10/2018. Between 10/2018 and 01/2022 he was with the Google Brain team. He joined Apple in 01/2022, working in the Machine Learning Research team led by Samy Bengio.

**Title:** Gromov-Wasserstein Distances: Statistical and Computational Advancements via a New Duality Theory

**Abstract:** The Gromov-Wasserstein (GW) distance quantifies dissimilarity between metric measure spaces and provides a natural correspondence between them. As such, it serves as a figure of merit for applications involving alignment of heterogeneous datasets, including object matching, multi-modal analysis of biological data, and matching language models. While various computational methods for the GW distance have been developed, formal guarantees for such approaches as well as foundational questions concerning a duality theory and statistical estimation rates remained open. This work closes these gaps for the GW distance with quadratic cost over Euclidean spaces of different dimensions. We focus on the entropically regularized GW (EGW) distance, although similar results apply for the standard (unregularized) version. We derive a novel dual formulation of the EGW problem by representing it as an infimum of certain entropic optimal transportation problems. The dual form enables deriving, for the first time, empirical convergence rates (which are shown to be minimax optimal), sufficient conditions for convexity, and efficient computational methods with global convergence guarantees. Our results serve as a first step towards principled learning and inference methods for heterogeneous data, under the GW framework.
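
For small inputs, the entropic GW problem can be attacked with a standard alternating scheme (a common baseline, not the duality-based approach of the talk): linearize the quadratic GW cost at the current coupling and solve the resulting entropic OT problem with Sinkhorn. A minimal NumPy sketch, with a deliberately large regularizer for numerical stability:

```python
import numpy as np

def sinkhorn(C, p, q, eps, n_iter=200):
    """Entropic OT: coupling with marginals p, q for cost matrix C."""
    K = np.exp(-(C - C.min()) / eps)  # constant shift leaves the solution unchanged
    u = np.ones_like(p)
    for _ in range(n_iter):
        v = q / (K.T @ u)
        u = p / (K @ v)
    return u[:, None] * K * v[None, :]

def entropic_gw(C1, C2, p, q, eps=1.0, n_outer=30):
    """Alternating minimization for entropic GW with quadratic loss:
    linearize the GW objective at the current coupling P, run Sinkhorn."""
    P = np.outer(p, q)
    const = ((C1 ** 2) @ p)[:, None] + ((C2 ** 2) @ q)[None, :]
    for _ in range(n_outer):
        P = sinkhorn(const - 2 * C1 @ P @ C2.T, p, q, eps)
    return P

rng = np.random.default_rng(3)
X, Y = rng.normal(size=(6, 2)), rng.normal(size=(7, 3))  # different dimensions
C1 = np.linalg.norm(X[:, None] - X[None, :], axis=-1)    # intra-space distances
C2 = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
p, q = np.full(6, 1 / 6), np.full(7, 1 / 7)
P = entropic_gw(C1, C2, p, q)
print(P.shape)  # the coupling aligns the two spaces point-by-point
```

Note that only the intra-space distance matrices `C1` and `C2` enter the computation, which is why GW can align spaces of different ambient dimensions; the talk's dual formulation is what supplies convergence and statistical guarantees that schemes like this one lack.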

**Bio:** Ziv Goldfeld is an assistant professor in the School of Electrical and Computer Engineering, and a graduate field member in Computer Science, Statistics, Data Science, and the Center of Applied Mathematics, at Cornell University. Before joining Cornell, he was a postdoctoral research fellow in LIDS at MIT. Ziv graduated with a B.Sc., M.Sc., and Ph.D. (all summa cum laude) in Electrical and Computer Engineering from Ben Gurion University, Israel. Ziv’s research interests include optimal transport theory, statistical learning theory, information theory, and mathematical statistics. Honors include the NSF CAREER Award, the IBM University Award, and the Rothschild Postdoctoral Fellowship.

**Title:** Optimal Transport for Offline Imitation Learning

**Abstract:** With the advent of large datasets, offline reinforcement learning is a promising framework for learning good decision-making policies without the need to interact with the real environment. However, offline RL requires the dataset to be reward-annotated, which presents practical challenges when reward engineering is difficult or when obtaining reward annotations is labor-intensive. We propose an imitation learning algorithm that can automatically relabel offline data of mixed and unknown quality with rewards from a few good demonstrations. The key idea is to use optimal transport to compute an optimal alignment between an unlabeled trajectory in the dataset and an expert demonstration to obtain a similarity measure that can be interpreted as a reward, which can then be used by an offline RL algorithm to learn the policy. This approach is easy to implement and computationally efficient. On D4RL benchmarks, we demonstrate that with a single demonstration this approach can consistently match the performance of offline RL that has access to ground-truth rewards.
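
The reward-relabeling idea can be sketched in a few lines: compute an entropic OT coupling between the states of an unlabeled trajectory and an expert demonstration, then convert each step's transported cost into a (negative) reward. Function names, the regularizer, and the reward scale below are assumptions for illustration, not the paper's exact choices.

```python
import numpy as np

def sinkhorn(C, eps=0.1, n_iter=300):
    """Entropic OT coupling between uniform marginals for cost matrix C."""
    n, m = C.shape
    K = np.exp(-(C - C.min()) / eps)  # shift for numerical stability
    u = np.ones(n) / n
    for _ in range(n_iter):
        v = (np.ones(m) / m) / (K.T @ u)
        u = (np.ones(n) / n) / (K @ v)
    return u[:, None] * K * v[None, :]

def ot_rewards(traj, expert, eps=0.1, scale=5.0):
    """Relabel an unlabeled trajectory with rewards derived from its
    OT alignment to an expert demonstration (sketch)."""
    C = np.linalg.norm(traj[:, None] - expert[None, :], axis=-1)
    P = sinkhorn(C, eps)
    # Per-step reward: negative transport cost of the mass leaving step t.
    return -scale * (P * C).sum(axis=1)

rng = np.random.default_rng(4)
expert = np.cumsum(rng.normal(size=(20, 3)), axis=0)  # toy expert states
good = expert + 0.05 * rng.normal(size=(20, 3))       # near-expert trajectory
bad = 5.0 * rng.normal(size=(20, 3))                  # unrelated trajectory

# Trajectories closer to the expert receive higher (less negative) rewards.
print(ot_rewards(good, expert).sum(), ot_rewards(bad, expert).sum())
```

These per-step rewards can then be handed unchanged to any off-the-shelf offline RL algorithm, which is what makes the approach easy to implement.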

**Bio:** Professor Marc Deisenroth is the DeepMind Chair of Machine Learning and Artificial Intelligence at University College London and the Deputy Director of the UCL Centre for Artificial Intelligence. His research interests center around data-efficient machine learning, probabilistic modeling and autonomous decision making with applications in climate/weather science and robotics.

## Quantum Machine Learning

**Title:** Quantum Machine Learning: A Quantum Information Perspective

**Abstract:** In recent years there have been an increasing number of results where quantum physics has been combined with machine learning for different reasons. On the one hand, quantum computers promise to significantly speed up some of the computational techniques used in machine learning and, on the other hand, “classical” machine learning methods can help us with the verification and classification of complex quantum systems. Moreover, the rich mathematical structure of quantum mechanics can help define new models and learning paradigms. In this talk, we will introduce quantum machine learning in all of these flavors, and then discuss how to bound the accuracy and generalization errors via entropic quantities, such as Rényi generalizations of the quantum mutual information. These bounds establish a link between the compression of information into quantum states and the ability to learn, and allow us to understand how difficult it is, namely how many samples are needed in the worst-case scenario, to learn a quantum classification problem from examples. Finally, we will introduce a quantum version of the Information Bottleneck principle that allows us to explore the various tradeoffs between accuracy and generalization. Different applications will be considered, such as the classification of complex phases of matter or the optimization of quantum embeddings of classical data.
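
For small systems, the entropic quantities behind such bounds can be computed directly. The sketch below evaluates the von Neumann case (the alpha→1 limit of the Rényi family mentioned above) of the quantum mutual information I(A:B) = S(A) + S(B) - S(AB) for a Bell pair in plain NumPy:

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log rho), in nats, via the eigenvalues of rho."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]  # 0 log 0 = 0 convention
    return float(-(evals * np.log(evals)).sum())

def partial_traces(rho_ab, dA, dB):
    """Reduced states rho_A, rho_B of a bipartite density matrix rho_AB."""
    r = rho_ab.reshape(dA, dB, dA, dB)
    rho_a = np.trace(r, axis1=1, axis2=3)  # trace out subsystem B
    rho_b = np.trace(r, axis1=0, axis2=2)  # trace out subsystem A
    return rho_a, rho_b

# Maximally entangled two-qubit (Bell) state.
psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho_ab = np.outer(psi, psi)
rho_a, rho_b = partial_traces(rho_ab, 2, 2)

I = (von_neumann_entropy(rho_a) + von_neumann_entropy(rho_b)
     - von_neumann_entropy(rho_ab))
print(round(I / np.log(2), 3))  # 2.0 bits: maximal for a pair of qubits
```

The pure joint state has zero entropy while both marginals are maximally mixed, so I(A:B) saturates at 2 bits; the generalization bounds in the talk control learning performance via exactly this kind of quantity.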

**Bio:** Leonardo Banchi is an Associate Professor of Theoretical Physics of Matter at the Department of Physics and Astronomy of the University of Florence (Firenze). He received his PhD in Florence and worked as a post-doc at the ISI Foundation (Torino), University College London, and Imperial College London (UK). He also worked as a scientist in industry, at Xanadu Inc. (Toronto, Canada). Leonardo Banchi’s main research interests are quantum algorithms for simulating many-body physics and machine learning, quantum information and communication theory. He currently works on formal and theoretical aspects of quantum machine learning, such as classifying the complexity of learning quantum properties of physical objects directly from data. He has published in several journals including Nature (Reviews) Physics, Nature Computational Science, Nature Communications, npj Quantum Information, Quantum, PRX, PRX Quantum and Physical Review Letters.

**Title:** Generalization in Quantum Machine Learning

**Abstract:** A (machine) learning task is a data-driven process in which a learner observes a data set and uses it to train a machine learning (ML) model to either decode the input-output relationship (supervised learning) or to extract hidden structure/patterns (unsupervised learning). The goal of learning is to infer an ML model that generalizes well, i.e., a model trained on observed data that performs well on new, previously unseen data. In this talk, we introduce and review recent results in the generalization analysis of quantum machine learning (QML) models, including quantum neural networks (QNNs). QML models find increasing application in discriminative (or supervised) learning as well as in generative learning, where, in the latter, a QNN models the probability distribution underlying the observed data. While the generalization performance of discriminative QML models is well-studied by borrowing tools from classical learning theory, the analysis of generative models is still in its infancy in both classical and quantum settings. Challenges include appropriately defining the notion of generalization in the context of generative learning and quantifying it. We give an overview of the state-of-the-art approaches adopted in understanding generalization in quantum discriminative and generative learning and discuss some open problems.

**Bio:** Dr. Sharu Theresa Jose is an Assistant Professor in the Department of Computer Science of the University of Birmingham (UoB), United Kingdom. Her research interests span information theory, statistical learning and quantum computing. Prior to joining UoB, she was a post-doctoral research associate with the Department of Engineering of King’s College London, working with Prof. Osvaldo Simeone. She received her Ph.D. from the Systems and Control Engineering department of IIT Bombay, India in 2018. She is the recipient of the Excellence in Research Award (2019) from IIT Bombay.

**Title:** Information-theoretic upper and lower bounds in quantum machine learning

**Abstract:** A fundamental challenge in any learning problem is to determine the data size necessary and sufficient to solve the task. Recent years have seen significant progress in our understanding of such sample complexities in different quantum machine learning scenarios. In this talk, I will present some of the prevalent information-theoretic strategies for establishing sample complexity upper and lower bounds, with applications in quantum Probably Approximately Correct learning, shadow tomography, classical shadows, and other quantum learning settings.

**Bio:** Matthias C. Caro is a postdoctoral visiting research fellow at the California Institute of Technology and a postdoctoral researcher at Free University Berlin. His research lies at the intersection of quantum information theory and machine learning theory, contributing to a rigorous understanding of the potential and limitations of quantum machine learning. He completed his PhD under the supervision of Michael M. Wolf at the Technical University of Munich, where he worked mainly on statistical properties of variational quantum machine learning models.

**Title:** Dimensionality Reduction in the Presence of Quantum Models and Agents

**Abstract:** Dimensionality reduction captures the spirit of Occam’s razor, asking the fundamental question: What features of current data are the essential indicators of future behaviour? By isolating such features and discarding the rest, we gain a better understanding of underlying causal behaviour, while enabling models with reduced memory costs and enhanced generalisation capability. Could quantum computers offer fundamental advantages in this arena? Here I will outline our ongoing work along these directions in the context of stochastic modelling and quantum-enhanced adaptive agents, demonstrating that machines armed with the capacity to process quantum information offer means of dimensionality reduction beyond provable classical limits. Time permitting, I will discuss what these results imply for the neighbouring fields of thermodynamics and complexity science.

**Bio:** Gu is originally from New Zealand and obtained his Ph.D. at the University of Queensland in 2009. After a faculty position at Tsinghua University's Institute for Interdisciplinary Information Sciences, he joined Nanyang Technological University in 2016 under Singapore’s National Research Foundation fellowship program. He presently leads the quantum and complexity science initiative, focusing on topics bridging quantum, complexity and information science, and holds joint affiliations with the Centre for Quantum Technologies and MajuLab.