Think! Evidence

Learning and inference with Wasserstein metrics

dc.contributor Tomaso Poggio.
dc.contributor Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences.
dc.creator Frogner, Charles (Charles Albert)
dc.date 2019-03-01T19:52:20Z
dc.date 2018
dc.date.accessioned 2019-05-10T17:25:40Z
dc.date.available 2019-05-10T17:25:40Z
dc.identifier http://hdl.handle.net/1721.1/120619
dc.identifier 1086609029
dc.identifier.uri https://evidence.thinkportal.org/handle/1721.1/120619
dc.description Thesis: Ph. D., Massachusetts Institute of Technology, Department of Brain and Cognitive Sciences, 2018.
dc.description Cataloged from PDF version of thesis.
dc.description Includes bibliographical references (pages 131-143).
dc.description This thesis develops new approaches for three problems in machine learning, using tools from the study of optimal transport (or Wasserstein) distances between probability distributions. Optimal transport distances capture an intuitive notion of similarity between distributions by incorporating the underlying geometry of the domain of the distributions. Despite their intuitive appeal, optimal transport distances are often difficult to apply in practice, as computing them requires solving a costly optimization problem. In each setting studied here, we describe a numerical method that overcomes this computational bottleneck and enables scaling to real data.
dc.description In the first part, we consider the problem of multi-output learning in the presence of a metric on the output domain. We develop a loss function that measures the Wasserstein distance between the prediction and the ground truth, and describe an efficient learning algorithm based on entropic regularization of the optimal transport problem. We additionally propose a novel extension of the Wasserstein distance from probability measures to unnormalized measures, applicable in settings where the ground truth is not naturally expressed as a probability distribution. We show statistical learning bounds for both the Wasserstein loss and its unnormalized counterpart. The Wasserstein loss can encourage smoothness of the predictions with respect to a chosen metric on the output space. We demonstrate this property on a real-data image tagging problem, outperforming a baseline that does not use the metric.
dc.description In the second part, we consider the probabilistic inference problem for diffusion processes. Such processes model a variety of stochastic phenomena and appear often in continuous-time state space models. Exact inference for diffusion processes is generally intractable.
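The entropic regularization mentioned in the first part is typically computed with Sinkhorn-style matrix scaling. The following is a minimal NumPy sketch of that standard technique, for illustration only; the function name, the 1-D grid, and the regularization value are assumptions of this example, not details taken from the thesis.

```python
import numpy as np

def sinkhorn_cost(a, b, C, reg=0.1, n_iters=200):
    """Entropy-regularized optimal transport cost between histograms a, b
    with ground cost matrix C, via Sinkhorn scaling iterations."""
    K = np.exp(-C / reg)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)                # scale columns to match marginal b
        u = a / (K @ v)                  # scale rows to match marginal a
    P = u[:, None] * K * v[None, :]      # approximate transport plan
    return np.sum(P * C)                 # transport cost under the plan

# Example: two Gaussian-like histograms on a 1-D grid, squared-distance cost
x = np.linspace(0.0, 1.0, 50)
C = (x[:, None] - x[None, :]) ** 2
a = np.exp(-((x - 0.3) ** 2) / 0.01); a /= a.sum()
b = np.exp(-((x - 0.7) ** 2) / 0.01); b /= b.sum()
print(sinkhorn_cost(a, b, C))            # regularized transport cost
```

Because the cost matrix reflects distances on the grid, the value grows with how far mass must move, which is the geometric sensitivity the abstract attributes to the Wasserstein loss.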
dc.description In this work, we describe a novel approximate inference method based on a characterization of the diffusion as following a gradient flow in a space of probability densities endowed with a Wasserstein metric. Existing methods for computing this Wasserstein gradient flow rely on discretizing the underlying domain of the diffusion, prohibiting their application to problems in more than several dimensions. We propose a novel algorithm for computing the Wasserstein gradient flow that operates directly in a space of continuous functions, free of any underlying mesh. We apply our approximate gradient flow to the problem of filtering a diffusion, showing superior performance where standard filters struggle.
dc.description Finally, we study the ecological inference problem: reasoning from aggregate measurements of a population to inferences about the individual behaviors of its members. This problem arises often with data from economics and political science, such as when attempting to infer the demographic breakdown of votes for each political party, given only the aggregate demographic and vote counts separately. Ecological inference is generally ill-posed and requires prior information to single out a unique solution. We propose a novel, general framework for ecological inference that accommodates a variety of priors and enables efficient computation of the most probable solution. Unlike previous methods, which rely on Monte Carlo estimates of the posterior, our inference procedure uses an efficient fixed-point iteration that is linearly convergent. Given suitable prior information, our method can achieve more accurate inferences than existing methods. We additionally explore a sampling algorithm for estimating credible regions.
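The flavor of fixed-point computation used for ecological inference can be illustrated with classical iterative proportional fitting, which adjusts a prior table until it matches observed row and column totals. This is a hedged sketch of the general idea only, not the thesis's actual algorithm; the demographic/party example and all numbers are hypothetical.

```python
import numpy as np

def ipf(prior, row_totals, col_totals, n_iters=100):
    """Iterative proportional fitting: KL-project a positive prior table
    onto tables with the given row and column sums (totals must agree)."""
    T = np.asarray(prior, dtype=float).copy()
    for _ in range(n_iters):
        T *= (row_totals / T.sum(axis=1))[:, None]   # match row sums
        T *= (col_totals / T.sum(axis=0))[None, :]   # match column sums
    return T

# Hypothetical example: 2 demographic groups x 2 parties.
# Observed aggregates: group totals [60, 40], party vote totals [55, 45].
prior = np.array([[2.0, 1.0],
                  [1.0, 2.0]])           # prior belief about voting patterns
T = ipf(prior, np.array([60.0, 40.0]), np.array([55.0, 45.0]))
```

The iteration is a fixed point of two marginal-matching projections and converges linearly for strictly positive priors, which is the kind of behavior the abstract contrasts with Monte Carlo posterior estimates.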
dc.description by Charles Frogner.
dc.description Ph. D.
dc.format 143 pages
dc.format application/pdf
dc.language eng
dc.publisher Massachusetts Institute of Technology
dc.rights MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission.
dc.rights http://dspace.mit.edu/handle/1721.1/7582
dc.subject Brain and Cognitive Sciences.
dc.title Learning and inference with Wasserstein metrics
dc.type Thesis


Files in this item

Name Size Format
1086609029-MIT.pdf 11.42 MB application/pdf
