Conference: Geometry and Data Analysis

June 8-10, 2015

The University of Chicago


This is a workshop on geometric and topological approaches to statistical inference.  One of the main bottlenecks in the analysis of “big data” is caused by the poor scaling of fitting statistical distributions to high-dimensional data, especially when the data has an intricate correlation structure.  In recent years much work has been done using techniques from differential geometry, geometric analysis, algebraic geometry, and algebraic topology to produce algorithms and gain insight in some circumstances.

Conference Schedule

Invited Speakers

Yuliy Baryshnikov  |  University of Illinois at Urbana-Champaign

Reading Doodles and Scribbles


While most data analysis strives to filter out what looks like noise, we look into what can be parsed – if heuristically – from the jitter.

Mikhail Belkin  |  The Ohio State University

Learning a Hidden Basis from Imperfect Measurements


A number of problems, from the classical spectral theorem for symmetric matrices to such topics of recent interest as Independent Component Analysis, Gaussian mixture learning, spectral clustering and some recent applications of tensor methods for statistical inference, can be recast in terms of learning a hidden orthogonal basis from (potentially noisy) observations.  In this talk I will introduce the problem and propose a “gradient iteration” algorithm for provable basis recovery.  I will describe some of its theoretical properties, including fast (super-linear) convergence and a perturbation analysis which can be viewed as a non-linear generalization of classical perturbation results for matrices.  Unlike most of the existing related work, our approach is based not on matrix or tensorial properties but on a certain underlying “hidden convexity”.  I will discuss applications of these ideas to spectral graph clustering and to Independent Component Analysis.  This is joint work with L. Rademacher and J. Voss.

Omer Bobrowski  |  Duke University

Maximal cycles in random geometric complexes


Random geometric complexes are simplicial complexes with vertices generated by a random point process in a metric space.  Their study is useful in establishing null hypotheses for geometric and topological data analysis methods.  In this talk we will review recent advances in the study of the homology (cycles in various dimensions) of these complexes.  In particular, we will focus on recent work describing the size of the “largest” cycles that can be formed, and discuss the contribution of this analysis to applications in topological data analysis (TDA).  This is joint work with Matthew Kahle (Ohio State University) and Primoz Skraba (Jozef Stefan Institute).

Dima Burago  |  Pennsylvania State University

On discretization and approximation in Metric geometry and PDEs


We will discuss approximations of metric spaces by metric graphs, approximating the Laplace-Beltrami operator on manifolds by Laplacians on weighted graphs, and the convolution metric-measure Laplacian and its stability.  This is joint work with S. Ivanov and Y. Kurylev.

Gunnar Carlsson  |  Stanford University

The Shape of Data


It is starting to be understood that complexity of data is as big a bottleneck in its analysis as is its absolute size.  This argues for the creation of new modeling and representation methods for complex data sets.  One such method is based on topology, the mathematical study of shape.  This method extends topological methods for spaces with complete information to more sampled versions, point clouds.  We will discuss these methods with numerous examples, including several from financial data.  Presented by Gunnar Carlsson and Michael Woods.

Frederic Chazal  |  INRIA

Subsampling methods for persistent homology


Computational topology has recently seen an important development toward data analysis, giving birth to Topological Data Analysis.  Persistent homology appears as a fundamental tool in this field.  It is usually computed from filtrations built on top of data sets sampled from some unknown (metric) space, providing “topological signatures” revealing the structure of the underlying space.  In this talk we will present a few stability and statistical properties of persistence diagrams that allow one to efficiently infer robust and relevant topological information from data.

Rob Ghrist  |  University of Pennsylvania

Algebraic-Topological Data Structures


Recent progress has given numerous examples of spaces whose topology (as measured by homology or cohomology) is of relevance to applications in data, systems, networks, and more.  Concomitant with this has been an explosion of computational tools for computing such.  This talk considers the evolution of these ideas from spaces to algebraic data structures tethered to a space.  The appropriate topological data structure is called a sheaf, and the theory of sheaves yields numerous tools, including cohomology theories, cellular approximations, and more.  This talk will outline the basics in the context of numerous examples, including computational issues.

Ezra Miller  |  Duke University

Persistent interactions between biology, topology, and statistics


Applying persistent homology to biological problems can lead to fresh perspectives on the relevant topology and geometry (and on the underlying biology, too).  The examples here come from two datasets: magnetic resonance images of cerebral arteries and photographic images of fruit fly wings.  The first part of the talk explains what we have learned about the geometry of blood vessels in aging human brains, along with lessons this exploration has taught us about applications of persistent homology in general.  The second part concerns current investigations in evolutionary biology, especially geometric questions that arise from statistical analysis in the context of multiparameter persistence.

Washington Mio  |  Florida State University

The Shape of Data and Probability Measures


We discuss an approach to the shape of data and probability measures that is based on weighted covariance tensor fields (CTFs).  Localized forms of covariance have been used empirically in data analysis, but we present a systematic treatment that includes stability theorems with respect to optimal transport metrics, which ensure that properties of probability measures derived from CTFs are robust, as well as consistency results and convergence rates for empirical CTFs, which guarantee that shape properties can be estimated reliably from data.  Not only are CTFs stable and make the geometry of data more readily accessible, but distributions may be fully recovered from CTFs.  We consider applications to data analysis that include manifold clustering, where the goal is to categorize points according to their noisy membership in a finite union of possibly intersecting smooth manifolds.  We also discuss ways of using CTFs in the analysis of high-dimensional data with low-dimensional structure and in modeling the shape of Euclidean data with Riemannian metrics derived from CTFs.  This is joint work with Diego Diaz Martinez and Facundo Memoli.

Sayan Mukherjee  |  Duke University

Geometry of Mixture Models


In this talk we describe how geometry can be used for modeling mixtures of subspaces as well as analyzing online (stochastic) optimization algorithms.  We introduce a Bayesian model for inferring mixtures of subspaces of different dimensions.  The key challenge in such a model is specifying prior distributions over subspaces of different dimensions.  We address this challenge by embedding subspaces or Grassmann manifolds into a sphere of relatively low dimension and specifying priors on the sphere.  We provide an efficient sampling algorithm for the posterior distribution of the model parameters.  We also prove posterior consistency of our procedure.  The utility of this approach is demonstrated with applications to real and simulated data.  We also present a topic model based on this idea.  We prove the equivalence of two online learning algorithms: 1) mirror descent and 2) natural gradient descent.  The equivalence is a straightforward application of differential geometry.  We discuss how this relation can be used in the context of mixtures of subspaces.  We touch on the idea of information geometry for stratified spaces.  This is joint work with Brian St. Thomas, Lizhen Lin, Lek-Heng Lim, and Garvesh Raskutti.

Hal Schenck  |  University of Illinois at Urbana-Champaign

Trading networks and Hodge theory


Connections can both stabilize networks and provide pathways for contagion.  The central problem in such networks is establishing global behavior from local interactions.  We use the Hodge decomposition introduced by Jiang-Lim-Yao-Ye to study financial networks: starting from the Eisenberg-Noe setup of liabilities and endowments, we construct a network of defaults and quantify the systemic importance of defaults.  This is joint work with Richard Sowers, UIUC Math/ORIE.

Katharine Turner  |  University of Chicago

Reconstruction of compact sets using cone fields


A standard reconstruction problem is how to discover a compact set from a noisy point cloud that approximates it.  When learning manifolds we often use the distance to the closest critical point of the distance function.  For more general compact sets we need to use geometric quantities such as $\mu$-critical points that can cope with corners.  We prove a sufficient condition, as a bound on the Hausdorff distance between two compact sets, for when certain offsets of these two sets are homotopy equivalent in terms of the absence of $\mu$-critical points in an annular region.  We do this by showing the existence of a vector field whose flow provides a deformation retraction.  The ambient space can be any Riemannian manifold, but we focus on ambient manifolds which have nowhere negative curvature (this includes Euclidean space).  In the process, we prove stability theorems for $\mu$-critical points when the ambient space is a manifold.

Michael Woods  |  Ayasdi

The Shape of Data


It is starting to be understood that complexity of data is as big a bottleneck in its analysis as is its absolute size.  This argues for the creation of new modeling and representation methods for complex data sets.  One such method is based on topology, the mathematical study of shape.  This method extends topological methods for spaces with complete information to more sampled versions, point clouds.  We will discuss these methods with numerous examples, including several from financial data.  Presented by Michael Woods and Gunnar Carlsson.

Registration information

Registration has closed.


Hotel information

as of June 6, 2015

The Stevanovich Center is supported by the generous philanthropy of University of Chicago Trustee Steve G. Stevanovich, AB ’85, MBA ’90.