Topological Data Analysis

Topological data analysis is a new area which seeks to apply methods from topology to study the ”shape” of data sets. It is particularly suited to noisy data sets sitting in potentially high (e.g., 10,000) dimensional spaces, which are none-the-less concentrated around low-dimensional geometric structures that need to be uncovered.

Despite being a new area, topological data analysis has already seen applications in several areas of science and engineering, including oncology, astronomy, neuroscience, image processing, and biophysics.

General information

Instructor:	Shane Kelly
Email:	shane [dot] kelly [dot] uni [at] gmail [dot] com
Webpage:	http://www.mi.fu-berlin.de/users/shanekelly/TopologicalDataAnalysis2017SS.html
University webpage:	http://www.fu-berlin.de/vv/de/lv/365478?query=kelly\&sm=314889
Textbooks:	“Computational topology” by Edelsbrunner and Harer
	“Geometric and Topological Inference”
	by Boissonnat and Chazal and Yvinec
Room:	SR 210/A3 Seminarraum (Arnimallee 3-5)
Time:	Mo 14:00-16:00

About the presentation

This is a student seminar which means that the students each make one of the presentations. The presentation should be about 75 minutes long, leaving 15 minutes for potential questions and discussion.

Students are not required to hand in any written notes. However, students are encouraged to prepare some notes if they feel it will improve the presentation. This should be considered seriously, especially if the student has not made many presentations before.

The material listed below should be considered as a skeleton of the talk, to be padded out with other material from the texts or examples that the student finds interesting, relevant, enlightening, useful, etc.

If you have any questions please feel free to contact me at the email address above.

Overview and schedule

I. 24.04. Introduction

II. 08.05. Topology of manifolds (Stefan)

When developing topological data analysis methods, often one assumes that the data points are a random selection of points from a smooth manifold. Indeed, one of the basic theoretical questions is to what extent one can reconstruct the manifold, given a random selection of points.

On the other hand given a set of points in ℝⁿ, even if they are not a random selection of points from a manifold, a basic TDA technique is to construct from them a shape which is similar to a manifold, and study its topology as though it was a manifold.

This lecture develops the basic material about manifolds that we will need later. It includes the classification of compact 2-manifolds. This demonstrates that quite a lot of information is contained in the homology groups. It also gives an idea of what class of shapes are available within the theory of manifolds.

Define topological space, continuous map; Give examples; Define 2-manifold (without boundary); Define compact; Give an example of a non-compact topological space; State the Classification Theorem for Compact 2-Manifolds; Define triangulation; Define the Euler characteristic and genus of a compact 2-manifold.

III. 15.05. Simplicial Complexes (Mathias)

Simplicial complexes originated as a combinatorial tool, designed to capture the topology of a manifold. The idea is to cover the manifold with small triangles—a triangulation—and then instead of working with the manifold, work with the set of triangles and how they are glued together.

On the other hand, given a set of points in ℝⁿ, there are various ways of choosing a set of triangles which have these points as their verticies, and thus obtaining a topological space to which one can study using topological techniques.

Define the following terms: affine combination, affine hull, affinely independent, convex combination, convex hull, k-simplex, face, boundary. Simplicial complex, dimension, underlying space, polyhedron, triangulation, skeleton, star and link of a simplex, abstract simplicial complex and related terms; State and prove the Geometric Realization Theorem; Define barycentric coordinates, vertex map, simplicial map induced by a vertex map, simplicial homeomorphism; Give the example in Figure III.3; Define subdivision, barycenter, barycentric subdivision, diameter, mesh; State and prove the Mesh Lemma; Define the star condition, simplicial approximation; State and prove the Simplicial Approximation Theorem.

IV. 22.05. Convex Set Systems (Joschka)

The first part of this lecture discusses the relationship between a manifold and the union of a set of balls with centres on the manifold. If the centres and radii are chosen well, the union has the same topological properties as the original manifold.

The second part of this lecture discusses the first two ways we obtain simplicial complexes from a set of points—the Čech complex and the Vietoris-Rips complex. The Čech complex corresponds to replacing each point with a ball. The Čech complex may be two complicated for reasons of space or computational time. The Vietoris-Rips corresponds to replacing each point with a ball, and then filling in all “holes” of dimension > 2. Since it is completely determined by simplicies of dimension ≤ 1, it requires less memory.

State and prove Helly’s Theorem; Define homotopy, retract, deformation retract, homotopy equivalence, homotopy type, homotopy inverse, contractible, the nerve of a finite collection of sets; State the Nerve Theorem. Give at least one example, and one counter example for the Nerve Theorem; Define The Čech complex of a finite set of points. Give at least one example of a Čech complex; Define The Vietoris-Rips complex of a finite set of points; State and prove the Vietoris-Rips Lemma.

V. 29.05. Delaunay and Alpha Complexes (Felicia)

This lecture discusses the next two ways to obtain a simplicial complex from a set of points—the Delaunay and Alpha complexes. Basically, given a set of points S ⊆ ℝ², the verticies of the Delaunay complex are the set S, and three vertices are assigned a triangle if there exists a point equidistant from all three. The alpha complex is similar, but we bound the allowed distance of the equidistant points.

Define Voronoi cell, Voronoi diagram; Give an example of a Voronoi diagram; Define Delaunay complex; Show that when the points are in general position, taking convex hulls gives a geometric realization of the Delaunay complex;

Define the alpha complex; Observe that for a set of points in general position, taking convex hulls gives a geometric realization; Observe that the union of the balls and the Alpha complex have the same homotopy type; Define the weighted alpha complex; Mention its use in modelling biomolecules; Define the filtration by radius on the Delaunay complex. Define elementary collapses, collapses, and observe that they preserve the homotopy type. Define regular event, and critical event, and give an example of both.

VI. 12.06. Homology and Matrix Reduction (Tom)

The first part of this lecture introduces homology—the basic tool from algebraic topology that we will use. The first homology of a simplicial complex is a vector space, whose dimension is the number of sets of three joined edges which are not filled by a triangle, that is, the number of holes of dimension 1 in the associated space. The other homology groups count holes of different sizes.

The second half of this lecture discusses computational aspects of homology groups.

Define p-chain, addition of p-chains, boundary of a p-chain, p-cycle, p-boundary; State and prove the Fundamental Lemma of Homology; Define homology groups, Betti numbers, homology class, homologous; Give the example of a torus in Figure IV.2; Calculate the homology of a single simplex of dimension k; Define the augmentation map, reduced homology groups, reduced Betti numbers, induced map on homology of a simplicial map, mod 2 degree of a map; State and prove Brower’s Fixed Point Theorem.

Define the Euler characteristic of a simplicial complex; Calculate the Euler characteristic using Betti numbers; Define the pth boundary matrix, Smith normal form; Observe that the Smith normal form gives the ranks of the group of boundaries and cycles (Figure IV.5); Show that reduction to Smith normal form takes cubic time, and quadratic memory, in the number of simplicies.

VII. 19.06. Piecewise Linear Morse Theory (William)

The input to persistent homology which is covered in the next lecture, is a sequence of spaces depending on a parameter a ∈ ℝ. One way of obtaining such a sequence is if we have a function f : X → ℝ defined on some space X, and we consider the sequence X_a = f^-1(-∞,a].

The first part of this lecture discusses how to do this with simplicial complexes. The second part introduces the Reeb graph.

The Reeb graph of a continuous function X → ℝ, replaces a space X with a graph, which contains information about the “horizontal” loops in X.

Given a map f from the verticies of a simplicial complex K to ℝ, define the associated piecewise linear function on the geometric realisation |K|; Define when f is generic; Define the lower star, closed lower star, lower star filtration; Prove that the sublevel sets have the same homotopy type as the associated complexes in the lower star filtration; Define the lower link; Define PL regular vertex, simple PL critical vertex, index, minimum, saddle, maximum, PL Morse function; Prove that the alternating sum of the simple PL critial points gives the Euler characteristic; State the PL Morse Inequalities.

Define the Reeb graph of a continuous map f : X → ℝ from a topological space X to ℝ; Present Figure VI.13; State the inequalities for β₀ and β₁ on page 141 and explain heuristically why they are true; Present the algorithm for constructing the Reeb graph of a PL Morse function on the simplicial complex of a triangulated 2-manifold (pp.143-144).

VIII. 26.06. Persistent homology (Tommaso)

Many of the simplicial complexes we have introduced depend on a parameter, r. Persistent homology studies how the homology groups change as we vary r. As homology groups count holes in our space, persistent homology studies how long these holes “persist” for as we vary r.

Define monotonic function on a simplicial complex, the induced filtration on the simplicial complex; Observe that Čech and alpha complexes give examples of this; Define persistent homology groups, persistent Betti numbers, when a class is born, dies, persistence of a class γ, index persistence, persistence diagram, State the Fundamental Lemma of Persistent Homology; Define boundary matrix ∂; Present the reduction algorithm, and observe that the reduced matrix it produces the Betti numbers State and prove the Pairing Lemma; Define positive and negative complexes and explain why they are called positive or negative; Present the example of a 2-simplex (a triangle).

IX. 03.07. Lessons learned: Use topology to understand abstract datasets! (Arne)

General, high dimensional datasets contain a lot of helpful information. Such datasets in particular often doesn’t come with a perfect notion of a distance. This talk will mention some arising problems analyzing such datasets and describe some caveats as well as methods to answer some questions about them without requiring big amounts of labeled data.

X. 10.07. 1-parameter Families (Josué)

Typically, rather than using a single function f : K → ℝ to study the topology of a simplicial complex, one would use a well chosen collection of functions. This lecture discusses computationally efficient ways to calculate the persistent homology of one function from the persistent homology of another.

Define the straight-line homotopy between two monotonic functions; Observe that every f_t is monotonic, and we can find a total order compatible with the partial order; Define the ru-decomposition; Explain the procedure for updating the decomposition at a transposition; Define switches; give two examples of transpositions which are not switches, and three examples of transpositions which are switches; State and prove the Transposition Lemma; Explain the computational time for updating the ru-decomposition.

XI. 17.07. Stability Theorems (Alex)

This lecture discusses how much the persistent homology of two different functions can differ. To this end, one defines the distance between two persistence diagrams, and the distance between two functions, and shows that the former is bounded by the latter.

Define bottleneck distance; Show that it is a metric; Define vine, vineyard, and describe why the vines form polygonal paths that monotonically increase in t; State and prove the Stability Theorem for Filtrations; Define the degree-q Wasserstein distance; describe the relationship to the bottleneck distance, and that it is a metric.

XII. Potential extra talk. Efficient Implementations

Present the sparse matrix version of the Persistence Algorithm and the analysis of its running time; Present the construction of the zeroth persistence diagram and the analysis of its running time; Present the construction of the first persistence diagram and the analysis of its running time.