2 changes: 1 addition & 1 deletion .github/workflows/test_notebooks.yml
@@ -22,4 +22,4 @@ jobs:
pip install jupyter
- name: Execute notebooks
run: |
for f in *.ipynb; do echo "Processing $f file.."; time jupyter nbconvert --TagRemovePreprocessor.enabled=True --TagRemovePreprocessor.remove_cell_tags="['do_not_execute']" --to notebook --ExecutePreprocessor.timeout=600 --inplace --execute $f;done;
for f in tutorials/*.ipynb; do echo "Processing $f file.."; time jupyter nbconvert --TagRemovePreprocessor.enabled=True --TagRemovePreprocessor.remove_cell_tags="['do_not_execute']" --to notebook --ExecutePreprocessor.timeout=600 --inplace --execute $f;done;
30 changes: 15 additions & 15 deletions README.Rmd
@@ -1,36 +1,36 @@
---
title: "Tutorials for Topological Data Analysis with the Gudhi Library"
output:
output:
github_document:
pandoc_args: --webtex
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
echo = FALSE,
echo = FALSE,
fig.align = "center"
)
```

Topological Data Analysis (TDA) is a recent and fast growing field providing a set of new topological and geometric tools to infer relevant features for possibly complex data. Here we propose a set of notebooks for the practice of TDA with the Python Gudhi library together with popular machine learning and data sciences libraries. See for instance [this paper](https://arxiv.org/abs/1710.04019) for an introduction to TDA for data science. The complete list of notebooks can also be found at the end of this page.

## Install Python Gudhi Library
## Install Python Gudhi Library

See the [installation page](https://gudhi.inria.fr/python/latest/installation.html) or, if you use conda, install Gudhi from [conda-forge](https://anaconda.org/conda-forge/gudhi).

## TDA Analysis Pipeline

### 01 - Simplex trees and simplicial complexes

TDA typically aims at extracting topological signatures from a point cloud in $\mathbb{R}^d$ or in a general metric space. By studying the topology of a point cloud, we actually mean studying the topology of the unions of balls centered at the point cloud, also called *offsets*. However, non-discrete sets such as offsets, and also continuous mathematical shapes like curves, surfaces and more generally manifolds, cannot easily be encoded as finite discrete structures. [Simplicial complexes](https://en.wikipedia.org/wiki/Simplicial_complex) are therefore used in computational geometry to approximate such shapes.
TDA typically aims at extracting topological signatures from a point cloud in $\mathbb{R}^d$ or in a general metric space. By studying the topology of a point cloud, we actually mean studying the topology of the unions of balls centered at the point cloud, also called *offsets*. However, non-discrete sets such as offsets, and also continuous mathematical shapes like curves, surfaces and more generally manifolds, cannot easily be encoded as finite discrete structures. [Simplicial complexes](https://en.wikipedia.org/wiki/Simplicial_complex) are therefore used in computational geometry to approximate such shapes.

A simplicial complex is a set of [simplices](https://en.wikipedia.org/wiki/Simplex); simplicial complexes can be seen as higher-dimensional generalizations of graphs. They are mathematical objects that are both topological and combinatorial, a property that makes them particularly useful for TDA. The challenge is to define such structures so that they provably reflect relevant information about the structure of the data and can be effectively constructed and manipulated in practice. Below is an example of a simplicial complex:

```{r simplicial-complex-example}
knitr::include_graphics("Images/Pers14.PNG")
knitr::include_graphics("tutorials/Images/Pers14.PNG")
```
A filtration is an increasing sequence of sub-complexes of a simplicial complex $\mathcal{K}$. It can be seen as ordering the simplices included in the complex $\mathcal{K}$. Indeed, simplicial complexes often come with a specific order, as for [Vietoris-Rips complexes](https://en.wikipedia.org/wiki/Vietoris%E2%80%93Rips_complex), [Cech complexes](https://en.wikipedia.org/wiki/%C4%8Cech_complex) and [alpha complexes](https://en.wikipedia.org/wiki/Alpha_shape#Alpha_complex).

A filtration is an increasing sequence of sub-complexes of a simplicial complex $\mathcal{K}$. It can be seen as ordering the simplices included in the complex $\mathcal{K}$. Indeed, simplicial complexes often come with a specific order, as for [Vietoris-Rips complexes](https://en.wikipedia.org/wiki/Vietoris%E2%80%93Rips_complex), [Cech complexes](https://en.wikipedia.org/wiki/%C4%8Cech_complex) and [alpha complexes](https://en.wikipedia.org/wiki/Alpha_shape#Alpha_complex).

[Notebook: Simplex trees](Tuto-GUDHI-simplex-Trees.ipynb). In Gudhi, filtered simplicial complexes are encoded through a data structure called simplex tree. Vertices are represented as integers, edges as pairs of integers, etc.
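
To make the idea of a simplex tree concrete, here is a minimal pure-Python sketch of a trie that stores simplices as sorted tuples of integer vertices together with a filtration value. It is not Gudhi's implementation: the names `SimplexTrieNode` and `ToySimplexTree`, and the rule of keeping the smallest filtration value seen for each face, are assumptions made for illustration only.

```python
from itertools import combinations


class SimplexTrieNode:
    """One trie node; the path from the root to this node spells out a simplex."""

    def __init__(self):
        self.children = {}      # vertex label -> SimplexTrieNode
        self.filtration = None  # filtration value of the simplex ending here


class ToySimplexTree:
    """Toy filtered simplicial complex stored as a trie (illustrative only)."""

    def __init__(self):
        self.root = SimplexTrieNode()

    def insert(self, simplex, filtration=0.0):
        """Insert a simplex and all its faces, keeping the smallest value seen."""
        vertices = sorted(set(simplex))
        for size in range(1, len(vertices) + 1):
            for face in combinations(vertices, size):
                node = self.root
                for v in face:
                    node = node.children.setdefault(v, SimplexTrieNode())
                if node.filtration is None or filtration < node.filtration:
                    node.filtration = filtration

    def simplices(self):
        """Return (simplex, filtration) pairs ordered by value, then dimension."""
        found = []

        def walk(node, prefix):
            for v, child in node.children.items():
                path = prefix + (v,)
                if child.filtration is not None:
                    found.append((path, child.filtration))
                walk(child, path)

        walk(self.root, ())
        return sorted(found, key=lambda item: (item[1], len(item[0]), item[0]))


tree = ToySimplexTree()
tree.insert([0, 1, 2], filtration=2.0)  # a triangle and all its faces
tree.insert([0, 1], filtration=1.0)     # the edge {0, 1} appears earlier
for simplex, value in tree.simplices():
    print(simplex, value)
```

Because every face of a simplex is inserted together with it and only ever lowered to the smallest value seen, a face never enters the filtration after a simplex that contains it, which is the condition a filtration has to satisfy.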

@@ -42,27 +42,27 @@ knitr::include_graphics("https://gudhi.inria.fr/python/latest/_images/Simplex_tr

[Notebook: Rips and alpha complexes from pairwise distance](Tuto-GUDHI-simplicial-complexes-from-distance-matrix.ipynb). It is also possible to define Rips complexes in general metric spaces from a matrix of pairwise distances. The definition of the metric on the data is usually given as an input or guided by the application. It is however important to notice that the choice of the metric may be critical to reveal interesting topological and geometric features of the data. We also give in this last notebook a way to define alpha complexes from matrix of pairwise distances by first applying a [multidimensional scaling (MDS)](https://en.wikipedia.org/wiki/Multidimensional_scaling) transformation on the matrix.

TDA signatures can be extracted from point clouds, but in many cases in data science the question is to study the topology of the sublevel sets of a function.
TDA signatures can be extracted from point clouds, but in many cases in data science the question is to study the topology of the sublevel sets of a function.

```{r sublevel-sets-example}
knitr::include_graphics("Images/sublevf.png")
knitr::include_graphics("tutorials/Images/sublevf.png")
```

Above is an example for a function defined on a subset of $\mathbb{R}$ but in general the function $f$ is defined on a subset of $\mathbb{R}^d$.
Above is an example for a function defined on a subset of $\mathbb{R}$ but in general the function $f$ is defined on a subset of $\mathbb{R}^d$.

[Notebook: cubical complexes](Tuto-GUDHI-cubical-complexes.ipynb). One first approach for studying the topology of the sublevel sets of a function is to define a regular grid on $\mathbb{R}^d$ and then to define a filtered complex based on this grid and the function $f$.
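
As a rough illustration of the grid-based approach in dimension one, the sketch below computes how the connected components of the sublevel sets of a sampled function appear and merge. It is a pure-Python stand-in and does not use Gudhi's cubical complex class. The function name `sublevel_persistence_0d` is hypothetical, and two conventions are assumed: values sit on grid vertices with each edge entering at the larger of its two endpoint values, and when two components merge the younger one dies.

```python
def sublevel_persistence_0d(values):
    """Birth/death pairs of connected components of sublevel sets on a 1D grid."""
    n = len(values)
    parent = [None] * n  # None means the vertex is not yet in the sublevel set
    birth = {}           # component root -> value at which the component appeared

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    # Sweep the threshold upwards: vertices enter in order of increasing value.
    for i in sorted(range(n), key=lambda k: values[k]):
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Assumed convention: the younger component (larger birth) dies.
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if birth[young] < values[i]:  # skip zero-persistence pairs
                    pairs.append((birth[young], values[i]))
                parent[young] = old
    if n:
        # The component born at the global minimum never dies.
        pairs.append((min(values), float("inf")))
    return sorted(pairs)


samples = [1.0, 3.0, 0.0, 2.0, 0.5]
print(sublevel_persistence_0d(samples))
# → [(0.0, inf), (0.5, 2.0), (1.0, 3.0)]
```

Each local minimum of the sampled function gives birth to a component, and each local maximum between two minima merges two components, which is why the three minima of `samples` yield three pairs.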

### 02 - Persistent homology and persistence diagrams

Homology is a well-known concept in algebraic topology. It provides a powerful tool to formalize and handle the notion of topological features of a topological space or of a simplicial complex in an algebraic way. For any dimension $k$, the $k$-dimensional *holes* are represented by a vector space $H_k$, whose dimension is intuitively the number of such independent features. For example, the $0$-dimensional homology group $H_0$ represents the connected components of the complex, the $1$-dimensional homology group $H_1$ represents the $1$-dimensional loops, the $2$-dimensional homology group $H_2$ represents the $2$-dimensional cavities and so on.
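
The dimensions of the spaces $H_k$ (the Betti numbers) can be computed by hand for small complexes. The sketch below does so with coefficients in $\mathbb{Z}/2$, using $\dim H_k = \dim C_k - \operatorname{rank} \partial_k - \operatorname{rank} \partial_{k+1}$. It is a pure-Python illustration rather than Gudhi's algorithm; the helper names `closure`, `rank_mod2` and `betti_numbers`, and the choice of $\mathbb{Z}/2$ coefficients, are assumptions.

```python
from itertools import combinations


def closure(maximal_simplices):
    """All faces of the given simplices, grouped by dimension."""
    by_dim = {}
    for simplex in maximal_simplices:
        vertices = sorted(set(simplex))
        for size in range(1, len(vertices) + 1):
            for face in combinations(vertices, size):
                by_dim.setdefault(size - 1, set()).add(face)
    return {dim: sorted(faces) for dim, faces in by_dim.items()}


def rank_mod2(columns):
    """Rank over Z/2 of a matrix whose columns are given as integer bitmasks."""
    pivots = {}  # index of highest set bit -> reduced column with that pivot
    for col in columns:
        while col:
            high = col.bit_length() - 1
            if high not in pivots:
                pivots[high] = col
                break
            col ^= pivots[high]  # cancel the leading bit and keep reducing
    return len(pivots)


def betti_numbers(maximal_simplices):
    """Betti numbers b_0, ..., b_top of the complex generated by the simplices."""
    simplices = closure(maximal_simplices)
    top = max(simplices)
    ranks = {}  # k -> rank of the boundary map from k-simplices to (k-1)-simplices
    for k in range(1, top + 1):
        index = {face: i for i, face in enumerate(simplices[k - 1])}
        columns = []
        for simplex in simplices[k]:
            mask = 0
            for face in combinations(simplex, k):  # the codimension-1 faces
                mask |= 1 << index[face]
            columns.append(mask)
        ranks[k] = rank_mod2(columns)
    return [
        len(simplices[k]) - ranks.get(k, 0) - ranks.get(k + 1, 0)
        for k in range(top + 1)
    ]


print(betti_numbers([[0, 1], [1, 2], [0, 2]]))  # hollow triangle → [1, 1]
print(betti_numbers([[0, 1, 2]]))               # filled triangle → [1, 0, 0]
```

The hollow triangle has one connected component and one loop; filling it in with a 2-simplex kills the loop, so only $H_0$ remains non-trivial.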

Persistent homology is a powerful tool to compute, study and encode efficiently multiscale topological features of nested families of simplicial complexes and topological spaces. It encodes the evolution of the homology groups of the nested complexes across the scales. The diagram below shows several level sets of the filtration:

```{r persistence}
knitr::include_graphics("Images/pers.png")
knitr::include_graphics("tutorials/Images/pers.png")
```

[Notebook: persistence diagrams](Tuto-GUDHI-persistence-diagrams.ipynb) In this notebook we show how to compute barcodes and persistence diagrams from a filtration defined on the Protein binding dataset. This tutorial also introduces the bottleneck distance between persistence diagrams.
[Notebook: persistence diagrams](Tuto-GUDHI-persistence-diagrams.ipynb) In this notebook we show how to compute barcodes and persistence diagrams from a filtration defined on the Protein binding dataset. This tutorial also introduces the bottleneck distance between persistence diagrams.

### 03 - Representations of persistence and linearization

@@ -82,7 +82,7 @@ C. Oballe and V. Maroulas provide a [tutorial](https://github.com/coballejr/misc

### 06 - Machine learning and deep learning with TDA

Two libraries related to Gudhi:
Two libraries related to Gudhi:

- [ATOL](https://github.com/martinroyer/atol): Automatic Topologically-Oriented Learning. See [this tutorial](https://github.com/martinroyer/atol/blob/master/demo/atol-demo.ipynb).
- [Perslay](https://github.com/MathieuCarriere/perslay): A Simple and Versatile Neural Network Layer for Persistence Diagrams. See [notebook](Tuto-GUDHI-perslay-visu.ipynb).
@@ -97,7 +97,7 @@ This [notebook](Tuto-GUDHI-DTM-filtrations.ipynb) introduces the distance to mea

### 10 - TDA and dimension reduction

### 11 - Inverse problem and optimization with TDA
### 11 - Inverse problem and optimization with TDA

In this [notebook](Tuto-GUDHI-optimization.ipynb), we will see how Gudhi and Tensorflow can be combined to perform optimization of persistence diagrams to solve an inverse problem.

66 changes: 33 additions & 33 deletions README.md
@@ -39,7 +39,7 @@ structures that are proven to reflect relevant information about the
structure of data and that can be effectively constructed and
manipulated in practice. Below is an example of a simplicial complex:

![simplicial complex example](Images/Pers14.PNG)
![simplicial complex example](tutorials/Images/Pers14.PNG)

A filtration is an increasing sequence of sub-complexes of a simplicial
complex $\mathcal{K}$. It can be seen as ordering the simplices included in
@@ -50,27 +50,27 @@ complexes](https://en.wikipedia.org/wiki/Vietoris%E2%80%93Rips_complex),
[alpha
complexes](https://en.wikipedia.org/wiki/Alpha_shape#Alpha_complex).

[Notebook: Simplex trees](Tuto-GUDHI-simplex-Trees.ipynb). In Gudhi,
[Notebook: Simplex trees](tutorials/Tuto-GUDHI-simplex-Trees.ipynb). In Gudhi,
filtered simplicial complexes are encoded through a data structure
called simplex tree. Vertices are represented as integers, edges as
pairs of integers, etc.

![simplex tree representation](Images/Simplex_tree_representation.png)
![simplex tree representation](tutorials/Images/Simplex_tree_representation.png)

[Notebook: Vietoris-Rips complexes and alpha complexes from data
points](https://github.com/GUDHI/TDA-tutorial/blob/master/Tuto-GUDHI-simplicial-complexes-from-data-points.ipynb).
points](tutorials/Tuto-GUDHI-simplicial-complexes-from-data-points.ipynb).
In practice, the first step of the **TDA Analysis Pipeline** is to define a
filtration of simplicial complexes for some data. This notebook explains
how to build Vietoris-Rips complexes and alpha complexes (represented as
simplex trees) from data points in $\mathbb{R}^d$, using the simplex tree data
structure.


This [Notebook](Tuto-GUDHI-alpha-complex-visualization.ipynb) shows how to visualize simplicial complexes.
This [Notebook](tutorials/Tuto-GUDHI-alpha-complex-visualization.ipynb) shows how to visualize simplicial complexes.


[Notebook: Rips and alpha complexes from pairwise
distance](Tuto-GUDHI-simplicial-complexes-from-distance-matrix.ipynb).
distance](tutorials/Tuto-GUDHI-simplicial-complexes-from-distance-matrix.ipynb).
It is also possible to define Rips complexes in general metric spaces
from a matrix of pairwise distances. The definition of the metric on the
data is usually given as an input or guided by the application. It is
@@ -85,13 +85,13 @@ TDA signatures can be extracted from point clouds but in many cases in data
sciences the question is to study the topology of the sublevel sets of a
function.

![function exemple](Images/sublevf.png)
![function exemple](tutorials/Images/sublevf.png)

Above is an example for a function defined on a subset of
$\mathbb{R}$ but in general the function $f$ is defined on a subset of
$\mathbb{R}^d$.

[Notebook: cubical complexes](Tuto-GUDHI-cubical-complexes.ipynb). One
[Notebook: cubical complexes](tutorials/Tuto-GUDHI-cubical-complexes.ipynb). One
first approach for studying the topology of the sublevel sets of a
function is to define a regular grid on
$\mathbb{R}^d$ and then to define a filtered complex based on this grid and the
@@ -115,26 +115,26 @@ simplicial complexes and topological spaces. It encodes the evolution of
the homology groups of the nested complexes across the scales. The
diagram below shows several level sets of the filtration:

![persistence](Images/pers.png)
![persistence](tutorials/Images/pers.png)

[Notebook: persistence diagrams](https://github.com/GUDHI/TDA-tutorial/blob/master/Tuto-GUDHI-persistence-diagrams.ipynb)
[Notebook: persistence diagrams](tutorials/Tuto-GUDHI-persistence-diagrams.ipynb)
In this notebook we show how to compute barcodes and persistence
diagrams from a filtration defined on the Protein binding dataset. This
tutorial also introduces the bottleneck distance between persistence
diagrams.

### 03 - Representations of persistence and linearization

In this [notebook](Tuto-GUDHI-representations.ipynb), we learn how to
In this [notebook](tutorials/Tuto-GUDHI-representations.ipynb), we learn how to
use alternative representations of persistence with the representations
module and finally we see a first example of how to efficiently combine
machine learning and topological data analysis.

This [notebook](Tuto-GUDHI-Expected-persistence-diagrams.ipynb)
This [notebook](tutorials/Tuto-GUDHI-Expected-persistence-diagrams.ipynb)
illustrates the notion of “Expected Persistence Diagram”, which is a way
to encode the topology of a random process as a deterministic measure.

This [notebook](Tuto-GUDHI-persistent-entropy.ipynb) shows how to summarize
This [notebook](tutorials/Tuto-GUDHI-persistent-entropy.ipynb) shows how to summarize
the information given by persistent homology using persistent entropy (a
number) and the ES-function (a curve) and explains in which situations they
can be useful.
@@ -146,7 +146,7 @@ features close to the diagonal. Since they correspond to topological
structures that die very soon after they appear in the filtration, these
points are generally considered as “topological noise”. Confidence
regions for persistence diagram provide a rigorous framework to this
idea. This [notebook](Tuto-GUDHI-ConfRegions-PersDiag-datapoints.ipynb)
idea. This [notebook](tutorials/Tuto-GUDHI-ConfRegions-PersDiag-datapoints.ipynb)
introduces the subsampling approach of [Fasy et al. 2014
AoS](https://projecteuclid.org/download/pdfview_1/euclid.aos/1413810729).

@@ -167,15 +167,15 @@ Two libraries related to Gudhi:
tutorial](https://github.com/martinroyer/atol/blob/master/demo/atol-demo.ipynb).
- [Perslay](https://github.com/MathieuCarriere/perslay): A Simple and
Versatile Neural Network Layer for Persistence Diagrams. See [this
notebook](Tuto-GUDHI-perslay-visu.ipynb).
notebook](tutorials/Tuto-GUDHI-perslay-visu.ipynb).

### 07 - Alternative filtrations and robust TDA

This [notebook](Tuto-GUDHI-DTM-filtrations.ipynb) introduces the
This [notebook](tutorials/Tuto-GUDHI-DTM-filtrations.ipynb) introduces the
distance to measure (DTM) filtration, as defined in [this
paper](https://arxiv.org/abs/1811.04757). This filtration can be used
for robust TDA. The DTM can also be used for robust approximations of
compact sets, see this [notebook](Tuto-GUDHI-kPDTM-kPLM.ipynb).
compact sets, see this [notebook](tutorials/Tuto-GUDHI-kPDTM-kPLM.ipynb).

### 08 - Topological Data Analysis for Time series

@@ -185,49 +185,49 @@ compact sets, see this [notebook](Tuto-GUDHI-kPDTM-kPLM.ipynb).

### 11 - Inverse problem and optimization with TDA

In this [notebook](Tuto-GUDHI-optimization.ipynb), we will see how Gudhi and
In this [notebook](tutorials/Tuto-GUDHI-optimization.ipynb), we will see how Gudhi and
Tensorflow can be combined to perform optimization of persistence diagrams to
solve an inverse problem. This other, less complete
[notebook](Tuto-GUDHI-PyTorch-optimization.ipynb) shows that this kind of
[notebook](tutorials/Tuto-GUDHI-PyTorch-optimization.ipynb) shows that this kind of
optimization works just as well with PyTorch.

## Complete list of notebooks for TDA

[Simplex trees](Tuto-GUDHI-simplex-Trees.ipynb)
[Simplex trees](tutorials/Tuto-GUDHI-simplex-Trees.ipynb)

[Vietoris-Rips complexes and alpha complexes from data
points](Tuto-GUDHI-simplicial-complexes-from-data-points.ipynb)
points](tutorials/Tuto-GUDHI-simplicial-complexes-from-data-points.ipynb)

[Visualizing simplicial
complexes](Tuto-GUDHI-alpha-complex-visualization.ipynb)
complexes](tutorials/Tuto-GUDHI-alpha-complex-visualization.ipynb)

[Rips and alpha complexes from pairwise
distance](Tuto-GUDHI-simplicial-complexes-from-distance-matrix.ipynb)
distance](tutorials/Tuto-GUDHI-simplicial-complexes-from-distance-matrix.ipynb)

[Cubical complexes](Tuto-GUDHI-cubical-complexes.ipynb)
[Cubical complexes](tutorials/Tuto-GUDHI-cubical-complexes.ipynb)

[Persistence diagrams and bottleneck
distance](Tuto-GUDHI-persistence-diagrams.ipynb)
distance](tutorials/Tuto-GUDHI-persistence-diagrams.ipynb)

[Representations of persistence](Tuto-GUDHI-representations.ipynb)
[Representations of persistence](tutorials/Tuto-GUDHI-representations.ipynb)

[Expected Persistence
Diagram](Tuto-GUDHI-Expected-persistence-diagrams.ipynb)
Diagram](tutorials/Tuto-GUDHI-Expected-persistence-diagrams.ipynb)

[Confidence regions for persistence diagrams - data
points](Tuto-GUDHI-ConfRegions-PersDiag-datapoints.ipynb)
points](tutorials/Tuto-GUDHI-ConfRegions-PersDiag-datapoints.ipynb)

[ATOL
tutorial](https://github.com/martinroyer/atol/blob/master/demo/atol-demo.ipynb)

[Perslay](Tuto-GUDHI-perslay-visu.ipynb)
[Perslay](tutorials/Tuto-GUDHI-perslay-visu.ipynb)

[DTM-filtrations](Tuto-GUDHI-DTM-filtrations.ipynb)
[DTM-filtrations](tutorials/Tuto-GUDHI-DTM-filtrations.ipynb)

[kPDTM-kPLM](Tuto-GUDHI-kPDTM-kPLM.ipynb)
[kPDTM-kPLM](tutorials/Tuto-GUDHI-kPDTM-kPLM.ipynb)

[Inverse problem and optimization with TDA](Tuto-GUDHI-optimization.ipynb)
[Inverse problem and optimization with TDA](tutorials/Tuto-GUDHI-optimization.ipynb)

[PyTorch differentiation of diagrams](Tuto-GUDHI-PyTorch-optimization.ipynb)
[PyTorch differentiation of diagrams](tutorials/Tuto-GUDHI-PyTorch-optimization.ipynb)

Contact : <[email protected]>