Description

UMAP (Uniform Manifold Approximation and Projection) is a non-linear dimensionality reduction technique. It builds a weighted neighbourhood graph that approximates the manifold on which the high-dimensional data is assumed to lie. It then optimises a low-dimensional layout, typically 2D or 3D for visualisation, whose own neighbourhood graph matches the original as closely as possible. Its theoretical grounding comes from Riemannian geometry and algebraic topology. Compared with t-SNE, UMAP tends to preserve more of the global structure while still retaining local neighbourhood relationships, and it usually runs faster. The resulting cluster layouts are often more interpretable, and the relative placement of clusters carries some meaning. This makes UMAP particularly valuable for exploratory data analysis and for understanding the structure of complex datasets.
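The graph-construction step described above can be sketched in NumPy. This is a simplified illustration, not umap-learn's implementation. It assumes Euclidean distances, brute-force neighbour search, and a hypothetical helper name `smooth_knn_graph`. For each point, it sets rho_i to the distance to the nearest neighbour. It then binary-searches a bandwidth sigma_i so that the memberships exp(-(d_ij - rho_i) / sigma_i) over the k neighbours sum to log2(k). Finally, it merges the directed weights with the fuzzy union w_ij + w_ji - w_ij * w_ji.

```python
import numpy as np


def smooth_knn_graph(X, k=5, n_iter=64):
    """Hypothetical helper: simplified UMAP-style fuzzy neighbourhood graph.

    Assumes Euclidean distances and brute-force k-nearest-neighbour search.
    """
    n = X.shape[0]
    # All pairwise Euclidean distances (O(n^2) memory, fine for small demos).
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    # Indices and distances of the k nearest neighbours, skipping the point itself.
    nbrs = np.argsort(D, axis=1)[:, 1:k + 1]
    knn_d = np.take_along_axis(D, nbrs, axis=1)
    rho = knn_d[:, 0]  # rho_i: distance to the nearest neighbour
    target = np.log2(k)
    W = np.zeros((n, n))
    for i in range(n):
        shifted = np.maximum(knn_d[i] - rho[i], 0.0)
        lo, hi, sigma = 0.0, np.inf, 1.0
        # Binary search for sigma_i so the k memberships sum to log2(k).
        for _ in range(n_iter):
            total = np.exp(-shifted / sigma).sum()
            if total > target:  # bandwidth too wide: shrink it
                hi = sigma
                sigma = (lo + hi) / 2.0
            else:  # bandwidth too narrow: grow it
                lo = sigma
                sigma = sigma * 2.0 if np.isinf(hi) else (lo + hi) / 2.0
        # Directed membership strengths; the nearest neighbour always gets 1.0.
        W[i, nbrs[i]] = np.exp(-shifted / sigma)
    # Fuzzy union symmetrisation: w_ij + w_ji - w_ij * w_ji.
    return W + W.T - W * W.T


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
G = smooth_knn_graph(X, k=5)
print(G.shape, np.allclose(G, G.T))
```

The library itself differs in several ways. It uses approximate nearest-neighbour search and applies extra safeguards, such as a lower bound on sigma_i. Exact weights will therefore differ from this sketch.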

Example Use Cases

Explainability

Analysing single-cell RNA sequencing data to visualise how different cell types cluster based on gene expression patterns, revealing developmental trajectories and identifying previously unknown cell subtypes in tissue samples.

Exploring customer segmentation by reducing hundreds of behavioural and demographic features to two dimensions, showing how different customer groups relate to each other and highlighting transition zones where customers sit between segments and might move from one to another.

Limitations

  • Hyperparameter choices (n_neighbors, min_dist, metric) significantly influence the embedding structure and can lead to very different interpretations of the same data.
  • Although UMAP preserves global structure better than t-SNE, distances in the reduced space still do not directly correspond to distances in the original feature space.
  • Results can be sensitive to the choice of distance metric, and the appropriate metric may not be obvious for complex or mixed data types.
  • Like other manifold learning techniques, UMAP assumes the data lies on a lower-dimensional manifold, which may not hold for all datasets.
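The metric and distance caveats above can be checked empirically with a simple neighbourhood-overlap score. The sketch below uses NumPy only, synthetic data, and hypothetical helper names. For two distance matrices over the same points, it computes the average fraction of each point's k nearest neighbours that the two matrices agree on. Here it compares Euclidean and cosine distances on features with very different scales. A low score means the two metrics induce different neighbourhood graphs, so UMAP would start from different graphs.

```python
import numpy as np


def knn_indices(D, k):
    """Indices of the k nearest neighbours per row of a distance matrix, self excluded."""
    D = D.copy()
    np.fill_diagonal(D, np.inf)  # never count a point as its own neighbour
    return np.argsort(D, axis=1)[:, :k]


def knn_overlap(D1, D2, k=10):
    """Mean fraction of shared k-nearest neighbours between two distance matrices."""
    A, B = knn_indices(D1, k), knn_indices(D2, k)
    shared = [len(set(a) & set(b)) for a, b in zip(A, B)]
    return float(np.mean(shared)) / k


def euclidean(X):
    """Pairwise Euclidean distances."""
    return np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))


def cosine(X):
    """Pairwise cosine distances (1 - cosine similarity)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return 1.0 - Xn @ Xn.T


rng = np.random.default_rng(0)
# Synthetic data: 200 points, 20 features on very different scales.
X = rng.normal(size=(200, 20)) * rng.uniform(0.1, 10.0, size=20)
print(knn_overlap(euclidean(X), cosine(X), k=10))
```

The same function can also compare original-space distances with distances between embedded points. This gives a rough gauge of how much local structure an embedding retained. scikit-learn's `sklearn.manifold.trustworthiness` is an established measure along similar lines.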

Resources

Research Papers

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
Leland McInnes, John Healy, and James Melville, Feb 9, 2018

UMAP (Uniform Manifold Approximation and Projection) is a novel manifold learning technique for dimension reduction. UMAP is constructed from a theoretical framework based in Riemannian geometry and algebraic topology. The result is a practical scalable algorithm that applies to real world data. The UMAP algorithm is competitive with t-SNE for visualization quality, and arguably preserves more of the global structure with superior run time performance. Furthermore, UMAP has no computational restrictions on embedding dimension, making it viable as a general purpose dimension reduction technique for machine learning.

Uniform Manifold Approximation and Projection (UMAP) and its Variants: Tutorial and Survey
Benyamin Ghojogh et al., Aug 25, 2021

Uniform Manifold Approximation and Projection (UMAP) is one of the state-of-the-art methods for dimensionality reduction and data visualization. This is a tutorial and survey paper on UMAP and its variants. We start with UMAP algorithm where we explain probabilities of neighborhood in the input and embedding spaces, optimization of cost function, training algorithm, derivation of gradients, and supervised and semi-supervised embedding by UMAP. Then, we introduce the theory behind UMAP by algebraic topology and category theory. Then, we introduce UMAP as a neighbor embedding method and compare it with t-SNE and LargeVis algorithms. We discuss negative sampling and repulsive forces in UMAP's cost function. DensMAP is then explained for density-preserving embedding. We then introduce parametric UMAP for embedding by deep learning and progressive UMAP for streaming and out-of-sample data embedding.
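The tutorial describes neighbourhood probabilities in the input and embedding spaces and the cost function that links them. A minimal NumPy sketch can illustrate this. Embedding-space similarities follow the curve 1 / (1 + a * d^(2b)). umap-learn fits a and b from `min_dist` and `spread`, whereas here a = b = 1 is an arbitrary illustrative choice. The loss is the fuzzy-set cross-entropy between high-dimensional weights P and embedding weights Q. The toy P and coordinates are made up, and the function names are hypothetical. The real optimiser minimises this objective with stochastic gradient descent and negative sampling rather than evaluating every pair.

```python
import numpy as np


def low_dim_membership(Y, a=1.0, b=1.0):
    """Embedding-space similarities 1 / (1 + a * d^(2b)) for every pair of points."""
    sq = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)  # squared distances d^2
    return 1.0 / (1.0 + a * sq ** b)


def fuzzy_cross_entropy(P, Q, eps=1e-12):
    """Fuzzy-set cross-entropy between high-dimensional weights P and embedding weights Q."""
    Q = np.clip(Q, eps, 1.0 - eps)  # avoid log(0)
    mask = ~np.eye(P.shape[0], dtype=bool)  # ignore self-pairs
    attract = P * np.log(np.clip(P, eps, None) / Q)  # pulls linked points together
    repel = (1.0 - P) * np.log(np.clip(1.0 - P, eps, None) / (1.0 - Q))  # pushes others apart
    return float((attract + repel)[mask].sum())


# Toy high-dimensional weights: points {0, 1} and {2, 3} are strongly linked.
P = np.zeros((4, 4))
P[0, 1] = P[1, 0] = P[2, 3] = P[3, 2] = 1.0

# A layout that keeps linked points together vs. one that splits them.
Y_good = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [10.1, 0.0]])
Y_bad = np.array([[0.0, 0.0], [10.0, 0.0], [0.1, 0.0], [10.1, 0.0]])

loss_good = fuzzy_cross_entropy(P, low_dim_membership(Y_good))
loss_bad = fuzzy_cross_entropy(P, low_dim_membership(Y_bad))
print(loss_good, loss_bad)
```

The layout that separates linked points pays both an attractive penalty (linked pairs far apart) and a repulsive penalty (unlinked pairs close together). This is why it scores a much higher loss.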

Software Packages

umap
Jul 2, 2017

Uniform Manifold Approximation and Projection

Documentation

How UMAP Works — umap 0.5.8 documentation
Umap-learn Developers, Jan 1, 2018
