t-SNE
Description
t-SNE (t-Distributed Stochastic Neighbour Embedding) is a non-linear dimensionality reduction technique that creates 2D or 3D visualisations of high-dimensional data by preserving local neighbourhood relationships. The algorithm converts pairwise similarities between data points into joint probabilities in the high-dimensional space, defines corresponding probabilities in the low-dimensional embedding using a heavy-tailed Student t-distribution, and then moves the embedded points to minimise the Kullback-Leibler divergence between the two sets of probabilities. This approach excels at revealing cluster structures and local patterns, making it particularly effective for exploratory data analysis and for understanding complex data relationships that linear methods like PCA might miss.
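The objective described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: it uses one fixed Gaussian bandwidth (`sigma`) for every point, whereas t-SNE tunes a per-point bandwidth from the perplexity setting, and it only evaluates the cost for two hand-made embeddings rather than optimising it by gradient descent. All function names and the toy data are assumptions made for illustration.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def high_dim_joint(points, sigma=1.0):
    """Symmetrised Gaussian affinities P over all pairs.

    Assumption: a single fixed sigma for every point (a simplification;
    t-SNE derives a per-point sigma from the chosen perplexity).
    """
    n = len(points)
    cond = []
    for i in range(n):
        weights = [
            0.0 if i == j
            else math.exp(-squared_distance(points[i], points[j]) / (2 * sigma ** 2))
            for j in range(n)
        ]
        total = sum(weights)
        cond.append([w / total for w in weights])  # conditional p_{j|i}
    # Symmetrise so that P sums to 1 over all ordered pairs.
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]


def low_dim_joint(embedding):
    """Student-t (one degree of freedom) affinities Q in the embedding."""
    n = len(embedding)
    weights = [
        [0.0 if i == j else 1.0 / (1.0 + squared_distance(embedding[i], embedding[j]))
         for j in range(n)]
        for i in range(n)
    ]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q), skipping pairs where P is zero."""
    return sum(
        pij * math.log(pij / max(qij, eps))
        for p_row, q_row in zip(p, q)
        for pij, qij in zip(p_row, q_row)
        if pij > 0
    )


# Toy data: two well-separated groups of three points in 3D.
points = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.1), (0.2, 0.1, 0.0),
    (3.0, 3.0, 3.0), (3.1, 3.0, 2.9), (2.9, 3.1, 3.0),
]
# One 2D layout that keeps each group together, and one that mixes them.
good_embedding = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
bad_embedding = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (5.1, 5.0), (0.0, 0.1), (5.0, 5.1)]

P = high_dim_joint(points)
kl_good = kl_divergence(P, low_dim_joint(good_embedding))
kl_bad = kl_divergence(P, low_dim_joint(bad_embedding))
print(f"KL for neighbour-preserving layout: {kl_good:.4f}")
print(f"KL for mixed-up layout:             {kl_bad:.4f}")
```

The layout that keeps high-dimensional neighbours together yields the smaller divergence. t-SNE's optimiser starts from an initial layout and repeatedly moves the low-dimensional points to reduce this value.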
Example Use Cases
Explainability
Analyzing genomic data with thousands of gene expression features to visualize how different cancer subtypes cluster together, revealing which tumors have similar molecular signatures and potentially similar treatment responses.
Exploring deep learning model embeddings to understand how a neural network represents different categories of images, showing whether the model groups similar objects (cars, animals, furniture) in meaningful clusters in its internal feature space.
Limitations
- Stochastic initialization and optimization produce a different embedding on each run unless the random seed is fixed, making it difficult to reproduce exact visualizations or compare results across studies.
- Prioritizes preserving local neighborhood structure at the expense of global relationships, so distances between clusters and apparent cluster sizes in the plot can give misleading impressions about overall data topology.
- Computationally expensive: the exact algorithm has O(n²) time and memory complexity, making it impractical for datasets with more than ~10,000 points without approximation methods such as Barnes-Hut, which reduces the cost to O(n log n).
- Sensitive to hyperparameter choices (perplexity, learning rate, number of iterations) that can dramatically change the apparent cluster structure and require domain expertise to tune appropriately.
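To make the perplexity hyperparameter more concrete, the sketch below shows one way it can be understood: as a target "effective number of neighbours" that determines the Gaussian bandwidth around a single point. It is a simplified, hedged illustration in plain Python. The function names, the search range for `sigma`, and the toy distances are assumptions, and real implementations differ in detail.

```python
import math


def conditional_probs(sq_dists, sigma):
    """Gaussian conditional probabilities p_{j|i} for one point i.

    sq_dists holds squared distances from point i to every other point.
    The smallest distance is subtracted first so the exponentials do not
    all underflow to zero when sigma is tiny; this leaves the normalised
    probabilities unchanged.
    """
    d_min = min(sq_dists)
    weights = [math.exp(-(d - d_min) / (2 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs):
    """Perplexity = 2 ** (Shannon entropy in bits)."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** entropy


def sigma_for_perplexity(sq_dists, target, lo=1e-3, hi=1e3, steps=100):
    """Bisection (on a log scale) for the sigma whose perplexity hits target.

    Assumption: the answer lies within [lo, hi]. Perplexity grows with
    sigma, so a plain bisection is sufficient.
    """
    for _ in range(steps):
        mid = math.sqrt(lo * hi)  # geometric midpoint
        if perplexity(conditional_probs(sq_dists, mid)) > target:
            hi = mid
        else:
            lo = mid
    return math.sqrt(lo * hi)


# Toy squared distances from one point to its eight neighbours.
sq_dists = [0.1, 0.2, 0.5, 1.0, 4.0, 9.0, 16.0, 25.0]

sigma_low = sigma_for_perplexity(sq_dists, target=2.0)
sigma_high = sigma_for_perplexity(sq_dists, target=5.0)
print(f"sigma for perplexity 2: {sigma_low:.4f}")
print(f"sigma for perplexity 5: {sigma_high:.4f}")
```

A higher perplexity forces a wider bandwidth, so each point treats more of the dataset as its neighbourhood. This is one reason the same data can look tightly clustered at one setting and more diffuse at another.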