
08. Dimensionality Reduction

PCA, t-SNE, UMAP


Learning Objectives

After completing this tutorial, you will be able to:

  • Understand PCA's mathematical principles and intuition
  • Select the number of principal components using explained variance criteria
  • Understand t-SNE principles and hyperparameter tuning
  • Perform practical data visualization and interpretation
  • Select appropriate algorithms for different situations

Key Concepts

1. What is Dimensionality Reduction?

A technique to transform high-dimensional data to lower dimensions while preserving important information.

Purpose                  | Effect
Visualization            | Explore data in 2D/3D
Noise Removal            | Remove unnecessary information
Computational Efficiency | Improve training speed
Overfitting Prevention   | Mitigate the curse of dimensionality

Curse of Dimensionality: As dimensions increase, distances between data points become similar, and the amount of data needed for learning grows exponentially.
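
A small illustration of this effect (a rough sketch using uniform random data; the sample size and dimensions are arbitrary choices): as the dimension grows, every point's nearest and farthest neighbors end up at almost the same distance.

import numpy as np

# Distance concentration: with more dimensions, the ratio between the
# farthest and the nearest neighbor distance of a point approaches 1.
rng = np.random.default_rng(42)
for dim in [2, 10, 100, 1000]:
    X = rng.random((500, dim))                    # 500 random points in [0, 1]^dim
    dists = np.linalg.norm(X[0] - X[1:], axis=1)  # distances from the first point
    print(f"dim={dim:4d}  max/min distance ratio: {dists.max() / dists.min():.2f}")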


2. PCA (Principal Component Analysis)

PCA is a linear dimensionality reduction technique that projects data in the direction that maximizes variance.

Core Idea:

  • Find new axes (principal components) that preserve as much variance as possible
  • First principal component: Direction with largest variance
  • Second principal component: Direction with largest variance while orthogonal to the first
Mathematically, PCA is a linear transformation X → X_pca onto the principal-component axes; a minimal NumPy sketch of this view follows below.
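
The sketch below is a simplified illustration, not scikit-learn's implementation: it finds the principal components as eigenvectors of the covariance matrix, and on centered (scaled) data it should match sklearn's PCA up to sign flips.

import numpy as np

def pca_manual(X, n_components=2):
    """Toy PCA: project X onto the top eigenvectors of its covariance matrix."""
    X_centered = X - X.mean(axis=0)            # center each feature
    cov = np.cov(X_centered, rowvar=False)     # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # largest variance first
    components = eigvecs[:, order[:n_components]]
    explained_ratio = eigvals[order[:n_components]] / eigvals.sum()
    return X_centered @ components, explained_ratio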

Characteristics

  • Linear transformation
  • Orthogonal principal components
  • Ordered by explained variance
  • Invertible (reconstruction possible; see the sketch after the code block below)
⚠️ Scaling Required! Always use StandardScaler before applying PCA; otherwise, features with larger scales dominate the variance and bias the principal components toward them.

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
 
# Scaling (required before PCA!)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
 
# Reduce to 2D
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
 
# Explained variance ratio
print(f"Explained Variance Ratio: {pca.explained_variance_ratio_}")
print(f"Total: {pca.explained_variance_ratio_.sum():.2%}")

Finding Optimal n_components

Determine the optimal number of principal components from the cumulative explained variance plot (scree plot).

import numpy as np
import matplotlib.pyplot as plt

pca_full = PCA()
pca_full.fit(X_scaled)

# Cumulative explained variance plot
cumsum = np.cumsum(pca_full.explained_variance_ratio_)
plt.plot(cumsum, 'o-')
plt.axhline(0.95, color='r', linestyle='--')  # 95% threshold
plt.axhline(0.90, color='orange', linestyle='--')  # 90% threshold
plt.xlabel('Number of Components')
plt.ylabel('Cumulative Explained Variance')
plt.title('Scree Plot')
plt.show()

Rule of Thumb: Generally, select the number of principal components at which cumulative explained variance reaches 90~95%. Run the code above to check the optimal number for your dataset.
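
As a shortcut, scikit-learn's PCA also accepts a float for n_components and keeps the smallest number of components whose cumulative explained variance reaches that fraction:

# Keep the smallest number of components explaining 95% of the variance
pca_95 = PCA(n_components=0.95)
X_pca_95 = pca_95.fit_transform(X_scaled)
print(f"Components kept for 95% variance: {pca_95.n_components_}")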

Principal Component Interpretation (Loading Analysis)

Analyze which original features each principal component relates to through the Loading Matrix.

import pandas as pd
 
# Loading Matrix (relationship between principal components and original features)
loadings = pd.DataFrame(
    pca.components_.T,
    columns=[f'PC{i+1}' for i in range(pca.n_components_)],
    index=feature_names
)
print(loadings.round(3))
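
To read the loading matrix at a glance, you can list the features with the largest absolute loadings per component; a short sketch building on the loadings DataFrame above:

# Top 3 contributing original features per principal component
for pc in loadings.columns:
    top = loadings[pc].abs().sort_values(ascending=False).head(3)
    print(f"{pc}: {', '.join(top.index)}")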

3. t-SNE

t-SNE (t-distributed Stochastic Neighbor Embedding) is a non-linear dimensionality reduction technique specialized for visualization.

Core Idea:

  • Preserve the similarities between points in high-dimensional space when mapping them to low dimensions
  • Can capture non-linear relationships
Parameter     | Description                                                     | Recommended Range
perplexity    | Number of local neighbors                                       | 5-50
n_iter        | Number of iterations (renamed max_iter in recent scikit-learn) | At least 1000
learning_rate | Learning rate                                                   | 10-1000

from sklearn.manifold import TSNE
 
tsne = TSNE(
    n_components=2,
    perplexity=30,
    n_iter=1000,
    random_state=42
)
X_tsne = tsne.fit_transform(X_scaled)

Effect of Perplexity

Perplexity determines the number of neighbors each point considers (compared side by side in the sketch after this list):

  • Small value (5-10): Emphasizes local structure, clusters are more separated
  • Large value (30-50): Emphasizes global structure, more continuous distribution
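
A minimal sketch comparing several perplexity values side by side (it assumes the scaled features X_scaled and labels y used elsewhere in this tutorial; the chosen values and figure size are arbitrary):

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

perplexities = [5, 30, 50]
fig, axes = plt.subplots(1, len(perplexities), figsize=(15, 5))
for ax, perp in zip(axes, perplexities):
    emb = TSNE(n_components=2, perplexity=perp, random_state=42).fit_transform(X_scaled)
    ax.scatter(emb[:, 0], emb[:, 1], c=y, cmap='tab10', alpha=0.7, s=10)
    ax.set_title(f'perplexity={perp}')
plt.show()
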
⚠️ t-SNE Cautions

  • Distance between clusters is meaningless (only relative positions matter)
  • Cluster size is also meaningless
  • Difficult to preserve global structure
  • Sensitive to parameters
  • No transform() (only fit_transform() available)
  • Computationally slow (unsuitable for large-scale data)
  • Results differ each run (random_state fixing needed)

High-dimensional data tip: First reducing to about 50 dimensions with PCA before applying t-SNE significantly improves speed.
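
A minimal sketch of this two-step pipeline (it assumes X_scaled has more than 50 features; otherwise lower the intermediate dimension):

from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Step 1: compress to ~50 dimensions with PCA (denoises and speeds up t-SNE)
X_pca50 = PCA(n_components=50, random_state=42).fit_transform(X_scaled)

# Step 2: run t-SNE on the compressed representation
X_tsne = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(X_pca50)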


4. UMAP

A modern technique that's faster than t-SNE and better preserves global structure.

import umap
 
reducer = umap.UMAP(
    n_components=2,
    n_neighbors=15,
    min_dist=0.1,
    random_state=42
)
X_umap = reducer.fit_transform(X_scaled)
 
# Can transform new data (unlike t-SNE!)
X_new_umap = reducer.transform(X_new)

5. Algorithm Comparison

Feature           | PCA                          | t-SNE                        | UMAP
Type              | Linear                       | Non-linear                   | Non-linear
Goal              | Maximize variance            | Preserve neighbor structure  | Preserve neighbor structure
Speed             | Fast                         | Slow                         | Medium
Global Structure  | Preserved                    | X                            | Preserved
Transform         | O                            | X                            | O
Inverse Transform | O                            | X                            | X
Interpretability  | High (loading)               | Low                          | Low
Use Case          | Preprocessing/Visualization  | Visualization                | Visualization/Preprocessing

Code Summary

from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
import umap
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
 
# Scaling (required!)
X_scaled = StandardScaler().fit_transform(X)
 
# PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
print(f'Explained variance: {sum(pca.explained_variance_ratio_)*100:.1f}%')
 
# t-SNE
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_tsne = tsne.fit_transform(X_scaled)
 
# UMAP
reducer = umap.UMAP(n_components=2, random_state=42)
X_umap = reducer.fit_transform(X_scaled)
 
# Visualization
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for ax, X_reduced, title in zip(axes, [X_pca, X_tsne, X_umap], ['PCA', 't-SNE', 'UMAP']):
    scatter = ax.scatter(X_reduced[:, 0], X_reduced[:, 1], c=y, cmap='tab10', alpha=0.7)
    ax.set_title(title)
plt.show()

Practical Tips & Best Practices

PCA Usage Guide

  1. Preprocessing required: Apply StandardScaler (remove scale effects)
  2. Principal component count selection: Check Elbow/Scree plot, 90~95% cumulative variance criterion
  3. Interpretation: Loading matrix analysis, Biplot visualization (sketched after this list)
  4. Use cases: Visualization (2~3D), Preprocessing (noise removal), Multicollinearity resolution
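
A rough biplot sketch, reusing X_pca, pca, y, and feature_names from the PCA section above (the arrow scale factor is an arbitrary choice for readability):

import matplotlib.pyplot as plt

# Biplot: PCA scores as points, loadings as arrows
fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='tab10', alpha=0.5)
scale = 3  # arbitrary factor so arrows are visible over the scores
for i, name in enumerate(feature_names):
    ax.arrow(0, 0, pca.components_[0, i] * scale, pca.components_[1, i] * scale,
             color='red', head_width=0.1)
    ax.annotate(name, (pca.components_[0, i] * scale, pca.components_[1, i] * scale))
ax.set_xlabel('PC1')
ax.set_ylabel('PC2')
ax.set_title('PCA Biplot')
plt.show()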

t-SNE Usage Guide

  1. Preprocessing: Scaling recommended, first reduce to ~50D with PCA if high-dimensional
  2. Hyperparameters: perplexity 5~50, max_iter at least 1000 (check convergence)
  3. Caution: Distance/size between clusters is meaningless
  4. Use case: Visualization only (unsuitable for preprocessing)

Selection Guide

Situation                        | Recommendation
Preprocessing/Feature extraction | PCA
Visualization (small-scale)      | t-SNE
Visualization (large-scale)      | UMAP
Need to transform new data       | PCA or UMAP
Interpretation needed            | PCA

Interview Questions Preview

  1. What are PCA's principles and principal component selection methods?
  2. What are the differences between t-SNE and UMAP?
  3. What are the considerations when using dimensionality reduction for preprocessing?
  4. Why is scaling needed for PCA?
  5. Can you interpret the distance between clusters in t-SNE results?

Check out more interview questions at Premium Interviews.


Practice Notebook

Additional notebook content:

  • Understanding PCA intuition with 2D data
  • Iris, Digits dataset practice
  • Eigenfaces (face recognition data) visualization
  • Image compression and reconstruction using PCA
  • t-SNE results comparison by Perplexity change
  • Practice problems (Wine, MNIST datasets)

Previous: 07. Clustering | Next: 09. Anomaly Detection