
08. Dimensionality Reduction

PCA, t-SNE, UMAP


Learning Objectives

After completing this tutorial, you will be able to:

  • Understand PCA's mathematical principles and intuition
  • Select the number of principal components using explained variance criteria
  • Understand t-SNE principles and hyperparameter tuning
  • Perform practical data visualization and interpretation
  • Select appropriate algorithms for different situations

Key Concepts

1. What is Dimensionality Reduction?

A technique to transform high-dimensional data to lower dimensions while preserving important information.

Purpose                  | Effect
Visualization            | Explore data in 2D/3D
Noise Removal            | Remove unnecessary information
Computational Efficiency | Improve training speed
Overfitting Prevention   | Mitigate the curse of dimensionality

Curse of Dimensionality: As dimensions increase, distances between data points become similar, and the amount of data needed for learning grows exponentially.
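
A small illustration of this effect (a rough sketch using uniform random data; the sample size and dimensions are arbitrary choices): as the dimension grows, every point's nearest and farthest neighbors end up at almost the same distance.

import numpy as np

# Distance concentration: with more dimensions, the ratio between the
# farthest and the nearest neighbor distance of a point approaches 1.
rng = np.random.default_rng(42)
for dim in [2, 10, 100, 1000]:
    X = rng.random((500, dim))                    # 500 random points in [0, 1]^dim
    dists = np.linalg.norm(X[0] - X[1:], axis=1)  # distances from the first point
    print(f"dim={dim:4d}  max/min distance ratio: {dists.max() / dists.min():.2f}")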


2. PCA (Principal Component Analysis)

PCA is a linear dimensionality reduction technique that projects data in the direction that maximizes variance.

Core Idea:

  • Find new axes (principal components) that preserve as much variance as possible
  • First principal component: Direction with largest variance
  • Second principal component: Direction with largest variance while orthogonal to the first
Mathematically, PCA is a linear transformation X → X_pca onto the principal-component axes; a minimal NumPy sketch of this view follows below.
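
The sketch below is a simplified illustration, not scikit-learn's implementation: it finds the principal components as eigenvectors of the covariance matrix, and on centered (scaled) data it should match sklearn's PCA up to sign flips.

import numpy as np

def pca_manual(X, n_components=2):
    """Toy PCA: project X onto the top eigenvectors of its covariance matrix."""
    X_centered = X - X.mean(axis=0)            # center each feature
    cov = np.cov(X_centered, rowvar=False)     # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # largest variance first
    components = eigvecs[:, order[:n_components]]
    explained_ratio = eigvals[order[:n_components]] / eigvals.sum()
    return X_centered @ components, explained_ratio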

Characteristics

  • Linear transformation
  • Orthogonal principal components
  • Ordered by explained variance
  • Invertible (reconstruction possible; see the sketch after the code block below)
⚠️ Scaling Required! Always use StandardScaler before applying PCA; otherwise, features with larger scales dominate the variance and bias the principal components toward them.

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
 
# Scaling (required before PCA!)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
 
# Reduce to 2D
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
 
# Explained variance ratio
print(f"Explained Variance Ratio: {pca.explained_variance_ratio_}")
print(f"Total: {pca.explained_variance_ratio_.sum():.2%}")

Finding Optimal n_components

Determine the optimal number of principal components from the cumulative explained variance plot (scree plot).

import numpy as np
import matplotlib.pyplot as plt

pca_full = PCA()
pca_full.fit(X_scaled)

# Cumulative explained variance plot
cumsum = np.cumsum(pca_full.explained_variance_ratio_)
plt.plot(cumsum, 'o-')
plt.axhline(0.95, color='r', linestyle='--')  # 95% threshold
plt.axhline(0.90, color='orange', linestyle='--')  # 90% threshold
plt.xlabel('Number of Components')
plt.ylabel('Cumulative Explained Variance')
plt.title('Scree Plot')
plt.show()

Rule of Thumb: Generally, select the number of principal components at which cumulative explained variance reaches 90~95%. Run the code above to check the optimal number for your dataset.
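
As a shortcut, scikit-learn's PCA also accepts a float for n_components and keeps the smallest number of components whose cumulative explained variance reaches that fraction:

# Keep the smallest number of components explaining 95% of the variance
pca_95 = PCA(n_components=0.95)
X_pca_95 = pca_95.fit_transform(X_scaled)
print(f"Components kept for 95% variance: {pca_95.n_components_}")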

Principal Component Interpretation (Loading Analysis)

Analyze which original features each principal component relates to through the Loading Matrix.

import pandas as pd
 
# Loading Matrix (relationship between principal components and original features)
loadings = pd.DataFrame(
    pca.components_.T,
    columns=[f'PC{i+1}' for i in range(pca.n_components_)],
    index=feature_names
)
print(loadings.round(3))
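
To read the loading matrix at a glance, you can list the features with the largest absolute loadings per component; a short sketch building on the loadings DataFrame above:

# Top 3 contributing original features per principal component
for pc in loadings.columns:
    top = loadings[pc].abs().sort_values(ascending=False).head(3)
    print(f"{pc}: {', '.join(top.index)}")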

3. t-SNE

t-SNE (t-distributed Stochastic Neighbor Embedding) is a non-linear dimensionality reduction technique specialized for visualization.

Core Idea:

  • Preserve the similarities between points in high-dimensional space when mapping them to low dimensions
  • Can capture non-linear relationships
Parameter     | Description                                                     | Recommended Range
perplexity    | Number of local neighbors                                       | 5-50
n_iter        | Number of iterations (renamed max_iter in recent scikit-learn) | At least 1000
learning_rate | Learning rate                                                   | 10-1000

from sklearn.manifold import TSNE
 
tsne = TSNE(
    n_components=2,
    perplexity=30,
    n_iter=1000,
    random_state=42
)
X_tsne = tsne.fit_transform(X_scaled)

Effect of Perplexity

Perplexity determines the number of neighbors each point considers (compared side by side in the sketch after this list):

  • Small value (5-10): Emphasizes local structure, clusters are more separated
  • Large value (30-50): Emphasizes global structure, more continuous distribution
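
A minimal sketch comparing several perplexity values side by side (it assumes the scaled features X_scaled and labels y used elsewhere in this tutorial; the chosen values and figure size are arbitrary):

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

perplexities = [5, 30, 50]
fig, axes = plt.subplots(1, len(perplexities), figsize=(15, 5))
for ax, perp in zip(axes, perplexities):
    emb = TSNE(n_components=2, perplexity=perp, random_state=42).fit_transform(X_scaled)
    ax.scatter(emb[:, 0], emb[:, 1], c=y, cmap='tab10', alpha=0.7, s=10)
    ax.set_title(f'perplexity={perp}')
plt.show()
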
⚠️ t-SNE Cautions

  • Distance between clusters is meaningless (only relative positions matter)
  • Cluster size is also meaningless
  • Difficult to preserve global structure
  • Sensitive to parameters
  • No transform() (only fit_transform() available)
  • Computationally slow (unsuitable for large-scale data)
  • Results differ each run (random_state fixing needed)

High-dimensional data tip: First reducing to about 50 dimensions with PCA before applying t-SNE significantly improves speed.
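
A minimal sketch of this two-step pipeline (it assumes X_scaled has more than 50 features; otherwise lower the intermediate dimension):

from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Step 1: compress to ~50 dimensions with PCA (denoises and speeds up t-SNE)
X_pca50 = PCA(n_components=50, random_state=42).fit_transform(X_scaled)

# Step 2: run t-SNE on the compressed representation
X_tsne = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(X_pca50)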


4. UMAP

A modern technique that's faster than t-SNE and better preserves global structure.

import umap
 
reducer = umap.UMAP(
    n_components=2,
    n_neighbors=15,
    min_dist=0.1,
    random_state=42
)
X_umap = reducer.fit_transform(X_scaled)
 
# Can transform new data (unlike t-SNE!)
X_new_umap = reducer.transform(X_new)

5. Algorithm Comparison

Feature           | PCA                          | t-SNE                        | UMAP
Type              | Linear                       | Non-linear                   | Non-linear
Goal              | Maximize variance            | Preserve neighbor structure  | Preserve neighbor structure
Speed             | Fast                         | Slow                         | Medium
Global Structure  | Preserved                    | X                            | Preserved
Transform         | O                            | X                            | O
Inverse Transform | O                            | X                            | X
Interpretability  | High (loading)               | Low                          | Low
Use Case          | Preprocessing/Visualization  | Visualization                | Visualization/Preprocessing

Code Summary

from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
import umap
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
 
# Scaling (required!)
X_scaled = StandardScaler().fit_transform(X)
 
# PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
print(f'Explained variance: {sum(pca.explained_variance_ratio_)*100:.1f}%')
 
# t-SNE
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_tsne = tsne.fit_transform(X_scaled)
 
# UMAP
reducer = umap.UMAP(n_components=2, random_state=42)
X_umap = reducer.fit_transform(X_scaled)
 
# Visualization
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for ax, X_reduced, title in zip(axes, [X_pca, X_tsne, X_umap], ['PCA', 't-SNE', 'UMAP']):
    scatter = ax.scatter(X_reduced[:, 0], X_reduced[:, 1], c=y, cmap='tab10', alpha=0.7)
    ax.set_title(title)
plt.show()

Practical Tips & Best Practices

PCA Usage Guide

  1. Preprocessing required: Apply StandardScaler (remove scale effects)
  2. Principal component count selection: Check Elbow/Scree plot, 90~95% cumulative variance criterion
  3. Interpretation: Loading matrix analysis, Biplot visualization (sketched after this list)
  4. Use cases: Visualization (2~3D), Preprocessing (noise removal), Multicollinearity resolution
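
A rough biplot sketch, reusing X_pca, pca, y, and feature_names from the PCA section above (the arrow scale factor is an arbitrary choice for readability):

import matplotlib.pyplot as plt

# Biplot: PCA scores as points, loadings as arrows
fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='tab10', alpha=0.5)
scale = 3  # arbitrary factor so arrows are visible over the scores
for i, name in enumerate(feature_names):
    ax.arrow(0, 0, pca.components_[0, i] * scale, pca.components_[1, i] * scale,
             color='red', head_width=0.1)
    ax.annotate(name, (pca.components_[0, i] * scale, pca.components_[1, i] * scale))
ax.set_xlabel('PC1')
ax.set_ylabel('PC2')
ax.set_title('PCA Biplot')
plt.show()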

t-SNE Usage Guide

  1. Preprocessing: Scaling recommended, first reduce to ~50D with PCA if high-dimensional
  2. Hyperparameters: perplexity 5~50, max_iter at least 1000 (check convergence)
  3. Caution: Distance/size between clusters is meaningless
  4. Use case: Visualization only (unsuitable for preprocessing)

Selection Guide

Situation                        | Recommendation
Preprocessing/Feature extraction | PCA
Visualization (small-scale)      | t-SNE
Visualization (large-scale)      | UMAP
Need to transform new data       | PCA or UMAP
Interpretation needed            | PCA

Interview Questions Preview

  1. What are PCA's principles and principal component selection methods?
  2. What are the differences between t-SNE and UMAP?
  3. What are the considerations when using dimensionality reduction for preprocessing?
  4. Why is scaling needed for PCA?
  5. Can you interpret the distance between clusters in t-SNE results?

Check out more interview questions at Premium Interviews.


Practice Notebook

Additional notebook content:

  • Understanding PCA intuition with 2D data
  • Iris, Digits dataset practice
  • Eigenfaces (face recognition data) visualization
  • Image compression and reconstruction using PCA
  • t-SNE results comparison by Perplexity change
  • Practice problems (Wine, MNIST datasets)

Previous: 07. Clustering | Next: 09. Anomaly Detection