05. Ensemble Methods
Bagging, Random Forest, Boosting, XGBoost, LightGBM
Learning Objectives
After completing this tutorial, you will be able to:
- Understand the key differences between Bagging and Boosting
- Understand how Random Forest works and experiment with its main parameters
- Understand the mathematics behind Gradient Boosting and how XGBoost implements it
- Compare performance of XGBoost, LightGBM, CatBoost
- Perform hyperparameter tuning with GridSearch, RandomSearch, Optuna
- Analyze and interpret Feature Importance
Key Concepts
1. What is an Ensemble?
Ensemble learning is a technique that combines multiple weak learners to create a strong learner.
"The wisdom of three people is better than one genius" - Collective Intelligence ┌─────────────────────────────────────┐
│ Ensemble Learning │
└─────────────────────────────────────┘
│
┌───────────────┴───────────────┐
▼ ▼
┌───────────────┐ ┌───────────────┐
│ Bagging │ │ Boosting │
│ (Parallel) │ │ (Sequential) │
└───────────────┘ └───────────────┘
│ │
▼ ▼
Random Forest AdaBoost, GBM, XGBoost2. Ensemble Types
| Method | Characteristics | Representative Models |
|---|---|---|
| Bagging | Parallel training, Variance reduction | Random Forest |
| Boosting | Sequential training, Bias reduction | XGBoost, LightGBM |
| Stacking | Meta-model learning | StackingClassifier |
| Voting | Majority vote/Average | VotingClassifier |
3. Bagging (Bootstrap Aggregating)
Principle:
- Bootstrap sampling from original data (sampling with replacement)
- Train independent models on each sample (parallelizable)
- Combine predictions through voting (classification) or averaging (regression)
Mathematical Expression:
$$\hat{f}_{\text{bag}}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}_b(x)$$
where $\hat{f}_b$ is the model trained on the b-th bootstrap sample; for classification, the average is replaced by a majority vote.
Advantages:
- Variance reduction → Overfitting prevention
- Fast training through parallel processing
- Validation possible with OOB (Out-of-Bag) samples
OOB (Out-of-Bag) Insight: About 36.8% of original data is excluded in each Bootstrap sample. Using these OOB samples enables performance evaluation without a separate Validation Set!
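The probability that a given row never appears in a bootstrap sample of size n is (1 - 1/n)^n, which approaches 1/e ≈ 0.368 for large n. A minimal NumPy check of that figure (illustration only):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000  # number of rows in the original dataset

# Bootstrap: draw n indices with replacement
boot_idx = rng.integers(0, n, size=n)

# Rows never drawn are the Out-of-Bag (OOB) samples
oob_fraction = 1 - len(np.unique(boot_idx)) / n
print(f"OOB fraction: {oob_fraction:.3f}")  # ~0.368, i.e. about 36.8% (= 1/e)
```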
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
bagging = BaggingClassifier(
estimator=DecisionTreeClassifier(),
n_estimators=100,
max_samples=0.8,
bootstrap=True,
oob_score=True, # Calculate OOB score
random_state=42
)
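To see the variance-reduction effect in practice, you can compare a single decision tree against the bagging ensemble above with cross-validation. A minimal sketch, assuming a feature matrix `X` and labels `y` are already loaded (any classification dataset works):

```python
from sklearn.model_selection import cross_val_score

# Single deep tree: low bias, high variance
tree_scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=5)

# Bagged trees: similar bias, lower variance across folds
bag_scores = cross_val_score(bagging, X, y, cv=5)

print(f"Single tree: {tree_scores.mean():.3f} (+/- {tree_scores.std():.3f})")
print(f"Bagging:     {bag_scores.mean():.3f} (+/- {bag_scores.std():.3f})")
```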
4. Random Forest
Random Forest = Bagging + Feature Randomness
- Uses randomly selected feature subset when training each tree
- Reduces correlation between trees → Maximizes ensemble effect
Main Parameters:
| Parameter | Description | Default |
|---|---|---|
| n_estimators | Number of trees (more is better, with diminishing returns) | 100 |
| max_features | Number of features considered at each split | sqrt(n_features) |
| max_depth | Maximum tree depth | None |
| min_samples_split | Minimum samples required to split a node | 2 |
| min_samples_leaf | Minimum samples required in a leaf node | 1 |
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(
n_estimators=100, # Number of trees
max_depth=10, # Tree depth
max_features='sqrt', # Random features
min_samples_leaf=2,
n_jobs=-1, # Parallel processing
random_state=42
)
n_estimators Selection Guide: Performance improves as the number of trees grows, but with diminishing returns. Monitor the OOB score to find where the improvement levels off; see the sketch below. Since the OOB score tracks the test score closely, you can select a model without a separate validation set.
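A sketch of that monitoring loop, using `warm_start=True` so the forest keeps its existing trees and only adds new ones at each step (assumes `X_train`, `y_train` are available):

```python
from sklearn.ensemble import RandomForestClassifier

rf_ws = RandomForestClassifier(
    warm_start=True,   # keep already-fitted trees and add more on each fit
    oob_score=True,
    max_features='sqrt',
    n_jobs=-1,
    random_state=42,
)

# Grow the forest in steps and watch where the OOB score plateaus
for n in [25, 50, 100, 200, 400]:
    rf_ws.set_params(n_estimators=n)
    rf_ws.fit(X_train, y_train)
    print(f"n_estimators={n:4d}  OOB score: {rf_ws.oob_score_:.4f}")
```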
OOB (Out-of-Bag) Score
Validation using samples not included in bootstrap:
rf = RandomForestClassifier(oob_score=True)
rf.fit(X_train, y_train)
print(f"OOB Score: {rf.oob_score_}")5. Boosting
A method that sequentially corrects errors of previous models.
Principle:
- Train first model
- Train next model focusing on previous model's errors (Residual)
- Weighted sum of all model predictions
Mathematical Expression:
$$F_M(x) = \sum_{m=1}^{M} \alpha_m h_m(x)$$
where each weak learner $h_m$ is trained to correct the errors left by the ensemble built so far.
Gradient Boosting
Reduces the error by fitting each new learner to the negative gradient of the loss function:
$$F_m(x) = F_{m-1}(x) + \nu \, h_m(x), \qquad h_m(x) \approx -\left[\frac{\partial L\big(y, F(x)\big)}{\partial F(x)}\right]_{F = F_{m-1}}$$
Where:
- $F_m(x)$: Ensemble prediction at step m
- $h_m(x)$: m-th weak learner (learns the residuals / negative gradient)
- $\nu$: Learning rate (shrinkage factor)
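For squared-error loss the negative gradient is simply the residual y - F(x), so the update above amounts to fitting each new tree to the current residuals. A from-scratch sketch for regression, meant only to illustrate the update rule and not how scikit-learn implements it internally (assumes `X`, `y` as NumPy arrays):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_estimators=100, learning_rate=0.1, max_depth=3):
    """Gradient boosting with squared-error loss: each tree fits the current residuals."""
    init = y.mean()                           # F_0: constant initial prediction
    F = np.full(len(y), init, dtype=float)
    trees = []
    for _ in range(n_estimators):
        residuals = y - F                     # negative gradient of 1/2 * (y - F)^2
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        F += learning_rate * tree.predict(X)  # F_m = F_{m-1} + nu * h_m
        trees.append(tree)
    return init, trees

def gradient_boost_predict(init, trees, X, learning_rate=0.1):
    """Sum the shrunken contributions of all trees on top of the initial prediction."""
    return init + learning_rate * sum(tree.predict(X) for tree in trees)
```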
from sklearn.ensemble import GradientBoostingClassifier
gb = GradientBoostingClassifier(
n_estimators=100,
learning_rate=0.1,
max_depth=3,
random_state=42
)
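scikit-learn exposes the intermediate ensembles F_m through `staged_predict`, which makes the sequential error correction visible. A short sketch, assuming held-out `X_test`, `y_test` exist alongside the training data:

```python
from sklearn.metrics import accuracy_score

gb.fit(X_train, y_train)

# Accuracy of the partial ensemble F_m after every 20th boosting stage
for m, y_pred in enumerate(gb.staged_predict(X_test), start=1):
    if m % 20 == 0:
        print(f"stage {m:3d}: accuracy = {accuracy_score(y_test, y_pred):.4f}")
```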
XGBoost
from xgboost import XGBClassifier
xgb = XGBClassifier(
n_estimators=100,
learning_rate=0.1,
max_depth=3,
n_jobs=-1,
tree_method='hist', # Fast training
random_state=42
)
Early Stopping Usage: XGBoost allows setting early stopping with the early_stopping_rounds parameter. This prevents overfitting and automatically finds the optimal number of boosting rounds.
# Early Stopping example
xgb_es = XGBClassifier(
n_estimators=1000, # Set large value
learning_rate=0.1,
early_stopping_rounds=20,
)
xgb_es.fit(
X_train, y_train,
eval_set=[(X_val, y_val)],
verbose=False
)
print(f"Optimal iterations: {xgb_es.best_iteration}")LightGBM
from lightgbm import LGBMClassifier
lgbm = LGBMClassifier(
n_estimators=100,
learning_rate=0.1,
max_depth=-1, # No limit
num_leaves=31,
n_jobs=-1,
random_state=42
)
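LightGBM supports early stopping as well; in recent versions (3.3+) it is passed as a callback rather than a fit argument. A sketch assuming a validation split `X_val`, `y_val` like the XGBoost example above:

```python
import lightgbm as lgb

lgbm_es = LGBMClassifier(n_estimators=1000, learning_rate=0.1, random_state=42)
lgbm_es.fit(
    X_train, y_train,
    eval_set=[(X_val, y_val)],
    callbacks=[lgb.early_stopping(stopping_rounds=20, verbose=False)],
)
print(f"Best iteration: {lgbm_es.best_iteration_}")
```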
6. XGBoost vs LightGBM vs CatBoost Comparison
| Feature | XGBoost | LightGBM | CatBoost |
|---|---|---|---|
| Tree Growth | Level-wise | Leaf-wise | Symmetric (oblivious) |
| Speed | Fast | Very fast | Moderate |
| Categorical Handling | Encoding required | Supported | Strong support |
| Missing Value Handling | Automatic | Automatic | Automatic |
| GPU Support | Yes | Yes | Yes |
| Model | Pros | Cons |
|---|---|---|
| Random Forest | Parallelizable, Overfitting prevention, Easy tuning | Slow prediction |
| XGBoost | High performance, Regularization | Memory usage, Tuning sensitive |
| LightGBM | Fast speed, Large-scale | Leaf-wise overfitting |
| CatBoost | Categorical handling | Slow training |
7. Feature Importance
# Random Forest
importances = rf.feature_importances_
# XGBoost (multiple importance types)
xgb.feature_importances_  # gain-based by default

from xgboost import plot_importance
plot_importance(xgb, importance_type='weight')  # Split count
plot_importance(xgb, importance_type='gain')    # Information gain
plot_importance(xgb, importance_type='cover')   # Coverage
Feature Importance Interpretation Tip: Each model calculates feature importance differently. Combining importances from several models, or cross-checking with a model-agnostic method as in the sketch below, gives a more reliable interpretation.
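One such model-agnostic cross-check is permutation importance: shuffle one feature at a time and measure how much the held-out score drops. A minimal sketch with scikit-learn, assuming the fitted `rf` from above and a held-out `X_test`, `y_test`:

```python
from sklearn.inspection import permutation_importance

result = permutation_importance(
    rf, X_test, y_test,
    n_repeats=10,       # shuffle each feature 10 times
    random_state=42,
    n_jobs=-1,
)

# Features sorted by mean importance (score drop when shuffled)
for idx in result.importances_mean.argsort()[::-1]:
    print(f"feature {idx}: {result.importances_mean[idx]:.4f} "
          f"+/- {result.importances_std[idx]:.4f}")
```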
Code Summary
from sklearn.ensemble import (
RandomForestClassifier,
GradientBoostingClassifier,
VotingClassifier
)
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from sklearn.model_selection import cross_val_score
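# The models below need a dataset; as an illustrative assumption, any built-in
# classification dataset works, e.g. breast cancer:
from sklearn.datasets import load_breast_cancer
X, y = load_breast_cancer(return_X_y=True)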
# Models
models = {
'RF': RandomForestClassifier(n_estimators=100, random_state=42),
'GB': GradientBoostingClassifier(n_estimators=100, random_state=42),
'XGB': XGBClassifier(n_estimators=100, random_state=42),
'LGBM': LGBMClassifier(n_estimators=100, random_state=42)
}
# Performance comparison
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.4f} (+/-{scores.std():.4f})")
Hyperparameter Tuning
GridSearchCV
from sklearn.model_selection import GridSearchCV
param_grid = {
'n_estimators': [100, 200],
'max_depth': [3, 5, 7],
'learning_rate': [0.01, 0.1, 0.2],
'subsample': [0.8, 1.0],
'colsample_bytree': [0.8, 1.0]
}
grid_search = GridSearchCV(
XGBClassifier(random_state=42),
param_grid,
cv=5,
scoring='accuracy',
n_jobs=-1
)
grid_search.fit(X_train, y_train)
print(f"Best parameters: {grid_search.best_params_}")RandomizedSearchCV
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform, randint
param_distributions = {
'n_estimators': randint(50, 500),
'max_depth': randint(2, 10),
'learning_rate': uniform(0.01, 0.3),
'subsample': uniform(0.6, 0.4),
'colsample_bytree': uniform(0.6, 0.4)
}
random_search = RandomizedSearchCV(
XGBClassifier(random_state=42),
param_distributions,
n_iter=50,
cv=5,
n_jobs=-1
)
random_search.fit(X_train, y_train)
Optuna Recommendation: For more efficient hyperparameter tuning, try Optuna. Its default TPE (Tree-structured Parzen Estimator) sampler explores the search space more effectively than grid or random search; see the sketch below.
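A minimal Optuna sketch along those lines (assumes `optuna` is installed and `X_train`, `y_train` are available; the search ranges mirror the RandomizedSearchCV distributions above):

```python
import optuna
from sklearn.model_selection import cross_val_score

def objective(trial):
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 50, 500),
        'max_depth': trial.suggest_int('max_depth', 2, 10),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'subsample': trial.suggest_float('subsample', 0.6, 1.0),
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.6, 1.0),
    }
    model = XGBClassifier(**params, random_state=42, n_jobs=-1)
    return cross_val_score(model, X_train, y_train, cv=5).mean()

study = optuna.create_study(direction='maximize')  # TPE sampler is the default
study.optimize(objective, n_trials=50)
print(f"Best parameters: {study.best_params}")
```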
Voting Ensemble
Combine multiple models for additional performance improvement:
from sklearn.ensemble import VotingClassifier
voting_clf = VotingClassifier(
estimators=[
('rf', RandomForestClassifier(n_estimators=200, random_state=42)),
('xgb', XGBClassifier(n_estimators=200, random_state=42)),
('lgb', LGBMClassifier(n_estimators=200, random_state=42))
],
voting='soft' # Probability-based voting
)
voting_clf.fit(X_train, y_train)
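Stacking, listed in the ensemble-types table earlier, goes one step further than voting: instead of a fixed vote, a meta-model learns how to combine the base models' out-of-fold predictions. A minimal sketch using the same base models:

```python
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression

stacking_clf = StackingClassifier(
    estimators=[
        ('rf', RandomForestClassifier(n_estimators=200, random_state=42)),
        ('xgb', XGBClassifier(n_estimators=200, random_state=42)),
        ('lgb', LGBMClassifier(n_estimators=200, random_state=42)),
    ],
    final_estimator=LogisticRegression(),  # meta-model trained on out-of-fold predictions
    cv=5,
)
stacking_clf.fit(X_train, y_train)
```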
Ensemble Methods Checklist
| Item | Bagging (RF) | Boosting (XGB) |
|---|---|---|
| Training Method | Parallel (Independent) | Sequential (Dependent) |
| Goal | Variance reduction | Bias + Variance reduction |
| Overfitting Risk | Low | High (needs caution) |
| Training Speed | Fast (parallelizable) | Slow (sequential) |
| Tuning Sensitivity | Low | High |
Interview Questions Preview
- What's the difference between Bagging and Boosting?
- What is the role of max_features in Random Forest?
- What are the differences between XGBoost and LightGBM?
Check out more interview questions at Premium Interviews.
Practice Notebook
Additional notebook content: Bootstrap Sampling visualization, Bagging vs Single Tree variability comparison, n_estimators/max_features effect experiments, Boosting step-by-step learning process visualization, California Housing regression performance comparison, Optuna hyperparameter tuning, and practice problems (House Prices regression, Stacking Ensemble, SHAP analysis).
Previous: 04. Decision Tree | Next: 06. Feature Engineering