
11. Time Series Analysis

ARIMA, Seasonality, Forecasting


Learning Objectives

After completing this tutorial, you will be able to:

  • Understand time series data characteristics (Trend, Seasonality, Stationarity)
  • Perform Time Series Decomposition
  • Execute and interpret Stationarity tests (ADF Test)
  • Understand and apply ARIMA models
  • Select parameters using ACF/PACF
  • Evaluate forecasting performance (RMSE, MAE, MAPE)

Key Concepts

1. What is Time Series Data?

Data ordered in time, with characteristics that set it apart from ordinary (cross-sectional) data.

Characteristic | Description
Order matters | Data order has meaning
Autocorrelation | Past values influence future values

3 Components of Time Series

Component | Description
Trend | Long-term increase/decrease pattern
Seasonality | Periodic repeating pattern
Noise | Irregular variation
# Time series data creation example
import numpy as np
import pandas as pd
 
np.random.seed(42)
dates = pd.date_range(start='2020-01-01', periods=730, freq='D')
 
# Components
trend = np.linspace(100, 200, 730)  # Linear trend
seasonality = 30 * np.sin(2 * np.pi * np.arange(730) / 365)  # Annual seasonality
weekly = 10 * np.sin(2 * np.pi * np.arange(730) / 7)  # Weekly pattern
noise = np.random.normal(0, 10, 730)  # Noise
 
# Final time series
values = trend + seasonality + weekly + noise
df = pd.DataFrame({'date': dates, 'sales': values})
df.set_index('date', inplace=True)
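
To see how the components combine, a quick plot of the result (a minimal sketch, assuming matplotlib is installed):

import matplotlib.pyplot as plt

# Overlay the raw series and its underlying trend
plt.figure(figsize=(12, 4))
plt.plot(df.index, df['sales'], alpha=0.5, label='sales')
plt.plot(df.index, trend, color='red', label='trend')
plt.legend()
plt.show()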

2. Time Series Decomposition

Decompose time series into trend, seasonality, and residuals to analyze each component.

Additive vs Multiplicative Model

Model | Formula | When to Use
Additive | Yt = Tt + St + Rt | When seasonal variation is constant
Multiplicative | Yt = Tt × St × Rt | When seasonal variation grows with the trend
  • T: Trend
  • S: Seasonal
  • R: Residual
from statsmodels.tsa.seasonal import seasonal_decompose
 
# Decompose with additive model
decomposition = seasonal_decompose(df['sales'], model='additive', period=365)
 
# Access each component
print(f'Trend start: {decomposition.trend.dropna().iloc[0]:.2f}')
print(f'Seasonality range: [{decomposition.seasonal.min():.2f}, {decomposition.seasonal.max():.2f}]')
print(f'Residual std: {decomposition.resid.std():.2f}')

Use multiplicative model for data like airline passengers where seasonal variation grows as trend increases.
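
For comparison, the same decomposition with a multiplicative model (a sketch only; our synthetic series happens to be additive, and multiplicative decomposition requires strictly positive values):

# Multiplicative model: Yt = Tt × St × Rt
decomp_mult = seasonal_decompose(df['sales'], model='multiplicative', period=365)

# Seasonal factors are now ratios around 1 instead of offsets around 0
print(f'Seasonal factor range: [{decomp_mult.seasonal.min():.3f}, {decomp_mult.seasonal.max():.3f}]')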


3. Stationarity

The core assumption of time series analysis: statistical properties remain constant over time.

Condition | Description
Constant mean | E[Yt] = μ
Constant variance | Var(Yt) = σ²
Autocovariance | Cov(Yt, Yt−k) = γk (depends only on lag k)
⚠️ Why is Stationarity Important? Most time series models (like ARIMA) assume stationarity. Non-stationary series require transformation.
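
Before a formal test, a quick visual check is to plot rolling statistics: for a stationary series the rolling mean and standard deviation stay roughly flat (a sketch; the 30-day window is an arbitrary choice):

import matplotlib.pyplot as plt

# Rolling mean/std should be roughly constant for a stationary series
rolling_mean = df['sales'].rolling(window=30).mean()
rolling_std = df['sales'].rolling(window=30).std()

plt.figure(figsize=(12, 4))
plt.plot(df['sales'], alpha=0.4, label='Original')
plt.plot(rolling_mean, label='Rolling mean (30d)')
plt.plot(rolling_std, label='Rolling std (30d)')
plt.legend()
plt.show()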

Stationarity Test (ADF Test)

Use ADF (Augmented Dickey-Fuller) test to check stationarity.

from statsmodels.tsa.stattools import adfuller
 
def adf_test(series, name=''):
    result = adfuller(series.dropna(), autolag='AIC')
    print(f'=== ADF Test: {name} ===')
    print(f'ADF Statistic: {result[0]:.4f}')
    print(f'p-value: {result[1]:.4f}')
    print(f'Critical Values:')
    for key, value in result[4].items():
        print(f'  {key}: {value:.4f}')
 
    if result[1] < 0.05:
        print('\n→ Stationary')
    else:
        print('\n→ Non-stationary')
 
adf_test(df['sales'], 'Original Series')
# With the strong upward trend, expect a large p-value here (non-stationary)

Non-stationary → Stationary Transformation

# Work on the example series
series = df['sales']

# First differencing
series_diff = series.diff().dropna()

# Log transform + differencing (when variance grows with the level)
series_log_diff = np.log(series).diff().dropna()
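
After differencing, re-run the ADF test to confirm the series is now stationary (reusing the adf_test helper defined above):

adf_test(series_diff, 'First Difference')
# Differencing removes the trend, so expect a p-value < 0.05 here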

4. ACF and PACF

Key tools for determining ARIMA model parameters.

Metric | Meaning | Formula | ARIMA Application
ACF | Correlation by lag | Corr(Yt, Yt−k) | Determines MA(q) order
PACF | Pure lag correlation | Corr(Yt, Yt−k) with intermediate lag effects removed | Determines AR(p) order
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# ACF and PACF of the differenced series, with 95% confidence bands
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
plot_acf(series_diff, ax=axes[0], lags=50, alpha=0.05)
plot_pacf(series_diff, ax=axes[1], lags=50, alpha=0.05)
plt.tight_layout()
plt.show()

ACF/PACF Interpretation Guide

Pattern | ACF | PACF | Model
AR(p) | Exponential decay | Cut off at lag p | ARIMA(p,d,0)
MA(q) | Cut off at lag q | Exponential decay | ARIMA(0,d,q)
ARMA | Exponential decay | Exponential decay | ARIMA(p,d,q)
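
To inspect the cutoffs numerically instead of by eye, statsmodels also exposes acf and pacf as plain functions (a short sketch):

from statsmodels.tsa.stattools import acf, pacf

# Values for the first 10 lags of the differenced series
acf_vals = acf(series_diff, nlags=10)
pacf_vals = pacf(series_diff, nlags=10)
for lag in range(1, 11):
    print(f'lag {lag:2d}: ACF={acf_vals[lag]:+.3f}  PACF={pacf_vals[lag]:+.3f}')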

5. ARIMA Model

AutoRegressive Integrated Moving Average

ARIMA(p, d, q):

  • p: AR (AutoRegressive) order - Determined by PACF
  • d: Differencing order - Number of differences needed for stationarity
  • q: MA (Moving Average) order - Determined by ACF
from statsmodels.tsa.arima.model import ARIMA
 
# Train/Test split (maintain time order!)
train_size = int(len(df) * 0.8)
train = df['sales'][:train_size]
test = df['sales'][train_size:]
 
# Model creation and training
model = ARIMA(train, order=(2, 1, 2))
model_fit = model.fit()
 
# Forecast
forecast = model_fit.forecast(steps=len(test))
 
# Summary
print(model_fit.summary())
🚫 Caution: Time series data must be split in time order. Random splitting leaks future information into the training set (data leakage)!

Automatic Parameter Selection (Grid Search)

from itertools import product
 
p_values = range(0, 4)
d_values = range(0, 2)
q_values = range(0, 4)
 
results = []
for p, d, q in product(p_values, d_values, q_values):
    try:
        model = ARIMA(train, order=(p, d, q))
        model_fit = model.fit()
        aic = model_fit.aic
        results.append({'Order': f'({p},{d},{q})', 'AIC': aic})
    except Exception:
        continue  # skip orders that fail to fit
 
results_df = pd.DataFrame(results)
print(results_df.nsmallest(5, 'AIC'))  # Lower AIC is better

6. Seasonal ARIMA (SARIMA)

Use SARIMA for data with seasonality.

SARIMA(p, d, q)(P, D, Q, s):

  • (p, d, q): Non-seasonal parameters
  • (P, D, Q, s): Seasonal parameters, s=period
from statsmodels.tsa.statespace.sarimax import SARIMAX
 
model = SARIMAX(train,
                order=(1, 1, 1),
                seasonal_order=(1, 1, 1, 12))  # s=12: monthly data with yearly seasonality
model_fit = model.fit(disp=False)
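
Forecasting works as with ARIMA; get_forecast additionally provides confidence intervals (a sketch forecasting 12 steps ahead):

# Point forecasts plus 95% confidence intervals
forecast_res = model_fit.get_forecast(steps=12)
print(forecast_res.predicted_mean)
print(forecast_res.conf_int(alpha=0.05))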

7. Auto ARIMA

Automatically find optimal parameters with the pmdarima library (installed separately via pip install pmdarima).

from pmdarima import auto_arima
 
auto_model = auto_arima(
    train,
    seasonal=True,
    m=12,  # Seasonal period
    trace=True,
    error_action='ignore',
    suppress_warnings=True
)
print(auto_model.summary())
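
The fitted object forecasts through predict, in pmdarima's scikit-learn-style API (here reusing the test split from section 5):

# Forecast as many steps ahead as the hold-out set
preds = auto_model.predict(n_periods=len(test))
print(preds[:5])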

8. Residual Diagnostics

Good models should have residuals that are white noise.

from statsmodels.stats.diagnostic import acorr_ljungbox
 
residuals = model_fit.resid
 
# Ljung-Box test
lb_result = acorr_ljungbox(residuals, lags=[10, 20, 30], return_df=True)
print(lb_result)
# p-value > 0.05 → no evidence of autocorrelation in the residuals (good!)
💡 Residual diagnostic checklist:

  1. Residuals randomly distributed around 0
  2. Residual ACF not significant
  3. Residuals follow normal distribution (check Q-Q Plot)
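
A quick visual sketch covering all three checks (assumes matplotlib and scipy are available):

import scipy.stats as stats
from statsmodels.graphics.tsaplots import plot_acf

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
# 1. Residuals over time, centered on 0
axes[0].plot(residuals)
axes[0].axhline(0, color='red', linestyle='--')
axes[0].set_title('Residuals')
# 2. Residual ACF: bars should stay inside the confidence band
plot_acf(residuals, ax=axes[1], lags=30)
# 3. Q-Q plot: points should fall near the diagonal if residuals are normal
stats.probplot(residuals, dist='norm', plot=axes[2])
plt.tight_layout()
plt.show()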

9. Moving Average Based Forecasting

Simple but effective baseline models.

# Simple Moving Average (SMA)
df['SMA_7'] = df['sales'].rolling(window=7).mean()
df['SMA_30'] = df['sales'].rolling(window=30).mean()
 
# Exponential Moving Average (EMA) - More weight on recent values
df['EMA_7'] = df['sales'].ewm(span=7).mean()
df['EMA_30'] = df['sales'].ewm(span=30).mean()
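
Under the hood, span=7 corresponds to the smoothing factor α = 2/(span + 1) = 0.25, so specifying alpha directly gives the same series:

# span=7 ⇔ alpha = 2 / (7 + 1) = 0.25
ema_alpha = df['sales'].ewm(alpha=0.25).mean()
print(np.allclose(ema_alpha, df['EMA_7']))  # True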

10. Time Series Cross-Validation

from sklearn.model_selection import TimeSeriesSplit

# Example arrays: time index as the feature, sales as the target
X = np.arange(len(df)).reshape(-1, 1)
y = df['sales'].values

# Each fold trains on an expanding window of past data
tscv = TimeSeriesSplit(n_splits=5)
for train_idx, test_idx in tscv.split(X):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
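
Combined with a model, this yields rolling-origin evaluation: refit on each expanding window and score the forecast on the block that follows (a sketch reusing the ARIMA order from section 5):

from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error

# Refit ARIMA on each expanding window, score the next block
fold_rmse = []
for train_idx, test_idx in tscv.split(df['sales']):
    fold_train = df['sales'].iloc[train_idx]
    fold_test = df['sales'].iloc[test_idx]
    fit = ARIMA(fold_train, order=(2, 1, 2)).fit()
    pred = fit.forecast(steps=len(fold_test))
    fold_rmse.append(np.sqrt(mean_squared_error(fold_test, pred)))

print([round(r, 2) for r in fold_rmse])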

Code Summary

import pandas as pd
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Assumes `series` is a pandas Series indexed by time, e.g. df['sales']

# 1. Stationarity test
result = adfuller(series.dropna())
print(f'ADF p-value: {result[1]:.4f}')
 
# 2. Difference if needed
series_diff = series.diff().dropna()
 
# 3. Data split (maintain time order)
train_size = int(len(series) * 0.8)
train, test = series[:train_size], series[train_size:]
 
# 4. ARIMA model
model = ARIMA(train, order=(2, 1, 2))
model_fit = model.fit()
 
# 5. Forecast
forecast = model_fit.forecast(steps=len(test))
 
# 6. Evaluation
rmse = np.sqrt(mean_squared_error(test, forecast))
mae = mean_absolute_error(test, forecast)
mape = np.mean(np.abs((test - forecast) / test)) * 100  # assumes test has no zeros
 
print(f"RMSE: {rmse:.4f}")
print(f"MAE: {mae:.4f}")
print(f"MAPE: {mape:.2f}%")

Evaluation Metrics

Metric | Formula | Characteristics
RMSE | √(MSE) | Sensitive to large errors
MAE | Mean(|error|) | Robust to outliers; same units as the data
MAPE | Mean(|error / y|) × 100% | Scale-free percentage; undefined when y = 0

Time Series Forecasting Best Practices

Checklist

  1. Data Exploration: Check trend, seasonality, outliers and decompose time series
  2. Ensure Stationarity: Difference/log transform if needed after ADF test
  3. Model Selection: ACF/PACF analysis, AIC/BIC-based parameter selection
  4. Residual Diagnostics: Check residual autocorrelation, normality test
  5. Forecast Evaluation: Time-based split, Rolling window cross-validation

Common Mistakes

Mistake | Correct Approach
Random train/test split | Split by time order
Using future information | Use only past data
Skipping the stationarity test | ADF test required
Single evaluation metric | Comprehensive evaluation with RMSE, MAE, MAPE

Interview Questions Preview

  1. What is stationarity and why is it important?
  2. How do you determine p, d, q for ARIMA?
  3. What are the considerations for Train/Test Split with time series data?
  4. What's the difference between ACF and PACF?
  5. When do you use additive vs multiplicative models?

Check out more interview questions at Premium Interviews.


Practice Notebook

The notebook additionally covers practice with synthetic data and real airline passenger data, various moving average comparisons, residual diagnostic visualization, and practice problems.

