Electricity Forecasting Framework
Overview
This skill provides end-to-end support for electricity load/demand forecasting projects, from data preprocessing to model deployment. It covers traditional statistical methods, modern machine learning approaches, and state-of-the-art deep learning architectures.
Quick Start
1. Define Your Forecasting Task
| Horizon | Type | Typical Use |
|---|---|---|
| 1-48 hours | Short-term (STLF) | Grid operations, unit commitment |
| 1 week - 1 month | Medium-term | Maintenance scheduling, fuel planning |
| 1-12 months | Long-term (LTLF) | Capacity planning, infrastructure investment |
2. Prepare Your Data
CODEBLOCK0
Required data columns:
- timestamp: Datetime index (hourly or sub-hourly)
- load: Target variable (MW or kWh)
- temperature: Weather feature (°C)
- Optional: humidity, wind_speed, solar_radiation, holiday_flag
3. Select Your Model
See references/model-selection.md for detailed guidance.
Quick recommendation:
- Baseline: Start with persistence or seasonal naive
- Production STLF: Use XGBoost or LightGBM with weather features
- Research/SOTA: Try Temporal Fusion Transformer (TFT) or iTransformer
4. Train and Evaluate
CODEBLOCK1
Key metrics to track:
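- MAPE (%): Mean Absolute Percentage Error - easy for business stakeholders to interpret, but unstable when actual load is close to zero
- RMSE (MW): Root Mean Square Error - penalizes large errors
- MAE (MW): Mean Absolute Error - less sensitive to outliers than RMSE
- Coverage (%): Prediction interval coverage probability, i.e. the share of actual values that fall inside the predicted interval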
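As a rough illustration of these four metrics, the sketch below computes them with NumPy. The function name `forecast_metrics` and the toy numbers are assumptions made for this example; they are not taken from this skill's scripts.

```python
import numpy as np


def forecast_metrics(actual, predicted, lower=None, upper=None):
    """Return MAPE (%), RMSE, MAE and, if bounds are given, interval coverage (%)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    error = actual - predicted

    metrics = {
        # MAPE divides by the actual load, so it is undefined when load is zero.
        "mape": float(np.mean(np.abs(error / actual)) * 100.0),
        "rmse": float(np.sqrt(np.mean(error ** 2))),
        "mae": float(np.mean(np.abs(error))),
    }
    if lower is not None and upper is not None:
        lower = np.asarray(lower, dtype=float)
        upper = np.asarray(upper, dtype=float)
        # Coverage: share of actual values inside [lower, upper].
        inside = (actual >= lower) & (actual <= upper)
        metrics["coverage"] = float(np.mean(inside) * 100.0)
    return metrics


# Toy example with made-up loads in MW.
actual = [100.0, 200.0, 400.0, 500.0]
predicted = [110.0, 190.0, 400.0, 450.0]
example = forecast_metrics(
    actual,
    predicted,
    lower=[90.0, 195.0, 380.0, 440.0],
    upper=[120.0, 210.0, 420.0, 490.0],
)
```

In this toy case the single 50 MW miss dominates RMSE far more than MAE, which is the practical difference between the two.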
Core Workflows
Data Preprocessing
1. Load raw data with proper datetime parsing
2. Handle missing values: forward-fill short gaps, interpolate longer ones
3. Feature engineering:
   - Temporal: hour, day_of_week, month, is_weekend, is_holiday
   - Lag features: load_t-1, load_t-24, load_t-168 (weekly)
   - Rolling stats: rolling_mean_24h, rolling_std_7d
   - Weather: temperature, humidity, apparent_temperature
4. Normalization: RobustScaler or MinMaxScaler for deep learning models
See references/feature-engineering.md for complete feature list.
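The preprocessing steps could look roughly like the pandas sketch below. It assumes an hourly `DatetimeIndex` and a `load` column; the function name `build_features`, the 3-hour cutoff between "short" and "long" gaps, and the synthetic data are illustrative choices, not values prescribed by this skill. Holiday flags and scaling are left out because they depend on a calendar and on the model.

```python
import numpy as np
import pandas as pd


def build_features(df, short_gap=3):
    """Fill gaps and add temporal, lag, and rolling features to an hourly frame."""
    out = df.copy()

    # Measure each run of consecutive missing values.
    is_na = out["load"].isna()
    gap_id = (~is_na).cumsum()  # constant within one run of NaNs
    gap_len = is_na.groupby(gap_id).transform("sum")
    short = is_na & (gap_len <= short_gap)

    # Short gaps: forward-fill. Longer gaps: time-based interpolation.
    filled = out["load"].ffill()
    interpolated = out["load"].interpolate(method="time")
    out["load"] = interpolated.where(~short, filled)

    # Temporal features.
    out["hour"] = out.index.hour
    out["day_of_week"] = out.index.dayofweek
    out["month"] = out.index.month
    out["is_weekend"] = (out.index.dayofweek >= 5).astype(int)

    # Lag features: previous hour, same hour yesterday, same hour last week.
    for lag in (1, 24, 168):
        out[f"load_t-{lag}"] = out["load"].shift(lag)

    # Rolling stats on the shifted series, so the value at time t never
    # includes load_t itself (avoids target leakage).
    past = out["load"].shift(1)
    out["rolling_mean_24h"] = past.rolling(24).mean()
    out["rolling_std_7d"] = past.rolling(168).std()
    return out


# Synthetic hourly data with one short and one long gap.
index = pd.date_range("2024-01-01", periods=400, freq=pd.Timedelta(hours=1))
hours = np.arange(400)
raw = pd.DataFrame({"load": 1000.0 + 100.0 * np.sin(2 * np.pi * hours / 24)}, index=index)
raw.iloc[10:12, 0] = np.nan  # 2-hour gap, forward-filled
raw.iloc[50:60, 0] = np.nan  # 10-hour gap, interpolated
features = build_features(raw)
```

The first 168 rows contain NaN in the weekly lag and 7-day rolling columns; they would normally be dropped before training.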
Model Training
CODEBLOCK2
Hyperparameter Tuning
Use scripts/hyperparameter_search.py for automated tuning:
CODEBLOCK3
Uncertainty Quantification
For risk-aware decision making:
- Quantile Regression: Predict multiple quantiles (0.1, 0.5, 0.9)
- Conformal Prediction: Distribution-free uncertainty bounds
- Ensemble Methods: Model disagreement as uncertainty proxy
- Monte Carlo Dropout: For neural networks
See references/uncertainty.md for implementation details.
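One common reading of the conformal prediction bullet is split conformal prediction: hold out a calibration set, take a high quantile of the absolute residuals, and add that margin on both sides of each new point forecast. The sketch below assumes symmetric intervals and roughly exchangeable residuals, which holds only approximately for load series. The function names are hypothetical and not taken from references/uncertainty.md.

```python
import math

import numpy as np


def conformal_margin(cal_actual, cal_predicted, alpha=0.1):
    """Half-width of a (1 - alpha) split-conformal interval from calibration residuals."""
    residuals = np.abs(
        np.asarray(cal_actual, dtype=float) - np.asarray(cal_predicted, dtype=float)
    )
    n = residuals.size
    # Finite-sample rank: the ceil((n + 1) * (1 - alpha))-th smallest residual.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points for this alpha
    return float(np.sort(residuals)[rank - 1])


def conformal_interval(point_forecast, margin):
    """Return (lower, upper) arrays around the point forecast."""
    point_forecast = np.asarray(point_forecast, dtype=float)
    return point_forecast - margin, point_forecast + margin


# Toy calibration set: absolute residuals of 1, 2, ..., 20 MW.
cal_predicted = [100.0] * 20
cal_actual = [100.0 + k * (-1) ** k for k in range(1, 21)]
margin = conformal_margin(cal_actual, cal_predicted, alpha=0.1)
lower, upper = conformal_interval([500.0, 600.0], margin)
```

Because the margin is a single constant, these intervals do not widen at peak hours; quantile regression is the usual choice when the interval width should vary with conditions.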
Model Reference
Statistical Models
| Model | Best For | Pros | Cons |
|---|---|---|---|
| ARIMA | Stable series | Interpretable, fast | Assumes linearity |
| SARIMA | Strong seasonality | Captures daily/weekly patterns | Manual parameter tuning |
| Prophet | Multiple seasonalities | Handles holidays well | Less accurate for STLF |
| TBATS | Complex seasonality | Automatic parameter selection | Slower training |
Machine Learning Models
| Model | Best For | Pros | Cons |
|---|---|---|---|
| XGBoost | Production STLF | Fast, accurate, handles missing values | No native uncertainty |
| LightGBM | Large datasets | Faster than XGBoost, memory efficient | Sensitive to hyperparameters |
| Random Forest | Baseline ML | Robust, easy to tune | Lower accuracy than boosting |
| CatBoost | Categorical features | Handles categoricals natively | Slower training |
Deep Learning Models
| Model | Best For | Pros | Cons |
|---|---|---|---|
| LSTM | Sequential patterns | Captures long-term dependencies | Slow training, hard to tune |
| GRU | Similar to LSTM | Faster convergence | Similar limitations |
| Transformer | Long sequences | Parallel training, attention | Data-hungry, complex |
| TFT | Multi-horizon | Interpretable attention, uncertainty | Complex implementation |
| N-BEATS | Pure deep learning | Strong baseline, interpretable | Less flexible than TFT |
| iTransformer | SOTA performance | Inverted transformer architecture | Recent, less battle-tested |
See references/deep-learning-models.md for architecture details and PyTorch implementations.
Evaluation Best Practices
Time Series Cross-Validation
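Never use random k-fold on time series: shuffled folds let the model train on observations that come after the ones it is tested on. Use an expanding or sliding window instead: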
CODEBLOCK4
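For the sliding-window variant, a minimal pure-Python sketch is given below. The generator name `sliding_window_splits` and its parameters are made up for this example; it only illustrates how fixed-length training windows move forward through the series.

```python
def sliding_window_splits(n_samples, train_size, test_size, step=None):
    """Yield (train_indices, test_indices) with a fixed-length training window.

    Each test block starts right after its training block, so no future
    observation is ever used for training.
    """
    step = test_size if step is None else step
    start = 0
    while start + train_size + test_size <= n_samples:
        train_end = start + train_size
        yield range(start, train_end), range(train_end, train_end + test_size)
        start += step


# Four weeks of training and one week of testing on ten weeks of hourly data.
splits = list(
    sliding_window_splits(n_samples=10 * 168, train_size=4 * 168, test_size=168)
)
```

A sliding window keeps training cost constant and drops old regimes, while an expanding window uses all history; which one suits a given system depends on how quickly load patterns change.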
Backtesting Framework
CODEBLOCK5
Benchmark Comparison
Always compare against:
1. Persistence: load_t = load_{t-1}
2. Seasonal Naive: load_t = load_{t-24} (for hourly data)
3. Weekly Naive: load_t = load_{t-168}
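All three benchmarks are the same operation with a different lag, as the sketch below shows. The helper names and the synthetic series (a daily ramp plus a weekend uplift) are assumptions for illustration only.

```python
def naive_forecast(history, lag):
    """Forecast each step t as the observed value at t - lag.

    Returns (predictions, actuals) aligned over the steps that have a lagged value.
    """
    if lag <= 0 or lag >= len(history):
        raise ValueError("lag must be between 1 and len(history) - 1")
    return history[:-lag], history[lag:]


def mae(predictions, actuals):
    """Mean absolute error of two equally long sequences."""
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)


# Three weeks of synthetic hourly load: a daily ramp plus 50 MW on days 5 and 6
# of each week (standing in for a weekend effect).
history = [
    100 + 10 * (h % 24) + (50 if (h // 24) % 7 >= 5 else 0)
    for h in range(21 * 24)
]

baseline_mae = {
    name: mae(*naive_forecast(history, lag))
    for name, lag in {"persistence": 1, "seasonal_naive": 24, "weekly_naive": 168}.items()
}
```

On this series the weekly baseline is exact, while the 24-hour baseline misses every weekday/weekend transition. A candidate model should beat all three on real data before it is considered useful.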
Deployment
Production Pipeline
1. Model serialization: Save with joblib or ONNX
2. Feature pipeline: Ensure identical preprocessing at inference
3. Scheduling: Cron or Airflow for automated forecasts
4. Monitoring: Track forecast drift and retrain triggers
See references/deployment.md for MLOps patterns.
Real-time Inference
CODEBLOCK6
Common Pitfalls
1. Data leakage: Ensure no feature uses information that is unavailable at forecast time
2. Holiday handling: Special days need explicit modeling
3. Temperature nonlinearity: Use heating/cooling degree days
4. Concept drift: Retrain quarterly or when MAPE degrades by more than 20%
5. Peak prediction: Models often under-predict peaks - consider quantile loss
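Two of these pitfalls reduce to a few lines of code. The sketch below makes two assumptions that should be checked against the actual system: an 18 °C balance point for degree days, and reading "degrades by more than 20%" as a relative increase over a baseline MAPE. Both function names are hypothetical.

```python
def degree_days(temperature_c, base_c=18.0):
    """Return (heating, cooling) degree values for one temperature reading.

    The 18 °C balance point is an assumed default; the right value depends on
    the region and building stock.
    """
    heating = max(base_c - temperature_c, 0.0)
    cooling = max(temperature_c - base_c, 0.0)
    return heating, cooling


def needs_retraining(baseline_mape, recent_mape, threshold=0.20):
    """Flag a retrain when MAPE worsens by more than `threshold` relative to baseline."""
    return recent_mape > baseline_mape * (1.0 + threshold)


cold = degree_days(5.0)    # heating demand only
hot = degree_days(30.0)    # cooling demand only
stable = needs_retraining(baseline_mape=3.0, recent_mape=3.5)
drifted = needs_retraining(baseline_mape=3.0, recent_mape=3.7)
```

Splitting temperature into separate heating and cooling terms lets a linear or tree model capture the V-shaped relationship between temperature and load.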
Resources
Scripts
| Script | Purpose |
|---|---|
| scripts/prepare_data.py | Data cleaning and feature engineering |
| scripts/train_model.py | Model training with validation |
| scripts/hyperparameter_search.py | Automated hyperparameter optimization |
| scripts/backtest.py | Time series cross-validation |
| scripts/evaluate.py | Comprehensive metric calculation |
| scripts/deploy_model.py | Export model for production |
Example Usage
CODEBLOCK7
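Electricity Load Forecasting Framework
Overview
This skill provides end-to-end support for electricity load/demand forecasting projects, covering the full workflow from data preprocessing to model deployment. It includes traditional statistical methods, modern machine learning methods, and state-of-the-art deep learning architectures.
Quick Start
1. Define the Forecasting Task
| Horizon | Type | Typical Use |
|---|---|---|
| 1-48 hours | Short-term forecasting | Grid operations, unit commitment |
| 1 week - 1 month | Medium-term forecasting | Maintenance scheduling, fuel planning |
| 1-12 months | Long-term forecasting | Capacity planning, infrastructure investment |
2. Prepare the Data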
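```bash
# Run the data preparation script
python scripts/prepare_data.py --input raw_load.csv --output processed/
```
Required data columns:
- timestamp: Timestamp index (hourly or finer granularity)
- load: Target variable (MW or kWh)
- temperature: Weather feature (°C)
- Optional: humidity, wind speed, solar radiation, holiday flag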
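3. Select a Model
See references/model-selection.md for detailed guidance.
Quick recommendation:
- Baseline: Start with a persistence model or a seasonal naive model
- Production STLF: Use XGBoost or LightGBM with weather features
- Research/SOTA: Try Temporal Fusion Transformer (TFT) or iTransformer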
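4. Train and Evaluate
```bash
python scripts/train_model.py --model xgboost --data processed/ --horizon 24
```
Key metrics to track:
- MAPE (%): Mean Absolute Percentage Error - business interpretability
- RMSE (MW): Root Mean Square Error - penalizes large errors
- MAE (MW): Mean Absolute Error - robust to outliers
- Coverage (%): Prediction interval coverage probability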
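Core Workflows
Data Preprocessing
1. Load raw data with correct datetime parsing
2. Handle missing values: forward-fill short gaps, interpolate longer ones
3. Feature engineering:
   - Temporal features: hour, day of week, month, weekend flag, holiday flag
   - Lag features: load_t-1, load_t-24, load_t-168 (weekly)
   - Rolling statistics: rolling_mean_24h, rolling_std_7d
   - Weather features: temperature, humidity, apparent temperature
4. Normalization: use RobustScaler or MinMaxScaler for deep learning models
See references/feature-engineering.md for the complete feature list.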
Model Training
```python
# Example training workflow.
# train_data, val_data, test_data and actuals are assumed to be prepared beforehand.
from electricity_forecasting import ForecastPipeline

pipeline = ForecastPipeline(
    model_type="xgboost",
    horizon=24,
    lookback=168,  # 1 week of history
)
pipeline.fit(train_data, val_data)
predictions, uncertainty = pipeline.predict(test_data)
metrics = pipeline.evaluate(predictions, actuals)
```
Hyperparameter Tuning
Use scripts/hyperparameter_search.py for automated tuning:
```bash
python scripts/hyperparameter_search.py \
  --model lightgbm \
  --data processed/ \
  --n-trials 50 \
  --study-name stlf-tuning
```
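Uncertainty Quantification
For risk-aware decision making:
- Quantile Regression: Predict multiple quantiles (0.1, 0.5, 0.9)
- Conformal Prediction: Distribution-free uncertainty bounds
- Ensemble Methods: Model disagreement as an uncertainty proxy
- Monte Carlo Dropout: For neural networks
See references/uncertainty.md for implementation details.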
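Model Reference
Statistical Models
| Model | Best For | Pros | Cons |
|---|---|---|---|
| ARIMA | Stable series | Interpretable, fast | Assumes linear relationships |
| SARIMA | Strong seasonality | Captures daily/weekly patterns | Manual parameter tuning |
| Prophet | Multiple seasonalities | Handles holidays well | Lower accuracy for short-term forecasting |
| TBATS | Complex seasonality | Automatic parameter selection | Slower training |
Machine Learning Models
| Model | Best For | Pros | Cons |
|---|---|---|---|
| XGBoost | Production short-term forecasting | Fast, accurate, handles missing values | No native uncertainty |
| LightGBM | Large datasets | Faster than XGBoost, memory efficient | Sensitive to hyperparameters |
| Random Forest | Baseline machine learning | Robust, easy to tune | Lower accuracy than boosting methods |
| CatBoost | Categorical features | Handles categorical features natively | Slower training |
Deep Learning Models
| Model | Best For | Pros | Cons |
|---|---|---|---|
| LSTM | Sequential patterns | Captures long-term dependencies | Slow training, hard to tune |
| GRU | Similar to LSTM | Faster convergence | Similar limitations |
| Transformer | Long sequences | Parallel training, attention mechanism | Data-hungry, complex |
| TFT | Multi-horizon forecasting | Interpretable attention, uncertainty | Complex implementation |
| N-BEATS | Pure deep learning | Strong baseline, interpretable | Less flexible than TFT |
| iTransformer | SOTA performance | Inverted Transformer architecture | Recent, less battle-tested |
See references/deep-learning-models.md for architecture details and PyTorch implementations.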
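Evaluation Best Practices
Time Series Cross-Validation
Never use random k-fold. Use an expanding or sliding window:
```python
# Expanding-window cross-validation.
# `data` is assumed to be a NumPy array; use data.iloc[...] for a pandas DataFrame.
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5, test_size=168)  # 1-week test folds
for train_idx, test_idx in tscv.split(data):
    train, test = data[train_idx], data[test_idx]
    # Train and evaluate
```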
Backtesting Framework
```bash
python scripts/backtest.py \
  --model xgboost \
  --data processed/ \
  --cv-splits 5 \
  --horizon 24 \
  --metrics mape,rmse,mae
```
Benchmark Comparison
Always compare against:
1. Persistence model: load_t = load_{t-1}
2. Seasonal naive model: load_t = load_{t-24} (hourly data)
3. Weekly naive model: load_t = load_{t-168}
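Deployment
Production Pipeline
1. Model serialization: Save with joblib or ONNX
2. Feature pipeline: Ensure preprocessing at inference matches training
3. Scheduling: Use Cron or Airflow for automated forecasts
4. Monitoring: Track forecast drift and retraining triggers
See references/deployment.md for MLOps patterns.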
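Real-time Inference
```python
from electricity_forecasting import DeploymentModel

model = DeploymentModel.load("models/xgboost-stlf.joblib")
# prepare_features and latest_data are assumed to come from the project's feature pipeline.
features = prepare_features(latest_data)
prediction = model.predict(features, return_uncertainty=True)
```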
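Common Pitfalls
1. Data leakage: Ensure features contain no future information
2. Holiday handling: Special days need explicit modeling
3. Temperature nonlinearity: Use heating/cooling degree days
4. Concept drift: Retrain quarterly or when MAPE degrades by more than 20%
5. Peak prediction: Models often under-predict peaks - consider quantile loss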
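Resources
Scripts
| Script | Purpose |
|---|---|
| scripts/prepare_data.py | Data cleaning and feature engineering |
| scripts/train_model.py | Model training with validation |
| scripts/hyperparameter_search.py | Automated hyperparameter optimization |
| scripts/backtest.py | Time series cross-validation |
| scripts/evaluate.py | Comprehensive metric calculation |
| scripts/deploy_model.py | Export model for production |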
Example Usage
```bash
# Complete workflow example
# 1. Prepare the data
python scripts/prepare_data.py --input data/load_2024.csv --output data/processed/
# 2. Train the model
python scripts/train_model.py --model lightgbm --data data/processed/ --horizon 48
# 3. Hyperparameter tuning
python scripts/hyperparameter_search.py --model lightgbm --data data/processed/ --
```