MLOps (Deep Workflow)
MLOps connects research velocity to production reliability: version data, code, and artifacts together; monitor behavior after deploy.
When to Offer This Workflow
Trigger conditions:
- First production model; batch or online serving
- Drift, bias, or latency SLO misses
- Compliance needs for lineage and explainability
Initial offer:
Use six stages: (1) problem & risk class, (2) data & reproducibility, (3) training & evaluation, (4) packaging & deployment, (5) monitoring & feedback, (6) governance & rollback. Confirm batch vs real-time and regulatory tier.
Stage 1: Problem & Risk Class
Goal: Match ML rigor to decision risk: high-stakes decisions (credit, health) need stricter controls than low-stakes ones (recommendation).
Exit condition: Offline and online success metrics defined.
Stage 2: Data & Reproducibility
Goal: Snapshot training data; keep pipelines deterministic; handle PII explicitly.
Practices
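- Feature stores are optional but valuable for consistency
- Keep secrets out of notebooks; run pipelines as orchestrated jobs
Exit condition: Re-running a given run ID reproduces the artifact hash, or matches it within agreed bounds where bit-exact output is not achievable.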
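One way to make the Stage 2 exit condition checkable is to record a content hash of the data snapshot next to the code version and seed. The sketch below is a minimal illustration using only the standard library; `fingerprint_rows`, `build_run_manifest`, and the manifest fields are hypothetical names, not part of any particular tool.

```python
import hashlib
import json


def fingerprint_rows(rows):
    """Return a SHA-256 hex digest of rows, independent of row and key order."""
    # Serialize each row with sorted keys so dict ordering cannot change the hash,
    # then sort the serialized rows so row ordering cannot change it either.
    encoded = sorted(json.dumps(row, sort_keys=True) for row in rows)
    digest = hashlib.sha256()
    for line in encoded:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def build_run_manifest(run_id, rows, code_version, seed):
    """Bundle the inputs needed to re-run training and compare results."""
    return {
        "run_id": run_id,
        "data_sha256": fingerprint_rows(rows),
        "code_version": code_version,
        "seed": seed,
    }


# Synthetic snapshot, for illustration only.
snapshot = [
    {"user_id": 1, "amount": 12.5, "label": 0},
    {"user_id": 2, "amount": 99.0, "label": 1},
]
manifest = build_run_manifest("run-001", snapshot, "abc1234", seed=42)
rerun = build_run_manifest("run-001", list(reversed(snapshot)), "abc1234", seed=42)
print(manifest["data_sha256"] == rerun["data_sha256"])  # True
```

Hashing the data is exact by construction. Trained model files are often not bit-identical across hardware or library versions, which is where the "agreed bounds" would apply to metrics rather than to a hash.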
Stage 3: Training & Evaluation
Goal: Train/val/test splits without leakage; split time series chronologically so no future data reaches training.
Practices
- Model card with limits and metrics
- Fairness slices where policy requires
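For time-ordered data, the leakage concern in the Stage 3 goal usually comes down to splitting by timestamp rather than at random. The following is a minimal sketch under that assumption; `chronological_split`, the `event_date` field, and the cutoff dates are hypothetical.

```python
from datetime import date


def chronological_split(records, train_end, val_end, time_key="event_date"):
    """Split records by time so validation and test data postdate training data."""
    if train_end >= val_end:
        raise ValueError("train_end must be earlier than val_end")
    train = [r for r in records if r[time_key] <= train_end]
    val = [r for r in records if train_end < r[time_key] <= val_end]
    test = [r for r in records if r[time_key] > val_end]
    return train, val, test


# Synthetic monthly records for one year, for illustration only.
records = [{"event_date": date(2024, month, 1), "y": month % 2} for month in range(1, 13)]
train, val, test = chronological_split(records, date(2024, 8, 1), date(2024, 10, 1))
print(len(train), len(val), len(test))  # 8 2 2
```

A random shuffle of the same records would place later months in the training set, which tends to inflate offline metrics relative to what the model sees after deployment.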
Stage 4: Packaging & Deployment
Goal: Immutable artifacts; canary or shadow before full cutover.
Practices
- Model + preprocessing code version pinned together
Exit condition: Rollback to previous artifact id documented.
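The Stage 4 ideas (immutable artifacts, model and preprocessing pinned together, a documented rollback) can be sketched with a small in-memory registry. This is an illustration only: `Artifact` and `Registry` are hypothetical classes, and a real setup would typically persist this history in an artifact store.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Artifact:
    """One immutable release: model and preprocessing versions travel together."""
    artifact_id: str
    model_sha256: str
    preprocessing_version: str


@dataclass
class Registry:
    """Hypothetical in-memory registry that tracks deployment history."""
    history: list = field(default_factory=list)

    def deploy(self, artifact):
        self.history.append(artifact)

    def current(self):
        return self.history[-1] if self.history else None

    def rollback(self):
        """Revert to the previous artifact; refuse if there is none."""
        if len(self.history) < 2:
            raise RuntimeError("no previous artifact to roll back to")
        self.history.pop()
        return self.history[-1]


# Placeholder ids and hashes, for illustration only.
registry = Registry()
v1 = Artifact("model-v1", "sha256-of-v1-weights", "prep-1.0")
v2 = Artifact("model-v2", "sha256-of-v2-weights", "prep-1.1")
registry.deploy(v1)
registry.deploy(v2)
restored = registry.rollback()
print(restored.artifact_id)  # model-v1
```

Because each `Artifact` is frozen and carries both versions, a rollback restores the preprocessing that the older model was trained with, not just the older weights.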
Stage 5: Monitoring & Feedback
Goal: Track data drift, concept drift, and latency; tie business KPIs to model decisions.
Practices
- Human review queue for low-confidence predictions when needed
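As one possible way to quantify the data drift named in the Stage 5 goal, the sketch below computes the Population Stability Index (PSI) for a single numeric feature, and adds a small router for the low-confidence review queue. PSI is swapped in here as an example metric; the bin edges, the `route` helper, and its threshold are assumptions that each team would set for itself.

```python
import math


def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between two samples over shared bin edges."""
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")

    def proportions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            # Bin index = number of edges at or below v.
            idx = sum(1 for e in edges if v >= e)
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def route(confidence, threshold):
    """Send low-confidence predictions to human review (hypothetical policy)."""
    return "human_review" if confidence < threshold else "auto"


# Synthetic feature values, for illustration only.
baseline = [i / 100 for i in range(100)]
shifted = [x + 0.3 for x in baseline]
edges = [0.25, 0.5, 0.75]
print(psi(baseline, baseline, edges))      # 0.0
print(psi(baseline, shifted, edges) > 0)   # True
print(route(0.55, threshold=0.7))          # human_review
```

PSI only covers shifts in input distributions. Concept drift, where the relationship between inputs and labels changes, generally needs labeled feedback or the business KPIs mentioned in the goal.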
Stage 6: Governance & Rollback
Goal: Approvals for retrain/deploy; audit trail; A/B tests for major changes.
Final Review Checklist
- [ ] Offline metrics aligned with business risk
- [ ] Data and code reproducibility
- [ ] Packaged artifacts with versioning and rollback
- [ ] Online monitoring and drift strategy
- [ ] Governance and approval path
Tips for Effective Guidance
- Training-serving skew is a common source of production bugs; feature parity tests help catch it.
- Offline accuracy ≠ online business outcome.
- Fairness needs explicit slices, not one headline number.
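The point about slices versus a single headline number can be illustrated with a per-slice accuracy report. This is a minimal sketch on synthetic data; `accuracy_by_slice` and the `region`, `y_true`, and `y_pred` fields are hypothetical, and accuracy stands in for whatever metric the policy actually requires.

```python
from collections import defaultdict


def accuracy_by_slice(rows, slice_key):
    """Return overall accuracy and accuracy per value of slice_key."""
    hits, totals = defaultdict(int), defaultdict(int)
    for row in rows:
        group = row[slice_key]
        totals[group] += 1
        hits[group] += int(row["y_true"] == row["y_pred"])
    overall = sum(hits.values()) / sum(totals.values())
    per_slice = {g: hits[g] / totals[g] for g in totals}
    return overall, per_slice


# Synthetic predictions: slice A is large and accurate, slice B is small and weak.
rows = (
    [{"region": "A", "y_true": 1, "y_pred": 1}] * 9
    + [{"region": "A", "y_true": 1, "y_pred": 0}] * 1
    + [{"region": "B", "y_true": 1, "y_pred": 1}] * 1
    + [{"region": "B", "y_true": 1, "y_pred": 0}] * 1
)
overall, per_slice = accuracy_by_slice(rows, "region")
print(round(overall, 3), per_slice)  # 0.833 {'A': 0.9, 'B': 0.5}
```

Here the overall figure looks acceptable while slice B sits at 0.5, which is the kind of gap a single aggregate hides.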
Handling Deviations
- LLM-heavy products: lean on eval harnesses and prompt versioning (see llm-evaluation).
- Tiny teams: start with artifact registry + dashboards before a full feature store.