MLOps
Deploy ML models to production with pipelines, monitoring, serving, and reproducibility best practices.
name: MLOps
slug: mlops
version: 1.0.0
description: "Deploy ML models to production with pipelines, monitoring, serving, and reproducibility best practices."
metadata: {"clawdbot":{"emoji":"🤖","requires":{"bins":[]},"os":["linux","darwin","win32"]}}
Quick Reference
| Topic | File | Key Trap |
|---|---|---|
| CI/CD and DAGs | pipelines.md | Coupling training/inference deps |
| Model serving | serving.md | Cold start with large models |
| Drift and alerts | monitoring.md | Only technical metrics |
| Versioning | reproducibility.md | Not versioning preprocessing |
| GPU infrastructure | gpu.md | GPU request = full device |
Critical Traps
Training-Serving Skew:
- Preprocessing in notebook ≠ preprocessing in service → silent bugs
- Pandas in notebook → memory leaks in production (use native types)
- Feature store values at training time ≠ serving time without proper joins
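One way to guard against this skew is to keep a single preprocessing definition and record a fingerprint of it with the model artifact. A minimal sketch, assuming hypothetical feature names (`age`, `country`) and a fingerprint taken from the function's compiled bytecode:

```python
import hashlib

def preprocess(raw: dict) -> list[float]:
    """The one preprocessing definition, imported by BOTH the training
    job and the serving process. Native Python types throughout, so no
    notebook-only pandas behavior can diverge from production."""
    return [float(raw["age"]) / 100.0, 1.0 if raw["country"] == "US" else 0.0]

def preprocess_version() -> str:
    """Fingerprint the transform so the model artifact records exactly
    which preprocessing produced its training features."""
    return hashlib.sha256(preprocess.__code__.co_code).hexdigest()[:12]

# Training time: store the fingerprint next to the model weights.
trained_with = preprocess_version()

# Serving time: refuse to load a model trained with a different transform.
assert preprocess_version() == trained_with, "training-serving skew: preprocessing changed"
```

The fingerprint check turns a silent feature mismatch into a loud deploy-time failure.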
GPU Memory:
- `nvidia.com/gpu: 1` in a pod's resource spec reserves the ENTIRE GPU, not partial memory
- MIG/MPS sharing has real limitations (not plug-and-play)
- OOM on GPU kills the pod with no useful logs
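The whole-device semantics show up in the pod spec itself. A sketch of the relevant fragment, built as a Python dict (container name illustrative):

```python
# Fragment of a Kubernetes pod spec. nvidia.com/gpu is counted in whole
# devices: requesting 1 reserves the entire GPU even if the model only
# uses a fraction of its memory.
pod_spec = {
    "containers": [{
        "name": "model-server",
        "resources": {
            # GPUs are specified in limits; fractional values are rejected
            # by the device plugin, so "0.5 GPU" is not expressible here.
            "limits": {"nvidia.com/gpu": 1},
        },
    }]
}

assert pod_spec["containers"][0]["resources"]["limits"]["nvidia.com/gpu"] == 1
```

Sharing a device across pods requires MIG (hardware partitions) or MPS (process-level sharing), each with its own isolation and compatibility caveats.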
Model Versioning ≠ Code Versioning:
- Model artifacts need separate versioning (MLflow, W&B, DVC)
- Training data version + preprocessing version + code version = reproducibility
- Rollback requires keeping old model versions deployable
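The three-version requirement can be captured in a manifest stored with each model artifact. A minimal sketch with hypothetical version strings; real stacks record the same fields via MLflow, W&B, or DVC:

```python
import hashlib
import json

def artifact_hash(data: bytes) -> str:
    """Content hash of the training data snapshot."""
    return hashlib.sha256(data).hexdigest()[:12]

# Hypothetical training snapshot and manifest. The manifest pins every
# input needed to reproduce (or exactly roll back to) this model.
training_data = b"user_id,label\n1,0\n2,1\n"
manifest = {
    "model_version": "2024-06-01-a",             # illustrative tag
    "data_hash": artifact_hash(training_data),   # which rows trained it
    "preprocess_version": "prep-v3",             # which transform built features
    "code_version": "git:abc1234",               # which training code ran
}

# Rollback is only exact while every referenced version stays fetchable.
print(json.dumps(manifest, indent=2))
```

Without all three fields, "retrain the same model" is a guess rather than a reproduction.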
Drift Detection Timing:
- Retraining trigger isn't just "drift > threshold" → cost/benefit matters
- Delayed ground truth makes concept drift detection lag weeks
- Upstream data pipeline changes cause drift without model issues
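A common drift score is the Population Stability Index over binned feature distributions; the points above argue it should gate retraining only together with a cost/benefit check. A sketch with assumed threshold (0.2) and illustrative dollar figures:

```python
import math

def psi(expected: list[float], actual: list[float]) -> float:
    """Population Stability Index over matched histogram bins.
    Both inputs are bin proportions summing to 1; eps avoids log(0)."""
    eps = 1e-6
    return sum(
        (a - e) * math.log((a + eps) / (e + eps))
        for e, a in zip(expected, actual)
    )

# Training-time vs live feature distribution (4 bins, illustrative).
train_dist = [0.25, 0.25, 0.25, 0.25]
live_dist = [0.10, 0.20, 0.30, 0.40]
drift = psi(train_dist, live_dist)

# Drift alone does not trigger retraining: the expected gain must also
# beat the retraining cost (both figures are assumptions here).
estimated_gain_usd = 5_000.0   # assumed value of recovered accuracy
retrain_cost_usd = 1_200.0     # assumed compute + validation cost
should_retrain = drift > 0.2 and estimated_gain_usd > retrain_cost_usd
```

Note that PSI only sees the input distribution, so it fires on upstream pipeline changes too; confirming genuine concept drift still needs (possibly delayed) ground truth.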
Scope
This skill ONLY covers:
- CI/CD pipelines for models
- Model serving and scaling
- Monitoring and drift detection
- Reproducibility practices
- GPU infrastructure patterns
Does NOT cover: ML algorithms, feature engineering, hyperparameter tuning.