MLOps
Deploy ML models to production with pipelines, monitoring, serving, and reproducibility best practices.
name: MLOps
slug: mlops
version: 1.0.0
description: "Deploy ML models to production with pipelines, monitoring, serving, and reproducibility best practices."
metadata: {"clawdbot":{"emoji":"🤖","requires":{"bins":[]},"os":["linux","darwin","win32"]}}
Quick Reference
| Topic | File | Key Trap |
|---|---|---|
| CI/CD and DAGs | pipelines.md | Coupling training/inference deps |
| Model serving | serving.md | Cold start with large models |
| Drift and alerts | monitoring.md | Only technical metrics |
| Versioning | reproducibility.md | Not versioning preprocessing |
| GPU infrastructure | gpu.md | GPU request = full device |
Critical Traps
Training-Serving Skew:
- Preprocessing in notebook ≠ preprocessing in service → silent bugs
- Pandas in notebook → memory leaks in production (use native types)
- Feature store values at training time ≠ serving time without proper joins
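One way to guard against this skew is to keep a single preprocessing definition and record a fingerprint of it with the model artifact. A minimal sketch, assuming hypothetical feature names (`age`, `country`) and a fingerprint taken from the function's compiled bytecode:

```python
import hashlib

def preprocess(raw: dict) -> list[float]:
    """The one preprocessing definition, imported by BOTH the training
    job and the serving process. Native Python types throughout, so no
    notebook-only pandas behavior can diverge from production."""
    return [float(raw["age"]) / 100.0, 1.0 if raw["country"] == "US" else 0.0]

def preprocess_version() -> str:
    """Fingerprint the transform so the model artifact records exactly
    which preprocessing produced its training features."""
    return hashlib.sha256(preprocess.__code__.co_code).hexdigest()[:12]

# Training time: store the fingerprint next to the model weights.
trained_with = preprocess_version()

# Serving time: refuse to load a model trained with a different transform.
assert preprocess_version() == trained_with, "training-serving skew: preprocessing changed"
```

The fingerprint check turns a silent feature mismatch into a loud deploy-time failure.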
GPU Memory:
- `nvidia.com/gpu: 1` in a pod's resource spec reserves the ENTIRE GPU, not partial memory
- MIG/MPS sharing has real limitations (not plug-and-play)
- OOM on GPU kills the pod with no useful logs
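The whole-device semantics show up in the pod spec itself. A sketch of the relevant fragment, built as a Python dict (container name illustrative):

```python
# Fragment of a Kubernetes pod spec. nvidia.com/gpu is counted in whole
# devices: requesting 1 reserves the entire GPU even if the model only
# uses a fraction of its memory.
pod_spec = {
    "containers": [{
        "name": "model-server",
        "resources": {
            # GPUs are specified in limits; fractional values are rejected
            # by the device plugin, so "0.5 GPU" is not expressible here.
            "limits": {"nvidia.com/gpu": 1},
        },
    }]
}

assert pod_spec["containers"][0]["resources"]["limits"]["nvidia.com/gpu"] == 1
```

Sharing a device across pods requires MIG (hardware partitions) or MPS (process-level sharing), each with its own isolation and compatibility caveats.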
Model Versioning ≠ Code Versioning:
- Model artifacts need separate versioning (MLflow, W&B, DVC)
- Training data version + preprocessing version + code version = reproducibility
- Rollback requires keeping old model versions deployable
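The three-version requirement can be captured in a manifest stored with each model artifact. A minimal sketch with hypothetical version strings; real stacks record the same fields via MLflow, W&B, or DVC:

```python
import hashlib
import json

def artifact_hash(data: bytes) -> str:
    """Content hash of the training data snapshot."""
    return hashlib.sha256(data).hexdigest()[:12]

# Hypothetical training snapshot and manifest. The manifest pins every
# input needed to reproduce (or exactly roll back to) this model.
training_data = b"user_id,label\n1,0\n2,1\n"
manifest = {
    "model_version": "2024-06-01-a",             # illustrative tag
    "data_hash": artifact_hash(training_data),   # which rows trained it
    "preprocess_version": "prep-v3",             # which transform built features
    "code_version": "git:abc1234",               # which training code ran
}

# Rollback is only exact while every referenced version stays fetchable.
print(json.dumps(manifest, indent=2))
```

Without all three fields, "retrain the same model" is a guess rather than a reproduction.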
Drift Detection Timing:
- Retraining trigger isn't just "drift > threshold" → cost/benefit matters
- Delayed ground truth makes concept drift detection lag weeks
- Upstream data pipeline changes cause drift without model issues
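A common drift score is the Population Stability Index over binned feature distributions; the points above argue it should gate retraining only together with a cost/benefit check. A sketch with assumed threshold (0.2) and illustrative dollar figures:

```python
import math

def psi(expected: list[float], actual: list[float]) -> float:
    """Population Stability Index over matched histogram bins.
    Both inputs are bin proportions summing to 1; eps avoids log(0)."""
    eps = 1e-6
    return sum(
        (a - e) * math.log((a + eps) / (e + eps))
        for e, a in zip(expected, actual)
    )

# Training-time vs live feature distribution (4 bins, illustrative).
train_dist = [0.25, 0.25, 0.25, 0.25]
live_dist = [0.10, 0.20, 0.30, 0.40]
drift = psi(train_dist, live_dist)

# Drift alone does not trigger retraining: the expected gain must also
# beat the retraining cost (both figures are assumptions here).
estimated_gain_usd = 5_000.0   # assumed value of recovered accuracy
retrain_cost_usd = 1_200.0     # assumed compute + validation cost
should_retrain = drift > 0.2 and estimated_gain_usd > retrain_cost_usd
```

Note that PSI only sees the input distribution, so it fires on upstream pipeline changes too; confirming genuine concept drift still needs (possibly delayed) ground truth.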
Scope
This skill ONLY covers:
- CI/CD pipelines for models
- Model serving and scaling
- Monitoring and drift detection
- Reproducibility practices
- GPU infrastructure patterns
Does NOT cover: ML algorithms, feature engineering, hyperparameter tuning.