
Models

Choose AI models for coding, reasoning, and agents with cost-aware, task-matched recommendations.

Downloads: 521 · Stars: 2 · Version: 1.0.0 · AI Agent · Security check passed

Skill Description


name: Models
description: Choose AI models for coding, reasoning, and agents with cost-aware, task-matched recommendations.
metadata: {"clawdbot":{"emoji":"🤖","os":["linux","darwin","win32"]}}

AI Model Selection Rules

Core Principle

  • No single model is best for everything — match model to task, not brand loyalty
  • A $0.75/M model often performs identically to a $40/M model for simple tasks
  • Test cheaper alternatives before committing to expensive defaults

Cost Reality

  • Output tokens cost 3-10x more than input tokens — advertised input prices are misleading
  • Calculate real cost with your actual input/output ratio, not theoretical pricing
  • Batch/async APIs offer 50% discounts — use them for non-real-time workloads
  • Prompt caching reduces repeated context costs significantly
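The cost arithmetic above can be sketched in a few lines. The prices, the 50% batch discount, and the cache-read multiplier below are illustrative assumptions (exact multipliers vary by provider); the point is that real cost depends on your input/output ratio, not the advertised input rate.

```python
def request_cost(input_tokens, output_tokens,
                 input_price_per_m, output_price_per_m,
                 batch_discount=0.0, cached_fraction=0.0,
                 cache_read_multiplier=0.1):
    """Estimate the dollar cost of one request.

    cached_fraction: share of input tokens served from the prompt cache.
    Cache reads are billed at cache_read_multiplier of the input price
    (a common pricing shape; the multiplier is an assumption here).
    """
    fresh = input_tokens * (1 - cached_fraction)
    cached = input_tokens * cached_fraction
    cost = (fresh * input_price_per_m
            + cached * input_price_per_m * cache_read_multiplier
            + output_tokens * output_price_per_m) / 1_000_000
    return cost * (1 - batch_discount)

# Illustrative prices ($3/M in, $15/M out): a 10K-in / 2K-out request
# is dominated by output cost despite the 5:1 input ratio.
live = request_cost(10_000, 2_000, 3.0, 15.0)
batched_cached = request_cost(10_000, 2_000, 3.0, 15.0,
                              batch_discount=0.5, cached_fraction=0.8)
```

Running both variants makes the gap concrete: batching plus an 80%-cached prompt cuts the same request's cost by more than two thirds.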

Task Matching

Coding

  • Architecture and design decisions: Use frontier models (Opus-class) — they catch subtle issues cheaper models miss
  • Day-to-day implementation: Mid-tier models (Sonnet-class) offer 90% of capability at 20% of cost
  • Parallel subtasks and scaffolding: Fast/cheap models (Haiku-class) — speed matters more than depth
  • Code review: Thorough models catch async bugs and edge cases that fast models miss
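The coding tiers above can be expressed as a small routing table. The tier names and task labels are assumptions for illustration; substitute whatever model IDs your provider offers per tier.

```python
# Hypothetical task -> tier routing; tier names are assumptions,
# not real model identifiers.
TIER_FOR_TASK = {
    "architecture": "frontier",    # Opus-class: catches subtle design issues
    "implementation": "mid",       # Sonnet-class: day-to-day coding
    "scaffolding": "fast",         # Haiku-class: parallel subtasks
    "code_review": "frontier",     # thoroughness over latency
}

def pick_tier(task: str) -> str:
    # Default to mid-tier; escalate only when the task demands it.
    return TIER_FOR_TASK.get(task, "mid")
```

The default-to-mid fallback mirrors the guideline later in this document: escalate to frontier only when quality suffers.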

Non-Coding

  • Complex reasoning and math: Extended thinking modes justify their cost for hard problems
  • General assistance: User preference studies favor models different from benchmark leaders
  • High-volume simple queries: Cheapest models perform identically — don't overpay
  • Long documents: Context window size determines viability — some offer 1M+ tokens
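For long documents, the selection rule above reduces to "cheapest model whose context window fits". A minimal sketch, with window sizes and prices as illustrative assumptions:

```python
# Illustrative catalog; context sizes and prices are assumptions.
MODELS = [
    {"name": "small", "context": 128_000,   "input_price_per_m": 0.75},
    {"name": "large", "context": 1_000_000, "input_price_per_m": 3.00},
]

def pick_by_context(doc_tokens: int) -> str:
    """Return the cheapest model whose window fits the whole document."""
    for m in sorted(MODELS, key=lambda m: m["input_price_per_m"]):
        if doc_tokens <= m["context"]:
            return m["name"]
    raise ValueError("document exceeds every model's context window")
```

Fitting the whole document in one call also avoids the chunking overhead flagged under "Model Selection Mistakes" below.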

Claude Code vs Codex CLI

  • Claude Code: Fast iteration, UI/frontend, interactive debugging — developer stays in the loop
  • Codex CLI: Long-running background tasks, large refactors, set-and-forget — accuracy over speed
  • Both tools have value — use Claude Code for implementation, Codex for final review
  • File size limits differ — Claude Code struggles with files over 25K tokens

Orchestration Pattern

  • Planning phase: Use expensive/smart models to break down problems correctly
  • Execution phase: Use balanced models, parallelize where possible
  • Review phase: Use accurate models for final verification — catches bugs others miss
  • This pattern beats using one model for everything at similar total cost
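The three-phase pattern above can be sketched as a pipeline. `call_model(tier, prompt)` is a placeholder for your provider's API, and the tier names are assumptions; the structure, not the specific calls, is the point.

```python
from concurrent.futures import ThreadPoolExecutor

def orchestrate(problem, call_model):
    """Plan -> parallel execute -> review, one tier per phase.

    call_model(tier, prompt) is a hypothetical stand-in for a real
    provider API; it must return a list of subtasks for the planning
    prompt and a string otherwise.
    """
    # Planning: expensive model decomposes the problem correctly.
    subtasks = call_model("frontier", f"Break into subtasks: {problem}")
    # Execution: balanced models run subtasks in parallel.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(
            lambda t: call_model("mid", f"Do: {t}"), subtasks))
    # Review: accurate model verifies the combined output.
    return call_model("frontier", f"Review: {results}")
```

Because only the plan and review phases hit the expensive tier, total cost stays close to single-model usage while each phase gets the capability it needs.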

Benchmark Skepticism

  • Benchmark scores vary 2-3x based on scaffolding and evaluation method
  • User preference rankings differ significantly from benchmark rankings
  • SWE-bench scores don't predict real-world coding quality reliably
  • Models drift week-to-week — last month's best may underperform today

Open Source Viability

  • DeepSeek and similar models approach frontier performance at 1/50th API cost
  • Self-hosting eliminates API rate limits and price variability
  • MIT/Apache licensed models allow commercial use without restrictions
  • Consider for: data privacy, cost predictability, custom fine-tuning

Model Selection Mistakes

  • Using premium models for chatbot responses that cheap models handle identically
  • Ignoring context window limits — chunking long documents costs more than using large-context models
  • Expecting consistency — same prompt gives different results over time as models update
  • Trusting speed over accuracy for complex tasks — fast models trade thoroughness for latency

Practical Guidelines

  • Default to mid-tier for most tasks, escalate to frontier only when quality suffers
  • Track actual costs per workflow, not just per-token rates
  • Build verification into pipelines — don't trust any model blindly
  • Reassess model choices quarterly — pricing and capabilities shift constantly
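Tracking cost per workflow rather than per-token rates, as advised above, needs only a small accumulator. This is a sketch, not a product; `record` would be called with the real dollar cost of each API response.

```python
from collections import defaultdict

class CostTracker:
    """Accumulate dollar cost per workflow, not per-token rates."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.calls = defaultdict(int)

    def record(self, workflow: str, cost_usd: float) -> None:
        self.totals[workflow] += cost_usd
        self.calls[workflow] += 1

    def report(self) -> dict:
        # Per-workflow totals and averages for the quarterly reassessment.
        return {w: {"total": round(self.totals[w], 4),
                    "calls": self.calls[w],
                    "avg": round(self.totals[w] / self.calls[w], 4)}
                for w in self.totals}
```

A quarterly glance at `report()` shows which workflows actually dominate spend, which is where switching tiers pays off.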

How to use "Models"?

  1. Open 小龙虾AI (Web or iOS App)
  2. Click the "Use Now" button at the top, or type a task description in the chat box
  3. 小龙虾AI will automatically match and invoke the "Models" skill to complete the task
  4. Results appear instantly, and you can continue the conversation to refine them
