
Skill Evaluator

Evaluate Clawdbot skills for quality, reliability, and publish-readiness using a multi-framework rubric (ISO 25010, OpenSSF, Shneiderman, agent-specific heuristics). Use when asked to review, audit, evaluate, score, or assess a skill before publishing, or when checking skill quality. Runs automated structural checks and guides manual assessment across 25 criteria.

Downloads: 1.9k · Stars: 2 · Version: 1.0.0 · Developer Tools · Security: Passed · ⚙️ Scripts

Skill Description


```yaml
---
name: skill-evaluator
description: Evaluate Clawdbot skills for quality, reliability, and publish-readiness using a multi-framework rubric (ISO 25010, OpenSSF, Shneiderman, agent-specific heuristics). Use when asked to review, audit, evaluate, score, or assess a skill before publishing, or when checking skill quality. Runs automated structural checks and guides manual assessment across 25 criteria.
---
```

Skill Evaluator

Evaluate skills across 25 criteria using a hybrid automated + manual approach.

Quick Start

1. Run automated checks

```shell
python3 scripts/eval-skill.py /path/to/skill
python3 scripts/eval-skill.py /path/to/skill --json     # machine-readable
python3 scripts/eval-skill.py /path/to/skill --verbose  # show all details
```

Checks: file structure, frontmatter, description quality, script syntax, dependency audit, credential scan, env var documentation.
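
To make the structural checks concrete, here is a minimal sketch of one of them: verifying that `SKILL.md` exists and has a closed YAML frontmatter block with required keys. This is an illustration only; the actual checks in `eval-skill.py` are more thorough, and the required key names here are assumptions.

```python
# Hypothetical sketch of one structural check (not the real eval-skill.py code).
from pathlib import Path

REQUIRED_KEYS = ("name", "description")  # assumed required frontmatter fields

def check_frontmatter(skill_dir: str) -> list:
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    skill_md = Path(skill_dir) / "SKILL.md"
    if not skill_md.is_file():
        return ["missing SKILL.md"]
    text = skill_md.read_text(encoding="utf-8")
    # Frontmatter sits between the first two '---' delimiters.
    parts = text.split("---", 2)
    if not text.startswith("---") or len(parts) < 3:
        return ["SKILL.md has no closed YAML frontmatter block"]
    for key in REQUIRED_KEYS:
        if f"{key}:" not in parts[1]:
            problems.append(f"frontmatter missing '{key}'")
    return problems
```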

2. Manual assessment

Use the rubric at references/rubric.md to score 25 criteria across 8 categories (0–4 each, 100 total). Each criterion has concrete descriptions per score level.
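
The arithmetic is straightforward: 25 criteria at 0–4 points each caps the total at 100. A minimal sketch of the tally, with range validation (criterion names are illustrative, not the rubric's exact identifiers):

```python
def total_score(scores: dict) -> int:
    """Sum per-criterion rubric scores after validating the 0-4 range."""
    for name, value in scores.items():
        if not 0 <= value <= 4:
            raise ValueError(f"{name}: score {value} is outside 0-4")
    return sum(scores.values())

# Three of the 25 criteria, scored:
partial = {"completeness": 3, "correctness": 4, "fault_tolerance": 2}
```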

3. Write the evaluation

Copy assets/EVAL-TEMPLATE.md to the skill directory as EVAL.md. Fill in automated results + manual scores.

Evaluation Process

  1. Run eval-skill.py — get the automated structural score
  2. Read the skill's SKILL.md — understand what it does
  3. Read/skim the scripts — assess code quality, error handling, testability
  4. Score each manual criterion using references/rubric.md — concrete criteria per level
  5. Prioritize findings as P0 (blocks publishing) / P1 (should fix) / P2 (nice to have)
  6. Write EVAL.md in the skill directory with scores + findings
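
One hypothetical shape for recording findings from step 5, sorted so that P0 items (which block publishing) lead the findings section of EVAL.md. The `Finding` class and field names are assumptions for illustration, not part of the evaluator:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    priority: str   # "P0" (blocks publishing) | "P1" (should fix) | "P2" (nice to have)
    criterion: str  # the rubric criterion the finding relates to
    note: str       # short description of the issue

def sort_findings(findings: list) -> list:
    """Order findings by severity: P0 first, then P1, then P2."""
    order = {"P0": 0, "P1": 1, "P2": 2}
    return sorted(findings, key=lambda f: order[f.priority])
```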

Categories (8 categories, 25 criteria)

| # | Category | Source Framework | Criteria |
|---|----------|------------------|----------|
| 1 | Functional Suitability | ISO 25010 | Completeness, Correctness, Appropriateness |
| 2 | Reliability | ISO 25010 | Fault Tolerance, Error Reporting, Recoverability |
| 3 | Performance / Context | ISO 25010 + Agent | Token Cost, Execution Efficiency |
| 4 | Usability — AI Agent | Shneiderman, Gerhardt-Powals | Learnability, Consistency, Feedback, Error Prevention |
| 5 | Usability — Human | Tognazzini, Norman | Discoverability, Forgiveness |
| 6 | Security | ISO 25010 + OpenSSF | Credentials, Input Validation, Data Safety |
| 7 | Maintainability | ISO 25010 | Modularity, Modifiability, Testability |
| 8 | Agent-Specific | Novel | Trigger Precision, Progressive Disclosure, Composability, Idempotency, Escape Hatches |

Interpreting Scores

| Range | Verdict | Action |
|-------|---------|--------|
| 90–100 | Excellent | Publish confidently |
| 80–89 | Good | Publishable, note known issues |
| 70–79 | Acceptable | Fix P0s before publishing |
| 60–69 | Needs Work | Fix P0 + P1 before publishing |
| <60 | Not Ready | Significant rework needed |
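
The score-to-verdict mapping above can be sketched as a small lookup. The thresholds come straight from the table; the function name is illustrative:

```python
def verdict(score: int) -> str:
    """Map a 0-100 total score to the verdict bands defined by the rubric."""
    if score >= 90:
        return "Excellent"
    if score >= 80:
        return "Good"
    if score >= 70:
        return "Acceptable"
    if score >= 60:
        return "Needs Work"
    return "Not Ready"
```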

Deeper Security Scanning

This evaluator covers security basics (credentials, input validation, data safety), but for thorough security audits of skills under development, consider SkillLens (npx skilllens scan <path>). It checks for exfiltration, code execution, persistence, privilege bypass, and prompt injection — complementary to the quality focus here.

Dependencies

  • Python 3.6+ (for eval-skill.py)
  • PyYAML (pip install pyyaml) — for frontmatter parsing in automated checks

How do I use "Skill Evaluator"?

  1. Open 小龙虾AI (Web or the iOS app)
  2. Click the "Use Now" button above, or type a task description in the chat box
  3. 小龙虾AI automatically matches and invokes the "Skill Evaluator" skill to complete the task
  4. Results appear instantly, and you can continue the conversation to refine them
