code2animation
Produce complete code-based animated videos by scripting, generating narration, creating visual assets, and rendering final MP4s using the code2animation fra...
技能说明
AgentSaaS Video Editor Skill
A comprehensive video editing and rendering skill that enables AI agents to create code-driven animations with text-to-speech narration and smooth transitions.
Purpose
This skill allows agents to:
- Create and preview interactive video projects with animations
- Generate TTS audio narration using Microsoft Edge TTS
- Render complete videos with synchronized audio and visual effects
- Support both portrait and landscape video formats
- Apply smooth transition effects between media elements
Core Capabilities
1. Interactive Video Preview
- Real-time preview of video projects in the browser
- Playback controls for testing and debugging
- Support for transitions, media clips, and timing adjustments
- Frame-by-frame seeking for precise editing
- Live transition preview with easing effects
2. Transition System
- transitionIn: Each media defines its own entrance animation
- Supported transitions: fade, zoom, slide2Left, slideUp, none
- Easing: Built-in easeOutCubic for smooth slide and zoom animations
- stayInClip: Media can persist throughout entire clip duration
- Cross-clip transitions: Automatic handling of clip boundaries
3. TTS Audio Generation
- Automated text-to-speech using Microsoft Edge TTS (msedge-tts)
- Support for multiple voices (English and Chinese)
- Word-level timing metadata for lip-sync and animations
- Audio file caching for faster previews
4. Video Rendering
- Automated frame-by-frame rendering using Puppeteer
- FFmpeg integration for video encoding and audio mixing
- 30 FPS output at 1920x1080 (landscape) or 1080x1920 (portrait)
- Deterministic rendering for consistent results
- Transition effects preserved in final output
Project Configuration Format
Media Item Properties
- src: HTML filename in the footage directory
- words: Trigger phrase from speech that activates this media
- transitionIn: Entrance animation type (optional)
- transitionDuration: Duration in seconds (optional, default: 0.6s)
- stayInClip: If true, media remains visible until clip ends (optional)
Transition Types
- fade: Opacity transition (0 → 1)
- zoom: Scale transition (2x → 1x) with opacity
- slide2Left: Horizontal slide from right (100% → 0%)
- slideUp: Vertical slide from bottom (100% → 0%)
- none: No transition effect
Transition Behavior
- transitionIn: Defines how this media enters the scene
- transitionDuration: Duration in seconds (default: 0.6s)
- stayInClip: If true, media remains visible until clip ends
- Easing: slide2Left and slideUp use easeOutCubic for smooth deceleration
Reference Implementation
See public/projects/agentSaasPromoVideo.json for a complete working example demonstrating all transition types and stayInClip behavior.
Technical Requirements
System Dependencies
- Node.js: 18 or higher
- FFmpeg: Required for video encoding and audio mixing
- Chromium/Chrome: Used by Puppeteer for headless rendering
Node.js Dependencies
- React & Vite: Frontend framework and build tool
- Puppeteer: Headless browser for frame capture
- msedge-tts: Microsoft Edge TTS for audio generation
- Express: Optional HTTP server (for TTS API endpoint)
- Motion (Framer Motion): Animation library
- Tailwind CSS: Styling framework
Shell Commands Used
This skill executes the following shell commands:
Audio Generation
npx tsx scripts/generate-audio.ts <projectId>
- Reads project JSON configuration
- Generates MP3 audio files using Edge TTS
- Saves word-level timing metadata
Video Rendering
node scripts/render.js <projectId> [--portrait]
- Starts a local Vite dev server
- Launches Puppeteer to capture frames
- Uses FFmpeg to encode video and mix audio
- Cleans up temporary files
FFmpeg Operations
- Frame encoding:
ffmpeg -framerate 30 -i frames/frame-%05d.jpg -c:v libx264 ... - Audio mixing:
ffmpeg -i video.mp4 -i audio1.mp3 -i audio2.mp3 -filter_complex ...
Browser Detection
- Uses
whichcommand to find Chrome/Chromium on Linux/macOS - Respects
PUPPETEER_EXECUTABLE_PATHenvironment variable
API Endpoints (Optional)
The skill may expose an HTTP endpoint for TTS generation:
POST /api/tts
Content-Type: application/json
{
"text": "Text to speak",
"voice": "en-US-GuyNeural",
"rate": "+0%",
"pitch": "+0Hz"
}
This endpoint is optional and only used when pre-generated audio files are not available.
Security Considerations
File System Access
- Reads from:
public/projects/<projectId>/ - Writes to:
public/projects/<projectId>/audio/,public/video/ - Creates temporary directories for frame storage
- Cleans up temporary files after rendering
Network Access
- Starts local HTTP server on port 5175+ (configurable)
- Connects to Microsoft Edge TTS service (external)
- No external API keys required for basic functionality
Process Execution
- Spawns child processes for: Vite dev server, FFmpeg encoding
- Uses
execSyncfor: browser detection, audio generation trigger - All commands are predefined and not user-controllable
Environment Variables
Optional configuration:
PUPPETEER_EXECUTABLE_PATH: Custom browser path for PuppeteerFASTMCP_LOG_LEVEL: Logging level (default: ERROR)
Project Structure
public/
projects/
<projectId>/
<projectId>.json # Project configuration
footage/ # HTML/CSS media components
audio/ # Generated TTS audio files
0.mp3, 1.mp3, ...
0.json, 1.json, ... # Word timing metadata
video/
render-<projectId>-landscape.mp4
render-<projectId>-portrait.mp4
Usage Example
# 1. Generate audio for a project
pnpm generate-audio agentSaasPromoVideo
# 2. Preview in browser
pnpm dev
# 3. Render final video
pnpm render agentSaasPromoVideo
# 4. Render portrait version
pnpm render agentSaasPromoVideo --portrait
HTML Animation Guidelines
When creating HTML animations for video rendering, use the CSS variable timeline model.
Core Model
- Renderer controls time: Puppeteer sets
--tevery frame. - Page only renders state:
DOM = f(t). - No lifecycle animation APIs (
play/start/reset) and no hidden runtime state. - Transition system handles entrance effects: Don't implement slide/fade transitions in HTML - use the project's
transitionInproperty instead.
✅ Required Patterns
- Define timeline root:
:root { --t: 0; } - Every animated property must derive from
--t. - Always clamp normalized progress values:
--p: clamp(0, calc((var(--t) - var(--start)) / var(--duration)), 1); - Express initial/ending states directly in CSS (seek-safe at any frame).
- Use small deterministic JS only for content mapping (e.g., subtitle/text index from
t). - Let transition system handle entrance: Focus on content animation, not entrance effects.
🚫 Forbidden Patterns
transitionanimation/@keyframeswindow.registerFrameAnimation(...)requestAnimationFrameloops for timeline progression- Implicit time from
Date.now()/performance.now()for visual state - Manual entrance transitions: Don't implement slide/fade in HTML - use
transitionInin project config - Fade-out effects: Elements should not disappear after animation completes. Use
opacity: var(--p)instead ofopacity: calc(var(--p) * (1 - var(--fade)))to keep elements visible at their final state.
Recommended Template
.element {
--start: 0.5;
--duration: 1;
--p: clamp(0, calc((var(--t) - var(--start)) / var(--duration)), 1);
opacity: var(--p);
transform: translateY(calc((1 - var(--p)) * 20px));
}
Easing (without transition)
Use math on progress directly:
--p: clamp(0, calc((var(--t) - var(--start)) / var(--duration)), 1);
--ease-out: calc(1 - (1 - var(--p)) * (1 - var(--p)));
opacity: var(--ease-out);
JS Hook Pattern (text/content only)
<script>
const labels = ['A', 'B', 'C'];
const el = document.getElementById('label');
window.onTimelineUpdate = (t) => {
const idx = Math.floor(Math.max(0, t) / 1.2) % labels.length;
el.textContent = labels[idx];
};
</script>
Time Semantics (t vs globalTime)
onTimelineUpdate(t, globalTime)supports two time domains:t: clip-local time (resets to0when clip changes). This is the default for most HTML animations.globalTime: continuous timeline across clips. Use only when an element must stay continuous through cross-clip transitions.
- Do not assume
tis media-local. If a media appears mid-clip,tmay already be large when it first becomes visible. - For media-local behavior (e.g., toggle starts animating when this media appears), anchor from first visible
globalTimeand derive:local = globalTime - mediaStartGlobalTime
- Keep fallback for compatibility:
window.onTimelineUpdate = (t, globalTime) => { const g = Number.isFinite(globalTime) ? globalTime : t; // use `t` for normal clip-local animation, `g` only when continuity is required };
Determinism Checklist
- Seeking to any
tyields exactly one deterministic frame. - Animation state must not depend on "previous frame".
- Cross-clip transition visuals should be continuous in both clips.
- Final frame (
t = totalDuration) must remain on the last clip (no wrap to first clip).
Limitations
- Requires FFmpeg to be installed on the system
- TTS generation requires internet connection (Microsoft Edge TTS)
- Rendering is CPU-intensive and may take several minutes
- Maximum TTS text length: ~1000 characters per clip
- Frame capture requires sufficient disk space
Transparency Statement
This skill executes shell commands and spawns child processes for video rendering. All operations are limited to:
- Starting a local development server (Vite)
- Running FFmpeg for video encoding
- Launching Puppeteer for frame capture
- Detecting browser executables on the system
No arbitrary code execution or user input is passed to shell commands. All file paths and commands are predefined and validated.
如何使用「code2animation」?
- 打开小龙虾AI(Web 或 iOS App)
- 点击上方「立即使用」按钮,或在对话框中输入任务描述
- 小龙虾AI 会自动匹配并调用「code2animation」技能完成任务
- 结果即时呈现,支持继续对话优化