🤖
Office To Md V2
Convert PDF, DOC, DOCX, and PPTX office documents to Markdown, supporting legacy .doc files with text extraction and basic formatting preservation.
安全通过
⚙️脚本
技能说明
Office to Markdown Converter Skill (v2)
Description
Convert office documents (PDF, DOC, DOCX, PPTX) to Markdown format. This skill uses the word-extractor library for .doc support and provides full OpenClaw integration.
When to Use
- When you need to extract text from office documents
- When you want to convert documents to readable Markdown format
- When analyzing document content in OpenClaw
- Specifically when dealing with legacy .doc format files
Supported Formats
- PDF (.pdf): Text extraction using pdf-parse
- Word (.docx): Formatting preservation using mammoth + turndown
- Legacy Word (.doc): Text extraction using word-extractor (supports Chinese encoding)
- PowerPoint (.pptx): Basic text extraction using python-pptx
Dependencies
- Node.js with npm packages: pdf-parse, mammoth, turndown, word-extractor
- Python3 with python-pptx (for PPTX conversion, optional)
- OpenClaw exec tool permission
Installation
1. Copy the skill to your workspace:
cp -r /root/.openclaw/workspace/office-to-md-v2/office-to-md /path/to/your/workspace/
2. Install dependencies:
cd /path/to/your/workspace/office-to-md
npm install
3. For PPTX support (optional):
pip3 install python-pptx
Usage in OpenClaw
Method 1: Direct exec call
// Convert any supported document
const result = await exec(
'node /path/to/office-to-md/openclaw-skill.js /path/to/document.doc',
{ workdir: '/path/to/workspace', timeout: 60000 }
);
if (result.exitCode === 0) {
console.log('✅ Document converted successfully');
// Output file: /path/to/document.md
} else {
console.error('❌ Conversion failed:', result.stderr);
}
Method 2: Using the wrapper function
// Import the converter
const { convertOfficeToMarkdown } = require('/path/to/office-to-md/openclaw-skill.js');
// Convert document
const conversionResult = await convertOfficeToMarkdown('/path/to/document.pdf');
if (conversionResult.success) {
console.log(`Output: ${conversionResult.outputPath}`);
console.log(`Preview: ${conversionResult.preview}`);
} else {
console.error(`Error: ${conversionResult.error}`);
}
Method 3: Complete OpenClaw integration function
async function convertDocumentToMarkdown(filePath) {
// Validate file exists
try {
await read(filePath);
} catch (error) {
return { success: false, error: `File not found: ${filePath}` };
}
// Check file extension
const ext = filePath.toLowerCase().slice(-5);
const supported = ['.pdf', '.doc', '.docx', '.pptx'];
if (!supported.some(s => ext.endsWith(s))) {
return {
success: false,
error: `Unsupported file type. Supported: ${supported.join(', ')}`
};
}
// Convert using the skill
const cmd = `node /path/to/office-to-md/openclaw-skill.js "${filePath}"`;
const result = await exec(cmd, {
workdir: '/path/to/workspace',
timeout: 120000 // 2 minutes for large files
});
if (result.exitCode === 0) {
const outputPath = filePath.replace(/\.[^/.]+$/, '.md');
return {
success: true,
outputPath: outputPath,
message: `Converted to: ${outputPath}`
};
} else {
return {
success: false,
error: result.stderr || 'Conversion failed'
};
}
}
// Usage example
const result = await convertDocumentToMarkdown('/path/to/document.doc');
if (result.success) {
const markdown = await read(result.outputPath);
console.log(markdown.substring(0, 1000));
}
Examples
Example 1: Convert and analyze a document
// Convert a .doc file and analyze its content
const docPath = '/path/to/document.doc';
const convertResult = await exec(
`node /path/to/office-to-md/openclaw-skill.js "${docPath}"`,
{ workdir: '/path/to/workspace' }
);
if (convertResult.exitCode === 0) {
const mdPath = docPath.replace('.doc', '.md');
const content = await read(mdPath);
// Analyze the content
const wordCount = content.split(/\s+/).length;
const lines = content.split('\n').length;
const hasChinese = /[\u4e00-\u9fff]/.test(content);
console.log(`Document analysis:`);
console.log(`- Word count: ${wordCount}`);
console.log(`- Lines: ${lines}`);
console.log(`- Contains Chinese: ${hasChinese}`);
console.log(`- Preview: ${content.substring(0, 200)}...`);
}
Example 2: Batch conversion
// Convert multiple documents of different formats
const documents = [
'/path/to/report.pdf',
'/path/to/legacy.doc',
'/path/to/modern.docx',
'/path/to/presentation.pptx'
];
const results = [];
for (const doc of documents) {
console.log(`Converting ${doc}...`);
const result = await exec(
`node /path/to/office-to-md/openclaw-skill.js "${doc}"`,
{ workdir: '/path/to/workspace', timeout: 90000 }
);
const success = result.exitCode === 0;
results.push({
file: doc,
success: success,
error: success ? null : result.stderr
});
console.log(success ? '✅ Success' : '❌ Failed');
}
// Summary
const successful = results.filter(r => r.success).length;
console.log(`\nConversion summary: ${successful}/${results.length} successful`);
API Reference
convertOfficeToMarkdown(filePath)
Returns a Promise that resolves to:
{
success: boolean,
outputPath?: string,
markdown?: string,
preview?: string,
fileType?: string,
message?: string,
stats?: {
lines: number,
characters: number,
words: number
},
error?: string,
stack?: string
}
Configuration
Timeout Settings
- Small files (<1MB): 30 seconds
- Medium files (1-10MB): 60 seconds
- Large files (>10MB): 120 seconds
Memory Limits
- Default Node.js memory limit is sufficient for most documents
- For very large files, you may need to increase memory:
node --max-old-space-size=4096 openclaw-skill.js large-file.doc
Troubleshooting
Common Issues
-
"File not found"
- Check file path and permissions
- Use absolute paths for reliability
-
"Unsupported file type"
- Ensure file has correct extension
- Check if file is actually the claimed format
-
Conversion errors with .doc files
- The file may be corrupted or in an unusual format
- Try opening in Word and saving as .docx first
-
Chinese text appears as gibberish
- word-extractor should handle Chinese encoding automatically
- If issues persist, the file may use unusual encoding
-
Timeout errors
- Increase timeout for large files
- Check system resources
Debug Mode
Enable debug logging by setting environment variable:
DEBUG=office-to-md node openclaw-skill.js document.doc
Performance
- PDF: Fast, depends on file size
- DOCX: Fast to medium, good formatting preservation
- DOC: Medium, requires binary parsing
- PPTX: Slow, requires Python and external library
Limitations
- Images in documents are not extracted
- Complex formatting may not be fully preserved
- Tables may convert imperfectly to Markdown
- Very old or corrupted .doc files may fail
- Password-protected files are not supported
Changelog
v2.0.0 (2026-02-15)
- Added full .doc support using word-extractor
- Fixed ESM compatibility issues with pptConverter
- Added comprehensive OpenClaw integration
- Improved Chinese text extraction
- Added structured output with statistics
v1.0.0 (Initial)
- Basic PDF, DOCX, PPTX support
- Simple conversion without .doc support
License
This skill is provided as-is. The underlying libraries have their own licenses:
- pdf-parse: MIT
- mammoth: BSD-2-Clause
- turndown: MIT
- word-extractor: MIT
- python-pptx: MIT
如何使用「Office To Md V2」?
- 打开小龙虾AI(Web 或 iOS App)
- 点击上方「立即使用」按钮,或在对话框中输入任务描述
- 小龙虾AI 会自动匹配并调用「Office To Md V2」技能完成任务
- 结果即时呈现,支持继续对话优化