Arxiv Paper Processor
Tool for manual, per-paper arXiv processing: download artifacts (batch, source, or PDF), then have the model read the full text and write summary.md in the chosen language.
Skill Description
name: arxiv-paper-processor
description: "Tool-only paper processing skill with a manual language parameter: supports batch artifact download for many papers or single-paper download, then the model manually reads source/PDF and writes summary.md in the selected language. Use when per-paper comprehension should be model-driven instead of script-generated."
ArXiv Paper Processor
Use this skill for per-paper manual summarization, with optional batch artifact download.
- Single-paper mode: process one paper directory (e.g. <run_dir>/<arxiv_id>/).
- Batch predownload mode: process many paper directories under one run dir before writing summaries.
Language Parameter
- Use a workflow language parameter (for example English or Chinese) and apply it manually.
- The per-paper summary.md must be written in the selected language.
- If download scripts are called directly, pass --language <LANG> for traceability.
Core Principle
Scripts only fetch artifacts. The model performs reading and writing.
Non-negotiable Constraint
- Do not generate summary.md by script-based snippet extraction, regex harvesting, or template autofill.
- Do not use Python/shell scripts to auto-compose section text from abstract/introduction fragments.
- Scripts in this skill are only for artifact download (source/pdf) and trace logs.
- The final summary.md must come from model-side reading and synthesis of the paper content.
Optional Batch Artifact Download (Many Papers)
Use this first when Stage B has many papers:
python3 scripts/download_papers_batch.py \
--run-dir /path/to/run \
--artifact source_then_pdf \
--max-workers 3 \
--min-interval-sec 5 \
--language English
Key behavior:
- Supports --artifact source, --artifact pdf, or --artifact source_then_pdf (default).
- Supports concurrency (--max-workers) and safe throttling/retry (--min-interval-sec, retry args).
- Uses run-local throttle state by default (<run_dir>/.runtime/arxiv_download_state.json) to reduce 429 risk.
- Skips papers that already have usable source/source_extract/*.tex or existing source/paper.pdf (unless --force).
- Resume-friendly: if a paper already has a completed summary.md, you can skip that paper's summary-writing step.
- Writes batch log to <run_dir>/download_batch_log.json by default.
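The minimum-interval throttling described above can be pictured with a small sketch. The schema of arxiv_download_state.json is not documented here, so the single last_request_ts field and the function names seconds_to_wait and throttle are hypothetical, chosen only for illustration; this is not the batch script's actual implementation.

```python
import json
import time
from pathlib import Path


def seconds_to_wait(last_request_ts: float, now: float, min_interval_sec: float) -> float:
    """Return how long to sleep so two requests are at least min_interval_sec apart."""
    elapsed = now - last_request_ts
    return max(0.0, min_interval_sec - elapsed)


def throttle(state_path: Path, min_interval_sec: float) -> float:
    """Sleep if needed, then record the new request time. Returns the time slept.

    Assumed (hypothetical) state schema: {"last_request_ts": <unix seconds>}.
    """
    last_ts = 0.0
    if state_path.exists():
        try:
            last_ts = float(json.loads(state_path.read_text()).get("last_request_ts", 0.0))
        except (ValueError, TypeError, AttributeError):
            last_ts = 0.0  # unreadable state: behave as if no request was made yet
    wait = seconds_to_wait(last_ts, time.time(), min_interval_sec)
    if wait > 0:
        time.sleep(wait)
    # Persist the timestamp so a later run in the same run dir keeps the spacing.
    state_path.parent.mkdir(parents=True, exist_ok=True)
    state_path.write_text(json.dumps({"last_request_ts": time.time()}))
    return wait
```

Keeping the state file inside the run directory means a resumed run still respects the spacing from the previous one. This sketch is not safe for concurrent workers; with --max-workers above 1, a real implementation would need a lock around the read-sleep-write sequence.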
Step 1: Download Source (Preferred)
python3 scripts/download_arxiv_source.py \
--paper-dir /path/to/run/2602.00528 \
--language English
This writes:
- source/source_bundle.bin
- source/source_extract/
- source/download_source_log.json
If usable source already exists and --force is not set, the script reuses local artifacts.
Step 2: If Needed, Download PDF
python3 scripts/download_arxiv_pdf.py \
--paper-dir /path/to/run/2602.00528 \
--language English
This writes:
- source/paper.pdf
- source/download_pdf_log.json
If PDF already exists and --force is not set, the script reuses local artifacts.
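Steps 1 and 2 can also be driven from Python. A minimal sketch, assuming the scripts are invoked from the skill root and that --force is a plain boolean flag (the text implies this but does not show it); the helper names build_download_cmd and run_download are hypothetical.

```python
import subprocess
import sys

# Script paths as given in Step 1 and Step 2.
SCRIPTS = {
    "source": "scripts/download_arxiv_source.py",
    "pdf": "scripts/download_arxiv_pdf.py",
}


def build_download_cmd(kind: str, paper_dir: str, language: str, force: bool = False) -> list:
    """Build the argv for one single-paper download script (kind: 'source' or 'pdf')."""
    if kind not in SCRIPTS:
        raise ValueError(f"unknown artifact kind: {kind!r}")
    cmd = [sys.executable, SCRIPTS[kind], "--paper-dir", paper_dir, "--language", language]
    if force:
        cmd.append("--force")  # assumed to be a value-less flag
    return cmd


def run_download(kind: str, paper_dir: str, language: str, force: bool = False) -> int:
    """Run one download script and return its exit code (no exception on failure)."""
    return subprocess.run(build_download_cmd(kind, paper_dir, language, force), check=False).returncode
```

A caller would typically try run_download("source", ...) first and only call run_download("pdf", ...) when no usable source was produced.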
Step 3: Model Reads and Summarizes
- If summary.md already exists and follows the required format, skip this paper and mark it complete.
- Read metadata.md first.
- If source/source_extract/ already exists with readable .tex files, use it directly.
- Otherwise, if source/paper.pdf already exists, use PDF directly.
- If neither exists, run download scripts (single-paper scripts or batch script) first.
- Manually write summary.md in the same paper directory, in the selected language.
Do not rely on rule-based auto summarization. Do not rely on auto-extracted snippets as the primary writing basis.
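The order of checks in Step 3 can be summarized as a small decision helper. This is a sketch only: the function name choose_next_action and its return labels are hypothetical, and it decides which artifact the model should read without producing any summary text itself. It treats any non-empty summary.md as complete, whereas the step above also requires the file to follow the required format.

```python
from pathlib import Path


def choose_next_action(paper_dir: Path) -> str:
    """Return the next step for one paper directory, following the Step 3 order.

    Possible results: 'skip', 'read_source', 'read_pdf', 'download'.
    """
    # Simplification: format compliance of an existing summary.md is not checked here.
    summary = paper_dir / "summary.md"
    if summary.is_file() and summary.stat().st_size > 0:
        return "skip"
    # Prefer extracted LaTeX source when at least one .tex file is present.
    extract_dir = paper_dir / "source" / "source_extract"
    if extract_dir.is_dir() and any(extract_dir.glob("*.tex")):
        return "read_source"
    # Fall back to the PDF if it has already been downloaded.
    if (paper_dir / "source" / "paper.pdf").is_file():
        return "read_pdf"
    # Nothing usable on disk yet: run the download scripts first.
    return "download"
```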
Quality Requirement
- Every section should include paper-specific details that are traceable to full-text reading.
- Sections 4, 5, and 10 should reflect concrete method and evaluation details, not generic wording.
- If key details are unclear in the source, explicitly note uncertainty instead of guessing.
- Match the detail level shown in references/summary-example-en.md and references/summary-example-zh.md.
- If your draft is clearly shorter or less specific than the examples, expand it before finishing.
Required Output
- <paper_dir>/summary.md in fixed section format.
- Pay special attention to section ## 10. Brief Conclusion: write a 3-4 sentence mini-conclusion that covers contribution, method, evaluation setup, and results with paper-specific details.
- In section ## 1. Paper Snapshot, use exact keys: ArXiv ID, Title, Authors, Publish date, Primary category, Reading basis.
- Do not use key variants such as Reading source, Author list, Published on, or lowercase key names.
See references/summary-format.md for exact section requirements.
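After a summary has been written by the model, the exact-key rule for the Paper Snapshot section can be checked mechanically. The sketch below only validates and never generates content. It assumes each key sits on its own line as "Key: value", optionally behind a bullet or bold markers; the authoritative layout is in references/summary-format.md, and the helper names snapshot_keys and check_snapshot are hypothetical.

```python
import re

REQUIRED_KEYS = ["ArXiv ID", "Title", "Authors", "Publish date", "Primary category", "Reading basis"]


def snapshot_keys(summary_text: str) -> list:
    """Collect 'Key: value' keys found inside the '## 1. Paper Snapshot' section."""
    keys = []
    in_section = False
    for line in summary_text.splitlines():
        if line.startswith("## "):
            # Entering or leaving the snapshot section at each level-2 heading.
            in_section = line.strip() == "## 1. Paper Snapshot"
            continue
        if not in_section:
            continue
        # Tolerate a leading bullet and bold markers around the key (assumed layout).
        match = re.match(r"^\s*(?:[-*]\s+)?\**([^:*]+?)\**\s*:", line)
        if match:
            keys.append(match.group(1).strip())
    return keys


def check_snapshot(summary_text: str) -> dict:
    """Report required keys that are missing and keys that are not in the required set."""
    found = snapshot_keys(summary_text)
    return {
        "missing": [k for k in REQUIRED_KEYS if k not in found],
        "unexpected": [k for k in found if k not in REQUIRED_KEYS],
    }
```

The comparison is case-sensitive on purpose, so a lowercase variant such as "arxiv id" shows up as both a missing required key and an unexpected one.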
Related Skills
This skill is a sub-skill of arxiv-summarizer-orchestrator.
Pipeline position:
- Step 1 (upstream): arxiv-search-collector produces the selected paper directories and metadata.
- Step 2 (this skill): arxiv-paper-processor downloads artifacts and writes one summary.md per paper.
- Step 3 (downstream): arxiv-batch-reporter uses these per-paper summaries to generate the final collection report.
Use this skill together with Step 1 and Step 3 for full end-to-end execution.