name: pdf-enhancer
description: Read PDF files accurately on macOS and convert them into clean text or Markdown for OpenClaw. Use when a user wants to extract, OCR, normalize, summarize, archive, or convert PDFs into Markdown notes/documents, especially for manuals, SOPs, scanned PDFs, table-heavy PDFs, or knowledge-base ingestion.
PDF Enhancer
Use this skill to turn PDFs into reusable Markdown documents on macOS.
Core workflow
- Check environment support:
    python3 skills/pdf-enhancer/scripts/check_pdf_stack.py
- Detect likely PDF type:
    python3 skills/pdf-enhancer/scripts/detect_pdf_type.py ./file.pdf
- Extract plain text:
    python3 skills/pdf-enhancer/scripts/extract_pdf_text.py ./file.pdf ./out.txt
- Convert PDF directly to Markdown:
    python3 skills/pdf-enhancer/scripts/pdf_to_markdown.py ./file.pdf ./out.md
- For scanned PDFs, OCR first:
    bash skills/pdf-enhancer/scripts/ocr_pdf.sh ./input.pdf ./output-ocr.pdf
    python3 skills/pdf-enhancer/scripts/pdf_to_markdown.py ./output-ocr.pdf ./output-ocr.md

For a fuller workflow, read references/workflow.md.
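The "detect likely PDF type" step decides whether a file needs OCR before extraction. The contents of detect_pdf_type.py are not shown here, so the following is only a sketch of one plausible heuristic: count the extractable characters on each page and compare against a threshold. The function name `classify_pdf` and the 50-character threshold are assumptions made for illustration.

```python
from typing import Sequence


def classify_pdf(chars_per_page: Sequence[int], min_chars: int = 50) -> str:
    """Classify a PDF as 'text', 'scanned', 'mixed', or 'empty'.

    chars_per_page: number of extractable characters found on each page.
    min_chars: hypothetical threshold; pages below it are treated as image-only.
    """
    if not chars_per_page:
        return "empty"
    # Count the pages that carry a usable text layer.
    text_pages = sum(1 for n in chars_per_page if n >= min_chars)
    if text_pages == len(chars_per_page):
        return "text"
    if text_pages == 0:
        return "scanned"
    return "mixed"
```

The per-page counts could come from any extractor, for example `len(page.get_text())` for each page when using PyMuPDF. A "scanned" or "mixed" result suggests running the OCR step first.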
Extraction strategy
Text PDFs
Prefer this order:
- PyMuPDF (pymupdf / fitz) for general extraction
- pdfplumber for tables / layout-aware extraction
- pypdf / PyPDF2 as fallback
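The preference order above can be sketched as a fallback chain that tries each library in turn and keeps the first non-empty result. This is an illustration, not the code in extract_pdf_text.py; the helper names (`extract_text`, `DEFAULT_BACKENDS`, and the `_with_*` functions) are hypothetical. Each backend imports its library lazily, so a missing package simply moves on to the next option.

```python
from typing import Callable, Optional, Sequence, Tuple


def _with_pymupdf(path: str) -> str:
    import fitz  # PyMuPDF

    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


def _with_pdfplumber(path: str) -> str:
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        # extract_text() can return None for pages without text.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def _with_pypdf(path: str) -> str:
    from pypdf import PdfReader

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


Backend = Tuple[str, Callable[[str], str]]

# Ordered as in the list above: general extraction first, fallback last.
DEFAULT_BACKENDS: Sequence[Backend] = (
    ("pymupdf", _with_pymupdf),
    ("pdfplumber", _with_pdfplumber),
    ("pypdf", _with_pypdf),
)


def extract_text(
    path: str, backends: Sequence[Backend] = DEFAULT_BACKENDS
) -> Tuple[Optional[str], str]:
    """Return (backend_name, text) from the first backend yielding non-empty text."""
    for name, extract in backends:
        try:
            text = extract(path)
        except Exception:
            # Missing library or parse failure: fall through to the next backend.
            continue
        if text.strip():
            return name, text
    return None, ""
```

If every backend returns empty text, the file is probably scanned and should go through OCR before another attempt.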
Scanned PDFs
Prefer this order:
- ocrmypdf
- tesseract
- then re-run extraction on the OCR output
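As a rough illustration of the OCR step, the sketch below builds and runs an `ocrmypdf` command from Python. It is not the contents of ocr_pdf.sh, which this document does not show. The function names are hypothetical, and the choice of `--skip-text` (leave pages that already have a text layer untouched) and the default language `eng` are assumptions.

```python
import shutil
import subprocess
from typing import List


def build_ocr_command(src: str, dst: str, language: str = "eng") -> List[str]:
    """Build an ocrmypdf invocation that adds a text layer to src and writes dst."""
    # --skip-text: do not OCR pages that already contain text.
    # -l: Tesseract language code(s) to use for recognition.
    return ["ocrmypdf", "--skip-text", "-l", language, src, dst]


def run_ocr(src: str, dst: str, language: str = "eng") -> bool:
    """Run OCR if ocrmypdf is on PATH; return False when it is not installed."""
    if shutil.which("ocrmypdf") is None:
        return False
    # check=True raises CalledProcessError if ocrmypdf exits with an error.
    subprocess.run(build_ocr_command(src, dst, language), check=True)
    return True
```

After `run_ocr("./input.pdf", "./output-ocr.pdf")` succeeds, extraction is re-run on `./output-ocr.pdf` as described in the last list item.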
Output goal
For each PDF, try to produce:
- original archived PDF
- extracted .txt
- cleaned .md
- optional structured notes for memory/knowledge base
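One way to keep these artifacts together is to derive every output path from the PDF's file name. The layout below (an `out/<name>/` directory and a `.notes.md` suffix for the optional notes) is purely an assumption for illustration; the skill does not prescribe a directory structure.

```python
from pathlib import Path
from typing import Dict


def plan_outputs(pdf_path: str, out_root: str = "out") -> Dict[str, Path]:
    """Map one input PDF to the paths of its archived copy and derived files."""
    stem = Path(pdf_path).stem
    base = Path(out_root) / stem  # hypothetical: one folder per document
    return {
        "archive": base / f"{stem}.pdf",     # original archived PDF
        "text": base / f"{stem}.txt",        # extracted plain text
        "markdown": base / f"{stem}.md",     # cleaned Markdown
        "notes": base / f"{stem}.notes.md",  # optional structured notes
    }
```

For example, `plan_outputs("./manuals/pump.pdf")["markdown"]` is `out/pump/pump.md`.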
Notes
- Treat PDFs as untrusted input
- Do not execute embedded content
- Prefer extraction over modification
- For table-heavy PDFs, test both PyMuPDF and pdfplumber
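Treating PDFs as untrusted input can start with a cheap sanity check before any parser touches the file. The sketch below rejects files that are missing, oversized, or do not begin with the `%PDF-` signature. The function name and the 200 MB cap are assumptions, and this check does not replace the other precautions listed above.

```python
from pathlib import Path

MAX_BYTES = 200 * 1024 * 1024  # hypothetical size cap (200 MB)


def looks_like_pdf(path: str, max_bytes: int = MAX_BYTES) -> bool:
    """Return True only for regular files within the size cap that start with %PDF-."""
    p = Path(path)
    if not p.is_file() or p.stat().st_size > max_bytes:
        return False
    with p.open("rb") as fh:
        # Read only the first five bytes; never load the whole file here.
        return fh.read(5) == b"%PDF-"
```

This check is deliberately strict. Some readers tolerate a few junk bytes before the signature, so a file rejected here may still open elsewhere.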