Search Results: pdf-text-extraction

Found 8 Skills

AI & Machine Learningaktsmm/agent-skills

ocr-super-surya

GPU-optimized OCR using Surya. Use when: (1) Extracting text from images/screenshots, (2) Processing PDFs with embedded images, (3) Multi-language document OCR, (4) Layout analysis and table detection. Supports 90+ languages with 2x accuracy over Tesseract.

🇺🇸|EnglishTranslated

1 scripts/Checked

AI & Machine Learningvamseeachanta/workspace-h...

document-rag-pipeline

Build complete document knowledge bases with PDF text extraction, OCR for scanned documents, vector embeddings, and semantic search. Use this for creating searchable document libraries from folders of PDFs, technical standards, or any document collection.

🇺🇸|EnglishTranslated

Document Processingchildbamboo/claude-code-m...

pdf-reader

Reads PDF files and extracts text content in Markdown format. Handles tables and multi-page documents. Use when needing to read PDF documents. Requires pdfplumber package.

🇺🇸|EnglishTranslated

1 scripts/Checked

AI & Machine Learningparlamento-ai/parlamento-...

mistral-ocr

Extract text from images and PDFs using Mistral OCR API. Convert scanned documents to Markdown, JSON, or plain text. No external dependencies required. Use when you need OCR, extract text from images, convert PDFs to markdown, or digitize documents.

🇺🇸|EnglishTranslated

AI & Machine Learningletta-ai/skills

extracting-pdf-text

Extract text from PDFs for LLM consumption. Use when processing PDFs for RAG, document analysis, or text extraction. Supports API services (Mistral OCR) and local tools (PyMuPDF, pdfplumber). Handles text-based PDFs, tables, and scanned documents with OCR.

🇺🇸|EnglishTranslated

4 scripts/Checked

Document Processingteam-commonly/commonly

pdf

Manipulate PDF files — extract text, count pages, render thumbnails, merge or split documents. Use for PDF-specific operations that don't fit `markdown-converter` (general read) or `pandic-office` (write from markdown).

🇺🇸|EnglishTranslated

Data Processingratacat/claude-skills

ebook-extractor

Use when user wants to extract text from ebooks (EPUB, MOBI, PDF). Use for converting ebooks to plain text for analysis, processing, or reading. Handles all common ebook formats.

🇺🇸|EnglishTranslated

5 scripts/Attention

Document Processingwilloscar/research-units-...

pdf-text-extractor

Download PDFs (when available) and extract plain text to support full-text evidence, writing `papers/fulltext_index.jsonl` and `papers/fulltext/*.txt`. **Trigger**: PDF download, fulltext, extract text, papers/pdfs, 全文抽取, 下载PDF. **Use when**: `queries.md` 设置 `evidence_mode: fulltext`（或你明确需要全文证据）并希望为 paper notes/claims 提供更强 evidence。 **Skip if**: `evidence_mode: abstract`（默认）；或你不希望进行下载/抽取（成本/权限/时间）。 **Network**: fulltext 下载通常需要网络（除非你手工提供 PDF 缓存在 `papers/pdfs/`）。 **Guardrail**: 缓存下载到 `papers/pdfs/`；默认不覆盖已有抽取文本（除非显式要求重抽）。

🇺🇸|EnglishTranslated

1 scripts/Checked