markit-markdown-converter
Skill by ara.so — Daily 2026 Skills collection.
markit converts almost anything to markdown: PDFs, Word docs, PowerPoint, Excel, HTML, EPUB, Jupyter notebooks, RSS feeds, CSV, JSON, YAML, images (with EXIF + AI description), audio (with metadata + AI transcription), ZIP archives, URLs, Wikipedia pages, and source code files. It works as a CLI tool and as a TypeScript/Node.js library, supports pluggable converters, and integrates with OpenAI, Anthropic, and any OpenAI-compatible LLM API.
Installation
```bash
# Global CLI
npm install -g markit-ai

# Or as a project dependency
npm install markit-ai
bun add markit-ai
pnpm add markit-ai
```
---
CLI Quick Reference
```bash
# Convert a file
markit report.pdf
markit document.docx
markit slides.pptx
markit data.xlsx
markit notebook.ipynb

# Convert a URL
markit https://example.com/article

# Convert media (requires LLM API key for AI features)
markit photo.jpg
markit recording.mp3
markit diagram.png -p "Describe the architecture and data flow"
markit receipt.jpg -p "List all line items with prices as a table"

# Output options
markit report.pdf -o report.md  # Write to file
markit report.pdf -q            # Raw markdown only (great for piping)
markit report.pdf --json        # Structured JSON output

# Read from stdin
cat file.pdf | markit -

# Pipe output
markit report.pdf | pbcopy
markit data.xlsx -q | some-other-tool

# List supported formats
markit formats

# Configuration
markit init         # Create .markit/config.json
markit config show  # Show resolved config
markit config get llm.model
markit config set llm.provider anthropic
markit config set llm.model claude-haiku-4-5

# Plugins
markit plugin install npm:markit-plugin-dwg
markit plugin install git:github.com/user/markit-plugin-ocr
markit plugin install ./my-plugin.ts
markit plugin list
markit plugin remove dwg

# Agent integration
markit onboard  # Adds usage instructions to CLAUDE.md
```
---
AI / LLM Configuration
Images and audio always get free metadata extraction. AI-powered description and transcription require an API key.
```bash
# OpenAI (default)
export OPENAI_API_KEY=sk-...
markit photo.jpg

# Anthropic
export ANTHROPIC_API_KEY=sk-ant-...
markit config set llm.provider anthropic
markit photo.jpg

# OpenAI-compatible APIs (Ollama, Groq, Together, etc.)
markit config set llm.apiBase http://localhost:11434/v1
markit config set llm.model llama3.2-vision
markit photo.jpg
```
`.markit/config.json` (created by `markit init`):
```json
{
  "llm": {
    "provider": "openai",
    "apiBase": "https://api.openai.com/v1",
    "model": "gpt-4.1-nano",
    "transcriptionModel": "gpt-4o-mini-transcribe"
  }
}
```
Environment variables always override config file values. Never store API keys in the config file; use env vars.
| Provider | Env Vars | Default Vision Model |
|---|---|---|
| `openai` | `OPENAI_API_KEY` | `gpt-4.1-nano` |
| `anthropic` | `ANTHROPIC_API_KEY` | |
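
Because keys live only in the environment, a script can fail fast when the variable for the configured provider is missing. The following is a minimal sketch of that check, not part of markit-ai: `missingApiKey` and `PROVIDER_ENV_KEYS` are hypothetical names, and the provider-to-variable mapping only covers the two providers this document pairs with an env var.

```typescript
// Hypothetical helper (not a markit-ai API): maps a provider name to the
// environment variable this document associates with it.
const PROVIDER_ENV_KEYS: Record<string, string> = {
  openai: "OPENAI_API_KEY",
  anthropic: "ANTHROPIC_API_KEY",
};

type Env = Record<string, string | undefined>;

// Returns the name of the missing env var for `provider`, or null when the
// key is set or the provider is unknown to this sketch (e.g. one added by a plugin).
function missingApiKey(provider: string, env: Env): string | null {
  const key = PROVIDER_ENV_KEYS[provider];
  if (key === undefined) return null;
  const value = env[key];
  return value === undefined || value.trim() === "" ? key : null;
}
```

A caller might run `missingApiKey("anthropic", process.env)` before converting images or audio and print the returned variable name if it is not null.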
SDK Usage
Basic File and URL Conversion
```typescript
import { readFileSync } from "fs";
import { Markit } from "markit-ai";

const markit = new Markit();

// Convert a file by path
const { markdown } = await markit.convertFile("report.pdf");
console.log(markdown);

// Convert a URL
const { markdown: webMd } = await markit.convertUrl("https://example.com/article");

// Convert a Buffer with explicit type hint
const buffer = readFileSync("document.docx");
const { markdown: docMd } = await markit.convert(buffer, { extension: ".docx" });
```
With OpenAI for Vision + Transcription
```typescript
import OpenAI from "openai";
import { Markit } from "markit-ai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from env

const markit = new Markit({
  describe: async (image: Buffer, mime: string) => {
    const res = await openai.chat.completions.create({
      model: "gpt-4.1-nano",
      messages: [
        {
          role: "user",
          content: [
            { type: "text", text: "Describe this image in detail." },
            {
              type: "image_url",
              image_url: {
                url: `data:${mime};base64,${image.toString("base64")}`,
              },
            },
          ],
        },
      ],
    });
    return res.choices[0].message.content ?? "";
  },
  transcribe: async (audio: Buffer, mime: string) => {
    const res = await openai.audio.transcriptions.create({
      model: "gpt-4o-mini-transcribe",
      file: new File([audio], "audio.mp3", { type: mime }),
    });
    return res.text;
  },
});

const { markdown } = await markit.convertFile("photo.jpg");
```
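
The `transcribe` callback above always names the upload `audio.mp3`, even when `mime` says the audio is something else. Some transcription APIs use the file name as a format hint, so it may be safer to derive the extension from the MIME type. The sketch below is an illustration only: `audioFileName` and the `AUDIO_EXTENSIONS` table are hypothetical, and the mapping is an assumed subset of common audio types.

```typescript
// Assumed mapping from common audio MIME types to file extensions.
// This list is illustrative; extend it for the formats you actually handle.
const AUDIO_EXTENSIONS: Record<string, string> = {
  "audio/mpeg": "mp3",
  "audio/wav": "wav",
  "audio/x-wav": "wav",
  "audio/mp4": "m4a",
  "audio/ogg": "ogg",
  "audio/flac": "flac",
  "audio/webm": "webm",
};

// Builds an upload file name whose extension matches the MIME type.
// Parameters such as "; codecs=..." are ignored; unknown types fall back to mp3,
// which mirrors the hard-coded name in the example above.
function audioFileName(mime: string): string {
  const base = mime.split(";")[0].trim().toLowerCase();
  const ext = AUDIO_EXTENSIONS[base] ?? "mp3";
  return `audio.${ext}`;
}
```

Inside `transcribe`, the file could then be created with `new File([audio], audioFileName(mime), { type: mime })`.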
With Anthropic for Vision
```typescript
import Anthropic from "@anthropic-ai/sdk";
import OpenAI from "openai";
import { Markit } from "markit-ai";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from env
const openai = new OpenAI(); // reads OPENAI_API_KEY from env

// Mix providers: Claude for images, OpenAI for audio transcription
const markit = new Markit({
  describe: async (image: Buffer, mime: string) => {
    const res = await anthropic.messages.create({
      model: "claude-haiku-4-5",
      max_tokens: 1024,
      messages: [
        {
          role: "user",
          content: [
            {
              type: "image",
              source: {
                type: "base64",
                media_type: mime as "image/jpeg" | "image/png" | "image/gif" | "image/webp",
                data: image.toString("base64"),
              },
            },
            { type: "text", text: "Describe this image." },
          ],
        },
      ],
    });
    return (res.content[0] as { text: string }).text;
  },
  transcribe: async (audio: Buffer, mime: string) => {
    const res = await openai.audio.transcriptions.create({
      model: "gpt-4o-mini-transcribe",
      file: new File([audio], "audio.mp3", { type: mime }),
    });
    return res.text;
  },
});
```
Using Built-in Providers via Config
```typescript
import { Markit, createLlmFunctions, loadConfig, loadAllPlugins } from "markit-ai";

// Reads .markit/config.json and env vars automatically
const config = loadConfig();

// Load any installed plugins
const plugins = await loadAllPlugins();

// Create instance with built-in providers + plugins
const markit = new Markit(createLlmFunctions(config), plugins);
const { markdown } = await markit.convertFile("report.pdf");
```
Writing a Plugin
Plugins let you add new formats or override built-in converters. Plugin converters run before built-ins.
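
The ordering rule can be pictured as a first-match lookup over plugin converters followed by built-ins. This is only a sketch of the stated behavior: how markit dispatches internally is not shown in this document, and `StreamInfo`, `Converter`, and `pickConverter` are local stand-ins modeled on the plugin examples here, not types imported from markit-ai.

```typescript
import { Buffer } from "node:buffer";

// Local stand-ins for the shapes used in this document's plugin examples (assumptions).
interface StreamInfo {
  extension?: string;
  mimeType?: string;
  fileName?: string;
}

interface Converter {
  name: string;
  accepts: (info: StreamInfo) => boolean;
  convert: (input: Buffer, info: StreamInfo) => Promise<{ markdown: string }>;
}

// Returns the first converter that accepts the input, checking plugins before
// built-ins, so a plugin that accepts ".pdf" shadows the built-in PDF converter.
function pickConverter(
  plugins: Converter[],
  builtins: Converter[],
  info: StreamInfo,
): Converter | undefined {
  return [...plugins, ...builtins].find((converter) => converter.accepts(info));
}
```

Under this model, overriding a built-in needs no special API: registering a converter whose `accepts` matches the same inputs is enough.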
Basic Converter Plugin
```typescript
// my-plugin.ts
import type { MarkitPluginAPI } from "markit-ai";

export default function (api: MarkitPluginAPI) {
  api.setName("my-format");
  api.setVersion("1.0.0");

  api.registerConverter(
    {
      name: "myformat",
      accepts: (info) => [".myf", ".myfmt"].includes(info.extension ?? ""),
      convert: async (input: Buffer, info) => {
        // info.extension, info.mimeType, info.fileName available
        const text = input.toString("utf-8");
        const markdown = `# Converted\n\n\`\`\`\n${text}\n\`\`\``;
        return { markdown };
      },
    },
    // Optional: declare so it appears in `markit formats`
    { name: "My Format", extensions: [".myf", ".myfmt"] },
  );
}
```
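
The converter above wraps the input in a fixed three-backtick fence, which breaks if the input itself contains a run of three or more backticks. One common workaround, sketched here with the hypothetical helpers `fenceFor` and `wrapInFence`, is to pick a fence one backtick longer than the longest run found in the text.

```typescript
// Picks a backtick fence that cannot be closed early by the text itself:
// at least three backticks, and one more than the longest run in the text.
function fenceFor(text: string): string {
  const runs: string[] = text.match(/`+/g) ?? [];
  const longest = runs.reduce((max, run) => Math.max(max, run.length), 0);
  return "`".repeat(Math.max(3, longest + 1));
}

// Wraps text in a fenced code block using a fence chosen by fenceFor.
function wrapInFence(text: string): string {
  const fence = fenceFor(text);
  return `${fence}\n${text}\n${fence}`;
}
```

In the plugin, `convert` could then return `{ markdown: "# Converted\n\n" + wrapInFence(text) }`.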
Override a Built-in Converter
```typescript
// better-pdf-plugin.ts
import type { MarkitPluginAPI } from "markit-ai";

export default function (api: MarkitPluginAPI) {
  api.setName("better-pdf");
  api.setVersion("1.0.0");

  // Runs before built-in PDF converter, effectively replacing it
  api.registerConverter({
    name: "pdf",
    accepts: (info) => info.extension === ".pdf",
    convert: async (input: Buffer, info) => {
      // Custom PDF extraction logic
      const markdown = `# PDF Content\n\n(extracted with custom logic)`;
      return { markdown };
    },
  });
}
```
Register a Custom LLM Provider
```typescript
import type { MarkitPluginAPI } from "markit-ai";

export default function (api: MarkitPluginAPI) {
  api.setName("gemini-provider");

  api.registerProvider({
    name: "gemini",
    envKeys: ["GOOGLE_API_KEY"],
    defaultBase: "https://generativelanguage.googleapis.com/v1beta",
    defaultModel: "gemini-2.0-flash",
    create: (config, prompt) => ({
      describe: async (image: Buffer, mime: string) => {
        // Call Gemini API here
        return "Image description from Gemini";
      },
    }),
  });
}
```
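
The `// Call Gemini API here` placeholder could be filled along the following lines. This is a hedged sketch, not the plugin's actual implementation: the helper names are hypothetical, and the request and response shapes follow my reading of the public Gemini REST `generateContent` endpoint, so verify them against the current Google documentation before relying on them.

```typescript
import { Buffer } from "node:buffer";

type GeminiPart = { text: string } | { inline_data: { mime_type: string; data: string } };

interface GeminiRequest {
  contents: { parts: GeminiPart[] }[];
}

// Builds the JSON body: one text part (the prompt) and one inline base64 image part.
function buildGeminiRequest(image: Buffer, mime: string, prompt: string): GeminiRequest {
  return {
    contents: [
      {
        parts: [
          { text: prompt },
          { inline_data: { mime_type: mime, data: image.toString("base64") } },
        ],
      },
    ],
  };
}

// Concatenates the text parts of the first candidate; returns "" if none are present.
function extractText(response: unknown): string {
  const r = response as {
    candidates?: { content?: { parts?: { text?: string }[] } }[];
  };
  const parts = r.candidates?.[0]?.content?.parts ?? [];
  return parts.map((part) => part.text ?? "").join("");
}

// Sends the request with the global fetch (Node 18+). The default base URL and
// model are the values from the plugin example; the header name is an assumption.
async function describeWithGemini(
  image: Buffer,
  mime: string,
  apiKey: string,
  model = "gemini-2.0-flash",
  base = "https://generativelanguage.googleapis.com/v1beta",
  prompt = "Describe this image.",
): Promise<string> {
  const res = await fetch(`${base}/models/${model}:generateContent`, {
    method: "POST",
    headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
    body: JSON.stringify(buildGeminiRequest(image, mime, prompt)),
  });
  if (!res.ok) throw new Error(`Gemini request failed with status ${res.status}`);
  return extractText(await res.json());
}
```

Keeping the request builder and response parser as pure functions makes them easy to unit-test without network access; only `describeWithGemini` touches the API.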
Install and Use a Plugin
```bash
# Install from file
markit plugin install ./my-plugin.ts

# Install from npm
markit plugin install npm:markit-plugin-dwg

# Install from git
markit plugin install git:github.com/user/markit-plugin-ocr

# Verify
markit plugin list

# Remove
markit plugin remove my-format
```
---
Common Patterns
Batch Convert a Directory
```typescript
import { Markit } from "markit-ai";
import { mkdirSync, readdirSync, readFileSync, writeFileSync } from "fs";
import { extname, basename, join } from "path";

const markit = new Markit();
const inputDir = "./docs";
const outputDir = "./docs-md";

// Create the output directory if it does not exist yet
mkdirSync(outputDir, { recursive: true });

const files = readdirSync(inputDir);
for (const file of files) {
  const ext = extname(file);
  // Skip anything that is not one of the formats we want to convert
  if (![".pdf", ".docx", ".html"].includes(ext)) continue;

  const buffer = readFileSync(join(inputDir, file));
  const { markdown } = await markit.convert(buffer, { extension: ext });

  const outName = basename(file, ext) + ".md";
  writeFileSync(join(outputDir, outName), markdown);
  console.log(`Converted: ${file} → ${outName}`);
}
```
Convert URL List
```typescript
import { Markit } from "markit-ai";

const markit = new Markit();

const urls = [
  "https://example.com/article-1",
  "https://example.com/article-2",
  "https://en.wikipedia.org/wiki/TypeScript",
];

const results = await Promise.all(
  urls.map(async (url) => {
    const { markdown } = await markit.convertUrl(url);
    return { url, markdown };
  })
);

for (const { url, markdown } of results) {
  console.log(`\n## ${url}\n\n${markdown.slice(0, 200)}...`);
}
```
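
With `Promise.all`, one URL that fails to convert rejects the whole batch. If partial results are acceptable, `Promise.allSettled` keeps the successes and records the failures. The sketch below takes the converter as a parameter so it does not depend on markit-ai; `convertAll` and `UrlResult` are hypothetical names.

```typescript
interface UrlResult {
  url: string;
  markdown?: string; // set when the conversion succeeded
  error?: string; // set when the conversion failed
}

// Converts every URL concurrently and reports each outcome separately,
// so a single failure does not discard the other results.
async function convertAll(
  urls: string[],
  convert: (url: string) => Promise<{ markdown: string }>,
): Promise<UrlResult[]> {
  const settled = await Promise.allSettled(urls.map((url) => convert(url)));
  return settled.map((outcome, i) =>
    outcome.status === "fulfilled"
      ? { url: urls[i], markdown: outcome.value.markdown }
      : { url: urls[i], error: String(outcome.reason) },
  );
}
```

With the instance from the example above, this would be called as `convertAll(urls, (url) => markit.convertUrl(url))`.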
CLI in Agent/Automation Scripts
```bash
# Structured JSON for programmatic parsing
markit report.pdf --json

# Raw markdown for piping (no spinner, no extra output)
markit report.pdf -q > report.md

# Pipe into another CLI tool
markit https://example.com/article -q | wc -w

# Process multiple files in a shell loop
mkdir -p out  # make sure the output directory exists
for f in docs/*.pdf; do
  markit "$f" -o "out/$(basename "$f" .pdf).md"
done
```
---
Supported Formats Reference
| Format | Extensions |
|---|---|
| PDF | |
| Word | |
| PowerPoint | |
| Excel | |
| HTML | |
| EPUB | |
| Jupyter | |
| RSS/Atom | |
| CSV/TSV | |
| JSON | |
| YAML | |
| XML/SVG | |
| Images | |
| Audio | |
| ZIP | |
| Code | |
| Plain text | |
| URLs | |

Run `markit formats` to see the full live list, including any installed plugins.
Troubleshooting
AI description/transcription not working
- Ensure the correct env var is set: `OPENAI_API_KEY` or `ANTHROPIC_API_KEY`
- Run `markit config show` to verify the resolved provider and model
- For custom API bases (Ollama, etc.), confirm the server is running and the model supports vision

Plugin not loading
- Run `markit plugin list` to confirm it's installed
- Check the plugin exports a default function matching `(api: MarkitPluginAPI) => void`
- Try reinstalling: `markit plugin remove <name>` then `markit plugin install <source>`

PDF returns empty or garbled markdown
- The built-in converter uses text extraction (not OCR). Scanned PDFs need an OCR plugin.
- Try a custom plugin or pre-process with an OCR tool first.

Stdin (`markit -`) not working
- Pipe content directly: `cat file.pdf | markit -`
- Ensure the file type can be detected from content; use explicit hints if needed.

Config not being read
- Config is loaded from `.markit/config.json` relative to the current working directory
- Run `markit init` to create it, then `markit config show` to verify
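
For the stdin item above, one way to obtain an explicit hint in SDK code is to look at the first bytes of the input. This is a minimal sketch, not how markit detects types: `sniffExtension` is a hypothetical helper that recognizes only the `%PDF-` header and the ZIP local-file signature. DOCX, XLSX, PPTX, and EPUB files are ZIP containers, so a `.zip` result is ambiguous and may need further inspection.

```typescript
import { Buffer } from "node:buffer";

// Guesses a file extension from leading magic bytes.
// Returns undefined when the signature is not one this sketch knows.
function sniffExtension(input: Buffer): string | undefined {
  // PDF files start with the ASCII header "%PDF-"
  if (input.subarray(0, 5).toString("latin1") === "%PDF-") return ".pdf";

  // ZIP local file header: "PK" followed by 0x03 0x04
  if (
    input.length >= 4 &&
    input[0] === 0x50 &&
    input[1] === 0x4b &&
    input[2] === 0x03 &&
    input[3] === 0x04
  ) {
    return ".zip";
  }

  return undefined;
}
```

The result can be passed as the `extension` option shown in the SDK examples, for instance `markit.convert(buffer, { extension: ".pdf" })` when the sniffed value is `.pdf`.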