Robust URL-to-Markdown extraction for OpenClaw workflows. Use this when the user needs to "extract/summarize/convert a webpage to Markdown" (especially for WeChat official accounts at mp.weixin.qq.com) and web_fetch or browser access is blocked or returns messy content. It first uses a low-cost probe via web_fetch, then falls back to the official MinerU API (through the local mineru-extract skill), and returns a traceable result contract with source links.
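The probe-then-fallback flow described above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: `probe` and `fallback` are hypothetical callables standing in for `web_fetch` and the mineru-extract skill, and the `MIN_USABLE_CHARS` check is an assumed placeholder for whatever heuristics the skill really applies.

```python
from typing import Callable, Dict, List, Optional

MIN_USABLE_CHARS = 200  # assumed threshold, not taken from the skill


def looks_usable(markdown: Optional[str]) -> bool:
    """Hypothetical stand-in for the skill's heuristics: reject empty or tiny output."""
    return markdown is not None and len(markdown.strip()) >= MIN_USABLE_CHARS


def extract(url: str,
            probe: Callable[[str], str],
            fallback: Callable[[str], str]) -> Dict[str, object]:
    """Try the cheap probe first; call the fallback only if the probe is unusable."""
    notes: List[str] = []
    try:
        text: Optional[str] = probe(url)
    except Exception as exc:  # a blocked or failing fetch should not abort the run
        text = None
        notes.append(f"probe failed: {exc}")

    if looks_usable(text):
        return {"ok": True, "source_url": url, "engine": "web_fetch",
                "markdown": text, "sources": [url], "notes": notes}

    notes.append("probe output unusable; fell back to MinerU")
    try:
        text = fallback(url)
    except Exception as exc:
        notes.append(f"fallback failed: {exc}")
        return {"ok": False, "source_url": url, "engine": "mineru",
                "markdown": "", "sources": [url], "notes": notes}
    return {"ok": True, "source_url": url, "engine": "mineru",
            "markdown": text, "sources": [url], "notes": notes}
```

Passing the two engines in as callables keeps the ordering logic testable without network access. The real skill may record additional fields, such as artifact paths, when MinerU is used.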
npx skill4agent add blessonism/openclaw-search-skills content-extract

Referenced in the workflow steps: url, references/domain-whitelist.md, model_version=MinerU-HTML, web_fetch(url), references/heuristics.md, skills/mineru-extract/scripts/mineru_parse_documents.py

{
"ok": true,
"source_url": "...",
"engine": "web_fetch",
"markdown": "...",
"artifacts": {
"out_dir": "...",
"markdown_path": "...",
"zip_path": "..."
},
"sources": [
"original URL",
"(if MinerU was used) MinerU full_zip_url",
"(if MinerU was used) local markdown_path"
],
"notes": ["any important limitations, failure reasons, or suggested next steps"]
}

Note: engine can be web_fetch or mineru.
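A consumer of this result contract might want to sanity-check it before using the Markdown. The sketch below is one possible check and is not part of the skill. The function name `contract_problems` is hypothetical, and treating `artifacts` as optional is an assumption based on it only being meaningful when MinerU produced files.

```python
from typing import Any, Dict, List

ALLOWED_ENGINES = {"web_fetch", "mineru"}
# Assumption: "artifacts" is optional, so it is not listed as required here.
REQUIRED_KEYS = {"ok", "source_url", "engine", "markdown", "sources", "notes"}


def contract_problems(result: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the shape looks fine."""
    problems: List[str] = []

    missing = sorted(REQUIRED_KEYS - result.keys())
    if missing:
        problems.append("missing keys: " + ", ".join(missing))

    if result.get("engine") not in ALLOWED_ENGINES:
        problems.append("engine must be web_fetch or mineru")

    sources = result.get("sources")
    if not isinstance(sources, list) or not sources:
        problems.append("sources must be a non-empty list")

    if not isinstance(result.get("notes", []), list):
        problems.append("notes must be a list")

    return problems
```

Returning a list of problems rather than raising lets the caller copy them into `notes` and still hand back a partial result.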
python3 mineru-extract/scripts/mineru_parse_documents.py \
--file-sources "<URL>" \
--model-version MinerU-HTML \
--emit-markdown --max-chars 20000

Path Note: The above command assumes you are executing it in the root directory of the skills installation. If mineru-extract is installed in another location, please replace it with the actual path.
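If the fallback is driven from Python rather than a shell, the same command can be assembled as an argument list. This is a sketch under assumptions: the helper names `build_mineru_command` and `run_mineru` are hypothetical, only the flags shown in the command above are used, and the script's output format is not assumed, so stdout is returned unparsed.

```python
import subprocess
from typing import List

# Relative path from the command above; adjust if mineru-extract lives elsewhere.
DEFAULT_SCRIPT = "mineru-extract/scripts/mineru_parse_documents.py"


def build_mineru_command(url: str,
                         script_path: str = DEFAULT_SCRIPT,
                         max_chars: int = 20000) -> List[str]:
    """Assemble the argv list mirroring the shell command, without running it."""
    return [
        "python3", script_path,
        "--file-sources", url,
        "--model-version", "MinerU-HTML",
        "--emit-markdown",
        "--max-chars", str(max_chars),
    ]


def run_mineru(url: str, script_path: str = DEFAULT_SCRIPT) -> str:
    """Run the script and return its raw stdout; raises CalledProcessError on failure."""
    completed = subprocess.run(
        build_mineru_command(url, script_path),
        capture_output=True, text=True, check=True,
    )
    return completed.stdout
```

Passing the URL as a separate list element avoids shell quoting problems with query strings, which are common in mp.weixin.qq.com links.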