Found 43 Skills
Crawl and extract content from websites
High-performance web crawler for discovering and mapping website structure. Use when users ask to crawl a website, map site structure, discover pages, find all URLs on a site, analyze link relationships, or generate site reports. Supports sitemap discovery, checkpoint/resume, rate limiting, and HTML report generation.
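A rough sense of how the features listed above might fit together in a crawl loop is sketched below: seed URLs come from the sitemap, requests are rate limited, and visited URLs are checkpointed so a crawl can resume. The function names, checkpoint file, and rate value are illustrative assumptions, not this skill's actual API.

```python
# Hypothetical crawl loop: sitemap-seeded, rate-limited, checkpointed.
# CHECKPOINT path and RATE_LIMIT_SECONDS are illustrative choices.
import json
import time
import xml.etree.ElementTree as ET
from pathlib import Path

import requests

CHECKPOINT = Path("crawl_checkpoint.json")
RATE_LIMIT_SECONDS = 1.0  # one request per second


def sitemap_urls(site: str) -> list[str]:
    """Discover seed URLs from /sitemap.xml (assumes a standard sitemap)."""
    resp = requests.get(f"{site.rstrip('/')}/sitemap.xml", timeout=10)
    resp.raise_for_status()
    root = ET.fromstring(resp.content)
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    return [loc.text for loc in root.findall(".//sm:loc", ns) if loc.text]


def crawl(site: str) -> None:
    visited = set(json.loads(CHECKPOINT.read_text())) if CHECKPOINT.exists() else set()
    for url in sitemap_urls(site):
        if url in visited:
            continue  # resume: skip pages already fetched in a previous run
        requests.get(url, timeout=10)
        visited.add(url)
        CHECKPOINT.write_text(json.dumps(sorted(visited)))
        time.sleep(RATE_LIMIT_SECONDS)


if __name__ == "__main__":
    crawl("https://example.com")
```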
Extract Udemy course content to markdown. Use when the user asks to scrape or crawl Udemy course pages.
High-performance Rust web crawler with stealth mode, LLM-ready Markdown export, multi-format output, sitemap discovery, and robots.txt support. Optimized for content extraction, site mapping, structure analysis, and LLM/RAG pipelines.
Fetches web pages and converts them to clean markdown using a robust 3-tier chain (Firecrawl → Jina Reader → Scrapling stealth browser). Use this skill instead of WebFetch whenever the user provides a URL and needs the page's text content — especially for sites that block direct access: medium.com articles (paywalled/metered), WeChat public accounts (mp.weixin.qq.com, geo-restricted), documentation sites with bot protection, or any page where simple HTTP fetching might return a CAPTCHA or empty page. Triggers for: "read this URL", "summarize this article/page", "grab the content from", "extract text from", "what does this page say", "fetch this link", or any request to access and process a specific web page. Do NOT trigger for: building scrapers, checking HTTP status codes, parsing already-downloaded HTML files, answering conceptual questions about scraping tools, or monitoring page changes.
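The tiered-fallback idea behind this skill can be sketched generically. Only the Jina Reader tier is written out concretely below (it prefixes the target URL with https://r.jina.ai/ and returns markdown); `firecrawl_fetch` and `scrapling_fetch` are hypothetical placeholders, since their exact APIs are not given here.

```python
# Hedged sketch of a fetch-to-markdown fallback chain.
import requests


def jina_reader_fetch(url: str) -> str:
    resp = requests.get(f"https://r.jina.ai/{url}", timeout=30)
    resp.raise_for_status()
    return resp.text  # markdown rendering of the page


def fetch_markdown(url: str, tiers) -> str:
    last_error = None
    for tier in tiers:  # try each tier in order, fall through on failure
        try:
            text = tier(url)
            if text.strip():
                return text
        except Exception as exc:
            last_error = exc
    raise RuntimeError(f"all tiers failed for {url}") from last_error


# Usage (firecrawl_fetch and scrapling_fetch are hypothetical):
# fetch_markdown(url, [firecrawl_fetch, jina_reader_fetch, scrapling_fetch])
```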
AI crawler access analysis. Checks robots.txt, meta tags, and HTTP headers to determine which AI crawlers can access the site. Provides a complete access map and recommendations for maximizing AI visibility while maintaining appropriate control.
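The robots.txt portion of such a check can be done with the standard library alone; meta robots tags and X-Robots-Tag headers would need a separate HTML/header pass and are omitted here. The crawler list and function name below are illustrative.

```python
# Minimal robots.txt access map for common AI crawler user agents.
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended", "PerplexityBot"]


def ai_access_map(site: str, path: str = "/") -> dict[str, bool]:
    parser = RobotFileParser(f"{site.rstrip('/')}/robots.txt")
    parser.read()
    url = f"{site.rstrip('/')}{path}"
    return {agent: parser.can_fetch(agent, url) for agent in AI_CRAWLERS}


if __name__ == "__main__":
    print(ai_access_map("https://example.com"))
```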
[Hyper] Investigate websites with Playwright plus CDP to choose a crawl strategy, capture API/auth evidence, document findings under `.hypercore/crawler/[site]/`, and generate crawler code only after discovery is grounded in that evidence.
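The discovery step might look like the sketch below, assuming Playwright's Python API: open the page, log XHR/fetch responses, and save the evidence for later crawler design. The output path mirrors the `.hypercore/crawler/[site]/` convention mentioned above, but the exact layout and field names are assumptions.

```python
# Capture API evidence from a page's network traffic via Playwright.
import json
from pathlib import Path

from playwright.sync_api import sync_playwright


def capture_api_evidence(url: str, out_dir: str) -> None:
    evidence = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Record every XHR/fetch response the page triggers.
        page.on(
            "response",
            lambda r: evidence.append({"url": r.url, "status": r.status})
            if r.request.resource_type in ("xhr", "fetch")
            else None,
        )
        page.goto(url, wait_until="networkidle")
        browser.close()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "api_evidence.json").write_text(json.dumps(evidence, indent=2))
```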
Content extraction for Chinese news sites. Supports WeChat Official Accounts, Toutiao, NetEase News, Sohu News, and Tencent News. Activated when users need to extract Chinese news content, crawl official account articles, scrape news, or obtain news in JSON/Markdown format.
Automatically crawl website data and API endpoints. Use this skill when you need to scrape web content, call APIs, parse data, or create crawler scripts.
Implement a web crawler pipeline covering URL discovery, fetching, parsing, and storage. Use this skill when the user needs to build a site crawler, audit website structure, or collect web data systematically — even if they say 'scrape a website', 'crawl all pages', or 'site audit spider'.
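One possible end-to-end shape for such a pipeline is sketched below: a BFS frontier for URL discovery, requests for fetching, BeautifulSoup for link parsing, and SQLite for storage. Scope rules, politeness delays, and robots.txt handling are omitted for brevity, and the library choices are illustrative rather than prescribed by this skill.

```python
# Minimal discovery -> fetch -> parse -> store pipeline.
import sqlite3
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def crawl(start_url: str, db_path: str = "crawl.db", max_pages: int = 50) -> None:
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, status INT, title TEXT)")
    domain = urlparse(start_url).netloc
    frontier, seen, fetched = deque([start_url]), {start_url}, 0

    while frontier and fetched < max_pages:
        url = frontier.popleft()
        resp = requests.get(url, timeout=10)
        fetched += 1
        soup = BeautifulSoup(resp.text, "html.parser")
        title = soup.title.string.strip() if soup.title and soup.title.string else ""
        conn.execute("INSERT OR REPLACE INTO pages VALUES (?, ?, ?)", (url, resp.status_code, title))
        conn.commit()
        # Discovery: enqueue same-domain links we have not seen yet.
        for a in soup.find_all("a", href=True):
            link = urljoin(url, a["href"]).split("#")[0]
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                frontier.append(link)
    conn.close()
```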
When the user wants to configure, audit, or optimize robots.txt. Also use when the user mentions "robots.txt," "crawler rules," "block crawlers," "AI crawlers," "GPTBot," "allow/disallow," "disallow path," "crawl directives," "user-agent," "block Googlebot," "fix robots.txt," "robots.txt blocking," or "search engine crawling."
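An illustrative policy of the kind this skill produces, blocking one AI crawler while leaving search engines and most paths open, can be verified with the standard-library parser. The directives below are a made-up example, not output from the skill.

```python
# Check that an example robots.txt policy behaves as the comments claim.
from urllib.robotparser import RobotFileParser

EXAMPLE_ROBOTS_TXT = """
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
Allow: /
"""


def check_policy() -> None:
    parser = RobotFileParser()
    parser.parse(EXAMPLE_ROBOTS_TXT.splitlines())
    assert not parser.can_fetch("GPTBot", "https://example.com/post")       # AI crawler blocked
    assert parser.can_fetch("Googlebot", "https://example.com/post")        # search engine allowed
    assert not parser.can_fetch("Googlebot", "https://example.com/admin/")  # sensitive path blocked


if __name__ == "__main__":
    check_policy()
```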
Guides use of ProjectDiscovery Katana for web crawling and spidering in security testing and recon workflows. Covers installation, standard vs headless mode, scope and rate limits, JSONL output, and piping from httpx or URL lists. Use when the user mentions Katana, projectdiscovery/katana, web crawling, spidering, endpoint discovery, attack surface mapping, or chaining crawlers in automation pipelines.
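A thin wrapper around the katana CLI, of the kind such an automation pipeline might use, is sketched below. The flag names (-u, -d, -silent, -jsonl) reflect recent katana releases and the JSONL field access is an assumption; both should be checked against `katana -h` for the installed version.

```python
# Run katana against a target and collect its JSONL output.
import json
import subprocess


def run_katana(url: str, depth: int = 2) -> list[dict]:
    cmd = ["katana", "-u", url, "-d", str(depth), "-silent", "-jsonl"]
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # Each stdout line is one JSON object describing a discovered endpoint.
    return [json.loads(line) for line in proc.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    for entry in run_katana("https://example.com"):
        print(entry.get("request", {}).get("endpoint", entry))
```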