pr-mining-coordinator

PR Mining Coordinator Skill

Operator Context

This skill acts as the operator for PR mining coordination workflows, configuring Claude's behavior for background job management and tribal knowledge extraction. It implements the Pipeline architectural pattern (Validate, Mine, Verify, Generate, Report) with Domain Intelligence embedded in the mining methodology.

Hardcoded Behaviors (Always Apply)

  • CLAUDE.md Compliance: Read and follow repository CLAUDE.md before execution
  • Over-Engineering Prevention: Only implement what's directly requested. No speculative features
  • Background Execution: Mining jobs always run in the background with &
  • GitHub Token from Keychain: Uses security find-internet-password -s github.com -w
  • Process Tracking: Always store and monitor background job PIDs
  • Sequential by Default: Run mining jobs one at a time to avoid API rate limits

Default Behaviors (ON unless disabled)

  • Communication Style: Report facts without self-congratulation. Show command output, not descriptions
  • Temporary File Cleanup: Remove coordination files and debug outputs at completion. Keep only mining results (JSON) and generated rules (markdown)
  • Progress Reporting: Show mining job progress every 30-60 seconds
  • Auto Rules Generation: Generate categorized markdown rules after successful mining
  • Error Detection: Monitor for API rate limits, auth failures, empty results
  • Confidence Scoring: Calculate HIGH/MEDIUM/LOW confidence for patterns

Optional Behaviors (OFF unless enabled)

  • Concurrent Mining: Run multiple repos simultaneously (risk: rate limits)
  • Historical Analysis: Mine specific date ranges with --since/--until flags
  • All Comments Mode: Use --all-comments for senior reviewers (default: imperative only)
  • Cross-Repo Merging: Combine patterns from multiple mining results into unified rules

What This Skill CAN Do

  • Coordinate background PR mining jobs with the pr-miner tool
  • Track running jobs and report progress to user
  • Generate categorized coding rules documents from mined data
  • Calculate pattern confidence from occurrence frequency
  • Handle API rate limits, auth failures, and empty result sets

What This Skill CANNOT Do

  • Mine without a valid GitHub token
  • Run multiple mining jobs in parallel unless Concurrent Mining is enabled (sequential by default)
  • Perform code review (use code-review skill instead)
  • Write coding standards from scratch without PR data
  • Skip prerequisite validation or result verification

Instructions

Phase 1: VALIDATE

Goal: Confirm prerequisites before starting any mining operation.

Step 1: Check miner script exists

```bash
fish -c "ls ~/.claude/skills/pr-miner/scripts/miner.py"
```

Expected: File exists at path.

Step 2: Verify GitHub token

```bash
fish -c "security find-internet-password -s github.com -w 2>/dev/null"
```

Expected: Token printed (ghp_...).

Step 3: Verify reviewer username (if filtering by reviewer)

```bash
fish -c "gh pr list --repo {org/repo} --search 'reviewed-by:{username}' --limit 5"
```

Expected: PR results confirm username is valid and active.

Gate: Miner script exists, token available, reviewer verified. Proceed only when gate passes.
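
The gate can be expressed as a small check that collects every failure instead of stopping at the first one. This is a minimal sketch, not the skill's actual implementation: `check_gate` and its parameters are hypothetical names, and the token and reviewer results are passed in as plain values so the logic stays independent of the keychain and gh calls shown in the steps.

```python
from pathlib import Path
from typing import List, Optional


def check_gate(miner_path: Path, token: str, reviewer: Optional[str],
               reviewer_pr_count: int = 0) -> List[str]:
    """Return a list of gate failures; an empty list means the gate passes.

    All names here are illustrative assumptions, not part of pr-miner.
    """
    failures = []
    if not miner_path.is_file():
        failures.append(f"miner script not found: {miner_path}")
    if not token.strip():
        failures.append("GitHub token is empty")
    # Reviewer verification only applies when filtering by reviewer.
    if reviewer is not None and reviewer_pr_count == 0:
        failures.append(f"no reviewed PRs found for reviewer: {reviewer}")
    return failures
```

Returning a list rather than a boolean lets the caller report all missing prerequisites to the user in one pass.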

Phase 2: MINE

Goal: Execute mining job in background and track progress.

Step 1: Start mining job

```bash
fish -c "set -x GITHUB_TOKEN (security find-internet-password -s github.com -w 2>/dev/null) && \
  cd ~/.claude/skills/pr-miner && \
  ./venv/bin/python3 scripts/miner.py {repos} mined_data/{output}.json {flags} --summary" &
```

Output naming: {reviewer}_{repos}_{YYYY-MM-DD}.json or {repos}_all_{YYYY-MM-DD}.json. See references/mining-commands.md for full command patterns and flag reference.

Step 2: Track progress

Monitor background job with BashOutput tool. Check every 30-60 seconds. Report progress to user.

Step 3: Handle multiple repos

Run jobs sequentially. Wait for each to complete before starting the next.

Gate: Mining job completes with non-zero interaction count. Proceed only when gate passes.
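
The output naming rule from Step 1 can be sketched as a small helper. `output_filename` is a hypothetical name, and the way multiple repos are joined (org prefix dropped, names joined with "-") is an assumption; references/mining-commands.md remains the authority on the actual convention.

```python
from datetime import date
from typing import List, Optional


def output_filename(repos: List[str], reviewer: Optional[str] = None,
                    on: Optional[date] = None) -> str:
    """Build {reviewer}_{repos}_{YYYY-MM-DD}.json or {repos}_all_{YYYY-MM-DD}.json."""
    day = (on or date.today()).isoformat()
    # Assumption: drop the org prefix and join repo names with "-".
    repo_part = "-".join(r.split("/")[-1] for r in repos)
    if reviewer:
        return f"{reviewer}_{repo_part}_{day}.json"
    return f"{repo_part}_all_{day}.json"


print(output_filename(["org/go-libs"], "senior-reviewer", date(2025, 11, 29)))
# senior-reviewer_go-libs_2025-11-29.json
```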

Phase 3: VERIFY

Goal: Confirm mining output is valid and contains usable data.

Step 1: Check output file exists and has content

```bash
fish -c "cat ~/.claude/skills/pr-miner/mined_data/{output}.json | head -50"
```

Step 2: Validate structure

Confirm JSON matches expected schema:

```json
{
  "metadata": {
    "repos": ["org/repo"],
    "reviewer": "username",
    "mined_at": "2025-11-29T10:30:00Z",
    "pr_count": 100,
    "interaction_count": 36
  },
  "interactions": [
    {
      "pr_number": 123,
      "pr_title": "Add feature X",
      "comment": "Use errors.Is() instead of comparing error strings",
      "code_before": "if err.Error() == \"not found\" {",
      "code_after": "if errors.Is(err, ErrNotFound) {"
    }
  ]
}
```

If interaction_count is 0, do not proceed -- see Error Handling for "0 interactions found".

Step 3: Check interaction quality

Verify interactions have: pr_number, pr_title, comment text, and ideally code_before/code_after pairs. Interactions without code pairs can still produce rules but lack concrete examples.

Gate: Output JSON is valid, contains interactions with usable data. Proceed only when gate passes.
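
The structure and quality checks above can be sketched in a few lines. The field names come from the schema shown in Step 2; the function names `verify_mined` and `count_with_code_pairs` are hypothetical, and this is an illustration of the checks rather than the skill's own code.

```python
import json
from typing import Any, Dict, List

REQUIRED_FIELDS = ("pr_number", "pr_title", "comment")


def verify_mined(data: Dict[str, Any]) -> List[str]:
    """Return problems found in mined output; an empty list means the gate passes."""
    problems = []
    metadata = data.get("metadata", {})
    interactions = data.get("interactions", [])
    if metadata.get("interaction_count", 0) == 0 or not interactions:
        problems.append("0 interactions found")
    for i, item in enumerate(interactions):
        missing = [f for f in REQUIRED_FIELDS if not item.get(f)]
        if missing:
            problems.append(f"interaction {i} missing: {', '.join(missing)}")
    return problems


def count_with_code_pairs(data: Dict[str, Any]) -> int:
    """Count interactions that carry both code_before and code_after."""
    return sum(1 for item in data.get("interactions", [])
               if item.get("code_before") and item.get("code_after"))


sample = json.loads(
    '{"metadata": {"interaction_count": 1}, "interactions": '
    '[{"pr_number": 123, "pr_title": "Add feature X", "comment": "Use errors.Is()"}]}'
)
print(verify_mined(sample))           # []
print(count_with_code_pairs(sample))  # 0
```

An interaction without a code pair passes the gate but is not counted by `count_with_code_pairs`, which matches the note that such interactions still produce rules, only without concrete examples.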

Phase 4: GENERATE

Goal: Produce categorized coding rules document from mined data.
Step 1: Load and categorize patterns
Read mined JSON. Categorize interactions by topic using standard categories from references/pattern-categories.md.
Step 2: Score confidence
| Level | Criteria | Action |
|---|---|---|
| HIGH | 5+ occurrences from senior reviewers | Include as standard practice |
| MEDIUM | 2-4 occurrences | Include with context caveats |
| LOW | Single occurrence | Place in "Additional Observations" |
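
The scoring criteria can be sketched as a single function. `score_confidence` is a hypothetical name, and one case is an assumption: the criteria do not say how to treat 5+ occurrences that do not come from senior reviewers, so this sketch falls back to MEDIUM there.

```python
def score_confidence(occurrences: int, from_senior_reviewer: bool) -> str:
    """Map an occurrence count to HIGH, MEDIUM, or LOW."""
    if occurrences < 1:
        raise ValueError("a pattern needs at least one occurrence")
    if occurrences >= 5 and from_senior_reviewer:
        return "HIGH"
    # Assumption: 5+ occurrences without a senior reviewer fall back to MEDIUM,
    # since the stated criteria do not cover that case.
    if occurrences >= 2:
        return "MEDIUM"
    return "LOW"


print(score_confidence(6, True))   # HIGH
print(score_confidence(3, False))  # MEDIUM
print(score_confidence(1, True))   # LOW
```
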
Step 3: Generate markdown rules document
Follow this structure for each pattern entry:

````markdown
{Category Name}

{Pattern Name} ({CONFIDENCE} confidence)

Pattern: {Brief description}
Good: ```{lang} {good_example_code} ```
Bad: ```{lang} {bad_example_code} ```
Rationale: From PR #{number} review by {reviewer}: "{comment_text}"
````

Order categories by total pattern count (most patterns first). Within each category, sort HIGH before MEDIUM before LOW.
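
The ordering rule can be sketched as a two-level sort. The dictionary keys `category`, `name`, and `confidence` are hypothetical field names chosen for illustration; the mined JSON schema does not define them.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

CONFIDENCE_RANK = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}


def order_patterns(patterns: List[Dict[str, str]]) -> List[Tuple[str, List[Dict[str, str]]]]:
    """Group patterns by category, largest category first, HIGH before MEDIUM before LOW."""
    by_category = defaultdict(list)
    for p in patterns:
        by_category[p["category"]].append(p)
    # sorted() is stable, so categories with equal counts keep first-seen order.
    ordered = sorted(by_category.items(), key=lambda kv: -len(kv[1]))
    return [(cat, sorted(items, key=lambda p: CONFIDENCE_RANK[p["confidence"]]))
            for cat, items in ordered]


patterns = [
    {"category": "Testing", "name": "t1", "confidence": "LOW"},
    {"category": "Error Handling", "name": "e1", "confidence": "MEDIUM"},
    {"category": "Error Handling", "name": "e2", "confidence": "HIGH"},
]
for category, items in order_patterns(patterns):
    print(category, [p["name"] for p in items])
# Error Handling ['e2', 'e1']
# Testing ['t1']
```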

Step 4: Save rules

```bash
fish -c "cat > ~/.claude/skills/pr-miner/rules/{repos}_coding_rules.md"
```

Gate: Rules document is categorized, confidence-scored, and saved to disk.

Phase 5: REPORT

Goal: Deliver comprehensive results to user.
Provide:
  • PRs analyzed count
  • Interactions extracted count
  • File paths for mined data and generated rules
  • Top HIGH confidence patterns with occurrence counts
  • Summary of MEDIUM and LOW confidence pattern counts
Gate: User has all information needed to act on mining results.
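
One way to assemble the report items listed above is shown below. This is a sketch under assumptions: `build_report` and the pattern keys `name`, `confidence`, and `occurrences` are hypothetical, while `pr_count` and `interaction_count` are the metadata fields from the mined JSON.

```python
from collections import Counter
from typing import Any, Dict, List


def build_report(metadata: Dict[str, Any], patterns: List[Dict[str, Any]],
                 data_path: str, rules_path: str, top_n: int = 5) -> str:
    """Format counts, file paths, top HIGH patterns, and MEDIUM/LOW totals as text."""
    counts = Counter(p["confidence"] for p in patterns)
    high = sorted((p for p in patterns if p["confidence"] == "HIGH"),
                  key=lambda p: -p["occurrences"])[:top_n]
    lines = [
        f"PRs analyzed: {metadata['pr_count']}",
        f"Interactions extracted: {metadata['interaction_count']}",
        f"Mined data: {data_path}",
        f"Rules: {rules_path}",
        "Top HIGH confidence patterns:",
    ]
    lines += [f"  - {p['name']} ({p['occurrences']} occurrences)" for p in high]
    # Counter returns 0 for levels that never occurred.
    lines.append(f"MEDIUM: {counts['MEDIUM']}, LOW: {counts['LOW']}")
    return "\n".join(lines)
```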

Examples

Example 1: Mine Specific Reviewer

User says: "Mine senior-reviewer's patterns from go-libs"
Actions:
  1. Verify miner, token, and reviewer username (VALIDATE)
  2. Run mining with --reviewer and --all-comments flags (MINE)
  3. Check output JSON for valid interactions (VERIFY)
  4. Categorize patterns and generate rules markdown (GENERATE)
  5. Report top patterns and file locations (REPORT)
Result: Categorized coding rules with confidence scores

Example 2: Team Standards Extraction

User says: "Get coding standards from service-a and service-b"
Actions:
  1. Verify miner and token, no reviewer to verify (VALIDATE)
  2. Run mining without --reviewer to capture all reviewers (MINE)
  3. Confirm output has interactions from multiple reviewers (VERIFY)
  4. Generate team-wide rules document (GENERATE)
  5. Report findings with reviewer distribution (REPORT)
Result: Team-wide coding rules across both repositories

Error Handling

Error: "API rate limit exceeded"

Cause: GitHub API 5000 requests/hour exhausted by mining operations
Solution:
  1. Report remaining quota and reset time to user
  2. Stop current job if rate limit is critically low (<150 remaining)
  3. Wait for reset or cancel and retry later
  4. For future runs: reduce --limit or mine fewer repos per job
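
Steps 1 and 2 can be sketched against the payload of GitHub's REST rate limit endpoint, which reports `remaining` and `reset` (epoch seconds) under `resources.core`. The payload could be fetched with, for example, `gh api rate_limit`; the function name `rate_limit_action` and the shape of its return value are assumptions made for illustration.

```python
from datetime import datetime, timezone
from typing import Any, Dict

CRITICAL_REMAINING = 150  # threshold from step 2 above


def rate_limit_action(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize quota and decide whether to stop the current job."""
    core = payload["resources"]["core"]
    reset_at = datetime.fromtimestamp(core["reset"], tz=timezone.utc)
    return {
        "remaining": core["remaining"],
        "reset_at": reset_at.isoformat(),
        "stop_job": core["remaining"] < CRITICAL_REMAINING,
    }


payload = {"resources": {"core": {"limit": 5000, "remaining": 120, "reset": 0}}}
print(rate_limit_action(payload))
# {'remaining': 120, 'reset_at': '1970-01-01T00:00:00+00:00', 'stop_job': True}
```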

Error: "Authentication failed"

Cause: GitHub token expired, revoked, or missing from keychain
Solution:
  1. Run fish -c "security find-internet-password -s github.com -w 2>/dev/null" to check token
  2. If empty: token not in keychain. User must add it
  3. If present but rejected: token expired or lacks repo scope
  4. Guide user to update token with security add-internet-password

Error: "0 interactions found"

Cause: Wrong reviewer username, no PR activity, or date range too narrow
Solution:
  1. Verify reviewer username with gh pr list --search 'reviewed-by:{username}'
  2. Re-run without --reviewer to confirm data exists
  3. Widen date range by removing --since/--until
  4. Check if repo has PR review comments (not just approvals)

Error: "Mining job timeout (>5 min)"

Cause: Large repo, many PRs, or slow API responses
Solution:
  1. Report current progress to user
  2. Continue monitoring -- mining is still running
  3. If stuck: check for network issues or API downtime
  4. For future runs: reduce --limit to smaller batches

Anti-Patterns

Anti-Pattern 1: Mining Without Verifying Reviewer Username

What it looks like: Running --reviewer senior-reviewer without checking the actual GitHub username
Why wrong: Job completes successfully with 0 interactions. Wastes API quota and 5-10 minutes. Username errors are silent.
Do instead: Verify username with gh pr list --search 'reviewed-by:{username}' before mining.

Anti-Pattern 2: Running Multiple Mining Jobs in Parallel

What it looks like: Starting 3+ mining jobs simultaneously to save time
Why wrong: Exhausts 5000 requests/hour rate limit across all jobs. Later jobs fail mid-execution. Cannot track which job consumed quota.
Do instead: Run jobs sequentially. Wait for each to complete before starting the next.

Anti-Pattern 3: Generating Flat Rules Without Categorization

What it looks like: A numbered list of 50 patterns with no organization or confidence scoring
Why wrong: Overwhelming to read. No way to find relevant patterns. Loses priority context.
Do instead: Categorize by topic (Error Handling, Testing, API Design, etc.) and sort by confidence level within each category. See references/pattern-categories.md.

Anti-Pattern 4: Skipping --all-comments for Senior Reviewers

What it looks like: Mining a senior reviewer without the --all-comments flag and getting 0-2 interactions
Why wrong: Senior reviewers use questions ("Why not use errors.Is here?") and suggestions instead of imperatives. Default mode misses the majority of their feedback.
Do instead: Always use --all-comments when mining senior or experienced reviewers.

Anti-Pattern 5: Testing Multi-Repo Mining Without Single-Repo Validation

What it looks like: Mining 5 repos at once on the first attempt without verifying any individually
Why wrong: If any repo has access issues, the entire job fails after minutes of wasted time. Cannot determine which repo caused failure.
Do instead: Test with a single repo and --limit 10 first. Expand incrementally after confirming access.

References

This skill uses these reference files:
  • ${CLAUDE_SKILL_DIR}/references/mining-commands.md: Command patterns, flag reference, output naming conventions
  • ${CLAUDE_SKILL_DIR}/references/pattern-categories.md: Standard categories for coding rules (10 categories with examples)
  • ${CLAUDE_SKILL_DIR}/references/reviewer-usernames.md: Known GitHub usernames and verification methods

Domain-Specific Anti-Rationalization

| Rationalization | Why It's Wrong | Required Action |
|---|---|---|
| "Username is probably right" | Probably = 0 interactions after 5 min | Verify with gh pr list first |
| "Parallel mining saves time" | Saves nothing when rate limit kills jobs | Run sequentially |
| "Just dump all patterns" | Flat lists are unusable at 50+ items | Categorize and score confidence |
| "Low limit is enough" | Small samples produce low-confidence rules | Use --limit 100+ for meaningful patterns |