pr-mining-coordinator

PR Mining Coordinator Skill

Operator Context

This skill acts as an operator for PR mining coordination workflows, configuring Claude's behavior for background job management and tribal knowledge extraction. It implements the Pipeline architectural pattern (Validate, Mine, Verify, Generate, Report) with Domain Intelligence embedded in the mining methodology.

Hardcoded Behaviors (Always Apply)

CLAUDE.md Compliance: Read and follow repository CLAUDE.md before execution
Over-Engineering Prevention: Only implement what's directly requested. No speculative features
Background Execution: Mining jobs always run in background with `&`
GitHub Token from Keychain: Uses `security find-internet-password -s github.com -w`
Process Tracking: Always store and monitor background job PIDs
Sequential by Default: Run mining jobs one at a time to avoid API rate limits

Default Behaviors (ON unless disabled)

Communication Style: Report facts without self-congratulation. Show command output, not descriptions
Temporary File Cleanup: Remove coordination files and debug outputs at completion. Keep only mining results (JSON) and generated rules (markdown)
Progress Reporting: Show mining job progress every 30-60 seconds
Auto Rules Generation: Generate categorized markdown rules after successful mining
Error Detection: Monitor for API rate limits, auth failures, empty results
Confidence Scoring: Calculate HIGH/MEDIUM/LOW confidence for patterns

Optional Behaviors (OFF unless enabled)

Concurrent Mining: Run multiple repos simultaneously (risk: rate limits)
Historical Analysis: Mine specific date ranges with --since/--until flags
All Comments Mode: Use --all-comments for senior reviewers (default: imperative only)
Cross-Repo Merging: Combine patterns from multiple mining results into unified rules

What This Skill CAN Do

Coordinate background PR mining jobs with the pr-miner tool
Track running jobs and report progress to user
Generate categorized coding rules documents from mined data
Calculate pattern confidence from occurrence frequency
Handle API rate limits, auth failures, and empty result sets

What This Skill CANNOT Do

Mine without a valid GitHub token
Run multiple mining jobs in parallel by default (jobs are sequential unless Concurrent Mining is enabled)
Perform code review (use code-review skill instead)
Write coding standards from scratch without PR data
Skip prerequisite validation or result verification

Instructions

Phase 1: VALIDATE

Goal: Confirm prerequisites before starting any mining operation.

Step 1: Check miner script exists

```bash
fish -c "ls ~/.claude/skills/pr-miner/scripts/miner.py"
```

Expected: File exists at path.

Step 2: Verify GitHub token

```bash
fish -c "security find-internet-password -s github.com -w 2>/dev/null"
```

Expected: Token printed (ghp_...).

Step 3: Verify reviewer username (if filtering by reviewer)

```bash
fish -c "gh pr list --repo {org/repo} --search 'reviewed-by:{username}' --limit 5"
```

Expected: PR results confirm username is valid and active.

Gate: Miner script exists, token available, reviewer verified. Proceed only when gate passes.
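
The Phase 1 gate can be written as one small check. The following is a minimal sketch in Python, not part of the skill itself: `check_prerequisites` and its parameters are hypothetical names, and the token and reviewer search result are passed in as arguments so the logic does not depend on the keychain or on `gh`.

```python
from pathlib import Path
from typing import List, Optional


def check_prerequisites(
    miner_path: Path,
    token: Optional[str],
    reviewer_pr_count: Optional[int] = None,
) -> List[str]:
    """Return a list of gate failures; an empty list means the gate passes.

    reviewer_pr_count is None when no reviewer filter is used, otherwise it is
    the number of PRs returned by the reviewer search in Step 3.
    """
    failures = []
    # Step 1: the miner script must exist at the expected path.
    if not miner_path.is_file():
        failures.append(f"miner script not found: {miner_path}")
    # Step 2: an empty or whitespace-only token counts as missing.
    if not token or not token.strip():
        failures.append("GitHub token missing")
    # Step 3: only checked when filtering by reviewer.
    if reviewer_pr_count is not None and reviewer_pr_count == 0:
        failures.append("reviewer search returned no PRs")
    return failures
```

Mining would start only when the returned list is empty; otherwise the failures can be reported to the user as they are.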

Phase 2: MINE

Goal: Execute mining job in background and track progress.

Step 1: Start mining job

```bash
fish -c "set -x GITHUB_TOKEN (security find-internet-password -s github.com -w 2>/dev/null) && \
  cd ~/.claude/skills/pr-miner && \
  ./venv/bin/python3 scripts/miner.py {repos} mined_data/{output}.json {flags} --summary" &
```

Output naming: `{reviewer}_{repos}_{YYYY-MM-DD}.json` when filtering by reviewer, or `{repos}_all_{YYYY-MM-DD}.json` otherwise.

See `references/mining-commands.md` for full command patterns and flag reference.
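
The naming patterns can be produced by a small helper. This is a sketch under assumptions: `output_filename` is a hypothetical name, and the way `org/repo` is shortened and multiple repos are joined is an illustrative choice, since the naming convention itself lives in `references/mining-commands.md`.

```python
from datetime import date
from typing import List, Optional


def output_filename(repos: List[str], reviewer: Optional[str], day: date) -> str:
    """Build a mined-data file name following the two naming patterns."""
    # Assumption: "org/repo" is reduced to "repo", and several repos are
    # joined with "-" so the name stays a single path component.
    repo_part = "-".join(repo.split("/")[-1] for repo in repos)
    if reviewer:
        prefix = f"{reviewer}_{repo_part}"
    else:
        prefix = f"{repo_part}_all"
    # date.isoformat() yields YYYY-MM-DD.
    return f"{prefix}_{day.isoformat()}.json"
```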

Step 2: Track progress

Monitor background job with BashOutput tool. Check every 30-60 seconds. Report progress to user.

Step 3: Handle multiple repos

Run jobs sequentially. Wait for each to complete before starting next.

Gate: Mining job completes with non-zero interaction count. Proceed only when gate passes.
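
The combination of background execution, PID tracking, and sequential scheduling described in this phase can be sketched in Python with `subprocess.Popen`. This is an illustration only: the skill itself uses a shell `&` and the BashOutput tool, and `run_sequentially` is a hypothetical helper.

```python
import subprocess
import time
from typing import List, Tuple


def run_sequentially(
    commands: List[List[str]], poll_seconds: float = 30.0
) -> List[Tuple[int, int]]:
    """Run each command as a background process, one at a time.

    Returns (pid, exit_code) pairs in the order the jobs were started.
    """
    results = []
    for command in commands:
        # Popen returns immediately, so the job runs in the background.
        job = subprocess.Popen(command)
        print(f"started PID {job.pid}: {' '.join(command)}")
        # poll() returns None while the process is still running.
        while job.poll() is None:
            print(f"PID {job.pid} still running")
            time.sleep(poll_seconds)
        # The next job starts only after this one has exited.
        results.append((job.pid, job.returncode))
    return results
```

Starting the next job only after the previous one exits means at most one process is consuming API quota at a time.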

Phase 3: VERIFY

Goal: Confirm mining output is valid and contains usable data.

Step 1: Check output file exists and has content

```bash
fish -c "cat ~/.claude/skills/pr-miner/mined_data/{output}.json | head -50"
```

Step 2: Validate structure

Confirm JSON matches expected schema:

```json
{
  "metadata": {
    "repos": ["org/repo"],
    "reviewer": "username",
    "mined_at": "2025-11-29T10:30:00Z",
    "pr_count": 100,
    "interaction_count": 36
  },
  "interactions": [
    {
      "pr_number": 123,
      "pr_title": "Add feature X",
      "comment": "Use errors.Is() instead of comparing error strings",
      "code_before": "if err.Error() == \"not found\" {",
      "code_after": "if errors.Is(err, ErrNotFound) {"
    }
  ]
}
```

If `interaction_count` is 0, do not proceed. See Error Handling for "0 interactions found".

Step 3: Check interaction quality

Verify interactions have: pr_number, pr_title, comment text, and ideally code_before/code_after pairs. Interactions without code pairs can still produce rules but lack concrete examples.

Gate: Output JSON is valid, contains interactions with usable data. Proceed only when gate passes.
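
The structure and quality checks in this phase can be sketched as two functions over the parsed JSON. The names `verify_mined_data` and `count_code_pairs` are hypothetical. Treating `pr_number`, `pr_title`, and `comment` as required follows Step 3 above, while the exact rules applied by the skill are not specified beyond that.

```python
from typing import Any, Dict, List

REQUIRED_FIELDS = ("pr_number", "pr_title", "comment")


def verify_mined_data(data: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the gate passes."""
    problems = []
    metadata = data.get("metadata", {})
    interactions = data.get("interactions", [])
    # A zero count or an empty list both block the pipeline.
    if metadata.get("interaction_count", 0) == 0 or not interactions:
        problems.append("0 interactions found")
    for index, item in enumerate(interactions):
        missing = [field for field in REQUIRED_FIELDS if not item.get(field)]
        if missing:
            problems.append(f"interaction {index} is missing: {', '.join(missing)}")
    return problems


def count_code_pairs(data: Dict[str, Any]) -> int:
    """Count interactions that carry both code_before and code_after."""
    return sum(
        1
        for item in data.get("interactions", [])
        if item.get("code_before") and item.get("code_after")
    )
```

A low code-pair count does not fail the gate. It only signals that the generated rules will have fewer concrete examples.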

Phase 4: GENERATE

Goal: Produce categorized coding rules document from mined data.

Step 1: Load and categorize patterns

Read mined JSON. Categorize interactions by topic using standard categories from `references/pattern-categories.md`.

Step 2: Score confidence

| Level | Criteria | Action |
|-------|----------|--------|
| HIGH | 5+ occurrences from senior reviewers | Include as standard practice |
| MEDIUM | 2-4 occurrences | Include with context caveats |
| LOW | Single occurrence | Place in "Additional Observations" |
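
The scoring table translates directly into a function. This is a minimal sketch with a hypothetical name, `score_confidence`. The table does not say how to score 5+ occurrences that do not come from senior reviewers, so the sketch assumes those fall back to MEDIUM.

```python
def score_confidence(occurrences: int, from_senior_reviewer: bool) -> str:
    """Map an occurrence count to HIGH, MEDIUM, or LOW."""
    if occurrences >= 5 and from_senior_reviewer:
        return "HIGH"
    # Assumption: frequent patterns without senior backing are MEDIUM.
    if occurrences >= 2:
        return "MEDIUM"
    return "LOW"
```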

Step 3: Generate markdown rules document

Follow this structure for each pattern entry:

````markdown
{Category Name}

{Pattern Name} ({CONFIDENCE} confidence)

Pattern: {Brief description}

Good:
```{lang}
{good_example_code}
```

Bad:
```{lang}
{bad_example_code}
```

Rationale: From PR #{number} review by {reviewer}: "{comment_text}"
````

Order categories by total pattern count (most patterns first). Within each category, sort HIGH before MEDIUM before LOW.

Step 4: Save rules

```bash
fish -c "cat > ~/.claude/skills/pr-miner/rules/{repos}_coding_rules.md"
```

Gate: Rules document is categorized, confidence-scored, and saved to disk.
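
The ordering rule for the rules document (categories by pattern count, then HIGH before MEDIUM before LOW within each category) can be sketched as below. The input shape, a dict of category name to a list of pattern dicts with a `confidence` key, is an assumption made for illustration, and `order_rules` is a hypothetical name.

```python
from typing import Any, Dict, List, Tuple

CONFIDENCE_ORDER = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}


def order_rules(
    categories: Dict[str, List[Dict[str, Any]]]
) -> List[Tuple[str, List[Dict[str, Any]]]]:
    """Return (category, patterns) pairs in document order."""
    # Categories with the most patterns come first.
    by_size = sorted(categories.items(), key=lambda pair: len(pair[1]), reverse=True)
    # Within a category, sort by confidence; sorted() is stable, so patterns
    # with equal confidence keep their original relative order.
    return [
        (name, sorted(patterns, key=lambda p: CONFIDENCE_ORDER[p["confidence"]]))
        for name, patterns in by_size
    ]
```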

Phase 5: REPORT

Goal: Deliver comprehensive results to user.

Provide:

PRs analyzed count
Interactions extracted count
File paths for mined data and generated rules
Top HIGH confidence patterns with occurrence counts
Summary of MEDIUM and LOW confidence pattern counts

Gate: User has all information needed to act on mining results.
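
One way to assemble the report items listed above is a plain-text formatter. This is a sketch only: `build_report` is a hypothetical helper, and the pattern dicts with `name`, `confidence`, and `occurrences` keys are an assumed intermediate shape, not something the miner is known to emit.

```python
from collections import Counter
from typing import Any, Dict, List


def build_report(
    metadata: Dict[str, Any],
    patterns: List[Dict[str, Any]],
    data_path: str,
    rules_path: str,
    top_n: int = 5,
) -> str:
    """Format the Phase 5 summary as plain text."""
    counts = Counter(p["confidence"] for p in patterns)
    # Most frequent HIGH confidence patterns first.
    high = sorted(
        (p for p in patterns if p["confidence"] == "HIGH"),
        key=lambda p: p["occurrences"],
        reverse=True,
    )[:top_n]
    lines = [
        f"PRs analyzed: {metadata['pr_count']}",
        f"Interactions extracted: {metadata['interaction_count']}",
        f"Mined data: {data_path}",
        f"Generated rules: {rules_path}",
        "Top HIGH confidence patterns:",
    ]
    lines += [f"  {p['name']} ({p['occurrences']} occurrences)" for p in high]
    # Counter returns 0 for levels that never occurred.
    lines.append(f"MEDIUM: {counts['MEDIUM']}, LOW: {counts['LOW']}")
    return "\n".join(lines)
```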

Examples

Example 1: Mine Specific Reviewer

User says: "Mine senior-reviewer's patterns from go-libs"

Actions:

Verify miner, token, and reviewer username (VALIDATE)
Run mining with --reviewer and --all-comments flags (MINE)
Check output JSON for valid interactions (VERIFY)
Categorize patterns and generate rules markdown (GENERATE)
Report top patterns and file locations (REPORT)

Result: Categorized coding rules with confidence scores

Example 2: Team Standards Extraction

User says: "Get coding standards from service-a and service-b"

Actions:

Verify miner and token, no reviewer to verify (VALIDATE)
Run mining without --reviewer to capture all reviewers (MINE)
Confirm output has interactions from multiple reviewers (VERIFY)
Generate team-wide rules document (GENERATE)
Report findings with reviewer distribution (REPORT)

Result: Team-wide coding rules across both repositories

Error Handling

Error: "API rate limit exceeded"

Cause: GitHub API quota of 5000 requests/hour exhausted by mining operations

Solution:

Report remaining quota and reset time to user
Stop current job if rate limit is critically low (<150 remaining)
Wait for reset or cancel and retry later
For future runs: reduce --limit or mine fewer repos per job
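
The quota check in the first two solution steps can be sketched against the JSON returned by GitHub's `GET /rate_limit` endpoint (for example via `gh api rate_limit`), which reports `limit`, `remaining`, and a `reset` epoch timestamp under `resources.core`. The helper name `rate_limit_status` is hypothetical. The threshold of 150 comes from the step above.

```python
from datetime import datetime, timezone
from typing import Any, Dict

CRITICAL_REMAINING = 150  # stop threshold from the solution steps


def rate_limit_status(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize the core bucket of a GET /rate_limit response."""
    core = payload["resources"]["core"]
    # "reset" is expressed in UTC epoch seconds.
    reset_at = datetime.fromtimestamp(core["reset"], tz=timezone.utc)
    return {
        "remaining": core["remaining"],
        "limit": core["limit"],
        "reset_at": reset_at.isoformat(),
        "stop_job": core["remaining"] < CRITICAL_REMAINING,
    }
```

The `remaining` and `reset_at` values are what gets reported to the user. `stop_job` marks the point where the current job should be halted rather than left to fail mid-run.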

Error: "Authentication failed"

Cause: GitHub token expired, revoked, or missing from keychain

Solution:

Run `fish -c "security find-internet-password -s github.com -w 2>/dev/null"` to check token
If empty: token not in keychain. User must add it
If present but rejected: token expired or lacks repo scope
Guide user to update token with `security add-internet-password`

Error: "0 interactions found"

Cause: Wrong reviewer username, no PR activity, or date range too narrow

Solution:

Verify reviewer username with `gh pr list --search 'reviewed-by:{username}'`
Re-run without --reviewer to confirm data exists
Widen date range by removing --since/--until
Check if repo has PR review comments (not just approvals)

Error: "Mining job timeout (>5 min)"

Cause: Large repo, many PRs, or slow API responses

Solution:

Report current progress to user
Continue monitoring -- mining is still running
If stuck: check for network issues or API downtime
For future runs: reduce --limit to smaller batches

Anti-Patterns

Anti-Pattern 1: Mining Without Verifying Reviewer Username

What it looks like: Running `--reviewer senior-reviewer` without checking the actual GitHub username
Why wrong: Job completes successfully with 0 interactions. Wastes API quota and 5-10 minutes. Username errors are silent.
Do instead: Verify username with `gh pr list --search 'reviewed-by:{username}'` before mining.

Anti-Pattern 2: Running Multiple Mining Jobs in Parallel

What it looks like: Starting 3+ mining jobs simultaneously to save time
Why wrong: Exhausts the 5000 requests/hour rate limit across all jobs. Later jobs fail mid-execution. Cannot track which job consumed quota.
Do instead: Run jobs sequentially. Wait for each to complete before starting the next.

Anti-Pattern 3: Generating Flat Rules Without Categorization

What it looks like: A numbered list of 50 patterns with no organization or confidence scoring
Why wrong: Overwhelming to read. No way to find relevant patterns. Loses priority context.
Do instead: Categorize by topic (Error Handling, Testing, API Design, etc.) and sort by confidence level within each category. See `references/pattern-categories.md`.

Anti-Pattern 4: Skipping --all-comments for Senior Reviewers

What it looks like: Mining a senior reviewer without the --all-comments flag and getting 0-2 interactions
Why wrong: Senior reviewers use questions ("Why not use errors.Is here?") and suggestions instead of imperatives. Default mode misses the majority of their feedback.
Do instead: Always use `--all-comments` when mining senior or experienced reviewers.

Anti-Pattern 5: Testing Multi-Repo Mining Without Single-Repo Validation

What it looks like: Mining 5 repos at once on the first attempt without verifying any individually
Why wrong: If any repo has access issues, the entire job fails after minutes of wasted time. Cannot determine which repo caused the failure.
Do instead: Test with a single repo and `--limit 10` first. Expand incrementally after confirming access.

References

This skill uses these reference files:

`${CLAUDE_SKILL_DIR}/references/mining-commands.md`: Command patterns, flag reference, output naming conventions
`${CLAUDE_SKILL_DIR}/references/pattern-categories.md`: Standard categories for coding rules (10 categories with examples)
`${CLAUDE_SKILL_DIR}/references/reviewer-usernames.md`: Known GitHub usernames and verification methods

Domain-Specific Anti-Rationalization

| Rationalization | Why It's Wrong | Required Action |
|-----------------|----------------|-----------------|
| "Username is probably right" | Probably = 0 interactions after 5 min | Verify with `gh pr list` first |
| "Parallel mining saves time" | Saves nothing when rate limit kills jobs | Run sequentially |
| "Just dump all patterns" | Flat lists are unusable at 50+ items | Categorize and score confidence |
| "Low limit is enough" | Small samples produce low-confidence rules | Use `--limit 100+` for meaningful patterns |
