confidence-calibration

Confidence Calibration Framework

When This Activates

This skill activates when:
  • Expressing uncertainty about a suggestion
  • Working in a domain with past errors
  • User asks "how confident are you?"
  • Making predictions or recommendations

Domain Tracking

The system tracks prediction accuracy across domains:
Domain Category | Examples
Infrastructure  | docker, kubernetes, nginx, ci/cd
Frontend        | react, react-native, nextjs, expo
Languages       | typescript, javascript, python
Backend         | firebase, firestore, authentication
Operations      | testing, git, database, api
Optimization    | performance, security, caching

Calibration Data Structure

json
{
  "domain_stats": {
    "docker": {
      "correct": 12,
      "incorrect": 3,
      "partial": 2,
      "accuracy": 0.71
    }
  },
  "overall": {
    "correct": 145,
    "incorrect": 23,
    "partial": 18
  }
}
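
The structure above does not say how the "accuracy" field is derived. The docker example is consistent with dividing correct outcomes by all recorded outcomes, with partial results counted as not correct (12 / 17 ≈ 0.71). The sketch below assumes that rule; the function name and the two-decimal rounding are illustrative, not part of the original system.

```python
from typing import Optional


def accuracy(stats: dict) -> Optional[float]:
    """Return correct / total rounded to two decimals.

    Assumption: partial outcomes count toward the total but not toward
    the numerator. Returns None when nothing has been recorded yet.
    """
    total = stats["correct"] + stats["incorrect"] + stats["partial"]
    if total == 0:
        return None
    return round(stats["correct"] / total, 2)


docker = {"correct": 12, "incorrect": 3, "partial": 2}
overall = {"correct": 145, "incorrect": 23, "partial": 18}

print(accuracy(docker))   # 0.71
print(accuracy(overall))  # 0.78
```

Under the same assumption, the "overall" counts work out to roughly 78%, although the structure does not store that value.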

How to Express Calibrated Confidence

High Confidence (>85% domain accuracy)

"This approach should work well - it follows established patterns."

Medium Confidence (60-85% accuracy)

"This is my best assessment, though you may want to verify [specific aspect]."

Low Confidence (<60% accuracy, or past errors in domain)

"I've had some misses in [domain] before. Let me double-check this..."
"I'm less certain here - consider testing thoroughly before proceeding."

Unknown Domain

"I don't have much track record in [area]. Proceed with appropriate caution."

Self-Awareness Triggers

When working in a domain with past errors:
  1. Check track record before making recommendations
  2. Acknowledge past mistakes if relevant: "I've gotten Docker networking wrong before..."
  3. Suggest verification for uncertain areas
  4. Ask clarifying questions rather than guessing
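
The track-record check in step 1 and the confidence tiers described earlier can be sketched as a small lookup. This is a minimal illustration, not the skill's actual implementation: the names confidence_tier, calibrated_preamble, and has_past_errors are hypothetical, and letting a recorded past error force the low tier regardless of accuracy is one reading of "or past errors in domain".

```python
from typing import Optional

# Phrasings quoted from the confidence tiers in this document.
TEMPLATES = {
    "high": "This approach should work well - it follows established patterns.",
    "medium": "This is my best assessment, though you may want to verify {aspect}.",
    "low": "I've had some misses in {domain} before. Let me double-check this...",
    "unknown": "I don't have much track record in {domain}. Proceed with appropriate caution.",
}


def confidence_tier(accuracy: Optional[float], has_past_errors: bool = False) -> str:
    """Map a domain accuracy (0.0-1.0, or None if untracked) to a tier."""
    if accuracy is None:
        return "unknown"
    # Assumption: a relevant past error overrides the accuracy-based tier.
    if accuracy < 0.60 or has_past_errors:
        return "low"
    if accuracy > 0.85:
        return "high"
    return "medium"  # 60-85% inclusive


def calibrated_preamble(domain: str, accuracy: Optional[float],
                        has_past_errors: bool = False,
                        aspect: str = "the details") -> str:
    """Pick the phrasing for the tier and fill in the placeholders."""
    tier = confidence_tier(accuracy, has_past_errors)
    return TEMPLATES[tier].format(domain=domain, aspect=aspect)


print(confidence_tier(0.71))                                   # medium
print(calibrated_preamble("docker", 0.71, has_past_errors=True))
```

With the docker figures from the calibration data (71%), the tier is medium on accuracy alone and drops to low once a relevant past correction is on record.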

Recording Outcomes

When the user indicates an outcome:
Success signals:
  • "That worked!"
  • "Perfect"
  • "Thanks, it's fixed"
Failure signals:
  • "That didn't work"
  • "Still broken"
  • "Wrong"
Partial signals:
  • "Almost"
  • "Partly fixed"
  • "One issue remaining"
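
One way to turn the signals above into counter updates is plain substring matching. The sketch below is an assumption about how recording might work: classify_outcome and record_outcome are hypothetical names, and checking partial signals first (so that a message like "Almost perfect" counts as partial) is a design choice the original does not specify.

```python
from typing import Optional

# Phrases taken from the signal lists above, lowercased.
# Order matters: partial is checked first, then failure, then success.
SIGNALS = {
    "partial": ["almost", "partly fixed", "one issue remaining"],
    "incorrect": ["that didn't work", "still broken", "wrong"],
    "correct": ["that worked", "perfect", "it's fixed"],
}


def classify_outcome(message: str) -> Optional[str]:
    """Return 'correct', 'incorrect', 'partial', or None if no signal matches."""
    text = message.lower()
    for outcome, phrases in SIGNALS.items():
        if any(phrase in text for phrase in phrases):
            return outcome
    return None


def record_outcome(domain_stats: dict, domain: str, message: str) -> Optional[str]:
    """Increment the matching counter for the domain, if a signal is found."""
    outcome = classify_outcome(message)
    if outcome is not None:
        entry = domain_stats.setdefault(
            domain, {"correct": 0, "incorrect": 0, "partial": 0}
        )
        entry[outcome] += 1
    return outcome


stats = {"docker": {"correct": 12, "incorrect": 3, "partial": 2}}
record_outcome(stats, "docker", "Thanks, it's fixed")
print(stats["docker"]["correct"])  # 13
```

Substring matching is crude: a message such as "nothing wrong now" would be counted as a failure, so a real system would likely need something more careful.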

Domain Detection Keywords

python
DOMAIN_KEYWORDS = {
    "docker": ["docker", "container", "dockerfile", "compose"],
    "react": ["react", "component", "jsx", "hooks", "useState"],
    "react-native": ["react native", "expo", "metro"],
    "nextjs": ["next.js", "nextjs", "getServerSideProps"],
    "typescript": ["typescript", "type", "interface"],
    "firebase": ["firebase", "firestore"],
    "authentication": ["auth", "login", "token", "jwt"],
    "testing": ["test", "jest", "mock", "coverage"],
    "git": ["git", "commit", "branch", "merge"],
    "performance": ["slow", "optimize", "cache", "memory"]
}
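
The original does not show how the keyword table is applied to a message. A minimal sketch, assuming case-insensitive whole-word matching and ranking domains by the number of distinct keywords hit (detect_domains is a hypothetical helper, and only a subset of the table is repeated here to keep the example self-contained):

```python
import re
from typing import Dict, List

# Subset of the keyword table above, repeated so the example runs on its own.
DOMAIN_KEYWORDS: Dict[str, List[str]] = {
    "docker": ["docker", "container", "dockerfile", "compose"],
    "authentication": ["auth", "login", "token", "jwt"],
    "git": ["git", "commit", "branch", "merge"],
}


def detect_domains(message: str) -> List[str]:
    """Return matching domains, most keyword hits first (ties by name)."""
    hits: Dict[str, int] = {}
    for domain, keywords in DOMAIN_KEYWORDS.items():
        count = sum(
            1
            for kw in keywords
            # Whole-word match, so "git" does not fire on "github".
            if re.search(r"\b" + re.escape(kw) + r"\b", message, re.IGNORECASE)
        )
        if count:
            hits[domain] = count
    return sorted(hits, key=lambda d: (-hits[d], d))


print(detect_domains("Set up Docker networking between containers"))
# ['docker']
```

Whole-word matching keeps short keywords such as "git" or "type" from firing inside longer words, at the cost of missing plurals like "containers". Which trade-off the real system makes is not stated.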

Integration with Learning System

Confidence data feeds into:
  • <semantic-memory> context injection
  • ReasoningBank for pattern matching
  • Preference learner for style calibration

Example Workflow

User: "Set up Docker networking between containers"

1. Detect domain: docker
2. Check calibration: docker accuracy = 71%
3. Check past corrections: "Docker can't use Metal GPU on Mac"
4. Respond with calibrated confidence:

"For container networking, you'll want a bridge network.
Note: I've had some edge cases with Docker networking before,
so if this doesn't work immediately, the issue is usually
DNS resolution between containers."