geofeed-tuner


Geofeed Tuner – Create Better IP Geolocation Feeds

Geofeed调优工具——打造更优质的IP地理定位源

This skill helps you create and improve IP geolocation feeds in CSV format by:
  • Ensuring your CSV is well-formed and consistent
  • Checking alignment with RFC 8805 (the industry standard)
  • Applying opinionated best practices learned from real-world deployments
  • Suggesting improvements for accuracy, completeness, and privacy
本技能可通过以下方式帮助您创建和优化CSV格式的IP地理定位源:
  • 确保CSV格式规范且一致
  • 检查是否符合RFC 8805(行业标准)
  • 应用从实际部署中总结的经验性最佳实践
  • 为提升准确性、完整性和隐私性提供改进建议

When to Use This Skill

何时使用本技能

  • Use this skill when a user asks for help creating, improving, or publishing an IP geolocation feed file in CSV format.
  • Use it to tune and troubleshoot CSV geolocation feeds — catching errors, suggesting improvements, and ensuring real-world usability beyond RFC compliance.
  • Intended audience:
    • Network operators, administrators, and engineers responsible for publicly routable IP address space
    • Organizations such as ISPs, mobile carriers, cloud providers, hosting and colocation companies, Internet Exchange operators, and satellite internet providers
  • Do not use this skill for private or internal IP address management; it applies only to publicly routable IP addresses.
  • 当用户需要帮助创建、改进或发布CSV格式的IP地理定位源文件时,使用本技能。
  • 用于调优和排查CSV地理定位源问题——捕获错误、提出改进建议,确保其在实际场景中的可用性,而不仅仅满足RFC合规要求。
  • 目标受众:
    • 负责可公开路由IP地址空间的网络运营商、管理员和工程师
    • 各类组织,如ISP、移动运营商、云服务商、托管和 colocation 公司、互联网交换中心运营商以及卫星互联网服务商
  • 请勿用于私有或内部IP地址管理;本技能仅适用于可公开路由的IP地址

Prerequisites

前置条件

  • Python 3 is required.
  • 需要安装Python 3

Directory Structure and File Management

目录结构与文件管理

This skill uses a clear separation between distribution files (read-only) and working files (generated at runtime).
本技能明确区分分发文件(只读)和工作文件(运行时生成)。

Read-Only Directories (Do Not Modify)

只读目录(请勿修改)

The following directories contain static distribution assets. Do not create, modify, or delete files in these directories:
| Directory | Purpose |
|---|---|
| assets/ | Static data files (ISO codes, examples) |
| references/ | RFC specifications and code snippets for reference |
| scripts/ | Executable code and HTML template files for reports |
以下目录包含静态分发资源。请勿在这些目录中创建、修改或删除文件:
目录用途
assets/
静态数据文件(ISO代码、示例文件等)
references/
RFC规范和代码片段参考文件
scripts/
可执行代码和报告HTML模板文件

Working Directories (Generated Content)

工作目录(生成内容)

All generated, temporary, and output files go in these directories:
| Directory | Purpose |
|---|---|
| run/ | Working directory for all agent-generated content |
| run/data/ | Downloaded CSV files from remote URLs |
| run/report/ | Generated HTML tuning reports |
所有生成的临时文件和输出文件都存储在这些目录中:
目录用途
run/
所有Agent生成内容的工作目录
run/data/
从远程URL下载的CSV文件存储目录
run/report/
生成的HTML调优报告存储目录

File Management Rules

文件管理规则

  1. Never write to assets/, references/, or scripts/. These are part of the skill distribution and must remain unchanged.
  2. All downloaded input files (from remote URLs) must be saved to ./run/data/.
  3. All generated HTML reports must be saved to ./run/report/.
  4. All generated Python scripts must be saved to ./run/.
  5. The run/ directory may be cleared between sessions; do not store permanent data there.
  6. Working directory for execution: All generated scripts in ./run/ must be executed with the skill root directory (the directory containing SKILL.md) as the current working directory, so that relative paths like assets/iso3166-1.json and ./run/data/report-data.json resolve correctly. Do not cd into ./run/ before running scripts.
  1. 切勿向
    assets/
    references/
    scripts/
    写入内容
    ——这些是技能分发的一部分,必须保持不变。
  2. 所有下载的输入文件(来自远程URL)必须保存到
    ./run/data/
  3. 所有生成的HTML报告必须保存到
    ./run/report/
  4. 所有生成的Python脚本必须保存到
    ./run/
  5. run/
    目录可能会在会话之间被清空;请勿在此存储永久数据。
  6. 执行工作目录:
    ./run/
    中的所有生成脚本必须以技能根目录(包含
    SKILL.md
    的目录)作为当前工作目录执行,这样相对路径如
    assets/iso3166-1.json
    ./run/data/report-data.json
    才能正确解析。执行脚本前请勿切换到
    ./run/
    目录。

Processing Pipeline: Sequential Phase Execution

处理流程:按阶段顺序执行

All phases must be executed in order, from Phase 1 through Phase 6. Each phase depends on the successful completion of the previous phase. For example, structure checks must complete before quality analysis can run.
The phases are summarized below. The agent must follow the detailed steps outlined further in each phase section.
| Phase | Name | Description |
|---|---|---|
| 1 | Understand the Standard | Review the key requirements of RFC 8805 for self-published IP geolocation feeds |
| 2 | Gather Input | Collect IP subnet data from local files or remote URLs |
| 3 | Checks & Suggestions | Validate CSV structure, analyze IP prefixes, and check data quality |
| 4 | Tuning Data Lookup | Use Fastah's MCP tool to retrieve tuning data for improving geolocation accuracy |
| 5 | Generate Tuning Report | Create an HTML report summarizing the analysis and suggestions |
| 6 | Final Review | Verify consistency and completeness of the report data |
Do not skip phases. Each phase provides critical checks or data transformations required by subsequent stages.
所有阶段必须按顺序执行,从第1阶段到第6阶段。每个阶段都依赖于前一阶段的成功完成。例如,结构检查必须在质量分析之前完成。
以下是各阶段的概述。Agent必须遵循每个阶段部分中列出的详细步骤。
阶段名称描述
1理解标准要求回顾RFC 8805中关于自发布IP地理定位源的关键要求
2收集输入数据从本地文件或远程URL收集IP子网数据
3检查与建议验证CSV结构、分析IP前缀并检查数据质量
4调优数据查询使用Fastah的MCP工具检索调优数据,以提升地理定位准确性
5生成调优报告创建HTML报告,总结分析结果和建议
6最终审核验证报告数据的一致性和完整性
请勿跳过任何阶段。每个阶段都提供后续阶段所需的关键检查或数据转换。

Execution Plan Rules

执行计划规则

Before executing each phase, the agent MUST generate a visible TODO checklist.
The plan MUST:
  • Appear at the very start of the phase
  • List every step in order
  • Use a checkbox format
  • Be updated live as steps complete
在执行每个阶段之前,Agent必须生成一个可见的待办事项清单。
计划必须:
  • 出现在阶段的最开始
  • 按顺序列出每个步骤
  • 使用复选框格式
  • 随着步骤完成实时更新

Phase 1: Understand the Standard

阶段1:理解标准要求

The key requirements from RFC 8805 that this skill enforces are summarized below. Use this summary as your working reference. Only consult the full RFC 8805 text for edge cases, ambiguous situations, or when the user asks a standards question not covered here.
以下是本技能强制执行的RFC 8805关键要求摘要。请将此摘要作为工作参考。仅在遇到边缘情况、模糊场景或用户提出此处未涵盖的标准相关问题时,才查阅完整的RFC 8805文本

RFC 8805 Key Facts

RFC 8805关键要点

Purpose: A self-published IP geolocation feed lets network operators publish authoritative location data for their IP address space in a simple CSV format, allowing geolocation providers to incorporate operator-supplied corrections.
CSV Column Order (Sections 2.1.1.1–2.1.1.5):
| Column | Field | Required | Notes |
|---|---|---|---|
| 1 | ip_prefix | Yes | CIDR notation; IPv4 or IPv6; must be a network address |
| 2 | alpha2code | No | ISO 3166-1 alpha-2 country code; empty or "ZZ" = do-not-geolocate |
| 3 | region | No | ISO 3166-2 subdivision code (e.g., US-CA) |
| 4 | city | No | Free-text city name; no authoritative validation set |
| 5 | postal_code | No | Deprecated; must be left empty or absent |
Structural rules:
  • Files may contain comment lines beginning with # (including the header, if present).
  • A header row is optional; a header row that starts with # is treated as a comment.
  • Files must be encoded in UTF-8.
  • Subnet host bits must not be set (e.g., 192.168.1.1/24 is invalid; use 192.168.1.0/24).
  • Applies only to globally routable unicast addresses — not private, loopback, link-local, or multicast space.
Do-not-geolocate: An entry whose alpha2code is empty or ZZ (case-insensitive), irrespective of the region and city values, is an explicit signal that the operator does not want geolocation applied to that prefix.
Postal codes deprecated (Section 2.1.1.5): The fifth column must not contain postal or ZIP codes. They are too fine-grained for IP-range mapping and raise privacy concerns.
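The host-bit and do-not-geolocate rules above can be sketched with Python's standard ipaddress module. This is a minimal illustration, not the skill's reference code: the helper names check_prefix and is_do_not_geolocate are hypothetical, and the snippets shipped in references/ may differ.

```python
import ipaddress

def check_prefix(text):
    """Return (ok, detail) for one ip_prefix value (hypothetical helper).

    strict=True makes ip_network() reject prefixes with host bits set,
    such as 192.168.1.1/24.
    """
    try:
        network = ipaddress.ip_network(text.strip(), strict=True)
    except ValueError as exc:
        return False, str(exc)
    # str(network) is the normalized CIDR form; a bare address becomes /32 or /128.
    return True, str(network)

def is_do_not_geolocate(alpha2code):
    """True when the country column is empty or ZZ in any letter case."""
    return alpha2code.strip().upper() in ("", "ZZ")
```

The private 192.168.1.x range is used here only because it is the example in the rules above; a real feed would fail the public-address check for it anyway.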
**目的:**自发布IP地理定位源允许网络运营商以简单的CSV格式发布其IP地址空间的权威位置数据,以便地理定位提供商纳入运营商提供的修正信息。
CSV列顺序(第2.1.1.1–2.1.1.5节):
列号字段名是否必填说明
1
ip_prefix
CIDR表示法;支持IPv4或IPv6;必须是网络地址
2
alpha2code
ISO 3166-1 alpha-2国家代码;空值或"ZZ"表示无需地理定位
3
region
ISO 3166-2细分代码(例如
US-CA
4
city
自由文本城市名称;无权威验证集
5
postal_code
已弃用——必须留空或省略该列
结构规则:
  • 文件中可包含以
    #
    开头的注释行(包括可能存在的表头)。
  • 表头行是可选的;如果存在,若以
    #
    开头则会被视为注释行。
  • 文件必须以UTF-8编码。
  • 子网主机位不得设置(例如
    192.168.1.1/24
    无效;应使用
    192.168.1.0/24
    )。
  • 仅适用于全局可路由的单播地址——不适用于私有、环回、链路本地或多播地址段。
**无需地理定位的标记:**如果
alpha2code
为空或不区分大小写的
ZZ
(无论region/city的值如何),则明确表示运营商不希望对该前缀进行地理定位。
**邮政编码已弃用(第2.1.1.5节):**第五列不得包含邮政编码或ZIP代码。它们对于IP范围映射来说粒度太细,且存在隐私问题。

Phase 2: Gather Input

阶段2:收集输入数据

  • If the user has not already provided a list of IP subnets or ranges (sometimes referred to as inetnum or inet6num), prompt them to supply it. Accepted input formats:
    • Text pasted into the chat
    • A local CSV file
    • A remote URL pointing to a CSV file
  • If the input is a remote URL:
    • Attempt to download the CSV file to ./run/data/ before processing.
    • On HTTP error (4xx, 5xx, timeout, or redirect loop), stop immediately and report to the user:
      Feed URL is not reachable: HTTP {status_code}. Please verify the URL is publicly accessible.
    • Do not proceed to Phase 3 with an incomplete or empty download.
  • If the input is a local file, process it directly without downloading.
  • Encoding detection and normalization:
    1. Attempt to read the file as UTF-8 first.
    2. If a UnicodeDecodeError is raised, try utf-8-sig (UTF-8 with BOM), then latin-1.
    3. Once successfully decoded, re-encode and write the working copy as UTF-8.
    4. If no encoding succeeds, stop and report:
      Unable to decode input file. Please save it as UTF-8 and try again.
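The encoding fallback can be sketched as below. The function name decode_feed_bytes is hypothetical. Two behaviors of Python's codecs are worth knowing when reading this sketch: plain utf-8 decoding succeeds on a file with a BOM and keeps it as a leading U+FEFF character, so the sketch strips it; and latin-1 accepts any byte sequence, so the final error acts only as a safety net.

```python
def decode_feed_bytes(raw):
    """Decode raw feed bytes; return (text, encoding_used). Hypothetical helper."""
    for encoding in ("utf-8", "utf-8-sig", "latin-1"):
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        # Drop a leading BOM so it does not pollute the first ip_prefix value.
        return text.lstrip("\ufeff"), encoding
    raise ValueError(
        "Unable to decode input file. Please save it as UTF-8 and try again."
    )
```

The decoded text can then be written back with open(path, "w", encoding="utf-8") to produce the UTF-8 working copy.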
  • 如果用户尚未提供IP子网或范围列表(有时称为
    inetnum
    inet6num
    ),请提示他们提供。支持的输入格式:
    • 粘贴到聊天框中的文本
    • 本地CSV文件
    • 指向CSV文件的远程URL
  • 如果输入是远程URL
    • 处理前先尝试将CSV文件下载到
      ./run/data/
    • 如果遇到HTTP错误(4xx、5xx、超时或重定向循环),立即停止并向用户报告:
      源URL无法访问:HTTP {status_code}。请验证该URL是否可公开访问。
    • 若下载不完整或为空,请勿进入阶段3。
  • 如果输入是本地文件,直接处理无需下载。
  • 编码检测与标准化:
    1. 首先尝试以UTF-8编码读取文件。
    2. 如果引发
      UnicodeDecodeError
      ,尝试
      utf-8-sig
      (带BOM的UTF-8),再尝试
      latin-1
    3. 成功解码后,重新编码为UTF-8并写入工作副本。
    4. 如果所有编码都无法成功解码,停止操作并报告:
      无法解码输入文件。请将其保存为UTF-8格式后重试。

Phase 3: Checks & Suggestions

阶段3:检查与建议

Execution Rules

执行规则

  • Generate a script for this phase.
  • Do NOT combine this phase with others.
  • Do NOT precompute future-phase data.
  • Store the output as a JSON file at:
    ./run/data/report-data.json
  • 为此阶段生成一个脚本
  • 请勿将此阶段与其他阶段合并。
  • 请勿预先计算后续阶段的数据。
  • 将输出存储为JSON文件,路径为:
    ./run/data/report-data.json

Schema Definition

模式定义

The JSON structure below is IMMUTABLE during Phase 3. Phase 4 will later add a TunedEntry object to each object in Entries; this is the only permitted schema extension and happens in a separate phase.
JSON keys map directly to template placeholders like {{.CountryCode}}, {{.HasError}}, etc.
json
{
  "InputFile": "",
  "Timestamp": 0,

  "TotalEntries": 0,
  "IpV4Entries": 0,
  "IpV6Entries": 0,
  "InvalidEntries": 0,

  "Errors": 0,
  "Warnings": 0,
  "OK": 0,
  "Suggestions": 0,

  "CityLevelAccuracy": 0,
  "RegionLevelAccuracy": 0,
  "CountryLevelAccuracy": 0,
  "DoNotGeolocate": 0,

  "Entries": [
    {
      "Line": 0,
      "IPPrefix": "",
      "CountryCode": "",
      "RegionCode": "",
      "City": "",

      "Status": "",
      "IPVersion": "",

      "Messages": [
        {
          "ID": "",
          "Type": "",
          "Text": "",
          "Checked": false
        }
      ],

      "HasError": false,
      "HasWarning": false,
      "HasSuggestion": false,
      "DoNotGeolocate": false,
      "GeocodingHint": "",
      "Tunable": false
    }
  ]
}
Field definitions:
Top-level metadata:
  • InputFile: The original input source, either a local filename or a remote URL.
  • Timestamp: Milliseconds since Unix epoch when the tuning was performed.
  • TotalEntries: Total number of data rows processed (excluding comment and blank lines).
  • IpV4Entries: Count of entries that are IPv4 subnets.
  • IpV6Entries: Count of entries that are IPv6 subnets.
  • InvalidEntries: Count of entries that failed IP prefix parsing or CSV parsing.
  • Errors: Total entries whose Status is ERROR.
  • Warnings: Total entries whose Status is WARNING.
  • OK: Total entries whose Status is OK.
  • Suggestions: Total entries whose Status is SUGGESTION.
  • CityLevelAccuracy: Count of valid entries where City is non-empty.
  • RegionLevelAccuracy: Count of valid entries where RegionCode is non-empty and City is empty.
  • CountryLevelAccuracy: Count of valid entries where CountryCode is non-empty, RegionCode is empty, and City is empty.
  • DoNotGeolocate (metadata): Count of valid entries where CountryCode, RegionCode, and City are all empty.
Entry fields:
  • Entries: Array of objects, one per data row, with the following per-entry fields:
    • Line: 1-based line number in the original CSV (counting all lines including comments and blanks).
    • IPPrefix: The normalized IP prefix in CIDR slash notation.
    • CountryCode: The ISO 3166-1 alpha-2 country code, or empty string.
    • RegionCode: The ISO 3166-2 region code (e.g., US-CA), or empty string.
    • City: The city name, or empty string.
    • Status: Highest severity assigned: ERROR > WARNING > SUGGESTION > OK.
    • IPVersion: "IPv4" or "IPv6" based on the parsed IP prefix.
    • Messages: Array of message objects, each with:
      • ID: String identifier from the Validation Rules Reference table below (e.g., "1101", "3301").
      • Type: The severity type: "ERROR", "WARNING", or "SUGGESTION".
      • Text: The human-readable validation message string.
      • Checked: true if the validation rule is auto-tunable (the rule's Checked column is true in the reference table), false otherwise. Controls whether the checkbox in the report is checked or disabled.
    • HasError: true if any message has Type "ERROR".
    • HasWarning: true if any message has Type "WARNING".
    • HasSuggestion: true if any message has Type "SUGGESTION".
    • DoNotGeolocate (entry): true if CountryCode is empty or "ZZ"; the entry is an explicit do-not-geolocate signal.
    • GeocodingHint: Always empty string "" in Phase 3. Reserved for future use.
    • Tunable: true if any message in the entry has Checked: true. Computed as logical OR across all messages' Checked values. This flag drives the "Tune" button visibility in the report.
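As an illustration of the Status rule (highest severity wins) and the four top-level counters, here is a minimal sketch. The helper names derive_status and summarize are hypothetical; only the field names come from the schema above.

```python
# Severities from highest to lowest; an entry with no messages is OK.
SEVERITY_ORDER = ("ERROR", "WARNING", "SUGGESTION")

COUNTER_FOR_STATUS = {
    "ERROR": "Errors",
    "WARNING": "Warnings",
    "SUGGESTION": "Suggestions",
    "OK": "OK",
}

def derive_status(messages):
    """Return the highest severity present in an entry's Messages array."""
    present = {message["Type"] for message in messages}
    for severity in SEVERITY_ORDER:
        if severity in present:
            return severity
    return "OK"

def summarize(entries):
    """Set Status on every entry and return the top-level status counters."""
    counts = {"Errors": 0, "Warnings": 0, "OK": 0, "Suggestions": 0}
    for entry in entries:
        entry["Status"] = derive_status(entry["Messages"])
        counts[COUNTER_FOR_STATUS[entry["Status"]]] += 1
    return counts
```

Because each entry carries exactly one Status, the four counters in this sketch always sum to the number of entries passed in.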
以下JSON结构在阶段3中是不可修改的。阶段4将在
Entries
中的每个对象中添加一个
TunedEntry
对象——这是唯一允许的模式扩展,且将在单独阶段中完成。
JSON键直接映射到模板占位符,如
{{.CountryCode}}
{{.HasError}}
等。
json
{
  "InputFile": "",
  "Timestamp": 0,

  "TotalEntries": 0,
  "IpV4Entries": 0,
  "IpV6Entries": 0,
  "InvalidEntries": 0,

  "Errors": 0,
  "Warnings": 0,
  "OK": 0,
  "Suggestions": 0,

  "CityLevelAccuracy": 0,
  "RegionLevelAccuracy": 0,
  "CountryLevelAccuracy": 0,
  "DoNotGeolocate": 0,

  "Entries": [
    {
      "Line": 0,
      "IPPrefix": "",
      "CountryCode": "",
      "RegionCode": "",
      "City": "",

      "Status": "",
      "IPVersion": "",

      "Messages": [
        {
          "ID": "",
          "Type": "",
          "Text": "",
          "Checked": false
        }
      ],

      "HasError": false,
      "HasWarning": false,
      "HasSuggestion": false,
      "DoNotGeolocate": false,
      "GeocodingHint": "",
      "Tunable": false
    }
  ]
}
字段定义:
顶层元数据:
  • InputFile
    :原始输入源,可为本地文件名或远程URL。
  • Timestamp
    :调优执行时的Unix时间戳(毫秒)。
  • TotalEntries
    :处理的数据总行数(不包括注释行和空行)。
  • IpV4Entries
    :IPv4子网条目的数量。
  • IpV6Entries
    :IPv6子网条目的数量。
  • InvalidEntries
    :无法解析IP前缀和CSV格式的条目数量。
  • Errors
    Status
    ERROR
    的条目总数。
  • Warnings
    Status
    WARNING
    的条目总数。
  • OK
    Status
    OK
    的条目总数。
  • Suggestions
    Status
    SUGGESTION
    的条目总数。
  • CityLevelAccuracy
    City
    字段非空的有效条目数量。
  • RegionLevelAccuracy
    RegionCode
    非空且
    City
    为空的有效条目数量。
  • CountryLevelAccuracy
    CountryCode
    非空且
    RegionCode
    City
    为空的有效条目数量。
  • DoNotGeolocate
    :标记为无需地理定位的有效条目数量。
条目字段:
  • Entries
    :对象数组,每个对象对应一行数据,包含以下字段:
    • Line
      :原始CSV中的行号(从1开始计数,包括所有行,如注释行和空行)。
    • IPPrefix
      :标准化后的IP前缀(CIDR斜杠表示法)。
    • CountryCode
      :ISO 3166-1 alpha-2国家代码,或空字符串。
    • RegionCode
      :ISO 3166-2地区代码(例如
      US-CA
      ),或空字符串。
    • City
      :城市名称,或空字符串。
    • Status
      :分配的最高严重级别:
      ERROR
      >
      WARNING
      >
      SUGGESTION
      >
      OK
    • IPVersion
      :根据解析的IP前缀确定为
      "IPv4"
      "IPv6"
    • Messages
      :消息对象数组,每个对象包含:
      • ID
        :来自下方验证规则参考表的字符串标识符(例如
        "1101"
        "3301"
        )。
      • Type
        :严重级别类型:
        "ERROR"
        "WARNING"
        "SUGGESTION"
      • Text
        :人类可读的验证消息字符串。
      • Checked
        :如果验证规则可自动调优(参考表中
        Tunable: true
        )则为
        true
        ,否则为
        false
        。控制报告中复选框是否为
        checked
        disabled
        状态。
    • HasError
      :如果任何消息的
      Type
      "ERROR"
      则为
      true
    • HasWarning
      :如果任何消息的
      Type
      "WARNING"
      则为
      true
    • HasSuggestion
      :如果任何消息的
      Type
      "SUGGESTION"
      则为
      true
    • DoNotGeolocate
      (条目级):如果
      CountryCode
      为空或
      "ZZ"
      则为
      true
      ——表示该条目明确标记为无需地理定位。
    • GeocodingHint
      :阶段3中始终为空字符串
      ""
      。预留供后续使用。
    • Tunable
      :如果条目中任何消息的
      Checked
      true
      则为
      true
      。通过所有消息的
      Checked
      值的逻辑或运算得出。该标志控制报告中“调优”按钮的可见性。

Validation Rules Reference

验证规则参考表

When adding messages to an entry, use the ID, Type, Text, and Checked values from this table.

| ID | Type | Text | Checked | Condition Reference |
|---|---|---|---|---|
| 1101 | ERROR | IP prefix is empty | false | IP Prefix Analysis: empty |
| 1102 | ERROR | Invalid IP prefix: unable to parse as IPv4 or IPv6 network | false | IP Prefix Analysis: invalid syntax |
| 1103 | ERROR | Non-public IP range is not allowed in an RFC 8805 feed | false | IP Prefix Analysis: non-public |
| 3101 | SUGGESTION | IPv4 prefix is unusually large and may indicate a typo | false | IP Prefix Analysis: IPv4 < /22 |
| 3102 | SUGGESTION | IPv6 prefix is unusually large and may indicate a typo | false | IP Prefix Analysis: IPv6 < /64 |
| 1201 | ERROR | Invalid country code: not a valid ISO 3166-1 alpha-2 value | true | Country Code Analysis: invalid |
| 1301 | ERROR | Invalid region format; expected COUNTRY-SUBDIVISION (e.g., US-CA) | true | Region Code Analysis: bad format |
| 1302 | ERROR | Invalid region code: not a valid ISO 3166-2 subdivision | true | Region Code Analysis: unknown code |
| 1303 | ERROR | Region code does not match the specified country code | true | Region Code Analysis: mismatch |
| 1401 | ERROR | Invalid city name: placeholder value is not allowed | false | City Name Analysis: placeholder |
| 1402 | ERROR | Invalid city name: abbreviated or code-based value detected | true | City Name Analysis: abbreviation |
| 2401 | WARNING | City name formatting is inconsistent; consider normalizing the value | true | City Name Analysis: formatting |
| 1501 | ERROR | Postal codes are deprecated by RFC 8805 and must be removed for privacy reasons | true | Postal Code Check |
| 3301 | SUGGESTION | Region is usually unnecessary for small territories; consider removing the region value | true | Tuning: small territory region |
| 3402 | SUGGESTION | City-level granularity is usually unnecessary for small territories; consider removing the city value | true | Tuning: small territory city |
| 3303 | SUGGESTION | Region code is recommended when a city is specified; choose a region from the dropdown | true | Tuning: missing region with city |
| 3104 | SUGGESTION | Confirm whether this subnet is intentionally marked as do-not-geolocate or missing location data | true | Tuning: unspecified geolocation |
向条目添加消息时,请使用此表中的
ID
Type
Text
Checked
值。
ID类型消息文本Checked条件参考
1101
ERROR
IP前缀为空
false
IP前缀分析:空值
1102
ERROR
无效IP前缀:无法解析为IPv4或IPv6网络地址
false
IP前缀分析:语法无效
1103
ERROR
RFC 8805源中不允许使用非公开IP地址段
false
IP前缀分析:非公开地址
3101
SUGGESTION
IPv4前缀过大,可能存在输入错误
false
IP前缀分析:IPv4前缀小于/22
3102
SUGGESTION
IPv6前缀过大,可能存在输入错误
false
IP前缀分析:IPv6前缀小于/64
1201
ERROR
无效国家代码:不是有效的ISO 3166-1 alpha-2值
true
国家代码分析:无效值
1301
ERROR
无效地区格式;预期格式为COUNTRY-SUBDIVISION(例如US-CA)
true
地区代码分析:格式错误
1302
ERROR
无效地区代码:不是有效的ISO 3166-2细分代码
true
地区代码分析:未知代码
1303
ERROR
地区代码与指定的国家代码不匹配
true
地区代码分析:代码不匹配
1401
ERROR
无效城市名称:不允许使用占位符值
false
城市名称分析:占位符
1402
ERROR
无效城市名称:检测到缩写或基于代码的值
true
城市名称分析:缩写形式
2401
WARNING
城市名称格式不一致;建议标准化该值
true
城市名称分析:格式问题
1501
ERROR
RFC 8805已弃用邮政编码,出于隐私考虑必须移除
true
邮政编码检查
3301
SUGGESTION
对于小型地区,通常无需指定地区;建议移除地区值
true
调优建议:小型地区的地区值
3402
SUGGESTION
对于小型地区,通常无需指定城市粒度;建议移除城市值
true
调优建议:小型地区的城市值
3303
SUGGESTION
指定城市时建议同时提供地区代码;请从下拉列表中选择一个地区
true
调优建议:指定城市但缺少地区
3104
SUGGESTION
请确认该子网是否有意标记为无需地理定位,或是否缺少位置数据
true
调优建议:未指定地理定位信息

Populating Messages

填充消息

When a validation check matches, add a message to the entry's Messages array using the values from the reference table:
python
entry["Messages"].append({
    "ID": "1201",      # From the table
    "Type": "ERROR",   # From the table
    "Text": "Invalid country code: not a valid ISO 3166-1 alpha-2 value",  # From the table
    "Checked": True    # From the table (True = tunable)
})
After populating all messages for an entry, derive the entry-level flags:
python
entry["HasError"] = any(m["Type"] == "ERROR" for m in entry["Messages"])
entry["HasWarning"] = any(m["Type"] == "WARNING" for m in entry["Messages"])
entry["HasSuggestion"] = any(m["Type"] == "SUGGESTION" for m in entry["Messages"])
entry["Tunable"] = any(m["Checked"] for m in entry["Messages"])
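One way to avoid retyping message text at every call site is to keep the reference table in a lookup keyed by rule ID. The sketch below is an assumption about how a generated script might organize this, not part of the skill itself: RULES holds only three rows copied from the table, and add_message is a hypothetical helper.

```python
# Three rows copied from the Validation Rules Reference table:
# rule ID -> (Type, Text, Checked). A real script would list every rule.
RULES = {
    "1102": ("ERROR", "Invalid IP prefix: unable to parse as IPv4 or IPv6 network", False),
    "1201": ("ERROR", "Invalid country code: not a valid ISO 3166-1 alpha-2 value", True),
    "3101": ("SUGGESTION", "IPv4 prefix is unusually large and may indicate a typo", False),
}

def add_message(entry, rule_id):
    """Append the message for rule_id to the entry's Messages array."""
    message_type, text, checked = RULES[rule_id]
    entry["Messages"].append(
        {"ID": rule_id, "Type": message_type, "Text": text, "Checked": checked}
    )
```

An unknown rule ID raises KeyError in this sketch, which surfaces typos in rule IDs early instead of emitting a message that is not in the table.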
当验证检查匹配时,使用参考表中的值将消息添加到条目的
Messages
数组中:
python
entry["Messages"].append({
    "ID": "1201",      # 来自参考表
    "Type": "ERROR",   # 来自参考表
    "Text": "Invalid country code: not a valid ISO 3166-1 alpha-2 value",  # 来自参考表
    "Checked": True    # 来自参考表(True表示可自动调优)
})
为条目填充所有消息后,推导条目级标志:
python
entry["HasError"] = any(m["Type"] == "ERROR" for m in entry["Messages"])
entry["HasWarning"] = any(m["Type"] == "WARNING" for m in entry["Messages"])
entry["HasSuggestion"] = any(m["Type"] == "SUGGESTION" for m in entry["Messages"])
entry["Tunable"] = any(m["Checked"] for m in entry["Messages"])

Accuracy Level Counting Rules

准确性级别计数规则

Accuracy levels are mutually exclusive. Assign each valid (non-ERROR, non-invalid) entry to exactly one bucket based on the most granular non-empty geo field:
| Condition | Bucket |
|---|---|
| City is non-empty | CityLevelAccuracy |
| RegionCode non-empty AND City is empty | RegionLevelAccuracy |
| CountryCode non-empty, RegionCode and City empty | CountryLevelAccuracy |
| DoNotGeolocate (entry) is true | DoNotGeolocate (metadata) |

Do not count entries with HasError: true or entries in InvalidEntries in any accuracy bucket.
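The bucket assignment can be sketched as a single function. One assumption is labeled loudly here: the sketch tests the entry-level DoNotGeolocate flag first, because an entry with CountryCode "ZZ" would otherwise also satisfy the country-level condition and the buckets would no longer be mutually exclusive. The source does not state this ordering, and the helper names are hypothetical.

```python
def accuracy_bucket(entry):
    """Return the single accuracy bucket for an entry, or None if uncounted."""
    if entry["HasError"]:
        return None  # entries with errors are excluded from every bucket
    if entry["DoNotGeolocate"]:
        return "DoNotGeolocate"  # assumption: checked before the geo fields
    if entry["City"]:
        return "CityLevelAccuracy"
    if entry["RegionCode"]:
        return "RegionLevelAccuracy"
    if entry["CountryCode"]:
        return "CountryLevelAccuracy"
    return None

def count_buckets(entries):
    """Tally the four accuracy counters over all entries."""
    counts = {
        "CityLevelAccuracy": 0,
        "RegionLevelAccuracy": 0,
        "CountryLevelAccuracy": 0,
        "DoNotGeolocate": 0,
    }
    for entry in entries:
        bucket = accuracy_bucket(entry)
        if bucket is not None:
            counts[bucket] += 1
    return counts
```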
The agent MUST NOT:
  • Rename fields
  • Add or remove fields
  • Change data types
  • Reorder keys
  • Alter nesting
  • Wrap the object
  • Split into multiple files
If a value is unknown, leave it empty — never invent data.
准确性级别是互斥的。根据最精细的非空地理字段,将每个有效(非ERROR、非无效)条目分配到恰好一个分类中:
条件分类
City
字段非空
CityLevelAccuracy
RegionCode
非空且
City
为空
RegionLevelAccuracy
CountryCode
非空且
RegionCode
City
为空
CountryLevelAccuracy
条目的
DoNotGeolocate
true
DoNotGeolocate
(元数据)
请勿将
HasError: true
的条目或
InvalidEntries
中的条目计入任何准确性分类。
Agent不得:
  • 重命名字段
  • 添加或删除字段
  • 更改数据类型
  • 调整键的顺序
  • 修改嵌套结构
  • 包装对象
  • 拆分为多个文件
如果值未知,留空——切勿编造数据。

Structure & Format Check

结构与格式检查

This phase verifies that your feed is well-formed and parseable. Critical structural errors must be resolved before the tuner can analyze geolocation quality.
此阶段验证源文件格式是否规范、是否可解析。必须先解决关键结构错误,调优工具才能分析地理定位质量。
CSV Structure
CSV结构
This subsection defines rules for CSV-formatted input files used for IP geolocation feeds. The goal is to ensure the file can be parsed reliably and normalized into a consistent internal representation.
  • CSV Structure Checks
    • If pandas is available, use it for CSV parsing.
    • Otherwise, fall back to Python's built-in csv module.
    • Ensure the CSV contains exactly 4 or 5 logical columns.
    • Comment lines are allowed.
    • A header row may or may not be present.
    • If no header row exists, assume the implicit column order: ip_prefix, alpha2code, region, city, postal code (deprecated)
    • Refer to the example input file: assets/example/01-user-input-rfc8805-feed.csv
  • CSV Cleansing and Normalization
    • Clean and normalize the CSV using Python logic equivalent to the following operations:
      • Select only the first five columns, dropping any columns beyond the fifth.
      • Write the output file with a UTF-8 BOM.
    • Comments
      • Remove comment rows where the first column begins with #.
      • This also removes a header row if it begins with #.
      • Create a map of comments using the 1-based line number as the key and the full original line as the value. Also store blank lines.
      • Store this map in a JSON file at: ./run/data/comments.json
      • Example: { "4": "# It's OK for small city states to leave state ISO2 code unspecified" }
  • Notes
    • Both implementation paths (pandas and built-in csv) must write output using the utf-8-sig encoding to ensure a UTF-8 BOM is present.
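The cleansing steps can be sketched with the built-in csv module (the fallback path). The function names are hypothetical, and the sketch makes two simplifying assumptions: each record sits on a single line (no quoted fields spanning lines), and a header row that does not start with # is not handled.

```python
import csv
import io

def split_comments(text):
    """Separate comment and blank lines from data lines, keeping line numbers."""
    comments = {}    # "line number" -> full original line
    data_lines = []  # (line number, raw line)
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.lstrip().startswith("#"):
            comments[str(number)] = line
        else:
            data_lines.append((number, line))
    return comments, data_lines

def parse_rows(data_lines):
    """Parse each data line and keep only the first five columns."""
    rows = []
    for number, line in data_lines:
        fields = next(csv.reader([line]))
        rows.append((number, [field.strip() for field in fields[:5]]))
    return rows

def to_bom_bytes(rows):
    """Serialize rows as CSV bytes that begin with a UTF-8 BOM."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(fields for _, fields in rows)
    return buffer.getvalue().encode("utf-8-sig")
```

The comments dictionary from split_comments can be saved with json.dump to ./run/data/comments.json; string keys are used because JSON object keys are always strings.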
本小节定义了用于IP地理定位源的CSV格式输入文件规则。目标是确保文件能够被可靠解析并标准化为一致的内部表示形式
  • CSV结构检查
    • 如果已安装
      pandas
      ,使用它进行CSV解析。
    • 否则,回退到Python内置的
      csv
      模块。
    • 确保CSV包含恰好4或5个逻辑列
    • 允许存在注释行。
    • 可能存在或不存在表头行。
    • 如果没有表头行,假设默认列顺序:
      ip_prefix, alpha2code, region, city, postal code(已弃用)
    • 参考示例输入文件:
      assets/example/01-user-input-rfc8805-feed.csv
  • CSV清理与标准化
    • 使用Python逻辑对CSV进行清理和标准化,等效于以下操作:
      • 仅保留前5列,删除第5列之后的所有列。
      • UTF-8 BOM编码写入输出文件。
    • 注释处理
      • 删除第一列以
        #
        开头
        的注释行。
      • 这也会删除以
        #
        开头的表头行。
      • 创建注释映射,以1-based行号为键,完整原始行为值。同时存储空行。
      • 将此映射存储为JSON文件,路径为:
        ./run/data/comments.json
      • 示例:
        { "4": "# 小型城市国家可以不指定州ISO2代码" }
  • 注意事项
    • 两种实现方式(
      pandas
      和内置
      csv
      模块)都必须使用
      utf-8-sig
      编码写入输出,确保包含UTF-8 BOM

IP Prefix Analysis

IP前缀分析

  • Check that the IPPrefix field is present and non-empty for each entry (an empty value is reported with Message ID 1101).
    • Check for duplicate IPPrefix values across entries.
    • If duplicates are found, stop the skill and report to the user with the message: Duplicate IP prefix detected: {ip_prefix_value} appears on lines {line_numbers}
    • If no duplicates are found, continue with the analysis.
    • Checks
      • Each subnet must parse cleanly as either an IPv4 or IPv6 network using the code snippets in the references/ folder.
      • Subnets must be normalized and displayed in CIDR slash notation.
        • Single-host IPv4 subnets must be represented as /32.
        • Single-host IPv6 subnets must be represented as /128.
    • ERROR
      • Report the following conditions as ERROR:
      • Invalid subnet syntax
        • Message ID: 1102
      • Non-public address space
        • Applies to subnets that are private, loopback, link-local, multicast, or otherwise non-public
          • In Python, detect non-public ranges using is_private and related address properties as shown in ./references.
        • Message ID: 1103
    • SUGGESTION
      • Report the following conditions as SUGGESTION:
      • Overly large IPv6 subnets
        • Prefixes shorter than /64
        • Message ID: 3102
      • Overly large IPv4 subnets
        • Prefixes shorter than /22
        • Message ID: 3101
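The checks above map fairly directly onto Python's standard ipaddress module. The sketch below is one possible arrangement under stated assumptions: the helper names are hypothetical, the exact set of properties treated as "non-public" is a guess at what the references/ snippets do, and the sketch stops at the first ERROR rather than collecting further messages.

```python
import ipaddress
from collections import defaultdict

def analyze_prefix(text):
    """Return (normalized_prefix, ip_version, message_ids) for one value."""
    text = text.strip()
    if not text:
        return "", "", ["1101"]
    try:
        network = ipaddress.ip_network(text, strict=True)
    except ValueError:
        return text, "", ["1102"]
    version = "IPv4" if network.version == 4 else "IPv6"
    non_public = (
        network.is_private or network.is_loopback or network.is_link_local
        or network.is_multicast or network.is_reserved or network.is_unspecified
    )
    if non_public:
        return str(network), version, ["1103"]
    messages = []
    # A shorter prefix length means a larger block of addresses.
    if network.version == 4 and network.prefixlen < 22:
        messages.append("3101")
    if network.version == 6 and network.prefixlen < 64:
        messages.append("3102")
    return str(network), version, messages

def find_duplicates(prefixes_with_lines):
    """Map each repeated prefix to the line numbers it appears on."""
    lines_by_prefix = defaultdict(list)
    for line_number, prefix in prefixes_with_lines:
        lines_by_prefix[prefix].append(line_number)
    return {p: lines for p, lines in lines_by_prefix.items() if len(lines) > 1}
```

Comparing normalized prefixes (the first element returned by analyze_prefix) rather than raw strings lets find_duplicates catch pairs such as 8.8.8.8 and 8.8.8.8/32.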
  • 检查每个条目的
    IPPrefix
    字段是否存在且非空。
    • 检查
      IPPrefix
      值是否存在重复。
    • 如果发现重复,停止技能并向用户报告:
      检测到重复IP前缀:{ip_prefix_value}出现在行{line_numbers}
    • 如果未发现重复,继续分析。
    • 检查项
      • 每个子网必须能够使用
        references/
        文件夹中的代码片段正确解析为IPv4或IPv6网络
      • 子网必须标准化并以CIDR斜杠表示法显示。
        • 单主机IPv4子网必须表示为**
          /32
          **。
        • 单主机IPv6子网必须表示为**
          /128
          **。
    • 错误(ERROR)
      • 以下情况报告为ERROR
      • 无效子网语法
        • 消息ID:
          1102
      • 非公开地址段
        • 适用于私有、环回、链路本地、多播或其他非公开的子网
          • 在Python中,使用
            is_private
            ./references
            中所示的相关地址属性检测非公开地址段。
        • 消息ID:
          1103
    • 建议(SUGGESTION)
      • 以下情况报告为SUGGESTION
      • IPv6前缀过大
        • 前缀长度小于
          /64
        • 消息ID:
          3102
      • IPv4前缀过大
        • 前缀长度小于
          /22
        • 消息ID:
          3101

Geolocation Quality Check

地理定位质量检查

Analyze the accuracy and consistency of geolocation data:
  • Country codes
  • Region codes
  • City names
  • Deprecated fields
This phase runs after structural checks pass.
分析地理定位数据的准确性和一致性
  • 国家代码
  • 地区代码
  • 城市名称
  • 已弃用字段
此阶段在结构检查通过后运行。
Country Code Analysis
国家代码分析
  • Use the locally available data table ISO3166-1 (assets/iso3166-1.json) for checking.
    • JSON array of countries and territories with ISO codes
    • Each object includes:
      • alpha_2: two-letter country code
      • name: short country name
      • flag: flag emoji
    • This file represents the superset of valid CountryCode values for an RFC 8805 CSV.
    • Check the entry's CountryCode (RFC 8805 Section 2.1.1.2, column alpha2code) against the alpha_2 attribute.
    • Sample code is available in the references/ directory.
    • If a country is found in assets/small-territories.json, mark the entry internally as a small territory. This flag is used in later checks and suggestions but is not stored in the output JSON (it is transient validation state).
    • Note: small-territories.json contains some historic/disputed codes (AN, CS, XK) that are not present in iso3166-1.json. An entry using one of these as its CountryCode will fail the country code validation (ERROR) even though it matches as a small territory. The country code ERROR takes precedence; do not suppress it based on the small-territory flag.
    • ERROR
      • Report the following conditions as ERROR:
      • Invalid country code
        • Condition: CountryCode is present but not found in the alpha_2 set
        • Message ID: 1201
    • SUGGESTION
      • Report the following conditions as SUGGESTION:
      • Unspecified geolocation for subnet
        • Condition: All geographical fields (CountryCode, RegionCode, City) are empty for a subnet.
        • Action:
          • Set DoNotGeolocate = true for the entry.
          • Set CountryCode to ZZ for the entry.
        • Message ID: 3104
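A minimal sketch of the country-code checks follows. The helper names are hypothetical, and two choices are assumptions rather than stated rules: codes are compared in upper case, and an explicit ZZ is accepted as a do-not-geolocate marker without being looked up in the alpha_2 set.

```python
import json

def load_alpha2(path="assets/iso3166-1.json"):
    """Load the set of valid alpha_2 codes from the ISO 3166-1 data file."""
    with open(path, encoding="utf-8") as handle:
        return {item["alpha_2"] for item in json.load(handle)}

def check_country(entry, valid_alpha2):
    """Apply the country-code checks to one entry; return raised message IDs."""
    code = entry["CountryCode"].strip().upper()
    if not code and not entry["RegionCode"] and not entry["City"]:
        # All geo fields are empty: mark as do-not-geolocate and ask to confirm.
        entry["DoNotGeolocate"] = True
        entry["CountryCode"] = "ZZ"
        return ["3104"]
    if code in ("", "ZZ"):
        entry["DoNotGeolocate"] = True
        return []
    if code not in valid_alpha2:
        return ["1201"]
    return []
```

Passing the alpha_2 set in as a parameter keeps the check independent of the file system, so it can be exercised with a small hand-built set.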
Region Code Analysis
  • Use the locally available data table
    ISO3166-2
    for checking.
    • JSON array of country subdivisions with ISO-assigned codes
    • Each object includes:
      • code
        : subdivision code prefixed with country code (e.g.,
        US-CA
        )
      • name
        : short subdivision name
    • This file represents the superset of valid
      RegionCode
      values
      for an RFC 8805 CSV.
    • If a
      RegionCode
      value is provided (RFC 8805 Section 2.1.1.3):
      • Check that the format matches
        {COUNTRY}-{SUBDIVISION}
        (e.g.,
        US-CA
        ,
        AU-NSW
        ).
      • Check the value against the
        code
        attribute (already prefixed with the country code).
    • Small-territory exception: If the entry is a small territory and the
      RegionCode
      value equals the entry's
      CountryCode
      (e.g.,
      SG
      as both country and region for Singapore), treat the region as acceptable — skip all region validation checks for this entry. Small territories are effectively city-states with no meaningful ISO 3166-2 administrative subdivisions.
    • ERROR
      • Report the following conditions as ERROR:
      • Invalid region format
        • Condition:
          RegionCode
          does not match
          {COUNTRY}-{SUBDIVISION}
          and the small-territory exception does not apply
        • Message ID:
          1301
      • Unknown region code
        • Condition:
          RegionCode
          value is not found in the
          code
          set and the small-territory exception does not apply
        • Message ID:
          1302
      • Country–region mismatch
        • Condition: Country portion of
          RegionCode
          does not match
          CountryCode
        • Message ID:
          1303
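A minimal sketch of the region checks, under stated assumptions: `check_region` and `REGION_FORMAT` are hypothetical names, the regular expression is one plausible reading of `{COUNTRY}-{SUBDIVISION}` (two letters, a hyphen, one to three alphanumerics), and `region_codes` stands in for the `code` values loaded from the ISO3166-2 table.

```python
import re

# Assumed shape of {COUNTRY}-{SUBDIVISION}; adjust if the data table disagrees.
REGION_FORMAT = re.compile(r"^[A-Z]{2}-[A-Z0-9]{1,3}$")


def check_region(entry, region_codes, is_small_territory=False):
    """Return the ERROR message IDs raised by the entry's RegionCode."""
    region = entry.get("RegionCode") or ""
    country = entry.get("CountryCode") or ""
    if not region:
        return []  # nothing to validate
    # Small-territory exception: region equal to the country code skips all checks.
    if is_small_territory and region == country:
        return []
    if not REGION_FORMAT.match(region):
        return ["1301"]  # a malformed value cannot be looked up meaningfully
    errors = []
    if region not in region_codes:
        errors.append("1302")
    if region.split("-", 1)[0] != country:
        errors.append("1303")
    return errors
```

Returning early on a format error is a design choice made for this sketch; the rules above do not say whether 1302 and 1303 should also be reported for a malformed value.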
City Name Analysis
  • City names are validated using heuristic checks only.
    • There is currently no authoritative dataset available for validating city names.
    • ERROR
      • Report the following conditions as ERROR:
      • Placeholder or non-meaningful values
        • Condition: Placeholder or non-meaningful values including but not limited to:
          • undefined
          • Please select
          • null
          • N/A
          • TBD
          • unknown
        • Message ID:
          1401
      • Truncated names, abbreviations, or airport codes
        • Condition: Truncated names, abbreviations, or airport codes that do not represent valid city names:
          • LA
          • Frft
          • sin01
          • LHR
          • SIN
          • MAA
        • Message ID:
          1402
    • WARNING
      • Report the following conditions as WARNING:
      • Inconsistent casing or formatting
        • Condition: City names with inconsistent casing, spacing, or formatting that may reduce data quality, for example:
          • HongKong
            vs
            Hong Kong
          • Mixed casing or unexpected script usage
        • Message ID:
          2401
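Because no authoritative dataset exists, any implementation of these heuristics is a judgment call. The sketch below is one possible shape, not the skill's actual logic: `check_city`, the deny-list seeded from the examples above, and the site-code pattern are all assumptions, and the casing heuristic will produce false positives for names such as "McAllen".

```python
import re

PLACEHOLDERS = {"undefined", "please select", "null", "n/a", "tbd", "unknown"}
# Hypothetical deny-list built from the examples above; a real list would be longer.
KNOWN_ABBREVIATIONS = {"la", "frft", "lhr", "sin", "maa"}
SITE_CODE = re.compile(r"^[a-z]{3}\d{1,3}$")  # e.g. "sin01"


def check_city(city):
    """Return "1401", "1402", "2401", or None when nothing is flagged."""
    value = city.strip()
    if not value:
        return None
    lowered = value.lower()
    if lowered in PLACEHOLDERS:
        return "1401"
    if lowered in KNOWN_ABBREVIATIONS or SITE_CODE.match(lowered):
        return "1402"
    # Formatting heuristics: a capital glued to a lowercase letter ("HongKong"),
    # all-caps or all-lowercase names, doubled spaces, or stray edge whitespace.
    if (re.search(r"[a-z][A-Z]", value) or value.isupper() or value.islower()
            or "  " in value or value != city):
        return "2401"
    return None
```

The checks run from most to least severe so that each city yields at most one message.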
Postal Code Check
  • RFC 8805 Section 2.1.1.5 explicitly deprecates postal or ZIP codes.
    • Postal codes can represent very small populations and are not considered privacy-safe for mapping IP address ranges, which are statistical in nature.
    • ERROR
      • Report the following conditions as ERROR:
      • Postal code present
        • Condition: A non-empty value is present in the postal/ZIP code field.
        • Message ID:
          1501

Tuning & Recommendations

This phase applies opinionated recommendations beyond RFC 8805, learned from real-world geofeed deployments, that improve accuracy and usability.
  • SUGGESTION
    • Report the following conditions as SUGGESTION:
    • Region or city specified for small territory
      • Condition:
        • Entry is a small territory
        • RegionCode
          is non-empty OR
        • City
          is non-empty.
      • Message IDs:
        3301
        (for region),
        3402
        (for city)
    • Missing region code when city is specified
      • Condition:
        • City
          is non-empty
        • RegionCode
          is empty
        • Entry is not a small territory
      • Message ID:
        3303
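The two tuning rules above reduce to a small decision function. This is a sketch under assumptions: `tuning_suggestions` is a hypothetical name, and `is_small_territory` is the transient flag computed during country code analysis.

```python
def tuning_suggestions(entry, is_small_territory):
    """Return the SUGGESTION message IDs that apply to one entry."""
    has_region = bool(entry.get("RegionCode"))
    has_city = bool(entry.get("City"))
    ids = []
    if is_small_territory:
        # Region and city are each reported separately for small territories.
        if has_region:
            ids.append("3301")
        if has_city:
            ids.append("3402")
    elif has_city and not has_region:
        ids.append("3303")
    return ids
```

For example, a Singapore entry with both a region and a city would receive 3301 and 3402 but never 3303.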

Phase 4: Tuning Data Lookup

Objective

Look up all
Entries
using Fastah's
rfc8805-row-place-search
tool.

Execution Rules

  • Generate a new script only for payload generation (read the dataset and write one or more payload JSON files; do not call MCP from this script).
  • The server accepts at most 1000 entries per request; if there are more, split them across multiple requests.
  • The agent must read the generated payload files, construct the requests from them, and send those requests to the MCP server in batches of at most 1000 entries each.
  • On MCP failure: If the MCP server is unreachable, returns an error, or returns no results for any batch, log a warning and continue to Phase 5. Set
    TunedEntry: {}
    for all affected entries. Do not block report generation. Notify the user clearly:
    Tuning data lookup unavailable; the report will show validation results only.
  • Suggestions are advisory only; never auto-populate them.

Step 1: Build Lookup Payload with Deduplication

Load the dataset from: ./run/data/report-data.json
  • Read the
    Entries
    array. Each entry will be used to build the MCP lookup payload.
Reduce server requests by deduplicating identical entries:
  • For each entry in
    Entries
    , compute a content hash (hash of
    CountryCode
    +
    RegionCode
    +
    City
    ).
  • Create a deduplication map:
    { contentHash -> { rowKey, payload, entryIndices: [] } }
    . rowKey is a UUID that will be sent to the MCP server for matching responses.
  • If an entry's hash already exists, append its 0-based array index in
    Entries
    to that deduplication entry's
    entryIndices
    array.
  • If the hash is new, generate a UUID (rowKey) and create a new deduplication entry whose
    entryIndices
    array starts with the current entry's index.
Build request batches:
  • Extract unique deduplicated entries from the map, keeping them in deduplication order.
  • Build request batches of up to 1000 items each.
  • For each batch, keep an in-memory structure like
    [{ rowKey, payload, entryIndices }, ...]
    to match responses back by rowKey.
  • When writing the MCP payload file, include the
    rowKey
    field with each payload object:
json
[
    {"rowKey": "550e8400-e29b-41d4-a716-446655440000", "countryCode":"CA","regionCode":"CA-ON","cityName":"Toronto"},
    {"rowKey": "6ba7b810-9dad-11d1-80b4-00c04fd430c8", "countryCode":"IN","regionCode":"IN-KA","cityName":"Bangalore"},
    {"rowKey": "6ba7b811-9dad-11d1-80b4-00c04fd430c8", "countryCode":"IN","regionCode":"IN-KA"}
]
  • When reading responses, match each response
    rowKey
    field to the corresponding deduplication entry to retrieve all associated
    entryIndices
    .
Rules:
  • Write payload to: ./run/data/mcp-server-payload.json
  • Exit the script after writing the payload.
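The deduplication and batching steps above could look roughly like this. It is a sketch, not the skill's script: the function names, the choice of SHA-256, and the separator byte are assumptions, and dropping empty fields mirrors the third row of the example payload rather than a documented rule. Writing the result to ./run/data/mcp-server-payload.json is left out to keep the example short.

```python
import hashlib
import uuid


def build_dedup_map(entries):
    """Return {contentHash: {rowKey, payload, entryIndices}} in first-seen order."""
    dedup = {}
    for index, entry in enumerate(entries):
        country = entry.get("CountryCode") or ""
        region = entry.get("RegionCode") or ""
        city = entry.get("City") or ""
        # The separator keeps ("A", "BC", "") distinct from ("AB", "C", "").
        digest = hashlib.sha256("\x1f".join((country, region, city)).encode("utf-8")).hexdigest()
        if digest not in dedup:
            fields = {"countryCode": country, "regionCode": region, "cityName": city}
            dedup[digest] = {
                "rowKey": str(uuid.uuid4()),
                "payload": {k: v for k, v in fields.items() if v},  # omit empty fields
                "entryIndices": [],
            }
        dedup[digest]["entryIndices"].append(index)
    return dedup


def to_batches(dedup, size=1000):
    """Split the unique rows into request batches of at most `size` items."""
    rows = [{"rowKey": d["rowKey"], **d["payload"]} for d in dedup.values()]
    return [rows[i:i + size] for i in range(0, len(rows), size)]
```

Python dictionaries preserve insertion order, so iterating `dedup.values()` yields the rows in deduplication order without extra bookkeeping.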

Step 2: Invoke Fastah MCP Tool

  • An example
    mcp.json
    style configuration of Fastah MCP server is as follows:
json
    "fastah-ip-geofeed": {
      "type": "http",
      "url": "https://mcp.fastah.ai/mcp"
    }
  • Server:
    https://mcp.fastah.ai/mcp
  • Tool and its Schema: before the first
    tools/call
    , the agent MUST send a
    tools/list
    request to read the input and output schema for
    rfc8805-row-place-search
    . Use the discovered schema as the authoritative source for field names, types, and constraints.
  • The following is an illustrative example only; always defer to the schema returned by
    tools/list
    :
    json
    [
        {"rowKey": "550e8400-...", "countryCode":"CA", ...},
        {"rowKey": "690e9301-...", "countryCode":"ZZ", ...}
    ]
  • Open ./run/data/mcp-server-payload.json and send all deduplicated entries with their rowKeys.
  • If more than 1000 entries remain after deduplication, split them into multiple requests of at most 1000 entries each.
  • The server will respond with the same
    rowKey
    field in each response for mapping back.
  • Do NOT use local data.
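For orientation, MCP messages are JSON-RPC 2.0 objects, so the two requests named above have roughly the shape built below. This sketch only constructs request bodies; it does not perform the session setup or transport that an MCP client handles. The helper names are hypothetical, and the argument key that carries the rows is deliberately a parameter, because it must be taken from the schema returned by tools/list rather than guessed.

```python
TOOL_NAME = "rfc8805-row-place-search"


def tools_list_request(request_id):
    """JSON-RPC 2.0 body asking the server to describe its tools."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}


def tools_call_request(request_id, batch, rows_argument):
    """JSON-RPC 2.0 body for one batch.

    `rows_argument` is the input field name discovered via tools/list;
    no default is provided on purpose.
    """
    if len(batch) > 1000:
        raise ValueError("a batch may hold at most 1000 entries")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": TOOL_NAME, "arguments": {rows_argument: batch}},
    }
```

Rejecting oversized batches locally is a defensive choice for this sketch; it surfaces a batching bug before the server does.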

Step 3: Attach Tuned Data to Entries

  • Generate a new script for attaching tuned data.
  • Load both ./run/data/report-data.json and the deduplication map (held in memory from Step 1, or re-derived from the payload file).
  • For each response from the MCP server:
    • Extract the
      rowKey
      from the response.
    • Look up the
      entryIndices
      array associated with that
      rowKey
      from the deduplication map.
    • For each index in
      entryIndices
      , attach the best match to
      Entries[index]
      .
  • Use the first (best) match from the response when available.
Create the field on each affected entry if it does not exist. Remap the MCP API response keys to Go struct field names:
json
"TunedEntry": {
  "Name": "",
  "CountryCode": "",
  "RegionCode": "",
  "PlaceType": "",
  "H3Cells": [],
  "BoundingBox": []
}
The
TunedEntry
field is a single object (not an array). It holds the best match from the MCP server.
MCP response key → JSON key mapping:
| MCP API response key | JSON key |
| --- | --- |
| `placeName` | `Name` |
| `countryCode` | `CountryCode` |
| `stateCode` | `RegionCode` |
| `placeType` | `PlaceType` |
| `h3Cells` | `H3Cells` |
| `boundingBox` | `BoundingBox` |
Entries with no UUID match (i.e. the MCP server returned no response for their UUID) must receive an empty
TunedEntry: {}
object — never leave the field absent.
  • Write the dataset back to: ./run/data/report-data.json
  • Rules:
    • Maintain all existing validation flags.
    • Do NOT create additional intermediate files.
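A possible shape for the attach step is sketched below. The key mapping comes from the table above; everything else is an assumption made for illustration, in particular the function name and the idea that each response carries its candidates in a list under a `matches` key. The real response layout must be read from the schema returned by tools/list.

```python
KEY_MAP = {
    "placeName": "Name",
    "countryCode": "CountryCode",
    "stateCode": "RegionCode",
    "placeType": "PlaceType",
    "h3Cells": "H3Cells",
    "boundingBox": "BoundingBox",
}


def attach_tuned_entries(entries, dedup_rows, responses):
    """Set TunedEntry on every entry; entries without a match keep {}."""
    indices_by_key = {row["rowKey"]: row["entryIndices"] for row in dedup_rows}
    for entry in entries:
        entry.setdefault("TunedEntry", {})  # never leave the field absent
    for response in responses:
        matches = response.get("matches") or []  # "matches" is an assumed key
        if not matches:
            continue
        best = matches[0]  # first match is treated as the best one
        tuned = {new: best[old] for old, new in KEY_MAP.items() if old in best}
        for index in indices_by_key.get(response.get("rowKey"), []):
            entries[index]["TunedEntry"] = dict(tuned)
    return entries
```

Only `TunedEntry` is written, so existing validation flags on each entry stay untouched.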

Phase 5: Generate Tuning Report

Generate a self-contained HTML report by rendering the template at
./scripts/templates/index.html
with data from
./run/data/report-data.json
and
./run/data/comments.json
.
Write the completed report to
./run/report/geofeed-report.html
. After generating, attempt to open it in the system's default browser (e.g.,
webbrowser.open()
). If running in a headless environment, CI pipeline, or remote container where no browser is available, skip the browser step and instead present the file path to the user so they can open or download it.
The template uses Go
html/template
syntax
(
{{.Field}}
,
{{range}}
,
{{if eq}}
, etc.). Write a Python script that reads the template, builds a rendering context from the JSON data files, and processes the template placeholders to produce final HTML. Do not modify the template file itself — all processing happens in the Python script at render time.

Step 1: Replace Metadata Placeholders

Replace each
{{.Metadata.X}}
placeholder in the template with the corresponding value from
report-data.json
. Because the JSON keys match the placeholder names, the mapping is direct:
{{.Metadata.InputFile}}
maps to the
InputFile
JSON key, etc.
| Template placeholder | JSON key (`report-data.json`) |
| --- | --- |
| `{{.Metadata.InputFile}}` | `InputFile` |
| `{{.Metadata.Timestamp}}` | `Timestamp` |
| `{{.Metadata.TotalEntries}}` | `TotalEntries` |
| `{{.Metadata.IpV4Entries}}` | `IpV4Entries` |
| `{{.Metadata.IpV6Entries}}` | `IpV6Entries` |
| `{{.Metadata.InvalidEntries}}` | `InvalidEntries` |
| `{{.Metadata.Errors}}` | `Errors` |
| `{{.Metadata.Warnings}}` | `Warnings` |
| `{{.Metadata.Suggestions}}` | `Suggestions` |
| `{{.Metadata.OK}}` | `OK` |
| `{{.Metadata.CityLevelAccuracy}}` | `CityLevelAccuracy` |
| `{{.Metadata.RegionLevelAccuracy}}` | `RegionLevelAccuracy` |
| `{{.Metadata.CountryLevelAccuracy}}` | `CountryLevelAccuracy` |
| `{{.Metadata.DoNotGeolocate}}` | `DoNotGeolocate` (metadata) |
Note on
{{.Metadata.Timestamp}}
:
This placeholder appears inside a JavaScript
new Date(...)
call. Replace it with the raw integer value (no HTML escaping needed for a numeric literal inside
<script>
). All other metadata values should be HTML-escaped since they appear inside HTML element text.
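The metadata substitution can be sketched as a single loop. This is illustrative only: `render_metadata` is a hypothetical name, and the sketch assumes the metadata values have already been read from `report-data.json` into a flat dict keyed by the names in the mapping above.

```python
import html


def render_metadata(template, metadata):
    """Replace every {{.Metadata.X}} placeholder with the matching value."""
    for key, value in metadata.items():
        placeholder = "{{.Metadata." + key + "}}"
        if key == "Timestamp":
            # Numeric literal inside <script>: emit the raw integer.
            rendered = str(int(value))
        else:
            # Everything else lands in HTML element text.
            rendered = html.escape(str(value))
        template = template.replace(placeholder, rendered)
    return template
```

Coercing the timestamp through `int()` also guards against a non-numeric value being written into the script block.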

Step 2: Replace the Comment Map Placeholder

Locate this pattern in the template:
javascript
const commentMap = {{.Comments}};
Replace
{{.Comments}}
with the serialized JSON object from
./run/data/comments.json
. The JSON is embedded directly as a JavaScript object literal (not inside a string), so no extra escaping is needed:
python
comments_json = json.dumps(comments)
template = template.replace("{{.Comments}}", comments_json)

Step 3: Expand the Entries Range Block

The template contains a
{{range .Entries}}...{{end}}
block inside
<tbody id="entriesTableBody">
. Process it as follows:
  1. Extract the range block body using regex. Critical: The block contains nested
    {{end}}
    tags (from
    {{if eq .Status ...}}
    ,
    {{if .Checked}}
    , and
    {{range .Messages}}
    ). A naive non-greedy match like
    \{\{range \.Entries\}\}(.*?)\{\{end\}\}
    will match the first inner
    {{end}}
    , truncating the block. Instead, anchor the outer
    {{end}}
    to the
    </tbody>
    that follows it:
    python
    m = re.search(
        r'\{\{range \.Entries\}\}(.*?)\{\{end\}\}\s*</tbody>',
        template,
        re.DOTALL,
    )
    entry_body = m.group(1)  # template text for one entry iteration
    This ensures you capture the full block body including all three
    <tr>
    rows and the nested
    {{range .Messages}}...{{end}}
    .
  2. Iterate over each entry in
    report-data.json
    's
    Entries
    array.
  3. Expand the block body for each entry using the processing order below.
  4. Replace the entire match (from
    {{range .Entries}}
    through
    </tbody>
    ) with the concatenated expanded HTML followed by
    </tbody>
    .
Processing order for each entry (innermost constructs first to avoid
{{end}}
confusion):
  1. Evaluate
    {{if eq .Status ...}}...{{end}}
    conditionals (status badge class and icon).
  2. Evaluate
    {{if .Checked}}...{{end}}
    conditional (message checkbox).
  3. Expand
    {{range .Messages}}...{{end}}
    inner range.
  4. Replace simple
    {{.Field}}
    placeholders.
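To see why the anchored pattern matters, the sketch below runs both patterns against a toy template and then splices expanded rows back in, as described in step 4 of the list above. The toy markup and the helper name `splice_rows` are assumptions; the real template has three rows per entry and more nested constructs.

```python
import re

TOY_TEMPLATE = (
    '<tbody id="entriesTableBody">'
    '{{range .Entries}}<tr><td>{{.Line}}</td>'
    '<td>{{if .Checked}}x{{end}}</td></tr>{{end}}\n'
    '</tbody>'
)

NAIVE = re.compile(r'\{\{range \.Entries\}\}(.*?)\{\{end\}\}', re.DOTALL)
ANCHORED = re.compile(r'\{\{range \.Entries\}\}(.*?)\{\{end\}\}\s*</tbody>', re.DOTALL)

# The naive pattern stops at the inner {{end}} and loses the rest of the row.
naive_body = NAIVE.search(TOY_TEMPLATE).group(1)

# The anchored pattern only accepts an {{end}} that is followed by </tbody>.
match = ANCHORED.search(TOY_TEMPLATE)
entry_body = match.group(1)


def splice_rows(template, match, rows_html):
    """Replace the whole range block (through </tbody>) with the expanded rows."""
    return template[:match.start()] + rows_html + "</tbody>" + template[match.end():]
```

Slicing by `match.start()` and `match.end()` avoids a second search and cannot accidentally replace a similar-looking fragment elsewhere in the template.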
Entry Field Mapping
Within the range block body, replace these placeholders for each entry. Because the JSON keys match the placeholder names, each placeholder
{{.X}}
maps directly to JSON key
X
:
| Template placeholder | JSON key (`Entries[]`) | Notes |
| --- | --- | --- |
| `{{.Line}}` | `Line` | Direct integer value |
| `{{.IPPrefix}}` | `IPPrefix` | HTML-escaped |
| `{{.CountryCode}}` | `CountryCode` | HTML-escaped |
| `{{.RegionCode}}` | `RegionCode` | HTML-escaped |
| `{{.City}}` | `City` | HTML-escaped |
| `{{.Status}}` | `Status` | HTML-escaped |
| `{{.HasError}}` | `HasError` | Lowercase string: `"true"` or `"false"` |
| `{{.HasWarning}}` | `HasWarning` | Lowercase string: `"true"` or `"false"` |
| `{{.HasSuggestion}}` | `HasSuggestion` | Lowercase string: `"true"` or `"false"` |
| `{{.GeocodingHint}}` | `GeocodingHint` | Empty string `""` |
| `{{.DoNotGeolocate}}` | `DoNotGeolocate` | `"true"` or `"false"` |
| `{{.Tunable}}` | `Tunable` | `"true"` or `"false"` |
| `{{.TunedEntry.CountryCode}}` | `TunedEntry.CountryCode` | `""` if `TunedEntry` is empty `{}` |
| `{{.TunedEntry.RegionCode}}` | `TunedEntry.RegionCode` | `""` if `TunedEntry` is empty `{}` |
| `{{.TunedEntry.Name}}` | `TunedEntry.Name` | `""` if `TunedEntry` is empty `{}` |
| `{{.TunedEntry.H3Cells}}` | `TunedEntry.H3Cells` | Bracket-wrapped space-separated; `"[]"` if empty (see format below) |
| `{{.TunedEntry.BoundingBox}}` | `TunedEntry.BoundingBox` | Bracket-wrapped space-separated; `"[]"` if empty (see format below) |
data-h3-cells
and
data-bounding-box
format:
These are NOT JSON arrays. They are bracket-wrapped, space-separated values. Do not use JSON serialization (no quotes around string elements, no commas between numbers). Examples:
  • [836752fffffffff 836755fffffffff]
    — correct
  • ["836752fffffffff","836755fffffffff"]
    WRONG, quotes will break parsing
  • [-71.70 10.73 -71.52 10.55]
    — correct
  • []
    — correct for empty
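The formatting rules in the mapping above can be collected into a few helpers. This is a sketch with hypothetical function names; HTML-escaping the `TunedEntry` text fields is an assumption made here for consistency with the other text fields.

```python
import html


def format_bracket_list(values):
    """Bracket-wrapped, space-separated rendering; deliberately not JSON."""
    return "[" + " ".join(str(v) for v in (values or [])) + "]"


def format_bool(value):
    """Lowercase "true"/"false" (Python's str(True) would give "True")."""
    return "true" if value else "false"


def entry_replacements(entry):
    """Map each simple placeholder to its rendered text for one entry."""
    tuned = entry.get("TunedEntry") or {}
    replacements = {"{{.Line}}": str(entry["Line"]), "{{.GeocodingHint}}": ""}
    for key in ("IPPrefix", "CountryCode", "RegionCode", "City", "Status"):
        replacements["{{." + key + "}}"] = html.escape(str(entry.get(key, "")))
    for key in ("HasError", "HasWarning", "HasSuggestion", "DoNotGeolocate", "Tunable"):
        replacements["{{." + key + "}}"] = format_bool(entry.get(key))
    for key in ("CountryCode", "RegionCode", "Name"):
        replacements["{{.TunedEntry." + key + "}}"] = html.escape(str(tuned.get(key, "")))
    for key in ("H3Cells", "BoundingBox"):
        replacements["{{.TunedEntry." + key + "}}"] = format_bracket_list(tuned.get(key))
    return replacements
```

Applying the result is then a loop of `body.replace(placeholder, text)` calls, run only after the conditionals and the inner range have been resolved.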
Evaluating Status Conditionals
Process these BEFORE replacing simple
{{.Field}}
placeholders
— otherwise the
{{end}}
markers get consumed and the regex won't match.
The template uses
{{if eq .Status "..."}}
conditionals for the status badge CSS class and icon. Evaluate these by checking the entry's
Status
value and keeping only the matching branch text.
The status badge line contains two
{{if eq .Status ...}}...{{end}}
blocks on a single line — one for the CSS class, one for the icon. Use
re.sub
with a callback to resolve all occurrences:
python
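import re

STATUS_CSS = {"ERROR": "error", "WARNING": "warning", "SUGGESTION": "suggestion", "OK": "ok"}
STATUS_ICON = {
    "ERROR": "bi-x-circle-fill",
    "WARNING": "bi-exclamation-triangle-fill",
    "SUGGESTION": "bi-lightbulb-fill",
    "OK": "bi-check-circle-fill",
}

# One {{if eq .Status ...}}...{{end}} block; these blocks contain no nested {{end}}.
STATUS_IF_RE = re.compile(r'\{\{if eq \.Status "[A-Z]+"\}\}.*?\{\{end\}\}')
BRANCH_RE = re.compile(r'\{\{(?:else )?if eq \.Status "([A-Z]+)"\}\}([^{]*)')
ELSE_RE = re.compile(r'\{\{else\}\}([^{]*)\{\{end\}\}')

def resolve_status_if(match_obj, status):
    """Pick the branch matching `status` from a {{if eq .Status ...}}...{{end}} block."""
    block = match_obj.group(0)
    # Branches: {{if eq .Status "X"}}val{{else if eq .Status "Y"}}val{{else}}val{{end}}
    for branch_status, value in BRANCH_RE.findall(block):
        if branch_status == status:
            return value
    fallback = ELSE_RE.search(block)  # the {{else}} branch covers every other status
    return fallback.group(1) if fallback else ""

# Resolve both blocks on the badge line in one pass:
body = STATUS_IF_RE.sub(lambda m: resolve_status_if(m, status), body)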
A simpler approach: since there are exactly two known patterns, replace them as literal strings:
python
css_class = STATUS_CSS.get(status, "ok")
icon_class = STATUS_ICON.get(status, "bi-check-circle-fill")
body = body.replace(
    '{{if eq .Status "ERROR"}}error{{else if eq .Status "WARNING"}}warning{{else if eq .Status "SUGGESTION"}}suggestion{{else}}ok{{end}}',
    css_class,
)
body = body.replace(
    '{{if eq .Status "ERROR"}}bi-x-circle-fill{{else if eq .Status "WARNING"}}bi-exclamation-triangle-fill{{else if eq .Status "SUGGESTION"}}bi-lightbulb-fill{{else}}bi-check-circle-fill{{end}}',
    icon_class,
)
This avoids regex entirely and is safe because these exact strings appear verbatim in the template.

Step 4: Expand the Nested Messages Range

The
{{range .Messages}}...{{end}}
block contains a nested
{{if .Checked}} checked{{else}} disabled{{end}}
conditional, so its inner
{{end}}
would cause a simple non-greedy regex to match too early. Anchor the regex to
</td>
(the tag immediately after the messages range closing
{{end}}
) to capture the full block body:
python
msg_match = re.search(
    r'\{\{range \.Messages\}\}(.*?)\{\{end\}\}\s*(?=</td>)',
    body, re.DOTALL
)
The lookahead
(?=</td>)
ensures the regex skips past the checkbox conditional's
{{end}}
(which is followed by
>
, not
</td>
) and matches only the range-closing
{{end}}
(which is followed by whitespace then
</td>
).
For each message in the entry's
Messages
array, clone the captured block body and expand it:
  1. Resolve the checkbox conditional per message (must happen before simple placeholder replacement to remove the nested
    {{end}}
    ):
    python
    if msg.get("Checked"):
        msg_body = msg_body.replace(
            '{{if .Checked}} checked{{else}} disabled{{end}}', ' checked'
        )
    else:
        msg_body = msg_body.replace(
            '{{if .Checked}} checked{{else}} disabled{{end}}', ' disabled'
        )
  2. Replace message field placeholders:
    | Template placeholder | Source | Notes |
    | --- | --- | --- |
    | `{{.ID}}` | `Messages[i].ID` | Direct string value from JSON |
    | `{{.Text}}` | `Messages[i].Text` | HTML-escaped |
  3. Concatenate all expanded message blocks and replace the original
    {{range .Messages}}...{{end}}
    match (
    msg_match.group(0)
    ) with the result:
    python
    body = body[:msg_match.start()] + "".join(expanded_msgs) + body[msg_match.end():]
If
Messages
is empty, replace the entire matched region with an empty string (no message divs — only the issues header remains).
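Putting the three steps together, the message expansion could be wrapped in one function along these lines. The function name `expand_messages` and the module-level constants are hypothetical; the regex and the checkbox literal are the ones given above.

```python
import html
import re

MSG_RANGE = re.compile(r'\{\{range \.Messages\}\}(.*?)\{\{end\}\}\s*(?=</td>)', re.DOTALL)
CHECKBOX_IF = '{{if .Checked}} checked{{else}} disabled{{end}}'


def expand_messages(body, messages):
    """Expand the {{range .Messages}} block of one entry body."""
    msg_match = MSG_RANGE.search(body)
    if msg_match is None:
        return body  # nothing to expand
    expanded_msgs = []
    for msg in messages:
        msg_body = msg_match.group(1)
        # Resolve the conditional first so its {{end}} is gone.
        attr = ' checked' if msg.get("Checked") else ' disabled'
        msg_body = msg_body.replace(CHECKBOX_IF, attr)
        msg_body = msg_body.replace("{{.ID}}", str(msg["ID"]))
        msg_body = msg_body.replace("{{.Text}}", html.escape(str(msg["Text"])))
        expanded_msgs.append(msg_body)
    # An empty list joins to "", which removes the block entirely.
    return body[:msg_match.start()] + "".join(expanded_msgs) + body[msg_match.end():]
```

Because an empty `messages` list produces an empty string, the empty case needs no separate branch.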

Output Guarantees

  • The report must be readable in any modern browser without extra network dependencies beyond the CDN links already in the template (leaflet, h3-js, bootstrap-icons, Raleway font).
  • All values embedded in HTML must be HTML-escaped (<, >, &, ") to prevent rendering issues.
  • commentMap is embedded as a direct JavaScript object literal (not inside a string), so no JS string escaping is needed; just emit valid JSON.
  • All values must be derived only from analysis output, not recomputed heuristically.
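The two escaping rules above can be illustrated with the standard library. This is a sketch under assumptions: escape_cell and comment_map_literal are hypothetical helper names, and the shape of commentMap (a dict of strings here) is not specified in this section.

```python
import html
import json


def escape_cell(value):
    """HTML-escape a value before embedding it in the report markup."""
    # quote=True also escapes double and single quotes, covering <, >, & and ".
    return html.escape(str(value), quote=True)


def comment_map_literal(comment_map):
    """Serialize commentMap as JSON, which is also a valid JS object literal."""
    # No extra JS string escaping: the output is not placed inside a string.
    return json.dumps(comment_map, ensure_ascii=False)
```

For example, escape_cell('R&D') yields 'R&amp;D', while comment_map_literal leaves the same characters untouched because they are legal inside JSON strings.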

Phase 6: Final Review

Perform a final verification pass using concrete, checkable assertions before presenting results to the user.
Check 1 — Entry count integrity
  • Count non-comment, non-blank data rows in the original input CSV.
  • Assert: len(entries) in report-data.json == data_row_count
  • On failure, report: Row count mismatch: input has {N} data rows but report contains {M} entries.
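A possible way to run Check 1 is sketched below. It assumes comment lines start with # (as in RFC 8805) and that the caller has already loaded the CSV text and the entries list; count_data_rows and check_entry_count are illustrative names, not part of the skill.

```python
def count_data_rows(csv_text):
    """Count rows that are neither blank nor comments (lines starting with '#')."""
    count = 0
    for line in csv_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


def check_entry_count(csv_text, entries):
    """Return the failure message for Check 1, or None when the counts agree."""
    n = count_data_rows(csv_text)
    m = len(entries)
    if n != m:
        return (f"Row count mismatch: input has {n} data rows "
                f"but report contains {m} entries.")
    return None
```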
Check 2 — Summary counter integrity
  • These counters are mutually exclusive: each entry is counted once, under its highest-severity boolean flag. An entry with both HasError: true and HasWarning: true is counted only in Errors, never in Warnings. This is equivalent to counting by the entry's Status field.
  • Assert all of the following; correct any that fail before generating the report:
    • Errors == sum(1 for e in Entries if e['HasError'])
    • Warnings == sum(1 for e in Entries if e['HasWarning'] and not e['HasError'])
    • Suggestions == sum(1 for e in Entries if e['HasSuggestion'] and not e['HasError'] and not e['HasWarning'])
    • OK == sum(1 for e in Entries if not e['HasError'] and not e['HasWarning'] and not e['HasSuggestion'])
    • Errors + Warnings + Suggestions + OK == TotalEntries - InvalidEntries
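The assertions above can be collected into one pass over the entries. This is a minimal sketch: where the counters live inside report-data.json is not shown in this section, so the sketch takes them as a plain dict named summary, and classify and summary_mismatches are hypothetical helpers.

```python
def classify(entry):
    """Return the single summary counter an entry belongs to (highest severity wins)."""
    if entry["HasError"]:
        return "Errors"
    if entry["HasWarning"]:
        return "Warnings"
    if entry["HasSuggestion"]:
        return "Suggestions"
    return "OK"


def summary_mismatches(summary, entries):
    """Compare stored counters with counts recomputed from the entry flags."""
    expected = {"Errors": 0, "Warnings": 0, "Suggestions": 0, "OK": 0}
    for entry in entries:
        expected[classify(entry)] += 1
    problems = [
        f"{name}: summary has {summary[name]}, entries give {count}"
        for name, count in expected.items()
        if summary[name] != count
    ]
    # Final assertion: the four counters must add up to the valid entries.
    valid = summary["TotalEntries"] - summary["InvalidEntries"]
    total = sum(summary[name] for name in expected)
    if total != valid:
        problems.append(
            f"counter total {total} != TotalEntries - InvalidEntries ({valid})"
        )
    return problems
```

An empty list means every assertion holds; otherwise each string names one counter to correct before generating the report.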
Check 3 — Accuracy bucket integrity
  • Assert: CityLevelAccuracy + RegionLevelAccuracy + CountryLevelAccuracy + DoNotGeolocate == TotalEntries - InvalidEntries
  • Note: The accuracy buckets defined in Phase 3 say "Do not count entries with HasError: true", but the Check 3 formula above uses TotalEntries - InvalidEntries, which still includes ERROR entries. ERROR entries (those that parsed as valid IPs but failed validation) are therefore counted in accuracy buckets by their geo-field presence; only InvalidEntries (unparsable IP prefixes) are excluded. Follow the Check 3 formula as the authoritative rule.
  • On failure, trace and fix the bucketing logic before proceeding.
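The bucket assertion reduces to a single sum. The sketch below assumes the same plain summary dict as an input (its location in report-data.json is not shown here); check_accuracy_buckets is an illustrative name.

```python
ACCURACY_BUCKETS = (
    "CityLevelAccuracy",
    "RegionLevelAccuracy",
    "CountryLevelAccuracy",
    "DoNotGeolocate",
)


def check_accuracy_buckets(summary):
    """Return (ok, bucket_total, expected) for the Check 3 formula."""
    bucket_total = sum(summary[name] for name in ACCURACY_BUCKETS)
    # ERROR entries stay in; only unparsable prefixes are subtracted.
    expected = summary["TotalEntries"] - summary["InvalidEntries"]
    return bucket_total == expected, bucket_total, expected
```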
Check 4 — No duplicate line numbers
  • Assert: all Line values in Entries are unique.
  • On failure, report the duplicated line numbers to the user.
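One way to find the line numbers to report is to count occurrences. This is a sketch; duplicate_lines is a hypothetical helper and assumes every entry carries a Line key.

```python
from collections import Counter


def duplicate_lines(entries):
    """Return the sorted Line values that occur more than once in Entries."""
    counts = Counter(entry["Line"] for entry in entries)
    return sorted(line for line, n in counts.items() if n > 1)
```

An empty result means the assertion holds.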
Check 5 — TunedEntry completeness
  • Assert: every object in Entries has a TunedEntry key (even if its value is {}).
  • On failure, add "TunedEntry": {} to any entry missing the key, then re-save report-data.json.
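The repair step might look like the following sketch. Both function names are assumptions, and the path to report-data.json is left as a parameter because its full location is not given in this section.

```python
import json


def ensure_tuned_entry(entries):
    """Add an empty TunedEntry to entries that lack the key; return how many were patched."""
    patched = 0
    for entry in entries:
        if "TunedEntry" not in entry:
            entry["TunedEntry"] = {}
            patched += 1
    return patched


def save_report_data(report, path):
    """Re-save the report data as JSON (path supplied by the caller)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(report, fh, ensure_ascii=False, indent=2)
```

Calling save_report_data only when ensure_tuned_entry returns a non-zero count avoids rewriting a file that was already complete.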
Check 6 — Report file is present and non-empty
  • Confirm ./run/report/geofeed-report.html was written and has a file size greater than zero bytes.
  • On failure, regenerate the report before presenting to the user.
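The presence check can be done with pathlib. This is a sketch; report_is_present is an illustrative name, and the default path is the one given in Check 6.

```python
from pathlib import Path


def report_is_present(path="./run/report/geofeed-report.html"):
    """True when the report exists as a regular file larger than zero bytes."""
    report = Path(path)
    return report.is_file() and report.stat().st_size > 0
```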