geofeed-tuner
Geofeed Tuner – Create Better IP Geolocation Feeds
This skill helps you create and improve IP geolocation feeds in CSV format by:
- Ensuring your CSV is well-formed and consistent
- Checking alignment with RFC 8805 (the industry standard)
- Applying opinionated best practices learned from real-world deployments
- Suggesting improvements for accuracy, completeness, and privacy
When to Use This Skill
- Use this skill when a user asks for help creating, improving, or publishing an IP geolocation feed file in CSV format.
- Use it to tune and troubleshoot CSV geolocation feeds — catching errors, suggesting improvements, and ensuring real-world usability beyond RFC compliance.
- Intended audience:
  - Network operators, administrators, and engineers responsible for publicly routable IP address space
  - Organizations such as ISPs, mobile carriers, cloud providers, hosting and colocation companies, Internet Exchange operators, and satellite internet providers
- Do not use this skill for private or internal IP address management; it applies only to publicly routable IP addresses.
Prerequisites
- Python 3 is required.
Directory Structure and File Management
This skill uses a clear separation between distribution files (read-only) and working files (generated at runtime).
Read-Only Directories (Do Not Modify)
The following directories contain static distribution assets. Do not create, modify, or delete files in these directories:
| Directory | Purpose |
|---|---|
| `assets/` | Static data files (ISO codes, examples) |
| `references/` | RFC specifications and code snippets for reference |
| `scripts/` | Executable code and HTML template files for reports |
Working Directories (Generated Content)
All generated, temporary, and output files go in these directories:
| Directory | Purpose |
|---|---|
| `run/` | Working directory for all agent-generated content |
| `run/data/` | Downloaded CSV files from remote URLs |
| `run/report/` | Generated HTML tuning reports |
File Management Rules
- Never write to `assets/`, `references/`, or `scripts/`. These are part of the skill distribution and must remain unchanged.
- All downloaded input files (from remote URLs) must be saved to `./run/data/`.
- All generated HTML reports must be saved to `./run/report/`.
- All generated Python scripts must be saved to `./run/`.
- The `run/` directory may be cleared between sessions; do not store permanent data there.
- Working directory for execution: All generated scripts in `./run/` must be executed with the skill root directory (the directory containing `SKILL.md`) as the current working directory, so that relative paths like `assets/iso3166-1.json` and `./run/data/report-data.json` resolve correctly. Do not `cd` into `./run/` before running scripts.
Processing Pipeline: Sequential Phase Execution
All phases must be executed in order, from Phase 1 through Phase 6. Each phase depends on the successful completion of the previous phase. For example, structure checks must complete before quality analysis can run.
The phases are summarized below. The agent must follow the detailed steps outlined further in each phase section.
| Phase | Name | Description |
|---|---|---|
| 1 | Understand the Standard | Review the key requirements of RFC 8805 for self-published IP geolocation feeds |
| 2 | Gather Input | Collect IP subnet data from local files or remote URLs |
| 3 | Checks & Suggestions | Validate CSV structure, analyze IP prefixes, and check data quality |
| 4 | Tuning Data Lookup | Use Fastah's MCP tool to retrieve tuning data for improving geolocation accuracy |
| 5 | Generate Tuning Report | Create an HTML report summarizing the analysis and suggestions |
| 6 | Final Review | Verify consistency and completeness of the report data |
Do not skip phases. Each phase provides critical checks or data transformations required by subsequent stages.
Execution Plan Rules
Before executing each phase, the agent MUST generate a visible TODO checklist.
The plan MUST:
- Appear at the very start of the phase
- List every step in order
- Use a checkbox format
- Be updated live as steps complete
Phase 1: Understand the Standard
The key requirements from RFC 8805 that this skill enforces are summarized below. Use this summary as your working reference. Only consult the full RFC 8805 text for edge cases, ambiguous situations, or when the user asks a standards question not covered here.
RFC 8805 Key Facts
Purpose: A self-published IP geolocation feed lets network operators publish authoritative location data for their IP address space in a simple CSV format, allowing geolocation providers to incorporate operator-supplied corrections.
CSV Column Order (Sections 2.1.1.1–2.1.1.5):
| Column | Field | Required | Notes |
|---|---|---|---|
| 1 | `ip_prefix` | Yes | CIDR notation; IPv4 or IPv6; must be a network address |
| 2 | `alpha2code` | No | ISO 3166-1 alpha-2 country code; empty or "ZZ" = do-not-geolocate |
| 3 | `region` | No | ISO 3166-2 subdivision code (e.g., `US-CA`) |
| 4 | `city` | No | Free-text city name; no authoritative validation set |
| 5 | `postal_code` | No | Deprecated; must be left empty or absent |
Structural rules:
- Files may contain comment lines beginning with `#` (including the header, if present).
- A header row is optional; if present, it is treated as a comment if it starts with `#`.
- Files must be encoded in UTF-8.
- Subnet host bits must not be set (e.g., `192.168.1.1/24` is invalid; use `192.168.1.0/24`).
- Applies only to globally routable unicast addresses, not private, loopback, link-local, or multicast space.

Do-not-geolocate: An entry with an empty `alpha2code` or a case-insensitive `ZZ` (irrespective of the values of region/city) is an explicit signal that the operator does not want geolocation applied to that prefix.

Postal codes deprecated (Section 2.1.1.5): The fifth column must not contain postal or ZIP codes. They are too fine-grained for IP-range mapping and raise privacy concerns.
Phase 2: Gather Input
- If the user has not already provided a list of IP subnets or ranges (sometimes referred to as `inetnum` or `inet6num`), prompt them to supply it. Accepted input formats:
  - Text pasted into the chat
  - A local CSV file
  - A remote URL pointing to a CSV file
- If the input is a remote URL:
  - Attempt to download the CSV file to `./run/data/` before processing.
  - On HTTP error (4xx, 5xx, timeout, or redirect loop), stop immediately and report to the user: `Feed URL is not reachable: HTTP {status_code}. Please verify the URL is publicly accessible.`
  - Do not proceed to Phase 3 with an incomplete or empty download.
- If the input is a local file, process it directly without downloading.
- Encoding detection and normalization:
  - Attempt to read the file as UTF-8 first.
  - If a `UnicodeDecodeError` is raised, try `utf-8-sig` (UTF-8 with BOM), then `latin-1`.
  - Once successfully decoded, re-encode and write the working copy as UTF-8.
  - If no encoding succeeds, stop and report: `Unable to decode input file. Please save it as UTF-8 and try again.`
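The encoding fallback described in this phase could be sketched roughly as follows. The function name `decode_feed_bytes` is hypothetical, and stripping a leading BOM from the decoded text is an assumption made so the working copy starts with real data. Note that `latin-1` accepts every byte sequence, so the final error branch is mostly a defensive guard.

```python
def decode_feed_bytes(raw: bytes):
    """Return (text, encoding_used), trying encodings in the documented order."""
    for encoding in ("utf-8", "utf-8-sig", "latin-1"):
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
        # Assumption: drop a leading BOM so the first data row is not polluted.
        return text.lstrip("\ufeff"), encoding
    raise ValueError(
        "Unable to decode input file. Please save it as UTF-8 and try again."
    )
```

The returned text can then be written back with `encoding="utf-8"` to produce the normalized working copy.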
Phase 3: Checks & Suggestions
Execution Rules
- Generate a script for this phase.
- Do NOT combine this phase with others.
- Do NOT precompute future-phase data.
- Store the output as a JSON file at:
./run/data/report-data.json
Schema Definition
The JSON structure below is IMMUTABLE during Phase 3. Phase 4 will later add a `TunedEntry` object to each object in `Entries`; this is the only permitted schema extension and happens in a separate phase.

JSON keys map directly to template placeholders like `{{.CountryCode}}`, `{{.HasError}}`, etc.

```json
{
  "InputFile": "",
  "Timestamp": 0,
  "TotalEntries": 0,
  "IpV4Entries": 0,
  "IpV6Entries": 0,
  "InvalidEntries": 0,
  "Errors": 0,
  "Warnings": 0,
  "OK": 0,
  "Suggestions": 0,
  "CityLevelAccuracy": 0,
  "RegionLevelAccuracy": 0,
  "CountryLevelAccuracy": 0,
  "DoNotGeolocate": 0,
  "Entries": [
    {
      "Line": 0,
      "IPPrefix": "",
      "CountryCode": "",
      "RegionCode": "",
      "City": "",
      "Status": "",
      "IPVersion": "",
      "Messages": [
        {
          "ID": "",
          "Type": "",
          "Text": "",
          "Checked": false
        }
      ],
      "HasError": false,
      "HasWarning": false,
      "HasSuggestion": false,
      "DoNotGeolocate": false,
      "GeocodingHint": "",
      "Tunable": false
    }
  ]
}
```

Field definitions:
Top-level metadata:
- `InputFile`: The original input source, either a local filename or a remote URL.
- `Timestamp`: Milliseconds since Unix epoch when the tuning was performed.
- `TotalEntries`: Total number of data rows processed (excluding comment and blank lines).
- `IpV4Entries`: Count of entries that are IPv4 subnets.
- `IpV6Entries`: Count of entries that are IPv6 subnets.
- `InvalidEntries`: Count of entries that failed IP prefix parsing or CSV parsing.
- `Errors`: Total entries whose `Status` is `ERROR`.
- `Warnings`: Total entries whose `Status` is `WARNING`.
- `OK`: Total entries whose `Status` is `OK`.
- `Suggestions`: Total entries whose `Status` is `SUGGESTION`.
- `CityLevelAccuracy`: Count of valid entries where `City` is non-empty.
- `RegionLevelAccuracy`: Count of valid entries where `RegionCode` is non-empty and `City` is empty.
- `CountryLevelAccuracy`: Count of valid entries where `CountryCode` is non-empty, `RegionCode` is empty, and `City` is empty.
- `DoNotGeolocate` (metadata): Count of valid entries where `CountryCode`, `RegionCode`, and `City` are all empty.

Entry fields:
- `Entries`: Array of objects, one per data row, with the following per-entry fields:
  - `Line`: 1-based line number in the original CSV (counting all lines including comments and blanks).
  - `IPPrefix`: The normalized IP prefix in CIDR slash notation.
  - `CountryCode`: The ISO 3166-1 alpha-2 country code, or empty string.
  - `RegionCode`: The ISO 3166-2 region code (e.g., `US-CA`), or empty string.
  - `City`: The city name, or empty string.
  - `Status`: Highest severity assigned: `ERROR` > `WARNING` > `SUGGESTION` > `OK`.
  - `IPVersion`: `"IPv4"` or `"IPv6"` based on the parsed IP prefix.
  - `Messages`: Array of message objects, each with:
    - `ID`: String identifier from the Validation Rules Reference table below (e.g., `"1101"`, `"3301"`).
    - `Type`: The severity type: `"ERROR"`, `"WARNING"`, or `"SUGGESTION"`.
    - `Text`: The human-readable validation message string.
    - `Checked`: `true` if the validation rule is auto-tunable (`Tunable: true` in the reference table), `false` otherwise. Controls whether the checkbox in the report is `checked` or `disabled`.
  - `HasError`: `true` if any message has `Type` `"ERROR"`.
  - `HasWarning`: `true` if any message has `Type` `"WARNING"`.
  - `HasSuggestion`: `true` if any message has `Type` `"SUGGESTION"`.
  - `DoNotGeolocate` (entry): `true` if `CountryCode` is empty or `"ZZ"`; the entry is an explicit do-not-geolocate signal.
  - `GeocodingHint`: Always empty string `""` in Phase 3. Reserved for future use.
  - `Tunable`: `true` if any message in the entry has `Checked: true`. Computed as logical OR across all messages' `Checked` values. This flag drives the "Tune" button visibility in the report.
Validation Rules Reference
When adding messages to an entry, use the `ID`, `Type`, `Text`, and `Checked` values from this table.

| ID | Type | Text | Checked | Condition Reference |
|---|---|---|---|---|
| `1101` | `ERROR` | IP prefix is empty | | IP Prefix Analysis: empty |
| `1102` | `ERROR` | Invalid IP prefix: unable to parse as IPv4 or IPv6 network | | IP Prefix Analysis: invalid syntax |
| `1103` | `ERROR` | Non-public IP range is not allowed in an RFC 8805 feed | | IP Prefix Analysis: non-public |
| `3101` | `SUGGESTION` | IPv4 prefix is unusually large and may indicate a typo | | IP Prefix Analysis: IPv4 < /22 |
| `3102` | `SUGGESTION` | IPv6 prefix is unusually large and may indicate a typo | | IP Prefix Analysis: IPv6 < /64 |
| `1201` | `ERROR` | Invalid country code: not a valid ISO 3166-1 alpha-2 value | | Country Code Analysis: invalid |
| `1301` | `ERROR` | Invalid region format; expected COUNTRY-SUBDIVISION (e.g., US-CA) | | Region Code Analysis: bad format |
| `1302` | `ERROR` | Invalid region code: not a valid ISO 3166-2 subdivision | | Region Code Analysis: unknown code |
| `1303` | `ERROR` | Region code does not match the specified country code | | Region Code Analysis: mismatch |
| `1401` | `ERROR` | Invalid city name: placeholder value is not allowed | | City Name Analysis: placeholder |
| `1402` | `ERROR` | Invalid city name: abbreviated or code-based value detected | | City Name Analysis: abbreviation |
| `2401` | `WARNING` | City name formatting is inconsistent; consider normalizing the value | | City Name Analysis: formatting |
| `1501` | `ERROR` | Postal codes are deprecated by RFC 8805 and must be removed for privacy reasons | | Postal Code Check |
| `3301` | `SUGGESTION` | Region is usually unnecessary for small territories; consider removing the region value | | Tuning: small territory region |
| `3402` | `SUGGESTION` | City-level granularity is usually unnecessary for small territories; consider removing the city value | | Tuning: small territory city |
| `3303` | `SUGGESTION` | Region code is recommended when a city is specified; choose a region from the dropdown | | Tuning: missing region with city |
| `3104` | `SUGGESTION` | Confirm whether this subnet is intentionally marked as do-not-geolocate or missing location data | | Tuning: unspecified geolocation |
Populating Messages
When a validation check matches, add a message to the entry's `Messages` array using the values from the reference table:

```python
entry["Messages"].append({
    "ID": "1201",  # From the table
    "Type": "ERROR",  # From the table
    "Text": "Invalid country code: not a valid ISO 3166-1 alpha-2 value",  # From the table
    "Checked": True  # From the table (True = tunable)
})
```

After populating all messages for an entry, derive the entry-level flags:

```python
entry["HasError"] = any(m["Type"] == "ERROR" for m in entry["Messages"])
entry["HasWarning"] = any(m["Type"] == "WARNING" for m in entry["Messages"])
entry["HasSuggestion"] = any(m["Type"] == "SUGGESTION" for m in entry["Messages"])
entry["Tunable"] = any(m["Checked"] for m in entry["Messages"])
```

Accuracy Level Counting Rules
Accuracy levels are mutually exclusive. Assign each valid (non-ERROR, non-invalid) entry to exactly one bucket based on the most granular non-empty geo field:
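| Condition | Bucket |
|---|---|
| `City` is non-empty | `CityLevelAccuracy` |
| `RegionCode` is non-empty and `City` is empty | `RegionLevelAccuracy` |
| `CountryCode` is non-empty, `RegionCode` and `City` are empty | `CountryLevelAccuracy` |
| `CountryCode`, `RegionCode`, and `City` are all empty | `DoNotGeolocate` |

Do not count entries with `HasError: true` or entries in `InvalidEntries` in any accuracy bucket.

The agent MUST NOT: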
- Rename fields
- Add or remove fields
- Change data types
- Reorder keys
- Alter nesting
- Wrap the object
- Split into multiple files
If a value is unknown, leave it empty — never invent data.
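As an illustration of the mutually exclusive accuracy buckets, the sketch below assigns each entry to at most one summary counter. It follows the field definitions literally and does not special-case a `"ZZ"` country code; the helper names `accuracy_bucket` and `count_buckets` are hypothetical.

```python
def accuracy_bucket(entry: dict):
    """Return the one summary counter this entry contributes to, or None."""
    if entry.get("HasError"):
        return None  # entries with errors are excluded from every bucket
    if entry.get("City"):
        return "CityLevelAccuracy"
    if entry.get("RegionCode"):
        return "RegionLevelAccuracy"
    if entry.get("CountryCode"):
        return "CountryLevelAccuracy"
    return "DoNotGeolocate"


def count_buckets(entries: list) -> dict:
    """Tally entries per bucket; each entry is counted at most once."""
    counts = {
        "CityLevelAccuracy": 0,
        "RegionLevelAccuracy": 0,
        "CountryLevelAccuracy": 0,
        "DoNotGeolocate": 0,
    }
    for entry in entries:
        bucket = accuracy_bucket(entry)
        if bucket is not None:
            counts[bucket] += 1
    return counts
```

Checking the most granular field first is what keeps the buckets exclusive: an entry with a city never falls through to the region or country counters.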
Structure & Format Check
This phase verifies that your feed is well-formed and parseable. Critical structural errors must be resolved before the tuner can analyze geolocation quality.
CSV Structure
This subsection defines rules for CSV-formatted input files used for IP geolocation feeds.
The goal is to ensure the file can be parsed reliably and normalized into a consistent internal representation.
- CSV Structure Checks
  - If `pandas` is available, use it for CSV parsing.
  - Otherwise, fall back to Python's built-in `csv` module.
  - Ensure the CSV contains exactly 4 or 5 logical columns.
  - Comment lines are allowed.
  - A header row may or may not be present.
  - If no header row exists, assume the implicit column order: `ip_prefix, alpha2code, region, city, postal code (deprecated)`
  - Refer to the example input file: `assets/example/01-user-input-rfc8805-feed.csv`
- CSV Cleansing and Normalization
  - Clean and normalize the CSV using Python logic equivalent to the following operations:
    - Select only the first five columns, dropping any columns beyond the fifth.
    - Write the output file with a UTF-8 BOM.
  - Comments
    - Remove comment rows where the first column begins with `#`.
    - This also removes a header row if it begins with `#`.
    - Create a map of comments using the 1-based line number as the key and the full original line as the value. Also store blank lines.
    - Store this map in a JSON file at: `./run/data/comments.json`
    - Example: `{ "4": "# It's OK for small city states to leave state ISO2 code unspecified" }`
- Notes
  - Both implementation paths (`pandas` and built-in `csv`) must write output using the `utf-8-sig` encoding to ensure a UTF-8 BOM is present.
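A minimal sketch of the built-in `csv` path, assuming the feed has already been decoded to a string. The function name `split_feed` is hypothetical, and parsing line by line is a simplification that would not handle quoted fields containing embedded newlines.

```python
import csv
import io


def split_feed(text: str):
    """Separate comment/blank lines from data rows, keeping 1-based line numbers."""
    comments = {}  # line number (string key, JSON-friendly) -> original line
    rows = []      # (line number, first five columns)
    for line_no, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            comments[str(line_no)] = line  # blank lines are stored too
            continue
        fields = next(csv.reader(io.StringIO(line)))
        # Keep only the first five columns; anything beyond is dropped.
        rows.append((line_no, [field.strip() for field in fields[:5]]))
    return comments, rows
```

The `comments` map can be dumped with `json.dump` to `./run/data/comments.json`, and the cleaned rows written with `open(path, "w", encoding="utf-8-sig", newline="")` so the output carries a UTF-8 BOM.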
IP Prefix Analysis
- Check that the `IPPrefix` field is present and non-empty for each entry.
- Check for duplicate `IPPrefix` values across entries.
  - If duplicates are found, stop the skill and report to the user with the message: `Duplicate IP prefix detected: {ip_prefix_value} appears on lines {line_numbers}`
  - If no duplicates are found, continue with the analysis.
- Checks
  - Each subnet must parse cleanly as either an IPv4 or IPv6 network using the code snippets in the `references/` folder.
  - Subnets must be normalized and displayed in CIDR slash notation.
    - Single-host IPv4 subnets must be represented as `/32`.
    - Single-host IPv6 subnets must be represented as `/128`.
- ERROR
  - Report the following conditions as ERROR:
    - Invalid subnet syntax
      - Message ID: `1102`
    - Non-public address space
      - Applies to subnets that are private, loopback, link-local, multicast, or otherwise non-public
      - In Python, detect non-public ranges using `is_private` and related address properties as shown in `./references`
      - Message ID: `1103`
- SUGGESTION
  - Report the following conditions as SUGGESTION:
    - Overly large IPv6 subnets
      - Prefixes shorter than `/64`
      - Message ID: `3102`
    - Overly large IPv4 subnets
      - Prefixes shorter than `/22`
      - Message ID: `3101`
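The prefix checks above might look roughly like this with Python's standard `ipaddress` module. This is a sketch, not the snippet shipped in `references/`: the function name `check_prefix` is hypothetical, and using ID `1101` for an empty prefix is an assumption based on the ID example given in the field definitions.

```python
import ipaddress


def check_prefix(raw: str):
    """Return (normalized_prefix, message_ids) for one ip_prefix value."""
    value = raw.strip()
    if not value:
        return "", ["1101"]  # assumed ID for an empty prefix
    try:
        # strict=True rejects prefixes with host bits set, e.g. 192.168.1.1/24
        net = ipaddress.ip_network(value, strict=True)
    except ValueError:
        return value, ["1102"]
    non_public = (
        net.is_private or net.is_loopback or net.is_link_local
        or net.is_multicast or net.is_reserved or net.is_unspecified
    )
    if non_public:
        return str(net), ["1103"]
    ids = []
    if net.version == 4 and net.prefixlen < 22:
        ids.append("3101")  # unusually large IPv4 block
    if net.version == 6 and net.prefixlen < 64:
        ids.append("3102")  # unusually large IPv6 block
    return str(net), ids
```

`str(net)` always includes the prefix length, so a bare host address such as `8.8.8.8` comes back normalized as `8.8.8.8/32`. Duplicate detection would run on these normalized strings before any per-entry messages are recorded.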
Geolocation Quality Check
Analyze the accuracy and consistency of geolocation data:
- Country codes
- Region codes
- City names
- Deprecated fields
This phase runs after structural checks pass.
Country Code Analysis
- Use the locally available `ISO3166-1` data table for checking.
  - JSON array of countries and territories with ISO codes
  - Each object includes:
    - `alpha_2`: two-letter country code
    - `name`: short country name
    - `flag`: flag emoji
  - This file represents the superset of valid `CountryCode` values for an RFC 8805 CSV.
- Check the entry's `CountryCode` (RFC 8805 Section 2.1.1.2, column `alpha2code`) against the `alpha_2` attribute.
- Sample code is available in the `references/` directory.
- If a country is found in `assets/small-territories.json`, mark the entry internally as a small territory. This flag is used in later checks and suggestions but is not stored in the output JSON (it is transient validation state).
- Note: `small-territories.json` contains some historic/disputed codes (`AN`, `CS`, `XK`) that are not present in `iso3166-1.json`. An entry using one of these as its `CountryCode` will fail the country code validation (ERROR) even though it matches as a small territory. The country code ERROR takes precedence; do not suppress it based on the small-territory flag.
- ERROR
  - Report the following conditions as ERROR:
    - Invalid country code
      - Condition: `CountryCode` is present but not found in the `alpha_2` set
      - Message ID: `1201`
- SUGGESTION
  - Report the following conditions as SUGGESTION:
    - Unspecified geolocation for subnet
      - Condition: All geographical fields (`CountryCode`, `RegionCode`, `City`) are empty for a subnet.
      - Action:
        - Set `DoNotGeolocate = true` for the entry.
        - Set `CountryCode` to `ZZ` for the entry.
      - Message ID: `3104`
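One way to express the country code rules is sketched below. The function name `check_country` is hypothetical, and the two sets stand in for the `alpha_2` values loaded from `assets/iso3166-1.json` and the codes in `assets/small-territories.json`. Upper-casing before comparison and skipping the lookup for `ZZ` are assumptions, not documented behavior.

```python
def check_country(entry: dict, valid_alpha2: set, small_territories: set):
    """Return (message_ids, is_small_territory); may update the entry in place."""
    code = entry["CountryCode"].strip().upper()
    ids = []
    if not code and not entry["RegionCode"] and not entry["City"]:
        # All geo fields empty: mark do-not-geolocate and ask the user to confirm.
        entry["DoNotGeolocate"] = True
        entry["CountryCode"] = "ZZ"
        ids.append("3104")
    elif code and code != "ZZ" and code not in valid_alpha2:
        ids.append("1201")
    # Transient flag for later checks; it is never written to the output JSON.
    return ids, code in small_territories
```

Because the small-territory flag is computed independently of the validity check, a code such as `XK` yields both the `1201` error and a `True` flag, which matches the precedence note above.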
Region Code Analysis
- Use the locally available `ISO3166-2` data table for checking.
  - JSON array of country subdivisions with ISO-assigned codes
  - Each object includes:
    - `code`: subdivision code prefixed with country code (e.g., `US-CA`)
    - `name`: short subdivision name
  - This file represents the superset of valid `RegionCode` values for an RFC 8805 CSV.
- If a `RegionCode` value is provided (RFC 8805 Section 2.1.1.3):
  - Check that the format matches `{COUNTRY}-{SUBDIVISION}` (e.g., `US-CA`, `AU-NSW`).
  - Check the value against the `code` attribute (already prefixed with the country code).
- Small-territory exception: If the entry is a small territory and the `RegionCode` value equals the entry's `CountryCode` (e.g., `SG` as both country and region for Singapore), treat the region as acceptable and skip all region validation checks for this entry. Small territories are effectively city-states with no meaningful ISO 3166-2 administrative subdivisions.
- ERROR
  - Report the following conditions as ERROR:
    - Invalid region format
      - Condition: `RegionCode` does not match `{COUNTRY}-{SUBDIVISION}` and the small-territory exception does not apply
      - Message ID: `1301`
    - Unknown region code
      - Condition: `RegionCode` value is not found in the `code` set and the small-territory exception does not apply
      - Message ID: `1302`
    - Country–region mismatch
      - Condition: Country portion of `RegionCode` does not match `CountryCode`
      - Message ID: `1303`
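The region rules, including the small-territory exception, could be sketched as follows. The function name `check_region` and the regular expression are illustrative assumptions; `valid_codes` stands in for the `code` values loaded from the ISO 3166-2 data file.

```python
import re

# ISO 3166-2 shape: two-letter country code, hyphen, one to three alphanumerics.
REGION_FORMAT = re.compile(r"^[A-Z]{2}-[A-Z0-9]{1,3}$")


def check_region(country: str, region: str, valid_codes: set, is_small_territory: bool):
    """Return the list of message IDs raised by the region checks."""
    if not region:
        return []
    if is_small_territory and region == country:
        return []  # small-territory exception: skip all region validation
    if not REGION_FORMAT.match(region):
        return ["1301"]
    ids = []
    if region not in valid_codes:
        ids.append("1302")
    if region.split("-", 1)[0] != country:
        ids.append("1303")
    return ids
```

A malformed value returns early with `1301` only, since the membership and mismatch checks are not meaningful until the format is right.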
City Name Analysis
- City names are validated using heuristic checks only.
- There is currently no authoritative dataset available for validating city names.
- ERROR
  - Report the following conditions as ERROR:
    - Placeholder or non-meaningful values
      - Condition: Placeholder or non-meaningful values including but not limited to: `undefined`, `Please select`, `null`, `N/A`, `TBD`, `unknown`
      - Message ID: `1401`
    - Truncated names, abbreviations, or airport codes
      - Condition: Truncated names, abbreviations, or airport codes that do not represent valid city names: `LA`, `Frft`, `sin01`, `LHR`, `SIN`, `MAA`
      - Message ID: `1402`
- WARNING
  - Report the following conditions as WARNING:
    - Inconsistent casing or formatting
      - Condition: City names with inconsistent casing, spacing, or formatting that may reduce data quality, for example:
        - `HongKong` vs `Hong Kong`
        - Mixed casing or unexpected script usage
      - Message ID: `2401`
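Since city validation is heuristic, any implementation is a judgment call. The sketch below is one possible set of rules, all of them assumptions chosen to match the examples above: short all-uppercase tokens, values containing digits, and ASCII words without vowels are treated as codes, and run-together capitals or stray whitespace as formatting problems. Legitimate names such as "McAllen" would trip the casing rule, so the thresholds would need tuning on real feeds.

```python
import re

PLACEHOLDERS = {"undefined", "please select", "null", "n/a", "tbd", "unknown"}
VOWELS = set("aeiouy")


def check_city(city: str):
    """Return the list of message IDs raised by the city heuristics."""
    name = city.strip()
    if not name:
        return []
    if name.lower() in PLACEHOLDERS:
        return ["1401"]
    looks_like_code = (
        (name.isalpha() and name.isupper() and len(name) <= 3)  # LA, LHR
        or any(ch.isdigit() for ch in name)                     # sin01
        or (name.isascii() and name.isalpha()
            and not VOWELS & set(name.lower()))                 # Frft
    )
    if looks_like_code:
        return ["1402"]
    # Surrounding whitespace, double spaces, or a lowercase-uppercase run.
    if name != city or "  " in name or re.search(r"[a-z][A-Z]", name):
        return ["2401"]
    return []
```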
Postal Code Check
- RFC 8805 Section 2.1.1.5 explicitly deprecates postal or ZIP codes.
- Postal codes can represent very small populations and are not considered privacy-safe for mapping IP address ranges, which are statistical in nature.
- ERROR
  - Report the following conditions as ERROR:
    - Postal code present
      - Condition: A non-empty value is present in the postal/ZIP code field.
      - Message ID: `1501`
Tuning & Recommendations
This phase applies opinionated recommendations beyond RFC 8805, learned from real-world geofeed deployments, that improve accuracy and usability.
- SUGGESTION
  - Report the following conditions as SUGGESTION:
    - Region or city specified for small territory
      - Condition:
        - Entry is a small territory
        - `RegionCode` is non-empty OR
        - `City` is non-empty.
      - Message IDs: `3301` (for region), `3402` (for city)
    - Missing region code when city is specified
      - Condition:
        - `City` is non-empty
        - `RegionCode` is empty
        - Entry is not a small territory
      - Message ID: `3303`
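A minimal sketch of the two suggestion rules above. It assumes that "small territory" is decided by looking up the entry's `CountryCode` in a set supplied by the caller; the function name `tuning_suggestions` and the sample set used in the usage lines are illustrative only.

```python
def tuning_suggestions(entry: dict, small_territories: set[str]) -> list[str]:
    """Return SUGGESTION message IDs for one entry (hypothetical helper)."""
    region = (entry.get("RegionCode") or "").strip()
    city = (entry.get("City") or "").strip()
    # Assumption: small-territory membership is keyed by country code.
    is_small = (entry.get("CountryCode") or "") in small_territories

    ids = []
    if is_small:
        if region:
            ids.append("3301")  # region specified for a small territory
        if city:
            ids.append("3402")  # city specified for a small territory
    elif city and not region:
        ids.append("3303")  # city given without a region code
    return ids


# Usage with an illustrative (not authoritative) small-territory set:
sample_small = {"MC"}
monaco = {"CountryCode": "MC", "RegionCode": "MC-MO", "City": "Monaco"}
toronto = {"CountryCode": "CA", "RegionCode": "", "City": "Toronto"}
```

With these inputs, `tuning_suggestions(monaco, sample_small)` yields both small-territory IDs, while `tuning_suggestions(toronto, sample_small)` yields only `3303`.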
Phase 4: Tuning Data Lookup
Objective
Look up all `Entries` using Fastah's `rfc8805-row-place-search` tool.
Execution Rules
- Generate a new script only for payload generation (read the dataset and write one or more payload JSON files; do not call MCP from this script).
- The server accepts at most 1000 entries per request, so if there are more than 1000 entries, split them into multiple requests.
- The agent must read the generated payload files, construct the requests from them, and send those requests to the MCP server in batches of at most 1000 entries each.
- On MCP failure: If the MCP server is unreachable, returns an error, or returns no results for any batch, log a warning and continue to Phase 5. Set `TunedEntry: {}` for all affected entries. Do not block report generation. Notify the user clearly: `Tuning data lookup unavailable; the report will show validation results only.`
- Suggestions are advisory only; never auto-populate them.
Step 1: Build Lookup Payload with Deduplication
Load the dataset from: ./run/data/report-data.json
- Read the `Entries` array. Each entry will be used to build the MCP lookup payload.
Reduce server requests by deduplicating identical entries:
- For each entry in `Entries`, compute a content hash (hash of `CountryCode` + `RegionCode` + `City`).
- Create a deduplication map: `{ contentHash -> { rowKey, payload, entryIndices: [] } }`. rowKey is a UUID that will be sent to the MCP server for matching responses.
- If an entry's hash already exists, append its 0-based array index in `Entries` to that deduplication entry's `entryIndices` array.
- If the hash is new, generate a UUID (rowKey) and create a new deduplication entry.
Build request batches:
- Extract unique deduplicated entries from the map, keeping them in deduplication order.
- Build request batches of up to 1000 items each.
- For each batch, keep an in-memory structure like `[{ rowKey, payload, entryIndices }, ...]` to match responses back by rowKey.
- When writing the MCP payload file, include the `rowKey` field with each payload object:
```json
[
  {"rowKey": "550e8400-e29b-41d4-a716-446655440000", "countryCode":"CA","regionCode":"CA-ON","cityName":"Toronto"},
  {"rowKey": "6ba7b810-9dad-11d1-80b4-00c04fd430c8", "countryCode":"IN","regionCode":"IN-KA","cityName":"Bangalore"},
  {"rowKey": "6ba7b811-9dad-11d1-80b4-00c04fd430c8", "countryCode":"IN","regionCode":"IN-KA"}
]
```
- When reading responses, match each response's `rowKey` field to the corresponding deduplication entry to retrieve all associated `entryIndices`.
Rules:
- Write payload to: ./run/data/mcp-server-payload.json
- Exit the script after writing the payload.
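The deduplication and batching steps above might look like the sketch below. The helper names, the SHA-256 hash, the field separator, and the choice to omit empty `regionCode`/`cityName` keys (mirroring the third object in the example payload) are assumptions rather than requirements of the skill.

```python
import hashlib
import uuid

BATCH_SIZE = 1000  # server limit per request, as stated above


def content_hash(entry: dict) -> str:
    """Hash CountryCode + RegionCode + City; a separator avoids accidental collisions."""
    parts = [(entry.get(k) or "") for k in ("CountryCode", "RegionCode", "City")]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def build_dedup_map(entries: list[dict]) -> dict:
    """Map contentHash -> {rowKey, payload, entryIndices}, in first-seen order."""
    dedup = {}
    for index, entry in enumerate(entries):
        key = content_hash(entry)
        if key not in dedup:
            payload = {"countryCode": entry.get("CountryCode") or ""}
            if entry.get("RegionCode"):
                payload["regionCode"] = entry["RegionCode"]
            if entry.get("City"):
                payload["cityName"] = entry["City"]
            dedup[key] = {"rowKey": str(uuid.uuid4()), "payload": payload, "entryIndices": []}
        dedup[key]["entryIndices"].append(index)
    return dedup


def build_batches(dedup: dict, batch_size: int = BATCH_SIZE) -> list[list[dict]]:
    """Split the deduplicated items into lists of at most batch_size items."""
    items = list(dedup.values())  # dicts preserve insertion (deduplication) order
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def payload_rows(batch: list[dict]) -> list[dict]:
    """Flatten one batch into the objects written to the payload file."""
    return [{"rowKey": item["rowKey"], **item["payload"]} for item in batch]
```

Each list returned by `payload_rows` can be serialized with `json.dump` into the payload file; the `entryIndices` stay in memory (or are re-derived later by hashing the entries again) so that responses can be mapped back.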
Step 2: Invoke Fastah MCP Tool
- An example `mcp.json` style configuration of the Fastah MCP server is as follows:
```json
"fastah-ip-geofeed": {
  "type": "http",
  "url": "https://mcp.fastah.ai/mcp"
}
```
- Server: `https://mcp.fastah.ai/mcp`
- Tool and its Schema: before the first `tools/call`, the agent MUST send a `tools/list` request to read the input and output schema for `rfc8805-row-place-search`. Use the discovered schema as the authoritative source for field names, types, and constraints.
- The following is an illustrative example only; always defer to the schema returned by `tools/list`:
```json
[
  {"rowKey": "550e8400-...", "countryCode":"CA", ...},
  {"rowKey": "690e9301-...", "countryCode":"ZZ", ...}
]
```
- Open ./run/data/mcp-server-payload.json and send all deduplicated entries with their rowKeys.
- If more than 1000 entries remain after deduplication, split them into multiple requests of at most 1000 entries each.
- The server will respond with the same `rowKey` field in each response for mapping back.
- Do NOT use local data.
Step 3: Attach Tuned Data to Entries
- Generate a new script for attaching tuned data.
- Load both ./run/data/report-data.json and the deduplication map (held in memory from Step 1, or re-derived from the payload file).
- For each response from the MCP server:
  - Extract the `rowKey` from the response.
  - Look up the `entryIndices` array associated with that `rowKey` from the deduplication map.
  - For each index in `entryIndices`, attach the best match to `Entries[index]`.
- Use the first (best) match from the response when available.
Create the `TunedEntry` field on each affected entry if it does not exist. Remap the MCP API response keys to Go struct field names:
```json
"TunedEntry": {
  "Name": "",
  "CountryCode": "",
  "RegionCode": "",
  "PlaceType": "",
  "H3Cells": [],
  "BoundingBox": []
}
```
The `TunedEntry` field is a single object (not an array). It holds the best match from the MCP server.
MCP response key → JSON key mapping:
| MCP API response key | JSON key |
|---|---|
| |
| |
| |
| |
| |
| |
Entries with no UUID match (i.e. the MCP server returned no response for their UUID) must receive an empty `TunedEntry: {}` object; never leave the field absent.
- Write the dataset back to: ./run/data/report-data.json
- Rules:
  - Maintain all existing validation flags.
  - Do NOT create additional intermediate files.
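The attach step could be sketched as below. Two things here are assumptions, not facts about the Fastah API: that each response object carries its candidates in a list under a `matches` key, and the `key_map` argument that translates MCP response keys to `TunedEntry` keys. Take the real names from the schema returned by `tools/list`.

```python
def attach_tuned_entries(entries, dedup_items, responses, key_map):
    """Attach the best match per rowKey to every entry index that shares it.

    entries      -- the Entries list from report-data.json (modified in place)
    dedup_items  -- [{"rowKey": ..., "entryIndices": [...]}, ...] from Step 1
    responses    -- response objects from the MCP server (shape is assumed)
    key_map      -- {mcp_response_key: tuned_entry_key} (hypothetical mapping)
    """
    indices_by_key = {item["rowKey"]: item["entryIndices"] for item in dedup_items}

    # Every entry gets the field, so unmatched entries end up with {}.
    for entry in entries:
        entry.setdefault("TunedEntry", {})

    for response in responses:
        matches = response.get("matches") or []  # assumed response field
        if not matches:
            continue
        best = matches[0]  # first match is treated as the best one
        tuned = {dst: best[src] for src, dst in key_map.items() if src in best}
        for index in indices_by_key.get(response.get("rowKey"), []):
            entries[index]["TunedEntry"] = dict(tuned)  # separate copy per entry
    return entries
```

Only the `TunedEntry` key is written, so existing validation flags on each entry are left untouched.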
Phase 5: Generate Tuning Report
Generate a self-contained HTML report by rendering the template at `./scripts/templates/index.html` with data from `./run/data/report-data.json` and `./run/data/comments.json`.
Write the completed report to `./run/report/geofeed-report.html`. After generating, attempt to open it in the system's default browser (e.g., `webbrowser.open()`). If running in a headless environment, CI pipeline, or remote container where no browser is available, skip the browser step and instead present the file path to the user so they can open or download it.
The template uses Go `html/template` syntax (`{{.Field}}`, `{{range}}`, `{{if eq}}`, etc.). Write a Python script that reads the template, builds a rendering context from the JSON data files, and processes the template placeholders to produce final HTML. Do not modify the template file itself; all processing happens in the Python script at render time.
Step 1: Replace Metadata Placeholders
Replace each `{{.Metadata.X}}` placeholder in the template with the corresponding value from `report-data.json`. Since JSON keys match the template placeholder, the mapping is direct: `{{.Metadata.InputFile}}` maps to the `InputFile` JSON key, etc.
| Template placeholder | JSON key |
|---|---|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
Note on `{{.Metadata.Timestamp}}`: This placeholder appears inside a JavaScript `new Date(...)` call. Replace it with the raw integer value (no HTML escaping needed for a numeric literal inside `<script>`). All other metadata values should be HTML-escaped since they appear inside HTML element text.
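A minimal sketch of this substitution, assuming the metadata values are available as a plain dict; the function name `replace_metadata` is illustrative. Placeholders whose key is missing from the dict are left in place so that a gap is visible rather than silently blanked.

```python
import html
import re

METADATA_RE = re.compile(r"\{\{\.Metadata\.(\w+)\}\}")


def replace_metadata(template: str, metadata: dict) -> str:
    """Substitute every {{.Metadata.X}} placeholder with metadata["X"]."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in metadata:
            return match.group(0)  # leave unknown placeholders untouched
        if key == "Timestamp":
            return str(int(metadata[key]))  # raw integer inside new Date(...)
        return html.escape(str(metadata[key]), quote=True)

    return METADATA_RE.sub(substitute, template)
```

Using a callback with `re.sub` keeps the replacement text literal, so backslashes in a value (for example a Windows-style input path) are not interpreted as regex escapes.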
Step 2: Replace the Comment Map Placeholder
Locate this pattern in the template:
```javascript
const commentMap = {{.Comments}};
```
Replace `{{.Comments}}` with the serialized JSON object from `./run/data/comments.json`. The JSON is embedded directly as a JavaScript object literal (not inside a string), so no JS string escaping is needed. Because the literal sits inside a `<script>` element, escape any `</` sequence so that a comment containing `</script>` cannot end the script block early:
```python
import json

# comments: the object loaded from ./run/data/comments.json
comments_json = json.dumps(comments).replace("</", "<\\/")  # JavaScript reads "<\/" as "</"
template = template.replace("{{.Comments}}", comments_json)
```
Step 3: Expand the Entries Range Block
The template contains a `{{range .Entries}}...{{end}}` block inside `<tbody id="entriesTableBody">`. Process it as follows:
- Extract the range block body using regex. Critical: The block contains nested `{{end}}` tags (from `{{if eq .Status ...}}`, `{{if .Checked}}`, and `{{range .Messages}}`). A naive non-greedy match like `\{\{range \.Entries\}\}(.*?)\{\{end\}\}` will match the first inner `{{end}}`, truncating the block. Instead, anchor the outer `{{end}}` to the `</tbody>` that follows it:
```python
import re

m = re.search(
    r'\{\{range \.Entries\}\}(.*?)\{\{end\}\}\s*</tbody>',
    template,
    re.DOTALL,
)
if m is None:
    raise ValueError("{{range .Entries}} block not found in template")
entry_body = m.group(1)  # template text for one entry iteration
```
  This ensures you capture the full block body including all three `<tr>` rows and the nested `{{range .Messages}}...{{end}}`.
- Iterate over each entry in `report-data.json`'s `Entries` array.
- Expand the block body for each entry using the processing order below.
- Replace the entire match (from `{{range .Entries}}` through `</tbody>`) with the concatenated expanded HTML followed by `</tbody>`.
Processing order for each entry (innermost constructs first to avoid `{{end}}` confusion):
- Evaluate `{{if eq .Status ...}}...{{end}}` conditionals (status badge class and icon).
- Evaluate `{{if .Checked}}...{{end}}` conditional (message checkbox).
- Expand `{{range .Messages}}...{{end}}` inner range.
- Replace simple `{{.Field}}` placeholders.
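Putting the extraction and replacement steps together, a sketch might look like this. The function name `expand_entries` and the `render_entry` callback (which stands in for the per-entry processing order listed above) are hypothetical.

```python
import re

ENTRIES_RE = re.compile(
    r'\{\{range \.Entries\}\}(.*?)\{\{end\}\}\s*</tbody>',
    re.DOTALL,
)


def expand_entries(template: str, entries: list, render_entry) -> str:
    """Replace the {{range .Entries}} block with one rendered body per entry.

    render_entry(body, entry) must return the expanded HTML for one entry.
    """
    m = ENTRIES_RE.search(template)
    if m is None:
        raise ValueError("{{range .Entries}} block not found before </tbody>")
    entry_body = m.group(1)
    rows = "".join(render_entry(entry_body, entry) for entry in entries)
    # Slice instead of re.sub so the rendered HTML is never treated as a pattern.
    return template[:m.start()] + rows + "</tbody>" + template[m.end():]
```

String slicing is used for the final replacement because rendered rows may contain backslashes or group references that `re.sub` would otherwise interpret.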
Entry Field Mapping
Within the range block body, replace these placeholders for each entry. Since JSON keys match the template placeholder, the template placeholder `{{.X}}` maps directly to JSON key `X`:
| Template placeholder | JSON key | Notes |
|---|---|---|
| | Direct integer value |
| | HTML-escaped |
| | HTML-escaped |
| | HTML-escaped |
| | HTML-escaped |
| | HTML-escaped |
| | Lowercase string: |
| | Lowercase string: |
| | Lowercase string: |
| | Empty string |
| | |
| | |
| | |
| | |
| | |
| | Bracket-wrapped space-separated; |
| | Bracket-wrapped space-separated; |
`data-h3-cells` and `data-bounding-box` values:
- `[836752fffffffff 836755fffffffff]`: correct
- `["836752fffffffff","836755fffffffff"]`: WRONG, quotes will break parsing
- `[-71.70 10.73 -71.52 10.55]`: correct
- `[]`: correct for empty
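The bracket-wrapped, space-separated format shown above can be produced with a small helper. The name `bracket_list` is an assumption; the helper is meant for the `H3Cells` and `BoundingBox` lists of `TunedEntry`.

```python
def bracket_list(values) -> str:
    """Format a list as "[a b c]" with no quotes or commas; None or [] gives "[]"."""
    return "[" + " ".join(str(v) for v in (values or [])) + "]"
```

Note that `str()` on a float drops trailing zeros, so a coordinate stored as the number `-71.70` is emitted as `-71.7`; keep coordinates as strings in the JSON if fixed decimal places matter.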
Evaluating Status Conditionals
Process these BEFORE replacing simple `{{.Field}}` placeholders; otherwise the `{{end}}` markers get consumed and the regex won't match.
The template uses `{{if eq .Status "..."}}` conditionals for the status badge CSS class and icon. Evaluate these by checking the entry's `status` value and keeping only the matching branch text.
The status badge line contains two `{{if eq .Status ...}}...{{end}}` blocks on a single line: one for the CSS class, one for the icon. Use `re.sub` with a callback to resolve all occurrences:
```python
import re

STATUS_CSS = {"ERROR": "error", "WARNING": "warning", "SUGGESTION": "suggestion", "OK": "ok"}
STATUS_ICON = {
    "ERROR": "bi-x-circle-fill",
    "WARNING": "bi-exclamation-triangle-fill",
    "SUGGESTION": "bi-lightbulb-fill",
    "OK": "bi-check-circle-fill",
}
# One whole {{if eq .Status "X"}}...{{end}} chain (these chains contain no nested {{end}}).
STATUS_IF_RE = re.compile(r'\{\{if eq \.Status "[^"]*"\}\}.*?\{\{end\}\}', re.DOTALL)
BRANCH_RE = re.compile(r'\{\{(?:else )?if eq \.Status "([^"]*)"\}\}([^{]*)')
ELSE_RE = re.compile(r'\{\{else\}\}([^{]*)\{\{end\}\}')

def resolve_status_if(match_obj, status):
    """Pick the branch matching `status` from a {{if eq .Status ...}}...{{end}} block."""
    block = match_obj.group(0)
    # Branch layout: {{if eq .Status "X"}}val{{else if ...}}val{{else}}val{{end}}
    for st, val in BRANCH_RE.findall(block):
        if st == status:
            return val
    fallback = ELSE_RE.search(block)  # no branch matched: use the {{else}} text
    return fallback.group(1) if fallback else ""

# body and status come from the per-entry loop
body = STATUS_IF_RE.sub(lambda m: resolve_status_if(m, status), body)
```
A simpler approach: since there are exactly two known patterns, replace them as literal strings:
```python
css_class = STATUS_CSS.get(status, "ok")
icon_class = STATUS_ICON.get(status, "bi-check-circle-fill")
body = body.replace(
    '{{if eq .Status "ERROR"}}error{{else if eq .Status "WARNING"}}warning{{else if eq .Status "SUGGESTION"}}suggestion{{else}}ok{{end}}',
    css_class,
)
body = body.replace(
    '{{if eq .Status "ERROR"}}bi-x-circle-fill{{else if eq .Status "WARNING"}}bi-exclamation-triangle-fill{{else if eq .Status "SUGGESTION"}}bi-lightbulb-fill{{else}}bi-check-circle-fill{{end}}',
    icon_class,
)
```
This avoids regex entirely and is safe because these exact strings appear verbatim in the template.
Step 4: Expand the Nested Messages Range
The `{{range .Messages}}...{{end}}` block contains a nested `{{if .Checked}} checked{{else}} disabled{{end}}` conditional, so its inner `{{end}}` would cause a simple non-greedy regex to match too early. Anchor the regex to `</td>` (the tag immediately after the messages range closing `{{end}}`) to capture the full block body:
```python
msg_match = re.search(
    r'\{\{range \.Messages\}\}(.*?)\{\{end\}\}\s*(?=</td>)',
    body, re.DOTALL
)
```
The `(?=</td>)` lookahead ensures the regex skips past the checkbox conditional's `{{end}}` (which is followed by `>`, not `</td>`) and matches only the range-closing `{{end}}` (which is followed by whitespace then `</td>`).
For each message in the entry's `Messages` array, clone the captured block body and expand it:
- Resolve the checkbox conditional per message (must happen before simple placeholder replacement to remove the nested `{{end}}`):
```python
if msg.get("Checked"):
    msg_body = msg_body.replace(
        '{{if .Checked}} checked{{else}} disabled{{end}}', ' checked'
    )
else:
    msg_body = msg_body.replace(
        '{{if .Checked}} checked{{else}} disabled{{end}}', ' disabled'
    )
```
- Replace message field placeholders:

| Template placeholder | Source | Notes |
|---|---|---|
| `{{.ID}}` | `Messages[i].ID` | Direct string value from JSON |
| `{{.Text}}` | `Messages[i].Text` | HTML-escaped |

- Concatenate all expanded message blocks and replace the original `{{range .Messages}}...{{end}}` match (`msg_match.group(0)`) with the result:
```python
body = body[:msg_match.start()] + "".join(expanded_msgs) + body[msg_match.end():]
```
If `Messages` is empty, replace the entire matched region with an empty string (no message divs; only the issues header remains).
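The three message steps can be combined into one helper, sketched below. The function name `expand_messages` is an assumption; the regex and the checkbox literal are the ones given above.

```python
import html
import re

MESSAGES_RE = re.compile(
    r'\{\{range \.Messages\}\}(.*?)\{\{end\}\}\s*(?=</td>)',
    re.DOTALL,
)
CHECKED_IF = '{{if .Checked}} checked{{else}} disabled{{end}}'


def expand_messages(body: str, messages: list) -> str:
    """Expand the {{range .Messages}} block of one entry body."""
    m = MESSAGES_RE.search(body)
    if m is None:
        return body  # nothing to expand in this body
    expanded = []
    for msg in messages:
        # Resolve the checkbox conditional first so its {{end}} is gone.
        msg_body = m.group(1).replace(
            CHECKED_IF, ' checked' if msg.get("Checked") else ' disabled'
        )
        msg_body = msg_body.replace('{{.ID}}', str(msg.get("ID", "")))
        msg_body = msg_body.replace('{{.Text}}', html.escape(str(msg.get("Text", ""))))
        expanded.append(msg_body)
    # An empty messages list leaves an empty string in place of the block.
    return body[:m.start()] + "".join(expanded) + body[m.end():]
```

Calling this before the simple `{{.Field}}` replacement for the entry keeps message-level `{{.ID}}` and `{{.Text}}` placeholders from being confused with entry-level fields.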
Output Guarantees
- The report must be readable in any modern browser without extra network dependencies beyond the CDN links already in the template (`leaflet`, `h3-js`, `bootstrap-icons`, Raleway font).
- All values embedded in HTML must be HTML-escaped (`<`, `>`, `&`, `"`) to prevent rendering issues.
- `commentMap` is embedded as a direct JavaScript object literal (not inside a string), so no JS string escaping is needed; just emit valid JSON.
- All values must be derived only from analysis output, not recomputed heuristically.
Phase 6: Final Review
Perform a final verification pass using concrete, checkable assertions before presenting results to the user.
Check 1: Entry count integrity
- Count non-comment, non-blank data rows in the original input CSV.
- Assert: `len(entries) in report-data.json == data_row_count`
- On failure: `Row count mismatch: input has {N} data rows but report contains {M} entries.`
Check 2: Summary counter integrity
- These counters use mutual exclusion based on the boolean flags, which mirrors the highest-severity `Status` field. An entry with both `HasError: true` and `HasWarning: true` is counted only in `Errors`, never in `Warnings`. This is equivalent to counting by the entry's `Status` field.
- Assert all of the following; correct any that fail before generating the report:
  - `Errors == sum(1 for e in Entries if e['HasError'])`
  - `Warnings == sum(1 for e in Entries if e['HasWarning'] and not e['HasError'])`
  - `Suggestions == sum(1 for e in Entries if e['HasSuggestion'] and not e['HasError'] and not e['HasWarning'])`
  - `OK == sum(1 for e in Entries if not e['HasError'] and not e['HasWarning'] and not e['HasSuggestion'])`
  - `Errors + Warnings + Suggestions + OK == TotalEntries - InvalidEntries`
Check 3: Accuracy bucket integrity
- Assert: `CityLevelAccuracy + RegionLevelAccuracy + CountryLevelAccuracy + DoNotGeolocate == TotalEntries - InvalidEntries`
- Note: The accuracy buckets defined in Phase 3 say "Do not count entries with `HasError: true`", but the Check 3 formula above uses `TotalEntries - InvalidEntries` (which still includes ERROR entries). This means ERROR entries (those that parsed as valid IPs but failed validation) are counted in accuracy buckets by their geo-field presence. Only `InvalidEntries` (unparsable IP prefixes) are excluded. Follow the Check 3 formula as the authoritative rule.
- On failure, trace and fix the bucketing logic before proceeding.
Check 4: No duplicate line numbers
- Assert: all `Line` values in `Entries` are unique.
- On failure, report the duplicated line numbers to the user.
Check 5: TunedEntry completeness
- Assert: every object in `Entries` has a `TunedEntry` key (even if its value is `{}`).
- On failure, add `"TunedEntry": {}` to any entry missing the key, then re-save `report-data.json`.
Check 6: Report file is present and non-empty
- Confirm `./run/report/geofeed-report.html` was written and has a file size greater than zero bytes.
- On failure, regenerate the report before presenting to the user.
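The six checks can be expressed as one function that collects failure messages instead of stopping at the first one. This is a sketch under assumptions: the function name `final_review` is made up, and the summary counters are passed in as a separate `counters` dict because their exact location inside `report-data.json` is not restated here.

```python
import os
from collections import Counter


def final_review(entries, counters, data_row_count, report_path):
    """Run Checks 1-6 and return a list of failure messages (empty if all pass)."""
    failures = []

    # Check 1: entry count integrity
    if len(entries) != data_row_count:
        failures.append(
            f"Row count mismatch: input has {data_row_count} data rows "
            f"but report contains {len(entries)} entries."
        )

    # Check 2: summary counters, mutually exclusive by highest severity
    def has(e, flag):
        return bool(e.get(flag))

    expected = {
        "Errors": sum(1 for e in entries if has(e, "HasError")),
        "Warnings": sum(1 for e in entries if has(e, "HasWarning") and not has(e, "HasError")),
        "Suggestions": sum(
            1 for e in entries
            if has(e, "HasSuggestion") and not has(e, "HasError") and not has(e, "HasWarning")
        ),
        "OK": sum(
            1 for e in entries
            if not has(e, "HasError") and not has(e, "HasWarning") and not has(e, "HasSuggestion")
        ),
    }
    for name, value in expected.items():
        if counters.get(name) != value:
            failures.append(f"{name} is {counters.get(name)}, expected {value}")
    valid_total = counters.get("TotalEntries", 0) - counters.get("InvalidEntries", 0)
    if sum(counters.get(n, 0) for n in expected) != valid_total:
        failures.append("Status counters do not sum to TotalEntries - InvalidEntries")

    # Check 3: accuracy buckets
    buckets = ("CityLevelAccuracy", "RegionLevelAccuracy", "CountryLevelAccuracy", "DoNotGeolocate")
    if sum(counters.get(b, 0) for b in buckets) != valid_total:
        failures.append("Accuracy buckets do not sum to TotalEntries - InvalidEntries")

    # Check 4: duplicate line numbers
    dupes = sorted(n for n, c in Counter(e.get("Line") for e in entries).items() if c > 1)
    if dupes:
        failures.append(f"Duplicate line numbers: {dupes}")

    # Check 5: TunedEntry present on every entry
    missing = [e.get("Line") for e in entries if "TunedEntry" not in e]
    if missing:
        failures.append(f"Entries missing TunedEntry at lines: {missing}")

    # Check 6: report file exists and is non-empty
    if not os.path.isfile(report_path) or os.path.getsize(report_path) == 0:
        failures.append(f"Report file missing or empty: {report_path}")

    return failures
```

Returning all failures at once lets the agent fix several problems in a single pass before presenting the report.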