marketplace-search-recsys-planning
Marketplace Engineering Two-Sided Search and Recsys Planning Best Practices
Comprehensive planning, design and diagnostic guide for search and recommendation systems
in two-sided trust marketplaces. Covers OpenSearch index, query and ranking patterns, the
methodology for planning retrieval work, the handoff points to recommendation-specific
tooling, and the instrumentation and dashboard layer that turns measurement into ongoing
decision making. Contains 57 rules across 10 categories ordered by cascade impact, plus
two playbooks (plan a new system from scratch, diagnose an existing one) and explicit
living-artefact conventions (decisions log, golden set, gotchas).
When to Apply
Reference this skill when:
- Planning a new marketplace retrieval project from scratch
- Reviewing an existing retrieval system that feels stale, unfair, or unpersonalised
- Designing the OpenSearch index mapping, analyzers, or query DSL
- Choosing retrieval primitives per product surface (search, recs, hybrid, curated)
- Deciding which search quality metrics to track and dashboard
- Running the weekly search-quality review ritual
- Diagnosing a silent regression in ranking, coverage, or zero-result rate
- Deciding when a retrieval problem is actually a personalisation problem
This skill is the precursor to marketplace-personalisation. Start here for
planning and search work; hand off to the personalisation skill when the diagnosed
bottleneck is impression tracking, feedback-loop bias, or AWS Personalize-specific
design.
Living Context
This skill treats the system as evolving. Three living artefacts carry context across
sessions, releases, and team changes — read them before making suggestions, update them
after every shipped change:
- gotchas.md (in this skill folder): append-only diagnostic lessons. Every gotcha has a date and a short description of what surprised the team and how it was resolved.
- Decisions log (maintained in the product repo, typically decisions/*.md): every ranking change, schema tweak, and synonym edit recorded with its hypothesis, offline and online evidence, ship criterion, outcome, and rollback path. See rule plan-maintain-a-decisions-log.
- Golden query set (frozen per eval cycle, committed to the product repo): the reference set of queries against which every ranking change is offline-evaluated before an online test. See rule plan-version-the-golden-set.
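The decisions-log fields listed above could be captured in a small record type so that every entry carries the same sections. This is a minimal sketch, not the skill's own schema: the class name `DecisionEntry`, its field names, and the Markdown layout are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DecisionEntry:
    """One decisions-log record. Hypothetical shape, not the skill's own schema."""

    decided_on: date
    change: str            # ranking change, schema tweak, or synonym edit
    hypothesis: str
    offline_evidence: str
    online_evidence: str
    ship_criterion: str
    outcome: str
    rollback_path: str

    def to_markdown(self) -> str:
        """Render the entry as a Markdown section for a decisions/*.md file."""
        fields = [
            ("Hypothesis", self.hypothesis),
            ("Offline evidence", self.offline_evidence),
            ("Online evidence", self.online_evidence),
            ("Ship criterion", self.ship_criterion),
            ("Outcome", self.outcome),
            ("Rollback path", self.rollback_path),
        ]
        header = f"## {self.decided_on.isoformat()}: {self.change}"
        body = "\n".join(f"- {label}: {value}" for label, value in fields)
        return f"{header}\n\n{body}\n"
```

Making every field required means an entry cannot be created without a ship criterion or a rollback path, which is the property the log relies on.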
Rule Categories
Categories are ordered by cascade impact on the retrieval lifecycle: intent
misunderstanding poisons architecture; wrong architecture poisons index; wrong index
poisons retrieval forever until a reindex; every downstream layer inherits the upstream
error.
| # | Category | Prefix | Impact |
|---|---|---|---|
| 1 | Problem Framing and User Intent | intent- | CRITICAL |
| 2 | Surface Taxonomy and Architecture | arch- | CRITICAL |
| 3 | Index Design and Mapping | index- | HIGH |
| 4 | Planning and Improvement Methodology | plan- | HIGH |
| 5 | Query Understanding | query- | MEDIUM-HIGH |
| 6 | Retrieval Strategy | retrieve- | MEDIUM-HIGH |
| 7 | Relevance and Ranking | rank- | MEDIUM-HIGH |
| 8 | Search and Recommender Blending | blend- | MEDIUM |
| 9 | Measurement and Experimentation | measure- | MEDIUM |
| 10 | Instrumentation, Dashboards and Decision Triggers | monitor- | MEDIUM |
Quick Reference
1. Problem Framing and User Intent (CRITICAL)
- intent-map-queries-to-intent-classes: classify before retrieving
- intent-separate-known-item-from-discovery: different failure modes, different strategies
- intent-audit-live-query-logs-first: design from real data, not imagined data
- intent-distinguish-transactional-from-exploratory: precision vs diversity
- intent-reject-one-search-for-everything: per-surface query shapes
- intent-treat-no-search-as-first-class-choice: curated is a legitimate answer
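Auditing live query logs before designing anything can start with a simple frequency count. The sketch below assumes the log has already been reduced to a list of raw query strings; the function name `audit_queries` and the returned keys are illustrative, not part of the skill.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def audit_queries(queries: Iterable[str], top_n: int = 10) -> Dict[str, Any]:
    """Summarise a query log: volume, distinct queries, and head concentration."""
    # Light normalisation so trivially different spellings are counted together.
    counts = Counter(q.strip().lower() for q in queries if q.strip())
    total = sum(counts.values())
    top = counts.most_common(top_n)
    head_share = sum(c for _, c in top) / total if total else 0.0
    return {
        "total": total,            # non-empty queries seen
        "distinct": len(counts),   # unique normalised queries
        "top": top,                # (query, count) pairs, most frequent first
        "head_share": head_share,  # fraction of traffic covered by the top_n queries
    }
```

A high `head_share` suggests a small set of queries dominates traffic, while a low one points to a long tail; which intent classes those queries fall into still needs a human read of the `top` list.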
2. Surface Taxonomy and Architecture (CRITICAL)
- arch-map-surface-to-retrieval-primitive: a single-source-of-truth routing table
- arch-split-candidate-generation-from-ranking: two-stage pipelines
- arch-design-zero-result-fallback: declare fallback owner per surface
- arch-design-for-cold-start-from-day-one: cold start is permanent, not bootstrap
- arch-avoid-mono-stack-retrieval: diversify primary dependencies
- arch-route-surfaces-deliberately: every routing decision recorded
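One way to picture the single-source-of-truth routing table is a plain mapping from surface to retrieval primitive, fallback, and fallback owner. The four primitives come from this guide (search, recs, hybrid, curated); the surface names, team names, and the `Route` and `route_for` helpers are hypothetical.

```python
from enum import Enum
from typing import Dict, NamedTuple


class Primitive(Enum):
    SEARCH = "search"
    RECS = "recs"
    HYBRID = "hybrid"
    CURATED = "curated"


class Route(NamedTuple):
    primitive: Primitive    # what serves the surface normally
    fallback: Primitive     # what serves it when the primary returns nothing
    fallback_owner: str     # who is accountable for the fallback


# Hypothetical surfaces and owners; replace with the product's real ones.
ROUTING_TABLE: Dict[str, Route] = {
    "search_results": Route(Primitive.SEARCH, Primitive.CURATED, "search-team"),
    "home_feed": Route(Primitive.RECS, Primitive.CURATED, "recs-team"),
    "similar_listings": Route(Primitive.HYBRID, Primitive.SEARCH, "search-team"),
}


def route_for(surface: str) -> Route:
    """Look up a surface; unknown surfaces fail loudly instead of defaulting."""
    if surface not in ROUTING_TABLE:
        raise KeyError(f"no routing entry for surface {surface!r}")
    return ROUTING_TABLE[surface]
```

Raising on an unknown surface, rather than silently defaulting to search, keeps every routing decision explicit and reviewable.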
3. Index Design and Mapping (HIGH)
- index-design-mappings-conservatively: reindex is expensive
- index-use-keyword-and-text-as-multi-fields: full-text plus exact match
- index-match-index-and-query-time-analyzers: tokens must agree
- index-use-language-analyzers-for-language-fields: language-aware stemming
- index-separate-searchable-from-display-fields: index only what you search
- index-use-index-templates-for-consistency: prevent mapping drift
- index-stream-listing-updates-via-cdc: freshness in seconds, not hours
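Several of the index rules above can be seen together in one create-index body. The sketch below is an assumption-laden example: the field names (`title`, `category`, `updated_at`, `thumbnail_url`) are invented, and the body is shown as a Python dict that could be sent to OpenSearch by whatever client the project uses.

```python
from typing import Any, Dict, List

# Hypothetical listing index; field names are illustrative only.
LISTING_INDEX_BODY: Dict[str, Any] = {
    "mappings": {
        # Reject documents with unmapped fields instead of guessing their types.
        "dynamic": "strict",
        "properties": {
            "title": {
                "type": "text",
                # Language-aware stemming; the same analyzer is used at query
                # time because no separate search_analyzer is set.
                "analyzer": "english",
                # Keyword sub-field for exact match, sorting, and aggregations.
                "fields": {"raw": {"type": "keyword", "ignore_above": 256}},
            },
            "category": {"type": "keyword"},
            "updated_at": {"type": "date"},
            # Display-only field: kept in _source but not searchable.
            "thumbnail_url": {"type": "keyword", "index": False, "doc_values": False},
        },
    }
}


def searchable_fields(index_body: Dict[str, Any]) -> List[str]:
    """List the top-level fields that are actually indexed for search."""
    props = index_body["mappings"]["properties"]
    return sorted(name for name, spec in props.items() if spec.get("index", True))
```

Running `searchable_fields` in a review makes it easy to check that only the fields meant for search are indexed.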
4. Planning and Improvement Methodology (HIGH)
- plan-audit-before-you-build: instrumentation gate on kick-off
- plan-build-golden-query-set-first: the first artefact, not the last
- plan-find-bottleneck-before-optimising: theory of constraints
- plan-maintain-a-decisions-log: living context across team changes
- plan-version-the-golden-set: frozen per eval cycle
- plan-handoff-to-personalisation-skill: recognise the boundary
5. Query Understanding (MEDIUM-HIGH)
- query-normalise-before-anything-else: canonical string in
- query-use-language-analyzers-for-stemming: double-digit recall wins
- query-curate-synonyms-by-domain: domain vocabulary not thesaurus
- query-use-fuzzy-matching-for-typos: 10-15% of queries have typos
- query-classify-before-routing: single-pass classifier
- query-build-autocomplete-on-separate-index: latency isolation
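Normalising first and only then applying typo tolerance might look like the sketch below. The specific normalisation steps (Unicode NFKC, case folding, whitespace collapsing) are an assumed baseline rather than the rule's prescribed list, and the helper names are hypothetical. The `match` clause with `"fuzziness": "AUTO"` is standard OpenSearch query DSL.

```python
import re
import unicodedata
from typing import Any, Dict

_WHITESPACE = re.compile(r"\s+")


def normalise_query(raw: str) -> str:
    """Return a canonical form of a user query string."""
    text = unicodedata.normalize("NFKC", raw)   # fold full-width and compatibility forms
    text = text.casefold()                      # case-insensitive comparison form
    return _WHITESPACE.sub(" ", text).strip()   # collapse runs of whitespace


def fuzzy_match_clause(field: str, query: str) -> Dict[str, Any]:
    """Build a match clause that tolerates small typos in the normalised query."""
    return {
        "match": {
            field: {
                "query": normalise_query(query),
                # AUTO scales the allowed edit distance with term length.
                "fuzziness": "AUTO",
            }
        }
    }
```

Because every downstream step receives the same canonical string, cache keys, log aggregation, and synonym lookups all agree on what counts as the same query.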
6. Retrieval Strategy (MEDIUM-HIGH)
- retrieve-use-filter-clauses-for-exact-matches: filter cache wins
- retrieve-use-bool-structure-deliberately: must vs should vs filter
- retrieve-run-expensive-signals-in-rescore: rescore window limits cost
- retrieve-combine-bm25-and-knn-via-hybrid-search: lexical plus semantic
- retrieve-paginate-with-search-after: constant-cost deep pagination
- retrieve-choose-embedding-model-deliberately: re-embedding is expensive
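The filter-versus-must distinction and `search_after` pagination can be shown in one request builder. This is a sketch under assumptions: the `title`, `category`, and `listing_id` fields are invented, and `listing_id` is assumed to be a unique keyword field so it can act as the sort tiebreaker that `search_after` needs.

```python
from typing import Any, Dict, List, Optional


def build_search_body(
    text: str,
    category: str,
    page_size: int = 20,
    search_after: Optional[List[Any]] = None,
) -> Dict[str, Any]:
    """Build an OpenSearch request body for one page of results."""
    body: Dict[str, Any] = {
        "size": page_size,
        "query": {
            "bool": {
                # Scored clause: contributes to relevance.
                "must": [{"match": {"title": {"query": text}}}],
                # Exact-match clause: not scored, and eligible for filter caching.
                "filter": [{"term": {"category": category}}],
            }
        },
        # A deterministic sort with a unique tiebreaker is required for search_after.
        "sort": [{"_score": "desc"}, {"listing_id": "asc"}],
    }
    if search_after is not None:
        # Sort values of the last hit on the previous page.
        body["search_after"] = search_after
    return body
```

For the next page, pass the `sort` array of the last hit from the previous response as `search_after`, instead of increasing a `from` offset.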
7. Relevance and Ranking (MEDIUM-HIGH)
- rank-tune-bm25-parameters-last: upstream levers first
- rank-use-function-score-for-business-signals: explicit named functions
- rank-deploy-ltr-only-after-golden-set-exists: supervised learning needs labels
- rank-apply-diversity-at-rank-time: after scoring, not before
- rank-normalise-scores-across-retrieval-primitives: comparable scales
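Applying diversity after scoring can be as simple as a per-group cap over an already ranked list. The sketch below is one possible technique, not necessarily the one the rule describes: it demotes items beyond a cap per group (for example per seller) to the tail while preserving score order everywhere else. The function name and the cap value are assumptions.

```python
from typing import Callable, Dict, Hashable, List, Sequence, TypeVar

T = TypeVar("T")


def cap_per_group(
    ranked: Sequence[T],
    group_of: Callable[[T], Hashable],
    max_per_group: int = 2,
) -> List[T]:
    """Demote items beyond max_per_group for their group to the end of the list."""
    counts: Dict[Hashable, int] = {}
    head: List[T] = []
    tail: List[T] = []
    for item in ranked:  # ranked is assumed to be sorted by score already
        group = group_of(item)
        counts[group] = counts.get(group, 0) + 1
        (head if counts[group] <= max_per_group else tail).append(item)
    return head + tail
```

Because nothing is dropped, recall is unchanged; only the order in which a dominant group's extra items appear is affected.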
8. Search and Recommender Blending (MEDIUM)
- blend-use-search-alone-for-specific-intent: precision queries
- blend-combine-search-and-personalisation-scores: normalised weighted sum
- blend-keep-hybrid-blending-explainable: traceable results
- blend-never-return-zero-results: guaranteed cascade to non-empty
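A normalised weighted sum with traceable components, plus a cascade that never ends empty, could be sketched as follows. Min-max scaling is assumed here as the normalisation; the rule itself may prescribe something else. The weight of 0.7 and all function names are illustrative.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores to [0, 1]; a constant score set maps to 1.0 for every item."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def blend(
    search: Dict[str, float],
    perso: Dict[str, float],
    search_weight: float = 0.7,
) -> List[Dict[str, Any]]:
    """Blend two score sets, keeping each component so results stay explainable."""
    s, p = min_max(search), min_max(perso)
    rows = []
    for doc_id in set(s) | set(p):
        search_part = search_weight * s.get(doc_id, 0.0)
        perso_part = (1.0 - search_weight) * p.get(doc_id, 0.0)
        rows.append({"id": doc_id, "search": search_part,
                     "perso": perso_part, "total": search_part + perso_part})
    return sorted(rows, key=lambda r: (-r["total"], r["id"]))


def first_non_empty(
    stages: Sequence[Tuple[str, Callable[[], List[Any]]]],
) -> Tuple[str, List[Any]]:
    """Run fallback stages in order and return the first non-empty result."""
    for name, fetch in stages:
        results = fetch()
        if results:
            return name, results
    raise RuntimeError("all stages were empty; the last stage should always return items")
```

Each blended row records how much of the total came from search and how much from personalisation, and `first_non_empty` returns the stage name so logs show which fallback served the request. Note that in this sketch an item missing from one source gets the same 0.0 as that source's lowest-scored item.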
9. Measurement and Experimentation (MEDIUM)
- measure-define-session-success-per-surface: one definition per surface
- measure-track-ndcg-mrr-zero-result-rate: three metrics for one picture
- measure-track-reformulation-rate-as-failure-signal: cheapest failure metric
- measure-use-click-models-for-implicit-judgments: scale beyond human judges
- measure-run-interleaving-as-cheap-ab-proxy: 10x less sample needed
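The three tracked metrics (NDCG, MRR, zero-result rate) have compact definitions. The sketch below uses the linear-gain form of DCG and computes the ideal ordering from the same list of gains, which is a common simplification that ignores relevant documents the system never returned. Function names and input shapes are assumptions.

```python
import math
from typing import Optional, Sequence


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k graded results."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(gains: Sequence[float], k: int) -> float:
    """DCG divided by the DCG of the same gains in ideal (descending) order."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


def mrr(first_relevant_ranks: Sequence[Optional[int]]) -> float:
    """Mean reciprocal rank; ranks are 1-based, None means nothing relevant was shown."""
    if not first_relevant_ranks:
        return 0.0
    total = sum(1.0 / r for r in first_relevant_ranks if r is not None)
    return total / len(first_relevant_ranks)


def zero_result_rate(hit_counts: Sequence[int]) -> float:
    """Fraction of queries that returned no results at all."""
    if not hit_counts:
        return 0.0
    return sum(1 for n in hit_counts if n == 0) / len(hit_counts)
```

For example, `mrr([1, 2, None, 4])` averages 1, 0.5, 0, and 0.25 over four queries, and `zero_result_rate([3, 0, 5, 0])` is 0.5.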
10. Instrumentation, Dashboards and Decision Triggers (MEDIUM)
- monitor-log-every-query-with-full-context: structured replayable events
- monitor-scrub-pii-from-query-logs: redact before warehouse ingestion
- monitor-build-search-health-dashboard: threshold lines, colour bands
- monitor-alert-on-decision-triggers: quality metrics, not error rates
- monitor-track-ranking-stability-churn: RBO churn as leading indicator
- monitor-run-weekly-search-quality-review: calendar-driven ritual
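Ranking churn based on rank-biased overlap (RBO) can be approximated with a truncated sum. The sketch below is a simplified variant, not the extrapolated RBO from the literature: it evaluates agreement only down to the shorter list's length and rescales so that identical lists score exactly 1.0. It assumes neither list contains duplicates; the persistence value `p = 0.9` and the function names are illustrative.

```python
from typing import Hashable, Sequence


def rbo_truncated(a: Sequence[Hashable], b: Sequence[Hashable], p: float = 0.9) -> float:
    """Top-weighted similarity of two rankings, in [0, 1]."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    k = min(len(a), len(b))
    if k == 0:
        return 0.0
    total = 0.0
    for depth in range(1, k + 1):
        # Fraction of items shared by the two prefixes of this depth.
        agreement = len(set(a[:depth]) & set(b[:depth])) / depth
        total += p ** (depth - 1) * agreement
    # Rescale so that identical lists give 1.0 despite the truncation.
    return total * (1.0 - p) / (1.0 - p ** k)


def churn(previous: Sequence[Hashable], current: Sequence[Hashable], p: float = 0.9) -> float:
    """Ranking churn between two snapshots: 0.0 is identical, 1.0 is disjoint."""
    return 1.0 - rbo_truncated(previous, current, p)
```

Computing `churn` for each golden-set query between consecutive releases gives a number that can move before click metrics do, which is what makes it usable as a leading indicator.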
Planning and Improving
Two playbooks compose the rules into end-to-end workflows:
- references/playbooks/planning.md: Plan a new marketplace retrieval system from scratch. Nine-step workflow from intent audit through the first A/B-tested online lift, with explicit exit criteria per step.
- references/playbooks/improving.md: Diagnose and improve an existing retrieval system. Decision tree that walks through telemetry, index freshness, coverage, baseline gap, cold start, segment regressions, and algorithm iteration in that order, with hand-off points to marketplace-personalisation when the bottleneck is personalisation-specific.
Read the playbooks first when the task is "design a new search and recommender project"
or "this retrieval system needs to get better". Read individual rules when a specific
question arises during implementation or review.
How to Use
- Read references/_sections.md for category structure and cascade rationale.
- Read gotchas.md for diagnostic lessons accumulated from prior incidents.
- Read references/playbooks/planning.md to plan a new system.
- Read references/playbooks/improving.md to diagnose an existing one.
- Read individual rule files when a specific task matches the rule title.
- Use assets/templates/_template.md to author new rules as the skill grows.
Related Skills
- marketplace-personalisation: The companion skill covering AWS Personalize implementation, impression tracking, schema design, two-sided matching, feedback loops, and the personalisation-specific diagnostic playbook. Hand off to this skill when the diagnostic identifies a personalisation-specific bottleneck.
Reference Files
| File | Description |
|---|---|
| references/_sections.md | Category definitions and impact ordering |
| references/playbooks/planning.md | Plan a new retrieval system |
| references/playbooks/improving.md | Diagnose an existing retrieval system |
| gotchas.md | Accumulated diagnostic lessons (living) |
| assets/templates/_template.md | Template for authoring new rules |
| metadata.json | Version, discipline, references |