marketplace-search-recsys-planning

Marketplace Engineering Two-Sided Search and Recsys Planning Best Practices

Comprehensive planning, design and diagnostic guide for search and recommendation systems in two-sided trust marketplaces. Covers OpenSearch index, query and ranking patterns, the methodology for planning retrieval work, the handoff points to recommendation-specific tooling, and the instrumentation and dashboard layer that turns measurement into ongoing decision making. Contains 57 rules across 10 categories ordered by cascade impact, plus two playbooks (plan a new system from scratch, diagnose an existing one) and explicit living-artefact conventions (decisions log, golden set, gotchas).

When to Apply

Reference this skill when:

Planning a new marketplace retrieval project from scratch
Reviewing an existing retrieval system that feels stale, unfair, or unpersonalised
Designing the OpenSearch index mapping, analyzers, or query DSL
Choosing retrieval primitives per product surface (search, recs, hybrid, curated)
Deciding which search quality metrics to track and dashboard
Running the weekly search-quality review ritual
Diagnosing a silent regression in ranking, coverage, or zero-result rate
Deciding when a retrieval problem is actually a personalisation problem

This skill is the precursor to `marketplace-personalisation`. Start here for planning and search work; hand off to the personalisation skill when the diagnosed bottleneck is impression tracking, feedback-loop bias, or AWS Personalize-specific design.

Living Context

This skill treats the system as evolving. Three living artefacts carry context across sessions, releases, and team changes — read them before making suggestions, update them after every shipped change:

`gotchas.md` (in this skill folder): append-only diagnostic lessons. Every gotcha has a date and a short description of what surprised the team and how it was resolved.
Decisions log (maintained in the product repo, typically `decisions/*.md`): every ranking change, schema tweak, and synonym edit recorded with its hypothesis, offline and online evidence, ship criterion, outcome, and rollback path. See rule `plan-maintain-a-decisions-log`.
Golden query set (frozen per eval cycle, committed to the product repo): the reference set of queries against which every ranking change is offline-evaluated before an online test. See rule `plan-version-the-golden-set`.

Rule Categories

Categories are ordered by cascade impact on the retrieval lifecycle: intent misunderstanding poisons architecture; wrong architecture poisons index; wrong index poisons retrieval forever until a reindex; every downstream layer inherits the upstream error.


| # | Category | Prefix | Impact |
|---|----------|--------|--------|
| 1 | Problem Framing and User Intent | `intent-` | CRITICAL |
| 2 | Surface Taxonomy and Architecture | `arch-` | CRITICAL |
| 3 | Index Design and Mapping | `index-` | HIGH |
| 4 | Planning and Improvement Methodology | `plan-` | HIGH |
| 5 | Query Understanding | `query-` | MEDIUM-HIGH |
| 6 | Retrieval Strategy | `retrieve-` | MEDIUM-HIGH |
| 7 | Relevance and Ranking | `rank-` | MEDIUM-HIGH |
| 8 | Search and Recommender Blending | `blend-` | MEDIUM |
| 9 | Measurement and Experimentation | `measure-` | MEDIUM |
| 10 | Instrumentation, Dashboards and Decision Triggers | `monitor-` | MEDIUM |

Quick Reference

1. Problem Framing and User Intent (CRITICAL)

`intent-map-queries-to-intent-classes`: classify before retrieving
`intent-separate-known-item-from-discovery`: different failure modes, different strategies
`intent-audit-live-query-logs-first`: design from real data, not imagined data
`intent-distinguish-transactional-from-exploratory`: precision vs diversity
`intent-reject-one-search-for-everything`: per-surface query shapes
`intent-treat-no-search-as-first-class-choice`: curated is a legitimate answer

2. Surface Taxonomy and Architecture (CRITICAL)

`arch-map-surface-to-retrieval-primitive`: a single-source-of-truth routing table
`arch-split-candidate-generation-from-ranking`: two-stage pipelines
`arch-design-zero-result-fallback`: declare fallback owner per surface
`arch-design-for-cold-start-from-day-one`: cold start is permanent, not bootstrap
`arch-avoid-mono-stack-retrieval`: diversify primary dependencies
`arch-route-surfaces-deliberately`: every routing decision recorded

3. Index Design and Mapping (HIGH)

`index-design-mappings-conservatively`: reindex is expensive
`index-use-keyword-and-text-as-multi-fields`: full-text plus exact match
`index-match-index-and-query-time-analyzers`: tokens must agree
`index-use-language-analyzers-for-language-fields`: language-aware stemming
`index-separate-searchable-from-display-fields`: index only what you search
`index-use-index-templates-for-consistency`: prevent mapping drift
`index-stream-listing-updates-via-cdc`: freshness in seconds, not hours
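
As an illustration of the multi-field, analyzer, and searchable-versus-display rules above, here is a minimal sketch of an index mapping written as a Python dictionary. The field names (`title`, `category`, `hero_image_url`) and the helper `searchable_fields` are hypothetical and not taken from the skill's rule files; the mapping keys themselves (`fields`, `analyzer`, `index`, `dynamic`) are standard OpenSearch mapping options.

```python
# Hypothetical listing mapping: every field name here is an illustrative assumption.
LISTING_MAPPING = {
    "mappings": {
        # Reject unmapped fields so the mapping cannot drift silently.
        "dynamic": "strict",
        "properties": {
            # Full-text search on `title`, exact match and sorting on `title.raw`.
            # With no separate search_analyzer, the same analyzer is used at
            # index time and query time, so tokens agree.
            "title": {
                "type": "text",
                "analyzer": "english",
                "fields": {"raw": {"type": "keyword", "ignore_above": 256}},
            },
            # Exact-match facet, usable in filter clauses.
            "category": {"type": "keyword"},
            # Display-only field: stored in _source but not searchable.
            "hero_image_url": {"type": "keyword", "index": False, "doc_values": False},
        },
    }
}


def searchable_fields(mapping: dict) -> list:
    """Return the top-level fields that are indexed for search."""
    properties = mapping["mappings"]["properties"]
    return sorted(name for name, spec in properties.items() if spec.get("index", True))
```

A quick check such as `searchable_fields(LISTING_MAPPING)` can be run in review to confirm that display-only fields have not crept into the searchable set.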

4. Planning and Improvement Methodology (HIGH)

`plan-audit-before-you-build`: instrumentation gate on kick-off
`plan-build-golden-query-set-first`: the first artefact, not the last
`plan-find-bottleneck-before-optimising`: theory of constraints
`plan-maintain-a-decisions-log`: living context across team changes
`plan-version-the-golden-set`: frozen per eval cycle
`plan-handoff-to-personalisation-skill`: recognise the boundary

5. Query Understanding (MEDIUM-HIGH)

`query-normalise-before-anything-else`: canonical string in
`query-use-language-analyzers-for-stemming`: double-digit recall wins
`query-curate-synonyms-by-domain`: domain vocabulary not thesaurus
`query-use-fuzzy-matching-for-typos`: 10-15% of queries have typos
`query-classify-before-routing`: single-pass classifier
`query-build-autocomplete-on-separate-index`: latency isolation
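
The normalisation and fuzzy-matching rules above can be sketched as follows. This is one plausible reading, not the skill's own implementation: the function names and the exact normalisation steps (NFKC, case folding, whitespace collapsing) are assumptions, while `fuzziness: "AUTO"` is a standard option of the OpenSearch `match` query.

```python
import re
import unicodedata


def normalise_query(raw: str) -> str:
    """Reduce a raw query to one canonical string before any other step."""
    text = unicodedata.normalize("NFKC", raw)  # fold width and compatibility forms
    text = text.casefold()                     # aggressive, locale-independent lowercasing
    return re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace


def build_fuzzy_match(field: str, raw_query: str) -> dict:
    """Build a match clause that tolerates typos via edit-distance matching."""
    return {
        "match": {
            field: {
                "query": normalise_query(raw_query),
                "fuzziness": "AUTO",  # allowed edit distance scales with term length
            }
        }
    }
```

Normalising first means the cache key, the log entry, and the query sent to the engine all see the same string.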

6. Retrieval Strategy (MEDIUM-HIGH)

`retrieve-use-filter-clauses-for-exact-matches`: filter cache wins
`retrieve-use-bool-structure-deliberately`: must vs should vs filter
`retrieve-run-expensive-signals-in-rescore`: rescore window limits cost
`retrieve-combine-bm25-and-knn-via-hybrid-search`: lexical plus semantic
`retrieve-paginate-with-search-after`: constant-cost deep pagination
`retrieve-choose-embedding-model-deliberately`: re-embedding is expensive
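
To make the filter-clause and `search_after` rules above concrete, here is a minimal sketch of a request-body builder. The field names (`title`, `city`, `listing_id`) and the function `build_search_body` are hypothetical; the sketch assumes `listing_id` is a unique keyword field that can serve as a sort tiebreaker.

```python
from typing import Optional, Sequence


def build_search_body(text: str, city: str, page_size: int = 20,
                      search_after: Optional[Sequence] = None) -> dict:
    """Build a bool query with a scored clause, a cached filter, and cursor paging."""
    body = {
        "size": page_size,
        "query": {
            "bool": {
                # Scored clause: contributes to relevance.
                "must": [{"match": {"title": {"query": text}}}],
                # Unscored clause: exact match, eligible for the filter cache.
                "filter": [{"term": {"city": city}}],
            }
        },
        # A unique tiebreaker keeps the sort order total, which search_after needs.
        "sort": [{"_score": "desc"}, {"listing_id": "asc"}],
    }
    if search_after is not None:
        # Sort values of the last hit on the previous page.
        body["search_after"] = list(search_after)
    return body
```

For the next page, pass the `sort` values of the last hit from the previous response as `search_after` instead of increasing `from`.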

7. Relevance and Ranking (MEDIUM-HIGH)

`rank-tune-bm25-parameters-last`: upstream levers first
`rank-use-function-score-for-business-signals`: explicit named functions
`rank-deploy-ltr-only-after-golden-set-exists`: supervised learning needs labels
`rank-apply-diversity-at-rank-time`: after scoring, not before
`rank-normalise-scores-across-retrieval-primitives`: comparable scales

8. Search and Recommender Blending (MEDIUM)

`blend-use-search-alone-for-specific-intent`: precision queries
`blend-combine-search-and-personalisation-scores`: normalised weighted sum
`blend-keep-hybrid-blending-explainable`: traceable results
`blend-never-return-zero-results`: guaranteed cascade to non-empty
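
The "normalised weighted sum" and "traceable results" rules above could look roughly like the sketch below. Min-max normalisation and the default weight of 0.7 are assumptions chosen for illustration; the skill's rule files may prescribe a different normaliser or weighting.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so different retrieval primitives are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {item: 1.0 for item in scores}
    return {item: (value - lo) / (hi - lo) for item, value in scores.items()}


def blend(search: Dict[str, float], personalisation: Dict[str, float],
          weight_search: float = 0.7) -> List[dict]:
    """Blend two score sets and keep each component so results stay explainable."""
    s_norm, p_norm = min_max(search), min_max(personalisation)
    rows = []
    for item in set(s_norm) | set(p_norm):
        s_part = weight_search * s_norm.get(item, 0.0)
        p_part = (1.0 - weight_search) * p_norm.get(item, 0.0)
        rows.append({"id": item, "score": s_part + p_part,
                     "search_part": s_part, "personalisation_part": p_part})
    # Highest score first; ties broken by id for a deterministic order.
    return sorted(rows, key=lambda row: (-row["score"], row["id"]))
```

Returning the per-source components alongside the final score lets a reviewer see why an item ranked where it did.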

9. Measurement and Experimentation (MEDIUM)

`measure-define-session-success-per-surface`: one definition per surface
`measure-track-ndcg-mrr-zero-result-rate`: three metrics for one picture
`measure-track-reformulation-rate-as-failure-signal`: cheapest failure metric
`measure-use-click-models-for-implicit-judgments`: scale beyond human judges
`measure-run-interleaving-as-cheap-ab-proxy`: 10x less sample needed
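
The three metrics named in `measure-track-ndcg-mrr-zero-result-rate` can be computed with a few lines of Python. This sketch assumes linear gain (the graded label is used directly rather than `2**label - 1`) and assumes judgments come from a golden query set; the function names and input shapes are illustrative.

```python
import math
from typing import List, Optional, Sequence


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain with a log2 position discount."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains))


def ndcg_at_k(ranked_gains: Sequence[float], judged_gains: Sequence[float], k: int) -> float:
    """NDCG@k: DCG of the returned order divided by DCG of the ideal order."""
    ideal = dcg(sorted(judged_gains, reverse=True)[:k])
    return dcg(ranked_gains[:k]) / ideal if ideal > 0 else 0.0


def mrr(first_relevant_ranks: List[Optional[int]]) -> float:
    """Mean reciprocal rank; ranks are 1-based, None means nothing relevant was shown."""
    if not first_relevant_ranks:
        return 0.0
    return sum(1.0 / r for r in first_relevant_ranks if r) / len(first_relevant_ranks)


def zero_result_rate(result_counts: Sequence[int]) -> float:
    """Share of queries that returned no results at all."""
    if not result_counts:
        return 0.0
    return sum(1 for count in result_counts if count == 0) / len(result_counts)
```

Read together, NDCG describes ordering quality, MRR describes how quickly the first good result appears, and zero-result rate describes coverage.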

10. Instrumentation, Dashboards and Decision Triggers (MEDIUM)

`monitor-log-every-query-with-full-context`: structured replayable events
`monitor-scrub-pii-from-query-logs`: redact before warehouse ingestion
`monitor-build-search-health-dashboard`: threshold lines, colour bands
`monitor-alert-on-decision-triggers`: quality metrics, not error rates
`monitor-track-ranking-stability-churn`: RBO churn as leading indicator
`monitor-run-weekly-search-quality-review`: calendar-driven ritual
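
For `monitor-track-ranking-stability-churn`, one way to turn rank-biased overlap (RBO) into a churn number is sketched below. This is a truncated, normalised variant chosen for simplicity (identical lists score exactly 1), not the extrapolated form from the RBO literature; the persistence parameter `p = 0.9` and the function names are assumptions.

```python
from typing import Sequence


def rbo_truncated(a: Sequence[str], b: Sequence[str], p: float = 0.9) -> float:
    """Top-weighted overlap of two rankings, evaluated to the shorter list's depth.

    Assumes neither list contains duplicate ids.
    """
    k = min(len(a), len(b))
    if k == 0:
        return 1.0 if len(a) == len(b) else 0.0
    weighted = 0.0
    norm = 0.0
    for depth in range(1, k + 1):
        # Fraction of the top `depth` items the two rankings share.
        agreement = len(set(a[:depth]) & set(b[:depth])) / depth
        weight = p ** (depth - 1)  # earlier ranks count more
        weighted += weight * agreement
        norm += weight
    return weighted / norm


def churn(previous: Sequence[str], current: Sequence[str], p: float = 0.9) -> float:
    """Ranking churn for one query: 0 means unchanged, 1 means fully replaced."""
    return 1.0 - rbo_truncated(previous, current, p)
```

Averaging `churn` over the golden query set between two daily snapshots gives a single number that can move before click-based metrics do.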

Planning and Improving

Two playbooks compose the rules into end-to-end workflows:

`references/playbooks/planning.md`: Plan a new marketplace retrieval system from scratch. Nine-step workflow from intent audit through the first A/B-tested online lift, with explicit exit criteria per step.
`references/playbooks/improving.md`: Diagnose and improve an existing retrieval system. Decision tree that walks through telemetry, index freshness, coverage, baseline gap, cold start, segment regressions, and algorithm iteration in that order, with hand-off points to `marketplace-personalisation` when the bottleneck is personalisation-specific.

Read the playbooks first when the task is "design a new search and recommender project" or "this retrieval system needs to get better". Read individual rules when a specific question arises during implementation or review.

How to Use

Read `references/_sections.md` for category structure and cascade rationale.
Read `gotchas.md` for diagnostic lessons accumulated from prior incidents.
Read `references/playbooks/planning.md` to plan a new system.
Read `references/playbooks/improving.md` to diagnose an existing one.
Read individual rule files when a specific task matches the rule title.
Use `assets/templates/_template.md` to author new rules as the skill grows.

Reference Files

| File | Description |
|------|-------------|
| `references/_sections.md` | Category definitions and impact ordering |
| `references/playbooks/planning.md` | Plan a new retrieval system |
| `references/playbooks/improving.md` | Diagnose an existing retrieval system |
| `gotchas.md` | Accumulated diagnostic lessons (living) |
| `assets/templates/_template.md` | Template for authoring new rules |
| `metadata.json` | Version, discipline, references |
