Marketplace Engineering Two-Sided Search and Recsys Planning Best Practices
Comprehensive planning, design and diagnostic guide for search and recommendation systems
in two-sided trust marketplaces. Covers OpenSearch index, query and ranking patterns, the
methodology for planning retrieval work, the handoff points to recommendation-specific
tooling, and the instrumentation and dashboard layer that turns measurement into ongoing
decision making. Contains 57 rules across 10 categories ordered by cascade impact, plus
two playbooks (plan a new system from scratch, diagnose an existing one) and explicit
living-artefact conventions (decisions log, golden set, gotchas).
When to Apply
Reference this skill when:
- Planning a new marketplace retrieval project from scratch
- Reviewing an existing retrieval system that feels stale, unfair, or unpersonalised
- Designing the OpenSearch index mapping, analyzers, or query DSL
- Choosing retrieval primitives per product surface (search, recs, hybrid, curated)
- Deciding which search quality metrics to track and dashboard
- Running the weekly search-quality review ritual
- Diagnosing a silent regression in ranking, coverage, or zero-result rate
- Deciding when a retrieval problem is actually a personalisation problem
This skill is the precursor to marketplace-personalisation. Start here for planning and
search work; hand off to the personalisation skill when the diagnosed bottleneck is
impression tracking, feedback-loop bias, or AWS Personalize-specific design.
Living Context
This skill treats the system as evolving. Three living artefacts carry context across
sessions, releases, and team changes — read them before making suggestions, update them
after every shipped change:
- gotchas.md (in this skill folder) — append-only diagnostic lessons. Every gotcha
  has a date and a short description of what surprised the team and how it was resolved.
- Decisions log (maintained in the product repo) — every ranking change, schema tweak,
  and synonym edit recorded with its hypothesis, offline and online evidence, ship
  criterion, outcome, and rollback path. See rule plan-maintain-a-decisions-log.
- Golden query set (frozen per eval cycle, committed to the product repo) — the
  reference set of queries against which every ranking change is offline-evaluated
  before an online test. See rule plan-version-the-golden-set.
Rule Categories
Categories are ordered by cascade impact on the retrieval lifecycle: intent
misunderstanding poisons architecture; wrong architecture poisons the index; a wrong
index poisons every query until a reindex; each downstream layer inherits the
upstream error.
| # | Category | Prefix | Impact |
|---|---|---|---|
| 1 | Problem Framing and User Intent | intent- | CRITICAL |
| 2 | Surface Taxonomy and Architecture | arch- | CRITICAL |
| 3 | Index Design and Mapping | index- | HIGH |
| 4 | Planning and Improvement Methodology | plan- | HIGH |
| 5 | Query Understanding | query- | MEDIUM-HIGH |
| 6 | Retrieval Strategy | retrieve- | MEDIUM-HIGH |
| 7 | Relevance and Ranking | rank- | MEDIUM-HIGH |
| 8 | Search and Recommender Blending | blend- | MEDIUM |
| 9 | Measurement and Experimentation | measure- | MEDIUM |
| 10 | Instrumentation, Dashboards and Decision Triggers | monitor- | MEDIUM |
Quick Reference
1. Problem Framing and User Intent (CRITICAL)
- intent-map-queries-to-intent-classes — classify before retrieving
- intent-separate-known-item-from-discovery — different failure modes, different strategies
- intent-audit-live-query-logs-first — design from real data, not imagined data
- intent-distinguish-transactional-from-exploratory — precision vs diversity
- intent-reject-one-search-for-everything — per-surface query shapes
- intent-treat-no-search-as-first-class-choice — curated is a legitimate answer
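A minimal sketch of the single-pass intent classifier the rules above describe. The intent classes, regex heuristics, and function name are illustrative assumptions — a real taxonomy comes out of the live query-log audit, not from imagined patterns like these:

```python
import re

# Hypothetical intent classes; a real taxonomy comes from the query-log audit.
KNOWN_ITEM = "known_item"        # user can name the exact listing or seller
TRANSACTIONAL = "transactional"  # ready to book or buy, wants precision
EXPLORATORY = "exploratory"      # browsing, wants diversity

def classify_intent(query: str) -> str:
    """Toy single-pass classifier: cheap rules first, exploratory as the default."""
    q = query.strip().lower()
    # Quoted phrases or seller handles suggest known-item intent.
    if re.search(r'"[^"]+"|@\w+', q):
        return KNOWN_ITEM
    # Explicit price or time constraints suggest transactional intent.
    if re.search(r"\bunder \$?\d+|\btonight\b|\btomorrow\b", q):
        return TRANSACTIONAL
    return EXPLORATORY
```

The ordering matters: the classifier answers once, before routing, so precision-oriented rules fire before the diversity-oriented default.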
2. Surface Taxonomy and Architecture (CRITICAL)
- arch-map-surface-to-retrieval-primitive — a single-source-of-truth routing table
- arch-split-candidate-generation-from-ranking — two-stage pipelines
- arch-design-zero-result-fallback — declare fallback owner per surface
- arch-design-for-cold-start-from-day-one — cold start is permanent, not bootstrap
- arch-avoid-mono-stack-retrieval — diversify primary dependencies
- arch-route-surfaces-deliberately — every routing decision recorded
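The single-source-of-truth routing table can be as simple as a checked-in dict: one row per surface, each declaring its retrieval primitive and its zero-result fallback owner. Surface and fallback names below are invented for illustration:

```python
# Hypothetical routing table: one row per product surface. Each surface names
# its primary retrieval primitive AND the owner of its zero-result fallback.
SURFACE_ROUTING = {
    "search_results": {"primitive": "search",  "fallback": "category_popular"},
    "home_feed":      {"primitive": "recs",    "fallback": "curated_editorial"},
    "similar_items":  {"primitive": "hybrid",  "fallback": "same_category_recent"},
    "collections":    {"primitive": "curated", "fallback": "curated_editorial"},
}

def route(surface: str) -> dict:
    """Every surface is routed deliberately; an unrouted surface is a bug."""
    if surface not in SURFACE_ROUTING:
        raise KeyError(f"unrouted surface: {surface}")
    return SURFACE_ROUTING[surface]
```

Raising on an unknown surface, rather than silently defaulting, is what makes the table a single source of truth.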
3. Index Design and Mapping (HIGH)
- index-design-mappings-conservatively — reindex is expensive
- index-use-keyword-and-text-as-multi-fields — full-text plus exact match
- index-match-index-and-query-time-analyzers — tokens must agree
- index-use-language-analyzers-for-language-fields — language-aware stemming
- index-separate-searchable-from-display-fields — index only what you search
- index-use-index-templates-for-consistency — prevent mapping drift
- index-stream-listing-updates-via-cdc — freshness in seconds, not hours
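Several of these rules compose into a single mapping body. A sketch, with invented field names, expressed as the Python dict you would pass to the OpenSearch create-index API: a text field with a keyword sub-field (multi-fields), one named analyzer used at both index and query time, and display-only data excluded from search:

```python
# Illustrative OpenSearch mapping body; field names are examples, not a schema
# recommendation.
listing_mapping = {
    "settings": {
        "analysis": {
            "analyzer": {
                # Named once and referenced from the mapping, so index-time and
                # query-time tokenisation agree by default.
                "listing_text": {"type": "standard"}  # swap per-language analyzers here
            }
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "listing_text",
                # Multi-field: full-text search on "title", exact match,
                # sorting and aggregations on "title.raw".
                "fields": {"raw": {"type": "keyword"}},
            },
            "category_id": {"type": "keyword"},  # filterable, cache-friendly
            # Display-only payload: stored and returned, never indexed or searched.
            "display_blob": {"type": "object", "enabled": False},
        }
    },
}
```

Because reindexing is expensive, the conservative move is to start from a template like this and add fields, rather than index everything and prune later.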
4. Planning and Improvement Methodology (HIGH)
- plan-audit-before-you-build — instrumentation gate on kick-off
- plan-build-golden-query-set-first — the first artefact, not the last
- plan-find-bottleneck-before-optimising — theory of constraints
- plan-maintain-a-decisions-log — living context across team changes
- plan-version-the-golden-set — frozen per eval cycle
- plan-handoff-to-personalisation-skill — recognise the boundary
5. Query Understanding (MEDIUM-HIGH)
- query-normalise-before-anything-else — canonical string in
- query-use-language-analyzers-for-stemming — double-digit recall wins
- query-curate-synonyms-by-domain — domain vocabulary not thesaurus
- query-use-fuzzy-matching-for-typos — 10-15% of queries have typos
- query-classify-before-routing — single-pass classifier
- query-build-autocomplete-on-separate-index — latency isolation
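Normalisation and fuzzy matching in sketch form. The normaliser is a minimal canonicaliser (Unicode fold, whitespace collapse, casefold); the query body uses OpenSearch's `fuzziness: "AUTO"`, which scales allowed edits with term length (0 edits up to 2 characters, 1 for 3-5, 2 above). The field name `title` is an assumption:

```python
import unicodedata

def normalise(query: str) -> str:
    """Canonical string in: NFKC-fold, collapse whitespace, casefold."""
    q = unicodedata.normalize("NFKC", query)
    return " ".join(q.split()).casefold()

def fuzzy_match_query(raw_query: str) -> dict:
    """Illustrative OpenSearch match query with typo tolerance."""
    return {
        "query": {
            "match": {
                "title": {
                    "query": normalise(raw_query),
                    "fuzziness": "AUTO",
                    "prefix_length": 1,  # keep the first letter exact; cheaper, safer
                }
            }
        }
    }
```

Normalising before anything else means synonyms, classification, and caching all see the same canonical string.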
6. Retrieval Strategy (MEDIUM-HIGH)
- retrieve-use-filter-clauses-for-exact-matches — filter cache wins
- retrieve-use-bool-structure-deliberately — must vs should vs filter
- retrieve-run-expensive-signals-in-rescore — rescore window limits cost
- retrieve-combine-bm25-and-knn-via-hybrid-search — lexical plus semantic
- retrieve-paginate-with-search-after — constant-cost deep pagination
- retrieve-choose-embedding-model-deliberately — re-embedding is expensive
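The bool-structure and rescore rules compose into one query body. A sketch with assumed field names: the exact-match constraint sits in `filter` (cached, unscored), the text match in `must` (scored), and a more expensive phrase signal runs only over the top rescore window rather than the full match set:

```python
def listing_search(text: str, category: str, size: int = 20) -> dict:
    """Illustrative two-phase OpenSearch query: cheap bool retrieval, then an
    expensive signal applied only to the top window via rescore."""
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],             # scored
                "filter": [{"term": {"category_id": category}}],  # cached, unscored
            }
        },
        "rescore": {
            "window_size": 100,  # expensive signal touches 100 docs, not millions
            "query": {
                "rescore_query": {
                    "match_phrase": {"title": {"query": text, "slop": 2}}
                },
                "query_weight": 1.0,
                "rescore_query_weight": 1.5,
            },
        },
    }
```

Keeping exact matches in `filter` rather than `must` is what lets the filter cache absorb repeated category constraints.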
7. Relevance and Ranking (MEDIUM-HIGH)
- rank-tune-bm25-parameters-last — upstream levers first
- rank-use-function-score-for-business-signals — explicit named functions
- rank-deploy-ltr-only-after-golden-set-exists — supervised learning needs labels
- rank-apply-diversity-at-rank-time — after scoring, not before
- rank-normalise-scores-across-retrieval-primitives — comparable scales
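What "explicit named functions" looks like in practice: a function_score wrapper where each business signal is a separate, inspectable entry. Field names and constants below are illustrative assumptions, not tuned values:

```python
def ranked_query(base_query: dict) -> dict:
    """Illustrative function_score wrapper: each business signal is one
    explicit function, so its contribution can be inspected and removed."""
    return {
        "query": {
            "function_score": {
                "query": base_query,
                "functions": [
                    # Freshness: gaussian decay, roughly half weight at 30 days.
                    {"gauss": {"listed_at": {"origin": "now", "scale": "30d", "decay": 0.5}}},
                    # Trust: log-dampened review count so whales don't dominate.
                    {"field_value_factor": {
                        "field": "review_count", "modifier": "log1p", "factor": 0.5
                    }},
                ],
                "score_mode": "sum",       # how the functions combine with each other
                "boost_mode": "multiply",  # how they combine with the text score
            }
        }
    }
```

Separate named functions are what make a ranking change reviewable in the decisions log: each hypothesis maps to one function, not to an opaque script.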
8. Search and Recommender Blending (MEDIUM)
- blend-use-search-alone-for-specific-intent — precision queries
- blend-combine-search-and-personalisation-scores — normalised weighted sum
- blend-keep-hybrid-blending-explainable — traceable results
- blend-never-return-zero-results — guaranteed cascade to non-empty
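The normalised-weighted-sum rule in sketch form: min-max normalise each primitive's scores to a shared [0, 1] scale first, then blend. The weight and the normalisation choice are assumptions to be tuned, not recommendations:

```python
def minmax(scores: dict) -> dict:
    """Min-max normalise a {doc_id: score} map to [0, 1] so scores from
    different retrieval primitives become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # guard against a constant score set
    return {d: (s - lo) / span for d, s in scores.items()}

def blend(search_scores: dict, rec_scores: dict, w_search: float = 0.7) -> list:
    """Normalised weighted sum of search and personalisation scores;
    returns doc ids best-first."""
    s, r = minmax(search_scores), minmax(rec_scores)
    blended = {
        d: w_search * s.get(d, 0.0) + (1 - w_search) * r.get(d, 0.0)
        for d in set(s) | set(r)
    }
    return sorted(blended, key=blended.get, reverse=True)
```

Keeping the per-source contributions around (rather than discarding them after the sum) is the cheap way to keep each blended result explainable.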
9. Measurement and Experimentation (MEDIUM)
- measure-define-session-success-per-surface — one definition per surface
- measure-track-ndcg-mrr-zero-result-rate — three metrics for one picture
- measure-use-click-models-for-implicit-judgments — scale beyond human judges
- measure-track-reformulation-rate-as-failure-signal — cheapest failure metric
- measure-run-interleaving-as-cheap-ab-proxy — 10x less sample needed
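The two offline metrics from the tracking rule, as minimal reference implementations over golden-set judgments (graded gains in ranked order for nDCG; the 1-based rank of the first relevant hit for MRR):

```python
import math

def dcg(gains):
    """Discounted cumulative gain: gain at rank i discounted by log2(i + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_at_k(ranked_gains, k=10):
    """nDCG@k: DCG of the returned order divided by DCG of the ideal order."""
    ideal = dcg(sorted(ranked_gains, reverse=True)[:k])
    return dcg(ranked_gains[:k]) / ideal if ideal else 0.0

def reciprocal_rank(rank_of_first_relevant):
    """Per-query RR; None means nothing relevant was returned. MRR is the mean."""
    return 0.0 if rank_of_first_relevant is None else 1.0 / rank_of_first_relevant
```

The zero-result rate needs no formula — it is the share of logged queries whose result count is zero — which is why the three together give one picture: quality at the top (nDCG), time-to-first-good (MRR), and outright failure (zero-result rate).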
10. Instrumentation, Dashboards and Decision Triggers (MEDIUM)
- monitor-log-every-query-with-full-context — structured replayable events
- monitor-scrub-pii-from-query-logs — redact before warehouse ingestion
- monitor-build-search-health-dashboard — threshold lines, colour bands
- monitor-alert-on-decision-triggers — quality metrics, not error rates
- monitor-track-ranking-stability-churn — RBO churn as leading indicator
- monitor-run-weekly-search-quality-review — calendar-driven ritual
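A sketch of the RBO churn signal: rank-biased overlap (Webber et al.) between yesterday's and today's top-k for a golden query, truncated at the shorter list. The truncation makes it a lower-bound estimate (even identical lists score below 1), which is fine for a churn trend line; the persistence parameter p is an assumption to tune:

```python
def rbo_truncated(a, b, p=0.9):
    """Truncated rank-biased overlap of two ranked lists. Higher p weights
    deeper ranks more; the ignored tail makes this a lower-bound estimate."""
    k = min(len(a), len(b))
    seen_a, seen_b = set(), set()
    score = 0.0
    for d in range(1, k + 1):
        seen_a.add(a[d - 1])
        seen_b.add(b[d - 1])
        # Agreement at depth d: overlap of the two top-d prefixes.
        score += (p ** (d - 1)) * len(seen_a & seen_b) / d
    return (1 - p) * score

def ranking_churn(yesterday, today, p=0.9):
    """Day-over-day churn for one query: 1 minus overlap, relative to the
    self-overlap ceiling so identical lists score exactly 0 churn."""
    ceiling = rbo_truncated(yesterday, yesterday, p)
    return 1.0 - rbo_truncated(yesterday, today, p) / ceiling if ceiling else 0.0
```

Tracked per golden query and averaged, a churn spike flags a ranking shift before any engagement metric moves — the leading-indicator role the rule assigns it.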
Planning and Improving
Two playbooks compose the rules into end-to-end workflows:
- references/playbooks/planning.md — Plan a new marketplace retrieval system from
  scratch. Nine-step workflow from intent audit through the first A/B-tested online
  lift, with explicit exit criteria per step.
- references/playbooks/improving.md — Diagnose and improve an existing retrieval
  system. Decision tree that walks through telemetry, index freshness, coverage,
  baseline gap, cold start, segment regressions, and algorithm iteration in that
  order, with hand-off points to marketplace-personalisation when the bottleneck is
  personalisation-specific.
Read the playbooks first when the task is "design a new search and recommender project"
or "this retrieval system needs to get better". Read individual rules when a specific
question arises during implementation or review.
How to Use
- Read references/_sections.md for category structure and cascade rationale.
- Read gotchas.md for diagnostic lessons accumulated from prior incidents.
- Read references/playbooks/planning.md to plan a new system.
- Read references/playbooks/improving.md to diagnose an existing one.
- Read individual rule files when a specific task matches the rule title.
- Use assets/templates/_template.md to author new rules as the skill grows.
Related Skills
- marketplace-personalisation — The companion skill covering AWS Personalize
  implementation, impression tracking, schema design, two-sided matching, feedback
  loops, and the personalisation-specific diagnostic playbook. Hand off to it when
  the diagnostic identifies a personalisation-specific bottleneck.
Reference Files
| File | Description |
|---|---|
| references/_sections.md | Category definitions and impact ordering |
| references/playbooks/planning.md | Plan a new retrieval system |
| references/playbooks/improving.md | Diagnose an existing retrieval system |
| gotchas.md | Accumulated diagnostic lessons (living) |
| assets/templates/_template.md | Template for authoring new rules |
| metadata.json | Version, discipline, references |