E-Commerce Product Ranking
Overview
E-commerce ranking combines text relevance (BM25) with commercial signals (CTR, conversion rate, revenue, margin) into a unified ranking score. It uses learning-to-rank (LTR) models trained on click and conversion data to optimize for business-relevant outcomes.
When to Use
Trigger conditions:
- Building a product search/browse ranking beyond pure text relevance
- Incorporating business metrics (margin, inventory) into ranking
- Implementing a learning-to-rank pipeline
When NOT to use:
- For pure text search relevance only (use BM25)
- When no click/conversion data exists (start with rule-based ranking)
Algorithm
IRON LAW: Relevance Is Necessary But NOT Sufficient for E-Commerce Ranking
A result that is textually relevant but has zero sales history, no
reviews, and is out of stock serves no one. E-commerce ranking must
balance: relevance (does it match the query?), quality (is it a good
product?), and commercial value (does it generate revenue?).
Phase 1: Input Validation
Collect features per product-query pair: text relevance score (BM25), historical CTR, conversion rate, average rating, review count, price competitiveness, inventory level, margin.
Gate: Minimum features available, click data from 30+ days.
Phase 2: Core Algorithm
Rule-based baseline: Score = w₁×relevance + w₂×popularity + w₃×rating + w₄×recency. Manually tune weights.
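The rule-based baseline can be sketched as a weighted sum over normalized signals. The weights and feature names below are illustrative assumptions, not tuned values:

```python
def baseline_score(features, weights=None):
    """Weighted sum of normalized ranking signals (each assumed in [0, 1])."""
    if weights is None:
        # Assumed starting weights; tune manually against offline metrics.
        weights = {"relevance": 0.5, "popularity": 0.25,
                   "rating": 0.15, "recency": 0.10}
    return sum(w * features.get(name, 0.0) for name, w in weights.items())

products = [
    {"id": "P1", "relevance": 0.9, "popularity": 0.2, "rating": 0.8, "recency": 0.5},
    {"id": "P2", "relevance": 0.7, "popularity": 0.9, "rating": 0.9, "recency": 0.4},
]
ranked = sorted(products, key=baseline_score, reverse=True)
```

Note that P2 outranks P1 despite lower text relevance, because popularity and rating compensate under these weights.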
LTR approach:
- Generate training data from click logs (clicked = positive, skipped = negative, with position debiasing)
- Features: text match, behavioral (CTR, add-to-cart rate), product quality (rating, reviews), freshness, price
- Train: LambdaMART or gradient-boosted ranking model optimizing NDCG
- Blend: final_score = α × LTR_score + (1-α) × business_boost
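The label-generation and blending steps above can be sketched as follows. The session format and α value are assumptions for illustration; the skip-above heuristic labels items skipped above the lowest click as negatives:

```python
def training_examples(session):
    """Turn one search session's ordered impressions into labeled examples:
    clicked items are positives; items skipped above the lowest click are
    negatives. Items below the lowest click are discarded (not examined).
    Position debiasing is applied separately as a sample weight."""
    clicked_positions = [i for i, imp in enumerate(session) if imp["clicked"]]
    if not clicked_positions:
        return []
    lowest_click = max(clicked_positions)
    return [(imp["product_id"], 1 if imp["clicked"] else 0)
            for imp in session[:lowest_click + 1]]

def blend(ltr_score, business_boost, alpha=0.8):
    """final_score = alpha * LTR_score + (1 - alpha) * business_boost."""
    return alpha * ltr_score + (1 - alpha) * business_boost
```

A higher α keeps the ranking closer to the learned model; lowering it gives business rules (margin, inventory pushes) more influence.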
Phase 3: Verification
Evaluate offline: NDCG@10, MRR. A/B test online: revenue per search, click-through rate, conversion rate.
Gate: NDCG improves over baseline, A/B test positive on primary metric.
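The offline gate on NDCG@10 can be computed as a minimal sketch, assuming graded relevance labels are already available per result:

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k relevance labels."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG normalized by the ideal (descending-sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

A perfectly ordered result list scores 1.0; any inversion of graded labels scores strictly less.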
Phase 4: Output
Return ranked product list with score decomposition.
Output Format
```json
{
  "results": [
    {"product_id": "P123", "rank": 1, "final_score": 0.92,
     "components": {"relevance": 0.85, "popularity": 0.95, "quality": 0.90}}
  ],
  "metadata": {"query": "wireless earbuds", "model": "lambdamart", "ndcg_at_10": 0.72}
}
```
Examples
Sample I/O
Input: Query "laptop", 500 matching products
Expected: Top results balance text match + high conversion + good ratings, not just keyword relevance.
Edge Cases
| Input | Expected | Why |
|---|---|---|
| New product, no history | Rely on text relevance + category avg | Cold start — no behavioral signal |
| Out of stock item | Demote or remove | Showing unavailable products frustrates users |
| Sponsored product | Blend ad rank with organic | Separate sponsored from organic clearly |
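The cold-start row above can be handled by falling back to category averages when a product lacks behavioral history. The threshold and field names here are assumptions:

```python
MIN_IMPRESSIONS = 100  # assumed minimum history before trusting a product's own stats

def behavioral_features(product, category_avg):
    """Cold start: substitute category-average CTR and conversion rate
    when a product has too few impressions for reliable estimates."""
    if product.get("impressions", 0) < MIN_IMPRESSIONS:
        return {"ctr": category_avg["ctr"],
                "conversion_rate": category_avg["conversion_rate"]}
    return {"ctr": product["clicks"] / product["impressions"],
            "conversion_rate": product["orders"] / product["impressions"]}
```

New products then compete on text relevance plus a neutral behavioral prior, rather than a zero that buries them.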
Gotchas
- Position bias in training data: Higher-ranked items get more clicks regardless of quality. Debias training data using inverse propensity weighting or randomization experiments.
- Popularity bias: Without diversity controls, popular items dominate rankings. New or niche products get no exposure. Add exploration bonus.
- Revenue optimization ≠ user satisfaction: Ranking by margin pushes expensive products up. Users lose trust if results feel commercially manipulated.
- Feature freshness: Click signals change daily. Retrain or update features frequently. Stale features degrade ranking quality.
- Category-specific models: A single ranking model may not work across all categories. Electronics ranking differs from fashion ranking.
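The position-bias gotcha above is commonly addressed with inverse propensity weighting. A minimal sketch, assuming examination propensities estimated from a randomization experiment (the table values below are illustrative):

```python
# Assumed examination propensities: probability a user inspects each position.
# In practice these come from result randomization or an intervention study.
EXAMINE_PROB = {1: 1.0, 2: 0.7, 3: 0.5, 4: 0.4, 5: 0.3}

def ipw_weight(position, floor=0.1):
    """Weight a clicked training example by 1 / examination propensity,
    clipping the propensity so rare low positions cannot dominate training."""
    propensity = max(EXAMINE_PROB.get(position, floor), floor)
    return 1.0 / propensity
```

Clicks at lower positions receive larger training weights, counteracting the fact that top positions attract clicks regardless of quality.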
References
- For LambdaMART implementation, see
- For position debiasing techniques, see references/position-debiasing.md