Implement BM25 ranking function for e-commerce product search relevance scoring. Use this skill when the user needs to build a text-based product search engine, improve search result relevance, or replace basic TF-IDF with a more robust ranking function — even if they say 'product search ranking', 'search relevance', or 'BM25 implementation'.
npx skill4agent add asgard-ai-platform/skills algo-ecom-bm25

IRON LAW: BM25 Has Two Critical Parameters, k₁ and b
k₁ controls term frequency saturation: higher k₁ = more weight to
repeated terms. k₁=0 ignores TF entirely (boolean).
b controls document length normalization: b=1 fully normalizes by
length, b=0 ignores length. Default k₁=1.2, b=0.75 works for most
cases but MUST be tuned for your specific corpus.

Stop words: the, a, an, and, or, but, of, in, on, at, to, for, with, by, from, as, is, are, was, were, be, been, being

⚠️ Stop-word removal affects avgdl and |d|: because stop words are dropped before length is measured, hand-computing BM25 without removing them will give the wrong length normalization and scores will be off by 3–5%. If you're reproducing BM25 by hand to compare against the script, apply the same stop list first, or just run the script.
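
To make the length effect concrete, here is a minimal sketch of how dropping stop words changes |d| and avgdl. The stop list is the one given above; the regex tokenizer and the two sample titles are assumptions for illustration, since the bundled script's own tokenizer is not shown here.

```python
import re

# Stop list copied from the text above.
STOP_WORDS = frozenset(
    "the a an and or but of in on at to for with by from "
    "as is are was were be been being".split()
)

def tokenize(text, remove_stop_words=True):
    """Lowercase, split on non-alphanumerics, optionally drop stop words.

    Hypothetical tokenizer; the real script may split text differently.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens

def length_stats(docs, remove_stop_words=True):
    """Return (list of document lengths |d|, average length avgdl)."""
    lengths = [len(tokenize(d, remove_stop_words)) for d in docs]
    return lengths, sum(lengths) / len(lengths)

# Made-up product titles, for illustration only.
docs = ["The red running shoes for men", "A pair of shoes"]

filtered_lengths, avgdl_filtered = length_stats(docs, remove_stop_words=True)
raw_lengths, avgdl_raw = length_stats(docs, remove_stop_words=False)

for name, lengths, avgdl in (
    ("filtered", filtered_lengths, avgdl_filtered),
    ("raw", raw_lengths, avgdl_raw),
):
    ratios = [round(n / avgdl, 3) for n in lengths]
    print(f"{name}: |d|={lengths} avgdl={avgdl} |d|/avgdl={ratios}")
```

The |d|/avgdl ratio is what feeds the b term, so the two runs normalize the same documents differently even though the text is identical.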
⚠️ IDF variant lock-in: BM25 has several IDF formulations in the wild (Robertson-Sparck Jones, classic Okapi, Lucene's smoothed, BM25+, BM25L). This skill and the bundled script use the Lucene-style smoothed variant with the +1 inside the logarithm, log((N - df + 0.5) / (df + 0.5) + 1), which never returns negative IDF for very common terms. If you compare scores against another engine (Elasticsearch, Solr, Whoosh), they may differ by ~3–5% even on identical inputs. Do not "correct" the script unless you intend to change the variant globally.
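
The difference between variants is easiest to see side by side. The sketch below compares the smoothed formula from this skill with the same expression without the +1. The function names and the corpus size N = 1000 are illustrative assumptions, not taken from the bundled script.

```python
import math

def idf_smoothed(n_docs, df):
    """Lucene-style variant used in this skill: log((N - df + 0.5) / (df + 0.5) + 1)."""
    return math.log((n_docs - df + 0.5) / (df + 0.5) + 1)

def idf_unsmoothed(n_docs, df):
    """Same ratio without the +1; goes negative once df exceeds N / 2."""
    return math.log((n_docs - df + 0.5) / (df + 0.5))

N = 1000  # hypothetical corpus size
for df in (1, 100, 500, 900):
    print(
        f"df={df:4d}  smoothed={idf_smoothed(N, df):.3f}  "
        f"unsmoothed={idf_unsmoothed(N, df):.3f}"
    )
```

For a term that appears in 900 of 1000 documents, the unsmoothed form is negative while the smoothed form stays slightly above zero, which is why scores from engines using different variants cannot be compared directly.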
{
"results": [{"doc_id": "SKU-123", "score": 12.5, "title": "Red Running Shoes"}],
"metadata": {"query": "red shoes", "hits": 85, "k1": 1.2, "b": 0.75, "avg_doc_length": 45}
}

| Input | Expected | Why |
|---|---|---|
| Single-word query | IDF-dominated ranking | Only one term's IDF differentiates |
| Very common term ("the") | Near-zero IDF, low impact | IDF suppresses common terms |
| Document with 100 repetitions | Saturated TF, not 100x score | k₁ caps the benefit of repetition |
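
The saturation and length-normalization behavior in the table can be checked with a small sketch of the per-term score. It assumes the standard textbook Okapi form, idf · tf · (k₁ + 1) / (tf + k₁ · (1 − b + b · |d| / avgdl)). The bundled script may differ in detail, and `bm25_term_score` is a hypothetical helper name. Document length is held fixed here to isolate the effect of term frequency.

```python
def bm25_term_score(tf, doc_len, avgdl, idf, k1=1.2, b=0.75):
    """Contribution of one query term to one document's BM25 score.

    Assumes the textbook Okapi form; not necessarily the bundled script's code.
    """
    # Length normalization: 1.0 when doc_len == avgdl, or whenever b == 0.
    norm = 1 - b + b * doc_len / avgdl
    return idf * tf * (k1 + 1) / (tf + k1 * norm)

# Saturation: 100 repetitions do not give 100x the score.
once = bm25_term_score(tf=1, doc_len=45, avgdl=45, idf=1.0)
hundred = bm25_term_score(tf=100, doc_len=45, avgdl=45, idf=1.0)
print(f"tf=1: {once:.3f}  tf=100: {hundred:.3f}  ratio: {hundred / once:.2f}")

# k1 = 0 ignores term frequency: any tf > 0 scores the same.
print(bm25_term_score(tf=1, doc_len=45, avgdl=45, idf=1.0, k1=0),
      bm25_term_score(tf=50, doc_len=45, avgdl=45, idf=1.0, k1=0))

# b = 0 ignores length; b = 1 penalizes the longer document fully.
for b in (0.0, 0.75, 1.0):
    short = bm25_term_score(tf=2, doc_len=10, avgdl=45, idf=1.0, b=b)
    long_ = bm25_term_score(tf=2, doc_len=90, avgdl=45, idf=1.0, b=b)
    print(f"b={b}: short={short:.3f} long={long_:.3f}")
```

With the defaults, the score for a single term is bounded above by idf · (k₁ + 1), so repetition beyond a handful of occurrences adds very little.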

| Script | Description | Usage |
|---|---|---|
| scripts/bm25.py | Score documents against a query using the BM25 ranking function | python scripts/bm25.py --verify |

references/bm25f.md
references/parameter-tuning.md