vLLM provides two ways to benchmark Automatic Prefix Caching (APC): the offline script `benchmarks/benchmark_prefix_caching.py`, and the online `vllm bench serve` harness with the `prefix_repetition` dataset. The offline script toggles caching via `--enable-prefix-caching` / `--no-enable-prefix-caching`, so you can compare the same workload with and without APC.

Run the synthetic fixed-prompt benchmark with APC enabled:

```bash
python3 benchmarks/benchmark_prefix_caching.py \
    --model Qwen/Qwen3-8B \
    --enable-prefix-caching \
    --num-prompts 1 \
    --repeat-count 100 \
    --input-length-range 128:256
```

Then run the same workload with APC disabled to get a baseline:

```bash
python3 benchmarks/benchmark_prefix_caching.py \
    --model Qwen/Qwen3-8B \
    --no-enable-prefix-caching \
    --num-prompts 1 \
    --repeat-count 100 \
    --input-length-range 128:256
```

To benchmark against real conversations instead of synthetic prompts, download the ShareGPT dataset and pass it via `--dataset-path`:

```bash
wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
python3 benchmarks/benchmark_prefix_caching.py \
    --model Qwen/Qwen3-8B \
    --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \
    --enable-prefix-caching \
    --num-prompts 20 \
    --repeat-count 5 \
    --input-length-range 128:256
```

For an online serving benchmark, first start the server with `vllm serve Qwen/Qwen3-8B`, then run `vllm bench serve` with the `prefix_repetition` dataset:

```bash
vllm bench serve \
    --backend openai \
    --model Qwen/Qwen3-8B \
    --dataset-name prefix_repetition \
    --num-prompts 100 \
    --prefix-repetition-prefix-len 512 \
    --prefix-repetition-suffix-len 128 \
    --prefix-repetition-num-prefixes 5 \
    --prefix-repetition-output-len 128
```

The `prefix_repetition` parameters control the shape of the generated workload:

| Parameter | Description |
|---|---|
| `--prefix-repetition-prefix-len` | Number of tokens in the shared prefix portion |
| `--prefix-repetition-suffix-len` | Number of tokens in the unique suffix portion |
| `--prefix-repetition-num-prefixes` | Number of distinct prefixes to cycle through |
| `--prefix-repetition-output-len` | Number of output tokens to generate per request |
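To make the four `prefix_repetition` knobs concrete, here is a minimal sketch of how such a workload can be generated (this is illustrative, not vLLM's implementation; `build_workload` and the vocabulary size are hypothetical): the configured number of distinct shared prefixes is cycled across all requests, and each request appends its own unique suffix.

```python
import random

def build_workload(num_prompts, prefix_len, suffix_len, num_prefixes, vocab=32000):
    """Generate synthetic prompts (lists of token ids): a few shared
    prefixes cycled across requests, each followed by a unique suffix.
    Illustrative sketch of a prefix_repetition-style dataset."""
    rng = random.Random(0)
    prefixes = [[rng.randrange(vocab) for _ in range(prefix_len)]
                for _ in range(num_prefixes)]
    prompts = []
    for i in range(num_prompts):
        shared = prefixes[i % num_prefixes]   # cycle through the prefixes
        suffix = [rng.randrange(vocab) for _ in range(suffix_len)]
        prompts.append(shared + suffix)
    return prompts

prompts = build_workload(num_prompts=100, prefix_len=512, suffix_len=128,
                         num_prefixes=5)
# Every prompt is prefix_len + suffix_len = 640 tokens; each of the 5
# prefixes is shared by 20 prompts, so 512 of every 640 prompt tokens
# are candidates for the prefix cache.
```

With the values from the command above, a high APC hit rate is expected because only the 128-token suffix of each request is unique.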
Notes: run all commands from the repository root (`cd vllm`). `Qwen/Qwen3-8B` is only an example; any supported model can be passed via `--model`. `--repeat-count` sets how many times each prompt is repeated, and `--input-length-range` takes a `min:max` token range such as `128:256`. For multi-GPU runs add `--tensor-parallel-size <N>`; for faster block hashing add `--prefix-caching-hash-algo xxhash` (requires `pip install xxhash`).

The arguments accepted by `benchmark_prefix_caching.py`:

| Argument | Required | Description |
|---|---|---|
| `--model` | Yes | Model name or path (HuggingFace ID or local path) |
| `--num-prompts` | Yes | Number of prompts to process |
| `--input-length-range` | Yes | Token length range for inputs, e.g. `128:256` |
| `--repeat-count` | No | Number of times each prompt is repeated (default: 1) |
| `--dataset-path` | No | Path to a dataset file (e.g. ShareGPT JSON). Omit for synthetic fixed-prompt mode |
| `--prefix-len` | No | Fixed prefix token length to prepend to every prompt |
| `--output-len` | No | Number of output tokens to generate per request |
| `--sort` | No | Sort prompts by length before benchmarking |
| `--enable-prefix-caching` / `--no-enable-prefix-caching` | No | Toggle APC (recommended: enable to test caching) |
| `--prefix-caching-hash-algo` | No | Hash algorithm for cache blocks, e.g. `xxhash` |
| `--tensor-parallel-size` | No | Number of GPUs for tensor parallelism |
| `--disable-detokenize` | No | Skip detokenization to reduce overhead |
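A back-of-the-envelope model helps interpret the APC-on vs. APC-off runs. This is a sketch under the assumption of a fixed KV-cache block size (16 tokens by default in vLLM), and `cached_prompt_tokens` is a hypothetical helper, not a vLLM API: APC caches whole blocks only, so on a repeat of an identical prompt at most `floor(prompt_len / block_size) * block_size` prompt tokens can be served from cache.

```python
def cached_prompt_tokens(prompt_len, repeat_count, block_size=16):
    """Estimate prompt tokens served from cache across repeat_count
    identical requests. The first request is a cold miss; each later
    repeat hits every *full* block of the prompt, since APC operates
    on whole KV blocks."""
    full_block_tokens = (prompt_len // block_size) * block_size
    return (repeat_count - 1) * full_block_tokens

# --num-prompts 1 --repeat-count 100 with a 200-token prompt:
total = 100 * 200                      # 20000 prompt tokens submitted
hit = cached_prompt_tokens(200, 100)   # 99 repeats x 192 cached tokens
# hit / total is about 0.95: nearly all prefill work disappears with APC on.
```

This is why the fixed-prompt mode with a high `--repeat-count` shows the largest APC speedups, while mixed workloads like ShareGPT show smaller, more realistic gains.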
Prerequisites: all `python3 benchmarks/*.py` commands assume a local checkout of the vLLM repository:

```bash
git clone https://github.com/vllm-project/vllm
cd vllm
```

For gated models, authenticate with Hugging Face via `export HF_TOKEN=<your_token>` (or pass `--hf-token <your_token>` where supported). The `xxhash` hash algorithm additionally requires the `xxhash` and `cbor2` packages:

```bash
pip install xxhash cbor2
```
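The `xxhash`/`cbor2` pairing exists because block hashes are computed over serialized token ids. A rough standard-library sketch of the idea (SHA-256 stands in for xxhash here, and `hash_block` is illustrative, not vLLM's internal function): each block's hash chains the parent block's hash with the block's tokens, which is what lets a shared prefix be looked up block by block.

```python
import hashlib
import pickle

def hash_block(parent_hash, token_ids):
    """Chained content hash for one KV block. Including the parent hash
    ties each block to its entire prefix, so equal hashes imply equal
    prefixes. (Illustrative; the optional vLLM algorithms use cbor2 for
    serialization and xxhash for speed, hence the install note above.)"""
    payload = pickle.dumps((parent_hash, tuple(token_ids)))
    return hashlib.sha256(payload).hexdigest()

# Two requests sharing the first block produce the same block hash...
a = hash_block(None, [1, 2, 3, 4])
b = hash_block(None, [1, 2, 3, 4])
# ...while diverging tokens change every downstream hash.
c = hash_block(a, [5, 6])
d = hash_block(b, [7, 8])
assert a == b and c != d
```

Swapping the hash function only trades lookup speed for collision resistance; the chaining structure is what makes prefix reuse detectable.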