Quickly install and deploy vLLM, start serving a small LLM, and test the OpenAI-compatible API.
npx skill4agent add vllm-project/vllm-skills vllm-deploy-simple

# Default deployment options (--venv "." --model "Qwen/Qwen2.5-1.5B-Instruct" --port 8000 --gpu_memory_utilization 0.8)
scripts/quickstart.sh

# Use a custom virtual environment
scripts/quickstart.sh --venv /path/to/venv
# Use custom model and port
scripts/quickstart.sh --model "Qwen/Qwen2.5-1.5B-Instruct" --port 8000
# Use custom GPU memory utilization
scripts/quickstart.sh --gpu_memory_utilization 0.6
# Combine all options
scripts/quickstart.sh --venv /path/to/venv --model "Qwen/Qwen2.5-1.5B-Instruct" --port 8000 --gpu_memory_utilization 0.8

# Install vLLM and dependencies only
scripts/quickstart.sh install
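--gpu_memory_utilization is the fraction of each GPU's VRAM that vLLM may claim for model weights and KV cache. A back-of-the-envelope sketch of the resulting budget (illustrative numbers, not something the script computes):

```shell
# Illustrative: budget implied by --gpu_memory_utilization 0.8 on a hypothetical 24 GiB GPU
TOTAL_MIB=24576   # total VRAM in MiB
UTIL=80           # 0.8 expressed as a percentage for integer shell arithmetic
BUDGET=$(( TOTAL_MIB * UTIL / 100 ))
echo "vLLM will target about ${BUDGET} MiB of GPU memory"   # → about 19660 MiB
```

Lower the value (e.g. 0.6) when sharing the GPU with other processes; raise it for more KV-cache headroom.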
# Or with virtual environment
scripts/quickstart.sh install --venv /path/to/venv

# Start the vLLM server
scripts/quickstart.sh start
# Or with custom options
scripts/quickstart.sh start --venv /path/to/venv --model "Qwen/Qwen2.5-1.5B-Instruct" --port 8000 --gpu_memory_utilization 0.8

# Test the OpenAI-compatible API
scripts/quickstart.sh test
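The test command assumes the server is already up; model loading can take a while after start. A minimal readiness poll against the OpenAI-compatible /v1/models endpoint (a sketch assuming curl is installed; wait_ready is a hypothetical helper, not part of quickstart.sh):

```shell
# Poll an OpenAI-compatible endpoint until it answers, or give up after N tries
wait_ready() {
  port="$1"; tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -sf "http://localhost:${port}/v1/models" > /dev/null 2>&1; then
      return 0   # server is answering
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1       # timed out
}

# Example: wait up to 60 s on the default port, then run the test command
# wait_ready 8000 60 && scripts/quickstart.sh test --port 8000
```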
# Or with custom port
scripts/quickstart.sh test --port 8000

# Stop the vLLM server
scripts/quickstart.sh stop
# Or with virtual environment
scripts/quickstart.sh stop --venv /path/to/venv

# Show server status
scripts/quickstart.sh status

# Restart the server
scripts/quickstart.sh restart
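The status command can also be approximated by hand (a sketch assuming the script records its PID at $VENV_PATH/tmp/vllm-server.pid, the path this skill uses for its server files):

```shell
# Manually check whether the recorded server process is still alive
VENV_PATH="${VENV_PATH:-.}"
PIDFILE="$VENV_PATH/tmp/vllm-server.pid"
if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
  STATUS="running"
else
  STATUS="not running"
fi
echo "server: $STATUS"
```

kill -0 sends no signal; it only tests whether the process exists and is signalable.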
# Or with custom options
scripts/quickstart.sh restart --venv /path/to/venv --port 8000 --gpu_memory_utilization 0.8

Usage:
scripts/quickstart.sh [command] [OPTIONS]
Commands:
install - Install vLLM and dependencies
start - Start the vLLM server
stop - Stop the vLLM server
test - Test the OpenAI-compatible API
status - Show server status
restart - Restart the server
all - Run complete workflow (default)
Options:
--model MODEL Model to use (default: Qwen/Qwen2.5-1.5B-Instruct)
--port PORT Port to run server on (default: 8000)
--venv VENV_PATH Virtual environment path (default: .)
--gpu_memory_utilization VRAM  GPU memory utilization (default: 0.8)

Hardware checks: nvidia-smi (NVIDIA), /dev/kfd and /dev/dri device access (AMD ROCm), TPU_NAME with gcloud (vllm-tpu on Cloud TPU).

Test the API manually:
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen2.5-1.5B-Instruct",
"messages": [{"role": "user", "content": "Say hello!"}],
"max_tokens": 50
  }'

Troubleshooting:
# Create a virtual environment for --venv if one does not exist
uv venv /path/to/venv        # or: python3 -m venv /path/to/venv
# Activate it with bin/activate (Linux/macOS) or Scripts/activate (Windows)

# Check whether the port is already in use
lsof -i :8000

# Check GPU visibility (nvidia-smi for NVIDIA, rocm-smi for AMD ROCm)
nvidia-smi

# Verify the vLLM installation
python -c "import vllm; print(vllm.__version__)"

# Inspect the server log and status; the PID is kept in $VENV_PATH/tmp/vllm-server.pid
cat $VENV_PATH/tmp/vllm-server.log
scripts/quickstart.sh status

# Options may also be placed before the command
scripts/quickstart.sh --port 8080 start --venv /path/to/venv
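The chat-completions response is standard OpenAI JSON, so the assistant text can be pulled out without extra tooling (a sketch assuming python3 is on PATH; RESPONSE here is a canned sample, not live server output):

```shell
# Extract choices[0].message.content from an OpenAI-style chat completion
RESPONSE='{"choices":[{"message":{"role":"assistant","content":"Hello!"}}]}'
REPLY=$(echo "$RESPONSE" | python3 -c 'import json, sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])')
echo "$REPLY"   # → Hello!
```

In a live check, replace the canned RESPONSE with the output of the curl command above.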