Install the Docker image (https://github.com/eugr/spark-vllm-docker/):
git clone https://github.com/eugr/spark-vllm-docker.git
cd spark-vllm-docker
./build-and-copy.sh --use-wheels
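If the build finishes cleanly, the image should show up in the local Docker image list. The exact image name comes from the build script, so the filter below is only a guess:

docker images | grep -i vllm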
Launch the server (from https://forums.developer.nvidia.com/t/how-to-run-qwen3-coder-next-on-spark/359571):
./launch-cluster.sh --solo \
exec vllm serve Qwen/Qwen3-Coder-Next-FP8 \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder \
--gpu-memory-utilization 0.8 \
--host 0.0.0.0 --port 8888 \
--load-format fastsafetensors \
--attention-backend flashinfer \
--enable-prefix-caching
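Once the server is up, a quick sanity check is to list the served models over the OpenAI-compatible API; replace FILLIN with the Spark's hostname or IP, as in the OpenCode config below:

curl http://FILLIN:8888/v1/models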
Update the OpenCode config (the model key must match the served model ID):
.config/opencode.json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "spark_vllm": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Qwen3-Coder-Next-FP8 (Local)",
      "options": {
        "baseURL": "http://FILLIN:8888/v1"
      },
      "models": {
        "Qwen/Qwen3-Coder-Next-FP8": {
          "name": "My Q3CN"
        }
      }
    }
  }
}
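To confirm the endpoint answers end to end before pointing OpenCode at it, a minimal chat-completion request against the same baseURL (adjust host and port to your setup):

curl http://FILLIN:8888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-Coder-Next-FP8", "messages": [{"role": "user", "content": "Say hello"}]}'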
Alternative: native install with uv (building the sm121/GB10 vLLM fork). Install uv first, then run the following:
# Step 1: Create a fresh virtualenv in folder
uv venv .venv
source .venv/bin/activate
# Step 2: Install PyTorch FIRST (before vLLM)
uv pip install --pre torch torchvision torchaudio \
--index-url https://download.pytorch.org/whl/nightly/cu130
# Step 3: Verify PyTorch has CUDA
uv run python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA: {torch.cuda.is_available()}')"
# Should show: PyTorch: 2.11.0.dev... and CUDA: True
# Step 4: Clone the fork
git clone https://github.com/seli-equinix/vllm.git
cd vllm
git checkout feature/sm121-gb10-support
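# Optional: confirm the checkout landed on the fork branch
git branch --show-current
# should print: feature/sm121-gb10-support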
# Step 5: Install vLLM with --no-deps (CRITICAL)
# This preserves your PyTorch installation:
uv pip install setuptools-scm
uv pip install -e . --no-deps --no-build-isolation
# Step 6: Install remaining dependencies
uv pip install -r requirements/common.txt
uv pip install fastsafetensors
# Step 7: Verify installation
uv run python -c "import vllm; print(f'vLLM: {vllm.__version__}')"
# Step 8: Run server
# see https://unsloth.ai/docs/models/qwen3-coder-next
uv run vllm serve unsloth/Qwen3-Coder-Next-FP8-Dynamic \
--served-model-name unsloth/Qwen3-Coder-Next \
--tensor-parallel-size 1 \
--tool-call-parser qwen3_coder \
--enable-auto-tool-choice \
--dtype bfloat16 \
--seed 3407 \
--max-model-len 200000 \
--gpu-memory-utilization 0.93 \
--port 8001
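To point the OpenCode config above at this native server instead of the Docker one, only the baseURL and model key need to change. A sketch of the provider block, assuming OpenCode runs on the same machine (hence localhost) and using the --served-model-name from the command above:

"spark_vllm": {
  "npm": "@ai-sdk/openai-compatible",
  "name": "Qwen3-Coder-Next (Local, native)",
  "options": {
    "baseURL": "http://localhost:8001/v1"
  },
  "models": {
    "unsloth/Qwen3-Coder-Next": {
      "name": "My Q3CN (native)"
    }
  }
}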