Install the Docker image (https://github.com/eugr/spark-vllm-docker/):
git clone https://github.com/eugr/spark-vllm-docker.git
cd spark-vllm-docker
./build-and-copy.sh --use-wheels
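If the build finishes cleanly, the image should show up in the local Docker image list. The exact image name comes from the build script, so the filter below is only a guess:

docker images | grep -i vllm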
Launch the server (from https://forums.developer.nvidia.com/t/how-to-run-qwen3-coder-next-on-spark/359571):
./launch-cluster.sh --solo \
exec vllm serve Qwen/Qwen3-Coder-Next-FP8 \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder \
--gpu-memory-utilization 0.8 \
--host 0.0.0.0 --port 8888 \
--load-format fastsafetensors \
--attention-backend flashinfer \
--enable-prefix-caching
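Once the server is up, a quick sanity check is to list the served models over the OpenAI-compatible API; replace FILLIN with the Spark's hostname or IP, as in the OpenCode config below:

curl http://FILLIN:8888/v1/models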
Update the OpenCode config (the model key must match the served model ID):
.config/opencode.json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "spark_vllm": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Qwen3-Coder-Next-FP8 (Local)",
      "options": {
        "baseURL": "http://FILLIN:8888/v1"
      },
      "models": {
        "Qwen/Qwen3-Coder-Next-FP8": {
          "name": "My Q3CN"
        }
      }
    }
  }
}
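To confirm the endpoint answers end to end before pointing OpenCode at it, a minimal chat-completion request against the same baseURL (adjust host and port to your setup):

curl http://FILLIN:8888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-Coder-Next-FP8", "messages": [{"role": "user", "content": "Say hello"}]}'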
Alternative: native install with uv (building the sm121/GB10 vLLM fork). Install uv first, then run the following:
# Step 1: Create a fresh virtualenv in folder
uv venv .venv
source .venv/bin/activate
# Step 2: Install PyTorch FIRST (before vLLM)
uv pip install --pre torch torchvision torchaudio \
--index-url https://download.pytorch.org/whl/nightly/cu130
# Step 3: Verify PyTorch has CUDA
uv run python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA: {torch.cuda.is_available()}')"
# Should show: PyTorch: 2.11.0.dev... and CUDA: True
# Step 4: Clone the fork
git clone https://github.com/seli-equinix/vllm.git
cd vllm
git checkout feature/sm121-gb10-support
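# Optional: confirm the checkout landed on the fork branch
git branch --show-current
# should print: feature/sm121-gb10-support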
# Step 5: Install vLLM with --no-deps (CRITICAL)
# This preserves your PyTorch installation:
uv pip install setuptools-scm
uv pip install -e . --no-deps --no-build-isolation
# Step 6: Install remaining dependencies
uv pip install -r requirements/common.txt
uv pip install fastsafetensors
# Step 7: Verify installation
uv run python -c "import vllm; print(f'vLLM: {vllm.__version__}')"
# Step 8: Run server
# see https://unsloth.ai/docs/models/qwen3-coder-next
uv run vllm serve unsloth/Qwen3-Coder-Next-FP8-Dynamic \
--served-model-name unsloth/Qwen3-Coder-Next \
--tensor-parallel-size 1 \
--tool-call-parser qwen3_coder \
--enable-auto-tool-choice \
--dtype bfloat16 \
--seed 3407 \
--max-model-len 200000 \
--gpu-memory-utilization 0.93 \
--port 8001
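To point the OpenCode config above at this native server instead of the Docker one, only the baseURL and model key need to change. A sketch of the provider block, assuming OpenCode runs on the same machine (hence localhost) and using the --served-model-name from the command above:

"spark_vllm": {
  "npm": "@ai-sdk/openai-compatible",
  "name": "Qwen3-Coder-Next (Local, native)",
  "options": {
    "baseURL": "http://localhost:8001/v1"
  },
  "models": {
    "unsloth/Qwen3-Coder-Next": {
      "name": "My Q3CN (native)"
    }
  }
}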