@nerdalert
Created January 28, 2026 06:03
KIND vSR Deploy/Validation

# Create a Cluster #

$ kind create cluster --name semantic-router
Creating cluster "semantic-router" ...
 ✓ Ensuring node image (kindest/node:v1.35.0) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-semantic-router"
You can now use your cluster with:

kubectl cluster-info --context kind-semantic-router

Have a nice day! 👋
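
Re-running the create step against an existing cluster name is expected to fail, so a small guard can make the step repeatable. The following is a minimal sketch, not part of the original transcript: `cluster_exists` and `ensure_cluster` are hypothetical helper names, and the only kind commands assumed are `kind get clusters` and `kind create cluster --name`.

```shell
CLUSTER_NAME="semantic-router"

# Succeeds if a kind cluster with exactly the given name is already present.
cluster_exists() {
  kind get clusters 2>/dev/null | grep -qxF "$1"
}

# Creates the cluster only when it is missing (hypothetical helper).
ensure_cluster() {
  if cluster_exists "$1"; then
    echo "cluster $1 already exists"
  else
    kind create cluster --name "$1"
  fi
}

# Usage: ensure_cluster "$CLUSTER_NAME"
```

The match is on the whole line (`-x`) and literal (`-F`), so a cluster named `semantic-router-2` would not be mistaken for `semantic-router`.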

# Deploy #

$ ./deploy/openshift/deploy-to-openshift.sh --kind --no-observability
[SUCCESS] Connected to cluster: kind-semantic-router
[INFO] Creating namespace: vllm-semantic-router-system
namespace/vllm-semantic-router-system created
[SUCCESS] Namespace ready
[INFO] KServe CRD not found - using standalone deployment mode
[INFO] Deploying standalone simulator pods...
deployment.apps/vllm-model-a created
deployment.apps/vllm-model-b created
service/vllm-model-a created
service/vllm-model-b created
[INFO] Waiting for simulator services to get ClusterIPs...
[SUCCESS] Got ClusterIPs: model-a=10.96.101.161, model-b=10.96.143.105
[INFO] Creating PersistentVolumeClaims...
persistentvolumeclaim/semantic-router-models created
persistentvolumeclaim/semantic-router-cache created
[SUCCESS] PVCs created
[INFO] Generating configuration...
[SUCCESS] Configuration generated
[INFO] Creating ConfigMaps...
configmap/semantic-router-config created
configmap/envoy-config created
[SUCCESS] ConfigMaps created
[INFO] Deploying semantic-router...
deployment.apps/semantic-router created
[SUCCESS] Semantic-router deployment applied
[INFO] Creating services...
service/semantic-router created
service/semantic-router-metrics created
[SUCCESS] Services created
[INFO] Waiting for deployments to be ready...
[INFO] This may take several minutes as models are downloaded...
Waiting for deployment "vllm-model-a" rollout to finish: 0 of 1 updated replicas are available...
deployment "vllm-model-a" successfully rolled out
deployment "vllm-model-b" successfully rolled out
Waiting for deployment "semantic-router" rollout to finish: 0 of 1 updated replicas are available...
deployment "semantic-router" successfully rolled out
[SUCCESS] Deployment complete!

==================================================
  Kind Deployment Summary
==================================================

Namespace: vllm-semantic-router-system

Access the services (run in a separate terminal):

  kubectl port-forward -n vllm-semantic-router-system svc/semantic-router 8080:8080 8801:8801

Then test:

  # Auto-routing (classifier picks the model)
  curl http://localhost:8801/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"model": "auto", "messages": [{"role": "user", "content": "What is 2+2?"}]}'

  # STEM query -> routes to Model-A
  curl http://localhost:8801/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"model": "auto", "messages": [{"role": "user", "content": "Explain quantum physics"}]}'

  # Humanities query -> routes to Model-B
  curl http://localhost:8801/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"model": "auto", "messages": [{"role": "user", "content": "Explain the elements of a contract under common law and give a simple example."}]}'

View logs:
  kubectl logs -f deployment/semantic-router -c semantic-router -n vllm-semantic-router-system
  kubectl logs -f deployment/semantic-router -c envoy-proxy -n vllm-semantic-router-system

View status:
  kubectl get pods -n vllm-semantic-router-system
  kubectl get svc -n vllm-semantic-router-system
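
The "Waiting for deployment ... rollout to finish" lines above are what `kubectl rollout status` prints. To re-check readiness by hand before port-forwarding, something like the sketch below could be used. It assumes the three deployment names shown in the log; `wait_for_deployments` is a hypothetical helper and the 600s timeout is an arbitrary choice, not a value taken from the deploy script.

```shell
NAMESPACE="vllm-semantic-router-system"

# Blocks until every named deployment has rolled out; stops at the first failure.
wait_for_deployments() {
  for name in "$@"; do
    kubectl rollout status "deployment/${name}" -n "$NAMESPACE" --timeout=600s || return 1
  done
}

# Usage: wait_for_deployments vllm-model-a vllm-model-b semantic-router
```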

# Validation #

$ curl http://localhost:8801/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"model": "auto", "messages": [{"role": "user", "content": "What is 2+2?"}]}'
{"id":"chatcmpl-431952b9-f369-4cd7-b398-c09f1425c774","created":1769578653,"model":"Model-A","usage":{"prompt_tokens":6,"completion_tokens":50,"total_tokens":56},"object":"chat.completion","do_remote_decode":false,"do_remote_prefill":false,"remote_block_ids":null,"remote_engine_id":"","remote_host":"","remote_port":0,"choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"Testing@, #testing 1$ ,2%,3^, [4\u0026*5], 6~, 7-_ + (8 : 9) / \\ \u003c \u003e . Today it is partially cloudy and raining. The temperature here is "}}]}

$ curl http://localhost:8801/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"model": "auto", "messages": [{"role": "user", "content": "Explain quantum physics"}]}'
{"id":"chatcmpl-2306494d-3681-481e-82b0-9e160b36d16c","created":1769578669,"model":"Model-A","usage":{"prompt_tokens":3,"completion_tokens":45,"total_tokens":48},"object":"chat.completion","do_remote_decode":false,"do_remote_prefill":false,"remote_block_ids":null,"remote_engine_id":"","remote_host":"","remote_port":0,"choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"Alas, poor Yorick! I knew him, Horatio: A fellow of infinite jest The rest is silence.  Today it is partially cloudy and raining. Testing@, #testing 1$ ,2%,3^, [4"}}]}

$ curl http://localhost:8801/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"model": "auto", "messages": [{"role": "user", "content": "Explain the elements of a contract under common law and give a simple example."}]}'
{"id":"chatcmpl-3f759710-4064-4143-b5b2-402398fbda6b","created":1769578677,"model":"Model-B","usage":{"prompt_tokens":15,"completion_tokens":25,"total_tokens":40},"object":"chat.completion","do_remote_decode":false,"do_remote_prefill":false,"remote_block_ids":null,"remote_engine_id":"","remote_host":"","remote_port":0,"choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"Today it is partially cloudy and raining. The temperature here is twenty-five degrees centigrade. Today it is partially cloudy and raining"}}]}
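
The simulator replies with canned text, so the field that shows the routing decision is `"model"` in each response. The check can be scripted instead of read by eye. This is a rough sketch using only grep, head, and cut. `response_model` and `assert_routed_to` are hypothetical helper names, and the parsing assumes the first `"model"` key in the body is the top-level one, as in the responses above.

```shell
# Reads a chat completion response on stdin and prints its first "model" value.
response_model() {
  grep -o '"model"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 | cut -d '"' -f 4
}

# Succeeds if the response body ($2) was served by the expected model ($1).
assert_routed_to() {
  expected="$1"
  actual="$(printf '%s\n' "$2" | response_model)"
  if [ "$actual" = "$expected" ]; then
    echo "ok: routed to $actual"
  else
    echo "FAIL: expected $expected, got ${actual:-<none>}" >&2
    return 1
  fi
}

# Usage (with the port-forward running):
#   resp="$(curl -s http://localhost:8801/v1/chat/completions \
#     -H 'Content-Type: application/json' \
#     -d '{"model": "auto", "messages": [{"role": "user", "content": "Explain quantum physics"}]}')"
#   assert_routed_to Model-A "$resp"
```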