This guide covers running and trying out Red Hat AI Inference Server to serve the Mistral Voxtral-Mini-4B-Realtime-2602 model, powered by vLLM.
You can find the Voxtral Mini model card at https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602
From the model card:
Voxtral Mini 4B Realtime 2602 is a multilingual, realtime speech-transcription model and among the first open-source solutions to achieve accuracy comparable to offline systems with a delay of <500ms. It supports 13 languages and outperforms existing open-source baselines across a range of tasks, making it ideal for applications like voice assistants and live subtitling.
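As a rough sketch of the serving step described above, the container could be launched along these lines. The image name and tag, the port, and the Mistral-specific vLLM flags are assumptions here, not confirmed values; check the Red Hat registry and the model card for the exact image and recommended serving arguments.

```shell
# Minimal sketch: run Red Hat AI Inference Server (vLLM) serving Voxtral Mini.
# NOTE: image name/tag below is an assumption; verify against registry.redhat.io.
podman run --rm -it \
  --device nvidia.com/gpu=all \
  --shm-size=4g \
  -p 8000:8000 \
  -v ~/.cache/huggingface:/opt/app-root/src/.cache/huggingface \
  registry.redhat.io/rhaiis/vllm-cuda-rhel9:latest \
  --model mistralai/Voxtral-Mini-4B-Realtime-2602 \
  --tokenizer-mode mistral \
  --config-format mistral \
  --load-format mistral
```

Once the server is up, it exposes an OpenAI-compatible API on the published port (8000 in this sketch), which clients can use for transcription requests.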