Step-by-step guide to installing and running Unsloth Studio on Linux.

Requirements:

- NVIDIA GPU with the CUDA driver installed
- Linux with pacman, dnf, or apt (this guide uses Arch/pacman)
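Before installing, a quick preflight script can confirm the requirements above. This is a minimal sketch that relies on one assumption: if `nvidia-smi` is on the `PATH`, the NVIDIA driver is installed (it is usually packaged with the driver). The script also reports which of the listed package managers is available and whether `cmake` is already present. The function name `preflight` is only illustrative.

```python
import shutil
import sys


def preflight() -> dict:
    """Report which of the guide's prerequisites are found on this machine."""
    report = {
        "linux": sys.platform.startswith("linux"),
        # Assumption: nvidia-smi on the PATH is a rough proxy for
        # "NVIDIA GPU with driver installed".
        "nvidia-smi": shutil.which("nvidia-smi") is not None,
        "cmake": shutil.which("cmake") is not None,
    }
    # First package manager found, in the order the guide lists them.
    report["package_manager"] = next(
        (pm for pm in ("pacman", "dnf", "apt") if shutil.which(pm)), None
    )
    return report


if __name__ == "__main__":
    for name, value in preflight().items():
        print(f"{name}: {value}")
```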
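Install `cmake` with your distribution's package manager:

```bash
# Arch Linux
sudo pacman -S cmake
# Debian/Ubuntu
sudo apt install cmake
# Fedora
sudo dnf install cmake
```

Install Unsloth Studio with the install script:

```bash
curl -fsSL https://unsloth.ai/install.sh | sh
```

Start the server. It binds to all interfaces (`-H 0.0.0.0`) on port 8888 (`-p 8888`). All paths from here on are under `/root`, which means they assume the install script was run as root. Adjust them if you installed as a different user.

```bash
/root/.unsloth/studio/unsloth_studio/bin/unsloth studio -H 0.0.0.0 -p 8888
```

Open it at: http://localhost:8888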
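The server may not accept connections right after launch. Before you open the URL or script against it, you can poll the port until it answers. This is a minimal sketch using only the standard library. It treats "a TCP connection succeeds" as "the server is up", which is an assumption: it does not verify that Unsloth Studio specifically is the process listening. The helper name `wait_for_port` is hypothetical.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 8888, timeout: float = 60.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            # Refused, unreachable, or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.5)
```

For example, `wait_for_port("localhost", 8888, timeout=120)` returns `True` once the port accepts connections. It returns `False` if the port stays closed for two minutes.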
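The initial (bootstrap) password is stored in the file below. `/root` is normally readable only by root, hence `sudo`:

```bash
sudo cat /root/.unsloth/studio/auth/.bootstrap_password
```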
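If a script needs the bootstrap password (for example, to paste it into a login prompt), it can read it directly. This is a minimal sketch that assumes the file contains only the password, possibly followed by a trailing newline. The path comes from the guide, and the function name is hypothetical.

```python
from pathlib import Path

# Path from the guide; it assumes the installer ran as root.
BOOTSTRAP_PASSWORD = Path("/root/.unsloth/studio/auth/.bootstrap_password")


def read_bootstrap_password(path: Path = BOOTSTRAP_PASSWORD) -> str:
    """Return the bootstrap password with surrounding whitespace removed."""
    try:
        return path.read_text(encoding="utf-8").strip()
    except PermissionError as exc:
        # /root is normally readable only by root.
        raise PermissionError(f"cannot read {path}; run this script with sudo") from exc
```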
To run Unsloth Studio as a systemd service, create the unit file:

```bash
sudo tee /etc/systemd/system/unsloth.service > /dev/null << 'EOF'
[Unit]
Description=Unsloth Studio
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=/root
ExecStart=/root/.unsloth/studio/unsloth_studio/bin/unsloth studio -H 0.0.0.0 -p 8888
Restart=on-failure
Environment=PATH=/root/.unsloth/studio/unsloth_studio/bin:/usr/local/bin:/usr/bin:/bin
Environment=PYTHONPATH=/root/.unsloth/studio/unsloth_studio

[Install]
WantedBy=multi-user.target
EOF
```

Reload systemd so it picks up the new unit, then enable the service at boot and start it:

```bash
sudo systemctl daemon-reload
sudo systemctl enable unsloth
sudo systemctl start unsloth
```

Day-to-day management:

```bash
# Check status
sudo systemctl status unsloth
# Stop
sudo systemctl stop unsloth
# Start
sudo systemctl start unsloth
# Restart
sudo systemctl restart unsloth
```

Notes on CUDA:

- Unsloth Studio uses CUDA 13.x by default.
- If you use Qwen models and get strange output, check which CUDA version is in use.
- You can switch to CUDA 12.x if needed.
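A first check is to see what your driver supports. The `CUDA Version` field printed by `nvidia-smi` is the highest CUDA version the installed driver supports. It is not necessarily the version Unsloth Studio was built against. Even so, if it reports something below 13.0, a CUDA 13.x build generally cannot run on that driver. This sketch extracts that field. The regex assumes the `CUDA Version: X.Y` text that `nvidia-smi` prints in its header, and the function names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Matches the "CUDA Version: 12.4" field in nvidia-smi's header.
_CUDA_RE = re.compile(r"CUDA Version:\s*(\d+)\.(\d+)")


def parse_cuda_version(nvidia_smi_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from nvidia-smi text output, or None if absent."""
    match = _CUDA_RE.search(nvidia_smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def driver_cuda_version() -> Optional[Tuple[int, int]]:
    """Run nvidia-smi and return the highest CUDA version the driver supports."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(["nvidia-smi"], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return parse_cuda_version(result.stdout)


if __name__ == "__main__":
    version = driver_cuda_version()
    if version is None:
        print("nvidia-smi not available or produced no CUDA version")
    elif version < (13, 0):
        print(f"Driver supports up to CUDA {version[0]}.{version[1]}: too old for a CUDA 13.x build")
    else:
        print(f"Driver supports up to CUDA {version[0]}.{version[1]}")
```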
Unsloth Studio exposes an HTTP API that you can call from any HTTP client.

Set the base URL once, then check the server's health and list the available models:

```bash
BASE_URL=http://localhost:8888
curl ${BASE_URL}/api/health
curl ${BASE_URL}/api/models/list
```

Send a chat request:

```bash
curl -X POST ${BASE_URL}/api/inference/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "unsloth/Qwen3-4B-GGUF",
    "messages": [
      {"role": "user", "content": "Hi! How are you?"}
    ],
    "max_tokens": 256
  }'
```

The same request with streaming enabled (`"stream": true`). `-N` turns off curl's output buffering so chunks are printed as they arrive:

```bash
curl -N -X POST ${BASE_URL}/api/inference/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "unsloth/Qwen3-4B-GGUF",
    "messages": [
      {"role": "user", "content": "Hi! How are you?"}
    ],
    "max_tokens": 256,
    "stream": true
  }'
```

The same call from Python with `requests`:

```python
import requests

url = "http://localhost:8888/api/inference/chat"
payload = {
    "model": "unsloth/Qwen3-4B-GGUF",
    "messages": [{"role": "user", "content": "Hi!"}],
    "max_tokens": 256,
}
# Generation can be slow, so allow a generous timeout.
response = requests.post(url, json=payload, timeout=120)
response.raise_for_status()  # fail loudly on 4xx/5xx instead of printing an error body
print(response.json())
```

If you prefer the Ollama client, you can configure Unsloth as its endpoint:

```bash
# In your app, point to:
export OLLAMA_BASE_URL=http://localhost:8888/api
```

To use the Unsloth API as the LLM endpoint in OpenCode:

```bash
export OPENCODE_LLM_API_BASE=http://localhost:8888/api
export OPENCODE_LLM_MODEL=unsloth/Qwen3-4B-GGUF
```

If your project has an `AGENTS.md`, add:

```markdown
### LLM Configuration
- **API Base URL**: `http://localhost:8888/api`
- **Model**: `unsloth/Qwen3-4B-GGUF` (or `unsloth/Qwen3.5-8B-GGUF`)
```

A chat request with a system prompt and a sampling `temperature`:

```bash
curl -X POST http://localhost:8888/api/inference/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "unsloth/Qwen3-4B-GGUF",
    "messages": [
      {"role": "system", "content": "You are a helpful coding assistant."},
      {"role": "user", "content": "Write a hello world in Python"}
    ],
    "max_tokens": 512,
    "temperature": 0.7
  }'
```
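For scripts that should not depend on `requests`, the last request can be wrapped in a small standard-library helper. This is a minimal sketch that mirrors the JSON body of the final curl example (optional system prompt, `max_tokens`, `temperature`). It assumes the endpoint answers a non-streaming request with a single JSON document. The helper names `build_chat_payload` and `chat` are hypothetical.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8888"  # same server as in the curl examples


def build_chat_payload(prompt: str, *, model: str = "unsloth/Qwen3-4B-GGUF",
                       system: Optional[str] = None, max_tokens: int = 512,
                       temperature: float = 0.7) -> dict:
    """Assemble a chat body shaped like the curl examples."""
    messages = []
    if system is not None:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    return {"model": model, "messages": messages,
            "max_tokens": max_tokens, "temperature": temperature}


def chat(payload: dict, base_url: str = BASE_URL, timeout: float = 120.0) -> dict:
    """POST a payload to /api/inference/chat and return the decoded JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/api/inference/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises HTTPError on 4xx/5xx and URLError if the server is unreachable.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage: `chat(build_chat_payload("Write a hello world in Python", system="You are a helpful coding assistant."))` returns the parsed reply. The response schema is not described in this guide, so inspect it before extracting specific fields.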