This document outlines the architectural shift from the current "Lootbox/RPC" model to the new "Code-Mode/UTCP" paradigm.
The LLM acts as a "Router". It decides on one tool call, waits for the result, then decides the next step. This incurs round-trip latency and a token cost (re-reading the full history) for every single step.
```mermaid
sequenceDiagram
    participant LLM
    participant Server
    participant Tool
    LLM->>Server: Call Tool A (args)
    Server->>Tool: Execute A
    Tool-->>Server: Result A
    Server-->>LLM: Result A (Text)
    Note over LLM: "Thinking..." (Context Window Fill)
    LLM->>Server: Call Tool B (args)
    Server->>Tool: Execute B
    Tool-->>Server: Result B
    Server-->>LLM: Result B (Text)
```
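As a rough sketch (the `llm` client, `callTool` dispatcher, and message types below are hypothetical stand-ins, not the actual server API), the Router loop looks like this: every iteration pays one model inference plus one tool round trip, and the full transcript is re-sent each time.

```ts
// Hypothetical stand-ins for a chat-completion client and an RPC tool dispatcher.
type Message = { role: "user" | "assistant" | "tool"; content: string };
type Step =
  | { type: "final_answer"; content: string }
  | { type: "tool_call"; toolName: string; args: unknown };

declare const llm: { complete(history: Message[]): Promise<Step> };
declare function callTool(name: string, args: unknown): Promise<string>;

async function routerLoop(userRequest: string): Promise<string> {
  const history: Message[] = [{ role: "user", content: userRequest }];

  while (true) {
    // One inference per step; the whole history is re-tokenized every time.
    const step = await llm.complete(history);
    if (step.type === "final_answer") return step.content;

    // One network round trip per tool call; the result comes back as text
    // and is appended to the context for the next inference.
    const result = await callTool(step.toolName, step.args);
    history.push({ role: "assistant", content: JSON.stringify(step) });
    history.push({ role: "tool", content: result });
  }
}
```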
The LLM acts as a "Programmer". It writes a complete script to solve the problem. The server executes this script in a sandbox, with tools available as native functions.
```mermaid
sequenceDiagram
    participant LLM
    participant Sandbox
    participant Tool
    LLM->>Sandbox: Send Script (TypeScript)
    Note over Sandbox: Execute Script
    Sandbox->>Tool: Call Tool A
    Tool-->>Sandbox: Result A (Object)
    Note over Sandbox: Logic / Loop / Filter
    Sandbox->>Tool: Call Tool B
    Tool-->>Sandbox: Result B (Object)
    Sandbox-->>LLM: Final Result (JSON)
```
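A minimal sketch of the sandbox side, assuming a Node-style `node:vm` context and hypothetical `toolRegistry` / `aiRegistry` bindings (names are illustrative, not the actual implementation): the script runs once, tools are called as native async functions, and only the final value goes back to the LLM.

```ts
import { createContext, runInContext } from "node:vm";

// Illustrative bindings; in practice these would proxy to the real tools.
declare const toolRegistry: Record<string, (args: unknown) => Promise<unknown>>;
declare const aiRegistry: Record<string, (args: unknown) => Promise<unknown>>;

// Execute an LLM-authored script with tools exposed as native functions.
// Note: node:vm is not a hard security boundary; a production sandbox would
// use an isolate, worker, or container.
async function runScript(script: string): Promise<unknown> {
  const cm = { tools: toolRegistry, ai: aiRegistry };
  const context = createContext({ cm });

  // Wrap the script so top-level `await` and `return` are legal.
  const wrapped = `(async () => { ${script} })()`;
  return runInContext(wrapped, context);
}
```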
Scenario: Search logs for errors and summarize them.
Current (RPC/Lootbox): requires 3+ LLM turns.
- User: "Find errors in logs."
- LLM: Calls `search_logs("error")`
- Tool: Returns a list of 50 log lines.
- LLM: Reads 50 lines... "Okay, I see these. I'll summarize the database ones."
- LLM: Calls `summarize("db connection failed...")`
New (Code-Mode): requires 1 LLM turn. The LLM writes this script once, and the sandbox executes it:
```ts
// The LLM writes this script:
const logs = await cm.tools.search_logs({ query: "error" });

// Filter locally (zero token cost, arbitrary logic)
const dbErrors = logs.filter(l => l.includes("database"));

if (dbErrors.length > 0) {
  // Pass data directly to the next tool
  const summary = await cm.ai.summarize({ content: dbErrors.join("\n") });
  return summary;
} else {
  return "No database errors found.";
}
```

| Feature | Current (RPC/Lootbox) | New (Code-Mode) |
|---|---|---|
| Logic Location | Inside the LLM (Prompt Engineering) | Inside the Script (Code) |
| Data Processing | String manipulation via LLM | Native JS Arrays/Objects |
| Context Usage | High (Intermediate data is tokenized) | Low (Only final result is returned) |
| Latency | High (Sum of all tool RTTs + Inference) | Low (Single inference + fast execution) |
| Tool Integration | Complex (JSON Schema parsing) | Simple (Native TS Interfaces) |
| CLI Tools | Wrapped individually | Imported as global objects (e.g., cass.search) |
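For illustration, the typed surface exposed to scripts could look roughly like this (the shapes below are assumptions that mirror the example above, not the real generated types):

```ts
// Hypothetical typings for the `cm` object available inside a script.
interface CodeModeTools {
  /** Returns matching log lines as plain strings. */
  search_logs(args: { query: string }): Promise<string[]>;
}

interface CodeModeAI {
  /** Summarizes arbitrary text content. */
  summarize(args: { content: string }): Promise<string>;
}

declare const cm: { tools: CodeModeTools; ai: CodeModeAI };
```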
Your existing tools (`cass`, `beads`) wrap powerful CLIs.
- Currently: To use `cass`, the LLM must understand the CLI output text.
- New Way: The LLM receives Typed Objects from `cass`. It can map, reduce, or filter these results using standard TypeScript before presenting them to you.
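
As a sketch of what that enables (the `cass.search` call shape and result fields below are assumptions, not the real schema), a script can post-process results with ordinary array and map operations and return only a compact summary:

```ts
// Assumed result shape: { file: string; line: number; text: string }[]
// (the real cass.search schema may differ).
const hits = await cass.search({ query: "connection reset" });

// Group hits by file with plain TypeScript instead of extra LLM turns.
const byFile = new Map<string, number>();
for (const hit of hits) {
  byFile.set(hit.file, (byFile.get(hit.file) ?? 0) + 1);
}

// Only this compact summary goes back to the model.
return [...byFile.entries()]
  .sort((a, b) => b[1] - a[1])
  .map(([file, count]) => `${file}: ${count} hits`)
  .join("\n");
```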