NanoClaw — Personal Claude Assistant (second brain for a diplomat)


A self-hosted, compounding-memory AI assistant running on a Raspberry Pi.


What Is This?

NanoClaw is a personal AI assistant built on Anthropic's Claude that runs entirely on a Raspberry Pi. It connects to messaging channels (WhatsApp, Telegram, Slack, Discord), processes voice and images, schedules recurring tasks, and — unlike a standard chatbot — accumulates knowledge over time through a structured memory system.


The Core Problem It Solves

Standard LLM assistants are stateless. They forget everything between sessions. The typical fix is RAG over a document store, but RAG retrieves chunks of raw text, not synthesised knowledge.

This system does three things instead:

  1. Extract — pull discrete facts, insights, and style preferences from raw documents (speeches, articles, conversations) into a graph database. Each entry is a self-contained, retrievable statement.
  2. Synthesise — compile those facts into human-readable wiki pages organised by entity, concept, and timeline.
  3. Recall — on every agent invocation, run a semantic query against the graph using the user's message as input. Relevant entries are injected as context before the agent responds.

The result is an agent that gets smarter over time, surfaces what it knows automatically, and can cite specific stored facts when it explains its reasoning.
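The recall step above can be sketched as a simple ranking over stored entries. This is a minimal illustration only; `buildRecallContext`, the entry shape, and the top-k cutoff are assumptions, not mnemon's actual API:

```typescript
// Hypothetical shape of a recalled memory entry (not the real mnemon schema).
interface MemoryEntry {
  content: string;
  importance: number; // ranking weight, unused in this minimal sketch
}

// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Rank stored entries against the query embedding, keep the top-k, and
// format them as lines to inject into the agent's context.
function buildRecallContext(
  queryVec: number[],
  entries: { vec: number[]; entry: MemoryEntry }[],
  k = 3,
): string {
  return entries
    .map((e) => ({ score: cosine(queryVec, e.vec), entry: e.entry }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((t) => `- ${t.entry.content}`)
    .join("\n");
}
```

The real system uses Ollama-generated embeddings and graph traversal on top of this kind of similarity ranking; the sketch only shows the injection-by-relevance idea.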


Architecture

Raw sources          →    mnemon graph         →    wiki pages
(transcripts/            (structured facts,         (narrative syntheses,
 articles,                graph nodes,               human-readable,
 web clips)               semantic retrieval)        cross-referenced)
         ↑                      ↑                          ↑
   ingest pipeline        auto-recall hook          synthesise operation

Layer 1 — Raw Sources

Archival files, never modified after storage:

  • Speech transcripts in markdown
  • Articles saved from URL ingest or mobile web clipping (via Obsidian Web Clipper)

Layer 2 — mnemon Knowledge Graph

A SQLite-backed graph database where each entry has: content, category, importance score, tags, timestamp, and graph edges to related entries. Queried semantically via local vector embeddings (Ollama + nomic-embed-text).

Two stores:

  • Global — shared knowledge across all groups; read-only for non-main agents
  • Local — per-group memory, writable only by that group's agent
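A minimal sketch of what an entry and the two-store access rule might look like. The field names and the `canWrite` helper are illustrative assumptions, not mnemon's actual schema:

```typescript
// Hypothetical entry shape, inferred from the description above.
interface MnemonEntry {
  id: number;
  content: string;      // a self-contained, retrievable statement
  category: string;
  importance: number;   // ranking weight used during recall
  tags: string[];
  timestamp: string;    // ISO 8601
  edges: number[];      // ids of related entries (graph edges)
}

// Access rule as described: global is read-only for non-main agents,
// local stores are writable only by their own group's agent.
type StoreScope =
  | { kind: "global" }
  | { kind: "local"; group: string };

function canWrite(
  scope: StoreScope,
  agent: { isMain: boolean; group: string },
): boolean {
  if (scope.kind === "global") return agent.isMain;
  return scope.group === agent.group;
}
```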

Layer 3 — Wiki Pages

Synthesised markdown files compiled from mnemon facts. Not raw extracts — full narrative pages with cross-references, organised into entities/, concepts/, and timelines/ subdirectories. Browsable in Obsidian on macOS and iOS.


Technology Stack

  • NanoClaw (Node.js + TypeScript) — orchestrator: message loop, container management, channel routing
  • Claude Agent SDK — runs agent logic inside isolated Docker containers per group
  • Baileys — WhatsApp Web protocol (no business API needed)
  • mnemon — custom CLI knowledge-graph tool (SQLite + graph traversal)
  • Ollama + nomic-embed-text — local vector embeddings for semantic recall; runs on the Pi, no cloud calls
  • whisper.cpp — local voice transcription; converts voice notes to text on-device
  • sharp — image resize and processing before multimodal Claude calls
  • OneCLI — credential proxy; containers never see raw API keys
  • SQLite — message store, group registry, task scheduler
  • systemd — process management (nanoclaw service + article watcher)

Key Capabilities

Messaging channels: WhatsApp, Gmail (read + send), and Web are active. Telegram, Slack, and Discord are available as installable skill branches.

Multimodal: Voice notes transcribed locally via whisper.cpp before the agent sees them. Images resized and passed as multimodal content to Claude.

Memory: Every agent invocation triggers a semantic recall against the knowledge graph. Relevant facts surface automatically as a system reminder — the agent never has to decide to "look something up."

Task scheduler: Cron, interval, and one-time tasks. Supports a bash pre-check script to avoid waking the agent unnecessarily — keeps API usage minimal.

Multi-group isolation: Each registered group gets an isolated Docker container, filesystem, local mnemon store, and Claude session. Groups cannot read each other's memory or messages.
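The isolation model above could be sketched as per-group `docker run` arguments. The image name and host paths here are illustrative, not NanoClaw's actual configuration:

```typescript
// Hypothetical sketch: one disposable container per group, with only that
// group's session directory mounted in.
function containerArgs(group: string): string[] {
  return [
    "run",
    "--rm",                                        // container is disposable
    "--name", `nanoclaw-${group}`,                 // one container per group
    // only this group's session directory is visible inside the container;
    // other groups' memory and messages are simply not mounted
    "-v", `/data/sessions/${group}:/workspace:rw`,
    "nanoclaw-agent:latest",                       // illustrative image name
  ];
}
```

Tying container lifetime to conversation activity (shutdown after idle timeout) then keeps resource usage on the Pi bounded.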

Subagent teams: Agents can spawn specialised child agents for parallel work (research, web browsing, data extraction) via Claude's experimental agent teams feature.

Web interface: Multi-conversation portal (port 3080, configurable) for conversations outside WhatsApp. Full Markdown rendering, multiple simultaneous group conversations, conversation history.

Obsidian integration: Wiki pages sync to an Obsidian vault on macOS/iOS via iCloud + rsync bridge. Web clips from the Obsidian mobile app flow in the other direction — clip → iCloud → Pi → ingest pipeline.


Security Design

  • Containers never see raw API keys. An HTTP proxy (OneCLI) intercepts container HTTPS traffic and injects credentials at request time.
  • Sender allowlist — optional per-chat control over who can trigger the agent. Two modes: trigger (non-allowed senders' messages stored for context but can't trigger) or drop (messages not stored at all).
  • Mount allowlist — stored outside the project root so containers cannot read it. Controls which host directories can be mounted into containers; blocks sensitive path patterns (.ssh, .aws, *.pem, etc.).
  • Per-group IPC namespacing — each group can only send messages to its own JID. Source identity is verified by directory path, not message content. Main group has elevated privileges.
  • Group folder validation — folder names are strictly validated (alphanumeric, hyphens, underscores only; no path traversal).
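The last two checks can be sketched as simple pattern tests; the exact patterns NanoClaw blocks may differ:

```typescript
// Illustrative patterns only — the real allowlist is configurable and
// stored outside the project root.
const GROUP_NAME = /^[A-Za-z0-9_-]+$/;            // alphanumeric, hyphens, underscores
const BLOCKED_PATTERNS = [/\/\.ssh(\/|$)/, /\/\.aws(\/|$)/, /\.pem$/];

// Strict folder-name validation: rejects "..", "/", and anything that
// could be used for path traversal.
function isValidGroupFolder(name: string): boolean {
  return GROUP_NAME.test(name);
}

// Mount allowlist check: block sensitive host paths from ever being
// mounted into a container.
function isMountAllowed(hostPath: string): boolean {
  return !BLOCKED_PATTERNS.some((p) => p.test(hostPath));
}
```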

Interesting Design Decisions

Why Docker containers per group, not one process? Isolation. Each group gets a clean environment, its own filesystem, and its own Claude session. A runaway agent in one group can't affect others. Container lifetime is tied to conversation activity — they shut down after idle timeout.

Why iCloud + rsync for Obsidian sync, not git? iOS git clients (obsidian-git, isomorphic-git) have unreliable auth and clone failures in practice. iCloud is native to iOS, zero-config, and free. rsync is directional and battle-tested. A Mac Mini acts as the bridge (always on, same LAN as the Pi).

Why mnemon + wiki pages, not just RAG? RAG retrieves text chunks; mnemon stores synthesised facts as discrete nodes. The wiki layer adds human-readable narratives that can be reviewed and corrected. The two-tier design (mnemon for recall, wiki for synthesis) separates retrieval from presentation. The wiki pattern is inspired by Andrej Karpathy's LLM Wiki concept — extracting structured knowledge from raw sources rather than indexing them whole.

Why local embeddings? The knowledge base contains personal and policy-sensitive content. Running nomic-embed-text locally on the Pi means no document content leaves the network. The 274MB model runs fast enough on the Pi 5 for this use case.

Why whisper.cpp locally? Same reason — voice notes contain private conversations. Running whisper.cpp locally keeps audio on-device. The base model is fast enough on the Pi 5 for practical use.

Why a task pre-check script? Each agent invocation uses API credits. For tasks like "check if there are new PRs" or "did anything change?", a bash script can answer the question without waking the LLM. The agent only runs when the script signals wakeAgent: true.
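A minimal sketch of that gate on the orchestrator side, assuming the pre-check script prints JSON such as {"wakeAgent": true} to stdout (the real scheduler's contract with its scripts may differ):

```typescript
// Decide whether to invoke the agent based on a pre-check script's stdout.
// Assumed contract: the script emits JSON with a boolean "wakeAgent" field.
function shouldWakeAgent(stdout: string): boolean {
  try {
    const result = JSON.parse(stdout);
    return result.wakeAgent === true;   // anything else: stay asleep
  } catch {
    return false; // malformed output costs nothing: the LLM is never invoked
  }
}
```

The point is that the cheap bash check runs on every tick, while the expensive Claude invocation only happens when the check explicitly asks for it.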


Project Structure (overview)

src/                    orchestrator source (TypeScript)
  channels/             WhatsApp, Telegram, Slack, Discord, Gmail adapters
  container-runner.ts   Docker container lifecycle management
  task-scheduler.ts     cron/interval/once scheduler
  ipc.ts                inter-process messaging (JSON file drops)
container/
  agent-runner/         agent entrypoint running inside containers
  skills/               container-side skill markdown files
  mnemon                compiled mnemon binary
groups/
  global/               shared knowledge (CLAUDE.md, wiki, transcripts, articles)
  {channel}_{group}/    per-group files (CLAUDE.md, attachments, conversations)
data/
  sessions/{group}/     per-group Claude sessions, local mnemon, IPC streams
  ipc/{group}/          message and task drop directories
scripts/
  watch-articles.sh     inotifywait watcher → IPC ingest task on new article
docs/
  obsidian-setup/       Mac Mini rsync scripts and launchd plist

Status

Actively running on a Raspberry Pi 5 (aarch64) as a personal assistant. The system is in daily use — processing messages, running scheduled briefings, ingesting articles, and building up the knowledge graph continuously.

NanoClaw is open source: github.com/qwibitai/nanoclaw

@danbri

danbri commented Apr 26, 2026

This looks fun and I hate to be negative about it! But since it is getting a lot of attention:

Are you comfortable that there is sufficient protection from prompt injection? E.g. see the rug-pull section of - it shows how a simple message is enough to 'trick' an AI linked to WhatsApp into leaking a lot of information. You don't need to be using MCP for these kinds of risks. For example, your "folder names are strictly validated" above doesn't seem to address the risk that the text in a folder name might be presented to an LLM in a trusted (e.g. tool-rich) environment.

@johnnyfish

@VivianBalakrishnan co-founder of onecli here. really nice writeup, and the way you've framed the credential proxy in the security section is one of the cleanest articulations of the threat model i've seen anywhere. glad it's earning its keep in a setup like this. if anything breaks or you want to chat architecture, my dms are open.

jonathan

@OmriHadadi-Alta

Looks amazing.
I know OneCLI.
This is the kind of tool that makes you wonder why it wasn't always this way.
Exactly what the ecosystem needed for credential proxying.

@clementstrange

@VivianBalakrishnan does Claude have access to the mnemon? If yes, how do you make sure Claude doesn't read everything in it thus breaking the privacy of the mnemon/wiki?

@lastforkbender

A true second “brain” needs real cognitive processes. Consider this:

# phase_spline_transformer.py
"""
PhaseSplineTransformer — single-file GPU-ready module
(No custom CUDA compilation required)

Features:
- Complex embedding from real inputs
- Exact unitary via parametric complex Givens rotations (GPU-friendly)
- Vectorized, fully PyTorch B-spline evaluator (GPU)
- Complex B-spline layer with learned complex coefficients
- Scoring MLP gating, optional Gumbel-sigmoid discrete-like routing
- Permutation-invariant outer aggregation: Hermitian attention + sum/mean
- Unrolled inner->outer iterative feedback loop (configurable steps)
- Tuple grouping (n-tuples) and group/independent gating
- Perturbation hooks (noise, gate masks, temperature)
- Explanatory comments in key sections for clarity

Dependencies:
    torch (1.12+ recommended), numpy
    Optional: numba (only used for CPU knot helper; not required)

Usage:
    from phase_spline_transformer import PhaseSplineTransformer
    model = PhaseSplineTransformer(...)
    out = model(x)  # x on GPU works directly
"""

from typing import Optional, Dict
import math
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

# Optional numba for knot helper (not required)
try:
    import numba
    from numba import njit
    _NUMBA = True
except Exception:
    _NUMBA = False

# -------------------------
# Knot helper (numba optional)
# -------------------------
def make_uniform_clamped_knots(a: float, b: float, n_ctrl: int, degree: int):
    """Create clamped uniform knots on [a,b] (numpy)."""
    n_knots = n_ctrl + degree + 1
    n_inner = n_knots - 2*(degree+1)
    if n_inner > 0:
        inner = np.linspace(a, b, n_inner + 2)
        knots = np.concatenate((np.full(degree+1, a), inner[1:-1], np.full(degree+1, b)))
    else:
        knots = np.concatenate((np.full(degree+1, a), np.full(degree+1, b)))
    return knots.astype(np.float32)

if _NUMBA:
    @njit
    def make_uniform_clamped_knots_numba(a, b, n_ctrl, degree):
        n_knots = n_ctrl + degree + 1
        n_inner = n_knots - 2*(degree+1)
        if n_inner > 0:
            inner = np.linspace(a, b, n_inner + 2)
            out = []
            for _ in range(degree+1):
                out.append(a)
            for i in range(1, len(inner)-1):
                out.append(inner[i])
            for _ in range(degree+1):
                out.append(b)
            return np.array(out, dtype=np.float32)
        else:
            out = []
            for _ in range(degree+1):
                out.append(a)
            for _ in range(degree+1):
                out.append(b)
            return np.array(out, dtype=np.float32)

# -------------------------
# B-spline basis evaluator (fully vectorized in PyTorch)
# -------------------------
def bspline_basis_torch(t: torch.Tensor, knots: torch.Tensor, degree: int):
    """
    Cox-de Boor evaluator vectorized in PyTorch (GPU compatible).
    t: tensor (...,) or (..., n_channels)
    knots: 1D tensor (n_knots,)
    degree: int
    Returns: B with shape (..., n_basis)
    """
    device = t.device
    knots = knots.to(device=device)
    n_knots = knots.shape[0]
    n_basis = n_knots - degree - 1
    # flatten last dims to a single axis of parameters
    orig_shape = t.shape
    t_flat = t.reshape(-1)  # (N,)
    N = t_flat.shape[0]
    # degree-0 basis
    left = knots[:n_basis].unsqueeze(0)   # (1, n_basis)
    right = knots[1:n_basis+1].unsqueeze(0)
    t_col = t_flat.unsqueeze(1)  # (N,1)
    B = ((t_col >= left) & (t_col < right)).to(t.dtype)  # (N, n_basis)
    # special-case exact endpoint
    last_k = knots[-1]
    if torch.any(t_flat == last_k):
        ids = (t_flat == last_k).nonzero(as_tuple=False).squeeze(1)
        if ids.numel() > 0:
            B[ids, -1] = 1.0
    # recursion
    for p in range(1, degree+1):
        Bp = torch.zeros_like(B)
        for i in range(n_basis):
            denom1 = (knots[i+p] - knots[i]).to(device)
            denom2 = (knots[i+p+1] - knots[i+1]).to(device)
            term1 = torch.zeros(N, device=device, dtype=t.dtype)
            term2 = torch.zeros(N, device=device, dtype=t.dtype)
            if denom1 != 0:
                term1 = ((t_flat - knots[i]) / denom1) * B[:, i]
            if denom2 != 0 and (i+1) < n_basis:
                term2 = ((knots[i+p+1] - t_flat) / denom2) * B[:, i+1]
            Bp[:, i] = term1 + term2
        B = Bp
    B = B.view(*orig_shape, n_basis)
    return B  # (..., n_basis)

# -------------------------
# Unitary via complex Givens (vectorized)
# -------------------------
class UnitaryGivens(nn.Module):
    """
    Parameterized exact unitary using pairwise complex Givens rotations.
    Applies to the last dimension of complex tensors.
    """
    def __init__(self, n: int, n_layers: int = 2):
        super().__init__()
        assert n % 2 == 0, "n must be even for pairwise Givens."
        self.n = n
        self.n_layers = n_layers
        # parameters: (n_layers, n//2)
        self.theta = nn.Parameter(0.01 * torch.randn(n_layers, n//2))
        self.phi = nn.Parameter(0.01 * torch.randn(n_layers, n//2))
        self.psi = nn.Parameter(0.01 * torch.randn(n_layers, n//2))

    def forward(self, z: torch.Tensor):
        # z: complex tensor (..., n)
        orig_shape = z.shape
        n = self.n
        B = int(torch.prod(torch.tensor(orig_shape[:-1]))) if len(orig_shape) > 1 else 1
        x = z.reshape(B, n)  # (B, n) complex
        for l in range(self.n_layers):
            th = self.theta[l].to(z.device)  # (n//2,)
            ph = self.phi[l].to(z.device)
            ps = self.psi[l].to(z.device)
            c = torch.cos(th).unsqueeze(0)  # (1, n//2)
            s = torch.sin(th).unsqueeze(0)
            eiph = torch.cos(ph).unsqueeze(0) + 1j * torch.sin(ph).unsqueeze(0)
            eips = torch.cos(ps).unsqueeze(0) + 1j * torch.sin(ps).unsqueeze(0)
            a = x[:, 0::2]
            b = x[:, 1::2]
            out_a = c * a - eiph * s * b
            out_b = eips * s * a + (eiph * eips) * c * b
            x = torch.empty_like(x)
            x[:, 0::2] = out_a
            x[:, 1::2] = out_b
        x = x.view(*orig_shape)
        return x

# -------------------------
# Complex B-spline layer (GPU)
# -------------------------
class ComplexBSpline(nn.Module):
    """
    Per-channel 1D B-spline mapping: param t from z (real/imag/mag), outputs complex via learned complex coeffs.
    Entirely on GPU during forward.
    """
    def __init__(self, n_channels: int, degree: int = 3, n_ctrl: int = 16, a: float = -1.0, b: float = 1.0):
        super().__init__()
        assert n_ctrl >= 2
        self.n_channels = n_channels
        self.degree = degree
        self.n_ctrl = n_ctrl
        self.a = float(a); self.b = float(b)
        knots_np = make_uniform_clamped_knots(self.a, self.b, n_ctrl, degree)
        self.register_buffer("knots", torch.from_numpy(knots_np))  # float32
        coef = torch.randn(n_channels, n_ctrl, dtype=torch.cfloat) * 0.01
        self.coef = nn.Parameter(coef)

    def forward(self, z: torch.Tensor, param_mode: str = "real"):
        # z: (..., n_channels) complex
        assert z.shape[-1] == self.n_channels
        device = z.device
        knots = self.knots.to(device=device)
        if param_mode == "real":
            t = z.real
        elif param_mode == "imag":
            t = z.imag
        elif param_mode == "mag":
            t = torch.abs(z)
        else:
            raise ValueError("param_mode must be 'real'|'imag'|'mag'")
        t = torch.clamp(t, self.a, self.b)
        B = bspline_basis_torch(t, knots, self.degree)  # (..., n_ctrl)
        # expand and multiply
        # B: (..., n_ctrl) -> (..., 1, n_ctrl)
        B_exp = B.unsqueeze(-2)
        coef = self.coef.unsqueeze(0)  # (1, n_channels, n_ctrl)
        prod = B_exp * coef  # (..., n_channels, n_ctrl) complex
        out = prod.sum(dim=-1)  # (..., n_channels) complex
        return out

# -------------------------
# Hermitian attention (complex)
# -------------------------
def hermitian_attention(query: torch.Tensor, key: torch.Tensor, value: torch.Tensor, mask: Optional[torch.Tensor]=None, eps: float=1e-8):
    """
    Complex-valued attention using Hermitian inner product <q,k> = q* conj(k) (conjugate transpose on last dim).
    Inputs: tensors shape (..., L, d) with complex dtype.
    Returns: aggregated value (..., d) or (..., L, d) depending on reduction.
    We'll compute attention weights over L dimension and output weighted sum over value.
    """
    # compute logits: real scalar from complex inner product magnitude + alignment (use real part of q * conj(k))
    # q,k: (..., L, d)
    # compute pairwise scores along L: for simplicity assume query and key same shape and we want matrix over last-2 dim
    # We'll implement simple self-attention with queries per element
    # For permutation-invariant outer aggregation we set query = global context or per-tuple queries
    q = query  # (..., Lq, d)
    k = key    # (..., Lk, d)
    v = value  # (..., Lk, d)
    # compute complex inner product along d: result (..., Lq, Lk) complex
    # use einsum: sum over d of q * conj(k)
    conj_k = torch.conj(k)
    scores_c = torch.einsum("...qd,...kd->...qk", q, conj_k)  # complex
    # convert to real scores: use real part and magnitude
    scores = scores_c.real  # use real part; alternative: torch.abs(scores_c)
    if mask is not None:
        scores = scores.masked_fill(~mask, float("-1e9"))
    weights = F.softmax(scores, dim=-1)  # (..., Lq, Lk)
    # weighted sum of v over Lk
    out = torch.einsum("...qk,...kd->...qd", weights, v)
    return out, weights

# -------------------------
# Gumbel-Sigmoid helper for discrete-like routing
# -------------------------
def gumbel_sigmoid_sample(logits: torch.Tensor, temperature: float = 1.0, hard: bool = False):
    """
    Sample differentiable approximate Bernoulli via Gumbel-sigmoid.
    logits: tensor in real domain (pre-sigmoid)
    Returns sample in (0,1), same shape.
    """
    # logits -> probabilities via logistic; implement using standard Gumbel trick on logits
    eps = 1e-20
    u = torch.rand_like(logits)
    g = -torch.log(-torch.log(u + eps) + eps)
    y = torch.sigmoid((logits + g) / temperature)
    if hard:
        y_hard = (y > 0.5).to(y.dtype)
        y = (y_hard - y).detach() + y
    return y

# -------------------------
# High-level PhaseSplineTransformer with inner-outer loop
# -------------------------
class PhaseSplineTransformer(nn.Module):
    """
    Comprehensive module integrating:
      - complex embedding
      - unitary Givens rotations
      - per-channel complex B-splines
      - gating (sigmoid or gumbel-sigmoid)
      - inner->outer iterative feedback with hermitian attention aggregation
    """
    def __init__(
        self,
        input_dim: int,
        complex_dim: int,
        tuple_size: int = 1,
        n_layers: int = 3,
        spline_ctrl: int = 24,
        spline_degree: int = 3,
        t_range: tuple = (-1.0, 1.0),
        gate_hidden: int = 128,
        group_gating: bool = False,
        routing_gumbel: bool = False,
        inner_outer_steps: int = 2,
        attn_heads: int = 1
    ):
        super().__init__()
        assert complex_dim % 2 == 0, "complex_dim must be even"
        assert complex_dim % tuple_size == 0
        self.input_dim = input_dim
        self.complex_dim = complex_dim
        self.tuple_size = tuple_size
        self.num_tuples = complex_dim // tuple_size
        self.group_gating = group_gating
        self.t_min, self.t_max = float(t_range[0]), float(t_range[1])
        self.routing_gumbel = routing_gumbel
        self.inner_outer_steps = max(1, inner_outer_steps)
        self.attn_heads = attn_heads

        # embedding to complex
        self.embed = nn.Linear(input_dim, 2 * complex_dim)

        # unitary
        self.unitary = UnitaryGivens(complex_dim, n_layers=n_layers)

        # spline
        self.spline = ComplexBSpline(complex_dim, degree=spline_degree, n_ctrl=spline_ctrl, a=self.t_min, b=self.t_max)

        # gating MLP (produces logits pre-sigmoid if using gumbel)
        gate_out_dim = self.num_tuples if group_gating else complex_dim
        self.gate_pre = nn.Sequential(
            nn.Linear(2*complex_dim + 2*complex_dim, gate_hidden),  # local + global context concat
            nn.Tanh(),
            nn.Linear(gate_hidden, gate_out_dim)
        )
        # small MLP to compute global context summary if needed
        self.global_mlp = nn.Sequential(
            nn.Linear(2*complex_dim, gate_hidden//2),
            nn.ReLU(),
            nn.Linear(gate_hidden//2, 2*complex_dim)
        )
        # head to real outputs (user may replace)
        self.head = nn.Linear(2*complex_dim, input_dim)

    def forward(self, x: torch.Tensor, perturbation: Optional[Dict]=None):
        """
        x: real input (..., input_dim)
        perturbation: optional dict with keys:
            - 'noise_complex': complex tensor broadcastable to zU
            - 'gate_mask': mask in [0,1] same shape as gates
            - 'gate_temp': float for gumbel or temperature shaping
            - 'hard_gumbel': bool to use hard discrete gumbel
        Returns dict with 'logits', 'z_out', 'gates', 'attn_weights'
        """
        device = x.device
        # embed
        e = self.embed(x)
        re, im = torch.chunk(e, 2, dim=-1)
        z = torch.complex(re, im)  # (..., complex_dim)

        # apply unitary
        zU = self.unitary(z)  # complex

        # optional noise perturbation
        if perturbation is not None and 'noise_complex' in perturbation:
            zU = zU + perturbation['noise_complex'].to(device)

        # initialize local state as zU
        local = zU

        attn_weights_all = None
        gates_final = None

        # iterative inner->outer loop
        for step in range(self.inner_outer_steps):
            # outer aggregation: compute global summary via hermitian attention
            # for permutation-invariant aggregation we treat tuples as sequence length
            Bshape = local.shape
            # reshape to (..., L, d) where L = num_tuples, d = tuple_size
            bs = list(Bshape[:-1])  # batch dims
            L = self.num_tuples
            d = self.tuple_size
            local_grp = local.view(*bs, L, d)  # complex
            # compute simple per-tuple summary by summing over tuple dimension to get tuple vectors
            tuple_vecs = local_grp.sum(dim=-1)  # (..., L) complex (collapsed tuple to scalar channel)
            # promote to vector features for attention: make feature dim = 1 (or expand)
            # we'll use tuple_vecs as keys/values with feature dim 1
            q = tuple_vecs.unsqueeze(-1)  # (..., L, 1)
            k = q
            v = local_grp  # (..., L, d)
            # apply hermitian attention (queries = keys = tuples). We compute out per tuple
            attn_out, attn_w = hermitian_attention(q, k, v)  # attn_out (..., L, d)
            # reduce attn_out to produce global context: mean over L then flatten to match 2*complex_dim
            global_ctx = attn_out.mean(dim=-2)  # (..., d) complex
            # expand global context to the full channel dim (complex_dim = L*d)
            # so its real/imag flattening matches global_mlp's input size
            global_expanded = global_ctx.repeat_interleave(L, dim=-1)  # (..., complex_dim) complex
            # flatten global into real vector for MLP
            global_flat = torch.cat([global_expanded.real, global_expanded.imag], dim=-1)  # (..., 2*complex_dim)
            # refine global via small MLP (real)
            global_ref = self.global_mlp(global_flat)  # (..., 2*complex_dim)
            # build gate logits using local + global context
            local_flat = torch.cat([local.real, local.imag], dim=-1)
            gate_input = torch.cat([local_flat, global_ref], dim=-1)
            gate_logits = self.gate_pre(gate_input)  # (..., gate_out_dim) real
            # sample or compute gating
            if self.routing_gumbel:
                temp = perturbation.get('gate_temp', 1.0) if perturbation else 1.0
                hard = perturbation.get('hard_gumbel', False) if perturbation else False
                s = gumbel_sigmoid_sample(gate_logits, temperature=temp, hard=hard)
            else:
                s = torch.sigmoid(gate_logits)
            # apply gate mask if provided
            if perturbation is not None and 'gate_mask' in perturbation:
                mask = perturbation['gate_mask'].to(device)
                s = s * mask
            # expand group gating if needed
            if self.group_gating:
                s = s.unsqueeze(-1).expand(*s.shape, self.tuple_size).reshape(*s.shape[:-1], self.complex_dim)
            # convert to complex
            s_c = torch.complex(s, torch.zeros_like(s))

            # compute spline transform
            spline_out = self.spline(local, param_mode="real")
            # gated residual update (local update)
            local = (1.0 - s_c) * local + s_c * spline_out

            # keep last attn weights/gates
            attn_weights_all = attn_w
            gates_final = s

        z_out = local
        head_in = torch.cat([z_out.real, z_out.imag], dim=-1)
        logits = self.head(head_in)

        return {"logits": logits, "z_out": z_out, "gates": gates_final, "attn_weights": attn_weights_all}

# -------------------------
# Minimal smoke test when run as script
# -------------------------
if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    B = 32
    input_dim = 16
    complex_dim = 8  # must be even
    model = PhaseSplineTransformer(input_dim, complex_dim, tuple_size=2, n_layers=2, spline_ctrl=12, spline_degree=3, gate_hidden=64, group_gating=False, routing_gumbel=True, inner_outer_steps=3).to(device)
    x = torch.randn(B, input_dim, device=device)
    out = model(x, perturbation={"gate_temp":0.7, "hard_gumbel":False})
    print("logits", out["logits"].shape)
    print("z_out dtype", out["z_out"].dtype, "device", out["z_out"].device)
    print("gates", out["gates"].shape)
    if out["attn_weights"] is not None:
        print("attn_weights shape", out["attn_weights"].shape)

"""
Notes (brief):
- This module avoids CPU<->GPU transfers on the forward/backward path.
- The B-spline evaluation is vectorized in PyTorch; very large batches with many n_ctrl may need further optimization.
- The inner->outer loop is unrolled to allow gradients through the feedback; for long iteration counts use truncated BPTT or stop_gradient.
- For "cognitive"-style behaviour, try perturbations with 'gate_mask' and 'noise_complex' to force changes in the routing path.
- If desired, utilities could be added to visualize per-channel phases (arg) and magnitudes during training.
"""

@seefood

seefood commented May 2, 2026

@lastforkbender this seems worth fleshing out into a usable product, if I understood what it does correctly. I fed it to Perplexity and it suggested the best way to implement this is as a wrapper over Mnemon, perhaps as an MCP server, and helped me get started with this PRD: https://gist.github.com/seefood/5405727af8e28618b52e8ee4a74b7257

Is this what you are suggesting? Is there an implementation of something like this ready to use, or just your pasted code here?

@fredrickchew

For those who want to try Claw without hardware or the hassle of configuring an API, check out these Claw webtops that you can run on GitHub Codespaces before deploying on your local hardware.

https://github.com/gitricko/openclaw-webtop (use 4 CPUs for the GitHub Codespace, as it is resource-heavy)
https://github.com/gitricko/picoclaw-webtop (use the 2-CPU default Codespace, as it is super lightweight, even lighter than NanoClaw)

Might be interesting to NanoClaw users. I like the OneCLI concept, as it improves security. Both OpenClaw and PicoClaw use config files.
