@NyanHelsing
Created February 12, 2026 17:56
---
title: Mixture of Attended Contexts (MoAC)
subtitle: Out-of-Model Attention Priors for Context Selection, Tool Routing, and MCP Strategy Orchestration
version: 0.1
status: Draft / for sharing
---

Mixture of Attended Contexts (MoAC)

Abstract

Large language models (LLMs) are increasingly deployed in environments where the “right answer” depends not only on the user prompt, but on situated knowledge: codebase structure, ticket history, documentation, ownership, runbooks, operational telemetry, and organization-specific processes. Standard retrieval-augmented generation (RAG) retrieves semantically similar chunks and stuffs them into the model context, leaving the model to perform attention over a noisy and often mis-scoped set of inputs.

This paper proposes Mixture of Attended Contexts (MoAC): an out-of-model attention mechanism that routes a query to a mixture of reusable attention indices (“context experts”) that each encode a ranking/weighting over a shared corpus of context atoms. MoAC extends beyond information selection to also route tools and tool-use strategies, including MCP (Model Context Protocol) tools, by attending over tool experts and strategy experts that encode how to call tools for the given use case.

MoAC provides:

  • A controllable, interpretable layer for context assembly and tool orchestration.
  • Reduced prompt-to-prompt variance by reusing stable attention patterns.
  • A bridge between retrieval, planning, and tool execution that is modular and learnable from logs.

1. Motivation

LLM performance degrades when:

  • The relevant knowledge is diffuse (spread across files, docs, tickets, people).
  • The query is underspecified (“why is this failing?”) and needs situational priors.
  • The system has many tools; choosing the right tool and call pattern matters more than raw recall.
  • “Top-k similar chunks” retrieval is insufficient because the task requires a schema (e.g., auth debugging, performance regression triage, incident response) rather than a literal match.

Human experts rarely retrieve “similar paragraphs.” They activate work modes:

  • “Debugging a prod incident” → start with runbooks/alerts/recent deploys.
  • “Refactoring a module” → start with ownership boundaries, interfaces, tests, callers.
  • “Security review” → start with threat model, sensitive sinks, historical vulnerabilities.

MoAC operationalizes these work modes as attended context indices and attended tool strategies.


2. Core Idea

MoAC introduces a router that maps an input query to a mixture over:

  1. Context experts: reusable attention indices that rank/weight context atoms.
  2. Tool experts: reusable priors over which tools to use.
  3. Strategy experts: reusable priors over how to call tools (sequences, prompts, safety checks, argument templates).

Instead of “retrieve chunks similar to the prompt,” MoAC performs:

“Retrieve attention patterns similar to the prompt, then use them to assemble context and tool plans.”


3. Definitions

3.1 Context Atom

A context atom is a small, structured unit of knowledge designed for retrieval and assembly. It is typically derived from:

  • A file or folder analysis (“module summary,” “API contract,” “call graph notes”)
  • Tickets and postmortems (“root cause patterns,” “known gotchas”)
  • Ownership graphs (“who modified this,” “reviewers,” “SME map”)
  • Operational data (“service SLO,” “alert definitions,” “deployment history”)

Each atom should include:

  • id
  • text (LLM-consumable prose)
  • embedding (vector)
  • provenance (source link/commit/ticket id)
  • tags (service/module/domain; optional)
  • freshness (timestamp; optional)
  • access_scope (permissions)

3.2 Context Expert (Attended Context Index)

A context expert encodes a stable attention distribution over the atom space. Informally: “When in this mode, these atoms tend to matter more.”

Representation options:

  • A sparse ranked list of atom IDs with weights
  • A dense weight vector over atoms (usually compressed)
  • A factorized model (weights over tags → atoms)

Each expert also has an embedding used for routing.
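
The first option (a sparse ranked list of atom IDs with weights) can be sketched as follows; the dataclass fields and example weights are assumptions for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class ContextExpert:
    """A reusable attention index: 'in this mode, these atoms matter more'."""
    id: str
    embedding: list[float]  # routing key k_i, compared against the query q
    sparse_atom_weights: dict[str, float] = field(default_factory=dict)

    def weight(self, atom_id: str) -> float:
        # w_i(a): zero for atoms outside the expert's sparse support
        return self.sparse_atom_weights.get(atom_id, 0.0)

incident_triage = ContextExpert(
    id="incident_triage",
    embedding=[0.9, 0.1, 0.0],
    sparse_atom_weights={"runbook-auth": 0.8, "deploy-recent": 0.6},
)
```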

3.3 Tool Expert

A tool expert encodes a prior over which tools are appropriate (and which are risky/expensive), optionally conditioned on environment constraints (latency budget, permissions, offline mode).

3.4 Strategy Expert

A strategy expert encodes a prior over tool-call strategies:

  • Sequencing (which tool first, second, etc.)
  • Argument templates and slot-filling
  • Guardrails and validation steps
  • “Ask clarifying question vs. call tool now”
  • Summarization / normalization patterns for tool outputs

Strategies can be written by humans, learned from logs, or both.


4. System Overview

4.1 Architecture

MoAC is an orchestration layer around an LLM:


```
User Prompt
     |
     v
[Embedder] --> q (query embedding)
     |
     v
[MoAC Router]
     |            |            |
     v            v            v
  Context       Tool       Strategy
  Mixture      Mixture      Mixture
     |            |            |
     +-----> [Assembler & Planner] -----> Context Pack + Tool Plan
                  |
                  v
                [LLM]
                  |
                  v
   (Optional tool execution loop)
```

4.2 Outputs

MoAC produces two primary artifacts:

A) Context Pack

A structured bundle of selected atoms, ordered and optionally weighted.

B) Tool Plan (MCP-aware)

A plan that specifies:

  • Which MCP tools to call
  • In what order
  • With what arguments
  • With what safety checks and stopping conditions
  • How to summarize/transform results for the LLM

5. Formalization

Let:

  • A = {a_1..a_M} be context atoms
  • E_c = {e_1..e_N} be context experts
  • q be the query embedding

Each context expert e_i has:

  • an embedding k_i
  • a weighting function over atoms w_i(a) (often stored as sparse weights)

5.1 Routing to a Mixture

Compute mixture weights:

$$\alpha_i = \operatorname{softmax}_i\!\left(\frac{\operatorname{sim}(q, k_i)}{\tau}\right)$$

where sim is cosine similarity and τ is a temperature.
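
A pure-Python sketch of this routing step (the temperature value and the toy embeddings are assumptions):

```python
import math

def cosine(u, v):
    """sim(q, k): cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def route(q, expert_embeddings, tau=0.5):
    """Mixture weights alpha_i = softmax(sim(q, k_i) / tau)."""
    logits = [cosine(q, k) / tau for k in expert_embeddings]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

# The expert whose key is closest to q receives the largest weight.
alpha = route([1.0, 0.0], [[1.0, 0.1], [0.0, 1.0], [0.7, 0.7]])
```

Lowering τ sharpens the mixture toward the single best expert; raising it spreads weight across several experts.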

5.2 Combined Attention Over Atoms

$$W(a) = \sum_{i=1}^{N} \alpha_i \cdot w_i(a)$$

Select top-K atoms by W(a) (with diversity and permission constraints).
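
A sketch of the combination and selection step, assuming mixture weights and sparse per-expert atom weights as inputs (diversity and permission filtering are omitted here for brevity):

```python
from collections import defaultdict

def combined_attention(alpha, experts):
    """W(a) = sum_i alpha_i * w_i(a), accumulated over sparse expert weights."""
    scores = defaultdict(float)
    for a_i, sparse_w in zip(alpha, experts):
        for atom_id, w in sparse_w.items():
            scores[atom_id] += a_i * w
    return dict(scores)

def top_k(scores, k):
    """Select the K atoms with the highest combined weight."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

W = combined_attention(
    alpha=[0.7, 0.3],
    experts=[{"runbook": 0.9, "deploy": 0.5},
             {"deploy": 0.8, "tickets": 0.6}],
)
best = top_k(W, k=2)
```

Note that "deploy" outranks "tickets" even though no single expert scores it highest: weight accumulates across the mixture.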

5.3 Tool and Strategy Mixtures

Similarly:

  • tool mixture selects tools T
  • strategy mixture selects strategies S
  • the planner composes them into a tool plan (or offers candidates to the LLM)

6. Extending MoAC to MCP Tools and Strategies

6.1 Why “tool attention” matters

For many tasks, the model’s first decision dominates the outcome:

  • “Search logs” vs “read runbook” vs “inspect recent PRs”
  • “Call dependency graph tool” vs “grep code” vs “query incidents”
  • “Ask one clarifying question” vs “start executing tools”

MoAC treats tools and strategies as first-class attended objects.

6.2 MCP Integration Model

Assume MCP tools expose:

  • a name
  • input/output schema
  • description/capabilities
  • permission requirements
  • cost/latency hints (optional)
  • environment constraints (optional)

MoAC maintains:

  • a tool catalog (MCP servers + tools)
  • embeddings for tools and strategies
  • learned priors per domain/work-mode

6.3 Strategy Templates

A strategy template is a reusable recipe:

```yaml
strategy_id: "incident_triage_v1"
description: "Triage a production issue using runbook + recent deploy + error logs"
preconditions:
  - has_service_name: true
guards:
  - require_permissions: ["logs:read", "deploys:read"]
  - max_cost: 2.0
steps:
  - tool: "mcp.runbook.search"
    args_template:
      query: "{service_name} {symptom_keywords}"
  - tool: "mcp.deploys.recent"
    args_template:
      service: "{service_name}"
      window: "48h"
  - tool: "mcp.logs.query"
    args_template:
      service: "{service_name}"
      filter: "level:error AND ({symptom_keywords})"
      window: "2h"
postprocess:
  - summarize_outputs: true
  - normalize_timestamps: true
stop_conditions:
  - "root_cause_identified"
  - "need_user_clarification"
```

Strategies can also specify how to present tool results to the LLM (e.g., deduplication, structured tables, “top 5 anomalies”).

6.4 Planning Modes

MoAC supports multiple execution policies:

  1. Suggest-only: Provide top tool/strategy candidates to the LLM; the LLM decides.
  2. Auto-plan: Produce a tool plan; LLM reviews and executes.
  3. Auto-execute: Orchestrator executes tools, feeds results to LLM (guardrailed).
  4. Hybrid: Auto-execute low-risk steps; ask for approval on risky actions.
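
The hybrid policy can be sketched as a simple split over plan steps; the per-step `risk` field and the threshold value are illustrative assumptions, not part of the MoAC spec:

```python
def apply_hybrid_policy(plan_steps, risk_threshold=0.5):
    """Split a tool plan: auto-execute low-risk steps, hold risky ones for approval.

    Each step is a dict carrying a 'risk' score in [0, 1] (an assumed field);
    steps at or above the threshold are queued for human approval.
    """
    auto, needs_approval = [], []
    for step in plan_steps:
        (auto if step["risk"] < risk_threshold else needs_approval).append(step)
    return auto, needs_approval

auto, held = apply_hybrid_policy([
    {"tool": "mcp.logs.query", "risk": 0.1},       # read-only: safe to run
    {"tool": "mcp.deploys.rollback", "risk": 0.9}, # mutating: needs approval
])
```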

7. Assembly: From Mixtures to a Concrete Prompt

7.1 Context Assembly Heuristics

Even with good weights, assembly matters. Common rules:

  • Permission filter: only include atoms the user is allowed to see.
  • Freshness bias: for operational/debug tasks, upweight recent atoms.
  • Diversity: prevent the pack from being all tickets or all docs.
  • Compression: prefer smaller “summary atoms” when space is constrained.
  • Provenance: attach sources to reduce hallucinations and enable follow-up.

7.2 Tool Plan Assembly Heuristics

A good plan should:

  • minimize calls (cost/latency)
  • maximize expected information gain
  • validate tool outputs (schema checks)
  • use safe defaults
  • include fallback/clarifying questions

MoAC can compute an expected utility score:

$$U(\text{plan}) \approx \mathbb{E}[\text{task\_success}] - \lambda_1 \cdot \text{cost} - \lambda_2 \cdot \text{latency} - \lambda_3 \cdot \text{risk}$$
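
As a worked sketch of this score, with λ values that are purely illustrative (in practice they would be tuned per deployment):

```python
def plan_utility(p_success, cost, latency, risk,
                 lam_cost=0.1, lam_latency=0.05, lam_risk=0.5):
    """U(plan) ≈ E[task_success] - λ1·cost - λ2·latency - λ3·risk."""
    return p_success - lam_cost * cost - lam_latency * latency - lam_risk * risk

def best_plan(candidates):
    """Pick the candidate plan with the highest expected utility.

    candidates: list of (name, p_success, cost, latency, risk) tuples.
    """
    return max(candidates, key=lambda c: plan_utility(*c[1:]))[0]

winner = best_plan([
    ("logs_first", 0.7, 1.0, 2.0, 0.1),  # cheap, moderately likely to succeed
    ("full_sweep", 0.8, 5.0, 8.0, 0.2),  # thorough but costly and slow
])
```

With these λ values the cheap plan wins: its small success-probability deficit is more than repaid by lower cost and latency.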


8. Learning MoAC (Optional but Powerful)

MoAC can start as a hand-built system and become learnable over time.

8.1 Bootstrapping

  • Create initial experts from curated “work modes”
  • Create initial strategy templates from runbooks and best practices
  • Populate context atoms from code/doc/ticket analyzers

8.2 Learning Context Experts

Data sources:

  • Successful prompt → atoms used
  • Human annotations (“these were the right atoms”)
  • Downstream outcomes (task success)

Methods:

  • clustering prompt embeddings → expert centroids
  • learning weights from co-occurrence (atoms used in successful sessions)
  • supervised learning from labeled relevance sets
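
The first method (clustering prompt embeddings into expert centroids) reduces to averaging the embeddings within each cluster; here the cluster labels are assumed to come from an upstream step such as k-means, and each centroid becomes the routing key k_i of a new context expert:

```python
from collections import defaultdict

def expert_centroids(prompt_embeddings, cluster_labels):
    """Average the prompt embeddings in each cluster into an expert routing key."""
    groups = defaultdict(list)
    for vec, label in zip(prompt_embeddings, cluster_labels):
        groups[label].append(vec)
    return {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in groups.items()
    }

centroids = expert_centroids(
    [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]],
    ["debug", "debug", "refactor"],
)
```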

8.3 Learning Tool and Strategy Experts

Data sources:

  • tool call logs and outcomes
  • latency/cost telemetry
  • human “good plan” examples

Methods:

  • behavior cloning: predict tool sequences from historical traces
  • bandits: explore tool choices with guarded rollout
  • offline RL: optimize success metrics under constraints

9. Example Walkthrough

Prompt

“Why did the auth service start failing after yesterday’s deploy?”

MoAC routing might activate:

  • Context experts:

    • incident_triage
    • auth_domain
    • recent_changes_bias
  • Tool experts:

    • logs, deploy history, runbooks
  • Strategy experts:

    • incident triage strategy for auth services

Context Pack might include:

  • Auth service architecture atom
  • “Token refresh known issue” ticket atom
  • Recent deploy summary atom (last 48h)
  • Ownership/contact atom (oncall, recent authors)

Tool Plan might do:

  1. Fetch runbook section relevant to “auth failures”
  2. Fetch deploy diff summary (commits, config changes)
  3. Query error logs for signature changes
  4. (Optional) query metrics for error-rate spike correlating with deploy

The LLM then answers with grounded evidence and a prioritized hypothesis list, plus suggested next steps.


10. Safety, Privacy, and Access Control

MoAC increases leverage; therefore, it must be constrained.

10.1 Permissioning

  • Every atom and tool call must be permission-checked.
  • Tool experts and strategies must obey org policy (least privilege).
  • Strategy templates should declare required scopes.

10.2 Data Minimization

  • Prefer summarized atoms over raw sensitive data.
  • Redact/avoid secrets and PII by default.
  • Use “need-to-know” retrieval policies.

10.3 Auditability

  • Log: which experts fired, which atoms selected, which tools called, why.
  • Provide an “explain route” view for debugging.

10.4 Failure Modes & Mitigations

  • Overconfident routing → use mixture (top-N) + entropy thresholds.
  • Index staleness → decay weights; retrain periodically; freshness gates.
  • Tool misuse → allowlists; risk scoring; approval gates.

11. Implementation Sketch

11.1 Data Structures

  • Atom store:

    • ANN index over atom embeddings
    • metadata store for tags/freshness/access
  • Expert store:

    • expert embedding index
    • expert → sparse weights (top atoms / tag priors)
  • Tool/strategy store:

    • embeddings + metadata (schemas, scopes)
    • strategy templates + guards

11.2 Routing Pseudocode

```python
from collections import defaultdict

# Helpers (embed, topn_similar, sim, softmax, select_atoms, mix_tools,
# propose_strategies, assemble_context, assemble_plan) and the TAU constant
# are assumed to be provided by the surrounding system.

def moac_route(prompt: str) -> dict:
    q = embed(prompt)

    # 1) Context mixture: route to the most similar context experts (sec. 5.1)
    ctx_experts = topn_similar(q, context_expert_index, n=8)
    alpha = softmax([sim(q, e.embedding) / TAU for e in ctx_experts])

    # Combine each expert's sparse atom weights, scaled by its mixture weight (sec. 5.2)
    atom_scores = defaultdict(float)
    for w, e in zip(alpha, ctx_experts):
        for atom_id, atom_w in e.sparse_atom_weights:
            atom_scores[atom_id] += w * atom_w

    # Apply permission, diversity, and freshness constraints
    atom_ids = select_atoms(atom_scores)

    # 2) Tool mixture
    tool_experts = topn_similar(q, tool_expert_index, n=6)
    tool_mix = mix_tools(q, tool_experts)

    # 3) Strategy mixture
    strat_experts = topn_similar(q, strategy_index, n=6)
    strategies = propose_strategies(q, strat_experts, tool_mix)

    return {
        "context_pack": assemble_context(atom_ids),
        "tool_plan": assemble_plan(strategies, tool_mix),
        "explain": {
            "context_experts": [(e.id, w) for e, w in zip(ctx_experts, alpha)],
            "tool_experts": [e.id for e in tool_experts],
            "strategy_experts": [e.id for e in strat_experts],
        },
    }
```

11.3 “Context Pack” Formatting

A simple, robust format for LLM consumption:

  • Short header: what this pack is, and how it was selected
  • Sections by category (docs, code, tickets, people, ops)
  • Each atom includes provenance and a confidence/weight

This makes the pack both machine-usable and human-debuggable.
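
A sketch of a renderer for this format; the exact layout, the field tuple, and the example atoms are illustrative choices rather than a fixed MoAC format:

```python
def render_context_pack(pack_header, atoms_by_category):
    """Render a context pack as markdown: header, category sections, provenance.

    'atoms_by_category' maps a category name (docs, code, tickets, people, ops)
    to a list of (text, provenance, weight) tuples.
    """
    lines = [f"# Context Pack\n{pack_header}\n"]
    for category, atoms in atoms_by_category.items():
        lines.append(f"## {category}")
        for text, provenance, weight in atoms:
            lines.append(f"- ({weight:.2f}) {text}  [source: {provenance}]")
        lines.append("")
    return "\n".join(lines)

pack = render_context_pack(
    "Selected by experts: incident_triage (0.62), auth_domain (0.38)",
    {"docs": [("Auth service architecture summary", "docs/auth.md@abc123", 0.81)],
     "tickets": [("Token refresh known issue", "JIRA-4521", 0.74)]},
)
```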


12. Evaluation

MoAC should be evaluated on:

  • Task success (did the system solve the issue / produce correct patch?)
  • Groundedness (claims supported by atoms/tool outputs)
  • Tool efficiency (calls per task, latency, cost)
  • Robustness (performance under ambiguous prompts)
  • User trust (interpretability of routing + ability to override)

Recommended harness:

  • A fixed set of tasks (bug triage, feature dev, refactor, incident response)
  • Offline replay with logged ground truth
  • A/B tests versus baseline RAG and baseline “LLM picks tools”

13. Limitations

  • Cold start: without good experts/strategies, MoAC is just extra plumbing.
  • Maintenance: experts can drift as code/org evolves.
  • Overfitting to modes: rare tasks may route incorrectly.
  • Complexity: routing + planning + tool execution introduces more failure points.

MoAC works best when paired with:

  • clear atom generation pipelines
  • a small number of high-quality initial strategies
  • strong guardrails and observability

14. Future Work

  • Hierarchical MoAC: route first to a domain, then to sub-modes.
  • Adaptive mixtures: re-route after each tool call using updated state.
  • Differentiable training: learn experts and routing end-to-end (where feasible).
  • Personalized mixtures: adapt to team/project/user while preserving privacy boundaries.
  • Bidirectional integration: allow the LLM to request specific experts (“switch to incident triage mode”).

Appendix A: Minimal Strategy Library (Starter Set)

  1. code_understanding_v1: find owners, read module summary, trace call graph
  2. bug_triage_v1: reproduce steps, check recent changes, search similar tickets
  3. incident_triage_v1: runbook → deploy diff → logs → metrics correlation
  4. security_review_v1: sensitive sinks, auth boundaries, known vulns, threat checklist
  5. refactor_planning_v1: interface boundaries, tests, callers, risk hotspots

Appendix B: “Explainability” Output (Recommended)

Expose to developers:

  • top experts (with weights)
  • top atoms (with scores and provenance)
  • chosen tools and strategies (with guards)
  • reasons for excluding atoms (permissions, redundancy, size)

This makes MoAC debuggable and helps iterate on indices/strategies quickly.


Conclusion

Mixture of Attended Contexts (MoAC) reframes retrieval as attention routing: selecting not only what information to include, but also how to act—via tool selection and MCP strategy orchestration. By representing reusable work modes as context experts and strategy templates as attended objects, MoAC offers a practical path toward more reliable, interpretable, and efficient LLM systems in real-world engineering environments.
