| title | subtitle | version | status |
|---|---|---|---|
| Mixture of Attended Contexts (MoAC) | Out-of-Model Attention Priors for Context Selection, Tool Routing, and MCP Strategy Orchestration | 0.1 | Draft / for sharing |
Large language models (LLMs) are increasingly deployed in environments where the “right answer” depends not only on the user prompt, but on situated knowledge: codebase structure, ticket history, documentation, ownership, runbooks, operational telemetry, and organization-specific processes. Standard retrieval-augmented generation (RAG) retrieves semantically similar chunks and stuffs them into the model context, leaving the model to perform attention over a noisy and often mis-scoped set of inputs.
This paper proposes Mixture of Attended Contexts (MoAC): an out-of-model attention mechanism that routes a query to a mixture of reusable attention indices (“context experts”) that each encode a ranking/weighting over a shared corpus of context atoms. MoAC extends beyond information selection to also route tools and tool-use strategies, including MCP (Model Context Protocol) tools, by attending over tool experts and strategy experts that encode how to call tools for the given use case.
MoAC provides:
- A controllable, interpretable layer for context assembly and tool orchestration.
- Reduced prompt-to-prompt variance by reusing stable attention patterns.
- A bridge between retrieval, planning, and tool execution that is modular and learnable from logs.
LLM performance degrades when:
- The relevant knowledge is diffuse (spread across files, docs, tickets, people).
- The query is underspecified (“why is this failing?”) and needs situational priors.
- The system has many tools; choosing the right tool and call pattern matters more than raw recall.
- “Top-k similar chunks” retrieval is insufficient because the task requires a schema (e.g., auth debugging, performance regression triage, incident response) rather than a literal match.
Human experts rarely retrieve “similar paragraphs.” They activate work modes:
- “Debugging a prod incident” → start with runbooks/alerts/recent deploys.
- “Refactoring a module” → start with ownership boundaries, interfaces, tests, callers.
- “Security review” → start with threat model, sensitive sinks, historical vulnerabilities.
MoAC operationalizes these work modes as attended context indices and attended tool strategies.
MoAC introduces a router that maps an input query to a mixture over:
- Context experts: reusable attention indices that rank/weight context atoms.
- Tool experts: reusable priors over which tools to use.
- Strategy experts: reusable priors over how to call tools (sequences, prompts, safety checks, argument templates).
Instead of “retrieve chunks similar to the prompt,” MoAC performs:
“Retrieve attention patterns similar to the prompt, then use them to assemble context and tool plans.”
A context atom is a small, structured unit of knowledge designed for retrieval and assembly. It is typically derived from:
- A file or folder analysis (“module summary,” “API contract,” “call graph notes”)
- Tickets and postmortems (“root cause patterns,” “known gotchas”)
- Ownership graphs (“who modified this,” “reviewers,” “SME map”)
- Operational data (“service SLO,” “alert definitions,” “deployment history”)
Each atom should include:
- `id`
- `text` (LLM-consumable prose)
- `embedding` (vector)
- `provenance` (source link/commit/ticket id)
- `tags` (service/module/domain; optional)
- `freshness` (timestamp; optional)
- `access_scope` (permissions)
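The atom schema above can be sketched as a dataclass. The class name `ContextAtom` and the concrete field types are illustrative, not a fixed API:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ContextAtom:
    """One retrievable unit of knowledge (illustrative schema sketch)."""
    id: str
    text: str                           # LLM-consumable prose
    embedding: list[float]              # vector for ANN retrieval
    provenance: str                     # source link / commit / ticket id
    tags: list[str] = field(default_factory=list)           # service/module/domain
    freshness: Optional[float] = None   # unix timestamp
    access_scope: list[str] = field(default_factory=list)   # permission scopes

atom = ContextAtom(
    id="auth-arch-001",
    text="The auth service issues JWTs via the token-refresh endpoint.",
    embedding=[0.1, 0.2],
    provenance="docs/auth/architecture.md@abc123",
    tags=["auth", "architecture"],
)
```

Keeping the optional fields optional lets atom generation pipelines start minimal and enrich atoms (tags, freshness) incrementally.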
A context expert encodes a stable attention distribution over the atom space. Informally: “When in this mode, these atoms tend to matter more.”
Representation options:
- A sparse ranked list of atom IDs with weights
- A dense weight vector over atoms (usually compressed)
- A factorized model (weights over tags → atoms)
Each expert also has an embedding used for routing.
A tool expert encodes a prior over which tools are appropriate (and which are risky/expensive), optionally conditioned on environment constraints (latency budget, permissions, offline mode).
A strategy expert encodes a prior over tool-call strategies:
- Sequencing (which tool first, second, etc.)
- Argument templates and slot-filling
- Guardrails and validation steps
- “Ask clarifying question vs. call tool now”
- Summarization / normalization patterns for tool outputs
Strategies can be written by humans, learned from logs, or both.
MoAC is an orchestration layer around an LLM:
```
User Prompt
     |
     v
[Embedder] --> q (query embedding)
     |
     v
[MoAC Router]
   |         |          |
   v         v          v
Context    Tool      Strategy
Mixture   Mixture    Mixture
   |         |          |
   +----> [Assembler & Planner] ----> Context Pack + Tool Plan
                  |
                  v
                [LLM]
                  |
                  v
   (Optional tool execution loop)
```
MoAC produces two primary artifacts:

1. A Context Pack: a structured bundle of selected atoms, ordered and optionally weighted.
2. A Tool Plan: a plan that specifies:
- Which MCP tools to call
- In what order
- With what arguments
- With what safety checks and stopping conditions
- How to summarize/transform results for the LLM
Let:
- `A = {a_1, ..., a_M}` be the context atoms
- `E_c = {e_1, ..., e_N}` be the context experts
- `q` be the query embedding

Each context expert `e_i` has:
- an embedding `k_i`
- a weighting function over atoms `w_i(a)` (often stored as sparse weights)
Compute mixture weights:
$$ \alpha_i = \mathrm{softmax}_i\!\left(\frac{\mathrm{sim}(q, k_i)}{\tau}\right) $$
where sim is cosine similarity and τ is a temperature.
$$ W(a) = \sum_{i=1}^{N} \alpha_i \cdot w_i(a) $$
Select top-K atoms by W(a) (with diversity and permission constraints).
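The routing math above can be worked through on toy data. The similarities, expert weights, and atom IDs below are made up for illustration:

```python
import numpy as np

def softmax(x, tau=1.0):
    z = np.asarray(x, dtype=float) / tau
    z = z - z.max()               # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Toy data: cosine similarities sim(q, k_i) for 3 context experts.
sims = [0.9, 0.2, 0.1]
alpha = softmax(sims, tau=0.5)    # mixture weights alpha_i

# Sparse atom weights w_i(a) per expert (atom_id -> weight).
expert_weights = [
    {"runbook": 0.8, "deploys": 0.6},
    {"tickets": 0.9},
    {"runbook": 0.3, "owners": 0.7},
]

# Combined score W(a) = sum_i alpha_i * w_i(a).
W = {}
for a_i, w_i in zip(alpha, expert_weights):
    for atom_id, w in w_i.items():
        W[atom_id] = W.get(atom_id, 0.0) + a_i * w

top_k = sorted(W, key=W.get, reverse=True)[:2]   # top-K selection
```

Note how `runbook` outranks `tickets` even though a single expert weights `tickets` higher: the mixture weight of the best-matching expert dominates.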
Similarly:
- the tool mixture selects tools `T`
- the strategy mixture selects strategies `S`
- the planner composes them into a tool plan (or offers candidates to the LLM)
For many tasks, the model’s first decision dominates outcome:
- “Search logs” vs “read runbook” vs “inspect recent PRs”
- “Call dependency graph tool” vs “grep code” vs “query incidents”
- “Ask one clarifying question” vs “start executing tools”
MoAC treats tools and strategies as first-class attended objects.
Assume MCP tools expose:
- a name
- input/output schema
- description/capabilities
- permission requirements
- cost/latency hints (optional)
- environment constraints (optional)
MoAC maintains:
- a tool catalog (MCP servers + tools)
- embeddings for tools and strategies
- learned priors per domain/work-mode
A strategy template is a reusable recipe:
```yaml
strategy_id: "incident_triage_v1"
description: "Triage a production issue using runbook + recent deploy + error logs"
preconditions:
  - has_service_name: true
guards:
  - require_permissions: ["logs:read", "deploys:read"]
  - max_cost: 2.0
steps:
  - tool: "mcp.runbook.search"
    args_template:
      query: "{service_name} {symptom_keywords}"
  - tool: "mcp.deploys.recent"
    args_template:
      service: "{service_name}"
      window: "48h"
  - tool: "mcp.logs.query"
    args_template:
      service: "{service_name}"
      filter: "level:error AND ({symptom_keywords})"
      window: "2h"
postprocess:
  - summarize_outputs: true
  - normalize_timestamps: true
stop_conditions:
  - "root_cause_identified"
  - "need_user_clarification"
```

Strategies can also specify how to present tool results to the LLM (e.g., deduplication, structured tables, “top 5 anomalies”).
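One way to implement `args_template` slot-filling is plain `str.format` over a slot dictionary; a missing slot then surfaces as a `KeyError`, which the planner can turn into a clarifying question. This is a minimal sketch, not the only possible templating scheme:

```python
def fill_args(args_template: dict, slots: dict) -> dict:
    """Fill {slot} placeholders in a strategy step's args_template.

    Raises KeyError when a required slot is absent, so the planner can
    fall back to "ask clarifying question" instead of calling the tool.
    """
    return {k: (v.format(**slots) if isinstance(v, str) else v)
            for k, v in args_template.items()}

step_args = fill_args(
    {"service": "{service_name}", "window": "48h"},
    {"service_name": "auth", "symptom_keywords": "token refresh 401"},
)
```

Extra slots are simply ignored, so one slot dictionary can serve every step of a strategy.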
MoAC supports multiple execution policies:
- Suggest-only: Provide top tool/strategy candidates to the LLM; the LLM decides.
- Auto-plan: Produce a tool plan; LLM reviews and executes.
- Auto-execute: Orchestrator executes tools, feeds results to LLM (guardrailed).
- Hybrid: Auto-execute low-risk steps; ask for approval on risky actions.
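A per-step policy gate for these execution modes could look like the sketch below; the allowlist and tool names are hypothetical:

```python
LOW_RISK = {"mcp.runbook.search", "mcp.logs.query"}  # hypothetical allowlist

def execution_decision(tool_name: str, policy: str) -> str:
    """Return 'execute', 'ask_approval', or 'suggest' for one plan step."""
    if policy == "suggest-only":
        return "suggest"          # LLM sees candidates and decides
    if policy == "auto-execute":
        return "execute"          # orchestrator runs it (guardrailed upstream)
    if policy == "hybrid":
        # auto-execute only low-risk tools; gate everything else
        return "execute" if tool_name in LOW_RISK else "ask_approval"
    # "auto-plan": the LLM reviews the full plan before any execution
    return "ask_approval"
```

In practice the risk test would consult the tool catalog's permission and cost metadata rather than a static set.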
Even with good weights, assembly matters. Common rules:
- Permission filter: only include atoms the user is allowed to see.
- Freshness bias: for operational/debug tasks, upweight recent atoms.
- Diversity: prevent the pack from being all tickets or all docs.
- Compression: prefer smaller “summary atoms” when space is constrained.
- Provenance: attach sources to reduce hallucinations and enable follow-up.
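The first three rules can be composed into a single selection pass. This is a sketch under assumed metadata fields (`access_scope`, `freshness`, `source`); the half-life and cap values are illustrative:

```python
import time

def select_atoms(atom_scores, atoms, user_scopes, k=5,
                 freshness_half_life=86_400.0, max_per_source=2):
    """Assembly sketch: permission filter, freshness bias, diversity cap.

    `atoms` maps atom_id -> dict with 'access_scope', 'freshness', 'source'.
    """
    now = time.time()
    ranked = []
    for atom_id, score in atom_scores.items():
        meta = atoms[atom_id]
        if not set(meta["access_scope"]) <= user_scopes:   # permission filter
            continue
        age = now - meta.get("freshness", now)
        score *= 0.5 ** (age / freshness_half_life)        # freshness decay
        ranked.append((score, atom_id, meta["source"]))
    ranked.sort(reverse=True)
    picked, per_source = [], {}
    for score, atom_id, source in ranked:                  # diversity cap
        if per_source.get(source, 0) >= max_per_source:
            continue
        per_source[source] = per_source.get(source, 0) + 1
        picked.append(atom_id)
        if len(picked) == k:
            break
    return picked
```

Compression and provenance are then handled at render time, when the pack is serialized for the LLM.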
A good plan should:
- minimize calls (cost/latency)
- maximize expected information gain
- validate tool outputs (schema checks)
- use safe defaults
- include fallback/clarifying questions
MoAC can compute an expected utility score:
$$ U(\text{plan}) \approx \mathbb{E}[\text{task\_success}] - \lambda_1 \cdot \text{cost} - \lambda_2 \cdot \text{latency} - \lambda_3 \cdot \text{risk} $$
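The utility score is a straightforward linear trade-off; the λ values and candidate plans below are invented to show how it ranks alternatives:

```python
def plan_utility(p_success, cost, latency, risk,
                 lam_cost=0.1, lam_latency=0.05, lam_risk=1.0):
    """U(plan) ≈ E[success] - λ1·cost - λ2·latency - λ3·risk (λ's illustrative)."""
    return p_success - lam_cost * cost - lam_latency * latency - lam_risk * risk

candidates = {
    "runbook_first": plan_utility(0.70, cost=1.0, latency=2.0, risk=0.0),
    "logs_first":    plan_utility(0.75, cost=2.0, latency=6.0, risk=0.05),
}
best = max(candidates, key=candidates.get)
```

Even a slightly less likely-to-succeed plan can win once cost, latency, and risk are priced in, which is exactly the behavior the planner wants.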
MoAC can start as a hand-built system and become learnable over time.
- Create initial experts from curated “work modes”
- Create initial strategy templates from runbooks and best practices
- Populate context atoms from code/doc/ticket analyzers
For learning context experts, data sources include:
- Successful prompt → atoms used
- Human annotations (“these were the right atoms”)
- Downstream outcomes (task success)
Methods:
- clustering prompt embeddings → expert centroids
- learning weights from co-occurrence (atoms used in successful sessions)
- supervised learning from labeled relevance sets
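Clustering prompt embeddings into expert centroids can be done with plain k-means; a minimal NumPy version (assuming well-separated work modes) might look like:

```python
import numpy as np

def expert_centroids(prompt_embeddings, n_experts, iters=10, seed=0):
    """Bootstrap expert routing keys k_i by k-means over prompt embeddings."""
    rng = np.random.default_rng(seed)
    X = np.asarray(prompt_embeddings, dtype=float)
    centroids = X[rng.choice(len(X), n_experts, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assign each prompt to its nearest centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each centroid to the mean of its assigned prompts
        for j in range(n_experts):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels
```

Each centroid then serves as an expert embedding `k_i`, and the atoms used in that cluster's successful sessions seed the expert's sparse weights.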
For learning tool and strategy experts, data sources include:
- tool call logs and outcomes
- latency/cost telemetry
- human “good plan” examples
Methods:
- behavior cloning: predict tool sequences from historical traces
- bandits: explore tool choices with guarded rollout
- offline RL: optimize success metrics under constraints
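The simplest learned tool prior is a smoothed success rate per (work mode, tool) pair from the call logs; the log format and mode names here are illustrative:

```python
from collections import defaultdict

def tool_priors_from_logs(logs, alpha=1.0):
    """Per-mode tool priors from (mode, tool, success) log triples.

    Laplace-smoothed success rate; `mode` would come from the router.
    """
    succ = defaultdict(float)
    total = defaultdict(float)
    for mode, tool, success in logs:
        succ[(mode, tool)] += 1.0 if success else 0.0
        total[(mode, tool)] += 1.0
    return {k: (succ[k] + alpha) / (total[k] + 2 * alpha) for k in total}

priors = tool_priors_from_logs([
    ("incident_triage", "mcp.logs.query", True),
    ("incident_triage", "mcp.logs.query", True),
    ("incident_triage", "mcp.grep", False),
])
```

Smoothing keeps rarely used tools explorable (the bandit setting above builds on exactly this kind of estimate).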
Consider the query: “Why did the auth service start failing after yesterday’s deploy?”
MoAC routing might activate:
- Context experts: `incident_triage`, `auth_domain`, `recent_changes_bias`
- Tool experts: logs, deploy history, runbooks
- Strategy experts: incident triage strategy for auth services
Context Pack might include:
- Auth service architecture atom
- “Token refresh known issue” ticket atom
- Recent deploy summary atom (last 48h)
- Ownership/contact atom (oncall, recent authors)
Tool Plan might do:
- Fetch runbook section relevant to “auth failures”
- Fetch deploy diff summary (commits, config changes)
- Query error logs for signature changes
- (Optional) query metrics for error-rate spike correlating with deploy
The LLM then answers with grounded evidence and a prioritized hypothesis list, plus suggested next steps.
MoAC increases leverage; therefore, it must be constrained.
- Every atom and tool call must be permission-checked.
- Tool experts and strategies must obey org policy (least privilege).
- Strategy templates should declare required scopes.
- Prefer summarized atoms over raw sensitive data.
- Redact/avoid secrets and PII by default.
- Use “need-to-know” retrieval policies.
- Log: which experts fired, which atoms selected, which tools called, why.
- Provide an “explain route” view for debugging.
- Overconfident routing → use mixture (top-N) + entropy thresholds.
- Index staleness → decay weights; retrain periodically; freshness gates.
- Tool misuse → allowlists; risk scoring; approval gates.
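The entropy threshold for overconfident (or, conversely, too-diffuse) routing is a one-liner over the mixture weights; the threshold value is illustrative:

```python
import math

def routing_entropy(alpha):
    """Shannon entropy of the mixture; high entropy = uncertain routing."""
    return -sum(a * math.log(a) for a in alpha if a > 0)

def confident_route(alpha, max_entropy=0.8):
    """Fall back (e.g., to suggest-only mode) when routing is too uncertain."""
    return routing_entropy(alpha) <= max_entropy
```

A peaked mixture like `[0.9, 0.05, 0.05]` passes the gate, while a near-uniform mixture does not and should trigger a clarifying question or suggest-only execution.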
- Atom store:
  - ANN index over atom embeddings
  - metadata store for tags/freshness/access
- Expert store:
  - expert embedding index
  - expert → sparse weights (top atoms / tag priors)
- Tool/strategy store:
  - embeddings + metadata (schemas, scopes)
  - strategy templates + guards
```python
from collections import defaultdict

TAU = 0.5  # routing temperature

def moac_route(prompt: str) -> dict:
    q = embed(prompt)

    # 1) Context mixture
    ctx_experts = topn_similar(q, context_expert_index, n=8)
    alpha = softmax([sim(q, e.embedding) / TAU for e in ctx_experts])
    atom_scores = defaultdict(float)
    for w, e in zip(alpha, ctx_experts):
        for atom_id, atom_w in e.sparse_atom_weights:
            atom_scores[atom_id] += w * atom_w
    # Apply constraints + diversity + freshness
    atom_ids = select_atoms(atom_scores)

    # 2) Tool mixture
    tool_experts = topn_similar(q, tool_expert_index, n=6)
    tool_mix = mix_tools(q, tool_experts)

    # 3) Strategy mixture
    strat_experts = topn_similar(q, strategy_index, n=6)
    strategies = propose_strategies(q, strat_experts, tool_mix)

    return {
        "context_pack": assemble_context(atom_ids),
        "tool_plan": assemble_plan(strategies, tool_mix),
        "explain": {
            "context_experts": [(e.id, w) for e, w in zip(ctx_experts, alpha)],
            "tool_experts": [e.id for e in tool_experts],
            "strategy_experts": [e.id for e in strat_experts],
        },
    }
```

A simple, robust format for LLM consumption:
- Short header: what this pack is, and how it was selected
- Sections by category (docs, code, tickets, people, ops)
- Each atom includes provenance and a confidence/weight
This makes the pack both machine-usable and human-debuggable.
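A rendering function following that format might look like the sketch below; the field names (`category`, `text`, `provenance`) are assumptions matching the atom schema earlier:

```python
def render_context_pack(selection, query):
    """Render selected atoms as an LLM-consumable markdown pack.

    `selection` is a list of (atom, weight) pairs, where atom is a dict
    with 'category', 'text', and 'provenance' fields.
    """
    lines = [f"# Context Pack for: {query}",
             "Selected by MoAC routing; weights indicate relevance.", ""]
    by_cat = {}
    for atom, weight in selection:
        by_cat.setdefault(atom["category"], []).append((atom, weight))
    for cat in sorted(by_cat):                 # sections by category
        lines.append(f"## {cat}")
        for atom, weight in by_cat[cat]:       # provenance + weight per atom
            lines.append(f"- ({weight:.2f}) {atom['text']} "
                         f"[source: {atom['provenance']}]")
        lines.append("")
    return "\n".join(lines)
```

Because every line carries a weight and a source, a developer can audit the pack as easily as the model can consume it.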
MoAC should be evaluated on:
- Task success (did the system solve the issue / produce correct patch?)
- Groundedness (claims supported by atoms/tool outputs)
- Tool efficiency (calls per task, latency, cost)
- Robustness (performance under ambiguous prompts)
- User trust (interpretability of routing + ability to override)
Recommended harness:
- A fixed set of tasks (bug triage, feature dev, refactor, incident response)
- Offline replay with logged ground truth
- A/B tests versus baseline RAG and baseline “LLM picks tools”
- Cold start: without good experts/strategies, MoAC is just extra plumbing.
- Maintenance: experts can drift as code/org evolves.
- Overfitting to modes: rare tasks may route incorrectly.
- Complexity: routing + planning + tool execution introduces more failure points.
MoAC works best when paired with:
- clear atom generation pipelines
- a small number of high-quality initial strategies
- strong guardrails and observability
- Hierarchical MoAC: route first to a domain, then to sub-modes.
- Adaptive mixtures: re-route after each tool call using updated state.
- Differentiable training: learn experts and routing end-to-end (where feasible).
- Personalized mixtures: adapt to team/project/user while preserving privacy boundaries.
- Bidirectional integration: allow the LLM to request specific experts (“switch to incident triage mode”).
- `code_understanding_v1`: find owners, read module summary, trace call graph
- `bug_triage_v1`: reproduce steps, check recent changes, search similar tickets
- `incident_triage_v1`: runbook → deploy diff → logs → metrics correlation
- `security_review_v1`: sensitive sinks, auth boundaries, known vulns, threat checklist
- `refactor_planning_v1`: interface boundaries, tests, callers, risk hotspots
Expose to developers:
- top experts (with weights)
- top atoms (with scores and provenance)
- chosen tools and strategies (with guards)
- reasons for excluding atoms (permissions, redundancy, size)
This makes MoAC debuggable and helps iterate on indices/strategies quickly.
Mixture of Attended Contexts (MoAC) reframes retrieval as attention routing: selecting not only what information to include, but also how to act—via tool selection and MCP strategy orchestration. By representing reusable work modes as context experts and strategy templates as attended objects, MoAC offers a practical path toward more reliable, interpretable, and efficient LLM systems in real-world engineering environments.