Large language models (LLMs) struggle with long-range context awareness in complex codebases and organizational knowledge systems. Traditional retrieval-augmented generation (RAG) selects documents based on semantic similarity to a prompt, but it does not model attention structure. As a result, retrieval is reactive and often shallow: it retrieves what looks similar, not what is structurally relevant.
This paper proposes EigenAttention, an embedding-conditioned external attention routing mechanism that approximates transformer-style attention at the schema level rather than the document level. Instead of retrieving documents directly from a prompt embedding, the system retrieves an attention prior—a reusable ranking distribution over contextual entries—based on similarity between the prompt embedding and a set of learned attention indices.
The result is schema-aware contextualization: prompts retrieve patterns of relevance, not just similar documents. This approach more closely models how humans recall structured knowledge, and it approximates how transformer attention might behave if the context window were unbounded.
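The routing step described above can be sketched as follows. This is a minimal illustrative implementation, not the paper's actual system: the function name, the use of cosine similarity, the softmax mixing of per-index ranking distributions, and the `temperature` parameter are all assumptions made for clarity.

```python
import numpy as np

def attention_prior(prompt_emb, index_embs, index_rankings, temperature=1.0):
    """Hypothetical sketch of embedding-conditioned attention routing.

    Instead of retrieving documents directly from the prompt embedding,
    we score the prompt against a set of learned attention indices and
    mix each index's stored ranking distribution over context entries
    into a single "attention prior".
    """
    # Cosine similarity between the prompt and each learned attention index.
    p = prompt_emb / np.linalg.norm(prompt_emb)
    idx = index_embs / np.linalg.norm(index_embs, axis=1, keepdims=True)
    sims = idx @ p
    # Softmax over indices: how strongly each schema-level pattern applies.
    w = np.exp(sims / temperature)
    w /= w.sum()
    # Weighted mixture of per-index ranking distributions over context entries.
    return w @ index_rankings  # shape: (num_context_entries,)
```

Because the index weights and each stored ranking distribution sum to one, the resulting prior is itself a probability distribution over contextual entries, which can then be used to rank or sample context for the prompt.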
Modern LLM workflows typically rely on:
- Prompt engineering
- Retrieval-Augmented Generation (RAG)