TokenZip v2 turns Karpathy's LLM wiki concept into a gzip-like token compression engine for an entire codebase, reducing LLM input token cost by up to 95% when used with coding copilots such as Claude Code and Codex. Instead of generating a flat text summary, it builds a multi-level, queryable, chainable knowledge graph — from repo → modules → files → symbols — stored locally in .tokenzip/db, exposed as an MCP server for any AI copilot, and kept fresh via git hooks.
| Problem | Impact |
|---|---|
| AI copilots lack structural awareness of large codebases | They hallucinate imports, miss dependencies, suggest changes in wrong modules |
| Text-based token references are flat and non-queryable | Cannot ask "which functions depend on this interface?" or "what modules does this feature span?" |
| No persistent code intelligence layer | Every session re-parses from scratch, wasting tokens and time |
| Documentation (PRD/HLD/LLD/README) is unstructured | AI can't extract workflows, sequence diagrams, or release plans from markdown |
| Cross-language dependency tracking is manual | A SQL schema change affecting 3 TS files is invisible until runtime |
| Cross-repository dependency tracking is manual | The current repository has no awareness of dependent or upstream repositories, including shared interfaces, API contracts, endpoint usage, schema dependencies, or cross-repo integrations — making impact analysis and coordinated changes error-prone |
| Version-aware dependency conflicts are difficult to detect | AI copilots and developers lack visibility into incompatible interface versions, breaking API/schema changes, SDK mismatches, or transitive dependency drift across repositories — causing silent integration failures and upgrade risks |
- AI Copilot Users (Claude Code, Codex, OpenCode, Kilo Code) — need structured context without token waste
- Full-stack Developers working in monorepos with 50+ modules
- Tech Leads auditing codebase structure and dependency health
- Onboarding Engineers needing rapid codebase mental model
"Your codebase as a queryable graph — not a text dump. Ask structural questions, get precise answers, zero hallucination."
Repository
└── Module (auto-detected: package.json, pyproject.toml, go.mod, Cargo.toml, etc.)
└── File
└── Symbol (function, class, interface, variable, table, column, etc.)
Acceptance Criteria:
- Auto-detect module boundaries by presence of manifest files
- Support nested modules (monorepo: repo → apps/web → src/components)
- Each node has a stable UUID that survives renames (content-hash + path-hash hybrid)
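One way the content-hash + path-hash hybrid could work — a minimal sketch, with illustrative hash lengths and separator, not the shipped scheme:

```typescript
import { createHash } from 'node:crypto';

// Hypothetical sketch of the hybrid node ID: the path half keeps the ID
// stable across content edits, while the content half lets a renamed but
// otherwise identical file be matched back to its previous node.
export function stableNodeId(relPath: string, content: string): string {
  const pathHash = createHash('sha256').update(relPath).digest('hex').slice(0, 12);
  const contentHash = createHash('sha256').update(content).digest('hex').slice(0, 12);
  return `${pathHash}-${contentHash}`;
}
```

On rename, a lookup by the content half recovers the old node; on edit, a lookup by the path half does.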
| Language | Extracted Artifacts |
|---|---|
| .js, .mjs | Functions, classes, exports, imports, global vars, JSDoc |
| .ts, .tsx | Above + interfaces, type aliases, generics, enums, decorators, namespace exports |
| .py | Functions, classes, decorators, type hints, imports, async defs |
| .sql | Tables, views, columns, constraints, indexes, foreign keys, stored procedures |
| .go | Functions, structs, interfaces, methods, packages, imports |
| .rs | Functions, structs, traits, impls, enums, mods, use statements |
| .java, .kt | Classes, interfaces, methods, annotations, packages |
| .md (special) | Headings, lists, code blocks, mermaid diagrams, tables, frontmatter |
Acceptance Criteria:
- Each symbol stored as a node with: name, kind, signature, line range, hash, docstring
- Relationships: CALLS, IMPLEMENTS, INHERITS, IMPORTS, EXPORTS, MODIFIES, READS
- Incremental parse: only re-parse files whose content hash changed
- Parse errors stored as node metadata (not silently dropped)
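The incremental-parse gate above can be sketched as a hash comparison — helper names here are assumptions, not the real module:

```typescript
import { createHash } from 'node:crypto';

// Sketch: a file is re-parsed only when its SHA-256 content hash differs
// from the hash stored in the graph on the previous run.
export const sha256 = (s: string): string =>
  createHash('sha256').update(s).digest('hex');

export function filesToReparse(
  current: Map<string, string>, // path → current file content
  stored: Map<string, string>,  // path → content hash stored in the graph
): string[] {
  return [...current]
    .filter(([path, content]) => stored.get(path) !== sha256(content))
    .map(([path]) => path);
}
```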
For structured markdown files (.prd.md, .hld.md, .lld.md, README.md, CHANGELOG.md, ADR/*.md):
| Section Type | Extracted Structure |
|---|---|
| ## Workflow / ## Flow | Ordered step graph with actors and actions |
| ## Sequence Diagram | Parsed mermaid sequenceDiagram into actor→message→actor edges |
| ## Flowchart | Parsed mermaid flowchart into decision/action node graph |
| ## Release Plan | Timeline with milestones, versions, dates |
| ## API | Endpoint → method → params → response schema |
| ## Architecture / ## Components | Component hierarchy with responsibility and tech stack |
| ## Decision (ADR) | Context → Decision → Consequences as structured tuple |
| Standard lists | Typed list items (checkbox, numbered, bullet) with nesting |
| Tables | Columnar data as records |
Acceptance Criteria:
- Mermaid blocks parsed into graph nodes, not stored as raw text
- Section-level linking: a workflow step can reference a function symbol node
- Cross-reference resolution: `[see ModuleX]` in PRD links to Module node in graph
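A minimal sketch of turning mermaid `sequenceDiagram` lines into actor→message→actor edges, as described above — this covers only the `->>` / `-->>` arrow forms and is not the real parser:

```typescript
export interface SeqEdge { from: string; to: string; message: string }

// Illustrative: match "A->>B: msg" and "A-->>B: msg" lines; everything else
// (participants, notes, loops) is ignored in this sketch.
export function parseSequenceLines(lines: string[]): SeqEdge[] {
  const edges: SeqEdge[] = [];
  for (const line of lines) {
    const m = line.trim().match(/^(\w+)\s*--?>>?\s*(\w+)\s*:\s*(.+)$/);
    if (m) edges.push({ from: m[1], to: m[2], message: m[3] });
  }
  return edges;
}
```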
// Level 1: Repository
const repo = tz.repo('.');
// Level 2: Modules (filterable, chainable)
const feModules = repo.modules().filter(m => m.language === 'typescript');
// Level 3: Files within modules
const tsFiles = feModules.files().filter(f => f.ext === '.tsx');
// Level 4: Symbols within files
const exportedComponents = tsFiles.symbols()
.filter(s => s.kind === 'class' && s.isExported && s.extends('React.Component'));
// Cross-cutting queries
const dependants = tz.repo('.').symbol('UserService.authenticate')
.dependants() // who calls this?
.withinModule('api-gateway') // scope it
.withKind('function'); // filter
const impact = tz.repo('.').table('users')
.columns() // what columns
.referencedBy() // where are they referenced
.files(); // which files
const workflow = tz.repo('.').doc('prd.md')
.section('Workflow: User Onboarding')
.steps() // ordered steps
  .linkedSymbols(); // what code implements each step
Acceptance Criteria:
- Every level returns a query builder, not raw data (lazy evaluation)
- `.toArray()`, `.toGraph()`, `.toMarkdown()`, `.toJSON()` terminal methods
- Queries translate to SurrealDB graph traversal queries
- Response < 100ms for repos up to 100K files
- Engine: SurrealDB (embedded via RocksDB storage)
- Location: `<project_root>/.tokenzip/db/`
- Schema: Schemaful (strict types per node kind)
- Persistence: WAL-enabled, crash-safe
Acceptance Criteria:
- `.tokenzip/` added to `.gitignore` automatically
- DB size < 10% of source code size for typical repos
- Cold start (first full parse) processes > 500 files/second
- Hot start (incremental) processes > 2000 files/second
# Installed via: tokenzip init
# Creates .git/hooks/pre-commit and .git/hooks/post-commit
pre-commit:
1. Detect staged files (git diff --cached --name-only)
2. Parse changed files with tree-sitter
3. Diff new AST against stored graph
4. Validate: no broken exports, no orphan imports
5. Update graph with new symbol nodes/edges
6. If validation fails: warn (configurable: warn/block)
post-commit:
1. Store commit metadata (hash, message, author, timestamp)
2. Create COMMIT → MODIFIED → FILE edges
3. Update file-level git history nodes
Acceptance Criteria:
- Hook installation is non-destructive (appends to existing hooks)
- Hook execution adds < 500ms to commit time for typical changes (< 10 files)
- `tokenzip init --no-hooks` flag for CI environments
- `tokenzip status` shows graph health (stale files, broken references)
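The non-destructive, append-only install could look like this — the marker comments and function name are assumptions for illustration:

```typescript
// Sketch: if .git/hooks/pre-commit already exists, the tokenzip command is
// appended between markers instead of overwriting the file; re-installing
// is a no-op, and a missing hook file gets a fresh shebang.
const BEGIN = '# >>> tokenzip >>>';
const END = '# <<< tokenzip <<<';

export function withTokenzipHook(existing: string | null, cmd: string): string {
  const block = `${BEGIN}\n${cmd}\n${END}`;
  if (existing === null) return `#!/bin/sh\n${block}\n`;
  if (existing.includes(BEGIN)) return existing; // idempotent re-install
  return existing.trimEnd() + '\n' + block + '\n';
}
```

Uninstall would remove only the text between the two markers, leaving any pre-existing hook logic untouched.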
Acceptance Criteria:
- MCP server starts in < 200ms
- All tools return structured JSON (never raw text dumps)
- Token budget aware: responses include `token_count` metadata
- Works with Claude Code, Codex, OpenCode, Kilo Code without config changes
- Concurrent tool calls supported (SurrealDB connection pooling)
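The `token_count` metadata contract might be as simple as wrapping every tool response — the wrapper name and the chars/4 heuristic (used later in this document for `utils/tokens.ts`) are assumptions:

```typescript
// Sketch: attach an estimated token count to every structured tool response
// so the client can budget its context window.
export function withTokenCount<T>(payload: T): { data: T; token_count: number } {
  const chars = JSON.stringify(payload).length;
  return { data: payload, token_count: Math.ceil(chars / 4) };
}
```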
| Workflow | Input | Output | Graph Operations |
|---|---|---|---|
| Create Module | module name, type, dependencies | Scaffolded structure + graph nodes | CREATE module, CREATE files, CREATE IMPORTS edges |
| Update Module | module name, change description | Affected files + symbols list | READ dependants, READ dependencies, DIFF graph |
| Implement Feature | feature description, target module | Files to create/modify, symbol gaps | SEARCH related symbols, PATH analysis, IMPACT query |
| Upgrade Feature | feature name, upgrade description | Migration plan + affected modules | SUBGRAPH extraction, DEPENDENCY chain analysis |
| Bug Fix | error message / stack trace | Root cause candidates + impact radius | TRACE call chain, FIND modified symbols in git blame range |
Acceptance Criteria:
- Each workflow is a deterministic graph query sequence, not LLM-generated
- Workflows return structured data that an LLM can act on (not final answers)
- Workflow results are cached and timestamped in the graph
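A workflow template as a deterministic sequence can be sketched like this — step and field names are illustrative, and the `run` functions stand in for real graph queries:

```typescript
// Sketch: a workflow is a fixed list of named graph operations executed in
// order, with no LLM in the loop; the result is structured data for the
// copilot to act on, cached with a timestamp per the acceptance criteria.
export interface Step {
  op: string;                       // e.g. 'TRACE', 'SEARCH', 'IMPACT'
  run: (input: string) => string[]; // stand-in for a graph query
}

export function runWorkflow(steps: Step[], input: string) {
  const results: Record<string, string[]> = {};
  for (const step of steps) results[step.op] = step.run(input);
  return { input, results, cached_at: new Date().toISOString() };
}
```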
| Category | Requirement |
|---|---|
| Performance | Full index of 100K file repo < 3 minutes; incremental update < 2 seconds |
| Memory | MCP server idle < 50MB; parsing peak < 500MB |
| Reliability | Never corrupt the graph on crash; WAL recovery on restart |
| Compatibility | Node.js 20+, macOS 12+, Ubuntu 22.04+, Windows WSL2 |
| Security | No network calls; all data local; no code execution from graph |
| Extensibility | New language support via plugin (tree-sitter grammar + extractor config) |
| Metric | Target |
|---|---|
| Copilot context accuracy (relevant vs irrelevant tokens) | > 85% (vs ~40% with text dump) |
| Time to first useful query after tokenzip init | < 5 minutes for 50K file repo |
| Hook overhead per commit | < 500ms |
| MCP tool call latency (p95) | < 200ms |
| Graph size efficiency | < 10% of source size |
- Remote graph synchronization (multi-developer shared graph)
- LLM-powered code generation (this is a context layer, not a code writer)
- Runtime analysis (only static analysis via tree-sitter)
- Binary file parsing (images, compiled artifacts)
- IDE plugin (VS Code extension is v3)
| Phase | Scope | Timeline |
|---|---|---|
| Alpha | Core graph + JS/TS parsing + MCP server + basic queries | Week 1-3 |
| Beta | All languages + git hooks + documentation intelligence | Week 4-6 |
| RC | Workflow templates + chainable API polish + perf tuning | Week 7-8 |
| GA | Stability hardening + plugin system + docs | Week 9-10 |
TokenZip v2 is a local-first, static-analysis graph engine with four layers:
┌─────────────────────────────────────────────────────────────────┐
│ LAYER 4: INTEGRATION │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌───────────────┐ │
│ │ Claude │ │ Codex │ │ OpenCode │ │ Kilo Code │ │
│ │ Code │ │ │ │ │ │ │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └──────┬────────┘ │
│ │ │ │ │ │
│ └──────────────┴──────┬───────┴────────────────┘ │
│ │ MCP Protocol (stdio/SSE) │
├─────────────────────────────┼───────────────────────────────────┤
│ LAYER 3: API & QUERY │
│ ┌──────────────────────────┴──────────────────────────────┐ │
│ │ MCP Server │ │
│ │ ┌─────────────────┐ ┌──────────────────────────────┐ │ │
│ │ │ Tool Registry │ │ Resource Registry │ │ │
│ │ └────────┬────────┘ └──────────────┬───────────────┘ │ │
│ │ └──────────┬───────────────┘ │ │
│ │ ┌───────┴────────┐ │ │
│ │ │ Chainable Query│ │ │
│ │ │ Builder (CQB) │ │ │
│ │ └───────┬────────┘ │ │
│ └──────────────────────┼──────────────────────────────────┘ │
├──────────────────────────┼──────────────────────────────────────┤
│ LAYER 2: ENGINE │
│ ┌───────────────────────┼──────────────────────────────────┐ │
│ │ ┌────────────┐ ┌────┴─────┐ ┌──────────┐ ┌───────┐ │ │
│ │ │ Tree-Sitter│ │ Markdown │ │ Workflow │ │ Graph │ │ │
│ │ │ Extractor │ │ Parser │ │ Engine │ │ Query │ │ │
│ │ │ (per lang) │ │ (struct) │ │ (tpl) │ │ Planner│ │ │
│ │ └─────┬──────┘ └────┬─────┘ └────┬─────┘ └───┬───┘ │ │
│ │ └──────────────┼──────────────┼────────────┘ │ │
│ │ ┌───────┴──────────────┴───────┐ │ │
│ │ │ Graph Mutation Engine │ │ │
│ │ │ (diff, merge, validate) │ │ │
│ │ └───────────────┬───────────────┘ │ │
│ └──────────────────────────────┼────────────────────────────┘ │
├──────────────────────────────┼─────────────────────────────────┤
│ LAYER 1: STORAGE │
│ ┌───────────────────────────┼─────────────────────────────┐ │
│ │ ┌────────────┴────────────┐ │ │
│ │ │ Storage Abstraction │ │ │
│ │ │ (IStore interface) │ │ │
│ │ └────────────┬────────────┘ │ │
│ │ ┌──────────────────┼──────────────────┐ │ │
│ │ ┌─────┴──────┐ ┌─────┴──────┐ ┌─────┴──────┐ │ │
│ │ │ SurrealDB │ │ SQLite │ │ In-Memory │ │ │
│ │ │ (primary) │ │ (fallback) │ │ (tests) │ │ │
│ │ └────────────┘ └────────────┘ └────────────┘ │ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ SIDE CHANNELS │
│ ┌──────────────┐ ┌───────────────┐ ┌────────────────────┐ │
│ │ Git Hooks │ │ File Watcher │ │ CLI (tokenzip) │ │
│ │ pre-commit │ │ (optional) │ │ init, parse, query │ │
│ │ post-commit │ │ chokidar │ │ status, serve │ │
│ └──────────────┘ └───────────────┘ └────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────┐
│ File Input Stream │
└──────────┬──────────┘
│
┌──────────┴──────────┐
│ Language Detector │
│ (extension + shebang│
│ + .editorconfig) │
└──────────┬──────────┘
│
┌────────────────┼────────────────┐
│ │ │
┌────────┴──────┐ ┌──────┴──────┐ ┌──────┴──────┐
│ Code Extractor│ │ SQL Extract.│ │ MD Extractor│
│ (JS/TS/Py/Go │ │ (Tables, │ │ (Sections, │
│ /Rs/Java/Kt) │ │ Columns, │ │ Mermaid, │
│ │ │ FKs, SPs) │ │ Lists, │
│ │ │ │ │ Tables) │
└───────┬───────┘ └──────┬──────┘ └──────┬──────┘
│ │ │
└────────────────┼────────────────┘
│
┌─────────┴──────────┐
│ Symbol Graph │
│ (nodes + edges) │
└────────────────────┘
Key Design Decision: Extractors produce an intermediate representation (IR) — a flat list of SymbolNode and SymbolEdge objects — regardless of source language. This decouples parsing from storage.
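A plausible shape for that IR — field names echo the symbol node schema later in this document, but this is a sketch, not the shipped types:

```typescript
// Every extractor, whatever the source language, emits the same flat pair
// of lists; storage never sees language-specific AST shapes.
export interface SymbolIR {
  name: string;
  kind: string;       // 'function' | 'class' | 'table' | 'section' | ...
  startLine: number;
  endLine: number;
  signature?: string;
  isExported: boolean;
}

export interface SymbolEdgeIR {
  type: string;       // 'calls' | 'imports' | 'foreign_key' | ...
  from: string;       // symbol id within this extraction unit
  to: string;
  metadata?: Record<string, unknown>;
}

export interface ExtractionIR {
  symbols: SymbolIR[];
  edges: SymbolEdgeIR[];
}
```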
QueryBuilder
├── .repo(path) → RepoScope
│ ├── .modules() → ModuleScope
│ │ ├── .files() → FileScope
│ │ │ ├── .symbols() → SymbolScope
│ │ │ ├── .tables() → TableScope
│ │ │ └── .sections()→ SectionScope
│ │ ├── .dependencies() → ModuleScope (external deps)
│ │ └── .dependants() → ModuleScope
│ ├── .files() → FileScope (all files, no module filter)
│ ├── .symbols() → SymbolScope (global search)
│ ├── .tables() → TableScope
│ └── .docs() → DocScope
├── .symbol(name) → SymbolScope (direct lookup)
├── .table(name) → TableScope
├── .commit(hash) → CommitScope
└── .workflow(name) → WorkflowScope
Every Scope has:
├── .filter(predicate) → same Scope (adds WHERE clause)
├── .sort(field, dir) → same Scope
├── .limit(n) → same Scope
├── .offset(n) → same Scope
└── Terminal methods:
├── .toArray() → SymbolNode[]
├── .toGraph() → { nodes: [], edges: [] }
├── .toMarkdown() → string
├── .toJSON() → string
├── .count() → number
└── .exists() → boolean
┌─────────────────────────────────────────────┐
│ MCP Server │
│ │
│ ┌─────────────────────────────────────┐ │
│ │ Transport Layer │ │
│ │ ┌──────────┐ ┌───────────────┐ │ │
│ │ │ stdio │ │ SSE/HTTP │ │ │
│ │ │ (default)│ │ (optional) │ │ │
│ │ └────┬─────┘ └──────┬────────┘ │ │
│ └───────┼──────────────────┼───────────┘ │
│ └──────────┬───────┘ │
│ ┌─────┴──────┐ │
│ │ Protocol │ │
│ │ Handler │ │
│ └─────┬──────┘ │
│ │ │
│ ┌─────────────────┼─────────────────────┐ │
│ │ Tool Dispatcher │ │
│ │ ┌──────────┐ ┌──────────┐ ┌────────┐ │ │
│ │ │ Structure│ │ Search │ │ Impact │ │ │
│ │ │ Tools │ │ Tools │ │ Tools │ │ │
│ │ └────┬─────┘ └────┬─────┘ └───┬────┘ │ │
│ │ └─────────────┼───────────┘ │ │
│ │ ┌──────┴──────┐ │ │
│ │ │ CQB │ │ │
│ │ │ (shared) │ │ │
│ │ └──────┬──────┘ │ │
│ └─────────────────────┼──────────────────┘ │
│ │ │
│ ┌─────────────────────┼──────────────────┐ │
│ │ Token Budget Manager │ │
│ │ - Estimates response token count │ │
│ │ - Truncates if over budget │ │
│ │ - Prioritizes: symbols > files > mods │ │
│ └─────────────────────────────────────────┘ │
└─────────────────────────────────────────────┘
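The truncation rule in the Token Budget Manager box can be sketched as follows — the priority weights and chars/4 estimate are taken from the diagram and this document's stated heuristic; the function name is an assumption:

```typescript
// Sketch: estimate each item's token cost, keep the highest-priority items
// (symbols > files > modules) until the budget is exhausted, drop the rest.
const estimate = (s: string) => Math.ceil(s.length / 4);
const PRIORITY: Record<string, number> = { symbol: 3, file: 2, module: 1 };

export function fitToBudget(
  items: { type: string; text: string }[],
  maxTokens: number,
): { type: string; text: string }[] {
  const sorted = [...items].sort((a, b) => PRIORITY[b.type] - PRIORITY[a.type]);
  const kept: typeof items = [];
  let used = 0;
  for (const item of sorted) {
    const cost = estimate(item.text);
    if (used + cost > maxTokens) break;
    used += cost;
    kept.push(item);
  }
  return kept;
}
```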
pre-commit trigger
│
▼
┌──────────────────┐
│ git diff --cached │
│ --name-only │
└───────┬──────────┘
│ staged file paths
▼
┌──────────────────┐
│ Content Hash │ ← SHA256 of file content
│ Check │ ← Compare with stored hash
└───────┬──────────┘
│ changed files only
▼
┌──────────────────┐
│ Tree-Sitter │ ← Parallel parse (worker threads)
│ Batch Parse │
└───────┬──────────┘
│ new symbol IR
▼
┌──────────────────┐
│ Graph Diff │ ← Old symbols vs new symbols
│ & Merge │ ← Update nodes, edges, hashes
└───────┬──────────┘
│
▼
┌──────────────────┐
│ Validation │ ← Check: broken exports, orphan imports,
│ (optional) │ missing type references
└───────┬──────────┘
│
┌────┴────┐
│ │
▼ ▼
PASS FAIL
│ │
▼ ▼
Continue Warn/Block
Commit (configurable)
┌─────────────────────────────────────────────────────────────────┐
│ NODE: repository │
│ id: string (record ID) │
│ name: string │
│ root: string (absolute path) │
│ created_at: datetime │
│ updated_at: datetime │
│ stats: { files: number, modules: number, symbols: number } │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ NODE: module │
│ id: string │
│ name: string │
│ path: string (relative to repo root) │
│ manifest_type: string (package.json | pyproject.toml | ...) │
│ language: string (primary language) │
│ is_root: bool │
│ metadata: { name, version, description, ... } │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ NODE: file │
│ id: string │
│ path: string (relative to repo root) │
│ module_id: string (reference to module) │
│ language: string │
│ ext: string │
│ size_bytes: number │
│ content_hash: string (SHA256) │
│ line_count: number │
│ parse_status: string (parsed | partial | failed | skipped) │
│ parse_error: option<string> │
│ last_parsed: datetime │
│ git_last_modified: option<datetime> │
│ git_blame_summary: option<{ author, date, commit_count }> │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ NODE: symbol (polymorphic by kind) │
│ id: string │
│ file_id: string │
│ name: string │
│ kind: enum { │
│ function, method, constructor, │
│ class, interface, type_alias, enum, │
│ variable, constant, property, │
│ parameter, generic_param, │
│ decorator, annotation, │
│ table, view, column, index, constraint, │
│ foreign_key, stored_procedure, │
│ import, export, re_export, │
│ namespace, module_decl, │
│ section, subsection, │
│ workflow_step, diagram_node, │
│ list_item, table_row │
│ } │
│ signature: option<string> (full signature text) │
│ return_type: option<string> │
│ start_line: number │
│ end_line: number │
│ start_col: number │
│ end_col: number │
│ docstring: option<string> │
│ is_exported: bool │
│ is_async: option<bool> │
│ is_static: option<bool> │
│ visibility: option<enum { public, private, protected }> │
│ modifiers: array<string> │
│ parent_symbol_id: option<string> (for nested symbols) │
│ metadata: object (language-specific extras) │
│ // For tables: { schema, engine, columns: [...] } │
│ // For classes: { implements: [...], extends: ... } │
│ // For functions: { params: [...], generics: [...] } │
│ // For sections: { level, anchor_id } │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ NODE: commit │
│ id: string │
│ hash: string (full SHA) │
│ short_hash: string (7 char) │
│ message: string │
│ author: string │
│ email: string │
│ date: datetime │
│ branch: string │
│ tags: array<string> │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ NODE: dependency (external) │
│ id: string │
│ module_id: string (which module depends on it) │
│ name: string (npm package name, pip package, etc.) │
│ version: string (resolved version) │
│ dev: bool │
│ source: string (npm, pip, cargo, go modules, maven) │
└─────────────────────────────────────────────────────────────────┘
EDGE: contains
FROM: repository → TO: module
FROM: module → TO: file
FROM: file → TO: symbol
FROM: symbol → TO: symbol (nested: class → method)
EDGE: imports
FROM: file → TO: file (file-level import)
FROM: module → TO: module (module-level dependency)
FROM: symbol → TO: symbol (symbol-level import)
METADATA: { is_type_only: bool, is_default: bool, alias: option<string> }
EDGE: exports
FROM: file → TO: symbol
FROM: symbol → TO: symbol (re-export chain)
METADATA: { is_default: bool, is_reexport: bool, alias: option<string> }
EDGE: calls
FROM: symbol (function/method) → TO: symbol (function/method)
METADATA: { line: number, is_async: bool, call_type: enum { direct, indirect, dynamic } }
EDGE: implements
FROM: symbol (class) → TO: symbol (interface)
METADATA: { is_partial: bool }
EDGE: inherits
FROM: symbol (class/interface) → TO: symbol (class/interface)
METADATA: { is_interface_inheritance: bool }
EDGE: modifies
FROM: symbol (function) → TO: symbol (variable/table/column)
EDGE: reads
FROM: symbol (function) → TO: symbol (variable/table/column)
EDGE: references
FROM: symbol → TO: symbol (generic "uses" relationship)
METADATA: { context: string }
EDGE: depends_on
FROM: module → TO: module (transitive closure of imports)
FROM: file → TO: file
METADATA: { is_transitive: bool, depth: number }
EDGE: depended_by (computed reverse of depends_on)
EDGE: modified_in
FROM: file → TO: commit
METADATA: { change_type: enum { added, modified, deleted, renamed } }
EDGE: authored_by
FROM: file/symbol → TO: commit (latest commit touching this artifact)
EDGE: belongs_to_workflow
FROM: symbol → TO: symbol (workflow_step)
EDGE: workflow_transition
FROM: symbol (workflow_step) → TO: symbol (workflow_step)
METADATA: { condition: option<string>, action: option<string> }
EDGE: diagram_edge
FROM: symbol (diagram_node) → TO: symbol (diagram_node)
METADATA: { label: string, style: string, type: enum { solid, dashed, dotted, bold } }
EDGE: foreign_key
FROM: symbol (column) → TO: symbol (table)
METADATA: { constraint_name: string, on_delete: string, on_update: string }
EDGE: column_of
FROM: symbol (column/index/constraint) → TO: symbol (table)
DEFINE INDEX idx_file_path ON file FIELDS path UNIQUE
DEFINE INDEX idx_file_hash ON file FIELDS content_hash
DEFINE INDEX idx_file_module ON file FIELDS module_id
DEFINE INDEX idx_symbol_name ON symbol FIELDS name
DEFINE INDEX idx_symbol_kind ON symbol FIELDS kind
DEFINE INDEX idx_symbol_file ON symbol FIELDS file_id
DEFINE INDEX idx_symbol_export ON symbol FIELDS is_exported
DEFINE INDEX idx_module_path ON module FIELDS path UNIQUE
DEFINE INDEX idx_commit_hash ON commit FIELDS hash UNIQUE
DEFINE INDEX idx_dep_name ON dependency FIELDS name, module_id
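To illustrate how a chained query could lower onto these edges and indexes, here is a hedged sketch of a CQB → SurrealQL translation — this is not the actual query planner's output, only a plausible shape using SurrealDB's `<-edge<-table` traversal syntax:

```typescript
// Sketch: .symbol(name).dependants().withinModule(m) lowered to SurrealQL.
// Parameter binding via $vars keeps names out of the query string.
export function dependantsQuery(name: string, module?: string) {
  let sql = `SELECT name, <-calls<-symbol AS dependants FROM symbol WHERE name = $name`;
  const vars: Record<string, string> = { name };
  if (module !== undefined) {
    // Mirrors .withinModule('api-gateway') by scoping to one module's files.
    sql += ` AND file_id IN (SELECT VALUE id FROM file WHERE module_id = $module)`;
    vars.module = module;
  }
  return { sql, vars };
}
```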
| Component | Technology | Rationale |
|---|---|---|
| Runtime | Node.js 20+ (ESM) | Universal, tree-sitter bindings available, MCP SDK native |
| Tree-Sitter | tree-sitter + language grammars | Industry standard, incremental parsing, multi-language |
| Graph DB | SurrealDB v2 (embedded/RocksDB) | Native graph queries, schemaful, embedded mode, no server |
| Fallback DB | better-sqlite3 | Zero-config fallback if SurrealDB unavailable |
| MCP | @modelcontextprotocol/sdk | Official SDK, stdio + SSE transport |
| CLI | commander | Battle-tested CLI framework |
| Git | simple-git | Promise-based git operations |
| File Watch | chokidar | Cross-platform, efficient |
| Logging | pino | Structured, fast |
| Testing | vitest + memfs | Fast, in-memory FS for unit tests |
| Bundling | tsup | ESM + CJS dual output, tree-shaking |
| Markdown | unified + remark + rehype | Pluggable markdown AST pipeline |
| Mermaid | mermaid (headless) | Parse mermaid diagrams to structured data |
Claude Code / Codex / OpenCode
│
│ MCP Protocol (JSON-RPC 2.0 over stdio)
│
┌────┴─────┐
│ MCP │
│ Server │
└────┬─────┘
│
┌────┴──────────────────────────────────┐
│ Tool Calls │
│ │
│ 1. query_repo_structure │
│ → Returns module tree + stats │
│ │
│ 2. query_symbol { name, scope } │
│ → Symbol node + edges │
│ │
│ 3. get_impact_analysis { symbol_id } │
│ → Dependents + transitive closure │
│ │
│ 4. search_symbols { query, filters } │
│ → Fuzzy match on name/signature │
│ │
│ 5. get_workflow { doc, section } │
│ → Structured workflow + links │
│ │
│ 6. get_git_history { path, limit } │
│ → Commit chain for file/symbol │
│ │
│ 7. execute_workflow_template { │
│ type, params } │
│ → Structured analysis result │
│ │
│ 8. get_dependencies { module_id } │
│ → Internal + external deps │
│ │
│ 9. get_dependants { symbol_id } │
│ → Reverse dependency chain │
│ │
│ 10. get_context_for_files { │
│ paths, max_tokens } │
│ → Token-budget-aware context │
│ │
└───────────────────────────────────────┘
{
"mcpServers": {
"tokenzip": {
"command": "npx",
"args": ["tokenzip", "serve", "--cwd", "/path/to/project"],
"env": {}
}
}
}
- No network: All data stays local. SurrealDB binds to `127.0.0.1` only if HTTP transport is used.
- No code execution: Graph stores metadata only. No eval, no require from stored data.
- Path traversal protection: All file paths resolved and canonicalized before storage.
- Git hook safety: Hooks are read-only from git's perspective (never force-push, never amend).
- `.tokenzip/` in `.gitignore`: Automatically appended, never committed.
- Token budget: MCP responses capped at configurable token limit to prevent context overflow.
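The path traversal protection above amounts to a canonicalize-then-prefix check — a minimal sketch, with the function name as an assumption:

```typescript
import { resolve, sep } from 'node:path';

// Sketch: every incoming file path is resolved against the repo root and
// rejected if the canonical result escapes that root.
export function safeResolve(root: string, userPath: string): string {
  const base = resolve(root);
  const abs = resolve(base, userPath);
  if (abs !== base && !abs.startsWith(base + sep)) {
    throw new Error(`Path escapes repository root: ${userPath}`);
  }
  return abs;
}
```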
Local Developer Machine
│
├── ~/.tokenzip/
│ ├── config.json # Global config
│ ├── surrealdb/ # Shared SurrealDB binary (if not system-installed)
│ └── cache/ # Cross-project cache
│
└── <project-root>/
├── .tokenzip/
│ ├── db/ # SurrealDB data directory
│ │ ├── data.db # RocksDB storage
│ │ └── lock # Process lock
│ ├── config.json # Project-specific config
│ │ ├── languages: [...]
│ │ ├── excluded: [...]
│ │ ├── hooks: { preCommit: "warn" | "block" | "off" }
│ │ └── mcp: { maxTokens: 8000, transport: "stdio" }
│ └── state.json # Parse state, last commit, version
│
├── .git/
│ └── hooks/
│ ├── pre-commit # Appended tokenzip hook
│ └── post-commit # Appended tokenzip hook
│
└── .gitignore # Contains .tokenzip/
tokenzip/
├── src/
│ ├── index.ts # Public API entry point
│ │
│ ├── cli/ # CLI layer
│ │ ├── index.ts # Commander setup
│ │ ├── commands/
│ │ │ ├── init.ts # tokenzip init
│ │ │ ├── parse.ts # tokenzip parse [--full | --incremental]
│ │ │ ├── query.ts # tokenzip query <cqb-expression>
│ │ │ ├── status.ts # tokenzip status
│ │ │ ├── serve.ts # tokenzip serve [--transport stdio|sse] [--port 3000]
│ │ │ ├── hooks.ts # tokenzip hooks install|uninstall
│ │ │ └── clean.ts # tokenzip clean
│ │ └── utils/
│ │ └── spinner.ts
│ │
│ ├── mcp/ # MCP server layer
│ │ ├── server.ts # MCP server creation & setup
│ │ ├── transport/
│ │ │ ├── stdio.ts
│ │ │ └── sse.ts
│ │ ├── tools/
│ │ │ ├── registry.ts # Tool registration
│ │ │ ├── structure.ts # query_repo_structure, query_module
│ │ │ ├── symbol.ts # query_symbol, search_symbols
│ │ │ ├── dependency.ts # get_dependencies, get_dependants
│ │ │ ├── impact.ts # get_impact_analysis
│ │ │ ├── git.ts # get_git_history
│ │ │ ├── workflow.ts # get_workflow, execute_workflow_template
│ │ │ └── context.ts # get_context_for_files
│ │ ├── resources/
│ │ │ ├── registry.ts
│ │ │ ├── repo.ts
│ │ │ ├── module.ts
│ │ │ ├── file.ts
│ │ │ └── symbol.ts
│ │ └── token-budget.ts # Token estimation & truncation
│ │
│ ├── query/ # Chainable Query Builder
│ │ ├── builder.ts # Base QueryBuilder class
│ │ ├── scopes/
│ │ │ ├── repo-scope.ts
│ │ │ ├── module-scope.ts
│ │ │ ├── file-scope.ts
│ │ │ ├── symbol-scope.ts
│ │ │ ├── table-scope.ts
│ │ │ ├── commit-scope.ts
│ │ │ ├── doc-scope.ts
│ │ │ └── workflow-scope.ts
│ │ ├── filters.ts # Filter predicate parser
│ │ ├── translators/
│ │ │ ├── surrealql.ts # CQB → SurrealQL translation
│ │ │ └── sql.ts # CQB → SQL translation (SQLite fallback)
│ │ └── types.ts
│ │
│ ├── engine/ # Core engine layer
│ │ ├── indexer.ts # Full & incremental indexing orchestrator
│ │ ├── differ.ts # Graph diff: old symbols vs new symbols
│ │ ├── merger.ts # Merge diff into graph
│ │ ├── validator.ts # Reference integrity validation
│ │ ├── module-detector.ts # Detect module boundaries
│ │ └── language-detector.ts # Detect language from extension + content
│ │
│ ├── extractor/ # Tree-sitter extraction layer
│ │ ├── base-extractor.ts # Abstract extractor interface
│ │ ├── registry.ts # Language → extractor mapping
│ │ ├── code/
│ │ │ ├── javascript.ts # JS/JSX extractor
│ │ │ ├── typescript.ts # TS/TSX extractor
│ │ │ ├── python.ts
│ │ │ ├── go.ts
│ │ │ ├── rust.ts
│ │ │ ├── java.ts
│ │ │ └── kotlin.ts
│ │ ├── sql/
│ │ │ └── sql.ts # SQL extractor (tables, columns, FKs)
│ │ ├── markdown/
│ │ │ ├── markdown.ts # Markdown structure extractor
│ │ │ ├── mermaid.ts # Mermaid diagram parser
│ │ │ └── sections.ts # Section type classifier
│ │ └── types.ts # SymbolIR, EdgeIR types
│ │
│ ├── storage/ # Storage abstraction layer
│ │ ├── interface.ts # IStore interface
│ │ ├── surreal/
│ │ │ ├── connection.ts # Connection pool & lifecycle
│ │ │ ├── migrations.ts # Schema migration
│ │ │ ├── queries/
│ │ │ │ ├── nodes.ts
│ │ │ │ ├── edges.ts
│ │ │ │ ├── graph.ts
│ │ │ │ └── search.ts
│ │ │ └── store.ts # SurrealStore implements IStore
│ │ ├── sqlite/
│ │ │ ├── schema.ts # Table creation
│ │ │ ├── queries/
│ │ │ │ ├── nodes.ts
│ │ │ │ ├── edges.ts
│ │ │ │ └── graph.ts
│ │ │ └── store.ts # SQLiteStore implements IStore
│ │ ├── memory/
│ │ │ └── store.ts # MemoryStore for testing
│ │ └── factory.ts # StoreFactory: config → IStore
│ │
│ ├── hooks/ # Git hook layer
│ │ ├── installer.ts # Install hooks into .git/hooks/
│ │ ├── pre-commit.ts # Pre-commit logic
│ │ ├── post-commit.ts # Post-commit logic
│ │ └── detector.ts # Detect staged files
│ │
│ ├── workflows/ # Workflow template engine
│ │ ├── engine.ts # Workflow executor
│ │ ├── registry.ts # Workflow template registry
│ │ └── templates/
│ │ ├── create-module.ts
│ │ ├── update-module.ts
│ │ ├── implement-feature.ts
│ │ ├── upgrade-feature.ts
│ │ └── bug-fix.ts
│ │
│ ├── utils/
│ │ ├── logger.ts
│ │ ├── hash.ts # Content hashing (SHA256)
│ │ ├── path.ts # Path resolution & normalization
│ │ ├── tokens.ts # Token estimation (chars/4 for code)
│ │ ├── workers.ts # Worker thread pool for parsing
│ │ └── version.ts
│ │
│ └── types/
│ ├── graph.ts # All node & edge types
│ ├── extractor.ts # Extractor IR types
│ ├── query.ts # Query builder types
│ └── config.ts # Configuration types
│
├── grammars/ # Tree-sitter WASM grammars (bundled)
│ ├── tree-sitter-javascript.wasm
│ ├── tree-sitter-typescript.wasm
│ ├── tree-sitter-python.wasm
│ ├── tree-sitter-go.wasm
│ ├── tree-sitter-rust.wasm
│ ├── tree-sitter-java.wasm
│ ├── tree-sitter-kotlin.wasm
│ └── tree-sitter-sql.wasm
│
├── tests/
│ ├── unit/
│ │ ├── extractor/
│ │ │ ├── javascript.test.ts
│ │ │ ├── typescript.test.ts
│ │ │ ├── python.test.ts
│ │ │ ├── sql.test.ts
│ │ │ └── markdown.test.ts
│ │ ├── query/
│ │ │ └── builder.test.ts
│ │ ├── engine/
│ │ │ ├── differ.test.ts
│ │ │ ├── merger.test.ts
│ │ │ └── module-detector.test.ts
│ │ ├── storage/
│ │ │ └── memory-store.test.ts
│ │ └── hooks/
│ │ └── detector.test.ts
│ ├── integration/
│ │ ├── full-parse.test.ts
│ │ ├── incremental-parse.test.ts
│ │ ├── mcp-server.test.ts
│ │ └── git-hook.test.ts
│ ├── fixtures/
│ │ ├── js-project/
│ │ ├── ts-monorepo/
│ │ ├── python-project/
│ │ ├── sql-project/
│ │ └── mixed-project/
│ └── e2e/
│ └── claude-code.test.ts
│
├── package.json
├── tsconfig.json
├── tsup.config.ts
└── vitest.config.ts
// src/storage/interface.ts
import type {
RepositoryNode, ModuleNode, FileNode, SymbolNode,
CommitNode, DependencyNode,
ContainsEdge, ImportsEdge, ExportsEdge, CallsEdge,
ImplementsEdge, InheritsEdge, ModifiesEdge, ReadsEdge,
ReferencesEdge, DependsOnEdge, ModifiedInEdge,
ForeignKeyEdge, ColumnOfEdge,
// ... all edge types
} from '../types/graph';
export interface GraphNode {
id: string;
type: 'repository' | 'module' | 'file' | 'symbol' | 'commit' | 'dependency';
[key: string]: unknown;
}
export interface GraphEdge {
id: string;
type: string;
from: string;
to: string;
[key: string]: unknown;
}
export interface GraphResult {
nodes: GraphNode[];
edges: GraphEdge[];
}
export interface StoreStats {
nodeCount: Record<string, number>;
edgeCount: Record<string, number>;
dbSizeBytes: number;
}
export interface IStore {
// Lifecycle
initialize(): Promise<void>;
close(): Promise<void>;
migrate(): Promise<void>;
clear(): Promise<void>;
stats(): Promise<StoreStats>;
// Node CRUD
createNode<T extends GraphNode>(node: T): Promise<T>;
createNodes<T extends GraphNode>(nodes: T[]): Promise<T[]>;
getNode<T extends GraphNode>(id: string): Promise<T | null>;
getNodes(ids: string[]): Promise<GraphNode[]>;
updateNode<T extends GraphNode>(id: string, patch: Partial<T>): Promise<T>;
deleteNode(id: string): Promise<void>;
deleteNodes(ids: string[]): Promise<void>;
// Edge CRUD
createEdge<T extends GraphEdge>(edge: T): Promise<T>;
createEdges<T extends GraphEdge>(edges: T[]): Promise<T[]>;
getEdges(from: string, type?: string): Promise<GraphEdge[]>;
getEdgesTo(to: string, type?: string): Promise<GraphEdge[]>;
deleteEdges(from: string, type?: string): Promise<void>;
// Graph Queries
query(surrealQL: string, vars?: Record<string, unknown>): Promise<unknown[]>;
graphTraversal(
startId: string,
edgeTypes: string[],
direction: 'outbound' | 'inbound' | 'both',
depth?: number,
filter?: string
): Promise<GraphResult>;
// Bulk Operations
batchUpsert(nodes: GraphNode[], edges: GraphEdge[]): Promise<void>;
// Search
searchNodes(
type: string,
field: string,
query: string,
limit?: number
): Promise<GraphNode[]>;
// Transactions
transaction<T>(fn: (store: IStore) => Promise<T>): Promise<T>;
}

// src/extractor/base-extractor.ts
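A standalone sketch of the stable-ID scheme `generateSymbolId` implements in the listing: identical inputs must always yield identical IDs, which is what makes incremental re-indexing diff-friendly. `stableSymbolId` is an illustrative name, not part of the API:

```typescript
import { createHash } from 'node:crypto';

// Mirrors the documented format: sym:<filepath-hash>:<name>:<kind>:<line>
function stableSymbolId(filePath: string, name: string, kind: string, line: number): string {
  const pathHash = createHash('sha256').update(filePath).digest('hex').slice(0, 8);
  return `sym:${pathHash}:${name}:${kind}:${line}`;
}

// Re-indexing an unchanged file produces the same ID both times.
const a = stableSymbolId('src/auth/login.ts', 'login', 'function', 12);
const b = stableSymbolId('src/auth/login.ts', 'login', 'function', 12);
```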
import { createHash } from 'node:crypto';
import { Parser, Tree } from 'tree-sitter';
import { SymbolIR, EdgeIR } from './types';
export interface ExtractionResult {
symbols: SymbolIR[];
edges: EdgeIR[];
parseErrors: ParseError[];
}
export interface ParseError {
line: number;
column: number;
message: string;
}
export interface ExtractorContext {
filePath: string;
relativePath: string;
content: string;
contentHash: string;
tree: Tree;
language: string;
moduleId: string;
}
export abstract class BaseExtractor {
abstract readonly language: string;
abstract readonly extensions: string[];
/**
* Extract symbols and edges from a parsed tree.
* Called after tree-sitter has parsed the file.
*/
abstract extract(ctx: ExtractorContext): ExtractionResult;
/**
* Post-process extraction results.
* Resolve internal references, compute derived edges.
* Default implementation does nothing; subclasses can override.
*/
postProcess(
symbols: SymbolIR[],
edges: EdgeIR[],
ctx: ExtractorContext
): { symbols: SymbolIR[]; edges: EdgeIR[] } {
return { symbols, edges };
}
/**
* Generate a stable ID for a symbol.
* Must be deterministic for the same symbol in the same file.
*/
generateSymbolId(
filePath: string,
symbolName: string,
kind: string,
startLine: number
): string {
// Format: sym:<filepath-hash>:<name>:<kind>:<line>
const pathHash = this.hashPath(filePath);
return `sym:${pathHash}:${symbolName}:${kind}:${startLine}`;
}
private hashPath(filePath: string): string {
// First 8 chars of SHA256 of relative path
return createHash('sha256')
.update(filePath)
.digest('hex')
.slice(0, 8);
}
/**
* Walk the tree-sitter AST with a visitor pattern.
* Utility method for subclasses.
*/
protected walk(
node: Parser.SyntaxNode,
visitors: Record<string, (node: Parser.SyntaxNode) => void>
): void {
const visitor = visitors[node.type];
if (visitor) {
visitor(node);
}
for (let i = 0; i < node.childCount; i++) {
this.walk(node.child(i)!, visitors);
}
}
/**
* Extract docstring/JSDoc/comment attached to a node.
*/
protected extractDocstring(node: Parser.SyntaxNode, content: string): string | null {
// Look for preceding comment nodes
const prev = node.previousNamedSibling;
if (prev && (prev.type === 'comment' || prev.type === 'block_comment'
|| prev.type === 'docstring' || prev.type === 'jsdoc')) {
return content.slice(prev.startIndex, prev.endIndex).trim();
}
return null;
}
}

// src/extractor/code/typescript.ts
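`resolveImportPath` is referenced by this extractor but not listed. A plausible sketch for relative specifiers only; bare package imports and extension inference against the file graph are deliberately elided, so the function name matches the listing but the body is an assumption:

```typescript
import * as path from 'node:path';

// Resolve './utils' relative to the importing file's directory.
// Bare specifiers (e.g. 'react') pass through for dependency-node linking.
function resolveImportPath(fromFile: string, specifier: string): string {
  if (!specifier.startsWith('.')) return specifier;
  // posix joins keep IDs stable across platforms.
  return path.posix.join(path.posix.dirname(fromFile), specifier);
}
```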
import { BaseExtractor, ExtractorContext, ExtractionResult, ParseError } from '../base-extractor';
import { SymbolIR, EdgeIR } from '../types';
export class TypeScriptExtractor extends BaseExtractor {
language = 'typescript';
extensions = ['.ts', '.tsx', '.mts', '.cts'];
extract(ctx: ExtractorContext): ExtractionResult {
const symbols: SymbolIR[] = [];
const edges: EdgeIR[] = [];
const parseErrors: ParseError[] = [];
// Collect parse errors
this.collectErrors(ctx.tree.rootNode, parseErrors, ctx.content);
// Visit top-level and nested declarations
this.walk(ctx.tree.rootNode, {
// Functions
'function_declaration': (node) => {
const name = this.getName(node);
if (!name) return;
symbols.push({
id: this.generateSymbolId(ctx.relativePath, name, 'function', node.startPosition.row + 1),
fileId: `file:${ctx.relativePath}`,
name,
kind: 'function',
signature: this.getSignature(node, ctx.content),
returnType: this.getReturnType(node),
startLine: node.startPosition.row + 1,
endLine: node.endPosition.row + 1,
startCol: node.startPosition.column,
endCol: node.endPosition.column,
docstring: this.extractDocstring(node, ctx.content),
isExported: this.isExported(node),
isAsync: this.hasModifier(node, 'async'),
isStatic: false,
visibility: this.getVisibility(node),
modifiers: this.getModifiers(node),
metadata: {
params: this.extractParams(node, ctx.content),
generics: this.extractGenerics(node, ctx.content),
typeParams: this.extractTypeParams(node),
},
});
},
// Arrow functions assigned to variables
'variable_declaration': (node) => {
const declarator = node.childForFieldName('declarator');
if (!declarator) return;
const value = declarator.childForFieldName('value');
if (!value || (value.type !== 'arrow_function' && value.type !== 'function_expression')) return;
const name = this.getName(declarator);
if (!name) return;
        const funcKind = 'function'; // arrow and function expressions are both recorded as functions
symbols.push({
id: this.generateSymbolId(ctx.relativePath, name, funcKind, node.startPosition.row + 1),
fileId: `file:${ctx.relativePath}`,
name,
kind: funcKind,
signature: this.getSignature(value, ctx.content),
returnType: this.getReturnType(value),
startLine: node.startPosition.row + 1,
endLine: node.endPosition.row + 1,
startCol: node.startPosition.column,
endCol: node.endPosition.column,
docstring: this.extractDocstring(node, ctx.content),
isExported: this.isExported(node),
isAsync: this.hasModifier(value, 'async'),
isStatic: false,
visibility: this.getVisibility(node),
modifiers: this.getModifiers(node),
metadata: {
isArrow: value.type === 'arrow_function',
params: this.extractParams(value, ctx.content),
generics: this.extractGenerics(value, ctx.content),
},
});
},
// Classes
'class_declaration': (node) => {
const name = this.getName(node);
if (!name) return;
const heritage = this.extractHeritage(node); // extends, implements
const symbolId = this.generateSymbolId(ctx.relativePath, name, 'class', node.startPosition.row + 1);
symbols.push({
id: symbolId,
fileId: `file:${ctx.relativePath}`,
name,
kind: 'class',
signature: this.getSignature(node, ctx.content),
startLine: node.startPosition.row + 1,
endLine: node.endPosition.row + 1,
startCol: node.startPosition.column,
endCol: node.endPosition.column,
docstring: this.extractDocstring(node, ctx.content),
isExported: this.isExported(node),
isStatic: false,
visibility: this.getVisibility(node),
modifiers: this.getModifiers(node),
metadata: {
extends: heritage.extends,
implements: heritage.implements,
generics: this.extractGenerics(node, ctx.content),
},
});
// Create inheritance edges
if (heritage.extends) {
edges.push({
type: 'inherits',
from: symbolId,
to: `sym:unknown:${heritage.extends}:class:0`, // resolved later
metadata: { is_interface_inheritance: false },
isResolved: false,
});
}
for (const impl of heritage.implements) {
edges.push({
type: 'implements',
from: symbolId,
to: `sym:unknown:${impl}:interface:0`,
metadata: { is_partial: false },
isResolved: false,
});
}
},
// Interfaces
'interface_declaration': (node) => {
const name = this.getName(node);
if (!name) return;
const extendsList = this.extractInterfaceExtends(node);
const symbolId = this.generateSymbolId(ctx.relativePath, name, 'interface', node.startPosition.row + 1);
symbols.push({
id: symbolId,
fileId: `file:${ctx.relativePath}`,
name,
kind: 'interface',
signature: this.getSignature(node, ctx.content),
startLine: node.startPosition.row + 1,
endLine: node.endPosition.row + 1,
startCol: node.startPosition.column,
endCol: node.endPosition.column,
docstring: this.extractDocstring(node, ctx.content),
isExported: this.isExported(node),
isStatic: false,
visibility: 'public',
modifiers: this.getModifiers(node),
metadata: {
extends: extendsList,
generics: this.extractGenerics(node, ctx.content),
members: this.extractInterfaceMembers(node, ctx.content, ctx.relativePath),
},
});
for (const ext of extendsList) {
edges.push({
type: 'inherits',
from: symbolId,
to: `sym:unknown:${ext}:interface:0`,
metadata: { is_interface_inheritance: true },
isResolved: false,
});
}
},
// Type aliases
'type_alias_declaration': (node) => {
const name = this.getName(node);
if (!name) return;
symbols.push({
id: this.generateSymbolId(ctx.relativePath, name, 'type_alias', node.startPosition.row + 1),
fileId: `file:${ctx.relativePath}`,
name,
kind: 'type_alias',
signature: this.getTypeAliasBody(node, ctx.content),
startLine: node.startPosition.row + 1,
endLine: node.endPosition.row + 1,
startCol: node.startPosition.column,
endCol: node.endPosition.column,
docstring: this.extractDocstring(node, ctx.content),
isExported: this.isExported(node),
isStatic: false,
visibility: 'public',
modifiers: [],
metadata: {
generics: this.extractGenerics(node, ctx.content),
},
});
},
// Enums
'enum_declaration': (node) => {
const name = this.getName(node);
if (!name) return;
const members = this.extractEnumMembers(node, ctx.content);
symbols.push({
id: this.generateSymbolId(ctx.relativePath, name, 'enum', node.startPosition.row + 1),
fileId: `file:${ctx.relativePath}`,
name,
kind: 'enum',
startLine: node.startPosition.row + 1,
endLine: node.endPosition.row + 1,
startCol: node.startPosition.column,
endCol: node.endPosition.column,
docstring: this.extractDocstring(node, ctx.content),
isExported: this.isExported(node),
isStatic: false,
visibility: 'public',
modifiers: this.getModifiers(node),
metadata: { members },
});
},
// Imports (file-level)
'import_statement': (node) => {
const importInfo = this.extractImport(node, ctx.content);
if (!importInfo) return;
// Store as symbol for tracking
symbols.push({
id: this.generateSymbolId(ctx.relativePath, importInfo.source, 'import', node.startPosition.row + 1),
fileId: `file:${ctx.relativePath}`,
name: importInfo.source,
kind: 'import',
startLine: node.startPosition.row + 1,
endLine: node.endPosition.row + 1,
startCol: node.startPosition.column,
endCol: node.endPosition.column,
isExported: false,
modifiers: [],
metadata: {
source: importInfo.source,
specifiers: importInfo.specifiers,
isTypeOnly: importInfo.isTypeOnly,
isDefault: importInfo.isDefault,
},
});
// Create import edge
edges.push({
type: 'imports',
from: `file:${ctx.relativePath}`,
to: `file:${this.resolveImportPath(ctx.relativePath, importInfo.source)}`,
metadata: {
is_type_only: importInfo.isTypeOnly,
is_default: importInfo.isDefault,
specifiers: importInfo.specifiers,
},
isResolved: false,
});
},
// Export statements
'export_statement': (node) => {
// Handle: export { foo, bar } from './module'
const exportInfo = this.extractReExport(node, ctx.content);
if (exportInfo) {
for (const spec of exportInfo.specifiers) {
edges.push({
type: 'exports',
from: `file:${ctx.relativePath}`,
to: `file:${this.resolveImportPath(ctx.relativePath, exportInfo.source)}`,
metadata: {
is_reexport: true,
is_default: spec.isDefault,
alias: spec.alias,
name: spec.name,
},
isResolved: false,
});
}
}
},
// Method definitions inside classes
'method_definition': (node) => {
// This is handled inside class_declaration visitor
// We capture it there for parent_symbol_id linking
},
// Property definitions inside classes
'public_field_definition': (node) => {
// Handled inside class_declaration
},
});
// Post-process: resolve parent_symbol_id for nested symbols
// Post-process: mark exported symbols
const processed = this.postProcess(symbols, edges, ctx);
return {
symbols: processed.symbols,
edges: processed.edges,
parseErrors,
};
}
// ... helper methods (getName, getSignature, extractParams, etc.)
// Each is ~10-20 lines using tree-sitter child navigation
}

// src/extractor/sql/sql.ts
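For orientation, a rough regex sketch of what `extractForeignKeys` recovers from a `CREATE TABLE` statement. The real extractor walks tree-sitter AST nodes rather than strings; `sketchExtractForeignKeys` is illustrative only:

```typescript
interface ForeignKey { column: string; refTable: string; refColumn: string }

// String-level approximation of the AST-based foreign-key extraction.
function sketchExtractForeignKeys(ddl: string): ForeignKey[] {
  const re = /FOREIGN\s+KEY\s*\((\w+)\)\s*REFERENCES\s+(\w+)\s*\((\w+)\)/gi;
  const fks: ForeignKey[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(ddl)) !== null) {
    fks.push({ column: m[1], refTable: m[2], refColumn: m[3] });
  }
  return fks;
}

const fks = sketchExtractForeignKeys(`
  CREATE TABLE orders (
    id INT PRIMARY KEY,
    user_id INT,
    FOREIGN KEY (user_id) REFERENCES users(id)
  );
`);
```

Each recovered key becomes a `foreign_key` edge from the column symbol to the (initially unresolved) referenced table symbol.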
import { BaseExtractor, ExtractorContext, ExtractionResult, ParseError } from '../base-extractor';
import { SymbolIR, EdgeIR } from '../types';

export class SQLExtractor extends BaseExtractor {
language = 'sql';
extensions = ['.sql'];
extract(ctx: ExtractorContext): ExtractionResult {
const symbols: SymbolIR[] = [];
const edges: EdgeIR[] = [];
const parseErrors: ParseError[] = [];
this.walk(ctx.tree.rootNode, {
'create_table': (node) => {
const tableName = this.getTableName(node);
if (!tableName) return;
const tableId = this.generateSymbolId(
ctx.relativePath, tableName, 'table', node.startPosition.row + 1
);
// Extract columns
const columns = this.extractColumns(node, ctx.content, ctx.relativePath, tableId);
const constraints = this.extractConstraints(node, ctx.content, ctx.relativePath, tableId);
const indexes = this.extractIndexes(node, ctx.content, ctx.relativePath, tableId);
symbols.push({
id: tableId,
fileId: `file:${ctx.relativePath}`,
name: tableName,
kind: 'table',
signature: this.getTableSignature(node, ctx.content),
startLine: node.startPosition.row + 1,
endLine: node.endPosition.row + 1,
startCol: node.startPosition.column,
endCol: node.endPosition.column,
docstring: this.extractTableComment(node, ctx.content),
isExported: false,
modifiers: [],
metadata: {
schema: this.getSchemaName(node),
engine: this.getEngine(node),
columns: columns.map(c => c.name),
columnCount: columns.length,
},
});
symbols.push(...columns, ...constraints, ...indexes);
        // Link columns, indexes, and constraints to their table
        for (const child of [...columns, ...indexes, ...constraints]) {
          edges.push({ type: 'column_of', from: child.id, to: tableId });
        }
// Extract foreign keys and create FK edges
const fks = this.extractForeignKeys(node, ctx.content);
for (const fk of fks) {
const fromColId = this.generateSymbolId(
ctx.relativePath, fk.column, 'column', 0 // approximate
);
const toTableId = `sym:unknown:${fk.refTable}:table:0`;
edges.push({
type: 'foreign_key',
from: fromColId,
to: toTableId,
metadata: {
constraint_name: fk.name,
on_delete: fk.onDelete,
on_update: fk.onUpdate,
ref_column: fk.refColumn,
},
isResolved: false,
});
}
},
'create_view': (node) => {
const viewName = this.getViewName(node);
if (!viewName) return;
symbols.push({
id: this.generateSymbolId(ctx.relativePath, viewName, 'view', node.startPosition.row + 1),
fileId: `file:${ctx.relativePath}`,
name: viewName,
kind: 'view',
signature: this.getViewQuery(node, ctx.content),
startLine: node.startPosition.row + 1,
endLine: node.endPosition.row + 1,
startCol: node.startPosition.column,
endCol: node.endPosition.column,
docstring: this.extractViewComment(node, ctx.content),
isExported: false,
modifiers: [],
metadata: { schema: this.getSchemaName(node) },
});
},
'create_procedure': (node) => {
// Stored procedures / functions
},
});
return { symbols, edges, parseErrors };
}
}

// src/extractor/markdown/markdown.ts
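`slugify` is referenced in this extractor but never listed. A typical GitHub-style anchor sketch (exact edge-case handling is an assumption):

```typescript
// Heading text → anchor id, e.g. for metadata.anchor_id on section symbols.
function slugify(heading: string): string {
  return heading
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9\s-]/g, '') // drop punctuation
    .replace(/\s+/g, '-');        // spaces → hyphens
}

const anchor = slugify('Release Plan: Q3 2025');
```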
import { unified } from 'unified';
import remarkParse from 'remark-parse';
import remarkGfm from 'remark-gfm';
import { visit } from 'unist-util-visit';
import { Root, Heading, Code, List, Table, ListItem } from 'mdast';
import { BaseExtractor, ExtractorContext, ExtractionResult } from '../base-extractor';
import { SymbolIR, EdgeIR } from '../types';

export class MarkdownExtractor extends BaseExtractor {
language = 'markdown';
extensions = ['.md', '.mdx', '.markdown'];
extract(ctx: ExtractorContext): ExtractionResult {
const symbols: SymbolIR[] = [];
const edges: EdgeIR[] = [];
const tree = unified()
.use(remarkParse)
.use(remarkGfm)
.parse(ctx.content) as Root;
    let currentSection: string | null = null;
    let sectionCounter = 0;
visit(tree, (node) => {
// Headings → sections
if (node.type === 'heading') {
const heading = node as Heading;
const text = this.getTextContent(heading);
const level = heading.depth;
const sectionId = this.generateSymbolId(
ctx.relativePath, text, 'section', heading.position?.start.line || 0
);
const sectionSymbol: SymbolIR = {
id: sectionId,
fileId: `file:${ctx.relativePath}`,
name: text,
kind: 'section',
startLine: heading.position?.start.line || 0,
endLine: heading.position?.end.line || 0,
startCol: heading.position?.start.column || 0,
endCol: heading.position?.end.column || 0,
isExported: false,
modifiers: [],
metadata: {
level,
anchor_id: this.slugify(text),
section_type: this.classifySection(text),
},
};
symbols.push(sectionSymbol);
// Link to parent section
if (currentSection && level > 1) {
edges.push({
type: 'contains',
from: currentSection,
to: sectionId,
});
}
currentSection = sectionId;
sectionCounter++;
}
// Code blocks → check for mermaid
if (node.type === 'code') {
const code = node as Code;
if (code.lang === 'mermaid' && code.value) {
const diagramResult = this.parseMermaid(code.value, ctx);
symbols.push(...diagramResult.symbols);
edges.push(...diagramResult.edges);
// Link diagram to current section
if (currentSection) {
for (const sym of diagramResult.symbols) {
edges.push({ type: 'contains', from: currentSection, to: sym.id });
}
}
}
}
// Lists → structured list items
if (node.type === 'list') {
const list = node as List;
this.extractListItems(list, symbols, edges, ctx, currentSection);
}
// Tables → structured rows
if (node.type === 'table') {
const table = node as Table;
const tableResult = this.extractTable(table, ctx, currentSection);
symbols.push(...tableResult.symbols);
edges.push(...tableResult.edges);
}
});
return { symbols, edges, parseErrors: [] };
}
private classifySection(heading: string): string {
const lower = heading.toLowerCase();
if (/workflow|flow|process|pipeline/.test(lower)) return 'workflow';
if (/sequence\s*diagram/.test(lower)) return 'sequence_diagram';
if (/flowchart/.test(lower)) return 'flowchart';
if (/release\s*plan|roadmap|timeline/.test(lower)) return 'release_plan';
if (/api|endpoint/.test(lower)) return 'api';
if (/architecture|component|system\s*design/.test(lower)) return 'architecture';
if (/decision|adr/.test(lower)) return 'decision';
if (/requirement|user\s*story|acceptance/.test(lower)) return 'requirement';
return 'general';
}
private parseMermaid(mermaidCode: string, ctx: ExtractorContext):
{ symbols: SymbolIR[]; edges: EdgeIR[] } {
const symbols: SymbolIR[] = [];
const edges: EdgeIR[] = [];
// Detect diagram type
const typeMatch = mermaidCode.match(/^(sequenceDiagram|flowchart\s+\w+|stateDiagram|erDiagram|classDiagram|gantt)/m);
const diagramType = typeMatch?.[1] || 'unknown';
if (diagramType === 'sequenceDiagram') {
return this.parseSequenceDiagram(mermaidCode, ctx);
}
if (diagramType.startsWith('flowchart')) {
return this.parseFlowchart(mermaidCode, ctx);
}
if (diagramType === 'erDiagram') {
return this.parseERDiagram(mermaidCode, ctx);
}
if (diagramType === 'classDiagram') {
return this.parseClassDiagram(mermaidCode, ctx);
}
// Fallback: store as raw diagram node
symbols.push({
      id: this.generateSymbolId(ctx.relativePath, `diagram-${this.slugify(mermaidCode.slice(0, 40))}`, 'section', 0), // content-derived name keeps the ID deterministic
fileId: `file:${ctx.relativePath}`,
name: `Mermaid ${diagramType}`,
kind: 'section',
startLine: 0,
endLine: 0,
startCol: 0,
endCol: 0,
isExported: false,
modifiers: [],
metadata: { diagram_type: diagramType, raw: mermaidCode },
});
return { symbols, edges };
}
private parseSequenceDiagram(code: string, ctx: ExtractorContext):
{ symbols: SymbolIR[]; edges: EdgeIR[] } {
// Parse:
// participant A as Actor A
// A->>B: Message
// B-->>A: Response
//
// Creates: diagram_node per participant
// Creates: diagram_edge per message (with label, style)
const symbols: SymbolIR[] = [];
const edges: EdgeIR[] = [];
const participants = new Map<string, string>(); // alias → full name
const baseLine = 0; // Would need actual line from parent
const participantRe = /^participant\s+(\w+)(?:\s+as\s+(.+))?$/gm;
let match;
while ((match = participantRe.exec(code)) !== null) {
const alias = match[1];
const fullName = match[2] || alias;
participants.set(alias, fullName);
const id = this.generateSymbolId(ctx.relativePath, alias, 'diagram_node', baseLine);
symbols.push({
id,
fileId: `file:${ctx.relativePath}`,
name: fullName,
kind: 'diagram_node',
startLine: baseLine,
endLine: baseLine,
startCol: 0,
endCol: 0,
isExported: false,
modifiers: [],
metadata: {
diagram_type: 'sequence_diagram',
role: 'participant',
alias,
},
});
}
// Parse messages: A->>B: text or A-->>B: text
const msgRe = /^(\w+)(->>|-->>|->|-->)\s*(\w+):\s*(.+)$/gm;
let msgMatch;
let msgCounter = 0;
while ((msgMatch = msgRe.exec(code)) !== null) {
const fromAlias = msgMatch[1];
const arrowStyle = msgMatch[2];
const toAlias = msgMatch[3];
const message = msgMatch[4];
const fromId = this.generateSymbolId(ctx.relativePath, fromAlias, 'diagram_node', baseLine);
const toId = this.generateSymbolId(ctx.relativePath, toAlias, 'diagram_node', baseLine);
// Register participants if not explicitly declared
if (!participants.has(fromAlias)) {
participants.set(fromAlias, fromAlias);
symbols.push({
id: fromId,
fileId: `file:${ctx.relativePath}`,
name: fromAlias,
kind: 'diagram_node',
startLine: baseLine, endLine: baseLine,
startCol: 0, endCol: 0,
isExported: false, modifiers: [],
metadata: { diagram_type: 'sequence_diagram', role: 'participant', alias: fromAlias },
});
}
if (!participants.has(toAlias)) {
participants.set(toAlias, toAlias);
symbols.push({
id: toId,
fileId: `file:${ctx.relativePath}`,
name: toAlias,
kind: 'diagram_node',
startLine: baseLine, endLine: baseLine,
startCol: 0, endCol: 0,
isExported: false, modifiers: [],
metadata: { diagram_type: 'sequence_diagram', role: 'participant', alias: toAlias },
});
}
edges.push({
type: 'diagram_edge',
from: fromId,
to: toId,
metadata: {
label: message,
          style: arrowStyle.startsWith('--') ? 'dashed' : 'solid',
sequence: msgCounter++,
is_response: arrowStyle.includes('--'),
},
});
}
return { symbols, edges };
}
// ... parseFlowchart, parseERDiagram, parseClassDiagram, extractListItems, extractTable
}

// src/query/builder.ts
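Before the listing, a toy illustration of the clone-on-write chaining pattern `QueryScope` uses: every combinator returns a new instance, so a base scope can be reused without filters leaking between chains. `ToyScope` is not part of the codebase:

```typescript
// Each combinator clones; the receiver is never mutated.
class ToyScope {
  constructor(readonly filters: ReadonlyArray<{ field: string; value: unknown }> = []) {}
  eq(field: string, value: unknown): ToyScope {
    return new ToyScope([...this.filters, { field, value }]);
  }
  run(rows: Record<string, unknown>[]): Record<string, unknown>[] {
    return rows.filter(r => this.filters.every(f => r[f.field] === f.value));
  }
}

const base = new ToyScope();
const exported = base.eq('is_exported', true);
const rows = [
  { name: 'login', is_exported: true },
  { name: 'helper', is_exported: false },
];
const hits = exported.run(rows);
// base still has zero filters and can seed other chains.
```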
import { IStore } from '../storage/interface';
import { GraphNode, GraphEdge, GraphResult } from '../types/graph';
import { RepoScope } from './scopes/repo-scope';
export type SortDirection = 'asc' | 'desc';
export type TerminalFormat = 'array' | 'graph' | 'markdown' | 'json';
export interface FilterPredicate {
field: string;
op: 'eq' | 'neq' | 'gt' | 'gte' | 'lt' | 'lte' | 'contains' | 'matches' | 'in' | 'exists';
value: unknown;
}
export abstract class QueryScope<T extends QueryScope<T>> {
protected filters: FilterPredicate[] = [];
protected sortField: string | null = null;
protected sortDir: SortDirection = 'asc';
protected limitCount: number | null = null;
protected offsetCount: number = 0;
constructor(protected store: IStore, protected repoPath: string) {}
filter(predicate: FilterPredicate | ((item: GraphNode) => boolean)): T {
const clone = this.clone();
if (typeof predicate === 'function') {
// Function filters are applied post-hoc (for in-memory operations)
clone.filters.push({ field: '_func', op: 'eq', value: predicate } as any);
} else {
clone.filters.push(predicate);
}
return clone as T;
}
// Shorthand filters
eq(field: string, value: unknown): T { return this.filter({ field, op: 'eq', value }); }
neq(field: string, value: unknown): T { return this.filter({ field, op: 'neq', value }); }
contains(field: string, value: string): T { return this.filter({ field, op: 'contains', value }); }
matches(field: string, pattern: string): T { return this.filter({ field, op: 'matches', value: pattern }); }
in(field: string, values: unknown[]): T { return this.filter({ field, op: 'in', value: values }); }
sort(field: string, dir: SortDirection = 'asc'): T {
const clone = this.clone();
clone.sortField = field;
clone.sortDir = dir;
return clone as T;
}
limit(n: number): T {
const clone = this.clone();
clone.limitCount = n;
return clone as T;
}
offset(n: number): T {
const clone = this.clone();
clone.offsetCount = n;
return clone as T;
}
// Terminal methods
async toArray(): Promise<GraphNode[]> {
const result = await this.execute();
return this.applyPostFilters(result.nodes as GraphNode[]);
}
async toGraph(): Promise<GraphResult> {
const result = await this.execute();
return {
nodes: this.applyPostFilters(result.nodes as GraphNode[]),
edges: result.edges as GraphEdge[],
};
}
async toMarkdown(): Promise<string> {
const nodes = await this.toArray();
return this.formatAsMarkdown(nodes);
}
async toJSON(): Promise<string> {
const result = await this.toGraph();
return JSON.stringify(result, null, 2);
}
async count(): Promise<number> {
const nodes = await this.toArray();
return nodes.length;
}
async exists(): Promise<boolean> {
const count = await this.count();
return count > 0;
}
// Abstract: each scope implements its own query translation
protected abstract execute(): Promise<{ nodes: unknown[]; edges: unknown[] }>;
protected abstract clone(): T;
protected abstract formatAsMarkdown(nodes: GraphNode[]): string;
  protected applyPostFilters(nodes: GraphNode[]): GraphNode[] {
    return nodes.filter(node => {
      for (const f of this.filters) {
        if (f.field === '_func') {
          // Function predicates always run in memory
          if (!(f.value as (n: GraphNode) => boolean)(node)) return false;
        } else {
          const val = (node as Record<string, unknown>)[f.field];
          if (!this.evaluateFilter(val, f)) return false;
        }
      }
      return true;
    });
  }
private evaluateFilter(val: unknown, f: FilterPredicate): boolean {
switch (f.op) {
case 'eq': return val === f.value;
case 'neq': return val !== f.value;
case 'contains': return typeof val === 'string' && val.includes(f.value as string);
case 'matches': return typeof val === 'string' && new RegExp(f.value as string).test(val);
case 'in': return Array.isArray(f.value) && f.value.includes(val);
case 'exists': return val !== null && val !== undefined;
case 'gt': return typeof val === 'number' && val > (f.value as number);
case 'gte': return typeof val === 'number' && val >= (f.value as number);
case 'lt': return typeof val === 'number' && val < (f.value as number);
case 'lte': return typeof val === 'number' && val <= (f.value as number);
default: return true;
}
}
}
// Public API entry point
export function createQuery(store: IStore, repoPath: string): RepoScope {
return new RepoScope(store, repoPath);
}

// src/query/scopes/repo-scope.ts
import { QueryScope } from '../builder';
import { IStore } from '../../storage/interface';
import { GraphNode } from '../../types/graph';
import { ModuleScope } from './module-scope';
import { FileScope } from './file-scope';
import { SymbolScope } from './symbol-scope';
import { DocScope } from './doc-scope';
import { TableScope } from './table-scope';
import { CommitScope } from './commit-scope';
export class RepoScope extends QueryScope<RepoScope> {
protected async execute(): Promise<{ nodes: unknown[]; edges: unknown[] }> {
const query = `
SELECT * FROM repository
WHERE root = $repoPath
LIMIT 1
`;
const nodes = await this.store.query(query, { repoPath: this.repoPath });
return { nodes, edges: [] };
}
protected clone(): RepoScope {
return new RepoScope(this.store, this.repoPath);
}
protected formatAsMarkdown(nodes: GraphNode[]): string {
if (nodes.length === 0) return 'Repository not indexed.';
const repo = nodes[0];
const stats = repo.stats as any;
return [
`# Repository: ${repo.name}`,
``,
`- **Path:** ${repo.root}`,
`- **Files:** ${stats?.files ?? 'N/A'}`,
`- **Modules:** ${stats?.modules ?? 'N/A'}`,
`- **Symbols:** ${stats?.symbols ?? 'N/A'}`,
`- **Last Indexed:** ${repo.updated_at}`,
].join('\n');
}
// Navigation to sub-scopes
modules(): ModuleScope {
return new ModuleScope(this.store, this.repoPath, null);
}
files(): FileScope {
return new FileScope(this.store, this.repoPath, null);
}
symbols(): SymbolScope {
return new SymbolScope(this.store, this.repoPath, null);
}
docs(): DocScope {
return new DocScope(this.store, this.repoPath, null);
}
// Convenience: direct symbol lookup
symbol(name: string): SymbolScope {
return new SymbolScope(this.store, this.repoPath, null)
.eq('name', name);
}
table(name: string): TableScope {
return new TableScope(this.store, this.repoPath, null)
.eq('name', name);
}
commit(hash: string): CommitScope {
return new CommitScope(this.store, this.repoPath, null)
.eq('hash', hash);
}
}

// src/query/scopes/symbol-scope.ts
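A standalone sketch of the WHERE-clause assembly `execute()` performs in the listing: filters become parameterized conditions (`$f_<field>`), never string-interpolated values. Reduced to two operators for brevity:

```typescript
interface Filter { field: string; op: 'eq' | 'contains'; value: unknown }

// Build a parameterized WHERE clause plus the matching bind variables.
function buildWhere(filters: Filter[]): { clause: string; vars: Record<string, unknown> } {
  const conditions: string[] = [];
  const vars: Record<string, unknown> = {};
  for (const f of filters) {
    const param = `f_${f.field}`;
    if (f.op === 'eq') conditions.push(`symbol.${f.field} = $${param}`);
    if (f.op === 'contains') conditions.push(`string::contains(symbol.${f.field}, $${param})`);
    vars[param] = f.value;
  }
  return { clause: conditions.length ? ` WHERE ${conditions.join(' AND ')}` : '', vars };
}

const { clause, vars } = buildWhere([
  { field: 'kind', op: 'eq', value: 'function' },
  { field: 'name', op: 'contains', value: 'auth' },
]);
```

Keeping values in `vars` rather than in the query string is what lets the store pass them to SurrealDB as bind parameters.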
import { QueryScope } from '../builder';
import { IStore } from '../../storage/interface';
import { GraphNode, GraphEdge } from '../../types/graph';
import { FileScope } from './file-scope';
export class SymbolScope extends QueryScope<SymbolScope> {
constructor(
store: IStore,
repoPath: string,
private moduleId: string | null
) {
super(store, repoPath);
}
protected async execute(): Promise<{ nodes: unknown[]; edges: unknown[] }> {
let query = 'SELECT * FROM symbol';
const vars: Record<string, unknown> = {};
const conditions: string[] = [];
if (this.moduleId) {
// Join through file to filter by module
query = `
SELECT symbol.*, file.path as file_path, file.module_id
FROM symbol
INNER JOIN file ON symbol.file_id = file.id
`;
conditions.push('file.module_id = $moduleId');
vars.moduleId = this.moduleId;
}
// Apply filters
for (const f of this.filters) {
if (f.field === '_func') continue;
const param = `f_${f.field}`;
switch (f.op) {
case 'eq': conditions.push(`symbol.${f.field} = $${param}`); break;
case 'neq': conditions.push(`symbol.${f.field} != $${param}`); break;
case 'contains': conditions.push(`string::contains(symbol.${f.field}, $${param})`); break;
case 'matches': conditions.push(`string::matches(symbol.${f.field}, $${param})`); break;
case 'in': conditions.push(`symbol.${f.field} IN $${param}`); break;
case 'exists': conditions.push(`symbol.${f.field} != NONE`); break;
}
vars[param] = f.value;
}
if (conditions.length > 0) {
query += ` WHERE ${conditions.join(' AND ')}`;
}
if (this.sortField) {
query += ` ORDER BY symbol.${this.sortField} ${this.sortDir.toUpperCase()}`;
}
if (this.limitCount !== null) {
query += ` LIMIT ${this.limitCount}`;
}
if (this.offsetCount > 0) {
query += ` START ${this.offsetCount}`;
}
const nodes = await this.store.query(query, vars);
return { nodes, edges: [] };
}
// Graph traversal methods
  async dependants(): Promise<SymbolScope> {
    return this.traverse(['calls', 'imports', 'references'], 'inbound');
  }
  async dependencies(): Promise<SymbolScope> {
    return this.traverse(['calls', 'imports', 'references'], 'outbound');
  }
  async callers(): Promise<SymbolScope> {
    return this.traverse(['calls'], 'inbound');
  }
  async callees(): Promise<SymbolScope> {
    return this.traverse(['calls'], 'outbound');
  }
  // Shared traversal: run the scope, walk the graph from the first match,
  // and return a new scope carrying the pre-computed result.
  private async traverse(
    edgeTypes: string[],
    direction: 'inbound' | 'outbound'
  ): Promise<SymbolScope> {
    const symbols = await this.toArray();
    if (symbols.length === 0) return this;
    const result = await this.store.graphTraversal(
      symbols[0].id,
      edgeTypes,
      direction,
      10, // max depth
      undefined
    );
    const newScope = new SymbolScope(this.store, this.repoPath, this.moduleId);
    (newScope as any)._precomputedNodes = result.nodes;
    (newScope as any)._precomputedEdges = result.edges;
    return newScope;
  }
// Navigate to containing file
async file(): Promise<FileScope> {
const symbols = await this.toArray();
if (symbols.length === 0) return new FileScope(this.store, this.repoPath, null);
const fileId = (symbols[0] as any).file_id;
const fileScope = new FileScope(this.store, this.repoPath, null);
(fileScope as any)._precomputedFileId = fileId;
return fileScope;
}
protected clone(): SymbolScope {
return new SymbolScope(this.store, this.repoPath, this.moduleId);
}
protected formatAsMarkdown(nodes: GraphNode[]): string {
if (nodes.length === 0) return 'No symbols found.';
return nodes.map(n => {
const s = n as any;
const exportTag = s.is_exported ? 'exported' : 'internal';
const location = s.file_path ? `(${s.file_path}:${s.start_line})` : `(${s.start_line})`;
return `- **${s.name}** [${s.kind}] [${exportTag}] ${location}${s.signature ? `\n \`${s.signature}\`` : ''}${s.docstring ? `\n > ${s.docstring.split('\n')[0]}` : ''}`;
}).join('\n');
}
}

// src/mcp/tools/impact.ts
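The impact tool below partitions graph-traversal results into direct and transitive dependants before grouping them by file and module. A minimal standalone sketch of that split, with node and edge shapes simplified and illustrative ids:

```typescript
// Direct dependants: nodes with an edge targeting the symbol itself.
// Transitive dependants: all other reachable nodes.
interface Edge { from: string; to: string; }
interface Node { id: string; }

function splitDependants(target: string, nodes: Node[], edges: Edge[]) {
  const direct = edges.filter(e => e.to === target).map(e => e.from);
  const directSet = new Set(direct);
  const transitive = nodes
    .map(n => n.id)
    .filter(id => id !== target && !directSet.has(id));
  return { direct, transitive };
}

// b calls a; c calls b → b is a direct dependant of a, c a transitive one.
const nodes = [{ id: 'a' }, { id: 'b' }, { id: 'c' }];
const edges = [{ from: 'b', to: 'a' }, { from: 'c', to: 'b' }];
console.log(splitDependants('a', nodes, edges)); // { direct: ['b'], transitive: ['c'] }
```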
import { Tool } from '@modelcontextprotocol/sdk/types.js';
import { IStore, GraphNode } from '../../storage/interface'; // GraphNode assumed exported alongside IStore
import { createQuery } from '../../query/builder';
import { TokenBudgetManager } from '../token-budget';
export function createImpactAnalysisTool(store: IStore, repoPath: string, budget: TokenBudgetManager): Tool {
return {
name: 'get_impact_analysis',
description: `Analyze the impact of changing a symbol. Returns all direct and transitive dependants — functions that call it, files that import it, modules that depend on it. Use this before making changes to understand blast radius.`,
inputSchema: {
type: 'object',
properties: {
symbol_name: {
type: 'string',
description: 'Name of the symbol to analyze',
},
symbol_kind: {
type: 'string',
enum: ['function', 'class', 'interface', 'type_alias', 'variable', 'table', 'column'],
description: 'Kind of symbol (optional, narrows search)',
},
file_path: {
type: 'string',
description: 'File path to disambiguate (optional)',
},
max_depth: {
type: 'number',
description: 'Max traversal depth for transitive dependants (default: 5)',
default: 5,
},
include_transitive: {
type: 'boolean',
description: 'Include transitive (indirect) dependants (default: true)',
default: true,
},
max_tokens: {
type: 'number',
description: 'Optional token budget cap for the response',
},
},
required: ['symbol_name'],
},
handler: async (params: any) => {
const q = createQuery(store, repoPath)
.symbol(params.symbol_name);
if (params.symbol_kind) q.eq('kind', params.symbol_kind);
if (params.file_path) q.eq('file_path', params.file_path);
const symbols = await q.toArray();
if (symbols.length === 0) {
return {
content: [{ type: 'text', text: JSON.stringify({ error: 'Symbol not found', symbol_name: params.symbol_name }) }],
};
}
const symbol = symbols[0];
const depth = params.max_depth ?? 5;
// Get dependants via graph traversal
const result = await store.graphTraversal(
symbol.id,
['calls', 'imports', 'references', 'implements'],
'inbound',
depth,
undefined
);
// Organize by distance (direct vs transitive)
const direct = result.edges.filter(e => {
// Direct edges are those where the target is our symbol
return e.to === symbol.id;
}).map(e => result.nodes.find(n => n.id === e.from)!).filter(Boolean);
const transitive = result.nodes.filter(n =>
n.id !== symbol.id && !direct.find(d => d.id === n.id)
);
// Group by file and module
const byFile = new Map<string, GraphNode[]>();
const byModule = new Map<string, GraphNode[]>();
for (const node of result.nodes) {
const n = node as any;
if (n.file_path) {
if (!byFile.has(n.file_path)) byFile.set(n.file_path, []);
byFile.get(n.file_path)!.push(node);
}
if (n.module_id) {
if (!byModule.has(n.module_id)) byModule.set(n.module_id, []);
byModule.get(n.module_id)!.push(node);
}
}
const response = {
target: {
id: symbol.id,
name: (symbol as any).name,
kind: (symbol as any).kind,
file: (symbol as any).file_path,
line: (symbol as any).start_line,
},
impact_summary: {
total_dependants: result.nodes.length,
direct_dependants: direct.length,
transitive_dependants: transitive.length,
files_affected: byFile.size,
modules_affected: byModule.size,
},
direct_dependants: direct.map(n => ({
name: (n as any).name,
kind: (n as any).kind,
file: (n as any).file_path,
line: (n as any).start_line,
relationship: result.edges.find(e => e.from === n.id && e.to === symbol.id)?.type,
})),
affected_files: Object.fromEntries(
Array.from(byFile.entries()).map(([path, nodes]) => [
path,
nodes.map(n => ({ name: (n as any).name, kind: (n as any).kind, line: (n as any).start_line }))
])
),
affected_modules: Object.fromEntries(
Array.from(byModule.entries()).map(([id, nodes]) => [
id,
{ symbol_count: nodes.length, kinds: [...new Set(nodes.map(n => (n as any).kind))] }
])
),
token_estimate: budget.estimate(JSON.stringify(result)),
};
// Apply token budget truncation if needed
const truncated = budget.truncate(response, params.max_tokens);
return {
content: [{ type: 'text', text: JSON.stringify(truncated, null, 2) }],
};
},
};
}

// src/hooks/pre-commit.ts
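runPreCommit below narrows the staged paths down to files the extractor registry supports. A standalone sketch of that filtering step, with a hypothetical extension allow-list standing in for ExtractorRegistry.supportsFile:

```typescript
// Hypothetical allow-list; the real check is ExtractorRegistry.supportsFile.
const SUPPORTED_EXTS = new Set(['.ts', '.tsx', '.js', '.sql', '.md']);

function filterStagedFiles(diffOutput: string): string[] {
  // `git diff --cached --name-only --diff-filter=ACMR` emits one path per line.
  const names = diffOutput.trim().split('\n').filter(Boolean);
  return names.filter(f => {
    const dot = f.lastIndexOf('.');
    return dot !== -1 && SUPPORTED_EXTS.has(f.slice(dot));
  });
}

const sample = 'src/a.ts\nREADME.md\nassets/logo.png\n';
console.log(filterStagedFiles(sample)); // ['src/a.ts', 'README.md']
```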
import path from 'path';
import { promises as fs } from 'fs';
import { simpleGit, SimpleGit } from 'simple-git';
import { IStore } from '../storage/interface';
import { ExtractorRegistry } from '../extractor/registry';
import { GraphDiffer } from '../engine/differ';
import { GraphMerger } from '../engine/merger';
import { Validator } from '../engine/validator';
import { contentHash } from '../utils/hash';
import { Logger } from '../utils/logger';
interface PreCommitResult {
status: 'pass' | 'warn' | 'fail';
parsed: number;
updated: number;
added: number;
removed: number;
errors: string[];
warnings: string[];
}
export async function runPreCommit(
repoPath: string,
store: IStore,
config: { mode: 'warn' | 'block' | 'off' },
logger: Logger
): Promise<PreCommitResult> {
const result: PreCommitResult = {
status: 'pass',
parsed: 0,
updated: 0,
added: 0,
removed: 0,
errors: [],
warnings: [],
};
const git: SimpleGit = simpleGit(repoPath);
// 1. Get staged files
const stagedFiles = await git.diff(['--cached', '--name-only', '--diff-filter=ACMR']);
const fileNames = stagedFiles.trim().split('\n').filter(Boolean);
if (fileNames.length === 0) {
return result;
}
logger.info(`Pre-commit: ${fileNames.length} staged files`);
// 2. Filter to supported files
const registry = new ExtractorRegistry();
const supportedFiles = fileNames.filter(f => registry.supportsFile(f));
if (supportedFiles.length === 0) {
return result;
}
logger.info(`Pre-commit: ${supportedFiles.length} supported files to parse`);
// 3. Parse changed files
for (const filePath of supportedFiles) {
try {
const absolutePath = path.resolve(repoPath, filePath);
const content = await fs.readFile(absolutePath, 'utf-8');
const hash = contentHash(content);
// Check if content actually changed
const existingFile = await store.query(
'SELECT content_hash FROM file WHERE path = $path LIMIT 1',
{ path: filePath }
);
if (existingFile.length > 0 && existingFile[0].content_hash === hash) {
continue; // No change
}
// Extract symbols
const extractor = registry.getExtractor(filePath);
const extraction = await extractor.extractFile(absolutePath, repoPath);
// Diff against existing graph
const oldSymbols = await store.query(
'SELECT * FROM symbol WHERE file_id = $fileId',
{ fileId: `file:${filePath}` }
);
const diff = GraphDiffer.diff(oldSymbols, extraction.symbols);
// Merge into graph
await store.transaction(async (tx) => {
// Remove old symbols
for (const removed of diff.removed) {
await tx.deleteNode(removed.id);
await tx.deleteEdges(removed.id);
result.removed++;
}
// Update changed symbols
for (const changed of diff.changed) {
await tx.updateNode(changed.new.id, changed.new);
result.updated++;
}
// Add new symbols
for (const added of diff.added) {
await tx.createNode(added);
result.added++;
}
// Update edges
await tx.deleteEdges(`file:${filePath}`); // Remove old edges from this file
await tx.createEdges(extraction.edges.map(e => ({
...e,
// Resolve file-level edges
from: e.from.startsWith('file:') ? `file:${filePath}` : e.from,
})));
// Update file node
const fileNode = {
id: `file:${filePath}`,
type: 'file',
path: filePath,
content_hash: hash,
parse_status: extraction.parseErrors.length === 0 ? 'parsed' : 'partial',
parse_error: extraction.parseErrors.length > 0
? extraction.parseErrors.map(e => `L${e.line}: ${e.message}`).join('; ')
: null,
last_parsed: new Date().toISOString(),
line_count: content.split('\n').length,
size_bytes: Buffer.byteLength(content),
};
await tx.createNode(fileNode as any);
});
result.parsed++;
if (extraction.parseErrors.length > 0) {
result.warnings.push(
`${filePath}: ${extraction.parseErrors.length} parse errors`
);
}
} catch (err) {
const message = err instanceof Error ? err.message : String(err);
result.errors.push(`${filePath}: ${message}`);
logger.error(`Pre-commit error for ${filePath}`, err);
}
}
// 4. Validate (if enabled)
if (config.mode !== 'off') {
const validation = await Validator.validate(store, repoPath);
result.warnings.push(...validation.warnings);
result.errors.push(...validation.errors);
if (result.errors.length > 0 && config.mode === 'block') {
result.status = 'fail';
} else if (result.warnings.length > 0 || result.errors.length > 0) {
result.status = 'warn';
}
}
// 5. Update repo stats
await updateRepoStats(store, repoPath);
return result;
}

// src/workflows/templates/bug-fix.ts
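The bug-fix workflow below mines runtime error messages for identifiers (see extractPropertyName at the end of this file). A standalone sketch using the same regexes, collapsed into one function:

```typescript
// Pulls a candidate symbol name out of common JS runtime error messages.
function extractName(msg: string): string | null {
  const m =
    msg.match(/reading '(\w+)'/)          // "Cannot read properties of undefined (reading 'foo')"
    ?? msg.match(/(\w+) is not a function/)
    ?? msg.match(/(\w+) is not defined/);
  return m ? m[1] : null;
}

console.log(extractName("Cannot read properties of undefined (reading 'email')")); // email
console.log(extractName('formatDate is not a function')); // formatDate
console.log(extractName('segmentation fault')); // null
```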
import { IStore } from '../../storage/interface';
import { createQuery } from '../../query/builder';
export interface BugFixInput {
error_message?: string;
stack_trace?: string[];
file_path?: string;
line_number?: number;
symbol_name?: string;
error_type?: string; // TypeError, ReferenceError, etc.
}
export interface BugFixOutput {
root_candidates: RootCandidate[];
impact_radius: ImpactRadius;
related_tests: RelatedTest[];
recent_changes: RecentChange[];
suggested_investigation_order: string[];
}
interface RootCandidate {
symbol_id: string;
symbol_name: string;
kind: string;
file_path: string;
line: number;
confidence: 'high' | 'medium' | 'low';
reason: string;
}
interface ImpactRadius {
direct_callers: number;
transitive_callers: number;
affected_files: string[];
affected_modules: string[];
}
interface RelatedTest {
test_name: string;
file_path: string;
line: number;
linked_to: string;
}
// Shape assumed from the `recent_changes` usage; populated from git log.
interface RecentChange {
commit_hash: string;
file_path: string;
date: string;
message: string;
}
export async function executeBugFixWorkflow(
store: IStore,
repoPath: string,
input: BugFixInput
): Promise<BugFixOutput> {
const candidates: RootCandidate[] = [];
// Strategy 1: If we have a file + line, look up the symbol at that location
if (input.file_path && input.line_number) {
const symbols = await createQuery(store, repoPath)
.symbol('') // TODO: needs a location-scoped lookup; relies on the file_path filter below
.eq('file_path', input.file_path)
.toArray();
// Find symbol containing the line
const containing = symbols.find(s => {
const sym = s as any;
return sym.start_line <= input.line_number! && sym.end_line >= input.line_number!;
});
if (containing) {
candidates.push({
symbol_id: containing.id,
symbol_name: (containing as any).name,
kind: (containing as any).kind,
file_path: (containing as any).file_path,
line: (containing as any).start_line,
confidence: 'high',
reason: `Symbol at error location (${input.file_path}:${input.line_number})`,
});
}
}
// Strategy 2: If we have a symbol name from the error (e.g., "Cannot read property 'foo' of undefined")
if (input.symbol_name || input.error_message) {
const nameToSearch = input.symbol_name || extractPropertyName(input.error_message!);
if (nameToSearch) {
const matches = await createQuery(store, repoPath)
.symbol(nameToSearch)
.toArray();
for (const match of matches) {
// Don't duplicate if already found
if (candidates.find(c => c.symbol_id === match.id)) continue;
candidates.push({
symbol_id: match.id,
symbol_name: (match as any).name,
kind: (match as any).kind,
file_path: (match as any).file_path,
line: (match as any).start_line,
confidence: 'medium',
reason: `Name matches error reference: "${nameToSearch}"`,
});
}
}
}
// Strategy 3: If we have a stack trace, trace the call chain
if (input.stack_trace && input.stack_trace.length > 0) {
for (const frame of input.stack_trace) {
const parsed = parseStackFrame(frame);
if (!parsed) continue;
const symbols = await createQuery(store, repoPath)
.symbol(parsed.functionName)
.eq('file_path', parsed.filePath)
.toArray();
for (const sym of symbols) {
if (candidates.find(c => c.symbol_id === sym.id)) continue;
candidates.push({
symbol_id: sym.id,
symbol_name: (sym as any).name,
kind: (sym as any).kind,
file_path: (sym as any).file_path,
line: (sym as any).start_line,
confidence: parsed.filePath === input.file_path ? 'high' : 'medium',
reason: `Appears in stack trace: ${frame.trim()}`,
});
}
}
}
// Strategy 4: If error type suggests null/undefined, find recently changed symbols in the area
if (input.error_type && ['TypeError', 'ReferenceError'].includes(input.error_type)) {
// Find symbols modified in last 5 commits in the same file
if (input.file_path) {
const recentSymbols = await store.query(`
SELECT symbol.*, commit.hash, commit.date
FROM symbol
INNER JOIN modified_in ON symbol.file_id = modified_in.from
INNER JOIN commit ON modified_in.to = commit.id
WHERE symbol.file_path = $filePath
ORDER BY commit.date DESC
LIMIT 10
`, { filePath: input.file_path });
for (const rs of recentSymbols) {
if (candidates.find(c => c.symbol_id === rs.id)) continue;
candidates.push({
symbol_id: rs.id,
symbol_name: rs.name,
kind: rs.kind,
file_path: rs.file_path,
line: rs.start_line,
confidence: 'low',
reason: `Recently modified symbol in error file (commit ${rs.hash})`,
});
}
}
}
// Compute impact radius for top candidate
let impactRadius: ImpactRadius = {
direct_callers: 0,
transitive_callers: 0,
affected_files: [],
affected_modules: [],
};
if (candidates.length > 0) {
const topCandidate = candidates[0];
const result = await store.graphTraversal(
topCandidate.symbol_id,
['calls', 'imports'],
'inbound',
10,
undefined
);
const directEdges = result.edges.filter(e => e.to === topCandidate.symbol_id);
impactRadius.direct_callers = directEdges.length;
impactRadius.transitive_callers = result.nodes.length;
impactRadius.affected_files = [...new Set(result.nodes.map(n => (n as any).file_path).filter(Boolean))];
// Resolve modules
for (const filePath of impactRadius.affected_files) {
const fileNode = await store.query(
'SELECT module_id FROM file WHERE path = $path LIMIT 1',
{ path: filePath }
);
if (fileNode.length > 0 && fileNode[0].module_id) {
impactRadius.affected_modules.push(fileNode[0].module_id);
}
}
impactRadius.affected_modules = [...new Set(impactRadius.affected_modules)];
}
// Find related tests
const relatedTests: RelatedTest[] = [];
if (candidates.length > 0) {
for (const candidate of candidates.slice(0, 3)) {
const testSymbols = await store.query(`
SELECT * FROM symbol
WHERE name CONTAINS $testName
AND (kind = 'function' AND name LIKE '%test%')
LIMIT 5
`, { testName: candidate.symbol_name });
for (const test of testSymbols) {
relatedTests.push({
test_name: test.name,
file_path: test.file_path,
line: test.start_line,
linked_to: candidate.symbol_name,
});
}
}
}
// Suggest investigation order
const suggestedOrder = candidates
.sort((a, b) => {
const confOrder = { high: 0, medium: 1, low: 2 };
return confOrder[a.confidence] - confOrder[b.confidence];
})
.map(c => `${c.file_path}:${c.line} (${c.symbol_name})`);
return {
root_candidates: candidates,
impact_radius: impactRadius,
related_tests: relatedTests,
recent_changes: [], // Populated from git log
suggested_investigation_order: suggestedOrder,
};
}
function extractPropertyName(errorMessage: string): string | null {
// "Cannot read properties of undefined (reading 'foo')"
const readMatch = errorMessage.match(/reading '(\w+)'/);
if (readMatch) return readMatch[1];
// "foo is not a function"
const notFnMatch = errorMessage.match(/(\w+) is not a function/);
if (notFnMatch) return notFnMatch[1];
// "foo is not defined"
const notDefMatch = errorMessage.match(/(\w+) is not defined/);
if (notDefMatch) return notDefMatch[1];
return null;
}
function parseStackFrame(frame: string): { functionName: string; filePath: string } | null {
// "at functionName (/path/to/file.ts:10:5)"
const match = frame.match(/at\s+(\w+)\s+\((.+):(\d+):\d+\)/);
if (!match) return null;
return { functionName: match[1], filePath: match[2] };
}

// src/mcp/token-budget.ts
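TokenBudgetManager below estimates tokens as characters × a per-content-type rate rather than running a real tokenizer, so counts are approximate. The heuristic in isolation:

```typescript
// Rough chars-per-token rates; averages, not tokenizer output.
const RATES = { code: 0.25, markdown: 0.3, json: 0.22, text: 0.33 } as const;

function estimateTokens(content: string, type: keyof typeof RATES = 'json'): number {
  return Math.ceil(content.length * RATES[type]);
}

console.log(estimateTokens('{"a":1}'));              // 7 chars × 0.22 → 2
console.log(estimateTokens('const x = 1;', 'code')); // 12 chars × 0.25 → 3
```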
export class TokenBudgetManager {
private maxTokens: number;
// Approximate tokens per character for different content types
private static RATES = {
code: 0.25, // ~4 chars per token
markdown: 0.3, // ~3.3 chars per token
json: 0.22, // ~4.5 chars per token (compact)
text: 0.33, // ~3 chars per token
};
constructor(maxTokens: number = 8000) {
this.maxTokens = maxTokens;
}
estimate(content: string, type: keyof typeof TokenBudgetManager.RATES = 'json'): number {
return Math.ceil(content.length * TokenBudgetManager.RATES[type]);
}
truncate<T>(data: T, requestedMax?: number): T & { _truncated: boolean; _token_count: number } {
const max = requestedMax ?? this.maxTokens;
const json = JSON.stringify(data);
const tokens = this.estimate(json);
if (tokens <= max) {
return {
...data,
_truncated: false,
_token_count: tokens,
} as T & { _truncated: boolean; _token_count: number };
}
// Truncation strategy: keep structure, reduce detail
const truncated = this.smartTruncate(data, max);
const truncatedJson = JSON.stringify(truncated);
const truncatedTokens = this.estimate(truncatedJson);
return {
...truncated,
_truncated: true,
_token_count: truncatedTokens,
} as T & { _truncated: boolean; _token_count: number };
}
private smartTruncate<T>(data: T, budget: number): T {
const obj = data as any;
// Strategy 1: If it has an array of items, truncate the array
for (const key of Object.keys(obj)) {
if (Array.isArray(obj[key]) && obj[key].length > 0) {
// Keep reducing until we're under budget
const originalLen = obj[key].length;
let len = originalLen;
while (len > 1) {
const testObj = { ...obj, [key]: obj[key].slice(0, len) };
const testJson = JSON.stringify(testObj);
if (this.estimate(testJson) <= budget * 0.9) { // 10% margin for metadata
obj[key] = obj[key].slice(0, len);
obj._truncation_note = `${key} truncated from ${originalLen} to ${len} items`;
return obj as T;
}
len = Math.floor(len * 0.7); // Reduce by 30% each iteration
}
obj[key] = obj[key].slice(0, 1);
return obj as T;
}
}
// Strategy 2: Remove verbose fields
const verboseFields = ['signature', 'docstring', 'metadata', 'raw'];
for (const field of verboseFields) {
if (obj[field]) {
delete obj[field];
const testJson = JSON.stringify(obj);
if (this.estimate(testJson) <= budget * 0.9) {
return obj as T;
}
}
}
// Strategy 3: Last resort - truncate string fields
for (const key of Object.keys(obj)) {
if (typeof obj[key] === 'string' && obj[key].length > 100) {
obj[key] = obj[key].slice(0, 100) + '...';
}
}
return obj as T;
}
}

// src/storage/surreal/migrations.ts
export const SCHEMA_DEFINITION = `
// ============================================
// TOKENZIP GRAPH SCHEMA - SurrealDB v2
// ============================================
// --- NODE TYPES ---
DEFINE TABLE repository SCHEMAFULL;
DEFINE FIELD name ON repository TYPE string;
DEFINE FIELD root ON repository TYPE string;
DEFINE FIELD created_at ON repository TYPE datetime DEFAULT time::now();
DEFINE FIELD updated_at ON repository TYPE datetime DEFAULT time::now();
DEFINE FIELD stats ON repository TYPE object;
DEFINE FIELD stats.files ON repository TYPE number;
DEFINE FIELD stats.modules ON repository TYPE number;
DEFINE FIELD stats.symbols ON repository TYPE number;
DEFINE TABLE module SCHEMAFULL;
DEFINE FIELD name ON module TYPE string;
DEFINE FIELD path ON module TYPE string;
DEFINE FIELD manifest_type ON module TYPE string;
DEFINE FIELD language ON module TYPE string;
DEFINE FIELD is_root ON module TYPE bool DEFAULT false;
DEFINE FIELD metadata ON module TYPE object;
DEFINE FIELD repository_id ON module TYPE record<repository>;
DEFINE TABLE file SCHEMAFULL;
DEFINE FIELD path ON file TYPE string;
DEFINE FIELD module_id ON file TYPE record<module>;
DEFINE FIELD language ON file TYPE string;
DEFINE FIELD ext ON file TYPE string;
DEFINE FIELD size_bytes ON file TYPE int;
DEFINE FIELD content_hash ON file TYPE string;
DEFINE FIELD line_count ON file TYPE int;
DEFINE FIELD parse_status ON file TYPE string
ASSERT $value IN ['parsed', 'partial', 'failed', 'skipped'];
DEFINE FIELD parse_error ON file TYPE option<string>;
DEFINE FIELD last_parsed ON file TYPE datetime;
DEFINE FIELD git_last_modified ON file TYPE option<datetime>;
DEFINE FIELD git_blame_summary ON file TYPE option<object>;
DEFINE TABLE symbol SCHEMAFULL;
DEFINE FIELD file_id ON symbol TYPE record<file>;
DEFINE FIELD name ON symbol TYPE string;
DEFINE FIELD kind ON symbol TYPE string
ASSERT $value IN [
'function', 'method', 'constructor',
'class', 'interface', 'type_alias', 'enum',
'variable', 'constant', 'property',
'parameter', 'generic_param',
'decorator', 'annotation',
'table', 'view', 'column', 'index', 'constraint',
'foreign_key', 'stored_procedure',
'import', 'export', 're_export',
'namespace', 'module_decl',
'section', 'subsection',
'workflow_step', 'diagram_node',
'list_item', 'table_row'
];
DEFINE FIELD signature ON symbol TYPE option<string>;
DEFINE FIELD return_type ON symbol TYPE option<string>;
DEFINE FIELD start_line ON symbol TYPE int;
DEFINE FIELD end_line ON symbol TYPE int;
DEFINE FIELD start_col ON symbol TYPE int;
DEFINE FIELD end_col ON symbol TYPE int;
DEFINE FIELD docstring ON symbol TYPE option<string>;
DEFINE FIELD is_exported ON symbol TYPE bool DEFAULT false;
DEFINE FIELD is_async ON symbol TYPE option<bool>;
DEFINE FIELD is_static ON symbol TYPE option<bool>;
DEFINE FIELD visibility ON symbol TYPE option<string>
ASSERT $value IN [null, 'public', 'private', 'protected'];
DEFINE FIELD modifiers ON symbol TYPE array;
DEFINE FIELD parent_symbol_id ON symbol TYPE option<string>;
DEFINE FIELD metadata ON symbol TYPE object;
DEFINE TABLE commit SCHEMAFULL;
DEFINE FIELD hash ON commit TYPE string;
DEFINE FIELD short_hash ON commit TYPE string;
DEFINE FIELD message ON commit TYPE string;
DEFINE FIELD author ON commit TYPE string;
DEFINE FIELD email ON commit TYPE string;
DEFINE FIELD date ON commit TYPE datetime;
DEFINE FIELD branch ON commit TYPE string;
DEFINE FIELD tags ON commit TYPE array;
DEFINE TABLE dependency SCHEMAFULL;
DEFINE FIELD module_id ON dependency TYPE record<module>;
DEFINE FIELD name ON dependency TYPE string;
DEFINE FIELD version ON dependency TYPE string;
DEFINE FIELD dev ON dependency TYPE bool DEFAULT false;
DEFINE FIELD source ON dependency TYPE string;
// --- EDGE TYPES ---
DEFINE TABLE contains SCHEMAFULL TYPE RELATION FROM repository | module | file | symbol TO module | file | symbol;
DEFINE TABLE imports SCHEMAFULL TYPE RELATION FROM file | symbol | module TO file | symbol | module;
DEFINE FIELD is_type_only ON imports TYPE option<bool>;
DEFINE FIELD is_default ON imports TYPE option<bool>;
DEFINE FIELD alias ON imports TYPE option<string>;
DEFINE FIELD specifiers ON imports TYPE option<array>;
DEFINE TABLE exports SCHEMAFULL TYPE RELATION FROM file | symbol TO symbol | file;
DEFINE FIELD is_default ON exports TYPE option<bool>;
DEFINE FIELD is_reexport ON exports TYPE option<bool>;
DEFINE FIELD alias ON exports TYPE option<string>;
DEFINE FIELD name ON exports TYPE option<string>;
DEFINE TABLE calls SCHEMAFULL TYPE RELATION FROM symbol TO symbol;
DEFINE FIELD line ON calls TYPE option<int>;
DEFINE FIELD is_async ON calls TYPE option<bool>;
DEFINE FIELD call_type ON calls TYPE option<string>
ASSERT $value IN [null, 'direct', 'indirect', 'dynamic'];
DEFINE TABLE implements SCHEMAFULL TYPE RELATION FROM symbol TO symbol;
DEFINE FIELD is_partial ON implements TYPE option<bool>;
DEFINE TABLE inherits SCHEMAFULL TYPE RELATION FROM symbol TO symbol;
DEFINE FIELD is_interface_inheritance ON inherits TYPE option<bool>;
DEFINE TABLE modifies SCHEMAFULL TYPE RELATION FROM symbol TO symbol;
DEFINE TABLE reads SCHEMAFULL TYPE RELATION FROM symbol TO symbol;
DEFINE TABLE references SCHEMAFULL TYPE RELATION FROM symbol TO symbol;
DEFINE FIELD context ON references TYPE option<string>;
DEFINE TABLE depends_on SCHEMAFULL TYPE RELATION FROM module | file TO module | file;
DEFINE FIELD is_transitive ON depends_on TYPE option<bool>;
DEFINE FIELD depth ON depends_on TYPE option<int>;
DEFINE TABLE modified_in SCHEMAFULL TYPE RELATION FROM file TO commit;
DEFINE FIELD change_type ON modified_in TYPE string
ASSERT $value IN ['added', 'modified', 'deleted', 'renamed'];
DEFINE TABLE foreign_key SCHEMAFULL TYPE RELATION FROM symbol TO symbol;
DEFINE FIELD constraint_name ON foreign_key TYPE option<string>;
DEFINE FIELD on_delete ON foreign_key TYPE option<string>;
DEFINE FIELD on_update ON foreign_key TYPE option<string>;
DEFINE FIELD ref_column ON foreign_key TYPE option<string>;
DEFINE TABLE column_of SCHEMAFULL TYPE RELATION FROM symbol TO symbol;
DEFINE TABLE diagram_edge SCHEMAFULL TYPE RELATION FROM symbol TO symbol;
DEFINE FIELD label ON diagram_edge TYPE option<string>;
DEFINE FIELD style ON diagram_edge TYPE option<string>;
DEFINE FIELD type ON diagram_edge TYPE option<string>;
DEFINE FIELD sequence ON diagram_edge TYPE option<int>;
DEFINE FIELD is_response ON diagram_edge TYPE option<bool>;
DEFINE TABLE workflow_transition SCHEMAFULL TYPE RELATION FROM symbol TO symbol;
DEFINE FIELD condition ON workflow_transition TYPE option<string>;
DEFINE FIELD action ON workflow_transition TYPE option<string>;
// --- INDEXES ---
DEFINE INDEX idx_file_path ON file FIELDS path UNIQUE;
DEFINE INDEX idx_file_hash ON file FIELDS content_hash;
DEFINE INDEX idx_file_module ON file FIELDS module_id;
DEFINE INDEX idx_symbol_name ON symbol FIELDS name;
DEFINE INDEX idx_symbol_kind ON symbol FIELDS kind;
DEFINE INDEX idx_symbol_file ON symbol FIELDS file_id;
DEFINE INDEX idx_symbol_export ON symbol FIELDS is_exported;
DEFINE INDEX idx_module_path ON module FIELDS path UNIQUE;
DEFINE INDEX idx_commit_hash ON commit FIELDS hash UNIQUE;
DEFINE INDEX idx_dep_name ON dependency FIELDS name, module_id;
`;

// src/utils/errors.ts
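A usage sketch for the error type defined below; the class is re-declared minimally here so the snippet stands alone, and the code string matches GRAMMAR_NOT_FOUND = 'E2002':

```typescript
// Minimal re-declaration of TokenZipError for a self-contained example.
class TokenZipError extends Error {
  constructor(
    message: string,
    public readonly code: string,
    public readonly details?: Record<string, unknown>
  ) {
    super(message);
    this.name = 'TokenZipError';
  }
}

try {
  throw new TokenZipError('grammar missing for .zig', 'E2002', { ext: '.zig' });
} catch (err) {
  if (err instanceof TokenZipError) {
    console.log(`${err.code}: ${err.message}`); // E2002: grammar missing for .zig
  }
}
```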
export class TokenZipError extends Error {
constructor(
message: string,
public readonly code: ErrorCode,
public readonly details?: Record<string, unknown>
) {
super(message);
this.name = 'TokenZipError';
}
}
export enum ErrorCode {
// Storage errors (1xxx)
DB_CONNECTION_FAILED = 'E1001',
DB_QUERY_FAILED = 'E1002',
DB_MIGRATION_FAILED = 'E1003',
DB_CORRUPTED = 'E1004',
// Parser errors (2xxx)
PARSE_FAILED = 'E2001',
GRAMMAR_NOT_FOUND = 'E2002',
PARTIAL_PARSE = 'E2003',
// Git errors (3xxx)
GIT_NOT_REPOSITORY = 'E3001',
GIT_HOOK_INSTALL_FAILED = 'E3002',
GIT_DIFF_FAILED = 'E3003',
// MCP errors (4xxx)
MCP_TRANSPORT_FAILED = 'E4001',
MCP_TOOL_NOT_FOUND = 'E4002',
MCP_INVALID_PARAMS = 'E4003',
MCP_TOKEN_BUDGET_EXCEEDED = 'E4004',
// Config errors (5xxx)
CONFIG_NOT_FOUND = 'E5001',
CONFIG_INVALID = 'E5002',
// Indexer errors (6xxx)
INDEX_INTERRUPTED = 'E6001',
INDEX_FILE_TOO_LARGE = 'E6002',
INDEX_BINARY_FILE = 'E6003',
}
// Global error handler for MCP tools
export function mcpErrorHandler(error: unknown): { content: Array<{ type: 'text'; text: string }>; isError: boolean } {
if (error instanceof TokenZipError) {
return {
content: [{
type: 'text',
text: JSON.stringify({
error: error.message,
code: error.code,
details: error.details,
}),
}],
isError: true,
};
}
if (error instanceof Error) {
return {
content: [{
type: 'text',
text: JSON.stringify({
error: error.message,
code: 'E9999',
stack: process.env.NODE_ENV === 'development' ? error.stack : undefined,
}),
}],
isError: true,
};
}
return {
content: [{ type: 'text', text: JSON.stringify({ error: 'Unknown error' }) }],
isError: true,
};
}

// tests/unit/extractor/typescript.test.ts
import { describe, it, expect, beforeEach } from 'vitest';
import { TypeScriptExtractor } from '../../../src/extractor/code/typescript';
import { createMockContext } from '../../helpers';
describe('TypeScriptExtractor', () => {
let extractor: TypeScriptExtractor;
beforeEach(() => {
extractor = new TypeScriptExtractor();
});
describe('function extraction', () => {
it('extracts a simple exported function', () => {
const code = `
export function addUser(name: string, age: number): User {
return { name, age, id: crypto.randomUUID() };
}
`;
const ctx = createMockContext('src/user.ts', code, 'module-1');
const result = extractor.extract(ctx);
expect(result.symbols).toHaveLength(1);
expect(result.symbols[0]).toMatchObject({
name: 'addUser',
kind: 'function',
isExported: true,
isAsync: false,
startLine: 2,
endLine: 4,
});
expect(result.symbols[0].metadata.params).toEqual([
{ name: 'name', type: 'string' },
{ name: 'age', type: 'number' },
]);
expect(result.symbols[0].returnType).toBe('User');
});
it('extracts async arrow function assigned to const', () => {
const code = `
export const fetchUser = async (id: string): Promise<User> => {
const res = await fetch(\`/api/users/\${id}\`);
return res.json();
};
`;
const ctx = createMockContext('src/api.ts', code, 'module-1');
const result = extractor.extract(ctx);
expect(result.symbols).toHaveLength(1);
expect(result.symbols[0]).toMatchObject({
name: 'fetchUser',
kind: 'function',
isExported: true,
isAsync: true,
});
expect(result.symbols[0].metadata.isArrow).toBe(true);
});
it('extracts class with methods, inheritance, and implementation', () => {
const code = `
export class UserRepository implements IRepository<User> {
private cache: Map<string, User> = new Map();
async findById(id: string): Promise<User | null> {
return this.cache.get(id) ?? null;
}
async save(user: User): Promise<void> {
this.cache.set(user.id, user);
}
}
`;
const ctx = createMockContext('src/repo.ts', code, 'module-1');
const result = extractor.extract(ctx);
// 1 class + 1 property + 2 methods
expect(result.symbols).toHaveLength(4);
const classSym = result.symbols.find(s => s.kind === 'class')!;
expect(classSym.name).toBe('UserRepository');
expect(classSym.isExported).toBe(true);
expect(classSym.metadata.implements).toEqual(['IRepository<User>']);
const methods = result.symbols.filter(s => s.kind === 'method');
expect(methods).toHaveLength(2);
expect(methods.map(m => m.name)).toEqual(['findById', 'save']);
// Check implements edge
const implEdge = result.edges.find(e => e.type === 'implements');
expect(implEdge).toBeDefined();
});
it('extracts interface with generics and members', () => {
const code = `
export interface IRepository<T extends { id: string }> {
findById(id: string): Promise<T | null>;
save(entity: T): Promise<void>;
delete(id: string): Promise<boolean>;
}
`;
const ctx = createMockContext('src/types.ts', code, 'module-1');
const result = extractor.extract(ctx);
expect(result.symbols).toHaveLength(1);
expect(result.symbols[0]).toMatchObject({
name: 'IRepository',
kind: 'interface',
isExported: true,
});
expect(result.symbols[0].metadata.generics).toEqual(['T extends { id: string }']);
expect(result.symbols[0].metadata.members).toHaveLength(3);
});
it('extracts imports with type-only and default', () => {
const code = `
import type { User } from './types';
import React, { useState, useEffect } from 'react';
import { formatDate } from './utils';
`;
const ctx = createMockContext('src/component.tsx', code, 'module-1');
const result = extractor.extract(ctx);
const imports = result.symbols.filter(s => s.kind === 'import');
expect(imports).toHaveLength(3);
expect(imports[0].metadata.isTypeOnly).toBe(true);
expect(imports[0].metadata.source).toBe('./types');
expect(imports[1].metadata.isDefault).toBe(true);
expect(imports[1].metadata.source).toBe('react');
expect(imports[1].metadata.specifiers).toContain('useState');
});
it('handles parse errors gracefully', () => {
const code = `
export function broken(
// Missing closing paren and body
`;
const ctx = createMockContext('src/broken.ts', code, 'module-1');
const result = extractor.extract(ctx);
expect(result.parseErrors.length).toBeGreaterThan(0);
// Should still return partial results if any
expect(result.symbols).toBeDefined();
});
});
});

// tests/integration/full-parse.test.ts
import { describe, it, expect, beforeAll, afterAll } from 'vitest';
import { MemoryStore } from '../../src/storage/memory/store';
import { Indexer } from '../../src/engine/indexer';
import { createQuery } from '../../src/query/builder';
import path from 'path';

describe('Full Parse Integration', () => {
  let store: MemoryStore;
  let indexer: Indexer;
  const fixturePath = path.join(__dirname, '../fixtures/ts-monorepo');

  beforeAll(async () => {
    store = new MemoryStore();
    await store.initialize();
    await store.migrate();
    indexer = new Indexer(store, fixturePath);
    await indexer.fullIndex();
  });

  afterAll(async () => {
    await store.close();
  });

  it('indexes all modules in the monorepo', async () => {
    const modules = await createQuery(store, fixturePath).modules().toArray();
    expect(modules.length).toBeGreaterThanOrEqual(3); // apps/web, apps/api, packages/shared
  });

  it('extracts all TypeScript symbols', async () => {
    const symbols = await createQuery(store, fixturePath)
      .symbols()
      .eq('kind', 'function')
      .toArray();
    expect(symbols.length).toBeGreaterThan(10);
  });

  it('resolves cross-module imports', async () => {
    // Find a symbol in packages/shared that's imported by apps/web
    const sharedExports = await createQuery(store, fixturePath)
      .modules()
      .eq('path', 'packages/shared')
      .files()
      .symbols()
      .eq('is_exported', true)
      .toArray();
    expect(sharedExports.length).toBeGreaterThan(0);
    // Check that at least one has an imports edge from apps/web
    const importEdges = await store.getEdgesTo(sharedExports[0].id, 'imports');
    // At least the file-level import should exist
    expect(importEdges.length).toBeGreaterThan(0);
  });
  it('chainable query: modules → files → symbols → filters', async () => {
    const result = await createQuery(store, fixturePath)
      .modules()
      .eq('language', 'typescript')
      .files()
      .eq('ext', '.ts')
      .symbols()
      .eq('kind', 'class')
      .eq('is_exported', true)
      .toArray();
    expect(result.length).toBeGreaterThan(0);
    for (const sym of result) {
      expect((sym as any).kind).toBe('class');
      expect((sym as any).is_exported).toBe(true);
    }
  });

  it('graph traversal: find all callers of an exported function', async () => {
    const targetFunc = await createQuery(store, fixturePath)
      .symbol('formatDate')
      .eq('kind', 'function')
      .toArray();
    if (targetFunc.length === 0) return; // Skip if fixture doesn't have this
    const callers = await createQuery(store, fixturePath)
      .symbol('formatDate')
      .callers()
      .toArray();
    // Should find at least one caller
    expect(callers.length).toBeGreaterThan(0);
  });

  it('formats query result as markdown', async () => {
    const md = await createQuery(store, fixturePath)
      .modules()
      .limit(3)
      .toMarkdown();
    expect(md).toContain('#');
    expect(md).toContain('packages/shared'); // Based on fixture
  });
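  // Aside (illustrative sketch only): the real `createQuery` builder resolves
  // against the store asynchronously, but its chaining mechanics — each call
  // narrows or switches the working set and returns the builder, and
  // `toArray()` materializes — can be pictured with a synchronous in-memory
  // stand-in. Everything below (MiniQuery, the `parent` linkage) is an
  // assumption for illustration, not the shipped implementation.
  type Row = Record<string, unknown>;

  class MiniQuery {
    constructor(private rows: Row[], private db: Record<string, Row[]>) {}

    // Keep only children (matched via `parent`) of the rows selected so far.
    private descend(table: string): this {
      const ids = new Set(this.rows.map(r => r.id));
      this.rows = (this.db[table] ?? []).filter(r => ids.has(r.parent));
      return this;
    }

    modules(): this { this.rows = this.db.modules ?? []; return this; }
    files(): this { return this.descend('files'); }
    symbols(): this { return this.descend('symbols'); }

    // Equality filter on the current working set.
    eq(field: string, value: unknown): this {
      this.rows = this.rows.filter(r => r[field] === value);
      return this;
    }

    limit(n: number): this { this.rows = this.rows.slice(0, n); return this; }
    toArray(): Row[] { return this.rows; }
  }
  // Usage mirrors the test above:
  //   new MiniQuery([], db).modules().files().eq('ext', '.ts').symbols().toArray()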
});

// src/types/config.ts
export interface TokenZipConfig {
  // Project-level config (.tokenzip/config.json)
  version: string;
  storage: {
    engine: 'surrealdb' | 'sqlite' | 'auto';
    path: string; // relative to project root, default: .tokenzip/db
    surrealdb?: {
      binary_path?: string; // custom surrealdb binary
      memory?: boolean; // use memory backend instead of RocksDB
    };
  };
  languages: {
    enabled: string[]; // ['typescript', 'javascript', 'python', 'sql', 'markdown']
    disabled: string[];
    custom: Record<string, {
      extensions: string[];
      grammar_path?: string; // path to custom tree-sitter WASM
      extractor_path?: string; // path to custom extractor JS
    }>;
  };
  exclude: {
    paths: string[]; // glob patterns: ['**/node_modules/**', '**/dist/**', '**/.git/**']
    files: string[]; // exact filenames: ['package-lock.json', 'yarn.lock']
    max_file_size_kb: number; // default: 500
  };
  hooks: {
    pre_commit: 'warn' | 'block' | 'off';
    post_commit: 'on' | 'off';
    validate_on_commit: boolean; // run reference integrity checks
  };
  mcp: {
    max_tokens: number; // default: 8000
    transport: 'stdio' | 'sse';
    port: number; // for SSE, default: 3777
    include_source: boolean; // include source code in responses
    source_max_lines: number; // max lines of source per symbol, default: 50
  };
  indexing: {
    worker_threads: number; // default: os.cpus().length - 1, min 1
    batch_size: number; // files per batch, default: 100
    git_history_depth: number; // commits to index, default: 100
  };
  workflows: {
    enabled: string[]; // ['create-module', 'update-module', 'implement-feature', 'upgrade-feature', 'bug-fix']
  };
}
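// How a loader might apply a user's .tokenzip/config.json on top of
// DEFAULT_CONFIG below — a sketch only. The real loader is not part of this
// file, and `deepMerge` is a hypothetical helper shown for illustration:
// nested objects merge key-by-key, while arrays and scalars replace wholesale.
export function deepMerge<T extends Record<string, any>>(base: T, override: Partial<T>): T {
  const out: Record<string, any> = { ...base };
  for (const [key, value] of Object.entries(override)) {
    const baseVal = (base as Record<string, any>)[key];
    if (value && baseVal &&
        typeof value === 'object' && !Array.isArray(value) &&
        typeof baseVal === 'object' && !Array.isArray(baseVal)) {
      out[key] = deepMerge(baseVal, value); // recurse into nested sections
    } else {
      out[key] = value; // arrays and scalars replace wholesale
    }
  }
  return out as T;
}
// Usage sketch: deepMerge(DEFAULT_CONFIG, JSON.parse(raw)), where `raw` is
// the contents of .tokenzip/config.json.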
export const DEFAULT_CONFIG: TokenZipConfig = {
  version: '2.0.0',
  storage: {
    engine: 'auto',
    path: '.tokenzip/db',
  },
  languages: {
    enabled: ['typescript', 'javascript', 'python', 'sql', 'go', 'rust', 'java', 'kotlin', 'markdown'],
    disabled: [],
    custom: {},
  },
  exclude: {
    paths: [
      '**/node_modules/**',
'**