@devilankur18
Last active May 14, 2026 03:07
TokenZip v2 — PRD, HLD, LLD



📋 PRD — Product Requirements Document

1. Executive Summary

TokenZip v2 turns Karpathy's "llm wiki" concept into a gzip-like token-compression engine for an entire codebase, reducing LLM input token cost by up to 95% when used with coding copilots such as Claude Code and Codex. Instead of generating a flat text summary, it builds a multi-level, queryable, chainable knowledge graph — from repo → modules → files → symbols — stored locally in .tokenzip/db, exposed as an MCP server for any AI copilot, and kept fresh via git hooks.

2. Problem Statement

| Problem | Impact |
| --- | --- |
| AI copilots lack structural awareness of large codebases | They hallucinate imports, miss dependencies, suggest changes in wrong modules |
| Text-based token references are flat and non-queryable | Cannot ask "which functions depend on this interface?" or "what modules does this feature span?" |
| No persistent code intelligence layer | Every session re-parses from scratch, wasting tokens and time |
| Documentation (PRD/HLD/LLD/README) is unstructured | AI can't extract workflows, sequence diagrams, or release plans from markdown |
| Cross-language dependency tracking is manual | A SQL schema change affecting 3 TS files is invisible until runtime |
| Cross-repository dependency tracking is manual | The current repository has no awareness of dependent or upstream repositories, including shared interfaces, API contracts, endpoint usage, schema dependencies, or cross-repo integrations — making impact analysis and coordinated changes error-prone |
| Version-aware dependency conflicts are difficult to detect | AI copilots and developers lack visibility into incompatible interface versions, breaking API/schema changes, SDK mismatches, or transitive dependency drift across repositories — causing silent integration failures and upgrade risks |

POC Results

Indexing completes in under 30 seconds for a codebase with ~1,950 files.


Lookups complete in under 1 second.


3. Target Users

Primary

  • AI Copilot Users (Claude Code, Codex, OpenCode, Kilo Code) — need structured context without token waste
  • Full-stack Developers working in monorepos with 50+ modules

Secondary

  • Tech Leads auditing codebase structure and dependency health
  • Onboarding Engineers needing rapid codebase mental model

4. Product Vision

"Your codebase as a queryable graph — not a text dump. Ask structural questions, get precise answers, zero hallucination."

5. Feature Specification

5.1 Multi-Level Code Graph

Repository
  └── Module (auto-detected: package.json, pyproject.toml, go.mod, Cargo.toml, etc.)
        └── File
              └── Symbol (function, class, interface, variable, table, column, etc.)

Acceptance Criteria:

  • Auto-detect module boundaries by presence of manifest files
  • Support nested modules (monorepo: repo → apps/web → src/components)
  • Each node has a stable UUID that survives renames (content-hash + path-hash hybrid)
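One way the content-hash + path-hash hybrid could be realized is sketched below. This is an illustrative helper, not the shipped TokenZip code: the hash lengths, the `::` separator, and the `stableSymbolId` name are all assumptions. The key property is that a rename keeps the content hash while an in-place edit keeps the path hash, so either component can anchor identity.

```typescript
// Hypothetical sketch of a stable symbol ID (not the actual TokenZip implementation).
import { createHash } from "node:crypto";

function sha256(input: string): string {
  return createHash("sha256").update(input).digest("hex");
}

interface StableId {
  pathHash: string;    // survives content edits
  contentHash: string; // survives file moves/renames
  id: string;          // combined value, usable as the node record ID
}

function stableSymbolId(filePath: string, symbolName: string, body: string): StableId {
  const pathHash = sha256(`${filePath}::${symbolName}`).slice(0, 16);
  const contentHash = sha256(body).slice(0, 16);
  return { pathHash, contentHash, id: `${pathHash}-${contentHash}` };
}
```

A matcher can then treat a node with the same content hash but a new path hash as a rename rather than a delete-plus-create.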

5.2 Tree-Sitter Metadata Extraction

| Language | Extracted Artifacts |
| --- | --- |
| .js, .mjs | Functions, classes, exports, imports, global vars, JSDoc |
| .ts, .tsx | Above + interfaces, type aliases, generics, enums, decorators, namespace exports |
| .py | Functions, classes, decorators, type hints, imports, async defs |
| .sql | Tables, views, columns, constraints, indexes, foreign keys, stored procedures |
| .go | Functions, structs, interfaces, methods, packages, imports |
| .rs | Functions, structs, traits, impls, enums, mods, use statements |
| .java, .kt | Classes, interfaces, methods, annotations, packages |
| .md (special) | Headings, lists, code blocks, mermaid diagrams, tables, frontmatter |

Acceptance Criteria:

  • Each symbol stored as a node with: name, kind, signature, line range, hash, docstring
  • Relationships: CALLS, IMPLEMENTS, INHERITS, IMPORTS, EXPORTS, MODIFIES, READS
  • Incremental parse: only re-parse files whose content hash changed
  • Parse errors stored as node metadata (not silently dropped)
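The incremental-parse criterion above can be sketched as a hash gate: only files whose SHA-256 content hash differs from the stored hash get re-parsed. The `HashStore` shape and `filesToReparse` name are illustrative assumptions, not the real module.

```typescript
// Hedged sketch of the incremental-parse gate from section 5.2.
import { createHash } from "node:crypto";

type HashStore = Map<string, string>; // path -> last stored content hash

function filesToReparse(
  store: HashStore,
  files: { path: string; content: string }[],
): string[] {
  const changed: string[] = [];
  for (const f of files) {
    const hash = createHash("sha256").update(f.content).digest("hex");
    if (store.get(f.path) !== hash) {
      changed.push(f.path);       // schedule for re-parse
      store.set(f.path, hash);    // record the new hash
    }
  }
  return changed;
}
```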

5.3 Documentation Intelligence

For structured markdown files (.prd.md, .hld.md, .lld.md, README.md, CHANGELOG.md, ADR/*.md):

| Section Type | Extracted Structure |
| --- | --- |
| ## Workflow / ## Flow | Ordered step graph with actors and actions |
| ## Sequence Diagram | Parsed mermaid sequenceDiagram into actor→message→actor edges |
| ## Flowchart | Parsed mermaid flowchart into decision/action node graph |
| ## Release Plan | Timeline with milestones, versions, dates |
| ## API | Endpoint → method → params → response schema |
| ## Architecture / ## Components | Component hierarchy with responsibility and tech stack |
| ## Decision (ADR) | Context → Decision → Consequences as structured tuple |
| Standard lists | Typed list items (checkbox, numbered, bullet) with nesting |
| Tables | Columnar data as records |

Acceptance Criteria:

  • Mermaid blocks parsed into graph nodes, not stored as raw text
  • Section-level linking: a workflow step can reference a function symbol node
  • Cross-reference resolution: [see ModuleX] in PRD links to Module node in graph
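To make the "mermaid blocks parsed into graph nodes" criterion concrete, here is a minimal sketch of turning sequenceDiagram message lines into actor→message→actor edges. It is a toy regex pass, not the shipped parser (which would likely use the unified/remark pipeline from the tech stack), and the `MessageEdge` shape is an assumption.

```typescript
// Illustrative sketch: mermaid sequenceDiagram messages -> edges.
interface MessageEdge {
  from: string;
  to: string;
  message: string;
}

function parseSequenceDiagram(source: string): MessageEdge[] {
  const edges: MessageEdge[] = [];
  // Matches message lines like "A->>B: text", "A-->>B: text", "A->B: text"
  const msg = /^\s*(\w+)\s*--?>>?\s*(\w+)\s*:\s*(.+)$/;
  for (const line of source.split("\n")) {
    const m = msg.exec(line);
    if (m) edges.push({ from: m[1], to: m[2], message: m[3].trim() });
  }
  return edges;
}
```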

5.4 Chainable Query API

// Level 1: Repository
const repo = tz.repo('.');

// Level 2: Modules (filterable, chainable)
const feModules = repo.modules().filter(m => m.language === 'typescript');

// Level 3: Files within modules
const tsFiles = feModules.files().filter(f => f.ext === '.tsx');

// Level 4: Symbols within files
const exportedComponents = tsFiles.symbols()
  .filter(s => s.kind === 'class' && s.isExported && s.extends('React.Component'));

// Cross-cutting queries
const dependants = tz.repo('.').symbol('UserService.authenticate')
  .dependants()                    // who calls this?
  .withinModule('api-gateway')     // scope it
  .withKind('function');           // filter

const impact = tz.repo('.').table('users')
  .columns()                       // what columns
  .referencedBy()                  // where are they referenced
  .files();                        // which files

const workflow = tz.repo('.').doc('prd.md')
  .section('Workflow: User Onboarding')
  .steps()                         // ordered steps
  .linkedSymbols();                 // what code implements each step

Acceptance Criteria:

  • Every level returns a query builder, not raw data (lazy evaluation)
  • .toArray(), .toGraph(), .toMarkdown(), .toJSON() terminal methods
  • Queries translate to SurrealDB graph traversal queries
  • Response < 100ms for repos up to 100K files
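The lazy-evaluation contract above can be sketched in a few lines: each chainable call only records an operation, and nothing executes until a terminal method such as `.toArray()` or `.count()`. This `Scope` class is a minimal illustration over an in-memory source, not the real CQB, which would compile the recorded operations into SurrealDB traversal queries instead.

```typescript
// Minimal sketch of the lazy query-builder contract in section 5.4.
class Scope<T> {
  private ops: ((rows: T[]) => T[])[] = [];
  constructor(private source: () => T[]) {}

  filter(pred: (row: T) => boolean): this {
    this.ops.push(rows => rows.filter(pred)); // recorded, not executed
    return this;
  }

  limit(n: number): this {
    this.ops.push(rows => rows.slice(0, n));
    return this;
  }

  // Terminal methods: the query runs here, not before.
  toArray(): T[] {
    return this.ops.reduce((rows, op) => op(rows), this.source());
  }

  count(): number {
    return this.toArray().length;
  }
}
```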

5.5 Graph Database Storage

  • Engine: SurrealDB (embedded via RocksDB storage)
  • Location: <project_root>/.tokenzip/db/
  • Schema: Schemaful (strict types per node kind)
  • Persistence: WAL-enabled, crash-safe

Acceptance Criteria:

  • .tokenzip/ added to .gitignore automatically
  • DB size < 10% of source code size for typical repos
  • Cold start (first full parse) completes at > 500 files/second
  • Hot start (incremental) completes at > 2000 files/second

5.6 Git Hook Integration

# Installed via: tokenzip init
# Creates .git/hooks/pre-commit and .git/hooks/post-commit

pre-commit:
  1. Detect staged files (git diff --cached --name-only)
  2. Parse changed files with tree-sitter
  3. Diff new AST against stored graph
  4. Validate: no broken exports, no orphan imports
  5. Update graph with new symbol nodes/edges
  6. If validation fails: warn (configurable: warn/block)

post-commit:
  1. Store commit metadata (hash, message, author, timestamp)
  2. Create COMMIT → MODIFIED → FILE edges
  3. Update file-level git history nodes

Acceptance Criteria:

  • Hook installation is non-destructive (appends to existing hooks)
  • Hook execution adds < 500ms to commit time for typical changes (< 10 files)
  • tokenzip init --no-hooks flag for CI environments
  • tokenzip status shows graph health (stale files, broken references)
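The "non-destructive (appends to existing hooks)" criterion might be satisfied by a marker-based merge like the sketch below. The marker text, hook command, and `mergeHook` helper are assumptions for illustration, not the shipped installer; the point is that an existing hook body is preserved and re-running the install is idempotent.

```typescript
// Hedged sketch of non-destructive git hook installation (section 5.6).
const MARKER = "# >>> tokenzip hook >>>";          // assumed marker
const HOOK_LINE = "npx tokenzip hook pre-commit";  // assumed command

function mergeHook(existing: string | null): string {
  if (existing === null) {
    // No hook yet: create one from scratch.
    return `#!/bin/sh\n${MARKER}\n${HOOK_LINE}\n`;
  }
  if (existing.includes(MARKER)) return existing;  // idempotent: already installed
  // Append behind the marker, preserving the user's existing hook body.
  return `${existing.trimEnd()}\n${MARKER}\n${HOOK_LINE}\n`;
}
```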

5.7 MCP Server

// Exposed to any MCP-compatible client
{
  "tools": [
    "query_repo_structure",
    "query_module", 
    "query_file",
    "query_symbol",
    "get_dependencies",
    "get_dependants",
    "search_symbols",
    "get_git_history",
    "get_workflow",
    "get_impact_analysis",
    "execute_workflow_template"
  ],
  "resources": [
    "tokenzip://repo/structure",
    "tokenzip://module/{name}/overview",
    "tokenzip://file/{path}/symbols",
    "tokenzip://symbol/{id}/detail"
  ]
}

Acceptance Criteria:

  • MCP server starts in < 200ms
  • All tools return structured JSON (never raw text dumps)
  • Token budget aware: responses include token_count metadata
  • Works with Claude Code, Codex, OpenCode, Kilo Code without config changes
  • Concurrent tool calls supported (SurrealDB connection pooling)
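The "token budget aware" criterion could be implemented as a response wrapper like the one below. The 4-characters-per-token estimate and the tail-dropping strategy are illustrative assumptions (a real server would use an actual tokenizer and the symbols > files > modules priority order from the HLD).

```typescript
// Sketch of token_count metadata on MCP responses (section 5.7).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4); // rough heuristic, not a real tokenizer
}

function budgetedResponse<T>(items: T[], maxTokens: number) {
  let kept: T[] = items;
  // Drop lowest-priority tail items until the payload fits the budget.
  while (kept.length > 0 && estimateTokens(JSON.stringify(kept)) > maxTokens) {
    kept = kept.slice(0, kept.length - 1);
  }
  return {
    items: kept,
    truncated: kept.length < items.length,
    token_count: estimateTokens(JSON.stringify(kept)),
  };
}
```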

5.8 Workflow Templates

| Workflow | Input | Output | Graph Operations |
| --- | --- | --- | --- |
| Create Module | module name, type, dependencies | Scaffolded structure + graph nodes | CREATE module, CREATE files, CREATE IMPORTS edges |
| Update Module | module name, change description | Affected files + symbols list | READ dependants, READ dependencies, DIFF graph |
| Implement Feature | feature description, target module | Files to create/modify, symbol gaps | SEARCH related symbols, PATH analysis, IMPACT query |
| Upgrade Feature | feature name, upgrade description | Migration plan + affected modules | SUBGRAPH extraction, DEPENDENCY chain analysis |
| Bug Fix | error message / stack trace | Root cause candidates + impact radius | TRACE call chain, FIND modified symbols in git blame range |

Acceptance Criteria:

  • Each workflow is a deterministic graph query sequence, not LLM-generated
  • Workflows return structured data that an LLM can act on (not final answers)
  • Workflow results are cached and timestamped in the graph
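One possible shape for the "deterministic graph query sequence" requirement above — an assumption, not the spec — is a template expressed as data: an ordered list of named query steps, each feeding a shared context. Running it twice over the same graph then yields the same structured result and the same trace.

```typescript
// Hedged sketch of a workflow template as an ordered query sequence (section 5.8).
interface WorkflowStep {
  name: string;
  run: (ctx: Record<string, unknown>) => Record<string, unknown>;
}

function runWorkflow(steps: WorkflowStep[], input: Record<string, unknown>) {
  const trace: string[] = [];
  let ctx = { ...input };
  for (const step of steps) {
    ctx = { ...ctx, ...step.run(ctx) }; // each step enriches the shared context
    trace.push(step.name);
  }
  return { result: ctx, trace }; // structured data for the LLM to act on
}
```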

6. Non-Functional Requirements

| Category | Requirement |
| --- | --- |
| Performance | Full index of 100K-file repo < 3 minutes; incremental update < 2 seconds |
| Memory | MCP server idle < 50MB; parsing peak < 500MB |
| Reliability | Never corrupt the graph on crash; WAL recovery on restart |
| Compatibility | Node.js 20+, macOS 12+, Ubuntu 22.04+, Windows WSL2 |
| Security | No network calls; all data local; no code execution from graph |
| Extensibility | New language support via plugin (tree-sitter grammar + extractor config) |

7. Success Metrics

| Metric | Target |
| --- | --- |
| Copilot context accuracy (relevant vs irrelevant tokens) | > 85% (vs ~40% with text dump) |
| Time to first useful query after tokenzip init | < 5 minutes for 50K-file repo |
| Hook overhead per commit | < 500ms |
| MCP tool call latency (p95) | < 200ms |
| Graph size efficiency | < 10% of source size |

8. Out of Scope (v2)

  • Remote graph synchronization (multi-developer shared graph)
  • LLM-powered code generation (this is a context layer, not a code writer)
  • Runtime analysis (only static analysis via tree-sitter)
  • Binary file parsing (images, compiled artifacts)
  • IDE plugin (VS Code extension is v3)

9. Release Phases

| Phase | Scope | Timeline |
| --- | --- | --- |
| Alpha | Core graph + JS/TS parsing + MCP server + basic queries | Weeks 1-3 |
| Beta | All languages + git hooks + documentation intelligence | Weeks 4-6 |
| RC | Workflow templates + chainable API polish + perf tuning | Weeks 7-8 |
| GA | Stability hardening + plugin system + docs | Weeks 9-10 |

🏗️ HLD — High-Level Design

1. Architecture Overview

TokenZip v2 is a local-first, static-analysis graph engine with four layers:

┌─────────────────────────────────────────────────────────────────┐
│                    LAYER 4: INTEGRATION                         │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌───────────────┐  │
│  │ Claude   │  │ Codex    │  │ OpenCode │  │ Kilo Code     │  │
│  │ Code     │  │          │  │          │  │               │  │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └──────┬────────┘  │
│       │              │              │                │           │
│       └──────────────┴──────┬───────┴────────────────┘           │
│                             │ MCP Protocol (stdio/SSE)          │
├─────────────────────────────┼───────────────────────────────────┤
│                    LAYER 3: API & QUERY                         │
│  ┌──────────────────────────┴──────────────────────────────┐   │
│  │                    MCP Server                            │   │
│  │  ┌─────────────────┐  ┌──────────────────────────────┐  │   │
│  │  │  Tool Registry  │  │  Resource Registry            │  │   │
│  │  └────────┬────────┘  └──────────────┬───────────────┘  │   │
│  │           └──────────┬───────────────┘                  │   │
│  │              ┌───────┴────────┐                         │   │
│  │              │ Chainable Query│                         │   │
│  │              │ Builder (CQB)  │                         │   │
│  │              └───────┬────────┘                         │   │
│  └──────────────────────┼──────────────────────────────────┘   │
├──────────────────────────┼──────────────────────────────────────┤
│                    LAYER 2: ENGINE                              │
│  ┌───────────────────────┼──────────────────────────────────┐  │
│  │  ┌────────────┐  ┌────┴─────┐  ┌──────────┐  ┌───────┐  │  │
│  │  │ Tree-Sitter│  │ Markdown │  │ Workflow │  │ Graph │  │  │
│  │  │ Extractor  │  │ Parser   │  │ Engine   │  │ Query │  │  │
│  │  │ (per lang) │  │ (struct) │  │ (tpl)    │  │ Planner│  │  │
│  │  └─────┬──────┘  └────┬─────┘  └────┬─────┘  └───┬───┘  │  │
│  │        └──────────────┼──────────────┼────────────┘      │  │
│  │              ┌───────┴──────────────┴───────┐            │  │
│  │              │     Graph Mutation Engine     │            │  │
│  │              │  (diff, merge, validate)      │            │  │
│  │              └───────────────┬───────────────┘            │  │
│  └──────────────────────────────┼────────────────────────────┘  │
├──────────────────────────────┼─────────────────────────────────┤
│                    LAYER 1: STORAGE                            │
│  ┌───────────────────────────┼─────────────────────────────┐  │
│  │              ┌────────────┴────────────┐                 │  │
│  │              │   Storage Abstraction   │                 │  │
│  │              │   (IStore interface)    │                 │  │
│  │              └────────────┬────────────┘                 │  │
│  │        ┌──────────────────┼──────────────────┐           │  │
│  │  ┌─────┴──────┐    ┌─────┴──────┐    ┌─────┴──────┐     │  │
│  │  │ SurrealDB  │    │  SQLite    │    │  In-Memory │     │  │
│  │  │ (primary)  │    │ (fallback) │    │  (tests)   │     │  │
│  │  └────────────┘    └────────────┘    └────────────┘     │  │
│  └──────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                    SIDE CHANNELS                                │
│  ┌──────────────┐  ┌───────────────┐  ┌────────────────────┐  │
│  │ Git Hooks    │  │ File Watcher  │  │ CLI (tokenzip)     │  │
│  │ pre-commit   │  │ (optional)    │  │ init, parse, query │  │
│  │ post-commit  │  │ chokidar      │  │ status, serve      │  │
│  └──────────────┘  └───────────────┘  └────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

2. Component Design

2.1 Tree-Sitter Extractor

                    ┌─────────────────────┐
                    │  File Input Stream  │
                    └──────────┬──────────┘
                               │
                    ┌──────────┴──────────┐
                    │  Language Detector  │
                    │  (extension + shebang│
                    │   + .editorconfig)  │
                    └──────────┬──────────┘
                               │
              ┌────────────────┼────────────────┐
              │                │                │
     ┌────────┴──────┐ ┌──────┴──────┐ ┌──────┴──────┐
     │ Code Extractor│ │ SQL Extract.│ │ MD Extractor│
     │ (JS/TS/Py/Go │ │ (Tables,    │ │ (Sections,  │
     │  /Rs/Java/Kt) │ │  Columns,   │ │  Mermaid,   │
     │               │ │  FKs, SPs)  │ │  Lists,     │
     │               │ │             │ │  Tables)    │
     └───────┬───────┘ └──────┬──────┘ └──────┬──────┘
             │                │                │
             └────────────────┼────────────────┘
                              │
                    ┌─────────┴──────────┐
                    │  Symbol Graph      │
                    │  (nodes + edges)   │
                    └────────────────────┘

Key Design Decision: Extractors produce an intermediate representation (IR) — a flat list of SymbolNode and SymbolEdge objects — regardless of source language. This decouples parsing from storage.
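The IR described above might look like the following TypeScript types. Field names follow the graph schema in the HLD's section 3 but are illustrative here, not the actual source: the point is that every extractor, regardless of language, emits the same flat node/edge lists.

```typescript
// Hedged sketch of the extractor intermediate representation (IR).
interface SymbolNode {
  id: string;
  fileId: string;
  name: string;
  kind: string;          // "function" | "class" | "table" | "section" | ...
  startLine: number;
  endLine: number;
  metadata: Record<string, unknown>; // language-specific extras
}

interface SymbolEdge {
  from: string;          // SymbolNode.id
  to: string;
  type: string;          // "calls" | "imports" | "inherits" | ...
  metadata: Record<string, unknown>;
}

interface ExtractorOutput {
  nodes: SymbolNode[];
  edges: SymbolEdge[];
}
```

Because the IR is storage-agnostic, the same output can be written by the SurrealDB store, the SQLite fallback, or the in-memory test store.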

2.2 Chainable Query Builder (CQB)

QueryBuilder
  ├── .repo(path)          → RepoScope
  │     ├── .modules()     → ModuleScope
  │     │     ├── .files() → FileScope
  │     │     │     ├── .symbols() → SymbolScope
  │     │     │     ├── .tables()  → TableScope
  │     │     │     └── .sections()→ SectionScope
  │     │     ├── .dependencies()  → ModuleScope (external deps)
  │     │     └── .dependants()    → ModuleScope
  │     ├── .files()       → FileScope (all files, no module filter)
  │     ├── .symbols()     → SymbolScope (global search)
  │     ├── .tables()      → TableScope
  │     └── .docs()        → DocScope
  ├── .symbol(name)        → SymbolScope (direct lookup)
  ├── .table(name)         → TableScope
  ├── .commit(hash)        → CommitScope
  └── .workflow(name)      → WorkflowScope

Every Scope has:
  ├── .filter(predicate)   → same Scope (adds WHERE clause)
  ├── .sort(field, dir)    → same Scope
  ├── .limit(n)            → same Scope
  ├── .offset(n)           → same Scope
  └── Terminal methods:
        ├── .toArray()     → SymbolNode[]
        ├── .toGraph()     → { nodes: [], edges: [] }
        ├── .toMarkdown()  → string
        ├── .toJSON()      → string
        ├── .count()       → number
        └── .exists()      → boolean

2.3 MCP Server Architecture

┌─────────────────────────────────────────────┐
│              MCP Server                      │
│                                              │
│  ┌─────────────────────────────────────┐    │
│  │         Transport Layer              │    │
│  │  ┌──────────┐    ┌───────────────┐  │    │
│  │  │  stdio   │    │  SSE/HTTP     │  │    │
│  │  │ (default)│    │ (optional)    │  │    │
│  │  └────┬─────┘    └──────┬────────┘  │    │
│  └───────┼──────────────────┼───────────┘    │
│          └──────────┬───────┘                │
│              ┌─────┴──────┐                  │
│              │  Protocol  │                  │
│              │  Handler   │                  │
│              └─────┬──────┘                  │
│                    │                         │
│  ┌─────────────────┼─────────────────────┐  │
│  │            Tool Dispatcher            │  │
│  │  ┌──────────┐ ┌──────────┐ ┌────────┐ │  │
│  │  │ Structure│ │ Search   │ │ Impact │ │  │
│  │  │ Tools    │ │ Tools    │ │ Tools  │ │  │
│  │  └────┬─────┘ └────┬─────┘ └───┬────┘ │  │
│  │       └─────────────┼───────────┘      │  │
│  │              ┌──────┴──────┐           │  │
│  │              │    CQB      │           │  │
│  │              │  (shared)   │           │  │
│  │              └──────┬──────┘           │  │
│  └─────────────────────┼──────────────────┘  │
│                        │                     │
│  ┌─────────────────────┼──────────────────┐  │
│  │          Token Budget Manager          │  │
│  │  - Estimates response token count      │  │
│  │  - Truncates if over budget            │  │
│  │  - Prioritizes: symbols > files > mods │  │
│  └─────────────────────────────────────────┘  │
└─────────────────────────────────────────────┘

2.4 Git Hook Pipeline

pre-commit trigger
       │
       ▼
┌──────────────────┐
│ git diff --cached │
│ --name-only       │
└───────┬──────────┘
        │ staged file paths
        ▼
┌──────────────────┐
│ Content Hash     │  ← SHA256 of file content
│ Check            │  ← Compare with stored hash
└───────┬──────────┘
        │ changed files only
        ▼
┌──────────────────┐
│ Tree-Sitter      │  ← Parallel parse (worker threads)
│ Batch Parse      │
└───────┬──────────┘
        │ new symbol IR
        ▼
┌──────────────────┐
│ Graph Diff       │  ← Old symbols vs new symbols
│ & Merge          │  ← Update nodes, edges, hashes
└───────┬──────────┘
        │
        ▼
┌──────────────────┐
│ Validation       │  ← Check: broken exports, orphan imports,
│ (optional)       │     missing type references
└───────┬──────────┘
        │
   ┌────┴────┐
   │         │
   ▼         ▼
PASS      FAIL
   │         │
   ▼         ▼
Continue   Warn/Block
Commit     (configurable)

3. Data Model (Graph Schema)

3.1 Node Types

┌─────────────────────────────────────────────────────────────────┐
│ NODE: repository                                                 │
│   id:        string (record ID)                                  │
│   name:      string                                              │
│   root:      string (absolute path)                              │
│   created_at: datetime                                           │
│   updated_at: datetime                                           │
│   stats:     { files: number, modules: number, symbols: number } │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ NODE: module                                                     │
│   id:            string                                          │
│   name:          string                                          │
│   path:          string (relative to repo root)                  │
│   manifest_type: string (package.json | pyproject.toml | ...)    │
│   language:      string (primary language)                       │
│   is_root:       bool                                            │
│   metadata:      { name, version, description, ... }             │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ NODE: file                                                       │
│   id:          string                                            │
│   path:        string (relative to repo root)                    │
│   module_id:   string (reference to module)                      │
│   language:    string                                            │
│   ext:         string                                            │
│   size_bytes:  number                                            │
│   content_hash: string (SHA256)                                  │
│   line_count:  number                                            │
│   parse_status: string (parsed | partial | failed | skipped)     │
│   parse_error:  option<string>                                   │
│   last_parsed: datetime                                          │
│   git_last_modified: option<datetime>                            │
│   git_blame_summary: option<{ author, date, commit_count }>      │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ NODE: symbol (polymorphic by kind)                               │
│   id:            string                                          │
│   file_id:       string                                          │
│   name:          string                                          │
│   kind:          enum {                                          │
│     function, method, constructor,                               │
│     class, interface, type_alias, enum,                          │
│     variable, constant, property,                                │
│     parameter, generic_param,                                    │
│     decorator, annotation,                                       │
│     table, view, column, index, constraint,                      │
│     foreign_key, stored_procedure,                               │
│     import, export, re_export,                                   │
│     namespace, module_decl,                                      │
│     section, subsection,                                         │
│     workflow_step, diagram_node,                                 │
│     list_item, table_row                                         │
│   }                                                             │
│   signature:     option<string>  (full signature text)           │
│   return_type:   option<string>                                  │
│   start_line:    number                                          │
│   end_line:      number                                          │
│   start_col:     number                                          │
│   end_col:       number                                          │
│   docstring:     option<string>                                  │
│   is_exported:   bool                                            │
│   is_async:      option<bool>                                    │
│   is_static:     option<bool>                                    │
│   visibility:    option<enum { public, private, protected }>     │
│   modifiers:     array<string>                                   │
│   parent_symbol_id: option<string> (for nested symbols)          │
│   metadata:      object (language-specific extras)               │
│     // For tables: { schema, engine, columns: [...] }           │
│     // For classes: { implements: [...], extends: ... }         │
│     // For functions: { params: [...], generics: [...] }        │
│     // For sections: { level, anchor_id }                       │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ NODE: commit                                                     │
│   id:        string                                             │
│   hash:      string (full SHA)                                  │
│   short_hash: string (7 char)                                   │
│   message:   string                                             │
│   author:    string                                             │
│   email:     string                                             │
│   date:      datetime                                           │
│   branch:    string                                             │
│   tags:      array<string>                                      │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ NODE: dependency (external)                                      │
│   id:          string                                            │
│   module_id:   string (which module depends on it)               │
│   name:        string (npm package name, pip package, etc.)      │
│   version:     string (resolved version)                         │
│   dev:         bool                                              │
│   source:      string (npm, pip, cargo, go modules, maven)       │
└─────────────────────────────────────────────────────────────────┘

3.2 Edge Types

EDGE: contains
  FROM: repository  → TO: module
  FROM: module      → TO: file
  FROM: file        → TO: symbol
  FROM: symbol      → TO: symbol (nested: class → method)

EDGE: imports
  FROM: file    → TO: file       (file-level import)
  FROM: module  → TO: module     (module-level dependency)
  FROM: symbol  → TO: symbol     (symbol-level import)
  METADATA: { is_type_only: bool, is_default: bool, alias: option<string> }

EDGE: exports
  FROM: file   → TO: symbol
  FROM: symbol → TO: symbol       (re-export chain)
  METADATA: { is_default: bool, is_reexport: bool, alias: option<string> }

EDGE: calls
  FROM: symbol (function/method) → TO: symbol (function/method)
  METADATA: { line: number, is_async: bool, call_type: enum { direct, indirect, dynamic } }

EDGE: implements
  FROM: symbol (class) → TO: symbol (interface)
  METADATA: { is_partial: bool }

EDGE: inherits
  FROM: symbol (class/interface) → TO: symbol (class/interface)
  METADATA: { is_interface_inheritance: bool }

EDGE: modifies
  FROM: symbol (function) → TO: symbol (variable/table/column)

EDGE: reads
  FROM: symbol (function) → TO: symbol (variable/table/column)

EDGE: references
  FROM: symbol → TO: symbol (generic "uses" relationship)
  METADATA: { context: string }

EDGE: depends_on
  FROM: module → TO: module (transitive closure of imports)
  FROM: file   → TO: file
  METADATA: { is_transitive: bool, depth: number }

EDGE: depended_by  (computed reverse of depends_on)

EDGE: modified_in
  FROM: file   → TO: commit
  METADATA: { change_type: enum { added, modified, deleted, renamed } }

EDGE: authored_by
  FROM: file/symbol → TO: commit (latest commit touching this artifact)

EDGE: belongs_to_workflow
  FROM: symbol → TO: symbol (workflow_step)

EDGE: workflow_transition
  FROM: symbol (workflow_step) → TO: symbol (workflow_step)
  METADATA: { condition: option<string>, action: option<string> }

EDGE: diagram_edge
  FROM: symbol (diagram_node) → TO: symbol (diagram_node)
  METADATA: { label: string, style: string, type: enum { solid, dashed, dotted, bold } }

EDGE: foreign_key
  FROM: symbol (column) → TO: symbol (table)
  METADATA: { constraint_name: string, on_delete: string, on_update: string }

EDGE: column_of
  FROM: symbol (column/index/constraint) → TO: symbol (table)

3.3 Indexes

DEFINE INDEX idx_file_path      ON file   FIELDS path         UNIQUE
DEFINE INDEX idx_file_hash      ON file   FIELDS content_hash
DEFINE INDEX idx_file_module    ON file   FIELDS module_id
DEFINE INDEX idx_symbol_name    ON symbol FIELDS name
DEFINE INDEX idx_symbol_kind    ON symbol FIELDS kind
DEFINE INDEX idx_symbol_file    ON symbol FIELDS file_id
DEFINE INDEX idx_symbol_export  ON symbol FIELDS is_exported
DEFINE INDEX idx_module_path    ON module FIELDS path          UNIQUE
DEFINE INDEX idx_commit_hash    ON commit FIELDS hash          UNIQUE
DEFINE INDEX idx_dep_name       ON dependency FIELDS name, module_id

4. Technology Stack

| Component | Technology | Rationale |
| --- | --- | --- |
| Runtime | Node.js 20+ (ESM) | Universal, tree-sitter bindings available, MCP SDK native |
| Tree-Sitter | tree-sitter + language grammars | Industry standard, incremental parsing, multi-language |
| Graph DB | SurrealDB v2 (embedded/RocksDB) | Native graph queries, schemaful, embedded mode, no server |
| Fallback DB | better-sqlite3 | Zero-config fallback if SurrealDB unavailable |
| MCP | @modelcontextprotocol/sdk | Official SDK, stdio + SSE transport |
| CLI | commander | Battle-tested CLI framework |
| Git | simple-git | Promise-based git operations |
| File Watch | chokidar | Cross-platform, efficient |
| Logging | pino | Structured, fast |
| Testing | vitest + memfs | Fast, in-memory FS for unit tests |
| Bundling | tsup | ESM + CJS dual output, tree-shaking |
| Markdown | unified + remark + rehype | Pluggable markdown AST pipeline |
| Mermaid | mermaid (headless) | Parse mermaid diagrams to structured data |

5. Integration Architecture

5.1 MCP Integration Points

Claude Code / Codex / OpenCode
         │
         │  MCP Protocol (JSON-RPC 2.0 over stdio)
         │
    ┌────┴─────┐
    │ MCP      │
    │ Server   │
    └────┬─────┘
         │
    ┌────┴──────────────────────────────────┐
    │              Tool Calls               │
    │                                       │
    │  1. query_repo_structure              │
    │     → Returns module tree + stats     │
    │                                       │
    │  2. query_symbol { name, scope }      │
    │     → Symbol node + edges             │
    │                                       │
    │  3. get_impact_analysis { symbol_id } │
    │     → Dependents + transitive closure │
    │                                       │
    │  4. search_symbols { query, filters } │
    │     → Fuzzy match on name/signature   │
    │                                       │
    │  5. get_workflow { doc, section }     │
    │     → Structured workflow + links     │
    │                                       │
    │  6. get_git_history { path, limit }   │
    │     → Commit chain for file/symbol    │
    │                                       │
    │  7. execute_workflow_template {       │
    │       type, params }                  │
    │     → Structured analysis result      │
    │                                       │
    │  8. get_dependencies { module_id }    │
    │     → Internal + external deps        │
    │                                       │
    │  9. get_dependants { symbol_id }      │
    │     → Reverse dependency chain        │
    │                                       │
    │  10. get_context_for_files {          │
    │        paths, max_tokens }            │
    │      → Token-budget-aware context     │
    │                                       │
    └───────────────────────────────────────┘
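On the wire, each tool call is a plain JSON-RPC 2.0 request over stdio. For example, a `query_symbol` invocation from the copilot side might look like this (argument values illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "query_symbol",
    "arguments": { "name": "UserService", "scope": "module" }
  }
}
```

The server answers with a `result.content` array of text blocks, already trimmed to the configured token budget.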

5.2 Claude Code MCP Config (auto-generated)

{
  "mcpServers": {
    "tokenzip": {
      "command": "npx",
      "args": ["tokenzip", "serve", "--cwd", "/path/to/project"],
      "env": {}
    }
  }
}

6. Security Considerations

  • No network: All data stays local; SurrealDB binds to 127.0.0.1 only when the HTTP transport is used.
  • No code execution: Graph stores metadata only. No eval, no require from stored data.
  • Path traversal protection: All file paths resolved and canonicalized before storage.
  • Git hook safety: Hooks are read-only from git's perspective (never force-push, never amend).
  • .tokenzip/ in .gitignore: Automatically appended, never committed.
  • Token budget: MCP responses capped at configurable token limit to prevent context overflow.
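The budget cap can be approximated with the chars/4 heuristic noted for `utils/tokens.ts`; the truncation strategy below is a minimal sketch, not the shipped algorithm:

```typescript
// Rough token estimate for code/text: ~4 characters per token
// (the chars/4 heuristic used elsewhere in this design).
export function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Trim an MCP response to a token budget, cutting on line boundaries
// so partial symbols are dropped rather than half-emitted.
export function capToBudget(text: string, maxTokens: number): string {
  if (estimateTokens(text) <= maxTokens) return text;
  const out: string[] = [];
  let used = 0;
  for (const line of text.split('\n')) {
    const cost = estimateTokens(line + '\n');
    if (used + cost > maxTokens) break;
    out.push(line);
    used += cost;
  }
  return out.join('\n') + '\n… [truncated at token budget]';
}
```

Cutting on line boundaries keeps each emitted symbol summary whole, at the cost of slightly undershooting the budget.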

7. Deployment Model

Local Developer Machine
│
├── ~/.tokenzip/
│   ├── config.json          # Global config
│   ├── surrealdb/           # Shared SurrealDB binary (if not system-installed)
│   └── cache/               # Cross-project cache
│
└── <project-root>/
    ├── .tokenzip/
    │   ├── db/              # SurrealDB data directory
    │   │   ├── data.db      # RocksDB storage
    │   │   └── lock         # Process lock
    │   ├── config.json      # Project-specific config
    │   │   ├── languages: [...]
    │   │   ├── excluded: [...]
    │   │   ├── hooks: { preCommit: "warn" | "block" | "off" }
    │   │   └── mcp: { maxTokens: 8000, transport: "stdio" }
    │   └── state.json       # Parse state, last commit, version
    │
    ├── .git/
    │   └── hooks/
    │       ├── pre-commit   # Appended tokenzip hook
    │       └── post-commit  # Appended tokenzip hook
    │
    └── .gitignore           # Contains .tokenzip/
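For illustration, `state.json` might track something like the following (field names hypothetical — the actual schema is owned by the engine layer):

```json
{
  "version": "2.0.0",
  "lastFullParse": "2025-01-10T12:00:00Z",
  "lastIndexedCommit": "a1b2c3d",
  "fileHashes": { "src/index.ts": "sha256:…" }
}
```

Keeping the last indexed commit and per-file content hashes here is what lets `tokenzip parse --incremental` skip unchanged files.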

🔧 LLD — Low-Level Design

1. Module Structure

tokenzip/
├── src/
│   ├── index.ts                    # Public API entry point
│   │
│   ├── cli/                        # CLI layer
│   │   ├── index.ts                # Commander setup
│   │   ├── commands/
│   │   │   ├── init.ts             # tokenzip init
│   │   │   ├── parse.ts            # tokenzip parse [--full | --incremental]
│   │   │   ├── query.ts            # tokenzip query <cqb-expression>
│   │   │   ├── status.ts           # tokenzip status
│   │   │   ├── serve.ts            # tokenzip serve [--transport stdio|sse] [--port 3000]
│   │   │   ├── hooks.ts            # tokenzip hooks install|uninstall
│   │   │   └── clean.ts            # tokenzip clean
│   │   └── utils/
│   │       └── spinner.ts
│   │
│   ├── mcp/                        # MCP server layer
│   │   ├── server.ts               # MCP server creation & setup
│   │   ├── transport/
│   │   │   ├── stdio.ts
│   │   │   └── sse.ts
│   │   ├── tools/
│   │   │   ├── registry.ts         # Tool registration
│   │   │   ├── structure.ts        # query_repo_structure, query_module
│   │   │   ├── symbol.ts           # query_symbol, search_symbols
│   │   │   ├── dependency.ts       # get_dependencies, get_dependants
│   │   │   ├── impact.ts           # get_impact_analysis
│   │   │   ├── git.ts              # get_git_history
│   │   │   ├── workflow.ts         # get_workflow, execute_workflow_template
│   │   │   └── context.ts          # get_context_for_files
│   │   ├── resources/
│   │   │   ├── registry.ts
│   │   │   ├── repo.ts
│   │   │   ├── module.ts
│   │   │   ├── file.ts
│   │   │   └── symbol.ts
│   │   └── token-budget.ts         # Token estimation & truncation
│   │
│   ├── query/                      # Chainable Query Builder
│   │   ├── builder.ts              # Base QueryBuilder class
│   │   ├── scopes/
│   │   │   ├── repo-scope.ts
│   │   │   ├── module-scope.ts
│   │   │   ├── file-scope.ts
│   │   │   ├── symbol-scope.ts
│   │   │   ├── table-scope.ts
│   │   │   ├── commit-scope.ts
│   │   │   ├── doc-scope.ts
│   │   │   └── workflow-scope.ts
│   │   ├── filters.ts              # Filter predicate parser
│   │   ├── translators/
│   │   │   ├── surrealql.ts        # CQB → SurrealQL translation
│   │   │   └── sql.ts              # CQB → SQL translation (SQLite fallback)
│   │   └── types.ts
│   │
│   ├── engine/                     # Core engine layer
│   │   ├── indexer.ts              # Full & incremental indexing orchestrator
│   │   ├── differ.ts               # Graph diff: old symbols vs new symbols
│   │   ├── merger.ts               # Merge diff into graph
│   │   ├── validator.ts            # Reference integrity validation
│   │   ├── module-detector.ts      # Detect module boundaries
│   │   └── language-detector.ts    # Detect language from extension + content
│   │
│   ├── extractor/                  # Tree-sitter extraction layer
│   │   ├── base-extractor.ts       # Abstract extractor interface
│   │   ├── registry.ts             # Language → extractor mapping
│   │   ├── code/
│   │   │   ├── javascript.ts       # JS/JSX extractor
│   │   │   ├── typescript.ts       # TS/TSX extractor
│   │   │   ├── python.ts
│   │   │   ├── go.ts
│   │   │   ├── rust.ts
│   │   │   ├── java.ts
│   │   │   └── kotlin.ts
│   │   ├── sql/
│   │   │   └── sql.ts              # SQL extractor (tables, columns, FKs)
│   │   ├── markdown/
│   │   │   ├── markdown.ts         # Markdown structure extractor
│   │   │   ├── mermaid.ts          # Mermaid diagram parser
│   │   │   └── sections.ts         # Section type classifier
│   │   └── types.ts                # SymbolIR, EdgeIR types
│   │
│   ├── storage/                    # Storage abstraction layer
│   │   ├── interface.ts            # IStore interface
│   │   ├── surreal/
│   │   │   ├── connection.ts       # Connection pool & lifecycle
│   │   │   ├── migrations.ts       # Schema migration
│   │   │   ├── queries/
│   │   │   │   ├── nodes.ts
│   │   │   │   ├── edges.ts
│   │   │   │   ├── graph.ts
│   │   │   │   └── search.ts
│   │   │   └── store.ts            # SurrealStore implements IStore
│   │   ├── sqlite/
│   │   │   ├── schema.ts           # Table creation
│   │   │   ├── queries/
│   │   │   │   ├── nodes.ts
│   │   │   │   ├── edges.ts
│   │   │   │   └── graph.ts
│   │   │   └── store.ts            # SQLiteStore implements IStore
│   │   ├── memory/
│   │   │   └── store.ts            # MemoryStore for testing
│   │   └── factory.ts              # StoreFactory: config → IStore
│   │
│   ├── hooks/                      # Git hook layer
│   │   ├── installer.ts            # Install hooks into .git/hooks/
│   │   ├── pre-commit.ts           # Pre-commit logic
│   │   ├── post-commit.ts          # Post-commit logic
│   │   └── detector.ts             # Detect staged files
│   │
│   ├── workflows/                  # Workflow template engine
│   │   ├── engine.ts               # Workflow executor
│   │   ├── registry.ts             # Workflow template registry
│   │   └── templates/
│   │       ├── create-module.ts
│   │       ├── update-module.ts
│   │       ├── implement-feature.ts
│   │       ├── upgrade-feature.ts
│   │       └── bug-fix.ts
│   │
│   ├── utils/
│   │   ├── logger.ts
│   │   ├── hash.ts                 # Content hashing (SHA256)
│   │   ├── path.ts                 # Path resolution & normalization
│   │   ├── tokens.ts               # Token estimation (chars/4 for code)
│   │   ├── workers.ts              # Worker thread pool for parsing
│   │   └── version.ts
│   │
│   └── types/
│       ├── graph.ts                # All node & edge types
│       ├── extractor.ts            # Extractor IR types
│       ├── query.ts                # Query builder types
│       └── config.ts               # Configuration types
│
├── grammars/                       # Tree-sitter WASM grammars (bundled)
│   ├── tree-sitter-javascript.wasm
│   ├── tree-sitter-typescript.wasm
│   ├── tree-sitter-python.wasm
│   ├── tree-sitter-go.wasm
│   ├── tree-sitter-rust.wasm
│   ├── tree-sitter-java.wasm
│   ├── tree-sitter-kotlin.wasm
│   └── tree-sitter-sql.wasm
│
├── tests/
│   ├── unit/
│   │   ├── extractor/
│   │   │   ├── javascript.test.ts
│   │   │   ├── typescript.test.ts
│   │   │   ├── python.test.ts
│   │   │   ├── sql.test.ts
│   │   │   └── markdown.test.ts
│   │   ├── query/
│   │   │   └── builder.test.ts
│   │   ├── engine/
│   │   │   ├── differ.test.ts
│   │   │   ├── merger.test.ts
│   │   │   └── module-detector.test.ts
│   │   ├── storage/
│   │   │   └── memory-store.test.ts
│   │   └── hooks/
│   │       └── detector.test.ts
│   ├── integration/
│   │   ├── full-parse.test.ts
│   │   ├── incremental-parse.test.ts
│   │   ├── mcp-server.test.ts
│   │   └── git-hook.test.ts
│   ├── fixtures/
│   │   ├── js-project/
│   │   ├── ts-monorepo/
│   │   ├── python-project/
│   │   ├── sql-project/
│   │   └── mixed-project/
│   └── e2e/
│       └── claude-code.test.ts
│
├── package.json
├── tsconfig.json
├── tsup.config.ts
└── vitest.config.ts

2. Detailed Component Design

2.1 Storage Abstraction (IStore)

// src/storage/interface.ts

import type { 
  RepositoryNode, ModuleNode, FileNode, SymbolNode, 
  CommitNode, DependencyNode,
  ContainsEdge, ImportsEdge, ExportsEdge, CallsEdge,
  ImplementsEdge, InheritsEdge, ModifiesEdge, ReadsEdge,
  ReferencesEdge, DependsOnEdge, ModifiedInEdge,
  ForeignKeyEdge, ColumnOfEdge,
  // ... all edge types
} from '../types/graph';

export interface GraphNode {
  id: string;
  type: 'repository' | 'module' | 'file' | 'symbol' | 'commit' | 'dependency';
  [key: string]: unknown;
}

export interface GraphEdge {
  id: string;
  type: string;
  from: string;
  to: string;
  [key: string]: unknown;
}

export interface GraphResult {
  nodes: GraphNode[];
  edges: GraphEdge[];
}

export interface StoreStats {
  nodeCount: Record<string, number>;
  edgeCount: Record<string, number>;
  dbSizeBytes: number;
}

export interface IStore {
  // Lifecycle
  initialize(): Promise<void>;
  close(): Promise<void>;
  migrate(): Promise<void>;
  clear(): Promise<void>;
  stats(): Promise<StoreStats>;

  // Node CRUD
  createNode<T extends GraphNode>(node: T): Promise<T>;
  createNodes<T extends GraphNode>(nodes: T[]): Promise<T[]>;
  getNode<T extends GraphNode>(id: string): Promise<T | null>;
  getNodes(ids: string[]): Promise<GraphNode[]>;
  updateNode<T extends GraphNode>(id: string, patch: Partial<T>): Promise<T>;
  deleteNode(id: string): Promise<void>;
  deleteNodes(ids: string[]): Promise<void>;

  // Edge CRUD
  createEdge<T extends GraphEdge>(edge: T): Promise<T>;
  createEdges<T extends GraphEdge>(edges: T[]): Promise<T[]>;
  getEdges(from: string, type?: string): Promise<GraphEdge[]>;
  getEdgesTo(to: string, type?: string): Promise<GraphEdge[]>;
  deleteEdges(from: string, type?: string): Promise<void>;

  // Graph Queries
  query(surrealQL: string, vars?: Record<string, unknown>): Promise<unknown[]>;
  graphTraversal(
    startId: string,
    edgeTypes: string[],
    direction: 'outbound' | 'inbound' | 'both',
    depth?: number,
    filter?: string
  ): Promise<GraphResult>;

  // Bulk Operations
  batchUpsert(nodes: GraphNode[], edges: GraphEdge[]): Promise<void>;
  
  // Search
  searchNodes(
    type: string, 
    field: string, 
    query: string, 
    limit?: number
  ): Promise<GraphNode[]>;

  // Transactions
  transaction<T>(fn: (store: IStore) => Promise<T>): Promise<T>;
}
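To illustrate the shape of `graphTraversal` (the primitive behind `get_impact_analysis`), here is a minimal in-memory BFS over typed edges — a sketch of the contract, not the SurrealDB-backed implementation:

```typescript
// Minimal edge-list BFS matching the IStore.graphTraversal contract:
// start at a node, follow only the named edge types in the given
// direction, and stop after `depth` hops. Illustrative only.
interface Edge { type: string; from: string; to: string }

function traverse(
  edges: Edge[],
  startId: string,
  edgeTypes: string[],
  direction: 'outbound' | 'inbound',
  depth = Infinity,
): string[] {
  const seen = new Set([startId]);
  let frontier = [startId];
  for (let d = 0; d < depth && frontier.length > 0; d++) {
    const next: string[] = [];
    for (const e of edges) {
      if (!edgeTypes.includes(e.type)) continue;
      const [src, dst] = direction === 'outbound' ? [e.from, e.to] : [e.to, e.from];
      if (frontier.includes(src) && !seen.has(dst)) {
        seen.add(dst);
        next.push(dst);
      }
    }
    frontier = next;
  }
  seen.delete(startId);
  return [...seen]; // transitive closure, excluding the start node
}
```

An `inbound` traversal over `calls`/`imports` edges from a symbol yields its transitive dependents — exactly the set `get_impact_analysis` reports.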

2.2 Tree-Sitter Extractor Interface

// src/extractor/base-extractor.ts

import Parser from 'tree-sitter';
import type { Tree } from 'tree-sitter';
import { createHash } from 'node:crypto';
import { SymbolIR, EdgeIR } from './types';

export interface ExtractionResult {
  symbols: SymbolIR[];
  edges: EdgeIR[];
  parseErrors: ParseError[];
}

export interface ParseError {
  line: number;
  column: number;
  message: string;
}

export interface ExtractorContext {
  filePath: string;
  relativePath: string;
  content: string;
  contentHash: string;
  tree: Tree;
  language: string;
  moduleId: string;
}

export abstract class BaseExtractor {
  abstract readonly language: string;
  abstract readonly extensions: string[];

  /**
   * Extract symbols and edges from a parsed tree.
   * Called after tree-sitter has parsed the file.
   */
  abstract extract(ctx: ExtractorContext): ExtractionResult;

  /**
   * Post-process extraction results.
   * Resolve internal references, compute derived edges.
   * Default implementation does nothing; subclasses can override.
   */
  postProcess(
    symbols: SymbolIR[], 
    edges: EdgeIR[], 
    ctx: ExtractorContext
  ): { symbols: SymbolIR[]; edges: EdgeIR[] } {
    return { symbols, edges };
  }

  /**
   * Generate a stable ID for a symbol.
   * Must be deterministic for the same symbol in the same file.
   */
  generateSymbolId(
    filePath: string, 
    symbolName: string, 
    kind: string, 
    startLine: number
  ): string {
    // Format: sym:<filepath-hash>:<name>:<kind>:<line>
    const pathHash = this.hashPath(filePath);
    return `sym:${pathHash}:${symbolName}:${kind}:${startLine}`;
  }

  private hashPath(filePath: string): string {
    // First 8 chars of SHA256 of relative path
    return createHash('sha256')
      .update(filePath)
      .digest('hex')
      .slice(0, 8);
  }

  /**
   * Walk the tree-sitter AST with a visitor pattern.
   * Utility method for subclasses.
   */
  protected walk(
    node: Parser.SyntaxNode, 
    visitors: Record<string, (node: Parser.SyntaxNode) => void>
  ): void {
    const visitor = visitors[node.type];
    if (visitor) {
      visitor(node);
    }
    for (let i = 0; i < node.childCount; i++) {
      this.walk(node.child(i)!, visitors);
    }
  }

  /**
   * Extract docstring/JSDoc/comment attached to a node.
   */
  protected extractDocstring(node: Parser.SyntaxNode, content: string): string | null {
    // Look for preceding comment nodes
    const prev = node.previousNamedSibling;
    if (prev && (prev.type === 'comment' || prev.type === 'block_comment' 
        || prev.type === 'docstring' || prev.type === 'jsdoc')) {
      return content.slice(prev.startIndex, prev.endIndex).trim();
    }
    return null;
  }
}
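The ID scheme above can be exercised standalone; a sketch of the same `sym:<hash>:<name>:<kind>:<line>` format using `node:crypto`:

```typescript
import { createHash } from 'node:crypto';

// Deterministic symbol ID: the same file/name/kind/line always yields
// the same ID, which is what lets the differ match symbols across parses.
function symbolId(relPath: string, name: string, kind: string, line: number): string {
  const pathHash = createHash('sha256').update(relPath).digest('hex').slice(0, 8);
  return `sym:${pathHash}:${name}:${kind}:${line}`;
}
```

Note that a line shift changes the ID, so moved-but-unchanged symbols must be reconciled downstream (differ/merger territory).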

2.3 TypeScript Extractor (Detailed Example)

// src/extractor/code/typescript.ts

import { BaseExtractor, ExtractorContext, ExtractionResult, ParseError } from '../base-extractor';
import { SymbolIR, EdgeIR } from '../types';

export class TypeScriptExtractor extends BaseExtractor {
  language = 'typescript';
  extensions = ['.ts', '.tsx', '.mts', '.cts'];

  extract(ctx: ExtractorContext): ExtractionResult {
    const symbols: SymbolIR[] = [];
    const edges: EdgeIR[] = [];
    const parseErrors: ParseError[] = [];

    // Collect parse errors
    this.collectErrors(ctx.tree.rootNode, parseErrors, ctx.content);

    // Visit top-level and nested declarations
    this.walk(ctx.tree.rootNode, {
      // Functions
      'function_declaration': (node) => {
        const name = this.getName(node);
        if (!name) return;
        symbols.push({
          id: this.generateSymbolId(ctx.relativePath, name, 'function', node.startPosition.row + 1),
          fileId: `file:${ctx.relativePath}`,
          name,
          kind: 'function',
          signature: this.getSignature(node, ctx.content),
          returnType: this.getReturnType(node),
          startLine: node.startPosition.row + 1,
          endLine: node.endPosition.row + 1,
          startCol: node.startPosition.column,
          endCol: node.endPosition.column,
          docstring: this.extractDocstring(node, ctx.content),
          isExported: this.isExported(node),
          isAsync: this.hasModifier(node, 'async'),
          isStatic: false,
          visibility: this.getVisibility(node),
          modifiers: this.getModifiers(node),
          metadata: {
            params: this.extractParams(node, ctx.content),
            generics: this.extractGenerics(node, ctx.content),
            typeParams: this.extractTypeParams(node),
          },
        });
      },

      // Arrow functions assigned to variables
      'variable_declaration': (node) => {
        const declarator = node.childForFieldName('declarator');
        if (!declarator) return;
        const value = declarator.childForFieldName('value');
        if (!value || (value.type !== 'arrow_function' && value.type !== 'function_expression')) return;
        
        const name = this.getName(declarator);
        if (!name) return;

        const funcKind = 'function'; // arrow vs. function expression is recorded in metadata.isArrow
        symbols.push({
          id: this.generateSymbolId(ctx.relativePath, name, funcKind, node.startPosition.row + 1),
          fileId: `file:${ctx.relativePath}`,
          name,
          kind: funcKind,
          signature: this.getSignature(value, ctx.content),
          returnType: this.getReturnType(value),
          startLine: node.startPosition.row + 1,
          endLine: node.endPosition.row + 1,
          startCol: node.startPosition.column,
          endCol: node.endPosition.column,
          docstring: this.extractDocstring(node, ctx.content),
          isExported: this.isExported(node),
          isAsync: this.hasModifier(value, 'async'),
          isStatic: false,
          visibility: this.getVisibility(node),
          modifiers: this.getModifiers(node),
          metadata: {
            isArrow: value.type === 'arrow_function',
            params: this.extractParams(value, ctx.content),
            generics: this.extractGenerics(value, ctx.content),
          },
        });
      },

      // Classes
      'class_declaration': (node) => {
        const name = this.getName(node);
        if (!name) return;
        
        const heritage = this.extractHeritage(node); // extends, implements
        const symbolId = this.generateSymbolId(ctx.relativePath, name, 'class', node.startPosition.row + 1);

        symbols.push({
          id: symbolId,
          fileId: `file:${ctx.relativePath}`,
          name,
          kind: 'class',
          signature: this.getSignature(node, ctx.content),
          startLine: node.startPosition.row + 1,
          endLine: node.endPosition.row + 1,
          startCol: node.startPosition.column,
          endCol: node.endPosition.column,
          docstring: this.extractDocstring(node, ctx.content),
          isExported: this.isExported(node),
          isStatic: false,
          visibility: this.getVisibility(node),
          modifiers: this.getModifiers(node),
          metadata: {
            extends: heritage.extends,
            implements: heritage.implements,
            generics: this.extractGenerics(node, ctx.content),
          },
        });

        // Create inheritance edges
        if (heritage.extends) {
          edges.push({
            type: 'inherits',
            from: symbolId,
            to: `sym:unknown:${heritage.extends}:class:0`, // resolved later
            metadata: { is_interface_inheritance: false },
            isResolved: false,
          });
        }
        for (const impl of heritage.implements) {
          edges.push({
            type: 'implements',
            from: symbolId,
            to: `sym:unknown:${impl}:interface:0`,
            metadata: { is_partial: false },
            isResolved: false,
          });
        }
      },

      // Interfaces
      'interface_declaration': (node) => {
        const name = this.getName(node);
        if (!name) return;

        const extendsList = this.extractInterfaceExtends(node);
        const symbolId = this.generateSymbolId(ctx.relativePath, name, 'interface', node.startPosition.row + 1);

        symbols.push({
          id: symbolId,
          fileId: `file:${ctx.relativePath}`,
          name,
          kind: 'interface',
          signature: this.getSignature(node, ctx.content),
          startLine: node.startPosition.row + 1,
          endLine: node.endPosition.row + 1,
          startCol: node.startPosition.column,
          endCol: node.endPosition.column,
          docstring: this.extractDocstring(node, ctx.content),
          isExported: this.isExported(node),
          isStatic: false,
          visibility: 'public',
          modifiers: this.getModifiers(node),
          metadata: {
            extends: extendsList,
            generics: this.extractGenerics(node, ctx.content),
            members: this.extractInterfaceMembers(node, ctx.content, ctx.relativePath),
          },
        });

        for (const ext of extendsList) {
          edges.push({
            type: 'inherits',
            from: symbolId,
            to: `sym:unknown:${ext}:interface:0`,
            metadata: { is_interface_inheritance: true },
            isResolved: false,
          });
        }
      },

      // Type aliases
      'type_alias_declaration': (node) => {
        const name = this.getName(node);
        if (!name) return;
        symbols.push({
          id: this.generateSymbolId(ctx.relativePath, name, 'type_alias', node.startPosition.row + 1),
          fileId: `file:${ctx.relativePath}`,
          name,
          kind: 'type_alias',
          signature: this.getTypeAliasBody(node, ctx.content),
          startLine: node.startPosition.row + 1,
          endLine: node.endPosition.row + 1,
          startCol: node.startPosition.column,
          endCol: node.endPosition.column,
          docstring: this.extractDocstring(node, ctx.content),
          isExported: this.isExported(node),
          isStatic: false,
          visibility: 'public',
          modifiers: [],
          metadata: {
            generics: this.extractGenerics(node, ctx.content),
          },
        });
      },

      // Enums
      'enum_declaration': (node) => {
        const name = this.getName(node);
        if (!name) return;
        const members = this.extractEnumMembers(node, ctx.content);
        symbols.push({
          id: this.generateSymbolId(ctx.relativePath, name, 'enum', node.startPosition.row + 1),
          fileId: `file:${ctx.relativePath}`,
          name,
          kind: 'enum',
          startLine: node.startPosition.row + 1,
          endLine: node.endPosition.row + 1,
          startCol: node.startPosition.column,
          endCol: node.endPosition.column,
          docstring: this.extractDocstring(node, ctx.content),
          isExported: this.isExported(node),
          isStatic: false,
          visibility: 'public',
          modifiers: this.getModifiers(node),
          metadata: { members },
        });
      },

      // Imports (file-level)
      'import_statement': (node) => {
        const importInfo = this.extractImport(node, ctx.content);
        if (!importInfo) return;
        
        // Store as symbol for tracking
        symbols.push({
          id: this.generateSymbolId(ctx.relativePath, importInfo.source, 'import', node.startPosition.row + 1),
          fileId: `file:${ctx.relativePath}`,
          name: importInfo.source,
          kind: 'import',
          startLine: node.startPosition.row + 1,
          endLine: node.endPosition.row + 1,
          startCol: node.startPosition.column,
          endCol: node.endPosition.column,
          isExported: false,
          modifiers: [],
          metadata: {
            source: importInfo.source,
            specifiers: importInfo.specifiers,
            isTypeOnly: importInfo.isTypeOnly,
            isDefault: importInfo.isDefault,
          },
        });

        // Create import edge
        edges.push({
          type: 'imports',
          from: `file:${ctx.relativePath}`,
          to: `file:${this.resolveImportPath(ctx.relativePath, importInfo.source)}`,
          metadata: {
            is_type_only: importInfo.isTypeOnly,
            is_default: importInfo.isDefault,
            specifiers: importInfo.specifiers,
          },
          isResolved: false,
        });
      },

      // Export statements
      'export_statement': (node) => {
        // Handle: export { foo, bar } from './module'
        const exportInfo = this.extractReExport(node, ctx.content);
        if (exportInfo) {
          for (const spec of exportInfo.specifiers) {
            edges.push({
              type: 'exports',
              from: `file:${ctx.relativePath}`,
              to: `file:${this.resolveImportPath(ctx.relativePath, exportInfo.source)}`,
              metadata: {
                is_reexport: true,
                is_default: spec.isDefault,
                alias: spec.alias,
                name: spec.name,
              },
              isResolved: false,
            });
          }
        }
      },

      // Method definitions inside classes
      'method_definition': (node) => {
        // This is handled inside class_declaration visitor
        // We capture it there for parent_symbol_id linking
      },

      // Property definitions inside classes
      'public_field_definition': (node) => {
        // Handled inside class_declaration
      },
    });

    // Post-process: resolve parent_symbol_id for nested symbols
    // Post-process: mark exported symbols
    const processed = this.postProcess(symbols, edges, ctx);

    return {
      symbols: processed.symbols,
      edges: processed.edges,
      parseErrors,
    };
  }

  // ... helper methods (getName, getSignature, extractParams, etc.)
  // Each is ~10-20 lines using tree-sitter child navigation
}
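The extractor emits placeholder targets like `sym:unknown:<name>:interface:0` with `isResolved: false`; a later pass (validator/merger territory) rewrites them against a name index. A minimal sketch of that resolution step (assumed mechanics, not the shipped resolver):

```typescript
interface EdgeIRLite {
  type: string;
  from: string;
  to: string;
  isResolved?: boolean;
}

// Resolve "sym:unknown:<name>:<kind>:0" placeholders against an index
// of known symbols keyed by "<name>:<kind>". Edges whose target is not
// found stay unresolved (e.g. symbols from external packages).
function resolveEdges(
  edges: EdgeIRLite[],
  byNameKind: Map<string, string>, // "<name>:<kind>" → real symbol id
): EdgeIRLite[] {
  return edges.map((e) => {
    const m = /^sym:unknown:(.+):(\w+):0$/.exec(e.to);
    if (!m) return e;
    const real = byNameKind.get(`${m[1]}:${m[2]}`);
    return real ? { ...e, to: real, isResolved: true } : e;
  });
}
```

Leaving external targets unresolved (rather than dropping them) preserves the edge for later cross-repo resolution.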

2.4 SQL Extractor

// src/extractor/sql/sql.ts

export class SQLExtractor extends BaseExtractor {
  language = 'sql';
  extensions = ['.sql'];

  extract(ctx: ExtractorContext): ExtractionResult {
    const symbols: SymbolIR[] = [];
    const edges: EdgeIR[] = [];
    const parseErrors: ParseError[] = [];

    this.walk(ctx.tree.rootNode, {
      'create_table': (node) => {
        const tableName = this.getTableName(node);
        if (!tableName) return;
        
        const tableId = this.generateSymbolId(
          ctx.relativePath, tableName, 'table', node.startPosition.row + 1
        );

        // Extract columns
        const columns = this.extractColumns(node, ctx.content, ctx.relativePath, tableId);
        const constraints = this.extractConstraints(node, ctx.content, ctx.relativePath, tableId);
        const indexes = this.extractIndexes(node, ctx.content, ctx.relativePath, tableId);

        symbols.push({
          id: tableId,
          fileId: `file:${ctx.relativePath}`,
          name: tableName,
          kind: 'table',
          signature: this.getTableSignature(node, ctx.content),
          startLine: node.startPosition.row + 1,
          endLine: node.endPosition.row + 1,
          startCol: node.startPosition.column,
          endCol: node.endPosition.column,
          docstring: this.extractTableComment(node, ctx.content),
          isExported: false,
          modifiers: [],
          metadata: {
            schema: this.getSchemaName(node),
            engine: this.getEngine(node),
            columns: columns.map(c => c.name),
            columnCount: columns.length,
          },
        });

        symbols.push(...columns, ...constraints, ...indexes);

        // Create column_of edges
        for (const col of columns) {
          edges.push({ type: 'column_of', from: col.id, to: tableId });
        }
        for (const idx of indexes) {
          edges.push({ type: 'column_of', from: idx.id, to: tableId });
        }
        for (const con of constraints) {
          edges.push({ type: 'column_of', from: con.id, to: tableId });
        }

        // Extract foreign keys and create FK edges
        const fks = this.extractForeignKeys(node, ctx.content);
        for (const fk of fks) {
          const fromColId = this.generateSymbolId(
            ctx.relativePath, fk.column, 'column', 0 // approximate
          );
          const toTableId = `sym:unknown:${fk.refTable}:table:0`;
          edges.push({
            type: 'foreign_key',
            from: fromColId,
            to: toTableId,
            metadata: {
              constraint_name: fk.name,
              on_delete: fk.onDelete,
              on_update: fk.onUpdate,
              ref_column: fk.refColumn,
            },
            isResolved: false,
          });
        }
      },

      'create_view': (node) => {
        const viewName = this.getViewName(node);
        if (!viewName) return;
        symbols.push({
          id: this.generateSymbolId(ctx.relativePath, viewName, 'view', node.startPosition.row + 1),
          fileId: `file:${ctx.relativePath}`,
          name: viewName,
          kind: 'view',
          signature: this.getViewQuery(node, ctx.content),
          startLine: node.startPosition.row + 1,
          endLine: node.endPosition.row + 1,
          startCol: node.startPosition.column,
          endCol: node.endPosition.column,
          docstring: this.extractViewComment(node, ctx.content),
          isExported: false,
          modifiers: [],
          metadata: { schema: this.getSchemaName(node) },
        });
      },

      'create_procedure': (node) => {
        // Stored procedures / functions
      },
    });

    return { symbols, edges, parseErrors };
  }
}
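The `extractForeignKeys` helper above leans on the tree-sitter SQL grammar; for DDL as simple as explicit `FOREIGN KEY … REFERENCES` clauses, the same information can be pulled with a regex, which makes a useful cross-check in tests. A naive sketch (ignores quoting, schemas, and composite keys):

```typescript
interface FkInfo {
  column: string;
  refTable: string;
  refColumn: string;
}

// Pull "FOREIGN KEY (col) REFERENCES table (col)" pairs out of a
// CREATE TABLE body. A test-oracle sketch, not a full SQL parser.
function extractForeignKeysNaive(ddl: string): FkInfo[] {
  const re = /FOREIGN\s+KEY\s*\(\s*(\w+)\s*\)\s*REFERENCES\s+(\w+)\s*\(\s*(\w+)\s*\)/gi;
  const fks: FkInfo[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(ddl)) !== null) {
    fks.push({ column: m[1], refTable: m[2], refColumn: m[3] });
  }
  return fks;
}
```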

2.5 Markdown Extractor

// src/extractor/markdown/markdown.ts

import { unified } from 'unified';
import remarkParse from 'remark-parse';
import remarkGfm from 'remark-gfm';
import { visit } from 'unist-util-visit';
import type { Root, Heading, Code, List, Table, ListItem } from 'mdast';

export class MarkdownExtractor extends BaseExtractor {
  language = 'markdown';
  extensions = ['.md', '.mdx', '.markdown'];

  extract(ctx: ExtractorContext): ExtractionResult {
    const symbols: SymbolIR[] = [];
    const edges: EdgeIR[] = [];

    const tree = unified()
      .use(remarkParse)
      .use(remarkGfm)
      .parse(ctx.content) as Root;

    let currentSection: string | null = null;
    let sectionCounter = 0;
    let workflowStepCounter = 0;
    let diagramNodeCounter = 0;

    visit(tree, (node) => {
      // Headings → sections
      if (node.type === 'heading') {
        const heading = node as Heading;
        const text = this.getTextContent(heading);
        const level = heading.depth;
        const sectionId = this.generateSymbolId(
          ctx.relativePath, text, 'section', heading.position?.start.line || 0
        );

        const sectionSymbol: SymbolIR = {
          id: sectionId,
          fileId: `file:${ctx.relativePath}`,
          name: text,
          kind: 'section',
          startLine: heading.position?.start.line || 0,
          endLine: heading.position?.end.line || 0,
          startCol: heading.position?.start.column || 0,
          endCol: heading.position?.end.column || 0,
          isExported: false,
          modifiers: [],
          metadata: {
            level,
            anchor_id: this.slugify(text),
            section_type: this.classifySection(text),
          },
        };
        symbols.push(sectionSymbol);

        // Link to parent section
        if (currentSection && level > 1) {
          edges.push({
            type: 'contains',
            from: currentSection,
            to: sectionId,
          });
        }
        currentSection = sectionId;
        sectionCounter++;
      }

      // Code blocks → check for mermaid
      if (node.type === 'code') {
        const code = node as Code;
        if (code.lang === 'mermaid' && code.value) {
          const diagramResult = this.parseMermaid(code.value, ctx);
          symbols.push(...diagramResult.symbols);
          edges.push(...diagramResult.edges);

          // Link diagram to current section
          if (currentSection) {
            for (const sym of diagramResult.symbols) {
              edges.push({ type: 'contains', from: currentSection, to: sym.id });
            }
          }
        }
      }

      // Lists → structured list items
      if (node.type === 'list') {
        const list = node as List;
        this.extractListItems(list, symbols, edges, ctx, currentSection);
      }

      // Tables → structured rows
      if (node.type === 'table') {
        const table = node as Table;
        const tableResult = this.extractTable(table, ctx, currentSection);
        symbols.push(...tableResult.symbols);
        edges.push(...tableResult.edges);
      }
    });

    return { symbols, edges, parseErrors: [] };
  }

  private classifySection(heading: string): string {
    const lower = heading.toLowerCase();
    if (/workflow|flow|process|pipeline/.test(lower)) return 'workflow';
    if (/sequence\s*diagram/.test(lower)) return 'sequence_diagram';
    if (/flowchart/.test(lower)) return 'flowchart';
    if (/release\s*plan|roadmap|timeline/.test(lower)) return 'release_plan';
    if (/api|endpoint/.test(lower)) return 'api';
    if (/architecture|component|system\s*design/.test(lower)) return 'architecture';
    if (/decision|adr/.test(lower)) return 'decision';
    if (/requirement|user\s*story|acceptance/.test(lower)) return 'requirement';
    return 'general';
  }

  private parseMermaid(mermaidCode: string, ctx: ExtractorContext): 
    { symbols: SymbolIR[]; edges: EdgeIR[] } {
    
    const symbols: SymbolIR[] = [];
    const edges: EdgeIR[] = [];

    // Detect diagram type
    const typeMatch = mermaidCode.match(/^(sequenceDiagram|flowchart\s+\w+|stateDiagram|erDiagram|classDiagram|gantt)/m);
    const diagramType = typeMatch?.[1] || 'unknown';

    if (diagramType === 'sequenceDiagram') {
      return this.parseSequenceDiagram(mermaidCode, ctx);
    }
    if (diagramType.startsWith('flowchart')) {
      return this.parseFlowchart(mermaidCode, ctx);
    }
    if (diagramType === 'erDiagram') {
      return this.parseERDiagram(mermaidCode, ctx);
    }
    if (diagramType === 'classDiagram') {
      return this.parseClassDiagram(mermaidCode, ctx);
    }

    // Fallback: store as raw diagram node
    symbols.push({
      id: this.generateSymbolId(ctx.relativePath, `diagram-${Date.now()}`, 'section', 0),
      fileId: `file:${ctx.relativePath}`,
      name: `Mermaid ${diagramType}`,
      kind: 'section',
      startLine: 0,
      endLine: 0,
      startCol: 0,
      endCol: 0,
      isExported: false,
      modifiers: [],
      metadata: { diagram_type: diagramType, raw: mermaidCode },
    });

    return { symbols, edges };
  }

  private parseSequenceDiagram(code: string, ctx: ExtractorContext): 
    { symbols: SymbolIR[]; edges: EdgeIR[] } {
    // Parse:
    //   participant A as Actor A
    //   A->>B: Message
    //   B-->>A: Response
    //
    // Creates: diagram_node per participant
    // Creates: diagram_edge per message (with label, style)
    
    const symbols: SymbolIR[] = [];
    const edges: EdgeIR[] = [];
    const participants = new Map<string, string>(); // alias → full name
    const baseLine = 0; // Would need actual line from parent

    const participantRe = /^participant\s+(\w+)(?:\s+as\s+(.+))?$/gm;
    let match;
    while ((match = participantRe.exec(code)) !== null) {
      const alias = match[1];
      const fullName = match[2] || alias;
      participants.set(alias, fullName);
      
      const id = this.generateSymbolId(ctx.relativePath, alias, 'diagram_node', baseLine);
      symbols.push({
        id,
        fileId: `file:${ctx.relativePath}`,
        name: fullName,
        kind: 'diagram_node',
        startLine: baseLine,
        endLine: baseLine,
        startCol: 0,
        endCol: 0,
        isExported: false,
        modifiers: [],
        metadata: {
          diagram_type: 'sequence_diagram',
          role: 'participant',
          alias,
        },
      });
    }

    // Parse messages: A->>B: text  or  A-->>B: text
    const msgRe = /^(\w+)(->>|-->>|->|-->)\s*(\w+):\s*(.+)$/gm;
    let msgMatch;
    let msgCounter = 0;
    while ((msgMatch = msgRe.exec(code)) !== null) {
      const fromAlias = msgMatch[1];
      const arrowStyle = msgMatch[2];
      const toAlias = msgMatch[3];
      const message = msgMatch[4];

      const fromId = this.generateSymbolId(ctx.relativePath, fromAlias, 'diagram_node', baseLine);
      const toId = this.generateSymbolId(ctx.relativePath, toAlias, 'diagram_node', baseLine);

      // Register participants if not explicitly declared
      if (!participants.has(fromAlias)) {
        participants.set(fromAlias, fromAlias);
        symbols.push({
          id: fromId,
          fileId: `file:${ctx.relativePath}`,
          name: fromAlias,
          kind: 'diagram_node',
          startLine: baseLine, endLine: baseLine,
          startCol: 0, endCol: 0,
          isExported: false, modifiers: [],
          metadata: { diagram_type: 'sequence_diagram', role: 'participant', alias: fromAlias },
        });
      }
      if (!participants.has(toAlias)) {
        participants.set(toAlias, toAlias);
        symbols.push({
          id: toId,
          fileId: `file:${ctx.relativePath}`,
          name: toAlias,
          kind: 'diagram_node',
          startLine: baseLine, endLine: baseLine,
          startCol: 0, endCol: 0,
          isExported: false, modifiers: [],
          metadata: { diagram_type: 'sequence_diagram', role: 'participant', alias: toAlias },
        });
      }

      edges.push({
        type: 'diagram_edge',
        from: fromId,
        to: toId,
        metadata: {
          label: message,
          // Mermaid arrows: `->>`/`->` are solid, `-->>`/`-->` are dashed (typically responses)
          style: arrowStyle.includes('--') ? 'dashed' : 'solid',
          sequence: msgCounter++,
          is_response: arrowStyle.includes('--'),
        },
      });
    }

    return { symbols, edges };
  }

  // ... parseFlowchart, parseERDiagram, parseClassDiagram, extractListItems, extractTable
}
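The two regexes in `parseSequenceDiagram` can be exercised in isolation. This standalone sketch copies the `participantRe` and `msgRe` patterns and runs them over a small made-up diagram:

```typescript
// Patterns copied from parseSequenceDiagram; sample diagram is illustrative.
const mermaid = [
  'sequenceDiagram',
  'participant A as Api Gateway',
  'participant B as Auth Service',
  'A->>B: validate(token)',
  'B-->>A: 200 OK',
].join('\n');

const participantRe = /^participant\s+(\w+)(?:\s+as\s+(.+))?$/gm;
const msgRe = /^(\w+)(->>|-->>|->|-->)\s*(\w+):\s*(.+)$/gm;

// alias → display name
const participants = new Map<string, string>();
let m: RegExpExecArray | null;
while ((m = participantRe.exec(mermaid)) !== null) {
  participants.set(m[1], m[2] ?? m[1]);
}

// one entry per message arrow; `--` in the arrow marks a dashed (response) line
const messages: { from: string; to: string; label: string; dashed: boolean }[] = [];
while ((m = msgRe.exec(mermaid)) !== null) {
  messages.push({ from: m[1], to: m[3], label: m[4], dashed: m[2].includes('--') });
}

console.log(participants.get('A')); // "Api Gateway"
console.log(messages.length);       // 2
```

Note the alternation order in `msgRe`: `->>` and `-->` differ at their second character, so no alternative can shadow another.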

### 2.6 Chainable Query Builder — Core

// src/query/builder.ts

import { IStore } from '../storage/interface';
import { GraphNode, GraphEdge, GraphResult } from '../types/graph';
import { RepoScope } from './scopes/repo-scope';

export type SortDirection = 'asc' | 'desc';
export type TerminalFormat = 'array' | 'graph' | 'markdown' | 'json';

export interface FilterPredicate {
  field: string;
  op: 'eq' | 'neq' | 'gt' | 'gte' | 'lt' | 'lte' | 'contains' | 'matches' | 'in' | 'exists';
  value: unknown;
}

export abstract class QueryScope<T extends QueryScope<T>> {
  protected filters: FilterPredicate[] = [];
  protected sortField: string | null = null;
  protected sortDir: SortDirection = 'asc';
  protected limitCount: number | null = null;
  protected offsetCount: number = 0;

  constructor(protected store: IStore, protected repoPath: string) {}

  filter(predicate: FilterPredicate | ((item: GraphNode) => boolean)): T {
    const clone = this.clone();
    if (typeof predicate === 'function') {
      // Function filters are applied post-hoc (for in-memory operations)
      clone.filters.push({ field: '_func', op: 'eq', value: predicate } as any);
    } else {
      clone.filters.push(predicate);
    }
    return clone as T;
  }

  // Shorthand filters
  eq(field: string, value: unknown): T { return this.filter({ field, op: 'eq', value }); }
  neq(field: string, value: unknown): T { return this.filter({ field, op: 'neq', value }); }
  contains(field: string, value: string): T { return this.filter({ field, op: 'contains', value }); }
  matches(field: string, pattern: string): T { return this.filter({ field, op: 'matches', value: pattern }); }
  in(field: string, values: unknown[]): T { return this.filter({ field, op: 'in', value: values }); }

  sort(field: string, dir: SortDirection = 'asc'): T {
    const clone = this.clone();
    clone.sortField = field;
    clone.sortDir = dir;
    return clone as T;
  }

  limit(n: number): T {
    const clone = this.clone();
    clone.limitCount = n;
    return clone as T;
  }

  offset(n: number): T {
    const clone = this.clone();
    clone.offsetCount = n;
    return clone as T;
  }

  // Terminal methods
  async toArray(): Promise<GraphNode[]> {
    const result = await this.execute();
    return this.applyPostFilters(result.nodes as GraphNode[]);
  }

  async toGraph(): Promise<GraphResult> {
    const result = await this.execute();
    return {
      nodes: this.applyPostFilters(result.nodes as GraphNode[]),
      edges: result.edges as GraphEdge[],
    };
  }

  async toMarkdown(): Promise<string> {
    const nodes = await this.toArray();
    return this.formatAsMarkdown(nodes);
  }

  async toJSON(): Promise<string> {
    const result = await this.toGraph();
    return JSON.stringify(result, null, 2);
  }

  async count(): Promise<number> {
    const nodes = await this.toArray();
    return nodes.length;
  }

  async exists(): Promise<boolean> {
    const count = await this.count();
    return count > 0;
  }

  // Abstract: each scope implements its own query translation
  protected abstract execute(): Promise<{ nodes: unknown[]; edges: unknown[] }>;
  protected abstract clone(): T;
  protected abstract formatAsMarkdown(nodes: GraphNode[]): string;

  protected applyPostFilters(nodes: GraphNode[]): GraphNode[] {
    return nodes.filter(node => {
      for (const f of this.filters) {
        if (f.field === '_func') continue; // Skip function filters for DB
        const val = (node as any)[f.field];
        if (!this.evaluateFilter(val, f)) return false;
      }
      // Apply function filters
      for (const f of this.filters) {
        if (f.field === '_func') {
          if (!(f.value as Function)(node)) return false;
        }
      }
      return true;
    });
  }

  private evaluateFilter(val: unknown, f: FilterPredicate): boolean {
    switch (f.op) {
      case 'eq': return val === f.value;
      case 'neq': return val !== f.value;
      case 'contains': return typeof val === 'string' && val.includes(f.value as string);
      case 'matches': return typeof val === 'string' && new RegExp(f.value as string).test(val);
      case 'in': return Array.isArray(f.value) && f.value.includes(val);
      case 'exists': return val !== null && val !== undefined;
      case 'gt': return typeof val === 'number' && val > (f.value as number);
      case 'gte': return typeof val === 'number' && val >= (f.value as number);
      case 'lt': return typeof val === 'number' && val < (f.value as number);
      case 'lte': return typeof val === 'number' && val <= (f.value as number);
      default: return true;
    }
  }
}

// Public API entry point
export function createQuery(store: IStore, repoPath: string): RepoScope {
  return new RepoScope(store, repoPath);
}
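Because every chaining method clones before mutating, intermediate scopes stay reusable and side-effect free. A minimal standalone sketch of that contract (the `MiniScope` class is illustrative, not part of the codebase — note the clone must carry prior state forward):

```typescript
// Minimal immutable-builder sketch: each call returns a copy with all prior state.
class MiniScope {
  private filters: string[] = [];

  eq(field: string, value: unknown): MiniScope {
    const c = new MiniScope();
    c.filters = [...this.filters, `${field}=${String(value)}`]; // copy, then extend
    return c;
  }

  describe(): string {
    return this.filters.join(' AND ') || '(all)';
  }
}

const base = new MiniScope();
const exported = base.eq('is_exported', true);
const exportedFns = exported.eq('kind', 'function');

console.log(base.describe());        // "(all)" — base is untouched
console.log(exportedFns.describe()); // "is_exported=true AND kind=function"
```

If `clone()` returned a fresh instance without copying `filters`, the second `.eq()` would silently drop the first — which is why the scope `clone()` implementations must copy builder state.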

### 2.7 RepoScope (Top-Level)

// src/query/scopes/repo-scope.ts

import { QueryScope } from '../builder';
import { IStore } from '../../storage/interface';
import { GraphNode } from '../../types/graph';
import { ModuleScope } from './module-scope';
import { FileScope } from './file-scope';
import { SymbolScope } from './symbol-scope';
// Scopes referenced below; file names assumed to follow the same convention
import { DocScope } from './doc-scope';
import { TableScope } from './table-scope';
import { CommitScope } from './commit-scope';

export class RepoScope extends QueryScope<RepoScope> {
  protected async execute(): Promise<{ nodes: unknown[]; edges: unknown[] }> {
    const query = `
      SELECT * FROM repository 
      WHERE root = $repoPath
      LIMIT 1
    `;
    const nodes = await this.store.query(query, { repoPath: this.repoPath });
    return { nodes, edges: [] };
  }

  protected clone(): RepoScope {
    const c = new RepoScope(this.store, this.repoPath);
    // Carry builder state forward so chained calls accumulate
    c.filters = [...this.filters];
    c.sortField = this.sortField;
    c.sortDir = this.sortDir;
    c.limitCount = this.limitCount;
    c.offsetCount = this.offsetCount;
    return c;
  }

  protected formatAsMarkdown(nodes: GraphNode[]): string {
    if (nodes.length === 0) return 'Repository not indexed.';
    const repo = nodes[0];
    const stats = repo.stats as any;
    return [
      `# Repository: ${repo.name}`,
      ``,
      `- **Path:** ${repo.root}`,
      `- **Files:** ${stats?.files ?? 'N/A'}`,
      `- **Modules:** ${stats?.modules ?? 'N/A'}`,
      `- **Symbols:** ${stats?.symbols ?? 'N/A'}`,
      `- **Last Indexed:** ${repo.updated_at}`,
    ].join('\n');
  }

  // Navigation to sub-scopes
  modules(): ModuleScope {
    return new ModuleScope(this.store, this.repoPath, null);
  }

  files(): FileScope {
    return new FileScope(this.store, this.repoPath, null);
  }

  symbols(): SymbolScope {
    return new SymbolScope(this.store, this.repoPath, null);
  }

  docs(): DocScope {
    return new DocScope(this.store, this.repoPath, null);
  }

  // Convenience: direct symbol lookup
  symbol(name: string): SymbolScope {
    return new SymbolScope(this.store, this.repoPath, null)
      .eq('name', name);
  }

  table(name: string): TableScope {
    return new TableScope(this.store, this.repoPath, null)
      .eq('name', name);
  }

  commit(hash: string): CommitScope {
    return new CommitScope(this.store, this.repoPath, null)
      .eq('hash', hash);
  }
}

### 2.8 SymbolScope (With Graph Traversal)

// src/query/scopes/symbol-scope.ts

import { QueryScope } from '../builder';
import { IStore } from '../../storage/interface';
import { GraphNode, GraphEdge } from '../../types/graph';
import { FileScope } from './file-scope';

export class SymbolScope extends QueryScope<SymbolScope> {
  constructor(
    store: IStore,
    repoPath: string,
    private moduleId: string | null
  ) {
    super(store, repoPath);
  }

  protected async execute(): Promise<{ nodes: unknown[]; edges: unknown[] }> {
    // Traversal methods (dependants/dependencies/callers/callees) attach
    // pre-computed results; return those instead of re-querying the store
    const pre = (this as any)._precomputedNodes;
    if (pre) {
      return { nodes: pre, edges: (this as any)._precomputedEdges ?? [] };
    }

    let query = 'SELECT * FROM symbol';
    const vars: Record<string, unknown> = {};
    const conditions: string[] = [];

    if (this.moduleId) {
      // Join through file to filter by module
      query = `
        SELECT symbol.*, file.path as file_path, file.module_id 
        FROM symbol 
        INNER JOIN file ON symbol.file_id = file.id
      `;
      conditions.push('file.module_id = $moduleId');
      vars.moduleId = this.moduleId;
    }

    // Apply filters
    for (const f of this.filters) {
      if (f.field === '_func') continue;
      const param = `f_${f.field}`;
      switch (f.op) {
        case 'eq': conditions.push(`symbol.${f.field} = $${param}`); break;
        case 'neq': conditions.push(`symbol.${f.field} != $${param}`); break;
        case 'contains': conditions.push(`string::contains(symbol.${f.field}, $${param})`); break;
        case 'matches': conditions.push(`string::matches(symbol.${f.field}, $${param})`); break;
        case 'in': conditions.push(`symbol.${f.field} IN $${param}`); break;
        case 'exists': conditions.push(`symbol.${f.field} != NONE`); break;
      }
      vars[param] = f.value;
    }

    if (conditions.length > 0) {
      query += ` WHERE ${conditions.join(' AND ')}`;
    }

    if (this.sortField) {
      query += ` ORDER BY symbol.${this.sortField} ${this.sortDir.toUpperCase()}`;
    }

    if (this.limitCount !== null) {
      query += ` LIMIT ${this.limitCount}`;
    }
    if (this.offsetCount > 0) {
      query += ` START ${this.offsetCount}`;
    }

    const nodes = await this.store.query(query, vars);
    return { nodes, edges: [] };
  }

  // Graph traversal methods
  async dependants(): Promise<SymbolScope> {
    const symbols = await this.toArray();
    if (symbols.length === 0) return this;
    
    const ids = symbols.map(s => s.id);
    const result = await this.store.graphTraversal(
      ids[0], // Start from first symbol
      ['calls', 'imports', 'references'],
      'inbound',
      10, // max depth
      undefined
    );
    
    // Return new scope with traversed nodes
    const newScope = new SymbolScope(this.store, this.repoPath, this.moduleId);
    // Store pre-computed result
    (newScope as any)._precomputedNodes = result.nodes;
    (newScope as any)._precomputedEdges = result.edges;
    return newScope;
  }

  async dependencies(): Promise<SymbolScope> {
    const symbols = await this.toArray();
    if (symbols.length === 0) return this;
    
    const result = await this.store.graphTraversal(
      symbols[0].id,
      ['calls', 'imports', 'references'],
      'outbound',
      10,
      undefined
    );
    
    const newScope = new SymbolScope(this.store, this.repoPath, this.moduleId);
    (newScope as any)._precomputedNodes = result.nodes;
    (newScope as any)._precomputedEdges = result.edges;
    return newScope;
  }

  async callers(): Promise<SymbolScope> {
    const symbols = await this.toArray();
    if (symbols.length === 0) return this;
    
    const result = await this.store.graphTraversal(
      symbols[0].id,
      ['calls'],
      'inbound',
      10,
      undefined
    );
    
    const newScope = new SymbolScope(this.store, this.repoPath, this.moduleId);
    (newScope as any)._precomputedNodes = result.nodes;
    (newScope as any)._precomputedEdges = result.edges;
    return newScope;
  }

  async callees(): Promise<SymbolScope> {
    const symbols = await this.toArray();
    if (symbols.length === 0) return this;
    
    const result = await this.store.graphTraversal(
      symbols[0].id,
      ['calls'],
      'outbound',
      10,
      undefined
    );
    
    const newScope = new SymbolScope(this.store, this.repoPath, this.moduleId);
    (newScope as any)._precomputedNodes = result.nodes;
    (newScope as any)._precomputedEdges = result.edges;
    return newScope;
  }

  // Navigate to containing file
  async file(): Promise<FileScope> {
    const symbols = await this.toArray();
    if (symbols.length === 0) return new FileScope(this.store, this.repoPath, null);
    const fileId = (symbols[0] as any).file_id;
    const fileScope = new FileScope(this.store, this.repoPath, null);
    (fileScope as any)._precomputedFileId = fileId;
    return fileScope;
  }

  protected clone(): SymbolScope {
    const c = new SymbolScope(this.store, this.repoPath, this.moduleId);
    // Carry builder state forward so chained calls accumulate
    c.filters = [...this.filters];
    c.sortField = this.sortField;
    c.sortDir = this.sortDir;
    c.limitCount = this.limitCount;
    c.offsetCount = this.offsetCount;
    return c;
  }

  protected formatAsMarkdown(nodes: GraphNode[]): string {
    if (nodes.length === 0) return 'No symbols found.';
    return nodes.map(n => {
      const s = n as any;
      const exportTag = s.is_exported ? 'exported' : 'internal';
      const location = s.file_path ? `(${s.file_path}:${s.start_line})` : `(${s.start_line})`;
      return `- **${s.name}** [${s.kind}] [${exportTag}] ${location}${s.signature ? `\n  \`${s.signature}\`` : ''}${s.docstring ? `\n  > ${s.docstring.split('\n')[0]}` : ''}`;
    }).join('\n');
  }
}
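The WHERE-clause assembly in `execute()` can be sketched in isolation. This standalone helper mirrors the condition-building loop above (`buildSymbolQuery` and its inputs are illustrative; the `$`-placeholder and `string::contains` forms follow the SurrealQL style used in the listing):

```typescript
// Standalone sketch of parameterized-query assembly, mirroring SymbolScope.execute.
interface Filter { field: string; op: 'eq' | 'contains' | 'in'; value: unknown }

function buildSymbolQuery(
  filters: Filter[],
  sort?: { field: string; dir: 'asc' | 'desc' },
  limit?: number,
): { query: string; vars: Record<string, unknown> } {
  let query = 'SELECT * FROM symbol';
  const vars: Record<string, unknown> = {};
  const conditions: string[] = [];

  for (const f of filters) {
    const param = `f_${f.field}`; // one named parameter per filtered field
    if (f.op === 'eq') conditions.push(`symbol.${f.field} = $${param}`);
    if (f.op === 'contains') conditions.push(`string::contains(symbol.${f.field}, $${param})`);
    if (f.op === 'in') conditions.push(`symbol.${f.field} IN $${param}`);
    vars[param] = f.value;
  }

  if (conditions.length > 0) query += ` WHERE ${conditions.join(' AND ')}`;
  if (sort) query += ` ORDER BY symbol.${sort.field} ${sort.dir.toUpperCase()}`;
  if (limit !== undefined) query += ` LIMIT ${limit}`;
  return { query, vars };
}

const { query, vars } = buildSymbolQuery(
  [
    { field: 'kind', op: 'eq', value: 'function' },
    { field: 'name', op: 'contains', value: 'parse' },
  ],
  { field: 'name', dir: 'asc' },
  20,
);
console.log(query);
```

Binding values through `vars` rather than interpolating them keeps filter values out of the query string, which matters since filter values originate from MCP tool callers.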

### 2.9 MCP Tool Implementation Example

// src/mcp/tools/impact.ts

import { Tool } from '@modelcontextprotocol/sdk/types.js';
import { IStore } from '../../storage/interface';
import { GraphNode } from '../../types/graph';
import { createQuery } from '../../query/builder';
import { TokenBudgetManager } from '../token-budget';

export function createImpactAnalysisTool(store: IStore, repoPath: string, budget: TokenBudgetManager): Tool {
  return {
    name: 'get_impact_analysis',
    description: `Analyze the impact of changing a symbol. Returns all direct and transitive dependants — functions that call it, files that import it, modules that depend on it. Use this before making changes to understand blast radius.`,
    inputSchema: {
      type: 'object',
      properties: {
        symbol_name: {
          type: 'string',
          description: 'Name of the symbol to analyze',
        },
        symbol_kind: {
          type: 'string',
          enum: ['function', 'class', 'interface', 'type_alias', 'variable', 'table', 'column'],
          description: 'Kind of symbol (optional, narrows search)',
        },
        file_path: {
          type: 'string',
          description: 'File path to disambiguate (optional)',
        },
        max_depth: {
          type: 'number',
          description: 'Max traversal depth for transitive dependants (default: 5)',
          default: 5,
        },
        include_transitive: {
          type: 'boolean',
          description: 'Include transitive (indirect) dependants (default: true)',
          default: true,
        },
      },
      required: ['symbol_name'],
    },
    handler: async (params: any) => {
      let q = createQuery(store, repoPath)
        .symbol(params.symbol_name);

      // The builder is immutable — each filter returns a new scope, so reassign
      if (params.symbol_kind) q = q.eq('kind', params.symbol_kind);
      if (params.file_path) q = q.eq('file_path', params.file_path);

      const symbols = await q.toArray();
      if (symbols.length === 0) {
        return {
          content: [{ type: 'text', text: JSON.stringify({ error: 'Symbol not found', symbol_name: params.symbol_name }) }],
        };
      }

      const symbol = symbols[0];
      const depth = params.max_depth ?? 5;

      // Get dependants via graph traversal
      const result = await store.graphTraversal(
        symbol.id,
        ['calls', 'imports', 'references', 'implements'],
        'inbound',
        depth,
        undefined
      );

      // Organize by distance (direct vs transitive)
      const direct = result.edges.filter(e => {
        // Direct edges are those where the target is our symbol
        return e.to === symbol.id;
      }).map(e => result.nodes.find(n => n.id === e.from)!).filter(Boolean);

      const transitive = result.nodes.filter(n => 
        n.id !== symbol.id && !direct.find(d => d.id === n.id)
      );

      // Group by file and module
      const byFile = new Map<string, GraphNode[]>();
      const byModule = new Map<string, GraphNode[]>();
      
      for (const node of result.nodes) {
        const n = node as any;
        if (n.file_path) {
          if (!byFile.has(n.file_path)) byFile.set(n.file_path, []);
          byFile.get(n.file_path)!.push(node);
        }
        if (n.module_id) {
          if (!byModule.has(n.module_id)) byModule.set(n.module_id, []);
          byModule.get(n.module_id)!.push(node);
        }
      }

      const response = {
        target: {
          id: symbol.id,
          name: (symbol as any).name,
          kind: (symbol as any).kind,
          file: (symbol as any).file_path,
          line: (symbol as any).start_line,
        },
        impact_summary: {
          total_dependants: result.nodes.length,
          direct_dependants: direct.length,
          transitive_dependants: transitive.length,
          files_affected: byFile.size,
          modules_affected: byModule.size,
        },
        direct_dependants: direct.map(n => ({
          name: (n as any).name,
          kind: (n as any).kind,
          file: (n as any).file_path,
          line: (n as any).start_line,
          relationship: result.edges.find(e => e.from === n.id && e.to === symbol.id)?.type,
        })),
        affected_files: Object.fromEntries(
          Array.from(byFile.entries()).map(([path, nodes]) => [
            path,
            nodes.map(n => ({ name: (n as any).name, kind: (n as any).kind, line: (n as any).start_line }))
          ])
        ),
        affected_modules: Object.fromEntries(
          Array.from(byModule.entries()).map(([id, nodes]) => [
            id,
            { symbol_count: nodes.length, kinds: [...new Set(nodes.map(n => (n as any).kind))] }
          ])
        ),
        token_estimate: budget.estimate(JSON.stringify(result)),
      };

      // Apply token budget truncation if needed
      const truncated = budget.truncate(response, params.max_tokens);

      return {
        content: [{ type: 'text', text: JSON.stringify(truncated, null, 2) }],
      };
    },
  };
}
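The grouping step in the handler boils down to bucketing traversal nodes in a `Map` and flattening it with `Object.fromEntries` for the JSON response. A standalone sketch with made-up nodes:

```typescript
// Sketch of the byFile grouping used in the impact handler (sample data is made up).
interface Node { id: string; name: string; file_path?: string }

const nodes: Node[] = [
  { id: 's1', name: 'login', file_path: 'src/auth.ts' },
  { id: 's2', name: 'logout', file_path: 'src/auth.ts' },
  { id: 's3', name: 'render', file_path: 'src/ui.ts' },
];

// Bucket nodes by containing file, skipping nodes without a file path
const byFile = new Map<string, Node[]>();
for (const n of nodes) {
  if (!n.file_path) continue;
  if (!byFile.has(n.file_path)) byFile.set(n.file_path, []);
  byFile.get(n.file_path)!.push(n);
}

// Flatten to a plain object keyed by path, as in the affected_files response field
const affected = Object.fromEntries(
  [...byFile.entries()].map(([path, ns]) => [path, ns.map(n => n.name)]),
);
console.log(affected); // { 'src/auth.ts': ['login', 'logout'], 'src/ui.ts': ['render'] }
```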

### 2.10 Git Hook Implementation

// src/hooks/pre-commit.ts

import * as path from 'path';
import { promises as fs } from 'fs';
import { simpleGit, SimpleGit } from 'simple-git';
import { IStore } from '../storage/interface';
import { ExtractorRegistry } from '../extractor/registry';
import { GraphDiffer } from '../engine/differ';
import { GraphMerger } from '../engine/merger';
import { Validator } from '../engine/validator';
import { contentHash } from '../utils/hash';
import { Logger } from '../utils/logger';

interface PreCommitResult {
  status: 'pass' | 'warn' | 'fail';
  parsed: number;
  updated: number;
  added: number;
  removed: number;
  errors: string[];
  warnings: string[];
}

export async function runPreCommit(
  repoPath: string,
  store: IStore,
  config: { mode: 'warn' | 'block' | 'off' },
  logger: Logger
): Promise<PreCommitResult> {
  const result: PreCommitResult = {
    status: 'pass',
    parsed: 0,
    updated: 0,
    added: 0,
    removed: 0,
    errors: [],
    warnings: [],
  };

  const git: SimpleGit = simpleGit(repoPath);

  // 1. Get staged files
  const stagedFiles = await git.diff(['--cached', '--name-only', '--diff-filter=ACMR']);
  const fileNames = stagedFiles.trim().split('\n').filter(Boolean);

  if (fileNames.length === 0) {
    return result;
  }

  logger.info(`Pre-commit: ${fileNames.length} staged files`);

  // 2. Filter to supported files
  const registry = new ExtractorRegistry();
  const supportedFiles = fileNames.filter(f => registry.supportsFile(f));

  if (supportedFiles.length === 0) {
    return result;
  }

  logger.info(`Pre-commit: ${supportedFiles.length} supported files to parse`);

  // 3. Parse changed files
  for (const filePath of supportedFiles) {
    try {
      const absolutePath = path.resolve(repoPath, filePath);
      const content = await fs.readFile(absolutePath, 'utf-8');
      const hash = contentHash(content);

      // Check if content actually changed
      const existingFile = await store.query(
        'SELECT content_hash FROM file WHERE path = $path LIMIT 1',
        { path: filePath }
      );

      if (existingFile.length > 0 && existingFile[0].content_hash === hash) {
        continue; // No change
      }

      // Extract symbols
      const extractor = registry.getExtractor(filePath);
      const extraction = await extractor.extractFile(absolutePath, repoPath);

      // Diff against existing graph
      const oldSymbols = await store.query(
        'SELECT * FROM symbol WHERE file_id = $fileId',
        { fileId: `file:${filePath}` }
      );

      const diff = GraphDiffer.diff(oldSymbols, extraction.symbols);

      // Merge into graph
      await store.transaction(async (tx) => {
        // Remove old symbols
        for (const removed of diff.removed) {
          await tx.deleteNode(removed.id);
          await tx.deleteEdges(removed.id);
          result.removed++;
        }

        // Update changed symbols
        for (const changed of diff.changed) {
          await tx.updateNode(changed.new.id, changed.new);
          result.updated++;
        }

        // Add new symbols
        for (const added of diff.added) {
          await tx.createNode(added);
          result.added++;
        }

        // Update edges
        await tx.deleteEdges(`file:${filePath}`); // Remove old edges from this file
        await tx.createEdges(extraction.edges.map(e => ({
          ...e,
          // Resolve file-level edges
          from: e.from.startsWith('file:') ? `file:${filePath}` : e.from,
        })));

        // Update file node
        const fileNode = {
          id: `file:${filePath}`,
          type: 'file',
          path: filePath,
          content_hash: hash,
          parse_status: extraction.parseErrors.length === 0 ? 'parsed' : 'partial',
          parse_error: extraction.parseErrors.length > 0 
            ? extraction.parseErrors.map(e => `L${e.line}: ${e.message}`).join('; ') 
            : null,
          last_parsed: new Date().toISOString(),
          line_count: content.split('\n').length,
          size_bytes: Buffer.byteLength(content),
        };
        await tx.createNode(fileNode as any);
      });

      result.parsed++;

      if (extraction.parseErrors.length > 0) {
        result.warnings.push(
          `${filePath}: ${extraction.parseErrors.length} parse errors`
        );
      }
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      result.errors.push(`${filePath}: ${message}`);
      logger.error(`Pre-commit error for ${filePath}`, err);
    }
  }

  // 4. Validate (if enabled)
  if (config.mode !== 'off') {
    const validation = await Validator.validate(store, repoPath);
    result.warnings.push(...validation.warnings);
    result.errors.push(...validation.errors);

    if (result.errors.length > 0 && config.mode === 'block') {
      result.status = 'fail';
    } else if (result.warnings.length > 0 || result.errors.length > 0) {
      result.status = 'warn';
    }
  }

  // 5. Update repo stats
  await updateRepoStats(store, repoPath);

  return result;
}
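The skip-if-unchanged check above hinges on a stable content hash. A minimal sketch, assuming `contentHash` in `utils/hash` is a SHA-256 hex digest (the actual implementation may differ):

```typescript
import { createHash } from 'crypto';

// Assumed stand-in for utils/hash: stable digest of file content
function contentHash(content: string): string {
  return createHash('sha256').update(content).digest('hex');
}

const stored = contentHash('export const a = 1;\n');    // hash recorded at last parse
const unchanged = contentHash('export const a = 1;\n'); // same bytes staged again
const edited = contentHash('export const a = 2;\n');    // staged edit

console.log(stored === unchanged); // true  → skip re-parsing this file
console.log(stored === edited);    // false → re-extract symbols and diff the graph
```

Hashing content rather than trusting mtime means formatting-only staging operations (e.g. re-adding an identical file) cost nothing at commit time.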

### 2.11 Workflow Template: Bug Fix

// src/workflows/templates/bug-fix.ts

import { IStore } from '../../storage/interface';
import { createQuery } from '../../query/builder';

export interface BugFixInput {
  error_message?: string;
  stack_trace?: string[];
  file_path?: string;
  line_number?: number;
  symbol_name?: string;
  error_type?: string; // TypeError, ReferenceError, etc.
}

export interface BugFixOutput {
  root_candidates: RootCandidate[];
  impact_radius: ImpactRadius;
  related_tests: RelatedTest[];
  recent_changes: RecentChange[];
  suggested_investigation_order: string[];
}

interface RootCandidate {
  symbol_id: string;
  symbol_name: string;
  kind: string;
  file_path: string;
  line: number;
  confidence: 'high' | 'medium' | 'low';
  reason: string;
}

interface ImpactRadius {
  direct_callers: number;
  transitive_callers: number;
  affected_files: string[];
  affected_modules: string[];
}

export async function executeBugFixWorkflow(
  store: IStore,
  repoPath: string,
  input: BugFixInput
): Promise<BugFixOutput> {
  const candidates: RootCandidate[] = [];

  // Strategy 1: If we have a file + line, look up the symbol at that location
  if (input.file_path && input.line_number) {
    const symbols = await createQuery(store, repoPath)
      .symbols() // all symbols, narrowed to the error file below
      .eq('file_path', input.file_path)
      .toArray();

    // Find symbol containing the line
    const containing = symbols.find(s => {
      const sym = s as any;
      return sym.start_line <= input.line_number! && sym.end_line >= input.line_number!;
    });

    if (containing) {
      candidates.push({
        symbol_id: containing.id,
        symbol_name: (containing as any).name,
        kind: (containing as any).kind,
        file_path: (containing as any).file_path,
        line: (containing as any).start_line,
        confidence: 'high',
        reason: `Symbol at error location (${input.file_path}:${input.line_number})`,
      });
    }
  }

  // Strategy 2: If we have a symbol name from the error (e.g., "Cannot read property 'foo' of undefined")
  if (input.symbol_name || input.error_message) {
    const nameToSearch = input.symbol_name || extractPropertyName(input.error_message!);
    if (nameToSearch) {
      const matches = await createQuery(store, repoPath)
        .symbol(nameToSearch)
        .toArray();

      for (const match of matches) {
        // Don't duplicate if already found
        if (candidates.find(c => c.symbol_id === match.id)) continue;

        candidates.push({
          symbol_id: match.id,
          symbol_name: (match as any).name,
          kind: (match as any).kind,
          file_path: (match as any).file_path,
          line: (match as any).start_line,
          confidence: 'medium',
          reason: `Name matches error reference: "${nameToSearch}"`,
        });
      }
    }
  }

  // Strategy 3: If we have a stack trace, trace the call chain
  if (input.stack_trace && input.stack_trace.length > 0) {
    for (const frame of input.stack_trace) {
      const parsed = parseStackFrame(frame);
      if (!parsed) continue;

      const symbols = await createQuery(store, repoPath)
        .symbol(parsed.functionName)
        .eq('file_path', parsed.filePath)
        .toArray();

      for (const sym of symbols) {
        if (candidates.find(c => c.symbol_id === sym.id)) continue;
        candidates.push({
          symbol_id: sym.id,
          symbol_name: (sym as any).name,
          kind: (sym as any).kind,
          file_path: (sym as any).file_path,
          line: (sym as any).start_line,
          confidence: parsed.filePath === input.file_path ? 'high' : 'medium',
          reason: `Appears in stack trace: ${frame.trim()}`,
        });
      }
    }
  }

  // Strategy 4: If error type suggests null/undefined, find recently changed symbols in the area
  if (input.error_type && ['TypeError', 'ReferenceError'].includes(input.error_type)) {
    // Find recently modified symbols in the same file
    if (input.file_path) {
      // SurrealQL has no JOIN syntax; traverse the modified_in edge from the
      // symbol's file to its commits (file -> modified_in -> commit) instead
      const recentSymbols = await store.query(`
        SELECT *,
          array::first(file_id->modified_in->commit.hash) AS last_commit_hash
        FROM symbol
        WHERE file_path = $filePath
        LIMIT 10
      `, { filePath: input.file_path });

      for (const rs of recentSymbols) {
        if (candidates.find(c => c.symbol_id === rs.id)) continue;
        candidates.push({
          symbol_id: rs.id,
          symbol_name: rs.name,
          kind: rs.kind,
          file_path: rs.file_path,
          line: rs.start_line,
          confidence: 'low',
          reason: `Recently modified symbol in error file (commit ${rs.last_commit_hash})`,
        });
      }
    }
  }

  // Compute impact radius for top candidate
  let impactRadius: ImpactRadius = {
    direct_callers: 0,
    transitive_callers: 0,
    affected_files: [],
    affected_modules: [],
  };

  if (candidates.length > 0) {
    const topCandidate = candidates[0];
    const result = await store.graphTraversal(
      topCandidate.symbol_id,
      ['calls', 'imports'],
      'inbound',
      10,
      undefined
    );
    
    const directEdges = result.edges.filter(e => e.to === topCandidate.symbol_id);
    impactRadius.direct_callers = directEdges.length;
    impactRadius.transitive_callers = result.nodes.length;
    impactRadius.affected_files = [...new Set(result.nodes.map(n => (n as any).file_path).filter(Boolean))];
    
    // Resolve modules
    for (const filePath of impactRadius.affected_files) {
      const fileNode = await store.query(
        'SELECT module_id FROM file WHERE path = $path LIMIT 1',
        { path: filePath }
      );
      if (fileNode.length > 0 && fileNode[0].module_id) {
        impactRadius.affected_modules.push(fileNode[0].module_id);
      }
    }
    impactRadius.affected_modules = [...new Set(impactRadius.affected_modules)];
  }

  // Find related tests
  const relatedTests: RelatedTest[] = [];
  if (candidates.length > 0) {
    for (const candidate of candidates.slice(0, 3)) {
      // SurrealQL has no LIKE operator; use CONTAINS plus string functions
      const testSymbols = await store.query(`
        SELECT * FROM symbol
        WHERE name CONTAINS $testName
          AND kind = 'function'
          AND string::contains(string::lowercase(name), 'test')
        LIMIT 5
      `, { testName: candidate.symbol_name });

      for (const test of testSymbols) {
        relatedTests.push({
          test_name: test.name,
          file_path: test.file_path,
          line: test.start_line,
          linked_to: candidate.symbol_name,
        });
      }
    }
  }

  // Suggest investigation order (sort a copy so root_candidates keeps discovery order)
  const suggestedOrder = [...candidates]
    .sort((a, b) => {
      const confOrder = { high: 0, medium: 1, low: 2 };
      return confOrder[a.confidence] - confOrder[b.confidence];
    })
    .map(c => `${c.file_path}:${c.line} (${c.symbol_name})`);

  return {
    root_candidates: candidates,
    impact_radius: impactRadius,
    related_tests: relatedTests,
    recent_changes: [], // Populated from git log
    suggested_investigation_order: suggestedOrder,
  };
}

function extractPropertyName(errorMessage: string): string | null {
  // "Cannot read properties of undefined (reading 'foo')" (Node >= 16)
  const readMatch = errorMessage.match(/reading '(\w+)'/);
  if (readMatch) return readMatch[1];

  // "Cannot read property 'foo' of undefined" (older V8 wording)
  const oldReadMatch = errorMessage.match(/property '(\w+)' of/);
  if (oldReadMatch) return oldReadMatch[1];

  // "foo is not a function"
  const notFnMatch = errorMessage.match(/(\w+) is not a function/);
  if (notFnMatch) return notFnMatch[1];

  // "foo is not defined"
  const notDefMatch = errorMessage.match(/(\w+) is not defined/);
  if (notDefMatch) return notDefMatch[1];

  return null;
}

function parseStackFrame(frame: string): { functionName: string; filePath: string } | null {
  // "at functionName (/path/to/file.ts:10:5)" — allow dotted names like "Object.foo"
  const match = frame.match(/at\s+([\w.$]+)\s+\((.+):(\d+):\d+\)/);
  if (!match) return null;
  return { functionName: match[1], filePath: match[2] };
}
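The two parsing helpers above are easy to sanity-check in isolation. A minimal self-contained sketch (both functions re-declared, and slightly simplified, so the snippet runs standalone):

```typescript
// Standalone re-declarations of the error-message parsers for a quick check;
// the regexes follow the same patterns as the debug-analysis helpers above.
function extractPropertyName(errorMessage: string): string | null {
  const readMatch = errorMessage.match(/reading '(\w+)'/);
  if (readMatch) return readMatch[1];
  const notFnMatch = errorMessage.match(/(\w+) is not a function/);
  if (notFnMatch) return notFnMatch[1];
  const notDefMatch = errorMessage.match(/(\w+) is not defined/);
  if (notDefMatch) return notDefMatch[1];
  return null;
}

function parseStackFrame(frame: string): { functionName: string; filePath: string } | null {
  const match = frame.match(/at\s+([\w.$]+)\s+\((.+):(\d+):\d+\)/);
  if (!match) return null;
  return { functionName: match[1], filePath: match[2] };
}

console.log(extractPropertyName("Cannot read properties of undefined (reading 'email')"));
// prints: email
console.log(parseStackFrame("    at loadUser (/src/services/user.ts:42:11)"));
// → { functionName: 'loadUser', filePath: '/src/services/user.ts' }
```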

2.12 Token Budget Manager

// src/mcp/token-budget.ts

export class TokenBudgetManager {
  private maxTokens: number;

  // Approximate tokens per character for different content types
  private static RATES = {
    code: 0.25,       // ~4 chars per token
    markdown: 0.3,    // ~3.3 chars per token
    json: 0.22,       // ~4.5 chars per token (compact)
    text: 0.33,       // ~3 chars per token
  };

  constructor(maxTokens: number = 8000) {
    this.maxTokens = maxTokens;
  }

  estimate(content: string, type: keyof typeof TokenBudgetManager.RATES = 'json'): number {
    return Math.ceil(content.length * TokenBudgetManager.RATES[type]);
  }

  truncate<T>(data: T, requestedMax?: number): T & { _truncated: boolean; _token_count: number } {
    const max = requestedMax ?? this.maxTokens;
    const json = JSON.stringify(data);
    const tokens = this.estimate(json);

    if (tokens <= max) {
      return {
        ...data,
        _truncated: false,
        _token_count: tokens,
      } as T & { _truncated: boolean; _token_count: number };
    }

    // Truncation strategy: keep structure, reduce detail
    const truncated = this.smartTruncate(data, max);
    const truncatedJson = JSON.stringify(truncated);
    const truncatedTokens = this.estimate(truncatedJson);

    return {
      ...truncated,
      _truncated: true,
      _token_count: truncatedTokens,
    } as T & { _truncated: boolean; _token_count: number };
  }

  private smartTruncate<T>(data: T, budget: number): T {
    const obj = data as any;

    // Strategy 1: If it has an array of items, truncate the array
    for (const key of Object.keys(obj)) {
      if (Array.isArray(obj[key]) && obj[key].length > 0) {
        // Keep reducing until we're under budget
        const originalLen = obj[key].length;
        let len = originalLen;
        while (len > 1) {
          const testObj = { ...obj, [key]: obj[key].slice(0, len) };
          const testJson = JSON.stringify(testObj);
          if (this.estimate(testJson) <= budget * 0.9) { // 10% margin for metadata
            obj[key] = obj[key].slice(0, len);
            // Record the pre-slice length; after slicing, obj[key].length === len
            obj._truncation_note = `${key} truncated from ${originalLen} to ${len} items`;
            return obj as T;
          }
          len = Math.floor(len * 0.7); // Reduce by 30% each iteration
        }
        obj[key] = obj[key].slice(0, 1);
        return obj as T;
      }
    }

    // Strategy 2: Remove verbose fields
    const verboseFields = ['signature', 'docstring', 'metadata', 'raw'];
    for (const field of verboseFields) {
      if (obj[field]) {
        delete obj[field];
        const testJson = JSON.stringify(obj);
        if (this.estimate(testJson) <= budget * 0.9) {
          return obj as T;
        }
      }
    }

    // Strategy 3: Last resort - truncate string fields
    for (const key of Object.keys(obj)) {
      if (typeof obj[key] === 'string' && obj[key].length > 100) {
        obj[key] = obj[key].slice(0, 100) + '...';
      }
    }

    return obj as T;
  }
}
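To make the rate table concrete, here is a minimal sketch of just the estimation math, with the chars-per-token rates copied from the class above (the `truncate` machinery is omitted):

```typescript
// Minimal sketch of TokenBudgetManager's estimation math; rates are the
// approximate tokens-per-character values from the class above.
const RATES = { code: 0.25, markdown: 0.3, json: 0.22, text: 0.33 } as const;

function estimateTokens(content: string, type: keyof typeof RATES = "json"): number {
  return Math.ceil(content.length * RATES[type]);
}

// 1,000 characters of code at ~4 chars/token estimates to 250 tokens,
// well under the default 8,000-token budget, so truncate() would pass it through
console.log(estimateTokens("x".repeat(1000), "code")); // → 250
```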

2.13 SurrealDB Schema Migration

// src/storage/surreal/migrations.ts

export const SCHEMA_DEFINITION = `
// ============================================
// TOKENZIP GRAPH SCHEMA - SurrealDB v2
// ============================================

// --- NODE TYPES ---

DEFINE TABLE repository SCHEMAFULL;
DEFINE FIELD name ON repository TYPE string;
DEFINE FIELD root ON repository TYPE string;
DEFINE FIELD created_at ON repository TYPE datetime DEFAULT time::now();
DEFINE FIELD updated_at ON repository TYPE datetime DEFAULT time::now();
DEFINE FIELD stats ON repository TYPE object;
// SurrealQL nests object fields via dotted DEFINE FIELD paths
DEFINE FIELD stats.files ON repository TYPE number;
DEFINE FIELD stats.modules ON repository TYPE number;
DEFINE FIELD stats.symbols ON repository TYPE number;

DEFINE TABLE module SCHEMAFULL;
DEFINE FIELD name ON module TYPE string;
DEFINE FIELD path ON module TYPE string;
DEFINE FIELD manifest_type ON module TYPE string;
DEFINE FIELD language ON module TYPE string;
DEFINE FIELD is_root ON module TYPE bool DEFAULT false;
DEFINE FIELD metadata ON module TYPE object;
DEFINE FIELD repository_id ON module TYPE record<repository>;

DEFINE TABLE file SCHEMAFULL;
DEFINE FIELD path ON file TYPE string;
DEFINE FIELD module_id ON file TYPE record<module>;
DEFINE FIELD language ON file TYPE string;
DEFINE FIELD ext ON file TYPE string;
DEFINE FIELD size_bytes ON file TYPE int;
DEFINE FIELD content_hash ON file TYPE string;
DEFINE FIELD line_count ON file TYPE int;
DEFINE FIELD parse_status ON file TYPE string 
  ASSERT $value IN ['parsed', 'partial', 'failed', 'skipped'];
DEFINE FIELD parse_error ON file TYPE option<string>;
DEFINE FIELD last_parsed ON file TYPE datetime;
DEFINE FIELD git_last_modified ON file TYPE option<datetime>;
DEFINE FIELD git_blame_summary ON file TYPE option<object>;

DEFINE TABLE symbol SCHEMAFULL;
DEFINE FIELD file_id ON symbol TYPE record<file>;
DEFINE FIELD name ON symbol TYPE string;
DEFINE FIELD kind ON symbol TYPE string 
  ASSERT $value IN [
    'function', 'method', 'constructor',
    'class', 'interface', 'type_alias', 'enum',
    'variable', 'constant', 'property',
    'parameter', 'generic_param',
    'decorator', 'annotation',
    'table', 'view', 'column', 'index', 'constraint',
    'foreign_key', 'stored_procedure',
    'import', 'export', 're_export',
    'namespace', 'module_decl',
    'section', 'subsection',
    'workflow_step', 'diagram_node',
    'list_item', 'table_row'
  ];
DEFINE FIELD signature ON symbol TYPE option<string>;
DEFINE FIELD return_type ON symbol TYPE option<string>;
DEFINE FIELD start_line ON symbol TYPE int;
DEFINE FIELD end_line ON symbol TYPE int;
DEFINE FIELD start_col ON symbol TYPE int;
DEFINE FIELD end_col ON symbol TYPE int;
DEFINE FIELD docstring ON symbol TYPE option<string>;
DEFINE FIELD is_exported ON symbol TYPE bool DEFAULT false;
DEFINE FIELD is_async ON symbol TYPE option<bool>;
DEFINE FIELD is_static ON symbol TYPE option<bool>;
DEFINE FIELD visibility ON symbol TYPE option<string>
  ASSERT $value IN [null, 'public', 'private', 'protected'];
DEFINE FIELD modifiers ON symbol TYPE array;
DEFINE FIELD parent_symbol_id ON symbol TYPE option<string>;
DEFINE FIELD metadata ON symbol TYPE object;

DEFINE TABLE commit SCHEMAFULL;
DEFINE FIELD hash ON commit TYPE string;
DEFINE FIELD short_hash ON commit TYPE string;
DEFINE FIELD message ON commit TYPE string;
DEFINE FIELD author ON commit TYPE string;
DEFINE FIELD email ON commit TYPE string;
DEFINE FIELD date ON commit TYPE datetime;
DEFINE FIELD branch ON commit TYPE string;
DEFINE FIELD tags ON commit TYPE array;

DEFINE TABLE dependency SCHEMAFULL;
DEFINE FIELD module_id ON dependency TYPE record<module>;
DEFINE FIELD name ON dependency TYPE string;
DEFINE FIELD version ON dependency TYPE string;
DEFINE FIELD dev ON dependency TYPE bool DEFAULT false;
DEFINE FIELD source ON dependency TYPE string;

// --- EDGE TYPES ---

DEFINE TABLE contains SCHEMAFULL TYPE RELATION FROM repository | module | file | symbol TO module | file | symbol;
DEFINE TABLE imports SCHEMAFULL TYPE RELATION FROM file | symbol | module TO file | symbol | module;
DEFINE FIELD is_type_only ON imports TYPE option<bool>;
DEFINE FIELD is_default ON imports TYPE option<bool>;
DEFINE FIELD alias ON imports TYPE option<string>;
DEFINE FIELD specifiers ON imports TYPE option<array>;

DEFINE TABLE exports SCHEMAFULL TYPE RELATION FROM file | symbol TO symbol | file;
DEFINE FIELD is_default ON exports TYPE option<bool>;
DEFINE FIELD is_reexport ON exports TYPE option<bool>;
DEFINE FIELD alias ON exports TYPE option<string>;
DEFINE FIELD name ON exports TYPE option<string>;

DEFINE TABLE calls SCHEMAFULL TYPE RELATION FROM symbol TO symbol;
DEFINE FIELD line ON calls TYPE option<int>;
DEFINE FIELD is_async ON calls TYPE option<bool>;
DEFINE FIELD call_type ON calls TYPE option<string>
  ASSERT $value IN [null, 'direct', 'indirect', 'dynamic'];

DEFINE TABLE implements SCHEMAFULL TYPE RELATION FROM symbol TO symbol;
DEFINE FIELD is_partial ON implements TYPE option<bool>;

DEFINE TABLE inherits SCHEMAFULL TYPE RELATION FROM symbol TO symbol;
DEFINE FIELD is_interface_inheritance ON inherits TYPE option<bool>;

DEFINE TABLE modifies SCHEMAFULL TYPE RELATION FROM symbol TO symbol;
DEFINE TABLE reads SCHEMAFULL TYPE RELATION FROM symbol TO symbol;
DEFINE TABLE references SCHEMAFULL TYPE RELATION FROM symbol TO symbol;
DEFINE FIELD context ON references TYPE option<string>;

DEFINE TABLE depends_on SCHEMAFULL TYPE RELATION FROM module | file TO module | file;
DEFINE FIELD is_transitive ON depends_on TYPE option<bool>;
DEFINE FIELD depth ON depends_on TYPE option<int>;

DEFINE TABLE modified_in SCHEMAFULL TYPE RELATION FROM file TO commit;
DEFINE FIELD change_type ON modified_in TYPE string
  ASSERT $value IN ['added', 'modified', 'deleted', 'renamed'];

DEFINE TABLE foreign_key SCHEMAFULL TYPE RELATION FROM symbol TO symbol;
DEFINE FIELD constraint_name ON foreign_key TYPE option<string>;
DEFINE FIELD on_delete ON foreign_key TYPE option<string>;
DEFINE FIELD on_update ON foreign_key TYPE option<string>;
DEFINE FIELD ref_column ON foreign_key TYPE option<string>;

DEFINE TABLE column_of SCHEMAFULL TYPE RELATION FROM symbol TO symbol;

DEFINE TABLE diagram_edge SCHEMAFULL TYPE RELATION FROM symbol TO symbol;
DEFINE FIELD label ON diagram_edge TYPE option<string>;
DEFINE FIELD style ON diagram_edge TYPE option<string>;
DEFINE FIELD type ON diagram_edge TYPE option<string>;
DEFINE FIELD sequence ON diagram_edge TYPE option<int>;
DEFINE FIELD is_response ON diagram_edge TYPE option<bool>;

DEFINE TABLE workflow_transition SCHEMAFULL TYPE RELATION FROM symbol TO symbol;
DEFINE FIELD condition ON workflow_transition TYPE option<string>;
DEFINE FIELD action ON workflow_transition TYPE option<string>;

// --- INDEXES ---

DEFINE INDEX idx_file_path ON file FIELDS path UNIQUE;
DEFINE INDEX idx_file_hash ON file FIELDS content_hash;
DEFINE INDEX idx_file_module ON file FIELDS module_id;
DEFINE INDEX idx_symbol_name ON symbol FIELDS name;
DEFINE INDEX idx_symbol_kind ON symbol FIELDS kind;
DEFINE INDEX idx_symbol_file ON symbol FIELDS file_id;
DEFINE INDEX idx_symbol_export ON symbol FIELDS is_exported;
DEFINE INDEX idx_module_path ON module FIELDS path UNIQUE;
DEFINE INDEX idx_commit_hash ON commit FIELDS hash UNIQUE;
DEFINE INDEX idx_dep_name ON dependency FIELDS name, module_id;
`;
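One way the migration module might apply this schema is statement by statement. A hedged sketch, with the SurrealDB client stubbed out (the `db.query` shape here is an assumption, not the real client API), using a trimmed-down schema string so the snippet is self-contained:

```typescript
// Sketch of a migration runner: strips //-style comment lines, splits on
// semicolons, and applies each DEFINE statement through a query method.
// The db object is a stub standing in for a SurrealDB client (assumption).
const SCHEMA_DEFINITION = `
DEFINE TABLE file SCHEMAFULL;
DEFINE FIELD path ON file TYPE string;
// comment line
DEFINE INDEX idx_file_path ON file FIELDS path UNIQUE;
`;

function splitStatements(schema: string): string[] {
  return schema
    .split("\n")
    .filter(line => !line.trim().startsWith("//")) // drop comment lines
    .join("\n")
    .split(";")
    .map(s => s.trim())
    .filter(Boolean);
}

async function migrate(db: { query: (q: string) => Promise<unknown> }) {
  for (const stmt of splitStatements(SCHEMA_DEFINITION)) {
    await db.query(stmt); // apply one DEFINE statement at a time
  }
}

const applied: string[] = [];
migrate({ query: async q => { applied.push(q); } }).then(() => console.log(applied.length)); // → 3
```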

2.14 Error Handling Strategy

// src/utils/errors.ts

export class TokenZipError extends Error {
  constructor(
    message: string,
    public readonly code: ErrorCode,
    public readonly details?: Record<string, unknown>
  ) {
    super(message);
    this.name = 'TokenZipError';
  }
}

export enum ErrorCode {
  // Storage errors (1xxx)
  DB_CONNECTION_FAILED = 'E1001',
  DB_QUERY_FAILED = 'E1002',
  DB_MIGRATION_FAILED = 'E1003',
  DB_CORRUPTED = 'E1004',

  // Parser errors (2xxx)
  PARSE_FAILED = 'E2001',
  GRAMMAR_NOT_FOUND = 'E2002',
  PARTIAL_PARSE = 'E2003',

  // Git errors (3xxx)
  GIT_NOT_REPOSITORY = 'E3001',
  GIT_HOOK_INSTALL_FAILED = 'E3002',
  GIT_DIFF_FAILED = 'E3003',

  // MCP errors (4xxx)
  MCP_TRANSPORT_FAILED = 'E4001',
  MCP_TOOL_NOT_FOUND = 'E4002',
  MCP_INVALID_PARAMS = 'E4003',
  MCP_TOKEN_BUDGET_EXCEEDED = 'E4004',

  // Config errors (5xxx)
  CONFIG_NOT_FOUND = 'E5001',
  CONFIG_INVALID = 'E5002',

  // Indexer errors (6xxx)
  INDEX_INTERRUPTED = 'E6001',
  INDEX_FILE_TOO_LARGE = 'E6002',
  INDEX_BINARY_FILE = 'E6003',
}

// Global error handler for MCP tools
export function mcpErrorHandler(error: unknown): { content: Array<{ type: 'text'; text: string }>; isError: boolean } {
  if (error instanceof TokenZipError) {
    return {
      content: [{
        type: 'text',
        text: JSON.stringify({
          error: error.message,
          code: error.code,
          details: error.details,
        }),
      }],
      isError: true,
    };
  }

  if (error instanceof Error) {
    return {
      content: [{
        type: 'text',
        text: JSON.stringify({
          error: error.message,
          code: 'E9999',
          stack: process.env.NODE_ENV === 'development' ? error.stack : undefined,
        }),
      }],
      isError: true,
    };
  }

  return {
    content: [{ type: 'text', text: JSON.stringify({ error: 'Unknown error' }) }],
    isError: true,
  };
}
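Every MCP tool body would funnel failures through this handler; a standalone sketch of that wiring (the error class and handler are re-declared minimally here so the snippet runs on its own, and `runTool` is a hypothetical wrapper, not part of the spec):

```typescript
// Minimal re-declarations of TokenZipError and mcpErrorHandler, plus a
// hypothetical runTool wrapper showing the intended try/catch wiring.
class TokenZipError extends Error {
  constructor(
    message: string,
    public readonly code: string,
    public readonly details?: Record<string, unknown>
  ) {
    super(message);
    this.name = "TokenZipError";
  }
}

function mcpErrorHandler(error: unknown) {
  const body = error instanceof TokenZipError
    ? { error: error.message, code: error.code, details: error.details }
    : { error: error instanceof Error ? error.message : "Unknown error", code: "E9999" };
  return { content: [{ type: "text" as const, text: JSON.stringify(body) }], isError: true };
}

// Each tool handler runs inside a try/catch that converts failures into
// structured MCP error responses instead of crashing the transport
async function runTool(handler: () => Promise<unknown>) {
  try {
    const result = await handler();
    return { content: [{ type: "text" as const, text: JSON.stringify(result) }], isError: false };
  } catch (err) {
    return mcpErrorHandler(err);
  }
}

runTool(async () => {
  throw new TokenZipError("symbol not found", "E4002", { name: "formatDate" });
}).then(res => console.log(res.isError, res.content[0].text));
// → true {"error":"symbol not found","code":"E4002","details":{"name":"formatDate"}}
```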

2.15 Testing Strategy

// tests/unit/extractor/typescript.test.ts

import { describe, it, expect, beforeEach } from 'vitest';
import { TypeScriptExtractor } from '../../../src/extractor/code/typescript';
import { createMockContext } from '../../helpers';

describe('TypeScriptExtractor', () => {
  let extractor: TypeScriptExtractor;

  beforeEach(() => {
    extractor = new TypeScriptExtractor();
  });

  describe('function extraction', () => {
    it('extracts a simple exported function', () => {
      const code = `
export function addUser(name: string, age: number): User {
  return { name, age, id: crypto.randomUUID() };
}
`;
      const ctx = createMockContext('src/user.ts', code, 'module-1');
      const result = extractor.extract(ctx);

      expect(result.symbols).toHaveLength(1);
      expect(result.symbols[0]).toMatchObject({
        name: 'addUser',
        kind: 'function',
        isExported: true,
        isAsync: false,
        startLine: 2,
        endLine: 4,
      });
      expect(result.symbols[0].metadata.params).toEqual([
        { name: 'name', type: 'string' },
        { name: 'age', type: 'number' },
      ]);
      expect(result.symbols[0].returnType).toBe('User');
    });

    it('extracts async arrow function assigned to const', () => {
      const code = `
export const fetchUser = async (id: string): Promise<User> => {
  const res = await fetch(\`/api/users/\${id}\`);
  return res.json();
};
`;
      const ctx = createMockContext('src/api.ts', code, 'module-1');
      const result = extractor.extract(ctx);

      expect(result.symbols).toHaveLength(1);
      expect(result.symbols[0]).toMatchObject({
        name: 'fetchUser',
        kind: 'function',
        isExported: true,
        isAsync: true,
      });
      expect(result.symbols[0].metadata.isArrow).toBe(true);
    });

    it('extracts class with methods, inheritance, and implementation', () => {
      const code = `
export class UserRepository implements IRepository<User> {
  private cache: Map<string, User> = new Map();

  async findById(id: string): Promise<User | null> {
    return this.cache.get(id) ?? null;
  }

  async save(user: User): Promise<void> {
    this.cache.set(user.id, user);
  }
}
`;
      const ctx = createMockContext('src/repo.ts', code, 'module-1');
      const result = extractor.extract(ctx);

      // 1 class + 1 property + 2 methods
      expect(result.symbols).toHaveLength(4);
      
      const classSym = result.symbols.find(s => s.kind === 'class')!;
      expect(classSym.name).toBe('UserRepository');
      expect(classSym.isExported).toBe(true);
      expect(classSym.metadata.implements).toEqual(['IRepository<User>']);

      const methods = result.symbols.filter(s => s.kind === 'method');
      expect(methods).toHaveLength(2);
      expect(methods.map(m => m.name)).toEqual(['findById', 'save']);

      // Check implements edge
      const implEdge = result.edges.find(e => e.type === 'implements');
      expect(implEdge).toBeDefined();
    });

    it('extracts interface with generics and members', () => {
      const code = `
export interface IRepository<T extends { id: string }> {
  findById(id: string): Promise<T | null>;
  save(entity: T): Promise<void>;
  delete(id: string): Promise<boolean>;
}
`;
      const ctx = createMockContext('src/types.ts', code, 'module-1');
      const result = extractor.extract(ctx);

      expect(result.symbols).toHaveLength(1);
      expect(result.symbols[0]).toMatchObject({
        name: 'IRepository',
        kind: 'interface',
        isExported: true,
      });
      expect(result.symbols[0].metadata.generics).toEqual(['T extends { id: string }']);
      expect(result.symbols[0].metadata.members).toHaveLength(3);
    });

    it('extracts imports with type-only and default', () => {
      const code = `
import type { User } from './types';
import React, { useState, useEffect } from 'react';
import { formatDate } from './utils';
`;
      const ctx = createMockContext('src/component.tsx', code, 'module-1');
      const result = extractor.extract(ctx);

      const imports = result.symbols.filter(s => s.kind === 'import');
      expect(imports).toHaveLength(3);
      
      expect(imports[0].metadata.isTypeOnly).toBe(true);
      expect(imports[0].metadata.source).toBe('./types');
      
      expect(imports[1].metadata.isDefault).toBe(true);
      expect(imports[1].metadata.source).toBe('react');
      expect(imports[1].metadata.specifiers).toContain('useState');
    });

    it('handles parse errors gracefully', () => {
      const code = `
export function broken(
  // Missing closing paren and body
`;
      const ctx = createMockContext('src/broken.ts', code, 'module-1');
      const result = extractor.extract(ctx);

      expect(result.parseErrors.length).toBeGreaterThan(0);
      // Should still return partial results if any
      expect(result.symbols).toBeDefined();
    });
  });
});

// tests/integration/full-parse.test.ts

import { describe, it, expect, beforeAll, afterAll } from 'vitest';
import { MemoryStore } from '../../src/storage/memory/store';
import { Indexer } from '../../src/engine/indexer';
import { createQuery } from '../../src/query/builder';
import path from 'path';

describe('Full Parse Integration', () => {
  let store: MemoryStore;
  let indexer: Indexer;
  const fixturePath = path.join(__dirname, '../fixtures/ts-monorepo');

  beforeAll(async () => {
    store = new MemoryStore();
    await store.initialize();
    await store.migrate();
    indexer = new Indexer(store, fixturePath);
    await indexer.fullIndex();
  });

  afterAll(async () => {
    await store.close();
  });

  it('indexes all modules in the monorepo', async () => {
    const modules = await createQuery(store, fixturePath).modules().toArray();
    expect(modules.length).toBeGreaterThanOrEqual(3); // apps/web, apps/api, packages/shared
  });

  it('extracts all TypeScript symbols', async () => {
    const symbols = await createQuery(store, fixturePath)
      .symbols()
      .eq('kind', 'function')
      .toArray();
    expect(symbols.length).toBeGreaterThan(10);
  });

  it('resolves cross-module imports', async () => {
    // Find a symbol in packages/shared that's imported by apps/web
    const sharedExports = await createQuery(store, fixturePath)
      .modules()
      .eq('path', 'packages/shared')
      .files()
      .symbols()
      .eq('is_exported', true)
      .toArray();

    expect(sharedExports.length).toBeGreaterThan(0);

    // Check that at least one has an imports edge from apps/web
    const importEdges = await store.getEdgesTo(sharedExports[0].id, 'imports');
    expect(importEdges.length).toBeGreaterThan(0); // at least the file-level import
  });

  it('chainable query: modules → files → symbols → filters', async () => {
    const result = await createQuery(store, fixturePath)
      .modules()
      .eq('language', 'typescript')
      .files()
      .eq('ext', '.ts')
      .symbols()
      .eq('kind', 'class')
      .eq('is_exported', true)
      .toArray();

    expect(result.length).toBeGreaterThan(0);
    for (const sym of result) {
      expect((sym as any).kind).toBe('class');
      expect((sym as any).is_exported).toBe(true);
    }
  });

  it('graph traversal: find all callers of an exported function', async () => {
    const targetFunc = await createQuery(store, fixturePath)
      .symbol('formatDate')
      .eq('kind', 'function')
      .toArray();

    if (targetFunc.length === 0) return; // Skip if fixture doesn't have this

    const callers = await createQuery(store, fixturePath)
      .symbol('formatDate')
      .callers()
      .toArray();

    // Should find at least one caller
    expect(callers.length).toBeGreaterThan(0);
  });

  it('formats query result as markdown', async () => {
    const md = await createQuery(store, fixturePath)
      .modules()
      .limit(3)
      .toMarkdown();

    expect(md).toContain('#');
    expect(md).toContain('packages/shared'); // Based on fixture
  });
});

2.16 Configuration Schema

// src/types/config.ts

export interface TokenZipConfig {
  // Project-level config (.tokenzip/config.json)
  version: string;
  
  storage: {
    engine: 'surrealdb' | 'sqlite' | 'auto';
    path: string; // relative to project root, default: .tokenzip/db
    surrealdb?: {
      binary_path?: string; // custom surrealdb binary
      memory?: boolean; // use memory backend instead of RocksDB
    };
  };

  languages: {
    enabled: string[]; // ['typescript', 'javascript', 'python', 'sql', 'markdown']
    disabled: string[];
    custom: Record<string, {
      extensions: string[];
      grammar_path?: string; // path to custom tree-sitter WASM
      extractor_path?: string; // path to custom extractor JS
    }>;
  };

  exclude: {
    paths: string[]; // glob patterns: ['**/node_modules/**', '**/dist/**', '**/.git/**']
    files: string[]; // exact filenames: ['package-lock.json', 'yarn.lock']
    max_file_size_kb: number; // default: 500
  };

  hooks: {
    pre_commit: 'warn' | 'block' | 'off';
    post_commit: 'on' | 'off';
    validate_on_commit: boolean; // run reference integrity checks
  };

  mcp: {
    max_tokens: number; // default: 8000
    transport: 'stdio' | 'sse';
    port: number; // for SSE, default: 3777
    include_source: boolean; // include source code in responses
    source_max_lines: number; // max lines of source per symbol, default: 50
  };

  indexing: {
    worker_threads: number; // default: os.cpus().length - 1, min 1
    batch_size: number; // files per batch, default: 100
    git_history_depth: number; // commits to index, default: 100
  };

  workflows: {
    enabled: string[]; // ['create-module', 'update-module', 'implement-feature', 'upgrade-feature', 'bug-fix']
  };
}

export const DEFAULT_CONFIG: TokenZipConfig = {
  version: '2.0.0',
  storage: {
    engine: 'auto',
    path: '.tokenzip/db',
  },
  languages: {
    enabled: ['typescript', 'javascript', 'python', 'sql', 'go', 'rust', 'java', 'kotlin', 'markdown'],
    disabled: [],
    custom: {},
  },
  exclude: {
    paths: [
      '**/node_modules/**',
      '**