@Alexsky347
Last active June 18, 2026 08:02

🧠 Knowledge Graphs for AI Coding Assistants: A Team Setup Guide

graphify + code-review-graph + MCP, built around one shared source of truth

A rewrite of the popular "500x smarter" gist, fixed for correct installs (uv/uvx, not pip), honest token math, hardened auto-update hooks, and the part the original skips: a centralized graph every developer queries from one URL instead of N drifting local copies.


What this actually does (and an honest number)

Your AI assistant normally re-reads source files to answer questions about your codebase. A knowledge graph gives it a map (functions, classes, imports, call edges, plus, for graphify, docs and semantic links) so it looks up precise answers instead of grepping and reading.

About the "500x": that figure comes from graphify's own benchmark, which compares feeding your entire corpus into context (~360K tokens) against a single graph query (~700 tokens). No real assistant does that; they read selectively. Independently measured on a real monorepo, the realistic saving is roughly 5–10× on typical work, occasionally more on impact/review tasks, and graphify's broad BFS queries can sometimes cost more than a narrow grep. The savings are real; treat the magnitude as "single-to-low-double-digit ×," not 500.
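The arithmetic behind those two numbers can be sketched in a few lines. The 360K and 700 figures are the ones quoted above; the 5,000-token "selective read" baseline is a made-up illustrative value, not a measurement.

```python
# Benchmark figures quoted above (graphify's own benchmark).
FULL_CORPUS_TOKENS = 360_000   # whole corpus fed into context
GRAPH_QUERY_TOKENS = 700       # one graph query


def saving_factor(baseline_tokens: int, graph_tokens: int) -> float:
    """How many times cheaper the graph lookup is than the baseline."""
    return baseline_tokens / graph_tokens


# The headline number: corpus-in-context vs. a single query.
headline = saving_factor(FULL_CORPUS_TOKENS, GRAPH_QUERY_TOKENS)

# Hypothetical baseline: an assistant that reads selectively and spends
# about 5,000 tokens on a question (illustrative value only).
SELECTIVE_READ_TOKENS = 5_000
realistic = saving_factor(SELECTIVE_READ_TOKENS, GRAPH_QUERY_TOKENS)

print(f"headline:  {headline:.0f}x")   # 360000 / 700 is about 514
print(f"realistic: {realistic:.1f}x")  # 5000 / 700 is about 7.1
```

The headline ratio only appears when the baseline is "read everything"; any selective baseline lands in the single-digit range.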

Two tools, different jobs

|  | graphify (safishamsi/graphify) | code-review-graph (tirth8205/code-review-graph) |
| --- | --- | --- |
| Scope | Code + docs + PDFs + images + SQL + IaC | Code only (AST) |
| Extraction | AST (free) + LLM for docs/images | Pure AST, no LLM, free |
| Best at | "How does X relate to the design in DOCS?" | "Who calls X? Blast radius of changing X?" |
| Build cost | Code free; docs/images cost API tokens | Free, ~10s / 500 files |
| Transport | stdio or HTTP (shareable URL) | stdio (local-first) |

They complement each other. The key consequence for a team: graphify is the one worth centralizing (semantic build costs money, do it once); code-review-graph is cheap enough to run locally per dev.


Pick your deployment model first

Decide this before installing: it determines the rest.

| Model | What it is | Best when |
| --- | --- | --- |
| A: Centralized HTTP | One host builds graphify's graph and serves it over a URL; devs point their MCP client at it | You want a true single source of truth, paid semantic extraction done once |
| B: Git-distributed | Commit graphify-out/ to the repo; everyone pulls; a cheap hook keeps it fresh | Simpler, no server to run; each dev has a local copy that syncs via git |
| C: Solo / local | Everything stdio on one machine | Personal use, evaluating the tools |

The original gist is Model C with graphify-out/ gitignored, the most drift-prone option for a team. This guide leads with A, then covers B and C.


Prerequisites

Install uv (manages an isolated tool environment and puts CLIs on PATH; this is what both projects officially recommend, and it avoids the ModuleNotFoundError you get when plain pip lands the package in the wrong environment):

# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows
winget install astral-sh.uv

Python 3.10+ is required (uv can manage that for you too).


Install (the correct way)

# graphify: PyPI package is "graphifyy" (two y's), CLI is "graphify"
uv tool install graphifyy
# add extras only if you need them, e.g. PDFs + office docs + MCP server + sql:
#   uv tool install "graphifyy[pdf,office,mcp,sql]"

# code-review-graph: install with uv and let its MCP config use uvx
uv tool install code-review-graph
# embeddings extra powers semantic_search; recommended:
#   uv tool install "code-review-graph[embeddings]"

Why not pip install? graphify resolves its Python interpreter at runtime from graphify-out/.graphify_python. With plain pip that can point at a different environment than where the package landed, producing ModuleNotFoundError: No module named 'graphify'. uv tool / pipx isolate it and avoid this. code-review-graph similarly states uv gives "the best experience," and its install command auto-detects uvx vs pip to write the right MCP config. pipx install graphifyy / pipx install code-review-graph are fine alternatives if you prefer pipx.

Verify:

graphify --version
code-review-graph --version

Build the graphs

cd /path/to/your/project

# code-review-graph: free, fast, no LLM
code-review-graph build      # ~10s / 500 files

# graphify: code is free (AST); docs/images use your AI tokens.
# Inside Claude Code (uses your session's model, so no API key needed):
#   /graphify .
# Headless / CI (needs a backend + key, see below):
graphify extract . --backend claude        # or gemini / openai / ollama (local) / bedrock
# AST-only refresh, free, no LLM:
graphify update .

graphify writes graphify-out/graph.json (the data), graph.html (interactive viz), and GRAPH_REPORT.md (god nodes, surprising connections, suggested questions).


Model A: Centralized shared graph (the main event)

The goal: build graphify's graph once, serve it from one URL, and have every developer's Claude Code / Cursor / etc. query that URL. No per-dev rebuilds, no drift, paid semantic extraction paid for once.

A1. Serve the graph over HTTP

On the host that holds graph.json:

# Loopback only (put a reverse proxy / tunnel in front, see A3):
python -m graphify.serve graphify-out/graph.json \
  --transport http --host 127.0.0.1 --port 8080 \
  --api-key "$GRAPHIFY_API_KEY" --stateless

Useful flags: --stateless (no per-session state β€” right for load-balanced / CI-served deployments), --json-response (plain JSON instead of SSE, handy if a proxy mishandles streaming), --session-timeout (reap idle sessions). The server exposes query_graph, get_node, get_neighbors, shortest_path, and PR tools.

Containerized (the repo ships a Dockerfile):

docker build -t graphify .
docker run -p 8080:8080 -v "$(pwd)/graphify-out:/data" graphify \
  /data/graph.json --transport http --host 0.0.0.0 --api-key "$GRAPHIFY_API_KEY"

A2. Keep it fresh (this is what makes it a source of truth)

An HTTP server serves a static snapshot. Without a refresh step it silently drifts from the code. Add a CI job that rebuilds on merge to main and redeploys:

# .github/workflows/graph.yml
name: Rebuild knowledge graph
on:
  push:
    branches: [main]
jobs:
  build-graph:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5
      - run: uv tool install "graphifyy[all]"
      # Code is free (AST). Add a backend + key only if you want docs/images too.
      - run: graphify extract . --backend claude
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      # Publish graph.json to wherever your server reads it
      # (artifact, object storage, or scp to the host), then restart/reload the server.
      - uses: actions/upload-artifact@v4
        with:
          name: graph
          path: graphify-out/graph.json

For pure-code graphs you can drop the extract/key step and run graphify update . (free, no LLM), which is cheaper and needs no secrets.
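Because a stale snapshot fails silently, it can help to have a check that notices drift. The sketch below compares the graph file's modification time against the latest commit on the serving host. The function names, the 15-minute grace period, and the idea of running it next to the server are assumptions for illustration; only graphify-out/graph.json comes from the guide.

```python
import os
import subprocess


def last_commit_timestamp(repo_dir: str = ".") -> int:
    """Unix time of the most recent commit on the checked-out branch."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "log", "-1", "--format=%ct"],
        check=True, capture_output=True, text=True,
    )
    return int(out.stdout.strip())


def is_stale(graph_mtime: float, commit_ts: float, grace_seconds: float = 900) -> bool:
    """True if the graph is older than the last commit by more than the grace period."""
    return commit_ts - graph_mtime > grace_seconds


def graph_is_stale(repo_dir: str = ".", graph_path: str = "graphify-out/graph.json") -> bool:
    """Convenience wrapper: read both timestamps from disk and git, then compare."""
    graph_mtime = os.path.getmtime(os.path.join(repo_dir, graph_path))
    return is_stale(graph_mtime, last_commit_timestamp(repo_dir))
```

File mtimes are only meaningful where the file is actually written by the refresh job; a fresh CI checkout resets them, so this belongs on the host that serves the graph.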

A3. Expose it safely

Three ways to get from "listening on the host" to "reachable by the team," easiest first:

  • Tailscale (recommended for a fixed team). Keep --host on the tailnet IP / loopback; devs reach it over the private tailnet. No public exposure, no public TLS to manage.

  • Cloudflare Tunnel / ngrok. cloudflared tunnel --url http://localhost:8080 gives an HTTPS URL with TLS handled for you. Good for quick sharing.

  • Cloud VM + reverse proxy. Bind graphify to loopback, front it with Caddy for automatic Let's Encrypt TLS:

    graph.yourcompany.com {
        reverse_proxy 127.0.0.1:8080
    }

    Keep the firewall closed to 8080; only 443 open. With nginx, disable buffering for SSE (proxy_buffering off;) or run graphify with --json-response.

Always pair --host 0.0.0.0 with --api-key. graph.json is a map of your whole codebase structure, so treat the endpoint as sensitive.

A4. Point each developer's client at the URL

claude mcp add --transport http graphify https://graph.yourcompany.com/mcp \
  --header "Authorization: Bearer $GRAPHIFY_API_KEY"

That's the whole per-dev setup for graphify; no local install needed.
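Before handing the URL to the team, a quick smoke test from a developer machine can confirm that TLS, the proxy, and the bearer token all line up. The sketch below builds a standard MCP initialize request (JSON-RPC over HTTP POST) with the same Authorization header as the claude mcp add command above. It assumes the server speaks MCP's streamable HTTP transport at the /mcp path; the client name and protocol version string are illustrative choices, not values taken from graphify.

```python
import json
import urllib.request


def build_initialize_request(url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an MCP `initialize` POST with a bearer token."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",   # assumed; use what your client sends
            "capabilities": {},
            "clientInfo": {"name": "graph-smoke-test", "version": "0.0.1"},
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer with JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
            "Authorization": f"Bearer {api_key}",
        },
    )


def smoke_test(url: str, api_key: str, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status (urllib raises on 4xx/5xx)."""
    request = build_initialize_request(url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A 401 or 403 from smoke_test points at the key or proxy rules; a timeout points at the network path (tailnet, tunnel, firewall).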

A5. code-review-graph in a centralized world

code-review-graph is local-first (stdio) and free/fast to build, so the pragmatic pattern is: each dev runs it locally, or you commit its built graph as an artifact. Don't try to force it behind the shared URL; that's not the model it's designed for. Per-dev local config:

.mcp.json (committed): local code-review-graph, with graphify added remotely. The CRG_TOOLS value trims ~25 tools down to the 8 that cover ~95% of use, cutting ~4,200 tokens of always-loaded schema overhead per session. That note sits here rather than in the file because JSON has no comment syntax.

{
  "mcpServers": {
    "code-review-graph": {
      "type": "stdio",
      "command": "uvx",
      "args": ["--with", "code-review-graph[embeddings]", "code-review-graph", "serve"],
      "env": {
        "CRG_TOOLS": "semantic_search_nodes_tool,query_graph_tool,get_impact_radius_tool,traverse_graph_tool,list_communities_tool,get_community_tool,get_review_context_tool,list_graph_stats_tool"
      }
    }
  }
}

(graphify is added separately via claude mcp add --transport http …, so it lives in the user/global config with the URL + bearer token, not in the committed .mcp.json.)
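Since .mcp.json is committed and shared, a small check can catch two easy mistakes: a file that is not strict JSON, and a CRG_TOOLS list that has drifted from the intended eight entries. This is a sketch using only the standard library; the helper names are hypothetical, and whether a given MCP client tolerates // comments is not something to rely on.

```python
import json

CRG_TOOLS = (
    "semantic_search_nodes_tool,query_graph_tool,get_impact_radius_tool,"
    "traverse_graph_tool,list_communities_tool,get_community_tool,"
    "get_review_context_tool,list_graph_stats_tool"
)


def load_mcp_config(text: str) -> dict:
    """Parse strictly: json.loads rejects // comments and trailing commas."""
    return json.loads(text)


def allowed_tools(config: dict, server: str = "code-review-graph") -> list[str]:
    """Split the server's CRG_TOOLS env value into individual tool names."""
    raw = config["mcpServers"][server].get("env", {}).get("CRG_TOOLS", "")
    return [name.strip() for name in raw.split(",") if name.strip()]


# Same shape as the committed config above, built in memory for the demo.
example = json.dumps({
    "mcpServers": {
        "code-review-graph": {
            "type": "stdio",
            "command": "uvx",
            "args": ["--with", "code-review-graph[embeddings]",
                     "code-review-graph", "serve"],
            "env": {"CRG_TOOLS": CRG_TOOLS},
        }
    }
})

tools = allowed_tools(load_mcp_config(example))
print(len(tools), "tools allowed")
```

In a real repo you would read the file with open(".mcp.json") and run the same two calls in CI or a pre-commit check.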


Model B: Git-distributed (no server)

The simplest team option. graphify's own README endorses committing graphify-out/.

  1. One person builds the graph and commits graphify-out/ (it's portable: keys are stored as relative paths).
  2. Everyone pulls; their assistant reads the graph immediately.
  3. graphify hook install sets up a post-commit rebuild (AST-only, free) and a git merge driver so two devs committing in parallel get their graph.json union-merged instead of conflicting.
  4. When docs/images change, someone runs /graphify . --update (the only step that costs tokens).
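To make "union-merged instead of conflicting" concrete, here is the idea in miniature. The schema below (a "nodes" list keyed by "id" and an "edges" list keyed by source, target, and relation) is a hypothetical stand-in chosen for illustration; it is not graphify's actual graph.json format, and this is not its merge driver.

```python
def union_merge(ours: dict, theirs: dict) -> dict:
    """Union two graph snapshots; on an id collision, keep our version."""
    nodes = {node["id"]: node for node in ours.get("nodes", [])}
    for node in theirs.get("nodes", []):
        nodes.setdefault(node["id"], node)

    def edge_key(edge: dict) -> tuple:
        # An edge is identified by its endpoints plus its (optional) relation.
        return (edge["source"], edge["target"], edge.get("relation"))

    edges = {edge_key(edge): edge for edge in ours.get("edges", [])}
    for edge in theirs.get("edges", []):
        edges.setdefault(edge_key(edge), edge)

    return {"nodes": list(nodes.values()), "edges": list(edges.values())}


ours = {"nodes": [{"id": "a"}, {"id": "b"}],
        "edges": [{"source": "a", "target": "b", "relation": "calls"}]}
theirs = {"nodes": [{"id": "b"}, {"id": "c"}],
          "edges": [{"source": "b", "target": "c", "relation": "imports"}]}

merged = union_merge(ours, theirs)
print(sorted(node["id"] for node in merged["nodes"]))
```

A union can only add: it never removes nodes that one side deleted, which is why a periodic full rebuild still matters.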

.gitignore for this model:

graphify-out/cost.json      # local token accounting only
# graphify-out/cache/       # optional: commit for build speed, or skip to keep the repo small

MCP config is the local stdio form for both tools (see Model C).


Model C: Solo / local

Everything on one machine, stdio. Commit .mcp.json so the config travels with the repo.

.mcp.json:
{
  "mcpServers": {
    "code-review-graph": {
      "type": "stdio",
      "command": "uvx",
      "args": ["--with", "code-review-graph[embeddings]", "code-review-graph", "serve"],
      "env": { "CRG_TOOLS": "semantic_search_nodes_tool,query_graph_tool,get_impact_radius_tool,traverse_graph_tool,list_communities_tool,get_community_tool,get_review_context_tool,list_graph_stats_tool" }
    },
    "graphify": {
      "type": "stdio",
      "command": "graphify",
      "args": ["serve", "graphify-out/graph.json"]
    }
  }
}

Then make your assistant prefer the graph over grep:

graphify claude install      # adds a PreToolUse hook + CLAUDE.md guidance (Claude Code)
code-review-graph install    # auto-detects your editors and writes their MCP config
# (cursor / gemini / codex / etc. variants exist for both)

Restart your editor; MCP servers load on startup.


Wiring it into everything-claude-code (ECC) reviewers

If you use affaan-m/everything-claude-code, its reviewer subagents are where these graphs pay off most: review is exactly the "understand the blast radius of a change" task code-review-graph is built for. But there's one rule that decides whether it helps or just sits idle:

A subagent can only call tools listed in its tools: frontmatter. ECC's shipped reviewers list ["Read", "Grep", "Glob", "Bash"] and no MCP tools, so they'll keep grepping and never touch the graph until you grant the tool names.

(If a subagent omits tools: entirely it inherits everything, including MCP, but then it also inherits Write/Edit, which you usually don't want on a reviewer. An explicit read-only-plus-graph allowlist is cleaner.)

How MCP tools are named (and granted)

In Claude Code an MCP tool is mcp__<server>__<tool>. Server names here: code-review-graph and graphify (the HTTP server in Model A or local stdio in B/C; same name either way, so grants don't change between models). Two ways to put them in a tools: allowlist:

  • Server wildcard (recommended): mcp__code-review-graph__* grants every tool that server exposes. Since you already capped that server to ~8 tools with CRG_TOOLS, the wildcard stays lean and you don't have to track exact tool ids.
  • Exact ids: list tools individually for the tightest set, but verify the ids first via /agents (tool-access view) or the server's tool list; suffixes like _tool vary between versions, so don't guess them.

There's also a denylist: disallowedTools: Write, Edit, NotebookEdit inherits every session tool (including all MCP) minus file writes. It is the simplest one-liner for a read-only reviewer, at the cost of loading every MCP schema into that agent.

Plugin install: shadow the agent, don't edit it

Installing ECC as a plugin changes where your edits go. By Claude Code's scope precedence, plugin agents rank lowest:

| Scope | Priority |
| --- | --- |
| Managed (org) settings | 1 (highest) |
| .claude/agents/ (project) | 3 |
| ~/.claude/agents/ (user) | 4 |
| Plugin agents/ | 5 (lowest) |

Two consequences: an edit to the plugin's cached copy is overwritten on update, and plugin agents ignore the mcpServers, hooks, and permissionMode frontmatter fields for security. So don't edit ECC's file; shadow it: create a same-name agent in a higher-priority scope and yours wins.

  1. Get the body: run /agents → Library → open ecc:code-reviewer to view it, or copy agents/code-reviewer.md from the ECC repo on GitHub.
  2. Save it to ~/.claude/agents/code-reviewer.md (all projects) or .claude/agents/code-reviewer.md (this repo; check it into git so the team gets it), keeping name: code-reviewer.
  3. Add the grant and the body nudge (below).

Your copy at priority 3–4 overrides the plugin's code-reviewer (priority 5) for that name; the plugin original stays reachable as ecc:code-reviewer if you want it. Prefer not to rely on override resolution? Name yours distinctly (e.g. code-reviewer-graph) and either @-mention it explicitly or disable the plugin's with "permissions": { "deny": ["Agent(code-reviewer)"] } in settings.json.
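The shadowing rule reduces to "lowest priority number wins among the scopes that define a given name." The toy model below only illustrates that rule using the priorities listed above; it is not Claude Code's resolver, and the scope labels are shorthand invented for the example.

```python
from typing import Optional

# Priorities as listed in the guide (lower number wins).
SCOPE_PRIORITY = {"managed": 1, "project": 3, "user": 4, "plugin": 5}


def winning_scope(definitions: dict[str, list[str]], agent_name: str) -> Optional[str]:
    """Return the scope whose definition of `agent_name` takes effect.

    `definitions` maps an agent name to the scopes that define it.
    """
    scopes = definitions.get(agent_name, [])
    if not scopes:
        return None
    return min(scopes, key=SCOPE_PRIORITY.__getitem__)


# ECC ships code-reviewer as a plugin agent; you add a same-name user copy.
definitions = {
    "code-reviewer": ["plugin", "user"],
    "planner": ["plugin"],
}

print(winning_scope(definitions, "code-reviewer"))  # user (4 beats 5)
print(winning_scope(definitions, "planner"))        # plugin (only definition)
```

Adding a project-level copy (priority 3) would in turn override the user-level one, which is what makes the checked-in variant the team default.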

Agents created or edited through /agents take effect immediately; files dropped on disk need a session restart.

~/.claude/agents/code-reviewer.md: paste ECC's body, then set the grant:

---
name: code-reviewer
description: Reviews code for quality, security, and maintainability
model: opus
# Read-only reviewer + the already-trimmed code-review-graph toolset:
tools: Read, Grep, Glob, Bash, mcp__code-review-graph__*
# Simplest alternative (inherit everything except writes):
# disallowedTools: Write, Edit, NotebookEdit
---
# ...ECC's existing reviewer prompt, plus near the top...
# Before reading files, call get_review_context on the changed files for the
# minimal affected set + risk score. Use semantic_search to locate symbols
# instead of Grep, and get_impact_radius before calling a change safe.

~/.claude/agents/security-reviewer.md: same pattern; the security reviewer benefits from graphify's semantic layer too, so grant both servers:

tools: Read, Grep, Glob, Bash, mcp__code-review-graph__*, mcp__graphify__query_graph, mcp__graphify__get_neighbors
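If several reviewer files need the same treatment, a tiny lint can confirm each one has the grant and stays read-only. The sketch below reads only the tools: line from the frontmatter and accepts both the comma form shown above and the bracketed list form ECC ships. The function names and problem messages are hypothetical, and it deliberately ignores disallowedTools.

```python
WRITE_TOOLS = {"Write", "Edit", "NotebookEdit"}


def parse_tools(agent_markdown: str) -> list[str]:
    """Return the entries of the frontmatter `tools:` line, or [] if absent."""
    lines = agent_markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return []                       # no frontmatter at all
    for line in lines[1:]:
        if line.strip() == "---":       # end of frontmatter
            break
        if line.startswith("tools:"):
            raw = line[len("tools:"):].strip().strip("[]")
            return [t.strip().strip("\"'") for t in raw.split(",") if t.strip()]
    return []


def review_grant_problems(agent_markdown: str, server: str = "code-review-graph") -> list[str]:
    """List reasons this agent file would not behave as a graph-aware reviewer."""
    tools = parse_tools(agent_markdown)
    problems = []
    if not tools:
        problems.append("no tools: line (agent inherits everything, including writes)")
    if WRITE_TOOLS & set(tools):
        problems.append("reviewer can write files")
    if tools and not any(t.startswith(f"mcp__{server}__") for t in tools):
        problems.append(f"no mcp__{server}__ tool granted")
    return problems


granted = "---\nname: code-reviewer\ntools: Read, Grep, Glob, Bash, mcp__code-review-graph__*\n---\nbody"
shipped = '---\nname: code-reviewer\ntools: ["Read", "Grep", "Glob", "Bash"]\n---\nbody'

print(review_grant_problems(granted))
print(review_grant_problems(shipped))
```

An empty list means the file passes; the shipped ECC form is flagged because it grants no graph tools.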

Which server for which agent

| ECC agent | Grant | Why |
| --- | --- | --- |
| code-reviewer | mcp__code-review-graph__* | Review context + risk + impact radius in one place |
| security-reviewer | mcp__code-review-graph__* (+ a couple graphify tools) | Trace data flow / callers across modules to follow a vuln |
| database-reviewer, python-reviewer | mcp__code-review-graph__* | Locate definitions + dependents in their domain |
| architect, planner | mcp__graphify__* | Design work wants graphify's semantic + docs layer, not line-level AST |

Granting ≠ using

The tools: line only gives permission. To change behavior, nudge the agent's body (the prompt you pasted). Add something like:

Before reading files, call get_review_context on the changed files to get the minimal affected set and a risk score. Use semantic_search to locate symbols instead of Grep, and get_impact_radius before claiming a change is safe.

Belt-and-suspenders: graphify claude install also adds a PreToolUse hook that intercepts grep/find/rg (including inside subagent Bash calls) and steers toward the graph. That hook lives in your settings, not the agent, so the plugin's frontmatter restrictions don't touch it; it works regardless of how the reviewers are installed.

Cost note specific to a fleet of agents

Each subagent runs in its own context window, so the code-review-graph tool schemas load per reviewer granted them. The two levers work together: server-side CRG_TOOLS caps the universe at ~8 tools, and mcp__code-review-graph__* in each agent then grants exactly those 8, lean by construction. Skip CRG_TOOLS and the wildcard pulls in all ~25 per agent, and the savings evaporate.

Verify

Run a reviewer on a branch with changes and watch the tool calls: success is an early get_review_context call instead of a burst of Read/Grep. Confirm your shadow agent is the active one: the /agents Library tab marks which definition wins when duplicates exist. If it still greps, either your copy isn't overriding (check the name and scope) or you granted the tools but didn't nudge the body.


Auto-update: do it safely

The original gist runs updates synchronously and unguarded in post-commit. On a large repo, rapid commits (rebases, amend chains) can pile up CPU-heavy rebuilds and stall your machine. Two rules fix this:

  1. code-review-graph update is fast (~0.4s), so it is safe to run after each AI turn (Claude Stop hook) and on commit.
  2. graphify update is slow (~10s+), so keep it out of fast hooks; run it only from git hooks, backgrounded, with a guard.

A safer post-commit hook (background + de-dupe; for the full cross-platform CPU/RAM guard see the dev.to writeup linked at the bottom):

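#!/bin/sh
# .git/hooks/post-commit  (chmod +x); the shebang has to be the first line
# code-review-graph: fast incremental update
if command -v code-review-graph >/dev/null 2>&1 && [ -d .code-review-graph ]; then
  nohup sh -c 'code-review-graph update --skip-flows && code-review-graph embed' \
    >/dev/null 2>&1 < /dev/null &
fi
# graphify: slow, so background it and skip if an update is already running.
# Match "graphify update" rather than bare "graphify" so a running
# `graphify serve` does not block rebuilds; redirecting pgrep's output
# works on systems whose pgrep lacks -q.
if command -v graphify >/dev/null 2>&1 && ! pgrep -f 'graphify update' >/dev/null 2>&1; then
  mkdir -p "$HOME/.cache"   # the log directory may not exist yet
  nohup graphify update . >"$HOME/.cache/graphify-rebuild.log" 2>&1 < /dev/null &
fi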

For the AI-turn update, add a Stop hook (Claude Code) running code-review-graph only, PID-guarded so turns don't overlap. graphify never belongs in Stop/PostToolUse.
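One way to get the PID guard is a small wrapper script that the Stop hook's command points at. The sketch below is POSIX-only (it probes liveness with signal 0, which is not safe on Windows), the lock path is an arbitrary choice, and the check-then-write step is not atomic, so treat it as a best-effort guard rather than a strict lock. The only command it runs is code-review-graph update, as named above.

```python
import os
import subprocess
import tempfile

# Hypothetical lock location; any path private to your user works.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "crg-update.pid")


def pid_alive(pid: int) -> bool:
    """POSIX liveness probe: signal 0 checks existence without signalling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True        # exists, but owned by another user
    return True


def acquire(lock_path: str = LOCK_PATH) -> bool:
    """Take the lock unless a live process already holds it."""
    try:
        with open(lock_path) as handle:
            if pid_alive(int(handle.read().strip())):
                return False
    except (FileNotFoundError, ValueError):
        pass               # no lock file, or a garbled one: treat as free
    with open(lock_path, "w") as handle:
        handle.write(str(os.getpid()))
    return True


def release(lock_path: str = LOCK_PATH) -> None:
    """Remove the lock file if it is still there."""
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        pass


def main() -> int:
    if not acquire():
        return 0           # a previous turn's update is still running; skip
    try:
        return subprocess.run(["code-review-graph", "update"]).returncode
    finally:
        release()
```

Saved as a script that calls main(), this skips quietly when turns overlap instead of queueing a second update. A stale lock left by a crashed run is reclaimed on the next turn because its PID no longer exists.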

In Model A, ignore most of this on the dev side: freshness is the CI job's job (A2). Devs only need the fast local code-review-graph hook.


Token economics, honestly

  • code-review-graph's MCP server loads ~25 tool schemas by default (~6,000 tokens, every session). The CRG_TOOLS allow-list above cuts that to ~1,800. Do this: it's the single highest-leverage tweak the original gist omits.
  • After cold start those schemas are served from prompt cache (~10× cheaper), so the overhead pays back quickly on multi-question sessions.
  • graphify's broad query (BFS) returns ~1,500 tokens regardless of specificity: great for "orient me in an unfamiliar module," wasteful for "where is function X." Use code-review-graph's semantic_search for symbol lookups; reach for graphify query/path for exploration.
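The schema numbers above compound once several subagents each load the server. A short worked calculation, using the guide's ~6,000 and ~1,800 token figures; the fleet size of four reviewers is a hypothetical example.

```python
DEFAULT_SCHEMA_TOKENS = 6_000   # ~25 tool schemas, no CRG_TOOLS
TRIMMED_SCHEMA_TOKENS = 1_800   # ~8 tool schemas with CRG_TOOLS set


def fleet_overhead(agents: int, schema_tokens: int) -> int:
    """Schema tokens loaded when each agent has its own context window."""
    return agents * schema_tokens


per_session_saving = DEFAULT_SCHEMA_TOKENS - TRIMMED_SCHEMA_TOKENS

AGENTS = 4  # hypothetical: four reviewers granted the server
untrimmed = fleet_overhead(AGENTS, DEFAULT_SCHEMA_TOKENS)
trimmed = fleet_overhead(AGENTS, TRIMMED_SCHEMA_TOKENS)

print(f"saving per session: {per_session_saving} tokens")
print(f"fleet of {AGENTS}: {untrimmed} vs {trimmed} tokens "
      f"({untrimmed - trimmed} saved)")
```

The per-session difference is the ~4,200 tokens quoted elsewhere in this guide, multiplied by however many agents load the schemas.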

Security checklist (Model A)

  • --api-key set whenever the server is reachable beyond loopback
  • TLS in front (tunnel or reverse proxy); never a raw http:// public URL
  • Prefer a private network (Tailscale) or IP allowlist over open internet for a fixed team
  • Firewall closed to the raw graphify port; only the proxy port open
  • Treat graph.json as sensitive (it maps your codebase); restrict who can pull it
  • graphify query logging writes to ~/.cache/graphify-queries.log; set GRAPHIFY_QUERY_LOG_DISABLE=1 if that's unwanted

Troubleshooting

| Problem | Fix |
| --- | --- |
| graphify: command not found | Use uv tool install graphifyy / pipx install graphifyy (both fix PATH), or python -m graphify |
| ModuleNotFoundError: graphify | Classic plain-pip symptom; reinstall via uv tool / pipx |
| /graphify . errors in PowerShell | Use graphify . (the leading slash is a path separator on Windows) |
| MCP tools don't appear | Restart the editor; MCP loads at startup |
| Remote graphify works in curl but not in editor (SSE) | Disable proxy buffering, or serve with --json-response |
| graph.json has conflict markers (Model B) | graphify hook install sets up a union-merge driver |
| Central graph is stale | Your CI refresh (A2) isn't firing / not redeploying the file |
| code-review-graph "built on a different branch" | code-review-graph build to rebuild |

What changed from the original gist

  • Installs fixed to uv tool / uvx (both projects' official recommendation); pip flagged as the cause of the common ModuleNotFoundError.
  • Centralized HTTP model added as the headline: the original was local-stdio-only with graphify-out/ gitignored (max drift for a team).
  • Honest token framing: "500x" demoted to a benchmark artifact; realistic 5–10× stated.
  • CRG_TOOLS allow-list added: cuts ~4,200 tokens/session of schema overhead.
  • Hooks hardened: backgrounded + de-duplicated; graphify kept out of fast/AI-turn hooks; CI owns freshness in Model A.
  • Honest tool split: graphify centralized (paid semantic build, HTTP-capable); code-review-graph local-first (free AST, stdio).
  • everything-claude-code integration added: exactly which reviewer agent files to edit and which MCP tool names to grant each, plus the "granting ≠ using" nudge.

Sources
