title	Agentic DS Framework
project	discord
domain	agentic-ds
status	active
timing	Q2 2026
context	CTO-down initiative on DS role in agentic era
research	kampus/ds-as-claude-md

Agentic DS Framework

Thesis

A design system is a constraint manifest agents read before generating UI.

In agentic work, the durable readable substrate is the first-class primitive. Everything else is either a projection of the substrate or exhaust. The pattern repeats across domains:

Domain	Substrate	Projection	Exhaust
Workflow coordination	vault (state as files, workflow.json, manifest.json)	dashboard, notifications	chat transcript
Design coordination	DS (tokens, components, constraints, intent docs)	Figma file, mockup	pixel-pushing
Code conventions	CLAUDE.md	IDE view, linter output	individual edits

When designers drive Claude Code for prototyping, the design system IS the constraint manifest that scopes output to brand quality. No DS → generic AI slop. Good DS → output that feels like it came from the team.

This pattern is already shipping at Atlassian (70% DS accuracy in one pass), Spotify ("good for machines, not just humans"), Uber (weeks → minutes), Indeed (4-stage MDX→JSON pipeline), Shopify (Polaris MCP + validation). Discord's DS team already practices substrate-first design through the a11y automation project — the opportunity is applying that pattern to designer-agent sessions.

Death of Figma = Promotion of DS

Figma's core value: visual spec → implementation handoff. If designers drive Claude Code directly, the handoff is gone. Figma becomes a scratchpad, not the source of truth.

This is not the death of design systems. It's a promotion. The DS was always the constraint manifest underneath Figma. Now it's the primary artifact an agent reads before producing anything. The DS team shifts from "component librarians" to "authors of the constraint manifest every designer-driven agent reads."

Spotify's framing is the forcing function: "Developers consult AI agents before checking design documentation, potentially bypassing the design system entirely." If the DS isn't in the agent's context at the moment of generation, it doesn't exist.

What happens without active constraints (the Mirumee cautionary tale): even with full Figma DS access, Cursor "regularly decided to improvise" — unauthorized styling, inconsistent variants, phantom spacing rules. Passive documentation doesn't prevent drift. Only active constraint encoding does.

Three-tier architecture for agent-readable DS

Research converges on a layered approach. CLAUDE.md works best under 300 lines, and frontier LLMs follow instructions less reliably past roughly 150-200 of them. A full DS can't fit in one file. Atlassian discovered this independently.

Tier	What	How it loads	Discord path
1. Static context	DESIGN.md (token tables, core rules) + CLAUDE.md (behavioral instructions, short)	Always loaded, human-written	Write DESIGN.md + CLAUDE.md for DS monorepo
2. On-demand context	Storybook MCP manifests (~200 tokens/component), component registries, skill files	Queried when relevant	Retrofit existing Storybook stories → auto-generated manifests (lowest effort, highest impact)
3. Validation	Compliance-checking scripts. Agents verify their own work against DS rules.	Post-generation	Build DS-compliance evaluator (Carbon's `validateComponent` and Shopify's `validate_graphql_codeblocks` are the only shipped examples)
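
As a rough illustration of the Tier 3 idea, the sketch below flags hard-coded color and pixel values in generated code that should have come from tokens. Everything in it is hypothetical: `findViolations`, the `Violation` shape, and the two regex rules are assumptions made for illustration. It is not Carbon's `validateComponent`, Shopify's validator, or any existing Discord tooling.

```typescript
// Hypothetical Tier 3 check: flag raw values that should be design tokens.
// The rule set and all names here are illustrative assumptions.

type Violation = {
  line: number; // 1-indexed line in the generated source
  rule: "raw-color" | "raw-spacing";
  match: string; // the offending literal
};

const RULES: { rule: Violation["rule"]; pattern: RegExp }[] = [
  // Hex color literals such as #fff or #5865F2
  { rule: "raw-color", pattern: /#[0-9a-fA-F]{3,8}\b/g },
  // Pixel literals such as 12px
  { rule: "raw-spacing", pattern: /\b\d+px\b/g },
];

function findViolations(source: string): Violation[] {
  const violations: Violation[] = [];
  source.split("\n").forEach((text, index) => {
    for (const { rule, pattern } of RULES) {
      // With the g flag, match() returns every occurrence on the line (or null).
      for (const match of text.match(pattern) ?? []) {
        violations.push({ line: index + 1, rule, match });
      }
    }
  });
  return violations;
}
```

An agent could run a check like this on its own output and revise until the list is empty. A real evaluator would need the actual token set and far more rules than two regexes.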

Key insight: imperative instructions are applied 94% of the time vs 73% for descriptive ones. "Always use spacing tokens from tokens.ts" beats "The project uses a spacing scale." Front-load commands and put warnings last, because models attend least reliably to the middle of a long context (the Lost in the Middle effect). 50-line focused files outperform 500-line comprehensive ones.
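
The guidance above could be checked mechanically before a context file is committed. The sketch below is a heuristic only: `lintStaticContext`, the `LintReport` shape, and the list of imperative openers are assumptions, and the 300-line budget is taken from the figure quoted earlier in this section.

```typescript
// Hypothetical lint for a Tier 1 context file such as CLAUDE.md.
// The opener list is a crude assumption, not a validated classifier.
const MAX_LINES = 300;
const IMPERATIVE_OPENERS = /^(always|never|use|do not|don't|prefer|import|run)\b/i;

type LintReport = {
  lineCount: number;
  overBudget: boolean;
  descriptiveRules: string[]; // bullet rules that do not open with a command
};

function lintStaticContext(markdown: string): LintReport {
  const lines = markdown.split("\n");
  // Treat every "- " bullet as one rule.
  const rules = lines
    .map((line) => line.trim())
    .filter((line) => line.startsWith("- "))
    .map((line) => line.slice(2).trim());
  return {
    lineCount: lines.length,
    overBudget: lines.length > MAX_LINES,
    descriptiveRules: rules.filter((rule) => !IMPERATIVE_OPENERS.test(rule)),
  };
}
```

Run against the two example phrasings above, this would pass the "Always use spacing tokens" rule and flag "The project uses a spacing scale." for rewriting.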

Evidence from Q1 2026

All built by this team. All demonstrating the substrate pattern.

Tool / Project	What	Substrate pattern	Status
Claude A11y Automation	P0 project. 340/340 tickets classified (zero failures), 268 automatable. Fix pipeline producing draft PRs.	manifest.json + knowledge.json + classification comments = constraint manifest per ticket. Agent reads contract, doesn't rederive.	M3 active
a11y-Audit Plugin	4-skill plugin (/pull /classify /fix /draft-pr), state machine lifecycle	Notion "Fix Patterns" + "Blockers" DBs as knowledge base. Validation Status enum as failure taxonomy.	Shipped
Operator	XState-driven orchestrator. Executes task lists autonomously via state-ledger CLI.	workflow.json + workflow-state.json + progress.md + operator.md. Resumability via world state, not checkpoints.	Shipped
Vault workflow skills	5 composable skills (/write-a-prd → /prd-to-tasks → /tdd → /do-work). End-to-end pipeline.	Each skill emits machine-readable output for the next. PRD → tasks.md → progress.md. Contract-based handoffs.	Shipped
Asana CLI	Effect-based typed CLI for Asana API. Self-directed, nobody asked for it.	Stateless, deterministic. Replaced slow MCP with direct CLI for agent speed.	Shipped (PR #275653, 84 tests)
Notify CLI	Discord webhooks at every operator lifecycle event.	Reads filesystem, operator just says what happened. Observation decoupled from agent.	Shipped
/rework skill	Built after a11y pipeline surfaced "Has Bugs" tickets with no path back. +49% pass rate.	Self-improving loop: pipeline surfaces gap → new skill closes it → loop improves.	Shipped
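
"Resumability via world state, not checkpoints" can be sketched as a pure function over what is already persisted. The real workflow.json and workflow-state.json schemas are not described in this memo, so the `Task`, `Workflow`, and `WorkflowState` shapes and the `nextTask` function below are assumptions for illustration.

```typescript
// Hypothetical shapes; the real file schemas are not given in this memo.
type Task = { id: string; dependsOn: string[] };
type Workflow = { tasks: Task[] };
type WorkflowState = { completed: string[] };

// Derive the next runnable task purely from persisted state:
// the first task that is not done and whose dependencies are all done.
function nextTask(workflow: Workflow, state: WorkflowState): Task | undefined {
  const done = new Set(state.completed);
  return workflow.tasks.find(
    (task) => !done.has(task.id) && task.dependsOn.every((dep) => done.has(dep)),
  );
}
```

Because the function takes no in-memory history, a restarted run resumes by re-reading the same files and calling it again.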

Industry evidence

Company	What they shipped	Result
Atlassian	20K lines of MCP-served DS guidance + pre-coded templates	70% accuracy → "nearly zero" hallucinations with templates. 4-5 days → 20 min prototyping.
Spotify	"Good for machines, not just humans" initiative	Developers consult AI agents before checking DS docs, so the DS must be agent-visible or it doesn't exist.
Uber	uSpec — AI-driven design spec automation	Weeks → minutes
Indeed	AIMS — 4-stage MDX→JSON metadata pipeline	New role title: "Senior UX Designer, Design Systems, AI Enablement"
Shopify	Polaris web components + MCP + `validate_graphql_codeblocks`	Validation layer: agents check their own compliance
IBM Carbon	`validateComponent` via MCP	Only other shipped example of agent self-verification against DS
shadcn	registry-item.json schema	Becoming cross-tool lingua franca for DS→AI context passing

Market signal: IDS 2026 sold out — 1,000 attendees, 21 speakers from WhatsApp, Adobe, Figma, Miro, Atlassian, GitHub. This is mainstream, not fringe.

Cross-cutting patterns

Durable readable substrate as operative primitive. Every system treats the filesystem as single source of truth. Agents read structured files, not accumulated context.
Contract-based agent handoffs. Each artifact is machine-readable; downstream agents parse deterministically. Not prose handoffs.
Self-improving loops via tooling gaps. a11y pipeline → "Has Bugs" gap → /rework skill. Operator → no visibility → notify CLI. Asana MCP slow → Effect CLI. Pattern: observation of gap → new tool → loop improves.
Failure taxonomies with distinct semantics. Not generic "error." Enumerated states with different remediation paths. classification_wrong vs execution_wrong. Validation Status enum.
Observability as first-class feature. notify at every step, state-machine diagrams, session logs as infrastructure.
Validation is the differentiator. Most companies give agents DS data. Carbon and Shopify give agents the ability to verify their own compliance. The leap: "here's our DS" → "here's how to check you used it correctly." Our a11y work does this already (before/after a11y tree verification). Same pattern applies to DS compliance.
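
The failure-taxonomy pattern above maps naturally onto a discriminated union. In this sketch only the two state names `classification_wrong` and `execution_wrong` come from this memo. The `Failure` type, the `ticketId` field, and the remediation strings are assumptions.

```typescript
// Hypothetical failure taxonomy: each state has its own remediation path.
type Failure =
  | { kind: "classification_wrong"; ticketId: string }
  | { kind: "execution_wrong"; ticketId: string };

function remediation(failure: Failure): string {
  switch (failure.kind) {
    case "classification_wrong":
      // The ticket was put in the wrong bucket, so classify it again.
      return `reclassify ${failure.ticketId}`;
    case "execution_wrong":
      // The classification held but the fix did not, so redo the fix.
      return `rework ${failure.ticketId}`;
    default: {
      // Exhaustiveness check: a new kind without a case fails to compile.
      const unreachable: never = failure;
      return unreachable;
    }
  }
}
```

The point of the enumeration is that a new failure state cannot be added without someone deciding what its remediation is.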

What this means for the team

The CTO-down initiative asks "what's DS's role in the agentic era." Our team has already been answering that question in practice for a quarter, through the a11y automation work. We just haven't extracted the general framework yet.

The framework:

DS is the constraint manifest for design in a codebase (same pattern as CLAUDE.md for code, manifest.json for a11y tickets)
Designers using Claude Code need DS to be agent-readable: three-tier architecture (static context + on-demand MCP + validation)
The DS team's role shifts from publishing visual references to authoring and maintaining the substrate agents read. Spotify: "ensure components are in the agent's context." Indeed: new title includes "AI Enablement."
Top-performing designers on the team already use the DS implicitly the way engineers use CLAUDE.md. The opportunity is to extract that implicit practice and make it repeatable.
The a11y automation project is the structural precedent: knowledge base (classify constraints by enforceability) + per-component annotations + linting for automatable subset + metadata for judgment calls. DS teams should reuse this architecture.

Gaps / what's needed

DS readability audit: baseline measurement. Run Claude Code against Discord DS monorepo, measure DS-violation rate with zero context vs with CLAUDE.md. Quantifiable starting point. Identify which 20-30 components are most-misused by agents → priority targets for hand-curated .metadata.ts.
DESIGN.md + CLAUDE.md for DS monorepo: Tier 1 static context. Short, imperative, front-loaded commands. Under 300 lines for CLAUDE.md.
Storybook MCP manifests: Tier 2 on-demand context. Auto-generate from existing stories. ~200 tokens per component. Lowest effort, highest impact retrofit.
DS-compliance evaluator: Tier 3 validation. Generator-evaluator separation (Anthropic's harness pattern). Static agents for rule-checking (typography, spacing, color contrast) + dynamic agents for contextual quality (brand alignment, visual appropriateness). No one has shipped a complete version of this for design yet — Carbon's validateComponent is closest.
Supervision tool: M3 → M4 scaling is blocked without parallelism-preserving observability for operator runs. For designers: 4-up grid generation + visual comparison + DS-compliance evaluation. Nobody has combined these for coding agents. Midjourney's grid, Shuffle's multi-model arena, Figma branching are partial prior art.
Meta-RFC: field report generalizing the a11y pattern into a framework. Grounded in both internal evidence (Q1 2026 work) and external evidence (industry table above).
Designer practice extraction: sit with the top-performing designers, observe their implicit Claude Code workflow, reverse-engineer how they use DS as context. The first interview can be scheduled this week.
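
The baseline measurement in the DS readability audit above reduces to two small computations: a violation rate per condition and a ranking of components by violations. The sketch below assumes a hypothetical `AuditRun` record, one per generated prototype. The field names, `violationRate`, and `mostMisused` are all illustrative, and the violation counts would come from whatever checker the audit ends up using.

```typescript
// Hypothetical audit record: one generated prototype per run.
type AuditRun = {
  condition: "zero-context" | "with-claude-md";
  component: string; // DS component the agent was asked to use
  violations: number; // count reported by the chosen checker
};

// Share of runs under one condition that had at least one violation.
function violationRate(runs: AuditRun[], condition: AuditRun["condition"]): number {
  const subset = runs.filter((run) => run.condition === condition);
  if (subset.length === 0) return 0;
  const failing = subset.filter((run) => run.violations > 0).length;
  return failing / subset.length;
}

// Rank components by total violations to pick curation targets.
function mostMisused(runs: AuditRun[], limit: number): string[] {
  const totals = new Map<string, number>();
  for (const run of runs) {
    totals.set(run.component, (totals.get(run.component) ?? 0) + run.violations);
  }
  return Array.from(totals.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([component]) => component);
}
```

The gap between the two `violationRate` results is the quantifiable starting point. The head of the `mostMisused` list gives the candidates for hand-curated .metadata.ts files.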

Open questions

What's our DS-violation baseline? (Claude Code + monorepo, zero context vs with CLAUDE.md — how big is the gap?)
Which components are most-misused by agents? (priority targets for .metadata.ts curation)
Who's the first designer to interview? What prototype are they currently working on?
Where does the supervision tool live — inside the a11y project scope (serves M3→M4 directly) or standalone?
How do we position this in the CTO initiative: is this memo the pre-read, or do we need a shorter pitch first?
Storybook MCP: does our current Storybook setup support manifest auto-generation, or is there retrofit work?
Timeline: can the framing land before the next CTO touchpoint?

usirin/framing-memo.md
