@usirin
Created April 12, 2026 17:55
Agentic DS Framework — framing memo (working doc)
title: Agentic DS Framework
project: discord
domain: agentic-ds
status: active
timing: Q2 2026
context: CTO-down initiative on DS role in agentic era
research: kampus/ds-as-claude-md

Agentic DS Framework

Thesis

A design system is a constraint manifest agents read before generating UI.

In agentic work, the durable readable substrate is the first-class primitive. Everything that isn't substrate is exhaust. The pattern repeats across domains:

| Domain | Substrate | Projection | Exhaust |
|---|---|---|---|
| Workflow coordination | vault (state as files, workflow.json, manifest.json) | dashboard, notifications | chat transcript |
| Design coordination | DS (tokens, components, constraints, intent docs) | Figma file, mockup | pixel-pushing |
| Code conventions | CLAUDE.md | IDE view, linter output | individual edits |

When designers drive Claude Code for prototyping, the design system IS the constraint manifest that scopes output to brand quality. No DS → generic AI slop. Good DS → output that feels like it came from the team.

This pattern is already shipping at Atlassian (70% DS accuracy in one pass), Spotify ("good for machines, not just humans"), Uber (weeks → minutes), Indeed (4-stage MDX→JSON pipeline), Shopify (Polaris MCP + validation). Discord's DS team already practices substrate-first design through the a11y automation project — the opportunity is applying that pattern to designer-agent sessions.

Death of Figma = Promotion of DS

Figma's core value: visual spec → implementation handoff. If designers drive Claude Code directly, the handoff is gone. Figma becomes a scratchpad, not the source of truth.

This is not the death of design systems. It's their promotion. DS was always the constraint manifest underneath Figma. Now it's the primary artifact an agent reads before producing anything. The DS team shifts from "component librarians" to "authors of the constraint manifest every designer-driven agent reads."

Spotify's framing is the forcing function: "Developers consult AI agents before checking design documentation, potentially bypassing the design system entirely." If the DS isn't in the agent's context at the moment of generation, it doesn't exist.

What happens without active constraints (the Mirumee cautionary tale): even with full Figma DS access, Cursor "regularly decided to improvise" — unauthorized styling, inconsistent variants, phantom spacing rules. Passive documentation doesn't prevent drift. Only active constraint encoding does.

Three-tier architecture for agent-readable DS

Research converges on a layered approach. CLAUDE.md works best under 300 lines, and frontier LLMs follow instructions less reliably once a file carries more than roughly 150-200 of them. A full DS can't fit in one file. Atlassian discovered this independently.

| Tier | What | How it loads | Discord path |
|---|---|---|---|
| 1. Static context | DESIGN.md (token tables, core rules) + CLAUDE.md (behavioral instructions, short) | Always loaded, human-written | Write DESIGN.md + CLAUDE.md for DS monorepo |
| 2. On-demand context | Storybook MCP manifests (~200 tokens/component), component registries, skill files | Queried when relevant | Retrofit existing Storybook stories → auto-generated manifests (lowest effort, highest impact) |
| 3. Validation | Compliance-checking scripts. Agents verify their own work against DS rules. | Post-generation | Build DS-compliance evaluator (Carbon's validateComponent and Shopify's validate_graphql_codeblocks are the only shipped examples) |
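
To make Tier 2 more concrete, here is a minimal TypeScript sketch of what a compact per-component manifest might look like and how it could be checked against the ~200-token budget. The `ComponentManifest` shape, the `toManifestText` and `withinBudget` helpers, the import path, and the 4-characters-per-token estimate are all assumptions for illustration. This is not the Storybook MCP format.

```typescript
// Hypothetical manifest shape; NOT the actual Storybook MCP schema.
interface ComponentManifest {
  name: string;
  importPath: string;
  props: Record<string, string>; // prop name -> short type summary
  rules: string[];               // imperative usage rules
}

// Serialize to the compact text an agent would receive on demand.
function toManifestText(m: ComponentManifest): string {
  const props = Object.keys(m.props)
    .map((prop) => `  ${prop}: ${m.props[prop]}`)
    .join("\n");
  const rules = m.rules.map((r) => `- ${r}`).join("\n");
  return `${m.name} (${m.importPath})\nprops:\n${props}\nrules:\n${rules}`;
}

// Rough heuristic (assumption): about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// True when the serialized manifest fits the per-component budget.
function withinBudget(m: ComponentManifest, budget = 200): boolean {
  return estimateTokens(toManifestText(m)) <= budget;
}

const button: ComponentManifest = {
  name: "Button",
  importPath: "@example/ds/Button", // hypothetical path
  props: { variant: '"primary" | "secondary"', size: '"sm" | "md"' },
  rules: ["Always use variant tokens; never pass raw colors."],
};
```

A budget check like `withinBudget(button)` could run in CI so that auto-generated manifests which grow past the limit get flagged for hand-curation.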

Key insight: imperative phrasing beats descriptive phrasing, with a 94% vs 73% application rate. "Always use spacing tokens from tokens.ts" > "The project uses a spacing scale." Front-load commands, put warnings last (Lost in the Middle effect). 50-line focused files outperform 500-line comprehensive ones.
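
Part of this guidance can be checked mechanically. The sketch below is one possible lint for a Tier 1 context file: it counts lines against the 300-line limit and flags bullet rules that do not open with a command word. The `lintContextFile` function, the `LintResult` shape, and the list of imperative openers are assumptions, and the opener check is a crude heuristic rather than a real grammar test.

```typescript
// Crude heuristic (assumption): a rule counts as imperative if its first
// word is one of these command words.
const IMPERATIVE_OPENERS = new Set([
  "always", "never", "use", "do", "don't", "prefer", "import", "avoid",
]);

interface LintResult {
  lineCount: number;
  overLimit: boolean;
  descriptiveRules: string[]; // bullet lines that do not open with a command
}

// Hypothetical lint for a CLAUDE.md-style file passed in as a string.
function lintContextFile(content: string, maxLines = 300): LintResult {
  const lines = content.split("\n");
  const bullets = lines
    .map((line) => line.trim())
    .filter((line) => line.startsWith("- "))
    .map((line) => line.slice(2));
  const descriptiveRules = bullets.filter((rule) => {
    const firstWord = rule.split(/\s+/)[0].toLowerCase();
    return !IMPERATIVE_OPENERS.has(firstWord);
  });
  return {
    lineCount: lines.length,
    overLimit: lines.length > maxLines,
    descriptiveRules,
  };
}

const sample = [
  "- Always use spacing tokens from tokens.ts",
  "- The project uses a spacing scale.",
].join("\n");

const lintResult = lintContextFile(sample);
```

Here `lintResult.descriptiveRules` holds only the second bullet, which is the descriptive phrasing the memo argues against.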

Evidence from Q1 2026

All built by this team. All demonstrating the substrate pattern.

| Tool / Project | What | Substrate pattern | Status |
|---|---|---|---|
| Claude A11y Automation | P0 project. 340/340 tickets classified (zero failures), 268 automatable. Fix pipeline producing draft PRs. | manifest.json + knowledge.json + classification comments = constraint manifest per ticket. Agent reads contract, doesn't rederive. | M3 active |
| a11y-Audit Plugin | 4-skill plugin (/pull /classify /fix /draft-pr), state machine lifecycle | Notion "Fix Patterns" + "Blockers" DBs as knowledge base. Validation Status enum as failure taxonomy. | Shipped |
| Operator | XState-driven orchestrator. Executes task lists autonomously via state-ledger CLI. | workflow.json + workflow-state.json + progress.md + operator.md. Resumability via world state, not checkpoints. | Shipped |
| Vault workflow skills | 5 composable skills (/write-a-prd → /prd-to-tasks → /tdd → /do-work). End-to-end pipeline. | Each skill emits machine-readable output for the next. PRD → tasks.md → progress.md. Contract-based handoffs. | Shipped |
| Asana CLI | Effect-based typed CLI for Asana API. Self-directed, nobody asked for it. | Stateless, deterministic. Replaced slow MCP with direct CLI for agent speed. | Shipped (PR #275653, 84 tests) |
| Notify CLI | Discord webhooks at every operator lifecycle event. | Reads filesystem, operator just says what happened. Observation decoupled from agent. | Shipped |
| /rework skill | Built after a11y pipeline surfaced "Has Bugs" tickets with no path back. +49% pass rate. | Self-improving loop: pipeline surfaces gap → new skill closes it → loop improves. | Shipped |
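
"Resumability via world state, not checkpoints" can be illustrated with a small sketch: the next task is derived from persisted state every time, so a restarted run needs no separate checkpoint. The `Workflow` and `WorkflowState` shapes and the `nextTask` function are hypothetical; the real workflow.json and workflow-state.json schemas are not shown in this memo, and the task ids only borrow the plugin's skill names for readability.

```typescript
// Hypothetical shapes; the real workflow.json / workflow-state.json schemas
// are not shown in this memo.
type TaskStatus = "pending" | "done" | "failed";

interface Workflow {
  tasks: { id: string; dependsOn: string[] }[];
}

type WorkflowState = Record<string, TaskStatus>;

// Derive the next runnable task purely from persisted state. A restarted
// operator calls this again and continues from whatever the files record.
function nextTask(workflow: Workflow, state: WorkflowState): string | null {
  for (const task of workflow.tasks) {
    const status = state[task.id] ?? "pending";
    if (status !== "pending") continue;
    const ready = task.dependsOn.every((dep) => state[dep] === "done");
    if (ready) return task.id;
  }
  return null;
}

const workflow: Workflow = {
  tasks: [
    { id: "classify", dependsOn: [] },
    { id: "fix", dependsOn: ["classify"] },
    { id: "draft-pr", dependsOn: ["fix"] },
  ],
};
```

With an empty state, `nextTask` picks `classify`; once the state records `classify` as done, it picks `fix`. A failed `fix` blocks `draft-pr`, which surfaces the failure instead of silently skipping it.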

Industry evidence

| Company | What they shipped | Result |
|---|---|---|
| Atlassian | 20K lines of MCP-served DS guidance + pre-coded templates | 70% accuracy → "nearly zero" hallucinations with templates. 4-5 days → 20 min prototyping. |
| Spotify | "Good for machines, not just humans" initiative | "Developers consult AI before checking DS docs." DS must be agent-visible or it doesn't exist. |
| Uber | uSpec: AI-driven design spec automation | Weeks → minutes |
| Indeed | AIMS: 4-stage MDX→JSON metadata pipeline | New role title: "Senior UX Designer, Design Systems, AI Enablement" |
| Shopify | Polaris web components + MCP + validate_graphql_codeblocks | Validation layer: agents check their own compliance |
| IBM Carbon | validateComponent via MCP | Only other shipped example of agent self-verification against DS |
| shadcn | registry-item.json schema | Becoming cross-tool lingua franca for DS→AI context passing |

Market signal: IDS 2026 sold out — 1,000 attendees, 21 speakers from WhatsApp, Adobe, Figma, Miro, Atlassian, GitHub. This is mainstream, not fringe.

Cross-cutting patterns

  1. Durable readable substrate as operative primitive. Every system treats the filesystem as single source of truth. Agents read structured files, not accumulated context.
  2. Contract-based agent handoffs. Each artifact is machine-readable; downstream agents parse deterministically. Not prose handoffs.
  3. Self-improving loops via tooling gaps. a11y pipeline → "Has Bugs" gap → /rework skill. Operator → no visibility → notify CLI. Asana MCP slow → Effect CLI. Pattern: observation of gap → new tool → loop improves.
  4. Failure taxonomies with distinct semantics. Not generic "error." Enumerated states with different remediation paths. classification_wrong vs execution_wrong. Validation Status enum.
  5. Observability as first-class feature. notify at every step, state-machine diagrams, session logs as infrastructure.
  6. Validation is the differentiator. Most companies give agents DS data. Carbon and Shopify give agents the ability to verify their own compliance. The leap: "here's our DS" → "here's how to check you used it correctly." Our a11y work does this already (before/after a11y tree verification). Same pattern applies to DS compliance.
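
Pattern 4 maps naturally onto a discriminated union. The sketch below uses the two failure kinds named above, `classification_wrong` and `execution_wrong`; the fields on each variant and the remediation strings are hypothetical placeholders, not the pipeline's actual data model.

```typescript
// The two kinds come from the memo; the fields and remediation strings are
// hypothetical placeholders.
type Failure =
  | { kind: "classification_wrong"; ticketId: string; expected: string }
  | { kind: "execution_wrong"; ticketId: string; failingCheck: string };

// Each failure kind routes to its own remediation path.
function remediation(f: Failure): string {
  switch (f.kind) {
    case "classification_wrong":
      return `reclassify ${f.ticketId} as ${f.expected}`;
    case "execution_wrong":
      return `rerun fix for ${f.ticketId} (failed: ${f.failingCheck})`;
    default: {
      // Compile-time exhaustiveness check: adding a new kind without a
      // matching case makes this assignment a type error.
      const unreachable: never = f;
      return unreachable;
    }
  }
}
```

The design point is that a generic "error" state cannot drive remediation, while an enumerated kind forces every new failure mode to declare how it gets fixed.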

What this means for the team

The CTO-down initiative asks "what's DS's role in the agentic era." Our team has already been answering that question in practice for a quarter, through the a11y automation work. We just haven't extracted the general framework yet.

The framework:

  • DS is the constraint manifest for design in a codebase (same pattern as CLAUDE.md for code, manifest.json for a11y tickets)
  • Designers using Claude Code need DS to be agent-readable: three-tier architecture (static context + on-demand MCP + validation)
  • The DS team's role shifts from publishing visual references to authoring and maintaining the substrate agents read. Spotify: "ensure components are in the agent's context." Indeed: new title includes "AI Enablement."
  • Top-performer designers on the team are already using DS as CLAUDE.md implicitly. The opportunity is to extract that implicit practice and make it repeatable.
  • The a11y automation project is the structural precedent: knowledge base (classify constraints by enforceability) + per-component annotations + linting for automatable subset + metadata for judgment calls. DS teams should reuse this architecture.

Gaps / what's needed

  • DS readability audit: baseline measurement. Run Claude Code against Discord DS monorepo, measure DS-violation rate with zero context vs with CLAUDE.md. Quantifiable starting point. Identify which 20-30 components are most-misused by agents → priority targets for hand-curated .metadata.ts.
  • DESIGN.md + CLAUDE.md for DS monorepo: Tier 1 static context. Short, imperative, front-loaded commands. Under 300 lines for CLAUDE.md.
  • Storybook MCP manifests: Tier 2 on-demand context. Auto-generate from existing stories. ~200 tokens per component. Lowest effort, highest impact retrofit.
  • DS-compliance evaluator: Tier 3 validation. Generator-evaluator separation (Anthropic's harness pattern). Static agents for rule-checking (typography, spacing, color contrast) + dynamic agents for contextual quality (brand alignment, visual appropriateness). No one has shipped a complete version of this for design yet — Carbon's validateComponent is closest.
  • Supervision tool: M3 → M4 scaling is blocked without parallelism-preserving observability for operator runs. For designers: 4-up grid generation + visual comparison + DS-compliance evaluation. Nobody has combined these for coding agents. Midjourney's grid, Shuffle's multi-model arena, Figma branching are partial prior art.
  • Meta-RFC: field report generalizing the a11y pattern into a framework. Grounded in both internal evidence (Q1 2026 work) and external evidence (industry table above).
  • Designer practice extraction: sit with the top-performer designers, observe their implicit Claude Code workflow, reverse-engineer how they use DS as context. First interview is schedulable this week.
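
The readability audit and the static half of the Tier 3 evaluator could start from something as small as the sketch below: a rule-based checker that flags raw values where tokens are expected, plus a violation rate for comparing zero-context runs against runs with CLAUDE.md. The two rules, the `checkCompliance` and `violationRate` names, the sample snippets, and the token names in them are assumptions for illustration, not Discord's actual DS rules or measured results.

```typescript
interface Violation {
  rule: "raw-color" | "raw-spacing";
  match: string;
}

// Hypothetical rules: raw hex colors and raw px literals count as violations
// because the DS is assumed to require tokens instead.
const RULES: { rule: Violation["rule"]; pattern: RegExp }[] = [
  { rule: "raw-color", pattern: /#[0-9a-fA-F]{3,8}\b/g },
  { rule: "raw-spacing", pattern: /\b\d+px\b/g },
];

// Static evaluator: runs separately from whatever generated the source.
function checkCompliance(source: string): Violation[] {
  const violations: Violation[] = [];
  for (const { rule, pattern } of RULES) {
    for (const match of source.match(pattern) ?? []) {
      violations.push({ rule, match });
    }
  }
  return violations;
}

// Share of generated samples that contain at least one violation.
function violationRate(samples: string[]): number {
  if (samples.length === 0) return 0;
  const failing = samples.filter((s) => checkCompliance(s).length > 0).length;
  return failing / samples.length;
}

// Illustrative samples only; not real audit data.
const zeroContext = [
  '<div style={{ color: "#5865F2", padding: "12px" }} />',
  '<Button variant="primary" />',
];
const withContext = [
  "<div style={{ color: tokens.brand, padding: tokens.space3 }} />",
  '<Button variant="primary" />',
];
```

Comparing `violationRate(zeroContext)` with `violationRate(withContext)` gives the gap the audit is meant to quantify. Contextual quality such as brand alignment is out of scope for a regex checker and would need the dynamic evaluator described above.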

Open questions

  • What's our DS-violation baseline? (Claude Code + monorepo, zero context vs with CLAUDE.md — how big is the gap?)
  • Which components are most-misused by agents? (priority targets for .metadata.ts curation)
  • Who's the first designer to interview? What prototype are they currently working on?
  • Where does the supervision tool live — inside the a11y project scope (serves M3→M4 directly) or standalone?
  • How do we position this in the CTO initiative: is this memo the pre-read, or do we need a shorter pitch first?
  • Storybook MCP: does our current Storybook setup support manifest auto-generation, or is there retrofit work?
  • Timeline: can framing land before next CTO touchpoint?