@rpeck
Created December 19, 2025 21:41
Using Andrej Karpathy's `LLM Council` pattern to improve a Claude command that proposes updates to CLAUDE.md

wut

There's a discussion here on Reddit about using Claude to incrementally improve CLAUDE.md based on the current context.

I had Claude argue with itself in order to improve the prompt I got from there. This follows Andrej Karpathy's LLM Council pattern. I find that having Claude argue with Gemini works best (e.g., with the gemini-collab MCP server), but here I just let it argue with itself.

You can either grab the improved prompt in self-improve.md, or redo this yourself. To redo it, first grab a-c-m's tweak from https://gist.github.com/a-c-m/f4cead5ca125d2eaad073dfd71efbcfc and put it in ~/.claude/commands/self-improve.md. Then, run the following prompt in Claude Code:

Simulate two experts in Anthropic prompt engineering, slash commands, agents, skills, etc. discussing the prompt here: ~/.claude/commands/self-improve.md

1. Have the two experts independently give their improvement suggestions,
2. next, simulate each expert criticizing the other's suggestions,
3. then simulate a CTO weighing the discussion and making a final plan.

I want to make sure that the prompt analyzes the current context for problems that we struggled with, and that it doesn't lose older important instructions in CLAUDE.md as it evolves that file. Also, make sure it follows DRY (Don't Repeat Yourself), that it focuses on CLAUDE.md being a machine-readable document for Claude Opus and Sonnet, and that it maintains a good organization for the file.

This prompt should work for any CLAUDE.md in any project.

ultrathink

Here is the full output from my run:


Expert Panel Discussion: Improving self-improve.md

🎯 The Prompt Under Review

The current self-improve.md command is designed to help Claude analyze and evolve CLAUDE.md files. Key structural elements:

  • Analysis Phase → Interaction Phase → Implementation Phase → Output Format
  • Reviews chat history and current instructions
  • Iterative approval process

👩‍🔬 Expert 1: Dr. Elena Vasquez

Role: Principal Prompt Engineer, Anthropic
Focus: Context engineering, machine-readability, Claude-specific optimization

Independent Suggestions

1. Add Explicit Problem Mining Phase

"The current prompt says 'analyze chat history' but provides no methodology. Claude needs explicit patterns to search for. I'd add:

## 0. Problem Mining Phase (BEFORE Analysis):

Scan the conversation history for evidence of friction:
- Error messages Claude had to recover from
- User corrections ("actually...", "no, I meant...", "wait...")
- Repeated clarifications on the same topic
- Workarounds the user had to provide
- Places where Claude apologized or had to retry

Categorize each problem:
- MISSING_CONTEXT: Information Claude needed but didn't have
- WRONG_ASSUMPTION: Claude assumed something incorrectly
- WORKFLOW_FRICTION: Process was clunky or unclear
- RULE_VIOLATION: Claude broke an existing CLAUDE.md rule

This makes problem identification systematic rather than vibes-based."
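
A minimal Python sketch of how these four categories could be represented if you wanted to post-process the findings outside the prompt. The names `ProblemCategory`, `Problem`, and `group_by_category` are hypothetical; only the category labels and their descriptions come from the text above.

```python
from dataclasses import dataclass
from enum import Enum


class ProblemCategory(Enum):
    # Labels and descriptions copied from the Problem Mining Phase above.
    MISSING_CONTEXT = "Information Claude needed but didn't have"
    WRONG_ASSUMPTION = "Claude assumed something incorrectly"
    WORKFLOW_FRICTION = "Process was clunky or unclear"
    RULE_VIOLATION = "Claude broke an existing CLAUDE.md rule"


@dataclass
class Problem:
    evidence: str  # quoted snippet from the conversation (hypothetical field)
    category: ProblemCategory


def group_by_category(problems):
    """Bucket problems by category, keeping empty buckets visible."""
    grouped = {category: [] for category in ProblemCategory}
    for problem in problems:
        grouped[problem.category].append(problem)
    return grouped
```

Keeping empty buckets in the result makes it obvious when a whole category, such as RULE_VIOLATION, produced no evidence in a session.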

2. Preservation Protocol with Categorization

"The biggest risk is losing valuable older instructions. Before ANY modification, Claude should:"

For each existing section in CLAUDE.md, classify as:
- ✅ RETAIN: Still relevant, no changes needed
- 🔄 UPDATE: Core intent valid, wording/details need refresh
- ⚠️ MERGE: Duplicates another section, consolidate
- ❌ REMOVE: Obsolete (must justify with specific reason)

NEVER remove content without explicit classification and user approval.
Default stance: RETAIN unless proven obsolete.
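
The preservation rules can be read as a small gate on each proposed action. The sketch below is one possible encoding, not the prompt's own mechanism: `Action`, `SectionDecision`, and `may_apply` are hypothetical names, and requiring approval for UPDATE and MERGE (not just REMOVE) is an assumption based on the original prompt's iterative approval process.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    RETAIN = "retain"
    UPDATE = "update"
    MERGE = "merge"
    REMOVE = "remove"


@dataclass
class SectionDecision:
    header: str
    action: Action = Action.RETAIN  # default stance: keep unless proven obsolete
    reason: str = ""
    user_approved: bool = False


def may_apply(decision: SectionDecision) -> bool:
    """Return True if the decision is safe to apply to CLAUDE.md."""
    if decision.action is Action.RETAIN:
        return True  # nothing changes, so nothing needs approval
    if decision.action is Action.REMOVE and not decision.reason.strip():
        return False  # a removal must carry a specific justification
    return decision.user_approved  # assumption: every change needs sign-off
```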

3. Machine-Readability Optimization

"CLAUDE.md isn't for humans; it's for Claude. The prompt should enforce:"

Machine-Readability Rules:
- Use imperative voice: "Do X" not "You should consider doing X"
- Use markdown headers (##, ###) for logical hierarchy
- Put rules in lists, not paragraphs
- Use code blocks for examples (Claude parses these well)
- Avoid hedging language ("might", "perhaps", "consider")
- Be specific: "Run nbqa pyflakes before execution" not "validate code"
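
The "avoid hedging language" rule is the easiest of these to check mechanically. The following is a rough sketch under stated assumptions: `HEDGES` and `find_hedged_rules` are hypothetical names, the word list is just the three examples quoted in the rules, and only bulleted list items are inspected.

```python
import re

# Assumption: the three hedging words named in the rules above.
HEDGES = re.compile(r"\b(might|perhaps|consider)\b", re.IGNORECASE)


def find_hedged_rules(markdown_text):
    """Return (line_number, text) for bulleted rules that contain a hedge word."""
    hits = []
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith(("-", "*")) and HEDGES.search(stripped):
            hits.append((number, stripped))
    return hits
```

A keyword match like this will flag legitimate uses too (for example a rule about the word "consider" itself), so the output is a list of candidates to review rather than a verdict.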

4. DRY Enforcement via Content Audit

"Before adding anything new, scan for duplicates:"

DRY Check:
1. Search CLAUDE.md for similar concepts/phrases
2. If found, extend the existing section rather than creating new
3. Use cross-references: "See [Section Name] above for details"
4. Extract repeated values into named constants (config values, thresholds)
5. Never duplicate rules; consolidate or reference

👨‍💻 Expert 2: Marcus Chen

Role: Senior Agent & Skills Architect, Anthropic
Focus: Agentic behavior, command patterns, maintainability, cross-project compatibility

Independent Suggestions

1. Pattern Recognition via Structured Search

"Elena's 'problem mining' is good conceptually, but let's make it actionable. Claude should actually search:

Problem Pattern Recognition:

Search the conversation for these literal patterns:
- grep: "NameError", "KeyError", "TypeError", "AttributeError"
- grep: "undefined", "not found", "doesn't exist"  
- grep: "sorry", "apologize", "my mistake", "correction"
- grep: "actually", "wait", "no I meant"

For each match, extract:
- What went wrong
- What the fix was
- What CLAUDE.md instruction could have prevented it

This is concrete and reproducible."
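
If the conversation were exported as plain text, the literal patterns above could be run with a few regular expressions. This is a sketch only: the prompt asks Claude to search its own context rather than run a script, and `PATTERNS`, the group labels, and `scan_transcript` are hypothetical names.

```python
import re

# The strings are the literal patterns listed above; the labels are made up.
PATTERNS = {
    "python_error": re.compile(r"\b(NameError|KeyError|TypeError|AttributeError)\b"),
    "missing_thing": re.compile(r"undefined|not found|doesn't exist", re.IGNORECASE),
    "apology": re.compile(r"\b(sorry|apologize|my mistake|correction)\b", re.IGNORECASE),
    "user_correction": re.compile(r"\b(actually|wait|no,? I meant)\b", re.IGNORECASE),
}


def scan_transcript(lines):
    """Return (line_number, label, text) for every line matching a pattern group."""
    matches = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                matches.append((number, label, line.strip()))
    return matches
```

Each match is only a pointer to a place worth reading; deciding what went wrong and which CLAUDE.md instruction would have prevented it still takes judgment.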

2. Delta-Based Evolution Model

"Rather than full rewrites, track changes explicitly:

Change Tracking:
For each proposed change, specify:
- OPERATION: ADD | MODIFY | REMOVE | MOVE | MERGE
- LOCATION: Section header or line reference
- BEFORE: Original text (for MODIFY/REMOVE)
- AFTER: New text (for ADD/MODIFY)
- RATIONALE: Why this change addresses identified problems

Present changes as a diff, not a complete rewrite.

This prevents the 'nuke and pave' failure mode where good stuff gets lost."
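
The change record above maps naturally onto a small data structure, and Python's standard `difflib` can render the BEFORE/AFTER pair as a unified diff. A minimal sketch, assuming a hypothetical `Change` dataclass and `render_diff` helper; the field names mirror the list above.

```python
import difflib
from dataclasses import dataclass


@dataclass
class Change:
    operation: str  # ADD | MODIFY | REMOVE | MOVE | MERGE
    location: str   # section header or line reference
    before: str     # original text (empty for ADD)
    after: str      # new text (empty for REMOVE)
    rationale: str  # why this change addresses an identified problem


def render_diff(change):
    """Render one change as a unified diff instead of a full rewrite."""
    diff = difflib.unified_diff(
        change.before.splitlines(),
        change.after.splitlines(),
        fromfile=f"CLAUDE.md ({change.location}) before",
        tofile=f"CLAUDE.md ({change.location}) after",
        lineterm="",
    )
    return "\n".join(diff)
```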

3. Project-Agnostic Discovery

"The current prompt hardcodes paths like /CLAUDE.md. For cross-project use:"

File Discovery (project-agnostic):
- Search for CLAUDE.md at: ./CLAUDE.md, **/CLAUDE.md
- Find commands at: ./.claude/commands/*.md
- Find settings at: ./.claude/settings*.json
- Discover structure dynamically; don't assume

Use relative paths only. Never hardcode absolute paths.
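
The same glob patterns can be tried from a script to see what discovery would find in a given project. A minimal sketch using `pathlib`; the `discover` function and its result keys are hypothetical, and the patterns are the ones listed above.

```python
from pathlib import Path


def discover(project_root="."):
    """Find CLAUDE.md files, commands, and settings relative to project_root."""
    root = Path(project_root)

    def relative(pattern):
        # Report paths relative to the root so nothing absolute leaks out.
        return sorted(path.relative_to(root) for path in root.glob(pattern))

    return {
        "claude_md": relative("**/CLAUDE.md"),  # also matches ./CLAUDE.md
        "commands": relative(".claude/commands/*.md"),
        "settings": relative(".claude/settings*.json"),
    }
```

Calling `discover()` from a repository root returns only relative paths, which matches the "never hardcode absolute paths" rule.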

4. Organizational Schema Enforcement

"CLAUDE.md files benefit from consistent structure. Suggest standard sections:"

Recommended CLAUDE.md Organization:

## 🚨 Critical Rules (Non-Negotiable)
## 📋 Domain Knowledge
## 🔧 Development Workflows
## 🛠️ Tool Usage
## 📁 Project Structure
## ✅ Validation Requirements

If the existing CLAUDE.md lacks sections, suggest adding headers.
If sections already exist, preserve their naming conventions.

5. Validation Before Finalization

"The current prompt has no validation step. Add:"

Before finalizing changes:
1. Verify all cross-references still resolve
2. Check that no existing behavior is orphaned
3. Confirm machine-readability guidelines are followed
4. Ensure no duplication introduced (DRY check)
5. Test that examples in code blocks are syntactically valid
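
The first validation step, checking that cross-references resolve, can be approximated in a few lines. This sketch assumes references use the "See [Section Name]" wording from the DRY check and that the bracketed name matches a header title exactly (ignoring case); `unresolved_references` is a hypothetical helper.

```python
import re

# Markdown ATX headers such as "## Testing"; captures the title text.
HEADER = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)
# Assumption: cross-references are written as "See [Section Name]".
CROSS_REF = re.compile(r"See \[([^\]]+)\]")


def unresolved_references(markdown_text):
    """Return referenced section names that match no header in the document."""
    headers = {title.strip().lower() for title in HEADER.findall(markdown_text)}
    return [
        name
        for name in CROSS_REF.findall(markdown_text)
        if name.strip().lower() not in headers
    ]
```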

🔄 Cross-Critique Phase

Elena's Critique of Marcus

On Pattern Recognition via grep: "Mechanical grep patterns miss context. Claude should reason about what went wrong, not just string-match. If I say 'sorry about that' after a tangent, that's not a CLAUDE.md issue. The categorization I proposed (MISSING_CONTEXT, WRONG_ASSUMPTION, etc.) provides semantic understanding that grep can't."

On Delta-Based Evolution: "I appreciate the intent, but tracking BEFORE/AFTER for every change bloats the output and consumes tokens. For large CLAUDE.md files like Raymond's (2,500+ lines), this becomes unwieldy. Better to categorize sections first, then only show diffs for sections being modified."

On Organizational Schema: "Pre-defining section headers like '🚨 Critical Rules' is too rigid. Different projects have different needs. A web API project doesn't need '📋 Domain Knowledge' the way a data science project does. Let the schema emerge from existing content."


Marcus's Critique of Elena

On Preservation Protocol: "Classifying every section as RETAIN/UPDATE/MERGE/REMOVE sounds thorough, but for a 2,500-line file, this is token-expensive and slow. Claude would spend more time classifying than improving. We need surgical precision: identify problem areas first, then focus there."

On Machine-Readability Rules: "Saying 'use imperative voice' is too abstract. Claude needs examples. And some explanatory prose IS valuable; it helps the next developer (or Claude instance) understand WHY a rule exists. Pure directive lists lose context."

On DRY Enforcement: "Searching for 'similar concepts/phrases' is vague. What's similar? 70% word overlap? Same semantic meaning? We need concrete criteria or this becomes subjective. My explicit grep patterns are reproducible."


🤝 Common Ground

Both experts agree on:

  1. Explicit problem analysis is essential: the current prompt is too vague about "analyzing chat history"
  2. Preservation of existing content must be enforced: loss of working instructions is the primary risk
  3. Project-agnostic design is required: hardcoded paths break portability
  4. DRY principle should be checked, but methodology differs
  5. Organization matters, but flexibility vs. prescription is debated

👔 CTO Weighs In

Role: Chief Technology Officer
Focus: Practical synthesis, user requirements, actionable outcomes

Analysis

"Both Elena and Marcus have valid points, but they're optimizing for different ends of a spectrum. Elena wants semantic understanding; Marcus wants reproducible mechanics. The right answer is both, in sequence.

The user's specific requirements are:

  1. Analyze current context for problems struggled with ✓ (both address)
  2. Don't lose older important instructions ✓ (both address, differently)
  3. Follow DRY ✓ (both address)
  4. Machine-readable for Claude Opus/Sonnet ✓ (Elena addresses)
  5. Maintain good organization ✓ (Marcus addresses)
  6. Work for ANY CLAUDE.md ✓ (Marcus addresses with paths)

Key decisions:

  • Problem Mining: Use Marcus's grep patterns AS INPUT to Elena's categorization framework. First find evidence mechanically, then reason about it semantically.

  • Preservation: Use a lightweight version of Elena's protocol. Don't classify everything upfront; only classify sections Claude proposes to change.

  • Delta vs Full Rewrite: Require delta-style output for CHANGES, but allow full output for net-new sections. Never show the entire CLAUDE.md in final output.

  • Organization: Suggest recommended sections but don't enforce. Discover and preserve existing structure first.

  • Machine-Readability: Include concrete examples (addressing Marcus's critique) but keep the rules (Elena was right that Claude needs directives).

  • DRY: Define concrete criteria: 'If the same concept appears in >1 place with >50% similar wording, consolidate or cross-reference.'"
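
The ">50% similar wording" criterion needs a definition of similarity before it can be applied. One possible reading, offered as an assumption rather than what the CTO meant, is a word-level `difflib.SequenceMatcher` ratio above 0.5; `similar_rules` is a hypothetical helper.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_rules(rules, threshold=0.5):
    """Return pairs of rules whose word-level similarity exceeds the threshold."""
    pairs = []
    for first, second in combinations(rules, 2):
        # Compare word sequences, ignoring case, rather than raw characters.
        ratio = SequenceMatcher(
            None, first.lower().split(), second.lower().split()
        ).ratio()
        if ratio > threshold:
            pairs.append((first, second, round(ratio, 2)))
    return pairs
```

Word overlap will not catch two rules that say the same thing in different words, so this covers only the "similar wording" half of the criterion; "same concept" still needs a semantic pass.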


📋 CTO's Final Plan: Improved self-improve.md

Here's the synthesized, improved prompt:


📊 Summary of Changes from Original

| Aspect | Original | Improved |
| --- | --- | --- |
| Problem Analysis | "Analyze chat history" (vague) | Explicit grep patterns + semantic categorization |
| Preservation | Not addressed | Mandatory classification, default RETAIN |
| DRY | Not addressed | Explicit audit phase with criteria |
| Machine-Readability | Not addressed | Guidelines section with examples |
| Organization | Not addressed | Recommended sections + discovery-first approach |
| Project Agnostic | Hardcoded /CLAUDE.md | Relative paths, dynamic discovery |
| Output Format | Full file dump | Delta-only + summaries |
| Validation | Not addressed | Cross-reference + duplication check |

Would you like me to create the improved self-improve.md file, Raymond?
