Four Principles to Stop LLMs Overengineering Your Code

TL;DR

What it solves: Stops assistants from making wide, overconfident refactors by giving them four short, checkable rules.
Why it matters: Small, testable edits reduce regressions and cut review churn.
Best for: Teams using Claude Code or Claude-style coding assistants who need predictable, verifiable changes.
Best use case: Install the plugin org-wide or add a per-repo CLAUDE.md for non-trivial fixes and refactors.
Main differentiator: Rules-first approach - “Think Before Coding”, “Simplicity First”, “Surgical Changes”, “Goal-Driven Execution”.

I asked an assistant for a one-line fix. It returned a 300-line PR and a cheerful commit message. The CI was green, the rollback was not.

It felt like hiring a carpenter who showed up with a chainsaw. There was a checklist missing, and I still did not know who picked the screws.

Foundation

This repo provides a single CLAUDE.md plus an optional Claude Code plugin so that Claude-style assistants follow four explicit principles and make surgical, testable, minimal edits.

Why now: teams use coding agents across many repos, and accidental broad edits are common. The author’s guidance packages Karpathy-inspired rules into a plugin or a per-repo file so the assistant has a shared source of truth.

The core pain: ambiguous prompts + overconfident agents = large diffs you must personally verify.

Real-World Use Cases

Plugin-first enforcement
When you run Claude Code across many repos, install the plugin once and enforce org-level defaults.
Per-repo canonical guidance
Add a CLAUDE.md to a repo so any collaborator or assistant sees the same rules.
Append-only adoption
Append the canonical file to existing repos to adopt the guidance without overwriting local changes.
Plugin + local overrides
Use the plugin for defaults and a small Project-Specific Guidelines section in CLAUDE.md for exceptions.
Surgical workflow (tests-first enforcement)
Convert “fix X” into a testable success criterion and loop until tests pass.

Walkthrough - surgical workflow (before / after):

Input (what you ask):

Human: "Fix the timeout in the handler. Tests should still pass."

Without tests-first guidance:

- // many files touched
+ // 20-file refactor: extracted utils, updated handlers, bumped major dependency

With CLAUDE.md + goal-driven execution:

- // 1 file changed
+ handler.go: adjust timeout from 30s to 60s, add regression test

The observation: a tests-first instruction turns a fuzzy request into a measurable change, therefore the assistant’s edits shrink.

Markdy animation

How to Use It

Top commands the README documents, with their effect.

Install plugin (recommended org-wide)

/plugin marketplace add forrestchang/andrej-karpathy-skills
/plugin install andrej-karpathy-skills@karpathy-skills

Result: The principles apply org-wide via the plugin, making enforcement and updates easier.

Add CLAUDE.md to a repo (new project)

curl -o CLAUDE.md https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md

Result: Adds canonical guidance to the repo so collaborators and assistants read the same rules.

Append canonical guidance to an existing repo

echo "" >> CLAUDE.md
curl https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md >> CLAUDE.md

Result: Appends the standard guidance without overwriting existing local files.

Tests-first pattern (workflow) Use the “Goal-Driven Execution” pattern from the README: convert requests to success criteria, require tests, and loop until tests pass. Result: Measurable edits and reduced rework.

💡 Tip: prefer the plugin for many repos; prefer per-repo CLAUDE.md for single-repo control.

Configuration & Customization

Plugin vs per-repo: plugin for org consistency, per-repo for repo-specific exceptions.
Project-Specific Guidelines: append a short section to CLAUDE.md for local conventions (TypeScript strict mode, required tests, paths to check).
Tests-first enforcement: require a failing test or a clear success metric when requesting non-trivial changes.
Diff-size signals: use diff-size or changed-file thresholds to reject large, unverified edits automatically.
Escalation policy: require human approval for changes touching critical paths.

When to use each:

Use the plugin if you manage multiple repos and want a single source of truth.
Use per-repo CLAUDE.md if a repo has unique style or legal constraints.
Use tests-first when the change impacts correctness or customer-facing behavior.

Where It Fits (And Where It Doesn’t)

Fits: Claude Code and any assistant that reads CLAUDE.md or accepts plugin marketplace installs.
Replaces: ad-hoc LLM instructions scattered across PRs and chat messages.
Complements: existing linters, test suites, and CI gating - use them together to make edits verifiable.
Not for: workflows that need high-speed, trial-and-error edits where human review is expensive and recovery is easy.

The Rough Edges

Conservative bias: the guidance intentionally slows non-trivial edits to reduce risk; this can feel slow on tiny fixes.
Remote curl risk: the per-repo install uses curl from raw.githubusercontent.com; always inspect fetched content before committing.
Snapshot completeness: the research snapshot lacked some example files, so example-driven docs may be incomplete.
Enforcement surface: the plugin makes rules consistent, therefore you must keep the plugin and per-repo overrides in sync.

⚠️ Warning: appending remote CLAUDE.md without review can introduce unexpected rules. Inspect before committing.

Getting Started

For many repos, install the plugin:

/plugin marketplace add forrestchang/andrej-karpathy-skills
/plugin install andrej-karpathy-skills@karpathy-skills

For a single repo, add the file:

curl -o CLAUDE.md https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md

Optionally add a short Project-Specific Guidelines section to tune rules for your codebase.
Start requesting edits as tests-first goals and require the assistant to loop until tests pass.

How It Compares: Alternatives

The README mentions Multica as a related project by the author. Use Multica for broader agent orchestration; prefer the Karpathy-styled CLAUDE.md when your immediate goal is conservative, surgical coding-agent behavior.

FAQ

Q: Will installing the plugin automatically change past PRs?
A: No. The plugin sets defaults going forward; existing PRs are unaffected unless you re-run the agent against them.

Q: When should I prefer a per-repo CLAUDE.md over the plugin?
A: Prefer per-repo when you need repo-specific exceptions or legal constraints that cannot be expressed as org defaults.

Q: How does tests-first enforcement work in practice?
A: Convert the request into explicit success criteria (a failing test, then a passing test) and require the assistant to iterate until tests pass.

Q: Does the guidance prevent all risky changes?
A: No. It reduces risk by biasing toward caution and minimal edits, therefore you still need CI and human review for critical paths.

Q: What are the four principles in one line?
A: Think Before Coding, Simplicity First, Surgical Changes, Goal-Driven Execution.

Final Thoughts

The scene started with a cheerful bot and a chainsaw. Four short rules move the saw back into the toolbox.

If you install the plugin or add one CLAUDE.md, you get the four concrete guardrails that bias assistants toward tests and surgical edits. That is the change: four short principles that make agent edits smaller and safer.

forrestchang/andrej-karpathy-skills · no license · 54909