What Karpathy's CLAUDE.md Misses and Why Product Teams Need Their Own
Why the most expensive bug in AI-assisted building isn't a bug at all - it's the wrong feature, shipped beautifully
A few weeks ago, Andrej Karpathy published a short observation on X about what goes wrong when engineers work with LLMs on code. It spawned a GitHub repo called `andrej-karpathy-skills`: a single `CLAUDE.md` file distilling four principles for better AI-assisted coding. It’s now past 17,000 stars.
I read it. It’s excellent. Every engineer working with Claude Code, Cursor, or any coding agent should drop it in their repo today.
But the more I worked with it, the more I noticed what it *doesn’t* address. And the gap matters, because it’s the gap that kills product teams.
Karpathy’s file is written for an engineer working alone on code. Its failure modes are things like: the model picks an interpretation silently, overcomplicates abstractions, refactors adjacent code it shouldn’t have touched.
Those are real problems. But if you’re a PM, or a founder, or a mixed PM-plus-engineering team shipping SaaS with AI - your most expensive failure mode isn’t any of those.
Your most expensive failure mode is this:
Shipping the wrong thing, well.
A beautifully implemented feature for a problem that doesn’t matter. An elegant refactor of a flow no user completes. A perfectly tested endpoint that moves no metric. The code is pristine. The PR is clean. And the build was still waste.
This is the failure mode AI-assisted development makes worse, not better. Claude Code will cheerfully implement whatever you describe. It won’t stop to ask whose problem you’re solving. It won’t push back on a vague success criterion. It won’t tell you the metric you said you’d move has no baseline.
That job is ours. And if the team’s CLAUDE.md doesn’t encode it, it doesn’t happen.
So I wrote one that does. It’s called product-mode, it’s MIT licensed, and it’s at [github.com/sohaibt/product-mode]. The rest of this article walks through the thinking.
---
Seven Principles, In Order
The ordering matters. Each principle prevents a failure mode that, if not caught early, compounds into the next.
1. Frame the Problem Before the Solution
Before any code gets written, the thread must contain: *who is the user, what is the problem, what is the job-to-be-done.*
This sounds obvious. It is not. In my experience, at least a third of AI-assisted work starts from a solution statement - “add a dashboard,” “build an export feature,” “create a settings page” - with no one having articulated which user is in pain, what they’re trying to do, or whether the proposed solution is the cheapest way to address it.
The test I use: can a non-author restate who we’re helping and what changes for them? If not, we’re not ready to build.
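Encoded as a CLAUDE.md rule, a problem frame might look something like this. The wording and every specific (the admin user, the seat count, the compliance job) are hypothetical, not the actual text of the `product-mode` file:

```markdown
## Problem frame (fill in before any code)

- User: admin on a team plan managing 50+ seats
- Problem: can't audit who has access without paging through the UI
- Job-to-be-done: produce an access report for the quarterly compliance review
- Why this solution: cheapest way to close the job; a full audit-log UI is overkill
```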
2. Make Assumptions & Unknowns Visible
LLMs don’t manage their confusion. Neither do most humans under deadline pressure.
The rule: list every assumption before coding, and mark each one validated, assumed, or unknown. When the request is ambiguous, present 2–3 interpretations with tradeoffs - don’t pick silently. Name confidence on non-obvious choices.
This isn’t ceremony. It’s the cheapest possible check against the most expensive class of mistakes. A 60-second list of assumptions catches the one that would’ve cost two days of rework.
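As a CLAUDE.md fragment, the rule might read like this. This is a sketch, not the `product-mode` wording, and the bracketed statuses and the export example are invented for illustration:

```markdown
## Assumptions & unknowns

Before writing code, list every assumption the plan depends on,
each marked [validated], [assumed], or [unknown]:

- [validated] Export is requested only by admin users (confirmed in support tickets)
- [assumed] CSV is the preferred format (no one has asked for XLSX)
- [unknown] Whether exports must include archived records

If the request is ambiguous, present 2-3 interpretations with tradeoffs
and ask. Do not pick one silently. Name your confidence on non-obvious choices.
```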
3. Ship the Minimum Viable Change
Karpathy’s original calls this “simplicity first” - and he’s right about code simplicity. But code simplicity isn’t the same as scope simplicity. A 50-line solution to the wrong problem is still waste.
Minimum viable change has two layers:
Scope: the smallest version that tests the hypothesis. Not the full vision. No features beyond what was asked. No “while we’re here.”
Edit surface: touch only what the request requires. Match existing style. Don’t refactor what isn’t broken. Remove orphans your change created, but leave pre-existing dead code alone (mention it, don’t delete).
Every changed line and every added feature should trace directly to the stated problem. If you can’t draw that line, cut it.
4. Name the Tradeoffs
Every decision costs something. Most teams don’t write the cost down.
For any non-trivial choice, the thread should contain four things: value (what outcome this unlocks), cost (time, complexity, ongoing maintenance), risk (what breaks if we’re wrong, and who pays), and alternative (what we considered and rejected, and why).
The test of a good tradeoff write-up: *a stakeholder who disagrees with the decision can still articulate why we chose it.* Disagreement without that articulation is where team trust goes to die.
If someone says “there’s no tradeoff here,” that’s either a sign the choice is genuinely obvious - or a sign the thinking is shallow. Both are worth naming.
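A tradeoff write-up with all four parts could look like the following. The scenario and every number in it are hypothetical, chosen only to show the shape:

```markdown
## Tradeoff: server-side export vs. client-side download

- Value: unblocks enterprise admins who need exports over 10k rows
- Cost: ~2 days now, plus a background-job queue to maintain
- Risk: long-running jobs can fail silently; ops pays the pager cost
- Alternative: client-side CSV generation - rejected because it stalls
  the browser tab on large datasets
```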
5. Define Done by Outcome, Not Output
“Merged” is not “done.” “Tests pass” is not “done.” “The ticket is closed” is not “done.”
Done is: the user can do the thing, and it works, and we can see it working in production.
Acceptance criteria need three layers:
- Functional - tests pass, edge cases handled
- User-facing - a real user flow completes end-to-end
- Operational - it’s observable (logs, errors, analytics firing)
If you can’t write acceptance criteria at this level, the work isn’t ready to start. This is the single highest-leverage change most teams can make. The cost is a five-minute conversation before coding; the return is not having to rebuild.
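Written out for a concrete feature, the three layers might look like this (the export feature is a made-up example, not from the `product-mode` file):

```markdown
## Done means (CSV export)

- Functional: unit tests pass; empty-export and 100k-row edge cases handled
- User-facing: an admin can click Export and open the resulting file, end to end
- Operational: export events logged, failures alert on-call, analytics event fires
```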
6. Instrument Before You Ship
This is the principle I most want engineering-focused CLAUDE.md files to adopt. If you can’t measure a change, you can’t know if it worked.
Before any user-facing or behavior-changing release, define up front: the north-star metric you expect to move, the current baseline (from data, not vibes), the expected direction and size of the change, the time horizon for checking, the guardrail metrics that must *not* get worse, and the method for reading the result (A/B, before/after, cohort, qualitative).
The rule: ship instrumentation *in the same change* as the feature. Never “we’ll add analytics later.” You will not.
And if no metric would move - if you can’t articulate any signal that would tell you it worked - reconsider why you’re building it. Sometimes that’s fine; “this is a bet, and here’s the qualitative signal we’ll look for” is a valid answer. Silence is not.
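A measurement plan covering those six items might be sketched like this. Every metric, number, and the choice of method here is invented for illustration:

```markdown
## Measurement plan

- North-star metric: weekly export completions
- Baseline: 120/week (from analytics, trailing 4 weeks)
- Expected change: +20% within 30 days
- Guardrails: p95 page load and support ticket volume must not worsen
- Method: before/after comparison (traffic too low for A/B)
- Instrumentation ships in the same PR as the feature
```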
7. Log the Decision, Flag Reversibility
Decisions compound. Most teams lose the reasoning behind their decisions within a few months.
For any non-trivial decision, append an ADR-lite entry: date, context, options considered, choice, reversibility, revisit trigger. Keep it short - a paragraph is enough.
The reversibility framing is Bezos’s one-way door / two-way door, and it’s the single most useful mental model in product decision-making. Two-way doors (easily reversible): decide fast, move on. One-way doors (costly to undo - public APIs, data schemas, pricing, brand, core UX patterns users learn): require written tradeoffs and explicit sign-off before proceeding.
The revisit trigger is underrated. State a metric, a date, or a condition that would make us reopen the decision. Without it, decisions calcify into unquestioned defaults - and those are the ones that look stupid two years later when no one remembers why they were made.
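An ADR-lite entry with all six fields fits in a few lines. The decision, costs, and trigger below are hypothetical:

```markdown
## 2025-01-15: CSV export runs as a background job

- Context: exports over 10k rows time out in-request
- Options: synchronous streaming; background job + email link
- Choice: background job - simpler retry story
- Reversibility: two-way door; the endpoint contract doesn't change
- Revisit if: median export finishes under 2s, or queue costs exceed $50/mo
```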
---
The Meta-Point
Karpathy’s CLAUDE.md and product-mode aren’t in competition. They sit on top of each other.
His file is about how the LLM writes code. Mine is about how the team decides *what* the LLM should be writing in the first place. You want both.
But if you’re a PM working with Claude Code, or a founder vibe-coding with Cursor, or an engineering lead tired of your team re-shipping because the problem was wrong - the gap Karpathy’s file leaves open is the one that costs you the most. It’s worth closing.
Drop `product-mode`’s `CLAUDE.md` in your repo. Merge it with whatever you’ve already got. Watch what changes in your next feature cycle: fewer “wait, what are we actually building?” conversations in review. Fewer metrics declared after the fact. Fewer decisions no one remembers making.
That’s the whole pitch.
---
Links
- Repo: https://github.com/sohaibt/product-mode
- Original Karpathy observations:
- The file that started it all: [andrej-karpathy-skills]
MIT licensed. Fork freely. If it helps your team, a star is appreciated but never required. If it doesn’t, tell me why: the file is a draft, not a monument, and I’d rather make it better than defend it.



