Debug the problem described in $ARGUMENTS by finding and fixing its true root cause — not the first plausible symptom.
If $ARGUMENTS is empty, ask for the exact symptom, the error text or stack trace, and the steps that trigger it. Do not guess.
Method — follow in order
- Detect the stack. Read the manifest (package.json, pyproject.toml, go.mod, Cargo.toml, Gemfile) and existing test setup. Match the project's test runner, assertion style, and file layout in everything you write.
- Pin down expected vs actual. Restate the bug in one line: what you did, what happened, what should have happened. Copy the FULL stack trace or error output verbatim — read it top to bottom, note the exact file, line, and failing frame. Do not skim it.
- Reproduce it first. Build the smallest reliable repro (a failing test, script, or exact command) BEFORE touching product code. Run it and confirm it fails the way $ARGUMENTS describes. If you cannot reproduce, say so and list what you need — never fix a bug you can't trigger.
- Investigate scientifically. Form ONE hypothesis at a time, state it, then test it with a targeted probe (log, breakpoint, assertion,
git bisect, or binary-searching the input/commit range). Confirm or kill it before the next. Question assumptions: verify the input, config, env, and version are what you think — check, don't assume. - Isolate the root cause. Trace from the failing frame back to the origin. Distinguish the trigger from the underlying defect. State the causal chain in one or two sentences before writing any fix. If the "fix" doesn't explain every observed symptom, you have the wrong cause — keep digging.
- Fix minimally. Change the least code that corrects the root cause. Do not refactor unrelated code, reformat, or fix adjacent smells in this pass — note them separately. Match surrounding conventions.
- Prove the fix. Add a regression test that fails on the OLD code and passes on the new — verify both directions (stash/revert the fix, watch it fail, restore, watch it pass). Then run the full suite, linter, and type checker to confirm no new breakage.
- Clean up. Remove every temporary log, print, breakpoint, commented block, and scratch file you added while probing. Diff your changes and confirm only the intended lines remain.
Output
- Root cause: the causal chain, one or two sentences.
- Fix: what changed and why it targets the cause, with file:line references.
- Regression test: name and location; confirm it fails before and passes after.
- Verification: commands run and their results (suite, lint, types).
- Follow-ups: related risks or smells found but deliberately not fixed here.
Rules — hard constraints beyond the method above
- Never suppress the symptom to hide a cause you haven't found: no swallowed exceptions, widened types, bumped timeouts, blanket retries, or broadened catch/except clauses.
- For intermittent bugs, measure the failure rate — run the repro enough times to get a stable rate before the fix, then confirm it drops to zero after. One clean run proves nothing.
- If a probe contradicts your hypothesis, kill the hypothesis — don't rationalize the data to keep it alive.
- Change exactly one variable per probe. If you edit code and config (or two call sites) at once, the result tells you nothing about which one mattered — revert and isolate.