Promptheus/commands36 commands · free · CC0Promptheus hub ↗
← All commands
/fix-tests

Test

Fix failing tests

Diagnose and fix failing / flaky tests at the root.

testingflakyci
CategoryTest
Argumentsnone
Allowed toolsReadGrepGlobEditBash
What it does

Run the test suite, diagnose the failures (including flaky ones) and fix the root cause — never mask with retries.

.claude/commands/fix-tests.md.claude/commands/ (project) · ~/.claude/commands/ (global)
Install into your repo
npx promptheus-commands add fix-tests

The prompt

Run the test suite, diagnose every failure to its real cause, fix the root cause, and re-run until green. $ARGUMENTS may scope this to a file, directory, or test pattern.

Setup

  1. Detect the runner from the project: package.json scripts (jest/vitest/mocha/playwright), pytest/tox.ini/pyproject.toml, go test, cargo test, rspec, phpunit, etc. Use the project's own command (e.g. the test script), not a global guess.
  2. Run the full suite once (or the $ARGUMENTS subset) and capture the complete output. List every failing test with its error and file:line before changing anything.
  3. If the suite won't even start (missing deps, config error, uncompiled TS), fix that first — that is not a test failure.

Diagnose each failure

For each failing test, read the test AND the code under test, then classify the cause before touching anything:

  • Genuine bug: the test asserts correct behavior and the code is wrong. Fix the code.
  • Wrong/stale assertion: the code is correct and the expectation is outdated. Before changing the assertion, run git log -p / git blame on both the test and the code under test to establish whether the behavior change was intentional (a deliberate commit/PR that changed the API, matched by a spec or changelog) or an accidental regression. Rewrite the assertion only once the new behavior is confirmed intended by that history — never by reading the desired value off the failure output. If the history shows a regression, fix the code instead.
  • Flake: reproduce it deterministically, then locate the source. Run the single test in isolation, then the whole file, then the suite in randomized order and repeated, using your runner's real flags: jest --shuffle and --runInBand (serial) — jest has no repeat flag, so loop the command in a shell; vitest --sequence.shuffle and --no-file-parallelism; pytest with the pytest-randomly plugin (--randomly-seed=<n> pins/replays an order) and pytest-repeat (--count=<n>), serial by dropping -n from pytest-xdist; go -shuffle=on -count=<n> (-count also defeats the test cache); cargo --test-threads=1; rspec --order random / --seed <n>. A test that passes alone but fails in the suite is shared-state or order coupling; one that fails intermittently alone is a timing/async race or nondeterminism.
  • CI-only (fails in CI but passes locally, or the reverse): treat the environment gap as the cause, not the test. Diff the two environments — missing/differing env vars, locale/timezone (LC_ALL, TZ), filesystem case-sensitivity, higher CI parallelism/worker count, tighter resource or time limits, absent services. Reproduce locally by matching CI: same env vars, same worker count and test order, same container/image.

Fix the root cause

  • Timing/async: await the actual condition or use the framework's fake timers / waitFor. Never add bare sleep/fixed delays or retry wrappers to paper over a race.
  • Order/shared state: find the leak — a mutated global, un-reset module, shared DB row, leftover file, unclosed handle. Reset it in beforeEach/afterEach or make the test self-contained. Do not fix it by pinning test order.
  • Nondeterminism: pin the clock, seed the RNG, fix locale/timezone, stub the network. Remove reliance on real wall-clock time or live services.
  • Environment gap: make the test provide what it needs (set the env var/locale in setup, skip explicitly with a documented reason when a required service is truly absent) rather than depending on ambient CI state.
  • Genuine bug: fix the source, not the test.

Hard rules

  • Never make a test pass by deleting it, marking it skip/xfail/.only, loosening an assertion to match wrong output, widening tolerances, or catching-and-ignoring. If a test is genuinely obsolete, say so and ask before removing.
  • Never add blanket retry/rerun config to hide flakes.
  • Change one cause at a time and re-run to confirm that fix before moving on; keep changes minimal and scoped to the failure.
  • After all fixes, run the entire suite (not just the previously-failing tests) to confirm green and no regressions. If flakes were involved, run it 2-3 times or shuffled to prove stability.
  • Do not touch snapshots wholesale — regenerate a snapshot only after verifying the new output is correct.

Report

State, per originally-failing test: the classification (bug / wrong assertion / flake- / env), the actual root cause in one line (cite the commit for a confirmed intended-change or regression), and the fix. End with the final suite result (counts) and how you verified stability.

Add it to your toolkit

Save it as .claude/commands/fix-tests.md or .cursor/commands/fix-tests.md, then invoke it with /fix-tests.

Back to top ↑