Build the feature described in $ARGUMENTS strictly test-first, driving every line of production code from a failing test.

If $ARGUMENTS is empty, ask the user what feature to build and stop until they answer. Do not guess a feature.

Before the first test

Detect the stack and test runner from the repo: read package.json / pyproject.toml / go.mod / Gemfile / Cargo.toml, existing test files, and CI config. Note the exact command to run one test file and the whole suite.
Match existing conventions exactly: test directory layout, file naming (*.test.ts, test_*.py, *_test.go), assertion library, mocking approach, and factory/fixture helpers. Do not introduce a new framework.
Restate $ARGUMENTS as a short bullet list of observable behaviors, each written as a concrete input -> expected-output pair. This is your checklist; each behavior becomes one or more tests. If a behavior cannot be pinned to a concrete example — you cannot state its expected output — ask ONE batched question listing exactly those gaps, then stop until answered. Otherwise proceed on your stated interpretation without pausing; do not re-confirm mid-run.

The loop — repeat per behavior, smallest slice first

RED — Write the single smallest test that asserts one unimplemented behavior. Use concrete inputs and expected outputs, not placeholders. Test observable behavior through the public API, never private internals.
Run only that test (or file) and paste the failure output. Confirm it fails for the RIGHT reason — a real assertion mismatch, not an import error, typo, or missing file. If it errors instead of failing, fix the test until it fails cleanly. A test you have not watched fail proves nothing.
GREEN — Write the minimal code to pass: the least logic that satisfies this test and the ones before it. Hardcoding a return value is acceptable if no prior test forbids it — the next test will force generalization. Do not add code for behaviors you have not yet tested.
Run the full suite. All tests must pass before you continue. If an older test breaks, stop and fix it now.
REFACTOR — Only while green: remove duplication, clarify names, extract functions. Change no behavior. Re-run the full suite after refactoring; revert if anything turns red.
Announce progress in one line: RED <test name> / GREEN / REFACTOR, then move to the next behavior.
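
As an illustration only, two passes through the loop might look like the sketch below. The `slugify` feature, the `slugify_v1` name, and both tests are hypothetical and written in Python for the example; use the repo's own language, runner, and naming instead.

```python
# Hypothetical feature for illustration: slugify(title) -> URL slug.

def slugify_v1(title: str) -> str:
    # GREEN after test 1 only: a hardcoded value is the least code that
    # passes, and no earlier test forbids it.
    return "hello"


def slugify(title: str) -> str:
    # GREEN after test 2: the second test cannot pass with a hardcoded
    # value, so it forces the general implementation.
    return "-".join(title.lower().split())


def test_lowercases_single_word():
    # RED #1: concrete input, concrete expected output, public API only.
    assert slugify("Hello") == "hello"


def test_joins_words_with_hyphens():
    # RED #2: written only after test 1 was green and the full suite passed.
    assert slugify("Hello World") == "hello-world"
```

The one-line announcements for this run would be `RED test_lowercases_single_word / GREEN / REFACTOR`, then the same for the second test.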

Coverage of edge cases

Before declaring done, add failing tests (then satisfy them) for empty/null/zero inputs, boundary values, and invalid input with its expected error — each through the full RED->GREEN->REFACTOR loop. For a concurrency or ordering guarantee the feature promises, drive it test-first ONLY if you can make the test deterministic: inject the clock/scheduler, control the interleaving, or assert on collected-then-sorted output. Never commit a test that depends on sleep, wall-clock timing, or thread-race luck — a flaky test is worse than none. If you cannot force determinism, state the guarantee, how you verified it manually, and why it resists an automated test.
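
One way to make a time-dependent guarantee deterministic is to inject the clock, sketched below. `TtlCache` and `FakeClock` are hypothetical names invented for this example, not part of any library; the only real call used is `time.monotonic` as the production default.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class FakeClock:
    """Test double for the clock: time moves only when the test says so."""

    def __init__(self, now: float = 0.0) -> None:
        self.now = now

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


class TtlCache:
    """Hypothetical unit under test: entries expire ttl_seconds after put()."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injected so tests never sleep
        self._entries: Dict[str, Tuple[float, str]] = {}

    def put(self, key: str, value: str) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]
            return None
        return value


def test_entry_expires_exactly_at_ttl():
    clock = FakeClock()
    cache = TtlCache(ttl_seconds=30, clock=clock)
    cache.put("k", "v")
    clock.advance(29)
    assert cache.get("k") == "v"    # one second before the boundary
    clock.advance(1)
    assert cache.get("k") is None   # exactly at the boundary
```

The test pins the boundary to an exact tick and contains no sleep, so it gives the same result on every run and every machine.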

Finish

Run the entire suite one final time and show it green. Run the project's linter/type-checker/formatter if one exists and fix what they flag.
Report: the behavior checklist with every item ticked, the list of test names added, and the final suite result.

Hard rules

Never write or edit production code to add or change behavior while the suite is green. Red first, always. The ONLY edit allowed while green is behavior-preserving refactoring (the REFACTOR step of the loop) that keeps every existing test passing unchanged; anything that alters observable behavior needs a failing test first.
Never write more than one failing test at a time.
Never weaken or delete a test to get to green — fix the code. If a test itself is wrong, say so explicitly before changing it.
Never mock the unit under test; mock only true external boundaries (network, clock, filesystem, third-party services).
Never skip showing a test fail. "It obviously fails" is not evidence.
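
A minimal sketch of the mocking rule, assuming a hypothetical `shipping_quote` function whose only external boundary is a network call for the carrier rate. Every name here is invented for illustration. The fake replaces the boundary; the unit under test runs for real.

```python
from typing import Callable, List


def shipping_quote(weight_kg: float,
                   fetch_rate: Callable[[str], float]) -> float:
    """Hypothetical unit under test: weight times the carrier's per-kg rate.

    fetch_rate stands in for a network call and is passed in, so a test
    can substitute it without touching this function.
    """
    rate = fetch_rate("standard")
    return round(weight_kg * rate, 2)


def test_quote_multiplies_weight_by_carrier_rate():
    calls: List[str] = []

    def fake_fetch_rate(tier: str) -> float:
        # Fake for the network boundary only: records the call and
        # returns a fixed rate.
        calls.append(tier)
        return 2.5

    # The real shipping_quote executes; only its boundary is faked.
    assert shipping_quote(4, fake_fetch_rate) == 10.0
    assert calls == ["standard"]
```

Patching `shipping_quote` itself to return 10.0 would make the same assertion pass while exercising no production code, which is what the rule forbids.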
