Agent repair walkthrough

Step by step — same bug, two agent workflows

CI verified — no LLM

39/39 fixtures pass

39 fixtures, including 9 feature-build and 4 repair-plan loop. check-json context: ~170–260 tok, 78–92% less than a TS paste.

bun run proof:agent-repair

Full CI suite: bun test tests/agent-repair-sufficiency.test.ts tests/agent-repair-multistep.test.ts && bun run benchmark:agent-repair

Live model benchmark

Point 100% · TS 100%

May 22, 2026, 3:16 AM · gpt-4.1, claude-opus-4-6, claude-sonnet-4-6 on 13 fixtures (35 single-shot, 4 CI-only loop). Success = point check passes.

Latest proof run: 78/78 API runs pass point check.

Claude Opus 4.6: Point 13/13 · TS 13/13
Claude Sonnet 4.6: Point 13/13 · TS 13/13
GPT-4.1: Point 13/13 · TS 13/13

Pick a CI fixture below. Each column shows what the agent sees at every step, how many tokens that step costs, and whether we prove it in CI or illustrate it from typical chat pastes.

What is actually verified?

Claim	Point workflow	TypeScript paste workflow
Repair without an LLM	Yes — golden line → point check in CI	No — no equivalent tsc→fix CI test
Context size measured	Yes — real check-json on fixtures	Estimated — paste heuristic, not a logged trace
Model can repair with this context	Live API: GPT-4.1 and Claude, point check gate	Live API: same .point file, larger paste prompt

CI verified: Runs in the point repo on every bun test, using real check-json, the golden line, and point check. No LLM.
Estimated: Illustrative TS agent paste sizes (~4 chars/token). Not captured from a live Cursor trace.
Live model: Real API calls; success = point check passes on the applied line.

Fixture: tests/fixtures/agent-repair/unknown-field-broken.point, the same broken file both workflows must repair. Model-eval context sizes for this case: ~263 tokens (Point) vs ~3,000 tokens (TS).

Agent task: Fix a typo in an existing launch readiness rule.

Point context: ~263 tokens · one turn

TS paste: ~3,000 tokens · one turn

Saved: 91% · same repair task
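
The 91% figure can be reproduced from the sizes quoted for this fixture. The sketch below assumes the ~4 chars/token heuristic from the legend above, the 1,051-char check-json output, and the ~12,000-char illustrative paste; the function names are hypothetical and not part of the Point tooling.

```typescript
// Rough token estimate using the ~4 chars/token heuristic quoted on this page.
const CHARS_PER_TOKEN = 4;

function estimateTokens(charCount: number): number {
  return Math.round(charCount / CHARS_PER_TOKEN);
}

// Share of context saved by the smaller prompt, rounded to a whole percent.
function savedPercent(smallTokens: number, largeTokens: number): number {
  return Math.round((1 - smallTokens / largeTokens) * 100);
}

const pointTokens = estimateTokens(1051); // check-json output for this fixture
const tsTokens = estimateTokens(12000); // illustrative TS paste, not a logged trace

console.log(pointTokens, tsTokens, savedPercent(pointTokens, tsTokens));
// prints: 263 3000 91
```

The TS side is only as good as the paste estimate, so the percentage should be read as indicative rather than measured.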

With Point

Point agent loop

check-json → patch one line → point check

1. Run point check-json (+~263 tok)

The agent (or you) runs the CLI. The compiler returns structured JSON, so no file paste is needed.

CI verified

What the agent reads

{
  "schemaVersion": "point.core.check.v1",
  "ok": false,
  "diagnostics": [
    {
      "code": "unknown-field",
      "message": "Unknown field unknownField on LaunchSignals",
      "path": "fn.launchReadinessScore.if.condition",
      "ref": "point://semantic/Math/rule.launch readiness",
      "severity": "error",
      "span": {
        "start": {
          "line": 12,
          "column": 1,
          "offset": 207
        },
        "end": {
          "line": 12,
          "column": 36,
          "offset": 242
        }
      },
      "expected": [
        "has bundle id",
        "submitted for review",
        "has passing tests"
      ],
      "actual": "unknownField",
      "repair": "Use one of: has bundle id, submitted for review, has passing tests.",
      "relatedRefs": [
        "point://semantic/Math/record.Launch Signals.field.has bundle id",
        "point://semantic/Math/record.Launch Signals.field.submitted for review",
        "point://semantic/Math/record.Launch Signals.field.has passing tests"
      ]
    }
  ]
}

Measured from real CLI output on this fixture (1,051 chars). CI asserts context stays under 1,200 chars.
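
To make the shape of that output concrete, here is a minimal sketch of how an agent harness might type the diagnostic and enforce the 1,200-char budget. The interfaces cover only the fields visible in the JSON above, and `readCheckResult` and `repairHint` are hypothetical helpers, not functions shipped with Point.

```typescript
// Subset of the fields visible in the check-json output shown above.
interface Position { line: number; column: number; offset: number; }
interface Diagnostic {
  code: string;
  message: string;
  ref: string;
  severity: string;
  span: { start: Position; end: Position };
  expected?: string[];
  actual?: string;
  repair?: string;
}
interface CheckResult { schemaVersion: string; ok: boolean; diagnostics: Diagnostic[]; }

// Hypothetical helper: reject oversized context, then parse the JSON.
function readCheckResult(raw: string, maxChars: number = 1200): CheckResult {
  if (raw.length >= maxChars) {
    throw new Error(`check-json context is ${raw.length} chars, budget is ${maxChars}`);
  }
  return JSON.parse(raw) as CheckResult;
}

// Hypothetical helper: condense what the agent needs into one line.
function repairHint(d: Diagnostic): string {
  const candidates = (d.expected ?? []).join(", ");
  return `line ${d.span.start.line}: replace "${d.actual ?? "?"}" with one of [${candidates}]`;
}

// Trimmed copy of the diagnostic above, used as sample input.
const sampleRaw = JSON.stringify({
  schemaVersion: "point.core.check.v1",
  ok: false,
  diagnostics: [{
    code: "unknown-field",
    message: "Unknown field unknownField on LaunchSignals",
    ref: "point://semantic/Math/rule.launch readiness",
    severity: "error",
    span: { start: { line: 12, column: 1, offset: 207 }, end: { line: 12, column: 36, offset: 242 } },
    expected: ["has bundle id", "submitted for review", "has passing tests"],
    actual: "unknownField",
  }],
});

const parsed = readCheckResult(sampleRaw);
const hints = parsed.diagnostics.map(repairHint);
console.log(hints[0]);
```

The point of the sketch is that `expected`, `actual`, and `span` together are enough to locate and fix the line without reading any other file.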

2. Patch the line at ref (+0 tok)

The agent picks a field from expected and replaces line 12. No repo search is needed: ref and repair tell it where.

CI verified

One-line change
```
-   add 30 when signals.unknown field
+   add 30 when signals.has bundle id
```
CI applies the golden line from the fixed fixture, with no LLM involved, and point check passes.
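
The golden-line step can be sketched as a single line copy between two file contents. This is an assumption about how such a step could work, not the repo's actual test code; `applyGoldenLine` is a hypothetical name, and the two-line inputs are placeholders rather than the real fixture.

```typescript
// Copy one 1-indexed line from the fixed file into the broken file.
function applyGoldenLine(broken: string, fixed: string, line: number): string {
  const brokenLines = broken.split("\n");
  const golden = fixed.split("\n")[line - 1];
  if (line < 1 || line > brokenLines.length || golden === undefined) {
    throw new RangeError(`line ${line} is outside one of the files`);
  }
  brokenLines[line - 1] = golden;
  return brokenLines.join("\n");
}

// Placeholder contents; only the second line mirrors the diff above.
const brokenText = "placeholder first line\n  add 30 when signals.unknown field";
const fixedText = "placeholder first line\n  add 30 when signals.has bundle id";

const patchedText = applyGoldenLine(brokenText, fixedText, 2);
console.log(patchedText === fixedText);
```

Because only the line named by the diagnostic span is touched, the rest of the file is guaranteed to be unchanged.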

3. Run point check (+0 tok)

Same gate as CI. If check passes, the repair loop is done.

CI verified

Terminal

$ point check tests/fixtures/agent-repair/unknown-field-broken.point
Point core check passed: tests/fixtures/agent-repair/unknown-field-broken.point
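
Put together, the three steps form a small loop: check, patch the first diagnostic, check again. The sketch below takes the checker and patcher as parameters so it makes no claims about the CLI itself; `repairLoop`, `fakeCheck`, and `patchFirstExpected` are hypothetical, and the stand-in checker only imitates the compiler for this one typo.

```typescript
interface LoopDiagnostic { line: number; expected: string[]; }
interface LoopCheck { ok: boolean; diagnostics: LoopDiagnostic[]; }
type Checker = (source: string) => LoopCheck;
type Patcher = (source: string, d: LoopDiagnostic) => string;

// Check, patch the first diagnostic, and re-check until clean or out of steps.
function repairLoop(
  source: string, check: Checker, patch: Patcher, maxSteps: number = 3,
): { source: string; ok: boolean; steps: number } {
  let current = source;
  for (let steps = 0; steps <= maxSteps; steps++) {
    const result = check(current);
    if (result.ok) return { source: current, ok: true, steps };
    const first = result.diagnostics[0];
    if (first === undefined || steps === maxSteps) return { source: current, ok: false, steps };
    current = patch(current, first);
  }
  return { source: current, ok: false, steps: maxSteps };
}

// Stand-in checker for illustration only; the real gate is `point check`.
const fakeCheck: Checker = (src) => {
  const bad = src.split("\n").findIndex((l) => l.includes("signals.unknown field"));
  return bad === -1
    ? { ok: true, diagnostics: [] }
    : { ok: false, diagnostics: [{ line: bad + 1, expected: ["has bundle id"] }] };
};

// Replace the field reference on the flagged line with the first expected name.
const patchFirstExpected: Patcher = (src, d) => {
  const lines = src.split("\n");
  const target = lines[d.line - 1];
  const replacement = d.expected[0];
  if (target === undefined || replacement === undefined) return src;
  lines[d.line - 1] = target.replace(/signals\..*$/, `signals.${replacement}`);
  return lines.join("\n");
};

const outcome = repairLoop("  add 30 when signals.unknown field", fakeCheck, patchFirstExpected);
console.log(outcome.ok, outcome.steps, outcome.source);
```

The step cap is a design choice for the sketch: it keeps a bad patcher from looping forever, which matters more once a model rather than a golden line supplies the patch.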

Without Point

TypeScript + chat paste

tsc error → paste files → guess → often retry

1. Run tsc: error only (+~31 tok)

The compiler reports a file, line, and column in the TypeScript source, but no list of valid field names.

Estimated

What the agent reads first
```
error TS2339: Property 'unknownField' does not exist on type 'LaunchSignals'.
  at launchReadinessScore (lib/math.ts:18:15)
```
Real error shape; token count from this message only.

2. Paste surrounding code (+~2,969 tok)

Typical Cursor/Codex workflow: paste the component, lib, and tests, then let the agent hunt for the typo.

Estimated

Representative paste (truncated)

// ReadinessPanel.tsx — excerpt (~320 lines total in real repos)
import { useMemo } from "react";
import type { LaunchSignals } from "../../types";
import { launchReadinessScore, scoreStatusLabel } from "../../lib/math";

export function ReadinessPanel({ signals }: { signals: LaunchSignals }) {
  const score = useMemo(() => launchReadinessScore(signals), [signals]);
  const label = scoreStatusLabel(score);
  return (<section><h2>Launch readiness</h2><p>Score: {score} — {label}</p></section>);
}

// lib/math.ts — agent often pastes this too when tsc fails
export function launchReadinessScore(signals: LaunchSignals): number {
  let score = 0;
  if (signals.unknownField) score += 30;
  if (signals.submittedForReview) score += 40;
  if (signals.hasPassingTests) score += 30;
  return score;
}

Illustrative ~12,000-char paste heuristic. Not from a logged agent session.

3. Wrong fix → same paste again (+~3,000 tok)

If the model emits camelCase or patches the wrong file, the next turn reloads the same context.

Estimated

Second turn (common)
```
Same ~3,000 tokens pasted again + new error output
```
Not CI-tested. Shown because retry loops are common without structured repair hints.
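
As a rough illustration of how retries compound, the sketch below totals the per-step figures from this walkthrough. The TS numbers are the page's estimates, new error output on retry turns is ignored, and `tsPasteTotal` is a hypothetical helper.

```typescript
// Per-turn figures quoted in this walkthrough (TS numbers are estimates).
const POINT_TURN_TOKENS = 263;
const TS_FIRST_TURN_TOKENS = 31 + 2969; // tsc error + pasted files
const TS_RETRY_TOKENS = 3000; // same paste again, excluding new error output

// Total TS-paste context after a given number of retry turns.
function tsPasteTotal(retries: number): number {
  return TS_FIRST_TURN_TOKENS + retries * TS_RETRY_TOKENS;
}

for (const retries of [0, 1, 2]) {
  console.log(`${retries} retries: ~${tsPasteTotal(retries)} TS tokens vs ~${POINT_TURN_TOKENS} Point tokens`);
}
```

With one retry the estimated TS total doubles to ~6,000 tokens, while the Point figure stays at the single measured turn.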

Reproduce CI proof: bun test tests/agent-repair-sufficiency.test.ts tests/agent-repair-multistep.test.ts && bun run benchmark:agent-repair · Fixtures: tests/fixtures/agent-repair/