Skip to content

fix(agent-api): make free-form pending questions answerable + unblock /answer#1855

Open
john-the-dev wants to merge 1 commit into
sonichi:mainfrom
john-the-dev:codex/fix-web-answer-response
Open

fix(agent-api): make free-form pending questions answerable + unblock /answer#1855
john-the-dev wants to merge 1 commit into
sonichi:mainfrom
john-the-dev:codex/fix-web-answer-response

Conversation

@john-the-dev

@john-the-dev john-the-dev commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Summary

The web UI listed a pending question (Q1) but answering it failed with question Q1 not found or already answered. Two root causes behind the single "no response" symptom:

  • /answer marker gate. POST /answer skipped any section whose body lacked **Status:**/**Options:** markers, while GET /tasks/active lists marker-less free-form prose questions (the format the proactive loop and the CLAUDE.md "Pending decisions" protocol produce). The UI advertised a question the answer endpoint structurally refused -> 404. Fix: drop the marker gate, mirror the listing filter, and append a **Status:** Answered line when the matched section has none so it records and drops out of the next list. Q-numbering already aligned: both endpoints index the same re.split('^## ') sections.
  • Single-threaded server. The API ran on http.server.HTTPServer, so a slow GET /status (it shells out to health-check.py, about 10s) blocked concurrent /answer and /result requests. The answer could hang and the UI rendered nothing. Fix: switch to ThreadingHTTPServer.

Test plan

  • python3 tests/agent-api-answer-freeform.test.py — new regression: a marker-less prose question is answerable and an already-answered one still 404s.
  • python3 tests/agent-api-pending-questions.test.py — structural guard that the marker gate is gone from /answer; threading-server assertion; preamble/[RESOLVED]/resolved-text skips preserved.
  • python3 -m py_compile src/agent-api.py tests/agent-api-pending-questions.test.py tests/agent-api-answer-freeform.test.py.
  • git diff --check.
  • Failing-before evidence on origin/main: POST /answer {"id":"Q1"} returned 404 while GET /tasks/active listed Q1.
  • Live patched-server evidence: restarted local agent-api.py, appended a throwaway marker-less pending question, confirmed GET /tasks/active listed it as Q2, POST /answer returned 200, verified the answered status was written, then restored the pending file and removed the temporary answer task.

Reviewer notes

  • This is intentionally scoped to src/agent-api.py; it does not touch the separate front-end transcript persistence issue from the earlier web-chat diagnosis.
  • The live test used a temporary entry and restored workspace/hosts/Ruis-MacBook-Pro-2/pending-questions.md afterward.

… /answer

Web UI showed a question (Q1) it then refused to answer with
"question Q1 not found or already answered".

Two root causes behind the one "no response" symptom:
- POST /answer skipped any section lacking **Status:**/**Options:** markers,
  but GET /tasks/active lists marker-less free-form prose questions (the format
  the proactive loop and the "Pending decisions" protocol produce). So the UI
  advertised a question the answer endpoint structurally refused. Drop the
  marker gate and mirror the listing filter; append a Status line when the
  matched section has none, so it records and drops out of the next list.
- The API ran on single-threaded HTTPServer, so a slow GET /status
  (health-check subprocess, ~10s) blocked concurrent /answer and /result —
  the answer request hung and the UI showed nothing. Switch to
  ThreadingHTTPServer.

Adds a freeform-answer regression test and a threading-server assertion.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@john-the-dev

john-the-dev commented Jun 30, 2026

Copy link
Copy Markdown
Contributor Author

Attribution: the fix in this PR was authored by codex — it wrote the /answer rewrite, the ThreadingHTTPServer switch, and the regression tests, and ran the first live patched-server test. This session verified, committed, pushed, and opened the PR at the owner's request, then re-ran a fresh non-destructive live check against the running server: a marker-less prose question (Q2) returned HTTP 200, and a concurrent /tasks/active returned in 34ms while a slow /status was in flight (threading confirmed). No real pending question was consumed.

@bassilkhilo-ag2 bassilkhilo-ag2 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Root cause is correct — GET /tasks/active listed marker-less free-form questions that POST /answer structurally refused, causing the UI to advertise an unanswerable Q1. Dropping the marker gate from /answer and mirroring the same filter closes the gap. The free-form fallback (appending **Status:** Answered when no status line exists) handles the write correctly. ThreadingHTTPServer bonus is a real reliability win — a long /result poll can no longer starve /answer. Tests cover both the freeform-answerable and already-answered-still-404s cases. Approve.

@bassilkhilo-ag2 bassilkhilo-ag2 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both root causes properly addressed. The section reconstruction via new_sections mirrors the GET listing exactly — same re.split(r"^## ") split, same ## prefix on rejoin, so Q-numbering stays in sync. The else: branch that appends Status: Answered when the section has no existing marker correctly extends the answerable set to free-form prose questions without changing their content. ThreadingHTTPServer is the right fix for a server that shells out to health-check.py during GET /status — the single-thread deadlock was genuinely blocking answering. ✓

@bassilkhilo-ag2 bassilkhilo-ag2 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix. The previous code skipped free-form sections (no Status:/Options:), which caused 404 on answering visible questions. The reworked loop correctly appends Status: Answered to sections that lack the field. ThreadingHTTPServer upgrade is also correct — blocking HTTP on long-running writes was a latent issue. LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants