feat: thinking-only prefill continuation for structured reasoning responses by teknium1 · Pull Request #5931 · NousResearch/hermes-agent

teknium1 · 2026-04-07T20:18:55Z

Summary

When the model produces structured reasoning (via API fields like .reasoning, .reasoning_content, .reasoning_details) but no visible text content, the agent now appends the assistant message as prefill and continues the loop. The model sees its own reasoning context on the next turn and produces the text portion.

Inspired by clawdbot's "incomplete-text" recovery pattern discovered during competitive codebase analysis (clawdbot, pi-mono, cline).

What changed

run_agent.py (6 change points):

Added _thinking_prefill_retries counter (reset per turn)
Core prefill logic: detects structured reasoning + no content → appends message with _thinking_prefill=True marker → continues loop (up to 2 attempts)
Prefill message cleanup on both tool-call and final-response paths (prevents consecutive assistant messages that break Anthropic's strict role alternation)
_thinking_prefill marker stripped from all 3 API message building paths
Counter reset on successful content

tests/test_run_agent.py:

Updated 2 existing tests for new prefill behavior (expect 3 API calls instead of 1)
Added test_reasoning_only_prefill_succeeds_on_continuation — verifies prefill produces content with no consecutive assistant messages

Design decisions

Only structured reasoning — inline <think> tags without API reasoning fields go straight to (empty) as before
2 attempts max — avoids infinite loops; falls through to existing behavior after exhaustion
Prefill messages popped on success — maintains strict role alternation for all providers
Works across providers — OpenAI (continuation), Anthropic (native prefill), Codex (already has similar pattern)

Testing

228/228 unit tests pass
4 E2E scenarios with real imports verified
Live E2E: injected thinking-only response → real OpenRouter continuation → model produced correct content
Confirmed Qwen models consistently produce structured-reasoning-only under token pressure (6/6 attempts)
25 concurrent normal-token requests confirmed stop + reasoning-only is a genuine provider anomaly (0/25 reproduced organically)

…ponses When the model produces structured reasoning (via API fields like .reasoning, .reasoning_content, .reasoning_details) but no visible text content, append the assistant message as prefill and continue the loop. The model sees its own reasoning context on the next turn and produces the text portion. Inspired by clawdbot's 'incomplete-text' recovery pattern. Up to 2 prefill attempts before falling through to the existing '(empty)' terminal. Key design decisions: - Only triggers for structured reasoning (API fields), NOT inline <think> tags - Prefill messages are popped on success to maintain strict role alternation - _thinking_prefill marker stripped from all API message building paths - Works across all providers: OpenAI (continuation), Anthropic (native prefill) Verified with E2E tests: simulated thinking-only → real OpenRouter continuation produces correct content. Also confirmed Qwen models consistently produce structured-reasoning-only responses under token pressure.

github-actions · 2026-04-07T20:19:11Z

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: exec() or eval() usage

Dynamic code execution can hide malicious behavior, especially when combined with base64 or network fetches.

Matches (first 20):

5506:+        stat_result = self._exec(stat_cmd)
5527:+        b64_result = self._exec(b64_cmd, timeout=30)
5541:+            dim_result = self._exec(dim_cmd)

⚠️ WARNING: Install hook files modified

These files can execute code during package installation or interpreter startup.

Files:

hermes_cli/memory_setup.py
hermes_cli/setup.py

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

…ponses (NousResearch#5931) When the model produces structured reasoning (via API fields like .reasoning, .reasoning_content, .reasoning_details) but no visible text content, append the assistant message as prefill and continue the loop. The model sees its own reasoning context on the next turn and produces the text portion. Inspired by clawdbot's 'incomplete-text' recovery pattern. Up to 2 prefill attempts before falling through to the existing '(empty)' terminal. Key design decisions: - Only triggers for structured reasoning (API fields), NOT inline <think> tags - Prefill messages are popped on success to maintain strict role alternation - _thinking_prefill marker stripped from all API message building paths - Works across all providers: OpenAI (continuation), Anthropic (native prefill) Verified with E2E tests: simulated thinking-only → real OpenRouter continuation produces correct content. Also confirmed Qwen models consistently produce structured-reasoning-only responses under token pressure.

When a model returns no content, no structured reasoning, and no tool calls (common with open models), the agent now nudges the model up to 3 times before falling through to (empty). Each retry appends the empty assistant message and a system nudge asking the model to respond. This fills the last gap in the empty-response recovery chain: 1. _last_content_with_tools fallback (prior tool turn had content) 2. Thinking-only prefill continuation (#5931 — structured reasoning) 3. Empty response nudge retry (NEW — truly empty, no reasoning) 4. (empty) terminal (last resort after all retries exhausted) Inline <think> blocks are excluded — the model chose to reason, it just produced no visible text. That is different from truly empty. Tests: - Updated test_truly_empty to expect 4 API calls (1 + 3 retries) - Added test_truly_empty_response_succeeds_on_nudge (content on retry)

When a model returns no content, no structured reasoning, and no tool calls (common with open models), the agent now silently retries up to 3 times before falling through to (empty). Silent retry (no synthetic messages) keeps the conversation history clean, preserves prompt caching, and respects the no-synthetic-user- injection invariant. Most empty responses from open models are transient (provider hiccups, rate limits, sampling flukes) so a simple retry is sufficient. This fills the last gap in the empty-response recovery chain: 1. _last_content_with_tools fallback (prior tool turn had content) 2. Thinking-only prefill continuation (#5931 — structured reasoning) 3. Empty response silent retry (NEW — truly empty, no reasoning) 4. (empty) terminal (last resort after all retries exhausted) Inline <think> blocks are excluded — the model chose to reason, it just produced no visible text. That differs from truly empty. Tests: - Updated test_truly_empty to expect 4 API calls (1 + 3 retries) - Added test_truly_empty_response_succeeds_on_nudge

…ponses (NousResearch#5931) When the model produces structured reasoning (via API fields like .reasoning, .reasoning_content, .reasoning_details) but no visible text content, append the assistant message as prefill and continue the loop. The model sees its own reasoning context on the next turn and produces the text portion. Inspired by clawdbot's 'incomplete-text' recovery pattern. Up to 2 prefill attempts before falling through to the existing '(empty)' terminal. Key design decisions: - Only triggers for structured reasoning (API fields), NOT inline <think> tags - Prefill messages are popped on success to maintain strict role alternation - _thinking_prefill marker stripped from all API message building paths - Works across all providers: OpenAI (continuation), Anthropic (native prefill) Verified with E2E tests: simulated thinking-only → real OpenRouter continuation produces correct content. Also confirmed Qwen models consistently produce structured-reasoning-only responses under token pressure.

…rch#6488) When a model returns no content, no structured reasoning, and no tool calls (common with open models), the agent now silently retries up to 3 times before falling through to (empty). Silent retry (no synthetic messages) keeps the conversation history clean, preserves prompt caching, and respects the no-synthetic-user- injection invariant. Most empty responses from open models are transient (provider hiccups, rate limits, sampling flukes) so a simple retry is sufficient. This fills the last gap in the empty-response recovery chain: 1. _last_content_with_tools fallback (prior tool turn had content) 2. Thinking-only prefill continuation (NousResearch#5931 — structured reasoning) 3. Empty response silent retry (NEW — truly empty, no reasoning) 4. (empty) terminal (last resort after all retries exhausted) Inline <think> blocks are excluded — the model chose to reason, it just produced no visible text. That differs from truly empty. Tests: - Updated test_truly_empty to expect 4 API calls (1 + 3 retries) - Added test_truly_empty_response_succeeds_on_nudge

…ponses (NousResearch#5931) When the model produces structured reasoning (via API fields like .reasoning, .reasoning_content, .reasoning_details) but no visible text content, append the assistant message as prefill and continue the loop. The model sees its own reasoning context on the next turn and produces the text portion. Inspired by clawdbot's 'incomplete-text' recovery pattern. Up to 2 prefill attempts before falling through to the existing '(empty)' terminal. Key design decisions: - Only triggers for structured reasoning (API fields), NOT inline <think> tags - Prefill messages are popped on success to maintain strict role alternation - _thinking_prefill marker stripped from all API message building paths - Works across all providers: OpenAI (continuation), Anthropic (native prefill) Verified with E2E tests: simulated thinking-only → real OpenRouter continuation produces correct content. Also confirmed Qwen models consistently produce structured-reasoning-only responses under token pressure.

…rch#6488) When a model returns no content, no structured reasoning, and no tool calls (common with open models), the agent now silently retries up to 3 times before falling through to (empty). Silent retry (no synthetic messages) keeps the conversation history clean, preserves prompt caching, and respects the no-synthetic-user- injection invariant. Most empty responses from open models are transient (provider hiccups, rate limits, sampling flukes) so a simple retry is sufficient. This fills the last gap in the empty-response recovery chain: 1. _last_content_with_tools fallback (prior tool turn had content) 2. Thinking-only prefill continuation (NousResearch#5931 — structured reasoning) 3. Empty response silent retry (NEW — truly empty, no reasoning) 4. (empty) terminal (last resort after all retries exhausted) Inline <think> blocks are excluded — the model chose to reason, it just produced no visible text. That differs from truly empty. Tests: - Updated test_truly_empty to expect 4 API calls (1 + 3 retries) - Added test_truly_empty_response_succeeds_on_nudge

teknium1 merged commit ab8f9c0 into main Apr 7, 2026
5 of 6 checks passed

teknium1 mentioned this pull request Apr 9, 2026

fix: retry 3 times with nudge when model returns truly empty response #6488

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: thinking-only prefill continuation for structured reasoning responses#5931

feat: thinking-only prefill continuation for structured reasoning responses#5931
teknium1 merged 1 commit intomainfrom
hermes/hermes-81f85bb4

teknium1 commented Apr 7, 2026

Uh oh!

Uh oh!

github-actions bot commented Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

teknium1 commented Apr 7, 2026

Summary

What changed

Design decisions

Testing

Uh oh!

Uh oh!

github-actions bot commented Apr 7, 2026

⚠️ Supply Chain Risk Detected

⚠️ WARNING: exec() or eval() usage

⚠️ WARNING: Install hook files modified

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant