feat: thinking-only prefill continuation for structured reasoning responses #5931
Merged
Conversation
…ponses

When the model produces structured reasoning (via API fields like .reasoning, .reasoning_content, .reasoning_details) but no visible text content, append the assistant message as prefill and continue the loop. The model sees its own reasoning context on the next turn and produces the text portion. Inspired by clawdbot's 'incomplete-text' recovery pattern.

Up to 2 prefill attempts before falling through to the existing '(empty)' terminal.

Key design decisions:
- Only triggers for structured reasoning (API fields), NOT inline <think> tags
- Prefill messages are popped on success to maintain strict role alternation
- _thinking_prefill marker stripped from all API message building paths
- Works across all providers: OpenAI (continuation), Anthropic (native prefill)

Verified with E2E tests: simulated thinking-only → real OpenRouter continuation produces correct content. Also confirmed Qwen models consistently produce structured-reasoning-only responses under token pressure.
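The continuation loop can be sketched roughly as follows. This is a hedged, minimal sketch: `call_api`, the message-dict shapes, and the helper names (`has_structured_reasoning`, `resolve_thinking_only`) are illustrative stand-ins, not the real run_agent.py code; only the `_thinking_prefill` marker, the reasoning field names, and the 2-attempt limit come from the commit message.

```python
# Minimal sketch of the thinking-only prefill continuation.
# Assumptions: messages are plain dicts; call_api(messages) returns an
# assistant message dict. Helper names are hypothetical.

MAX_THINKING_PREFILL_RETRIES = 2  # "up to 2 prefill attempts" per the commit


def has_structured_reasoning(msg: dict) -> bool:
    # Only API-level reasoning fields count; inline <think> tags do not.
    return any(
        msg.get(k)
        for k in ("reasoning", "reasoning_content", "reasoning_details")
    )


def resolve_thinking_only(messages: list, call_api) -> str:
    """Retry with the assistant's reasoning prefilled until text appears."""
    retries = 0
    while True:
        msg = call_api(messages)
        if msg.get("content"):
            # Success: pop any prefill turns we appended so the history
            # keeps strict user/assistant role alternation.
            while messages and messages[-1].get("_thinking_prefill"):
                messages.pop()
            return msg["content"]
        if has_structured_reasoning(msg) and retries < MAX_THINKING_PREFILL_RETRIES:
            retries += 1
            # Append the reasoning-only assistant turn as prefill and loop;
            # the model sees its own reasoning context on the next call.
            messages.append({**msg, "role": "assistant", "_thinking_prefill": True})
            continue
        return "(empty)"  # fall through to the existing terminal
```

On success the prefill turns are removed again, so a later API call never sees two consecutive assistant messages.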
Schrotti77 pushed a commit to Schrotti77/hermes-agent that referenced this pull request on Apr 7, 2026
saxster pushed a commit to saxster/hermes-agent that referenced this pull request on Apr 8, 2026
teknium1 added a commit that referenced this pull request on Apr 9, 2026
When a model returns no content, no structured reasoning, and no tool calls (common with open models), the agent now nudges the model up to 3 times before falling through to (empty). Each retry appends the empty assistant message and a system nudge asking the model to respond.

This fills the last gap in the empty-response recovery chain:
1. _last_content_with_tools fallback (prior tool turn had content)
2. Thinking-only prefill continuation (#5931 — structured reasoning)
3. Empty response nudge retry (NEW — truly empty, no reasoning)
4. (empty) terminal (last resort after all retries exhausted)

Inline <think> blocks are excluded — the model chose to reason, it just produced no visible text. That is different from truly empty.

Tests:
- Updated test_truly_empty to expect 4 API calls (1 + 3 retries)
- Added test_truly_empty_response_succeeds_on_nudge (content on retry)
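This earlier nudge-retry variant could look roughly like the sketch below. The helper name, nudge wording, and message shapes are assumptions; only the 3-retry limit and the "append empty assistant message plus system nudge" behavior come from the commit message.

```python
# Hedged sketch of the nudge-retry variant (later replaced by silent retry).
# nudge_retry and the nudge text are hypothetical names/wording.

NUDGE = {"role": "system", "content": "Your last reply was empty. Please respond."}


def is_truly_empty(msg: dict) -> bool:
    # No visible text, no API reasoning fields, no tool calls.
    return not (
        msg.get("content")
        or msg.get("reasoning")
        or msg.get("reasoning_content")
        or msg.get("reasoning_details")
        or msg.get("tool_calls")
    )


def nudge_retry(messages: list, call_api, max_retries: int = 3) -> dict:
    """Append the empty assistant turn plus a system nudge, then retry."""
    msg = call_api(messages)
    for _ in range(max_retries):
        if not is_truly_empty(msg):
            return msg
        messages.append({"role": "assistant", "content": ""})
        messages.append(dict(NUDGE))
        msg = call_api(messages)
    return msg  # still empty -> caller falls through to "(empty)"
```

A permanently empty model yields 1 + 3 = 4 API calls, matching the updated test_truly_empty expectation.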
teknium1 added a commit that referenced this pull request on Apr 9, 2026
When a model returns no content, no structured reasoning, and no tool calls (common with open models), the agent now silently retries up to 3 times before falling through to (empty).

Silent retry (no synthetic messages) keeps the conversation history clean, preserves prompt caching, and respects the no-synthetic-user-injection invariant. Most empty responses from open models are transient (provider hiccups, rate limits, sampling flukes), so a simple retry is sufficient.

This fills the last gap in the empty-response recovery chain:
1. _last_content_with_tools fallback (prior tool turn had content)
2. Thinking-only prefill continuation (#5931 — structured reasoning)
3. Empty response silent retry (NEW — truly empty, no reasoning)
4. (empty) terminal (last resort after all retries exhausted)

Inline <think> blocks are excluded — the model chose to reason, it just produced no visible text. That differs from truly empty.

Tests:
- Updated test_truly_empty to expect 4 API calls (1 + 3 retries)
- Added test_truly_empty_response_succeeds_on_nudge
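The silent-retry step can be sketched as below. This is an assumed shape, not the real loop: the helper names are hypothetical, and the real agent interleaves this with tool handling; only the 3-retry limit, the "truly empty" definition, and the no-history-mutation property come from the commit message.

```python
# Hedged sketch of the silent retry for truly empty responses.
# call_with_silent_retry and is_truly_empty are hypothetical helper names.

MAX_EMPTY_RETRIES = 3


def is_truly_empty(msg: dict) -> bool:
    # No visible text, no API reasoning fields, no tool calls.
    return not (
        msg.get("content")
        or msg.get("reasoning")
        or msg.get("reasoning_content")
        or msg.get("reasoning_details")
        or msg.get("tool_calls")
    )


def call_with_silent_retry(messages: list, call_api) -> dict:
    """Re-issue the identical request; never mutate history (cache-friendly)."""
    msg = call_api(messages)
    for _ in range(MAX_EMPTY_RETRIES):
        if not is_truly_empty(msg):
            return msg
        msg = call_api(messages)  # silent: no synthetic messages appended
    return msg  # still empty -> caller falls through to "(empty)"
```

Because the request payload is byte-identical on every retry, provider-side prompt caching stays warm, unlike the nudge variant, which grew the history on each attempt.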
teknium1 added a commit that referenced this pull request on Apr 9, 2026
DiscoStew6082 pushed a commit to DiscoStew6082/hermes-agent that referenced this pull request on Apr 9, 2026
DiscoStew6082 pushed a commit to DiscoStew6082/hermes-agent that referenced this pull request on Apr 9, 2026
…rch#6488)

When a model returns no content, no structured reasoning, and no tool calls (common with open models), the agent now silently retries up to 3 times before falling through to (empty).

Silent retry (no synthetic messages) keeps the conversation history clean, preserves prompt caching, and respects the no-synthetic-user-injection invariant. Most empty responses from open models are transient (provider hiccups, rate limits, sampling flukes), so a simple retry is sufficient.

This fills the last gap in the empty-response recovery chain:
1. _last_content_with_tools fallback (prior tool turn had content)
2. Thinking-only prefill continuation (NousResearch#5931 — structured reasoning)
3. Empty response silent retry (NEW — truly empty, no reasoning)
4. (empty) terminal (last resort after all retries exhausted)

Inline <think> blocks are excluded — the model chose to reason, it just produced no visible text. That differs from truly empty.

Tests:
- Updated test_truly_empty to expect 4 API calls (1 + 3 retries)
- Added test_truly_empty_response_succeeds_on_nudge
dbmizrahi pushed a commit to dbmizrahi/hermes-agent that referenced this pull request on Apr 10, 2026
kawanoii pushed a commit to kawanoii/hermes-agent that referenced this pull request on Apr 11, 2026
Summary
When the model produces structured reasoning (via API fields like `.reasoning`, `.reasoning_content`, `.reasoning_details`) but no visible text content, the agent now appends the assistant message as prefill and continues the loop. The model sees its own reasoning context on the next turn and produces the text portion. Inspired by clawdbot's "incomplete-text" recovery pattern discovered during competitive codebase analysis (clawdbot, pi-mono, cline).
What changed
run_agent.py (6 change points):

- `_thinking_prefill_retries` counter (reset per turn)
- `_thinking_prefill=True` marker → continues loop (up to 2 attempts)
- `_thinking_prefill` marker stripped from all 3 API message building paths

tests/test_run_agent.py:
- `test_reasoning_only_prefill_succeeds_on_continuation` — verifies prefill produces content with no consecutive assistant messages

Design decisions
- Inline `<think>` tags without API reasoning fields go straight to `(empty)` as before

Testing
- `stop` + reasoning-only is a genuine provider anomaly (0/25 reproduced organically)
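The marker stripping listed under "What changed" can be sketched as follows. The PR only names the `_thinking_prefill` key; stripping every underscore-prefixed key is an assumed convention here, and `strip_internal_markers` is a hypothetical helper name, not the actual implementation.

```python
# Hedged sketch: drop internal bookkeeping keys before building an API
# request. Assumption: internal keys all use an underscore prefix, like
# _thinking_prefill; the real change touches 3 message-building paths.

def strip_internal_markers(messages: list) -> list:
    """Return copies of the messages without underscore-prefixed keys."""
    return [
        {k: v for k, v in m.items() if not k.startswith("_")}
        for m in messages
    ]
```

Each of the API message-building paths would pass the history through a step like this before serializing, so the provider never sees the marker.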