Skip to content

feat: thinking-only prefill continuation for structured reasoning responses#5931

Merged
teknium1 merged 1 commit intomainfrom
hermes/hermes-81f85bb4
Apr 7, 2026
Merged

feat: thinking-only prefill continuation for structured reasoning responses#5931
teknium1 merged 1 commit intomainfrom
hermes/hermes-81f85bb4

Conversation

@teknium1
Copy link
Copy Markdown
Contributor

@teknium1 teknium1 commented Apr 7, 2026

Summary

When the model produces structured reasoning (via API fields like .reasoning, .reasoning_content, .reasoning_details) but no visible text content, the agent now appends the assistant message as prefill and continues the loop. The model sees its own reasoning context on the next turn and produces the text portion.

Inspired by clawdbot's "incomplete-text" recovery pattern discovered during competitive codebase analysis (clawdbot, pi-mono, cline).

What changed

run_agent.py (6 change points):

  • Added _thinking_prefill_retries counter (reset per turn)
  • Core prefill logic: detects structured reasoning + no content → appends message with _thinking_prefill=True marker → continues loop (up to 2 attempts)
  • Prefill message cleanup on both tool-call and final-response paths (prevents consecutive assistant messages that break Anthropic's strict role alternation)
  • _thinking_prefill marker stripped from all 3 API message building paths
  • Counter reset on successful content

tests/test_run_agent.py:

  • Updated 2 existing tests for new prefill behavior (expect 3 API calls instead of 1)
  • Added test_reasoning_only_prefill_succeeds_on_continuation — verifies prefill produces content with no consecutive assistant messages

Design decisions

  • Only structured reasoning — inline <think> tags without API reasoning fields go straight to (empty) as before
  • 2 attempts max — avoids infinite loops; falls through to existing behavior after exhaustion
  • Prefill messages popped on success — maintains strict role alternation for all providers
  • Works across providers — OpenAI (continuation), Anthropic (native prefill), Codex (already has similar pattern)

Testing

  • 228/228 unit tests pass
  • 4 E2E scenarios with real imports verified
  • Live E2E: injected thinking-only response → real OpenRouter continuation → model produced correct content
  • Confirmed Qwen models consistently produce structured-reasoning-only under token pressure (6/6 attempts)
  • 25 concurrent normal-token requests confirmed stop + reasoning-only is a genuine provider anomaly (0/25 reproduced organically)

…ponses

When the model produces structured reasoning (via API fields like .reasoning,
.reasoning_content, .reasoning_details) but no visible text content, append
the assistant message as prefill and continue the loop. The model sees its own
reasoning context on the next turn and produces the text portion.

Inspired by clawdbot's 'incomplete-text' recovery pattern. Up to 2 prefill
attempts before falling through to the existing '(empty)' terminal.

Key design decisions:
- Only triggers for structured reasoning (API fields), NOT inline <think> tags
- Prefill messages are popped on success to maintain strict role alternation
- _thinking_prefill marker stripped from all API message building paths
- Works across all providers: OpenAI (continuation), Anthropic (native prefill)

Verified with E2E tests: simulated thinking-only → real OpenRouter continuation
produces correct content. Also confirmed Qwen models consistently produce
structured-reasoning-only responses under token pressure.
@teknium1 teknium1 merged commit ab8f9c0 into main Apr 7, 2026
5 of 6 checks passed
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2026

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: exec() or eval() usage

Dynamic code execution can hide malicious behavior, especially when combined with base64 or network fetches.

Matches (first 20):

5506:+        stat_result = self._exec(stat_cmd)
5527:+        b64_result = self._exec(b64_cmd, timeout=30)
5541:+            dim_result = self._exec(dim_cmd)

⚠️ WARNING: Install hook files modified

These files can execute code during package installation or interpreter startup.

Files:

hermes_cli/memory_setup.py
hermes_cli/setup.py

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

Schrotti77 pushed a commit to Schrotti77/hermes-agent that referenced this pull request Apr 7, 2026
…ponses (NousResearch#5931)

When the model produces structured reasoning (via API fields like .reasoning,
.reasoning_content, .reasoning_details) but no visible text content, append
the assistant message as prefill and continue the loop. The model sees its own
reasoning context on the next turn and produces the text portion.

Inspired by clawdbot's 'incomplete-text' recovery pattern. Up to 2 prefill
attempts before falling through to the existing '(empty)' terminal.

Key design decisions:
- Only triggers for structured reasoning (API fields), NOT inline <think> tags
- Prefill messages are popped on success to maintain strict role alternation
- _thinking_prefill marker stripped from all API message building paths
- Works across all providers: OpenAI (continuation), Anthropic (native prefill)

Verified with E2E tests: simulated thinking-only → real OpenRouter continuation
produces correct content. Also confirmed Qwen models consistently produce
structured-reasoning-only responses under token pressure.
saxster pushed a commit to saxster/hermes-agent that referenced this pull request Apr 8, 2026
…ponses (NousResearch#5931)

When the model produces structured reasoning (via API fields like .reasoning,
.reasoning_content, .reasoning_details) but no visible text content, append
the assistant message as prefill and continue the loop. The model sees its own
reasoning context on the next turn and produces the text portion.

Inspired by clawdbot's 'incomplete-text' recovery pattern. Up to 2 prefill
attempts before falling through to the existing '(empty)' terminal.

Key design decisions:
- Only triggers for structured reasoning (API fields), NOT inline <think> tags
- Prefill messages are popped on success to maintain strict role alternation
- _thinking_prefill marker stripped from all API message building paths
- Works across all providers: OpenAI (continuation), Anthropic (native prefill)

Verified with E2E tests: simulated thinking-only → real OpenRouter continuation
produces correct content. Also confirmed Qwen models consistently produce
structured-reasoning-only responses under token pressure.
teknium1 added a commit that referenced this pull request Apr 9, 2026
When a model returns no content, no structured reasoning, and no tool
calls (common with open models), the agent now nudges the model up to
3 times before falling through to (empty).  Each retry appends the
empty assistant message and a system nudge asking the model to respond.

This fills the last gap in the empty-response recovery chain:
1. _last_content_with_tools fallback (prior tool turn had content)
2. Thinking-only prefill continuation (#5931 — structured reasoning)
3. Empty response nudge retry (NEW — truly empty, no reasoning)
4. (empty) terminal (last resort after all retries exhausted)

Inline <think> blocks are excluded — the model chose to reason, it
just produced no visible text.  That is different from truly empty.

Tests:
- Updated test_truly_empty to expect 4 API calls (1 + 3 retries)
- Added test_truly_empty_response_succeeds_on_nudge (content on retry)
teknium1 added a commit that referenced this pull request Apr 9, 2026
When a model returns no content, no structured reasoning, and no tool
calls (common with open models), the agent now silently retries up to
3 times before falling through to (empty).

Silent retry (no synthetic messages) keeps the conversation history
clean, preserves prompt caching, and respects the no-synthetic-user-
injection invariant.  Most empty responses from open models are
transient (provider hiccups, rate limits, sampling flukes) so a
simple retry is sufficient.

This fills the last gap in the empty-response recovery chain:
1. _last_content_with_tools fallback (prior tool turn had content)
2. Thinking-only prefill continuation (#5931 — structured reasoning)
3. Empty response silent retry (NEW — truly empty, no reasoning)
4. (empty) terminal (last resort after all retries exhausted)

Inline <think> blocks are excluded — the model chose to reason, it
just produced no visible text.  That differs from truly empty.

Tests:
- Updated test_truly_empty to expect 4 API calls (1 + 3 retries)
- Added test_truly_empty_response_succeeds_on_nudge
teknium1 added a commit that referenced this pull request Apr 9, 2026
When a model returns no content, no structured reasoning, and no tool
calls (common with open models), the agent now silently retries up to
3 times before falling through to (empty).

Silent retry (no synthetic messages) keeps the conversation history
clean, preserves prompt caching, and respects the no-synthetic-user-
injection invariant.  Most empty responses from open models are
transient (provider hiccups, rate limits, sampling flukes) so a
simple retry is sufficient.

This fills the last gap in the empty-response recovery chain:
1. _last_content_with_tools fallback (prior tool turn had content)
2. Thinking-only prefill continuation (#5931 — structured reasoning)
3. Empty response silent retry (NEW — truly empty, no reasoning)
4. (empty) terminal (last resort after all retries exhausted)

Inline <think> blocks are excluded — the model chose to reason, it
just produced no visible text.  That differs from truly empty.

Tests:
- Updated test_truly_empty to expect 4 API calls (1 + 3 retries)
- Added test_truly_empty_response_succeeds_on_nudge
DiscoStew6082 pushed a commit to DiscoStew6082/hermes-agent that referenced this pull request Apr 9, 2026
…ponses (NousResearch#5931)

When the model produces structured reasoning (via API fields like .reasoning,
.reasoning_content, .reasoning_details) but no visible text content, append
the assistant message as prefill and continue the loop. The model sees its own
reasoning context on the next turn and produces the text portion.

Inspired by clawdbot's 'incomplete-text' recovery pattern. Up to 2 prefill
attempts before falling through to the existing '(empty)' terminal.

Key design decisions:
- Only triggers for structured reasoning (API fields), NOT inline <think> tags
- Prefill messages are popped on success to maintain strict role alternation
- _thinking_prefill marker stripped from all API message building paths
- Works across all providers: OpenAI (continuation), Anthropic (native prefill)

Verified with E2E tests: simulated thinking-only → real OpenRouter continuation
produces correct content. Also confirmed Qwen models consistently produce
structured-reasoning-only responses under token pressure.
DiscoStew6082 pushed a commit to DiscoStew6082/hermes-agent that referenced this pull request Apr 9, 2026
…rch#6488)

When a model returns no content, no structured reasoning, and no tool
calls (common with open models), the agent now silently retries up to
3 times before falling through to (empty).

Silent retry (no synthetic messages) keeps the conversation history
clean, preserves prompt caching, and respects the no-synthetic-user-
injection invariant.  Most empty responses from open models are
transient (provider hiccups, rate limits, sampling flukes) so a
simple retry is sufficient.

This fills the last gap in the empty-response recovery chain:
1. _last_content_with_tools fallback (prior tool turn had content)
2. Thinking-only prefill continuation (NousResearch#5931 — structured reasoning)
3. Empty response silent retry (NEW — truly empty, no reasoning)
4. (empty) terminal (last resort after all retries exhausted)

Inline <think> blocks are excluded — the model chose to reason, it
just produced no visible text.  That differs from truly empty.

Tests:
- Updated test_truly_empty to expect 4 API calls (1 + 3 retries)
- Added test_truly_empty_response_succeeds_on_nudge
dbmizrahi pushed a commit to dbmizrahi/hermes-agent that referenced this pull request Apr 10, 2026
…ponses (NousResearch#5931)

When the model produces structured reasoning (via API fields like .reasoning,
.reasoning_content, .reasoning_details) but no visible text content, append
the assistant message as prefill and continue the loop. The model sees its own
reasoning context on the next turn and produces the text portion.

Inspired by clawdbot's 'incomplete-text' recovery pattern. Up to 2 prefill
attempts before falling through to the existing '(empty)' terminal.

Key design decisions:
- Only triggers for structured reasoning (API fields), NOT inline <think> tags
- Prefill messages are popped on success to maintain strict role alternation
- _thinking_prefill marker stripped from all API message building paths
- Works across all providers: OpenAI (continuation), Anthropic (native prefill)

Verified with E2E tests: simulated thinking-only → real OpenRouter continuation
produces correct content. Also confirmed Qwen models consistently produce
structured-reasoning-only responses under token pressure.
kawanoii pushed a commit to kawanoii/hermes-agent that referenced this pull request Apr 11, 2026
…rch#6488)

When a model returns no content, no structured reasoning, and no tool
calls (common with open models), the agent now silently retries up to
3 times before falling through to (empty).

Silent retry (no synthetic messages) keeps the conversation history
clean, preserves prompt caching, and respects the no-synthetic-user-
injection invariant.  Most empty responses from open models are
transient (provider hiccups, rate limits, sampling flukes) so a
simple retry is sufficient.

This fills the last gap in the empty-response recovery chain:
1. _last_content_with_tools fallback (prior tool turn had content)
2. Thinking-only prefill continuation (NousResearch#5931 — structured reasoning)
3. Empty response silent retry (NEW — truly empty, no reasoning)
4. (empty) terminal (last resort after all retries exhausted)

Inline <think> blocks are excluded — the model chose to reason, it
just produced no visible text.  That differs from truly empty.

Tests:
- Updated test_truly_empty to expect 4 API calls (1 + 3 retries)
- Added test_truly_empty_response_succeeds_on_nudge
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant