fix(gateway): retry transient send failures and notify user on exhaustion #3288
Merged
…tion

When send() failed due to a network error (ConnectError, ReadTimeout, etc.), the failure was silently logged and the user received no feedback, so the conversation appeared to hang. In one reported case, a user waited over an hour for a response that had already been generated but failed to deliver (#2910).

Adds _send_with_retry() to BasePlatformAdapter:
- Transient errors: retry up to 2x with exponential backoff + jitter
- On exhaustion: send a delivery-failure notice so the user knows to retry
- Permanent errors: fall back to the plain-text version (preserves existing behavior)
- SendResult.retryable flag for platform-specific transient errors

All adapters benefit automatically via BasePlatformAdapter inheritance.

Cherry-picked from PR #3108 by Mibayy.
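The control flow described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the standalone `send_with_retry` function, its parameters (`plain_text`, `base_delay`, `sleep`), the 50% jitter range, the notice wording, and every `SendResult` field other than `retryable` are assumptions made for the example.

```python
import asyncio
import random
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class SendResult:
    # Hypothetical shape; only the `retryable` flag is named in the PR.
    success: bool
    error: Optional[str] = None
    retryable: bool = False


async def send_with_retry(
    send: Callable[[str], Awaitable[SendResult]],
    text: str,
    plain_text: str,
    max_retries: int = 2,
    base_delay: float = 1.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> SendResult:
    """Send once, then retry transient failures with backoff and jitter."""
    result = await send(text)
    attempt = 0
    while not result.success and result.retryable and attempt < max_retries:
        # Exponential backoff (base, 2*base, ...) plus up to 50% random jitter.
        delay = base_delay * (2 ** attempt)
        await sleep(delay + random.uniform(0, delay * 0.5))
        attempt += 1
        result = await send(text)

    if result.success:
        return result
    if result.retryable:
        # Retries exhausted: best-effort notice so the user knows to retry.
        await send("Delivery failed after several attempts. Please try again.")
        return result
    # Permanent error (for example, rejected formatting): plain-text fallback.
    return await send(plain_text)
```

Injecting `sleep` is a choice made for this sketch so the backoff can be skipped in tests; the real method may be structured differently.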
StreamOfRon pushed a commit to StreamOfRon/hermes-agent that referenced this pull request on Mar 29, 2026

…tion (NousResearch#3288)

When send() failed due to a network error (ConnectError, ReadTimeout, etc.), the failure was silently logged and the user received no feedback, so the conversation appeared to hang. In one reported case, a user waited over an hour for a response that had already been generated but failed to deliver (NousResearch#2910).

Adds _send_with_retry() to BasePlatformAdapter:
- Transient errors: retry up to 2x with exponential backoff + jitter
- On exhaustion: send a delivery-failure notice so the user knows to retry
- Permanent errors: fall back to the plain-text version (preserves existing behavior)
- SendResult.retryable flag for platform-specific transient errors

All adapters benefit automatically via BasePlatformAdapter inheritance.

Cherry-picked from PR NousResearch#3108 by Mibayy.

Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
dlkakbs added a commit to dlkakbs/hermes-agent that referenced this pull request on Mar 30, 2026

When sendMessage times out, the Bot API may have already delivered the message even though the HTTP client got no response. PR NousResearch#3288 added _send_with_retry(), which retried on any transient error (including timeouts), stacking on top of the existing 3-attempt internal loop in TelegramAdapter.send() and risking 2–3 duplicate messages per response.

- Add SendResult.delivery_uncertain flag; when True, _send_with_retry() returns immediately without retrying or falling back to plain text.
- Add TelegramAdapter._looks_like_send_timeout() to detect TimedOut / ReadTimeout / WriteTimeout exceptions (with and without the python-telegram-bot import).
- Set delivery_uncertain=True in the final except clause of send() when the exhausted error is a send timeout.

Fixes NousResearch#3906.
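One way to picture the delivery-uncertain guard from this follow-up commit is the sketch below. It is an assumption-laden illustration rather than the adapter's code: it detects timeouts purely by exception class name (one plausible way to work without importing python-telegram-bot), and the helper names `result_from_final_exception` and `should_retry` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Class names treated as send timeouts (taken from the commit message).
_TIMEOUT_NAMES = {"TimedOut", "ReadTimeout", "WriteTimeout"}


def looks_like_send_timeout(exc: BaseException) -> bool:
    """Match by class name anywhere in the MRO, so no library import is needed."""
    return any(cls.__name__ in _TIMEOUT_NAMES for cls in type(exc).__mro__)


@dataclass
class SendResult:
    # Hypothetical shape; `retryable` and `delivery_uncertain` come from the source.
    success: bool
    error: Optional[str] = None
    retryable: bool = False
    delivery_uncertain: bool = False


def result_from_final_exception(exc: BaseException) -> SendResult:
    """Build the failure result once the adapter's own attempts are exhausted."""
    return SendResult(
        success=False,
        error=repr(exc),
        delivery_uncertain=looks_like_send_timeout(exc),
    )


def should_retry(result: SendResult) -> bool:
    """A timed-out send may already have been delivered, so never resend it."""
    if result.success or result.delivery_uncertain:
        return False
    return result.retryable
```

The point of the flag is that a timeout says nothing about whether the message arrived, so retrying trades a possible missed message for a possible duplicate; this sketch, like the commit, chooses not to resend.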
Summary
Salvage of PR #3108 by @Mibayy (authorship preserved). Fixes #2910.

When `send()` failed due to a network error (ConnectError, ReadTimeout, etc.), the failure was silently logged and the user received no feedback, so the conversation appeared to hang. In one reported case, a user waited over an hour for a response that had already been generated but failed to deliver.

Changes
Adds `_send_with_retry()` to `BasePlatformAdapter`:
- Transient errors: retry up to 2x with exponential backoff + jitter
- On exhaustion: send a delivery-failure notice so the user knows to retry
- Permanent errors: fall back to the plain-text version (preserves existing behavior)

Also adds:
- `SendResult.retryable` field for platform-specific transient error flagging
- `_RETRYABLE_ERROR_PATTERNS` constant for string-based transient detection
- `_is_retryable_error()` static method

All adapters benefit automatically via `BasePlatformAdapter` inheritance; no per-adapter changes are needed.

Follow-up improvements over original PR
- `event` parameter removed from the `_send_with_retry` signature
- `import random` moved to module level instead of a per-call import

Tests
27 tests in `tests/gateway/test_send_retry.py`. The full suite passes 6294 tests; the only failure is the pre-existing anthropic 429 flake.
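The string-based transient detection listed under Changes might look something like the sketch below. The PR page does not show the contents of `_RETRYABLE_ERROR_PATTERNS`, so the patterns here are illustrative guesses, and the function is written as a plain module-level helper rather than the static method the PR describes.

```python
from typing import Optional

# Illustrative patterns only; the actual constant is not shown in the PR text.
_RETRYABLE_ERROR_PATTERNS = (
    "connecterror",
    "readtimeout",
    "connection reset",
    "temporarily unavailable",
)


def is_retryable_error(error: Optional[str]) -> bool:
    """Return True when an error string looks like a transient network failure."""
    if not error:
        return False
    lowered = error.lower()
    # Case-insensitive substring match against each known transient pattern.
    return any(pattern in lowered for pattern in _RETRYABLE_ERROR_PATTERNS)
```

Substring matching is a blunt instrument, so the `SendResult.retryable` field gives adapters a way to flag platform-specific transient errors explicitly.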