
There appears to be a regression in Goose when using an Ollama backend. #8476

@AiRC-ai

Description


Describe the bug

There appears to be a regression in Goose when using an Ollama backend.

This issue does not occur in v1.27.2, but it does occur in v1.30.0.

In v1.30.0, requests through the Ollama backend frequently fail with:

Request failed: Stream decode error: Ollama stream stalled: no data received for 30s. This may indicate the model is overwhelmed by the request payload. Try a smaller model or reduce the number of tools.

The same Ollama workflow worked in v1.27.2, so this appears to be a behavioral change or regression introduced after that release.


To Reproduce
Steps to reproduce the behavior:

  1. Install and run Goose v1.30.0
  2. Configure Goose to use an Ollama backend
  3. Start a chat using a model served by Ollama
  4. Send a prompt and wait for the streamed response
  5. Observe the request fail with the 30-second stalled stream error

For comparison:

  1. Run the same general workflow in Goose v1.27.2
  2. Use the same Ollama backend/setup
  3. Observe that the issue does not occur there
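One way to narrow down whether the stall originates in Goose or in the server is to time the first streamed chunk from Ollama directly. The helper below is a minimal sketch, not part of the report; the function name, the simulated stream, and the suggestion to point it at Ollama's `/api/chat` endpoint on the default port 11434 are all my assumptions:

```python
import time

def time_to_first_chunk(chunks):
    """Return (seconds until first chunk, first chunk) for any chunk iterator.

    To test the server directly, feed this the byte stream of a raw request
    to Ollama (e.g. a urllib response opened against
    http://localhost:11434/api/chat) and check whether the first token
    alone takes longer than 30s.
    """
    start = time.monotonic()
    first = next(iter(chunks))
    return time.monotonic() - start, first

# Offline demo: a simulated slow stream standing in for a real Ollama response.
def slow_stream(delay, payload=b'{"message": {"content": "hi"}}'):
    time.sleep(delay)  # stand-in for the model's first-token latency
    yield payload

elapsed, chunk = time_to_first_chunk(slow_stream(0.2))
print(f"first chunk after {elapsed:.2f}s")
```

If the first chunk from Ollama itself arrives well under 30s, that points at Goose's stream handling rather than the server or model.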

Expected behavior

Requests using the Ollama backend should behave at least as reliably as they did in v1.27.2. Goose should not prematurely fail with a stream stall error when the same backend/setup previously worked in the older version.


Screenshots

Attached screenshot shows the exact error message in the UI.


Please provide the following information

  • OS & Arch: macOS host environment; also using Kali WSL in related workflow
  • Interface: UI
  • Version: v1.30.0 affected; v1.27.2 works
  • Extensions enabled: the standard set enabled in the current Goose UI session
  • Provider & Model: Ollama backend; reproduced across models

Additional context

Important detail: this looks like a version regression.

  • Working: v1.27.2
  • Failing: v1.30.0

This suggests the issue may be related to a change in:

  • streaming response handling
  • timeout handling
  • Ollama provider integration
  • tool/context payload handling
  • initial token wait behavior

Observed error:

Request failed: Stream decode error: Ollama stream stalled: no data received for 30s. This may indicate the model is overwhelmed by the request payload. Try a smaller model or reduce the number of tools.

The message suggests payload/tool load, but since the same backend flow worked in v1.27.2, this seems more likely to be a regression in Goose’s handling rather than model size alone.

Potential areas to review:

  • any Ollama integration changes between v1.27.2 and v1.30.0
  • timeout defaults for streamed responses
  • handling of slow first-token latency
  • request/tool payload changes introduced in newer versions
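To illustrate the suspected mechanism around the last two points, here is a scaled-down sketch (this is my hypothesis of the behavior, not Goose's actual code; all names and the 0.1s deadline standing in for the reported 30s are invented) of how a fixed per-read inactivity deadline that also covers the wait for the first token would abort a slow-but-healthy model:

```python
import asyncio

STALL_TIMEOUT = 0.1  # stands in for the reported 30s, scaled down for the demo

async def slow_model(first_token_delay):
    """Simulated token stream whose first token arrives late."""
    await asyncio.sleep(first_token_delay)
    yield "hello"

async def read_stream(stream, stall_timeout):
    """Apply a fixed inactivity deadline to every read, including the first.

    If the deadline also covers the wait for the first token, a model with
    slow first-token latency triggers a stall error even though the server
    is still working on the response.
    """
    it = stream.__aiter__()
    tokens = []
    while True:
        try:
            token = await asyncio.wait_for(it.__anext__(), stall_timeout)
        except StopAsyncIteration:
            return tokens
        except asyncio.TimeoutError:
            raise RuntimeError(
                f"stream stalled: no data received for {stall_timeout}s"
            )
        tokens.append(token)

# A model slower than the deadline is aborted; a faster one succeeds.
try:
    asyncio.run(read_stream(slow_model(0.3), STALL_TIMEOUT))
except RuntimeError as e:
    print("aborted:", e)

print("ok:", asyncio.run(read_stream(slow_model(0.01), STALL_TIMEOUT)))
```

If v1.30.0 introduced (or shortened) such an inactivity deadline, that would explain why larger or slower models hit the 30s error in v1.30.0 but not in v1.27.2.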
