
There appears to be a regression in Goose when using an Ollama backend. #8476

@AiRC-ai

Description


Describe the bug

There appears to be a regression in Goose when using an Ollama backend.

This issue does not occur in v1.27.2, but it does occur in v1.30.0.

In v1.30.0, requests through the Ollama backend frequently fail with:

Request failed: Stream decode error: Ollama stream stalled: no data received for 30s. This may indicate the model is overwhelmed by the request payload. Try a smaller model or reduce the number of tools.

The same Ollama workflow worked in v1.27.2, so this appears to be a behavioral change or regression introduced after that release.


To Reproduce
Steps to reproduce the behavior:

  1. Install and run Goose v1.30.0
  2. Configure Goose to use an Ollama backend
  3. Start a chat using a model served by Ollama
  4. Send a prompt and wait for the streamed response
  5. Observe the request fail with the 30-second stalled stream error

For comparison:

  1. Run the same general workflow in Goose v1.27.2
  2. Use the same Ollama backend/setup
  3. Observe that the issue does not occur there
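One way to narrow down whether the stall originates in Goose or in the server is to time the first streamed chunk from Ollama directly. The helper below is a minimal sketch, not part of the report; the function name, the simulated stream, and the suggestion to point it at Ollama's `/api/chat` endpoint on the default port 11434 are all my assumptions:

```python
import time

def time_to_first_chunk(chunks):
    """Return (seconds until first chunk, first chunk) for any chunk iterator.

    To test the server directly, feed this the byte stream of a raw request
    to Ollama (e.g. a urllib response opened against
    http://localhost:11434/api/chat) and check whether the first token
    alone takes longer than 30s.
    """
    start = time.monotonic()
    first = next(iter(chunks))
    return time.monotonic() - start, first

# Offline demo: a simulated slow stream standing in for a real Ollama response.
def slow_stream(delay, payload=b'{"message": {"content": "hi"}}'):
    time.sleep(delay)  # stand-in for the model's first-token latency
    yield payload

elapsed, chunk = time_to_first_chunk(slow_stream(0.2))
print(f"first chunk after {elapsed:.2f}s")
```

If the first chunk from Ollama itself arrives well under 30s, that points at Goose's stream handling rather than the server or model.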

Expected behavior

Requests using the Ollama backend should behave at least as reliably as they did in v1.27.2. Goose should not prematurely fail with a stream stall error when the same backend/setup previously worked in the older version.


Screenshots

Attached screenshot shows the exact error message in the UI.


Please provide the following information

  • OS & Arch: macOS host environment; also using Kali WSL in related workflow
  • Interface: UI
  • Version: v1.30.0 affected; v1.27.2 works
  • Extensions enabled: the standard set enabled in the current Goose UI session
  • Provider & Model: Ollama backend; reproduced across models

Additional context

Important detail: this looks like a version regression.

  • Working: v1.27.2
  • Failing: v1.30.0

This suggests the issue may be related to a change in:

  • streaming response handling
  • timeout handling
  • Ollama provider integration
  • tool/context payload handling
  • initial token wait behavior

Observed error:

Request failed: Stream decode error: Ollama stream stalled: no data received for 30s. This may indicate the model is overwhelmed by the request payload. Try a smaller model or reduce the number of tools.

The message suggests payload/tool load, but since the same backend flow worked in v1.27.2, this seems more likely to be a regression in Goose’s handling rather than model size alone.

Potential areas to review:

  • any Ollama integration changes between v1.27.2 and v1.30.0
  • timeout defaults for streamed responses
  • handling of slow first-token latency
  • request/tool payload changes introduced in newer versions
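To illustrate the suspected mechanism around the last two points, here is a scaled-down sketch (this is my hypothesis of the behavior, not Goose's actual code; all names and the 0.1s deadline standing in for the reported 30s are invented) of how a fixed per-read inactivity deadline that also covers the wait for the first token would abort a slow-but-healthy model:

```python
import asyncio

STALL_TIMEOUT = 0.1  # stands in for the reported 30s, scaled down for the demo

async def slow_model(first_token_delay):
    """Simulated token stream whose first token arrives late."""
    await asyncio.sleep(first_token_delay)
    yield "hello"

async def read_stream(stream, stall_timeout):
    """Apply a fixed inactivity deadline to every read, including the first.

    If the deadline also covers the wait for the first token, a model with
    slow first-token latency triggers a stall error even though the server
    is still working on the response.
    """
    it = stream.__aiter__()
    tokens = []
    while True:
        try:
            token = await asyncio.wait_for(it.__anext__(), stall_timeout)
        except StopAsyncIteration:
            return tokens
        except asyncio.TimeoutError:
            raise RuntimeError(
                f"stream stalled: no data received for {stall_timeout}s"
            )
        tokens.append(token)

# A model slower than the deadline is aborted; a faster one succeeds.
try:
    asyncio.run(read_stream(slow_model(0.3), STALL_TIMEOUT))
except RuntimeError as e:
    print("aborted:", e)

print("ok:", asyncio.run(read_stream(slow_model(0.01), STALL_TIMEOUT)))
```

If v1.30.0 introduced (or shortened) such an inactivity deadline, that would explain why larger or slower models hit the 30s error in v1.30.0 but not in v1.27.2.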
