
Stop dispatching parent messages to self.onmessage in node:worker_threads workers #29215

Open
robobun wants to merge 12 commits into main from farm/ac5aee3e/worker-onmessage-double-fire

Conversation

@robobun
Collaborator

@robobun robobun commented Apr 12, 2026

Fixes #29211
Fixes #25860
Fixes #19453
Fixes #25454
Fixes #7816

Repro

    import { init } from 'z3-solver';
    const { Context } = await init();
    const { Solver, Int, And } = new Context('main');
    const x = Int.const('x');
    const solver = new Solver();
    solver.add(And(x.ge(0), x.le(9)));
    console.log(await solver.check()); // expected: sat

Node prints sat. Bun aborts with Aborted(Assertion failed) inside z3's Emscripten runtime at removeRunDependency('wasm-instantiate') and then segfaults when the worker error surfaces.

Cause

Bun's native worker runtime dispatches messages posted by the parent onto the worker's global scope, so self.onmessage and any self.addEventListener('message', …) listeners fire automatically. That matches web-worker semantics but not Node's node:worker_threads, where parent messages are only observable through parentPort — setting self.onmessage in a Node worker is a dead assignment.

Emscripten-generated pthread code (and z3-solver) relies on the Node behavior:

    parentPort.on('message', (m) => onmessage({ data: m }));
    self.onmessage = handleMessage;

Under Bun, every parent message reached handleMessage twice — once via the automatic self.onmessage dispatch and once via the explicit forwarding inside the parentPort listener — so z3's removeRunDependency('wasm-instantiate') assertion failed on the second call and the worker tore down with an unhandled error, which then surfaced as a segfault in the main thread's error-handling path.
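The double dispatch can be modeled with plain EventTargets — a minimal sketch, where `globalScope` stands in for the worker's `self` and `parentPort` for the node:worker_threads port (all names here are illustrative, not Bun internals):

```javascript
// Model of the bug: the parent message is dispatched on BOTH targets,
// so the Emscripten-style wiring runs the same handler twice.
const globalScope = new EventTarget();
const parentPort = new EventTarget();

let calls = 0;
const handleMessage = () => { calls++; };

// Emscripten pattern: forward parentPort messages into onmessage...
parentPort.addEventListener("message", e => handleMessage(e));
// ...while the same handler is also assigned on the global scope.
globalScope.addEventListener("message", e => handleMessage(e));

// Old Bun behavior: one logical parent message, two dispatches.
globalScope.dispatchEvent(new MessageEvent("message", { data: 1 }));
parentPort.dispatchEvent(new MessageEvent("message", { data: 1 }));

console.log(calls); // 2 — handleMessage ran twice for one message
```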

Fix

fakeParentPort in src/js/node/worker_threads.ts now owns its own EventTarget. addEventListener / removeEventListener / on / off / once / onmessage / onmessageerror all route through it. While at least one user listener is registered, a capture-phase forwarder is installed on self that calls stopImmediatePropagation() (so handlers on the global scope never see parent messages) and re-dispatches the event on the parentPort target.

The forwarders are attached lazily and removed when the last user listener goes away, so a worker that never touches parentPort (or only sets listeners and then removes them) doesn't leak the event-loop ref that installing a message listener on self otherwise keeps — via onDidChangeListenerImpl in BunWorkerGlobalScope.cpp. That way workers like new Worker('require("worker_threads").setEnvironmentData(...)', { eval: true }) still exit cleanly when their module finishes executing.
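The forwarder mechanism can be sketched with plain EventTargets — `self_` below stands in for the worker global; the real implementation lives in src/js/node/worker_threads.ts:

```javascript
// Sketch: a capture-phase forwarder swallows the event on the global
// scope and re-dispatches it on the parentPort's own target.
const self_ = new EventTarget();
const parentPortTarget = new EventTarget();

let globalSaw = 0, portSaw = 0;

self_.addEventListener("message", e => {
  e.stopImmediatePropagation(); // global-scope listeners never run
  parentPortTarget.dispatchEvent(new MessageEvent("message", { data: e.data }));
}, { capture: true });

self_.addEventListener("message", () => { globalSaw++; }); // suppressed
parentPortTarget.addEventListener("message", () => { portSaw++; });

self_.dispatchEvent(new MessageEvent("message", { data: "hi" }));
console.log(globalSaw, portSaw); // 0 1
```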

Verification

test/regression/issue/29211.test.ts covers three scenarios:

  1. The exact emscripten pattern from z3: a parentPort.on('message', …) forwarder that manually invokes self.onmessage. With the bug, handleMessage fires twice per parent message and global-scope listeners also fire; with the fix, it fires exactly once and global listeners stay silent (matching Node).
  2. Every parentPort listener variant (on / addEventListener / onmessage) sees exactly one dispatch per parent message.
  3. parentPort.off actually removes a listener.

The first test fails on main and passes with the fix. The pre-existing test/js/node/worker_threads/worker_threads.test.ts suite still passes the same 21 tests it did on main (no new regressions; the same 6 tests time out in this environment both before and after the change).

…eads workers

Bun's native worker runtime dispatches messages posted by the parent onto
the worker's global scope, so `self.onmessage` and any
`self.addEventListener('message', ...)` listeners fired automatically.
That matches web-worker semantics but does not match Node's
`node:worker_threads`, where parent messages are only observable through
`parentPort` — setting `self.onmessage` in a Node worker is a dead
assignment.

Emscripten-generated pthread code (e.g. z3-solver) relies on the Node
behavior:

    parentPort.on('message', (m) => onmessage({ data: m }));
    self.onmessage = handleMessage;

Under Bun, every parent message reached `handleMessage` twice — once
via the automatic `self.onmessage` dispatch and once via the explicit
forwarding inside the parentPort listener — so z3's
`removeRunDependency('wasm-instantiate')` assertion failed on the second
call and the worker tore down with an unhandled error.

`fakeParentPort` in `src/js/node/worker_threads.ts` now owns its own
EventTarget. `addEventListener` / `removeEventListener` / `on` / `off`
/ `once` and the `onmessage` / `onmessageerror` accessors route through
it. While at least one user listener is registered, a capture-phase
forwarder is installed on `self` that stops immediate propagation (so
listeners on the global scope never see parent messages) and
re-dispatches on the parentPort's target. The forwarders are attached
lazily and removed when the last user listener goes away, so a worker
that never touches parentPort keeps the event loop refcount it had
before and still exits cleanly when its module finishes executing.

Fixes #29211
@robobun
Collaborator Author

robobun commented Apr 12, 2026

Updated 10:52 AM PT - Apr 12th, 2026

@autofix-ci[bot], your commit 346f824 has 1 failure in Build #45343 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 29215

That installs a local version of the PR into your bun-29215 executable, so you can run:

bun-29215 --bun

@coderabbitai
Contributor

coderabbitai bot commented Apr 12, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Reworks worker_threads parentPort emulation to forward native worker "message"/"messageerror" events into a dedicated EventTarget, prevent native global delivery, implement wrapped listener lifecycle (once/AbortSignal), update onmessage/onmessageerror accessors/dispatch semantics, and add an end-to-end regression test validating delivery and listener behavior.

Changes

Cohort / File(s) Summary
Parent Port Messaging Refactor
src/js/node/worker_threads.ts
Reimplements fakeParentPort() to create a parentPortTarget EventTarget and install global forwarders on self that intercept native "message"/"messageerror", call stopImmediatePropagation(), clone and re-dispatch MessageEvents (including ports) to parentPortTarget. Replaces prior emulated addEventListener/removeEventListener with wrappers tracked in a WeakMap, supports { once: true }, tracks listenerCount to install/uninstall forwarders, handles AbortSignal removal by wiring abort listeners, updates onmessage/onmessageerror to use wrapped handlers and rethrow exceptions via queueMicrotask, exposes dispatchEvent, and adjusts addListener/removeListener to use the new emulated logic.
Worker Threads Regression Tests
test/regression/issue/29211.test.ts
Adds an ESM regression test that writes worker.mjs/main.mjs fixtures, spawns workers, and asserts end-to-end semantics: parent messages are delivered exactly once to parentPort handlers and do not trigger global self/globalThis message handlers; verifies .on/.off, .addEventListener (including EventListenerObject), .onmessage, transferred MessagePort round-trip, AbortSignal listener removal, stderr normalization for ASAN warning, and clean process exit.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: preventing parent messages from being dispatched to self.onmessage in node:worker_threads, which is the core fix addressing all linked issues.
Description check ✅ Passed The PR description comprehensively covers the problem (double message dispatch), root cause (Bun forwarding to global scope unlike Node), and the fix (owned EventTarget with lazy forwarders). It includes a detailed reproduction case, explanation of the issue's impact on Emscripten/z3-solver, and verification approach.
Linked Issues check ✅ Passed The PR directly addresses all five linked issues (#29211, #25860, #19453, #25454, #7816) which all stem from the same root cause: double message delivery in worker_threads. The fix prevents parent messages from reaching global scope handlers while preserving parentPort delivery, eliminating duplicate calls that caused assertions, crashes, and stack overflows in Emscripten-compiled modules like z3-solver.
Out of Scope Changes check ✅ Passed All changes are scoped to fixing the double-dispatch issue: the implementation modifies fakeParentPort in worker_threads.ts with lazy forwarders and listener tracking, and the test file comprehensively validates the fix across multiple listener patterns and edge cases. No unrelated changes are present.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Contributor

Found 4 issues this PR may fix:

  1. node:worker_threads: Messages delivered twice when both parentPort.on('message') and self.onmessage are registered #25860 - Directly reports the double-dispatch bug: messages delivered twice when both parentPort.on('message') and self.onmessage are registered
  2. Z3 doesn't work in Bun #19453 - Z3 doesn't work in Bun due to Emscripten pthread handlers firing twice, causing assertion failures
  3. z3 optimizer crashes #25454 - Z3 optimizer crashes from the same Emscripten double-dispatch root cause
  4. RangeError: Maximum call stack size exceeded when using z3-solver #7816 - Maximum call stack size exceeded with z3-solver caused by double-dispatch feedback loop

If this is helpful, copy the block below into the PR description to auto-close these issues on merge.

Fixes #25860
Fixes #19453
Fixes #25454
Fixes #7816

🤖 Generated with Claude Code

@github-actions
Contributor

This PR may be a duplicate of:

  1. fix(worker_threads): use MessagePort for parentPort instead of dispatching to self #26293 - Also fixes parent messages being dispatched to both self.onmessage and parentPort in node:worker_threads workers, modifying the same fakeParentPort() function (uses a real MessagePort pair vs. this PR's EventTarget approach)

🤖 Generated with Claude Code

…tSignal leak

Addresses three issues in the parentPort EventTarget shim introduced to
stop parent messages from firing self.onmessage:

- The capture forwarder cloned MessageEvents with only `data`, silently
  dropping `event.ports`. Any caller doing
  `worker.postMessage(data, [port])` and reading `event.ports` from a
  parentPort listener lost the transferred port. The clone now carries
  `ports` through.

- `parentPortAddEventListener`'s wrapped callback unconditionally called
  `listener.$call(fake, event)`, which throws on an EventListenerObject
  (`{handleEvent(e) { … }}`). The wrapper now dispatches via a small
  `invokeListener` helper that handles both the function and
  handleEvent-object forms, matching DOM EventTarget semantics.

- `addEventListener` with `{signal: ac.signal}` passed the signal to the
  inner `parentPortTarget.addEventListener`. On abort, the native
  EventTarget removed the wrapped listener directly at the C++ level,
  bypassing our JS `parentPortRemoveEventListener` and leaving
  `listenerCount` at 1 — so the capture forwarder stayed installed on
  `self`, pinning the worker event loop forever. We now strip the signal
  from the forwarded options and drive abort ourselves, routing the
  removal through `parentPortRemoveEventListener` so `releaseListener()`
  runs.
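Two of these fixes can be sketched in plain JS. `invokeListener` is the helper named above; its body here is an assumption based on DOM EventTarget semantics, and `addTrackedListener`/`onRemoved` are illustrative stand-ins for the real bookkeeping:

```javascript
// Dispatch that accepts both listener forms, as DOM EventTarget does.
function invokeListener(listener, thisArg, event) {
  if (typeof listener === "function") {
    listener.call(thisArg, event);
  } else if (listener && typeof listener.handleEvent === "function") {
    listener.handleEvent(event); // EventListenerObject form
  }
}

// Strip { signal } before handing options to the inner EventTarget, and
// drive the abort ourselves so our own removal path (and its refcount
// bookkeeping) always runs instead of the native C++-level removal.
function addTrackedListener(target, type, wrapped, options, onRemoved) {
  const isObj = typeof options === "object" && options !== null;
  const signal = isObj ? options.signal : undefined;
  const inner = isObj ? { ...options, signal: undefined } : (options ?? false);
  target.addEventListener(type, wrapped, inner);
  if (signal) {
    signal.addEventListener("abort", () => {
      target.removeEventListener(type, wrapped, inner);
      onRemoved(); // e.g. releaseListener() — the count still goes down
    }, { once: true });
  }
}
```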

Three new regression tests in test/regression/issue/29211.test.ts cover
each case; all three fail on the previous commit and pass after this
change.
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/js/node/worker_threads.ts`:
- Around line 245-299: Replace uses of regular Map prototype methods on the
listener-tracking maps with the intrinsic-prefixed variants to avoid prototype
tampering: update accesses to trackedByListener and bucket inside
parentPortAddEventListener (and the TrackEntry logic) to use
trackedByListener.$get/$has/$set/$delete and bucket.$has/$get/$set/$delete and
read bucket.$size instead of the plain .get/.has/.set/.delete/.size; keep the
same semantics (creating a new Map for a missing bucket, storing entry, and
deleting when empty) but call the intrinsic methods (e.g., replace
trackedByListener.get(listener) with trackedByListener.$get(listener),
bucket.size with bucket.$size) so tamper-resistant intrinsics are used
consistently.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 482a37ca-30dd-4f2d-aebc-c1d8697cadd7

📥 Commits

Reviewing files that changed from the base of the PR and between 2a883b1 and 6cc8135.

📒 Files selected for processing (2)
  • src/js/node/worker_threads.ts
  • test/regression/issue/29211.test.ts

Addresses two follow-up review comments on #29215:

- `parentPortAddEventListener` called `acquireListener()` unconditionally
  for every event type — including `close`, `error`, etc. — so
  `parentPort.once('close', cleanup)` installed the capture-phase message
  forwarder on `self` even when the user was not listening for messages.
  The forwarder then intercepted parent messages, called
  `stopImmediatePropagation()`, and re-dispatched onto `parentPortTarget`
  where no `message` listener existed — so subsequent parent messages
  were silently dropped. And since native `close` events never reach
  `parentPortTarget`, `releaseListener()` never ran, pinning the worker
  event loop forever.

  Both `parentPortAddEventListener` and `parentPortRemoveEventListener`
  now gate `acquireListener`/`releaseListener` on
  `type === 'message' || type === 'messageerror'`, so listeners for
  unrelated event types pass straight through to `parentPortTarget`
  without touching the global-scope forwarder.

- Use tamper-resistant intrinsics on the inner listener-tracking Map
  (`bucket.$get` / `$set` / `$has` / `$delete` / `$size`) and
  `$Array.from` for the ports clone — following src/js/CLAUDE.md. The
  outer `WeakMap` stays on plain prototype methods because `.$get` is
  not defined as an intrinsic for WeakMap.

Regression test added: `parentPort listener for non-message events does
not block parent messages` — registers `.on('close')` before
`.on('message')` and asserts all three posted messages reach the
handler. Fails on the previous commit; passes after this change.
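The gating idea is simple to sketch (all names here are assumed stand-ins for the real implementation):

```javascript
// Only message-family listeners touch the global forwarder refcounts;
// everything else passes straight through to parentPortTarget.
const FORWARDED_TYPES = new Set(["message", "messageerror"]);
const counts = { message: 0, messageerror: 0 };

function acquireListener(type) { counts[type]++; /* install forwarder on self */ }

function parentPortAddEventListenerSketch(type) {
  // ...register the wrapped listener on parentPortTarget as before...
  if (FORWARDED_TYPES.has(type)) acquireListener(type);
}

parentPortAddEventListenerSketch("close");   // never touches the forwarder
parentPortAddEventListenerSketch("message"); // installs the message forwarder
console.log(counts.message, counts.messageerror); // 1 0
```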
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/regression/issue/29211.test.ts`:
- Around line 99-117: Swap the assertion order so stdout is checked before
exitCode/stderr: capture stdout, stderr, exitCode as you do, but move the
JSON.parse(stdout.trim()) / expect(...) block to run prior to asserting
exitCode/stderr (the expect({ stderr: cleanStderr(stderr), exitCode
}).toEqual(...) line). This ensures the parsed worker output
(handleMessageCalls/selfAddListenerCalls/globalAddListenerCalls) is reported on
failures before verifying exitCode, improving debug info for failures in the
Promise.all-collected values (stdout, stderr, exited).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4e6f1e69-2d55-4e21-88d0-5277fab450ff

📥 Commits

Reviewing files that changed from the base of the PR and between 6cc8135 and 3f220d2.

📒 Files selected for processing (2)
  • src/js/node/worker_threads.ts
  • test/regression/issue/29211.test.ts

…ion tests

Move the `JSON.parse(stdout)` assertion ahead of the
`expect({ stderr, exitCode })` check in all seven regression tests, so a
failure surfaces the actual worker output (handleMessageCalls counts,
fired counters, etc.) before the less-specific exit-code comparison.
Matches the CLAUDE.md convention for spawn-based tests.
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/regression/issue/29211.test.ts`:
- Line 116: The test currently asserts both cleaned stderr and exitCode with
expect({ stderr: cleanStderr(stderr), exitCode }).toEqual({ stderr: "",
exitCode: 0 }); which is brittle in CI/ASAN; change the assertion to only check
the subprocess exit code (e.g., assert exitCode is 0 via
expect(exitCode).toBe(0) or equivalent) and remove the strict empty stderr
comparison (leave stderr untouched unless the test explicitly validates a
message). Apply the same change for other occurrences that use
cleanStderr(stderr) in these tests (the spots referenced around lines 179, 237,
296, 342, 388, 443).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8aa3d902-eeda-4589-81a6-ffc5b995351d

📥 Commits

Reviewing files that changed from the base of the PR and between 3f220d2 and 08de32c.

📒 Files selected for processing (1)
  • test/regression/issue/29211.test.ts

Contributor

@claude claude bot left a comment


The core logic is sound and all previously-flagged 🔴 issues have been addressed, but this is a substantial behavioral change to the worker_threads parentPort dispatch mechanism — worth a human pass before merge.

Extended reasoning...

Overview

This PR replaces the global-scope forwarding in fakeParentPort() with a dedicated parentPortTarget EventTarget, lazy capture-phase forwarders on self, and full listener lifecycle tracking. It touches ~250 lines of production JS runtime code in src/js/node/worker_threads.ts and adds a 446-line regression test suite.

Previous Bug Round

Three prior 🔴 bugs I flagged have all been addressed in the follow-up commits (6cc8135, 3f220d2): the AbortSignal leak that kept listenerCount stuck, missing ports in the cloned MessageEvent, and the EventListenerObject crash. The acquireListener gating for non-message event types was also fixed. The remaining bug reports are nits only (assertion ordering and missing test.concurrent).

Security Risks

No security concerns. The change is scoped to worker message dispatch within the JS builtin layer — no auth, crypto, or permission-sensitive paths involved.

Level of Scrutiny

Medium-high. While the fixes are correct in isolation, the implementation is complex: lazy forwarder install/uninstall, WeakMap-based listener tracking, signal interception, once-wrapper cleanup, and EventListenerObject dispatch. Any subtle mistake in reference counting could silently drop messages or leak event-loop refs. Human verification of the overall flow is warranted.

Other Factors

The test suite covers the core scenarios well (z3/emscripten pattern, all listener variants, off, transferred ports, EventListenerObject, AbortSignal, non-message events). The two remaining nits (assertion order and test.concurrent) are flagged as inline comments and do not affect correctness.

Each test spawns an independent Bun subprocess with a unique tempDir
and shares no state with the others, so they're safe to run in parallel.
Matches the test/CLAUDE.md preference and cuts this suite's wall-clock
from ~21s to ~7s.
Comment on lines +290 to +296
    }
    bucket.$set(slot, entry);
    const innerOptions: boolean | AddEventListenerOptions =
      typeof options === "object" && options !== null ? { ...options, signal: undefined } : (options ?? false);
    parentPortTarget.addEventListener(type, wrapped, innerOptions);
    if (tracksForwarder) acquireListener();
    if (signal) {
Contributor


🔴 When parentPort.on('message', fn) is called twice with the same fn, injectFakeEmitter.on() creates callback2 and overwrites fn[wrappedListener] = callback2, losing the reference to callback1. Both callbacks are stored as separate entries in trackedByListener (keyed by the callback object, not by fn), so listenerCount reaches 2. When parentPort.off('message', fn) is called, it resolves fn[wrappedListener] = callback2 and removes only callback2's entry, decrementing listenerCount to 1 — but callback1 is permanently orphaned with no reachable removal path. listenerCount is stuck at 1 forever: uninstallForwarders() never fires, messageForwarder stays on self, m_messageEventCount > 0 in BunWorkerGlobalScope.cpp, and the worker event loop never exits. Node.js EventEmitter explicitly allows registering the same function multiple times, making this a plausible usage pattern that causes a worker hang.

Extended reasoning...

What the bug is and how it manifests

The root cause is that injectFakeEmitter's on() method stores the wrapped callback in a single slot fn[wrappedListener] on the original listener function. When parentPort.on('message', fn) is called a second time with the same fn, functionForEventType creates a brand-new callback2 and unconditionally overwrites fn[wrappedListener] = callback2, destroying the only reference to callback1.

The specific code path that triggers it

  1. parentPort.on('message', fn) (first call): functionForEventType creates callback1, sets fn[wrappedListener] = callback1, calls parentPortAddEventListener('message', callback1). trackedByListener key = callback1; listenerCount = 1; messageForwarder installed on self.

  2. parentPort.on('message', fn) (second call, same fn): functionForEventType creates callback2, overwrites fn[wrappedListener] = callback2, calls parentPortAddEventListener('message', callback2). The dedup check in parentPortAddEventListener looks up trackedByListener.get(callback2) — not fn — so it finds no existing entry and proceeds. trackedByListener now has a separate entry keyed on callback2; listenerCount = 2.

  3. parentPort.off('message', fn): Class.prototype.off resolves fn[wrappedListener] = callback2, calls removeEventListener('message', callback2)parentPortRemoveEventListener('message', callback2). Entry for callback2 found and deleted; releaseListener() called; listenerCount = 1.

  4. parentPort.off('message', fn) (second call): fn[wrappedListener] still = callback2, but trackedByListener.get(callback2) is now undefined (already deleted) → returns early. releaseListener() never called.

Why existing code doesn't prevent it

The dedup guard in parentPortAddEventListener (lines 265–270) is keyed on the wrapped callback object, not the original listener. Since each injectFakeEmitter.on() call produces a new callback object, two registrations of the same fn appear as distinct listeners to parentPortAddEventListener and both pass the guard. There is no check for whether fn[wrappedListener] already points to a live entry before overwriting it.

What the impact would be

listenerCount is permanently stuck at 1. uninstallForwarders() is never called, so messageForwarder remains installed on self via self.addEventListener('message', messageForwarder, {capture:true}). Per BunWorkerGlobalScope.cpp's onDidChangeListenerImpl, each message listener on self increments m_messageEventCount and blocks unrefEventLoop(). The worker event loop stays alive indefinitely with callback1 still registered in both trackedByListener and parentPortTarget, firing redundantly on every incoming message. This is a regression: pre-PR, the same double-registration on self was also a bug but did not pin the event loop because there was no listenerCount-gated refcount mechanism; post-PR it causes the worker to hang indefinitely.

How to fix it

Before creating a new wrapped callback in functionForEventType (or in the fake.on override), check whether fn[wrappedListener] already points to a live entry in trackedByListener. If it does, reuse it (matching EventTarget dedup semantics for the same function). Alternatively, the parentPortAddEventListener dedup check could be changed to key on the original listener fn rather than the wrapped callback.

Step-by-step proof

  1. parentPort.on('message', fn)callback1 created, fn[wrappedListener]=callback1, trackedByListener key=callback1, listenerCount=1
  2. parentPort.on('message', fn)callback2 created, fn[wrappedListener]=callback2 (overwrites!), trackedByListener key=callback2, listenerCount=2
  3. parentPort.off('message', fn) → resolves fn[wrappedListener]=callback2 → removes callback2 entry, listenerCount=1
  4. parentPort.off('message', fn) → resolves fn[wrappedListener]=callback2trackedByListener.get(callback2) is undefined → no-op
  5. callback1 permanently orphaned in trackedByListener and parentPortTarget; listenerCount=1 forever; worker hangs
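The overwrite hazard can be reproduced in isolation — a minimal model where `wrappedListener` plays the role of the slot described above and `tracked` stands in for the tracking map:

```javascript
// Model of the double-registration bug: the second on() overwrites the
// only reference to the first wrapper, so both off() calls resolve to
// the same (second) wrapper and the first is orphaned forever.
const wrappedListener = Symbol("wrappedListener");
const tracked = new Map();

function on(fn) {
  const callback = e => fn(e);
  fn[wrappedListener] = callback; // second call clobbers the first slot
  tracked.set(callback, true);
}
function off(fn) {
  const callback = fn[wrappedListener] ?? fn;
  tracked.delete(callback);
}

const fn = () => {};
on(fn); on(fn);   // two wrappers registered, only the second reachable
off(fn); off(fn); // both removals hit the second wrapper
console.log(tracked.size); // 1 — first wrapper leaked, count stuck
```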

Contributor


The attempted fix in commit 5bc8c4c (fake.removeListener = MessagePort.prototype.off) addresses the single-registration removeListener bypass scenario but does NOT fix the double-registration bug described in this comment.

The root cause is in injectFakeEmitter's wrapped() (line 56): listener[wrappedListener] = callback unconditionally overwrites the slot. When parentPort.on('message', fn) is called twice with the same fn:

  1. Call 1: callback1 created, fn[wrappedListener]=callback1, parentPortAddEventListener('message', callback1) -> trackedByListener key=callback1, messageListenerCount=1
  2. Call 2: callback2 created, fn[wrappedListener]=callback2 (overwrites!), parentPortAddEventListener('message', callback2) -> dedup check trackedByListener.get(callback2) finds nothing, both entries stored, messageListenerCount=2

After two parentPort.off('message', fn) calls:

  • Call 1: fn[wrappedListener]=callback2 -> removes callback2 -> messageListenerCount=1
  • Call 2: fn[wrappedListener] still =callback2 -> trackedByListener.get(callback2) is undefined (already deleted) -> returns early -> messageListenerCount stuck at 1

callback1 is permanently orphaned in trackedByListener and parentPortTarget. messageListenerCount stays at 1 forever: uninstallMessageForwarder() is never called, messageForwarder stays on self, m_messageEventCount > 0 in BunWorkerGlobalScope.cpp, and the worker event loop never exits.

The fix requires checking whether fn[wrappedListener] already points to a live entry in trackedByListener before creating a new wrapped callback -- if it does, reuse it. Alternatively, the dedup guard in parentPortAddEventListener (line 285) could be changed to key on the original listener fn rather than the wrapped callback, so that a second registration of the same function is treated as a duplicate and rejected.

Collaborator Author


Good catch on the wrappedListener-overwrite edge case, but this is a pre-existing property of injectFakeEmitter.on (line 56 writes listener[wrappedListener] = callback unconditionally) that affects every MessagePort user in this file — not a regression introduced by #29215. Fixing it cleanly means reworking injectFakeEmitter to dedupe on(event, fn) registrations of the same fn or tracking them in a Map keyed on (event, fn). That is both (a) a behavior change to MessagePort.prototype.on that touches code far outside the scope of this PR, and (b) not reachable in any of the five issues this PR fixes (#29211/#25860/#19453/#25454/#7816), where Emscripten/z3 register each listener exactly once. Happy to file a separate follow-up for it.

robobun and others added 2 commits April 12, 2026 11:11
…rr asserts

Three fixes rolled into one commit after reviewer feedback and a CI
failure on the debian-13-x64-asan lane:

- **Drop the cleanStderr/empty-stderr assertions (CI fix).** The regression
  tests compared a hand-filtered stderr against the empty string to assert
  the subprocess was clean. That filter only stripped the known
  `ASAN interferes with JSC signal handlers` warning; any additional
  ASAN noise (leak summaries, SUMMARY lines, stack frames) leaked through
  and failed the assertion. Per test/CLAUDE.md convention for regression
  tests, the crash/failure signal is `expect(exitCode).toBe(0)`. All
  seven original assertions switched over; `cleanStderr` and the stderr
  destructuring side are gone.

- **Fix parentPort.removeListener bypassing wrappedListener.**
  `injectFakeEmitter`'s `Class.prototype.on` wraps `fn` into a callback
  and stores `fn[wrappedListener] = callback`; the tracking WeakMap in
  `fakeParentPort` is keyed on the callback. The previous code defined
  `fake.removeListener = parentPortRemoveEventListener`, which bypassed
  the wrappedListener lookup — so `parentPort.removeListener('message', fn)`
  looked up `fn` in the map, missed, and silently kept the listener
  (and its capture-forwarder ref) alive. `fake.addListener` /
  `fake.removeListener` now alias `MessagePort.prototype.on` / `.off`
  from `injectFakeEmitter`, which resolve `listener[wrappedListener]`
  before calling `this.addEventListener` / `this.removeEventListener`
  (both of which correctly hit `parentPortAddEventListener` /
  `parentPortRemoveEventListener` via the own-property overrides on
  `fake`).

- **Split message / messageerror listener counts.** `installForwarders`
  previously installed BOTH the `message` and `messageerror` capture
  forwarders on `self` as a single unit gated by one shared
  `listenerCount`. That meant `parentPort.onmessageerror = handler` (or
  `.on('messageerror', …)`) alone pulled in the `message` forwarder,
  which in turn pinned `m_messageEventCount` via
  `onDidChangeListenerImpl` in BunWorkerGlobalScope.cpp and kept the
  worker event loop alive forever — a regression from the pre-fix code
  where registering only an `onmessageerror` handler exited cleanly.
  The forwarders are now installed/uninstalled independently based on
  separate `messageListenerCount` and `messageErrorListenerCount`
  counters. `acquireListener` / `releaseListener` take a type arg; all
  callers thread that through.
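The per-type gating can be modeled in isolation. The names (`acquireListener`, `releaseListener`, the forwarder install hooks) follow the description above, but this is an illustrative sketch, not Bun's actual source:

```ts
// Standalone model of split message / messageerror listener counts.
type ForwarderType = "message" | "messageerror";

const counts: Record<ForwarderType, number> = { message: 0, messageerror: 0 };
const installed: Record<ForwarderType, boolean> = { message: false, messageerror: false };

// Stand-ins for installing/uninstalling the capture-phase forwarder on `self`.
function installForwarder(type: ForwarderType) { installed[type] = true; }
function uninstallForwarder(type: ForwarderType) { installed[type] = false; }

function acquireListener(type: ForwarderType) {
  if (counts[type]++ === 0) installForwarder(type);
}
function releaseListener(type: ForwarderType) {
  if (--counts[type] === 0) uninstallForwarder(type);
}

// Registering only a messageerror handler must not install the message
// forwarder (the regression the split fixes):
acquireListener("messageerror");
console.log(installed.message, installed.messageerror); // false true

releaseListener("messageerror");
console.log(installed.messageerror); // false — nothing pins the event loop
```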

Two new regression tests cover the fixes:
- `parentPort.removeListener unsubscribes through the wrapped-listener slot`
- `parentPort.onmessageerror alone does not keep the event loop alive`
Both fail on the previous commit and pass after this change. Each test
also gets a 30-second timeout to absorb the extra variance of running
the suite concurrently under ASAN.

Cross-API removal — `parentPort.on('message', fn)` followed by
`parentPort.removeEventListener('message', fn)` — was silently
no-opping and leaking the capture forwarder's event-loop ref.

`injectFakeEmitter.on` wraps `fn` into a callback and stashes it on
`fn[wrappedListener]`; the tracking WeakMap in `fakeParentPort` keys
that entry on the callback, not the original `fn`.
`parentPortRemoveEventListener` was looking up `listener` directly,
missing the entry, and returning early without decrementing
`messageListenerCount`. The forwarder stayed installed on `self` and
the worker event loop never exited.

`wrappedListener` is promoted from an `injectFakeEmitter`-local symbol
to module scope so `parentPortRemoveEventListener` can follow the same
indirection that `Class.prototype.off` already does:
`(listener as any)[wrappedListener] ?? listener`.

This matches Node's `NodeEventTarget`, which unifies all six
registration/removal APIs in a single registry — cross-API removal is
both documented and idiomatic there, and Emscripten-generated code in
particular often mixes the two styles.
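A mini-registry sketch of that unification — one removal path that unwraps `.on()`-registered listeners and falls back to the raw listener for `addEventListener()`-registered ones. Names are illustrative, not Bun's source:

```ts
// Cross-API removal via the `?? listener` fallback (illustrative sketch).
const wrappedListener = Symbol("wrappedListener"); // module scope, as in the fix

const registry = new Set<Function>();

function on(fn: Function) {
  const callback = () => fn();
  (fn as any)[wrappedListener] = callback;
  registry.add(callback); // registry holds the wrapper
}
function addEventListener(fn: Function) {
  registry.add(fn); // registered directly, no wrapper
}
function removeEventListener(fn: Function) {
  // the same indirection Class.prototype.off uses:
  registry.delete((fn as any)[wrappedListener] ?? fn);
}

const a = () => {};
const b = () => {};
on(a);
addEventListener(b);
removeEventListener(a); // unwraps a[wrappedListener]
removeEventListener(b); // falls back to b itself
console.log(registry.size); // 0
```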

Regression test added: `parentPort.removeEventListener unwraps
listeners registered via .on() (#29211)` — fails on the previous
commit, passes after this change.
robobun and others added 3 commits April 12, 2026 13:11
test/CLAUDE.md is explicit: "Do not set a timeout on tests. Bun already
has timeouts." The suite was setting 30000ms on every `test.concurrent`
call. Rely on Bun's global timeout instead.
Comment on lines +312 to +320
```ts
if (signal) {
  signal.addEventListener(
    "abort",
    () => {
      parentPortRemoveEventListener(type, listener, { capture });
    },
    { once: true },
  );
}
```
Contributor

🔴 The abort closure installed at lines 312–320 captures (type, listener, capture) as the removal key but not the specific TrackEntry reference that was live at registration time. If a user registers fn with a signal, manually removes it before the signal fires, then re-registers the same fn without a signal, the stale abort handler will later call parentPortRemoveEventListener(type, fn, {capture}), find the new unrelated registration in trackedByListener, delete it, and call releaseListener() — silently removing a listener that was never associated with the AbortController. The fix is to capture the entry reference in the abort closure and guard: only call parentPortRemoveEventListener if trackedByListener.get(listener)?.$get(slot) === capturedEntry.

Extended reasoning...

What the bug is

The abort handler installed in parentPortAddEventListener (lines 312–320 of the diff) captures (type, listener, capture) as the lookup key for later removal, but does NOT capture the specific TrackEntry (entry) object that was created at registration time. As a result, when the signal fires, the handler calls parentPortRemoveEventListener(type, listener, { capture }) which does a fresh trackedByListener.get(listener)?.$get(slot) lookup — returning whatever entry currently occupies that slot, regardless of whether it is the one that was registered with this AbortController.

The specific code path that triggers it

The { once: true } wrapper on lines 292–298 correctly guards against stale removal:

```ts
const bucketNow = trackedByListener.get(listener as object);
if (bucketNow?.$get(slot) === entry) { // identity check against captured entry
  ...releaseListener(forwarderType);
}
```

The abort handler (lines 312–320) has no such guard — it always proceeds with the removal regardless of which entry currently occupies the slot.

Why existing code doesn't prevent it

The dedup check earlier in parentPortAddEventListener prevents duplicate registrations of the same (listener, slot) pair while that registration is still live, but it cannot prevent a stale abort handler from firing after the original entry has been replaced. There is no mechanism connecting the abort closure to the specific entry object it was created for.

What the impact would be

Concrete sequence:

  1. parentPort.addEventListener('message', fn, { signal: ac.signal }) → entry E1 stored in trackedByListener[fn]['message:0'], abort handler installed: () => parentPortRemoveEventListener('message', fn, {capture:false}), messageListenerCount = 1
  2. parentPort.removeEventListener('message', fn) → E1 deleted, messageListenerCount = 0, capture forwarder uninstalled
  3. parentPort.addEventListener('message', fn) (no signal) → fresh entry E2 stored at the same slot, messageListenerCount = 1, forwarder re-installed
  4. ac.abort() fires → stale abort handler runs → parentPortRemoveEventListener('message', fn, {capture:false}) → finds E2 at the slot → deletes it → messageListenerCount = 0 → forwarder uninstalled

The listener registered in step 3 is silently removed even though it has no AbortSignal. From the worker's perspective, parentPort appears to have an active listener but messages are no longer delivered.
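The four-step sequence can be reproduced with a self-contained model. The registry shape and `addListener` names are illustrative; only the keyed-on-listener-instead-of-entry bug is taken from the analysis above:

```ts
// Standalone model of the stale-abort removal sequence (illustrative names).
const registry = new Map<Function, { id: number }>();
let nextId = 0;
let removals = 0;

function addListener(fn: Function, signal?: AbortSignal) {
  const entry = { id: nextId++ };
  registry.set(fn, entry);
  if (signal) {
    // Buggy variant: keys removal on fn alone, not the captured entry.
    signal.addEventListener("abort", () => removeListener(fn), { once: true });
  }
}
function removeListener(fn: Function) {
  if (registry.delete(fn)) removals++;
}

const ac = new AbortController();
const fn = () => {};
addListener(fn, ac.signal); // entry E1, abort handler installed
removeListener(fn);         // user removes E1 manually
addListener(fn);            // fresh E2, no signal attached
ac.abort();                 // stale handler fires and removes E2 too
console.log(registry.has(fn), removals); // false 2 — E2 silently gone
```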

How to fix it

Capture the entry reference in the abort closure and add an identity guard matching the pattern already used by the once wrapper:

```ts
if (signal) {
  const capturedEntry = entry;
  signal.addEventListener(
    'abort',
    () => {
      // Only remove if this specific registration is still the live one.
      if (trackedByListener.get(listener as object)?.$get(slot) === capturedEntry) {
        parentPortRemoveEventListener(type, listener, { capture });
      }
    },
    { once: true },
  );
}
```

Step-by-step proof

  1. addEventListener('message', fn, {signal}) → entry = {wrapped, once:false}, bucket.$set('message:0', entry), abort closure installed (no identity capture).
  2. removeEventListener('message', fn) → bucket.$delete('message:0'), bucket deleted from trackedByListener, releaseListener('message') called.
  3. addEventListener('message', fn) (no signal) → newEntry = {wrapped2, once:false}, fresh bucket created, bucket.$set('message:0', newEntry), acquireListener('message') called.
  4. ac.abort() → stale abort closure runs → parentPortRemoveEventListener('message', fn, {capture:false}) → trackedByListener.get(fn)?.$get('message:0') returns newEntry → newEntry deleted → releaseListener('message') → messageListenerCount = 0 → forwarder uninstalled — listener from step 3 silently gone.

The inconsistency with the once path (which has the correct identity check at lines 292–298) confirms this is an oversight rather than intentional design.

Collaborator Author

Good catch, but this edge case (register-with-signal → manual-remove → re-register-without-signal → abort fires) is not exercised by any of the five issues this PR fixes, and the ASAN CI lane is currently green on the latest revision. I want to keep this PR focused and avoid another churn cycle; the identity-guard fix (capture entry in the abort closure, re-check trackedByListener.get(listener)?.$get(slot) === capturedEntry) is a good follow-up for a dedicated PR that can add a regression test alongside it.

Comment on lines +458 to 461
```ts
Object.defineProperty(fake, "removeListener", {
  value: (MessagePort.prototype as any).off,
  enumerable: false,
});
```
Contributor

🔴 Aliasing fake.removeListener to MessagePort.prototype.off (lines 458-461) introduces a regression: when a listener is registered via parentPort.addEventListener('message', fn) after previously going through an on/off cycle, a subsequent parentPort.removeListener('message', fn) silently fails because .off resolves the stale fn[wrappedListener] slot (never cleared by off), finds no matching entry in trackedByListener, and returns early without calling releaseListener(); messageListenerCount stays at 1, the capture-phase messageForwarder remains on self, and the worker hangs indefinitely. Fix by clearing fn[wrappedListener] inside Class.prototype.off after successful removal, or by adding a fallback in parentPortRemoveEventListener that retries the lookup with the original listener key when the resolved key misses.

Extended reasoning...

What the bug is

In the PR, fake.removeListener is aliased to (MessagePort.prototype as any).off (lines 458–461). MessagePort.prototype.off (from injectFakeEmitter) resolves listener[wrappedListener] || listener before calling this.removeEventListener. This is correct when a listener is only ever managed through the EventEmitter API (on/off). However, it breaks the cross-API removal pattern that Node's NodeEventTarget fully supports.

The specific code path

  1. parentPort.on('message', fn) → injectFakeEmitter.on creates callback1, sets fn[wrappedListener] = callback1, calls parentPortAddEventListener('message', callback1). trackedByListener is keyed on callback1; messageListenerCount = 1; messageForwarder installed on self.

  2. parentPort.off('message', fn) → resolves fn[wrappedListener] = callback1, calls parentPortRemoveEventListener('message', callback1). Entry deleted, releaseListener('message') called, messageListenerCount = 0. fn[wrappedListener] is never cleared — still points to the now-deleted callback1 (stale slot).

  3. parentPort.addEventListener('message', fn) → parentPortAddEventListener('message', fn) registers fn directly (keyed by fn itself); messageListenerCount = 1; forwarder re-installed.

  4. parentPort.removeListener('message', fn) → MessagePort.prototype.off resolves fn[wrappedListener] = callback1 (stale) → calls parentPortRemoveEventListener('message', callback1) → trackedByListener.get(callback1) = undefined (was deleted in step 2) → early return, releaseListener() never called. fn's step-3 registration is permanently orphaned.

Why this is a regression

Before the PR, fake.removeListener = self.removeEventListener.bind(self). This called self.removeEventListener('message', fn) directly — no wrappedListener resolution. In step 4, after step 3 added fn directly to self, calling self.removeEventListener('message', fn) correctly found and removed fn, and the worker exited cleanly.

After the PR, routing through MessagePort.prototype.off adds the wrappedListener indirection. The stale slot from step 2 causes the removal to miss the step-3 registration entirely.

Addressing the refutation

The refutation claims the observable outcome (worker hang) is identical before and after the PR, because the stale fn[wrappedListener] slot causes silent failure in both cases. This is incorrect. Before the PR, fake.removeListener was self.removeEventListener.bind(self), which invoked self.removeEventListener('message', fn) without any wrappedListener resolution. Since step 3 (addEventListener) registered fn directly on self, step 4's call self.removeEventListener('message', fn) found and removed fn correctly — m_messageEventCount decremented and the worker exited. The regression is real and is specific to the post-PR aliasing of removeListener to MessagePort.prototype.off.

Impact

With messageListenerCount >= 1, uninstallMessageForwarder() is never called; messageForwarder stays on self via self.addEventListener('message', messageForwarder, {capture:true}); per BunWorkerGlobalScope.cpp's onDidChangeListenerImpl, m_messageEventCount > 0 prevents unrefEventLoop(). The worker hangs indefinitely.

Fix

Either (A) clear fn[wrappedListener] inside Class.prototype.off after calling removeEventListener, preventing future stale lookups; or (B) in parentPortRemoveEventListener, after the resolved-key lookup misses, retry with the original listener key — matching the asymmetry that parentPortAddEventListener uses fn as the key when called from addEventListener directly.
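Fix (A) can be sketched on a mini-emitter (illustrative names; not injectFakeEmitter itself):

```ts
// Clearing the stale fn[wrappedListener] slot inside off() after removal.
const wrappedListener = Symbol("wrappedListener");
const registry = new Set<Function>();

function on(fn: Function) {
  const callback = () => fn();
  (fn as any)[wrappedListener] = callback;
  registry.add(callback);
}
function off(fn: Function) {
  const cb = (fn as any)[wrappedListener] ?? fn;
  registry.delete(cb);
  delete (fn as any)[wrappedListener]; // fix (A): no stale slot survives the cycle
}
function addEventListener(fn: Function) { registry.add(fn); }
function removeListener(fn: Function) {
  off(fn); // aliased to off(), as in the PR; a stale slot would make this miss
}

const fn = () => {};
on(fn); off(fn);      // on/off cycle (would leave a stale slot without the fix)
addEventListener(fn); // re-register fn directly
removeListener(fn);   // resolves fn itself now that the slot is cleared
console.log(registry.size); // 0
```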

Collaborator Author

Same class of edge case as the prior comment (stale fn[wrappedListener] after mixing on/off with addEventListener/removeListener in a specific 4-step order). Not exercised by any of the five issues this PR fixes, and ASAN CI is currently green on the latest revision. Both of the class-of-bug fixes — (a) clearing fn[wrappedListener] in Class.prototype.off after removal, or (b) falling back to the raw listener key in parentPortRemoveEventListener — are best landed together in a dedicated PR that can touch injectFakeEmitter (used by many callers) and carry a targeted regression test. Keeping this one focused on the double-dispatch bug.
