Skip to content

fix: prevent nested skill invocations from orphaning skill-active-state#2025

Merged
Yeachan-Heo merged 4 commits intoYeachan-Heo:devfrom
DdangJin:fix/nested-skill-active-state
Mar 30, 2026
Merged

fix: prevent nested skill invocations from orphaning skill-active-state#2025
Yeachan-Heo merged 4 commits intoYeachan-Heo:devfrom
DdangJin:fix/nested-skill-active-state

Conversation

@DdangJin
Copy link
Copy Markdown
Contributor

@DdangJin DdangJin commented Mar 30, 2026

Summary

  • Add nesting guard in writeSkillActiveState(): if another skill is already active in the same session, skip writing — the parent's protection already covers the session
  • Add explicit skill-active-state.json cleanup in setup-progress.sh complete to ensure no stale skill state survives after setup finishes
  • Make PostToolUse clear nesting-aware: only clear state if the completing skill owns it (prevents child completion from deleting parent's state)
  • Sync SKILL_PROTECTION_MAP between MJS and TS (7+ entries including ralplan)
  • Add _meta envelope to MJS state writes, session ID validation, stale-only cleanup

Fixes #2024

Root Cause

During omc-setup, Phase 3 invokes mcp-setup via the Skill tool. writeSkillActiveState() unconditionally overwrites the parent (omc-setup) state with the child (mcp-setup) state. When both skills complete, the child's active: true state persists with no completion signal, causing the stop hook to block 5 times with "[SKILL ACTIVE: mcp-setup] still executing".

Additionally, processPostToolUse unconditionally called clearSkillActiveState() on any Skill completion, meaning even with the nesting guard protecting writes, the child's PostToolUse would delete the parent's state.

Reproduced during omc-setup update from 4.9.1 to 4.9.3.

Changes

src/hooks/bridge.ts

  • PostToolUse now reads current state and only clears if the completing skill owns it

src/hooks/skill-state/index.ts

  • Nesting guard in writeSkillActiveState(): skip write if a different skill is already active

scripts/pre-tool-enforcer.mjs

  • Matching nesting guard + TOCTOU comments
  • Synced SKILL_PROTECTION_MAP with canonical TS version
  • Added _meta envelope to state writes
  • Use validated safeSessionId for session_id field

scripts/setup-progress.sh

  • Session-scoped cleanup with regex-validated session ID
  • Fallback find scoped to stale files only (-mmin +30)

src/hooks/skill-state/__tests__/skill-state.test.ts

  • 5 new tests including end-to-end nesting lifecycle
  • 1 redundant test removed
  • 51/51 pass

Test plan

  • Verified setup-progress.sh complete clears skill-active-state.json from session dirs
  • Verified nesting guard: parent skill state not overwritten when child skill is invoked
  • Verify stop hook no longer blocks after omc-setup completes with nested mcp-setup

🤖 Generated with Claude Code

When a skill (e.g. omc-setup) invokes another skill (e.g. mcp-setup)
via the Skill tool, writeSkillActiveState() overwrites the parent's
skill-active-state.json with the child's state. When both skills
complete, the child's active state persists with no completion signal,
causing the stop hook to block up to max_reinforcements times with
"[SKILL ACTIVE: mcp-setup] still executing" even though all work is
done.

Two-part fix:

1. Nesting guard in writeSkillActiveState(): if another skill is
   already active in the same session, skip writing. The parent's
   protection already covers the session.

2. Explicit cleanup in setup-progress.sh complete: delete all
   skill-active-state.json files under .omc/state/ when setup
   finishes, so the stop hook has no stale state to block on.

Reproduced during omc-setup update from 4.9.1 to 4.9.3, where
Phase 3 invokes mcp-setup inside omc-setup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Repo admins can enable using credits for code reviews in their settings.

Copy link
Copy Markdown
Owner

@Yeachan-Heo Yeachan-Heo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good direction — the nesting guard is the right idea. A few concerns:

1. Over-cleaning in setup-progress.sh

The completion cleanup deletes all skill-active-state.json files under .omc/state/, which could interfere with concurrent active sessions. Should scope to only the current skill's state file.

2. Non-atomic nesting guard

The read-check-write in writeSkillActiveState() has a TOCTOU race — two skills starting simultaneously could both pass the "no active skill" check. For a single-threaded Claude session this is unlikely but worth noting.

3. Test coverage gaps

  • No regression test for the nested invocation scenario (parent + child skill)
  • Existing tests may still expect the old overwrite behavior — need to verify/update those
  • Need a test for the cleanup path in setup-progress.sh complete

Suggestion

  • Scope the cleanup to only the invoking skill's state file (or use skill name matching)
  • Add a test that simulates: parent skill writes state → child skill attempts write → verify parent state preserved
  • Update any existing tests that rely on unconditional overwrite

The core fix (nesting guard) is correct in concept, just needs tightening.


[repo owner's gaebal-gajae (clawdbot) 🦞]

Addresses review feedback from Yeachan-Heo#2025:

1. Scope cleanup in setup-progress.sh to current session only
   (uses $CLAUDE_SESSION_ID with fallback to broad cleanup)
2. Add nesting guard to TypeScript source (src/hooks/skill-state/index.ts)
   matching the scripts/pre-tool-enforcer.mjs fix
3. Document TOCTOU race condition (acceptable for single-threaded sessions)
4. Update/add tests:
   - "does not overwrite when a different skill is already active"
   - "allows re-invocation of the same skill"
   - "allows write when no prior state exists"

All 56 tests pass (skill-state + stop-hook integration).

Constraint: Claude Code sessions are single-threaded so TOCTOU is not a real risk
Rejected: Atomic file locking | unnecessary complexity for single-threaded runtime
Confidence: high
Scope-risk: narrow
Not-tested: Concurrent multi-session cleanup race on shared .omc/state/

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@DdangJin
Copy link
Copy Markdown
Contributor Author

Review Feedback Addressed

All 3 concerns from the review have been addressed:

1. Over-cleaning in setup-progress.sh → Scoped to current session

  • Now uses $CLAUDE_SESSION_ID / $CLAUDECODE_SESSION_ID to target only the current session's skill-active-state.json
  • Falls back to broad find cleanup only when no session ID is available (legacy safety net)

2. TOCTOU race → Documented

  • Added comments in both TypeScript source and mjs script noting the read-check-write is non-atomic but acceptable for single-threaded Claude sessions

3. Test coverage → Added

  • Updated existing overwrite test → now verifies nesting guard preserves parent state
  • Added: "allows re-invocation of the same skill" (idempotent refresh)
  • Added: "allows write when no prior state exists" (baseline)
  • All 56 tests pass (47 skill-state + 9 stop-hook integration)

Files changed

File Change
src/hooks/skill-state/index.ts Nesting guard in writeSkillActiveState()
scripts/pre-tool-enforcer.mjs Matching nesting guard + TOCTOU comments
scripts/setup-progress.sh Session-scoped cleanup in cmd_complete()
src/hooks/skill-state/__tests__/skill-state.test.ts 1 updated + 2 new tests

@DdangJin DdangJin requested a review from Yeachan-Heo March 30, 2026 05:59
@DdangJin
Copy link
Copy Markdown
Contributor Author

Working on additional improvements — will push an update shortly.

The nesting guard in writeSkillActiveState() correctly prevents child
skills from overwriting parent state, but the PostToolUse handler
unconditionally calls clearSkillActiveState() on any Skill completion,
deleting the parent's state when the child finishes.

Three additional fixes:
- Sync SKILL_PROTECTION_MAP in pre-tool-enforcer.mjs with canonical TS
  (ralplan: none, add missing entries, remove extras)
- Add _meta envelope to MJS state writes (matching writeModeState)
- Validate session ID in setup-progress.sh before path interpolation
- Scope fallback find to stale files only (-mmin +30)

Tests: 4 added (canonical nesting, triple nesting, reinforcement reset,
stop-hook integration), 1 redundant removed. 50/50 pass.

Constraint: PostToolUse fires for child before parent returns
Rejected: Stack-based approach | over-engineered for current single nesting
Rejected: Amend prior commit | preserves review history
Confidence: high
Scope-risk: narrow
Not-tested: Non-OMC plugin with colon-namespaced skill name

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@DdangJin
Copy link
Copy Markdown
Contributor Author

DdangJin commented Mar 30, 2026

Additional Improvements (d7a61e4, 80bacb2)

Review uncovered several issues beyond the original nesting guard fix. All addressed:

CRITICAL fixes

Issue File Fix
PostToolUse unconditionally clears parent state bridge.ts:1828-1840 Guard: only clear if completing skill owns the active state
MJS writes without _meta envelope pre-tool-enforcer.mjs:486-490 Added _meta envelope matching TS writeModeState format

HIGH fixes

Issue File Fix
SKILL_PROTECTION_MAP not synced between MJS and TS (7+ entries, notably ralplan was 'medium' in MJS after prior commit vs 'none' in TS) pre-tool-enforcer.mjs:363-390 Synced to canonical TS map
$sid path injection (no validation) setup-progress.sh:93-98 Added regex validation matching TS SESSION_ID_REGEX

MEDIUM fixes

  • Fallback find scoped to stale files only (-mmin +30) instead of deleting all sessions
  • Empty catch block documented with intent comment
  • session_id field now uses validated safeSessionId

Tests

  • +4 tests (in these commits): canonical omc-setup→mcp-setup, triple nesting, reinforcement count reset, end-to-end nesting lifecycle (simulates full PostToolUse clear guard)
  • -1 test: removed redundant "allows write when no prior state exists"
  • 51/51 pass, build clean

Key architectural insight

The nesting guard prevents overwrite (write path) but the original fix missed the clear path: processPostToolUse unconditionally called clearSkillActiveState() when any Skill completes, deleting the parent's state when the child finishes. The setup-progress.sh cleanup was not defense-in-depth — it was functionally required. With the PostToolUse guard, it is now truly belt-and-suspenders.

Test plan verification

The e2e lifecycle test (80bacb2) simulates the bridge.ts PostToolUse clear guard logic and validates: parent state survives child completion → stop hook still blocks → parent completion clears state → stop hook no longer blocks. All 3 test plan items now checked.

🤖 Generated with Claude Code

@DdangJin
Copy link
Copy Markdown
Contributor Author

DdangJin commented Mar 30, 2026

@Yeachan-Heo Ready for re-review. All previous feedback addressed, plus additional fixes in d7a61e4 + 80bacb2:

  • PostToolUse clear is now nesting-aware (root cause fix — child skill completion no longer deletes parent's state)
  • Synced SKILL_PROTECTION_MAP between MJS and TS (7+ entries including ralplan)
  • Added _meta envelope to MJS state writes, session ID validation in shell, stale-only cleanup
  • 4 new tests added in these commits (51/51 pass, build clean) — including end-to-end nesting lifecycle test
  • All 3 test plan items checked

See previous comment for full details.

Simulates the full omc-setup → mcp-setup lifecycle including the
PostToolUse nesting-aware clear logic from bridge.ts. Verifies:
1. Child completion preserves parent state (clear skipped)
2. Stop hook still blocks while parent is active
3. Parent completion clears state (clear executed)
4. Stop hook no longer blocks after parent completes

This directly validates the PR test plan item:
"Verify stop hook no longer blocks after omc-setup completes"

51/51 pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Yeachan-Heo Yeachan-Heo merged commit 23072e6 into Yeachan-Heo:dev Mar 30, 2026
@Yeachan-Heo
Copy link
Copy Markdown
Owner

Reviewed and merged to dev. Clean fix — nesting guard in writeSkillActiveState() correctly prevents child skills from overwriting parent state, and processPostToolUse only clears state when the completing skill owns it. Test coverage is thorough (8 new cases covering nesting, triple nesting, re-invocation, and full lifecycle).

One note for future PRs: the diff includes a large amount of dist/ and bridge/ build artifacts. Consider rebasing from a clean dev tip or excluding build outputs from the PR diff to keep reviews focused on source changes.

Thanks for the contribution! 🦞


[repo owner's gaebal-gajae (clawdbot) 🦞]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants