feat(skill): add self-improve autonomous code improvement skill#2074
Yeachan-Heo merged 4 commits into `Yeachan-Heo:dev`.
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 80ef26fb59
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
skills/self-improve/SKILL.md
Outdated
```
2. **Filter** to `status: "success"` only. If zero candidates, skip to Step 9 (Record & Visualize).
3. **Rank** by `benchmark_score` (respecting `benchmark_direction`)
4. **Ranked-candidate loop** — for each candidate in rank order (best first):
   a. **No-regression check**: candidate score must be >= current `best_score`
```
Respect metric direction in regression gate
The tournament gate currently hard-codes candidate score >= best_score, which is only valid for higher_is_better. For goals configured as lower_is_better (e.g., latency/error), genuinely better candidates will be rejected before merge, so the loop can stall even when improvements exist. The comparison in this step needs to branch on benchmark_direction just like ranking does.
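A direction-aware gate can branch the same way the ranking step does. A minimal sketch of the idea (the function name and the awk-based float comparison are illustrative, not the skill's actual code):

```shell
# Hypothetical no-regression gate that branches on benchmark_direction.
# awk handles float comparison, which bash arithmetic cannot.
passes_gate() {
  local candidate="$1" best="$2" direction="$3"
  if [[ "${direction}" == "lower_is_better" ]]; then
    awk -v c="${candidate}" -v b="${best}" 'BEGIN { exit !((c + 0) <= (b + 0)) }'
  else
    awk -v c="${candidate}" -v b="${best}" 'BEGIN { exit !((c + 0) >= (b + 0)) }'
  fi
}
```

Under `lower_is_better`, a latency candidate of 0.2s against a best of 0.5s passes this gate, whereas a hard-coded `>=` would wrongly reject it.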
```shell
base_commit=$(git -C "${GIT_DIR}" merge-base HEAD HEAD~1 2>/dev/null || echo "HEAD~1")
modified_files_str=$(git -C "${GIT_DIR}" diff --name-only "${base_commit}" 2>/dev/null || true)
```
Compare sealed files against the correct baseline
In --worktree mode, sealed-file detection diffs against merge-base HEAD HEAD~1 (effectively HEAD~1 for normal commits), which is not the branch baseline. In a fresh experiment branch this can include unrelated changes from the parent commit and falsely report sealed-file violations, and in multi-commit experiments it can miss sealed-file edits made before the last commit. This makes sealed-file enforcement both noisy and unreliable during executor runs.
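The suggested fix can be sketched against a throwaway repo; the branch names here (`improve/goal`, `experiment/round_1_executor_1`) are illustrative, not the script's actual variables:

```shell
# Self-contained demo: derive the sealed-file baseline from the improvement
# branch the experiment was cut from, not from HEAD~1.
GIT_DIR=$(mktemp -d)
git -C "${GIT_DIR}" init -q
git -C "${GIT_DIR}" -c user.name=t -c user.email=t@t commit -q --allow-empty -m base
git -C "${GIT_DIR}" checkout -q -b improve/goal
git -C "${GIT_DIR}" -c user.name=t -c user.email=t@t commit -q --allow-empty -m improve
git -C "${GIT_DIR}" checkout -q -b experiment/round_1_executor_1
git -C "${GIT_DIR}" -c user.name=t -c user.email=t@t commit -q --allow-empty -m exp1
git -C "${GIT_DIR}" -c user.name=t -c user.email=t@t commit -q --allow-empty -m exp2
# merge-base against the improvement branch covers every experiment commit;
# HEAD~1 would miss the edits made in exp1.
base_commit=$(git -C "${GIT_DIR}" merge-base HEAD improve/goal)
modified_files=$(git -C "${GIT_DIR}" diff --name-only "${base_commit}")
```

Here `base_commit` resolves to the tip of `improve/goal`, so a two-commit experiment is diffed in full.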
CI failed — 2 test failures. Please rebase on dev: `git fetch origin dev && git rebase origin/dev`.
Integrate an evolutionary self-improvement loop as a self-contained skill that autonomously improves any target codebase through tournament selection. The skill spawns parallel agent pairs (researcher → planner → executor), benchmarks each experiment in isolated git worktrees, and merges only the best-performing change per iteration.

Skill structure:
- SKILL.md: Loop controller with 11-step iteration cycle, resumability, and cancellation support
- si-researcher.md, si-benchmark-builder.md, si-goal-clarifier.md: Custom reference docs for roles without OMC agent equivalents
- data_contracts.md: 12 JSON schemas for inter-agent communication
- scripts/validate.sh: Sealed file + plan/result schema validation
- scripts/plot_progress.py: Progress visualization with matplotlib fallback
- templates/: Default config for settings, agent state, goal, harness, ideas

Integration points (4 files):
- state-tools.ts: Register in STATE_TOOL_MODES + EXTRA_STATE_ONLY_MODES
- skill-state/index.ts: SKILL_PROTECTION 'heavy' (10 reinforcements, 30min)
- cancel/SKILL.md: Position 11 in dependency order with cleanup semantics
- CLAUDE.md: Added to workflow skill catalog

Agent mapping (evidence-based):
- 4 OMC agents leveraged: planner, architect, critic, executor
- 4 custom roles: researcher, benchmark-builder, goal-clarifier, tournament
- Skill-only invocation (/oh-my-claudecode:self-improve) — no keyword trigger

Inspired by github.com/lonj7798/self-improvement

Constraint: EXTRA_STATE_ONLY_MODES (not full ExecutionMode) to minimize integration surface
Rejected: Global agents | context bloat in every session
Rejected: Full OMC agent wrapping | impedance mismatch (explore=haiku, git-master=atomic only)
Rejected: Keyword trigger | "self improve" too common in English, accidental trigger risk
Confidence: high
Scope-risk: narrow
Not-tested: Full multi-iteration loop with real benchmark (manual testing recommended)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed from 80ef26f to 86c34a7.
…seline

In --worktree mode, detect the improve/* branch as the correct merge-base for sealed file comparison instead of HEAD~1. This ensures multi-commit experiments are fully checked and avoids false positives from unrelated parent commit changes.

Constraint: Must handle repos where improve/* branch does not yet exist
Confidence: high
Scope-risk: narrow

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…on check

The no-regression gate hard-coded >=, which only works for higher_is_better. For lower_is_better goals (latency, error rate), better candidates would be rejected. Now branches on benchmark_direction like the ranking step does.

Confidence: high
Scope-risk: narrow

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c13231305c
```shell
improve_branch=$(git -C "${GIT_DIR}" branch -a --list 'improve/*' 2>/dev/null | head -1 | tr -d ' *' || true)
if [[ -z "${improve_branch}" ]]; then
```
Derive sealed-file baseline from current experiment branch
In --worktree mode the sealed-file diff baseline is chosen from the first improve/* branch returned by Git, not from the improvement branch that the current experiment/round_* branch was created from. If a repo has multiple improvement branches, this can diff against an unrelated history and either falsely report sealed-file violations or miss real ones for this executor run. Fresh evidence: this revision now uses git branch -a --list 'improve/*' | head -1, which is still branch-agnostic.
The current skill is designed to maintain a single improvement branch at a time (only the winner branch is kept), which keeps branches as clean as possible and reduces complexity. So `head -1` will return the correct branch.
```shell
# Settings path must be provided or discovered from .omc/self-improve/config/
SETTINGS=""
VALID_APPROACH_FAMILIES="architecture training_config data infrastructure optimization testing documentation other"
```
Accept custom approach families in plan validation
The validator hard-codes VALID_APPROACH_FAMILIES and rejects anything outside that list, but the skill contracts explicitly allow custom approach families from harness.md (data_contracts.md states these are valid). As soon as a user configures a custom family, check_plan_schema will fail and block executor benchmarking for otherwise valid plans.
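One way to honor the contract is to fold user-configured families into the whitelist rather than hard-coding it. A hypothetical sketch (the `CUSTOM_FAMILIES` variable and its source are assumptions, e.g. parsed from harness.md):

```shell
# Hypothetical: accept custom approach families alongside the defaults
# instead of rejecting everything outside a fixed whitelist.
DEFAULT_FAMILIES="architecture training_config data infrastructure optimization testing documentation other"
is_valid_family() {
  local fam="$1" f
  # CUSTOM_FAMILIES is an assumed space-separated list from user config.
  for f in ${DEFAULT_FAMILIES} ${CUSTOM_FAMILIES:-}; do
    [[ "${f}" == "${fam}" ]] && return 0
  done
  return 1
}
```

With `CUSTOM_FAMILIES="prompting"`, a plan declaring the `prompting` family would validate instead of blocking executor benchmarking.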
CI failure is a flaky performance test. Can you re-run CI, or merge with the flaky test acknowledged?
CI is all green now ✅ (the flaky test passed on re-run). Will review the PR shortly.
Yeachan-Heo left a comment
Review: REQUEST_CHANGES
Thanks for the ambitious PR — the self-improve concept is directionally interesting, but in its current form there are blocking concerns that need to be addressed before merge.
Blocking Issues
1. Unsandboxed arbitrary-code execution / repo trust boundary
`SKILL.md:96-100, 213-225`, `si-benchmark-builder.md:47-49, 63-67`
- The feature accepts an arbitrary repo_path, creates/uses benchmark code in that repo, then runs benchmark commands autonomously
- Git worktrees isolate Git state, not process/network/env access
- As written, this is effectively an autonomous code-execution loop over arbitrary repos with no sandbox/trust gate
2. Autonomous push/PR behavior is too risky
`SKILL.md:130, 243, 302`
- The loop auto-pushes winners and may create a PR upstream
- Without explicit opt-in and remote verification, this is too dangerous for a new skill
3. Cancel/resume integration is incomplete
`SKILL.md:147, 276, 301-312`, `cancel/SKILL.md:118`, `state-tools.ts:46-49`
- The skill promises user_stopped, preserved iteration_state, orphaned worktree cleanup, and resume safety, but the patch only adds a state-tool enum entry, heavy skill protection, and one cancel bullet
- There is no real self-improve-specific cancel flow implementing those promises
Non-blocking Issues
4. Validator contradicts the documented contract — data_contracts.md:144-157 docs say custom approach families from harness.md are valid, but validate.sh hardcodes a fixed whitelist
5. Sealed-file baseline is nondeterministic — validate.sh:83-94 picks the first improve/* branch via head -1, which can be the wrong baseline if multiple improvement branches exist
6. Docs are under-integrated — CLAUDE.md is updated, but broader user-facing docs/skill inventories are not
What Needs to Happen
- Add a trust/sandbox model for the execution loop
- Make push/PR creation explicitly opt-in (not default behavior)
- Implement real cancel/resume integration (not just documentation)
- Fix validator contract mismatches
Looking forward to a v2!
—
[repo owner's gaebal-gajae (clawdbot) 🦞]
Review Summary

Gate: ✅ Star gate passed (starred oh-my-claudecode)
CI: ✅ All checks passing (after dev fix in #2075 + flaky perf test rerun)
Code Review:
- Core integration (3 files) — Clean
- Skill content (14 files) — Self-contained

Notes

Decision needed: This is a feature addition — owner approval needed for inclusion. Code quality looks good, integration is minimal and follows patterns.
…in, cancel/resume

Addresses all 3 blocking issues from maintainer review on Yeachan-Heo#2074:

1. Trust gate: Setup phase now requires explicit user confirmation of repo_path and benchmark_command before any autonomous execution. Consent persisted as trust_confirmed in agent-settings.json; skipped on resume. Gate enforced alongside si_setting_* flags.
2. Push/PR opt-in: auto_push and auto_pr default to false in settings. Tournament push (Step 8) and completion PR gated on these flags. When disabled, exact manual commands are logged for the user.
3. Cancel/resume: New Step 0 cleans stale worktrees idempotently on every iteration start and resume. Step 2 detects cancel with explicit cleanup flow (set user_stopped, update iteration_state, clean worktrees). Resumability handles user_stopped (ask), crash (auto-resume), and fresh start with Step 0 as a hard prerequisite.

Also fixes non-blocker: removed approach_family enum check from validate.sh — critic owns taxonomy validation (supports custom families from harness.md).

Constraint: All fixes are SKILL.md prompt changes, no new TypeScript
Rejected: Compiled trust infrastructure | disproportionate for skill-level feature
Confidence: high
Scope-risk: narrow

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
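The trust gate described in the commit message might be read back like this. The field names mirror the commit message (`trust_confirmed` in `agent-settings.json`), but the snippet is a sketch with a made-up payload, not the shipped code:

```shell
# Hypothetical trust-gate check: refuse autonomous execution until the user
# has confirmed repo_path and benchmark_command (field names assumed).
settings=$(mktemp)
printf '{"trust_confirmed": false, "auto_push": false, "auto_pr": false}' > "${settings}"
trusted=$(jq -r '.trust_confirmed // false' "${settings}")
if [[ "${trusted}" != "true" ]]; then
  echo "trust gate: confirm repo_path and benchmark_command before autonomous execution" >&2
fi
```

The same `jq -r '.flag // false'` pattern would gate `auto_push` and `auto_pr`, so pushes and PRs stay off unless explicitly enabled.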
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9e1ff7cd38
```
3. **Rank** by `benchmark_score` (respecting `benchmark_direction`)
4. **Ranked-candidate loop** — for each candidate in rank order (best first):
   a. **No-regression check**: candidate score must improve or hold even vs `best_score`, respecting `benchmark_direction` (`higher_is_better`: score >= best_score; `lower_is_better`: score <= best_score)
   b. **Merge** via `oh-my-claudecode:git-master`: `git merge experiment/round_{n}_executor_{id} --no-ff -m "Iteration {n}: {hypothesis} (score: {before} → {after})"`
```
Checkout improvement branch before running merge command
This merge step uses git merge experiment/... without switching branches first, but Setup explicitly leaves the main repo on {target_branch} after creating improve/{goal_slug}. Because git merge always merges into the current branch, following these instructions literally can merge experiment commits into the protected baseline branch instead of improve/{goal_slug}. Add an explicit checkout/switch to the improvement branch immediately before this command (or use a command form that names both source and destination).
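A throwaway-repo demo of the suggested fix (branch names are illustrative): switching to the improvement branch first makes the merge destination explicit, so experiment commits never land on the protected baseline:

```shell
# Demo: git merge always merges into the current branch, so check out the
# improvement branch before merging the experiment branch.
repo=$(mktemp -d)
git -C "${repo}" init -q
git -C "${repo}" -c user.name=t -c user.email=t@t commit -q --allow-empty -m base
git -C "${repo}" branch improve/goal
git -C "${repo}" checkout -q -b experiment/round_1_executor_1
git -C "${repo}" -c user.name=t -c user.email=t@t commit -q --allow-empty -m exp
git -C "${repo}" checkout -q improve/goal   # explicit merge destination
git -C "${repo}" -c user.name=t -c user.email=t@t merge -q --no-ff \
  -m "Iteration 1: demo" experiment/round_1_executor_1
```

Afterward the merge commit sits on `improve/goal`, while the baseline branch is untouched.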
```shell
elif [[ "${field}" == "benchmark_score" ]]; then
  exists=$(jq --arg f "${field}" 'has($f)' "${result_file}" 2>/dev/null || echo "false")
  if [[ "${exists}" != "true" ]]; then
    missing="${missing} ${field}"
```
Validate benchmark_score type for success results
Result validation only checks whether benchmark_score exists, so non-numeric values like strings are accepted even when status is success. The tournament step ranks and compares candidates by score, so allowing non-numeric values can produce incorrect winner selection or comparison failures. Enforce that benchmark_score is numeric for successful runs (and only relax this for error/timeout statuses if needed).
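The gap can be demonstrated with a made-up payload: a success result carrying a string score passes the existence check but fails a type check like this one (a sketch, not the validator's actual code):

```shell
# Sketch: require a numeric benchmark_score whenever status is "success";
# error/timeout results are exempt. Sample payload is invented.
result_file=$(mktemp)
printf '{"status": "success", "benchmark_score": "fast"}' > "${result_file}"
score_ok=$(jq '(.status != "success") or ((.benchmark_score | type) == "number")' "${result_file}")
```

Here `score_ok` is `false`; replacing `"fast"` with `0.93` (or setting `status` to `"error"`) yields `true`.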
```shell
steps_len=$(jq '.steps | length' "${plan_file}" 2>/dev/null || echo "0")
if [[ "${steps_len}" -eq 0 ]]; then
  err "steps must be a non-empty array"
```
Require steps to be an array during plan validation
The plan check uses .steps | length but never verifies that steps is an array, so malformed payloads like a string still pass schema validation when non-empty. Downstream executor logic expects an ordered list of step objects, so this can allow invalid plans through and cause execution ambiguity/failures later in the loop. Add an explicit type == "array" check before evaluating length.
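The failure mode is easy to reproduce: jq's `length` also works on strings, so a plan whose `steps` is a string still reports a non-zero length. An explicit type check closes the hole (sample payload is invented):

```shell
# Sketch: verify steps is actually an array before checking its length.
plan_file=$(mktemp)
printf '{"steps": "do everything"}' > "${plan_file}"
steps_len=$(jq '.steps | length' "${plan_file}")          # 13 — string length!
steps_is_array=$(jq '.steps | type == "array"' "${plan_file}")
```

With the type check in place, this malformed plan is rejected even though its `length` is non-zero.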
Summary
- `self-improve` skill that autonomously improves any target codebase through tournament-based evolutionary optimization
- Invoked via `/oh-my-claudecode:self-improve` (no keyword trigger — "self improve" is too common in English)

What's included

New skill directory (`skills/self-improve/`):
- `SKILL.md` — Loop controller with 11-step iteration cycle, resumability, cancellation
- `si-researcher.md`, `si-benchmark-builder.md`, `si-goal-clarifier.md` — Custom agent prompts
- `data_contracts.md` — 12 JSON schemas for inter-agent communication
- `scripts/validate.sh` — Sealed file + plan/result schema validation
- `scripts/plot_progress.py` — Progress visualization (matplotlib with text fallback)
- `templates/` — Default configs (settings, agent-state, goal, harness, ideas)

Integration points (4 files):
- `src/tools/state-tools.ts` — `'self-improve'` in `STATE_TOOL_MODES` + `EXTRA_STATE_ONLY_MODES`
- `src/hooks/skill-state/index.ts` — `'self-improve': 'heavy'` (10 reinforcements, 30min TTL)
- `skills/cancel/SKILL.md` — Position 11 in cancellation dependency order
- `CLAUDE.md` — Added to workflow skill catalog

Test update:
- `src/__tests__/skills.test.ts` — Updated expected skill counts

Inspired by

lonj7798/self-improvement — evolutionary code improvement engine with tournament selection, sealed benchmarks, and institutional memory.

Test plan
- `npm test` passes (skill count assertions updated)
- `/oh-my-claudecode:self-improve` loads the skill
- `state_read(mode='self-improve')` works
- `/oh-my-claudecode:cancel` clears self-improve state
- `scripts/validate.sh` runs without errors

🤖 Generated with Claude Code