[Self-Heal] Add self-scheduling auto-repair workflow #57

Draft
badMade wants to merge 1 commit into main from feature/self-healing-ci-1263455366753982256

Conversation

@badMade

@badMade badMade commented Jun 4, 2026

Owner

Self-Heal CI Pipeline

This PR introduces an automated self-healing CI pipeline configured via GitHub Actions and Node.js scripts to automatically repair code drift.

Details

  • Triggers: Supports scheduled runs (based on computed telemetry), reactive runs on ci workflow failure, and manual dispatches.
  • Repair Steps: Implements idempotent repairs for tooling installation (npm ci), formatting (eslint --fix, prettier -w), snapshots (vitest run -u), and dependencies (npm update).
  • Telemetry-Based Scheduling: scripts/compute_schedule.mjs dynamically adjusts the checking cadence by counting recent commits to determine if the repo is in a High, Active, Standard, or Dormant activity state.
  • Safeguards: Prevents self-trigger loops (!startsWith(github.ref_name, 'selfheal-')), never pushes directly to the default branch (repairs are proposed via gh pr create), and runs entropy-based regex checks against the git diff to avoid committing API tokens or secrets.
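
The secret-scanning safeguard could look roughly like the following. This is a minimal sketch, not the PR's actual check: the function names `shannonEntropy` and `findSuspiciousTokens`, the 20-character minimum, and the 4.0 bits-per-character threshold are all assumptions chosen for illustration.

```javascript
// Hypothetical sketch of an entropy-based secret scan over a unified diff.
// Names and thresholds are illustrative assumptions, not values from the PR.

// Shannon entropy of a string, in bits per character.
function shannonEntropy(text) {
  const counts = new Map();
  for (const ch of text) counts.set(ch, (counts.get(ch) || 0) + 1);
  let entropy = 0;
  for (const n of counts.values()) {
    const p = n / text.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

// Return token-like strings on added diff lines that look random enough to be secrets.
function findSuspiciousTokens(diffText, minLength = 20, minEntropy = 4.0) {
  const tokenPattern = new RegExp("[A-Za-z0-9+/_=-]{" + minLength + ",}", "g");
  const hits = [];
  for (const line of diffText.split("\n")) {
    // Only inspect added lines, skipping the "+++ b/file" header.
    if (!line.startsWith("+") || line.startsWith("+++")) continue;
    for (const token of line.match(tokenPattern) || []) {
      if (shannonEntropy(token) >= minEntropy) hits.push(token);
    }
  }
  return hits;
}
```

A caller would pass the output of git diff as `diffText` and refuse to commit when the returned array is non-empty. A pure entropy check can flag long hashes or paths, so a real pipeline would likely need an allowlist.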

Rationale for Initial Schedule

The initial schedule is set to the bootstrap value of 0 0 * * * (Daily). This gives the repository baseline coverage while the telemetry logic waits for adequate commit data to adjust frequency up or down dynamically during active periods.
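
The actual tier thresholds live in scripts/compute_schedule.mjs, which is not shown in this conversation. As a rough illustration of the idea, the sketch below maps a recent commit count to an activity state and a cron expression. Every threshold and every cron string other than the daily bootstrap value `0 0 * * *` is an assumption, and `pickSchedule` is a hypothetical name.

```javascript
// Hypothetical tier table: thresholds and crons are illustrative assumptions.
// Only the daily bootstrap cron "0 0 * * *" comes from the PR description.
const TIERS = [
  { name: "High", minCommits: 50, cron: "0 */6 * * *" },    // every 6 hours
  { name: "Active", minCommits: 10, cron: "0 */12 * * *" }, // every 12 hours
  { name: "Standard", minCommits: 1, cron: "0 0 * * *" },   // daily (bootstrap value)
  { name: "Dormant", minCommits: 0, cron: "0 0 * * 0" },    // weekly, Sunday 00:00
];

function pickSchedule(commitCount) {
  // Treat missing or invalid counts as zero so the quietest tier is chosen.
  const count = Number.isFinite(commitCount) && commitCount > 0 ? commitCount : 0;
  // Tiers are ordered from busiest to quietest, so the first match wins.
  return TIERS.find((tier) => count >= tier.minCommits);
}
```

Keeping the tiers in a data table rather than an if/else chain makes the cadence easy to tune in review without touching the selection logic.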

(Note: See SELF_HEAL_SETUP.md for the full breakdown and reviewer instructions.)


PR created automatically by Jules for task 1263455366753982256 started by @badMade

Implements an automated self-healing CI pipeline capable of detecting code drift, resolving common formatting/snapshot/dependency issues idempotently, and proposing repairs via PR. Includes telemetry-based self-scheduling logic.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
@google-labs-jules


👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces an automated self-healing CI pipeline, including documentation, ESLint configuration, dependency updates, and scripts to compute optimal schedules, run health checks, and perform repair steps. The review feedback highlights several critical improvements: handling shallow clones in compute_schedule.mjs to prevent inaccurate commit counts, validating regex matches before updating workflow files to avoid silent failures, adding robust error fallback guards in health check logging, implementing a fail-fast mechanism for critical repair steps like npm ci, and correcting the exit code in self_heal.mjs when no code drift is detected to prevent false-alarm workflow failures.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +34 to +42
function getCommitCount(days) {
  try {
    const since = new Date(Date.now() - days * 24 * 60 * 60 * 1000).toISOString();
    const count = execSync(`git log --since="${since}" --oneline | wc -l`, { encoding: 'utf8' });
    return parseInt(count.trim(), 10) || 0;
  } catch (e) {
    return 0;
  }
}

high

In GitHub Actions, the default behavior of actions/checkout is to perform a shallow clone (fetch-depth: 1). In a shallow clone, git log will only see the single fetched commit, causing getCommitCount to return at most 1 and incorrectly classifying active repositories as 'Dormant' or 'Standard'.

Additionally, using git log ... | wc -l is inefficient and less portable across different environments (e.g., Windows development environments without wc).

We can resolve both issues by detecting shallow clones and attempting to unshallow them, and using the more efficient and portable git rev-list --count command.

function getCommitCount(days) {
  try {
    try {
      const isShallow = execSync("git rev-parse --is-shallow-repository", { encoding: "utf8" }).trim() === "true";
      if (isShallow) {
        console.log("Shallow clone detected. Attempting to unshallow for accurate telemetry...");
        execSync("git fetch --unshallow", { stdio: "ignore" });
      }
    } catch (shallowError) {
      console.warn("Warning: Could not unshallow repository history:", shallowError.message);
    }
    const since = new Date(Date.now() - days * 24 * 60 * 60 * 1000).toISOString();
    const count = execSync("git rev-list --count --since=\"" + since + "\" HEAD", { encoding: "utf8" });
    return parseInt(count.trim(), 10) || 0;
  } catch (e) {
    return 0;
  }
}

Comment on lines +24 to +32
function updateWorkflowSchedule(newSchedule) {
  const content = fs.readFileSync(WORKFLOW_PATH, 'utf8');
  // Match the cron line with the # AUTO-UPDATED marker
  const updated = content.replace(
    /cron:\s*".*?"\s*# AUTO-UPDATED/,
    `cron: "${newSchedule}" # AUTO-UPDATED`
  );
  fs.writeFileSync(WORKFLOW_PATH, updated, 'utf8');
}

medium

The updateWorkflowSchedule function uses content.replace with a regular expression to update the cron schedule. If the # AUTO-UPDATED marker is missing, formatted differently, or uses single quotes, the replacement will silently fail and write the unmodified content back, while the script still reports success. Adding a check to verify that the regex matches before writing prevents silent failures.

Suggested change
- function updateWorkflowSchedule(newSchedule) {
-   const content = fs.readFileSync(WORKFLOW_PATH, 'utf8');
-   // Match the cron line with the # AUTO-UPDATED marker
-   const updated = content.replace(
-     /cron:\s*".*?"\s*# AUTO-UPDATED/,
-     `cron: "${newSchedule}" # AUTO-UPDATED`
-   );
-   fs.writeFileSync(WORKFLOW_PATH, updated, 'utf8');
- }
+ function updateWorkflowSchedule(newSchedule) {
+   const content = fs.readFileSync(WORKFLOW_PATH, "utf8");
+   const regex = /cron:\s*".*?"\s*# AUTO-UPDATED/;
+   if (!regex.test(content)) {
+     throw new Error("Could not find the 'cron: \"...\" # AUTO-UPDATED' marker in " + WORKFLOW_PATH);
+   }
+   const updated = content.replace(regex, "cron: \"" + newSchedule + "\" # AUTO-UPDATED");
+   fs.writeFileSync(WORKFLOW_PATH, updated, "utf8");
+ }
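
As a variation on this suggestion, the replacement can be factored into a pure function that takes and returns a string, which makes the no-match case easy to test without touching the file system. The name `replaceCronSchedule` is hypothetical, and accepting single-quoted cron values is an added assumption, not something the PR does.

```javascript
// Hypothetical helper: pure string-in, string-out version of the schedule update.
// Accepts either quote style around the cron value (an assumption beyond the PR).
const CRON_LINE = /cron:\s*(["']).*?\1\s*# AUTO-UPDATED/;

function replaceCronSchedule(content, newSchedule) {
  if (!CRON_LINE.test(content)) {
    throw new Error("No 'cron: \"...\" # AUTO-UPDATED' line found in workflow content");
  }
  // A replacer function keeps any "$" in the schedule from being read as a
  // special replacement pattern.
  return content.replace(CRON_LINE, () => `cron: "${newSchedule}" # AUTO-UPDATED`);
}
```

updateWorkflowSchedule would then read the file, call this helper, and write the result, so a missing marker surfaces as a thrown error before anything is written.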

Comment thread scripts/self_heal.mjs
Comment on lines +31 to +55
function runStep(name, command) {
  log(`\n--- Running Repair Step: ${name} ---`);
  try {
    const output = execSync(command, { encoding: 'utf-8', stdio: 'pipe' });
    log(output);
  } catch (error) {
    log(`[WARN] Step ${name} encountered an error:`);
    log(error.stdout || '');
    log(error.stderr || '');
  }

  const isHealthy = runHealthCheck(POST_CHECK_LOG);
  const changed = hasDiff();

  log(`Step result: Healthy=${isHealthy}, Diff=${changed}`);

  if (isHealthy && changed) {
    log('System is healthy and repairs were made. Exiting with success.');
    process.exit(0);
  } else if (isHealthy && !changed) {
    log('System is healthy but no files were modified. Continuing to next potential repair step.');
  } else {
    log('System is still unhealthy. Proceeding to next repair step.');
  }
}

medium

If a critical setup step like npm ci fails, proceeding to run subsequent steps (such as eslint or vitest) is futile and will result in a cascade of confusing module resolution errors. Adding a required parameter to runStep allows the pipeline to fail fast and abort immediately when a critical step fails.

function runStep(name, command, required = false) {
  log("\n--- Running Repair Step: " + name + " ---");
  try {
    const output = execSync(command, { encoding: "utf-8", stdio: "pipe" });
    log(output);
  } catch (error) {
    log("[WARN] Step " + name + " encountered an error:");
    log(error.stdout || "");
    log(error.stderr || "");
    if (required) {
      log("[FATAL] Critical step " + name + " failed. Aborting self-heal pipeline.");
      process.exit(1);
    }
  }

  const isHealthy = runHealthCheck(POST_CHECK_LOG);
  const changed = hasDiff();

  log("Step result: Healthy=" + isHealthy + ", Diff=" + changed);

  if (isHealthy && changed) {
    log("System is healthy and repairs were made. Exiting with success.");
    process.exit(0);
  } else if (isHealthy && !changed) {
    log("System is healthy but no files were modified. Continuing to next potential repair step.");
  } else {
    log("System is still unhealthy. Proceeding to next repair step.");
  }
}
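
The branching in runStep can be summarized as a small decision table. The sketch below restates it as a pure function so each outcome can be checked in isolation. The name `nextAction` and the returned labels are hypothetical, and the script itself calls process.exit directly rather than returning a value.

```javascript
// Hypothetical restatement of runStep's control flow as a pure decision function.
function nextAction({ commandFailed, required, healthy, changed }) {
  // A failed required step (for example `npm ci`) aborts before any health check runs.
  if (commandFailed && required) return "abort";
  // Healthy with a diff means a repair landed, so the pipeline can stop.
  if (healthy && changed) return "done";
  // Otherwise (healthy without changes, or still unhealthy) try the next step.
  return "continue";
}
```

In this reading, "abort" corresponds to process.exit(1), "done" to process.exit(0), and "continue" to falling through to the next runStep call.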

Comment thread scripts/self_heal.mjs
}

// Step 1: Rebuild/reinstall (clean install of tooling + deps)
runStep('Reinstall dependencies', 'npm ci');

medium

Mark the dependency installation step as required so that the self-heal pipeline aborts immediately if npm ci fails, preventing subsequent steps from failing with confusing module resolution errors.

Suggested change
- runStep('Reinstall dependencies', 'npm ci');
+ runStep('Reinstall dependencies', 'npm ci', true);

Comment thread scripts/self_heal.mjs
Comment on lines +15 to +24
function runHealthCheck(logFile) {
  try {
    const output = execSync('node scripts/healthcheck.mjs', { encoding: 'utf-8', stdio: 'pipe' });
    fs.writeFileSync(logFile, output);
    return true;
  } catch (error) {
    fs.writeFileSync(logFile, error.stdout + '\n' + error.stderr);
    return false;
  }
}

medium

If execSync fails to spawn the healthcheck script or if stdout/stderr are not populated on the error object, accessing them directly can result in writing 'undefined undefined' to the log file. Adding fallback guards and logging error.message makes the error logging more robust.

function runHealthCheck(logFile) {
  try {
    const output = execSync("node scripts/healthcheck.mjs", { encoding: "utf-8", stdio: "pipe" });
    fs.writeFileSync(logFile, output);
    return true;
  } catch (error) {
    const stdout = error.stdout || "";
    const stderr = error.stderr || error.message || "";
    fs.writeFileSync(logFile, stdout + "\n" + stderr);
    return false;
  }
}
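
To make the fallback behavior concrete, the sketch below isolates the log formatting in a helper and runs it against error-like objects. `formatFailureLog` is a hypothetical name, and the plain objects stand in for whatever execSync throws; an error raised before the child process produces output may carry only a message.

```javascript
// Hypothetical helper: build log text from an execSync-style error object.
function formatFailureLog(error) {
  // String() also covers the case where stdout/stderr arrive as Buffers.
  const stdout = String(error.stdout || "");
  // Fall back to the error message when stderr is missing or empty.
  const stderr = String(error.stderr || error.message || "");
  return stdout + "\n" + stderr;
}

// Example: an error with no captured output still yields a useful log line.
console.log(JSON.stringify(formatFailureLog({ message: "spawn failed" })));
```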

Comment thread scripts/self_heal.mjs
Comment on lines +102 to +105
if (!hasDiff()) {
  log('System is healthy but no code drift was detected. Nothing to repair.');
  process.exit(1);
}

medium

If the repository is already healthy and no code drift is detected, this is a successful state. Exiting with 1 will mark the scheduled GitHub Actions workflow run as failed, triggering false alarm notifications. It should exit with 0 instead.

Suggested change
- if (!hasDiff()) {
-   log('System is healthy but no code drift was detected. Nothing to repair.');
-   process.exit(1);
- }
+ if (!hasDiff()) {
+   log("System is healthy and no code drift was detected. Nothing to repair.");
+   process.exit(0);
+ }
