[Self-Heal] Add self-scheduling auto-repair workflow #50
Implemented a comprehensive self-healing CI pipeline using GitHub Actions to automatically detect drift, recalculate scheduling based on telemetry, and perform idempotent repairs (formatting, snapshots, type stubs, and dependencies) safely.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Code Review
This pull request introduces a self-healing continuous integration and drift repair pipeline, adding scripts to compute a dynamic schedule based on repository telemetry, perform health checks, and execute automated repair steps. It also adds ESLint, Prettier, and TypeScript-ESLint to the project's devDependencies. The review comments point out several critical improvements: warning about shallow clones affecting git telemetry, ignoring untracked files in git status checks to prevent false positives, handling null values when parsing YAML, validating that the cron placeholder replacement actually occurred, adding the missing typesync package to package.json to avoid slow runtime downloads, and optimizing the repair loop by skipping health checks if a repair command fails.
// Commit count
const commitsStr = execSync(`git rev-list --count HEAD --since="${sinceDate}"`, { encoding: 'utf-8' });
commitCount = parseInt(commitsStr.trim(), 10) || 0;
In GitHub Actions, the default checkout action performs a shallow clone (fetch-depth: 1). Running git rev-list --count HEAD --since="..." on a shallow clone will only count the single checked-out commit, leading to inaccurate telemetry. This will cause the script to incorrectly assume the repository is dormant and scale back the schedule to weekly.
Consider checking if the repository is shallow and logging a warning, or fetching the history if needed. At a minimum, we should detect and warn the user so they know to configure fetch-depth: 0 in their workflow.
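If fetching is preferred over warning, the "fetch the history if needed" option could look roughly like this. It is a minimal sketch, not code from the PR: `ensureFullHistory` and its `run` parameter are hypothetical names, and `run` is assumed to execute a command and return its stdout as a string (for example a thin wrapper around `execSync` with `encoding: 'utf-8'`).

```javascript
// Sketch only: deepen a shallow clone before counting commits.
// `run` is a hypothetical helper: (command: string) => stdout string.
function ensureFullHistory(run) {
  // Prints "true" or "false" (available in Git 2.15 and later).
  const isShallow = run('git rev-parse --is-shallow-repository').trim() === 'true';
  if (!isShallow) {
    return false; // Full history is already present; nothing was fetched.
  }
  // Converts the shallow clone into a complete one.
  run('git fetch --unshallow');
  return true; // History was fetched.
}
```

Passing the command runner in as an argument keeps the function easy to exercise with a fake runner in tests. Setting fetch-depth: 0 in the workflow remains the simpler fix when the workflow file can be changed.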
// Commit count (warn if shallow clone)
let isShallow = false;
try {
  isShallow = execSync('git rev-parse --is-shallow-repository', { encoding: 'utf-8' }).trim() === 'true';
} catch (e) {}
if (isShallow) {
  console.warn('Warning: Shallow clone detected. Telemetry (commit count) will be inaccurate. Consider checking out with fetch-depth: 0.');
}
const commitsStr = execSync("git rev-list --count HEAD --since=\"" + sinceDate + "\"", { encoding: 'utf-8' });
commitCount = parseInt(commitsStr.trim(), 10) || 0;

let hasDiff = false;
try {
  const diff = execSync('git status --porcelain', { encoding: 'utf-8' });
  if (diff.trim().length > 0) {
    hasDiff = true;
  }
} catch (err) {}
git status --porcelain includes untracked files (prefixed with ??). If any untracked files (such as build artifacts, logs, or temporary files) are generated during the build or test steps and are not ignored in .gitignore, hasDiff will falsely evaluate to true. This will cause the script to prematurely exit with success and potentially create a PR with no actual tracked changes or with unwanted untracked files.
Using the -uno flag with git status --porcelain ignores untracked files, ensuring we only detect changes to tracked files.
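To make the ?? prefix concrete, the same effect can be approximated by filtering the porcelain output in code. This is only an illustrative sketch: `hasTrackedChanges` is a hypothetical helper that is not part of the PR, and it assumes the default (v1) porcelain format, where each line starts with a two-character status code.

```javascript
// Sketch only: report whether `git status --porcelain` output contains
// at least one entry that is not an untracked file.
function hasTrackedChanges(porcelainOutput) {
  return porcelainOutput
    .split('\n')
    .filter((line) => line.length > 0) // Drop the trailing empty line.
    .some((line) => !line.startsWith('??')); // "??" marks untracked paths.
}
```

Filtering in code keeps the untracked entries available for logging, whereas -uno hides them from the output entirely.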
let hasDiff = false;
try {
  const diff = execSync('git status --porcelain -uno', { encoding: 'utf-8' });
  if (diff.trim().length > 0) {
    hasDiff = true;
  }
} catch (err) {}
let currentScheduleData;
try {
  const rawYaml = await fs.readFile(SCHEDULE_FILE, 'utf8');
  currentScheduleData = yaml.load(rawYaml);
If the schedule file is empty or holds only a null document, yaml.load can return undefined or null. Subsequent property accesses (like currentScheduleData.LAST_UPDATED or currentScheduleData.SCHEDULE) will then throw a TypeError and crash the script. Assuming yaml here is js-yaml, syntactically invalid YAML throws from yaml.load itself rather than returning a value, so it is a separate case. Guard against the empty case by providing a fallback empty object.
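A slightly stricter guard than a bare fallback would also cover files that parse to a scalar or a list. The following is a sketch under that assumption; `normalizeScheduleData` is a hypothetical helper name, not something defined in the PR.

```javascript
// Sketch only: coerce whatever the YAML parser returned into a plain object.
function normalizeScheduleData(loaded) {
  const isPlainObject =
    loaded !== null && typeof loaded === 'object' && !Array.isArray(loaded);
  // null, undefined, strings, numbers and arrays all fall back to {}.
  return isPlainObject ? loaded : {};
}
```

With this in place, reading currentScheduleData.LAST_UPDATED or currentScheduleData.SCHEDULE on a malformed file yields undefined instead of throwing.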
let currentScheduleData;
try {
  const rawYaml = await fs.readFile(SCHEDULE_FILE, 'utf8');
  currentScheduleData = yaml.load(rawYaml) || {};
const updatedWorkflow = workflowContent.replace(
  /- cron: ".*" # AUTO-UPDATED/,
  `- cron: "${newCron}" # AUTO-UPDATED`
);
If the placeholder - cron: "..." # AUTO-UPDATED is missing or malformed in the workflow file, workflowContent.replace will silently do nothing. The script will then write the unmodified content back to the file, leading to a silent failure where the schedule tracker is updated but the actual GitHub Actions workflow schedule is not.
We should verify that the replacement actually occurred and throw an error if the placeholder was not found.
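One caveat when verifying by comparing the output with the input: the two strings are also identical when the new cron expression equals the one already in the file, so that check would report a missing placeholder on an unchanged schedule. A possible alternative, sketched here with a hypothetical `replaceCron` helper, is to test for the placeholder before replacing.

```javascript
// Sketch only: replace the cron placeholder, failing loudly if it is absent.
function replaceCron(workflowContent, newCron) {
  const placeholder = /- cron: ".*" # AUTO-UPDATED/;
  if (!placeholder.test(workflowContent)) {
    throw new Error('Could not find the cron placeholder "# AUTO-UPDATED".');
  }
  // A replacer function avoids special handling of "$" in the replacement.
  return workflowContent.replace(
    placeholder,
    () => `- cron: "${newCron}" # AUTO-UPDATED`
  );
}
```

Throwing instead of calling process.exit lets the caller decide how to report the failure and keeps the function unit-testable.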
const updatedWorkflow = workflowContent.replace(
  /- cron: ".*" # AUTO-UPDATED/,
  '- cron: "' + newCron + '" # AUTO-UPDATED'
);
if (updatedWorkflow === workflowContent) {
  console.error('Error: Could not find the cron placeholder \'- cron: "..." # AUTO-UPDATED\' in ' + WORKFLOW_FILE + '.');
  process.exit(1);
}

"devDependencies": {
  "@anthropic-ai/sdk": "^0.33.1",
  "@eslint/js": "^10.0.1",
typesync is executed via npx typesync in scripts/self_heal.mjs, but it is not listed in the devDependencies of package.json. This forces npx to download and install typesync from the npm registry on every single run of the self-healing script, which is slow, inefficient, and susceptible to network or registry failures.
Please add typesync to the devDependencies in package.json to ensure it is installed locally and runs deterministically.
"devDependencies": {
  "@anthropic-ai/sdk": "^0.33.1",
  "@eslint/js": "^10.0.1",
  "typesync": "^0.11.1",
for (const step of repairSteps) {
  console.log(`\n--- Attempting: ${step.name} ---`);
  runCmd(step.cmd);

  const { passed, hasDiff } = checkHealthAndDiff();
The return value of runCmd(step.cmd) is currently ignored. If a critical repair step fails (for example, npm ci fails due to network issues or lockfile conflicts), the script will still proceed to run the heavy checkHealthAndDiff() function. This runs the full build, lint, and test suites, which will almost certainly fail, wasting significant CI execution time and cluttering the logs with redundant errors.
We should check the return value of runCmd and skip the healthcheck/diff check if the command itself failed.
for (const step of repairSteps) {
  console.log("\n--- Attempting: " + step.name + " ---");
  if (!runCmd(step.cmd)) {
    console.log("Step " + step.name + " failed. Skipping healthcheck.");
    continue;
  }
  const { passed, hasDiff } = checkHealthAndDiff();
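The loop above is shown only up to the health check. For illustration, a self-contained version might look like the sketch below. Everything past the health check is an assumption, not code from the PR: `runRepairSteps` is a hypothetical wrapper, `runCmd` is assumed to return true on success, and the loop is assumed to stop at the first step that leaves the checks passing with a diff to commit.

```javascript
// Sketch only: run repair steps in order, skipping the expensive health
// check whenever the repair command itself fails.
function runRepairSteps(repairSteps, runCmd, checkHealthAndDiff, log = console.log) {
  for (const step of repairSteps) {
    log(`\n--- Attempting: ${step.name} ---`);
    if (!runCmd(step.cmd)) {
      log(`Step ${step.name} failed. Skipping healthcheck.`);
      continue;
    }
    const { passed, hasDiff } = checkHealthAndDiff();
    // Assumption: a passing check plus a diff means this step repaired the repo.
    if (passed && hasDiff) {
      return { repaired: true, step: step.name };
    }
  }
  return { repaired: false, step: null };
}
```

Returning a result object instead of exiting inside the loop leaves the decision about exit codes and PR creation to the caller.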
Adds an adaptive, self-healing continuous integration pipeline.

It introduces two GitHub Actions workflows (self-heal.yml and compute-schedule.yml) with reactive, scheduled, and manual triggers. Telemetry logic allows the proactive scheduler to analyze repository activity (commit/PR velocity) and calculate optimal cadence dynamically.

The idempotent repair pipeline addresses lockfile updates, stubs acquisition, lint fixes, and test snapshot regeneration while safeguarding against secrets commits and infinite automated PR loops. Detailed configurations, overrides, and instructions are provided in SELF_HEAL_SETUP.md.

PR created automatically by Jules for task 13008234447552265971 started by @badMade
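As a rough illustration of how activity telemetry could map to a cadence, consider the sketch below. The thresholds, cron expressions, and the `chooseCron` name are all hypothetical. The PR's actual rules are not shown on this page; only the fallback to a weekly schedule for a dormant repository is mentioned in the review.

```javascript
// Sketch only: pick a cron expression from recent repository activity.
// All thresholds and schedules here are made-up example values.
function chooseCron(commitCount, openPrCount) {
  const activity = commitCount + openPrCount;
  if (activity >= 20) {
    return '0 */6 * * *'; // Busy: every six hours.
  }
  if (activity >= 1) {
    return '0 3 * * *'; // Moderate: daily at 03:00 UTC.
  }
  return '0 3 * * 0'; // Dormant: weekly, Sunday at 03:00 UTC.
}
```

GitHub Actions evaluates schedule expressions in UTC, which is why the comments state times that way.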