3 changes: 3 additions & 0 deletions .github/self-heal-schedule.yml
@@ -0,0 +1,3 @@
SCHEDULE: "0 0 * * *" # AUTO-UPDATED
LAST_UPDATED: "2024-05-27T00:00:00.000Z"
RATIONALE: "Bootstrap schedule. Default to running daily at midnight until telemetry provides a more optimal cadence."
82 changes: 82 additions & 0 deletions .github/workflows/compute-schedule.yml
@@ -0,0 +1,82 @@
name: Compute Self-Heal Schedule

on:
schedule:
- cron: "0 0 * * *" # Runs daily to adaptively update the main workflow schedule
workflow_dispatch:

concurrency:
group: schedule-update-${{ github.ref }}
cancel-in-progress: true

permissions:
contents: write
pull-requests: write
actions: read

jobs:
compute-schedule:
runs-on: ubuntu-latest
timeout-minutes: 5
if: github.ref_name == github.event.repository.default_branch
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0

- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: "20"
cache: "npm"

- name: Install dependencies
run: npm ci

- name: Compute new schedule
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: node scripts/compute_schedule.mjs > compute.log 2>&1 || true

- name: Upload log
if: always()
uses: actions/upload-artifact@v4
with:
name: compute-schedule-log
path: compute.log

- name: Check Diff
id: check_diff
run: |
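# Only look at the files this workflow stages. compute.log is written to the
# workspace, so an unscoped status reports a diff whenever the log is not gitignored.
if [ -n "$(git status --porcelain -- .github/self-heal-schedule.yml .github/workflows/self-heal.yml)" ]; then
echo "DIFF_EXISTS=1" >> "$GITHUB_ENV"
fi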

- name: Create PR
if: env.DIFF_EXISTS == '1'
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
# Check for duplicate schedule update PRs
RECENT_PR=$(gh pr list --label self-heal-schedule --state open --json createdAt -q '.[0].createdAt')
if [ -n "$RECENT_PR" ] && [ "$RECENT_PR" != "null" ]; then
# An open schedule PR is an expected state, so skip without failing the run
echo "Open schedule PR already exists, skipping"
exit 0
fi

git add .github/self-heal-schedule.yml .github/workflows/self-heal.yml 2>/dev/null || true

BRANCH="selfheal-schedule-$(date +%s)"
git config user.name "github-actions[bot]"
git config user.email "github-actions[bot]@users.noreply.github.com"
git checkout -b "$BRANCH"
git commit -m "Update self-heal schedule cadence"

git push origin "$BRANCH"

gh pr create --title "[Self-Heal Schedule] Update cadence" \
--body "Automated PR to update the self-healing schedule based on repository telemetry." \
--label self-heal \
--label self-heal-schedule \
--label automation
143 changes: 143 additions & 0 deletions .github/workflows/self-heal.yml
@@ -0,0 +1,143 @@
name: Self-Heal Repair Pipeline

on:
schedule:
- cron: "0 0 * * *" # AUTO-UPDATED
workflow_run:
workflows: ["ci"]
types:
- completed
workflow_dispatch:

concurrency:
group: selfheal-${{ github.ref }}
cancel-in-progress: true

permissions:
contents: write
pull-requests: write
actions: read

jobs:
repair:
runs-on: ubuntu-latest
timeout-minutes: 15
if: >
!startsWith(github.ref_name, 'selfheal-') &&
github.ref_name == github.event.repository.default_branch &&
(github.event_name == 'schedule' || github.event_name == 'workflow_dispatch' || (github.event_name == 'workflow_run' && github.event.workflow_run.conclusion == 'failure' && github.event.workflow_run.head_branch == github.event.repository.default_branch))
steps:
- name: Checkout
uses: actions/checkout@v4

- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: "20"
cache: "npm"

- name: Install dependencies
run: npm ci

- name: Check for duplicate PR
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
# Cleanup stale PRs older than 7 days first
gh pr list --label self-heal --state open --search "created:<$(date -d '7 days ago' -I)" --json number -q '.[].number' | xargs -I {} gh pr close {}

RECENT_PR=$(gh pr list --label self-heal --state open --json createdAt -q '.[0].createdAt')
if [ ! -z "$RECENT_PR" ] && [ "$RECENT_PR" != "null" ]; then
echo "Duplicate PR exists, aborting"
exit 1
fi

- name: Pre-healthcheck
id: pre
run: node scripts/healthcheck.mjs > pre-check.log 2>&1 || echo "PRE_FAILED=1" >> $GITHUB_ENV

- name: Repair script
id: repair
run: node scripts/self_heal.mjs > repair.log 2>&1 || echo "REPAIR_FAILED=1" >> $GITHUB_ENV

- name: Post-healthcheck
id: post
run: node scripts/healthcheck.mjs > post-check.log 2>&1 || echo "POST_FAILED=1" >> $GITHUB_ENV

- name: Upload logs
if: always()
uses: actions/upload-artifact@v4
with:
name: selfheal-logs
path: |
pre-check.log
repair.log
post-check.log

- name: Check Diff
id: check_diff
run: |
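# Only look at the paths the Create PR step stages. The *.log files written above
# are untracked, so an unscoped status reports a diff whenever they are not gitignored.
if [ -n "$(git status --porcelain -- src tests package.json package-lock.json scripts .github/workflows/self-heal.yml .github/workflows/compute-schedule.yml .github/self-heal-schedule.yml)" ]; then
echo "DIFF_EXISTS=1" >> "$GITHUB_ENV"
fi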

- name: Create PR
if: env.POST_FAILED != '1' && env.DIFF_EXISTS == '1'
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
# Stage specific files safely, explicitly skipping restricted files
for path in src tests package.json package-lock.json scripts; do
git add "$path" 2>/dev/null || true
done

# Add .github files explicitly avoiding ci.yml
git add .github/workflows/self-heal.yml .github/workflows/compute-schedule.yml .github/self-heal-schedule.yml 2>/dev/null || true

# Heuristic secret scan before commit: inspect only the added lines of the staged diff.
# With grep -E the pipe is the alternation operator and must not be escaped.
if git diff --staged | grep '^+' | grep -v '^+++ ' | grep -i -E -q 'api[_-]?key|secret|token|password'; then
echo "Possible secret exposed, aborting"
exit 1
fi

# Create branch and commit
BRANCH="selfheal-$(date +%s)"
git config user.name "github-actions[bot]"
git config user.email "github-actions[bot]@users.noreply.github.com"
git checkout -b "$BRANCH"
git commit -m "Auto-repair drift/failure"

git push origin "$BRANCH"

if [ "${{ github.event_name }}" == "schedule" ]; then
TITLE="[Self-Heal Scheduled] Drift fixes"
elif [ "${{ github.event_name }}" == "workflow_run" ]; then
TITLE="[Self-Heal Reactive] CI fix"
else
TITLE="[Self-Heal Manual] Repair"
fi

SCHEDULE_INFO=$(cat .github/self-heal-schedule.yml)

BODY="Automated repair PR. Trigger: \`${{ github.event_name }}\`.

### Drift Summary
The repository experienced drift or CI failures which have been automatically repaired. Please review the changes for correctness.

### Self-Scheduling
Current configuration:
\`\`\`yaml
$SCHEDULE_INFO
\`\`\`

### Artifacts
The logs (pre-check.log, repair.log, post-check.log) are attached to the workflow run as the selfheal-logs artifact:
- [Workflow run](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }})"

gh pr create --title "$TITLE" \
--body "$BODY" \
--label self-heal \
--label automation
45 changes: 45 additions & 0 deletions SELF_HEAL_SETUP.md
@@ -0,0 +1,45 @@
# Self-Healing Configuration

This project is configured with a self-healing continuous integration and drift repair pipeline.

## Overview

The self-healing mechanism is designed to automatically resolve common build failures, linting issues, and test snapshot mismatches. The system generates PRs when a fix successfully passes health checks and produces a meaningful diff.

The pipeline runs based on three triggers:
1. **Scheduled:** Runs proactively based on an autonomously computed schedule (see `Self-Scheduling` below).
2. **Reactive:** Triggers automatically if the main `ci` workflow fails on the default branch.
3. **Manual:** Can be manually triggered via GitHub Actions (`workflow_dispatch`).
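
The gating applied to these three triggers can be sketched as a pair of pure functions. This is an illustrative mirror of the `if:` condition and title selection in `.github/workflows/self-heal.yml`, not code that ships in the repository: the function names and the shape of the `event` object are hypothetical stand-ins for the GitHub Actions context.

```javascript
// Hypothetical helpers; `event` is a simplified stand-in for the Actions context.
function shouldRunRepair(event) {
  const { eventName, refName, defaultBranch, ciConclusion } = event;
  if (refName.startsWith("selfheal-")) return false; // never repair a repair branch
  if (refName !== defaultBranch) return false; // only run against the default branch
  if (eventName === "schedule" || eventName === "workflow_dispatch") return true;
  // Reactive trigger: only when the `ci` workflow concluded with a failure
  return eventName === "workflow_run" && ciConclusion === "failure";
}

// Maps the trigger to the PR title prefix used by the workflow.
function prTitle(eventName) {
  if (eventName === "schedule") return "[Self-Heal Scheduled] Drift fixes";
  if (eventName === "workflow_run") return "[Self-Heal Reactive] CI fix";
  return "[Self-Heal Manual] Repair";
}
```

A successful `ci` run therefore never starts a repair, and a manual dispatch from a `selfheal-*` branch is ignored.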

## Workflow Details

- **Healthcheck (`scripts/healthcheck.mjs`)**: Verifies project health (linting, tests, build). It runs before and after the repair script. The pre-repair result is recorded for the logs only; the post-repair run must pass for a PR to be opened.
- **Repair Script (`scripts/self_heal.mjs`)**: Sequentially attempts idempotent repairs:
  1. Reinstalls dependencies (`npm ci`).
  2. Formats and lints code (`eslint --fix` and `prettier`).
  3. Updates test snapshots (`vitest -u`).
  4. Rebuilds the project.

Once the repair script has finished, a PR is opened only if the post-repair healthcheck passes and there is a git diff.
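
The flow can be sketched as follows. This is a minimal illustration, not the contents of `scripts/self_heal.mjs`: the `runRepairSteps` and `shouldOpenPr` names are hypothetical, the step runner is injected so the sketch needs no process spawning, and continuing after a failed step is an assumption. `shouldOpenPr` mirrors the `POST_FAILED` and `DIFF_EXISTS` gate on the workflow's Create PR step.

```javascript
// Step labels taken from the list above; the real commands and flags may differ.
const REPAIR_STEPS = [
  { name: "reinstall", label: "npm ci" },
  { name: "lint-fix", label: "eslint --fix" },
  { name: "format", label: "prettier" },
  { name: "snapshots", label: "vitest -u" },
  { name: "build", label: "rebuild the project" },
];

// Runs every step in order with an injected runner (hypothetical).
// Assumption: a failing step is recorded and the remaining steps still run.
function runRepairSteps(steps, runStep) {
  const results = [];
  for (const step of steps) {
    let ok = true;
    try {
      runStep(step);
    } catch (err) {
      ok = false;
    }
    results.push({ step: step.name, ok });
  }
  return results;
}

// Mirrors the workflow gate: post-repair healthcheck passed and a diff exists.
function shouldOpenPr({ postFailed, diffExists }) {
  return !postFailed && diffExists;
}
```

Because each repair is meant to be idempotent, rerunning the full sequence on an already healthy tree should leave no diff, and so no PR is opened.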

## Self-Scheduling

To prevent unnecessary runs on dormant projects or delayed runs on highly active ones, the self-healing scheduled cadence is dynamically computed:
- Handled by `.github/workflows/compute-schedule.yml` running `scripts/compute_schedule.mjs`.
- Telemetry (commits and merged PRs over the last 7 days) informs the schedule.
- High activity results in more frequent self-healing runs (e.g. every 4 hours), while low activity scales the cadence back (e.g. weekly).
- Current schedule and rationale can be found in `.github/self-heal-schedule.yml`.
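
As a rough illustration of how such telemetry might map to a cron expression, consider the sketch below. The `chooseCron` name and the thresholds (50 and 5 events per week) are assumptions made up for this example; the actual logic in `scripts/compute_schedule.mjs` is not shown here and may differ.

```javascript
// Hypothetical mapping from 7-day activity to a cron expression.
// Thresholds are illustrative assumptions, not values from the real script.
function chooseCron({ commits, mergedPrs }) {
  const activity = commits + mergedPrs;
  if (activity >= 50) return "0 */4 * * *"; // high velocity: every 4 hours
  if (activity >= 5) return "0 0 * * *"; // moderate: daily at midnight
  return "0 0 * * 0"; // low activity: weekly, Sunday at midnight
}
```

For example, `chooseCron({ commits: 2, mergedPrs: 1 })` falls into the weekly bucket, which would then be written to both `SCHEDULE` and the `cron` entry marked `# AUTO-UPDATED`.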

### Manual Override

To pin the schedule manually and keep the compute script from overwriting it:
1. Set `SCHEDULE` in `.github/self-heal-schedule.yml` to your desired cron expression.
2. Set the `LAST_UPDATED` timestamp far in the future so that the "oscillation guard" triggers and the script does not overwrite your value.
3. Update the `cron` entry in `.github/workflows/self-heal.yml` to match your chosen expression.
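
One way such a guard could work is sketched below. The guard's real implementation is not shown in this change, so the `mayRewriteSchedule` name, the 24-hour minimum interval, and the handling of unreadable timestamps are all assumptions.

```javascript
// Hypothetical oscillation guard: allow a rewrite only if enough time has
// passed since LAST_UPDATED. A future timestamp yields a negative elapsed
// time, so the schedule stays pinned until that moment has passed.
function mayRewriteSchedule(lastUpdatedIso, now, minHours = 24) {
  const last = Date.parse(lastUpdatedIso);
  if (Number.isNaN(last)) return true; // assumption: unreadable timestamp means no guard
  const elapsedHours = (now.getTime() - last) / 3_600_000;
  return elapsedHours >= minHours;
}
```

Under these assumptions, a `LAST_UPDATED` of `2999-01-01T00:00:00.000Z` blocks every automatic rewrite, which is the effect the manual override relies on.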

## Reviewer Checklist

When reviewing a PR generated by this bot (`[Self-Heal ...] ...`):
- [ ] Check the trigger reason in the title and description.
- [ ] Review the workflow artifacts (`pre-check.log`, `repair.log`, `post-check.log`) to understand what was broken and how it was fixed.
- [ ] Verify that no functionality or logic was inadvertently changed (only formatting, snapshot, stub, and similar mechanical changes are allowed).
- [ ] Merge the PR safely. Note: Auto-merge is explicitly disabled for these PRs.