Add scaffold tool for creating new health checks [2/2] #78

Open
gustcol wants to merge 7 commits into facebookresearch:main from gustcol:feature/health-check-scaffold

Contributor

@gustcol gustcol commented Mar 1, 2026

Summary

Ref: #75
Depends on: #77

  • Introduce bin/create_new_health_check.py, a scaffold tool that automates the creation of new health checks by generating all required files and registering the check across all touchpoints
  • Generated checks use the HealthCheckRuntime context manager introduced in #77 (Add HealthCheckRuntime context manager for shared boilerplate [1/2])
  • Supports --group flag for grouped checks (@click.group()) and --dry-run for previewing changes
  • Idempotent — safe to re-run without duplicating entries
  • Updates the "Adding New Health Check" documentation with a Quick Start section

Stacked PR series: [1/2] Runtime helper (#77) → [2/2] Scaffold tool (this PR)

What the tool does

$ python bin/create_new_health_check.py check_ntp_sync --dry-run
[dry-run] Would create: gcm/health_checks/checks/check_ntp_sync.py
[dry-run] Would create: gcm/tests/health_checks_tests/test_check_ntp_sync.py
[dry-run] Would create: website/docs/GCM_Health_Checks/health_checks/check-ntp-sync.md
[dry-run] Would add import to gcm/health_checks/checks/__init__.py
[dry-run] Would add __all__ entry to gcm/health_checks/checks/__init__.py
[dry-run] Would add checks.check_ntp_sync to gcm/health_checks/cli/health_checks.py
[dry-run] Would add CHECK_NTP_SYNC to gcm/schemas/health_check/health_check_name.py
[dry-run] Would add disable_check_ntp_sync to gcm/monitoring/features/.../health_checks_features.py

| Action | Target | Insert order |
| --- | --- | --- |
| Create | `gcm/health_checks/checks/check_{name}.py` | |
| Create | `gcm/tests/health_checks_tests/test_check_{name}.py` | |
| Create | `website/docs/.../check-{name}.md` | |
| Modify | `checks/__init__.py` | Alphabetical import + append `__all__` |
| Modify | `cli/health_checks.py` | Append to `list_of_checks` |
| Modify | `health_check_name.py` | Alphabetical enum entry |
| Modify | `health_checks_features.py` | Alphabetical `disable_` field |
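
The insert-order rules above (alphabetical placement, no duplicate on a second run) could be sketched roughly as follows. This is only an illustration, not the code in bin/create_new_health_check.py: the helper name `insert_sorted_line` and the sample import lines are hypothetical.

```python
from __future__ import annotations

import bisect


def insert_sorted_line(lines: list[str], new_line: str) -> bool:
    """Insert new_line into an already-sorted list of lines.

    Returns True if the line was added and False if it was already
    present, which is what makes a second run a no-op.
    """
    if new_line in lines:
        return False
    bisect.insort(lines, new_line)
    return True


# Hypothetical import block, assumed to be kept in alphabetical order.
imports = [
    "from .check_alpha import check_alpha",
    "from .check_zeta import check_zeta",
]
new_import = "from .check_ntp_sync import check_ntp_sync"

first_run = insert_sorted_line(imports, new_import)   # adds the line
second_run = insert_sorted_line(imports, new_import)  # changes nothing
```

Checking for the exact line before inserting is the simplest way to get idempotency; a real tool working on source files would likely also need to cope with formatting differences.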

Test plan

  • nox -s tests -- gcm/tests/test_create_health_check.py — 15 tests covering validation, file generation, registration, idempotency, dry-run, grouped checks
  • python bin/create_new_health_check.py check_test_example --dry-run — integration smoke test
  • nox -s lint
  • nox -s format
  • Verify Docusaurus website builds with updated docs

…plate

Extract the ~30 lines of repeated setup code (logger init, GPU node ID
detection, derived cluster resolution, TelemetryContext + OutputContext
nesting, killswitch check) into a reusable HealthCheckRuntime dataclass
context manager. This reduces per-subcommand boilerplate from ~30 lines
to ~5 lines.

The helper is purely additive — existing checks continue to work
unchanged. New checks can use `with HealthCheckRuntime(...) as rt:`
instead of manually wiring up the setup ceremony.

Includes comprehensive tests covering field initialization, killswitch
behavior, context manager nesting, GPU node ID failure handling, and
the finish() convenience method.

Refs: facebookresearch#75
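
To make the idea of a dataclass that doubles as a context manager concrete, here is a deliberately simplified sketch. It is not the HealthCheckRuntime implementation from #77: the class name `RuntimeSketch`, its fields, the `_named_context` stand-in for TelemetryContext/OutputContext, and the `finish()` signature are all assumptions made for illustration.

```python
from __future__ import annotations

import logging
from contextlib import ExitStack, contextmanager
from dataclasses import dataclass, field
from typing import Iterator, Optional


@contextmanager
def _named_context(name: str, events: list[str]) -> Iterator[None]:
    """Hypothetical stand-in for a telemetry or output context."""
    events.append(f"enter {name}")
    try:
        yield
    finally:
        events.append(f"exit {name}")


@dataclass
class RuntimeSketch:
    """Hypothetical, simplified analogue of a health-check runtime helper."""

    check_name: str
    killswitch_enabled: bool = False  # assumed to come from a feature flag
    events: list[str] = field(default_factory=list)
    logger: Optional[logging.Logger] = None
    _stack: ExitStack = field(default_factory=ExitStack, repr=False)

    def __enter__(self) -> "RuntimeSketch":
        self.logger = logging.getLogger(self.check_name)
        # Enter both contexts; ExitStack unwinds them in reverse order.
        self._stack.enter_context(_named_context("telemetry", self.events))
        self._stack.enter_context(_named_context("output", self.events))
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self._stack.close()
        return False  # never swallow exceptions from the check body

    def finish(self, exit_code: int, message: str) -> int:
        """Record the result once and hand back the exit code."""
        self.events.append(f"result {exit_code}: {message}")
        return exit_code


with RuntimeSketch("check_ntp_sync") as rt:
    if rt.killswitch_enabled:
        code = rt.finish(0, "disabled by killswitch")
    else:
        code = rt.finish(0, "ok")
```

Keeping the nested contexts in an `ExitStack` is one way to collapse several `with` statements into a single one; the actual helper may be structured differently.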

github-actions bot commented Mar 1, 2026

CI Commands

The following CI workflows run automatically on every push and pull request:

| Workflow | What it runs |
| --- | --- |
| GPU Cluster Monitoring Python CI | lint, tests, typecheck, format, deb build, pyoxidizer builds |
| Go packages CI | shelper tests, format, lint |

The following commands can be used by maintainers to trigger additional tests that require access to secrets:

| Command | Description | Requires approval? |
| --- | --- | --- |
| /metaci tests | Runs Meta internal integration tests (pytest) | Yes: a maintainer must trigger the command and approve the deployment request |
| /metaci integration tests | Same as above (alias) | Yes |

Note: Only repository maintainers (OWNER association) can trigger /metaci commands. After commenting the command, a maintainer must also navigate to the Actions tab and approve the deployment to the graph-api-access environment before the jobs will run. See the approval guidelines for what to approve or reject.

gustcol added 3 commits March 1, 2026 12:51
Apply ufmt formatting and fix mypy errors in test helper
by using explicit typed parameters instead of **kwargs dict
unpacking.

Introduce bin/create_new_health_check.py that automates the creation
of new health checks by generating all required files (check module,
test skeleton, documentation stub) and registering the check across
all touchpoints (checks/__init__.py, CLI entry point, HealthCheckName
enum, killswitch feature flag).

The tool supports single-command and grouped-command checks, has a
dry-run mode, and is idempotent (safe to re-run). Generated checks
use the new HealthCheckRuntime context manager from the previous
commit.

Also updates the "Adding New Health Check" guide with a Quick Start
section pointing to the scaffold tool.

Refs: facebookresearch#75

- Replace len() slice with constant to avoid E203 whitespace-before-colon
- Remove placeholder-less f-strings (F541)
- Remove unused StringIO import (F401)

@gustcol gustcol force-pushed the feature/health-check-scaffold branch from 158959c to 4110400 on March 1, 2026 11:58

Add type narrowing asserts for importlib spec/loader (which return
Optional types), type the dynamically-loaded scaffold module as Any
to allow attribute access, and annotate the scaffold_env fixture
return type as Iterator[Path].
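
The narrowing pattern this commit describes looks roughly like the sketch below. `importlib.util.spec_from_file_location` can return `None`, and `spec.loader` is typed as optional, so a type checker wants both ruled out before use. The function name `load_module_from_path` and the throwaway `demo_tool.py` file are hypothetical, not taken from the PR's tests.

```python
from __future__ import annotations

import importlib.util
import tempfile
from pathlib import Path
from typing import Any


def load_module_from_path(name: str, path: Path) -> Any:
    """Load a script that is not importable as a package module."""
    spec = importlib.util.spec_from_file_location(name, path)
    assert spec is not None  # narrows Optional[ModuleSpec]
    assert spec.loader is not None  # narrows Optional[Loader]
    # Typed as Any so attribute access on the dynamic module type-checks.
    module: Any = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "demo_tool.py"
    script.write_text("ANSWER = 42\n")
    demo = load_module_from_path("demo_tool", script)
    answer = demo.ANSWER
```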
```bash
python bin/generate_features.py
ufmt format gcm
```
Member

run these from create_new_health_check.py?

Contributor Author

Good idea — the scaffold tool now runs generate_features.py and ufmt format gcm automatically as a post-scaffold step (skipped in --dry-run mode). Updated the docs to remove the manual instructions.
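
A post-scaffold step of this kind might look like the sketch below. The two commands come from the docs snippet under review; everything else is an assumption: the function name `post_scaffold_sketch`, the injectable `runner` parameter (added so the example can be exercised without running anything), and the dry-run message format.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Sequence

# Commands taken from the documentation snippet discussed above.
POST_SCAFFOLD_COMMANDS: list[list[str]] = [
    ["python", "bin/generate_features.py"],
    ["ufmt", "format", "gcm"],
]


def _default_runner(cmd: Sequence[str]) -> None:
    """Run a command and raise if it exits with a non-zero status."""
    subprocess.run(list(cmd), check=True)


def post_scaffold_sketch(
    dry_run: bool,
    runner: Callable[[Sequence[str]], None] = _default_runner,
) -> list[list[str]]:
    """Run the follow-up commands, or only report them in dry-run mode."""
    if dry_run:
        for cmd in POST_SCAFFOLD_COMMANDS:
            # Hypothetical message format, modelled on the tool's output.
            print("[dry-run] Would run: " + " ".join(cmd))
        return []
    executed: list[list[str]] = []
    for cmd in POST_SCAFFOLD_COMMANDS:
        runner(cmd)
        executed.append(cmd)
    return executed


# Use a recording runner so the example does not execute real commands.
recorded: list[Sequence[str]] = []
skipped = post_scaffold_sketch(dry_run=True, runner=recorded.append)
ran = post_scaffold_sketch(dry_run=False, runner=recorded.append)
```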

gustcol added 2 commits March 1, 2026 22:02
The scaffold tool now runs generate_features.py and ufmt format gcm
as a post-scaffold step, so users no longer need to run them manually.
Updated docs to reflect the automated workflow.

- Remove unused import sys from generated templates (would fail lint)
- Remove unused check_name parameter from run_post_scaffold
- Add guard for silent no-op when CLI anchor string is not found
- Fix dry-run inconsistency in update_init __all__ block
- Remove tautological assertion in test_update_init_idempotent
- Add test for missing CLI anchor warning
- Update Quick Start docs to list all remaining manual steps
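
The anchor guard mentioned in this commit can be illustrated with a small sketch: when the text the tool expects to find is missing, it warns instead of silently doing nothing. The helper name `insert_before_anchor`, the choice of the closing `]` as the anchor, and the sample `list_of_checks` source are hypothetical; the real tool's anchor string is not shown in this PR conversation.

```python
from __future__ import annotations

import warnings


def insert_before_anchor(text: str, anchor: str, new_line: str) -> str:
    """Insert new_line just before the first line equal to anchor."""
    lines = text.splitlines()
    if new_line in lines:
        return text  # already registered, so a re-run changes nothing
    for index, line in enumerate(lines):
        if line.strip() == anchor:
            lines.insert(index, new_line)
            return "\n".join(lines) + "\n"
    # Guard: make a missing anchor visible instead of a silent no-op.
    warnings.warn(f"anchor {anchor!r} not found; nothing was inserted")
    return text


# Hypothetical excerpt of a CLI registration list.
source = "list_of_checks = [\n    checks.check_alpha,\n]\n"
entry = "    checks.check_ntp_sync,"

updated = insert_before_anchor(source, "]", entry)
again = insert_before_anchor(updated, "]", entry)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    unchanged = insert_before_anchor(source, "NOT_PRESENT", entry)
```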