@xdotli xdotli commented Dec 20, 2025

Summary

  • Created a debugging experiment that tests LLM agents' ability to identify and fix 16 deliberate configuration errors in a Linux distribution build
  • Includes broken configs (debootstrap, fstab, GRUB, chroot setup) and their fixed versions
  • Provides an automated test suite, a Docker environment, and supporting documentation

Experiment Structure

linux/fix-broken-distro/
├── README.md           # Overview with key metrics
├── EXPERIMENT.yaml     # Machine-readable metadata
├── BROKEN.md           # Detailed list of all 16 issues
├── Dockerfile          # Build environment
├── *.conf, *.sh        # Broken configurations
├── *.fixed             # Fixed versions
├── test-build.sh       # Automated validation
└── trajectories/
    └── SUMMARY.md      # Debugging process documentation
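
As a rough illustration of the layout above, the sketch below checks that every broken config has a fixed counterpart. It assumes a `<name>.fixed` naming convention, which the tree suggests but does not spell out, and the file names in the sample listing are hypothetical rather than taken from this PR.

```python
from pathlib import PurePath

# Suffixes the tree above associates with broken configurations.
BROKEN_SUFFIXES = {".conf", ".sh"}


def unpaired_configs(filenames, exclude=("test-build.sh",)):
    """Return broken config files that lack a `<name>.fixed` counterpart.

    Assumption: a fixed version is named by appending `.fixed` to the
    broken file's name. `exclude` lists scripts that are tooling, not
    broken artifacts.
    """
    names = set(filenames)
    broken = [
        n for n in names
        if PurePath(n).suffix in BROKEN_SUFFIXES and n not in exclude
    ]
    return sorted(n for n in broken if f"{n}.fixed" not in names)


# Hypothetical directory listing, for illustration only.
listing = [
    "fstab.conf", "fstab.conf.fixed",
    "grub.conf",
    "chroot-setup.sh", "chroot-setup.sh.fixed",
    "test-build.sh",
]
print(unpaired_configs(listing))  # ['grub.conf']
```

A check like this could back the "Artifacts are organized" item in the test plan, though the PR does not state how that item is verified.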

Key Metrics

| Metric          | Value                |
|-----------------|----------------------|
| Agent           | Claude Opus 4.5      |
| Issues          | 16 deliberate errors |
| Difficulty      | Hard                 |
| Estimated Steps | ~100                 |

Issue Categories

  • Critical (8): Build/boot blocking (invalid arch, typos, missing kernel)
  • High (4): Degraded functionality (missing mounts, DNS, boot params)
  • Medium/Low (4): Cosmetic (locale, timezone, swap)
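
The category counts above should add up to the 16 issues stated in the summary. A minimal sketch of that consistency check follows; the counts come from this PR, while the dictionary layout and function name are illustrative assumptions, not the schema of `EXPERIMENT.yaml`.

```python
# Counts as listed under "Issue Categories"; the structure is hypothetical.
SEVERITY_COUNTS = {"critical": 8, "high": 4, "medium/low": 4}
EXPECTED_TOTAL = 16  # "16 deliberate errors" from the Key Metrics


def check_severity_total(counts, expected_total):
    """Return (ok, total): whether the per-severity counts sum to the expected total."""
    total = sum(counts.values())
    return total == expected_total, total


ok, total = check_severity_total(SEVERITY_COUNTS, EXPECTED_TOTAL)
print(ok, total)  # True 16
```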

Test plan

  • EXPERIMENT.yaml validates
  • Artifacts are organized
  • test-build.sh runs and identifies issues
  • Documentation is complete
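
The contents of `test-build.sh` are not shown in this description. As a hedged sketch of the kind of structural check such a suite might run against an fstab file, the Python below flags entries with the wrong number of fields, a relative mount point, or non-numeric dump/pass values. The function name and the sample entries are hypothetical and are not taken from the experiment's files.

```python
def check_fstab(text):
    """Return (line_number, message) pairs for structurally invalid fstab entries."""
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # fstab entries have six fields; the last two (dump, pass) may be omitted.
        if not 4 <= len(fields) <= 6:
            problems.append((lineno, f"expected 4-6 fields, found {len(fields)}"))
            continue
        mount_point = fields[1]
        # Swap entries conventionally use "none" (or "swap") as the mount point.
        if mount_point not in ("none", "swap") and not mount_point.startswith("/"):
            problems.append(
                (lineno, f"mount point {mount_point!r} is not an absolute path")
            )
        for name, value in zip(("dump", "pass"), fields[4:]):
            if not value.isdigit():
                problems.append(
                    (lineno, f"{name} field {value!r} is not a non-negative integer")
                )
    return problems


# Hypothetical sample with two deliberate mistakes (lines 4 and 5).
SAMPLE = """\
# <device>  <mount>  <type>  <options>  <dump>  <pass>
/dev/sda1   /        ext4    defaults   0       1
/dev/sda2   none     swap    sw         0       0
/dev/sda3   home     ext4    defaults   0       2
proc        /proc    proc
"""

for lineno, message in check_fstab(SAMPLE):
    print(f"line {lineno}: {message}")
```

This only validates syntax. Whether a referenced device or filesystem type actually exists is a runtime question that a static check like this cannot answer.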

This experiment tests whether LLM agents can identify and fix broken Linux
distribution build configurations. It contains 16 deliberate errors across
architecture settings, package dependencies, filesystem configuration, and
bootloader setup.

Key components:
- Broken configuration files with 16 common issues
- Fixed versions demonstrating correct configuration
- Automated test suite for validation
- Complete debugging process documentation
- Docker environment for isolated testing

Issues span critical (build- or boot-blocking), high (degraded
functionality), and medium/low (cosmetic) severity levels. The experiment
tests multi-file reasoning, understanding of build-time vs. runtime errors,
and systematic debugging methodology.

Completion is estimated at roughly 100 steps and requires deep Linux system
knowledge.