test(skills): replace committed PPTX binary fixture with programmatic generation #1135

@WilliamBerryiii

Summary

PR #1128 (addressing #1057) introduces a committed binary .pptx file as a test fixture for the PowerPoint extraction integration tests. While this satisfies the integration testing goal, a committed binary fixture is an opaque blob that cannot be meaningfully reviewed in a pull request diff. Binary PPTX files (which are ZIP archives containing XML and embedded media) can carry macros, OLE objects, embedded executables, or other content that is invisible during code review.

This issue proposes replacing the committed minimal_test_fixture.pptx with a Python script that programmatically generates the fixture at test time using python-pptx. This eliminates the opaque binary from the repository, makes the fixture content fully auditable in code review, and creates the opportunity for a true roundtrip test: generate → extract → validate.

Problem

Security: Opaque binary in the repository

The committed .pptx at tests/fixtures/minimal_test_fixture.pptx is a binary blob. The PR metadata indicates author "ChatGPT", meaning it was generated externally. Reviewers cannot verify:

  • Whether the file contains macros (vbaProject.bin)
  • Whether OLE objects or embedded executables are present
  • Whether the XML inside the ZIP is well-formed or contains unexpected elements
  • Whether the file will remain safe after future modifications — any binary replacement is equally opaque

The PPTX format (OOXML) stores everything in a ZIP container with XML parts. A determined attacker could embed malicious content in custom XML parts, relationship targets, or legacy embedding areas that python-pptx would silently pass through.
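Some of these questions can be answered mechanically by reading the ZIP container, even before the binary is removed. The sketch below is a hypothetical helper (audit_pptx is not part of the skill) that uses only the standard library to flag parts a reviewer would want to inspect: a macro project (vbaProject.bin), embedded OLE or package parts under embeddings/, ActiveX parts, custom XML parts, and relationships with TargetMode="External". It only surfaces candidates for review; it cannot prove a file is safe.

```python
import xml.etree.ElementTree as ET
import zipfile

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def audit_pptx(source) -> list[str]:
    """Return human-readable findings for parts that deserve manual review.

    `source` is a path or a binary file-like object holding a .pptx/.pptm.
    """
    findings: list[str] = []
    with zipfile.ZipFile(source) as zf:
        for name in zf.namelist():
            lower = name.lower()
            if lower.endswith("vbaproject.bin"):
                findings.append(f"macro project: {name}")
            elif "/embeddings/" in lower:
                findings.append(f"embedded object: {name}")
            elif "/activex/" in lower:
                findings.append(f"ActiveX part: {name}")
            elif lower.startswith("customxml/"):
                findings.append(f"custom XML part: {name}")
            if lower.endswith(".rels"):
                # External targets (hyperlinks, linked OLE sources) point outside the package
                for rel in ET.fromstring(zf.read(name)).iter(f"{REL_NS}Relationship"):
                    if rel.get("TargetMode") == "External":
                        findings.append(f"external target in {name}: {rel.get('Target')}")
    return findings
```

In a test this could back an `assert audit_pptx(path) == []`, although ordinary hyperlinks are also external relationships and may need an allowlist.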

Maintainability: Silent fixture drift

When the extraction pipeline evolves (new element types, changed YAML schema, new metadata fields), the binary fixture cannot be updated through normal code review. A contributor must regenerate it externally, commit a new binary, and reviewers are once again unable to verify the change.

Auditability: No single source of truth

The test expectations (EXPECTED_FIXTURE dict in test_extract_content_integration.py) and the fixture file are maintained independently. If they diverge, the failure message identifies what broke but not why — because the fixture's ground truth is locked inside a binary.

Proposed Solution

1. Create a fixture generator module

Add a tests/fixtures/generate_fixture.py (or a conftest.py fixture function) that builds the minimal test PPTX programmatically using python-pptx.

The existing conftest.py already demonstrates this pattern — it creates presentations, slides, textboxes, shapes, and images in-memory using make_blank_presentation(), make_blank_slide(), and _minimal_png_bytes(). The fixture generator extends this pattern to cover the specific integration test requirements.

The fixture must reproduce the content described in EXPECTED_FIXTURE:

# Fixture requirements derived from PR #1128 test expectations:
{
    "metadata": {
        "title": "Minimal Test Fixture",
        "author": "<appropriate author>",
    },
    "slides": {
        1: {
            "layout": "Title Slide",
            "speaker_notes": "This is a speaker note for slide 1.",
            "texts": [
                "Test Fixture Presentation",
                "Slide with theme colors and notes",
            ],
            # Font color #0066CC applied to title text
        },
        2: {
            "layout": "Title and Content",
            "texts": ["Slide with Image", "Below is an embedded image."],
            "element_types": ["textbox", "textbox", "image"],
            # Embedded PNG image at images/image-01.png
        },
    },
    "theme_colors": {
        "dark_1": "#000000",
        "accent_1": "#4F81BD",
    },
}

Key implementation details:

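  • Theme colors: the theme is not part of presentation.slide_masters[0].element. It lives in a separate package part (ppt/theme/themeN.xml) related to the slide master, reachable via presentation.slide_masters[0].part.part_related_by(RT.THEME). The generator must set dark_1 (dk1) and accent_1 (accent1) to specific hex values in that theme's clrScheme so extract_content.py --resolve-themes produces deterministic results.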
  • Font color from theme: Slide 1's title font uses #0066CC, which in the original fixture appears to be a theme-resolved color. The generator should either set this as a direct RGB color or set a theme color reference that resolves to #0066CC.
  • Embedded image: Use the existing _minimal_png_bytes() helper from conftest.py to generate a valid PNG in-memory, then embed it via slide.shapes.add_picture().
  • Speaker notes: python-pptx supports setting notes via slide.notes_slide.notes_text_frame.text; accessing slide.notes_slide creates the notes slide if it does not exist yet.
  • Metadata: Set via presentation.core_properties.title and .author.

2. Wire the generator into pytest fixtures

Replace the current conftest.py session-scoped fixture:

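# Imports used below (the generator's module path is an assumption;
# adjust to the repo's test layout):
from pathlib import Path

import pytest

from tests.fixtures.generate_fixture import generate_minimal_fixture

# Current (binary file reference):
@pytest.fixture(scope="session")
def minimal_test_fixture_path(powerpoint_fixture_dir: Path) -> Path:
    return powerpoint_fixture_dir / "minimal_test_fixture.pptx"

# Proposed (programmatic generation):
@pytest.fixture(scope="session")
def minimal_test_fixture_path(tmp_path_factory: pytest.TempPathFactory) -> Path:
    # mktemp() returns a fresh, uniquely named directory under pytest's basetemp
    fixture_dir = tmp_path_factory.mktemp("fixtures")
    pptx_path = fixture_dir / "minimal_test_fixture.pptx"
    generate_minimal_fixture(pptx_path)
    return pptx_path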

Using scope="session" ensures the fixture is generated once per test session, not per test function.

3. Delete the committed binary

Remove tests/fixtures/minimal_test_fixture.pptx from the repository. If the tests/fixtures/ directory is no longer needed, remove it as well.
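To keep a binary from creeping back in after it is deleted, a small guard test could fail whenever a PowerPoint container appears under the tests tree. This is a hypothetical addition (FORBIDDEN_SUFFIXES, find_office_binaries and the test name are illustrative). It checks files on disk rather than what git tracks, which is enough here because the session fixture writes into pytest's temporary directory, outside the repository by default.

```python
from pathlib import Path

# Hypothetical policy: PowerPoint container formats that must not be committed
FORBIDDEN_SUFFIXES = frozenset({".pptx", ".pptm", ".potx", ".potm", ".ppsx", ".ppsm"})


def find_office_binaries(root: Path) -> list[Path]:
    """Return every file under `root` whose suffix is a forbidden Office format."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in FORBIDDEN_SUFFIXES
    )


def test_no_committed_pptx_fixtures() -> None:
    # Assumes this test module lives directly in the tests/ directory
    tests_root = Path(__file__).resolve().parent
    offenders = find_office_binaries(tests_root)
    assert not offenders, f"Generate fixtures in code instead of committing: {offenders}"
```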

4. Validate the generated fixture (roundtrip test)

The real payoff: once the fixture is generated programmatically, add an assertion step that runs the generated PPTX through validate_deck.py to confirm it meets structural requirements. This creates a closed-loop guarantee:

  1. Generate the deck with python-pptx (known inputs)
  2. Extract with extract_content.py (existing integration tests)
  3. Validate with validate_deck.py (structural correctness)

If any step fails, the test failure points to auditable Python code rather than an opaque binary.
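The closed loop can be expressed as a small harness that runs the three steps in order and labels any failure with the step that produced it. This is a sketch under assumptions: run_roundtrip and RoundtripStageError are hypothetical names, and the extract and validate callables stand in for however extract_content.py and validate_deck.py are actually invoked (their Python entry points and CLI flags beyond --resolve-themes are not specified in this issue).

```python
from pathlib import Path
from typing import Any, Callable


class RoundtripStageError(AssertionError):
    """Assertion failure that names the roundtrip stage that broke."""


def _run_stage(name: str, fn: Callable[..., Any], *args: Any) -> Any:
    try:
        return fn(*args)
    except Exception as exc:  # re-raise with the stage name attached
        raise RoundtripStageError(f"{name} stage failed: {exc!r}") from exc


def run_roundtrip(
    pptx_path: Path,
    generate: Callable[[Path], Any],
    extract: Callable[[Path], dict],
    validate: Callable[[Path], list],
    expected: dict,
) -> dict:
    """Generate -> extract -> validate, comparing extraction to `expected`."""
    _run_stage("generate", generate, pptx_path)
    extracted = _run_stage("extract", extract, pptx_path)
    diffs = {k: (extracted.get(k), v) for k, v in expected.items() if extracted.get(k) != v}
    if diffs:
        raise RoundtripStageError(f"extract stage mismatch (got, expected): {diffs}")
    problems = _run_stage("validate", validate, pptx_path)
    if problems:
        raise RoundtripStageError(f"validate stage reported: {problems}")
    return extracted
```

In the real test, generate would be the fixture generator, extract would run extract_content.py (for example via subprocess with sys.executable) and parse its output, and expected would come from EXPECTED_FIXTURE.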

5. Consider a builder helper for future fixtures

If additional integration test scenarios are needed later (e.g., charts, tables, grouped shapes, freeform paths), consider creating a small builder helper that encapsulates common patterns:

from pathlib import Path

from pptx import Presentation


class FixtureBuilder:
    """Builds minimal PPTX fixtures for integration tests."""

    def __init__(self, title: str, author: str):
        self.prs = Presentation()
        self.prs.core_properties.title = title
        self.prs.core_properties.author = author

    def add_title_slide(self, title: str, subtitle: str, **kwargs): ...
    def add_content_slide(self, title: str, body: str, **kwargs): ...
    def set_theme_colors(self, **colors): ...
    def save(self, path: Path): ...

This is optional and should only be added if multiple fixtures are needed. Don't over-engineer for a single fixture.

Implementation Notes

python-pptx capabilities already in use

The skill's build_deck.py (550+ lines) already demonstrates advanced programmatic PPTX construction: shapes, fills, gradients, connectors, grouped elements, charts, tables, rich text, and content-extra scripts. The test conftest.py creates presentations, slides, textboxes, shapes, and images in-memory. All required python-pptx APIs are already exercised in the codebase.

Theme color manipulation

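Setting theme colors requires direct XML manipulation of the theme, the only part python-pptx doesn't expose through a high-level API. The theme is not embedded in the slide master's own XML (searching slide_masters[0].element for a:theme finds nothing); it is a separate part (ppt/theme/themeN.xml) related to the master. The pattern looks like:

from lxml import etree
from pptx.opc.constants import RELATIONSHIP_TYPE as RT

A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
# The theme is its own package part, related to the slide master
theme_part = prs.slide_masters[0].part.part_related_by(RT.THEME)
theme = etree.fromstring(theme_part.blob)
clr_scheme = theme.find(f"{A}themeElements/{A}clrScheme")
# Modify clrScheme children for dk1, accent1, etc., then persist the re-serialized XML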

Reference pptx_colors.py in the skill for how theme colors are resolved during extraction — the generator must produce XML that this resolver handles correctly.
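Because there is no high-level API for the theme, one predictable alternative to in-memory editing is to patch the saved file: locate the master's theme through its relationships part and rewrite the clrScheme slots. The sketch below is a hypothetical standalone helper (not the FixtureBuilder method of the same name) using only the standard library, and it assumes a single slide master at ppt/slideMasters/slideMaster1.xml. Each slot's child is replaced with an srgbClr because Office themes commonly define dk1 as a system color (sysClr), whose resolved value may depend on the environment. ElementTree renames namespace prefixes it does not know, so the sketch registers the common ones; lxml (already a python-pptx dependency) preserves prefixes, which matters if the theme carries mc:Ignorable-style attributes.

```python
import io
import posixpath
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path

A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
THEME_REL_TYPE = R_NS + "/theme"
MASTER_RELS = "ppt/slideMasters/_rels/slideMaster1.xml.rels"  # assumes one master

# Keep familiar prefixes when ElementTree re-serializes the theme
ET.register_namespace("a", A_NS)
ET.register_namespace("r", R_NS)


def _master_theme_partname(zf: zipfile.ZipFile) -> str:
    """Resolve the slide master's theme part name from its .rels file."""
    rels = ET.fromstring(zf.read(MASTER_RELS))
    for rel in rels.iter(f"{{{PKG_REL_NS}}}Relationship"):
        if rel.get("Type") == THEME_REL_TYPE:
            # Targets are relative to ppt/slideMasters/, e.g. "../theme/theme1.xml"
            return posixpath.normpath(posixpath.join("ppt/slideMasters", rel.get("Target")))
    raise KeyError("slide master has no theme relationship")


def set_theme_colors(pptx_path: Path, colors: dict[str, str]) -> None:
    """Set clrScheme slots in place, e.g. {"dk1": "#000000", "accent1": "#4F81BD"}."""
    pptx_path = Path(pptx_path)
    buffer = io.BytesIO()
    with zipfile.ZipFile(pptx_path) as src:
        theme_name = _master_theme_partname(src)
        theme = ET.fromstring(src.read(theme_name))
        scheme = theme.find(f"{{{A_NS}}}themeElements/{{{A_NS}}}clrScheme")
        for slot, hex_value in colors.items():
            slot_el = scheme.find(f"{{{A_NS}}}{slot}")
            if slot_el is None:
                raise KeyError(f"clrScheme has no <a:{slot}> element")
            for child in list(slot_el):  # drop sysClr/srgbClr/etc.
                slot_el.remove(child)
            ET.SubElement(slot_el, f"{{{A_NS}}}srgbClr", val=hex_value.lstrip("#").upper())
        patched = ET.tostring(theme, encoding="UTF-8", xml_declaration=True)
        with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as dst:
            for info in src.infolist():  # copy every part, swapping in the theme
                data = patched if info.filename == theme_name else src.read(info.filename)
                dst.writestr(info.filename, data)
    pptx_path.write_bytes(buffer.getvalue())
```

For the fixture this would run right after prs.save(...), e.g. set_theme_colors(pptx_path, {"dk1": "#000000", "accent1": "#4F81BD"}), mirroring the theme_colors entries in EXPECTED_FIXTURE. The resulting XML should still be checked against pptx_colors.py.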

Existing test patterns to follow

  • conftest.py → make_blank_presentation(), make_blank_slide(), _minimal_png_bytes()
  • test_build_deck.py → creates shapes, fills, text, images via python-pptx in unit tests
  • test_extract_content.py → creates presentation elements and validates extraction output

Related Issues

Acceptance Criteria

  • A Python module or fixture function generates minimal_test_fixture.pptx programmatically using python-pptx
  • The generator produces a PPTX that passes all existing integration tests in test_extract_content_integration.py without modifying test expectations
  • The committed binary tests/fixtures/minimal_test_fixture.pptx is removed from the repository
  • The generated fixture is validated by validate_deck.py as part of the test run (roundtrip integrity)
  • Theme color resolution (--resolve-themes) produces the same deterministic results as the current binary fixture
  • All existing tests continue to pass (npm run test:py)
  • The fixture generation code is fully reviewable in standard code review (no opaque binaries)
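One way to make "deterministic" checkable for the generator itself is to generate the deck twice and compare part contents rather than raw file bytes. zipfile records a modification timestamp per entry, and python-pptx may also write fresh timestamps into docProps/core.xml, so byte-for-byte equality can fail even when every slide and theme part is identical. part_digests and VOLATILE_PARTS are hypothetical names, and the ignore list is an assumption to adjust after inspecting real output.

```python
import hashlib
import zipfile

# Assumption: parts whose content can legitimately differ between two saves
VOLATILE_PARTS = frozenset({"docProps/core.xml"})


def part_digests(pptx_file, ignore=VOLATILE_PARTS) -> dict[str, str]:
    """Map each package part name to the SHA-256 of its decompressed bytes."""
    with zipfile.ZipFile(pptx_file) as zf:
        return {
            info.filename: hashlib.sha256(zf.read(info.filename)).hexdigest()
            for info in zf.infolist()
            if info.filename not in ignore
        }
```

A test would generate the fixture into two temporary paths and assert part_digests(first) == part_digests(second).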

Metadata

Assignees: none
Labels: enhancement (New feature or request), skills (Copilot skill packages (SKILL.md))
Projects: none
Relationships: none yet
Development: no branches or pull requests