Summary
PR #1128 (addressing #1057) introduces a committed binary .pptx file as a test fixture for the PowerPoint extraction integration tests. While this satisfies the integration testing goal, a committed binary fixture is an opaque blob that cannot be meaningfully reviewed in a pull request diff. Binary PPTX files (which are ZIP archives containing XML and embedded media) can carry macros, OLE objects, embedded executables, or other content that is invisible during code review.
This issue proposes replacing the committed minimal_test_fixture.pptx with a Python script that programmatically generates the fixture at test time using python-pptx. This eliminates the opaque binary from the repository, makes the fixture content fully auditable in code review, and creates the opportunity for a true roundtrip test: generate → extract → validate.
Problem
Security: Opaque binary in the repository
The committed .pptx at tests/fixtures/minimal_test_fixture.pptx is a binary blob. The PR metadata indicates author "ChatGPT", meaning it was generated externally. Reviewers cannot verify:
- Whether the file contains macros (vbaProject.bin)
- Whether OLE objects or embedded executables are present
- Whether the XML inside the ZIP is well-formed or contains unexpected elements
- Whether the file will remain safe after future modifications — any binary replacement is equally opaque
The PPTX format (OOXML) stores everything in a ZIP container with XML parts. A determined attacker could embed malicious content in custom XML parts, relationship targets, or legacy embedding areas that python-pptx would silently pass through.
Maintainability: Silent fixture drift
When the extraction pipeline evolves (new element types, changed YAML schema, new metadata fields), the binary fixture cannot be updated through normal code review. A contributor must regenerate it externally, commit a new binary, and reviewers are once again unable to verify the change.
Auditability: No single source of truth
The test expectations (EXPECTED_FIXTURE dict in test_extract_content_integration.py) and the fixture file are maintained independently. If they diverge, the failure message identifies what broke but not why — because the fixture's ground truth is locked inside a binary.
Proposed Solution
1. Create a fixture generator module
Add a tests/fixtures/generate_fixture.py (or a conftest.py fixture function) that builds the minimal test PPTX programmatically using python-pptx.
The existing conftest.py already demonstrates this pattern — it creates presentations, slides, textboxes, shapes, and images in-memory using make_blank_presentation(), make_blank_slide(), and _minimal_png_bytes(). The fixture generator extends this pattern to cover the specific integration test requirements.
The fixture must reproduce the content described in EXPECTED_FIXTURE:
# Fixture requirements derived from PR #1128 test expectations:
{
    "metadata": {
        "title": "Minimal Test Fixture",
        "author": "<appropriate author>",
    },
    "slides": {
        1: {
            "layout": "Title Slide",
            "speaker_notes": "This is a speaker note for slide 1.",
            "texts": [
                "Test Fixture Presentation",
                "Slide with theme colors and notes",
            ],
            # Font color #0066CC applied to title text
        },
        2: {
            "layout": "Title and Content",
            "texts": ["Slide with Image", "Below is an embedded image."],
            "element_types": ["textbox", "textbox", "image"],
            # Embedded PNG image at images/image-01.png
        },
    },
    "theme_colors": {
        "dark_1": "#000000",
        "accent_1": "#4F81BD",
    },
}
Key implementation details:
- Theme colors: python-pptx has no high-level API for the theme. The theme XML lives in a separate package part related to the slide master (reachable through presentation.slide_masters[0].part), not inside presentation.slide_masters[0].element. The generator must set dark_1 (dk1) and accent_1 (accent1) to specific hex values in the theme XML so extract_content.py --resolve-themes produces deterministic results.
- Font color from theme: Slide 1's title font uses #0066CC, which in the original fixture appears to be a theme-resolved color. The generator should either set this as a direct RGB color or set a theme color reference that resolves to #0066CC.
- Embedded image: Use the existing _minimal_png_bytes() helper from conftest.py to generate a valid PNG in-memory, then embed it via slide.shapes.add_picture().
- Speaker notes: python-pptx supports setting notes via slide.notes_slide.notes_text_frame.
- Metadata: Set via presentation.core_properties.title and .author.
2. Wire the generator into pytest fixtures
Replace the current conftest.py session-scoped fixture:
# Current (binary file reference):
@pytest.fixture(scope="session")
def minimal_test_fixture_path(powerpoint_fixture_dir: Path) -> Path:
    return powerpoint_fixture_dir / "minimal_test_fixture.pptx"

# Proposed (programmatic generation):
@pytest.fixture(scope="session")
def minimal_test_fixture_path(tmp_path_factory: pytest.TempPathFactory) -> Path:
    fixture_dir = tmp_path_factory.mktemp("fixtures")
    pptx_path = fixture_dir / "minimal_test_fixture.pptx"
    generate_minimal_fixture(pptx_path)
    return pptx_path
Using scope="session" ensures the fixture is generated once per test session, not per test function.
3. Delete the committed binary
Remove tests/fixtures/minimal_test_fixture.pptx from the repository. If the tests/fixtures/ directory is no longer needed, remove it as well.
4. Validate the generated fixture (roundtrip test)
The real payoff: once the fixture is generated programmatically, add an assertion step that runs the generated PPTX through validate_deck.py to confirm it meets structural requirements. This creates a closed-loop guarantee:
- Generate the deck with python-pptx (known inputs)
- Extract with extract_content.py (existing integration tests)
- Validate with validate_deck.py (structural correctness)
If any step fails, the test failure points to auditable Python code rather than an opaque binary.
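Since the generator's inputs are known, the output of the extract step can be compared with them key by key, so a failure names the exact field that drifted. The sketch below illustrates that comparison on plain nested dicts. diff_paths is a hypothetical helper, not something in the repository, and the "actual" dict stands in for whatever extract_content.py really produces.

```python
from typing import Any


def diff_paths(expected: Any, actual: Any, prefix: str = "") -> list[str]:
    """Return dotted key paths where `actual` is missing or differs from `expected`.

    Only keys present in `expected` are checked, so extra keys in `actual`
    (for example, new metadata fields) do not count as failures.
    """
    if isinstance(expected, dict) and isinstance(actual, dict):
        problems: list[str] = []
        for key, value in expected.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            if key not in actual:
                problems.append(f"{path}: missing")
            else:
                problems.extend(diff_paths(value, actual[key], path))
        return problems
    if expected == actual:
        return []
    label = prefix or "<root>"
    return [f"{label}: expected {expected!r}, got {actual!r}"]


if __name__ == "__main__":
    expected = {
        "metadata": {"title": "Minimal Test Fixture"},
        "slides": {1: {"layout": "Title Slide"}},
    }
    # Stand-in for extractor output; a real test would load the extracted data.
    actual = {
        "metadata": {"title": "Minimal Test Fixture", "extra": "ignored"},
        "slides": {1: {"layout": "Blank"}},
    }
    for problem in diff_paths(expected, actual):
        print(problem)
```

In a test, asserting that diff_paths(...) returns an empty list gives a failure message listing every mismatched path at once.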
5. Consider a builder helper for future fixtures
If additional integration test scenarios are needed later (e.g., charts, tables, grouped shapes, freeform paths), consider creating a small builder helper that encapsulates common patterns:
from pathlib import Path

from pptx import Presentation


class FixtureBuilder:
    """Builds minimal PPTX fixtures for integration tests."""

    def __init__(self, title: str, author: str):
        self.prs = Presentation()
        self.prs.core_properties.title = title
        self.prs.core_properties.author = author

    def add_title_slide(self, title: str, subtitle: str, **kwargs): ...
    def add_content_slide(self, title: str, body: str, **kwargs): ...
    def set_theme_colors(self, **colors): ...
    def save(self, path: Path): ...
This is optional and should only be added if multiple fixtures are needed. Don't over-engineer for a single fixture.
Implementation Notes
python-pptx capabilities already in use
The skill's build_deck.py (550+ lines) already demonstrates advanced programmatic PPTX construction: shapes, fills, gradients, connectors, grouped elements, charts, tables, rich text, and content-extra scripts. The test conftest.py creates presentations, slides, textboxes, shapes, and images in-memory. All required python-pptx APIs are already exercised in the codebase.
Theme color manipulation
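Setting theme colors requires direct XML manipulation of the theme part, which is a separate package part related to the slide master rather than a child of the slide master's XML. This is the only part that python-pptx doesn't expose through a high-level API. The pattern looks like:
from lxml import etree
from pptx.opc.constants import RELATIONSHIP_TYPE as RT

A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
# The theme is its own part; slide_masters[0].element contains no theme element.
theme_part = prs.slide_masters[0].part.part_related_by(RT.THEME)
theme = etree.fromstring(theme_part.blob)
clr_scheme = theme.find(f".//{{{A_NS}}}clrScheme")
# Modify clrScheme children (dk1, accent1, etc.), then serialize the XML back into the part.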
Reference pptx_colors.py in the skill for how theme colors are resolved during extraction — the generator must produce XML that this resolver handles correctly.
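The clrScheme edit itself can be sketched on raw theme XML bytes, which keeps the example runnable without a deck. set_scheme_color and SAMPLE_THEME are hypothetical names, and the standard-library ElementTree stands in for lxml here. Reading the bytes from the theme part and writing them back is deliberately left out: python-pptx documents no setter for that, so whichever approach the generator takes should be verified against the installed version.

```python
import xml.etree.ElementTree as ET

A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
ET.register_namespace("a", A_NS)  # keep the conventional "a:" prefix on output

# Trimmed-down theme used only for this demo; a real theme has many more elements.
SAMPLE_THEME = (
    f'<a:theme xmlns:a="{A_NS}" name="Sample"><a:themeElements>'
    '<a:clrScheme name="Sample">'
    '<a:dk1><a:sysClr val="windowText" lastClr="000000"/></a:dk1>'
    '<a:accent1><a:srgbClr val="1F497D"/></a:accent1>'
    "</a:clrScheme></a:themeElements></a:theme>"
).encode("utf-8")


def set_scheme_color(theme_xml: bytes, slot: str, hex_rgb: str) -> bytes:
    """Replace the color of one clrScheme slot (e.g. 'dk1', 'accent1') with an srgbClr."""
    root = ET.fromstring(theme_xml)
    slot_el = root.find(f".//{{{A_NS}}}clrScheme/{{{A_NS}}}{slot}")
    if slot_el is None:
        raise KeyError(f"no clrScheme slot named {slot!r}")
    for child in list(slot_el):
        slot_el.remove(child)  # drop the existing sysClr or srgbClr child
    ET.SubElement(slot_el, f"{{{A_NS}}}srgbClr", {"val": hex_rgb.lstrip("#").upper()})
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    updated = set_scheme_color(SAMPLE_THEME, "accent1", "#4f81bd")
    print(updated.decode("utf-8"))
```

Replacing a sysClr child with an explicit srgbClr pins the slot to a fixed value, which is one way to avoid results that depend on how a resolver treats system colors.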
Existing test patterns to follow
conftest.py → make_blank_presentation(), make_blank_slide(), _minimal_png_bytes()
test_build_deck.py → creates shapes, fills, text, images via python-pptx in unit tests
test_extract_content.py → creates presentation elements and validates extraction output
Related Issues
- … .pptx fixture
Acceptance Criteria
- A generator creates minimal_test_fixture.pptx programmatically using python-pptx
- All tests pass in test_extract_content_integration.py without modifying test expectations
- tests/fixtures/minimal_test_fixture.pptx is removed from the repository
- The generated fixture passes validate_deck.py as part of the test run (roundtrip integrity)
- Theme color resolution (--resolve-themes) produces the same deterministic results as the current binary fixture
- All Python tests pass (npm run test:py)