feat(decompile): three XML deep-read fixes (chart-axis + alpha + font inheritance) #23
Merged
`_text_runs` defaulted any paragraph whose run/`<a:pPr>`/`<a:lstStyle>` all omitted explicit `sz` to a hardcoded 18pt. PowerPoint's actual cascade reaches into the slide layout and slide master:

1. slide-level run `<a:rPr sz="...">` (already honoured)
2. paragraph `<a:pPr><a:defRPr sz="...">` (already honoured)
3. slide's `<a:txBody><a:lstStyle>...defRPr sz` (already honoured)
4. **layout's placeholder `<p:txBody><a:lstStyle>...defRPr sz`** ← new
5. **master's `<p:txStyles><p:titleStyle|bodyStyle>...defRPr sz`** ← new
6. hardcoded 18pt fallback

Steps 4 and 5 are the layout-inheritance cascade. On a typical chapter-divider layout the slide-level placeholder writes only `<a:t>Chapter title</a:t>` and inherits the 60pt+ headline size from the layout's matching placeholder. Without the lookup we render at 18pt, which is visibly too small.

New helper `_layout_placeholder_default_sz(slide, ph_type, ph_idx)` walks the layout (then the master's `<p:txStyles>`) for the matching placeholder. `_text_runs` gains an `inherited_default_sz` kwarg fed by `_emit_sp` whenever the shape carries `ph_type`/`ph_idx`.

The cascade priority remains correct: slide-level lstStyle still wins over inherited; inherited only fires when no slide-level default exists.

End-to-end on the 99-slide showcase, plus stacks cleanly with the chart-axis (PR #21) and alpha-fills (PR #22) improvements:

    mean struct_diff: 8.21% → 6.71% (91.79% → 93.29% quality)
    slides above 15% threshold: 13 → 5

Signed-off-by: Mike Mueller <mike@objektarium.de>
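The six-step cascade above can be sketched as a first-match lookup. This is a minimal illustration, not the decompiler's code: `resolve_font_size` and its parameters are hypothetical names, and each argument stands for the `sz` value already extracted at that level (or `None` when the level omits it). OOXML stores `sz` in hundredths of a point, so 18pt is `1800`.

```python
from typing import Optional

FALLBACK_SZ = 1800  # 18pt, expressed in hundredths of a point


def resolve_font_size(
    run_sz: Optional[int],
    para_sz: Optional[int],
    slide_lst_sz: Optional[int],
    layout_sz: Optional[int],
    master_sz: Optional[int],
) -> int:
    """Return the first explicit size in cascade order, else the fallback."""
    # Order matters: slide-level sources win over inherited layout/master sizes.
    for candidate in (run_sz, para_sz, slide_lst_sz, layout_sz, master_sz):
        if candidate is not None:
            return candidate
    return FALLBACK_SZ


# A chapter-divider title with no slide-level size inherits the layout's 60pt.
title_sz = resolve_font_size(None, None, None, 6000, 4400)
```

Keeping the fallback as the last step means the inherited sizes only fire when no slide-level default exists, which is the priority the commit describes.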
PPTX colour elements can carry `<a:alpha val="N"/>` where N is 0..100000 encoding 0..100% opacity. Source decks use this for Venn diagrams, overlay panels, glass-effect cards, and any composition where shapes sit on top of each other and the overlap region should mix.

The previous `_resolve_solid` and `_resolve_fill` paths ignored the alpha element entirely: semi-transparent fills decompiled at full opacity, so Venn circles render as solid blobs instead of the expected lighter-on-white standalone + darker overlap regions.

Threading true alpha through the build pipeline would require the expander, emitter, and python-pptx fill APIs to learn about transparency. As a contained first pass, we pre-multiply the source alpha against a white slide background:

    blended_c = c * alpha + 255 * (1 - alpha)

For typical white-canvas slides this reproduces the perceived colour of standalone (non-overlapping) semi-transparent fills exactly. For overlapping regions (Venn intersections) the result is still wrong, because real alpha compositing would darken the intersection, but the standalone fills now match source pixels, which is the dominant contribution to the diff.

On the showcase deck slide-61 (the Venn diagram example) went from 18.05% struct_diff to 5.8% in a single change.

Two new helpers:

- `_alpha_for_color(color_el)`: read `<a:alpha>` child, returns 0..1
- `_blend_on_white(rgb, alpha)`: apply the pre-multiply formula

Both `_resolve_solid` (used by line strokes, table cells, chart series) and `_resolve_fill` (shape fills, including the grpFill walk) now apply the blend on srgb AND scheme colour paths. Theme-scheme colours also pick up `<a:alpha>` when the source layers it onto an inherited brand accent.

Signed-off-by: Mike Mueller <mike@objektarium.de>
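A minimal sketch of the two helpers, assuming the colour element is parsed with the standard library's `xml.etree.ElementTree`. The function names mirror the ones in the commit but the bodies are illustrative guesses rather than the merged implementation; treating a missing `<a:alpha>` as fully opaque is an assumption.

```python
import xml.etree.ElementTree as ET

A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"


def alpha_for_color(color_el):
    """Read the <a:alpha> child and return opacity in 0..1 (assumed 1.0 if absent)."""
    alpha_el = color_el.find(f"{{{A_NS}}}alpha")
    if alpha_el is None:
        return 1.0
    return int(alpha_el.get("val", "100000")) / 100000


def blend_on_white(rgb, alpha):
    """Apply blended_c = c * alpha + 255 * (1 - alpha) to each channel."""
    return tuple(round(c * alpha + 255 * (1 - alpha)) for c in rgb)


# A pure red fill at 20% opacity on a white canvas.
el = ET.fromstring(
    f'<a:srgbClr xmlns:a="{A_NS}" val="FF0000"><a:alpha val="20000"/></a:srgbClr>'
)
rgb = tuple(int(el.get("val")[i:i + 2], 16) for i in (0, 2, 4))
blended = blend_on_white(rgb, alpha_for_color(el))  # (255, 204, 204)
```

The blend is only correct against a white background; a slide with a coloured or image background would need the real backdrop colour in place of 255.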
`_emit_bar_chart` always emitted Y-axis tick labels, X-axis category labels, and per-bar value labels, regardless of what the source's chart XML actually requested. Many showcase charts set `<c:catAx><c:delete val="1"/>` and/or `<c:valAx><c:delete val="1"/>` to hide the axes for a clean look and `<c:dLbls><c:showVal val="0"/>` to suppress value labels, but our renders painted them all anyway: extra text running through the source's empty space at every tick position, every category boundary, and above every bar.

`_emit_bar_chart` now reads:

- `<c:catAx><c:delete>` / `<c:valAx><c:delete>`: skip the corresponding axis labels AND expand the plot area to fill the space that would otherwise be reserved for them
- `<c:dLbls><c:showVal>` / `<c:showCatName>`: gate value-above-bar and category labels on the explicit source flags rather than always-emit

End-to-end measured on the showcase deck (six bar-chart slides above the 15% struct threshold):

| slide | before | after |
|-------|-------:|------:|
| 56    | 26.1%  | 25.7% |
| 66    | 24.9%  | 23.6% |
| 67    | 16.6%  | 13.9% |
| 93    | 16.5%  | 14.8% |
| 94    | 15.4%  | 13.1% |
| 98    | 18.7%  | 16.7% |

Three slides (67, 93, 94) cleared the threshold. The two clean-look showcase charts (56, 66) still carry residual diff from their horizontal/doughnut hybrid composition, which is separate work.

This is purely a deeper read of XML data the source was already providing; nothing was invented or estimated.

Signed-off-by: Mike Mueller <mike@objektarium.de>
CI lint flagged `show_cat_labels` as assigned-but-never-used. The fix also tightens behaviour to match the source: category labels now require BOTH the axis to be visible (`<c:catAx><c:delete val="0">`) AND `<c:dLbls><c:showCatName val="1">`. The previous code only checked axis visibility, which over-emitted on charts that hide labels per chart element while keeping the axis tick frame.

Signed-off-by: Mike Mueller <mike@objektarium.de>
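The gating rules from the two commits above can be sketched as a small flag reader. This is an illustration under stated assumptions: `bar_chart_label_flags` and `_flag` are hypothetical names, the defaults used when an element is absent (axis visible, data labels off) are guesses, and a flag element without a `val` attribute is treated as true.

```python
import xml.etree.ElementTree as ET

C_NS = "http://schemas.openxmlformats.org/drawingml/2006/chart"
NS = {"c": C_NS}


def _flag(parent, path, default):
    """Return the boolean `val` of the element at `path`, or `default` if absent."""
    el = parent.find(path, NS)
    if el is None:
        return default
    return el.get("val", "1") in ("1", "true")


def bar_chart_label_flags(plot_area):
    """Derive label visibility from <c:delete>, <c:showVal> and <c:showCatName>."""
    cat_deleted = _flag(plot_area, "c:catAx/c:delete", False)
    val_deleted = _flag(plot_area, "c:valAx/c:delete", False)
    show_val = _flag(plot_area, ".//c:dLbls/c:showVal", False)
    show_cat_name = _flag(plot_area, ".//c:dLbls/c:showCatName", False)
    return {
        "show_val_axis_labels": not val_deleted,
        # Category labels need BOTH a visible axis AND showCatName="1".
        "show_cat_labels": (not cat_deleted) and show_cat_name,
        "show_val_labels": show_val,
    }


plot_area = ET.fromstring(
    f'<c:plotArea xmlns:c="{C_NS}">'
    '<c:barChart><c:dLbls><c:showVal val="1"/><c:showCatName val="0"/></c:dLbls></c:barChart>'
    '<c:catAx><c:delete val="0"/></c:catAx>'
    '<c:valAx><c:delete val="1"/></c:valAx>'
    "</c:plotArea>"
)
flags = bar_chart_label_flags(plot_area)
```

In this example the value axis is deleted and `showCatName` is off, so only the per-bar value labels would be emitted.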
This was referenced May 20, 2026
Showcase decks often colour alternating bars different hues to spotlight
a specific category (e.g. five bars with bars 1/3/5 in accent1 and
2/4 in accent2 to read as "this is the highlighted set"). PowerPoint
writes that information per data point as
<c:ser>
<c:dPt><c:idx val="N"/><c:spPr><a:solidFill>…</a:solidFill></c:spPr></c:dPt>
<c:dPt><c:idx val="N+1"/>…</c:dPt>
…
</c:ser>
`_emit_bar_chart` read only the series-level `<c:spPr>` so every bar in
a series rendered the series colour, losing the highlight pattern.
`_emit_pie_chart` already reads dPt (added in earlier work); this brings
bar charts to parity.
The new tuple element `dpt_colors: dict[int, str]` maps category-index
to resolved fill, and every bar render site (horizontal-stacked,
horizontal-clustered, vertical-clustered) prefers dPt over series.
End-to-end on the showcase deck slide-56 (5 bars with alternating
accent1/accent2 fills) drops from 25.4% to 14.1% struct_diff — cleared
the 15% threshold.
Signed-off-by: Mike Mueller <mike@objektarium.de>
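A rough sketch of how a `dpt_colors` map could be built and consulted, assuming `xml.etree.ElementTree` and handling only literal `<a:srgbClr>` fills. Scheme colours would need the theme lookup that `_resolve_solid` performs, which is omitted here; `dpt_colors` as a function and `bar_fill` are hypothetical helpers for illustration.

```python
import xml.etree.ElementTree as ET

C_NS = "http://schemas.openxmlformats.org/drawingml/2006/chart"
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
NS = {"c": C_NS, "a": A_NS}


def dpt_colors(ser):
    """Map category index -> hex fill for every <c:dPt> with a literal sRGB fill."""
    colors = {}
    for dpt in ser.findall("c:dPt", NS):
        idx_el = dpt.find("c:idx", NS)
        clr_el = dpt.find("c:spPr/a:solidFill/a:srgbClr", NS)
        if idx_el is None or clr_el is None:
            continue  # no index or no literal colour: fall back to the series fill
        colors[int(idx_el.get("val"))] = "#" + clr_el.get("val").upper()
    return colors


def bar_fill(colors, category_idx, series_fill):
    """Prefer the per-data-point colour over the series-level colour."""
    return colors.get(category_idx, series_fill)


ser = ET.fromstring(
    f'<c:ser xmlns:c="{C_NS}" xmlns:a="{A_NS}">'
    '<c:dPt><c:idx val="1"/><c:spPr><a:solidFill>'
    '<a:srgbClr val="ff8800"/></a:solidFill></c:spPr></c:dPt>'
    "</c:ser>"
)
overrides = dpt_colors(ser)
fills = [bar_fill(overrides, i, "#112233") for i in range(3)]
```

Bars without a `<c:dPt>` entry keep the series colour, so the highlight pattern survives without touching charts that never use per-point fills.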
`_shape_geometry_kind` only knew the four "core" presets (`ellipse`, `line`/`straightConnector1`, `rect`, `roundRect`). Every other preset fell through to `rect` and the shape rendered as its bounding box: triangles became rectangles, arrows became rectangles, diamonds became rectangles.

Inventory across the showcase deck XML:

    rect        820  (handled)
    ellipse     107  (handled)
    line         59  (handled)
    triangle     19  → handled now
    arc          12  (todo: arc adjustments)
    upArrow       6  → handled now
    rtTriangle    6  → handled now
    rightArrow    5  → handled now
    …

Adds `_PRESET_PATH_PRESETS` enumerating the closed-polygon presets the decompiler now synthesises an SVG `d` path for, and `_preset_geom_path(preset, w, h)` returning that string in local 0..w × 0..h pixel coordinates. Covered presets:

    triangle, rtTriangle, diamond, parallelogram, trapezoid
    pentagon / homePlate, hexagon, heptagon, octagon, chevron
    rightArrow, leftArrow, upArrow, downArrow

All use PowerPoint's default unadjusted geometry; `<a:avLst>` slider overrides are not threaded through. For convex polygons the default form is visually faithful in the dominant case; arrows use 50% shaft / 50% barb (PowerPoint default).

`_shape_geometry_kind` routes these presets to `kind="shape"` and `_emit_sp` synthesises `svg_path_d` when `_custgeom_svg_d` returns None (no `<a:custGeom>` present but a known preset is).

End-to-end on the showcase deck slide-57 (a podium chart of three triangles labeled "Text"): 16.8% → 2.9% struct_diff. Cleared the 15% threshold from one preset-geometry source-XML read.

Signed-off-by: Mike Mueller <mike@objektarium.de>
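As an illustration of the preset-to-path idea, the sketch below covers three of the listed presets in their default, unadjusted form. The function name echoes `_preset_geom_path` but the body is an assumption about how such a helper might look, not the merged code; coordinates are formatted with `:g`, which is adequate for pixel-scale values.

```python
def preset_geom_path(preset, w, h):
    """Return an SVG `d` string in local 0..w x 0..h coordinates, or None if unknown."""
    vertices = {
        # Isosceles triangle: apex at top centre, base along the bottom edge.
        "triangle": [(w / 2, 0), (w, h), (0, h)],
        # Right triangle: right angle in the bottom-left corner.
        "rtTriangle": [(0, 0), (w, h), (0, h)],
        # Diamond: one vertex at the midpoint of each bounding-box edge.
        "diamond": [(w / 2, 0), (w, h / 2), (w / 2, h), (0, h / 2)],
    }.get(preset)
    if vertices is None:
        return None
    (x0, y0), *rest = vertices
    segments = [f"M {x0:g} {y0:g}"] + [f"L {x:g} {y:g}" for x, y in rest]
    return " ".join(segments) + " Z"


triangle_d = preset_geom_path("triangle", 100, 80)  # "M 50 0 L 100 80 L 0 80 Z"
```

Returning `None` for unknown presets lets the caller keep its existing bounding-box fallback for shapes such as `arc` that are not yet handled.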
`_emit_bar_chart` always painted a legend swatch + series-name row, even when the source chart carried no `<c:legend>` element at all. Bare-bones showcase bar charts (with only bars + value labels) emitted a phantom "Datenreihe 1" label at the bottom-left, often wrapped mid-word inside the swatch slot.

Same for the chart title: emitted whenever `<c:title>` had any text, ignoring the `<c:autoTitleDeleted val="1"/>` flag that says the user explicitly removed the auto-generated title.

Now both fire only when their source elements actually exist:

- legend rows: only when `<c:legend>` is present
- chart title: only when `<c:title>` has text AND `<c:autoTitleDeleted>` is absent / val="0"

Pure XML deep-read; no other changes.

Signed-off-by: Mike Mueller <mike@objektarium.de>
`_resolve_solid` (used by line strokes, chart series colours,
per-data-point `<c:dPt>` fills, and table cell borders) ignored every
colour modifier. PowerPoint encodes accent variations as
<a:schemeClr val="accent1">
<a:lumMod val="50000"/>
<a:lumOff val="50000"/>
</a:schemeClr>
(a 50%-mixed accent for a lighter swatch in a chart series). Without
applying the modifier we resolved every modified accent to the
unmodified base theme colour, losing per-series colour distinctions
across stacked-bar charts, doughnut slice variations, and any source
that tints theme colours for a colour ramp.
Inventory across the test deck: 1056 lumMod/lumOff modifiers and
0 tint/shade. Adds tint/shade support too for completeness — those
will fire on decks that use the alternative percent-of-source-colour
modifier syntax.
New helper `_apply_color_mods(rgb, color_el)` handles all four
(lumMod, lumOff, tint, shade) and is called from both `_resolve_solid`
and `_resolve_fill` after the base RGB is resolved. `_resolve_fill`'s
inline lumMod/lumOff block (already there) is now delegated to the
shared helper so srgb-side colour modifiers also apply (previously
only scheme-side did).
Signed-off-by: Mike Mueller <mike@objektarium.de>
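For the `lumMod`/`lumOff` pair, one plausible reading is a scale-then-offset on HSL lightness. The sketch below uses the standard library's `colorsys` and is an assumption about the arithmetic, not a copy of `_apply_color_mods`; `apply_lum_mods` is a hypothetical name, and tint/shade are left out because their exact blending is not described in the commit.

```python
import colorsys


def apply_lum_mods(rgb, lum_mod=100000, lum_off=0):
    """Scale HSL lightness by lum_mod and shift it by lum_off (both in 1/1000 percent)."""
    r, g, b = (c / 255 for c in rgb)
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    l = l * lum_mod / 100000 + lum_off / 100000
    l = min(1.0, max(0.0, l))  # clamp to the valid lightness range
    r, g, b = colorsys.hls_to_rgb(h, l, s)
    return tuple(round(c * 255) for c in (r, g, b))


# lumMod=50000 + lumOff=50000 moves a colour halfway towards white.
lighter_black = apply_lum_mods((0, 0, 0), lum_mod=50000, lum_off=50000)
unchanged = apply_lum_mods((255, 0, 0))
```

With the defaults (`lum_mod=100000`, `lum_off=0`) the colour passes through unchanged, so the helper is safe to call on colour elements that carry no modifiers.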
`_emit_pie_chart` only recognised the four cardinal `<c:legendPos>`
values ("l", "r", "t", "b") and fell back to "r" for corner positions
("tr", "tl", "br", "bl"). For the dominant case ("tr" - PowerPoint's
default-ish position) "r" was the right collapse anyway, but explicit
"tl"/"bl" (left-side corner legends) ended up rendering as right-side
legends, mirroring the pie horizontally.
Now the normalisation routes corner positions to their horizontal
axis: "l"/"tl"/"bl" → "l", "r"/"tr"/"br" → "r", "t"/"b" stay
vertical. The pie sizing logic then reserves the correct slot.
This is the minimal fix that keeps `_emit_pie_chart`'s coordinate
math unchanged. A future pass could honour the explicit
`<c:layout><c:manualLayout>` x/y/w/h fractions when present, but
PPTX charts that include manualLayout typically still set the
canonical legendPos, so collapsing to the axis is the right
first-order behaviour.
Signed-off-by: Mike Mueller <mike@objektarium.de>
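The normalisation described above amounts to a small lookup. The sketch below follows the mapping in the commit message; `normalise_legend_pos` is a hypothetical name, and falling back to `"r"` for a missing or unrecognised value is an assumption.

```python
def normalise_legend_pos(pos):
    """Collapse corner legend positions onto their horizontal side."""
    if pos in ("l", "tl", "bl"):
        return "l"
    if pos in ("r", "tr", "br"):
        return "r"
    if pos in ("t", "b"):
        return pos  # vertical positions are kept as-is
    return "r"  # assumed fallback for missing or unknown values


sides = [normalise_legend_pos(p) for p in ("tl", "tr", "b", None)]
```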
Dispatched three parallel sub-agents to mine the source XML for the
remaining stubborn slides (24, 66, 98). Each reported specific
extraction gaps; this commit lands the three smallest framework
fixes that came out of those reports.
1. **`<c:firstSliceAng>` on pie/doughnut.** Source decks rotating
the first slice into a fixed position (corporate showcase pattern
for highlighting a specific category) carry
`<c:firstSliceAng val="N"/>` on the chart element. The decompile
ignored it, so every slice's start angle was off by N°. Now
applied as an additive rotation on top of the standard 12-o'clock
start.
2. **Chart series + dPt colours skip nearest_token.** `_emit_pie_chart`
and `_emit_bar_chart` resolved `<c:ser><c:spPr>` and per-data-point
`<c:dPt><c:spPr>` colours through the brand palette via
`_resolve_fill(..., palette)`. For brands lacking explicit
chart-series tokens (the `blank` super-design, customer packs that
override only accent), nearest_token collapsed source-distinct
hues to the same brand token — every slice rendered the same
colour. Resolving with `palette={}` lets the source hex propagate
verbatim; the SVG-render path already passes literal hex through.
3. **Value labels in horizontal stacked / percentStacked bars.** The
`_emit_bar_chart` value-label branch only fired in the standard
side-by-side path; the stacked branch had no label emission even
when `<c:showVal val="1"/>` was set. Labels now emit at the
midpoint of each segment, gated on `show_val_labels` and segment
width > 200000 EMU (skips labels in microscopic segments).
End-to-end on the 99-slide showcase deck, with framework fixes only
(no hand-patches on stubborn slides):
mean struct_diff: 6.18% (93.82% quality)
slides ≥85% quality: 96/99 (97%)
Signed-off-by: Mike Mueller <mike@objektarium.de>
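Item 1 above, the additive `<c:firstSliceAng>` rotation, can be illustrated with a short angle calculation. This is a sketch under assumptions: angles are in degrees measured clockwise from 12 o'clock, and `slice_angles` is a hypothetical helper rather than the code inside `_emit_pie_chart`.

```python
def slice_angles(values, first_slice_ang=0):
    """Return (start_angle, sweep) pairs in degrees, clockwise from 12 o'clock."""
    total = sum(values)
    if total <= 0:
        return []
    angles = []
    start = float(first_slice_ang)  # additive rotation on top of the 12 o'clock start
    for value in values:
        sweep = 360 * value / total
        angles.append((start % 360, sweep))
        start += sweep
    return angles


rotated = slice_angles([1, 1, 2], first_slice_ang=90)
# [(90.0, 90.0), (180.0, 90.0), (270.0, 180.0)]
```

Ignoring the attribute is equivalent to calling this with `first_slice_ang=0`, which shifts every slice by the same N degrees relative to the source.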
Pie / doughnut chart frames carry an explicit plot-area layout when
the deck author has manually positioned the pie inside a larger
frame (the dominant case in showcase decks that pair a pie with a
legend block in the same frame):
<c:plotArea>
<c:layout>
<c:manualLayout>
<c:layoutTarget val="inner"/>
<c:xMode val="edge"/><c:yMode val="edge"/>
<c:x val="0.264"/><c:y val="0.130"/>
<c:w val="0.449"/><c:h val="0.642"/>
</c:manualLayout>
</c:layout>
<c:pieChart>...
`_emit_pie_chart` used a fixed heuristic (60% or 50% of frame width
depending on aspect ratio + legend presence) that put the pie center
on PowerPoint's default-layout position, ignoring the author's
explicit placement. For slide-66's chart14 (with manualLayout
x=0.264 y=0.130 w=0.449 h=0.642 — pie occupies the LEFT 70% of the
frame), the heuristic placed the pie centre-right of the frame and
sized it too small.
Now `_emit_pie_chart` reads `<c:plotArea>/<c:layout>/<c:manualLayout>`
x/y/w/h fractions and uses them directly when present. The radius
becomes half the plot-area's shortest side (the manualLayout already
reserved margin for labels); falls back to the aspect-based heuristic
when source uses `<c:layout/>` (auto-layout).
End-to-end on slide-66: 23.4% → 15.5% struct_diff in one extraction.
Signed-off-by: Mike Mueller <mike@objektarium.de>
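The geometry described above can be sketched as follows, assuming `xMode`/`yMode` are `edge` so that x/y are fractions of the chart frame measured from its top-left corner. `pie_geometry` is a hypothetical helper; the frame size in the example is invented for illustration, while the fractions are the ones quoted for chart14.

```python
def pie_geometry(frame_w, frame_h, x, y, w, h):
    """Convert manualLayout fractions into a pie centre and radius in frame units."""
    plot_w = w * frame_w
    plot_h = h * frame_h
    cx = x * frame_w + plot_w / 2
    cy = y * frame_h + plot_h / 2
    # Radius is half the plot area's shortest side.
    radius = min(plot_w, plot_h) / 2
    return cx, cy, radius


# Fractions from the commit message, applied to a hypothetical 1000 x 600 frame.
cx, cy, radius = pie_geometry(1000, 600, 0.264, 0.130, 0.449, 0.642)
```

For this frame the plot area is about 449 wide and 385 tall, so the height is the limiting side and the centre sits left of the frame's midpoint.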
`<a:tbl>` cells carrying literal phrases like "Placeholder text" or "This text can be replaced with your own text." are the actual content the source renders: they're visible in every cell of a showcase table that demonstrates the layout. The placeholder-text filter (designed to drop layout-level demo prompts) was suppressing them too, leaving table cards empty in our render while source showed them filled with the demonstrative copy.

Adds `Shape.skip_placeholder_filter` bool flag. `_emit_table` sets it to True on every text shape it emits. `emit_dsl` passes those shapes through unfiltered before running the standard placeholder suppression on the rest.

Layout-level placeholder prompts in normal `<p:sp>` shapes are still suppressed as before; the flag only short-circuits table cells.

Marginal improvement on slide-24 (35.5% → 35.1%) since the dominant residual diff there comes from the source's stair-step geometry, not the table text. But it's the right semantic.

Signed-off-by: Mike Mueller <mike@objektarium.de>
This reverts commit 98d7232. Signed-off-by: Mike Mueller <mike@objektarium.de>
8e537ca to
86c974e
marsmike added a commit that referenced this pull request May 21, 2026

…s, alpha, font inheritance, presets, color mods, …)

Integrates the upstream improvements from #23 into the carved location (feinschliff-builder/.../pptx_svg_decompile.py). Applied via 3-way merge against the restructure branch-base; our local changes (namespace rewrites, discover_brands() refactor, brand-name cleanup in comments) all preserved cleanly with no conflicts.

Upstream changes folded in:

- font-size inheritance from layout/master (<p:txStyles>, lstStyle)
- <a:alpha> on fills (pre-blend on white)
- bar chart <c:catAx>/<c:valAx>/<c:dLbls>/<c:delete>/<c:showVal>
- per-bar <c:dPt> colours
- prstGeom presets (triangle/rtTriangle/diamond/parallelogram/trapezoid/pentagon/hexagon/heptagon/octagon/chevron/{up,down,left,right}Arrow)
- bar-chart legend + title gated on source presence
- lumMod/lumOff/tint/shade in both color resolvers
- pie legend corner-position normalisation (tr/tl/br/bl)
- <c:firstSliceAng> on pie/doughnut
- chart series + dPt colours skip nearest_token to preserve hex
- value labels in horizontal stacked/percentStacked bars
- <c:plotArea>/<c:manualLayout> for pie sizing

Also drops the now-unused `os` import (env-path parsing was replaced by discover_brands() in the restructure carve).

Net: +521 / -109 lines.
Tests: 817 core + 307 builder = 1124 passing. Ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Mike Mueller <mike@objektarium.de>
Summary
Consolidates three independent XML deep-read improvements for the PPTX decompiler that were originally spread across #21, #22, and #23. Every change is pulling data the source XML already provides; nothing computed or invented.
1. Chart axis + data-label visibility: `<c:catAx>`, `<c:valAx>`, `<c:dLbls>`

`_emit_bar_chart` always emitted Y-tick labels, X-category labels, and per-bar value labels regardless of source XML. Showcase charts often set `<c:catAx><c:delete val="1"/>` / `<c:valAx><c:delete val="1"/>` for a clean look and `<c:dLbls><c:showVal val="0"/>` to suppress value labels; extra text was painting through the source's empty space at every tick.

`_emit_bar_chart` now reads `<c:catAx/valAx><c:delete>`, `<c:dLbls><c:showVal>`, and `<c:showCatName>`, gates emission accordingly, AND expands the plot area to fill the space that would otherwise be reserved for hidden labels.

2. Alpha on colours: `<a:alpha>`

PPTX colour elements can carry `<a:alpha val="N"/>` where N is 0..100000 (= 0..100% opacity). Source decks use this for Venn diagrams, overlay panels, semi-transparent cards. The previous `_resolve_solid` and `_resolve_fill` ignored alpha, so semi-transparent fills decompiled at full opacity.

New helpers `_alpha_for_color` and `_blend_on_white` pre-multiply source alpha against a white slide background so the non-overlapping regions match source pixels: `blended_c = c * alpha + 255 * (1 - alpha)`.

True alpha compositing (so overlap regions blend correctly) would need the build pipeline to learn about transparency, which is out of scope.

3. Layout / master font-size inheritance: `<p:txBody><a:lstStyle>` and `<p:txStyles>`

`_text_runs` defaulted any paragraph whose run/`pPr`/`lstStyle` all omitted explicit `sz` to a hardcoded 18pt. PowerPoint's actual cascade reaches into the slide layout and master:

1. `<a:rPr sz="...">`
2. `<a:pPr><a:defRPr sz="...">`
3. `<a:txBody><a:lstStyle>...defRPr sz`
4. `<p:txBody><a:lstStyle>...defRPr sz` ← new
5. `<p:txStyles><p:titleStyle|bodyStyle>...defRPr sz` ← new

A chapter-divider layout writes `<a:t>Chapter title</a:t>` at the slide level and inherits 60pt+ from the layout's matching placeholder; without the lookup we rendered at 18pt.

New helper `_layout_placeholder_default_sz(slide, ph_type, ph_idx)` walks layout then master `<p:txStyles>`. `_text_runs` takes an `inherited_default_sz=` kwarg.

Result
Measured end-to-end on a 99-slide corporate showcase deck:
Notable per-slide wins:
Test plan
🤖 Generated with Claude Code