
feat(decompile): three XML deep-read fixes (chart-axis + alpha + font inheritance) #23

Merged
marsmike merged 13 commits into main from fix/layout-font-inheritance
May 21, 2026

@marsmike
Owner

@marsmike marsmike commented May 20, 2026

Summary

Consolidates three independent XML deep-read improvements for the PPTX decompiler that were originally spread across #21, #22, and #23. Every change reads data the source XML already provides; nothing is computed or invented.

1. Chart axis + data-label visibility — <c:catAx>, <c:valAx>, <c:dLbls>

_emit_bar_chart always emitted Y-tick labels, X-category labels, and per-bar value labels regardless of source XML. Showcase charts often set <c:catAx><c:delete val="1"/> / <c:valAx><c:delete val="1"/> for a clean look and <c:dLbls><c:showVal val="0"/> to suppress value labels, so our render painted extra text through the source's empty space at every tick.

_emit_bar_chart now reads <c:catAx/valAx><c:delete>, <c:dLbls><c:showVal>, and <c:showCatName>, gates emission accordingly, AND expands the plot area to fill the space that would otherwise be reserved for hidden labels.
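
The flag reads described above can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: `read_bar_chart_visibility` and `_flag` are hypothetical names, and the defaults used when an element is absent (axis visible, labels off) are assumptions.

```python
import xml.etree.ElementTree as ET

NS = {"c": "http://schemas.openxmlformats.org/drawingml/2006/chart"}


def _flag(parent, path, default):
    """Return the boolean `val` of the child at `path`, or `default` if absent."""
    el = parent.find(path, NS) if parent is not None else None
    if el is None:
        return default
    # An element present without `val` is treated as true (assumption).
    return el.get("val", "1") in ("1", "true")


def read_bar_chart_visibility(chart_xml: str) -> dict:
    """Derive label-emission gates from a chart part's XML."""
    root = ET.fromstring(chart_xml)
    plot = root.find(".//c:plotArea", NS)
    d_lbls = plot.find(".//c:dLbls", NS)
    cat_axis_visible = not _flag(plot.find("c:catAx", NS), "c:delete", False)
    val_axis_visible = not _flag(plot.find("c:valAx", NS), "c:delete", False)
    return {
        "show_val_ticks": val_axis_visible,
        "show_val_labels": _flag(d_lbls, "c:showVal", False),
        # Category labels need a visible axis AND an explicit showCatName.
        "show_cat_labels": cat_axis_visible
        and _flag(d_lbls, "c:showCatName", False),
    }
```

A caller would skip each label group whose gate is false and could widen the plot rectangle by whatever margin that group would have reserved.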

2. Alpha on colours — <a:alpha>

PPTX colour elements can carry <a:alpha val="N"/> where N is 0..100000 (= 0..100% opacity). Source decks use this for Venn diagrams, overlay panels, and semi-transparent cards. The previous _resolve_solid and _resolve_fill ignored alpha, so semi-transparent fills decompiled at full opacity.

New helpers _alpha_for_color and _blend_on_white pre-multiply source alpha against a white slide background so the non-overlapping regions match source pixels:

blended_c = c * alpha + 255 * (1 - alpha)

True alpha compositing (so overlap regions blend correctly) would need the build pipeline to learn about transparency — out of scope.
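
As a worked instance of the formula above, here is a small sketch. The function names are hypothetical stand-ins (not the PR's `_alpha_for_color` / `_blend_on_white`), and it assumes the colour is already a six-digit sRGB hex string.

```python
def alpha_fraction(alpha_val: int) -> float:
    """Map <a:alpha val="N"/> (0..100000) to an opacity in 0..1."""
    return max(0, min(100000, alpha_val)) / 100000


def blend_on_white(hex_rgb: str, alpha: float) -> str:
    """Composite an sRGB hex colour over a white background."""
    channels = [int(hex_rgb[i:i + 2], 16) for i in (0, 2, 4)]
    # blended_c = c * alpha + 255 * (1 - alpha), applied per channel
    blended = [round(c * alpha + 255 * (1 - alpha)) for c in channels]
    return "".join(f"{c:02X}" for c in blended)
```

For example, pure red at 50% opacity blends to `FF8080`; full opacity returns the input unchanged and zero opacity returns white.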

3. Layout / master font-size inheritance — <p:txBody><a:lstStyle> and <p:txStyles>

_text_runs defaulted any paragraph whose run/pPr/lstStyle all omitted explicit sz to a hardcoded 18pt. PowerPoint's actual cascade reaches into the slide layout and master:

  1. slide-level <a:rPr sz="...">
  2. paragraph <a:pPr><a:defRPr sz="...">
  3. slide's <p:txBody><a:lstStyle>...defRPr sz
  4. layout's placeholder <p:txBody><a:lstStyle>...defRPr sz ← new
  5. master's <p:txStyles><p:titleStyle|bodyStyle>...defRPr sz ← new
  6. hardcoded 18pt fallback

On a chapter-divider layout, the slide-level placeholder writes only <a:t>Chapter title</a:t> and inherits 60pt+ from the layout's matching placeholder; without the lookup we rendered it at 18pt.

New helper _layout_placeholder_default_sz(slide, ph_type, ph_idx) walks layout then master <p:txStyles>. _text_runs takes inherited_default_sz= kwarg.
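
The six-step cascade can be read as "first explicit value wins". The sketch below illustrates only that ordering; `resolve_font_size` and its parameter names are hypothetical, and it assumes each level has already been looked up and is `None` when the XML omits `sz`. Sizes are in hundredths of a point, as in the `sz` attribute.

```python
from typing import Optional

FALLBACK_SZ = 1800  # 18pt, expressed in hundredths of a point


def resolve_font_size(
    run_sz: Optional[int] = None,
    paragraph_sz: Optional[int] = None,
    slide_lst_style_sz: Optional[int] = None,
    layout_placeholder_sz: Optional[int] = None,
    master_tx_styles_sz: Optional[int] = None,
) -> int:
    """Return the first explicit size in cascade order, else the fallback."""
    cascade = (
        run_sz,
        paragraph_sz,
        slide_lst_style_sz,
        layout_placeholder_sz,
        master_tx_styles_sz,
    )
    return next((sz for sz in cascade if sz is not None), FALLBACK_SZ)
```

A chapter-divider title with no slide-level size and a 60pt layout placeholder resolves to `6000`; a shape with nothing set anywhere still falls back to `1800`.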

Result

Measured end-to-end on a 99-slide corporate showcase deck:

| state         | mean struct_diff       | slides >15% |
|---------------|-----------------------:|------------:|
| main (PR #20) | 8.21%                  | 13/99       |
| + this PR     | 6.71% (93.29% quality) | 5/99        |

Notable per-slide wins:

  • slide-61 (Venn diagram): 18.05% → 5.8% (alpha)
  • slides 67, 93, 94 dropped below the 15% threshold with the chart-axis fix
  • chapter-divider title sizes now match source (inheritance)

Test plan

  • 996 existing tests pass
  • Ruff clean
  • Manually inspected slides 04-09 (titles), 61 (Venn), 56/66/93 (charts)

🤖 Generated with Claude Code

marsmike added 4 commits May 20, 2026 19:26
`_text_runs` defaulted any paragraph whose run/`<a:pPr>`/`<a:lstStyle>`
all omitted explicit `sz` to a hardcoded 18pt. PowerPoint's actual
cascade reaches into the slide layout and slide master:

  1. slide-level run `<a:rPr sz="...">`            (already honoured)
  2. paragraph `<a:pPr><a:defRPr sz="...">`        (already honoured)
  3. slide's `<p:txBody><a:lstStyle>...defRPr sz`  (already honoured)
  4. **layout's placeholder `<p:txBody><a:lstStyle>...defRPr sz`** ← new
  5. **master's `<p:txStyles><p:titleStyle|bodyStyle>...defRPr sz`** ← new
  6. hardcoded 18pt fallback

Steps 4 and 5 are the layout-inheritance cascade. On a typical chapter-
divider layout the slide-level placeholder writes only `<a:t>Chapter
title</a:t>` and inherits the 60pt+ headline size from the layout's
matching placeholder. Without the lookup we render at 18pt — visibly
too small.

New helper `_layout_placeholder_default_sz(slide, ph_type, ph_idx)`
walks the layout (then the master's `<p:txStyles>`) for the matching
placeholder. `_text_runs` gains an `inherited_default_sz` kwarg fed by
`_emit_sp` whenever the shape carries `ph_type`/`ph_idx`. The cascade
priority remains correct: slide-level lstStyle still wins over
inherited; inherited only fires when no slide-level default exists.

End-to-end on the 99-slide showcase, stacked with the chart-axis
(PR #21) and alpha-fills (PR #22) improvements:

  mean struct_diff: 8.21% → 6.71% (91.79% → 93.29% quality)
  slides above 15% threshold: 13 → 5

Signed-off-by: Mike Mueller <mike@objektarium.de>
PPTX colour elements can carry `<a:alpha val="N"/>` where N is 0..100000
encoding 0..100% opacity. Source decks use this for Venn diagrams,
overlay panels, glass-effect cards, and any composition where shapes
sit on top of each other and the overlap region should mix.

The previous `_resolve_solid` and `_resolve_fill` paths ignored the
alpha element entirely — semi-transparent fills decompiled at full
opacity, so Venn circles render as solid blobs instead of the
expected lighter-on-white standalone + darker overlap regions.

Threading true alpha through the build pipeline would require the
expander, emitter, and python-pptx fill APIs to learn about
transparency. As a contained first pass, we pre-multiply the source
alpha against a white slide background:

  blended_c = c * alpha + 255 * (1 - alpha)

For typical white-canvas slides this reproduces the perceived colour
of standalone (non-overlapping) semi-transparent fills exactly. For
overlapping regions (Venn intersections) the result is still wrong —
real alpha compositing would darken the intersection — but the
standalone fills now match source pixels, which is the dominant
contribution to the diff.

On the showcase deck slide-61 (the Venn diagram example) went from
18.05% struct_diff to 5.8% in a single change.

Two new helpers:

- `_alpha_for_color(color_el)` — read `<a:alpha>` child, returns 0..1
- `_blend_on_white(rgb, alpha)` — apply the pre-multiply formula

Both `_resolve_solid` (used by line strokes, table cells, chart
series) and `_resolve_fill` (shape fills, including the grpFill walk)
now apply the blend on srgb AND scheme colour paths. Theme-scheme
colours also pick up `<a:alpha>` when the source layers it onto an
inherited brand accent.

Signed-off-by: Mike Mueller <mike@objektarium.de>
`_emit_bar_chart` always emitted Y-axis tick labels, X-axis category
labels, and per-bar value labels — regardless of what the source's
chart XML actually requested. Many showcase charts set
`<c:catAx><c:delete val="1"/>` and/or `<c:valAx><c:delete val="1"/>`
to hide the axes for a clean look and `<c:dLbls><c:showVal val="0"/>`
to suppress value labels, but our renders painted them all anyway —
extra text running through the source's empty space at every tick
position, every category boundary, and above every bar.

`_emit_bar_chart` now reads:

- `<c:catAx><c:delete>` / `<c:valAx><c:delete>` — skip the
  corresponding axis labels AND expand the plot area to fill the
  space that would otherwise be reserved for them
- `<c:dLbls><c:showVal>` / `<c:showCatName>` — gate value-above-bar
  and category labels on the explicit source flags rather than
  always-emit

End-to-end measured on the showcase deck (six bar-chart slides
above the 15% struct threshold):

| slide   | before | after  |
|---------|-------:|-------:|
| 56      | 26.1%  | 25.7%  |
| 66      | 24.9%  | 23.6%  |
| 67      | 16.6%  | 13.9%  |
| 93      | 16.5%  | 14.8%  |
| 94      | 15.4%  | 13.1%  |
| 98      | 18.7%  | 16.7%  |

Three slides (67, 93, 94) cleared the threshold. The two clean-look
showcase charts (56, 66) still carry residual diff from their
horizontal/doughnut hybrid composition — separate work.

This is purely a deeper read of XML data the source was already
providing; nothing was invented or estimated.

Signed-off-by: Mike Mueller <mike@objektarium.de>
CI lint flagged `show_cat_labels` as assigned-but-never-used. The fix
also tightens behaviour to match the source: category labels now
require BOTH the axis to be visible (`<c:catAx><c:delete val="0">`)
AND `<c:dLbls><c:showCatName val="1">`. The previous code only checked
axis visibility, which over-emitted on charts that hide labels per-
chart-element while keeping the axis tick frame.

Signed-off-by: Mike Mueller <mike@objektarium.de>
@marsmike marsmike changed the title from "fix(decompile): inherit placeholder default font-size from layout/master" to "feat(decompile): three XML deep-read fixes (chart-axis + alpha + font inheritance)" May 20, 2026
marsmike added 9 commits May 20, 2026 19:34
Showcase decks often colour alternating bars different hues to spotlight
a specific category (e.g. five bars with bars 1/3/5 in accent1 and
2/4 in accent2 to read as "this is the highlighted set"). PowerPoint
writes that information per data point as

  <c:ser>
    <c:dPt><c:idx val="N"/><c:spPr><a:solidFill>…</a:solidFill></c:spPr></c:dPt>
    <c:dPt><c:idx val="N+1"/>…</c:dPt>
    …
  </c:ser>

`_emit_bar_chart` read only the series-level `<c:spPr>` so every bar in
a series rendered the series colour, losing the highlight pattern.
`_emit_pie_chart` already reads dPt (added in earlier work); this brings
bar charts to parity.

The new tuple element `dpt_colors: dict[int, str]` maps category-index
to resolved fill, and every bar render site (horizontal-stacked,
horizontal-clustered, vertical-clustered) prefers dPt over series.

End-to-end on the showcase deck slide-56 (5 bars with alternating
accent1/accent2 fills) drops from 25.4% to 14.1% struct_diff — cleared
the 15% threshold.
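
A rough sketch of the per-data-point lookup, under simplifying assumptions: it handles only literal `<a:srgbClr>` fills (scheme colours would need theme resolution, which is omitted here), and `dpt_colors` / `bar_color` are hypothetical function names rather than the PR's code.

```python
import xml.etree.ElementTree as ET

NS = {
    "c": "http://schemas.openxmlformats.org/drawingml/2006/chart",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
}


def dpt_colors(ser: ET.Element) -> dict[int, str]:
    """Map data-point index to hex fill for each <c:dPt> with an srgbClr."""
    colors: dict[int, str] = {}
    for dpt in ser.findall("c:dPt", NS):
        idx = dpt.find("c:idx", NS)
        clr = dpt.find("c:spPr/a:solidFill/a:srgbClr", NS)
        if idx is not None and clr is not None:
            colors[int(idx.get("val"))] = clr.get("val")
    return colors


def bar_color(category_index: int, overrides: dict[int, str], series_color: str) -> str:
    """Prefer the per-point override, else fall back to the series colour."""
    return overrides.get(category_index, series_color)
```

Each bar render site would call `bar_color` with the bar's category index, so bars without a `<c:dPt>` entry keep the series fill.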

Signed-off-by: Mike Mueller <mike@objektarium.de>
`_shape_geometry_kind` only knew the four "core" presets (`ellipse`,
`line`/`straightConnector1`, `rect`, `roundRect`). Every other preset
fell through to `rect` and the shape rendered as its bounding box —
triangles became rectangles, arrows became rectangles, diamonds
became rectangles. Inventory across the showcase deck XML:

  rect       820        (handled)
  ellipse    107        (handled)
  line        59        (handled)
  triangle    19        → handled now
  arc         12        (todo: arc adjustments)
  upArrow      6        → handled now
  rtTriangle   6        → handled now
  rightArrow   5        → handled now
  …

Adds `_PRESET_PATH_PRESETS` enumerating the closed-polygon presets the
decompiler now synthesises an SVG `d` path for, and
`_preset_geom_path(preset, w, h)` returning that string in local
0..w × 0..h pixel coordinates. Covered presets:

  triangle, rtTriangle, diamond, parallelogram, trapezoid
  pentagon / homePlate, hexagon, heptagon, octagon, chevron
  rightArrow, leftArrow, upArrow, downArrow

All use PowerPoint's default unadjusted geometry — `<a:avLst>` slider
overrides are not threaded through. For convex polygons the default
form is visually faithful in the dominant case; arrows use 50% shaft /
50% barb (PowerPoint default).

`_shape_geometry_kind` routes these presets to `kind="shape"` and
`_emit_sp` synthesises `svg_path_d` when `_custgeom_svg_d` returns
None (no `<a:custGeom>` present but a known preset is).

End-to-end on the showcase deck slide-57 (a podium chart of three
triangles labeled "Text"): 16.8% → 2.9% struct_diff. Cleared the 15%
threshold from one preset-geometry source-XML read.
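
To illustrate the idea of synthesising an SVG `d` string from a preset name, here is a minimal sketch covering three of the listed presets in their default, unadjusted form. `preset_polygon_path` and `_UNIT_POLYGONS` are hypothetical names; the vertex tables are unit-square fractions scaled to the local 0..w × 0..h box.

```python
from typing import Optional

# Vertices as fractions of the bounding box (x right, y down).
_UNIT_POLYGONS = {
    "triangle": [(0.5, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "rtTriangle": [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "diamond": [(0.5, 0.0), (1.0, 0.5), (0.5, 1.0), (0.0, 0.5)],
}


def preset_polygon_path(preset: str, w: float, h: float) -> Optional[str]:
    """Return a closed SVG path for a known preset, or None if unknown."""
    points = _UNIT_POLYGONS.get(preset)
    if points is None:
        return None
    coords = [f"{x * w:g},{y * h:g}" for x, y in points]
    return "M " + " L ".join(coords) + " Z"
```

For a 100 × 80 box, `preset_polygon_path("triangle", 100, 80)` yields `M 50,0 L 100,80 L 0,80 Z`; an unknown preset returns `None`, so the caller can keep its existing `rect` fallback.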

Signed-off-by: Mike Mueller <mike@objektarium.de>
`_emit_bar_chart` always painted a legend swatch + series-name row,
even when the source chart carried no `<c:legend>` element at all.
Bare-bones showcase bar charts (with only bars + value labels)
emitted a phantom "Datenreihe 1" label at the bottom-left, often
wrapped mid-word inside the swatch slot.

Same for the chart title: emitted whenever `<c:title>` had any text,
ignoring the `<c:autoTitleDeleted val="1"/>` flag that says the
user explicitly removed the auto-generated title.

Now both fire only when their source elements actually exist:

- legend rows: only when `<c:legend>` is present
- chart title: only when `<c:title>` has text AND
  `<c:autoTitleDeleted>` is absent / val="0"

Pure XML deep-read; no other changes.
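
The two gates can be sketched as follows. This is an illustration only: `chart_furniture` is a hypothetical name, it takes the parsed `<c:chartSpace>` root, and treating a present `<c:autoTitleDeleted>` without `val` as true is an assumption.

```python
import xml.etree.ElementTree as ET

C = "http://schemas.openxmlformats.org/drawingml/2006/chart"
A = "http://schemas.openxmlformats.org/drawingml/2006/main"
NS = {"c": C}


def chart_furniture(chart_space: ET.Element) -> dict:
    """Decide whether to emit legend rows and a chart title."""
    chart = chart_space.find("c:chart", NS)
    title = chart.find("c:title", NS)
    # Concatenate every <a:t> run inside the title, if there is one.
    text = "" if title is None else "".join(
        t.text or "" for t in title.iter(f"{{{A}}}t")
    ).strip()
    deleted = chart.find("c:autoTitleDeleted", NS)
    title_deleted = deleted is not None and deleted.get("val", "1") in ("1", "true")
    return {
        "show_legend": chart.find("c:legend", NS) is not None,
        "title": text if text and not title_deleted else None,
    }
```

A chart with no `<c:legend>` and `autoTitleDeleted val="1"` then produces neither the phantom series label nor a title.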

Signed-off-by: Mike Mueller <mike@objektarium.de>
`_resolve_solid` (used by line strokes, chart series colours,
per-data-point `<c:dPt>` fills, and table cell borders) ignored every
colour modifier. PowerPoint encodes accent variations as

  <a:schemeClr val="accent1">
    <a:lumMod val="50000"/>
    <a:lumOff val="50000"/>
  </a:schemeClr>

(a 50%-mixed accent for a lighter swatch in a chart series). Without
applying the modifier we resolved every modified accent to the
unmodified base theme colour, losing per-series colour distinctions
across stacked-bar charts, doughnut slice variations, and any source
that tints theme colours for a colour ramp.

Inventory across the test deck: 1056 lumMod/lumOff modifiers and
0 tint/shade. Adds tint/shade support too for completeness — those
will fire on decks that use the alternative percent-of-source-colour
modifier syntax.

New helper `_apply_color_mods(rgb, color_el)` handles all four
(lumMod, lumOff, tint, shade) and is called from both `_resolve_solid`
and `_resolve_fill` after the base RGB is resolved. `_resolve_fill`'s
inline lumMod/lumOff block (already there) is now delegated to the
shared helper so srgb-side colour modifiers also apply (previously
only scheme-side did).
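
One way such a shared helper could look is sketched below. It is an approximation under stated assumptions, not the PR's `_apply_color_mods`: modifiers are passed as `(name, val)` pairs in document order, lumMod/lumOff act on HSL lightness, and tint/shade are modelled as plain sRGB mixes toward white and black, which may differ from PowerPoint's exact rendering.

```python
import colorsys


def apply_color_mods(hex_rgb: str, mods: list[tuple[str, int]]) -> str:
    """Apply lumMod/lumOff/tint/shade (val in 0..100000) to a hex colour."""
    r, g, b = (int(hex_rgb[i:i + 2], 16) / 255 for i in (0, 2, 4))
    for name, val in mods:
        frac = val / 100000
        if name in ("lumMod", "lumOff"):
            hue, lum, sat = colorsys.rgb_to_hls(r, g, b)
            # lumMod scales lightness; lumOff shifts it.
            lum = lum * frac if name == "lumMod" else lum + frac
            r, g, b = colorsys.hls_to_rgb(hue, min(1.0, max(0.0, lum)), sat)
        elif name == "tint":
            # Assumed model: keep `frac` of the colour, mix the rest with white.
            r, g, b = (c * frac + (1 - frac) for c in (r, g, b))
        elif name == "shade":
            # Assumed model: keep `frac` of the colour, mix the rest with black.
            r, g, b = (c * frac for c in (r, g, b))
    return "".join(f"{round(c * 255):02X}" for c in (r, g, b))
```

With the `lumMod 50000` + `lumOff 50000` pair from the example above, black resolves to mid-grey `808080`, and a colour with no modifiers passes through unchanged.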

Signed-off-by: Mike Mueller <mike@objektarium.de>
`_emit_pie_chart` only recognised the four cardinal `<c:legendPos>`
values ("l", "r", "t", "b") and fell back to "r" for corner positions
("tr", "tl", "br", "bl"). For the dominant case ("tr" - PowerPoint's
default-ish position) "r" was the right collapse anyway, but explicit
"tl"/"bl" (left-side corner legends) ended up rendering as right-side
legends, mirroring the pie horizontally.

Now the normalisation routes corner positions to their horizontal
axis: "l"/"tl"/"bl" → "l", "r"/"tr"/"br" → "r", "t"/"b" stay
vertical. The pie sizing logic then reserves the correct slot.

This is the minimal fix that keeps `_emit_pie_chart`'s coordinate
math unchanged. A future pass could honour the explicit
`<c:layout><c:manualLayout>` x/y/w/h fractions when present, but
PPTX charts that include manualLayout typically still set the
canonical legendPos, so collapsing to the axis is the right
first-order behaviour.
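
The normalisation amounts to a small lookup table. In this sketch `normalise_legend_pos` is a hypothetical name, and falling back to `"r"` for a missing or unknown value is an assumption carried over from the previous behaviour described above.

```python
from typing import Optional

_LEGEND_SIDE = {
    "l": "l", "tl": "l", "bl": "l",   # left-hand positions
    "r": "r", "tr": "r", "br": "r",   # right-hand positions
    "t": "t", "b": "b",               # vertical positions stay as-is
}


def normalise_legend_pos(pos: Optional[str]) -> str:
    """Collapse a <c:legendPos> value to the side the pie must leave free."""
    return _LEGEND_SIDE.get(pos or "", "r")
```

The pie sizing code can then branch on four values only, with `"tl"` and `"bl"` reserving the left slot instead of the right one.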

Signed-off-by: Mike Mueller <mike@objektarium.de>
Dispatched three parallel sub-agents to mine the source XML for the
remaining stubborn slides (24, 66, 98). Each reported specific
extraction gaps; this commit lands the three smallest framework
fixes that came out of those reports.

1. **`<c:firstSliceAng>` on pie/doughnut.** Source decks rotating
   the first slice into a fixed position (corporate showcase pattern
   for highlighting a specific category) carry
   `<c:firstSliceAng val="N"/>` on the chart element. The decompiler
   ignored it, so every slice's start angle was off by N°. Now
   applied as an additive rotation on top of the standard 12-o'clock
   start.

2. **Chart series + dPt colours skip nearest_token.** `_emit_pie_chart`
   and `_emit_bar_chart` resolved `<c:ser><c:spPr>` and per-data-point
   `<c:dPt><c:spPr>` colours through the brand palette via
   `_resolve_fill(..., palette)`. For brands lacking explicit
   chart-series tokens (the `blank` super-design, customer packs that
   override only accent), nearest_token collapsed source-distinct
   hues to the same brand token — every slice rendered the same
   colour. Resolving with `palette={}` lets the source hex propagate
   verbatim; the SVG-render path already passes literal hex through.

3. **Value labels in horizontal stacked / percentStacked bars.** The
   `_emit_bar_chart` value-label branch only fired in the standard
   side-by-side path; the stacked branch had no label emission even
   when `<c:showVal val="1"/>` was set. Labels now emit at the
   midpoint of each segment, gated on `show_val_labels` and segment
   width > 200000 EMU (skips labels in microscopic segments).

End-to-end on the 99-slide showcase deck, with framework fixes only
(no hand-patches on stubborn slides):

  mean struct_diff: 6.18% (93.82% quality)
  slides ≥85% quality: 96/99 (97%)
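
For item 1, the additive rotation can be illustrated with a small angle calculator. `slice_angles` is a hypothetical name; the sketch assumes angles are measured in degrees clockwise from 12 o'clock and that the values sum to a positive total.

```python
def slice_angles(values: list[float], first_slice_ang: float = 0.0) -> list[tuple[float, float]]:
    """Return (start, end) angles per slice, clockwise from 12 o'clock."""
    total = sum(values)
    angles = []
    start = first_slice_ang % 360  # rotation added on top of the 12-o'clock start
    for value in values:
        sweep = 360 * value / total
        angles.append((start, start + sweep))
        start += sweep
    return angles
```

With values `[1, 1, 2]` and `first_slice_ang=90`, the first slice runs from 90° to 180° instead of 0° to 90°, and every later slice shifts by the same amount.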

Signed-off-by: Mike Mueller <mike@objektarium.de>
Pie / doughnut chart frames carry an explicit plot-area layout when
the deck author has manually positioned the pie inside a larger
frame (the dominant case in showcase decks that pair a pie with a
legend block in the same frame):

  <c:plotArea>
    <c:layout>
      <c:manualLayout>
        <c:layoutTarget val="inner"/>
        <c:xMode val="edge"/><c:yMode val="edge"/>
        <c:x val="0.264"/><c:y val="0.130"/>
        <c:w val="0.449"/><c:h val="0.642"/>
      </c:manualLayout>
    </c:layout>
    <c:pieChart>...

`_emit_pie_chart` used a fixed heuristic (60% or 50% of frame width
depending on aspect ratio + legend presence) that put the pie center
on PowerPoint's default-layout position, ignoring the author's
explicit placement. For slide-66's chart14 (with manualLayout
x=0.264 y=0.130 w=0.449 h=0.642 — pie occupies the LEFT 70% of the
frame), the heuristic placed the pie centre-right of the frame and
sized it too small.

Now `_emit_pie_chart` reads `<c:plotArea>/<c:layout>/<c:manualLayout>`
x/y/w/h fractions and uses them directly when present. The radius
becomes half the plot-area's shortest side (the manualLayout already
reserved margin for labels); falls back to the aspect-based heuristic
when source uses `<c:layout/>` (auto-layout).

End-to-end on slide-66: 23.4% → 15.5% struct_diff in one extraction.
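
A minimal sketch of the read and the resulting geometry, assuming `xMode`/`yMode` are `edge` so the four values are fractions of the chart frame. `manual_layout` and `pie_geometry` are hypothetical names, not the PR's functions.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

NS = {"c": "http://schemas.openxmlformats.org/drawingml/2006/chart"}


def manual_layout(plot_area: ET.Element) -> Optional[Tuple[float, ...]]:
    """Return (x, y, w, h) fractions, or None when the chart is auto-laid-out."""
    ml = plot_area.find("c:layout/c:manualLayout", NS)
    if ml is None:
        return None
    try:
        return tuple(float(ml.find(f"c:{k}", NS).get("val")) for k in ("x", "y", "w", "h"))
    except AttributeError:  # one of the four fractions is missing
        return None


def pie_geometry(frame_w: float, frame_h: float, x: float, y: float, w: float, h: float):
    """Return (cx, cy, radius) for a pie filling the manual plot area."""
    plot_w, plot_h = w * frame_w, h * frame_h
    cx = x * frame_w + plot_w / 2
    cy = y * frame_h + plot_h / 2
    # Radius is half the plot area's shortest side.
    return cx, cy, min(plot_w, plot_h) / 2
```

When `manual_layout` returns `None`, the caller would keep the aspect-based heuristic.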

Signed-off-by: Mike Mueller <mike@objektarium.de>
`<a:tbl>` cells carrying literal phrases like "Placeholder text" or
"This text can be replaced with your own text." are the actual
content the source renders — they're visible in every cell of a
showcase table that demonstrates the layout. The placeholder-text
filter (designed to drop layout-level demo prompts) was suppressing
them too, leaving table cards empty in our render while source
showed them filled with the demonstrative copy.

Adds `Shape.skip_placeholder_filter` bool flag. `_emit_table` sets
it to True on every text shape it emits. `emit_dsl` passes those
shapes through unfiltered before running the standard placeholder
suppression on the rest.

Layout-level placeholder prompts in normal `<p:sp>` shapes are
still suppressed as before — the flag only short-circuits table
cells.

Marginal improvement on slide-24 (35.5% → 35.1%) since the
dominant residual diff there comes from the source's stair-step
geometry, not the table text. But it's the right semantic.
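
The short-circuit can be pictured like this. Only the `skip_placeholder_filter` flag comes from the change described above; the `Shape` fields, the phrase set, and `filter_placeholders` are hypothetical stand-ins for the real data model and filter.

```python
from dataclasses import dataclass

# Hypothetical phrase list; the real filter's contents are not shown here.
PLACEHOLDER_PHRASES = {"Placeholder text"}


@dataclass
class Shape:
    text: str
    skip_placeholder_filter: bool = False


def filter_placeholders(shapes: list[Shape]) -> list[Shape]:
    """Drop demo-prompt shapes unless they opted out of the filter."""
    return [
        s for s in shapes
        if s.skip_placeholder_filter or s.text.strip() not in PLACEHOLDER_PHRASES
    ]
```

A table cell emitted with the flag set survives even when its text matches a placeholder phrase, while an ordinary shape with the same text is still dropped.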

Signed-off-by: Mike Mueller <mike@objektarium.de>
This reverts commit 98d7232.

Signed-off-by: Mike Mueller <mike@objektarium.de>
@marsmike marsmike force-pushed the fix/layout-font-inheritance branch from 8e537ca to 86c974e Compare May 21, 2026 04:33
@marsmike marsmike merged commit d74053c into main May 21, 2026
2 checks passed
@marsmike marsmike deleted the fix/layout-font-inheritance branch May 21, 2026 04:34
marsmike added a commit that referenced this pull request May 21, 2026
…s, alpha, font inheritance, presets, color mods, …)

Integrates the upstream improvements from #23 into
the carved location (feinschliff-builder/.../pptx_svg_decompile.py).
Applied via 3-way merge against the restructure branch-base; our local
changes (namespace rewrites, discover_brands() refactor, brand-name
cleanup in comments) all preserved cleanly with no conflicts.

Upstream changes folded in:

- font-size inheritance from layout/master (<p:txStyles>, lstStyle)
- <a:alpha> on fills (pre-blend on white)
- bar chart <c:catAx>/<c:valAx>/<c:dLbls>/<c:delete>/<c:showVal>
- per-bar <c:dPt> colours
- prstGeom presets (triangle/rtTriangle/diamond/parallelogram/
  trapezoid/pentagon/hexagon/heptagon/octagon/chevron/{up,down,left,
  right}Arrow)
- bar-chart legend + title gated on source presence
- lumMod/lumOff/tint/shade in both color resolvers
- pie legend corner-position normalisation (tr/tl/br/bl)
- <c:firstSliceAng> on pie/doughnut
- chart series + dPt colours skip nearest_token to preserve hex
- value labels in horizontal stacked/percentStacked bars
- <c:plotArea>/<c:manualLayout> for pie sizing

Also drops the now-unused `os` import (env-path parsing was replaced
by discover_brands() in the restructure carve).

Net: +521 / -109 lines. Tests: 817 core + 307 builder = 1124 passing.
Ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Mike Mueller <mike@objektarium.de>