Parent quality EPIC. Root cause: verify.score uses SSIM only, which mostly measures luminance and structure. It returned 0.923 for the shiba output with no face and ignored red→brown color shifts. The loop therefore optimizes the wrong thing and reports a false 'pass'.
Fix: blend SSIM with a color-fidelity term, e.g. mean perceptual color distance (ΔE in Lab) or a per-region color-histogram difference, and use the blended value as the score the loop both optimizes and reports. Recalibrate the --quality default if needed. Optionally add a small-feature/edge term so deleted faces are penalized.
Acceptance: the shiba 'no-face' and watercolor 'desaturated' outputs score materially below a faithful conversion; faithful outputs still score high. Add tests with a color-shifted fixture.
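One possible shape for the blend described under Fix, as a minimal stdlib-only sketch rather than the actual verify.score change. It assumes the SSIM value is already computed elsewhere and that pixels arrive as 8-bit sRGB tuples. The names mean_delta_e, blended_score, color_weight and delta_e_cap are hypothetical, the defaults are uncalibrated placeholders, and CIE76 is used for ΔE only because it is the simplest formula (CIEDE2000 would track perception more closely).

```python
import math

# D65 reference white used by the XYZ -> Lab conversion.
_WHITE = (0.95047, 1.0, 1.08883)


def _srgb_to_linear(channel):
    """Undo the sRGB transfer curve for one 8-bit channel value."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _lab_f(t):
    """Piecewise cube-root function from the CIE Lab definition."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29


def rgb_to_lab(rgb):
    """Convert an (r, g, b) 8-bit sRGB tuple to CIE Lab (D65)."""
    r, g, b = (_srgb_to_linear(c) for c in rgb)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    fx, fy, fz = (_lab_f(v / w) for v, w in zip((x, y, z), _WHITE))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def mean_delta_e(ref_pixels, out_pixels):
    """Mean CIE76 color distance between two equal-length pixel sequences."""
    if not ref_pixels or len(ref_pixels) != len(out_pixels):
        raise ValueError("pixel sequences must be non-empty and equal length")
    total = 0.0
    for ref, out in zip(ref_pixels, out_pixels):
        total += math.dist(rgb_to_lab(ref), rgb_to_lab(out))
    return total / len(ref_pixels)


def blended_score(ssim, delta_e, color_weight=0.5, delta_e_cap=40.0):
    """Blend SSIM with a color-fidelity term in [0, 1].

    Hypothetical weighting: color fidelity is 1.0 at delta_e == 0 and
    falls linearly to 0.0 at delta_e_cap (placeholder, not calibrated).
    """
    color_fidelity = 1.0 - min(delta_e, delta_e_cap) / delta_e_cap
    return (1.0 - color_weight) * ssim + color_weight * color_fidelity
```

With a blend like this, an output that keeps its SSIM of 0.923 but shifts red toward brown scores lower than one with the same SSIM and unchanged colors, which is the property the color-shifted fixture test should assert. A missing face changes few pixels, so it would likely still need the optional small-feature/edge term.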