An automated DevOps solution that captures incoming newsletters from Gmail, sanitizes them, and archives them as a static, responsive website hosted on GitHub Pages.
This project is designed for automated ingestion and high-fidelity archival. Below is the technical breakdown for developers and LLMs.
```mermaid
graph TD;
    A[Gmail Inbox] -- "IMAP Fetch" --> B[process_email.py];
    B -- "Raw Content" --> C[src/parser.py];
    subgraph "Parsing & Sanitization"
        C -- "Soup Parsing" --> C1[Remove Fwd/Quoted Headers];
        C1 -- "Asset Localizing" --> C2[Download Images to /assets];
        C2 -- "Metadata Extraction" --> C3[Detect CRM, Preheader, Reading Time];
        C3 -- "Link Auditing" --> C4[Extract Domain/Tracking Info];
    end
    C4 -- "Clean Data JSON" --> D[src/generator.py];
    D -- "Jinja2 Templates" --> E[docs/ archives];
    E -- "Deployment" --> F[GitHub Pages];
```
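The "Asset Localizing" step needs a stable local file name for every remote image. The sketch below is one way to do that, assuming a content-addressed naming scheme (a SHA-256 digest of the image URL) that is not taken from `src/parser.py`. `local_asset_path` and `KNOWN_EXTENSIONS` are hypothetical names, and the actual HTTP download is left out.

```python
import hashlib
import posixpath
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical whitelist: extensions kept as-is; anything else becomes ".img".
KNOWN_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg"}


def local_asset_path(image_url: str, assets_dir: str = "docs/assets") -> str:
    """Map a remote image URL to a deterministic path under assets_dir.

    The file name is the first 16 hex chars of SHA-256(url), so the same
    URL always maps to the same file and re-runs produce identical paths.
    """
    digest = hashlib.sha256(image_url.encode("utf-8")).hexdigest()[:16]
    # Take the extension from the URL path only (query string ignored).
    suffix = PurePosixPath(urlparse(image_url).path).suffix.lower()
    if suffix not in KNOWN_EXTENSIONS:
        suffix = ".img"
    return posixpath.join(assets_dir, digest + suffix)
```

Hashing the full URL, rather than reusing the remote file name, avoids collisions between newsletters that all ship a `logo.png`.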
| Component | File Path | Responsibility |
|---|---|---|
| Orchestrator | `process_email.py` | Main entry point. Handles Gmail auth, IMAP fetching, and triggers parsing/generation. |
| Parser | `src/parser.py` | `EmailParser` class. Handles BeautifulSoup logic, image localization, tracking pixel detection, and link metadata extraction (domain/tracking). |
| Generator | `src/generator.py` | Rendering logic using Jinja2 templates (`templates/`). Handles the creation of `index.html` and the individual `viewer.html` files. |
| Viewer UI | `templates/viewer.html` | The responsive dashboard for emails. Contains the fixed sidebar, mobile simulator, and link interaction logic (Spotlight, Overlays). |
| Theme & UX | `src/assets/js/main.js` | Client-side logic for theme toggling, search filtering, and "Smart Inversion" dark mode. |
| Manual Injector | `injector.py` | Streamlit app for out-of-band archival. Fixes lazy-loading and relative paths. |
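Tracking pixel detection can be approximated with the standard library's `html.parser`. This sketch uses a common heuristic, which is an assumption and not necessarily the rule `EmailParser` applies: an `<img>` is flagged when both `width` and `height` are 0 or 1 px, or when its inline style contains `display:none`. `TrackingPixelFinder` and `find_tracking_pixels` are hypothetical names; the project itself uses BeautifulSoup.

```python
from html.parser import HTMLParser


def _is_tiny(value):
    """True if a width/height attribute value is 0 or 1 pixel."""
    if value is None:
        return False
    return value.strip().lower().removesuffix("px") in {"0", "1"}


class TrackingPixelFinder(HTMLParser):
    """Collect the src of <img> tags that look like open-tracking pixels."""

    def __init__(self):
        super().__init__()
        self.pixels = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)  # attribute values may be None
        style = (attr.get("style") or "").replace(" ", "").lower()
        tiny = _is_tiny(attr.get("width")) and _is_tiny(attr.get("height"))
        hidden = "display:none" in style
        if (tiny or hidden) and attr.get("src"):
            self.pixels.append(attr["src"])


def find_tracking_pixels(html):
    """Return pixel URLs in document order."""
    finder = TrackingPixelFinder()
    finder.feed(html)
    finder.close()
    return finder.pixels
```

Self-closing `<img ... />` tags are handled too, because `HTMLParser.handle_startendtag` delegates to `handle_starttag` by default.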
For dark mode, instead of re-theming unknown email HTML with complex CSS, we apply a single global filter to the email iframe: `filter: invert(1) hue-rotate(180deg)`.
- Spotlight Problem: The filter also inverts shadows and highlights. We therefore use "pre-inverted" CSS variables in `viewer.html` so that, once the filter is applied, they flip back to the intended colors (e.g., purple inverts to green highlights).
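`invert(1)` maps each color channel c to 255 - c. Because that mapping is its own inverse, the "pre-inverted" value for a target color is simply the inverse of the target. The helper below (hypothetical name `invert_hex`) models only the `invert(1)` half of the filter; the following `hue-rotate(180deg)` step is not modeled, so treat the result as a starting value to tune by eye rather than an exact prediction of the rendered color.

```python
def invert_hex(color: str) -> str:
    """Apply the per-channel CSS invert(1) mapping c -> 255 - c to #RRGGBB."""
    color = color.lstrip("#")
    if len(color) != 6:
        raise ValueError("expected a #RRGGBB color")
    channels = [int(color[i:i + 2], 16) for i in range(0, 6, 2)]
    return "#" + "".join(f"{255 - c:02x}" for c in channels)


# Pre-inverting the brand primary: invert_hex("#0aaa8e") -> "#f55571"
```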
Links are parsed into structured objects:
- Header: Index, Clean Domain, Tracking Tag.
- Body: Anchor text.
- URL Zone: Monospace URL + clipboard interaction with visual feedback.
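A minimal sketch of such a link object, using only `urllib.parse`. It assumes the tracking tag is read from `utm_*` query parameters and that the clean domain is the lowercase host without a leading `www.`; `LinkInfo`, `build_links`, and `TRACKING_PARAMS` are hypothetical names, not the fields `EmailParser` actually returns.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hypothetical: query parameters treated as tracking tags, in priority order.
TRACKING_PARAMS = ("utm_campaign", "utm_source", "utm_medium")


@dataclass
class LinkInfo:
    index: int               # header: 1-based position in the email
    domain: str              # header: host without a leading "www."
    tracking: Optional[str]  # header: first tracking value found, if any
    text: str                # body: anchor text, whitespace-normalized
    url: str                 # URL zone: full URL shown in monospace


def build_links(pairs):
    """pairs: iterable of (anchor_text, href) in document order."""
    links = []
    for i, (text, href) in enumerate(pairs, start=1):
        parsed = urlparse(href)
        host = (parsed.hostname or "").removeprefix("www.")  # hostname is lowercased
        query = parse_qs(parsed.query)
        tracking = next((query[p][0] for p in TRACKING_PARAMS if p in query), None)
        links.append(LinkInfo(i, host, tracking, " ".join(text.split()), href))
    return links
```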
Since email HTML is sandboxed in an iframe, link numbering badges are rendered in the parent window using absolute positioning calculated via `getBoundingClientRect()`.
- Clipping Logic: Badges are hidden if the target link scrolls out of the iframe viewport.
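The badge placement boils down to adding two rectangles: a link's `getBoundingClientRect()` inside the iframe is relative to the iframe's viewport, while the iframe's own rect is relative to the parent viewport. The Python sketch below models that arithmetic and the clipping rule. It assumes the iframe has no border or padding and that badges are absolutely positioned against the parent document (hence the scroll offsets); `Rect` and `badge_position` are hypothetical stand-ins for the browser-side code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rect:
    """Mirror of the DOMRect fields returned by getBoundingClientRect()."""
    left: float
    top: float
    width: float
    height: float

    @property
    def right(self):
        return self.left + self.width

    @property
    def bottom(self):
        return self.top + self.height


def badge_position(link: Rect, frame: Rect, scroll_x: float = 0.0,
                   scroll_y: float = 0.0) -> Optional[tuple[float, float]]:
    """Parent-document (x, y) for a link's badge, or None when clipped."""
    # Clipping: hide the badge once the link is entirely outside the
    # iframe's visible area (frame-viewport coordinates start at 0).
    if link.bottom <= 0 or link.top >= frame.height:
        return None
    if link.right <= 0 or link.left >= frame.width:
        return None
    # Translate frame-viewport coordinates into parent-document coordinates.
    return (frame.left + link.left + scroll_x, frame.top + link.top + scroll_y)
```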
- Parser Changes: When adding new metadata, update the `EmailParser` in `src/parser.py` first. Ensure the returned dictionary matches what the Jinja2 templates expect.
- Template Updates: Changes to `templates/viewer.html` must preserve the JS-based sidebar logic. Be careful with variable escaping when injecting JSON into `<script>` tags.
- Asset Isolation: All archived images MUST be saved to `docs/assets/` to ensure offline/long-term availability.
- CSS Architecture: Use the getinside Design System palette defined in `src/assets/css/style.css` and documented in `DESIGN-SYSTEM.md`. Key tokens: `#0aaa8e` brand primary (light), `#6AE7C8` mint accent, `#F7F6F3` light bg, `#1b1b1f` dark bg. After editing CSS, copy `src/assets/css/style.css` → `docs/assets/css/style.css`.
- Gmail: Use an App Password and the label `Github/archive-newsletters`.
- Secrets: Set `GMAIL_USER` and `GMAIL_PASSWORD` in the GitHub repository secrets.
- Streamlit (optional): Run `streamlit run injector.py` for manual uploads.
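A minimal fetch loop with `imaplib`, assuming the workflow maps the `GMAIL_USER` and `GMAIL_PASSWORD` secrets into environment variables and that IMAP is enabled for the account. Gmail exposes labels as IMAP folders, so the label is selected like a mailbox. `fetch_unseen` and `connect` are hypothetical names; `process_email.py` may structure this differently.

```python
import email
import imaplib
import os
from email import policy

LABEL = "Github/archive-newsletters"  # Gmail label, exposed as an IMAP folder


def connect():
    """Open an authenticated IMAP session using the App Password."""
    conn = imaplib.IMAP4_SSL("imap.gmail.com")
    conn.login(os.environ["GMAIL_USER"], os.environ["GMAIL_PASSWORD"])
    return conn


def fetch_unseen(conn, label: str = LABEL) -> list:
    """Return unseen messages under `label` as EmailMessage objects."""
    typ, _ = conn.select(f'"{label}"')  # quoted in case the name has spaces
    if typ != "OK":
        raise RuntimeError(f"cannot select {label!r}")
    typ, data = conn.search(None, "UNSEEN")
    messages = []
    for num in data[0].split():
        typ, parts = conn.fetch(num, "(RFC822)")
        for part in parts:
            # Message payloads arrive as (b'1 (RFC822 {n}', raw_bytes) tuples.
            if isinstance(part, tuple):
                messages.append(email.message_from_bytes(part[1], policy=policy.default))
    return messages
```

Fetching `(RFC822)` rather than `(BODY.PEEK[])` makes the server set the `\Seen` flag, so a later `UNSEEN` search skips messages that were already archived.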
- Author: Benoît Prentout
- License: MIT
- Contents remain the property of their respective authors. This is a technical demonstration.