Merged
2 changes: 1 addition & 1 deletion backend/app/main.py
@@ -34,7 +34,7 @@ async def lifespan(app: FastAPI):


app = FastAPI(
-    title="Flair2 — AI Campaign Studio",
+    title="Flair2 — AI Script Studio",
    version="0.1.0",
    lifespan=lifespan,
)
20 changes: 2 additions & 18 deletions docs/lessons/01-why-this-system-exists.md
@@ -4,9 +4,9 @@

## The product

-Flair2 is an AI Campaign Studio. A user enters a brand name and a creator profile. The system analyzes a dataset of viral videos, extracts structural patterns, generates candidate marketing scripts based on those patterns, has 100 simulated personas vote on the scripts, ranks the winners, and personalizes the top scripts to the creator's voice. Then the user watches the results appear in real time.
+Flair2 is an AI Script Studio for short-form video. A creator enters a profile — tone, niche, audience, catchphrases — and with one click the system returns **ten ready-to-shoot video scripts** to pick from. Under the hood it analyzes 100 real viral TikToks, extracts structural patterns, generates candidate scripts shaped by those patterns, has a panel of 42 personas vote, ranks the winners, and personalizes the top scripts into the creator's voice. The user watches the results appear in real time.

-One click. Six stages. Roughly 261 LLM API calls. ~500,000 tokens of work. The user sees live progress as each stage completes.
+One click. Six stages. Roughly 260 LLM API calls. ~500,000 tokens of work. The user sees live progress as each stage completes.

That description — one click, many calls, live progress — is the entire reason a distributed architecture exists here. If it were one LLM call returning one result, you'd write a Python script.

@@ -44,22 +44,6 @@ A useful discipline: before accepting a complex design, try to break it with sim

Each simpler approach fails on at least one of the three forces. That's why the architecture is what it is.

-## V1 to V2: what changed and why
-
-Flair2 is a V2 rewrite of an earlier hackathon prototype ([gemini-social-asset](https://github.com/yangyang-how/gemini-social-asset)). Understanding what V1 got wrong tells you what V2 is designed to prevent:
-
-| V1 | V2 | Why it changed |
-|----|-----|---------------|
-| Monolithic `main.py` | Separated modules (`api/`, `pipeline/`, `workers/`, `infra/`) | One file with everything means you can't change one part without risking all parts. Module boundaries are change boundaries. |
-| In-memory state | Redis-backed state | Process dies, state dies. Redis survives process restarts. |
-| Sequential pipeline | Concurrent workers with MapReduce | S1 analyzing 100 videos one-by-one takes 100x longer than analyzing them concurrently. |
-| Gemini only | Pluggable provider registry (Kimi is live) | Gemini had intermittent 500s and rate limit issues. The registry pattern made switching to Kimi a one-line change. |
-| No tests | pytest with unit + integration + experiment coverage | V1 "worked" locally and broke in production. Tests are how you know it still works after changes. |
-| Google Cloud Run | AWS (ECS Fargate, ElastiCache, ALB, S3) | Course requirement + richer distributed systems story. |
-| `.DS_Store` and `__pycache__` committed | `.gitignore` from day one | Hygiene. Never commit generated files. |
-
-The pattern to notice: **every V2 decision exists to prevent a specific V1 failure mode.** When you design systems, you should be able to name the failure each design choice prevents. If you can't, you're adding complexity without justification.
-
## The two-person team

Flair2 was built by two people — Sam (pipeline + frontend) and Jess (infrastructure + distributed systems). This shaped the architecture:
10 changes: 4 additions & 6 deletions docs/lessons/02-the-deployed-architecture.md
@@ -4,11 +4,9 @@

## A note on honesty

-The original architecture doc (`design/architecture.md`) says the backend deploys to Railway and the frontend to Cloudflare Pages. In reality, the backend runs on AWS ECS Fargate with ElastiCache Redis, and the frontend is an S3 static website. The Gemini API is mentioned throughout the design docs and experiment reports — in reality, the production provider is Kimi (Moonshot AI), accessed over Kimi's coding endpoint using the Anthropic Messages API. The architecture doc hasn't been updated.
+The original architecture doc (`design/architecture.md`) describes one topology; the deployed system differs in several places. Design docs freeze at the moment they were written — `git log`, `grep`, and `terraform plan` are more trustworthy than any markdown file.

-This is the single most common documentation failure in software: **design docs freeze at the point they were written.** Always verify against the code and the infrastructure. `git log`, `grep`, and `terraform plan` are more trustworthy than any markdown file.
-
-This article describes the real system as of April 2026, verified against code and PR history.
+This article describes the real system as of April 2026, verified against code and infrastructure.

## The topology

@@ -110,9 +108,9 @@ This article describes the real system as of April 2026, verified against code a
**File:** `backend/app/providers/kimi.py`
**File:** `backend/app/providers/registry.py`

-**How it connects:** Kimi's coding endpoint speaks the **Anthropic Messages API** at `/coding/v1/messages`. The `KimiProvider` uses the `AsyncAnthropic` client with `base_url="https://api.kimi.com/coding"` and a `default_headers` override for User-Agent (Kimi's endpoint whitelists approved coding agents — Claude Code, Kimi CLI, etc.). An earlier version of the endpoint spoke OpenAI's `chat/completions` schema; that surface went dead in early 2026 and we migrated to Anthropic's SDK. See [Article 15](15-kimi-and-openai-compatibility.md) for the migration story.
+**How it connects:** Kimi's coding endpoint speaks the **Anthropic Messages API** at `/coding/v1/messages`. The `KimiProvider` uses the `AsyncAnthropic` client with `base_url="https://api.kimi.com/coding"` and a `default_headers` override for User-Agent (Kimi's endpoint whitelists approved coding agents — Claude Code, Kimi CLI, etc.). See [Article 15](15-kimi-and-openai-compatibility.md) for the full wiring.

-**The migration stories:** Plural, now. First Gemini → Kimi (PR #95: "remove Gemini secret requirement"), driven by Gemini's intermittent 500s and rate-limit issues. Then OpenAI SDK → Anthropic SDK, driven by Kimi deprecating their OpenAI-compatible shim. Both migrations only touched `providers/kimi.py` because every stage calls through the `ReasoningProvider` Protocol. The `GeminiProvider` class still exists; it's just not wired up in production.
+**Why provider choice is isolated:** every stage calls through the `ReasoningProvider` Protocol, so the specific SDK and endpoint shape live in exactly one file (`providers/kimi.py`). A `GeminiProvider` class also exists in the registry but isn't wired up in production — it's available if you pass `reasoning_model: "gemini"` at run time.

### Frontend (Astro + React on S3)

2 changes: 1 addition & 1 deletion docs/lessons/04-the-api-layer.md
@@ -16,7 +16,7 @@ This pattern has a name: **non-blocking request handling**. Learn to spot it eve

```python
app = FastAPI(
-    title="Flair2 — AI Campaign Studio",
+    title="Flair2 — AI Script Studio",
    version="0.1.0",
    lifespan=lifespan,
)
30 changes: 13 additions & 17 deletions docs/lessons/14-the-provider-abstraction.md
@@ -1,16 +1,12 @@
# 14. The Provider Abstraction

-> Flair2 switched its entire LLM backend from Gemini to Kimi in one PR. The reason it was possible is a 30-line abstraction layer that most developers would dismiss as "premature." This article explains why it wasn't, and teaches you the design pattern behind it.
+> No code in Flair2's pipeline names a specific LLM vendor. Stages ask a `ReasoningProvider` for text; the concrete provider is chosen at runtime from configuration. This article explains why that shape matters and how the ~30 lines that make it work are structured.

-## The payoff story
+## The shape of the abstraction

-PR #95: "chore: remove Gemini secret requirement, Kimi-only deployment."
+The pipeline never writes `KimiProvider` or `GeminiProvider`. It writes `ReasoningProvider`. The specific implementation is chosen at runtime — per pipeline run — from the `reasoning_model` field in the request config.

-Here's what changed: the Terraform config stopped provisioning a Gemini API key, and the default reasoning model in the frontend was set to "kimi." That's it. No stage functions were modified. No task definitions changed. No tests broke (except the ones specifically testing GeminiProvider).
-
-The switch was possible because no code in the pipeline says `KimiProvider` or `GeminiProvider`. It says `ReasoningProvider`. The specific implementation is chosen at runtime, from configuration.
-
-**Cost of the abstraction:** ~30 lines of code (the Protocol class + the registry). **Payoff:** hours of migration work avoided, plus the ability to switch providers again in the future without touching business logic.
+**Cost:** ~30 lines of code (the Protocol class + the registry). **What it buys:** callers don't depend on SDKs, credentials, or endpoint shapes. Stage code stays readable and focused on prompts and parsing. Swapping or adding a provider is a one-file change, never a cross-codebase refactor.

## The Protocol class

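A minimal sketch of the shape described in this hunk, assuming the Protocol exposes a single async `generate_text(prompt, schema) -> str` method. Flair2's actual `ReasoningProvider` may differ in signature and details; `EchoProvider`, `pick_hook`, and the JSON reply format are hypothetical, made up for illustration. The point it shows: the stage function is typed against the Protocol and never imports a vendor SDK, and a provider satisfies the Protocol structurally, without inheriting from it.

```python
from __future__ import annotations

import asyncio
import json
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ReasoningProvider(Protocol):
    """Structural interface: any object with this method qualifies, no subclassing."""

    async def generate_text(self, prompt: str, schema: dict[str, Any] | None = None) -> str: ...


class EchoProvider:
    """Hypothetical test double. Note that it does not inherit from ReasoningProvider."""

    async def generate_text(self, prompt: str, schema: dict[str, Any] | None = None) -> str:
        # Pretend the model replied with JSON shaped like the requested schema.
        return json.dumps({"hook": f"echo: {prompt}"})


async def pick_hook(provider: ReasoningProvider, topic: str) -> str:
    """Hypothetical stage: build a prompt, call the provider, parse the reply.

    No vendor name, SDK, credential, or endpoint appears in this function.
    """
    schema = {"type": "object", "properties": {"hook": {"type": "string"}}}
    raw = await provider.generate_text(f"Write a hook about {topic}", schema)
    return json.loads(raw)["hook"]


if __name__ == "__main__":
    print(asyncio.run(pick_hook(EchoProvider(), "cold brew")))
```

Swapping vendors means passing a different object to `pick_hook`; the stage body stays the same. Keep in mind that `runtime_checkable` only checks that the method exists, not its signature, so a static type checker is what actually enforces the contract.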
@@ -161,7 +157,7 @@ class KimiProvider:
return self._client
```

-Kimi speaks the **Anthropic Messages API** on its coding endpoint. We use the `AsyncAnthropic` client with a custom `base_url`. (An earlier version of Kimi spoke OpenAI's chat/completions schema instead — that surface went dead and the client had to migrate. [Article 15](15-kimi-and-openai-compatibility.md) covers the migration history and why the abstraction let us do it with a ~30-line change.)
+Kimi speaks the **Anthropic Messages API** on its coding endpoint. We use the `AsyncAnthropic` client with a custom `base_url` and a User-Agent header Kimi's endpoint requires. [Article 15](15-kimi-and-openai-compatibility.md) covers the full wiring and the User-Agent whitelist.

### GeminiProvider (`providers/gemini.py`)

@@ -218,19 +214,19 @@ A second Protocol for video generation. Currently has no implementations — the

The registry already has a `_video_providers` dict and a `register_video` function. Adding a video provider would follow the exact same pattern as the reasoning providers.

-## When abstraction is premature vs prescient
+## When abstraction is premature vs justified

-The common objection: "You only have two providers. This is premature abstraction. Just use Kimi directly."
+The common objection: "Two providers isn't enough to justify an interface. Just call Kimi directly."

-Here's why it wasn't premature:
+Here's why the abstraction earns its keep:

-1. **The switch actually happened.** Gemini → Kimi migration was driven by Gemini's intermittent 500s and rate limit issues. Without the abstraction, every stage function would have needed modification.
+1. **The interface is dictated by the domain, not speculation.** Every LLM provider takes a prompt and returns text. The shape of `generate_text(prompt, schema) -> str` isn't a guess — it's the only shape the domain allows.

-2. **The cost was trivial.** Protocol class: 15 lines. Registry: 20 lines. Zero runtime overhead. The abstraction doesn't add complexity to the codebase — it removes it from every stage function.
+2. **The cost is trivial.** Protocol class: 15 lines. Registry: 20 lines. Zero runtime overhead. The abstraction doesn't add complexity to the codebase — it removes it from every stage function, which no longer has to know or care about SDKs, auth headers, or endpoint shapes.

-3. **The interface was obvious.** All LLM providers do the same thing: take a prompt, return text. The interface didn't require speculation — it was dictated by the domain.
+3. **Provider wiring is inherently churn-heavy.** LLM APIs shift — endpoints move, SDKs get deprecated, auth policies tighten. Keeping that churn confined to one file is the whole point.

-**When abstraction IS premature:** when you're guessing at the interface. If you don't know what the methods should look like, an abstraction will be wrong. Wait until you have two concrete implementations and extract the commonality.
+**When abstraction IS premature:** when you're guessing at the interface. If you don't know what the methods should look like, an abstraction will be wrong. Wait until you have one concrete implementation you trust, then name the interface it demanded.

**Rule of thumb:** abstract when (a) the interface is obvious from the domain, (b) you have at least one concrete implementation, and (c) the cost of the abstraction is small relative to the cost of changing callers later.

@@ -240,7 +236,7 @@ Here's why it wasn't premature:

2. **The Registry pattern is a Strategy + Factory hybrid.** Dictionary mapping names to classes. Adding a provider is one line. Listing providers is one function call.

-3. **Abstraction pays for itself when the switch actually happens.** The Gemini → Kimi migration is the proof. Without the abstraction, it would have been a week of find-and-replace across stage functions, tests, and error handling.
+3. **Abstraction pays for itself by confining churn.** Provider wiring shifts — endpoints move, SDKs get deprecated, auth policies tighten. Without the Protocol + Registry, each shift would ripple across every stage function, test, and error handler. With it, the churn lives in exactly one file.

4. **The interface should be dictated by the domain, not the implementation.** All LLM providers take prompts and return text. That's the interface. Implementation details (SDK choice, auth mechanism, retry strategy) are hidden behind it.
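
A minimal sketch of the Registry from point 2 (name-to-class table plus a factory), assuming a module-level dict and a plain registration function. The source confirms only a `_video_providers` dict and a `register_video` function on the video side; `_reasoning_providers`, `register_reasoning`, `list_reasoning_providers`, `get_reasoning_provider`, `run_once`, `DEFAULT_MODEL`, and the `Fake*` classes are hypothetical names for illustration, not Flair2's actual API.

```python
from __future__ import annotations

import asyncio
from typing import Protocol


class ReasoningProvider(Protocol):
    async def generate_text(self, prompt: str, schema: dict | None = None) -> str: ...


# Strategy table: provider name -> provider class (hypothetical contents).
_reasoning_providers: dict[str, type[ReasoningProvider]] = {}

# Assumption: mirrors the "kimi" frontend default mentioned in this PR.
DEFAULT_MODEL = "kimi"


def register_reasoning(name: str, cls: type[ReasoningProvider]) -> None:
    """Adding a provider is one call; refuse to silently overwrite an existing name."""
    if name in _reasoning_providers:
        raise ValueError(f"reasoning provider {name!r} is already registered")
    _reasoning_providers[name] = cls


def list_reasoning_providers() -> list[str]:
    return sorted(_reasoning_providers)


def get_reasoning_provider(name: str) -> ReasoningProvider:
    """Factory half: turn a config string into a fresh provider instance."""
    cls = _reasoning_providers.get(name)
    if cls is None:
        raise ValueError(f"unknown reasoning_model {name!r}; known: {list_reasoning_providers()}")
    return cls()


class FakeKimi:
    async def generate_text(self, prompt: str, schema: dict | None = None) -> str:
        return f"kimi:{prompt}"


class FakeGemini:
    async def generate_text(self, prompt: str, schema: dict | None = None) -> str:
        return f"gemini:{prompt}"


register_reasoning("kimi", FakeKimi)
register_reasoning("gemini", FakeGemini)


def run_once(config: dict, prompt: str) -> str:
    """Hypothetical per-run entry point: the provider is chosen from this run's config."""
    provider = get_reasoning_provider(config.get("reasoning_model", DEFAULT_MODEL))
    return asyncio.run(provider.generate_text(prompt))
```

With this shape, adding a third provider is one `register_reasoning(...)` line and no caller changes. An unknown `reasoning_model` fails at the start of a run instead of partway through a stage.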
