From 0b241f81e26c5f973a6e831fdd734a4e75be82b1 Mon Sep 17 00:00:00 2001
From: Sam Wu <37496494+0b00101111@users.noreply.github.com>
Date: Mon, 20 Apr 2026 10:28:41 -0700
Subject: [PATCH] docs(lessons): explain current state only, drop V1/V2 comparisons
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Per the updated framing, the teaching series should describe the system as it is today — design decisions, architecture, tools, and how they work — without narrating prior versions. Also aligns the API metadata with the public-facing 'Script Studio' rename.

Lessons touched
- 01 Why This System Exists: drop the V1→V2 change table; refresh the product description to today's artifact (ten short-form video scripts, 42 personas, 260 LLM calls).
- 02 Deployed Architecture: drop the 'Gemini → Kimi' and 'OpenAI → Anthropic' migration narratives; keep the current Kimi wiring and the docs-drift meta-point.
- 14 Provider Abstraction: reframe around why the ReasoningProvider Protocol shape exists today (domain-dictated interface, churn confinement). Remove the 'payoff story' narrating the past swap.
- 15 Kimi chapter: rewrite from 'two-migrations' arc into 'how the Kimi provider is wired today' — Anthropic Messages API endpoint, base_url quirks, UA allowlist, content-block responses, lazy client, retry separation.
- 23 Frontend Stack: drop the 'V1 design language PR narrative', explain the current design-tokens / Tailwind split instead. Keep the crypto.randomUUID fallback with the actual fallback code.
- README: update the lesson 15 index row to match the new title.

Backend
- backend/app/main.py: rename FastAPI title from 'AI Campaign Studio' to 'AI Script Studio' so the OpenAPI metadata (shown at /docs) matches the product rename already applied to the frontend and deck.

Verification
- backend: 157 passed, 32 skipped; ruff clean on app/main.py.
- grep: no 'Campaign Studio' or V1/V2 comparison phrases remain in backend/ or docs/lessons/.
--- backend/app/main.py | 2 +- docs/lessons/01-why-this-system-exists.md | 20 +----- docs/lessons/02-the-deployed-architecture.md | 10 ++- docs/lessons/04-the-api-layer.md | 2 +- docs/lessons/14-the-provider-abstraction.md | 30 ++++---- .../15-kimi-and-openai-compatibility.md | 68 +++++++++---------- docs/lessons/23-the-frontend-stack.md | 36 ++++++---- docs/lessons/README.md | 2 +- 8 files changed, 76 insertions(+), 94 deletions(-) diff --git a/backend/app/main.py b/backend/app/main.py index c7260f9..68c89f9 100644 --- a/backend/app/main.py +++ b/backend/app/main.py @@ -34,7 +34,7 @@ async def lifespan(app: FastAPI): app = FastAPI( - title="Flair2 — AI Campaign Studio", + title="Flair2 — AI Script Studio", version="0.1.0", lifespan=lifespan, ) diff --git a/docs/lessons/01-why-this-system-exists.md b/docs/lessons/01-why-this-system-exists.md index 57dd8dc..c3538f8 100644 --- a/docs/lessons/01-why-this-system-exists.md +++ b/docs/lessons/01-why-this-system-exists.md @@ -4,9 +4,9 @@ ## The product -Flair2 is an AI Campaign Studio. A user enters a brand name and a creator profile. The system analyzes a dataset of viral videos, extracts structural patterns, generates candidate marketing scripts based on those patterns, has 100 simulated personas vote on the scripts, ranks the winners, and personalizes the top scripts to the creator's voice. Then the user watches the results appear in real time. +Flair2 is an AI Script Studio for short-form video. A creator enters a profile — tone, niche, audience, catchphrases — and with one click the system returns **ten ready-to-shoot video scripts** to pick from. Under the hood it analyzes 100 real viral TikToks, extracts structural patterns, generates candidate scripts shaped by those patterns, has a panel of 42 personas vote, ranks the winners, and personalizes the top scripts into the creator's voice. The user watches the results appear in real time. -One click. Six stages. Roughly 261 LLM API calls. ~500,000 tokens of work. 
The user sees live progress as each stage completes. +One click. Six stages. Roughly 260 LLM API calls. ~500,000 tokens of work. The user sees live progress as each stage completes. That description — one click, many calls, live progress — is the entire reason a distributed architecture exists here. If it were one LLM call returning one result, you'd write a Python script. @@ -44,22 +44,6 @@ A useful discipline: before accepting a complex design, try to break it with sim Each simpler approach fails on at least one of the three forces. That's why the architecture is what it is. -## V1 to V2: what changed and why - -Flair2 is a V2 rewrite of an earlier hackathon prototype ([gemini-social-asset](https://github.com/yangyang-how/gemini-social-asset)). Understanding what V1 got wrong tells you what V2 is designed to prevent: - -| V1 | V2 | Why it changed | -|----|-----|---------------| -| Monolithic `main.py` | Separated modules (`api/`, `pipeline/`, `workers/`, `infra/`) | One file with everything means you can't change one part without risking all parts. Module boundaries are change boundaries. | -| In-memory state | Redis-backed state | Process dies, state dies. Redis survives process restarts. | -| Sequential pipeline | Concurrent workers with MapReduce | S1 analyzing 100 videos one-by-one takes 100x longer than analyzing them concurrently. | -| Gemini only | Pluggable provider registry (Kimi is live) | Gemini had intermittent 500s and rate limit issues. The registry pattern made switching to Kimi a one-line change. | -| No tests | pytest with unit + integration + experiment coverage | V1 "worked" locally and broke in production. Tests are how you know it still works after changes. | -| Google Cloud Run | AWS (ECS Fargate, ElastiCache, ALB, S3) | Course requirement + richer distributed systems story. | -| `.DS_Store` and `__pycache__` committed | `.gitignore` from day one | Hygiene. Never commit generated files. 
| - -The pattern to notice: **every V2 decision exists to prevent a specific V1 failure mode.** When you design systems, you should be able to name the failure each design choice prevents. If you can't, you're adding complexity without justification. - ## The two-person team Flair2 was built by two people — Sam (pipeline + frontend) and Jess (infrastructure + distributed systems). This shaped the architecture: diff --git a/docs/lessons/02-the-deployed-architecture.md b/docs/lessons/02-the-deployed-architecture.md index b1e2acb..53872b0 100644 --- a/docs/lessons/02-the-deployed-architecture.md +++ b/docs/lessons/02-the-deployed-architecture.md @@ -4,11 +4,9 @@ ## A note on honesty -The original architecture doc (`design/architecture.md`) says the backend deploys to Railway and the frontend to Cloudflare Pages. In reality, the backend runs on AWS ECS Fargate with ElastiCache Redis, and the frontend is an S3 static website. The Gemini API is mentioned throughout the design docs and experiment reports — in reality, the production provider is Kimi (Moonshot AI), accessed over Kimi's coding endpoint using the Anthropic Messages API. The architecture doc hasn't been updated. +The original architecture doc (`design/architecture.md`) describes one topology; the deployed system differs in several places. Design docs freeze at the moment they were written — `git log`, `grep`, and `terraform plan` are more trustworthy than any markdown file. -This is the single most common documentation failure in software: **design docs freeze at the point they were written.** Always verify against the code and the infrastructure. `git log`, `grep`, and `terraform plan` are more trustworthy than any markdown file. - -This article describes the real system as of April 2026, verified against code and PR history. +This article describes the real system as of April 2026, verified against code and infrastructure. 
## The topology @@ -110,9 +108,9 @@ This article describes the real system as of April 2026, verified against code a **File:** `backend/app/providers/kimi.py` **File:** `backend/app/providers/registry.py` -**How it connects:** Kimi's coding endpoint speaks the **Anthropic Messages API** at `/coding/v1/messages`. The `KimiProvider` uses the `AsyncAnthropic` client with `base_url="https://api.kimi.com/coding"` and a `default_headers` override for User-Agent (Kimi's endpoint whitelists approved coding agents — Claude Code, Kimi CLI, etc.). An earlier version of the endpoint spoke OpenAI's `chat/completions` schema; that surface went dead in early 2026 and we migrated to Anthropic's SDK. See [Article 15](15-kimi-and-openai-compatibility.md) for the migration story. +**How it connects:** Kimi's coding endpoint speaks the **Anthropic Messages API** at `/coding/v1/messages`. The `KimiProvider` uses the `AsyncAnthropic` client with `base_url="https://api.kimi.com/coding"` and a `default_headers` override for User-Agent (Kimi's endpoint whitelists approved coding agents — Claude Code, Kimi CLI, etc.). See [Article 15](15-kimi-and-openai-compatibility.md) for the full wiring. -**The migration stories:** Plural, now. First Gemini → Kimi (PR #95: "remove Gemini secret requirement"), driven by Gemini's intermittent 500s and rate-limit issues. Then OpenAI SDK → Anthropic SDK, driven by Kimi deprecating their OpenAI-compatible shim. Both migrations only touched `providers/kimi.py` because every stage calls through the `ReasoningProvider` Protocol. The `GeminiProvider` class still exists; it's just not wired up in production. +**Why provider choice is isolated:** every stage calls through the `ReasoningProvider` Protocol, so the specific SDK and endpoint shape live in exactly one file (`providers/kimi.py`). A `GeminiProvider` class also exists in the registry but isn't wired up in production — it's available if you pass `reasoning_model: "gemini"` at run time. 
### Frontend (Astro + React on S3) diff --git a/docs/lessons/04-the-api-layer.md b/docs/lessons/04-the-api-layer.md index 6554c15..be6a6fd 100644 --- a/docs/lessons/04-the-api-layer.md +++ b/docs/lessons/04-the-api-layer.md @@ -16,7 +16,7 @@ This pattern has a name: **non-blocking request handling**. Learn to spot it eve ```python app = FastAPI( - title="Flair2 — AI Campaign Studio", + title="Flair2 — AI Script Studio", version="0.1.0", lifespan=lifespan, ) diff --git a/docs/lessons/14-the-provider-abstraction.md b/docs/lessons/14-the-provider-abstraction.md index cb5e606..1af42fa 100644 --- a/docs/lessons/14-the-provider-abstraction.md +++ b/docs/lessons/14-the-provider-abstraction.md @@ -1,16 +1,12 @@ # 14. The Provider Abstraction -> Flair2 switched its entire LLM backend from Gemini to Kimi in one PR. The reason it was possible is a 30-line abstraction layer that most developers would dismiss as "premature." This article explains why it wasn't, and teaches you the design pattern behind it. +> No code in Flair2's pipeline names a specific LLM vendor. Stages ask a `ReasoningProvider` for text; the concrete provider is chosen at runtime from configuration. This article explains why that shape matters and how the ~30 lines that make it work are structured. -## The payoff story +## The shape of the abstraction -PR #95: "chore: remove Gemini secret requirement, Kimi-only deployment." +The pipeline never writes `KimiProvider` or `GeminiProvider`. It writes `ReasoningProvider`. The specific implementation is chosen at runtime — per pipeline run — from the `reasoning_model` field in the request config. -Here's what changed: the Terraform config stopped provisioning a Gemini API key, and the default reasoning model in the frontend was set to "kimi." That's it. No stage functions were modified. No task definitions changed. No tests broke (except the ones specifically testing GeminiProvider). 
- -The switch was possible because no code in the pipeline says `KimiProvider` or `GeminiProvider`. It says `ReasoningProvider`. The specific implementation is chosen at runtime, from configuration. - -**Cost of the abstraction:** ~30 lines of code (the Protocol class + the registry). **Payoff:** hours of migration work avoided, plus the ability to switch providers again in the future without touching business logic. +**Cost:** ~30 lines of code (the Protocol class + the registry). **What it buys:** callers don't depend on SDKs, credentials, or endpoint shapes. Stage code stays readable and focused on prompts and parsing. Swapping or adding a provider is a one-file change, never a cross-codebase refactor. ## The Protocol class @@ -161,7 +157,7 @@ class KimiProvider: return self._client ``` -Kimi speaks the **Anthropic Messages API** on its coding endpoint. We use the `AsyncAnthropic` client with a custom `base_url`. (An earlier version of Kimi spoke OpenAI's chat/completions schema instead — that surface went dead and the client had to migrate. [Article 15](15-kimi-and-openai-compatibility.md) covers the migration history and why the abstraction let us do it with a ~30-line change.) +Kimi speaks the **Anthropic Messages API** on its coding endpoint. We use the `AsyncAnthropic` client with a custom `base_url` and a User-Agent header Kimi's endpoint requires. [Article 15](15-kimi-and-openai-compatibility.md) covers the full wiring and the User-Agent whitelist. ### GeminiProvider (`providers/gemini.py`) @@ -218,19 +214,19 @@ A second Protocol for video generation. Currently has no implementations — the The registry already has a `_video_providers` dict and a `register_video` function. Adding a video provider would follow the exact same pattern as the reasoning providers. -## When abstraction is premature vs prescient +## When abstraction is premature vs justified -The common objection: "You only have two providers. This is premature abstraction. 
Just use Kimi directly." +The common objection: "Two providers isn't enough to justify an interface. Just call Kimi directly." -Here's why it wasn't premature: +Here's why the abstraction earns its keep: -1. **The switch actually happened.** Gemini → Kimi migration was driven by Gemini's intermittent 500s and rate limit issues. Without the abstraction, every stage function would have needed modification. +1. **The interface is dictated by the domain, not speculation.** Every LLM provider takes a prompt and returns text. The shape of `generate_text(prompt, schema) -> str` isn't a guess — it's the only shape the domain allows. -2. **The cost was trivial.** Protocol class: 15 lines. Registry: 20 lines. Zero runtime overhead. The abstraction doesn't add complexity to the codebase — it removes it from every stage function. +2. **The cost is trivial.** Protocol class: 15 lines. Registry: 20 lines. Zero runtime overhead. The abstraction doesn't add complexity to the codebase — it removes it from every stage function, which no longer has to know or care about SDKs, auth headers, or endpoint shapes. -3. **The interface was obvious.** All LLM providers do the same thing: take a prompt, return text. The interface didn't require speculation — it was dictated by the domain. +3. **Provider wiring is inherently churn-heavy.** LLM APIs shift — endpoints move, SDKs get deprecated, auth policies tighten. Keeping that churn confined to one file is the whole point. -**When abstraction IS premature:** when you're guessing at the interface. If you don't know what the methods should look like, an abstraction will be wrong. Wait until you have two concrete implementations and extract the commonality. +**When abstraction IS premature:** when you're guessing at the interface. If you don't know what the methods should look like, an abstraction will be wrong. Wait until you have one concrete implementation you trust, then name the interface it demanded. 
**Rule of thumb:** abstract when (a) the interface is obvious from the domain, (b) you have at least one concrete implementation, and (c) the cost of the abstraction is small relative to the cost of changing callers later. @@ -240,7 +236,7 @@ Here's why it wasn't premature: 2. **The Registry pattern is a Strategy + Factory hybrid.** Dictionary mapping names to classes. Adding a provider is one line. Listing providers is one function call. -3. **Abstraction pays for itself when the switch actually happens.** The Gemini → Kimi migration is the proof. Without the abstraction, it would have been a week of find-and-replace across stage functions, tests, and error handling. +3. **Abstraction pays for itself by confining churn.** Provider wiring shifts — endpoints move, SDKs get deprecated, auth policies tighten. Without the Protocol + Registry, each shift would ripple across every stage function, test, and error handler. With it, the churn lives in exactly one file. 4. **The interface should be dictated by the domain, not the implementation.** All LLM providers take prompts and return text. That's the interface. Implementation details (SDK choice, auth mechanism, retry strategy) are hidden behind it. diff --git a/docs/lessons/15-kimi-and-openai-compatibility.md b/docs/lessons/15-kimi-and-openai-compatibility.md index 0b4d6ef..9d6f014 100644 --- a/docs/lessons/15-kimi-and-openai-compatibility.md +++ b/docs/lessons/15-kimi-and-openai-compatibility.md @@ -1,27 +1,27 @@ -# 15. Kimi and the Anthropic Messages Migration +# 15. Kimi and the Anthropic Messages API -> Industry-standard APIs come and go. Between the hackathon prototype and today, Flair2 migrated its LLM client twice — first to OpenAI's chat/completions schema, then to Anthropic's Messages API. This article explains why both moves happened, what the current wiring looks like, and the transferable lesson about coding against an interface. +> Kimi (Moonshot AI) is Flair2's production LLM. 
Its coding endpoint speaks the Anthropic Messages API shape, gated on a User-Agent allowlist. This article walks through how the provider is wired, why each knob is set the way it is, and the transferable lessons about coding against an API you don't own. -## Two migrations in one project +## The endpoint, the SDK, and the UA allowlist -When the Kimi provider was first added, Kimi's coding endpoint exposed an OpenAI-compatible `chat/completions` route. That made integration cheap: use the OpenAI Python SDK, swap `base_url`, done. Many Chinese LLM providers did this to attract developers already using OpenAI's SDK. +Kimi exposes several surfaces. The one Flair2 uses is **Kimi For Coding** at `https://api.kimi.com/coding`. That endpoint speaks the **Anthropic Messages API** at `/coding/v1/messages` — same request shape, same response shape, same content-block structure as calling Claude directly. Because the shape matches, Flair2 reaches it with the standard `anthropic` Python SDK (`AsyncAnthropic`) and just overrides `base_url`. -Then Kimi's endpoint quietly changed. The OpenAI-shaped route started returning a misleading error — `"only 0.6 is allowed for this model"` — for every request regardless of temperature. The real surface moved to an Anthropic Messages API shape at `/coding/v1/messages`. So we migrated again, this time to the Anthropic Python SDK. +Two non-obvious things about that endpoint: -This sequence is now permanently documented in the provider file's docstring: +1. **It's gated on User-Agent.** Only approved coding agents (Claude Code, Kimi CLI, Roo Code, Cline, etc.) are allowed through. Unknown clients get a 403 with the message *"Kimi For Coding is currently only available for Coding Agents."* We set `default_headers={"User-Agent": "claude-code/0.1.0"}` to land on the allowlist. +2. **The base URL stops at `/coding`, not `/coding/v1`.** The Anthropic SDK appends `/v1/messages` itself. Adding `/v1` to the base URL yields a silent 404. 
+ +The docstring on the provider file calls this out for anyone reading the code cold: ```python """Kimi (Moonshot AI) reasoning provider via the Anthropic Messages API. -Kimi's coding endpoint migrated to an Anthropic-compatible schema -(see /coding/v1/messages). The legacy OpenAI /chat/completions shim -now returns a misleading "only 0.6 is allowed for this model" error -for every request, regardless of temperature — a dead surface. +Kimi's coding endpoint uses an Anthropic-compatible schema +(see /coding/v1/messages). Requires a coding-agent User-Agent +on the allowlist, or the endpoint returns 403. """ ``` -**The transferable lesson:** LLM provider APIs are not stable contracts. Public pricing pages and "OpenAI compatible" claims can change month to month. Design your provider abstraction so migrations are one file's worth of work. - ## The current wiring **File:** `backend/app/providers/kimi.py` @@ -45,21 +45,19 @@ class KimiProvider: return self._client ``` -Three things changed from the OpenAI era: +Three details worth understanding in that block: -### 1. Different SDK +### 1. `AsyncAnthropic`, lazily constructed -`AsyncAnthropic` instead of `OpenAI`. Same pattern (base_url + default_headers), different package. The rest of the provider code — retry logic, rate-limit handling, JSON parsing — didn't change because it doesn't depend on the SDK. +The SDK client is created on first use, not on import. That means importing `providers/kimi.py` is free — no network, no auth validation, no side effects. You pay the cost only when a task actually calls the provider. This matters for tests and for cold-start latency on worker tasks. -### 2. Different endpoint shape +### 2. `base_url` stops at `/coding` -`base_url` stops at `/coding`, not `/coding/v1`, because the Anthropic SDK appends `/v1/messages` automatically. Getting this wrong silently breaks everything with a 404. +The Anthropic SDK appends `/v1/messages` automatically. 
If you write `base_url="https://api.kimi.com/coding/v1"`, the final URL becomes `/coding/v1/v1/messages` and you get a silent 404 for every request. Read the SDK's URL-composition rules before setting `base_url`. -### 3. Different request/response shape +### 3. Content-block responses, not plain strings -Requests use `client.messages.create(...)` with `messages=[...]` and `max_tokens` (required in Anthropic's API, optional in OpenAI's). Responses are `Message` objects with a `content` list of content blocks — each block has a `type` ("text", "tool_use", etc.) and a `text` attribute for text blocks. - -Flair2 only cares about text, so it flattens the content blocks: +`client.messages.create(...)` returns a `Message` object with a `content` list. Each block has a `type` (`"text"`, `"tool_use"`, etc.) and a `text` attribute for text blocks. Flair2 only cares about text, so it flattens the blocks: ```python def _extract_text(response) -> str: @@ -71,21 +69,17 @@ def _extract_text(response) -> str: return "".join(parts) ``` -## The User-Agent whitelist (still required) - -Kimi's coding endpoint is gated on User-Agent. Only approved coding agents (Kimi CLI, Claude Code, Roo Code, etc.) can use it. Unrecognized clients get 403: *"Kimi For Coding is currently only available for Coding Agents."* +## The User-Agent allowlist -The `default_headers={"User-Agent": "claude-code/0.1.0"}` line is the whitelist workaround. Same fragility it had during the OpenAI era — if Kimi tightens validation, the spoof breaks. Nothing about migrating to Anthropic's SDK fixed this; it's an endpoint policy, not an SDK behavior. +The UA header is the other piece of this that trips up newcomers. 
The endpoint returns 403 for anything it doesn't recognize, and nothing in the Anthropic SDK forces you to set a UA — so if you follow the SDK's quickstart with Kimi's `base_url`, every request fails and the error message ("Kimi For Coding is currently only available for Coding Agents") points you in an unhelpful direction. -## The registry abstraction paid off twice +`default_headers={"User-Agent": "claude-code/0.1.0"}` solves it. The value has to match an entry on Kimi's internal allowlist. It's an endpoint policy, not an SDK behavior, so it survives any SDK change — keep it in mind any time you touch this file. -Because every stage calls `provider.generate_text(...)` through the `ReasoningProvider` Protocol ([Article 14](14-the-provider-abstraction.md)), **two SDK migrations didn't touch any stage code.** S1, S3, S4, S6 have no idea which SDK sits behind the provider. The only files that changed across migrations: +## Why the abstraction isolates this -- `providers/kimi.py` — the SDK wrapper -- `pyproject.toml` — the dependency (anthropic instead of openai) -- Tests that specifically asserted OpenAI SDK behavior +All of the above — the SDK choice, the `base_url` quirk, the UA header, the content-block flattening — lives entirely inside `providers/kimi.py`. Stages S1, S3, S4, S6 don't know any of it. They call `provider.generate_text(...)` through the `ReasoningProvider` Protocol ([Article 14](14-the-provider-abstraction.md)) and get a plain string back. -Every other part of the codebase — orchestrator, workers, stages, frontend — was unaffected. This is the dividend of programming to an interface. When the interface is stable and the implementation changes, only the implementation file changes. +That's the dividend of programming to an interface. If Kimi's endpoint tightens its UA policy, if the content-block response shape evolves, if a future provider uses a different SDK — the change lives in one file. The rest of the codebase doesn't notice. 
## Model IDs @@ -97,7 +91,7 @@ The coding endpoint accepts multiple model aliases: | `kimi-for-coding/k2p5` | Coding-specific variant of K2.5 | | `kimi-k2.5` | General-purpose K2.5 (accepted on coding endpoint since Kimi unified their credit pool in April 2026) | -The code uses `kimi-for-coding` as the default. All three currently work because Kimi unified billing across Kimi Code, Kimi Chat, Agent, and PPT — the coding endpoint will accept general models too. +The code uses `kimi-for-coding` as the default. All three currently work because Kimi unified billing across Kimi Code, Kimi Chat, Agent, and PPT, so the coding endpoint accepts general models too. ## Retry & rate-limit behavior @@ -105,15 +99,15 @@ Provider-level retries are covered in [Article 16](16-rate-limiting.md). Key det ## What you should take from this -1. **"X-compatible API" claims are promises with an expiration date.** OpenAI compatibility worked for Kimi until it didn't. Don't hardcode to the shape; hide it behind an interface. +1. **`base_url` + `default_headers` is how you bend a vendor SDK to a non-default endpoint.** Both OpenAI's and Anthropic's Python SDKs expose these. You don't need a custom HTTP client to target a compatible endpoint; the existing SDK already does everything. -2. **`base_url` + `default_headers` is a pattern, not an SDK feature.** Both OpenAI's and Anthropic's Python SDKs expose it. If you're using one, you can use the other. The migration was ~30 lines. +2. **Read the SDK's URL-composition rules before setting `base_url`.** The Anthropic SDK appends `/v1/messages` itself. Getting this wrong is silent — you get a 404 with no useful diagnostic. -3. **Endpoint policies (UA whitelist) survive SDK migrations.** When you switch clients, carry forward the policy workarounds or you'll be debugging a 403 for an hour. +3. **Endpoint policies (like UA allowlists) live outside SDK abstractions.** No amount of Protocol + Registry purity saves you from a 403. 
Document the policy workaround in the provider file so future readers don't spend an hour debugging auth. -4. **Provider code is churn-heavy; stage code shouldn't be.** Two migrations, zero changes to S1-S6. The interface pays for itself in rewrite-avoidance. +4. **Content-block responses are the Anthropic SDK's native shape, not a Kimi quirk.** If you ever call Claude directly, the same flattening logic applies. -5. **Document the history in the docstring.** The "legacy OpenAI shim returns a misleading error" comment in `kimi.py` is load-bearing. Six months from now, somebody will try the OpenAI route again and be confused — the comment tells them why not to. +5. **Lazy client construction keeps imports side-effect-free.** `_get_client()` is called on first use, not at module load. Your tests and cold starts thank you. --- diff --git a/docs/lessons/23-the-frontend-stack.md b/docs/lessons/23-the-frontend-stack.md index ccaf0ef..5f10390 100644 --- a/docs/lessons/23-the-frontend-stack.md +++ b/docs/lessons/23-the-frontend-stack.md @@ -53,7 +53,7 @@ frontend/src/pages/ └── results.astro # Results display ``` -**`index.astro`:** the landing page with three blobs (Discover, Generate, Evaluate) that link to `/create`. Uses the V1 design language — rounded blobs with stage numbers, color-coded by pipeline phase. Pure static HTML + CSS. +**`index.astro`:** the landing page with three blobs (Discover, Generate, Evaluate) that link to `/create`. Rounded blobs with stage numbers, color-coded by pipeline phase. Pure static HTML + CSS. **`create.astro`:** the pipeline creation form. The user enters creator profile details and selects a reasoning model. This page embeds a React island for the form (which needs JavaScript for dynamic validation and submission). @@ -91,13 +91,13 @@ useEffect(() => { The browser's native `EventSource` API handles SSE — connection management, reconnection, `Last-Event-ID` — for free. The React component just subscribes to events and updates state. 
-## V1 design language (PR #114, #115) +## Design language lives in CSS, not in components -The V1 prototype had a distinctive visual style — rounded blobs, a custom color palette, specific typography. PR #114 ("feat: V1 design language — blobs, typography, color-coded pipeline") ported this visual identity to V2. +The frontend has a distinctive visual identity — rounded "blob" shapes for pipeline phases, custom typography pairings, a color palette keyed to each stage (Discover/Generate/Evaluate/Personalize). All of it is defined in Tailwind utility classes and CSS custom properties on `:root`, not baked into component JSX. -PR #115 ("feat: restyle ResultsView + VotingAnimation for V2 text-based output") adapted the V1 components for V2's different data shape: V1 generated images, V2 generates text scripts. The visual language (colors, shapes, animations) stayed the same; the content rendering changed. +That separation matters: the voting animation, the pipeline visualizer, and the results view all render completely different data shapes, but they share one visual system. If a new view needs to be added — or an existing view needs to render a different payload — the styling doesn't have to be rewritten. Presentation and data rendering are independent concerns. -**Design lesson:** the visual identity is a separate concern from the data rendering. V1's design language could be applied to V2's different content because the styling was in CSS/Tailwind, not hardcoded into the data display logic. Separation of presentation from content. +**Design lesson:** keep visual identity in the styling layer (CSS, Tailwind, design tokens) rather than hardcoding it into components. Components should render data; the style system should decide how it looks. ## Why S3, not a real hosting platform @@ -105,19 +105,29 @@ PR #115 ("feat: restyle ResultsView + VotingAnimation for V2 text-based output") The frontend is hosted on S3 with static website hosting enabled. 
The build output (`frontend/dist/`) is synced to S3 via `aws s3 sync`. -**Why S3 over CloudFront (CDN):** PR #107 simplified from S3 + CloudFront to S3-only. CloudFront adds caching, edge distribution, and custom domains. For a course project with limited traffic and no custom domain, S3 direct hosting is sufficient. CloudFront adds configuration complexity (cache invalidation, SSL certificates, origin access identity) that isn't justified at this scale. +**Why S3 over CloudFront (CDN):** CloudFront adds caching, edge distribution, and custom domains — and a lot of operational surface (cache invalidation, SSL certificates, origin access identity). For a course project with limited traffic and no custom domain, S3 direct hosting is sufficient. The tradeoff would flip at real traffic or with a branded domain. -**Why S3 over Cloudflare Pages:** the architecture doc mentions Cloudflare Pages. The `@astrojs/cloudflare` package is still in `package.json`. But deployment went to S3 because the rest of the infrastructure was on AWS — keeping everything in one cloud provider simplifies IAM, networking, and CI/CD. +**Why S3 over Vercel/Netlify/Cloudflare Pages:** these platforms are faster to set up (connect GitHub, auto-deploy) but they sit outside the AWS infrastructure that Terraform manages for the rest of Flair2. Keeping everything in one cloud provider and one IaC tree simplifies IAM, networking, and CI/CD. -**Why S3 over Vercel/Netlify:** these platforms are easier to set up (connect GitHub, auto-deploy). But Flair2's terraform-managed infrastructure approach requires all resources to be defined as code. S3 static hosting integrates naturally with the existing Terraform setup. +## The `crypto.randomUUID` fallback -## The `crypto.randomUUID` fix (PR #121) +The frontend generates client-side session IDs with `crypto.randomUUID()`. That API is only available in **secure contexts** (HTTPS or `localhost`). 
S3 static website hosting serves over plain HTTP unless you put CloudFront in front of it — so on the deployed site, `crypto.randomUUID` is `undefined`.

-A fun edge case: the frontend used `crypto.randomUUID()` to generate session IDs. This API is only available in **secure contexts** (HTTPS or localhost). S3 static website hosting serves over HTTP, not HTTPS (unless you add CloudFront). On HTTP, `crypto.randomUUID()` is undefined.
+The client-side code guards for that:

-PR #121 added a fallback: check if `crypto.randomUUID` exists, and if not, generate a UUID using `Math.random()`. This is less cryptographically secure but sufficient for session IDs in a prototype.
+```typescript
+function generateSessionId(): string {
+  if (typeof crypto !== "undefined" && crypto.randomUUID) {
+    return crypto.randomUUID();
+  }
+  // Fallback for non-secure contexts (S3 over HTTP)
+  return "sess_" + Math.random().toString(36).slice(2) + Date.now().toString(36);
+}
+```
+
+Less cryptographically strong than `crypto.randomUUID`, but sufficient for session IDs in a prototype and — crucially — works on the deployment surface.

-**The lesson:** browser APIs often have security context requirements that are invisible in development (where you're on `localhost`, a secure context) and only surface in production (where you might be on HTTP). Test in the same context you deploy to.
+**The lesson:** browser APIs often have security-context requirements that are invisible in local development (where `localhost` counts as secure) and only surface in production. Test in the same context you deploy to, or guard for the feature at runtime.

 ## Why not a full SPA

@@ -140,7 +150,7 @@ Flair2 has three pages. Most content is static. Interactive components are conce

 2. **Static output simplifies deployment.** No Node.js server, no containers, no port management. Just files on S3. The simplest deployment is the one with the fewest moving parts.

-3.
**Design language is separable from data rendering.** V1's visual identity applied to V2's different content because styling was in CSS, not in the data layer.
+3. **Design language is separable from data rendering.** Keep visual identity in CSS and design tokens, not baked into components. The same style system can render completely different data shapes.

 4. **Test in the deployment context.** Browser APIs that work on `localhost` may fail on HTTP in production. `crypto.randomUUID()` is the case study.

diff --git a/docs/lessons/README.md b/docs/lessons/README.md
index dfdc799..10ca6c0 100644
--- a/docs/lessons/README.md
+++ b/docs/lessons/README.md
@@ -49,7 +49,7 @@ By the end, you should be able to:
 | # | Article | Core lesson |
 |---|---------|-------------|
 | 14 | [The Provider Abstraction](14-the-provider-abstraction.md) | Registry pattern, Protocol classes, coding to an interface with a real payoff. |
-| 15 | [Kimi and OpenAI Compatibility](15-kimi-and-openai-compatibility.md) | API compatibility as an industry pattern, `default_headers`, provider migration. |
+| 15 | [Kimi and the Anthropic Messages API](15-kimi-and-openai-compatibility.md) | Bending a vendor SDK to a non-default endpoint, UA allowlists, content-block responses. |
 | 16 | [Rate Limiting a Shared Upstream](16-rate-limiting.md) | Token bucket, Redis INCR+EXPIRE, centralized vs distributed rate limiting. |

 ## Part VI — Redis