Merged
27 changes: 17 additions & 10 deletions docs/homework/project-management-report.md
@@ -1,8 +1,8 @@
# Flair2 — Project Management Report

**Team:** Sam Wu (@0b00101111) · Jess Zhang (@tyrahappy)
**Duration:** 2026-03-22 → 2026-04-18 (~4 weeks)
**Scope:** 262 commits · 112 merged PRs · 2 deployed services on AWS
**Duration:** 2026-03-22 → 2026-04-20 (~4.5 weeks)

Copilot AI Apr 20, 2026

The duration math looks off: 2026-03-22 → 2026-04-20 is ~29 days (~4.1 weeks), not ~4.5 weeks. Suggest adjusting the approximation (or dropping it) so the header stays internally consistent.

Suggested change
**Duration:** 2026-03-22 → 2026-04-20 (~4.5 weeks)
**Duration:** 2026-03-22 → 2026-04-20 (~4.1 weeks)
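
The arithmetic behind this suggestion can be checked with a few lines of Python. This is only a quick sketch using the standard library; it counts elapsed days and treats the end date as exclusive, which is an assumption about how the report measures duration.

```python
from datetime import date

start = date(2026, 3, 22)
end = date(2026, 4, 20)

# Elapsed days between the two dates (end date not counted).
days = (end - start).days
weeks = days / 7

print(days, round(weeks, 1))  # prints: 29 4.1
```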

**Scope:** 292 commits · 127 merged PRs · 189 tests · 2 deployed services on AWS

---

@@ -15,9 +15,9 @@ We started from a V1 hackathon prototype (`gemini-social-asset`) and rewrote it
───────────────────────────────────────────────────────────────────────
Monolithic main.py → modular (api/pipeline/workers) → ECS Fargate
In-memory state → Redis + Celery coordination → ElastiCache
Sequential pipeline → MapReduce fan-out/fan-in → 20 concurrent
Gemini only → pluggable provider registry → Kimi live
No tests / CI → 111 unit + 5 integration + M5/M6 → GitHub Actions
Sequential pipeline → MapReduce fan-out/fan-in → cap=29 concurrent
Gemini only → pluggable provider registry → Kimi (Anthropic Messages)
No tests / CI → 157 unit + 32 skipped + M5/M6 → GitHub Actions
```
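
The "fan-out/fan-in → cap=29 concurrent" row above can be illustrated with a small in-process sketch. The report describes Redis + Celery workers, so this is not the project's implementation: `analyze`, `fan_out`, and the `stats` dictionary are hypothetical names, and `asyncio.Semaphore` stands in for whatever mechanism actually enforces the cap.

```python
import asyncio

CAP = 29  # concurrency cap quoted in the report


async def analyze(video_id: int, sem: asyncio.Semaphore, stats: dict) -> int:
    """Hypothetical stand-in for one LLM call on one video."""
    async with sem:  # at most CAP coroutines are inside this block at once
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)  # simulated network latency
        stats["in_flight"] -= 1
    return video_id


async def fan_out(n: int, cap: int = CAP):
    """Fan out n tasks, fan results back in, and report peak concurrency."""
    sem = asyncio.Semaphore(cap)
    stats = {"in_flight": 0, "peak": 0}
    # gather returns results in submission order (the fan-in step).
    results = await asyncio.gather(*(analyze(i, sem, stats) for i in range(n)))
    return results, stats["peak"]


results, peak = asyncio.run(fan_out(100))
print(len(results), peak <= CAP)
```

All 100 tasks are submitted at once, but the semaphore keeps the number of simultaneous simulated calls at or below the cap.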

## Work Breakdown (Parallel Tracks)
@@ -29,9 +29,10 @@ We started from a V1 hackathon prototype (`gemini-social-asset`) and rewrote it
| Apr 4–8 | **M4** — SSE pipeline visualizer, voting animation | **M3** — SSE manager, checkpoints, multi-user validation | Contract #71 §2 — SSE events |
| Apr 8–11 | **M4** — results page, polish | **M3-5** — integration tests, deploy workflow | First full E2E on AWS |
| Apr 11–15 | Experiments helper, design-language port | **M5** — M5-1/2/3 backpressure, recovery, cache experiments | Both: M5-4 Locust, M6 ElastiCache |
| Apr 15–18 | S1 grid viz, S4 vote matrix, observability | Deploy hardening, Terraform state import, destroy workflow | Final polish |
| Apr 15–18 | S1 grid viz, S4 vote matrix, observability | Deploy hardening, Terraform state import, destroy workflow | First E2E with full viz + resilience |
| Apr 18–20 | Business-case doc, deck iterations, Campaign → Script rename across deck/frontend/backend, lessons-series refresh | M5-distributed-patterns experiments (fan-out, stragglers, idempotency), experiment overview, PDF export, M5-4 rerun | Final submission package |

**PR split:** Sam 69 (62%), Jess 43 (38%). Every PR reviewed by the other.
**PR split:** Sam 77 (61%), Jess 50 (39%). Every PR reviewed by the other.

## Problems Encountered & How We Broke Them Down

@@ -47,17 +48,23 @@ We started from a V1 hackathon prototype (`gemini-social-asset`) and rewrote it
| 8 | **One bad video killed whole S1 run** | Sparse TikTok transcripts hit schema validation | Skip-and-continue (#156): mark video skipped, let S2 aggregate the rest |
| 9 | **Stragglers (99/100) hung pipeline** because of long retry budget | UI showed "Pipeline appears stalled" at 99/100 | 95% completion threshold with SETNX-guarded transition (#165) |
| 10 | **S3 client-side routes 404'd** after deploy — "Start Pipeline" bounced to home page | User-reported | S3 error doc only serves ROOT index.html; switched path-based routes to query params (#134) |
| 11 | **Straggler-experiment chart left its concurrency assumption implicit** — raw 94.8% savings number looked misleadingly strong | Team-review of the deck | Kept the result but added explicit best/worst-case framing; noted cap=29 reality is closer to 80–85% |
| 12 | **"Campaign Studio" branding didn't match the artifact** — the tool generates ten short-form video scripts, not a marketing campaign | Deck rehearsal with Sam | Renamed across frontend (8 surfaces), deck, docs, backend OpenAPI title, and 6 lessons (#180, #182) |
| 13 | **Lessons series narrated V1-vs-V2 history** — distracting from the current architecture | Review for final submission | Rewrote lessons 01, 02, 14, 15, 23 to explain current state only (#182) |
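
Row 9's fix (a 95% completion threshold with a SETNX-guarded transition) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from #165: `FakeRedis` is an in-memory stand-in that mimics only the shape of the `incr` and `setnx` commands, and the key names and `on_task_done` helper are hypothetical.

```python
import threading


class FakeRedis:
    """In-memory stand-in exposing only the two commands this sketch needs."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def incr(self, key):
        with self._lock:
            self._data[key] = int(self._data.get(key, 0)) + 1
            return self._data[key]

    def setnx(self, key, value):
        # Set-if-not-exists: True only for the first caller.
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True


THRESHOLD_PCT = 95  # advance once 95% of tasks have finished


def on_task_done(r, run_id, total):
    """Record one finished task; return True iff this call triggers the next stage."""
    done = r.incr(f"run:{run_id}:done")  # hypothetical key name
    if done * 100 < total * THRESHOLD_PCT:
        return False  # still below the threshold
    # Every completion past the threshold reaches this line, but only one
    # caller wins the SETNX, so the transition fires exactly once.
    return r.setnx(f"run:{run_id}:advanced", "1")  # hypothetical key name


r = FakeRedis()
# 99 of 100 tasks finish; the last one is a straggler that never reports.
triggers = [on_task_done(r, "demo", 100) for _ in range(99)]
print(triggers.count(True), triggers.index(True))  # prints: 1 94
```

The 95th completion triggers the transition, later completions are no-ops, and the run no longer waits on the missing task.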

## Process Discipline That Held

- **Interface contract (issue #71) upfront** — documented every Redis key, every SSE event, every API endpoint. Let Sam and Jess work independently for weeks and integrate cleanly.
- **One PR, one fix** — 112 PRs averaged 4 files each. Every PR title used conventional prefixes (`feat:`, `fix:`, `chore:`, `docs:`) for grep-friendly history.
- **Hard branch protection on `main`** — Sam and Jess set a GitHub ruleset that *physically blocks* direct pushes and requires ≥1 approving review from the other team member before merge (admin override disabled). This turned "we should review each other's work" from a norm into an enforced gate. Effect: every one of the 112 merged PRs has at least one reviewer's signature; neither of us can ship without the other reading the diff. On multiple occasions Jess's review caught issues Sam missed (and vice-versa) before they hit production.
Comment on lines 58 to 59

Copilot AI Apr 20, 2026

These bullets still say "112 PRs" / "112 merged PRs", but the header scope was updated to "127 merged PRs". Please update these counts (or rephrase to avoid hard-coding) so the report doesn’t contradict itself.

Suggested change
- **One PR, one fix** — 112 PRs averaged 4 files each. Every PR title used conventional prefixes (`feat:`, `fix:`, `chore:`, `docs:`) for grep-friendly history.
- **Hard branch protection on `main`** — Sam and Jess set a GitHub ruleset that *physically blocks* direct pushes and requires ≥1 approving review from the other team member before merge (admin override disabled). This turned "we should review each other's work" from a norm into an enforced gate. Effect: every one of the 112 merged PRs has at least one reviewer's signature; neither of us can ship without the other reading the diff. On multiple occasions Jess's review caught issues Sam missed (and vice-versa) before they hit production.
- **One PR, one fix** — 127 PRs averaged 4 files each. Every PR title used conventional prefixes (`feat:`, `fix:`, `chore:`, `docs:`) for grep-friendly history.
- **Hard branch protection on `main`** — Sam and Jess set a GitHub ruleset that *physically blocks* direct pushes and requires ≥1 approving review from the other team member before merge (admin override disabled). This turned "we should review each other's work" from a norm into an enforced gate. Effect: every one of the 127 merged PRs has at least one reviewer's signature; neither of us can ship without the other reading the diff. On multiple occasions Jess's review caught issues Sam missed (and vice-versa) before they hit production.

- **Tests before features** — 105 unit tests existed before the first real pipeline run. Grew to 111 by the end. Caught the `RateLimitError` retry countdown bug pre-deploy.
- **Tests before features** — 105 unit tests existed before the first real pipeline run. Grew to 157 passing (+ 32 skipped integration tests gated on AWS creds). Caught the `RateLimitError` retry countdown bug pre-deploy.

Copilot AI Apr 20, 2026

The parenthetical says the 32 skipped integration tests are "gated on AWS creds", but the skipped experiment/integration modules are actually gated on environment variables like PIPELINE_BASE_URL / ELASTICACHE_URL being set (see backend/tests/experiments/test_e2e_pipeline.py and test_elasticache_integration.py). Suggest updating the wording to match the actual skip conditions.

Suggested change
- **Tests before features** — 105 unit tests existed before the first real pipeline run. Grew to 157 passing (+ 32 skipped integration tests gated on AWS creds). Caught the `RateLimitError` retry countdown bug pre-deploy.
- **Tests before features** — 105 unit tests existed before the first real pipeline run. Grew to 157 passing (+ 32 skipped integration tests gated on integration env vars such as `PIPELINE_BASE_URL` / `ELASTICACHE_URL`). Caught the `RateLimitError` retry countdown bug pre-deploy.

- **Test the failure paths explicitly** — M5-2 (failure recovery) validated the checkpoint-and-resume code before we ever needed it. Saved ~50% of LLM calls on the first real crash.
- **Experiments drove design changes, not the other way around** — the straggler experiment justified the 95% threshold (#165); the Locust load test surfaced that CPU-based autoscaling was wrong for IO-bound workers; the fan-out experiment located Amdahl's knee at cap=29.
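
The review comment on the "Tests before features" bullet says the skipped integration tests are gated on environment variables such as `PIPELINE_BASE_URL` and `ELASTICACHE_URL`. A minimal sketch of that kind of gate is shown below, using the standard-library `unittest` module. The repository's actual test files are not shown here, so the class name, the test body, and the `run_suite` helper are hypothetical.

```python
import os
import unittest

# Variable names taken from the review comment; everything else is illustrative.
REQUIRED_VARS = ("PIPELINE_BASE_URL", "ELASTICACHE_URL")
missing = [v for v in REQUIRED_VARS if not os.environ.get(v)]


@unittest.skipIf(missing, f"integration env vars not set: {', '.join(missing)}")
class PipelineIntegrationTest(unittest.TestCase):
    def test_base_url_is_http(self):
        # Runs only when both variables are present in the environment.
        self.assertTrue(os.environ["PIPELINE_BASE_URL"].startswith("http"))


def run_suite():
    """Run the gated test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PipelineIntegrationTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_suite()
print(len(result.skipped))
```

On a machine without those variables the test is reported as skipped rather than failed, which matches how a "passing + skipped" count arises.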

## What the Final State Runs

**AWS** `314727362981` / `us-west-2`: VPC, 2 public + 2 private subnets, ECS Fargate cluster (API + Worker services, 2–6 / 2–4 tasks with autoscaling), ElastiCache Redis, ALB, S3 static site, DynamoDB, ECR, IAM roles — all Terraform-managed.
**CI/CD:** GitHub Actions runs lint+test on every push; merges to main auto-deploy Docker image → ECR → ECS and static site → S3.
**Pipeline:** 100 real viral TikTok videos analyzed in ~15s (20 concurrent workers, bounded by Kimi's concurrency semaphore), 20 scripts generated on-niche, 42 predefined personas vote, top 10 personalized per creator profile.
**CI/CD:** GitHub Actions runs lint+test on every push; merges to main auto-deploy Docker image → ECR → ECS and static site → S3. Hard branch protection on `main` requires ≥1 review and blocks direct pushes.
**Pipeline:** 100 real viral TikTok videos analyzed in ~15s end-to-end (cap=29 concurrent LLM calls, bounded by Kimi's concurrency semaphore), 20 scripts generated on-niche, 42 predefined real-viewer personas vote with Borda rank aggregation, top 10 personalized to the creator's voice.
**Experiment portfolio:** 7 experiments, 61 tests (59 pass + 2 documented boundary failures) — M5-1/2/3 resilience, M5-4 Locust load test (K=10 → K=500), M6 ElastiCache under real concurrency, and distributed-patterns experiments (fan-out parallelism, straggler mitigation, idempotency under races).
**Final deliverables:** deployed app, 28-article teaching series, 4-section presentation deck for 6-minute talk, business-case document, experiments report (PDF-exported), this PM report.
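
The Pipeline line above mentions Borda rank aggregation over persona votes. The report does not say which Borda variant or tie-break rule is used, so the sketch below assumes the classic scheme (position i out of n earns n-1-i points) with ties broken alphabetically; the `borda` function and the script IDs are hypothetical.

```python
from collections import defaultdict


def borda(rankings):
    """Aggregate best-first rankings into (candidate, score) pairs, best first."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for pos, cand in enumerate(ranking):
            scores[cand] += n - 1 - pos  # top spot earns n-1 points, last earns 0
    # Sort by score descending, then by name so ties are deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Three personas each rank three candidate scripts, best first.
votes = [
    ["s1", "s2", "s3"],
    ["s2", "s1", "s3"],
    ["s2", "s3", "s1"],
]
ranked = borda(votes)
print(ranked)  # prints: [('s2', 5), ('s1', 3), ('s3', 1)]
```

Taking the first ten entries of such a ranked list would correspond to a "top 10" selection step.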