Objective: Build a ruthless, high-performance, single-page aggregator that answers one question: "Who has the cheapest authentic NMN (and other longevity supplements) today?" Architecture: "Git-Scraper" model. $0/month infrastructure cost.
The system is decoupled into two primary components communicating via a single static JSON file committed to the repository.
- Backend (Go): Scrapes vendor sites, standardizes data, applies hardcoded business rules/overrides, calculates ROI, and outputs
data/analysis_report.json. - CI/CD (GitHub Actions): Runs the Go script daily. If data changes, it commits the changes and triggers a frontend build.
- Frontend (Next.js): Reads
data/analysis_report.jsonat build time (SSG), rendering a static, ultra-fast HTML page hosted on Vercel/Cloudflare Pages.
data/analysis_report.json is the sole contract between the Go backend and the Next.js frontend. The Go backend writes it. The frontend reads it. No other data files cross the boundary.
- The Go backend scrapes raw product data into
data/*.json(one file per vendor) for its own internal use. These raw files are not consumed by the frontend. - The backend applies vendor rules, runs the math engine, and serializes the final
[]models.Analysisarray todata/analysis_report.jsonviastorage.SaveReport(). - The frontend reads only
data/analysis_report.jsonvialib/data.ts. It performs zero parsing, zero regex extraction, zero bioavailability math. It is a dumb renderer.
- Command:
go run cmd/main.go -refresh(Scrapes web concurrently → saves raw products todata/*.json→ Analyzes → Saves report todata/analysis_report.json→ Prints table to stdout). - Command:
go run cmd/main.go(Reads localdata/*.jsonconcurrently → Analyzes → Saves report → Prints table). Instant execution for logic debugging. - Command:
go run cmd/main.go -audit(Runs the normal pipeline, then scans all products that pass the supplement keyword filter and vendor blocklist. Products that lack enough data for the analyzer to computeactiveGramsare printed with a gap report: what data was extracted, what is missing, and a suggestedvendor_rules.jsonoverride snippet. Combinable with-refresh.) - Command:
go run cmd/main.go -pprof(Starts the pprof HTTP server on:6060. Off by default.) - Dependency Injection: There is no global mutable state in the Go backend.
rules.LoadRules()returns arules.Registry(type alias formap[string]VendorConfig).cmd/main.goconstructs aparser.Analyzerstruct with the registry and supplement keywords injected as fields, then calls its methods.rules.ApplyRules()takes the registry as an explicit parameter. - Concurrency Model:
cmd/main.gocallsscrapeAll(), which launches one goroutine per vendor usingsync.WaitGroup. Each goroutine callsscrapeOrLoad()independently and sends its result through a buffered channel. A separate goroutine callswg.Wait()thenclose(ch). The main goroutine drains the channel sequentially, applies blocklist rules viarules.ApplyRules(reg, ...), and collects products into a[]vendorProductslice. All downstream processing (analysis, sorting, report generation) remains sequential and deterministic. - Scraper Engines (
internal/scraper/): Scrapers are registered asFetchFuncvalues (typefunc(models.Vendor) ([]models.Product, error)) in a package-levelregistrymap keyed by vendor type string.FetchProducts()dispatches to the correct function via map lookup — no switch statement. All scrapers share aDefaultClient(*http.Client) andNewRequest()/FetchBody()helpers fromclient.go, eliminating duplicate HTTP boilerplate.shopify.go: Parsesproducts.jsonendpoints.magento.go: Parses embeddedMagento_Swatches/js/swatch-rendererJSON configs and extracts HTML metadata. All regexps are compiled once at package level.ld+json.go: Parses Schema.org@graphLD+JSON objects.
- Normalization Layer (
internal/rules/): Readsdata/vendor_rules.json.LoadRules()returns(Registry, error)— no global variable.ApplyRules(reg, vendorName, p)evaluates only the product-level vendor blocklist and returnsfalseto reject a product,trueto allow it. It performs NO data enrichment or string injection — overrides are consumed directly by the analyzer's Hybrid Engine. TheVendorConfigstruct also carriesVariantBlocklist []stringfor skipping ghost variants inside the analyzer loop, andGlobalSubscriptionDiscount float64for vendors whose Shopify APIs hide subscription pricing. - Regex Extraction Helpers (
internal/parser/extract.go):extractFloat(re, s) (float64, bool)returns the first captured group as a float64, returning(0, false)on no match or non-positive value.extractFloatFrom(re, sources...)triesextractFloatagainst each source string in order, implementing the "variant → clean → broad" fallback chains in a single call.containsAny(s, substrs)reports whether a string contains any substring from a slice. These three helpers replace ~13 instances of the 3–5 line regex→parse→check pattern across analyzer.go and audit.go. - Math Engine (
internal/parser/analyzer.go): TheAnalyzerstruct holdsRules rules.RegistryandSupplements []string. ItsAnalyzeProduct()method implements a Hybrid Catalog/Regex Engine with three-tier mass resolution and active/gross mass disambiguation. Returns[]models.Analysis— one entry per valid variant. Mass extraction is delegated toAnalyzer.extractMass(), which returns(capsuleMass, powderMass, usedOverride). For ActiveGrams extraction (the active ingredient mass), the method evaluates a strict priority chain: (1)spec.VariantOverrides[v.Title]— per-variant override takes highest priority; (2)spec.ForceActiveGrams— product-level override bypasses regex; (3) standard regex pipeline viaextractFloat/extractFloatFromhelpers. TherePackregex (pack multiplier) always runs regardless of override source.activeGrams = baseMass * packMultiplier. GrossGrams (label weight) is resolved byAnalyzer.extractGrossGrams()via a two-tier priority chain: (1)spec.VariantGrossOverrides[v.Title]; (2) regex extraction viareLabelGrams/reLabelKgscanning only label text. Defaults to 0 for capsule-only products. Pure Powder Fallback: if the product has no dirty keywords (checked viacontainsAny), GrossGrams was found, and ActiveGrams was regex-resolved, thenactiveGrams = grossGrams. Type classification is delegated toclassifyType(). Bioavailability multiplier is resolved bybioavailabilityMultiplier(). Display name is built bybuildDisplayName(). Cost metrics are computed bybuildAnalysis(), which constructs a singlemodels.Analysisentry — used for both one-time and subscription entries, eliminating the previous struct-literal duplication. When a vendor hasGlobalSubscriptionDiscount > 0, a synthetic "Subscribe & Save" entry is emitted via the samebuildAnalysis()helper. Returnsnilwhen the product has no analyzable variants. - Triage Engine (
internal/parser/analyzer.go): Dirty-data detection is delegated toAnalyzer.triageDirtyData(). If mass was NOT resolved by an override, the method scans againstdirtyKeywordsusingcontainsAnywith a special-case guard for"unflavored"products. The servings sub-exception flags products with"serv"in their identity for manual review. Both one-time and subscription entries inherit the same flag.cmd/main.gocallssaveReviewQueue()to extract flagged entries and write them todata/needs_review.json. - Audit Gap Detector (
internal/parser/audit.go):Analyzer.AuditProduct()is a method on theAnalyzerstruct. It runs the same supplement keyword gate (viaAnalyzer.matchesSupplement()) and callsAnalyzer.AnalyzeProduct()to check if the product is already analyzable. If not, it probes for partial data usingextractFloat/extractFloatFromhelpers and returns anAuditResultdescribing the gap.FormatAuditReport()groups results by vendor and renders them as a human-readable stdout report. Triggered by the-auditCLI flag. - Storage (
internal/storage/json_store.go): Uses Go generics:SaveJSON[T any](path, data)andLoadJSON[T any](path)replace the previousSaveProducts,SaveReport, andLoadProductsfunctions.VendorFilename()converts a vendor name to its JSON file path (e.g.,"Do Not Age"→"data/do_not_age.json").
Agents must adhere to these structs when modifying scrapers:
type Product struct {
ID string `json:"id"`
Title string `json:"title"`
Context string `json:"context"`
Handle string `json:"handle"`
BodyHTML string `json:"body_html"`
ImageURL string `json:"image_url"`
Variants []Variant `json:"variants"`
}
type Variant struct {
Price string `json:"price"`
Title string `json:"title"`
Available bool `json:"available"`
}
type Analysis struct {
Vendor string `json:"vendor"`
Name string `json:"name"`
Handle string `json:"handle"`
Price float64 `json:"price"`
ActiveGrams float64 `json:"active_grams"`
GrossGrams float64 `json:"gross_grams"`
CostPerGram float64 `json:"cost_per_gram"`
EffectiveCost float64 `json:"effective_cost"`
Multiplier float64 `json:"multiplier"`
MultiplierLabel string `json:"multiplier_label"`
Type string `json:"type"`
ImageURL string `json:"image_url"`
IsSubscription bool `json:"is_subscription"`
NeedsReview bool `json:"needs_review"`
ReviewReason string `json:"review_reason,omitempty"`
}The Analysis struct is the schema for data/analysis_report.json. JSON field names use snake_case. The frontend maps these to camelCase at the data-loading boundary (web/lib/data.ts).
Name: The analyzer strips the vendor name prefix from the product title before assigning it. Stripping is case-insensitive. Example: vendor"Nutricost", title"Nutricost Creatine Monohydrate"→Namebecomes"Creatine Monohydrate". If stripping would produce an empty string, the original title is kept.ActiveGrams: The total active ingredient mass in grams. This is the denominator forCostPerGramandEffectiveCostcalculations. Populated by the Hybrid Engine's priority chain: variant override (VariantOverrides) > product override (ForceActiveGrams) > regex pipeline. For "Pure Powder" products (no dirty keywords), if a label weight (GrossGrams) was found and mass was regex-resolved (not override), ActiveGrams is set equal to GrossGrams.GrossGrams: The physical weight printed on the product label (e.g., "500 GMS", "1 KG"). Resolved via a three-tier priority chain: (1)VariantGrossOverrides[v.Title]— per-variant manual override for variants whose titles lack standard gram/kg patterns (e.g.,"30 SERV"); (2) regex extraction viareLabelGrams/reLabelKgscanningvariant.Titleandproduct.Titleonly — neverbody_html; (3) Pure Powder Fallback — if the product type is"Powder",grossGramsis still0after overrides and regex, and the product is NOT flagged for review (!needsReview), thengrossGramsis set equal toactiveGrams. Rationale: an unflagged powder product is 100% pure active ingredient, so the container weight equals the active weight. This covers products with minimalist titles (e.g., Blueprint's"Creatine") where no gram/kg pattern exists for regex to match. Defaults to0for capsule-only products, tablets, or flagged powders where neither override, regex, nor fallback applies. NOT used in cost calculations — exists solely for frontend transparency. The frontend and CLI display the value whenevergrossGrams > 0; when0, they display "—".Multiplier: The bioavailability multiplier applied toCostPerGramto produceEffectiveCost(i.e.,EffectiveCost = CostPerGram / Multiplier). Defaults to1.0for standard formulations. Values:1.5for liposomal,1.1for sublingual/gel/tablet.MultiplierLabel: Human-readable label for the multiplier reason. Empty string whenMultiplieris1.0. Possible values:"Lipo Bonus","Sublingual","Gel Bonus","Tablet Bonus".IsSubscription:truewhen the entry is a synthetic "Subscribe & Save" row generated by the analyzer.falsefor standard one-time purchase entries. The frontend uses this field to power a purchase-type toggle.NeedsReview:truewhen the Triage Engine detected a dirty keyword in a product whose mass was resolved by regex (no override).falsewhen the product has an explicit override or no dirty keyword was found. Flagged entries are also written todata/needs_review.jsonbycmd/main.go.ReviewReason: Human-readable reason for the flag. Format:"Detected dirty keyword: <word>". Empty string whenNeedsReviewisfalse.
- Framework: Next.js 15 (App Router), React 19.
- Styling: Tailwind CSS v4 (PostCSS plugin
@tailwindcss/postcss). - Deployment: Vercel (or Cloudflare Pages). Static export (
output: "export"innext.config.ts). - Rendering: Strictly Static Site Generation (SSG). No client-side fetching to original APIs. No databases. Build produces
web/out/— a flat directory of HTML/CSS/JS.
web/lib/data.tsreadsdata/analysis_report.jsonfrom the filesystem at build time usingfs.readFileSync. The data directory is resolved relative to theweb/working directory (path.resolve(process.cwd(), '..', 'data')).data.tsmaps the snake_case JSON fields (active_grams,gross_grams,cost_per_gram,effective_cost,multiplier,multiplier_label,image_url,is_subscription) to camelCase (activeGrams,grossGrams,costPerGram,effectiveCost,multiplier,multiplierLabel,imageURL,isSubscription) via a privateRawReportEntryinterface and amapEntry()function. All downstream code uses the camelCaseAnalysistype.web/app/page.tsxcallsloadReport()in a Server Component, enriches each entry withVendorInfofromweb/lib/vendors.ts, and passes the result toProductTable.- The frontend contains zero parsing logic. No regexes, no mg/count extraction, no bioavailability multipliers, no type classification. All of that lives exclusively in the Go backend's
analyzer.go. The frontend is a dumb renderer of pre-computed data.
- The Table: The core UI is a data table sorted by
effectiveCost(Lowest to Highest). Columns: Rank (gold/silver/bronze badges for top 3), Image, Vendor, Product Name, Type (colored pill badge), Base Price, Active (grams), Gross (grams), $/Gram, True Cost, Buy link. The "Active" column showsactiveGrams(the denominator for cost math). The "Gross" column showsgrossGramswhenever it is> 0(including when it equals Active — this is the expected state for pure powders); it shows "—" only whengrossGramsis0, which is the correct state for Capsules and Tablets that do not advertise a gross powder weight. - True Cost Transparency: The True Cost column header includes a hover tooltip
(i)explaining: "Base Price ÷ Bioavailability Multiplier". When a product has amultiplier > 1, a muted subtext is rendered below the True Cost value showing the multiplier and its label (e.g.,(1.5x Lipo Bonus),(1.1x Sublingual)). This subtext appears in both the desktop table rows and the mobile card layout. Products with a1.0multiplier show no subtext. - Supplement Filter: Pill-style tabs at the top filter by supplement type: All, NMN, NAD+, TMG, Resveratrol, Creatine. Implemented as a client component (
SupplementFilter.tsx) withuseState. Filtering is keyword-based on the product name/handle/vendor string — no re-analysis. - Column Sorting: Clicking Price, $/Gram, or True Cost column headers toggles ascending/descending sort. Active sort column shows a directional arrow indicator.
- Mobile Layout: Below
mdbreakpoint (768px), the table is hidden and replaced by a card layout. Each card shows rank badge, product image, vendor name, type badge, product name, a 2×2 stats grid (Price, Total, $/Gram, True Cost), and a full-width "View Deal" button. - Performance: Static export. First Load JS is ~105 kB. No client-side API calls. All product data is baked into the HTML at build time.
- Image Handling: Product images use a standard
<img>tag withloading="lazy".next.config.tsdefinesremotePatternsfor all vendor CDN hostnames (cdn.shopify.com, donotage.org, renuebyscience.com, etc.). Images are set tounoptimized: truefor static export compatibility. - Compliance Banners: All rendered in the page footer (
web/app/page.tsx):- FDA Disclaimer: "These statements have not been evaluated by the Food and Drug Administration..."
- EU Notice: "NMN is classified as a Novel Food in the European Union. Listings are provided for research and personal import purposes only."
- Vendor Registry:
web/lib/vendors.tsmaps each vendor name to its base URL and whether the handle is a full URL or a slug. It does not reference any raw data files. - Allowed Frontend Math: The only calculations permitted on the frontend are user-driven state computations (e.g., a future "Monthly Cost" column based on user dosage input). All product-level math ($/gram, effective cost, multiplier, type classification) is computed by the Go backend and consumed as-is.
.github/workflows/scrape.yml requirements:
- Schedule:
cron: '0 8 * * *'(Runs daily). - Environment: Ubuntu latest, Go 1.21+.
- Execution: Run
go run cmd/main.go -refresh. - Diff Check: Check if
data/*.jsonfiles have changed usinggit diff. This includes both raw vendor files andanalysis_report.json. - Commit & Push: If changes exist, commit as "Auto-update product data [skip ci]".
- Trigger Build: Trigger the Next.js Vercel build webhook.
- Rule 1: No Databases. Do not introduce PostgreSQL, MongoDB, Prisma, or ORMs. The JSON files in the repo are the sole source of truth.
- Rule 2: Don't Break the Analyzer. When modifying
analyzer.go, ensure strict isolation of string parsing. Do not allow HTML tags fromBodyHTMLto leak intoTypeclassification. - Rule 3: OCR is Banned. Do not implement image-to-text processing for missing data. Rely entirely on
data/vendor_rules.jsonoverrides. - Rule 4: Brutal Simplicity. Avoid complex state management (Redux/Zustand) in Next.js. The app is a static table. Keep client-side JavaScript to an absolute minimum.
- Rule 5: No Duplicated Logic. All parsing, regex extraction, bioavailability math, and type classification live exclusively in the Go backend (
analyzer.go). The frontend readsdata/analysis_report.jsonand renders it. Do not re-implement or duplicate the analyzer in TypeScript or any other language. - Rule 6: Single Integration Point.
data/analysis_report.jsonis the sole contract between the backend and frontend. If theAnalysisstruct changes in Go, update theRawReportEntryinterface inweb/lib/data.tsand theAnalysisinterface inweb/lib/types.tsto match. KeepSPEC.mdin sync perAGENTS_DOCS_PROTOCOL.md.