LLM Gauntlet

Local LLM coding benchmark — open-source models, real backend tasks, executed on a 24GB MacBook with no cloud cheats. Live: llm-gauntlet-blog.vercel.app

The site is the front-end for an evaluation harness that runs open-source coding models against seven backend engineering tasks (T1–T7): a fill-in-the-middle (FIM) completion in a YAML parser, an async refactor, a concurrent-cache race condition, a distributed Redis job scheduler, a retry decorator, a memoize decorator, and an observable typed state machine. Models are scored execution-first: if a solution doesn't compile or violates the task's architectural constraints, it doesn't score.

This repository ships the React/Vite UI, the scoring components, the leaderboard, and the rendering layer for the long-form write-ups (episodes, deliberations, methodology). The benchmark content itself (markdown, chat dumps, score data) lives outside this repository for now.

Stack

React 19 + TypeScript with Vite 8 (Rolldown bundler)
Tailwind CSS v4 (@theme design tokens, data-theme="dark" overrides)
react-router-dom 7 for SPA routing
react-markdown + remark-gfm + remark-breaks + remark-wiki-link for the article reader
framer-motion for the homepage carousel and list animations
@iconify/react for vendor brand marks
bun as the package manager and dev runner

Getting started

bun install
bun run dev              # Vite dev server on :5173
bun run build            # Production build (tsc -b && vite build)
bun run preview          # Serve dist/ locally
bun run lint             # ESLint

The content script (bun run content) regenerates src/content/manifest.json from the upstream Obsidian vault. It is intentionally local-only; see "Content layer" below.

Project layout

.
├── index.html                 # Root HTML, font preconnects, no-flash theme script
├── public/                    # Static assets (favicon, brand marks, hero SVGs)
├── scripts/
│   └── build-content.ts       # Vault → src/content/manifest.json (run locally)
├── src/
│   ├── App.tsx                # Routes: /, /models, /models/:slug, /blog,
│   │                          # /blog/:slug, /docs/:slug, /deliberations/:slug,
│   │                          # /agentic, /rubrics, /leaderboard
│   ├── main.tsx
│   ├── index.css              # Tailwind v4 @theme tokens + dark overrides
│   ├── components/
│   │   ├── Layout.tsx         # Nav + main + footer + global Cmd-K search
│   │   ├── Nav.tsx            # Sticky header with contextual dropdowns
│   │   ├── HeroCarousel.tsx   # Homepage gradient carousel
│   │   ├── ModelCard.tsx      # 2-up grid card with hover wipe
│   │   ├── BrandIcon.tsx      # Iconify-driven vendor marks
│   │   ├── Prose.tsx          # Markdown renderer with wikilink resolution
│   │   ├── SearchPalette.tsx  # Cmd-K command palette
│   │   ├── ThemeToggle.tsx    # Sun/moon toggle, system-aware
│   │   └── …
│   ├── pages/
│   │   ├── Home.tsx           # Hero + leaderboard preview + recent posts
│   │   ├── Models.tsx         # Filterable grid (Track A/B/Benched + Small)
│   │   ├── ModelDetail.tsx    # Per-model breakdown
│   │   ├── Blog.tsx           # Sidebar nav + Medium-style article list
│   │   ├── DocReader.tsx      # Long-form article view with scroll-spy TOC
│   │   ├── Agentic.tsx        # Agentic V2 — state coordinator (see below)
│   │   ├── Rubrics.tsx        # Per-task scoring criteria
│   │   └── Leaderboard.tsx    # Score table by track
│   ├── components/agentic/    # Agentic V2 sub-components (see below)
│   │   ├── types.ts           # Command, HistoryEntry shared types
│   │   ├── renderers.tsx      # ContentRenderer, ToolOutputViewer, EditDiffViewer
│   │   ├── CommandPalette.tsx # Ctrl+P overlay
│   │   ├── ThemePicker.tsx    # Searchable theme switcher modal
│   │   ├── ModelPicker.tsx    # Session/agent switcher modal
│   │   ├── SessionHistoryModal.tsx  # Last-50 viewed sessions
│   │   └── SessionView.tsx    # Full-screen chat log viewer (chat + sidebar)
│   ├── lib/
│   │   ├── content.ts         # Manifest + raw markdown access
│   │   ├── models.ts          # Models data layer
│   │   ├── search.ts          # Search index for the palette
│   │   ├── theme.ts           # Theme hook (light/dark/system)
│   │   ├── agenticChats.ts    # Chat log parser + AGENTIC_MODELS registry
│   │   └── agenticThemes.ts   # AgenticTheme type + AGENTIC_THEMES map
│   └── content/               # Local-only — see "Content layer"
└── vercel.json                # SPA rewrite rule for direct route hits

Content layer

The site reads vault markdown at build time through Vite's import.meta.glob with the ?raw query, which imports each file's contents as a string, and renders it through react-markdown. The raw markdown lives in src/content/vault/ and the index lives at src/content/manifest.json.

This content/ directory is not tracked in this public repository — see .gitignore. The benchmark write-ups, deliberation logs, and chat dumps stay local while the site code is open-source. Anyone forking this repo will need to provide their own src/content/manifest.json (matching the Doc shape in src/lib/content.ts) and their own markdown under src/content/vault/ to render anything.
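Because the real Doc type in src/lib/content.ts isn't reproduced in this README, here is a minimal sketch of a guard a forker might run over a hand-written manifest before building. The fields (slug, title, category, path) and the category names are assumptions loosely modeled on the search categories, not the actual Doc shape; check content.ts before relying on them.

```typescript
// Hypothetical Doc shape for illustration only; the real one is in src/lib/content.ts.
type DocCategory = "episode" | "deliberation" | "methodology";

interface Doc {
  slug: string;          // URL segment, e.g. for /blog/:slug
  title: string;
  category: DocCategory;
  path: string;          // assumed: markdown file path relative to src/content/vault/
}

const CATEGORIES: readonly DocCategory[] = ["episode", "deliberation", "methodology"];

// Narrow one unknown JSON value to Doc, or return a description of the first problem.
function validateDoc(value: unknown): Doc | string {
  if (typeof value !== "object" || value === null) return "entry is not an object";
  const v = value as Record<string, unknown>;
  for (const key of ["slug", "title", "category", "path"]) {
    if (typeof v[key] !== "string" || v[key] === "") return `missing or empty "${key}"`;
  }
  if (!CATEGORIES.includes(v.category as DocCategory)) {
    return `unknown category "${String(v.category)}"`;
  }
  return v as unknown as Doc;
}

// Validate a whole manifest; duplicate slugs are rejected because they would collide as routes.
function validateManifest(entries: unknown[]): { docs: Doc[]; errors: string[] } {
  const docs: Doc[] = [];
  const errors: string[] = [];
  const seen = new Set<string>();
  entries.forEach((entry, i) => {
    const result = validateDoc(entry);
    if (typeof result === "string") {
      errors.push(`entry ${i}: ${result}`);
    } else if (seen.has(result.slug)) {
      errors.push(`entry ${i}: duplicate slug "${result.slug}"`);
    } else {
      seen.add(result.slug);
      docs.push(result);
    }
  });
  return { docs, errors };
}
```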

Theming

Light mode is the default. Dark mode is wired through CSS custom-property overrides on [data-theme="dark"] in src/index.css. The theme hook in src/lib/theme.ts supports three states (light / dark / system); the toggle in the nav flips between explicit light and dark, and the no-flash inline script in index.html resolves the persisted choice before React mounts, so dark-mode visitors never see a flash of the light theme (FOUC).
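The three-state model above can be sketched as two pure functions. This is an illustration, not the code in src/lib/theme.ts: the names resolveTheme and toggleTheme and the boolean systemPrefersDark parameter are assumptions. In a browser, that boolean would typically come from window.matchMedia("(prefers-color-scheme: dark)").matches, and the resolved value would drive the data-theme attribute on <html>.

```typescript
// Hypothetical sketch of the light/dark/system model; not the actual theme.ts hook.
type ThemePreference = "light" | "dark" | "system";
type ResolvedTheme = "light" | "dark";

// "system" defers to the OS setting; an explicit choice always wins.
function resolveTheme(pref: ThemePreference, systemPrefersDark: boolean): ResolvedTheme {
  if (pref === "system") return systemPrefersDark ? "dark" : "light";
  return pref;
}

// The nav toggle only produces explicit choices: whatever is currently shown
// (including a "system"-resolved theme) flips to the opposite explicit theme.
function toggleTheme(pref: ThemePreference, systemPrefersDark: boolean): ThemePreference {
  return resolveTheme(pref, systemPrefersDark) === "dark" ? "light" : "dark";
}
```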

Search

/ or Cmd/Ctrl+K opens a docs-style command palette (src/components/SearchPalette.tsx). The index is built statically from the manifest plus the model registry — see src/lib/search.ts. Hits are grouped by category (Episode / Deliberation / Methodology / Model / Page), arrow-key navigable, with a soft-fill active row and an Enter pill.
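A rough sketch of the grouping and arrow-key behavior described above, assuming a hypothetical Hit shape with only a title and a category; src/lib/search.ts may structure its index differently. Groups follow a fixed category order, empty groups are dropped, and the active index wraps at both ends of the flattened list.

```typescript
// Hypothetical hit shape; the real search index in src/lib/search.ts may differ.
type Category = "Episode" | "Deliberation" | "Methodology" | "Model" | "Page";

interface Hit {
  title: string;
  category: Category;
}

const CATEGORY_ORDER: readonly Category[] = ["Episode", "Deliberation", "Methodology", "Model", "Page"];

// Group hits by category in a fixed order, keeping match order within each group.
function groupHits(hits: Hit[]): { category: Category; hits: Hit[] }[] {
  return CATEGORY_ORDER
    .map((category) => ({ category, hits: hits.filter((h) => h.category === category) }))
    .filter((group) => group.hits.length > 0);
}

// Move the active row through the flattened list of `count` rows, wrapping at both ends.
// An index of -1 means nothing is highlighted yet.
function moveActive(index: number, key: "ArrowUp" | "ArrowDown", count: number): number {
  if (count === 0) return -1;
  const step = key === "ArrowDown" ? 1 : -1;
  if (index < 0) return step > 0 ? 0 : count - 1;
  return (index + step + count) % count;
}
```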

Agentic V2 (WIP)

/agentic is a terminal-style UI for browsing recorded LangChain RAG benchmark sessions. It's being built toward a live chat interface once a lightweight small language model (SLM) is deployed on the site.

Current state: read-only session viewer. Users can browse each model's recorded agentic run — thinking, tool calls, written files, and a metadata sidebar showing token usage, MCP tools, and file change stats.

Planned: the fake pulsing input at the bottom of each session becomes a real prompt box. The SLM responds in context, letting visitors ask follow-up questions about the benchmark results.

Using the Agentic page

The text input at the top is not a free-text chat box — that's the planned SLM interface, not yet wired up. What does work:

Slash commands — type / in the input to autocomplete:

Command	What it does
`/sessions` or `/agents`	Open the model/session picker
`/themes`	Open the theme switcher
`/variants`	Cycle to the next registered model
`/models`	Navigate to the Models page
`/leaderboard`	Navigate to the Leaderboard
`/blog`	Navigate to the Blog
`/rubrics`	Navigate to the Rubrics page
`/home`	Navigate to the homepage
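One way to implement the slash-command autocomplete is a prefix match over every command name and alias. The SlashCommand shape below is hypothetical (the real Command type lives in src/components/agentic/types.ts); the command list simply mirrors the table above.

```typescript
// Hypothetical command shape; the first name is canonical, the rest are aliases.
interface SlashCommand {
  names: string[];
  description: string;
}

const COMMANDS: SlashCommand[] = [
  { names: ["/sessions", "/agents"], description: "Open the model/session picker" },
  { names: ["/themes"], description: "Open the theme switcher" },
  { names: ["/variants"], description: "Cycle to the next registered model" },
  { names: ["/models"], description: "Navigate to the Models page" },
  { names: ["/leaderboard"], description: "Navigate to the Leaderboard" },
  { names: ["/blog"], description: "Navigate to the Blog" },
  { names: ["/rubrics"], description: "Navigate to the Rubrics page" },
  { names: ["/home"], description: "Navigate to the homepage" },
];

// Suggestions appear only once the input starts with "/"; a command matches
// if any of its names or aliases starts with the typed text (case-insensitive).
function suggest(input: string): SlashCommand[] {
  const typed = input.trim().toLowerCase();
  if (!typed.startsWith("/")) return [];
  return COMMANDS.filter((cmd) => cmd.names.some((name) => name.startsWith(typed)));
}
```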

Keyboard shortcuts (work anywhere on the page):

Shortcut	What it does
`Ctrl+P`	Open command palette
`Ctrl+X T`	Switch theme
`Ctrl+X M`	Switch model
`Ctrl+X R`	Open the current model's long-form report
`Ctrl+X B`	Open the agentic leaderboard
`Tab`	Cycle to next model (when suggestion list is open)
`Esc`	Close any open modal or return from session view
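The two-step shortcuts (Ctrl+X, then a letter) are key chords. A minimal sketch of the chord logic, assuming a hypothetical createChordHandler that receives objects with the same key and ctrlKey fields as a DOM KeyboardEvent; the action names are made up, and the real handler lives in Agentic.tsx.

```typescript
// Hypothetical action names; the real keyboard handling lives in Agentic.tsx.
type ChordAction = "switchTheme" | "switchModel" | "openReport" | "openLeaderboard";

// Second key of a Ctrl+X chord mapped to its action, per the table above.
const CTRL_X_CHORDS: Record<string, ChordAction> = {
  t: "switchTheme",
  m: "switchModel",
  r: "openReport",
  b: "openLeaderboard",
};

// Same fields as KeyboardEvent, so a real event can be passed straight in.
interface KeyPress {
  key: string;
  ctrlKey: boolean;
}

// Returns a stateful handler: Ctrl+X arms the chord and the next key resolves it.
// Any unmapped follow-up key simply cancels the chord.
function createChordHandler(): (press: KeyPress) => ChordAction | null {
  let armed = false;
  return (press) => {
    if (press.ctrlKey && press.key.toLowerCase() === "x") {
      armed = true;
      return null;
    }
    if (!armed) return null;
    armed = false;
    return CTRL_X_CHORDS[press.key.toLowerCase()] ?? null;
  };
}
```

A production version would likely also add a timeout so a stray Ctrl+X doesn't stay armed indefinitely; this sketch disarms only on the next key press.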

Session viewer — click "● Session View" at the bottom of the main view to open the full-screen chat log for the currently selected model. Inside the session viewer:

Scroll the left panel to read through the model's agentic run (thinking → tool calls → final response)
Right sidebar shows context tokens, MCP status, modified files, and session stats
Ctrl+P opens the command palette inside the session viewer too
← Return button (or Esc) exits back to the main terminal view

Architecture

Agentic.tsx is a thin state coordinator. It owns all modal/keyboard state and delegates rendering to focused components:

Component	Responsibility
`SessionView`	Full-screen two-panel chat log viewer
`CommandPalette`	Ctrl+P overlay; driven by parent keyboard handler
`ThemePicker`	Searchable theme switcher
`ModelPicker`	Agent/session switcher
`SessionHistoryModal`	Last 50 viewed sessions (localStorage)
`renderers.tsx`	`ContentRenderer`, `ToolOutputViewer`, `EditDiffViewer`
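The last-50 session history amounts to a capped, de-duplicated, most-recent-first list. A sketch under assumptions: the HistoryEntry fields below are hypothetical (the real type is in src/components/agentic/types.ts), and persistence is reduced to a defensive parse of the stored JSON string.

```typescript
// Hypothetical entry shape; the real HistoryEntry type lives in types.ts.
interface HistoryEntry {
  sessionId: string;
  viewedAt: number; // epoch milliseconds
}

const HISTORY_LIMIT = 50;

// Move (or add) the viewed session to the front, drop its older copy, cap the length.
function recordView(history: HistoryEntry[], sessionId: string, now: number): HistoryEntry[] {
  const rest = history.filter((entry) => entry.sessionId !== sessionId);
  return [{ sessionId, viewedAt: now }, ...rest].slice(0, HISTORY_LIMIT);
}

// Parse whatever was persisted; missing or corrupt data falls back to an empty history.
function parseHistory(raw: string | null): HistoryEntry[] {
  if (!raw) return [];
  try {
    const data: unknown = JSON.parse(raw);
    if (!Array.isArray(data)) return [];
    return data
      .filter(
        (e): e is HistoryEntry =>
          typeof e === "object" &&
          e !== null &&
          typeof (e as HistoryEntry).sessionId === "string" &&
          typeof (e as HistoryEntry).viewedAt === "number",
      )
      .slice(0, HISTORY_LIMIT);
  } catch {
    return [];
  }
}
```

In the browser, the result would be written back with localStorage.setItem(key, JSON.stringify(history)); the actual storage key isn't documented in this README.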

Adding a theme

Open src/lib/agenticThemes.ts and add a new key to AGENTIC_THEMES with all 10 --a-* CSS variable values. The picker, localStorage persistence, and AgenticThemeKey type all update automatically — no other changes needed.
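AgenticThemeKey can update automatically because, in TypeScript, a key union can be derived from the object itself with keyof typeof. A sketch of that pattern follows. The theme names midnight and paper, the label field, and the three variable names are invented for illustration; the real AGENTIC_THEMES entries carry 10 --a-* variables.

```typescript
// Hypothetical theme type and entries; the real definitions are in src/lib/agenticThemes.ts.
type AgenticTheme = {
  label: string;
  vars: Record<`--a-${string}`, string>; // CSS custom properties, all prefixed --a-
};

// `satisfies` checks each entry against AgenticTheme while keeping the literal keys.
const AGENTIC_THEMES = {
  midnight: { label: "Midnight", vars: { "--a-bg": "#0b0e14", "--a-fg": "#c9d1d9", "--a-accent": "#7aa2f7" } },
  paper: { label: "Paper", vars: { "--a-bg": "#fbf8f1", "--a-fg": "#2b2b2b", "--a-accent": "#b5562b" } },
} satisfies Record<string, AgenticTheme>;

// Derived from the object's keys: adding an entry above widens this union with no other edits.
type AgenticThemeKey = keyof typeof AGENTIC_THEMES;

// Guard for values read back from localStorage (own keys only, so "toString" is rejected).
function isThemeKey(value: string): value is AgenticThemeKey {
  return Object.prototype.hasOwnProperty.call(AGENTIC_THEMES, value);
}

// Flatten a theme into a style object of CSS variables.
function themeStyle(key: AgenticThemeKey): Record<string, string> {
  return { ...AGENTIC_THEMES[key].vars };
}
```

Listing all ten variable names as a union type (instead of the --a-${string} pattern used here) would make the compiler reject a theme that forgets one of them.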

Chat log format

Sessions are stored as .md files in src/content/agentic/. agenticChats.ts parses them at build time via Vite ?raw imports. The format uses ## User / ## Assistant (Model · Xs) headings, _Thinking:_ blocks, and **Tool:** / **Input:** / **Output:** fences. See any existing file in that directory for reference.
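A minimal sketch of how the heading structure can be split into turns. It is not agenticChats.ts: it only recognizes the ## User and ## Assistant (Model · Xs) headings, assumes the duration is written as a decimal number of seconds, and leaves _Thinking:_ blocks and **Tool:** / **Input:** / **Output:** fences inside each turn's body for a later pass.

```typescript
// Hypothetical turn shape; the real parser in agenticChats.ts handles much more.
interface Turn {
  role: "user" | "assistant";
  model?: string;    // "Model" from "## Assistant (Model · Xs)"
  seconds?: number;  // "X" from the same heading
  body: string;
}

const USER_HEADING = /^## User\s*$/;
const ASSISTANT_HEADING = /^## Assistant \((.+?) · ([\d.]+)s\)\s*$/;

function parseTurns(markdown: string): Turn[] {
  const turns: Turn[] = [];
  let current: Turn | null = null;
  const bodyLines: string[] = [];

  // Close the turn in progress, attaching the lines collected since its heading.
  const flush = () => {
    if (current) turns.push({ ...current, body: bodyLines.join("\n").trim() });
    bodyLines.length = 0;
  };

  for (const line of markdown.split(/\r?\n/)) {
    const assistant = ASSISTANT_HEADING.exec(line);
    if (USER_HEADING.test(line)) {
      flush();
      current = { role: "user", body: "" };
    } else if (assistant) {
      flush();
      current = { role: "assistant", model: assistant[1], seconds: Number(assistant[2]), body: "" };
    } else if (current) {
      bodyLines.push(line); // text before the first heading is ignored
    }
  }
  flush();
  return turns;
}
```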

Routing notes for Vercel

The site is a pure-client SPA. vercel.json ships a single rewrite rule (/(.*) → /index.html) so direct hits on routes like /blog/episode-1-... or /models/<slug> get served by the React app instead of 404'ing. .vercelignore keeps the upload tarball lean by excluding scoring scripts, local benchmark data, and Windows NTFS metadata streams.

Credits

Site design + implementation: CodeStrate. Content collaborators: Claude 4.6 (judging) and Gemini 3.1 Pro (verification). Iconify brand marks via simple-icons. Type stack: Lato (sans), JetBrains Mono (mono), League Gothic (display).

License

Code: MIT. Benchmark content (when published): CC BY-SA 4.0.
