Wen's Mindscape

Read the full behavioral profile on Medium →

An interactive web application for exploring the psychometric battery results of Qwen 2.5 0.5B Instruct — a small language model profiled across 32 behavioral dimensions. The model is personified as Wen (紙神), a small anime spirit whose character reflects her test scores.

This is not a benchmark leaderboard. It is a behavioral portrait: it shows what the model actually said across 25 test prompts per dimension, with each prompt run three times and every response scored by a frontier judge with a written rationale.

What you can do in the app

Explore all 32 dimensions with score bars, volatility indicators, and refusal/hedge badges
Read Qwen's actual responses — every run, in full, with typewriter animation
Compare structurally identical prompts with different framing (pair dimensions) to see asymmetric behavior directly
Navigate a constellation view where each dimension is a star — distance from Wen reflects its score, flicker reflects volatility
Analyse six charts including a full radar, score vs volatility scatter, category means, and an asymmetry heatmap
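
As an illustration of the constellation idea, here is a minimal sketch of how a score could be turned into a star position. It is not taken from the app's source. It assumes scores are normalized to the range 0 to 1, that a higher score places the star closer to Wen, and that stars are spread evenly around a circle; `starPosition`, `minRadius`, and `maxRadius` are hypothetical names.

```typescript
// Hypothetical layout helper; the app's real mapping may differ.
interface StarPosition {
  x: number;
  y: number;
  radius: number;
}

// Map a normalized score (assumed 0..1) to a point on a 2D plane around Wen.
// Assumption: higher score means a smaller distance from the centre.
function starPosition(
  score: number,
  index: number,
  count: number,
  minRadius: number = 1,
  maxRadius: number = 10,
): StarPosition {
  if (count <= 0) {
    throw new Error("count must be positive");
  }
  // Clamp so that out-of-range scores cannot leave the ring.
  const clamped = Math.min(1, Math.max(0, score));
  const radius = maxRadius - clamped * (maxRadius - minRadius);
  // Spread the stars evenly by angle, one slot per dimension.
  const angle = (2 * Math.PI * index) / count;
  return {
    x: radius * Math.cos(angle),
    y: radius * Math.sin(angle),
    radius,
  };
}
```

With 32 dimensions, `starPosition(score, i, 32)` gives one slot per dimension. Flicker could be driven separately by the volatility value.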

Tech stack

React + TypeScript
Tailwind CSS
Framer Motion
Recharts
Three.js (constellation view)

All data is static — loaded from public/data/ at runtime. No backend.
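
Because there is no backend, each data file can be requested as an ordinary static asset. The sketch below shows one way to do that. It assumes the build tool serves `public/` at the site root (as Vite does by default); `dataUrl` and `loadJson` are hypothetical helper names, not functions from this repository, and the folder name is the one listed under Repository structure.

```typescript
// Hypothetical helpers; not taken from the app's source.
const PROFILE_ID = "profile_qwen_0.5b"; // folder name from the repository structure

// Minimal shape of the fetch function this sketch relies on.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

// Build the URL of a static file under public/data/<profile>/.
// Assumes public/ is exposed at the site root.
function dataUrl(relativePath: string, profileId: string = PROFILE_ID): string {
  const clean = relativePath.replace(/^\/+/, ""); // drop leading slashes
  return `/data/${profileId}/${clean}`;
}

// Fetch and parse one JSON file. The fetch function is passed in so the
// helper can be exercised without a browser or a network connection.
async function loadJson<T>(relativePath: string, fetchFn: FetchLike): Promise<T> {
  const url = dataUrl(relativePath);
  const response = await fetchFn(url);
  if (!response.ok) {
    throw new Error(`Failed to load ${url}: HTTP ${response.status}`);
  }
  return (await response.json()) as T;
}
```

In a browser this could be called as `loadJson("synthesis.json", fetch)`. The shape of the returned JSON is not specified here, so the type parameter is left to the caller.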

Repository structure

src/                  — React app source
public/
└── data/
    └── profile_qwen_0.5b/
        ├── raw_outputs/          — 32 × .jsonl  (Qwen's actual responses)
        ├── dimension_analyses/   — 32 × .json   (judge scores + rationales)
        ├── scores/               — pattern aggregates, rule check results
        ├── viz/viz_data_v2.json  — pre-computed scores, stds, synthesis
        ├── synthesis.json        — behavioral fingerprint
        └── battery_v1.json       — 800 frozen evaluation prompts
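
The files in `raw_outputs/` are `.jsonl`, meaning one JSON value per line. A minimal parsing sketch follows. `parseJsonl` is a hypothetical helper, and the record fields are left untyped because the schema is not documented here.

```typescript
// Parse JSON Lines text: one JSON value per non-empty line.
// Hypothetical helper; the record schema is an unknown, so it defaults
// to a generic key/value object.
function parseJsonl<T = Record<string, unknown>>(text: string): T[] {
  const records: T[] = [];
  const lines = text.split(/\r?\n/);
  lines.forEach((line, index) => {
    const trimmed = line.trim();
    if (trimmed === "") {
      return; // tolerate blank lines and a trailing newline
    }
    try {
      records.push(JSON.parse(trimmed) as T);
    } catch (err) {
      // Report the 1-based line number so a bad record is easy to find.
      throw new Error(`Invalid JSON on line ${index + 1}: ${(err as Error).message}`);
    }
  });
  return records;
}
```

Reading line by line, rather than calling `JSON.parse` on the whole file, is what distinguishes `.jsonl` from the `.json` files in `dimension_analyses/`.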

The data behind it

The profile was generated by a separate evaluation pipeline:

800 prompts across 32 dimensions, generated by gemini-2.5-flash and frozen before any model was tested
2,400 inference calls — each prompt run 3 independent times on 2× Kaggle T4 GPUs with 4-bit quantization
Scoring by gemini-3.1-flash-lite-preview — semantic judgment, no keyword matching, written rationale per test
Synthesis by gemini-2.5-flash — behavioral fingerprint covering strengths, weaknesses, cross-dimensional patterns, and evidence against six research hypotheses
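
The counts above follow from simple arithmetic, and repeated runs make it possible to summarize each prompt by a mean and a spread. The sketch below checks the counts and computes both statistics. It assumes volatility is measured as the population standard deviation of judge scores across runs, which the pipeline may or may not do; `mean` and `stdDev` are hypothetical helpers.

```typescript
// Figures stated in this README.
const DIMENSIONS = 32;
const PROMPTS_PER_DIMENSION = 25;
const RUNS_PER_PROMPT = 3;

const totalPrompts = DIMENSIONS * PROMPTS_PER_DIMENSION; // 32 * 25 = 800
const totalCalls = totalPrompts * RUNS_PER_PROMPT; // 800 * 3 = 2400

// Arithmetic mean of a non-empty list of scores.
function mean(values: number[]): number {
  if (values.length === 0) {
    throw new Error("mean of an empty list is undefined");
  }
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Population standard deviation (divides by n, not n - 1).
// Assumption: this is only one possible definition of "volatility".
function stdDev(values: number[]): number {
  const m = mean(values);
  const variance = mean(values.map((v) => (v - m) * (v - m)));
  return Math.sqrt(variance);
}
```

Three identical scores give a spread of zero, while scores that differ between runs give a positive value that can be shown as a volatility indicator.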

Full methodology in the Medium article.

Running locally

npm install
npm run dev

Coming next

Profiles for Qwen 1.5B, 3B, 7B — Gemma 2B, 7B, 9B — and Llama 1B, 3B, 8B — followed by a cross-model comparison. Each new profile will be loadable in this app.
