17 changes: 11 additions & 6 deletions README.md
@@ -57,6 +57,8 @@ PUPILA shells out to whichever AI CLI you already have authenticated locally —

Drag a `.pdf` / `.docx` / `.md` / `.txt` CV onto the drop zone (or click **Choose file**). The CV is parsed locally and sent to your LLM CLI to generate a short candidate brief — who you are, what stack you work in, what kind of role you want, and what to avoid. The original CV stays on disk at `config/cv.<ext>` (gitignored) so **AI Apply** can re-attach it later.

**No recent CV?** Click **Import from LinkedIn instead** (optional) and upload a LinkedIn profile PDF — on your LinkedIn profile, go to **More → Save to PDF**, then drop the downloaded file here. It runs through the same parse → LLM pipeline with a LinkedIn-tuned prompt, so the resulting brief is comparable to a CV-sourced one. (No login or scraping — the "Save to PDF" export is fully self-serve.)

### 3. Confirm the generated brief

<p align="left">
@@ -84,7 +86,7 @@ Contributing rules and project invariants live in [`CONTRIBUTING.md`](./CONTRIBU
> **Looking for today's matches?** → [`JOBS.md`](./JOBS.md) (auto-generated by the local aggregator).
> Raw data lives in [`data/jobs.json`](./data/jobs.json).
> Local RSS feed at [`data/feed.xml`](./data/feed.xml) (point your reader at the file:// path).
> Prefer a UI? → `pnpm run ui` opens a local-only Vite dashboard at `http://127.0.0.1:5173`. Four tabs: **Jobs** (filter, search, sortable columns, click-to-expand rows over `data/jobs.json`), **Jinder** (Tinder-style swipe deck — right-swipe to queue an AI Apply, left-swipe to skip), **Profile** (drop a PDF/DOCX/MD CV to set up or refresh your candidate brief), and **Settings** (scheduler lifecycle, scoring profile regenerate, disk usage, apply queue). See [AI Apply](#ai-apply-per-job-optional) and [AI per-job review](#ai-per-job-review) below.
> Prefer a UI? → `pnpm run ui` opens a local-only Vite dashboard at `http://127.0.0.1:5173`. Four tabs: **Jobs** (filter, search, sortable columns, click-to-expand rows over `data/jobs.json`), **Jinder** (Tinder-style swipe deck — right-swipe to queue an AI Apply, left-swipe to skip), **Profile** (drop a PDF/DOCX/MD CV — or import a LinkedIn profile PDF — to set up or refresh your candidate brief), and **Settings** (scheduler lifecycle, scoring profile regenerate, disk usage, apply queue). See [AI Apply](#ai-apply-per-job-optional) and [AI per-job review](#ai-per-job-review) below.

---

@@ -113,16 +115,19 @@ The friendliest path is the **first-run onboarding wizard**: run `pnpm run ui` o
For the CLI-only path, run setup-brief directly:

```bash
pnpm run setup-brief --file ~/Documents/cv.pdf # PDF
pnpm run setup-brief --file ~/Documents/cv.docx # Word document
pnpm run setup-brief --file ~/Documents/cv.md # markdown
cat resume.txt | pnpm run setup-brief # stdin
pnpm run setup-brief --file ~/Documents/cv.pdf # PDF
pnpm run setup-brief --file ~/Documents/cv.docx # Word document
pnpm run setup-brief --file ~/Documents/cv.md # markdown
pnpm run setup-brief --linkedin ~/Downloads/profile.pdf # LinkedIn "Save to PDF" export
cat resume.txt | pnpm run setup-brief # stdin
```

`--linkedin` runs the same pipeline as `--file` but tells the LLM the input is a LinkedIn profile export (so it ignores LinkedIn's boilerplate). Don't have a recent CV? On your LinkedIn profile, go to **More → Save to PDF** and pass the downloaded file. (A `--file` whose name contains "linkedin" is auto-detected as a LinkedIn source.)
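
The filename-based detection described above can be sketched as follows. This is a minimal illustration, not the PR's code: `inferSource` is a hypothetical helper name, and it assumes that an explicit `--linkedin` always wins and that only the file's basename is inspected.

```typescript
import { basename } from 'node:path';

type BriefSource = 'cv' | 'linkedin';

// Hypothetical helper (name and signature are assumptions for illustration).
// `explicit` stands for the source already chosen by the flags: 'linkedin'
// when --linkedin was passed, 'cv' otherwise.
function inferSource(filePath: string, explicit: BriefSource = 'cv'): BriefSource {
  if (explicit === 'linkedin') return 'linkedin';
  // Match the basename only, so a parent directory named "linkedin-stuff"
  // does not flip an ordinary CV to the LinkedIn prompt.
  return /linkedin/i.test(basename(filePath)) ? 'linkedin' : 'cv';
}

console.log(inferSource('/home/u/Downloads/LinkedIn_Profile.pdf')); // linkedin
console.log(inferSource('/home/u/linkedin-stuff/cv.pdf')); // cv
```

Matching case-insensitively covers the capitalised "LinkedIn" that is likely to appear in downloaded file names.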

Or open the UI and use the Profile tab:

```bash
pnpm run ui # http://127.0.0.1:5173 → Profile tab → drop your CV
pnpm run ui # http://127.0.0.1:5173 → Profile tab → drop your CV (or "From LinkedIn")
```

The auto-detected provider order is `claude` → `codex` → `gemini` → `opencode` (whichever is on `PATH` first). Override with `PUPILA_LLM=codex pnpm run setup-brief ...`. No API keys; uses your existing CLI subscription.
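
As a rough sketch of the provider order above: the real detection lives in `detectLlmCli` (`src/lib/llm.ts`), which this diff does not show, so `pickProvider`, the injected `isOnPath` predicate, and the handling of an unknown override are all assumptions made for illustration.

```typescript
const PROVIDER_ORDER = ['claude', 'codex', 'gemini', 'opencode'] as const;
type Provider = (typeof PROVIDER_ORDER)[number];

// Hypothetical helper: `isOnPath` is injected so the ordering logic can be
// exercised without touching the real PATH.
function pickProvider(
  isOnPath: (cmd: string) => boolean,
  override?: string,
): Provider | null {
  if (override) {
    // Assumption: an override naming an unsupported provider yields null
    // rather than silently falling back to auto-detection.
    return PROVIDER_ORDER.find((p) => p === override) ?? null;
  }
  // First provider in the documented order that is available wins.
  return PROVIDER_ORDER.find((p) => isOnPath(p)) ?? null;
}

// With both codex and gemini installed, codex wins because it comes first.
console.log(pickProvider((cmd) => cmd === 'gemini' || cmd === 'codex')); // codex
```

In this sketch the override would be fed from the `PUPILA_LLM` environment variable.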
53 changes: 53 additions & 0 deletions src/lib/brief-prompt.ts
@@ -0,0 +1,53 @@
// Single source of truth for the candidate-brief summarization prompt.
//
// Both entry points that turn a CV-like document into config/candidate-brief.md
// share this builder so they can't drift:
// - the `pnpm run setup-brief` CLI (src/setup-brief.ts)
// - the UI's POST /api/cv middleware (ui/plugins/brief.ts)
//
// The output contract is identical regardless of source — three short
// paragraphs of plain markdown — so a LinkedIn-sourced brief is comparable in
// quality to a CV-sourced one. Only the *framing* changes: a LinkedIn "Save to
// PDF" export has a predictable structure (and predictable boilerplate) that
// the LLM does better with when we name it explicitly.

export type BriefSource = 'cv' | 'linkedin';

// Shared three-paragraph contract + closing instructions. Kept verbatim across
// sources so the resulting brief has the same shape no matter where the raw
// text came from.
const OUTPUT_CONTRACT = `Output ONLY three short paragraphs as plain markdown text. No preamble, no markdown fences, no headings, no commentary.

PARAGRAPH 1 — Who they are: role, years of experience, primary location, primary stack/skills. Be concrete (frameworks, languages, tools they ship with regularly).
PARAGRAPH 2 — What they're looking for: target seniority (senior / lead / staff / principal IC), domains/sectors of interest (web3, AI, fintech, etc.), location preference (remote-worldwide / remote-EMEA / hybrid in <city> / open to relocation).
PARAGRAPH 3 — What to avoid: roles that look like a fit on paper but aren't. Examples: wrong specialty (backend if frontend, etc.), wrong level (junior, intern, exec), on-site only, US-only positions, support/solutions/devrel/GTM titles.

Aim for 6-10 lines total. Drop anything that doesn't help a job-matching tool decide. Don't editorialize.`;

const CV_INTRO = `You are summarizing the following CV into a short candidate brief that will be sent to an LLM each time the candidate's job-matching tool evaluates a posting. The brief decides whether the LLM agrees with the rule-based fit score.`;

const LINKEDIN_INTRO = `You are summarizing the candidate's LinkedIn profile into a short candidate brief that will be sent to an LLM each time the candidate's job-matching tool evaluates a posting. The brief decides whether the LLM agrees with the rule-based fit score.`;

// LinkedIn "Save to PDF" exports interleave profile content with structural
// boilerplate (Contact section, "Page N of M" footers, endorsement/skill
// counts, "Top Skills", repeated headers). Naming the source lets the LLM
// ignore that noise and read the Experience/Education sections as a résumé.
const LINKEDIN_PREAMBLE = `The text below was extracted from a LinkedIn profile exported via "Save to PDF". Treat it as the candidate's résumé. Ignore LinkedIn boilerplate — contact details, "Page N of M" footers, skill-endorsement counts, "Top Skills" / "Contact" section labels, and repeated page headers. Infer the candidate's current role, seniority, and stack from the Experience and Education sections.`;

/**
* Build the summarization prompt for a CV or LinkedIn profile export.
*
* @param text parsed plain text of the document (CV or LinkedIn PDF)
* @param source 'cv' or 'linkedin' (required, no default); only changes the framing
* @param maxChars hard cap on how much of `text` we forward to the LLM
*/
export function buildBriefPrompt(text: string, source: BriefSource, maxChars: number): string {
const label = source === 'linkedin' ? 'LINKEDIN PROFILE' : 'CV';
const intro = source === 'linkedin' ? `${LINKEDIN_INTRO}\n\n${LINKEDIN_PREAMBLE}` : CV_INTRO;
return `${intro}

${OUTPUT_CONTRACT}

${label}:
${text.slice(0, maxChars)}`;
}
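
To make the prompt layout concrete, here is a condensed stand-in for the builder above. It is a sketch only: `sketchBriefPrompt`, `CONTRACT` and `INTROS` are illustrative names, and the strings are abbreviated placeholders rather than the real prompt copy.

```typescript
type BriefSource = 'cv' | 'linkedin';

// Abbreviated placeholders for OUTPUT_CONTRACT and the two intros.
const CONTRACT = 'Output ONLY three short paragraphs as plain markdown text.';
const INTROS: Record<BriefSource, string> = {
  cv: 'You are summarizing the following CV ...',
  linkedin: "You are summarizing the candidate's LinkedIn profile ...",
};

// Same shape as buildBriefPrompt: intro, contract, then the labelled and
// truncated document text, each separated by a blank line.
function sketchBriefPrompt(text: string, source: BriefSource, maxChars: number): string {
  const label = source === 'linkedin' ? 'LINKEDIN PROFILE' : 'CV';
  return [INTROS[source], CONTRACT, `${label}:\n${text.slice(0, maxChars)}`].join('\n\n');
}

const prompt = sketchBriefPrompt('Jane Doe, Senior Frontend Engineer', 'linkedin', 8);
console.log(prompt.split('\n').pop()); // Jane Doe
```

Note that `String.prototype.slice` counts UTF-16 code units, so `maxChars` is a cap on code units rather than on user-visible characters.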
62 changes: 39 additions & 23 deletions src/setup-brief.ts
@@ -5,17 +5,24 @@
// keeping it sharp directly improves the per-job verdicts.
//
// CLI:
// pnpm run setup-brief --file path/to/cv.pdf # parse PDF (via pdfjs-dist)
// pnpm run setup-brief --file path/to/cv.docx # parse DOCX (via mammoth)
// pnpm run setup-brief --file path/to/cv.md # plain markdown
// pnpm run setup-brief --file path/to/cv.txt # plain text
// cat cv.txt | pnpm run setup-brief # stdin
// pnpm run setup-brief --file path/to/cv.pdf # parse PDF (via pdfjs-dist)
// pnpm run setup-brief --file path/to/cv.docx # parse DOCX (via mammoth)
// pnpm run setup-brief --file path/to/cv.md # plain markdown
// pnpm run setup-brief --file path/to/cv.txt # plain text
// pnpm run setup-brief --linkedin path/to/profile.pdf # LinkedIn "Save to PDF" export
// cat cv.txt | pnpm run setup-brief # stdin
//
// `--linkedin` is the same pipeline as `--file` but tells the LLM the input is
// a LinkedIn profile export so it ignores LinkedIn's boilerplate. A `--file`
// whose name contains "linkedin" is auto-treated as a LinkedIn source.
//
// Provider: auto-detects claude / codex / gemini / opencode on PATH (in that
// order). Override with PUPILA_LLM=<provider>.

import { existsSync } from 'node:fs';
import { copyFile } from 'node:fs/promises';
import { basename } from 'node:path';
import { type BriefSource, buildBriefPrompt } from './lib/brief-prompt.js';
import { writeBriefBody } from './lib/brief-template.js';
import { detectFormat, parseCvFile } from './lib/cv-parser.js';
import { detectLlmCli, runLlm } from './lib/llm.js';
@@ -27,11 +34,15 @@ const CV_DEST_BASENAME = 'config/cv';

interface CliArgs {
file: string | null;
source: BriefSource;
help: boolean;
}

function parseArgs(argv: string[]): CliArgs {
let file: string | null = null;
// Only flips to 'linkedin' when --linkedin is passed, or when a --file path
// looks like a LinkedIn export (see inference below).
let source: BriefSource = 'cv';
let help = false;
for (let i = 0; i < argv.length; i++) {
const arg = argv[i];
@@ -44,11 +55,29 @@ function parseArgs(argv: string[]): CliArgs {
file = next;
i++;
}
} else if (arg.startsWith('--linkedin=')) {
file = arg.slice(11);
source = 'linkedin';
} else if (arg === '--linkedin' && i + 1 < argv.length) {
const next = argv[i + 1];
if (next) {
file = next;
source = 'linkedin';
i++;
}
} else if (arg === '--help' || arg === '-h') {
help = true;
}
}
return { file, help };
// Auto-detect a LinkedIn export passed via --file by its filename, so
// `--file ~/Downloads/LinkedIn_Profile.pdf` still gets the tuned prompt.
// Match only the basename — a directory like ~/linkedin-stuff/ shouldn't
// silently flip a regular CV to the LinkedIn prompt. The "Reading … (…,
// LinkedIn export)" log in main() surfaces the decision either way.
if (source === 'cv' && file && /linkedin/i.test(basename(file))) {
source = 'linkedin';
}
return { file, source, help };
}

async function readStdin(): Promise<string> {
Expand All @@ -59,21 +88,6 @@ async function readStdin(): Promise<string> {
return Buffer.concat(chunks).toString('utf-8');
}

function buildPrompt(cvText: string): string {
return `You are summarizing the following CV into a short candidate brief that will be sent to an LLM each time the candidate's job-matching tool evaluates a posting. The brief decides whether the LLM agrees with the rule-based fit score.

Output ONLY three short paragraphs as plain markdown text. No preamble, no markdown fences, no headings, no commentary.

PARAGRAPH 1 — Who they are: role, years of experience, primary location, primary stack/skills. Be concrete (frameworks, languages, tools they ship with regularly).
PARAGRAPH 2 — What they're looking for: target seniority (senior / lead / staff / principal IC), domains/sectors of interest (web3, AI, fintech, etc.), location preference (remote-worldwide / remote-EMEA / hybrid in <city> / open to relocation).
PARAGRAPH 3 — What to avoid: roles that look like a fit on paper but aren't. Examples: wrong specialty (backend if frontend, etc.), wrong level (junior, intern, exec), on-site only, US-only positions, support/solutions/devrel/GTM titles.

Aim for 6-10 lines total. Drop anything that doesn't help a job-matching tool decide. Don't editorialize.

CV:
${cvText.slice(0, MAX_CV_CHARS)}`;
}

function stripMarkdownFences(text: string): string {
let cleaned = text.trim();
if (cleaned.startsWith('```')) {
@@ -89,6 +103,7 @@ async function main(): Promise<void> {
console.log(' pnpm run setup-brief --file path/to/cv.pdf');
console.log(' pnpm run setup-brief --file path/to/cv.docx');
console.log(' pnpm run setup-brief --file path/to/cv.md');
console.log(' pnpm run setup-brief --linkedin path/to/profile.pdf # LinkedIn "Save to PDF"');
console.log(' cat cv.txt | pnpm run setup-brief');
console.log('');
console.log('Provider: auto-detects claude/codex/gemini/opencode on PATH.');
@@ -103,7 +118,8 @@ async function main(): Promise<void> {
process.exit(1);
}
const format = detectFormat(args.file);
console.log(`Reading ${args.file} (${format})...`);
const sourceLabel = args.source === 'linkedin' ? 'LinkedIn export' : 'CV';
console.log(`Reading ${args.file} (${format}, ${sourceLabel})...`);
try {
cvText = await parseCvFile(args.file);
} catch (err) {
@@ -149,7 +165,7 @@ async function main(): Promise<void> {
`Parsed ${cvText.length} chars. Running ${invocation.provider} (${invocation.cmd})...`,
);

const prompt = buildPrompt(cvText);
const prompt = buildBriefPrompt(cvText, args.source, MAX_CV_CHARS);
let raw: string;
try {
raw = await runLlm(prompt);
44 changes: 44 additions & 0 deletions tests/brief-prompt.test.ts
@@ -0,0 +1,44 @@
import { describe, expect, it } from 'vitest';
import { buildBriefPrompt } from '../src/lib/brief-prompt.js';

const SAMPLE = 'Jane Doe — Senior Frontend Engineer. React, TypeScript, 8 years.';

describe('buildBriefPrompt', () => {
it('keeps the three-paragraph output contract for both sources', () => {
for (const source of ['cv', 'linkedin'] as const) {
const prompt = buildBriefPrompt(SAMPLE, source, 12_000);
expect(prompt).toContain('PARAGRAPH 1 — Who they are');
expect(prompt).toContain('PARAGRAPH 2 — What they');
expect(prompt).toContain('PARAGRAPH 3 — What to avoid');
expect(prompt).toContain('No preamble, no markdown fences');
// The raw document text is always appended.
expect(prompt).toContain(SAMPLE);
}
});

it('labels the input as CV for the cv source', () => {
const prompt = buildBriefPrompt(SAMPLE, 'cv', 12_000);
expect(prompt).toContain('summarizing the following CV');
expect(prompt).toContain('\nCV:\n');
// No LinkedIn-specific framing leaks into the CV prompt — check the exact
// LinkedIn copy, not just the word, so a partial bleed is caught.
expect(prompt).not.toContain('LINKEDIN PROFILE');
expect(prompt).not.toContain('Save to PDF');
expect(prompt).not.toContain('Ignore LinkedIn boilerplate');
});

it('adds LinkedIn-tuned framing for the linkedin source', () => {
const prompt = buildBriefPrompt(SAMPLE, 'linkedin', 12_000);
expect(prompt).toContain("summarizing the candidate's LinkedIn profile");
expect(prompt).toContain('Save to PDF');
expect(prompt).toContain('Ignore LinkedIn boilerplate');
expect(prompt).toContain('\nLINKEDIN PROFILE:\n');
});

it('truncates the document text to maxChars', () => {
const long = 'x'.repeat(50);
const prompt = buildBriefPrompt(long, 'cv', 10);
expect(prompt).toContain('xxxxxxxxxx'); // exactly 10
expect(prompt).not.toContain('x'.repeat(11));
});
});
3 changes: 3 additions & 0 deletions ui/plugins/_shared.ts
@@ -20,6 +20,9 @@ export const CV_MAX_CHARS = Number(process.env.PUPILA_CV_MAX_CHARS ?? '12000');
// `src/types.ts` so the MCP server and the UI agree on the same values.
export const VALID_STATUSES: ReadonlySet<string> = new Set<string>(APPLICATION_STATUSES);
export const VALID_CV_FORMATS = new Set<CvFormat>(['pdf', 'docx', 'md', 'txt']);
// Brief input source — 'cv' (a résumé/CV) or 'linkedin' (a LinkedIn
// "Save to PDF" export). Only changes the LLM prompt framing.
export const VALID_CV_SOURCES = new Set<string>(['cv', 'linkedin']);
export const VALID_PROVIDER_OR_AUTO = new Set<string>([...SUPPORTED_PROVIDERS, 'auto']);

const CV_EXTENSIONS: readonly CvFormat[] = ['pdf', 'docx', 'md', 'txt'];
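
One way a request handler might consume `VALID_CV_SOURCES` is sketched below. The `POST /api/cv` middleware itself is not part of this excerpt, so `isBriefSource`, `resolveSource`, and the choice to default a missing value to `'cv'` are assumptions for illustration only.

```typescript
type BriefSource = 'cv' | 'linkedin';

const VALID_CV_SOURCES = new Set<string>(['cv', 'linkedin']);

// Hypothetical type guard that narrows an untrusted value to BriefSource.
function isBriefSource(value: unknown): value is BriefSource {
  return typeof value === 'string' && VALID_CV_SOURCES.has(value);
}

// Hypothetical resolver. Assumption: an absent field means a regular CV
// upload, while an unrecognised value is rejected (null) instead of being
// coerced to a default.
function resolveSource(raw: unknown): BriefSource | null {
  if (raw === undefined || raw === null || raw === '') return 'cv';
  return isBriefSource(raw) ? raw : null;
}

console.log(resolveSource(undefined)); // cv
console.log(resolveSource('linkedin')); // linkedin
console.log(resolveSource('twitter')); // null
```

Rejecting unknown values keeps a typo in the client from silently producing a CV-framed prompt for a LinkedIn export.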