diff --git a/docs/superpowers/plans/2026-06-18-memoria-perfil-curado.md b/docs/superpowers/plans/2026-06-18-memoria-perfil-curado.md
new file mode 100644
index 0000000..e256a5f
--- /dev/null
+++ b/docs/superpowers/plans/2026-06-18-memoria-perfil-curado.md
@@ -0,0 +1,310 @@
+# Curated memory profile: implementation plan (v1)
+
+> **For agentic workers:** REQUIRED SUB-SKILL: use `superpowers:subagent-driven-development`
+> (recommended) or `superpowers:executing-plans` to implement task by task.
+> Steps use checkboxes (`- [ ]`). **Before each task, READ the cited
+> files** to get the real signatures (this repo already exists; do not invent
+> shapes). Source spec: `docs/superpowers/specs/2026-06-18-memoria-perfil-curado-design.md`.
+
+**Goal:** Give Zinom a per-account curated profile layer, injected into every
+MCP session (owner-only, gated off), with a deterministic confidence ladder and
+a curator that operates only on the new table.
+
+**Architecture:** New tables `user_profile_facts` + `memory_audit` (migration
+0019). Injection via two helpers (DB-bound `loadProfileFacts` + pure
+`renderProfile`) composed into the MCP `instructions` at `index.ts:423`, gated by the
+`owner` boolean. Deterministic curator (`runMemoryCuration`) on a new tick of the
+`brain-classifier`, operating **only** on `user_profile_facts` (never `brain_chunks`).
+Everything sits behind 3 default-off flags.
+
+**Tech Stack:** TypeScript + Express, Postgres + pgvector, `node:test` via tsx,
+idempotent SQL migrations (`scripts/migrations/`), Playwright (portal e2e).
+
+## Global Constraints (from the whole spec; they apply to every task)
+- **Owner-only, gated off:** flags `PROFILE_INJECT_ENABLED`, `MEMORY_CURATION_ENABLED`,
+  `MEMORY_CURATION_LLM` (this one = phase 2) default off. Owner = `DEFAULT_ACCOUNT_ID`.
+- **Fail-closed:** injection is gated by the `owner` boolean (index.ts:401), **never**
+  by `getAccountId()` (which falls back to `'bruno'`).
+- **Isolation:** every statement on `user_profile_facts`/`memory_audit` includes
+  `account_id` in the columns and in the `WHERE`/conflict target. Conflict = `(account_id,
+  content_hash)`. Do NOT reuse `facts-storage.ts`.
+- **Off eval gate F8:** the curator **never mutates `brain_chunks`**; nothing touches
+  `src/rag/search.ts`. (The only touch on `brain_chunks` = D13, new writes only.)
+- **CI without Postgres:** no AC may depend on a test that early-returns without
+  `POSTGRES_URL`. SQL tests use the existing injected-pool pattern
+  (mirror `src/rag/__tests__/entity-storage.test.ts` / `entity-extraction-run.test.ts`).
+- **`composeInstructions` never returns `''`** (the MCP SDK only sends truthy
+  `instructions`).
+- Small, atomic commits; every task ends green on `npm test`.
+
+---
+
+## PHASE v1a: Injection (flags off, mergeable without eval)
+
+### Task 1: Migration 0019 (tables) + CI surrogate
+**Files:**
+- Create: `scripts/migrations/0019_user_profile.sql`
+- Test: `src/rag/__tests__/migration-0019.test.ts`
+
+**Interfaces:**
+- Produces: tables `user_profile_facts` and `memory_audit` (exact DDL in the spec,
+  section "Data model").
+
+- [ ] **Step 1: Failing test**: read the `.sql` from disk and assert the contract.
+```ts
+import { test } from 'node:test'; import assert from 'node:assert/strict';
+import { readFileSync } from 'node:fs';
+test('migration 0019 is idempotent and isolates by account', () => {
+  const sql = readFileSync('scripts/migrations/0019_user_profile.sql', 'utf8');
+  for (const create of sql.match(/CREATE TABLE[^;]*/gi) ?? [])
+    assert.match(create, /IF NOT EXISTS/i);
+  assert.match(sql, /UNIQUE\s*\(\s*account_id\s*,\s*content_hash\s*\)/i);
+  assert.match(sql, /account_id\s+text\s+NOT NULL/i);
+  assert.match(sql, /CREATE TABLE IF NOT EXISTS memory_audit/i);
+});
+```
+- [ ] **Step 2: Run and watch it fail**: `node --import tsx --test src/rag/__tests__/migration-0019.test.ts` → FAIL (file does not exist).
+- [ ] **Step 3: Create the `.sql`** by copying the exact DDL from the spec (the two tables + the two indexes). Check the style against `0012`/`0011` (idempotent, commented).
+- [ ] **Step 4: Run and watch it pass.**
+- [ ] **Step 5: Commit**: `feat(memoria): migracao 0019 user_profile_facts + memory_audit`.
+
+### Task 2: `computeConfidence` (pure, in utility.ts)
+**Files:**
+- Modify: `src/rag/utility.ts` (add the function + constants `K_SMOOTH`,
+  `CONFIDENCE_BANDS={medium:0.40,high:0.75}`; reuse `DECAY_PER_DAY` and the shape of
+  `computeEffectiveUtility`)
+- Test: `src/rag/__tests__/utility-confidence.test.ts`
+
+**Interfaces:**
+- Produces: `computeConfidence(applied:number, violated:number, lastEvidenceAt:Date|null, now?:Date): {value:number, band:'low'|'medium'|'high'}`.
+  `value = (applied/(applied+violated+K_SMOOTH)) * freshness(lastEvidenceAt, now)`.
+
+- [ ] **Step 1: Failing test**: table of cases:
+```ts
+// thin evidence does not reach high
+assert.notEqual(computeConfidence(1,0,now,now).band, 'high');
+// plenty of fresh, consistent evidence -> high
+assert.equal(computeConfidence(20,0,now,now).band, 'high');
+// bands cross at 0.40/0.75 (build deterministic cases)
+// lastEvidenceAt ~90d ago lowers value vs fresh (observable drop)
+assert.ok(computeConfidence(20,0,d90,now).value < computeConfidence(20,0,now,now).value);
+```
+- [ ] **Step 2: Run and watch it fail.**
+- [ ] **Step 3: Implement** the pure function (no DB). Freshness follows the same curve as
+  `computeEffectiveUtility` (`DECAY_PER_DAY` compounded over the age in days).
+- [ ] **Step 4: Run and watch it pass.**
+- [ ] **Step 5: Commit**: `feat(memoria): computeConfidence (razao suavizada x decaimento)`.
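
As a rough illustration of Task 2, the sketch below implements the smoothed ratio times freshness decay. `DECAY_PER_DAY = 0.995` is the figure quoted in the spec; `K_SMOOTH = 2` and the treatment of a `null` `lastEvidenceAt` are assumptions made only for this example, and the real constants in `src/rag/utility.ts` may differ.

```typescript
// Sketch only, not the repo's implementation.
const DECAY_PER_DAY = 0.995; // per-day decay quoted in the spec
const K_SMOOTH = 2; // ASSUMPTION: chosen so a single observation stays below "high"
const CONFIDENCE_BANDS = { medium: 0.4, high: 0.75 };
const MS_PER_DAY = 24 * 60 * 60 * 1000;

type Band = 'low' | 'medium' | 'high';

// Exponential freshness: DECAY_PER_DAY raised to the age in days.
function freshness(lastEvidenceAt: Date | null, now: Date): number {
  // ASSUMPTION: no evidence timestamp means "no decay"; with applied = 0
  // the ratio is already 0, so the product stays 0 in that case.
  if (lastEvidenceAt === null) return 1;
  const ageDays = Math.max(0, (now.getTime() - lastEvidenceAt.getTime()) / MS_PER_DAY);
  return Math.pow(DECAY_PER_DAY, ageDays);
}

function computeConfidence(
  applied: number,
  violated: number,
  lastEvidenceAt: Date | null,
  now: Date = new Date(),
): { value: number; band: Band } {
  // Smoothed ratio: K_SMOOTH keeps thin evidence from scoring 1.0.
  const ratio = applied / (applied + violated + K_SMOOTH);
  const value = ratio * freshness(lastEvidenceAt, now);
  const band: Band =
    value >= CONFIDENCE_BANDS.high ? 'high' : value >= CONFIDENCE_BANDS.medium ? 'medium' : 'low';
  return { value, band };
}
```

Under these assumed constants, 20 applied and 0 violated with fresh evidence gives 20/22 ≈ 0.91 (`high`), while the same counts with evidence 90 days old give roughly 0.58 (`medium`), so the fact would drop below the 0.75 injection threshold without being deleted.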
+
+### Task 3: `ProfileFact` type + `renderProfile` (pure)
+**Files:**
+- Create: `src/rag/profile.ts` (`ProfileFact` type + pure `renderProfile` +
+  constants `PROFILE_CHAR_BUDGET=2800`, `INJECT_MIN_CONFIDENCE=0.75`)
+- Test: `src/rag/__tests__/profile-render.test.ts`
+
+**Interfaces:**
+- Consumes: `confidence_band`/`confidence_value`/`status`/`pinned`/`category`/`content` from `ProfileFact`.
+- Produces: `renderProfile(facts: ProfileFact[], budget: number): string | null`.
+  Eligible = `pinned` OR (`status==='confirmed'` AND `confidence_value>=INJECT_MIN_CONFIDENCE`).
+  Orders by `pinned` > band > confidence. Truncates on whole facts up to `budget`.
+  Returns `null` if no fact is eligible.
+
+- [ ] **Step 1: Failing test**:
+```ts
+assert.equal(renderProfile([], 2800), null); // empty -> null
+assert.equal(renderProfile([signalOnly], 2800), null); // signal only -> ineligible
+const out = renderProfile([pinnedA, confirmedHigh, confirmedLow], 2800);
+assert.ok(out!.indexOf(pinnedA.content) < out!.indexOf(confirmedHigh.content)); // pinned first
+assert.ok(out!.length <= 2800); // budget
+// truncation never cuts in the middle of a fact (build facts that overflow)
+```
+- [ ] **Step 2: Run and watch it fail.**
+- [ ] **Step 3: Implement** pure `renderProfile` + type + constants.
+- [ ] **Step 4: Run and watch it pass.**
+- [ ] **Step 5: Commit**: `feat(memoria): renderProfile puro + orcamento de caracteres`.
+
+### Task 4: `composeInstructions` (pure, in mcp-account-config.ts)
+**Files:**
+- Modify: `src/mcp-account-config.ts` (add `composeInstructions`; do **not**
+  mutate `OWNER_INSTRUCTIONS`/`FRIEND_INSTRUCTIONS`)
+- Test: `src/__tests__/mcp-account-config.test.ts` (extend; keep the existing substring tests green)
+
+**Interfaces:**
+- Produces: `composeInstructions(base: string, profileBlock: string | null): string`.
+  Returns `base` when `profileBlock` is `null`/`''`; otherwise `base + sep + block`.
+  **Never** returns `''`.
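
A minimal sketch of the two pure helpers from Tasks 3 and 4, assuming a reduced `ProfileFact` shape. The header text, the line format, the separator and the fallback string are hypothetical choices for illustration; the real ones belong to `src/rag/profile.ts` and `src/mcp-account-config.ts`.

```typescript
// Sketch only. ProfileFact is reduced to the fields renderProfile consumes.
type ProfileFact = {
  content: string;
  category: string;
  status: 'signal' | 'evidence' | 'confirmed';
  pinned: boolean;
  confidence_value: number;
  confidence_band: 'low' | 'medium' | 'high';
};

const INJECT_MIN_CONFIDENCE = 0.75;
const BAND_RANK: Record<ProfileFact['confidence_band'], number> = { low: 0, medium: 1, high: 2 };
const PROFILE_HEADER = 'PROFILE:'; // hypothetical header
const SEPARATOR = '\n\n'; // hypothetical separator
const FALLBACK_INSTRUCTIONS = 'No instructions configured.'; // hypothetical fallback

function isEligible(f: ProfileFact): boolean {
  return f.pinned || (f.status === 'confirmed' && f.confidence_value >= INJECT_MIN_CONFIDENCE);
}

function renderProfile(facts: ProfileFact[], budget: number): string | null {
  // filter() copies, so sorting does not mutate the caller's array.
  const ordered = facts.filter(isEligible).sort(
    (a, b) =>
      Number(b.pinned) - Number(a.pinned) ||
      BAND_RANK[b.confidence_band] - BAND_RANK[a.confidence_band] ||
      b.confidence_value - a.confidence_value,
  );
  const lines: string[] = [];
  let used = PROFILE_HEADER.length;
  for (const fact of ordered) {
    const line = `- [${fact.category}] ${fact.content}`;
    // Stop at the first fact that does not fit; a fact is never emitted partially.
    if (used + 1 + line.length > budget) break;
    lines.push(line);
    used += 1 + line.length; // +1 for the joining newline
  }
  return lines.length === 0 ? null : [PROFILE_HEADER, ...lines].join('\n');
}

function composeInstructions(base: string, profileBlock: string | null): string {
  // Guard the "never returns ''" invariant even if base were empty.
  const safeBase = base.length > 0 ? base : FALLBACK_INSTRUCTIONS;
  if (profileBlock === null || profileBlock === '') return safeBase;
  return safeBase + SEPARATOR + profileBlock;
}
```

Stopping at the first fact that overflows (rather than skipping it and trying smaller ones) is one possible reading of "truncates on whole facts"; it keeps the priority order strict at the cost of leaving some budget unused.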
+
+- [ ] **Step 1: Failing test**:
+```ts
+assert.equal(composeInstructions(OWNER_INSTRUCTIONS, null), OWNER_INSTRUCTIONS);
+assert.equal(composeInstructions(OWNER_INSTRUCTIONS, ''), OWNER_INSTRUCTIONS);
+const c = composeInstructions(OWNER_INSTRUCTIONS, 'PROFILE: x');
+assert.ok(c.includes(OWNER_INSTRUCTIONS) && c.includes('PROFILE: x'));
+assert.notEqual(composeInstructions('', null), ''); // invariant: base is already non-empty in prod
+```
+- [ ] **Step 2: Run and watch it fail.**
+- [ ] **Step 3: Implement** the pure concat.
+- [ ] **Step 4: Run and watch it pass** (including the pre-existing substring tests).
+- [ ] **Step 5: Commit**: `feat(memoria): composeInstructions puro (nao muta consts)`.
+
+### Task 5: `loadProfileFacts` (account-scoped storage) + isolation
+**Files:**
+- Modify/Create: `src/rag/profile-storage.ts` (new; **not** `facts-storage.ts`)
+- Test: `src/rag/__tests__/profile-storage.test.ts` (injected pool, mirror `entity-storage.test.ts`)
+
+**Interfaces:**
+- Consumes: the account-scoped query pattern of `getBrainCounts` (`storage.ts:633,648`).
+- Produces: `loadProfileFacts(accountId: string, pool?): Promise<ProfileFact[]>`
+  (SQL `WHERE account_id=$1`); `upsertProfileFact(fact, pool?)` (ON CONFLICT
+  `(account_id, content_hash)`); `insertMemoryAudit(row, pool?)`. All with
+  `account_id` in the columns and in the WHERE.
+
+- [ ] **Step 1: Failing test**: with an injected fake pool, capture the emitted SQL:
+```ts
+// loadProfileFacts emits WHERE account_id = $1 and passes accountId as a param
+// upsertProfileFact includes account_id in the column list AND in ON CONFLICT (account_id, content_hash)
+// no statement omits account_id (assert on the captured SQL text)
+```
+- [ ] **Step 2: Run and watch it fail.**
+- [ ] **Step 3: Implement** the helpers, mirroring the shape of `getBrainCounts` and the
+  pool-injection pattern (`__setPoolForTest` or an optional param, whichever the
+  existing file uses).
+- [ ] **Step 4: Run and watch it pass.**
+- [ ] **Step 5: Commit**: `feat(memoria): profile-storage account-scoped (load/upsert/audit)`.
+
+### Task 6: Injection wiring in index.ts (gated by `owner`)
+**Files:**
+- Modify: `src/index.ts` (~401-425): read `PROFILE_INJECT_ENABLED`; build
+  `profileBlock` only when `owner`; pass `composeInstructions(...)` as `instructions`.
+- Test: `src/__tests__/profile-injection.test.ts` (AC4 + AC5)
+
+**Interfaces:**
+- Consumes: `owner` (index.ts:401), `composeInstructions`, `loadProfileFacts`, `renderProfile`, `isOwnerContext`.
+
+- [ ] **Step 1: Failing test**: real fail-closed boundary + flag-off no-op:
+```ts
+// AC4: friend context and context with accountId=undefined+isOperator=undefined => isOwnerContext false
+assert.equal(isOwnerContext({authType:'oauth',scopes:['personal'],accountId:'friend:x'}), false);
+assert.equal(isOwnerContext({authType:'oauth',scopes:['personal'],accountId:undefined,isOperator:undefined}), false);
+// and the session composition function, for these contexts, includes NO bytes of the owner's profile
+// AC5: with PROFILE_INJECT_ENABLED off, instructions === baseline (owner and friend)
+```
+  (Extract the session composition logic into a testable function, e.g.
+  `buildSessionInstructions(owner: boolean, enabled: boolean, loadFacts, render): Promise<string>`,
+  so the test does not need to boot Express.)
+- [ ] **Step 2: Run and watch it fail.**
+- [ ] **Step 3: Implement** the wiring + the extracted function. `(PROFILE_INJECT_ENABLED && owner)` is the only door to `loadProfileFacts(DEFAULT_ACCOUNT_ID)`.
+- [ ] **Step 4: Run and watch it pass.**
+- [ ] **Step 5: Commit**: `feat(memoria): injeta perfil curado nas instructions (gated, fail-closed)`.
+
+**→ Checkpoint v1a:** `npm run build && npm test` green. PR + merge (off-gate).
+Routine deploy. Turn on `PROFILE_INJECT_ENABLED` for the owner only, seed 2-3 `pinned`
+facts (Task 10 provides the UI; or seed manually via psql during rollout), verify AC10.
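
The extracted function from Task 6 might look roughly like the sketch below. The extra `base` parameter, the generic fact type and the inline `'\n\n'` separator are assumptions added to keep the example self-contained; in the plan, composition goes through `composeInstructions` and `base` comes from `OWNER_INSTRUCTIONS`/`FRIEND_INSTRUCTIONS`.

```typescript
// Sketch only: dependencies are injected so the gate can be tested without Express or Postgres.
type LoadFacts<F> = (accountId: string) => Promise<F[]>;
type Render<F> = (facts: F[], budget: number) => string | null;

const DEFAULT_ACCOUNT_ID = 'bruno'; // owner account id per the spec
const PROFILE_CHAR_BUDGET = 2800;

async function buildSessionInstructions<F>(
  owner: boolean,
  enabled: boolean,
  base: string, // ASSUMPTION: passed in explicitly for this sketch
  loadFacts: LoadFacts<F>,
  render: Render<F>,
): Promise<string> {
  // Single gate: if either condition is false, the DB is never queried.
  if (!(enabled && owner)) return base;
  const block = render(await loadFacts(DEFAULT_ACCOUNT_ID), PROFILE_CHAR_BUDGET);
  // A null or empty block leaves the baseline instructions untouched.
  return block ? `${base}\n\n${block}` : base;
}
```

Because the loader is only reachable through the `enabled && owner` branch, a test can assert both that the returned string equals the baseline and that the fake loader was never called for friend contexts.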
+
+---
+
+## PHASE v1b: Curator + Portal
+
+### Task 7: `runMemoryCuration` (deterministic curator, owner-only)
+**Files:**
+- Create: `src/rag/memory-curator.ts`
+- Test: `src/rag/__tests__/memory-curator.test.ts` (injected deps, mirror `entity-extraction-run.test.ts`)
+
+**Interfaces:**
+- Consumes: `computeConfidence`, `profile-storage` helpers, `recordRun`.
+- Produces: `runMemoryCuration(deps): Promise<{processed:number, transitions:number}>`.
+  Loop **filtered to the owner** (`WHERE account_id = DEFAULT_ACCOUNT_ID`); recomputes
+  confidence/band; promotes `signal`→`evidence` (`applied_count>=1`) →`confirmed`
+  (`value>=CONFIRM_THRESHOLD`); decays; 1 `memory_audit` row per transition; **never**
+  touches `brain_chunks`.
+
+- [ ] **Step 1: Failing test** (AC6 + AC7):
+```ts
+// signal with applied_count>=1 -> promoted to evidence, 1 audit row (trigger 'cron-promote')
+// evidence with value>=CONFIRM -> confirmed, +1 audit row
+// fact without evidence: value decays, it is NOT deleted (assert: no DELETE on facts)
+// isolation: deps with 2 accounts -> only DEFAULT_ACCOUNT_ID is processed, friend is skipped
+// every captured statement includes account_id
+```
+- [ ] **Step 2: Run and watch it fail.**
+- [ ] **Step 3: Implement** `runMemoryCuration(deps)` with injected deps (load/upsert/audit/clock). Atomic replace-on-write per fact.
+- [ ] **Step 4: Run and watch it pass.**
+- [ ] **Step 5: Commit**: `feat(memoria): runMemoryCuration determinístico owner-only + auditoria`.
+
+### Task 8: `tickMemoryCuration` in the brain-classifier
+**Files:**
+- Modify: `src/index-classifier.ts` (new tick gated by `MEMORY_CURATION_ENABLED`,
+  lazy-import, try/catch, `recordRun(worker='classifier', source='memory-curation')`,
+  `cron.schedule`)
+- Modify: the allowlist/typing of `getStatus` so it recognizes the `'memory-curation'` source
+- Test: extend the status/run test if one exists; otherwise a unit test of the gating
+
+**Interfaces:**
+- Consumes: `runMemoryCuration`, `recordRun`.
+
+- [ ] **Step 1: Failing test**: gating: with `MEMORY_CURATION_ENABLED` off, the tick does not call `runMemoryCuration`; with it on, it calls it once and records the run.
+- [ ] **Step 2: Run and watch it fail.**
+- [ ] **Step 3: Implement** the tick mirroring `tickEntities`; register the source in `getStatus`.
+- [ ] **Step 4: Run and watch it pass.**
+- [ ] **Step 5: Commit**: `feat(memoria): tick de curadoria no brain-classifier (gated)`.
+
+### Task 9: `remember` dedup + secret guard at intake
+**Files:**
+- Modify: `src/rag/remember-doc.ts` (deterministic `source_id`)
+- Create: `src/rag/profile-guard.ts` (`looksLikeSecret(text): boolean` + `scrub`)
+- Test: `src/rag/__tests__/remember-dedup.test.ts`, `src/rag/__tests__/profile-guard.test.ts`
+
+**Interfaces:**
+- Produces: `source_id = conversation:`;
+  `looksLikeSecret`/`stripSecrets` used at profile-fact intake.
+
+- [ ] **Step 1: Failing test** (AC8): `buildConversationDocument` derives a
+  deterministic `source_id` from the pair (account, content); `looksLikeSecret`
+  recognizes obvious patterns (sk-…, AKIA…, JWT, “BEGIN PRIVATE KEY”).
+- [ ] **Step 2: Run and watch it fail.**
+- [ ] **Step 3: Implement** the deterministic hash + the guard. Do not change the
+  `deleteBySource`→`upsert` flow (already idempotent).
+- [ ] **Step 4: Run and watch it pass.**
+- [ ] **Step 5: Commit**: `feat(memoria): dedup deterministico do remember + guarda de segredo`.
+
+### Task 10: Portal: owner profile (view + pin/prune + toggles)
+**Files:**
+- Modify: `src/portal/routes.ts` (owner-only routes `GET/POST /portal/profile`,
+  `POST /portal/profile/:id/pin`, `/toggle`); reuse the pattern from
+  `entity-management.ts` (manual merge/rename)
+- Modify: `portal/*` (static front end): read-only view + actions
+- Test: `tests/e2e/profile.spec.ts` (Playwright) + unit tests of the routes (injected deps)
+
+**Interfaces:**
+- Consumes: `profile-storage` (load/upsert/audit), `profile-guard`.
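
The `profile-guard` consumed here could be sketched as below. The regular expressions are assumptions that cover only the obvious shapes the plan lists (sk-…, AKIA…, JWT, "BEGIN PRIVATE KEY"); they are illustrative, not an exhaustive secret scanner, and the redaction helper is left out.

```typescript
// Sketch only: hypothetical patterns for the obvious secret shapes named in Task 9.
const SECRET_PATTERNS: RegExp[] = [
  /\bsk-[A-Za-z0-9_-]{16,}/, // "sk-" prefixed API keys
  /\bAKIA[0-9A-Z]{16}\b/, // AWS access key id shape
  /\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/, // JWT: three base64url segments
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/, // PEM private key header
];

function looksLikeSecret(text: string): boolean {
  // No /g flag on the patterns, so test() carries no state between calls.
  return SECRET_PATTERNS.some((pattern) => pattern.test(text));
}
```

A route handler would call `looksLikeSecret(content)` before `upsertProfileFact` and reject the request when it returns `true`, since the profile table is plaintext and its content is injected into every prompt.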
+
+- [ ] **Step 1: Failing test**: owner-only route (non-owner → 403); creating a fact
+  applies `looksLikeSecret`; the toggle increments `applied_count`/`violated_count` +
+  writes `memory_audit` (trigger `'portal'`); Playwright renders the view and the
+  pin action.
+- [ ] **Step 2: Run and watch it fail.**
+- [ ] **Step 3: Implement** routes + minimal front end, owner-scoped by the session.
+- [ ] **Step 4: Run and watch it pass** (`npm test` + `npx playwright test profile`).
+- [ ] **Step 5: Commit**: `feat(memoria): portal do perfil curado (view + pin/prune + toggles)`.
+
+**→ Checkpoint v1b:** `npm run build && npm test && npx playwright test` green.
+PR + merge. Deploy. Turn on `MEMORY_CURATION_ENABLED` (owner). Verify AC11 +
+`npm run eval` (zero delta expected, since the curator does not touch `brain_chunks`).
+
+---
+
+## Self-review (run against the spec)
+- **AC coverage:** AC1→T2, AC2→T3, AC3→T4, AC4/AC5→T6, AC6→T7, AC7→T5+T7,
+  AC8→T9, AC9→T1, AC10→checkpoint v1a, AC11→checkpoint v1b. No orphan AC.
+- **Signature consistency:** `loadProfileFacts(accountId)` (T5) /
+  `renderProfile(facts,budget)` (T3) / `composeInstructions(base,block)` (T4) /
+  `computeConfidence(applied,violated,lastEvidenceAt,now?)` (T2) /
+  `runMemoryCuration(deps)` (T7) are used identically in T6/T7.
+- **No scope placeholders:** phase 2 (LLM) and dedup of old `brain_chunks` rows
+  are explicitly **outside this plan** (separate plans, eval-gated).
+
+## Outside this plan (future plans)
+- Phase 2: `MEMORY_CURATION_LLM` (Haiku extraction/merging + secret guard + budget/circuit-breaker).
+- Fix for the utility no-op in `search.ts` (eval-gated).
+- Bulk dedup of old `conversation` rows in `brain_chunks` (eval-gated).
+- One-line patch in the Odysseus repo (deliver the profile on the 3rd surface).
+- Threshold calibration + extension to friends.
diff --git a/docs/superpowers/specs/2026-06-18-memoria-perfil-curado-design.md b/docs/superpowers/specs/2026-06-18-memoria-perfil-curado-design.md
new file mode 100644
index 0000000..043c92b
--- /dev/null
+++ b/docs/superpowers/specs/2026-06-18-memoria-perfil-curado-design.md
@@ -0,0 +1,375 @@
+# Memory: always-injected curated profile + confidence ladder + curation (2026-06-18)
+
+Context: a study of **Hermes Agent** (Nous Research) and **Open Second Brain**
+(itechmeat) showed that Zinom has a strong RAG (hybrid search Voyage + FTS
+PT-BR + RRF + rerank) but **lacks the small, curated, always-present layer**
+that those projects validate: a "USER.md" injected into every prompt, fed by
+a confidence ladder (signal → evidence → confirmed) and maintained by a
+curation pass. Today Zinom's memory is 100% *pull*: it only exists when
+someone runs `brain_search`/`recall`. Every session starts cold.
+
+This spec designs the evolution by **reusing infrastructure that already exists and is
+dormant** (not building from scratch), keeping Zinom as a multi-surface MEGA-MCP
+and without touching Notion/Calendar/Rubrix/VPS. It went through an adversarial review with 5
+lenses (multi-tenant security, eval gate, simplicity, multi-surface delivery,
+testability); the blockers found are incorporated below.
+
+## Diagnosis (verified in the code, Jun 2026)
+
+`file:line` refs checked by reading the real code in `src/`:
+
+1. **The only always-injected channel = the MCP `instructions`.** Built once
+   per session in `src/index.ts:417-425` as `owner ? OWNER_INSTRUCTIONS :
+   FRIEND_INSTRUCTIONS` (static consts in `src/mcp-account-config.ts:49`/`:166`).
+   `owner` is computed at `index.ts:401` via `isOwnerContext`. Today it is fixed prose,
+   **zero per-user facts**, no budget, no feed/prune/reformulate.
+2. 
**`isOwnerContext` (mcp-account-config.ts:18-24) is fail-closed by design:**
+   `undefined ctx` = owner (cron/eval); `scopes==='all'` = owner; `accountId ===
+   DEFAULT_ACCOUNT_ID` = owner; **any other `accountId` = friend**; no
+   `accountId` = owner ONLY if `isOperator`. **However, `getAccountId()`
+   (context.ts:48) does `ctx.accountId ?? DEFAULT_ACCOUNT_ID` → falls back to `'bruno'`.**
+   Rendering the profile from `getAccountId()` would leak the owner's profile
+   to a friend whose `accountId` was lost. Injection must be gated by the
+   **`owner` boolean**, never by `getAccountId()`.
+3. **Conversation memory (`remember`/`recall`)** writes `source_type='conversation'`
+   into `brain_chunks` (`remember-doc.ts:50` uses `seam.id ?? randomUUID()` →
+   random `source_id` per call → **no dedup**). Only retrievable by query,
+   **never auto-injected**.
+4. **The utility/decay/feedback layer exists and is dormant** (Spec-004, migration
+   `0012`): columns `utility_score`/`feedback_count`/`last_useful_at`, ledger
+   `chunk_feedback`, function `effective_utility()` (decay 0.995/day), view
+   `stale_memories`. The utility boost at `search.ts:402-403` is a **no-op in
+   production**: `rowToChunk` does not set the columns and the SELECTs at
+   `storage.ts:516-525`/`:570-580` do not fetch them. `utility.ts` has
+   `DECAY_PER_DAY`/`computeEffectiveUtility` but **nothing about confidence**.
+5. **`brain_facts`** (migration `0004`) exists but `FACTS_ENABLED` is **off**,
+   `queryFacts` has no caller, and **`facts-storage.ts` has ZERO references to
+   `account_id`** (leak confirmed by grep). This is the class of bug NOT to repeat.
+6. **`entities`/`entity_mentions`** (migration `0011`): fuzzy dedup + manual merge
+   in the portal (`entity-management.ts`); the only precedent for human curation.
+7. **Cron `brain-classifier`** (`src/index-classifier.ts`): `tickEntities` is the
+   multi-tenant template (loop per account, budget per run, lazy-import,
+   try/catch, `recordRun`). 
`runEntityExtraction` scans **all** accounts with
+   chunks (friends included), so the v1 curator must NOT copy it as-is.
+8. **`status_runs`** (`recordRun`, `storage.ts:289`) and `auditWrite()`
+   (`src/audit.ts`, JSONL) are the existing telemetry/audit patterns.
+9. **Eval harness** (`scripts/eval/run-eval.mts:94`) calls `brainSearch(q,
+   {topK:10})` **with no `source_type` filter**, and `dedupBySourceId` runs over the
+   pool (`search.ts:269,378`). Therefore, **mutating any row of `brain_chunks`
+   (including `conversation`) changes the pool/dedup/rerank of the golden queries** → it is
+   sensitive to eval gate F8 even without editing `search.ts`.
+
+**Conclusion:** the foundation exists; what is missing is (a) the always-injected curated layer, (b)
+a confidence ladder that governs what enters it, (c) a curation
+pass that maintains it, (d) auditing of the mutations. The work =
+activate/connect/fix, behind a flag, **without mutating `brain_chunks` in v1**.
+
+## Objective (one sentence)
+
+Give Zinom a **per-account curated profile** layer, small and injected into
+every MCP session, fed by a deterministic confidence ladder
+(signal → evidence → confirmed) and maintained by a curation pass that
+operates **only on the new table** (never on `brain_chunks`), with an audit trail,
+owner-first and entirely behind flags.
+
+## Design decisions (with tradeoffs)
+
+### D1. Substrate: new table `user_profile_facts`, not `brain_facts`
+Migration `0019_user_profile.sql` (slot `0018` is already `0018_rubrix_flows.sql` on
+main; this uses the next free slot, `0019`), **outside the AES vault** (non-secret prose read every session),
+`account_id NOT NULL`, `UNIQUE (account_id, content_hash)`, additive/idempotent.
+- **Tradeoff:** `brain_facts` would give temporal validity for free but its
+  `account_id` handling is broken and it is coupled to `FACTS_ENABLED`. A new table isolates the
+  risk. `brain_facts` stays **untouched and out of scope**. Do NOT reuse
+  `facts-storage.ts`.
+
+### D2. 
v1 owner-only, everything gated OFF
+Flags (default off): `PROFILE_INJECT_ENABLED`, `MEMORY_CURATION_ENABLED`,
+`MEMORY_CURATION_LLM` (phase 2). Owner = `DEFAULT_ACCOUNT_ID` (`'bruno'`).
+- **Fail-closed injection:** gated by the `owner` boolean from `index.ts:401`.
+  `loadProfileFacts`/`renderProfile` are **never** called for a non-owner
+  context in v1. Friends stay out.
+- **Tradeoff:** default off = **zero behavior change** until switched on;
+  owner-only calibrates on high-signal data before exposing friends.
+
+### D3. Content: structured facts → rendered as prose at injection
+Structured row (`category`, `content`, `status`, confidence, `pinned`);
+`renderProfile()` assembles the prose block at injection time.
+- **Tradeoff:** pure prose would be simpler to inject but rules out a
+  per-fact ladder/audit/budget.
+
+### D4. Signal sources in v1 (no new MCP tool, no `brain_chunks` mutation)
+- **Primary feed:** manual authoring by the owner via the portal (`/portal/profile`,
+  owner-only) → rows in `user_profile_facts` (`pinned` = ground truth, always
+  eligible; or `status='signal'` to let the ladder promote it).
+- **Evidence/decay:** toggles in the portal ("still valid" / "no longer valid")
+  increment `applied_count`/`violated_count`; the curator recomputes confidence and
+  decays freshness over time.
+- **Secret guard:** every incoming fact passes through a filter that
+  **discards obvious secret patterns** (keys/tokens) before it becomes a row
+  (the channel is injected into every prompt and the table is plaintext).
+- **Out of v1:** automatic feed from `remember`/conversation (needs an LLM,
+  so phase 2), automatic evidence via `brain_feedback` (it is chunk-level, not
+  fact-level), and implicit signals from `ai_search_log`.
+
+### D5. 
Confidence: smoothed ratio × freshness decay (NOT Wilson in v1)
+`computeConfidence(applied, violated, lastEvidenceAt, now?)`: **new, pure
+code in `src/rag/utility.ts`** (home of the decay math, with tests):
+`value = (applied / (applied + violated + K_SMOOTH)) * freshnessDecay(lastEvidenceAt)`,
+small `K_SMOOTH`, bands `low|medium|high` at `0.40`/`0.75`.
+- **Tradeoff:** the Wilson lower bound only adds value at multi-user scale
+  (penalizing a high ratio with low N), which is out of scope. We keep the
+  **signature** `computeConfidence(applied, violated, lastEvidenceAt)` so it can be
+  swapped for Wilson later without touching callers. **No auto-retire in v1.**
+
+### D6. Character budget + prompt-cache economics
+`PROFILE_CHAR_BUDGET` (initially ~2800 chars ≈ 700 tokens). `renderProfile()` orders
+by priority (`pinned` > band > confidence) and truncates within the budget,
+**never in the middle of a fact**.
+- **Eligible for injection** = `pinned` OR (`status='confirmed'` AND `confidence_value
+  >= INJECT_MIN_CONFIDENCE` (0.75)). Decay only lowers the confidence/band → the
+  fact stops being injected (there is no retire in v1).
+- **Growth:** without auto-retire, the profile only grows up to the ceiling; the **budget
+  is the effective limit** (not decay: `0.995/day` ≈ 0.64 at 90d, which is slow).
+- **Prompt-cache:** the block is appended **after** the static `OWNER_INSTRUCTIONS`,
+  so the static prefix remains cacheable; only the final block
+  re-caches on the first new session after a cron change. Frozen per session
+  ⇒ zero intra-session churn.
+- **Staleness:** instructions are frozen per session (`SESSION_TTL_MS`
+  ~30min). Worst case: a corrected/removed fact may still be injected for up to
+  one TTL. Accepted and declared.
+
+### D7. Confidence band NOT exposed to the model in v1
+The band governs **what enters** the profile; it does not go into `presentation_hint`.
+Product decision deferred.
+
+### D8. 
Ranking untouched: `search.ts` and `brain_chunks` stay out of the curator's path
+v1 uses its own confidence columns on the new table; it does **not** fix the
+utility boost no-op; the **curator NEVER mutates `brain_chunks`**.
+- **Eval gate F8:** merging the code with the flags off is genuinely off-gate
+  (zero behavior change). Since the v1 curator does not touch `brain_chunks`,
+  turning on `MEMORY_CURATION_ENABLED` is **also** off-gate. The only touch on
+  `brain_chunks` is D13 (`remember` dedup, new writes only).
+- **Tradeoff:** fixing the utility no-op would let feedback influence ranking,
+  but it is an independent, eval-gated bugfix; it stays out of scope.
+
+### D9. Injection: two helpers + fail-closed wiring
+Signatures (locked, so the plan stays consistent):
+- `loadProfileFacts(accountId: string): Promise<ProfileFact[]>`: **DB-bound**,
+  account-scoped SQL `WHERE account_id=$1` (modeled on `getBrainCounts`,
+  `storage.ts:648`). No internal default account.
+- `renderProfile(facts: ProfileFact[], budget: number): string | null`:
+  **PURE**. Returns `null` when there is no eligible fact.
+- `composeInstructions(base: string, profileBlock: string | null): string`:
+  **PURE**. **Always** returns a non-empty string ≥ `base` (returns `base` when
+  the block is `null`/empty). NEVER `''` (the MCP SDK only sends `instructions` if
+  truthy; `''` would drop ALL instructions for the session).
+
+Wiring at `src/index.ts:423`, gated by the `owner` already computed at `:401`:
+```ts
+const profileBlock = (PROFILE_INJECT_ENABLED && owner)
+  ? renderProfile(await loadProfileFacts(DEFAULT_ACCOUNT_ID), PROFILE_CHAR_BUDGET)
+  : null;
+// instructions:
+composeInstructions(owner ? OWNER_INSTRUCTIONS : FRIEND_INSTRUCTIONS, profileBlock)
+```
+`composeInstructions` is a pure concat and **does not mutate** the consts (it preserves the
+substring tests in `mcp-account-config.test.ts`).
+
+### D10. 
Curation: deterministic tick, new table only, owner only in v1
+- `tickMemoryCuration(label)` in `src/index-classifier.ts` mirroring
+  `tickEntities` (gated by `MEMORY_CURATION_ENABLED`, lazy-import, try/catch,
+  `recordRun(worker='classifier', source='memory-curation')`).
+- New module `src/rag/memory-curator.ts`, testable function
+  `runMemoryCuration(deps)`. **Loop filtered to the owner** (`WHERE account_id =
+  DEFAULT_ACCOUNT_ID`), NOT a scan of all accounts. Skips friend accounts.
+- **Deterministic, only on `user_profile_facts`:** recomputes
+  `confidence_value`/`confidence_band` (via `computeConfidence`), promotes
+  `signal`→`evidence` (`applied_count >= 1`) → `confirmed`
+  (`confidence_value >= CONFIRM_THRESHOLD`), applies decay, writes **one
+  `memory_audit` row per transition**. Atomic replace-on-write.
+- **NEVER touches `brain_chunks`** (stays off the eval gate). Evidence comes from
+  `applied_count` (portal toggles), never from adding `utility_score` to the
+  SELECT in `search.ts`.
+- **Phase 2 (flag `MEMORY_CURATION_LLM`):** `callHaiku` extracts candidate facts from
+  recent conversation memories and merges overlapping facts; owner-first; budget +
+  circuit-breaker from the `entity-extractor`; **secret guard** + 1 audit
+  row per change; `evidence_ref` stores a **pointer/id**, never raw
+  content.
+
+### D11. Audit: append-only row per transition + run telemetry
+- Table `memory_audit` (in migration `0019`): `account_id`, `fact_id`,
+  `from_state`, `to_state`, `trigger`, `evidence_ref`, `created_at`.
+- **The inserts** live in the curator step (phase v1b), not in the injection MVP.
+- Curation run → `recordRun(...)`; register the `'memory-curation'` source in the
+  allowlist used by `getStatus` so it is not reported as missing before the 1st run.
+- **No tarballs/snapshots** (Postgres already has PITR/`pg_dump`).
+
+### D12. Zero new MCP tools in v1
+Reading = injection. Manual writing/curation = portal. 
This keeps the tool selection
+stable for friends.
+
+### D13. `remember` gains dedup by content hash
+`remember-doc.ts` derives `source_id = conversation:` (the `\0` delimiter avoids concatenation ambiguity) → idempotent
+upsert (replace-on-write via `deleteBySource` already exists). `account_id` comes from the
+context, never from input.
+- **Tradeoff/eval:** affects new writes only (off-gate at merge). Bulk
+  cleanup of old `conversation` duplicates in `brain_chunks` **stays out of
+  v1** (it would be eval-gated because it mutates data that `brain_search` reads).
+
+## Flow architecture (v1)
+
+```
+SIGNALS (no brain_chunks mutation)   LADDER (user_profile_facts)                  INJECTION (per session, owner-only)
+manual portal (owner)  ─┐            signal ──(applied_count>=1)──> evidence      index.ts:423, gated by `owner`
+portal toggle "valid?" ─┘ cron       evidence ──(value>=CONFIRM)──> confirmed     composeInstructions(
+                          curation   confirmed ──(decays)──> ineligible             base, renderProfile(
+                          (nightly,  (no retire; drops out of injection)              loadProfileFacts(DEFAULT)))
+                          determ.,                                                └─ direct SQL, no brainSearch
+                          new table                                               └─ PROFILE_CHAR_BUDGET budget
+                          only)      eligible = pinned OR (confirmed              └─ block null ⇒ instructions = base
+                             │                  AND value>=INJECT_MIN)
+                             ├─> recompute confidence/band (computeConfidence)
+                             ├─> promote/decay
+                             └─> memory_audit (1 row/transition) + status_runs
+```
+
+## Data model (migration 0019, additive/idempotent)
+
+```sql
+CREATE TABLE IF NOT EXISTS user_profile_facts (
+  id bigserial PRIMARY KEY,
+  account_id text NOT NULL,
+  category text NOT NULL, -- 'projeto'|'pessoa'|'preferencia'|'rotina'|... 
+  content text NOT NULL,
+  status text NOT NULL DEFAULT 'signal', -- signal|evidence|confirmed (text, extensible)
+  applied_count int NOT NULL DEFAULT 0,
+  violated_count int NOT NULL DEFAULT 0,
+  confidence_value real NOT NULL DEFAULT 0,
+  confidence_band text NOT NULL DEFAULT 'low', -- low|medium|high
+  pinned boolean NOT NULL DEFAULT false,
+  source text NOT NULL DEFAULT 'manual', -- manual|remember|llm
+  content_hash text NOT NULL, -- sha256(account_id || '\0' || content)
+  last_evidence_at timestamptz,
+  valid_from timestamptz NOT NULL DEFAULT now(),
+  created_at timestamptz NOT NULL DEFAULT now(),
+  updated_at timestamptz NOT NULL DEFAULT now(),
+  UNIQUE (account_id, content_hash)
+);
+CREATE INDEX IF NOT EXISTS idx_upf_account_status ON user_profile_facts (account_id, status);
+
+CREATE TABLE IF NOT EXISTS memory_audit (
+  id bigserial PRIMARY KEY,
+  account_id text NOT NULL,
+  fact_id bigint,
+  from_state text,
+  to_state text,
+  trigger text NOT NULL, -- 'portal'|'cron-promote'|'cron-decay'|'llm'|...
+  evidence_ref text, -- pointer/id, never raw content
+  created_at timestamptz NOT NULL DEFAULT now()
+);
+CREATE INDEX IF NOT EXISTS idx_memaudit_account ON memory_audit (account_id, created_at DESC);
+```
+**Isolation invariant (mandatory in every statement):** every INSERT/UPSERT/
+UPDATE/DELETE/SELECT on `user_profile_facts` and `memory_audit` includes `account_id`
+in the column list and in the `WHERE`/conflict target. The upsert conflict target =
+`(account_id, content_hash)`.
+
+## Out of scope (recorded, not to be done now)
+- Fix for the utility boost no-op in `search.ts` (independent bugfix, eval-gated).
+- Bulk dedup/merge of old `conversation` duplicates in `brain_chunks`
+  (eval-gated; the v1 curator does not touch `brain_chunks`).
+- `brain_facts`: do not activate it, do not fix `account_id` here.
+- Automatic feed via `remember`/conversation + Haiku reformulation = phase 2
+  (`MEMORY_CURATION_LLM`). Automatic evidence via `brain_feedback`. 
+  Implicit signals from `ai_search_log`. Friends/multi-tenant. State machine with 4+ states,
+  Wilson, tarball snapshots.
+- **Delivery to Odysseus:** see "Confirmed gap" below.
+
+## Confirmed gap: Odysseus does not render MCP `instructions`
+A reviewer (multi-surface lens) pointed out that Odysseus
+(`src/mcp_manager.py:362`, **separate repo, to be confirmed**) calls
+`await session.initialize()` and **discards the return value**, throwing away the
+MCP server's `instructions`. So in v1 the profile reaches **Claude.ai and
+Claude Code**, but **not Odysseus**.
+- **Decision:** v1 ships 2 surfaces. Odysseus is a **1-line follow-up in the
+  Odysseus repo** (capture `init_result = await session.initialize()` and
+  prefix the system prompt with `init_result.instructions`), tracked separately;
+  it does **not block** v1.
+
+## Acceptance criteria (machine-verifiable)
+
+`npm test` green (node:test via tsx, injected deps). **Rule:** no AC may
+depend on a test that early-returns when `POSTGRES_URL` is not set (that
+is phantom coverage in a CI without Postgres).
+
+1. **Confidence (pure):** case table for `computeConfidence`: thin evidence
+   (1 applied, 0 violated) does **not** reach the `high` band; bands cross at
+   0.40/0.75; an old `lastEvidenceAt` lowers the value; one case asserts the
+   realistic drop at ~90d (a signal observable during rollout).
+2. **Render + budget (pure):** `renderProfile(facts, budget)` puts `pinned`
+   first, sorts by band/confidence, respects `PROFILE_CHAR_BUDGET` (size
+   assertion), **never cuts in the middle of a fact**, and returns `null` when no fact
+   is eligible; eligible = `pinned` OR (`confirmed` AND `value>=INJECT_MIN`).
+3. **Compose (pure):** `composeInstructions(base, null) === base`;
+   `composeInstructions(base, block)` contains `base`+`block`, **never** returns
+   `''`, and preserves the tested substrings of `OWNER_INSTRUCTIONS`.
+4.
+   **Fail-closed (pure, real boundary):** build real `RequestContext` objects and
+   assert `isOwnerContext({authType:'oauth',scopes:['personal'],accountId: 'friend:x'}) === false`
+   **and** `isOwnerContext({...,accountId:undefined, isOperator:undefined}) === false`;
+   and that, with those contexts, the injection wiring
+   produces **zero bytes** of the owner's profile. Explicitly **exclude** the
+   `ctx === undefined` case (that one is owner = cron/eval, by design).
+5. **Flag off = no-op:** with `PROFILE_INJECT_ENABLED` off, the `instructions` are
+   byte-identical to the baseline (equality test).
+6. **Ladder (curator, injected deps, SQL-shape):** given a `signal` whose
+   `applied_count` crosses the threshold, `runMemoryCuration` promotes
+   `signal`→`evidence`→`confirmed`, recomputes the band, and writes **one** row to
+   `memory_audit` per transition; a fact without evidence decays and is **not** deleted.
+7. **Account isolation (structural):** SQL-shape assertion that **every**
+   statement on `user_profile_facts`/`memory_audit` (insert, upsert with conflict
+   `(account_id, content_hash)`, promote/decay update, audit insert) includes
+   `account_id`; and that `runMemoryCuration`, when processing the owner, **never** reads/writes
+   a row of another `account_id` and **skips friend accounts** (mirrors the entity cron's
+   isolation test).
+8. **`remember` dedup (pure + SQL-shape):** (a) `buildConversationDocument`
+   derives `source_id = conversation:`
+   deterministically; (b) with an injected pool, `remember` calls
+   `deleteBySource(source_id)` and then upserts (idempotent).
+9. **Migration (CI surrogate):** a test reads the `0019` `.sql` from disk and
+   asserts that every `CREATE` uses `IF NOT EXISTS` and that
+   `UNIQUE (account_id, content_hash)` exists. (The real 2x execution is checked on the VPS.)
+
+Post-deploy:
+10. **Injection stage (`PROFILE_INJECT_ENABLED` on, owner):** a `curl` of the
+    `initialize` handshake on `/mcp` with the owner's bearer shows the test `pinned` fact
+    in the `instructions` field (proves that the **engine** sends it).
+    `GET /health` 200,
+    `https://zinom.ai/mcp` 401, and (flags-off on the previous deploy) AC5.
+11. **Curator stage (`MEMORY_CURATION_ENABLED` on):** `GET /status` (Bearer)
+    without `stale_or_failing` and with a `memory-curation` run recorded; **and** run
+    `npm run eval` on the VPS, confirming R@5 ≥ 0.917 / MRR ≥ 0.616 unchanged
+    (the curator does not touch `brain_chunks`, so zero delta is expected; document it
+    in `RESULTS.md`).
+
+## Rollout plan (gated, reversible)
+1. Migration 0019 (additive; the deploy does not change behavior).
+2. **v1a (injection):** helpers + fail-closed wiring, **flags off**. Merge with green CI
+   (off-gate: does not touch `search.ts`; no behavior change).
+   Post-deploy: AC5 + /health + 401.
+3. Turn on `PROFILE_INJECT_ENABLED` **owner only**, seed 2-3 `pinned` facts,
+   verify AC10 (injection) and prompt savings.
+4. **v1b (curator + portal):** `runMemoryCuration` (new table only), `memory_audit`
+   inserts, portal (view + pin/prune + toggles). Turn on
+   `MEMORY_CURATION_ENABLED`; AC11 (incl. `npm run eval` = zero delta).
+5. **Phase 2:** `MEMORY_CURATION_LLM` owner-first (Haiku extraction/merge + secret
+   guard); then calibrate thresholds and consider friends.
+6. Separate follow-up: 1-line patch in the Odysseus repo to deliver the profile
+   on the 3rd surface.
+
+## Open questions (conservative default assumed)
+- **Exact char budget:** ~2800 chars is the Hermes anchor; calibrate in step 3.
+- **When to turn on the phase 2 LLM:** depends on the proven core (step 4) and on the
+  Anthropic budget (there is a history of exhausted credits; see the spec
+  confianca-multitenant).
+- **Confirm Odysseus's `mcp_manager.py:362`** before the step 6 follow-up.
diff --git a/scripts/migrations/0019_user_profile.sql b/scripts/migrations/0019_user_profile.sql
new file mode 100644
index 0000000..eaa6312
--- /dev/null
+++ b/scripts/migrations/0019_user_profile.sql
@@ -0,0 +1,35 @@
+-- 0019: per-account curated memory profile (Hermes/Open Second Brain).
+-- Non-secret prose, read on every MCP session setup. Lives outside the AES vault.
+CREATE TABLE IF NOT EXISTS user_profile_facts (
+  id               bigserial PRIMARY KEY,
+  account_id       text NOT NULL,
+  category         text NOT NULL,
+  content          text NOT NULL,
+  status           text NOT NULL DEFAULT 'signal',
+  applied_count    int  NOT NULL DEFAULT 0,
+  violated_count   int  NOT NULL DEFAULT 0,
+  confidence_value real NOT NULL DEFAULT 0,
+  confidence_band  text NOT NULL DEFAULT 'low',
+  pinned           boolean NOT NULL DEFAULT false,
+  source           text NOT NULL DEFAULT 'manual',
+  content_hash     text NOT NULL,
+  last_evidence_at timestamptz,
+  valid_from       timestamptz NOT NULL DEFAULT now(),
+  created_at       timestamptz NOT NULL DEFAULT now(),
+  updated_at       timestamptz NOT NULL DEFAULT now(),
+  UNIQUE (account_id, content_hash)
+);
+CREATE INDEX IF NOT EXISTS idx_upf_account_status ON user_profile_facts (account_id, status);
+
+-- append-only trail of memory state transitions
+CREATE TABLE IF NOT EXISTS memory_audit (
+  id           bigserial PRIMARY KEY,
+  account_id   text NOT NULL,
+  fact_id      bigint,
+  from_state   text,
+  to_state     text,
+  trigger      text NOT NULL,
+  evidence_ref text,
+  created_at   timestamptz NOT NULL DEFAULT now()
+);
+CREATE INDEX IF NOT EXISTS idx_memaudit_account ON memory_audit (account_id, created_at DESC);
diff --git a/src/__tests__/mcp-account-config.test.ts b/src/__tests__/mcp-account-config.test.ts
index 01ff1af..b6029b7 100644
--- a/src/__tests__/mcp-account-config.test.ts
+++ b/src/__tests__/mcp-account-config.test.ts
@@ -5,7 +5,7 @@
 // puro mcp-account-config.ts justamente para ser testável sem boot do servidor.
 import { test } from "node:test";
 import assert from "node:assert/strict";
-import { OWNER_INSTRUCTIONS, FRIEND_INSTRUCTIONS } from "../mcp-account-config.js";
+import { OWNER_INSTRUCTIONS, FRIEND_INSTRUCTIONS, composeInstructions } from "../mcp-account-config.js";
 
 test("owner e friend instructions trazem a regra Zinom-first e os links", () => {
   for (const s of [OWNER_INSTRUCTIONS, FRIEND_INSTRUCTIONS]) {
@@ -24,3 +24,23 @@ test("owner e friend instructions ensinam brain_get_document p/ conteúdo ínteg
     assert.match(s, /NUNCA reconstrua um documento somando resultados de brain_search/);
   }
 });
+
+test("composeInstructions: empty/null profileBlock returns the base unchanged", () => {
+  // INVARIANT: the SDK only sends `instructions` when truthy; never return '' for
+  // a missing block. Return the base intact.
+  assert.equal(composeInstructions(OWNER_INSTRUCTIONS, null), OWNER_INSTRUCTIONS);
+  assert.equal(composeInstructions(OWNER_INSTRUCTIONS, ""), OWNER_INSTRUCTIONS);
+  assert.equal(composeInstructions(OWNER_INSTRUCTIONS, " "), OWNER_INSTRUCTIONS); // whitespace-only = empty
+});
+
+test("composeInstructions: concatenates base + block, preserving both", () => {
+  const c = composeInstructions(OWNER_INSTRUCTIONS, "PERFIL: x");
+  assert.ok(c.includes(OWNER_INSTRUCTIONS), "must contain the full base");
+  assert.ok(c.includes("PERFIL: x"), "must contain the profile block");
+});
+
+test("composeInstructions: never returns '' and always contains the non-empty base", () => {
+  const c = composeInstructions("base", "b");
+  assert.notEqual(c, "");
+  assert.ok(c.includes("base"));
+});
diff --git a/src/__tests__/profile-injection.test.ts b/src/__tests__/profile-injection.test.ts
new file mode 100644
index 0000000..03a0d99
--- /dev/null
+++ b/src/__tests__/profile-injection.test.ts
@@ -0,0 +1,132 @@
+// src/__tests__/profile-injection.test.ts
+// T6 (SECURITY-CRITICAL): resolveInstructions injects the curated profile into the
+// MCP session `instructions` in a FAIL-CLOSED way, gated by a
+// flag. A bug here
+// would leak the owner's profile to a friend. These tests are PURE (no Postgres,
+// no Express): fact loading is injected (fake/spy loadFacts), so
+// a friend can NEVER touch the owner's facts.
+import { test } from "node:test";
+import assert from "node:assert/strict";
+import {
+  resolveInstructions,
+  isOwnerContext,
+  OWNER_INSTRUCTIONS,
+  FRIEND_INSTRUCTIONS,
+} from "../mcp-account-config.js";
+import type { ProfileFact } from "../rag/profile.js";
+
+// COMPLETE fixture fact (every field of the ProfileFact type). pinned=true so it is
+// always eligible for the real renderProfile, regardless of confidence.
+function makeFact(overrides: Partial<ProfileFact> = {}): ProfileFact {
+  return {
+    id: 1,
+    account_id: "bruno",
+    category: "preferencia",
+    content: "OWNER_SECRET_FACT_marcador_unico",
+    status: "confirmed",
+    applied_count: 3,
+    violated_count: 0,
+    confidence_value: 0.95,
+    confidence_band: "high",
+    pinned: true,
+    source: "manual",
+    content_hash: "hash-1",
+    last_evidence_at: null,
+    ...overrides,
+  };
+}
+
+// A loadFacts spy that FAILS the test if called: proves that owner=false
+// (friend) short-circuits BEFORE any loading of the owner's facts.
+function neverCalled(): (accountId: string) => Promise<ProfileFact[]> {
+  return async () => {
+    throw new Error("loadFacts should NOT have been called (fail-closed violated)");
+  };
+}
+
+// A spy that, if called, would return the OWNER's facts; used to prove that the
+// result for a friend contains no byte of those facts.
+function spyThatReturnsOwnerFacts(): {
+  fn: (accountId: string) => Promise<ProfileFact[]>;
+  called: () => boolean;
+} {
+  let was = false;
+  const fn = async () => {
+    was = true;
+    return [makeFact()];
+  };
+  return { fn, called: () => was };
+}
+
+// AC5: flag off = exact baseline, and loadFacts is NEVER called.
+test("AC5: flag off → owner receives OWNER_INSTRUCTIONS without touching loadFacts", async () => {
+  const out = await resolveInstructions({
+    owner: true,
+    injectEnabled: false,
+    defaultAccountId: "bruno",
+    loadFacts: neverCalled(),
+  });
+  assert.equal(out, OWNER_INSTRUCTIONS);
+});
+
+test("AC5: flag off → friend receives FRIEND_INSTRUCTIONS without touching loadFacts", async () => {
+  const out = await resolveInstructions({
+    owner: false,
+    injectEnabled: false,
+    defaultAccountId: "bruno",
+    loadFacts: neverCalled(),
+  });
+  assert.equal(out, FRIEND_INSTRUCTIONS);
+});
+
+// AC4: fail-closed.
+test("AC4: isOwnerContext is false for an OAuth friend with accountId", () => {
+  assert.equal(
+    isOwnerContext({ authType: "oauth", scopes: ["personal"], accountId: "friend:x" }),
+    false,
+  );
+});
+
+test("AC4: isOwnerContext is false for OAuth with no accountId and no isOperator", () => {
+  assert.equal(
+    isOwnerContext({
+      authType: "oauth",
+      scopes: ["personal"],
+      accountId: undefined,
+      isOperator: undefined,
+    }),
+    false,
+  );
+});
+
+test("AC4: flag on + friend → does NOT contain owner facts AND loadFacts is NOT called", async () => {
+  const spy = spyThatReturnsOwnerFacts();
+  const out = await resolveInstructions({
+    owner: false,
+    injectEnabled: true,
+    defaultAccountId: "bruno",
+    loadFacts: spy.fn,
+  });
+  // owner=false short-circuits: it never loads or renders the owner's facts.
+  assert.equal(spy.called(), false, "loadFacts must not be called for a friend");
+  assert.equal(out, FRIEND_INSTRUCTIONS, "friend receives the pure friend base");
+  assert.ok(
+    !out.includes("OWNER_SECRET_FACT_marcador_unico"),
+    "the friend's result must not contain any byte of the owner's facts",
+  );
+});
+
+// owner on: the profile is injected.
+test("owner on: contains OWNER_INSTRUCTIONS AND the pinned fact's content", async () => {
+  const pinned = makeFact({ content: "Bruno prefere respostas curtas em PT-BR."
+  });
+  const out = await resolveInstructions({
+    owner: true,
+    injectEnabled: true,
+    defaultAccountId: "bruno",
+    loadFacts: async () => [pinned],
+  });
+  assert.ok(out.includes(OWNER_INSTRUCTIONS), "must contain the full owner base");
+  assert.ok(
+    out.includes("Bruno prefere respostas curtas em PT-BR."),
+    "must contain the pinned fact's content",
+  );
+});
diff --git a/src/index-classifier.ts b/src/index-classifier.ts
index b1f64e5..04f5d69 100644
--- a/src/index-classifier.ts
+++ b/src/index-classifier.ts
@@ -9,6 +9,7 @@ import { runRevisitar } from "./classifier/revisitar.js";
 import { syncGranolasToReunioes } from "./classifier/granola-to-reuniao.js";
 import { runDailyBriefing } from "./briefing/daily-briefing.js";
 import { recordRun } from "./rag/storage.js";
+import { tickMemoryCuration } from "./rag/memory-curation-tick.js";
 import { runResyncTick } from "./billing/resync-cron.js";
 import { notify } from "./notify.js";
 
@@ -16,6 +17,7 @@ const CLASSIFIER_CRON = process.env.CLASSIFIER_CRON ?? "30 * * * *"; // half pas
 const REVISITAR_CRON = process.env.REVISITAR_CRON ?? "0 7 * * *"; // 07:00 every day
 const GRANOLA_REUNIAO_CRON = process.env.GRANOLA_REUNIAO_CRON ?? "*/15 * * * *"; // every 15min
 const BRIEFING_CRON = process.env.BRIEFING_CRON ?? "0 7 * * *"; // 07:00 every day
+const MEMORY_CURATION_CRON = process.env.MEMORY_CURATION_CRON ??
+  "15 4 * * *"; // 04:15 every day
 
 async function tickClassifier(label: string): Promise<void> {
   const start = Date.now();
@@ -84,7 +86,7 @@ async function tickBriefing(label: string): Promise<void> {
 }
 
 console.log(
-  `brain-classifier starting; classifier cron: ${CLASSIFIER_CRON}; revisitar cron: ${REVISITAR_CRON}; granola->reuniao cron: ${GRANOLA_REUNIAO_CRON}; briefing cron: ${BRIEFING_CRON}`,
+  `brain-classifier starting; classifier cron: ${CLASSIFIER_CRON}; revisitar cron: ${REVISITAR_CRON}; granola->reuniao cron: ${GRANOLA_REUNIAO_CRON}; briefing cron: ${BRIEFING_CRON}; memory-curation cron: ${MEMORY_CURATION_CRON}`,
 );
 console.log("running initial classifier tick...");
 void tickClassifier("initial");
@@ -133,6 +135,14 @@ cron.schedule(CLASSIFIER_CRON, () => {
   void tickEntities("cron");
 });
 
+// T7/T8: schedules the deterministic OWNER-ONLY curator at a nightly hour
+// (default 04:15). The MEMORY_CURATION_ENABLED gate is checked inside
+// tickMemoryCuration: the schedule always exists, but the tick is a no-op when the flag
+// is not 'true'. The lazy import and recordRun(source='memory-curation') live in the tick.
+cron.schedule(MEMORY_CURATION_CRON, () => {
+  void tickMemoryCuration("cron");
+});
+
 // Fase 3 billing — per-account auto re-sync. Hourly tick; each account is
 // re-indexed only when its plan's syncIntervalHours has elapsed (free skipped).
 const RESYNC_CRON = process.env.RESYNC_CRON ??
   "15 * * * *";
diff --git a/src/index.ts b/src/index.ts
index f0b1cd4..bde26aa 100644
--- a/src/index.ts
+++ b/src/index.ts
@@ -36,8 +36,9 @@ import { createRubrixRouter } from "./rubrix/routes.js";
 import { registerRubrixTools } from "./rubrix/tools.js";
 import { resolveBearer, accountWorkspaces } from "./account-bearer.js";
 import { isAccountActive } from "./account-status.js";
-import { requestContext, getContext, getAccountId, type RequestContext } from "./context.js";
-import { isOwnerContext, isOperatorToken, OWNER_INSTRUCTIONS, FRIEND_INSTRUCTIONS } from "./mcp-account-config.js";
+import { requestContext, getContext, getAccountId, DEFAULT_ACCOUNT_ID, type RequestContext } from "./context.js";
+import { isOwnerContext, isOperatorToken, resolveInstructions } from "./mcp-account-config.js";
+import { loadProfileFacts } from "./rag/profile-storage.js";
 import { ALL_WORKSPACES } from "./clients.js";
 import { getStatus } from "./rag/storage.js";
 import { summarizeStatus, renderStatusHtml, escapeHtml } from "./rag/status.js";
@@ -131,6 +132,11 @@ app.use((req, _res, next) => {
 // Auth middleware for /mcp — accepts static BEARER_TOKEN or OAuth access tokens
 const BEARER_TOKEN = process.env.BEARER_TOKEN;
 
+// T6: gate (default OFF) for injecting the owner's curated profile into the session
+// instructions. Fail-closed: the profile is only loaded for the owner (see
+// resolveInstructions); a friend never touches the owner's facts, flag on or off.
+const PROFILE_INJECT_ENABLED = process.env.PROFILE_INJECT_ENABLED === "true";
+
 // OAuth routes (well-known, register, authorize, token, admin)
 app.use(createOAuthRouter(BASE_URL, BEARER_TOKEN));
 
@@ -446,13 +452,23 @@ app.post("/mcp", async (req, res) => {
     if (id) evictSession(id);
   };
 
+  // T6: session instructions (owner/friend), optionally enriched with the
+  // owner's curated profile when PROFILE_INJECT_ENABLED. Fail-closed: a friend never
+  // loads the owner's facts.
+  // Built BEFORE instantiating the server (async handler).
+  const instructions = await resolveInstructions({
+    owner,
+    injectEnabled: PROFILE_INJECT_ENABLED,
+    defaultAccountId: DEFAULT_ACCOUNT_ID,
+    loadFacts: loadProfileFacts,
+  });
+
   const server = new McpServer(
     {
       name: "zinom",
       version: "1.0.0",
     },
     {
-      instructions: owner ? OWNER_INSTRUCTIONS : FRIEND_INSTRUCTIONS,
+      instructions,
     }
   );
 
diff --git a/src/mcp-account-config.ts b/src/mcp-account-config.ts
index 18f5c9f..b9f83d0 100644
--- a/src/mcp-account-config.ts
+++ b/src/mcp-account-config.ts
@@ -4,6 +4,7 @@
 // Security note: ownership is decided ONLY from the trusted request context's
 // accountId (set by the auth layer), never from tool input.
 import { DEFAULT_ACCOUNT_ID, type RequestContext } from "./context.js";
+import { renderProfile, PROFILE_CHAR_BUDGET, type ProfileFact } from "./rag/profile.js";
 
 /** True when the request is the operator/owner (full notion_* suite + full
  *  INSTRUCTIONS), false for an onboarded friend account (restricted, safe set).
@@ -43,6 +44,61 @@ export function isOperatorToken(
   );
 }
 
+/** Pure concat of the base server instructions with an optional per-account
+ *  profile block (e.g. a remembered preferences summary). Used to enrich the
+ *  session `instructions` without mutating the base consts.
+ *
+ *  CRITICAL INVARIANT: the MCP SDK only sends `instructions` when truthy; an
+ *  empty string would drop ALL of the session's instructions. Therefore:
+ *    - `profileBlock` null / empty / whitespace-only → returns `base` unchanged;
+ *    - otherwise → `base` + separator + `profileBlock`.
+ *  In practice `base` is never empty in production (it is one of the OWNER/FRIEND
+ *  consts, already `.trim()`'d). Even so the function is safe: it never destroys `base`,
+ *  and when `base` is non-empty the return value always contains it. Pure concat: it does NOT mutate
+ *  OWNER_INSTRUCTIONS / FRIEND_INSTRUCTIONS.
+ */
+export function composeInstructions(base: string, profileBlock: string | null): string {
+  if (profileBlock === null || profileBlock.trim() === "") return base;
+  return `${base}\n\n${profileBlock}`;
+}
+
+/**
+ * T6 (SECURITY-CRITICAL): resolves the MCP session `instructions`, optionally
+ * enriched with the owner's curated profile. FAIL-CLOSED by design:
+ *
+ *  - `base` = owner ? OWNER_INSTRUCTIONS : FRIEND_INSTRUCTIONS.
+ *  - The profile is only loaded/rendered when `injectEnabled && owner` is true.
+ *    A friend (owner=false) NEVER calls `loadFacts`: the curated profile belongs to the owner and
+ *    loading it on a friend's path would already be a leak (even if the block
+ *    were discarded afterwards). The `owner &&` short-circuit guarantees this.
+ *  - The result ALWAYS contains `base` (composeInstructions never returns '' nor
+ *    destroys it): even with no eligible facts, the session keeps its instructions.
+ *
+ * `loadFacts`/`render`/`budget` are injected to keep the function testable without
+ * Postgres or Express. In production: loadFacts = loadProfileFacts, render =
+ * renderProfile, budget = PROFILE_CHAR_BUDGET.
+ */
+export async function resolveInstructions(args: {
+  owner: boolean;
+  injectEnabled: boolean;
+  defaultAccountId: string;
+  loadFacts: (accountId: string) => Promise<ProfileFact[]>;
+  render?: (facts: ProfileFact[], budget: number) => string | null;
+  budget?: number;
+}): Promise<string> {
+  const base = args.owner ? OWNER_INSTRUCTIONS : FRIEND_INSTRUCTIONS;
+  const render = args.render ?? renderProfile;
+  const budget = args.budget ?? PROFILE_CHAR_BUDGET;
+
+  // FAIL-CLOSED: the curated profile belongs to the owner. Only load it when the flag is on
+  // AND the request is from the owner. For a friend, NEVER touch loadFacts.
+  const profileBlock =
+    args.injectEnabled && args.owner
+      ?
+        render(await args.loadFacts(args.defaultAccountId), budget)
+      : null;
+
+  return composeInstructions(base, profileBlock);
+}
+
 // The operator/owner server instructions (moved from index.ts so they are pure
 // and unit-testable). They name the owner's three private workspaces and house
 // rules — NEVER serve them to a friend account (see FRIEND_INSTRUCTIONS below).
diff --git a/src/rag/__tests__/memory-curation-tick.test.ts b/src/rag/__tests__/memory-curation-tick.test.ts
new file mode 100644
index 0000000..339c008
--- /dev/null
+++ b/src/rag/__tests__/memory-curation-tick.test.ts
@@ -0,0 +1,59 @@
+// src/rag/__tests__/memory-curation-tick.test.ts
+// T8: gating of the curator tick in brain-classifier.
+//
+// tickMemoryCuration lives in its own module (memory-curation-tick.ts) with an
+// injectable seam (run) so the gate can be tested without a DB or Anthropic, and without dragging in the
+// entrypoint's import chain (Express/clients/Notion token validation).
+// index-classifier.ts only imports and schedules this tick.
+//
+//   off (MEMORY_CURATION_ENABLED != 'true') -> does NOT call run.
+//   on  (=== 'true')                        -> calls run exactly once.
+import { test, afterEach } from "node:test";
+import assert from "node:assert/strict";
+import { tickMemoryCuration } from "../memory-curation-tick.js";
+
+const saved = process.env.MEMORY_CURATION_ENABLED;
+
+afterEach(() => {
+  if (saved === undefined) delete process.env.MEMORY_CURATION_ENABLED;
+  else process.env.MEMORY_CURATION_ENABLED = saved;
+});
+
+test("gate OFF: tickMemoryCuration does not call run", async () => {
+  delete process.env.MEMORY_CURATION_ENABLED;
+  let calls = 0;
+  await tickMemoryCuration("test", async () => {
+    calls++;
+    return { processed: 0, transitions: 0 };
+  });
+  assert.equal(calls, 0, "with the flag off the curator does not run");
+});
+
+test("explicit gate OFF ('false'): tickMemoryCuration does not call run", async () => {
+  process.env.MEMORY_CURATION_ENABLED = "false";
+  let calls = 0;
+  await tickMemoryCuration("test", async () => {
+    calls++;
+    return { processed: 0, transitions: 0 };
+  });
+  assert.equal(calls, 0);
+});
+
+test("gate ON ('true'): tickMemoryCuration calls run once", async () => {
+  process.env.MEMORY_CURATION_ENABLED = "true";
+  let calls = 0;
+  await tickMemoryCuration("test", async () => {
+    calls++;
+    return { processed: 3, transitions: 1 };
+  });
+  assert.equal(calls, 1, "with the flag on the curator runs exactly once");
+});
+
+test("gate ON: an error in run is swallowed (the tick never propagates)", async () => {
+  process.env.MEMORY_CURATION_ENABLED = "true";
+  await assert.doesNotReject(async () => {
+    await tickMemoryCuration("test", async () => {
+      throw new Error("boom");
+    });
+  });
+});
diff --git a/src/rag/__tests__/memory-curator.test.ts b/src/rag/__tests__/memory-curator.test.ts
new file mode 100644
index 0000000..1d49daf
--- /dev/null
+++ b/src/rag/__tests__/memory-curator.test.ts
@@ -0,0 +1,283 @@
+// src/rag/__tests__/memory-curator.test.ts
+// T7: deterministic memory curator (owner-only, user_profile_facts only).
+//
+// INJECTED deps (loadFacts/upsertFact/insertAudit), no DB.
+// Covers:
+//   AC6: deterministic step-by-step promotion
+//     - 'signal' -> 'evidence' when applied_count >= 1 (1 cron-promote audit)
+//     - 'evidence' -> 'confirmed' when confidence >= CONFIRM_THRESHOLD (1 audit)
+//     - a fact can climb 2 steps in the same run (2 audits)
+//     - fact without evidence: confidence decays, status does NOT change, no delete
+//   AC7 isolation: accountId='bruno' reaches loadFacts and EVERY write
+//     (upsertFact/insertAudit) carries the account_id of the account being processed; the
+//     curator has NO path that iterates other accounts (no list-accounts dep).
+import { test } from "node:test";
+import assert from "node:assert/strict";
+import {
+  runMemoryCuration,
+  CONFIRM_THRESHOLD,
+  type MemoryCurationDeps,
+} from "../memory-curator.js";
+import { INJECT_MIN_CONFIDENCE, type ProfileFact } from "../profile.js";
+import { DEFAULT_ACCOUNT_ID } from "../../context.js";
+
+// A fixed "now" so confidence freshness is deterministic.
+const NOW = new Date("2026-06-18T12:00:00.000Z");
+
+function fact(overrides: Partial<ProfileFact>): ProfileFact {
+  return {
+    id: 1,
+    account_id: DEFAULT_ACCOUNT_ID,
+    category: "estilo",
+    content: "Prefere respostas concisas",
+    status: "signal",
+    applied_count: 0,
+    violated_count: 0,
+    confidence_value: 0,
+    confidence_band: "low",
+    pinned: false,
+    source: "manual",
+    content_hash: "h1",
+    last_evidence_at: null,
+    ...overrides,
+  };
+}
+
+interface Spies {
+  loadCalls: string[];
+  upserts: Array<Record<string, unknown>>;
+  audits: Array<Record<string, unknown>>;
+}
+
+function makeDeps(facts: ProfileFact[], accountId?: string): { deps: MemoryCurationDeps; spies: Spies } {
+  const spies: Spies = { loadCalls: [], upserts: [], audits: [] };
+  const deps: MemoryCurationDeps = {
+    accountId,
+    now: NOW,
+    loadFacts: async (acct: string) => {
+      spies.loadCalls.push(acct);
+      // Return facts stamped with the account the curator asked for, mirroring
+      // the real account-scoped loader.
+      return facts.map((f) => ({ ...f, account_id: acct }));
+    },
+    upsertFact: async (f) => {
+      spies.upserts.push(f as unknown as Record<string, unknown>);
+      return 0;
+    },
+    insertAudit: async (row) => {
+      spies.audits.push(row as unknown as Record<string, unknown>);
+    },
+  };
+  return { deps, spies };
+}
+
+// ---------------------------------------------------------------------------
+// CONFIRM_THRESHOLD alignment
+// ---------------------------------------------------------------------------
+
+test("CONFIRM_THRESHOLD is aligned with INJECT_MIN_CONFIDENCE (0.75)", () => {
+  assert.equal(CONFIRM_THRESHOLD, INJECT_MIN_CONFIDENCE);
+  assert.equal(CONFIRM_THRESHOLD, 0.75);
+});
+
+// ---------------------------------------------------------------------------
+// AC6: signal -> evidence
+// ---------------------------------------------------------------------------
+
+test("AC6: 'signal' with applied_count>=1 becomes 'evidence' and produces 1 cron-promote audit", async () => {
+  const { deps, spies } = makeDeps([
+    fact({ id: 10, status: "signal", applied_count: 1, violated_count: 0, last_evidence_at: NOW }),
+  ]);
+
+  const res = await runMemoryCuration(deps);
+
+  assert.equal(res.processed, 1);
+  assert.equal(res.transitions, 1);
+  assert.equal(spies.audits.length, 1);
+  assert.equal(spies.audits[0].from_state, "signal");
+  assert.equal(spies.audits[0].to_state, "evidence");
+  assert.equal(spies.audits[0].trigger, "cron-promote");
+  assert.equal(spies.audits[0].account_id, DEFAULT_ACCOUNT_ID);
+  assert.equal(spies.audits[0].fact_id, 10);
+
+  // upsert reflects the new status.
+  assert.equal(spies.upserts.length, 1);
+  assert.equal(spies.upserts[0].status, "evidence");
+  assert.equal(spies.upserts[0].account_id, DEFAULT_ACCOUNT_ID);
+});
+
+// ---------------------------------------------------------------------------
+// AC6: evidence -> confirmed
+// ---------------------------------------------------------------------------
+
+test("AC6: 'evidence' with confidence>=0.75 becomes 'confirmed' and produces 1 audit", async () => {
+  // high applied + violated 0 + evidence today => ratio = 20/22 ≈ 0.909, freshness=1.
+  const { deps, spies } = makeDeps([
+    fact({
+      id: 20,
+      status: "evidence",
+      applied_count: 20,
+      violated_count: 0,
+      last_evidence_at: NOW,
+    }),
+  ]);
+
+  const res = await runMemoryCuration(deps);
+
+  assert.equal(res.processed, 1);
+  assert.equal(res.transitions, 1);
+  assert.equal(spies.audits.length, 1);
+  assert.equal(spies.audits[0].from_state, "evidence");
+  assert.equal(spies.audits[0].to_state, "confirmed");
+  assert.equal(spies.audits[0].trigger, "cron-promote");
+
+  assert.equal(spies.upserts.length, 1);
+  assert.equal(spies.upserts[0].status, "confirmed");
+  assert.ok((spies.upserts[0].confidence_value as number) >= CONFIRM_THRESHOLD);
+  assert.equal(spies.upserts[0].confidence_band, "high");
+});
+
+// ---------------------------------------------------------------------------
+// AC6: two steps in one run
+// ---------------------------------------------------------------------------
+
+test("AC6: a 'signal' fact with strong evidence climbs 2 steps in a single run (2 audits)", async () => {
+  const { deps, spies } = makeDeps([
+    fact({
+      id: 30,
+      status: "signal",
+      applied_count: 20,
+      violated_count: 0,
+      last_evidence_at: NOW,
+    }),
+  ]);
+
+  const res = await runMemoryCuration(deps);
+
+  assert.equal(res.processed, 1);
+  assert.equal(res.transitions, 2);
+  assert.equal(spies.audits.length, 2);
+  assert.equal(spies.audits[0].from_state, "signal");
+  assert.equal(spies.audits[0].to_state, "evidence");
+  assert.equal(spies.audits[1].from_state, "evidence");
+  assert.equal(spies.audits[1].to_state, "confirmed");
+  // Single upsert with the final status.
+  assert.equal(spies.upserts.length, 1);
+  assert.equal(spies.upserts[0].status, "confirmed");
+});
+
+// ---------------------------------------------------------------------------
+// Decay only: no status change, no delete
+// ---------------------------------------------------------------------------
+
+test("fact without evidence (applied 0, old last_evidence_at) decays without changing status and without delete", async () => {
+  const old = new Date(NOW.getTime() - 400 * 86_400_000); // ~400 days ago
+  const { deps, spies } = makeDeps([
+    fact({
+      id: 40,
+      status: "evidence",
+      applied_count: 0,
+      violated_count: 3,
+      confidence_value: 0.3,
+      confidence_band: "low",
+      last_evidence_at: old,
+    }),
+  ]);
+
+  const res = await runMemoryCuration(deps);
+
+  assert.equal(res.processed, 1);
+  assert.equal(res.transitions, 0, "decay does not produce a status transition");
+  assert.equal(spies.audits.length, 0, "no audit when it only decays");
+
+  // No "delete" dependency exists on MemoryCurationDeps; this is structural.
+  assert.equal((deps as Record<string, unknown>).deleteFact, undefined);
+  assert.equal((deps as Record<string, unknown>).removeFact, undefined);
+
+  // If value/band changed vs the stored row, an upsert may fire, but status
+  // must be unchanged.
+  for (const u of spies.upserts) {
+    assert.equal(u.status, "evidence", "status does not change on decay");
+  }
+});
+
+test("a 'signal' fact with no applied_count does NOT promote and produces no audit", async () => {
+  const { deps, spies } = makeDeps([
+    fact({ id: 41, status: "signal", applied_count: 0, violated_count: 0, last_evidence_at: null }),
+  ]);
+
+  const res = await runMemoryCuration(deps);
+  assert.equal(res.transitions, 0);
+  assert.equal(spies.audits.length, 0);
+});
+
+// ---------------------------------------------------------------------------
+// pinned: confidence recomputed but status untouched
+// ---------------------------------------------------------------------------
+
+test("a pinned fact does not change status on decay (confidence is recomputed for display only)", async () => {
+  const old = new Date(NOW.getTime() - 400 * 86_400_000);
+  const { deps, spies } = makeDeps([
+    fact({
+      id: 50,
+      status: "signal",
+      pinned: true,
+      applied_count: 5,
+      violated_count: 0,
+      confidence_value: 0.9,
+      confidence_band: "high",
+      last_evidence_at: old,
+    }),
+  ]);
+
+  const res = await runMemoryCuration(deps);
+  assert.equal(res.transitions, 0, "pinned does not promote status");
+  assert.equal(spies.audits.length, 0);
+  for (const u of spies.upserts) {
+    assert.equal(u.status, "signal", "pinned keeps its status");
+  }
+});
+
+// ---------------------------------------------------------------------------
+// AC7: isolation
+// ---------------------------------------------------------------------------
+
+test("AC7: the explicit accountId reaches loadFacts and EVERY write", async () => {
+  const { deps, spies } = makeDeps(
+    [
+      fact({ id: 60, status: "signal", applied_count: 1, last_evidence_at: NOW }),
+      fact({ id: 61, status: "evidence", applied_count: 20, violated_count: 0, last_evidence_at: NOW, content_hash: "h2" }),
+    ],
+    "bruno",
+  );
+
+  await runMemoryCuration(deps);
+
+  assert.deepEqual(spies.loadCalls, ["bruno"], "loadFacts called once with 'bruno'");
assert.ok(spies.upserts.length > 0); + for (const u of spies.upserts) { + assert.equal(u.account_id, "bruno", "todo upsert carrega account_id da conta processada"); + } + for (const a of spies.audits) { + assert.equal(a.account_id, "bruno", "todo audit carrega account_id da conta processada"); + } +}); + +test("AC7: default é DEFAULT_ACCOUNT_ID quando accountId não é passado", async () => { + const { deps, spies } = makeDeps([ + fact({ id: 70, status: "signal", applied_count: 1, last_evidence_at: NOW }), + ]); // accountId undefined + + await runMemoryCuration(deps); + assert.deepEqual(spies.loadCalls, [DEFAULT_ACCOUNT_ID]); + for (const u of spies.upserts) assert.equal(u.account_id, DEFAULT_ACCOUNT_ID); + for (const a of spies.audits) assert.equal(a.account_id, DEFAULT_ACCOUNT_ID); +}); + +test("AC7: o curador NÃO recebe dep de listar contas (sem caminho multi-conta)", () => { + // Structural: MemoryCurationDeps must not carry any "list accounts" hook. + const { deps } = makeDeps([]); + const keys = Object.keys(deps); + for (const k of keys) { + assert.doesNotMatch(k, /listAccounts|allAccounts|accounts/i, `dep inesperada: ${k}`); + } +}); diff --git a/src/rag/__tests__/migration-0019.test.ts b/src/rag/__tests__/migration-0019.test.ts new file mode 100644 index 0000000..bec49b3 --- /dev/null +++ b/src/rag/__tests__/migration-0019.test.ts @@ -0,0 +1,39 @@ +// Surrogate de CI para a migração 0019 (perfil curado de memória). +// Puro: lê o .sql do disco e valida o contrato da DDL. NÃO toca Postgres. 
+import { test } from 'node:test'; +import assert from 'node:assert/strict'; +import { readFileSync } from 'node:fs'; +import path from 'node:path'; +import { fileURLToPath } from 'node:url'; + +const __dirname = path.dirname(fileURLToPath(import.meta.url)); +const SQL_PATH = path.resolve( + __dirname, + '../../../scripts/migrations/0019_user_profile.sql', +); +const sql = readFileSync(SQL_PATH, 'utf8'); + +test('toda CREATE TABLE é idempotente (IF NOT EXISTS)', () => { + // captura "CREATE TABLE ..." até o nome da tabela, tolerando espaços/quebras. + const createTables = sql.match(/CREATE\s+TABLE[\s\S]*?\(/gi) ?? []; + assert.ok(createTables.length >= 1, 'esperava ao menos um CREATE TABLE'); + for (const stmt of createTables) { + assert.match( + stmt, + /CREATE\s+TABLE\s+IF\s+NOT\s+EXISTS/i, + `CREATE TABLE sem IF NOT EXISTS: ${stmt.trim().slice(0, 60)}`, + ); + } +}); + +test('existe UNIQUE (account_id, content_hash)', () => { + assert.match(sql, /UNIQUE\s*\(\s*account_id\s*,\s*content_hash\s*\)/i); +}); + +test('account_id é text NOT NULL', () => { + assert.match(sql, /account_id\s+text\s+NOT\s+NULL/i); +}); + +test('cria a tabela memory_audit (idempotente)', () => { + assert.match(sql, /CREATE\s+TABLE\s+IF\s+NOT\s+EXISTS\s+memory_audit/i); +}); diff --git a/src/rag/__tests__/profile-guard.test.ts b/src/rag/__tests__/profile-guard.test.ts new file mode 100644 index 0000000..57eb06c --- /dev/null +++ b/src/rag/__tests__/profile-guard.test.ts @@ -0,0 +1,60 @@ +// src/rag/__tests__/profile-guard.test.ts +// T9 Part B — PURE secret guard. No IO. Recognizes OBVIOUS secret shapes and +// masks them, while leaving ordinary prose untouched (conservative: prefer a +// false-negative over a false-positive that would redact normal text). 
+import { test } from "node:test"; +import assert from "node:assert/strict"; +import { looksLikeSecret, stripSecrets } from "../profile-guard.js"; + +const JWT = + "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4ifQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c"; + +test("looksLikeSecret recognizes an sk- API key", () => { + assert.equal(looksLikeSecret("sk-ant-aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"), true); +}); + +test("looksLikeSecret recognizes a JWT", () => { + assert.equal(looksLikeSecret(JWT), true); +}); + +test("looksLikeSecret recognizes an AWS access key id", () => { + assert.equal(looksLikeSecret("AKIAIOSFODNN7EXAMPLE"), true); +}); + +test("looksLikeSecret recognizes a PEM private key block", () => { + const pem = + "-----BEGIN RSA PRIVATE KEY-----\nMIIEowIBAAKCAQEA\n-----END RSA PRIVATE KEY-----"; + assert.equal(looksLikeSecret(pem), true); +}); + +test("looksLikeSecret recognizes a long random token", () => { + assert.equal(looksLikeSecret("0123456789abcdef0123456789abcdef0123"), true); +}); + +test("looksLikeSecret does NOT flag normal Portuguese prose", () => { + assert.equal(looksLikeSecret("Bruno prefere reuniões curtas pela manhã"), false); +}); + +test("looksLikeSecret does NOT flag a short ordinary word", () => { + assert.equal(looksLikeSecret("decisão"), false); +}); + +test("stripSecrets masks the secret and preserves the rest", () => { + const out = stripSecrets("a chave é sk-ant-aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa, guarde"); + assert.ok(out.includes("[REDACTED]")); + assert.ok(!out.includes("sk-ant-aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa")); + assert.ok(out.startsWith("a chave é ")); + assert.ok(out.includes("guarde")); +}); + +test("stripSecrets leaves clean prose unchanged", () => { + const clean = "Bruno prefere reuniões curtas pela manhã"; + assert.equal(stripSecrets(clean), clean); +}); + +test("stripSecrets masks a JWT inside text", () => { + const out = stripSecrets(`token: ${JWT} fim`); + 
assert.ok(out.includes("[REDACTED]")); + assert.ok(!out.includes(JWT)); + assert.ok(out.includes("fim")); +}); diff --git a/src/rag/__tests__/profile-render.test.ts b/src/rag/__tests__/profile-render.test.ts new file mode 100644 index 0000000..2a51194 --- /dev/null +++ b/src/rag/__tests__/profile-render.test.ts @@ -0,0 +1,126 @@ +// src/rag/__tests__/profile-render.test.ts +import { test } from "node:test"; +import assert from "node:assert/strict"; +import { + renderProfile, + PROFILE_CHAR_BUDGET, + INJECT_MIN_CONFIDENCE, + type ProfileFact, +} from "../profile.ts"; + +// Build a fully-populated ProfileFact, overriding only the fields a case cares about. +function makeFact(overrides: Partial<ProfileFact>): ProfileFact { + return { + account_id: "bruno", + category: "preferencia", + content: "fato generico", + status: "confirmed", + applied_count: 0, + violated_count: 0, + confidence_value: 0.9, + confidence_band: "high", + pinned: false, + source: "manual", + content_hash: "hash", + last_evidence_at: null, + ...overrides, + }; +} + +test("constantes têm os valores travados do contrato", () => { + assert.equal(PROFILE_CHAR_BUDGET, 2800); + assert.equal(INJECT_MIN_CONFIDENCE, 0.75); +}); + +test("lista vazia retorna null", () => { + assert.equal(renderProfile([], PROFILE_CHAR_BUDGET), null); +}); + +test("um fato signal (não-elegível) sozinho retorna null", () => { + const signal = makeFact({ + content: "fato em fase de sinal", + status: "signal", + confidence_value: 0.99, + pinned: false, + }); + assert.equal(renderProfile([signal], PROFILE_CHAR_BUDGET), null); +}); + +test("elegibilidade e ordem: pinned antes de confirmed; confirmed de baixa confiança é ignorado", () => { + const pinnedA = makeFact({ + content: "PINNED_A_CONTENT", + category: "regra", + status: "evidence", // pinned vence o status + confidence_value: 0.1, // pinned ignora confiança + confidence_band: "low", + pinned: true, + }); + const confirmedHigh = makeFact({ + content: 
"CONFIRMED_HIGH_CONTENT", + category: 
"preferencia", + status: "confirmed", + confidence_value: 0.9, + confidence_band: "high", + pinned: false, + }); + const confirmedLow = makeFact({ + content: "CONFIRMED_LOW_CONTENT", + category: "preferencia", + status: "confirmed", + confidence_value: 0.5, // < INJECT_MIN_CONFIDENCE -> inelegível + confidence_band: "low", + pinned: false, + }); + + const out = renderProfile([confirmedLow, confirmedHigh, pinnedA], PROFILE_CHAR_BUDGET); + assert.ok(out, "deveria render um bloco não-nulo"); + const text = out as string; + + assert.ok(text.includes("PINNED_A_CONTENT"), "inclui o fato pinned"); + assert.ok(text.includes("CONFIRMED_HIGH_CONTENT"), "inclui o confirmed de alta confiança"); + assert.ok( + !text.includes("CONFIRMED_LOW_CONTENT"), + "ignora o confirmed de baixa confiança", + ); + + assert.ok( + text.indexOf("PINNED_A_CONTENT") < text.indexOf("CONFIRMED_HIGH_CONTENT"), + "pinned vem antes do confirmed", + ); +}); + +test("orçamento: nunca corta um fato pela metade e respeita o budget total", () => { + const budget = 200; + // Vários fatos elegíveis com content longo: nem todos cabem em 200 chars. + const facts: ProfileFact[] = Array.from({ length: 6 }, (_, i) => + makeFact({ + content: `FATO_${i}_${"x".repeat(40)}`, + category: "preferencia", + status: "confirmed", + confidence_value: 0.95 - i * 0.01, // ordem determinística por confiança desc + confidence_band: "high", + pinned: false, + }), + ); + + const out = renderProfile(facts, budget); + assert.ok(out, "deveria render um bloco com pelo menos um fato"); + const text = out as string; + + // Budget total respeitado. + assert.ok(text.length <= budget, `bloco (${text.length}) deve ser <= budget (${budget})`); + + // Nem todos os fatos cabem (senão o teste de orçamento não testaria nada). 
+ const included = facts.filter((f) => text.includes(f.content)); + const excluded = facts.filter((f) => !text.includes(f.content)); + assert.ok(included.length >= 1, "pelo menos um fato incluído"); + assert.ok(excluded.length >= 1, "pelo menos um fato excluído por falta de espaço"); + + // Todo fato presente aparece com seu content COMPLETO (nunca cortado pela metade). + for (const f of included) { + assert.ok( + text.includes(f.content), + `fato incluído deve aparecer com content completo: ${f.content}`, + ); + } +}); diff --git a/src/rag/__tests__/profile-storage.test.ts b/src/rag/__tests__/profile-storage.test.ts new file mode 100644 index 0000000..b05d30b --- /dev/null +++ b/src/rag/__tests__/profile-storage.test.ts @@ -0,0 +1,179 @@ +// src/rag/__tests__/profile-storage.test.ts +// Unit tests for profile-storage. NO live DB: a fake pool is injected via the +// __setPoolForTest seam (the same seam storage.ts/entity-storage.ts use) and it +// captures every {text, values} so we assert the SQL + params PURELY (CI has no +// Postgres). The load-bearing invariant: EVERY statement is account-scoped — +// this is the structural guard against the facts-storage tenant-leak bug. +import { test, afterEach } from "node:test"; +import assert from "node:assert/strict"; +import { __setPoolForTest } from "../storage.js"; +import { + loadProfileFacts, + upsertProfileFact, + insertMemoryAudit, +} from "../profile-storage.js"; + +// A fake pg-like pool that records every query as {text, values}. +function recordingPool(rows: unknown[] = []) { + const calls: { text: string; values: unknown[] }[] = []; + const pool = { + query: async (text: string, values: unknown[]) => { + calls.push({ text, values: values ?? 
[] }); + return { rows, rowCount: rows.length }; + }, + }; + return { pool, calls }; +} + +afterEach(() => { + __setPoolForTest(null); +}); + +// --- loadProfileFacts ------------------------------------------------------- + +test("loadProfileFacts is account-scoped (WHERE account_id = $1, value 'bruno')", async () => { + const { pool, calls } = recordingPool([]); + __setPoolForTest(pool as never); + + await loadProfileFacts("bruno", pool as never); + + assert.equal(calls.length, 1); + const { text, values } = calls[0]; + assert.match(text, /WHERE\s+account_id\s*=\s*\$1/i, "deve filtrar por account_id = $1"); + assert.equal(values[0], "bruno", "account_id deve ser o primeiro param"); +}); + +test("loadProfileFacts maps a row into a ProfileFact", async () => { + const row = { + id: 7, + account_id: "bruno", + category: "estilo", + content: "Prefere respostas concisas", + status: "confirmed", + applied_count: 3, + violated_count: 0, + confidence_value: 0.9, + confidence_band: "high", + pinned: true, + source: "inferred", + content_hash: "abc123", + last_evidence_at: null, + }; + const { pool } = recordingPool([row]); + __setPoolForTest(pool as never); + + const facts = await loadProfileFacts("bruno", pool as never); + assert.equal(facts.length, 1); + assert.equal(facts[0].id, 7); + assert.equal(facts[0].account_id, "bruno"); + assert.equal(facts[0].category, "estilo"); + assert.equal(facts[0].content, "Prefere respostas concisas"); + assert.equal(facts[0].status, "confirmed"); + assert.equal(facts[0].applied_count, 3); + assert.equal(facts[0].violated_count, 0); + assert.equal(facts[0].confidence_value, 0.9); + assert.equal(facts[0].confidence_band, "high"); + assert.equal(facts[0].pinned, true); + assert.equal(facts[0].content_hash, "abc123"); +}); + +// --- upsertProfileFact ------------------------------------------------------ + +test("upsertProfileFact uses ON CONFLICT (account_id, content_hash) and puts account_id in the columns + params", async () => { + const 
{ pool, calls } = recordingPool([{ id: 11 }]); + __setPoolForTest(pool as never); + + await upsertProfileFact( + { + account_id: "bruno", + category: "estilo", + content: "Prefere respostas concisas", + status: "signal", + applied_count: 1, + violated_count: 0, + confidence_value: 0.4, + confidence_band: "low", + pinned: false, + source: "inferred", + content_hash: "deadbeef", + }, + pool as never, + ); + + assert.equal(calls.length, 1); + const { text, values } = calls[0]; + assert.match(text, /INSERT\s+INTO\s+user_profile_facts/i); + // account_id must be in the INSERT column list. + assert.match(text, /\(\s*account_id\s*,/i, "account_id deve abrir a lista de colunas do INSERT"); + // conflict target must be the (account_id, content_hash) unique key. + assert.match( + text, + /ON\s+CONFLICT\s*\(\s*account_id\s*,\s*content_hash\s*\)/i, + "conflito deve ser em (account_id, content_hash)", + ); + assert.match(text, /DO\s+UPDATE/i); + assert.ok(values.includes("bruno"), "account_id deve ir nos params"); + assert.ok(values.includes("deadbeef"), "content_hash deve ir nos params"); +}); + +// --- insertMemoryAudit ------------------------------------------------------ + +test("insertMemoryAudit writes account_id in the columns and the params", async () => { + const { pool, calls } = recordingPool([]); + __setPoolForTest(pool as never); + + await insertMemoryAudit( + { + account_id: "bruno", + fact_id: 11, + from_state: "signal", + to_state: "evidence", + trigger: "applied", + evidence_ref: "conv:42", + }, + pool as never, + ); + + assert.equal(calls.length, 1); + const { text, values } = calls[0]; + assert.match(text, /INSERT\s+INTO\s+memory_audit/i); + assert.match(text, /account_id/i, "account_id deve estar nas colunas"); + assert.ok(values.includes("bruno"), "account_id deve ir nos params"); +}); + +// --- structural tenant guard ------------------------------------------------ +// This is the anti-facts-storage check: for EVERY statement any of these +// functions 
emits, account_id must appear in the SQL text. facts-storage.ts +// omits account_id entirely (the tenant-leak bug); profile-storage must not. + +test("INVARIANT: every emitted statement references account_id", async () => { + const { pool, calls } = recordingPool([{ id: 1 }]); + __setPoolForTest(pool as never); + + await loadProfileFacts("bruno", pool as never); + await upsertProfileFact( + { + account_id: "bruno", + category: "estilo", + content: "x", + status: "signal", + applied_count: 0, + violated_count: 0, + confidence_value: 0, + confidence_band: "low", + pinned: false, + source: "manual", + content_hash: "h", + }, + pool as never, + ); + await insertMemoryAudit( + { account_id: "bruno", trigger: "manual" }, + pool as never, + ); + + assert.ok(calls.length >= 3, "as três funções devem emitir ao menos um statement cada"); + for (const { text } of calls) { + assert.match(text, /account_id/i, `statement sem account_id (tenant-leak): ${text}`); + } +}); diff --git a/src/rag/__tests__/remember-dedup.test.ts b/src/rag/__tests__/remember-dedup.test.ts new file mode 100644 index 0000000..20b867e --- /dev/null +++ b/src/rag/__tests__/remember-dedup.test.ts @@ -0,0 +1,35 @@ +// src/rag/__tests__/remember-dedup.test.ts +// T9 Part A — deterministic dedup for `remember`. PURE: no DB, no Voyage. +// When the caller does NOT pass an explicit id (the production path in +// remember-tool.ts), the source_id must be derived deterministically from +// (account_id, content) so the same memory upserted twice replaces itself +// instead of accumulating duplicate conversation sources. 
+import { test } from "node:test"; +import assert from "node:assert/strict"; +import { buildConversationDocument } from "../remember-doc.js"; + +const baseSeam = { workspace: "personal" as const, now: new Date("2026-06-07T12:00:00.000Z") }; + +test("same (account, content) -> same source_id (no injected id)", () => { + const a = buildConversationDocument({ text: "lembrar do pricing" }, { accountId: "bruno", ...baseSeam }); + const b = buildConversationDocument({ text: "lembrar do pricing" }, { accountId: "bruno", ...baseSeam }); + assert.equal(a.source_id, b.source_id); + assert.match(a.source_id, /^conversation:[0-9a-f]{64}$/); +}); + +test("different content -> different source_id (same account)", () => { + const a = buildConversationDocument({ text: "lembrar do pricing" }, { accountId: "bruno", ...baseSeam }); + const b = buildConversationDocument({ text: "lembrar da reunião" }, { accountId: "bruno", ...baseSeam }); + assert.notEqual(a.source_id, b.source_id); +}); + +test("different account, same content -> different source_id (no cross-account collision)", () => { + const a = buildConversationDocument({ text: "mesma nota" }, { accountId: "bruno", ...baseSeam }); + const b = buildConversationDocument({ text: "mesma nota" }, { accountId: "alice", ...baseSeam }); + assert.notEqual(a.source_id, b.source_id); +}); + +test("an explicit seam id still overrides the content hash", () => { + const doc = buildConversationDocument({ text: "qualquer" }, { accountId: "bruno", id: "abc123", ...baseSeam }); + assert.equal(doc.source_id, "conversation:abc123"); +}); diff --git a/src/rag/__tests__/utility-confidence.test.ts b/src/rag/__tests__/utility-confidence.test.ts new file mode 100644 index 0000000..80e9c83 --- /dev/null +++ b/src/rag/__tests__/utility-confidence.test.ts @@ -0,0 +1,54 @@ +// src/rag/__tests__/utility-confidence.test.ts +// Pure unit tests for computeConfidence (no Postgres, no Voyage). +// Deterministic: fixed `now` Date. 
+ +import { test } from 'node:test'; +import assert from 'node:assert/strict'; +import { computeConfidence } from '../utility.ts'; + +const now = new Date('2026-06-18T00:00:00Z'); +// ~90 days before `now`. +const d90 = new Date(now.getTime() - 90 * 86_400_000); + +test('thin evidence does not become high', () => { + // ratio = 1/(1+0+K_SMOOTH) is far below the high band. + const r = computeConfidence(1, 0, now, now); + assert.notEqual(r.band, 'high'); +}); + +test('lots of fresh consistent evidence is high', () => { + const r = computeConfidence(50, 0, now, now); + assert.equal(r.band, 'high'); +}); + +test('moderate evidence lands in medium', () => { + // 15/(15+5+2) = 0.682 -> [0.40, 0.75) -> medium + const r = computeConfidence(15, 5, now, now); + assert.equal(r.band, 'medium'); +}); + +test('mostly-violated evidence lands in low', () => { + // 5/(5+15+2) = 0.227 -> < 0.40 -> low + const r = computeConfidence(5, 15, now, now); + assert.equal(r.band, 'low'); +}); + +test('value is clamped to [0,1]', () => { + const r = computeConfidence(50, 0, now, now); + assert.ok(r.value >= 0 && r.value <= 1, `value out of range: ${r.value}`); +}); + +test('staleness lowers value (observable freshness decay)', () => { + const fresh = computeConfidence(50, 0, now, now).value; + const stale = computeConfidence(50, 0, d90, now).value; + assert.ok(stale < fresh, `expected ${stale} < ${fresh}`); +}); + +test('violations lower value vs the inverse split', () => { + const mostlyViolated = computeConfidence(5, 15, now, now).value; + const mostlyApplied = computeConfidence(15, 5, now, now).value; + assert.ok( + mostlyViolated < mostlyApplied, + `expected ${mostlyViolated} < ${mostlyApplied}`, + ); +}); diff --git a/src/rag/memory-curation-tick.ts b/src/rag/memory-curation-tick.ts new file mode 100644 index 0000000..235eadf --- /dev/null +++ b/src/rag/memory-curation-tick.ts @@ -0,0 +1,73 @@ +// src/rag/memory-curation-tick.ts +// T8 — o tick que agenda o curador determinístico no 
brain-classifier. +// +// Vive num módulo próprio (e não inline no index-classifier.ts) para ser +// testável sem arrastar a cadeia de imports do entrypoint (Express, clients, +// validação de tokens Notion no load). Espelha tickEntities: +// - gate MEMORY_CURATION_ENABLED (=== 'true', default off); +// - lazy-import das deps REAIS (runMemoryCuration + helpers de profile-storage); +// - try/catch com log de 1 linha; +// - recordRun(worker='classifier', source='memory-curation', ...) no mesmo +// formato dos outros ticks. +// +// OWNER-ONLY: runMemoryCuration processa só DEFAULT_ACCOUNT_ID (default da dep +// accountId); este tick não passa accountId, então cai no default. +import { recordRun } from "./storage.js"; + +/** Stats do curador, reexportado pra tipar o seam de teste. */ +export interface MemoryCurationTickStats { + processed: number; + transitions: number; +} + +/** + * Roda o curador uma vez, atrás do gate. O 2º parâmetro `run` é um test seam: o + * teste de gating injeta um fake e verifica que ele só é chamado com a flag + * ligada (sem DB/Anthropic). Em produção `run` é omitido e as deps reais são + * lazy-importadas. Erros nunca propagam — são logados e registrados em recordRun. 
+ */ +export async function tickMemoryCuration( + label: string, + run?: () => Promise<MemoryCurationTickStats>, +): Promise<void> { + if (process.env.MEMORY_CURATION_ENABLED !== "true") return; + const start = Date.now(); + try { + let stats: MemoryCurationTickStats; + if (run) { + stats = await run(); + } else { + const { runMemoryCuration } = await import("./memory-curator.js"); + const { loadProfileFacts, upsertProfileFact, insertMemoryAudit } = await import( + "./profile-storage.js" + ); + stats = await runMemoryCuration({ + loadFacts: (accountId) => loadProfileFacts(accountId), + upsertFact: (fact) => upsertProfileFact(fact), + insertAudit: (row) => insertMemoryAudit(row), + }); + } + console.log( + `[${new Date().toISOString()}] [memory-curation:${label}] processed=${stats.processed} transitions=${stats.transitions} took=${Date.now() - start}ms`, + ); + await recordRun({ + worker: "classifier", + source: "memory-curation", + ok: true, + counts: stats, + startedAt: new Date(start), + endedAt: new Date(), + }); + } catch (err) { + const msg = `[${new Date().toISOString()}] [memory-curation:${label}] FAILED: ${err instanceof Error ? err.message : String(err)}`; + console.error(msg); + await recordRun({ + worker: "classifier", + source: "memory-curation", + ok: false, + error: err instanceof Error ? err.message : String(err), + startedAt: new Date(start), + endedAt: new Date(), + }); + } +} diff --git a/src/rag/memory-curator.ts b/src/rag/memory-curator.ts new file mode 100644 index 0000000..bf40e8e --- /dev/null +++ b/src/rag/memory-curator.ts @@ -0,0 +1,171 @@ +// src/rag/memory-curator.ts +// T7 — curador determinístico de memória (owner-only, v1). +// +// Roda no brain-classifier (tick noturno, atrás do gate MEMORY_CURATION_ENABLED). +// Reavalia a confiança de cada fato curado da CONTA OWNER e promove o status +// PARA FRENTE quando há evidência. Sem LLM, sem heurística: é uma máquina de +// estados determinística sobre user_profile_facts. 
+// +// ESCOPO (não confundir com runEntityExtraction, que varre TODAS as contas): +// o curador v1 é OWNER-ONLY. Processa SOMENTE DEFAULT_ACCOUNT_ID (ou o +// accountId injetado nos testes). NÃO existe caminho que liste/itere outras +// contas — não há dep de "listar contas". Isolamento por conta é estrutural. +// +// REGRAS: +// - reavalia {value, band} = computeConfidence(applied, violated, last_evidence_at, now). +// - transição de status (só PARA FRENTE; sem retire/demote no v1): +// 'signal' -> 'evidence' se applied_count >= 1 +// 'evidence' -> 'confirmed' se value >= CONFIRM_THRESHOLD +// Um fato pode subir 2 degraus no mesmo run; gera 1 audit por degrau +// (trigger 'cron-promote', com from_state/to_state). +// - pinned NÃO muda de status por decaimento; a confiança é recomputada só +// pra exibição/elegibilidade. +// - decaimento só baixa value/band (afeta elegibilidade de injeção, não o +// status). NUNCA deleta um fato — não há dep de delete. +// - se status mudou OU value/band mudaram, chama upsertFact com os novos +// valores. account_id SEMPRE presente em toda escrita (via os helpers de +// profile-storage, que carregam account_id na coluna e na chave). +import { computeConfidence } from "./utility.js"; +import { INJECT_MIN_CONFIDENCE, type ProfileFact } from "./profile.js"; +import { DEFAULT_ACCOUNT_ID } from "../context.js"; + +/** Um fato 'confirmed' precisa atingir esta confiança para ser elegível à + * injeção; é o mesmo limiar que promove 'evidence' -> 'confirmed'. Alinhado + * com INJECT_MIN_CONFIDENCE (0.75) em profile.ts. */ +export const CONFIRM_THRESHOLD = INJECT_MIN_CONFIDENCE; + +/** O que o curador grava num upsert. Espelha UpsertProfileFactInput de + * profile-storage.ts (account_id obrigatório, keyed por content_hash). 
*/ +export interface CuratedFactUpsert { + account_id: string; + category: string; + content: string; + status: ProfileFact["status"]; + applied_count: number; + violated_count: number; + confidence_value: number; + confidence_band: ProfileFact["confidence_band"]; + pinned: boolean; + source?: string; + content_hash: string; + last_evidence_at?: Date | null; +} + +/** Uma linha de audit de transição (espelha MemoryAuditInput). */ +export interface CuratedAuditRow { + account_id: string; + fact_id?: number | null; + from_state?: string | null; + to_state?: string | null; + trigger: string; + evidence_ref?: string | null; +} + +export interface MemoryCurationDeps { + /** Conta a processar. Default: DEFAULT_ACCOUNT_ID. NUNCA itera outras contas. */ + accountId?: string; + /** Lê TODOS os fatos curados de UMA conta (account-scoped). */ + loadFacts: (accountId: string) => Promise; + /** Insere/atualiza um fato (keyed por account_id+content_hash). */ + upsertFact: (fact: CuratedFactUpsert) => Promise; + /** Anexa uma linha à trilha de audit. */ + insertAudit: (row: CuratedAuditRow) => Promise; + /** Test seam: relógio. Default: new Date(). */ + now?: Date; +} + +export interface MemoryCurationResult { + /** Quantos fatos foram lidos/avaliados. */ + processed: number; + /** Quantas linhas de audit (degraus de promoção) foram geradas. */ + transitions: number; +} + +/** Próximo status PARA FRENTE dado o atual, applied_count e a confiança. Retorna + * null quando não há promoção. Determinístico, sem efeito colateral. */ +function nextStatus( + status: ProfileFact["status"], + appliedCount: number, + confidenceValue: number, +): ProfileFact["status"] | null { + if (status === "signal" && appliedCount >= 1) return "evidence"; + if (status === "evidence" && confidenceValue >= CONFIRM_THRESHOLD) return "confirmed"; + return null; +} + +/** + * Roda o curador determinístico para UMA conta (owner-only no v1). + * + * Para cada fato: + * 1. recomputa {value, band} de confiança. 
+ * 2. se NÃO é pinned, promove o status para frente até estabilizar (até 2 + * degraus), gerando 1 audit 'cron-promote' por degrau. + * 3. se o status mudou OU value/band mudaram, faz upsert com os novos valores. + * pinned nunca muda de status; só value/band são recomputados. + * + * NUNCA deleta. account_id da conta processada vai em toda escrita. + */ +export async function runMemoryCuration( + deps: MemoryCurationDeps, +): Promise<MemoryCurationResult> { + const accountId = deps.accountId ?? DEFAULT_ACCOUNT_ID; + const now = deps.now ?? new Date(); + + // Owner-only: SOMENTE esta conta. Sem listagem de contas, sem loop externo. + const facts = await deps.loadFacts(accountId); + + let transitions = 0; + + for (const fact of facts) { + const { value, band } = computeConfidence( + fact.applied_count, + fact.violated_count, + fact.last_evidence_at ?? null, + now, + ); + + let status = fact.status; + + if (!fact.pinned) { + // Sobe degrau a degrau (signal -> evidence -> confirmed). A confiança é a + // recomputada agora; um fato pode subir 2 degraus num run só. + for (;;) { + const promoted = nextStatus(status, fact.applied_count, value); + if (!promoted) break; + await deps.insertAudit({ + account_id: accountId, + fact_id: fact.id ?? null, + from_state: status, + to_state: promoted, + trigger: "cron-promote", + evidence_ref: null, + }); + transitions++; + status = promoted; + } + } + + const statusChanged = status !== fact.status; + const confidenceChanged = + value !== fact.confidence_value || band !== fact.confidence_band; + + if (statusChanged || confidenceChanged) { + await deps.upsertFact({ + account_id: accountId, + category: fact.category, + content: fact.content, + status, + applied_count: fact.applied_count, + violated_count: fact.violated_count, + confidence_value: value, + confidence_band: band, + pinned: fact.pinned, + source: fact.source, + content_hash: fact.content_hash ?? "", + last_evidence_at: fact.last_evidence_at ?? 
null, + }); + } + } + + return { processed: facts.length, transitions }; +} diff --git a/src/rag/profile-guard.ts b/src/rag/profile-guard.ts new file mode 100644 index 0000000..9d09bfe --- /dev/null +++ b/src/rag/profile-guard.ts @@ -0,0 +1,61 @@ +// src/rag/profile-guard.ts +// T9 Part B — PURE secret guard. No IO, no deps. Recognizes OBVIOUS secret +// shapes (API keys, AWS access key ids, JWTs, PEM private-key blocks, long +// random tokens) so callers can refuse or mask them before persisting text +// (e.g. a remembered note or a profile field) into the brain. +// +// Conservative by design: it must prefer a false-negative (let an odd-looking +// string through) over a false-positive that would clobber ordinary prose. The +// patterns therefore key off distinctive prefixes/structures, and the +// "long random token" heuristic only fires on a single >=32-char unbroken run +// with no whitespace (real prose breaks into words long before that). + +/** API-key shapes with a distinctive prefix, e.g. OpenAI/Anthropic `sk-...`, + * Anthropic admin `sk-ant-...`. Underscores/hyphens allowed inside the body. */ +const API_KEY = /\bsk-[A-Za-z0-9_-]{16,}\b/g; + +/** AWS access key id: literal AKIA/ASIA + 16 uppercase alphanumerics. */ +const AWS_KEY = /\b(?:AKIA|ASIA)[A-Z0-9]{16}\b/g; + +/** JWT: three base64url segments separated by dots; header begins `eyJ`. */ +const JWT = /\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\b/g; + +/** PEM private-key block (RSA/EC/OPENSSH/PKCS8…). Spans newlines. */ +const PEM = /-----BEGIN [A-Z0-9 ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z0-9 ]*PRIVATE KEY-----/g; + +/** Long unbroken hex/base64-ish token (>=32 chars, no separators/whitespace). + * Catches raw secrets that lack a telltale prefix. The leading boundary keeps + * it from matching inside a longer word; the trailing boundary requires the + * run to end. */ +const LONG_TOKEN = /(?=32-char unbroken token and no key prefix, + * so it returns false. 
+ */ +export function looksLikeSecret(text: string): boolean { + if (typeof text !== "string" || text.length === 0) return false; + for (const re of PATTERNS) { + re.lastIndex = 0; + if (re.test(text)) return true; + } + return false; +} + +/** + * Returns `text` with every matched secret span replaced by `[REDACTED]`, + * leaving the surrounding text intact. Non-secret input is returned unchanged. + */ +export function stripSecrets(text: string): string { + if (typeof text !== "string" || text.length === 0) return text; + let out = text; + for (const re of PATTERNS) { + out = out.replace(re, "[REDACTED]"); + } + return out; +} diff --git a/src/rag/profile-storage.ts b/src/rag/profile-storage.ts new file mode 100644 index 0000000..3d4febe --- /dev/null +++ b/src/rag/profile-storage.ts @@ -0,0 +1,176 @@ +// src/rag/profile-storage.ts +// Account-scoped DB layer for the curated user profile (user_profile_facts + +// memory_audit, migration 0019). Mirrors the patterns in storage.ts / +// entity-storage.ts: reuses getPool() (so __setPoolForTest works for tests too), +// keeps every value parameterized, and is intentionally small. +// +// HARD MULTI-TENANT RULE: account_id is ALWAYS an explicit parameter (never read +// from any untrusted input) and ALWAYS appears in the columns AND the WHERE / +// conflict target of every statement. This is the deliberate counterpoint to +// facts-storage.ts, whose statements omit account_id (a tenant-leak anti-pattern). +import pg from "pg"; +import { getPool } from "./storage.js"; +import type { ProfileFact } from "./profile.js"; + +// Minimal pg-like surface so callers/tests can inject a fake pool (same shape +// storage.ts exposes). When omitted we fall back to the shared getPool(). 
+type PoolLike = Pick<pg.Pool, "query">; + +interface ProfileFactRow { + id: number | string; + account_id: string; + category: string; + content: string; + status: string; + applied_count: number; + violated_count: number; + confidence_value: number; + confidence_band: string; + pinned: boolean; + source: string | null; + content_hash: string | null; + last_evidence_at: Date | null; +} + +/** Map a user_profile_facts row into the domain ProfileFact (from profile.ts). */ +function rowToProfileFact(r: ProfileFactRow): ProfileFact { + return { + id: Number(r.id), + account_id: r.account_id, + category: r.category, + content: r.content, + status: r.status as ProfileFact["status"], + applied_count: Number(r.applied_count), + violated_count: Number(r.violated_count), + confidence_value: Number(r.confidence_value), + confidence_band: r.confidence_band as ProfileFact["confidence_band"], + pinned: r.pinned, + source: r.source ?? undefined, + content_hash: r.content_hash ?? undefined, + last_evidence_at: r.last_evidence_at ?? null, + }; +} + +/** + * Load ALL curated profile facts for ONE account, ordered pinned-first then by + * confidence (highest first). account_id is the only WHERE clause and is always + * the bound $1, so one account can never read another's facts. + */ +export async function loadProfileFacts( + accountId: string, + pool?: PoolLike, +): Promise<ProfileFact[]> { + const p = pool ?? getPool(); + const { rows } = await p.query<ProfileFactRow>( + `SELECT id, account_id, category, content, status, + applied_count, violated_count, confidence_value, confidence_band, + pinned, source, content_hash, last_evidence_at + FROM user_profile_facts + WHERE account_id = $1 + ORDER BY pinned DESC, confidence_value DESC`, + [accountId], + ); + return rows.map(rowToProfileFact); +} + +/** Fields needed to upsert one curated profile fact. account_id + content_hash + * form the unique key (account-scoped dedup); both are mandatory.
*/ +export interface UpsertProfileFactInput { + account_id: string; + category: string; + content: string; + status: ProfileFact["status"]; + applied_count: number; + violated_count: number; + confidence_value: number; + confidence_band: ProfileFact["confidence_band"]; + pinned: boolean; + source?: string; + content_hash: string; + last_evidence_at?: Date | null; +} + +/** + * Insert or update one curated profile fact, keyed by (account_id, content_hash), + * the table's UNIQUE constraint (migration 0019). account_id is in the column + * list AND the conflict target, so a re-observation of the same content for one + * account updates that account's row and can never collide with another's. + * On conflict we refresh the mutable fields (category/content/status/counts/ + * confidence/pinned/source/last_evidence_at) and bump updated_at. Returns the row id. + */ +export async function upsertProfileFact( + fact: UpsertProfileFactInput, + pool?: PoolLike, +): Promise<number> { + const p = pool ?? getPool(); + const { rows } = await p.query<{ id: number | string }>( + `INSERT INTO user_profile_facts + (account_id, category, content, status, + applied_count, violated_count, confidence_value, confidence_band, + pinned, source, content_hash, last_evidence_at, updated_at) + VALUES + ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, now()) + ON CONFLICT (account_id, content_hash) DO UPDATE SET + category = EXCLUDED.category, + content = EXCLUDED.content, + status = EXCLUDED.status, + applied_count = EXCLUDED.applied_count, + violated_count = EXCLUDED.violated_count, + confidence_value = EXCLUDED.confidence_value, + confidence_band = EXCLUDED.confidence_band, + pinned = EXCLUDED.pinned, + source = EXCLUDED.source, + last_evidence_at = EXCLUDED.last_evidence_at, + updated_at = now() + RETURNING id`, + [ + fact.account_id, + fact.category, + fact.content, + fact.status, + fact.applied_count, + fact.violated_count, + fact.confidence_value, + fact.confidence_band, + fact.pinned, + fact.source ??
"manual", + fact.content_hash, + fact.last_evidence_at ?? null, + ], + ); + return Number(rows[0]?.id); +} + +/** One append-only row for the memory state-transition audit trail (0019). */ +export interface MemoryAuditInput { + account_id: string; + fact_id?: number | null; + from_state?: string | null; + to_state?: string | null; + trigger: string; + evidence_ref?: string | null; +} + +/** + * Append one row to the memory_audit trail. account_id leads the column list and + * the params, so the trail stays account-scoped (queryable per tenant). + */ +export async function insertMemoryAudit( + row: MemoryAuditInput, + pool?: PoolLike, +): Promise<void> { + const p = pool ?? getPool(); + await p.query( + `INSERT INTO memory_audit + (account_id, fact_id, from_state, to_state, trigger, evidence_ref) + VALUES ($1, $2, $3, $4, $5, $6)`, + [ + row.account_id, + row.fact_id ?? null, + row.from_state ?? null, + row.to_state ?? null, + row.trigger, + row.evidence_ref ?? null, + ], + ); +} diff --git a/src/rag/profile.ts b/src/rag/profile.ts new file mode 100644 index 0000000..35e27b4 --- /dev/null +++ b/src/rag/profile.ts @@ -0,0 +1,83 @@ +// src/rag/profile.ts +// +// Curated user-profile facts and a PURE renderer that turns the eligible facts +// into a single prose block fit for injection into a system prompt. No DB, no +// storage imports: just the type, the tuning constants, and renderProfile(). + +export interface ProfileFact { + id?: number; + account_id: string; + category: string; + content: string; + status: "signal" | "evidence" | "confirmed"; + applied_count: number; + violated_count: number; + confidence_value: number; // 0..1 + confidence_band: "low" | "medium" | "high"; + pinned: boolean; + source?: string; + content_hash?: string; + last_evidence_at?: Date | null; +} + +// Total character budget for the rendered profile block. +export const PROFILE_CHAR_BUDGET = 2800; + +// A confirmed fact must reach this confidence to be eligible for injection.
+export const INJECT_MIN_CONFIDENCE = 0.75; + +const HEADER = "PERFIL DO USUÁRIO (memória curada):"; + +const BAND_RANK: Record<ProfileFact["confidence_band"], number> = { + high: 0, + medium: 1, + low: 2, +}; + +// A fact is injectable if it's pinned, or confirmed with enough confidence. +function isEligible(fact: ProfileFact): boolean { + if (fact.pinned) return true; + return fact.status === "confirmed" && fact.confidence_value >= INJECT_MIN_CONFIDENCE; +} + +// Pinned first; then by confidence band (high > medium > low); then confidence desc. +function compareFacts(a: ProfileFact, b: ProfileFact): number { + if (a.pinned !== b.pinned) return a.pinned ? -1 : 1; + const band = BAND_RANK[a.confidence_band] - BAND_RANK[b.confidence_band]; + if (band !== 0) return band; + return b.confidence_value - a.confidence_value; +} + +function lineFor(fact: ProfileFact): string { + return `- [${fact.category}] ${fact.content}`; +} + +/** + * Render the eligible facts into a single prose block, respecting `budget` + * (max characters of the returned string). Facts are added whole, in priority + * order, while the running block stays within budget; the first fact that would + * overflow stops accumulation (never truncated mid-fact). Returns null when no + * eligible fact fits (including: no eligible facts, or the header alone already + * exceeds the budget). + */ +export function renderProfile(facts: ProfileFact[], budget: number): string | null { + const eligible = facts.filter(isEligible).sort(compareFacts); + if (eligible.length === 0) return null; + + let block = HEADER; + let included = 0; + + for (const fact of eligible) { + const candidate = `${block}\n${lineFor(fact)}`; + if (candidate.length > budget) { + // Adding this fact whole would overflow: stop, never truncate. + break; + } + block = candidate; + included += 1; + } + + // Header alone with no fitting fact is not a useful profile.
+ if (included === 0) return null; + return block; +} diff --git a/src/rag/remember-doc.ts b/src/rag/remember-doc.ts index 5defb96..f1f5d62 100644 --- a/src/rag/remember-doc.ts +++ b/src/rag/remember-doc.ts @@ -9,7 +9,7 @@ // explicit seam, so the result is deterministic and the test can prove that // account_id comes from the trusted context, never from tool input. -import { randomUUID } from "node:crypto"; +import { createHash } from "node:crypto"; import type { IndexableDocument, Workspace } from "./types.js"; /** What the assistant passes to `remember`. account_id is intentionally ABSENT -- @@ -26,7 +26,9 @@ export interface RememberInput { export interface RememberSeam { accountId: string; workspace: Workspace | null; - /** stable id for the conversation source_id; omitted -> randomUUID(). */ + /** stable id for the conversation source_id; omitted -> a deterministic + * content hash, sha256(accountId + '\0' + text), so re-remembering the same + * note replaces itself (replace-on-write dedup) instead of duplicating. */ id?: string; /** clock; omitted -> new Date(). */ now?: Date; @@ -37,7 +39,10 @@ export const DEFAULT_TITLE = "Nota de conversa"; /** * PURE: build the IndexableDocument for a conversation note. No IO, no Voyage. * - source_type = "conversation" - * - source_id = `conversation:<id>` (deterministic from the seam's id) + * - source_id = `conversation:<id>`; when the caller omits an explicit id the + * id defaults to sha256hex(accountId + '\0' + text), making the + * same memory dedup-stable across calls (NUL delimiter avoids + * account/content concatenation ambiguity). * - account_id = seam.accountId (NEVER from input) * - workspace = seam.workspace (the account's default workspace) * - parent_url = null (no per-note URL; brain-format cites by title instead) @@ -47,7 +52,11 @@ export function buildConversationDocument( input: RememberInput, seam: RememberSeam, ): IndexableDocument { - const id = seam.id ??
+ createHash("sha256") + .update(`${seam.accountId}\0${input.text}`) + .digest("hex"); const now = seam.now ?? new Date(); const date = now.toISOString().slice(0, 10); // YYYY-MM-DD const tags = Array.isArray(input.tags) ? input.tags : []; diff --git a/src/rag/utility.ts b/src/rag/utility.ts index dc25922..be000e2 100644 --- a/src/rag/utility.ts +++ b/src/rag/utility.ts @@ -108,3 +108,66 @@ export function applyFinalScore( if (alpha === 0) return rerankScore; return rerankScore * (1 + alpha * Math.tanh(effectiveUtility / 10)); } + +// --------------------------------------------------------------------------- +// Confidence (memory: smoothed apply/violate ratio × freshness decay) +// --------------------------------------------------------------------------- + +/** + * Laplace-style smoothing for the apply/violate ratio. A small K dampens + * confidence when total evidence is thin (e.g. 1/0 should not read as certain). + */ +export const K_SMOOTH = 2; + +/** value >= this -> band 'high'. */ +export const CONFIDENCE_BAND_HIGH = 0.75; + +/** value >= this (and < HIGH) -> band 'medium'; below -> 'low'. */ +export const CONFIDENCE_BAND_MEDIUM = 0.4; + +export type ConfidenceBand = 'low' | 'medium' | 'high'; + +/** + * Compute a [0,1] confidence for a memory from how often it was applied vs + * violated, attenuated by how fresh the supporting evidence is. + * + * Formula: value = clamp( (applied / (applied + violated + K_SMOOTH)) * freshness , 0, 1 ) + * + * Freshness reuses the SAME exponential-decay curve as computeEffectiveUtility: + * DECAY_PER_DAY ^ days_since_last_evidence (days clamped at >= 0). When + * lastEvidenceAt is null we apply a NEUTRAL freshness factor of 1 (no decay): + * absence of a timestamp means we have no age signal, and the smoothed ratio + * already captures evidence quality, so we don't double-penalize. This mirrors + * computeEffectiveUtility, which also returns the score unchanged when its + * timestamp is null. 
+ * + * Bands: value >= CONFIDENCE_BAND_HIGH -> 'high'; + * value >= CONFIDENCE_BAND_MEDIUM -> 'medium'; else 'low'. + * + * @param applied Count of times the memory was applied/confirmed. + * @param violated Count of times the memory was contradicted/violated. + * @param lastEvidenceAt Timestamp of the most recent supporting evidence (or null). + * @param now Injectable for tests; defaults to new Date(). + */ +export function computeConfidence( + applied: number, + violated: number, + lastEvidenceAt: Date | null, + now: Date = new Date(), +): { value: number; band: ConfidenceBand } { + const ratio = applied / (applied + violated + K_SMOOTH); + const freshness = lastEvidenceAt + ? Math.pow( + DECAY_PER_DAY, + Math.max(0, (now.getTime() - lastEvidenceAt.getTime()) / 86_400_000), + ) + : 1; + const value = Math.max(0, Math.min(1, ratio * freshness)); + const band: ConfidenceBand = + value >= CONFIDENCE_BAND_HIGH + ? 'high' + : value >= CONFIDENCE_BAND_MEDIUM + ? 'medium' + : 'low'; + return { value, band }; +}
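
Reviewer note: the confidence formula above is easier to check with concrete numbers. The block below is a minimal standalone sketch that re-declares the constants so it runs on its own; it is not the repository's code. `DECAY_PER_DAY` is defined elsewhere in `utility.ts` and its real value is not shown in this diff, so the `0.99` used here is a placeholder assumption, as are the sample dates and the `MS_PER_DAY` name.

```typescript
// Standalone sketch of the confidence formula, for illustration only.
// ASSUMPTION: DECAY_PER_DAY's real value lives elsewhere in utility.ts;
// 0.99 is a placeholder picked for this example.
const DECAY_PER_DAY = 0.99;
const K_SMOOTH = 2;
const CONFIDENCE_BAND_HIGH = 0.75;
const CONFIDENCE_BAND_MEDIUM = 0.4;
const MS_PER_DAY = 86_400_000; // hypothetical helper name

type ConfidenceBand = "low" | "medium" | "high";

function computeConfidence(
  applied: number,
  violated: number,
  lastEvidenceAt: Date | null,
  now: Date = new Date(),
): { value: number; band: ConfidenceBand } {
  // K_SMOOTH in the denominator keeps thin evidence from reading as certain.
  const ratio = applied / (applied + violated + K_SMOOTH);
  // A null timestamp carries no age signal, so the age is treated as 0 days
  // and the freshness factor is neutral (1).
  const ageDays = lastEvidenceAt
    ? Math.max(0, (now.getTime() - lastEvidenceAt.getTime()) / MS_PER_DAY)
    : 0;
  const freshness = Math.pow(DECAY_PER_DAY, ageDays);
  const value = Math.max(0, Math.min(1, ratio * freshness));
  const band: ConfidenceBand =
    value >= CONFIDENCE_BAND_HIGH
      ? "high"
      : value >= CONFIDENCE_BAND_MEDIUM
        ? "medium"
        : "low";
  return { value, band };
}

const now = new Date("2026-06-18T00:00:00Z");
const thin = computeConfidence(1, 0, null, now); // ratio 1/3
const mixed = computeConfidence(4, 1, null, now); // ratio 4/7
const strong = computeConfidence(6, 0, null, now); // ratio 6/8
// Same counts as `strong`, but the evidence is 30 days old, so decay applies.
const stale = computeConfidence(6, 0, new Date("2026-05-19T00:00:00Z"), now);
console.log({ thin, mixed, strong, stale });
```

With `K_SMOOTH = 2`, a fact with zero violations and fresh evidence needs at least six applications to reach 0.75, which is also the `INJECT_MIN_CONFIDENCE` threshold in `profile.ts`. Any decay at all pushes a six-application fact back below that line, which may be worth confirming as intended.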