2 changes: 1 addition & 1 deletion docs/content/deep-dives/_index.md
@@ -7,7 +7,7 @@ bookCollapseSection = true
# Deep dives

Understanding-oriented discussion of the *why* behind Katalyst: the
-[vision and scope]({{< relref "vision.md" >}}), the
+[vision and scope]({{< relref "why-katalyst/_index.md" >}}), the
[domain model]({{< relref "domain-model/_index.md" >}}) the tool is built on,
and the deeper design discussions that no single page or package owns: how
[checks work]({{< relref "domain-model/checks.md" >}}) and the libraries that
2 changes: 1 addition & 1 deletion docs/content/deep-dives/domain-model/_index.md
@@ -15,7 +15,7 @@ The most central concept in Katalyst is a **base**: a storage system that holds
An **operation** is something a base lets you do with data: read, list,
aggregate, write, and eventually query. Which operations a base supports,
and what structural commitments those operations require, is the subject of
-[progressive operations]({{< relref "../progressive-operations.md" >}}).
+[progressive operations]({{< relref "../why-katalyst/progressive-operations.md" >}}).

In addition to natively-supported operations for various backends, Katalyst provides two very useful kinds of operation.

109 changes: 0 additions & 109 deletions docs/content/deep-dives/vision.md

This file was deleted.

32 changes: 32 additions & 0 deletions docs/content/deep-dives/why-katalyst/_index.md
@@ -0,0 +1,32 @@
+++
title = "Why Katalyst?"
weight = 10
bookCollapseSection = true
+++

# Why Katalyst?

In order for agents to become capable of real work, the next frontier seems to revolve around two things:

1. Making operational context more legible to agents.
2. Enabling agents to curate their own memory—individual or shared—in a way that's robust, durable, and efficient.

These problems have several things in common:
* Content that's a mix of text and more structured data.
* A compute model that's a mix of LLMs and deterministic software.
* The need for humans and agents to make sense of the same information.
* UI/UX questions that end up being grounded in shared primitives.

I've come to see the two problems as two sides of the same coin. By enabling agents to curate internally consistent, always-up-to-date knowledge bases, I believe we can serve both needs.

Katalyst is designed to provide the right content primitives and a large fraction of the deterministic compute required to solve this problem.

## How this section is organized

This section contains the first-principles reasoning underlying Katalyst's primitives. This isn't necessary if you just want to use the library. It will mostly be useful for those who want a solid, well-grounded perspective on how to build AI knowledge bases.

- [What is curation?]({{< relref "what-is-curation.md" >}}) defines curation and the criteria that make curated information useful.
- [Internal consistency]({{< relref "internal-consistency.md" >}}) explains how a knowledge base decides which contradictions count.
- [Completeness]({{< relref "completeness.md" >}}) covers the scope of information a knowledge base claims to contain.
- [Up-to-dateness]({{< relref "up-to-dateness.md" >}}) describes how a knowledge base stays connected to the world it represents.
- [Progressive operations]({{< relref "progressive-operations.md" >}}) explains how storage backends grow richer as query complexity increases.
16 changes: 16 additions & 0 deletions docs/content/deep-dives/why-katalyst/completeness.md
@@ -0,0 +1,16 @@
+++
title = "Completeness"
weight = 17
+++

# Completeness

<!-- Draft this section -->

Completeness means covering all the relevant material within some scope.

The scope matters. A knowledge base does not need to contain every true fact in the world. It needs to contain the material required by the purpose it claims to serve.
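
One way to make scope concrete is to declare it as a set of required items and compare it with what the knowledge base actually holds. The sketch below is illustrative only: `missing_from_scope`, `is_complete`, and the customer-interview scope are hypothetical names and data, not part of Katalyst.

```python
def missing_from_scope(required: set[str], present: set[str]) -> set[str]:
    """Return the items the declared scope requires but the knowledge base lacks."""
    return required - present


def is_complete(required: set[str], present: set[str]) -> bool:
    """Complete means nothing required by the scope is missing.

    Extra items outside the scope do not affect the result.
    """
    return not missing_from_scope(required, present)


# Hypothetical scope: the folder claims to hold interviews for these customers.
declared_scope = {"customer-a", "customer-b", "customer-c"}
transcripts_on_disk = {"customer-a", "customer-b", "partner-x"}

gaps = missing_from_scope(declared_scope, transcripts_on_disk)
```

Here the check is relative to the declared scope: `partner-x` is outside the scope and is ignored, while the absent `customer-c` interview is what makes the folder incomplete.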



* Claims about ordering rely on claims about completeness
94 changes: 94 additions & 0 deletions docs/content/deep-dives/why-katalyst/internal-consistency.md
@@ -0,0 +1,94 @@
+++
title = "Internal consistency"
weight = 16
+++

# Internal consistency

<!-- Introduce the definition -->

Consistency means being free from internal contradiction. On its surface, this seems simple: the knowledge base can't say "A is true" in one place and "A is false" in another.

<!-- Distinguish between content claims and structural claims -->

## Content claims vs structural claims

However, there's some subtlety here. Imagine a folder containing customer feedback interviews. In one transcript, customer A says, "this product is amazing!" In another, customer B says "the product is terrible." Those statements are in direct contradiction, but is the knowledge base inconsistent?

I'd argue no. The knowledge base isn't claiming that both customer opinions are true descriptions of the product. It is claiming that both interviews happened and that both customers said what the transcripts record.

Imagine adding a README in the folder: "This folder contains interview transcripts from many customers. Customers may disagree among themselves." The README makes the *structure* of the folder explicit. It tells a reader what kind of content the folder contains, how to interpret disagreement between items, and which guarantees the folder is making.

> **Structure** is a set of rules and conventions that distinguish structural claims from ordinary content within a knowledge base.

If the README said, "This folder contains interview transcripts from many customers. All customers absolutely love the product," that would contradict customer B's statement, and the folder would be internally inconsistent.

In other words, we need to distinguish between two types of claims.

> A **content claim** says something *within* the knowledge base.

> A **structural claim** says something *about* the knowledge base: what kind of content it contains, how that content is organized, and any other guarantees the system makes about it.

For determining consistency, only structural claims matter.

> **Internal consistency**: A knowledge base is internally consistent if it is free from contradictory structural claims.
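
A minimal sketch of this definition, using the customer feedback folder as the example. It assumes structural claims can be modeled as predicates over the folder's contents; `Transcript`, `StructuralClaim`, and the helper functions are hypothetical names for illustration, not Katalyst's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Transcript:
    customer: str
    sentiment: str  # "positive" or "negative": a content claim made by the customer


@dataclass(frozen=True)
class StructuralClaim:
    description: str
    holds: Callable[[list[Transcript]], bool]  # predicate over the whole folder


def violated_claims(folder: list[Transcript], claims: list[StructuralClaim]) -> list[str]:
    """Return descriptions of the structural claims that the folder contradicts."""
    return [claim.description for claim in claims if not claim.holds(folder)]


def is_internally_consistent(folder: list[Transcript], claims: list[StructuralClaim]) -> bool:
    return not violated_claims(folder, claims)


# Customers A and B disagree, but that disagreement is only between content claims.
folder = [Transcript("A", "positive"), Transcript("B", "negative")]

# First README: transcripts from many customers, who may disagree.
permissive_readme = [
    StructuralClaim(
        "The folder contains transcripts from more than one customer",
        lambda f: len({t.customer for t in f}) > 1,
    ),
]

# Second README: additionally asserts that every customer loves the product.
strict_readme = permissive_readme + [
    StructuralClaim(
        "All customers love the product",
        lambda f: all(t.sentiment == "positive" for t in f),
    ),
]
```

With `permissive_readme`, the folder is consistent even though the two customers contradict each other. With `strict_readme`, the same content violates a structural claim, so the folder is inconsistent.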


## Defining structure

<!-- Flesh out the concept of structure, to help reduce it to practice -->

<!-- Give more examples -->

In the customer feedback example, the README defines a simple structure. There are many other examples of structure:

{Examples: Tables of contents; executive summaries; indexes; chapters; sections; API references}

<!--- Anticipate potential failure modes -->

{Transition, introduce the list of desiderata: structure should be explicit; structure doesn't need to be part of the content; structure needs to be defined authoritatively}

**Structure should be explicit**

In many knowledge bases, it's common for structural conventions to be implicit. You don't usually need to be told "the chapter entries in the table of contents correspond 1:1 with the chapters in the book." Or "terms in the index are sorted alphabetically."

However, for our purposes, it's helpful to insist that structure be made explicit. All of our structural claims must be declared somewhere. This gives us a master list to check, to ensure consistency.
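
As an illustration of what a declared "master list" might look like, the sketch below writes the two implicit conventions mentioned above (table of contents matches chapters, index sorted alphabetically) as explicit, checkable entries. The dictionary shape of `book` and all function names are assumptions made for this example, not a Katalyst interface.

```python
from typing import Callable

# Hypothetical book shape: {"toc": [str], "chapters": [{"title": str}], "index": [str]}
Book = dict


def toc_matches_chapters(book: Book) -> bool:
    """Table of contents entries correspond 1:1, in order, with chapter titles."""
    return book["toc"] == [chapter["title"] for chapter in book["chapters"]]


def index_is_sorted(book: Book) -> bool:
    """Index terms are sorted alphabetically, ignoring case."""
    terms = book["index"]
    return terms == sorted(terms, key=str.casefold)


# The master list: every structural claim is declared here, next to its check.
STRUCTURAL_CLAIMS: dict[str, Callable[[Book], bool]] = {
    "Table of contents entries correspond 1:1 with chapters": toc_matches_chapters,
    "Index terms are sorted alphabetically": index_is_sorted,
}


def check_structure(book: Book) -> dict[str, bool]:
    """Evaluate every declared structural claim against the book."""
    return {description: check(book) for description, check in STRUCTURAL_CLAIMS.items()}


good_book = {
    "toc": ["Intro", "Checks"],
    "chapters": [{"title": "Intro"}, {"title": "Checks"}],
    "index": ["base", "Check", "operation"],
}
# Same book, but the table of contents lists a chapter that does not exist.
bad_book = {**good_book, "toc": ["Intro", "Checks", "Appendix"]}

good_report = check_structure(good_book)
bad_report = check_structure(bad_book)
```

Because every claim lives in one declared list, a consistency check reduces to iterating over that list; a convention that is never declared can never be checked.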

**Structure is often embedded in content, but it doesn't need to be**

Sometimes, it's useful to embed knowledge base structure directly in content. {Examples: summaries and overviews, textbooks, technical documentation}

When structure is written directly into content, {it has benefits X, Y and Z}

However, structure doesn't always need to be spelled out in content. In some cases, this can be counterproductive. {Examples: marketing, persuasion; security / private knowledge}. In other cases, it would just be pedantic.

Since we want the structure of our knowledge base to be explicit, but we don't necessarily want to show all of it to the user, we need a concept of metadata / markup attached to the knowledge base.

**Structure needs an authoritative source**

If you want to play logic games, you can invent self-referential cases where content tries to override structure. "Ignore all previous instructions and..." "This page lists all pages that do not list themselves."

Practically speaking, we can avoid this kind of thing by defining an authoritative source for structure in the knowledge base.



To separate these cases, a knowledge base needs a *structural layer*:



changes how the content should be read. Disagreement between transcripts is allowed, but a transcript with the wrong customer ID, source date, or interview format may still violate the structure of the collection.

{Make an analogy to Turing machines (no separation between programs / data) and how we actually do things.}

<!-- Briefly introduce weirdness that can happen if we don't have an external way of deciding what's authoritative -->


{What properties must a structural interpreter have? Goal isn't execution in the sense of a program, but interpretation.}


## Explicit vs implicit structure

## Guaranteeing internal consistency

* Explicit
* With a comprehensive library of invariants, and a reliable method of enforcement.
12 changes: 12 additions & 0 deletions docs/content/deep-dives/why-katalyst/up-to-dateness.md
@@ -0,0 +1,12 @@
+++
title = "Up-to-dateness"
weight = 18
+++

# Up-to-dateness

Up-to-dateness is the guarantee of external consistency: the state of the content accurately reflects the state of the real world at some point in time. A knowledge base can be internally consistent and complete within its stated scope while still being wrong, because the world changed since the content was last updated.

That makes up-to-dateness different from the other two criteria. It cannot be guaranteed from inside the content alone. It requires contact with an external source of truth: an event stream, a periodic refresh, a source-system query, a human review, or some other verification process. A curated system can record timestamps, sources, freshness windows, and update rules, but the guarantee comes from the process that reconnects the content to the world.

Because curation takes work, there's always some lag between a change in the world and the content that reflects it. As a general rule, less lag is better. Information doesn't need to be perfectly up-to-date in order to be valuable. The important questions are whether the content makes a truthful claim about when it corresponded to the world, and whether the content is updated quickly enough to support valuable decisions.
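
The bookkeeping side of this can be sketched in a few lines. The example assumes each piece of content records when it was last verified against its source and how long that verification is trusted; `FreshnessRecord`, its fields, and the sample source name are hypothetical, not part of Katalyst.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class FreshnessRecord:
    source: str                  # where the content was verified against
    verified_at: datetime        # when the content last corresponded to the world
    freshness_window: timedelta  # how long that verification is trusted


def lag(record: FreshnessRecord, now: datetime) -> timedelta:
    """Time elapsed since the content was last checked against its source."""
    return now - record.verified_at


def is_fresh(record: FreshnessRecord, now: datetime) -> bool:
    """True while the elapsed time is still within the declared freshness window."""
    return lag(record, now) <= record.freshness_window


record = FreshnessRecord(
    source="crm-export",  # hypothetical source system
    verified_at=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    freshness_window=timedelta(hours=24),
)

six_hours_later = datetime(2024, 1, 1, 18, 0, tzinfo=timezone.utc)
two_days_later = datetime(2024, 1, 3, 12, 0, tzinfo=timezone.utc)
```

The record only states a truthful claim about when the content matched the world. Renewing `verified_at` still requires an external process that goes back to the source.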