openclawunboxed/ingestion-engine

Ingestion Engine

Convert raw documents into structured, retrieval-optimized knowledge objects for AI systems.

This prompt transforms articles, documentation, transcripts, datasets, and other sources into clean JSON knowledge structures that preserve:

  • evidence and provenance
  • structured datasets
  • instructions and workflows
  • entities and definitions
  • retrieval-optimized chunks

Designed for use with:

  • OpenClaw
  • ChatGPT Projects
  • Claude Projects
  • Claude Code
  • RAG pipelines
  • AI knowledge bases

Why This Exists

Most people give documents to LLMs like this:

Here is a document. Summarize it.

This works for quick answers but introduces several problems:

  • important details get compressed or lost
  • datasets become flattened text
  • evidence disappears
  • claims and facts get mixed together
  • the same document must be reprocessed repeatedly

Large language models are optimized for conversation, not long-term knowledge storage.

The Universal Source Ingestion Engine converts raw sources into structured knowledge objects that can be reused across AI workflows.

Instead of repeatedly analyzing the same document, you ingest it once and reuse the structured output.


What the Engine Produces

The prompt converts a source into a structured JSON object containing sections such as:

  • source metadata
  • key points
  • evidence index
  • entities
  • definitions
  • instructions
  • workflows
  • datasets
  • QA pairs
  • retrieval chunks

These structures make it easier for AI systems to:

  • retrieve specific information
  • reason over structured knowledge
  • reference evidence
  • reuse knowledge across tasks
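
As a rough illustration of working with these sections, the sketch below checks which of them are present in an ingested object. The key spellings are assumptions derived from the section names listed above; only "key_points" and "entities" appear in this README's example output, so check the rest against prompt.md before relying on them. The helper name missing_sections is hypothetical.

```python
# Assumed top-level keys, guessed from the section names in this README.
# Only "key_points" and "entities" are confirmed by the example output.
EXPECTED_SECTIONS = [
    "source_metadata",
    "key_points",
    "evidence_index",
    "entities",
    "definitions",
    "instructions",
    "workflows",
    "datasets",
    "qa_pairs",
    "retrieval_chunks",
]


def missing_sections(knowledge_object: dict) -> list[str]:
    """Return the expected section names that are absent from the object."""
    return [name for name in EXPECTED_SECTIONS if name not in knowledge_object]


# A partial object that contains only two of the sections.
partial = {"key_points": [], "entities": []}
print(missing_sections(partial))
```

A check like this can flag incomplete outputs early, before the object is stored or indexed.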

Quick Start

  1. Open prompt.md
  2. Copy the entire prompt
  3. Paste it into your AI system
  4. Paste or attach a source document
  5. The model returns a structured JSON knowledge object

You can then store that JSON in:

  • a knowledge base
  • a dataset
  • an OpenClaw workspace
  • a RAG system
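
If you collect model replies in a script, storing the result might look like the minimal sketch below. It assumes the reply is either bare JSON or JSON wrapped in a single markdown code fence; the function name save_knowledge_object and the output path are hypothetical, not part of this repository.

```python
import json
from pathlib import Path

FENCE = "`" * 3  # markdown code fence marker


def save_knowledge_object(reply_text: str, path: str) -> dict:
    """Parse a model reply as JSON and write it to disk, pretty-printed."""
    text = reply_text.strip()
    # Assumption: a fenced reply has the fence on its first and last lines.
    if text.startswith(FENCE):
        lines = text.splitlines()
        text = "\n".join(lines[1:-1])
    obj = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    Path(path).write_text(
        json.dumps(obj, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return obj
```

Usage would be along the lines of save_knowledge_object(reply, "knowledge/my_source.json"), where the target directory already exists.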

Using This With OpenClaw

OpenClaw works especially well with structured knowledge sources.

Typical workflow:

  1. Copy the prompt from prompt.md
  2. Paste it into OpenClaw
  3. Provide a source document
  4. The model generates a structured JSON knowledge object
  5. Save the structured output inside your OpenClaw workspace

Benefits

  • faster retrieval
  • improved reasoning for agents
  • reusable knowledge objects
  • preserved datasets and evidence

Instead of repeatedly pasting the same document into a model, you ingest it once and reuse the result.


Saving the Engine in OpenClaw

OpenClaw rebuilds its system prompt on each run from workspace context files.

The easiest way to make the ingestion engine reusable is to store the ingestion instructions in your workspace.

Method 1 — Add Instructions to AGENTS.md (Recommended)

Create or edit a file called:

AGENTS.md

Add instructions like this:

When I paste or attach a source document and ask to ingest it,
use the Universal Source Ingestion Engine from prompt.md.

Return the final structured JSON knowledge object.

Because AGENTS.md is automatically injected into OpenClaw context, the ingestion workflow will always be available in your project.


Method 2 — Manual Trigger Phrase

You can define a consistent trigger phrase while working with OpenClaw.

Example trigger:

/ingest

Workflow example:

/ingest

[paste document here]

The model then applies the ingestion rules and returns the structured JSON output.


Method 3 — Package as an OpenClaw Skill (Advanced)

You can package the ingestion workflow as a reusable OpenClaw skill.

Example structure:

skills/
  ingestion/
    SKILL.md

The skill can reference the ingestion prompt and apply it whenever ingestion is requested.


Using This With Claude Projects

  1. Create a Claude Project
  2. Add the prompt to Project Instructions
  3. Paste or attach documents

Claude will follow the ingestion rules whenever a new source is provided.


Using This With ChatGPT Projects

  1. Create a project
  2. Add the prompt to Project Instructions
  3. Paste or upload documents

The model will convert those documents into structured knowledge objects.


Example

Example Source

AI agents are increasingly used to automate complex workflows across software development and operations.

Companies are building agent frameworks that orchestrate APIs and reasoning models to perform tasks autonomously.

Example Output Structure

{
  "key_points": [
    {
      "statement": "AI agents are increasingly used to automate complex workflows.",
      "verification_status": "likely"
    }
  ],
  "entities": [
    "AI agents",
    "agent frameworks",
    "workflow automation"
  ]
}

See the examples/ folder for a full example.
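
Once stored, the output can be queried like any other JSON. The sketch below loads the example structure shown above and pulls out key points by verification status. The helper name statements_with_status is hypothetical, and "likely" is the only status value this README confirms.

```python
import json

# The example output structure from this README, embedded as a string.
example_output = json.loads("""
{
  "key_points": [
    {
      "statement": "AI agents are increasingly used to automate complex workflows.",
      "verification_status": "likely"
    }
  ],
  "entities": ["AI agents", "agent frameworks", "workflow automation"]
}
""")


def statements_with_status(obj: dict, status: str) -> list[str]:
    """Return the statements of all key points with the given status."""
    return [
        point["statement"]
        for point in obj.get("key_points", [])
        if point.get("verification_status") == status
    ]


print(statements_with_status(example_output, "likely"))
# prints ['AI agents are increasingly used to automate complex workflows.']
```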


Safety Note

This prompt instructs the model to treat source documents strictly as data, not instructions.

The model is told to ignore any instructions contained within the source itself, which reduces (but does not eliminate) prompt injection risk.

However, prompt-based defenses are not perfect. Always review outputs before using them in automated systems.

Avoid executing commands, running code, or triggering actions based solely on model output.
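
One way to support that review in a pipeline is a small gate that treats the reply purely as data. The sketch below is an assumption about how such a check could look, not part of the prompt itself: it parses with json.loads only, requires a top-level object, and reports top-level keys outside an allowlist. The allowlist here contains just the two keys confirmed by this README's example; extend it to match your actual schema.

```python
import json

# Assumed allowlist; extend to match the sections your prompt version emits.
ALLOWED_TOP_LEVEL_KEYS = frozenset({"key_points", "entities"})


def review_problems(reply_text: str, allowed_keys=ALLOWED_TOP_LEVEL_KEYS) -> list[str]:
    """Return a list of problems found in a model reply; empty means none found."""
    try:
        # Parse only. Never pass model output to eval(), exec(), or a shell.
        obj = json.loads(reply_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(obj, dict):
        return ["top-level value is not a JSON object"]
    unexpected = sorted(set(obj) - set(allowed_keys))
    return [f"unexpected top-level key: {key}" for key in unexpected]
```

An empty result only means the structure looks as expected; it does not replace a human read of the content.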


Repository Structure

universal-source-ingestion-engine

README.md
prompt.md

examples/
  example_input.txt
  example_output.json

License

MIT License

The MIT License allows anyone to use, modify, and share the prompt, provided the copyright and license notice are retained in copies.
