Convert raw documents into structured, retrieval-optimized knowledge objects for AI systems.
This prompt transforms articles, documentation, transcripts, datasets, and other sources into clean JSON knowledge structures that preserve:
- evidence and provenance
- structured datasets
- instructions and workflows
- entities and definitions
- retrieval-optimized chunks
Designed for use with:
- OpenClaw
- ChatGPT Projects
- Claude Projects
- Claude Code
- RAG pipelines
- AI knowledge bases
Most people give documents to LLMs like this:
Here is a document. Summarize it.
This works for quick answers but introduces several problems:
- important details get compressed or lost
- datasets become flattened text
- evidence disappears
- claims and facts get mixed together
- the same document must be reprocessed repeatedly
Large language models are optimized for conversation, not long-term knowledge storage.
The Universal Source Ingestion Engine converts raw sources into structured knowledge objects that can be reused across AI workflows.
Instead of repeatedly analyzing the same document, you ingest it once and reuse the structured output.
The prompt converts a source into a structured JSON object containing sections such as:
- source metadata
- key points
- evidence index
- entities
- definitions
- instructions
- workflows
- datasets
- QA pairs
- retrieval chunks
These structures make it easier for AI systems to:
- retrieve specific information
- reason over structured knowledge
- reference evidence
- reuse knowledge across tasks
- Open
prompt.md - Copy the entire prompt
- Paste it into your AI system
- Paste or attach a source document
- The model returns a structured JSON knowledge object
You can then store that JSON in:
- a knowledge base
- a dataset
- an OpenClaw workspace
- a RAG system
OpenClaw works especially well with structured knowledge sources.
Typical workflow:
- Copy the prompt from
prompt.md - Paste it into OpenClaw
- Provide a source document
- The model generates a structured JSON knowledge object
- Save the structured output inside your OpenClaw workspace
- faster retrieval
- improved reasoning for agents
- reusable knowledge objects
- preserved datasets and evidence
Instead of repeatedly pasting the same document into a model, you ingest it once and reuse the result.
OpenClaw rebuilds its system prompt each run using workspace context files.
The easiest way to make the ingestion engine reusable is to store instructions in your workspace.
Create or edit a file called:
AGENTS.md
Add instructions like this:
When I paste or attach a source document and ask to ingest it,
use the Universal Source Ingestion Engine from prompt.md.
Return the final structured JSON knowledge object.
Because AGENTS.md is automatically injected into OpenClaw context, the ingestion workflow will always be available in your project.
You can define a consistent trigger phrase while working with OpenClaw.
Example trigger:
/ingest
Workflow example:
/ingest
[paste document here]
The model then applies the ingestion rules and returns the structured JSON output.
You can package the ingestion workflow as a reusable OpenClaw skill.
Example structure:
skills/
ingestion/
SKILL.md
The skill can reference the ingestion prompt and apply it whenever ingestion is requested.
- Create a Claude Project
- Add the prompt to Project Instructions
- Paste or attach documents
Claude will follow the ingestion rules whenever a new source is provided.
- Create a project
- Add the prompt to Project Instructions
- Paste or upload documents
The model will convert those documents into structured knowledge objects.
AI agents are increasingly used to automate complex workflows across software development and operations.
Companies are building agent frameworks that orchestrate APIs and reasoning models to perform tasks autonomously.
{
"key_points": [
{
"statement": "AI agents are increasingly used to automate complex workflows.",
"verification_status": "likely"
}
],
"entities": [
"AI agents",
"agent frameworks",
"workflow automation"
]
}
See the examples/ folder for a full example.
This prompt treats source documents strictly as data, not instructions.
It attempts to ignore instructions contained within the source itself to reduce prompt injection risks.
However, prompt-based defenses are not perfect. Always review outputs before using them in automated systems.
Avoid executing commands, running code, or triggering actions based solely on model output.
universal-source-ingestion-engine
README.md
prompt.md
examples/
example_input.txt
example_output.json
MIT License
This allows anyone to use, modify, and share the prompt while preserving attribution.