TL;DR: LLMs are probabilistic, business logic is deterministic. This architecture fixes the mismatch by separating concerns into 3 layers: Directives (what to do), Orchestration (AI decision-making), and Execution (deterministic scripts).
When AI agents do everything themselves, errors compound. With 90% accuracy per step, you get only 59% success over 5 steps. The solution: push complexity into deterministic code, leaving AI to focus on intelligent routing and decision-making.
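The compounding-error arithmetic is just repeated multiplication of per-step success probabilities, as this small sketch shows:

```python
# Success probability compounds multiplicatively across sequential agent steps.
def chained_success(per_step: float, steps: int) -> float:
    """End-to-end success rate when every step must succeed independently."""
    return per_step ** steps

# 90% accuracy per step over 5 steps leaves roughly 59% end-to-end.
print(round(chained_success(0.9, 5), 2))  # 0.59
```

Deterministic execution scripts take their step accuracy close to 1.0, which is why pushing work into them keeps the product from collapsing.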
Layer 1: Directives
Location: `directives/` directory
Format: Markdown SOPs
Purpose: Define goals, inputs, tools, outputs, and edge cases
Think of these as instructions you'd give a mid-level employee. Natural language, clear, and comprehensive.
Example: `directives/scrape_website.md` defines:
- Goal: Extract data from websites
- Inputs: URL, selectors
- Tools: `execution/scrape_single_site.py`
- Outputs: JSON file in `.tmp/`
- Edge cases: Rate limiting, dynamic content, authentication
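A directive file following that shape might look like this sketch (section names are illustrative, not a required schema):

```markdown
# Scrape Website

## Goal
Extract structured data from a target website.

## Inputs
- URL
- CSS selectors

## Tools
- execution/scrape_single_site.py

## Outputs
- JSON file in .tmp/

## Edge cases
- Rate limiting, dynamic content, authentication
```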
Layer 2: Orchestration
Who: The AI agent (me)
Purpose: Intelligent routing between directives and execution
My job:
- Read directives to understand the task
- Gather required inputs
- Call execution scripts in the right order
- Handle errors and edge cases
- Ask for clarification when needed
- Update directives with learnings
I don't do the work myself — I coordinate it.
Layer 3: Execution
Location: `execution/` directory
Format: Python scripts
Purpose: Deterministic, reliable, testable operations
These scripts:
- Handle API calls
- Process data
- Interact with databases
- Perform file operations
- Use environment variables from `.env`
Key principle: Scripts should be idempotent and well-commented.
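An execution script following these principles might look like the sketch below. The file name, output path, and `API_KEY` variable are illustrative assumptions, not part of the actual repo:

```python
# execution/export_records.py -- illustrative sketch of an idempotent script.
import json
import os
from pathlib import Path

OUT = Path(".tmp/records.json")

def export_records(records: list[dict]) -> Path:
    """Write records to .tmp/; rerunning yields the same output (idempotent)."""
    OUT.parent.mkdir(exist_ok=True)  # safe even if .tmp/ already exists
    OUT.write_text(json.dumps(records, indent=2, sort_keys=True))
    return OUT

if __name__ == "__main__":
    api_key = os.environ.get("API_KEY")  # credentials come from .env, never hardcoded
    export_records([{"id": 1, "name": "example"}])
```

Sorting keys and creating directories with `exist_ok=True` are small habits that make reruns byte-for-byte repeatable.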
Before writing a new script, check the `execution/` directory for the scripts referenced by your directive. Only create a new script if none exists or the existing ones don't fit.
When errors occur:
- Read the error message and stack trace
- Fix the script and test again (unless it uses paid tokens/credits)
- Update the directive with what you learned
- System is now stronger
Example: Hit an API rate limit → research API → find batch endpoint → rewrite script → test → update directive with new approach.
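The rate-limit recovery described above can be sketched as a retry helper. The exception class and delay values are assumptions about the target API, not a real library's interface:

```python
import time

class RateLimitError(Exception):
    """Stand-in for whatever error the target API raises on HTTP 429."""

def call_with_backoff(fn, retries: int = 3, base_delay: float = 1.0):
    """Retry fn with exponential backoff when the API rate-limits us."""
    for attempt in range(retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == retries - 1:
                raise  # out of retries; surface the error to the orchestrator
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Once a pattern like this proves itself, fold it into the script and note the limit in the directive.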
Directives are living documents. When you discover:
- API constraints or limits
- Better approaches
- Common errors
- Timing expectations
→ Update the directive so future work benefits.
Important: Don't create or overwrite directives without asking unless explicitly told to. Directives are your instruction set and must be preserved.
Deliverables: Cloud-based outputs the user can access
- Google Sheets
- Google Slides
- Deployed web apps
- Other cloud services
Intermediates: Temporary files needed during processing
- Scraped data
- Dossiers
- Temp exports
- Processing artifacts
```
/Users/stephenbeale/Projects/Anubis/
├── directives/        # SOPs in Markdown
│   ├── README.md
│   └── example_web_scraping.md
├── execution/         # Python scripts
│   ├── README.md
│   └── scrape_single_site.py
├── .tmp/              # Intermediate files (gitignored)
├── .env               # Environment variables (gitignored)
├── credentials.json   # Google OAuth (gitignored)
├── token.json         # Google OAuth (gitignored)
└── [existing app files]
```
Key principle: Local files are only for processing. Deliverables live in cloud services where the user can access them. Everything in .tmp/ can be deleted and regenerated.
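Writing intermediates under `.tmp/` can be as simple as this sketch (the helper name is illustrative):

```python
import json
from pathlib import Path

TMP = Path(".tmp")  # gitignored; everything here is disposable and regenerable

def save_intermediate(name: str, data: dict) -> Path:
    """Persist a processing artifact to .tmp/ so it can be inspected or rerun."""
    TMP.mkdir(exist_ok=True)
    path = TMP / name
    path.write_text(json.dumps(data, indent=2))
    return path
```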
Errors are learning opportunities. When something breaks:
- Fix it: Debug and resolve the immediate issue
- Update the tool: Improve the script to prevent recurrence
- Test the tool: Verify it works correctly
- Update the directive: Document the new flow/learnings
- System is now stronger: Future work benefits from this knowledge
This creates a positive feedback loop where the system continuously improves.
User request: "Scrape data from example.com"
My workflow:
- Read `directives/scrape_website.md`
- Identify required inputs: URL
- Call `execution/scrape_single_site.py --url https://example.com`
- Script saves to `.tmp/scraped_data.json`
- Review output, report to user
If error occurs (e.g., rate limiting):
- Check directive for edge case handling
- Adjust script parameters or add delay
- Retry
- Update directive with new learnings
User request: "Create a directive for generating reports"
My workflow:
- Ask user for details: What kind of reports? What inputs? What outputs?
- Check if similar directives exist
- Draft new directive following the template in `directives/README.md`
- Create corresponding execution script if needed
- Test the workflow
- Request user review before finalizing
Create a directive when:
- Repeatable task with clear steps
- Task that might be done again in the future
- Complex workflow that needs documentation
Create an execution script when:
- Deterministic operation (API calls, data processing)
- Task that requires reliability and testing
- Operation that uses external services or credentials
Update a directive when:
- Discovered API constraints or better approaches
- Found common errors or edge cases
- Learned timing expectations or rate limits
Ask the user when:
- Unclear requirements or ambiguous instructions
- Need to create/overwrite important directives
- About to use paid API tokens/credits
- Manual testing or verification needed
You (the AI) sit between human intent (directives) and deterministic execution (Python scripts). Your role:
✅ Read instructions
✅ Make decisions
✅ Call tools
✅ Handle errors
✅ Continuously improve the system
❌ Don't do everything yourself
❌ Don't let errors compound
❌ Don't overwrite directives without asking
Be pragmatic. Be reliable. Self-anneal.