postiz-agent

Cron-driven autonomous publisher for any local MP3-first pipeline. Consumes a directory of slug.mp3 + slug.json + slug-cover.png, renders per-platform slide videos with word-level karaoke captions, and publishes to X, TikTok, and Instagram through self-hosted Postiz, to YouTube through an external CLI, and to Spotify through an RSS feed.

Extracted from a daily audio publishing pipeline and generalized into a reusable MP3-first publishing layer: whisper-crash-hard-stop, 24h per-platform idempotency, retry with backoff, webhook alerts after exhaustion, caption moderation, and a JSONL decision log designed for an LLM to grep across runs.

Agents: read SKILL.md first. It explains when to use each command, what the flags mean, and what NOT to do.

Who this is for

Narrow on purpose:

You run a local MP3-first pipeline (a TTS loop, podcast staging folder, lesson generator, briefing generator, or anything that drops slug.mp3 + slug.json into a directory) and want a cron-driven autonomous publisher on top of it.
You want the publish layer self-hosted and open-source, not a SaaS dashboard. Postiz handles OAuth and scheduling; this agent handles the MP3 → video step and the orchestration.
You want an agent-readable decision log (data/decisions.jsonl) so an LLM can reason across runs without re-hitting platform APIs.

Not for:

General podcasters looking for "audio → viral clips." Headliner, Descript, Opus Clip, and Recast Studio already do that with more visual variety and no code. Use those unless you specifically need the above.
Anyone who doesn't have a local MP3 pipeline and doesn't want to stand one up. Without the upstream, there's nothing to publish.

The gap it fills

Postiz is a great social scheduler but it's content-agnostic — you hand it a finished post and it publishes. An MP3 pipeline produces audio files. No social platform accepts raw audio. This project is the layer that turns one into the other.

  local/output/                          Postiz (X/TikTok/IG adapters)
  ├─ slug.mp3                                     ▲
  ├─ slug.json  ─────── postiz-agent ─────────────┘
  └─ slug-cover.png        │
                           ├─ whisper word-level transcription
                           ├─ HyperFrames slide video per aspect ratio
                           ├─ external YouTube CLI (bring-your-own)
                           └─ RSS feed (for Spotify/Apple)

The input schema is small and source-agnostic: title, content, mood, optional beats, and meta. Legacy aliases such as titulo and contenido are still accepted for backward compatibility, but the internal contract is generic.
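As a rough illustration of that contract, the sketch below models slug.json in TypeScript and maps the legacy aliases onto the generic field names. Only the field names come from this README; the field types, the decision to treat mood as required, and the names ContentItem and normalizeContentItem are assumptions made for the example, not the project's actual reader.

```typescript
// Hypothetical shape of slug.json; only the field names come from the README.
interface ContentItem {
  title: string;
  content: string;
  mood: string;
  beats?: string[];               // assumed: optional list of beat markers
  meta?: Record<string, unknown>; // assumed: free-form metadata
}

// Accepts parsed JSON and maps the legacy aliases (titulo, contenido)
// onto the generic field names.
function normalizeContentItem(raw: Record<string, unknown>): ContentItem {
  const title = raw.title ?? raw.titulo;
  const content = raw.content ?? raw.contenido;
  if (typeof title !== "string" || typeof content !== "string") {
    throw new Error("slug.json needs title and content (or titulo/contenido)");
  }
  if (typeof raw.mood !== "string") {
    throw new Error("slug.json needs a mood");
  }
  const item: ContentItem = { title, content, mood: raw.mood };
  if (Array.isArray(raw.beats)) item.beats = raw.beats.map(String);
  if (typeof raw.meta === "object" && raw.meta !== null) {
    item.meta = raw.meta as Record<string, unknown>;
  }
  return item;
}
```

Normalizing once at the read boundary means the rest of a pipeline only ever sees the generic names.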

What gets published

Platform	Format	Render spec
X	Video	1:1 · 1080×1080 · up to 4h (Premium)
TikTok	Video	9:16 · 1080×1920 · up to 10min
Instagram Reels	Video	9:16 · 1080×1920 · up to 3min (multi-part for longer audio)
YouTube	Video	16:9 · 1920×1080 · no limit
Spotify / Apple / Amazon	Audio (RSS)	MP3 feed polled hourly

Each video is a slide-based composition: page-like pacing (15–25 words per slide), current-word highlight, narrator's voice as the audio track. The visual identity is driven by the content item's mood field. Today the project ships one template (fantasia); all other moods fall back to it with a warning recorded in the decision log.
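One way to picture the 15–25 words-per-slide pacing is a greedy pager over whisper's word timestamps. This is only a sketch under assumptions: the TimedWord and Page shapes, the paginate name, and the rule of breaking at sentence-ending punctuation once a page has at least 15 words are all hypothetical, and the real logic lives in hyperframes/templates/common.mjs.

```typescript
// Hypothetical word shape; whisper's real JSON has more fields.
interface TimedWord { text: string; start: number; end: number; }
interface Page { words: TimedWord[]; start: number; end: number; }

// Greedy pager: close a page at `max` words, or earlier at a sentence end
// once the page holds at least `min` words. The final page may be shorter.
function paginate(words: TimedWord[], min = 15, max = 25): Page[] {
  const pages: Page[] = [];
  let current: TimedWord[] = [];
  const flush = (): void => {
    if (current.length === 0) return;
    pages.push({
      words: current,
      start: current[0].start,
      end: current[current.length - 1].end,
    });
    current = [];
  };
  for (const word of words) {
    current.push(word);
    const endsSentence = /[.!?…]$/.test(word.text);
    if (current.length >= max || (current.length >= min && endsSentence)) {
      flush();
    }
  }
  flush();
  return pages;
}
```

Each page carries the start and end time of its words, which is enough to drive both the slide change and the current-word highlight.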

Instagram Reels cap at 3 minutes, so longer audio is automatically split into N ≤170s parts, each rendered with a PARTE i/N ribbon and scheduled 5 minutes apart so they land in order on your feed.
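The arithmetic behind the split can be sketched as follows. This version cuts the audio into equal-length windows, which is a simplification: the repository describes splitting on beat/word boundaries in src/platforms/instagram-split.ts. The names ReelPart and planReelParts are hypothetical, and omitting the ribbon for single-part audio is an assumption.

```typescript
interface ReelPart {
  index: number;         // 1-based part number
  total: number;
  startSec: number;
  endSec: number;
  ribbon: string | null; // null when the audio fits in a single part (assumed)
  publishAt: Date;
}

// Plans N equal windows of at most `maxPartSec` seconds and staggers
// their publish times by `gapMinutes` so they land in order.
function planReelParts(
  durationSec: number,
  firstPublishAt: Date,
  maxPartSec = 170,
  gapMinutes = 5,
): ReelPart[] {
  const total = Math.max(1, Math.ceil(durationSec / maxPartSec));
  const partSec = durationSec / total;
  const parts: ReelPart[] = [];
  for (let i = 0; i < total; i++) {
    parts.push({
      index: i + 1,
      total,
      startSec: i * partSec,
      endSec: i === total - 1 ? durationSec : (i + 1) * partSec,
      ribbon: total > 1 ? `PARTE ${i + 1}/${total}` : null,
      publishAt: new Date(firstPublishAt.getTime() + i * gapMinutes * 60000),
    });
  }
  return parts;
}
```

For a 400-second recording this yields three parts of roughly 133 seconds each, scheduled 0, 5, and 10 minutes after the first publish time.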

Example

A one-minute story becomes four ready-to-upload MP4s with a single command:

$ postiz-agent render --slug dragon-marcos --platforms x,tiktok,instagram,youtube

content: "El dragón curioso" (145 words, 1min)
transcribing audio...
  118 words → tmp/dragon-marcos/dragon-marcos.json
→ x          dragon-marcos-x.mp4          1080×1080 · 40s · 2.3MB
→ tiktok     dragon-marcos-tiktok.mp4     1080×1920 · 40s · 2.6MB
→ instagram  dragon-marcos-instagram.mp4  1080×1920 · 40s · 2.6MB
→ youtube    dragon-marcos-youtube.mp4    1920×1080 · 40s · 2.6MB

Swap render for publish and the same videos get uploaded through Postiz and YouTubeCLI.

Install

Prerequisites: Node 20+, ffmpeg, whisper (pip install openai-whisper), Docker (for the bundled self-hosted Postiz).

git clone https://github.com/DjinnFoundry/postiz-agent.git
cd postiz-agent
pnpm install

cp .env.example .env
# edit POSTIZ_API_URL, POSTIZ_API_KEY, CONTENT_OUTPUT_DIR
# (optional) set ALERT_WEBHOOK_URL to get a POST when a platform publish fails
# (optional) set YOUTUBECLI_PATH if you have a YouTube CLI to delegate to;
#            leave unset to skip YouTube

# Optional: install HyperFrames skills for Claude Code / Cursor
npx skills add heygen-com/hyperframes

Then deploy Postiz self-hosted and connect each platform's OAuth:

cd deploy
cp .env.example .env
# add X_API_KEY, X_API_SECRET, TIKTOK_CLIENT_KEY, INSTAGRAM_APP_ID, ...
docker compose up -d
# open http://localhost:5000 and connect X, TikTok, Instagram
# copy the Postiz public API key back to the project's root .env

Verify:

$ postiz-agent status
✓ ffmpeg installed
✓ ffprobe installed
✓ whisper installed
✓ npx installed
✓ Content output dir     /path/to/content/output
✓ Postiz API reachable   http://localhost:5000/public/v1
✓ POSTIZ_API_KEY set     present
✓ YouTubeCLI project path /path/to/youtubecli

Commands

Full reference is in the CLI itself (postiz-agent <cmd> --help). Short version:

Command	What it does
`dispatch`	Autonomously pick the next content item not yet published and run it. Cron-safe.
`status`	Env health check — run this first. Reports tooling + Postiz integration health per platform. `--strict` escalates warnings to exit 1.
`integrations`	List connected Postiz accounts
`render --slug <s> --platforms <list>`	Build MP4s, no upload
`publish --slug <s> --platforms <list>`	Render + upload to each platform
`rss --output <path>`	Rebuild the Spotify/Apple RSS feed
`decisions [--slug s] [--platform p]`	Query the JSONL publish history

Every command supports --help. dispatch, publish, render, status, integrations, and decisions support --json for agent-readable output.

Reliability features (enabled by default on `publish` / `dispatch`)

Retry with exponential backoff. Transient 5xx / network errors retry 3 times (~2s base, jittered). 4xx does not retry.
24h idempotency guard. Skips any (slug, platform) that already has a successful decision-log entry in the last 24 hours. Bypass with --force.
Webhook alerts. Set ALERT_WEBHOOK_URL. After retries exhaust, the agent fires a 5s-timeout POST with {slug, platform, error, attempts, timestamp}. Fire-and-forget.
Caption moderation. Whisper output passes through a Spanish blocklist (src/media/spanish-blocklist.json) to guard against embarrassing mis-transcriptions. Replacements are recorded as warnings. Disable for debugging with --no-moderation (not recommended).
Whisper failure = hard stop. If whisper crashes, publish aborts with exit 1 before any platform is touched. Override with --allow-no-captions when you want the video out regardless.
Instagram multi-part split. Audio longer than 3 min is auto-chunked for IG into N parts of at most 170s each, every part carrying a PARTE i/N ribbon, with parts scheduled 5 minutes apart.
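The retry policy in the first item can be approximated like this. It is a sketch, not the code in src/lib/retry.ts: the names isTransient, backoffMs, and withRetry, the ±25% jitter range, the doubling factor, and the convention that HTTP failures carry a numeric status property are all assumptions for illustration.

```typescript
// Assumed convention: HTTP failures carry a numeric `status`;
// an error without one is treated as a network-level failure.
function isTransient(err: unknown): boolean {
  const status = (err as { status?: unknown } | null)?.status;
  if (typeof status !== "number") return true;
  return status >= 500 && status <= 599; // 4xx is never retried
}

// Delay before retry number `retry` (1-based): base * 2^(retry - 1),
// scaled by a jitter factor in [0.75, 1.25).
function backoffMs(
  retry: number,
  baseMs = 2000,
  random: () => number = Math.random,
): number {
  const exponential = baseMs * 2 ** (retry - 1);
  const jitter = 0.75 + random() * 0.5;
  return Math.round(exponential * jitter);
}

// Runs `task`, retrying transient failures up to `retries` times.
async function withRetry<T>(
  task: () => Promise<T>,
  retries = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      if (attempt >= retries || !isTransient(err)) throw err;
      await sleep(backoffMs(attempt + 1));
    }
  }
}
```

Once withRetry rethrows, the caller is the natural place to fire the webhook alert, since that is the point at which retries are exhausted.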

Scheduling

Ready-to-edit configs ship in deploy/cron/:

crontab.example (Linux/macOS)
com.djinnfoundry.postiz-agent.plist (macOS launchd)
README.md (systemd timer for Linux servers)

The typical setup:

# Linux/macOS crontab, daily at 08:00
0 8 * * * cd /path/to/postiz-agent && pnpm dev dispatch --platforms x,tiktok,instagram,youtube --json >> data/cron.log 2>&1

dispatch exits 0 with {"dispatched": false, "reason": "nothing pending"} when there is nothing to publish, so you can run it more often than your content cadence without churn.

Architecture

src/
├── cli.ts                   # commander entrypoint; 7 subcommands
├── orchestrator.ts          # loop: story → transcript → publishers → decision log
├── config.ts · types.ts
│
├── content/reader.ts        # read MP3, JSON metadata, and cover art from content output
├── media/
│   ├── subtitles.ts         # whisper CLI → word-level JSON (disk-cached)
│   ├── whisper-json.ts      # parser
│   └── slide-video.ts       # stages assets, drives `npx hyperframes render`
│
├── dispatch.ts              # picks the next unpublished story for autonomous runs
├── idempotency.ts           # 24h duplicate-publish guard
├── platforms/
│   ├── base.ts              # PlatformPublisher + VideoPublisher strategy base
│   ├── postiz-video-publisher.ts # shared upload() for X/TikTok/IG
│   ├── instagram-split.ts   # splits audio into ≤170s windows on beat/word boundaries
│   ├── registry.ts          # platform → publisher
│   ├── postiz.ts            # Postiz public API client (upload + create post)
│   ├── youtube.ts           # shells out to YouTubeCLI
│   ├── {x,tiktok,instagram,youtube,spotify}-publisher.ts
│   └── spotify-rss.ts       # builds iTunes-format podcast feed
│
├── media/caption-moderation.ts  # Spanish blocklist filter for whisper output
├── decisions/log.ts         # JSONL append-only decision log
└── lib/{process,ffprobe,retry,alerts,slug}.ts

hyperframes/                 # HyperFrames project (HTML → MP4, HeyGen/Apache-2.0)
├── hyperframes.json · CLAUDE.md · AGENTS.md
└── templates/
    ├── common.mjs           # buildPages, palette, HTML base, part-ribbon helper
    └── fantasia.mjs         # the only mood template; all other moods fall back to this

deploy/
├── docker-compose.yml       # self-hosted Postiz + Postgres + Redis
├── cron/                    # crontab + launchd + systemd timer examples
└── README.md                # OAuth setup walkthrough

Why this shape

One publisher per file, uniform base class. Adding a platform is: new file + one method + one line in registry.ts. Keeps each platform's quirks isolated.
Transcription runs once. Whisper processes the MP3 and caches the JSON. Every video variant consumes the same timestamps.
YouTube is delegated to an external CLI. Set YOUTUBECLI_PATH to any tool that accepts run youtube_cli video upload --file ... --title ... --description ... --privacy ... [--tags ...] and prints videoId: <11 chars> on stdout (see src/platforms/youtube.ts for the exact contract). If you don't have one, leave the env unset — YouTube will simply not appear as a target.
Spotify is RSS, not API. There is no per-episode publishing endpoint for indie podcasters. We host a feed; the platforms poll it.
Video templates are plain HTML + GSAP, not React. This project originally used Remotion. We migrated to HyperFrames because an agent can write new mood templates in HTML+CSS+GSAP directly. The output is also 40% smaller at equivalent quality.
Decision log is JSONL, not SQL. Append-only, trivially greppable, human-readable. The agent's memory across runs.
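To show why an append-only JSONL log is enough to back the 24h idempotency guard, here is a minimal sketch that parses log text and checks for a recent success. The entry fields (slug, platform, status, timestamp), the "success" status value, and the function names are hypothetical; the real schema is whatever src/decisions/log.ts writes.

```typescript
// Hypothetical entry shape; the real decision-log schema may differ.
interface Decision {
  slug: string;
  platform: string;
  status: string;    // assumed values such as "success" or "failed"
  timestamp: string; // assumed ISO 8601
}

// One JSON object per line; blank lines are ignored.
function parseDecisions(jsonl: string): Decision[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as Decision);
}

// True when (slug, platform) has a successful entry inside the window.
function publishedRecently(
  decisions: Decision[],
  slug: string,
  platform: string,
  now: Date,
  windowHours = 24,
): boolean {
  const cutoff = now.getTime() - windowHours * 3600 * 1000;
  return decisions.some(
    (d) =>
      d.slug === slug &&
      d.platform === platform &&
      d.status === "success" &&
      Date.parse(d.timestamp) >= cutoff,
  );
}
```

Because the check is a pure function over parsed lines, the same data serves the guard, the decisions query command, and an LLM grepping the file directly.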

Costs

For 1 story/day across all 5 platforms:

Platform	Cost	Setup
X	~$2–$10/mo (pay-per-use, or $8/mo Premium for 4h video limit)	developer.x.com
TikTok	free	developers.tiktok.com (requires production review)
Instagram	free	Meta App + Business/Creator IG account
YouTube	free (quota)	Google Cloud project
Spotify + Apple + Amazon	free	RSS submission only

Total: about $10/month or less in API costs. Everything else is open source.

Status

Shipping:

End-to-end publish pipeline (status ok → render → upload → log), 93 unit + integration tests
dispatch subcommand for autonomous daily runs (cron/launchd/systemd examples in deploy/cron/)
Retry with backoff, 24h idempotency guard, webhook alerts, whisper-failure hard stop
Caption moderation against a Spanish blocklist
fantasia mood template (all other moods fall back to it with a warning)
Automatic multi-part splitting for IG Reels audio > 3 min
Whisper transcription with caching
Spotify RSS generator (per-episode image, 2-sentence teaser, SPOTIFY_RSS_EXCLUDE_SLUGS env)
Decision log with CLI query
Self-hosted Postiz docker-compose

Intentionally not shipping yet:

Additional mood templates — fantasia covers all moods via fallback. Authoring the other six is deferred by product decision.
Engagement ingestion from YouTubeCLI + Postiz back into the decision log (future feedback loop).
Automated clip selection ("post the best 30s of a 5min item"). Content runs at narration pace by default; clip selection is a separate product layer.
Automatic upload of the Spotify RSS feed + MP3s to R2. rss builds feed.xml; operator handles the upload.

How this compares

Need	Use this
I have a podcast RSS feed and want auto-posted clips on every episode with 10+ template options	Headliner
I want to turn long video into viral short-form	Opus Clip, Submagic
I want a polished text-edit-driven audiogram maker	Descript
I have a local MP3-first pipeline and want a cron-driven, self-hosted, multi-platform publisher with an LLM-legible decision log	this repo

Origin

Extracted from a production audio-publishing toolchain and open-sourced because the reliability scaffolding around multi-platform publishing (retry, idempotency, alerts, caption moderation, whisper-crash hard-stop, decision log) is generic enough to stand alone. The product boundary is the MP3-first package contract, not any one upstream generator.

License

MIT. Use freely; attribution appreciated.

For agents

See SKILL.md for workflow heuristics, platform quirks, and things to avoid. It's the contract this repo publishes for LLM agents to consume.
