free · open-source · MIT · 100% local

Add captions to any video.
Any language. Locally.

Accurate, perfectly timed captions & subtitles for any video or song. Hinglish-first, speech and music. Words come from speech recognition, timing from forced alignment on the waveform. Nothing is uploaded.

Paste this into Claude Code · Cursor · Codex · ChatGPT — or any AI agent. It clones, sets up, and captions your file. 100% on your machine.

setup prompt — paste & go
# paste me into your AI agent ↓
You are a captioning agent for the open-source repo github.com/ahkamboh/agent-caption.
It adds accurate, perfectly-timed captions to any video or song, in any language —
100% on my machine, nothing uploaded.

1. clone + set up (one time):
   git clone https://github.com/ahkamboh/agent-caption
   cd agent-caption && python setup.py
2. read ./SKILL.md and follow it exactly.
3. ask me for the file path (and the language if it isn't English).
4. caption it:
   python caption.py "<my file>" --lang en
   (song? add --content music · Hinglish? add --hinglish ·
    want a look? --style hormozi|tiktok|beast|neon|gradient|clean)
5. show me the output path:  <my file>.captioned.mp4

If anything needs my input (file path, language, style), ask me first.
1100+ languages · 100% local · speech + music · MIT licensed
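
If you would rather script the caption step than type it, the command from step 4 of the setup prompt can be assembled like this. This is a minimal sketch: the helper `build_caption_command` is hypothetical (it is not part of the repo), it only uses the flags named in the prompt, and it assumes those flags can be combined freely, which the prompt does not confirm.

```python
import subprocess


def build_caption_command(path, lang="en", music=False, hinglish=False, style=None):
    """Assemble the caption.py command line from the flags named in the setup prompt.

    Hypothetical helper for illustration; flag combinations are an assumption.
    """
    cmd = ["python", "caption.py", path, "--lang", lang]
    if music:
        # Songs: the prompt says to add --content music
        cmd += ["--content", "music"]
    if hinglish:
        # Hinglish / code-switched speech
        cmd.append("--hinglish")
    if style is not None:
        # The prompt names hormozi, tiktok, beast, neon, gradient, clean;
        # the value is passed through unchecked here.
        cmd += ["--style", style]
    return cmd


def run_caption(path, **options):
    """Run the command from inside the cloned agent-caption directory."""
    return subprocess.run(build_caption_command(path, **options), check=True)
```

For example, `build_caption_command("song.mp4", music=True, style="neon")` yields the argument list for a song captioned in the neon style. Passing a list to `subprocess.run` avoids shell quoting problems with file names that contain spaces.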

One video, captioned in 11 languages

1100+ supported, speech & songs, 100% on your device.

Any language + Hinglish

English by default, first-class Hinglish / code-switch, and 1100+ languages aligned via MMS.

Speech & music

Podcasts, interviews, talking-head — and songs, with Demucs vocal isolation for clean lyrics.

Never drifts

Forced alignment pins every word to the waveform, so captions are never early or late.
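
To make the idea concrete: once an aligner has given every word a start and end time, building captions is mostly bookkeeping. The sketch below groups aligned words into SubRip (SRT) cues. It is an illustration under stated assumptions, not the repo's code: the `Word` record, the `words_to_srt` helper, and the three-words-per-cue grouping are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One aligned word (hypothetical shape): text plus start/end in seconds."""
    text: str
    start: float
    end: float


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=3):
    """Group aligned words into cues; each cue spans its first word's start
    to its last word's end, so cue timing follows the word timing directly."""
    cues = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        start = srt_timestamp(group[0].start)
        end = srt_timestamp(group[-1].end)
        text = " ".join(w.text for w in group)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)
```

Because each cue boundary is taken from a word boundary, a cue cannot appear before its first word is spoken or linger after its last one, provided the word timings themselves are right.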

15 famous styles

Hormozi, TikTok, beast, neon, gradient… with bundled fonts. Or describe your own look.

Runs on your machine

Windows · macOS · Linux. The model downloads once, then it works fully offline.

For any AI agent

Claude Code, Cursor, Codex, ChatGPT, Gemini, Grok — point it at the repo and say “caption this”.