Accurate, perfectly-timed captions & subtitles for any video or song — Hinglish-first, speech and music. Words from speech recognition, timing from forced alignment on the waveform. Nothing uploaded.
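Recognition supplies the words and alignment supplies their start and end times. Turning that into a subtitle file is mostly bookkeeping. Below is a minimal sketch of that step, assuming word-level timings in seconds. It is not agent-caption's actual output code: the `Word` type, the `words_to_srt` helper, and the three-words-per-cue and 0.6 s gap limits are all hypothetical choices. It groups aligned words into short cues and writes them as SRT:

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word with aligned start/end times in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[Word], max_words: int = 3, max_gap: float = 0.6) -> str:
    """Group aligned words into short cues and render them as SRT.

    A new cue starts when the current cue already holds max_words words,
    or when the silence before the next word is longer than max_gap seconds.
    """
    cues: list[list[Word]] = []
    for word in words:
        if cues and len(cues[-1]) < max_words and word.start - cues[-1][-1].end <= max_gap:
            cues[-1].append(word)
        else:
            cues.append([word])

    blocks = []
    for index, cue in enumerate(cues, start=1):
        text = " ".join(w.text for w in cue)
        # Each cue spans from its first word's start to its last word's end.
        blocks.append(
            f"{index}\n{srt_timestamp(cue[0].start)} --> {srt_timestamp(cue[-1].end)}\n{text}\n"
        )
    return "\n".join(blocks)


words = [
    Word("captions", 0.20, 0.71),
    Word("that", 0.75, 0.90),
    Word("stay", 0.92, 1.30),
    Word("in", 2.10, 2.20),
    Word("sync", 2.22, 2.70),
]
print(words_to_srt(words))
```

Short cues of a few words suit punchy on-screen styles. Raising `max_words` gives conventional sentence-length subtitles.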
Paste this into Claude Code · Cursor · Codex · ChatGPT — or any AI agent. It clones, sets up, and captions your file. 100% on your machine.
# paste me into your AI agent ↓
You are a captioning agent for the open-source repo github.com/ahkamboh/agent-caption.
It adds accurate, perfectly-timed captions to any video or song, in any language, 100% on my machine, nothing uploaded.

1. clone + set up (one time):
   git clone https://github.com/ahkamboh/agent-caption
   cd agent-caption && python setup.py
2. read ./SKILL.md and follow it exactly.
3. ask me for the file path (and the language if it isn't English).
4. caption it:
   python caption.py "<my file>" --lang en
   (song? add --content music · Hinglish? add --hinglish · want a look? add --style hormozi|tiktok|beast|neon|gradient|clean)
5. show me the output path: <my file>.captioned.mp4

If anything needs my input (file path, language, style), ask me first.
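You can also drive the CLI from a script instead of through an agent. In that case, the step-4 flags can be assembled programmatically. This is a minimal sketch that uses only the flags named in the prompt (`--lang`, `--content music`, `--hinglish`, `--style`). The `build_caption_command` and `run_caption` helpers are hypothetical, and the real `caption.py` may accept more options, so treat `SKILL.md` as the authoritative reference.

```python
import subprocess
import sys
from typing import List, Optional

# Styles named in the agent prompt; caption.py may support others.
KNOWN_STYLES = {"hormozi", "tiktok", "beast", "neon", "gradient", "clean"}


def build_caption_command(
    media_path: str,
    lang: str = "en",
    music: bool = False,
    hinglish: bool = False,
    style: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for caption.py (hypothetical helper)."""
    cmd = [sys.executable, "caption.py", media_path, "--lang", lang]
    if music:
        cmd += ["--content", "music"]  # for songs, per the prompt
    if hinglish:
        cmd.append("--hinglish")
    if style is not None:
        if style not in KNOWN_STYLES:
            raise ValueError(f"unknown style {style!r}; expected one of {sorted(KNOWN_STYLES)}")
        cmd += ["--style", style]
    return cmd


def run_caption(cmd: List[str], repo_dir: str = "agent-caption") -> None:
    """Run the command inside the cloned repo; raises CalledProcessError on failure."""
    subprocess.run(cmd, cwd=repo_dir, check=True)


if __name__ == "__main__":
    print(" ".join(build_caption_command("clip.mp4", music=True, style="neon")))
```

Building an argument list rather than a shell string avoids quoting problems with file paths that contain spaces.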
One video, captioned in 11 languages — 1100+ supported, speech & songs, 100% on your device.
English by default, first-class Hinglish (Hindi-English code-switching), and 1100+ languages aligned via MMS, Meta's Massively Multilingual Speech models.
Podcasts, interviews, and talking-head videos, plus songs, where Demucs vocal isolation strips the backing music so the lyrics come through cleanly.
Forced alignment pins every recognized word to its start and end time in the waveform, so each caption appears as the word is spoken instead of drifting early or late.
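Forced alignment works because the transcript is already known. The aligner only has to decide which audio frames belong to which token. The sketch below is a generic CTC-style Viterbi alignment in pure Python, not agent-caption's code. Real aligners run the same search over an acoustic model's per-frame outputs. The toy log-probabilities, the blank index 0, and the 20 ms frame duration are illustrative assumptions.

```python
import math

NEG_INF = -math.inf


def ctc_forced_align(log_probs, tokens, blank=0):
    """Return one (start_frame, end_frame) span per token, end exclusive.

    log_probs: T x C nested lists of per-frame log-probabilities.
    tokens: the known transcript as class indices (no blanks).
    """
    num_frames = len(log_probs)
    # Interleave blanks: [blank, t1, blank, t2, ..., tN, blank]
    states = [blank]
    for tok in tokens:
        states += [tok, blank]
    num_states = len(states)

    score = [[NEG_INF] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]
    score[0][0] = log_probs[0][states[0]]
    if num_states > 1:
        score[0][1] = log_probs[0][states[1]]

    for t in range(1, num_frames):
        for s in range(num_states):
            # Allowed moves: stay, advance one state, or skip the blank between two different tokens.
            candidates = [s]
            if s >= 1:
                candidates.append(s - 1)
            if s >= 2 and states[s] != blank and states[s] != states[s - 2]:
                candidates.append(s - 2)
            best_prev = max(candidates, key=lambda p: score[t - 1][p])
            if score[t - 1][best_prev] == NEG_INF:
                continue  # state unreachable at this frame
            score[t][s] = score[t - 1][best_prev] + log_probs[t][states[s]]
            back[t][s] = best_prev

    # A valid path ends on the final blank or on the final token.
    last = num_states - 1
    if num_states > 1 and score[-1][last - 1] > score[-1][last]:
        last -= 1
    if score[-1][last] == NEG_INF:
        raise ValueError("transcript is too long for the number of frames")

    # Follow back-pointers to recover the state occupied at every frame.
    path = [0] * num_frames
    s = last
    for t in range(num_frames - 1, -1, -1):
        path[t] = s
        s = back[t][s]

    # Token i lives in state 2*i + 1; every valid path visits it at least once.
    spans = []
    for i in range(len(tokens)):
        frames = [t for t, st in enumerate(path) if st == 2 * i + 1]
        spans.append((frames[0], frames[-1] + 1))
    return spans


def spans_to_seconds(spans, frame_seconds=0.02):
    """Convert frame spans to (start, end) seconds, assuming a fixed frame stride."""
    return [(round(a * frame_seconds, 3), round(b * frame_seconds, 3)) for a, b in spans]


# Toy emissions over classes [blank, "a", "b"] for six frames.
probs = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
    [0.8, 0.1, 0.1],
    [0.1, 0.1, 0.8],
    [0.8, 0.1, 0.1],
]
log_probs = [[math.log(p) for p in frame] for frame in probs]
spans = ctc_forced_align(log_probs, tokens=[1, 2])
print(spans, spans_to_seconds(spans))
```

Word-level timings then follow from the first and last token of each word. This is what lets a caption switch on exactly when the word begins.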
Hormozi, TikTok, beast, neon, gradient… with bundled fonts. Or describe your own look.
Windows · macOS · Linux. The model downloads once, then it works fully offline.
Claude Code, Cursor, Codex, ChatGPT, Gemini, Grok — point it at the repo and say “caption this”.