OpenMontage: Your Coding Agent Becomes the Video Director

TL;DR

What it solves: One-shot text-to-video clips give you a slot machine. OpenMontage gives your coding agent a full production crew manual - research, script, assets, edit, compose - so the output can be a finished piece, not a single hallucinated clip.
Why it matters: Without structure, agents improvise video like interns on day one: pretty frames, no pacing, no budget, no second take.
Best for: Developers who already use Claude Code, Cursor, Copilot, or Windsurf daily and want to produce explainers, documentary montages, or Shorts-style content from the same window they write code.
Main differentiator: Agent-first architecture. There is no orchestrator app. You are the producer; the agent reads YAML pipeline manifests and Markdown director skills, then calls 52 Python tools.
Concrete path: git clone, make setup, paste one sentence into your agent. A 60-second explainer with Piper TTS, Pexels footage, and Remotion composition costs roughly zero API dollars if you stay on the free tier.

I asked Cursor for a 45-second explainer about black holes. It wrote a script, generated six FLUX images, picked a royalty-free piano track, and rendered an mp4. I never opened DaVinci. I also never clicked a Generate button anywhere - because there isn’t one.


The vending machine vs. the edit bay

Most AI video products work like vending machines. You put in a prompt. Something drops out. Sometimes it is good. Usually you cannot explain why.

OpenMontage works like handing your coding agent the keys to an edit bay that already has labeled drawers: 12 pipeline playbooks (pipeline_defs/), 52 callable Python tools (tools/), and 400+ Markdown skills that teach the agent how each stage should behave (skills/). The agent is the director. The repo is the crew.

That distinction matters because real video is not one model call. It is research, pacing decisions, asset sourcing, voice direction, subtitle timing, a pre-compose sanity check, and only then a render. OpenMontage encodes that sequence the way a production team would - except the instructions are files your agent can read.

What the repo actually is

OpenMontage is a production playbook for coding agents. A finished video comes out of research, scripting, asset generation, and render review, not out of one prompt and a prayer.

Physically, it is a Python project with a Node.js Remotion/HyperFrames sidecar, FFmpeg on the PATH, and a forest of YAML + Markdown contracts. You clone it, run make setup, open the folder in Cursor or Claude Code, and talk to your agent in plain language.

The agent does not freestyle. It picks a pipeline (Animated Explainer, Documentary Montage, Talking Head, Clip Factory, and eight others), reads the stage director skill for each step, calls tools from the registry, checkpoints state to JSON, and asks for your approval at creative forks - voice choice, music bed, render engine.

Every pipeline follows the same spine:

research → proposal → script → scene_plan → assets → edit → compose

Web research is a first-class stage. Before a word of narration is written, the agent is supposed to search YouTube, Reddit, Hacker News, and news sites so your explainer is grounded in current facts - not last year’s training cutoff.
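To make the spine and the JSON checkpointing concrete, here is a minimal sketch of a resumable stage runner. It is not OpenMontage's code: the function names, the checkpoint layout, and the handler signature are all my assumptions. Only the stage order comes from the spine above.

```python
import json
import tempfile
from pathlib import Path

# Stage order taken from the spine above.
STAGES = ["research", "proposal", "script", "scene_plan", "assets", "edit", "compose"]


def load_checkpoint(path: Path) -> dict:
    """Return saved state, or an empty state if no checkpoint exists yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"completed": [], "outputs": {}}


def run_pipeline(path: Path, handlers: dict) -> dict:
    """Run each stage in order, skipping stages already recorded as done."""
    state = load_checkpoint(path)
    for stage in STAGES:
        if stage in state["completed"]:
            continue  # resume: this stage finished in an earlier run
        # Hypothetical handler contract: receives all earlier outputs, returns its own.
        state["outputs"][stage] = handlers[stage](state["outputs"])
        state["completed"].append(stage)
        path.write_text(json.dumps(state, indent=2))  # checkpoint after every stage
    return state


if __name__ == "__main__":
    # Placeholder handlers that only report how many earlier outputs they received.
    handlers = {s: (lambda outputs, s=s: f"{s} saw {len(outputs)} prior outputs") for s in STAGES}
    with tempfile.TemporaryDirectory() as tmp:
        final = run_pipeline(Path(tmp) / "checkpoint.json", handlers)
        print(final["completed"])
```

Writing the checkpoint after every stage is the design choice that matters: a crash during compose should not cost you the research and script you already approved.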

The smallest working example

Clone, setup, prompt. That is the whole interface.

git clone https://github.com/calesthio/OpenMontage.git
cd OpenMontage
make setup

Open the project in your AI coding assistant and say:

Make a 60-second animated explainer about how neural networks learn.

The agent selects the Animated Explainer pipeline, estimates cost before spending, generates narration (Piper TTS works offline with zero keys), creates visuals, composes in Remotion or HyperFrames, runs a post-render self-review (black frames, audio levels, subtitle drift), and only then shows you projects/<name>/renders/final.mp4.

No API keys required for that path. Add FAL_KEY or OPENAI_API_KEY later if you want FLUX images or premium voices - the scored provider selector ranks alternatives across seven dimensions and logs why it picked the winner.
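A scored selector like that can be pictured as a weighted sum plus a logged reason. The sketch below is my own illustration, not the repo's implementation. The article only tells us there are seven dimensions, so the dimension names, the weights, and the Provider class are placeholders I made up.

```python
from dataclasses import dataclass

# Placeholder dimension names and weights: the source only says there are seven.
WEIGHTS = {
    "cost": 0.25,
    "quality": 0.25,
    "speed": 0.10,
    "reliability": 0.10,
    "availability": 0.10,
    "control": 0.10,
    "fit": 0.10,
}


@dataclass
class Provider:
    name: str
    scores: dict  # dimension -> score in [0, 1]


def total(provider: Provider) -> float:
    """Weighted sum over all dimensions; a missing dimension counts as 0."""
    return sum(w * provider.scores.get(dim, 0.0) for dim, w in WEIGHTS.items())


def pick(providers: list) -> tuple:
    """Return the top-ranked provider plus a one-line reason for the log."""
    ranked = sorted(providers, key=total, reverse=True)
    winner = ranked[0]
    # Report the dimension that contributed most to the winning score.
    top_dim = max(WEIGHTS, key=lambda d: WEIGHTS[d] * winner.scores.get(d, 0.0))
    reason = f"{winner.name} scored {total(winner):.2f}; largest contribution: {top_dim}"
    return winner, reason
```

Returning the reason alongside the winner is the point: when the agent picks a paid provider, you can read why instead of guessing.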

💡 Tip: Start with the zero-key path: make setup, then ask for a 60-second Animated Explainer. Piper TTS and Remotion need no API keys. Add FAL_KEY only after you have one acceptable render on the free stack.

Three paths, one playbook

Path	What you supply	What the agent builds	Typical cost
Zero-key explainer	A topic sentence	Piper narration + Remotion motion graphics	~$0
Reference remix	A YouTube Short URL you love	Keeps pacing/hook, swaps topic and visuals	~$0–$1
Real-footage documentary	“Use real footage only”	CLIP search across Archive.org, NASA, Pexels; music, no VO	~$0

That third path is the one that caught me. OpenMontage is not only “animate stills and call it video.” The Documentary Montage pipeline builds a searchable corpus from free archives, retrieves actual motion clips, cuts them to a beat, and renders a finished timeline - closer to a junior editor than to a slideshow generator.
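Cutting to a beat is easier to picture with numbers. This is a rough sketch under my own assumptions (a fixed tempo, one cut every four beats, clips reused in rotation), not how the Documentary Montage pipeline actually does it. Both function names are hypothetical.

```python
def cut_points(bpm: float, duration_s: float, beats_per_cut: int = 4) -> list:
    """Timestamps (seconds) where a new clip starts, aligned to the beat grid."""
    seconds_per_cut = 60.0 / bpm * beats_per_cut
    points = []
    index = 0
    while index * seconds_per_cut < duration_s:
        points.append(round(index * seconds_per_cut, 3))
        index += 1
    return points


def assign_clips(clips: list, points: list, duration_s: float) -> list:
    """Pair each cut window with a clip, cycling if there are fewer clips than windows."""
    ends = points[1:] + [duration_s]
    return [
        (clips[i % len(clips)], start, end)
        for i, (start, end) in enumerate(zip(points, ends))
    ]
```

At 120 BPM with a cut every four beats, a 10-second timeline gets a new clip every 2 seconds.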

Five prompts that map cleanly to shipped pipelines:

Product demo for a side project: Animated Explainer with Piper narration and Remotion motion graphics.
Weekly newsletter clip: Documentary Montage, archive footage only, no voiceover.
Remix a Short you admire: paste the YouTube URL, keep hook pacing, swap the topic.
Internal onboarding video: Talking Head pipeline with an approved voice and brand palette.
Batch Shorts from one research pass: Clip Factory reuses the same corpus across multiple renders.

When the agent is the product

OpenMontage ships platform files for Claude Code (CLAUDE.md), Cursor (.cursor/rules/), Copilot, Windsurf, and Codex. All of them point to AGENT_GUIDE.md - the contract that says: read the pipeline manifest first, never improvise the workflow, checkpoint before GPU-heavy renders, respect budget caps (default $10, pause above $0.50 per action).
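The budget rule is simple enough to sketch in a few lines. The two numbers come from the guide as quoted above. The Budget class, its method names, and the three status strings are my own invention for illustration.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    total_cap: float = 10.00       # default cap quoted in the article
    approval_above: float = 0.50   # per-action threshold quoted in the article
    spent: float = 0.0

    def check(self, estimate: float) -> str:
        """Classify a planned action before any money is spent."""
        if self.spent + estimate > self.total_cap:
            return "blocked"          # would exceed the overall cap
        if estimate > self.approval_above:
            return "needs_approval"   # pause and ask the human
        return "ok"

    def record(self, actual: float) -> None:
        """Add the real cost once the action has run."""
        self.spent += actual
```

The check runs on the estimate, before the call. That ordering is what "estimates cost before spending" buys you.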

Quality gates are production-grade, which sounds boring until you’ve burned an afternoon on a render where 80% of frames were static JPEGs. Pre-compose validation blocks broken plans. Post-render review extracts frames at four positions, checks audio for silence and clipping, and verifies the delivery promise (“motion-led” means motion-led). If the review fails, the agent does not hand you garbage and call it done.
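Here is roughly what a post-render check of that shape could look like. I am assuming the four frame positions are evenly spaced inside the clip and that audio arrives as floats between -1.0 and 1.0. The thresholds and both function names are mine, not the repo's.

```python
def sample_times(duration_s: float, count: int = 4) -> list:
    """Evenly spaced timestamps strictly inside the clip (assumed spacing)."""
    step = duration_s / (count + 1)
    return [round(step * i, 3) for i in range(1, count + 1)]


def audio_flags(samples: list, silence_peak: float = 0.01,
                clip_level: float = 0.999, clip_ratio: float = 0.001) -> dict:
    """Flag a track as silent or clipped. Samples are floats in [-1.0, 1.0]."""
    if not samples:
        return {"silent": True, "clipped": False}
    magnitudes = [abs(s) for s in samples]
    clipped_count = sum(1 for m in magnitudes if m >= clip_level)
    return {
        # Silent: the loudest sample never rises above the noise floor.
        "silent": max(magnitudes) < silence_peak,
        # Clipped: too many samples sit at or near full scale.
        "clipped": clipped_count / len(samples) > clip_ratio,
    }
```

For a 60-second render, sample_times(60) gives frames at 12, 24, 36, and 48 seconds, which skips the fade-in and fade-out where a black frame would be a false alarm.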

The README’s demo reel includes a 60-second Pixar-style banana short for $1.33, a product ad for $0.69 with one OpenAI key, and $0.15 anime pieces with FLUX + Remotion - useful anchors when you are deciding whether to add paid providers.

What is not great

Skip OpenMontage if you want a GUI and a single button. There is no Streamlit dashboard, no drag-and-drop timeline. You live in the agent chat.

Skip it if you do not already trust an AI coding assistant to run Python and read long instruction files. The repo assumes agentic operation; fighting that model means fighting the design.

⚠️ Warning: make setup pulls Python deps, Node for Remotion, and FFmpeg. Budget an hour on a fresh machine. A failed first render is usually a missing binary, not a bad prompt. The AGPL-3.0 license also means you cannot embed OpenMontage in a closed SaaS product without sharing your modifications.

The concrete requirements are worth knowing before you start: Python 3.11+, Node 18+, FFmpeg on the PATH, and an optional GPU only if you want local video generation instead of cloud calls.
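Since a missing binary is the usual first failure, a quick preflight script saves time. This is my own sketch, not something shipped in the repo. It uses only the standard library and checks the three requirements listed above. The function names and report keys are hypothetical.

```python
import re
import shutil
import subprocess
import sys


def parse_major(version_text: str):
    """Extract the leading major version from text like 'v18.19.0'."""
    match = re.search(r"(\d+)", version_text)
    return int(match.group(1)) if match else None


def preflight() -> dict:
    """Report which of the stated requirements this machine satisfies."""
    report = {
        "python>=3.11": sys.version_info >= (3, 11),
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "node>=18": False,
    }
    node = shutil.which("node")
    if node:
        # `node --version` prints a string such as v18.19.0
        result = subprocess.run([node, "--version"], capture_output=True, text=True)
        major = parse_major(result.stdout)
        report["node>=18"] = major is not None and major >= 18
    return report


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{'ok     ' if ok else 'MISSING'}  {name}")
```

Run it before make setup on a fresh machine. Anything marked MISSING is worth fixing before you blame the prompt.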

Heavy cinematic work still wants human eyes. The agent proposes; you approve voice, music, and style. That is a feature for quality, a cost for speed.

The director’s chair was empty

I used to think AI video meant typing into a website and hoping the clip looked intentional. OpenMontage convinced me the missing piece was never the model - it was the production grammar. Pipelines, checkpoints, provider scoring, archive search, render review.

The edit bay was always there. It just needed someone in the director’s chair who could read a playbook and call the tools in order.

My coding agent was already in the room. Now it knows what “final cut” means.

calesthio/OpenMontage · AGPL-3.0 · 5k