Pixelle-Video: One Topic, a ComfyUI Assembly Line
TL;DR
- Feeds a topic or a fixed script through script planning, per-shot media, voice, optional BGM, then ffmpeg composition into `output/`, all from a three-column Streamlit UI (the README).
- Image and motion come from ComfyUI locally or from RunningHub JSON workflows; you swap graphs by dropping API-format files named `image_*`, `video_*`, or `tts_*` into `workflows/` (the FAQ).
- Best for makers who already speak ComfyUI and want templated vertical or landscape shorts without living inside a nonlinear editor (the README).
- Differentiator: atomic workflows you can remix (FLUX instead of the default image graph, ChatTTS instead of Edge) while the rest of the pipeline stays put (the README).
- Concrete path: `uv run streamlit run web/app.py`, open `http://localhost:8501`, save LLM + ComfyUI or RunningHub keys, click generate, and read the mp4 path the UI prints beside duration and shot count (the README, the FAQ).
The upload bar is not the enemy this time. The enemy is shot five of seven still waiting on a GPU queue while your caption writer pings you for a script tweak. I stared at a blank timeline and realized I was not stuck on creativity. I was stuck on handoffs.
Situation: shorts without a human conveyor belt
Short-form video is a kitchen during rush hour. Someone writes the ticket, someone fires the line, someone plates. When you are all three roles, the metaphor breaks and you only feel the heat. Pixelle-Video is the expediter: it keeps the ticket readable, routes each dish to the right station, and brings back a finished plate in output/ (the README). The tension that remains is whether your ComfyUI stations stay online.
Task: what it does in one sentence
This repo is Pixelle-Video, an Apache-2.0 Streamlit app that automates short video creation: it chains an optional OpenAI-compatible LLM, ComfyUI or RunningHub media workflows, TTS, optional background music, and HTML templates, then composes everything with ffmpeg, so a topic or script becomes a rendered file with no manual NLE steps (the README).
Real-world use cases
- Explainer shorts from a single prompt: AI Generated Content mode turns a topic like “why read daily” into narration plus visuals (the README).
- Narration-first with your own words: Fixed Script Content skips the LLM writer and still generates frames and voice (the README).
- Brand voice via reference audio: upload MP3, WAV, or FLAC for cloning workflows such as Index-TTS graphs you place under `tts_*` prefixes (the README).
- Template A/B for aspect ratio: pick `static_*`, `image_*`, or `video_*` HTML templates grouped by portrait, square, or landscape (the README).
- Custom Comfy graphs: bind `$prompt.text!` or `$prompt.value!` on the CLIP text node title, export API Format, name files correctly, reload the UI (the FAQ); see the sketch after this list.
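Before you reload the UI, a small script can confirm the binding took. This is a hypothetical helper, not repo code, and it assumes the usual ComfyUI API-format shape in which each node carries a `_meta.title` field:

```python
# Check an exported API-format graph for the $prompt binding the FAQ describes.
import json
import sys

def has_prompt_binding(path: str) -> bool:
    """True if any node title carries $prompt.text! or $prompt.value!."""
    with open(path, encoding="utf-8") as f:
        graph = json.load(f)
    for node in graph.values():
        title = node.get("_meta", {}).get("title", "") if isinstance(node, dict) else ""
        if "$prompt.text!" in title or "$prompt.value!" in title:
            return True
    return False

if __name__ == "__main__":
    path = sys.argv[1]  # e.g. workflows/image_flux.json (hypothetical filename)
    print(f"{path}: binding {'found' if has_prompt_binding(path) else 'MISSING'}")
```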
Walkthrough: before and after the button
```text
# Before (manual NLE mental load)
1. Draft script in a doc.
2. Generate stills or clips in separate tools.
3. Record VO or run TTS elsewhere.
4. Align layers, levels, and exports in an editor.
```

```bash
# After (Pixelle-Video happy path)
git clone https://github.com/AIDC-AI/Pixelle-Video.git
cd Pixelle-Video
uv run streamlit run web/app.py
# Browser opens http://localhost:8501
# System Configuration → LLM + ComfyUI http://127.0.0.1:8188 or RunningHub key → Save
# Generate Video → watch progress → open output/<generated>.mp4
```
Expected observation: the UI shows steps such as “Frame 3/5 - Generating Image” and ends with duration, size, and shot count next to the file (the README, the FAQ). But what happens when the free TTS lane stalls?
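Test Connection boils down to an HTTP round trip you can reproduce yourself. Here is a minimal pre-flight sketch (mine, not repo code) that pings ComfyUI's standard `/system_stats` endpoint at the default URL from `config.example.yaml`:

```python
# Confirm the ComfyUI backend answers before clicking generate (illustrative only).
import json
import urllib.request

COMFYUI_URL = "http://127.0.0.1:8188"  # default from config.example.yaml

try:
    with urllib.request.urlopen(f"{COMFYUI_URL}/system_stats", timeout=5) as resp:
        stats = json.load(resp)
    print("ComfyUI reachable:", stats.get("system", {}))
except OSError as exc:
    print(f"ComfyUI not reachable at {COMFYUI_URL}: {exc}")
```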
How to use it
Top commands and what you should see:
- Clone and enter the tree
  - Command: `git clone https://github.com/AIDC-AI/Pixelle-Video.git && cd Pixelle-Video`
  - Outcome: a repository with `web/app.py`, `workflows/`, `templates/`, and `bgm/` (the README).
- Launch the Streamlit shell (uv path)
  - Command: `uv run streamlit run web/app.py`
  - Outcome: a local server on port `8501` and a browser tab with three columns (the README).
- Windows all-in-one
  - Action: download the latest release bundle and run `start.bat`
  - Outcome: the same port without installing Python manually (the README).
- Configure media backends
  - Local: set the ComfyUI URL (default `http://127.0.0.1:8188`) and hit Test Connection.
  - Cloud: paste a RunningHub API key when you use the bundled `runninghub/*.json` workflows (the README, `config.example.yaml`).
- Verify artifacts
  - Command: `ls output/` after a successful run (or script it; see the sketch below)
  - Outcome: the composed mp4 files the UI referenced (the FAQ).
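If you would rather script that check, here is a hedged sketch (my addition, not repo code) using only the stdlib plus `ffprobe`, which the project's ffmpeg dependency implies is on PATH, to mirror the duration readout the UI prints:

```python
# Inspect the newest render in output/ (illustrative helper, not part of the repo).
import pathlib
import subprocess

renders = sorted(pathlib.Path("output").glob("*.mp4"), key=lambda p: p.stat().st_mtime)
if renders:
    latest = renders[-1]
    # ffprobe prints the container duration in seconds as a bare number.
    duration = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", str(latest)],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    print(f"{latest.name}: {float(duration):.1f}s, {latest.stat().st_size / 1e6:.1f} MB")
else:
    print("output/ is empty; run a generation first")
```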
Directory shape worth memorizing:
```text
Pixelle-Video/
  web/app.py   # Streamlit entry
  workflows/   # image_*.json, video_*.json, tts_*.json (API format)
  templates/   # static_*, image_*, video_* HTML
  bgm/         # drop custom MP3/WAV
  output/      # rendered videos land here
```
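To see which graphs that prefix convention will surface, a quick sketch (my addition, assuming the filename prefix is all the loader keys on) groups `workflows/` by prefix:

```python
# Group workflow files by the image_/video_/tts_ naming convention (illustrative).
from collections import defaultdict
from pathlib import Path

groups: dict[str, list[str]] = defaultdict(list)
for wf in sorted(Path("workflows").rglob("*.json")):
    groups[wf.name.split("_", 1)[0]].append(wf.name)

for prefix in ("image", "video", "tts"):
    print(f"{prefix}_*:", groups.get(prefix) or "none found")
```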
Configuration and customization
- LLM block in `config.example.yaml`: set `api_key`, `base_url`, and `model` for any OpenAI-compatible endpoint; change them when you outgrow the bundled presets (Qwen, GPT, DeepSeek, and Ollama URLs are commented inline) (`config.example.yaml`).
- ComfyUI URL vs. Docker: use `http://127.0.0.1:8188` on bare metal; switch to `host.docker.internal:8188` on Mac or Windows containers, or your host IP on Linux, so queues still reach the GPU box (`config.example.yaml`).
- `runninghub_concurrent_limit`: raise it only when your membership tier allows more parallel cloud jobs; the cap keeps bills predictable (`config.example.yaml`).
- `prompt_prefix`: an English style string prepended to every image or video prompt; tighten it when brand guidelines demand a single look (`config.example.yaml`).
- `template.default_template`: points at paths such as `1080x1920/image_default.html`; change it when your distribution channel demands a new aspect ratio (`config.example.yaml`).
💡 Tip: Copy `config.example.yaml` to `config.yaml` for static defaults, but remember the UI also writes runtime settings. Never commit secrets (`config.example.yaml`).
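A quick lint of those keys against your copied `config.yaml` could look like the sketch below; the key names come straight from this section, but the nesting is my assumption, so adjust the paths to match your file:

```python
# Sanity-check the config keys named above (hypothetical layout, not repo code).
import yaml  # PyYAML

with open("config.yaml", encoding="utf-8") as f:
    cfg = yaml.safe_load(f) or {}

llm = cfg.get("llm", {})  # assumed nesting for the LLM block
for key in ("api_key", "base_url", "model"):
    print(f"llm.{key}:", "set" if llm.get(key) else "MISSING")
print("prompt_prefix:", repr(cfg.get("prompt_prefix", "")))
print("default template:", cfg.get("template", {}).get("default_template", "<unset>"))
```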
Where it fits (and where it does not)
- Fits when ComfyUI or RunningHub is already part of your stack and you want a guided short-video wrapper with history, batch tasks, and motion-transfer add-ons noted in recent release bullets (the README).
- Does not fit when you need a fully managed editor with collaborative timelines and manual keyframes; this is automation-first.
- Works alongside Pixelle-MCP if you also want agents invoking ComfyUI directly (the README).
The rough edges
- Edge-TTS depends on Microsoft's free surface and may fail on noisy networks; the FAQ explicitly recommends ComfyUI `tts_*` workflows for stability (the FAQ).
- LLM failures usually trace to a bad `base_url`, exhausted balances, or typoed model names; the FAQ tells you to verify each (the FAQ).
- Features that rely on Chromium may throw "Could not find a Chrome executable"; installing Chrome resolves it (the FAQ).
- Generation latency grows with shot count, model speed, and WAN conditions; README sets expectations in minutes, not seconds (the README).
⚠️ Warning: Treat API keys like cash. Streamlit binds locally, yet any script with your RunningHub or LLM tokens can rack up charges if you expose the port.
How it compares: alternatives the repo names
The README credits Pixelle-MCP, MoneyPrinterTurbo, NarratoAI, MoneyPrinterPlus, and ComfyKit as inspiration (the README). Pixelle-Video’s bet is composable Comfy graphs plus a Streamlit control surface instead of a monolithic editor.
Final thoughts
I stopped measuring the night by how many keyframes I touched. The concrete result is a rendered mp4 in output/, a UI that narrates each shot, and a community that pushed the repo past twelve thousand stars while shipping motion transfer and digital-human pipelines (the README). The kitchen still gets hot, but the tickets finally print legibly.
AIDC-AI/Pixelle-Video · Apache-2.0 · 12,738 stars · docs
Hoang Yell
A software developer and technical storyteller. I spend my time exploring the most interesting open-source repositories on GitHub and presenting them as accessible stories for everyone.