Pixelle-Video: One Topic, a ComfyUI Assembly Line
TL;DR
- Feeds a topic or a fixed script through script planning, per-shot media, voice, optional BGM, then ffmpeg composition into `output/`, all from a three-column Streamlit UI (the README).
- Image and motion come from ComfyUI locally or from RunningHub JSON workflows; you swap graphs by dropping API-format files named `image_*`, `video_*`, or `tts_*` into `workflows/` (the FAQ).
- Best for makers who already speak ComfyUI and want templated vertical or landscape shorts without living inside a nonlinear editor (the README).
- Differentiator: atomic workflows you can remix (FLUX instead of the default image graph, ChatTTS instead of Edge) while the rest of the pipeline stays put (the README).
- Concrete path: `uv run streamlit run web/app.py`, open `http://localhost:8501`, save LLM + ComfyUI or RunningHub keys, click generate, and read the mp4 path the UI prints beside duration and shot count (the README, the FAQ).
The upload bar is not the enemy this time. The enemy is shot five of seven still waiting on a GPU queue while your caption writer pings you for a script tweak. I stared at a blank timeline and realized I was not stuck on creativity. I was stuck on handoffs.
Situation: shorts without a human conveyor belt
Short-form video is a kitchen during rush hour. Someone writes the ticket, someone fires the line, someone plates. When you are all three roles, the metaphor breaks and you only feel the heat. Pixelle-Video is the expediter: it keeps the ticket readable, routes each dish to the right station, and brings back a finished plate in output/ (the README). The tension that remains is whether your ComfyUI stations stay online.
Task: what it does in one sentence
This repo is Pixelle-Video, an Apache-2.0 Streamlit app that automates short video creation: it chains an optional OpenAI-compatible LLM, ComfyUI or RunningHub media workflows, TTS, optional background music, and HTML templates, then composes everything with ffmpeg, so a topic or script becomes a rendered file with no manual NLE steps (the README).
Real-world use cases
- Explainer shorts from a single prompt: AI Generated Content mode turns a topic like “why read daily” into narration plus visuals (the README).
- Narration-first with your own words: Fixed Script Content skips the LLM writer and still generates frames and voice (the README).
- Brand voice via reference audio: upload MP3, WAV, or FLAC for cloning workflows such as Index-TTS graphs you place under `tts_*` prefixes (the README).
- Template A/B for aspect ratio: pick `static_*`, `image_*`, or `video_*` HTML templates grouped by portrait, square, or landscape (the README).
- Custom Comfy graphs: bind `$prompt.text!` or `$prompt.value!` on the CLIP text node title, export API Format, name files correctly, reload the UI (the FAQ); see the sketch after this list.
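Before you reload the UI, a small script can confirm the binding took. This is a hypothetical helper, not repo code, and it assumes the usual ComfyUI API-format shape in which each node carries a `_meta.title` field:

```python
# Check an exported API-format graph for the $prompt binding the FAQ describes.
import json
import sys

def has_prompt_binding(path: str) -> bool:
    """True if any node title carries $prompt.text! or $prompt.value!."""
    with open(path, encoding="utf-8") as f:
        graph = json.load(f)
    for node in graph.values():
        title = node.get("_meta", {}).get("title", "") if isinstance(node, dict) else ""
        if "$prompt.text!" in title or "$prompt.value!" in title:
            return True
    return False

if __name__ == "__main__":
    path = sys.argv[1]  # e.g. workflows/image_flux.json (hypothetical filename)
    print(f"{path}: binding {'found' if has_prompt_binding(path) else 'MISSING'}")
```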
Walkthrough: before and after the button
```text
# Before (manual NLE mental load)
1. Draft script in a doc.
2. Generate stills or clips in separate tools.
3. Record VO or run TTS elsewhere.
4. Align layers, levels, and exports in an editor.
```

```bash
# After (Pixelle-Video happy path)
git clone https://github.com/AIDC-AI/Pixelle-Video.git
cd Pixelle-Video
uv run streamlit run web/app.py
# Browser opens http://localhost:8501
# System Configuration → LLM + ComfyUI http://127.0.0.1:8188 or RunningHub key → Save
# Generate Video → watch progress → open output/<generated>.mp4
```
Expected observation: the UI shows steps such as “Frame 3/5 - Generating Image” and ends with duration, size, and shot count next to the file (the README, the FAQ). But what happens when the free TTS lane stalls?
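Test Connection boils down to an HTTP round trip you can reproduce yourself. Here is a minimal pre-flight sketch (mine, not repo code) that pings ComfyUI's standard `/system_stats` endpoint at the default URL from `config.example.yaml`:

```python
# Confirm the ComfyUI backend answers before clicking generate (illustrative only).
import json
import urllib.request

COMFYUI_URL = "http://127.0.0.1:8188"  # default from config.example.yaml

try:
    with urllib.request.urlopen(f"{COMFYUI_URL}/system_stats", timeout=5) as resp:
        stats = json.load(resp)
    print("ComfyUI reachable:", stats.get("system", {}))
except OSError as exc:
    print(f"ComfyUI not reachable at {COMFYUI_URL}: {exc}")
```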
How to use it
Top commands and what you should see:
- Clone and enter the tree
  - Command: `git clone https://github.com/AIDC-AI/Pixelle-Video.git && cd Pixelle-Video`
  - Outcome: a repository with `web/app.py`, `workflows/`, `templates/`, and `bgm/` (the README).
- Launch the Streamlit shell (uv path)
  - Command: `uv run streamlit run web/app.py`
  - Outcome: a local server on port `8501` and a browser tab with three columns (the README).
- Windows all-in-one
  - Action: download the latest release bundle and run `start.bat`
  - Outcome: the same port without installing Python manually (the README).
- Configure media backends
  - Local: set the ComfyUI URL (default `http://127.0.0.1:8188`) and hit Test Connection.
  - Cloud: paste a RunningHub API key when you use the bundled `runninghub/*.json` workflows (the README, `config.example.yaml`).
- Verify artifacts
  - Command: `ls output/` after a successful run (or script it; see the sketch below)
  - Outcome: the composed mp4 files the UI referenced (the FAQ).
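If you would rather script that check, here is a hedged sketch (my addition, not repo code) using only the stdlib plus `ffprobe`, which the project's ffmpeg dependency implies is on PATH, to mirror the duration readout the UI prints:

```python
# Inspect the newest render in output/ (illustrative helper, not part of the repo).
import pathlib
import subprocess

renders = sorted(pathlib.Path("output").glob("*.mp4"), key=lambda p: p.stat().st_mtime)
if renders:
    latest = renders[-1]
    # ffprobe prints the container duration in seconds as a bare number.
    duration = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", str(latest)],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    print(f"{latest.name}: {float(duration):.1f}s, {latest.stat().st_size / 1e6:.1f} MB")
else:
    print("output/ is empty; run a generation first")
```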
Directory shape worth memorizing:
```text
Pixelle-Video/
  web/app.py   # Streamlit entry
  workflows/   # image_*.json, video_*.json, tts_*.json (API format)
  templates/   # static_*, image_*, video_* HTML
  bgm/         # drop custom MP3/WAV
  output/      # rendered videos land here
```
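To see which graphs that prefix convention will surface, a quick sketch (my addition, assuming the filename prefix is all the loader keys on) groups `workflows/` by prefix:

```python
# Group workflow files by the image_/video_/tts_ naming convention (illustrative).
from collections import defaultdict
from pathlib import Path

groups: dict[str, list[str]] = defaultdict(list)
for wf in sorted(Path("workflows").rglob("*.json")):
    groups[wf.name.split("_", 1)[0]].append(wf.name)

for prefix in ("image", "video", "tts"):
    print(f"{prefix}_*:", groups.get(prefix) or "none found")
```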
Configuration and customization
- LLM block in `config.example.yaml`: set `api_key`, `base_url`, and `model` for any OpenAI-compatible endpoint; change them when you outgrow the bundled presets (Qwen, GPT, DeepSeek, and Ollama URLs are commented inline) (`config.example.yaml`).
- ComfyUI URL vs. Docker: use `http://127.0.0.1:8188` on bare metal; switch to `host.docker.internal:8188` on Mac or Windows containers, or your host IP on Linux, so queues still reach the GPU box (`config.example.yaml`).
- `runninghub_concurrent_limit`: raise it only when your membership tier allows more parallel cloud jobs; the cap keeps bills predictable (`config.example.yaml`).
- `prompt_prefix`: an English style string prepended to every image or video prompt; tighten it when brand guidelines demand a single look (`config.example.yaml`).
- `template.default_template`: points at paths such as `1080x1920/image_default.html`; change it when your distribution channel demands a new aspect ratio (`config.example.yaml`).
💡 Tip: Copy `config.example.yaml` to `config.yaml` for static defaults, but remember the UI also writes runtime settings. Never commit secrets (`config.example.yaml`).
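A quick lint of those keys against your copied `config.yaml` could look like the sketch below; the key names come straight from this section, but the nesting is my assumption, so adjust the paths to match your file:

```python
# Sanity-check the config keys named above (hypothetical layout, not repo code).
import yaml  # PyYAML

with open("config.yaml", encoding="utf-8") as f:
    cfg = yaml.safe_load(f) or {}

llm = cfg.get("llm", {})  # assumed nesting for the LLM block
for key in ("api_key", "base_url", "model"):
    print(f"llm.{key}:", "set" if llm.get(key) else "MISSING")
print("prompt_prefix:", repr(cfg.get("prompt_prefix", "")))
print("default template:", cfg.get("template", {}).get("default_template", "<unset>"))
```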
Where it fits (and where it does not)
- Fits when ComfyUI or RunningHub is already part of your stack and you want a guided short-video wrapper with history, batch tasks, and motion-transfer add-ons noted in recent release bullets (the README).
- Does not fit when you need a fully managed editor with collaborative timelines and manual keyframes; this is automation-first.
- Works alongside Pixelle-MCP if you also want agents invoking ComfyUI directly (the README).
The rough edges
- Edge-TTS depends on Microsoft's free surface and may fail on noisy networks; the FAQ explicitly recommends ComfyUI `tts_*` workflows for stability (the FAQ).
- LLM failures usually trace to a bad `base_url`, exhausted balances, or typoed model names; the FAQ tells you to verify each (the FAQ).
- Features that rely on Chromium may throw "Could not find a Chrome executable"; installing Chrome resolves it (the FAQ).
- Generation latency grows with shot count, model speed, and WAN conditions; README sets expectations in minutes, not seconds (the README).
⚠️ Warning: Treat API keys like cash. Streamlit binds locally, yet any script with your RunningHub or LLM tokens can rack up charges if you expose the port.
How it compares: alternatives the repo names
The README credits Pixelle-MCP, MoneyPrinterTurbo, NarratoAI, MoneyPrinterPlus, and ComfyKit as inspiration (the README). Pixelle-Video’s bet is composable Comfy graphs plus a Streamlit control surface instead of a monolithic editor.
Final thoughts
I stopped measuring the night by how many keyframes I touched. The concrete result is a rendered mp4 in output/, a UI that narrates each shot, and a community that pushed the repo past twelve thousand stars while shipping motion transfer and digital-human pipelines (the README). The kitchen still gets hot, but the tickets finally print legibly.
AIDC-AI/Pixelle-Video · Apache-2.0 · 12,738 stars · docs
Hoang Yell
A software developer and technical storyteller. I spend my time exploring the most interesting open-source repositories on GitHub and presenting them as accessible stories for everyone.