Helios: 14B Video Model That Runs in Real Time
Helios is a 14B-parameter open-weight video generation model that hits 19.5 FPS on a single H100 GPU, generates minute-long videos, and supports text, image, and video inputs — all under Apache 2.0.
Seedance 2.0 generates 15-second clips. Kling 3.0 tops out at similar lengths. Both require cloud inference and closed APIs. Helios, an open-weight 14B model from PKU Yuan Group, generates minute-long videos at 19.5 FPS on a single H100 — and you can download the weights today.
That speed claim deserves emphasis. Most video generation models at this scale take minutes per second of output. Helios achieves near-real-time throughput without the standard bag of acceleration tricks — no KV-cache, no causal masking, no sparse attention, no quantization, no TinyVAE. The architecture itself is fast.
What Makes It Fast?
Helios is an autoregressive diffusion model. It generates video in chunks of 33 frames, each chunk conditioned on the previous ones. The training pipeline has three stages: Stage 1 converts a pretrained bidirectional diffusion model into an autoregressive generator using what the team calls "Unified History Injection." Stage 2 applies a Pyramid Unified Predictor Corrector to compress token counts and cut computation. Stage 3 uses adversarial distillation to reduce sampling steps from 50 down to just 3 while eliminating the need for classifier-free guidance entirely.
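The chunked autoregressive loop can be sketched in a few lines. This is a toy illustration with a stand-in "denoiser" (a crude averaging step), not the Helios code; only the 33-frame chunk size and the conditioning-on-history structure come from the release, everything else is assumed for demonstration.

```python
# Toy sketch of chunked autoregressive video generation.
# Names, shapes, and the denoiser are illustrative, not the Helios API.
import numpy as np

CHUNK_FRAMES = 33  # frames generated per chunk, per the Helios design

def denoise_chunk(noise, history, steps):
    """Stand-in for the diffusion denoiser: nudges noise toward the
    last frame of the history over `steps` iterations."""
    target = history[-1] if history else np.zeros_like(noise[0])
    x = noise
    for _ in range(steps):
        x = 0.5 * (x + target)  # crude "denoising" toward the conditioning
    return x

def generate(num_chunks, steps=3, frame_shape=(384, 640), seed=0):
    rng = np.random.default_rng(seed)
    history = []
    for _ in range(num_chunks):
        noise = rng.standard_normal((CHUNK_FRAMES, *frame_shape))
        chunk = denoise_chunk(noise, history, steps)
        history.extend(chunk)  # each new chunk conditions on prior frames
    return np.stack(history)

video = generate(num_chunks=2)
print(video.shape)  # (66, 384, 640)
```

The real model conditions on compressed latent history rather than raw frames, but the control flow is the same: generate a chunk, append it to the history, repeat.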
The distilled variant is the speed champion. Three denoising steps per chunk, no CFG overhead, and aggressive multi-scale sampling — that's how you get 19.5 FPS out of a 14B model. The base model is slower but produces the highest-quality output. An intermediate checkpoint (Helios-Mid) sits between them.
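Those numbers imply a tight per-step budget. A quick back-of-envelope using only the figures above (33-frame chunks, 19.5 FPS, 3 denoising steps):

```python
# Latency budget implied by the published numbers for Helios-Distilled.
chunk_frames = 33   # frames per autoregressive chunk
fps = 19.5          # claimed throughput on one H100
steps = 3           # denoising steps in the distilled variant

seconds_per_chunk = chunk_frames / fps
seconds_per_step = seconds_per_chunk / steps
print(f"{seconds_per_chunk:.2f}s per chunk, {seconds_per_step:.2f}s per step")
# → 1.69s per chunk, 0.56s per step
```

So each full 14B forward pass has roughly half a second to complete, which is why dropping from 50 steps to 3 and removing CFG (which would double the forward passes) matters more than any single kernel-level trick.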
Three Models, One Architecture
| Variant | Steps | CFG | Best For |
|---|---|---|---|
| Helios-Base | 50 | Standard | Maximum quality |
| Helios-Mid | 20 | CFG-Zero* | Quality/speed balance |
| Helios-Distilled | 3 | None (guidance-free) | Real-time generation |
All three share the same 14B architecture and support text-to-video, image-to-video, and video-to-video generation. The resolution is 384×640 — not 4K, not even 1080p, but reasonable for the speed and length tradeoff. All three variants support an interactive mode for iterative generation.
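For illustration, the table can be turned into a small selection helper. The step counts come from the table above; the helper itself and its "more steps means higher quality" heuristic are our assumptions, not part of the release:

```python
# Hypothetical variant picker based on the release table.
# The step counts are from the table; the helper is illustrative.
VARIANTS = {
    "Helios-Base": {"steps": 50, "cfg": "standard"},
    "Helios-Mid": {"steps": 20, "cfg": "CFG-Zero*"},
    "Helios-Distilled": {"steps": 3, "cfg": None},
}

def pick_variant(max_steps):
    """Return the highest-quality variant whose step count fits the budget,
    assuming more denoising steps means higher quality in this lineup."""
    eligible = [(v["steps"], name) for name, v in VARIANTS.items()
                if v["steps"] <= max_steps]
    if not eligible:
        raise ValueError("no variant fits the step budget")
    return max(eligible)[1]

print(pick_variant(25))  # Helios-Mid
print(pick_variant(3))   # Helios-Distilled
```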
Ecosystem Support on Day One
This is where Helios pulls ahead of most open-weight releases. It launched with same-day integration support from four major inference frameworks: HuggingFace Diffusers, vLLM-Omni, SGLang-Diffusion, and Cache-DiT. Multi-GPU parallel inference works out of the box. It also runs on Huawei's Ascend NPUs at roughly 10 FPS — notable for anyone operating outside the NVIDIA ecosystem.
The weights are on HuggingFace and ModelScope under Apache 2.0. Training code, inference scripts, and even a toy dataset for fine-tuning are included. There's also HeliosBench, a dedicated benchmark for evaluating real-time long-video generation models — the team is clearly trying to define the evaluation standard, not just release a model.
Where It Falls Short
The 384×640 resolution is the obvious limitation. Kling 3.0 outputs native 4K. Seedance 2.0 handles synchronized audio alongside video. Helios does neither — it generates silent video at a resolution that looks decent on a phone but won't survive a large monitor. Image-to-video and video-to-video quality trails behind text-to-video, which is where the training was concentrated.
But Helios isn't competing on resolution or audio. It's competing on the combination of model size, speed, video length, and openness that no one else offers. A minute of coherent video from a 14B model running at real-time speeds on a single GPU, with weights you can download and fine-tune — that's a capability that didn't exist a month ago. Four H100 GPUs and multi-GPU inference could push this into interactive territory for applications where latency matters more than pixel count.