Chinese tech giant Meituan has launched its new LongCat-Video model, claiming a breakthrough in text-to-video generation by producing coherent, high-definition clips up to 5 minutes long. The company has also open-sourced the model on GitHub and Hugging Face to support broader research collaboration.
According to Meituan, LongCat-Video is built on a Diffusion Transformer (DiT) architecture and supports three modes: text-to-video, image-to-video, and video continuation. The model can turn a text prompt or a single reference image into a smooth 720p/30 fps sequence, or extend existing footage into longer scenes with consistent style, motion, and physics.
The team said the model addresses a persistent challenge in generative video: maintaining quality and temporal stability over extended durations. LongCat-Video can generate continuous, multi-minute content without the frame degradation that typically affects diffusion-based systems.
Meituan described LongCat-Video as a step toward "world-model" AI, capable of learning real-world geometry, semantics, and motion to simulate physical environments. The model is publicly available through Meituan's repositories on GitHub and Hugging Face.