SHENZHEN – Tencent’s Hunyuan large-model team has officially released and open-sourced HunyuanOCR, a specialized lightweight vision-language model for optical character recognition (OCR) containing just 1 billion parameters.
The model combines a native Vision Transformer (ViT) architecture with a lightweight large language model (LLM), delivering commercial-grade performance in text detection, document parsing, and information extraction. It recently won first place in the small-model track of the ICDAR 2025 DIMT challenge and achieved state-of-the-art results on the OCRBench benchmark for models under 3B parameters.
HunyuanOCR introduces three key breakthroughs:
- Unified multitasking capability – supporting text detection, complex layout analysis, open-field information extraction, and image translation within a single efficient framework
- End-to-end architecture – eliminating traditional preprocessing pipelines and reducing error accumulation
- Reinforcement learning optimization – demonstrating that RL can significantly improve performance across multiple OCR tasks
The model has gained rapid community traction, ranking among the top four trending models on Hugging Face and receiving over 700 stars on GitHub within a short period. It has also been integrated into the vLLM inference engine.
Available now on Hugging Face and ModelScope, HunyuanOCR provides researchers and developers with a powerful, deployable OCR solution that balances high accuracy with computational efficiency – particularly useful for edge deployment and industrial applications.

