Baidu Qianfan Group Releases Qianfan-OCR: A 4B-Parameter Unified Document Intelligence Model

By NextTech · March 18, 2026 · 4 Mins Read
The Baidu Qianfan Group released Qianfan-OCR, a 4B-parameter end-to-end model designed to unify document parsing, layout analysis, and document understanding within a single vision-language architecture. Unlike traditional multi-stage OCR pipelines that chain separate modules for layout detection and text recognition, Qianfan-OCR performs direct image-to-Markdown conversion and supports prompt-driven tasks such as table extraction and document question answering.
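In practice, prompt-driven use of such a model reduces to sending a document image plus a task instruction and receiving Markdown back. The sketch below assembles a request payload in that style; the endpoint schema, the `qianfan-ocr` model identifier, and all field names are assumptions modeled on common OpenAI-compatible vision APIs, not the documented Qianfan interface.

```python
import base64
import json

def build_parse_request(image_bytes: bytes, task_prompt: str) -> dict:
    """Assemble a chat-style request asking a vision-language model to
    convert a document image to Markdown. Field names are illustrative,
    mirroring common OpenAI-compatible vision APIs."""
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "qianfan-ocr",  # hypothetical model identifier
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                {"type": "text", "text": task_prompt},
            ],
        }],
    }

# Example: a table-extraction instruction, one of the prompt-driven tasks above.
req = build_parse_request(b"\x89PNG...", "Extract all tables as Markdown.")
print(json.dumps(req)[:80])
```

Swapping the text prompt is all it takes to switch between tasks (full parsing, table extraction, document question answering) without changing the pipeline.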

Source: https://arxiv.org/pdf/2603.13398

Architecture and Technical Specifications

Qianfan-OCR uses the multimodal bridging architecture from the Qianfan-VL framework. The system consists of three main components:

  • Vision Encoder (Qianfan-ViT): Employs an Any-Resolution design that tiles images into 448 x 448 patches. It supports variable-resolution inputs up to 4K, producing up to 4,096 visual tokens per image to preserve spatial resolution for small fonts and dense text.
  • Cross-Modal Adapter: A lightweight two-layer MLP with GELU activation that projects visual features into the language model's embedding space.
  • Language Model Backbone (Qwen3-4B): A 4.0B-parameter model with 36 layers and a native 32K context window. It uses Grouped-Query Attention (GQA) to reduce KV cache memory usage by 4x.
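The cross-modal adapter described above is simple enough to sketch directly: two linear layers with a GELU in between, mapping vision-encoder features into the language model's embedding space. The dimensions below (1024-d visual features, 2560-d LM embeddings) are placeholder assumptions; the paper's actual widths are not stated here.

```python
import numpy as np

def gelu(x: np.ndarray) -> np.ndarray:
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

class TwoLayerAdapter:
    """Projects visual tokens (B, N, d_vision) into LM embedding space (B, N, d_lm)."""
    def __init__(self, d_vision: int, d_lm: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.02, (d_vision, d_lm))
        self.b1 = np.zeros(d_lm)
        self.w2 = rng.normal(0.0, 0.02, (d_lm, d_lm))
        self.b2 = np.zeros(d_lm)

    def __call__(self, v: np.ndarray) -> np.ndarray:
        # linear -> GELU -> linear, applied per visual token
        return gelu(v @ self.w1 + self.b1) @ self.w2 + self.b2

adapter = TwoLayerAdapter(d_vision=1024, d_lm=2560)
# A small batch of visual tokens; the encoder can emit up to 4,096 per image.
tokens = np.zeros((1, 8, 1024))
out = adapter(tokens)
print(out.shape)  # (1, 8, 2560)
```

Note that the adapter changes only the feature dimension, never the token count: however many visual tokens the encoder emits, that many embeddings enter the language model.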

‘Structure-as-Thought’ Mechanism

The model's headline feature is Structure-as-Thought, an optional thinking phase triggered by special tokens. During this phase, the model generates structured layout representations, including bounding boxes, element types, and reading order, before producing the final output.

  • Functional Utility: This process recovers explicit layout-analysis capabilities (element localization and type classification) that are often lost in end-to-end paradigms.
  • Performance Characteristics: Evaluation on OmniDocBench v1.5 indicates that enabling the thinking phase yields a consistent advantage on documents with high "layout label entropy": those containing heterogeneous elements such as mixed text, formulas, and diagrams.
  • Efficiency: Bounding-box coordinates are represented as dedicated special tokens, reducing thinking output length by roughly 50% compared to plain digit sequences.
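The token-savings claim is easy to illustrate: if each coordinate value maps to one dedicated vocabulary entry instead of a run of digit tokens, a four-value bounding box shrinks from over a dozen tokens to four. The toy counters below are illustrative only and assume one token per character in the digit case; they do not reproduce Qianfan-OCR's actual tokenizer.

```python
def digit_token_count(box) -> int:
    """Plain digit encoding: every digit and separator costs one token
    (a common worst case for numeric strings)."""
    return len(",".join(str(v) for v in box))

def special_token_count(box) -> int:
    """Dedicated coordinate encoding: one special token per value,
    e.g. a hypothetical <coord_k> vocabulary entry."""
    return len(box)

box = (102, 57, 894, 1203)  # x1, y1, x2, y2 in pixels
digits, special = digit_token_count(box), special_token_count(box)
print(digits, special, f"{1 - special / digits:.0%} fewer tokens")
```

The exact savings depend on coordinate magnitudes and how much of the thinking output is coordinates, which is consistent with the roughly 50% overall reduction reported above.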

Empirical Performance and Benchmarks

Qianfan-OCR was evaluated against both specialized OCR systems and general vision-language models (VLMs).

Document Parsing and General OCR

The model ranks first among end-to-end models on several key benchmarks:

  • OmniDocBench v1.5: Achieved a score of 93.12, surpassing DeepSeek-OCR-v2 (91.09) and Gemini-3 Pro (90.33).
  • OlmOCR Bench: Scored 79.8, leading the end-to-end class.
  • OCRBench: Achieved a score of 880, ranking first among all tested models.

On public KIE benchmarks, Qianfan-OCR achieved the highest average score (87.9), outperforming significantly larger models.

Model                 Overall Mean (KIE)   OCRBench KIE   Nanonets KIE (F1)
Qianfan-OCR (4B)      87.9                 95.0           86.5
Qwen3-4B-VL           83.5                 89.0           83.3
Qwen3-VL-235B-A22B    84.2                 94.0           83.8
Gemini-3.1-Pro        79.2                 96.0           76.1

Document Understanding

Comparative testing revealed that two-stage OCR+LLM pipelines often fail on tasks requiring spatial reasoning. For instance, all tested two-stage systems scored 0.0 on CharXiv benchmarks, because the text-extraction phase discards the visual context (axis relationships, data-point positions) necessary for chart interpretation.


Deployment and Inference

Inference efficiency was measured in Pages Per Second (PPS) on a single NVIDIA A100 GPU.

  • Quantization: With W8A8 (AWQ) quantization, Qianfan-OCR achieved 1.024 PPS, a 2x speedup over the W16A16 baseline with negligible accuracy loss.
  • Architecture Advantage: Unlike pipeline systems that rely on CPU-based layout analysis, which can become a bottleneck, Qianfan-OCR is GPU-centric. This avoids inter-stage processing delays and allows efficient large-batch inference.
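Given the reported W8A8 throughput of 1.024 PPS and the 2x speedup over W16A16, the implied baseline throughput and the wall-clock difference on a large batch follow directly; the 10,000-page batch below is an arbitrary illustration, not a figure from the paper.

```python
W8A8_PPS = 1.024   # reported pages per second with AWQ W8A8 on one A100
SPEEDUP = 2.0      # reported speedup over the W16A16 baseline

w16a16_pps = W8A8_PPS / SPEEDUP   # implied baseline throughput: 0.512 PPS
pages = 10_000                    # hypothetical batch of document pages

t_quant = pages / W8A8_PPS / 3600     # hours at W8A8
t_base = pages / w16a16_pps / 3600    # hours at W16A16
print(f"W16A16: {t_base:.1f} h, W8A8: {t_quant:.1f} h, "
      f"saved: {t_base - t_quant:.1f} h")
```

At these rates the quantized model halves the wall-clock time of any fixed batch, which is why the 2x speedup compounds meaningfully in large-batch deployments.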

Check out the Paper, Repo, and Model on Hugging Face.



Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.





