Building a Retrieval-Augmented Generation (RAG) pipeline is easy; building one that doesn’t hallucinate during a 10-K audit is nearly impossible. For developers in the financial sector, the ‘standard’ vector-based RAG approach (chunking text and hoping for the best) often results in a ‘text soup’ that loses the vital structural context of tables and balance sheets.
VectifyAI is attempting to close this gap with the launch of Mafin 2.5, a multimodal financial agent, and PageIndex, an open-source framework that shifts the industry toward ‘Vectorless RAG.’
The Problem: Why Vector RAG Fails Finance
Traditional RAG relies on semantic similarity. If you ask about ‘Net Income,’ a vector database looks for chunks of text that sound like net income. However, financial documents are layout-dependent. A number in a cell is meaningless without its header, and those headers are often stripped away during traditional PDF-to-text conversion.
This is the ‘garbage in, garbage out’ trap: even the smartest LLM can’t reason correctly if the input data has lost its hierarchical structure.
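To make the failure mode concrete, here is a minimal Python sketch of what happens when a small table is flattened and split into fixed-size chunks. Everything in it is an assumption for illustration: the figures are invented, and `flatten`, `fixed_size_chunks`, and `structured_records` are hypothetical helpers, not part of any real extraction library.

```python
# Hypothetical income-statement fragment; the figures are made up for illustration.
table = {
    "headers": ["Line item", "FY2023", "FY2022"],
    "rows": [
        ["Revenue", "4,120", "3,870"],
        ["Net income", "612", "540"],
    ],
}

def flatten(table):
    """Mimic naive PDF-to-text extraction: cells in reading order, no structure."""
    cells = table["headers"] + [cell for row in table["rows"] for cell in row]
    return " ".join(cells)

def fixed_size_chunks(text, size):
    """Split text into fixed-size word windows, as a naive chunker would."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def structured_records(table):
    """Keep every value attached to its row label and its column header."""
    _, *period_headers = table["headers"]
    records = []
    for label, *values in table["rows"]:
        for period, value in zip(period_headers, values):
            records.append({"item": label, "period": period, "value": value})
    return records

chunks = fixed_size_chunks(flatten(table), size=5)
records = structured_records(table)

for chunk in chunks:
    print(repr(chunk))
print(records[2])
```

In this toy case the chunk that mentions net income no longer contains either fiscal-year header, so a retriever that returns it cannot say which year a figure belongs to. The structured records keep that link for every value.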
Mafin 2.5: Accuracy at Scale
Mafin 2.5 isn’t just a fine-tuned model; it’s a reasoning engine that achieved 98.7% accuracy on FinanceBench, significantly outperforming GPT-4o and Perplexity in financial retrieval tasks.
What sets it apart for developers is its native integration with high-fidelity data sources:
- Comprehensive SEC Access: Direct indexing of 10-K, 10-Q, and 8-K filings.
- Earnings Intel: Real-time and historical earnings call transcripts.
- Market Data: Live tickers across the Russell 3000 and Nasdaq.

PageIndex: The Move to ‘Vectorless’ RAG
The ‘secret sauce’ behind Mafin 2.5’s precision is PageIndex. PageIndex replaces traditional flat embeddings with a hierarchical tree index.
Instead of searching through random chunks, PageIndex allows an LLM to ‘reason’ through a document’s structure. It builds a semantic tree, essentially an intelligent map of the document, enabling the agent to identify the exact section, page, and line item required.
Key technical features include:
- Vision-Native Support: PageIndex supports Vision-based RAG, allowing models to ‘see’ the global layout of a page (charts, complex grids) rather than relying solely on OCR text.
- Hierarchical Navigation: It transforms PDFs into a navigable tree structure, ensuring the relationship between headers and data remains intact.
- Traceability: Unlike the ‘black box’ of vector similarity, every answer has a clear path through the document tree, providing a much-needed audit trail for regulated financial environments.
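The idea of descending a document tree and keeping the visited path as an audit trail can be sketched in a few lines of Python. This is an illustration under stated assumptions, not PageIndex’s actual API: the `Node` class, the `navigate` function, and the sample outline are all hypothetical, and the word-overlap scorer is a crude stand-in for the LLM that would pick a branch in a real system.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    title: str
    pages: tuple  # (first_page, last_page), inclusive
    children: list = field(default_factory=list)

def overlap_score(query, title):
    """Stand-in for the LLM's judgment: count shared lowercase words."""
    return len(set(query.lower().split()) & set(title.lower().split()))

def navigate(root, query, choose=overlap_score):
    """Descend from the root, picking the best-scoring child at each level.

    Returns the list of visited nodes, which doubles as the audit trail.
    """
    path = [root]
    node = root
    while node.children:
        node = max(node.children, key=lambda child: choose(query, child.title))
        path.append(node)
    return path

# Hypothetical outline of a 10-K; titles and page numbers are illustrative.
doc = Node("Form 10-K", (1, 120), [
    Node("Item 1A. Risk Factors", (10, 30)),
    Node("Item 8. Financial Statements", (60, 110), [
        Node("Consolidated Balance Sheets", (62, 63)),
        Node("Consolidated Statements of Income", (64, 65)),
    ]),
])

path = navigate(doc, "net income in the consolidated statements of income")
trail = " > ".join(f"{n.title} (pp. {n.pages[0]}-{n.pages[1]})" for n in path)
print(trail)
```

Because `navigate` returns every node it visited, the answer can be cited back to a section and page range. Swapping `overlap_score` for a model call would change how a branch is chosen but not the shape of the trail.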
Key Takeaways
- Unprecedented Financial Accuracy (98.7%): Mafin 2.5 has set a new state-of-the-art record on the FinanceBench benchmark, achieving 98.7% accuracy. This significantly outperforms general-purpose models like GPT-4o (~31%) and Perplexity (~45%) by focusing on specialized financial reasoning rather than general retrieval.
- The Shift to ‘Vectorless RAG’: Moving away from the “vibe-based” search of traditional vector databases, PageIndex introduces Reasoning-based RAG. It uses an LLM to ‘reason’ its way through a document’s structure, mimicking how a human analyst navigates a report to find specific data points.
- Hierarchical ‘Tree’ Indexing vs. Chunking: Instead of chopping documents into arbitrary, contextless text chunks, PageIndex organizes PDFs into a semantic tree structure (an intelligent Table of Contents). This preserves the critical relationship between headers, nested tables, and footnotes that traditional RAG often destroys.
- Vision-Native & OCR-Free Workflows: The framework supports Vision-based Vectorless RAG, allowing the AI to ‘see’ and retrieve information directly from page images. This is a game-changer for financial documents where the visual layout of a balance sheet or complex grid is as important as the numbers themselves.
- Enterprise-Grade Traceability: Unlike the ‘black box’ of vector similarity, PageIndex provides a fully auditable reasoning path. Every response is linked to specific nodes, pages, and sections, providing the transparency required for high-stakes financial audits and compliance.

Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.


