Welcome

Welcome to the ColiVara documentation! Here you'll get an overview of all the features ColiVara offers to help you build a state-of-the-art retrieval system.

ColiVara is a suite of services that allows you to store, search, and retrieve documents based on their visual embeddings.

Why visual embeddings?

Documents are visually rich structures that convey information through text, as well as tables, figures, page layouts, and charts. While legacy document retrieval systems exhibit good performance on query-to-text matching, they struggle to pass visual cues efficiently to large language models, hindering their performance on practical document retrieval applications such as Retrieval Augmented Generation.

ColiVara is a web-first implementation of the ColPali paper, using ColQwen2 as the underlying model. From the end-user standpoint it works exactly like RAG, but it uses vision models instead of chunking and text processing for documents. No OCR, no text extraction, no broken tables, no missing images. What you see is what you get.
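
In practice, indexing and retrieval are each a single call. The snippet below is a minimal sketch, assuming the colivara-py Python SDK; the client name, the upsert_document and search methods, their parameters, and the example URL are illustrative assumptions, so see the SDKs and API Reference sections for the exact interface.

```python
# Minimal sketch of the index-then-search flow. Assumptions: the colivara-py
# SDK exposes a ColiVara client with upsert_document() and search() methods
# and API-key auth. Check the SDK reference for the exact signatures.
from colivara_py import ColiVara

client = ColiVara(api_key="your-api-key")

# Index a PDF by URL: every page is embedded visually, with no OCR or chunking.
client.upsert_document(
    name="annual_report",                         # your identifier for the document
    url="https://example.com/annual_report.pdf",  # hypothetical document URL
)

# Retrieve the most relevant pages for a natural-language query.
results = client.search(query="What was Q4 revenue?", top_k=3)
print(results)  # top-k pages with relevance scores, ready to feed a vision LLM
```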

Performance

ColiVara's performance is near state of the art for Retrieval-Augmented Generation on the Vidore leaderboard. We significantly outperformed current methods for document parsing and processing, such as OCR and captioning.

Our detailed benchmark evaluation illustrates ColiVara's performance across diverse benchmarks. Metrics such as NDCG@5 (Normalized Discounted Cumulative Gain at rank 5) and latency were recorded for a comprehensive analysis.
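
For reference, NDCG@5 measures how well the top 5 retrieved results are ordered compared to the ideal ordering. One standard formulation, with rel_i the graded relevance of the result returned at rank i, is:

$$
\mathrm{NDCG@5} = \frac{\mathrm{DCG@5}}{\mathrm{IDCG@5}},
\qquad
\mathrm{DCG@5} = \sum_{i=1}^{5} \frac{2^{\mathrm{rel}_i} - 1}{\log_2(i + 1)}
$$

Here IDCG@5 is the DCG@5 of the ideal (perfectly ranked) result list, so NDCG@5 ranges from 0 to 1; the scores in the table below are reported on a 0 to 100 scale.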

| Benchmark | ColiVara Score (NDCG@5) | Avg Latency (s) (lower is better) | Num Docs |
| --- | --- | --- | --- |
| Average | 86.8 | N/A | N/A |
| ArxivQA | 87.6 | 3.2 | 500 |
| DocVQA | 54.8 | 2.9 | 500 |
| InfoVQA | 90.1 | 2.9 | 500 |
| Shift Project | 87.7 | 5.3 | 1000 |
| Artificial Intelligence | 98.7 | 4.3 | 1000 |
| Energy | 96.4 | 4.5 | 1000 |
| Government Reports | 96.8 | 4.4 | 1000 |
| Healthcare Industry | 98.5 | 4.5 | 1000 |
| TabFQuad | 86.6 | 3.7 | 280 |
| TatQA | 70.9 | 8.4 | 1663 |

Key Findings

  • ColiVara dominated visual-heavy benchmarks like ArxivQA and InfoVQA with an NDCG@5 score of 88.1, double the performance of captioning-based systems.

  • Even on text-centric benchmarks like DocVQA and multimodal benchmarks like InfoVQA, ColiVara outperformed traditional methods by up to 30%.

  • On more comprehensive benchmarks, where a holistic combination of visual and textual analysis is key to answering queries in specific domains (Sustainability, Energy, AI, Government Reports, Healthcare), ColiVara far outshines the competition, scoring in the high 90s across all benchmarks. This is due to ColiVara's Holistic Multimodal Integration and Spatial Context Awareness.

Jump right in

  • Getting Started: Create your first RAG pipeline with 2 lines of code
  • Guides: Learn all that ColiVara has to offer
  • API Reference: Try the API live