About

RAG (Retrieval Augmented Generation) is a powerful technique that allows us to enhance LLM (Large Language Model) output with private documents and proprietary knowledge that is not available elsewhere, for example a company's internal documents or a researcher's notes.

However, it is limited by the quality of the text extraction pipeline. With limited ability to extract visual cues and other non-textual information, RAG can be sub-optimal for documents that are visually rich.

ColiVara uses vision models to generate embeddings for documents, allowing you to retrieve documents based on their visual content. Read more about the original ColPali model in the ColPali paper.

From the ColPali paper:

Documents are visually rich structures that convey information through text, as well as tables, figures, page layouts, or fonts. While modern document retrieval systems exhibit strong performance on query-to-text matching, they struggle to exploit visual cues efficiently, hindering their performance on practical document retrieval applications such as Retrieval Augmented Generation.

Learn More in the ColPali Paper
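
To make the retrieval idea concrete, here is a minimal sketch of the late-interaction (MaxSim) scoring that the ColPali paper describes: each page is represented by several patch embeddings, each query by several token embeddings, and a page's score is the sum, over query vectors, of the best-matching patch similarity. The function names, the toy 2-dimensional vectors, and the page ids are assumptions made for illustration. This is not ColiVara's code or API.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vectors: List[Vector], page_vectors: List[Vector]) -> float:
    """For each query vector, take its best match among the page's
    patch vectors, then sum those maxima."""
    if not page_vectors:
        return 0.0
    return sum(max(dot(q, p) for p in page_vectors) for q in query_vectors)


def rank_pages(query_vectors: List[Vector], pages: Dict[str, List[Vector]]) -> List[str]:
    """Return page ids sorted from highest to lowest score."""
    scored = [(maxsim_score(query_vectors, vecs), page_id) for page_id, vecs in pages.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [page_id for _, page_id in scored]


# Hypothetical toy data: two query vectors and two pages.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "text_page": [[0.5, 0.5]],
    "chart_page": [[1.0, 0.0], [0.0, 1.0]],
}
ranking = rank_pages(query, pages)
```

In this toy setup the page with a patch closely matching each query vector ranks first, which is the behavior that lets visually distinct regions (a table, a figure) contribute to retrieval.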

Key Features

State-of-the-art retrieval: The API is based on the ColPali paper and uses the ColQwen2 model for embeddings. It outperforms existing retrieval systems on both quality and latency.
User Management: Multi-user setup with each user having their own collections and documents.
Wide Format Support: Supports over 100 file formats including PDF, DOCX, PPTX, and more.
Webpage Support: Automatically takes a screenshot of a webpage and indexes it, even if it is not a file.
Collections: A user can have multiple collections, for example one for research papers and another for books, which keeps documents organized and retrieval efficient.
Documents: Each collection can have multiple documents with unlimited, user-defined metadata.
Filtering: Collections and documents can be filtered on arbitrary metadata fields. For example, you can filter documents by author or year, or filter collections by type.
Convention over Configuration: The API is designed to be easy to use with opinionated and optimized defaults.
Modern PgVector Features: We use HalfVecs for faster search and reduced storage requirements.
REST API: Easy to use REST API with Swagger documentation.
Comprehensive: Full CRUD operations for documents, collections, and users.
Dockerized: Easy to set up and run with Docker and Docker Compose on your infrastructure.
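
The collection, document, and metadata-filtering features above can be sketched with a small in-memory model. This is only an illustration of the concept: the class names, the `matches` and `filter_documents` helpers, and the sample file names and metadata are all hypothetical and are not the ColiVara API or SDK.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    name: str
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Collection:
    name: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    documents: List[Document] = field(default_factory=list)


def matches(metadata: Dict[str, Any], criteria: Dict[str, Any]) -> bool:
    """True when every key in criteria is present in metadata with an equal value."""
    return all(key in metadata and metadata[key] == value for key, value in criteria.items())


def filter_documents(
    collections: List[Collection],
    collection_criteria: Dict[str, Any],
    document_criteria: Dict[str, Any],
) -> List[Document]:
    """Documents matching document_criteria, taken only from collections
    matching collection_criteria. Empty criteria match everything."""
    return [
        doc
        for collection in collections
        if matches(collection.metadata, collection_criteria)
        for doc in collection.documents
        if matches(doc.metadata, document_criteria)
    ]


# Hypothetical sample data: one collection of papers, one of books.
papers = Collection(
    "papers",
    {"type": "research"},
    [
        Document("paper_a.pdf", {"author": "A. Author", "year": 2024}),
        Document("paper_b.pdf", {"author": "B. Author", "year": 2017}),
    ],
)
books = Collection(
    "books",
    {"type": "book"},
    [Document("handbook.docx", {"author": "A. Author", "year": 2019})],
)

# Filter collections by type and documents by author in one pass.
hits = filter_documents([papers, books], {"type": "research"}, {"author": "A. Author"})
```

Here the filter is exact-match on each metadata key. A real deployment would push the equivalent conditions down to the database rather than scanning in memory.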

Evals:

The ColPali team has provided the following evals in their paper. We have run quick sanity checks on the API and the Embeddings Service and are getting similar results. We are working on our own independent evals and will update this section with our results.

Updates:

11/6/2024: Our ArxivQ score is 86.6, matching state-of-the-art results on the ViDoRe leaderboard.

Components:

Postgres DB with the pgvector extension for storing embeddings (ColiVara repo).
REST API for document/collection management (ColiVara repo).
Embeddings Service. This needs a GPU with at least 8 GB of VRAM. The code is in the ColiVarE repo and is optimized for a serverless GPU workload.
You can run the Embeddings Service separately and use your own storage and API for the rest of the components. The Embeddings Service is designed to be modular and can be used with any storage and API (for example, Qdrant for storage and Node for the API).
Language-specific SDKs for the API (TypeScript SDK coming soon)
1. Python SDK: ColiVara-Py
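
As a rough illustration of why the half-precision vectors mentioned under Key Features reduce storage, the sketch below packs the same embedding as 32-bit and as 16-bit floats using only the Python standard library. It does not talk to Postgres or pgvector. The helper names, the 128-dimension size, and the sample values are assumptions, and the byte counts cover only the raw vector payload.

```python
import struct
from typing import List


def pack_vector(values: List[float], half: bool = True) -> bytes:
    """Serialize floats as little-endian 16-bit ('e') or 32-bit ('f') values."""
    code = "e" if half else "f"
    return struct.pack(f"<{len(values)}{code}", *values)


def unpack_vector(blob: bytes, half: bool = True) -> List[float]:
    """Inverse of pack_vector for the same precision."""
    code, size = ("e", 2) if half else ("f", 4)
    return list(struct.unpack(f"<{len(blob) // size}{code}", blob))


# Hypothetical 128-dimensional embedding built from a repeating pattern.
embedding = [0.125, -0.5, 0.75, 1.0] * 32

full_bytes = len(pack_vector(embedding, half=False))
half_bytes = len(pack_vector(embedding, half=True))

# Half precision is lossy for values such as 0.1 that it cannot represent exactly.
roundtrip_error = abs(unpack_vector(pack_vector([0.1]))[0] - 0.1)
```

The half-precision payload is half the size of the single-precision one, at the cost of a small rounding error per component. Whether that error is acceptable depends on the retrieval task.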

License

This project is licensed under the Functional Source License, Version 1.1, Apache 2.0 Future License.

For commercial licensing, please contact us at tjmlabs.com. We are happy to work with you to provide a license that meets your needs.
