Quickstart

ColiVara is a web API that abstracts away the difficult parts of visual RAG. It embeds and stores your documents, then returns the best-matching pages when a user makes a query.

We use the Python SDK in this quickstart, but since ColiVara is an API, you can use any language by making standard API calls.

API Keys

Get an API key from the ColiVara website or by self-hosting.

Install the Python SDK

Index a document

ColiVara accepts a file URL, a base64-encoded file, or a file path. It supports over 100 file formats, including PDF, DOCX, and PPTX. For URLs that point to webpages, it automatically takes a screenshot and indexes it.
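The three input forms can be normalised into one request body before it is sent. The sketch below uses only the Python standard library. The helper `build_upsert_payload` and the field names (`name`, `url`, `base64`, `metadata`, `collection_name`) are assumptions made for illustration, not the confirmed ColiVara schema, so check the API reference before relying on them.

```python
import base64
import json
from pathlib import Path
from typing import Optional


def build_upsert_payload(
    name: str,
    *,
    url: Optional[str] = None,
    file_path: Optional[str] = None,
    file_base64: Optional[str] = None,
    metadata: Optional[dict] = None,
    collection_name: Optional[str] = None,
) -> dict:
    """Build a JSON-serialisable body for a hypothetical indexing request."""
    # Exactly one source must be given: a URL, a local path, or base64 text.
    sources = [s for s in (url, file_path, file_base64) if s is not None]
    if len(sources) != 1:
        raise ValueError("provide exactly one of url, file_path, or file_base64")

    payload = {"name": name, "metadata": metadata or {}}
    if collection_name is not None:
        payload["collection_name"] = collection_name

    if url is not None:
        payload["url"] = url
    elif file_path is not None:
        # A local file is read and sent as base64 text (assumed behaviour).
        raw = Path(file_path).read_bytes()
        payload["base64"] = base64.b64encode(raw).decode("ascii")
    else:
        payload["base64"] = file_base64
    return payload


payload = build_upsert_payload(
    "sample_document",
    url="https://example.com/sample.pdf",
    metadata={"author": "John Doe"},
)
print(json.dumps(payload, indent=2))
```

Rejecting calls that pass zero or several sources keeps ambiguous requests from reaching the API.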

When searching, you can filter by collection name, collection metadata, and document metadata. You can also specify the number of results to return.
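As a rough illustration of how these options combine, the sketch below assembles a search request body. The helper `build_search_payload` and the field names (`query`, `top_k`, `collection_name`, `query_filter` with `on`, `key`, `value`) are assumptions for this example and may differ from the actual API.

```python
import json
from typing import Any, Optional


def build_search_payload(
    query: str,
    *,
    top_k: int = 3,
    collection_name: Optional[str] = None,
    filter_on: Optional[str] = None,  # "document" or "collection" metadata
    filter_key: Optional[str] = None,
    filter_value: Any = None,
) -> dict:
    """Build a JSON-serialisable body for a hypothetical search request."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")

    payload = {"query": query, "top_k": top_k}
    if collection_name is not None:
        payload["collection_name"] = collection_name

    if filter_on is not None:
        if filter_on not in ("document", "collection"):
            raise ValueError("filter_on must be 'document' or 'collection'")
        if filter_key is None:
            raise ValueError("filter_key is required when filter_on is set")
        # The metadata filter is nested under one key (assumed shape).
        payload["query_filter"] = {
            "on": filter_on,
            "key": filter_key,
            "value": filter_value,
        }
    return payload


search_payload = build_search_payload(
    "what is 1+1?",
    top_k=5,
    collection_name="user_1_collection",
    filter_on="document",
    filter_key="author",
    filter_value="John Doe",
)
print(json.dumps(search_payload, indent=2))
```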

FAQ

Do I need a vector database?

No - ColiVara uses Postgres and pgvector to store vectors for you. You DO NOT need to generate, save, or manage embeddings in any way.

Do you convert the documents to markdown/text?

No - ColiVara treats everything as an image and uses vision models. There is no parsing, chunking, or OCR involved. This method outperforms chunking and OCR on both text-based and visual documents.

How do non-PDF documents or web pages work?

We run a pipeline that converts them to images, then perform our normal image-based retrieval. This all happens for you under the hood, and you get the top-k pages when performing retrieval.

Can I use my own vector database?

Yes - we have an embedding endpoint that only generates embeddings, without saving them or doing anything else. You can store these embeddings on your end. Keep in mind that we use late interaction and multi-vector embeddings, which many vector databases do not support yet.
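To see why multi-vector support matters, here is a minimal sketch of MaxSim, the scoring commonly used by late-interaction models such as ColBERT and ColPali. This page does not describe ColiVara's exact scoring, so treat this as an illustration of the general technique; the function names and toy vectors are invented for the example.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vectors: Sequence[Vector], page_vectors: Sequence[Vector]) -> float:
    """For each query vector take its best match on the page, then sum those maxima."""
    if not page_vectors:
        return 0.0
    return sum(max(dot(q, p) for p in page_vectors) for q in query_vectors)


# Toy data: a query with two token vectors and two candidate pages.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
page_b = [[0.5, 0.5]]

scores = {
    "page_a": maxsim_score(query, page_a),
    "page_b": maxsim_score(query, page_b),
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```

Each page is a set of vectors rather than a single vector, so a database built around one vector per row needs extra work to store the sets and to compute this score at query time.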
