Retrieval Augmented Generation (RAG)
The common problem ColiVara solves is RAG over visually rich documents, where text extraction pipelines fail or produce incomplete data: for example, documents heavy with tables and charts, such as the policies of a medical facility.
You can find the complete code for this demo below.
Overview
1. Prepare your environment
First, install the ColiVara Python SDK. If using Jupyter Notebook:
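```python
# Install the ColiVara Python SDK (published on PyPI as colivara-py)
!pip install colivara-py
```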
If using the command shell:
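```bash
pip install colivara-py
```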
2. Prepare your documents
This is the start of our RAG pipeline: preparing the documents. They could live in a directory on your local computer, an S3 bucket, or a Google Drive; any source you can think of will work with ColiVara. For the purposes of this guide, we will use a local directory (we'll call it docs) with some documents in it.
If using Jupyter Notebook:
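```python
# Create a local folder to hold the demo documents
!mkdir docs
```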
If using the command shell:
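```bash
mkdir docs
```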
Then download the documents from GitHub. For the purposes of this demo, we will download the two smallest files, but feel free to try your own documents or all the documents in our demo repository.
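Here is a sketch of the download step; the repository path and file names below are placeholders, so substitute the raw URLs of the actual files in the demo repository:

```python
import urllib.request

# Placeholder URLs: replace with the raw GitHub links to the two
# smallest files in the demo repository.
files = [
    "https://raw.githubusercontent.com/<org>/<demo-repo>/main/docs/file1.pdf",
    "https://raw.githubusercontent.com/<org>/<demo-repo>/main/docs/file2.pdf",
]

for url in files:
    filename = "docs/" + url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, filename)
```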
3. Sync your documents
We want to sync our documents to the ColiVara server, so we can simply call this step whenever our documents change or are updated. ColiVara automatically updates existing documents or inserts new ones depending on what changed. The wait=True parameter ensures this process is synchronous.
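Here is a minimal sketch of the sync step. It assumes the SDK's upsert_document method accepts a local document_path (the SDK also supports URLs and base64 payloads; check the SDK reference for the exact parameter names):

```python
from pathlib import Path

from colivara_py import ColiVara

# The client reads COLIVARA_API_KEY from the environment by default.
client = ColiVara()

# Upsert every file in the local docs/ directory. ColiVara matches on the
# document name, inserting new documents and updating existing ones.
for path in Path("docs").iterdir():
    client.upsert_document(
        name=path.name,
        document_path=str(path),  # assumed parameter name; see the SDK reference
        wait=True,  # block until the document is fully processed
    )
```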
4. Transform your query
Next, we want to transform a user's message or question into an appropriate RAG query. In retrieval augmented generation, the user and the AI take turns in a conversation, and on each turn we want to retrieve factual context that the AI can use to provide its answer.
We will need an LLM to help us with this transformation. For this guide, we will use gpt-4o, but lighter models are also effective.
If using Jupyter Notebook:
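```python
# Install the OpenAI Python SDK
!pip install openai
```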
If using the command shell:
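```bash
pip install openai
```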
Here is a sketch of the transformation code. It assumes the OpenAI Python SDK with an OPENAI_API_KEY set in the environment; the prompt wording is illustrative:
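```python
from openai import OpenAI

# The client reads OPENAI_API_KEY from the environment by default.
openai_client = OpenAI()


def transform_query(user_message: str) -> str:
    """Rewrite a conversational message into a standalone retrieval query."""
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "Rewrite the user's message as a short, self-contained "
                    "search query for document retrieval. Return only the query."
                ),
            },
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content.strip()
```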
5. Search for context using the RAG pipeline
Finally, with our documents and query prepared, we are ready to run our RAG pipeline with ColiVara.
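A minimal sketch of the search call, assuming the SDK's search method accepts a query string and a top_k parameter (the example question is illustrative):

```python
user_message = "What is the facility's patient privacy policy?"  # example question

# Transform the conversational message into a retrieval query, then search.
query = transform_query(user_message)
results = client.search(query=query, top_k=3)
```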
Let's peek at what our context looks like (the result field names below follow the SDK's response model; adjust them if your version differs):
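```python
# Each result corresponds to a single retrieved page, returned with its
# relevance score and a base64-encoded screenshot of the page.
for result in results.results:
    print(result.document_name, result.page_number, result.normalized_score)
```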
6. Generate answer
With your context in hand, you can now pass it to an LLM with multimodal/vision capabilities and get a factually grounded answer.
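A minimal sketch of the generation step with gpt-4o, assuming each result carries its page screenshot as img_base64; if your SDK version returns a full data URI instead of a bare base64 string, drop the prefix below:

```python
# Attach each retrieved page as an image the model can read.
context_images = [
    {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{result.img_base64}"},
    }
    for result in results.results
]

response = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Answer the question using only the attached pages. "
                    f"Question: {user_message}",
                },
                *context_images,
            ],
        }
    ],
)
print(response.choices[0].message.content)
```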
And with that, your RAG pipeline is complete.