Self-hosting

ColiVara is made up of multiple services. The first is an embedding service that turns images and queries into vectors; it is deployed separately because it needs a GPU. The second is a Postgres database with the pgvector extension, which stores the vectors. The third is a Gotenberg service that converts documents to PDF. The last is a Django-Ninja REST API that handles user requests. Everything except the embedding service is bundled together via docker-compose and runs on a typical VPS.

For production workloads, consider a managed Postgres instance for automatic security updates and regular backups.

Embedding Service

Git clone the service repository

git clone https://github.com/tjmlabs/ColiVarE

Optional: install uv in your environment. We use uv, but you can also use pip to install the requirements.

pip install uv
uv venv # or python -m venv .venv
source .venv/bin/activate  # on Windows: .venv\Scripts\activate

Compile the requirements for your environment. Because this service uses PyTorch under the hood, the requirements differ depending on your OS and Nvidia GPU availability. We use a Mac for development and Linux in production.

uv pip compile builder/requirements.in -o builder/requirements.txt

Install the requirements

uv pip install -r builder/requirements.txt

Download the models from Hugging Face and save them in the models_hub directory before building. See src/download_models.py for more details.

from colpali_engine.models import ColQwen2, ColQwen2Processor
import torch

model_name = "vidore/colqwen2-v1.0"
if torch.cuda.is_available():
    device_map = "cuda"
elif torch.backends.mps.is_available():
    device_map = "mps"
else:
    device_map = None
    
model = ColQwen2.from_pretrained(
    model_name,
    cache_dir="models_hub/",  # where to save the model
    device_map=device_map,
)

processor = ColQwen2Processor.from_pretrained(model_name, cache_dir="models_hub/")
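
Before building, it can help to confirm that the download step actually populated the cache directory. The following is a minimal sketch and not part of the repository: the helper name `cached_files` is hypothetical, and it only assumes that `from_pretrained(..., cache_dir="models_hub/")` writes files somewhere under that directory.

```python
from pathlib import Path


def cached_files(cache_dir: str = "models_hub/") -> list[Path]:
    """Return every file found under cache_dir (empty list if it is missing)."""
    root = Path(cache_dir)
    if not root.is_dir():
        return []
    # rglob walks nested folders, whatever layout the download step produced.
    return sorted(p for p in root.rglob("*") if p.is_file())
```

An empty result from `cached_files()` suggests the download step has not run yet, or that it wrote to a different directory.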

Run the service locally using the following command

python3 src/handler.py --rp_serve_api

The embedding service is now running at http://localhost:8000/. You can test it with the following command. Note that you need a GPU with at least 8 GB of VRAM available; performance on M-series Macs is also acceptable for local development.

curl --request POST \
  --url http://localhost:8000/runsync \
  --header 'Content-Type: application/json' \
  --data '{"input": {"task": "query","input_data": ["hello"]}}'
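
The same request can be sent from Python. This is a sketch using only the standard library; the function names `build_query_payload` and `embed_queries` are illustrative, and the response is returned as decoded JSON without assuming anything about its structure.

```python
import json
import urllib.request

# Local endpoint used by the curl example above.
EMBEDDINGS_URL = "http://localhost:8000/runsync"


def build_query_payload(queries: list[str]) -> dict:
    """Build the same JSON body as the curl example, for one or more queries."""
    return {"input": {"task": "query", "input_data": list(queries)}}


def embed_queries(queries: list[str], url: str = EMBEDDINGS_URL, timeout: float = 60.0) -> dict:
    """POST the payload and return the decoded JSON response unchanged."""
    body = json.dumps(build_query_payload(queries)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, `embed_queries(["hello"])` sends the same body as the curl command. The 60 second timeout is an arbitrary choice to allow for the first request loading the model.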

You may consider running this service in an "on-demand" fashion via Docker for cost-savings in production settings.

REST API

Clone the ColiVara repository

git clone https://github.com/tjmlabs/ColiVara

Create a .env.dev file in the root directory with the following variables:

EMBEDDINGS_URL="the serverless embeddings service url" # for local setup, use http://localhost:8000/runsync/
EMBEDDINGS_URL_TOKEN="the serverless embeddings service token" # for local setup, any string will do
AWS_S3_ACCESS_KEY_ID="an S3 or compatible storage access key"
AWS_S3_SECRET_ACCESS_KEY="an S3 or compatible storage secret key"
AWS_STORAGE_BUCKET_NAME="an S3 or compatible storage bucket name"
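
A missing variable is easier to catch before the containers start. The sketch below checks a .env.dev file for the five variables listed above. The helper names `parse_env` and `missing_keys` are hypothetical, and the parser is a deliberate simplification that handles `KEY="value"` lines with optional trailing comments; it is not a reimplementation of how docker-compose reads env files.

```python
REQUIRED_KEYS = (
    "EMBEDDINGS_URL",
    "EMBEDDINGS_URL_TOKEN",
    "AWS_S3_ACCESS_KEY_ID",
    "AWS_S3_SECRET_ACCESS_KEY",
    "AWS_STORAGE_BUCKET_NAME",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines; blank lines and # comment lines are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if value[:1] in ('"', "'"):
            # Quoted value: keep everything up to the matching closing quote.
            quote = value[0]
            end = value.find(quote, 1)
            value = value[1:end] if end != -1 else value[1:]
        else:
            # Unquoted value: drop any trailing inline comment.
            value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def missing_keys(text: str) -> list[str]:
    """Return the required keys that are absent or empty."""
    values = parse_env(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

For example, `missing_keys(open(".env.dev").read())` returns an empty list when all five variables are set.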

Run all the services via docker-compose

```bash
docker-compose up -d --build
docker-compose exec web python manage.py migrate
docker-compose exec web python manage.py createsuperuser
# retrieve the API token of the superuser you just created
docker-compose exec web python manage.py shell
# then, inside the Django shell:
from accounts.models import CustomUser
token = CustomUser.objects.first().token
print(token)  # save this token
```

The application will be running at http://localhost:8001, with the Swagger documentation at http://localhost:8001/v1/docs.
The Swagger documentation page doubles as a playground where you can try all the endpoints using the token created earlier.
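
The containers can take a moment to come up after `docker-compose up`. The sketch below polls the documentation URL until it responds, under the assumption that the docs page answers with HTTP 200 once the web container is ready. The names `http_status` and `wait_until_ready`, along with the attempt count and delay, are illustrative choices.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

# Swagger documentation URL from the text above.
DOCS_URL = "http://localhost:8001/v1/docs"


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for a GET on url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status.
        return error.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None


def wait_until_ready(
    url: str = DOCS_URL,
    attempts: int = 30,
    delay: float = 2.0,
    probe: Callable[[str], Optional[int]] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until it returns 200, giving up after the given number of attempts."""
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

The `probe` and `sleep` parameters exist so the loop can be exercised without a running server.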

Development

Follow the steps above to get the service up and running.

To run the tests and type checking (we have 100% test coverage):

docker-compose exec web pytest
# mypy for type checking
docker-compose exec web mypy .

Make a branch with your changes and additional code.
Open a pull request on GitHub. We have CI/CD and pre-commit hooks to format and test your changes.

We welcome contributions and discussion.
