Self-hosting

ColiVara is made up of multiple services. The first is an embedding service that turns images and queries into Vectors. This is bundled separately as it needs a GPU. The second is a Postgres with a pgvector extension to store vectors. A Gotenberg service that handles document conversions to PDFs. And finally, a Django-Ninja REST API that handles user requests. Other than the Embedding service, everything else is bundles together via docker-compose to run seamlessly on a typical VPS.

For production workloads - you may consider a managed Postgres instance for automatic security updates and regular backup.

Embedding Service

  1. Git clone the service repository

git clone https://github.com/tjmlabs/ColiVarE
  1. Optional: download uv and install it in your environment. We use uv, however you can also use pip to install the requirements.

pip install uv
uv venv # or python -m venv .venv
source .venv/bin/activate #.venv/Scripts/activate on windows
  1. Compile requirements based on your environment. As this services uses pytorch under the hood- requirements will be different depending on your OS and Nvidia GPU availability. We use a mac for development and a Linux in production.

  2. Install the requirements

uv pip compile builder/requirements.in -o builder/requirements.txt
  1. Download the models from huggingface and save them in the models_hub directory before building. See src/download_models.py for more details.

from colpali_engine.models import ColQwen2, ColQwen2Processor
import torch

model_name = "vidore/colqwen2-v0.1"
if torch.cuda.is_available():
    device_map = "cuda"
elif torch.backends.mps.is_available():
    device_map = "mps"
else:
    device_map = None
    
model = ColQwen2.from_pretrained(
        model_name,
        cache_dir="models_hub/",  # where to save the model
        device_map=device_map,
    )

processor = ColQwen2Processor.from_pretrained(model_name, cache_dir="models_hub/")
  1. Run the service locally using the following command

python3 src/handler.py --rp_serve_api
  1. The Embedding service is now running on http://localhost:8000/. You can test it using the following command. Remember - you do need a GPU and at least 8gb of VRAM available. The performance on a M-series of Macs is also acceptable for local development.

curl --request POST \
  --url http://localhost:8000/runsync \
  --header 'Content-Type: application/json' \
  --data '{"input": {"task": "query","input_data": ["hello"]}}'  

You may consider running this service in an "on-demand" fashion via Docker for cost-savings in production settings.

REST API

  1. Clone the ColiVara repository

git clone https://github.com/tjmlabs/ColiVara
  1. Create a .env.dev file in the root directory with the following variables:

EMBEDDINGS_URL="the serverless embeddings service url" # for local setup use http://localhost:8000/runsync/
EMBEDDINGS_URL_TOKEN="the serverless embeddings service token"  # for local setup use any string will do.
AWS_S3_ACCESS_KEY_ID="an S3 or compatible storage access key"
AWS_S3_SECRET_ACCESS_KEY="an S3 or compatible storage secret key"
AWS_STORAGE_BUCKET_NAME="an S3 or compatible storage bucket name"
  1. Run all the services via docker-compose

``bash
docker-compose up -d --build
docker-compose exec web python manage.py migrate
docker-compose exec web python manage.py createsuperuser
# get the token from the superuser creation
docker-compose exec web python manage.py shell
from accounts.models import CustomUser
user = CustomUser.objects.first().token # save this token
```
  1. Application will be running at http://localhost:8001 and the swagger documentation at http://localhost:8001/v1/docs

  2. The swagger documentations page is also a playground - where you can try all the endpoints using the token created earlier

Development

Follow the steps above to get the service up and running.

  1. To run tests and type checking - we have 100% test coverage

    docker-compose exec web pytest 
    #mypy for type checking
    docker-compose exec web mypy .
  2. Make a branch with your changes and additional code

  3. Open a Pull request on Github. We have CI/CD and pre-commit hooks to format and test your changes

We welcome contribution and discussion.

Last updated