Self-hosting
ColiVara is made up of multiple services. The first is an embedding service that turns images and queries into vectors; it is packaged separately because it needs a GPU. The second is a Postgres database with the pgvector extension, which stores the vectors. The third is a Gotenberg service that converts documents to PDF. The last is a Django-Ninja REST API that handles user requests. Everything except the embedding service is bundled together via docker-compose to run seamlessly on a typical VPS.
For production workloads, you may want to consider a managed Postgres instance for automatic security updates and regular backups.
Embedding Service
Git clone the service repository
git clone https://github.com/tjmlabs/ColiVarE
Optional: download uv and install it in your environment. We use uv, however you can also use pip to install the requirements.
pip install uv
uv venv # or python -m venv .venv
source .venv/bin/activate # .venv/Scripts/activate on Windows
Compile the requirements for your environment. Because this service uses PyTorch under the hood, the requirements differ depending on your OS and Nvidia GPU availability. We use a Mac for development and Linux in production.
uv pip compile builder/requirements.in -o builder/requirements.txt
Install the requirements
uv pip install -r builder/requirements.txt
Download the models from Hugging Face and save them in the models_hub directory before building. See src/download_models.py for more details.
from colpali_engine.models import ColQwen2, ColQwen2Processor
import torch

model_name = "vidore/colqwen2-v1.0"

# Pick the best available device: Nvidia GPU, then Apple Silicon, then the default
if torch.cuda.is_available():
    device_map = "cuda"
elif torch.backends.mps.is_available():
    device_map = "mps"
else:
    device_map = None

model = ColQwen2.from_pretrained(
    model_name,
    cache_dir="models_hub/",  # where to save the model
    device_map=device_map,
)
processor = ColQwen2Processor.from_pretrained(model_name, cache_dir="models_hub/")
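Before building, it can help to confirm that the download actually landed in models_hub. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and it assumes the standard Hugging Face cache layout, in which a repo such as vidore/colqwen2-v1.0 is stored under a folder named models--vidore--colqwen2-v1.0 with a snapshots subfolder.

```python
from pathlib import Path


def hf_cache_folder(repo_id: str) -> str:
    """Folder name the Hugging Face cache uses for a model repo id (assumed layout)."""
    return "models--" + repo_id.replace("/", "--")


def model_is_cached(cache_dir: str, repo_id: str) -> bool:
    """Return True if a non-empty snapshots folder exists for repo_id under cache_dir."""
    snapshots = Path(cache_dir) / hf_cache_folder(repo_id) / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())
```

Calling model_is_cached("models_hub/", "vidore/colqwen2-v1.0") from the repository root should return True once the download has finished. It only checks that the folder exists, not that every weight file is complete.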
Run the service locally using the following command
python3 src/handler.py --rp_serve_api
The embedding service is now running at http://localhost:8000/. You can test it using the following command. Remember that you need a GPU with at least 8 GB of VRAM available. Performance on M-series Macs is also acceptable for local development.
curl --request POST \
--url http://localhost:8000/runsync \
--header 'Content-Type: application/json' \
--data '{"input": {"task": "query","input_data": ["hello"]}}'
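If you prefer to test from Python, the same request can be sketched with the standard library. The URL and payload are taken from the curl command above; the function names are hypothetical, and the response is returned as-is because its exact shape is not described here.

```python
import json
import urllib.request

# URL from the curl example; adjust if your service runs elsewhere.
EMBEDDINGS_URL = "http://localhost:8000/runsync"


def build_query_request(queries, url=EMBEDDINGS_URL):
    """Build (but do not send) the POST request for a "query" embedding task."""
    payload = {"input": {"task": "query", "input_data": list(queries)}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed_queries(queries, url=EMBEDDINGS_URL, timeout=60):
    """Send the request and return the decoded JSON response unchanged."""
    request = build_query_request(queries, url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the service running, embed_queries(["hello"]) sends the same body as the curl command. Building the request separately makes it easy to inspect the payload without a GPU at hand.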
You may consider running this service in an "on-demand" fashion via Docker for cost-savings in production settings.
REST API
Clone the ColiVara repository
git clone https://github.com/tjmlabs/ColiVara
Create a .env.dev file in the root directory with the following variables:
EMBEDDINGS_URL="the serverless embeddings service url" # for local setup use http://localhost:8000/runsync/
EMBEDDINGS_URL_TOKEN="the serverless embeddings service token" # for local setup, any string will do
AWS_S3_ACCESS_KEY_ID="an S3 or compatible storage access key"
AWS_S3_SECRET_ACCESS_KEY="an S3 or compatible storage secret key"
AWS_STORAGE_BUCKET_NAME="an S3 or compatible storage bucket name"
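A missing variable is easy to overlook, so a quick check of .env.dev before starting the stack can save a rebuild. This is a minimal sketch with hypothetical helper names; it assumes the simple KEY="value" format shown above and is not how the application itself loads its settings.

```python
REQUIRED_KEYS = (
    "EMBEDDINGS_URL",
    "EMBEDDINGS_URL_TOKEN",
    "AWS_S3_ACCESS_KEY_ID",
    "AWS_S3_SECRET_ACCESS_KEY",
    "AWS_STORAGE_BUCKET_NAME",
)


def parse_env(text):
    """Parse simple KEY="value" lines; skips blanks and full-line comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if value[:1] in ('"', "'"):
            # Keep only the quoted part so a trailing comment is dropped.
            end = value.find(value[0], 1)
            value = value[1:end] if end != -1 else value[1:]
        else:
            value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def missing_keys(text):
    """Return the required keys that are absent or empty."""
    values = parse_env(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

For example, missing_keys(open(".env.dev").read()) returns an empty list when all five variables are set.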
Run all the services via docker-compose
```bash
docker-compose up -d --build
docker-compose exec web python manage.py migrate
docker-compose exec web python manage.py createsuperuser
# open a Django shell to get the token of the superuser you just created
docker-compose exec web python manage.py shell
# then, inside the Python shell:
from accounts.models import CustomUser
print(CustomUser.objects.first().token)  # save this token
```
The application will be running at http://localhost:8001 and the Swagger documentation at http://localhost:8001/v1/docs. The Swagger documentation page is also a playground where you can try all the endpoints using the token created earlier.
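The containers can take a moment to come up after docker-compose starts them. The following sketch polls the documentation URL given above until it responds. The function names, retry count, and delay are illustrative assumptions, not part of ColiVara.

```python
import time
import urllib.error
import urllib.request

DOCS_URL = "http://localhost:8001/v1/docs"  # from the text above


def http_ok(url, timeout=5):
    """Return True if a GET to url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False


def wait_until_up(url=DOCS_URL, attempts=30, delay=2.0, probe=http_ok, sleep=time.sleep):
    """Poll url until probe succeeds; return the attempt number, or None if it never does."""
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        sleep(delay)
    return None
```

Calling wait_until_up() returns the attempt on which the page first answered, or None if it never did. The probe and sleep parameters exist only so the loop can be exercised without a running server.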

Development
Follow the steps above to get the service up and running.
To run tests and type checking (we have 100% test coverage)
docker-compose exec web pytest
# mypy for type checking
docker-compose exec web mypy .
Make a branch with your changes and additional code
Open a pull request on GitHub. We have CI/CD and pre-commit hooks to format and test your changes.
We welcome contributions and discussion.