Python SDK

Using the SDK

The SDK can be installed:

pip install colivara_py

Then imported into your project:

from colivara_py import ColiVara

Essential Methods

search: Sends a query to the server.

The query species which collection to search within, the number of top results to return, and optional filters to refine the search. It returns the most relevant results based on the given parameters.

Parameters

  • query (str): The search query string. This value cannot be null or empty.

  • collection_name (str, optional): The name of the collection to search within. Defaults to "all", which searches across all collections.

  • top_k (int, optional): Specifies the maximum number of results to return. Defaults to 3.

  • query_filter (Dict[str, Any], optional): An optional filter to narrow down search results. Read more about Advance Filtering options here.

    • on (str): Specifies whether the filter applies to "document" or "collection".

    • key (str, List[str]]): A single key or a list of keys to match.

    • value (str, int, float, bool, List[str, int, float, bool]): The value(s) to match for the specified key(s).

    • lookup (str): Defines the matching condition. Options include "key_lookup", "contains", "contained_by", "has_key", "has_keys", and "has_any_keys".

Returns

  • QueryOut: An object that includes the search query and a list of relevant pages based on the specified parameters.

Exceptions

  • ValueError: Raised if the query is empty, the specified collection_name does not exist, or the query_filter is improperly configured.

Example

# searches for pages within the "my_collection" collection that contains
# content related to "Gemini" and are categorized under "AI".
# returns 5 tops results
results = client.search(
    query="Gemini",
    collection_name=my_collection,
    top_k=5,
    query_filter={
      "on": "document",
      "key": "category",
      "value": "AI",
      "lookup": "contains"
    }
)
create_collection:Creates a new collection

A collection is a storage within ColiVara server that your documents could be uploaded into for search purposes. A collection is created with a specified name and optional metadata.

Parameters

  • name (str): The name of the collection to be created. This value cannot be null or empty.

  • metadata (Dict[str, Any], optional): A dictionary containing metadata for the new collection. This is optional and can include any relevant key-value pairs.

Returns

  • CollectionOut: An object representing the newly created collection, including details such as its name and metadata.

Exceptions

  • Exceptionwith the message Conflict error: Raised if there’s a conflict, such as when a collection with the same name already exists.

Example

# searches for pages within the "my_collection" collection that contains
# content related to "Gemini" and are categorized under "AI".
# returns 5 tops results
results = client.search(
    query="Gemini",
    collection_name=my_collection,
    top_k=5,
    query_filter={
      "on": "document",
      "key": "category",
      "value": "AI",
      "lookup": "contains"
    }
)
upsert: Adds or updates a document.

This operation also supports adding metadata and providing document content through a URL, a base64-encoded string, or a file path.

Parameters

  • name (str): The name of the document to be added or updated. This value cannot be null.

  • metadata (Dict[str, Any], optional): Additional metadata for the document, such as tags or descriptive information.

  • collection_name (str, optional): The collection to add the document to. Defaults to "default collection".

  • document_url (str, optional): The URL of the document if it’s available online.

  • document_base64 (str, optional): The document content encoded in base64.

  • document_path (str, optional): The file path to the document, which will be read and converted to base64.

  • wait (bool, optional): If True, the method will be synchronous, which mean it will wait for the document processing to complete before returning, making . The default for this value is False, making asynchronous the default behavior .

Returns

  • DocumentOut: An object containing details of the created or updated document. This is returned for synchronous processing

  • GenericMessage: A message object returned if the document is accepted for processing .This is returned for asynchronous processing

Exceptions

  • ValueError: Raised if no valid document source (URL, base64, or file path) is provided, or if there is an issue with the file path.

  • FileNotFoundError: Raised if the specified file path does not exist.

  • PermissionError: Raised if there is no read permission for the specified file.

Example

# This code synchronously adds/updates an "AI Research Paper" document  in the "AI_Papers" collection
document = client.upsert_document(
    name="AI_Research_Paper",
    metadata={
        "category": "Machine Learning",
        "year": "2024",
        "author": "Dr. AI Researcher"
    },
    collection_name="AI_Papers",
    document_path="/path/to/AI_Research_Paper.pdf",
    wait=True
)
create_embedding: Generates embeddings on the processed images or text

Embeddings are vector representations of the data. This method can generate vectors on either image data, or on a text query. After generation, these vectors comparison is processed to generate query result.

Parameters

  • input_data (str, List[str]): A single string or a list of strings representing the data for which embeddings need to be generated. This could be text (for a query) or paths to image files.

  • task (str,, optional): Specifies the type of embedding task.

    • Acceptable values are "query" (default) for text queries or "image" for images.

Returns

  • EmbeddingsOut: An object containing the generated embeddings, along with information about the model and usage data.

Exceptions

  • ValueError: Raised if an invalid task type is provided (i.e., not "query" or "image") or if the input data is improperly formatted.

Example

# Create embeddings for a text query
text_embeddings = client.create_embedding(
  input_data="What is artificial intelligence?", 
  task="query")

# Create embeddings for a list of image paths
image_paths = 
image_embeddings = client.create_embedding(
  input_data=["image1.jpg", "image2.jpg"], 
  task="image")

Collection Manipulation Methods

get_collection: Retrieves a collection

Parameters

  • collection_name (str): The name of the collection to retrieve. This value cannot be null or empty and must match an existing collection.

Returns

  • CollectionOut: An object representing the retrieved collection, including details such as its name and metadata.

Exceptions

  • Exceptionwith message Collection not found : Raised if the specified collection does not exist.

Example

# retrieves the "AI_Research_Papers" collection
collection = client.get_collection(collection_name="AI_Research_Papers")
list_collections:Retrieves a list of all collections available to the user

Parameters

Returns

  • List[CollectionOut]: A list of CollectionOut objects, each representing a collection with details such as name and metadata.

Exceptions

  • ValueError: Raised if the server response format is unexpected (e.g., not a list).

Example

# Retrieve the list of all collections
collections = client.list_collections()
partial_update_collection: Partially updates a collection's metadata.

Only the fields provided in the parameters will be updated. Metadata already exists for the collection but was not provided will not be removed.

Parameters

  • collection_name (str): The name of the collection to update. This value cannot be null.

  • name (str, optional): A new name for the collection, if you wish to rename it.

  • metadata (Dict[str, Any], optional): New metadata for the collection. This replaces or adds to the existing metadata.

Returns

  • CollectionOut: An object representing the updated collection, including details such as its name and metadata.

Exceptions

  • Exceptionwith message Collection not found : Raised if the specified collection does not exist.

Example

# updates the "AI_Projects" collection name to "AI_Research_Projects" and adds more metadata
updated_collection = client.partial_update_collection(
    collection_name="AI_Projects",
    name="AI_Research_Projects",
    metadata={
      "updated_by": "admin", 
      "status": "active"}
)
delete_collection:Removes a collection

The collection will be deleted from ColiVara server. This action is permanent and final.

Parameters

  • collection_name (str): The name of the collection to delete. This must match an existing collection.

Returns

Exceptions

  • Exceptionwith message Collection not found : Raised if the specified collection does not exist.

Example

# Delete the specified collection
client.delete_collection(collection_name="Obsolete_Collection")

Document Manipulation Methods

get_document: Retrieves a document

Parameters

  • document_name (str): The name of the document to retrieve.

  • collection_name (str, optional): The name of the collection containing the document. Defaults to "default collection".

  • expand (str, optional): if the value is "pages",the method will include the document’s pages.

Returns

  • DocumentOut: An object containing details of the retrieved document, including details such as its name and metadata.

Exceptions

  • ValueError with message Document not found : Raised if the document is not found because the document name does not exist

Example

# retrieves a document with its pages included
document = client.get_document(
    document_name="Research_Paper",
    collection_name="AI_Research",
    expand=["pages"]
)
list_documents:Retrieves a list of all documents in a collection or in all collections available to the user

Parameters

  • collection_name (str, optional): The name of the collection to fetch documents from. Defaults to "default collection".

    • Use "all" to fetch documents from all collections.

  • expand (str, optional): if the value is "pages",the method will include all the documents’ pages.

Returns

  • List[DocumentOut]: A list of DocumentOut objects, each representing a document with its details.

Exceptions

Example

# Retrieves all documents in the "AI_Research" collection, including pages for each document
documents = client.list_documents(
  collection_name="AI_Research", 
  expand="pages")
partial_update_document: Partially updates a document

This method can update either or both the document's content or metadata. Only the fields provided in the parameters will be updated. Metadata already exists for the document but was not provided will not be removed.

Parameters

  • document_name (str): The name of the document to be updated.

  • name (str, optional): A new name for the document, if renaming.

  • metadata (Dict[str, Any], optional): Updated metadata for the document.

  • collection_name (str, optional): The new collection name if you wish to move the document to a different collection.

  • document_url (str, optional): A new URL for the document content, if changing.

  • document_base64 (str, optional): A new base64-encoded string of the document content, if changing.

Returns

  • DocumentOut: An object containing details of the retrieved document, including details such as its name and metadata.

Exceptions

  • ValueError with message Update failed: Raised if the document is not found or if there is an issue with the update

Example

# Update a document's name from "Research_Paper" to "Updated_Research_Paper"
# Also updates its metadata, and URL
updated_document = client.partial_update_document(
    document_name="Research_Paper",
    name="Updated_Research_Paper",
    metadata={
      "author": "Dr. AI Researcher", 
      "year": "2024"
    },
    document_url="https://example.com/updated_paper.pdf"
)
delete_document:Removes a document

The document to be deleted can be identified from a specific collection, or from all collections. The document will be deleted from ColiVara server. This action is permanent and final.

Parameters

  • document_name (str): The name of the document to delete.

  • collection_name (str, optional): The name of the collection containing the document.

    • Defaults to "default collection".

    • Use "all" to access documents across all collections belonging to the user.

Returns

Exceptions

  • ValueError: Raised if the document does not exist or if there is an issue with the deletion

Example

# Deletes the "Old_Report" document from the "Archived_Documents" collection
client.delete_document(
  document_name="Old_Report", 
  collection_name="Archived_Documents"
)

Other Methods

file_to_base64: Converts file content to a base64-encoded string

This method is useful to update a document's content - if the document is short or has only 1 page - by first converting the file content into a base64 string, then submitted this string as a parameter.

Parameters

  • file_path (str): The path to the file you want to convert.

Returns

  • str: A base64-encoded string representing the file's content

Exceptions

  • Exception: Raised if there’s an error during the file reading or encoding process.

Example

# Converts the contents of document.pdf to a base64 string.
base64_string = client.file_to_base64("/path/to/document.pdf")
file_to_imgbase64:Convert a file into a list of base64 strings, each represents a page from the document

This method is useful to update a document's content if the document has multiple pages

Parameters

  • file_path (str): The path to the file you want to convert.

Returns

  • List[FileOut]: A list of FileOut objects, each containing a base64-encoded string of an image, and the page number within the document.

Exceptions

  • Exception: Raised if there’s an error during the file reading or encoding process.

Example

# Converts the contents of multi_page_document.pdf to a list of base64 string
base64_images = client.file_to_imgbase64("/path/to/multi_page_document.pdf")

Last updated