Architecture
The main code libraries you will use for vector search with TileDB are:
- TileDB-Vector-Search: The Vector Search library, exposing both Python and C++ APIs.
- TileDB Embedded: A C++ library (exposing C and C++ APIs) implementing the core array engine of TileDB.
- TileDB-Py: A Python wrapper of TileDB Embedded.
- TileDB-Cloud-Py: A Python client for TileDB Cloud.
The following figure outlines the interactions among the above libraries. TileDB Embedded is responsible for the bulk of interactions with the preferred backend, which can be either an object store (such as Amazon S3, Google Cloud Storage, Azure Blob Storage, or MinIO), or TileDB Cloud. TileDB-Vector-Search is built on top of TileDB Embedded, leveraging the power of arrays. TileDB-Cloud-Py is a useful client for TileDB Cloud, which implements TileDB Cloud functionality (such as authentication) and some features specific to Vector Search (such as object ingestion). TileDB-Py is the Python wrapper for TileDB Embedded.
The following figure describes a typical TileDB-Vector-Search workflow, when interacting directly with object stores. The workflow starts with a set of vectors stored on the object store. You then use TileDB-Vector-Search to ingest the vectors into a TileDB vector search index. Finally, you can use TileDB-Vector-Search to efficiently query the data directly from the object store, without having to download any large files to local storage
For scalable and secure vector search data management, you should use TileDB Cloud, as demonstrated in the figure below. The vectors are stored in an object store supported by TileDB Cloud. You can invoke distributed ingestion using TileDB-Vector-Search, which leverages the scalable computational power of TileDB Cloud to perform it in a highly parallel fashion across numerous cloud workers. Then you can use TileDB-Vector-Search to query the data, in a similar scalable fashion.
TileDB Cloud also offers a broad spectrum of governance and application-building features.