Distributed Compute
This feature is available only on TileDB Cloud.
TileDB-Vector-Search is designed to handle large-scale vector datasets by leveraging TileDB Cloud distributed task graphs. This allows you to run TileDB-Vector-Search workloads (namely scalable ingestion and similarity searches) in a distributed manner across numerous machines that execute tasks in parallel.
The way that TileDB-Vector-Search uses these machines is algorithm-specific. As a brief summary:
FLAT and VAMANA: TileDB-Vector-Search can run ingestion and queries on only a single machine at a time. This is still useful because TileDB Cloud task graphs let you run the workload on a more powerful machine than your laptop.
IVF_FLAT: TileDB-Vector-Search can parallelize ingestion and queries across a distributed group of machines running at the same time. This lets you scale your workloads to billions of vectors. To use this feature most effectively, set partitions such that \(k\)-means clustering creates enough clusters to parallelize the ingestion and query workload across several machines. For example, if you set partitions to 1, only a single cluster is created and ingestion cannot be parallelized.
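The relationship between partitions and available parallelism can be illustrated with a small, library-independent sketch. It is not how TileDB-Vector-Search schedules work internally: the helper names assign_partitions and busy_workers are hypothetical, and the round-robin assignment is an assumption chosen only to show that the number of clusters caps how many machines can be kept busy.

```python
from typing import List


def assign_partitions(num_partitions: int, num_workers: int) -> List[List[int]]:
    """Distribute partition ids across workers round-robin.

    Hypothetical helper for illustration only; the real scheduler
    may distribute work differently.
    """
    if num_partitions < 1 or num_workers < 1:
        raise ValueError("num_partitions and num_workers must be >= 1")
    assignments: List[List[int]] = [[] for _ in range(num_workers)]
    for partition_id in range(num_partitions):
        assignments[partition_id % num_workers].append(partition_id)
    return assignments


def busy_workers(assignments: List[List[int]]) -> int:
    """Count the workers that received at least one partition."""
    return sum(1 for partition_ids in assignments if partition_ids)


# With a single partition, three of the four workers stay idle.
print(busy_workers(assign_partitions(1, 4)))
# With 16 partitions, all four workers have work to do.
print(busy_workers(assign_partitions(16, 4)))
```

In this model, the number of busy workers never exceeds the number of partitions, which is why a very small partitions value limits how far IVF_FLAT ingestion and queries can scale out.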
The Tutorials: Distributed Compute section includes examples of using task graphs in TileDB-Vector-Search.
For more information on task graphs, visit the Catalog: Task Graphs section.