Vector Search: Distributed Compute

Learn how you can leverage TileDB to perform scalable ingestion and search on vectors.

Note

This feature is available only on TileDB Cloud.

TileDB-Vector-Search is designed to handle large-scale vector datasets by leveraging TileDB Cloud distributed task graphs. Task graphs let you run TileDB-Vector-Search workloads (namely ingestion and similarity search) across many machines that execute tasks in parallel.

The way that TileDB-Vector-Search uses these machines is algorithm-specific. As a brief summary:

FLAT and VAMANA: TileDB-Vector-Search runs ingestion and queries on a single machine at a time. This is still useful, because TileDB Cloud task graphs let you run the workload on a more powerful machine than your laptop.
IVF_FLAT: TileDB-Vector-Search can parallelize ingestion and queries across a distributed group of machines running at the same time, which lets you scale workloads to billions of vectors. To use this feature effectively, set partitions high enough that \(k\)-means clustering creates enough clusters to spread the ingestion and query workload across several machines. For example, if you set partitions to 1, only a single cluster is created and ingestion cannot be parallelized.
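
To make the effect of partitions on parallelism concrete, here is a minimal sketch in plain Python. It is not the TileDB-Vector-Search scheduler: the function name assign_partitions and the round-robin policy are assumptions made for illustration only. The sketch shows that the number of machines that can do useful work is capped by the number of partitions.

```python
from typing import List


def assign_partitions(num_partitions: int, num_workers: int) -> List[List[int]]:
    """Spread partition ids over workers round-robin.

    Hypothetical helper for illustration; the real assignment policy used by
    TileDB-Vector-Search is not described here and may differ.
    Workers that receive no partition are dropped from the result.
    """
    if num_partitions < 1 or num_workers < 1:
        raise ValueError("num_partitions and num_workers must be >= 1")

    buckets: List[List[int]] = [[] for _ in range(num_workers)]
    for partition_id in range(num_partitions):
        buckets[partition_id % num_workers].append(partition_id)

    # Keep only workers that actually have something to process.
    return [bucket for bucket in buckets if bucket]


# With a single partition, only one of the four workers has work to do.
print(len(assign_partitions(1, 4)))   # 1

# With 16 partitions, all four workers are busy with four partitions each.
print(len(assign_partitions(16, 4)))  # 4
```

Under this simplified model, adding machines beyond the number of partitions brings no benefit, which is why a partitions value of 1 leaves the workload on a single machine.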

The Tutorials: Distributed Compute section includes examples of using task graphs in TileDB-Vector-Search.

For more information on task graphs, visit the Catalog: Task Graphs section.