Parallelism
TileDB’s design supports parallelized read and write queries. This document outlines parallelized reads and writes and the configuration parameters you can use to control the amount of parallelization for both. Visit Configuration for a full list of configuration parameters and more details about each of the parameters mentioned here.
Reads
When you perform a read query, TileDB goes through the following steps in this order:
- Identify the physical attribute data tiles relevant to the query, and prune the rest.
- Perform parallel I/O to retrieve those tiles from the storage backend.
- Unfilter the data tiles in parallel to get the raw cell values and coordinates.
- Perform a refining step to get the actual results and organize them in the query layout.
TileDB parallelizes all steps, but the following subsections focus on steps 2 and 3, which are the most heavyweight.
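To make the pipeline concrete, here is a minimal sketch of a read query using the TileDB Python API. It assumes a hypothetical 2-D dense array at the URI my_array; submitting the slice is what triggers the four steps above.

import tiledb

# Opening the (hypothetical) array for reading and slicing it submits a read
# query, which goes through tile pruning, parallel I/O, parallel unfiltering,
# and result organization in the query layout.
with tiledb.open("my_array", mode="r") as A:
    data = A[0:100, 0:100]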
Parallel I/O
TileDB reads the relevant tiles of all attributes involved in the query in parallel, based on the following pseudocode:
parallel_for_each attribute being read:
    parallel_for_each relevant tile of the attribute being read:
        prepare a byte range to be read (I/O task)
TileDB computes the byte ranges required to be fetched from each attribute file. Those byte ranges might be disconnected, and there might be many of them, especially with multi-range subarrays. To reduce the latency of the I/O requests (especially on cloud object stores), TileDB tries to merge byte ranges that are close to each other and dispatch fewer, larger I/O requests instead of many smaller ones. More specifically, TileDB merges two byte ranges if their gap size is smaller than "vfs.min_batch_gap" and their resulting size is smaller than "vfs.min_batch_size". Then, each byte range (always corresponding to the same attribute file) becomes an I/O task. TileDB dispatches these I/O tasks for concurrent execution, where the "sm.io_concurrency_level" configuration parameter controls the maximum level of concurrency.
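As a hedged illustration, the following Python sketch sets the parameters mentioned above on a context before reading; the numeric values are illustrative assumptions, not recommendations, and my_array is again a hypothetical URI.

import tiledb

cfg = tiledb.Config()
cfg["vfs.min_batch_gap"] = str(512 * 1024)          # gap threshold for merging byte ranges
cfg["vfs.min_batch_size"] = str(20 * 1024 * 1024)   # size threshold for merged byte ranges
cfg["sm.io_concurrency_level"] = "8"                # maximum number of concurrent I/O tasks
ctx = tiledb.Ctx(cfg)

with tiledb.open("my_array", mode="r", ctx=ctx) as A:
    data = A[:]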
TileDB may further partition each byte range to be fetched based on the values of "vfs.file.max_parallel_ops" (for POSIX and Windows systems), "vfs.<backend>.max_parallel_ops" (for cloud object stores, where <backend> is either s3, azure, or gcs), and "vfs.min_parallel_size". TileDB then reads those partitions in parallel.
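The partitioning knobs can be set in the same way. The following sketch uses illustrative values and assumes an S3 backend for the <backend> placeholder.

import tiledb

cfg = tiledb.Config()
cfg["vfs.min_parallel_size"] = str(10 * 1024 * 1024)  # minimum size of a single partition
cfg["vfs.file.max_parallel_ops"] = "4"                # POSIX/Windows: max parallel operations per range
cfg["vfs.s3.max_parallel_ops"] = "4"                  # S3: max parallel operations per range
ctx = tiledb.Ctx(cfg)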
Parallel tile unfiltering
Once the relevant data tiles are in main memory, TileDB unfilters them (that is, it runs the filters applied during writes in reverse) in parallel in a nested manner, based on the following pseudocode:
for_each attribute being read:
    parallel_for_each tile of the attribute being read:
        parallel_for_each chunk of the tile:
            unfilter chunk
The TileDB filter list has a parameter that controls the size of each “chunk” of a tile. This parameter defaults to 64 KB.
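For example, in the Python API the chunk size can be set when building a filter list for an attribute. The attribute name and filter choice below are illustrative assumptions.

import numpy as np
import tiledb

# 64 KB chunks (the default mentioned above), with an illustrative Zstd filter.
filters = tiledb.FilterList([tiledb.ZstdFilter(level=5)], chunksize=64 * 1024)
attr = tiledb.Attr(name="a", dtype=np.float64, filters=filters)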
The "sm.compute_concurrency_level"
configuration parameter affects these for
loops, although it is not recommended to modify this configuration parameter from its default setting. The nested parallelism in reads allows for maximum usage of the available cores for filtering (such as compression and decompression), in either the case where the query intersects fewer, larger tiles or more, smaller tiles.
Writes
When you perform a write query, TileDB goes through the following steps in this order:
- Reorganize the cells in the global cell order and group them into attribute data tiles.
- Filter the attribute data tiles to be written.
- Perform parallel I/O to write those tiles to the storage backend.
TileDB parallelizes all steps, but the following subsections focus on steps 2 and 3, which are the most heavyweight.
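Here is a minimal sketch of a write query in the Python API, again assuming a hypothetical single-attribute 2-D dense array at my_array; submitting the assignment triggers the three steps above.

import numpy as np
import tiledb

with tiledb.open("my_array", mode="w") as A:
    # The assignment submits a write query: cells are reorganized, tiles are
    # filtered in parallel, and the filtered tiles are written in parallel.
    A[0:100, 0:100] = np.random.rand(100, 100)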
Parallel tile filtering
TileDB uses a similar strategy for writes as it does for reads:
parallel_for_each attribute being written:
    for_each tile of the attribute being written:
        parallel_for_each chunk of the tile:
            filter chunk
Similar to reads, the "sm.compute_concurrency_level" configuration parameter affects these for loops, although it is not recommended to modify this configuration parameter from its default setting.
Parallel I/O
Similar to reads, TileDB creates I/O tasks for each tile of every attribute. TileDB then dispatches these I/O tasks for concurrent execution, where the "sm.io_concurrency_level" configuration parameter controls the maximum level of concurrency.
For POSIX and Windows, if a data tile is large enough, the VFS layer partitions the tile based on the configuration parameters "vfs.file.max_parallel_ops" and "vfs.min_parallel_size". TileDB then writes those partitions in parallel by using the VFS thread pool. The "vfs.io_concurrency" configuration parameter controls the size of the VFS thread pool.
For cloud object stores, TileDB buffers potentially many tiles and issues parallel, multipart upload requests to cloud storage. The size of the buffer is equal to \(S \times O\), with \(S\) being the value of "vfs.<backend>.multipart_part_size", \(O\) being the value of "vfs.<backend>.max_parallel_ops", and <backend> being s3, azure, or gcs, depending on your cloud storage backend. When the buffer is filled, TileDB issues "vfs.<backend>.max_parallel_ops" parallel, multipart upload requests to cloud storage.
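As a hedged example of the \(S \times O\) computation, the following Python sketch sets the two S3 parameters to illustrative values (not recommendations), reads them back from the configuration, and multiplies them.

import tiledb

cfg = tiledb.Config()
cfg["vfs.s3.multipart_part_size"] = str(5 * 1024 * 1024)  # S: 5 MB parts (illustrative)
cfg["vfs.s3.max_parallel_ops"] = "8"                      # O: 8 parallel multipart requests (illustrative)

buffer_size = int(cfg["vfs.s3.multipart_part_size"]) * int(cfg["vfs.s3.max_parallel_ops"])
print("S3 write buffer size:", buffer_size, "bytes")      # 5 MB x 8 = 40 MB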