Summary of Factors
Here, you can find a summary of the most important performance factors of TileDB, including the following broad topics:
Array storage
You must consider the following factors when designing the array schema. In general, designing an effective schema that results in high performance depends on the type of data that the array will store, and the type of queries that users will run on the array.
Dense vs. sparse arrays
Choosing a dense or sparse array representation depends on the particular data and use case. Sparse arrays in TileDB require extra upkeep versus dense arrays because they must store the explicit coordinates of the non-empty cells. Depending on your use case, the storage savings of a sparse array, which does not materialize fill values for empty cells as a dense array would, may outweigh that extra upkeep; in that case, a sparse array is a good choice. Sparse array reads also incur extra runtime overhead, since TileDB may need to sort the cells based on their coordinates.
For more information, visit Dense vs. Sparse.
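As an illustration, the following TileDB-Py sketch defines a dense and a sparse flavor of the same 2D schema; the array names, domain sizes, and tile extents are hypothetical.

```python
import numpy as np
import tiledb

# Same 2D domain for both flavors; the sparse array stores explicit
# coordinates for non-empty cells, while the dense array materializes
# (and stores fill values for) every cell in each written tile.
dom = tiledb.Domain(
    tiledb.Dim(name="rows", domain=(0, 9_999), tile=100, dtype=np.int64),
    tiledb.Dim(name="cols", domain=(0, 9_999), tile=100, dtype=np.int64),
)
attr = tiledb.Attr(name="value", dtype=np.float64)

dense_schema = tiledb.ArraySchema(domain=dom, attrs=[attr], sparse=False)
sparse_schema = tiledb.ArraySchema(domain=dom, attrs=[attr], sparse=True)

tiledb.Array.create("my_dense_array", dense_schema)
tiledb.Array.create("my_sparse_array", sparse_schema)
```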
Space tile extent
In dense arrays, the space tile shape and size determine the atomic unit of I/O and compression. TileDB reads from disk, in their entirety, all space tiles that intersect a read query’s subarray, even those that only partially overlap the query. Matching the space tile shape and size to the typical read query can greatly improve performance. In sparse arrays, space tiles affect the order of the non-empty cells and, thus, the shape and size of the minimum bounding rectangles (MBRs).
In sparse arrays, TileDB reads from disk all data tiles whose MBRs intersect a read query’s subarray. Choosing space tiles such that the resulting MBRs have a shape similar to the typical read access pattern can improve read performance, since fewer MBRs may then overlap with the typical read query.
For more information, visit Tiling and Data Layout.
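For example, if read queries on a dense array typically fetch a few whole rows at a time, row-shaped space tiles tend to match those reads better than square tiles. The extents below are a rough sketch, not a recommendation; the right values depend on your data and access pattern.

```python
import numpy as np
import tiledb

# Reads typically fetch a handful of full rows, so each space tile spans
# a small row block across all columns (illustrative extents only).
dom = tiledb.Domain(
    tiledb.Dim(name="rows", domain=(0, 999_999), tile=4, dtype=np.int64),
    tiledb.Dim(name="cols", domain=(0, 999), tile=1_000, dtype=np.int64),
)
schema = tiledb.ArraySchema(
    domain=dom,
    attrs=[tiledb.Attr(name="value", dtype=np.float32)],
    sparse=False,
)
```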
Tile capacity
The tile capacity determines the number of cells in a data tile of a sparse fragment. A data tile, as you may recall, is the atomic unit of I/O and compression. If the capacity is too small, the data tiles may also be too small, requiring many small, independent read operations to fulfill a particular read query; very small tiles also tend to compress poorly. If the capacity is too large, I/O operations become fewer and larger (which is generally good), but the amount of wasted work may increase, since TileDB reads more data than necessary to fulfill the typical read query.
For more information, visit Tiling and Data Layout.
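A minimal sketch of setting the tile capacity on a sparse schema in TileDB-Py; the capacity value is illustrative and should be tuned to your typical read size.

```python
import numpy as np
import tiledb

dom = tiledb.Domain(
    tiledb.Dim(name="x", domain=(0, 10**9), tile=10_000, dtype=np.int64)
)
schema = tiledb.ArraySchema(
    domain=dom,
    attrs=[tiledb.Attr(name="value", dtype=np.float64)],
    sparse=True,
    capacity=100_000,  # cells per data tile (illustrative value)
)
```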
Dimensions vs. attributes
If read queries typically access the entire domain of a particular dimension, consider making that dimension an attribute instead. Use dimensions for fields that read queries will constrain with ranges, so that TileDB can internally prune cell ranges or whole tiles that fall outside the data you want to read.
For more information, visit Dimensions vs. Attributes.
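As a rough sketch (the array URI and field names are hypothetical): a field modeled as a dimension lets TileDB prune tiles outside the requested range, whereas a field modeled only as an attribute is simply returned for whatever cells the dimension slice selects.

```python
import tiledb

# "timestamp" modeled as a dimension: slicing a range lets TileDB prune
# tiles that fall entirely outside [t0, t1).
t0, t1 = 1_600_000_000, 1_600_086_400
with tiledb.open("my_sparse_array", mode="r") as A:
    result = A[t0:t1]

# If "timestamp" were only an attribute, TileDB could not prune by it;
# every read would fetch the attribute values for all selected cells.
```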
Tile filtering
Tile filtering (such as compression) can reduce persistent storage consumption, but at the expense of increased time to read tiles from and write tiles to the array. However, you can read tiles from disk in less time if those tiles are smaller due to compression, which can improve performance. Additionally, many compressors offer different levels of compression, which you can use to fine-tune the tradeoff.
Visit Compression for more recommendations related to compression.
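For example, you can attach a compression filter to an attribute when defining the schema; the sketch below uses Zstd at a moderate level, which is an illustrative choice rather than a recommendation.

```python
import numpy as np
import tiledb

# Zstd at level 5; higher levels trade write time for smaller tiles.
filters = tiledb.FilterList([tiledb.ZstdFilter(level=5)])

dom = tiledb.Domain(
    tiledb.Dim(name="x", domain=(0, 999_999), tile=10_000, dtype=np.int64)
)
schema = tiledb.ArraySchema(
    domain=dom,
    attrs=[tiledb.Attr(name="value", dtype=np.float64, filters=filters)],
    sparse=False,
)
```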
Duplicate cells for sparse arrays
For sparse arrays, if you know the data doesn’t have any entries with the same coordinates, or if you don’t mind getting more than one result for a coordinate, allowing duplicates can lead to significant performance improvements. For queries that request results in the unordered layout, TileDB can process fragments in any order (and fewer of them at once) and skip deduplication of the results, so such queries run much faster and require less memory.
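A minimal sketch of enabling duplicates on a sparse schema in TileDB-Py (domain and attribute are hypothetical):

```python
import numpy as np
import tiledb

dom = tiledb.Domain(
    tiledb.Dim(name="x", domain=(0, 10**9), tile=10_000, dtype=np.int64)
)
schema = tiledb.ArraySchema(
    domain=dom,
    attrs=[tiledb.Attr(name="value", dtype=np.float64)],
    sparse=True,
    allows_duplicates=True,  # skip deduplication work on unordered reads
)
```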
Queries
The following factors impact query performance (that is, when reading data from or writing data to an array).
Number of array fragments
Large numbers of fragments can slow down performance during read queries, as TileDB must fetch a lot of fragment metadata, and internally sort and find potential overlap of cell data across all fragments. Thus, consolidating arrays with many fragments improves performance, since consolidation collapses a set of fragments into a single one.
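For example, with TileDB-Py you can consolidate an array’s fragments and then vacuum the fragments made redundant by consolidation; the URI is hypothetical.

```python
import tiledb

tiledb.consolidate("my_array")  # merge many fragments into one
tiledb.vacuum("my_array")       # remove the now-redundant consolidated fragments
```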
Attribute subsetting
A benefit of the column-oriented nature of attribute storage in TileDB is that if a read query does not require data for some attributes in the array, TileDB won’t touch or read those attributes from disk at all. Thus, specifying only the attributes you need for each read query will improve performance. In write queries, you must still specify values for all attributes.
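A small sketch of subsetting attributes on a read with TileDB-Py (array URI and attribute name hypothetical):

```python
import tiledb

with tiledb.open("my_array", mode="r") as A:
    # Only the "value" attribute is fetched from storage; the array's
    # other attributes are not read at all.
    data = A.query(attrs=["value"])[0:100]
```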
Read layout
When you use an ordered layout (row-major or column-major) for a read query, and that layout differs from the array physical layout, TileDB must internally sort and reorganize the cell data it reads from the array into the requested order. If you don’t care about the particular ordering of cells in the result buffers, you can specify a global order query layout, allowing TileDB to skip the sorting and reorganization steps.
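For instance, TileDB-Py exposes the layout through the order argument of Array.query; 'U' (unordered) and 'G' (global order) avoid the re-sorting that 'C' (row-major) or 'F' (column-major) may require. The URI below is hypothetical.

```python
import tiledb

with tiledb.open("my_array", mode="r") as A:
    # No particular cell order is requested, so TileDB can skip the
    # sort/reorganization step of a row- or column-major read.
    data = A.query(attrs=["value"], order="U")[0:100]
```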
Write layout
Similarly, if your specified write layout differs from the physical layout of your array (that is, the global cell order), then TileDB must internally sort and reorganize the cell data into global order before writing to the array. If your cells are already arranged in global order, TileDB can skip the sorting and reorganization step when you issue a write query with global order. Additionally, many consecutive writes in global order to an open array append to a single fragment rather than creating a new fragment per write, which can improve later read performance. You must take care with global-order writes, as it is an error to write data to a subarray that is not aligned with the space tiling of the array.
Open and close arrays
Before issuing queries, you must first “open” an array. When you open an array, TileDB reads and decodes the array and fragment data, which may require disk operations, decompression, or both. You can open an array once and issue many queries to the array before closing. Minimizing the number of array open and close operations can improve performance.
That being said, when reading from an array, TileDB fetches the metadata of the relevant fragments from the backend and caches it in memory. The more queries you perform, the more metadata TileDB fetches and retains over time. To reclaim that memory, you will need to occasionally close and reopen the array.
Some TileDB APIs support closing and reopening the array in a single command, which can be faster than closing and reopening the array in two separate steps. For example, TileDB-Py has a reopen() method.
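A rough sketch of amortizing the open cost over many reads, then using reopen() rather than a separate close/open pair (array URI hypothetical):

```python
import tiledb

A = tiledb.open("my_array", mode="r")     # open once...
for start in range(0, 1_000, 100):
    _ = A[start:start + 100]              # ...then issue many queries
A.reopen()                                # refresh the array in a single call
A.close()
```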
Configuration parameters
The following runtime configuration parameters can tune the performance of many internal TileDB subsystems.
Cache size
When TileDB reads a specific tile from disk during a read query, TileDB places the uncompressed tile in an in-memory LRU cache associated with the query’s context. When future read queries in the same context request that tile, TileDB may copy it directly from the cache instead of re-fetching it from disk. The "sm.tile_cache_size" configuration parameter determines the total size in bytes of the in-memory tile cache. Increasing it can lead to a higher cache hit ratio and better performance.
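For example, the cache size can be raised through a configuration object attached to a context; the 100 MB value below is illustrative.

```python
import tiledb

cfg = tiledb.Config({"sm.tile_cache_size": str(100 * 1024 * 1024)})
ctx = tiledb.Ctx(cfg)

with tiledb.open("my_array", mode="r", ctx=ctx) as A:  # hypothetical URI
    data = A[0:100]
```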
Coordinate deduplication
During sparse writes, if you set "sm.dedup_coords" to true, TileDB internally deduplicates the cells it is trying to write, so that it doesn’t attempt to write many cells with the same coordinate tuples (which is an error). If you leave "sm.dedup_coords" as false, TileDB runs a lighter-weight check, controlled with "sm.check_coord_dups" (true by default), in which it searches for duplicates and returns an error if it finds any. Disabling these checks can lead to better write performance.
Coordinate out-of-bounds check
During sparse writes, TileDB internally checks whether the given coordinates fall outside the domain. You can control this check with "sm.check_coord_oob", which is true by default. If you are certain that out-of-bounds coordinates are impossible in your application, you can set this parameter to false, avoiding the check and, thus, boosting performance.
Coordinate global order check
During sparse writes in global order, TileDB internally checks whether the given coordinates obey the global order. You can control this check with "sm.check_global_order", which is true by default. If you are certain that your coordinates always respect the global order, you can set this parameter to false, avoiding the check and, thus, boosting performance.
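A minimal sketch that disables the write-time coordinate checks described in the three subsections above; only do this if you are certain your coordinates are unique, in bounds, and (for global-order writes) correctly ordered.

```python
import tiledb

cfg = tiledb.Config({
    "sm.dedup_coords": "false",        # do not deduplicate cells on write
    "sm.check_coord_dups": "false",    # do not error on duplicate coordinates
    "sm.check_coord_oob": "false",     # do not check for out-of-bounds coordinates
    "sm.check_global_order": "false",  # do not verify global order on writes
})
ctx = tiledb.Ctx(cfg)
```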
Consolidation parameters
You can learn about the effect of the different "sm.consolidation.*" parameters in Consolidation. For a full list of configuration parameters related to consolidation, refer to Configuration: Consolidation and Vacuuming.
Memory budget
The "sm.memory_budget" and "sm.memory_budget_var" parameters control the memory budget, which caps the total number of bytes that TileDB can fetch for each fixed-length or variable-length attribute, respectively, during reads. This can prevent out-of-memory issues when a read query overlaps with many tiles that TileDB must fetch and decompress in memory. For large subarrays, this may lead to incomplete queries.
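A small configuration sketch; the byte values are illustrative.

```python
import tiledb

cfg = tiledb.Config({
    "sm.memory_budget": str(512 * 1024 * 1024),       # fixed-length attributes
    "sm.memory_budget_var": str(1024 * 1024 * 1024),  # variable-length attributes
})
ctx = tiledb.Ctx(cfg)
```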
Concurrency
TileDB controls concurrency with the "sm.compute_concurrency_level" and "sm.io_concurrency_level" parameters. "sm.compute_concurrency_level" is the upper limit on the number of OS threads TileDB will use for compute-bound tasks, whereas "sm.io_concurrency_level" is the upper limit on the number of OS threads TileDB will use for I/O-bound tasks. Both parameters default to the number of logical cores on the system.
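For example, you might cap both thread pools on a shared machine; the value of 8 is illustrative.

```python
import tiledb

cfg = tiledb.Config({
    "sm.compute_concurrency_level": "8",  # compute-bound task threads
    "sm.io_concurrency_level": "8",       # I/O-bound task threads
})
ctx = tiledb.Ctx(cfg)
```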
VFS read batching
During read queries, the VFS will batch reads for distinct tiles that are physically close together (but not necessarily adjacent) in the same file. The "vfs.min_batch_size" parameter sets the minimum size in bytes required for a single batched read operation. The VFS will use this parameter to group “close by” tiles into the same batch, if the new batch size is smaller than or equal to "vfs.min_batch_size". This can help minimize the I/O latency that can come with many, very small VFS read operations. Similarly, "vfs.min_batch_gap" defines the minimum number of bytes between the end of one batch and the start of the next one. If two batches are fewer bytes apart than "vfs.min_batch_gap", TileDB stitches them together into a single batch read operation. This can help better group and parallelize over “adjacent” batch read operations.
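A configuration sketch with illustrative values:

```python
import tiledb

cfg = tiledb.Config({
    "vfs.min_batch_size": str(20 * 1024 * 1024),  # minimum batched read size (bytes)
    "vfs.min_batch_gap": str(500 * 1024),         # merge batches closer than this (bytes)
})
ctx = tiledb.Ctx(cfg)
```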
Cloud object store parallelism
"vfs.<backend>.max_parallel_ops" controls the level of parallelism, with <backend> being any of s3, azure, or gcs, depending on your cloud storage backend. This determines the maximum number of parallel operations for s3://, azure://, or gcs:// URIs independently of the VFS thread pool size, so you can oversubscribe or undersubscribe VFS threads. Oversubscription can sometimes be useful with cloud object stores, to help hide I/O latency.
Cloud object store write size
Replacing "vfs.min_parallel_size" for cloud storage objects, the "vfs.<backend>.multipart_part_size" parameter controls the minimum part size of cloud object storage multi-part writes, with <backend> being any of s3, azure, or gcs, depending on your cloud storage backend.
Note that TileDB buffers \(S \times O\) bytes in memory before submitting a write request, where \(S\) is the value of "vfs.<backend>.multipart_part_size" and \(O\) is the value of "vfs.<backend>.max_parallel_ops". Once this buffer fills, TileDB issues the multipart write in parallel.
Python multiprocessing
TileDB’s Python integration works well with Python’s multiprocessing package and the ProcessPoolExecutor and ThreadPoolExecutor classes of the concurrent.futures package. You can find a large usage example demonstrating parallel CSV ingestion in the TileDB-Inc/TileDB-Py repo. You can run this example in either ThreadPool or ProcessPool mode.
The default multiprocessing start method for ProcessPoolExecutor on Linux is not compatible with TileDB (nor with most other multi-threaded applications) due to complications with the global process state under the fork start method. You must use ProcessPoolExecutor with multiprocessing.set_start_method("spawn") to avoid unexpected behavior, such as hangs and crashes.
For more information about the different start methods, refer to Python’s documentation on contexts and start methods.
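A minimal sketch of combining TileDB-Py with ProcessPoolExecutor and the spawn start method; the array URIs and the worker’s task are hypothetical.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor

import tiledb


def has_data(uri: str) -> bool:
    # Hypothetical worker: open an array and check whether it holds any cells.
    with tiledb.open(uri, mode="r") as A:
        return A.nonempty_domain() is not None


if __name__ == "__main__":
    multiprocessing.set_start_method("spawn")  # avoid fork-related hangs/crashes
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(has_data, ["array_1", "array_2"]))
```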
System parameters
Hardware parallelism
The number of cores and hardware threads of the machine affects the amount of parallelism TileDB can use internally to speed up reads, writes, compression, and decompression.
Storage backends
The different types of storage backend (cloud object stores and local disk) have different throughput and latency characteristics, which can impact query time.