Compression Performance

Learn about the factors to consider when implementing array compression, as well as why you should use compression for your arrays.

Note

Before reading onward, make sure to visit the following sections:

You should always consider compression when creating an array, as it can reduce the array's storage consumption. It may also reduce the average read query time when the I/O savings from fetching less data from storage outweigh the computational overhead of decompression. TileDB applies compression at the tile level, so tiling affects compression (visit Performance: Tiling for more details on choosing the tile size and shape).
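To make the link between tiling and compression concrete, here is a minimal sketch that compresses the same data in independent fixed-size chunks, the way a tile-level scheme would. It does not use TileDB: `zlib` from the Python standard library stands in for a compression filter, and the helper `compressed_size_per_tile`, the sample data, and the tile sizes are all hypothetical choices made for illustration.

```python
import zlib


def compressed_size_per_tile(data: bytes, tile_bytes: int) -> int:
    """Compress each fixed-size tile independently; return the total compressed size."""
    total = 0
    for start in range(0, len(data), tile_bytes):
        total += len(zlib.compress(data[start:start + tile_bytes]))
    return total


# Hypothetical attribute data: 64 KiB made of a short repeating pattern.
data = bytes(range(16)) * 4096

# Assumed tile sizes in bytes, chosen only to show the trend.
sizes = {tile: compressed_size_per_tile(data, tile) for tile in (64, 1024, 65536)}

for tile, size in sizes.items():
    print(f"tile={tile:>6} B -> total compressed size {size} B")
```

On this repetitive sample, larger tiles give the compressor more context and less per-tile overhead, so the total compressed size shrinks as the tile grows. The trade-off is that a read touching a single cell still has to decompress the whole tile that contains it, which is why tile size should be chosen with the expected queries in mind.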

The choice of compression filter for attributes (in both dense and sparse arrays) and dimensions (applicable only to sparse arrays, for the materialized cell coordinates) depends heavily on the nature of the dataset and, specifically, on the attribute or dimension data type and values. TileDB offers a wide range of compression filters intended to cover a broad range of use cases.

Each of the solutions offered by TileDB comes with strongly recommended defaults, chosen after extensive experimentation. However, your use case and datasets may differ. You should therefore always test empirically which compression filter works best for which dimensions and attributes before defining your final array schema and ingesting large quantities of data into your arrays.
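An empirical test of this kind can be as simple as running a few candidate codecs over a representative sample of each dimension and attribute, then comparing compression ratio and decompression time. The sketch below assumes nothing about TileDB's API: the codecs are standard-library stand-ins (`zlib`, `bz2`, `lzma`), not TileDB filters, and the sample names `sorted_coords` and `noisy_values` are hypothetical placeholders for your own data.

```python
import bz2
import lzma
import random
import struct
import time
import zlib

# Stand-in codecs from the Python standard library; these are NOT TileDB filters.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def evaluate(sample: bytes) -> dict:
    """Return {codec_name: (compression_ratio, decompress_seconds)} for one sample."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        packed = compress(sample)
        start = time.perf_counter()
        restored = decompress(packed)
        elapsed = time.perf_counter() - start
        if restored != sample:
            raise ValueError(f"{name} did not round-trip the sample")
        results[name] = (len(sample) / len(packed), elapsed)
    return results


# Hypothetical samples: sorted int64 coordinates and random float64 values.
n = 20_000
rng = random.Random(0)  # fixed seed so the sample is reproducible
samples = {
    "sorted_coords": struct.pack(f"<{n}q", *range(0, 3 * n, 3)),
    "noisy_values": struct.pack(f"<{n}d", *(rng.random() for _ in range(n))),
}

report = {column: evaluate(sample) for column, sample in samples.items()}

for column, results in report.items():
    for codec, (ratio, seconds) in results.items():
        print(f"{column:>13} {codec:>4}: ratio={ratio:6.2f} decompress={seconds * 1e3:7.2f} ms")
```

The two samples illustrate why the data type and values matter: the sorted coordinates are highly regular and compress well, while the random floating-point values leave a general-purpose codec little to exploit. With real data, replace the samples with slices of your own dimensions and attributes and the stand-in codecs with the filters you are actually considering.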