Dense vs. Sparse
The first decision when creating a new array is whether it will be dense or sparse. The guidance below can help with this decision.
When to use a dense array
Dense arrays don’t materialize the coordinates of the cells. This allows them to use much more lightweight internal indexing during read queries. Choose this array type when the majority of the cells in your array have meaningful values (that is, they’re nonempty, nonzero, and not null). The benefits are as follows:
- Lower storage consumption, because neither the cell coordinates nor large indexes need to be materialized.
- Faster slicing queries, because TileDB can take advantage of the dense data layout and lightweight indexing.
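To illustrate why a dense array needs no stored coordinates, here is a minimal sketch under simplifying assumptions: a single row-major layout with no tiling or compression, which is not how TileDB lays out data internally. The helpers `row_major_offset` and `slice_2d` are hypothetical names used only for illustration. A cell's position is computed from its coordinates and the domain shape, so a slice reduces to arithmetic plus contiguous reads.

```python
# Simplified model of dense indexing (assumption: one row-major layout,
# no tiling or compression; not TileDB's actual internal format).
# Coordinates are never stored; positions are computed on demand.

def row_major_offset(coords, shape):
    """Return the flat index of the cell at `coords` in an array of `shape`."""
    if len(coords) != len(shape):
        raise ValueError("coords and shape must have the same number of dimensions")
    offset = 0
    for c, extent in zip(coords, shape):
        if not 0 <= c < extent:
            raise IndexError(f"coordinate {c} outside [0, {extent})")
        offset = offset * extent + c
    return offset


def slice_2d(values, shape, rows, cols):
    """Read the half-open block rows[0]:rows[1], cols[0]:cols[1] from `values`.

    Each requested row is one contiguous run located by arithmetic alone.
    """
    if not 0 <= cols[0] <= cols[1] <= shape[1]:
        raise IndexError("column range outside the domain")
    width = cols[1] - cols[0]
    result = []
    for r in range(rows[0], rows[1]):
        start = row_major_offset((r, cols[0]), shape)
        result.append(values[start:start + width])
    return result


values = list(range(20))  # a 4 x 5 array whose cell values equal their flat index
print(row_major_offset((2, 3), (4, 5)))          # 13
print(slice_2d(values, (4, 5), (1, 3), (2, 4)))  # [[7, 8], [12, 13]]
```

No per-cell lookup structure is consulted; this is the kind of saving the dense layout makes possible, although real tiled layouts add a level of tile arithmetic on top.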
When to use a sparse array
Sparse arrays materialize the coordinates of the cells and use more sophisticated, multi-dimensional indexing than dense arrays. Choose this array type if the majority of the cells in the array are empty, zero, or null (in which case you should treat those cells as nonexistent). The benefits are as follows:
- Lower storage consumption than a dense array, which would have to fill the nonexistent cells with zeros or placeholder values.
- Faster slicing queries on sparse datasets with unknown distributions, because TileDB can take advantage of its multi-dimensional indexes.
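The storage trade-off between the two types can be estimated with simple arithmetic. The sketch below is a rough back-of-envelope heuristic, assuming fixed-size values and coordinates and ignoring compression, tiling, and index overhead; `dense_bytes`, `sparse_bytes`, `break_even_density`, and `suggest_array_type` are hypothetical helpers, not part of the TileDB API. A dense array pays for every cell in the domain; a sparse array pays for a value plus one coordinate per dimension, but only for nonempty cells.

```python
# Back-of-envelope storage comparison (assumptions: fixed-size values and
# coordinates, no compression, no tile or index overhead). All names are
# hypothetical helpers, not part of the TileDB API.

def dense_bytes(num_cells, value_size):
    """A dense array stores a value for every cell in the domain."""
    return num_cells * value_size


def sparse_bytes(num_nonempty, value_size, num_dims, coord_size):
    """A sparse array stores a value plus one coordinate per dimension,
    but only for the nonempty cells."""
    return num_nonempty * (value_size + num_dims * coord_size)


def break_even_density(value_size, num_dims, coord_size):
    """Fraction of nonempty cells below which sparse storage is smaller."""
    return value_size / (value_size + num_dims * coord_size)


def suggest_array_type(num_cells, num_nonempty, value_size=8, num_dims=2, coord_size=8):
    """Pick the type with the smaller raw footprint under the assumptions above."""
    density = num_nonempty / num_cells
    threshold = break_even_density(value_size, num_dims, coord_size)
    return "sparse" if density < threshold else "dense"


# A 1000 x 1000 domain of 8-byte values with 8-byte coordinates per dimension.
print(dense_bytes(1_000_000, 8))               # 8000000
print(sparse_bytes(100_000, 8, 2, 8))          # 2400000
print(suggest_array_type(1_000_000, 100_000))  # sparse
print(suggest_array_type(1_000_000, 900_000))  # dense
```

With these sizes the break-even density is one third; more dimensions or wider coordinates push it lower. This raw-bytes estimate only complements the "majority of cells" rule of thumb above, since compression and query patterns also affect the real outcome.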