Tiles
It is strongly recommended to read the following sections before you learn about tiles.
What is a tile?
A tile is a group of cells, serving as the atomic unit of I/O and compression (see the Key Concepts: Compression and Key Concepts: Tile Filters sections for more information about compression and the other tile filters supported in TileDB). TileDB differentiates between the following tile definitions:
- Space tile: This is defined upon array creation by specifying a tile extent per dimension, which partitions the dimension domain into equal segments.
- Data tile: This is the actual collection of data values included in the tile and materialized on storage. In dense arrays, the space and the data tiles are equivalent. In sparse arrays, they may be different; space tiles define the overall data layout that sorts the data values on storage, but may contain empty cells that are not materialized. A data tile in sparse arrays is defined after the data is sorted and is determined by an extra parameter called capacity. All data tiles in both dense and sparse arrays have the same capacity (i.e., the same number of non-empty cells). In dense arrays, the capacity is inferred from the space tile, whereas in sparse arrays, the user specifies the capacity explicitly upon array creation.
Section Key Concepts: Data Layout explains space and data tiles in more detail.
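The relationship between space tiles and data tiles can be illustrated with a small plain-Python sketch. It does not use the TileDB API; the function names, the 4x4 domain, the 2x2 tile extents, and the capacity of 2 are all hypothetical, and the sketch assumes a row-major order both across space tiles and within each tile.

```python
def space_tile_index(coords, domain_lows, tile_extents):
    """Return the per-dimension index of the space tile that contains `coords`."""
    return tuple(
        (c - lo) // ext for c, lo, ext in zip(coords, domain_lows, tile_extents)
    )


def sparse_data_tiles(cells, domain_lows, tile_extents, capacity):
    """Sort non-empty cells by space tile, then group them into data tiles.

    Assumption: row-major order across space tiles and within each tile.
    """
    ordered = sorted(
        cells,
        key=lambda c: (space_tile_index(c, domain_lows, tile_extents), c),
    )
    # Each data tile holds `capacity` non-empty cells; in this sketch the
    # final chunk may be shorter when the cell count is not a multiple.
    return [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]


# Hypothetical 2D array with domain [1, 4] x [1, 4] and 2x2 space tiles.
DOMAIN_LOWS = (1, 1)
TILE_EXTENTS = (2, 2)

# Coordinates of the non-empty cells of a sparse array, in arbitrary order.
cells = [(1, 1), (4, 4), (1, 2), (3, 1), (2, 4)]
data_tiles = sparse_data_tiles(cells, DOMAIN_LOWS, TILE_EXTENTS, capacity=2)
```

Here the space tiles only determine the sort order; the data tile boundaries come from the capacity, so a single data tile can contain cells from more than one space tile.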
In addition, it is worth mentioning the following terminology:
- Logical tile: A logical tile (which can be either a space tile or a data tile) refers to the multi-dimensional cells of the array, regardless of how many attribute values each contains or how these are laid out on storage.
- Physical tile: A physical tile can only be a data tile and always corresponds to the stored cell values across a specific attribute. This is the actual atomic unit of compression and I/O.
The above terminology forms the basis of other concepts and tutorials across the Academy.
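As a rough illustration of the logical versus physical distinction, the sketch below models one logical tile whose cells carry two attributes and splits it into one physical tile per attribute. The attribute names `a1` and `a2`, the cell values, and the helper `physical_tiles` are hypothetical and not part of TileDB.

```python
# One logical tile: the multi-dimensional cells, independent of storage layout.
logical_tile = [
    {"coords": (1, 1), "a1": 10, "a2": 0.5},
    {"coords": (1, 2), "a1": 20, "a2": 1.5},
    {"coords": (2, 1), "a1": 30, "a2": 2.5},
    {"coords": (2, 2), "a1": 40, "a2": 3.5},
]


def physical_tiles(tile, attributes):
    """Split a logical tile into one list of stored values per attribute."""
    return {attr: [cell[attr] for cell in tile] for attr in attributes}


tiles_by_attribute = physical_tiles(logical_tile, ["a1", "a2"])
# tiles_by_attribute["a1"] and tiles_by_attribute["a2"] each stand in for a
# physical tile, i.e. a unit that would be compressed and read on its own.
```

In this model, a query that needs only `a1` would touch only the `a1` physical tile, which is why the physical tile is the natural unit of compression and I/O.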
Fill values
A user can populate a TileDB array partially and incrementally (see the Key Concepts: Domain section for the discussion on the non-empty domain). Therefore, the following scenarios are possible for the case of dense arrays:
- A tile may be partially written.
- A tile may be partially outside the array domain.
- An empty tile may be read.
In all these cases, TileDB may need to write special fill values to tiles to indicate “empty cells”, or similarly return fill values for a query asking for unpopulated tiles. The following figure demonstrates some examples.
TileDB uses the following default fill values per attribute datatype. The user can override these defaults when defining the attributes upon array creation.
Datatype | Default fill value |
---|---|
TILEDB_BLOB | 0 |
TILEDB_STRING_ASCII | 0 |
TILEDB_STRING_UTF8 | 0 |
TILEDB_STRING_UTF16 | 0 |
TILEDB_STRING_UTF32 | 0 |
TILEDB_INT8 | Minimum int8 value |
TILEDB_UINT8 | Maximum uint8 value |
TILEDB_INT16 | Minimum int16 value |
TILEDB_UINT16 | Maximum uint16 value |
TILEDB_INT32 | Minimum int32 value |
TILEDB_UINT32 | Maximum uint32 value |
TILEDB_INT64 | Minimum int64 value |
TILEDB_UINT64 | Maximum uint64 value |
TILEDB_FLOAT32 | NaN |
TILEDB_FLOAT64 | NaN |
TILEDB_DATETIME_* | Minimum int64 value |
If a fixed-sized attribute stores more than one value per cell, every value in the cell is assigned the corresponding default fill value shown above.
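The defaults in the table, together with the behavior of a read over a partially written dense tile, can be modeled in plain Python as follows. This is only a sketch and not the TileDB implementation: the helpers `default_fill_value`, `fill_cell`, and `read_dense_tile` are hypothetical, and a written tile is represented as a simple dictionary from coordinates to values.

```python
import math


def default_fill_value(datatype):
    """Return the default fill value listed in the table for a datatype name."""
    if datatype.startswith("TILEDB_DATETIME_"):
        return -(2 ** 63)  # minimum int64 value
    if datatype in ("TILEDB_FLOAT32", "TILEDB_FLOAT64"):
        return math.nan
    if datatype == "TILEDB_BLOB" or datatype.startswith("TILEDB_STRING_"):
        return 0
    if datatype.startswith("TILEDB_UINT"):
        bits = int(datatype[len("TILEDB_UINT"):])
        return 2 ** bits - 1  # maximum unsigned value
    if datatype.startswith("TILEDB_INT"):
        bits = int(datatype[len("TILEDB_INT"):])
        return -(2 ** (bits - 1))  # minimum signed value
    raise ValueError(f"no default fill value modeled for {datatype}")


def fill_cell(datatype, values_per_cell=1, fill=None):
    """Build the fill for one cell; `fill` models a user-supplied override."""
    value = default_fill_value(datatype) if fill is None else fill
    return [value] * values_per_cell


def read_dense_tile(written, tile_coords, datatype):
    """Return one value per cell of the tile, using the fill for empty cells."""
    fill = default_fill_value(datatype)
    return [written.get(coords, fill) for coords in tile_coords]


# A 2x2 dense tile of which only two cells have been written.
tile_coords = [(1, 1), (1, 2), (2, 1), (2, 2)]
written = {(1, 1): 7, (2, 2): 9}
result = read_dense_tile(written, tile_coords, "TILEDB_INT32")
```

In this example, `result` contains the two written values and the minimum int32 value in the two unpopulated positions, and `fill_cell("TILEDB_INT16", values_per_cell=3)` repeats the minimum int16 value three times, matching the rule for attributes with more than one value per cell.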