Attributes
Dimensions define the hyperspace of the array, in which cells are stored and efficiently retrieved. You can think of dimensions as the fields of a dataset that appear in most query conditions, because TileDB is designed to perform fast range searches on them. Attributes, by contrast, define the values that TileDB stores inside each multi-dimensional cell. Visit the Performance: Dimensions vs. Attributes section for guidance on choosing the dimensions and attributes when creating an array for your dataset.
Visit the Array Data Model section for more details.
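To make the split between dimensions and attributes concrete, here is a toy Python model. It is not TileDB's storage or indexing scheme: each non-empty cell is simply keyed by its coordinates on two hypothetical integer dimensions and holds values for two hypothetical attributes, a and b. A range query filters on the dimensions and returns only the requested attribute.

```python
from typing import Dict, List, Tuple

# Toy model only (not TileDB's implementation): a cell is addressed by its
# (row, col) coordinates on two hypothetical dimensions.
Coord = Tuple[int, int]

# Each non-empty cell stores one value per (hypothetical) attribute "a" and "b".
cells: Dict[Coord, Dict[str, object]] = {
    (1, 1): {"a": 10, "b": "x"},
    (1, 4): {"a": 20, "b": "y"},
    (3, 2): {"a": 30, "b": "z"},
    (4, 4): {"a": 40, "b": "w"},
}


def range_query(rows: Tuple[int, int], cols: Tuple[int, int], attr: str) -> List[Tuple[Coord, object]]:
    """Return (coords, value) for cells whose coordinates fall in the inclusive ranges."""
    (r0, r1), (c0, c1) = rows, cols
    return [
        (coord, values[attr])
        for coord, values in sorted(cells.items())
        if r0 <= coord[0] <= r1 and c0 <= coord[1] <= c1
    ]


print(range_query((1, 3), (1, 2), "a"))  # [((1, 1), 10), ((3, 2), 30)]
```

The query conditions touch only the coordinates; the attribute values are just payload returned for the matching cells.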
Columnar format
An array may have multiple attributes, which means that every (non-empty) cell can store multiple values, potentially of different types. TileDB stores the values of each attribute, across all cells, in a separate file (i.e., it follows the so-called columnar format). This is beneficial for several reasons, two of which are:
- Each file contains values of the same type, potentially very similar to each other, which leads to more effective compression.
- Queries that select only a subset of the attributes never need to fetch the values of the other attributes from storage. This reduces I/O and significantly boosts performance.
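The columnar idea can be sketched in a few lines of Python. This is a minimal in-memory illustration with hypothetical attribute names (temp, humidity, station), where a Python list stands in for each per-attribute file; it says nothing about TileDB's actual file format or compression.

```python
from typing import Dict, List

# Row-oriented view: one record per non-empty cell (hypothetical attributes).
rows = [
    {"temp": 21.5, "humidity": 40, "station": "north"},
    {"temp": 22.0, "humidity": 42, "station": "south"},
    {"temp": 21.7, "humidity": 41, "station": "east"},
]


def to_columns(records: List[dict]) -> Dict[str, list]:
    """Split records into one list per attribute, mimicking one file per attribute."""
    columns: Dict[str, list] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)
# A query that needs only "temp" reads a single homogeneous column
# and never touches "humidity" or "station".
print(columns["temp"])  # [21.5, 22.0, 21.7]
```

Each column holds values of one type only, which is what makes per-attribute compression effective.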
Fixed-length attributes
A cell of a fixed-length attribute either holds a single value of a basic data type (e.g., int32), or a fixed number of values of the same basic data type, specified upon array creation (e.g., 3 int32 values, such as 1,2,3). In both cases, every cell value of a fixed-length attribute occupies the same number of bytes.
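Because every cell has the same size, locating a cell needs no extra metadata: cell i simply starts at byte i × (cell size). The sketch below illustrates this with Python's struct module for an attribute holding 3 int32 values per cell. The little-endian packing and the constant names are assumptions made for illustration; this is not TileDB's on-disk layout, which may also apply filters such as compression.

```python
import struct

CELL_VAL_NUM = 3                   # values per cell, fixed at array creation (illustrative)
FMT = "<" + "i" * CELL_VAL_NUM     # three little-endian int32 values per cell
CELL_SIZE = struct.calcsize(FMT)   # 12 bytes, identical for every cell

cells = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
data = b"".join(struct.pack(FMT, *cell) for cell in cells)


def read_cell(buf: bytes, i: int) -> tuple:
    """Locate cell i directly: its bytes start at i * CELL_SIZE."""
    start = i * CELL_SIZE
    return struct.unpack(FMT, buf[start:start + CELL_SIZE])


print(CELL_SIZE, read_cell(data, 1))  # 12 (4, 5, 6)
```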
Variable-length attributes
In addition, TileDB supports variable-length attributes, such as strings and lists of basic data type values whose size differs from cell to cell. TileDB stores two files for each variable-length attribute. One serializes all the values of the non-empty cells on this attribute (visit the Key Concepts: Data Layout section for more detail on how TileDB stores multi-dimensional values serially on storage), and the other stores the starting offset (in bytes) of each cell value in the first file. For example, if an ASCII string attribute stores the values "a", "bb", "ccc", then the first file contains abbccc, whereas the second file contains the offsets (in bytes) 0, 1, 3. With this information, TileDB can easily locate the second attribute value in the first file using the second offset (1) in the offsets file; that value ends where the next offset (3) begins. Visit the Key Concepts: Data Layout and Storage Format Spec sections for a detailed description of the data stored in each attribute data and offsets file in TileDB.
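The "a", "bb", "ccc" example can be reproduced with a short Python sketch. Here encode and decode_cell are hypothetical helpers written for illustration. They show only the data-plus-offsets idea and omit details covered in the Storage Format Spec, such as the width of each stored offset and any filters applied to the files.

```python
from typing import List, Tuple


def encode(values: List[str]) -> Tuple[bytes, List[int]]:
    """Concatenate ASCII values into one buffer and record each value's start offset."""
    data = bytearray()
    offsets = []
    for value in values:
        offsets.append(len(data))
        data += value.encode("ascii")
    return bytes(data), offsets


def decode_cell(data: bytes, offsets: List[int], i: int) -> str:
    """Cell i spans from offsets[i] up to the next offset (or the end of the data)."""
    start = offsets[i]
    end = offsets[i + 1] if i + 1 < len(offsets) else len(data)
    return data[start:end].decode("ascii")


data, offsets = encode(["a", "bb", "ccc"])
print(data, offsets)                  # b'abbccc' [0, 1, 3]
print(decode_cell(data, offsets, 1))  # bb
```

The last cell has no following offset, so its end is taken from the total size of the data buffer.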
Supported attribute data types
The following tables summarize the supported attribute data types for dense and sparse arrays, one table per API.
Datatype (C/C++) | Description | Array type |
---|---|---|
TILEDB_BLOB | Opaque bytes. Does not support query conditions. | Dense & Sparse |
TILEDB_STRING_ASCII | ASCII string | Dense & Sparse |
TILEDB_STRING_UTF8 | UTF-8 string | Dense & Sparse |
TILEDB_STRING_UTF16 | UTF-16 string | Dense & Sparse |
TILEDB_STRING_UTF32 | UTF-32 string | Dense & Sparse |
TILEDB_INT8 | 8-bit integer | Dense & Sparse |
TILEDB_UINT8 | 8-bit unsigned integer | Dense & Sparse |
TILEDB_INT16 | 16-bit integer | Dense & Sparse |
TILEDB_UINT16 | 16-bit unsigned integer | Dense & Sparse |
TILEDB_INT32 | 32-bit integer | Dense & Sparse |
TILEDB_UINT32 | 32-bit unsigned integer | Dense & Sparse |
TILEDB_INT64 | 64-bit integer | Dense & Sparse |
TILEDB_UINT64 | 64-bit unsigned integer | Dense & Sparse |
TILEDB_FLOAT32 | 32-bit floating point | Dense & Sparse |
TILEDB_FLOAT64 | 64-bit floating point | Dense & Sparse |
TILEDB_DATETIME_YEAR | Years | Dense & Sparse |
TILEDB_DATETIME_MONTH | Months | Dense & Sparse |
TILEDB_DATETIME_WEEK | Weeks | Dense & Sparse |
TILEDB_DATETIME_DAY | Days | Dense & Sparse |
TILEDB_DATETIME_HR | Hours | Dense & Sparse |
TILEDB_DATETIME_MIN | Minutes | Dense & Sparse |
TILEDB_DATETIME_SEC | Seconds | Dense & Sparse |
TILEDB_DATETIME_MS | Milliseconds | Dense & Sparse |
TILEDB_DATETIME_US | Microseconds | Dense & Sparse |
TILEDB_DATETIME_NS | Nanoseconds | Dense & Sparse |
TILEDB_DATETIME_PS | Picoseconds | Dense & Sparse |
TILEDB_DATETIME_FS | Femtoseconds | Dense & Sparse |
TILEDB_DATETIME_AS | Attoseconds | Dense & Sparse |
Datatype (C#) | Description | Array type |
---|---|---|
DataType.Blob | Opaque bytes. Does not support query conditions. | Dense & Sparse |
DataType.StringAscii | ASCII string | Dense & Sparse |
DataType.StringUtf8 | UTF-8 string | Dense & Sparse |
DataType.StringUtf16 | UTF-16 string | Dense & Sparse |
DataType.StringUtf32 | UTF-32 string | Dense & Sparse |
DataType.Int8 | 8-bit integer | Dense & Sparse |
DataType.UInt8 | 8-bit unsigned integer | Dense & Sparse |
DataType.Int16 | 16-bit integer | Dense & Sparse |
DataType.UInt16 | 16-bit unsigned integer | Dense & Sparse |
DataType.Int32 | 32-bit integer | Dense & Sparse |
DataType.UInt32 | 32-bit unsigned integer | Dense & Sparse |
DataType.Int64 | 64-bit integer | Dense & Sparse |
DataType.UInt64 | 64-bit unsigned integer | Dense & Sparse |
DataType.Float32 | 32-bit floating point | Dense & Sparse |
DataType.Float64 | 64-bit floating point | Dense & Sparse |
DataType.DateTimeYear | Years | Dense & Sparse |
DataType.DateTimeMonth | Months | Dense & Sparse |
DataType.DateTimeWeek | Weeks | Dense & Sparse |
DataType.DateTimeDay | Days | Dense & Sparse |
DataType.DateTimeHour | Hours | Dense & Sparse |
DataType.DateTimeMinute | Minutes | Dense & Sparse |
DataType.DateTimeSecond | Seconds | Dense & Sparse |
DataType.DateTimeMillisecond | Milliseconds | Dense & Sparse |
DataType.DateTimeMicrosecond | Microseconds | Dense & Sparse |
DataType.DateTimeNanosecond | Nanoseconds | Dense & Sparse |
DataType.DateTimePicosecond | Picoseconds | Dense & Sparse |
DataType.DateTimeFemtosecond | Femtoseconds | Dense & Sparse |
DataType.DateTimeAttosecond | Attoseconds | Dense & Sparse |
Datatype (Python) | Description | Array type |
---|---|---|
"ascii" | ASCII string | Dense & Sparse |
np.dtype('U') | UTF-8 string | Dense & Sparse |
np.int8 | 8-bit integer | Dense & Sparse |
np.uint8 | 8-bit unsigned integer | Dense & Sparse |
np.int16 | 16-bit integer | Dense & Sparse |
np.uint16 | 16-bit unsigned integer | Dense & Sparse |
np.int32 | 32-bit integer | Dense & Sparse |
np.uint32 | 32-bit unsigned integer | Dense & Sparse |
np.int64 | 64-bit integer | Dense & Sparse |
np.uint64 | 64-bit unsigned integer | Dense & Sparse |
np.float32 | 32-bit floating point | Dense & Sparse |
np.float64 | 64-bit floating point | Dense & Sparse |
"datetime64[Y]" | Years | Dense & Sparse |
"datetime64[M]" | Months | Dense & Sparse |
"datetime64[W]" | Weeks | Dense & Sparse |
"datetime64[D]" | Days | Dense & Sparse |
"datetime64[h]" | Hours | Dense & Sparse |
"datetime64[m]" | Minutes | Dense & Sparse |
"datetime64[s]" | Seconds | Dense & Sparse |
"datetime64[ms]" | Milliseconds | Dense & Sparse |
"datetime64[us]" | Microseconds | Dense & Sparse |
"datetime64[ns]" | Nanoseconds | Dense & Sparse |
"datetime64[ps]" | Picoseconds | Dense & Sparse |
"datetime64[fs]" | Femtoseconds | Dense & Sparse |
"datetime64[as]" | Attoseconds | Dense & Sparse |
Datatype (R) | Description | Array type |
---|---|---|
raw | Opaque bytes. Does not support query conditions. | Dense & Sparse |
"ASCII" | ASCII string | Dense & Sparse |
character | UTF-8 string | Dense & Sparse |
"INT8" | 8-bit integer | Dense & Sparse |
"UINT8" | 8-bit unsigned integer | Dense & Sparse |
"INT16" | 16-bit integer | Dense & Sparse |
"UINT16" | 16-bit unsigned integer | Dense & Sparse |
"INT32" | 32-bit integer | Dense & Sparse |
"UINT32" | 32-bit unsigned integer | Dense & Sparse |
"INT64" | 64-bit integer | Dense & Sparse |
"UINT64" | 64-bit unsigned integer | Dense & Sparse |
"FLOAT32" | 32-bit floating point | Dense & Sparse |
"FLOAT64" | 64-bit floating point | Dense & Sparse |
"DATETIME_YEAR" | Years | Dense & Sparse |
"DATETIME_MONTH" | Months | Dense & Sparse |
"DATETIME_WEEK" | Weeks | Dense & Sparse |
"DATETIME_DAY" | Days | Dense & Sparse |
"DATETIME_HR" | Hours | Dense & Sparse |
"DATETIME_MIN" | Minutes | Dense & Sparse |
"DATETIME_SEC" | Seconds | Dense & Sparse |
"DATETIME_MS" | Milliseconds | Dense & Sparse |
"DATETIME_US" | Microseconds | Dense & Sparse |
"DATETIME_NS" | Nanoseconds | Dense & Sparse |
"DATETIME_PS" | Picoseconds | Dense & Sparse |
"DATETIME_FS" | Femtoseconds | Dense & Sparse |
"DATETIME_AS" | Attoseconds | Dense & Sparse |
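Every numeric and datetime type in the tables above is fixed-length. Assuming the Python API maps the NumPy dtypes in its table one-to-one onto these types (as the table suggests), NumPy itself can report the per-value width. The sketch below checks a few of them and shows that a datetime64 value is a 64-bit count of units since 1970-01-01. The widths dictionary is purely illustrative and is not part of any TileDB API.

```python
import numpy as np

# Byte width NumPy assigns to a few of the dtypes listed in the Python table.
widths = {
    "np.int8": np.dtype(np.int8).itemsize,
    "np.uint16": np.dtype(np.uint16).itemsize,
    "np.int32": np.dtype(np.int32).itemsize,
    "np.float64": np.dtype(np.float64).itemsize,
    "datetime64[D]": np.dtype("datetime64[D]").itemsize,
    "datetime64[ns]": np.dtype("datetime64[ns]").itemsize,
}
print(widths)

# A datetime64 value is stored as a 64-bit count of its unit since 1970-01-01.
day = np.datetime64("2020-01-02", "D")
print(day.astype(np.int64))  # 18263
```

The unit (days, nanoseconds, and so on) fixes the meaning of the stored integer, not its width: every datetime64 unit occupies 8 bytes.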