Learn how tables in TileDB use indexing and how it maps to traditional tabular indexes.
Tabular data commonly uses indexing to boost query performance. You can think of TileDB’s use of dimensions as analogous to the concept of indexing in a relational databases. This document outlines the similarities and differences between TileDB dimensions and traditional tabular indexing.
Clustered index
A clustered index defines the sort order of the physical layout of the data on your chosen storage medium. Clustered indexes are defined on one or more fields in a dataset, and do not require uniqueness of the index fields. Traditionally, a clustered index is also sorted in row-major ordering.
In TileDB, the dimensions directly relate to a clustered index for both sparse and dense arrays. TileDB arrays have the additional advantage of supporting multiple sort orders, including row-major, column-major, Hilbert, and other flexible orders based on space-filling curves defined over regular grids of the multi-dimensional space defined by the dimensions.
Primary index
A primary index is a clustered index that enforces uniqueness of rows on the dimension values. TileDB dense arrays always require uniqueness. Sparse arrays can be configured for uniqueness or not by toggling the allows_duplicates option when creating the array. You can learn more details about the differences in and sparse arrays, as well as cell multiplicities, in the Array Data Model section.
Secondary index
Some tabular systems support creating secondary indexes, which do not affect the physical sort order of data, but instead provide a fast look up for one or more fields that then “points” to the original physical row location.
TileDB has a significant advantage over traditional database systems in that you can query any dimension without requiring an in-order composite key. Often, tabular systems use secondary indexes because they do not support partial key lookups.
For example, consider a tabular dataset with fields dim0 and dim1 as a composite key. TileDB supports queries that have the following query patterns: filters on dim0 only, filters on dim1 only, or filters on dim0 and dim1 combined. Traditional database systems would only support queries on 1) dim0 or 2) dim0 and dim1 combined. The additional flexibility and power of TileDB replaces some of the secondary indexing use cases.
The following SQL CREATE TABLE and SELECT statements follow the previous example. TileDB is able to push down the predicate from the WHERE clause, unlike most traditional SQL databases, which would require a secondary index.