Introduction
TileDB’s architecture is built on multi-dimensional arrays. Because the tabular data model is a special case of multi-dimensional arrays, TileDB is well suited to modeling tabular data, as well as many other data modalities (such as genomics, transcriptomics, imaging, vector embeddings, and more).
Tabular data is perhaps the most ubiquitous data modality in any organization. Tabular data is often directly associated with relational databases, which have been around for decades. While people have tried over the years to expand databases beyond the relational model to capture other modalities, none have succeeded in combining tables and other modalities in a single database system.
Multi-dimensional arrays, the basis of TileDB, can shape-shift into almost any data model. This allows TileDB to capture both tabular and non-tabular use cases.
Why TileDB for tabular data
Different instantiations of TileDB arrays can efficiently capture tables, depending on the performance requirements of each use case. In the Data Model section, you’ll learn how multi-dimensional arrays are an efficient generalization of tables. TileDB’s array dimensions act as an effective index for tables, similar to traditional B+-trees or multi-dimensional indexes such as R-trees (which TileDB also uses internally).
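To build intuition for why choosing a column as a dimension behaves like an index, here is a minimal pure-Python sketch. It is an analogy only, not TileDB's storage format or API: if rows are kept sorted on the dimension column, a range predicate can be answered by binary search on that column, returning just the matching slice instead of scanning the whole table. The table contents and the `range_query` helper are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical table: rows kept sorted on the column chosen as the "dimension".
# Conceptual analogy only; this is not how TileDB lays out data on disk.
rows = [(1, "a"), (3, "b"), (5, "c"), (7, "d"), (9, "e"), (11, "f")]

# The indexed (dimension) column, in sorted order.
keys = [k for k, _ in rows]

def range_query(lo, hi):
    """Return rows whose dimension value lies in [lo, hi].

    Binary search locates the slice boundaries, so only the
    matching rows are touched rather than the full table.
    """
    start = bisect_left(keys, lo)
    end = bisect_right(keys, hi)
    return rows[start:end]

print(range_query(4, 9))  # [(5, 'c'), (7, 'd'), (9, 'e')]
```

B+-trees and R-trees pursue the same goal with persistent structures, and R-trees extend it to ranges over several dimensions at once; the sketch only illustrates the one-dimensional principle.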
TileDB supports tables with native dataframe and SQL integrations. TileDB offers several benefits for tables, including:
- Serverless: TileDB is completely serverless, which combines the best aspects of large data warehouses and cloud object stores, while minimizing idle cost for you and your organization.
- Cloud-native: TileDB is optimized for object stores. As such, you can scale your tables on cloud storage while enjoying superb performance, leading to significant cost savings.
- Shared-everything: TileDB follows a so-called shared-everything architecture. This means that while each compute node is independent, all nodes share centralized storage, such as an object store.
- Elastic: TileDB’s serverless and shared-everything architecture yields an effective separation of compute and storage. This enables elastic scaling of each component independently, which yields cost-efficient resource usage without compromising performance.
- Omni-modal: TileDB’s shape-shifting arrays allow it to capture more use cases than traditional relational databases, all in a single database system.
- Embarrassingly parallel: TileDB has its own distributed computing infrastructure, enabled by the lock-free, multi-reader and multi-writer design of the underlying array storage format. You can use task graphs to run embarrassingly parallel workloads, as well as sophisticated algorithms with complex dependencies between parallel tasks.
TileDB is a compelling offering for tables and can serve as a standalone data lakehouse. However, its true power stems from the fact that an organization can use it as a holistic data platform for all its data assets, which extend well beyond tabular data.
Section organization
The rest of the Tables section is organized as follows:
- Quickstart: This page is the best way to get started with tabular data in TileDB. You’ll learn about TileDB’s SQL offering and how to run basic examples.
- Foundation: Here you can find all the background information on TileDB's internal mechanics regarding tabular data. Learning these will give you a conceptual and practical understanding of TileDB that helps you optimize how you use your tabular data.
- Tutorials: Follow a series of tutorials covering all aspects of TileDB’s tabular support, from basic ingestion of CSVs to advanced topics. These tutorials can take users with no prior knowledge of TileDB all the way to becoming power users.
- API Reference: Find a complete catalog of table, column, and system parameters and options for SQL, along with the APIs in different languages.