Introduction to Arrays

An introduction to arrays and their benefits, and an outline of the rest of the Arrays section.

It is all about arrays!

TileDB is architected around a powerful data structure, the multi-dimensional array.

Arrays and their benefits

The multi-dimensional array is a first-class citizen in TileDB. Arrays are useful on their own for scientific work, since they appear in numerous use cases, such as linear algebra, statistics, machine learning, quantitative analysis, and mathematical simulations. Arrays can also serve as a building block for more sophisticated solutions around complex data. For reference, you can learn what the TileDB team builds with arrays in the rest of the Structure section (for example, in Life Sciences and Geospatial).

The core TileDB array engine is open-source and lives in the TileDB-Inc/TileDB repository. It is a C++ library (often referred to as TileDB Embedded) that comes with wrapper APIs for numerous other languages (used throughout the Tutorials section and thoroughly covered in the API Reference section). However, much of the key functionality covered in this section (especially around secure data management, distributed computing, and scalability) is specific to the TileDB Cloud commercial product.

The TileDB arrays solution offers broad functionality and benefits:

Support for both dense and sparse arrays
Chunked (tiled) arrays
Multiple compression, encryption, and checksum filters
Efficient push-down of aggregates and other query conditions
Fully multi-threaded implementation
Efficient object storage support (Amazon S3, Google Cloud Storage, Azure Blob Storage, MinIO)
Parallel I/O with multiple concurrent writers and multiple readers
Data versioning (rapid updates, time traveling)
Array metadata
Array groups
Numerous language APIs on top of a high-performance core C++ library
Holistic catalog for discoverability (on TileDB Cloud)
Access control (on TileDB Cloud)
Logging for auditing (on TileDB Cloud)
A marketplace to share your work and discover other exciting datasets (on TileDB Cloud)
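
As a rough illustration of the first two items in the list above (dense versus sparse arrays, and tiling), the following plain-Python sketch contrasts the two storage models and shows how cells group into tiles. It does not use the TileDB API. The helper names (`dense_get`, `tile_of`, `dense_tiles`, `slice_sparse`), the 4x4 shape, and the tile extent of 2 are all assumptions made for illustration only.

```python
# Conceptual sketch only: this is NOT the TileDB API.
from typing import Dict, List, Tuple

ROWS, COLS = 4, 4  # hypothetical 4x4 array domain

# Dense array: every cell holds a value, stored contiguously in row-major order.
dense: List[int] = list(range(1, ROWS * COLS + 1))

# Sparse array: only non-empty cells are stored, keyed by their coordinates.
sparse: Dict[Tuple[int, int], int] = {(0, 1): 10, (2, 3): 20, (3, 0): 30}


def dense_get(row: int, col: int) -> int:
    """Read one cell of the dense array from its row-major position."""
    return dense[row * COLS + col]


def tile_of(row: int, col: int, tile_extent: int = 2) -> Tuple[int, int]:
    """Return the coordinates of the tile (chunk) that contains a cell."""
    return (row // tile_extent, col // tile_extent)


def dense_tiles(tile_extent: int = 2) -> Dict[Tuple[int, int], List[int]]:
    """Group the dense cells by tile, so each tile can be handled as a unit."""
    tiles: Dict[Tuple[int, int], List[int]] = {}
    for r in range(ROWS):
        for c in range(COLS):
            tiles.setdefault(tile_of(r, c, tile_extent), []).append(dense_get(r, c))
    return tiles


def slice_sparse(
    row_range: Tuple[int, int], col_range: Tuple[int, int]
) -> Dict[Tuple[int, int], int]:
    """Return the non-empty cells whose coordinates fall in the inclusive ranges."""
    (r0, r1), (c0, c1) = row_range, col_range
    return {
        (r, c): v
        for (r, c), v in sparse.items()
        if r0 <= r <= r1 and c0 <= c <= c1
    }
```

In this sketch, a read that touches only the top-left corner needs just the tile at `(0, 0)`, and a slice of the sparse array returns only the cells that actually exist. This is meant to convey why chunking and sparse support matter, not how TileDB implements them.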

Section organization

The rest of the Arrays section is organized as follows:

Quickstart: This is the best way to get started with TileDB arrays. You will learn how to install TileDB in your preferred language and run basic array examples.
Foundation: This contains the background information and internal mechanics of TileDB. Learning these provides a deep understanding of the TileDB technology and helps users get the most value from TileDB.
Tutorials: This is a series of tutorials covering all aspects of TileDB arrays, from basic ingestion to massively scalable computations. Running those tutorials can help users start without any prior knowledge of TileDB and become power users.
API Reference: This lists all the TileDB functionality across the numerous programming languages it supports, and enables fast lookups on API usage.

How to run the various tutorials

You can run each of the tutorials in this section in one of two ways, as specified at the beginning of each tutorial:

Locally on your machine.
On TileDB Cloud.

Since TileDB Cloud has a free tier, we strongly recommend that you sign up and run everything there, as that requires no installation or deployment.