Introduction

scale

This introduces scalable computation in TileDB Cloud.

TileDB Cloud’s serverless, elastic, distributed compute infrastructure allows you to scale as the data analysis requirements grow, while optimizing the total cost of operations. Enjoy phenomenal scale, staying compliant within one trusted research environment that governs your data.

TileDB Cloud’s scaling and efficiencies allow you to run analysis and workflows to enable rapid experimentation, which ultimately expedites discovery.

Architecture

TileDB Cloud architecture is designed to scale elastically without user interactions. A major difference from traditional databases and traditional computation cluster solutions is the fact that, as a direct user, you do not need to worry about manually spinning up a cluster or having to wait for IT to provision additional resources. Instead, we designed TileDB Cloud so that the infrastructure is completely elastic, and compute resources are automatically provisioned. TileDB Cloud is built on top of Kubernetes as its provisioning infrastructure.

Multi-region

TileDB Cloud is designed to support multi-region deployments where data can be stored securely in the region of choice, and all compute is moved to the region where the data lives. This allows for both security and performance by limiting data movement.

Scalable data access

Scalability starts with the being able to access large quantities of data in a fast, efficient, and serverless manner. Visit Slicing for more details on how TileDB supports efficient data access. TileDB Cloud uses centralized scalable storage such as object stores like Amazon S3 that allow for highly parallel and concurrent requests. High-performance, centralized storage combined with the elastic computation capabilities allow for you to achieve extreme scale in a cost-effective manner.

Task graphs

TileDB Cloud’s task graphs provide the foundation for large-scale computations. Task graphs are built to support both real-time analytics and large-scale, distributed jobs. This allows for adaptability to fit different use cases. Most task systems either support batch-style operations or smaller, real-time use cases, but not both.

Task graphs allow for heterogeneous resources and heterogeneous tasks. Different languages, different functions, and a mixing of SQL, Python, and R are all supported in a single task graph.

Serverless SQL

TileDB Cloud’s support for tables and serverless SQL can be used in conjunction with task graphs to provide the ability to run large-scale, distributed SQL tasks, along with using the input or output of SQL along with any function as part of a task graph.

User-defined functions

TileDB Cloud’s user-defined functions allow you to run any arbitrarily-defined Python or R function. Along with Serverless SQL, user-defined functions support heterogeneous compute and building complex task graphs.

Workflows

TileDB Cloud supports Nextflow workflows. This allows for running of pipelines from nf-core and building your own custom pipelines to run on top of TileDB Cloud’s computation infrastructure.