Introduction
Organizations need to list and search all their data and code assets in one securely governed place, which eliminates silos and improves asset discoverability and team productivity. TileDB lets you store all your assets (both data and code) physically on a cloud object store like Amazon S3 under your full control, and register them with TileDB, which catalogs, version-controls, and indexes them for rapid search, and logs them for audit purposes.
TileDB supports a broad spectrum of data and code modalities:
- Data modalities
  - Arrays: This is the foundational data structure that TileDB uses to structure all other complex modalities, which leads to unprecedented performance.
  - Tables: Tabular data (tables, dataframes, worksheets) is one of the most common data modalities.
  - Single-cell (SOMA): TileDB’s pioneering data representation for single-cell data, called SOMA.
  - Genomics (VCF): TileDB’s pioneering representation for genomic variants, which losslessly and efficiently models VCF genomic variant data at very large scale.
  - Biomedical Imaging: TileDB models biomedical images as 2D or 3D dense arrays.
  - Vectors: The building block for vector search (aka similarity search), typically used in conjunction with a large language model (LLM) for Generative AI.
  - Files: Files are natively stored in TileDB, capturing any arbitrary binary data representation that does not fit in any other category.
- Code modalities
  - Notebooks: TileDB supports Jupyter notebooks, launched and managed within its secure and compliant infrastructure.
  - Dashboards: You can create and launch dashboards, powered by Python widgets or R Shiny.
  - User-defined functions (UDFs): TileDB supports serverless UDFs in a variety of programming languages.
  - Task graphs: TileDB has its own powerful, serverless, distributed compute infrastructure, which allows you to create and launch distributed workloads modeled as task dependency graphs.
  - ML models: You can store, version-control, and share any machine learning model.
A strong differentiating attribute of TileDB is its ability to structure some of the above modalities as multi-dimensional arrays, which can lead to tremendous performance benefits, especially for large-scale data. Examples include tables, single-cell, genomics, and more. Visit the Structure section to learn more about how TileDB can help you properly structure your data to maximize performance and lower your cloud costs.
The rest of the Catalog section focuses on TileDB’s cataloging functionality. The Collaborate, Analyze, Scale, and Structure sections provide more information on TileDB’s other capabilities around the above modalities.
Before you continue exploring TileDB’s catalog functionality, make sure you read the Get Started section, which guides you through signing up and creating a TileDB account, and through installing the necessary libraries if you wish to interface with TileDB via programmatic APIs instead of TileDB’s UI console.