TileDB Cloud URIs with Single Cell Data

Learn about the TileDB Cloud URI scheme and how to use it to interact with SOMA datasets.

Introduction

TileDB Cloud provides a secure and convenient way to access data stored in remote object stores, with Amazon S3 being the preferred option for TileDB Cloud SaaS. It uses a specialized URI (Uniform Resource Identifier) scheme that you can use to create, access, and manage any of your assets. Because TileDB Cloud handles authentication automatically, you do not need to configure AWS IAM roles directly. This document provides an overview of TileDB Cloud URIs and how to use them.

URI format for asset creation

When creating a new array, group, or SOMA experiment on TileDB Cloud, you will need to use a special creation URI in the following format: tiledb://<namespace>/s3://<bucket>/<name>, where:

<namespace> is your TileDB username or organization name.
<bucket> is the S3 bucket where the asset will be physically stored.
<name> is the name of the asset.

The following examples use tiledb-inc as the namespace and demo-data as the S3 bucket. In practice, you would replace these with your actual values.
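
The creation URI format described above can also be assembled programmatically. The following is a minimal sketch that only does string formatting; make_creation_uri is a hypothetical helper written for this illustration and is not part of the TileDB or TileDB-SOMA API.

```python
def make_creation_uri(namespace: str, bucket: str, name: str) -> str:
    """Build a creation URI of the form tiledb://<namespace>/s3://<bucket>/<name>.

    Hypothetical helper for illustration only; not part of any TileDB library.
    """
    parts = {"namespace": namespace, "bucket": bucket, "name": name}
    for label, value in parts.items():
        # Reject empty components (or components made only of slashes).
        if not value.strip("/"):
            raise ValueError(f"{label} must not be empty")
    # Strip stray leading/trailing slashes so the pieces join cleanly.
    return f"tiledb://{namespace.strip('/')}/s3://{bucket.strip('/')}/{name.strip('/')}"


# Example values taken from this document.
uri = make_creation_uri("tiledb-inc", "demo-data", "pbmc3k")
print(uri)  # tiledb://tiledb-inc/s3://demo-data/pbmc3k
```

The resulting string could then be passed wherever a creation URI is expected, such as the experiment_uri argument in the example that follows.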

Example:

To create a new SOMA experiment named pbmc3k, you could use the following command:

import tiledbsoma.io

# pbmc3k is assumed to be an AnnData object that is already loaded in memory.
tiledbsoma.io.from_anndata(
    experiment_uri="tiledb://tiledb-inc/s3://demo-data/pbmc3k",
    measurement_name="RNA",
    anndata=pbmc3k,
)

This command will:

Physically create the new TileDB SOMA experiment at s3://demo-data/pbmc3k.
Register the experiment on TileDB Cloud under the tiledb-inc namespace and assign it a UUID.

Accessing assets

After an asset is created and registered, it appears in TileDB Cloud’s data catalog under the specified namespace. On the asset’s landing page, you will find two URIs:

tiledb://tiledb-inc/pbmc3k (assuming pbmc3k is unique in your namespace)
tiledb://tiledb-inc/<uuid> (where <uuid> is a unique identifier for the asset)

You can access this experiment using either of these URIs, or with the original creation URI (i.e., tiledb://tiledb-inc/s3://demo-data/pbmc3k):

import tiledbsoma

# Name-based URI
with tiledbsoma.Experiment.open("tiledb://tiledb-inc/pbmc3k") as exp:
    obs = exp.obs.read().concat()

# UUID-based URI
with tiledbsoma.Experiment.open(
    "tiledb://tiledb-inc/078007ae-49c0-46b8-8924-3cc625615ddb"
) as exp:
    obs = exp.obs.read().concat()

# Original creation (S3-based) URI
with tiledbsoma.Experiment.open("tiledb://tiledb-inc/s3://demo-data/pbmc3k") as exp:
    obs = exp.obs.read().concat()

While all three forms of TileDB Cloud URI offer secure access to the asset, each form has its own advantages. The following list suggests when to use each one:

Name-based URI: A convenient, human-readable format.
UUID-based URI: Necessary if your namespace contains more than one asset named pbmc3k.
S3-based URI: Useful if you intend to add new nested assets to a TileDB group (this includes SOMA experiments and their constituent collections).
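
To make the three forms concrete, the sketch below inspects a TileDB Cloud URI string and reports which form it uses. classify_cloud_uri is a hypothetical helper written for this illustration, not a TileDB API, and it relies on one assumption: any target that matches the canonical 8-4-4-4-12 hexadecimal layout is treated as a UUID, so an asset whose name happens to look like a UUID would be reported as the UUID form.

```python
import re
from typing import Tuple

# Canonical UUID layout, e.g. 078007ae-49c0-46b8-8924-3cc625615ddb
_UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def classify_cloud_uri(uri: str) -> Tuple[str, str, str]:
    """Return (namespace, form, target), where form is 'name', 'uuid', or 's3'.

    Hypothetical helper for illustration only; it parses the string and
    never contacts TileDB Cloud.
    """
    prefix = "tiledb://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a TileDB Cloud URI: {uri!r}")
    # Everything up to the first slash is the namespace; the rest is the target.
    namespace, sep, target = uri[len(prefix):].partition("/")
    if not namespace or not sep or not target:
        raise ValueError(f"expected tiledb://<namespace>/<target>, got {uri!r}")
    if target.startswith("s3://"):
        form = "s3"
    elif _UUID_RE.fullmatch(target):
        form = "uuid"
    else:
        form = "name"
    return namespace, form, target


for example in (
    "tiledb://tiledb-inc/pbmc3k",
    "tiledb://tiledb-inc/078007ae-49c0-46b8-8924-3cc625615ddb",
    "tiledb://tiledb-inc/s3://demo-data/pbmc3k",
):
    print(classify_cloud_uri(example))
```

A check like this could be useful in scripts that accept a URI from a user and need to warn, for example, that a name-based URI may be ambiguous when several assets share the same name.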

Conclusion

Understanding the URI schemes in TileDB Cloud is essential for efficient and secure data management. Each URI form serves a particular use case: name-based URIs are human-readable, UUID-based URIs disambiguate assets that share a name, and S3-based URIs are useful when adding nested assets to a group.