Learn how to run TileDB-SOMA on your local machine, and interact with local file systems and TileDB Cloud.
This tutorial walks you through installing and configuring TileDB-SOMA on your local machine. Whether you’re working with single-cell data stored locally, in cloud object stores, or directly on TileDB Cloud, this guide covers the essential steps to get started.
Install
TileDB-SOMA provides APIs for Python and R. Each of these APIs can be installed in a few different ways. Select your preferred API and installation method below to get started.
Conda will install pre-built TileDB-R and TileDB core binaries for macOS or Linux.
Installing TileDB-R on macOS with Apple Silicon chips
When loading r-tiledbsoma with library(tiledbsoma), you may encounter the following error:
Error: package or namespace load failed for 'tiledbsoma':
.onLoad failed in loadNamespace() for 'tiledbsoma', details:
call: dyn.load(file, DLLpath = DLLpath, ...)
error: unable to load shared object '/tmp/miniconda3/envs/tiledb-soma/lib/R/library/RcppCCTZ/libs/RcppCCTZ.dylib':
dlopen(/tmp/miniconda3/envs/tiledb-soma/lib/R/library/RcppCCTZ/libs/RcppCCTZ.dylib, 0x0006): tried: '/tmp/miniconda3/envs/tiledb-soma/lib/R/library/RcppCCTZ/libs/RcppCCTZ.dylib' (mach-o file, but is an incompatible architecture (
have 'x86_64', need 'arm64e' or 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/tmp/miniconda3/envs/tiledb-soma/lib/R/library/RcppCCTZ/libs/RcppCCTZ.dylib' (no such file), '/tmp/miniconda3/envs/tiledb-soma/lib/R/library/RcppCCTZ/l
ibs/RcppCCTZ.dylib' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e' or 'arm64'))
To overcome this error, reinstall RcppCCTZ and nanotime from source:
install.packages("RcppCCTZ", type = "source")
install.packages("nanotime", type = "source")
Mamba will install pre-built TileDB-R and TileDB core binaries for macOS or Linux.
The same Apple Silicon note above applies to Mamba-based installations.
tiledbsoma: 1.11.4
tiledb-r: 0.27.0
tiledb core: 2.23.1
libtiledbsoma: 2.23.1
R: R version 4.3.3 (2024-02-29)
OS: Debian GNU/Linux 11 (bullseye)
Your starting point is the pbmc3k dataset, which contains 2,700 peripheral blood mononuclear cells (PBMC) from a healthy donor. The raw data was generated by 10X Genomics and is available on their website. The version of the dataset used here was processed with this scanpy notebook.
Download and load an RDS file containing a Seurat version of the pbmc3k dataset, which has been made available on TileDB Cloud using the Files feature.
rds_uri <- "tiledb://TileDB-Inc/scanpy_pbmc3k_processed_rds"
rds_path <- file.path(tempdir(), "pbmc3k_processed.rds")
if (!file.exists(rds_path)) {
  if (!tiledb_filestore_uri_export(rds_path, rds_uri)) {
    stop("Failed to export RDS file from TileDB Cloud")
  }
}
pbmc3k <- readRDS(rds_path)
pbmc3k
An object of class Seurat
1838 features across 2638 samples within 1 assay
Active assay: RNA (1838 features, 0 variable features)
2 layers present: counts, data
4 dimensional reductions calculated: umap, tsne, draw_graph_fr, pca
Local file system
Here you will see how to work with TileDB-SOMA on your local machine, using a local file system. This is the simplest setup, as it requires no additional configuration.
Ingestion
As you learned in the Data Ingestion tutorial, the SOMA ingestion process requires a user-specified URI that controls where the SOMA experiment is created.
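On a local file system, the experiment URI is just a directory path. A minimal Python sketch of picking one (the temporary directory and the soma-exp-pbmc3k name are illustrative choices, not fixed by TileDB-SOMA):

```python
import os
import tempfile

# Any writable directory path works as a local URI; here a fresh
# temporary directory stands in for a real destination.
EXPERIMENT_URI = os.path.join(tempfile.mkdtemp(), "soma-exp-pbmc3k")
print(EXPERIMENT_URI)
```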
Now pass the Seurat object to write_soma() to ingest the dataset into a new SOMA experiment at the specified URI.
write_soma(pbmc3k, uri = EXPERIMENT_URI)
'/tmp/RtmpusaiOF/soma-exp-pbmc3k'
Data access
Now that the dataset has been ingested into a SOMA experiment, you can access it using the EXPERIMENT_URI. Here, you will retrieve all annotations for the B cells in this dataset.
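In the Python API, such a retrieval is typically expressed as an axis query with a value filter on obs. A sketch, assuming the experiment from the ingestion step and that the cell-type labels live in an obs column named louvain (both are assumptions; check your experiment's schema):

```python
# The filter string uses TileDB-SOMA's value_filter syntax; the column
# name "louvain" and the label "B cells" are assumptions about this dataset.
value_filter = "louvain == 'B cells'"

# Running the query needs tiledbsoma installed and an ingested experiment:
# import tiledbsoma
# with tiledbsoma.Experiment.open(EXPERIMENT_URI) as exp:
#     with exp.axis_query(
#         "RNA",
#         obs_query=tiledbsoma.AxisQuery(value_filter=value_filter),
#     ) as query:
#         b_cell_obs = query.obs().concat().to_pandas()
print(value_filter)
```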
To delete the SOMA experiment from the local directory, you can call shutil.rmtree() as you would with any other directory.
shutil.rmtree(EXPERIMENT_URI)
To delete the experiment from the local directory you can simply call base R’s unlink() function as you would with any other directory.
unlink(EXPERIMENT_URI, recursive =TRUE)
TileDB Cloud
Note
Running the examples in this section requires completing the prerequisites detailed in the TileDB Cloud Onboarding section.
Next you will see how to work with TileDB Cloud from a local machine. In practice, this means the new SOMA experiment will be created on S3 and registered in TileDB Cloud’s data catalog.
While the code required to ingest and access data is largely the same as working with a local file system, you will need to provide your TileDB Cloud credentials and a destination S3 bucket.
Setup
You must authenticate yourself with TileDB Cloud in order to interact with the service from your local machine (this is handled automatically when using TileDB Cloud hosted notebooks). While it’s possible to authenticate with a username and password, it’s recommended to use a REST API token for enhanced security. This tutorial assumes you have already stored your REST API token as an environment variable called TILEDB_REST_TOKEN. Additionally, the following environment variables must be defined in your environment with custom values before running the following examples.
S3_BUCKET with the URI for the destination S3 bucket.
S3_REGION with the region of the destination S3 bucket.
TILEDB_ACCOUNT with the TileDB Cloud account name.
# Get the keys from the environment variables.
config = {
    "rest.token": os.environ.get("TILEDB_REST_TOKEN"),
    # or username and password
    # "rest.username": os.environ.get("TILEDB_USERNAME"),
    # "rest.password": os.environ.get("TILEDB_PASSWORD"),
}
tiledb_account = os.environ.get("TILEDB_ACCOUNT")
s3_bucket = os.environ.get("S3_BUCKET")
s3_region = os.environ.get("S3_REGION")
# Get the keys from the environment variables.
config <- list(
  rest.token = Sys.getenv("TILEDB_REST_TOKEN")
  # or use username and password
  # rest.username = Sys.getenv("TILEDB_USERNAME"),
  # rest.password = Sys.getenv("TILEDB_PASSWORD")
)
tiledb_account <- Sys.getenv("TILEDB_ACCOUNT")
s3_bucket <- Sys.getenv("S3_BUCKET")
s3_region <- Sys.getenv("S3_REGION")
Pass the REST token to the TileDB-SOMA context constructor, which will be used to authenticate TileDB Cloud access.
Use the TileDB Cloud account name (i.e., namespace), S3 bucket, and experiment name to create a TileDB Cloud URI in the form tiledb://<namespace>/s3://<bucket>/<experiment_name>.
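For example, the pieces combine into a creation URI like this (the account and bucket names below are placeholders, not real resources):

```python
# Placeholder values; substitute your own account, bucket, and name.
tiledb_account = "my-account"
s3_bucket = "s3://my-bucket"
experiment_name = "soma-exp-pbmc3k"

# tiledb://<namespace>/s3://<bucket>/<experiment_name>
experiment_uri = f"tiledb://{tiledb_account}/{s3_bucket}/{experiment_name}"
print(experiment_uri)  # tiledb://my-account/s3://my-bucket/soma-exp-pbmc3k
```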
Tip
See the TileDB Cloud URI foundation page for more details about this URI format.
By virtue of using the TileDB Cloud URI, the SOMA experiment is automatically created on S3 in the specified bucket and registered with TileDB Cloud.
Query
Similarly, the same query can be executed on the experiment using the exact same code as before. The only differences are the tiledb:// URI and the context object that contains the TileDB Cloud credentials.
To fully clean up, you must delete the experiment from the S3 bucket and unregister it from TileDB Cloud. Both steps can be performed at once by running the following code.
In this tutorial, you’ve learned how to install and configure TileDB-SOMA on your local machine, as well as how to interact with data stored locally and on TileDB Cloud.