Learn how to run TileDB-SOMA on your local machine, and interact with local file systems and TileDB Cloud.
This tutorial walks you through installing and configuring TileDB-SOMA on your local machine. Whether you’re working with single-cell data stored locally, in cloud object stores, or directly on TileDB Cloud, this guide covers the essential steps to get started.
Install
TileDB-SOMA provides APIs for Python and R. Each of these APIs can be installed in a few different ways. Select your preferred API and installation method below to get started.
Conda will install pre-built TileDB-R and TileDB core binaries for macOS or Linux.
Installing TileDB-R on macOS with Apple Silicon chips
When loading r-tiledbsoma with library(tiledbsoma), you may encounter the following error:
Error: package or namespace load failed for 'tiledbsoma':
.onLoad failed in loadNamespace() for 'tiledbsoma', details:
call: dyn.load(file, DLLpath = DLLpath, ...)
error: unable to load shared object '/tmp/miniconda3/envs/tiledb-soma/lib/R/library/RcppCCTZ/libs/RcppCCTZ.dylib':
dlopen(/tmp/miniconda3/envs/tiledb-soma/lib/R/library/RcppCCTZ/libs/RcppCCTZ.dylib, 0x0006): tried: '/tmp/miniconda3/envs/tiledb-soma/lib/R/library/RcppCCTZ/libs/RcppCCTZ.dylib' (mach-o file, but is an incompatible architecture (
have 'x86_64', need 'arm64e' or 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/tmp/miniconda3/envs/tiledb-soma/lib/R/library/RcppCCTZ/libs/RcppCCTZ.dylib' (no such file), '/tmp/miniconda3/envs/tiledb-soma/lib/R/library/RcppCCTZ/l
ibs/RcppCCTZ.dylib' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e' or 'arm64'))
To overcome this error, reinstall RcppCCTZ and nanotime from source:
install.packages("RcppCCTZ", type = "source")
install.packages("nanotime", type = "source")
Mamba will install pre-built TileDB-R and TileDB core binaries for macOS or Linux.
The same Apple Silicon note above applies to Mamba-based installations.
tiledbsoma: 1.11.4
tiledb-r: 0.27.0
tiledb core: 2.23.1
libtiledbsoma: 2.23.1
R: R version 4.3.3 (2024-02-29)
OS: Debian GNU/Linux 11 (bullseye)
Your starting point is the pbmc3k dataset, which contains 2,700 peripheral blood mononuclear cells (PBMC) from a healthy donor. The raw data was generated by 10X Genomics and is available on their website. The version of the dataset used here was processed with this scanpy notebook.
Download and load an RDS file containing a Seurat version of the pbmc3k dataset, which has been made available on TileDB Cloud using the Files feature.
rds_uri <- "tiledb://TileDB-Inc/scanpy_pbmc3k_processed_rds"
rds_path <- file.path(tempdir(), "pbmc3k_processed.rds")
if (!file.exists(rds_path)) {
  if (!tiledb_filestore_uri_export(rds_path, rds_uri)) {
    stop("Failed to export RDS file from TileDB Cloud")
  }
}
pbmc3k <- readRDS(rds_path)
pbmc3k
An object of class Seurat
1838 features across 2638 samples within 1 assay
Active assay: RNA (1838 features, 0 variable features)
2 layers present: counts, data
4 dimensional reductions calculated: umap, tsne, draw_graph_fr, pca
Local file system
Here you will see how to work with TileDB-SOMA on your local machine, using a local file system. This is the simplest setup, as it requires no additional configuration.
Ingestion
As you learned in the Data Ingestion tutorial, the SOMA ingestion process requires a user-specified URI that controls where the SOMA experiment is created.
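On a local file system, the experiment URI is just a directory path. A minimal Python sketch of picking one (the temporary directory and the soma-exp-pbmc3k name are illustrative choices, not fixed by TileDB-SOMA):

```python
import os
import tempfile

# Any writable directory path works as a local URI; here a fresh
# temporary directory stands in for a real destination.
EXPERIMENT_URI = os.path.join(tempfile.mkdtemp(), "soma-exp-pbmc3k")
print(EXPERIMENT_URI)
```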
Now pass the Seurat object to write_soma() to ingest the dataset into a new SOMA experiment at the specified URI.
write_soma(pbmc3k, uri = EXPERIMENT_URI)
'/tmp/RtmpusaiOF/soma-exp-pbmc3k'
Data access
Now that the dataset has been ingested into a SOMA experiment, you can access it using the EXPERIMENT_URI. Here, you will retrieve all annotations for the B cells in this dataset.
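In the Python API, such a retrieval is typically expressed as an axis query with a value filter on obs. A sketch, assuming the experiment from the ingestion step and that the cell-type labels live in an obs column named louvain (both are assumptions; check your experiment's schema):

```python
# The filter string uses TileDB-SOMA's value_filter syntax; the column
# name "louvain" and the label "B cells" are assumptions about this dataset.
value_filter = "louvain == 'B cells'"

# Running the query needs tiledbsoma installed and an ingested experiment:
# import tiledbsoma
# with tiledbsoma.Experiment.open(EXPERIMENT_URI) as exp:
#     with exp.axis_query(
#         "RNA",
#         obs_query=tiledbsoma.AxisQuery(value_filter=value_filter),
#     ) as query:
#         b_cell_obs = query.obs().concat().to_pandas()
print(value_filter)
```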
To delete the SOMA experiment from the local directory, you can call shutil.rmtree() as you would with any other directory.
shutil.rmtree(EXPERIMENT_URI)
To delete the experiment from the local directory you can simply call base R’s unlink() function as you would with any other directory.
unlink(EXPERIMENT_URI, recursive =TRUE)
TileDB Cloud
Note
Running the examples in this section requires completing the prerequisites detailed in the TileDB Cloud Onboarding section.
Next you will see how to work with TileDB Cloud from a local machine. In practice, this means the new SOMA experiment will be created on S3 and registered in TileDB Cloud’s data catalog.
While the code required to ingest and access data is largely the same as working with a local file system, you will need to provide your TileDB Cloud credentials and a destination S3 bucket.
Setup
You must authenticate yourself with TileDB Cloud in order to interact with the service from your local machine (this is handled automatically when using TileDB Cloud hosted notebooks). While it’s possible to authenticate with a username and password, it’s recommended to use a REST API token for enhanced security. This tutorial assumes you have already stored your REST API token as an environment variable called TILEDB_REST_TOKEN. Additionally, the following environment variables must be defined in your environment with custom values before running the following examples.
S3_BUCKET with the URI for the destination S3 bucket.
S3_REGION with the region of the destination S3 bucket.
TILEDB_ACCOUNT with the TileDB Cloud account name.
# Get the keys from the environment variables.
config = {
    "rest.token": os.environ.get("TILEDB_REST_TOKEN"),
    # or username and password
    # "rest.username": os.environ.get("TILEDB_USERNAME"),
    # "rest.password": os.environ.get("TILEDB_PASSWORD"),
}
tiledb_account = os.environ.get("TILEDB_ACCOUNT")
s3_bucket = os.environ.get("S3_BUCKET")
s3_region = os.environ.get("S3_REGION")
# Get the keys from the environment variables.
config <- list(
  rest.token = Sys.getenv("TILEDB_REST_TOKEN")
  # or use username and password
  # rest.username = Sys.getenv("TILEDB_USERNAME"),
  # rest.password = Sys.getenv("TILEDB_PASSWORD")
)
tiledb_account <- Sys.getenv("TILEDB_ACCOUNT")
s3_bucket <- Sys.getenv("S3_BUCKET")
s3_region <- Sys.getenv("S3_REGION")
Pass the REST token to the TileDB-SOMA context constructor, which will be used to authenticate TileDB Cloud access.
Use the TileDB Cloud account name (i.e., namespace), S3 bucket, and experiment name to create a TileDB Cloud URI in the form tiledb://<namespace>/s3://<bucket>/<experiment_name>.
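For example, the pieces combine into a creation URI like this (the account and bucket names below are placeholders, not real resources):

```python
# Placeholder values; substitute your own account, bucket, and name.
tiledb_account = "my-account"
s3_bucket = "s3://my-bucket"
experiment_name = "soma-exp-pbmc3k"

# tiledb://<namespace>/s3://<bucket>/<experiment_name>
experiment_uri = f"tiledb://{tiledb_account}/{s3_bucket}/{experiment_name}"
print(experiment_uri)  # tiledb://my-account/s3://my-bucket/soma-exp-pbmc3k
```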
Tip
See the TileDB Cloud URI foundation page for more details about this URI format.
By virtue of using the TileDB Cloud URI, the SOMA experiment is automatically created on S3 in the specified bucket and registered with TileDB Cloud.
Query
Similarly, the same query can be executed on the experiment using the exact same code as before. The only differences are the tiledb:// URI and the context object that contains the TileDB Cloud credentials.
To fully clean up, you must delete the experiment from the S3 bucket and unregister it from TileDB Cloud. Both steps can be performed at once by running the following code.
In this tutorial, you’ve learned how to install and configure TileDB-SOMA on your local machine, as well as how to interact with data stored locally and on TileDB Cloud.