Run TileDB-SOMA Locally

Learn how to run TileDB-SOMA on your local machine, and interact with local file systems and TileDB Cloud.

This tutorial will walk you through installing and configuring TileDB-SOMA on your local machine. Whether you're working with single-cell data stored locally, in cloud object stores, or directly on TileDB Cloud, this guide covers the essential steps to get started.

Install

TileDB-SOMA provides APIs for Python and R. Each of these APIs can be installed in a few different ways. Select your preferred API and installation method below to get started.

  • Python
  • R
  • Conda
  • Mamba
  • PyPI
conda install -c conda-forge -c tiledb tiledbsoma-py
Note

Conda will install pre-built TileDB-Py and TileDB core binaries for macOS or Linux.

mamba install -c conda-forge -c tiledb tiledbsoma-py
Note

Mamba will install pre-built TileDB-Py and TileDB core binaries for macOS or Linux.

pip install tiledbsoma
Note

pip currently provides binary wheels for Linux, and will build all dependencies from source on other platforms.

  • Conda
  • Mamba
  • r-universe
conda install -c conda-forge -c tiledb r-tiledbsoma
Note

Conda will install pre-built TileDB-R and TileDB core binaries for macOS or Linux.

Installing TileDB-R on macOS with Apple Silicon chips

When loading r-tiledbsoma with library(tiledbsoma), you may encounter the following error:

Error: package or namespace load failed for 'tiledbsoma':
 .onLoad failed in loadNamespace() for 'tiledbsoma', details:
  call: dyn.load(file, DLLpath = DLLpath, ...)
  error: unable to load shared object '/tmp/miniconda3/envs/tiledb-soma/lib/R/library/RcppCCTZ/libs/RcppCCTZ.dylib':
  dlopen(/tmp/miniconda3/envs/tiledb-soma/lib/R/library/RcppCCTZ/libs/RcppCCTZ.dylib, 0x0006): tried: '/tmp/miniconda3/envs/tiledb-soma/lib/R/library/RcppCCTZ/libs/RcppCCTZ.dylib' (mach-o file, but is an incompatible architecture (
have 'x86_64', need 'arm64e' or 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/tmp/miniconda3/envs/tiledb-soma/lib/R/library/RcppCCTZ/libs/RcppCCTZ.dylib' (no such file), '/tmp/miniconda3/envs/tiledb-soma/lib/R/library/RcppCCTZ/l
ibs/RcppCCTZ.dylib' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e' or 'arm64'))

To overcome this error, reinstall RcppCCTZ and nanotime from source:

install.packages("RcppCCTZ", type = "source")
install.packages("nanotime", type = "source")
mamba install -c conda-forge -c tiledb r-tiledbsoma
Note

Mamba will install pre-built TileDB-R and TileDB core binaries for macOS or Linux.

install.packages(
  pkgs = "tiledbsoma",
  repos = c("https://tiledb-inc.r-universe.dev", "https://cloud.r-project.org")
)

library(tiledbsoma)
Installing TileDB-R on macOS

TileDB-SOMA is not yet available on CRAN but can be installed from r-universe, which provides pre-built binaries for macOS and Linux.

With the TileDB-SOMA package installed, you can start working with single-cell data stored in the TileDB-SOMA format.
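Before moving on, it can be helpful to confirm the package is importable from the environment you plan to use. The following sketch uses only the Python standard library; the `is_installed` helper is our own illustration, not part of TileDB-SOMA.

```python
import importlib.util


def is_installed(pkg: str) -> bool:
    """Return True if `pkg` can be imported in the current environment."""
    return importlib.util.find_spec(pkg) is not None


# After a successful install, this should print True.
print(is_installed("tiledbsoma"))
```

If this prints `False`, double-check that you activated the same conda/mamba environment (or virtualenv) you installed into.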

Setup

Load the tiledbsoma package and a few other packages to complete this tutorial.

  • Python
  • R
import os
import shutil
import tempfile

import scanpy as sc
import tiledb
import tiledbsoma
import tiledbsoma.io

tiledbsoma.show_package_versions()
library(tiledb)
library(tiledbsoma)
suppressPackageStartupMessages(library(Seurat))

show_package_versions()
tiledbsoma:    1.11.4
tiledb-r:      0.27.0
tiledb core:   2.23.1
libtiledbsoma: 2.23.1
R:             R version 4.3.3 (2024-02-29)
OS:            Debian GNU/Linux 11 (bullseye)

Your starting point is the pbmc3k dataset, which contains 2,700 peripheral blood mononuclear cells (PBMC) from a healthy donor. The raw data was generated by 10X Genomics and is available on their website. The version of the dataset used here was processed with this scanpy notebook.

  • Python
  • R

Download and load the pbmc3k dataset using the scanpy package.

adata = sc.datasets.pbmc3k_processed()
adata
AnnData object with n_obs × n_vars = 2638 × 1838
    obs: 'n_genes', 'percent_mito', 'n_counts', 'louvain'
    var: 'n_cells'
    uns: 'draw_graph', 'louvain', 'louvain_colors', 'neighbors', 'pca', 'rank_genes_groups'
    obsm: 'X_pca', 'X_tsne', 'X_umap', 'X_draw_graph_fr'
    varm: 'PCs'
    obsp: 'distances', 'connectivities'

Download and load an RDS file containing a Seurat version of the pbmc3k dataset, which has been made available on TileDB Cloud using the Files feature.

rds_uri <- "tiledb://TileDB-Inc/scanpy_pbmc3k_processed_rds"
rds_path <- file.path(tempdir(), "pbmc3k_processed.rds")

if (!file.exists(rds_path)) {
  if (!tiledb_filestore_uri_export(rds_path, rds_uri)) {
    stop("Failed to export RDS file from TileDB Cloud")
  }
}

pbmc3k <- readRDS(rds_path)
pbmc3k
An object of class Seurat 
1838 features across 2638 samples within 1 assay 
Active assay: RNA (1838 features, 0 variable features)
 2 layers present: counts, data
 4 dimensional reductions calculated: umap, tsne, draw_graph_fr, pca

Local file system

Here you will see how to work with TileDB-SOMA on your local machine, using a local file system. This is the simplest setup, as it requires no additional configuration.

Ingestion

As you learned in the Data Ingestion tutorial, the SOMA ingestion process requires a user-specified URI that controls where the SOMA experiment is created.

  • Python
  • R

Create a local temporary directory to pass to the ingestor’s experiment_uri argument.

EXPERIMENT_NAME = "soma-exp-pbmc3k"

EXPERIMENT_URI = tempfile.mkdtemp(prefix=EXPERIMENT_NAME)
EXPERIMENT_URI
'/tmp/soma-exp-pbmc3komifo08l'

Create a local temporary directory to pass to the ingestor’s uri argument.

EXPERIMENT_NAME <- "soma-exp-pbmc3k"

EXPERIMENT_URI <- file.path(tempdir(), EXPERIMENT_NAME)
EXPERIMENT_URI
'/tmp/RtmpusaiOF/soma-exp-pbmc3k'

With the experiment URI defined, proceed to ingest the dataset into the SOMA format at the specified location.

  • Python
  • R

Now pass the AnnData object to tiledbsoma.io.from_anndata() to ingest the dataset into a new SOMA experiment at the specified URI.

tiledbsoma.io.from_anndata(
    experiment_uri=EXPERIMENT_URI, measurement_name="RNA", anndata=adata
)
'/tmp/soma-exp-pbmc3komifo08l'

Now pass the Seurat object to write_soma() to ingest the dataset into a new SOMA experiment at the specified URI.

write_soma(pbmc3k, uri = EXPERIMENT_URI)
'/tmp/RtmpusaiOF/soma-exp-pbmc3k'

Data access

Now that the dataset has been ingested into a SOMA experiment, you can access it using the EXPERIMENT_URI. Here, you will retrieve all annotations for the B cells in this dataset.
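Conceptually, the value filter acts as a row-wise predicate on the obs annotations, so only matching rows are returned by the query. A minimal pure-Python sketch of that filtering logic, using a few illustrative rows rather than the real SOMA API:

```python
# Illustrative obs records (only the first obs_id is taken from the
# actual pbmc3k output; the rest are stand-ins).
obs_rows = [
    {"obs_id": "AAACATTGAGCTAC-1", "louvain": "B cells"},
    {"obs_id": "AAACCGTGCTTCCG-1", "louvain": "CD14+ Monocytes"},
    {"obs_id": "AAACGCACTGGTAC-1", "louvain": "CD8 T cells"},
]

# A value filter such as "louvain == 'B cells'" keeps only matching rows.
b_cells = [row for row in obs_rows if row["louvain"] == "B cells"]
print([row["obs_id"] for row in b_cells])  # ['AAACATTGAGCTAC-1']
```

In the real query below, TileDB-SOMA evaluates the filter inside the storage engine, so non-matching rows are never materialized in memory.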

  • Python
  • R
with tiledbsoma.Experiment.open(EXPERIMENT_URI) as experiment:
    with experiment.axis_query(
        measurement_name="RNA",
        obs_query=tiledbsoma.AxisQuery(
            value_filter="louvain == 'B cells'",
        ),
    ) as query:
        obs = query.obs().concat().to_pandas()

obs
     soma_joinid            obs_id  n_genes  percent_mito  n_counts  louvain
0              1  AAACATTGAGCTAC-1     1352      0.037936    4903.0  B cells
1             10  AAACTTGAAAAACG-1     1116      0.026316    3914.0  B cells
2             18  AAAGGCCTGTCTAG-1     1446      0.015283    4973.0  B cells
3             19  AAAGTTTGATCACG-1      446      0.034700    1268.0  B cells
4             20  AAAGTTTGGGGTGA-1     1020      0.025907    3281.0  B cells
...          ...               ...      ...           ...       ...      ...
337         2628  TTTCAGTGTCACGA-1      700      0.034314    1632.0  B cells
338         2630  TTTCAGTGTGCAGT-1      637      0.018925    1321.0  B cells
339         2634  TTTCTACTGAGGCA-1     1227      0.009294    3443.0  B cells
340         2635  TTTCTACTTCCTCG-1      622      0.021971    1684.0  B cells
341         2636  TTTGCATGAGAGGC-1      454      0.020548    1022.0  B cells

342 rows × 6 columns

experiment <- SOMAExperimentOpen(EXPERIMENT_URI)

query <- experiment$axis_query(
  measurement_name = "RNA",
  obs_query = SOMAAxisQuery$new(
    value_filter = "louvain == 'B cells'"
  )
)

obs <- query$obs()$concat()$to_data_frame()
obs
A tibble: 342 x 9
soma_joinid  orig.ident     nCount_RNA  nFeature_RNA  n_genes  percent_mito  n_counts  louvain  obs_id
      <int>  <fct>               <dbl>         <int>    <int>         <dbl>     <dbl>  <chr>    <chr>
          1  SeuratProject   233.96095           249     1352    0.03793596      4903  B cells  AAACATTGAGCTAC-1
         10  SeuratProject   191.90643           216     1116    0.02631579      3914  B cells  AAACTTGAAAAACG-1
         18  SeuratProject   250.50210           277     1446    0.01528253      4973  B cells  AAAGGCCTGTCTAG-1
         19  SeuratProject    73.80223            88      446    0.03470032      1268  B cells  AAAGTTTGATCACG-1
         20  SeuratProject   187.42732           207     1020    0.02590674      3281  B cells  AAAGTTTGGGGTGA-1
        ...  ...                   ...           ...      ...           ...       ...  ...      ...
       2628  SeuratProject   113.45525           139      700    0.03431373      1632  B cells  TTTCAGTGTCACGA-1
       2630  SeuratProject    96.41425           119      637    0.01892506      1321  B cells  TTTCAGTGTGCAGT-1
       2634  SeuratProject   171.67429           193     1227    0.00929422      3443  B cells  TTTCTACTGAGGCA-1
       2635  SeuratProject    92.68251           108      622    0.02197150      1684  B cells  TTTCTACTTCCTCG-1
       2636  SeuratProject    77.38343            95      454    0.02054795      1022  B cells  TTTGCATGAGAGGC-1

Cleanup

  • Python
  • R

To delete the SOMA experiment from the local directory, call shutil.rmtree() as you would with any other directory.

shutil.rmtree(EXPERIMENT_URI)

To delete the experiment from the local directory, call base R's unlink() function as you would with any other directory.

unlink(EXPERIMENT_URI, recursive = TRUE)

TileDB Cloud

Note

Running the examples in this section requires completing the prerequisites detailed in the TileDB Cloud Onboarding section.

Next, you will see how to work with TileDB Cloud from a local machine. In practice, this means the new SOMA experiment will be created on S3 and registered in TileDB Cloud's data catalog.

While the code required to ingest and access data is largely the same as working with a local file system, you will need to provide your TileDB Cloud credentials and a destination S3 bucket.

Setup

You must authenticate with TileDB Cloud in order to interact with the service from your local machine (this is handled automatically when using TileDB Cloud hosted notebooks). While it's possible to authenticate with a username and password, using a REST API token is recommended for better security. This tutorial assumes you have already stored your REST API token in an environment variable called TILEDB_REST_TOKEN. Additionally, the following environment variables must be defined with custom values before running the examples:

  • S3_BUCKET with the URI for the destination S3 bucket.
  • S3_REGION with the region of the destination S3 bucket.
  • TILEDB_ACCOUNT with the TileDB Cloud account name (i.e., namespace).
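For example, in a POSIX shell you could define them like this (every value below is a placeholder, not a real token, bucket, or account):

```shell
# Placeholder values -- replace each one with your own before running.
export TILEDB_REST_TOKEN="<your-rest-api-token>"
export TILEDB_ACCOUNT="<your-tiledb-account>"
export S3_BUCKET="s3://<your-bucket>/<prefix>"
export S3_REGION="<your-bucket-region>"
```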
  • Python
  • R
# Get the keys from the environment variables.
config = {
    "rest.token": os.environ.get("TILEDB_REST_TOKEN"),
    # or username and password
    # "rest.username": os.environ.get("TILEDB_USERNAME"),
    # "rest.password": os.environ.get("TILEDB_PASSWORD"),
}

tiledb_account = os.environ.get("TILEDB_ACCOUNT")
s3_bucket = os.environ.get("S3_BUCKET")
s3_region = os.environ.get("S3_REGION")
# Get the keys from the environment variables.
config <- list(
  rest.token = Sys.getenv("TILEDB_REST_TOKEN")
  # or use username and password
  # rest.username = Sys.getenv("TILEDB_USERNAME"),
  # rest.password = Sys.getenv("TILEDB_PASSWORD")
)

tiledb_account <- Sys.getenv("TILEDB_ACCOUNT")
s3_bucket <- Sys.getenv("S3_BUCKET")
s3_region <- Sys.getenv("S3_REGION")

Pass the configuration containing the REST API token to the TileDB-SOMA context constructor, which is used to authenticate with TileDB Cloud.

  • Python
  • R
ctx = tiledbsoma.SOMATileDBContext(tiledb_config=config)
ctx <- tiledbsoma::SOMATileDBContext$new(config = config)

Ingest

Use the TileDB Cloud account name (i.e., namespace), S3 bucket, and experiment name to create a TileDB Cloud URI in the form tiledb://<namespace>/s3://<bucket>/<experiment_name>.

Tip

See the TileDB Cloud URIs foundation page for more details about this URI format.

  • Python
  • R
EXPERIMENT_URI = f"tiledb://{tiledb_account}/{s3_bucket}/{EXPERIMENT_NAME}"
EXPERIMENT_URI
'tiledb://aaronwolen/s3://tiledb-aaron/academy/soma-exp-pbmc3k'
EXPERIMENT_URI <- sprintf("tiledb://%s/%s/%s", tiledb_account, s3_bucket, EXPERIMENT_NAME)
EXPERIMENT_URI
'tiledb://aaronwolen/s3://tiledb-aaron/academy/soma-exp-pbmc3k'

Other than providing the authenticated context object, no changes to the ingestion code are required.

  • Python
  • R
if tiledb.object_type(EXPERIMENT_URI, ctx=ctx.tiledb_ctx) is not None:
    tiledb.remove(EXPERIMENT_URI, ctx=ctx.tiledb_ctx)

tiledbsoma.io.from_anndata(
    experiment_uri=EXPERIMENT_URI, measurement_name="RNA", anndata=adata, context=ctx
)
'tiledb://aaronwolen/s3://tiledb-aaron/academy/soma-exp-pbmc3k'
type <- tiledb_object_type(EXPERIMENT_URI, ctx$to_tiledb_context())
if (type != "INVALID") {
  tiledb_object_remove(ctx$to_tiledb_context(), EXPERIMENT_URI)
}

write_soma(pbmc3k, uri = EXPERIMENT_URI, tiledbsoma_ctx = ctx)
'tiledb://aaronwolen/s3://tiledb-aaron/academy/soma-exp-pbmc3k'

By virtue of using the TileDB Cloud URI, the SOMA experiment is automatically created on S3 in the specified bucket and registered with TileDB Cloud.

Query

The same query can be executed against the cloud-hosted experiment using the same code as before. The only differences are the tiledb:// URI and the context object containing your TileDB Cloud credentials.

  • Python
  • R
with tiledbsoma.Experiment.open(EXPERIMENT_URI, context=ctx) as experiment:
    with experiment.axis_query(
        measurement_name="RNA",
        obs_query=tiledbsoma.AxisQuery(
            value_filter="louvain == 'B cells'",
        ),
    ) as query:
        obs = query.obs().concat().to_pandas()

obs
     soma_joinid            obs_id  n_genes  percent_mito  n_counts  louvain
0              1  AAACATTGAGCTAC-1     1352      0.037936    4903.0  B cells
1             10  AAACTTGAAAAACG-1     1116      0.026316    3914.0  B cells
2             18  AAAGGCCTGTCTAG-1     1446      0.015283    4973.0  B cells
3             19  AAAGTTTGATCACG-1      446      0.034700    1268.0  B cells
4             20  AAAGTTTGGGGTGA-1     1020      0.025907    3281.0  B cells
...          ...               ...      ...           ...       ...      ...
337         2628  TTTCAGTGTCACGA-1      700      0.034314    1632.0  B cells
338         2630  TTTCAGTGTGCAGT-1      637      0.018925    1321.0  B cells
339         2634  TTTCTACTGAGGCA-1     1227      0.009294    3443.0  B cells
340         2635  TTTCTACTTCCTCG-1      622      0.021971    1684.0  B cells
341         2636  TTTGCATGAGAGGC-1      454      0.020548    1022.0  B cells

342 rows × 6 columns

experiment <- SOMAExperimentOpen(EXPERIMENT_URI, tiledbsoma_ctx = ctx)

query <- experiment$axis_query(
  measurement_name = "RNA",
  obs_query = SOMAAxisQuery$new(
    value_filter = "louvain == 'B cells'"
  )
)

obs <- query$obs()$concat()$to_data_frame()
obs
A tibble: 342 x 9
soma_joinid  orig.ident     nCount_RNA  nFeature_RNA  n_genes  percent_mito  n_counts  louvain  obs_id
      <int>  <fct>               <dbl>         <int>    <int>         <dbl>     <dbl>  <chr>    <chr>
          1  SeuratProject   233.96095           249     1352    0.03793596      4903  B cells  AAACATTGAGCTAC-1
         10  SeuratProject   191.90643           216     1116    0.02631579      3914  B cells  AAACTTGAAAAACG-1
         18  SeuratProject   250.50210           277     1446    0.01528253      4973  B cells  AAAGGCCTGTCTAG-1
         19  SeuratProject    73.80223            88      446    0.03470032      1268  B cells  AAAGTTTGATCACG-1
         20  SeuratProject   187.42732           207     1020    0.02590674      3281  B cells  AAAGTTTGGGGTGA-1
        ...  ...                   ...           ...      ...           ...       ...  ...      ...
       2628  SeuratProject   113.45525           139      700    0.03431373      1632  B cells  TTTCAGTGTCACGA-1
       2630  SeuratProject    96.41425           119      637    0.01892506      1321  B cells  TTTCAGTGTGCAGT-1
       2634  SeuratProject   171.67429           193     1227    0.00929422      3443  B cells  TTTCTACTGAGGCA-1
       2635  SeuratProject    92.68251           108      622    0.02197150      1684  B cells  TTTCTACTTCCTCG-1
       2636  SeuratProject    77.38343            95      454    0.02054795      1022  B cells  TTTGCATGAGAGGC-1

Cleanup

To fully clean up, you must delete the experiment from the S3 bucket and unregister it from TileDB Cloud. Both steps can be performed at once with the following code.

  • Python
  • R
with tiledb.Group(uri=EXPERIMENT_URI, ctx=ctx.tiledb_ctx, mode="m") as group:
    group.delete(recursive=True)
grp <- tiledb_group(EXPERIMENT_URI, "READ")
grp <- tiledb_group_close(grp)

grp <- tiledb_group_open(grp, "MODIFY_EXCLUSIVE")
tiledb_group_delete(grp, EXPERIMENT_URI, recursive = TRUE)

Summary

In this tutorial, you’ve learned how to install and configure TileDB-SOMA on your local machine, as well as how to interact with data stored locally and on TileDB Cloud.
