Data Access

Learn how to access TileDB-SOMA data in a variety of ways.

This tutorial outlines how to access single-cell data stored in a SOMA experiment. See the Data Ingestion tutorial for information on how to create a SOMA experiment from a single-cell dataset.

Prerequisites

You can run this tutorial locally, but it relies on remote resources to run correctly.

You must create a REST API token and set an environment variable named TILEDB_REST_TOKEN to the value of your generated token.
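A quick check before running the rest of the tutorial can catch a missing token early. The sketch below only inspects the environment. The helper names `rest_token_is_set` and `rest_config` are hypothetical and not part of tiledbsoma. `rest.token` is the TileDB configuration key for a REST token; whether you need to pass it explicitly instead of relying on the environment variable is an assumption to verify for your setup.

```python
import os


def rest_token_is_set(env=None):
    """Return True if TILEDB_REST_TOKEN is present and non-empty."""
    env = os.environ if env is None else env
    return bool(env.get("TILEDB_REST_TOKEN", "").strip())


def rest_config(env=None):
    """Build a config dict carrying the token (hypothetical helper)."""
    env = os.environ if env is None else env
    if not rest_token_is_set(env):
        raise RuntimeError("Set TILEDB_REST_TOKEN before running this tutorial")
    return {"rest.token": env["TILEDB_REST_TOKEN"].strip()}


# Warn early instead of failing later on the first remote read.
if not rest_token_is_set():
    print("TILEDB_REST_TOKEN is not set; remote reads will likely fail")
```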

Setup

First, load tiledbsoma as well as a few other packages used in this tutorial:

Python
R

import scanpy as sc
import tiledbsoma
import tiledbsoma.io

tiledbsoma.show_package_versions()

tiledbsoma.__version__              1.11.4
TileDB-Py version                   0.29.0
TileDB core version (tiledb)        2.23.1
TileDB core version (libtiledbsoma) 2.23.1
python version                      3.9.19.final.0
OS version                          Linux 6.8.0-1013-aws

library(tiledb)
library(tiledbsoma)
suppressPackageStartupMessages(library(Seurat))

show_package_versions()

tiledbsoma:    1.11.4
tiledb-r:      0.27.0
tiledb core:   2.23.1
libtiledbsoma: 2.23.1
R:             R version 4.3.3 (2024-02-29)
OS:            Debian GNU/Linux 11 (bullseye)

Dataset

This tutorial uses a dataset from the Tabula Sapiens consortium, which includes nearly 265,000 immune cells across various tissue types. The original H5AD file was downloaded from Figshare and converted into a SOMA experiment using the TileDB-SOMA API. The resulting SOMA experiment is hosted on TileDB Cloud (tabula-sapiens-immune).

This SOMA experiment is accessible programmatically with the following URI:

Python
R

SOMA_URI = "tiledb://TileDB-Inc/tabula-sapiens-immune"

SOMA_URI <- "tiledb://TileDB-Inc/tabula-sapiens-immune"

Access SOMA components

Open the SOMA experiment in read mode to view its structure:

Python
R

experiment = tiledbsoma.Experiment.open(SOMA_URI)
experiment

<Experiment 'tiledb://TileDB-Inc/tabula-sapiens-immune' (open for 'r') (2 items)
    'ms': 'tiledb://TileDB-Inc/e19ed185-3710-4542-be4f-a82ce8418fd6' (unopened)
    'obs': 'tiledb://TileDB-Inc/e11d2d07-ab5a-41aa-9408-378802cd4890' (unopened)>

experiment <- SOMAExperimentOpen(SOMA_URI)
experiment

<SOMAExperiment>
  uri: tiledb://TileDB-Inc/tabula-sapiens-immune 
  arrays: obs* 
  groups: ms*

Note that opening a SOMA experiment (or any SOMA object) only returns a pointer to the object on disk. No data is actually loaded into memory until it’s requested.

The top level of the experiment contains two elements: obs, a SOMA DataFrame containing the cell annotations; and ms, a SOMA Collection of the measurements (e.g., RNA) in the experiment.

You can access the obs array directly with:

Python
R

experiment.obs

<DataFrame 'tiledb://TileDB-Inc/e11d2d07-ab5a-41aa-9408-378802cd4890' (open for 'r')>

experiment$obs

<SOMADataFrame>
  uri: tiledb://TileDB-Inc/e11d2d07-ab5a-41aa-9408-378802cd4890 
  dimensions: soma_joinid 
  attributes: cell_id, organ_tissue, method, donor, anatomical_information, n_counts_UMIs, ...

Other elements are nested within the experiment according to the SOMA data model (see the following diagram) but are accessible using a similar syntax.

For example, feature-level annotations are stored in the var array, which is always located at the top level of each SOMA Measurement. This dataset contains only a single measurement, RNA, but more complex datasets may contain multiple measurements. Access the RNA measurement’s var array.

Python
R

experiment.ms["RNA"].var

<DataFrame 'tiledb://TileDB-Inc/51fd7f27-3d17-49d0-abc3-04efd8fb9712' (open for 'r')>

experiment$ms$get("RNA")$var

<SOMADataFrame>
  uri: tiledb://TileDB-Inc/51fd7f27-3d17-49d0-abc3-04efd8fb9712 
  dimensions: soma_joinid 
  attributes: var_id, gene_symbol, feature_type, ensemblid, highly_variable, means, dispers...

Similarly, assay data (e.g., RNA expression levels) is stored in SOMASparseNDArrays within the X collection. Each array within X is referred to as a layer. Access the X collection to see what layers are available.

Python
R

experiment.ms["RNA"].X

<Collection 'tiledb://TileDB-Inc/eed32c99-793e-45c8-9fc4-2d2bfbf1ea75' (open for 'r') (3 items)
    'raw_counts': 'tiledb://TileDB-Inc/c7e36602-3603-43aa-9b74-0702dfc67261' (unopened)
    'decontXcounts': 'tiledb://TileDB-Inc/9e00e1b2-3839-466c-8d84-b563bdc9ad16' (unopened)
    'data': 'tiledb://TileDB-Inc/f831aedb-ec83-4a28-87c9-ebeda0932bce' (unopened)>

experiment$ms$get("RNA")$X

<SOMACollection>
  uri: tiledb://TileDB-Inc/eed32c99-793e-45c8-9fc4-2d2bfbf1ea75 
  arrays: data*, decontXcounts*, raw_counts*

The next section covers reading data from these components.
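As a recap of the layout explored above, the following sketch models the experiment as a plain nested dictionary. It is not a tiledbsoma object: the `experiment_layout` dictionary and the `resolve` helper are hypothetical, and only the element names (obs, ms, RNA, var, X and the three layers) come from the outputs shown earlier.

```python
# Plain-dict stand-in for the experiment layout; not a tiledbsoma object.
experiment_layout = {
    "obs": "DataFrame (cell annotations)",
    "ms": {
        "RNA": {
            "var": "DataFrame (feature annotations)",
            "X": {
                "raw_counts": "SparseNDArray",
                "decontXcounts": "SparseNDArray",
                "data": "SparseNDArray",
            },
        },
    },
}


def resolve(layout, path):
    """Walk a '/'-separated path through the nested layout."""
    node = layout
    for key in path.split("/"):
        node = node[key]
    return node


# Mirrors the access path experiment.ms["RNA"].X["data"].
print(resolve(experiment_layout, "ms/RNA/X/data"))
print(sorted(resolve(experiment_layout, "ms/RNA/X")))
```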

Read into memory

All array-based SOMA objects provide a read method for loading data into memory. Designed with large datasets in mind, these methods always return an iterator, so data is loaded in chunks that TileDB sizes to fit the allocated memory. Results are materialized in Apache Arrow format, using zero-copy memory sharing where possible.

In the following example, expression values from the X data layer are loaded into memory one chunk at a time.

Python
R

The .tables() method is used to materialize each chunk as an Arrow Table. For the sake of this tutorial, the operation is limited to the first 3 chunks.

chunks = []
for chunk in experiment.ms["RNA"].X["data"].read().tables():
    if len(chunks) == 3:
        break
    chunks.append(chunk)

chunks

[pyarrow.Table
 soma_dim_0: int64
 soma_dim_1: int64
 soma_data: float
 ----
 soma_dim_0: [[0,0,0,0,0,...,1941,1941,1941,1941,1941]]
 soma_dim_1: [[38,137,148,197,229,...,25576,25581,25620,25679,25714]]
 soma_data: [[2.3135314,2.017924,1.7682451,2.9146569,1.7524959,...,2.342981,2.5981085,4.8260975,2.4611263,1.7042084]],
 pyarrow.Table
 soma_dim_0: int64
 soma_dim_1: int64
 soma_data: float
 ----
 soma_dim_0: [[1941,1941,1941,1941,1941,...,1832,1832,1832,1832,1832]]
 soma_dim_1: [[25769,25801,25879,25953,25959,...,51698,51760,51772,51790,51920]]
 soma_data: [[2.325368,1.4856703,4.992154,2.4606707,2.4371808,...,4.936844,1.9636455,6.960403,5.834246,1.6239011]],
 pyarrow.Table
 soma_dim_0: int64
 soma_dim_1: int64
 soma_data: float
 ----
 soma_dim_0: [[1832,1832,1832,1832,1832,...,3432,3432,3432,3432,3432]]
 soma_dim_1: [[51925,51930,51951,51994,52036,...,21530,21597,21603,21642,21679]]
 soma_data: [[3.1878636,2.1417744,2.521798,8.600453,3.123161,...,1.8475122,1.0498972,0,2.0656362,2.8646808]]]

The $tables() method is used to materialize each chunk as an Arrow Table. For the sake of this tutorial, the operation is limited to the first 3 chunks.

chunks <- list()

x_reader <- experiment$ms$get("RNA")$X$get("data")$read()$tables()

while (!x_reader$read_complete()) {
  if (length(chunks) == 3) {
    break
  }
  chunks <- c(chunks, x_reader$read_next())
}

chunks

[[1]]
Table
2097152 rows x 3 columns
$soma_dim_0 <int64 not null>
$soma_dim_1 <int64 not null>
$soma_data <float not null>

[[2]]
Table
2097152 rows x 3 columns
$soma_dim_0 <int64 not null>
$soma_dim_1 <int64 not null>
$soma_data <float not null>

[[3]]
Table
2097152 rows x 3 columns
$soma_dim_0 <int64 not null>
$soma_dim_1 <int64 not null>
$soma_data <float not null>

This approach is particularly useful when working with large arrays that may not fit into memory all at once. For smaller arrays that fit comfortably into memory, you can use the concat method to load all chunks and concatenate them into a single Arrow Table.
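To make the trade-off concrete, here is a minimal pure-Python sketch of chunk-wise processing. It does not call tiledbsoma: `iter_chunks` is a hypothetical stand-in for a chunked reader, `streaming_mean` is a hypothetical consumer, and the numbers are made up. The point is that an aggregate can be computed one chunk at a time and give the same answer as loading everything first.

```python
from itertools import islice


def iter_chunks(values, chunk_size):
    """Yield successive lists of at most chunk_size items."""
    it = iter(values)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def streaming_mean(chunks):
    """Compute a mean one chunk at a time, never holding all values at once."""
    total, count = 0.0, 0
    for chunk in chunks:
        total += sum(chunk)
        count += len(chunk)
    return total / count if count else float("nan")


values = [2.0, 0.0, 4.0, 6.0, 8.0]  # made-up numbers, not from the dataset
chunked = streaming_mean(iter_chunks(values, chunk_size=2))
all_at_once = sum(values) / len(values)
print(chunked, all_at_once)
```

With a real SOMA reader, the loop body would receive Arrow Tables instead of lists, but the pattern of accumulating a result per chunk stays the same.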

Python
R

Use .concat() to load the entirety of the obs array as an Arrow Table that is then converted to a Pandas DataFrame.

experiment.obs.read().concat().to_pandas()

	soma_joinid	cell_id	organ_tissue	method	donor	anatomical_information	n_counts_UMIs	n_genes	cell_ontology_class	free_annotation	manually_annotated	compartment	gender
0	0	AAACCCACACTCCTGT_TSP6_Liver_NA_10X_1_1	Liver	10X	TSP6	nan	7633.0	2259	macrophage	Monocyte/Macrophage	True	immune	male
1	1	AAACGAAGTACCAGAG_TSP6_Liver_NA_10X_1_1	Liver	10X	TSP6	nan	2858.0	1152	monocyte	Monocyte	True	immune	male
2	2	AAAGAACAGCCTCTTC_TSP6_Liver_NA_10X_1_1	Liver	10X	TSP6	nan	10395.0	2598	macrophage	Monocyte/Macrophage	True	immune	male
3	3	AAAGAACGTAGCACAG_TSP6_Liver_NA_10X_1_1	Liver	10X	TSP6	nan	6610.0	2125	liver dendritic cell	Dendritic cell	True	immune	male
4	4	AAAGAACGTTTCTTAC_TSP6_Liver_NA_10X_1_1	Liver	10X	TSP6	nan	9387.0	2345	macrophage	Monocyte/Macrophage	True	immune	male
...	...	...	...	...	...	...	...	...	...	...	...	...	...
264819	264819	TSP2_Vasculature_aorta_SS2_B113343_B133091_Imm...	Vasculature	smartseq2	TSP2	aorta	37347.0	395	macrophage	macrophage	True	immune	female
264820	264820	TSP2_Vasculature_aorta_SS2_B113343_B133091_Imm...	Vasculature	smartseq2	TSP2	aorta	111047.0	769	macrophage	macrophage	True	immune	female
264821	264821	TSP2_Vasculature_aorta_SS2_B113343_B133091_Imm...	Vasculature	smartseq2	TSP2	aorta	140634.0	2468	macrophage	macrophage	True	immune	female
264822	264822	TSP2_Vasculature_aorta_SS2_B113343_B133091_Imm...	Vasculature	smartseq2	TSP2	aorta	176268.0	2700	macrophage	macrophage	True	immune	female
264823	264823	TSP2_Vasculature_aorta_SS2_B113343_B133091_Imm...	Vasculature	smartseq2	TSP2	aorta	69025.0	982	t cell	t cell	True	immune	female

264824 rows × 13 columns

Use .concat() to load the entirety of the obs array as an Arrow Table that is then converted to a data.frame.

experiment$obs$read()$concat()$to_data_frame()

A tibble: 264824 x 13
soma_joinid	cell_id	organ_tissue	method	donor	anatomical_information	n_counts_UMIs	n_genes	cell_ontology_class	free_annotation	manually_annotated	compartment	gender
<int>	<chr>	<fct>	<fct>	<fct>	<fct>	<dbl>	<int>	<fct>	<fct>	<lgl>	<fct>	<fct>
0	AAACCCACACTCCTGT_TSP6_Liver_NA_10X_1_1	Liver	10X	TSP6	nan	7633	2259	macrophage	Monocyte/Macrophage	TRUE	immune	male
1	AAACGAAGTACCAGAG_TSP6_Liver_NA_10X_1_1	Liver	10X	TSP6	nan	2858	1152	monocyte	Monocyte	TRUE	immune	male
2	AAAGAACAGCCTCTTC_TSP6_Liver_NA_10X_1_1	Liver	10X	TSP6	nan	10395	2598	macrophage	Monocyte/Macrophage	TRUE	immune	male
3	AAAGAACGTAGCACAG_TSP6_Liver_NA_10X_1_1	Liver	10X	TSP6	nan	6610	2125	liver dendritic cell	Dendritic cell	TRUE	immune	male
4	AAAGAACGTTTCTTAC_TSP6_Liver_NA_10X_1_1	Liver	10X	TSP6	nan	9387	2345	macrophage	Monocyte/Macrophage	TRUE	immune	male
...	...	...	...	...	...	...	...	...	...	...	...	...
264819	TSP2_Vasculature_aorta_SS2_B113343_B133091_Immune_P5_S365	Vasculature	smartseq2	TSP2	aorta	37347	395	macrophage	macrophage	TRUE	immune	female
264820	TSP2_Vasculature_aorta_SS2_B113343_B133091_Immune_P6_S366	Vasculature	smartseq2	TSP2	aorta	111047	769	macrophage	macrophage	TRUE	immune	female
264821	TSP2_Vasculature_aorta_SS2_B113343_B133091_Immune_P7_S367	Vasculature	smartseq2	TSP2	aorta	140634	2468	macrophage	macrophage	TRUE	immune	female
264822	TSP2_Vasculature_aorta_SS2_B113343_B133091_Immune_P8_S368	Vasculature	smartseq2	TSP2	aorta	176268	2700	macrophage	macrophage	TRUE	immune	female
264823	TSP2_Vasculature_aorta_SS2_B113343_B133091_Immune_P9_S369	Vasculature	smartseq2	TSP2	aorta	69025	982	t cell	t cell	TRUE	immune	female

Select and filter data

One of the most useful features of SOMA is the ability to efficiently select and filter only the data necessary for your analysis without loading the entire dataset into memory first. The read methods offer several arguments to access a specific subset of data, depending on the type of object being read.

The most basic type of filtering is selecting a subset of records based on their coordinates using the coords argument, which is available on all array-based SOMA objects.

Python
R

This example loads the expression data for the first 100 cells and 50 genes. Because the requested slice is relatively small, the .concat() method is added to return a single Arrow Table before converting the output to a pandas DataFrame.

(
    experiment.ms["RNA"]
    .X["data"]
    .read(coords=[slice(0, 99), slice(0, 49)])
    .tables()
    .concat()
    .to_pandas()
)

	soma_dim_0	soma_dim_1	soma_data
0	0	38	2.313531
1	1	36	0.000000
2	1	38	2.566711
3	2	32	0.622891
4	2	38	1.939062
...	...	...	...
174	97	38	2.798242
175	98	12	7.249046
176	98	32	0.884104
177	98	38	2.239379
178	99	38	2.808726

179 rows × 3 columns

This example loads the expression data for the first 100 cells and 50 genes. Because the requested slice is relatively small, the $concat() method is added to return a single Arrow Table before converting the output to a data.frame.

experiment$ms$get("RNA")$X$get("data")$read(
  coords = list(0:99L, 0:49L)
)$tables()$concat()$to_data_frame()

A tibble: 179 x 3
soma_dim_0	soma_dim_1	soma_data
<int>	<int>	<dbl>
0	38	2.3135314
1	36	0.0000000
1	38	2.5667109
2	32	0.6228909
2	38	1.9390616
...	...	...
97	38	2.7982423
98	12	7.2490458
98	32	0.8841037
98	38	2.2393794
99	38	2.8087256
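Note that the Python slices passed to coords include both endpoints: slice(0, 99) returns rows with soma_dim_0 up to and including 99, as the output above shows. This differs from ordinary Python slicing, where the stop value is excluded. The sketch below illustrates the difference in plain Python; `inclusive_slice_to_range` and `first_n` are hypothetical helpers, not tiledbsoma functions.

```python
def inclusive_slice_to_range(s):
    """Convert an inclusive (SOMA-style) slice into an equivalent Python range."""
    if s.step not in (None, 1):
        raise ValueError("this sketch only handles unit steps")
    return range(s.start, s.stop + 1)


def first_n(n):
    """Build the inclusive slice that selects coordinates 0..n-1."""
    return slice(0, n - 1)


cells = inclusive_slice_to_range(slice(0, 99))
genes = inclusive_slice_to_range(slice(0, 49))
print(len(cells), len(genes))
```

A helper like `first_n` can reduce off-by-one mistakes when the number of wanted records is what you have in hand.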

For SOMADataFrame objects like obs and var, the read method provides additional arguments to filter by values on query conditions and select a subset of columns to return.

From the first 100 records in obs, load those with more than 2,000 UMI counts (n_counts_UMIs) and retrieve only two columns of interest from the array.

Python
R

experiment.obs.read(
    coords=[slice(0, 99)],
    value_filter="n_counts_UMIs > 2000",
    column_names=["cell_id", "n_counts_UMIs"],
).concat().to_pandas()

	cell_id	n_counts_UMIs
0	AAACCCACACTCCTGT_TSP6_Liver_NA_10X_1_1	7633.0
1	AAACGAAGTACCAGAG_TSP6_Liver_NA_10X_1_1	2858.0
2	AAAGAACAGCCTCTTC_TSP6_Liver_NA_10X_1_1	10395.0
3	AAAGAACGTAGCACAG_TSP6_Liver_NA_10X_1_1	6610.0
4	AAAGAACGTTTCTTAC_TSP6_Liver_NA_10X_1_1	9387.0
...	...	...
95	ACACGCGAGCGAGTAC_TSP6_Liver_NA_10X_1_1	6029.0
96	ACACGCGAGTATGTAG_TSP6_Liver_NA_10X_1_1	8961.0
97	ACACTGAAGGTAGTAT_TSP6_Liver_NA_10X_1_1	4409.0
98	ACAGAAAAGCAATAGT_TSP6_Liver_NA_10X_1_1	6239.0
99	ACAGCCGCAGGATTCT_TSP6_Liver_NA_10X_1_1	8171.0

100 rows × 2 columns

experiment$obs$read(
  coords = 0:99L,
  value_filter = "n_counts_UMIs > 2000",
  column_names = c("cell_id", "n_counts_UMIs")
)$concat()$to_data_frame()

A tibble: 100 x 2
cell_id	n_counts_UMIs
<chr>	<dbl>
AAACCCACACTCCTGT_TSP6_Liver_NA_10X_1_1	7633
AAACGAAGTACCAGAG_TSP6_Liver_NA_10X_1_1	2858
AAAGAACAGCCTCTTC_TSP6_Liver_NA_10X_1_1	10395
AAAGAACGTAGCACAG_TSP6_Liver_NA_10X_1_1	6610
AAAGAACGTTTCTTAC_TSP6_Liver_NA_10X_1_1	9387
...	...
ACACGCGAGCGAGTAC_TSP6_Liver_NA_10X_1_1	6029
ACACGCGAGTATGTAG_TSP6_Liver_NA_10X_1_1	8961
ACACTGAAGGTAGTAT_TSP6_Liver_NA_10X_1_1	4409
ACAGAAAAGCAATAGT_TSP6_Liver_NA_10X_1_1	6239
ACAGCCGCAGGATTCT_TSP6_Liver_NA_10X_1_1	8171
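The way the three read arguments combine can be sketched on toy data. The code below is an emulation in plain Python, not the library's implementation: `read_rows` is a hypothetical function, the filter is a Python callable instead of a filter string, and the rows are made up apart from the column names. It assumes that a row must satisfy both coords and value_filter to be returned.

```python
def read_rows(rows, coords=None, value_filter=None, column_names=None):
    """Emulate coords, value_filter and column_names on a list of dict rows.

    coords: iterable of soma_joinid values to keep (None keeps all)
    value_filter: predicate on a row (None keeps all)
    column_names: columns to return (None returns all)
    """
    wanted = None if coords is None else set(coords)
    out = []
    for row in rows:
        if wanted is not None and row["soma_joinid"] not in wanted:
            continue
        if value_filter is not None and not value_filter(row):
            continue
        keep = row if column_names is None else {c: row[c] for c in column_names}
        out.append(dict(keep))
    return out


# Toy rows; values are made up apart from the column names.
obs = [
    {"soma_joinid": 0, "cell_id": "cell_a", "n_counts_UMIs": 7633.0},
    {"soma_joinid": 1, "cell_id": "cell_b", "n_counts_UMIs": 1500.0},
    {"soma_joinid": 2, "cell_id": "cell_c", "n_counts_UMIs": 2858.0},
    {"soma_joinid": 3, "cell_id": "cell_d", "n_counts_UMIs": 9387.0},
]

result = read_rows(
    obs,
    coords=range(0, 3),
    value_filter=lambda r: r["n_counts_UMIs"] > 2000,
    column_names=["cell_id", "n_counts_UMIs"],
)
print(result)
```

Under that assumption, the result can contain fewer rows than the coordinate range. In the tutorial's query, all of the first 100 cells pass the filter, which is why 100 rows come back.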

Leverage the same filtering options on the var array to retrieve pre-calculated gene expression means and standard deviations for a set of relevant genes.

Python
R

experiment.ms["RNA"].var.read(
    value_filter="gene_symbol in ['CD19', 'CD3E', 'CD4', 'CD8A', 'CD14']",
    column_names=["ensemblid", "mean", "std", "gene_symbol"],
).concat().to_pandas()

[2024-08-15 02:21:29.735] [tiledbsoma] [Process: 246425] [Thread: 246425] [warning] [TileDB-SOMA::ManagedQuery] [51fd7f27-3d17-49d0-abc3-04efd8fb9712] Invalid column selected: feature_name

	ensemblid	mean	std	gene_symbol
0	ENSG00000153563.15	0.115993	0.462204	CD8A
1	ENSG00000170458.14	0.284372	0.698817	CD14
2	ENSG00000198851.9	0.338931	0.756713	CD3E
3	ENSG00000010610.10	0.122590	0.371681	CD4
4	ENSG00000177455.13	0.054992	0.266716	CD19

experiment$ms$get("RNA")$var$read(
  value_filter = "gene_symbol %in% c('CD19', 'CD3E', 'CD4', 'CD8A', 'CD14')",
  column_names = c("ensemblid", "gene_symbol", "mean", "std")
)$concat()$to_data_frame()

A tibble: 5 x 4
ensemblid	gene_symbol	mean	std
<chr>	<fct>	<dbl>	<dbl>
ENSG00000153563.15	CD8A	0.11599332	0.4622044
ENSG00000170458.14	CD14	0.28437189	0.6988171
ENSG00000198851.9	CD3E	0.33893079	0.7567127
ENSG00000010610.10	CD4	0.12258978	0.3716811
ENSG00000177455.13	CD19	0.05499229	0.2667163
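When the gene list comes from a variable, the Python filter string can be assembled programmatically. The following sketch builds a string in the same form as the Python value_filter used above. `membership_filter` is a hypothetical helper, and it rejects values containing quotes because this sketch does not attempt escaping.

```python
def membership_filter(column, values):
    """Return a filter string of the form "<column> in ['a', 'b']"."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    for v in values:
        # Escaping rules are out of scope for this sketch, so refuse quotes.
        if not isinstance(v, str) or "'" in v or '"' in v:
            raise ValueError(f"unsupported value: {v!r}")
    quoted = ", ".join(f"'{v}'" for v in values)
    return f"{column} in [{quoted}]"


genes = ["CD19", "CD3E", "CD4", "CD8A", "CD14"]
print(membership_filter("gene_symbol", genes))
```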

For SOMASparseNDArrays such as X layers containing expression data (for this dataset) or obsm layers containing dimensionality reduction results, the read method’s filtering capabilities are limited to the coords argument.

This example loads expression data for the first 100 cells and 50 genes as a table.

Python
R

(
    experiment.ms["RNA"]
    .X["data"]
    .read(coords=[slice(0, 99), slice(0, 49)])
    .tables()
    .concat()
    .to_pandas()
)

	soma_dim_0	soma_dim_1	soma_data
0	0	38	2.313531
1	1	36	0.000000
2	1	38	2.566711
3	2	32	0.622891
4	2	38	1.939062
...	...	...	...
174	97	38	2.798242
175	98	12	7.249046
176	98	32	0.884104
177	98	38	2.239379
178	99	38	2.808726

179 rows × 3 columns

experiment$ms$get("RNA")$X$get("data")$read(
  coords = list(0:99L, 0:49L)
)$tables()$concat()$to_data_frame()

A tibble: 179 x 3
soma_dim_0	soma_dim_1	soma_data
<int>	<int>	<dbl>
0	38	2.3135314
1	36	0.0000000
1	38	2.5667109
2	32	0.6228909
2	38	1.9390616
...	...	...
97	38	2.7982423
98	12	7.2490458
98	32	0.8841037
98	38	2.2393794
99	38	2.8087256

Experiment-level queries

The real power of the SOMA API comes from the ability to slice and filter measurement data based on the cell- and feature-level annotations stored in the experiment. For datasets containing millions of cells, this means you can easily access expression values for cells within a specific cluster, or that meet a certain quality threshold, etc.

The example below shows how to filter for highly variable genes within dendritic cells.

Python
R

query = experiment.axis_query(
    measurement_name="RNA",
    obs_query=tiledbsoma.AxisQuery(
        value_filter="cell_ontology_class == 'dendritic cell'",
    ),
    var_query=tiledbsoma.AxisQuery(
        value_filter="highly_variable == True",
    ),
)

query <- experiment$axis_query(
  measurement_name = "RNA",
  obs_query = SOMAAxisQuery$new(
    value_filter = "cell_ontology_class == 'dendritic cell'"
  ),
  var_query = SOMAAxisQuery$new(
    value_filter = "highly_variable == TRUE"
  )
)
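Conceptually, an experiment-level query filters obs and var independently and then keeps only the X entries whose coordinates fall in both selections. The sketch below emulates that idea on toy data in plain Python. It is an illustration under that assumption, not the library's implementation: this `axis_query` function is hypothetical, the filters are Python callables, and the values are made up.

```python
def axis_query(obs, var, x, obs_filter, var_filter):
    """Emulate an experiment-level query on toy data.

    obs, var: lists of dict rows, each with a "soma_joinid"
    x: list of (obs_joinid, var_joinid, value) triples, sparse COO style
    obs_filter, var_filter: predicates on a row
    """
    obs_ids = {r["soma_joinid"] for r in obs if obs_filter(r)}
    var_ids = {r["soma_joinid"] for r in var if var_filter(r)}
    hits = [(i, j, v) for (i, j, v) in x if i in obs_ids and j in var_ids]
    return obs_ids, var_ids, hits


obs = [
    {"soma_joinid": 0, "cell_ontology_class": "macrophage"},
    {"soma_joinid": 1, "cell_ontology_class": "dendritic cell"},
    {"soma_joinid": 2, "cell_ontology_class": "dendritic cell"},
]
var = [
    {"soma_joinid": 0, "highly_variable": True},
    {"soma_joinid": 1, "highly_variable": False},
    {"soma_joinid": 2, "highly_variable": True},
]
x = [(0, 0, 1.5), (1, 0, 2.0), (1, 1, 0.5), (2, 2, 3.0)]

obs_ids, var_ids, hits = axis_query(
    obs,
    var,
    x,
    obs_filter=lambda r: r["cell_ontology_class"] == "dendritic cell",
    var_filter=lambda r: r["highly_variable"],
)
print(len(obs_ids), len(var_ids), hits)
```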

Inspect result

The returned query object allows you to inspect the query results and selectively access data. For example, you can see how many cells and genes were returned by the query:

Python
R

{"cells": query.n_obs, "genes": query.n_vars}

{'cells': 533, 'genes': 2435}

c("cells" = query$n_obs, "genes" = query$n_vars)

cells: 533
genes: 2435

Load result

Data loaded into memory from a SOMA experiment via the query object only includes records that match the specified query criteria. The following example demonstrates how to load expression values for matching cells and genes from an X layer.

Python
R

You can also load the expression data into memory for the selected cells and genes as an Arrow sparse tensor.

query.X(layer_name="data").coos().concat()

<pyarrow.SparseCOOTensor>
type: float
shape: (2147483646, 2147483646)

Important

Note the shape of the returned tensor corresponds to the capacity of the underlying TileDB array. By default, SOMA creates arrays with extra room for adding new data in the future.
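Because the tensor keeps the array's original coordinates, building a compact matrix of only the selected cells and genes requires mapping those coordinates to positions 0..n-1. The sketch below shows that re-indexing step in plain Python. `compact_coo` is a hypothetical helper, the coordinates are made up, and the joinid lists are assumed to come from the query's obs and var results.

```python
def compact_coo(triples, obs_joinids, var_joinids):
    """Re-index (obs_joinid, var_joinid, value) triples to 0-based positions.

    Returns the re-indexed triples and the compact shape (n_obs, n_vars).
    """
    row_pos = {jid: pos for pos, jid in enumerate(obs_joinids)}
    col_pos = {jid: pos for pos, jid in enumerate(var_joinids)}
    compact = [(row_pos[i], col_pos[j], v) for (i, j, v) in triples]
    return compact, (len(row_pos), len(col_pos))


# Made-up coordinates standing in for a query result.
triples = [(1041, 7, 2.0), (1041, 300, 0.5), (5020, 7, 3.0)]
obs_joinids = [1041, 5020]
var_joinids = [7, 300]

compact, shape = compact_coo(triples, obs_joinids, var_joinids)
print(compact, shape)
```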

From here, you could use the to_sparse_matrix() method to easily load query results for any matrix-like data as a sparse dgTMatrix (from the Matrix package). At a minimum, you need to pass a collection (e.g., X, or obsm) and layer (e.g., data). You can also populate the matrix dimension names by specifying which obs column contains the values to use for row names and which var column contains the values to use for column names.

mat <- query$to_sparse_matrix(
  collection = "X",
  layer_name = "data",
  obs_index = "cell_id",
  var_index = "var_id"
)

mat[1:10, 1:5]

10 x 5 sparse Matrix of class "dgTMatrix"
                                     AL627309.6 MTCO1P12    ISG15    CCNL2 NADK
ACAAAGATCATGCAGT_TSP5_Eye_NA_10X_1_2          . .        .        .           .
ACGGAAGAGGTTGCCC_TSP5_Eye_NA_10X_1_2          . .        2.728290 .           .
AGTGACTAGCGGTAAC_TSP5_Eye_NA_10X_1_2          . .        .        .           .
ATGCCTCAGCCGAACA_TSP5_Eye_NA_10X_1_2          . .        .        .           .
CAACCAAAGTTGGCGA_TSP5_Eye_NA_10X_1_2          . .        2.816236 .           .
CATACTTTCATTGTGG_TSP5_Eye_NA_10X_1_2          . .        .        .           .
CATGCAATCGAGCTGC_TSP5_Eye_NA_10X_1_2          . .        2.300955 .           .
CATGCGGGTACTAGCT_TSP5_Eye_NA_10X_1_2          . .        .        2.834261    .
CCGAACGGTAGGGAGG_TSP5_Eye_NA_10X_1_2          . 1.926033 .        .           .
CCTGCATCACAATGAA_TSP5_Eye_NA_10X_1_2          . .        .        .           .

Toolkit interoperability

SOMA also provides support for exporting query results to various in-memory data structures used by popular single-cell analysis toolkits. As before, the results only include data that passed the specified query criteria. Unlike the query accessors shown previously, these methods must access and load multiple data elements to construct these complex objects but still offer flexibility to customize what is included in the resulting object.

Python
R

This example shows how to materialize the query results as an AnnData object, populating X with expression data from the "data" layer.

adat = query.to_anndata(X_name="data")
adat

AnnData object with n_obs × n_vars = 533 × 2435
    obs: 'soma_joinid', 'cell_id', 'organ_tissue', 'method', 'donor', 'anatomical_information', 'n_counts_UMIs', 'n_genes', 'cell_ontology_class', 'free_annotation', 'manually_annotated', 'compartment', 'gender'
    var: 'soma_joinid', 'var_id', 'gene_symbol', 'feature_type', 'ensemblid', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'

This example shows how to materialize the query results as a Seurat object, populating the RNA Assay’s data slot with expression data from the data layer. The obs_index and var_index arguments are used to specify which columns in the obs and var arrays should be used as column and row names, respectively.

sobj <- query$to_seurat(
  X_layers = c(data = "data"),
  obs_index = "cell_id",
  var_index = "var_id",
  obsm_layers = FALSE,
  varm_layers = FALSE
)

sobj

An object of class Seurat 
2435 features across 533 samples within 1 assay 
Active assay: RNA (2435 features, 0 variable features)
 1 layer present: data

Now that you have the data loaded in memory and in a toolkit-specific format, the full suite of analysis and visualization methods provided by that toolkit is available.

Python
R

This example leverages scanpy’s plotting capabilities to visualize the distribution of the n_counts_UMIs and n_genes attributes across the cells in the query result.

sc.pl.violin(adat, ["n_counts_UMIs", "n_genes"], jitter=0.4, multi_panel=True)

This example leverages Seurat’s plotting capabilities to visualize the distribution of the n_counts_UMIs and n_genes attributes across the cells in the query result.

Seurat::VlnPlot(sobj, features = c("n_counts_UMIs", "n_genes"))