Learn how to access the spatial data components of a SOMA Experiment.
TileDB-SOMA supports spatial omics data, such as data generated by the 10x Genomics Visium platform. SOMA experiments created from these spatial datasets include elements beyond those of typical single-cell experiments, such as high-resolution tissue images and spatial coordinates for each measurement location. This tutorial guides you through these spatial components and shows how to access them with TileDB-SOMA’s API. You’ll learn how to do the following:
Open a TileDB-SOMA experiment with spatial data.
Navigate the experiment’s spatial components, including scenes, images, and spatial dataframes.
Load image data for visualization.
Access spatial location data using bounding boxes.
Prerequisites
While you can run this tutorial locally, note that it relies on remote resources to run correctly.
You must create a REST API token and set an environment variable named TILEDB_REST_TOKEN to the value of your generated token.
However, this is not necessary when running the notebook inside a TileDB workspace, where the API token is generated and configured for you automatically.
Setup
Load the necessary packages. Import tiledbsoma to interact with SOMA data, scanpy for some data visualization, and matplotlib for displaying images and plots.
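The following setup cell covers these imports:

```python
import tiledbsoma
import scanpy as sc
import matplotlib.pyplot as plt
```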
The dataset used here is a spatial gene expression dataset of a mouse brain coronal section (FFPE), generated with the 10x Genomics Visium CytAssist platform. To view the raw data, visit 10x Genomics. The data has been processed with Space Ranger and ingested into TileDB-SOMA. You can find it on TileDB.
Tip
Visit the Spatial Data Ingestion tutorial for information on how to create a SOMA experiment from a spatial dataset.
Open the SOMA experiment in read mode. This loads the experiment’s structure into memory; the actual data is not read at this point, as the opened experiment is only a pointer to the data in storage.
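A minimal sketch; the URI below is a placeholder, so substitute the experiment’s actual TileDB URI:

```python
# Placeholder URI; replace with the experiment's actual TileDB URI.
EXPERIMENT_URI = "tiledb://your-namespace/visium_mouse_brain"

experiment = tiledbsoma.Experiment.open(EXPERIMENT_URI)
print(experiment)
```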
As you can see in the output, the SOMA experiment consists of three top-level elements: ms (measurements), obs (observation annotations), and spatial. The ms and obs elements are common to all SOMA experiments, while the spatial collection is specific to spatial experiments and stores all spatial data associated with the experiment.
Note that even within a spatial SOMA experiment, you can access and query the non-spatial elements, such as the obs and ms collections, using the same methods and APIs as with typical non-spatial SOMA experiments. This means you can apply the familiar filtering, slicing, and data-loading techniques TileDB-SOMA provides for single-cell data, giving you a consistent workflow whether you’re working with spatial or non-spatial components.
For example, read the var data from the SOMA Experiment as a pandas.DataFrame:
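A sketch, assuming the measurement is named RNA (the usual name for experiments ingested with tiledbsoma.io; check experiment.ms.keys() if unsure):

```python
# Read the var DataFrame into an Arrow table, then convert to pandas.
var_df = experiment.ms["RNA"].var.read().concat().to_pandas()
print(var_df.head())
```

Next, inspect the spatial collection, whose printed representation follows:

```python
print(experiment.spatial)
```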
<Collection 'tiledb://TileDB-Inc/58fab955-1470-483e-8c79-ae55d201382b' (open for 'r') (1 item)
'scene0': 'tiledb://TileDB-Inc/8aec3f0f-4ec1-40cf-a180-a1ba8a80f49e' (unopened)>
Scenes
Inside the spatial collection, you can find one or more Scene objects. A scene represents a distinct view or a physical section of the sample. This particular dataset has a single scene called scene0.
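Open the scene and inspect its contents (scene0 matches the listing above):

```python
scene = experiment.spatial["scene0"]
print(scene)
```

Each scene contains the following collections: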
img: This collection stores image data, such as the high- and low-resolution slide images of the tissue sample.
obsl: This collection stores the obs-indexed location data. In this case, it has a PointCloudDataFrame called loc, holding the Visium spot locations and sizes.
varl: This collection stores the var-indexed location data. In this example, the collection is empty because Visium datasets don’t include spatial feature data.
Images
Inside the img collection, you will find one or more MultiscaleImage objects.
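As a sketch, grab the image and inspect one of its resolution levels; the key tissue and the level name lowres are assumptions based on typical Visium ingests, so print scene.img to confirm the actual names. Each level is backed by a DenseNDArray; the Arrow schema of the low-resolution level is shown below:

```python
# "tissue" and "lowres" are assumed names; inspect scene.img for the actual keys.
img = scene.img["tissue"]
print(img.coordinate_space)      # axes of the image's coordinate space
print(img["lowres"].schema)      # schema of the low-resolution level
```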
soma_dim_0: int64 not null
soma_dim_1: int64 not null
soma_dim_2: int64 not null
soma_data: uint8 not null
You can use the DenseNDArray’s read method to load the image data. The next snippet shows how to load the low-resolution image data and store the result as a NumPy array with shape (height, width, channels).
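A sketch, again assuming the level is named lowres; whether a level is stored channels-first can vary, so check its shape before plotting:

```python
import numpy as np

# Read the full low-resolution level into memory as a NumPy array.
lowres = img["lowres"].read().to_numpy()

# If the level is stored channels-first (C, H, W), move channels last
# so matplotlib receives the (height, width, channels) layout.
if lowres.shape[0] in (3, 4):
    lowres = np.moveaxis(lowres, 0, -1)

plt.imshow(lowres)
plt.axis("off")
plt.show()
```

Spatial dataframes
With the images covered, turn to the spot locations. The loc point cloud lives in the scene’s obsl collection:

```python
loc = scene.obsl["loc"]
print(loc)
```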
<PointCloudDataFrame 'tiledb://TileDB-Inc/fadabf5c-2bf1-4d0b-a6b2-e124f89f3461' (open for 'r')>
Just like with the MultiscaleImage, you can access the coordinate_space property of the PointCloudDataFrame to see the coordinate space of the point cloud.
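For example:

```python
print(loc.coordinate_space)
```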
This spatial dataframe has a soma_joinid key that maps each spot back to the obs array at the root of the SOMA experiment, along with the x and y coordinates of each spot. Read the data from the loc dataframe into memory.
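A sketch: first read the full dataframe, then restrict the read to a bounding box with read_spatial_region. The box coordinates are placeholders in the point cloud’s pixel coordinate space:

```python
# Read all spot locations into a pandas DataFrame.
loc_df = loc.read().concat().to_pandas()
print(loc_df.head())

# Read only the spots that fall inside a bounding box given as
# [x_min, y_min, x_max, y_max]; the values below are placeholders.
bbox = [1000.0, 1000.0, 2000.0, 2000.0]
spatial_read = loc.read_spatial_region(region=bbox)
spots_in_bbox = spatial_read.data.concat().to_pandas()
print(spots_in_bbox.shape)
```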
This tutorial covered the basic methods for accessing spatial data within a SOMA experiment, which should give you a solid foundation for building more complex spatial access patterns with TileDB-SOMA.