Access Spatial Data

Learn how to access the spatial data components of a SOMA Experiment.

TileDB-SOMA supports spatial omics data, such as data generated by 10x Visium. SOMA experiments created from spatial datasets include elements beyond those of typical single-cell experiments, such as high-resolution tissue images and spatial coordinates for each measurement location. This tutorial explains these spatial components and shows how to access them with the TileDB-SOMA API. You’ll learn how to do the following:

  • Open a TileDB-SOMA experiment with spatial data.
  • Navigate the experiment’s spatial components, including scenes, images, and spatial dataframes.
  • Load image data for visualization.
  • Access spatial location data using bounding boxes.

Prerequisites

You can run this tutorial locally, but it relies on remote resources to run correctly.

To run it outside of TileDB, create a REST API token and set an environment variable named TILEDB_REST_TOKEN to the value of the generated token.

This step is not necessary when you run the notebook inside a TileDB workspace, where the API token is generated and configured for you automatically.
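When running locally, a quick check before opening any tiledb:// URI can catch a missing token early. The following is a minimal sketch that uses only the Python standard library. The helper name rest_token_configured is hypothetical and is not part of TileDB-SOMA.

```python
import os


def rest_token_configured(environ=None):
    """Return True if TILEDB_REST_TOKEN is set to a non-empty value.

    Hypothetical helper for illustration; not part of the TileDB-SOMA API.
    """
    if environ is None:
        environ = os.environ
    return bool(environ.get("TILEDB_REST_TOKEN", "").strip())


if not rest_token_configured():
    print("TILEDB_REST_TOKEN is not set; create a REST API token first.")
```

Passing a plain dictionary instead of os.environ makes the check easy to try without touching the real environment.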

Setup

Load the necessary packages. Import tiledbsoma to interact with SOMA data, scanpy for some data visualization, and matplotlib for displaying images and plots.

  • Python
import scanpy as sc
import tiledbsoma
import tiledbsoma.io
from matplotlib import patches as mplp
from matplotlib import pyplot as plt
from matplotlib.collections import PatchCollection

tiledbsoma.show_package_versions()

Configure the notebook to display images in jpeg format, which keeps the size of the rendered notebook smaller.

  • Python
import matplotlib_inline.backend_inline

matplotlib_inline.backend_inline.set_matplotlib_formats("jpeg")

Dataset

The dataset used here is a spatial gene expression dataset of a mouse brain coronal section (FFPE), generated by 10x Genomics’ Visium CytAssist platform. To view the raw data, visit 10x Genomics. This data has been processed with Space Ranger and ingested into TileDB-SOMA. You can find it on TileDB.

Tip

Visit the Spatial Data Ingestion tutorial for information on how to create a SOMA experiment from a spatial dataset.

  • Python
EXPERIMENT_URI = "tiledb://tiledb-inc/CytAssist_FFPE_Mouse_Brain_Rep2"

Navigate spatial components

Open the SOMA experiment in read mode. Opening the experiment loads only its structure into memory. The returned object is a pointer to the data in storage, and the data itself is not read at this point.

  • Python
exp = tiledbsoma.Experiment.open(EXPERIMENT_URI)
exp
<Experiment 'tiledb://tiledb-inc/CytAssist_FFPE_Mouse_Brain_Rep2' (open for 'r') (4 items)
    'ms': 'tiledb://TileDB-Inc/84fb4d8b-017d-4692-a356-267c0175305d' (unopened)
    'obs': 'tiledb://TileDB-Inc/45857df6-dd60-4d48-931c-878830339b71' (unopened)
    'obs_spatial_presence': 'tiledb://TileDB-Inc/2a909b4c-135e-4faf-a945-2d7029d82af1' (unopened)
    'spatial': 'tiledb://TileDB-Inc/58fab955-1470-483e-8c79-ae55d201382b' (unopened)>

As the output shows, the SOMA experiment has four top-level elements: ms (measurements), obs (observation annotations), obs_spatial_presence, and spatial. The ms and obs elements are common to all SOMA experiments. The spatial collection is specific to spatial experiments and stores all spatial data associated with the experiment, while obs_spatial_presence records which observations are present in which scene.

Within a spatial SOMA experiment, you can still access and query the non-spatial elements, such as the obs and ms collections, with the same methods and APIs as in non-spatial SOMA experiments. The familiar filtering, slicing, and data loading techniques apply unchanged, which gives you a consistent workflow across spatial and non-spatial data.

For example, read the var data from the SOMA Experiment as a pandas.DataFrame:

  • Python
MEASUREMENT_NAME = "RNA"

var_df = (
    exp.ms[MEASUREMENT_NAME]
    .var.read(column_names=["soma_joinid", "var_id", "gene_ids"])
    .concat()
    .to_pandas()
)
var_df
       soma_joinid   var_id            gene_ids
0                0     Xkr4  ENSMUSG00000051951
1                1      Rp1  ENSMUSG00000025900
2                2    Sox17  ENSMUSG00000025902
3                3   Lypla1  ENSMUSG00000025903
4                4    Tcea1  ENSMUSG00000033813
...            ...      ...                 ...
19460        19460     Zfy2  ENSMUSG00000000103
19461        19461      Sry  ENSMUSG00000069036
19462        19462   Gm4064  ENSMUSG00000102053
19463        19463   Gm3376  ENSMUSG00000096520
19464        19464  Gm20830  ENSMUSG00000096686

19465 rows × 3 columns

Load the non-spatial elements into memory as a standard AnnData object.

  • Python
adata = tiledbsoma.io.to_anndata(
    experiment=exp,
    measurement_name=MEASUREMENT_NAME,
    X_layer_name="data",
)

adata
AnnData object with n_obs × n_vars = 2235 × 19465
    var: 'gene_ids', 'feature_types', 'genome'

Spatial components

The spatial collection organizes all spatial data associated with your experiment.

  • Python
exp.spatial
<Collection 'tiledb://TileDB-Inc/58fab955-1470-483e-8c79-ae55d201382b' (open for 'r') (1 item)
    'scene0': 'tiledb://TileDB-Inc/8aec3f0f-4ec1-40cf-a180-a1ba8a80f49e' (unopened)>

Scenes

Inside the spatial collection, you can find one or more Scene objects. A scene represents a distinct view or a physical section of the sample. This particular dataset has a single scene called scene0.

  • Python
scene = exp.spatial["scene0"]
scene
<Scene 'tiledb://TileDB-Inc/8aec3f0f-4ec1-40cf-a180-a1ba8a80f49e' (open for 'r') (3 items)
    'img': 'tiledb://TileDB-Inc/fbf2da75-dbcf-4e12-9e0c-2e6516a58bf0' (unopened)
    'obsl': 'tiledb://TileDB-Inc/47b25792-b016-4157-8cf2-ee534b3a2fad' (unopened)
    'varl': 'tiledb://TileDB-Inc/ddd4ae97-9ef3-4579-a0e4-3fbb821134a3' (unopened)>

Each Scene includes a coordinate space shared by all its members.

  • Python
scene.coordinate_space
CoordinateSpace(axes=(Axis(name='x', unit='pixels'), Axis(name='y', unit='pixels')))

A Scene has three key SOMA collections:

  • img: This collection stores image data, such as the high- and low-resolution slide images of the tissue sample.
  • obsl: This collection stores the obs-indexed location data. In this case, it has a PointCloudDataFrame called loc, holding the Visium spot locations and sizes.
  • varl: This collection stores the var-indexed location data. In this example, the collection is empty because Visium datasets don’t include spatial feature data.
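To get an overview of this layout in one pass, you can walk the nested collections recursively. The sketch below assumes that SOMA collections can be iterated and indexed by key like read-only mappings. It uses plain dictionaries as stand-ins for the scene in this tutorial so that it runs without remote access. The helper name outline and the placeholder leaf values are hypothetical.

```python
from collections.abc import Mapping


def outline(node, name="scene0", depth=0):
    """Return indented lines describing a nested mapping.

    Hypothetical helper; anything that is not a mapping is treated as a leaf.
    """
    lines = ["  " * depth + name]
    if isinstance(node, Mapping):
        for key in sorted(node):
            lines.extend(outline(node[key], key, depth + 1))
    return lines


# Plain dicts standing in for the scene in this tutorial (placeholder leaf values).
scene_stand_in = {
    "img": {"tissue": "MultiscaleImage"},
    "obsl": {"loc": "PointCloudDataFrame"},
    "varl": {},
}

print("\n".join(outline(scene_stand_in)))
```

If the mapping assumption holds for your TileDB-SOMA version, passing the opened scene instead of the stand-in should print the same kind of tree.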

Images

Inside the img collection, you will find one or more MultiscaleImage objects.

  • Python
scene.img
<Collection 'tiledb://TileDB-Inc/fbf2da75-dbcf-4e12-9e0c-2e6516a58bf0' (open for 'r') (1 item)
    'tissue': 'tiledb://TileDB-Inc/bec86aa9-08a7-4a34-9887-8e7e2a117845' (unopened)>

In this example, the collection holds a single MultiscaleImage named “tissue”.

  • Python
tissue_image = scene.img["tissue"]
tissue_image
<MultiscaleImage 'tiledb://TileDB-Inc/bec86aa9-08a7-4a34-9887-8e7e2a117845' (open for 'r')>

This object has a pyramid of images at different resolutions, which supports efficient loading of each zoom level.

The coordinate_space property of the MultiscaleImage has the pixel coordinate space of the image.

  • Python
tissue_image.coordinate_space
CoordinateSpace(axes=(Axis(name='x', unit='pixels'), Axis(name='y', unit='pixels')))

Use the level_count property to see the number of levels in the MultiscaleImage.

  • Python
tissue_image.level_count
2

Examine the metadata for each resolution level of the tissue image.

  • Python
tissue_image.levels()
{'hires': ('tiledb://TileDB-Inc/2d8e4e5c-83ee-4b8f-aedf-7dde1988490f',
  (2000, 1692, 3)),
 'lowres': ('tiledb://TileDB-Inc/6c0db493-6a7f-4078-ad28-ed11b87cd572',
  (600, 508, 3))}
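The level shapes are enough to work out how much smaller one level is than another, or to pick the smallest level that still meets a size requirement. The sketch below copies the shapes from the levels() output above and omits the URIs. The helper names downsample_factors and smallest_level_at_least are hypothetical and not part of TileDB-SOMA.

```python
# Shapes copied from the levels() output above; URIs omitted.
level_shapes = {
    "hires": (2000, 1692, 3),
    "lowres": (600, 508, 3),
}


def downsample_factors(fine, coarse):
    """Return the per-axis size ratio of two level shapes (first two axes only)."""
    return tuple(f / c for f, c in zip(fine[:2], coarse[:2]))


def smallest_level_at_least(shapes, min_side):
    """Return the smallest level whose first two axes are both >= min_side.

    Returns None when no level is large enough.
    """
    candidates = [
        (shape[0] * shape[1], name)
        for name, shape in shapes.items()
        if min(shape[:2]) >= min_side
    ]
    return min(candidates)[1] if candidates else None


print(downsample_factors(level_shapes["hires"], level_shapes["lowres"]))
print(smallest_level_at_least(level_shapes, 500))   # lowres
print(smallest_level_at_least(level_shapes, 1000))  # hires
```

Choosing the smallest adequate level keeps reads cheap, which is the point of storing the image as a pyramid.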

Each image is a DenseNDArray with three dimensions that correspond to the image’s height, width, and color channels (RGB).

  • Python
tissue_image["hires"].schema
soma_dim_0: int64 not null
soma_dim_1: int64 not null
soma_dim_2: int64 not null
soma_data: uint8 not null

You can use the DenseNDArray’s read method to load the image data. This next snippet shows how to load the low-resolution image data and store the results as a NumPy array with shape (height, width, channels).

  • Python
im = tissue_image["lowres"].read().to_numpy()
im.shape
(600, 508, 3)

Now, visualize the low-resolution image with matplotlib.

  • Python
fig, ax = plt.subplots()
# The array is RGB, so imshow renders it directly; a colormap would be ignored.
ax.imshow(im)
plt.show()

Spatial Dataframe

The obsl collection has a PointCloudDataFrame called loc, which holds the spatial coordinates for each spot in the experiment.

  • Python
scene.obsl["loc"]
<PointCloudDataFrame 'tiledb://TileDB-Inc/fadabf5c-2bf1-4d0b-a6b2-e124f89f3461' (open for 'r')>

Just like with the MultiscaleImage, you can access the coordinate_space property of the PointCloudDataFrame to see the coordinate space of the point cloud.

  • Python
scene.obsl["loc"].coordinate_space
CoordinateSpace(axes=(Axis(name='x', unit='pixels'), Axis(name='y', unit='pixels')))

This spatial dataframe includes the x and y coordinates of each spot, along with a soma_joinid key that maps each spot to its row in the obs dataframe at the root of the SOMA experiment. Read the data from the loc dataframe into memory.

  • Python
spots = scene.obsl["loc"].read().concat().to_pandas()
spots
          x      y  soma_joinid  in_tissue  array_row  array_col  spot_diameter_fullres
0      4008  11916          688          1         41         23             255.860716
1      3698  13834         1258          1         35         21             255.860716
2      3853  12875          462          1         38         22             255.860716
3      3871  13511          807          1         36         22             255.860716
4      4026  12552         1903          1         39         23             255.860716
...     ...    ...          ...        ...        ...        ...                    ...
2230  19018  18796          261          1         18        104             255.860716
2231  19192  18472           64          1         19        105             255.860716
2232  18535  21037          377          1         11        101             255.860716
2233  18554  21672         1728          1          9        101             255.860716
2234  18708  20713          768          1         12        102             255.860716

2235 rows × 7 columns

You can use the read_spatial_region method to perform a spatial query and retrieve the spots within a bounding box. For 2D data, the box is given as [x_min, y_min, x_max, y_max] in the coordinate space of the dataframe:

  • Python
scene.obsl["loc"].read_spatial_region(
    region=[3000, 3000, 10000, 10000]
).data.concat().to_pandas()
        x     y  soma_joinid  in_tissue  array_row  array_col  spot_diameter_fullres
0    5048  9977         2012          1         47         29             255.860716
1    5222  9654         1399          1         48         30             255.860716
2    5395  9331         1069          1         49         31             255.860716
3    5414  9967         1576          1         47         31             255.860716
4    5568  9008         1185          1         50         32             255.860716
...   ...   ...          ...        ...        ...        ...                    ...
119  9934  8244          823          1         52         56             255.860716
120  9971  9515          640          1         48         56             255.860716
121  8318  9246         2234          1         49         47             255.860716
122  8702  9871         2103          1         47         49             255.860716
123  9953  8880         2065          1         50         56             255.860716

124 rows × 7 columns
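To make the bounding-box semantics concrete, the sketch below applies the same region to a few rows copied from the outputs above, using plain Python. It assumes the four numbers are read as [x_min, y_min, x_max, y_max], which is consistent with the coordinates returned by the query, and it treats the boundaries as inclusive, which is an assumption of this sketch rather than a documented guarantee. The helper name in_box is hypothetical, and this is not how TileDB-SOMA evaluates the query internally.

```python
def in_box(x, y, region):
    """Return True if (x, y) lies inside region = [x_min, y_min, x_max, y_max].

    Boundaries are treated as inclusive, which is an assumption of this sketch.
    """
    x_min, y_min, x_max, y_max = region
    return x_min <= x <= x_max and y_min <= y <= y_max


region = [3000, 3000, 10000, 10000]

# (x, y, soma_joinid) rows copied from the outputs above.
sample_spots = [
    (4008, 11916, 688),   # from the full table; y exceeds y_max
    (5048, 9977, 2012),   # returned by the spatial query
    (9934, 8244, 823),    # returned by the spatial query
    (19018, 18796, 261),  # from the full table; outside on both axes
]

selected = [joinid for x, y, joinid in sample_spots if in_box(x, y, region)]
print(selected)  # [2012, 823]
```

The spatial query does this filtering on the storage side, so only the matching rows are returned instead of the full dataframe.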

Finally, visualize the tissue spots with matplotlib.

  • Python
radius = scene.obsl["loc"].metadata["soma_geometry"]

spot_patches = PatchCollection(
    [
        mplp.Circle((row["x"], row["y"]), radius=radius, color="b")
        for _, row in spots.iterrows()
    ]
)

fig, ax = plt.subplots()
ax.set_xlim((0, spots["x"].max()))
ax.set_ylim((0, spots["y"].max()))
ax.invert_yaxis()
ax.add_collection(spot_patches)
plt.show()

Summary

This tutorial covered the basic methods for accessing spatial data within a SOMA experiment: navigating scenes, loading images, and reading and querying spot locations. These methods provide a foundation for building more complex spatial access patterns with TileDB-SOMA.
