Data Structures

This page describes the key data structures for spatial omics in TileDB-SOMA.

The spatial omics data structures are an addition to the core SOMA data model. Refer to SOMA Data Structures for other data structures in the SOMA data model.

Foundational data structures

SOMAPointCloudDataFrame

SOMAPointCloudDataFrame is a multi-column table with a user-defined Arrow Schema, which defines the number of columns and each column's name and value type.

Like the SOMADataFrame, every SOMAPointCloudDataFrame must have a column called soma_joinid of type int64 and domain [0, 2^63-1]. The soma_joinid acts as a join key for other objects, such as SOMASparseNDArray. There may be many items with the same soma_joinid stored in the SOMAPointCloudDataFrame.

Along with the soma_joinid, the user must define spatial columns, referred to as spatial axes, that define the “points” in the array. Each spatial axis must be either an integer or floating point type, and they must all have the same type. The user may specify a restricted domain for spatial axes or allow the axes to support the entire valid type range.

The default “fill” value for SOMAPointCloudDataFrame is the zero or null value of the corresponding column data type (for example, Arrow.float32 defaults to 0.0, Arrow.string to "", and so on).
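The column rules above can be sketched as a small schema check. This is a minimal illustration in plain Python, not the TileDB-SOMA API: the function name validate_point_cloud_schema, the type-name strings, and the example columns x, y, and gene are all hypothetical.

```python
# Illustrative only: a plain-Python schema check, not the tiledbsoma API.
INTEGER_TYPES = {"int8", "int16", "int32", "int64",
                 "uint8", "uint16", "uint32", "uint64"}
FLOAT_TYPES = {"float32", "float64"}
JOINID_DOMAIN = (0, 2**63 - 1)  # domain required for soma_joinid


def validate_point_cloud_schema(schema, spatial_axes):
    """Raise ValueError if `schema` (column name -> type name) breaks the rules."""
    if schema.get("soma_joinid") != "int64":
        raise ValueError("a soma_joinid column of type int64 is required")
    if not spatial_axes:
        raise ValueError("at least one spatial axis is required")
    axis_types = set()
    for axis in spatial_axes:
        if axis not in schema:
            raise ValueError(f"spatial axis {axis!r} is not a column")
        if schema[axis] not in INTEGER_TYPES | FLOAT_TYPES:
            raise ValueError(f"spatial axis {axis!r} must be integer or floating point")
        axis_types.add(schema[axis])
    if len(axis_types) != 1:
        raise ValueError("all spatial axes must share one type")


# Hypothetical schema: two float64 spatial axes plus one extra attribute.
schema = {"soma_joinid": "int64", "x": "float64", "y": "float64", "gene": "string"}
validate_point_cloud_schema(schema, spatial_axes=("x", "y"))  # passes silently
```

A schema that mixes axis types, such as x as float64 and y as int32, would be rejected by the last check.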

SOMAGeometryDataFrame

SOMAGeometryDataFrame is a multi-column table with a user-defined Arrow Schema, which defines the number of columns and each column's name and value type.

Like the SOMADataFrame, every SOMAGeometryDataFrame must have a column called soma_joinid of type int64 and domain [0, 2^63-1]. The soma_joinid acts as a join key for other objects, such as SOMASparseNDArray. There may be many items with the same soma_joinid stored in the SOMAGeometryDataFrame.

Every SOMAGeometryDataFrame must also have a column called soma_geometry with type binary that stores a well-known binary (WKB) blob. The user must name the axes of the geometry stored in the WKB blob, and those names must be distinct from the names of the other columns in the table. The SOMAGeometryDataFrame can store many items with the same geometry.

The default “fill” value for SOMAGeometryDataFrame is the zero or null value of the corresponding column data type (for example, Arrow.float32 defaults to 0.0, Arrow.string to "", and so on).
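To make the soma_geometry column more concrete, the sketch below hand-packs a 2D point as little-endian well-known binary using only the Python standard library. It is an illustration of the WKB layout for a point (byte-order flag, geometry type, two doubles), not of how TileDB-SOMA writes geometries. The helper names and the example rows are hypothetical.

```python
import struct

# Illustrative only: hand-rolled WKB for a 2D point, not the tiledbsoma API.
WKB_LITTLE_ENDIAN = 1  # byte-order flag
WKB_POINT = 1          # geometry type code for Point


def encode_wkb_point(x, y):
    """Pack a 2D point as little-endian well-known binary (21 bytes)."""
    return struct.pack("<BIdd", WKB_LITTLE_ENDIAN, WKB_POINT, x, y)


def decode_wkb_point(blob):
    """Unpack a little-endian WKB point back into (x, y)."""
    byte_order, geom_type, x, y = struct.unpack("<BIdd", blob)
    if byte_order != WKB_LITTLE_ENDIAN or geom_type != WKB_POINT:
        raise ValueError("not a little-endian WKB point")
    return x, y


# Hypothetical rows: two items share one soma_joinid and one geometry,
# which the data model allows.
rows = [
    {"soma_joinid": 0, "soma_geometry": encode_wkb_point(1.5, 2.0)},
    {"soma_joinid": 0, "soma_geometry": encode_wkb_point(1.5, 2.0)},
]
```

In practice a geometry library would normally produce and parse the WKB blobs; the manual packing here only shows what the binary column holds.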

SOMAMultiscaleImage

SOMAMultiscaleImage is a string-keyed map of “images” where each image is a SOMADenseNDArray. The images in a SOMAMultiscaleImage are also indexed by maximum shape, ordered from largest to smallest. The maximum shape of each SOMADenseNDArray must be the size of the entire image, but the array may have regions without data. These regions return the fill value of the SOMADenseNDArray. Keys in the map are unique: the SOMAMultiscaleImage is not a multi-map, so duplicate keys are not allowed.

The SOMAMultiscaleImage must have a fixed image axis order (for example, channel-height-width) and a fixed number of channels (if a channel axis exists). Each image within the SOMAMultiscaleImage must match these conventions.
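The constraints on a multiscale image can be sketched with a plain dictionary standing in for the string-keyed map. This is a minimal model under stated assumptions, not the TileDB-SOMA API: the channel-height-width axis order, the channel count of 3, the level names, and both helper functions are hypothetical.

```python
# Illustrative only: a plain dict stands in for SOMAMultiscaleImage.
AXIS_ORDER = ("channel", "height", "width")  # assumed fixed axis order
N_CHANNELS = 3                               # assumed fixed channel count


def add_level(levels, key, shape):
    """Add an image level, enforcing unique keys and the shared conventions."""
    if key in levels:
        raise KeyError(f"level {key!r} already exists")
    if len(shape) != len(AXIS_ORDER):
        raise ValueError("shape must have one extent per axis")
    if shape[AXIS_ORDER.index("channel")] != N_CHANNELS:
        raise ValueError("channel count must match the multiscale image")
    levels[key] = tuple(shape)


def levels_largest_first(levels):
    """Return level keys ordered from the largest image to the smallest."""
    def pixel_count(key):
        shape = levels[key]
        return shape[AXIS_ORDER.index("height")] * shape[AXIS_ORDER.index("width")]
    return sorted(levels, key=pixel_count, reverse=True)


levels = {}
add_level(levels, "lowres", (3, 512, 512))
add_level(levels, "fullres", (3, 4096, 4096))
add_level(levels, "midres", (3, 1024, 1024))
```

Adding a second level named "fullres", or a level with a different channel count, raises an error, which mirrors the unique-key and fixed-channel rules described above.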

Composed data structures

SOMAScene

A SOMAScene is a specialized SOMACollection that stores spatially resolved data on a defined coordinate space. The SOMAScene stores coordinate transformations from the coordinate spaces of the stored elements back to the coordinate space of the scene. It has the following predefined fields:

img: A SOMACollection of SOMAMultiscaleImages. This subcollection stores imagery that is mappable back to the SOMAScene.
obsl: A SOMACollection of SOMAPointCloudDataFrame and SOMAGeometryDataFrame objects. The l in obsl is for location. This stores location-based annotations on the observation domain. The soma_joinid column of any object stored in this collection refers to the obsid.
varl: A nested SOMACollection of SOMACollections. The top-level collection maps from measurement name to SOMACollections of SOMAPointCloudDataFrame and SOMAGeometryDataFrame objects. The l in varl is for location. This stores location-based annotations on the variable domain. The soma_joinid column of any object stored in the collections refers to the varid of the measurement.
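The layout of a scene, including the extra level of nesting under varl, can be pictured with nested dictionaries. This is only a sketch of the structure, not the TileDB-SOMA API. The element names (tissue, cell_boundaries, transcripts), the measurement name RNA, and the per-axis scale-and-offset form of the transforms are assumptions made for illustration; the source does not specify which kinds of coordinate transformation a scene stores.

```python
# Illustrative only: nested dicts mimic the layout; this is not the tiledbsoma API.
scene = {
    "img": {"tissue": "<SOMAMultiscaleImage>"},
    "obsl": {"cell_boundaries": "<SOMAGeometryDataFrame>"},
    "varl": {
        # varl adds one level: measurement name -> collection of dataframes.
        "RNA": {"transcripts": "<SOMAPointCloudDataFrame>"},
    },
}

# Hypothetical transforms from element space to scene space,
# stored as (scale, offset) per axis.
to_scene = {
    ("img", "tissue"): {"x": (0.5, 0.0), "y": (0.5, 0.0)},
    ("obsl", "cell_boundaries"): {"x": (1.0, 0.0), "y": (1.0, 0.0)},
}


def element_to_scene(path, point):
    """Map a point {axis: value} from an element's space into the scene's space."""
    transform = to_scene[path]
    return {axis: scale * point[axis] + offset
            for axis, (scale, offset) in transform.items()}


# A pixel position in the hypothetical image, expressed in scene coordinates.
scene_point = element_to_scene(("img", "tissue"), {"x": 100.0, "y": 40.0})
```

Here obsl maps names directly to dataframes, while varl is keyed by measurement first, matching the field descriptions above.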