Data Structures
The spatial omics data structures are an addition to the core SOMA data model. Refer to SOMA Data Structures for other data structures in the SOMA data model.
Foundational data structures
SOMAPointCloudDataFrame
SOMAPointCloudDataFrame is a multi-column table with a user-defined Arrow Schema, which defines the number of columns and each column's name and value type.
Like the SOMADataFrame, every SOMAPointCloudDataFrame must have a column called soma_joinid of type int64 and domain [0, 2^63-1]. The soma_joinid acts as a join key for other objects, such as SOMASparseNDArray. There may be many items with the same soma_joinid stored in the SOMAPointCloudDataFrame.
Along with the soma_joinid, the user must define spatial columns, referred to as spatial axes, that define the “points” in the array. Each spatial axis must have an integer or floating-point type, and all spatial axes must share the same type. The user may restrict the domain of a spatial axis or allow it to span the entire valid range of its type.
The default “fill” value for SOMAPointCloudDataFrame is the zero or null value of the corresponding column data type (for example, Arrow.float32 defaults to 0.0, Arrow.string to "", and so on).
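The column rules above can be sketched as a small check. This is only an illustration in plain Python, not part of any SOMA library: the function name check_point_cloud_schema, the representation of a schema as a dict of column name to Arrow type name, and the example axis names x and y are all assumptions made for this sketch.

```python
# Hypothetical helper: a schema is modeled as {column name: Arrow type name}.
INTEGER_TYPES = {"int8", "int16", "int32", "int64",
                 "uint8", "uint16", "uint32", "uint64"}
FLOAT_TYPES = {"float32", "float64"}


def check_point_cloud_schema(columns, spatial_axes):
    """Return a list of rule violations (an empty list means the schema passes)."""
    problems = []

    # Rule: a soma_joinid column of type int64 is mandatory.
    if columns.get("soma_joinid") != "int64":
        problems.append("soma_joinid must exist and have type int64")

    # Rule: the user must define spatial axes, and each must be a column.
    if not spatial_axes:
        problems.append("at least one spatial axis is required")
    axis_types = set()
    for axis in spatial_axes:
        if axis not in columns:
            problems.append(f"spatial axis {axis!r} is not a column")
        else:
            axis_types.add(columns[axis])

    # Rule: each spatial axis is an integer or floating-point type.
    for type_name in sorted(axis_types):
        if type_name not in INTEGER_TYPES | FLOAT_TYPES:
            problems.append(f"spatial axis type {type_name!r} is not numeric")

    # Rule: all spatial axes share one type.
    if len(axis_types) > 1:
        problems.append("all spatial axes must have the same type")

    return problems


good = check_point_cloud_schema(
    {"soma_joinid": "int64", "x": "float64", "y": "float64", "label": "string"},
    spatial_axes=("x", "y"),
)
mixed = check_point_cloud_schema(
    {"soma_joinid": "int64", "x": "float32", "y": "float64"},
    spatial_axes=("x", "y"),
)
```

Here good is an empty list, while mixed reports the mismatched axis types.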
SOMAGeometryDataFrame
SOMAGeometryDataFrame is a multi-column table with a user-defined Arrow Schema, which defines the number of columns and each column's name and value type.
Like the SOMADataFrame, every SOMAGeometryDataFrame must have a column called soma_joinid of type int64 and domain [0, 2^63-1]. The soma_joinid acts as a join key for other objects, such as SOMASparseNDArray. There may be many items with the same soma_joinid stored in the SOMAGeometryDataFrame.
Every SOMAGeometryDataFrame must also have a column called soma_geometry with type binary that stores each geometry as a well-known binary (WKB) blob. The user must name the axes of the geometry stored in the well-known binary, and these names must be distinct from the names of the other columns in the table. The SOMAGeometryDataFrame can store many items with the same geometry.
The default “fill” value for SOMAGeometryDataFrame is the zero or null value of the corresponding column data type (for example, Arrow.float32 defaults to 0.0, Arrow.string to "", and so on).
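To make the soma_geometry column concrete, the sketch below encodes a 2D point as well-known binary using only the Python standard library, then builds two rows that share both a soma_joinid and a geometry. The helper names point_to_wkb, wkb_to_point, and clashing_names, the row layout as a list of dicts, and the axis names x and y are assumptions for illustration; a real table would hold Arrow binary values rather than Python dicts.

```python
import struct

# WKB layout for a 2D point: 1-byte byte order (1 = little endian),
# 4-byte unsigned geometry type (1 = Point), then x and y as float64.
WKB_POINT_FORMAT = "<BIdd"


def point_to_wkb(x, y):
    """Encode a 2D point as a little-endian WKB blob."""
    return struct.pack(WKB_POINT_FORMAT, 1, 1, x, y)


def wkb_to_point(blob):
    """Decode a little-endian WKB point blob back into (x, y)."""
    byte_order, geometry_type, x, y = struct.unpack(WKB_POINT_FORMAT, blob)
    if byte_order != 1 or geometry_type != 1:
        raise ValueError("not a little-endian WKB Point")
    return x, y


def clashing_names(column_names, geometry_axis_names):
    """Return geometry axis names that collide with existing column names."""
    return sorted(set(column_names) & set(geometry_axis_names))


# Two rows with the same soma_joinid and the same geometry: both are allowed.
rows = [
    {"soma_joinid": 0, "soma_geometry": point_to_wkb(1.5, 2.0)},
    {"soma_joinid": 0, "soma_geometry": point_to_wkb(1.5, 2.0)},
]

decoded = wkb_to_point(rows[0]["soma_geometry"])
clashes = clashing_names(["soma_joinid", "soma_geometry"], ["x", "y"])
```

Polygons and other geometry types use the same byte-order and type-code header followed by a type-specific payload, so a dedicated geometry library is the more practical choice beyond simple points.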
SOMAMultiscaleImage
SOMAMultiscaleImage is a string-keyed map of “images” where each image is a SOMADenseNDArray. The images in a SOMAMultiscaleImage are also indexed in order of maximum shape, from largest to smallest. The maximum shape of each SOMADenseNDArray must be the size of the entire image, but it may have regions without data. These regions without data return the fill value of the SOMADenseNDArray. Keys in the map are unique: the SOMAMultiscaleImage is not a multi-map, so no key may appear more than once.
The SOMAMultiscaleImage must have a fixed image axis order (for example, channel-height-width) and a fixed number of channels (if a channel axis exists). Each image within the SOMAMultiscaleImage must match these conventions.
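The conventions above can be illustrated with a plain-Python model in which each image is represented only by its maximum shape. The keys full, half, and thumbnail, the shapes, and the helper names check_levels and largest_first are hypothetical. Ranking by pixel count is one assumed way to realize the largest-to-smallest ordering; the text does not prescribe how it is computed.

```python
from math import prod

# Fixed axis order shared by every image (the example order from the text).
AXIS_ORDER = ("channel", "height", "width")

# String-keyed map of image name -> maximum shape; a dict cannot hold duplicate keys.
levels = {
    "thumbnail": (3, 512, 512),
    "full": (3, 4096, 4096),
    "half": (3, 2048, 2048),
}


def check_levels(levels, axis_order):
    """Return a list of violations of the fixed axis order / channel count rules."""
    problems = []
    channel_counts = set()
    for name, shape in levels.items():
        if len(shape) != len(axis_order):
            problems.append(
                f"{name}: expected {len(axis_order)} axes, got {len(shape)}"
            )
            continue
        if "channel" in axis_order:
            channel_counts.add(shape[axis_order.index("channel")])
    if len(channel_counts) > 1:
        problems.append("all images must have the same number of channels")
    return problems


def largest_first(levels, axis_order):
    """Return the keys ordered from the largest image to the smallest."""
    spatial = [i for i, axis in enumerate(axis_order) if axis != "channel"]

    def pixel_count(name):
        return prod(levels[name][i] for i in spatial)

    return sorted(levels, key=pixel_count, reverse=True)


problems = check_levels(levels, AXIS_ORDER)
ordered_keys = largest_first(levels, AXIS_ORDER)
```

With these shapes, problems is empty and ordered_keys lists full first and thumbnail last.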
Composed data structures
SOMAScene
A SOMAScene is a specialized SOMACollection that stores spatially resolved data on a defined coordinate space. The SOMAScene stores coordinate transformations from the coordinate spaces of the stored elements back to the coordinate space of the scene. It has the following predefined fields:
img: A SOMACollection of SOMAMultiscaleImages. This subcollection stores imagery that is mappable back to the SOMAScene.
obsl: A SOMACollection of SOMAPointCloudDataFrame and SOMAGeometryDataFrame objects. The l in obsl is for location. This stores location-based annotations on the observation domain. The soma_joinid column of any object stored in this collection refers to the obsid.
varl: A nested SOMACollection of SOMACollections. The top-level collection maps from measurement name to SOMACollections of SOMAPointCloudDataFrame and SOMAGeometryDataFrame objects. The l in varl is for location. This stores location-based annotations on the variable domain. The soma_joinid column of any object stored in the collections refers to the varid of the measurement.
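The nesting of the three predefined fields can be pictured with ordinary Python dicts. This is a structural sketch only: the element names tissue, spots, and transcripts, the measurement name RNA, the coordinate columns x and y, and the helper locations_for are all invented for illustration, and the stored coordinate transformations are omitted.

```python
# Plain-dict stand-in for a SOMAScene; every name and value here is illustrative.
scene = {
    # img: collection of multiscale images (placeholder string instead of real data).
    "img": {"tissue": "<SOMAMultiscaleImage>"},
    # obsl: location data whose soma_joinid refers to observations.
    "obsl": {
        "spots": [
            {"soma_joinid": 0, "x": 10.0, "y": 20.0},
            {"soma_joinid": 1, "x": 15.0, "y": 22.5},
        ],
    },
    # varl: measurement name -> collection of location data whose
    # soma_joinid refers to variables of that measurement.
    "varl": {
        "RNA": {
            "transcripts": [
                {"soma_joinid": 7, "x": 10.2, "y": 19.8},
                {"soma_joinid": 7, "x": 14.9, "y": 22.1},
            ],
        },
    },
}


def locations_for(rows, joinid):
    """Collect the (x, y) locations of every row with the given soma_joinid."""
    return [(row["x"], row["y"]) for row in rows if row["soma_joinid"] == joinid]


# One observation has one spot; one variable appears at two locations.
spot_locations = locations_for(scene["obsl"]["spots"], 1)
transcript_locations = locations_for(scene["varl"]["RNA"]["transcripts"], 7)
```

Note the extra level under varl: a lookup goes through the measurement name first, whereas obsl elements sit directly under the field.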