import tempfile
import tiledbsoma
import pyarrow as pa
Use of Apache Arrow
SOMA uses Apache Arrow for its in-memory type system. This page explains why Arrow is used and what this means for working with SOMA data structures.
Why Arrow?
Arrow is a widely adopted open standard for in-memory data representation. It provides a rich set of data types and is designed for high performance and interoperability across languages. Using Arrow, TileDB-SOMA can leverage these benefits and ensure that data is consistently represented across different systems and tools.
Practical implications
While SOMA data structures are typically created automatically by converting data from other formats (e.g., Seurat or AnnData), you can also create them manually, in which case you must specify Arrow data types.
DataFrame example
To demonstrate this, you will create a SOMADataFrame with a user-defined schema, which must be specified as an Arrow schema.
Start by importing the necessary libraries:
Define a URI at which to store the data (this tutorial uses tempfile to create a temporary directory):
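A minimal sketch of this step, using only the standard library. The variable names and the "arrow_example" path component are illustrative assumptions, not names taken from the tutorial.

```python
import os
import tempfile

# Create a scratch directory, then build a path inside it.
# Nothing exists at `uri` yet; the SOMADataFrame is created there later.
tmp_dir = tempfile.mkdtemp()
uri = os.path.join(tmp_dir, "arrow_example")  # illustrative name

print(uri)
```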
Define the schema
Create the Arrow schema, which defines the Arrow data types for each column.
Now, use this schema to create a new SOMADataFrame:
This produced an empty SOMADataFrame with a TileDB schema that matches the provided Arrow schema.
TileDB-SOMA is strongly typed, which means every request for a given Arrow type is either fulfilled or raises an error. This keeps the API self-consistent and predictable. For example, as you’ve seen, SOMA creation operations require an Arrow schema. Accordingly, the schema accessor also returns an Arrow schema.
Perform a write
Similarly, when writing data to a SOMA object, it must be provided in the correct Arrow type. In this case, you will create a synthetic Arrow Table with the same schema used to create the SOMADataFrame.
This table can now be written to the SOMADataFrame:
Remember to close the SOMADataFrame when you are done with it:
Additional resources
Refer to the SOMA API specification for more technical details about SOMA’s use of Arrow.