State Management
In computational biology, managing large datasets efficiently and consistently is crucial. TileDB-SOMA’s stateful API is designed to address these challenges, ensuring that data operations are performed reliably and consistently. This document explains what a stateful API is, why TileDB-SOMA uses this approach, and the implications for data handling.
What is a stateful API?
A stateful API maintains context and state information across multiple operations. When a data object, such as a TileDB array, is opened, it retains information about its state until explicitly closed. This stands in contrast with a stateless API, where each operation is independent, and no state information is preserved between operations. In TileDB-SOMA, a stateful API allows for more efficient reuse of resources and provides better consistency guarantees during data operations.
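The difference between the two styles can be sketched with a toy example. Everything in it (`ToyArray`, `stateless_read`, `StatefulHandle`, the `open_count` counter) is a hypothetical stand-in written for illustration, not part of TileDB-SOMA. It only shows the general idea that a stateful handle pays the cost of opening once and then reuses that context until it is closed.

```python
class ToyArray:
    """Hypothetical stand-in for a stored array; not part of TileDB-SOMA."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.open_count = 0  # how many times the "storage" was opened

    def _connect(self):
        # Stands in for the cost of opening a connection to storage.
        self.open_count += 1
        return self._rows


def stateless_read(array, index):
    """Stateless style: every call opens the array, reads, and forgets."""
    rows = array._connect()
    return rows[index]


class StatefulHandle:
    """Stateful style: open once, reuse the handle, close explicitly."""

    def __init__(self, array):
        self._rows = array._connect()
        self.closed = False

    def read(self, index):
        if self.closed:
            raise RuntimeError("handle is closed")
        return self._rows[index]

    def close(self):
        self._rows = None
        self.closed = True


stateless_array = ToyArray(["a", "b", "c"])
stateless_values = [stateless_read(stateless_array, i) for i in range(3)]

stateful_array = ToyArray(["a", "b", "c"])
handle = StatefulHandle(stateful_array)
stateful_values = [handle.read(i) for i in range(3)]
handle.close()

print(stateless_array.open_count)  # 3 opens, one per read
print(stateful_array.open_count)   # 1 open, shared by all reads
```

Both styles return the same values; what changes is how often the underlying resource is opened, and that the stateful handle refuses reads once it has been closed.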
Advantages of a stateful API
- Efficiency: By maintaining an open connection to the storage engine, the API can perform multiple operations without the overhead of repeatedly opening and closing connections. This is particularly beneficial when working with large datasets, where minimizing latency is critical.
- Consistency: A stateful API can provide a consistent view of the data. When a dataset is opened at a specific timestamp, all subsequent operations within that session will reflect the state of the data at that time. This ensures that analyses are based on a stable snapshot of the data, even if other users make changes concurrently.
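The consistency point above can be illustrated with a small, self-contained sketch. The `ToyVersionedStore` and `ToySnapshot` classes and the integer timestamps are assumptions made up for this example; they do not reflect how TileDB-SOMA stores or versions data. The sketch only shows the general behavior of a read session pinned to the timestamp at which it was opened.

```python
class ToyVersionedStore:
    """Hypothetical append-only store keyed by integer timestamps."""

    def __init__(self):
        self._writes = []  # list of (timestamp, key, value)

    def write(self, timestamp, key, value):
        self._writes.append((timestamp, key, value))

    def open_at(self, timestamp):
        return ToySnapshot(self, timestamp)


class ToySnapshot:
    """Read session pinned to the timestamp chosen at open time."""

    def __init__(self, store, timestamp):
        self._store = store
        self._timestamp = timestamp

    def read(self, key):
        # Consider only writes at or before the pinned timestamp,
        # and return the most recent of those.
        visible = [
            (ts, value)
            for ts, k, value in self._store._writes
            if k == key and ts <= self._timestamp
        ]
        if not visible:
            raise KeyError(key)
        return max(visible, key=lambda pair: pair[0])[1]


store = ToyVersionedStore()
store.write(1, "n_cells", 100)

session = store.open_at(timestamp=1)
before = session.read("n_cells")

store.write(2, "n_cells", 250)   # a concurrent writer updates the data
after = session.read("n_cells")  # the pinned session is unaffected

latest = store.open_at(timestamp=2).read("n_cells")
print(before, after, latest)  # 100 100 250
```

The session opened at timestamp 1 keeps returning the same value after the later write, while a session opened at timestamp 2 sees the update.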
Practical implications
When working with TileDB-SOMA, objects must be explicitly opened for reading or writing. Once operations are finished, the objects must be explicitly closed. This ensures that the necessary resources are allocated when the session starts and released when it ends.
To demonstrate this, consider the following R example, in which the SOMA experiment is opened in read mode, the read is performed, and then the experiment is closed:
library(tiledbsoma)

SOMA_URI <- "tiledb://TileDB-Inc/soma-exp-tabula-sapiens-immune"

# Open the experiment for reading
experiment <- SOMAExperimentOpen(SOMA_URI, mode = "READ")

# Read the obs array
experiment$obs$read()$concat()

# Close the experiment
experiment$close()
The same pattern can be used in Python:
import tiledbsoma

SOMA_URI = "tiledb://TileDB-Inc/soma-exp-tabula-sapiens-immune"

# Open the experiment for reading
experiment = tiledbsoma.Experiment.open(SOMA_URI, mode="r")

# Read the obs array
experiment.obs.read().concat()

# Close the experiment
experiment.close()
In Python, SOMA objects are implemented as context managers, allowing you to use the with statement to ensure the object is closed automatically:
with tiledbsoma.Experiment.open(SOMA_URI, mode="r") as experiment:
    experiment.obs.read().concat()
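To see why the with statement is the safer pattern, the following sketch uses a hypothetical `ToyExperiment` class (invented for this example, not the TileDB-SOMA implementation) that follows Python's standard context manager protocol. It shows that the object is closed when the block exits, including when the block raises an exception.

```python
class ToyExperiment:
    """Hypothetical handle that records whether it has been closed."""

    def __init__(self):
        self.closed = False

    def read(self):
        if self.closed:
            raise RuntimeError("experiment is closed")
        return [1, 2, 3]

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not suppress exceptions


# Normal exit: the handle is closed when the block ends.
with ToyExperiment() as ok_experiment:
    values = ok_experiment.read()

# Exit via an exception: the handle is still closed.
failed_experiment = ToyExperiment()
try:
    with failed_experiment:
        raise ValueError("analysis step failed")
except ValueError:
    pass

print(ok_experiment.closed, failed_experiment.closed)  # True True
```

With explicit open and close calls, an error raised between the two would skip the close unless it were wrapped in try/finally; the with statement provides that guarantee by construction.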