Shapes in TileDB-SOMA

Array shapes in the TileDB-SOMA data model.

The TileDB-SOMA team is proud to support an intuitive and extensible notion of shape with the release of TileDB-SOMA 1.15.

In this notebook, you will learn how to use shapes for the dataframes and arrays within your SOMA experiments, when and how you can resize, and options for experiments created in TileDB-SOMA versions before 1.15.

The dataset used is from Peripheral Blood Mononuclear Cells (PBMC3K), which is freely available from 10X Genomics.

The shape feature

As in the other tutorials in this series, you’ll see that the SOMA data model carries over many familiar concepts from AnnData. This includes the ability to ask component dataframes and arrays what their shapes are.

First, import the necessary libraries and open an example experiment.

This is data ingested to TileDB-SOMA from PBMC3K.

Python
R

import tiledbsoma

uri = "tiledb://TileDB-Inc/shapes-example-processed"
exp = tiledbsoma.Experiment.open(uri)

library(tiledbsoma)

uri <- "tiledb://TileDB-Inc/shapes-example-processed"
exp <- SOMAExperimentOpen(uri)

The obs dataframe has a domain, which is a soft limit on the soma_joinid values you may write to it. If you try to read or write soma_joinid values outside this range, you’ll get an exception with a message like "Query: A range was set outside of the current domain". This is an important data-integrity safeguard.

The domain seen here matches with the data populated inside of it. This will usually be the case, unless you created the dataframe but haven’t written any data to it yet. In that case, it’s empty, but it still has a domain.

If you have more data (more cells) to add to the experiment later, you will be able to resize obs, up to the maxdomain, which is a hard limit.

Python
R

exp.obs.domain

((0, 2637),)

exp$obs$domain()

$soma_joinid =

0
2637

Python
R

exp.obs.maxdomain

((0, 9223372036854773758),)

exp$obs$maxdomain()

$soma_joinid
integer64
[1] 0                   9223372036854773758

Python
R

exp.obs.read().concat().to_pandas()

	soma_joinid	obs_id	n_genes	percent_mito	n_counts	louvain
0	0	AAACATACAACCAC-1	781	0.030178	2419.0	CD4 T cells
1	1	AAACATTGAGCTAC-1	1352	0.037936	4903.0	B cells
2	2	AAACATTGATCAGC-1	1131	0.008897	3147.0	CD4 T cells
3	3	AAACCGTGCTTCCG-1	960	0.017431	2639.0	CD14+ Monocytes
4	4	AAACCGTGTATGCG-1	522	0.012245	980.0	NK cells
...	...	...	...	...	...	...
2633	2633	TTTCGAACTCTCAT-1	1155	0.021104	3459.0	CD14+ Monocytes
2634	2634	TTTCTACTGAGGCA-1	1227	0.009294	3443.0	B cells
2635	2635	TTTCTACTTCCTCG-1	622	0.021971	1684.0	B cells
2636	2636	TTTGCATGAGAGGC-1	454	0.020548	1022.0	B cells
2637	2637	TTTGCATGCCTCAC-1	724	0.008065	1984.0	CD4 T cells

2638 rows × 6 columns

as.data.frame(exp$obs$read()$concat())

A data.frame: 2638 × 6
soma_joinid	obs_id	n_genes	percent_mito	n_counts	louvain
<int>	<chr>	<int>	<dbl>	<dbl>	<fct>
0	AAACATACAACCAC-1	781	0.030177759	2419	CD4 T cells
1	AAACATTGAGCTAC-1	1352	0.037935957	4903	B cells
2	AAACATTGATCAGC-1	1131	0.008897362	3147	CD4 T cells
3	AAACCGTGCTTCCG-1	960	0.017430846	2639	CD14+ Monocytes
4	AAACCGTGTATGCG-1	522	0.012244898	980	NK cells
⋮	⋮	⋮	⋮	⋮	⋮
2633	TTTCGAACTCTCAT-1	1155	0.021104366	3459	CD14+ Monocytes
2634	TTTCTACTGAGGCA-1	1227	0.009294220	3443	B cells
2635	TTTCTACTTCCTCG-1	622	0.021971496	1684	B cells
2636	TTTGCATGAGAGGC-1	454	0.020547945	1022	B cells
2637	TTTGCATGCCTCAC-1	724	0.008064516	1984	CD4 T cells

You’ll learn more about experiment-level resizes later in this tutorial, as well as in the tutorial on TileDB-SOMA’s append mode.

The var dataframe’s domain is similar:

Python
R

var = exp.ms["RNA"].var
var.domain

((0, 1837),)

var <- exp$ms$get("RNA")$var
var$domain()

$soma_joinid =

0
1837

Python
R

var.maxdomain

((0, 9223372036854773968),)

var$maxdomain()

$soma_joinid
integer64
[1] 0                   9223372036854773968

Likewise, the N-dimensional arrays within the experiment have their shapes as well.

An important difference: while the dataframe domain gives you the inclusive lower and upper bounds for soma_joinid writes, the shape for the N-dimensional arrays is the upper bound plus 1.

Since there are 2638 cells and 1838 genes here, X’s shape reflects that.

Python
R

exp.obs.domain

((0, 2637),)

exp$obs$domain()

$soma_joinid =

0
2637

Python
R

exp.ms["RNA"].var.domain

((0, 1837),)

exp$ms$get("RNA")$var$domain()

$soma_joinid =

0
1837

Python
R

exp.ms["RNA"].X["data"].shape

(2638, 1838)

exp$ms$get("RNA")$X$get("data")$shape()

integer64
[1] 2638 1838

Python
R

exp.ms["RNA"].X["data"].maxshape

(9223372036854773759, 9223372036854773759)

exp$ms$get("RNA")$X$get("data")$maxshape()

integer64
[1] 9223372036854773759 9223372036854773759
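The relationship between the inclusive dataframe domains and the array shape can be sketched in plain Python. This is only an illustration, not part of the TileDB-SOMA API: the helper name shape_from_domains is hypothetical, and the input tuples are copied from the outputs shown above.

```python
def shape_from_domains(*domains):
    """Convert inclusive soma_joinid domains into an array shape.

    Hypothetical helper for illustration only. Each argument looks like
    the value of a dataframe's domain, for example ((0, 2637),).
    """
    shape = []
    for domain in domains:
        lo, hi = domain[0]  # the single index column, soma_joinid
        if lo != 0:
            raise ValueError("this sketch assumes the lower bound is 0")
        shape.append(hi + 1)  # inclusive upper bound plus 1
    return tuple(shape)


obs_domain = ((0, 2637),)  # exp.obs.domain, as shown above
var_domain = ((0, 1837),)  # exp.ms["RNA"].var.domain, as shown above

x_shape = shape_from_domains(obs_domain, var_domain)
print(x_shape)  # (2638, 1838), matching exp.ms["RNA"].X["data"].shape
```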

The other N-dimensional arrays are similar:

Python
R

obsm = exp.ms["RNA"].obsm
list(obsm.keys())

['X_draw_graph_fr', 'X_pca', 'X_tsne', 'X_umap']

obsm <- exp$ms$get("RNA")$obsm
obsm$names()

'X_draw_graph_fr'
'X_pca'
'X_tsne'
'X_umap'

Python
R

obsp = exp.ms["RNA"].obsp
list(obsp.keys())

['connectivities', 'distances']

obsp <- exp$ms$get("RNA")$obsp
obsp$names()

'connectivities'
'distances'

Python
R

[
    obsm["X_pca"].shape,
    obsm["X_pca"].maxshape,
]

[(2638, 50), (9223372036854773759, 9223372036854773759)]

list(
  obsm$get("X_pca")$shape(),
  obsm$get("X_pca")$maxshape()
)

[[1]]
integer64
[1] 2638 50  

[[2]]
integer64
[1] 9223372036854773759 9223372036854773759

Python
R

[
    obsp["distances"].shape,
    obsp["distances"].maxshape,
]

[(2638, 2638), (9223372036854773759, 9223372036854773759)]

list(
  obsp$get("distances")$shape(),
  obsp$get("distances")$maxshape()
)

[[1]]
integer64
[1] 2638 2638

[[2]]
integer64
[1] 9223372036854773759 9223372036854773759

In particular, the X array in this experiment — and in most experiments — is sparse. That means the matrix doesn’t need a number in every row or cell. Still, the shape serves as a soft limit for reads and writes: you’ll get an exception trying to read or write outside of these bounds. (Specifically, the message you’ll see is Query: A range was set outside of the current domain.)

As a convenience, you can see all the experiment’s objects’ shapes at once as follows:

Python
R

import tiledbsoma.io

tiledbsoma.io.show_experiment_shapes(exp.uri)

[DataFrame] obs
  URI tiledb://TileDB-Inc/4e63acce-71cc-4d42-96b8-0815bf7fc497
  non_empty_domain     ((0, 2637),)
  domain               ((0, 2637),)
  maxdomain            ((0, 9223372036854773758),)
  upgraded             True

[DataFrame] ms/RNA/var
  URI tiledb://TileDB-Inc/95998d1a-82f9-4555-adc9-dfdee2f057f0
  non_empty_domain     ((0, 1837),)
  domain               ((0, 1837),)
  maxdomain            ((0, 9223372036854773968),)
  upgraded             True

[SparseNDArray] ms/RNA/X/data
  URI tiledb://TileDB-Inc/68acd3b3-fb31-4089-8242-f72f35288ab6
  used_shape           ((0, 2637), (0, 1837))
  shape                (2638, 1838)
  maxshape             (9223372036854773759, 9223372036854773759)
  upgraded             True

...

[SparseNDArray] ms/RNA/obsm/X_pca
  URI tiledb://TileDB-Inc/e147bdff-4066-45ca-90d3-e0041ee4259b
  used_shape           ((0, 2637), (0, 49))
  shape                (2638, 50)
  maxshape             (9223372036854773759, 9223372036854773759)
  upgraded             True

  ...

[SparseNDArray] ms/RNA/obsp/distances
  URI tiledb://TileDB-Inc/b37fb332-6e31-4a08-8138-272f196081d9
  used_shape           ((0, 2637), (0, 2637))
  shape                (2638, 2638)
  maxshape             (9223372036854773759, 9223372036854773759)
  upgraded             True

[SparseNDArray] ms/RNA/varm/PCs
  URI tiledb://TileDB-Inc/7b2849bb-5804-469c-95e1-c5bf52aa6266
  used_shape           ((0, 1837), (0, 49))
  shape                (1838, 50)
  maxshape             (9223372036854773759, 9223372036854773759)
  upgraded             True

  ...

(Not currently implemented in R.)

As with AnnData, as a general rule you’ll see the following:

An X array’s shape is nobs x nvar.
An obsm array’s shape is nobs x some number, maybe 50.
An obsp array’s shape is nobs x nobs.
A varm array’s shape is nvar x some number, maybe 50.
A varp array’s shape is nvar x nvar.
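As a rough sketch, the rules above can be written as a small lookup. The function name expected_shapes is hypothetical, and the default of 50 components is only the example value mentioned in the list; real obsm and varm arrays can have any second dimension.

```python
def expected_shapes(nobs, nvar, n_components=50):
    """Return the shapes you would typically expect for each array kind.

    Hypothetical helper for illustration; n_components stands in for
    "some number, maybe 50" and varies from array to array.
    """
    return {
        "X": (nobs, nvar),
        "obsm": (nobs, n_components),
        "obsp": (nobs, nobs),
        "varm": (nvar, n_components),
        "varp": (nvar, nvar),
    }


# Values for the processed PBMC3K experiment used in this tutorial
shapes = expected_shapes(nobs=2638, nvar=1838)
for name, shape in shapes.items():
    print(name, shape)
```

For the experiment above, the X, obsm/X_pca, obsp/distances, and varm/PCs shapes reported by show_experiment_shapes agree with these values.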

When and how to resize at the experiment level

The primary reason you’d resize a dataframe or an array within an experiment is to append more data. For example, say you have an experiment with the results of Monday’s lab run on a sample of 100,000 cells. Then maybe on Tuesday, you’ll want to add that day’s lab run of another 70,000 cells to the same experiment, for a new total of 170,000 cells. It’s also possible that Tuesday’s data might include some infrequently expressed genes that didn’t appear in Monday’s data.

Because shapes are soft limits, and reading or writing beyond them raises an exception, you need to resize the experiment so that its dataframes and arrays accommodate the new nobs = 170,000.
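The Monday/Tuesday scenario can be modelled in a few lines of plain Python. This is a sketch of the soft-limit arithmetic only, not a call into TileDB-SOMA: fits_within_shape is a hypothetical helper, and the gene count of 20,000 is an assumed value that does not come from this tutorial.

```python
def fits_within_shape(coords, shape):
    """Return True if every coordinate lies inside the soft limit."""
    return all(0 <= c < extent for c, extent in zip(coords, shape))


monday_nobs = 100_000
tuesday_nobs = 70_000
nvar = 20_000  # assumed gene count, for illustration only

# After Monday's run, the X shape covers rows 0..99,999.
shape = (monday_nobs, nvar)
first_new_cell = (monday_nobs, 0)  # first row Tuesday's data would occupy
print(fits_within_shape(first_new_cell, shape))  # False: resize needed

# After resizing to the combined cell count, Tuesday's rows fit.
new_nobs = monday_nobs + tuesday_nobs
resized_shape = (new_nobs, nvar)
last_new_cell = (new_nobs - 1, nvar - 1)
print(fits_within_shape(last_new_cell, resized_shape))  # True
```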

Visit the append-mode tutorial for information on how to resize experiments by using tiledbsoma.io.register_anndatas and tiledbsoma.io.resize_experiment.

While you can resize each dataframe and array in the experiment one at a time (refer to Advanced usage), the most common case is tiledbsoma.io.resize_experiment, which exists to make this quick and convenient.

Note

resize_experiment is available only in Python, because the append-mode feature currently exists only in Python.

How to upgrade older experiments

Experiments created by TileDB-SOMA 1.15 and later will look as shown previously. The following code block shows an experiment created using TileDB-SOMA 1.14.5. This is the same PBMC3K dataset as before, except it’s the unprocessed version: this has fewer component arrays, which keeps the display here more compact.

Note

Experiment-level upgrade is applicable only to the TileDB-SOMA Python API. This is because TileDB-SOMA experiments created in R before TileDB-SOMA 1.15 have their array shape already the same as maxshape, so they can’t be expanded further.

Python
R

import tiledbsoma.io

uri = "tiledb://TileDB-Inc/shapes-example-pre-1.15-not-upgraded"
pre_115_exp = tiledbsoma.Experiment.open(uri)

uri <- "tiledb://TileDB-Inc/shapes-example-pre-1.15-not-upgraded"
pre_115_exp <- SOMAExperimentOpen(uri)

Compare the shapes from before TileDB-SOMA 1.15 to TileDB-SOMA 1.15:

Python
R

pre_115_exp.obs.domain

((0, 2147483646),)

pre_115_exp$obs$domain()

$soma_joinid =

0
2147483646

Python
R

pre_115_exp.obs.maxdomain

((0, 2147483646),)

pre_115_exp$obs$maxdomain()

$soma_joinid =

0
2147483646

Python
R

pre_115_exp.obs.tiledbsoma_has_upgraded_domain

False

pre_115_exp$obs$tiledbsoma_has_upgraded_domain()

FALSE

Python
R

[
    pre_115_exp.ms["RNA"].X["data"].shape,
    pre_115_exp.ms["RNA"].X["data"].maxshape,
    pre_115_exp.ms["RNA"].X["data"].tiledbsoma_has_upgraded_shape,
]

[(2147483646, 2147483646), (2147483646, 2147483646), False]

X <- pre_115_exp$ms$get("RNA")$X$get("data")
list(
  X$shape(),
  X$maxshape(),
  X$tiledbsoma_has_upgraded_shape()
)

[[1]]
integer64
[1] 2147483646 2147483646

[[2]]
integer64
[1] 2147483646 2147483646

[[3]]
[1] FALSE

Note that for the pre-1.15 experiment, the shape is as large as the maxshape, and both tiledbsoma_has_upgraded_domain and tiledbsoma_has_upgraded_shape are False.

To make the old experiment look like the new experiment, call upgrade_experiment_shapes, and reopen.

For purposes of this document, we show the results of having done that.

Note that show_experiment_shapes and upgrade_experiment_shapes are currently only implemented in Python.

Before upgrading:

tiledbsoma.io.show_experiment_shapes(
    "tiledb://TileDB-Inc/shapes-example-pre-1.15-upgraded"
)

[DataFrame] obs
  URI tiledb://TileDB-Inc/85bdf23b-e0fe-4494-9012-c9102fc6be90
  non_empty_domain     ((0, 2699),)
  domain               ((0, 2147483646),)
  maxdomain            ((0, 2147483646),)
  upgraded             False

[DataFrame] ms/RNA/var
  URI tiledb://TileDB-Inc/45bd8385-dd82-40f6-a428-2a85c8626afe
  non_empty_domain     ((0, 13713),)
  domain               ((0, 2147483646),)
  maxdomain            ((0, 2147483646),)
  upgraded             False

[SparseNDArray] ms/RNA/X/data
  URI tiledb://TileDB-Inc/b714d8f6-9283-4191-8e2d-9b41c4007ee1
  used_shape           ((0, 2699), (0, 13713))
  shape                (2147483646, 2147483646)
  maxshape             (2147483646, 2147483646)
  upgraded             False
True

Applying the upgrade:

tiledbsoma.io.upgrade_experiment_shapes(
    "tiledb://TileDB-Inc/shapes-example-pre-1.15-upgraded", verbose=True
)

[DataFrame] obs
  URI tiledb://TileDB-Inc/85bdf23b-e0fe-4494-9012-c9102fc6be90
  Applying tiledbsoma_upgrade_soma_joinid_shape(2700)

[DataFrame] ms/RNA/var
  URI tiledb://TileDB-Inc/45bd8385-dd82-40f6-a428-2a85c8626afe
  Applying tiledbsoma_upgrade_soma_joinid_shape(13714)

[SparseNDArray] ms/RNA/X/data
  URI tiledb://TileDB-Inc/b714d8f6-9283-4191-8e2d-9b41c4007ee1
  Applying tiledbsoma_upgrade_shape((2700, 13714))
True

After the upgrade:

tiledbsoma.io.show_experiment_shapes(
    "tiledb://TileDB-Inc/shapes-example-pre-1.15-upgraded"
)

[DataFrame] obs
  URI tiledb://TileDB-Inc/85bdf23b-e0fe-4494-9012-c9102fc6be90
  non_empty_domain     ((0, 2699),)
  domain               ((0, 2699),)
  maxdomain            ((0, 2147483646),)
  upgraded             True

[DataFrame] ms/RNA/var
  URI tiledb://TileDB-Inc/45bd8385-dd82-40f6-a428-2a85c8626afe
  non_empty_domain     ((0, 13713),)
  domain               ((0, 13713),)
  maxdomain            ((0, 2147483646),)
  upgraded             True

[SparseNDArray] ms/RNA/X/data
  URI tiledb://TileDB-Inc/b714d8f6-9283-4191-8e2d-9b41c4007ee1
  used_shape           ((0, 2699), (0, 13713))
  shape                (2700, 13714)
  maxshape             (2147483646, 2147483646)
  upgraded             True

Python
R

pre_115_exp = tiledbsoma.open("tiledb://TileDB-Inc/shapes-example-pre-1.15-upgraded")

pre_115_exp <- SOMAExperimentOpen("tiledb://TileDB-Inc/shapes-example-pre-1.15-upgraded")

Python
R

[
    pre_115_exp.ms["RNA"].X["data"].shape,
    pre_115_exp.ms["RNA"].X["data"].maxshape,
    pre_115_exp.ms["RNA"].X["data"].tiledbsoma_has_upgraded_shape,
]

[(2700, 13714), (2147483646, 2147483646), True]

X <- pre_115_exp$ms$get("RNA")$X$get("data")
list(
  X$shape(),
  X$maxshape(),
  X$tiledbsoma_has_upgraded_shape()
)

[[1]]
integer64
[1] 2700  13714

[[2]]
integer64
[1] 2147483646 2147483646

[[3]]
[1] TRUE

To run a pre-check, you can do the following:

Python
R

tiledbsoma.io.upgrade_experiment_shapes(the_uri, check_only=True)

(Not currently implemented in R.)

This won’t change anything. It’ll only tell you if the operation will be possible.

Advanced usage

Dataframes with non-standard index columns

In the SOMA data model, the SparseNDArray and DenseNDArray objects always have int64 dimensions named soma_dim_0, soma_dim_1, and so on, and they have a numeric soma_data attribute for the contents of the array.

Python
R

exp.ms["RNA"].X["data"].schema

soma_dim_0: int64 not null
soma_dim_1: int64 not null
soma_data: float not null

X$schema()

Schema
soma_dim_0: int64 not null
soma_dim_1: int64 not null
soma_data: double not null

For dataframes, though, while there must be a soma_joinid column of type int64, you can have additional index columns, or soma_joinid may be a non-index column.

In the most common case, soma_joinid is the only index column, and you can think of a dataframe as having a shape just as the N-dimensional arrays do.

Python
R

exp.obs.schema

soma_joinid: int64 not null
obs_id: large_string
n_genes: int64
percent_mito: float
n_counts: float
louvain: dictionary<values=string, indices=int32, ordered=0>

exp$obs$schema()

Schema
soma_joinid: int64 not null
obs_id: large_string
n_genes: int64
percent_mito: float
n_counts: float
louvain: dictionary<values=string, indices=int32>

Python
R

exp.obs.index_column_names

('soma_joinid',)

exp$obs$index_column_names()

'soma_joinid'

That being said, dataframes are capable of more than that, via the index-column names you specify at creation time.

Create some dataframes, with the same data, but different choices of index-column names.

Python
R

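import os
import tempfile

# Fresh paths inside a new temporary directory (tempfile.mktemp is deprecated)
tmpdir = tempfile.mkdtemp()
sdfuri1 = os.path.join(tmpdir, "sdf1")
sdfuri2 = os.path.join(tmpdir, "sdf2")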

sdfuri1 <- tempfile()
sdfuri2 <- tempfile()

Python
R

import pyarrow as pa

schema = pa.schema(
    [
        ("soma_joinid", pa.int64()),
        ("mystring", pa.string()),
        ("myint", pa.int32()),
        ("myfloat", pa.float32()),
    ]
)

data = pa.Table.from_pydict(
    {
        "soma_joinid": [0, 1],
        "mystring": ["hello", "world"],
        "myint": [33, 44],
        "myfloat": [4.5, 5.5],
    }
)

library(arrow)

schema <- arrow::schema(
  arrow::field("soma_joinid", arrow::int64(), nullable = FALSE),
  arrow::field("mystring", arrow::large_utf8(), nullable = FALSE),
  arrow::field("myint", arrow::int32(), nullable = FALSE),
  arrow::field("myfloat", arrow::float32(), nullable = FALSE)
)

data <- arrow::arrow_table(
  soma_joinid = c(0, 1),
  mystring = c("hello", "world"),
  myint = c(33, 44),
  myfloat = c(4.5, 5.5)
)

Python
R

with tiledbsoma.DataFrame.create(
    sdfuri1,
    schema=schema,
    index_column_names=["soma_joinid", "mystring"],
    domain=[(0, 9), None],
) as sdf1:
    sdf1.write(data)

sdf1 <- SOMADataFrameCreate(
  sdfuri1,
  schema = schema,
  index_column_names = c("soma_joinid", "mystring"),
  domain = list(soma_joinid = c(0, 9), mystring = NULL)
)
sdf1$write(data)
sdf1$close()

Now inspect the domain and maxdomain for these dataframes.

Python
R

sdf1 = tiledbsoma.DataFrame.open(sdfuri1)

sdf1 <- SOMADataFrameOpen(sdfuri1)

Python
R

sdf1.index_column_names

('soma_joinid', 'mystring')

sdf1$index_column_names()

'soma_joinid'
'mystring'

Notice the soma_joinid slot of the dataframe’s domain is as requested.

A domain cannot be specified for string-typed index columns.

At creation time, you can fill that slot in one of two ways:

Python
R

domain = [(0, 9), None]
# or
domain = [(0, 9), ("", "")]

    domain=list(soma_joinid=c(0, 9), mystring=NULL),
    # or
    domain=list(soma_joinid=c(0, 9), mystring=c('', '')),

In either case, the domain slot for a string-typed index column will read back as a pair of empty strings:

Python
R

sdf1.domain

((0, 9), ('', ''))

sdf1$domain()

$soma_joinid

$mystring

Python
R

sdf1.maxdomain

((0, 9223372036854775796), ('', ''))

sdf1$maxdomain()

$soma_joinid
integer64
[1] 0                   9223372036854773759

$mystring
[1] "" ""

Now inspect the other dataframe. Here, soma_joinid isn’t an index column at all. This is fine, as long as within the data you write to it, the index-column values uniquely identify each row.

Python
R

with tiledbsoma.DataFrame.create(
    sdfuri2,
    schema=schema,
    index_column_names=["myfloat", "myint"],
    domain=[(0, 999), (-1000, 1000)],
) as sdf2:
    sdf2.write(data)

sdf2 <- SOMADataFrameCreate(
  sdfuri2,
  schema = schema,
  index_column_names = c("myfloat", "myint"),
  domain = list(myfloat = c(0, 999), myint = c(-1000, 1000))
)
sdf2$write(data)
sdf2$close()
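The uniqueness requirement for index-column values can be checked before writing. The following is a minimal sketch in plain Python, not a TileDB-SOMA function: index_values_are_unique is a hypothetical helper, and the third row in the duplicate example is invented purely to show a failing case.

```python
def index_values_are_unique(rows, index_column_names):
    """Return True if no two rows share the same index-column values."""
    seen = set()
    for row in rows:
        key = tuple(row[name] for name in index_column_names)
        if key in seen:
            return False
        seen.add(key)
    return True


# The two rows written to the dataframes in this tutorial
rows = [
    {"soma_joinid": 0, "mystring": "hello", "myint": 33, "myfloat": 4.5},
    {"soma_joinid": 1, "mystring": "world", "myint": 44, "myfloat": 5.5},
]
print(index_values_are_unique(rows, ["myfloat", "myint"]))  # True

# A made-up extra row that repeats an existing (myfloat, myint) pair
with_duplicate = rows + [
    {"soma_joinid": 2, "mystring": "again", "myint": 33, "myfloat": 4.5}
]
print(index_values_are_unique(with_duplicate, ["myfloat", "myint"]))  # False
```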

Python
R

sdf2 = tiledbsoma.DataFrame.open(sdfuri2)

sdf2 <- SOMADataFrameOpen(sdfuri2)

Python
R

sdf2.index_column_names

('myfloat', 'myint')

sdf2$index_column_names()

'myfloat'
'myint'

The domain reads back as written.

Python
R

sdf2.domain

((0.0, 999.0), (-1000, 1000))

sdf2$domain()

$myfloat

$myint

-1000
1000

Python
R

sdf2.maxdomain

((-3.4028234663852886e+38, 3.4028234663852886e+38), (-2147483648, 2147481645))

sdf2$maxdomain()

$myfloat

-3.40282346638529e+38
3.40282346638529e+38

$myint

-2147483647
2147481599

Use `resize` at the dataframe/array level with the SOMA API

Earlier in this tutorial, you learned a fast and convenient way to resize all the dataframes and arrays within an experiment.

However, should you choose to do so, you can apply resizes one dataframe or array at a time.

For N-dimensional arrays, do the following:

If the array’s tiledbsoma_has_upgraded_shape method reports False, invoke the tiledbsoma_upgrade_shape method.
Otherwise, invoke the .resize method.
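The two rules above can be sketched as a small decision helper. This is an illustration in plain Python, not TileDB-SOMA code: plan_array_resize is a hypothetical function that only returns the name of the method you would call and the shape you would pass, and it assumes the non-empty domain has the ((lo, hi), ...) form shown in this tutorial.

```python
def plan_array_resize(has_upgraded_shape, non_empty_domain, requested_shape=None):
    """Return (method_name, new_shape) for an N-dimensional array.

    Hypothetical helper for illustration; it performs no I/O.
    """
    if not has_upgraded_shape:
        # Upper bounds are inclusive, so the smallest valid shape is hi + 1.
        new_shape = tuple(hi + 1 for _lo, hi in non_empty_domain)
        return ("tiledbsoma_upgrade_shape", new_shape)
    if requested_shape is None:
        raise ValueError("an upgraded array needs an explicit new shape")
    return ("resize", tuple(requested_shape))


ned = ((0, 2699), (0, 13713))  # non-empty domain of the pre-1.15 X array

print(plan_array_resize(False, ned))
# ('tiledbsoma_upgrade_shape', (2700, 13714))

print(plan_array_resize(True, ned, requested_shape=(7200, 13724)))
# ('resize', (7200, 13724))
```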

Note: for purposes of this document, two experiments are shown: a before and an after. For your purposes, you would use a single experiment, and operate on only that.

Open a pre-1.15 experiment:

Python
R

pre_115_exp = tiledbsoma.Experiment.open(
    "tiledb://TileDB-Inc/shapes-example-pre-1.15-not-upgraded"
)
X = pre_115_exp.ms["RNA"].X["data"]

pre_115_exp <- SOMAExperimentOpen("tiledb://TileDB-Inc/shapes-example-pre-1.15-not-upgraded")
X <- pre_115_exp$ms$get("RNA")$X$get("data")

Notice that the X array has not been upgraded, and that its shape reports the same as maxshape:

Python
R

X.tiledbsoma_has_upgraded_shape

False

X$tiledbsoma_has_upgraded_shape()

FALSE

Python
R

X.shape

(2147483646, 2147483646)

X$shape()

integer64
[1] 2147483646 2147483646

Now give the X array the new-style shape. First, consult its non-empty domain to get a report of what data have already been successfully written there:

Python
R

X.non_empty_domain()

((0, 2699), (0, 13713))

X$non_empty_domain()

$soma_dim_0

0
2699

$soma_dim_1

0
13713

Python
R

with tiledbsoma.Experiment.open(
    "tiledb://TileDB-Inc/shapes-example-pre-1.15-upgraded", "w"
) as exp:
    exp.ms["RNA"].X["data"].tiledbsoma_upgrade_shape(
        [X.non_empty_domain()[0][1] + 1, X.non_empty_domain()[1][1] + 1],
        check_only=True,  # Omit this when operating on live data
    )

ned <- X$non_empty_domain(max_only = TRUE)
exp <- SOMAExperimentOpen("tiledb://TileDB-Inc/shapes-example-pre-1.15-upgraded", "WRITE")
exp$ms$get("RNA")$X$get("data")$tiledbsoma_upgrade_shape(
  c(ned[[1]] + 1, ned[[2]] + 1), # Shape is the inclusive upper bound plus 1
  check_only = TRUE # Omit this when operating on live data
)
exp$close()

Next, reopen the experiment to find out what happened:

Python
R

pre_115_exp = tiledbsoma.Experiment.open(
    "tiledb://TileDB-Inc/shapes-example-pre-1.15-upgraded"
)
X = pre_115_exp.ms["RNA"].X["data"]

pre_115_exp <- SOMAExperimentOpen("tiledb://TileDB-Inc/shapes-example-pre-1.15-upgraded")
X <- pre_115_exp$ms$get("RNA")$X$get("data")

Python
R

X.tiledbsoma_has_upgraded_shape

True

X$tiledbsoma_has_upgraded_shape()

TRUE

Python
R

X.shape

(2700, 13714)

X$shape()

integer64
[1] 2700  13714

Python
R

X.maxshape

(2147483646, 2147483646)

X$maxshape()

integer64
[1] 2147483646 2147483646

If you want, you can resize it even more:

Python
R

with tiledbsoma.Experiment.open(
    "tiledb://TileDB-Inc/shapes-example-pre-1.15-upgraded", "w"
) as exp:
    # Each new extent must be at least the current shape, (2700, 13714).
    # Omit check_only=True when operating on live data
    exp.ms["RNA"].X["data"].resize([7200, 13724], check_only=True)

exp <- SOMAExperimentOpen("tiledb://TileDB-Inc/shapes-example-pre-1.15-upgraded", "WRITE")
exp$ms$get("RNA")$X$get("data")$resize(
  c(7200, 13724),
  check_only = TRUE # Omit this when operating on live data
)
exp$close()

For dataframes, the process is similar. If you want to expand only the soft limits for soma_joinid, you can use these methods instead:

If the dataframe’s tiledbsoma_has_upgraded_domain reports False, invoke the .tiledbsoma_upgrade_domain method.
Otherwise, invoke the .change_domain method.

Python
R

pre_115_exp = tiledbsoma.Experiment.open(
    "tiledb://TileDB-Inc/shapes-example-pre-1.15-not-upgraded"
)
pre_115_exp.obs.tiledbsoma_has_upgraded_domain

False

pre_115_exp <- SOMAExperimentOpen("tiledb://TileDB-Inc/shapes-example-pre-1.15-not-upgraded")
pre_115_exp$obs$tiledbsoma_has_upgraded_domain()

FALSE

Python
R

pre_115_exp.obs.domain

((0, 2147483646),)

pre_115_exp$obs$domain()

$soma_joinid =

0
2147483646

Python
R

pre_115_exp.obs.maxdomain

((0, 2147483646),)

pre_115_exp$obs$maxdomain()

$soma_joinid =

0
2147483646

Python
R

pre_115_exp.obs.non_empty_domain()

((0, 2699),)

pre_115_exp$obs$non_empty_domain()

$soma_joinid =

0
2699

Python

with tiledbsoma.Experiment.open(pre_115_exp.uri, "w") as exp:
    exp.obs.tiledbsoma_upgrade_domain(
        # Dataframe domains are inclusive, so the upper bound is the largest soma_joinid written
        [[0, pre_115_exp.obs.non_empty_domain()[0][1]]],
        check_only=True,  # Omit check_only=True when operating on live data
    )

Python
R

pre_115_exp = tiledbsoma.Experiment.open(
    "tiledb://TileDB-Inc/shapes-example-pre-1.15-upgraded"
)

pre_115_exp <- SOMAExperimentOpen("tiledb://TileDB-Inc/shapes-example-pre-1.15-upgraded")

Python
R

pre_115_exp.obs.tiledbsoma_has_upgraded_domain

True

pre_115_exp$obs$tiledbsoma_has_upgraded_domain()

TRUE

Python
R

pre_115_exp.obs.domain

((0, 2699),)

pre_115_exp$obs$domain()

$soma_joinid =

0
2699

Python
R

pre_115_exp.obs.maxdomain

((0, 2147483646),)

pre_115_exp$obs$maxdomain()

$soma_joinid =

0
2147483646

TileDB-SOMA `shape` and `domain` in comparison to other TileDB terminology

TileDB-SOMA uses TileDB to implement the SOMA specification. You may find terminology corresponding to both TileDB and SOMA. This document has made use of SOMA terminology only. However, if you are familiar with broader TileDB concepts, here are the mappings.

Core domain:
- This has always existed.
- This is immutable: it can’t be made larger or smaller once a dataframe or array has been created.
- A SOMA DataFrame’s maxdomain is implemented by core domain.
- A SOMA SparseNDArray or DenseNDArray’s maxshape is implemented by core domain.
- It’s a runtime error to read or write data outside these boundaries.
- This is a hard limit, in that it can’t be increased.
Core current_domain:
- This was introduced in 2024 as of version 2.26 of the open-source core of TileDB, and is available in TileDB-SOMA as of version 1.15.
- This is mutable: it can’t be made smaller after dataframe or array creation, but you can make it larger, up to the core domain (SOMA maxdomain/maxshape).
- A SOMA DataFrame’s domain is implemented by core current_domain.
- A SOMA SparseNDArray or DenseNDArray’s shape is implemented by core current_domain.
- TileDB-SOMA will throw a runtime error if you try to read or write data outside these boundaries: you will see the error message A range was set outside of the current domain.
- This is a soft limit, in that it may be increased up to the hard limit.
Dataframes and arrays created by TileDB-SOMA 1.14 or earlier:
- These will necessarily have the core domain (SOMA maxdomain and maxshape, respectively).
- These won’t have the core current_domain.
- When you ask for a SOMA dataset’s domain or shape, you get the same value as maxdomain or maxshape.
- Their tiledbsoma_has_upgraded_domain() and tiledbsoma_has_upgraded_shape() methods return False.
- Using the upgrade feature mentioned previously, you can apply a core current_domain.
Dataframes and arrays created by TileDB-SOMA 1.15 and later, or that have been upgraded:
- These will necessarily have the core domain (SOMA maxdomain and maxshape, respectively).
- These will also have the core current_domain (SOMA domain and shape, respectively).
- Their tiledbsoma_has_upgraded_domain() and tiledbsoma_has_upgraded_shape() methods return True.
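The difference between the hard limit (core domain) and the soft limit (core current_domain) can be captured in a toy model. The class below is a sketch in plain Python written for this explanation; BoundedShape and its methods are hypothetical and are not how TileDB or TileDB-SOMA implement these limits.

```python
class BoundedShape:
    """Toy model of a SOMA array's shape (soft) and maxshape (hard)."""

    def __init__(self, shape, maxshape):
        self._maxshape = tuple(maxshape)  # fixed at creation time
        self.shape = tuple(shape)

    @property
    def maxshape(self):
        # Read-only: the hard limit can't be changed after creation.
        return self._maxshape

    def resize(self, new_shape):
        new_shape = tuple(new_shape)
        if len(new_shape) != len(self.shape):
            raise ValueError("new shape must have the same number of dimensions")
        for new, current, hard in zip(new_shape, self.shape, self._maxshape):
            if new < current:
                raise ValueError("the shape can't be made smaller")
            if new > hard:
                raise ValueError("the shape can't exceed maxshape")
        self.shape = new_shape

    def check_in_bounds(self, coords):
        # Mirrors the soft-limit check on reads and writes.
        for c, extent in zip(coords, self.shape):
            if not 0 <= c < extent:
                raise IndexError("A range was set outside of the current domain")


x = BoundedShape(shape=(2700, 13714), maxshape=(2147483646, 2147483646))
x.resize((7200, 13724))  # growing within maxshape is allowed
print(x.shape)  # (7200, 13724)
```

In this model, calling x.resize((100, 100)) raises ValueError because the soft limit can only grow, and x.check_in_bounds((7200, 0)) raises IndexError because row 7200 lies just outside the resized shape.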
