Array Upgrade

To take advantage of the latest features in new versions of the TileDB storage format, you need to upgrade your existing arrays.

How to run this tutorial

You can run this tutorial in two ways:

Locally on your machine.
On TileDB Cloud.

Since TileDB Cloud has a free tier, we strongly recommend that you sign up and run everything there, as that requires no installation or deployment.

Every TileDB array has a format version, which corresponds to the version of the library used to create the array. The format version controls items such as the directory layout and the binary format. Writes to an existing array use the array's own format version, not the newest version the library supports, so that clients compatible with that format can still read the array. When a new format version is available, you can upgrade your arrays to take advantage of the improvements in the newer format.
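
The compatibility rule above can be illustrated with a toy model. This is a sketch in plain Python, not the TileDB API: `can_read` is a hypothetical helper, the version numbers are arbitrary, and it assumes (as a simplification) that a client can read only the format versions it knows about.

```python
from typing import List


def can_read(client_max_version: int, fragment_versions: List[int]) -> bool:
    # Simplified assumption: a client can read only format versions it knows.
    return all(v <= client_max_version for v in fragment_versions)


array_version = 1  # hypothetical format version of an existing array
fragments = [array_version]

# A newer library (one that supports version 2) appends a fragment,
# but writes it in the array's own format version.
fragments.append(array_version)
print(can_read(1, fragments))  # True: an older client can still read the array

# If the write had used the library's newest version instead:
print(can_read(1, fragments + [2]))  # False
```

Keeping writes at the array's version trades access to new format features for compatibility; the explicit upgrade step is where you opt into the newer format.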

Upgrading the array version is a logical change: calling the upgrade API does not rewrite the dataset, but it allows future writes and consolidations to use the newer format version. After calling the upgrade API, you can run consolidation to rewrite the existing fragments in the new format.
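
To make the difference between a logical upgrade and a physical rewrite concrete, here is a toy model in plain Python. `ToyArray` and its methods are hypothetical stand-ins, not TileDB classes, and the version numbers are arbitrary. One simplification to note: real TileDB consolidation leaves the old fragments on disk until you vacuum them, whereas the toy drops them immediately.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyArray:
    """Hypothetical model of an array's format versions (not the TileDB API)."""

    format_version: int
    fragment_versions: List[int] = field(default_factory=list)

    def write(self) -> None:
        # New fragments are written in the array's current format version.
        self.fragment_versions.append(self.format_version)

    def upgrade_version(self, latest: int) -> None:
        # Logical change only: bump the array's version, leave fragments as-is.
        self.format_version = max(self.format_version, latest)

    def consolidate(self) -> None:
        # Rewrite all fragments into one fragment in the current version.
        # (Simplification: real TileDB keeps the old fragments until vacuuming.)
        if self.fragment_versions:
            self.fragment_versions = [self.format_version]


arr = ToyArray(format_version=1, fragment_versions=[1, 1])
arr.upgrade_version(latest=2)
print(arr.format_version, arr.fragment_versions)  # 2 [1, 1]
arr.write()
print(arr.fragment_versions)  # [1, 1, 2]
arr.consolidate()
print(arr.fragment_versions)  # [2]
```

After the upgrade, the existing fragments keep their old version; only the new write and the consolidated result use the newer one.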

First, import the necessary libraries, set the array URI (i.e., its path, which in this tutorial will be on local storage), and delete any previously created arrays with the same name.

Python
R

# Import necessary libraries
import os.path
import shutil

import numpy as np
import tiledb

# Set array URI
array_uri = os.path.expanduser("~/array_upgrade")

# Delete array if it already exists
if os.path.exists(array_uri):
    shutil.rmtree(array_uri)

# Import necessary libraries
library(tiledb)

# Set array URI
array_uri <- path.expand("~/array_upgrade_r")

# Delete array if it already exists
if (file.exists(array_uri)) {
  unlink(array_uri, recursive = TRUE)
}

Next, create the array by specifying its schema. This example focuses on a sparse array, but the array upgrade functionality is applicable to any array.

Python
R

# Create the two dimensions
d1 = tiledb.Dim(name="d1", domain=(0, 3), tile=2, dtype=np.int32)
d2 = tiledb.Dim(name="d2", domain=(0, 3), tile=2, dtype=np.int32)

# Create a domain using the two dimensions
dom = tiledb.Domain(d1, d2)
# Order of the dimensions matters when slicing subarrays.
# Remember to give priority to more selective dimensions to
# maximize the pruning power during slicing.

# Create an attribute
a = tiledb.Attr(name="a", dtype=np.int32)

# Create the array schema with `sparse=True`.
# Set `cell_order` to 'row-major' (default) or 'C', 'col-major' or 'F', or 'hilbert'.
# Set `tile_order` to 'row-major' (default) or 'C', 'col-major' or 'F'.
sch = tiledb.ArraySchema(domain=dom, sparse=True, attrs=[a])

# Create the array on disk (it will initially be empty)
tiledb.Array.create(array_uri, sch)

# Create the two dimensions
d1 <- tiledb_dim("d1", c(0L, 3L), 2L, "INT32")
d2 <- tiledb_dim("d2", c(0L, 3L), 2L, "INT32")

# Create a domain using the two dimensions
dom <- tiledb_domain(dims = c(d1, d2))
# Order of the dimensions matters when slicing subarrays.
# Remember to give priority to more selective dimensions to
# maximize the pruning power during slicing.

# Create an attribute
a <- tiledb_attr("a", type = "INT32")

# Create the array schema with `sparse = TRUE`
sch <- tiledb_array_schema(dom, a, sparse = TRUE)

# Create the array on disk (it will initially be empty)
arr <- tiledb_array_create(array_uri, sch)

Upgrade the array to the latest format version supported by the TileDB library you are using.

Python
R

# Open the array for writing (you can optionally pass a context via `ctx=`)
with tiledb.open(array_uri, "w") as A:
    # Upgrade the array to the latest format version
    A.upgrade_version()

    # Alternatively, pass a config object
    # A.upgrade_version(config=tiledb.Config())

# Open the array
arr <- tiledb_array(array_uri)

# Upgrade the array
tiledb_array_upgrade_version(arr)

# Optionally pass a config object
tiledb_array_upgrade_version(arr, config = tiledb_config())

# Or pass a context object
tiledb_array_upgrade_version(arr, ctx = tiledb_ctx())

Finally, clean up by deleting the array.

Python
R

# Delete the array
if os.path.exists(array_uri):
    shutil.rmtree(array_uri)

# Delete the array
if (file.exists(array_uri)) {
  unlink(array_uri, recursive = TRUE)
}