To use the features introduced in a newer version of the TileDB storage format, you need to upgrade your existing arrays.
How to run this tutorial
You can run this tutorial in two ways:
Locally on your machine.
On TileDB Cloud.
However, since TileDB Cloud has a free tier, we strongly recommend that you sign up and run everything there, as that requires no installations or deployment.
Each TileDB array has a format version that corresponds to the version of the library used to create it. The format version controls items such as the directory layout and the binary format. To maintain client compatibility, writes to an array use the array's existing format version. When a new format version is available, you can upgrade your arrays to take advantage of its improvements.
Upgrading the array version is a logical change: calling the upgrade API does not rewrite the dataset. It only allows future write operations and consolidation to write in the newer version. After calling the upgrade API, you can use consolidation to rewrite existing fragments.
First, import the necessary libraries, set the array URI (i.e., its path, which in this tutorial will be on local storage), and delete any previously created arrays with the same name.
# Import necessary libraries
import tiledb
import numpy as np
import shutil
import os.path

# Set array URI
array_uri = os.path.expanduser("~/array_upgrade")

# Delete array if it already exists
if os.path.exists(array_uri):
    shutil.rmtree(array_uri)
# Import necessary libraries
library(tiledb)

# Set array URI
array_uri <- path.expand("~/array_upgrade_r")

# Delete array if it already exists
if (file.exists(array_uri)) {
  unlink(array_uri, recursive = TRUE)
}
Next, create the array by specifying its schema. This example focuses on a sparse array, but the array upgrade functionality is applicable to any array.
# Create the two dimensions
d1 = tiledb.Dim(name="d1", domain=(0, 3), tile=2, dtype=np.int32)
d2 = tiledb.Dim(name="d2", domain=(0, 3), tile=2, dtype=np.int32)

# Create a domain using the two dimensions
dom = tiledb.Domain(d1, d2)

# Order of the dimensions matters when slicing subarrays.
# Remember to give priority to more selective dimensions to
# maximize the pruning power during slicing.

# Create an attribute
a = tiledb.Attr(name="a", dtype=np.int32)

# Create the array schema with `sparse=True`.
# Set `cell_order` to 'row-major' (default) or 'C', 'col-major' or 'F', or 'hilbert'.
# Set `tile_order` to 'row-major' (default) or 'C', 'col-major' or 'F'.
sch = tiledb.ArraySchema(domain=dom, sparse=True, attrs=[a])

# Create the array on disk (it will initially be empty)
tiledb.Array.create(array_uri, sch)
# Create the two dimensions
d1 <- tiledb_dim("d1", c(0L, 3L), 2L, "INT32")
d2 <- tiledb_dim("d2", c(0L, 3L), 2L, "INT32")

# Create a domain using the two dimensions
dom <- tiledb_domain(dims = c(d1, d2))

# Order of the dimensions matters when slicing subarrays.
# Remember to give priority to more selective dimensions to
# maximize the pruning power during slicing.

# Create an attribute
a <- tiledb_attr("a", type = "INT32")

# Create the array schema with `sparse = TRUE`
sch <- tiledb_array_schema(dom, a, sparse = TRUE)

# Create the array on disk (it will initially be empty)
arr <- tiledb_array_create(array_uri, sch)
Upgrade the array format to the latest version supported by the library you are using.
# Open the array for writing, optionally passing a context
with tiledb.open(array_uri, "w") as A:
    A.upgrade_version()
    # optionally pass a config
    A.upgrade_version(config=tiledb.Config())
# Open the array
arr <- tiledb_array(array_uri)

# Upgrade the array
tiledb_array_upgrade_version(arr)

# Optionally pass a config object
tiledb_array_upgrade_version(arr, config = tiledb_config())

# Or a ctx object
tiledb_array_upgrade_version(arr, ctx = tiledb_ctx())