As your data and business requirements evolve, so should your array schema. TileDB allows for versioned updates to your array schema.
How to run this tutorial
You can run this tutorial in two ways:
Locally on your machine.
On TileDB Cloud.
TileDB Cloud has a free tier, so we strongly recommend that you sign up and run everything there, as that requires no installation or deployment.
This tutorial describes the array schema evolution functionality in TileDB. For more details, visit the Key Concepts: Schema Evolution section.
First, import the necessary libraries, set the array URI (i.e., its path, which in this tutorial will be on local storage), and delete any previously created arrays with the same name.
```python
# Import necessary libraries
import tiledb
import numpy as np
import shutil
import os.path

# Set array URI
array_uri = os.path.expanduser("~/schema_evolution")

# Delete array if it already exists
if os.path.exists(array_uri):
    shutil.rmtree(array_uri)
```
```r
# Import necessary libraries
library(tiledb)

# Set array URI
array_uri <- path.expand("~/schema_evolution_r")

# Delete array if it already exists
if (file.exists(array_uri)) {
  unlink(array_uri, recursive = TRUE)
}
```
Next, create an array by specifying its schema. This example uses a dense array, but the functionality described here applies to sparse arrays as well. The array initially has two attributes.
```python
# Create the two dimensions
d1 = tiledb.Dim(name="d1", domain=(1, 4), tile=2, dtype=np.int32)
d2 = tiledb.Dim(name="d2", domain=(1, 4), tile=2, dtype=np.int32)

# Create a domain using the two dimensions
dom = tiledb.Domain(d1, d2)

# Create two attributes
a1 = tiledb.Attr(name="a1", dtype=np.int32)
a2 = tiledb.Attr(name="a2", dtype=np.float32)

# Create the array schema, setting `sparse=False` to indicate a dense array
sch = tiledb.ArraySchema(domain=dom, sparse=False, attrs=[a1, a2])

# Create the array on disk (it will initially be empty)
tiledb.Array.create(array_uri, sch)
```
```r
# Create the two dimensions
d1 <- tiledb_dim("d1", c(1L, 4L), 2L, "INT32")
d2 <- tiledb_dim("d2", c(1L, 4L), 2L, "INT32")

# Create a domain using the two dimensions
dom <- tiledb_domain(dims = c(d1, d2))

# Create two attributes
a1 <- tiledb_attr("a1", type = "INT32")
a2 <- tiledb_attr("a2", type = "FLOAT64")

# Create the array schema, setting `sparse = FALSE` to indicate a dense array
sch <- tiledb_array_schema(dom, c(a1, a2), sparse = FALSE)

# Create the array on disk (it will initially be empty)
arr <- tiledb_array_create(array_uri, sch)
```
Populate the TileDB array using 2-dimensional input arrays, one for each attribute.
```python
# Prepare some data in NumPy arrays
a1_data = np.array(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]], dtype=np.int32
)
a2_data = np.array(
    [
        [1.1, 2.2, 3.3, 4.4],
        [5.5, 6.6, 7.7, 8.8],
        [9.9, 10.10, 11.11, 12.12],
        [13.13, 14.14, 15.15, 16.16],
    ],
    dtype=np.float32,
)

# Write data to the array
with tiledb.open(array_uri, "w") as A:
    A[:] = {"a1": a1_data, "a2": a2_data}
```
```r
# Prepare some data in two arrays, one for each attribute
a1_data <- t(array(1:16, dim = c(4, 4)))
a2_data <- array(c(
  1.1, 2.2, 3.3, 4.4,
  5.5, 6.6, 7.7, 8.8,
  9.9, 10.10, 11.11, 12.12,
  13.13, 14.14, 15.15, 16.16
), dim = c(4L, 4L))

# Open the array for writing and write data to the array
arr <- tiledb_array(
  uri = array_uri,
  query_type = "WRITE",
  return_as = "data.frame"
)
arr[] <- list(
  a1 = a1_data,
  a2 = a2_data
)

# Close the array
arr <- tiledb_array_close(arr)
```
The array schema and contents at this moment are as follows.
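Next, evolve the schema by adding a new attribute a of type int8 to the existing array.

```python
# Define the new attribute to add
a = tiledb.Attr("a", dtype=np.int8)

# Create a schema evolution object, register the change, and apply it to the array
se = tiledb.ArraySchemaEvolution()
se.add_attribute(a)
se.array_evolve(array_uri)
```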
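```r
# Define the new attribute to add
a <- tiledb_attr("a", type = "INT8")

# Create a schema evolution object, register the change, and apply it to the array
se <- tiledb_array_schema_evolution()
tiledb_array_schema_evolution_add_attribute(se, a)
tiledb_array_schema_evolution_array_evolve(se, array_uri)
```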
The array schema and contents after adding attribute a are as follows. Observe that attribute a has no contents: every cell returns -128, the default fill value for an int8 attribute, because no data has been written to a yet.
Finally, clean up by deleting the array.

```python
# Delete the array
if os.path.exists(array_uri):
    shutil.rmtree(array_uri)
```
```r
# Delete the array
if (file.exists(array_uri)) {
  unlink(array_uri, recursive = TRUE)
}
```
Note
If you wish to evolve the schema at a particular timestamp, similar to writing fragments and array metadata at a timestamp (visit the Tutorials: Writing at a Timestamp section for details), you can set a timestamp on the schema evolution object. For an example, see the Tutorials: Time Traveling - Schema Evolution section.