Writing data at a timestamp is particularly useful for versioning data and time traveling.
How to run this tutorial
You can run this tutorial in two ways:
Locally on your machine.
On TileDB Cloud.
Since TileDB Cloud has a free tier, we strongly recommend that you sign up and run everything there, as this requires no installation or deployment.
This tutorial shows how to control the timestamps of write operations, which is particularly useful for time traveling. For more information on time traveling, see the related sections of the TileDB Academy.
First, import the necessary libraries, set the array URI (i.e., its path, which in this tutorial will be on local storage), and delete any previously created arrays with the same name.
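Here is a minimal sketch of this setup step. It assumes local storage and uses a hypothetical array name, `write_timestamp_array`, because the tutorial's actual path is not shown here. A local TileDB array is just a folder, so the standard library can remove a leftover copy. The Python snippets that follow also need `import numpy as np` and `import tiledb`.

```python
import os
import shutil

# Hypothetical local path for the tutorial array (an assumption, not the original name)
array_uri = os.path.join(os.getcwd(), "write_timestamp_array")


def remove_local_array(uri: str) -> bool:
    """Delete a previously created array folder at `uri`, if one exists.

    Returns True if a folder was removed. This only works for local paths;
    object-store URIs (e.g. s3://...) would need TileDB's own APIs instead.
    """
    if os.path.isdir(uri):
        shutil.rmtree(uri)
        return True
    return False


# Start from a clean slate so the tutorial can be re-run
remove_local_array(array_uri)
```

Checking `os.path.isdir` first makes the cleanup safe to run repeatedly. On the first run it simply does nothing.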
import numpy as np
import tiledb

# Create the two dimensions
d1 = tiledb.Dim(name="d1", domain=(0, 3), tile=2, dtype=np.int32)
d2 = tiledb.Dim(name="d2", domain=(0, 3), tile=2, dtype=np.int32)

# Create a domain using the two dimensions
dom = tiledb.Domain(d1, d2)

# Create an attribute
a = tiledb.Attr(name="a", dtype=np.int32)

# Create the array schema with `sparse=True`
sch = tiledb.ArraySchema(domain=dom, sparse=True, attrs=[a])

# Create the array on disk (it will initially be empty)
tiledb.Array.create(array_uri, sch)
# Create the two dimensions
d1 <- tiledb_dim("d1", c(0L, 3L), 2L, "INT32")
d2 <- tiledb_dim("d2", c(0L, 3L), 2L, "INT32")

# Create a domain using the two dimensions
dom <- tiledb_domain(dims = c(d1, d2))

# Create an attribute
a <- tiledb_attr("a", type = "INT32")

# Create the array schema, setting `sparse = TRUE`
sch <- tiledb_array_schema(dom, a, sparse = TRUE)

# Create the array on disk (it will initially be empty)
arr <- tiledb_array_create(array_uri, sch)
Populate the TileDB array using a set of 1D input arrays: one for the coordinates of each dimension and one for the attribute values (TileDB sparse arrays expect the COO format). Observe that you can now set the timestamp of the write when you open the array. The array metadata written while the array is open receives the same timestamp.
# Prepare some data in numpy arrays
d1_data = np.array([2, 0, 3, 2, 0, 1], dtype=np.int32)
d2_data = np.array([0, 1, 1, 2, 3, 3], dtype=np.int32)
a_data = np.array([4, 1, 6, 5, 2, 3], dtype=np.int32)

# Open the array in write mode and write the data in COO format.
# NOTE: `timestamp` sets the timestamp of this write
# (in milliseconds since the Unix epoch).
with tiledb.open(array_uri, "w", timestamp=1) as A:
    A[d1_data, d2_data] = a_data
    # Array metadata written here gets the same timestamp
    A.meta["a"] = 1.0
# Prepare some data in vectors
d1_data <- c(2L, 0L, 3L, 2L, 0L, 1L)
d2_data <- c(0L, 1L, 1L, 2L, 3L, 3L)
a_data <- c(4L, 1L, 6L, 5L, 2L, 3L)

# Declare a timestamp
ts <- 1

# Create an array object with a timestamp_start and timestamp_end
arr <- tiledb_array(uri = array_uri,
                    timestamp_start = as.POSIXct(ts),
                    timestamp_end = as.POSIXct(ts),
                    keep_open = TRUE)
arr <- tiledb_array_close(arr)

# Open the array for writing and write data to the array
arr <- tiledb_array_open(arr, type = "WRITE")
arr[d1_data, d2_data] <- a_data
invisible(tiledb_put_metadata(arr, "a", 1.0))

# Close the array
arr <- tiledb_array_close(arr)
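The following plain-Python sketch makes the COO layout concrete and makes no TileDB calls. It zips the three input arrays from the snippet above into `(d1, d2) -> a` cells. It then renders those cells on the 4x4 domain, with `d1` as rows, `d2` as columns, and `.` marking an empty cell. The helper names `coo_to_cells` and `render_dense` are hypothetical and serve only to illustrate the encoding.

```python
# The same coordinate and attribute values written above, as plain lists
d1_data = [2, 0, 3, 2, 0, 1]
d2_data = [0, 1, 1, 2, 3, 3]
a_data = [4, 1, 6, 5, 2, 3]


def coo_to_cells(rows, cols, values):
    """Zip COO triplets into a {(d1, d2): value} mapping."""
    if not (len(rows) == len(cols) == len(values)):
        raise ValueError("COO inputs must have equal lengths")
    return {(r, c): v for r, c, v in zip(rows, cols, values)}


def render_dense(cells, n_rows=4, n_cols=4):
    """Render the sparse cells on the full domain; '.' marks an empty cell."""
    lines = []
    for r in range(n_rows):
        row = [str(cells.get((r, c), ".")) for c in range(n_cols)]
        lines.append(" ".join(row))
    return "\n".join(lines)


cells = coo_to_cells(d1_data, d2_data, a_data)
print(render_dense(cells))
```

Running it prints the following, which shows that only 6 of the 16 cells in the domain hold values:

```
. 1 . 2
. . . 3
4 . 5 .
. 6 . .
```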
The array is a folder at the path specified in array_uri. Its contents are explained in other sections of the Academy. The effect of setting the timestamp in the previous code snippet is that the names of the written fragment (in the __fragments directory) and of its commit file (in the __commits directory) are prefixed with __1_1. This prefix encodes the start and end timestamps, both equal to the timestamp you set. The same is true for the written array metadata: the file inside the __meta directory is also prefixed with __1_1.
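The sketch below decodes the timestamps from such a name and converts between datetimes and TileDB's millisecond timestamps. It assumes the fragment naming pattern `__<t1>_<t2>_<uuid>_<format version>` used by recent TileDB storage format versions, where `t1` and `t2` are milliseconds since the Unix epoch. The fragment name in the example, including its UUID and version, is made up for illustration. In practice you would list the fragments directory, or use TileDB's fragment-info API, rather than hard-code a name.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def fragment_timestamps(name: str) -> tuple[int, int]:
    """Return (t1, t2) from a name like '__1_1_<uuid>_<version>'.

    Assumes the '__<t1>_<t2>_<uuid>_<version>' naming pattern.
    """
    if not name.startswith("__"):
        raise ValueError(f"not a fragment name: {name!r}")
    parts = name[2:].split("_")
    if len(parts) < 3:
        raise ValueError(f"not a fragment name: {name!r}")
    return int(parts[0]), int(parts[1])


def ms_to_datetime(ms: int) -> datetime:
    """Convert milliseconds since the Unix epoch to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


def datetime_to_ms(dt: datetime) -> int:
    """Convert an aware datetime to integer milliseconds since the epoch."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


# Hypothetical fragment name (made-up UUID and format version)
name = "__1_1_0123456789abcdef0123456789abcdef_16"
t1, t2 = fragment_timestamps(name)
print(t1, t2, ms_to_datetime(t1).isoformat())
```

For the name above, this prints `1 1 1970-01-01T00:00:00.001000+00:00`. The `timestamp=1` used in this tutorial is therefore one millisecond after the Unix epoch. `datetime_to_ms` goes the other way: it produces an integer you could pass as the write timestamp. It uses integer `timedelta` division to avoid floating-point rounding.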