You can time travel on array metadata as you would with fragments. Learn how in this tutorial.
How to run this tutorial
You can run this tutorial in two ways:
Locally on your machine.
On TileDB Cloud.
However, since TileDB Cloud has a free tier, we strongly recommend that you sign up and run everything there, as that requires no installations or deployment.
This tutorial demonstrates the time traveling functionality on array metadata. For more information on time traveling and array metadata, see the corresponding sections of the Academy.
First, import the necessary libraries, set the array URI (i.e., its path, which in this tutorial will be on local storage), and delete any previously created arrays with the same name.
# Import necessary libraries
import tiledb
import numpy as np
import shutil
import os.path

# Set array URI
array_uri = os.path.expanduser("~/time_traveling_array_meta")

# Delete array if it already exists
if os.path.exists(array_uri):
    shutil.rmtree(array_uri)
# Import necessary libraries
library(tiledb)

# Set array URI
array_uri <- path.expand("~/time_traveling_array_meta_r")

# Delete array if it already exists
if (file.exists(array_uri)) {
  unlink(array_uri, recursive = TRUE)
}
Next, create the array by specifying its schema. This example uses a dense array, but the described time traveling functionality is applicable to any array.
# Create the two dimensions
d1 = tiledb.Dim(name="d1", domain=(1, 4), tile=2, dtype=np.int32)
d2 = tiledb.Dim(name="d2", domain=(1, 4), tile=2, dtype=np.int32)

# Create a domain using the two dimensions
dom = tiledb.Domain(d1, d2)

# Create an attribute
a = tiledb.Attr(name="a", dtype=np.int32)

# Create the array schema, setting `sparse=False` to indicate a dense array
sch = tiledb.ArraySchema(domain=dom, sparse=False, attrs=[a])

# Create the array on disk (it will initially be empty)
tiledb.Array.create(array_uri, sch)
# Create the two dimensions
d1 <- tiledb_dim("d1", c(1L, 4L), 2L, "INT32")
d2 <- tiledb_dim("d2", c(1L, 4L), 2L, "INT32")

# Create a domain using the two dimensions
dom <- tiledb_domain(dims = c(d1, d2))

# Create an attribute
a <- tiledb_attr("a", type = "INT32")

# Create the array schema, setting `sparse = FALSE` to indicate a dense array
sch <- tiledb_array_schema(dom, a, sparse = FALSE)

# Create the array on disk (it will initially be empty)
arr <- tiledb_array_create(array_uri, sch)
Write some metadata to the array. Observe that the write is performed at a specific user-defined timestamp (namely, at 1), in order to make time traveling easier later.
# Open the array in write mode and write some metadata
with tiledb.open(array_uri, "w", timestamp=1) as A:
    A.meta["aaa"] = 1.1
    A.meta["bbb"] = 2
# Initialize a TileDB array object
arr <- tiledb_array(
  array_uri,
  return_as = "data.frame",
)
arr <- tiledb_array_close(arr)

# Open the array in write mode and write some metadata
arr <- tiledb_array_open_at(
  arr,
  type = "WRITE",
  timestamp = as.POSIXct(1)
)
invisible(tiledb_put_metadata(arr, "aaa", 1.1))
invisible(tiledb_put_metadata(arr, "bbb", 2L))
arr <- tiledb_array_close(arr)
# You can delete metadata as follows
with tiledb.open(array_uri, "w", timestamp=2) as A:
    del A.meta["bbb"]
# You can delete metadata as follows
arr <- tiledb_array_open_at(
  arr,
  type = "WRITE",
  timestamp = as.POSIXct(2)
)
invisible(tiledb_delete_metadata(arr, "bbb"))
arr <- tiledb_array_close(arr)
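The timestamps 1 and 2 used above are deliberately small so that they are easy to follow. In practice you may want to time travel to a real point in time. The following is a minimal sketch of how to derive an integer timestamp from a Python `datetime`, assuming that TileDB interprets integer timestamps as milliseconds since the Unix epoch. The helper name `to_millis` is hypothetical and not part of the TileDB API.

```python
from datetime import datetime, timedelta, timezone

# Reference point for the conversion (Unix epoch, in UTC)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole milliseconds since the epoch.

    Hypothetical helper: it assumes the target API expects millisecond
    timestamps, which you should confirm for your TileDB version.
    """
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Integer division of timedeltas avoids floating-point rounding
    return (moment - EPOCH) // timedelta(milliseconds=1)


# Example: midnight UTC on January 1, 2024
print(to_millis(datetime(2024, 1, 1, tzinfo=timezone.utc)))  # prints 1704067200000
```

Under that assumption, the resulting integer could be passed as the `timestamp` argument of `tiledb.open`, in the same way the tutorial passes 1 and 2.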
The array is a folder in the path specified in array_uri. The contents are explained in other sections of the Academy, but you can observe that the above write operations created two files inside the meta directory, one prefixed with _1_1 and one with _2_2, the timestamps at which you performed the writes.
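To inspect those files programmatically, a sketch along the following lines could list the timestamp ranges encoded in the file names. The helper name `list_timestamp_ranges` and the regular expression are illustrative, not part of the TileDB API. The sketch assumes that each file name starts with optional underscores followed by two timestamps separated by an underscore.

```python
import os
import re
from typing import List, Tuple

# Assumed naming scheme: optional leading underscores, then the start and end
# timestamps separated by an underscore, followed by anything else.
_TIMESTAMP_PATTERN = re.compile(r"^_*(\d+)_(\d+)")


def list_timestamp_ranges(directory: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, file name) for each timestamped entry, sorted by time."""
    ranges = []
    for name in os.listdir(directory):
        match = _TIMESTAMP_PATTERN.match(name)
        if match:  # skip entries that do not follow the assumed scheme
            ranges.append((int(match.group(1)), int(match.group(2)), name))
    return sorted(ranges)
```

For the array in this tutorial, you would call the function on the metadata directory inside `array_uri`. Check the exact folder name on disk before relying on it. After the write at timestamp 1 and the deletion at timestamp 2, you would expect one entry per operation.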
Next, read the array metadata at different timestamp combinations:
No timestamp provided: This means read at the current timestamp, which will include metadata from both the write and delete operations.
At timestamp 1: This will include metadata from only the write (i.e., read the array as of timestamp 1).
At timestamp 2: This will include metadata from both the write and delete operations (i.e., read the array as of timestamp 2).
At timestamp range [2,2]: This will include metadata written only within the timestamp range, i.e., metadata only from the second write. Since no metadata were written at timestamp 2 (only one deletion was performed), the read returns no metadata.
# Read at current timestamp
with tiledb.open(array_uri) as A:
    print("Read at current timestamp:")
    for key, value in A.meta.items():
        print(f"{key}: {value}")
    print("\n")

# Read at timestamp 1
with tiledb.open(array_uri, timestamp=1) as A:
    print("Read at timestamp 1:")
    for key, value in A.meta.items():
        print(f"{key}: {value}")
    print("\n")

# Read at timestamp 2
with tiledb.open(array_uri, timestamp=2) as A:
    print("Read at timestamp 2:")
    for key, value in A.meta.items():
        print(f"{key}: {value}")
    print("\n")

# Read at timestamp range [2,2]
# At timestamp 2, only a deletion happened,
# so the following should print an empty list
with tiledb.open(array_uri, timestamp=(2, 2)) as A:
    print("Read at timestamp range [2,2]:")
    for key, value in A.meta.items():
        print(f"{key}: {value}")
Read at current timestamp:
aaa: 1.1
Read at timestamp 1:
aaa: 1.1
bbb: 2
Read at timestamp 2:
aaa: 1.1
Read at timestamp range [2,2]:
# Read at current timestamp
arr <- tiledb_array_open(arr = arr, type = "READ")
cat("Read at current timestamp:\n")
for (name in names(tiledb_get_all_metadata(arr))) {
  cat(name, ": ", tiledb_get_all_metadata(arr)[[name]], "\n")
}
cat("\n")
arr <- tiledb_array_close(arr)

# Read at timestamp 1
arr <- tiledb_array_open_at(
  arr,
  type = "READ",
  timestamp = as.POSIXct(1)
)
cat("Read at timestamp 1:\n")
for (name in names(tiledb_get_all_metadata(arr))) {
  cat(name, ": ", tiledb_get_all_metadata(arr)[[name]], "\n")
}
cat("\n")
arr <- tiledb_array_close(arr)

# Read at timestamp 2
arr <- tiledb_array_open_at(
  arr,
  type = "READ",
  timestamp = as.POSIXct(2)
)
cat("Read at timestamp 2:\n")
for (name in names(tiledb_get_all_metadata(arr))) {
  cat(name, ": ", tiledb_get_all_metadata(arr)[[name]], "\n")
}
arr <- tiledb_array_close(arr)
Read at current timestamp:
aaa : 1.1
Read at timestamp 1:
aaa : 1.1
bbb : 2
Read at timestamp 2:
aaa : 1.1
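As a mental model for the results above, you can think of array metadata as a log of timestamped put and delete operations that is replayed over the requested time range. The following pure-Python sketch mimics that behavior without using TileDB. The log format and the `read_metadata` function are hypothetical illustrations and do not reflect how TileDB stores or reads metadata internally.

```python
from typing import Any, Dict, List, Optional, Tuple

# Each entry is (timestamp, operation, key, value). This log is a
# hypothetical stand-in for the metadata files, not TileDB's real format.
LogEntry = Tuple[int, str, str, Any]


def read_metadata(
    log: List[LogEntry], start: int = 0, end: Optional[int] = None
) -> Dict[str, Any]:
    """Replay, in time order, the operations whose timestamp lies in [start, end]."""
    result: Dict[str, Any] = {}
    for ts, op, key, value in sorted(log, key=lambda entry: entry[0]):
        if ts < start or (end is not None and ts > end):
            continue  # operation falls outside the requested range
        if op == "put":
            result[key] = value
        elif op == "delete":
            result.pop(key, None)  # deleting a missing key is a no-op
    return result


# The same operations as in this tutorial
log = [
    (1, "put", "aaa", 1.1),
    (1, "put", "bbb", 2),
    (2, "delete", "bbb", None),
]

print(read_metadata(log))                  # {'aaa': 1.1}
print(read_metadata(log, end=1))           # {'aaa': 1.1, 'bbb': 2}
print(read_metadata(log, end=2))           # {'aaa': 1.1}
print(read_metadata(log, start=2, end=2))  # {}
```

The last call shows why the range [2,2] returns nothing: only the deletion falls inside the range, and there is no earlier write within that range for it to act on.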