You can delete data written to a TileDB array while preserving the full time-traveling functionality.
How to run this tutorial
You can run this tutorial in two ways:
Locally on your machine.
On TileDB Cloud.
Because TileDB Cloud has a free tier, we strongly recommend that you sign up and run everything there, since that requires no installation or deployment.
This tutorial shows how to perform deletions in sparse arrays. For more details on deletions, visit the Key Concepts: Deletions section.
Warning
Deletions are applicable only to sparse arrays.
First, import the necessary libraries, set the array URI (that is, its path, which in this tutorial will be on local storage), and delete any previously created arrays with the same name.
Next, create a 4x4 sparse array with two dimensions, d1 and d2, and a single integer attribute, a.

Python

    # Create the two dimensions
    d1 = tiledb.Dim(name="d1", domain=(0, 3), tile=2, dtype=np.int32)
    d2 = tiledb.Dim(name="d2", domain=(0, 3), tile=2, dtype=np.int32)

    # Create a domain using the two dimensions
    dom = tiledb.Domain(d1, d2)

    # Create an attribute
    a = tiledb.Attr(name="a", dtype=np.int32)

    # Create the array schema with `sparse=True`
    sch = tiledb.ArraySchema(domain=dom, sparse=True, attrs=[a])

    # Create the array on disk (it will initially be empty)
    tiledb.Array.create(array_uri, sch)

R

    # Create the two dimensions
    d1 <- tiledb_dim("d1", c(0L, 3L), 2L, "INT32")
    d2 <- tiledb_dim("d2", c(0L, 3L), 2L, "INT32")

    # Create a domain using the two dimensions
    dom <- tiledb_domain(dims = c(d1, d2))

    # Create an attribute
    a <- tiledb_attr("a", type = "INT32")

    # Create the array schema with `sparse = TRUE`
    sch <- tiledb_array_schema(dom, a, sparse = TRUE)

    # Create the array on disk (it will initially be empty)
    arr <- tiledb_array_create(array_uri, sch)
Populate the array with a set of 1-dimensional arrays, one for the coordinates of each dimension, and one for the attribute values. TileDB sparse arrays expect the coordinate (COO) format.
Python

    # Prepare some data in numpy arrays
    d1_data = np.array([2, 0, 3, 2, 0, 1], dtype=np.int32)
    d2_data = np.array([0, 1, 1, 2, 3, 3], dtype=np.int32)
    a_data = np.array([4, 1, 6, 5, 2, 3], dtype=np.int32)

    # Open the array in write mode and write the data in COO format
    with tiledb.open(array_uri, "w") as A:
        A[d1_data, d2_data] = a_data

R

    # Prepare some data in an array
    d1_data <- c(2L, 0L, 3L, 2L, 0L, 1L)
    d2_data <- c(0L, 1L, 1L, 2L, 3L, 3L)
    a_data <- c(4L, 1L, 6L, 5L, 2L, 3L)

    # Open the array for writing and write data to the array
    arr <- tiledb_array(uri = array_uri, query_type = "WRITE")
    arr[d1_data, d2_data] <- a_data

    # Close the array
    invisible(tiledb_array_close(arr))
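To make the COO layout concrete, the plain-Python sketch below places the same six cells on a 4x4 grid, with d1 as the row and d2 as the column. It does not use TileDB; the helper `coo_to_grid` is a hypothetical name introduced only for this illustration.

```python
# Coordinates and values copied from the write above
d1_data = [2, 0, 3, 2, 0, 1]
d2_data = [0, 1, 1, 2, 3, 3]
a_data = [4, 1, 6, 5, 2, 3]


def coo_to_grid(rows, cols, values, shape=(4, 4)):
    """Place COO triplets on a grid; cells without a value stay None."""
    grid = [[None] * shape[1] for _ in range(shape[0])]
    for r, c, v in zip(rows, cols, values):
        grid[r][c] = v
    return grid


grid = coo_to_grid(d1_data, d2_data, a_data)

# Print one row per d1 value, using "." for empty cells
for row in grid:
    print(" ".join("." if v is None else str(v) for v in row))
# . 1 . 2
# . . . 3
# 4 . 5 .
# . 6 . .
```

Only six of the sixteen cells hold a value, which is why the write passes three short 1-dimensional arrays rather than a full grid.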
The array is a folder in the path specified in array_uri. You can learn about the different contents of the array folder in other sections of the Academy.
For now, observe the single file in the commits directory, which accounts for the write performed in the preceding code snippet.
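One way to inspect the commits directory from Python is sketched below. It assumes that the commits directory is the `__commits` subfolder of the array folder and that `array_uri` is a local path; the path and the helper `list_commit_files` are hypothetical names used only for illustration.

```python
import os

# Hypothetical local path; use the same array URI as in the rest of the tutorial
array_uri = os.path.expanduser("~/deletions_sparse_py")


def list_commit_files(uri):
    """Return the sorted file names in the array's commits directory."""
    commits_dir = os.path.join(uri, "__commits")
    if not os.path.isdir(commits_dir):
        return []
    return sorted(os.listdir(commits_dir))


# After the single write above, this list should contain one entry
print(list_commit_files(array_uri))
```

Running the same call again after the deletion later in this tutorial is a simple way to see whether the deletion adds to this directory.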
Read the array to inspect its contents before deleting anything.

Python

    # Open the array in read mode
    A = tiledb.open(array_uri, "r")

    # Show the entire array
    print("Entire array: ")
    print(A[:])

    # Remember to close the array
    A.close()

R

    # Open the array in read mode
    arr <- tiledb_array_open(arr, type = "READ")

    # Show the entire array
    cat("Entire array:\n")
    print(arr[])

    # Close the array
    arr <- tiledb_array_close(arr)
Delete all cells whose attribute a is greater than 4 by submitting a delete query with a query condition.

Python

    # Delete all cells with a value greater than 4
    qc = "a > 4"

    # Issue delete query
    with tiledb.open(array_uri, "d") as A:
        A.query(cond=qc).submit()

R

    # Define a query condition to use for deletions
    qc <- parse_query_condition(a > 4)
    qry <- tiledb_query(arr, "DELETE")
    tiledb_query_set_condition(qry, qc)
    tiledb_query_submit(qry)
Read the array again to confirm that the cells with a greater than 4 are gone.

Python

    # Open the array in read mode
    A = tiledb.open(array_uri, "r")

    # Show the entire array
    print("Entire array: ")
    print(A[:])

    # Remember to close the array
    A.close()
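To check the final read against expectations, the plain-Python sketch below applies the same condition, `a > 4`, to the cells written earlier. It only simulates the condition on in-memory lists and does not call TileDB. The read above may return the surviving cells in a different order.

```python
# Coordinates and values copied from the write earlier in the tutorial
d1_data = [2, 0, 3, 2, 0, 1]
d2_data = [0, 1, 1, 2, 3, 3]
a_data = [4, 1, 6, 5, 2, 3]

# Each cell as a (d1, d2, a) triplet
cells = list(zip(d1_data, d2_data, a_data))

# Cells matched by the delete condition `a > 4`
deleted = [cell for cell in cells if cell[2] > 4]

# Cells that should still be visible after the deletion
remaining = [cell for cell in cells if cell[2] <= 4]

print(deleted)    # [(3, 1, 6), (2, 2, 5)]
print(remaining)  # [(2, 0, 4), (0, 1, 1), (0, 3, 2), (1, 3, 3)]
```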