Delete Array Data

You can delete data written to a TileDB array while preserving full time-travel functionality.

How to run this tutorial

You can run this tutorial in two ways:

Locally on your machine.
On TileDB Cloud.

Because TileDB Cloud has a free tier, we strongly recommend signing up and running everything there, which requires no installation or deployment.

This tutorial shows how to perform deletions in sparse arrays. For more details on deletions, visit the Key Concepts: Deletions section.

Warning

Deletions are applicable only to sparse arrays.

First, import the necessary libraries, set the array URI (that is, its path, which in this tutorial will be on local storage), and delete any previously created arrays with the same name.

Python
R

# Import necessary libraries
import os.path
import shutil

import numpy as np
import tiledb

# Set array URI
array_uri = os.path.expanduser("~/deletions")

# Delete array if it already exists
if os.path.exists(array_uri):
    shutil.rmtree(array_uri)

# Import necessary libraries
library(tiledb)

# Set array URI
array_uri <- path.expand("~/deletions_r")

# Delete array if it already exists
if (file.exists(array_uri)) {
  unlink(array_uri, recursive = TRUE)
}

Next, create the array by specifying its schema.

Python
R

# Create the two dimensions
d1 = tiledb.Dim(name="d1", domain=(0, 3), tile=2, dtype=np.int32)
d2 = tiledb.Dim(name="d2", domain=(0, 3), tile=2, dtype=np.int32)

# Create a domain using the two dimensions
dom = tiledb.Domain(d1, d2)

# Create an attribute
a = tiledb.Attr(name="a", dtype=np.int32)

# Create the array schema with `sparse=True`.
sch = tiledb.ArraySchema(domain=dom, sparse=True, attrs=[a])

# Create the array on disk (it will initially be empty)
tiledb.Array.create(array_uri, sch)

# Create the two dimensions
d1 <- tiledb_dim("d1", c(0L, 3L), 2L, "INT32")
d2 <- tiledb_dim("d2", c(0L, 3L), 2L, "INT32")

# Create a domain using the two dimensions
dom <- tiledb_domain(dims = c(d1, d2))

# Create an attribute
a <- tiledb_attr("a", type = "INT32")

# Create the array schema with `sparse = TRUE`
sch <- tiledb_array_schema(dom, a, sparse = TRUE)

# Create the array on disk (it will initially be empty)
arr <- tiledb_array_create(array_uri, sch)

Populate the array with a set of 1-dimensional arrays, one for the coordinates of each dimension, and one for the attribute values. TileDB sparse arrays expect the coordinate (COO) format.

Python
R

# Prepare some data in numpy arrays
d1_data = np.array([2, 0, 3, 2, 0, 1], dtype=np.int32)
d2_data = np.array([0, 1, 1, 2, 3, 3], dtype=np.int32)
a_data = np.array([4, 1, 6, 5, 2, 3], dtype=np.int32)

# Open the array in write mode and write the data in COO format
with tiledb.open(array_uri, "w") as A:
    A[d1_data, d2_data] = a_data

# Prepare some data in an array
d1_data <- c(2L, 0L, 3L, 2L, 0L, 1L)
d2_data <- c(0L, 1L, 1L, 2L, 3L, 3L)
a_data <- c(4L, 1L, 6L, 5L, 2L, 3L)

# Open the array for writing and write data to the array
arr <- tiledb_array(
  uri = array_uri,
  query_type = "WRITE"
)
arr[d1_data, d2_data] <- a_data

# Close the array
invisible(tiledb_array_close(arr))
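
To make the COO layout concrete, the following plain-Python sketch pairs up the three 1-dimensional arrays from the write above, one cell per position. It makes no TileDB calls, and the helper name `coo_to_cells` is hypothetical, introduced only for illustration.

```python
# Illustrative only: plain Python, not part of the TileDB API.
# Same values as the arrays written to the TileDB array above.
d1_data = [2, 0, 3, 2, 0, 1]
d2_data = [0, 1, 1, 2, 3, 3]
a_data = [4, 1, 6, 5, 2, 3]


def coo_to_cells(d1, d2, values):
    """Map each (d1, d2) coordinate pair to its attribute value."""
    if not (len(d1) == len(d2) == len(values)):
        raise ValueError("COO arrays must all have the same length")
    return {(i, j): v for i, j, v in zip(d1, d2, values)}


cells = coo_to_cells(d1_data, d2_data, a_data)

# Print the cells sorted by coordinates
for coords, value in sorted(cells.items()):
    print(coords, "->", value)
```

Position `k` in each of the three arrays describes the same cell, which is why all three must have equal length.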

The array is a folder in the path specified in array_uri. You can learn about the different contents of the array folder in other sections of the Academy.

For now, observe the single file in the __commits directory, which corresponds to the write performed in the preceding code snippet.

/Users/stavrospapadopoulos/deletions
├── __commits
│   └── __1715724937904_1715724937904_7ae6e0c2b543277a9987d2244c0599fb_21.wrt
├── __fragment_meta
├── __fragments
│   └── __1715724937904_1715724937904_7ae6e0c2b543277a9987d2244c0599fb_21
│       ├── __fragment_metadata.tdb
│       ├── a0.tdb
│       ├── d0.tdb
│       └── d1.tdb
├── __labels
├── __meta
└── __schema
    ├── __1715724937894_1715724937894_00000002c0ffac5ba3823f340663bc5f
    └── __enumerations

9 directories, 6 files
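
The commit file name in the listing above appears to encode when the write happened. The sketch below splits the name into parts under an assumed layout of `__<start_ms>_<end_ms>_<uuid>_<format_version>.<kind>`, with the timestamps read as milliseconds since the Unix epoch. Both the layout and the helper name `parse_commit_name` are assumptions made for illustration, not a documented API.

```python
from datetime import datetime, timedelta, timezone

# Commit file name copied from the directory listing above.
COMMIT_NAME = "__1715724937904_1715724937904_7ae6e0c2b543277a9987d2244c0599fb_21.wrt"

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_commit_name(filename):
    """Split a commit file name into its (assumed) components."""
    stem, _, kind = filename.rpartition(".")
    # Assumed layout: __<start_ms>_<end_ms>_<uuid>_<format_version>
    start_ms, end_ms, uuid, version = stem.lstrip("_").split("_")
    return {
        "start_ms": int(start_ms),
        "end_ms": int(end_ms),
        # Assumes the timestamps are milliseconds since the Unix epoch
        "start_utc": EPOCH + timedelta(milliseconds=int(start_ms)),
        "uuid": uuid,
        "format_version": int(version),
        "kind": kind,  # "wrt" for the write shown above
    }


info = parse_commit_name(COMMIT_NAME)
for key, value in info.items():
    print(f"{key}: {value}")
```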

Read the entire array.

Python
R

# Open the array in read mode
A = tiledb.open(array_uri, "r")

# Show the entire array
print("Entire array: ")
print(A[:])

# Remember to close the array
A.close()

Entire array: 
OrderedDict({'a': array([1, 2, 3, 4, 6, 5], dtype=int32), 'd1': array([0, 0, 1, 2, 3, 2], dtype=int32), 'd2': array([1, 3, 3, 0, 1, 2], dtype=int32)})

# Open the array in read mode
arr <- tiledb_array_open(arr, type = "READ")

# Show the entire array
cat("Entire array:\n")
print(arr[])

# Close the array
arr <- tiledb_array_close(arr)

Entire array:
  d1 d2 a
1  0  1 1
2  2  0 4
3  3  1 6
4  0  3 2
5  1  3 3
6  2  2 5

Delete all cells where the values of attribute a are greater than 4.

Python
R

# Delete all cells with a value greater than 4
qc = "a > 4"

# Issue delete query
with tiledb.open(array_uri, "d") as A:
    A.query(cond=qc).submit()

# Define a query condition to use for deletions
qc <- parse_query_condition(a > 4)

# Create a delete query, attach the condition, and submit it
qry <- tiledb_query(arr, "DELETE")
tiledb_query_set_condition(qry, qc)
tiledb_query_submit(qry)
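
As a sanity check on which cells the condition `a > 4` selects, the following plain-Python sketch applies the same predicate to the six cells written earlier. It is only a model of the expected outcome and makes no TileDB calls; the names `written` and `condition` are hypothetical.

```python
# Illustrative only: plain Python, not a TileDB call.
# The six cells written earlier, keyed by (d1, d2) with attribute `a` as value.
written = {
    (2, 0): 4,
    (0, 1): 1,
    (3, 1): 6,
    (2, 2): 5,
    (0, 3): 2,
    (1, 3): 3,
}


def condition(a):
    """The delete condition used in this tutorial: a > 4."""
    return a > 4


# Cells matching the condition are deleted; all others remain
deleted = {c: a for c, a in written.items() if condition(a)}
remaining = {c: a for c, a in written.items() if not condition(a)}

print("Deleted:  ", sorted(deleted.items()))
print("Remaining:", sorted(remaining.items()))
```

Under this model, the cells at (2, 2) and (3, 1) are removed and the four cells with values 1 through 4 remain.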

Reading the array again, you get the following.

Python
R

# Open the array in read mode
A = tiledb.open(array_uri, "r")

# Show the entire array
print("Entire array: ")
print(A[:])

# Remember to close the array
A.close()

Entire array: 
OrderedDict({'a': array([1, 2, 3, 4], dtype=int32), 'd1': array([0, 0, 1, 2], dtype=int32), 'd2': array([1, 3, 3, 0], dtype=int32)})

# Show the entire array
cat("Entire array:\n")
print(arr[])

# Close the array
arr <- tiledb_array_close(arr)

Entire array:
  d1 d2 a
1  0  1 1
2  2  0 4
3  0  3 2
4  1  3 3

The array folder now contains a second file in the __commits directory, with a .del suffix, which corresponds to the delete query.

/Users/stavrospapadopoulos/deletions
├── __commits
│   ├── __1715724937904_1715724937904_7ae6e0c2b543277a9987d2244c0599fb_21.wrt
│   └── __1715724938069_1715724938069_75f3b3cbd841672c1f5b05f29abf7458_21.del
├── __fragment_meta
├── __fragments
│   └── __1715724937904_1715724937904_7ae6e0c2b543277a9987d2244c0599fb_21
│       ├── __fragment_metadata.tdb
│       ├── a0.tdb
│       ├── d0.tdb
│       └── d1.tdb
├── __labels
├── __meta
└── __schema
    ├── __1715724937894_1715724937894_00000002c0ffac5ba3823f340663bc5f
    └── __enumerations

9 directories, 7 files
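
The two commit file names suggest why time travel keeps working after a deletion: the write carries the timestamp 1715724937904 and the delete carries the later timestamp 1715724938069, and the fragment holding the data is left in place. The toy model below illustrates that idea in plain Python. It is a sketch under the assumption that a read "as of" a timestamp ignores commits stamped later; it is not how TileDB implements reads, and `visible_at` is a hypothetical helper.

```python
# Illustrative toy model only: plain Python, no TileDB calls.
WRITE_TS = 1715724937904   # taken from the .wrt commit file name
DELETE_TS = 1715724938069  # taken from the .del commit file name

# The six cells written in this tutorial, keyed by (d1, d2)
written = {
    (2, 0): 4,
    (0, 1): 1,
    (3, 1): 6,
    (2, 2): 5,
    (0, 3): 2,
    (1, 3): 3,
}


def visible_at(timestamp):
    """Return the cells a reader would see as of `timestamp` in this model."""
    if timestamp < WRITE_TS:
        return {}  # nothing had been written yet
    cells = dict(written)
    if timestamp >= DELETE_TS:
        # The delete condition (a > 4) applies only from DELETE_TS onward
        cells = {c: a for c, a in cells.items() if not a > 4}
    return cells


print("Cells visible at the write timestamp: ", len(visible_at(WRITE_TS)))
print("Cells visible at the delete timestamp:", len(visible_at(DELETE_TS)))
```

In TileDB-Py, `tiledb.open` accepts a `timestamp` argument for opening an array as of a given time; consult the documentation for your version for its exact semantics before relying on it.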

Finally, clean up by deleting the array.

Python
R

# Delete the array
if os.path.exists(array_uri):
    shutil.rmtree(array_uri)

# Delete the array
if (file.exists(array_uri)) {
  unlink(array_uri, recursive = TRUE)
}