Write at a Timestamp

Writing data at a timestamp is particularly useful for versioning data and time traveling.

How to run this tutorial

You can run this tutorial in two ways:

Locally on your machine.
On TileDB Cloud.

Because TileDB Cloud has a free tier, we strongly recommend that you sign up and run everything there, as that requires no installation or deployment.

This tutorial shows how to control the timestamp of a write operation, which is particularly useful for time traveling. For more information on time traveling, see the related sections of the Academy.

First, import the necessary libraries, set the array URI (i.e., its path, which in this tutorial will be on local storage), and delete any previously created arrays with the same name.

Python

# Import necessary libraries
import tiledb
import numpy as np
import shutil
import os.path

# Set array URI
array_uri = os.path.expanduser("~/write_at_a_timestamp")

# Delete array if it already exists
if os.path.exists(array_uri):
    shutil.rmtree(array_uri)

R

# Import necessary libraries
library(tiledb)

# Set array URI
array_uri <- path.expand("~/write_at_a_timestamp_r")

# Delete array if it already exists
if (file.exists(array_uri)) {
  unlink(array_uri, recursive = TRUE)
}

Next, create the array by specifying its schema. This example focuses on a sparse array, but the described functionality is applicable to any array.

Python

# Create the two dimensions
d1 = tiledb.Dim(name="d1", domain=(0, 3), tile=2, dtype=np.int32)
d2 = tiledb.Dim(name="d2", domain=(0, 3), tile=2, dtype=np.int32)

# Create a domain using the two dimensions
dom = tiledb.Domain(d1, d2)

# Create an attribute
a = tiledb.Attr(name="a", dtype=np.int32)

# Create the array schema with `sparse=True`.
sch = tiledb.ArraySchema(domain=dom, sparse=True, attrs=[a])

# Create the array on disk (it will initially be empty)
tiledb.Array.create(array_uri, sch)

R

# Create the two dimensions
d1 <- tiledb_dim("d1", c(0L, 3L), 2L, "INT32")
d2 <- tiledb_dim("d2", c(0L, 3L), 2L, "INT32")

# Create a domain using the two dimensions
dom <- tiledb_domain(dims = c(d1, d2))

# Create an attribute
a <- tiledb_attr("a", type = "INT32")

# Create the array schema, setting `sparse = TRUE`
sch <- tiledb_array_schema(dom, a, sparse = TRUE)

# Create the array on disk (it will initially be empty)
arr <- tiledb_array_create(array_uri, sch)

Populate the TileDB array using a set of 1D input arrays, one for the coordinates of each dimension and one for the attribute values (TileDB sparse arrays expect the COO format). Note that you set the timestamp of the write when you open the array.

Python

# Prepare some data in numpy arrays
d1_data = np.array([2, 0, 3, 2, 0, 1], dtype=np.int32)
d2_data = np.array([0, 1, 1, 2, 3, 3], dtype=np.int32)
a_data = np.array([4, 1, 6, 5, 2, 3], dtype=np.int32)

# Open the array in write mode and write the data in COO format.
# NOTE: You can set the timestamp parameter
with tiledb.open(array_uri, "w", timestamp=1) as A:
    A[d1_data, d2_data] = a_data
    A.meta["a"] = 1.0

R

# Prepare some data in an array
d1_data <- c(2L, 0L, 3L, 2L, 0L, 1L)
d2_data <- c(0L, 1L, 1L, 2L, 3L, 3L)
a_data <- c(4L, 1L, 6L, 5L, 2L, 3L)

# Declare a timestamp
ts <- 1

# Create an array object with timestamp_start and timestamp_end both set to ts
arr <- tiledb_array(
  uri = array_uri,
  timestamp_start = as.POSIXct(ts),
  timestamp_end = as.POSIXct(ts),
  keep_open = TRUE
)
arr <- tiledb_array_close(arr)

# Open the array for writing and write data to the array
arr <- tiledb_array_open(
  arr,
  type = "WRITE"
)
arr[d1_data, d2_data] <- a_data
invisible(tiledb_put_metadata(arr, "a", 1.0))

# Close the array
arr <- tiledb_array_close(arr)
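
The write above uses the timestamp 1 to keep the example easy to follow. TileDB interprets timestamps as milliseconds since the Unix epoch, so in practice you will usually derive the value from a date and time. The following is a minimal sketch of that conversion using only the Python standard library. The helper names `to_ms_timestamp` and `from_ms_timestamp` are hypothetical and are not part of the TileDB API.

```python
from datetime import datetime, timedelta, timezone

# Reference point and unit for the conversion
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MS = timedelta(milliseconds=1)


def to_ms_timestamp(dt: datetime) -> int:
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch.

    Hypothetical helper for illustration; integer arithmetic on timedeltas
    avoids floating-point rounding.
    """
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return (dt - EPOCH) // ONE_MS


def from_ms_timestamp(ms: int) -> datetime:
    """Convert milliseconds since the Unix epoch back to a UTC datetime."""
    return EPOCH + ms * ONE_MS


# Example: a moment in time expressed as an integer timestamp
moment = datetime(2024, 5, 18, 21, 28, 52, 150000, tzinfo=timezone.utc)
ts = to_ms_timestamp(moment)
print(ts)
print(from_ms_timestamp(ts).isoformat())

# The integer could then be used in place of the literal 1 in the
# Python write shown earlier, e.g. tiledb.open(array_uri, "w", timestamp=ts)
```

Requiring a timezone-aware datetime is a deliberate choice in this sketch: a naive datetime would be interpreted in the local time zone of the machine, which makes the resulting timestamp depend on where the code runs.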

The array is a folder at the path specified in array_uri. Its contents are explained in other sections of the Academy. The effect of setting the timestamp in the previous code snippet is that the names of the written fragment in the __fragments directory and of the commit file in the __commits directory start with __1_1 (i.e., the start and end timestamps of the write, both equal to the timestamp you set). The same is true for the written array metadata: the name of the file inside the __meta directory also starts with __1_1. The schema file, in contrast, carries the wall-clock time at which the array was created, because no timestamp was set when creating the array.

/Users/stavrospapadopoulos/write_at_a_timestamp
├── __commits
│   └── __1_1_7b6c9ff91cb69f12c3be835e945d4f2a_21.wrt
├── __fragment_meta
├── __fragments
│   └── __1_1_7b6c9ff91cb69f12c3be835e945d4f2a_21
│       ├── __fragment_metadata.tdb
│       ├── a0.tdb
│       ├── d0.tdb
│       └── d1.tdb
├── __labels
├── __meta
│   └── __1_1_3b8e0ad95db063904a9f0b05c8197eff
└── __schema
    ├── __1716067732150_1716067732150_71308c3c1ac65314cbe8bfe6d4285436
    └── __enumerations

9 directories, 7 files
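
To inspect these names programmatically, the timestamps can be extracted from the file and folder names with a regular expression. The following is a minimal sketch that assumes the naming layout visible in the listing above, that is, two underscores, the start timestamp, the end timestamp, a hexadecimal identifier, and optionally a trailing number (assumed here to be a format version) and a file extension. The names `TimestampedName` and `parse_timestamped_name` are hypothetical and are not part of the TileDB API.

```python
import re
from typing import NamedTuple, Optional


class TimestampedName(NamedTuple):
    start: int              # start timestamp of the write
    end: int                # end timestamp of the write
    uuid: str               # hexadecimal identifier
    suffix: Optional[int]   # trailing number, if present (assumed: format version)


# Assumed layout: __<start>_<end>_<hex id>[_<number>][.<extension>]
_NAME = re.compile(r"^__(\d+)_(\d+)_([0-9a-f]+)(?:_(\d+))?(?:\.\w+)?$")


def parse_timestamped_name(name: str) -> Optional[TimestampedName]:
    """Return the parts of a timestamped name, or None if it does not match."""
    match = _NAME.match(name)
    if match is None:
        return None
    start, end, uuid, suffix = match.groups()
    return TimestampedName(
        int(start),
        int(end),
        uuid,
        int(suffix) if suffix is not None else None,
    )


# Names copied from the directory listing above
names = [
    "__1_1_7b6c9ff91cb69f12c3be835e945d4f2a_21.wrt",
    "__1_1_7b6c9ff91cb69f12c3be835e945d4f2a_21",
    "__1_1_3b8e0ad95db063904a9f0b05c8197eff",
    "__1716067732150_1716067732150_71308c3c1ac65314cbe8bfe6d4285436",
    "__enumerations",
]
for name in names:
    print(name, "->", parse_timestamped_name(name))
```

Names that do not follow the assumed layout, such as __enumerations, yield None rather than raising an error, so the function can be applied to every entry of a directory listing.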

Finally, clean up by deleting the array.

Python

# Delete the array
if os.path.exists(array_uri):
    shutil.rmtree(array_uri)

R

# Delete the array
if (file.exists(array_uri)) {
  unlink(array_uri, recursive = TRUE)
}