Get Fragment Info

You can fetch information about array fragments.

How to run this tutorial

You can run this tutorial in two ways:

Locally on your machine.
On TileDB Cloud.

Because TileDB Cloud has a free tier, we strongly recommend that you sign up and run everything there, since that requires no installation or deployment.

This tutorial shows how to retrieve information about all written fragments in an array, which is particularly useful for time traveling. For more information on time traveling, see the time traveling sections of the Academy.

First, import the necessary libraries, set the array URI (i.e., its path, which in this tutorial will be on local storage), and delete any previously created arrays with the same name.

Python
R

# Import necessary libraries
import os.path
import shutil

import numpy as np
import tiledb

# Set array URI
array_uri = os.path.expanduser("~/get_fragment_info_reads")

# Delete array if it already exists
if os.path.exists(array_uri):
    shutil.rmtree(array_uri)

# Import necessary libraries
library(tiledb)

# Set array URI
array_uri <- path.expand("~/get_fragment_info_reads_r")

# Delete array if it already exists
if (file.exists(array_uri)) {
  unlink(array_uri, recursive = TRUE)
}

Next, create the array by specifying its schema. This example focuses on a sparse array, but the described functionality is applicable to any array.

Python
R

# Create the two dimensions
d1 = tiledb.Dim(name="d1", domain=(0, 3), tile=2, dtype=np.int32)
d2 = tiledb.Dim(name="d2", domain=(0, 3), tile=2, dtype=np.int32)

# Create a domain using the two dimensions
dom = tiledb.Domain(d1, d2)

# Create an attribute
a = tiledb.Attr(name="a", dtype=np.int32)

# Create the array schema with `sparse=True`.
sch = tiledb.ArraySchema(domain=dom, sparse=True, attrs=[a])

# Create the array on disk (it will initially be empty)
tiledb.Array.create(array_uri, sch)

# Create the two dimensions
d1 <- tiledb_dim("d1", c(0L, 3L), 2L, "INT32")
d2 <- tiledb_dim("d2", c(0L, 3L), 2L, "INT32")

# Create a domain using the two dimensions
dom <- tiledb_domain(dims = c(d1, d2))

# Create an attribute
a <- tiledb_attr("a", type = "INT32")

# Create the array schema with `sparse = TRUE`
sch <- tiledb_array_schema(dom, a, sparse = TRUE)

# Create the array on disk (it will initially be empty)
invisible(tiledb_array_create(array_uri, sch))

Populate the array using a set of 1D arrays, one for the coordinates of each dimension, and one for the attribute values (TileDB sparse arrays expect the COO format).

Python
R

# Prepare some data in numpy arrays for the first write
d1_data = np.array([2, 0, 3, 2, 0, 1], dtype=np.int32)
d2_data = np.array([0, 1, 1, 2, 3, 3], dtype=np.int32)
a_data = np.array([4, 1, 6, 5, 2, 3], dtype=np.int32)

# Open the array in write mode and write the data in COO format.
with tiledb.open(array_uri, "w") as A:
    A[d1_data, d2_data] = a_data

# Prepare some data in integer vectors for the first write
d1_data <- c(2L, 0L, 3L, 2L, 0L, 1L)
d2_data <- c(0L, 1L, 1L, 2L, 3L, 3L)
a_data <- c(4L, 1L, 6L, 5L, 2L, 3L)

# Open the array for writing and write data to the array
arr <- tiledb_array(
  uri = array_uri,
  query_type = "WRITE"
)
arr[d1_data, d2_data] <- a_data

# Close the array
invisible(tiledb_array_close(arr))
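
To make the COO layout concrete, the following sketch uses only NumPy (no TileDB calls) to place the coordinate and value arrays from the first write onto a 4x4 grid that matches the array domain. The name `grid` and the use of 0 as a placeholder for empty cells are illustrative choices, not part of the array created in this tutorial.

```python
import numpy as np

# Coordinates and values from the first write (COO format):
# the i-th cell lives at (d1_data[i], d2_data[i]) and holds a_data[i].
d1_data = np.array([2, 0, 3, 2, 0, 1], dtype=np.int32)
d2_data = np.array([0, 1, 1, 2, 3, 3], dtype=np.int32)
a_data = np.array([4, 1, 6, 5, 2, 3], dtype=np.int32)

# Dense 4x4 view of the domain; 0 marks a cell that was never written.
grid = np.zeros((4, 4), dtype=np.int32)
grid[d1_data, d2_data] = a_data

print(grid)
```

Rows correspond to d1 and columns to d2, so the value 4 ends up at row 2, column 0. A real sparse array stores only the six written cells and has no placeholder for the empty ones.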

In a similar manner, perform a second write.

Python
R

# Prepare some data in numpy arrays for the second write
d1_data = np.array([0], dtype=np.int32)
d2_data = np.array([3], dtype=np.int32)
a_data = np.array([10], dtype=np.int32)

# Open the array in write mode and write the data in COO format.
# NOTE: You can get the fragment info after the write
with tiledb.open(array_uri, "w") as A:
    A[d1_data, d2_data] = a_data
    print(A.last_write_info)

# Prepare some data in integer vectors for the second write
d1_data <- 0L
d2_data <- 3L
a_data <- 10L

# Open the array for writing and write data to the array
arr <- tiledb_array(
  uri = array_uri,
  query_type = "WRITE"
)
arr[d1_data, d2_data] <- a_data

# Close the array
invisible(tiledb_array_close(arr))

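The array is a folder at the path specified in array_uri. The contents are explained in other sections of the Academy, but notice that the two write operations created two fragments in the __fragments directory, each with a matching commit file in __commits. The listing below comes from the Python example; the path, timestamps, and identifiers will differ on your machine.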

/Users/stavrospapadopoulos/get_fragment_info_reads
├── __commits
│   ├── __1715786277968_1715786277968_5a9ab248fade41fc2805c63923547045_21.wrt
│   └── __1715786277986_1715786277986_1334ed9685ed09a9fe07fb4faa2df7d8_21.wrt
├── __fragment_meta
├── __fragments
│   ├── __1715786277968_1715786277968_5a9ab248fade41fc2805c63923547045_21
│   │   ├── __fragment_metadata.tdb
│   │   ├── a0.tdb
│   │   ├── d0.tdb
│   │   └── d1.tdb
│   └── __1715786277986_1715786277986_1334ed9685ed09a9fe07fb4faa2df7d8_21
│       ├── __fragment_metadata.tdb
│       ├── a0.tdb
│       ├── d0.tdb
│       └── d1.tdb
├── __labels
├── __meta
└── __schema
    ├── __1715786277964_1715786277964_0000000269495c0a165e82e110308d18
    └── __enumerations

10 directories, 11 files
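
The fragment directory names in the listing above appear to follow the pattern `__<start timestamp>_<end timestamp>_<UUID>_<format version>`, with timestamps in milliseconds since the Unix epoch. The sketch below relies on that naming assumption and uses a hypothetical helper, `parse_fragment_name`, to split a name into its parts with only the Python standard library. The on-disk naming is an internal detail that may change between format versions, so prefer the fragment info API described in the rest of this tutorial for real work.

```python
from datetime import datetime, timedelta, timezone


def parse_fragment_name(name):
    """Split a fragment directory name into its parts (hypothetical helper).

    Assumes the layout __<t_start>_<t_end>_<uuid>_<version> seen in the listing.
    """
    t_start, t_end, uuid, version = name.lstrip("_").split("_")
    return {
        "timestamp_range": (int(t_start), int(t_end)),
        "uuid": uuid,
        "format_version": int(version),
    }


info = parse_fragment_name(
    "__1715786277968_1715786277968_5a9ab248fade41fc2805c63923547045_21"
)
print(info["timestamp_range"])
print(info["format_version"])

# Convert the start timestamp (milliseconds since the Unix epoch) to UTC
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
start_dt = epoch + timedelta(milliseconds=info["timestamp_range"][0])
print(start_dt.isoformat())
```

Both timestamps in a name are equal here because each fragment came from a single write.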

You can retrieve information about the fragments as follows.

Python
R

# Get fragment info
fragments_info = tiledb.array_fragments(array_uri)

# Number of fragments
print(len(fragments_info))

# URI of a given fragment (index from 0 to len(fragments_info) - 1)
print(fragments_info.uri[0])
print(fragments_info.uri[1])

# Timestamp range of a given fragment, as (start, end) in milliseconds since the Unix epoch
print(fragments_info.timestamp_range[0])
print(fragments_info.timestamp_range[1])

2
file:///Users/stavrospapadopoulos/get_fragment_info_reads/__fragments/__1715786277968_1715786277968_5a9ab248fade41fc2805c63923547045_21
file:///Users/stavrospapadopoulos/get_fragment_info_reads/__fragments/__1715786277986_1715786277986_1334ed9685ed09a9fe07fb4faa2df7d8_21
(1715786277968, 1715786277968)
(1715786277986, 1715786277986)

# Get fragment info
fragments_info <- tiledb_fragment_info(array_uri)

# Number of fragments
cat(tiledb_fragment_info_get_num(fragments_info), "\n")

# URI of a given fragment (0-based index, up to the number of fragments minus 1)
cat(tiledb_fragment_info_uri(fragments_info, 0), "\n")
cat(tiledb_fragment_info_uri(fragments_info, 1), "\n")

# Timestamp range of a given fragment (same 0-based index), converted to date-times
cat(
  format(
    as.POSIXct(tiledb_fragment_info_get_timestamp_range(fragments_info, 0))
  ),
  "\n"
)
cat(
  format(
    as.POSIXct(tiledb_fragment_info_get_timestamp_range(fragments_info, 1))
  ),
  "\n"
)

2 
file:///Users/nickv/get_fragment_info_reads_r/__fragments/__1721315103605_1721315103605_4a5f706517601fd12674a8de1a7108aa_21 
file:///Users/nickv/get_fragment_info_reads_r/__fragments/__1721315103619_1721315103619_7b6941474f5ee1b0d07e2a1fdbca2dc8_21 
2024-07-18 11:05:03 2024-07-18 11:05:03 
2024-07-18 11:05:03 2024-07-18 11:05:03

Finally, clean up by deleting the array.

Python
R

# Delete the array
if os.path.exists(array_uri):
    shutil.rmtree(array_uri)

# Delete the array
if (file.exists(array_uri)) {
  unlink(array_uri, recursive = TRUE)
}