Modify Array Metadata

Array metadata lets you attach arbitrary key-value pairs to an array. Learn how to work with array metadata in this tutorial.

How to run this tutorial

You can run this tutorial in two ways:

Locally on your machine.
On TileDB Cloud.

Because TileDB Cloud has a free tier, we strongly recommend that you sign up and run everything there, since that requires no installation or deployment.

This tutorial covers the basics of writing arbitrary metadata to an array. The example uses a dense array, but everything applies equally to sparse arrays.

First, import the necessary libraries, set the array URI (its path, which in this tutorial is on local storage), and delete any previously created array at that path.

Python first, then the R equivalent:

# Import necessary libraries
import os.path
import shutil

import numpy as np
import tiledb

# Set array URI
array_uri = os.path.expanduser("~/array_meta")

# Delete array if it already exists
if os.path.exists(array_uri):
    shutil.rmtree(array_uri)

# Import necessary libraries
library(tiledb)

# Set array URI
array_uri <- path.expand("~/array_meta_r")

# Delete array if it already exists
if (file.exists(array_uri)) {
  unlink(array_uri, recursive = TRUE)
}

Next, create an array (this one is dense, but it could have been a sparse array instead).

Python first, then the R equivalent:

# Create the two dimensions
d1 = tiledb.Dim(name="d1", domain=(1, 4), tile=2, dtype=np.int32)
d2 = tiledb.Dim(name="d2", domain=(1, 4), tile=2, dtype=np.int32)

# Create a domain using the two dimensions
dom = tiledb.Domain(d1, d2)

# Create an attribute
a = tiledb.Attr(name="a", dtype=np.int32)

# Create the array schema, setting `sparse=False` to indicate a dense array
sch = tiledb.ArraySchema(domain=dom, sparse=False, attrs=[a])

# Create the array on disk (it will initially be empty)
tiledb.Array.create(array_uri, sch)

# Create the two dimensions
d1 <- tiledb_dim("d1", c(1L, 4L), 2L, "INT32")
d2 <- tiledb_dim("d2", c(1L, 4L), 2L, "INT32")

# Create a domain using the two dimensions
dom <- tiledb_domain(dims = c(d1, d2))

# Create an attribute
a <- tiledb_attr("a", type = "INT32")

# Create the array schema, setting `sparse = FALSE` to indicate a dense array
sch <- tiledb_array_schema(dom, a, sparse = FALSE)

# Create the array on disk (it will initially be empty)
arr <- tiledb_array_create(array_uri, sch)

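You can write arbitrary metadata to an array by opening it in write mode and assigning values to keys. A value can be a number, a string, or a sequence of values of the same type (a tuple or list in Python, a vector in R).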

Python first, then the R equivalent:

# Open the array in write mode and write some metadata
with tiledb.open(array_uri, "w") as A:
    A.meta["aaa"] = 1.1
    A.meta["bbb"] = 2
    A.meta["ccc"] = "hello!"
    # Multiple values of the same type
    # may be written as a tuple:
    A.meta["tuple_int"] = (1, 2, 3, 4)
    # or as a list:
    A.meta["list_float"] = [1.0, 2.1, 3.2, 4.3]

array <- tiledb_array(
  array_uri,
  query_type = "WRITE",
  keep_open = TRUE,
  return_as = "data.frame"
)

tiledb_put_metadata(array, "aaa", 1.1)
tiledb_put_metadata(array, "bbb", 2L)
tiledb_put_metadata(array, "ccc", "hello!")
# Multiple values of the same type
# may be written as a vector:
tiledb_put_metadata(array, "vector_float", c(1.0, 2.1, 3.2, 4.3))
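
The values written above are scalars, strings, or flat sequences of a single type. If you need to attach something nested, such as a dictionary of processing parameters, one option is to serialize it to a JSON string first. The following is a minimal sketch under that assumption, not part of the TileDB API: the helper names `put_json` and `get_json` and the key `params_json` are hypothetical, and a plain `dict` stands in for `A.meta` so the snippet runs without creating an array.

```python
import json


def put_json(meta, key, obj):
    """Store a JSON-serializable object under `key` as a string."""
    meta[key] = json.dumps(obj, sort_keys=True)


def get_json(meta, key):
    """Read back an object previously stored with put_json."""
    return json.loads(meta[key])


# A plain dict stands in for `A.meta` here (an assumption for illustration).
# With a real array, pass `A.meta` from an array opened in write mode to
# put_json, and from an array opened in read mode to get_json.
store = {}
params = {"sensor": "s1", "calibration": {"gain": 1.5, "offset": [0, 2]}}
put_json(store, "params_json", params)

restored = get_json(store, "params_json")
print(restored == params)
```

Because the stored value is an ordinary string, it is written and read like the `"ccc"` key in this tutorial.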

You can also delete previously written metadata by key.

Python first, then the R equivalent:

# Open the array in write mode and delete the "bbb" key
with tiledb.open(array_uri, "w") as A:
    del A.meta["bbb"]

tiledb_delete_metadata(array, "bbb")

# Remember to close the array
invisible(tiledb_array_close(array))

Now read the metadata you wrote to the array.

Python first, then the R equivalent, each followed by its output:

# Open array for reading
with tiledb.open(array_uri) as A:
    # Print all keys
    print("keys:", A.meta.keys())

    # Read "aaa" key and print its value
    print("aaa:", A.meta["aaa"])

    # Read the "tuple_int" key
    print("tuple_int:", A.meta["tuple_int"])  # -> (1, 2, 3, 4)
    # Read the "list_float" key
    print("list_float:", A.meta["list_float"])  # -> (1.0, 2.1, 3.2, 4.3)

keys: ['aaa', 'ccc', 'list_float', 'tuple_int']
aaa: 1.1
tuple_int: (1, 2, 3, 4)
list_float: (1.0, 2.1, 3.2, 4.3)

invisible(tiledb_array_open(array, type = "READ"))

# Print all metadata
print(tiledb_get_all_metadata(array))

# Read "aaa" and print its value
print(paste("aaa:", tiledb_get_metadata(array, "aaa")))

# Read the "vector_float" key
print(paste("vector_float:", tiledb_get_metadata(array, "vector_float")))

aaa:    1.1
ccc:    hello!
vector_float:   1.0, 2.1, 3.2, 4.3
[1] "aaa: 1.1"
[1] "vector_float: 1"   "vector_float: 2.1" "vector_float: 3.2"
[4] "vector_float: 4.3"
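
The Python output above shows that `list_float` was written as a list but is printed as a tuple when read back. When checking that metadata survived a round trip, it can therefore help to normalize sequences before comparing. A minimal sketch, assuming the values look like the ones printed in this tutorial: the helpers `normalize` and `mismatched_keys` are hypothetical, and plain dictionaries stand in for the written values and for `dict(A.meta.items())`.

```python
def normalize(value):
    """Convert lists and tuples to tuples; leave other values unchanged."""
    if isinstance(value, (list, tuple)):
        return tuple(value)
    return value


def mismatched_keys(expected, actual):
    """Return the sorted keys whose values differ after normalization."""
    keys = set(expected) | set(actual)
    missing = object()  # sentinel for keys present on only one side
    return sorted(
        k for k in keys
        if normalize(expected.get(k, missing)) != normalize(actual.get(k, missing))
    )


# Values as written in this tutorial (after deleting "bbb") ...
written = {
    "aaa": 1.1,
    "ccc": "hello!",
    "tuple_int": (1, 2, 3, 4),
    "list_float": [1.0, 2.1, 3.2, 4.3],
}
# ... and as printed when read back (stand-in for dict(A.meta.items())).
read_back = {
    "aaa": 1.1,
    "ccc": "hello!",
    "tuple_int": (1, 2, 3, 4),
    "list_float": (1.0, 2.1, 3.2, 4.3),
}

print(mismatched_keys(written, read_back))  # -> []
```

A key that exists on only one side, such as a deleted `"bbb"`, is reported as a mismatch.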

Alternatively, you can read the metadata by iterating over all key-value pairs.

Python first, then the R equivalent:

# Open the array for reading
with tiledb.open(array_uri) as A:
    # Print the keys
    print("keys:", A.meta.keys())

    # Iterate over all key-value pairs:
    for key, value in A.meta.items():
        print(f"{key}: {value}")

meta <- tiledb_get_all_metadata(array)

# Iterate over all key value pairs
for (key in names(meta)) {
  value <- meta[[key]]
  print(paste(key, ": ", value, sep = ""))
}
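
Iterating over key-value pairs also makes it easy to copy a subset of the metadata into an ordinary dictionary, for example all keys that share a prefix. The sketch below assumes such a naming convention: the helper `select_prefix` is hypothetical, the keys and values are made up for illustration, and a list of pairs stands in for `A.meta.items()`.

```python
def select_prefix(items, prefix):
    """Build a dict from (key, value) pairs whose key starts with `prefix`."""
    return {key: value for key, value in items if key.startswith(prefix)}


# Stand-in for `A.meta.items()`; these keys are made up for illustration.
pairs = [
    ("acq.date", "2024-01-01"),
    ("acq.operator", "alice"),
    ("proc.version", 3),
]

acq = select_prefix(pairs, "acq.")
print(acq)
```

With a real array, the call would be `select_prefix(A.meta.items(), "acq.")` inside a `with tiledb.open(array_uri) as A:` block.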

Finally, clean up: close the array if it is still open, then delete it from disk.

Python first, then the R equivalent:

# Delete the array
if os.path.exists(array_uri):
    shutil.rmtree(array_uri)

invisible(tiledb_array_close(array))

if (file.exists(array_uri)) {
  unlink(array_uri, recursive = TRUE)
}
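
As an alternative to deleting the array by hand, you can place it in a temporary directory that Python removes automatically. This is a minimal sketch using only the standard library. An empty directory is created as a stand-in for the array so the snippet runs on its own; in practice you would call `tiledb.Array.create(array_uri, sch)` at that point instead.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    array_uri = os.path.join(tmp_dir, "array_meta")
    # Create the array and work with its metadata here, for example:
    #   tiledb.Array.create(array_uri, sch)
    # A directory is created as a stand-in so this sketch runs on its own.
    os.makedirs(array_uri)
    existed_inside = os.path.exists(array_uri)

# Leaving the `with` block deletes tmp_dir and everything under it.
print(existed_inside, os.path.exists(array_uri))  # -> True False
```

This keeps test runs from leaving arrays behind in your home directory, at the cost of the array not persisting between sessions.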