TileDB sparse arrays support the use of string dimensions. Learn how to work with string dimensions effectively in this tutorial.
How to run this tutorial
You can run this tutorial in two ways:
Locally on your machine.
On TileDB Cloud.
However, since TileDB Cloud has a free tier, we strongly recommend that you sign up and run everything there, as that requires no installations or deployment.
This tutorial explains how to use string dimensions. Note that this applies only to sparse arrays.
First, import the necessary libraries, set the array URI (that is, its path, which in this tutorial will be on local storage), and delete any previously created arrays with the same name.
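The cleanup step can be sketched with the Python standard library alone, because a TileDB array on local storage is a directory. This is a minimal sketch, not the tutorial's own setup code: the array name `string_dimensions_demo` and the helper `remove_local_array` are hypothetical, chosen only for illustration.

```python
import os
import shutil
import tempfile

# Hypothetical array location: a directory under the system temp folder.
array_uri = os.path.join(tempfile.gettempdir(), "string_dimensions_demo")


def remove_local_array(uri):
    """Delete a local array directory if it exists.

    Returns True if a directory was removed, False if there was nothing to remove.
    """
    if os.path.isdir(uri):
        shutil.rmtree(uri)
        return True
    return False


# Start from a clean slate so that array creation does not fail on reruns.
remove_local_array(array_uri)
```

In a full session, `import numpy as np` and `import tiledb` would sit alongside these imports, since the Python schema code uses both. This sketch only handles local paths; arrays on remote storage would need a different removal mechanism.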
# Create the two dimensions, one string, one int64
d_str = tiledb.Dim(name="str_dim", domain=(None, None), dtype=np.bytes_)
d_int = tiledb.Dim(name="int64_dim", domain=(0, 100), tile=10, dtype=np.int64)

# Create a domain using the two dimensions
dom = tiledb.Domain(d_str, d_int)

# Create an attribute
a = tiledb.Attr(name="a", dtype=np.int32)

# Create the array schema, setting `sparse=True` to indicate a sparse array
sch = tiledb.ArraySchema(domain=dom, sparse=True, attrs=[a])

# Create the array on disk (it will initially be empty)
tiledb.Array.create(array_uri, sch)
# Create a string dimension and an int32 dimension
d_str <- tiledb_dim("str_dim", NULL, NULL, "ASCII")
d_int <- tiledb_dim("int32_dim", c(1L, 100L), 10L, "INT32")

# Create a domain using the above dimensions
dom <- tiledb_domain(c(d_str, d_int))

# Create an attribute
att <- tiledb_attr("a", "INT32")

# Create a sparse schema
schema <- tiledb_array_schema(dom, att, sparse = TRUE)

# Create the array on disk
arr <- tiledb_array_create(array_uri, schema)
Populate the TileDB array by passing 1D input arrays in the coordinate (COO) format.
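In COO format, each dimension and each attribute gets its own 1D array, and position `i` across all of them describes one cell. The sketch below prepares the four cells that appear in this tutorial's output and checks that the inputs line up. The helpers `coo_cells` and `write_cells` are hypothetical names. The write inside `write_cells` assumes the Python schema above (`str_dim`, `int64_dim`, attribute `a`) and uses TileDB-Py's sparse item assignment; treat it as an illustration rather than the tutorial's exact code.

```python
# Three parallel 1D inputs: index i across all of them describes one cell.
str_coords = ["aa", "bbb", "c", "dddd"]  # coordinates on the string dimension
int_coords = [10, 20, 30, 40]            # coordinates on the integer dimension
a_values = [1, 2, 3, 4]                  # values of attribute "a"


def coo_cells(*columns):
    """Zip equal-length 1D inputs into one tuple per cell."""
    if len({len(col) for col in columns}) != 1:
        raise ValueError("COO inputs must all have the same length")
    return list(zip(*columns))


def write_cells(array_uri, str_coords, int_coords, a_values):
    """Write the cells to an existing sparse array (hypothetical helper).

    numpy and tiledb are imported here so that the COO preparation above
    also runs where TileDB is not installed.
    """
    import numpy as np
    import tiledb

    coo_cells(str_coords, int_coords, a_values)  # validate lengths first
    with tiledb.open(array_uri, "w") as A:
        # One coordinate array per dimension on the left,
        # attribute data keyed by attribute name on the right.
        A[
            np.array(str_coords, dtype=np.bytes_),
            np.array(int_coords, dtype=np.int64),
        ] = {"a": np.array(a_values, dtype=np.int32)}


cells = coo_cells(str_coords, int_coords, a_values)
print(cells)
# [('aa', 10, 1), ('bbb', 20, 2), ('c', 30, 3), ('dddd', 40, 4)]
```

Calling `write_cells(array_uri, str_coords, int_coords, a_values)` after the array has been created would store these four cells. The coordinates do not need to be pre-sorted or form a grid, which is what distinguishes a COO write from a dense write.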
# Open array as a data.frame
arr <- tiledb_array(array_uri, return_as = "data.frame")
df <- arr[]

# Print the whole dataframe
cat("Whole dataframe:\n")
print(df)

# Get a summary of the data
cat("Summary:\n")
print(summary(df))

# Get string dimension
cat("String dimension:\n")
print(df$str_dim)
Whole dataframe:
  str_dim int32_dim a
1      aa        10 1
2     bbb        20 2
3       c        30 3
4    dddd        40 4
Summary:
   str_dim            int32_dim          a
 Length:4           Min.   :10.0   Min.   :1.00
 Class :character   1st Qu.:17.5   1st Qu.:1.75
 Mode  :character   Median :25.0   Median :2.50
                    Mean   :25.0   Mean   :2.50
                    3rd Qu.:32.5   3rd Qu.:3.25
                    Max.   :40.0   Max.   :4.00
String dimension:
[1] "aa"   "bbb"  "c"    "dddd"