Learn how you can compress your array data for a smaller persisted storage footprint.
How to run this tutorial
You can run this tutorial in two ways:
Locally on your machine.
On TileDB Cloud.
However, since TileDB Cloud has a free tier, we strongly recommend that you sign up and run everything there, as that requires no installations or deployment.
This tutorial describes how to use compression for attributes and dimensions. For more information on compression (including the supported compression filters), visit the Key Concepts: Compression section.
First, import the necessary libraries, set the array URI (that is, its path, which in this tutorial will be on local storage), and delete any previously created arrays with the same name.
# Set up some filter lists to pass to dimensions and attributes
# tiledb.FilterList accepts an iterable of zero or more filters
filter_list_1 = tiledb.FilterList([tiledb.GzipFilter(level=10)], chunksize=10000)
filter_list_2 = tiledb.FilterList([tiledb.ZstdFilter(level=-1)])

# Set up some filter lists to pass to dimensions and attributes
# tiledb_filter_list() accepts a vector of zero or more filters
gzip_filter <- tiledb_filter("GZIP")
tiledb_filter_set_option(gzip_filter, "COMPRESSION_LEVEL", 10L)
filter_list_1 <- tiledb_filter_list(c(gzip_filter))
set_max_chunk_size(filter_list_1, 10000)
zstd_filter <- tiledb_filter("ZSTD")
tiledb_filter_set_option(zstd_filter, "COMPRESSION_LEVEL", -1L)
filter_list_2 <- tiledb_filter_list(c(zstd_filter))
Then create an array, passing the desired filter list as an argument to each dimension and attribute.
# Create the two dimensions
# The filter list we created above is passed into the `filters` parameter
d1 = tiledb.Dim(name="d1", domain=(0, 3), tile=2, dtype=np.int32, filters=filter_list_1)
d2 = tiledb.Dim(name="d2", domain=(0, 3), tile=2, dtype=np.int32, filters=filter_list_1)
# Create a domain using the two dimensions
dom = tiledb.Domain(d1, d2)
# Create an attribute
# The filter list we created above is passed into the `filters` parameter
a = tiledb.Attr(name="a", dtype=np.int32, filters=filter_list_2)
# Create the array schema with `sparse=True`
sch = tiledb.ArraySchema(domain=dom, sparse=True, attrs=[a])
# Create the array on disk (it will initially be empty)
tiledb.Array.create(array_uri, sch)

# Create the two dimensions
# The filter list we created above is passed into the `filter_list` parameter
d1 <- tiledb_dim("d1", c(0L, 3L), 2L, "INT32", filter_list = filter_list_1)
d2 <- tiledb_dim("d2", c(0L, 3L), 2L, "INT32", filter_list = filter_list_1)
# Create a domain using the two dimensions
dom <- tiledb_domain(dims = c(d1, d2))
# Create an attribute
# The filter list we created above is passed into the `filter_list` parameter
a <- tiledb_attr("a", type = "INT32", filter_list = filter_list_2)
# Create the array schema with `sparse = TRUE`
sch <- tiledb_array_schema(dom, a, sparse = TRUE)
# Create the array on disk (it will initially be empty)
arr <- tiledb_array_create(array_uri, sch)
For convenience, you can pass the same filters to all dimensions by passing them as an argument to the array schema object.
# ... create domain dom
# ... create attribute a
# ... create filter list filter_list_1
# Create the schema setting the coordinates filter list.
# This is applicable only to sparse arrays.
schema = tiledb.ArraySchema(
    domain=dom, sparse=True, attrs=[a], coords_filters=filter_list_1
)

# ... create domain dom
# ... create attribute a
# ... create filter lists filter_list_1, filter_list_2
# Set the coordinates filter list on an existing schema sch.
# This is applicable only to sparse arrays.
tiledb_array_schema_set_coords_filter_list(sch, filter_list_1)
# Alternatively, create the schema and set the coordinates filter list in one call
sch <- tiledb_array_schema(dom, a, sparse = TRUE, coords_filter_list = filter_list_2)
You can also set filters for the variable-length attribute and dimension offsets.