Use Datetimes with TileDB Arrays

Learn how to work with datetimes in TileDB arrays, including writing and reading datetime data.

How to run this tutorial

You can run this tutorial in two ways:

Locally on your machine.
On TileDB Cloud.

Because TileDB Cloud has a free tier, we strongly recommend that you sign up and run everything there, since that requires no installation or deployment.

This tutorial explains how to use datetimes. For more information, visit the Key Concepts: Datetimes section.

First, import the necessary libraries, set the array URI (that is, its path, which in this tutorial will be on local storage), and delete any previously created arrays with the same name.

Python
R

# Import necessary libraries
import os.path
import shutil

import numpy as np
import tiledb

# Set array URI
array_uri = os.path.expanduser("~/datetimes")

# Delete array if it already exists
if os.path.exists(array_uri):
    shutil.rmtree(array_uri)

library(tiledb)

array_uri <- path.expand("~/datetimes_r")

if (dir.exists(array_uri)) {
  unlink(array_uri, recursive = TRUE)
}

Next, create a 1D array by specifying its schema. Notice how to specify a datetime dimension and define its tiling.

Python
R

# Single dimension with a domain of 10 years, day resolution, one tile per 365 days
dim = tiledb.Dim(
    name="dim",
    domain=(np.datetime64("2014-01-01"), np.datetime64("2024-01-01")),
    tile=np.timedelta64(365, "D"),
    dtype=np.dtype("datetime64[D]"),
)

# Add the dimension to the array domain
dom = tiledb.Domain(dim)

# Create an attribute
a = tiledb.Attr(name="a", dtype=np.float64)

# Create the array schema, setting `sparse=False` to indicate a dense array
sch = tiledb.ArraySchema(domain=dom, sparse=False, attrs=[a])

# Create the array on disk (it will initially be empty)
tiledb.Array.create(array_uri, sch)

# Domain is 10 years, day resolution, one tile per 365 days
dim <- tiledb_dim(
  "d1",
  c(as.Date("2014-01-01"), as.Date("2024-01-01")),
  365,
  "DATETIME_DAY"
)
dom <- tiledb_domain(dims = c(dim))
sch <- tiledb_array_schema(
  dom,
  attrs = c(tiledb_attr("a1", type = "FLOAT64")),
  sparse = TRUE
)
arr <- tiledb_array_create(array_uri, sch)
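
To make the tiling comment concrete, the following minimal sketch uses only NumPy (no TileDB calls) to count how many day cells the domain holds and how many 365-day tiles are needed to cover it. It assumes that both domain bounds are inclusive; the variable names `lo`, `hi`, `n_cells`, and `n_tiles` are illustrative and not part of the tutorial.

```python
import numpy as np

# Domain bounds and tile extent copied from the schema above
lo = np.datetime64("2014-01-01", "D")
hi = np.datetime64("2024-01-01", "D")
tile = np.timedelta64(365, "D")
one_day = np.timedelta64(1, "D")

# Number of day cells in the domain, assuming both bounds are included
n_cells = int((hi - lo) / one_day) + 1

# Number of tiles needed to cover those cells (ceiling division)
n_tiles = -(-n_cells // int(tile / one_day))

print(n_cells, n_tiles)  # 3653 11
```

Because 2016 and 2020 are leap years, the ten-year span holds slightly more than ten 365-day tiles, so a tile boundary does not always fall on January 1.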

Populate the TileDB array with a 1D input array.

Python
R

# Randomly generate 2 years of values for attribute 'a'
ndays = 365 * 2
a_vals = np.random.rand(ndays)

# Write the data at the beginning of the domain
start = np.datetime64("2014-01-01")
end = start + np.timedelta64(ndays - 1, "D")

# Write the data to the array
with tiledb.open(array_uri, "w") as A:
    A[start:end] = {"a": a_vals}

# Randomly generate 2 years of values for attribute 'a1'
ndays <- 365 * 2
a1_vals <- runif(ndays, min = 0, max = 1)

# Write the data at the beginning of the domain
d1_vals <- as.Date("2014-01-01") + 0:(ndays - 1)

arr <- tiledb_array(array_uri, return_as = "data.frame")
arr[] <- data.frame(d1 = d1_vals, a1 = a1_vals)
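
The Python write above relies on the range from `start` to `end` covering exactly `ndays` dates. The sketch below checks that arithmetic with NumPy alone; `dates` is an illustrative name, and the check assumes the end of the range is included, which is why `end` is computed with `ndays - 1`.

```python
import numpy as np

ndays = 365 * 2
start = np.datetime64("2014-01-01")
end = start + np.timedelta64(ndays - 1, "D")

# All dates from start to end, both included
# (np.arange excludes its stop value, hence the extra day)
dates = np.arange(start, end + np.timedelta64(1, "D"), dtype="datetime64[D]")

print(end, len(dates))  # 2015-12-31 730
```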

Read a slice of the data written.

Python
R

# Slice three months (2014-11-01 through 2015-01-31) using two datetimes
with tiledb.open(array_uri, "r", attr="a") as A:
    vals = A[np.datetime64("2014-11-01") : np.datetime64("2015-01-31")]
    print(vals)

[0.74648454 0.94789591 0.1149732  0.42300291 0.69696624 0.40833649
 0.63957744 0.57712728 0.52034725 0.1776967  0.9090284  0.38128281
 0.7646486  0.40420533 0.22839856 0.97798135 0.14239953 0.15138299
 0.57731402 0.20118496 0.2142516  0.62497396 0.00405885 0.50783416
 0.76770287 0.58806622 0.8121114  0.00370367 0.7482651  0.23087373
 0.77405921 0.77311234 0.24123093 0.33915956 0.95535612 0.71309336
 0.22220483 0.17921614 0.13226682 0.29301397 0.37226229 0.54678013
 0.79259178 0.62073576 0.55023035 0.25993844 0.79710839 0.89988925
 0.79044067 0.79034833 0.02678336 0.15186647 0.25540738 0.34682933
 0.79444135 0.37009185 0.83391189 0.46006167 0.82409507 0.51484141
 0.79109737 0.364555   0.91773069 0.58404277 0.96545549 0.79497207
 0.47532554 0.87704637 0.28672392 0.7343242  0.32434522 0.84295835
 0.09182258 0.55224877 0.56418694 0.01961472 0.94888983 0.48602958
 0.36733696 0.35354304 0.8362431  0.32185166 0.6879994  0.11198504
 0.51070241 0.30038353 0.8310266  0.73545288 0.02440096 0.67357947
 0.99745675 0.61844584]

arr <- tiledb_array(array_uri, return_as = "data.frame")
dat <- arr[as.Date("2014-11-01"):as.Date("2015-01-31")]["a1"]
print(dat)

            a1
1  0.712983265
2  0.088592994
3  0.031015285
4  0.782151856
5  0.046300765
6  0.033481821
7  0.978140368
8  0.633407647
9  0.923134662
10 0.600184229
11 0.360446875
12 0.288378358
13 0.750761432
14 0.334509616
15 0.905673793
16 0.262677963
17 0.877805450
18 0.477983785
19 0.526758750
20 0.576989970
21 0.592521695
22 0.530202027
23 0.728172884
24 0.587344419
25 0.369918146
26 0.681274444
27 0.951152359
28 0.182864725
29 0.599137348
30 0.992036244
31 0.311782320
32 0.498838423
33 0.605828947
34 0.786204587
35 0.646769766
36 0.631602470
37 0.450656176
38 0.018546762
39 0.458982277
40 0.791440152
41 0.016405281
42 0.101019274
43 0.152830346
44 0.052043710
45 0.681212134
46 0.346061537
47 0.582493251
48 0.731068654
49 0.055793842
50 0.869694304
51 0.990531203
52 0.656195527
53 0.292967161
54 0.947189726
55 0.378109916
56 0.555134332
57 0.767158172
58 0.542514948
59 0.854055091
60 0.588082922
61 0.366171707
62 0.780188766
63 0.140783411
64 0.858407169
65 0.180123144
66 0.459191737
67 0.899637877
68 0.784253929
69 0.500418225
70 0.776788717
71 0.598329837
72 0.167414829
73 0.889009025
74 0.322153720
75 0.153571705
76 0.741593552
77 0.375871683
78 0.663539977
79 0.434836608
80 0.348283913
81 0.787555164
82 0.224822458
83 0.691553252
84 0.119212263
85 0.135721842
86 0.198813488
87 0.903014536
88 0.537094163
89 0.001003132
90 0.996139813
91 0.684726957
92 0.195436064
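
Both outputs contain 92 values, which suggests that the datetime slice includes its end date. As a rough cross-check, the sketch below uses NumPy only to compute where the slice sits relative to the first written date; `origin`, `first`, and `count` are illustrative names, not part of the tutorial.

```python
import numpy as np

origin = np.datetime64("2014-01-01")  # first date written
lo = np.datetime64("2014-11-01")      # slice start
hi = np.datetime64("2015-01-31")      # slice end, assumed to be included
one_day = np.timedelta64(1, "D")

first = int((lo - origin) / one_day)  # offset of the slice start, in days
count = int((hi - lo) / one_day) + 1  # number of days in the slice

print(first, count)  # 304 92
```

Under that assumption, the values read in the Python tab should match `a_vals[first:first + count]` from the write step.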

Finally, clean up by deleting the array.

Python
R

# Delete the array
if os.path.exists(array_uri):
    shutil.rmtree(array_uri)

if (dir.exists(array_uri)) {
  unlink(array_uri, recursive = TRUE)
}