TileDB’s virtual filesystem (VFS) abstracts all I/O operations to storage backends behind a unified interface, supporting powerful file and directory management.
TileDB is designed such that all I/O to and from the storage backends is abstracted behind a virtual filesystem (VFS) module. The VFS module supports basic operations, such as creating a file or directory, reading from and writing to a file, and so on. With this abstraction, the TileDB team can add more storage backends in the future, effectively making the storage backend opaque to the user.
A welcome by-product of this architecture is that TileDB can expose the basic VFS functionality through its APIs. This offers a simplified interface for file I/O and directory management (unrelated to TileDB assets such as arrays) on all the storage backends that TileDB supports.
This page covers most of the TileDB VFS functionality.
Setup
First, import the necessary libraries, set the URIs (that is, the paths, which in this tutorial will be on local storage), and delete any previously created directories with the same name.
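A minimal Python sketch of this setup; the variable names (base_path, path, dir_a, file_a, and vfs) are chosen to match the later cells, and the exact local paths are assumptions:

import os
import struct

import tiledb

# Local paths used throughout this tutorial (assumed values)
base_path = os.path.expanduser("~/tiledb_vfs_py")
path = os.path.join(base_path, "tiledb_vfs.bin")
dir_a = os.path.join(base_path, "dir_a")
file_a = os.path.join(dir_a, "file_a")

# The VFS object through which all I/O goes
vfs = tiledb.VFS()

# Delete any previously created directory with the same name, then recreate it
if vfs.is_dir(base_path):
    vfs.remove_dir(base_path)
vfs.create_dir(base_path)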
When writing to and reading from files, the Python VFS API treats any file you open with the .open() method as a regular file object from the io module, so all methods and attributes of the io module work with TileDB VFS file handles.
The VFS API supports bytes only and does not convert the data for you automatically. Thus, you must open the file in binary mode and handle encoding manually. In the Python API, this only requires a byte string (a b"..." literal); for floats, use struct.pack(). In the R API, you need to serialize() all data first and then cast it to an integer type with as.integer() before passing it to tiledb_vfs_write().
# Create and open writable buffer object
with vfs.open(path, "wb") as fh:
    fh.write(struct.pack("<f", 153.0))
    fh.write(b"abcd")
fh <- tiledb_vfs_open(path, "WRITE")
# create a binary payload from a serialized R object
payload <- as.integer(serialize(list(dbl = 153, string = "abcde"), NULL))
# write it and close file
tiledb_vfs_write(fh, payload)
tiledb_vfs_close(fh)
# Write data again - this will overwrite the previous file
with vfs.open(path, "wb") as fh:
    fh.write(struct.pack("<f", 153.1))
    fh.write(b"abcd")
# Write data again - this will overwrite the previous file
# This is alternative syntax to the previous cell
tiledb_vfs_remove_file(uri = path)
tiledb_vfs_serialize(obj = list(dbl = 153, string = "abcde"), uri = path)
# Append data to existing file (this will NOT work on cloud object stores)
with vfs.open(path, "ab") as fh:
    fh.write(b"ghijkl")
# Append data to existing object (this just overwrites the file again)
obj <- tiledb_vfs_unserialize(uri = path)
obj["string"] <- paste0(obj["string"], "ghijkl")
tiledb_vfs_serialize(obj = obj, uri = path)
Open the file in read mode and decode the binary data:
# Create and open readable handle
fh = vfs.open(path, "rb")
float_struct = struct.Struct("<f")
float_data = fh.read(float_struct.size)
# Offset the starting byte
fh.seek(float_struct.size)
# Read the string data
string_data = fh.read(12)
print(float_struct.unpack(float_data)[0])
print(string_data.decode("UTF-8"))
# Don't forget to close the handle
fh = vfs.close(fh)
153.10000610351562
abcdghijkl
# Quickly print the unserialized path
# print(tiledb_vfs_unserialize(path))

# Create and open readable handle
fh <- tiledb_vfs_open(path, "READ")
# Get the file size
file_size <- tiledb_vfs_file_size(path)
# # Read the data into a vector of integers
# vec_double <- tiledb_vfs_read(fh, 0, 228)
# vec_str <- tiledb_vfs_read(fh, 208, file_size)
vec <- tiledb_vfs_read(fh, 0, file_size)
# Close the file handle
tiledb_vfs_close(fh)
print(unserialize(as.raw(vec)))
$dbl
[1] 153
$string
[1] "abcde"
Common file operations
Create an empty file, similar to the Unix touch command:
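A minimal Python sketch of this step, assuming the vfs, dir_a, and file_a names from the setup above:

# Create a subdirectory and an empty file inside it
vfs.create_dir(dir_a)
vfs.touch(file_a)

You can then retrieve file and directory sizes, with the .size() method in Python or the tiledb_vfs_file_size() and tiledb_vfs_dir_size() functions in R: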
print(f"Size of file {path}: {vfs.size(path)} bytes")
print(f"Size of file {file_a}: {vfs.size(file_a)} bytes")
# The .size() method also accepts directories
print(f"Size of dir {base_path}: {vfs.size(base_path)} bytes")
Size of file /Users/nickv/tiledb_vfs_py/tiledb_vfs.bin: 14 bytes
Size of file /Users/nickv/tiledb_vfs_py/dir_a/file_a: 0 bytes
Size of dir /Users/nickv/tiledb_vfs_py: 128 bytes
# Read file sizes with tiledb_vfs_file_size()
cat(paste0("Size of file ", path, ": ", tiledb_vfs_file_size(path), " bytes\n"))
cat(paste0("Size of file ", file_a, ": ", tiledb_vfs_file_size(file_a), " bytes\n"))
# Read directory sizes with tiledb_vfs_dir_size()
cat(paste0("Size of dir ", base_path, ": ", tiledb_vfs_dir_size(base_path), " bytes"))
Size of file /Users/nickv/tiledb_vfs_r/tiledb_vfs.bin: 448 bytes
Size of file /Users/nickv/tiledb_vfs_r/dir_a/file_a: 0 bytes
Size of dir /Users/nickv/tiledb_vfs_r: 448 bytes
List the contents of a directory, similar to the Unix ls command. This method returns a list of the files and directories inside the given directory.
# Run an ls-like command on a directory:
print("vfs.ls(base_path):\n")
for file in vfs.ls(base_path):
    print(f"- {file}")

# You can run this recursively:
print("\nvfs.ls(base_path, recursive=True):\n")
for file in vfs.ls(base_path, recursive=True):
    print(f"- {file}")

# Shorthand for the recursive ls:
print("\nvfs.ls_recursive(base_path):\n")
for file in vfs.ls_recursive(base_path):
    print(f"- {file}")
# Run an ls-like command on a directory
cat(paste0("Non-recursive ls:\n\n"))
for (path in tiledb_vfs_ls(base_path)) {
  cat(paste0("- ", path, "\n"))
}

# You can make it recursive
cat(paste0("\nRecursive ls:\n\n"))
print(tiledb_vfs_ls_recursive(base_path))
# Clean up: remove the directory and everything in it
if vfs.is_dir(base_path):
    vfs.remove_dir(base_path)
# Clean up: remove the directory and everything in it
if (tiledb_vfs_is_dir(base_path)) {
  tiledb_vfs_remove_dir(base_path)
}
Context and configuration
You can set a context, a configuration, or both on a VFS object. Any configuration object you pass through the config parameter overrides the corresponding VFS settings of the ctx object with the values in config.
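A minimal Python sketch of this; the specific configuration parameter shown (vfs.s3.region) and its value are only illustrative assumptions:

import tiledb

# A context that carries its own (default) VFS settings
ctx = tiledb.Ctx()

# Configuration values that should take precedence over those in ctx
cfg = tiledb.Config({"vfs.s3.region": "us-east-1"})

# The VFS object uses ctx, with cfg overriding the corresponding VFS settings
vfs = tiledb.VFS(config=cfg, ctx=ctx)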
You can also perform operations on cloud storage buckets by passing a valid bucket URI. Except for appending data to an existing file, all the previously mentioned methods work the same way on cloud object stores as they do on files in your local filesystem.
You can check to see if your cloud storage provider is supported:
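A Python sketch of this check and of the bucket creation that follows; bucket_name is assumed to hold a valid bucket URI for your provider, and S3 is used here only as an example backend:

# Check whether this build of TileDB supports the S3 backend
print(vfs.supports("s3"))

# Create the bucket only if it does not already exist
if not vfs.is_bucket(bucket_name):
    vfs.create_bucket(bucket_name)

The R version of the bucket creation: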
if (!tiledb_vfs_is_bucket(bucket_name)) {
  tiledb_vfs_create_bucket(bucket_name)
}
Warning
Take extreme care when creating or deleting buckets with the VFS APIs. After creation, a bucket may take some time to “appear” in the system, which will cause problems if you create the bucket and immediately try to write a file to it.
Wait some time before trying to write files to the bucket. You can add a polling mechanism with the .is_bucket() method in Python or the tiledb_vfs_is_bucket() function in R to verify TileDB created the bucket successfully.
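A minimal sketch of such a polling loop in Python; the retry count and sleep interval are arbitrary choices:

import time

# Poll until the newly created bucket becomes visible (give up after ~30 seconds)
for _ in range(30):
    if vfs.is_bucket(bucket_name):
        break
    time.sleep(1)
else:
    raise RuntimeError("Bucket is still not visible")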
After creating a new bucket and verifying the bucket exists, you can verify that the bucket is empty:
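In Python, a sketch of that check, assuming the bucket_name from above:

# True if the bucket exists but contains no objects
print(vfs.is_empty_bucket(bucket_name))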
You can delete a bucket from cloud storage, with the appropriate permissions.
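A minimal Python sketch of the deletion, assuming you have those permissions:

# Delete the bucket and everything stored in it
vfs.remove_bucket(bucket_name)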
Caution
Deleting a bucket is irreversible.
Warning
Deleting a bucket may not take effect immediately. Thus, it may continue to “exist” for some time. You can apply a polling mechanism to check if you deleted the bucket successfully with the .is_bucket() method in Python or the tiledb_vfs_is_bucket() function in R.