On this page

  • Setup
  • Write to and read from files
  • Common file operations
  • Context and configuration
  • Cloud object storage

Virtual Filesystem

TileDB’s virtual filesystem (VFS) abstracts all I/O operations to storage backends behind a unified interface, supporting powerful file and directory management.

TileDB is designed such that all I/O to and from the storage backends is abstracted behind a virtual filesystem (VFS) module. The VFS module supports basic operations, such as creating a file or directory, reading from and writing to a file, and so on. With this abstraction, the TileDB team can add more storage backends in the future, effectively making the storage backend opaque to the user.

A useful by-product of this architecture is that the basic VFS functionality can be exposed through the TileDB APIs. This offers a simplified interface for file I/O and directory management (unrelated to TileDB assets such as arrays) on all the storage backends that TileDB supports.

This page covers most of the TileDB VFS functionality.
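Conceptually, the VFS layer is a single interface that every storage backend implements, so calling code never needs to know which backend it is talking to. The following is a purely illustrative sketch of that idea (the `VFSBackend` and `InMemoryBackend` names are hypothetical, not TileDB classes):

```python
from typing import Protocol


class VFSBackend(Protocol):
    """Minimal interface every storage backend would implement."""

    def write(self, uri: str, data: bytes) -> None: ...
    def read(self, uri: str) -> bytes: ...


class InMemoryBackend:
    """Toy backend: stores 'files' in a dict keyed by URI."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}

    def write(self, uri: str, data: bytes) -> None:
        self._files[uri] = data

    def read(self, uri: str) -> bytes:
        return self._files[uri]


def copy(vfs: VFSBackend, src: str, dst: str) -> None:
    # Caller code is backend-agnostic: it only uses the interface.
    vfs.write(dst, vfs.read(src))


backend = InMemoryBackend()
backend.write("mem://a", b"hello")
copy(backend, "mem://a", "mem://b")
print(backend.read("mem://b"))  # b'hello'
```

In TileDB, the same principle means that code written against the VFS API works unchanged whether the URI points to local disk, Amazon S3, Azure Blob Storage, or Google Cloud Storage.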

Setup

First, import the necessary libraries, set the file URI (that is, its path, which in this tutorial will be on local storage), and delete any directories with the same name left over from previous runs.

  • Python
  • R
import shutil
import struct
from pathlib import Path

import tiledb

ctx = tiledb.Ctx()
vfs = tiledb.VFS(ctx=ctx)

base_path = Path("~/tiledb_vfs_py").expanduser()
path = Path(base_path) / "tiledb_vfs.bin"

if Path(base_path).exists():
    shutil.rmtree(base_path)
library(tiledb)

vfs <- tiledb_vfs()

base_path <- path.expand("~/tiledb_vfs_r")
path <- file.path(base_path, "tiledb_vfs.bin")

# Delete the directory if it already exists
if (file.exists(base_path)) {
  unlink(base_path, recursive = TRUE)
}

Next, create a directory to hold the files you’ll be managing in this tutorial:

  • Python
  • R
# Create a directory
if not vfs.is_dir(base_path):
    vfs.create_dir(base_path)
    print(f"Created {base_path}")
else:
    print(f"{base_path} already exists")
Created /Users/nickv/tiledb_vfs_py
# Create a directory
if (!tiledb_vfs_is_dir(base_path)) {
  tiledb_vfs_create_dir(base_path)
  cat(paste0("Created ", base_path))
} else {
  cat(paste0(base_path, " already exists"))
}
Created /Users/nickv/tiledb_vfs_r

Write to and read from files

When writing to and reading from files, the Python VFS API treats a file opened with the .open() method like a standard file object, so the methods and attributes of Python's io module work with TileDB VFS file handles.

The VFS API supports bytes only and does not automatically convert the data for you. Thus, you must open the file in binary mode and handle encoding manually. In the Python API, a b-string suffices for text, but for floats you'll use struct.pack(). For the R API, you'll need to serialize() all data first, and then cast it to an integer type with as.integer(), before passing it to tiledb_vfs_write().

  • Python
  • R
# Create and open writable buffer object
with vfs.open(path, "wb") as fh:
    fh.write(struct.pack("<f", 153.0))
    fh.write(b"abcd")
fh <- tiledb_vfs_open(path, "WRITE")

# create a binary payload from a serialized R object
payload <- as.integer(serialize(list(dbl = 153, string = "abcde"), NULL))

# write it and close file
tiledb_vfs_write(fh, payload)
tiledb_vfs_close(fh)
  • Python
  • R
# Write data again - this will overwrite the previous file
with vfs.open(path, "wb") as fh:
    fh.write(struct.pack("<f", 153.1))
    fh.write(b"abcd")
# Write data again - this will overwrite the previous file
# This is an alternative to the previous cell
tiledb_vfs_remove_file(uri = path)
tiledb_vfs_serialize(obj = list(dbl = 153, string = "abcde"), uri = path)

You can append data to a binary file as follows:

  • Python
  • R
# Append data to existing file (this will NOT work on cloud object stores)
with vfs.open(path, "ab") as fh:
    fh.write(b"ghijkl")
# Append data to existing object (this just overwrites the file again)
obj <- tiledb_vfs_unserialize(uri = path)
obj["string"] <- paste0(obj["string"], "ghijkl")
tiledb_vfs_serialize(obj = obj, uri = path)

Open the file in read mode and decode the binary data:

  • Python
  • R
# Create and open readable handle
fh = vfs.open(path, "rb")
float_struct = struct.Struct("<f")

float_data = fh.read(float_struct.size)

# Seek past the float to the start of the string data
fh.seek(float_struct.size)

# Read the string data (10 bytes: "abcd" plus the appended "ghijkl")
string_data = fh.read(10)

print(float_struct.unpack(float_data)[0])
print(string_data.decode("UTF-8"))

# Don't forget to close the handle
fh = vfs.close(fh)
153.10000610351562
abcdghijkl
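The 14 bytes written above are a 4-byte little-endian float followed by 10 ASCII bytes, and the printed float differs from 153.1 because that value is not exactly representable in 32 bits. Independent of TileDB, you can reproduce the same encode/decode round trip with the struct module alone:

```python
import struct

# Same layout the tutorial writes: a float32 followed by raw ASCII bytes
payload = struct.pack("<f", 153.1) + b"abcd" + b"ghijkl"
assert len(payload) == 14

# Decode: the first 4 bytes are the float, the rest is the string
(value,) = struct.unpack_from("<f", payload, 0)
text = payload[4:].decode("UTF-8")

print(value)  # 153.10000610351562 (nearest float32 to 153.1)
print(text)   # abcdghijkl
```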
# Create and open readable handle
fh <- tiledb_vfs_open(path, "READ")

# Get the file size
file_size <- tiledb_vfs_file_size(path)

# Read the whole file into a vector of integers
vec <- tiledb_vfs_read(fh, 0, file_size)

# Close the file handle
tiledb_vfs_close(fh)

print(unserialize(as.raw(vec)))
$dbl
[1] 153

$string
[1] "abcde"

Common file operations

Create an empty file, similar to the Unix touch command:

  • Python
  • R
# Create a directory
dir_a = Path(base_path) / "dir_a"
vfs.create_dir(dir_a)

# Create an empty file
file_a = Path(dir_a) / "file_a"
if not vfs.is_file(file_a):
    vfs.touch(file_a)
    print(f"Created empty file {file_a}")
else:
    print(f"{file_a} already exists")
Created empty file /Users/nickv/tiledb_vfs_py/dir_a/file_a
# Create a directory
dir_a <- file.path(base_path, "dir_a")
invisible(tiledb_vfs_create_dir(dir_a))

file_a <- file.path(dir_a, "file_a")
if (!tiledb_vfs_is_file(file_a)) {
  tiledb_vfs_touch(file_a)
  cat(paste0("Created empty file ", file_a))
} else {
  cat(paste0(file_a, " already exists"))
}
Created empty file /Users/nickv/tiledb_vfs_r/dir_a/file_a

Get the size of a file or directory in bytes:

  • Python
  • R
print(f"Size of file {path}: {vfs.size(path)} bytes")
print(f"Size of file {file_a}: {vfs.size(file_a)} bytes")

# The .size() method also accepts directories
print(f"Size of dir {base_path}: {vfs.size(base_path)} bytes")
Size of file /Users/nickv/tiledb_vfs_py/tiledb_vfs.bin: 14 bytes
Size of file /Users/nickv/tiledb_vfs_py/dir_a/file_a: 0 bytes
Size of dir /Users/nickv/tiledb_vfs_py: 128 bytes
# Read file sizes with tiledb_vfs_file_size()
cat(paste0("Size of file ", path, ": ", tiledb_vfs_file_size(path), " bytes\n"))
cat(paste0("Size of file ", file_a, ": ", tiledb_vfs_file_size(file_a), " bytes\n"))

# Read directory sizes with tiledb_vfs_dir_size()
cat(paste0("Size of dir ", base_path, ": ", tiledb_vfs_dir_size(base_path), " bytes"))
Size of file /Users/nickv/tiledb_vfs_r/tiledb_vfs.bin: 448 bytes
Size of file /Users/nickv/tiledb_vfs_r/dir_a/file_a: 0 bytes
Size of dir /Users/nickv/tiledb_vfs_r: 448 bytes

List the contents of a directory, similar to the Unix ls command. The result is a list of the files and directories inside the given directory.

  • Python
  • R
# Run an ls-like command on a directory:
print("vfs.ls(base_path):\n")
for file in vfs.ls(base_path):
    print(f"- {file}")

# You can run this recursively:
print("\nvfs.ls(base_path, recursive=True):\n")
for file in vfs.ls(base_path, recursive=True):
    print(f"- {file}")

# Shorthand for the recursive ls:
print("\nvfs.ls_recursive(base_path):\n")
for file in vfs.ls_recursive(base_path):
    print(f"- {file}")
vfs.ls(base_path):

- file:///Users/nickv/tiledb_vfs_py/dir_a
- file:///Users/nickv/tiledb_vfs_py/tiledb_vfs.bin

vfs.ls(base_path, recursive=True):

- file:///Users/nickv/tiledb_vfs_py/tiledb_vfs.bin
- file:///Users/nickv/tiledb_vfs_py/dir_a
- file:///Users/nickv/tiledb_vfs_py/dir_a/file_a

vfs.ls_recursive(base_path):

- file:///Users/nickv/tiledb_vfs_py/tiledb_vfs.bin
- file:///Users/nickv/tiledb_vfs_py/dir_a
- file:///Users/nickv/tiledb_vfs_py/dir_a/file_a
# Run an ls-like command on a directory
cat(paste0("Non-recursive ls:\n\n"))
for (path in tiledb_vfs_ls(base_path)) {
  cat(paste0("- ", path, "\n"))
}

# You can make it recursive
cat(paste0("\nRecursive ls:\n\n"))
print(tiledb_vfs_ls_recursive(base_path))
Non-recursive ls:

- file:///Users/nickv/tiledb_vfs_r/dir_a
- file:///Users/nickv/tiledb_vfs_r/tiledb_vfs.bin

Recursive ls:

                                             path size
1 file:///Users/nickv/tiledb_vfs_r/tiledb_vfs.bin  448
2          file:///Users/nickv/tiledb_vfs_r/dir_a    0
3   file:///Users/nickv/tiledb_vfs_r/dir_a/file_a    0

Copy file_a to a new path file_b.

Note

Copying files on Windows is not yet supported.

  • Python
  • R
file_b = Path(dir_a) / "file_b"
vfs.copy_file(file_a, file_b)

print("Files:\n")
for file in vfs.ls_recursive(base_path):
    print(f"- {file}")
Files:

- file:///Users/nickv/tiledb_vfs_py/tiledb_vfs.bin
- file:///Users/nickv/tiledb_vfs_py/dir_a
- file:///Users/nickv/tiledb_vfs_py/dir_a/file_a
- file:///Users/nickv/tiledb_vfs_py/dir_a/file_b
file_b <- file.path(dir_a, "file_b")
invisible(tiledb_vfs_copy_file(file_a, file_b))

cat(paste0("Files:\n\n"))
print(tiledb_vfs_ls_recursive(base_path))
Files:

                                             path size
1 file:///Users/nickv/tiledb_vfs_r/tiledb_vfs.bin  448
2          file:///Users/nickv/tiledb_vfs_r/dir_a    0
3   file:///Users/nickv/tiledb_vfs_r/dir_a/file_a    0
4   file:///Users/nickv/tiledb_vfs_r/dir_a/file_b    0

Copy dir_a to a new path dir_b. This recursively copies all files in that directory.

Note

Copying directories on Windows is not yet supported.

  • Python
  • R
dir_b = Path(base_path) / "dir_b"
vfs.copy_dir(dir_a, dir_b)

print("Files:\n")
for file in vfs.ls_recursive(base_path):
    print(f"- {file}")
Files:

- file:///Users/nickv/tiledb_vfs_py/tiledb_vfs.bin
- file:///Users/nickv/tiledb_vfs_py/dir_b
- file:///Users/nickv/tiledb_vfs_py/dir_b/file_a
- file:///Users/nickv/tiledb_vfs_py/dir_b/file_b
- file:///Users/nickv/tiledb_vfs_py/dir_a
- file:///Users/nickv/tiledb_vfs_py/dir_a/file_a
- file:///Users/nickv/tiledb_vfs_py/dir_a/file_b
copy_dir <- function(source, target) {
  if (tiledb_vfs_is_dir(target)) {
    stop(cat(paste0(target, " already exists")))
  } else {
    tiledb_vfs_create_dir(target)
  }
  for (f in sort(tiledb_vfs_ls_recursive(source)$path)) {
    source_file <- gsub("file://", "", f)
    target_file <- gsub(source, target, source_file)
    if (tiledb_vfs_is_dir(source_file)) {
      tiledb_vfs_create_dir(target_file)
    } else if (tiledb_vfs_is_file(source_file)) {
      tiledb_vfs_copy_file(source_file, target_file)
    } else {
      stop(cat(paste0(source_file, " is not a valid file")))
    }
  }
}

dir_b <- file.path(base_path, "dir_b")

copy_dir(dir_a, dir_b)

cat(paste0("Files:\n\n"))
print(sort(tiledb_vfs_ls_recursive(base_path)$path))
Files:

[1] "file:///Users/nickv/tiledb_vfs_r/dir_a"         
[2] "file:///Users/nickv/tiledb_vfs_r/dir_a/file_a"  
[3] "file:///Users/nickv/tiledb_vfs_r/dir_a/file_b"  
[4] "file:///Users/nickv/tiledb_vfs_r/dir_b"         
[5] "file:///Users/nickv/tiledb_vfs_r/dir_b/file_a"  
[6] "file:///Users/nickv/tiledb_vfs_r/dir_b/file_b"  
[7] "file:///Users/nickv/tiledb_vfs_r/tiledb_vfs.bin"

Rename file_b to file_c. The following command also works if you’re moving a file to a different directory without renaming it:

  • Python
  • R
file_c = Path(dir_a) / "file_c"
vfs.move_file(file_b, file_c)

print("Files:\n")
for file in vfs.ls_recursive(base_path):
    print(f"- {file}")
Files:

- file:///Users/nickv/tiledb_vfs_py/tiledb_vfs.bin
- file:///Users/nickv/tiledb_vfs_py/dir_b
- file:///Users/nickv/tiledb_vfs_py/dir_b/file_a
- file:///Users/nickv/tiledb_vfs_py/dir_b/file_b
- file:///Users/nickv/tiledb_vfs_py/dir_a
- file:///Users/nickv/tiledb_vfs_py/dir_a/file_a
- file:///Users/nickv/tiledb_vfs_py/dir_a/file_c
file_c <- file.path(dir_a, "file_c")
invisible(tiledb_vfs_move_file(file_b, file_c))

cat(paste0("Files:\n\n"))
print(tiledb_vfs_ls_recursive(base_path))
Files:

                                              path size
1 file:///Users/nickv/tiledb_vfs_r/tiledb_vfs.bin  448
2          file:///Users/nickv/tiledb_vfs_r/dir_b    0
3   file:///Users/nickv/tiledb_vfs_r/dir_b/file_a    0
4   file:///Users/nickv/tiledb_vfs_r/dir_b/file_b    0
5          file:///Users/nickv/tiledb_vfs_r/dir_a    0
6   file:///Users/nickv/tiledb_vfs_r/dir_a/file_c    0
7   file:///Users/nickv/tiledb_vfs_r/dir_a/file_a    0

You can also move directories. This moves the source directory and all its children recursively to the destination, and the same operation can rename a directory.

  • Python
  • R
dir_c = Path(dir_a) / "dir_c"
vfs.move_dir(dir_b, dir_c)

print("Files:\n")
for file in vfs.ls_recursive(base_path):
    print(f"- {file}")
Files:

- file:///Users/nickv/tiledb_vfs_py/tiledb_vfs.bin
- file:///Users/nickv/tiledb_vfs_py/dir_a
- file:///Users/nickv/tiledb_vfs_py/dir_a/dir_c
- file:///Users/nickv/tiledb_vfs_py/dir_a/dir_c/file_a
- file:///Users/nickv/tiledb_vfs_py/dir_a/dir_c/file_b
- file:///Users/nickv/tiledb_vfs_py/dir_a/file_a
- file:///Users/nickv/tiledb_vfs_py/dir_a/file_c
dir_c <- file.path(dir_a, "dir_c")
invisible(tiledb_vfs_move_dir(dir_b, dir_c))

cat(paste0("Files:\n\n"))
print(tiledb_vfs_ls_recursive(base_path))
Files:

                                                 path size
1     file:///Users/nickv/tiledb_vfs_r/tiledb_vfs.bin  448
2              file:///Users/nickv/tiledb_vfs_r/dir_a    0
3        file:///Users/nickv/tiledb_vfs_r/dir_a/dir_c    0
4 file:///Users/nickv/tiledb_vfs_r/dir_a/dir_c/file_a    0
5 file:///Users/nickv/tiledb_vfs_r/dir_a/dir_c/file_b    0
6       file:///Users/nickv/tiledb_vfs_r/dir_a/file_c    0
7       file:///Users/nickv/tiledb_vfs_r/dir_a/file_a    0

Remove file_c:

  • Python
  • R
if vfs.is_file(file_c):
    vfs.remove_file(file_c)

print("Files:\n")
for file in vfs.ls_recursive(base_path):
    print(f"- {file}")
Files:

- file:///Users/nickv/tiledb_vfs_py/tiledb_vfs.bin
- file:///Users/nickv/tiledb_vfs_py/dir_a
- file:///Users/nickv/tiledb_vfs_py/dir_a/dir_c
- file:///Users/nickv/tiledb_vfs_py/dir_a/dir_c/file_a
- file:///Users/nickv/tiledb_vfs_py/dir_a/dir_c/file_b
- file:///Users/nickv/tiledb_vfs_py/dir_a/file_a
if (tiledb_vfs_is_file(file_c)) {
  invisible(tiledb_vfs_remove_file(file_c))
}

cat(paste0("Files:\n\n"))
print(tiledb_vfs_ls_recursive(base_path))
Files:

                                                 path size
1     file:///Users/nickv/tiledb_vfs_r/tiledb_vfs.bin  448
2              file:///Users/nickv/tiledb_vfs_r/dir_a    0
3        file:///Users/nickv/tiledb_vfs_r/dir_a/dir_c    0
4 file:///Users/nickv/tiledb_vfs_r/dir_a/dir_c/file_a    0
5 file:///Users/nickv/tiledb_vfs_r/dir_a/dir_c/file_b    0
6       file:///Users/nickv/tiledb_vfs_r/dir_a/file_a    0

Remove the base_path directory and all its remaining children:

  • Python
  • R
if vfs.is_dir(base_path):
    vfs.remove_dir(base_path)
if (tiledb_vfs_is_dir(base_path)) {
  tiledb_vfs_remove_dir(base_path)
}

Context and configuration

You can set a context, a configuration, or both on a VFS object. Any configuration object you pass through the config parameter overrides the corresponding VFS settings in the ctx with its updated values.

  • Python
  • R
cfg = tiledb.Config(
    {"vfs.file.posix_file_permissions": "660", "vfs.read_logging_mode": "fragments"},
)
ctx = tiledb.Ctx(cfg)
cfg["vfs.file.posix_directory_permissions"] = "770"
cfg["vfs.read_logging_mode"] = "all_files"

vfs_cfg_ctx = tiledb.VFS(config=cfg, ctx=ctx)

new_cfg = vfs_cfg_ctx.config()

print(
    "vfs.file.posix_directory_permissions:",
    new_cfg["vfs.file.posix_directory_permissions"],
)
print("vfs.file.posix_file_permissions:", new_cfg["vfs.file.posix_file_permissions"])
print("vfs.read_logging_mode:", new_cfg["vfs.read_logging_mode"])
vfs.file.posix_directory_permissions: 770
vfs.file.posix_file_permissions: 660
vfs.read_logging_mode: all_files
cfg <- tiledb_config(
  c("vfs.file.posix_file_permissions" = "660", "vfs.read_logging_mode" = "fragments")
)
ctx <- tiledb_ctx(config = cfg)
cfg["vfs.file.posix_directory_permissions"] <- "770"
cfg["vfs.read_logging_mode"] <- "all_files"

vfs_cfg_ctx <- tiledb_vfs(config = cfg, ctx = ctx)

new_cfg <- config(tiledb_get_context())

cat(paste0("vfs.file.posix_directory_permissions: ", new_cfg["vfs.file.posix_directory_permissions"], "\n"))
cat(paste0("vfs.file.posix_file_permissions: ", new_cfg["vfs.file.posix_file_permissions"], "\n"))
cat(paste0("vfs.read_logging_mode: ", new_cfg["vfs.read_logging_mode"]))
vfs.file.posix_directory_permissions: 770
vfs.file.posix_file_permissions: 660
vfs.read_logging_mode: all_files

Cloud object storage

You can perform the same operations on cloud storage buckets by passing a valid URI. Except for appending data to an existing file, all the previously mentioned methods work on cloud storage buckets the same way they do on files in your local filesystem.

You can check to see if your cloud storage provider is supported:

  • Python
  • R
print("Amazon S3 supported:", vfs.supports("s3"))
print("Microsoft Azure supported:", vfs.supports("azure"))
print("Google Cloud Storage supported:", vfs.supports("gcs"))

try:
    print("Storj supported:", vfs.supports("sj"))
except Exception:
    print("Storj supported: False")
Amazon S3 supported: True
Microsoft Azure supported: True
Google Cloud Storage supported: True
Storj supported: False
cat(paste0("Amazon S3 supported: ", tiledb_is_supported_fs("s3"), "\n"))
cat(paste0("Microsoft Azure supported: ", tiledb_is_supported_fs("azure"), "\n"))
cat(paste0("Google Cloud Storage supported: ", tiledb_is_supported_fs("gcs"), "\n"))

tryCatch(
  {
    cat(paste0("Storj supported: ", tiledb_is_supported_fs("sj")))
  },
  error = function(cond) {
    cat(paste0("Storj supported: FALSE"))
  }
)
Amazon S3 supported: TRUE
Microsoft Azure supported: TRUE
Google Cloud Storage supported: TRUE
Storj supported: FALSE
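Cloud backend behavior is controlled through TileDB configuration parameters. As a hedged sketch for S3-compatible stores, the keys below are standard TileDB config options, while the region and endpoint values are placeholders you would replace for your own deployment:

```python
# Placeholder values; the keys are standard TileDB config parameters.
s3_settings = {
    "vfs.s3.region": "us-east-1",
    # For S3-compatible services such as MinIO:
    "vfs.s3.endpoint_override": "localhost:9000",
    "vfs.s3.scheme": "http",
    "vfs.s3.use_virtual_addressing": "false",
}

# With the tiledb package installed, you would then create the VFS as:
# import tiledb
# vfs = tiledb.VFS(config=tiledb.Config(s3_settings))
print(sorted(s3_settings))
```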

After authenticating with your cloud storage provider's preferred mechanism, you can check whether a bucket exists:

  • Python
  • R
bucket_name = "<cloud_provider_scheme>://<name_of_your_bucket>"
print(f"Bucket {bucket_name} exists: {vfs.is_bucket(bucket_name)}")
bucket_name <- "<cloud_provider_scheme>://<name_of_your_bucket>"
cat(paste0(
    "Bucket ",
    bucket_name,
    " exists: ",
    tiledb_vfs_is_bucket(bucket_name)
))

If the bucket doesn't exist, you can create it, provided you have the appropriate permissions with your cloud storage provider:

  • Python
  • R
if not vfs.is_bucket(bucket_name):
    vfs.create_bucket(bucket_name)
if (!tiledb_vfs_is_bucket(bucket_name)) {
    tiledb_vfs_create_bucket(bucket_name)
}
Warning

You must take extreme care when creating or deleting buckets with the VFS APIs. After creation, a bucket may take some time to “appear” in the system, which causes problems if you create the bucket and immediately try to write a file to it.

Wait some time before trying to write files to the bucket. You can add a polling mechanism with the .is_bucket() method in Python or the tiledb_vfs_is_bucket() function in R to verify TileDB created the bucket successfully.
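A minimal polling helper could look like the sketch below. It is backend-agnostic: in practice the predicate would be `lambda: vfs.is_bucket(bucket_name)`, and the timeout and interval values are arbitrary choices, not TileDB defaults.

```python
import time


def wait_until(predicate, timeout=60.0, interval=2.0):
    """Poll `predicate` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False


# Usage with TileDB (bucket_name as defined later in this tutorial):
# if wait_until(lambda: vfs.is_bucket(bucket_name)):
#     ...  # safe to start writing files

# Self-contained demonstration with a predicate that succeeds on the third call:
calls = {"n": 0}


def fake_check():
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until(fake_check, timeout=5.0, interval=0.01))  # True
```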

After creating a new bucket and verifying it exists, you can check whether the bucket is empty:

  • Python
  • R
print(f"Bucket {bucket_name} is empty: {vfs.is_empty_bucket(bucket_name)}")
bucket_name <- "<cloud_provider_scheme>://<name_of_your_bucket>"
cat(paste0(
    "Bucket ",
    bucket_name,
    " is empty: ",
    tiledb_vfs_is_empty_bucket(bucket_name)
))

You can empty a bucket if you have permission to do so.

Caution

Emptying a bucket will permanently delete all items in that bucket. This is irreversible.

  • Python
  • R
vfs.empty_bucket(bucket_name)
tiledb_vfs_empty_bucket(bucket_name)

With the appropriate permissions, you can also delete a bucket from cloud storage.

Caution

Deleting a bucket is irreversible.

Warning

Deleting a bucket may not take effect immediately. Thus, it may continue to “exist” for some time. You can apply a polling mechanism to check if you deleted the bucket successfully with the .is_bucket() method in Python or the tiledb_vfs_is_bucket() function in R.

  • Python
  • R
vfs.remove_bucket(bucket_name)
tiledb_vfs_remove_bucket(bucket_name)