Concurrency
TileDB is designed with parallel programming in mind. Scientific computing users may be familiar with improving performance through multiprocessing tools such as MPI, Dask, or Spark, or by writing multi-threaded programs. TileDB enables this kind of concurrency by using a lock-free, multiple-writer/multiple-reader model.
Writes
TileDB achieves concurrent writes by having each thread or process create one or more separate fragments per write operation. No synchronization is needed across processes, and the write operations share no internal state across threads, so no locking is necessary. Concurrent fragment creation is both thread-safe and process-safe, because each thread or process creates a fragment with a unique name that incorporates a UUID and an integer representing the number of milliseconds since the Unix epoch. Name conflicts are therefore virtually impossible, even at the storage backend level.
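To illustrate, here is a minimal sketch (in Python, using the tiledb package) of two processes writing disjoint regions of the same dense array; the array URI, attribute dtype, and slice bounds are illustrative assumptions.

```python
# Minimal sketch (hypothetical array URI and bounds): two processes
# write disjoint regions of the same 1-D dense array. Each write
# produces an independent, uniquely named fragment, so no coordination
# between the processes is needed.
from multiprocessing import Process

import numpy as np
import tiledb

ARRAY_URI = "my_array"  # illustrative; assumes the array already exists
                        # with a single int32 attribute

def write_slice(start, end):
    # Each process opens the array independently; the completed write
    # becomes a new fragment named with a UUID plus a millisecond
    # timestamp, so names never collide.
    with tiledb.open(ARRAY_URI, mode="w") as arr:
        arr[start:end] = np.arange(start, end, dtype=np.int32)

if __name__ == "__main__":
    procs = [Process(target=write_slice, args=(0, 50)),
             Process(target=write_slice, args=(50, 100))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```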
TileDB also supports lock-free, concurrent writes of array metadata. Each write creates a separate array metadata file with a unique name (also incorporating a UUID and integer timestamp). Thus, TileDB avoids name collisions entirely.
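A similar sketch for concurrent array metadata writes, assuming the same hypothetical array; each worker's write lands in its own uniquely named metadata file:

```python
# Sketch of concurrent array-metadata writes: each call creates a
# separate, uniquely named metadata file, so workers never collide.
import tiledb

def write_meta(worker_id):
    with tiledb.open("my_array", mode="w") as arr:  # illustrative URI
        arr.meta[f"worker_{worker_id}_status"] = "done"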
Visit Writes for more information.
Reads
During array opening, TileDB loads the array schema and fragment metadata into main memory once and shares them across all array objects referring to the same array. For the multi-threading case, it is therefore highly recommended that you open the array once, outside the parallel region, and have all threads create their queries on the same array object. This avoids the scenario where one thread opens the array, then closes it before another thread opens it again, and so on. TileDB internally employs a reference-count system: each time you close the array and the reference count reaches zero, the array schema and fragment metadata are discarded. Although TileDB typically caches the schema and metadata, it still needs to deserialize them in that scenario. Having all concurrent queries use the same array object eliminates this overhead.
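The recommended pattern might look like the following sketch, which opens a hypothetical array once and shares the resulting object across a thread pool; the URI and slice bounds are illustrative:

```python
# Sketch of the recommended multi-threaded read pattern: open the array
# once, then issue all queries against that single Array object, so the
# schema and fragment metadata are loaded and deserialized only once.
from concurrent.futures import ThreadPoolExecutor

import tiledb

def read_slice(arr, start, end):
    return arr[start:end]  # each call issues an independent read query

with tiledb.open("my_array", mode="r") as arr:
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(read_slice, arr, i * 25, (i + 1) * 25)
                   for i in range(4)]
        results = [f.result() for f in futures]
```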
Reads in the multiprocessing setting are completely independent and require no locking. In the multi-threading scenario, locking (via mutexes) is employed only when queries access the tile cache, which incurs a small cost.
Visit Reads for more information.
Mix reads and writes
You can mix concurrent reads and writes. Fragments are not visible to reads unless the write query completes successfully (and a .ok file appears). With fragment-based writes, reads see the logical view of the array without any new, incomplete fragments. This multiple-writer/multiple-reader concurrency model of TileDB is more powerful than competing approaches, such as HDF5's single-writer/multiple-reader (SWMR) model. This feature comes with a more relaxed consistency model, described in ACID: Consistency.
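If you want to observe which fragments are visible to readers, a sketch like the following (using tiledb.array_fragments, with an illustrative URI) lists only the fragments whose writes completed successfully:

```python
# Sketch: list the fragments currently visible to readers. Only
# fragments whose writes completed successfully (i.e., whose .ok file
# exists) appear in this listing.
import tiledb

for frag in tiledb.array_fragments("my_array"):
    print(frag.uri, frag.timestamp_range)
```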
Consolidation
You can perform consolidation in the background in parallel with and independently of other reads and writes. Any active reads are unable to see the consolidated fragment until consolidation completes.
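For example, a background process could run a sketch like the following while readers and writers continue to operate; the URI and the consolidation mode are illustrative:

```python
# Sketch: consolidate fragments from a background process while other
# readers and writers keep operating on the array.
import tiledb

config = tiledb.Config({"sm.consolidation.mode": "fragments"})
tiledb.consolidate("my_array", config=config)  # illustrative URI
```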
Visit Consolidation for general information about the consolidation process.
Vacuuming
Vacuuming deletes fragments that have been consolidated. Although it can never lead to a corrupted array state, it may cause problems if a read operation tries to access a fragment that TileDB is currently vacuuming. This can happen when you open the array at a timestamp before some consolidation took place, which directs the read to the very fragments that vacuuming is about to delete. Most likely, that will lead to a segmentation fault or other unexpected behavior.
To prevent this scenario, TileDB locks the array upon vacuuming. It achieves this through mutexes in the multi-threading case, and through file locking in the multiprocessing case (for those storage backends that support it).
All POSIX-compliant filesystems and Windows filesystems support file locking. Note that Lustre supports POSIX file locking semantics and exposes both local-level locking (mounted with -o localflock) and cluster-level locking (mounted with -o flock). For filesystems that do not support file locking, the multiprocessing programs are responsible for synchronizing the concurrent writes.
You must take particular care when vacuuming arrays on cloud object stores that do not support file locking. Without file locking, TileDB has no way to prevent vacuuming from deleting the earlier consolidated fragments. If another process is reading those fragments while vacuuming is deleting them, the read process is likely to throw an error or crash.
For arrays stored in cloud object stores, avoid running vacuuming at the same time users are time traveling. You are usually safe to vacuum if users are reading the array at the current timestamp, and you are safest to vacuum if no users are actively running queries against the array.
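A minimal vacuuming sketch, assuming an illustrative URI and that you have verified no readers are time traveling:

```python
# Sketch: vacuum fragments that an earlier consolidation subsumed. Run
# this only when you know no readers are time traveling to timestamps
# before that consolidation.
import tiledb

tiledb.vacuum("my_array")  # illustrative URI
```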
Visit Vacuuming for more information.
Array creation
Array creation (that is, storing the array schema on persistent storage) is neither thread-safe nor process-safe. The TileDB team does not expect a practical scenario where multiple threads or processes attempt to create the same array in parallel. It is recommended that a single thread in a single process create the array before multiple threads and processes start working concurrently on writes and reads.
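For example, a setup step along these lines (with an illustrative schema and URI) creates the array exactly once before any concurrent work begins:

```python
# Sketch: create the array exactly once, from a single thread in a
# single process, before any concurrent readers or writers start.
import numpy as np
import tiledb

dom = tiledb.Domain(
    tiledb.Dim(name="d", domain=(0, 99), tile=25, dtype=np.int32))
schema = tiledb.ArraySchema(
    domain=dom, attrs=[tiledb.Attr(name="a", dtype=np.int32)])
tiledb.Array.create("my_array", schema)  # run once, not concurrently
# ...only now launch concurrent reader/writer threads or processes.
```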