Concurrency

TileDB eases concurrency with its lock-free, multiple-writer/multiple-reader model.

TileDB is designed with parallel programming in mind. Specifically, scientific computing users may be familiar with using multiprocessing through tools like MPI, Dask, or Spark, or writing multi-threaded programs to improve performance. TileDB enables concurrency by using a lock-free multiple reader/writer model.

Writes

TileDB achieves concurrent writes by having each thread or process create one or more separate fragments for each write operation. No synchronization is needed across processes, and no internal state is shared across threads among the write operations. Thus, no locking is necessary. Regarding the concurrent creation of fragments, TileDB is thread-safe and process-safe, because each thread and process creates a fragment with a unique name that incorporates a UUID and an integer representing the number of milliseconds since the Unix epoch. Therefore, conflicts are virtually impossible, even at the storage backend level.
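The idea of one uniquely named fragment per write can be illustrated with a minimal, standard-library-only sketch. This is not TileDB's implementation or its on-disk format: the helper names (`fragment_name`, `write_fragment`, `concurrent_writes`), the `data.bin` file, and the `<timestamp>_<uuid>` naming pattern are assumptions made for illustration. It only shows why writers that never share a name do not need a lock.

```python
import os
import tempfile
import threading
import time
import uuid


def fragment_name() -> str:
    """Build a hypothetical fragment name from a millisecond timestamp and a UUID."""
    timestamp_ms = time.time_ns() // 1_000_000  # milliseconds since the Unix epoch
    return f"{timestamp_ms}_{uuid.uuid4().hex}"


def write_fragment(array_dir: str, payload: bytes) -> str:
    """Write one payload into its own uniquely named fragment directory."""
    name = fragment_name()
    frag_dir = os.path.join(array_dir, name)
    os.mkdir(frag_dir)  # would raise FileExistsError on a name collision
    with open(os.path.join(frag_dir, "data.bin"), "wb") as f:
        f.write(payload)
    return name


def concurrent_writes(num_writers: int = 8) -> list[str]:
    """Run several writers at once, with no lock around the writes."""
    with tempfile.TemporaryDirectory() as array_dir:
        threads = [
            threading.Thread(
                target=write_fragment, args=(array_dir, f"cell-{i}".encode())
            )
            for i in range(num_writers)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        # Every writer produced its own directory; nothing was overwritten.
        return sorted(os.listdir(array_dir))


if __name__ == "__main__":
    names = concurrent_writes()
    print(len(names), "fragments written without locking")
```

Because each writer touches only its own directory, the writers share no state, which is the property the paragraph above relies on.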

TileDB also supports lock-free, concurrent writes of array metadata. Each write creates a separate array metadata file with a unique name (also incorporating a UUID and integer timestamp). Thus, TileDB avoids name collisions entirely.

Visit Writes for more information.

Reads

During array opening, TileDB loads the array schema and fragment metadata into main memory once and shares them across all array objects that refer to the same array. For the multi-threading case, it is therefore highly recommended that you open the array once outside the atomic block and have all threads create their queries on the same array object. This prevents the scenario where one thread opens the array and closes it before another thread opens it again, and so on. TileDB internally uses a reference count and discards the array schema and fragment metadata each time you close the array and the count reaches zero. TileDB typically caches the schema and the metadata, but in the scenario above it still needs to deserialize them on every reopen. Having all concurrent queries use the same array object eliminates this problem.
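The effect of the reference count can be sketched with a toy model. The class `SharedArrayState` and the functions `open_per_query` and `open_once` are hypothetical stand-ins, not TileDB APIs, and the lock guards only this toy's counter. The sketch counts how often the metadata has to be loaded under each usage pattern.

```python
import threading


class SharedArrayState:
    """Toy stand-in for the schema and fragment metadata shared by open handles."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # protects this toy's counter only
        self._refcount = 0
        self._metadata = None
        self.load_count = 0  # how many times the metadata was (re)loaded

    def open(self) -> dict:
        with self._lock:
            if self._refcount == 0:
                # First open: load (deserialize) schema and fragment metadata.
                self._metadata = {"schema": "...", "fragments": []}
                self.load_count += 1
            self._refcount += 1
            return self._metadata

    def close(self) -> None:
        with self._lock:
            self._refcount -= 1
            if self._refcount == 0:
                # Last close: discard the in-memory state.
                self._metadata = None


def open_per_query(state: SharedArrayState, num_queries: int) -> int:
    """Pattern to avoid: every query opens and closes the array itself."""
    for _ in range(num_queries):
        state.open()
        state.close()  # count drops to zero, so the next open reloads
    return state.load_count


def open_once(state: SharedArrayState, num_threads: int) -> int:
    """Recommended pattern: open once, share the handle, close once."""
    metadata = state.open()  # single open, outside the threaded section

    def run_query() -> None:
        _ = metadata["schema"]  # every thread reads the shared state

    threads = [threading.Thread(target=run_query) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    state.close()
    return state.load_count


if __name__ == "__main__":
    print("loads with open-per-query:", open_per_query(SharedArrayState(), 4))
    print("loads with open-once:", open_once(SharedArrayState(), 4))
```

In this model, opening per query reloads the metadata once per query, while the shared handle loads it a single time regardless of the number of threads.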

Reads in the multiprocessing setting are completely independent, and no locking is required. In the multi-threading scenario, locking is only employed through mutexes when the queries access the tile cache, which incurs a small cost.

Visit Reads for more information.

Mix reads and writes

You can mix concurrent reads and writes. Fragments are not visible until the write query completes successfully (and a .ok file appears). With fragment-based writes, reads see the logical view of the array without any new, incomplete fragments. This multiple writer/multiple reader concurrency model of TileDB is more powerful than competing approaches, such as HDF5’s single writer/multiple reader (SWMR) model. This feature comes with a more relaxed consistency model, described in ACID: Consistency.
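The visibility rule can be sketched as a commit-marker pattern. This is a simplified illustration, not TileDB's actual file layout: placing the `.ok` marker next to the fragment directory, and the helper names `begin_write`, `commit_write`, and `visible_fragments`, are assumptions. Readers in this sketch ignore any fragment whose marker does not exist yet.

```python
import os
import tempfile


def begin_write(array_dir: str, name: str, payload: bytes) -> str:
    """Write fragment data, but do not commit it yet."""
    frag_dir = os.path.join(array_dir, name)
    os.mkdir(frag_dir)
    with open(os.path.join(frag_dir, "data.bin"), "wb") as f:
        f.write(payload)
    return frag_dir


def commit_write(frag_dir: str) -> None:
    """Mark the fragment as complete by creating an empty '.ok' marker."""
    open(frag_dir + ".ok", "w").close()


def visible_fragments(array_dir: str) -> list[str]:
    """Return only the fragments that have a matching '.ok' marker."""
    entries = set(os.listdir(array_dir))
    return sorted(
        e for e in entries if not e.endswith(".ok") and e + ".ok" in entries
    )


def demo() -> tuple[list[str], list[str]]:
    with tempfile.TemporaryDirectory() as array_dir:
        done = begin_write(array_dir, "frag_a", b"complete")
        commit_write(done)
        pending = begin_write(array_dir, "frag_b", b"still writing")
        before = visible_fragments(array_dir)  # frag_b is not committed yet
        commit_write(pending)
        after = visible_fragments(array_dir)  # now both fragments are visible
        return before, after


if __name__ == "__main__":
    before, after = demo()
    print("visible during the write:", before)
    print("visible after the commit:", after)
```

A reader that lists fragments while `frag_b` is still being written sees only `frag_a`, so it never observes a partially written fragment.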

Consolidation

You can perform consolidation in the background in parallel with and independently of other reads and writes. Any active reads are unable to see the consolidated fragment until consolidation completes.

Visit Consolidation for general information about the consolidation process.

Vacuuming

Vacuuming deletes fragments that have been consolidated. Although it can never lead to a corrupted array state, it may cause problems if a read operation tries to access a fragment that TileDB is currently vacuuming. This is possible when you open the array at a timestamp before some consolidation took place, so that the read still relies on a fragment that is being vacuumed. Most likely, that will lead to a segmentation fault or other unexpected behavior.
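The interaction between time traveling and vacuuming can be made concrete with a deliberately simplified model. The `Fragment` class, its timestamp fields, and the selection rule in `fragments_for_read` are assumptions made for illustration; TileDB's real fragment selection has more cases. The sketch shows which fragments a read depends on and which of them a vacuum would remove.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    name: str
    t_start: int  # earliest write timestamp covered by the fragment
    t_end: int  # latest write timestamp covered by the fragment
    consolidated_from: tuple = ()  # names of the fragments merged into this one


def fragments_for_read(fragments, open_timestamp):
    """Pick the fragments a read opened at `open_timestamp` relies on."""
    # Only fragments written entirely at or before the open timestamp qualify.
    eligible = [f for f in fragments if f.t_end <= open_timestamp]
    # A consolidated fragment supersedes the fragments it was built from.
    superseded = {name for f in eligible for name in f.consolidated_from}
    return sorted(f.name for f in eligible if f.name not in superseded)


def vacuum(fragments):
    """Drop every fragment that some consolidated fragment was built from."""
    consolidated = {name for f in fragments for name in f.consolidated_from}
    return [f for f in fragments if f.name not in consolidated]


def missing_after_vacuum(fragments, open_timestamp):
    """Fragments a read still needs that a vacuum would have deleted."""
    needed = fragments_for_read(fragments, open_timestamp)
    remaining = {f.name for f in vacuum(fragments)}
    return [name for name in needed if name not in remaining]


frags = [
    Fragment("f1", 1, 1),
    Fragment("f2", 2, 2),
    Fragment("f3", 3, 3),
    Fragment("f1_3", 1, 3, consolidated_from=("f1", "f2", "f3")),
]

if __name__ == "__main__":
    print("read at timestamp 3 loses:", missing_after_vacuum(frags, 3))
    print("read at timestamp 2 loses:", missing_after_vacuum(frags, 2))
```

In this model a read at the latest timestamp uses only the consolidated fragment and is unaffected, while a read opened at timestamp 2 depends on `f1` and `f2`, which are exactly the fragments the vacuum removes.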

To prevent this scenario, TileDB locks the array during vacuuming. It does so through mutexes in multi-threading and through file locking in multiprocessing (for those storage backends that support it).

Caution

All POSIX-compliant filesystems and Windows filesystems support file locking. Note that Lustre supports POSIX file locking semantics and exposes local-level locking (mounted with -o localflock) and cluster-level locking (mounted with -o flock). For filesystems that do not support file locking, the multiprocessing programs are responsible for synchronizing the concurrent writes.

You must take particular care when vacuuming arrays on cloud object stores that do not have file locking. Without file locking, TileDB has no way to prevent vacuuming from deleting the earlier consolidated fragments. If another process is reading those fragments while vacuuming is deleting them, the read process is likely to throw an error or crash.

Tip

For arrays stored in cloud object stores, avoid running vacuuming at the same time users are time traveling. You are usually safe to vacuum if users are reading the array at the current timestamp, and you are safest to vacuum if no users are actively running queries against the array.

Visit Vacuuming for more information.

Array creation

Array creation (that is, storing the array schema on persistent storage) is neither thread-safe nor process-safe. The TileDB team does not expect a practical scenario where multiple threads or processes attempt to create the same array in parallel. It is recommended that a single thread or process create the array before multiple threads and processes start working concurrently on writes and reads.
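The recommended ordering (create first, then go concurrent) might look like the following standard-library sketch. The `create_array` and `worker` functions and the `schema.json` file are hypothetical stand-ins for storing and loading an array schema; they are not TileDB calls.

```python
import json
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def create_array(array_dir: str, schema: dict) -> None:
    """Store a stand-in schema; called exactly once, by a single thread."""
    os.mkdir(array_dir)  # fails loudly if the array already exists
    with open(os.path.join(array_dir, "schema.json"), "w") as f:
        json.dump(schema, f)


def worker(array_dir: str, worker_id: int) -> str:
    """Workers never create the array; they only use what already exists."""
    with open(os.path.join(array_dir, "schema.json")) as f:
        schema = json.load(f)
    return f"worker {worker_id} sees {schema['dims']}"


def run(num_workers: int = 4) -> list[str]:
    with tempfile.TemporaryDirectory() as root:
        array_dir = os.path.join(root, "my_array")
        # Step 1: create the array before any concurrency starts.
        create_array(array_dir, {"dims": ["rows", "cols"]})
        # Step 2: only now start the concurrent workers.
        with ThreadPoolExecutor(max_workers=num_workers) as pool:
            return list(
                pool.map(lambda i: worker(array_dir, i), range(num_workers))
            )


if __name__ == "__main__":
    for line in run():
        print(line)
```

Keeping creation in the single-threaded setup phase means no worker ever races another to write the schema.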
