Glossary

reference
A complete glossary of terms related to TileDB and all supported solutions.

A

Academy

A learning platform containing all the information (explanation of product features, tutorials, foundational concepts, and more) needed to make you a TileDB superuser and extract maximum value for you and your organization.

Account

A personal namespace created for an individual user in TileDB Cloud. Because this namespace is for personal use, other users cannot be given access to it. By default, each user has a personal namespace.

Aggregate

A function built into TileDB that computes a summary over qualifying data in parallel within TileDB, rather than passing the data to an external system for processing. Examples of aggregates are count, sum, min, max, null count, and mean.
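
As a plain-Python illustration of what these aggregates compute (this is not the TileDB API; the helper name `aggregate`, the use of `None` for nulls, and counting nulls in `count` are assumptions made for this sketch):

```python
def aggregate(values):
    """Compute the aggregates named above over a list that may contain None (null)."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),  # assumption: count includes null cells
        "null_count": len(values) - len(present),
        "sum": sum(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "mean": sum(present) / len(present) if present else None,
    }

result = aggregate([4, None, 10, 1])
```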

API token

Also known as a REST API token, a string of alphanumeric characters and symbols. You use REST API tokens to authenticate to TileDB Cloud as a specific user and perform certain actions, depending on the scope set when you created the token.

Array

A multi-dimensional collection of data values stored in TileDB that allows for complex data structures and efficient access.

Array domain

The hyperspace defined by the dimension domains.

Array metadata

Key-value pairs of arbitrary data users can attach to an array, where the key is a string and the value can be any type.

Array schema

An object that stores details about the array definition, such as the attributes, dimensions, tile capacity and extent, and the tile and cell order.

Asset

Any object in TileDB Cloud, including the following:

  • Arrays
  • Files
  • VCF datasets
  • SOMA experiments
  • Biomedical Imaging datasets
  • Vector Search indexes
  • Notebooks
  • Dashboards
  • UDFs
  • Task Graphs
  • ML Models
  • Groups

Attribute

The values stored within each cell, represented by a key (attribute name) and a value.

B

Binary variant call format (BCF)

Binary version of a VCF file. Note: a BCF file is not a tabix bgzipped VCF file, although it provides the necessary indices for TileDB ingestion.

C

Cell

An element of an array, identified by an ordered tuple of dimension domain values (following the order in which the array dimensions were specified during array creation), called coordinates.

Cell order

The order in which cells within each space tile are written to disk. Cell orders can be row-major or column-major.

Cloud credentials

Credentials used to access resources on S3-compatible object stores. Cloud credentials are configured in your account settings.

Column-major order

An order for tiles and cells where, assuming each tile or cell can be identified by a set of coordinates in the multi-dimensional space, the leftmost coordinate index “varies the fastest”.
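
A minimal sketch of this ordering for a small 2D grid, assuming zero-based `(row, col)` coordinates (the function name `column_major_cells` is hypothetical):

```python
def column_major_cells(num_rows, num_cols):
    """List (row, col) coordinates of a 2D grid in column-major order."""
    # The leftmost index (row) varies fastest, so it is the inner loop.
    return [(row, col) for col in range(num_cols) for row in range(num_rows)]

order = column_major_cells(2, 3)
```

For a 2 x 3 grid, all cells of column 0 come first, then column 1, then column 2.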

Commit

A file TileDB creates to signify the successful creation of a fragment. Commits are eligible for consolidation and vacuuming.

Consolidation

A process in which TileDB combines various fragments, fragment metadata, commits, array metadata, or group metadata within the array into a smaller number of objects for faster reads at the cost of less granularity for time traveling. Consolidation is usually followed by vacuuming and is process-safe (i.e., you can run consolidation while read and write operations are happening on the array without the risk of losing data).

Configuration

A set of customizable key-value pairs in TileDB containing various settings it uses when it opens (creates an instance of) an array. When you pass a configuration object into a context object, and then pass that context object into the method that opens (instantiates) your array, those configuration settings affect the performance and overall functionality of that array instance until you close the array.

Context

An object containing a configuration object that gives TileDB instructions on how to configure the instantiation of an array.

Coordinates

The coordinates of an array cell are an ordered tuple of dimension domain values that identifies the cell. In dense arrays, the coordinates of each cell are unique. In sparse arrays, the same coordinates may appear more than once.

Cosine distance

The distance of two vectors computed by calculating the cosine of the angle between those two vectors. In specific vector spaces, cosine similarity is a measure of vector similarity.

D

Dashboard

An app built on top of a notebook allowing you to visualize your data and analyses.

Data tile

A subset of cell values on a particular attribute. Data tiles are also known as the atomic unit of I/O and compression.

Default context

The context to be used for all operations that accept a context as a parameter for the lifecycle of the program. This means that closing and reopening an array will use the same context and, consequently, the same configuration.

Deletion

In the context of sparse arrays, a deletion is a type of write operation that stores the delete conditions to logically represent that data has been removed from an array, without altering any past fragments. TileDB materializes deletions (permanently deletes the data matching the delete condition) as soon as you run consolidation and vacuuming.

Dense array

A type of TileDB array where all possible data points are stored, allowing for efficient storage and retrieval.

Dimension

Dimensions are data structures that comprise the hyperspace of the array. Each dimension has a name, a data type, and a domain of values (not to be confused with the array domain).

Dimension domain

The range of possible values a dimension can be, applicable to numeric domains.

Distance function

A function that computes the distance between two vectors. Different types of distance functions exist, including the L2 distance function (Euclidean distance function), the cosine distance, and the inner product distance.
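
The three functions named above can be sketched in plain Python as follows. The function names are illustrative, and defining cosine distance as 1 minus the cosine similarity is an assumption (a common convention), not something this glossary specifies:

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a, b):
    """Inner (dot) product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    """Assumption: distance = 1 - cosine similarity; vectors must be non-zero."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)
```

For example, `l2_distance([0, 0], [3, 4])` evaluates to `5.0`, and two perpendicular vectors have a cosine distance of 1 under this convention.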

E

Embedding

A vector generated by a machine learning model from an external input object. Different machine learning models can be used to generate embeddings for different object types (e.g., image, text, audio, or video). Embedding similarity (vector distance) is expected to approximate the similarity of the input objects, so vector search approximates object similarity search.

Empty cell

A cell that contains no attribute data.

Euclidean distance function

Refer to L2 distance function.

Expression quantitative trait loci (eQTL) analysis

A quantitative trait loci (QTL) analysis measures the association between variants and a phenotype. An expression quantitative trait loci (eQTL) analysis is a QTL analysis in which the phenotype is expression.

F

Fill values

Attribute values given to empty cells in a dense array.

Filter

A data transformation on an attribute or dimension, such as compression or encryption, that TileDB applies to data tiles before it writes those data tiles to disk.

FLAT

A straightforward algorithm implementation for Vector Search used to provide exact vector similarity search by computing the distance of the query vector and all the dataset vectors.
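
A minimal brute-force sketch of this idea in plain Python, assuming Euclidean distance and vectors stored as tuples (the name `flat_search` is hypothetical and not part of any TileDB API):

```python
import math

def flat_search(query, vectors, k):
    """Return the indices of the k vectors closest to `query` (exact, brute force)."""
    distances = [(math.dist(query, v), i) for i, v in enumerate(vectors)]
    distances.sort()  # sorts by distance, then by index for ties
    return [i for _, i in distances[:k]]

dataset = [(0.0, 0.0), (5.0, 5.0), (1.0, 1.0), (9.0, 0.0)]
nearest = flat_search((0.9, 1.2), dataset, k=2)
```

Every dataset vector is compared with the query, which is why the result is exact but the cost grows linearly with the dataset size.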

Fragment

A timestamped portion of data within a TileDB array that represents a snapshot or subset of the array’s content at a specific time.

Fragment metadata

System-specific, non-user-editable information about a fragment, including whether the fragment is dense or sparse, the non-empty domain of the fragment, the tile offsets, and the tile sizes. For sparse fragments, the fragment metadata includes an R-tree of the fragment data.

Frontier data

Data for bleeding-edge scientific applications that will power current and future generations of scientific discovery. Today, this includes population genomics, biomedical imaging, proteomics, metabolomics, single-cell, and spatial transcriptomics, all data that wasn’t collectable even 50 years ago. In the future, it will include brand new -omics fields and new areas of discovery.

G

Genome-wide association study (GWAS)

Genome-wide association studies identify genetic variants associated with complex traits or diseases across the entire genome.

Genomic VCF (gVCF)

A genomic VCF (gVCF) file, usually limited to one sample, is a VCF file containing variant positions; reference spans, which define long genomic intervals where the sample matches the reference sequence; and no-call spans, where the sequencing depth is too low to make a confident genotype call. TileDB-VCF recognizes these gVCF reference spans during queries and export.

Global cell order

A mapping from the multi-dimensional cell space to the 1-dimensional physical storage space (i.e., it is the order in which TileDB stores cell values on disk). It comprises the tile order and cell order.
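
To illustrate how the tile order and cell order combine, here is a sketch for a 4 x 4 dense array with 2 x 2 tiles. The zero-based coordinates, the row-major choice for both orders, and the function name `global_position` are assumptions made for this example:

```python
def global_position(row, col, num_cols=4, tile_rows=2, tile_cols=2):
    """1D position of cell (row, col) with row-major tile order and cell order."""
    tiles_per_row = num_cols // tile_cols
    tile_index = (row // tile_rows) * tiles_per_row + (col // tile_cols)  # tile order
    cell_index = (row % tile_rows) * tile_cols + (col % tile_cols)        # cell order
    return tile_index * (tile_rows * tile_cols) + cell_index

# Sort all cells of the array by their position in the global order.
cells = sorted(((r, c) for r in range(4) for c in range(4)),
               key=lambda rc: global_position(*rc))
```

The first four entries of `cells` are the cells of the top-left tile, followed by the four cells of the top-right tile, and so on.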

Group

An asset within TileDB that creates a logical, hierarchical storage of other TileDB assets, including other groups. Groups are advantageous for objects that may exist in cloud object stores, which have no concept of a directory.

I

Incomplete query

A query where the result size of a subarray is larger than the allocated buffers that will hold the result. TileDB gracefully handles this case via result estimation and subarray partitioning.

Inner product distance

The distance between two vectors computed by the inner product of two vectors. Also known as dot product, it is a measure of vector similarity in specific vector spaces.

Internal allele frequency (IAF)

The frequency of an allele observed in a TileDB variant store, taking into account reference calls, no-calls, and polyallelic sites.

Interoperability

The capability of different software systems and tools to work together seamlessly, often facilitated by standardized data formats and APIs.

IVF_FLAT

A Vector Search algorithm implementation based on \(k\)-means clustering, where TileDB computes \(k\) separate clusters (as well as their centroids) of the dataset vectors, shuffling them in such a way that vectors appear adjacent on storage. Answering a query involves focusing only on a small number of clusters, based on the query’s proximity to their centroids.
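
The query side of this approach can be sketched as follows. The sketch skips the \(k\)-means training step and takes the centroids as given; the function names and the `nprobe` parameter (the number of clusters to search) are hypothetical and do not describe the TileDB implementation:

```python
import math

def build_partitions(vectors, centroids):
    """Assign each vector index to its nearest centroid (one list per cluster)."""
    partitions = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: math.dist(v, centroids[c]))
        partitions[nearest].append(i)
    return partitions

def ivf_search(query, vectors, centroids, partitions, k, nprobe):
    """Search only the `nprobe` clusters whose centroids are closest to the query."""
    by_proximity = sorted(range(len(centroids)),
                          key=lambda c: math.dist(query, centroids[c]))
    candidates = [i for c in by_proximity[:nprobe] for i in partitions[c]]
    return sorted(candidates, key=lambda i: math.dist(query, vectors[i]))[:k]

vectors = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centroids = [(0.33, 0.33), (10.33, 10.33)]  # assumed output of a k-means step
partitions = build_partitions(vectors, centroids)
result = ivf_search((9.5, 10.0), vectors, centroids, partitions, k=2, nprobe=1)
```

With `nprobe=1`, only the three vectors in the second cluster are compared with the query, which is where the speedup over an exhaustive search comes from.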

L

L2 distance function

Also known as Euclidean distance, the length of the line segment between two vectors. It can be calculated from the Cartesian coordinates of the points using the Pythagorean theorem.

Large language model (LLM)

A Machine Learning model that takes input data, typically in the form of a request in natural language, and outputs a response in natural language.

M

Machine Learning (ML)

A subfield of artificial intelligence (AI) focused on developing algorithms that allow computers to learn and improve from experience without being explicitly programmed, imitating intelligent human behavior.

Marketplace

The Marketplace is a collection of publicly available assets uploaded to TileDB by other TileDB users. You can view high-level details about each asset without an account, but you must be logged in to TileDB in order to take actions on these assets or use them programmatically. Users who publish items on the Marketplace have the option to monetize any data and code they publish.

Member

A TileDB user that belongs to an organizational namespace.

ML models

Algorithms or mathematical representations that enable computers to learn from and make predictions or decisions based on data.

Multi-Modal Omics

Integration of various omics data types (genomics, proteomics, metabolomics, etc.) to provide a comprehensive understanding of biological systems.

Multi-Modal Support

The capability of TileDB to handle different data types and formats within a single database system, providing flexibility for various use cases.

Multi-range subarray

A subarray comprised of multiple ranges (subsets represented as hyper-rectangles of the array). Multi-range subarrays are only applicable to reads.

N

N+1 problem

The N+1 problem refers to a long-standing problem in using flat tables when columns introduce new rows. In VCFs, the N+1 problem describes the problem when a new sample is added to a cohort represented by a project VCF. In this situation, any new variants introduced by the new sample must be interrogated among all the existing samples to reconstruct this pVCF.

Namespace

A mechanism in TileDB Cloud where you can control who has access to specific resources and assets and who can use compute resources. TileDB offers two types of namespaces: individual namespaces, known as accounts, and organizational namespaces.

No-call

A no-call refers to a position in a VCF, denoted by ./. in the GT field, where there is not adequate depth to make a genotype call.

Non-empty domain

The minimum bounding hyper-rectangle of an array that tightly encompasses all non-empty cells in the array.
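
A minimal sketch of how such a bounding hyper-rectangle follows from the coordinates of the non-empty cells (the function name and the sample coordinates are made up for illustration):

```python
def non_empty_domain(coords):
    """Return one (min, max) pair per dimension covering all given cell coordinates."""
    return [(min(axis), max(axis)) for axis in zip(*coords)]

written_cells = [(2, 7), (5, 3), (4, 9)]
ned = non_empty_domain(written_cells)
```

Here the result is `[(2, 5), (3, 9)]`: the first dimension spans 2 to 5 and the second spans 3 to 9.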

Normalization

The process of adjusting data to remove technical variation, making different datasets comparable.

Notebook

A container object designed by Project Jupyter that stores both Markdown and code (typically Python or R, but other languages are also supported). Notebooks allow you to document, execute, and share analyses on your array data. Notebooks form the basis of dashboards.

Notebook server

A JupyterLab instance spun up by TileDB Cloud for you to run your Jupyter notebooks without having to manually set up servers and deploy JupyterLab.

Nullable attribute

An attribute that accepts a null value. This is different from an empty cell, where the attribute is not present.

O

Omics

A field of study in biology that involves the comprehensive analysis of biological molecules, including genomics, proteomics, transcriptomics, and metabolomics.

Organization

A namespace that allows a team of users to join. An individual user creates this type of namespace. An organization can contain zero or more members.

P

Phenome-wide association studies (PheWAS)

Phenome-wide association studies explore associations between genetic variants and a wide range of phenotypes, providing a comprehensive view of genetic influences.

Polygenic risk scores (PRS)

Polygenic risk scores aggregate multiple genetic variants to predict an individual’s susceptibility to certain traits or diseases, facilitating personalized risk assessment in healthcare.

Population VCF (pVCF)

A population or project VCF (pVCF) is usually a multi-sample VCF without reference/no-call spans.

PQ (Product Quantization)

Product quantization (PQ) is a form of lossy data compression for vectors. PQ reduces the memory footprint of a vector index but also reduces the search result accuracy.
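
As a rough sketch of why PQ is lossy, the example below splits a vector into subvectors and replaces each one with the index of its nearest codeword. The codebooks are hard-coded here (in practice they would be learned from data), and all names are hypothetical:

```python
import math

def pq_encode(vector, codebooks):
    """Encode a vector as one codeword index per subvector."""
    sub_len = len(vector) // len(codebooks)
    codes = []
    for m, codebook in enumerate(codebooks):
        sub = tuple(vector[m * sub_len:(m + 1) * sub_len])
        codes.append(min(range(len(codebook)),
                         key=lambda j: math.dist(sub, codebook[j])))
    return codes

def pq_decode(codes, codebooks):
    """Rebuild an approximation of the original vector from its codes."""
    return [x for code, codebook in zip(codes, codebooks) for x in codebook[code]]

codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],  # codewords for dimensions 0-1
    [(0.0, 5.0), (5.0, 0.0)],  # codewords for dimensions 2-3
]
codes = pq_encode([0.9, 1.2, 4.0, 1.0], codebooks)
approx = pq_decode(codes, codebooks)
```

The four floating-point values are stored as two small integers, and decoding yields only an approximation of the original vector, which is the accuracy trade-off described above.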

Q

Query

A request made to a TileDB array that either reads data from the array, writes new data to the array, or (for sparse arrays only) deletes data from the array.

Query condition

A logical expression on an attribute or a dimension, applied to a query to limit the data accessed by the query.

R

R-tree

A data structure used by TileDB for multi-dimensional indexing in sparse fragments. It allows for fast pruning of irrelevant data during reading.
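
The pruning idea can be sketched with a single level of minimum bounding rectangles (MBRs); a real R-tree nests such rectangles hierarchically. The tile names, MBR values, and the `overlaps` helper are hypothetical:

```python
def overlaps(mbr, query):
    """True if two hyper-rectangles, given as per-dimension (low, high) pairs, intersect."""
    return all(lo <= q_hi and q_lo <= hi
               for (lo, hi), (q_lo, q_hi) in zip(mbr, query))

# Hypothetical MBRs of three data tiles in a 2D sparse fragment.
tile_mbrs = {
    "tile_0": [(1, 4), (1, 4)],
    "tile_1": [(5, 9), (1, 3)],
    "tile_2": [(2, 6), (7, 9)],
}
query = [(3, 5), (2, 3)]
tiles_to_read = [name for name, mbr in tile_mbrs.items() if overlaps(mbr, query)]
```

Only tiles whose MBR intersects the query rectangle need to be fetched; `tile_2` is skipped without reading any of its cells.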

Reinforcement learning models

ML models that learn by interacting with an environment, receiving rewards or penalties based on actions taken.

Role-Based Access Control (RBAC)

A security mechanism that restricts system access to authorized users based on their role within the organization.

Row-major order

An order for tiles and cells where, assuming each tile or cell can be identified by a set of coordinates in the multi-dimensional space, the rightmost coordinate index “varies the fastest”.

S

Scalability

The ability of software to handle increasing amounts of data or number of users without performance degradation.

Schema

A definition of the structure of an array, including its dimensions, attributes, and their data types.

Schema evolution

The act of modifying the array’s schema after the array has been created.

Semi-supervised learning models

ML models that use both labeled and unlabeled data. Typically, semi-supervised learning models use a small amount of labeled data and a large amount of unlabeled data.

Single-range subarray

A single subset of the array represented as a hyper-rectangle of the array.

Slice

A set of tuples corresponding to start and end values of each of the dimensions of an array that defines a subset of that array.
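
A small sketch of selecting cells with such a set of tuples, assuming that both the start and end value of each range are inclusive (the data and the `in_slice` helper are made up for illustration):

```python
def in_slice(coords, ranges):
    """True if every coordinate falls inside its dimension's (start, end) range."""
    # Assumption: both ends of each range are inclusive.
    return all(start <= c <= end for c, (start, end) in zip(coords, ranges))

cells = {(1, 1): "a", (2, 3): "b", (4, 2): "c", (3, 4): "d"}
slice_ranges = [(1, 3), (2, 4)]  # rows 1-3, columns 2-4
selected = {xy: v for xy, v in cells.items() if in_slice(xy, slice_ranges)}
```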

SOMA

Stack Of Matrices, Annotated. This includes the main assay (X counts) matrix, as well as additional annotations on the assay matrix.

Space tile

The tile defined by the tile extents of each dimension. In dense arrays, space tiles have a one-to-one correspondence with data tiles. Space tiles in sparse arrays, however, can have empty cells. Thus, space tiles don’t have the same one-to-one correspondence to data tiles in sparse arrays.

Sparse array

A TileDB array that only stores non-empty data points, optimizing space and retrieval for data that is not uniformly populated.

Storage paths

Paths within your S3 object store where TileDB Cloud will save your assets. You have the option to set granular storage paths for the following assets:

  • Arrays
  • Notebooks
  • UDFs
  • ML models
  • Files
  • Groups
  • Task graphs

Subarray

A slice of an array. Subarrays can be single-range or multi-range and are applicable to both dense and sparse arrays.

Supervised learning models

ML models that train on labeled data, meaning that each training example is paired with an output label.

T

Task

An arbitrary computation on TileDB. Tasks can be a generic function, an array UDF, a serverless SQL query, or a local function.

Task graph

Also known as a pipeline, a mechanism containing a set of pre-defined, serialized or parallel, synchronous or asynchronous tasks on TileDB. A task graph contains one or more tasks.
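
To illustrate how dependencies between tasks determine what can run in parallel, the sketch below orders a hypothetical set of tasks with Python's standard `graphlib` module (Python 3.9 or later). It does not use the TileDB Cloud task graph API, and the task names are invented:

```python
from graphlib import TopologicalSorter

# Hypothetical tasks; each entry maps a task to the tasks it depends on.
dependencies = {
    "ingest": set(),
    "normalize": {"ingest"},
    "statistics": {"ingest"},
    "report": {"normalize", "statistics"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()
stages = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # tasks in one stage could run in parallel
    stages.append(ready)
    sorter.done(*ready)
```

In this example, `normalize` and `statistics` land in the same stage because both depend only on `ingest`, while `report` must wait for both.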

Task log

A log of all tasks run by any user belonging to the current namespace. The task log includes information such as the action type, the user who launched the task, and the associated task graph.

Task graph log

A log of all task graphs run by any user belonging to the current namespace. The task graph log includes the name of the task graph, the namespace from where this task graph was launched, who launched the task, the duration of the task graph since it was launched, the start time, the number of tasks, the type of task graph, and the task graph ID.

Technical performance

Evaluating software based on speed, accuracy, and robustness, including how well it handles large datasets and complex analyses.

Throughput

The amount of data a system can process in a given time period, important for high-volume multi-omics analysis.

Tile

A chunk of data within a TileDB array used to facilitate efficient read and write operations by grouping cells.

Tile extent

The number of cells along a specific dimension of an array that can be stored in a tile.
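
For instance, the space tile that contains a given coordinate along one dimension can be computed from the dimension's domain start and tile extent. This is a sketch with a hypothetical helper name, assuming an integer dimension:

```python
def space_tile_index(coord, domain_start, tile_extent):
    """Index of the space tile along one dimension that contains `coord`."""
    return (coord - domain_start) // tile_extent

# Dimension with domain [1, 8] and tile extent 4: tiles cover [1, 4] and [5, 8].
indices = [space_tile_index(c, domain_start=1, tile_extent=4) for c in range(1, 9)]
```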

Tile order

The order in which space tiles are stored on disk. The tile order can be row-major or column-major.

TileDB Cloud

The commercial product offering, built by the TileDB team.

Time traveling

A feature of TileDB that allows users to read different facets of an array at different points in time.
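
One way to picture time traveling is to model an array as a list of timestamped fragments and to read only the fragments written at or before a chosen timestamp. The model below, including the rule that later writes override earlier ones at the same coordinates, is an assumption made for this sketch and not the TileDB API:

```python
def read_at(fragments, timestamp):
    """Merge all fragments written at or before `timestamp`; later writes win."""
    view = {}
    for ts, cells in sorted(fragments, key=lambda f: f[0]):
        if ts <= timestamp:
            view.update(cells)
    return view

# Hypothetical fragments: (timestamp, {coordinates: value}).
fragments = [
    (1, {(1, 1): 10, (1, 2): 20}),
    (2, {(1, 2): 25}),
    (3, {(2, 2): 30}),
]
```

Reading at timestamp 1 returns the original value 20 for cell `(1, 2)`, while reading at timestamp 2 or later returns the updated value 25.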

U

Unsupervised learning models

ML models that train on unlabeled data. The goal of unsupervised learning models is to find hidden patterns or intrinsic structures in the data.

URI

A Uniform Resource Identifier used to specify the location of a TileDB array in persistent storage.

User-defined function (UDF)

A packaged piece of code you wish to reuse across your notebooks or task graphs on TileDB Cloud.

V

Vacuuming

A process in TileDB, usually run after consolidation, that deletes any fragments, fragment metadata, commits, array metadata, or group metadata (depending on the vacuuming mode chosen) that TileDB consolidated, in an attempt to save space on disk. Vacuuming is not process-safe.

Vamana

The Vamana vector search algorithm is an efficient method for nearest neighbor search in high-dimensional vector spaces, which constructs a graph where each node represents a vector and edges connect to its nearest neighbors. It utilizes a greedy search strategy on this graph to quickly locate the approximate nearest neighbors of a query vector.
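
The greedy search step can be sketched on a tiny hand-built graph. This is a simplification: it tracks a single current node rather than a list of candidates, it does not cover how the graph is constructed, and the graph, vectors, and function name are hypothetical:

```python
import math

def greedy_search(graph, vectors, start, query):
    """Walk the graph, always moving to the neighbor closest to the query."""
    current = start
    while True:
        best = min(graph[current], key=lambda n: math.dist(vectors[n], query))
        if math.dist(vectors[best], query) >= math.dist(vectors[current], query):
            return current  # no neighbor is closer: a local (approximate) answer
        current = best

vectors = {0: (0.0, 0.0), 1: (2.0, 0.0), 2: (4.0, 1.0), 3: (6.0, 0.0)}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # hypothetical neighbor lists
found = greedy_search(graph, vectors, start=0, query=(5.5, 0.2))
```

Starting from node 0, the walk visits nodes 1, 2, and 3 and stops at node 3, because its only neighbor is farther from the query.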

Variable-sized attribute

An attribute whose value in each cell can hold more than one piece of data, with a length that can vary from cell to cell. TileDB supports two types of variable-sized attributes: lists of objects and strings.

Vector

A point in a \(d\)-dimensional vector space. This can be represented as a one-dimensional array of length \(d\), containing values of a specific datatype.

Vector Search

Also known as similarity search or nearest neighbor search, Vector Search involves finding vectors in a dataset that are similar to a given query based on a distance function.

Vector space

A multi-dimensional space with \(d\) dimensions.
