VCF Command Line Interface

Subcommands:

  • consolidate - Consolidate TileDB-VCF dataset
  • create - Creates an empty TileDB-VCF dataset
  • delete - Delete samples from a TileDB-VCF dataset
  • export - Exports data from a TileDB-VCF dataset
  • list - Lists all sample names present in a TileDB-VCF dataset
  • stat - Prints high-level statistics about a TileDB-VCF dataset
  • store - Ingests samples into a TileDB-VCF dataset
  • vacuum - Vacuum TileDB-VCF dataset

Consolidate

Description

Consolidate TileDB-VCF dataset

Usage

tiledbvcf utils consolidate [OPTIONS] SUBCOMMAND

Options

  -h,--help                             Print this help message and exit

Subcommands

  commits                               Consolidate TileDB-VCF dataset commits
  fragments                             Consolidate TileDB-VCF dataset fragments
  fragment_meta                         Consolidate TileDB-VCF dataset fragment metadata

Create

Description

Creates an empty TileDB-VCF dataset

Usage

tiledbvcf create [OPTIONS]

Options

  -u,--uri TEXT REQUIRED                TileDB-VCF dataset URI
  -a,--attributes TEXT=[] ... Excludes: --vcf-attributes
                                        INFO and/or FORMAT field names (comma-delimited) to store as separate attributes.
                                        Names should be 'fmt_X' or 'info_X' for a field name 'X' (case sensitive).
  -v,--vcf-attributes TEXT Excludes: --attributes
                                        Create separate attributes for all INFO and FORMAT fields in the provided VCF file.
  -g,--anchor-gap UINT=1000             Anchor gap size to use
  -n,--no-duplicates                    Do not allow records with duplicate start positions to be written to the array.
  --compress-sample-dim,--no-compress-sample-dim{false}
                                        Enable/disable compression of the sample dimension. Enabled by default.

Ingestion task options

  --enable-allele-count,--disable-allele-count{false}
                                        Enable/disable allele count array creation. Enabled by default.
  --enable-variant-stats,--disable-variant-stats{false}
                                        Enable/disable variant stats array creation. Enabled by default.

TileDB options

  -c,--tile-capacity UINT=10000         Tile capacity to use for the array schema
  --tiledb-config TEXT=[] ...           CSV string of the format 'param1=val1,param2=val2...' specifying optional TileDB
                                        configuration parameter settings.
  --checksum ENUM:value in {md5->md5,none->none,sha256->sha256} OR {md5,none,sha256}=sha256
                                        Checksum to use for dataset validation on read and writes.

Debug options

  --log-level TEXT:{fatal,error,warn,info,debug,trace}=fatal
                                        Log message level
  --log-file TEXT                       Log message output file
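
The 'fmt_X' / 'info_X' naming rule for --attributes can be sketched as follows. This is a minimal illustration rather than an official recipe: the dataset URI and the field names (GT, DP, AF) are hypothetical placeholders, and the create command runs only if tiledbvcf is found on the PATH.

```shell
# Hypothetical dataset location and VCF field names; substitute your own.
DATASET_URI="./my_vcf_dataset"
FORMAT_FIELDS="GT DP"
INFO_FIELDS="AF"

# Build the comma-delimited attribute list: FORMAT fields get the
# 'fmt_' prefix, INFO fields get the 'info_' prefix (case sensitive).
ATTRS=""
for f in $FORMAT_FIELDS; do ATTRS="${ATTRS:+$ATTRS,}fmt_$f"; done
for f in $INFO_FIELDS; do ATTRS="${ATTRS:+$ATTRS,}info_$f"; done
echo "attributes: $ATTRS"

# Create the empty dataset, only if the CLI is installed.
if command -v tiledbvcf >/dev/null 2>&1; then
  tiledbvcf create --uri "$DATASET_URI" --attributes "$ATTRS" \
    --anchor-gap 1000 --tile-capacity 10000 --log-level info
fi
```

The --anchor-gap and --tile-capacity values shown are simply the documented defaults, written out to make them visible.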

Delete

Description

Delete samples from a TileDB-VCF dataset

Usage

tiledbvcf delete [OPTIONS]

Options

  -u,--uri TEXT REQUIRED                TileDB-VCF dataset URI
  -s,--sample-names TEXT=[] ...         CSV list of sample names to delete
  --tiledb-config TEXT=[] ...           CSV string of the format 'param1=val1,param2=val2...' specifying optional TileDB
                                        configuration parameter settings.
  --log-level TEXT:{fatal,error,warn,info,debug,trace}=fatal
                                        Log message level
  --log-file TEXT                       Log message output file
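
Because --sample-names expects a CSV list, a file with one name per line has to be joined with commas first. The sketch below shows one way to do that; the dataset URI and the sample names are hypothetical, and the delete command runs only if tiledbvcf is installed.

```shell
# Hypothetical inputs: a dataset URI and a file with one sample name per line.
DATASET_URI="./my_vcf_dataset"
NAMES_FILE="$(mktemp)"
printf '%s\n' "sampleA" "sampleB" > "$NAMES_FILE"

# Join the lines into the CSV form that --sample-names expects.
SAMPLES_CSV="$(paste -s -d , "$NAMES_FILE")"
echo "deleting: $SAMPLES_CSV"

# Delete the samples, only if the CLI is installed.
if command -v tiledbvcf >/dev/null 2>&1; then
  tiledbvcf delete --uri "$DATASET_URI" --sample-names "$SAMPLES_CSV"
fi
rm -f "$NAMES_FILE"
```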

Export

Description

Exports data from a TileDB-VCF dataset

Usage

tiledbvcf export [OPTIONS]

Options

  -u,--uri TEXT REQUIRED                TileDB-VCF dataset URI

Output options

  -O,--output-format ENUM:value in {b->b,t->t,u->u,v->v,z->z} OR {b,t,u,v,z}=b
                                        Export format. Options are: 'b': bcf (compressed); 'u': bcf (uncompressed); 'z': vcf.gz;
                                        'v': vcf; 't': TSV
  -o,--output-path TEXT                 [TSV or combined VCF export only] The name of the output file.
  -m,--merge Needs: --output-path       Export combined VCF file.
  -t,--tsv-fields TEXT=[] ...           [TSV export only] An ordered CSV list of fields to export in the TSV. A field name
                                        can be one of 'SAMPLE', 'ID', 'REF', 'ALT', 'QUAL', 'POS', 'CHR', 'FILTER'. Additionally,
                                        INFO fields can be specified by 'I:<name>' and FMT fields with 'F:<name>'. To export
                                        the intersecting query region for each row in the output, use the field names 'Q:POS',
                                        'Q:END' and 'Q:LINE'.
  -n,--limit UINT=18446744073709551615  Only export the first N intersecting records.
  -d,--output-dir TEXT                  Directory used for local output of exported samples
  --upload-dir TEXT                     If set, all output file(s) from the export process will be copied to the given directory
                                        (or S3 prefix) upon completion.
  -c,--count-only Excludes: --af-filter Don't write output files; only print the number of intersecting records.
  --af-filter TEXT Excludes: --count-only
                                        If set, only export data that passes the AF filter.

Region options

  -r,--regions TEXT=[] ... Excludes: --regions-file
                                        CSV list of regions to export in the format 'chr:min-max'
  -R,--regions-file TEXT Excludes: --regions
                                        File containing regions (BED format)
  --sorted                              Do not sort regions or regions file if they are pre-sorted
  --region-partition TEXT               Partitions the list of regions to be exported and causes this export to export only
                                        a specific partition of them. Specify in the format I:N where I is the partition
                                        index and N is the total number of partitions. Useful for batch exports.

Sample options

  -f,--samples-file TEXT Excludes: --sample-names
                                        Path to file with 1 sample name per line
  -s,--sample-names TEXT=[] ... Excludes: --samples-file
                                        CSV list of sample names to export
  --sample-partition TEXT               Partitions the list of samples to be exported and causes this export to export only
                                        a specific partition of them. Specify in the format I:N where I is the partition
                                        index and N is the total number of partitions. Useful for batch exports.
  --disable-check-samples{false}        Disable the check that every requested sample exists in the dataset before the query
                                        runs. When the check is enabled, the export errors if any requested sample is missing.
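
The I:N partition format used by --region-partition and --sample-partition lends itself to a simple batch loop, sketched below. Treat it as an illustration under stated assumptions: partition indexes are assumed to start at 0 (the help text does not say, so verify this against your installed version), and the dataset URI, BED file, and output directories are hypothetical. Each export runs only if tiledbvcf is installed.

```shell
DATASET_URI="./my_vcf_dataset"   # hypothetical dataset
REGIONS_BED="./regions.bed"      # hypothetical BED file of regions
N=4                              # total number of partitions

PARTS=""
I=0                              # assumption: indexes run from 0 to N-1
while [ "$I" -lt "$N" ]; do
  PART="$I:$N"
  PARTS="${PARTS:+$PARTS }$PART"
  if command -v tiledbvcf >/dev/null 2>&1; then
    mkdir -p "./export_part_$I"
    # Export only the regions that fall in this partition.
    tiledbvcf export --uri "$DATASET_URI" --regions-file "$REGIONS_BED" \
      --region-partition "$PART" --output-format v \
      --output-dir "./export_part_$I"
  fi
  I=$((I + 1))
done
echo "partitions: $PARTS"
```

In practice each iteration would usually be dispatched to a separate worker or job rather than run sequentially; the same pattern applies to --sample-partition.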

TileDB options

  --tiledb-config TEXT=[] ...           CSV string of the format 'param1=val1,param2=val2...' specifying optional TileDB
                                        configuration parameter settings.
  --mem-budget-buffer-percentage FLOAT=25
                                        The percentage of the memory budget to use for TileDB query buffers.
  --mem-budget-tile-cache-percentage FLOAT=10
                                        The percentage of the memory budget to use for TileDB tile cache.
  -b,--mem-budget-mb UINT=2048          The memory budget (MB) used when submitting TileDB queries.
  --stats                               Enable TileDB stats
  --stats-vcf-header-array              Enable TileDB stats for vcf header array usage

Debug options

  --log-level TEXT:{fatal,error,warn,info,debug,trace}=fatal
                                        Log message level
  --log-file TEXT                       Log message output file
  -v,--verbose :DEPRECATED              Enable verbose output DEPRECATED: please use '--log-level debug' instead
  --enable-progress-estimation          Enable progress estimation in verbose mode. Progress estimation can sometimes cause
                                        a performance impact, so enable this with consideration.
  --debug-print-vcf-regions             Enable debug printing of vcf region passed by user or bed file. Requires verbose
                                        mode
  --debug-print-sample-list             Enable debug printing of sample list used in read. Requires verbose mode
  --debug-print-tiledb-query-ranges     Enable debug printing of tiledb query ranges used in read. Requires verbose mode
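
As a rough end-to-end illustration of the export options, the sketch below first counts the intersecting records and then writes them to a TSV file. The dataset URI, regions, sample names, FORMAT field (GT), and output path are all hypothetical, and nothing runs unless tiledbvcf is installed.

```shell
DATASET_URI="./my_vcf_dataset"            # hypothetical dataset
REGIONS="chr1:1-100000,chr2:5000-20000"   # hypothetical, CSV of 'chr:min-max'
SAMPLES="sampleA,sampleB"                 # hypothetical sample names
# Ordered TSV columns: fixed fields, one FMT field, and the query region.
TSV_FIELDS="SAMPLE,CHR,POS,REF,ALT,F:GT,Q:POS,Q:END"

if command -v tiledbvcf >/dev/null 2>&1; then
  # First, print only the number of intersecting records.
  tiledbvcf export --uri "$DATASET_URI" --regions "$REGIONS" \
    --sample-names "$SAMPLES" --count-only
  # Then write the records to a single TSV file.
  tiledbvcf export --uri "$DATASET_URI" --regions "$REGIONS" \
    --sample-names "$SAMPLES" --output-format t \
    --tsv-fields "$TSV_FIELDS" --output-path ./variants.tsv
else
  echo "tiledbvcf not found; skipping export"
fi
```

Running with --count-only first is a cheap way to gauge the result size before committing to a full export.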

List

Description

Lists all sample names present in a TileDB-VCF dataset

Usage

tiledbvcf list [OPTIONS]

Options

  -u,--uri TEXT REQUIRED                TileDB-VCF dataset URI
  --tiledb-config TEXT=[] ...           CSV string of the format 'param1=val1,param2=val2...' specifying optional TileDB
                                        configuration parameter settings.
  --log-level TEXT:{fatal,error,warn,info,debug,trace}=fatal
                                        Log message level
  --log-file TEXT                       Log message output file
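
The list output can be piped into standard tools, for example to check whether a given sample has been ingested. The sketch below assumes the command prints one sample name per line, which the help text does not state explicitly; the dataset URI, the sample name, and the has_sample helper are hypothetical.

```shell
DATASET_URI="./my_vcf_dataset"   # hypothetical dataset

# Hypothetical helper: succeeds if the exact name in $1 appears
# as a whole line on standard input.
has_sample() {
  grep -Fxq -- "$1"
}

# Query the dataset, only if the CLI is installed.
if command -v tiledbvcf >/dev/null 2>&1; then
  if tiledbvcf list --uri "$DATASET_URI" | has_sample "sampleA"; then
    echo "sampleA is present"
  else
    echo "sampleA is missing"
  fi
fi
```

Matching whole lines with fixed strings (-F -x) avoids false positives such as 'sampleA' matching 'sampleA2'.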

Stat

Description

Prints high-level statistics about a TileDB-VCF dataset

Usage

tiledbvcf stat [OPTIONS]

Options

  -u,--uri TEXT REQUIRED                TileDB-VCF dataset URI
  --tiledb-config TEXT=[] ...           CSV string of the format 'param1=val1,param2=val2...' specifying optional TileDB
                                        configuration parameter settings.
  --log-level TEXT:{fatal,error,warn,info,debug,trace}=fatal
                                        Log message level
  --log-file TEXT                       Log message output file

Store

Description

Ingests samples into a TileDB-VCF dataset

Usage

tiledbvcf store [OPTIONS] [paths...]

Positionals

  paths TEXT=[] ... Excludes: --samples-file
                                        VCF URIs to ingest

Options

  -u,--uri TEXT REQUIRED                TileDB-VCF dataset URI
  -t,--threads UINT=20                  Number of threads
  -m,--total-memory-budget-mb UINT:UINT in [512 - 64103]=48077
                                        The total memory budget for ingestion (MiB)
  -M,--total-memory-percentage FLOAT:FLOAT in [0 - 1]=0
                                        Fraction of total system memory used for ingestion, given as a value from 0 to 1
                                        (overrides '--total-memory-budget-mb')
  --resume                              Resume incomplete ingestion of sample batch

Sample options

  -e,--sample-batch-size UINT=10        Number of samples per batch for ingestion
  -f,--samples-file TEXT Excludes: paths
                                        File with 1 VCF path to be ingested per line. The format can also include an explicit
                                        index path on each line, in the format '<vcf-uri><TAB><index-uri>'
  --remove-sample-file Needs: --samples-file
                                        If specified, the samples file ('-f' argument) is deleted after successful ingestion
  -d,--scratch-dir TEXT                 Directory used for local storage of downloaded remote samples
  -s,--scratch-mb UINT=0                Amount of local storage that can be used for downloading remote samples (MB)

TileDB options

  -p,--s3-part-size UINT=50             [S3 only] Part size to use for writes (MB)
  --tiledb-config TEXT=[] ...           CSV string of the format 'param1=val1,param2=val2...' specifying optional TileDB
                                        configuration parameter settings.
  --stats                               Enable TileDB stats
  --stats-vcf-header-array              Enable TileDB stats for vcf header array usage

Advanced options

  --ratio-tiledb-memory FLOAT:FLOAT in [0.01 - 0.99]=0.5
                                        Ratio of memory budget allocated to TileDB::sm.mem.total_budget
  --max-tiledb-memory-mb UINT=4096      Maximum memory allocated to TileDB::sm.mem.total_budget (MiB)
  --input-record-buffer-mb UINT=1       Size of input record buffer for each sample file (MiB)
  --avg-vcf-record-size INT:INT in [1 - 4096]=512
                                        Average VCF record size (bytes)
  --ratio-task-size FLOAT:FLOAT in [0.01 - 1]=0.75
                                        Ratio of worker task size to computed task size
  --ratio-output-flush FLOAT:FLOAT in [0.01 - 1]=0.75
                                        Ratio of output buffer capacity that triggers a flush to TileDB

Contig options

  --disable-contig-fragment-merging{false} Excludes: --contigs-to-keep-separate --contigs-to-allow-merging
                                        Disable merging of contigs into fragments. Contig fragment merging is generally beneficial:
                                        it is a performance optimization that reduces the number of prefixes on an S3/Azure/GCS
                                        bucket when there are many small pseudo-contigs.
  --contigs-to-keep-separate TEXT ... Excludes: --disable-contig-fragment-merging --contigs-to-allow-merging
                                        Comma-separated list of contigs that should not be merged into combined fragments.
                                        The default list includes all standard human chromosomes in both UCSC (e.g., chr1)
                                        and Ensembl (e.g., 1) formats.
  --contigs-to-allow-merging TEXT=[] ... Excludes: --disable-contig-fragment-merging --contigs-to-keep-separate
                                        Comma-separated list of contigs that should be allowed to be merged into combined
                                        fragments.
  --contig-mode ENUM:value in {all->all,merged->merged,separate->separate} OR {all,merged,separate}=all
                                        Select which contigs are ingested: 'separate', 'merged', or 'all' contigs

Debug options

  --log-level TEXT:{fatal,error,warn,info,debug,trace}=fatal
                                        Log message level
  --log-file TEXT                       Log message output file
  -v,--verbose :DEPRECATED              Enable verbose output DEPRECATED: please use '--log-level debug' instead

Legacy options

  -n,--max-record-buff UINT             Max number of VCF records to buffer per file
  -k,--thread-task-size UINT            Max length (# columns) of an ingestion task. Affects load balancing of ingestion
                                        work across threads, and total memory consumption.
  -b,--mem-budget-mb UINT               The maximum size of TileDB buffers before flushing (MiB)
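
The samples-file format described above ('<vcf-uri><TAB><index-uri>', one VCF per line) can be generated and used as sketched below. The bucket, file names, '.tbi' index suffix, scratch directory, and tuning values are hypothetical choices for illustration, and the store command runs only if tiledbvcf is installed.

```shell
DATASET_URI="./my_vcf_dataset"   # hypothetical dataset
SAMPLES_FILE="$(mktemp)"

# One VCF per line; the optional second, TAB-separated column is the
# explicit index URI. All URIs here are hypothetical.
for s in sampleA sampleB; do
  printf '%s\t%s\n' \
    "s3://my-bucket/vcfs/$s.vcf.gz" \
    "s3://my-bucket/vcfs/$s.vcf.gz.tbi"
done > "$SAMPLES_FILE"
cat "$SAMPLES_FILE"

# Ingest the listed samples, only if the CLI is installed.
if command -v tiledbvcf >/dev/null 2>&1; then
  mkdir -p ./scratch
  tiledbvcf store --uri "$DATASET_URI" --samples-file "$SAMPLES_FILE" \
    --threads 4 --sample-batch-size 10 \
    --scratch-dir ./scratch --scratch-mb 10240 --log-level info
fi
```

Adding --remove-sample-file would delete the generated samples file after a successful ingestion, and --resume can be added when restarting an interrupted batch.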

Vacuum

Description

Vacuum TileDB-VCF dataset

Usage

tiledbvcf utils vacuum [OPTIONS] SUBCOMMAND

Options

  -h,--help                             Print this help message and exit

Subcommands

  commits                               Vacuum TileDB-VCF dataset commits
  fragments                             Vacuum TileDB-VCF dataset fragments
  fragment_meta                         Vacuum TileDB-VCF dataset fragment metadata