Cloud API

read_allele_frequency

Description

Read allele frequency data for a genomic region of a dataset.

Usage

tiledb.cloud.vcf.read_allele_frequency(dataset_uri, region)

Parameters

  • dataset_uri: dataset URI
  • region: genomic region to read
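
A call might look like the sketch below. The `contig:start-end` region format and the placeholder URI are assumptions that this page does not state, and `format_region` and `fetch_allele_frequency` are hypothetical helper names. The `tiledb.cloud` import is deferred so the region helper can be tried without a TileDB Cloud login.

```python
def format_region(contig: str, start: int, end: int) -> str:
    """Build a region string in 'contig:start-end' form (assumed format)."""
    if start < 1 or end < start:
        raise ValueError(f"invalid interval: {start}-{end}")
    return f"{contig}:{start}-{end}"


def fetch_allele_frequency(dataset_uri: str, contig: str, start: int, end: int):
    """Read allele frequency data for one region (needs TileDB Cloud access)."""
    # Imported lazily so format_region() works without tiledb-cloud installed.
    import tiledb.cloud.vcf

    region = format_region(contig, start, end)
    return tiledb.cloud.vcf.read_allele_frequency(dataset_uri, region)


# Hypothetical usage; the URI is a placeholder, not a real dataset:
# result = fetch_allele_frequency("tiledb://my-namespace/my-vcf", "chr1", 1, 1_000_000)
print(format_region("chr1", 1, 1_000_000))
```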

calc_af

Description

Consolidate the allele count (AC), then compute the allele number (AN) and allele frequency (AF).

Usage

tiledb.cloud.vcf.allele_frequency.calc_af(df)

Parameters

  • df: a pandas dataframe
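
The arithmetic behind these three quantities can be illustrated in plain Python. This is a sketch of the usual definitions (AF = AC / AN), not the library's implementation: the record keys `pos`, `allele`, and `ac` are hypothetical, and it assumes AN at a position is the sum of AC over all alleles there, including the reference allele.

```python
from collections import defaultdict


def sketch_af(records):
    """Consolidate AC per (pos, allele), then derive AN and AF per position."""
    # Step 1: consolidate, summing counts of duplicate (pos, allele) rows.
    ac = defaultdict(int)
    for rec in records:
        ac[(rec["pos"], rec["allele"])] += rec["ac"]

    # Step 2: allele number = total alleles counted at each position (assumption).
    an = defaultdict(int)
    for (pos, _allele), count in ac.items():
        an[pos] += count

    # Step 3: allele frequency = AC / AN, guarding against an empty position.
    return [
        {
            "pos": pos,
            "allele": allele,
            "ac": count,
            "an": an[pos],
            "af": count / an[pos] if an[pos] else float("nan"),
        }
        for (pos, allele), count in sorted(ac.items())
    ]


rows = sketch_af(
    [
        {"pos": 100, "allele": "ref", "ac": 6},
        {"pos": 100, "allele": "T", "ac": 1},
        {"pos": 100, "allele": "T", "ac": 1},
        {"pos": 200, "allele": "ref", "ac": 4},
    ]
)
for row in rows:
    print(row)
```

The same three steps map naturally onto a pandas group-by, which is closer to how a dataframe passed as `df` would be processed.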

ingest

Usage

tiledb.cloud.vcf.ingest(dataset_uri, acn = None, config = None, namespace = None, register_name = None, search_uri = None, pattern = None, ignore = None, sample_list_uri = None, metadata_uri = None, metadata_attr = "uri", max_files = None, contigs = Contigs.ALL, resume = True, extra_attrs = repr(DEFAULT_ATTRIBUTES), vcf_attrs = None, anchor_gap = None, compression_level = None, manifest_batch_size = MANIFEST_BATCH_SIZE, manifest_workers = MANIFEST_WORKERS, vcf_batch_size = VCF_BATCH_SIZE, vcf_workers = VCF_WORKERS, ingest_resources = None, verbose = False, create_index = True, trace_id = None, consolidate_stats = True, aws_find_mode = False)

Description

Ingest samples into a dataset.

Parameters

  • dataset_uri: dataset URI
  • acn: Access Credentials Name (ACN) registered in TileDB Cloud (ARN type), defaults to None
  • config: config dictionary, defaults to None
  • namespace: TileDB-Cloud namespace, defaults to None
  • register_name: name to register the dataset with on TileDB Cloud, defaults to None
  • search_uri: URI to search for VCF files, defaults to None
  • pattern: Unix shell style pattern to match when searching for VCF files, defaults to None
  • ignore: Unix shell style pattern to ignore when searching for VCF files, defaults to None
  • sample_list_uri: URI with a list of VCF URIs, defaults to None
  • metadata_uri: URI of metadata array holding VCF URIs, defaults to None
  • metadata_attr: name of metadata attribute containing URIs, defaults to "uri"
  • max_files: maximum number of VCF URIs to read/find, defaults to None (no limit)
  • max_samples: maximum number of samples to ingest, defaults to None (no limit)
  • contigs: contig mode (Contigs.ALL | Contigs.CHROMOSOMES | Contigs.OTHER | Contigs.ALL_DISABLE_MERGE) or list of contigs to ingest, defaults to Contigs.ALL
  • resume: enable resume ingestion mode, defaults to True
  • extra_attrs: INFO/FORMAT fields to materialize, defaults to repr(DEFAULT_ATTRIBUTES)
  • vcf_attrs: VCF with all INFO/FORMAT fields to materialize, defaults to None
  • anchor_gap: anchor gap for VCF dataset, defaults to None
  • compression_level: zstd compression level for the VCF dataset, defaults to None (uses the default level in TileDB-VCF)
  • manifest_batch_size: batch size for manifest ingestion, defaults to MANIFEST_BATCH_SIZE
  • manifest_workers: number of workers for manifest ingestion, defaults to MANIFEST_WORKERS
  • vcf_batch_size: batch size for VCF ingestion, defaults to VCF_BATCH_SIZE
  • vcf_workers: number of workers for VCF ingestion, defaults to VCF_WORKERS
  • vcf_threads: number of threads for VCF ingestion, defaults to VCF_THREADS
  • ingest_resources: manual override for ingest UDF resources, defaults to None
  • verbose: verbose logging, defaults to False
  • create_index: force creation of a local index file, defaults to True
  • trace_id: trace ID for logging, defaults to None
  • consolidate_stats: consolidate the stats arrays, defaults to True
  • aws_find_mode: use AWS CLI to find VCFs, defaults to False
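
The `pattern` and `ignore` parameters can be previewed locally before starting an ingestion. The sketch below assumes the matching behaves like Python's `fnmatch` module, which implements Unix shell-style patterns; whether the library matches against the full URI or only the file name is not stated here. `preview_matches` and `run_ingest` are hypothetical helpers, and the URIs in the usage comment are placeholders.

```python
from fnmatch import fnmatchcase


def preview_matches(names, pattern=None, ignore=None):
    """Return the names that match `pattern` and do not match `ignore`."""
    kept = []
    for name in names:
        if pattern is not None and not fnmatchcase(name, pattern):
            continue  # does not match the include pattern
        if ignore is not None and fnmatchcase(name, ignore):
            continue  # explicitly ignored
        kept.append(name)
    return kept


def run_ingest(dataset_uri, search_uri, namespace):
    """Start an ingestion limited to ten files (needs TileDB Cloud access)."""
    # Imported lazily so preview_matches() works without tiledb-cloud installed.
    import tiledb.cloud.vcf

    return tiledb.cloud.vcf.ingest(
        dataset_uri,
        search_uri=search_uri,
        pattern="*.vcf.gz",
        ignore="test_*",
        max_files=10,
        namespace=namespace,
    )


names = ["a.vcf.gz", "a.vcf.gz.tbi", "test_b.vcf.gz", "notes.txt"]
print(preview_matches(names, pattern="*.vcf.gz", ignore="test_*"))
# Hypothetical usage:
# run_ingest("tiledb://my-namespace/my-vcf", "s3://my-bucket/vcfs/", "my-namespace")
```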

ingest_annotations

Usage

tiledb.cloud.vcf.ingest_annotations(dataset_uri, vcf_uri = None, search_uri = None, pattern = None, ignore = None, create_index = True, config = None, acn = None, namespace = None, register_name = None, ingest_resources = None, verbose = False)

Description

Ingest an annotation VCF, such as a ClinVar or gnomAD VCF, into a dataset.

Parameters

  • dataset_uri: dataset URI
  • vcf_uri: VCF URI, defaults to None
  • search_uri: URI to search for VCF files, defaults to None
  • pattern: Unix shell style pattern to match when searching for VCF files, defaults to None
  • ignore: Unix shell style pattern to ignore when searching for VCF files, defaults to None
  • create_index: force creation of a local index file, defaults to True
  • config: config dictionary, defaults to None
  • acn: Access Credentials Name (ACN) registered in TileDB Cloud (ARN type), defaults to None
  • namespace: TileDB-Cloud namespace, defaults to None
  • register_name: name to register the dataset with on TileDB Cloud, defaults to None
  • ingest_resources: manual override for ingest UDF resources, defaults to None
  • verbose: verbose logging, defaults to False
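
As a rough illustration, an annotation ingest could be wrapped as follows. The rule that either `vcf_uri` or `search_uri` must be given is an assumption of this sketch rather than something this page documents, and `annotation_kwargs` and `ingest_clinvar` are hypothetical helper names.

```python
def annotation_kwargs(vcf_uri=None, search_uri=None, pattern=None, ignore=None):
    """Collect the source arguments, omitting any that were left at None."""
    if vcf_uri is None and search_uri is None:
        # Assumption of this sketch: some source of annotation VCFs is required.
        raise ValueError("provide vcf_uri or search_uri")
    candidates = {
        "vcf_uri": vcf_uri,
        "search_uri": search_uri,
        "pattern": pattern,
        "ignore": ignore,
    }
    return {key: value for key, value in candidates.items() if value is not None}


def ingest_clinvar(dataset_uri, vcf_uri, namespace):
    """Ingest a single annotation VCF (needs TileDB Cloud access)."""
    # Imported lazily so annotation_kwargs() works without tiledb-cloud installed.
    import tiledb.cloud.vcf

    return tiledb.cloud.vcf.ingest_annotations(
        dataset_uri,
        namespace=namespace,
        **annotation_kwargs(vcf_uri=vcf_uri),
    )


# The URI is a placeholder, not a real file.
print(annotation_kwargs(vcf_uri="s3://my-bucket/clinvar.vcf.gz"))
```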

build_read_dag

Usage

tiledb.cloud.vcf.build_read_dag(dataset_uri, config = None, attrs = None, regions = None, bed_file = None, num_region_partitions = 1, samples = None, memory_budget_mb = 1024, af_filter = None, transform_result = None, max_sample_batch_size = 500, log_uri = None, namespace = None, resource_class = None, verbose = False)

Description

Build the DAG for a distributed read on a TileDB-VCF dataset.

Parameters

  • dataset_uri: dataset URI
  • config: config dictionary, defaults to None
  • attrs: attribute names to read, defaults to None
  • regions: genomic regions to read, defaults to None
  • bed_file: URI of a BED file containing genomic regions to read, defaults to None
  • num_region_partitions: number of region partitions, defaults to 1
  • samples: sample names to read, defaults to None
  • memory_budget_mb: VCF memory budget in MiB, defaults to 1024
  • af_filter: allele frequency filter, defaults to None
  • transform_result: function to apply to each partition; by default, does not transform the result
  • max_sample_batch_size: maximum number of samples to read in a single node, defaults to 500
  • log_uri: log array URI for profiling, defaults to None
  • namespace: TileDB-Cloud namespace, defaults to None
  • resource_class: TileDB-Cloud resource class for UDFs, defaults to None
  • verbose: verbose logging, defaults to False

Return values

  • The DAG and its result Node
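
One way to use the two return values is sketched below. It assumes they come back as a `(dag, node)` pair and that the DAG follows the usual TileDB Cloud task graph workflow of `compute()`, `wait()`, and `result()`. The partition heuristic in `pick_num_partitions` is a hypothetical choice made for this example, not a recommendation from the library.

```python
def pick_num_partitions(regions, cap=8):
    """Hypothetical heuristic: one partition per region, capped at `cap`."""
    return max(1, min(len(regions), cap))


def run_read_dag(dataset_uri, regions, samples, namespace):
    """Build the read DAG, run it, and return the result (needs TileDB Cloud)."""
    # Imported lazily so pick_num_partitions() works without tiledb-cloud installed.
    import tiledb.cloud.vcf

    # Assumption: the DAG and result Node are returned together as a pair.
    dag, node = tiledb.cloud.vcf.build_read_dag(
        dataset_uri,
        regions=regions,
        samples=samples,
        num_region_partitions=pick_num_partitions(regions),
        namespace=namespace,
    )
    dag.compute()  # submit the task graph
    dag.wait()  # block until every node has finished
    return node.result()


print(pick_num_partitions(["chr1:1-1000", "chr2:1-1000", "chr3:1-1000"]))
```

Building the DAG separately from running it leaves room to inspect or extend the graph before submission; when that is not needed, `read` covers the same parameters in a single call.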

read

Usage

tiledb.cloud.vcf.read(dataset_uri, config = None, attrs = None, regions = None, bed_file = None, num_region_partitions = 1, samples = None, memory_budget_mb = 1024, af_filter = None, transform_result = None, max_sample_batch_size = 500, log_uri = None, namespace = None, resource_class = None, verbose = False)

Description

Run a distributed read on a TileDB-VCF dataset.

Parameters

  • dataset_uri: dataset URI
  • config: config dictionary, defaults to None
  • attrs: attribute names to read, defaults to None
  • regions: genomic regions to read, defaults to None
  • bed_file: URI of a BED file containing genomic regions to read, defaults to None
  • num_region_partitions: number of region partitions, defaults to 1
  • samples: sample names to read, defaults to None
  • memory_budget_mb: VCF memory budget in MiB, defaults to 1024
  • af_filter: allele frequency filter, defaults to None
  • transform_result: function to apply to each partition; by default, does not transform the result
  • max_sample_batch_size: maximum number of samples to read in a single node, defaults to 500
  • log_uri: log array URI for profiling, defaults to None
  • namespace: TileDB-Cloud namespace, defaults to None
  • resource_class: TileDB-Cloud resource class for UDFs, defaults to None
  • verbose: verbose logging, defaults to False

Return value

  • Arrow table containing the query results
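
A short sketch of a distributed read follows. BED intervals are 0-based and half-open, so the hypothetical `bed_to_region` helper shifts the start by one to produce region strings; the 1-based inclusive `contig:start-end` region format and the attribute names passed to `attrs` are assumptions based on common TileDB-VCF usage, not statements from this page.

```python
def bed_to_region(bed_line: str) -> str:
    """Convert one BED line (0-based, half-open) to 'contig:start-end' (1-based)."""
    contig, start, end = bed_line.split()[:3]
    return f"{contig}:{int(start) + 1}-{int(end)}"


def read_to_pandas(dataset_uri, bed_lines, samples, namespace):
    """Run a distributed read and return a pandas DataFrame (needs TileDB Cloud)."""
    # Imported lazily so bed_to_region() works without tiledb-cloud installed.
    import tiledb.cloud.vcf

    table = tiledb.cloud.vcf.read(
        dataset_uri,
        regions=[bed_to_region(line) for line in bed_lines],
        samples=samples,
        # Assumed attribute names; adjust to the attributes in your dataset.
        attrs=["sample_name", "contig", "pos_start", "fmt_GT"],
        namespace=namespace,
    )
    # The result is an Arrow table, so it converts directly to pandas.
    return table.to_pandas()


print(bed_to_region("chr1\t0\t1000"))
```

When the regions already live in a BED file at a reachable URI, passing `bed_file` avoids the conversion entirely.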