read_allele_frequency
Description
Read variant status for a genomic region.
Usage
tiledb.cloud.vcf.read_allele_frequency(dataset_uri, region)
Parameters
dataset_uri: dataset URI
region: genomic region to read
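The `region` parameter is a single string. Assuming it follows the usual TileDB-VCF `contig:start-end` syntax with 1-based, inclusive coordinates (an assumption; this page does not state the format), the hypothetical `parse_region` helper below shows how such a string breaks down and catches malformed input before it is passed to `read_allele_frequency`.

```python
import re
from typing import NamedTuple

# Assumed region syntax: "<contig>:<start>-<end>", 1-based and inclusive.
# The greedy ".+" splits on the LAST colon, so contig names that contain
# colons (e.g. some HLA contigs) are still parsed correctly.
_REGION_RE = re.compile(r"(?P<contig>.+):(?P<start>\d+)-(?P<end>\d+)")


class Region(NamedTuple):
    contig: str
    start: int
    end: int


def parse_region(region: str) -> Region:
    """Parse and validate a region string (hypothetical helper, not part of tiledb.cloud)."""
    m = _REGION_RE.fullmatch(region.strip())
    if m is None:
        raise ValueError(f"expected contig:start-end, got {region!r}")
    start, end = int(m["start"]), int(m["end"])
    # With 1-based inclusive coordinates, start must be >= 1 and <= end.
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in region {region!r}")
    return Region(m["contig"], start, end)
```

For example, `parse_region("chr1:100-200")` yields `Region(contig="chr1", start=100, end=200)`, a span of 101 bases under the inclusive convention.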
calc_af
Description
Consolidate allele counts (AC) and compute the allele number (AN) and allele frequency (AF).
Usage
tiledb.cloud.vcf.allele_frequency.calc_af(df)
Parameters
df: a pandas DataFrame containing allele counts (AC)
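The three steps named in the description can be sketched in plain Python. This is a minimal illustration, not the library's implementation: it assumes a hypothetical row layout of `(pos, allele, ac)`, that one `(pos, allele)` pair may appear several times (for example once per ingestion batch), and that the reference allele has its own row, so that AN is the sum of AC over all alleles at a position and AF = AC / AN.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical input row layout: (pos, allele, ac).
Row = Tuple[int, str, int]
Stats = Tuple[int, int, float]  # (AC, AN, AF)


def calc_af_sketch(rows: Iterable[Row]) -> Dict[Tuple[int, str], Stats]:
    # 1. Consolidate AC: sum the partial counts for each (pos, allele).
    ac: Dict[Tuple[int, str], int] = defaultdict(int)
    for pos, allele, count in rows:
        ac[(pos, allele)] += count

    # 2. AN: sum of AC over all alleles at the same position. This assumes
    #    the reference allele has its own row, so AN counts every called allele.
    an: Dict[int, int] = defaultdict(int)
    for (pos, _allele), count in ac.items():
        an[pos] += count

    # 3. AF = AC / AN, reported as 0.0 when no alleles were called at a position.
    result: Dict[Tuple[int, str], Stats] = {}
    for (pos, allele), count in sorted(ac.items()):
        total = an[pos]
        result[(pos, allele)] = (count, total, count / total if total else 0.0)
    return result
```

With rows `(100, "A", 3)`, `(100, "G", 1)`, `(100, "A", 2)` and `(250, "T", 4)`, position 100 has AN = 6, so allele A gets AC = 5 and AF = 5/6, and allele G gets AC = 1 and AF = 1/6; position 250 has AN = 4 and AF = 1.0 for T.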
ingest
Usage
tiledb.cloud.vcf.ingest(dataset_uri, acn = None, config = None, namespace = None, register_name = None, search_uri = None, pattern = None, ignore = None, sample_list_uri = None, metadata_uri = None, metadata_attr = "uri", max_files = None, contigs = Contigs.ALL, resume = True, extra_attrs = repr(DEFAULT_ATTRIBUTES), vcf_attrs = None, anchor_gap = None, compression_level = None, manifest_batch_size = MANIFEST_BATCH_SIZE, manifest_workers = MANIFEST_WORKERS, vcf_batch_size = VCF_BATCH_SIZE, vcf_workers = VCF_WORKERS, ingest_resources = None, verbose = False, create_index = True, trace_id = None, consolidate_stats = True, aws_find_mode = False)
Description
Ingest samples into a dataset.
Parameters
dataset_uri: dataset URI
acn: Access Credentials Name (ACN) registered in TileDB Cloud (ARN type), defaults to None
config: config dictionary, defaults to None
namespace: TileDB Cloud namespace, defaults to None
register_name: name to register the dataset with on TileDB Cloud, defaults to None
search_uri: URI to search for VCF files, defaults to None
pattern: Unix shell-style pattern to match when searching for VCF files, defaults to None
ignore: Unix shell-style pattern to ignore when searching for VCF files, defaults to None
sample_list_uri: URI of a file listing VCF URIs, defaults to None
metadata_uri: URI of a metadata array holding VCF URIs, defaults to None
metadata_attr: name of the metadata attribute containing the URIs, defaults to "uri"
max_files: maximum number of VCF URIs to read/find, defaults to None (no limit)
max_samples: maximum number of samples to ingest, defaults to None (no limit)
contigs: contig mode (Contigs.ALL | Contigs.CHROMOSOMES | Contigs.OTHER | Contigs.ALL_DISABLE_MERGE) or a list of contigs to ingest, defaults to Contigs.ALL
resume: enable resume ingestion mode, defaults to True
extra_attrs: INFO/FORMAT fields to materialize, defaults to repr(DEFAULT_ATTRIBUTES)
vcf_attrs: URI of a VCF file whose INFO/FORMAT fields are all materialized, defaults to None
anchor_gap: anchor gap for the VCF dataset, defaults to None
compression_level: zstd compression level for the VCF dataset, defaults to None (uses the TileDB-VCF default level)
manifest_batch_size: batch size for manifest ingestion, defaults to MANIFEST_BATCH_SIZE
manifest_workers: number of workers for manifest ingestion, defaults to MANIFEST_WORKERS
vcf_batch_size: batch size for VCF ingestion, defaults to VCF_BATCH_SIZE
vcf_workers: number of workers for VCF ingestion, defaults to VCF_WORKERS
vcf_threads: number of threads for VCF ingestion, defaults to VCF_THREADS
ingest_resources: manual override for ingest UDF resources, defaults to None
verbose: enable verbose logging, defaults to False
create_index: force creation of a local index file, defaults to True
trace_id: trace ID for logging, defaults to None
consolidate_stats: consolidate the stats arrays, defaults to True
aws_find_mode: use the AWS CLI to find VCFs, defaults to False
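`pattern` and `ignore` are described as Unix shell-style patterns. As a rough model, assuming they behave like Python's `fnmatch` (where `*` also matches `/`), the hypothetical `select_vcf_uris` helper below shows how `pattern`, `ignore` and `max_files` would narrow a list of candidate URIs found under `search_uri`. It illustrates the parameter semantics only and is not the ingestion code.

```python
import fnmatch
from typing import Iterable, List, Optional


def select_vcf_uris(
    uris: Iterable[str],
    pattern: Optional[str] = None,
    ignore: Optional[str] = None,
    max_files: Optional[int] = None,
) -> List[str]:
    """Filter candidate URIs by pattern/ignore/max_files (hypothetical sketch)."""
    selected: List[str] = []
    for uri in uris:
        # Stop once max_files URIs have been collected (None means no limit).
        if max_files is not None and len(selected) >= max_files:
            break
        # Keep only URIs that match `pattern`, when one is given.
        # fnmatchcase is used so matching is case-sensitive on every OS.
        if pattern is not None and not fnmatch.fnmatchcase(uri, pattern):
            continue
        # Drop URIs that match `ignore`.
        if ignore is not None and fnmatch.fnmatchcase(uri, ignore):
            continue
        selected.append(uri)
    return selected
```

For instance, `pattern="*.vcf.gz"` keeps compressed VCFs but skips their `.tbi` index files, and `ignore="*/tmp/*"` drops anything under a `tmp` prefix.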
ingest_annotations
Usage
tiledb.cloud.vcf.ingest_annotations(dataset_uri, vcf_uri = None, search_uri = None, pattern = None, ignore = None, create_index = True, config = None, acn = None, namespace = None, register_name = None, ingest_resources = None, verbose = False)
Description
Ingest an annotation VCF, such as a ClinVar or gnomAD VCF, into a dataset.
Parameters
dataset_uri: dataset URI
vcf_uri: VCF URI, defaults to None
search_uri: URI to search for VCF files, defaults to None
pattern: Unix shell-style pattern to match when searching for VCF files, defaults to None
ignore: Unix shell-style pattern to ignore when searching for VCF files, defaults to None
create_index: force creation of a local index file, defaults to True
config: config dictionary, defaults to None
acn: Access Credentials Name (ACN) registered in TileDB Cloud (ARN type), defaults to None
namespace: TileDB Cloud namespace, defaults to None
register_name: name to register the dataset with on TileDB Cloud, defaults to None
ingest_resources: manual override for ingest UDF resources, defaults to None
verbose: enable verbose logging, defaults to False
build_read_dag
Usage
tiledb.cloud.vcf.build_read_dag(dataset_uri, config = None, attrs = None, regions = None, bed_file = None, num_region_partitions = 1, samples = None, memory_budget_mb = 1024, af_filter = None, transform_result = None, max_sample_batch_size = 500, log_uri = None, namespace = None, resource_class = None, verbose = False)
Description
Build the DAG for a distributed read on a TileDB-VCF dataset.
Parameters
dataset_uri: dataset URI
config: config dictionary, defaults to None
attrs: attribute names to read, defaults to None
regions: genomic regions to read, defaults to None
bed_file: URI of a BED file containing genomic regions to read, defaults to None
num_region_partitions: number of region partitions, defaults to 1
samples: sample names to read, defaults to None
memory_budget_mb: VCF memory budget in MiB, defaults to 1024
af_filter: allele frequency filter, defaults to None
transform_result: function to apply to each partition; by default, the result is not transformed
max_sample_batch_size: maximum number of samples to read in a single node, defaults to 500
log_uri: log array URI for profiling, defaults to None
namespace: TileDB Cloud namespace, defaults to None
resource_class: TileDB Cloud resource class for UDFs, defaults to None
verbose: enable verbose logging, defaults to False
Return values
The DAG and its result Node
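`num_region_partitions` and `max_sample_batch_size` together determine how a read is split into parallel work. The sketch below assumes, without confirmation from this page, that samples are batched in order and that the DAG runs one read node per (region partition, sample batch) pair; `sample_batches` and `read_node_grid` are hypothetical helpers for estimating that fan-out, not part of `tiledb.cloud`.

```python
import math
from typing import List, Sequence, Tuple


def sample_batches(samples: Sequence[str], max_sample_batch_size: int = 500) -> List[List[str]]:
    """Split samples, in order, into batches of at most max_sample_batch_size."""
    if max_sample_batch_size < 1:
        raise ValueError("max_sample_batch_size must be at least 1")
    return [
        list(samples[i : i + max_sample_batch_size])
        for i in range(0, len(samples), max_sample_batch_size)
    ]


def read_node_grid(
    num_region_partitions: int, num_samples: int, max_sample_batch_size: int = 500
) -> List[Tuple[int, int]]:
    """List (region_partition, sample_batch) index pairs, one per assumed read node."""
    # At least one sample batch, even when the sample count is zero.
    num_sample_batches = max(1, math.ceil(num_samples / max_sample_batch_size))
    return [
        (r, s)
        for r in range(num_region_partitions)
        for s in range(num_sample_batches)
    ]
```

Under these assumptions, 1,200 samples with the default batch size of 500 form 3 batches (500, 500 and 200 samples), and `num_region_partitions=4` would give 4 × 3 = 12 read nodes.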
read
Usage
tiledb.cloud.vcf.read(dataset_uri, config = None, attrs = None, regions = None, bed_file = None, num_region_partitions = 1, samples = None, memory_budget_mb = 1024, af_filter = None, transform_result = None, max_sample_batch_size = 500, log_uri = None, namespace = None, resource_class = None, verbose = False)
Description
Run a distributed read on a TileDB-VCF dataset.
Parameters
dataset_uri: dataset URI
config: config dictionary, defaults to None
attrs: attribute names to read, defaults to None
regions: genomic regions to read, defaults to None
bed_file: URI of a BED file containing genomic regions to read, defaults to None
num_region_partitions: number of region partitions, defaults to 1
samples: sample names to read, defaults to None
memory_budget_mb: VCF memory budget in MiB, defaults to 1024
af_filter: allele frequency filter, defaults to None
transform_result: function to apply to each partition; by default, the result is not transformed
max_sample_batch_size: maximum number of samples to read in a single node, defaults to 500
log_uri: log array URI for profiling, defaults to None
namespace: TileDB Cloud namespace, defaults to None
resource_class: TileDB Cloud resource class for UDFs, defaults to None
verbose: enable verbose logging, defaults to False
Return value
An Arrow table containing the query results
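BED files use 0-based, half-open intervals, whereas `contig:start-end` region strings are conventionally 1-based and inclusive. Assuming the `regions` parameter follows that 1-based convention (this page does not say), the hypothetical `bed_to_regions` helper below converts BED lines into region strings. This can help when regions need to be filtered or edited in Python instead of being passed as a `bed_file` URI.

```python
from typing import Iterable, List


def bed_to_regions(bed_lines: Iterable[str]) -> List[str]:
    """Convert BED intervals to "contig:start-end" strings (hypothetical helper).

    BED is 0-based and half-open, so a 1-based inclusive region keeps the
    same end coordinate and shifts the start by +1.
    """
    regions: List[str] = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        # Only the first three columns (chrom, chromStart, chromEnd) are used.
        contig, start, end = line.split()[:3]
        regions.append(f"{contig}:{int(start) + 1}-{int(end)}")
    return regions
```

For example, the BED line `chr1	99	200` becomes `chr1:100-200`. The resulting list could then presumably be passed as `regions=`.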