Cloud API
read_allele_frequency
Description
Read variant status
Usage
tiledb.cloud.vcf.read_allele_frequency(dataset_uri, region)
Parameters
dataset_uri
: dataset URI
region
: genomics region to read
calc_af
Description
Consolidate allele count (AC) and compute allele number (AN) and allele frequency (AF).
Usage
tiledb.cloud.vcf.allele_frequency.calc_af(df)
Parameters
df
: a pandas dataframe
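The relationship between the three quantities can be illustrated with a minimal sketch. It is not the library's implementation: it assumes the usual definitions, where AN is the total number of alleles observed at a site and AF is AC divided by AN. The function name `allele_frequencies` and the input layout are hypothetical.

```python
def allele_frequencies(allele_counts):
    """Return {allele: (AC, AN, AF)} for one site, given per-allele counts.

    Illustrative only: assumes AN is the sum of all allele counts at the
    site and AF = AC / AN.
    """
    an = sum(allele_counts.values())  # AN: total alleles observed at the site
    return {
        allele: (ac, an, ac / an if an else 0.0)
        for allele, ac in allele_counts.items()
    }


# Hypothetical site with 6 reference alleles and 2 copies of alternate allele "A".
site = allele_frequencies({"ref": 6, "A": 2})
print(site["A"])  # (2, 8, 0.25)
```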
ingest
Usage
tiledb.cloud.vcf.ingest(dataset_uri, acn = None, config = None, namespace = None, register_name = None, search_uri = None, pattern = None, ignore = None, sample_list_uri = None, metadata_uri = None, metadata_attr = "uri", max_files = None, contigs = Contigs.ALL, resume = True, extra_attrs = repr(DEFAULT_ATTRIBUTES), vcf_attrs = None, anchor_gap = None, compression_level = None, manifest_batch_size = MANIFEST_BATCH_SIZE, manifest_workers = MANIFEST_WORKERS, vcf_batch_size = VCF_BATCH_SIZE, vcf_workers = VCF_WORKERS, ingest_resources = None, verbose = False, create_index = True, trace_id = None, consolidate_stats = True, aws_find_mode = False)
Description
Ingest samples into a dataset.
Parameters
dataset_uri
: dataset URI
acn
: Access Credentials Name (ACN) registered in TileDB Cloud (ARN type), defaults to None
config
: config dictionary, defaults to None
namespace
: TileDB-Cloud namespace, defaults to None
register_name
: name to register the dataset with on TileDB Cloud, defaults to None
search_uri
: URI to search for VCF files, defaults to None
pattern
: Unix shell style pattern to match when searching for VCF files, defaults to None
ignore
: Unix shell style pattern to ignore when searching for VCF files, defaults to None
sample_list_uri
: URI with a list of VCF URIs, defaults to None
metadata_uri
: URI of metadata array holding VCF URIs, defaults to None
metadata_attr
: name of metadata attribute containing URIs, defaults to "uri"
max_files
: maximum number of VCF URIs to read/find, defaults to None (no limit)
max_samples
: maximum number of samples to ingest, defaults to None (no limit)
contigs
: contig mode (Contigs.ALL | Contigs.CHROMOSOMES | Contigs.OTHER | Contigs.ALL_DISABLE_MERGE) or list of contigs to ingest, defaults to Contigs.ALL
resume
: enable resume ingestion mode, defaults to True
extra_attrs
: INFO/FORMAT fields to materialize, defaults to repr(DEFAULT_ATTRIBUTES)
vcf_attrs
: VCF with all INFO/FORMAT fields to materialize, defaults to None
anchor_gap
: anchor gap for VCF dataset, defaults to None
compression_level
: zstd compression level for the VCF dataset, defaults to None (uses the default level in TileDB-VCF)
manifest_batch_size
: batch size for manifest ingestion, defaults to MANIFEST_BATCH_SIZE
manifest_workers
: number of workers for manifest ingestion, defaults to MANIFEST_WORKERS
vcf_batch_size
: batch size for VCF ingestion, defaults to VCF_BATCH_SIZE
vcf_workers
: number of workers for VCF ingestion, defaults to VCF_WORKERS
vcf_threads
: number of threads for VCF ingestion, defaults to VCF_THREADS
ingest_resources
: manual override for ingest UDF resources, defaults to None
verbose
: verbose logging, defaults to False
create_index
: force creation of a local index file, defaults to True
trace_id
: trace ID for logging, defaults to None
consolidate_stats
: consolidate the stats arrays, defaults to True
aws_find_mode
: use AWS CLI to find VCFs, defaults to False
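A minimal sketch of an ingestion call that searches a location for VCF files might look like the following. The URIs are hypothetical placeholders. The `matches` helper only illustrates how `pattern` and `ignore` could select files, under the assumption that they behave like Python's `fnmatch`; the library's actual matching may differ.

```python
from fnmatch import fnmatch

# Hypothetical URIs; replace with your own dataset location and bucket.
ingest_kwargs = {
    "dataset_uri": "s3://my-bucket/my-vcf-dataset",
    "search_uri": "s3://my-bucket/vcfs/",
    "pattern": "*.vcf.gz",
    "ignore": "*test*",
    "max_files": 100,
    "resume": True,
}


def matches(name, pattern, ignore):
    """Assumed selection rule: keep names matching `pattern` but not `ignore`."""
    return fnmatch(name, pattern) and not (ignore and fnmatch(name, ignore))


def run_ingest(kwargs=ingest_kwargs):
    # Imported lazily so the sketch can be read without the package installed.
    import tiledb.cloud.vcf

    return tiledb.cloud.vcf.ingest(**kwargs)


# Preview which of these hypothetical file names the patterns would keep.
candidates = ["a.vcf.gz", "a_test.vcf.gz", "notes.txt"]
selected = [
    n for n in candidates
    if matches(n, ingest_kwargs["pattern"], ingest_kwargs["ignore"])
]
print(selected)  # ['a.vcf.gz']
```

Calling `run_ingest()` requires TileDB Cloud credentials and access to the storage locations named in the arguments.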
ingest_annotations
Usage
tiledb.cloud.vcf.ingest_annotations(dataset_uri, vcf_uri = None, search_uri = None, pattern = None, ignore = None, create_index = True, config = None, acn = None, namespace = None, register_name = None, ingest_resources = None, verbose = False)
Description
Ingest an annotation VCF, such as a ClinVar or gnomAD VCF, into a dataset.
Parameters
dataset_uri
: dataset URI
vcf_uri
: VCF URI, defaults to None
search_uri
: URI to search for VCF files, defaults to None
pattern
: Unix shell style pattern to match when searching for VCF files, defaults to None
ignore
: Unix shell style pattern to ignore when searching for VCF files, defaults to None
create_index
: force creation of a local index file, defaults to True
config
: config dictionary, defaults to None
acn
: Access Credentials Name (ACN) registered in TileDB Cloud (ARN type), defaults to None
namespace
: TileDB-Cloud namespace, defaults to None
register_name
: name to register the dataset with on TileDB Cloud, defaults to None
ingest_resources
: manual override for ingest UDF resources, defaults to None
verbose
: verbose logging, defaults to False
build_read_dag
Usage
tiledb.cloud.vcf.build_read_dag(dataset_uri, config = None, attrs = None, regions = None, bed_file = None, num_region_partitions = 1, samples = None, memory_budget_mb = 1024, af_filter = None, transform_result = None, max_sample_batch_size = 500, log_uri = None, namespace = None, resource_class = None, verbose = False)
Description
Build the DAG for a distributed read on a TileDB-VCF dataset.
Parameters
dataset_uri
: dataset URI
config
: config dictionary, defaults to None
attrs
: attribute names to read, defaults to None
regions
: genomics regions to read, defaults to None
bed_file
: URI of a BED file containing genomics regions to read, defaults to None
num_region_partitions
: number of region partitions, defaults to 1
samples
: sample names to read, defaults to None
memory_budget_mb
: VCF memory budget in MiB, defaults to 1024
af_filter
: allele frequency filter, defaults to None
transform_result
: function to apply to each partition; by default, does not transform the result
max_sample_batch_size
: maximum number of samples to read in a single node, defaults to 500
log_uri
: log array URI for profiling, defaults to None
namespace
: TileDB-Cloud namespace, defaults to None
resource_class
: TileDB-Cloud resource class for UDFs, defaults to None
verbose
: verbose logging, defaults to False
Return values
DAG and result Node
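One possible way to use the returned pair is sketched below. It assumes the function returns a `(dag, node)` tuple in that order, that the DAG is run with `compute()` and `wait()`, and that the node exposes its output through `result()`. The dataset URI, region string, and sample names are hypothetical.

```python
# Hypothetical arguments; the "contig:start-end" region form is an assumption.
read_dag_kwargs = {
    "dataset_uri": "tiledb://my-namespace/my-vcf-dataset",
    "regions": ["chr1:1-1000000"],
    "samples": ["sample1", "sample2"],
    "num_region_partitions": 2,
}


def run_read_dag(kwargs=read_dag_kwargs):
    # Imported lazily so the sketch can be read without the package installed.
    import tiledb.cloud.vcf

    # Assumed return order: the DAG first, then the node holding the result.
    dag, node = tiledb.cloud.vcf.build_read_dag(**kwargs)
    dag.compute()  # submit the task graph
    dag.wait()     # block until all nodes finish
    return node.result()
```

Building the DAG separately from running it leaves room to inspect or extend the graph before submission. When that is not needed, `read` takes the same arguments and returns the result directly.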
read
Usage
tiledb.cloud.vcf.read(dataset_uri, config = None, attrs = None, regions = None, bed_file = None, num_region_partitions = 1, samples = None, memory_budget_mb = 1024, af_filter = None, transform_result = None, max_sample_batch_size = 500, log_uri = None, namespace = None, resource_class = None, verbose = False)
Description
Run a distributed read on a TileDB-VCF dataset.
Parameters
dataset_uri
: dataset URI
config
: config dictionary, defaults to None
attrs
: attribute names to read, defaults to None
regions
: genomics regions to read, defaults to None
bed_file
: URI of a BED file containing genomics regions to read, defaults to None
num_region_partitions
: number of region partitions, defaults to 1
samples
: sample names to read, defaults to None
memory_budget_mb
: VCF memory budget in MiB, defaults to 1024
af_filter
: allele frequency filter, defaults to None
transform_result
: function to apply to each partition; by default, does not transform the result
max_sample_batch_size
: maximum number of samples to read in a single node, defaults to 500
log_uri
: log array URI for profiling, defaults to None
namespace
: TileDB-Cloud namespace, defaults to None
resource_class
: TileDB-Cloud resource class for UDFs, defaults to None
verbose
: verbose logging, defaults to False
Return value
Arrow table containing the query results
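A distributed read could be wrapped as in the following sketch. The helper names `format_region` and `read_variants` are hypothetical, the `contig:start-end` region form is an assumption, and the conversion assumes the returned Arrow table is a `pyarrow.Table` that supports `to_pandas()`.

```python
def format_region(contig, start, end):
    """Build a region string in the assumed "contig:start-end" form."""
    if start > end:
        raise ValueError("start must not exceed end")
    return f"{contig}:{start}-{end}"


# Hypothetical regions of interest.
regions = [format_region("chr1", 1, 1000000), format_region("chr2", 500, 2000)]


def read_variants(dataset_uri, regions, samples=None):
    # Imported lazily so the sketch can be read without the package installed.
    import tiledb.cloud.vcf

    table = tiledb.cloud.vcf.read(
        dataset_uri,
        regions=regions,
        samples=samples,
        memory_budget_mb=1024,  # the documented default, shown explicitly
    )
    # Assumes a pyarrow.Table; convert to pandas for downstream analysis.
    return table.to_pandas()


print(regions)  # ['chr1:1-1000000', 'chr2:500-2000']
```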