
External Annotations

Learn about joining variant annotation sources with TileDB-VCF datasets.
How to run this tutorial

You can run this tutorial only on TileDB Cloud. However, TileDB Cloud has a free tier. We strongly recommend that you sign up and run everything there, as that requires no installations or deployment.

This tutorial demonstrates how to query external annotation tables (stored as TileDB arrays) and join the returned information against TileDB-VCF datasets.

Import the necessary libraries and set the URIs used in this tutorial. If you are running from a local notebook, see Tutorials: Basic TileDB Cloud for how to set your TileDB Cloud credentials in a configuration object (you can skip this step inside a TileDB Cloud notebook).

  • Python
import os

import tiledb
import tiledb.cloud
import tiledb.cloud.vcf
import tiledb.cloud.vcf.vcf_toolbox as vtb
import tiledbvcf

# Get your credentials
tiledb_token = os.environ["TILEDB_REST_TOKEN"]
# or use your username and password (not recommended)
# tiledb_username = os.environ["TILEDB_USERNAME"]
# tiledb_password = os.environ["TILEDB_PASSWORD"]


# Public URI datasets to be used in this tutorial
vep_uri = "tiledb://tiledb-genomics-dev/vep_20230726_6"
vcf_uri = "tiledb://TileDB-Inc/vcf-1kg-dragen-v376"

# Log into TileDB Cloud
tiledb.cloud.login(token=tiledb_token)
# or use your username and password (not recommended)
# tiledb.cloud.login(username=tiledb_username, password=tiledb_password)
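The `os.environ` lookup above raises a bare `KeyError` when `TILEDB_REST_TOKEN` is missing. If you want a clearer failure, here is a minimal sketch of a hypothetical helper, `get_rest_token` (not part of `tiledb.cloud`), that reads the token from any mapping and explains what to set.

```python
import os
from typing import Mapping, Optional


def get_rest_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the TileDB Cloud REST token (hypothetical helper, not a tiledb.cloud API)."""
    env = os.environ if env is None else env
    token = env.get("TILEDB_REST_TOKEN", "").strip()
    if not token:
        # Fail early with an actionable message instead of a bare KeyError
        raise RuntimeError(
            "TILEDB_REST_TOKEN is not set; create a REST API token in your "
            "TileDB Cloud profile and export it before running this tutorial"
        )
    return token


# Usage: tiledb.cloud.login(token=get_rest_token())
```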

First, create a function that searches for terms in the Consequence field of a VEP annotation table:

  • Python
def query_vep_by_consequence(
    vep_uri: str = None, consequence_list: list = None, genomic_coordinates: list = None
):
    """
    This function queries the VEP variant annotation array
    Arguments:
        @param vep_uri: the URI of the VEP annotation array
        @param consequence_list: the list of VEP consequences to match
        @param genomic_coordinates: the list of regions, as "chrN:start-end"
    Returns:
        A pyarrow.Table of variants and consequences
    Todo:
        Accept gene lists rather than genomic coordinates
    """
    import re

    import pandas
    import pyarrow
    import tiledb
    import tiledb.cloud

    with tiledb.open(vep_uri, ctx=tiledb.cloud.Ctx()) as vep_array_obj:
        resdfs = []
        for genomic_coordinate in genomic_coordinates:
            # Parse "chrN:start-end" into contig, start and end (spaces are ignored)
            regexgroups = re.match(
                "(chr[0-9XYMT]+):([0-9]+)-([0-9]+)", genomic_coordinate.replace(" ", "")
            )
            if regexgroups is None:
                raise ValueError(f"Invalid genomic coordinate: {genomic_coordinate}")
            regexchr = regexgroups.group(1)
            regexstart = int(regexgroups.group(2))
            regexend = int(regexgroups.group(3))
            # Keep rows whose Consequence is in the list, then slice the contig
            # and position range (TileDB slices include both endpoints)
            resdfs += [
                vep_array_obj.query(
                    cond=f"Consequence in {consequence_list}",
                    dims=[
                        "contig",
                        "pos_start",
                    ],
                    attrs=[
                        "ref",
                        "alt",
                        "Gene",
                        "Feature",
                        "Feature_type",
                        "Consequence",
                        "cDNA_position",
                        "CDS_position",
                        "Protein_position",
                        "Amino_acids",
                        "Codons",
                    ],
                ).df[regexchr, regexstart:regexend]
            ]
        results_df = pandas.concat(resdfs, ignore_index=True)
    results = pyarrow.Table.from_pandas(results_df, preserve_index=False)
    return results
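The region parsing and the query-condition string inside `query_vep_by_consequence` can be checked without touching TileDB Cloud. The sketch below pulls them into two standalone helpers; `parse_region` and `consequence_condition` are hypothetical names, not part of any TileDB package. They reuse the regex from the function above and show how the f-string renders a Python list into the `Consequence in [...]` condition.

```python
import re
from typing import List, Tuple

# Same pattern as in query_vep_by_consequence: "chrN:start-end"
REGION_RE = re.compile(r"(chr[0-9XYMT]+):([0-9]+)-([0-9]+)")


def parse_region(region: str) -> Tuple[str, int, int]:
    """Split 'chrN:start-end' into (contig, start, end); spaces are ignored."""
    match = REGION_RE.fullmatch(region.replace(" ", ""))
    if match is None:
        raise ValueError(f"Invalid genomic coordinate: {region!r}")
    contig, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"Start is after end in {region!r}")
    return contig, start, end


def consequence_condition(consequences: List[str]) -> str:
    """Render the query condition exactly as the f-string in the function does."""
    return f"Consequence in {consequences}"
```

Using `fullmatch` rather than `re.match` is a deliberate choice here: `re.match` only anchors at the start, so input such as `chr2:1-10abc` would be silently accepted as `chr2:1-10`.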

Use this function to search for frameshift variants in the TTN gene, as was done for 1000 Genomes samples in this manuscript.

  • Python
consequence_results = query_vep_by_consequence(
    vep_uri=vep_uri,
    consequence_list=["frameshift_variant"],
    genomic_coordinates=["chr2:178525989-178830802"],
)
consequence_results.to_pandas()
contig pos_start ref alt Gene Feature Feature_type Consequence cDNA_position CDS_position Protein_position Amino_acids Codons
0 chr2 178527497 TA T ENSG00000155657 ENST00000589042 Transcript frameshift_variant 107853 107628 35876 N/X aaT/aa
1 chr2 178530599 TC T ENSG00000155657 ENST00000589042 Transcript frameshift_variant 106240 106015 35339 D/X Gat/at
2 chr2 178531200 T TC ENSG00000155657 ENST00000589042 Transcript frameshift_variant 105639-105640 105414-105415 35138-35139 -/X -/G
3 chr2 178531203 TG T ENSG00000155657 ENST00000589042 Transcript frameshift_variant 105636 105411 35137 S/X tcC/tc
4 chr2 178531285 T TG ENSG00000155657 ENST00000589042 Transcript frameshift_variant 105554-105555 105329-105330 35110 E/DX gaa/gaCa
... ... ... ... ... ... ... ... ... ... ... ... ... ...
124 chr2 178774924 C CGTTGTTG ENSG00000155657 ENST00000589042 Transcript frameshift_variant 7011-7012 6786-6787 2262-2263 -/QQX -/CAACAAC
125 chr2 178774927 CAA C ENSG00000155657 ENST00000589042 Transcript frameshift_variant 7007-7008 6782-6783 2261 I/X aTT/a
126 chr2 178779363 CT C ENSG00000155657 ENST00000589042 Transcript frameshift_variant 4053 3828 1276 E/X gaA/ga
127 chr2 178785866 TC T ENSG00000155657 ENST00000589042 Transcript frameshift_variant 2576 2351 784 G/X gGa/ga
128 chr2 178793484 C CT ENSG00000155657 ENST00000589042 Transcript frameshift_variant 1680-1681 1455-1456 485-486 -/X -/A

129 rows × 13 columns

The results of this function can also be turned into a list of search loci (chr:start-end) that can be passed as the regions argument of a tiledbvcf read query.

Use query_vep_by_consequence to search for coding variants in the region chr16:30915000-30975000 and generate the loci of interest.

  • Python
vep_res_arrow = query_vep_by_consequence(
    vep_uri=vep_uri,
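    # Note: VEP reports nonsense changes as "stop_gained"; "nonsense_variant" is not
    # a VEP consequence term, so add "stop_gained" to the list to include them
    consequence_list=["missense_variant", "nonsense_variant", "frameshift_variant"],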
    genomic_coordinates=["chr16:30915000-30975000"],
)
vep_res = vep_res_arrow.to_pandas()
query_loci = vep_res.apply(
    lambda x: f"{x['contig']}:{x['pos_start']}-{x['pos_start']}", axis=1
).tolist()
len(query_loci)
233
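VEP can report more than one row for the same variant (one per affected transcript), so query_loci may contain repeated entries. A minimal sketch of building the same `chr:pos-pos` loci while dropping duplicate positions and keeping first-seen order follows; `unique_point_loci` is a hypothetical helper, and whether deduplication changes the tiledbvcf result for this dataset has not been verified here.

```python
from typing import Iterable, List, Tuple


def unique_point_loci(positions: Iterable[Tuple[str, int]]) -> List[str]:
    """Build 'contig:pos-pos' loci, keeping only the first occurrence of each position."""
    seen = set()
    loci = []
    for contig, pos in positions:
        key = (contig, int(pos))  # int() also normalizes numpy integers
        if key in seen:
            continue
        seen.add(key)
        loci.append(f"{key[0]}:{key[1]}-{key[1]}")
    return loci


# Applied to the VEP results above (not run here):
# query_loci = unique_point_loci(zip(vep_res["contig"], vep_res["pos_start"]))
```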

Query the DRAGEN 1000 Genomes dataset on those loci:

  • Python
ds = tiledbvcf.Dataset(vcf_uri, tiledb_config=tiledb.cloud.Config())
attrs = ds.attributes()
vcf_df = ds.read(regions=query_loci, samples=None, attrs=attrs)
vcf_df
alleles contig filters fmt fmt_AD fmt_AF fmt_DP fmt_F1R2 fmt_F2R1 fmt_GP ... info_R2_5P_bias info_ReadPosRankSum info_SOR pos_end pos_start qual query_bed_end query_bed_line query_bed_start sample_name
0 [T, TC] chr16 [PASS] [0, 0, 0, 0] [18, 13] [0.419] 31 [11, 7] [7, 6] [48.238, 8.6978e-05, 53.0] ... [-0.172] [-1.541] [2.147] 30953571 30953571 48.240002 30953571 -1 30953570 NA18868
1 [G, T] chr16 [PASS] [0, 0, 0, 0] [21, 18] [0.462] 39 [4, 9] [17, 9] [49.354, 7.2222e-05, 53.0] ... [20.499] [3.761] [0.646] 30958751 30958751 49.349998 30958751 -1 30958750 NA18871
2 [G, A] chr16 [PASS] [0, 0, 0, 0] [10, 20] [0.667] 30 [8, 7] [2, 13] [50.0, 0.00016257, 45.622] ... [-3.06] [2.354] [0.765] 30964724 30964724 50.000000 30964724 -1 30964723 NA18619
3 [A, T] chr16 [PASS] [0, 0, 0, 0] [26, 21] [0.447] 47 [10, 11] [16, 10] [48.754, 7.9471e-05, 53.0] ... [1.227] [3.691] [0.905] 30964989 30964989 48.750000 30964989 -1 30964988 NA18626
4 [C, G] chr16 [PASS] [0, 0, 0, 0] [19, 10] [0.345] 29 [5, 5] [14, 5] [44.034, 0.00019337, 53.0] ... [-6.998] [3.556] [0.555] 30965664 30965664 44.029999 30965664 -1 30965663 NA18570
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
263 [T, A] chr16 [PASS] [0, 0, 0, 0] [21, 25] [0.543] 46 [12, 18] [9, 7] [50.0, 6.9116e-05, 52.272] ... [-4.939] [4.014] [0.523] 30964808 30964808 50.000000 30964808 -1 30964807 HG03022
264 [C, G] chr16 [PASS] [0, 0, 0, 0] [17, 24] [0.585] 41 [5, 11] [12, 13] [50.0, 8.2059e-05, 50.516] ... [8.578] [3.533] [0.637] 30965042 30965042 50.000000 30965042 -1 30965041 HG03096
265 [G, A] chr16 [PASS] [0, 0, 0, 0] [24, 20] [0.455] 44 [8, 5] [16, 15] [49.279, 7.2999e-05, 53.0] ... [-6.616] [1.638] [0.43] 30966144 30966144 49.279999 30966144 -1 30966143 HG03168
266 [G, A] chr16 [PASS] [0, 0, 0, 0] [20, 28] [0.583] 48 [11, 11] [9, 17] [50.0, 8.1542e-05, 50.576] ... [-22.176] [2.75] [1.565] 30966144 30966144 50.000000 30966144 -1 30966143 HG03170
267 [G, T] chr16 [PASS] [0, 0, 0, 0] [18, 21] [0.538] 39 [12, 14] [6, 7] [50.0, 6.8339e-05, 52.426] ... [20.639] [3.169] [1.432] 30971395 30971395 50.000000 30971395 -1 30971394 HG03063

268 rows × 42 columns

Split the alleles column into ref and alt columns, then merge the VCF and VEP data frames on contig, pos_start, ref, and alt. Because merge performs an inner join by default, the resulting data frame is limited to coding variants found in the TileDB-VCF dataset for this region.

  • Python
vcf_df["ref"] = vcf_df["alleles"].str[0]
vcf_df["alt"] = vcf_df["alleles"].apply(lambda x: ",".join(x[1:]))
vcf_df = vcf_df.drop("alleles", axis=1)
vcf_df.merge(vep_res, on=["contig", "pos_start", "ref", "alt"])
contig filters fmt fmt_AD fmt_AF fmt_DP fmt_F1R2 fmt_F2R1 fmt_GP fmt_GQ ... alt Gene Feature Feature_type Consequence cDNA_position CDS_position Protein_position Amino_acids Codons
0 chr16 [PASS] [0, 0, 0, 0] [18, 13] [0.419] 31 [11, 7] [7, 6] [48.238, 8.6978e-05, 53.0] 47 ... TC ENSG00000175938 ENST00000318663 Transcript frameshift_variant 837-838 615-616 205-206 -/X -/C
1 chr16 [PASS] [0, 0, 0, 0] [19, 17] [0.472] 36 [10, 11] [9, 6] [49.758, 6.7563e-05, 53.0] 48 ... TC ENSG00000175938 ENST00000318663 Transcript frameshift_variant 837-838 615-616 205-206 -/X -/C
2 chr16 [PASS] [0, 0, 0, 0] [28, 30] [0.517] 58 [13, 10] [15, 20] [50.0, 6.5751e-05, 52.896] 48 ... TC ENSG00000175938 ENST00000318663 Transcript frameshift_variant 837-838 615-616 205-206 -/X -/C
3 chr16 [PASS] [0, 0, 0, 0] [20, 25] [0.556] 45 [7, 7] [13, 18] [50.0, 7.1187e-05, 51.927] 48 ... TC ENSG00000175938 ENST00000318663 Transcript frameshift_variant 837-838 615-616 205-206 -/X -/C
4 chr16 [PASS] [0, 0, 0, 0] [23, 28] [0.549] 51 [12, 14] [11, 14] [50.0, 6.9893e-05, 52.166] 48 ... TC ENSG00000175938 ENST00000318663 Transcript frameshift_variant 837-838 615-616 205-206 -/X -/C
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
260 chr16 [PASS] [0, 0, 0, 0] [15, 14] [0.483] 29 [8, 7] [7, 7] [49.925, 6.601e-05, 53.0] 48 ... A ENSG00000099381 ENST00000262519 Transcript missense_variant 2571 2345 782 G/D gGc/gAc
261 chr16 [PASS] [0, 0, 0, 0] [31, 13] [0.295] 44 [13, 10] [18, 3] [35.039, 0.001383, 53.001] 35 ... C ENSG00000099364 ENST00000338343 Transcript frameshift_variant 2297 1385 462 R/X cGc/cc
262 chr16 [PASS] [0, 0, 0, 0] [15, 20] [0.571] 35 [7, 11] [8, 9] [50.0, 7.4552e-05, 51.45] 48 ... T ENSG00000175938 ENST00000318663 Transcript missense_variant 934 712 238 H/Y Cat/Tat
263 chr16 [PASS] [0, 0, 0, 0] [13, 12] [0.48] 25 [9, 7] [4, 5] [49.913, 6.601e-05, 53.0] 48 ... T ENSG00000175938 ENST00000318663 Transcript missense_variant 934 712 238 H/Y Cat/Tat
264 chr16 [PASS] [0, 0, 0, 0] [13, 22] [0.629] 35 [9, 13] [4, 9] [50.0, 0.00011364, 47.909] 46 ... T ENSG00000099381 ENST00000262519 Transcript missense_variant 1047 821 274 T/M aCg/aTg

265 rows × 52 columns
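The merge above is an inner join, so the row count can change: VCF records with no matching VEP row are dropped, and a record that matches several VEP rows is repeated. The pandas code also comma-joins the alternate alleles, so a multiallelic record (alt such as `A,T`) only matches a VEP row carrying that exact combined string. Below is a minimal stdlib sketch, with hypothetical helper names, of the alternative: emit one row per alternate allele (the behavior the `_annotate` docstring later in this tutorial attributes to `split_multiallelic=True`), then hash-join on the same four keys.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

KEYS = ("contig", "pos_start", "ref", "alt")


def split_alleles(record: Dict) -> List[Dict]:
    """Turn one VCF record with an 'alleles' list into one row per alternate allele."""
    ref, *alts = record["alleles"]
    base = {k: v for k, v in record.items() if k != "alleles"}
    return [{**base, "ref": ref, "alt": alt} for alt in alts]


def inner_join(vcf_rows: List[Dict], vep_rows: List[Dict], keys: Tuple[str, ...] = KEYS) -> List[Dict]:
    """Hash join: index VEP rows by key, then emit one merged row per match."""
    index = defaultdict(list)
    for row in vep_rows:
        index[tuple(row[k] for k in keys)].append(row)
    merged = []
    for row in vcf_rows:
        # .get() does not insert missing keys into the defaultdict
        for match in index.get(tuple(row[k] for k in keys), []):
            merged.append({**row, **match})
    return merged
```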

TileDB Cloud offers a convenience utility called annotate in the tiledb.cloud.vcf.vcf_toolbox package, which performs joins like the one above more easily and in a distributed fashion (similar to Tutorials: Scalable Queries). This approach works well with the result transforms supported by tiledb.cloud.vcf.read through its transform_result parameter (visit Tutorials: Query Transforms for more information on query transforms).

Here is the documentation of the underlying _annotate function:

  • Python
from tiledb.cloud.vcf.vcf_toolbox.annotate import _annotate

help(_annotate)
Help on function _annotate in module tiledb.cloud.vcf.vcf_toolbox.annotate:

_annotate(vcf_df: pandas.core.frame.DataFrame, *, ann_uri: str, ann_regions: Union[str, Sequence[str]], ann_attrs: Union[Sequence[str], str, NoneType] = None, vcf_filter: Optional[str] = None, split_multiallelic: bool = True, add_zygosity: bool = False, reorder: Optional[Sequence[str]] = None, rename: Optional[Mapping[str, str]] = None, verbose: bool = False) -> pandas.core.frame.DataFrame
    Annotate a VCF DataFrame with annotations from a TileDB array.
    
    :param vcf_df: VCF DataFrame to annotate
    :param ann_uri: URI of the annotation array
    :param ann_regions: regions to annotate. All regions must be in the same
        chromosome/contig.
    :param ann_attrs: annotation attributes to read,
        defaults to None which queries all attributes.
    :param vcf_filter: a pandas filter to apply to the VCF DataFrame before annotation,
        defaults to None
    :param split_multiallelic: split multiallelic variants into separate rows,
        defaults to True
    :param add_zygosity: add zygosity column to the DataFrame, defaults to False
    :param reorder: list of columns to reorder (before renaming), defaults to None
    :param rename: dict of columns to rename, defaults to None
    :param verbose: enable verbose logging, defaults to False
    :return: annotated VCF DataFrame
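The `_annotate` signature takes the VCF DataFrame as its only positional argument and everything else as keyword-only arguments. That shape suggests (an inference, not something the source confirms) that `vtb.annotate(ann_uri=..., ann_regions=...)` binds the keyword arguments and hands `tiledb.cloud.vcf.read` a one-argument callable to apply to each result. The sketch below shows that partial-application pattern with `functools.partial`; `make_transform` and `_tag_rows` are hypothetical stand-ins that operate on lists of dicts instead of DataFrames, and the URI is a placeholder.

```python
import functools
from typing import Callable, Dict, List

Rows = List[Dict]


def _tag_rows(rows: Rows, *, ann_uri: str, verbose: bool = False) -> Rows:
    """Stand-in for an annotate step: add a column recording the annotation source."""
    if verbose:
        print(f"annotating {len(rows)} rows from {ann_uri}")
    return [{**row, "ann_source": ann_uri} for row in rows]


def make_transform(**kwargs) -> Callable[[Rows], Rows]:
    """Bind the keyword-only options now; the data argument is supplied later."""
    return functools.partial(_tag_rows, **kwargs)


# Placeholder URI, for illustration only
transform = make_transform(ann_uri="tiledb://example/annotations")
# A reader would then call transform(result) on each query result.
```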

Configure and run a VCF query on a small region of chr21 for sample NA12878, annotating the results with VEP.

  • Python
regions = "chr21:26973732-27213386"

# Run the VCF query with annotation
df = tiledb.cloud.vcf.read(
    dataset_uri=vcf_uri,
    regions=regions,
    samples="NA12878",
    transform_result=vtb.annotate(
        ann_uri=vep_uri,
        ann_regions=regions,
    ),
).to_pandas()
df
sample_name contig pos_start fmt_GT ref alt Gene Feature Feature_type Consequence ... gnomADg_ASJ_AF gnomADg_EAS_AF gnomADg_FIN_AF gnomADg_MID_AF gnomADg_NFE_AF gnomADg_OTH_AF gnomADg_SAS_AF CLIN_SIG SOMATIC PHENO
0 NA12878 chr21 26973860 [1, 1] G A None None None intergenic_variant ... 0.1866 0.008484 0.2310 0.1804 0.2395 0.1740 0.09950 None None None
1 NA12878 chr21 26974227 [1, 1] G C None None None intergenic_variant ... 0.6084 0.249900 0.5712 0.6487 0.5876 0.5941 0.38260 None None 1
2 NA12878 chr21 26974527 [1, 1] G A None None None intergenic_variant ... 0.1869 0.008308 0.2312 0.1804 0.2397 0.1745 0.09921 None None None
3 NA12878 chr21 26976088 [1, 1] CTATATA C None None None intergenic_variant ... 0.2852 0.059410 0.3608 0.3654 0.3537 0.3103 0.16260 None None None
4 NA12878 chr21 26977141 [1, 1] C A None None None intergenic_variant ... 0.1872 0.008320 0.2307 0.1772 0.2401 0.1761 0.09959 None None None
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
361 NA12878 chr21 27207400 [1, 1] C T None None None intergenic_variant ... 0.8284 0.829600 0.8893 0.7922 0.8123 0.8211 0.86620 None None None
362 NA12878 chr21 27208103 [1, 1] T C None None None intergenic_variant ... 0.8468 0.829200 0.9266 0.8323 0.8473 0.8622 0.87770 None None None
363 NA12878 chr21 27209337 [0, 1] T G None None None intergenic_variant ... 0.3681 0.493500 0.4325 0.3013 0.3781 0.3706 0.29170 None None None
364 NA12878 chr21 27210578 [1, 1] G A None None None intergenic_variant ... 0.7875 0.805600 0.8903 0.7468 0.8127 0.7874 0.79300 None None None
365 NA12878 chr21 27211633 [1, 1] G C None None None intergenic_variant ... 0.8847 0.406600 0.8037 0.8481 0.8779 0.8270 0.80540 None None None

366 rows × 47 columns
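The fmt_GT column stores each genotype as a list of allele indices, such as [1, 1] or [0, 1] in the output above. To illustrate what an option like add_zygosity involves, here is a minimal sketch that classifies such lists. The labels (HOM_REF, HET, HOM_ALT, and so on) are assumptions and may not match the values the toolbox actually writes, and treating negative indices as missing calls is an assumption about how missing genotypes are encoded.

```python
from typing import Sequence


def zygosity(gt: Sequence[int]) -> str:
    """Classify a genotype given as allele indices (illustrative labels only)."""
    alleles = [int(a) for a in gt]  # also works for numpy arrays
    # Assumption: negative indices (e.g. -1) mark missing calls
    if len(alleles) == 0 or any(a < 0 for a in alleles):
        return "MISSING"
    if len(alleles) == 1:
        return "HEMI_REF" if alleles[0] == 0 else "HEMI_ALT"
    distinct = set(alleles)
    if distinct == {0}:
        return "HOM_REF"
    if len(distinct) == 1:
        return "HOM_ALT"
    return "HET"  # includes two different alternate alleles, e.g. [1, 2]


# Applied to the DataFrame above (not run here):
# df["zygosity"] = df["fmt_GT"].apply(zygosity)
```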
