API-based method retrieves variant consequence proportions for human diseases
Explored by:
DARCI-MAHER, NICHOLAS WAXTER, RAZMA, CONNOR JOHN- (Bioinfo 201 - Winter 2022)
Description
Project overview
- 
Assume we are given a disease of interest 
- 
Use the Ensembl phenotype annotation endpoint to extract genomic regions associated with the disease 
- Use the Ensembl overlap endpoint to find variants in these regions 4. Use the Ensembl Variant Effect Predictor (VEP) endpoint to discover the consequences of each variant
- Calculate the proportions of each consequence for the disease
Function to access Ensembl Rest API features
● Input: Extension specifying application ● Output: Object associated with application ● In our case we will use this function to get a list of dictionaries with areas of the genome associated with alzheimer's

Function to find gene regions in our list of dictionaries
● Input: List of dictionaries of areas of genome associated with a disease ● Output: List of gene regions that are associated with a disease

Function to return all variants associated with regions
● Input: A list of regions associated with a disease ● Output: All variants associated with those regions

Use VEP to get variant consequence
● Input: a single variant ID (rsID) ● Output: The most severe consequence of that variant (e.g. missense, splice site, stop gained, etc.)

Calculate proportions of consequence in a variant list
● Input: a list of variant IDs ● Output: proportion of consequences for the disease associated with the list of variants

Results: Proportion of variant consequences in Alzheimer’s disease
● Ran our full pipeline on Alzheimer’s  disease (EFO ID=0000249) 
● Found that the most common
consequences in Alzheimer’s variants are: 
1. Intron variant (35%) 
2. Missense variant (14%) 
3.  3’ UTR variant (13%)
