Project Lead:

Analysis Methodology

Our objective was to nominate, for each locus in the most recent PD GWAS, the gene most likely to be involved in PD (see Figure 1 for the study protocol). To do so, we first defined all genes and SNPs within these loci (see below) and then used a machine learning approach to nominate the top genes in each locus. Based on previous literature and consensus among the authors, we identified seven genes from well-established PD-associated loci that can be considered the most likely driver genes of their respective loci (GBA1, LRRK2, SNCA, GCH1, MAPT, TMEM175, VPS13C). We then acquired data for multiple features, including different distance measures from the top SNPs, different QTLs, expression in relevant tissues and cell types, and predictions of variant consequences (78 of 284 features were retained after removal of redundant features; Supplementary Table 1). We trained a machine learning model using the seven well-established PD genes, labeled as positive, and 212 genes in the same loci, labeled as negative (i.e. unlikely to drive the association with PD, since the PD-driving gene is already well established). This model generated a prediction score for each gene within each locus, reflecting its potential involvement in PD. The gene with the highest score in each locus is the gene nominated for that locus. We then performed multiple post hoc analyses to further validate and explore our results: burden tests for rare variants in the top-scoring genes, pathway enrichment and pathway PRS analyses, differential expression analyses, and structural analyses of candidate coding variants.
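
The final nomination step (rank genes by prediction score within each locus and take the top one) can be illustrated with a minimal Python sketch. The gene names, locus numbers and probabilities are copied from the Nominated Targets records on this page for loci 1, 3 and 4; the function name `rank_within_loci` and the data layout are hypothetical and are not taken from the study's code.

```python
from collections import defaultdict

# (gene, locus, prediction probability), copied from the Nominated Targets
# records on this page for loci 1, 3 and 4.
scores = [
    ("GBA1", 1, 0.95301306),
    ("SLC50A1", 1, 0.22055028),
    ("SLC25A44", 1, 0.17052083),
    ("VAMP4", 3, 0.2436475),
    ("DNM3", 3, 0.1755844),
    ("NUCKS1", 4, 0.7768597),
    ("RAB7L1", 4, 0.4207892),
]


def rank_within_loci(rows):
    """Group genes by locus and rank them by descending probability."""
    by_locus = defaultdict(list)
    for gene, locus, prob in rows:
        by_locus[locus].append((gene, prob))
    ranked = {}
    for locus, genes in by_locus.items():
        ordered = sorted(genes, key=lambda g: g[1], reverse=True)
        ranked[locus] = [
            (rank, gene, prob)
            for rank, (gene, prob) in enumerate(ordered, start=1)
        ]
    return ranked


ranked = rank_within_loci(scores)

# The nominated gene of a locus is the rank-1 gene.
nominated = {locus: rows[0][1] for locus, rows in ranked.items()}
print(nominated)  # {1: 'GBA1', 3: 'VAMP4', 4: 'NUCKS1'}
```

The ranking is relative within a locus: VAMP4 is the nominated gene of locus 3 with a probability of about 0.24, whereas SLC50A1 has a similar probability but ranks second in locus 1 behind GBA1.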


Underlying Analyses

Analysis Type:

Machine Learning

Target Type:

Gene

Description:

Genome-wide association studies (GWAS) have nominated many variants associated with complex traits. In Parkinson’s disease (PD), the most recent GWAS revealed 90 independent risk variants across 78 genomic loci. Although many single-nucleotide polymorphisms (SNPs) are in novel genomic loci, well-established PD genes discovered many years ago, such as LRRK2, PINK1, DJ-1, SNCA, GBA1, PRKN and MAPT, still account for the vast majority of research on Parkinson’s disease.
Several disadvantages of GWAS limit additional functional analyses. First, more than 90% of all GWAS-significant SNPs are in noncoding regions, and these SNPs are often passenger variants due to complex linkage disequilibrium (LD). Second, the causal gene associated with the causal SNPs remains unclear in most GWAS loci. To overcome these challenges, downstream GWAS analyses were established with the aim of identifying causal genes within GWAS loci. These include fine-mapping and colocalization methods to nominate causal SNPs, as well as transcriptome-wide association studies to nominate gene-trait associations. These methods use LD structure and gene expression panels to discover causal SNPs and genes. While they may propose causal variants and genes, additional biological evidence is generally required to pair causal variants with causal genes.
Using multi-omic analyses, one can integrate a diverse range of comprehensive cellular and biological datasets, such as genomic, transcriptomic and epigenetic datasets, and use platforms such as Open Targets Genetics (https://genetics.opentargets.org/) to perform systematic gene prioritization across all publicly available GWASs. Although powerful, Open Targets Genetics lacks disease-specific tissues and cell types relevant to PD, such as dopaminergic neurons and microglia. Using a similar approach, we may discover additional pathways and genetic targets involved in PD.
In this study, we leveraged PD-relevant transcriptomic, epigenomic and other datasets in our gradient boosting model (Figure 1). We trained this model on well-established PD genes to nominate causal genes from PD GWAS loci.
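
As a rough illustration of what training a gradient boosting classifier on a few positive genes and many negative genes involves, the sketch below implements a minimal version in plain Python. It fits one-split trees (stumps) to the negative gradient of the log-loss. Everything in it is an assumption for illustration: the source does not state which library, tree depth, learning rate or number of rounds was used, and the two toy features (distance to the lead SNP in kb and a QTL flag) and all of their values are invented.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Least-squares fit of a one-split tree (stump) to the residuals."""
    mean_all = sum(residuals) / len(residuals)
    # Fallback when no feature can be split: predict the overall mean.
    best = (float("inf"), 0, float("inf"), mean_all, mean_all)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            l_mean = sum(left) / len(left)
            r_mean = sum(right) / len(right)
            sse = (sum((r - l_mean) ** 2 for r in left)
                   + sum((r - r_mean) ** 2 for r in right))
            if sse < best[0]:
                best = (sse, j, thr, l_mean, r_mean)
    return best[1:]


def stump_predict(stump, row):
    j, thr, l_mean, r_mean = stump
    return l_mean if row[j] <= thr else r_mean


def fit_gradient_boosting(X, y, n_rounds=50, learning_rate=0.3):
    """Boost stumps on the log-loss; returns (base score, rate, stumps)."""
    pos_rate = sum(y) / len(y)
    base = math.log(pos_rate / (1.0 - pos_rate))  # prior log-odds
    scores = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log-loss with respect to the score.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, row)
                  for s, row in zip(scores, X)]
    return base, learning_rate, stumps


def predict_proba(model, row):
    base, rate, stumps = model
    return sigmoid(base + rate * sum(stump_predict(s, row) for s in stumps))


# Hypothetical toy data: [distance to lead SNP (kb), has QTL (0/1)].
X = [[5.0, 1.0], [12.0, 1.0], [150.0, 0.0], [300.0, 1.0],
     [80.0, 0.0], [420.0, 0.0], [95.0, 1.0], [600.0, 0.0]]
y = [1, 1, 0, 0, 0, 0, 0, 0]  # 1 = established gene, 0 = other gene in locus

model = fit_gradient_boosting(X, y)
for row, label in zip(X, y):
    print(label, round(predict_proba(model, row), 3))
```

With only seven positive genes against 212 negatives, as in this study, the classes are heavily imbalanced. A real analysis would need an evaluation scheme that accounts for this, which this sketch does not attempt.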

Tissue Type:

Brain Tissue, Whole Blood

Source Data Type:

Genomics, Transcriptomics, Epigenetics

Source Data Cohorts:

FOUNDIN-PD, McGill, Parkinson's Progression Markers Initiative (PPMI), BioFIND, Parkinson's Disease Biomarkers Program (PDBP), Harvard Biomarkers Study (HBS), NINDS Study of Isradipine as a Disease-modifying Agent in Subjects With Early Parkinson Disease, Phase 3 (STEADY-PD3), Vance (dbGaP phs000394), International Parkinson's Disease Genomics Consortium (IPDGC) NeuroX dataset (dbGaP phs000918.v1.p1), National Institute of Neurological Disorders and Stroke (NINDS) Genome-Wide genotyping in Parkinson's Disease (dbGaP phs000089.v4.p2), NeuroGenetics Research Consortium (NGRC) (dbGaP phs000196.v3.p1), UK Biobank, GTEx, SMR, Cuomo et al. 2020, Bryois et al. 2021, Kamath et al. 2021


Data Dictionary

Field Name                        Field Name Expanded      Short Description (optional)
Gene                              Gene Name
Locus                             Gene locus
ensembl_id                        Ensembl ID
gene_type                         HGNC Gene Locus Type
Prediction Rank
Prediction Probability
Nearest gene GWAS
Nearest gene based on distance
Nearest gene based on TSS
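
For readers who want to load the records that follow into code, the fields above could be represented as a small typed record. This is a sketch under stated assumptions: the class name `NominatedTarget`, the helpers `parse_flag` and `parse_record`, and the dictionary keys are hypothetical, and treating a blank "Nearest gene" value as "no" is an interpretation that the source does not state. The example values are those of the GBA1 record on this page.

```python
from dataclasses import dataclass


@dataclass
class NominatedTarget:
    gene: str
    locus: int
    ensembl_id: str
    gene_type: str
    prediction_rank: int
    prediction_probability: float
    nearest_gene_gwas: bool
    nearest_gene_distance: bool
    nearest_gene_tss: bool


def parse_flag(value: str) -> bool:
    # Assumption: "Yes" means true and a blank value means false.
    return value.strip().lower() == "yes"


def parse_record(raw: dict) -> NominatedTarget:
    """Convert one record of raw strings into typed fields."""
    return NominatedTarget(
        gene=raw["Gene"],
        # Locus and rank appear as "1.0", "2.0", ... in the records.
        locus=int(float(raw["Locus"])),
        ensembl_id=raw["ensembl_id"],
        gene_type=raw["gene_type"],
        prediction_rank=int(float(raw["Prediction Rank"])),
        prediction_probability=float(raw["Prediction Probability"]),
        nearest_gene_gwas=parse_flag(raw["Nearest gene GWAS"]),
        nearest_gene_distance=parse_flag(raw["Nearest gene based on distance"]),
        nearest_gene_tss=parse_flag(raw["Nearest gene based on TSS"]),
    )


# Values copied from the GBA1 record on this page.
gba1 = parse_record({
    "Gene": "GBA1",
    "Locus": "1.0",
    "ensembl_id": "ENSG00000177628",
    "gene_type": "protein-coding gene",
    "Prediction Rank": "1.0",
    "Prediction Probability": "0.95301306",
    "Nearest gene GWAS": "",
    "Nearest gene based on distance": "Yes",
    "Nearest gene based on TSS": "",
})
print(gba1)
```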

Nominated Targets

GBA1 (gene)

ensembl_id: ENSG00000177628
hgnc_symbol: GBA1
gene_type: protein-coding gene
gene_name:
GBA1
Locus:
1.0
Prediction rank:
1.0
Prediction probability:
0.95301306
Nearest gene GWAS:
Nearest gene based on distance:
Yes
Nearest gene based on TSS:

SLC50A1 (gene)

ensembl_id: ENSG00000169241
hgnc_symbol: SLC50A1
gene_type: protein-coding gene
gene_name:
SLC50A1
Locus:
1.0
Prediction rank:
2.0
Prediction probability:
0.22055028
Nearest gene GWAS:
Nearest gene based on distance:
Yes
Nearest gene based on TSS:
Yes

SLC25A44 (gene)

ensembl_id: ENSG00000160785
hgnc_symbol: SLC25A44
gene_type: protein-coding gene
gene_name:
SLC25A44
Locus:
1.0
Prediction rank:
3.0
Prediction probability:
0.17052083
Nearest gene GWAS:
Nearest gene based on distance:
Nearest gene based on TSS:

FCGR2A (gene)

ensembl_id: ENSG00000143226
hgnc_symbol: FCGR2A
gene_type: protein-coding gene
gene_name:
FCGR2A
Locus:
2.0
Prediction rank:
1.0
Prediction probability:
0.7849973
Nearest gene GWAS:
Yes
Nearest gene based on distance:
Yes
Nearest gene based on TSS:
Yes

VAMP4 (gene)

ensembl_id: ENSG00000117533
hgnc_symbol: VAMP4
gene_type: protein-coding gene
gene_name:
VAMP4
Locus:
3.0
Prediction rank:
1.0
Prediction probability:
0.2436475
Nearest gene GWAS:
Yes
Nearest gene based on distance:
Yes
Nearest gene based on TSS:
Yes

DNM3 (gene)

ensembl_id: ENSG00000197959
hgnc_symbol: DNM3
gene_type: protein-coding gene
gene_name:
DNM3
Locus:
3.0
Prediction rank:
2.0
Prediction probability:
0.1755844
Nearest gene GWAS:
Nearest gene based on distance:
Nearest gene based on TSS:

NUCKS1 (gene)

ensembl_id: ENSG00000069275
hgnc_symbol: NUCKS1
gene_type: protein-coding gene
gene_name:
NUCKS1
Locus:
4.0
Prediction rank:
1.0
Prediction probability:
0.7768597
Nearest gene GWAS:
Yes
Nearest gene based on distance:
Yes
Nearest gene based on TSS:
Yes

RAB7L1 (gene)

ensembl_id: ENSG00000117280
hgnc_symbol: RAB7L1
gene_type: protein-coding gene
gene_name:
RAB7L1
Locus:
4.0
Prediction rank:
2.0
Prediction probability:
0.4207892
Nearest gene GWAS:
Nearest gene based on distance:
Yes
Nearest gene based on TSS:

ITPKB (gene)

ensembl_id: ENSG00000143772
hgnc_symbol: ITPKB
gene_type: protein-coding gene
gene_name:
ITPKB
Locus:
5.0
Prediction rank:
1.0
Prediction probability:
0.5402806
Nearest gene GWAS:
Yes
Nearest gene based on distance:
Yes
Nearest gene based on TSS:
Yes

SIPA1L2 (gene)

ensembl_id: ENSG00000116991
hgnc_symbol: SIPA1L2
gene_type: protein-coding gene
gene_name:
SIPA1L2
Locus:
6.0
Prediction rank:
1.0
Prediction probability:
0.8489601
Nearest gene GWAS:
Yes
Nearest gene based on distance:
Yes
Nearest gene based on TSS:
Yes

KCNS3 (gene)

ensembl_id: ENSG00000170745
hgnc_symbol: KCNS3
gene_type: protein-coding gene
gene_name:
KCNS3
Locus:
7.0
Prediction rank:
1.0
Prediction probability:
0.834962
Nearest gene GWAS:
Yes
Nearest gene based on distance:
Yes
Nearest gene based on TSS:
Yes

SMC6 (gene)

ensembl_id: ENSG00000163029
hgnc_symbol: SMC6
gene_type: protein-coding gene
gene_name:
SMC6
Locus:
7.0
Prediction rank:
2.0
Prediction probability:
0.10313215
Nearest gene GWAS:
Nearest gene based on distance:
Nearest gene based on TSS:

KCNIP3 (gene)

ensembl_id: ENSG00000115041
hgnc_symbol: KCNIP3
gene_type: protein-coding gene
gene_name:
KCNIP3
Locus:
8.0
Prediction rank:
1.0
Prediction probability:
0.770249
Nearest gene GWAS:
Yes
Nearest gene based on distance:
Yes
Nearest gene based on TSS:
Yes

MAP4K4 (gene)

ensembl_id: ENSG00000071054
hgnc_symbol: MAP4K4
gene_type: protein-coding gene
gene_name:
MAP4K4
Locus:
9.0
Prediction rank:
1.0
Prediction probability:
0.8729259
Nearest gene GWAS:
Yes
Nearest gene based on distance:
Yes
Nearest gene based on TSS:
Yes

TMEM163 (gene)

ensembl_id: ENSG00000152128
hgnc_symbol: TMEM163
gene_type: protein-coding gene
gene_name:
TMEM163
Locus:
10.0
Prediction rank:
1.0
Prediction probability:
0.97705317
Nearest gene GWAS:
Yes
Nearest gene based on distance:
Yes
Nearest gene based on TSS:
Yes

STK39 (gene)

ensembl_id: ENSG00000198648
hgnc_symbol: STK39
gene_type: protein-coding gene
gene_name:
STK39
Locus:
11.0
Prediction rank:
1.0
Prediction probability:
0.89619964
Nearest gene GWAS:
Yes
Nearest gene based on distance:
Yes
Nearest gene based on TSS:
Yes

TBC1D5 (gene)

ensembl_id: ENSG00000131374
hgnc_symbol: TBC1D5
gene_type: protein-coding gene
gene_name:
TBC1D5
Locus:
12.0
Prediction rank:
1.0
Prediction probability:
0.73659813
Nearest gene GWAS:
Nearest gene based on distance:
Yes
Nearest gene based on TSS:
Yes

SATB1 (gene)

ensembl_id: ENSG00000182568
hgnc_symbol: SATB1
gene_type: protein-coding gene
gene_name:
SATB1
Locus:
12.0
Prediction rank:
2.0
Prediction probability:
0.15966341
Nearest gene GWAS:
Yes
Nearest gene based on distance:
Nearest gene based on TSS:

RBMS3 (gene)

ensembl_id: ENSG00000144642
hgnc_symbol: RBMS3
gene_type: protein-coding gene
gene_name:
RBMS3
Locus:
13.0
Prediction rank:
1.0
Prediction probability:
0.22128376
Nearest gene GWAS:
Nearest gene based on distance:
Nearest gene based on TSS:
Yes

IP6K2 (gene)

ensembl_id: ENSG00000068745
hgnc_symbol: IP6K2
gene_type: protein-coding gene
gene_name:
IP6K2
Locus:
14.0
Prediction rank:
1.0
Prediction probability:
0.5775295
Nearest gene GWAS:
Yes
Nearest gene based on distance:
Yes
Nearest gene based on TSS: