Project Lead:

Analysis Methodology

To assess PD risk, summary statistics from the Chang et al. [7] PD GWAS meta-analysis, comprising 26,035 PD cases and 403,190 controls of European ancestry, were used as the reference dataset for the primary analysis to define risk allele weights. Individual-level genotyping data from the most recent GWAS meta-analysis [17] that were not included in Chang et al. [7] were then randomly divided into training and testing datasets. The training dataset used to construct the PRS consisted of 7218 PD cases and 9424 controls, and the testing dataset used to validate the results consisted of 5429 PD cases and 5814 controls, all of European ancestry. A polygenic effect score (PES) was generated to estimate polygenic risk for each of the 2199 gene sets representative of biological pathways, and each PES was then tested for association with PD. PES was calculated from the weighted allele dosage as implemented in PRSice2 (v2.1.1). To compare the aggregate burden of rare coding variants (minor allele count ≥ 3) between PD cases and controls in the gene sets nominated by the PRS analysis, the sequence kernel association test-optimal (SKAT-O) was run with default parameters in RVTESTS [35]. ANNOVAR was used for variant annotation. Baseline peri-diagnostic blood RNA sequencing data from 1612 PD patients and 1042 healthy subjects in the Parkinson's Progression Markers Initiative (PPMI) were used to construct a network of expression communities based on a graph model with Louvain clustering. These cleaned and normalized data were downloaded from the Accelerating Medicines Partnership for Parkinson's disease (AMP-PD) on March 1st, 2020. In the feature selection phase, scikit-learn's ExtraTreesClassifier with default settings was used to extract the protein-coding gene features most likely to contribute to classifying cases versus controls, leaving approximately 8,300 protein-coding genes as candidates for the network builds.
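The pathway-specific PES described above is, at its core, a weighted allele-dosage sum restricted to the variants mapped to one gene set. The minimal Python sketch below assumes that per-variant weights (for example log odds ratios taken from the reference summary statistics) and per-sample effect-allele dosages are already aligned by variant ID. The function name `pathway_pes` and the dictionary layout are hypothetical, and the division by the number of scored alleles is an assumption that only loosely mirrors PRSice2's averaging option; clumping, p-value thresholding and variant-to-gene-set mapping, which PRSice2 handles in the actual analysis, are omitted.

```python
from typing import Dict, Iterable


def pathway_pes(
    dosages: Dict[str, float],
    weights: Dict[str, float],
    pathway_variants: Iterable[str],
) -> float:
    """Weighted allele-dosage score for one sample and one gene set (sketch).

    dosages: effect-allele dosage (0 to 2) per variant ID for this sample.
    weights: per-variant effect size (e.g. log odds ratio) from GWAS summary statistics.
    pathway_variants: variant IDs assigned to the gene set (hypothetical mapping).
    """
    # Keep only variants that have both a weight and a genotype for this sample.
    scored = [v for v in pathway_variants if v in weights and v in dosages]
    if not scored:
        return 0.0
    total = sum(weights[v] * dosages[v] for v in scored)
    # Average over the 2 * M alleles scored in a diploid genome; an assumption
    # that loosely follows PRSice2's averaging option, not its exact code.
    return total / (2 * len(scored))


if __name__ == "__main__":
    weights = {"rs1": 0.2, "rs2": -0.1, "rs4": 0.5}
    sample = {"rs1": 2, "rs2": 1, "rs3": 0}
    # rs3 has no weight and rs4 has no genotype, so only rs1 and rs2 are scored.
    print(pathway_pes(sample, weights, ["rs1", "rs2", "rs3", "rs4"]))
```

Scoring each of the 2199 gene sets separately with the same weights is what makes the scores pathway-specific; each per-sample score would then be tested for association with case status.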
Following the feature extraction phase, controls were excluded and case-only correlations were calculated for all remaining gene features. This correlation structure was then converted to a graph object using NetworkX, and the Louvain algorithm was applied to build network communities within this graph. Finally, pathway enrichment analysis was performed within each expression community using the g:GOSt function from g:Profiler to further dissect its biological function. The significance of each pathway was assessed with hypergeometric tests, with Bonferroni correction to control the error rate within each network. Single-cell RNA sequencing data [25] from a total of 9970 cells obtained from several mouse brain regions (neocortex, hippocampus, hypothalamus, striatum, and midbrain) were used to explore cell types associated with PD risk. Linear regression adjusted for the number of SNPs included in the PRS was performed to assess the trend of increasing PRS R² per decile of cell-type expression specificity. Two-sample SMR was applied to explore the enrichment of cis-eQTLs within the 46 gene sets nominated by our large-scale PRS analysis. This method tests whether the effect of genetic variants on PD risk is mediated by gene expression or methylation, and can therefore prioritize genes underlying these gene sets for follow-up functional studies. Additionally, we studied expression patterns in blood using the largest eQTL meta-analysis to date. P-values were Bonferroni corrected for the number of genes tested per gene set, and a chi-squared test was applied to assess whether the proportion of QTLs per gene set was significantly higher than expected by chance.
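The per-community enrichment step described above can be illustrated with a one-sided hypergeometric test followed by Bonferroni correction across the pathways tested. The standard-library Python sketch below is not g:GOSt itself, which applies its own background definitions and correction options; the function names, gene sets and background used here are hypothetical and chosen only to show the calculation.

```python
from math import comb
from typing import Dict, Set


def hypergeom_upper_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    top = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, top + 1)
    )
    return hits / comb(population, draws)


def community_enrichment(
    community: Set[str],
    pathways: Dict[str, Set[str]],
    background: Set[str],
) -> Dict[str, float]:
    """Bonferroni-adjusted enrichment p-value of each pathway in one community."""
    community = community & background
    n_tests = len(pathways)
    adjusted = {}
    for name, members in pathways.items():
        members = members & background
        overlap = len(community & members)
        p = hypergeom_upper_tail(overlap, len(background), len(members), len(community))
        # Bonferroni: multiply by the number of pathways tested, capped at 1.
        adjusted[name] = min(1.0, p * n_tests)
    return adjusted


if __name__ == "__main__":
    background = {f"G{i}" for i in range(20)}
    community = {f"G{i}" for i in range(5)}
    pathways = {
        "A": {f"G{i}" for i in range(5)},       # fully contained in the community
        "B": {f"G{i}" for i in range(10, 15)},  # no overlap with the community
    }
    print(community_enrichment(community, pathways, background))
```

Restricting both the community and each pathway to the background gene universe matters: in this setting the natural universe is the set of genes retained after feature selection, not the whole genome.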

To prioritize the genes within significant gene sets that show the highest cumulative effect on PD risk, individual gene-based SKAT-O analyses were performed using a MAF threshold of ≤ 3% and three functional categories (missense, loss of function, and CADD score > 12). The resulting gene-level prioritization is reported in Supplementary Table 7.
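The variant filters behind these gene-level SKAT-O runs can be expressed as simple masks. The sketch below assumes that each annotated variant carries a gene symbol, a MAF, a simplified consequence label and a PHRED-scaled CADD score; the `Variant` fields, the consequence labels treated as loss of function and the `build_masks` function are hypothetical. The resulting variant lists would still have to be passed to a SKAT-O implementation such as RVTESTS.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical consequence labels treated as loss of function.
LOF_CONSEQUENCES = {"stop_gained", "frameshift", "splice_site", "start_lost"}


@dataclass
class Variant:
    variant_id: str
    gene: str
    maf: float                          # minor allele frequency in the study sample
    consequence: str                    # simplified label, e.g. "missense"
    cadd_phred: Optional[float] = None  # assumed PHRED-scaled CADD score


def build_masks(variants: List[Variant], gene: str, max_maf: float = 0.03) -> Dict[str, List[str]]:
    """Variant IDs per functional category for one gene, keeping MAF <= max_maf."""
    rare = [v for v in variants if v.gene == gene and v.maf <= max_maf]
    return {
        "missense": [v.variant_id for v in rare if v.consequence == "missense"],
        "loss_of_function": [v.variant_id for v in rare if v.consequence in LOF_CONSEQUENCES],
        "cadd_gt_12": [
            v.variant_id
            for v in rare
            if v.cadd_phred is not None and v.cadd_phred > 12
        ],
    }
```

Each mask defines one variant set per gene, so a single gene can be tested up to three times, once per functional category.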


Underlying Analyses

Analysis Type:

GWAS

Target Type:

Gene

Description:

Here, we present a novel high-throughput and hypothesis-free approach to detect PD genetic risk linked to any particular biological pathway. We apply polygenic risk scores (PRS) to a total of 2199 curated, well-defined gene sets representative of canonical pathways, publicly available in the Molecular Signatures Database v7.2 (MSigDB), to define the cumulative effect of pathway-specific genetic variation on PD risk. To assess the impact of rare variation on PD risk explained by significant pathways, we perform gene-set burden analyses in an independent cohort with whole-genome sequencing (WGS) data, including 2101 cases and 2230 controls. Additionally, we explore cell-type expression specificity enrichment linked to PD etiology using single-cell RNA sequencing data from brain cells. Furthermore, we use graph-based analyses to generate de novo pathways that could be involved in disease etiology by constructing a transcriptome map of network communities from RNA sequencing data derived from the blood of 1612 PD patients and 1042 healthy subjects. Subsequently, we perform summary-data-based Mendelian randomization (SMR) analyses to prioritize genes from significant gene sets by exploring possible associations with expression quantitative trait loci (eQTL) in public databases, and we nominate overlapping genes within our transcriptome communities for follow-up functional studies. Finally, we present Pathways Browser, a user-friendly platform that gives the PD research community easy, interactive access to these results.

Tissue Type:

Whole Blood

Source Data Type:

Genomics, Transcriptomics

Source Data Cohorts:

BioFIND, NABEC, LNG Path confirmed, Parkinson's Disease Biomarkers Program (PDBP), Parkinson's Progression Markers Initiative (PPMI), NIH PD CLINIC, WELLDERLY, UKBEC


Data Dictionary

Field Name | Field Name Expanded | Short Description (optional)
gene_name | Gene Name
hgnc_symbol | HGNC Gene Symbol
ensembl_id | Ensembl ID
gene_type | HGNC Gene Locus Type
NumVar | Number of variants
Q | Variance-component score statistic
p_val | p-value
Gene_set | Gene set | Annotated gene set from curated pathways, e.g., KEGG
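The nominated-target records on this page follow the data dictionary above. As an illustration only, the hypothetical Python `NominatedTarget` class and `parse_entry` function below sketch one way to load a record copied from this page into typed fields; they are not part of any official API. The parser accepts both `field: value` on a single line and a field name followed by its value on the next line, and skips the title line (for example `NDUFA10 (gene)`), which has no colon.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class NominatedTarget:
    ensembl_id: str
    hgnc_symbol: str
    gene_type: str
    gene_name: str
    num_var: int    # NumVar: number of variants
    q: float        # Q: variance-component score statistic
    p_val: float
    gene_set: str


def parse_entry(text: str) -> NominatedTarget:
    """Parse one copied record into a NominatedTarget (hypothetical helper)."""
    fields: Dict[str, str] = {}
    pending: Optional[str] = None  # field name still waiting for its value
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if pending is not None:
            fields[pending] = line
            pending = None
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # title line such as "NDUFA10 (gene)"
        key, value = key.strip().lower(), value.strip()
        if value:
            fields[key] = value
        else:
            pending = key
    return NominatedTarget(
        ensembl_id=fields["ensembl_id"],
        hgnc_symbol=fields["hgnc_symbol"],
        gene_type=fields["gene_type"],
        gene_name=fields["gene_name"],
        num_var=int(fields["numvar"]),
        q=float(fields["q"]),
        p_val=float(fields["p_val"]),
        gene_set=fields["gene_set"],
    )
```

Field names are lowercased before lookup, so `Gene_set` and `Gene_Set` map to the same key.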

Nominated Targets

NDUFA10 (gene)

ensembl_id: ENSG00000130414
hgnc_symbol: NDUFA10
gene_type: protein_coding
gene_name: NDUFA10
NumVar: 3
Q: 3.11E+04
p_val: 0.001
Gene_Set: PARKINSONS DISEASE

NDUFS8 (gene)

ensembl_id: ENSG00000110717
hgnc_symbol: NDUFS8
gene_type: protein_coding
gene_name: NDUFS8
NumVar: 9
Q: 2.11E+04
p_val: 0.008
Gene_Set: PARKINSONS DISEASE

LRRK2 (gene)

ensembl_id: ENSG00000188906
hgnc_symbol: LRRK2
gene_type: protein_coding
gene_name: LRRK2
NumVar: 18
Q: 5.07E+04
p_val: 0.016
Gene_Set: PARKINSONS DISEASE

APAF1 (gene)

ensembl_id: ENSG00000120868
hgnc_symbol: APAF1
gene_type: protein_coding
gene_name: APAF1
NumVar: 13
Q: 3.41E+04
p_val: 0.02
Gene_Set: PARKINSONS DISEASE

UQCR10 (gene)

ensembl_id: ENSG00000184076
hgnc_symbol: UQCR10
gene_type: protein_coding
gene_name: UQCR10
NumVar: 1
Q: 7.97E+03
p_val: 0.037
Gene_Set: PARKINSONS DISEASE

UQCR11 (gene)

ensembl_id: ENSG00000127540
hgnc_symbol: UQCR11
gene_type: protein_coding
gene_name: UQCR11
NumVar: 1
Q: 6.97E+02
p_val: 0.047
Gene_Set: PARKINSONS DISEASE

NDUFS7 (gene)

ensembl_id: ENSG00000115286
hgnc_symbol: NDUFS7
gene_type: protein_coding
gene_name: NDUFS7
NumVar: 10
Q: 4.16E+04
p_val: 0.058
Gene_Set: PARKINSONS DISEASE

COX7A1 (gene)

ensembl_id: ENSG00000161281
hgnc_symbol: COX7A1
gene_type: protein_coding
gene_name: COX7A1
NumVar: 2
Q: 1.95E+03
p_val: 0.075
Gene_Set: PARKINSONS DISEASE

NDUFA7 (gene)

ensembl_id: ENSG00000267855
hgnc_symbol: NDUFA7
gene_type: protein_coding
gene_name: NDUFA7
NumVar: 2
Q: 1.24E+04
p_val: 0.077
Gene_Set: PARKINSONS DISEASE

PRKN (gene)

ensembl_id: ENSG00000185345
hgnc_symbol: PRKN
gene_type: protein_coding
gene_name: PRKN
NumVar: 10
Q: 2.32E+04
p_val: 0.093
Gene_Set: PARKINSONS DISEASE

COX6B2 (gene)

ensembl_id: ENSG00000160471
hgnc_symbol: COX6B2
gene_type: protein_coding
gene_name: COX6B2
NumVar: 1
Q: 2.21E+03
p_val: 0.107
Gene_Set: PARKINSONS DISEASE

NDUFS5 (gene)

ensembl_id: ENSG00000168653
hgnc_symbol: NDUFS5
gene_type: protein_coding
gene_name: NDUFS5
NumVar: 1
Q: 4.64E+02
p_val: 0.109
Gene_Set: PARKINSONS DISEASE

NDUFS6 (gene)

ensembl_id: ENSG00000145494
hgnc_symbol: NDUFS6
gene_type: protein_coding
gene_name: NDUFS6
NumVar: 1
Q: 4.30E+02
p_val: 0.135
Gene_Set: PARKINSONS DISEASE

NDUFB5 (gene)

ensembl_id: ENSG00000136521
hgnc_symbol: NDUFB5
gene_type: protein_coding
gene_name: NDUFB5
NumVar: 1
Q: 6.02E+03
p_val: 0.159
Gene_Set: PARKINSONS DISEASE

COX6A2 (gene)

ensembl_id: ENSG00000156885
hgnc_symbol: COX6A2
gene_type: protein_coding
gene_name: COX6A2
NumVar: 2
Q: 4.59E+03
p_val: 0.168
Gene_Set: PARKINSONS DISEASE

UCHL1 (gene)

ensembl_id: ENSG00000154277
hgnc_symbol: UCHL1
gene_type: protein_coding
gene_name: UCHL1
NumVar: 2
Q: 2.11E+03
p_val: 0.174
Gene_Set: PARKINSONS DISEASE

GPR37 (gene)

ensembl_id: ENSG00000170775
hgnc_symbol: GPR37
gene_type: protein_coding
gene_name: GPR37
NumVar: 7
Q: 1.15E+04
p_val: 0.206
Gene_Set: PARKINSONS DISEASE

VDAC3 (gene)

ensembl_id: ENSG00000078668
hgnc_symbol: VDAC3
gene_type: protein_coding
gene_name: VDAC3
NumVar: 3
Q: 3.66E+03
p_val: 0.206
Gene_Set: PARKINSONS DISEASE

NDUFA9 (gene)

ensembl_id: ENSG00000139180
hgnc_symbol: NDUFA9
gene_type: protein_coding
gene_name: NDUFA9
NumVar: 5
Q: 8.13E+03
p_val: 0.24
Gene_Set: PARKINSONS DISEASE

VDAC1 (gene)

ensembl_id: ENSG00000213585
hgnc_symbol: VDAC1
gene_type: protein_coding
gene_name: VDAC1
NumVar: 2
Q: 5.69E+02
p_val: 0.247
Gene_Set: PARKINSONS DISEASE