NIA Laboratory of Neurogenetics

The Molecular Genetics Section (MGS) of the Laboratory of Neurogenetics, led by Dr. Andrew Singleton, identifies genetic variants that cause or contribute to both simple and complex neurological diseases. The MGS studies a range of neurodegenerative diseases, most prominently Parkinson's disease, but also ataxia, dementia with Lewy bodies, frontotemporal dementia, and rare orphan syndromes.

The MGS uses a combination of state-of-the-art genetics and genomics approaches. Further, the MGS has acted as a catalyst for the creation of large collaborative consortia aimed at dissecting the genetic basis of common complex diseases.

The Neuromuscular Diseases Research Section of the Laboratory of Neurogenetics, led by Dr. Bryan J. Traynor, is best known for its work on the genetic etiology of amyotrophic lateral sclerosis (ALS, also known as Lou Gehrig's disease) and frontotemporal dementia (FTD). In 2011, Dr. Traynor led the international consortium that identified a pathogenic hexanucleotide repeat expansion in the C9orf72 gene as the underlying genetic variant in a large proportion of ALS and FTD cases (Neuron 2011).

Other notable achievements of Dr. Traynor's laboratory include the first genome-wide association study of ALS (Lancet Neuro 2007); identification of the chromosome 9p21 association signal for ALS in the Finnish founder population (Lancet Neuro 2010); the identification of the C9orf72 repeat expansion in patients clinically diagnosed with Alzheimer’s disease (NEJM 2012); and the description of variations in the VCP, MATR3, CHCHD10, HTT and SPTLC1 genes as causes of familial ALS and FTD (Neuron 2010, Nature Neuroscience 2014, Brain 2014, Neuron 2018, Neuron 2021, JAMA Neurology 2021).

Project Lead: S. Bandres‑Ciga and S. Saez-Atienzar
For information, contact:

A. B. Singleton


Analysis Methodology

To assess PD risk, summary statistics from the Chang et al. PD GWAS meta-analysis [7], involving 26,035 PD cases and 403,190 controls of European ancestry, were used as the reference dataset for the primary analysis to define risk allele weights. Individual-level genotyping data from the most recent GWAS meta-analysis [17] that were not included in Chang et al. [7] were then randomly divided into training and testing datasets. The training dataset used to construct the PRS consisted of 7218 PD cases and 9424 controls, while the testing dataset used to validate the results consisted of 5429 PD cases and 5814 controls, all of European ancestry.
A polygenic effect score (PES) was generated to estimate polygenic risk for each of the 2199 gene sets representative of biological pathways and then tested for association with PD. The PES was calculated based on the weighted allele dose as implemented in PRSice2 (v2.1.1). The optimal sequence kernel association test (SKAT-O) was implemented using default parameters in RVTESTS [35] to determine the difference in the aggregate burden of rare coding genetic variants (minor allele count ≥ 3) between PD cases and controls for the gene sets nominated by the PRS analysis. ANNOVAR was used for variant annotation.
Baseline peri-diagnostic RNA sequencing data derived from the blood of 1612 PD patients and 1042 healthy subjects, available from the Parkinson's Progression Markers Initiative (PPMI), were used to construct a network of expression communities based on a graph model with Louvain clusters. These cleaned and normalized data were downloaded from the Accelerating Medicines Partnership for Parkinson's disease (AMP-PD) on March 1st, 2020. Scikit-learn's ExtraTreesClassifier was used under default settings in the feature selection phase to extract the coding gene features likely to contribute to classifying cases versus controls, leaving 8.3 k protein-coding genes for candidate networks.
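As an illustration of the weighted allele dose mentioned above, the following minimal Python sketch sums risk-allele weights multiplied by allele dosages, first across all SNPs and then restricted to the SNPs mapped to one gene set. The SNP identifiers, weights, and the function names `polygenic_score` and `pathway_effect_score` are hypothetical, and this is not the PRSice2 implementation, which also performs steps such as clumping and p-value thresholding.

```python
from typing import Dict, Set


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted allele dose: sum over SNPs of (risk-allele weight * allele dosage).

    SNPs missing from the individual's genotype data are skipped.
    """
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)


def pathway_effect_score(
    dosages: Dict[str, float],
    weights: Dict[str, float],
    gene_set_snps: Set[str],
) -> float:
    """Same score, restricted to SNPs assigned to a single gene set."""
    subset = {snp: w for snp, w in weights.items() if snp in gene_set_snps}
    return polygenic_score(dosages, subset)


# Hypothetical toy data: weights would come from the reference GWAS,
# dosages (0, 1, or 2 risk alleles) from an individual's genotypes.
weights = {"rs1": 0.5, "rs2": -0.25, "rs3": 0.125}
person = {"rs1": 2, "rs2": 1, "rs3": 0}
gene_set = {"rs1", "rs3"}

genome_wide = polygenic_score(person, weights)
pathway_only = pathway_effect_score(person, weights, gene_set)
print(genome_wide, pathway_only)
```

In practice one score per gene set would be computed for every individual and then tested for association with case/control status.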
Following this feature extraction phase, controls were excluded, and case-only correlations were calculated for all remaining gene features. Next, this correlation structure was converted to a graph object using NetworkX. Subsequently, the Louvain algorithm was employed to build network communities within this graph object derived from the selected feature set. Finally, pathway enrichment analysis within expression communities was performed to further dissect their biological function using the g:GOSt function from g:Profiler. The significance of each pathway was tested by hypergeometric tests with Bonferroni correction to control the error rate of each network.
Single-cell RNA sequencing data [25] from a total of 9970 cells obtained from several mouse brain regions (neocortex, hippocampus, hypothalamus, striatum, and midbrain) were used to explore cell types associated with PD risk. Linear regression adjusted for the number of SNPs included in the PRS was performed to assess the trend of increased PRS R² per decile of cell-type expression specificity.
Two-sample SMR was applied to explore the enrichment of cis-eQTLs within the 46 gene sets nominated by our large-scale PRS analysis. This methodology can be interpreted as a test of whether the effect of genetic variants on PD risk is mediated by gene expression or methylation, and it was used to prioritize genes underlying these gene sets for follow-up functional studies. Additionally, we studied expression patterns in blood from the largest eQTL meta-analysis so far. P-values were Bonferroni corrected for the number of genes tested per gene set, and a chi-squared test was applied to assess whether the proportion of QTLs per gene set was significantly higher than expected by chance.
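The hypergeometric enrichment test with Bonferroni correction described above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the g:Profiler implementation: the function names `hypergeom_sf` and `bonferroni`, and all counts, are hypothetical.

```python
from math import comb
from typing import List


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """Upper-tail hypergeometric probability P(X >= k).

    N: genes in the background, K: background genes annotated to the pathway,
    n: genes in the community, k: community genes annotated to the pathway.
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def bonferroni(p_values: List[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical example: a community of 40 genes drawn from 8300 background
# genes, tested against three pathways of different sizes and overlaps.
background = 8300
community = 40
pathways = [(6, 120), (2, 300), (1, 50)]  # (overlap k, pathway size K)

raw = [hypergeom_sf(k, background, K, community) for k, K in pathways]
adjusted = bonferroni(raw)
for (k, K), p, q in zip(pathways, raw, adjusted):
    print(f"overlap={k} pathway_size={K} p={p:.3g} bonferroni={q:.3g}")
```

The correction here is applied across the pathways tested for one community; the number of tests used in the original analysis is not restated here.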

SMR revealed functional genomic associations with eQTLs in 201 genes (Supplementary Table 12, online resource), of which 88 were found to be part of the network communities significantly associated with PD in our transcriptome community map (Supplementary Table 13, online resource).
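The proportion test mentioned in the methodology (whether a gene set contains more QTL genes than expected by chance) could look like the following Pearson chi-squared test on a 2x2 table. How the expected proportion was defined is not stated in this summary, so the comparison against a background set of genes, the counts, and the function name `chi2_2x2` are all assumptions made for illustration.

```python
from math import erfc, sqrt
from typing import Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-squared test (no continuity correction) for a 2x2 table.

    a, b: genes in the gene set with / without a QTL
    c, d: background genes with / without a QTL
    Returns (statistic, p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table sums to zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of chi-squared with 1 df: erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))


# Hypothetical counts: 30 of 100 gene-set genes have a QTL,
# versus 10 of 100 background genes.
stat, p = chi2_2x2(30, 70, 10, 90)
print(f"chi2={stat:.2f} p={p:.2e}")
```

The chi-squared statistic itself does not encode direction, so a claim of "higher than expected" also requires checking that the observed proportion in the gene set exceeds the background proportion.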


Underlying Analyses

Analysis Type:

GWAS

Target Type:

Gene

Description:

Here, we present a novel high-throughput and hypothesis-free approach to detect the existence of PD genetic risk linked to any particular biological pathway. We apply polygenic risk scores (PRS) to a total of 2199 curated and well-defined gene sets representative of canonical pathways publicly available in the Molecular Signatures Database (MSigDB) v7.2 to define the cumulative effect of pathway-specific genetic variation on PD risk. To assess the impact of rare variation on PD risk explained by significant pathways, we perform gene-set burden analyses in an independent cohort of whole-genome sequencing (WGS) data, including 2101 cases and 2230 controls. Additionally, we explore cell-type expression specificity enrichment linked to PD etiology by using single-cell RNA sequencing data from brain cells. Furthermore, we use graph-based analyses to generate de novo pathways that could be involved in disease etiology by constructing a transcriptome map of network communities based on RNA sequencing data derived from the blood of 1612 PD patients and 1042 healthy subjects. Subsequently, we perform summary-data-based Mendelian randomization (SMR) analysis to prioritize genes from significant gene sets by exploring possible genomic associations with expression quantitative trait loci (eQTL) in public databases, and we nominate overlapping genes within our transcriptome communities for follow-up functional studies. Finally, we present a user-friendly platform for the PD research community that enables easy and interactive access to these results: Pathways Browser.

Tissue Type:

Whole Blood

Source Data Type:

Genomics, Transcriptomics

Source Data Cohorts:

BioFIND, NABEC, LNG Path confirmed, Parkinson's Disease Biomarkers Program (PDBP), Parkinson's Progression Markers Initiative (PPMI), NIH PD CLINIC, WELLDERLY, UKBEC


Data Dictionary

Field Name     Field Name Expanded      Short Description (optional)
gene_name      Gene Name
hgnc_symbol    HGNC Gene Symbol
ensembl_id     Ensembl ID
gene_type      HGNC Gene Locus Type
Protein        Protein                  Protein name

Nominated Targets

APH1B (gene)

ensembl_id: ENSG00000138613
hgnc_symbol: APH1B
gene_type: protein-coding gene
gene_name:
APH1B
Protein:
Gamma-secretase subunit APH-1B (APH-1b) (Aph-1beta) (Presenilin-stabilization factor-like)

ATP2A1 (gene)

ensembl_id: ENSG00000196296
hgnc_symbol: ATP2A1
gene_type: protein-coding gene
gene_name:
ATP2A1
Protein:
Sarcoplasmic/endoplasmic reticulum calcium ATPase 1 (SERCA1) (SR Ca(2+)-ATPase 1) (EC 7.2.2.10) (Calcium pump 1)

B3GNT5 (gene)

ensembl_id: ENSG00000176597
hgnc_symbol: B3GNT5
gene_type: protein-coding gene
gene_name:
B3GNT5
Protein:
Hexosyltransferase

BCL2L1 (gene)

ensembl_id: ENSG00000171552
hgnc_symbol: BCL2L1
gene_type: protein-coding gene
gene_name:
BCL2L1
Protein:
Bcl-2-like protein 1 (Bcl2-L-1) (Apoptosis regulator Bcl-X)

BLK (gene)

ensembl_id: ENSG00000136573
hgnc_symbol: BLK
gene_type: protein-coding gene
gene_name:
BLK
Protein:
Tyrosine-protein kinase Blk (B lymphocyte kinase) (p55-Blk)

BST1 (gene)

ensembl_id: ENSG00000109743
hgnc_symbol: BST1
gene_type: protein-coding gene
gene_name:
BST1
Protein:
Bone marrow stromal cell antigen 1 variant 2

BTN3A2 (gene)

ensembl_id: ENSG00000186470
hgnc_symbol: BTN3A2
gene_type: protein-coding gene
gene_name:
BTN3A2
Protein:
Butyrophilin, subfamily 3, member A2, isoform CRA_a

CACNG8 (gene)

ensembl_id: ENSG00000142408
hgnc_symbol: CACNG8
gene_type: protein-coding gene
gene_name:
CACNG8
Protein:
Voltage-dependent calcium channel gamma-8 subunit (Neuronal voltage-gated calcium channel gamma-8 subunit)

CAMKK2 (gene)

ensembl_id: ENSG00000110931
hgnc_symbol: CAMKK2
gene_type: protein-coding gene
gene_name:
CAMKK2
Protein:
Calcium/calmodulin-dependent protein kinase kinase 2

CLN3 (gene)

ensembl_id: ENSG00000188603
hgnc_symbol: CLN3
gene_type: protein-coding gene
gene_name:
CLN3
Protein:
Battenin

COMMD7 (gene)

ensembl_id: ENSG00000149600
hgnc_symbol: COMMD7
gene_type: protein-coding gene
gene_name:
COMMD7
Protein:
COMM domain-containing protein 7

CSNK2B (gene)

ensembl_id: ENSG00000204435
hgnc_symbol: CSNK2B
gene_type: protein-coding gene
gene_name:
CSNK2B
Protein:
Casein kinase II subunit beta

CTSB (gene)

ensembl_id: ENSG00000164733
hgnc_symbol: CTSB
gene_type: protein-coding gene
gene_name:
CTSB
Protein:
Cathepsin B

DGKQ (gene)

ensembl_id: ENSG00000145214
hgnc_symbol: DGKQ
gene_type: protein-coding gene
gene_name:
DGKQ
Protein:
Diacylglycerol kinase (DAG kinase) (EC 2.7.1.107)

DNM3 (gene)

ensembl_id: ENSG00000197959
hgnc_symbol: DNM3
gene_type: protein-coding gene
gene_name:
DNM3
Protein:
Dynamin-3

DPAGT1 (gene)

ensembl_id: ENSG00000172269
hgnc_symbol: DPAGT1
gene_type: protein-coding gene
gene_name:
DPAGT1
Protein:
Dolichyl-phosphate (UDP-N-acetylglucosamine) N-acetylglucosaminephosphotransferase 1 (GlcNAc-1-P transferase)

DPM3 (gene)

ensembl_id: ENSG00000179085
hgnc_symbol: DPM3
gene_type: protein-coding gene
gene_name:
DPM3
Protein:
Dolichol-phosphate mannosyltransferase subunit 3

ELOVL7 (gene)

ensembl_id: ENSG00000164181
hgnc_symbol: ELOVL7
gene_type: protein-coding gene
gene_name:
ELOVL7
Protein:
Elongation of very long chain fatty acids protein 7 (EC 2.3.1.199) (3-keto acyl-CoA synthase ELOVL7)

FBXL5 (gene)

ensembl_id: ENSG00000118564
hgnc_symbol: FBXL5
gene_type: protein-coding gene
gene_name:
FBXL5
Protein:
F-box/LRR-repeat protein 5

FCGR2A (gene)

ensembl_id: ENSG00000143226
hgnc_symbol: FCGR2A
gene_type: protein-coding gene
gene_name:
FCGR2A
Protein:
Low affinity immunoglobulin gamma Fc region receptor II-a