2025-06-25 05:44来源:本站
没有使用统计方法来预先确定样本量。实验不是随机的,研究人员在实验和结果评估过程中并未对分配视而不见。
我们选择了来自Rosmap的48个人,均对衰老和痴呆症的纵向临床对比研究研究,其中所有参与者都是脑捐献者。研究包括每年收集的临床数据,详细的验尸后病理评估以及广泛的遗传学,表观基因组,转录组,蛋白质组学和代谢组批量组织分析28。出于这项研究的目的,根据AD的NIA-REAGAN诊断和Braak阶段得分(Braak阶段0、1和1和2,N = 20; Braak阶段3和4,n = 14; Braak阶段5和6,n = 14),选择了个体,26个个体对AD和22个患者进行了负面病情诊断。先前报道了临床和病理数据收集方法的详细信息2,5,6,28,84。个体在性别之间保持平衡(男性:女性比率13:13在AD中,NOAD 11:11),年龄(中位数为86.6岁(AD)和86.0岁(无AD))和验尸间隔(中位数,5.9 h(AD)和6.3 h(no AD AD))。从每个参与者那里获得了知情同意,宗教秩序的研究以及Rush记忆和衰老项目均由Rush University Medical Center的机构审查委员会(IRB)批准。参与者还签署了《解剖学礼物法》,并征得了允许其数据重新使用的存储库同意。
所有解剖均使用细齿剃刀锯(用于皮质区域)或带有钻石电线(用于皮层下区域)的珠宝商在干冰床上进行。特定区域的描述如下。(1)AG:AG的全厚皮质(Brodmann地区:BA 39/40);从第一个或第二个平板沿后部到HC的末端。最小化白质。(2)MT:中间回的全厚皮质(BA 22);尽可能接近前;最小化白质。(3)PFC:额头的全厚皮质(BA 10);从第一或第二平板的侧面取。最小化白质。(4)EC:EC的全厚皮质(BA 28);占用杏仁核的水平。避免杏仁核。最小化白质。(5)后HC:从最后一个包含HC的平板中取。如果最后一个平板的HC少于5毫米,则从下一个板前移动。收集完整的横截面。(6)TH:从丘脑的第一个平板上拿走。包括最内侧的方面。
从先前的研究中调整了从冷冻后脑组织中分离核的方案。12。所有程序均在冰上或4°C上进行。简而言之,在700 µL均质化缓冲液(320 mM蔗糖,5 mM CaCl2,3毫米毫克(CH3COO)2,10 mm Tris HCl pH 7.8,0.1 mm EDTA pH 8.0,0.1%igepal Ca-630,1mm imm imercaptoEthAnane和0.1 n.4.1 f ph 7.8,10毫米,1 mm iepaptoemanto和0.4 eyananane ucim n ucim n.4.4 ll usin anananate,抑制剂(Clontech))使用Wheaton Dounce Tissue Grinder(15笔触诊的杵)。然后将均质的组织通过40 µm细胞滤网过滤,与相等量的工作溶液(50%的OptipRep密度梯度培养基(Sigma-Aldrich),5 mM CaCl2,3 mm Mg(CH3COO)2,10 mm Tris Hcl pH 7.8,0.8,0.8,0.8,0.8,0.1 mm Edta PH 8.0和1 mmmmβ-density gradient (750 µl 30% OptiPrep solution (30% OptiPrep density gradient medium, 134 mM sucrose, 5 mM CaCl2, 3 mM Mg(CH3COO)2, 10 mM Tris HCl pH 7.8, 0.1 mM EDTA pH 8.0, 1 mM β-mercaptoethanol, 0.04% IGEPAL CA-630 and 0.17 U µl−1recombinant RNase inhibitor)) on top of 300 µl 40% OptiPrep solution (40% OptiPrep density gradient medium, 96 mM sucrose, 5 mM CaCl2, 3 mM Mg(CH3COO)2, 10 mM Tris HCl pH 7.8, 0.1 mM EDTA pH 8.0, 1 mM β-mercaptoethanol, 0.03% IGEPAL CA-630 and0.12 U µl -1重组RNase抑制剂)。通过离心分离核(5分钟,10,000g,4°C)。从30%/40%的相间收集了总共100 µL的核,并用含有0.04%BSA的1 mL PBS洗涤。将核以300g离心3分钟(4°C),并用含有0.04%BSA的1 mL PBS洗涤。然后将核以300g离心3分钟(4°C),并重悬于含0.04%BSA的100 µL PBS中。将细胞核计数并稀释至含有0.04%BSA的PBS中的1,000个核。
对于基于液滴的SNRNA-Seq,根据制造商的协议(10X基因组学),使用Chromium单细胞3'试剂盒V3制备库。使用NextSeq 500/550高输出V2试剂盒(150个周期)或Novaseq 6000 S2试剂试剂盒对生成的SNRNA-SEQ库进行了测序。
通过使用Cell Ranger软件(v.3.0.2)(10x Genomics)将读取与GRCH38基因组的读取获得基因计数85。为了说明未填充的核转录本,计数读取映射到前mRNA。使用细胞游舱计数管道对前MRNA进行定量后,使用Cell Ranger Aggr Pipeline来汇总所有库(不均衡组之间的读取深度)以生成基因符号矩阵。Cell Ranger v.3.0默认参数用于调用细胞条形码。我们使用scanpy86来处理和聚集表达谱并推断细胞身份。我们仅保留蛋白质编码基因,并过滤了超过20%的线粒体或5%核糖体RNA的细胞,在所有区域中,在48个个体和283个样品中留下147万细胞。我们进一步将数据集过滤为前5,000个最可变的基因,并使用它们来计算细胞的低维嵌入(UMAP)(UMAP)(默认参数,使用50个主要组件和15个主要邻居),并使用莱登聚类算法以高分辨率(15)以337个前序列的preelinary Clustersters87。我们使用DoubleTefinder分别称为双线,并具有强烈的双重双值曲线和簇的标记和删除簇,显示出强烈的个体特异性批处理效应,最终数据集为135万个单元格88。
对于单个主要细胞类型类别(兴奋性神经元,抑制性神经元,星形胶质细胞,少突胶质细胞,OPC,免疫细胞)的UMAP可视化,使用默认设置89,90。我们根据肘部图选择了相关的主要组件集。我们使用先前发布的标记基因和单细胞RNA-seq数据进行注释的细胞类型9,12,33,91,92,93。根据先前发表的单细胞RNA序列数据(Allen Institute的细胞类型数据库; https://portal.brain-map.org/atlases-andlases-and-data/rnaseq/rnaseq/human-multiple-cortical-cortical-cortical-cortical-cortical-areas-smart-seq)一下,以注释细胞类型。First, Spearman rank correlation coefficients between the average expression profiles of neuronal subpopulations previously defined by the Allen Brain Institute33 and the neuronal subtypes identified in this study were computed using the cor function in R. Second, to project annotations of neuronal subpopulations previously defined by the Allen Brain Institute onto the neuronal cells analysed in this study, we followed the integration and label transfer workflow ofSeurat90。第三,我们根据seurat的FindAllmarkers函数(Wilcoxon Rank-sum测试bonferroni校正进行多次测试; PADJ)确定了基于Allen Brain Institute 33发表的数据的细胞类型标记基因。< 0.05) and computed module scores for each cell type marker gene set across all neuronal cells analysed in this study using the AddModuleScore function of Seurat. To further annotate cell types, we determined marker genes using the FindAllMarkers function from Seurat (Wilcoxon rank-sum test with Bonferroni correction for multiple testing; Padj < 0.05). We tested only genes that were detected in a minimum of 25% of the cells within the cell type (min.pct = 0.25) and that showed, on average, at least a 0.25-fold difference (log-scale) between the cells of the cell type and all remaining cells (logfc.threshold = 0.25). Marker genes of the high-resolution cell types or states were determined separately for each major cell type class. We additionally compared the EC excitatory neuron subtypes to cell type annotations previously reported previously94, which were computed using ACTIONet95, and compared microglial markers to previously reported subtypes96,97.
G2/M and S phase cell cycle scores were determined using the function CellCycleScoring in Seurat. Histograms showing the distribution of the G2/M- and the S phase module scores in each major cell class were generated using Prism 9 software. The statistical analyses comparing the number of genes detected per cell and the number of unique transcripts (UMIs) detected per cell between cell types was performed using Prism 9 software.
Single-cell transcriptomic data from the human dLGN98 were obtained from the Allen Brain Institute (https://portal.brain-map.org/atlases-and-data/rnaseq/comparative-lgn). Single-cell transcriptomic data from multiple cortical areas and the hippocampal formation of the mouse brain43 were obtained from the Allen Brain Institute (https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-whole-cortex-and-hippocampus-10x). Single-cell transcriptomic data across nine regions in the adult mouse brain39 were obtained from the McCarroll and Macosko Labs (http://dropviz.org/). Single-cell transcriptomic data from the mouse nervous system40 were obtained from the Linnarsson laboratory (http://mousebrain.org/adolescent/downloads.html). The external datasets and the human multiregion data presented in this study were integrated using the reciprocal PCA (RPCA) pipeline with the default parameters in Seurat (https://satijalab.org/seurat/articles/integration_rpca.html). The integration of single-cell data was performed separately for astrocytes, excitatory neurons and inhibitory neurons. For the integration of GABAergic neurons, the single-cell transcriptomic data from multiple cortical areas and the hippocampal formation of the mouse brain43 were downsampled to 50,000 GABAergic neurons. For the integration of excitatory neurons, the human multiregion dataset was downsampled to 5,000 neurons per high-resolution cell type. The mouse cortical data43 were downsampled to 50,000 excitatory neurons. The frontal cortex, posterior cortex, HC and thalamus data of the DropViz dataset were combined and downsampled to 50,000 neurons. Downsampling of the data was performed using the Seurat function subset with the default parameters. The comparison of microarray data from different human brain regions was performed using the Differential Search tool of the Allen Brain Atlas data portal (https://human.brain-map.org/microarray/search). The thalamus was selected as the target structure and compared to the cerebral cortex as the contrast structure. The differential search results including the fold change values and P values of the top 2,000 genes were downloaded from the data portal.
We would like to find gene expression modules by calculating gene–gene correlations in single-cell data and using these to detect communities of similarly expressed genes. However, in single-cell data, which often contain an unbalanced composition of cell types, modules computed using this approach will be dominated by the most common cell type markers and pathways. Moreover, correlation values will often be inflated for pairs of sparsely expressed genes. We developed a method which accounts for these pitfalls to call multiresolution gene expression programs from single-cell data using an SVD-based approximation of zero-phase component analysis (ZCA) and gene sparsity-dependent thresholding99,100.
The preprocessing transformations alternately called decorrelation, whitening, or sphering, transform a matrix X with a matrix W such that the covariance of XW is the identity matrix101. In particular, ZCA is the transformation which maximizes the similarity of the transformed data to the original, which is achieved by setting W = C−1/2, where C is the covariance of X.
In single-cell data, given a count matrix X with n cells (rows) and g genes (columns), we would like to perform ZCA decorrelation on the samples as a preprocessing step for calling modules. Computing and storing the n by n sample-wise covariance Cn = XXT/g is prohibitively expensive for modern datasets (with n > 1 × 106), even without centring X. Instead, we can analytically approximate the covariance with the SVD of XT = UnSnVnT as Cn ≈ (UnSnVnT)T(UnSnVnT)/g = VnSn2VnT/g and therefore Cn−1/2 = g1/2VnSn−1VnT. The ZCA transformation can then be computed as XZCA = Cn−1/2X = g1/2VnSn−1VnTX before calculating the covariance of XZCA for downstream analysis. While this approximation is already tractable for small single-cell datasets, we may not be able to compute the matrix multiplications or centred SVD for larger datasets. Here, we can use the SVD of X = USVT, which is commonly calculated in single-cell analyses, to approximate Cn−1/2 as g1/2US−1UT and XZCA = g1/2US−1UTX. From this, the non-centred covariance of XZCA is CZCA = XZCATXZCA/n = g × (US−1UTX)T(US−1UTX)/n. By substituting the SVD in for X, this reduces to CZCA = g × VVT/n, which is a simple and efficient approximation for very large single-cell data. As this approximation commonly drops out the largest identity program in the data due to the decorrelation approach, we allow computing CZCA = g × VSpVT/n, for any p, to tune the relative involvement of the larger eigenvalue components of the SVD.
To control for inflated correlation estimates in highly sparse genes, we bin the estimated correlations (CZCA) for each pair of genes according to their sparsities (fraction of cells expressing the gene, binned on a log10 scale). We calculate the mean and s.d. for each 2D bin and smooth the estimates by fitting bivariate splines to the binned statistics, weighted by the log number of examples in the bin. We use the smoothed estimates to z-score the correlation matrix (CZCA), which we then threshold with a single z-score cut-off to create an adjacency matrix for a gene–gene graph. Graphs are laid out using the Fruchterman–Reingold algorithm and we remove connected components with fewer than four genes102. We then use the leidenalg package and the Leiden algorithm with an RBConfiguration vertex partition to cluster the graph into gene modules87. To robustly estimate modules for each set of cells in our analyses, we first performed a grid search for the optimal number of SVD components for cells of that type. We then computed the z-scored matrices for each of 10 bootstraps, selecting 90% of batches for each bootstrap and using only genes expressed in over 5% of cells in the full dataset for the cell type. We thresholded the average of the bootstrapped z-score estimates with z = 4.5 to build a graph. To balance the contributions of modules across the compositional spectrum, we calculated and thresholded separate graphs for eigenvalue powers p = 0, 0.25, 0.5, 0.75 and 1 and combined them using multigraph Leiden clustering to call modules with leiden resolution = 3. Although we identify smaller modules, here we only report modules with at least 10 genes. We also ran the modules method on three published datasets, for which we ran the method with the same parameters on each dataset (k = 100, z = 4.5, resolution = 2.5), used genes with >Covid103和Brain16数据集的5%稀疏性对于Tabula Sapiens DataSet104的稀疏性> 10%,并报告具有10个基因或更多基因的模块。
通过计算模块得分超过1 s.d的细胞,使用超几何测试进行了细胞亚型和大脑区域的模块富集。从平均得分中显着富集在特定的亚型或区域中。对于针对其他模块的绘制得分,平均模块按样本级别(在同一细胞类型内)或单独的样品级别(跨单元类型)和计算的Pearson相关性和P值。使用PADJ的单面测试按样品级别汇总亚型< 0.01 as a cut-off105. To generate contour plots of module expression on a UMAP, we first smoothed cell-level expression on a 500 × 500 grid with a 2D Gaussian kernel (size = 25 × 25 and σ = 1) and then plot contours for smoothed values (0.1 to 0.8).
Gene expression programs underlying both cell type identity and cellular activities were determined according to the consensus NMF (cNMF) analysis pipeline established previously using the default parameters34. The number of components (K) to use for cNMF was determined on the basis of a diagnostic plot showing the stability of the solution and the Frobenius reconstruction error as a function of K. To reduce runtime and working memory requirements, the data were downsampled using the Seurat function subset with the default parameters. The data were downsampled to 200 cells per major cell type. For the cNMF analysis at the level of high-resolution cell types, the analysis was performed separately on excitatory neurons, inhibitory neurons and astrocytes. For these analyses, the data were downsampled to 2,000 cells per astrocyte subtype and 1,000 cells per excitatory and inhibitory neuron subtype. Statistical significance of the overlap between the top 200 genes of a gene expression program and cell type marker genes was computed using Fisher’s exact tests.
The gene regulatory network analysis was performed using pySCENIC with the default parameters35. To reduce runtime and working memory requirements, the data were downsampled to 2,000 cells per major cell type. For the SCENIC analysis at the level of high-resolution cell types, the analysis was performed separately on excitatory neurons, inhibitory neurons and astrocytes and the data were downsampled to 1,000 cells per high-resolution cell type. To identify the top cell-type-specific regulons, we calculated regulon specificity scores as described by previously and ranked the regulons based on their regulon specificity score106. Finally, we calculated the activity of each regulon in each cell using the AddModuleScore function of Seurat. The calculation of regulon module scores for major cell types was performed on a random sample of 50% of the cells (676,537 cells). For the analysis at the level of high-resolution cell types, the regulon module scores were determined based on all the cells of a major cell type class. For the statistical analysis of differences in the activity of regulons between cell types, the average regulon module score per individual and major cell type or high-resolution cell type was computed, respectively. The statistical analyses comparing the regulon module score activity was performed using Prism 9 software.
GABAergic and glutamatergic module scores across all neuronal cell types were determined on the basis of a set of GABAergic and glutamatergic neuron marker genes, respectively, using the AddModuleScore function of Seurat. The sets of GABAergic and glutamatergic neuron marker genes were determined based on the human multiple cortical areas SMART-seq dataset from the Allen Brain Institute (https://portal.brain-map.org/atlases-and-data/rnaseq/human-multiple-cortical-areas-smart-seq). We identified marker genes using the FindAllMarkers function from Seurat (Wilcoxon rank-sum test with Bonferroni correction for multiple testing; Padj < 0.05). We tested only genes that were detected in a minimum of 25% of the cells within the cell type (min.pct = 0.25) and that showed, on average, at least a 0.25-fold difference (log-scale) between the cells of the cell class of interest and all remaining cells (logfc.threshold = 0.25). To quantify the intermediate character of thalamic excitatory and inhibitory neurons, we first computed the average GABAergic and glutamatergic module score values for each neuronal cell type and for each individual. We then used the resulting data to determine the first principal component (PC1) scores (the coordinates of the individual observations on the first principal component axis) using the princomp function in R. The ridgeline plot showing the distribution of PC1 score for each neuronal cell type was generated using the ggplot2 package in R. To determine the association between the average glutamatergic and the GABAergic module score across neuronal cell types, we performed a simple linear regression analysis using Prism 9 software.
Marker genes significantly upregulated in extratelencephalic projection neurons (exc. L5 ET) compared with near-projecting excitatory neurons in layers 5 and 6 (exc. L5/6 NP) were determined using the FindAllMarkers function from Seurat (Wilcoxon rank-sum test with Bonferroni correction for multiple testing; Padj < 0.05). We tested only genes that were detected in a minimum of 25% of the cells within the cell type (min.pct = 0.25) and that showed, on average, at least a 0.25-fold difference (log-scale) between exc. L5 ET cells and exc. L5/6 NP cells (logfc.threshold = 0.25). The exc. L5 ET module score was computed based on the identified marker genes using the AddModuleScore function of Seurat. To determine the exc. L5 ET module scores across excitatory and inhibitory neurons, cells were downsampled to 2,000 cells per high-resolution cell type.
Cell–cell communication events were predicted using the LIgand-receptor ANalysis framework (LIANA)107 in R. Specifically, the ligand–receptor analysis was performed using liana_wrap(). The methods included were CellPhoneDB108, NATMI109 and SingleCellSignalR110. liana_aggregate() with the argument ‘aggregate_how’ set to ‘magnitude’ was run to find consensus ranks of different methods. only interactions (ligand–receptor pairs) with a robust rank aggregation (RRA) score smaller than 0.05 (aggregate_rank < 0.05) were considered in downstream analyses. The interaction score of ligand–receptor pairs was calculated by applying −log10 transformation to the RRA score (aggregate_rank). To determine the number of interactions and the overlap of interactions between regions, liana_wrap() was run on the pool of cells isolated from all individuals, with separate analyses conducted for each brain region. To determine cell–cell communication events that are brain-region specific, liana_wrap() was run separately for each individual. We then used a linear mixed-effects model to evaluate the association between the interaction scores of individual ligand–receptor pairs obtained from LIANA and the respective brain region serving as the predictor variable. To account for potential confounding factors and individual variability, we included age, sex and post-mortem interval as covariates in the linear mixed-effects model. These variables were added as fixed effects to the model. Moreover, we included a random effect for the individual to capture the participant-specific variability in the data. Linear mixed-effects models were implemented using the R software packages lme4111 and lmerTest112. The lmer() function from the lme4 package was employed to fit the models. To obtain P values for the fixed effects in these models, we used the lmerTest package, which incorporated Satterthwaite’s degrees of freedom approximation. To account for multiple hypothesis testing, the obtained P values were further adjusted using the Bonferroni method.
For comparing the relative abundance of major cell types across brain regions, the fraction of cells of a major cell type class was computed relative to all the cells isolated from a region. For the statistical analysis of cell type composition differences between brain regions, we also computed the relative abundance of major cell type classes separately for each study participant. To this end, the fraction of cells of a major cell type class was computed relative to all the cells isolated from a brain region of an individual. At the level of high-resolution cell types or cell states, two distinct measures of relative abundance were computed. First, the relative abundance of each subtype of a major cell class was computed as the proportion of a subtype relative to all cells of the corresponding major cell class isolated from a brain region. Second, for the statistical analysis of differences between brain regions, the fraction of cells of a subtype was computed relative to all the cells isolated from a brain region of an individual. The statistical analyses comparing the relative abundance of major cell types and subtypes between brain regions was performed using Prism 9 software.
We calculated compositional differences in individuals with AD versus individuals without AD (or AD dementia versus no dementia) by modelling the number of cells of a certain cell type or subtype in a specific sample (individual by region) relative to the total number of cells using a quasi-binomial regression model. We modelled AD status by binary ascertainment variables (cogdx 4–5, NIA-Reagan score 1–2, Braak stage 5–6 versus others, as well as any detected presence of NFTs, neuritic plaque or diffuse plaques in the region) while adjusting for brain region and sex. We used the emmeans package in R to assess the significance of the regression contrasts and used p.adjust with the fdr method to adjust P values. We modelled the effects of fraction of cells on cognitive performance in multiple domains with gaussian linear regression of cognitive performance on last visit versus the log10-transformed fraction of cells in the subtype or major cell type jointly with covariates for age, sex, APOE-ε4 and post-mortem interval, with false-discovery rate P-value correction (p.adjust in R). We compared the fractional abundances of pairs of neuronal subtypes between two regions using Kendall’s τ only in individuals with AD (NIA-Reagan score). Significance was assessed using beta regression (R library betareg) controlling for sex, APOE genotype, post-mortem interval and age of death, and we adjusted P values using p.adjust in R with the fdr method.
We performed differential expression analyses with three separate methods: MAST, Nebula and Wilcoxon testing113,114. For all methods, we subset the tested genes to only genes present in over 20% of cells. For MAST and Nebula, we calculated and included in the regression the top 10 components of unwanted variation calculated using RUV on the pseudo-bulk-level data (individuals by regions). For these methods, we also included as covariates the individual’s sex, age of death and post-mortem interval, each cell’s counts per gene and number of captured genes and, where applicable, the high-resolution cell subtypes and the brain region. For Nebula we used a Poisson mixed-model on the counts data with an offset of the log10-transformed total counts per cell. For MAST and Wilcoxon, we normalized each cell to a total library size of 10,000 counts. We ran Wilcoxon tests on both the cell and individual levels. We adjusted P values for multiple testing in all cases by using the p.adjust function in R with the fdr method. For our final set of differential genes in each analysis, we took all genes that were significant (Padj > 0.05) and concordant in both the MAST and Nebula results. We also provide the results for Wilcoxon tests, but did not use these to determine concordant results as they do not correct for many covariates. We computed differential expression against five AD ascertainment variables: continuous measurements of NFT, plaq_n, and plaq_d measured in each region except the thalamus (excluded from these analyses) and binary cognitive impairment (cogdx no dementia = 1 and 2 versus AD dementia = 4 and 5) and NIA-Reagan score classifications (non-AD = 3 and 4 versus AD = 1 and 2). We provide differential expression results for each of the 14 major cell types (with T cells, CNS macrophages, and each vascular subtype separately) for all regions jointly and for each region separately. We also provide results for each of the excitatory subtypes either in its most prevalent region for EC, HC or TH subclasses, or across the neocortex for neocortical subtypes (Supplementary Table 9). We also computed DEGs for the interaction between pathologic diagnosis of AD and sex in each major cell type, both across all regions and in each region separately. For the glial energy metabolism analyses we recomputed all DEGs in glial glycolysis-associated modules separately (keeping all genes, with no cell percentage cut-off). Glycolysis pathway diagram is from the glycolysis and gluconeogenesis pathway from WikiPathways115.
Pathology-biased DEGs were based on neuritic plaque or NFT pathology measurements in each region and were computed in each major cell type across all regions and in each region (except for the TH). Genes were ordered by the residual between NFT effect size and predicted NFT effect size from a regression using plaque effect size and region. Genes were retained if they were consistently up (or down) in 3+ regions for either NFT or plaque but in fewer than 2 regions for the other pathology measurement (shared genes are genes found in 2+ cell types).
We compared our differential expression results to results from seven different previously published studies11,12,19,47,48,49,50. We compared the published DEGs both to: (1) cross-region DEGs calculated in each major cell type for individual-level AD status (NIA-Reagan score or clinical diagnosis of AD) and for quantitative measurements of AD pathology (neuritic plaques, diffuse plaques and NFTs); and (2) region-specific DEGs calculated in each major cell type and in endothelial cells, computed relative to pathologic diagnosis of AD (NIA-Reagan score, AD, 1–2; non-AD, 3–4). As some studies reported only the significant genes, we compared the log-transformed fold change estimates for our DEGs and reported DEGs by a Pearson correlation test.
To assess the enrichment of upregulated, downregulated non-differentially expressed genes in each module, we first assigned each tested DEGs to its closest module by correlation to the module’s average expression profile. We then performed a hypergeometric enrichment test for the number of genes in a category (upregulated, downregulated, not differentially expressed) assigned to the module, against the total number of tested genes assigned to the module, the total number in the category and the total number tested and corrected P values using p.adjust (Benjamini–Hochberg). Enrichments of pathology-biased DEGs in modules were performed in the same manner.
To partition neuronal DEGs into non-vulnerable and vulnerable-associated subclasses, we calculated each genes’ average expressions and differential expression effect sizes at the subtype level and compared these to the relative depletion of the subtypes. For each gene that was differentially expressed in late-AD (stratified by Braak stage, late AD, 5–6 versus non-AD or early-AD, 1–4) in at least 25% of all neuronal subtypes, we calculated the correlation of its average subtype expression in late-AD with each subtype’s compositional stability (log2[OR] in late AD) across excitatory subtypes, separating non-vulnerable-associated genes (correlation >0.2)来自脆弱相关的基因(相关性< −0.2). We calculated functional enrichments on neuronal DEG partitions using the top 250 genes ordered by effect size in each category. We further separated DEGs with higher effect sizes in vulnerable subtypes from those with similar effect sizes across all neuronal subtypes by calculating the correlation of their differential effect sizes in each subtype with that subtype’s depletion (log2[OR] in late AD). To perform enrichments along the continuum of genes associated with vulnerability to non-vulnerability, we kept only genes with biased effect sizes (effect size correlation < −0.2) and binned them along the axis of expression correlation (window size 0.2 for at 0.05 intervals) and performed functional enrichments for all bins jointly.
We performed DEG enrichments for each differential expression run using the gprofiler2 package in R, with multi-query for upregulated and downregulated genes, as unordered queries, a P-value cutoff of 0.05, and using GO, REAC, WP, KEGG and CORUM as annotation sources, and retained enriched terms with fewer than 500 genes. Module and module cluster enrichments were performed in the same manner, using the core genes identified for each module and for genes found in more than two modules within a module cluster.
We identified markers associated with excitatory neuron subtype vulnerability by performing linear regression to predict the log2[OR] of each subtype’s depletion in late AD based on its expression at the subtype-aggregate level (log[X + 1], averaged normalized expression in each subtype by region by individual batch), controlling for age, sex and post-mortem interval and adjusting P values with p.adjust (fdr).
Processed snRNA-seq data (DLPFC, experiment 2) were obtained from Synapse (syn51123521) and integrated with our own PFC snRNA-seq dataset comprising 427 individuals. To identify vulnerable inhibitory neuron subtypes, we examined the association between the relative abundance of cell types and the measure of NFT density (variable tangles). We used a quasi-binomial regression model to model the number of cells belonging to a specific cell type in a given sample (individual study participant), relative to the total number of cells in that sample. We fitted the regression model using the glm function in R, including age, sex and post-mortem interval as covariates. P values were corrected for multiple testing using the Benjamini–Hochberg procedure as implemented in the R function p.adjust. The results are presented in the form of association scores (signed −log10-transformed Benjamini–Hochberg-adjusted P value, where the sign was determined by the direction (positive or negative) of the association). Inhibitory neuron subtypes demonstrating a significant negative association with tangle density (Benjamini–Hochberg-adjusted P value < 0.05) were classified as vulnerable subtypes, while all other subtypes were categorized as non-vulnerable. Genes exhibiting differential expression between vulnerable and non-vulnerable inhibitory neuron subtypes were identified on the basis of our PFC snRNA-seq dataset. This analysis was restricted to individuals without a pathologic diagnosis of AD. The differential expression analysis comparing vulnerable to non-vulnerable inhibitory neuron subtypes was performed using the R package dreamlet (https://diseaseneurogenomics.github.io/dreamlet/). We used the dreamletCompareClusters function with the argument ‘method’ set to ‘fixed’ for this analysis. Adjusted P values for multiple testing were obtained using the topTable function of dreamlet, with the ‘adjust.method’ set to ‘BH’.
Intersection of regional expression and pathology-specific DEGs (across all regions) was performed for 149 annotated AD GWAS familial and AD risk loci from recent GWAS54,56,57,58. We calculated the disease-relevance score of each cell in the dataset against a recent Alzheimer’s GWAS, using scDRS (based on MAGMA)54,55,116. For the scDRS results, we counted the fraction of cells with significant scDRS scores (FDR < 0.05) in each cell type, subtype and region. To test for overlap with microglia/immune modules, we compared the set of immune cells with significant expression of each module (z score >2.5)并且具有具有显着SCDRS分数的细胞集(FDR< 0.05) and performed a hypergeometric test for significance of the overlap (Padj < 0.01, Benjamini–Hochberg correction). To identify region-specific GWAS genes, we performed an analysis of variance for the effect of region on average gene expression at the pseudobulk level.
To quantify CR, we computed a CR score as the difference between the observed cognition and the cognition predicted by a linear regression model, given the level of pathology (Fig. 5a). Using this approach, we computed cognitive resilience (CR) scores based on the measure of global cognitive function and CDR scores based on the measure estimating the rate of change of global cognitive function over time (Fig. 5a). Four distinct CR and CDR scores were derived using four distinct measures of AD pathology, namely global AD pathology and, separately, neuritic plaque burden, NFT burden and tangle density.
We performed differential expression analyses using the R package muscat to identify genes associated with CR in the PFC117. Low-expressed genes were excluded and only genes with more than one count in at least ten cells were considered. To take advantage of robust bulk RNA-seq differential expression frameworks, such as edgeR118, in a first step, muscat aggregates measurements for each sample (in each cluster) to obtain pseudobulk data. Using this approach, single-cell measurements were aggregated per study participant and cell type using the sum of raw counts option. Differential expression analysis was run using the edgeR method as implemented in muscat. We included as covariates the individual’s age at death and post-mortem interval. We report adjusted P values for multiple testing in all cases by using the p.adjust function with the Benjamini–Hochberg method as implemented in muscat. The multiple testing correction was performed locally, that is, on each of the cell types separately with the number of tests equal to the number of genes considered. These differential expression analyses were performed on the entire set of 427 individuals except for the group-based differential expression analysis based on our categorical definition of CR. In this case, we focused on comparing two distinct groups determined by their pathologic and clinical diagnoses of AD. First, we identified individuals with a pathologic diagnosis of AD, using the NIA-Reagan pathology criteria. Subsequently, these individuals were further categorized on the basis of their clinical consensus diagnosis of cognitive status at the time of death. Specifically, we compared individuals with no cognitive impairment (NCI, final consensus cognitive diagnosis (cogdx) value of 1) against individuals with a cognitive diagnosis of AD dementia and no other cause of cognitive impairment (cogdx value of 4) among individuals with a pathologic diagnosis of AD.
To confirm the differential gene expression results based on the CR and CDR scores, we also evaluated the association between gene expression and global cognitive function or the rate of change of global cognitive function adjusting for AD pathology as a covariate. The AD pathology measures considered as a covariate were global AD pathology (gpath), neuritic plaque burden (plaq_n), NFT burden (nft), or tangle density (tangles). Thus, together with the DGE analysis based on CR and CDR scores, we performed a total of 16 different tests assessing the association between gene expression and CR.
We used the model-based analysis of single-cell transcriptomics (MAST) tool to investigate whether the CR genes identified in PFC astrocytes were also associated with CR in astrocytes from other regions of the human brain. To ensure robust analysis, we initially filtered the genes under investigation, selecting only those with more than one count in at least 10 cells. The analytical model incorporated the condition variable of interest, as well as several covariates known to influence gene expression. These covariates included the cellular detection rate (cngeneson), age at death (age_death), post-mortem interval (pmi), and sex (msex). We also accounted for potential participant-specific variation in the data by incorporating a random effect term for the individual (1|individual). To account for multiple comparisons, the P values were adjusted using the FDR method as implemented in the p.adjust function.
Differential expression analysis of bulk RNA-seq data from the ROSMAP cohorts was performed using DESeq2119 (plotted) and edgeR118. Age at death and post-mortem interval were converted into z scores and included as covariates in the regression equation. Both approaches (DESeq2 and edgeR) provided similar results.
The average expression level of each gene within each major cell type was determined using the ‘Averageexpression’ function from the Seurat R package. The genes considered in the differential expression analysis for each major cell type were categorized into ten subsets based on their average expression level within the corresponding cell type. We next intersected the genes in each of the ten subsets with genes identified as significantly associated with either neuritic plaque burden (in our dataset) or the CPS score (in the MTG SEA-AD dataset). This intersection enabled us to determine the number of significant DEGs in each subset. The process was performed separately for genes positively and negatively associated with AD pathology. Subsequently, we randomly sampled the determined number of significant DEGs from each of the 10 subsets, ensuring that the expression level distribution of the DEGs was preserved in the random samples. This random sampling step was repeated for a total of 1,000 iterations. These steps were performed separately for both our dataset and the SEA-AD dataset. For each of the 1,000 random samples, we determined the overlap of genes between datasets and compared it to the observed overlap between the two datasets. To assess the significance of the observed overlap, we computed z scores, which represent the difference between the observed value of overlap and the mean value of overlap based on the permutation results, divided by the s.d. of the permutation results.
To further validate our differential expression results, we evaluated the correlation between the effect sizes of gene expression changes observed in our study and those identified through quantitative proteomics51. We specifically examined the correlation between the effect sizes of genes associated with neuritic plaque burden in our study and the effect sizes of overlapping differentially expressed proteins in the quantitative proteomics analysis of bulk tissue. The correlation was computed using the cor.test function in R with the argument ‘alternative’ set to ‘two.sided’ and the argument ‘method’ set to ‘pearson’. P values were adjusted for multiple testing using the Benjamini–Hochberg method as implemented in the R function p.adjust.
We determined genes significantly associated with global AD pathology for each glial cell type, using single-nucleus RNA sequencing data from the PFC of 427 participants in the ROSMAP study. We then calculated module scores for these gene sets in astrocytes, microglia, oligodendrocytes, and OPCs using the Seurat ‘AddModuleScore’ function. The module scores were determined separately for genes positively and negatively associated with global AD pathology. To assess the progression of these scores across the spectrum of global AD pathology burden, we averaged the module scores of all cells of a specified cell type isolated from the brain region of interest of an individual. For visualizing the relationship between the global AD pathology burden and mean module scores, we employed Locally Estimated Scatterplot Smoothing (LOESS) using the ggplot2 package in R, with the ‘geom_smooth’ function and the ‘method’ parameter set to ‘loess’. The correlation of mean module scores between regions was determined using the cor.mtest function of the R package corrplot. P values were adjusted for multiple hypotheses testing using the Benjamini–Hochberg method as implemented in the R function p.adjust.
Frozen human post-mortem brain samples were embedded in Tissue-Tek OCT compound (VWR; 25608-930), sectioned on a Leica cryostat at a thickness of 20 μm, and mounted onto Fisherbrand Superfrost Plus microscope slides (Thermo Fisher Scientific; 12-550-15). Slides were fixed in 4% paraformaldehyde at 4 °C for 30 min, and dehydrated in ethanol. The RNAscope 2.5 HD Chromogenic Duplex Detection Kit and RNAscope Multiplex fluorescent V2 Kit (ACDBio) were then used according to the manufacturer’s instructions. Tissue samples were hybridized using the following chromogenic RNAscope probes: GAD2, FOXP2, MEIS2, AQP4, GRM3, ADCY8, PFKP, PNPLA6, GPCPD1 and CHDH (ACDBio). For in situ hybridization of Reelin, tissue samples were hybridized using the following fluorescent RNAscope probes: vGlut and Reelin. Cell nuclei were stained with 50% haematoxylin (for chromogenic experiments) or with Hoechst (for fluorescent experiments). For fluorescence RNAscope analysis, sections were incubated in TrueBlack (Biotium; 23007) for 10 s before Hoechst staining to quench auto-fluorescence. Images were acquired using the Zeiss LSM 900 confocal microscope, with a 63× oil objective. Two images were acquired per tissue sample. For both chromogenic and fluorescence RNAscope experiments, puncta were manually counted by researchers blinded to the experimental group of each image.
All experiments were performed according to the Guide for the Care and Use of Laboratory Animals and were approved by the National Institute of Health and the Committee on Animal Care at Massachusetts Institute of Technology. Sample sizes were determined on the basis of previous work from our laboratory, without power analysis calculation or randomization. In the experiment comparing App-KI (C57BL/6-App
将切片在原代抗体(抗凝血素,1:200; Anti-Neun,1:200,抗磷tau,1:200;和抗淀粉样蛋白β,1:500)中孵育。在4°C下过夜。初级抗体孵育后,将切片用PBS洗涤3次,两次用阻止液进行两次,并在辅助抗体(1:1,000)(1:1,000)在室温下孵育2小时。然后将切片用PBS洗涤3次,并与Hoechst(1:1,000)孵育10分钟,然后再用PBS洗涤一次。
EC的共聚焦瓷砖扫描是在Zeiss LSM 900上使用20倍物镜在所有队列中均保持一致的激光设置。根据先前的标准,确定了II – III级EC120。共聚焦瓷砖扫描的正交投影被出口到斐济进行信号定量。在斐济,将EC的II – III置于感兴趣的区域,并使用宏来计数感兴趣区域中的reelin阳性细胞,并量化每个细胞的平均荧光强度。随后将reelin通道的信号强度标准化为Neun信号。研究人员对动物基因型视而不见。
西雅图阿尔茨海默氏病脑细胞图集(SEA-AD)财团(Seaad_Mtg_rnaseq_final-Nuclei.2023-05-05.H5AD)产生的已加工的SNRNA-SEQ数据是从西雅图阿尔茨海默氏病疾病脑细胞Atlas(Sea-ad Atlas(Sea-ad)获得的。(https://sea-ad-single-cell-profiling.s3.amazonaws.com/index.html#mtg/rnaseq/)。SEA-AD DLPFC数据(seaad_dlpfc_rnaseq_final-nuclei.2023-07-19.h5ad)从https://sea-ad-ad-single-cell-profiling.s3.amazonaw.s.coms.coms.coms.com/index.htex.htmlllppfc/rnaseq/下载。从Linnarsson实验室获得了其他处理后的SNRNA-SEQ数据集(特别是H5AD文件神经元和非神经元。H5AD)(https://console.cloud.google.com/storage/browser/linnarsson-lab-human@tab=Objects?authuser = 0&prefix=&forceonObjectsssSsSsSSSSSSSSSSSSSSSORTINGFILTERTEN = FALSE)。
有关研究设计的更多信息可在与本文有关的自然投资组合报告摘要中获得。