Molecular and cellular dynamics of the developing neocortex

2025-06-24 05:18

  Human brain tissue samples were obtained from four sources (Supplementary Tables 1 and 5).

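  Four de-identified first-trimester human tissue samples were collected from the Human Developmental Biology Resource (HDBR). The samples were staged using crown–rump length, dissected and snap-frozen on dry ice.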

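  Thirteen de-identified second-trimester human tissue samples were collected at Zuckerberg San Francisco General Hospital (ZSFGH). Acquisition of second-trimester human tissue samples was approved by the UCSF Human Gamete, Embryo and Stem Cell Research Committee (10-05113). All experiments were performed in accordance with protocol guidelines. Informed consent was obtained before sample collection and use for this study.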

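  Two de-identified third-trimester and early postnatal tissue samples were obtained from the UCSF Pediatric Neuropathology Research Laboratory (PNRL) led by E.J.H. These samples were obtained with patient consent in strict observance of legal and institutional ethical regulations and in accordance with research protocols approved by the UCSF IRB committee. The samples were dissected and snap-frozen either on a cold plate placed on a slab of dry ice or in isopentane on dry ice.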

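  Twenty-three de-identified third-trimester, early postnatal and adolescent tissue samples from individuals without known neurological disorders were obtained from the University of Maryland Brain and Tissue Bank through the NIH NeuroBioBank.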

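  A list of the samples used for single-nucleus multiomic analysis is provided in Supplementary Table 1, and a list of the samples used for spatial transcriptomic analysis is provided in Supplementary Table 5.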

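  Mouse experiments were approved by the UCSF Institutional Animal Care and Use Committee (IACUC) and were performed in accordance with relevant institutional guidelines. Mice were housed under a standard 12 h–12 h light–dark cycle, at a humidity of 30–70% and a temperature of 68–79 °F.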

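  A detailed protocol was reported previously48. All procedures were performed on ice or at 4 °C. In brief, frozen tissue samples (20–50 mg) were homogenized in 1 ml cold homogenization buffer (HB) (20 mM Tricine-KOH pH 7.8, 250 mM sucrose, 25 mM KCl, 5 mM MgCl2, spermidine, spermine, NP-40, 1× complete protease inhibitor (Roche) and 0.6 U ml−1 RiboLock (Thermo Fisher Scientific)). Tissue samples were homogenized with a loose pestle followed by a tight pestle. Nuclei were pelleted by centrifugation at 350g for 5 min, resuspended in 25% iodixanol solution and loaded onto 30% and 40% iodixanol layers to form a gradient. The gradient was centrifuged at 3,000g for 20 min. Clean nuclei were collected at the 30–40% interface and diluted in wash buffer (10 mM Tris-HCl pH 7.4, 10 mM NaCl, 3 mM MgCl2, 1 mM dithiothreitol, 1% BSA, 0.1% Tween-20 and 0.6 U ml−1 RiboLock (Thermo Fisher Scientific)). Next, nuclei were pelleted by centrifugation at 500g for 5 min and resuspended in diluted nuclei buffer (10x Genomics). Nuclei were counted using a haemocytometer, diluted to 3,220 nuclei per μl and further processed according to the 10x Genomics Chromium Next GEM Single Cell Multiome ATAC + Gene Expression Reagent Kits User Guide. We targeted 10,000 nuclei per sample per reaction. Libraries from individual samples were pooled and sequenced on the NovaSeq 6000 sequencing system, targeting 25,000 read pairs per nucleus for ATAC and 25,000 read pairs per nucleus for RNA.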

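  The raw sequencing signals in BCL format were demultiplexed into fastq format using the mkfastq function in the Cell Ranger ARC suite (v.2.0.0, 10x Genomics). The Cell Ranger ARC count pipeline was implemented for cell barcode calling, read alignment and quality assessment using the human reference genome (GRCh38, GENCODE v32/Ensembl98) according to the protocols described by 10x Genomics. The pipeline assessed the overall quality to retain all intact nuclei from the background and filtered out non-nucleus-associated reads. All gene expression libraries in this study showed a high fraction of reads in nuclei, indicating high RNA content in called nuclei and minimal levels of ambient RNA. The overall summary of data quality for each sample is listed in Supplementary Table 1. Next, we further assessed the data at the individual-nucleus level and retained high-quality nuclei that met the following criteria: (1) gene expression count (nCount_RNA) in the range of 1,000 to 25,000; (2) number of detected genes (nFeature_RNA) greater than 400; (3) number of ATAC fragments in peak regions (atac_peak_region_fragments) in the range of 100 to 100,000; (4) ATAC–seq transcription start site (TSS) enrichment score greater than 1; and (5) nucleosome signal strength (the ratio of mononucleosomal to nucleosome-free fragments) lower than 2. To ensure that only single nuclei were analysed, we measured the doublet probability using Scrublet49 and excluded all potential doublets with a score greater than 0.3 from downstream analysis. In total, 243,535 nuclei that passed all quality-control criteria were included for further analysis.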
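The per-nucleus filters above amount to a conjunction of threshold checks. The Python sketch below expresses them directly; the `NucleusQC` record and its field names are hypothetical stand-ins for the Seurat/Signac metadata columns named in the text, and treating the stated ranges as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class NucleusQC:
    """Hypothetical per-nucleus QC record (field names are illustrative)."""
    ncount_rna: int                  # nCount_RNA
    nfeature_rna: int                # nFeature_RNA
    atac_peak_region_fragments: int  # ATAC fragments in peak regions
    tss_enrichment: float            # ATAC-seq TSS enrichment score
    nucleosome_signal: float         # mononucleosomal / nucleosome-free ratio
    doublet_score: float             # Scrublet doublet score


def passes_qc(n: NucleusQC) -> bool:
    """Return True if a nucleus meets every threshold (ranges assumed inclusive)."""
    return (
        1_000 <= n.ncount_rna <= 25_000
        and n.nfeature_rna > 400
        and 100 <= n.atac_peak_region_fragments <= 100_000
        and n.tss_enrichment > 1
        and n.nucleosome_signal < 2
        and n.doublet_score <= 0.3  # scores > 0.3 are treated as doublets
    )


nuclei = [
    NucleusQC(5_000, 2_000, 8_000, 4.5, 0.6, 0.05),  # kept
    NucleusQC(5_000, 2_000, 8_000, 4.5, 0.6, 0.45),  # likely doublet
    NucleusQC(600, 300, 8_000, 4.5, 0.6, 0.05),      # too few RNA counts
]
kept = [n for n in nuclei if passes_qc(n)]
print(len(kept))  # 1
```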

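  For the ATAC data of the snMultiome analysis, open chromatin region peaks were called on individual samples using MACS2 (v.2.2.7)50. Peaks from all samples were unified into genomic intervals, and intervals falling into blacklist regions were excluded51. Of all 398,512 processed ATAC peaks, the top 20% of peaks shared across all nuclei (n = 82,505) were selected as variable features for downstream fragment counting and data integration. Peak counts from each sample were integrated using the reciprocal latent semantic indexing (LSI) projection function of the R package Signac (v.1.10.0)52. For RNA-seq data, normalization and data scaling were performed using SCTransform v2 (v.0.4.1)53 in Seurat (v.4)6. Before data integration, the cell-cycle difference between the G2M and S phases was scored for each nucleus. The transformed gene-by-nucleus data matrices for all nuclei passing quality control were integrated by reciprocal PCA projections between samples using Seurat v.4, following previously described best practices52,54.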
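LSI on a peak-by-nucleus matrix is typically a TF-IDF transform followed by a truncated SVD. The NumPy sketch below uses a log-scaled TF-IDF with a scale factor of 10,000, assumed to resemble Signac's default variant, but it is an illustration rather than Signac's code, and the random matrix stands in for real fragment counts. It drops the first component, a common choice because that component tends to track sequencing depth, which is consistent with the 2–40 LSI components used in the weighted nearest-neighbour analysis.

```python
import numpy as np


def tfidf(counts: np.ndarray, scale_factor: float = 1e4) -> np.ndarray:
    """Log-scaled TF-IDF of a peaks x nuclei count matrix (no all-zero rows/columns)."""
    tf = counts / counts.sum(axis=0, keepdims=True)            # per-nucleus term frequency
    idf = counts.shape[1] / counts.sum(axis=1, keepdims=True)  # inverse peak frequency
    return np.log1p(tf * idf * scale_factor)


def lsi(counts: np.ndarray, last_component: int = 40) -> np.ndarray:
    """Return LSI components 2..last_component as a nuclei x components embedding."""
    _, s, vt = np.linalg.svd(tfidf(counts), full_matrices=False)
    embedding = vt.T * s                    # nuclei x components, scaled by singular values
    return embedding[:, 1:last_component]  # skip component 1 (depth-correlated)


rng = np.random.default_rng(0)
peak_counts = rng.poisson(1.0, size=(300, 60)).astype(float)  # toy peaks x nuclei
emb = lsi(peak_counts)
print(emb.shape)  # (60, 39)
```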

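  Weighted nearest-neighbour analysis was performed using Seurat v.4 with 1–50 principal components and 2–40 LSI components. The nearest-neighbour graph was used for UMAP embedding and clustering using the SLM algorithm55. Clusters expressing known markers of the striatum (ISL1 and SIX3) or the diencephalon (OTX2 and GBX2) were discarded. Furthermore, clusters containing transcripts from neurites (NRGN) and oligodendrocyte processes (MBP), probably owing to debris contamination, were discarded. These filtering steps resulted in 232,328 nuclei in the final dataset (Extended Data Fig. 1 and Supplementary Table 2). Weighted nearest neighbours, dimensionality reduction and clustering were recomputed using the filtered data. Cell identities were determined based on the expression of known marker genes, as shown in Fig. 3 and Supplementary Table 3. The five identified classes were progenitors, neurons, glia, immune cells and vascular cells. The 11 identified subclasses were RGs, intermediate progenitor cell for ENs (IPC-EN), glutamatergic neurons, GABAergic neurons, intermediate progenitor cell for glia (IPC-glia), astrocytes, oligodendrocyte precursor cells (OPCs), oligodendrocytes, Cajal–Retzius cells, microglia and vascular cells. The identified cell types were ventricular RGs (RG-vRG), truncated RGs (RG-tRG), outer RGs (RG-oRG), IPC-EN, newborn ENs (EN-newborn), immature IT neurons (EN-IT-immature), layer 2–3 (L2–3) IT neurons (EN-L2_3-IT), L4 IT neurons (EN-L4-IT), L5 IT neurons (EN-L5-IT), L6 IT neurons (EN-L6-IT), immature non-IT neurons (EN-non-IT-immature), L5 extratelencephalic neurons (EN-L5-ET), L5–6 near-projecting neurons (EN-L5_6-NP), L6 corticothalamic neurons (EN-L6-CT), EN-L6b, dorsal lateral ganglionic eminence-derived immature INs (IN-dLGE-immature), caudal ganglionic eminence-derived immature INs (IN-CGE-immature), VIP INs (IN-CGE-VIP), SNCG INs (IN-CGE-SNCG), LAMP5 INs (IN-mix-LAMP5), medial ganglionic eminence-derived immature INs (IN-MGE-immature), SST INs (IN-MGE-SST), PVALB INs (IN-MGE-PV), IPC-glia, immature astrocytes (Astrocyte-immature), protoplasmic astrocytes (Astrocyte-protoplasmic), fibrous astrocytes (Astrocyte-fibrous), OPCs, immature oligodendrocytes (Oligodendrocyte-immature), oligodendrocytes, Cajal–Retzius cells, microglia and vascular cells.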

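  Changes in cell-type proportions across age groups and brain regions were investigated using the linear model approach implemented in the R packages speckle (v.1.2.0)56 and limma (v.3.58.1)57. To identify changes in cell-type proportions over time, we logit-transformed the proportions in each sample and fitted a linear model (~log2[age] + region) using limma. To account for potential correlation between samples from the same individual, limma's duplicateCorrelation was applied. Once the model was fitted, moderated t-tests with empirical Bayes shrinkage were used to test the statistical significance of the log2[age] coefficient for each cell type. To identify differences in cell-type proportions between PFC and V1, a similar analysis was performed using only samples from the third trimester and older. Cell types with Benjamini–Hochberg-adjusted P < 0.05 were determined to be significant (Supplementary Table 3).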
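The core of the proportion analysis is a logit transform of per-sample proportions followed by a linear fit against log2(age) with region as a covariate. The sketch below shows only that step, with ordinary least squares; it omits limma's duplicateCorrelation and empirical Bayes moderation, and the pseudocount of 0.5 added before computing proportions is an assumption made here to keep the logit finite for zero counts. Ages and region codes in the example are made-up values.

```python
import numpy as np


def logit_proportions(counts: np.ndarray, pseudocount: float = 0.5) -> np.ndarray:
    """counts: samples x cell types. Logit of per-sample proportions."""
    c = counts + pseudocount  # assumed pseudocount so zero counts stay finite
    props = c / c.sum(axis=1, keepdims=True)
    return np.log(props / (1 - props))


def age_coefficients(y: np.ndarray, age: np.ndarray, region: np.ndarray) -> np.ndarray:
    """OLS fit of y ~ log2(age) + region; returns the log2(age) coefficient(s)."""
    design = np.column_stack([np.ones(len(age)), np.log2(age), region])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]


age = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0])  # hypothetical ages
region = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])  # e.g. 0 = PFC, 1 = V1 (illustrative)
y = 2.0 + 0.5 * np.log2(age) + 0.3 * region       # noise-free toy response
print(round(float(age_coefficients(y, age, region)), 6))  # 0.5
```

Passing a samples × cell-types matrix from `logit_proportions` as `y` fits every cell type at once, because `lstsq` accepts multiple response columns.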

  The per-cell regulatory activities of TFs were quantified by chromVAR (v.1.16.0)58. In brief, overlapping peaks were merged by removing any peak that overlapped a peak with a greater signal, and only peaks wider than 75 bp were retained for motif enrichment analysis. We computed the per-cell enrichment of curated motifs from the JASPAR2020 database59. In total, 633 unique human transcription factors were assigned to their most representative motifs. The per-cell-type transcriptional activity of each TF was represented by averaging the per-cell chromVAR scores within each cell type, and cell-type-specific TFs were chosen for further analysis and visualization (Supplementary Table 4).
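chromVAR scores start from a per-cell raw deviation: how many fragments fall in a motif's peaks compared with the number expected from the cell's total fragments in peaks and each peak's overall share of fragments. The NumPy sketch below computes that quantity and then averages it within cell types, as described above. It deliberately omits chromVAR's GC- and accessibility-matched background peaks and the resulting bias-corrected z-scores, so it is a simplified illustration rather than a reimplementation; the counts and labels are toy values.

```python
import numpy as np


def raw_deviations(counts: np.ndarray, motif_match: np.ndarray) -> np.ndarray:
    """counts: peaks x cells; motif_match: peaks x motifs (bool). Returns motifs x cells."""
    m = motif_match.astype(float)
    peak_share = counts.sum(axis=1) / counts.sum()  # each peak's share of all fragments
    cell_totals = counts.sum(axis=0)                # fragments in peaks per cell
    observed = m.T @ counts                          # motif-peak fragments per cell
    expected = (m.T @ peak_share)[:, None] * cell_totals[None, :]
    return (observed - expected) / expected


def mean_by_cell_type(scores: np.ndarray, labels) -> dict:
    """Average motifs x cells scores within each cell-type label."""
    labels = np.asarray(labels)
    return {t: scores[:, labels == t].mean(axis=1) for t in sorted(set(labels))}


counts = np.array([[10.0, 1.0, 9.0], [1.0, 10.0, 2.0]])  # 2 peaks x 3 cells
motif = np.array([[True], [False]])                     # motif occurs in peak 0 only
dev = raw_deviations(counts, motif)
per_type = mean_by_cell_type(dev, ["RG", "EN", "RG"])
print(per_type["RG"][0] > 0 > per_type["EN"][0])  # True
```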

  Spatial transcriptomic analysis using MERFISH was performed using the Vizgen MERSCOPE platform. We designed a customized 300-gene panel composed of cell-type markers (Supplementary Table 5b) using online tools (https://portal.vizgen.com/). Fresh-frozen human brain tissue samples were sectioned at a thickness of 10 µm using a cryostat and mounted onto MERSCOPE slides (Vizgen). The sections were fixed with 4% formaldehyde, washed three times with PBS, photobleached for 3 h and stored in 70% ethanol for up to 1 week. Hybridizations with gene probes were performed at 37 °C for 36–48 h. Next, the sections were fixed using formaldehyde and embedded in a polyacrylamide gel. After gel embedding, the tissue samples were cleared using a clearing mix solution supplemented with proteinase K for 1–7 days at 37 °C until no visible tissue was evident in the gel. Next, the sections were stained for DAPI and poly(T) and fixed with formaldehyde before imaging. The imaging process was performed on the MERSCOPE platform according to the manufacturer’s instructions. Cell segmentation was performed using the Watershed algorithm based on seed stain (DAPI) and watershed stain (poly(T)).

  Standard MERSCOPE output data were imported into Seurat (v.5)60. We retained high-quality cells with the following criteria: (1) cell volume is greater than 10 µm³; (2) gene expression count (nCount_Vizgen) is in the range of 25 to 2,000; (3) the number of detected genes (nFeature_Vizgen) is greater than 10. Normalization, data scaling and variable feature detection were performed using SCTransform v.2 (v.0.4.1)53. The transformed gene-by-cell data matrices for all cells passing quality control were integrated by reciprocal PCA projections between samples using 1–30 principal components. After integration, nearest-neighbour analysis was performed with 1–30 principal components. The resulting nearest-neighbour graph was used to perform UMAP embedding and clustering using the Louvain algorithm61. Clusters with markers known to be mutually exclusive were deemed doublets and discarded. These filtering steps resulted in 404,030 cells in the final dataset (Supplementary Table 6). The identity of specific cell types was determined based on the expression of known marker genes, as shown in Extended Data Fig. 4b. Niches were identified by k-means clustering of cells based on the identities of their 50 nearest spatial neighbours.
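Niche calling of this kind can be sketched as follows: describe each cell by the cell-type composition of its k nearest spatial neighbours, then cluster those composition vectors with k-means. The NumPy sketch below uses brute-force distances (adequate for a toy example, not for 404,030 cells), excludes each cell from its own neighbourhood (an assumption, as the text does not say), and uses k-means++ seeding; the function names and toy coordinates are illustrative.

```python
import numpy as np


def neighbour_composition(xy, labels, n_types, k=50):
    """Fraction of each cell type among each cell's k nearest spatial neighbours."""
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # exclude the cell itself (assumption)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.stack([np.bincount(labels[idx], minlength=n_types) / k for idx in nearest])


def kmeans(x, n_clusters, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; returns cluster assignments."""
    rng = np.random.default_rng(seed)
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, n_clusters):
        d2 = ((x[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
        centers.append(x[rng.choice(len(x), p=d2 / d2.sum())])
    centers = np.array(centers)
    for _ in range(n_iter):
        assign = ((x[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        new = np.array([x[assign == c].mean(axis=0) if np.any(assign == c) else centers[c]
                        for c in range(n_clusters)])
        if np.allclose(new, centers):
            break
        centers = new
    return assign


rng = np.random.default_rng(1)
xy = np.vstack([rng.uniform(0, 1, (60, 2)), rng.uniform(100, 101, (60, 2))])
labels = np.array([0] * 60 + [1] * 60)  # two spatially segregated cell types
niches = kmeans(neighbour_composition(xy, labels, n_types=2, k=10), n_clusters=2)
print(len(set(niches[:60])), len(set(niches[60:])), niches[0] != niches[60])  # 1 1 True
```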

  GW23–24 human cortical samples were fixed in 4% paraformaldehyde (PFA) in PBS at 4 °C overnight. The samples were cryoprotected in 15% and 30% sucrose in PBS and frozen in OCT. The samples were sectioned at a thickness of 16 µm, air-dried and rehydrated in PBS. Antigen retrieval was performed using citrate-based antigen unmasking solution (Vector Laboratories) at 95 °C for 15 min. The slides were then washed in PBS and blocked in PBS-based blocking buffer containing 10% donkey serum, 0.2% gelatin and 0.1% Triton X-100 at room temperature for 1 h. After blocking, the slides were incubated with primary antibodies in the blocking buffer at 4 °C overnight. The slides were washed three times in PBS with 0.1% Triton X-100 (PBST) and incubated with secondary antibodies in the blocking buffer at room temperature for 2 h. The slides were then washed in PBST three times as described above, counterstained with DAPI and washed in PBS once more. The slides were mounted with coverslips using ProLong Gold (Invitrogen). Confocal tiled images were acquired on a Zeiss LSM900 microscope using a 20× air objective. Acquired images were processed using Imaris v.9.7 (Oxford Instruments) and ImageJ (v.1.54)62. The following antibodies were used: NR2F2 (Abcam, ab211777, 1:250) and LHX6 (Santa Cruz, sc-271433, 1:250).

  To evaluate the spatial proximity of cell types in each sample, we obtained a neighbourhood enrichment z-score using the nhood_enrichment function from Squidpy (v.1.2.3)63. The graph neural-network-based NCEM (v.0.1.4) method13 was used for intercellular communication modelling (Supplementary Table 7). A node-centric linear expression analysis was implemented to predict gene expression states from both cell-type annotations and the surrounding neighbourhood of each cell, where dependencies between sender and receiver cell types were constrained by the connectivity graph with a mean number of neighbours around 10 for each cell within each sample. One exception is that sample ARKFrozen-65-V1 was randomly downsampled to 60,000 cells to ensure that it has a similar neighbourhood size to other samples. Significant interactions were called if the magnitude of interactions (the Euclidean norm of coefficients in the node-centric linear expression interaction model) was above 0.5 and at least 25 differentially expressed genes (q < 0.05 for specific sender–receiver interaction terms) were detected. For visualization purposes, only significant interactions were plotted in circular plots.
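A neighbourhood enrichment z-score of the kind Squidpy's nhood_enrichment reports compares the observed number of spatial-graph edges between two cell types with a null distribution obtained by shuffling cell-type labels. The NumPy sketch below illustrates that permutation scheme on a toy chain graph; it is not Squidpy's implementation, and the edge list, permutation count and seed are arbitrary choices.

```python
import numpy as np


def edge_counts(edges, labels, n_types):
    """Symmetric matrix of edge counts between cell-type pairs."""
    m = np.zeros((n_types, n_types))
    for i, j in edges:
        m[labels[i], labels[j]] += 1
        m[labels[j], labels[i]] += 1
    return m


def nhood_enrichment_z(edges, labels, n_types, n_perms=200, seed=0):
    """z-score of observed edge counts against a label-permutation null."""
    rng = np.random.default_rng(seed)
    observed = edge_counts(edges, labels, n_types)
    null = np.stack([edge_counts(edges, rng.permutation(labels), n_types)
                     for _ in range(n_perms)])
    sd = null.std(axis=0)
    return (observed - null.mean(axis=0)) / np.where(sd > 0, sd, np.nan)


labels = np.array([0] * 10 + [1] * 10)   # types segregated along a chain
edges = [(i, i + 1) for i in range(19)]  # 20 cells in a line
z = nhood_enrichment_z(edges, labels, n_types=2)
print(z[0, 0] > 0, z[0, 1] < 0)  # True True
```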

  We implemented CellChat (v.1.6.1)14 to quantify the strength of interactions among cell types using the default parameter settings (Supplementary Table 8). After normalization, the batch-corrected gene expression data from all 232,328 nuclei were used as the CellChat input. We considered all curated ligand–receptor pairs from CellChatDB and used the ligands and receptors with higher expression in each cell type to compute the probability of cell-type-specific communication at the ligand–receptor pair level (see the original publication for details). We filtered out cell–cell communication if fewer than ten cells in the outgoing or incoming cell type expressed the ligand or receptor, respectively. The computed communication network was then summarized at the signalling pathway level and aggregated into a weighted directed graph by summarizing the communication probabilities. The calculated weights represent the total interaction strength between any two cell types. Statistically significant ligand–receptor communications between the two groups were determined by one-sided permutation tests, where P < 0.05 was considered significant.
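A one-sided permutation test asks how often a label-shuffled statistic is at least as extreme, in the hypothesized direction, as the observed one. The pure-Python sketch below applies that idea to a difference in mean communication probability between two groups; it is a generic illustration, not CellChat's internal procedure, and the input scores are made up.

```python
import random


def one_sided_permutation_p(group_a, group_b, n_perm=10_000, seed=0):
    """P value for H1: mean(group_a) > mean(group_b), by shuffling group labels."""
    def mean(xs):
        return sum(xs) / len(xs)

    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean(pooled[:n_a]) - mean(pooled[n_a:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps P > 0


high = [0.9, 0.8, 0.85, 0.95, 0.7]  # hypothetical communication probabilities
low = [0.1, 0.2, 0.15, 0.05, 0.3]
print(one_sided_permutation_p(high, low) < 0.05)  # True
```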

  Primary cortical tissue from GW16–24 was maintained in artificial cerebrospinal fluid (ACSF) containing 110 mM choline chloride, 2.5 mM KCl, 7 mM MgCl2, 0.5 mM CaCl2, 1.3 mM NaH2PO4, 25 mM NaHCO3, 10 mM D-(+)-glucose and 1× penicillin–streptomycin. Before use, ACSF was bubbled with 95% O2/5% CO2. Cortical tissue was embedded in 3.5% or 4% low-melting-point agarose gel. Embedded tissue was acutely sectioned at 300 μm thickness using a Leica VT1200 vibratome before being plated on Millicell inserts (Millipore, PICM03050) in six-well tissue culture plates. Tissue slices were cultured at the air–liquid interface in medium containing 32% HBSS, 60% basal medium Eagle, 5% FBS, 1% glucose, 1% N2 and 1× penicillin–streptomycin–glutamine. The slices were maintained for 12 h in culture at 37 °C for recovery. After recovery, the slices were grown in the presence of 1 μM octreotide (SelleckChem, P1017) or 4 μM (1R,1′S,3′R/1R,1′R,3′S)-L-054,264 (Tocris, 2444), or without any compound as a control. The slices were maintained for 72 h in culture at 37 °C, and the medium was changed every 24 h.

  The cultured slices treated with somatostatin receptor agonists were fixed using the Chromium Next GEM Single Cell Fixed RNA Sample Preparation Kit (10x Genomics, 1000414) according to the manufacturer's instructions. In brief, the slices were finely minced on a prechilled glass Petri dish, transferred into 1 ml fixation buffer, incubated at 4 °C for 18 h and stored at −80 °C with 10% enhancer and 10% glycerol. After all of the samples from six experimental batches had been collected, the stored samples were manually dissociated using Liberase TL (Sigma-Aldrich, 5401020001). Dissociated cells were counted using a haemocytometer and then processed for fixed scRNA-seq following the 10x Chromium Fixed RNA Profiling Reagent Kits (for Multiplexed Samples) user guide. In brief, fixed single-cell suspensions were mixed with Human WTA Probes BC001–BC016, hybridized overnight (18 h) at 42 °C, washed individually and pooled after washing. Gene expression libraries were pooled and sequenced on the NovaSeq X sequencing platform, targeting 20,000 read pairs per cell.

  The Cell Ranger multi pipeline was implemented for cell barcode calling, read alignment and quality assessment using the human probe set reference (Chromium_Human_Transcriptome_Probe_Set_v1.0.1_GRCh38-2020-A) according to the protocols described by 10x Genomics. The overall summary of data quality for each sample is listed in Supplementary Table 9. We next further assessed the data at the individual-cell level and retained high-quality cells with the number of detected genes (nFeature_RNA) greater than 500. Doublets were removed using the R package scDblFinder (v.1.18.0)64 with the default settings. Normalization and data scaling were performed using SCTransform v.2 (v.0.4.1)53. The transformed gene-by-cell data matrices for all cells passing quality control were integrated by reciprocal PCA projections between samples using 1–30 principal components. After integration, nearest-neighbour analysis was performed with 1–30 principal components. The resulting nearest-neighbour graph was used to perform UMAP embedding and clustering using the Louvain algorithm61. Clusters with fewer UMI counts and markers known to be mutually exclusive were deemed low quality and discarded. These filtering steps resulted in 132,856 cells in the final dataset (Supplementary Table 10). The identity of specific cell types was determined based on the expression of known marker genes, as is shown in Extended Data Fig. 8b.

  Pseudobulk differential gene expression analysis was performed using the pseudoBulkDGE function from the R package scran (v.1.32.0). UMI counts were aggregated by cell type, individual patient and treatment condition. Pseudobulk samples with fewer than 10 cells were discarded. Next, we fitted the pseudobulk count data to a fixed-effect limma-voom model (~patient_ID + treatment). Once the model was fitted, moderated t-tests were used to determine statistical significance through limma's standard pipeline (Supplementary Table 11). The resulting moderated t-statistics of each gene were ranked and used as the input for gene set enrichment analysis (GSEA) using the R package clusterProfiler65. GSEA was performed against gene sets defined by Gene Ontology biological process terms (Supplementary Table 12). Only gene sets containing between 10 and 500 genes were used for the analysis.
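Pseudobulk aggregation sums UMI counts over all cells that share a cell type, patient and treatment, and then drops groups with fewer than 10 cells. The pure-Python sketch below shows that bookkeeping using a hypothetical (cell_type, patient, treatment, counts) record per cell; the patient ID, treatment labels and gene counts are illustrative only.

```python
from collections import defaultdict


def pseudobulk(cells, min_cells=10):
    """Sum counts per (cell_type, patient, treatment); drop groups with < min_cells cells."""
    sums = defaultdict(lambda: defaultdict(int))
    n_cells = defaultdict(int)
    for cell_type, patient, treatment, counts in cells:
        key = (cell_type, patient, treatment)
        n_cells[key] += 1
        for gene, umi in counts.items():
            sums[key][gene] += umi
    return {key: dict(genes) for key, genes in sums.items() if n_cells[key] >= min_cells}


cells = ([("IN-MGE-SST", "p1", "octreotide", {"SST": 1, "GAD1": 2})] * 12
         + [("IN-MGE-SST", "p1", "control", {"SST": 3})] * 5)
bulk = pseudobulk(cells)
print(bulk)  # {('IN-MGE-SST', 'p1', 'octreotide'): {'SST': 12, 'GAD1': 24}}
```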

  We implemented the SCENIC+ (v0.1.dev448+g2c0bafd) workflow15 to build GRNs of the developing human neocortex based on the snMultiome data. Because running the workflow on all nuclei is memory intensive, we subsampled 10,000 representative nuclei by geometric sketching66 to accelerate the analyses while preserving rare cell states and the overall data structure. First, MACS2 was used for consensus peak calling in each cell type50. Each peak was extended by 250 bp in both directions from the summit. Next, weak peaks were removed, and the remaining peaks were summarized into a peak-by-nucleus matrix. Topic modelling was performed on the matrix by pycisTopic67 using the default parameters, and the optimal number of topics (48) was determined based on log-likelihood metrics. Three different methods were used in parallel to identify candidate enhancer regions: (1) regions of interest were selected by binarizing the topics using the Otsu method; (2) regions of interest were selected by taking the top 3,000 regions per topic; and (3) regions of interest were selected by calling differentially accessible peaks on the imputed matrix using a Wilcoxon rank-sum test (log[FC] > 0.5 and Benjamini–Hochberg-adjusted P < 0.05). Motif enrichment analyses based on pycistarget and differentially enriched motifs (DEM) were then implemented to determine whether the candidate enhancers were linked to a given TF68. Next, eRegulons, defined as TF–region–gene triplets consisting of a specific TF, all regions that are enriched for the TF-annotated motif and all genes linked to these regions, were determined by a wrapper function provided by SCENIC+ using the default settings.
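Otsu's method picks the cut that maximizes the between-class variance of a histogram, which is one way to binarize a topic's region probabilities into selected and unselected regions. The NumPy sketch below is a generic Otsu implementation applied to made-up bimodal scores; pycisTopic provides its own binarization, and the bin count here is an arbitrary choice.

```python
import numpy as np


def otsu_threshold(values: np.ndarray, n_bins: int = 256) -> float:
    """Return the cut that maximizes between-class variance of the histogram."""
    hist, edges = np.histogram(values, bins=n_bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)               # counts at or below each candidate cut
    w1 = w0[-1] - w0                   # counts above the cut
    s0 = np.cumsum(hist * centers)
    mu0 = s0 / np.where(w0 > 0, w0, 1)
    mu1 = (s0[-1] - s0) / np.where(w1 > 0, w1, 1)
    between = w0 * w1 * (mu0 - mu1) ** 2
    return float(edges[1:][np.argmax(between)])  # upper edge of the lower class


rng = np.random.default_rng(0)
low = rng.normal(0.1, 0.02, 500)   # most regions: low topic probability
high = rng.normal(0.8, 0.02, 50)   # a minority of topic-specific regions
scores = np.concatenate([low, high])
t = otsu_threshold(scores)
selected = scores >= t
print(selected.sum())  # 50
```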
We applied a standard eRegulon filtering procedure: (1) only eRegulons with more than ten target genes and positive region–gene relationships were retained; (2) only genes with top TF-to-gene importance scores were selected as the target genes for each eRegulon; and (3) eRegulons with an extended annotation were kept only if no direct annotation was available. After filtering, 582 eRegulons were retained (Supplementary Table 13). For each retained eRegulon, specificity scores were calculated using the RSS algorithm based on region- or gene-based eRegulon enrichment scores (AUC scores)69 (Supplementary Table 14). eRegulons with the top specificity scores in each cell type were selected for visualization. Finally, we extended our eRegulon enrichment analysis from the 10,000 sketched nuclei to all 232,328 nuclei by computing the gene-based AUC scores for all 582 eRegulons using the R package AUCell (v.1.20.2)18 with the default settings.
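The regulon specificity score (RSS) compares an eRegulon's enrichment-score distribution across cells with an indicator distribution for one cell type via the Jensen–Shannon divergence (JSD), with RSS = 1 − sqrt(JSD). The NumPy sketch below follows that definition with base-2 entropies, so scores lie between 0 and 1; the AUC values and labels are toy inputs, and the code is not the SCENIC+ implementation itself.

```python
import numpy as np


def _entropy(p: np.ndarray) -> float:
    """Shannon entropy in bits, ignoring zero entries."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def rss(auc: np.ndarray, labels, cell_type: str) -> float:
    """Regulon specificity score of one eRegulon for one cell type."""
    p = auc / auc.sum()                                  # enrichment as a distribution
    q = (np.asarray(labels) == cell_type).astype(float)
    q /= q.sum()                                         # cell-type indicator distribution
    m = (p + q) / 2
    jsd = _entropy(m) - (_entropy(p) + _entropy(q)) / 2
    return 1.0 - float(np.sqrt(max(jsd, 0.0)))


auc = np.array([0.9, 0.8, 0.1, 0.0])
labels = ["RG-oRG", "RG-oRG", "RG-vRG", "RG-vRG"]
print(rss(auc, labels, "RG-oRG") > rss(auc, labels, "RG-vRG"))  # True
```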

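  The predicted open chromatin regions (OCRs) regulated by the selected TFs in SCENIC+ were validated using ChIP–seq data described previously16. The data were downloaded from Synapse (https://www.synapse.org/Synapse:syn51942384.1/datasets). We focused on available data for core TFs of eRegulons with >10,000 ChIP–seq peaks, resulting in 24 datasets for further analysis. For each TF, the enrichment of eRegulon-targeted OCRs in ChIP–seq peaks relative to the genomic background was calculated as an odds ratio. P values were derived from two-sided Fisher's exact tests and corrected for multiple comparisons. The association of OCRs with their target genes was validated using long-range H3K4me3-mediated chromatin interactions captured by PLAC-seq17, considering overlaps with both interacting bins. Over-representation of OCR-to-gene interactions was tested using two-sided Fisher's exact tests.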
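For a 2 × 2 table (for example, eRegulon-targeted OCRs versus background regions, split by overlap with ChIP–seq peaks), the odds ratio and a two-sided Fisher's exact P value can be computed directly from hypergeometric probabilities. The pure-Python sketch below sums the probabilities of all tables no more likely than the observed one, which is the usual two-sided definition; the table layout is an illustrative assumption, and in practice a library routine such as scipy.stats.fisher_exact would normally be used.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Table [[a, b], [c, d]] -> (sample odds ratio, two-sided P value)."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    p_value = sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")  # zero denominator -> inf
    return odds_ratio, min(p_value, 1.0)


# Rows: eRegulon OCRs / background regions; columns: in ChIP-seq peak / not (toy counts).
odds, p = fisher_exact_2x2(3, 1, 1, 3)
print(odds, round(p, 3))  # odds ratio 9.0, P ≈ 0.486
```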

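  Cells belonging to the excitatory neuron lineage, including RG cells, IPC-ENs and glutamatergic neurons, were selected from the whole dataset for trajectory inference using Slingshot (v.2.6.0)21. A weighted nearest-neighbour graph was recomputed on the subset using 1–50 principal components and 2–40 LSI components. Dimensionality reduction was performed on the resulting nearest-neighbour graph, yielding an eight-dimensional UMAP embedding. After removing one outlier, we identified 23 clusters in this UMAP space using mclust70. Next, we determined the global lineage structure using a cluster-based minimum spanning tree (MST). The cluster containing RG-vRG was set as the starting cluster, and clusters containing terminally differentiated cells were set as ending clusters (Extended Data Fig. 11a). Subsequently, we fitted nine simultaneous principal curves to describe each of the nine lineages, with each cell's weight for a lineage determined by its projection distance to the curve representing that lineage. Pseudotime was inferred from the principal curves, with shrinkage applied to each branch for better convergence (Supplementary Table 16). Finally, the principal curves in the eight-dimensional UMAP space were projected onto a two-dimensional UMAP space for visualization.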
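The cluster-based MST that fixes the global lineage structure can be illustrated with Prim's algorithm over cluster centroids, grown from the starting cluster. The sketch below uses plain Euclidean distances between centroids, which is a simplification that may differ from Slingshot's default cluster distance, and the toy one-dimensional embedding and cluster names are hypothetical.

```python
import numpy as np


def cluster_mst(embedding: np.ndarray, clusters, start):
    """Prim's MST over cluster centroids, grown from `start`; returns (parent, child) edges."""
    clusters = np.asarray(clusters)
    ids = sorted(set(clusters.tolist()))
    centroids = np.array([embedding[clusters == c].mean(axis=0) for c in ids])
    dist = np.linalg.norm(centroids[:, None] - centroids[None], axis=-1)
    in_tree = {ids.index(start)}
    edges = []
    while len(in_tree) < len(ids):
        # Cheapest edge from the current tree to any cluster not yet in it.
        _, i, j = min((dist[i, j], i, j) for i in in_tree
                      for j in range(len(ids)) if j not in in_tree)
        edges.append((ids[i], ids[j]))
        in_tree.add(j)
    return edges


emb = np.array([[0.0], [0.1], [1.0], [1.1], [2.0], [2.1], [10.0], [10.1]])
cl = ["vRG", "vRG", "IPC", "IPC", "newborn", "newborn", "L6-CT", "L6-CT"]
print(cluster_mst(emb, cl, start="vRG"))
# [('vRG', 'IPC'), ('IPC', 'newborn'), ('newborn', 'L6-CT')]
```

Because edges are recorded as (parent, child) from the root outward, each lineage corresponds to the path from the starting cluster to one terminal cluster.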

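  To model eRegulon activity along the inferred trajectories, we fitted generalized additive models (GAMs) to the gene-based eRegulon AUC scores using tradeSeq (v.1.12.0)22. Because AUC scores can be regarded as proportion data in (0, 1), we fitted a beta GAM with six knots in tradeSeq instead of the default negative binomial GAM. Fitted values of the tradeSeq models were extracted using the predictSmooth function, with 100 data points along each trajectory. Because we focused on the excitatory neuron lineages for eRegulon analysis, the oRG and tRG trajectories were removed. Based on the fitted AUC values, six eRegulon modules were identified by k-means clustering (Supplementary Table 17a).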

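  A one-sided hypergeometric test implemented in clusterProfiler (v.4.0.5)65 was used to identify over-represented Gene Ontology (biological process) terms in each eRegulon module (Supplementary Table 17b). Genes present in at least 8% of the eRegulons in a module were considered core target genes of that module. Module-specific core target gene sets were used as input gene sets, and the union of the target genes of all eRegulons was used as the background.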
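The enrichment step combines two simple pieces: selecting a module's core targets (genes present in at least 8% of its eRegulons) and a one-sided hypergeometric test of their overlap with a GO term against the union-of-targets background. The pure-Python sketch below illustrates both; clusterProfiler performs the actual test, and the gene and eRegulon contents in the example are hypothetical.

```python
from collections import Counter
from math import comb


def core_targets(module_targets, min_frac=0.08):
    """Genes targeted by at least `min_frac` of the eRegulons in a module."""
    n = len(module_targets)
    counts = Counter(g for targets in module_targets for g in set(targets))
    return {g for g, c in counts.items() if c / n >= min_frac}


def hypergeom_upper_p(k, n_input, n_term, n_background):
    """P(X >= k) when n_input genes are drawn from n_background, n_term of which are in the term."""
    hits = range(k, min(n_input, n_term) + 1)
    numerator = sum(comb(n_term, x) * comb(n_background - n_term, n_input - x) for x in hits)
    return numerator / comb(n_background, n_input)


module = [{"NEUROD2", "TBR1"}, {"NEUROD2"}] + [{"SATB2"}] * 23  # 25 toy eRegulons
print(sorted(core_targets(module)))                # ['NEUROD2', 'SATB2']
print(hypergeom_upper_p(4, 4, 5, 20) == 5 / 4845)  # True
```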

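  To identify genes differentially expressed between common and V1-specific EN-L4-IT neurons, we first selected all EN-L4-IT nuclei and determined their subtype identity (common or V1-specific) based on markers and the region of origin in the original data (Extended Data Fig. 12a,b). We then aggregated counts by sample and subtype to generate pseudobulk samples. Differential gene expression analysis was performed by fitting the pseudobulk count data to a generalized linear mixed model (~subtype + log2[age] + [1 | dataset]) using the R package glmmSeq (v.0.5.5)71. Size factors and dispersions were estimated using the R package edgeR (v.3.42.4)72. Once the model was fitted, (~log2[age] + [1 | dataset]) was used as the reduced model to determine statistical significance. Genes with Benjamini–Hochberg-adjusted P < 0.05 were determined to be significant (Supplementary Table 18).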
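The Benjamini–Hochberg adjustment behind the P < 0.05 cut-offs can be written in a few lines: sort the P values, scale each by m/rank, and enforce monotonicity from the largest rank down. The pure-Python sketch below is a generic implementation, equivalent in intent to R's p.adjust(method = "BH"), applied to made-up P values.

```python
def benjamini_hochberg(pvalues):
    """Benjamini–Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.2]
adj = benjamini_hochberg(p)
print([q < 0.05 for q in adj])  # [True, False, False, False]
```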

  Based on the principal curves, five branch points (BPs) were identified along neuronal differentiation. To identify genes that diverge between lineages around a BP of the trajectory, we performed an earlyDETest using tradeSeq. Specifically, we first divided pseudotime into five consecutive segments (Extended Data Fig. 11g). We then compared the patterns of gene-based eRegulon AUCs along pseudotime between lineages by contrasting 12 equally spaced pseudotimes within the segments that enclose each BP (Supplementary Table 19). We used segments 2–3 for BP1, segments 3–4 for BP2, and segments 4–5 for BP3, BP4 and BP5.
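The BP-to-segment mapping and the 12 equally spaced pseudotimes contrasted within each window can be written out explicitly. In the sketch below, the segment boundaries are hypothetical placeholders (the actual segments are defined in Extended Data Fig. 11g), and the function only generates the comparison points; the statistical contrast itself was performed with tradeSeq's earlyDETest.

```python
# Segments enclosing each branch point, as listed in the text (1-based, inclusive).
BP_SEGMENTS = {"BP1": (2, 3), "BP2": (3, 4), "BP3": (4, 5), "BP4": (4, 5), "BP5": (4, 5)}


def contrast_pseudotimes(bp, segment_bounds, n_points=12):
    """Equally spaced pseudotimes across the segments that enclose a branch point."""
    first, last = BP_SEGMENTS[bp]
    start, end = segment_bounds[first - 1][0], segment_bounds[last - 1][1]
    step = (end - start) / (n_points - 1)
    return [start + i * step for i in range(n_points)]


# Hypothetical boundaries: five consecutive segments evenly covering pseudotime [0, 1].
bounds = [(0.0, 0.2), (0.2, 0.4), (0.4, 0.6), (0.6, 0.8), (0.8, 1.0)]
points = contrast_pseudotimes("BP1", bounds)
print(len(points), points[0], round(points[-1], 6))  # 12 0.2 0.6
```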

  Glial progenitor cells were isolated from GW20–24 human dorsal cortical tissue samples. The VZ/iSVZ and oSVZ were dissected and dissociated using the Papain Dissociation System (Worthington Biochemical). Dissociated cells were layered onto undiluted papain inhibitor solution (Worthington Biochemical) and centrifuged at 70g for 6 min to eliminate debris. The cell pellet was resuspended in 10 ml complete culture medium (DMEM/F12, 2 mM GlutaMAX, 2% B27 without vitamin A, 1% N2 and 1× penicillin–streptomycin) and incubated at 37 °C for 3 h for surface-antigen recovery. From this point on, cells were handled on ice or at 4 °C. Cells were washed once with staining buffer (Hank’s balanced salt solution (HBSS) without Ca2+ and Mg2+, 10 mM HEPES pH 7.4, 1% BSA, 1 mM EDTA, 2% B27 without vitamin A, 1% N2 and 1× penicillin–streptomycin), centrifuged at 300g for 5 min and resuspended in staining buffer to a density of 1 × 10⁸ cells per ml. Cells were blocked with FcR blocking reagent (Miltenyi Biotec, 1:20) for 10 min, followed by antibody incubation for 30 min. Antibodies used for fluorescence-activated cell sorting (FACS) included FITC anti-EGFR (Abcam, ab11400), PE anti-F3 (BioLegend, 365204), PerCP-Cy5.5 anti-CD38 (BD Biosciences, 551400), Alexa Fluor 647 anti-PDGFRA (BD Biosciences, 562798) and PE-Cy7 anti-ITGA2 (BioLegend, 359314). All antibodies were used at 1:20 dilution. After incubation, cells were washed twice in staining buffer, resuspended in staining buffer containing Sytox Blue (Invitrogen) and sorted using a BD FACSAria II sorter. Cells were sorted into collection buffer (HBSS without Ca2+ and Mg2+, 10 mM HEPES pH 7.4, 5% BSA, 2% B27 without vitamin A, 1% N2 and 1× penicillin–streptomycin). After sorting, cells were centrifuged at 300g for 5 min, resuspended in complete culture medium and plated onto glass coverslips pre-coated with poly--lysine and laminin at a density of 2.5 × 10⁴ cells per cm².
Cells were cultured in a humidified incubator with 5% CO2 and 8% O2. Half of the medium was changed with fresh medium every 3–4 days until collection at the indicated time.

  On DIV0 and DIV14, glial progenitors or their progeny were fixed with 4% formaldehyde/4% sucrose in PBS and permeabilized and blocked with PBS-based blocking buffer containing 10% donkey serum, 0.2% gelatin and 0.1% Triton X-100 at room temperature for 1 h. The samples were then incubated with primary antibodies diluted in the blocking buffer at 4 °C overnight. The next day, the samples were washed in PBS three times and incubated with secondary antibodies in the blocking buffer at room temperature for 1 h. Samples were then washed twice in PBS, counterstained with DAPI and washed in PBS again. z-stack images were acquired using a Leica TCS SP8 with a 25× water-immersion objective. Acquired images were processed using Imaris v.9.7 (Oxford Instruments) and ImageJ (v.1.54)62. The following antibodies were used: TFAP2C (R&D Systems, AF5059, 1:50), CRYAB (Abcam, ab13496, 1:200), OLIG2 (Abcam, ab109186, 1:150), EGFR (Abcam, ab231, 1:200), SPARCL1 (R&D Systems, AF2728, 1:50), DLX5 (Sigma-Aldrich, HPA005670, 1:100) and NeuN (EMD Millipore, ABN90, 1:250).

  Glial progenitors were either immediately subjected to scRNA-seq or cultured in vitro for 7 or 14 days before scRNA-seq. In the latter cases, cells were released using the Papain Dissociation System (Worthington Biochemical) without DNase for 20 min. Released cells were washed twice in HBSS without Ca2+ and Mg2+ supplemented with 0.04% BSA, centrifuged at 250g for 5 min and resuspended in HBSS without Ca2+ and Mg2+ supplemented with 0.04% BSA. Cells were counted using a haemocytometer, diluted to ~1,000 cells per μl and further processed according to the 10x Genomics Chromium Single Cell 3′ Reagent Kits User Guide (v3.1 Chemistry). We targeted 10,000 cells per sample per reaction. Libraries from individual samples were pooled and sequenced on the NovaSeq 6000 sequencing system, targeting 22,500 read pairs per cell.

  The raw sequencing signals in BCL format were demultiplexed into fastq format using the mkfastq function in the Cell Ranger suite (v.7.1.0, 10x Genomics). The Cell Ranger count pipeline was implemented for cell barcode calling, read alignment and quality assessment using the human reference genome (GRCh38, GENCODE v32/Ensembl98) according to the protocols described by 10x Genomics. The pipeline assessed the overall quality to retain all intact cells from the background and filtered out non-cell-associated reads. All gene expression libraries in this study showed a high fraction of reads in cells, indicating high RNA content in called cells and minimal levels of ambient RNA. The overall summary of data quality for each sample is listed in Supplementary Table 20. Next, we further assessed the data at the individual-cell level and retained high-quality cells that met the following criteria: (1) the number of detected genes (nFeature_RNA) is greater than 1,000 and less than 10,000; and (2) less than 10% of all reads mapped to mitochondrial genes. Raw counts were log-normalized with a size factor of 10,000. The first 30 principal components were used to construct the nearest-neighbour graph, and Louvain clustering was used to identify clusters. Clusters with significantly fewer UMI counts, probably consisting of low-quality, dying cells, were also excluded from further analysis. The identity of specific cell types was determined based on the expression of known marker genes (Extended Data Fig. 15e and Supplementary Table 21). The ten identified cell types were dividing cells (dividing), RGs, ependymal cells, IPC-EN, tripotential intermediate progenitor cells (Tri-IPC), astrocytes, OPCs, intermediate progenitor cells for INs (IPC-IN) and INs.
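The cell filters and normalization described here can be sketched on a dense matrix: keep cells with more than 1,000 and fewer than 10,000 detected genes and less than 10% mitochondrial reads, then log-normalize each cell to 10,000 total counts (natural log of 1 plus the scaled count, as in Seurat's LogNormalize). The NumPy sketch below uses a toy matrix and a hypothetical mitochondrial-gene mask.

```python
import numpy as np


def filter_and_lognormalize(counts, is_mito, scale=1e4):
    """counts: cells x genes raw UMIs; is_mito: boolean mask over genes."""
    n_genes = (counts > 0).sum(axis=1)
    mito_frac = counts[:, is_mito].sum(axis=1) / counts.sum(axis=1)
    keep = (n_genes > 1_000) & (n_genes < 10_000) & (mito_frac < 0.10)
    kept = counts[keep]
    normalized = np.log1p(kept / kept.sum(axis=1, keepdims=True) * scale)
    return keep, normalized


counts = np.ones((3, 1_200))
counts[1, :10] = 1_000          # cell 1: dominated by mitochondrial reads
counts[2, 500:] = 0             # cell 2: only 500 detected genes
is_mito = np.zeros(1_200, dtype=bool)
is_mito[:10] = True             # hypothetical mitochondrial genes
keep, norm = filter_and_lognormalize(counts, is_mito)
print(keep.tolist(), norm.shape)  # [True, False, False] (1, 1200)
```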

  To determine the similarity between glial-progenitor-derived cells and our atlas data, we applied SingleCellNet (v.0.1.0), a random-forest-based cell-type classification method35. Specifically, we randomly selected 700 cells from each cell type as the training set. We identified the top 60 most differentially expressed genes per cell type and then ranked the top 150 gene pairs per cell type from those genes. The preprocessed training data were then transformed according to the selected gene pairs and used to build a multi-class classifier of 1,000 trees. Moreover, we created 400 randomized cell expression profiles to train an ‘unknown’ category in the classifier. After the classifier was built, we selected 165 cells from each cell type from the held-out data, along with another 165 randomized cells, and assessed the performance of the classifier on the held-out data using precision–recall curves, obtaining an average AUPRC of 0.827. To classify glial-progenitor-derived cells, we transformed the query data with the top pairs selected from the optimized training data and classified them with the trained classifier. Here we chose a classification score threshold of 0.2, and cells with scores below this threshold were assigned as unmapped.
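SingleCellNet's top-pair transformation turns each selected gene pair into a binary feature (1 if the first gene is expressed more highly than the second in a cell), which is intended to make the features less sensitive to normalization differences between training and query data. The NumPy sketch below shows that transform and the score-threshold assignment of unmapped cells; gene names, scores and class labels are toy values, and the random-forest training itself is omitted.

```python
import numpy as np


def pair_transform(expr, gene_index, pairs):
    """expr: cells x genes. Each (gene_a, gene_b) pair -> 1 if gene_a > gene_b in a cell."""
    a = [gene_index[g] for g, _ in pairs]
    b = [gene_index[g] for _, g in pairs]
    return (expr[:, a] > expr[:, b]).astype(np.int8)


def assign_labels(scores, classes, threshold):
    """Top-scoring class per cell, or 'unmapped' if its score is below threshold."""
    best = scores.argmax(axis=1)
    return [classes[j] if scores[i, j] >= threshold else "unmapped"
            for i, j in enumerate(best)]


expr = np.array([[5.0, 1.0, 0.0], [0.0, 2.0, 3.0]])
features = pair_transform(expr, {"SPARCL1": 0, "OLIG2": 1, "DLX5": 2},
                          [("SPARCL1", "OLIG2"), ("DLX5", "OLIG2")])
labels = assign_labels(np.array([[0.9, 0.1], [0.15, 0.1]]), ["astrocyte", "OPC"], 0.2)
print(features.tolist(), labels)  # [[1, 0], [0, 1]] ['astrocyte', 'unmapped']
```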

  For clonal analysis, samples for FACS were processed as above with the following changes: individual tRG, oRG or Tri-IPC cells were sorted using the BigFoot Spectral Cell Sorter (Thermo Fisher Scientific) using single-cell precision mode into a single well of 96-well glass-bottom plates precoated with polyethylenimine and laminin containing 100 μl complete culture medium. For tRGs and oRGs, the complete culture medium was supplemented with 10 ng ml−1 FGF2 to promote initial cell survival and proliferation. The culture medium was changed weekly for a total of 2 weeks. After 2 weeks, cells were fixed and stained in the same way as mentioned above. The following antibodies were used: EOMES (Abcam, ab23345, 1:200), OLIG2 (EMD Millipore, MABN50, 1:200), EGFR (Abcam, ab231, 1:200), SPARCL1 (R&D systems, AF2728, 1:50), SOX10 (Santa Cruz, sc-365692, 1:50) and DLX5 (Sigma-Aldrich, HPA005670, 1:100).

  Glial progenitors were isolated from GW20–24 primary cortical tissue by FACS, as described above. About 200,000 cells were centrifuged at 300g for 5 min and resuspended in 0.5 ml complete culture medium containing 1 × 10⁷ plaque-forming units of CMV-GFP adenoviruses (Vector Biolabs). Next, cells were incubated in a low-attachment plate for 1 h under normal culture conditions. After infection, cells were washed twice with complete culture medium containing 0.3% BSA and resuspended in slice culture medium. About 25,000 cells were transplanted onto the oSVZ of freshly prepared slices using a pipette. The slices were maintained for 8 days in culture at 37 °C, and the medium was changed every other day.
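For orientation, the virus and cell amounts used for infection correspond to an approximate multiplicity of infection (MOI). This figure is derived here from the stated numbers rather than reported in the text, and it is approximate because the cell number is given as about 200,000.

```latex
\mathrm{MOI} \approx \frac{1 \times 10^{7}\ \text{PFU}}{2 \times 10^{5}\ \text{cells}} = 50\ \text{PFU per cell}
```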

  After 8 days in culture, the slices were fixed with 4% formaldehyde in PBS at room temperature for 1 h, followed by permeabilization and blocking with PBS-based blocking buffer containing 10% donkey serum, 0.2% gelatin and 1% Triton X-100 at room temperature for 1 h. The samples were then incubated with primary antibodies diluted in the blocking buffer at 4 °C for 48 h. The samples were then washed four times in PBS plus 0.1% Triton X-100 and incubated with secondary antibodies in the blocking buffer at 4 °C for 24 h. After secondary antibody incubation, the samples were washed twice in PBS plus 0.1% Triton X-100, counterstained with DAPI and washed in PBS again. z-stack images were acquired on a Leica TCS SP8 system using a 25× water-immersion objective. Acquired images were processed using Imaris v.9.7 (Oxford Instruments) and ImageJ (v.1.54)62. The following antibodies were used: GFP (Aves Labs, GFP-1020, 1:1,000), EOMES (Abcam, ab23345, 1:200), NeuN (EMD Millipore, ABN90, 1:250), OLIG2 (EMD Millipore, MABN50, 1:200), EGFR (Abcam, ab32077, 1:200), DLX5 (Sigma-Aldrich, HPA005670, 1:100) and SPARCL1 (R&D Systems, AF2728, 1:50).

  FACS-sorted Tri-IPCs (60,000 cells) were centrifuged and resuspended in Leibovitz’s L-15 medium with DNase I (180 μg ml−1). Immediately before transplantation, cells were further concentrated by centrifugation (4 min, 800g) and resuspended in 2 μl Leibovitz’s L-15 with DNase I. The cell suspension was loaded into bevelled glass micropipettes (about 70–90 μm in diameter, Wiretrol 5 μl, Drummond Scientific) prefilled with mineral oil and mounted onto a microinjector. Recipient mice (NSG, JAX 005557, postnatal day 5) were anaesthetized by hypothermia (about 4 min) and positioned in a clay head mould to stabilize the skull73. Micropipettes were positioned vertically in a stereotactic injection apparatus. Injections were performed in both the left and right hemispheres perpendicular to the skin surface. Eye coordinates were x: 1.5, y: 3.6. A total of 50 nl of cell suspension was released at z: 0.2, 0.4, 0.8 and 1 from the surface of the skin. The mice were returned to their litters after injection.

  Twelve weeks after injection, the recipient mice were perfused with 4% PFA and post-fixed in 4% PFA at 4 °C overnight. The samples were cryoprotected in 15% and 30% sucrose in PBS and frozen in OCT. The samples were sectioned at a thickness of 16 µm, air-dried and rehydrated in PBS. Immunostaining was done in the same way as described above for human brain sections. Confocal images were acquired with a Leica TCS SP8 using a 20× oil-immersion objective. Acquired images were processed using ImageJ (v.1.54)62. The following antibodies were used: human nuclear antigen (Abcam, ab191181, 1:200), GABA (Sigma-Aldrich, A2052, 1:250), GFAP (Invitrogen, 13-0300, 1:300) and SOX10 (R&D Systems, AF2864, 1:50).

  Human ganglionic eminence scRNA-seq data from a previous study33 were downloaded from the GEO (GSE135827) and used as the reference. We integrated all samples using the RPCA method, subset the data to cells from the ganglionic eminence, reclustered these cells and annotated IN subtypes based on marker genes reported in the literature34 (Extended Data Fig. 17a,b).
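
  Marker-based annotation of reclustered cells can be sketched as follows: score each cluster by the average expression of each candidate marker set, then keep the best-scoring label. This is a simplified illustration only. The marker sets below are hypothetical examples, not the markers from the cited literature. Practical pipelines also usually scale expression or inspect marker plots rather than relying on raw cluster means.

```python
from statistics import mean

# Hypothetical marker sets for illustration; not the markers used in the study.
MARKERS = {
    "MGE-derived IN": ["LHX6", "SST"],
    "CGE-derived IN": ["NR2F2", "SP8"],
}


def annotate_clusters(cluster_means: dict, markers: dict = MARKERS) -> dict:
    """Map each cluster to the label whose marker genes have the highest
    average expression in that cluster (genes absent from a cluster count as 0)."""
    labels = {}
    for cluster, expr in cluster_means.items():
        scores = {label: mean(expr.get(g, 0.0) for g in genes) for label, genes in markers.items()}
        labels[cluster] = max(scores, key=scores.get)
    return labels


if __name__ == "__main__":
    # Toy per-cluster mean expression values.
    clusters = {
        "0": {"LHX6": 2.0, "SST": 1.5, "NR2F2": 0.1},
        "1": {"NR2F2": 1.8, "SP8": 1.2, "LHX6": 0.2},
    }
    print(annotate_clusters(clusters))
```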

  To determine the identity of Tri-IPC-derived INs against the reference dataset, we applied SingleCellNet as described above, with the following parameter modifications. We randomly selected 400 cells from each cell type as the training set. We identified the 200 most differentially expressed genes per cell type and then ranked gene pairs from those genes, keeping the top 200 pairs per cell type. The preprocessed training data were transformed according to the selected gene pairs and used to build a multi-class classifier of 1,000 trees. We also generated 400 randomized cell expression profiles to train an 'unknown' category in the classifier. After the classifier was built, we selected 100 cells per cell type from the held-out data, along with 100 additional randomized cells, and assessed classifier performance on these held-out data using precision-recall curves, obtaining an average AUPRC of 0.901. To classify Tri-IPC-derived INs, we transformed the query data using the top pairs selected from the optimized training data and classified them with the trained classifier. We used a classification score threshold of 0.35; cells with scores below this threshold were assigned as unmapped.
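
  Two pieces of the SingleCellNet workflow described above are easy to illustrate. The first is the top-pair transformation, which turns each cell's profile into binary features recording whether gene a is expressed above gene b. The second is the final thresholded assignment. The sketch below is a simplified, stdlib-only illustration under stated assumptions, not SingleCellNet's code. The gene pairs, class names and scores are hypothetical, and the random forest itself is omitted. The within-cell shuffle used to build randomized 'unknown' profiles is one simple choice and may differ from the package's own randomization.

```python
import random


def top_pair_transform(expr: dict, pairs: list) -> list:
    """Binary features: 1 if gene a is expressed above gene b in this cell, else 0.
    Genes missing from `expr` are treated as 0."""
    return [1 if expr.get(a, 0.0) > expr.get(b, 0.0) else 0 for a, b in pairs]


def randomize_profile(expr: dict, rng: random.Random) -> dict:
    """Shuffle a cell's values across genes to create one 'unknown' training profile
    (assumed scheme; SingleCellNet's own randomization may differ)."""
    values = list(expr.values())
    rng.shuffle(values)
    return dict(zip(expr.keys(), values))


def assign_label(scores: dict, threshold: float = 0.35) -> str:
    """Return the top-scoring class, or 'unmapped' if its score is below threshold."""
    label, best = max(scores.items(), key=lambda kv: kv[1])
    return label if best >= threshold else "unmapped"


if __name__ == "__main__":
    cell = {"DLX5": 3.1, "EOMES": 0.2, "OLIG2": 1.4, "EGFR": 2.0}
    pairs = [("DLX5", "EOMES"), ("OLIG2", "EGFR"), ("EGFR", "EOMES")]  # hypothetical pairs
    print(top_pair_transform(cell, pairs))                           # [1, 0, 1]
    print(randomize_profile(cell, random.Random(0)))
    print(assign_label({"IN": 0.62, "OPC": 0.21, "unknown": 0.17}))  # IN
    print(assign_label({"IN": 0.30, "OPC": 0.28, "unknown": 0.25}))  # unmapped
```

Because the pair features depend only on the relative order of two genes within a cell, they are insensitive to per-cell scaling. This is one reason top-pair features are used to compare query data with a reference produced on a different platform.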

  As an alternative classification method to determine the identity of Tri-IPC-derived INs, we performed mutual nearest-neighbour-based label transfer using the MapQuery() function in Seurat v.4. The first 30 principal components were used to identify transfer anchors. Cell-type labels from ref. 33 were transferred to Tri-IPC-derived INs when confidence was high (prediction score >0.5). Cells with prediction scores of 0.5 or lower were labelled as unmapped.
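
  The idea behind mutual nearest-neighbour anchors can be shown on a toy two-dimensional embedding. A reference cell and a query cell form an anchor when each is among the other's k nearest neighbours, and query cells then inherit labels from nearby anchors. This is a deliberately simplified illustration, not Seurat's implementation: Seurat works in a projected PCA space and additionally scores, filters and weights anchors. The parameters `k` and `k_weight`, the unweighted vote and the toy data are assumptions. Only the >0.5 cut-off comes from the text.

```python
import math


def knn(point, others, k):
    """Indices of the k rows of `others` nearest to `point` (Euclidean)."""
    order = sorted(range(len(others)), key=lambda i: math.dist(point, others[i]))
    return set(order[:k])


def find_anchors(ref, query, k=5):
    """Mutual nearest-neighbour pairs (ref_index, query_index)."""
    ref_nn = [knn(q, ref, k) for q in query]    # each query cell's nearest reference cells
    query_nn = [knn(r, query, k) for r in ref]  # each reference cell's nearest query cells
    return [(i, j) for j, nbrs in enumerate(ref_nn) for i in nbrs if j in query_nn[i]]


def transfer_labels(ref, ref_labels, query, k=5, k_weight=3, min_score=0.5):
    """Unweighted vote over the k_weight anchors closest to each query cell; the
    prediction score is the fraction of those anchors carrying the winning label."""
    anchors = find_anchors(ref, query, k)
    results = []
    for q in query:
        nearest = sorted(anchors, key=lambda a: math.dist(q, query[a[1]]))[:k_weight]
        votes = {}
        for i, _ in nearest:
            votes[ref_labels[i]] = votes.get(ref_labels[i], 0) + 1
        if not votes:
            results.append(("unmapped", 0.0))
            continue
        label, n = max(votes.items(), key=lambda kv: kv[1])
        score = n / len(nearest)
        results.append((label if score > min_score else "unmapped", score))
    return results


if __name__ == "__main__":
    ref = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    ref_labels = ["MGE"] * 3 + ["CGE"] * 3  # hypothetical labels
    query = [(0.2, 0.3), (10.3, 10.2), (0.1, 0.8)]
    print(transfer_labels(ref, ref_labels, query, k=2, k_weight=2))
```

Requiring the neighbour relation in both directions is what keeps query cells with no counterpart in the reference from forming anchors. A cell type absent from the reference then tends to receive low or split votes and falls below the cut-off.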

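  Mouse scRNA-seq data from ref. 36 were downloaded from the Single Cell Portal (SCP1290) and used as the reference. We subset the data to focus on astrocytes and cycling glial cells (as defined by the original authors). These cells were reclustered and annotated as belonging to the Olig2 or S100a11 lineage based on marker genes reported in the literature74 (Extended Data Fig. 17e,f). We used Tri-IPC-derived astrocytes as the query data and applied SingleCellNet in the same way as for Tri-IPC-derived INs. We also applied Seurat label transfer in the same way, but used 20 principal components to identify transfer anchors.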

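  Because we were able to distinguish the two astrocyte lineages in infancy, we also used infant astrocytes from our snMultiome data as a reference. We selected astrocytes in infancy from the whole dataset and used principal components 1–50 (already computed after SCTransform and RPCA integration) to build a nearest-neighbour graph. The cells were reclustered on the resulting graph and annotated according to marker genes reported in the literature74 (Extended Data Fig. 17i,j). We used Tri-IPC-derived astrocytes as the query data, reprocessed in the same way as the snMultiome data, including SCTransform v.2 modelling and cell-cycle regression. SingleCellNet was applied as described above. For Seurat label transfer, the first 50 principal components were used to identify transfer anchors.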
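
  Cell-cycle regression replaces each gene's values with the residuals of a least-squares fit on per-cell cell-cycle scores. The following is a minimal sketch of that linear step, assuming one S-phase score and one G2/M score per cell. Seurat's SCTransform performs this regression internally (through its `vars.to.regress` argument), so the sketch illustrates the arithmetic rather than SCTransform itself. The helper names are hypothetical.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def regress_out(values, covariates):
    """Residuals of `values` after least-squares regression on an intercept plus
    each covariate (e.g. per-cell S and G2/M scores)."""
    n = len(values)
    basis = []
    for col in [[1.0] * n] + [list(c) for c in covariates]:
        v = list(col)
        for q in basis:                      # Gram-Schmidt against earlier columns
            d = _dot(v, q)
            v = [vi - d * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(_dot(v, v))
        if norm > 1e-12:                     # skip covariates collinear with earlier ones
            basis.append([vi / norm for vi in v])
    resid = list(values)
    for q in basis:                          # remove the projection onto the covariate span
        d = _dot(resid, q)
        resid = [ri - d * qi for ri, qi in zip(resid, q)]
    return resid


if __name__ == "__main__":
    s_score = [0.1, 0.5, -0.2, 0.8, 0.3]      # toy per-cell S scores
    g2m_score = [0.0, -0.4, 0.6, 0.2, -0.1]   # toy per-cell G2/M scores
    gene = [1.0, 4.0, 2.0, 8.0, 5.0]          # toy expression of one gene
    print([round(r, 3) for r in regress_out(gene, [s_score, g2m_score])])
```

The residuals sum to zero and are uncorrelated with both scores. Any variation in a gene that tracks the cell-cycle phase therefore no longer drives clustering or anchor finding.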

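  We obtained single-cell and single-nucleus RNA-seq data of human GBM cells from the extended GBmap75, downloaded from CELLxGENE (https://datasets.cellxgene.cziscience.com/ead761be-309f-4b79-8208-41da14ca305f.h5ad). Using the snMultiome atlas data as the reference, we applied SingleCellNet to identify the corresponding cell types of malignant cells in GBmap. SingleCellNet was run with the same parameters previously applied to classify glial progenitor-derived cells. Our analysis yielded an average AUPRC of 0.832. For classification, we set the score threshold to 0.15; cells with scores below this threshold were assigned as unmapped.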
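
  The AUPRC values reported for held-out cells summarize one-vs-rest precision-recall curves. One standard estimator is average precision, the mean of the precision values at the rank of each true positive. The sketch below computes it per class and averages across classes. SingleCellNet's own assessment may integrate the curve differently, so this sketch illustrates the metric and does not reproduce the reported numbers. The function names and toy scores are hypothetical.

```python
def average_precision(scores: list, is_positive: list) -> float:
    """Average precision: mean precision at the rank of each true positive,
    with cells ranked by decreasing score (ties keep input order)."""
    ranked = sorted(zip(scores, is_positive), key=lambda t: t[0], reverse=True)
    n_pos = sum(1 for p in is_positive if p)
    if n_pos == 0:
        return float("nan")  # undefined for a class with no positives
    hits, total = 0, 0.0
    for rank, (_, positive) in enumerate(ranked, start=1):
        if positive:
            hits += 1
            total += hits / rank
    return total / n_pos


def mean_auprc(class_scores: dict, true_labels: list) -> float:
    """One-vs-rest average precision per class, averaged across classes."""
    aps = [average_precision(s, [t == c for t in true_labels]) for c, s in class_scores.items()]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    print(average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, False]))  # (1 + 2/3) / 2
    scores = {"A": [0.9, 0.1, 0.8], "B": [0.1, 0.9, 0.2]}  # toy classifier scores per class
    print(mean_auprc(scores, ["A", "B", "A"]))  # 1.0
```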

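  We implemented SCAVENGE (v.1.0.2)40 to integrate the single-nucleus ATAC-seq portion of the snMultiome data with GWAS data for four cognitive traits (fluid intelligence, processing speed, executive function and working memory) and five neuropsychiatric disorders (ASD, MDD, BPD, ADHD and SCZ). Alzheimer's disease was analysed as a positive control. For each trait or condition, we performed multi-SNP-based conditional and joint association analysis on all GWAS SNPs with default settings. A stepwise model selection procedure was used to select independently associated SNPs and to compute fine-mapping posterior probabilities (PPs). The PPs were then imported into our gchromVAR analysis, for which we constructed a peak-by-cell count matrix from the integrated single-nucleus ATAC-seq data. After correcting for GC bias, we computed a gchromVAR score for each cell, quantifying the enrichment of potential GWAS signals relative to a set of background peaks. To minimize batch effects, we used the batch-corrected LSI matrix for nearest-neighbour graph construction and subsequent network propagation. A trait relevance score (TRS), representing the underlying GWAS risk association, was assigned to each cell to construct single-cell risk maps of cognitive traits and neurological disorders. To identify significant trait–cell associations, we took the cells with the top 0.1% of TRS values for each trait and permuted the network propagation 1,000 times to assess statistical significance. Cells with P < 0.05 were defined as trait associated. To determine the trait relevance per cell type, we calculated the odds ratio of cells associated with each trait in each cell type over the background and determined statistical significance using a two-sided hypergeometric test followed by Benjamini–Hochberg correction. Cell types with FDR-adjusted P < 0.05 and odds ratio >1.4 were considered significantly enriched for trait-associated variants. Similar analyses were performed for regions and age groups. Finally, the results were standardized by z-transformation for comparison and visualization (Supplementary Tables 23 and 24). The GWAS data used in this study can be downloaded from the following links. Fluid intelligence (phenocode 20016), processing speed (phenocode 20023), executive function (phenocode 399) and working memory (phenocode 4282): https://pan.ukbb.broadinstitute.org/downloads/; ASD: https://figshare.com/articles/dataset/asd2019/14671989; MDD: https://datashare.ed.ac.uk/handle/10283/3203; BPD: https://figshare.com/articles/dataset/bip2021_noukbb/22564402; ADHD: https://figshare.com/articles/dataset/adhd2022/22564390; SCZ: https://figshare.com/articles/dataset/cdg2018-bip-scz/14672019; Alzheimer's disease: https://vu.data.surfsara.nl/index.php/s/jvlyt1m9bb2maki/download?path=%2f&files=pgcalz2sumstatsatexcluding23andme.txt.gz.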
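
  The cell-type enrichment test described above can be sketched from a 2×2 table of trait-associated versus other cells, inside versus outside a cell type. The sketch below makes two assumptions. First, the background is taken to be all cells outside the cell type. Second, the two-sided hypergeometric P value is computed as the sum of the probabilities of all tables no more likely than the observed one, which is the convention of Fisher's exact test. The function names are hypothetical, and the authors' implementation may differ in these details. Only the FDR < 0.05 and odds ratio > 1.4 cut-offs come from the text.

```python
import math


def _log_comb(n: int, k: int) -> float:
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_pmf(x: int, n_total: int, n_assoc: int, n_type: int) -> float:
    """P(x trait-associated cells in a cell type of size n_type), drawing without replacement."""
    return math.exp(_log_comb(n_assoc, x) + _log_comb(n_total - n_assoc, n_type - x)
                    - _log_comb(n_total, n_type))


def two_sided_p(k: int, n_total: int, n_assoc: int, n_type: int) -> float:
    """Sum of probabilities of all outcomes no more likely than the observed one."""
    lo, hi = max(0, n_type - (n_total - n_assoc)), min(n_type, n_assoc)
    p_obs = hypergeom_pmf(k, n_total, n_assoc, n_type)
    p = 0.0
    for x in range(lo, hi + 1):
        px = hypergeom_pmf(x, n_total, n_assoc, n_type)
        if px <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            p += px
    return min(1.0, p)


def benjamini_hochberg(pvals: list) -> list:
    """Benjamini-Hochberg adjusted P values (step-up, monotone, capped at 1)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted


def celltype_enrichment(n_total, n_assoc, per_type, max_fdr=0.05, min_or=1.4):
    """per_type: {cell_type: (cells_in_type, assoc_cells_in_type)}.
    Returns {cell_type: (odds_ratio, fdr, enriched)}; background = cells outside the type."""
    names, ors, pvals = [], [], []
    for name, (n_type, k) in per_type.items():
        a, b = k, n_type - k                                   # inside type: associated / not
        c, d = n_assoc - k, n_total - n_type - (n_assoc - k)  # outside type: associated / not
        ors.append(a * d / (b * c) if b * c > 0 else math.inf)
        pvals.append(two_sided_p(k, n_total, n_assoc, n_type))
        names.append(name)
    fdr = benjamini_hochberg(pvals)
    return {nm: (o, q, q < max_fdr and o > min_or) for nm, o, q in zip(names, ors, fdr)}


if __name__ == "__main__":
    # Toy numbers: 8 cells, 4 trait associated, 3 of them among the 4 cells of type X.
    print(celltype_enrichment(8, 4, {"X": (4, 3)}))  # OR = 9.0, P = 34/70, not enriched
```

Log-gamma terms keep the probabilities computable for atlas-scale cell counts, where the exact binomial coefficients would be very large integers.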

  Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
