Translational genomics of osteoarthritis in 1,962,069 individuals

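　　We conducted a GWAS meta-analysis combining summary statistics from up to 87 GWASs across 11 osteoarthritis phenotypes: osteoarthritis at any site, hip osteoarthritis, knee osteoarthritis, hip and/or knee osteoarthritis, spine osteoarthritis, hand osteoarthritis, finger osteoarthritis, thumb osteoarthritis, total hip replacement (THR), total knee replacement (TKR) and total hip and/or knee replacement (TJR) (Supplementary Tables 1 and 2 and Supplementary Note).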

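　　To assess the classification accuracy of self-reported disease status, we performed a sensitivity analysis for osteoarthritis at any site that excluded the 27 GWASs containing self-reported osteoarthritis. We further extended the analysis by excluding individuals with self-reported disease status (Supplementary Figs. 5 and 6 and Supplementary Note).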

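　　To investigate the discordance between structural and symptomatic osteoarthritis, we performed a sensitivity meta-analysis restricted to cohorts whose phenotype definition for osteoarthritis at any site was based solely on imaging. The sensitivity meta-analysis included 5 GWASs from HKDDDPC, Riken and Rotterdam Study 1, 2 and 3, totalling 6,816 cases and 9,624 controls (Supplementary Fig. 4 and Supplementary Note).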

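　　We performed quality control centrally on the GWAS summary statistics of each cohort using a combination of in-house scripts and EasyQC51 (https://github.com/hmgu-itg/Genetics-of-Osteoarthritis-2.0; Supplementary Fig. 3 and Supplementary Note).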

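　　We used a fixed-effects inverse-variance-weighted meta-analysis approach implemented in METAL52 across the 11 osteoarthritis phenotypes, including up to 87 GWAS summary statistics from 42 different cohorts spanning 5 major ancestries. We included genomic control correction unless it had already been applied. After the meta-analysis, we excluded any variant observed in only a single GWAS and/or with MAF < 0.01, which resulted in 14.7 to 24.3 million variants depending on the phenotype (Supplementary Note).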
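
　　The fixed-effects inverse-variance-weighted estimator can be sketched as follows. This is a minimal illustration of the general technique, not METAL's code: the function name `ivw_fixed_effects` and its inputs (per-GWAS log-odds ratios and standard errors for a single variant) are assumptions made for the example, and genomic control correction is not included.

```python
import math


def ivw_fixed_effects(betas, ses):
    """Combine per-study effects for one variant with inverse-variance weights.

    betas: per-study effect estimates (for example, log odds ratios).
    ses:   per-study standard errors, in the same order.
    Returns (pooled beta, pooled standard error, Z score, two-sided P value).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / se ** 2 for se in ses]  # weight = 1 / variance
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided P value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p
```

　　For example, `ivw_fixed_effects([0.10, 0.08], [0.02, 0.03])` pools two hypothetical studies, giving more weight to the one with the smaller standard error.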

　　We used P ≤ 1.3 × 10⁻⁸ to declare genome-wide significance, as previously described3, to account for the effective number of independent phenotypic traits. In brief, we first estimated the genetic correlation matrix between the 11 osteoarthritis traits by using bivariate LD score regression53 with genome-wide meta-analysis summary statistics. This method produces reasonably robust estimates of genetic correlation when the sample size of unrelated individuals is high54, and it aims to overcome limitations of such analyses, including (1) the tendency of genetic correlations to be higher than phenotypic correlations; and (2) the potential for inflated estimates when heritability estimates are low. We then calculated the effective number of independent traits (Peff) from the eigenvalues λi of the correlation matrix55. For the P = 11 osteoarthritis phenotypes in this study, Peff = 4.6565.
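
　　The text does not give the formula that maps eigenvalues to Peff. One commonly used estimator is the squared sum of the eigenvalues divided by the sum of their squares; the sketch below assumes that estimator, and it may not be the exact one used in the study. The function name is hypothetical, and the eigenvalues are taken as given.

```python
def effective_number_of_traits(eigenvalues):
    """Estimate the effective number of independent traits.

    ASSUMPTION: uses (sum of eigenvalues)^2 / (sum of squared eigenvalues),
    a common choice that the source text does not confirm.
    """
    total = sum(eigenvalues)
    sum_of_squares = sum(value * value for value in eigenvalues)
    if sum_of_squares == 0:
        raise ValueError("at least one eigenvalue must be non-zero")
    return total * total / sum_of_squares
```

　　Under this estimator, 11 uncorrelated traits (all eigenvalues equal to 1) give 11, and 11 perfectly correlated traits (one eigenvalue of 11, the rest 0) give 1, so correlated phenotypes fall somewhere in between.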

　　To define independent signals, within and across phenotypes, we used a three-step approach; details are available at GitHub (https://github.com/hmgu-itg/Genetics-of-Osteoarthritis-2.0). (1) For each phenotype, we performed clumping using Plink56 with a significance threshold of P ≤ 1.3 × 10⁻⁸, a 2 Mb window around each index variant and a linkage disequilibrium (LD) threshold of 0.1. For the LD calculations, we used UK Biobank (v.3) for all ancestries (https://www.ukbiobank.ac.uk). (2) For each index variant in a given clump, we performed an approximate stepwise model-selection procedure implemented by COJO in GCTA57 to establish whether index variants were independent (Supplementary Note). (3) To define independent signals across phenotypes, we included index variants from all independent signals across all phenotypes if they were within 1 Mb of each other. We performed reciprocal approximate conditional analyses, implemented by COJO in GCTA57. We considered signals independent if either signal conditioned on the other had P ≤ 1.3 × 10⁻⁸. For each independent signal, we selected as the lead variant the variant with the most significant P value across all phenotypes.

　　To determine whether a signal was newly reported or previously known, we included all independent signals and all previously reported variants (Supplementary Table 4 and Supplementary Note) and we performed reciprocal approximate conditional analyses, implemented by COJO in GCTA57. We considered signals to be newly reported if either the signal or the previously reported variant conditioned on the other had P ≤ 1.3 × 10⁻⁸. After COJO analysis, we also required that each genome-wide significant independent signal be internally validated in at least one osteoarthritis phenotype. Internal validation was defined as at least two GWASs having the same direction of risk effect and being nominally significant (P < 0.05). We defined a locus as follows: (1) index variants separated by <1 Mb were grouped together in the same locus; (2) we added 500 kb upstream and downstream of index variants to define the final region of each locus. Loci that contained more than one index variant were extended to 500 kb beyond the edge variants. If a locus contained a variant that was previously reported for osteoarthritis, the locus was considered to be known.
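
　　The locus definition can be sketched as a simple interval-merging step. The function name `define_loci` is hypothetical, positions are assumed to be base-pair coordinates of index variants on a single chromosome, and grouping is assumed to chain consecutive variants that lie less than 1 Mb apart; the text does not say how chains of nearby variants were handled.

```python
def define_loci(positions, max_gap=1_000_000, flank=500_000):
    """Group index variants on one chromosome into loci.

    Consecutive index variants separated by less than max_gap share a locus
    (ASSUMPTION: chained grouping). Each locus spans from flank bases before
    its first index variant to flank bases after its last one.
    Returns a list of (start, end, member_positions) tuples.
    """
    groups = []
    for pos in sorted(positions):
        if groups and pos - groups[-1][-1] < max_gap:
            groups[-1].append(pos)  # close enough to the previous variant
        else:
            groups.append([pos])    # start a new locus
    # Clamp the start at 1 so a locus never extends before the chromosome.
    return [(max(1, g[0] - flank), g[-1] + flank, g) for g in groups]
```

　　With index variants at 1,000,000, 1,600,000 and 5,000,000 bp, the first two fall into one locus spanning 500,000 to 2,100,000 bp and the third forms its own locus spanning 4,500,000 to 5,500,000 bp.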

　　We estimated the phenotypic variance explained by the 962 independently associated variants as a function of the effect size and the risk-allele frequency (Fig. 1 and Supplementary Fig. 1). The phenotypic variance explained by a variant is ln(OR)² × 2 × RAF × (1 − RAF), where ln(OR) is the natural logarithm of the OR of the variant in the meta-analysis, and RAF is its weighted risk-allele frequency across all cohorts.
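
　　As a worked illustration of this formula, the sketch below evaluates it for one variant. The function name and the example values are hypothetical and do not come from the study.

```python
import math


def variance_explained(odds_ratio, raf):
    """Variance explained by one variant: ln(OR)^2 * 2 * RAF * (1 - RAF)."""
    if odds_ratio <= 0:
        raise ValueError("odds_ratio must be positive")
    if not 0.0 <= raf <= 1.0:
        raise ValueError("raf must lie between 0 and 1")
    return math.log(odds_ratio) ** 2 * 2.0 * raf * (1.0 - raf)
```

　　For instance, `variance_explained(1.05, 0.30)` evaluates the expression for a hypothetical variant with an OR of 1.05 and a risk-allele frequency of 0.30. A variant with OR = 1 explains no variance under this formula, whatever its frequency.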

　　For the chromosome X non-pseudoautosomal region, we performed the GWAS in men and women separately. Moreover, for those cohorts without their own reference panel that imputed to the Haplotype Reference Consortium (HRC), we applied an additional level of quality control to ensure only good-quality genotypes were included (Supplementary Note).

　　We carried out a sex-differentiated analysis to identify any sex-specific variants in addition to those identified in the sex-combined meta-analysis, which may have been missed owing to differences in effect (magnitude and/or direction) between male and female individuals. We used GWAMA58,59 (https://genomics.ut.ee/en/tools), which provides four different P values: single-sex, combined, heterogeneity (Phet) and differentiated (Pdiff). In the sex-differentiated analysis, male and female individuals are analysed separately in each GWAS. The male- and female-specific allelic effect estimates are obtained by a fixed-effects meta-analysis and tested for association with the trait, allowing for sex differentiation. By contrast, in the sex-combined analysis, male and female individuals are analysed together in each GWAS, irrespective of sex. Combined allelic effect estimates are obtained from a fixed-effects meta-analysis, weighted by the inverse variance, and tested for association with the trait. We defined a significant sex-differentiated association on the basis of the following criteria, all of which must be satisfied: a significant association with one osteoarthritis phenotype in at least one single sex (P ≤ 1.3 × 10⁻⁸), a significant sex-differentiated P value (Pdiff ≤ 1.3 × 10⁻⁸) and a significant heterogeneity P value (Phet ≤ 0.0125). If the direction of effect between male and female individuals is opposite, we additionally required the association to be present in one sex and at least nominally significant in the opposite direction in the other sex, to ensure that the observed difference in effect is not due to chance or power differences. We defined the independent signals using the three-step approach with COJO described above and required that they be internally validated (as defined above). The Phet significance threshold was determined according to the number of newly identified sex-specific variants (n = 4), which are independent of the previously reported variants and the main analysis variants (Supplementary Table 5). To identify potential effector genes for the sex-specific signals, we performed fine-mapping and produced 95% credible sets for all 4 signals; each set contained the lead variant (Supplementary Table 6 and Supplementary Note).
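
　　The decision rule for a sex-differentiated association can be sketched as a single predicate. The function and argument names are hypothetical, and the P values are assumed to have been computed already (for example, by GWAMA); only the thresholds stated in the text are encoded.

```python
GENOME_WIDE = 1.3e-8  # threshold for the single-sex and Pdiff tests
P_HET_MAX = 0.0125    # heterogeneity threshold (0.05 / 4 signals)
NOMINAL = 0.05        # nominal significance


def is_sex_differentiated(p_male, p_female, beta_male, beta_female,
                          p_diff, p_het):
    """Apply the stated criteria for a sex-differentiated association."""
    single_sex_hit = min(p_male, p_female) <= GENOME_WIDE
    if not (single_sex_hit and p_diff <= GENOME_WIDE and p_het <= P_HET_MAX):
        return False
    if beta_male * beta_female < 0:
        # Opposite directions: the weaker sex must still be at least
        # nominally significant.
        return max(p_male, p_female) < NOMINAL
    return True
```

　　A signal that is genome-wide significant in one sex but has an opposite, non-significant effect in the other sex is rejected by the final check, which mirrors the safeguard against chance or power differences described above.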

　　We performed a fixed-effects inverse-variance-weighted meta-analysis using METAL in five ancestry groups separately (European, African, Hispanic, East Asian and South Asian) and, as a sensitivity analysis, we also meta-analysed these data using Han and Eskin’s random-effects model (RE2)60 implemented in METASOFT (http://genetics.cs.ucla.edu/meta_jemdoc/). None of the variants in the non-European-ancestry-specific meta-analyses reached study-wide significance (P ≤ 1.3 × 10⁻⁸).

　　We derived genetic risk scores (GRSs) for osteoarthritis of the knee, hip, hip and/or knee, hand, finger and thumb, and for THR, TKR and TJR, and performed validation in the Million Veteran Program (MVP) (Supplementary Note and Supplementary Table 7). The MVP did not contribute to the joint-specific meta-analysis and is therefore an independent validation set for the GRS.

　　Functional GWAS analysis61 was applied to identify disease-relevant cell types as described in detail previously62 (https://github.com/natsuhiko/PHM). In brief, the association statistics (log[OR] and standard errors) were converted into approximate Bayes factors using the Wakefield approach63. After defining a cis-regulatory region of 1 Mb centred at the transcription start site (TSS) for each gene, the Bayes factors of variants existing in each cis region were weighted and averaged by a prior probability (an exponential function of TSS proximity), which was estimated from the distance distribution of regulatory interactions64. Finally, the likelihood of an fGWAS model was given by the averaged Bayes factors across all genes multiplied by the feature-level prior probability. The latter was obtained from a linear combination of cell-type-specific expression and the averaged expression across all cell types as a baseline. The maximum-likelihood estimator of the effect size for the cell-type-specific expression was used to compute the enrichment of each cell type.
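
　　The first step, converting association statistics into approximate Bayes factors, can be sketched as below. This follows the general form of Wakefield's approximation for the alternative versus the null hypothesis. The prior variance of 0.04 on the log-OR scale is an assumption made for the example and is not taken from the text, and the function name is hypothetical.

```python
import math


def wakefield_log_abf(beta, se, prior_variance=0.04):
    """Log approximate Bayes factor (association versus no association).

    beta: estimated log odds ratio; se: its standard error.
    prior_variance: ASSUMED prior variance of the true effect size.
    """
    if se <= 0:
        raise ValueError("se must be positive")
    v = se ** 2
    shrinkage = prior_variance / (v + prior_variance)
    z = beta / se
    # log ABF = 0.5 * [log(V / (V + W)) + Z^2 * W / (V + W)]
    return 0.5 * (math.log(1.0 - shrinkage) + shrinkage * z * z)
```

　　Working on the log scale avoids overflow for strongly associated variants; a negative value means the data favour the null at that variant.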

　　Full summary statistics from the GWAS were used to test knee osteoarthritis and TKR GWAS signals against single-cell knee tissue data, hip osteoarthritis and THR against hip tissue data, and finger osteoarthritis against data from all appendicular tissues. For results presentation, the 30 cell types from single-cell multiome data were grouped into three different categories: those involved in chondrogenesis (9 cell types), osteogenesis (4 cell types) and all other cell lineages4 (17 cell types) (Fig. 2 and Supplementary Table 8).

　　For each independent signal and each phenotype, we included all variants within 1 Mb around the lead variant. Quality control of the GWAS summary statistics was performed using the kriging_rss function from the susieR package65 (v.0.12.27, R v.4.2.166). We used this function to calculate, from the observed Z scores, the expected Z score and its variance; we then detected possible outliers using standardized differences between the observed Z score and the expected value, at a significance level of 0.05, corrected for multiple testing using the Bonferroni method. Fine-mapping of the GWAS summary statistics was performed using the susie_rss function from the susieR package65 (v.0.12.27, R v.4.2.166). For the fine-mapping, we set the maximum number of causal variants to 10 and used a purity threshold of 0.1 to determine 95% credible sets of potentially causal variants. External LD matrices were computed using Plink (v.1.9) on the imputed genotypes from UK Biobank data (v.3) of all ancestries. Out of a total of 962 independent variants, 913 were assigned a credible set, of which 855 contained the lead variant (Supplementary Table 9).
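
　　The outlier-flagging step can be illustrated in isolation. The sketch below does not reproduce kriging_rss: it assumes the expected Z scores and their variances are already available, and it assumes a two-sided test with a Bonferroni correction over the number of variants in the region. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def flag_z_outliers(observed, expected, variances, alpha=0.05):
    """Flag variants whose observed Z score departs from its expected value.

    ASSUMPTION: two-sided test, Bonferroni-corrected for len(observed) tests.
    Returns one boolean per variant (True means possible outlier).
    """
    n = len(observed)
    if n == 0:
        return []
    threshold = NormalDist().inv_cdf(1.0 - alpha / (2.0 * n))
    flags = []
    for z, mean, variance in zip(observed, expected, variances):
        standardized = (z - mean) / math.sqrt(variance)
        flags.append(abs(standardized) > threshold)
    return flags
```

　　A flagged variant is one whose summary statistic is inconsistent with its neighbours given the LD matrix, which can point to an allele flip or an LD reference mismatch.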

　　The main challenge here and in any GWAS is to pinpoint the likely causal variants and the biological effects and mechanisms through which they have a role in disease. To this end, we integrated multiple orthogonal statistical and functional methods to identify effector genes. We considered 24 supporting lines of information, including variant information, functional genomics and database searches (Extended Data Fig. 1 and Supplementary Note). To assess whether certain lines of evidence are more informative than others, we conducted sensitivity analyses at both the variant and gene levels, along with heritability analyses (Supplementary Tables 13, 21 and 22, Supplementary Fig. 8, Extended Data Fig. 2 and Supplementary Note). For the additional four sex-specific signals, we considered variant consequence, fine-mapping within a gene transcript, active promoter, human and mouse musculoskeletal and pain/neuronal phenotype searches as the rest of the supporting lines were performed with males and females combined. We consider newly reported effector genes to be those that were not identified previously3. We use the term identify in reference to effector genes to indicate that these genes are implicated as having a role in osteoarthritis.

　　We carried out pathway over-representation analysis with the 700 effector genes. We performed pathway analyses using different thresholds as inclusion criteria for genes from scores of 3 and above, up to scores of 7 and upwards (Supplementary Table 26, Extended Data Fig. 3 and Supplementary Note).

　　Allelic expression imbalance was determined using RNA-sequencing data of macroscopically preserved subchondral bone of 24 patients who underwent total joint replacement surgery due to osteoarthritis (RAAK-study, granted by the medical ethics committee of Leiden University Medical Center, P08.239/P19.013) (Supplementary Note, Supplementary Table 27 and Supplementary Fig. 10).

　　We performed colocalization of the osteoarthritis associations with associations with variation in plasma protein levels (plasma pQTL) using the coloc software package implemented in R67. For the plasma pQTL analysis, we used the dataset described previously68, which tested for the association of 58 million sequence variants with levels of 2,941 proteins, measured by Olink Explore 3072, in plasma samples from 46,218 individuals of British or Irish ancestry included in the UK Biobank dataset. Using summary statistics (effects and P values) for the osteoarthritis phenotypes (excluding the UK Biobank datasets) and the plasma pQTL, we calculated Bayes factors for each of the variants in the associated regions for the two traits and used coloc to calculate posterior probabilities for two hypotheses: (1) that the associations with osteoarthritis phenotypes and plasma pQTL are independent signals (PP3); and (2) that the associations with osteoarthritis phenotypes and plasma pQTL are due to a shared signal (PP4) (Supplementary Table 28 and Supplementary Note).
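
　　The way per-variant Bayes factors are combined into PP3 and PP4 can be sketched as follows. This is a simplified re-implementation of the single-causal-variant colocalization calculation rather than the coloc package itself. The function names are hypothetical, and the prior probabilities p1, p2 and p12 are assumed values that the text does not state.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def logdiffexp(a, b):
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    if a <= b:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [PP0, PP1, PP2, PP3, PP4] for one region.

    labf1, labf2: per-variant log Bayes factors for the two traits,
    aligned on the same variants. Priors are ASSUMED defaults.
    """
    if len(labf1) != len(labf2) or len(labf1) < 2:
        raise ValueError("need two aligned lists with at least two variants")
    both = [x + y for x, y in zip(labf1, labf2)]
    s1, s2, s12 = logsumexp(labf1), logsumexp(labf2), logsumexp(both)
    log_h = [
        0.0,                                                    # no association
        math.log(p1) + s1,                                      # trait 1 only
        math.log(p2) + s2,                                      # trait 2 only
        math.log(p1) + math.log(p2) + logdiffexp(s1 + s2, s12),  # two signals
        math.log(p12) + s12,                                    # shared signal
    ]
    norm = logsumexp(log_h)
    return [math.exp(h - norm) for h in log_h]
```

　　When both traits peak at the same variant, most of the posterior mass moves to PP4; when they peak at different variants, PP3 exceeds PP4.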

　　We used the variant effect predictor (VEP)69 to predict the consequences of the variants sequenced in each dataset. We classified as high-impact variants those predicted as start-lost, stop-gain, stop-lost, splice donor, splice acceptor or frameshift, collectively called loss-of-function (LOF) variants. We filtered out LOF variants that the Loss-Of-Function Transcript Effect Estimator70 (LOFTEE; https://github.com/konradjk/loftee) predicted were unlikely to be truly LOF (for example, those near the end of the transcript) and used only high-confidence LOF variants.

　　We classified as moderate-impact variants (MIS) those missense variants predicted to cause LOF by at least one of the following prediction methods: metaSVM, metaLR71 or CADD72 (combined annotation dependent depletion) with a phred score of ≥25, using variants available in dbNSFP (v.4.1c)73. We further included indels of moderate impact without any filtering.

　　We used logistic regression under an additive model to test for association between (1) LOF or (2) LOF + MIS gene burdens and phenotypes, with disease status as the dependent variable and genotype counts as the independent variable, using a likelihood ratio test to compute two-sided P values. Individuals were coded 1 if they carried any of the qualifying variants (LOF or LOF + MIS) with MAF < 2% and 0 otherwise. For the analyses, we used software developed at deCODE Genetics74. We analysed these gene burden models in whole-genome sequencing (WGS) data and then imputed data for 211,690 patients with osteoarthritis (osteoarthritis at any site), of which 54,513 had WGS, and 719,856 controls, of which 148,488 had WGS, in the UK Biobank, Icelandic, Danish and US Intermountain datasets75, and the FinnGen dataset for the LOF model, and meta-analysed the results. For Iceland, we included county of birth, age, age squared, sex and an indicator function for the overlap of the lifetime of the individual with the time span of phenotype collection as covariates to account for differences between cases and controls. We used county of birth as a proxy covariate for the first principal components (PCs) in our analysis because county of birth has been shown to be in concordance with the first PC in Iceland76. The UK, Danish and US associations were adjusted for sex, age and the first 20, 12 and 4 PCs, respectively. We used LD score regression intercepts53 to adjust the χ² statistics and avoid inflation due to cryptic relatedness and stratification, using a set of 1.1 million variants. P values were calculated from the adjusted χ² results.

　　Meta-analysis was performed on the summary results from Iceland, the UK, Denmark and the USA, when available, using a fixed-effects inverse-variance-weighted method77, in which the datasets were allowed to have different population frequencies for alleles and genotypes but were assumed to have a common OR and were weighted by the inverse of the variance of the effect estimate derived from the logistic regression. The FinnGen dataset was also included in the LOF model for the VIT gene; no LOF variants were identified in the other genes. We set a study-wise significance threshold at P < 7.1 × 10⁻⁵, accounting for the 700 unique genes tested, whereas the genome-wide significance threshold for burden tests was taken as P < 2.5 × 10⁻⁶, accounting for the approximately 20,000 genes in the genome.
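
　　Both thresholds are consistent with a Bonferroni correction of a family-wise error rate of 0.05. That error rate is an assumption here, because the text states only the resulting thresholds and the numbers of genes. The helper below is a hypothetical illustration of the arithmetic.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold for n_tests (alpha is ASSUMED)."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


study_wise = bonferroni_threshold(700)      # 700 effector genes tested
genome_wide = bonferroni_threshold(20_000)  # approximate gene count
```

　　Dividing 0.05 by 700 gives roughly 7.1 × 10⁻⁵, and dividing by 20,000 gives 2.5 × 10⁻⁶, matching the two thresholds quoted above.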

　　To determine whether any of the variants in the credible sets were localized in gene regulatory regions, we used the ROADMAP ChromHMM data78, which predict gene regulatory regions (enhancers and promoters) in human mesenchymal stem-cell-derived chondrocytes (E049) and primary osteoblasts (E127). We used the ROADMAP-generated core 15-state chromatin state model, in which the following states were considered to be gene regulatory: active TSS, flanking active TSS, enhancers, genic enhancers, bivalent/poised TSS, flanking bivalent/poised TSS/enhancer and bivalent enhancer. Variants that localized in one of these gene regulatory regions were also assessed for whether they affected a possible transcription-factor-binding motif as predicted by Haploreg (v.4.2)79,80 (Supplementary Note and Supplementary Tables 11, 12 and 14).

　　To identify potential drug-repurposing options from the effector gene list, we queried around 17,000 drug molecules and 21,087 protein targets (with UniProt and Ensembl identifiers) from Open Targets81 (https://platform.opentargets.org/downloads). This dataset comprises 1,543 genes whose protein products are the target of at least 1 drug, and 4,930 drugs that target at least 1 gene product. For the 700 effector genes, there were 652 approved drugs that target the proteins of 70 unique genes. After filtering out drugs that were withdrawn or that were not listed with an indication, there were 473 drugs that target the proteins of 69 unique effector genes (Supplementary Table 30). Finally, we also investigated the similarities and differences between these effector genes and those in large pain datasets (Supplementary Table 31 and Supplementary Note).

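　　With the increase in sample size, we detected 39 loci with >1 independent signal (13.5% of loci have ≥1 additional signal) (Supplementary Table 3). As many loci have ≥1 effector gene, we consider all effector genes to have a potential role in osteoarthritis pathology, and additional signals may therefore act through the same or different effector genes. Building on the effector genes, we aimed to establish connections between genes by using multiple sources to identify pathways, networks and common themes among effector genes that could be used for drug targeting. We ranked the 700 effector genes according to their score. We performed literature searches for information on the function of, and associations between, effector genes (Supplementary Note). Finally, we performed heritability analysis for each of the eight biological processes identified, using LDAK v.6 software82 (https://www.ldak.org) with the summary statistics from the 11 osteoarthritis phenotypes (Supplementary Fig. 9).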

　　Study-level ethics statements are provided in the Supplementary Note.

　　Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
