SU Jia-ming, PENG Jing, CHEN Hai-min, ZHOU Ying, SHI Yang, DONG Zhao-xi, WEN Ya-xuan, LIN Zi-xuan, LIU Hong-fang
1. Beijing University of Traditional Chinese Medicine, Beijing 100029, China
2. Dongzhimen Hospital Affiliated to Beijing University of Traditional Chinese Medicine, Beijing 100700, China
Keywords: Diabetic nephropathy; Renal tubulointerstitial injury; Bioinformatics; Machine learning; Immune infiltration; Core genes
ABSTRACT Objective: To identify genes related to renal tubulointerstitial injury in diabetic nephropathy (DN) and to elucidate their underlying mechanisms using bioinformatics multi-chip joint analysis and machine learning, so as to provide new ideas for the diagnosis and treatment of DN. Methods: Four gene expression datasets of DN tubulointerstitial tissue were retrieved from the GEO database. GSE30122, GSE47185 and GSE99340 were merged into the combined microarray dataset, and GSE104954 was used as the independent validation dataset. Differentially expressed genes (DEGs) were identified in R, and Gene Ontology (GO) enrichment, KEGG pathway enrichment, gene set enrichment analysis (GSEA) and immune cell infiltration analysis were performed. LASSO regression, SVM-RFE and random forest (RF) machine learning algorithms were then used to screen core genes, followed by external validation, receiver operating characteristic (ROC) curve analysis and construction of a predictive nomogram model. Finally, the relationship between core gene expression and the clinical characteristics of DN patients was explored with Nephroseq. Results: A total of 107 DEGs were obtained. Enrichment analysis revealed that tubulointerstitial injury in DN mainly involved adaptive immune response, lymphocyte-mediated immunity, regulation of immune effector process and immune-inflammatory pathways such as Staphylococcus aureus infection, complement and coagulation cascades, phagosome, and Th1 and Th2 cell differentiation. In addition, the cell adhesion molecule, cytokine-cytokine receptor interaction and ECM-receptor interaction pathways were significantly enriched. Resting memory CD4 T cells, γδ T cells, resting mast cells and neutrophils were up-regulated, while CD8 T cells were down-regulated. Machine learning identified MARCKSL1, CX3CR1, FSTL1, AGR2 and GADD45B as core genes with good diagnostic and predictive efficacy.
Conclusion: The key pathological mechanisms of tubulointerstitial injury in DN are immune dysregulation, inflammatory reaction, cytokine action and extracellular matrix deposition. Moreover, MARCKSL1, CX3CR1 and FSTL1 may be potential biomarkers for the diagnosis and prediction of DN.
Bioinformatics methods are now widely used to develop biomarkers relevant to diagnosis or prognosis, but because limited sample sizes lead to a high false positive rate, reliable results may be difficult to obtain from single-chip data analysis[4]. In this study, a bioinformatics multi-chip approach was used to retrieve from the GEO database all available gene expression microarray datasets of renal tubulointerstitial tissue from DN patients and to search for differentially expressed genes (DEGs) specifically related to tubulointerstitial injury in DN. Machine learning algorithms were combined to screen the core genes. In addition, the enrichment pathways and immune infiltration mechanisms of the related genes were studied, and risk factors were evaluated by correlation analysis with clinical features. The aim was to reveal the molecular mechanism of renal tubulointerstitial injury associated with DN and to explore potential biomarkers, providing a theoretical reference and scientific basis for the early diagnosis and targeted therapy of DN.
The search term "Diabetic nephropathy" was used to retrieve gene chip data from the GEO database (http://www.ncbi.nlm.nih.gov/GEO). The inclusion criteria were as follows: (1) human mRNA expression dataset; (2) all samples were renal tissue; (3) a case-control design with a sample size exceeding 2 in each group. After screening, three public gene expression profiling datasets of human DN samples, GSE30122, GSE47185 and GSE99340, were obtained as the combined chip dataset, and GSE104954 was used as the independent validation dataset. Details are shown in Table 1.
Probe IDs were converted into “Entrez ID” based on the platform annotation information. After the matrix files of GSE30122, GSE47185 and GSE99340 were merged into the combined chip dataset, batch effects were corrected with the “sva” package in R software (version 4.2.0). The standardized gene expression profiles were subjected to differential expression analysis with the “limma” package, and DEGs were screened using Bayesian multiple-testing correction with |log2FC|>1 and adjusted P<0.05 as the criteria. The “ggplot2” package was used to draw the volcano plot of the DEGs, and the “pheatmap” package was used to draw the heat map of the DEGs.
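The screening thresholds described above can be illustrated with a minimal sketch. The study performed this step in R with limma; the Python below is only an illustration and assumes Benjamini-Hochberg adjustment, which the text does not name explicitly. The gene names and values in the example call are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is the multiple-testing correction; the paper does not
    state which adjustment method was used.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, log2fc, pvalues, fc_cutoff=1.0, alpha=0.05):
    """Keep genes with |log2FC| > fc_cutoff and adjusted P < alpha."""
    adjusted = benjamini_hochberg(pvalues)
    selected = []
    for gene, fc, q in zip(genes, log2fc, adjusted):
        if abs(fc) > fc_cutoff and q < alpha:
            selected.append((gene, "up" if fc > 0 else "down"))
    return selected


# Hypothetical toy input, not data from the study.
toy_degs = select_degs(
    ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    [1.5, -1.2, 0.4, -2.0],
    [0.01, 0.04, 0.03, 0.005],
)
```

In this toy input, GENE_C is dropped because its fold change is below the cutoff even though its adjusted P value passes, which shows that both criteria must hold at once.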
To explore the main functions and pathways of the DEGs, Gene Ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed with Bioconductor packages in R software. Biological process (BP), cellular component (CC) and molecular function (MF) terms and signaling pathways were screened with P<0.05 as the criterion. To obtain more comprehensive results, gene set enrichment analysis (GSEA) was performed with c5.go.v7.4.symbols.gmt and c2.cp.kegg.v7.4.symbols.gmt as the reference gene sets. GSEA software was used to run 1,000 permutations on the batch-corrected gene expression matrix of DN patients in the combined chip dataset to obtain the GO and KEGG enrichment results, respectively.
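The idea behind GO and KEGG over-representation can be sketched as follows. This assumes a one-sided hypergeometric test, which is the usual choice for this kind of enrichment but is not stated in the text; the function name and the numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_term, n_degs, n_overlap):
    """One-sided over-representation p-value, P(X >= n_overlap).

    n_universe: number of annotated background genes
    n_term:     background genes annotated to the GO term or pathway
    n_degs:     number of DEGs drawn from the background
    n_overlap:  DEGs that are annotated to the term
    """
    upper = min(n_term, n_degs)
    total = comb(n_universe, n_degs)
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 10 background genes, 4 in the pathway,
# 3 DEGs, all 3 of which fall in the pathway.
p_toy = hypergeom_enrichment_p(10, 4, 3, 3)
```

A term passes the P<0.05 screen when an overlap this large would rarely arise by drawing the same number of genes at random from the background.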
R software was used to perform immune infiltration analysis on the batch-corrected gene expression matrix of the combined microarray dataset obtained above. The gene expression signature of 22 immune cell types was downloaded from CIBERSORT (https://cibersort.stanford.edu/), the proportion of the 22 immune cell types in each sample was calculated with the “e1071” package in R, and the abundance map of immune infiltration was drawn. “corrplot” and “vioplot” were used to plot the correlation heat map and the violin plot, respectively, to analyze the correlations and differences in immune cell infiltration. P<0.05 indicated a significant difference between the two groups.
Machine learning algorithms, which are widely used in the search for biomarkers, were applied to obtain a more refined model. In this study, the least absolute shrinkage and selection operator (LASSO), support vector machine-recursive feature elimination (SVM-RFE) and random forest (RF) were used[5]. LASSO, built with the “glmnet” R package, is a generalized linear model that performs variable selection and complexity regularization while fitting. SVM-RFE is a supervised machine learning technique that ranks features recursively[6]. RF uses decision tree classifiers to score the classification variables iteratively, producing high-precision classification features. The three algorithms were used jointly to screen the core genes in the combined microarray dataset.
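The joint screening step amounts to intersecting the feature lists returned by the three algorithms. A minimal sketch follows; the five core gene names come from the study, while the `PLACEHOLDER_*` entries and the function name are hypothetical and stand in for the remaining algorithm-specific selections, which the text does not list.

```python
def consensus_genes(*selections):
    """Return the genes chosen by every algorithm, sorted alphabetically."""
    gene_sets = [set(selection) for selection in selections]
    return sorted(set.intersection(*gene_sets))


CORE = ["MARCKSL1", "CX3CR1", "FSTL1", "AGR2", "GADD45B"]

# Hypothetical feature lists: the core genes plus made-up extras.
lasso_genes = CORE + ["PLACEHOLDER_1", "PLACEHOLDER_2", "PLACEHOLDER_3"]
svm_rfe_genes = CORE + ["PLACEHOLDER_2"]
rf_genes = CORE + ["PLACEHOLDER_3", "PLACEHOLDER_4"]

core_genes = consensus_genes(lasso_genes, svm_rfe_genes, rf_genes)
```

Taking the intersection is a conservative choice: a gene survives only if three methods with different selection biases all retain it.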
Tab 1 Dataset details
GSE104954 was used as the independent validation dataset, and an unpaired t-test, with P<0.05 considered statistically significant, was used to verify the difference in expression of the selected core genes between the two groups. Subsequently, a receiver operating characteristic (ROC) curve was plotted and the area under the ROC curve (AUC) was calculated to evaluate the efficacy of the core genes in diagnosing DN.
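The AUC used here has a simple interpretation: it is the probability that a randomly chosen DN sample has a higher expression value than a randomly chosen control. The sketch below computes it directly from that definition; the function name and the example values are hypothetical, and the study itself computed ROC curves in R.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly.

    Ties count as half a correct pair. For a gene that is lower in cases
    (down-regulated), the value falls below 0.5 and 1 - AUC is reported.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical expression values, not data from GSE104954.
auc_toy = roc_auc([3.0, 4.0, 5.0], [1.0, 2.0, 3.5])
```

The pairwise loop is quadratic in the number of samples, which is acceptable for microarray cohorts of this size.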
The expression matrix of the core genes in the combined chip dataset was integrated, and logistic regression analysis in R software was used to build a prediction model, which was visualized as a nomogram. ROC curves were used to assess the performance of the model in predicting tubulointerstitial injury in DN patients.
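How a logistic model turns into a nomogram can be sketched as below. The coefficients, expression ranges and gene labels are hypothetical, since the fitted model is not reported in the text, and the point scaling (the predictor with the widest effect on the log-odds spans 0 to 100 points) is an assumed convention, not a description of the authors' R code.

```python
from math import exp


def predicted_probability(intercept, coefs, sample):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear_predictor = intercept + sum(coefs[g] * sample[g] for g in coefs)
    return 1.0 / (1.0 + exp(-linear_predictor))


def nomogram_points(coefs, ranges, sample):
    """Convert each predictor value to nomogram points.

    ranges maps each predictor to its (min, max) observed value.
    """
    spans = {g: abs(coefs[g]) * (hi - lo) for g, (lo, hi) in ranges.items()}
    max_span = max(spans.values())
    points = {}
    for g, (lo, hi) in ranges.items():
        # The value that contributes least to the risk scores 0 points.
        reference = lo if coefs[g] >= 0 else hi
        points[g] = 100.0 * coefs[g] * (sample[g] - reference) / max_span
    return points


# Hypothetical model with two predictors.
toy_coefs = {"gene_1": 2.0, "gene_2": -1.0}
toy_ranges = {"gene_1": (0.0, 5.0), "gene_2": (0.0, 4.0)}
toy_points = nomogram_points(toy_coefs, toy_ranges, {"gene_1": 5.0, "gene_2": 0.0})
toy_prob = predicted_probability(-1.0, toy_coefs, {"gene_1": 0.5, "gene_2": 0.0})
```

Summing the points across predictors gives the total score that a nomogram maps back to a predicted probability on its bottom axis.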
For clinical profiling of the core genes, the Nephroseq database (https://www.Nephroseq.org) was used. Nephroseq is a clinical database that stores gene expression data for kidney diseases and their control groups, and it is widely used in kidney disease research[7]. In this study, the selected core genes were verified in this database, and Pearson correlation analysis was used to explore the relationship between core gene expression and glomerular filtration rate (GFR), 24-hour urinary protein (proteinuria/24 h), serum creatinine (SCr) and blood urea nitrogen (BUN) in patients with DN, with P<0.05 as the criterion for statistical significance.
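The correlation analysis described above can be sketched in a few lines. The expression and clinical values in the example are made up, and the function names are hypothetical; the t statistic shown is the standard one for testing a Pearson coefficient against zero with n - 2 degrees of freedom.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_t_statistic(r, n):
    """t statistic for H0: r = 0; compare against t with n - 2 df."""
    return r * sqrt((n - 2) / (1.0 - r * r))


# Hypothetical values, not Nephroseq data.
expression = [1.0, 2.0, 3.0, 4.0, 5.0]
clinical = [2.0, 1.0, 4.0, 3.0, 5.0]
r_toy = pearson_r(expression, clinical)
t_toy = correlation_t_statistic(r_toy, len(expression))
```

With cohorts as small as those analysed here (9 to 17 patients per dataset), a coefficient needs to be fairly large before the t statistic crosses the P<0.05 threshold.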
The combined chip dataset comprised 71 DN patients and 87 healthy controls, and the data distribution was consistent across datasets after batch correction, as shown in Figure 1. Compared with healthy controls, a total of 107 DEGs associated with DN tubulointerstitial tissue were obtained according to the screening criteria, of which 26 genes were downregulated and 81 genes were upregulated. To show the changes and clustering of the DEGs, a volcano plot and a heat map were drawn, as shown in Figure 2.
The 107 selected DEGs were imported into R software for GO enrichment analysis. The biological processes mainly involved regulation of endopeptidase activity, regulation of peptidase activity, negative regulation of hydrolase activity, and positive regulation of cell-mediated immunity and cytokine production. The cellular components included collagen-containing extracellular matrix, secretory granules and cytoplasmic vacuoles.
The molecular functions were mainly concentrated in extracellular matrix structure, enzyme inhibitor activity and peptidase regulatory activity. KEGG pathway enrichment yielded 34 results; the main related pathways were Staphylococcus aureus infection, complement and coagulation cascades, cell adhesion molecules, phagosome, ECM-receptor interaction, and Th1 and Th2 cell differentiation, as shown in Figure 3. Furthermore, GSEA showed that in the tubulointerstitial gene expression matrix of DN patients, the active GO functions were enriched in immune-related processes such as adaptive immune response, lymphocyte-mediated immunity, and regulation of immune effector process. The active KEGG pathways were enriched in cell adhesion molecules, cytokine-cytokine receptor interaction, and ECM-receptor interaction, as shown in Figure 4.
Fig 1 Standardization of chip data set
Fig 2 Volcano plots(A) and heatmap (B) of differentially expressed genes
A matrix of immune cell content in tubulointerstitial tissue was constructed from 87 healthy controls and 71 patients with DN, showing differences in the infiltration of the 22 immune cell types across samples, as shown in Figure 5A. In the immune cell correlation analysis, the strongest correlation was a positive one between eosinophils and naive B cells (r=0.66), as shown in Figure 5B. Compared with healthy controls, immune cell infiltration in the renal tubulointerstitial tissue of DN patients differed significantly (P<0.05): resting memory CD4 T cells, γδ T cells, resting mast cells and neutrophils were up-regulated, while CD8 T cells were down-regulated, as shown in Figure 5.
LASSO regression, the SVM-RFE algorithm and the RF algorithm were used to further screen the DEGs in the tubulointerstitial tissue of DN. A LASSO regression model was constructed and cross-validated, and the minimum error value corresponded to 16 characteristic genes. The SVM-RFE algorithm selected 8 feature genes by 5-fold cross-validation, and the RF algorithm identified 10 feature genes. Five core genes, namely MARCKSL1, CX3CR1, FSTL1, AGR2 and GADD45B, were obtained by taking the intersection.
Fig 3 GO analysis(A) and KEGG analysis(B) of differentially expressed genes
Fig 4 GSEA enrichment analysis of tubulointerstitial gene expression matrix in DN
Fig 5 Infiltration of 22 kinds of immune cells
External validation using the GSE104954 validation dataset showed that the expression of MARCKSL1, CX3CR1 and FSTL1 was significantly up-regulated in the tubulointerstitium of DN patients compared with healthy controls, while the expression of GADD45B was significantly down-regulated. There was no significant difference in AGR2 expression between the two groups (Figure 7A-E). The ROC curves showed that the 5 screened core genes had good diagnostic efficacy (AUC>0.7) in distinguishing DN patients from healthy controls within the validation dataset (Figure 7F-J).
Based on the core gene expression matrix of the combined microarray dataset, the prediction model was constructed by logistic regression and visualized as a nomogram. The C-index of the prediction model was 0.994, indicating high predictive accuracy. In addition, the ROC curves indicated that the combined nomogram model performed better in predicting tubulointerstitial injury in patients with DN than any single core gene model, as shown in Figure 8.
Fig 6 The core genes of DN with tubulointerstitial injury were screened by machine learning algorithm
Fig 7 The verification results of core genes
The correlation between core gene expression and the clinical features of DN was analyzed with Nephroseq. Four datasets were retrieved: ERCB Nephrotic Syndrome TubInt (10 DN patients), Ju CKD TubInt (17 DN patients), Schmid Diabetes TubInt (9 DN patients) and Woroniecka Diabetes TubInt (10 DN patients). Pearson correlation analysis showed that the expression of MARCKSL1, CX3CR1 and AGR2 was negatively correlated with GFR (mL/min) in DN patients (Figure 9A-C), and the expression of MARCKSL1, FSTL1 and GADD45B was positively correlated with urine protein (g/d) in DN patients (Figure 9D-F). The remaining results were not statistically significant and are not shown in the figure.
Fig 8 Model of prediction nomogram
Fig 9 The clinical characteristics results of core genes
Tubulointerstitial injury in DN is not merely secondary to glomerular lesions; it exists early in DN and plays a significant role in disease progression, which makes research on relevant biomarkers one of the breakthroughs in the early diagnosis of DN[8, 9]. Based on the gene expression chips of tubulointerstitial tissue from DN patients included in the GEO database before January 2022, this work investigated the underlying biological pathways using bioinformatics and machine learning. A total of 107 DEGs associated with tubulointerstitial lesions were identified between DN patients and healthy controls. Enrichment analysis of the DEGs revealed that immune dysregulation, inflammatory reaction, cytokine action, and extracellular matrix deposition were the most prominent mechanisms in tubulointerstitial injury of DN. Renal interstitial fibrosis is a leading cause of progressive kidney failure in DN and the final consequence of DN progression to end-stage renal disease[10]. Excessive ECM deposition determines the extent and progression of renal interstitial fibrosis[11]. Immune inflammatory cells that infiltrate locally early in DN release multiple cytokines, growth factors, and vasoactive substances that damage intrinsic kidney cells. This activates the transcription of many ECM proteins, such as fibrinogen and fibronectin, which are deposited at the damaged site. At the same time, fibroblasts and immune cells are recruited to the damaged site through cell adhesion molecules to facilitate repair or the clearance of pathogens[12,13].
However, with the long-term persistence of an immune micro-inflammatory state and continued high-glucose stimulation, repair after injury progresses to pathological change[14]. ECM is deposited excessively in the tubulointerstitium and partially hydrolyzed into biologically active fragments, which stimulate surrounding cells to produce fibrotic ECM components such as collagen and fibronectin that are difficult to degrade. This results in the loss of normal kidney tissue structure and function and the formation of fibrotic damage in the renal interstitium[15,16]. Based on the gene enrichment pathways in this study, it is inferred that immune dysregulation and inflammatory reaction are the driving factors of tubulointerstitial injury in DN, and that interstitial fibrosis caused by ECM deposition is the final outcome.
Many immune system components, including resting memory CD4 T cells, γδ T cells, resting mast cells, neutrophils, and CD8 T cells, were found to be involved in tubulointerstitial injury in DN. Although DN is not a predominantly "immune-mediated" kidney disease, numerous studies have demonstrated that dysregulation of the inflammatory response caused by innate and adaptive immune abnormalities leads to progressive renal impairment in patients with DN[17]. This involves a variety of immune and inflammatory cells (T lymphocytes, monocytes/macrophages, neutrophils, etc.) and inflammatory factors [vascular endothelial growth factor (VEGF), tumor necrosis factor-α (TNF-α), transforming growth factor-β1 (TGF-β1), interleukin (IL), C-reactive protein (CRP), nuclear factor-κB (NF-κB), connective tissue growth factor (CTGF), monocyte chemoattractant protein-1 (MCP-1), etc.]. Through a variety of signaling pathways and interactions, these factors exacerbate the body's micro-inflammatory state and oxidative stress, which together promote ECM deposition, apoptosis, tubular sclerosis and changes in renal hemodynamics, resulting in continuous deterioration of renal function. Therefore, DN has come to be regarded as an immuno-inflammatory disease in recent years[18,19].
To support early diagnosis and prognostic evaluation of DN, a new molecular signature of tubulointerstitial injury in DN, comprising MARCKSL1, CX3CR1, FSTL1, AGR2, and GADD45B, was identified by machine learning, and the ROC curves and nomogram prediction model demonstrated excellent diagnostic efficacy. MARCKSL1 is involved in the control of numerous physiological processes, including cell migration, secretion, proliferation, and differentiation, and is expressed in numerous organs[20]. MARCKSL1 plays an important regulatory role in the immune system: it can inhibit p38, JNK MAPKs and NF-κB by inhibiting their phosphorylation, reduce the levels of inflammatory cytokines such as TNF-α and IL-6, and affect the migration and adhesion of macrophages and neutrophils[21]. However, its function and role in immune infiltration and micro-inflammation of renal tubules in DN require further investigation. Multiple clinical studies have shown that CX3CR1 is elevated in the kidneys of DN patients[22], and Song et al.[23] have suggested that CX3CR1 plays an important role in DN by upregulating ECM synthesis. Inhibition of CX3CR1 reduces ECM deposition in a mouse model of DN and improves renal macrophage infiltration and fibrosis, so CX3CR1 may become an effective target for the prevention and treatment of DN. FSTL1 is a fibroblast-derived cytokine that is closely associated with fibrosis in a variety of tissues and organs (kidney, liver, lung, etc.)[24,25] and has been shown to be up-regulated in patients with chronic kidney disease in both clinical and basic studies. Because it promotes renal fibrosis, inflammation and apoptosis, it may be a new therapeutic target for chronic kidney disease.
AGR2 may be involved in cell proliferation and growth through a variety of pathways, and its role in the development, progression and targeted therapy of cancer is increasingly recognized[26]. Although no significant difference was observed in the validation dataset, Zhou et al.[27] also suggested that AGR2 is one of the key genes for tubulointerstitial lesions in DN. GADD45B is involved in cell cycle arrest, cell survival or apoptosis, and DNA damage and repair. A few studies have reported that high glucose can stimulate high expression of GADD45B in the kidney tissue of db/db mice and in human proximal tubular epithelial cells, promoting tubular epithelial-mesenchymal transition and apoptosis. However, it has also been suggested that GADD45B may have anti-apoptotic effects in other cell lines and in different disease models[28], and its expression was found to be significantly down-regulated in the tubulointerstitial tissues of DN patients in this study.
In addition, DN is characterized by progressive impairment of renal function, such as a worsening glomerular filtration rate, progressive proteinuria, and elevated serum creatinine and urea nitrogen levels, and tubulointerstitial injury has been reported to be more sensitive to the deterioration of renal function[29]. In the current study, the expression of MARCKSL1, CX3CR1 and AGR2 was negatively correlated with GFR, and the expression of MARCKSL1, FSTL1 and GADD45B was positively correlated with urinary protein levels in DN patients, suggesting that these genes may have some value in judging the progression of DN. These results suggest that MARCKSL1, CX3CR1 and FSTL1 have remarkable diagnostic and predictive efficacy in tubulointerstitial injury of DN and may even be potential therapeutic targets for DN, whereas AGR2 and GADD45B need further study.
In conclusion, by integrating the differential gene expression profiles of DN tubulointerstitial tissue in the GEO database and applying bioinformatics and machine learning methods, this study elucidated the biological significance of related biomarkers in renal tubulointerstitial injury of DN and proposed a novel concept for the diagnosis and treatment of DN. However, the related detection methods and the range and accuracy of specific applications still require further investigation.
Journal of Hainan Medical College, 2022, Issue 20