Immunogenetics DOI 10.1007/s00251-016-0956-4
ORIGINAL ARTICLE
Integrative analysis for identification of shared markers from various functional cells/tissues for rheumatoid arthritis Wei Xia 1,2 & Jian Wu 3 & Fei-Yan Deng 1,2 & Long-Fei Wu 1,2 & Yong-Hong Zhang 2 & Yu-Fan Guo 3 & Shu-Feng Lei 1,2
Received: 10 August 2016 / Accepted: 19 October 2016 # Springer-Verlag Berlin Heidelberg 2016
Abstract Rheumatoid arthritis (RA) is a systemic autoimmune disease. So far, it is unclear whether there exist common RA-related genes shared in different tissues/cells. In this study, we conducted an integrative analysis on multiple datasets to identify potential shared genes that are significant in multiple tissues/cells for RA. Seven microarray gene expression datasets representing various RA-related tissues/ cells were downloaded from the Gene Expression Omnibus (GEO). Statistical analyses, testing both marginal and joint effects, were conducted to identify significant genes shared in various samples. Followed-up analyses were conducted on functional annotation clustering analysis, protein-protein interaction (PPI) analysis, gene-based association analysis, and ELISA validation analysis in in-house samples. We identified 18 shared significant genes, which were mainly involved in the immune response and chemokine signaling
pathway. Among the 18 genes, eight genes (PPBP, PF4, HLA-F, S100A8, RNASEH2A, P2RY6, JAG2, and PCBP1) interact with known RA genes. Two genes (HLA-F and PCBP1) are significant in gene-based association analysis (P = 1.03E-31, P = 1.30E-2, respectively). Additionally, PCBP1 also showed differential protein expression levels in in-house case-control plasma samples (P = 2.60E-2). This study represented the first effort to identify shared RA markers from different functional cells or tissues. The results suggested that one of the shared genes, i.e., PCBP1, is a promising biomarker for RA.
Wei Xia and Jian Wu contribute equally to the work.
Introduction
Electronic supplementary material The online version of this article (doi:10.1007/s00251-016-0956-4) contains supplementary material, which is available to authorized users. * Yu-Fan Guo
[email protected] * Shu-Feng Lei
[email protected] 1
Center for Genetic Epidemiology and Genomics, School of Public Health, Soochow University, Suzhou, Jiangsu 215123, People’s Republic of China
2
Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Soochow University, Suzhou, Jiangsu 215123, People’s Republic of China
3
Department of Rheumatology, the First Affiliated Hospital of Soochow University, Suzhou, Jiangsu 215000, People’s Republic of China
Keywords Rheumatoid arthritis . Multi-dataset analysis . Integrative analysis . Shared genetic marker
Rheumatoid arthritis (RA) is a systemic autoimmune disease, which is characterized by chronic synovitis that progresses to destruction of the cartilage and bone. DNA microarray is a powerful technology used in systemically detecting genomewide gene expression and has become an effective tool in exploring the complex pathogenesis and heterogeneous manifestations of diseases including RA (Haupl et al. 2010; Hu and Daly 2012). Previous microarray studies have identified abnormal expression of some genes that were associated with the pathogenesis of RA (Barnes et al. 2004; Grayson et al. 2011; Huber et al. 2008; Teixeira et al. 2009; Ungethuem et al. 2010; Yarilina et al. 2008). These genes have important functions related to transcription factors, cytokines, costimulatory molecules, intracellular signaling pathways, such as NFKappaB, AP-1, NFATc2, and Smad3 (Criswell 2010; Han et al. 1998; Paradowska-Gorycka et al. 2014). The identified genes
Immunogenetics
can serve as potential biomarkers in disease diagnosis and monitor and predicting responses to therapies (Kim et al. 2014). Various cells or tissues (e.g., synovial tissue, synovial membrane, synovial macrophage, synovial fluid mononuclear cell (SFMC), peripheral blood mononuclear cell (PBMC), and peripheral blood cells) are involved in RA pathology. For example, chronic synovitis that progresses to destruction of the cartilage and bone is a major clinical characteristic. Many activated cell types (e.g., monocytes/macrophages, synovial macrophage T and B cells, and synovial fibroblasts) play a role in maintaining joint inflammation, degradation of extracellular matrix components, invasion of the cartilage and bone (Abeles and Pillinger 2006; Karouzakis et al. 2006), as well as fibrosis of the affected joints (Postlethwaite et al. 1992). Therefore, these tissues/cells have been frequently used in identifying RA relevant genes. Currently, most of the gene expression studies only focused on analyzing a single type of tissue or cell, and little attention had been paid to study whether there are commonly regulated genes in different related tissues or cells which are significant for RA. Such genes, if available, would become drug targets or diagnosis biomarkers. To identify shared RA-associated markers from various tissues or cells, we conducted a multi-dataset analysis to identify genes with differential expression in various cells or tissues. Follow-up analyses (functional annotation clustering analysis, protein-protein interaction analysis, gene-based association analysis, and plasma protein differential expression) were further performed to test functional relevance of the identified genes to RA.
Materials and methods The main analysis methods and analysis process have been briefly summarized and presented in Fig. 1.
(joint effects) in a single model. The analysis methods are briefly introduced as follows. a. Analysis of marginal effects 1. [Approach 1] The top 100 rank approach. We used the logistic regression model [approach 1-L], t test [approach 1-T] and fold change [approach 1-F] as ranking statistic analysis. In each ranking method, genes ranked at the top 100 in at least three datasets were selected for further analysis. 2. [Approach 2] The false discovery rate (FDR) approach. To partially alleviate multiple testing problems, we applied the false discovery rate (FDR) approach (Gusnanto et al. 2007) to the P values calculated by approach 1-T. With an attempt to discover important genes with an increased chance, we set FDR = 0.1 as a filter. Genes appeared in at least three datasets under FDR = 0.1 were selected for further analysis. 3. [Approach 3] The genetic variation score (GVS) approach. To capture the similarity of expression profiles among datasets, we used the genetic variation score (GVS) approach (Sirota et al. 2009) which can capture the strength and direction of association between a gene and a disease to obtain the genetic variation profile (GVP) for each dataset which consisted of GVSs of all genes, and then computed the correlations between the GVPs, which quantifies the similarity of gene effects. The package Bgplots^ in R software was used to draw heatmaps based on the computed correlations.
b. Analysis of joint effects
To identify suitable microarray datasets, we searched the NCBI Gene Expression Omnibus (GEO) (http://www.ncbi. nlm.nih.gov/geo/) (Edgar et al. 2002) and searched the database using the keywords Brheumatoid arthritis.^ We identified seven eligible datasets that utilized case-control study design (RA patients vs. healthy subjects).
The joint analysis places all genes simultaneously into a model for analysis. It is expected that of a large number of profiled genes, only a small subset is associated with the response. With such considerations, we used the Lasso penalized estimate (Huang et al. 2012). Lasso penalized estimate was realized using R package Bglmnet.^ To solve the problem of model parameters with Lasso, the stability selection approach (Alexander and Lange 2011) has been adopted with a loose cutoff as 0.1. To overcome the model convergence problem, the stability selection approach was only applied to datasets with more than six cases vs. six controls.
Multi-dataset analysis for identification of shared genes
c. Identification of the shared genes
We conducted the multi-dataset analyses, as described in the previous study (Shi et al. 2014). The multi-dataset analyses have two main analysis measures, (1) analyze each gene separately (marginal effects) and then compare across datasets, (2) simultaneously analyze the effects of all genes
After multi-dataset analysis, we identified significant genes in both marginal analysis and joint analysis. The significant genes identified by both marginal (genes identified by approach 1 or approach 2 (at level of FDR = 0.1) in more than three of the datasets) and joint analysis (genes simultaneously identified by
Identification of eligible gene expression datasets
Immunogenetics Fig. 1 Summary of analysis process
both Lasso penalized estimate and stability selection) were regarded as the shared genes. In addition, if the dataset cannot apply the stability selection approach, the results of Lasso would be used directly. Functional analysis for the identified common genes For the identified common genes, in-depth integrative analyses including functional annotation clustering analysis, protein-protein interaction analysis, gene-based association analysis, and plasma protein expression analysis were performed to test functional relevance of the identified genes to RA. a. Functional annotation clustering analysis: To gain insights into the biological functions of the identified genes, we performed GO and KEGG enrichment analysis using the Database for Annotation, Visualization and Integrated Discovery (DAVID) integrated database query tools (http://david.abcc.ncifcrf.gov/) (Huang da et al. 2009). b. Protein-protein interaction (PPI) analysis: To obtain functional evidence for identified genes and evaluate the connectivity between identified genes and the existing RA-associated genes identified in previous GWAS, we conducted PPI analysis in the Search Tool for the Retrieval of Interacting Genes (STRING) database (http://string-db.org/) (Franceschini et al. 2013). In our study, the STRING 9.1 online tool was used to screen the PPI with required confidence (combined score) >0.4.
c. Gene-based association analysis: In order to seek supplementary evidence regarding the significance of the above identified genes, we performed gene-based association analysis for the above identified genes, based on previous SNP-based genome-wide association study (GWAS) meta-analysis data of RA (Okada et al. 2014). Gene-based association analysis was carried out using Gene-based Association Test using ExtendedSimes (GATE) procedure modeled in KGG, with the resultant gene-based P value as a measure of statistical significance (Li et al. 2011). d. ELISA analyses of protein levels in plasma for differential expression of plasma proteins: For the identified genes, we also tested if the corresponding proteins had differential levels in plasma samples between RA cases and controls. Twenty-five RA patients were recruited from the First Affiliated Hospital of Soochow University. This study has been specifically approved by the ethics committee of the First Affiliated Hospital of Soochow University. All the patients were diagnosed as RA according to the American College of Rheumatology 1987/2010 revised criteria. According to disease severity classification of RA patients, serious disease activity is defined as DAS28 >5.10. The range of DAS28 in our patients is from 4.87 to 7.20. Thirteen age- and sex- matched healthy volunteers were included in this study. All subjects signed informed consent before entering the study. Plasma protein levels for PCBP1, PF4, P2RY6, JAG2, HLA-F and RNASEH2A were measured by using commercially available ELISA kits (Enzyme-linked Biotechnology Co., Ltd., Shanghai, China) according to the manufacturers’ protocols.
Immunogenetics
Results
results taken together showed that the identified genes from different sources of samples/datasets are largely different.
Identification of eligible gene expression datasets Identification of genes by analysis of joint effects As shown in Table 1, we identified seven eligible GEO datasets from six studies. These datasets were generated with six types of RA-associated tissue/cell types, including synovial tissue, synovial membrane, synovial macrophages, synovial fluid mononuclear cells (SFMCs), peripheral blood mononuclear cells (PBMCs), and peripheral blood cells. The six types were divided into two categories: synovium (GSE1919 (Ungethuem et al. 2010) (D1), GSE12021 (Huber et al. 2008) (D2), GSE1402SFMC (Barnes et al. 2004) (D3) and GSE10500 (Yarilina et al. 2008) (D4)), and blood cell (GSE15573 (Teixeira et al. 2009) (D5), GSE23561 (Grayson et al. 2011) (D6) and GSE1402PBMC (Barnes et al. 2004) (D7)). A total of 7502 genes, simultaneously profiled in all the 7 datasets, were retained for further analysis. Identification of genes by analysis of marginal effects The analysis results of approach 1-L, approach 1-T, and approach 1-F are shown in Table S1. Approach-F identified more overlapping genes than the other two methods. The number of overlapping genes, identified by approach 1-L and approach 1-T, were slightly different. For the datasets either from the same category or from different category, the number of overlapping genes among the top 100 ranked genes was very small. The numbers of genes identified by approach 1-L, approach 1-T, and approach 1-F for further analysis were 1, 2, and 21, respectively. The analysis results of approach 2 with FDR = 0.1 and 0.01 are shown in Table S2. Under different FDR cutoffs, the number of identified genes was greatly different. Under FDR = 0.1, the genes selected in the seven datasets were 392, 783, 2028 and 1035, 565, 5877, and 2360, respectively. The largest number of the overlapping genes in synovium datasets was 374 and 1851 in blood datasets. Simultaneously, in the seven datasets, the largest number of overlapping genes was 1580. But under the relatively stringent threshold FDR = 0.01, fewer genes were identified. The number of genes presented in at least three datasets was 1653 under FDR = 0.1 and 79 under FDR = 0.01. The analysis results of approach 3 are shown in Fig. S1. The heatmaps showed the correlation matrices between the datasets. Positive correlations are shown in red and negative correlations in green. The results produced by three correlation coefficients were similar. The correlations were weak among the same category of datasets, and much weaker among different types of tissues or cells. The clustering results showed that most datasets in the same categories were gathered together. Seven datasets were put in two clusters: D1, D2, D3, and D6 were put in a cluster; D4, D5, and D7 were put in another cluster. These
The Lasso approach was applied to each dataset separately. The tuning parameter was selected using five-fold cross validation. Numbers of genes identified in each dataset are shown in Table S3. No genes were found in common among any two sets of identified genes. Due to the constraint of sample size, the stability selection approach was just applied to four datasets (D2, D3, D5, and D7). Results with cutoff = 0.1 are summarized in Table S3. None of the genes were found overlapped across datasets too. Identification of the shared genes After multi-dataset analysis, we achieved 18 shared genes by our identification standard. For example, PPBP was selected through approach 1-F (D3-D4-D5) of marginal effect and both Lasso and Stability selection (D3) of joint effect. All details are showed in Table 2. Functional analysis for the identified shared genes a. Functional annotation clustering analysis: In the GO and KEGG enrichment analysis, only biological process categories were considered. All the enriched GO terms and KEGG pathways are listed in Table S4, including immune response (GO: 0006955), cell surface receptor linked signal transduction (GO: 0007166), and chemokine signaling pathway. b. PPI analysis: Compared with the known gene list from the GWAS Catalog (http://www.genome.gov/gwastudies/index. cfm?pageid=26525384#searchForm) that identified the RAassociated genes from previous association studies, the 18 genes shared across datasets are likely novel genes for RA. The genes (P < 1.00E-7) from GWAS Catalog dataset and 18 shared genes were put into PPI analysis together. PPI analysis (Fig. 2) showed that eight among the 18 genes interact with the known RA-associated genes in the whole network. The P values and the connection type between them are shown in Table S5. c. Gene-based association analysis: For the above eight novel genes, the gene-based association analysis result is presented in Table 3. The HLA-F showed significant P value in both European (P = 1.03E-31) and Asian (P = 1.80E-19). PCBP1 was significant in European only (P = 1.30E-2). d. Differential plasma protein levels in in-house case-control samples: Plasma samples from 25 RA patients (19 female and 6 male) (age range: 19–78 years) and 13 age- and sexmatched health controls were analyzed in this study. Previous studies have confirmed that the levels of plasma
Immunogenetics Table 1
Summary information of seven eligible datasets
Dataset
D1
Disease
Rheumatoid Rheumatoid arthritis arthritis Synovial tissues Synovial membrane 5:5 12:9
D2
D3
D4
D5
D6
D7
Juvenile rheumatoid arthritis SFMCs
Rheumatoid arthritis Synovial macrophages 5:3
Rheumatoid arthritis PBMCs
Rheumatoid Juvenile rheumatoid arthritis arthritis Peripheral blood PBMCs cells 6:9 20:11
Target cells Sample size GSE NO. GSE1919
GSE12021
GSE1402SFMC
GSE10500
GSE15573
GSE23561
GSE1402PBMC
Platform
GPL96
GPL8300
GPL8300
GPL6102
GPL10775
GPL8300
GPL91
15:11
18:15
Note: D3 and D7 from the same study GSE1402, the GSE1402SFMC and GSE1402PBMC are the GSE NO of two sub-groups SFMCs synovial fluid mononuclear cells, PBMCs peripheral blood mononuclear cells
S100A8 (mean ± SD 5.99 ± 0.88 vs. 1.92 ± 1.16 μg/mL, P < 1.00E-4) and PPBP (mean ± SD 98.00 ± 60.49 vs. 62.38 ± 30.41 ng/mL, P = 2.00E-2) were significantly higher in RA patients than health controls (Andres Cerezo et al. 2011; Karatoprak et al. 2013). So in our study, we only tested the protein level for the remaining six identified genes in plasma. As shown in Table 4, RA Table 2
patients have significantly lower levels of PCBP1 (mean ± SD 0.16 ± 0.08 vs. 0.32 ± 0.22 ng/mL, P = 2.60E-2) compared with controls. The plasma levels of PF4, P2RY6, and JAG2 (P = 1.29E-1, P = 2.50E-1, P = 9.90E-2, respectively) have no statistical significant difference between RA and controls. However, a trend of lower expression level in RA patients was observed.
Genes differentially expressed in multiple RA-related tissues/cells and detected significantly by both marginal and joint effect analyses
Gene symbol Chromosome Gene name
Marginal effect
Joint effect
PPBP
4q13.3
Approach 1-F (D3-D4-D5)
Lasso and Stability selection (D3)
GNB1
1p36.33
Approach2-FDR = 0.1 (D3-D5-D6-D7)
Lasso and Stability selection (D5)
TEX261 P2RY6
2p13.3 11q13.4
JAG2
14q32.33
Pro-platelet basic protein (chemokine (C-X-C motif) ligand 7) Guanine nucleotide binding protein (G protein), beta polypeptide 1 Testis expressed 261 Pyrimidinergic receptor P2Y, G-protein coupled, 6 Jagged 2
Approach 2-FDR = 0.1 (D4-D5-D6-D7) Lasso and Stability selection (D5) Approach 2-FDR = 0.1 (D1-D3-D4-D6) Lasso (D1) Approach 2-FDR = 0.1 (D3-D6-D7)
Lasso and Stability selection (D7)
PCBP1 2p13.3 RNASEH2A 19p13.2 S100A8 1q21.3
Poly (rC) binding protein 1 Ribonuclease H2, subunit A S100 calcium-binding protein A8
Approach 2-FDR = 0.1 (D5-D6-D7) Approach 2-FDR = 0.1 (D5-D6-D7) Approach 2-FDR = 0.1 (D3-D4-D5)
Lasso and Stability selection (D5) Lasso and Stability selection (D7) Lasso and Stability selection (D5)
PF4 HLA-F
4q13.3 6p22.1
Approach 2-FDR = 0.1 (D3-D6-D7) Approach 2-FDR = 0.1 (D1-D6-D7)
Lasso and Stability selection (D3) Lasso (D1)
HMGN4
6p22.2
Approach 2-FDR = 0.1 (D2-D6-D7)
Lasso and Stability selection (D2)
NUP62 PDK1
19q13.33 2q31.1
Approach 2-FDR = 0.1 (D3-D5-D6) Approach 2-FDR = 0.1 (D2-D3-D6)
Lasso and Stability selection (D5) Lasso and Stability selection (D3)
UTRN IFITM2
6q24.2 11p15.5
Approach 2-FDR = 0.1 (D3-D5-D7) Approach 2-FDR = 0.1 (D4-D6-D7)
Lasso and Stability selection (D3) Lasso (D6)
GNE
9p13.3
Approach 2-FDR = 0.1 (D1-D5-D6)
Lasso (D1)
NSFL1C CTSF
20p13 11q13.2
Platelet factor 4 Major histocompatibility complex, class I, F High mobility group nucleosomal-binding domain 4 Nucleoporin 62 kDa Pyruvate dehydrogenase kinase, isozyme 1 Utrophin Interferon induced transmembrane protein 2 (1-8D) Glucosamine (UDP-N-acetyl)-2-epimerase/N-acetylmannosamine kinase NSFL1 (p97) cofactor (p47) Cathepsin F
Approach 2-FDR = 0.1 (D3-D5-D6) Approach 2-FDR = 0.1 (D3-D4-D6)
Lasso (D6) Lasso (D6)
Note: Genes identified by both marginal effect analyses (by approach 1 and approach 2 FDR = 0.1) and joint analysis (by both lasso penalized estimate and stability selection) were regarded as the shared genes. In marginal analysis, genes significant in at least three datasets are selected. In joint analysis, genes significant in both the Lasso and the stability selection methods are selected. Due to the constraint of sample size, the stability selection approach could not be applied to the datasets D1, D4, and D6. In the case, only the Lasso results were considered
Immunogenetics
Fig. 2 PPI analysis for the GWAS Catalog RA-associated genes and 18 identified genes. Note: we first selected genes with P<1.00E-7 from GWAS Catalog as RA-associated genes, and then performed PPI
analysis between them and the 18 novel genes. The eight genes in the red frames are connected to known RA-associated genes. The colorful lines between genes indicate different connection types
HLA-F and RNASEH2A were not detected in plasma. Both are intracellular proteins and probably are not secreted.
candidate potential RA biomarkers. Functional annotation clustering analysis indicated that these genes mainly involved in immune response and chemokine signaling pathway, which are well-known biological process and pathway involved in the pathogenesis and development of RA. Further PPI analysis also indicated that eight genes (e.g., PPBP, PF4, HLA-F, S100A8, RNASEH2A, P2RY6, JAG2, and PCBP1) were connected with previously known RA genes. Gene-based association analysis also provided supportive evidence from DNA level. More importantly, differential protein levels were also detected and validated in independent in-house plasma samples for two proteins. The above results not only support the efficiency of our integrative analysis strategy, but also highlight the significance of the newly discovered genes as biomarkers for RA.
Discussion Identification of RA biomarkers is an important goal in RA research. If a differentially expressed biomarker exists in different functional cells/tissues of RA patients, it is more likely to play a critical role in pathogenesis of RA. This study represents the first effort in identifying RA biomarkers by using multi-dataset and integrative analysis strategy. The multidataset analysis detected a total of 18 shared genes as
Immunogenetics Table 3 The gene-based association analysis of the eight RA-associated genes after PPI analysis
Gene
Chr.
Europeans
Asians Number of related SNPs
Genebased P value
6
Min-P value of SNP
Genebased P value
Number of related SNPs 5
Min-P value of SNP
PPBP
4q13.3
2.50E-1
1.60E-1
1.90E-1
PF4 HLA-F
4q13.3 6p22.1
2.60E-1 1.03E-31
8 174
1.70E-1 2.10E-38
3.80E-1 1.80E-19
S100A8 RNASEH2A
1q21.3 19p13.2
9.60E-1 5.50E-1
9 24
6.40E-1 5.10E-2
9.90E-1 6.20E-1
8 4
9.40E-1 2.30E-1
P2RY6
11q13.4
8.10E-1
61
1.70E-2
5.90E-1
49
8.20E-2
6 140
1.90E-1 1.80E-1 1.30E-20
JAG2
14q32.33
8.90E-1
55
4.40E-2
6.90E-1
15
7.40E-2
PCBP1
2p13.3
1.30E-2
6
1.10E-3
1.90E-1
6
6.70E-2
Note: Min-P value of SNP: the smallest SNP-based P value Chr chromosome
The proportion of overlapped differential expression genes across multiple datasets is relatively low, as reflected by the marginal and joint effect analyses. The possible reasons are as follows. First, multiple genes, including minor and moderate effects, are involved in RA pathogenesis. Various cells/tissues cooperate but exert different roles and involve specific functional genes in RA pathogenesis. Second, from the point of statistics, even if there are shared RA-related genes, the probability to identify such genes simultaneously from up to seven datasets is much lower than the probability to identify it from a single dataset. Third, the original studies generating these microarray datasets are different in terms of sample characteristics (such as age, gender, and ethnic background), microarray platform, and data quality control procedure. In the present study, evidences gained from SNP, mRNA, and protein levels strongly support that PCBP1 is a novel biomarker for RA. The functions of PCBP1 mainly consist of
Table 4
regulating gene transcription, maintaining the stability of the mRNA, causing gene silence or promoting translation and serving as a partner iron ion. PCBP1 can regulate expression of many transcriptions, such as the WNT2B, WNT4, and WNT7B, in the Wingless gene family (Wingless and Int, WNT) (Huo et al. 2010). The formation and damage of the bone are one of the important clinical manifestations of RA. Mak et al. (2009) concluded that the contrasting anabolic and catabolic effects of glucocorticoids on the bone are, at least in part, mediated through the regulation of Wnt expression and its inhibitors in mature osteoblasts (Mak et al. 2009). However, the molecular mechanism on how PCBP1 is involved in RA is unknown and has yet to be studied in the future. With the best of our knowledge, PCBP1 is a novel biomarker for RA, but no other relevant reports have indicated that PCBP1 is associated with other autoimmune diseases (e.g., systemic lupus erythematosus). Additionally, PCBP1 is associated with several cancers including lung cancer (Li et al. 2016), thyroid carcinoma
Plasma protein levels for the identified shared genes in in-house samples by ELISA
Biomarkers
Protein type
Case (n = 25) Mean ± SD
Control (n = 13) Mean ± SD
P value
Reference
S100A8 (μg/mL)
S
5.99 ± 0.88
1.92 ± 1.16
<1.00E-4
(Andres Cerezo et al. 2011)
PPBP (ng/mL)
S
98.00 ± 60.49
62.38 ± 30.41
2.00E-2
(Karatoprak et al. 2013)
PCBP1 (ng/mL) PF4 (ng/mL) P2RY6 (ng/mL) JAG2 (ng/mL) HLA-F RNASEH2A
S S S S I I
0.16 ± 0.08 15.40 ± 7.24 170.60 ± 117.20 1.12 ± 0.73 0 0
0.32 ± 0.22 23.34 ± 16.96 213.60 ± 99.85 2.08 ± 1.88 0 0
2.60E-2 1.29E-1 2.50E-1 9.90E-2 / /
S secretary protein, I intracellular protein, Case rheumatoid arthritis patient, Control healthy control, Mean ± SD mean ± standard deviation; The P value is from t-test
Immunogenetics
(Zhang et al. 2016), squamous cell carcinoma (Xia et al. 2016), acute myeloid leukemia (Zhou and Tong 2015), Burkitt lymphoma (Wagener et al. 2015), prostate cancer (Chen et al. 2015), cervical cancer (Pathak et al. 2014), and so on. In recent years, anti-cyclic citrullinated protein/peptide antibodies have emerged as reliable biomarkers for RA. Peptidyl arginine deiminase (PAD) enzymes mediate the metabolism of citrullination in RA. We performed further analyses to test if the interactions exist between PCBP1 and PAD genes. Our analyses showed that just PADI2 among all PAD members was shared by all seven downloaded datasets. For the interaction between PCBP1 and PAD genes, the correlation coefficients are significant only in the RA patients of D1 dataset and the health control groups of D2. The online tool String has tested the interactions between PCBP1 and PAD genes (PADI1, PADI2, PADI3, PADI4, and PADI6), but unfortunately, no known interactions exist between them. HLA-F belongs to the HLA class I heavy chain paralogues. It has been recently reported to be a surface marker of activated lymphocytes (Lee et al. 2010). It also contributes to the regulation of immune system functions and the immune defense by cooperating with MHC-I open conformers (Goodridge et al. 2013). Lee et al. (2010) found that the MHC class I molecules (HLA-E, HLA-F, HLA-G, TAP, and TAPBP) were all over-expressed in the bone marrow cells of RA patients (Lee et al. 2011). As showed in our PPI network, HLA-F, HLA-G, MHC-G, and MHC are closely correlated with each other, suggesting the important role of HLA-F in the process of immune response. But unfortunately, HLA-F is not a secretary protein, and it was undetectable in plasma by ELISA. PPBP (CXCL7) and PF4 (CXCL4) are clustered in the BImmune response,^ as well as in BChemokine signaling pathway^ according to functional annotation. As we know, the two biological processes are closely related to the pathogenesis and development of RA. PPBP (CXCL7) and PF4 (CXCL4) are members of the CXC chemokine family. The level of beta-thromboglobulin (beta-TG) which is the alias of PPBP was found significantly higher in female rheumatoid arthritis patients compared with health population (Karatoprak et al. 2013). This protein of PF4 is chemotactic for numerous other cell types and also functions as an inhibitor of hematopoiesis, angiogenesis, and T-cell function. It also may contribute to designate the chronicity of synovitis in RA patients (Erdem et al. 2007). Its variants also play an important role in the regulation of platelet activation and systemic inflammation serum biomarkers (Bhatnagar et al. 2012). The protein encoded by S100A8 is a member of the S100 family of proteins containing two EF-hand calciumbinding motifs. Increased levels of calprotectin, a heterodimer of S100A8 and S100A9 subunits, in extracellular fluid were associated with numerous inflammation
pathological conditions, such as rheumatoid arthritis (Yui et al. 2003). The level of calprotectin that is released during activation and turnover of leukocytes (Johne et al. 1997) is strongly associated with disease activity in RA patients (Hammer et al. 2007). JC et al. suggested that S100A8 and S100A9 enhance the inflammatory response by inducing cytokine secretion of PBMCs (Simard et al. 2013). S100A8 and S100A9 are two major leukocyte proteins, known as damage-associated molecular patterns, which were found in the synovial fluid of RA patients with high concentrations (Kang et al. 2014). Their findings suggested S100A8/A9 as a valuable diagnostic and prognostic biomarker for RA. Our research on a variety of functional cells/tissues provides further evidence for S100A8 as a novel biomarker for RA. Although there is no literature reported P2RY6 directly associated with RA, several studies have proved that the P2Y6 signaling pathway plays a predominant role in mediating HNPinduced IL-8 production (Khine et al. 2006). And the role of interleukin (IL)-8 in the pathogenesis of joint and systemic inflammation in rheumatoid arthritis (RA) has been clearly demonstrated (Hwang et al. 2004; Terenzi et al. 2013). JAG2 is involved in the Notch signaling pathway and the Notch signal has a key role in the process of lymphocyte development, which can promote the formation of α/β T cells (Kageyama et al. 2008) and increase the number of B cells in the edge area of peripheral immune organ (Amsen et al. 2007). These immune cells more or less participate in the development process of RA, though the exact role of JAG2 in the pathogenesis of RA remains largely unknown.RNASEH2 is involved in the pathway of DNA replication. Although it has not been reported that this gene was associated with RA, its significant differential expression between RA patients and healthy controls imply its potential role in the pathogenesis of RA.
Conclusions In summary, our multi-dataset analysis of microarray studies provides a unique view on the similarity of expression profiles from various functional tissues/cells. Follow-up integrative analysis identified eight novel important shared genes which may be involved in the progression and development of RA. In addition, evidences from SNP, mRNA, and protein levels consistently and strongly supported that PCBP1 is the most promising RA biomarker. Further studies are necessary for validating the genes as predictors for RA, and for novel insights into their roles in development or progression of RA. Acknowledgments The study was supported by the Natural Science Foundation of China (81473046, 31401079, 81401343, 31271336, and 81373010), the Natural Science Foundation of Jiangsu Province
Immunogenetics (BK20130300), the Startup Fund from Soochow University (Q413900112, Q413900712), the Project funded by China Postdoctoral Science Foundation (2014M551649), and a Project of the Priority Academic Program Development of Jiangsu Higher Education Institutions. Compliance with ethical standards Conflict of interest The authors declare that they have no competing interests.
References Abeles AM, Pillinger MH (2006) The role of the synovial fibroblast in rheumatoid arthritis: cartilage destruction and the regulation of matrix metalloproteinases. Bull NYU Hosp Jt Dis 64:20–24 Alexander DH, Lange K (2011) Stability selection for genome-wide association. Genet Epidemiol 35:722–728. doi:10.1002/gepi.20623 Amsen D et al (2007) Direct regulation of Gata3 expression determines the T helper differentiation potential of Notch. Immunity 27:89–99. doi:10.1016/j.immuni.2007.05.021 Andres Cerezo L, Mann H, Pecha O, Plestilova L, Pavelka K, Vencovsky J, Senolt L (2011) Decreases in serum levels of S100A8/9 (calprotectin) correlate with improvements in total swollen joint count in patients with recent-onset rheumatoid arthritis. Arthritis Res Ther 13:R122. doi:10.1186/ar3426 Barnes MG et al (2004) Gene expression in juvenile arthritis and spondyloarthropathy: pro-angiogenic ELR+ chemokine genes relate to course of arthritis. Rheumatology (Oxford) 43:973–979. doi:10.1093/rheumatology/keh224 Bhatnagar P et al (2012) Genetic variants in platelet factor 4 modulate inflammatory and platelet activation biomarkers. Circ Cardiovasc Genet 5:412–421. doi:10.1161/CIRCGENETICS.111.961813 Chen Q, Cai ZK, Chen YB, Gu M, Zheng DC, Zhou J, Wang Z (2015) Poly r(C) binding protein-1 is central to maintenance of cancer stem cells in prostate cancer cells cellular physiology and biochemistry. Int J Exp Cell Physiol, Biochem Pharmacol 35:1052–1061. doi:10.1159/000373931 Criswell LA (2010) Gene discovery in rheumatoid arthritis highlights the CD40/NF-kappaB signaling pathway in disease pathogenesis. Immunol Rev 233:55–61. doi:10.1111/j.0105-2896.2009.00862.x Edgar R, Domrachev M, Lash AE (2002) Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 30:207–210 Erdem H, Pay S, Musabak U, Simsek I, Dinc A, Pekel A, Sengul A (2007) Synovial angiostatic non-ELR CXC chemokines in inflammatory arthritides: does CXCL4 designate chronicity of synovitis? Rheumatol Int 27:969–973. doi:10.1007/s00296-007-0317-6 Franceschini A et al (2013) STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res 41:D808–D815. doi:10.1093/nar/gks1094 Goodridge JP et al (2013) HLA-F and MHC-I open conformers cooperate in a MHC-I antigen cross-presentation pathway. J Immunol 191: 1567–1577. doi:10.4049/jimmunol.1300080 Grayson BL, Wang L, Aune TM (2011) Peripheral blood gene expression profiles in metabolic syndrome, coronary artery disease and type 2 diabetes. Genes Immun 12:341–351. doi:10.1038/gene.2011.13 Gusnanto A, Calza S, Pawitan Y (2007) Identification of differentially expressed genes and false discovery rate in microarray studies. Curr Opin Lipidol 18:187–193. doi:10.1097/MOL.0b013e3280895d6f Hammer HB et al (2007) Calprotectin (a major leucocyte protein) is strongly and independently correlated with joint inflammation and
damage in rheumatoid arthritis. Ann Rheum Dis 66:1093–1097. doi:10.1136/ard.2006.064741 Han Z, Boyle DL, Manning AM, Firestein GS (1998) AP-1 and NF-kappaB regulation in rheumatoid arthritis and murine collagen-induced arthritis. Autoimmunity 28:197–208. doi:10.3109/08916939808995367 Haupl T, Stuhlmuller B, Grutzkau A, Radbruch A, Burmester GR (2010) Does gene expression analysis inform us in rheumatoid arthritis? Ann Rheum Dis 69(Suppl 1):i37–i42. doi:10.1136/ard.2009.119487 Hu X, Daly M (2012) What have we learned from six years of GWAS in autoimmune diseases, and what is next? Curr Opin Immunol 24: 571–575. doi:10.1016/j.coi.2012.09.001 Huang da W, Sherman BT, Lempicki RA (2009) Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 4:44–57. doi:10.1038/nprot.2008.211 Huang J, Breheny P, Ma S (2012) A selective review of group selection in high-dimensional. Models Stat Sci 27:481–499. doi:10.1214/12-STS392 Huber R, Hummert C, Gausmann U, Pohlers D, Koczan D, Guthke R, Kinne RW (2008) Identification of intra-group, inter-individual, and gene-specific variances in mRNA expression profiles in the rheumatoid arthritis synovial membrane. Arthritis Res Ther 10:R98. doi:10.1186/ar2485 Huo LR et al (2010) Identification of differentially expressed transcripts and translatants targeted by knock-down of endogenous PCBP1. Biochim Biophys Acta 1804:1954–1964. doi:10.1016/j.bbapap.2010.07.002 Hwang SY, Kim JY, Kim KW, Park MK, Moon Y, Kim WU, Kim HY (2004) IL-17 induces production of IL-6 and IL-8 in rheumatoid arthritis synovial fibroblasts via NF-kappaB- and PI3-kinase/Aktdependent pathways. Arthritis Res Ther 6:R120–R128. doi:10.1186/ar1038 Johne B, Fagerhol MK, Lyberg T, Prydz H, Brandtzaeg P, NaessAndresen CF, Dale I (1997) Functional and clinical aspects of the myelomonocyte protein calprotectin. Mol Pathol 50:113–123 Kageyama R, Ohtsuka T, Shimojo H, Imayoshi I (2008) Dynamic Notch signaling in neural progenitor cells and a revised view of lateral inhibition. Nat Neurosci 11:1247–1251. doi:10.1038/nn.2208 Kang KY, Woo JW, Park SH (2014) S100A8/A9 as a biomarker for synovial inflammation and joint damage in patients with rheumatoid arthritis. Korean J Intern Med 29:12–19. doi:10.3904/kjim.2014.29.1.12 Karatoprak C, Uyar S, Abanonu GB, Pehlevan SM, Okuroglu N, Demirtunc R (2013) The levels of beta-thromboglobulin in female rheumatoid arthritis patients as activation criteria. Rheumatol Int 33: 1229–1232. doi:10.1007/s00296-012-2511-4 Karouzakis E, Neidhart M, Gay RE, Gay S (2006) Molecular and cellular basis of rheumatoid joint destruction. Immunol Lett 106:8–13. doi:10.1016/j.imlet.2006.04.011 Khine AA et al (2006) Human neutrophil peptides induce interleukin-8 production through the P2Y6 signaling pathway. Blood 107:2936– 2942. doi:10.1182/blood-2005-06-2314 Kim TH, Choi SJ, Lee YH, Song GG, Ji JD (2014) Gene expression profile predicting the response to anti-TNF treatment in patients with rheumatoid arthritis; analysis of GEO datasets. Joint Bone Spine 81: 325–330. doi:10.1016/j.jbspin.2014.01.013 Lee HM et al (2011) Abnormal networks of immune response-related molecules in bone marrow cells from patients with rheumatoid arthritis as revealed by DNA microarray analysis. Arthritis Res Ther 13:R89. doi:10.1186/ar3364 Lee N, Ishitani A, Geraghty DE (2010) HLA-F is a surface marker on activated lymphocytes. Eur J Immunol 40:2308–2318. doi:10.1002 /eji.201040348 Li J, Feng Q, Wei X, Yu Y (2016) MicroRNA-490 regulates lung cancer metastasis by targeting poly r(C)-binding protein 1
Immunogenetics tumour biology. J Int Soc Oncodevelopmental Biol Med. doi:10.1007/s13277-016-5347-9 Li MX, Gui HS, Kwan JS, Sham PC (2011) GATES: a rapid and powerful gene-based association test using extended Simes procedure. Am J Hum Genet 88:283–293. doi:10.1016/j.ajhg.2011.01.019 Mak W, Shao X, Dunstan CR, Seibel MJ, Zhou H (2009) Biphasic glucocorticoid-dependent regulation of Wnt expression and its inhibitors in mature osteoblastic cells. Calcif Tissue Int 85:538–545. doi: 10.1007/s00223-009-9303-1 Okada Y et al (2014) Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 506:376–381. doi:10.1038/nature12873 Paradowska-Gorycka A, Romanowska-Prochnicka K, Haladyj E, Manczak M, Maslinski S, Olesinska M (2014) Association of the Smad3 and NFATc2 gene polymorphisms and their serum levels with susceptibility to rheumatoid arthritis in Polish cohorts. Clin Exp Immunol. doi:10.1111/cei.12482 Pathak S, Bajpai D, Banerjee A, Bhatla N, Jain SK, Jayaram HN, Singh N (2014) Serum one-carbon metabolites and risk of cervical cancer. Nutr Cancer 66:818–824. doi:10.1080/01635581.2014.916318 Postlethwaite AE, Holness MA, Katai H, Raghow R (1992) Human fibroblasts synthesize elevated levels of extracellular matrix proteins in response to interleukin 4. J Clin Invest 90:1479–1485. doi:10.1172/JCI116015 Shi X, Shen S, Liu J, Huang J, Zhou Y, Ma S (2014) Similarity of markers identified from cancer gene expression studies: observations from GEO. Brief Bioinform 15:671–684. doi:10.1093/bib/bbt044 Simard JC, Cesaro A, Chapeton-Montes J, Tardif M, Antoine F, Girard D, Tessier PA (2013) S100A8 and S100A9 induce cytokine expression and regulate the NLRP3 inflammasome via ROSdependent activation of NF-kappaB (1.). PLoS One 8:e72138. doi:10.1371/journal.pone.0072138 Sirota M, Schaub MA, Batzoglou S, Robinson WH, Butte AJ (2009) Autoimmune disease classification by inverse
association with SNP alleles. PLoS Genet 5:e1000792. doi:10.1371/journal.pgen.1000792 Teixeira VH et al (2009) Transcriptome analysis describing new immunity and defense genes in peripheral blood mononuclear cells of rheumatoid arthritis patients. PLoS One 4:e6803. doi:10.1371/journal.pone.0006803 Terenzi R, Romano E, Manetti M, Peruzzi F, Nacci F, MatucciCerinic M, Guiducci S (2013) Neuropeptides activate TRPV1 in rheumatoid arthritis fibroblast-like synoviocytes and foster IL-6 and IL-8 production. Ann Rheum Dis 72:1107–1109. doi:10.1136/annrheumdis-2012-202846 Ungethuem U et al (2010) Molecular signatures and new candidates to target the pathogenesis of rheumatoid arthritis. Physiol Genomics 42A:267–282. doi:10.1152/physiolgenomics.00004.2010 Wagener R et al (2015) The PCBP1 gene encoding poly (rC) binding protein I is recurrently mutated in Burkitt lymphoma. Genes, Chromosome Cancer 54:555–564. doi:10.1002/gcc.22268 Xia S, Zhao Z, Xie F, He J, Li H (2016) Poly r (C) binding protein is posttranscriptionally repressed by MiR-490-3p to potentiate squamous cell carcinoma tumour biology. J Int Soc Oncodevelopmental Biol Med. doi:10.1007/s13277-016-5234-4 Yarilina A, Park-Min KH, Antoniv T, Hu X, Ivashkiv LB (2008) TNF activates an IRF1-dependent autocrine loop leading to sustained expression of chemokines and STAT1-dependent type I interferonresponse genes. Nat Immunol 9:378–387. doi:10.1038/ni1576 Yui S, Nakatani Y, Mikami M (2003) Calprotectin (S100A8/S100A9), an inflammatory protein complex from neutrophils with a broad apoptosis-inducing activity. Biol Pharm Bull 26:753–760 Zhang M, Wang X, Tan J, Zhao M, Lian L, Zhang W (2016) Poly r (C) binding protein (PCBP) 1 is a negative regulator of thyroid carcinoma. Am J Transl Res 8:3567–3573 Zhou M, Tong X (2015) Downregulated poly-C binding protein-1 is a novel predictor associated with poor prognosis in acute myeloid leukemia. Diagn Pathol 10:147. doi:10.1186/s13000-015-0377-y