WGCNA (weighted gene co-expression network analysis), a widely used tool in transcriptomic research, has also become valuable for exploring correlations between data layers in multi-omics studies. A previous article covered 'Understanding WGCNA Analysis in Publications'. This piece explores how WGCNA can integrate metabolomic and transcriptomic data, and how it can be used to identify candidate genes within the transcriptome.
In multi-omics studies that combine transcriptomics and metabolomics, WGCNA helps uncover core genes that regulate key metabolites. Unlike the traditional approach, which correlates gene modules with samples, this approach correlates gene modules with metabolites. Researchers can thereby identify the modules most strongly correlated with the metabolites of interest and then select core genes within those modules.
For example, a study published in July 2021 in New Phytologist, titled 'Integrative Analyses of Metabolome and Genome-Wide Transcriptome Reveal the Regulatory Network Governing Flavor Formation in Kiwifruit (Actinidia chinensis)', used WGCNA to jointly analyze the metabolome and transcriptome. The research uncovered a regulatory network for kiwifruit flavor metabolites, revealing a novel mechanism for the transcriptional control of flavor compounds in kiwifruit.
Similarly, a research paper published online in December 2020 in Horticulture Research, titled 'Identification of Key Gene Networks Controlling Organic Acid and Sugar Metabolism During Watermelon Fruit Development by Integrating Metabolic Phenotypes and Gene Expression Profiles', used WGCNA to identify co-expressed gene modules, link them to metabolic traits, and identify key genes within these networks.
In the watermelon study, WGCNA of the transcriptome identified 11 distinct gene modules. Correlation analysis then assessed the relationship between sugars, organic acids, and these modules, and core (hub) genes were selected within the network. Comparing these hub genes with annotation information produced a shortlist of 23 candidate genes, which were verified by RT-qPCR (reverse transcription quantitative polymerase chain reaction) to pin down the key candidate genes.
The WGCNA approach for combining transcriptomics and metabolomics can be divided into three key steps: selecting critical metabolites, building gene modules and correlating them with those metabolites, and identifying candidate genes.
When running WGCNA on metabolites and transcriptomic data, it is advisable to limit the number of critical metabolites to around 30 (a practical rule of thumb, not a threshold backed by specific data). Too many metabolites make the later gene selection more complex. It is therefore worth screening first and filtering out metabolites that do not change significantly, have low abundance, or are irrelevant to the research objectives. Criteria for selecting core metabolites typically include:
a) Metabolites with the highest fold change and substantial abundance (e.g., ≥10^6, as a rough guideline).
b) Metabolites reported in the literature that can serve as positive controls.
c) Metabolites upstream and downstream in the same pathway whose abundances may change in a correlated way.
d) Metabolites specific to the species, tissue, or perturbation being studied.
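Criterion a) is the easiest to automate. The following is a minimal sketch, assuming a two-group design (control versus treatment) with replicate abundances per metabolite. The table contents, the 2-fold cutoff, and the function name screen_metabolites are all hypothetical and chosen only for illustration; the 10^6 abundance floor and the cap of 30 come from the guidelines above.

```python
import math

# Hypothetical abundance table: metabolite -> (control replicates, treatment replicates).
abundance = {
    "sucrose":     ([2.0e6, 2.2e6, 1.8e6], [8.0e6, 8.4e6, 7.6e6]),
    "citric_acid": ([5.0e6, 5.2e6, 4.8e6], [1.2e6, 1.0e6, 1.4e6]),
    "trace_x":     ([1.0e3, 1.2e3, 0.8e3], [9.0e3, 9.5e3, 8.5e3]),
    "malic_acid":  ([3.0e6, 3.1e6, 2.9e6], [3.2e6, 3.0e6, 3.1e6]),
}

MIN_ABUNDANCE = 1e6    # abundance guideline from the text
MIN_ABS_LOG2FC = 1.0   # assumed cutoff (2-fold change); adjust per study
MAX_METABOLITES = 30   # rule-of-thumb cap from the text


def mean(values):
    return sum(values) / len(values)


def screen_metabolites(table, min_abundance=MIN_ABUNDANCE,
                       min_abs_log2fc=MIN_ABS_LOG2FC, top_n=MAX_METABOLITES):
    """Return (name, log2 fold change) pairs, largest absolute change first."""
    kept = []
    for name, (control, treatment) in table.items():
        mean_control, mean_treatment = mean(control), mean(treatment)
        # Skip metabolites that cannot give a finite log ratio.
        if mean_control <= 0 or mean_treatment <= 0:
            continue
        # Drop low-abundance metabolites (abundant in neither group).
        if max(mean_control, mean_treatment) < min_abundance:
            continue
        log2fc = math.log2(mean_treatment / mean_control)
        # Drop metabolites that barely change between groups.
        if abs(log2fc) < min_abs_log2fc:
            continue
        kept.append((name, log2fc))
    kept.sort(key=lambda item: abs(item[1]), reverse=True)
    return kept[:top_n]


selected = screen_metabolites(abundance)
for name, log2fc in selected:
    print(f"{name}: log2FC = {log2fc:+.2f}")
```

A real screen would normally add a statistical test across replicates; this sketch ranks on fold change and abundance only, and criteria b) to d) still require manual curation.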
After the core metabolites are selected, WGCNA is run on the transcriptomic data to build gene modules, which are then correlated with the selected metabolites. This yields the modules most highly correlated with the core metabolites. KEGG enrichment analysis of these modules can identify significantly enriched pathways, which helps pinpoint the key modules.
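The module-to-metabolite correlation step can be sketched as follows. This is a simplified illustration, not the WGCNA implementation: WGCNA summarizes each module by its eigengene (the first principal component of the module's expression), whereas this sketch uses the mean of z-scored expression as a stand-in. The gene names, module assignments, and values are hypothetical, and module detection itself is assumed to have been done already.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (fails on constant input)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def zscore(values):
    """Standardize one gene's expression across samples."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return [(v - m) / sd for v in values]


def module_profile(expr, genes):
    """Mean z-scored expression per sample: a simple proxy for the eigengene."""
    scaled = [zscore(expr[g]) for g in genes]
    n_samples = len(scaled[0])
    return [sum(row[i] for row in scaled) / len(scaled) for i in range(n_samples)]


# Hypothetical expression values across six samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "geneB": [2.0, 4.1, 5.9, 8.2, 9.8, 12.1],
    "geneC": [9.0, 7.5, 6.2, 4.1, 3.0, 1.2],
    "geneD": [8.0, 7.0, 6.5, 4.4, 2.9, 2.0],
}
# Hypothetical module assignments (normally the output of module detection).
modules = {"blue": ["geneA", "geneB"], "brown": ["geneC", "geneD"]}
# Hypothetical metabolite abundances measured in the same six samples.
metabolites = {
    "sucrose":     [10.0, 21.0, 29.0, 42.0, 50.0, 61.0],
    "citric_acid": [60.0, 49.0, 41.0, 28.0, 22.0, 9.0],
}

correlations = {
    (module, metabolite): pearson(module_profile(expression, genes), values)
    for module, genes in modules.items()
    for metabolite, values in metabolites.items()
}

for (module, metabolite), r in sorted(correlations.items()):
    print(f"{module:6s} {metabolite:12s} r = {r:+.2f}")
```

Modules whose profile correlates strongly (positively or negatively) with a core metabolite are the ones to carry forward into enrichment analysis and candidate gene selection. In practice a p-value for each correlation would be reported alongside r.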
Various methods can be employed to select candidate genes:
a) Hub Gene Selection: Genes with high connectivity (k value) within a module often sit at the center of the regulatory network. These hub genes, which frequently include transcription factors, play a crucial role in regulation and are a focus for further analysis.
b) Functional Gene Analysis: Genes with high connectivity in a module, such as transcription factors, are often located upstream in the regulatory network. Conversely, genes with low connectivity are usually downstream and can include functional enzymes and other genes regulated by the central hub genes. Genes in the same module whose expression patterns resemble those of the hub genes may have similar functions, which allows the discovery of structural genes closely associated with the hub genes and the metabolites of interest.
c) Association Analysis with Target Genes: Known genes can be collected from the existing literature and previous experimental results, and then used to locate the gene modules they belong to. Correlation analysis can then identify unknown genes that are highly correlated with both the known genes and the critical metabolites. Predicting the functions of these unknown genes from the functions of the known genes can lead to the discovery of ‘new genes’ related to the phenotypes of interest, providing valuable targets for further investigation and validation.
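Methods a) and c) can be illustrated with a short sketch. It assumes an unsigned network, where the adjacency between two genes is the absolute Pearson correlation raised to a soft-thresholding power beta, and a gene's intramodular connectivity k is the sum of its adjacencies to the other genes in the module. The value beta = 6, the gene names, the expression values, and both function names are hypothetical and chosen only for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (fails on constant input)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def intramodular_connectivity(expr, genes, beta=6):
    """k for each gene: sum of |cor|^beta to every other gene in the module."""
    return {
        g: sum(abs(pearson(expr[g], expr[h])) ** beta for h in genes if h != g)
        for g in genes
    }


def rank_by_known_gene(expr, genes, known_gene):
    """Rank the other module genes by |cor| with a known (literature) gene."""
    scores = [(g, pearson(expr[g], expr[known_gene]))
              for g in genes if g != known_gene]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical expression of one module's genes across six samples.
module_expr = {
    "TF1":   [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "enzA":  [1.1, 2.3, 2.9, 4.2, 4.8, 6.1],
    "enzB":  [0.9, 1.8, 3.2, 3.9, 5.3, 5.8],
    "geneX": [3.0, 1.0, 4.0, 2.0, 6.0, 3.5],
}
genes = list(module_expr)

# a) Hub gene selection: the gene with the highest connectivity.
connectivity = intramodular_connectivity(module_expr, genes)
hub_gene = max(connectivity, key=connectivity.get)
print("hub gene:", hub_gene)

# c) Association with a known gene: here enzA is assumed to be known
#    from the literature, and the rest are ranked against it.
ranked = rank_by_known_gene(module_expr, genes, "enzA")
for gene, r in ranked:
    print(f"{gene}: r = {r:+.2f}")
```

In a real analysis the soft-thresholding power would be chosen from the data rather than fixed, and top-ranked genes would be cross-checked against annotation and against their correlation with the critical metabolites before validation.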