In our previous article, "How to understand the WGCNA analysis in publications?", we introduced the basic concepts of WGCNA and explained how gene correlation coefficients are computed within the WGCNA framework. This article builds on that foundation and focuses on three aspects: gene module determination, module-trait association analysis, and the extraction of key candidate genes.
Next, a hierarchical clustering tree is built from the weighted correlation coefficients between genes. Each branch of the dendrogram corresponds to a gene module, and each module is labeled with its own color. Genes with similar expression patterns are grouped into the same module, which condenses thousands of genes into several dozen modules and makes the information easier to extract and summarize. In the figure below, Dynamic Tree Cut shows the initial module assignment based on the correlations, while Merged Dynamic shows the result after similar modules have been merged.
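To make the clustering step concrete, here is a minimal sketch in plain Python. It assumes a small, made-up gene-gene dissimilarity matrix (for example, one minus a weighted correlation) and uses average-linkage clustering with a fixed cut height. The function name, the toy matrix, and the cut height of 0.5 are all illustrative assumptions. A fixed-height cut is only a simplified stand-in: the Dynamic Tree Cut method adapts the cut to the shape of each branch.

```python
def average_linkage_clusters(dist, cut_height):
    """Agglomerative clustering with average linkage.

    dist: symmetric matrix (list of lists) of pairwise dissimilarities.
    Clusters are merged until the closest pair is farther apart than cut_height.
    """
    clusters = [[i] for i in range(len(dist))]

    def avg_dist(a, b):
        # Average dissimilarity over all cross-cluster gene pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = avg_dist(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        if best[0] > cut_height:
            break  # remaining clusters are too dissimilar to merge
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]


# Hypothetical dissimilarities for five genes (indices 0-4).
toy_dissimilarity = [
    [0.00, 0.10, 0.20, 0.85, 0.85],
    [0.10, 0.00, 0.15, 0.85, 0.85],
    [0.20, 0.15, 0.00, 0.85, 0.85],
    [0.85, 0.85, 0.85, 0.00, 0.15],
    [0.85, 0.85, 0.85, 0.15, 0.00],
]

modules = average_linkage_clusters(toy_dissimilarity, cut_height=0.5)
print(modules)
```

With this toy input, genes 0 to 2 end up in one module and genes 3 and 4 in another, which mirrors how branches of the dendrogram become modules.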
The key quantity in this analysis is the module eigengene, obtained by applying principal component analysis (PCA) to all genes within a module. The first principal component summarizes the overall expression pattern of the module's genes, so the module can be treated as a single entity whose expression value in each sample is given by its eigengene.
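The idea of an eigengene can be sketched as follows. This is not the WGCNA implementation. It is a small plain-Python illustration that standardizes each gene, finds the first principal component across samples by power iteration, and orients its sign to follow the average expression profile. The function names and the toy expression values are assumptions made for the example, and every gene is assumed to vary across samples.

```python
import math


def standardize(values):
    """Center to mean 0 and scale to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]  # assumes the gene is not constant


def module_eigengene(expr, n_iter=200):
    """expr: list of genes, each a list of expression values per sample.

    Returns a unit-length vector with one value per sample: the first
    principal component of the standardized module expression.
    """
    z = [standardize(row) for row in expr]
    n_samples = len(z[0])
    mean_profile = [sum(row[j] for row in z) / len(z) for j in range(n_samples)]

    # Start from the mean profile; fall back to a unit vector if it is all zeros.
    v = mean_profile[:]
    if all(abs(x) < 1e-12 for x in v):
        v = [1.0] + [0.0] * (n_samples - 1)

    for _ in range(n_iter):
        # One power-iteration step on Z^T Z: scores = Z v, then v = Z^T scores.
        scores = [sum(row[j] * v[j] for j in range(n_samples)) for row in z]
        v = [sum(z[i][j] * scores[i] for i in range(len(z))) for j in range(n_samples)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]

    # Orient the eigengene so it follows the average expression of the module.
    if sum(a * b for a, b in zip(v, mean_profile)) < 0:
        v = [-x for x in v]
    return v


# Hypothetical module of three co-expressed genes measured in four samples.
toy_module = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.5, 2.5, 3.0, 4.5],
]
eigengene = module_eigengene(toy_module)
print([round(x, 3) for x in eigengene])
```

Because all three toy genes rise from the first sample to the last, the resulting eigengene is negative in the first sample and positive in the last, summarizing the shared trend in a single profile.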
A correlation heatmap can then be drawn from the eigengenes of all modules. It has two parts. The upper part is a clustering tree of the modules based on their eigengenes, where the vertical axis is the dissimilarity between nodes and each module is marked by its own color. The lower part is a heatmap with the modules on both the horizontal and vertical axes, again marked by color. Each cell represents the correlation between two modules: deeper red indicates a stronger correlation, and lighter colors indicate a weaker one.
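The numbers behind such a heatmap can be sketched with a plain Pearson correlation between eigengenes. The module names and eigengene values below are made up for illustration, and using one minus the correlation as the dissimilarity for the clustering tree is an assumption of this sketch.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical eigengenes (one value per sample) for three modules.
eigengenes = {
    "blue": [-0.6, -0.2, 0.2, 0.6],
    "brown": [-0.5, -0.3, 0.3, 0.5],
    "turquoise": [0.6, 0.2, -0.2, -0.6],
}

# Module-by-module correlation matrix, i.e. the values shown in the heatmap cells.
module_corr = {
    a: {b: pearson(eigengenes[a], eigengenes[b]) for b in eigengenes}
    for a in eigengenes
}

# Dissimilarity used for the clustering tree above the heatmap (assumed: 1 - r).
module_dissimilarity = {
    a: {b: 1.0 - module_corr[a][b] for b in eigengenes} for a in eigengenes
}

for name, row in module_corr.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```

In this toy data, blue and brown are highly correlated and would sit close together in the tree, while turquoise is anti-correlated with both.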
Correlating module eigengenes with samples reveals which modules are associated with which samples. A module whose eigengene takes a large absolute value in a given sample, whether positive or negative, is closely connected to that sample.
In the sample-module correlation heatmap, samples are on the horizontal axis and modules on the vertical axis. Each cell quantifies the correlation between a sample and a module, with deeper red denoting stronger correlations and bluer tones weaker ones. This view helps identify the key modules associated with specific samples.
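As a rough sketch of how such a matrix might be read programmatically, the snippet below picks, for each sample, the module whose eigengene has the largest absolute value. It assumes that the eigengene value in a sample is used as the sample-module association score. The sample names, module names, and values are hypothetical.

```python
def strongest_module_per_sample(eigengenes, sample_names):
    """Return, for each sample, the module with the largest |eigengene| value.

    eigengenes: dict mapping module name -> list of values, one per sample.
    """
    result = {}
    for j, sample in enumerate(sample_names):
        result[sample] = max(eigengenes, key=lambda m: abs(eigengenes[m][j]))
    return result


# Hypothetical eigengene values for two modules across three samples.
samples = ["S1", "S2", "S3"]
toy_eigengenes = {
    "blue": [0.7, -0.1, -0.6],
    "brown": [-0.2, 0.8, -0.3],
}

key_modules = strongest_module_per_sample(toy_eigengenes, samples)
print(key_modules)
```

Note that sample S3 is assigned to the blue module because of a strongly negative value, which matches the point that both positive and negative extremes indicate a close connection.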
After key modules are selected through the sample-module correlation analysis, a challenge remains: each module contains many genes. The focus therefore shifts to finding the core genes, those most closely tied to the samples. Connectivity measures how strongly a gene is connected to other genes and is typically computed within a single module (kWithin). Hub genes are the genes with high connectivity (high k values) within a module, so they rank at the top of the connectivity list. The table below can be used to identify core genes by their kWithin values.
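Intramodular connectivity can be illustrated with a short sketch. Here kWithin for a gene is taken to be the sum of its adjacency values to the other genes in the same module. The gene names, module labels, and adjacency values are invented for the example.

```python
def k_within(adjacency, gene_names, module_of):
    """Sum each gene's adjacency to the other genes in its own module.

    adjacency: symmetric matrix (list of lists) of network adjacencies.
    module_of: dict mapping gene name -> module label.
    """
    k = {}
    for i, gene in enumerate(gene_names):
        k[gene] = sum(
            adjacency[i][j]
            for j, other in enumerate(gene_names)
            if j != i and module_of[other] == module_of[gene]
        )
    return k


def hub_gene(k, module_of, module):
    """Gene with the highest kWithin inside the given module."""
    members = [g for g in k if module_of[g] == module]
    return max(members, key=lambda g: k[g])


# Hypothetical adjacency matrix for four genes.
genes = ["g1", "g2", "g3", "g4"]
toy_adjacency = [
    [1.00, 0.80, 0.60, 0.10],
    [0.80, 1.00, 0.30, 0.20],
    [0.60, 0.30, 1.00, 0.05],
    [0.10, 0.20, 0.05, 1.00],
]
modules_by_gene = {"g1": "blue", "g2": "blue", "g3": "blue", "g4": "brown"}

connectivity = k_within(toy_adjacency, genes, modules_by_gene)
blue_hub = hub_gene(connectivity, modules_by_gene, "blue")
print(connectivity, blue_hub)
```

Sorting the genes of a module by this value gives the kind of ranking from which hub genes are read off.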
Once the core genes of a module are identified, the next step is to select the genes in the same module that are strongly associated with them. This is done by examining the network relationships between nodes within the module. In the table, the first two columns give a pair of genes, and the third column, labeled "weight," quantifies the strength of the connection between them. Two genes are typically considered correlated if their weight exceeds a predefined threshold, conventionally 0.15. Filtering for weights above this threshold therefore identifies the genes most closely linked to the core genes in the module.
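The filtering step can be sketched as below. The edge list stands in for a table with two gene columns and a weight column. The gene names and weights are hypothetical, and only the 0.15 threshold comes from the text above.

```python
WEIGHT_THRESHOLD = 0.15  # threshold mentioned in the text


def partners_of(core_gene, edges, threshold=WEIGHT_THRESHOLD):
    """Genes linked to core_gene by an edge whose weight exceeds the threshold.

    edges: list of (gene_a, gene_b, weight) tuples; edges are undirected.
    Returns (partner, weight) pairs sorted from strongest to weakest.
    """
    partners = []
    for gene_a, gene_b, weight in edges:
        if weight <= threshold:
            continue  # keep only edges strictly above the threshold
        if gene_a == core_gene:
            partners.append((gene_b, weight))
        elif gene_b == core_gene:
            partners.append((gene_a, weight))
    return sorted(partners, key=lambda pair: pair[1], reverse=True)


# Hypothetical edge table for one module.
toy_edges = [
    ("geneA", "geneB", 0.32),
    ("geneA", "geneC", 0.12),
    ("geneB", "geneC", 0.21),
    ("geneD", "geneA", 0.18),
]

core_partners = partners_of("geneA", toy_edges)
print(core_partners)
```

Here geneC is dropped as a partner of geneA because its weight of 0.12 is below the threshold, while geneB and geneD are kept.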
This article covered three parts of WGCNA analysis: gene module determination, module-trait association analysis, and the extraction of key candidate genes. In our next post, we will look at how WGCNA can be applied to multi-omics data.