
Beginner for KEGG Pathway Analysis: The Complete Guide

MetwareBio data analysis blog series

KEGG pathway annotation and enrichment analysis appear routinely in transcriptome, proteome, metabolome, and microbiome studies, and have become some of the most commonly presented results in high-throughput sequencing, protein, and metabolite analyses. Using up-to-date KEGG data matters for accuracy in omics studies, because both enrichment statistics and pathway visualization depend on the current annotation. In this guide, we walk through the key steps of KEGG pathway analysis: data preparation, analysis, and interpretation of results. You can also generate KEGG data analysis plots for free using our Metware Cloud Platform.

1. What is the KEGG database?

KEGG is a versatile resource across omics fields. It is widely used to annotate genes and proteins, allowing researchers to connect genes, proteins, and metabolites within the context of biological pathways. In proteomics, this helps in understanding complex interactions at the molecular level.

KEGG, the Kyoto Encyclopedia of Genes and Genomes, was developed by the Kanehisa laboratory in 1995. It has since grown into a comprehensive resource that is roughly divided into four categories: systems information, genomic information, chemical information, and health information. These are further subdivided into 15 major databases, of which KEGG PATHWAY and KEGG ORTHOLOGY are the core. KEGG PATHWAY is the most important and most commonly used of them. It is a large collection of pathway maps drawn manually by researchers on the basis of the published literature. KEGG PATHWAY maps are grouped into seven categories: Metabolism, Genetic Information Processing, Environmental Information Processing, Cellular Processes, Organismal Systems, Human Diseases, and Drug Development.

2. Exploring the KEGG Website: Tips and Tricks

On the homepage of KEGG, the entire interface is divided into four areas. The top is a search box, the left side contains descriptions of different modules, and the bottom contains an introduction to the database and all of its sub-links.

Beginner_for_KEGG_Pathway_Analysis,The_Complete_Guide_picture_1

In KEGG, the two most commonly used links are "KEGG PATHWAY" and "KEGG COMPOUND". Next, let's take a closer look at these two sections.

Beginner_for_KEGG_Pathway_Analysis,The_Complete_Guide_picture_2

3. Overview of KEGG Pathways: Decoding the Biological Roadmaps

In metabolomics and other multi-omics research, KEGG PATHWAY is an invaluable resource: in metabolomic data analysis it provides detailed insight into metabolic processes and the genes involved. Clicking "KEGG PATHWAY" on the KEGG homepage opens the pathway page. Below the search box, this page describes the pathway classification in detail, which comprises seven categories: ① Metabolism, ② Genetic Information Processing, ③ Environmental Information Processing, ④ Cellular Processes, ⑤ Organismal Systems, ⑥ Human Diseases, and ⑦ Drug Development.

In metabolomics or multi-omics research, the most commonly used category is Metabolism, which covers metabolites and the genes encoding the enzymes that act on them.

Beginner_for_KEGG_Pathway_Analysis,The_Complete_Guide_picture_3

To retrieve metabolic pathways of interest with KEGG PATHWAY, it helps to understand how KEGG names pathways. Each pathway identifier consists of a prefix of 2 to 4 letters followed by 5 digits. The specific encoding is shown in the following table.

Beginner_for_KEGG_Pathway_Analysis,The_Complete_Guide_picture_4
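As a rough illustration of the naming rule described above, the following Python sketch splits a pathway identifier into its letter prefix and 5-digit number. The function names are hypothetical, and the treatment of the prefixes "ec" and "rn" as reference pathways is an addition not covered in this article; any other prefix is simply assumed to be an organism code such as hsa.

```python
import re

# Assumed pattern, following the rule in the text: 2-4 letters, then 5 digits.
PATHWAY_ID = re.compile(r"^([a-z]{2,4})(\d{5})$")

# Prefixes treated here as reference pathways ("ec" and "rn" are an assumption
# added for illustration; the article only discusses "map" and "ko").
REFERENCE_PREFIXES = {"map", "ko", "ec", "rn"}


def parse_pathway_id(identifier):
    """Return (prefix, number) for a pathway identifier, or None if it does not match."""
    match = PATHWAY_ID.match(identifier.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


def pathway_kind(prefix):
    """Label a prefix as a reference pathway or an organism-specific pathway."""
    if prefix in REFERENCE_PREFIXES:
        return "reference pathway"
    return "organism-specific pathway"


for example in ["map00010", "hsa00010", "K00844"]:
    parsed = parse_pathway_id(example)
    if parsed is None:
        print(example, "is not a pathway identifier")
    else:
        print(example, "->", parsed, pathway_kind(parsed[0]))
```

Note that a KO entry such as K00844 is rejected by this pattern: it identifies an ortholog group, not a pathway map.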

In metabolomics or multi-omics studies, five identifier types are used most often: pathway (prefix map, or an organism code such as hsa), ko (prefix K), gene (prefixes vg, vp, and ag), compound (prefix C), and enzyme. Pathway and ko are two forms of a pathway. Since these two are the most frequently used, we will focus on them.

The pages linked with the prefix 'map' mainly include seven modules: 'name', 'pathway description', 'pathway classification', 'pathway map link', 'module', 'other database link', and 'related article link'.

Among these modules, the one we use most is the pathway map. Clicking the blue link text opens the map. In the pathway map, boxes represent enzymes, and clicking a box shows information on the genes that encode the enzyme. Circles represent metabolites, and clicking a circle shows information about the metabolite and related genes.

Beginner_for_KEGG_Pathway_Analysis,The_Complete_Guide_picture_6

The gene entry for an enzyme mainly contains the following fields: the KEGG gene number (K number), the gene symbol (Symbol), the gene name (Name), the pathways the gene belongs to (Pathway), the module (Module), the functional hierarchy (Brite), links to other databases (Other DBs), the corresponding and homologous genes identified so far in various species (Genes), and links to related articles (Reference, Authors, Title, Journal).

Beginner_for_KEGG_Pathway_Analysis,The_Complete_Guide_picture_7

A metabolite entry mainly contains the following fields: the compound number (C number), the compound name (Name), the molecular formula (Formula), the exact mass (Exact mass), the molecular weight (Mol weight), the molecular structure (Structure), the reactions the compound takes part in (Reaction), its metabolic pathways (Pathway), the associated enzymes (Enzyme), the functional hierarchy (Brite), and links or identifiers in other databases (Other DBs).

Beginner_for_KEGG_Pathway_Analysis,The_Complete_Guide_picture_8

4. KEGG vs GO: Looking for a Comparison?

KEGG and GO are both widely used for gene function annotation, but they serve different purposes. If you're wondering which one to choose—or whether to use both—we’ve prepared a dedicated article explaining their differences in detail.

Read: GO vs KEGG vs GSEA – Functional Enrichment Tools Compared

5. How to view the KEGG pathway map in the transcriptome

In transcriptome analysis, KEGG pathway analysis plays a pivotal role in understanding the biological functions of differentially expressed genes, and the KEGG pathway map is the most intuitive way to display those results. Transcriptome studies often involve thousands or even tens of thousands of genes, so we want to classify them and analyze genes with the same function together. This classification is achieved through gene annotation. When a reference genome is available, annotation is generally taken from the genome's existing annotation; for transcriptomes without a reference genome, it is obtained by aligning sequences against specific databases. Once genes are annotated against KEGG, the differentially expressed genes of a given comparison group can be mapped onto KEGG pathways and displayed graphically, which makes them easy to classify and inspect.

Beginner_for_KEGG_Pathway_Analysis,The_Complete_Guide_picture_9
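The mapping step described above can be sketched in a few lines of Python. This is only a minimal illustration: the gene names, the annotation dictionary, and the function name are hypothetical, and a real analysis would load the gene-to-pathway annotation from the KEGG annotation results of your own project.

```python
from collections import defaultdict


def genes_by_pathway(gene_to_pathways, de_genes):
    """Return {pathway_id: sorted list of differentially expressed genes annotated to it}."""
    de_set = set(de_genes)
    grouped = defaultdict(list)
    for gene, pathways in gene_to_pathways.items():
        if gene not in de_set:
            continue  # keep only differentially expressed genes
        for pathway in pathways:
            grouped[pathway].append(gene)
    return {pathway: sorted(genes) for pathway, genes in grouped.items()}


# Hypothetical annotation and differential gene list, for illustration only.
annotation = {
    "geneA": ["map00010", "map00620"],
    "geneB": ["map00010"],
    "geneC": ["map04110"],
}
de_list = ["geneA", "geneB"]

for pathway, genes in genes_by_pathway(annotation, de_list).items():
    print(pathway, genes)
```

Pathways with no differentially expressed gene (map04110 in this toy example) do not appear in the result, which mirrors the fact that only annotated differential genes are drawn on the map.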

In the KEGG pathway map, rectangular boxes represent enzymes (gene products), and circles represent metabolites. In the pathway map of differentially expressed genes, some boxes are marked in red, green, or blue. Red means that the genes annotated to that enzyme are up-regulated in the comparison group. Green means that they are down-regulated. Blue means that the box contains both up-regulated and down-regulated genes.
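The color rule above can be written as a small function. This is a sketch under the assumption that each box comes with the log2 fold changes of the differentially expressed genes annotated to it; the function name and the input format are hypothetical and not part of any specific tool.

```python
def box_color(log2_fold_changes):
    """Pick a box color from the log2 fold changes of the DE genes annotated to one enzyme box."""
    has_up = any(value > 0 for value in log2_fold_changes)
    has_down = any(value < 0 for value in log2_fold_changes)
    if has_up and has_down:
        return "blue"   # both up- and down-regulated genes map to this box
    if has_up:
        return "red"    # only up-regulated genes
    if has_down:
        return "green"  # only down-regulated genes
    return None         # no differentially expressed gene mapped to this box


print(box_color([1.8, 2.3]))   # up-regulated only
print(box_color([-1.2]))       # down-regulated only
print(box_color([1.5, -0.9]))  # mixed regulation
```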

6. How to Perform KEGG Pathway Analysis Step by Step

KEGG pathway analysis typically begins with a list of differentially expressed genes or metabolites. The first step is to ensure the correct format of your input data—most tools require Ensembl IDs or KEGG Orthology (KO) IDs, and mismatched formats (e.g., gene names instead of KO IDs) may result in errors.

Next, users must select the appropriate reference organism and gene background. Tools such as DAVID, clusterProfiler, and Metware Cloud Platform support multiple species and ID types, but inconsistencies in genome version or file formatting can impact outcomes.

Once input data is verified, enrichment analysis is performed using statistical models such as hypergeometric distribution. Visualization of significant pathways—especially KEGG maps—is key to interpretation. Most tools allow for interactive exploration of pathways, where boxes represent genes or enzymes and circles represent metabolites.

To simplify this process, platforms like Metware Cloud offer streamlined, automated analysis workflows, reducing technical barriers and improving reproducibility.

7. Importance of KEGG Enrichment Analysis in Biological Research

Enrichment analysis is a critical tool in multi-omics studies for identifying significant pathways. It reveals the molecular mechanisms underlying biological processes and offers insights that are not apparent from raw data alone. Even after differentially expressed genes have been classified by KEGG annotation, each comparison group typically still maps to dozens of pathways. We therefore perform enrichment analysis to identify the pathways that play a key role in the biological process under study. In addition, a pathway that is activated under particular experimental conditions is more convincing evidence than a simple list of genes or proteins.

Enrichment analysis is a statistical method that groups genes by shared function and tests whether a group is over-represented among the genes of interest. It is based on the hypergeometric distribution, and KEGG enrichment analysis uses a q-value below 0.05 as the threshold for significant enrichment. The hypergeometric formula is as follows:

Beginner_for_KEGG_Pathway_Analysis,The_Complete_Guide_picture_10

Here, N is the number of all genes annotated to the KEGG database, n is the number of differentially expressed genes annotated to the KEGG database, M is the number of genes annotated to a given pathway, and m is the number of differentially expressed genes annotated to that pathway.
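To make the calculation concrete, here is a minimal Python sketch that uses the same symbols N, n, M, and m. It computes the upper-tail probability of seeing at least m pathway genes among the n differential genes, which is the usual form of the hypergeometric enrichment test. The q-value step is an assumption: the article does not say which correction is used, so the sketch applies the Benjamini-Hochberg adjustment as one common choice. Function names are hypothetical.

```python
from math import comb


def enrichment_p_value(N, n, M, m):
    """P(X >= m) when n genes are drawn from N, of which M belong to the pathway."""
    upper = min(n, M)
    total = comb(N, n)
    tail = sum(comb(M, i) * comb(N - M, n - i) for i in range(m, upper + 1))
    return tail / total


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    count = len(p_values)
    order = sorted(range(count), key=lambda i: p_values[i])
    adjusted = [0.0] * count
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(count, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * count / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical counts for three pathways: (M, m), with shared N and n.
N, n = 20000, 500
pathways = {"pathway_1": (100, 12), "pathway_2": (80, 2), "pathway_3": (150, 9)}

names = list(pathways)
p_values = [enrichment_p_value(N, n, *pathways[name]) for name in names]
q_values = benjamini_hochberg(p_values)
for name, p, q in zip(names, p_values, q_values):
    print(name, p, q, "significant" if q < 0.05 else "not significant")
```

The exact integer arithmetic of `math.comb` avoids rounding problems for large N; dedicated statistics libraries offer faster equivalents for genome-scale runs.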

Moreover, KEGG Pathway Analysis is essential for studying metabolic regulation in neurological diseases, where disrupted metabolic pathways can contribute to disease pathology.

8. Common Mistakes in KEGG Pathway Interpretation

❌ Mistake Type	⚠ Description	✅ Suggested Fix
Wrong Gene ID Format	Using gene symbols instead of Ensembl or KO IDs	Convert IDs using standard tools (e.g., BioMart)
Ensembl ID with Version	e.g., ENSG00000123456.12 causes error	Remove version suffix (use ENSG00000123456)
Species Mismatch	Selected species doesn't match gene list	Check species and genome version compatibility
Improper Background File	Incorrect KO formatting or extra columns	Use correct file type: KO should be “K+number”
Formatting Errors	Special characters, empty rows, multiple sheets	Clean file and retain only one worksheet
No Overlap Between Target and Background	May occur due to incompatible IDs or too few entries	Ensure gene lists intersect and are species-matched
All p-values = 1	Usually due to target ≈ background size	Reduce target list to focus on differential genes
Irrelevant Pathways Shown	KEGG includes all species by default	Filter by organism before visualization
Mixed-color Boxes in Map	A box containing both up- and down-regulated genes is hard to interpret	Indicates mixed regulation within the gene family; inspect the individual genes
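Several of the checks in the table can be automated before submitting data. The following Python sketch shows one possible pre-check; the function names are hypothetical, and the 0.9 ratio used to flag a target list that is nearly as large as the background is an arbitrary illustrative threshold, not a rule from any tool.

```python
import re

KO_ID = re.compile(r"^K\d{5}$")          # KO identifiers: "K" followed by five digits
ENSEMBL_VERSION = re.compile(r"\.\d+$")  # trailing version suffix such as ".12"


def strip_ensembl_version(gene_id):
    """Remove a trailing version suffix, e.g. ENSG00000123456.12 -> ENSG00000123456."""
    return ENSEMBL_VERSION.sub("", gene_id.strip())


def invalid_ko_ids(ids):
    """Return the entries that are not in the K+number format."""
    return [entry for entry in ids if not KO_ID.match(entry.strip())]


def check_target_against_background(target, background):
    """Return a list of warnings about the target and background gene lists."""
    target_set, background_set = set(target), set(background)
    warnings = []
    if not target_set & background_set:
        warnings.append("no overlap between target and background")
    # Illustrative threshold: a target close to the background size tends to give p-values near 1.
    if background_set and len(target_set) >= 0.9 * len(background_set):
        warnings.append("target list is almost as large as the background")
    return warnings


print(strip_ensembl_version("ENSG00000123456.12"))
print(invalid_ko_ids(["K00844", "k00844", "K123"]))
print(check_target_against_background(["geneA"], ["geneB", "geneC"]))
```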

To reduce such errors, platforms like Metware Cloud offer data pre-checks and curated visualization pipelines tailored for KEGG outputs.

9. Conclusion

The intricate journey through the realms of transcriptome, proteome, metabolome, and microbiome analysis using KEGG pathway annotations and enrichment analysis underscores the pivotal role these methodologies play in unraveling the complex web of biological processes. The precision and depth offered by KEGG analysis facilitate a deeper understanding of high-throughput sequencing data, enabling researchers to make significant strides in scientific discovery and innovation.

Discover the frontier of biological data with MetwareBio. Our cutting-edge tools and databases unlock new insights in metabolomics, proteomics and multi-omics. Backed by leading-edge technologies and seasoned professionals, we're your partner for groundbreaking discoveries. Contact us to explore our innovative services as well as Metware Cloud Platform for seamless analysis of your multi-omics data.
