+1(781)975-1541
support-global@metwarebio.com

A Guide to Protein Database Selection

Proteomics relies on protein databases for protein identification: theoretical spectra are predicted from the database sequences and compared with the measured mass spectrometry data. Protein databases are therefore the foundation of proteomic analysis, and their completeness and accuracy directly affect the quality of the final proteomic data.

 

 

Figure 1. The principle of protein identification

 

Human Proteome

Compared to other species, research on the human proteome is relatively well-established, with the commonly utilized database provided by UniProt. Within UniProt, there are three sub-databases dedicated to the human proteome: UniProtKB/Swiss-Prot (referred to as Swiss-Prot), Proteome (UP000005640), and UniProtKB (comprising Swiss-Prot and TrEMBL). These databases differ in terms of protein count, accuracy, and annotation depth.

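| UniProt | Type | Total Number of Protein Sequences | Number of Unique Protein Sequences |
|---|---|---|---|
| UniProtKB | Swiss-Prot (Reviewed) | 20,404 | 20,330 |
| UniProtKB | TrEMBL (Unreviewed) | 186,900 | |
| UniProtKB | Total | 207,304 | 182,025 |
| Proteome | Swiss-Prot (Reviewed) | 20,389 | |
| Proteome | TrEMBL (Unreviewed) | 61,448 | |
| Proteome | Total | 81,837 | 81,579 |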
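The gap between the total and unique counts can be checked on any FASTA download by counting entries and distinct sequences. The sketch below assumes UniProt-style headers (`>sp|...` for Swiss-Prot, `>tr|...` for TrEMBL) and runs on a tiny made-up FASTA string; the helper name `count_fasta` is illustrative, not part of any library.

```python
from collections import Counter

def count_fasta(fasta_text):
    """Return (entries per source, total entries, unique sequences)."""
    per_source = Counter()
    sequences = []
    current = None  # sequence lines of the entry currently being read
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                sequences.append("".join(current))
            current = []
            # Assumed header layout: "sp|..." (reviewed) or "tr|..." (unreviewed)
            per_source[line[1:].split("|", 1)[0]] += 1
        elif current is not None:
            current.append(line)
    if current is not None:
        sequences.append("".join(current))
    return per_source, len(sequences), len(set(sequences))

# Made-up toy records, not real UniProt entries.
toy_fasta = """\
>sp|X00001|TOY1_HUMAN Toy protein 1
MKTAYIAK
QRQISFVK
>tr|X00002|X00002_HUMAN Toy protein 2
MKTAYIAKQRQISFVK
>tr|X00003|X00003_HUMAN Toy protein 3
MSHFSRQL
"""

per_source, total, unique = count_fasta(toy_fasta)
print(dict(per_source), total, unique)
```

In the toy data the first two records share the same sequence, so the unique count is one lower than the total, which is the same effect the table shows at a much larger scale.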

 

Swiss-Prot stands out among these databases. It is a high-quality, manually curated, non-redundant database whose entries are drawn primarily from published research and from computational analyses validated by E-value. Only entries that meet its quality standards are included, which is why it is classed as reviewed.

 

UniProtKB/TrEMBL, in contrast, consists of protein sequences translated automatically from nucleotide sequences, with computationally generated annotation and classification that has not been manually curated. This database is categorized as unreviewed.

Proteomes contain protein information translated and annotated from nucleotide sequences of whole-genome sequenced species, with each dataset assigned a Unique Proteome Identifier (UPID).
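As a rough illustration of how the three human sequence sets differ in practice, the sketch below builds one FASTA download URL per set. It assumes the UniProt REST `stream` endpoint and the query fields `organism_id`, `reviewed`, and `proteome`; these should be checked against the current UniProt API documentation before use. The helper `fasta_url` is hypothetical, and nothing is downloaded here.

```python
from urllib.parse import urlencode

# Assumed endpoint of the UniProt REST API; verify before relying on it.
BASE_URL = "https://rest.uniprot.org/uniprotkb/stream"

HUMAN_TAXON_ID = 9606        # NCBI taxonomy ID for Homo sapiens
HUMAN_UPID = "UP000005640"   # human proteome identifier quoted in the text

def fasta_url(query):
    """Build a FASTA download URL for a UniProtKB query string."""
    return BASE_URL + "?" + urlencode({"format": "fasta", "query": query})

urls = {
    # Reviewed human entries only
    "Swiss-Prot": fasta_url(f"organism_id:{HUMAN_TAXON_ID} AND reviewed:true"),
    # Entries belonging to the human reference proteome
    "Proteome": fasta_url(f"proteome:{HUMAN_UPID}"),
    # All human entries, reviewed and unreviewed
    "UniProtKB": fasta_url(f"organism_id:{HUMAN_TAXON_ID}"),
}

for name, url in urls.items():
    print(name, url)
```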

 

1) Comparison of Detected Protein Count

To assess the influence of different databases on the quality of proteomic data, Metware conducted searches using human cell proteomic data (divided into groups A and B, each with 4 replicates) across various databases. Subsequently, the qualitative and quantitative results of protein identification were evaluated.

 

Searches against the Swiss-Prot, Proteome, and UniProtKB databases identified 100,821, 100,723, and 99,597 peptides, respectively. The corresponding numbers of identified proteins were 7,798, 8,158, and 8,452. The proteins and peptides identified in all three searches accounted for 85.97% and 87.21% of the identifications, respectively.
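The text does not say exactly how the shared proportion was calculated. One plausible reading is the number of identifications found in all three searches divided by the number found in at least one search. The sketch below uses that definition as an assumption, with made-up accessions rather than the study data.

```python
def shared_fraction(*id_sets):
    """Fraction of all identifications (union) that appear in every search."""
    union = set().union(*id_sets)
    if not union:
        return 0.0
    common = set.intersection(*map(set, id_sets))
    return len(common) / len(union)

# Made-up protein accessions for three hypothetical searches.
swiss_prot = {"P1", "P2", "P3", "P4"}
proteome = {"P1", "P2", "P3", "P5"}
uniprotkb = {"P1", "P2", "P3", "P5", "P6"}

frac = shared_fraction(swiss_prot, proteome, uniprotkb)
print(f"Shared across all three searches: {frac:.2%}")
```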

 

 

Figure 2. Differences in Protein Detection Data Across Different Databases for Cell Samples

 

2) Comparison of Missing Values

Quantitative missing values were then compared across the databases. The pattern of missing values across samples was consistent for all three databases, but Proteome and UniProtKB showed higher overall missing-value rates than Swiss-Prot, giving Swiss-Prot a slight advantage.
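A per-sample missing-value rate of the kind compared here can be computed as the share of quantified proteins with no intensity in that sample. The sketch below is a minimal illustration under that assumption; the intensity table is made up and the function name is hypothetical.

```python
def missing_rate_per_sample(intensities):
    """intensities maps protein -> list of values (None = missing), one per sample.

    Returns the fraction of proteins with a missing value in each sample.
    """
    rows = list(intensities.values())
    if not rows:
        return []
    n_proteins = len(rows)
    n_samples = len(rows[0])
    return [
        sum(row[i] is None for row in rows) / n_proteins
        for i in range(n_samples)
    ]

# Made-up intensities for four proteins across four replicates.
toy = {
    "P1": [1.0e6, 1.2e6, 0.9e6, 1.1e6],
    "P2": [3.0e4, None, 2.5e4, None],
    "P3": [None, None, 8.0e3, None],
    "P4": [5.0e5, 4.8e5, 5.1e5, 5.3e5],
}

rates = missing_rate_per_sample(toy)
print(rates)
```

In this toy table the low-intensity proteins (P2, P3) account for all the missing values, which mirrors the interpretation given in the summary.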

 

Figure 3. Missing Values in Protein Detection across Different Databases for Cell Samples

 

Summary:

a) Compared to the UniProtKB/Swiss-Prot database, searching against the Proteome and UniProtKB databases increased the number of identified proteins only modestly, by about 4.6% and 8.4%, respectively.

b) Compared to the Proteome and UniProtKB databases, the Swiss-Prot database contains the fewest sequences and identified the fewest proteins. However, it yielded the highest number of identified peptides, suggesting that its protein sequences are more accurate and match the spectra better.

c) As database size increased, the proportion of missing values among quantified proteins also increased, suggesting that many of the additionally identified proteins have low ion intensity and may be false positives.
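The relative increases in point a) can be recomputed from the protein counts reported for the three searches. The short sketch below uses only those counts; the function name is illustrative.

```python
def relative_increase(baseline, value):
    """Percentage increase of value over baseline."""
    return (value - baseline) / baseline * 100

# Identified protein counts reported for each database search.
proteins = {"Swiss-Prot": 7798, "Proteome": 8158, "UniProtKB": 8452}

base = proteins["Swiss-Prot"]
for name in ("Proteome", "UniProtKB"):
    print(f"{name}: +{relative_increase(base, proteins[name]):.2f}% vs Swiss-Prot")
```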

 

In summary, while the Swiss-Prot database may have slightly fewer identified proteins, it offers superior qualitative accuracy and quantitative stability. Overall, for human cell samples, it is recommended to use the Swiss-Prot database for proteomic analysis.



Copyright © Metware Biotechnology Inc. All Rights Reserved.
8A Henshaw Street, Woburn, MA 01801