+1(781)975-1541
support-global@metwarebio.com

Proteomics Biomarker Discovery: A Quantitative Workflow Guide

This concise guide walks through a start-to-finish proteomics biomarker discovery workflow, emphasizing quantitative proteomics choices that matter—study design and pre-analytics (plasma vs serum), discovery with DIA proteomics/label-free, proteomics QC before statistics, and closing the loop with PRM/MRM verification. If you’re building plasma proteomics (Blood DIA) cohorts and need a practical route from discovery to actionable candidates, start here.

 

Proteomics Biomarker Discovery Workflow: Discovery, Qualification, and Validation

In the literature, biomarker development is typically divided into three phases: discovery, qualification (screening), and validation. The validation phase is often further split into analytical validation and clinical validation. Because the discovery phase aims to identify a large pool of candidate biomarkers, it primarily relies on in-depth, non-targeted proteomics to identify and quantify as many proteins as possible. This phase typically yields dozens to hundreds of candidates, which are subsequently assessed during qualification and validation. The purpose of the screening/qualification phase is to confirm that target proteins exhibit statistically significant abundance differences between disease and control groups. The number of samples analyzed at this stage depends on disease complexity; typically, tens to hundreds of samples are used to verify differential abundance in the candidates. The goal of the validation phase is to confirm the practical utility of the biomarker assay; only a small subset (3 to 10) of the top candidates proceeds to analytical validation.

Figure 1. Phases of biomarker development [1]

 

What Are Biomarkers and Why Use Proteomics?

In 2001, a National Institutes of Health working group defined a biomarker as “a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention”. Based on this definition, biomarkers should meet two criteria: (1) they must be objectively measurable, and (2) they must be usable to evaluate a specific process in the human body.

Proteomics is the systems-level study of the protein composition of cells, tissues, organs, or organisms and how it changes. Proteomics reveals protein expression, post-translational modifications, interactions, and dynamics, thereby elucidating relationships between protein function and cellular activities to provide a comprehensive, protein-level understanding of cellular processes and disease onset and progression.

Proteins are produced when genes are transcribed and translated, and many are further altered by post-translational modification. As the direct executors of biological function, proteins are well suited to serve as biomarkers indicating physiological or pathological abnormalities. Protein biomarkers can be broadly categorized as physiological or pathological. Physiological biomarkers assess normal biological states, whereas pathological biomarkers support disease diagnosis, prognosis, and pharmacodynamic evaluation, and are widely used in clinical practice.

 

Selected clinical protein biomarkers [2][3][4]

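| Category | Name | Disease |
| --- | --- | --- |
| Tumor-related biomarkers | β2-microglobulin | Multiple myeloma; lymphoid malignancies |
| Tumor-related biomarkers | C-peptide | Insulinoma |
| Tumor-related biomarkers | Ferritin | Leukemia; pediatric teratoma |
| Tumor-related biomarkers | Bence Jones protein | Multiple myeloma |
| Tumor-related biomarkers | Immunoglobulins | Multiple myeloma; lymphoma |
| Tumor-related biomarkers | Ceruloplasmin | Liver, gastrointestinal, and pancreatic cancers |
| Tumor-related biomarkers | Thyroglobulin | Thyroid cancer |
| Tumor-related biomarkers | Pancreatic embryonic antigen | Pancreatic cancer |
| Tumor-related biomarkers | Serum proteins | Hepatocellular carcinoma; multiple myeloma |
| Tumor-related biomarkers | M-protein | Myeloma; lymphoma |
| Tumor-related biomarkers | Cytokeratin 19 fragment | Non-small cell lung cancer |
| Infection markers | C-reactive protein (CRP) | Bacterial and viral infections |
| Infection markers | Procalcitonin (PCT) | Bacterial infection; sepsis |
| Infection markers | Serum amyloid A (SAA) | Early identification of viral and bacterial infections |
| Infection markers | Interleukin-6 (IL-6) | Acute infection |
| Disease-related proteins | Amyloid-β (Aβ) | Alzheimer’s disease |
| Disease-related proteins | Tau protein | Alzheimer’s disease |
| Disease-related proteins | Apolipoprotein E (ApoE) | Alzheimer’s disease |
| Disease-related proteins | S100B | Traumatic brain injury |
| Disease-related proteins | GFAP | Traumatic brain injury |
| Disease-related proteins | UCH-L1 | Traumatic brain injury |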

 

Study Design for Biomarker Discovery

Biomarkers can be used for clinical screening, diagnosis, or monitoring of disease activity, and can guide targeted therapy or evaluate treatment response; their practicality and importance are well established. A complete biomarker discovery effort can be organized into six steps: discovery, qualification, verification, research assay optimization, clinical validation, and commercialization. The first three phases fall within the research domain—some investigators omit qualification and move directly to verification—whereas the latter three belong to the IVD domain.

Figure 2. Workflow for developing new protein biomarker candidates [5]

 

Sample Preparation

Just as the eyes are said to be the window to the soul, blood offers a window into the body: it circulates through every organ and reflects systemic physiology, which makes it one of the most frequently studied sample types in disease research. The following focuses on blood sample preparation.

Plasma
Plasma is the liquid fraction of anticoagulated blood that remains after the cellular components are removed by centrifugation; because clotting is prevented, fibrinogen and the other clotting factors are retained. The procedure is as follows: collect blood in tubes containing EDTA or sodium heparin, gently invert immediately to mix, centrifuge at 3000 rpm for 10 minutes at 4 °C, and transfer the supernatant (plasma) to a centrifuge tube.

Serum
Serum is the liquid fraction obtained after blood has clotted, so it is prepared without anticoagulants. The procedure is as follows: collect blood in serum-separator tubes containing a clot activator (commonly recommended brand: BD), allow the sample to clot at room temperature for 60 minutes (note: do not shake the tube), centrifuge at 3000 rpm for 10 minutes at 4 °C, and aliquot the supernatant (serum) into appropriate centrifuge tubes.
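
Both procedures specify centrifugation in rpm, but the force actually applied to the sample depends on the rotor radius. The sketch below uses the standard conversion RCF = 1.118 × 10^-5 × r (cm) × rpm^2 to show what 3000 rpm corresponds to on different rotors. The radii are illustrative assumptions, not values from the protocol; check the specification of your own centrifuge.

```python
def rpm_to_rcf(rpm: float, rotor_radius_cm: float) -> float:
    """Convert rotor speed (rpm) to relative centrifugal force (x g)."""
    return 1.118e-5 * rotor_radius_cm * rpm ** 2


def rcf_to_rpm(rcf: float, rotor_radius_cm: float) -> float:
    """Inverse conversion: rotor speed (rpm) needed to reach a target x g."""
    return (rcf / (1.118e-5 * rotor_radius_cm)) ** 0.5


if __name__ == "__main__":
    # Hypothetical rotor radii (cm), chosen only to illustrate the spread.
    for radius_cm in (8.0, 10.0, 15.0):
        print(f"3000 rpm at r = {radius_cm} cm -> {rpm_to_rcf(3000, radius_cm):.0f} x g")
```

Recording the g-force alongside the rpm value in the collection protocol makes the pre-analytical step reproducible across sites that use different centrifuges.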

Differences between plasma and serum
The coagulation process makes serum fundamentally different from plasma. The fibrin clot sequesters most of the fibrinogen present in plasma; consequently, removal of the clot yields a lower protein concentration in serum than in plasma, although the difference is only about 3–4%. Additional proteins may be removed through specific or nonspecific interactions within the fibrin clot. Although it is often assumed that many coagulation factors are removed during serum preparation, factors IX, X, XI, and VII/VIIa are in fact detectable in serum. The principal effects of coagulation are removal of the fibrin clot, platelets, red blood cells, and white blood cells, along with increases in the concentration of certain proteins in serum. Multiple studies indicate that levels of vascular endothelial growth factor (VEGF) are influenced by platelets in both serum and plasma, but the effect is more pronounced in serum.

In summary, plasma offers several advantages over serum for proteomics: (1) plasma sampling is simpler—parameters such as clotting time and centrifugation time during serum preparation can substantially affect the proteome; (2) the total protein concentration of serum is lower than that of plasma, and some proteins are removed during clotting; and (3) serum is more strongly affected by platelet-derived constituents.

 (See more: Blood Proteomics: Serum or Plasma – Which Should You Choose?)

 

Data Acquisition: TMT vs DIA vs LFQ

A key advantage of MS-based proteomics is the ability to quantify proteins across a broad dynamic range. Several quantitative approaches have been developed for bottom-up proteomics: label-free methods include DDA-based label-free quantitation (LFQ) and DIA, whereas labeled approaches include TMT and iTRAQ. The table below summarizes commonly used methods and their features.

 

Major MS-based proteomics techniques and characteristics

| Technique | Labeling | Instruments | Scan mode | Identification level | Quantitation level | Advantages | Disadvantages |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Label-free | None | Orbitrap Astral; timsTOF Pro2; Orbitrap Exploris 480; Q Exactive HF-X; … | DDA | MS2 | MS1 | Broad applicability | Lower quantitative accuracy and reduced identification depth |
| DIA | None | Orbitrap Astral; timsTOF Pro2; Orbitrap Exploris 480; Q Exactive HF-X; … | DIA | MS2 | MS2 | Broad applicability; comprehensive data; accurate quantitation | Complex data processing |
| TMT/iTRAQ | Labeled | Orbitrap Astral; Orbitrap Exploris 480; Q Exactive HF-X; … | DDA | MS2 | MS2 | Accurate identification; good reproducibility | Ratio compression (reduced sensitivity); reagent batch effects |
| PRM | Targeted quantitation | Orbitrap Astral; timsTOF Pro2; Q Exactive HF-X; … | PRM | MS2 | MS2 | High sensitivity and accuracy; absolute quantitation achievable | Low protein-level throughput |

 

Statistical Analysis and Candidate Filtering

Initial screening of differentially expressed proteins

Screening for differential proteins is often the most basic yet critical component of data analysis. Common methods include the t-test, fold-change analysis, and analysis of variance (ANOVA). Fold-change (FC) reflects the magnitude of between-group differences, whereas the p-value/FDR (from the t-test or ANOVA) indicates statistical significance. Accordingly, proteomics analyses typically combine FC with p-value/FDR as criteria for differential protein selection. This combination is applicable to two-group comparisons; for multi-group comparisons, ANOVA—using the p-value/FDR as the selection metric—is required.
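
The combined FC and p-value/FDR filter described above can be sketched in a few lines. This is a minimal illustration, not a prescribed pipeline: it assumes linear-scale intensities, takes per-protein p-values as already computed (for example from a t-test run elsewhere), and applies the Benjamini-Hochberg procedure for FDR control. The function names, the protein IDs, and the cutoffs (|log2 FC| ≥ 1, FDR ≤ 0.05) are hypothetical choices for the example.

```python
import math
from statistics import mean


def log2_fold_change(disease, control):
    """log2 ratio of group means; inputs are linear-scale intensities."""
    return math.log2(mean(disease) / mean(control))


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values (FDR) in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_differential(proteins, min_abs_log2fc=1.0, max_fdr=0.05):
    """proteins: dict name -> (disease_values, control_values, p_value)."""
    names = list(proteins)
    fdr = benjamini_hochberg([proteins[n][2] for n in names])
    hits = {}
    for name, q in zip(names, fdr):
        disease, control, _ = proteins[name]
        lfc = log2_fold_change(disease, control)
        if abs(lfc) >= min_abs_log2fc and q <= max_fdr:
            hits[name] = (lfc, q)
    return hits


if __name__ == "__main__":
    # Made-up intensities and p-values for four hypothetical proteins.
    example = {
        "P1": ([400, 420, 380], [100, 110, 90], 0.005),
        "P2": ([105, 95, 100], [100, 100, 100], 0.04),
        "P3": ([50, 55, 45], [100, 110, 90], 0.01),
        "P4": ([130, 120, 125], [100, 100, 100], 0.03),
    }
    for name, (lfc, q) in select_differential(example).items():
        print(f"{name}: log2FC = {lfc:+.2f}, FDR = {q:.3f}")
```

Adjusting p-values across all tested proteins before applying the fold-change cutoff keeps the FDR estimate tied to the full set of hypotheses rather than to the filtered subset.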

Biomarker selection strategies

Common strategies include: (1) identifying candidates using machine-learning algorithms; (2) selecting candidates based on protein expression levels combined with functional analysis; and (3) selecting candidates based on protein expression integrated with clinical phenotypic data. The first approach is relatively straightforward and therefore most frequently used.

Machine-learning workflows generally require two datasets—a training set and a validation set. These may originate from a single cohort or, respectively, from a discovery cohort and a validation cohort. The training set includes explicit group labels and is used to analyze quantitative proteomics data, select a candidate protein panel, and build a predictive model. The validation set is used to evaluate model performance, i.e., the classification accuracy of the candidate panel for the disease. Common algorithms include random forests, support vector machines, deep neural networks, and naïve Bayes; in practice, multiple algorithms are often combined for biomarker selection.
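
The training/validation separation can be illustrated with a deliberately simple stand-in for the algorithms named above: instead of a random forest or SVM, the sketch ranks single proteins by ROC AUC on the training samples only, then reports the chosen protein's AUC on held-out validation samples. All names, the 30% validation fraction, and the data layout are assumptions made for the example.

```python
import random


def auc(pos_scores, neg_scores):
    """ROC AUC: probability that a case scores above a control (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def stratified_split(labels, validation_fraction=0.3, seed=0):
    """Split sample indices into training/validation, preserving class balance."""
    rng = random.Random(seed)
    train, valid = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_valid = max(1, round(len(idx) * validation_fraction))
        valid += idx[:n_valid]
        train += idx[n_valid:]
    return sorted(train), sorted(valid)


def protein_auc(values, labels, indices):
    """AUC of one protein restricted to the given sample indices (1 = case)."""
    pos = [values[i] for i in indices if labels[i] == 1]
    neg = [values[i] for i in indices if labels[i] == 0]
    return auc(pos, neg)


def pick_and_validate(data, labels, seed=0):
    """data: dict protein -> list of abundances, one per sample.

    Returns (best_protein, training_auc, validation_auc).
    """
    train, valid = stratified_split(labels, seed=seed)
    # Selection sees training samples only, so the validation AUC is not
    # inflated by the choice of protein. Distance from 0.5 makes the ranking
    # direction-agnostic (up- or down-regulated in cases).
    best = max(data, key=lambda name: abs(protein_auc(data[name], labels, train) - 0.5))
    return (best,
            protein_auc(data[best], labels, train),
            protein_auc(data[best], labels, valid))
```

The point of the sketch is the ordering of steps: any feature selection or model tuning that touches the validation samples leaks information and overstates performance, whichever classifier is used.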

 

Validation Approaches (PRM, ELISA, etc.)

Once target proteins have been selected, multiple approaches can be used for validation. Common methods include MS-based PRM, antibody-based western blot (WB), and enzyme-linked immunosorbent assay (ELISA). Other techniques include immunohistochemistry (localization, qualitative assessment, and relative quantitation via chromogenic detection of labeled antibodies) and real-time quantitative PCR (indirectly verifying differential proteins via relative gene abundance). (See more: ELISA vs. Western Blot)

WB and ELISA rely on antigen–antibody pairs to quantify target proteins; however, many proteins lack well-validated commercial antibodies, and even when available, specificity and stability can be problematic. In addition, single-experiment throughput is low and costs are high. These limitations substantially constrain downstream validation in proteomics. The MS-based targeted detection approach—PRM—addresses these issues by eliminating the need for antibodies and enabling the simultaneous measurement of dozens of proteins in a single run.

Parallel reaction monitoring (PRM) detects target proteins using high-resolution mass spectrometers (e.g., Orbitrap Exploris 480). In PRM, peptide precursor ions are isolated by the first quadrupole (Q1), transmitted via the C-trap into the collision cell, fragmented by higher-energy collisional dissociation (HCD) to generate product ions, returned to the C-trap, and finally acquired in the Orbitrap to yield full MS/MS spectra for the target peptides.
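
Because Q1 isolates each target by its precursor m/z, a PRM method starts from an inclusion list of peptide m/z values. The sketch below computes monoisotopic precursor m/z from a peptide sequence using standard residue masses. It assumes unmodified peptides apart from optional carbamidomethylated cysteine (a common result of alkylation, assumed here rather than stated in the text); the function name and the example peptide are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565            # H2O added for the peptide termini
PROTON = 1.007276            # mass of one proton
CARBAMIDOMETHYL = 57.021464  # fixed modification on Cys (assumption)


def precursor_mz(peptide: str, charge: int = 2,
                 carbamidomethyl_cys: bool = True) -> float:
    """Monoisotopic m/z of a peptide precursor at the given charge state."""
    mass = WATER + sum(MONO[aa] for aa in peptide)
    if carbamidomethyl_cys:
        mass += CARBAMIDOMETHYL * peptide.count("C")
    return (mass + charge * PROTON) / charge


if __name__ == "__main__":
    # Illustrative sequence only; not a target from the article.
    for z in (1, 2, 3):
        print(f"PEPTIDE {z}+ : m/z {precursor_mz('PEPTIDE', z):.4f}")
```

In practice the inclusion list would also carry retention-time windows and the charge states actually observed in the discovery data; those are omitted here.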

Figure 3. Principle of PRM [7]

 

Case Studies from Clinical Research

Alcohol-related liver disease (ALD) is a common chronic liver disease that frequently progresses to cirrhosis and is a major indication for liver transplantation. Nearly 75% of ALD patients are diagnosed only after decompensated cirrhosis has developed, resulting in missed optimal treatment windows. The diagnostic accuracy of non-invasive biomarkers for early disease detection remains limited, severely hindering timely identification and intervention in high-risk populations. There is an urgent need for minimally invasive strategies to screen patients within high-risk groups.

This study performed proteomics analyses in ALD patients, healthy controls, and an independent ALD validation cohort, revealing marked proteome remodeling in both liver and plasma, and establishing the diagnostic value of circulating proteins for detecting hepatic fibrosis, inflammatory activity, and steatosis. Incorporating follow-up data further demonstrated the model’s prognostic performance for liver-related events and all-cause mortality, informing clinical diagnosis and decision-making.

Figure 4. Proteomics biomarker discovery workflow for blood plasma: DIA on an Evosep One coupled to an Orbitrap Exploris 480 [7]

 

Proteomics biomarker discovery workflow: blood plasma cohorts with pooled-plasma QC, automated sample prep (denaturation/reduction/alkylation, 95 °C; digestion, 37 °C), LC–MS/MS via EVOSEP One and Orbitrap Exploris 480 using DIA 21-min gradient with spectral library, analyzed in Spectronaut, Python/Jupyter, Perseus, STATA, and Cytoscape.

 

Common Pitfalls in Protein Biomarker Discovery

Biomarker programs often fail for avoidable reasons: uncontrolled pre-analytics (hemolysis, freeze–thaw cycles, clotting time) that confound plasma proteomics vs serum comparisons; hidden batch effects without pooled-QC tracking; choosing discovery chemistry misaligned to the question (e.g., expecting very low-abundance targets from shallow runs); misunderstanding isobaric labeling ratio compression; proceeding to modeling before QC and missing-value handling; data leakage and overfitting from feature selection outside nested cross-validation; relying only on internal CV without an external validation cohort; reporting significance without proper FDR control; skipping targeted verification (no PRM/MRM method, no isotopic standards); and attempting clinical claims without a clear discovery→verification→clinical validation plan. Recognizing these risks early—and documenting mitigations in your protocol—preserves power, improves reproducibility, and shortens the path from discovery to a usable biomarker panel.
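
Two of the checks above, pooled-QC tracking and missing-value handling before modeling, are easy to automate. The following sketch computes the coefficient of variation (CV) and missing fraction per protein across repeated pooled-QC injections and flags proteins that fail. The thresholds (CV ≤ 20%, at most 50% missing) and all names are illustrative assumptions, not acceptance criteria taken from this guide.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    return 100.0 * stdev(values) / mean(values)


def qc_report(pooled_qc, max_cv=20.0, max_missing=0.5):
    """pooled_qc: dict protein -> intensities across pooled-QC injections.

    Missing measurements are represented as None.
    """
    report = {}
    for protein, values in pooled_qc.items():
        observed = [v for v in values if v is not None]
        missing = 1 - len(observed) / len(values)
        # CV needs at least two observed values to be defined.
        cv = percent_cv(observed) if len(observed) >= 2 else None
        passed = cv is not None and cv <= max_cv and missing <= max_missing
        report[protein] = {"cv": cv, "missing": missing, "pass": passed}
    return report


if __name__ == "__main__":
    # Made-up pooled-QC intensities for three hypothetical proteins.
    demo = {
        "stable": [100, 102, 98, 101],
        "drifting": [100, 150, 60, 200],
        "sparse": [100, None, None, None],
    }
    for protein, row in qc_report(demo).items():
        print(protein, row)
```

Running such a report per batch, and plotting QC CVs in injection order, is one way to surface hidden batch effects before any differential analysis.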

 (See more: Metabolomics batch effects; Transcriptomics batch effects)

 

FAQ: Proteomics Biomarker Discovery Workflow

Q1. Is plasma better than serum for biomarker discovery?
For most clinical cohorts, plasma proteomics (Blood DIA) is preferred because it reduces platelet-driven artifacts from clotting and improves reproducibility.

Q2. When should I choose DIA over label-free DDA?
Use DIA proteomics for discovery in medium-to-large cohorts to minimize missing values and stabilize quantitation; label-free DDA remains a reasonable option for small, exploratory studies, with the caveat that its quantitative accuracy and identification depth are lower.

Q3. PRM vs MRM for verification—how do I decide?
PRM (HRMS) is flexible and specific for multiplexed verification; MRM (triple quad) is ideal for high-throughput routine assays once targets and methods are finalized.

Q4. How many candidates typically enter analytical validation?
A focused panel of about 3–10 proteins usually advances, balancing assay effort with clinical utility.

Q5. Can I combine MS-based discovery with antibody platforms (e.g., Olink, SomaScan)?
Yes—use them as orthogonal validation or to expand coverage while keeping the core MS discovery → PRM/MRM verification workflow for traceable quantitation.

Q6. Where can I find QC and ML best practices?
See Common Pitfalls in Protein Biomarker Discovery for QC acceptance, batch-effect control, and avoiding overfitting/data leakage—then apply those checks before modeling.

 

Conclusion: Outlook for Protein Biomarkers

Robust biomarkers for complex diseases can be measured in blood or other readily accessible biofluids—an observation supported by extensive precedent—and cancer screening, diagnosis, and treatment increasingly depend on more effective biomarkers. Genomics, transcriptomics, and proteomics together underpin the next generation of biomarker research.

The paucity of newly discovered, validated, and clinically deployed protein biomarkers has raised concerns about future success. Nevertheless, advances in methodology and technology are enabling a more coherent biomarker pipeline with a higher likelihood of success than in the past. Historically, one obstacle to proteomics-based biomarker discovery has been the detection of low-abundance proteins in blood. With technological progress, SomaScan now measures >10,000 proteins in blood samples, and Olink assays can quantify proteins at the pg/mL level. Sample throughput has also been a constraint, but Thermo Fisher’s recently released Orbitrap Astral can analyze up to ~180 samples per day while maintaining protein coverage, partially alleviating this bottleneck. With continued innovation, optimization of the biomarker discovery workflow, and improvements in robustness and reproducibility, increasing numbers of protein biomarkers are expected to support earlier disease screening and more effective evaluation of emerging therapies.

 

How We Can Help: DIA Proteomics, Blood DIA, and PTM Proteomics

MetwareBio supports end-to-end proteomics biomarker discovery with a practical, quantitative workflow: discovery by DIA proteomics or label-free, Blood DIA (plasma proteomics) for clinical cohorts, and PTM proteomics (e.g., phospho-, acetyl-, ubiquitin-proteomics) to capture mechanism-relevant signals. We provide curated reports, QC summaries, differential analysis, candidate shortlists, and PRM/MRM method recommendations for verification—plus optional external-cohort support. If you’re planning a study or need to translate candidates into a validated panel, contact us to tailor a proteomics biomarker discovery workflow to your samples, timeline, and endpoints.

 

References

[1] Nakayasu ES, Gritsenko M, et al. Tutorial: best practices and considerations for mass-spectrometry-based protein biomarker discovery and validation. Nat Protoc. 2021 Aug;16(8):3737-3760.

[2] Biomarkers Definitions Working Group. Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clin Pharmacol Ther. 2001 Mar;69(3):89-95. doi: 10.1067/mcp.2001.113989. PMID: 11240971.

[3] Puntmann VO. How-to guide on biomarkers: biomarker definitions, validation and applications with examples from cardiovascular disease. Postgrad Med J. 2009 Oct;85(1008):538-45. doi: 10.1136/pgmj.2008.073759. PMID: 19789193.

[4] Strimbu K, Tavel JA. What are biomarkers? Curr Opin HIV AIDS. 2010;5(6):463-6. doi: 10.1097/COH.0b013e32833ed177.

[5] Rifai N, Gillette MA, Carr SA. Protein biomarker discovery and validation: the long and uncertain path to clinical utility. Nat Biotechnol. 2006 Aug;24(8):971-83.

[6] Issaq HJ, Xiao Z, Veenstra TD. Serum and plasma proteomics. Chem Rev. 2007 Aug;107(8):3601-20.

[7] Rauniyar N. Parallel reaction monitoring: a targeted experiment performed using high resolution and high mass accuracy mass spectrometry. Int J Mol Sci. 2015 Dec 2;16(12):28566-81.

Copyright © 2025 Metware Biotechnology Inc. All Rights Reserved.
8A Henshaw Street, Woburn, MA 01801