Recent advances in DNA sequencing methods have made large-scale human genetic investigations a reality. This information is already beginning to illuminate human biology. By contrast, the research community has become increasingly circumspect as to when or whether we will be able to capture all of the protein information that is downstream of the genetic blueprint. Proteomic studies of human blood, in particular, have provided iterative confirmation that the complexity of protein analytes vastly exceeds that of DNA. A myriad of regulatory processes from translation to proteolytic degradation lead to dauntingly large numbers of proteins that need to be cataloged, and subtle modifications of the structures of these species make their unambiguous identification extremely challenging. Furthermore, the dynamic range of protein abundances can span more than 10 orders of magnitude in blood, and current technologies strain to document linearity of analytes beyond about 4 or 5 logs of variation.1
These obstacles have limited the number of reproducible reports that have successfully tied proteomic data to clinical phenotypes—particularly in the field of cardiovascular biology—and have biased most of what has been discovered toward very abundant species. So, is it time to throw in the towel on proteomics? Is the problem too big for our present toolkit, or is there any true hope on the horizon?
In this Viewpoint, we review several of the strategies currently used in proteomic biomarker research and highlight advances that are beginning to provide the first glimpses of how, in practical terms, proteomics may move beyond the early stages of development and be applied to large-scale, population-based studies.
Proteomic experiments can follow either a nontargeted or a targeted strategy. In a nontargeted experiment, one attempts to acquire information on every analyte in a mixture in an unbiased manner. The advantage of such an approach is that it maximizes the amount of information that one can gather from a sample and provides the opportunity for discovery of completely new protein species. The drawback is that such experiments generally have very low throughput owing to the need for upstream processing to reduce sample complexity prior to analysis.
In practice, most nontargeted or discovery studies are conducted using liquid chromatography (LC) coupled with tandem mass spectrometry (MS/MS). In a typical experiment in human serum or plasma samples, several dozen of the most abundant protein species are initially depleted using antibody-based techniques to simplify the dynamic range of concentrations in the sample. Proteins are then cleaved to peptides using proteases, and these peptides are separated by charge, size, or hydrophobicity using LC and injected into the mass spectrometer. Depending on the complexity of the starting material, several separation techniques can be combined upstream of the LC step in what is termed a multidimensional separation strategy. The tradeoff to each of these additional steps is a significant cost in experimental time and reproducibility. Once in the mass spectrometer, peptides are ionized and their mass-to-charge ratios are measured by their time of flight through a vacuum system. Peptides of particular interest can be selected for additional analysis by MS/MS and fragmented into sequence-informative product ions by collision with a neutral gas. Each peptide’s mass-to-charge ratio, relative intensity, retention time from the LC column, and MS/MS sequence information (if available) can be cross-referenced against large databases, and the peptide can be identified with varying degrees of confidence.
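The cleavage and mass-to-charge steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the workflow of any cited study: it assumes trypsin as the protease (cutting after lysine or arginine unless the next residue is proline) and uses standard monoisotopic residue masses. The function names `digest` and `peptide_mz` and the example sequence are hypothetical.

```python
# Monoisotopic residue masses in daltons (standard reference values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # one water per peptide (free N- and C-termini)
PROTON = 1.007276   # mass of each charge-carrying proton


def digest(sequence):
    """Assumed trypsin rule: cut after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in "KR" and not is_last and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])  # remaining C-terminal peptide
    return peptides


def peptide_mz(peptide, charge):
    """Mass-to-charge ratio of a peptide carrying `charge` protons."""
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral_mass + charge * PROTON) / charge


# Hypothetical sequence, used only to exercise the two functions.
for pep in digest("MKWVTFISLLR"):
    print(pep, round(peptide_mz(pep, 2), 4))
```

In a real identification pipeline, observed values like these would be compared against theoretical values for every candidate peptide in a sequence database, together with retention time and MS/MS fragment information.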
Significant advances in sample fractionation methods, MS instrumentation, and MS analysis packages are occurring. Whereas 5 years ago, one could identify approximately 1000 proteins in a single unbiased LC-MS/MS experiment in human blood, today we are closer to analyzing 5000 proteins in several days.2 More than 84% of the annotated protein-coding genes in the human genome have been identified by MS,3 and new methods to systematically characterize posttranslationally modified protein species are under way.
The major limitation of LC-MS/MS nontargeted proteomics at this point is throughput. We are not yet ready to apply unbiased MS methods to large population-based cohorts. As noted here, a single sample can take many hours of upfront preparation and on-instrument time. This bottleneck has focused interest on the development of new multiplexing strategies for proteomic discovery. Chemical bar-coding techniques can be used to label individual samples prior to their pooling. Tandem mass spectrometry is then leveraged to reassign the proteomic information to the original sample.4 This strategy has already led to nearly a 10-fold increase in throughput.
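The pooling-and-reassignment idea can be illustrated with a small Python sketch. It assumes that each sample carries a distinct tag whose reporter signal is read out in the MS/MS spectrum, and it simply maps each reporter channel back to its sample and converts intensities to fractions of the pooled signal. The channel labels, sample names, intensities, and the function name `demultiplex` are all hypothetical and are not taken from the cited work.

```python
def demultiplex(reporter_intensities, channel_to_sample):
    """Reassign pooled reporter signals to their original samples.

    reporter_intensities: {peptide: {channel: intensity}} from one pooled run.
    channel_to_sample:    {channel: sample name} recorded at labeling time.
    Returns {sample: {peptide: fraction of that peptide's total signal}}.
    """
    per_sample = {sample: {} for sample in channel_to_sample.values()}
    for peptide, channels in reporter_intensities.items():
        total = sum(channels.values())
        for channel, intensity in channels.items():
            sample = channel_to_sample[channel]
            per_sample[sample][peptide] = intensity / total if total else 0.0
    return per_sample


# Illustrative values only: three samples labeled, pooled, and run once.
channel_to_sample = {
    "tag_1": "patient_A",
    "tag_2": "patient_B",
    "tag_3": "patient_C",
}
pooled_run = {
    "PEPTIDE_X": {"tag_1": 200.0, "tag_2": 600.0, "tag_3": 200.0},
    "PEPTIDE_Y": {"tag_1": 50.0, "tag_2": 50.0, "tag_3": 100.0},
}
result = demultiplex(pooled_run, channel_to_sample)
print(result["patient_B"])
```

Because every sample in the pool shares the same preparation and instrument run, the on-instrument time per sample falls roughly in proportion to the number of tags, which is the source of the throughput gain.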
Instead of attempting to acquire all of the information contained within a sample, targeted proteomics focuses on a list of prespecified analytes. This allows for methods to be developed specifically to extract these predefined species from complex samples with less dependence on nonspecific fractionation techniques. This improves assay sensitivity as well as throughput.
Affinity reagents that enrich for a specific protein in a complex sample are central to targeted proteomics. Unfortunately, enzyme-linked immunosorbent assay (ELISA)-grade antibodies are simply unavailable for most proteins. When antibodies that are easier to produce than ELISA-grade reagents are used for enrichment, LC-MS/MS can take the place of the secondary antibody in a traditional ELISA and measure fragment ions derived from the enriched protein species with absolute specificity and minimal upfront fractionation.5,6 Using these strategies coupled with recent advances in targeted MS analysis software, it is now possible to evaluate 300 protein analytes in approximately 10 patient samples a day.
Newer proteomic strategies also integrate tools presently used in genetic research. In one such approach, antibodies that recognize target proteins of interest are labeled with short oligonucleotides. Only when cognate antibodies are brought into close proximity during substrate binding can 2 complementary oligonucleotide tags serve as a substrate for detection by quantitative polymerase chain reaction. This strategy has allowed for the parallel measurement of approximately 100 proteins.7 A second approach uses DNA aptamers, which are short oligonucleotides that bind protein targets with high affinity and specificity.8 Extremely large libraries of 10^14 aptamers can be generated at random using standard DNA technologies and iteratively screened for affinity to specific protein targets of interest. Following binding to their target proteins, aptamers can be processed and measured using commercially available DNA arrays. It is already possible for a single laboratory with expertise in proteomics to measure more than 1000 proteins that span 8 logs of dynamic range in several hundred patient plasma samples per week using these reagents.
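To put the throughput figures quoted for targeted MS and aptamer-based assays on a common scale, the short Python sketch below converts each into protein measurements (proteins multiplied by samples) per day. It is a rough comparison under stated assumptions: "several hundred" samples per week is taken as 300 purely for illustration, a 7-day week is assumed, and "more than 1000 proteins" is treated as exactly 1000, so the aptamer figure is a lower-bound style estimate. The function name is hypothetical.

```python
def measurements_per_day(proteins_per_sample, samples, days):
    """Protein measurements (proteins x samples) completed per day."""
    return proteins_per_sample * samples / days


# Targeted MS: 300 protein analytes in approximately 10 samples a day.
targeted_ms = measurements_per_day(300, 10, 1)

# Aptamer arrays: assumed 300 samples per week stands in for "several hundred".
ASSUMED_SAMPLES_PER_WEEK = 300
aptamer = measurements_per_day(1000, ASSUMED_SAMPLES_PER_WEEK, 7)

print(f"targeted MS: {targeted_ms:.0f} measurements/day")
print(f"aptamer:     {aptamer:.0f} measurements/day")
print(f"ratio:       {aptamer / targeted_ms:.1f}x")
```

Changing `ASSUMED_SAMPLES_PER_WEEK` shows how sensitive the comparison is to the unstated sample count.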
The Outlook for Proteomics
Advances in proteomic throughput and coverage have provided us with proof of principle that large-scale proteomic studies to illuminate human pathobiology are on the horizon. Contributions to this progress have come from a spectrum of scientific disciplines, including biochemistry, physics, computer science, and medicine. It is clear that no one individual can conquer the diverse analytical challenges inherent to proteomics. Akin to the progress that has catapulted genetics, we need to form investigative teams that span individual disciplines and research centers to tackle this problem. Moreover, pooling of relevant data and sharing of expertise will allow us to more accurately adjudicate our progress and will provide further optimism for this field.
Corresponding Author: Robert E. Gerszten, MD, Cardiovascular Medicine, Beth Israel Deaconess Medical Center, 330 Brookline Ave, Boston, MA 02115 (rgerszten@partners.org).
Published Online: April 27, 2016. doi:10.1001/jamacardio.2016.0279.
Conflict of Interest Disclosures: All authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Dr Sabatine reported receiving research grant support through Brigham and Women’s Hospital from Abbott Laboratories, Accumetrics, Amgen, AstraZeneca, Bristol-Myers Squibb, Critical Diagnostics, CVS Caremark, Daiichi-Sankyo, Eisai, Genzyme, Gilead, GlaxoSmithKline, Intarcia, Merck, Nanosphere, Roche Diagnostics, Sanofi, and Takeda. Dr Sabatine has received personal fees for consulting from Cubist, MyoKardia, Pfizer, Quest Diagnostics, Vertex, Zeus Scientific, as well as support from Poxel and Alnylam. No other disclosures were reported.
Disclaimer: Dr Sabatine is a Deputy Editor of JAMA Cardiology but was not involved in the editorial review or the decision to accept the manuscript for publication.
1. Anderson NL, Anderson NG. The human plasma proteome: history, character, and diagnostic prospects. Mol Cell Proteomics. 2002;1(11):845-867.
2. Keshishian H, Burgess MW, Gillette MA, et al. Multiplexed, quantitative workflow for sensitive biomarker discovery in plasma yields novel candidates for early myocardial injury. Mol Cell Proteomics. 2015;14(9):2375-2393.
4. Ross PL, Huang YN, Marchese JN, et al. Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol Cell Proteomics. 2004;3(12):1154-1169.
5. Kuhn E, Addona T, Keshishian H, et al. Developing multiplexed assays for troponin I and interleukin-33 in plasma by peptide immunoaffinity enrichment and targeted mass spectrometry. Clin Chem. 2009;55(6):1108-1117.
6. van den Broek I, Nouta J, Razavi M, et al. Quantification of serum apolipoproteins A-I and B-100 in clinical samples using an automated SISCAPA-MALDI-TOF-MS workflow. Methods. 2015;81:74-85.
7. Assarsson E, Lundberg M, Holmquist G, et al. Homogenous 96-plex PEA immunoassay exhibiting high sensitivity, specificity, and excellent scalability. PLoS One. 2014;9(4):e95192.
8. Gold L, Ayers D, Bertino J, et al. Aptamer-based multiplexed proteomic technology for biomarker discovery. PLoS One. 2010;5(12):e15004.