Combining machine learning and targeted mass spectrometry to validate protein isoforms
Why identify protein isoforms?
Alternative splicing plays an important role in the heart. In previous work, we found that the heart is among the examined tissues most affected by alternative splicing (link). Important heart proteins that have splice variants include tropomyosin 1 and titin, whose splicing has been implicated in congenital heart diseases.
A current blind spot of alternative splicing research is that most isoforms have been defined only at the mRNA level, and few technologies allow different protein isoforms to be detected. This matters if we want to know whether the spliced isoforms are correctly translated and what their potential molecular functions (localization, interactions) are. Isoforms can sometimes be distinguished by gel migration patterns, but they often have very similar molecular weights and so may not separate on a gel.
What is spectrum prediction?
We previously developed an "RNA-guided proteomics" approach to identify some candidate protein isoforms in the heart from proteomics data (link). However, identifying "non-canonical" peptide sequences (i.e., protein products not encoded by the most common/prominent version of a gene) from shotgun proteomics data can be prone to false positives and so requires careful validation. Targeted mass spectrometry could provide an avenue to verify isoform discovery, by allowing the isoform to be targeted and detected again in additional samples. But building targeted mass spectrometry assays is labor intensive and often requires expensive stable isotope-labeled peptide standards to verify peptide identity.
Several approaches have now been described that allow the fragmentation spectrum of a peptide to be predicted in silico. This could offer an easier way to build targeted mass spectrometry assays without using expensive peptide standards. These prediction approaches usually fall into two camps: a detailed physicochemical model that predicts peptide behavior, or a data-driven, deep learning-based approach that predicts the fragmentation pattern from existing experimental data. Prosit is one such deep learning algorithm that has been shown to perform exceptionally well in predicting the fragmentation spectra of peptides.
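For readers new to fragmentation spectra, here is a minimal sketch of the part of a spectrum that needs no prediction at all: the m/z positions of the singly charged b- and y-ions of an unmodified peptide, which follow directly from the amino acid masses. What tools such as Prosit add is the relative intensity of each of these peaks. The function name `fragment_mz` and the example peptide are illustrative assumptions and are not taken from our paper or from Prosit.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def fragment_mz(peptide):
    """Return dicts of singly charged b- and y-ion m/z values.

    Hypothetical helper: assumes an unmodified peptide written in
    one-letter code; modifications and higher charge states are ignored.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    # b-ions contain the first i residues (N-terminal fragments).
    b_ions = {f"b{i}": sum(masses[:i]) + PROTON for i in range(1, n)}
    # y-ions contain the last i residues plus water (C-terminal fragments).
    y_ions = {f"y{i}": sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)}
    return b_ions, y_ions


if __name__ == "__main__":
    b_ions, y_ions = fragment_mz("PEPTIDE")  # arbitrary example sequence
    for name, mz in {**b_ions, **y_ions}.items():
        print(f"{name}\t{mz:.4f}")
```

Two isoform peptides that differ by even one residue will share some of these fragment positions and differ in others, which is what makes the fragmentation pattern useful for telling them apart.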
Does this work with alternative peptides?
The current Prosit model was trained against a large library of synthetic peptides, and it was not completely clear whether it works well for alternative or novel isoform sequences. We tested this in our study by comparing Prosit predictions to a number of synthetic peptide standards for the isoform sequences we identified, then used the results to build a number of "computation-assisted" targeted mass spectrometry assays. We showed that these assays allow some candidate protein isoforms to be reliably re-identified in human heart tissue as well as cultured human AC16 cells, suggesting they have good potential to be employed for isoform quantification and functional studies.
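To give a feel for what comparing a predicted spectrum to a synthetic standard can look like, here is a minimal sketch that scores two sets of fragment ion intensities with the normalized spectral contrast angle, a similarity measure commonly used for this purpose. This is an illustration under stated assumptions and not the exact scoring pipeline from our paper: the function name `spectral_angle` and all intensity values below are made up.

```python
import math


def spectral_angle(predicted, observed):
    """Normalized spectral contrast angle between two spectra.

    Each spectrum is a dict mapping a fragment ion label (e.g. "y4")
    to a non-negative intensity. Returns a value from 0 (no shared
    signal) to 1 (identical relative intensities).
    """
    ions = sorted(set(predicted) | set(observed))
    p = [predicted.get(ion, 0.0) for ion in ions]
    o = [observed.get(ion, 0.0) for ion in ions]
    dot = sum(a * b for a, b in zip(p, o))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in o))
    if norm == 0:
        return 0.0  # at least one spectrum has no signal
    # Clamp to guard against floating-point values slightly outside [0, 1].
    cosine = max(0.0, min(1.0, dot / norm))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


if __name__ == "__main__":
    # Hypothetical intensities for illustration only.
    predicted = {"y3": 0.20, "y4": 1.00, "y5": 0.55, "b2": 0.30}
    observed = {"y3": 0.25, "y4": 0.95, "y5": 0.60, "b2": 0.20}
    print(f"spectral angle: {spectral_angle(predicted, observed):.3f}")
```

Because the score depends only on relative intensities, it is unaffected by differences in absolute signal between a predicted spectrum and a measured one.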
To read more, check out this new paper by Erin, Juliana, and others online at JMCC (link)!