Archives



Speaker: Mark McDowall, Micromass, Manchester, M23 9LZ, United Kingdom

Topic: An Integrated Approach to Automated High Throughput Protein Identification by 2D Gel Electrophoresis and Mass Spectrometry

Place: Building 426, Conference Room, NCI-Frederick, Frederick, MD

Time: Tuesday, March 21, 2000, at 2:00 PM

Abstract: Establishing the function of gene products is the major challenge of the post-genomic era. The rate-limiting step in this endeavour is the speed with which proteins can be isolated and identified.

Separation of proteins from cell lysates or sub-cellular domains by 2D gel electrophoresis is an established method of visualizing these complex systems. More recently, mass spectrometry has proved to be a powerful method of further characterizing these proteins. A 2D gel spot is digested enzymatically, the mass spectrum of the digest gives a digest map, and this map is compared with the theoretical maps calculated from the databases; the protein is identified when the two correlate. MALDI-TOF is of great benefit in these studies since it requires a minimal amount of sample, is relatively tolerant of salts and other contaminants arising from the gel, and may be configured for automated sample analysis. High-throughput automated analysis, including data processing and client-server database searching, is already available. Our system automatically acquires the data and processes the MALDI mass spectrum into a monoisotopic peak list. This peak list is then automatically sent to a networked database for protein identification.
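The digest-map comparison described above can be illustrated with a minimal sketch. It assumes trypsin as the enzyme (cleavage after K or R, but not before P), singly protonated monoisotopic masses, and a simple count of matching peaks as the score. The function names, the 0.2 Da tolerance and the two toy sequences are hypothetical and are not taken from the speaker's system.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_digest(seq):
    """Split a sequence after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def mh_mass(peptide):
    """Monoisotopic mass of the singly protonated peptide, [M+H]+."""
    return sum(MONO[a] for a in peptide) + WATER + PROTON


def count_matches(peaks, protein, tol=0.2):
    """Number of observed peaks within tol (Da) of a theoretical peptide mass."""
    theoretical = [mh_mass(p) for p in tryptic_digest(protein)]
    return sum(any(abs(peak - m) <= tol for m in theoretical) for peak in peaks)


def identify(peaks, database, tol=0.2):
    """Return the database entry whose theoretical digest map matches best."""
    return max(database, key=lambda name: count_matches(peaks, database[name], tol))


# Toy database with made-up entries, for illustration only.
toy_db = {"protA": "MKWVTFISLLFLFSSAYSR", "protB": "GLSDGEWQLVLNVWGK"}
# Simulated peak list: the theoretical digest map of protA.
peaks = [mh_mass(p) for p in tryptic_digest(toy_db["protA"])]
print(identify(peaks, toy_db))
```

A real search would also have to allow for missed cleavages, modifications and mass calibration error, and would normally use a statistical score rather than a raw match count.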

When proteins are not identified from the MALDI analysis, or an ambiguous result is obtained, further analysis of the sample by Electrospray CapLC-MS-MS is required. The development of a hybrid quadrupole orthogonal acceleration time-of-flight mass spectrometer has facilitated the generation of unambiguous amino acid sequences from the MS-MS analyses of tryptic peptides. These MS-MS spectra can be automatically searched against protein, nucleotide or EST databases, thus enabling protein identification from gel spots despite non-specific enzymatic cleavage, protein co-migration and post-translational modifications.

For organisms whose genome sequences are poorly represented in the databases, de novo amino acid sequencing may be required. Inferring de novo sequences from MS-MS data is complex and is often the rate-determining step in this method. However, it is now possible to interpret the MS-MS spectrum automatically. In our approach, the raw MS-MS spectrum is reduced to the plausible single-charge, monoisotopic mass spectrum. Sequence interpretation is achieved by generating "trial sequences" consistent with the experimentally determined molecular weight. A probabilistic fragmentation model is used to transform the trial sequences to predicted spectra for comparison to the single-charge, monoisotopic spectrum and to calculate the likelihood that the trial sequence would account for the observed data. The number of possible trial sequences is large: for a peptide of 10 residues, each of which may be any of the 20 naturally occurring amino acids, there are 20^10 (roughly 10^13) possible sequences. To reduce the scale of the problem, a terminated Markov Chain Monte Carlo algorithm is used to produce sequences. This Bayesian method simulates an exhaustive search of all sequences having the correct mass.
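To give a feel for the sampling idea, here is a toy Metropolis sampler. It is a sketch under strong simplifying assumptions and is not the algorithm described in the talk: the amino acid composition is fixed in advance, so every trial sequence automatically has the correct mass and only the residue order is sampled; the "likelihood" is simply the number of predicted singly charged b- and y-ions found in the spectrum; and the names, tolerance and temperature are hypothetical.

```python
import math
import random

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728

# Size of the unconstrained search space for a 10-residue peptide.
print(20 ** 10)


def fragment_masses(seq):
    """Singly charged b- and y-ion m/z values predicted for a trial sequence."""
    ions = []
    for i in range(1, len(seq)):
        b = sum(MONO[a] for a in seq[:i]) + PROTON
        y = sum(MONO[a] for a in seq[i:]) + WATER + PROTON
        ions.extend([b, y])
    return ions


def score(seq, peaks, tol=0.02):
    """Toy stand-in for a log-likelihood: predicted ions present in the spectrum."""
    return sum(any(abs(m - p) <= tol for p in peaks) for m in fragment_masses(seq))


def sample(start, peaks, steps=2000, temperature=0.5, seed=0):
    """Metropolis sampling over residue orderings; returns the best one visited."""
    rng = random.Random(seed)
    current = list(start)
    cur_score = score(current, peaks)
    best, best_score = current[:], cur_score
    for _ in range(steps):
        # Symmetric proposal: swap two positions (total mass is unchanged).
        i, j = rng.sample(range(len(current)), 2)
        proposal = current[:]
        proposal[i], proposal[j] = proposal[j], proposal[i]
        new_score = score(proposal, peaks)
        accept = new_score >= cur_score or rng.random() < math.exp(
            (new_score - cur_score) / temperature
        )
        if accept:
            current, cur_score = proposal, new_score
            if cur_score > best_score:
                best, best_score = current[:], cur_score
    return "".join(best), best_score


# Simulated spectrum from a made-up peptide; start from a scrambled ordering.
true_seq = "SAMPLE"
peaks = fragment_masses(true_seq)
start = "".join(sorted(true_seq))
print(sample(start, peaks))
```

Because only reorderings are proposed, this sketch sidesteps the harder part of the problem, which is exploring different compositions that share the measured mass. A realistic fragmentation model would also weight ion types and intensities rather than counting matches.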




Updated 11-January-2001
