Identification of peptides within a known protein sequence using COMSEQ analysis of data containing multiple sequences

Steven D. Carson, Bruce Baggenstoss

1991

Abstract Modern methods of automated protein sequence analysis can provide high-quality data from which unambiguous amino-acid sequences can be determined, but analyses are more difficult when the sample is not pure. COMSEQ and auxiliary programs were written to facilitate reconciliation of multiple amino-acid sequences potentially contained in noisy data with the known amino-acid sequence of the parent protein. The COMSEQ program prints a matrix in which the first column represents the known amino-acid sequence of a selected protein. Each row of the matrix contains the sequencer yields of the amino acid named in the first column, with each subsequent column corresponding to one sequencing reaction cycle. A diagonal that contains a net increase in yield for each successive amino acid in the known sequence identifies a peptide potentially contained within the data. The number of matches for each diagonal over the entire known sequence is tabulated and presented as an aid to locating the comparisons of greatest interest. The RNDSEQ program conducts multiple analyses using randomized versions of the known amino-acid sequence and tabulates the cumulative frequencies of potential sequence matches irrespective of the true known sequence. TRANSEQ is a utility program that translates edited sequence data from common databases into files that can be used by COMSEQ and RNDSEQ. The programs have been used successfully to identify two co-sequenced peptides from bovine serum albumin, an albumin peptide sequenced in the presence of hemoglobin, and two sequences of rat α-2u-globulin that differ in their amino termini.
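
The diagonal comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the original COMSEQ or RNDSEQ code. The abstract does not state how a "net increase" is scored, so the sketch assumes it means that the yield of a residue in one cycle exceeds its yield in the previous cycle by more than a threshold. The function names, the data layout (one dictionary of yields per cycle), and the toy yields are all hypothetical.

```python
import random

def diagonal_matches(known, yields, threshold=0.0):
    """Count matches on each diagonal of the sequence-by-cycle matrix.

    known  : known protein sequence as one-letter codes
    yields : list with one dict per cycle, mapping residue -> yield
    A match (assumed criterion) is a rise in yield over the previous
    cycle that exceeds `threshold`. Returns {start_position: matches}.
    """
    counts = {}
    for start in range(len(known)):
        hits = 0
        for cycle, current in enumerate(yields):
            pos = start + cycle
            if pos >= len(known):
                break  # diagonal runs off the end of the known sequence
            aa = known[pos]
            previous = yields[cycle - 1].get(aa, 0.0) if cycle > 0 else 0.0
            if current.get(aa, 0.0) - previous > threshold:
                hits += 1
        counts[start] = hits
    return counts

def shuffled_match_distribution(known, yields, trials=100, seed=0, threshold=0.0):
    """Tally diagonal match counts over shuffled versions of `known`.

    Returns {matches_per_diagonal: number_of_diagonals}, a rough control
    for how often a given match count arises by chance.
    """
    rng = random.Random(seed)
    residues = list(known)
    tally = {}
    for _ in range(trials):
        rng.shuffle(residues)
        shuffled = "".join(residues)
        for hits in diagonal_matches(shuffled, yields, threshold).values():
            tally[hits] = tally.get(hits, 0) + 1
    return tally

# Toy data: two peptides co-sequenced for four cycles, with carryover.
known = "MKWVTFISLL"
yields = [
    {"M": 10.0, "F": 8.0},
    {"K": 9.0, "I": 7.0, "M": 3.0, "F": 2.0},
    {"W": 8.0, "S": 6.0, "K": 2.0, "I": 2.0},
    {"V": 7.0, "L": 6.0, "W": 2.0, "S": 1.0},
]
counts = diagonal_matches(known, yields)
control = shuffled_match_distribution(known, yields, trials=50)
```

In this toy case the diagonals starting at positions 0 and 5 each match in all four cycles, which marks the two candidate peptides. Comparing those counts with the shuffled-sequence tally gives a sense of whether a diagonal stands out from chance.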
