An Extracting System of Accurate ORFs from cDNA Sequences

2002 
Conventionally, an amino acid frame display has been used to extract the amino acid sequence from a cDNA sequence. In each frame, the positions of initiation and termination codons are displayed, and every segment that starts at an initiation codon and ends at a termination codon is identified. These segments are treated as possible open reading frames (ORFs), and the longest of them is taken as the amino acid sequence extracted from the cDNA. When a frame-shift error exists in a cDNA sequence, however, the ORF is split and displayed across two frames. Moreover, because the border of the split ORF is not clear, the amino acid sequence is generally identified with an error of tens of bases.

It has been reported that statistical information contained in a DNA sequence, such as coding potential, can be used to identify cloning errors, including frame shifts [2]. Dr. Hirosawa demonstrated the application of a modified GeneMark program to the detection of artifacts in cDNA clones. The program issues a warning whenever it detects a spurious split of a protein-coding region. This method is effective for detecting split protein-coding regions. However, the limitations of statistical analysis make it difficult to determine the exact location of the frame shift. The most reliable way to identify frame-shift errors in a DNA sequence is to use similarity to known amino acid sequences. Methods that compare a cDNA sequence with amino acid sequences while allowing for frame-shift errors in the DNA have been developed, including FASTY [5] and the TRANS series developed by our laboratory [3]. Using
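The conventional frame-display procedure can be sketched in a few lines of Python. It scans each frame for initiation and termination codons, collects start-to-stop segments as candidate ORFs, and keeps the longest. This is a minimal illustration under stated assumptions, not the extracting system of this paper:

- It scans only the three forward frames.
- It treats ATG as the sole initiation codon.
- It uses the standard genetic code (NCBI table 1).
- The toy sequence and all function names are hypothetical.

The second half deletes a single base to show how one ORF ends up split across two frames.

```python
from __future__ import annotations

from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

START = "ATG"  # assumption: only ATG is treated as an initiation codon
STOPS = {"TAA", "TAG", "TGA"}


def orfs_in_frame(seq: str, frame: int) -> list[tuple[int, int]]:
    """ATG...stop segments in one forward frame as half-open (start, end) spans.

    Each span includes its stop codon. Once a segment is open, further in-frame
    ATGs are treated as internal methionines rather than as new starts.
    """
    spans = []
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None:
            if codon == START:
                start = i
        elif codon in STOPS:
            spans.append((start, i + 3))
            start = None
    return spans


def all_orfs(seq: str) -> dict[int, list[tuple[int, int]]]:
    """Candidate ORFs for each of the three forward frames (0, 1, 2)."""
    return {frame: orfs_in_frame(seq, frame) for frame in range(3)}


def longest_orf(seq: str) -> tuple[int, int, int] | None:
    """(frame, start, end) of the longest candidate ORF, or None if there is none."""
    best = None
    for frame, spans in all_orfs(seq).items():
        for start, end in spans:
            if best is None or end - start > best[2] - best[1]:
                best = (frame, start, end)
    return best


def translate(seq: str, start: int, end: int) -> str:
    """Translate seq[start:end] codon by codon, dropping the terminal stop symbol."""
    protein = "".join(CODON_TABLE[seq[i:i + 3]] for i in range(start, end, 3))
    return protein.rstrip("*")


if __name__ == "__main__":
    # Hypothetical toy cDNA: a single 9-codon ORF (ATG ... TAA) in frame 0.
    original = "ATGGCCGCCCTAAAAATGGCCGCCTAA"
    frame, start, end = longest_orf(original)
    print(frame, start, end, translate(original, start, end))  # 0 0 27 MAALKMAA

    # Simulate a sequencing error: delete one base (index 4) inside the ORF.
    mutated = original[:4] + original[5:]
    print(all_orfs(mutated))  # {0: [(0, 12)], 1: [], 2: [(14, 26)]}
    for frame, spans in all_orfs(mutated).items():
        for start, end in spans:
            print(frame, translate(mutated, start, end))  # "0 MAP", then "2 MAA"
```

After the deletion, the single frame-0 ORF is reported as two shorter pieces. One ends at a premature stop in frame 0 and reads "MAP"; the other starts at an internal ATG in frame 2 and reads "MAA". Neither matches the original "MAALKMAA". The true error position (index 4) is also not recoverable from the spans alone, which reflects the unclear border described above.

In real data the downstream piece may have no in-frame ATG at all, so a start-to-stop scan like this one can miss it entirely.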