High-resolution differential melting curves of φX174 Y1 and Y2 restriction fragment DNAs, for which the base sequences were known, were measured at various sodium ion concentrations ranging from 195 to 2.3 mM. The curves were resolved into component peaks, and the change in the melting temperature, the change in the area, and the change in the breadth of each peak with change in salt concentration were examined. The locations of the melting regions corresponding to the peaks in the melting curves were assigned based on theoretical calculations of melting curves and stability maps. It was found that as the salt concentration was decreased from the high to the intermediate range, the breadths of the peaks on the low-temperature side decreased whereas those on the high-temperature side remained almost constant, and also the separation between the peaks along the temperature axis increased. Changes in the positions of peaks relative to one another were interpreted in terms of the difference in the free energy increase between a loop state and an end-coil state as the salt concentration decreased.
We propose a tetrahedral Gray code that facilitates visualization of genome information on the surfaces of a tetrahedron, where the relative abundance of each k-mer in the genomic sequence is represented by the color of the corresponding cell of a triangular lattice. For biological significance, the code is designed such that the k-mers corresponding to any adjacent pair of cells differ from each other by only one nucleotide. We present a simple procedure to draw such a pattern on the development surfaces of a tetrahedron. The tetrahedral Gray code constructed in this way can demonstrate evolutionary conservation and variation of the genome information of many organisms at a glance. We also apply the tetrahedral Gray code to the honey bee (Apis mellifera) genome to analyze its methylation structure. The results indicate that the honey bee genome exhibits CpG overrepresentation in spite of its methylation ability and that two conserved motifs, CTCGAG and CGCGCG, in the unmethylated regions are responsible for the overrepresentation of CpG.
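The adjacency property described above (neighboring k-mers differ by exactly one nucleotide) can be illustrated in one dimension. The following is a minimal sketch of a reflected base-4 Gray code over k-mers. It is not the tetrahedral construction itself, which arranges cells on a triangular lattice; the function names `kmer_gray_code` and `hamming` are hypothetical and chosen only for this illustration.

```python
def kmer_gray_code(k, alphabet="ACGT"):
    """Return all k-mers in a reflected Gray order.

    Consecutive k-mers in the returned list differ at exactly one position.
    This is a linear (1D) ordering used only to illustrate the adjacency
    property; it is NOT the tetrahedral layout.
    """
    if k == 0:
        return [""]
    shorter = kmer_gray_code(k - 1, alphabet)
    ordered = []
    for i, letter in enumerate(alphabet):
        # Reverse every other block so that the suffix is unchanged
        # when the leading letter changes at a block boundary.
        block = shorter if i % 2 == 0 else shorter[::-1]
        ordered.extend(letter + suffix for suffix in block)
    return ordered


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    codes = kmer_gray_code(2)
    print(codes)
    print(all(hamming(a, b) == 1 for a, b in zip(codes, codes[1:])))
```

Reversing alternate blocks is what keeps the suffix fixed across block boundaries, so only the leading nucleotide changes there.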
A cDNA clone for the nitrate/nitrite-inducible cytochrome P-450 (P-450) of the fungus Fusarium oxysporum (tentatively termed P-450dNIR) was isolated by an immunoscreening method. Sequence determination revealed a polypeptide of 403 amino acid residues (Mr = 44,371), which was shown to contain the full-length sequence of the fungal P-450. The amino terminus region of the predicted sequence contained neither the signal-like, hydrophobic domain that is commonly observed in microsomal P-450s nor the tagging prosequence that is essential for localization of mitochondrial P-450s. Further, the sequence exhibited higher homologies against those of soluble bacterial P-450s, in particular P-450s of Streptomyces, rather than those of eukaryotic P-450s including yeast and fungal P-450s. These results are highly indicative that P-450dNIR is the first soluble P-450 derived from eukaryotic organisms. The unique features might be related to the novel function of P-450dNIR, which is involved in a dissimilatory reduction of nitrite by the fungus. P-450dNIR was classified into a new family, P-450LV, and the corresponding gene of the fungus was named CYP55.
Cytochrome P-450 (P-450) is a collective term for hemoproteins that catalyze monooxygenase reactions against a wide variety of endogenous and exogenous substrates (1). Cytochromes P-450 are widely distributed among living organisms and have a variety of physiological functions. More than 150 P-450s have been cloned and sequenced. These homology comparisons indicate that they are, as a whole, encoded by a gene superfamily. Using a criterion that greater than 40% sequence homology places different P-450s in the same family, one can designate at least 27 different P-450 families (2). Most of these families are from mammals (3-10); the others are from an insect (11), yeasts (12-14), fungi (15, 16), or bacteria (17-22). Recently, the first cDNA clone for a plant P-450 was identified (23).
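The 40% criterion mentioned above can be made concrete with a small sketch. It assumes that "sequence homology" is measured as percent identity over the aligned columns in which both sequences have a residue; that definition, the function names, and the toy sequences are assumptions for illustration and are not taken from the source.

```python
def percent_identity(a, b):
    """Percent identity between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    This definition is an assumption made for the example.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def same_family(a, b, threshold=40.0):
    """True if two aligned sequences exceed the identity threshold."""
    return percent_identity(a, b) > threshold


if __name__ == "__main__":
    # Toy aligned fragments, not real P-450 sequences.
    print(percent_identity("MKT-LV", "MRTALV"))
    print(same_family("MKTLV", "AAAAV"))
```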
Most multiple sequence alignment programs explicitly or implicitly try to optimize some score associated with the resulting alignment. Although the sum-of-pairs score is currently most widely used, it is inappropriate when the phylogenetic relationships among the sequences to be aligned are not evenly distributed, since the contributions of densely populated groups dominate those of minor members. This paper proposes an iterative multiple sequence alignment method which optimizes a weighted sum-of-pairs score, in which the weights given to individual sequence pairs are adjusted to compensate for the biased contributions. A simple method that rapidly calculates such a set of weights for a given phylogenetic tree is presented. The multiple sequence alignment is refined through partitioning and realignment restricted to the edges of the tree. Under this restriction, profile-based fast and rigorous group-to-group alignment is achieved at each iteration, rendering the overall computational cost virtually identical to that using an unweighted score. Consistency of nearly 90% was attained between structural and sequence alignments of multiple divergent globins, confirming the effectiveness of this strategy in improving the quality of multiple sequence alignment.
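To make the weighted sum-of-pairs idea concrete, the following minimal sketch scores a fixed alignment as a weighted sum of pairwise scores. The match, mismatch, and gap values, the linear gap treatment, and the hand-picked pair weights are all assumptions for illustration; the sketch does not reproduce the tree-based weight calculation or the iterative refinement described above.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one aligned column for a pair of sequences (toy scoring scheme)."""
    if a == "-" and b == "-":
        return 0  # gap-gap columns contribute nothing
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def weighted_sp_score(rows, weights):
    """Weighted sum-of-pairs score of an alignment.

    rows    : list of equal-length aligned strings
    weights : dict mapping (i, j) with i < j to the weight of that pair
    """
    n_cols = len(rows[0])
    if any(len(r) != n_cols for r in rows):
        raise ValueError("all rows must have the same length")
    total = 0.0
    for i, j in combinations(range(len(rows)), 2):
        s = sum(pair_score(x, y) for x, y in zip(rows[i], rows[j]))
        total += weights[(i, j)] * s
    return total


if __name__ == "__main__":
    rows = ["ACG-T", "ACGAT", "TCG-T"]
    uniform = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0}
    # Down-weight the pair of closely related sequences 0 and 1.
    biased = {(0, 1): 0.5, (0, 2): 1.0, (1, 2): 1.0}
    print(weighted_sp_score(rows, uniform))
    print(weighted_sp_score(rows, biased))
```

With all weights set to 1, the function reduces to the ordinary sum-of-pairs score. Lowering the weight of a pair of closely related sequences reduces the influence of a densely populated group on the total.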
Cytochrome P450 (CYP) constitutes a large gene superfamily descended from a single common ancestor. CYP genes are widely distributed in all domains of life from bacteria, archaea, and viruses to higher plants and animals. Because of their monophyletic nature, all CYP genes may be hierarchically classified at several distinct levels based on similarity of the protein amino acid sequences. A five-level classification (class, group, clan, family, and subfamily) is reasonably stable and useful for conceptual categorization of CYP genes. With a few exceptions, genes in a clan are specific to a kingdom or phylum, whereas cross-kingdom genes may belong to the same group, indicating an ancient origin of CYP diversification. CYP proteins are often functionally categorized into catalysts of "endogenous," "secondary," and "xenobiotic" compounds according to their substrate specificities. It was once postulated that xenobiotic-metabolizing enzymes were derived from an endogenous substrate-catalyzing enzyme. Although functional flow from endogenous to xenobiotic substrates occurred, recent evidence from a wide range of genomic analyses has indicated that the opposite is the more dominant stream. Expression of most vertebrate CYP genes is regulated by internal and external stimuli through transcription factors in the nuclear receptor family and bHLH-PAS family. Some aspects of cooperative evolution between transcriptional regulators and their target genes are briefly reviewed.
The first and one of the most important processes in the field of genome annotation is to find all genes encoded in a genome and to identify all variation of transcripts. Although great progress in collection of a large amount of cDNA and EST sequences has been achieved, the goal is not yet close. A promising approach toward solution of this problem is to use comparative analyses of genomes and cDNA or protein sequences [1, 4]. These "homology-based" gene-prediction programs perform well when one or more closely related homologous sequences are available. However, accuracy in predicting exons drops sharply with a decrease in similarity between the reference and target sequences [5, 7]. For example, the performance of GeneWise [1], which is currently most popular and used for construction of Ensembl annotation, falls behind that of ab initio methods, such as Genescan [2], when amino-acid identities between the reference sequence and translated target are less than ~60%.
It is natural to expect that a combination of homology-based and ab initio methods may lead to better performance than that achieved by either approach alone. With this expectation, we developed the program aln [3], which adopts a dynamic programming algorithm to optimize a total score derived from several lines of information on similarity to known cDNA or protein sequences, intrinsic statistical properties of coding and non-coding parts of genomic sequences, and signal strengths around translational start sites and intron-exon boundaries. Although aln proved to significantly outperform GeneWise for prediction of nematode genes [3], it had several shortcomings when applied to the human genome. We report here our attempt to adapt aln to the human genome.
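The general idea of optimizing a total score that combines several lines of evidence by dynamic programming can be sketched with a toy two-state labeling problem. This is not the aln algorithm: the two states (exon "E" and intron "I"), the per-position `gain` values (standing in for a combined homology and coding-potential score), and the `switch` values (standing in for boundary signal strengths) are hypothetical and exist only for this example.

```python
def best_labeling(gain, switch):
    """Label each position 'E' or 'I' so that the total score is maximal.

    gain[i]   : score for labeling position i as 'E' ('I' scores 0)
    switch[i] : score added when the label at i differs from the label at i-1
                (switch[0] is unused)
    Returns (labels, total_score). Toy model, not a real gene finder.
    """
    states = "EI"

    def emit(state, i):
        return gain[i] if state == "E" else 0.0

    score = {s: emit(s, 0) for s in states}
    back = []  # back[i-1][s] = best previous state for state s at position i
    for i in range(1, len(gain)):
        new_score, pointers = {}, {}
        for s in states:
            candidates = {
                p: score[p] + (switch[i] if p != s else 0.0) for p in states
            }
            best_prev = max(candidates, key=candidates.get)
            new_score[s] = candidates[best_prev] + emit(s, i)
            pointers[s] = best_prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    last = max(score, key=score.get)
    labels = [last]
    for pointers in reversed(back):
        labels.append(pointers[labels[-1]])
    return "".join(reversed(labels)), score[last]


if __name__ == "__main__":
    gain = [2, 2, -3, -3, 2, 2]        # hypothetical combined evidence
    switch = [0, -5, -1, -5, -1, -5]   # weaker penalty where a signal is strong
    print(best_labeling(gain, switch))
```

In this toy model a state change is cheap only where the hypothetical signal score is strong, so the optimal labeling places boundaries at those positions.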