Parameter Landscape Analysis for Improving the Performance of Common Motif Detection Algorithms
Abstract:
It is known that genes with similar expression profiles are likely to be regulated by a common transcription factor, so finding the common cis-element to which that protein binds in the upstream regions of these genes is very important. Although several well-known motif extraction programs exist, end users often doubt the reliability of their output. One reason is that the user has no a priori knowledge about the motif: the data may contain a single motif, multiple similar motifs, or multiple dissimilar motifs, and some of the sequences may not contain any motif at all. The performance of some of these programs, such as MEME [1], Gibbs DNA [3] and Consensus [5], has been studied [2, 4, 6], but these studies were usually run in default mode, which cannot suit all circumstances. In our research, we systematically explored the possible parameter settings, which are the most complicated part of using these programs in realistic situations. In this way, we tried to solve the problem of elucidating strongly corrupted motifs, such as 10/2, 12/3 and 15/4 (motif length/mismatches), which have been reported as very difficult for the above algorithms [4].
Keywords:
Motif (music)
Sequence motif
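The "motif length/mismatches" notation in the abstract (for example 15/4) can be made concrete with a small synthetic data generator. The following is only a minimal sketch, not the benchmark used by the authors: the number of sequences, the sequence length, the uniform background, and the function names `mutate`, `plant` and `hamming` are all assumptions chosen for illustration.

```python
import random

ALPHABET = "ACGT"

def mutate(motif, d, rng):
    """Return a copy of motif with exactly d positions changed to a different base."""
    sites = rng.sample(range(len(motif)), d)
    out = list(motif)
    for i in sites:
        out[i] = rng.choice([b for b in ALPHABET if b != motif[i]])
    return "".join(out)

def plant(n_seqs, seq_len, motif, d, rng):
    """Build n_seqs uniform random sequences, each with one d-mismatch motif copy."""
    seqs, positions = [], []
    for _ in range(n_seqs):
        background = [rng.choice(ALPHABET) for _ in range(seq_len)]
        pos = rng.randrange(seq_len - len(motif) + 1)
        # Overwrite the background with a corrupted instance of the motif.
        background[pos:pos + len(motif)] = mutate(motif, d, rng)
        seqs.append("".join(background))
        positions.append(pos)
    return seqs, positions

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical 15/4 instance: 20 sequences of 600 bases (assumed sizes).
rng = random.Random(0)
motif = "".join(rng.choice(ALPHABET) for _ in range(15))
seqs, positions = plant(20, 600, motif, 4, rng)
```

Because the planted positions are returned, the output of a motif finder run under different parameter settings could be scored against them, which is the kind of comparison a parameter landscape study needs.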
As transcription factor binding sites, motifs play an important role in promoter recognition and in gene transcription and expression, so characterizing them is a meaningful subject. Based on correlation detection of motifs as promoter features, a method for obtaining motif packages by independent component analysis is given. It decomposes the observed motif frequency matrix into smaller components and finally obtains motif packages of concurrent motifs. In the experimental results, the influence of DNA sequence data selection on the motif packages is analyzed. More stable motif relation structures can be obtained using longer motifs.
Motif (music)
Sequence motif
Structural motif
Transcription
Citations (0)
Motivation: We demonstrate a computational process by which transcription factor binding sites can be elucidated using genome-wide expression and binding profiles. The profiles direct us to the intergenic locations likely to contain the promoter regions for a given factor. These sequences are multiply and locally aligned to give an anchor motif from which further characterization can take place. Results: We present bases for and assumptions about the variability within these motifs which give rise to potentially more accurate motifs, capture complex binding sites built upon the basis motif, and eliminate the constraints of the currently employed promoter searching protocols. We also present a measure of motif quality based on the occurrence of the putative motifs in regions observed to contain the binding sites. The assumptions, motif generation, quality assessment and comparison allow the user as much control as their a priori knowledge allows. Availability: IGRDB and the datasets mentioned herein are available at http://chipdb.wi.mit.edu/ Contact: rhonda@bu.edu
DNA binding site
Citations (12)
BACKGROUND: Transcriptional regulation is a key mechanism in the functioning of the cell, and is mostly effected through transcription factors binding to specific recognition motifs located upstream of the coding region of the regulated gene. The computational identification of such motifs is made easier by the fact that they often appear several times in the upstream region of the regulated genes, so that the number of occurrences of relevant motifs is often significantly larger than expected by pure chance. RESULTS: To exploit this fact, we construct sets of genes characterized by the statistical overrepresentation of a certain motif in their upstream regions. Then we study the functional characterization of these sets by analyzing their annotation to Gene Ontology terms. For the sets showing a statistically significant specific functional characterization, we conjecture that the upstream motif characterizing the set is a binding site for a transcription factor involved in the regulation of the genes in the set. CONCLUSIONS: The method we propose is able to identify many known binding sites in S. cerevisiae and new candidate targets of regulation by known transcription factors. Its application to less well studied organisms is likely to be valuable in the exploration of their regulatory interaction network.
Identification
DNA binding site
Citations (12)
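The idea of a motif occurring "significantly more often than expected by pure chance" in an upstream region can be sketched with a simple binomial tail probability. This is an illustrative assumption, not the statistic used in the study above: it treats every window as an independent trial with independent base frequencies, and the function names and the `base_freq` argument are hypothetical.

```python
def count_occurrences(seq, word):
    """Count possibly overlapping occurrences of word in seq."""
    return sum(seq.startswith(word, i) for i in range(len(seq) - len(word) + 1))

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), assuming 0 < p < 1.

    Terms are built iteratively to avoid huge binomial coefficients.
    """
    term = (1.0 - p) ** n                 # P(X = 0)
    tail = term if k <= 0 else 0.0
    for i in range(n):
        term *= (n - i) / (i + 1) * p / (1.0 - p)   # now P(X = i + 1)
        if i + 1 >= k:
            tail += term
    return tail

def overrepresentation_p(seq, word, base_freq):
    """Return (observed count, chance probability of a count at least that large)."""
    n = len(seq) - len(word) + 1          # number of candidate windows
    p = 1.0
    for base in word:                     # assumed: bases are independent
        p *= base_freq[base]
    k = count_occurrences(seq, word)
    return k, binomial_tail(k, n, p)

# Hypothetical usage with uniform base frequencies.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
k, p_value = overrepresentation_p("TTGACGTCATTGACGTCAGG", "TGACGTCA", uniform)
```

Genes whose upstream regions give a small probability for the same word could then be grouped into a set, as the abstract describes, before any functional annotation analysis.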
Transcription Factors (TFs) control transcription by binding to specific sites in the promoter regions of the target genes, which can be modelled by structured motifs. In this paper we propose AliBiMotif, a method combining sequence alignment and a biclustering approach based on efficient string matching techniques using suffix trees to unravel approximately conserved sets of blocks (structured motifs) while straightforwardly disregarding non-conserved stretches in-between. The ability to ignore the width of non-conserved regions is a major advantage of the proposed method over other motif finders, as the lengths of the binding sites are usually easier to estimate than the separating distances.
DNA binding site
Conserved sequence
Multiple sequence alignment
Motif (music)
Biclustering
Citations (2)
Many biologically active regions of a genome can be discovered by searching for small sequence patterns, or motifs. A class of motifs of great interest in biology corresponds to sites bound by gene regulatory proteins. The shortness and degeneracy of these sites have, however, frustrated standard sequence-based, motif discovery methods. Moreover, as classical experiments in Molecular Biology have shown, a binding site for a regulatory protein can assume different biological functions in different promoter regions, rendering standard methods unsuitable for motif classification.
Previous studies have shown that the binding sites for some regulatory proteins have positional preferences with respect to the transcription start site. Making use of the precise transcription start site locations, this thesis describes computational methods to detect binding sites based on their positional and nucleotide preferences. Three different methods of this type are described: (1) an enumerative statistical test, related to gapless BLAST statistics, that detects octanucleotides that are unusually clustered with respect to the transcription start site in promoter sequences, (2) a Gibbs-sampler program that can use the results generated by the statistic (mentioned in 1) to anchor a multiple alignment on any set of positions thought to contribute to a common binding site, and (3) a statistical method to detect clusters of previously defined motifs in promoter sequences anchored on the transcription start site. Extensions to the Gibbs sampler program including a post-processing step, a Markov background model and a Bayesian positional model are also described.
Examples from datasets containing known binding sites revealed that positional information lends better retrieval accuracy. In silico validation of the motifs using gene expression data and functional similarity data demonstrated that some binding sites can have two different roles in transcription regulation (activation or repression), depending on where they are positioned with respect to the transcription start site. The results from this thesis broaden our understanding of positional control in gene regulation, and illustrate the significance of incorporating positional information in motif discovery methods. All the tools developed in this study have been made available for download via the World Wide Web.
DNA binding site
Sequence motif
Cis-regulatory module
Degeneracy (biology)
Citations (0)
Motif (music)
DNA binding site
Citations (3)
Recognition of motifs in multiple unaligned sequences provides insight into protein structure and function. Discovering these motifs is very challenging because most of them occur in different sequences as different mutated forms of the original consensus motif and thus have weakly conserved regions. Different score metrics and algorithms have been proposed for motif recognition. In this paper, we propose a new genetic-algorithm-based method for identifying multiple motif instances in multiple biological sequences. The experimental results on simulated and real data show that our algorithm can identify multiple occurrences of a weak motif in single sequences as well as in multiple sequences. Moreover, it can identify weakly conserved regions more accurately than other genetic-algorithm-based motif discovery methods.
Identification
Citations (36)
Motif (music)
Sequence motif
Citations (1)
Transcriptional regulation is the mechanism in the cell that controls when and how genes are expressed into proteins. This document gives an overview of current computational approaches that try to predict the motifs that control this process. Motifs are short degenerate words or patterns within promoter sequences, putative binding sites that are usually located upstream of a gene. The two common approaches in bioinformatics (matching against known representatives and ab initio prediction) are presented. For the latter, we describe algorithmic details of most existing implementations and introduce a tool that could simplify everyday work with these programs.
Motif (music)
Sequence motif
Citations (5)
The transcriptional regulation of eukaryotic genes is one of the major problems of the post-genomics era. The preliminary work is to identify the transcription factor binding sites (motifs) and their distributions in DNA. In this paper, we first counted the occurrences of motifs in the upstream promoter sequences of the ribosomal protein (RP) genes of the yeast Saccharomyces cerevisiae, based on a Markov chain model. Over- and under-represented motifs were then extracted using a Z-score statistic. 95% of these motifs are in accordance with transcription factor binding sites verified by experimental analyses. By pairing these motifs with each other and comparing against a set of background sequences, we detected motif pairs that are statistically significant in both occurrence numbers and distance distributions in the RP genes of yeast. Combinatorial transcriptional regulation probably takes place for each of these non-random motif pairs, and for some of them it has been verified by laboratory work. Checking the positions of the motif pairs, we found that about 94% are located upstream of the transcription start site (TSS). For the overwhelming majority of the motif pairs, the distance between the two motifs is less than 100 bp, and for 30% of them it is less than 30 bp. Such a small spacing within a motif pair may favor interaction between the two motifs. These results will be helpful for understanding the mechanisms of transcriptional regulation of RP genes in yeast.
Motif (music)
Sequence motif
DNA binding site
Transcription
Citations (0)
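The combination of a Markov chain background and a Z-score described in the yeast RP gene abstract above can be sketched roughly as follows. The abstract does not give the Markov order or the variance formula, so this sketch assumes a first-order chain estimated from the same sequences and a Poisson-like variance equal to the expected count; the function names are hypothetical.

```python
from collections import Counter
from math import sqrt

def kmer_counts(seqs, k):
    """Count all overlapping k-mers across a list of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def expected_first_order(word, mono, di):
    """Expected count of word under a first-order Markov chain (assumed model)."""
    expected = float(mono[word[0]])
    for a, b in zip(word, word[1:]):
        if mono[a] == 0:
            return 0.0
        expected *= di[a + b] / mono[a]   # estimated transition probability a -> b
    return expected

def z_score(word, seqs):
    """Return (observed, expected, z); z > 0 suggests over-representation."""
    mono = kmer_counts(seqs, 1)
    di = kmer_counts(seqs, 2)
    observed = kmer_counts(seqs, len(word))[word]
    expected = expected_first_order(word, mono, di)
    if expected == 0:
        return observed, expected, float("nan")
    # Assumed approximation: variance taken equal to the expected count.
    return observed, expected, (observed - expected) / sqrt(expected)

observed, expected, z = z_score("ACG", ["ACGTACGTACGT"])
```

Words with large positive or negative scores would be the over- and under-represented candidates; a real analysis would need a more careful variance estimate that accounts for overlapping occurrences.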