    Motif relation analysis based on ICA technology
    Citations: 0 | References: 0 | Related papers: 20
    Abstract:
    As transcription factor binding sites, motifs play an important role in promoter recognition and in gene transcription and expression, so characterizing them is a meaningful task. Based on correlation detection among motifs used as promoter features, a method for obtaining motif packages by independent component analysis (ICA) is presented. It decomposes the observed motif frequency matrix into smaller components and finally yields motif packages of co-occurring motifs. In the experimental results, the influence of DNA sequence data selection on the motif packages is analyzed. More stable motif relation structures can be obtained using longer motifs.
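The abstract does not say how the observed motif frequency matrix is built, so the following is only a minimal sketch under one plausible assumption: each row is a promoter sequence, each column is a length-k motif, and each entry is the number of overlapping occurrences. The function name `motif_frequency_matrix` and the toy sequences are hypothetical.

```python
from itertools import product


def motif_frequency_matrix(sequences, k=2, alphabet="ACGT"):
    """Count overlapping occurrences of every length-k motif in each sequence.

    Assumption (not from the paper): rows are sequences, columns are motifs.
    """
    motifs = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {m: j for j, m in enumerate(motifs)}
    matrix = []
    for seq in sequences:
        row = [0] * len(motifs)
        for i in range(len(seq) - k + 1):
            j = index.get(seq[i:i + k].upper())
            if j is not None:  # skip windows containing symbols outside the alphabet
                row[j] += 1
        matrix.append(row)
    return motifs, matrix


# Toy promoter fragments, purely illustrative.
promoters = ["TATAAT", "GGTATA"]
motifs, X = motif_frequency_matrix(promoters, k=2)
print(dict(zip(motifs, X[0])))
```

A matrix of this shape could then be handed to an ICA routine (for example scikit-learn's `FastICA`), and the motifs with large weights in one component read as a candidate motif package. That last step is an interpretation of the abstract, not a description of the authors' implementation.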
    Keywords:
    Motif (music)
    Sequence motif
    Structural motif
    Transcription
    Identifying the transcription factor binding sites (or motifs) of eukaryotic genes is a major task in the post-genomics era. The accuracy of motif identification can be improved by analyzing co-expressed or co-regulated genes together. In this paper we analyze the motifs commonly used in the ribosomal protein (RP) genes of yeast, counting the number of genes that contain a given motif, based on a log-linear model of a contingency table. The motifs that are over-represented relative to background sequences are then further filtered with a U-statistic. These motifs are potential transcriptional regulatory elements of yeast RP genes, and 90% of them agree with transcription factor binding sites verified by experimental analyses. The advantage of this method is that it extracts the motifs shared by a set of gene promoters under a strict statistical standard, which overcomes the fuzzy judgment of previous work. The method could also be used to search efficiently for combinatorial regulatory motif pairs in co-regulated genes. We also observed a clear relationship between Pearson's correlation coefficient, which reflects the degree of correlation between two attributes in a contingency table, and the interaction effect of the log-linear model. This result suggests that the correlation of two attributes could be evaluated by the interaction effect of the log-linear model.
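The relationship between Pearson's correlation coefficient and the log-linear interaction effect can be illustrated on a 2x2 contingency table. The sketch below assumes a saturated log-linear model with sum-to-zero constraints, for which the interaction term equals one quarter of the log odds ratio; the abstract does not state which parameterization the authors used, and the gene counts are made up for illustration.

```python
import math


def phi_coefficient(table):
    """Pearson correlation of two binary attributes from a 2x2 count table."""
    (a, b), (c, d) = table
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom


def loglinear_interaction(table):
    """Interaction term of the saturated 2x2 log-linear model
    (sum-to-zero constraints): 0.25 * log odds ratio. Needs all counts > 0."""
    (a, b), (c, d) = table
    return 0.25 * math.log((a * d) / (b * c))


# Hypothetical counts of genes: rows = motif A present/absent,
# columns = motif B present/absent.
table = [[30, 10], [10, 30]]
phi = phi_coefficient(table)
interaction = loglinear_interaction(table)
print(phi, interaction)
```

Both quantities are zero when the two motifs occur independently and share the same sign otherwise, which is consistent with the relationship the abstract reports.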
    Motif (music)
    DNA binding site
    Contingency table
    Cis-regulatory module
    Citations (0)
    It is known that genes with similar expression profiles are likely to be regulated by a common transcription factor, so finding the common cis-element to which the protein binds in the upstream regions of these genes is very important. Although several well-known programs for motif extraction exist, end-users often doubt the reliability of their output. One reason is that the user has no a priori knowledge about the motif, i.e., whether it is a single motif, multiple similar motifs, or multiple non-similar motifs; some of the sequences may not contain any motif at all. Although the performance of some of these programs, such as MEME [1], Gibbs DNA [3] and Consensus [5], has been studied [2, 4, 6], those studies usually ran them in their default mode, which cannot suit all circumstances. In our research, we systematically explored various possibilities of parameter setting, which is the most complicated part of using these programs in realistic situations. In this way, we tried to solve the problem of elucidating strongly corrupted motifs, such as 10/2, 12/3, 15/4 (motif length/mismatches), which have been reported as very difficult for the above algorithms [4].
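The "motif length/mismatches" notation can be made concrete with a small sketch: a 10/2 instance is a length-10 window that differs from the consensus in at most two positions. The code below only scans for such instances given a known consensus; it is not a motif finder like MEME or Consensus, and the consensus and sequence are invented for illustration.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_instances(sequence, consensus, max_mismatches):
    """Return (position, window, mismatches) for every window of the
    sequence within max_mismatches of the consensus."""
    k = len(consensus)
    hits = []
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        d = hamming(window, consensus)
        if d <= max_mismatches:
            hits.append((i, window, d))
    return hits


# Hypothetical 10/2 case: a length-10 consensus planted with 2 mismatches.
consensus = "ACGTACGTAC"
sequence = "GGG" + "ACCTACGAAC" + "TTT"
hits = find_instances(sequence, consensus, max_mismatches=2)
print(hits)
```

The difficulty reported for such cases comes from the fact that real motif finders do not know the consensus in advance, and two corrupted instances of the same motif can differ from each other in up to twice the allowed number of positions.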
    Motif (music)
    Sequence motif
    Citations (2)
    We propose Motif Regressor for discovering sequence motifs upstream of genes that undergo expression changes in a given condition. The method combines the advantages of matrix-based motif finding and oligomer motif-expression regression analysis, resulting in high sensitivity and specificity. Motif Regressor is particularly effective in discovering expression-mediating motifs of medium to long width with multiple degenerate positions. When applied to Saccharomyces cerevisiae, Motif Regressor identified the ROX1 and YAP1 motifs from Rox1p and Yap1p overexpression experiments, respectively; predicted that Gcn4p may have increased activity in YAP1 deletion mutants; reported a group of motifs (including GCN4, PHO4, MET4, STRE, USR1, RAP1, M3A, and M3B) that may mediate the transcriptional response to amino acid starvation; and found all of the known cell-cycle regulation motifs from 18 expression microarrays over two cell cycles.
    Motif (music)
    YAP1
    Sequence motif
    Structural motif
    Citations (349)
    Abstract Motivation: We demonstrate a computational process by which transcription factor binding sites can be elucidated using genome-wide expression and binding profiles. The profiles direct us to the intergenic locations likely to contain the promoter regions for a given factor. These sequences are multiply and locally aligned to give an anchor motif from which further characterization can take place. Results: We present bases for and assumptions about the variability within these motifs which give rise to potentially more accurate motifs, capture complex binding sites built upon the basis motif, and eliminate the constraints of the currently employed promoter searching protocols. We also present a measure of motif quality based on the occurrence of the putative motifs in regions observed to contain the binding sites. The assumptions, motif generation, quality assessment and comparison allow the user as much control as their a priori knowledge allows. Availability: IGRDB and the datasets mentioned herein are available at http://chipdb.wi.mit.edu/ Contact: rhonda@bu.edu
    DNA binding site
    Abstract A genome encodes two types of information, the “what can be made” and the “when and where”. The “what” are mostly proteins, which perform the majority of functions within living organisms, and the “when and where” is the regulatory information that encodes when and where DNA is transcribed. Currently, it is possible to efficiently predict the majority of the protein content of a genome but nearly impossible to predict the transcriptional regulation. This regulation is based upon the interaction between transcription factors and genomic sequences at the site of binding motifs [1,2,3]. Information contained within the motif is necessary to predict transcription factor binding; however, it is not sufficient [4], as experimentally verified binding sites are substantially scarcer than the corresponding binding motif. Thus, it remains challenging to derive regulatory information from binding motifs. Here we show that a random forest machine learning approach, which incorporates the 3D shape of DNA, enhances binding prediction for all 216 tested Arabidopsis thaliana transcription factors and improves the resolution of differential binding by transcription factor family members which share the same binding motif. Our results contribute to the understanding of protein-DNA recognition and demonstrate the extraction of binding site features beyond the binding sequence. We observed that those features were individually weighted for each transcription factor, even if they shared the same binding sequence. We show that the gained insights enable a more robust prediction of binding behavior regarding novel, not-in-genome motif sequences. Understanding transcription factor binding as a combination of motif sequence and motif shape brings us closer to predicting gene expression from promoter sequence.
    DNA binding site
    Sequence motif
    Motif (music)
    Transcription
    Citations (1)
    Transcription Factors (TFs) control transcription by binding to specific sites in the promoter regions of the target genes, which can be modelled by structured motifs. In this paper we propose AliBiMotif, a method combining sequence alignment and a biclustering approach based on efficient string matching techniques using suffix trees to unravel approximately conserved sets of blocks (structured motifs) while straightforwardly disregarding non-conserved stretches in-between. The ability to ignore the width of non-conserved regions is a major advantage of the proposed method over other motif finders, as the lengths of the binding sites are usually easier to estimate than the separating distances.
    DNA binding site
    Conserved sequence
    Multiple sequence alignment
    Motif (music)
    Biclustering
    Citations (2)
    Background The need for efficient algorithms to uncover biologically relevant phosphorylation motifs has become very important with rapid expansion of the proteomic sequence database along with a plethora of new information on phosphorylation sites. Here we present a novel unsupervised method, called Motif Finder (in short, F-Motif) for identification of phosphorylation motifs. F-Motif uses clustering of sequence information represented by numerical features that exploit the statistical information hidden in some foreground data. Furthermore, these identified motifs are then filtered to find "actual" motifs with statistically significant motif scores. Results and Discussion We have applied F-Motif to several new and existing data sets and compared its performance with two well known state-of-the-art methods. In almost all cases F-Motif could identify all statistically significant motifs extracted by the state-of-the-art methods. More importantly, in addition to this, F-Motif uncovers several novel motifs. We have demonstrated using clues from the literature that most of these new motifs discovered by F-Motif are indeed novel. We have also found some interesting phenomena. For example, for CK2 kinase, the conserved sites appear only on the right side of S. However, for CDK kinase, the adjacent site on the right of S is conserved with residue P. In addition, three different encoding methods, including a novel position contrast matrix (PCM) and the simplest binary coding, are used and the ability of F-motif to discover motifs remains quite robust with respect to encoding schemes. Conclusions An iterative algorithm proposed here uses exploratory data analysis to discover motifs from phosphorylated data. The effectiveness of F-Motif has been demonstrated using several real data sets as well as using a synthetic data set. The method is quite general in nature and can be used to find other types of motifs also. 
We have also provided a server for F-Motif at http://f-motif.classcloud.org/, http://bio.classcloud.org/f-motif/ or http://ymu.classcloud.org/f-motif/.
    Exploratory analysis
    Motif discovery aims to detect short, highly conserved patterns in a collection of unaligned DNA or protein sequences. Discriminative motif finding algorithms aim to increase the sensitivity and selectivity of motif discovery by utilizing a second set of sequences, and searching only for patterns that can differentiate the two sets of sequences. Potential applications of discriminative motif discovery include discovering transcription factor binding site motifs in ChIP-chip data and finding protein motifs involved in thermal stability using sets of orthologous proteins from thermophilic and mesophilic organisms. We describe DEME, a discriminative motif discovery algorithm for use with protein and DNA sequences. Input to DEME is two sets of sequences: a "positive" set and a "negative" set. DEME represents motifs using a probabilistic model, and uses a novel combination of global and local search to find the motif that optimally discriminates between the two sets of sequences. DEME is unique among discriminative motif finders in that it uses an informative Bayesian prior on protein motif columns, allowing it to incorporate prior knowledge of residue characteristics. We also introduce four synthetic discriminative motif discovery problems that are designed for evaluating discriminative motif finders in various biologically motivated contexts. We test DEME using these synthetic problems and on two biological problems: finding yeast transcription factor binding motifs in ChIP-chip data, and finding motifs that discriminate between groups of thermophilic and mesophilic orthologous proteins. Using artificial data, we show that DEME is more effective than a non-discriminative approach when there are "decoy" motifs or when a variant of the motif is present in the "negative" sequences. With real data, we show that DEME is as good as, but not better than, non-discriminative algorithms at discovering yeast transcription factor binding motifs.
We also show that DEME can find highly informative thermal-stability protein motifs. Binaries for the stand-alone program DEME are free for academic use and are available at http://bioinformatics.org.au/deme/
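The idea of discriminating a "positive" set from a "negative" set can be sketched with a much simpler stand-in than DEME's probabilistic model: score each exact k-mer by the fraction of positive sequences containing it minus the fraction of negative sequences containing it. This is a swapped-in toy scoring rule, not the DEME algorithm, and the function names and sequences are hypothetical.

```python
def kmers(seq, k):
    """Set of distinct length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def discriminative_scores(positive, negative, k):
    """Score each k-mer seen in the positive set by
    (fraction of positives containing it) - (fraction of negatives containing it).
    Toy stand-in for a discriminative objective; not DEME's model."""
    pos_sets = [kmers(s, k) for s in positive]
    neg_sets = [kmers(s, k) for s in negative]
    candidates = set().union(*pos_sets)
    scores = {}
    for m in candidates:
        p = sum(m in s for s in pos_sets) / len(pos_sets)
        n = sum(m in s for s in neg_sets) / len(neg_sets)
        scores[m] = p - n
    return scores


# Invented sequences: GATAA is planted in every positive sequence only.
positive = ["AAGATAAGG", "CCGATAACC", "TTGATAATT"]
negative = ["AAGGCCTTA", "CCTTAAGGC"]
scores = discriminative_scores(positive, negative, k=5)
best = max(scores, key=scores.get)
print(best, scores[best])
```

A k-mer such as TAAGG that also occurs in the negative set is penalized, which mirrors the "decoy" situation described in the abstract, where a pattern common to both sets should not be reported.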
    Discriminative model
    Motif (music)
    Sequence motif
    Citations (105)
    Recognition of motifs in multiple unaligned sequences provides insight into protein structure and function. The task of discovering these motifs is very challenging because most of them occur in different sequences as different mutated forms of the original consensus motif and thus have weakly conserved regions. Different score metrics and algorithms have been proposed for motif recognition. In this paper, we propose a new genetic algorithm based method for identifying multiple motif instances in multiple biological sequences. The experimental results on simulated and real data show that our algorithm can identify multiple occurrences of a weak motif in single sequences as well as in multiple sequences. Moreover, it can identify weakly conserved regions more accurately than other genetic algorithm based motif discovery methods.
    Identification
    Citations (36)