Motif relation analysis based on ICA technology

Applied science and technology (2011)

Abstract:

As transcription factor binding sites, motifs play an important role in promoter recognition and in gene transcription and expression, so characterizing them is a meaningful task. Based on correlation detection of motifs as promoter features, a method for obtaining motif packages by independent component analysis (ICA) is presented. It decomposes the observed motif frequency matrix into smaller components and finally yields motif packages of co-occurring motifs. In the experimental results, the influence of DNA sequence data selection on the motif packages is analyzed. More stable motif relation structures can be obtained using longer motifs.
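The abstract does not say how the motif frequency matrix is built. The sketch below is one plausible construction, assuming that each row is a promoter sequence, each column is a fixed-length DNA word (k-mer), and each entry is an overlapping occurrence count. The function name `motif_frequency_matrix` and the toy sequences are hypothetical and are not taken from the paper.

```python
from itertools import product

def motif_frequency_matrix(sequences, k):
    """Count overlapping occurrences of every DNA k-mer in each sequence.

    Assumption: motifs are plain k-mers over A, C, G, T. The paper may
    define its motifs and frequencies differently.
    """
    motifs = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {m: j for j, m in enumerate(motifs)}
    matrix = []
    for seq in sequences:
        row = [0] * len(motifs)
        for i in range(len(seq) - k + 1):
            j = index.get(seq[i:i + k].upper())
            if j is not None:  # skip windows with non-ACGT symbols such as N
                row[j] += 1
        matrix.append(row)
    return motifs, matrix

# Toy input (hypothetical): two short sequences, dinucleotide motifs.
motifs, X = motif_frequency_matrix(["ACGTAC", "TTTTGA"], k=2)
```

A matrix of this shape (sequences by motifs) is the kind of observation matrix that an ICA routine would then decompose into components. The decomposition itself is not shown here.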

Keywords:

Motif (music)

Sequence motif

Structural motif

Transcription

Topics:

Blind Source Separation Techniques

Neural Networks and Applications

Advanced Algorithms and Applications

Analysis of transcription regulation motifs in yeast genes based on log-linear model

China Journal of Bioinformatics (2011)

Jing Zhang

Identifying the transcription factor binding sites (or motifs) of eukaryotic genes is a major task in the post-genomic era. The accuracy of motif identification can be improved by analyzing co-expressed or co-regulated genes together. In this paper we analyze the motifs commonly used in the ribosomal protein (RP) genes of yeast by counting the number of genes that contain a given motif, based on a log-linear model of the contingency table. The motifs over-represented relative to background sequences are then further screened with a U-statistic. These motifs are potential transcriptional regulatory elements of yeast RP genes, and 90% of them agree with experimentally verified transcription factor binding sites. The advantage of this method is that it extracts the motifs shared by a set of gene promoters under a strict statistical standard, which avoids the fuzzy judgments of previous work. The method can also be used to search efficiently for combinatorial regulatory motif pairs in co-regulated genes. We also found a clear relationship between Pearson's correlation coefficient, which reflects the degree of correlation between two attributes in a contingency table, and the interaction effect of the log-linear model. This result suggests that the correlation of two attributes can be evaluated by the interaction effect of the log-linear model.
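The relationship between the interaction effect and Pearson's correlation can be illustrated on a single 2x2 contingency table. The sketch below assumes the saturated log-linear model with sum-to-zero constraints, under which the interaction term equals one quarter of the log odds ratio, and it computes Pearson's correlation for two binary attributes (the phi coefficient) for comparison. The function name and the cell counts are hypothetical, and this is not the authors' procedure.

```python
import math

def interaction_and_phi(n11, n12, n21, n22):
    """Return (interaction term, phi coefficient) for a 2x2 table.

    Assumptions: all four cell counts are positive, and the log-linear
    model is saturated with sum-to-zero (effect) coding.
    """
    # Interaction effect of the saturated model: log(odds ratio) / 4.
    interaction = 0.25 * math.log((n11 * n22) / (n12 * n21))
    # Pearson correlation of two binary attributes (phi coefficient).
    row1, row2 = n11 + n12, n21 + n22
    col1, col2 = n11 + n21, n12 + n22
    phi = (n11 * n22 - n12 * n21) / math.sqrt(row1 * row2 * col1 * col2)
    return interaction, phi

# Toy table (hypothetical): genes with/without motif A versus motif B.
interaction, phi = interaction_and_phi(30, 10, 10, 30)
```

Both quantities are zero when the two attributes are independent and share the same sign otherwise, which is consistent with the relationship the abstract reports.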

Motif (music)

DNA binding site

Contingency table

Cis-regulatory module

Citations (0)

Parameter Landscape Analysis for Improving the Performance of Common Motif Detection Algorithms

Proceedings Genome Informatics Workshop/Genome informatics (2002)

Natalia Poluliakh Michiko Konno Toshihisa Takagi Kenta Nakai

It is known that genes with similar expression profiles are likely to be regulated by a common transcription factor, and finding the common cis-element to which the protein binds in the upstream regions of these genes is very important. Although several well-known programs for motif extraction exist, end users often doubt the reliability of their output. One reason is that the user has no a priori knowledge about the motif, i.e., whether it is a single motif, multiple similar motifs, or multiple non-similar motifs; some of the sequences may not contain any motif at all. Although the performance of some of these programs, such as MEME [1], Gibbs DNA [3] and Consensus [5], has been studied [2, 4, 6], the studies were usually run in default mode, which cannot suit all circumstances. In our research, we systematically explored various parameter settings, which are the most complicated part of using these programs in realistic situations. In this way, we tried to solve the problem of elucidating strongly corrupted motifs, such as 10/2, 12/3, 15/4 (motif length/mismatches), which have been reported as very difficult for the above algorithms [4].
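The motif length/mismatches notation can be made concrete with a small example: a 10/2 instance is a length-10 window that differs from the consensus at no more than two positions. The sketch below scans a sequence for such windows using Hamming distance. The function names, the consensus, and the test sequence are hypothetical and are not drawn from the paper or from any of the programs it evaluates.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def approximate_occurrences(sequence, motif, max_mismatches):
    """Start positions of windows within max_mismatches of the motif."""
    k = len(motif)
    return [
        i
        for i in range(len(sequence) - k + 1)
        if hamming(sequence[i:i + k], motif) <= max_mismatches
    ]

# Toy 10/2 case (hypothetical): the consensus is planted at offset 2
# with two substitutions (positions 4 and 9 of the motif).
consensus = "ACGTACGTAC"
sequence = "GGACGTTCGTAAGG"
hits = approximate_occurrences(sequence, consensus, max_mismatches=2)
```

Knowing the consensus in advance is what makes this scan trivial. The motif finders discussed in the abstract must recover the consensus itself, which is why heavily corrupted instances are hard.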

Motif (music)

Sequence motif

10.11234/gi1990.13.430

Citations (2)

Integrating regulatory motif discovery and genome-wide expression analysis

Proceedings of the National Academy of Sciences (2003)

Erin M. Conlon X. Shirley Liu Jason D. Lieb Jun S. Liu

We propose Motif Regressor for discovering sequence motifs upstream of genes that undergo expression changes in a given condition. The method combines the advantages of matrix-based motif finding and oligomer motif-expression regression analysis, resulting in high sensitivity and specificity. Motif Regressor is particularly effective in discovering expression-mediating motifs of medium to long width with multiple degenerate positions. When applied to Saccharomyces cerevisiae, Motif Regressor identified the ROX1 and YAP1 motifs from Rox1p and Yap1p overexpression experiments, respectively; predicted that Gcn4p may have increased activity in YAP1 deletion mutants; reported a group of motifs (including GCN4, PHO4, MET4, STRE, USR1, RAP1, M3A, and M3B) that may mediate the transcriptional response to amino acid starvation; and found all of the known cell-cycle regulation motifs from 18 expression microarrays over two cell cycles.
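The idea of relating motif content to expression by regression can be illustrated with an ordinary least-squares fit of a per-gene expression change on a per-gene motif score. This is only a sketch of the general idea and not the published implementation: the scoring scheme, the variable names, and the toy numbers are all assumptions.

```python
def simple_linear_regression(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical toy data: motif match score upstream of four genes and
# the corresponding log expression change in one condition.
motif_scores = [0.0, 1.0, 2.0, 3.0]
log_expression = [0.1, 1.1, 2.1, 3.1]
slope, intercept = simple_linear_regression(motif_scores, log_expression)
```

A slope that differs clearly from zero would suggest that the motif is associated with the expression change. A real analysis would also need a significance test and a way to handle many candidate motifs at once.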

Motif (music)

YAP1

Sequence motif

Structural motif

10.1073/pnas.0630591100

Citations (349)

Condition specific transcription factor binding site characterization in Saccharomyces cerevisiae

Bioinformatics (2002)

Rhonda Harrison Charles DeLisi

Motivation: We demonstrate a computational process by which transcription factor binding sites can be elucidated using genome-wide expression and binding profiles. The profiles direct us to the intergenic locations likely to contain the promoter regions for a given factor. These sequences are multiply and locally aligned to give an anchor motif from which further characterization can take place. Results: We present bases for and assumptions about the variability within these motifs which give rise to potentially more accurate motifs, capture complex binding sites built upon the basis motif, and eliminate the constraints of the currently employed promoter searching protocols. We also present a measure of motif quality based on the occurrence of the putative motifs in regions observed to contain the binding sites. The assumptions, motif generation, quality assessment and comparison allow the user as much control as their a priori knowledge allows. Availability: IGRDB and the datasets mentioned herein are available at http://chipdb.wi.mit.edu/ Contact: rhonda@bu.edu

DNA binding site

10.1093/bioinformatics/18.10.1289

Citations (12)

Local DNA shape is a general principle of transcription factor binding specificity in Arabidopsis thaliana

bioRxiv (Cold Spring Harbor Laboratory) (2020)

Janik Sielemann Donat Wulf Romy Schmidt Andrea Bräutigam

A genome encodes two types of information, the “what can be made” and the “when and where”. The “what” are mostly proteins, which perform the majority of functions within living organisms, and the “when and where” is the regulatory information that encodes when and where DNA is transcribed. Currently, it is possible to efficiently predict the majority of the protein content of a genome but nearly impossible to predict the transcriptional regulation. This regulation is based upon the interaction between transcription factors and genomic sequences at the site of binding motifs [1,2,3]. Information contained within the motif is necessary to predict transcription factor binding; however, it is not sufficient [4], as experimentally verified binding sites are substantially scarcer than the corresponding binding motif. Thus, it remains challenging to derive regulatory information from binding motifs. Here we show that a random forest machine learning approach, which incorporates the 3D shape of DNA, enhances binding prediction for all 216 tested Arabidopsis thaliana transcription factors and improves the resolution of differential binding by transcription factor family members which share the same binding motif. Our results contribute to the understanding of protein-DNA recognition and demonstrate the extraction of binding site features beyond the binding sequence. We observed that those features were individually weighted for each transcription factor, even if they shared the same binding sequence. We show that the gained insights enable a more robust prediction of binding behavior regarding novel, not-in-genome motif sequences. Understanding transcription factor binding as a combination of motif sequence and motif shape brings us closer to predicting gene expression from promoter sequence.

DNA binding site

Sequence motif

Motif (music)

Transcription

10.1101/2020.09.29.318923

Citations (1)

AliBiMotif: Integrating alignment and biclustering to unravel transcription factor binding sites in DNA sequences

International Journal of Data Mining and Bioinformatics (2012)

Joana P. Gonçalves Yves Moreau Sara C. Madeira

Transcription Factors (TFs) control transcription by binding to specific sites in the promoter regions of the target genes, which can be modelled by structured motifs. In this paper we propose AliBiMotif, a method combining sequence alignment and a biclustering approach based on efficient string matching techniques using suffix trees to unravel approximately conserved sets of blocks (structured motifs) while straightforwardly disregarding non-conserved stretches in-between. The ability to ignore the width of non-conserved regions is a major advantage of the proposed method over other motif finders, as the lengths of the binding sites are usually easier to estimate than the separating distances.

DNA binding site

Conserved sequence

Multiple sequence alignment

Motif (music)

Biclustering

10.1504/ijdmb.2012.048198

Citations (2)

Discovery of Protein Phosphorylation Motifs through Exploratory Data Analysis

PLoS ONE (2011)

Yicheng Chen Kripamoy Aguan Chu‐Wen Yang Yao‐Tsung Wang Nikhil R. Pal

Background: The need for efficient algorithms to uncover biologically relevant phosphorylation motifs has become very important with rapid expansion of the proteomic sequence database along with a plethora of new information on phosphorylation sites. Here we present a novel unsupervised method, called Motif Finder (in short, F-Motif) for identification of phosphorylation motifs. F-Motif uses clustering of sequence information represented by numerical features that exploit the statistical information hidden in some foreground data. Furthermore, these identified motifs are then filtered to find "actual" motifs with statistically significant motif scores. Results and Discussion: We have applied F-Motif to several new and existing data sets and compared its performance with two well known state-of-the-art methods. In almost all cases F-Motif could identify all statistically significant motifs extracted by the state-of-the-art methods. More importantly, in addition to this, F-Motif uncovers several novel motifs. We have demonstrated using clues from the literature that most of these new motifs discovered by F-Motif are indeed novel. We have also found some interesting phenomena. For example, for CK2 kinase, the conserved sites appear only on the right side of S. However, for CDK kinase, the adjacent site on the right of S is conserved with residue P. In addition, three different encoding methods, including a novel position contrast matrix (PCM) and the simplest binary coding, are used and the ability of F-Motif to discover motifs remains quite robust with respect to encoding schemes. Conclusions: An iterative algorithm proposed here uses exploratory data analysis to discover motifs from phosphorylated data. The effectiveness of F-Motif has been demonstrated using several real data sets as well as using a synthetic data set. The method is quite general in nature and can be used to find other types of motifs also. We have also provided a server for F-Motif at http://f-motif.classcloud.org/, http://bio.classcloud.org/f-motif/ or http://ymu.classcloud.org/f-motif/.

Exploratory analysis

10.1371/journal.pone.0020025

Citations (26)

Discriminative motif discovery in DNA and protein sequences using the DEME algorithm

BMC Bioinformatics (2007)

Emma Redhead Timothy L. Bailey

Motif discovery aims to detect short, highly conserved patterns in a collection of unaligned DNA or protein sequences. Discriminative motif finding algorithms aim to increase the sensitivity and selectivity of motif discovery by utilizing a second set of sequences, and searching only for patterns that can differentiate the two sets of sequences. Potential applications of discriminative motif discovery include discovering transcription factor binding site motifs in ChIP-chip data and finding protein motifs involved in thermal stability using sets of orthologous proteins from thermophilic and mesophilic organisms. We describe DEME, a discriminative motif discovery algorithm for use with protein and DNA sequences. Input to DEME is two sets of sequences: a "positive" set and a "negative" set. DEME represents motifs using a probabilistic model, and uses a novel combination of global and local search to find the motif that optimally discriminates between the two sets of sequences. DEME is unique among discriminative motif finders in that it uses an informative Bayesian prior on protein motif columns, allowing it to incorporate prior knowledge of residue characteristics. We also introduce four synthetic discriminative motif discovery problems that are designed for evaluating discriminative motif finders in various biologically motivated contexts. We test DEME using these synthetic problems and on two biological problems: finding yeast transcription factor binding motifs in ChIP-chip data, and finding motifs that discriminate between groups of thermophilic and mesophilic orthologous proteins. Using artificial data, we show that DEME is more effective than a non-discriminative approach when there are "decoy" motifs or when a variant of the motif is present in the "negative" sequences. With real data, we show that DEME is as good as, but not better than, non-discriminative algorithms at discovering yeast transcription factor binding motifs. We also show that DEME can find highly informative thermal-stability protein motifs. Binaries for the stand-alone program DEME are free for academic use and are available at http://bioinformatics.org.au/deme/
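The notion of a pattern that differentiates a positive set from a negative set can be illustrated with a much cruder stand-in than DEME's probabilistic model: score each exact k-mer by the fraction of positive sequences that contain it minus the fraction of negative sequences that contain it. The function name and the toy sequences below are hypothetical, and this scoring is a substitute technique chosen for brevity, not the DEME algorithm.

```python
def discriminative_kmers(positive, negative, k):
    """Rank exact k-mers by (positive fraction - negative fraction)."""

    def containing_fraction(seqs):
        counts = {}
        for s in seqs:
            # A set ensures each sequence counts a k-mer at most once.
            for kmer in {s[i:i + k] for i in range(len(s) - k + 1)}:
                counts[kmer] = counts.get(kmer, 0) + 1
        return {m: c / len(seqs) for m, c in counts.items()}

    pos = containing_fraction(positive)
    neg = containing_fraction(negative)
    scores = {m: f - neg.get(m, 0.0) for m, f in pos.items()}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Toy data (hypothetical): TGCA occurs in every positive sequence and
# in no negative sequence.
ranking = discriminative_kmers(
    positive=["AATGCA", "CCTGCA"],
    negative=["AAAAAA", "CCCCCC"],
    k=4,
)
best_kmer, best_score = ranking[0]
```

Exact k-mer matching ignores degenerate positions, which is one reason real discriminative motif finders use probabilistic motif models instead.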

Discriminative model

Motif (music)

Sequence motif

10.1186/1471-2105-8-385

Citations (105)

Identification of weak motifs in multiple biological sequences using genetic algorithm

Topon Kumar Paul Hitoshi Iba

Recognition of motifs in multiple unaligned sequences provides an insight into protein structure and function. The task of discovering these motifs is very challenging because most of them occur in different sequences as different mutated forms of the original consensus motif and thus have weakly conserved regions. Different score metrics and algorithms have been proposed for motif recognition. In this paper, we propose a new genetic-algorithm-based method for identifying multiple motif instances in multiple biological sequences. The experimental results on simulated and real data show that our algorithm can identify multiple occurrences of a weak motif in single sequences as well as in multiple sequences. Moreover, it can identify weakly conserved regions more accurately than other genetic-algorithm-based motif discovery methods.

Identification

10.1145/1143997.1144044

Citations (36)

Detection of over-represented motifs corresponding to known TFBSs via motif clustering and matching

Computers & Mathematics with Applications (2009)

Lifang Liu Licheng Jiao

Motif (music)

Sequence motif

10.1016/j.camwa.2009.10.016

Citations (1)