    CAMUS DB for Amino Acid Sequence Data
    Citations: 1 · References: 1 · Related papers: 20
    Abstract:
    The DDBJ/EMBL/GenBank International Nucleotide Sequence Database is still growing, with a doubling time that has stayed only slightly longer than one year in recent years. This growth carries over to the DDBJ Amino acid sequence Database (DAD) [4], which is built by translating the nucleotide sequences in CDS regions annotated in DDBJ [3], and it lengthens the computation time for homology searches of DAD just as it does for the DNA database. We therefore created a compressed sequence database consisting of clusters of highly homologous sequences, stored in multiply aligned form together with representative sequences: the DAD version of CAMUS (Compressed database for homology searches And MUltiple aligned Sequence database) [2].
    Keywords:
    Homology
    Sequence (biology)
    Sequence logo
    Methanococcus
    Coding region
    Alignment-free sequence analysis
    Protein sequencing
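The abstract above describes clusters of aligned sequences, each with a representative. One plausible way to use such a structure is a two-stage search: score the query against the representatives only, then report every member of the clusters that hit. The sketch below is an illustration under stated assumptions, not the CAMUS implementation: the `Cluster` layout, the `shared_kmers` score (a stand-in for a real homology search), and the `min_score` cutoff are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical cluster record: one representative plus aligned members."""
    representative: str   # name of the representative sequence
    members: dict         # name -> aligned sequence, '-' marks a gap


def shared_kmers(a: str, b: str, k: int = 3) -> int:
    """Stand-in similarity score: number of distinct k-mers two sequences share."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    return len(ka & kb)


def search(query: str, clusters: list, min_score: int = 3) -> list:
    """Score the query against representatives only, then expand hit clusters."""
    hits = []
    for cluster in clusters:
        # Strip alignment gaps before scoring the representative.
        rep_seq = cluster.members[cluster.representative].replace("-", "")
        score = shared_kmers(query, rep_seq)
        if score >= min_score:
            hits.append((score, cluster))
    hits.sort(key=lambda hit: -hit[0])  # best-scoring clusters first
    return [name for _, cluster in hits for name in cluster.members]


clusters = [
    Cluster("p1", {"p1": "MKTAYIAK-QR", "p2": "MKTAYIAKLQR"}),
    Cluster("q1", {"q1": "GSHMLEDPVD"}),
]
print(search("MKTAYIAK", clusters))
```

Only one comparison per cluster is made, so the search cost scales with the number of clusters rather than the number of sequences, which is the motivation the abstract gives for compressing the database.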
    Citations (599)
    FISH (Fast Index Search for Homologous coding sequences) consists of a database and associated software and is intended to function as a directory of protein-coding gene sequences. The FISH index contains descriptions of 22 361 DNA sequences from release 69.0 of the GenBank genetic sequence database. Complete coding sequences are represented numerically with counts of nucleotides and synonymous codons, and with GenBank LOCUS names and short descriptions. The software permits the database to be queried by GenBank LOCUS name, sequence length (expressed as total number of codons), or by comparison with a DNA sequence. In the latter case, the numerical descriptions are compared with simple distance measures in place of actual DNA sequences. The FISH package can be used to rapidly assemble lists of similar coding sequences, without regard to functional annotation or sequence alignment. Typical search times are well under a minute on widely available IBM-compatible microcomputers.
    Coding region
    Multiple sequence alignment
    Sequence (biology)
    Alignment-free sequence analysis
    Smith–Waterman algorithm
    Sequence logo
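The FISH abstract above compares numerical descriptions of coding sequences instead of the sequences themselves. A minimal sketch of that idea follows, assuming relative codon frequencies as the description and Manhattan distance as the "simple distance measure"; the abstract does not name the exact measure, so both choices, along with the function names, are assumptions.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so every profile has the same layout.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def codon_profile(cds: str) -> list:
    """Relative codon frequencies of a coding sequence, read in frame 0."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(
        cds[i:i + 3] for i in range(0, usable, 3) if cds[i:i + 3] in CODON_SET
    )
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return [counts[c] / total for c in CODONS]


def profile_distance(p: list, q: list) -> float:
    """Manhattan distance between two codon profiles (assumed measure)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def rank_by_similarity(query: str, database: dict) -> list:
    """Return database entry names ordered from most to least similar profile."""
    qp = codon_profile(query)
    scored = [
        (profile_distance(qp, codon_profile(seq)), name)
        for name, seq in database.items()
    ]
    return [name for _, name in sorted(scored)]


database = {"a": "ATGGCCGCCTAA", "b": "ATGTTTTTTTAA"}
print(rank_by_similarity("ATGGCCGCCTAA", database))
```

In a real index the profiles would be computed once and stored, so a query costs one pass over fixed-length vectors and no alignment, consistent with the short search times the abstract reports.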
    Citations (228)
    When routinely analysing very long stretches of DNA sequences produced by genome sequencing projects, detailed analysis of database search results becomes exceedingly time consuming. To reduce the tedious browsing of large quantities of protein similarities, two programs, MSPcrunch and Blixem, were developed, which assist in processing the results from the database search programs in the BLAST suite. MSPcrunch removes biased composition and redundant matches while keeping weak matches that are consistent with a larger gapped alignment. This makes BLAST searching in practice more sensitive and reduces the risk of overlooking distant similarities. Blixem is a multiple sequence alignment viewer for X-windows which makes it significantly easier to scan and evaluate the matches ratified by MSPcrunch. In Blixem, matches to the translated DNA query sequence are simultaneously aligned in three frames. Also, the distribution of matches over the whole DNA query is displayed. Examples of usage are drawn from 36 C. elegans cosmid clones totalling 1.2 megabases, to which these tools were applied.
    Workbench
    Homology
    Sequence (biology)
    Sequence homology
    Since the publication of the first rapid method for comparing biological sequences 15 years ago (1), DNA and protein sequence comparisons have become routine steps in biochemical characterization, from newly cloned proteins to entire genomes. As the DNA and protein sequence databases become more complete, a sequence similarity search is more likely to reveal a database sequence with statistically significant similarity, and thus inferred homology, to a query sequence. Indeed, even in the archaebacterium Methanococcus jannaschii, more than 40% of the open reading frames could be assigned a function based on significant sequence similarity to a protein of known function (2).
    Sequence (biology)
    Similarity (geometry)
    Citations (607)
    Abstract MOTIVATION: To maximize the chances of biological discovery, homology searching must use an up-to-date collection of sequences. However, the available sequence databases are growing rapidly and are partially redundant in content. This leads to increasing strain on CPU resources and decreasing density of first-hand annotation. RESULTS: These problems are addressed by clustering closely similar sequences to yield a covering of sequence space by a representative subset of sequences. No pair of sequences in the representative set has >90% mutual sequence identity. The representative set is derived by an exhaustive search for close similarities in the sequence database in which the need for explicit sequence alignment is significantly reduced by applying deca- and pentapeptide composition filters. The algorithm was applied to the union of the Swissprot, Swissnew, Trembl, Tremblnew, Genbank, PIR, Wormpep and PDB databases. The all-against-all comparison required to generate a representative set at 90% sequence identity was accomplished in 2 days CPU time, and the removal of fragments and close similarities yielded a size reduction of 46%, from 260 000 unique sequences to 140 000 representative sequences. The practical implications are (i) faster homology searches using, for example, Fasta or Blast, and (ii) unified annotation for all sequences clustered around a representative. As tens of thousands of sequence searches are performed daily world-wide, appropriate use of the non-redundant database can lead to major savings in computer resources, without loss of efficacy. AVAILABILITY: A regularly updated non-redundant protein sequence database (nrdb90), a server for homology searches against nrdb90, and a Perl script (nrdb90.pl) implementing the algorithm are available for academic use from http://www.embl-ebi.ac.uk/holm/nrdb90. CONTACT: holm@embl-ebi.ac.uk
    UniProt
    Sequence (biology)
    RefSeq
    Protein sequencing
    Sequence logo
    Perl
    Homology
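The representative-set idea in the abstract above can be sketched as a greedy pass: visit sequences from longest to shortest, and keep one only if it is not more than 90% identical to a representative already kept. This is not the nrdb90.pl algorithm. It assumes a simplified ungapped, left-aligned identity in place of a real alignment, and uses a single shared-pentapeptide check as a cheap prefilter; the function names and the example sequences are hypothetical.

```python
def kmers(seq: str, k: int = 5) -> set:
    """Set of all k-peptides in a sequence (empty if the sequence is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def ungapped_identity(a: str, b: str) -> float:
    """Fraction of matching positions over the shorter sequence, left-aligned.

    A deliberate simplification: a real tool would align the sequences first.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def representatives(seqs: dict, threshold: float = 0.9, k: int = 5) -> list:
    """Greedily pick sequences so that no kept pair exceeds the identity threshold."""
    reps = []  # (name, sequence, k-mer set) of each representative kept so far
    for name, seq in sorted(seqs.items(), key=lambda kv: -len(kv[1])):
        seq_kmers = kmers(seq, k)
        redundant = False
        for _, rep_seq, rep_kmers in reps:
            # Prefilter: skip the identity check when no pentapeptide is shared.
            # Sequences shorter than k have no k-mers, so they bypass the filter.
            if len(seq) >= k and not (seq_kmers & rep_kmers):
                continue
            if ungapped_identity(seq, rep_seq) > threshold:
                redundant = True
                break
        if not redundant:
            reps.append((name, seq, seq_kmers))
    return [name for name, _, _ in reps]


seqs = {
    "full": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    "frag": "MKTAYIAKQRQISFVKSHFS",        # fragment of "full"
    "other": "GSHMLEDPVDAFWWWCCCNNNPPP",   # unrelated sequence
}
print(representatives(seqs))
```

Processing longest first means a fragment is always compared against the full-length sequence that contains it, so the fragment is the one discarded. The set intersection is far cheaper than an identity computation, which mirrors the role the abstract assigns to the composition filters.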
    A scheme for fast similarity search of nucleotide sequences is suggested, based on sequence imaging, which yields chunks of information much smaller than the original sequence but more specialized for comparison. Three methods were developed using three different imaging functions. The first is based on identity of local sites of up to twelve nucleotides, the second is based on statistical homology of local 42-nucleotide fragments, and the third is based on the homology of 100-150 nucleotide fragments and models the comparison of restriction maps. Each of them requires a library of sequence images. The total size of such a library is less than the size of the sequences stored in compressed form. The sequences are aligned, allowing local homology searches. The method reduces the total time for a similarity search about 100-fold. The programs can easily be included in any software that allows users to define their own sets of sequences. One of the programs is implemented within the DNA-SUN software and is used at the Institute of Molecular Genetics and the Institute of Molecular Biology.
    Homology
    Sequence (biology)
    Similarity (geometry)
    Sequence logo
    Sequence homology
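The first method in the abstract above relies on identity of local sites of up to twelve nucleotides. One way to read that is as an exact-match index over 12-mers, built once per library and probed with the 12-mers of the query. The sketch below follows that reading; the abstract does not specify the imaging function, so the index layout, the diagonal bookkeeping, and the function names are assumptions.

```python
from collections import defaultdict


def build_index(library: dict, k: int = 12) -> dict:
    """Map every k-mer in the library to the (sequence name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in library.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def candidate_hits(query: str, index: dict, k: int = 12) -> dict:
    """Count exact k-mer matches per (sequence name, diagonal).

    The diagonal is library position minus query position; many matches on
    one diagonal suggest a locally identical region worth aligning in full.
    """
    hits = defaultdict(int)
    for j in range(len(query) - k + 1):
        for name, i in index.get(query[j:j + k], ()):
            hits[(name, i - j)] += 1
    return dict(hits)


library = {
    "s1": "AAACCCGGGTTTACGTACGTAA",
    "s2": "TTTTTTTTTTTTTTTTTTTT",
}
index = build_index(library)
print(candidate_hits("CCCGGGTTTACGTA", index))
```

The index is built once and reused for every query, so each search touches only the query's own 12-mers instead of scanning the library, which is the kind of saving the abstract attributes to searching images instead of sequences.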
    Citations (0)
    Improved sensitivity of biological sequence database searches. Douglas L. Brutlag, Jean-Pierre Dautricourt (1), Sunil Maulik (1), John Relph (1). (1) IntelliGenetics Inc., 700 East El Camino Real, Mountain View, CA 94040, USA. Bioinformatics, Volume 6, Issue 3, July 1990, Pages 237–245, https://doi.org/10.1093/bioinformatics/6.3.237. Received: 10 October 1989; Accepted: 01 May 1990; Published: 01 July 1990.
    Sequence (biology)
    Similarity (geometry)
    Matrix (chemical analysis)
    ENCODE
    Sequence logo