Osamu Gotoh

National Institute of Advanced Industrial Science and Technology

Author Statistics

Papers

Citation

H-Index

i-10 index

Co-Authors

Hideki Nagasaki

Kazusa DNA Research Institute

Makiko Suwa

Aoyama Gakuin University

Hiroaki Iwata

Kyoto University

Tetsushi Yada

Kyushu Institute of Technology

Shinsuke Yamada

Asahi Intecc (Japan)

Ryuichiro Nakato

The University of Tokyo

Hayato Yamana

Jichi Medical University

Kintomo Takakura

Central Brain Tumor Registry of the United States

Masanori Arita

National Institute of Genetics

Tatsuya Nishizawa

Ritsumeikan University

Cooperative Institutions

The University of Tokyo

Kyoto University

National Institute of Advanced Industrial Science and Technology

Tohoku University

Tokyo University of Science

Nagoya University

Osaka University

Japan Science and Technology Agency

National Institute of Technology and Evaluation

University of Tsukuba

Research Field

Improvement in Speed and Accuracy of Multiple Sequence Alignment Program PRIME

IPSJ Transactions on Bioinformatics (2008)

Shinsuke Yamada Osamu Gotoh Hayato Yamana

Multiple sequence alignment (MSA) is a useful tool in bioinformatics. Although many MSA algorithms have been developed, there is still room for improvement in accuracy and speed. We have developed an MSA program, PRIME, whose crucial feature is the use of a group-to-group sequence alignment algorithm with a piecewise linear gap cost. We have shown that PRIME is one of the most accurate MSA programs currently available. However, PRIME is slower than other leading MSA programs. To improve computational performance, we have incorporated anchoring and grouping heuristics into PRIME. The anchoring method locates well-conserved regions in a given MSA as anchor points to reduce the region of the dynamic programming (DP) matrix to be examined, while the grouping method detects conserved subfamily alignments specified by a phylogenetic tree in a given MSA to reduce the number of iterative refinement steps. The results of BAliBASE 3.0 and PREFAB 4 benchmark tests indicated that these heuristics reduced the computational time of PRIME by more than 60%, while the average alignment accuracy measures decreased by at most 2%. Additionally, we evaluated the effectiveness of an iterative refinement algorithm based on maximal expected accuracy (MEA). Our experiments revealed that when many sequences are aligned, the MEA-based algorithm significantly improves alignment accuracy compared with the standard version of PRIME, at the expense of a considerable increase in computation time.

Multiple sequence alignment

Heuristics

Benchmark (surveying)

Sequence (biology)

Speedup

10.2197/ipsjtbio.1.2

Citations (5)
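
The PRIME abstract above mentions two ideas that a small sketch may make more concrete: a piecewise linear gap cost and anchor points that shrink the DP region. The Python below is only an illustration under stated assumptions, not PRIME's implementation. The gap cost is assumed to be the minimum over a few linear pieces, and the function names, the parameter values in PIECES, and the anchor coordinates are all hypothetical.

```python
def piecewise_linear_gap_cost(length, pieces):
    """Cost of a gap of `length` residues.

    Assumption: the cost is the minimum over linear pieces, each given as
    (open_cost, extend_cost). Long gaps are then charged by the piece with
    the smaller per-residue cost.
    """
    if length <= 0:
        return 0.0
    return min(open_cost + extend_cost * length for open_cost, extend_cost in pieces)


def anchored_blocks(len_a, len_b, anchors):
    """Split the full DP matrix into sub-rectangles between consecutive anchors.

    `anchors` is a list of (i, j) coordinates assumed to lie on the alignment
    path; they must be increasing in both coordinates.
    """
    blocks = []
    prev_i, prev_j = 0, 0
    for i, j in sorted(anchors):
        if j < prev_j:
            raise ValueError("anchors must be monotone in both sequences")
        blocks.append(((prev_i, i), (prev_j, j)))
        prev_i, prev_j = i, j
    blocks.append(((prev_i, len_a), (prev_j, len_b)))
    return blocks


def cell_count(blocks):
    """Number of DP cells covered by a list of sub-rectangles."""
    return sum((i1 - i0) * (j1 - j0) for (i0, i1), (j0, j1) in blocks)


# Hypothetical parameters: a steep piece for short gaps, a flat one for long gaps.
PIECES = [(10.0, 2.0), (25.0, 0.5)]

full = cell_count(anchored_blocks(100, 100, []))
reduced = cell_count(anchored_blocks(100, 100, [(30, 32), (70, 68)]))
```

With these illustrative numbers, two anchors reduce the examined area from 10000 cells to 3360, which is the kind of saving the anchoring heuristic aims for.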

RDF Curator: A Novel Workflow that Generates Semantic Graph from Literature for Curation Using Text Mining

Nature Precedings (2010)

Yusuke Komiyama Osamu Gotoh

There exist few databases that enable cross-reference among various research fields related to bioenergy. Cross-reference is highly desired among bioinformatics databases related to environment, energy, and agriculture for better mutual cooperation. By uniting Semantic Graph, we can economically construct a distributed database, regardless of the size of research laboratories and research endeavors. Our purpose is to design and develop a workflow based on RDF (Resource Description Framework) that generates Semantic Graph for a set of technical terms extracted from documents of various formats, such as PDF, HTML, and plain text. Our attempt is to generate Semantic Graph as a result of text mining including morphological analysis and syntax analysis. We have developed a prototype workflow program named "RDF Curator". By using this system, various types of documents can be automatically converted into RDF. "RDF Curator" is composed of general tools and libraries so that no special environment is needed. Hence, "RDF Curator" can be used on many platforms, such as Mac OS X, Linux, and Windows (Cygwin). We expect that our system can assist human curators in constructing Semantic Graph. Although fast and high-throughput, the present version of "RDF Curator" is less accurate than human curators. As a future task, we have to improve the accuracy of the workflow. In addition, we also plan to apply our system to analysis of network similarity.

RDF Schema

10.1038/npre.2010.5072.1

Citations (0)

Multiple Sequence Alignment

Chapman & Hall/CRC computer and information science series (2005)

Osamu Gotoh Shinsuke Yamada Tetsushi Yada

Sequence (biology)

10.1201/9781420036275.ch3

Citations (7)

An Algorithm for Classification of Alternative Splicing and Transcriptional Initiation and Its Genome-Wide Application

Proceedings Genome Informatics Workshop/Genome informatics (2003)

Hideki Nagasaki Makiko Suwa Osamu Gotoh

We developed an algorithm that classifies all observed units of alternative splicing and transcriptional initiation and termination (UASTs) into an extendable set of distinct elementary patterns, when a collection of alignments between genomic DNA sequences and a set of cDNA/EST sequences are provided. The algorithm first converts aligned exon-intron structures into bit arrays, extracts UASTs, and then encodes each UAST into a pair (or vector) of decimal numbers, which specify the corresponding pattern. This system can uniquely and compactly encode not only typical patterns but also any rare or novel patterns which have usually been collectively assigned as "others". This system deals with transcriptional variation and alternative splicing in the same framework of classification.

ENCODE

Decimal

10.11234/gi1990.14.424

Citations (4)
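
The abstract above describes converting exon-intron structures into bit arrays and encoding a pattern as a pair of decimal numbers. The following Python sketch shows one way such an encoding could look. It is a simplified illustration, not the published algorithm: the segmentation by shared exon boundaries, the half-open coordinate convention, and the names to_bit_array and encode_pair are all assumptions made for this example.

```python
def to_bit_array(exons, boundaries):
    """Mark each segment between consecutive boundaries as exonic (1) or not (0).

    `exons` is a list of half-open (start, end) genomic intervals;
    `boundaries` is the sorted list of all exon boundaries being compared.
    """
    bits = []
    for seg_start, seg_end in zip(boundaries, boundaries[1:]):
        covered = any(s <= seg_start and seg_end <= e for s, e in exons)
        bits.append(1 if covered else 0)
    return bits


def bits_to_decimal(bits):
    """Read a bit array as a binary number, most significant bit first."""
    value = 0
    for bit in bits:
        value = value * 2 + bit
    return value


def encode_pair(transcript_a, transcript_b):
    """Encode two transcript structures as a pair of decimal numbers.

    Assumption: both transcripts are aligned to the same genomic region, so
    their exon boundaries can be pooled to define common segments.
    """
    boundaries = sorted(
        {pos for transcript in (transcript_a, transcript_b)
         for exon in transcript for pos in exon}
    )
    return (
        bits_to_decimal(to_bit_array(transcript_a, boundaries)),
        bits_to_decimal(to_bit_array(transcript_b, boundaries)),
    )


# Hypothetical exon-skipping example: the middle exon is absent in isoform_b.
isoform_a = [(0, 100), (200, 300), (400, 500)]
isoform_b = [(0, 100), (400, 500)]
pattern = encode_pair(isoform_a, isoform_b)  # bit arrays 10101 and 10001
```

In this toy case the pair is (21, 17). Under this scheme any other combination of exon usage would map to a different pair, which is the property that lets rare patterns be encoded rather than grouped as "others".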

Genome to Function, the Role of Sequence Alignment

Kobunshi (1999)

Osamu Gotoh

Sequence (biology)

10.1295/kobunshi.48.337

Citations (0)

Prediction of melting profiles and local Helix stability for sequenced DNA

Advances in Biophysics (1983)

Osamu Gotoh

Helix (gastropod)

10.1016/0065-227x(83)90007-2

Citations (126)

Multiple sequence alignment: Algorithms and applications

Advances in Biophysics (1999)

Osamu Gotoh

Sequence (biology)

Theme (computing)

10.1016/s0065-227x(99)80007-0

Citations (94)

Coherent Structural Prediction of a Set of Paralogous Genes on a Eukaryotic Genome

Proceedings Genome Informatics Workshop/Genome informatics (1999)

Osamu Gotoh

Following the completion of genomic sequencing of S. cerevisiae and C. elegans, complete sequencing of several eukaryotic genomes, including that of human, is expected to be accomplished within a few years. An essential but as yet unresolved problem is to locate genes on a genomic sequence and to precisely predict their internal (exon-intron) structures. Statistical gene-finding methods have attained significant success, but the performance of even the best available methods is still unsatisfactory for many practical purposes [1, 2]. Homology-based gene-identification methods can considerably improve the accuracy of prediction, provided that one or more known protein or mRNA sequences closely related to the target gene are found in databases [5]. However, it is often observed that the closest relative to a gene is another gene on the same genome. In fact, genomes of higher eukaryotes, such as C. elegans and A. thaliana, possess a number of large gene families, members of which are mutually well related but far from any genes in other organisms. Here, I propose a method for simultaneously predicting the gene structures of all members in such a species-specific family.

Gene prediction

Homology

10.11234/gi1990.10.255

Citations (0)

Optimal sequence alignment allowing for long gaps

Bulletin of Mathematical Biology (1990)

Osamu Gotoh

Sequence (biology)

Multiple sequence alignment

Constant (computer programming)

10.1007/bf02458577

Citations (52)

Pattern matching of biological sequences with limited storage

Bioinformatics (1987)

Osamu Gotoh

Existing methods for getting the locally best matched alignments between a pair of biological sequences require O(N^2) computational steps and O(N^2) storage, where N is the average sequence length. An improved method is presented with which the storage requirement is greatly reduced, while the computational steps remain O(N^2). Only a small number of additional steps are required to display any common sub-sequences with similarity scores greater than a given threshold. The alignments found by the algorithm are optimal in the sense that their scores are locally maximal, where each score is a sum of weights given to individual matches/replacements, insertions and deletions involved in the alignment. The algorithm was implemented in the C programming language on a personal computer. A data area of 64 kbytes in random access memory and a few hundred kbytes on a disk is sufficient for comparing two protein or nucleic acid sequences of 2500 residues. The programs are particularly valuable when used in combination with fast sequence search programs.

Sequence (biology)

Similarity (geometry)

Auxiliary memory

10.1093/bioinformatics/3.1.17

Citations (27)
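
The storage reduction described in the abstract above can be illustrated with a minimal Python sketch that computes the best local alignment score while keeping only one previous row of the DP matrix. This is not the published algorithm: it assumes a simple linear gap penalty, uses hypothetical scoring parameters, and reports only the best score and its end position instead of recovering every alignment above a threshold.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of `a` and `b` using O(len(b)) storage.

    Time remains O(len(a) * len(b)), but only two rows of the score matrix
    are held in memory at once. The scoring values are illustrative.
    Returns (best_score, (end_i, end_j)) with 1-based end coordinates.
    """
    prev = [0] * (len(b) + 1)
    best, best_end = 0, (0, 0)
    for i, ca in enumerate(a, 1):
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, 1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            score = max(0, diag, up, left)
            curr[j] = score
            if score > best:
                best, best_end = score, (i, j)
        prev = curr  # the older row is no longer needed
    return best, best_end


# Example: the shared segment "ACG" gives three matches at 2 points each.
result = local_alignment_score("TTACGTT", "GGACGAA")
```

Knowing the end position, an alignment could be recovered by rerunning the recurrence on the reversed prefixes, a common companion step that is left out here to keep the sketch short.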