GENOMIN: A SOFTWARE FRAMEWORK FOR READING GENOMIC SIGNALS

2011 
Data mining produces models that capture and represent hidden patterns in DNA structure. Any attempt to develop and test new data mining algorithms in bioinformatics must begin with an efficient method for reading even very large FASTA files step by step. We have created an open-source template software, named GENOMIN, for analyzing genetic sequences of different sizes downloaded from NCBI servers; its aim is to provide a platform that can work with files as large as a whole chromosome or genome sequence. Large NCBI FASTA files, which store the sequences of individual chromosomes, are produced on other systems such as UNIX. Processing these files on other operating systems is difficult because each system uses a different marker to indicate the end of a line. GENOMIN reads FASTA files by continuous buffer reading, without relying on the end-of-line markers. The result of this type of reading is a raw, noise-free DNA sequence of the entire file, regardless of its size. We present three examples to demonstrate how the program can be used in biology: estimation of GC content, identification of repetitive elements, and search for sequences with different biological functions (e.g. duplicated regions or potential binding sites for transcription factors). Further development of this open-source software is limited only by the researcher's programming skills. Our tests show that GENOMIN can perform various analyses on large sequence files and can work with different algorithms used in biology.
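The buffer-reading idea and the GC content example can be illustrated with a minimal Python sketch. This is not GENOMIN's code: the function names `read_sequence_chunks` and `gc_content`, the `buffer_size` parameter, and the choice to count only A, C, G and T toward the total are assumptions made for illustration. The sketch reads the file in fixed-size binary chunks, skips header lines that begin with `>`, and discards carriage-return and line-feed bytes, so the same sequence is recovered whichever end-of-line convention the file uses.

```python
def read_sequence_chunks(path, buffer_size=1 << 16):
    """Yield sequence bytes from a FASTA file, one buffer at a time.

    Hypothetical helper: header lines (starting with '>') are skipped and
    CR/LF bytes are dropped, so UNIX, Windows and classic Mac line endings
    all give the same output. State is kept across buffers, so a header or
    a line ending may straddle a buffer boundary.
    """
    in_header = False
    at_line_start = True
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(buffer_size)
            if not chunk:
                break
            out = bytearray()
            for byte in chunk:
                if byte in (0x0A, 0x0D):  # LF or CR: end-of-line marker
                    in_header = False
                    at_line_start = True
                elif at_line_start and byte == 0x3E:  # '>' opens a header line
                    in_header = True
                    at_line_start = False
                else:
                    at_line_start = False
                    if not in_header:
                        out.append(byte)
            if out:
                yield bytes(out)


def gc_content(path, buffer_size=1 << 16):
    """Return the G+C fraction of the sequence, streaming over the file.

    Assumption: only A, C, G and T count toward the total, so ambiguous
    bases such as N are ignored.
    """
    gc = total = 0
    for chunk in read_sequence_chunks(path, buffer_size):
        upper = chunk.upper()
        gc += upper.count(b"G") + upper.count(b"C")
        total += sum(upper.count(base) for base in (b"A", b"C", b"G", b"T"))
    return gc / total if total else 0.0
```

The byte-by-byte loop is written for clarity rather than speed; memory use stays bounded by `buffer_size` because the full sequence is never held at once. The same chunk generator could feed other analyses, such as a repeat or motif search, provided matches that span two consecutive chunks are handled.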