    Math Agents: Computational Infrastructure, Mathematical Embedding, and Genomics
    Citations: 4 | References: 0 | Related papers: 10
    Abstract:
    Advances in generative AI could be accelerated by more accessible mathematics. Beyond human-AI chat, large language models (LLMs) are emerging in programming, algorithm discovery, and theorem proving, yet their application to genomics remains limited. This project introduces Math Agents and mathematical embedding as fresh entries to the "Moore's Law of Mathematics", using a GPT-based workflow to convert equations from the literature into LaTeX and Python formats. While many digital equation representations exist, automated tools for large-scale evaluation are lacking. LLMs are pivotal as linguistic user interfaces, providing natural language access for human-AI chat and formal languages for large-scale AI-assisted computational infrastructure. Given the infinite formal possibility spaces, Math Agents, which interact with math, could potentially shift us from "big data" to "big math". Math, unlike the more flexible natural language, has properties subject to proof, enabling its use beyond traditional applications like high-validation math-certified icons for AI alignment aims. This project aims to use Math Agents and mathematical embeddings to address the ageing issue in information systems biology by applying multiscalar physics mathematics to disease models and genomic data. Generative AI with episodic memory could help analyse causal relations in longitudinal health records, using SIR Precision Health models. Genomic data is suggested for addressing the unsolved Alzheimer's disease problem.
    Keywords:
    Computational genomics
    Python
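As an illustration of what carrying one equation in both LaTeX and Python forms might look like, here is a minimal sketch. It is not the project's GPT-based workflow. The abstract does not define the SIR Precision Health model, so the classic susceptible-infected-recovered (SIR) epidemic equations stand in as an assumption, and the names `SIR_LATEX`, `sir_derivatives`, and `simulate`, the forward-Euler integration, and all parameter values are hypothetical choices made for this example.

```python
# LaTeX form of the classic SIR equations (an illustrative stand-in,
# not taken from the paper).
SIR_LATEX = r"""
\frac{dS}{dt} = -\beta \frac{S I}{N}, \quad
\frac{dI}{dt} = \beta \frac{S I}{N} - \gamma I, \quad
\frac{dR}{dt} = \gamma I
"""


def sir_derivatives(s, i, r, beta, gamma):
    """Python form of the same equations: return (dS/dt, dI/dt, dR/dt)."""
    n = s + i + r                      # total population N
    new_infections = beta * s * i / n  # flow from S to I
    recoveries = gamma * i             # flow from I to R
    return -new_infections, new_infections - recoveries, recoveries


def simulate(s, i, r, beta, gamma, dt, steps):
    """Integrate the equations with a simple forward-Euler scheme."""
    for _ in range(steps):
        ds, di, dr = sir_derivatives(s, i, r, beta, gamma)
        s, i, r = s + dt * ds, i + dt * di, r + dt * dr
    return s, i, r


# Hypothetical parameter values, chosen only to exercise the code.
final_s, final_i, final_r = simulate(990.0, 10.0, 0.0,
                                     beta=0.3, gamma=0.1,
                                     dt=0.1, steps=100)
```

Keeping the LaTeX string next to the executable function makes it easy to check that the two representations agree term by term. For instance, the three derivatives sum to zero in both forms, so the total population is conserved.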
    Motivation: Genomic intervals are one of the most prevalent data structures in computational genome biology and are used to represent features ranging from genes, to DNA binding sites, to disease variants. Operations on genomic intervals provide a language for asking questions about relationships between features. While there are excellent interval arithmetic tools for the command line, they are not smoothly integrated into Python, one of the most popular general-purpose computational and visualization environments. Results: Bioframe is a library that enables flexible and performant operations on genomic interval dataframes in Python. Bioframe extends the Python data science stack to use cases in computational genome biology by building directly on top of two of the most commonly used Python libraries, NumPy and Pandas. The bioframe API enables flexible column names and column orders, and decouples operations from data formats to avoid unnecessary conversions, a common scourge for bioinformaticians. Bioframe achieves these goals while maintaining high performance and a rich set of features. Availability and implementation: Bioframe is open-source under the MIT license, cross-platform, and can be installed from the Python Package Index. The source code is maintained by Open2C on GitHub at https://github.com/open2c/bioframe.
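The kind of interval question described above (which features overlap which) can be sketched in plain Python. This sketch deliberately does not use bioframe's API. The `(chrom, start, end)` tuple layout, the half-open coordinate convention, the function names `overlaps` and `find_overlaps`, and the example coordinates are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical layout: (chrom, start, end), half-open interval [start, end).
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals share a chromosome and at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def find_overlaps(features: List[Interval],
                  queries: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return every (feature, query) pair that overlaps.

    This is a brute-force all-pairs comparison, kept simple for clarity.
    """
    return [(f, q) for f in features for q in queries if overlaps(f, q)]


# Made-up example data: three "genes" and two single-base "variants".
genes = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
variants = [("chr1", 150, 151), ("chr2", 300, 301)]

pairs = find_overlaps(genes, variants)
```

The all-pairs loop is quadratic in the number of intervals, so it only suits small inputs. Dedicated interval libraries generally rely on sorting or indexing to handle genome-scale data.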
    Background: Today, there are more than a hundred times as many sequenced prokaryotic genomes as there were in the year 2000. The economical sequencing of genomic DNA has facilitated a whole new approach to microbial genomics. The real power of genomics is manifested through comparative genomics, which can reveal strain-specific characteristics, diversity within species, and many other aspects. However, comparative genomics is a field not easily entered by scientists with limited computational skills. The CMG-biotools package is designed for microbiologists with limited knowledge of computational analysis and can be used to perform a number of analyses and comparisons of genomic data. Results: The CMG-biotools system presents a stand-alone interface for comparative microbial genomics. The package is a customized operating system, based on Xubuntu 10.10, available through the open source Ubuntu project. The system can be installed on a virtual computer, allowing the user to run the system alongside any other operating system. Source code for all programs is provided under a GNU license, which makes it possible to transfer the programs to other systems if so desired. We here demonstrate the package by comparing and analyzing the diversity within the class Negativicutes, represented by 31 genomes including 10 genera. The analyses include 16S rRNA phylogeny, basic DNA and codon statistics, proteome comparisons using BLAST, and graphical analyses of DNA structures. Conclusion: This paper shows the strength and diverse use of the CMG-biotools system. The system can be installed on a wide range of host operating systems and utilizes as much of the host computer as desired. It allows the user to compare multiple genomes from various sources using standardized data formats and intuitive visualizations of results. The examples presented here clearly show that users with limited computational experience can perform complicated analyses without much training.
    Comparative Genomics
    Workbench
    Computational genomics
    Structural genomics
    Citations (109)
    Computational genomics
    Representation
    Comparative Genomics
    Functional Genomics
    Sequence (biology)
    Citations (3)
    Functional Genomics
    Genome Biology
    Computational genomics
    Section (typography)
    Structural genomics
    Citations (0)
    Motivation: Genomic intervals are one of the most prevalent data structures in computational genome biology and are used to represent features ranging from genes, to DNA binding sites, to disease variants. Operations on genomic intervals provide a language for asking questions about relationships between features. While there are excellent interval arithmetic tools for the command line, they are not smoothly integrated into Python, one of the most popular general-purpose computational and visualization environments. Results: Bioframe is a library that enables flexible and performant operations on genomic interval dataframes in Python. Bioframe extends the Python data science stack to use cases in computational genome biology by building directly on top of two of the most commonly used Python libraries, numpy and pandas. The bioframe API enables flexible column names and column orders, and decouples operations from data formats to avoid unnecessary conversions, a common scourge for bioinformaticians. Bioframe achieves these goals while maintaining high performance and a rich set of features. Availability and implementation: Bioframe is open-source under the MIT license, cross-platform, and can be installed from the Python Package Index. The source code is maintained by Open2C on GitHub at https://github.com/open2c/bioframe.
    Python
    MIT License
    Computational genomics
    Citations (19)
    Editorial article. Front. Genet., 25 January 2024. Sec. Computational Genomics, Volume 15 - 2024. https://doi.org/10.3389/fgene.2024.1367531
    Computational genomics
    Genome Biology
    Functional Genomics
    In the wake of the completion of whole-genome sequencing for model species, genomics has advanced from structural genomics to functional genomics. As a frontier science, genomics is highly active and influential. Building on the achievements of structural genomics, each area of functional genomics has its own principles and key technologies, with distinct characteristics and advantages, and therefore its own fields of application and development trends. Through its integration with many areas of modern science, functional genomics is giving rise to a variety of new, specialized disciplines.
    Structural genomics
    Functional Genomics
    Genome Biology
    Computational genomics
    Comparative Genomics
    Personal genomics
    Citations (0)
    Genomics is an interdisciplinary field of biology and a subdiscipline of genetics concerned with the mapping, sequencing, and functional analysis of genomes. A genome is the complete set of DNA of an organism, including all of its genes. Genomics builds on recombinant DNA technology and includes the study of inter-genomic phenomena and other interactions involving the genome. It involves the study of all genes and their interrelationships. Objective: The main objective is to describe the importance of genomics and bioinformatics in biomedical research, and to perform searches using various accessible tools and databases.
    Genome Biology
    Functional Genomics
    Computational genomics
    Comparative Genomics
    Personal genomics