Bioinformatics may seem to be a field that primarily processes large string datasets, since nucleotides and amino acids are represented by dedicated characters. However, many computational tasks in bioinformatics are mathematical problems that can be expressed as operations on numbers, and many of them are in fact solved this way in the background. One of the most widely used digital representations maps nucleotides and amino acids to the integers 0–3 and 0–20, respectively. The limitation of this mapping appears when the digital signal of nucleotides has to be translated into a digital signal of amino acids, because the genetic code is degenerate. This degeneracy causes non-monotonicities in the mapping function. Although a map that reduces this undesirable effect has already been proposed, it is defined theoretically and for the standard genetic code only. In this study, we derived a novel optimality criterion for reducing the influence of degeneracy by utilizing a large dataset of real sequences with various genetic codes. As a result, we propose a new robust, globally optimal map suitable for any genetic code, as well as specialized optimal maps for particular genetic codes.
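The integer mapping and the non-monotonic behaviour it produces can be illustrated with a minimal Python sketch. The nucleotide order (T, C, A, G), the alphabetical amino acid numbering and the helper names are assumptions made for illustration only; they are not the optimal maps derived in the study. The codon table is the standard genetic code written in TCAG order.

```python
# Assumed nucleotide map (T, C, A, G -> 0..3); the study's maps may differ.
NUC = {"T": 0, "C": 1, "A": 2, "G": 3}

# Standard genetic code in TCAG codon order; "*" marks a stop codon.
CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Assumed amino acid map: stop plus 20 amino acids in alphabetical order -> 0..20.
AA = {aa: i for i, aa in enumerate("*ACDEFGHIKLMNPQRSTVWY")}


def codon_index(codon):
    """Turn a codon into a single integer 0..63 using base-4 positional weights."""
    a, b, c = (NUC[n] for n in codon.upper())
    return 16 * a + 4 * b + c


def aa_signal(dna):
    """Translate a DNA string into a list of amino acid integers, codon by codon."""
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    return [AA[CODE[codon_index(dna[i:i + 3])]] for i in range(0, usable, 3)]


def count_descents(values):
    """Count positions where the next value is smaller than the current one."""
    return sum(1 for p, q in zip(values, values[1:]) if q < p)


# Amino acid integers for codons 0..63 in increasing codon-index order.
table_signal = [AA[aa] for aa in CODE]
print(aa_signal("ATGTTT"))
print(count_descents(table_signal))
```

With this arbitrary numbering, the amino acid value repeatedly drops as the codon index increases, which is the kind of non-monotonicity an optimized map aims to reduce.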
DNA classification methods most commonly compare differences in symbolic DNA records, which requires a global multiple sequence alignment. This solution is often inappropriate: it introduces a number of imprecisions and requires additional user intervention to align similar segments exactly. Similar segments of DNA represented as a signal are characterized by a similar shape of the curve. Alignment of genomic signals can therefore adjust whole sections, not only individual symbols. Dynamic time warping (DTW) is suitable for this purpose and can replace the multiple alignment of symbolic sequences in applications such as phylogenetic analysis. The proposed method is composed of three main parts. The first part converts the symbolic representation of DNA sequences, a string of A, C, G and T symbols, to a signal representation in the form of the cumulated phase of complex components defined for each symbol. The next part adjusts the size of the signals using standard signal preprocessing methods: median filtering, detrending and resampling. The final part, necessary for comparing genomic signals, is the alignment of their position and length by DTW. The application of DTW to a set of genomic signals was evaluated by constructing a dendrogram using cluster analysis. The resulting tree was compared with a classical phylogenetic tree reconstructed using multiple alignment. The classification of genomic signals using DTW is evolutionarily closer to the phylogeny of the organisms. The method is more resistant to errors in the sequences and less dependent on the number of input sequences. Classification of genomic signals using dynamic time warping is an adequate alternative to phylogenetic analysis based on the alignment of symbolic DNA sequences; in addition, it is a robust, quick and more precise technique.
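The first and last parts of the method can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the complex value assigned to each nucleotide is an assumed, commonly used convention that the abstract does not specify, the DTW is the textbook dynamic-programming formulation with an absolute-difference cost, and the preprocessing steps (median filtering, detrending, resampling) are omitted.

```python
import numpy as np

# Assumed complex mapping of nucleotides; the abstract does not give the values.
COMPLEX_MAP = {"A": 1 + 1j, "C": -1 - 1j, "G": -1 + 1j, "T": 1 - 1j}


def cumulated_phase(seq):
    """Convert a DNA string into its cumulated-phase signal (in radians)."""
    z = np.array([COMPLEX_MAP[base] for base in seq.upper()])
    return np.cumsum(np.angle(z))


def dtw_distance(x, y):
    """Classic O(n*m) dynamic time warping distance between two 1-D signals."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


# Two toy sequences; the second carries a one-base insertion.
s1 = cumulated_phase("ACGTTGCA")
s2 = cumulated_phase("ACGTTTGCA")
print(dtw_distance(s1, s2))
```

A matrix of pairwise DTW distances computed this way could then serve as input to a hierarchical clustering routine to obtain a dendrogram.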
Sequencing technology allows us to study the structure and function of several genomic processes. Recently, nanopore sequencing technology has become widely used. It allows us to study the structure of a single molecule and provides reads several times longer than those of the previously used second-generation sequencing methods. However, its error rate is still higher, which limits further analysis. The main source of errors is the base-calling process. Nanopore sequencers produce a digital signal based on changes in the electric potential at the pore. The signal then needs to be translated into a sequence of characters representing the nucleotides. The base-calling process is still improving, but there is potential to bypass it and process the raw signal data as the carrier of information about the sequenced molecule [1]. We therefore present a Python package designed to manipulate, extract and process nanopore sequencing files and to provide a helpful tool for analyzing squiggles. It provides a single API for accessing read attributes, allowing signal extraction and visualization. The package can also work with base-called data and offers the familiar BLAST search to select specific regions of interest. The package is available at: https://github.com/VojtechBarton/manasig
Metallothionein (MT), as a potential cancer marker, is at the center of interest, and its properties, functions and behavior under various conditions are intensively studied. In the present study, two major mammalian MT isoforms (MT‐1 and MT‐2) were separated using capillary electrophoresis (CE) coupled with a UV detector in order to describe their basic behavior. Under the optimized conditions, both isoforms were separated and their detection limits were estimated at sub-ng and ng per μL levels for MT‐2 and MT‐1, respectively. Further, the effects of thermal treatment and of the presence of a denaturing agent such as urea on the MT‐1 and MT‐2 isoforms were studied by CE‐UV. Thermal treatment caused an increase in the signals of both isoforms. Based on this finding, a new parameter called the precipitation rate has been defined. This parameter can be expressed as the slope of the linear regression of the time dependency curve recalculated to the MT concentration. The thermal precipitation rate for MT‐1 and MT‐2 was determined as 1.1 and 0.9 ng of MT/min, respectively. The chemical precipitation rate calculated from the linear regression gave the same value of 0.25 ng of MT/min for both isoforms. The results were confirmed by manual spectrometric measurements and by differential pulse voltammetry using the Brdicka reaction. Based on these results, a model of MT behavior under the conditions studied was suggested.
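The definition of the precipitation rate as a regression slope can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the input is assumed to be already recalculated to ng of MT, and the numbers are synthetic placeholders rather than measurements from the study.

```python
import numpy as np


def precipitation_rate(times_min, mt_ng):
    """Slope (ng of MT per min) of a least-squares line fitted to MT amount vs. time."""
    slope, _intercept = np.polyfit(times_min, mt_ng, deg=1)
    return float(slope)


# Synthetic, exactly linear placeholder data (not from the study).
times = [0.0, 5.0, 10.0, 15.0]    # minutes of treatment
amounts = [10.0, 20.0, 30.0, 40.0]  # ng of MT derived from the CE-UV signal
rate = precipitation_rate(times, amounts)
print(rate)
```

The sign convention (whether the rate is reported from an increasing or a decreasing curve) is left to the caller in this sketch.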
Automating the classification of electrophoresis gel images is a difficult task. The result depends strongly on the quality of gel image digitization and on imprecisions in the electrophoretic process. The methodology proposed in this paper helps to remove most gel image distortions and effectively overcomes the problem of a non-uniform electrophoretic process.
Background: Bacterial genotyping is a crucial process in outbreak investigation and epidemiological studies. Several typing methods, such as pulsed-field gel electrophoresis, multilocus sequence typing (MLST) and whole genome sequencing, are currently used in routine clinical practice. However, these methods are costly, time-consuming and have high computational demands. An alternative to these methods is mini-MLST, a quick, cost-effective and robust method based on high-resolution melting analysis. Nevertheless, no standardized approach to identifying markers suitable for mini-MLST exists. Here, we present a pipeline for variable fragment detection in unmapped reads based on a modified hybrid assembly approach using data from one sequencing platform. Results: In routine assembly against a reference sequence, highly variable reads are not aligned and remain unmapped. If these reads are assembled de novo, variable genomic regions can be located in the resulting scaffolds. Based on the calculation of variability rates, it is possible to find a highly variable region with the same discriminatory power as the seven housekeeping gene fragments used in MLST. In the work presented here, we show the capability of identifying one variable fragment in de novo assembled scaffolds of 21 Escherichia coli genomes and three variable regions in scaffolds of 31 Klebsiella pneumoniae genomes. For each identified fragment, the melting temperatures are calculated using the nearest neighbor method to verify the discriminatory power of mini-MLST. Conclusions: A pipeline for a modified hybrid assembly approach consisting of reference-based mapping and de novo assembly of unmapped reads is presented. This approach can be employed for the identification of highly variable genomic fragments in unmapped reads.
The identified variable regions can then be used in efficient laboratory methods for bacterial typing, such as mini-MLST with high discriminatory power, fully replacing expensive methods such as MLST. The results can be delivered in a shorter time, which allows rapid infection monitoring in clinical practice.
Low-molecular-mass proteins rich in cysteine, called metallothioneins (MT), can be considered markers of environmental pollution by metals. Here, we propose an automated procedure for the isolation of MT followed by voltammetric analysis. First, we optimized the automated detection of MT using an electrochemical analyser. The most sensitive and repeatable analyses were obtained with the supporting electrolyte at a temperature of 4 °C. Further, we optimized the experimental conditions for the isolation of MT using antibody-linked paramagnetic microparticles. Under the optimal conditions (a 4 h interaction between the microparticles and MT), the microparticles were tested on the isolation of various amounts of MT. The lowest amount of MT isolated by the antibody-linked paramagnetic microparticles was 5 μg ml−1 of MT (50 ng). The automated procedure was further tested on the isolation of MT from guppy fish (Poecilia reticulata) treated with silver(I) ions (50 μM AgNO3). The whole process lasted less than five hours and was fully automated. We correlated these results with those of the standard method for MT isolation. The correlation coefficient was 0.9901, which indicates that the results are in good agreement. Moreover, the concentration of silver ions in the tissues of fish treated with Ag(I) ions was determined by high performance liquid chromatography with electrochemical detection.