This study attempted to distinguish Tegillarca granosa samples artificially contaminated with three toxic heavy metals, zinc (Zn), cadmium (Cd), and lead (Pb), using laser-induced breakdown spectroscopy (LIBS) combined with pattern recognition methods. The measured spectra were first processed with a wavelet transform algorithm (WTA), and the resulting characteristic information was then evaluated with an information gain algorithm (IGA). The 30 variables obtained were used as inputs to three classifiers: partial least squares discriminant analysis (PLS-DA), support vector machine (SVM), and random forest (RF). Of these, the RF model performed best, with a discrimination accuracy of 93.3%. In addition, the extracted characteristic information was used to reconstruct the original spectra by inverse WTA, and the attribution of the reconstructed spectra was discussed. This work indicates that healthy Tegillarca granosa samples can be distinguished from toxic heavy-metal-contaminated ones by pattern recognition analysis combined with LIBS, which requires only minimal sample pretreatment.
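The wavelet-plus-information-gain step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the wavelet family or the discretization, so a one-level Haar transform and a median split are assumed here, and the function names (haar_dwt, information_gain, select_features) and the toy "spectra" are hypothetical. The classifier stage (PLS-DA, SVM, RF) is left out.

```python
import math
from collections import Counter


def haar_dwt(signal):
    """One level of the Haar wavelet transform; returns approximation + detail coefficients."""
    s = math.sqrt(2.0)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal) - 1, 2)]
    approx = [(a + b) / s for a, b in pairs]
    detail = [(a - b) / s for a, b in pairs]
    return approx + detail


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Information gain of one feature, split at its upper median (assumed discretization)."""
    threshold = sorted(values)[len(values) // 2]
    low = [y for v, y in zip(values, labels) if v < threshold]
    high = [y for v, y in zip(values, labels) if v >= threshold]
    gain = entropy(labels)
    for part in (low, high):
        if part:
            gain -= len(part) / len(labels) * entropy(part)
    return gain


def select_features(spectra, labels, k):
    """Indices of the k wavelet coefficients with the highest information gain."""
    coeffs = [haar_dwt(s) for s in spectra]
    gains = [information_gain([row[j] for row in coeffs], labels)
             for j in range(len(coeffs[0]))]
    return sorted(range(len(gains)), key=lambda j: -gains[j])[:k]


# Toy "spectra" (made-up numbers, not measured LIBS data).
spectra = [[1, 1, 1, 1], [1, 1, 1, 1.2], [1, 1, 5, 1], [1, 1, 5.5, 1]]
labels = ["healthy", "healthy", "Cd", "Cd"]
selected = select_features(spectra, labels, k=2)
print(selected)  # prints [1, 3]
```

In this toy case the two selected coefficients are the approximation and detail coefficients covering the channels where the artificial "Cd" peak sits; in the study, the analogous step yielded the 30 variables passed to the classifiers.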
In this paper, we present a discrete event method for simulating the DNA replication process. The method analyzes the features of DNA replication, identifies the transitions among the states of replication origins, and uses these transitions to create events. We then construct an event model from these events and present a method for carrying out the simulation. The simulation result is a time, expressed in minutes, that represents how long the DNA replication process takes to complete. The method performs well in terms of data-processing stability and the authenticity of its results.
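A minimal sketch of such an event-driven simulation is given below. The abstract does not specify the event model, so everything here is an assumption for illustration: a linear genome, a constant fork speed, fixed firing times, three origin states ("unfired", "fired", "passive"), and the hypothetical function name simulate_replication. An origin whose position has already been covered by a fork when its firing event is processed becomes "passive" instead of firing.

```python
import heapq


def simulate_replication(genome_length, fork_speed, origins):
    """Simulate origin firing on a linear genome.

    origins: list of (position_kb, firing_time_min) pairs (assumed inputs).
    Returns (completion_time_min, states), where states maps each origin
    position to "fired" or "passive".
    """
    # Event queue of firing events, ordered by time.
    events = [(t, pos) for pos, t in origins]
    heapq.heapify(events)
    states = {pos: "unfired" for pos, _ in origins}
    fired = []  # (position, firing_time) of origins that actually fired

    while events:
        t, pos = heapq.heappop(events)
        # Passive replication: a fork from an earlier origin already covers pos.
        covered = any(abs(pos - p) <= fork_speed * (t - t0) for p, t0 in fired)
        if covered:
            states[pos] = "passive"
        else:
            states[pos] = "fired"
            fired.append((pos, t))

    if not fired:
        raise ValueError("no origin fired; replication never starts")

    fired.sort()
    # Termination events: forks reaching the two chromosome ends ...
    ends = [fired[0][1] + fired[0][0] / fork_speed,
            fired[-1][1] + (genome_length - fired[-1][0]) / fork_speed]
    # ... and converging forks meeting between adjacent fired origins.
    for (p1, t1), (p2, t2) in zip(fired, fired[1:]):
        ends.append((p2 - p1) / (2 * fork_speed) + (t1 + t2) / 2)
    return max(ends), states


# Toy example: 100 kb genome, forks moving at 1 kb/min (made-up values).
completion, states = simulate_replication(
    100.0, 1.0, [(20.0, 0.0), (30.0, 15.0), (80.0, 10.0)])
print(completion)  # prints 35.0
print(states)
```

In this toy run the origin at 30 kb is replicated passively by the fork from 20 kb before its own firing time, and replication completes when the forks from 20 kb and 80 kb meet at 35 minutes.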
Dimension reduction is an important topic in data mining and is widely used in genetics, medicine, and bioinformatics. We propose a new local dimension reduction algorithm, TotalPLS, that operates in a unified partial least squares (PLS) framework and implements an information fusion of PLS-based feature selection and feature extraction. This paper focuses on extracting the potential structure hidden in high-dimensional multicategory microarray data and on interpreting and understanding the results provided by that structure information. First, we propose using PLS-based recursive feature elimination (PLSRFE) in multicategory problems. Then, we perform feature importance analysis based on PLSRFE on high-dimensional microarray data to determine the informative feature (biomarker) subset related to the tumor subtype problem under study. Finally, PLS-based supervised feature extraction is conducted on the selected gene subset to extract comprehensive features that best reflect the nature of the classification and therefore have discriminating ability. The proposed algorithm is compared with several state-of-the-art methods on multiple high-dimensional multicategory microarray datasets in terms of recognition accuracy, relevance, and redundancy. Experimental results show that the proposed algorithm can improve the recognition rate and computational efficiency. Furthermore, mining potential structure information improves the interpretability and understandability of the recognition results. The proposed algorithm can be effectively applied to microarray data analysis for the discovery of gene coexpression and coregulation.
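The idea of PLS-based recursive feature elimination in a multicategory setting can be sketched as follows. This is not the TotalPLS algorithm itself, whose details the abstract does not give. The sketch assumes that class labels are encoded as a centered indicator matrix, that only the first PLS weight vector is used to rank features, and that one feature is dropped per round; the function names and the toy data are hypothetical.

```python
import math


def center_columns(rows):
    """Subtract the column mean from every column of a row-major matrix."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def first_pls_weights(X, labels, iters=100):
    """First PLS weight vector: dominant left singular vector of Xc^T Yc,
    found by power iteration (Yc is the centered class-indicator matrix)."""
    classes = sorted(set(labels))
    Y = center_columns([[1.0 if y == c else 0.0 for c in classes] for y in labels])
    Xc = center_columns(X)
    n_feat, n_cls = len(X[0]), len(classes)
    # M[j][c]: cross product between centered feature j and class indicator c.
    M = [[sum(xi[j] * yi[c] for xi, yi in zip(Xc, Y)) for c in range(n_cls)]
         for j in range(n_feat)]
    w = [1.0] * n_feat
    for _ in range(iters):
        v = [sum(M[j][c] * w[j] for j in range(n_feat)) for c in range(n_cls)]
        w = [sum(M[j][c] * v[c] for c in range(n_cls)) for j in range(n_feat)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no feature covaries with the labels
            break
        w = [x / norm for x in w]
    return w


def pls_rfe(X, labels, n_keep):
    """Recursively drop the feature with the smallest |weight| until n_keep remain."""
    kept = list(range(len(X[0])))
    while len(kept) > n_keep:
        sub = [[row[j] for j in kept] for row in X]
        w = first_pls_weights(sub, labels)
        worst = min(range(len(kept)), key=lambda i: abs(w[i]))
        del kept[worst]
    return kept


# Toy data (made-up, not microarray measurements): feature 0 separates the
# three classes, feature 1 is constant, feature 2 is only weakly informative.
X = [[0, 1, 0], [0, 1, 1], [5, 1, 0], [5, 1, 1], [10, 1, 1], [10, 1, 2]]
labels = ["A", "A", "B", "B", "C", "C"]
print(pls_rfe(X, labels, n_keep=2))  # prints [0, 2]
print(pls_rfe(X, labels, n_keep=1))  # prints [0]
```

Recomputing the weights after each elimination is what distinguishes the recursive scheme from a single-pass ranking; on real microarray data one would normally remove features in larger batches to keep the cost manageable.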
In view of the characteristics of small-sample, high-dimensional data, the concept of Generalized Small Samples (GSS) is defined. Information feature reduction for GSS takes two forms: feature extraction (dimensionality extraction) and feature selection (dimensionality selection). First, unsupervised feature extraction based on Principal Component Analysis (PCA) and supervised feature extraction based on Partial Least Squares (PLS) are introduced. Second, by analyzing the structure of the first principal component (PC), new global PCA-based and PLS-based feature selection approaches are presented; in addition, recursive feature elimination based on PLS (PLS-RFE) is implemented. Finally, the approaches are applied to the classification of the MIT AML/ALL dataset: feature extraction is performed with PCA and PLS, and the feature selection approaches are compared with PLS-RFE. In this way, the information compression of GSS is realized.
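Supervised feature extraction with PLS can be illustrated for a two-class problem by computing the first PLS component only. The sketch below relies on the standard result that, for a single response, the first PLS weight vector is proportional to the cross product of the centered data with the centered response. The -1/+1 label coding, the function name first_pls_component, and the toy numbers are assumptions for illustration and are not taken from the paper or from the AML/ALL data.

```python
import math


def first_pls_component(X, y):
    """Return (weights, scores) of the first PLS component for a single response y."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # The first PLS weight vector is proportional to Xc^T yc.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores: projection of each centered sample onto w (one extracted feature).
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    return w, t


# Toy data: 4 samples, 2 features; labels coded -1 / +1 (assumed coding).
X = [[1, 0], [2, 0.5], [8, 1], [9, 0.5]]
y = [-1, -1, 1, 1]
weights, scores = first_pls_component(X, y)
print(weights)
print(scores)
```

Here the two features are compressed into a single score per sample, and the sign of the score separates the two toy classes. An unsupervised PCA projection would be computed the same way except that the direction would maximize variance of the data rather than covariance with the labels.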