The purpose of this study is to identify the key feature subsets of hepatitis B virus (HBV) reactivation and to establish classification prognosis models of HBV reactivation for primary liver carcinoma (PLC) patients after precise radiotherapy (RT). A genetic algorithm (GA) is proposed to extract the key feature subsets of HBV reactivation from the initial feature sets of primary liver carcinoma. Bayes and support vector machine (SVM) classifiers are employed to build classification prognosis models of HBV reactivation, and the classification performance of the key feature subsets and of the initial feature sets is evaluated. The experimental results show that GA-based feature extraction improves the classification performance for HBV reactivation. Five risk factors give the best recognition performance for HBV reactivation: 'HBV DNA level', 'tumor staging TNM', 'outer margin of radiotherapy', 'two kinds code of outer margin of radiotherapy' and 'V45'. Both classifiers recognize HBV reactivation well: the best classification accuracy reached 82.07% for the Bayes classifier and 82.89% for the SVM classifier.
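The GA-based feature-subset search described in the abstract above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not give the GA operators, population size or fitness function, and the paper's Bayes and SVM classifiers are replaced here by a simple nearest-centroid rule scored on training accuracy. The function names and the synthetic data are hypothetical.

```python
import random


def nearest_centroid_accuracy(rows, labels, mask):
    """Training accuracy of a nearest-centroid rule using only the masked-in features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    # One centroid per class, computed over the selected columns only.
    centroids = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [sum(r[j] for r in members) / len(members) for j in cols]
    correct = 0
    for r, y in zip(rows, labels):
        def dist(cls):
            return sum((r[j] - c) ** 2 for j, c in zip(cols, centroids[cls]))
        pred = min(sorted(centroids), key=dist)
        correct += pred == y
    return correct / len(rows)


def ga_select(rows, labels, generations=30, pop_size=20, p_mut=0.1, seed=0):
    """Search for a 0/1 feature mask with a small elitist genetic algorithm."""
    rng = random.Random(seed)
    n = len(rows[0])

    def fitness(mask):
        return nearest_centroid_accuracy(rows, labels, mask)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        elite = ranked[: pop_size // 2]      # truncation selection keeps the best half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)        # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: each bit flips with probability p_mut.
            child = [bit ^ (rng.random() < p_mut) for bit in child]
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    return best, fitness(best)


def make_toy_data(n_rows=60, n_features=6, seed=1):
    """Synthetic data (not clinical data): only feature 0 carries class information."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n_rows):
        y = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 4 * y
        rows.append(row)
        labels.append(y)
    return rows, labels
```

Calling `ga_select(*make_toy_data())` returns a feature mask and its fitness. In a real study the fitness would more plausibly be the cross-validated accuracy of the chosen classifier, not training accuracy.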
Hepatitis B virus (HBV) reactivation is a common complication in patients with primary liver cancer (PLC) after precise radiotherapy. Timely prognostic protection can reduce morbidity and mortality. In this paper, we first identify the risk factors and repertoire of HBV reactivation through a new feature selection method based on neighborhood component analysis (NCA). We then use an SVM classifier to classify and predict on all the feature subsets. On this basis, Bayesian optimization and grid search are each used to optimize the SVM model, and the key feature subset is classified and predicted. The experimental results show that HBV DNA level, KPS score, segmentation method, extrinsic boundary, V25, TNM stage and Child-Pugh stage are risk factors for HBV reactivation. Among them, V25, found after NCA feature selection, is the first risk factor in the study of HBV reactivation. Under 10-fold cross-validation, the prediction accuracy of the combination of the two features HBV DNA level and extrinsic boundary is 85.56%. Bayesian-optimized and grid-optimized SVM classifiers can be applied to the study of HBV reactivation.
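The grid optimization under 10-fold cross-validation mentioned in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the SVM is replaced by a k-nearest-neighbour stand-in so the example needs only the standard library, the tuned hyperparameter is the neighbour count and not an SVM penalty or kernel parameter, and all function names are hypothetical.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training points closest to x (stand-in classifier)."""
    nearest = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )[:k]
    votes = [train_y[i] for i in nearest]
    # Sorting the candidate labels makes tie-breaking deterministic.
    return max(sorted(set(votes)), key=votes.count)


def cv_accuracy(xs, ys, k_neighbors, n_folds=10):
    """Fraction of samples predicted correctly when each fold is held out in turn."""
    correct = 0
    for fold in k_fold_indices(len(xs), n_folds):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        correct += sum(
            knn_predict(train_x, train_y, xs[i], k_neighbors) == ys[i] for i in fold
        )
    return correct / len(xs)


def grid_search(xs, ys, grid, n_folds=10):
    """Score every candidate in the grid and return the best one with all scores."""
    scores = {k: cv_accuracy(xs, ys, k, n_folds) for k in grid}
    best = max(grid, key=lambda k: scores[k])  # first candidate wins ties
    return best, scores
```

The same loop structure applies when the candidate grid holds SVM hyperparameter pairs. Bayesian optimization differs in that it chooses the next candidate from the scores seen so far, where grid search enumerates a fixed grid.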
A wavelet-based feature extraction method for Fourier Transform Infrared (FTIR) cancer data analysis is presented in this paper. A set of low-frequency wavelet basis functions is used to represent the FTIR data, which reduces the data dimension and removes noise. The fuzzy C-means algorithm is used to classify the data. Experiments are conducted to compare classification performance using the wavelet features and the original FTIR data provided by the Derby City General Hospital in the UK. The experiments show that only 30 wavelet features are needed to represent the 901 wave numbers of the FTIR data and produce good clustering results.
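The idea of keeping only low-frequency wavelet coefficients, as described in the abstract above, can be sketched with the Haar wavelet. The choice of Haar, the number of decomposition levels and the padding rule for odd lengths are all assumptions; the abstract names neither the wavelet family nor the boundary handling. The function name is hypothetical.

```python
import math


def haar_approximation(signal, levels):
    """Return the Haar approximation (low-frequency) coefficients after `levels` steps."""
    coeffs = list(signal)
    for _ in range(levels):
        if len(coeffs) < 2:
            break                      # nothing left to decompose
        if len(coeffs) % 2:
            coeffs.append(coeffs[-1])  # odd length: repeat the last sample
        # Orthonormal Haar low-pass step: scaled sum of each adjacent pair.
        coeffs = [
            (coeffs[i] + coeffs[i + 1]) / math.sqrt(2)
            for i in range(0, len(coeffs), 2)
        ]
    return coeffs
```

With this padding rule, a 901-point spectrum reduces to 29 coefficients after five levels, the same order as the 30 features reported. The exact count depends on the wavelet family and boundary handling.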
CASE REPORT A 12-year-old boy visited our clinic because of multiple persistent erythematous plaques on the left leg. There was no history of trauma, operation or drug intake before the skin lesions appeared. Physical examination revealed dusky red plaques with central purpuric change and a peripheral brownish hue (Fig. 1). Older lesions had turned yellow-brown and persisted unchanged. There was mild tenderness but no itching. The lesions seemed to follow the course of the greater saphenous vein and its main branches (Fig. 2). A skin biopsy was performed; the findings are shown in Figs. 3 and 4.
In this paper we propose an approach that uses wavelets and a 2D convolutional neural network (CNN) to extract features for the prediction of protein secondary structure. A wavelet feature matrix extracted from PSSM profiles is input into the convolutional neural network to extract the features. Wavelets extract the changing information of the PSSM evolutionary matrix, and the convolutional neural network captures the sequence interaction information of residues. The feature maps extracted from the last convolutional layer are fed to a Bayes classifier to build the prediction model. A Q3 accuracy of 76.9% is achieved on the ASTRAL dataset in 3-fold cross-validation experiments using wavelet and CNN features, which is better than the 73.7% obtained on the ASTRAL dataset using the original features. The experimental results show that wavelet and CNN features improve prediction performance.
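The Q3 accuracy quoted in the abstract above is the percentage of residues whose predicted three-state label (helix H, strand E, coil C) matches the observed label. A minimal sketch of that calculation follows; the function name and the use of plain strings for the label sequences are assumptions for illustration.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted three-state label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed sequences must have equal length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)
```

For example, `q3_accuracy("HHEEC", "HHECC")` counts four matching residues out of five.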
Protein secondary structure prediction (PSSP) benefits not only the study of protein structure and function but also the development of drugs. PSSP is a challenging task in computational biology, and experimental methods for determining secondary structure are time-consuming and expensive. In this paper, we propose a novel PSSP model, DLBLS_SS, based on deep learning and the broad learning system (BLS), to predict 3-state and 8-state secondary structure. We first use a bidirectional long short-term memory (BLSTM) network to extract global features from residue sequences. Then our proposed SEBTCN, based on temporal convolutional networks (TCN) and channel attention, captures key bidirectional long-range dependencies in the sequences. We also use BLS to rapidly optimize the fused features while further capturing local interactions between residues. We conduct extensive experiments on public test sets, including CASP10, CASP11, CASP12, CASP13, CASP14 and CB513, to evaluate the performance of the model. Experimental results show that our model exhibits better 3-state and 8-state PSSP performance than five state-of-the-art models.
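The relationship between the 8-state and 3-state labels mentioned in the abstract above can be sketched as a lookup table. The mapping below (H, G, I to helix; E, B to strand; everything else to coil) is one widely used reduction of DSSP codes and is an assumption here: the abstract does not say which reduction DLBLS_SS uses, and other conventions exist. The accepted coil symbols `-`, `C` and `L` are also an assumption, since datasets differ.

```python
# Assumed reduction of 8-state DSSP codes to 3 states; other conventions exist.
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",   # helix types -> H
    "E": "E", "B": "E",             # strand and bridge -> E
    "T": "C", "S": "C",             # turn and bend -> C
    "-": "C", "C": "C", "L": "C",   # common coil/loop symbols -> C
}


def reduce_to_three_state(labels):
    """Map a string of 8-state labels to the corresponding 3-state string."""
    try:
        return "".join(EIGHT_TO_THREE[s] for s in labels)
    except KeyError as err:
        raise ValueError(f"unknown secondary structure code: {err.args[0]!r}") from None
```

Because the reduction is many-to-one, 3-state accuracy on a given test set is always at least as high as 8-state accuracy for the same predictions once they are reduced with the same table.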
The highest three-state prediction accuracy for protein secondary structure is now 82-84% without using structure templates, approaching the theoretical limit of 88-90%. Increasingly large training datasets cover more protein sequences and structures. More powerful deep learning techniques can not only handle the computational load of large data but also capture the long-range interactions of protein sequences. In this research, we propose a new approach: a two-dimensional deep convolutional neural network (2DCNN) with 6 convolutional layers and 5 max-pooling layers. The two-dimensional convolutional neural network keeps the original amino acid sequence position information through its two-dimensional input matrix and extracts features of the sequence interactions better. The accuracy of our prediction model 2DCNN is 83.09%, 81.74%, 82.41%, 83.56%, 81.16%, and 80.30% for the 25PDB, CB513, CASP9, CASP10, CASP11, and CASP12 datasets, respectively. Our prediction model achieves better results than most state-of-the-art methods. (http://qilubio.qlu.edu.cn/protein)