Optimizing wavelength combinations in near-infrared spectroscopy (NIRS) analysis is important for improving model prediction performance, simplifying high-dimensional problems, reducing model complexity, and designing dedicated NIRS instruments with a high signal-to-noise ratio. Based on the prediction performance of single-wavelength linear regression models, a set of 25 informative wavelengths was screened out. Multiple linear regression (MLR) models were then established for all combinations of these 25 wavelengths. Among the models whose prediction performance was close to that of the partial least squares (PLS) model based on the whole spectral region, the simplest MLR model was the 7-wavelength combination of 1105.5, 1108, 1895, 2150.5, 2278.5, 2284 and 2286.5 nm, with RMSEP, R_P and RRMSEP of 0.2505%, 0.8753 and 15.73%, respectively. This indicates that the wavelength combination selection method based on the prediction performance of single-wavelength linear regression models can be applied to NIRS analysis and can provide a valuable reference for designing miniaturized dedicated NIRS instruments.
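The two-stage selection described above (single-wavelength screening, then MLR over combinations of the screened wavelengths) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `rmsep`, `screen_wavelengths` and `best_combination` are hypothetical, ranking by RMSEP alone is an assumed criterion, and `max_size` caps the combination size because an exhaustive search over 25 wavelengths would require 2^25 − 1 = 33,554,431 models.

```python
from itertools import combinations

import numpy as np


def rmsep(X_cal, y_cal, X_pred, y_pred):
    """Fit least-squares regression (with intercept) on the calibration set
    and return the root mean square error on the prediction set."""
    A_cal = np.column_stack([np.ones(len(X_cal)), X_cal])
    coef, *_ = np.linalg.lstsq(A_cal, y_cal, rcond=None)
    A_pred = np.column_stack([np.ones(len(X_pred)), X_pred])
    residual = A_pred @ coef - y_pred
    return float(np.sqrt(np.mean(residual ** 2)))


def screen_wavelengths(X_cal, y_cal, X_pred, y_pred, keep):
    """Rank every wavelength by its single-wavelength RMSEP (assumed criterion)
    and return the column indices of the `keep` best ones."""
    scores = [
        rmsep(X_cal[:, [j]], y_cal, X_pred[:, [j]], y_pred)
        for j in range(X_cal.shape[1])
    ]
    return sorted(np.argsort(scores)[:keep].tolist())


def best_combination(X_cal, y_cal, X_pred, y_pred, candidates, max_size):
    """Fit one MLR model per combination of candidate columns (up to
    `max_size` columns) and return (lowest RMSEP, winning combination)."""
    best = (np.inf, ())
    for size in range(1, max_size + 1):
        for combo in combinations(candidates, size):
            cols = list(combo)
            err = rmsep(X_cal[:, cols], y_cal, X_pred[:, cols], y_pred)
            if err < best[0]:
                best = (err, combo)
    return best
```

In practice one would also record the smallest combination whose RMSEP stays within a tolerance of the full-spectrum PLS result, which is how the abstract characterizes the 7-wavelength model; that tolerance is not given in the source and is left out here.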
The rapid determination of clinical blood biochemical indicators by near-infrared spectroscopy (NIRS) analysis is an important research branch in health monitoring systems. In this paper, a rapid determination method and an optimal analysis model for serum cholesterol were established using NIRS, partial least squares (PLS) and Savitzky-Golay (SG) smoothing. The calibration and prediction sets were divided based on the prediction performance of the optimal single-wavenumber model. Calibration and prediction models were established by PLS on the combined bands of 10000-5300 cm⁻¹ and 4920-4160 cm⁻¹ with SG smoothing. By extending the number of smoothing points to 5, 7, ..., 61 (odd numbers) and the polynomial degree to 2, 3, 4, 5 and 6, fourteen smoothing coefficient tables comprising 400 SG smoothing modes were calculated. Using a computing platform developed by the authors, PLS models were constructed for all combinations of the 400 SG smoothing modes and 1-40 PLS factors. The optimal model, selected according to prediction performance, had a derivative order of 1, a polynomial degree of 3 or 4, 43 smoothing points and 13 PLS factors; its prediction correlation coefficient R_P was 0.811 and its RMSEP was 0.416 mmol/L. The method for dividing the calibration and prediction sets, the extension of the SG smoothing modes, and the large-scale joint optimization of SG smoothing modes and PLS factors can be effectively applied to model optimization in NIRS analysis.
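To illustrate what an SG "smoothing coefficient table" entry contains, the sketch below computes the coefficients for one smoothing mode (number of points, polynomial degree, derivative order) from the standard least-squares polynomial fit. It is an assumption-laden illustration rather than the authors' platform: the names `sg_coefficients` and `apply_sg` are hypothetical, unit spacing between spectral points is assumed, and the PLS stage of the joint optimization is not shown.

```python
import math

import numpy as np


def sg_coefficients(points, degree, deriv=0):
    """Savitzky-Golay coefficients for the window centre.

    points: odd window length; degree: polynomial degree;
    deriv: derivative order (unit spacing assumed; divide the result by
    spacing**deriv for physical units).
    """
    if points % 2 == 0 or points <= degree or deriv > degree:
        raise ValueError("need odd points > degree >= deriv")
    half = points // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Column k holds offsets**k, so pinv(A) maps window values to the
    # least-squares polynomial coefficients a_0 ... a_degree.
    A = np.vander(offsets, degree + 1, increasing=True)
    # The deriv-th derivative of the fitted polynomial at 0 is deriv! * a_deriv.
    return math.factorial(deriv) * np.linalg.pinv(A)[deriv]


def apply_sg(spectrum, points, degree, deriv=0):
    """Apply one SG mode; returns only the positions where the window fits."""
    coeffs = sg_coefficients(points, degree, deriv)
    return np.correlate(np.asarray(spectrum, dtype=float), coeffs, mode="valid")
```

With a symmetric window, the first-derivative coefficients for polynomial degrees 3 and 4 coincide (for example, `sg_coefficients(43, 3, 1)` and `sg_coefficients(43, 4, 1)`), which is consistent with the abstract reporting degree "3 or 4" for the optimal mode.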
The moving-window bis-correlation coefficients (MW-BiCC) method was proposed and employed for the discriminant analysis of transgenic sugarcane leaves and [Formula: see text]-thalassemia with visible and near-infrared (Vis–NIR) spectroscopy. The well-performing moving-window principal component analysis linear discriminant analysis (MW-PCA–LDA) method was also applied for comparison. A total of 306 transgenic (positive) and 150 nontransgenic (negative) sugarcane leaf samples were collected and divided into calibration, prediction, and validation sets. The diffuse reflection spectra were corrected using Savitzky–Golay (SG) smoothing with first-order derivative ([Formula: see text]), third-degree polynomial ([Formula: see text]) and 25 smoothing points ([Formula: see text]). The waveband selected with MW-BiCC was 736–1054 nm, and the positive and negative validation recognition rates ([Formula: see text]_REC[Formula: see text] and [Formula: see text]_REC[Formula: see text]) were 100% and 98.0%, respectively, matching the performance of MW-PCA–LDA. In a second example, 93 [Formula: see text]-thalassemia (positive) and 148 nonthalassemia (negative) human hemolytic samples were collected. The transmission spectra were corrected using SG smoothing with [Formula: see text], [Formula: see text] and [Formula: see text]. Using MW-BiCC, several optimal wavebands were selected (e.g., 1116–1146, 1794–1848 and 2284–2342 nm). The [Formula: see text]_REC[Formula: see text] and [Formula: see text]_REC[Formula: see text] were both 100%, again matching the performance of MW-PCA–LDA. Importantly, BiCC only requires calculating the correlation coefficients between the spectrum of a prediction sample and the average spectra of the two types of calibration samples. The BiCC algorithm is therefore very simple and is expected to find wider application.
The results confirmed, for the first time, the feasibility of distinguishing [Formula: see text]-thalassemia from normal control samples by NIR spectroscopy, and provided a promising, simple tool for large-population thalassemia screening.
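The BiCC step described above compares a sample spectrum with the average spectra of the two calibration classes. A minimal dependency-free sketch follows; the function names are hypothetical, and the decision rule (assign the class whose average spectrum correlates more strongly with the sample) is an assumption, since the abstract states only which correlation coefficients are calculated. The moving-window part would repeat this on each candidate waveband, that is, on a slice of every spectrum.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)


def mean_spectrum(spectra):
    """Point-by-point average of a list of spectra."""
    return [sum(column) / len(spectra) for column in zip(*spectra)]


def bicc_classify(sample, positive_cal, negative_cal):
    """Label a sample by the class average it correlates with more strongly
    (assumed decision rule)."""
    r_pos = pearson(sample, mean_spectrum(positive_cal))
    r_neg = pearson(sample, mean_spectrum(negative_cal))
    return "positive" if r_pos > r_neg else "negative"
```

Because only two averages and two correlation coefficients are needed per sample, no model fitting is involved, which reflects the simplicity the abstract emphasizes.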
A method for the simultaneous and rapid quantification of the thalassemia screening indicators (MCV, MCH and HbA2) in human blood was investigated using a Fourier transform infrared (FTIR) spectrometer with the attenuated total reflection (ATR) technique. Eight human blood samples were collected; MCV, MCH and HbA2 were each measured by conventional chemical methods, and the measured values served as the reference values for the spectral calibration models. Each sample was hemolyzed with distilled water and diluted 2-, 3-, 4-, 5- and 6-fold to give hemolytic solution samples; together with the whole blood samples, this gave 6 groups and 48 samples in total for spectral measurement. For each sample group and each screening indicator, second-derivative spectra were calculated using 11-point Savitzky-Golay smoothing, and multiple linear regression (MLR) models were established using the whole region (4000-600 cm⁻¹) and the fingerprint region (1600-900 cm⁻¹), respectively. A linear regression model was also established for each wavenumber, and the optimal single-point model was selected according to prediction performance. Throughout these calculations, the predicted value of each sample was obtained by leave-one-out cross-validation. The results showed that the optimal single-point model for every sample group and every screening indicator had good prediction performance. For the whole blood sample group, which was measured directly, the optimal single-point models for MCV, MCH and HbA2 gave the following wavenumber, root mean square error of cross-validation (RMSECV), relative root mean square error of cross-validation (RRMSECV) and prediction correlation coefficient (R_P): 1753 cm⁻¹, 2.52 fL, 2.8%, 0.724; 951 cm⁻¹, 1.05 pg, 3.2%, 0.864; and 868 cm⁻¹, 0.1%, 3.1%, 0.941, respectively.
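The single-point model selection with leave-one-out cross-validation can be sketched as below. This is an illustrative reading of the procedure, not the authors' code: the function names are hypothetical, each spectrum is assumed to be a list of (second-derivative) absorbance values on a common wavenumber grid, and the wavenumber with the lowest RMSECV is taken as optimal.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_rmsecv(x, y):
    """Leave each sample out in turn, refit on the rest, predict the held-out
    sample, and return the root mean square of the prediction errors."""
    squared_errors = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        squared_errors.append((slope * x[i] + intercept - y[i]) ** 2)
    return sqrt(sum(squared_errors) / len(squared_errors))


def best_single_point(spectra, y):
    """Return (index, RMSECV) of the wavenumber whose one-variable model has
    the lowest leave-one-out RMSECV (assumed selection criterion)."""
    scores = [loo_rmsecv(list(column), y) for column in zip(*spectra)]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

With only eight samples per group, as in the abstract, leave-one-out is a natural choice because every sample serves once as the held-out prediction.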
A wavelength selection method for spectroscopic analysis, named correlation coefficient optimization partial least-squares (CCO-PLS), was proposed and successfully employed for the reagent-free ATR-FTIR spectroscopic analysis of albumin and globulin in human serum.
In an absorption spectrum, high absorption corresponds to low light transmittance and relatively high noise, whereas low absorption corresponds to low information content; both interfere with spectral analysis modeling. An appropriate absorbance level is necessary to improve spectral information content and reduce the noise level. In this study, based on the selection of upper and lower bounds of absorbance, the absorbance value optimization partial least squares (AVO-PLS) method was proposed for selecting an appropriate wavelength model. Near-infrared spectroscopic analysis of the hyperlipidemia indicators total cholesterol (TC) and triglyceride (TG) was conducted to validate the prediction performance of AVO-PLS. Two well-performing wavelength selection methods, namely moving-window PLS (MW-PLS) of the continuous type and the successive projections algorithm (SPA) of the discrete type, were also applied for comparison. The spectra were first corrected using Savitzky–Golay smoothing. Modeling was performed based on multiple partitionings of the calibration and prediction sets to avoid over-fitting and achieve parameter stability. The selected absorbance ranged from 0.45 to 0.86 for TC and from 0.45 to 0.92 for TG, and the corresponding waveband combinations were 1,376–1,388 and 1,560–1,840 nm for TC and 1,376–1,390 and 1,552–1,846 nm for TG. The waveband combination for TG covers that for TC and can therefore be used for high-precision cooperativity analysis of the two indicators. Using independent validation samples, the cooperativity model achieved an RMSEP and R_P of 0.164 mmol l⁻¹ and 0.990 for TC and 0.096 mmol l⁻¹ and 0.997 for TG. The sensitivity and specificity for hyperlipidemia were 98.0% and 100%, respectively. These values were better than those of MW-PLS and SPA. Importantly, the proposed AVO-PLS is a novel multi-band optimization approach for improving prediction performance and applicability.
This method is expected to find wider application.
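The wavelength selection step of AVO-PLS, keeping only wavelengths whose absorbance lies between a lower and an upper bound, can be sketched as follows. Several points are assumptions rather than details from the abstract: the absorbance level of a wavelength is taken to be the mean over the calibration spectra, the function names are hypothetical, and the search over candidate bounds with a PLS model fitted for each pair is not shown.

```python
def select_by_absorbance(wavelengths, spectra, lower, upper):
    """Keep wavelengths whose mean absorbance (assumed measure of absorbance
    level) lies within [lower, upper]."""
    mean_abs = [sum(column) / len(spectra) for column in zip(*spectra)]
    return [w for w, a in zip(wavelengths, mean_abs) if lower <= a <= upper]


def contiguous_bands(selected, step):
    """Group selected wavelengths into (start, end) wavebands, where `step`
    is the sampling interval of the spectrum."""
    bands = []
    for w in selected:
        if bands and w - bands[-1][1] <= step:
            bands[-1][1] = w  # extend the current band
        else:
            bands.append([w, w])  # start a new band
    return [tuple(band) for band in bands]
```

Because an absorbance window can intersect the spectrum in several places, a single pair of bounds naturally yields a multi-band result, such as the two-waveband combinations reported for TC and TG.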