Analysis routines in R to study PARAFAC components of DOM fluorescence from mixing zones of arctic shelf seas

2020 
Abstract We have developed the package albatross in the freely available R environment to perform parallel factor analysis of excitation–emission matrices (EEM/PARAFAC) of seawater. The key pre-processing steps for successful data analysis included intensity correction for the instrument response function, inner filter effect correction, and handling of Raman and Rayleigh scattering. After implementing the necessary routines, we focused on evaluating their robustness for the analysis of real data, as well as on model validation and on assessing the adequacy of the identified factors. A tolerance of 10⁻⁹ in the absolute change of R² was found to be sufficient to prevent the fitting from stopping before the solution was reached. We found that the widely used S₄C₆T₃ split-half validation procedure cannot provide an unequivocal selection of the number of components. We instead suggest a statistical evaluation of 100 randomly obtained splits to select the optimal number of factors ("randomized split-half"). The median split-half distance allows the unique optimal decomposition of the EEMs to be found (a 4-component model in our case) and can therefore serve as a criterion for model validation. The validation of the model was additionally supported by monitoring the number of iterations before convergence and the decomposition R². Finally, the validated 4-component EEM/PARAFAC model for seawater obtained during the 63rd cruise of RV Akademik Mstislav Keldysh in 2015 (66 Arctic seawater samples) was compared to results obtained with the drEEM package in MATLAB and to previous studies using the OpenFluor spectral library. A brief characterization of the oceanographic data is provided.
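The randomized split-half idea and the R² stopping rule described in the abstract can be illustrated with a minimal sketch. It is written in plain Python rather than R and does not use the albatross API. The function names (`random_half_splits`, `median_split_half_distance`, `has_converged`) and the `fit_and_compare` callback are hypothetical. The callback is assumed to fit a PARAFAC model to each half and return some distance between the two sets of loadings; the abstract does not define that distance, so it is left to the caller.

```python
import random
import statistics


def random_half_splits(n_samples, n_splits=100, seed=0):
    """Yield n_splits random partitions of sample indices into two halves."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    indices = list(range(n_samples))
    mid = n_samples // 2
    for _ in range(n_splits):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield sorted(shuffled[:mid]), sorted(shuffled[mid:])


def median_split_half_distance(n_samples, fit_and_compare, n_splits=100, seed=0):
    """Median of the distances returned by fit_and_compare over random splits.

    fit_and_compare(half_a, half_b) is a hypothetical callback: it is assumed
    to fit one model per half and return a non-negative distance between them.
    """
    distances = [
        fit_and_compare(half_a, half_b)
        for half_a, half_b in random_half_splits(n_samples, n_splits, seed)
    ]
    return statistics.median(distances)


def has_converged(r2_previous, r2_current, tolerance=1e-9):
    """Stop when the absolute change in R^2 between iterations is below tolerance."""
    return abs(r2_current - r2_previous) < tolerance


# Usage: 66 samples and 100 random splits, as in the abstract.
splits = list(random_half_splits(66, n_splits=100))
```

In practice the median would be computed once per candidate number of components and the candidates compared; how the comparison is made depends on the chosen distance and is not specified here.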