<p class="Bodytext">Nonparametric regression methods are a powerful statistical tool for retrieving biophysical parameters from remote sensing measurements. However, the results obtained can be affected by the data used during the model training phase. To ensure that the models are robust, several cross-validation techniques are used, which evaluate the model on subsets of the field database. Here, two types of cross-validation are evaluated in the development of nonparametric regression models: hold-out and k-fold. The selected linear regression methods were Linear Regression (LR) and Partial Least Squares Regression (PLSR); the nonlinear methods were Kernel Ridge Regression (KRR) and Gaussian Process Regression (GPR). The cross-validation results showed that LR yields the most unstable results, whereas KRR and GPR lead to more robust estimates. This work recommends combining nonlinear regression algorithms (such as KRR or GPR) with k-fold cross-validation, with k set to 10, to obtain robust estimates.</p>
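The comparison above can be sketched with scikit-learn; the data, features, and model settings below are synthetic stand-ins, not those of the study:

```python
# Hold-out vs. 10-fold cross-validation for linear (LR) and kernel-based
# (KRR, GPR) regression on synthetic stand-in data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.kernel_ridge import KernelRidge
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import KFold, cross_val_score, train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (200, 5))                           # stand-in spectral features
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(200)  # stand-in biophysical variable

models = {
    "LR": LinearRegression(),
    "KRR": KernelRidge(kernel="rbf", gamma=1.0, alpha=0.1),
    "GPR": GaussianProcessRegressor(kernel=RBF(1.0) + WhiteKernel(0.01), normalize_y=True),
}

# Hold-out: one random split; the score depends strongly on that split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
holdout = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}

# k-fold with k = 10: the score is averaged over 10 rotating validation subsets
kf = KFold(n_splits=10, shuffle=True, random_state=1)
kfold = {name: cross_val_score(m, X, y, cv=kf).mean() for name, m in models.items()}
```

On this toy target the nonlinear models recover the sinusoidal dependence that LR cannot, which mirrors the stability ranking reported above.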
The continuous monitoring of the terrestrial Earth system by a growing number of optical satellite missions provides valuable insights into vegetation and cropland characteristics. Satellite missions typically provide different levels of data, such as level 1 top-of-atmosphere (TOA) radiance and level 2 bottom-of-atmosphere (BOA) reflectance products. Exploiting TOA radiance data directly offers the advantage of bypassing the complex atmospheric correction step, where errors can propagate and compromise the subsequent retrieval process. Therefore, the objective of our study was to develop models capable of retrieving vegetation traits directly from TOA radiance data of imaging spectroscopy satellite missions. To achieve this, we constructed hybrid models based on radiative transfer model (RTM) simulated data, coupling the SCOPE vegetation RTM with the LibRadtran atmospheric RTM, in conjunction with Gaussian process regression (GPR). The retrieval evaluation focused on vegetation canopy traits, including the leaf area index (LAI), canopy chlorophyll content (CCC), canopy water content (CWC), the fraction of absorbed photosynthetically active radiation (FAPAR), and the fraction of vegetation cover (FVC). Employing band settings from the upcoming Copernicus Hyperspectral Imaging Mission (CHIME), two types of hybrid GPR models were assessed: (1) one trained at level 1 (L1) using TOA radiance data and (2) one trained at level 2 (L2) using BOA reflectance data. Both the TOA- and BOA-based GPR models were validated against in situ data with corresponding hyperspectral data obtained from field campaigns. The TOA-based hybrid GPR models revealed performances ranging from moderate to optimal, reaching R2 = 0.92 (LAI), R2 = 0.72 (CCC), 0.68 (CWC), R2 = 0.94 (FAPAR), and R2 = 0.95 (FVC).
To demonstrate the models’ applicability, the TOA- and BOA-based GPR models were subsequently applied to imagery from the scientific precursor missions PRISMA and EnMAP. The resulting trait maps showed sufficient consistency between the TOA- and BOA-based models, with relative errors between 4% and 16% (R2 between 0.68 and 0.97). Altogether, these findings illuminate the path for the development and enhancement of machine learning hybrid models for the estimation of vegetation traits tailored directly to the TOA level.
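A minimal sketch of the hybrid workflow, assuming a made-up `toy_rtm` function in place of the actual SCOPE+LibRadtran simulations (all data and settings are illustrative):

```python
# Hybrid retrieval sketch: (1) "simulate" spectra for known trait values,
# (2) train a GPR model on the simulations, (3) apply it to new spectra.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(42)

def toy_rtm(lai, n_bands=50):
    """Hypothetical stand-in for a SCOPE+LibRadtran simulation: maps a
    trait (LAI) to a smooth pseudo-spectrum with a little sensor noise."""
    wl = np.linspace(0.4, 2.4, n_bands)  # wavelength in micrometres
    spectra = np.outer(1 - np.exp(-0.5 * np.asarray(lai)), np.exp(-wl))
    return spectra + 0.01 * rng.standard_normal(spectra.shape)

lai_train = rng.uniform(0, 7, 300)     # look-up table of trait values
spectra_train = toy_rtm(lai_train)     # "simulated TOA radiance"

gpr = GaussianProcessRegressor(kernel=RBF(1.0) + WhiteKernel(1e-3), normalize_y=True)
gpr.fit(spectra_train, lai_train)

# Apply to new "observed" spectra; return_std gives a per-pixel uncertainty
lai_true = rng.uniform(0, 7, 50)
lai_pred, lai_std = gpr.predict(toy_rtm(lai_true), return_std=True)
```

Training on simulations and applying to observations is what makes the model "hybrid": the RTM provides generalization, the GPR provides speed and uncertainty.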
Abstract. Vegetation productivity is a critical indicator of global ecosystem health and is impacted by human activities and climate change. A wide range of optical sensing platforms, from ground-based to airborne and satellite, provide spatially continuous information on terrestrial vegetation status and functioning. As optical Earth observation (EO) data are routinely acquired, vegetation can be monitored repeatedly over time, reflecting seasonal vegetation patterns and trends in vegetation productivity metrics. Such metrics include, e.g., gross primary productivity, net primary productivity, biomass, or yield. To summarize current knowledge, in this paper we systematically reviewed time series (TS) literature to assess state-of-the-art vegetation productivity monitoring approaches for different ecosystems based on optical remote sensing (RS) data. As the integration of solar-induced fluorescence (SIF) data in vegetation productivity processing chains has emerged as a promising source, we also include this relatively recent sensor modality. We define three methodological categories to derive productivity metrics from remotely sensed TS of vegetation indices or quantitative traits: (i) trend analysis and anomaly detection, (ii) land surface phenology, and (iii) integration and assimilation of TS-derived metrics into statistical and process-based dynamic vegetation models (DVMs). Although the majority of the used TS data streams originate from satellite platforms, TS data from aircraft and unoccupied aerial vehicles have also found their way into productivity monitoring studies. To facilitate processing, we provide a list of common toolboxes for inferring productivity metrics and information from TS data.
We further discuss validation strategies for the RS-derived productivity metrics: (1) comparison against in situ measured data, such as yield; (2) comparison against networks of distinct sensors, including spectroradiometers, flux towers, or phenological cameras; and (3) inter-comparison of different productivity products or modelled estimates. Finally, we address current challenges and propose a conceptual framework for deriving productivity metrics, including fully-integrated DVMs and radiative transfer models, here labelled a "Digital Twin". This novel framework meets the requirements of multiple ecosystems, enables an improved understanding of vegetation temporal dynamics in response to climate and environmental drivers, and enhances the accuracy of vegetation productivity monitoring.
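Category (ii), land surface phenology, can be illustrated with a toy example: deriving start and end of season (SOS/EOS) from a synthetic seasonal vegetation-index curve using a simple 50%-amplitude threshold (the curve and the threshold are illustrative choices, not a method from the reviewed literature):

```python
# SOS/EOS extraction from a synthetic seasonal vegetation-index series
# via a 50%-amplitude threshold.
import numpy as np

doy = np.arange(1, 366, 5)                          # acquisition day of year
vi = 0.2 + 0.6 * np.exp(-((doy - 190) / 50) ** 2)   # synthetic seasonal NDVI curve

thresh = vi.min() + 0.5 * (vi.max() - vi.min())     # 50%-amplitude threshold
above = vi >= thresh
sos = doy[above][0]                                 # first day above threshold
eos = doy[above][-1]                                # last day above threshold
```

Real series would first need the gap-filling and smoothing discussed above before such metrics become reliable.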
Precise and spatially-explicit knowledge of leaf chlorophyll content $(Chl)$ is crucial to adequately interpret the chlorophyll fluorescence $(ChF)$ signal from space. Accompanying information about the reliability of the $Chl$ estimation becomes more important than ever. Recently, a new statistical method was proposed within the family of nonparametric Bayesian statistics, namely Gaussian Processes regression (GPR). GPR is simpler and more robust than its machine learning family members while maintaining very good numerical performance and stability. Other features include: i) GPR requires a relatively small training data set and can adopt very flexible kernels, ii) GPR identifies the relevant bands and observations in establishing relationships with a variable, and iii) along with pixelwise estimations, GPR provides accompanying confidence intervals. We used GPR to retrieve $Chl$ from hyperspectral reflectance data and evaluated the portability of the regression model to other images. Based on field $Chl$ measurements from the SPARC dataset and corresponding spaceborne CHRIS spectra (acquired in 2003, Barrax, Spain), GPR yielded a regression model that validated excellently ( $r^{2}$ : 0.96, RMSE: 3.82 $\mu{\rm g/cm}^{2}$ ). The SPARC-trained GPR model was subsequently applied to CHRIS images (Barrax, 2003, 2009) and airborne CASI flightlines (Barrax, 2009) to generate $Chl$ maps. The accompanying confidence maps provided insight into the robustness of the retrievals. Similar confidences were achieved by both sensors, which is encouraging for upscaling $Chl$ estimates from field to landscape scale. Because of its robustness and ability to deliver confidence intervals, GPR is evaluated as a promising candidate for implementation into $ChF$ processing chains.
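Two of the GPR features listed above, band relevance and confidence intervals, can be sketched with scikit-learn on hypothetical data; the band count, signal, and noise levels are invented for illustration:

```python
# Band relevance via anisotropic (ARD) kernel length scales, and per-pixel
# confidence via the predictive standard deviation, on synthetic data where
# only band 2 actually carries the chlorophyll signal.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(7)
n, n_bands = 150, 6
X = rng.uniform(0, 1, (n, n_bands))               # stand-in reflectance bands
chl = 40 * X[:, 2] + 2 * rng.standard_normal(n)   # only band 2 is informative

kernel = RBF(length_scale=np.ones(n_bands)) + WhiteKernel(1.0)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, chl)

# ARD interpretation: a shorter learned length scale means a more relevant band
length_scales = gpr.kernel_.k1.length_scale
relevant_band = int(np.argmin(length_scales))

# Predictive std per "pixel" -> the basis for a confidence map
mean, std = gpr.predict(rng.uniform(0, 1, (5, n_bands)), return_std=True)
```

This inverse-length-scale ranking is the standard way automatic relevance determination exposes which bands drive the retrieval.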
<p>The aim of ESA's forthcoming FLuorescence EXplorer (FLEX) is to achieve global monitoring of the vegetation's chlorophyll fluorescence by means of an imaging spectrometer, FLORIS. For the retrieval of the fluorescence signal measured from space, other vegetation variables need to be retrieved simultaneously, such as (1) Leaf Area Index (LAI), (2) Leaf Chlorophyll content (Cab), and (3) Fractional Vegetation cover (FCover), among others. The ongoing SENTIFLEX ERC project has already demonstrated the feasibility of operationally inferring these variables through hybrid retrieval approaches, which combine the generalization capabilities offered by radiative transfer models (RTMs) with the computational efficiency of machine learning methods. Reflectance spectra corresponding to a large variety of canopy realizations served as input to train a Gaussian Process Regression (GPR) algorithm for each targeted variable. Following this approach, sets of GPR retrieval models have been trained for Sentinel-2 and -3 reflectance images.</p><p>In that direction, we started to explore the potential of Google Earth Engine (GEE) to facilitate regional to global mapping. GEE is a platform with a multi-petabyte satellite imagery catalog and geospatial datasets with planetary-scale analysis capabilities, which is freely available for scientific purposes. Among the different EO archives, it is possible to access the whole collection of Sentinel-2 surface reflectance data. In this work, we present the results of an efficient implementation in GEE of the GPR-based vegetation models developed for Sentinel-2 in the framework of the SENSAGRI H2020 project. By taking advantage of GEE's cloud-computing power, we avoid the typical bottleneck of downloading and processing large amounts of data locally, and can generate results of the GPR-based retrieval models developed for Sentinel-2 quickly and efficiently, covering large areas in a matter of seconds. 
As a first step in that direction, we present here an open web-based GEE application able to generate LAI Green and LAI Brown maps from Sentinel-2 imagery at 20 m in a tile-wise manner all over the world, as well as time series of selected pixels over a user-defined time interval.</p><p>To illustrate these functionalities and gain a better understanding of the phenology, we targeted a region in Castilla y Le&#243;n (Spain), for which we present results for 2018 classified per crop type. This land cover classification was generated by the ITACYL (<span>Instituto Tecnol&#243;gico Agrario de Castilla y Le&#243;n</span>) during SENSAGRI.</p><p>Future development will tackle the possibility of extending our analysis capability to additional variables, such as FCover and Cab, maintaining computational efficiency as the main driver to ensure that the GEE application remains an agile and easy-to-use tool for spatiotemporal Earth observation studies.</p>
<p>In general, modeling phenological evolution represents a challenging task, mainly because of time series gaps and noisy data, arising from different viewing and illumination geometries, cloud cover, seasonal snow, and the revisit interval needed to acquire data for the exact same location. For that reason, the use of reliable gap-filling fitting functions and smoothing filters is frequently required for retrievals at the highest feasible accuracy. Of specific interest for filling gaps in time series is the emergence of machine learning regression algorithms (MLRAs), which can serve as fitting functions. Among the multiple MLRA approaches currently available, kernel-based methods developed in a Bayesian framework, such as Gaussian Process Regression (GPR), deserve special attention because they are adaptive and provide associated uncertainty estimates.</p><p>Recent studies demonstrated the effectiveness of GPR for gap-filling of biophysical parameter time series because the hyperparameters can be optimally set for each time series (one for each pixel in the area) with a single optimization procedure. The entire procedure of learning a GPR model relies only on the appropriate selection of the type of kernel and the hyperparameters involved in the estimation of the input data covariance. Despite its clear strategic advantages, the most important shortcomings of this technique are (1) the high computational cost and (2) the memory requirements of training, which grow cubically and quadratically with the number of training samples, respectively. This can become problematic when processing large amounts of data, such as Sentinel-2 (S2) time series tiles. 
Hence, optimization strategies need to be developed to speed up GPR processing while maintaining its superior accuracy.</p><p>To mitigate this computational burden and avoid repeating the optimization for every pixel, we evaluated whether the GPR hyperparameters can be preoptimized over a reduced set of representative pixels and kept fixed over a more extended crop area. We used S2 LAI time series over an agricultural region in Castile and Leon (North-West Spain) and tested different covariance functions: the exponential kernel, the squared exponential kernel, and the Mat&#233;rn kernel with parameter 3/2 or 5/2. The performance of image reconstructions was compared against the standard per-pixel GPR time series training process. Results showed that accuracies were of the same order (12% RMSE degradation), whereas processing time was accelerated by up to a factor of 90. Crop phenology indicators were also calculated and compared, revealing similar temporal patterns with differences in the start and end of the growing season of no more than five days. To the benefit of crop monitoring applications, all the gap-filling and phenology indicator retrieval techniques have been implemented into the <strong>freely downloadable GUI toolbox DATimeS</strong> (Decomposition and Analysis of Time Series Software - https://artmotoolbox.com/).</p>
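The pre-optimization strategy can be sketched with scikit-learn on a synthetic LAI-like series; the Matérn 3/2 kernel mirrors one of the tested covariance functions, but all data and parameter values are illustrative:

```python
# Fit GPR hyperparameters once on a representative pixel, then reuse them
# (optimizer=None) to gap-fill other pixels' time series without re-optimizing.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

rng = np.random.default_rng(3)
t = np.sort(rng.uniform(0, 365, 40))[:, None]   # irregular acquisition dates

def lai_series(phase):
    """Toy seasonal LAI profile peaking around mid-year."""
    return 3 * np.exp(-((t[:, 0] - 180 - phase) / 60) ** 2)

# 1) optimize hyperparameters on one representative pixel
ref = lai_series(0) + 0.1 * rng.standard_normal(40)
kernel = Matern(length_scale=30.0, nu=1.5) + WhiteKernel(0.01)
gpr_ref = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(t, ref)

# 2) reuse the fitted kernel, with optimization switched off, on another pixel
fixed = GaussianProcessRegressor(kernel=gpr_ref.kernel_, optimizer=None,
                                 normalize_y=True)
pix = lai_series(15) + 0.1 * rng.standard_normal(40)
keep = np.ones(40, bool)
keep[15:22] = False                              # simulate a cloud gap
fixed.fit(t[keep], pix[keep])
filled = fixed.predict(t[~keep])                 # reconstructed gap values
```

Step 2 reduces per-pixel training to a single Cholesky solve, which is where the reported speed-up comes from.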
Accurate plant-type (PT) detection forms an important basis for sustainable land management, maintaining biodiversity and ecosystem services. In this sense, Sentinel-2 satellite images of the Copernicus program offer spatial, spectral, temporal, and radiometric characteristics with great potential for mapping and monitoring PTs. In addition, the selection of a best-performing algorithm needs to be considered to obtain a PT classification that is as accurate as possible. To date, no freely downloadable toolbox exists that brings the diversity of the latest supervised machine-learning classification algorithms (MLCAs) together into a single intuitive, user-friendly graphical user interface (GUI). To fill this gap, and to facilitate and automate the usage of MLCAs, here we present a novel GUI software package that allows systematically training, validating, and applying pixel-based MLCA models to remote sensing imagery. The so-called MLCA toolbox has been integrated within ARTMO's software framework developed in Matlab, and implements most of the state-of-the-art methods in the machine learning community. To demonstrate its utility, we chose a heterogeneous case study scene, a landscape in Southwest Iran, to map PTs. In this area, four main PTs were identified, consisting of shrub land, grass land, semi-shrub land, and shrub land-grass land vegetation. Developing 21 MLCAs with the same training and validation datasets led to varying accuracy results. The Gaussian process classifier (GPC) was validated as the top-performing classifier, with an overall accuracy (OA) of 90%. GPC follows a Laplace approximation to the Gaussian likelihood under the supervised classification framework, emerging as a very competitive alternative to common MLCAs. Random forests resulted in the second-best performance, with an OA of 86%. 
Two other types of ensemble-learning algorithms, i.e., tree-ensemble learning (bagging) and decision tree (with error-correcting output codes), yielded OAs of 83% and 82%, respectively. Thirteen further classifiers reported an OA between 70% and 80%, and the remaining four classifiers an OA below 70%. We conclude that GPC substantially outperformed all other classifiers and thus provides enormous potential for the classification of a diversity of land-cover types. In addition, its probabilistic formulation provides valuable band ranking information, as well as an associated predictive variance at the pixel level. Nevertheless, as these are supervised (data-driven) classifiers, performance depends on the entered training data, meaning that an assessment of all MLCAs is crucial for any application. Our analysis demonstrated the efficacy of ARTMO's MLCA toolbox for an automated evaluation of the classifiers and subsequent thematic mapping.
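As a hedged illustration of the probabilistic output that distinguishes GPC, the following scikit-learn sketch trains a Gaussian process classifier on synthetic four-class data (a stand-in for the four PTs; none of the data or scores correspond to the study):

```python
# Gaussian process classification with per-pixel class probabilities,
# on synthetic four-class data.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

gpc = GaussianProcessClassifier(random_state=0).fit(X_tr, y_tr)
oa = gpc.score(X_te, y_te)        # overall accuracy on held-out "pixels"
proba = gpc.predict_proba(X_te)   # per-class membership probabilities
```

The `proba` matrix is what enables the per-pixel uncertainty maps mentioned above: low maximum probability flags pixels where the classification is doubtful.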
Optical Earth Observation is often limited by weather conditions such as cloudiness. Radar sensors have the potential to overcome these limitations; however, due to the complex radar-surface interaction, retrieving crop biophysical variables with this technology remains an open challenge. Aiming to simultaneously benefit from the optical-domain background and the all-weather imagery provided by radar systems, we propose a data fusion approach focused on the cross-correlation between radar and optical data streams. To do so, we analyzed several multiple-output Gaussian process (MOGP) models and their ability to efficiently fuse Sentinel-1 (S1) Radar Vegetation Index (RVI) and Sentinel-2 (S2) vegetation water content (VWC) time series over a dry agri-environment in southern Argentina. MOGP models exploit not only the auto-correlations of the S1 and S2 data streams independently but also their inter-channel cross-correlations. The S1 RVI and S2 VWC time series at the selected study sites, which serve as inputs to the MOGP models, proved to be closely correlated. Among the set of assessed models, the convolutional Gaussian model (CONV) delivered remarkably accurate data fusion results over winter wheat croplands belonging to the 2020 and 2021 campaigns (NRMSE$_{\rm wheat,2020}$ = 16.1%; NRMSE$_{\rm wheat,2021}$ = 10.1%). Subsequently, we removed the S2 observations corresponding to the complete phenological cycle of winter wheat, from September to the end of December, from the S1 & S2 dataset to simulate the presence of clouds in the scenes, and applied the CONV model at the pixel level to reconstruct the spatiotemporally-latent VWC maps. After applying the fusion strategy, the phenology of winter wheat was successfully recovered in the absence of optical data. Strong correlations were obtained between the S2 VWC and the S1 & S2 MOGP-reconstructed VWC maps for the assessment dates ($\overline{R^{2}}_{\rm wheat,2020}$ = 0.95, $\overline{R^{2}}_{\rm wheat,2021}$ = 0.96). 
Altogether, the fusion of S1 SAR and S2 optical EO data streams with MOGP offers a powerful innovative approach for cropland trait monitoring over cloudy high-latitude regions.
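The fusion idea can be sketched with a minimal NumPy implementation of a two-output GP with an intrinsic-coregionalization (ICM) kernel; note this is a simplified stand-in, not the CONV model used in the study, and all series and parameters are synthetic:

```python
# Two-output GP fusion: an "RVI" series (fully observed) and a correlated
# "VWC" series with a simulated cloud gap; the gap is reconstructed from both
# the VWC autocorrelation and the RVI cross-correlation.
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 30)                        # normalized acquisition times
signal = np.sin(2 * np.pi * t)
rvi = signal + 0.05 * rng.standard_normal(30)        # output 0: fully observed
vwc = 0.8 * signal + 0.05 * rng.standard_normal(30)  # output 1: gets a cloud gap
gap = slice(10, 20)

# Coregionalization matrix: B[i, j] scales the covariance between outputs i, j
B = np.array([[1.0, 0.8],
              [0.8, 0.8]])

def icm(t1, o1, t2, o2, ell=0.15):
    """ICM kernel: K[(t, i), (t', j)] = B[i, j] * RBF(t, t')."""
    rbf = np.exp(-0.5 * ((t1[:, None] - t2[None, :]) / ell) ** 2)
    return B[np.asarray(o1)[:, None], np.asarray(o2)[None, :]] * rbf

keep = np.ones(30, bool)
keep[gap] = False                                    # simulated cloudy dates
t_obs = np.concatenate([t, t[keep]])                 # stacked (time, output) inputs
o_obs = np.concatenate([np.zeros(30, int), np.ones(keep.sum(), int)])
y_obs = np.concatenate([rvi, vwc[keep]])

# GP posterior mean for the missing VWC dates, informed by RVI in the gap
K = icm(t_obs, o_obs, t_obs, o_obs) + 0.05 ** 2 * np.eye(y_obs.size)
Ks = icm(t[gap], np.ones(10, int), t_obs, o_obs)
vwc_filled = Ks @ np.linalg.solve(K, y_obs)
```

Because the radar-like channel stays observed inside the optical gap, the cross-covariance term carries the seasonal shape through the gap, which is the core of the reconstruction strategy described above.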
CHRIS/PROBA is capable of sampling reflected radiation at five viewing angles over the visible and near-infrared regions of the solar spectrum with a relatively high spatial resolution (~17 m). We exploited both the spectral and angular domains of CHRIS data in order to map the surface heterogeneity of an Alpine coniferous forest during winter. In the spectral domain, linear spectral unmixing of the nadir image resulted in a canopy cover map. In the angular domain, pixelwise inversion of the Rahman-Pinty-Verstraete (RPV) model at a single wavelength at the red edge (722 nm) yielded a map of the Minnaert-k parameter, which provided information on surface heterogeneity at the subpixel scale. Merging both maps resulted in a forest cover heterogeneity map, which contains more detailed information on canopy heterogeneity at the CHRIS subpixel scale than can be obtained from a single-source data set.
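The linear spectral unmixing step can be sketched with non-negative least squares; the two endmember spectra below (canopy, snow) are made-up illustrative values:

```python
# Linear spectral unmixing of one pixel: solve for non-negative endmember
# fractions with NNLS, then normalize to sum to one.
import numpy as np
from scipy.optimize import nnls

E = np.array([[0.05, 0.90],   # band 1 reflectance: [canopy, snow]
              [0.45, 0.85],   # band 2
              [0.30, 0.80]])  # band 3 -> endmember matrix (bands x endmembers)

pixel = 0.6 * E[:, 0] + 0.4 * E[:, 1]   # synthetic mixed-pixel spectrum
fractions, residual = nnls(E, pixel)    # non-negative abundance estimates
cover = fractions / fractions.sum()     # enforce sum-to-one
```

Applying this per pixel over the nadir image yields the canopy cover map, with the residual indicating how well the two-endmember model explains each pixel.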