Ellenberg indicator values (EIVs) are widely used in vegetation ecology, but values for many species in Southeastern Europe are unavailable because their ecology is incompletely known; estimating the missing values in existing databases is therefore important. Either the entire EIV set can be missing for a species, or a single EIV can be missing for a species for which other indicator values are available. Our aim is to provide a simple method to impute missing values for species that lack data for one or more EIVs. For this purpose, we adopt a multiple imputation procedure and compare a number of imputation methods on the basis of two datasets: i) "indices", the set of 9 Ellenberg indicators taken from the literature, available for 10,824 species, and ii) "vegetation", a set describing the physical and climatic characteristics (Light, Temperature, Continentality, Soil moisture, Nitrogen, Soil pH, Hemeroby index, Humidity, Organic_matter) of 29,935 relevés from Southeastern Europe in which at least one tree species is present. The imputation methods considered are k-Nearest Neighbour, multiple linear regression (with or without collinearity correction), the Reprediction Algorithm, Weighted Averaging (WA) and Weighted Averaging Partial Least Squares (WAPLS) regression. To compare the methods, we took a set of species with known EIVs, applied multiple imputation with each method, and measured how far the imputed values deviated from the "true" observed values; performance was assessed by the mean squared error (MSE) and by expert judgement of ecological consistency. Models based on regression and k-Nearest Neighbour seem to outperform the others. In contrast, the Reprediction Algorithm, in its different forms, produced less satisfactory results.
Imputation of missing values is generally based on expert knowledge or on some variant of weighted averaging (also known as Hill's method). Here we show that other methods may be more effective and deserve consideration by vegetation scientists, since they may allow EIVs to be applied in other biogeographic regions.
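As an illustration of the weighted-averaging idea and of the MSE comparison against known values, the following is a minimal sketch only, not the procedure used in the study. The function names (`plot_scores`, `impute_wa`, `mse`), the toy relevés and the indicator values are all hypothetical. It assumes that a plot's score is the cover-weighted mean EIV of its species with known values, and that a species' imputed EIV is the cover-weighted mean score of the plots in which it occurs.

```python
from statistics import mean

def plot_scores(releves, known_eiv):
    """Cover-weighted mean EIV per plot, using only species with a known value."""
    scores = {}
    for plot_id, cover in releves.items():
        pairs = [(c, known_eiv[sp]) for sp, c in cover.items() if sp in known_eiv]
        total = sum(c for c, _ in pairs)
        if total > 0:
            scores[plot_id] = sum(c * v for c, v in pairs) / total
    return scores

def impute_wa(species, releves, known_eiv):
    """Impute one species' EIV from the plots it occurs in (weighted averaging).

    The target species is excluded when plot scores are computed, so the
    function can also be used for leave-one-out evaluation.
    """
    others = {sp: v for sp, v in known_eiv.items() if sp != species}
    scores = plot_scores(releves, others)
    num = den = 0.0
    for plot_id, cover in releves.items():
        if species in cover and plot_id in scores:
            num += cover[species] * scores[plot_id]
            den += cover[species]
    return num / den if den else None

def mse(pred, truth):
    """Mean squared error between imputed and known values."""
    return mean((pred[sp] - truth[sp]) ** 2 for sp in pred)

# Hypothetical toy data: plot -> {species: percentage cover}.
releves = {
    "p1": {"A": 50, "B": 50, "C": 10},
    "p2": {"B": 20, "C": 80, "D": 40},
    "p3": {"A": 30, "D": 70},
}
known = {"A": 3.0, "B": 5.0, "C": 7.0, "D": 8.0}

# Leave-one-out: hide each species in turn, impute it, compare with the known value.
imputed = {sp: impute_wa(sp, releves, known) for sp in known}
print(imputed, mse(imputed, known))
```

The same leave-one-out loop could wrap any of the other imputation methods, which would allow their MSE values to be compared on equal terms.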
Aim: To propose a Finite Mixture Model (FMM) as an additional approach for classifying large datasets of georeferenced vegetation plots from complex vegetation systems. Study area: The Italian peninsula including the two main islands (Sicily and Sardinia), but excluding the Alps and the Po plain. Methods: We used a database of 5,593 georeferenced plots and 1,586 vascular species of forest vegetation, created in TURBOVEG by storing published and unpublished phytosociological plots collected over the last 30 years. The plots were classified according to species composition and environmental variables using an FMM. Classification results were compared with those obtained by the TWINSPAN algorithm. Groups were characterized in terms of ecological parameters and of dominant and diagnostic species, the latter identified using the fidelity coefficient. Interpretation of the resulting forest vegetation types was supported by a predictive map, produced using discriminant functions on environmental predictors, and by a non‐metric multidimensional scaling ordination. Results: FMM clustering yielded 24 groups. When these were compared with the TWINSPAN groups, similarities were found only at a higher classification level, corresponding to the main orders of the Italian broadleaf forest vegetation: Fagetalia sylvaticae, Carpinetalia betuli, Quercetalia pubescenti-petraeae and Quercetalia ilicis. At a lower syntaxonomic level, the 24 groups were referred to alliances and sub-alliances. Conclusions: Despite its greater computational complexity, FMM appears to be an effective alternative to traditional classification methods because it incorporates modelling into the classification process. This allows species co-occurrence and environmental factors to be classified together, so that groups are identified not only by their species composition, as in TWINSPAN, but also by their specific environmental niche. Taxonomic reference: Conti et al. (2005).
Abbreviations: CLM = Community-level models; FMM = Finite Mixture Model; NMDS = non‐metric multidimensional scaling.
Aim: Assessing the performance of different sampling approaches for documenting community diversity may help to identify optimal sampling efforts and strategies, and to enhance conservation and monitoring planning. Here, we used two data sets of Italian forest vegetation, one based on a probabilistic and one on a preferential sampling scheme, to analyze the multifaceted performance of the two approaches across three major forest types at a large scale. Location: Italy. Methods: We pooled 804 probabilistic and 16,259 preferential forest plots as samples of vascular plant diversity across the country. We balanced the two data sets in terms of data set size, plot size, geographical position, and vegetation types. For each of the two data sets, 1000 subsets of 201 random plots were compared by calculating the shared and exclusive indicator species, their overlap in the multivariate space, and the areas encompassed by spatially‐constrained rarefaction curves. We then calculated an index of performance as the ratio between the additional and the total information collected by each sampling approach. The performances were tested and evaluated across the three major forest types. Results: The probabilistic approach performed better in estimating species richness and diversity of species assemblages, but did not detect other components of the regional diversity, such as azonal forests. The preferential approach outperformed the probabilistic approach in detecting forest‐specialist species and plant diversity hotspots. Conclusions: Using a novel workflow based on vegetation‐plot exclusivities and commonalities, our study suggests that probabilistic and preferential sampling approaches should be used in combination to detect multiple aspects of plant community diversity and so improve conservation and monitoring planning. Our findings can assist the implementation of national conservation planning and large‐scale monitoring of biodiversity.
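The resampling comparison and the ratio-based index can be sketched as follows. This is a simplified illustration under stated assumptions, not the published workflow. "Information" is reduced here to the species pool of a subset (the study used indicator species, multivariate overlap and rarefaction areas). "Additional" information is taken to mean species found only by one approach. The function names `species_pool`, `performance_ratio` and `resampled_ratio` are hypothetical.

```python
import random

def species_pool(plots):
    """Union of the species recorded in a list of plots (each plot is a set)."""
    return set().union(*plots)

def performance_ratio(own_plots, other_plots):
    """Share of an approach's species pool that the other approach did not record."""
    own = species_pool(own_plots)
    other = species_pool(other_plots)
    return len(own - other) / len(own) if own else 0.0

def resampled_ratio(plots_a, plots_b, subset_size, n_draws, seed=0):
    """Mean performance ratio of approach A over repeated random subsets."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    ratios = []
    for _ in range(n_draws):
        sub_a = rng.sample(plots_a, subset_size)
        sub_b = rng.sample(plots_b, subset_size)
        ratios.append(performance_ratio(sub_a, sub_b))
    return sum(ratios) / n_draws

# Hypothetical toy plots; each plot is the set of species recorded in it.
probabilistic = [{"a", "b"}, {"b", "c"}]
preferential = [{"b"}, {"d"}]

print(performance_ratio(probabilistic, preferential))
print(performance_ratio(preferential, probabilistic))
```

With real data, `subset_size` and `n_draws` would correspond to the 201 plots and 1000 subsets described in the Methods.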
A database of phytosociological relevés of Broadleaved Temperate Deciduous Forests of peninsular Italy (Apennines, from Liguria to Calabria regional districts) is presented (EVSItalia Database of Broadleaved Temperate Deciduous Forests, GIVD ID EU-IT-011). The data set aims to store information from published, certified phytosociological sources, in order to support ongoing reviews based on larger data sets for comparative classification of Italian vegetation types. The database stores 1,092 relevés, mostly from beech forests, along an altitudinal gradient. Three beech forest types are identified: between 800 and 1,000/1,100 m a.s.l., a mixed forest with the presence of Quercus cerris, Ilex aquifolium, Tilia sp. and Taxus baccata; above 1,100 m a.s.l. up to the treeline, a beech-dominated forest; below 800 m a.s.l., relic Fagus forests, in very restricted areas of the Apennines. All the relevés are uploaded to the TURBOVEG software for plot databases. An ArcGIS database related to the TURBOVEG units reports localization information as GPS coordinates, toponyms or quadrats (Operational Geographic Units, OGU, of the Italian floristic grid). Descriptive records of locations extracted from geographical information systems are also stored. The data model structure allows producing presence-absence or quantitative matrices for classification of communities and their parametrisation over larger areas by geostatistical analysis. The aims are to explore patterns of similarity among the distribution of different associations (chorological groups of associations, provincialism) and patterns of geographic change in community distribution along topographical gradients, and to test changes in the physical scenario of selected individual communities along geographical gradients. A different insight into the patterns of synonymy and reassessment among syntaxa on the basis of a geographical treatise is expected as well.
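To make the step from stored relevé records to a presence-absence or quantitative matrix concrete, here is a minimal sketch. The record layout (relevé ID, species name, cover value), the function name `build_matrix` and the example records are assumptions made for illustration; they do not describe the actual TURBOVEG export format or the database contents.

```python
def build_matrix(records, presence_absence=True):
    """Turn (releve_id, species, cover) records into a plots-by-species matrix.

    Returns the sorted row labels, the sorted column labels and the matrix as
    a list of rows. Cells hold 1/0 for presence-absence, or the cover value
    (0 where the species is absent) for a quantitative matrix.
    """
    plots = sorted({r[0] for r in records})
    species = sorted({r[1] for r in records})
    row = {p: i for i, p in enumerate(plots)}
    col = {s: j for j, s in enumerate(species)}
    matrix = [[0] * len(species) for _ in plots]
    for plot_id, sp, cover in records:
        matrix[row[plot_id]][col[sp]] = 1 if presence_absence else cover
    return plots, species, matrix

# Hypothetical records: (relevé ID, species, cover value on an ordinal scale).
records = [
    ("r1", "Fagus sylvatica", 4),
    ("r1", "Ilex aquifolium", 1),
    ("r2", "Fagus sylvatica", 5),
]

plots, species, pa = build_matrix(records)
_, _, quantitative = build_matrix(records, presence_absence=False)
print(species, pa, quantitative)
```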