Wind energy is a critical part of replacing fossil and nuclear energy. Price pressure on the renewable energy sector demands cutting the costs of the regular inspections carried out by industrial climbers. Drone-based video inspection reduces costs and increases the safety of inspection personnel. To further increase throughput, automatic or semi-automatic solutions for analyzing these videos are needed. However, modern machine learning architectures need a lot of data to work reliably. This is by design a problem, as structural damage is rather rare in industrial infrastructure. Our proposed approach uses Generative Adversarial Networks to generate synthetic unmanned aerial vehicle imagery. This allows us to create a sufficiently large training dataset (> 10³) from a dataset that is at least an order of magnitude smaller (approx. 10²). We show that we can increase the classification accuracy by up to 6 percentage points.
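For illustration, the following is a minimal sketch of how such GAN-based synthetic image generation could look in PyTorch; the abstract does not name a specific GAN variant, so the DCGAN-style architecture, the 64×64 crop size and all hyperparameters here are assumptions, not the authors' implementation.

```python
# Minimal DCGAN-style sketch (illustrative only): generate synthetic 64x64 RGB
# crops to enlarge a small inspection-image training set. Layer sizes, the data
# pipeline and hyperparameters are assumptions, not the published setup.
import torch
import torch.nn as nn

LATENT = 100

generator = nn.Sequential(                      # z (N, 100, 1, 1) -> image (N, 3, 64, 64)
    nn.ConvTranspose2d(LATENT, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(True),
    nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),
    nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),
    nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(True),
    nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),
)

discriminator = nn.Sequential(                  # image -> real/fake logit
    nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2, True),
    nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2, True),
    nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2, True),
    nn.Conv2d(256, 1, 8, 1, 0), nn.Flatten(),
)

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))

def train_step(real):                           # real: (N, 3, 64, 64), scaled to [-1, 1]
    n = real.size(0)
    fake = generator(torch.randn(n, LATENT, 1, 1))
    # discriminator update: push real crops towards 1, generated crops towards 0
    d_loss = bce(discriminator(real), torch.ones(n, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(n, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # generator update: try to make the discriminator label generated crops as real
    g_loss = bce(discriminator(fake), torch.ones(n, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```

Once the generator has converged, sampled images can be added to the real training set before training the damage classifier.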
In recent years, bioimaging has turned from qualitative measurements towards a high-throughput and high-content modality, providing multiple variables for each biological sample analyzed. We present a system which combines machine learning based semantic image annotation and visual data mining to analyze such new multivariate bioimage data. Machine learning is employed for automatic semantic annotation of regions of interest. The annotation is the prerequisite for a biological object-oriented exploration of the feature space derived from the image variables. With the aid of visual data mining, the obtained data can be explored simultaneously in the image as well as in the feature domain. Especially when little is known about the underlying data, for example when exploring the effects of a drug treatment, visual data mining can greatly aid the process of data evaluation. We demonstrate how our system is used for image evaluation to obtain information relevant to diabetes studies and to the screening of new anti-diabetes treatments. Cells of the Islets of Langerhans and the whole pancreas in pancreas tissue samples are annotated, and object-specific molecular features are extracted from aligned multichannel fluorescence images. These are interactively evaluated for cell type classification in order to determine cell number and mass. Only a few parameters need to be specified, which makes the system usable for non-computer experts as well and allows for high-throughput analysis.
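As a rough illustration of the object-oriented feature pipeline described here, the sketch below turns segmented regions of interest into per-object feature vectors from aligned fluorescence channels and classifies them by cell type; the feature set, the channel examples and the random forest classifier are assumptions for demonstration, not the published system.

```python
# Illustrative sketch: per-object features from aligned multichannel fluorescence
# images, followed by cell-type classification. Features, channels and classifier
# are assumptions, not the implementation described in the abstract.
import numpy as np
from skimage.measure import regionprops
from sklearn.ensemble import RandomForestClassifier

def object_features(label_image, channels):
    """One feature row per annotated object: area plus mean intensity per channel.

    label_image: integer-labeled segmentation of the regions of interest (cells)
    channels:    list of aligned 2D fluorescence images (e.g. insulin, glucagon, DAPI)
    """
    rows = []
    for props in regionprops(label_image):
        mask = label_image == props.label
        rows.append([props.area] + [ch[mask].mean() for ch in channels])
    return np.array(rows)

# X_train / y_train: feature rows and expert cell-type labels from annotated samples
# clf = RandomForestClassifier(n_estimators=200).fit(X_train, y_train)
# cell_types = clf.predict(object_features(labels_new, channels_new))
```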
Deep convolutional neural networks are emerging as the state-of-the-art method for supervised image classification, also in the context of taxonomic identification. Different morphologies and imaging technologies applied across organismal groups lead to highly specific image domains, which need customization of deep learning solutions. Here we provide an example using deep convolutional neural networks (CNNs) for taxonomic identification of the morphologically diverse microalgal group of diatoms. Using a combination of high-resolution slide-scanning microscopy, web-based collaborative image annotation, and diatom-tailored image analysis, we assembled a diatom image database from two Southern Ocean expeditions. We use these data to investigate the effect of CNN architecture, background masking, dataset size and possible concept drift upon image classification performance. Surprisingly, VGG16, a relatively old network architecture, showed the best performance and generalization ability on our images. In contrast to a previous study, we found that background masking slightly improved performance. In general, training only a classifier on top of convolutional layers pre-trained on extensive, but not domain-specific, image data showed surprisingly high performance (F1 scores around 97%) with already relatively few (100–300) examples per class, indicating that domain adaptation to a novel taxonomic group can be feasible with a limited investment of effort.
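A minimal sketch of the transfer-learning setup described here, written in Keras: a frozen VGG16 convolutional base pre-trained on extensive but non-domain-specific data (ImageNet) with only a small classifier trained on top. The input size, classifier head and training settings below are illustrative assumptions, not the exact configuration used in the study.

```python
# Transfer-learning sketch: frozen ImageNet-pretrained VGG16 base, small trainable
# classifier head. Class count, head layout and hyperparameters are placeholders.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 10                                  # placeholder for the number of diatom taxa

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False                            # train only the classifier on top

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=20)
```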
Data augmentation is an established technique in computer vision to foster generalization during training and to deal with low data volume. Most data augmentation and computer vision research focuses on everyday images such as traffic scenes. In the past, applying computer vision techniques in domains like the marine sciences has proven less straightforward due to special characteristics such as very low data volume and class imbalance, caused by costly manual annotation by human domain experts and generally low species abundances. However, the volume of image data acquired today by moving platforms collecting large image collections from remote marine habitats, like the deep benthos, for marine biodiversity assessment and monitoring makes automatic detection and classification with computer vision inevitable. In this work, we investigate the effect of data augmentation in the context of taxonomic classification in underwater, i.e., benthic, images. First, we show that established data augmentation methods (i.e., geometric and photometric transformations) perform differently on marine image collections than on established image collections like the Cityscapes dataset, which shows everyday traffic images. Some of the methods even decrease the learning performance when applied to marine image collections. Second, we propose new data augmentation combination policies motivated by our observations, compare their effect to the policies proposed by the AutoAugment algorithm, and show that the proposed augmentation policy outperforms the AutoAugment results for marine image collections. We conclude that in the case of small marine image datasets, background knowledge and heuristics should sometimes be applied to design an effective data augmentation method.
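As an illustration, the sketch below shows how a hand-designed augmentation policy combining geometric and photometric transformations could be expressed with torchvision; the concrete operations, magnitudes and probabilities are assumptions and not the specific policy proposed in the paper.

```python
# Illustrative augmentation policy for benthic images: geometric transforms are
# always applied, photometric transforms only with a given probability. All
# operations and magnitudes are placeholder choices, not the proposed policy.
from torchvision import transforms

geometric = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),          # object orientation on the seafloor is arbitrary
    transforms.RandomRotation(degrees=30),
])

photometric = transforms.Compose([
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.1),
])

train_transform = transforms.Compose([
    geometric,
    transforms.RandomApply([photometric], p=0.5),  # photometric ops only on half of the samples
    transforms.ToTensor(),
])
# dataset = torchvision.datasets.ImageFolder("benthic_images/train", transform=train_transform)
```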
We present results of our machine learning approach to the problem of classifying GC-MS data originating from wheat grains of different farming systems. The aim is to investigate the potential of learning algorithms to classify GC-MS data as coming either from conventionally grown or from organically grown samples, taking different cultivars into account. Our work is motivated by the increased demand for organic food in post-industrialized societies and the necessity to prove organic food authenticity. Our dataset comprises up to eleven wheat cultivars that were cultivated in both farming systems, organic and conventional, over three years. More than 300 GC-MS measurements were recorded and subsequently processed and analyzed in the MeltDB 2.0 metabolomics analysis platform, which is briefly outlined in this paper. We further describe how unsupervised (t-SNE, PCA) and supervised (RF, SVM) methods can be applied for sample visualization and classification. Our results clearly show that the year has the strongest and the wheat cultivar the second-strongest influence on the metabolic composition of a sample. We also show that, for a given year and cultivar, organic and conventional cultivation can be distinguished by machine learning algorithms.
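For illustration, a scikit-learn sketch of the unsupervised visualization and supervised classification steps named here; the preprocessing, parameters and cross-validation scheme are assumptions rather than the exact MeltDB 2.0 workflow, and X_raw / y stand in for the exported GC-MS feature matrix and farming-system labels.

```python
# Sketch: PCA / t-SNE for sample visualization, RF / SVM for classifying organic
# vs. conventional samples. Scaling, parameters and CV scheme are placeholders.
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def analyze(X_raw, y):
    """X_raw: (n_samples, n_metabolites) peak-intensity matrix exported from the
    metabolomics platform; y: farming-system labels (organic / conventional)."""
    X = StandardScaler().fit_transform(X_raw)

    # Unsupervised visualization of the sample structure (year / cultivar effects)
    coords_pca = PCA(n_components=2).fit_transform(X)
    coords_tsne = TSNE(n_components=2, perplexity=30).fit_transform(X)

    # Supervised classification: organic vs. conventional
    for name, clf in [("RF", RandomForestClassifier(n_estimators=500)),
                      ("SVM", SVC(kernel="rbf", C=1.0))]:
        scores = cross_val_score(clf, X, y, cv=5)
        print(f"{name}: mean accuracy {scores.mean():.2f} +/- {scores.std():.2f}")

    return coords_pca, coords_tsne
```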