Special Section on Large Scale Algorithms for Learning and Optimization

In recent years, it has become easy to collect massively large datasets, such as web content on the Internet and genome data in bioinformatics, and the role of data mining, which extracts relevant information from such huge datasets, is becoming increasingly important. In the field of machine learning, various learning and optimization techniques, such as support vector machines, boosting, and particle filters, have been developed. However, these algorithms often cannot be applied directly in practical applications, because they involve a large number of degrees of freedom, which yields unacceptable computational cost on large-scale data. Recently, many algorithms that approximate state-of-the-art learning algorithms at reasonable computational cost have been developed on a case-by-case basis. The IEICE (Institute of Electronics, Information and Communication Engineers) Transactions on Information and Systems therefore organized this Special Section on Large Scale Algorithms for Learning and Optimization to establish the theoretical foundations of such algorithms, to develop new algorithms, and to open up new applications by sharing problems from the viewpoint of large-scale algorithms. We received 19 papers in response to the call for papers for this special section. Out of these papers, 2 invited papers on kernel methods and particle swarm optimization, 4 regular papers, and 1 letter were selected through the careful and impartial review of the editorial committee.
As a human-like foveation system, we present an autonomous foveating system based on the Pulse-Coupled Neural Network (PCNN), which is expected to be useful for image processing. The system spontaneously selects foveation points from the edges and/or optical flows of the input images through the PCNN, without any training. The foveation point is defined as the point with the maximum output from the PCNN. Since the output of the original PCNN neuron takes a binary value, the PCNN would select many candidates for the foveation point. To avoid this ambiguity, we adopt a sigmoidal pulse generator, which reduces the candidates to a few or a single one. Experiments are also presented to show the effectiveness of the autonomous foveating system based on the PCNN.
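The foveation-point selection described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses a simplified single-layer PCNN with standard feeding/linking/threshold dynamics, replaces the binary step output with a sigmoid, and picks the pixel whose output is maximal over the run; all parameter values are illustrative assumptions.

```python
import numpy as np

def sigmoid(x, gain=10.0):
    """Sigmoidal pulse generator replacing the binary step output."""
    return 1.0 / (1.0 + np.exp(-gain * x))

def foveation_point(stimulus, steps=10, beta=0.2, a_f=0.5, a_l=0.5,
                    a_t=0.2, v_f=0.1, v_l=0.2, v_t=5.0):
    """Run a simplified PCNN on a 2-D stimulus (e.g. an edge map) and
    return the pixel whose output is maximal over the run."""
    h, w = stimulus.shape
    F = np.zeros((h, w))   # feeding input
    L = np.zeros((h, w))   # linking input
    Y = np.zeros((h, w))   # neuron output
    T = np.ones((h, w))    # dynamic threshold
    y_max = np.zeros((h, w))
    for _ in range(steps):
        # 3x3 neighbourhood sum of the previous outputs (excluding self)
        pad = np.pad(Y, 1)
        link = sum(pad[i:i + h, j:j + w]
                   for i in range(3) for j in range(3)) - Y
        F = np.exp(-a_f) * F + v_f * link + stimulus
        L = np.exp(-a_l) * L + v_l * link
        U = F * (1.0 + beta * L)          # internal activity
        Y = sigmoid(U - T)                # graded pulse output
        T = np.exp(-a_t) * T + v_t * Y    # refractory threshold rise
        y_max = np.maximum(y_max, Y)
    return np.unravel_index(np.argmax(y_max), y_max.shape)
```

With a single bright stimulus pixel, the strongly stimulated neuron pulses first and hardest, so the foveation point lands on it without any training.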
There are two major approaches to content-based image retrieval using local image descriptors. One is descriptor-by-descriptor matching; the other is based on comparison of global image representations, each of which describes the set of local descriptors of an image. In large-scale problems, the latter is preferred for its smaller memory requirements; however, it tends to be inferior to the former in retrieval accuracy. To achieve both low memory cost and high accuracy, we investigate an asymmetric approach in which the probability distribution of local descriptors is modeled for each individual database image, while the local descriptors of a query are used as is. We adopt a mixture model of probabilistic principal component analysis. The model parameters constitute a global image representation to be stored in the database. The likelihood function is then employed to compute a matching score between each database image and a query. We also propose an algorithm to encode our image representation into more compact codes. Experimental results demonstrate that our method can represent each database image in less than several hundred bytes while achieving higher retrieval accuracy than the state-of-the-art method based on Fisher vectors.
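The asymmetric matching idea can be sketched in a few lines. As an illustrative simplification, a single probabilistic PCA model (Tipping-Bishop closed form) stands in for the paper's per-image *mixture* of PPCA models, and the matching score is the mean log-likelihood of the raw query descriptors under a database image's model; the function names and the latent dimension `q` are assumptions, not the paper's API.

```python
import numpy as np

def fit_ppca(X, q=2):
    """Fit one probabilistic PCA model (closed-form ML solution) to the
    local descriptors X of a database image. The compact parameters
    (mu, W, sigma2) play the role of the stored global representation."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    evals, evecs = np.linalg.eigh(np.cov(X - mu, rowvar=False))
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    sigma2 = evals[q:].mean()                           # ML noise variance
    W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    C = W @ W.T + sigma2 * np.eye(d)                    # model covariance
    return mu, C

def matching_score(model, Q):
    """Mean log-likelihood of the raw query descriptors Q (used as is,
    asymmetrically) under a database image's model."""
    mu, C = model
    d = mu.size
    diff = Q - mu
    Cinv = np.linalg.inv(C)
    _, logdet = np.linalg.slogdet(C)
    quad = np.einsum('ij,jk,ik->i', diff, Cinv, diff)
    return np.mean(-0.5 * (d * np.log(2 * np.pi) + logdet + quad))
```

Database images would then be ranked by this score for a given query; only the small model parameters, not the descriptors themselves, are kept in the database.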
In recent years, deep neural networks (DNNs) have made a significant impact on a variety of research fields and applications. One drawback of DNNs is that they require huge datasets for training. Since it is very expensive to have experts label the data, many non-expert data collection methods, such as web crawling, have been proposed. However, datasets created by non-experts often contain corrupted labels, and DNNs trained on such datasets are unreliable. Because DNNs have an enormous number of parameters, they tend to overfit to noisy labels, resulting in poor generalization performance. This problem is called Learning with Noisy Labels (LNL). Recent studies have shown that DNNs are robust to noisy labels in the early stage of learning, before over-fitting to them, because DNNs learn simple patterns first. Therefore DNNs tend to output the true labels for noisily labeled samples in the early stage of learning, and the number of false predictions (disagreements between the model's prediction and the given label) is higher for noisily labeled samples than for cleanly labeled ones. Based on these observations, we propose a new sample selection approach for LNL using the number of false predictions. Our method periodically collects the records of false predictions during training, and selects samples with a low number of false predictions from the recent records. It then iteratively alternates between sample selection and training a DNN model on the updated dataset. Since the model is trained with more clean samples and records more accurate false predictions for sample selection, its generalization performance gradually increases. We evaluated our method on two benchmark datasets, CIFAR-10 and CIFAR-100, with synthetically generated noisy labels, and the obtained results are better than or comparable to state-of-the-art approaches.
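The selection step described above can be sketched as follows, under an assumed data layout: the false-prediction records are a 0/1 matrix over recent epochs, and the samples with the fewest false predictions are kept as likely-clean. The function name and keep ratio are illustrative, and the paper's exact windowing over recent records may differ.

```python
import numpy as np

def select_clean_samples(false_pred_records, keep_ratio=0.6):
    """Select likely-clean samples for the next training round.

    false_pred_records: (n_epochs, n_samples) 0/1 array, where 1 marks an
    epoch in which the model's prediction disagreed with the given label.
    Returns the indices of the samples with the fewest false predictions.
    """
    counts = false_pred_records.sum(axis=0)   # false predictions per sample
    n_keep = int(len(counts) * keep_ratio)
    # stable sort keeps deterministic order among tied counts
    return np.argsort(counts, kind='stable')[:n_keep]
```

In the full method this selection alternates with training: the DNN is trained on the kept subset, new false-prediction records are collected, and the subset is re-selected, so the records become more accurate as training proceeds.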