Global land cover (GLC) products can be utilized to provide geographical supervision for remote sensing representation learning, which significantly improves downstream task performance and reduces the demand for manual annotations. However, the time differences between remote sensing images and GLC products may introduce deviations in the geographical supervision. In this paper, we propose a Geographical supervision Correction method (GeCo) for remote sensing representation learning. Deviated geographical supervision generated by GLC products is corrected adaptively using a correction matrix during network pre-training, and a joint optimization process is designed to simultaneously update the correction matrix and the network parameters. Additionally, we identify prior knowledge on the geographical supervision to guide representation learning and to constrain the correction process. The prior knowledge named "minor changes" implies that the geographical supervision should not change significantly, whereas the prior knowledge named "spatial aggregation" implies that land covers are aggregated in their spatial distribution. Based on this prior knowledge, corresponding regularization terms are proposed to prevent abrupt changes during the correction of geographical supervision and excessive smoothing of network outputs, thereby ensuring the correctness of the adaptive correction process. Experimental results demonstrate that the proposed method outperforms random initialization, ImageNet pre-training, and other representation learning methods on a variety of downstream tasks. In particular, compared with the method that learns representations directly from deviated geographical supervision, our method eliminates the influence of the deviations and further improves the effect of representation learning.
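To make the idea of jointly optimizing network parameters and a label-correction term concrete, the following is a minimal sketch of one possible form of such an objective. The specific regularizers, weights, and tensor layout are assumptions for illustration, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def geco_style_loss(pred_logits, corr_logits, glc_onehot, lam_minor=1.0, lam_spatial=0.1):
    """Hypothetical joint objective in the spirit of GeCo (not the paper's exact loss).

    pred_logits : (B, C, H, W) network outputs
    corr_logits : (B, C, H, W) learnable correction logits, initialized from GLC labels
    glc_onehot  : (B, C, H, W) one-hot geographical supervision from the GLC product
    """
    corrected = F.softmax(corr_logits, dim=1)            # adaptively corrected supervision

    # Supervision term: fit the network to the corrected labels.
    sup = -(corrected * F.log_softmax(pred_logits, dim=1)).sum(1).mean()

    # "Minor changes": the corrected supervision should stay close to the original GLC labels.
    minor = F.mse_loss(corrected, glc_onehot)

    # "Spatial aggregation": neighbouring pixels tend to share a land-cover class, encoded
    # here as a simple total-variation penalty on the corrected labels (an assumption).
    spatial = (corrected[..., 1:, :] - corrected[..., :-1, :]).abs().mean() + \
              (corrected[..., :, 1:] - corrected[..., :, :-1]).abs().mean()

    return sup + lam_minor * minor + lam_spatial * spatial
```

In such a scheme, both `corr_logits` and the network producing `pred_logits` would be updated by the same optimizer step, which is the joint optimization the abstract describes.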
Cloud detection is one of the important tasks in remote sensing image processing. In this paper, a novel multilevel cloud detection method based on deep learning is proposed for remote sensing images. First, the simple linear iterative clustering (SLIC) method is improved to segment the image into high-quality superpixels. Then, a deep convolutional neural network (CNN) with two branches is designed to extract multiscale features from each superpixel and to predict the superpixel as one of three classes: thick cloud, thin cloud, or noncloud. Finally, the predictions of all superpixels in the image yield the cloud detection result. In the proposed cloud detection framework, the improved SLIC method obtains accurate cloud boundaries by optimizing the initial cluster centers, designing a dynamic distance measure, and expanding the search space. Moreover, unlike traditional cloud detection methods that cannot achieve multilevel detection of cloud, the designed deep CNN model not only detects cloud but also distinguishes thin cloud from thick cloud. Experimental results indicate that the proposed method detects cloud with higher accuracy and robustness than the compared methods.
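A two-branch, multiscale superpixel classifier of this kind could look roughly like the sketch below. The layer sizes, patch scales, and branch design are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TwoBranchCloudNet(nn.Module):
    """Illustrative two-branch CNN for superpixel-level cloud classification
    (thick cloud / thin cloud / noncloud); all hyperparameters are placeholders."""

    def __init__(self, in_ch=3, n_classes=3):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        self.fine = branch()      # small patch around the superpixel (local detail)
        self.coarse = branch()    # larger patch around the superpixel (context)
        self.head = nn.Linear(128, n_classes)

    def forward(self, patch_fine, patch_coarse):
        # Concatenate the two scales' features and classify the superpixel.
        feats = torch.cat([self.fine(patch_fine), self.coarse(patch_coarse)], dim=1)
        return self.head(feats)   # logits over {thick cloud, thin cloud, noncloud}
```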
Haze assessment can filter out images with dense haze and thus improve the reliability of remote-sensing image interpretation. In this letter, a novel no-reference haze assessment method based on haze distribution is proposed for remote-sensing images. First, the range channel of an image is defined and the haze distribution map (HDM) is extracted from the hazy image. Then, the HDM-based haze assessment (HDMHA) metric is designed according to the HDM. Finally, the degree of haze in a remote-sensing image is predicted using the proposed metric. To objectively verify the effectiveness of the proposed HDMHA metric, a method for simulating hazy remote-sensing images based on the haze imaging model is also proposed in this letter, and the simulated hazy images are visually very similar to real ones. A series of experiments on both real and simulated images show that the proposed metric achieves good consistency with subjective experiments and outperforms typical blind image quality assessment methods.
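The haze imaging model referred to here is the standard atmospheric scattering model I = J·t + A·(1 − t). Below is a minimal sketch of haze simulation under that model; the constant transmission and airlight values are placeholders, not the letter's settings, and the letter may vary them per pixel.

```python
import numpy as np

def simulate_haze(clear_img, transmission=0.6, airlight=0.9):
    """Hedged sketch of hazy-image simulation with the haze imaging model
    I = J * t + A * (1 - t).

    clear_img : float array in [0, 1], shape (H, W, 3) -- the haze-free image J
    """
    t = np.full(clear_img.shape[:2] + (1,), transmission, dtype=np.float32)
    hazy = clear_img * t + airlight * (1.0 - t)
    return np.clip(hazy, 0.0, 1.0)
```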
This paper addresses the problem of remote sensing image classification based on semantic context using a Discriminative Random Field (DRF) model. The DRF model captures the highly complicated spatial interactions and contextual information in remote sensing images: it labels different image regions using neighborhood spatial interactions of the labels as well as the observed data. Based on the DRF model, a graph-based inference algorithm, Belief Propagation (BP), is employed to obtain the optimal classification result. This inference algorithm is effective in the sense that, in practice, it produces more accurate results than other traditional inference algorithms.
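For readers unfamiliar with BP on image grids, the following is a generic sketch of loopy min-sum belief propagation over a 4-connected lattice. The unary and pairwise costs here are placeholders, not the paper's DRF potentials, and boundaries wrap around for brevity.

```python
import numpy as np

def loopy_min_sum_bp(unary, pairwise, n_iters=10):
    """Generic loopy min-sum belief propagation on a 4-connected grid.

    unary    : (H, W, K) per-pixel label costs from the observed data
    pairwise : (K, K)    label-compatibility costs shared by all edges
    """
    H, W, K = unary.shape
    offs = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # offset of the sending neighbour
    opp = [1, 0, 3, 2]                          # index of the opposite direction
    msgs = np.zeros((4, H, W, K))

    for _ in range(n_iters):
        new = np.empty_like(msgs)
        for d, (dy, dx) in enumerate(offs):
            # The sender aggregates everything except the message coming back from the receiver.
            sender = unary + msgs.sum(0) - msgs[opp[d]]
            m = (sender[..., :, None] + pairwise).min(axis=2)     # minimise over sender labels
            m -= m.min(axis=2, keepdims=True)                     # normalise for stability
            new[d] = np.roll(m, shift=(-dy, -dx), axis=(0, 1))    # deliver to the receiver
        msgs = new

    beliefs = unary + msgs.sum(0)
    return beliefs.argmin(axis=2)   # approximate MAP labelling
```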
The recent advancement of generative foundation models has ushered in a new era of image generation in the realm of natural images, revolutionizing art design, entertainment, environment simulation, and beyond. Despite producing high-quality samples, existing methods are constrained to generating images of scenes at a limited scale. In this paper, we present MetaEarth, a generative foundation model that breaks this barrier by scaling image generation to a global level, exploring the creation of worldwide, multi-resolution, unbounded, and virtually limitless remote sensing images. In MetaEarth, we propose a resolution-guided self-cascading generative framework, which enables the generation of images of any region at a wide range of geographical resolutions. To achieve unbounded and arbitrary-sized image generation, we design a novel noise sampling strategy for denoising diffusion models by analyzing the generation conditions and the initial noise. To train MetaEarth, we construct a large dataset comprising multi-resolution optical remote sensing images with geographical information. Experiments demonstrate the powerful capability of our method in generating global-scale images. Additionally, MetaEarth serves as a data engine that can provide high-quality and rich training data for downstream tasks. Our model opens up new possibilities for constructing generative world models by simulating Earth visuals from an innovative overhead perspective.
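One common way to keep adjacent diffusion-generated tiles consistent is to crop each tile's initial noise from a single global noise field so that overlapping regions share noise. The sketch below illustrates only that generic trick; it is an assumption for exposition and not MetaEarth's actual noise sampling strategy.

```python
import torch

def tiled_initial_noise(canvas_h, canvas_w, tile=64, overlap=16, channels=4, seed=0):
    """Illustrative-only: crop overlapping tiles of initial noise from one global
    noise field, so neighbouring tiles agree on their shared boundary noise.
    (Edge tiles may be smaller; padding is omitted for brevity.)"""
    g = torch.Generator().manual_seed(seed)
    global_noise = torch.randn(channels, canvas_h, canvas_w, generator=g)

    stride = tile - overlap
    tiles = []
    for y in range(0, max(canvas_h - overlap, 1), stride):
        for x in range(0, max(canvas_w - overlap, 1), stride):
            crop = global_noise[:, y:y + tile, x:x + tile]
            tiles.append(((y, x), crop))   # initial noise for this tile's diffusion sampler
    return tiles
```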
Remote sensing image scene classification aims to automatically assign semantic labels to remote sensing images. Recently, to overcome the distribution discrepancy between training data and test data, domain adaptation has been applied to remote sensing image scene classification. Most domain adaptation approaches explore transferability under the assumption that the source domain and the target domain share the same classes. However, in real applications, new categories may appear in the target domain. Besides, considering only transferability degrades classification performance because of the strong interclass similarity of remote sensing images. In this article, we present an open set domain adaptation algorithm via exploring transferability and discriminability (OSDA-ETD) for remote sensing image scene classification. Specifically, we propose a transferability strategy that targets the high interdomain variation and high intraclass diversity of remote sensing images; its purpose is to reduce the global distribution difference between domains and the local distribution discrepancy of the same classes across domains. For the high interclass similarity of remote sensing images, we adopt a discriminability strategy, which enlarges the distribution discrepancy between different classes across domains. To further promote the effectiveness of scene classification, we integrate transferability and discriminability into a unified framework. Moreover, we prove that the algorithm has a unique optimizer.
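As a rough illustration of the two ingredients, a transferability term often reduces a domain discrepancy such as MMD between source and target features, while a discriminability term pushes features of different classes apart. The functions below are generic stand-ins for those two roles; they are assumptions, not OSDA-ETD's actual formulation.

```python
import torch

def rbf_mmd(src_feat, tgt_feat, sigma=1.0):
    """Generic RBF-kernel MMD between source and target features -- an illustrative
    stand-in for a transferability term."""
    def k(a, b):
        d = torch.cdist(a, b) ** 2
        return torch.exp(-d / (2 * sigma ** 2))
    return k(src_feat, src_feat).mean() + k(tgt_feat, tgt_feat).mean() \
        - 2 * k(src_feat, tgt_feat).mean()

def centroid_margin(features, labels, margin=1.0):
    """Illustrative discriminability term: push class centroids at least `margin` apart."""
    classes = labels.unique()
    cents = torch.stack([features[labels == c].mean(0) for c in classes])
    dists = torch.cdist(cents, cents)
    off_diag = dists[~torch.eye(len(classes), dtype=torch.bool)]
    return torch.clamp(margin - off_diag, min=0).mean()
```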
Remote sensing image change captioning (RSICC) is a novel task that aims to describe the differences between bi-temporal images in natural language. Previous methods ignore a significant characteristic of the task: the difficulty of RSICC differs between unchanged and changed image pairs. They process unchanged and changed image pairs in a coupled way, which often causes confusion in change captioning. In this paper, we decouple the task into two issues to ease it: whether changes have occurred, and what those changes are. An image-level classifier performs binary classification to address the first issue. A feature-level encoder extracts discriminative features to help the caption generation module address the second issue. Besides, for caption generation, we utilize prompt learning to introduce pre-trained large language models (LLMs) into the RSICC task. A multi-prompt learning strategy is proposed to generate a set of unified prompts and a class-specific prompt conditioned on the image-level classifier's results. The strategy prompts a pre-trained LLM to know whether changes exist and to generate captions. Finally, the multiple prompts and the visual features from the feature-level encoder are fed into a frozen LLM for language generation. Compared with previous methods, ours leverages the powerful language abilities of the pre-trained LLM to generate plausible captions without training the LLM. Extensive experiments show that our method is effective and achieves state-of-the-art performance. Besides, an additional experiment demonstrates that our decoupling paradigm is more promising than the previous coupled paradigm for the RSICC task. We will make our codebase publicly available to facilitate future research at https://github.com/Chen-Yang-Liu/PromptCC.
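The prompt-assembly step might be sketched as below: a set of shared learnable prompts, one class-specific prompt selected by the binary change classifier, and the visual tokens, all concatenated into a prefix for a frozen LLM. Dimensions, prompt counts, and module names here are assumptions for illustration, not the released PromptCC code.

```python
import torch
import torch.nn as nn

class MultiPromptAssembler(nn.Module):
    """Hedged sketch of a multi-prompt scheme: shared learnable prompts plus one
    class-specific prompt chosen by the change/no-change decision, concatenated
    with visual tokens as a prefix for a frozen LLM."""

    def __init__(self, d_model=768, n_unified=8):
        super().__init__()
        self.unified = nn.Parameter(torch.randn(n_unified, d_model))     # shared prompts
        self.class_prompts = nn.Parameter(torch.randn(2, d_model))       # "unchanged"/"changed"

    def forward(self, visual_tokens, change_logit):
        # visual_tokens: (B, N, D) from the feature-level encoder; change_logit: (B,)
        b = visual_tokens.size(0)
        cls_idx = (change_logit > 0).long()                              # binary decision
        cls_prompt = self.class_prompts[cls_idx].unsqueeze(1)            # (B, 1, D)
        unified = self.unified.unsqueeze(0).expand(b, -1, -1)            # (B, n_unified, D)
        return torch.cat([unified, cls_prompt, visual_tokens], dim=1)    # prefix for the frozen LLM
```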