In this paper, we propose a regularization technique for noisy-image super-resolution and image denoising.Total variation (TV) regularization is adopted in many image processing applications to preserve the local smoothness.However, TV prior is prone to oversmoothness, staircasing effect, and contrast losses.Nonlocal TV (NLTV) mitigates the contrast losses by adaptively weighting the smoothness based on the similarity measure of image patches.Although it suppresses the noise effectively in the flat regions, it might leave residual noise surrounding the edges especially when the image is not oversmoothed.To address this problem, we propose the bilateral spectrum weighted total variation (BSWTV).Specially, we apply a locally adaptive shrink coefficient to the image gradients and employ the eigenvalues of the covariance matrix of the weighted image gradients to effectively refine the weighting map and suppress the residual noise.In conjunction with the data fidelity term derived from a mixed Poisson-Gaussian noise model, the objective function is decomposed and solved by the alternating direction method of multipliers (ADMM) algorithm.In order to remove outliers and facilitate the convergence stability, the weighting map is smoothed by a Gaussian filter with an iteratively decreased kernel width and updated in a momentum-based manner in each ADMM iteration.We benchmark our method with the state-of-the-art approaches on the public real-world datasets for super-resolution and image denoising.Experiments show that the proposed method obtains outstanding performance for superresolution and achieves promising results for denoising on realworld images.
Synthesis of unavailable imaging modalities from available ones can generate modality-specific complementary information and enable multi-modality based medical images diagnosis or treatment. Existing generative methods for medical image synthesis are usually based on cross-modal translation between acquired and missing modalities. These methods are usually dedicated to specific missing modality and perform synthesis in one shot, which cannot deal with varying number of missing modalities flexibly and construct the mapping across modalities effectively. To address the above issues, in this paper, we propose a unified Multi-modal Modality-masked Diffusion Network (M2DN), tackling multi-modal synthesis from the perspective of "progressive whole-modality inpainting", instead of "cross-modal translation". Specifically, our M2DN considers the missing modalities as random noise and takes all the modalities as a unity in each reverse diffusion step. The proposed joint synthesis scheme performs synthesis for the missing modalities and self-reconstruction for the available ones, which not only enables synthesis for arbitrary missing scenarios, but also facilitates the construction of common latent space and enhances the model representation ability. Besides, we introduce a modality-mask scheme to encode availability status of each incoming modality explicitly in a binary mask, which is adopted as condition for the diffusion model to further enhance the synthesis performance of our M2DN for arbitrary missing scenarios. We carry out experiments on two public brain MRI datasets for synthesis and downstream segmentation tasks. Experimental results demonstrate that our M2DN outperforms the state-of-the-art models significantly and shows great generalizability for arbitrary missing modalities.
The current clinical diagnosis framework of Alzheimer's disease (AD) involves multiple modalities acquired from multiple diagnosis stages, each with distinct usage and cost. Previous AD diagnosis research has predominantly focused on how to directly fuse multiple modalities for an end-to-end one-stage diagnosis, which practically requires a high cost in data acquisition. Moreover, a significant part of these methods diagnose AD without considering clinical guideline and cannot offer accurate sub-type diagnosis. In this paper, by exploring inter-correlation among multiple modalities, we propose a novel progressive AD sub-type diagnosis framework, aiming to give diagnosis results based on easier-to-access modalities in earlier low-cost stages, instead of modalities from all stages. Specifically, first, we design 1) a text disentanglement network for better processing tabular data collected in the initial stage, and 2) a modality fusion module for fusing multi-modality features separately. Second, we align features from modalities acquired in earlier low-cost stage(s) with later high-cost stage(s) to give accurate diagnosis without actual modality acquisition in later-stage(s) for saving cost. Furthermore, we follow the clinical guideline to align features at each stage for achieving sub-type diagnosis. Third, we leverage a progressive classifier that can progressively include additional acquired modalities (if needed) for diagnosis, to achieve the balance between diagnosis cost and diagnosis performance. We evaluate our proposed framework on large diverse public and in-home datasets (8280 in total) and achieve superior performance over state-of-the-art methods. Our codes will be released after the acceptance.
As a sensitive functional imaging technique, positron emission tomography (PET) plays a critical role in early disease diagnosis. However, obtaining a high-quality PET image requires injecting a sufficient dose (standard dose) of radionuclides into the body, which inevitably poses radiation hazards to patients. To mitigate radiation hazards, the reconstruction of standard-dose PET (SPET) from low-dose PET (LPET) is desired. According to imaging theory, PET reconstruction process involves multiple domains (e.g., projection domain and image domain), and a significant portion of the difference between SPET and LPET arises from variations in the noise levels introduced during the sampling of raw data as sinograms. In light of these two facts, we propose an end-to-end TriPle-domain LPET EnhancemenT (TriPLET) framework, by leveraging the advantages of a hybrid denoising-and-reconstruction process and a triple-domain representation (i.e., sinograms, frequency spectrum maps, and images) to reconstruct SPET images from LPET sinograms. Specifically, TriPLET consists of three sequentially coupled components including 1) a Transformer-assisted denoising network that denoises the inputted LPET sinograms in the projection domain, 2) a discrete-wavelet-transform-based reconstruction network that further reconstructs SPET from LPET in the wavelet domain, and 3) a pair-based adversarial network that evaluates the reconstructed SPET images in the image domain. Extensive experiments on the real PET dataset demonstrate that our proposed TriPLET can reconstruct SPET images with the highest similarity and signal-to-noise ratio to real data, compared with state-of-the-art methods.
In this paper, we propose a regularization technique for real-world
super-resolution and image denoising. Total variation (TV) regularization is
adopted in many image processing applications to preserve the local smoothness.
However, TV prior is prone to oversmoothness, staircasing effect, and contrast
losses. Nonlocal TV (NLTV) mitigates the contrast losses by adaptively
weighting the smoothness based on the similarity measure of image patches.
Although it suppresses the noise effectively in the flat regions, it might
leave residual noise surrounding the edges especially when the image is not
oversmoothed. To address this problem, we propose the bilateral spectrum
weighted total variation (BSWTV). Specially, we apply a locally adaptive shrink
coefficient to the image gradients and employ the eigenvalues of the covariance
matrix of the weighted image gradients to effectively refine the weighting map
and suppress the residual noise. In conjunction with the data fidelity term
derived from a mixed Poisson-Gaussian noise model, the objective function is
decomposed and solved by the alternating direction method of multipliers (ADMM)
algorithm. In order to remove outliers and facilitate the convergence
stability, the weighting map is smoothed by a Gaussian filter with an
iteratively decreased kernel width and updated in a momentum-based manner in
each ADMM iteration. We benchmark our method with the state-of-the-art
approaches on the public real-world datasets for super-resolution and image
denoising. Experiments show that the proposed method obtains outstanding
performance for super-resolution and achieves promising results for denoising
on real-world images.
As a sensitive functional imaging technique, positron emission tomography (PET) plays a critical role in early disease diagnosis. However, obtaining a high-quality PET image requires injecting a sufficient dose (standard dose) of radionuclides into the body, which inevitably poses radiation hazards to patients. To mitigate radiation hazards, the reconstruction of standard-dose PET (SPET) from low-dose PET (LPET) is desired. According to imaging theory, PET reconstruction process involves multiple domains (e.g., projection domain and image domain), and a significant portion of the difference between SPET and LPET arises from variations in the noise levels introduced during the sampling of raw data as sinograms. In light of these two facts, we propose an end-to-end TriPle-domain LPET EnhancemenT (TriPLET) framework, by leveraging the advantages of a hybrid denoising-and-reconstruction process and a triple-domain representation (i.e., sinograms, frequency spectrum maps, and images) to reconstruct SPET images from LPET sinograms. Specifically, TriPLET consists of three sequentially coupled components including 1) a Transformer-assisted denoising network that denoises the inputted LPET sinograms in the projection domain, 2) a discrete-wavelet-transform-based reconstruction network that further reconstructs SPET from LPET in the wavelet domain, and 3) a pair-based adversarial network that evaluates the reconstructed SPET images in the image domain. Extensive experiments on the real PET dataset demonstrate that our proposed TriPLET can reconstruct SPET images with the highest similarity and signal-to-noise ratio to real data, compared with state-of-the-art methods.