In this paper, a singing voice splitting (SVS) system is studied and implemented using the short-time Fourier transform (STFT) and the nonuniform discrete Fourier transform (NDFT). The system consists of four stages: time-frequency (T-F) decomposition, main pitch detection, T-F information extraction, and singing synthesis. First, the STFT is chosen as the T-F analysis tool. Then, the NDFT is used to detect the main pitch. Next, the harmonics of each order are extracted from every frame of the music signal, yielding the short-time spectrum. Finally, the singing signal is reconstructed from the short-time spectrum with the overlap-add (OLA) algorithm. Experimental results show that the SVS system based on the STFT and NDFT can effectively split singing signals.
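The four stages can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the frame length, hop size, Hann window, and the helper names `stft`, `harmonic_mask`, and `overlap_add` are all assumptions made for illustration, and the NDFT-based pitch detector is not reproduced (the main pitch `f0` is assumed to be known).

```python
import numpy as np

def stft(x, frame_len=512, hop=128):
    """T-F decomposition: STFT with a periodic Hann analysis window (assumed)."""
    window = np.hanning(frame_len + 1)[:-1]
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1), window

def harmonic_mask(f0, sr, frame_len, half_width=1):
    """T-F extraction: mark the bins around each harmonic k*f0 below Nyquist.

    f0 is assumed to come from a separate pitch detector (the NDFT step in
    the abstract, which is not implemented here).
    """
    n_bins = frame_len // 2 + 1
    mask = np.zeros(n_bins, dtype=bool)
    if f0 <= 0:
        return mask
    k = 1
    while k * f0 < sr / 2:
        centre = int(round(k * f0 * frame_len / sr))
        lo = max(centre - half_width, 0)
        hi = min(centre + half_width + 1, n_bins)
        mask[lo:hi] = True
        k += 1
    return mask

def overlap_add(spec, window, hop=128):
    """Synthesis: weighted overlap-add, normalised by the summed squared window."""
    frame_len = len(window)
    n_frames = spec.shape[0]
    frames = np.fft.irfft(spec, n=frame_len, axis=1) * window
    out = np.zeros((n_frames - 1) * hop + frame_len)
    norm = np.zeros_like(out)
    for i in range(n_frames):
        out[i * hop : i * hop + frame_len] += frames[i]
        norm[i * hop : i * hop + frame_len] += window ** 2
    # The first and last few samples are covered by almost no window energy,
    # so only the interior of the signal is reconstructed reliably.
    return out / np.maximum(norm, 1e-8)
```

In use, one would compute `spec, window = stft(mixture)`, multiply each frame of `spec` by the mask for that frame's pitch, and pass the masked spectrum to `overlap_add`. A real system would need a per-frame pitch track rather than the single `f0` assumed here.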
Sora's high-motion intensity and long consistent videos have significantly impacted the field of video generation, attracting unprecedented attention. However, existing publicly available datasets are inadequate for generating Sora-like videos, as they mainly contain short videos with low motion intensity and brief captions. To address these issues, we propose MiraData, a high-quality video dataset that surpasses previous ones in video duration, caption detail, motion strength, and visual quality. We curate MiraData from diverse, manually selected sources and meticulously process the data to obtain semantically consistent clips. GPT-4V is employed to annotate structured captions, providing detailed descriptions from four different perspectives along with a summarized dense caption. To better assess temporal consistency and motion intensity in video generation, we introduce MiraBench, which enhances existing benchmarks by adding 3D consistency and tracking-based motion strength metrics. MiraBench includes 150 evaluation prompts and 17 metrics covering temporal consistency, motion strength, 3D consistency, visual quality, text-video alignment, and distribution similarity. To demonstrate the utility and effectiveness of MiraData, we conduct experiments using our DiT-based video generation model, MiraDiT. The experimental results on MiraBench demonstrate the superiority of MiraData, especially in motion strength.
We propose a new type of regularization functional for images called oscillation total generalized variation (TGV) which can represent structured textures with oscillatory character in a specified direction and scale. The infimal convolution of oscillation TGV with respect to several directions and scales is then used to model images with structured oscillatory texture. Such functionals constitute a regularizer with good texture preservation properties and can flexibly be incorporated into many imaging problems. We give a detailed theoretical analysis of the infimal-convolution-type model with oscillation TGV in function spaces. Furthermore, we consider appropriate discretizations of these functionals and introduce a first-order primal-dual algorithm for solving general variational imaging problems associated with this regularizer. Finally, numerical experiments are presented which show that our proposed models can recover textures well and are competitive in comparison to existing state-of-the-art methods.
Total variation (TV) based models have been widely used for multiplicative denoising problems. However, these models typically suffer from an undesirable artifact known as the staircase effect, which stems from the properties of the BV space. In this paper, we present two high-order variational models based on total generalized variation (TGV) for two kinds of multiplicative noise. The proposed models reduce the staircase effect while preserving edges. We also develop an efficient algorithm, called the prediction-correction proximal alternating direction method of multipliers (PADMM), to solve our models. Moreover, we show the convergence of our algorithm under certain conditions. Numerical experiments demonstrate that our high-order models outperform the classical TV-based models in terms of PSNR and SSIM values.
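A small one-dimensional sketch may help show why TV tolerates staircases. Discrete TV assigns the same cost to a smooth ramp and to a staircase with the same endpoints, whereas a second-order difference penalty separates the two. The second-order penalty below is only a crude stand-in for the higher-order terms in TGV, not TGV itself, and the function names are illustrative assumptions.

```python
import numpy as np

def total_variation_1d(u):
    """Discrete TV: sum of absolute first differences."""
    return np.abs(np.diff(u)).sum()

def second_order_variation_1d(u):
    """Sum of absolute second differences (a simple higher-order penalty)."""
    return np.abs(np.diff(u, n=2)).sum()

# A linear ramp from 0 to 1 and a four-step staircase with the same endpoints.
ramp = np.linspace(0.0, 1.0, 65)
stairs = np.floor(ramp * 4) / 4

print(total_variation_1d(ramp), total_variation_1d(stairs))
print(second_order_variation_1d(ramp), second_order_variation_1d(stairs))
```

Because TV cannot distinguish the ramp from the staircase, a TV-regularized solution has no incentive to prefer the smooth ramp. The second-order term vanishes on the ramp and is positive on the staircase.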
ℓ1-based sparse regularization plays a central role in compressive sensing and image processing. In this paper, we propose ℓ1 DecNet, an unfolded network derived from a variational decomposition model, which incorporates ℓ1-related sparse regularizations and is solved by a non-standard scaled alternating direction method of multipliers. ℓ1 DecNet effectively separates a spatially sparse feature and a learned spatially dense feature from an input image, and thus facilitates subsequent operations on the spatially sparse feature. Based on this, we develop ℓ1 DecNet+, a learnable architecture framework consisting of our ℓ1 DecNet and a segmentation module that operates on the extracted sparse features instead of the original images. This architecture combines the benefits of mathematical modeling and data-driven approaches. To the best of our knowledge, this is the first study to incorporate a mathematical image prior into feature extraction in segmentation network structures. Moreover, our ℓ1 DecNet+ framework can be easily extended to the 3D case. We evaluate the effectiveness of ℓ1 DecNet+ on two commonly encountered sparse segmentation tasks: retinal vessel segmentation in medical image processing and pavement crack detection in industrial abnormality identification. Experimental results on different datasets demonstrate that our ℓ1 DecNet+ architecture with various lightweight segmentation modules achieves performance equal to or better than that of their respective enlarged versions. This is especially advantageous in practice on resource-limited devices.
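For readers unfamiliar with ℓ1 regularization inside ADMM-type schemes, the sketch below shows the soft-thresholding operator, which is the proximal map of the scaled ℓ1 norm and a typical building block of such iterations. Whether and how ℓ1 DecNet uses this exact step is an assumption; the abstract does not give the update rules, and the function name is illustrative.

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of lam * ||.||_1: shrink every entry towards zero by lam."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(soft_threshold(x, 1.0))  # entries with |x| <= 1 are set to zero
```

Entries smaller in magnitude than the threshold are set exactly to zero, which is how ℓ1 penalties promote spatially sparse features.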
Image restoration is a typical inverse problem, and piecewise constant images have extensive applications in industry and business. Variational models with nonconvex, nonsmooth regularizations can achieve high-quality restorations with neat edges. In particular, a class of truncated potential functions effectively supports contrast-preserving restoration. However, these functions are not subdifferentially regular and thus yield no variational or convergence results for minimization algorithms. In this paper, we present a general smoothing scheme to overcome this nonregularity of the existing truncated regularizers. We also propose globally convergent algorithms to solve the noncoercive variational models with our new smoothly truncated regularizer (STR) functions by introducing a novel proximal term. The limit point of the iterative sequence is shown to be a -stationary point of the original objective function. We then give the implementation details for the inner subproblem by the alternating direction method of multipliers (ADMM). Numerical experiments are carried out to illustrate the good ability of the new regularizer to preserve neat edges and contrasts for piecewise constant images.
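As a rough illustration of why truncation supports contrast preservation, the sketch below compares the plain ℓ1 potential with a truncated ℓ1 potential min(|t|, τ). The truncated ℓ1 form is an assumed representative of the class; the abstract does not specify the potentials, and the smoothing scheme proposed in the paper is not reproduced here.

```python
import numpy as np

def l1_potential(t):
    """Plain l1 potential: the cost grows linearly with the jump height."""
    return np.abs(t)

def truncated_l1_potential(t, tau):
    """Truncated l1 potential: the cost saturates at tau for jumps of height >= tau."""
    return np.minimum(np.abs(t), tau)

jumps = np.array([0.2, 1.0, 5.0, -10.0])
print(l1_potential(jumps))
print(truncated_l1_potential(jumps, tau=1.0))
```

Once a jump exceeds τ, the truncated potential charges the same cost regardless of its height, so the regularizer has no incentive to shrink large edges. The kink at |t| = τ is the additional nonsmooth point that complicates the analysis.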
X-ray computed tomography (CT) has been a useful technology in cancer detection and radiation therapy. However, the high radiation dose of CT scans may increase the risk to healthy organs. Sparse-view X-ray projection is an effective way to reduce the radiation dose. In this paper, we propose a constrained nonconvex truncated regularization model for this low-dose CT reconstruction problem. The model preserves sharp edges very well. Although the model is quite complicated to analyze, we establish two useful theoretical results for its minimizers. Motivated by them, we introduce an iterative support shrinking algorithm. To handle the nondifferentiable points of the regularization function other than zero, we apply a general proximal linearization technique at these points, which facilitates the implementation of our algorithm. For this algorithm, we prove the convergence of the objective function and give a lower bound theory for the iterative sequence. Numerical experiments and comparisons demonstrate that our model with the proposed algorithm performs well for low-dose CT reconstruction.
The convex infimal convolution model proposed in Chambolle and Lions [Numer. Math., 76 (1997), pp. 167--188] is a fundamental model for extracting two useful components from a single input image and has various low-level vision applications. In many of them, one target component has an (approximately) piecewise constant structure and the other is a smoothly varying function or a repeated texture pattern. In this paper, we propose and study a general non-Lipschitz infimal convolution (GnLIC) regularization model, which covers most existing applications of this type. Therein, the non-Lipschitz regularization enforces the piecewise constant property of the first component. For this GnLIC model, we prove a lower bound theory for its local minimizers and a local version for its stationary points. Motivated by these results, we extend previous works to design an inexact iterative support shrinking algorithm with proximal linearization for our GnLIC model (InISSAPL-GnLIC). Moreover, we establish the sequence convergence property and a sequence lower bound theory for InISSAPL-GnLIC, provided that an inexact subgradient condition generated by a subsolver holds. The subsolver is constructed from an efficient ADMM and a specially designed feasibilization operation. Finally, we give numerical experiments in two low-level vision applications: Retinex and cartoon-texture decomposition. These tests demonstrate that our non-Lipschitz regularization based method can indeed extract the piecewise constant component better than existing approaches, which is consistent with the established lower bound theory.
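The infimal convolution operation itself can be illustrated on scalars. The sketch below approximates (f □ g)(x) = inf_y f(x - y) + g(y) by brute force on a grid and checks the classical fact that the infimal convolution of |·| with a scaled quadratic is the Huber function. This is a toy convex example, not the GnLIC model or its algorithm; the function names and grid sizes are assumptions.

```python
import numpy as np

def inf_conv_on_grid(f, g, xs, ys):
    """Approximate (f inf-conv g)(x) = min_y f(x - y) + g(y) over the grid ys."""
    return np.min(f(xs[:, None] - ys[None, :]) + g(ys)[None, :], axis=1)

def huber(x, lam):
    """Huber function: quadratic for |x| <= lam, linear beyond."""
    return np.where(np.abs(x) <= lam, x ** 2 / (2 * lam), np.abs(x) - lam / 2)

lam = 1.0
xs = np.linspace(-3.0, 3.0, 41)
ys = np.linspace(-5.0, 5.0, 20001)
vals = inf_conv_on_grid(np.abs, lambda y: y ** 2 / (2 * lam), xs, ys)
print(np.max(np.abs(vals - huber(xs, lam))))  # small grid-approximation error
```

The minimizing split assigns the small part of x to the quadratic term and the remainder to the absolute value. In the image setting this corresponds to decomposing the input into two components, each charged by its own regularizer.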