In this paper, we introduce a new framework for unsupervised deep homography estimation. Our contributions are threefold. First, unlike previous methods that regress four offsets for a homography, we propose a homography flow representation, which can be estimated as a weighted sum of eight pre-defined homography flow bases. Second, considering that a homography has only 8 degrees of freedom (DOFs), far fewer than the rank of the network features, we propose a Low Rank Representation (LRR) block that reduces the feature rank so that features corresponding to the dominant motions are retained while others are rejected. Last, we propose a Feature Identity Loss (FIL) to enforce that the learned image features are warp-equivariant, meaning that the result should be identical if the order of the warp operation and feature extraction is swapped. With this constraint, the unsupervised optimization proceeds more effectively and more stable features are learned. Extensive experiments demonstrate the effectiveness of all the newly proposed components, and the results show that our approach outperforms the state of the art on homography benchmark datasets both qualitatively and quantitatively. Code is available at https://github.com/megvii-research/BasesHomo.
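As an illustration of the homography flow representation, the minimal PyTorch sketch below combines eight pre-defined flow bases with network-predicted weights; the tensor shapes, the random placeholder bases, and the function name are assumptions for illustration, not the released code.

```python
import torch

def weighted_homography_flow(bases, weights):
    """Combine pre-defined homography flow bases with predicted weights.

    bases:   (8, H, W, 2) pre-computed flow bases, one per homography DOF
    weights: (B, 8) per-image coefficients regressed by the network
    returns: (B, H, W, 2) homography flow fields
    """
    # For each sample b: sum_k weights[b, k] * bases[k]
    return torch.einsum("bk,khwc->bhwc", weights, bases)

# Hypothetical usage: 8 bases on a 64x64 grid, batch of 4 weight vectors.
bases = torch.randn(8, 64, 64, 2)      # in practice derived from the 8 DOFs, not random
weights = torch.randn(4, 8)
flow = weighted_homography_flow(bases, weights)   # (4, 64, 64, 2)
```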
Diffusion models have achieved promising results in image restoration tasks, yet they suffer from time-consuming inference, excessive computational resource consumption, and unstable restoration. To address these issues, we propose a robust and efficient diffusion-based low-light image enhancement approach, dubbed DiffLL. Specifically, we present a wavelet-based conditional diffusion model (WCDM) that leverages the generative power of diffusion models to produce results with satisfactory perceptual fidelity. It also takes advantage of the strengths of the wavelet transform to greatly accelerate inference and reduce computational resource usage without sacrificing information. To avoid chaotic content and undesired diversity, we perform both forward diffusion and denoising in the training phase of WCDM, enabling the model to achieve stable denoising and reduced randomness during inference. Moreover, we design a high-frequency restoration module (HFRM) that utilizes the vertical and horizontal details of the image to complement the diagonal information for better fine-grained restoration. Extensive experiments on publicly available real-world benchmarks demonstrate that our method outperforms existing state-of-the-art methods both quantitatively and visually, and that it achieves remarkable efficiency improvements over previous diffusion-based methods. In addition, we empirically show through an application to low-light face detection that our method has latent practical value. Code is available at https://github.com/JianghaiSCU/Diffusion-Low-Light.
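The sketch below, using PyWavelets, illustrates only the wavelet idea behind WCDM: each DWT level halves the spatial resolution, so running the expensive diffusion model on the low-frequency band alone shrinks its input by a factor of four per level, and the inverse transform recombines the bands afterwards. The wavelet choice, number of levels, and function names are illustrative assumptions; the conditional diffusion model and the HFRM are learned networks not shown here.

```python
import numpy as np
import pywt

def wavelet_split(img, wavelet="haar", levels=2):
    """Split an image into a small low-frequency band plus per-level
    high-frequency detail bands (horizontal, vertical, diagonal)."""
    details = []
    low = img
    for _ in range(levels):
        low, (h, v, d) = pywt.dwt2(low, wavelet)
        details.append((h, v, d))
    return low, details

def wavelet_merge(low, details, wavelet="haar"):
    """Inverse of wavelet_split: recombine the (restored) bands."""
    for h, v, d in reversed(details):
        low = pywt.idwt2((low, (h, v, d)), wavelet)
    return low

# Hypothetical usage on a grayscale image; a 2-level split reduces the
# diffusion input to 1/16 of the original pixel count.
img = np.random.rand(256, 256).astype(np.float32)
low, details = wavelet_split(img)          # low is 64x64
restored = wavelet_merge(low, details)     # back to 256x256
```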
In this paper, we propose a modular wireless sensor platform that consists of sensor modules. Each sensor module is a part of sensor system and in charge of one job in the system, such as computation, communication, output or sensing. Users can stack multiple modules together to build a unique sensor platform. Since users are able to easily replace one module with others, the proposed platform is highly extendable and reusable. Besides, we also design different kinds of mounts, so that sensors can be mounted on objects. This is especially helpful for some moving sensing applications such as attitude monitoring. To demonstrate the proposed platform, we show an alcohol detection application in the paper. The results show that the proposed platform is suitable for academic researches and industrial prototype verification.
Recent deep learning-based optical flow estimators have exhibited impressive performance in generating local flows between consecutive frames. However, estimating long-range flows between distant frames, particularly under complex object deformation and large motion occlusion, remains challenging. One promising solution is to accumulate local flows, explicitly or implicitly, to obtain the desired long-range flow. Nevertheless, accumulation errors and flow misalignment can hinder the effectiveness of this approach. This paper proposes a novel recurrent framework called AccFlow, which recursively accumulates local flows backward using a deformable module called AccPlus. In addition, an adaptive blending module is designed along with AccPlus to alleviate the occlusion effect of backward accumulation and rectify the accumulation error. Notably, we demonstrate the superiority of backward accumulation over conventional forward accumulation, which to the best of our knowledge has not been explicitly established before. To train and evaluate the proposed AccFlow, we construct a large-scale high-quality dataset named CVO, which provides ground-truth optical flow labels between adjacent and distant frames. Extensive experiments validate the effectiveness of AccFlow in handling long-range optical flow estimation. Code is available at https://github.com/mulns/AccFlow.
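As a reference point for what flow accumulation means, the PyTorch sketch below naively composes a sequence of local flows into one long-range flow by warping-based chaining, iterating from the far end of the sequence. It is only an illustrative baseline under simple assumptions: AccFlow replaces the plain addition with its learned AccPlus and adaptive blending modules, and all names here are hypothetical.

```python
import torch
import torch.nn.functional as F

def backward_warp(flow_next, flow_prev):
    """Resample flow_next (flow from frame k+1 to the target) onto frame k's
    pixel grid using flow_prev (flow from frame k to k+1), via bilinear sampling."""
    b, _, h, w = flow_prev.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0).to(flow_prev)  # (1,2,H,W)
    coords = grid + flow_prev                       # sampling positions in frame k+1
    # normalize to [-1, 1] for grid_sample
    norm_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    norm_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    norm = torch.stack((norm_x, norm_y), dim=-1)    # (B,H,W,2)
    return F.grid_sample(flow_next, norm, align_corners=True)

def accumulate(local_flows):
    """Compose local flows [F_{0->1}, F_{1->2}, ..., F_{T-1->T}] into F_{0->T}
    by chaining from the last flow backward toward frame 0."""
    acc = local_flows[-1]
    for f in reversed(local_flows[:-1]):
        acc = f + backward_warp(acc, f)
    return acc
```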
Power consumption has become one of the most important concerns in microprocessor design. However, the potential for further power savings in microprocessors with a conventional architecture is limited because of their unified architectures and mature low-power techniques. This paper proposes an alternative way to save power: embedding a dataflow coprocessor in a conventional RISC processor. The dataflow coprocessor is designed to execute short code segments very efficiently. Preliminary experimental results show that the dataflow coprocessor can increase the power efficiency of a RISC processor by an order of magnitude.
Video coding focuses on reducing the data size of videos, while video stabilization aims to remove shaky camera motion. In this paper, we leverage video coding for video stabilization by constructing the camera motions from the motion vectors produced during coding. Existing stabilization methods rely heavily on image features to recover camera motions, but feature tracking is time-consuming and prone to errors. On the other hand, nearly all captured videos have been compressed before any further processing, and this compression produces a rich set of block-based motion vectors that can be used to estimate the camera motion. More specifically, video stabilization requires camera motions between adjacent frames, whereas motion vectors extracted from video coding may refer to non-adjacent frames. We first show that these non-adjacent motions can be transformed into adjacent motions, such that each coding block within a frame contains a motion vector referring to its adjacent previous frame. Then, we regularize these motion vectors to yield a spatially smoothed motion field at each frame, named CodingFlow, which is optimized for spatially-variant motion compensation. Based on CodingFlow, we finally design a grid-based 2D method to accomplish the video stabilization. Our method is evaluated in terms of efficiency and stabilization quality, both quantitatively and qualitatively, showing that it achieves high-quality results compared with state-of-the-art feature-based methods.
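A much-simplified illustration of turning coding motion vectors into adjacent-frame motions is sketched below, under a linear-motion assumption over the frame gap. The block layout, field names, and the scaling shortcut are assumptions made for illustration only; they do not reproduce the paper's exact transformation or the CodingFlow smoothing step.

```python
import numpy as np

def adjacent_motion_field(blocks, frame_idx):
    """Build a per-block motion field for one frame from coding motion vectors,
    converting vectors that reference non-adjacent frames into per-frame motions
    by dividing by the frame gap (assumes roughly linear motion over the gap).

    blocks: list of dicts with keys bx, by (block grid position),
            mv (dx, dy), and ref (index of the reference frame)
    """
    h = max(b["by"] for b in blocks) + 1
    w = max(b["bx"] for b in blocks) + 1
    field = np.zeros((h, w, 2), dtype=np.float32)
    for b in blocks:
        gap = frame_idx - b["ref"]          # >= 1 for a previous reference frame
        dx, dy = b["mv"]
        field[b["by"], b["bx"]] = (dx / gap, dy / gap)
    return field
```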
In this work, we study the problem of separating the global camera motion and the local dynamic motion in an optical flow. Previous methods either estimate the global motion with a parametric model, such as a homography, or estimate both motions as an optical flow field; however, none of them directly estimates global and local motions in an end-to-end manner. In addition, accurately separating the two motions from a hybrid flow field is challenging, because one motion can easily confound the estimate of the other when they are compounded together. To this end, we propose an end-to-end global and local motion estimation network, GLM-Net. We design two encoder-decoder structures to separate the motions in the optical flow according to their different task orientations: one adopts a mask autoencoder to extract the global motion, while the other uses an attention U-Net for local motion refinement. We further design two effective training methods to overcome the lack of supervision. We apply our method to the action recognition datasets NCAA and UCF-101 to verify the accuracy of the local motion, and to the homography estimation dataset DHE for the accuracy of the global motion. Experimental results show that our method achieves competitive performance on both tasks simultaneously, validating the effectiveness of the motion separation.
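For reference, the decomposition that GLM-Net learns end to end can be written down directly once a homography describing the camera motion is available: the OpenCV-based sketch below subtracts the homography-induced flow from the total flow to expose the local residual. It illustrates the target decomposition only, not the network; the function name and input conventions are assumptions.

```python
import numpy as np
import cv2

def split_global_local(flow, H):
    """Split a dense optical flow into a global (homography-induced) part and
    a residual local part, given an already-estimated 3x3 homography H.

    flow: (h, w, 2) dense flow in pixels; H: 3x3 homography of the camera motion
    """
    h, w = flow.shape[:2]
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    pts = np.stack([xs, ys], axis=-1).reshape(-1, 1, 2).astype(np.float32)
    warped = cv2.perspectiveTransform(pts, H).reshape(h, w, 2)
    global_flow = warped - np.stack([xs, ys], axis=-1)   # displacement induced by H
    local_flow = flow - global_flow                      # residual dynamic motion
    return global_flow, local_flow
```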
In this article, we relate single-frame high dynamic range (HDR) image reconstruction to the following two tasks: 1) highlight suppression in over-exposed areas and 2) noise elimination in under-exposed areas. The common goal of both tasks is to preserve or even enhance the details and improve the visibility of the scene when generating the HDR image. These two tasks can be solved separately in fundamentally different ways. In this article, we propose a dual-branch network that processes the over- and under-exposed areas respectively for single-frame HDR image reconstruction. First, the low dynamic range (LDR) image is normalized, linearized, and fed into both branches, and masks of the over- and under-exposed regions are computed to detect the improperly exposed areas. Second, the over- and under-exposed areas are restored and enhanced by the two branches respectively; at the same time, the color distribution is learned to obtain color saturation that is more consistent between the generated HDR image and the ground truth. Third, the outputs of the two branches and the linearized input LDR image are combined according to the masks to obtain the reconstructed HDR image. Extensive experiments show that the proposed method can efficiently restore the texture and color of the over-exposed areas, suppress the noise in the under-exposed areas, and produce HDR images with good contrast, clear details, and high structural fidelity to the ground-truth image appearance.
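A minimal sketch of the mask computation and mask-guided combination described above follows; the thresholds, softness, and luminance proxy are illustrative assumptions rather than the paper's settings, and the two branch outputs are taken as given.

```python
import numpy as np

def exposure_masks(ldr, hi=0.95, lo=0.05, softness=0.05):
    """Soft masks of over- and under-exposed regions of an LDR image normalized
    to [0, 1]. Thresholds and softness are illustrative choices."""
    lum = ldr.mean(axis=-1)                              # per-pixel luminance proxy
    over = np.clip((lum - hi) / softness, 0.0, 1.0)      # ramps from 0 at hi to 1 at hi+softness
    under = np.clip((lo - lum) / softness, 0.0, 1.0)     # ramps from 0 at lo to 1 at lo-softness
    return over[..., None], under[..., None]

def fuse(linear_ldr, over_branch, under_branch, over_mask, under_mask):
    """Mask-guided combination of the two branch outputs with the linearized
    input, mirroring the third step described in the abstract."""
    rest = 1.0 - np.maximum(over_mask, under_mask)
    return over_mask * over_branch + under_mask * under_branch + rest * linear_ldr
```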
This paper proposes a hybrid synthesis method for multi-exposure image fusion of images taken with hand-held cameras. Motions, whether due to camera shake or dynamic scenes, must be compensated before any content fusion; otherwise, the misalignment causes blurring/ghosting artifacts in the fused result. The proposed method can handle such motions while effectively preserving the exposure information of each input. In particular, it first applies optical flow for a coarse registration, which performs well under complex non-rigid motion but produces deformations in regions with missing correspondences. To correct such registration errors, we segment the images into superpixels and identify problematic alignments per superpixel, which are then further aligned by PatchMatch. The proposed method thereby obtains a fully aligned image stack, which facilitates a high-quality fusion free of blurring/ghosting artifacts. We compare our method with existing fusion algorithms on various challenging examples, including static/dynamic, indoor/outdoor, and daytime/nighttime scenes. Experimental results demonstrate its effectiveness and robustness.
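The sketch below illustrates the coarse-to-fine idea in simplified form: warp the source toward the reference with the estimated flow, segment the reference into superpixels, and flag superpixels with large registration error as candidates for PatchMatch re-alignment. The error metric, threshold, and the use of scikit-image SLIC are illustrative assumptions, and the PatchMatch step itself is not shown.

```python
import numpy as np
import cv2
from skimage.segmentation import slic

def warp_with_flow(src, flow):
    """Backward-warp src toward the reference using a dense flow field."""
    h, w = flow.shape[:2]
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (xs + flow[..., 0]).astype(np.float32)
    map_y = (ys + flow[..., 1]).astype(np.float32)
    return cv2.remap(src, map_x, map_y, interpolation=cv2.INTER_LINEAR)

def flag_bad_superpixels(ref, warped, n_segments=400, err_thresh=12.0):
    """Identify superpixels where the coarse flow registration failed; these
    regions would be re-aligned with PatchMatch in a full pipeline."""
    labels = slic(ref, n_segments=n_segments, start_label=0)
    err = np.abs(ref.astype(np.float32) - warped.astype(np.float32)).mean(axis=-1)
    bad = [s for s in np.unique(labels) if err[labels == s].mean() > err_thresh]
    return labels, bad
```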