Quantification of image quality and image aesthetics has been regarded as two independent fields in computer vision. Generally, image quality assessment aims at measuring image distortions, whereas image aesthetics is judged by commonly established photography rules. However, measuring either quality or aesthetics alone is not sufficient to rank images reliably. Therefore, this paper puts forward a synergetic assessment of quality and aesthetics to understand subjective human preferences for digital pictures more comprehensively. Specifically, considering that the images in existing benchmark datasets are labeled with only a single attribute, we first establish a new dataset containing 9042 real-world images with human-rated pair-wise quality-aesthetics scores. These images were previously labeled only with aesthetic scores; we additionally collect subjective quality scores for them, which fills the gap of image datasets with both attributes. Moreover, since existing methods are mostly designed for predicting a single attribute, we propose a two-stream learning network that assesses image quality and aesthetics in parallel. The network follows a top-down perception mechanism and learns from fine-grained details and the holistic image layout simultaneously. Furthermore, we introduce a Channel-Diversity loss, which can be deployed with grouped convolution and constrains channels to be mutually exclusive across the spatial dimensions; this helps spotlight different local discriminative regions at a finer granularity. Finally, experiments demonstrate that our method outperforms state-of-the-art methods on our benchmark dataset and other benchmark datasets in terms of image quality and aesthetics assessment. We hope this paper can serve as a useful reference for future research on image ranking. Both the benchmark dataset and the code will be made publicly available to facilitate further research.
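A minimal sketch of a channel-diversity style loss is given below, assuming one plausible formulation: channels within each group are encouraged to respond at non-overlapping spatial locations by penalizing the pairwise overlap of their spatially normalized activation maps. The group size, weighting, and exact formulation in the paper may differ.

```python
# Hedged sketch: one plausible channel-diversity loss for grouped features.
import torch
import torch.nn.functional as F

def channel_diversity_loss(feat: torch.Tensor, groups: int = 4) -> torch.Tensor:
    """feat: feature maps of shape (B, C, H, W); C must be divisible by `groups`."""
    b, c, h, w = feat.shape
    assert c % groups == 0
    x = feat.view(b, groups, c // groups, h * w)       # (B, G, C/G, HW)
    x = F.softmax(x, dim=-1)                           # spatial distribution per channel
    sim = torch.matmul(x, x.transpose(-1, -2))         # (B, G, C/G, C/G) pairwise overlap
    eye = torch.eye(c // groups, device=feat.device)
    off_diag = sim * (1.0 - eye)                       # keep only cross-channel terms
    return off_diag.sum() / (b * groups)

# Usage (illustrative): total_loss = task_loss + lambda_div * channel_diversity_loss(features)
```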
Meshes are a powerful data structure for 3D shapes, and representation learning for 3D meshes is important in many computer vision and graphics applications. The recent success of convolutional neural networks (CNNs) for structured data (e.g., images) suggests the value of adapting insights from CNNs to 3D shapes. However, 3D shape data are irregular since each node's neighbors are unordered. To overcome this node inconsistency on graphs, various graph neural networks for 3D shapes have been developed with isotropic filters or predefined local coordinate systems, both of which limit representation power. In this paper, we propose a local structure-aware anisotropic convolutional operation (LSA-Conv) that learns adaptive weighting matrices for each node according to its local neighboring structure and then applies shared anisotropic filters. In fact, the learnable weighting matrix is similar to the attention matrix in the Random Synthesizer -- a recent Transformer variant for natural language processing (NLP). Comprehensive experiments demonstrate that our model yields significant improvements in 3D shape reconstruction compared with state-of-the-art methods.
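The sketch below illustrates the general idea of a local structure-aware anisotropic convolution, assuming a fixed number of neighbors K per node and per-node learnable weighting matrices that softly re-order neighbor features before a shared, slot-wise anisotropic filter is applied. Class and parameter names are illustrative; the paper's exact parameterization may differ.

```python
# Hedged sketch: per-node soft weighting + shared anisotropic filter.
import torch
import torch.nn as nn

class LSAConvSketch(nn.Module):
    def __init__(self, num_nodes: int, k: int, in_ch: int, out_ch: int):
        super().__init__()
        # one KxK soft weighting matrix per node, learned for the fixed mesh topology
        self.node_weights = nn.Parameter(torch.randn(num_nodes, k, k) * 0.01)
        # shared anisotropic filter: a distinct weight for each neighbor slot
        self.filter = nn.Parameter(torch.randn(k, in_ch, out_ch) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_ch))

    def forward(self, x: torch.Tensor, neighbor_idx: torch.Tensor) -> torch.Tensor:
        """x: (B, N, C_in) node features; neighbor_idx: (N, K) neighbor indices."""
        neigh = x[:, neighbor_idx]                        # (B, N, K, C_in)
        w = torch.softmax(self.node_weights, dim=-1)      # (N, K, K) soft re-ordering
        neigh = torch.einsum('nkj,bnjc->bnkc', w, neigh)  # re-weighted neighbors
        return torch.einsum('bnkc,kco->bno', neigh, self.filter) + self.bias
```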
QR (Quick Response) Codes are widely used as a convenient unidirectional communication channel to convey information, such as email addresses, hyperlinks, or phone numbers, from publicity materials to mobile devices. However, QR Codes are not visually appealing and occupy valuable space on publicity materials. In this paper, we propose a new method to embed QR Codes on digital screens via temporal psychovisual modulation (TPVM). By exploiting the difference between human eyes and semiconductor imaging sensors in the temporal convolution of optical signals, we make the QR Code perceptually transparent to humans but detectable by mobile devices. Based on this idea of an invisible QR Code, many applications can be implemented, e.g., a "physical hyperlink" for interesting content on TV or digital signage, or an "invisible watermark" for anti-piracy in theaters. A prototype system introduced in this paper serves as a proof of concept of the invisible QR Code and can be improved in future work.
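The sketch below shows one plausible instantiation of the complementary-frame idea behind TPVM-based invisible codes: two frames are generated whose temporal average equals the host image (so the eye, integrating over time, sees only the host), while a short-exposure camera captures a single frame in which the QR pattern is visible. The modulation amplitude `delta` and the two-frame scheme are assumptions for illustration, not the paper's exact design.

```python
# Hedged sketch: complementary frames whose average is the host image.
import numpy as np

def complementary_frames(host: np.ndarray, qr: np.ndarray, delta: float = 20.0):
    """host: HxWx3 uint8 image; qr: HxW binary QR pattern (0/1)."""
    host_f = host.astype(np.float32)
    signed = (qr.astype(np.float32) * 2.0 - 1.0)[..., None] * delta  # +/- delta per module
    frame_a = np.clip(host_f + signed, 0, 255).astype(np.uint8)      # carries the code
    frame_b = np.clip(host_f - signed, 0, 255).astype(np.uint8)      # cancels it out
    return frame_a, frame_b  # alternate the two frames at a high refresh rate (e.g., 120 Hz)
```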
Head-mounted displays (HMDs) and virtual reality (VR) have been widely used in recent years, and a user's experience and computational efficiency can be assessed with mounted eye trackers. However, in addition to visually induced motion sickness (VIMS), eye fatigue has increasingly emerged during and after the viewing experience, highlighting the need for quantitative assessment of these detrimental effects. As no measurement method for the eye fatigue caused by HMDs has been widely accepted, we measured parameters related to optometry tests and propose a novel computational approach that estimates eye fatigue through several verifiable models. We implemented three classification models and two regression models to investigate different feature sets, which led to two valid assessment models for eye fatigue built on blinking and eye-movement features, with optometry-test indicators serving as ground truth. The two models provide three graded results and one continuous result, respectively, making the overall results repeatable and comparable. We show the differences between VIMS and eye fatigue, and we also present a new scheme to assess the eye fatigue of HMD users by analyzing eye-tracker parameters.
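As a minimal sketch of the two kinds of assessment models described above, the code below assumes tabular blink and eye-movement features with optometry-derived labels: a classifier that outputs one of three fatigue grades and a regressor that outputs a continuous fatigue score. The features, labels, and model choices are illustrative placeholders, not those used in the paper.

```python
# Hedged sketch: graded (classification) and continuous (regression) fatigue models.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))           # e.g., blink rate/duration, saccade and fixation stats
y_grade = rng.integers(0, 3, size=200)  # three graded fatigue levels (placeholder labels)
y_score = rng.normal(size=200)          # continuous optometry-based indicator (placeholder)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
reg = RandomForestRegressor(n_estimators=200, random_state=0)
print("grade accuracy:", cross_val_score(clf, X, y_grade, cv=5).mean())
print("score R^2:", cross_val_score(reg, X, y_score, cv=5).mean())
```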
Ultra high definition television (UHDTV) has gradually entered our daily life. However, because of the large data volume of UHDTV, it is hard to render images in real time. We propose an eye-tracking based solution using the concept of the uncrowded window from vision research. The theory of the uncrowded window suggests that human vision can only effectively recognize objects inside a small window: object features outside the window cannot be combined properly and are therefore not recognizable. We use an eye tracker to locate the user's fixation point in real time, and the area inside the uncrowded window displays the results of advanced image processing algorithms such as deblurring, upscaling, tone mapping of high dynamic range (HDR) imaging, and contrast enhancement. Outside the window, the images are processed with simple methods. Since the user can only see clearly within the uncrowded window, the detrimental impact of the lower-quality images outside the window is almost negligible. The proposed prototype system was written in C++ with DirectX, the Tobii Gaze SDK, OpenCV, and CEGUI. A demonstration of the system will be provided to show that the proposed method is an effective solution for UHDTV display.
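The sketch below illustrates the gaze-contingent compositing idea in a simplified form: an expensive enhancement is applied only to a rectangular region around the current fixation point, while the rest of the frame is left as-is. The window size and the stand-in enhancement (OpenCV's detail enhancement filter) are assumptions for illustration; the actual system combines several advanced algorithms and runs in C++.

```python
# Hedged sketch: enhance only the gaze-centered window of a uint8 BGR frame.
import cv2
import numpy as np

def composite_frame(frame: np.ndarray, gaze_xy: tuple, radius: int = 200) -> np.ndarray:
    """frame: HxWx3 uint8 image; gaze_xy: (x, y) fixation point in pixels."""
    h, w = frame.shape[:2]
    x, y = gaze_xy
    x0, x1 = max(0, x - radius), min(w, x + radius)
    y0, y1 = max(0, y - radius), min(h, y + radius)
    out = frame.copy()                               # cheap processing outside the window
    roi = frame[y0:y1, x0:x1]
    # expensive processing stands in for deblurring/upscaling/tone mapping here
    out[y0:y1, x0:x1] = cv2.detailEnhance(roi, sigma_s=10, sigma_r=0.15)
    return out
```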
In the era of multimedia and the Internet, people are eager to move information from offline media to online services. Quick Response (QR) codes and digital watermarks help us access information quickly. However, QR codes are visually unappealing, and invisible watermarks are easily destroyed in physical photographs. Therefore, this paper proposes a novel method to embed hyperlinks into natural images, making the hyperlinks invisible to human eyes but detectable by mobile devices. Our method is an end-to-end neural network with an encoder to hide information and a decoder to recover it. From original images to physical photographs, the camera imaging process introduces a series of distortions such as noise, blur, and lighting changes. To train a decoder that is robust to such physical distortions, a distortion network based on 3D rendering is inserted between the encoder and the decoder to simulate the camera imaging process. In addition, to maintain the visual appeal of the image carrying the hyperlink, we propose a loss function based on the just noticeable difference (JND) to supervise the training of the encoder. Experimental results show that our approach outperforms the previous method in both simulated and real situations.
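The sketch below shows one plausible JND-guided image loss for supervising such an encoder, assuming a precomputed per-pixel JND map (e.g., derived from luminance adaptation and texture masking): residuals below the visibility threshold are penalized lightly, while residuals above it are penalized heavily. The exact JND model and weighting in the paper may differ.

```python
# Hedged sketch: penalize mainly the part of the embedding residual that exceeds the JND.
import torch

def jnd_loss(encoded: torch.Tensor, original: torch.Tensor, jnd_map: torch.Tensor) -> torch.Tensor:
    """encoded, original: (B, 3, H, W) images; jnd_map: (B, 1, H, W) per-pixel visibility thresholds."""
    residual = (encoded - original).abs()
    visible = torch.clamp(residual - jnd_map, min=0.0)   # only the supra-threshold part
    return visible.pow(2).mean() + 0.1 * residual.pow(2).mean()  # small term keeps gradients smooth
```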
Novel view synthesis has advanced significantly with the development of neural radiance fields (NeRF) and 3D Gaussian splatting (3DGS). However, achieving high quality without compromising real-time rendering remains challenging, particularly for physically based ray tracing with view-dependent effects. Recently, N-dimensional Gaussians (N-DG) introduced a 6D spatial-angular representation to better capture view-dependent effects, but its Gaussian representation and control scheme are sub-optimal. In this paper, we revisit 6D Gaussians and introduce 6D Gaussian Splatting (6DGS), which enhances the color and opacity representations and leverages the additional directional information in the 6D space for optimized Gaussian control. Our approach is fully compatible with the 3DGS framework and significantly improves real-time radiance field rendering by better modeling view-dependent effects and fine details. Experiments demonstrate that 6DGS significantly outperforms 3DGS and N-DG, achieving up to a 15.73 dB improvement in PSNR while using 66.5% fewer Gaussian points than 3DGS. The project page is: https://gaozhongpai.github.io/6dgs/
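As a small illustration of the spatial-angular idea, the sketch below conditions a 6D Gaussian (position and direction) on a given view direction to obtain a view-dependent 3D Gaussian, using standard Gaussian conditioning via the Schur complement. It only shows this 6D-to-3D slicing step; the full 6DGS color/opacity modeling and Gaussian control scheme are not reproduced here.

```python
# Hedged sketch: condition a 6D spatial-angular Gaussian on a view direction.
import torch

def condition_on_direction(mu: torch.Tensor, cov: torch.Tensor, d: torch.Tensor):
    """mu: (6,) joint mean; cov: (6, 6) joint covariance; d: (3,) view direction."""
    mu_p, mu_d = mu[:3], mu[3:]
    s_pp, s_pd = cov[:3, :3], cov[:3, 3:]
    s_dp, s_dd = cov[3:, :3], cov[3:, 3:]
    s_dd_inv = torch.linalg.inv(s_dd)
    mu_cond = mu_p + s_pd @ s_dd_inv @ (d - mu_d)   # view-dependent 3D mean
    cov_cond = s_pp - s_pd @ s_dd_inv @ s_dp        # view-dependent 3D covariance
    return mu_cond, cov_cond
```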
Spatial psychovisual modulation (SPVM) is a new information display technology that aims to generate multiple visual percepts for different viewers on a single display simultaneously. Since SPVM was proposed, considerable effort has been devoted to it, and several applications (e.g., dual-view display systems) have been implemented based on this technology. The dual-view display (DVD) system is considered an effective digital image hiding system based on SPVM theory, but little work has been dedicated to the perceptual quality assessment of DVD systems, and there is still no clear, standard method to evaluate their performance. It is important for viewers to see a clear, aliasing-free image when they are in front of the screen. Therefore, in this paper, we build a DVD database and carry out a subjective experiment to evaluate the performance of the DVD system, and we then investigate and analyze how prevailing no-reference (NR) image quality metrics perform on this particular DVD system. We believe this paper can provide guidelines for evaluating the performance of DVD systems and serve as a good test bed for future research on SPVM technology.