Quantification of image quality and image aesthetics has been regarded as two independent fields in computer vision. Generally, image quality assessment aims at measuring image distortions, whereas image aesthetics is judged by commonly established photography rules. However, measuring either quality or aesthetics alone is not sufficient to rank images reliably. Therefore, this paper puts forward a synergetic assessment of quality and aesthetics to understand subjective human preferences for digital pictures more comprehensively. Specifically, considering that the images in existing benchmark datasets are labeled with only a single attribute, we first establish a new dataset containing 9042 real-world images with human-rated pair-wise quality-aesthetics scores. These images were previously labeled only with aesthetic scores; we additionally collect subjective quality scores for them, which fills the gap of image datasets with both attributes. Moreover, since existing methods are mostly designed for predicting a single attribute, we propose a two-stream learning network that assesses image quality and aesthetics in parallel. The network follows a top-down perception mechanism and learns from fine-grained details and the holistic image layout simultaneously. Furthermore, we introduce a Channel-Diversity loss, which can be deployed with grouped convolution and constrains channels to be mutually exclusive across the spatial dimensions; this helps spotlight different local discriminative regions at a finer granularity. Finally, experiments demonstrate that our method outperforms state-of-the-art methods on our benchmark dataset and other benchmark datasets in terms of image quality and aesthetics assessment. We hope this paper can serve as a useful reference for future research on image ranking. Both the benchmark dataset and the code will be made publicly available to facilitate further research.
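A minimal sketch of a channel-diversity style loss is given below, assuming one plausible formulation: channels within each group are encouraged to respond at non-overlapping spatial locations by penalizing the pairwise overlap of their spatially normalized activation maps. The group size, weighting, and exact formulation in the paper may differ.

```python
# Hedged sketch: one plausible channel-diversity loss for grouped features.
import torch
import torch.nn.functional as F

def channel_diversity_loss(feat: torch.Tensor, groups: int = 4) -> torch.Tensor:
    """feat: feature maps of shape (B, C, H, W); C must be divisible by `groups`."""
    b, c, h, w = feat.shape
    assert c % groups == 0
    x = feat.view(b, groups, c // groups, h * w)       # (B, G, C/G, HW)
    x = F.softmax(x, dim=-1)                           # spatial distribution per channel
    sim = torch.matmul(x, x.transpose(-1, -2))         # (B, G, C/G, C/G) pairwise overlap
    eye = torch.eye(c // groups, device=feat.device)
    off_diag = sim * (1.0 - eye)                       # keep only cross-channel terms
    return off_diag.sum() / (b * groups)

# Usage (illustrative): total_loss = task_loss + lambda_div * channel_diversity_loss(features)
```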
Meshes are a powerful data structure for 3D shapes, and representation learning for 3D meshes is important in many computer vision and graphics applications. The recent success of convolutional neural networks (CNNs) for structured data (e.g., images) suggests the value of adapting insights from CNNs to 3D shapes. However, 3D shape data are irregular since each node's neighbors are unordered. To overcome this node inconsistency on graphs, various graph neural networks for 3D shapes have been developed with isotropic filters or predefined local coordinate systems, both of which limit representation power. In this paper, we propose a local structure-aware anisotropic convolutional operation (LSA-Conv) that learns adaptive weighting matrices for each node according to its local neighboring structure and then applies shared anisotropic filters. In fact, the learnable weighting matrix is similar to the attention matrix in the Random Synthesizer -- a recent Transformer variant for natural language processing (NLP). Comprehensive experiments demonstrate that our model yields significant improvements in 3D shape reconstruction compared with state-of-the-art methods.
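The sketch below illustrates the general idea of a local structure-aware anisotropic convolution, assuming a fixed number of neighbors K per node and per-node learnable weighting matrices that softly re-order neighbor features before a shared, slot-wise anisotropic filter is applied. Class and parameter names are illustrative; the paper's exact parameterization may differ.

```python
# Hedged sketch: per-node soft weighting + shared anisotropic filter.
import torch
import torch.nn as nn

class LSAConvSketch(nn.Module):
    def __init__(self, num_nodes: int, k: int, in_ch: int, out_ch: int):
        super().__init__()
        # one KxK soft weighting matrix per node, learned for the fixed mesh topology
        self.node_weights = nn.Parameter(torch.randn(num_nodes, k, k) * 0.01)
        # shared anisotropic filter: a distinct weight for each neighbor slot
        self.filter = nn.Parameter(torch.randn(k, in_ch, out_ch) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_ch))

    def forward(self, x: torch.Tensor, neighbor_idx: torch.Tensor) -> torch.Tensor:
        """x: (B, N, C_in) node features; neighbor_idx: (N, K) neighbor indices."""
        neigh = x[:, neighbor_idx]                        # (B, N, K, C_in)
        w = torch.softmax(self.node_weights, dim=-1)      # (N, K, K) soft re-ordering
        neigh = torch.einsum('nkj,bnjc->bnkc', w, neigh)  # re-weighted neighbors
        return torch.einsum('bnkc,kco->bno', neigh, self.filter) + self.bias
```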
QR (Quick Response) Codes are widely used as a convenient unidirectional communication channel to convey information, such as email addresses, hyperlinks, or phone numbers, from publicity materials to mobile devices. However, QR Codes are not visually appealing and occupy valuable space on publicity materials. In this paper, we propose a new method to embed QR Codes on digital screens via temporal psychovisual modulation (TPVM). By exploiting the difference between human eyes and semiconductor imaging sensors in the temporal convolution of optical signals, we make the QR Code perceptually transparent to humans but detectable by mobile devices. Based on this idea of an invisible QR Code, many applications can be implemented, e.g., a "physical hyperlink" for interesting content on TV or digital signage, or an "invisible watermark" for anti-piracy in theaters. A prototype system introduced in this paper serves as a proof of concept of the invisible QR Code and can be improved in future work.
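The sketch below shows one plausible instantiation of the complementary-frame idea behind TPVM-based invisible codes: two frames are generated whose temporal average equals the host image (so the eye, integrating over time, sees only the host), while a short-exposure camera captures a single frame in which the QR pattern is visible. The modulation amplitude `delta` and the two-frame scheme are assumptions for illustration, not the paper's exact design.

```python
# Hedged sketch: complementary frames whose average is the host image.
import numpy as np

def complementary_frames(host: np.ndarray, qr: np.ndarray, delta: float = 20.0):
    """host: HxWx3 uint8 image; qr: HxW binary QR pattern (0/1)."""
    host_f = host.astype(np.float32)
    signed = (qr.astype(np.float32) * 2.0 - 1.0)[..., None] * delta  # +/- delta per module
    frame_a = np.clip(host_f + signed, 0, 255).astype(np.uint8)      # carries the code
    frame_b = np.clip(host_f - signed, 0, 255).astype(np.uint8)      # cancels it out
    return frame_a, frame_b  # alternate the two frames at a high refresh rate (e.g., 120 Hz)
```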
Head-mounted displays (HMDs) and virtual reality (VR) have been widely used in recent years, and a user's experience and computational efficiency can be assessed with mounted eye trackers. However, in addition to visually induced motion sickness (VIMS), eye fatigue has increasingly emerged during and after the viewing experience, highlighting the need for quantitative assessment of these detrimental effects. As no measurement method for the eye fatigue caused by HMDs has been widely accepted, we measured parameters related to optometry tests and propose a novel computational approach that estimates eye fatigue through several verifiable models. We implemented three classification models and two regression models to investigate different feature sets, which led to two valid assessment models for eye fatigue built on blinking and eye-movement features, with optometry-test indicators serving as ground truth. The two models provide three graded results and one continuous result, respectively, making the overall results repeatable and comparable. We show the differences between VIMS and eye fatigue, and we also present a new scheme to assess the eye fatigue of HMD users by analyzing eye-tracker parameters.
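As a minimal sketch of the two kinds of assessment models described above, the code below assumes tabular blink and eye-movement features with optometry-derived labels: a classifier that outputs one of three fatigue grades and a regressor that outputs a continuous fatigue score. The features, labels, and model choices are illustrative placeholders, not those used in the paper.

```python
# Hedged sketch: graded (classification) and continuous (regression) fatigue models.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))           # e.g., blink rate/duration, saccade and fixation stats
y_grade = rng.integers(0, 3, size=200)  # three graded fatigue levels (placeholder labels)
y_score = rng.normal(size=200)          # continuous optometry-based indicator (placeholder)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
reg = RandomForestRegressor(n_estimators=200, random_state=0)
print("grade accuracy:", cross_val_score(clf, X, y_grade, cv=5).mean())
print("score R^2:", cross_val_score(reg, X, y_score, cv=5).mean())
```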
Ultra high definition television (UHDTV) has gradually entered our daily life. However, because of the large data volume of UHDTV, it is hard to render images in real time. We propose an eye-tracking based solution using the concept of the uncrowded window from vision research. The theory of the uncrowded window suggests that human vision can only effectively recognize objects inside a small window: object features outside the window cannot be combined properly and are therefore not recognizable. We use an eye tracker to locate the user's fixation point in real time, and the area inside the uncrowded window displays the results of advanced image processing algorithms such as deblurring, upscaling, tone mapping of high dynamic range (HDR) imaging, and contrast enhancement. Outside the window, the images are processed with simple methods. Since the user can only see clearly within the uncrowded window, the detrimental impact of the lower-quality images outside the window is almost negligible. The proposed prototype system was written in C++ with DirectX, the Tobii Gaze SDK, OpenCV, and CEGUI. A demonstration of the system will be provided to show that the proposed method is an effective solution for UHDTV display.
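The sketch below illustrates the gaze-contingent compositing idea in a simplified form: an expensive enhancement is applied only to a rectangular region around the current fixation point, while the rest of the frame is left as-is. The window size and the stand-in enhancement (OpenCV's detail enhancement filter) are assumptions for illustration; the actual system combines several advanced algorithms and runs in C++.

```python
# Hedged sketch: enhance only the gaze-centered window of a uint8 BGR frame.
import cv2
import numpy as np

def composite_frame(frame: np.ndarray, gaze_xy: tuple, radius: int = 200) -> np.ndarray:
    """frame: HxWx3 uint8 image; gaze_xy: (x, y) fixation point in pixels."""
    h, w = frame.shape[:2]
    x, y = gaze_xy
    x0, x1 = max(0, x - radius), min(w, x + radius)
    y0, y1 = max(0, y - radius), min(h, y + radius)
    out = frame.copy()                               # cheap processing outside the window
    roi = frame[y0:y1, x0:x1]
    # expensive processing stands in for deblurring/upscaling/tone mapping here
    out[y0:y1, x0:x1] = cv2.detailEnhance(roi, sigma_s=10, sigma_r=0.15)
    return out
```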
In the era of multimedia and the Internet, people are eager to move information from offline media to online services. Quick Response (QR) codes and digital watermarks help us access information quickly. However, QR codes are visually unappealing, and invisible watermarks are easily destroyed in physical photographs. Therefore, this paper proposes a novel method to embed hyperlinks into natural images, making the hyperlinks invisible to human eyes but detectable by mobile devices. Our method is an end-to-end neural network with an encoder to hide information and a decoder to recover it. From original images to physical photographs, the camera imaging process introduces a series of distortions such as noise, blur, and lighting changes. To train a decoder that is robust to such physical distortions, a distortion network based on 3D rendering is inserted between the encoder and the decoder to simulate the camera imaging process. In addition, to maintain the visual appeal of the image carrying the hyperlink, we propose a loss function based on the just noticeable difference (JND) to supervise the training of the encoder. Experimental results show that our approach outperforms the previous method in both simulated and real situations.
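The sketch below shows one plausible JND-guided image loss for supervising such an encoder, assuming a precomputed per-pixel JND map (e.g., derived from luminance adaptation and texture masking): residuals below the visibility threshold are penalized lightly, while residuals above it are penalized heavily. The exact JND model and weighting in the paper may differ.

```python
# Hedged sketch: penalize mainly the part of the embedding residual that exceeds the JND.
import torch

def jnd_loss(encoded: torch.Tensor, original: torch.Tensor, jnd_map: torch.Tensor) -> torch.Tensor:
    """encoded, original: (B, 3, H, W) images; jnd_map: (B, 1, H, W) per-pixel visibility thresholds."""
    residual = (encoded - original).abs()
    visible = torch.clamp(residual - jnd_map, min=0.0)   # only the supra-threshold part
    return visible.pow(2).mean() + 0.1 * residual.pow(2).mean()  # small term keeps gradients smooth
```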
Novel view synthesis has advanced significantly with the development of neural radiance fields (NeRF) and 3D Gaussian splatting (3DGS). However, achieving high quality without compromising real-time rendering remains challenging, particularly for physically based ray tracing with view-dependent effects. Recently, N-dimensional Gaussians (N-DG) introduced a 6D spatial-angular representation to better capture view-dependent effects, but its Gaussian representation and control scheme are sub-optimal. In this paper, we revisit 6D Gaussians and introduce 6D Gaussian Splatting (6DGS), which enhances the color and opacity representations and leverages the additional directional information in the 6D space for optimized Gaussian control. Our approach is fully compatible with the 3DGS framework and significantly improves real-time radiance field rendering by better modeling view-dependent effects and fine details. Experiments demonstrate that 6DGS significantly outperforms 3DGS and N-DG, achieving up to a 15.73 dB improvement in PSNR while using 66.5% fewer Gaussian points than 3DGS. The project page is: https://gaozhongpai.github.io/6dgs/
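As a small illustration of the spatial-angular idea, the sketch below conditions a 6D Gaussian (position and direction) on a given view direction to obtain a view-dependent 3D Gaussian, using standard Gaussian conditioning via the Schur complement. It only shows this 6D-to-3D slicing step; the full 6DGS color/opacity modeling and Gaussian control scheme are not reproduced here.

```python
# Hedged sketch: condition a 6D spatial-angular Gaussian on a view direction.
import torch

def condition_on_direction(mu: torch.Tensor, cov: torch.Tensor, d: torch.Tensor):
    """mu: (6,) joint mean; cov: (6, 6) joint covariance; d: (3,) view direction."""
    mu_p, mu_d = mu[:3], mu[3:]
    s_pp, s_pd = cov[:3, :3], cov[:3, 3:]
    s_dp, s_dd = cov[3:, :3], cov[3:, 3:]
    s_dd_inv = torch.linalg.inv(s_dd)
    mu_cond = mu_p + s_pd @ s_dd_inv @ (d - mu_d)   # view-dependent 3D mean
    cov_cond = s_pp - s_pd @ s_dd_inv @ s_dp        # view-dependent 3D covariance
    return mu_cond, cov_cond
```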
Spatial psychovisual modulation (SPVM) is a new information display technology that aims to generate multiple visual percepts for different viewers on a single display simultaneously. Since SPVM was proposed, considerable effort has been devoted to it, and several applications (e.g., dual-view display systems) have been implemented based on this technology. The dual-view display (DVD) system is considered an effective digital image hiding system based on SPVM theory, but little work has been dedicated to the perceptual quality assessment of DVD systems, and there is still no clear, standard method to evaluate their performance. It is important for viewers to see a clear, aliasing-free image when they are in front of the screen. Therefore, in this paper, we build a DVD database and carry out a subjective experiment to evaluate the performance of the DVD system, and we then investigate and analyze how prevailing no-reference (NR) image quality metrics perform on this particular DVD system. We believe this paper can provide guidelines for evaluating the performance of DVD systems and serve as a good test bed for future research on SPVM technology.