Quality assessment of anatomical MRI images from Generative Adversarial Networks: human assessment and image quality metrics
0 Citations · 43 References · 10 Related Papers
Abstract:
Background: Generative Adversarial Networks (GANs) can synthesize brain images from image or noise input. So far, the gold standard for assessing the quality of the generated images has been human expert ratings. However, due to the cost and limited scalability of human assessment, and the limited sensitivity of the human eye to subtle statistical relationships, a more automated approach to evaluating GANs is required.

New method: We investigated to what extent visual quality can be assessed using image quality metrics, and we used group analysis and spatial independent component analysis to verify that the GAN reproduces multivariate statistical relationships found in real data. Reference human data were obtained by recruiting neuroimaging experts to assess real Magnetic Resonance (MR) images and images generated by a Wasserstein GAN. Image quality was manipulated by exporting images at different stages of GAN training.

Results: Experts were sensitive to changes in image quality, as evidenced by ratings and reaction times, and the generated images reproduced group effects (age, gender) and spatial correlations moderately well. We also surveyed a number of image quality metrics, which consistently failed to fully reproduce the human data. While the metrics Structural Similarity Index Measure (SSIM) and Naturalness Image Quality Evaluator (NIQE) showed good overall agreement with human assessment for lower-quality images (i.e. images from early stages of GAN training), only a Deep Quality Assessment (QA) model trained on human ratings was sensitive to the subtle differences between higher-quality images.

Conclusions: We recommend a combination of group analyses, spatial correlation analyses, and both distortion metrics (SSIM, NIQE) and perceptual models (Deep QA) for a comprehensive evaluation and comparison of brain images produced by GANs.

Keywords: Quality Score; Mean opinion score; Similarity (geometry); Quality Assessment
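The distortion metrics recommended above are simple to compute. As an illustration, here is a minimal single-window SSIM in numpy; the random arrays are hypothetical stand-ins for MR slices, and published evaluations typically use a sliding-window implementation such as skimage's structural_similarity rather than this global variant:

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM (Wang et al. 2004) over the whole image.
    Library versions apply the same formula in a sliding window."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

rng = np.random.default_rng(0)
real = rng.random((64, 64))                  # stand-in for a real MR slice
noisy = np.clip(real + 0.1 * rng.standard_normal((64, 64)), 0, 1)  # stand-in for a GAN sample

print(global_ssim(real, real))   # identical images score exactly 1.0
print(global_ssim(real, noisy))  # distortion lowers the score
```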
Many subjective quality assessment methods have been standardized. Experimenters can select a method from these methods in accordance with the aim of the planned subjective assessment experiment. It is often argued that the results of subjective quality assessment are affected by range effects that are caused by the quality distribution of the assessment videos. However, there are no studies on the double stimulus continuous quality-scale (DSCQS) and absolute category rating with hidden reference (ACR-HR) methods that investigate range effects in the high-quality range. Therefore, we conduct experiments using high-quality assessment videos (high-quality experiment) and low-to-high-quality assessment videos (low-to-high-quality experiment) and compare the DSCQS and ACR-HR methods in terms of accuracy, stability, and discrimination ability. Regarding accuracy, we find that the mean opinion scores of the DSCQS and ACR-HR methods were marginally affected by range effects, although almost all common processed video sequences showed no significant difference for the high- and low-to-high-quality experiments. Second, the DSCQS and ACR-HR methods were equally stable in the low-to-high-quality experiment, whereas the DSCQS method was more stable than the ACR-HR method in the high-quality experiment. Finally, the DSCQS method had higher discrimination ability than the ACR-HR method in the low-to-high-quality experiment, whereas both methods had almost the same discrimination ability for the high-quality experiment. We thus determined that the DSCQS method is better at minimizing the range effects than the ACR-HR method in the high-quality range.
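The ACR-HR scoring discussed above can be illustrated in a few lines. The ratings below are hypothetical, and the differential-score formula DV = V(PVS) − V(REF) + 5 follows the usual ITU-T P.910 convention:

```python
import numpy as np

# Hypothetical 5-point ACR ratings: rows = subjects, cols = processed video sequences
scores = np.array([[4, 3, 2],
                   [5, 3, 1],
                   [4, 4, 2]], dtype=float)
# Each subject's rating of the hidden reference
hidden_ref = np.array([5, 4, 5], dtype=float)

mos = scores.mean(axis=0)                    # plain ACR MOS per sequence
# ACR-HR differential score: DV = V(PVS) - V(REF) + 5
dv = scores - hidden_ref[:, None] + 5
dmos = dv.mean(axis=0)
print(mos)
print(dmos)
```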
Colour plus depth map based stereoscopic video has attracted significant attention in the last 10 years, as it can reduce storage and bandwidth requirements for the transmission of stereoscopic content over wireless channels such as mobile networks. However, quality assessment of coded 3D video sequences can currently be performed reliably only using expensive and inconvenient subjective tests [1]. The main goal of many objective video quality metrics is to automatically estimate the average viewer's opinion of the quality of video processed by the system. Measuring subjective video quality can itself be challenging, because it may require trained experts. Many subjective video quality measurement methods are described in ITU-R Recommendation BT.500. Their main idea is the same as in the Mean Opinion Score: video sequences are shown to a group of viewers, whose opinions are recorded and averaged to evaluate the quality of each sequence. Because timely optimization of 3D video systems is important, reliable subjective measures must be computed based on statistical analysis. This paper investigates subjective assessment of four standard 3D video sequences. Subjective tests are performed to verify the 3D video quality and depth perception of a range of differently coded video sequences, with packet loss rates ranging from 0% to 20%. The subjective quality results are used to calibrate objective quality assessment metrics for 3D video sequences such as average PSNR, Structural Similarity (SSIM), and Mean Square Error (MSE). The proposed measure of 3D perception and 3D quality of experience (QoE) is shown to correlate well with human perception of quality on a publicly available dataset of 3D videos and human subjective scores. The proposed measure extracts statistical features from depth maps and 3D videos to predict human perception and 3D QoE.
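The full-reference metrics named above (MSE, average PSNR) are straightforward to compute. A small numpy sketch using synthetic 8-bit images as stand-ins for reference and degraded frames:

```python
import numpy as np

def mse(ref, deg):
    """Mean squared error between a reference and a degraded image."""
    return float(np.mean((ref.astype(float) - deg.astype(float)) ** 2))

def psnr(ref, deg, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    m = mse(ref, deg)
    return float('inf') if m == 0 else 10 * np.log10(peak ** 2 / m)

rng = np.random.default_rng(1)
ref = rng.integers(0, 256, (32, 32)).astype(float)   # synthetic reference frame
deg = np.clip(ref + rng.normal(0, 5, (32, 32)), 0, 255)  # mildly degraded copy

print(mse(ref, deg), psnr(ref, deg))
```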
Video quality assessment is a crucial routine in the broadcasting industry. Due to the duration and the excessive number of video files, a computer-based video quality assessment mechanism is the only solution. While it is common to measure the quality of a video file at the compression stage by comparing it against the raw data, at later stages no reference video is available for comparison. Therefore, a no-reference (blind) video quality assessment (NR-VQA) technique is essential. Current NR-VQA methods predict only the mean opinion score (MOS) and do not provide further information about the distribution of people's scores, even though this distribution is informative for the evaluation of QoE. In this paper, we propose a method for predicting the empirical distribution of human opinion scores in the assessment of video quality. To this end, we extract frame-level features and feed them to a recurrent neural network; the distribution of opinion scores is predicted in the last layer of the RNN. The experiments show that the averages of the predicted distributions achieve comparable or better results than previous methods on the KonVid-1k dataset.
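Once a distribution over the five ACR categories is predicted (e.g. by the last layer of an RNN, as above), the MOS and a rater-disagreement measure fall out as its first two moments. A sketch with a hypothetical predicted distribution:

```python
import numpy as np

# Hypothetical predicted probability distribution over the 5 ACR categories
p = np.array([0.05, 0.10, 0.20, 0.40, 0.25])
levels = np.arange(1, 6)

mos = float(p @ levels)                         # MOS is the mean of the distribution
std = float(np.sqrt(p @ (levels - mos) ** 2))   # spread: how much raters disagree

print(mos, std)
```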
Since 3D video has the potential to provide stronger immersive perception, evaluating the quality of 3D video becomes an important subject. This paper mainly studies the subjective quality assessment of two-view 3D video compressed with the state-of-the-art standard H.264/MVC in a 3DTV system. Our goal is to obtain reliable opinion scores that can be applied in future research on objective quality assessment. The subjective evaluation is conducted under the guidance of ITU recommendations, covering test materials, test environment, test equipment, assessment factors, subjective methodologies, and analysis of assessment results. Experiments showed that the Mean Opinion Score (MOS) of overall quality increases with bitrate and that 3D sequences of different complexity exhibit different quality trends.
To establish stable video operations and services while maintaining a high quality of experience, perceptual video quality assessment has become an essential research topic in video technology. The goal of image quality assessment is to predict perceptual quality in order to improve imaging systems' performance. This paper presents a novel visual quality metric for video quality assessment. To address this problem, we study the training of neural networks through robust optimization. The high degree of correlation with subjective quality estimates is due to the use of a convolutional neural network trained on a large number of video sequence-subjective quality score pairs. We demonstrate how our predicted no-reference quality metric correlates with qualitative opinion in a human observer study. Results are shown on the MCL-V dataset in comparison with existing approaches.
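Correlation with subjective opinion, as reported above, is conventionally quantified with Pearson's and Spearman's coefficients. A minimal numpy sketch with hypothetical metric outputs and MOS values (the simple rank helper ignores ties; scipy.stats handles them properly):

```python
import numpy as np

mos  = np.array([4.5, 3.8, 3.1, 2.4, 1.6])       # hypothetical subjective scores
pred = np.array([0.92, 0.85, 0.70, 0.55, 0.30])  # hypothetical metric outputs

# Pearson: linear agreement between metric and MOS
pearson = np.corrcoef(pred, mos)[0, 1]

def ranks(a):
    """Rank positions of a 1-D array (no tie averaging)."""
    r = np.empty(len(a))
    r[np.argsort(a)] = np.arange(len(a))
    return r

# Spearman: rank-order (monotonic) agreement
spearman = np.corrcoef(ranks(pred), ranks(mos))[0, 1]
print(pearson, spearman)
```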
The new era of digital communication has brought many challenges that network operators need to overcome. The high demand for mobile data rates requires improved networks, which makes it challenging for operators to maintain the quality of experience (QoE) for their consumers. In live video transmission, continuous monitoring of the videos is needed in order to maintain the quality of the network. For this purpose, objective algorithms are employed to monitor the quality of the videos transmitted over a network. In order to test these objective algorithms, subjective quality assessment of the streamed videos is required, as the human eye is the best source of perceptual assessment. In this paper we conduct a subjective evaluation of videos with varying spatial and temporal impairments. These videos were impaired with frame-freezing distortions so that the impact of frame freezing on the quality of experience could be studied. We present subjective Mean Opinion Scores (MOS) for these videos that can be used for fine-tuning objective algorithms for video quality assessment. Keywords: frame freezing, mean opinion score, objective assessment, subjective evaluation.
In this paper, we compared two subjective assessment methods, DSCQS (Double Stimulus Continuous Quality Scale) and ACR (Absolute Category Rating), which are widely used to evaluate video quality for multimedia applications. We performed subjective quality tests using the DSCQS and ACR methods. The subjective scores obtained by the two methods show that they are highly correlated in terms of MOS (Mean Opinion Score) and slightly less correlated in terms of DMOS (Difference Mean Opinion Score). The results indicate that the ACR method is an effective subjective quality assessment method that offers performance comparable to the DSCQS method while being able to evaluate a larger number of video sequences.
A method is proposed for generating a set of video sequences with a predicted subjective video quality (Mean Opinion Score, MOS) based on a limited number of sequences and the associated objective video quality measure (Peak Signal-to-Noise Ratio, PSNR). The MOS is predicted using a sigmoid function model that is optimized based on a limited number of subjective tests for each video sequence. The correlation of the predicted MOS (P-MOS) to MOS achieved using this method is 0.94. The method can be used to enrich an existing video sequence dataset for the training phase of machine learning or deep learning applications without bearing the burden and cost of regular human-opinion-based subjective video quality tests.
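The sigmoid PSNR-to-MOS model can be sketched as follows. The anchor points and parameter values are synthetic, and a real pipeline would fit the curve with a proper optimizer (e.g. scipy.optimize.curve_fit) rather than the coarse grid search used here for self-containedness:

```python
import numpy as np

def sigmoid_mos(psnr, a, b):
    """Map PSNR (dB) to a 1-5 MOS with a logistic curve."""
    return 1.0 + 4.0 / (1.0 + np.exp(-a * (psnr - b)))

# Hypothetical subjective anchors: (PSNR, measured MOS) pairs,
# generated here from known parameters so the fit can be checked
psnr = np.array([25.0, 30.0, 35.0, 40.0, 45.0])
mos = sigmoid_mos(psnr, 0.3, 34.0)

# Coarse grid search over (slope a, midpoint b)
grid = [(a, b) for a in np.linspace(0.05, 1.0, 96)
               for b in np.linspace(20.0, 50.0, 121)]
best = min(grid, key=lambda ab: float(np.sum((sigmoid_mos(psnr, *ab) - mos) ** 2)))
print(best)  # recovers (0.3, 34.0) on this noiseless example
```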
Quality assessment of digital compound images is a less investigated research topic. In this paper, we present a study on subjective quality assessment of Digital Compound Images (DCIs), and investigate whether existing Image Quality Assessment (IQA) methods are effective for evaluating the quality of distorted DCIs. A new Compound Image Quality Assessment Database (CIQAD) is constructed, including 24 reference DCIs and their 576 distorted versions. The Paired Comparison (PC) method is employed for the subjective viewing, and the HodgeRank decomposition is adopted to generate incomplete but balanced comparison pairs, reducing the execution time while guaranteeing the reliability of the results. In our experiment, the correlation of 14 existing IQA methods with the obtained Mean Opinion Score (MOS) values on the CIQAD is calculated, which indicates that these methods are not consistent with human visual perception when judging DCIs under different conditions. Therefore, objective quality assessment metrics should be specifically designed for DCIs. Our subjective study delivers convincing information to guide the construction of objective metrics. Furthermore, we have published the database online to facilitate future research on quality assessment of DCIs.
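The least-squares core of HodgeRank-style aggregation, which turns paired comparisons into latent quality scores, can be sketched as follows; the pairs and preference margins are hypothetical:

```python
import numpy as np

# (i, j, margin): image i was preferred over image j by `margin`
# on a preference scale. Fit latent scores s with s[i] - s[j] ~= margin.
pairs = [(0, 1, 1.0), (1, 2, 0.5), (0, 2, 1.5), (2, 3, 2.0)]
n = 4

# Build the pairwise-difference design matrix and solve by least squares
A = np.zeros((len(pairs), n))
y = np.zeros(len(pairs))
for row, (i, j, m) in enumerate(pairs):
    A[row, i], A[row, j], y[row] = 1.0, -1.0, m

s, *_ = np.linalg.lstsq(A, y, rcond=None)
s -= s.mean()   # scores are identifiable only up to an additive constant
print(np.round(s, 3))
```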
The International Telecommunication Union has standardized many subjective assessment methods for stereoscopic three-dimensional (3D) and 2D video quality. The same methods are used for 3D and 2D videos. The assessment time, stability, and discrimination ability, i.e. the ability to identify differences in video quality, are important factors in subjective assessment methods. Many studies on these factors have been done for 2D video quality. However, these factors have not been sufficiently studied for 3D video quality. To address this, we conduct subjective quality assessments for 3D and 2D videos using the absolute category rating (ACR), degradation category rating (DCR), and double stimulus continuous quality-scale (DSCQS) methods that are defined in ITU Recommendations. We first investigate the Pearson's correlation coefficients and Spearman's rank correlation coefficients between different pairings of the three methods to clarify which method is most efficient in terms of assessment time. The different pairings of the three methods exhibit high coefficients. These results indicate that the order relation of the mean opinion scores (MOSs) and the distances between the MOSs are almost the same across methods. Therefore, for generally investigating the quality characteristics, the ACR method is most efficient because it has the shortest assessment time. Next, we analyze the stability of these subjective assessment methods. We clarify that the confidence intervals (CIs) of the MOSs for 3D video are almost the same as those for 2D video and that the stability of the DCR method is higher than that of the other methods. The DSCQS method has the smallest CIs for high-quality video. Finally, we investigate the discrimination ability of these subjective assessment methods. The results show that the DCR method performs better than the others in terms of the number of paired MOSs with a significant difference for low-quality video. However, we confirm that the DSCQS method performs better than the others for high-quality video.
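The MOS confidence intervals discussed above are conventionally computed as MOS ± 1.96·s/√N, the normal-approximation formula from ITU-R BT.500. A sketch with hypothetical votes for one sequence:

```python
import numpy as np

# Hypothetical 5-point ratings from 12 viewers for one sequence
votes = np.array([4, 5, 4, 3, 4, 5, 4, 4, 3, 5, 4, 4], dtype=float)

mos = votes.mean()
sd = votes.std(ddof=1)                     # sample standard deviation
ci95 = 1.96 * sd / np.sqrt(len(votes))     # 95% CI half-width (normal approx.)

print(mos, ci95)
```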