This paper reviews the AIS 2024 Video Quality Assessment (VQA) Challenge, focused on User-Generated Content (UGC). The aim of this challenge is to gather deep learning-based methods capable of estimating the perceptual quality of UGC videos. The user-generated videos from the YouTube UGC Dataset cover diverse content (sports, games, lyrics, anime, etc.), qualities, and resolutions. The proposed methods must process 30 FHD frames in under 1 second. A total of 102 participants registered for the challenge, and 15 submitted code and models. The performance of the top-5 submissions is reviewed and provided here as a survey of diverse deep models for efficient video quality assessment of user-generated content.
Recent deep learning-based optical flow estimators have exhibited impressive performance in generating local flows between consecutive frames. However, the estimation of long-range flows between distant frames, particularly under complex object deformation and large motion occlusion, remains a challenging task. One promising solution is to accumulate local flows explicitly or implicitly to obtain the desired long-range flow. Nevertheless, accumulation errors and flow misalignment can hinder the effectiveness of this approach. This paper proposes a novel recurrent framework called AccFlow, which recursively accumulates local flows in a backward manner using a deformable module called AccPlus. In addition, an adaptive blending module is designed alongside AccPlus to alleviate the occlusion effect of backward accumulation and rectify the accumulation error. Notably, we demonstrate the superiority of backward accumulation over conventional forward accumulation, which to the best of our knowledge has not been explicitly established before. To train and evaluate the proposed AccFlow, we have constructed a large-scale high-quality dataset named CVO, which provides ground-truth optical flow labels between adjacent and distant frames. Extensive experiments validate the effectiveness of AccFlow in handling long-range optical flow estimation. Code is available at https://github.com/mulns/AccFlow .
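The idea of chaining local flows into a long-range flow can be illustrated in a few lines. The sketch below shows plain (forward) composition of two dense flow fields, the baseline that AccFlow improves upon; `compose_flows` is a hypothetical name, and nearest-neighbour sampling is used for brevity where real accumulators interpolate bilinearly and handle occlusion.

```python
import numpy as np

def compose_flows(flow_ab, flow_bc):
    """Chain two dense flow fields so that
    flow_ac(x) = flow_ab(x) + flow_bc(x + flow_ab(x)).
    Each flow has shape (H, W, 2) storing (dx, dy) per pixel.
    Nearest-neighbour sampling keeps the sketch short; accumulation
    error from this resampling is exactly what grows over long chains."""
    H, W, _ = flow_ab.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Coordinates in frame b that each pixel of frame a maps to,
    # rounded and clamped to the image border.
    xb = np.clip(np.round(xs + flow_ab[..., 0]).astype(int), 0, W - 1)
    yb = np.clip(np.round(ys + flow_ab[..., 1]).astype(int), 0, H - 1)
    # Sample flow_bc at those coordinates and add the two displacements.
    return flow_ab + flow_bc[yb, xb]
```

For constant flows the composition is simply additive: chaining a uniform displacement of 1 px with one of 2 px yields 3 px everywhere. With real, spatially varying flows, each resampling step introduces error and occluded pixels sample wrong targets, motivating AccFlow's learned blending and deformable accumulation.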
Previous studies showed that facial appearance is an important phenotypic indicator of human diseases or biological conditions. Recent advancements in deep learning have shown great potential in facial image analysis, including health status assessment. However, prior methods mainly focused on single-modality analysis of either 2D texture images or 3D facial meshes, which limits their ability to fully capture the relationships between biometric measurements and diseases. To address these issues, we propose a task-adaptive multi-modal fusion network, TAMM, for face-related health assessments. Our model leverages both the geometric and texture features of 3D facial images via a task-adaptive Transformer (TAFormer), which can dynamically extract features from different modalities and scales for various tasks via spatial attention and cross-modal multi-scale attention, effectively capturing intra- and inter-modal relationships between features. Experimental results on a dataset of 19,775 patients demonstrate that TAMM achieves state-of-the-art performance on various regression and classification tasks, including age, BMI, and fatty liver disease prediction. Ablation studies show the importance of multi-modal fusion and the task-specific adaptability of our model in achieving optimal performance.
Magnetic resonance images (MRI) acquired with low through-plane resolution save acquisition time and cost, but the poor resolution along one orientation is insufficient to meet the requirement of high resolution for early diagnosis of brain disease and morphometric studies. Common single-image super-resolution (SISR) solutions face two main challenges: (1) combining local detail with global anatomical structural information; and (2) large-scale restoration when reconstructing thick-slice MRI into high-resolution (HR) isotropic data. To address these problems, we propose a novel two-stage network for brain MRI SR named TransMRSR, based on convolutional blocks to extract local information and transformer blocks to capture long-range dependencies. TransMRSR consists of three modules: shallow local feature extraction, deep non-local feature capture, and HR image reconstruction. In the first stage, we perform a generative task to encapsulate diverse priors into a generative adversarial network (GAN), which serves as the decoder sub-module of the deep non-local feature capture part. The pre-trained GAN is then used in the second-stage SR task. We further eliminate the potential latent-space shift caused by the two-stage training strategy through a self-distilled truncation trick. Extensive experiments show that our method achieves superior performance to other SISR methods on both public and private datasets. Code is released at https://github.com/goddesshs/TransMRSR.git .
Recently, many video enhancement methods have been proposed to improve video quality in aspects such as color, brightness, contrast, and stability. How to evaluate the quality of enhanced videos in a way consistent with human visual perception is therefore an important research topic. However, most video quality assessment methods estimate the distortion degree of videos from an overall perspective. Few researchers have proposed a video quality assessment method specifically for video enhancement, and no comprehensive video quality assessment dataset for this purpose is publicly available. Therefore, we construct a Video quality assessment Dataset for Perceptual Video Enhancement (VDPVE) in this paper. VDPVE contains 1211 videos with different enhancements, divided into three sub-datasets: the first has 600 videos with color, brightness, and contrast enhancements; the second has 310 videos with deblurring; and the third has 301 deshaked videos. We invited 21 subjects (20 valid) to rate all enhanced videos in VDPVE. After normalizing and averaging the subjective opinion scores, the mean opinion score (MOS) of each video is obtained. Furthermore, we split VDPVE into a training set, a validation set, and a test set, and verify the performance of several state-of-the-art video quality assessment methods on the test set.
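The "normalizing and averaging" step for obtaining a mean opinion score can be sketched as follows. This is a minimal illustration of the common per-subject z-score normalization (in the spirit of ITU-R BT.500-style processing), not necessarily the exact procedure used for VDPVE; `mos_from_ratings` is a hypothetical helper name.

```python
import numpy as np

def mos_from_ratings(ratings):
    """ratings: array of shape (num_subjects, num_videos) of raw opinion scores.
    Z-score-normalize each subject's ratings to remove per-subject bias
    and scale differences, then average across subjects to obtain one
    (z-scaled) mean opinion score per video."""
    ratings = np.asarray(ratings, dtype=float)
    mu = ratings.mean(axis=1, keepdims=True)    # each subject's mean rating
    sigma = ratings.std(axis=1, keepdims=True)  # each subject's rating spread
    z = (ratings - mu) / sigma                  # per-subject z-scores
    return z.mean(axis=0)                       # MOS per video on the z-scale
```

For example, a lenient rater scoring three videos [4, 6, 8] and a strict rater scoring them [1, 2, 3] produce identical z-scores, so the averaged MOS preserves their shared ranking while cancelling the bias. In practice the z-scores are often linearly rescaled back to the original rating range before reporting.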
Phase measuring deflectometry (PMD) is a powerful technique for obtaining three-dimensional (3D) shape information of specular surfaces, owing to its large dynamic range, noncontact operation, full-field measurement, fast acquisition, high precision, and automatic data processing. We review recent advances in PMD. The basic principle of PMD is introduced, followed by several PMD methods based on fringe reflection. First, a direct PMD (DPMD) method is reviewed for measuring the 3D shape of specular objects with discontinuous surfaces. The DPMD method builds a direct relationship between phase and depth data, without a gradient integration procedure. Second, an infrared PMD (IR-PMD) method for measuring specular objects is reviewed. Because IR light is used as the light source, the IR-PMD method is insensitive to the effect of ambient light on the measured results and has high measurement accuracy. Third, a method that measures the 3D shape of partially reflective objects with discontinuous surfaces by combining fringe projection profilometry and DPMD is reviewed. Then, the effects of the main error sources, phase error and geometric calibration error, on the measurement results are analyzed, and the performance of the 3D shape measurement system is evaluated. Finally, future research directions for PMD are discussed.
The 1.89 Å resolution structure of the complex of bovine pancreatic phospholipase A2 (PLA2) with the transition-state analogue L-1-O-octyl-2-heptylphosphonyl-sn-glycero-3-phosphoethanolamine (TSA) has been determined. The crystal of the complex is trigonal, space group P3₁21, a = b = 46.58 and c = 102.91 Å, and isomorphous to the native recombinant wild type (WT). The structure was refined to a final crystallographic R value of 18.0%, including 957 protein atoms, 88 water molecules, one calcium ion and all 31 non-H atoms of the inhibitor at 1.89 Å resolution. In all, 7726 reflections [F > 2σ(F)] were used between 8.0 and 1.89 Å resolution. The inhibitor is deeply locked into the active-site cleft and coordinates to the calcium ion, displacing the two water molecules in the calcium pentagonal bipyramid with the anionic O atoms of the phosphate and phosphonate groups. The hydroxyl group of Tyr69 hydrogen bonds to the second anionic O atom of the phosphate group, while that of the phosphonate group replaces the third, 'catalytic' water, which forms a hydrogen bond to Nδ1 of His48. The fourth water, which also shares Nδ1 of His48, is displaced by steric hindrance from the inhibitor. The fifth conserved structural water is still present in the active site and forms a network of hydrogen bonds with the surrounding residues. The structure is compared with the other known TSA–PLA2 complexes.
This study aims to investigate the role of third-party logistics service providers (3PLs) as orchestrators in the context of supply chain finance (SCF). SCF is the latest phenomenon to have emerged in the field of supply chain management (SCM). It is concerned with the optimization of financial flows and the integration of financing processes across all participating companies within a supply chain. By leveraging supply chain financing, small and medium-sized enterprises (SMEs) are able to obtain funds from banks from which they could not borrow under conventional lending systems. However, to implement an SCF model successfully, the role of 3PLs cannot be ignored. Drawing upon discussions in the literature on supply chain orchestration by 3PLs, this study conceptualizes the orchestrator role of 3PLs in SCF. Three forms of 3PLs as orchestrators in SCF are articulated accordingly.