Camcorder piracy has a great impact on the movie industry. Although there are many methods to prevent recording in theatres, no recognized technology defeats camcorder piracy without affecting the audience. This paper presents a new projector display technique to defeat camcorder piracy in the theatre using a new paradigm of information display technology called temporal psychovisual modulation (TPVM). TPVM exploits the difference between the image formation mechanisms of human eyes and imaging sensors: the image formed in human vision is a continuous integration of the light field, whereas digital video acquisition uses discrete sampling with a "blackout" period in each sampling cycle. Based on this difference, we can decompose a movie into a set of display frames and project them at high speed so that the audience cannot notice any disturbance, while the video frames captured by a camcorder contain highly objectionable artifacts. The proposed prototype system, built on the platform of the DLP® LightCrafter 4500™, serves as a proof of concept of the anti-piracy system.
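To make the mechanism concrete, below is a minimal sketch of one possible two-atom-frame decomposition, assuming normalized pixel values and a 2x display speed-up; the paper's actual decomposition algorithm is not specified here, and `decompose_frame` and its interference pattern are illustrative choices only.

```python
# Illustrative TPVM-style atom-frame decomposition (an assumption, not the
# system's actual algorithm). Each movie frame F in [0, 1] is split into two
# atom frames whose temporal average reconstructs F for the human eye, while
# either atom frame alone -- what a short camcorder exposure captures --
# carries a strong interference pattern.
import numpy as np

def decompose_frame(frame: np.ndarray, rng: np.random.Generator):
    """Split one frame into two atom frames with (a1 + a2) / 2 == frame."""
    pattern = rng.uniform(-1.0, 1.0, size=frame.shape)   # interference pattern
    headroom = np.minimum(frame, 1.0 - frame)            # keep atoms in [0, 1]
    p = pattern * headroom
    return frame + p, frame - p                          # two atom frames

rng = np.random.default_rng(0)
frame = rng.uniform(0.0, 1.0, size=(4, 4))               # toy 4x4 "movie frame"
a1, a2 = decompose_frame(frame, rng)
assert np.allclose((a1 + a2) / 2.0, frame)               # eye integrates to F
assert a1.min() >= 0.0 and a1.max() <= 1.0               # displayable range
```

A camcorder exposure that lands on a single atom frame records the full-strength interference pattern, while the eye's integration averages the pattern away.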
Traditional image steganography focuses on concealing one image within another, aiming to avoid steganalysis by unauthorized entities. Coverless image steganography (CIS) enhances imperceptibility by not using any cover image. Recent works have used text prompts as keys in CIS through diffusion models. However, this approach faces three challenges: it is invalidated once the private prompt is guessed, crafting public prompts for semantic diversity is difficult, and there is a risk of prompt leakage during frequent transmission. To address these issues, we propose DiffStega, an innovative training-free diffusion-based CIS strategy for universal application. DiffStega uses a password-dependent reference image as an image prompt alongside the text, ensuring that only authorized parties can retrieve the hidden information. Furthermore, we develop a Noise Flip technique to further secure the steganography against unauthorized decryption. To comprehensively assess our method across general CIS tasks, we create a dataset comprising various image steganography instances. Experiments indicate substantial improvements of our method over existing ones, particularly in versatility, password sensitivity, and recovery quality. Code is available at \url{https://github.com/evtricks/DiffStega}.
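The abstract does not detail the Noise Flip technique; the sketch below is one hedged guess at such a mechanism, in which a password-seeded sign mask flips the initial diffusion noise (the function name and the SHA-256 seeding are my assumptions, not DiffStega's published algorithm). Because the flip is an involution and sign-flipped Gaussian noise keeps its distribution, only the correct password restores the latent.

```python
# Hedged guess at a password-conditioned "noise flip" (not DiffStega's exact
# algorithm): flip the sign of the initial diffusion noise according to a
# password-seeded binary mask.
import hashlib
import numpy as np

def password_flip(noise: np.ndarray, password: str) -> np.ndarray:
    """Flip the sign of the latent noise with a password-derived mask."""
    seed = int.from_bytes(hashlib.sha256(password.encode()).digest()[:8], "big")
    mask = np.random.default_rng(seed).integers(0, 2, size=noise.shape)
    return np.where(mask == 1, -noise, noise)   # involution: apply twice to undo

noise = np.random.default_rng(1).standard_normal((8, 8))
flipped = password_flip(noise, "s3cret")
assert np.allclose(password_flip(flipped, "s3cret"), noise)     # right password inverts
assert not np.allclose(password_flip(flipped, "wrong"), noise)  # wrong one does not
```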
Dual-language annotation for in-theatre movie exhibition is a useful technique for helping audiences from different nations and with different cultural backgrounds understand a movie. Currently, the most popular solution is the direct superimposition of subtitles in a pair of languages over the movie, e.g. English + Chinese, as can often be seen in cinemas in China today. An obvious drawback of this straightforward solution is that the subtitle area often occludes the movie content and can even become annoying to the audience. In this paper, we propose a new solution to the dual-language subtitling problem based on spatial psychovisual modulation (SPVM). SPVM is a new paradigm of information display that exploits the mismatch between the high resolution of modern optoelectronic displays and the limited spatial resolution of the human visual system (HVS). In this work, we design a simultaneous dual-subtitle exhibition system using two synchronized projectors with linear polarization filters and polarization glasses. Most of the audience can enjoy the movie as usual with a default subtitle language, say English, while others have the option of seeing only the subtitle in the other language, e.g. Chinese, by wearing the polarization glasses. We have implemented the system, and experimental results are demonstrated to justify the effectiveness and robustness of the proposed dual-subtitle exhibition system.
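As a toy illustration, assuming purely additive light in which bare eyes see the sum of the two projections p1 + p2 while the polarization glasses pass only p2, the two projector frames can be solved for as below; this simplification is my own, and real projector calibration, gamma, and polarizer crosstalk are ignored.

```python
# Toy two-projector composition under an additive-light assumption (my
# simplification, not the paper's calibrated pipeline).
import numpy as np

def compose(target_naked: np.ndarray, target_glasses: np.ndarray):
    """Solve p1 + p2 = target_naked (bare eyes) and p2 = target_glasses
    (through the polarizer). Feasible only where target_naked >= target_glasses,
    i.e. the glasses-only subtitle must fit in the luminance headroom."""
    p2 = target_glasses
    p1 = target_naked - p2
    assert (p1 >= 0).all(), "infeasible: hidden subtitle brighter than the scene"
    return p1, p2

movie = np.full((4, 4), 0.8)            # toy uniform frame
target_naked = movie.copy()
target_naked[0, :] = 1.0                # default (English) subtitle band, bare eyes
target_glasses = 0.5 * movie            # glasses: dimmer movie...
target_glasses[3, :] = 0.7              # ...plus the other-language subtitle band
p1, p2 = compose(target_naked, target_glasses)
assert np.allclose(p1 + p2, target_naked)   # bare eyes never see the hidden band
```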
For meshes, sharing the topology of a template is a common and practical setting in face-, hand-, and body-related applications. Meshes are irregular: each vertex's neighbors are unordered, and their orientations are inconsistent across vertices. Previous methods overcome this irregularity with isotropic filters, predefined local coordinate systems, or learned weighting matrices for each vertex of the template. Learning a weighting matrix per vertex, which soft-permutes the vertex's neighbors into an implicit canonical order, is an effective way to capture the local structure of each vertex. However, it makes the parameter size grow linearly with the number of vertices, so large numbers of parameters are required for high-resolution 3D shapes. In this paper, we instead learn a spectral dictionary (i.e., bases) for the weighting matrices, so that the parameter size is independent of the resolution of the 3D shapes. The coefficients of the weighting-matrix bases for each vertex are learned from the spectral features of the template's vertex and its neighbors in a weight-sharing manner. Comprehensive experiments demonstrate that our model produces state-of-the-art results with a much smaller model size.
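A hedged PyTorch sketch of this parameterization is given below; the class, tensor shapes, and MLP are my own illustrative choices, not the paper's exact architecture. The dictionary of K x K bases and the coefficient MLP are shared by all vertices, so the parameter count does not grow with the vertex count V.

```python
# Illustrative layer: per-vertex weighting matrices are linear combinations of
# a small learned dictionary of bases, with coefficients predicted from the
# template's spectral features by a weight-shared MLP (shapes are assumptions).
import torch
import torch.nn as nn

class SpectralDictConv(nn.Module):
    def __init__(self, c_in, c_out, k_neighbors, n_bases, spec_dim):
        super().__init__()
        # dictionary of weighting-matrix bases, shared by all vertices
        self.bases = nn.Parameter(torch.randn(n_bases, k_neighbors, k_neighbors) * 0.1)
        self.coeff_mlp = nn.Sequential(           # weight-sharing across vertices
            nn.Linear(spec_dim, 64), nn.ReLU(), nn.Linear(64, n_bases))
        self.lin = nn.Linear(k_neighbors * c_in, c_out)

    def forward(self, feats, neighbors, spec):
        # feats: (V, C_in); neighbors: (V, K) vertex indices; spec: (V, spec_dim)
        coeffs = self.coeff_mlp(spec)                          # (V, n_bases)
        w = torch.einsum('vb,bkj->vkj', coeffs, self.bases)    # per-vertex (K, K)
        nb = feats[neighbors]                                  # (V, K, C_in)
        nb = torch.einsum('vkj,vjc->vkc', w, nb)               # soft-permute neighbors
        return self.lin(nb.reshape(nb.shape[0], -1))           # (V, C_out)

# toy usage on a 5-vertex template with 3 neighbors per vertex
V, K = 5, 3
layer = SpectralDictConv(c_in=8, c_out=16, k_neighbors=K, n_bases=4, spec_dim=6)
out = layer(torch.randn(V, 8), torch.randint(0, V, (V, K)), torch.randn(V, 6))
print(out.shape)  # torch.Size([5, 16])
```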
Imagine an interesting situation: when watching a movie, we can scan the screen with our smartphones to get extra information about the movie, such as the cast, the release date, and the movie's homepage. Our vision is a world where every video carries invisible information that can be delivered to us through camera-equipped mobile devices. This paper proposes the first deep learning-based information hiding method for videos that achieves information transmission from screens to cameras. Compared with hiding information in single images, methods for videos need to maintain visual quality in both the spatial and temporal domains. Furthermore, training video models requires a large video dataset and thus far more computational resources than training image models. To reduce the computational cost, we propose to simulate data on the fly, generating simulated sequences from single images. We then use the simulated data to train a spatio-temporal generator that hides information in videos while maintaining visual quality. During training, a temporal loss function based on the simulated data is exploited to ensure the temporal consistency of the generated videos. After embedding, we use a decoder to recover the hidden information. To simulate the real-world imaging pipeline from screens to cameras, we insert a distortion network between the generator and the decoder. The distortion network is based on differentiable 3D rendering and covers the distortions possibly introduced during camera imaging. Experimental results show that the hidden information in videos can be extracted by cameras without impacting visual quality. Our work can be applied to many fields, such as advertising, entertainment, and education.
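As a minimal illustration, one plausible instantiation of on-the-fly simulation is to jitter a single image with small random shifts standing in for camera motion, with a temporal loss that penalizes only those frame-to-frame changes of the generated sequence not explained by the simulated motion; the functions below are my simplified assumptions, not the paper's exact pipeline.

```python
# Simplified on-the-fly sequence simulation from one image (random shifts as a
# stand-in for camera jitter; the paper's simulation may use richer warps).
import numpy as np

def simulate_sequence(image: np.ndarray, n_frames: int, max_shift: int,
                      rng: np.random.Generator) -> np.ndarray:
    """image: (H, W, C) float array -> (n_frames, H, W, C) jittered sequence."""
    frames = []
    for _ in range(n_frames):
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        frames.append(np.roll(image, shift=(dy, dx), axis=(0, 1)))
    return np.stack(frames)

def temporal_loss(gen: np.ndarray, sim: np.ndarray) -> float:
    """Penalize generated frame-to-frame changes beyond the simulated motion."""
    return float(np.mean(((gen[1:] - gen[:-1]) - (sim[1:] - sim[:-1])) ** 2))

rng = np.random.default_rng(0)
sim = simulate_sequence(rng.random((32, 32, 3)), n_frames=4, max_shift=2, rng=rng)
gen = sim + 0.01 * rng.standard_normal(sim.shape)   # stand-in for generator output
print(temporal_loss(gen, sim))                      # small residual flicker only
```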
With the rise of cameras and smart sensors, humanity generates an exponentially growing amount of data. This valuable information, including underrepresented cases such as AI in medical settings, can fuel new deep-learning tools. However, data scientists must ensure privacy for the individuals in these untapped datasets, especially for images or videos with faces, which are prime targets for identification methods. Proposed solutions for de-identifying such images often compromise non-identifying facial attributes relevant to downstream tasks. In this paper, we introduce Disguise, a novel algorithm that seamlessly de-identifies facial images while preserving the usability of the modified data. Unlike previous approaches, our solution is firmly grounded in differential-privacy and ensemble-learning research. Our method extracts the depicted identities and substitutes them with synthetic ones, generated using variational mechanisms so as to maximize obfuscation and non-invertibility. Additionally, we leverage supervision from a mixture-of-experts to disentangle and preserve other utility attributes. We extensively evaluate our method on multiple datasets, demonstrating a higher de-identification rate and superior consistency compared to prior approaches across various downstream tasks.
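The sketch below illustrates one hedged reading of the variational substitution step (function names and the loss are my assumptions, not Disguise's published formulation): a synthetic identity embedding is sampled with the reparameterization trick and pushed away from the original embedding to encourage obfuscation and non-invertibility.

```python
# Hedged reading of variational identity substitution (names are mine): sample
# a synthetic identity embedding and penalize similarity to the original.
import torch
import torch.nn.functional as F

def substitute_identity(mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
    eps = torch.randn_like(mu)                       # reparameterization trick
    synthetic = mu + torch.exp(0.5 * log_var) * eps  # sampled synthetic identity
    return F.normalize(synthetic, dim=-1)

def obfuscation_loss(original: torch.Tensor, synthetic: torch.Tensor) -> torch.Tensor:
    # minimizing this maximizes angular distance between real and synthetic IDs
    return F.cosine_similarity(original, synthetic, dim=-1).mean()

orig = F.normalize(torch.randn(2, 128), dim=-1)      # toy identity embeddings
mu, log_var = torch.zeros(2, 128), torch.zeros(2, 128)
syn = substitute_identity(mu, log_var)
print(obfuscation_loss(orig, syn))                   # lower = better disguised
```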
Accurate lesion detection in retinopathy images is crucial for the diagnosis of diabetes. However, it is hampered by the varied characteristics of lesions, such as shape, color, texture, and their mutual similarities. Most advanced algorithms still cannot automatically detect the common lesions used for comprehensive analysis of the disease state, e.g. exudates, hemorrhages, and cotton-wool spots. To this end, we present a multi-functional detection model for diabetic retinopathies and further analyze the overall disease mechanisms. Specifically, this paper implements a multi-lesion detector via a modified Mask region-based CNN (Mask R-CNN), which can be used for the aforementioned retinopathies. Meanwhile, a non-local attention module is introduced into the detector as a spatial attention mechanism to address the problem of missing global information. In addition, to boost the effectiveness of the detector, a dilation operation is applied for dataset preprocessing. Improvements are achieved both algorithmically and architecturally by thoroughly investigating the most probable lesion category with a novel ensemble-learning framework. Extensive experiments on standard datasets for three different tasks evidence the superior performance of the proposed method over state-of-the-art methods.
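For reference, a standard non-local block of the kind the abstract describes can be sketched in PyTorch as below, following the common self-attention formulation; where and how the detector integrates it is an assumption on my part.

```python
# Standard non-local (self-attention) block for injecting global context into
# a detector feature map (the common formulation; integration is an assumption).
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.inter = channels // 2
        self.theta = nn.Conv2d(channels, self.inter, 1)   # query
        self.phi = nn.Conv2d(channels, self.inter, 1)     # key
        self.g = nn.Conv2d(channels, self.inter, 1)       # value
        self.out = nn.Conv2d(self.inter, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)      # (B, HW, C')
        k = self.phi(x).flatten(2)                        # (B, C', HW)
        v = self.g(x).flatten(2).transpose(1, 2)          # (B, HW, C')
        attn = torch.softmax(q @ k, dim=-1)               # global pairwise affinity
        y = (attn @ v).transpose(1, 2).reshape(b, self.inter, h, w)
        return x + self.out(y)                            # residual connection

feat = torch.randn(1, 64, 16, 16)                         # toy backbone feature map
print(NonLocalBlock(64)(feat).shape)                      # torch.Size([1, 64, 16, 16])
```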
This paper introduces an information security display system based on temporal psychovisual modulation (TPVM). TPVM was proposed as a new information display technology arising from the interplay of signal processing, optoelectronics, and psychophysics. Since the human visual system cannot detect rapid temporal changes above the flicker fusion frequency (about 60 Hz), yet modern display technologies offer much higher refresh rates, a single display can simultaneously serve different contents to multiple observers. A TPVM display broadcasts a set of images called atom frames at high speed; the atom frames are then weighted by liquid crystal (LC) shutter-based viewing devices synchronized with the display before entering the human visual system and fusing into the desired visual stimuli. Through different viewing devices, people thus see different information. In this work, we develop a TPVM-based information security display prototype. There are two kinds of viewers: authorized viewers with viewing devices, who can see the secret information, and unauthorized viewers (bystanders) without viewing devices, who only see mask/disguise images. The prototype is built on a 120 Hz LCD screen with synchronized LC shutter glasses originally developed for stereoscopic display. The system is written in C++ with the SDKs of Nvidia 3D Vision, DirectX, CEGUI, MuPDF, etc. We also added human-computer interaction support using Kinect. The information security display system developed in this work serves as a proof of concept of the TPVM paradigm, as well as a testbed for future research on TPVM technology.
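A minimal two-atom-frame sketch of the weighting idea follows, assuming a bystander's eye averages both atom frames on the 120 Hz display while an authorized viewer's shutter opens only during the first one; the feasibility condition and mid-gray mask are my illustrative choices, not the prototype's actual scheme.

```python
# Two-atom-frame weighting sketch (a simplification of the TPVM scheme): bare
# eyes fuse (a1 + a2) / 2 into the mask image; a shutter open only during a1
# passes the secret image alone.
import numpy as np

def atom_frames(secret: np.ndarray, mask: np.ndarray):
    a1 = secret
    a2 = 2.0 * mask - secret          # chosen so that (a1 + a2) / 2 == mask
    assert (a2 >= 0).all() and (a2 <= 1).all(), \
        "infeasible: mask must leave luminance headroom for the secret"
    return a1, a2

secret = np.random.default_rng(2).uniform(0.2, 0.8, size=(4, 4))
mask = np.full((4, 4), 0.5)           # mid-gray disguise image
a1, a2 = atom_frames(secret, mask)
assert np.allclose((a1 + a2) / 2.0, mask)   # bystanders fuse to the mask only
```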
Camcorder piracy has a great impact on the movie industry. Although there are many methods to prevent recording in theatres, no recognized technology defeats camcorder piracy without affecting the audience. To realize anti-piracy display, we use a new paradigm of information display technology called temporal psychovisual modulation (TPVM). TPVM exploits the difference between the image formation mechanisms of human eyes and imaging sensors. Based on this difference, we build a prototype system on the platform of the DLP® LightCrafter 4500™, which features high-speed pattern display. The display system serves as a proof of concept of the anti-piracy system.