To address the shortcomings of the ant colony algorithm in complex environments, such as its tendency to fall into local optima and the difficulty of guaranteeing real-time path planning for robots, this paper proposes a dynamic window approach based on an improved ant colony algorithm (IACO-DWA). To avoid blind search by the ants in the early stage, the method designs an adaptive distance heuristic factor and combines it with the max-min ant system (MMAS) to improve the pheromone update rule and prevent convergence to local optima. The probability transfer rule is improved by constructing a corner suppression factor to reduce the number of path inflection points, and the DWA tracks the global path points generated by the ant colony through a newly constructed position evaluation function, yielding a smooth planned trajectory. Simulation results show that the proposed method strengthens global path optimization while achieving local dynamic obstacle avoidance.
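The MMAS-style pheromone update credited above with avoiding local optima can be sketched as follows. This is a minimal illustration, not the paper's implementation; the parameter names (`rho`, `tau_min`, `tau_max`) and the deposit rule are generic MMAS conventions, assumed for exposition.

```python
import numpy as np

def mmas_update(tau, best_path, best_len, rho=0.1, tau_min=0.01, tau_max=1.0):
    """One MMAS pheromone update: evaporate on all edges, deposit only on the
    iteration-best path, then clamp to [tau_min, tau_max] to avoid stagnation."""
    tau = (1.0 - rho) * tau                          # global evaporation
    for i, j in zip(best_path[:-1], best_path[1:]):
        tau[i, j] += 1.0 / best_len                  # deposit on best path only
    return np.clip(tau, tau_min, tau_max)            # MMAS pheromone bounds

tau = np.full((4, 4), 0.5)                           # toy 4-node pheromone matrix
tau = mmas_update(tau, best_path=[0, 1, 3], best_len=2.0)
```

Clamping to `[tau_min, tau_max]` keeps the best edges from dominating the transition probabilities completely, which is the mechanism MMAS uses to escape local optima.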
Optical cryptanalysis based on deep learning (DL) has attracted increasing attention. However, most DL methods are purely data-driven and lack relevant physical priors, which restrains their generalization ability and limits practical applications. In this paper, we demonstrate that double-random phase encoding (DRPE)-based optical cryptosystems are susceptible to a preprocessing ciphertext-only attack (pCOA) based on DL strategies, which achieves high prediction fidelity for complex targets using only one random phase mask (RPM) for training. After preprocessing the ciphertext to extract substantial intrinsic information, a DL method incorporating physical priors is exploited to further learn the statistical invariants across different ciphertexts. As a result, generalization ability is significantly improved by increasing the number of training RPMs. The method also breaks the image-size limitation of traditional COA methods. Optical experiments demonstrate the feasibility and effectiveness of the proposed learning-based pCOA method.
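For readers unfamiliar with the target cryptosystem, the standard 4f DRPE forward model can be sketched in a few lines. This is the textbook encoding, not the attack itself; array sizes and the seeded generator are illustrative assumptions.

```python
import numpy as np

def drpe_encrypt(img, rpm1, rpm2):
    """Double-random phase encoding (4f setup): one random phase mask in the
    spatial domain, a second in the Fourier domain; output is a complex field."""
    field = img * np.exp(2j * np.pi * rpm1)                      # input-plane RPM
    spectrum = np.fft.fft2(field) * np.exp(2j * np.pi * rpm2)    # Fourier-plane RPM
    return np.fft.ifft2(spectrum)                                # ciphertext

rng = np.random.default_rng(0)
img = rng.random((8, 8))                 # toy real-valued plaintext
m1, m2 = rng.random((8, 8)), rng.random((8, 8))
c = drpe_encrypt(img, m1, m2)
```

With both masks known, decryption simply undoes the Fourier-plane mask and takes the modulus; the pCOA setting is harder because the attacker sees only `c`.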
Vision-language models have been widely explored across a wide range of tasks and achieve satisfactory performance. However, it remains under-explored how to consolidate entity understanding from a varying number of images and align it with pre-trained language models for generative tasks. In this paper, we propose MIVC, a general multiple-instance visual component that bridges the gap between variable image inputs and off-the-shelf vision-language models by aggregating visual representations in a permutation-invariant fashion through a neural network. We show that MIVC can be plugged into vision-language models to consistently improve performance on visual question answering, classification, and captioning tasks on a publicly available e-commerce dataset with multiple images per product. Furthermore, we show that the component provides insight into the contribution of each image to the downstream tasks.
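Permutation-invariant aggregation of a variable number of image features can be illustrated with a generic attention-style multiple-instance pooling. This is a sketch of the general idea, not MIVC's actual architecture; the weight shapes and the gating form are assumptions.

```python
import numpy as np

def attention_pool(feats, w, v):
    """Attention pooling over n instance features (n x d). Each score depends
    only on its own instance, so the weighted sum is permutation-invariant,
    and the per-instance weights indicate each image's contribution."""
    scores = np.tanh(feats @ v) @ w                  # (n,) unnormalised scores
    alpha = np.exp(scores) / np.exp(scores).sum()    # softmax attention weights
    return alpha @ feats                             # single pooled vector (d,)

rng = np.random.default_rng(1)
feats = rng.standard_normal((5, 16))     # e.g. 5 product images, 16-d features
v, w = rng.standard_normal((16, 8)), rng.standard_normal(8)
pooled = attention_pool(feats, w, v)
```

Because the output is identical under any reordering of the images, the component can accept however many images a product happens to have.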
Detecting forged videos is highly desirable due to the abuse of deepfakes. Existing detection approaches explore specific artifacts in deepfake videos and fit well on certain data. However, evolving forgery techniques keep challenging the robustness of these artifact-based detectors, and progress on their generalizability has consequently stalled. To address this issue, given the empirical observations that the identities behind voices and faces are often mismatched in deepfake videos, and that voices and faces are homogeneous to some extent, we propose in this paper to perform deepfake detection from a previously unexplored voice-face matching view. To this end, a voice-face matching method is devised to measure the matching degree between the two modalities. Nevertheless, training on specific deepfake datasets makes the model overfit to certain traits of deepfake algorithms. We instead advocate a method that quickly adapts to unseen forgeries via a pre-training then fine-tuning paradigm. Specifically, we first pre-train the model on a generic audio-visual dataset and then fine-tune it on downstream deepfake data. We conduct extensive experiments on three widely used deepfake datasets: DFDC, FakeAVCeleb, and DeepfakeTIMIT. Our method obtains significant performance gains over other state-of-the-art competitors. It is also worth noting that our method achieves competitive results when fine-tuned on limited deepfake data.
Xinya Du, Luheng He, Qi Li, Dian Yu, Panupong Pasupat, Yuan Zhang. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2021.
Detecting forged videos is highly desirable due to the abuse of deepfakes. Existing detection approaches explore specific artifacts in deepfake videos and fit well on certain data. However, evolving forgery techniques keep challenging the robustness of these artifact-based detectors, and their development has consequently stalled. In this article, we propose to perform deepfake detection from a previously unexplored voice-face matching view. Our approach rests on two supporting points: first, there is a high degree of homogeneity between the voice and face of an individual (i.e., they are highly correlated), and second, deepfake videos often involve mismatched identities between voice and face due to face-swapping techniques. To this end, we develop a voice-face matching method that measures the matching degree between these two modalities to identify deepfake videos. Nevertheless, training on specific deepfake datasets makes the model overfit to certain traits of deepfake algorithms. We instead advocate a method that quickly adapts to unseen forgeries via a pre-training then fine-tuning paradigm. Specifically, we first pre-train the model on a generic audio-visual dataset and then fine-tune it on downstream deepfake data. We conduct extensive experiments on three widely used deepfake datasets: DFDC, FakeAVCeleb, and DeepfakeTIMIT. Our method obtains significant performance gains over other state-of-the-art competitors. For instance, it outperforms the baselines by nearly 2%, achieving an AUC of 86.11% on FakeAVCeleb. It is also worth noting that our method achieves competitive results when fine-tuned on limited deepfake data.
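The matching-degree idea behind both deepfake abstracts above can be illustrated with a simple embedding-similarity check. This is only a conceptual sketch: the embedding extractors, the similarity measure, and the threshold value are all assumptions, not the papers' trained models.

```python
import numpy as np

def matching_score(voice_emb, face_emb):
    """Cosine similarity between L2-normalised voice and face embeddings.
    A high score means the two modalities plausibly share an identity."""
    v = voice_emb / np.linalg.norm(voice_emb)
    f = face_emb / np.linalg.norm(face_emb)
    return float(v @ f)

def is_suspect(voice_emb, face_emb, threshold=0.5):
    """Flag a clip as a possible deepfake when voice and face disagree.
    The threshold here is a hypothetical placeholder, not a tuned value."""
    return matching_score(voice_emb, face_emb) < threshold
```

In the papers' setting the embeddings come from a model pre-trained on generic audio-visual data, so the detector does not depend on the artifacts of any particular forgery algorithm.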
Computational ghost imaging (CGI), in which an image is retrieved from the known speckle patterns that illuminate the object and the total transmitted intensity, has advanced greatly because of its advantages and potential applications at all wavelengths. However, achieving high-quality imaging with short acquisition times has proven challenging, especially in color CGI. In this paper, we present a new color CGI method that reconstructs high-fidelity images at a relatively low sampling rate (0.0625) using the plug-and-play generalized alternating projection (PnP-GAP) algorithm. The spatial distribution and color information of the object are simultaneously encoded into a one-dimensional light-intensity sequence, measured by a single-pixel detector, by combining randomly distributed speckle patterns with a Bayer color mask as modulation patterns. A pre-trained deep denoising network is utilized in the PnP-GAP algorithm to achieve better results. Furthermore, a joint reconstruction and demosaicking method is developed to restore the target's color information more realistically. Simulations and optical experiments verify the feasibility and superiority of the proposed scheme in comparison with other classical reconstruction algorithms. This new color CGI scheme will enable CGI to acquire information from real scenes more effectively and further promote its practical applications.
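The CGI measurement model underlying this scheme can be sketched with a basic differential correlation reconstruction. This illustrates only the single-pixel forward model and the classical baseline, not PnP-GAP itself (which replaces the correlation step with projections plus a learned denoiser); the toy object and pattern count are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 16
obj = np.zeros((n, n))
obj[4:12, 4:12] = 1.0                      # toy binary object

# each speckle pattern illuminates the object; a single-pixel ("bucket")
# detector records only the total transmitted intensity per pattern
patterns = rng.random((400, n, n))
y = (patterns * obj).sum(axis=(1, 2))      # 1-D intensity sequence

# classical differential ghost-imaging estimate: correlate the fluctuations
# of the bucket signal with the fluctuations of the patterns
recon = ((y - y.mean())[:, None, None] * (patterns - patterns.mean(axis=0))).mean(axis=0)
```

The correlation estimate recovers the object up to scale and noise; PnP-GAP-style methods get far better quality at low sampling rates by iterating a data-consistency projection with a pre-trained denoiser.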
Given a query graph, subgraph matching is the process of finding all isomorphic subgraphs in a large data graph. It is one of the fundamental steps of many graph-based applications, including recommendation systems, information retrieval, and social network analysis. In this paper, we investigate the problem of subgraph matching over a power grid knowledge graph. Since a knowledge graph is modelled as a directed, labelled graph with multiple edges, it brings new challenges for subgraph matching. One challenge is that the complexity of computing matching candidates grows as the number of edges increases. Another is that the search space of isomorphic subgraphs for a given region is huge, requiring more system resources to prune unpromising candidates. To address these challenges, we propose a subgraph index to accelerate the processing of subgraph queries. We use domain-specific information to build an index over the power grid knowledge graph and maintain only a small portion of candidates in the search space. Experimental studies on a real knowledge graph and synthetic graphs demonstrate that the proposed techniques are efficient compared with their counterparts.
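The candidate-pruning role of such an index can be sketched with a simple label-plus-degree filter. This is a generic illustration of index-based pruning, not the paper's index; the power-grid labels and the adjacency structure are hypothetical.

```python
from collections import defaultdict

def build_index(vertex_labels):
    """Label -> vertex-set index over the data graph."""
    idx = defaultdict(set)
    for v, lbl in vertex_labels.items():
        idx[lbl].add(v)
    return idx

def candidates(q_label, q_degree, idx, adjacency):
    """Only data vertices with the same label and at least the query vertex's
    degree can match, so everything else is pruned before enumeration."""
    return {v for v in idx.get(q_label, set()) if len(adjacency[v]) >= q_degree}

# toy power-grid-style data graph (labels and topology are made up)
labels = {1: "bus", 2: "bus", 3: "line", 4: "transformer"}
adj = {1: {3}, 2: {3, 4}, 3: {1, 2}, 4: {2}}
idx = build_index(labels)
```

Restricting enumeration to these candidate sets is what keeps the isomorphism search space small; real systems layer further domain-specific filters on top.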