With the rapid growth of image and video data on the web, hashing has been extensively studied for image and video search in recent years. Benefiting from recent advances in deep learning, deep hashing methods have achieved promising results for image retrieval. However, previous deep hashing methods have some limitations (e.g., the semantic information is not fully exploited). In this paper, we develop a deep supervised discrete hashing algorithm based on the assumption that the learned binary codes should be ideal for classification. Both the pairwise label information and the classification information are used to learn the hash codes within a one-stream framework. We constrain the outputs of the last layer to be binary codes directly, which is rarely investigated in deep hashing algorithms. Because of the discrete nature of hash codes, an alternating minimization method is used to optimize the objective function. Experimental results show that our method outperforms current state-of-the-art methods on benchmark datasets.
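For intuition, the sketch below shows one way the two loss terms described above could be combined on top of a CNN feature extractor: a hash layer produces K-bit codes, a linear classifier operates directly on those codes, and a pairwise term pulls the inner products of codes toward the pairwise similarity labels. The layer names, dimensions, and the tanh relaxation are illustrative assumptions; the paper itself keeps the codes discrete and optimizes them with alternating minimization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of a one-stream hashing head combining a pairwise similarity
# loss and a classification loss on (relaxed) binary codes. HashHead, K, and
# num_classes are placeholders; a tanh relaxation stands in for the paper's
# truly discrete codes optimized by alternating minimization.
class HashHead(nn.Module):
    def __init__(self, feat_dim=2048, K=48, num_classes=10):
        super().__init__()
        self.hash = nn.Linear(feat_dim, K)                 # last layer -> K-bit codes
        self.classifier = nn.Linear(K, num_classes, bias=False)

    def forward(self, feats):
        codes = torch.tanh(self.hash(feats))               # relaxed codes in (-1, 1)
        logits = self.classifier(codes)                     # codes should also classify well
        return codes, logits

def hashing_loss(codes, logits, labels, sim, alpha=1.0):
    # Pairwise term: inner products of codes should agree with the
    # similarity matrix sim[i, j] = 1 if samples i and j share a label, else 0.
    inner = codes @ codes.t() / 2.0
    pairwise = (F.softplus(inner) - sim * inner).mean()
    # Classification term: the codes themselves should be discriminative.
    cls = F.cross_entropy(logits, labels)
    return pairwise + alpha * cls
```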
Hand pose estimation is the basis of dynamic gesture recognition. In vision-based hand pose estimation, the joints of the human hand are highly flexible, and problems such as local similarity and severe occlusion greatly affect the estimation of hand posture. To recognize complicated hand postures, this paper establishes the structural relationship between hand joints and achieves more accurate hand pose estimation through an improved Nonparametric Structure Regularization Machine (NSRM). Based on the NSRM network, the backbone is replaced with the New High-Resolution Net (NHRNet), and the input and output channels of some convolutional layers are reduced. Finally, hand pose estimation experiments are conducted on a public dataset. The experimental results show that the optimized NSRM network achieves higher accuracy and faster recognition speed for hand pose estimation.
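As a rough illustration of the channel-reduction step mentioned above, the toy block below scales the output channels of a convolution by a width multiplier. The actual NSRM/NHRNet layer definitions are not given in the abstract, so the names and values here are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

# Toy illustration only: shrink a conv block's output channels with a width
# multiplier to cut parameters and inference time.
class SlimConvBlock(nn.Module):
    def __init__(self, in_ch, out_ch, width_mult=0.5):
        super().__init__()
        slim_out = max(1, int(out_ch * width_mult))        # fewer output channels
        self.conv = nn.Conv2d(in_ch, slim_out, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(slim_out)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

# Replacing a 256->256 block with SlimConvBlock(256, 256, width_mult=0.5)
# keeps the spatial resolution but halves the output channels.
x = torch.randn(1, 256, 64, 64)
print(SlimConvBlock(256, 256)(x).shape)                    # torch.Size([1, 128, 64, 64])
```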
To address the problem that feature extraction based on the Gabor wavelet transform yields high-dimensional feature vectors, a novel method named GCLBP (Gabor-CSLBP) is proposed in this paper. Built on the Gabor wavelet transform, the proposed algorithm is a local feature extraction method that extracts a new kind of feature by applying the idea of CS-LBP (Center-Symmetric Local Binary Pattern) to the sub-images produced by the Gabor transform. The feature vector obtained by the GCLBP method combines the advantages of the Gabor wavelet transform and CS-LBP: it not only reduces the dimensionality of the feature vector but also improves robustness to image variations. The proposed method is evaluated by extensive experiments on the benchmark databases CMU PIE and Extended Yale B. The experimental results show that the proposed GCLBP method can significantly improve the face recognition rate under complex illumination.
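The following sketch illustrates the pipeline in the order described above: the face image is filtered with a small Gabor bank, each filtered sub-image is encoded with CS-LBP (the four center-symmetric neighbour pairs of the 8-neighbourhood give a 4-bit, 16-bin code), and the per-orientation histograms are concatenated. Kernel size, scale, threshold, and the number of orientations are illustrative assumptions rather than the paper's settings.

```python
import cv2
import numpy as np

def cs_lbp(img, thresh=0.01):
    # Compare the four center-symmetric neighbour pairs of each interior pixel.
    pairs = [((-1, -1), (1, 1)), ((-1, 0), (1, 0)),
             ((-1, 1), (1, -1)), ((0, -1), (0, 1))]
    h, w = img.shape
    code = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for bit, ((dy1, dx1), (dy2, dx2)) in enumerate(pairs):
        a = img[1 + dy1:h - 1 + dy1, 1 + dx1:w - 1 + dx1]
        b = img[1 + dy2:h - 1 + dy2, 1 + dx2:w - 1 + dx2]
        code |= ((a - b) > thresh).astype(np.uint8) << bit
    return code

def gclbp_descriptor(gray, thetas=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    # Gabor bank -> CS-LBP codes per sub-image -> concatenated 16-bin histograms.
    gray = gray.astype(np.float32) / 255.0
    hists = []
    for theta in thetas:
        kern = cv2.getGaborKernel((15, 15), 3.0, theta, 8.0, 0.5, 0)
        sub = cv2.filter2D(gray, cv2.CV_32F, kern)
        codes = cs_lbp(sub)
        hist, _ = np.histogram(codes, bins=16, range=(0, 16))
        hists.append(hist / max(hist.sum(), 1))
    return np.concatenate(hists)   # 4 orientations x 16 bins = 64-D vector
```

In practice the histograms would typically be computed per spatial block rather than over the whole image, but the block grid is omitted here for brevity.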
Vision-language models have been widely explored across a wide range of tasks and have achieved satisfactory performance. However, it remains under-explored how to consolidate entity understanding from a varying number of images and align it with pre-trained language models for generative tasks. In this paper, we propose MIVC, a general multiple-instance visual component that bridges the gap between varying image inputs and off-the-shelf vision-language models by aggregating visual representations in a permutation-invariant fashion through a neural network. We show that MIVC can be plugged into vision-language models to consistently improve performance on visual question answering, classification, and captioning tasks on a publicly available e-commerce dataset with multiple images per product. Furthermore, we show that the component provides insight into the contribution of each image to the downstream tasks.
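As a concrete picture of "aggregating visual representations in a permutation-invariant fashion through a neural network", the sketch below uses a generic attention-pooling layer over per-image embeddings. It is not the exact MIVC architecture, and all dimensions and layer names are assumptions; the attention weights merely hint at how per-image contributions could be inspected.

```python
import torch
import torch.nn as nn

# Generic permutation-invariant pooling over a variable number of per-image
# embeddings (dimensions and names are placeholders, not the MIVC design).
class AttentionPool(nn.Module):
    def __init__(self, dim=768, hidden=256):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )

    def forward(self, image_embs, mask=None):
        # image_embs: (batch, n_images, dim); mask: (batch, n_images), 1 = real image
        scores = self.score(image_embs).squeeze(-1)                 # (batch, n_images)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float('-inf'))
        weights = scores.softmax(dim=-1)                            # attention over images
        pooled = (weights.unsqueeze(-1) * image_embs).sum(dim=1)    # (batch, dim)
        return pooled, weights   # weights expose each image's contribution
```

The pooled vector can then be passed through whatever visual projection the host vision-language model already uses, so the rest of the pipeline is unchanged.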
In this paper, we propose a leg-driven physiology framework for pedestrian detection. The framework is introduced to reduce the search space of candidate regions of pedestrians. Given a set of vertical line segments, we generate a space of rectangular candidate regions based on a model of body proportions. The proposed framework can be used either with or without learning-based pedestrian detection methods to validate the candidate regions. A symmetry constraint is then applied to validate each candidate region and decrease the false positive rate. The experiments demonstrate the promising results of the proposed method in comparison with the Dalal & Triggs method. For example, the rectangular regions detected by the proposed method have areas much closer to the ground truth than those detected by the Dalal & Triggs method.
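A minimal sketch of the leg-driven idea follows: each vertical line segment is treated as a candidate leg, a full-body rectangle is hypothesised above it from body-proportion ratios, and a simple left-right symmetry score on the gradient magnitude filters the candidates. All ratios and thresholds are illustrative assumptions, not the paper's values.

```python
import numpy as np

def candidate_from_leg(x, y_top, y_bottom, leg_ratio=0.5, aspect=0.41):
    # Hypothesise a full-body box from a vertical (leg) segment, assuming the
    # legs span roughly half the body height and a typical pedestrian aspect ratio.
    leg_len = y_bottom - y_top
    height = leg_len / leg_ratio
    width = aspect * height
    return int(x - width / 2), int(y_bottom - height), int(x + width / 2), int(y_bottom)

def symmetry_score(grad_mag, box):
    # Compare the left half of the gradient-magnitude patch with the mirrored
    # right half; higher scores mean a more left-right symmetric region.
    x1, y1, x2, y2 = box
    patch = grad_mag[max(y1, 0):y2, max(x1, 0):x2]
    if patch.size == 0 or patch.shape[1] < 2:
        return 0.0
    half = patch.shape[1] // 2
    left, right = patch[:, :half], patch[:, -half:][:, ::-1]
    return 1.0 / (1.0 + np.abs(left - right).mean())

# Candidates whose symmetry score falls below a threshold would be rejected
# before (or instead of) running a learned classifier such as HOG+SVM.
```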
In recent years, domain adaptation techniques have been widely used to adapt face anti-spoofing models to a cross-scenario target domain. Most previous methods assume that the Presentation Attack Instruments (PAIs) in such a cross-scenario target domain are the same as those in the source domain. However, as malicious users are free to use any form of unknown PAI to attack the system, this assumption does not always hold in practical applications of face anti-spoofing. Thus, unknown PAIs inevitably lead to significant performance degradation, since samples of known and unknown PAIs usually differ substantially. In this paper, we propose an Evidential Semantic Consistency Learning (ESCL) framework to address this problem. Specifically, a regularized evidential deep learning strategy with a two-way balance of class probability and uncertainty is leveraged to produce uncertainty scores for unknown PAI detection. Meanwhile, an entropy-optimization-based semantic consistency learning strategy is also employed to encourage features of live faces and known PAIs to gather in label-conditioned clusters across the source and target domains, while making the features of unknown PAIs self-cluster according to their intrinsic semantic information. In addition, a new evaluation metric, KUHAR, is proposed to comprehensively evaluate the error rate on known classes and unknown PAIs. Extensive experimental results on six public datasets demonstrate the effectiveness of our method in generalizing face anti-spoofing models to both known classes and unknown PAIs of varying types and quantities in a cross-scenario testing domain. Our method achieves state-of-the-art performance on eight different protocols.
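For readers unfamiliar with evidential deep learning, the sketch below shows how a per-sample uncertainty score of the kind used for unknown-PAI detection can be computed from classifier logits via a Dirichlet parameterisation. The ESCL regularization and the entropy-based consistency terms are omitted, and the threshold mentioned in the comment is a placeholder.

```python
import torch
import torch.nn.functional as F

def evidential_outputs(logits):
    # Derive non-negative evidence from the logits, form a Dirichlet, and use
    # its vacuity u = K / sum(alpha) as the per-sample uncertainty score.
    evidence = F.softplus(logits)                       # non-negative evidence per class
    alpha = evidence + 1.0                              # Dirichlet concentration parameters
    strength = alpha.sum(dim=-1, keepdim=True)
    prob = alpha / strength                             # expected class probabilities
    K = logits.size(-1)
    uncertainty = K / strength.squeeze(-1)              # vacuity: high for unknown PAIs
    return prob, uncertainty

# A target-domain sample would be flagged as an unknown PAI when its
# uncertainty exceeds a validation-chosen threshold, e.g. uncertainty > 0.7.
```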
To improve the real-time performance of the meanshift algorithm on an embedded system, an improved meanshift algorithm for tracking moving targets is proposed in this paper. To reduce the influence of background pixels in the target model, the target model is built from the target models of consecutive frames; to reduce the number of iterations, a Kalman filter is used to predict the position of the moving object in the current frame; and to improve the accuracy of the target model, it is updated in real time. Finally, the improved algorithm is implemented on a DM6437 platform, and the experimental results show that it can track moving objects effectively.
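The following desktop OpenCV sketch mirrors the loop described above (Kalman prediction of the search window, meanshift refinement, then an on-line update of the colour model). It is not the DM6437 implementation; the multi-frame model construction is simplified to a single running average, and the learning rate is an assumed value.

```python
import cv2
import numpy as np

def make_kalman(x, y):
    kf = cv2.KalmanFilter(4, 2)              # state: x, y, vx, vy; measurement: x, y
    kf.transitionMatrix = np.array([[1, 0, 1, 0], [0, 1, 0, 1],
                                    [0, 0, 1, 0], [0, 0, 0, 1]], np.float32)
    kf.measurementMatrix = np.eye(2, 4, dtype=np.float32)
    kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-3
    kf.statePost = np.array([[x], [y], [0], [0]], np.float32)
    return kf

def track(frames, init_box, lr=0.1):
    x, y, w, h = init_box
    kf = make_kalman(x + w / 2, y + h / 2)
    hsv_roi = cv2.cvtColor(frames[0][y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv_roi], [0], None, [16], [0, 180])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    for frame in frames[1:]:
        cx, cy = kf.predict()[:2].ravel()                       # predicted centre
        box = (int(cx - w / 2), int(cy - h / 2), w, h)           # start meanshift there
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        back = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
        _, box = cv2.meanShift(back, box, term)                  # refine the window
        bx, by, bw, bh = box
        kf.correct(np.array([[bx + bw / 2], [by + bh / 2]], np.float32))
        new_roi = cv2.cvtColor(frame[by:by + bh, bx:bx + bw], cv2.COLOR_BGR2HSV)
        new_hist = cv2.calcHist([new_roi], [0], None, [16], [0, 180])
        cv2.normalize(new_hist, new_hist, 0, 255, cv2.NORM_MINMAX)
        hist = ((1 - lr) * hist + lr * new_hist).astype(np.float32)  # on-line model update
        yield box
```

Starting meanshift at the Kalman-predicted centre is what cuts the iteration count: the search window is already close to the target, so convergence typically takes only a few shifts.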