LoRA-finetuning quantization of LLMs has been extensively studied to obtain accurate yet compact LLMs for deployment on resource-constrained hardware. However, existing methods cause the quantized LLM to degrade severely and even fail to benefit from LoRA finetuning. This paper proposes a novel IR-QLoRA for pushing quantized LLMs with LoRA to be highly accurate through information retention. The proposed IR-QLoRA mainly relies on two technologies derived from the perspective of unified information: (1) statistics-based Information Calibration Quantization allows the quantized parameters of the LLM to retain the original information accurately; (2) finetuning-based Information Elastic Connection makes LoRA utilize elastic representation transformation with diverse information. Comprehensive experiments show that IR-QLoRA can significantly improve accuracy across the LLaMA and LLaMA2 families under 2-4 bit-widths, e.g., 4-bit LLaMA-7B achieves a 1.4% improvement on MMLU compared with state-of-the-art methods. The significant performance gain requires only a tiny 0.31% additional time consumption, revealing the satisfactory efficiency of our IR-QLoRA. We highlight that IR-QLoRA enjoys excellent versatility, being compatible with various frameworks (e.g., NormalFloat and Integer quantization) and bringing general accuracy gains. The code is available at https://github.com/htqin/ir-qlora.
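To make the setup concrete, below is a minimal sketch of a LoRA adapter applied on top of a frozen quantized base weight, i.e., the general QLoRA-style configuration that IR-QLoRA builds on. The module name, layer sizes, and the dequantized weight buffer are illustrative assumptions, not the paper's implementation, which additionally applies its information-calibration and elastic-connection techniques.

```python
import torch
import torch.nn as nn

class QuantLinearWithLoRA(nn.Module):
    """Sketch: frozen (quantized) base weight plus a trainable low-rank LoRA path."""
    def __init__(self, in_f: int, out_f: int, rank: int = 16, alpha: int = 16):
        super().__init__()
        # Frozen base weight, stored dequantized here for simplicity;
        # a real 4-bit implementation keeps packed codes plus per-block scales.
        self.register_buffer("w_base", torch.zeros(out_f, in_f))
        self.lora_A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_f, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        base = x @ self.w_base.t()                        # frozen quantized path
        lora = (x @ self.lora_A.t()) @ self.lora_B.t()    # trainable low-rank update
        return base + self.scaling * lora
```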
This paper addresses the growing interest in deploying deep learning models directly in-sensor. We present "Q-Segment", a quantized real-time segmentation algorithm, and conduct a comprehensive evaluation on a low-power edge vision platform with an in-sensor processor, the Sony IMX500. One of the main goals of the model is to achieve end-to-end image segmentation for vessel-based medical diagnosis. Deployed on the IMX500 platform, Q-Segment achieves an ultra-low in-sensor inference time of only 0.23 ms and a power consumption of only 72 mW. We compare the proposed network with state-of-the-art models, both floating-point and quantized, demonstrating that the proposed solution outperforms existing networks on various platforms in computing efficiency, e.g., by a factor of 75x compared to ERFNet. The network employs an encoder-decoder structure with skip connections and achieves a binary accuracy of 97.25% and an Area Under the Receiver Operating Characteristic Curve (AUC) of 96.97% on the CHASE dataset. We also present a comparison of the IMX500 processing core with the Sony Spresense, a low-power multi-core ARM Cortex-M microcontroller, and a single-core ARM Cortex-M4, showing that it can achieve in-sensor processing with low end-to-end latency (17 ms) and power consumption (254 mW). This research contributes valuable insights into edge-based image segmentation, laying the foundation for efficient algorithms tailored to low-power environments.
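As a rough illustration of the encoder-decoder-with-skip-connections structure described above, here is a minimal PyTorch sketch; the layer counts and channel widths are placeholders, not Q-Segment's deployed (and quantized) architecture.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Sketch: one encoder stage, one bottleneck, one decoder stage with a skip."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU())
        self.down = nn.Sequential(nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(16, 8, 2, stride=2)
        self.dec = nn.Conv2d(16, 1, 3, padding=1)   # 16 = 8 (skip) + 8 (upsampled)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.enc(x)                              # full-resolution features
        y = self.up(self.down(s))                    # bottleneck + upsample
        y = torch.cat([y, s], dim=1)                 # skip connection
        return torch.sigmoid(self.dec(y))            # per-pixel vessel probability
```

For in-sensor deployment, such a network would additionally be quantized (e.g., to 8-bit integers) before being compiled for the IMX500.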
Sound event detection (SED) is a hot topic in consumer and smart city applications. Existing approaches based on deep neural networks (DNNs) are very effective, but highly demanding in terms of memory, power, and throughput when targeting ultra-low power always-on devices.
In recent years, Wireless Sensor Networks have captured the imagination of many researchers, with the number of applications growing rapidly. Power consumption is most often the dominant constraint in designing such systems. This constraint has multi-dimensional implications, such as battery type and size, energy harvester design, and lifetime of the deployment. Energy-neutral system implementation is the ultimate goal in wireless sensor networks and represents a hot topic of research. Several recent advances promise a significant reduction of the overall sensor network power consumption. These advances include novel sensors and sensor interfaces, low-power wireless transceivers, and low-power processing. Power optimization techniques have to explore a large design search space. This paper reviews a number of system-level power management methodologies for Wireless Sensor Networks that use ultra-low-power wake-up radio receivers.
Electroencephalogram (EEG)-based Brain-Computer Interfaces (BCIs) have garnered significant interest across various domains, including rehabilitation and robotics. Despite advancements in neural network-based EEG decoding, maintaining performance across diverse user populations remains challenging due to feature distribution drift. This paper presents an effective approach to address this challenge by implementing a lightweight and efficient on-device learning engine for wearable motor imagery recognition. The proposed approach, applied to the well-established EEGNet architecture, enables real-time and accurate adaptation to EEG signals from unregistered users. Leveraging the newly released low-power parallel RISC-V-based processor, GAP9 from GreenWaves, and the PhysioNet EEG Motor Imagery dataset, we demonstrate a remarkable accuracy gain of up to 7.31% with respect to the baseline, with a memory footprint of 15.6 kB. Furthermore, by optimizing the input stream, we achieve enhanced real-time performance without compromising inference accuracy. Our tailored approach exhibits an inference time of 14.9 ms and an energy of 0.76 mJ per single inference, and 20 µs and 0.83 µJ per single update during online training. These findings highlight the feasibility of our method for edge EEG devices as well as other battery-powered wearable AI systems suffering from subject-dependent feature distribution drift.
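A minimal sketch of the kind of on-device adaptation step described here, assuming a model split into `features` and `classifier` submodules (hypothetical names): the feature extractor stays frozen and only the classifier is updated with plain SGD, which keeps the memory footprint small. The paper's engine runs an optimized equivalent on the GAP9, not this PyTorch code.

```python
import torch
import torch.nn as nn

def adapt_classifier(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                     lr: float = 1e-3) -> float:
    """One online update: freeze the backbone, take an SGD step on the classifier."""
    for p in model.features.parameters():
        p.requires_grad_(False)                      # frozen backbone: tiny memory cost
    logits = model.classifier(model.features(x))
    loss = nn.functional.cross_entropy(logits, y)
    loss.backward()
    with torch.no_grad():
        for p in model.classifier.parameters():
            p -= lr * p.grad                         # plain SGD step
            p.grad = None
    return loss.item()
```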
Accurate and low-power indoor localization is becoming more and more of a necessity to empower novel consumer and industrial applications. In this field, the most promising technology is based on UWB modulation; however, current UWB positioning systems do not reach centimeter accuracy in general deployments due to multipath and nonisotropic antennas, still necessitating several fixed anchors to estimate an object's position in space. This article presents an in-depth study and assessment of angle of arrival (AoA) UWB measurements using a compact, low-power solution integrating a novel commercial module with phase difference of arrival (PDoA) estimation as an integrated feature. Results demonstrate the possibility of reaching centimeter distance precision and 2.4° average angular accuracy in many operative conditions, e.g., in a ±90° range around the center. Moreover, integrating the channel impulse response, the phase difference of arrival, and the point-to-point distance, an error correction model is discussed to compensate for reflections, multipath, and front-back ambiguity.
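For reference, the standard two-antenna PDoA-to-AoA conversion that such modules implement can be sketched as follows; the carrier frequency and antenna spacing in the example are assumptions (UWB channel 5 with half-wavelength spacing), not the module's actual parameters.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def aoa_from_pdoa(phase_diff_rad: float, freq_hz: float, antenna_dist_m: float) -> float:
    """Angle of arrival (degrees) from the phase difference between two antennas,
    using the standard model: delta_phi = (2*pi*d/lambda) * sin(theta)."""
    lam = C / freq_hz                                 # carrier wavelength
    arg = phase_diff_rad * lam / (2 * math.pi * antenna_dist_m)
    arg = max(-1.0, min(1.0, arg))                    # clamp against measurement noise
    return math.degrees(math.asin(arg))

# Example: ~6.5 GHz carrier, ~half-wavelength (23 mm) antenna spacing
print(aoa_from_pdoa(phase_diff_rad=1.0, freq_hz=6.5e9, antenna_dist_m=0.023))
```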
Energy harvesting is generally seen to be the key to powering cyber-physical systems in a low-cost, long-term, efficient manner. However, harvesting has traditionally been coupled with large energy storage devices to mitigate the effects of the source's variability. The emerging class of transiently powered systems avoids this issue by performing computation only as a function of the harvested energy, minimizing the obtrusive and expensive storage element. In this work, we present an efficient Energy Management Unit (EMU) to supply generic loads when the average harvested power is much smaller than required for sustained system operation. By building up charge to a pre-defined energy level, the EMU can generate short energy bursts predictably, even under variable harvesting conditions. Furthermore, we propose a dynamic energy burst scaling (DEBS) technique to adjust these bursts to the load's requirements. Using a simple interface, the load can dynamically configure the EMU to supply small bursts of energy at its optimal power point, independent of the harvester's operating point. Extensive theoretical and experimental data demonstrate the high energy efficiency of our approach, reaching up to 73.6% even when harvesting only 110 µW to supply a load of 3.89 mW.
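The burst mechanics can be illustrated with the usual capacitor energy relations: each burst releases E = 1/2 · C · (V_on² - V_off²) from the storage capacitor, and the EMU then rebuilds that energy from the harvester before the next burst. The capacitor size, voltage thresholds, and efficiency in this sketch are hypothetical placeholders, not the paper's design values.

```python
def burst_energy(cap_farads: float, v_on: float, v_off: float) -> float:
    """Usable energy (J) released as the capacitor discharges from the
    turn-on to the turn-off threshold: E = 1/2 * C * (V_on^2 - V_off^2)."""
    return 0.5 * cap_farads * (v_on**2 - v_off**2)

def recharge_time(cap_farads: float, v_on: float, v_off: float,
                  p_harvest_w: float, efficiency: float = 0.8) -> float:
    """Approximate time (s) to rebuild one burst, assuming roughly constant
    harvested power and a fixed converter efficiency."""
    return burst_energy(cap_farads, v_on, v_off) / (p_harvest_w * efficiency)

# Hypothetical operating point: 100 uF capacitor, 3.0 V / 2.4 V thresholds,
# 110 uW harvested (the paper's reported harvesting level)
print(burst_energy(100e-6, 3.0, 2.4))                 # J per burst
print(recharge_time(100e-6, 3.0, 2.4, 110e-6))        # s between bursts
```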
Tactile sensing is a crucial perception mode for robots and human amputees in need of controlling a prosthetic device. Today, robotic and prosthetic systems are still missing the important feature of accurate tactile sensing. This lack is mainly due to the fact that existing tactile technologies have limited spatial and temporal resolution and are either expensive or not scalable. In this article, we present the design and implementation of a hardware-software embedded system called SmartHand. It is specifically designed to enable the acquisition and real-time processing of high-resolution tactile information from a hand-shaped multisensor array for prosthetic and robotic applications. During data collection, our system can deliver a high throughput of 100 frames per second, which is 13.7x higher than previous related work. This has allowed the collection of a new tactile dataset consisting of 340,000 frames while interacting with 16 objects from everyday life during five different sessions. Together with the empty hand, the dataset presents a total of 17 classes. We propose a compact yet accurate convolutional neural network that requires one order of magnitude less memory and 15.6x fewer computations compared with related work without degrading classification accuracy. The top-1 and top-3 cross-validation accuracies on the collected dataset are, respectively, 98.86% and 99.83%. We further analyze the intersession variability and obtain the best top-3 leave-one-out-validation accuracy of 77.84%. We deploy the trained model on a high-performance ARM Cortex-M7 microcontroller, achieving an inference time of only 100 ms and minimizing the response latency. The overall measured power consumption is 505 mW. Finally, we fabricate a new control sensor and perform additional experiments to provide analyses on sensor degradation and slip detection. This work is a step forward in giving robotic and prosthetic devices a sense of touch by demonstrating the practicality of a smart embedded system that uses a scalable tactile sensor with embedded tiny machine learning.
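A minimal sketch of a compact CNN classifier of the kind described, mapping a single-channel tactile pressure frame to the 17 classes (16 objects plus the empty hand); the layer sizes are illustrative and do not reproduce the paper's architecture or its microcontroller deployment.

```python
import torch
import torch.nn as nn

class TactileNet(nn.Module):
    """Sketch: small CNN for classifying tactile pressure frames."""
    def __init__(self, n_classes: int = 17):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                  # global pooling keeps it tiny
        )
        self.head = nn.Linear(16, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, H, W) pressure map from the sensor array
        return self.head(self.body(x).flatten(1))
```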