Sensing makes us aware of our ambient environment with remarkable precision and speed. A multitude of attributes is processed in real time, allowing us to act in an exceptionally coordinated and timely manner. This human capability has inspired researchers worldwide to imitate this intricate delicacy of nature, and the resulting concept of global sensing and actuation has emerged in the form of Wireless Sensor Networks (WSNs). This chapter covers the whole spectrum of WSNs, explaining in detail the constituents of a sensor node (sensors, processor, transceiver, etc.), node characteristics, a survey of existing platforms, network dynamics and topology, energy considerations, and applications of WSNs. Applications are given special consideration to show how WSNs can simultaneously improve living standards, safety, and our environment while remaining economically beneficial. Extensive due diligence has been carried out to match the right technology to the right application, with comparisons to existing solutions and support from real-life deployments. Finally, a macro-level overview of the emerging branches of WSNs (WMSN and WSAN) provides a glimpse into their future.
Crowd estimation is a very challenging problem. The most recent study tries to exploit auditory information to aid visual models; however, its performance is limited by the lack of an effective approach for feature extraction and integration. This paper proposes a new audiovisual multi-task network that addresses the critical challenges in crowd counting by effectively utilizing both visual and audio inputs for better modality association and productive feature extraction. The proposed network introduces auxiliary, explicit image patch-importance ranking (PIR) and patch-wise crowd estimate (PCE) information to produce a third (run-time) modality. These modalities (audio, visual, run-time) pass through a transformer-inspired cross-modality co-attention mechanism that outputs the final crowd estimate. To acquire rich visual features, we propose a multi-branch structure with transformer-style fusion in between. Extensive experimental evaluations show that the proposed scheme outperforms state-of-the-art networks under all evaluation settings, with up to 33.8% improvement. We also analyze the vision-only variant of our network and empirically demonstrate its superiority over previous approaches.
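To make the fusion step concrete, the following is a minimal sketch of a transformer-style cross-modality co-attention block of the kind described above. It assumes the visual, audio, and run-time (PIR/PCE) features have already been projected to token sequences of a common embedding size; all module names, dimensions, and the placeholder regression head are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of transformer-style cross-modality co-attention (assumed
# shapes and layer sizes; not the paper's exact network).
import torch
import torch.nn as nn

class CrossModalCoAttention(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        # Queries come from one modality, keys/values from another.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, query_tokens, context_tokens):
        # Attend from the query modality (e.g. visual patches) to the context
        # modality (e.g. audio or run-time tokens), then a residual feed-forward.
        attended, _ = self.attn(query_tokens, context_tokens, context_tokens)
        x = self.norm(query_tokens + attended)
        return self.norm(x + self.ffn(x))

# Example: fuse visual patch tokens with audio tokens, then with run-time tokens.
visual = torch.randn(2, 196, 256)    # (batch, patch tokens, dim)
audio = torch.randn(2, 32, 256)      # (batch, audio tokens, dim)
runtime = torch.randn(2, 196, 256)   # PIR/PCE-derived run-time tokens

block = CrossModalCoAttention()
fused = block(block(visual, audio), runtime)
crowd_count = fused.mean(dim=(1, 2))  # placeholder head standing in for the real regressor
```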
Sign language recognition (SLR) enables the deaf and speech-impaired community to integrate and communicate effectively with the rest of society. Word-level, or isolated, SLR is a fundamental yet complex task whose main objective is to correctly recognize signed words. Sign language consists of very fast and complex hand, body, and face movements, along with mouthing cues, which make the task very challenging. Several input modalities have been proposed for SLR: RGB, optical flow, RGB-D, and pose/skeleton. However, these modalities and the state-of-the-art (SOTA) methodologies tend to be exceedingly sophisticated and over-parametrized. In this paper, we focus on hand and body poses as the input modality. A major problem in pose-based SLR is extracting the most valuable and distinctive features for all skeleton joints. To this end, we propose an accurate, efficient, and lightweight pose-based pipeline leveraging a graph convolutional network (GCN) with residual connections and a bottleneck structure. The proposed architecture not only facilitates efficient learning during training, yielding significantly improved accuracy, but also reduces computational complexity. With this architecture in place, we achieve improved accuracies on three subsets of the WLASL dataset and on the LSA-64 dataset. Our model outperforms previous SOTA pose-based methods with relative improvements of 8.91%, 27.62%, and 26.97% on the WLASL-100, WLASL-300, and WLASL-1000 subsets. Moreover, it outperforms previous SOTA appearance-based methods with relative improvements of 2.65% and 5.15% on the WLASL-300 and WLASL-1000 subsets. On the LSA-64 dataset, our model achieves 100% test recognition accuracy. We achieve this improved performance at far lower computational cost than existing appearance-based methods.
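As a concrete illustration of the kind of building block such a pipeline uses, below is a minimal sketch of a residual bottleneck graph-convolution block operating on skeleton joints. The adjacency matrix, joint count, and layer sizes are illustrative assumptions and do not reproduce the paper's exact architecture.

```python
# Minimal sketch of a residual bottleneck GCN block over skeleton joints
# (assumed sizes and a placeholder adjacency; not the paper's exact model).
import torch
import torch.nn as nn

class BottleneckGCNBlock(nn.Module):
    def __init__(self, channels=64, reduction=4):
        super().__init__()
        hidden = channels // reduction
        # 1x1-style projections form the bottleneck; the graph convolution
        # mixes joint features according to the skeleton adjacency.
        self.down = nn.Linear(channels, hidden)
        self.graph_weight = nn.Linear(hidden, hidden)
        self.up = nn.Linear(hidden, channels)
        self.relu = nn.ReLU()

    def forward(self, x, adj):
        # x: (batch, joints, channels); adj: (joints, joints), row-normalized.
        h = self.relu(self.down(x))
        h = self.relu(self.graph_weight(torch.einsum("ij,bjc->bic", adj, h)))
        return self.relu(x + self.up(h))  # residual connection

# Example on a hypothetical 27-joint hand/body skeleton.
joints = 27
adj = torch.eye(joints)                 # placeholder adjacency (self-loops only)
features = torch.randn(8, joints, 64)   # (batch, joints, channels)
out = BottleneckGCNBlock()(features, adj)
```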
This paper presents a novel scalable CMOS single-photon avalanche diode (SPAD) based PPG sensor with direct light-to-digital conversion implemented using simple CMOS logic gates on a single chip. With single-photon detection capability, a SPAD provides much higher light sensitivity than the photodiode (PD) used in typical PPG sensors, significantly reducing the LED driving current required to obtain the same received signal as with a PD. Moreover, the proposed PPG sensor eliminates the power- and area-intensive analog front-end readout electronics associated with conventional PPG sensing, making it an attractive option for wearable consumer electronics. For this prototype, a 2 × 2 SPAD-based pixel array has been implemented in a standard 180 nm CMOS process. Four SPAD-based subpixels are combined to reduce dead time at the pixel level, resulting in increased photon detection capability.
This paper describes the design and development of "Ribo", an upper-torso social humanoid robot, and the public's response to it at several exhibitions. Ribo is 135 cm tall and has the facial actuation needed to show basic facial expressions. The exterior design is crafted to make it look like a social artificial being rather than just a mechanical robot. The robot runs a distributed software architecture that enables modules developed in different programming languages to work in sync. Ribo was presented at several exhibitions in Bangladesh, where members of the public interacted with it directly. During these events, visitors were asked several questions about the robot's design in order to rate Ribo's social behavior. According to the survey, people liked Ribo mostly for its facial design and for the fact that it speaks their mother tongue.
Background subtraction is one of the most commonly used components in machine vision systems. Despite the numerous algorithms proposed in the literature and used in practical applications, key challenges remain in designing a single system that can handle diverse environmental conditions. In this paper we present the Multiple Background Model based Background Subtraction Algorithm as such a candidate. The algorithm was originally designed to handle sudden illumination changes. The new version refines several steps of the process, specifically the selection of an optimal color space, the clustering of training images for the Background Model Bank, and the choice of parameters for each color-space channel. This extends the algorithm's applicability to a wide variety of change-detection challenges, including camera jitter, dynamic backgrounds, intermittent object motion, shadows, bad weather, thermal imagery, and night videos. A comprehensive evaluation demonstrates the algorithm's superiority over the state of the art.
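To illustrate the Background Model Bank idea in its simplest form, the sketch below clusters training frames, builds one background image per cluster, and marks a pixel as foreground only if it disagrees with every model. The clustering feature (mean intensity), color handling, and threshold are simplifying assumptions for illustration and differ from the actual algorithm's choices.

```python
# Minimal sketch of a multiple-background-model bank (illustrative assumptions:
# 1-D k-means on mean frame intensity, per-channel absolute-difference threshold).
import numpy as np

def build_model_bank(train_frames, n_models=3, seed=0):
    # Cluster frames by mean intensity (a crude proxy for illumination), then
    # take the per-cluster median image as that cluster's background model.
    rng = np.random.default_rng(seed)
    means = np.array([f.mean() for f in train_frames])
    centers = rng.choice(means, size=n_models, replace=False)
    for _ in range(10):  # plain k-means on the 1-D mean-intensity feature
        labels = np.argmin(np.abs(means[:, None] - centers[None, :]), axis=1)
        centers = np.array([means[labels == k].mean() if np.any(labels == k)
                            else centers[k] for k in range(n_models)])
    return [np.median(np.stack([f for f, l in zip(train_frames, labels) if l == k]), axis=0)
            for k in range(n_models) if np.any(labels == k)]

def subtract(frame, bank, thresh=30.0):
    # Foreground only where the pixel differs from *all* background models.
    masks = [np.abs(frame.astype(float) - bg).max(axis=-1) > thresh for bg in bank]
    return np.logical_and.reduce(masks)

# Example with random frames standing in for training data and a test frame.
train = [np.random.randint(0, 255, (120, 160, 3), dtype=np.uint8) for _ in range(20)]
bank = build_model_bank(train)
fg_mask = subtract(train[0], bank)
```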