Outlier or anomaly detection in large data sets is a fundamental task in data science with broad applications. In real, high-dimensional data sets, however, most outliers are hidden in particular combinations of dimensions and are relative to a user's search space and interest. It is often more effective to empower users to specify outlier queries flexibly and have the system process such mining queries efficiently. In this study, we introduce the concept of query-based outliers in heterogeneous information networks, design a query language that lets users specify such queries flexibly, define a suitable outlier measure for heterogeneous networks, and study how to process outlier queries efficiently in large data sets. Our experiments on real data sets show that, following this methodology, interesting outliers can be defined and uncovered flexibly and effectively in large heterogeneous networks.
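As a rough illustration of the query-based idea (the paper's actual query language and outlier measure are not reproduced here), the sketch below scopes outlier detection to a user-specified subpopulation and dimension; the `scope`/`feature` callables, the z-score measure, and the cutoff are hypothetical stand-ins:

```python
import statistics

def query_outliers(entities, scope, feature, k=2.0):
    """Query-scoped outlier detection (illustrative only): `scope` restricts the
    search space, `feature` selects the dimension of interest, k is a z-score cutoff."""
    values = [feature(e) for e in entities if scope(e)]
    mu, sigma = statistics.mean(values), statistics.pstdev(values)
    return [e for e in entities
            if scope(e) and sigma > 0 and abs(feature(e) - mu) / sigma > k]

# e.g., authors in the "database" area whose publication count deviates sharply
authors = [{"area": "database", "pubs": p} for p in (12, 9, 11, 10, 8, 13, 95)]
flagged = query_outliers(authors,
                         scope=lambda a: a["area"] == "database",
                         feature=lambda a: a["pubs"])
print(flagged)  # only the 95-publication author stands out within this scope
```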
The ever-increasing cybersecurity threats faced by today's digital systems require a strong cyber-defense mechanism that is both reactive, in its response to mitigate known risks, and proactive, in being prepared to handle unknown risks. An important component of such a cyber-defense mechanism is adequate staffing of its cybersecurity analyst workforce and the optimal assignment of analysts to sensors for investigating the dynamic alert traffic. To be proactive in handling unknown risks, this workforce must be scheduled dynamically so that the system adapts to the day-to-day stochastic demands on its workforce, in both size and expertise mix. These stochastic demands stem from variations in the alert generation rate and in the proportion of alerts that are significant, which creates uncertainty for the scheduler attempting to assign analysts to work shifts and allocate sensors to analysts. Sensor data are analyzed by automatic processing systems, which generate alerts; a portion of these alerts is categorized as significant and requires thorough examination by a cybersecurity analyst. Risk, in this article, is defined as the percentage of significant alerts that are not thoroughly analyzed. To minimize risk, the cyber-defense system must accurately estimate the future significant-alert generation rate and dynamically schedule its workforce to meet the resulting stochastic workload. The article presents a reinforcement learning-based stochastic dynamic programming optimization model that incorporates these estimates of future alert rates and responds by dynamically scheduling cybersecurity analysts to minimize risk (i.e., maximize significant-alert coverage by analysts) while maintaining risk under a pre-determined upper bound. The article tests the dynamic optimization model against an integer programming model that optimizes static staffing needs based on a daily-average alert generation rate with no estimation of future alert rates (the static workforce model). Results indicate that over a finite planning horizon, the learning-based optimization model, through a dynamic (on-call) workforce in addition to the static workforce, (a) balances risk between days and reduces overall risk better than the static model, (b) is scalable and capable of identifying the quantity and the right mix of analyst expertise in an organization, and (c) can determine the analysts' dynamic (on-call) schedule and sensor-to-analyst allocation so as to keep risk below a given upper bound. Several meta-principles derived from the optimization model are presented, which serve as guiding principles for hiring and scheduling cybersecurity analysts. Days-off scheduling was performed to determine weekly analyst work schedules that meet the cybersecurity system's workforce constraints and requirements.
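A minimal sketch of the idea, assuming a tabular Q-learning stand-in for the article's model (the analyst capacity, alert range, and reward shaping below are invented for illustration, not taken from the article): the state is a coarse bucket of the estimated significant-alert rate, the action is how many on-call analysts to add, and risk is the fraction of significant alerts left unanalyzed:

```python
import random

# All numbers below are hypothetical illustrations, not values from the article.
ANALYST_CAPACITY = 40           # significant alerts one analyst can examine per day
ON_CALL_CHOICES = [0, 2, 4, 6]  # on-call analysts that may supplement the static staff
STATIC_STAFF = 5
RISK_BOUND = 0.05               # pre-determined upper bound on risk
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

def risk(significant_alerts, analysts):
    """Risk = fraction of significant alerts not thoroughly analyzed."""
    analyzed = min(significant_alerts, analysts * ANALYST_CAPACITY)
    return (significant_alerts - analyzed) / significant_alerts

def bucket(alerts):
    """Discretize the estimated significant-alert rate into a coarse state."""
    return min(alerts // 100, 5)

Q = {}  # (state, action) -> estimated value

def act(state):
    if random.random() < EPS:                      # explore occasionally
        return random.choice(ON_CALL_CHOICES)
    return max(ON_CALL_CHOICES, key=lambda a: Q.get((state, a), 0.0))

for _ in range(20_000):                            # simulated training days
    alerts = random.randint(100, 600)              # stochastic significant-alert demand
    state = bucket(alerts)
    action = act(state)
    day_risk = risk(alerts, STATIC_STAFF + action)
    # Reward low risk; penalize breaching the bound and idle on-call cost.
    reward = -day_risk - (1.0 if day_risk > RISK_BOUND else 0.0) - 0.01 * action
    nxt = bucket(random.randint(100, 600))
    best_next = max(Q.get((nxt, a), 0.0) for a in ON_CALL_CHOICES)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)
```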
A sensor node with multiple sensing units is usually unable to simultaneously process the data generated by those units, resulting in event misses. This paper presents a collaborative scheduling algorithm, called CTAS, that minimizes event misses and energy consumption by exploiting the power modes and overlapping sensing areas of sensor nodes. The novel idea of CTAS is a two-level scheduling approach in which tasks are executed collaboratively at the group and individual levels among neighboring sensor nodes. CTAS first performs coarse-grain scheduling at the group level to assign the event types to be detected by each group member; it then performs fine-grain scheduling of the tasks corresponding to the assigned event types. The coarse-grain scheduling of CTAS is based on a new algorithm that determines the degree of overlap among neighboring sensor nodes. Simulation results show that CTAS reduces energy consumption by up to 67% and event misses by 75%.

I. INTRODUCTION

Wireless sensor networks emerged as an important wireless technology with advances in sensor architectures, such as the inclusion of multiple sensing units and other components with variable power-mode capability. Sensor nodes exploit the availability of multiple power modes by selecting low-power modes when they are idle. While saving energy, this also introduces a data-accuracy problem due to the latency involved in switching a sensor node from an energy-saving low-power mode to the high-power mode required for event processing. This latency may be long enough to cause the sensor node to miss processing an event on time. Fortunately, data accuracy can be improved by exploiting the fact that multiple sensor nodes observe the same physical region in densely deployed sensor networks. However, for such an approach to minimize energy consumption and event misses, we must address how the tasks corresponding to the events of interest can be executed collaboratively among a group of neighboring sensor nodes with multiple sensing units. In this paper, we use the term task to refer to the processing required for the data generated by a sensing unit upon the occurrence of an event. We introduce a Collaborative two-level Task Scheduling algorithm, called CTAS, for wireless sensor nodes with multiple sensing units. CTAS employs both coarse-grain scheduling at the group level and fine-grain scheduling at the individual node level.
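The following toy sketch, with invented node fields and a simplistic distance-based overlap score (not CTAS's actual overlap algorithm), illustrates the two levels: coarse-grain assignment of event types across a group, then fine-grain ordering of the corresponding tasks on one node:

```python
import math, heapq

def overlap(a, b, radius=10.0):
    """Toy overlap score for two nodes' circular sensing areas, from center distance."""
    return max(0.0, 1.0 - math.dist(a["pos"], b["pos"]) / (2 * radius))

def coarse_grain(group, event_types):
    """Group level: assign each event type to the member best placed to detect it;
    residual energy and neighbor overlap raise a node's score, load lowers it."""
    assignment = {n["id"]: [] for n in group}
    for ev in event_types:
        def score(n):
            cover = sum(overlap(n, m) for m in group if m is not n)
            return n["energy"] + cover - 0.5 * len(assignment[n["id"]])
        best = max(group, key=score)
        assignment[best["id"]].append(ev)
    return assignment

def fine_grain(tasks):
    """Individual level: order the node's assigned tasks earliest-deadline-first."""
    heap = [(t["deadline"], t["event"]) for t in tasks]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

group = [{"id": "s1", "pos": (0, 0), "energy": 0.9},
         {"id": "s2", "pos": (8, 0), "energy": 0.6},
         {"id": "s3", "pos": (4, 6), "energy": 0.8}]
print(coarse_grain(group, ["acoustic", "seismic", "thermal"]))
print(fine_grain([{"event": "seismic", "deadline": 5},
                  {"event": "acoustic", "deadline": 2}]))
```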
This paper addresses the transmission of medical and context-aware data from mobile patients to healthcare centers over heterogeneous wireless networks. A handheld device, called the personal wireless hub (PWH), of each mobile patient first gathers and aggregates vital-sign and context-aware data for various telemedicine applications. The PWH transmits the aggregated data to the remote healthcare center over multiple wireless interfaces such as cellular, WLAN, and WiMAX. The aggregated data contain both periodic data and nonperiodic emergency messages that are sporadic, unpredictable, and delay-intolerant. The paper addresses the problem of providing QoS (e.g., minimum delay, sufficient data rate, and acceptable blocking and/or dropping rates) by designing a packet scheduling and channel/network allocation algorithm over wireless networks. The proposed resource-efficient QoS mechanism is simple and collaborates with an adaptive security algorithm. QoS and security are achieved mainly through the collaboration of the differentiator, delay monitor, data classifier, and scheduler modules within the PWH. The paper also discusses secure data transmission over body sensor networks by introducing key establishment and management algorithms. Simulation results show that the proposed framework achieves low blocking probability, meets delay requirements, and provides energy-efficient secure communication for the combination of vital signs and context-aware data.
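A minimal sketch of the two-class scheduling idea, assuming an invented packet format and interface table (the PWH's actual differentiator, delay monitor, and classifier modules are more elaborate): emergency messages always leave before periodic vitals, on the fastest available interface:

```python
import heapq, itertools

_order = itertools.count()  # tie-breaker so equal-priority packets stay FIFO

class PWHScheduler:
    """Toy two-class scheduler: emergency (nonperiodic) packets preempt periodic vitals."""
    EMERGENCY, PERIODIC = 0, 1

    def __init__(self, interfaces):
        self.queue = []
        self.interfaces = interfaces  # e.g., {"WLAN": 54.0} in Mbps (invented rates)

    def classify(self, packet):
        return self.EMERGENCY if packet.get("emergency") else self.PERIODIC

    def enqueue(self, packet):
        heapq.heappush(self.queue, (self.classify(packet), next(_order), packet))

    def dispatch(self):
        """Send the highest-priority packet on the fastest available interface."""
        if not self.queue:
            return None
        _, _, packet = heapq.heappop(self.queue)
        iface = max(self.interfaces, key=self.interfaces.get)
        return iface, packet

sched = PWHScheduler({"WLAN": 54.0, "cellular": 2.0, "WiMAX": 30.0})
sched.enqueue({"type": "ECG", "emergency": False})
sched.enqueue({"type": "fall-alert", "emergency": True})
print(sched.dispatch())  # the emergency packet goes out first, on the fastest link
```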
Cyber resilience usually refers to the ability of an entity to detect, respond to, and recover from cybersecurity attacks, to the extent that the entity can continuously deliver its intended outcome despite their presence. This paper presents a method and system for providing cyber resilience by integrating autonomous adversary and defender agents, deep reinforcement learning, and graph thinking. Specifically, the proposed cyber resilience system first predicts current and future adversary activities and then provides automated critical-asset protection and recovery by enabling agents to take appropriate reactive and proactive actions to prevent and mitigate adversary activities. In particular, the system's adversary agent makes it possible to identify and track cybersecurity adversary activities, patterns, and intentions more accurately and dynamically, based on preprocessed cybersecurity measurements and observations. The system's defender agent is designed to determine and execute cost-effective defensive actions against the adversary activities and intentions predicted by the adversary agent. The adversary and defender agents employ deep reinforcement learning to play a zero-sum, observation-aware stochastic game. Experimental results show that the agents perform their tasks efficiently when the adversary agent is dynamically provided with infected-asset predictions as input.
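As a drastically simplified stand-in for the paper's deep-RL agents (stateless and tabular, without the observation model or graph structure), the sketch below shows two learners with zero-sum payoffs over which asset to attack versus protect; the asset names and learning constants are invented:

```python
import random

ASSETS = ["database", "webserver", "mailserver"]  # invented asset set
ALPHA, EPS = 0.2, 0.1
Q_adv = {a: 0.0 for a in ASSETS}   # adversary's estimated value per attack target
Q_def = {a: 0.0 for a in ASSETS}   # defender's estimated value per protected asset

def pick(Q):
    if random.random() < EPS:      # occasional exploration
        return random.choice(ASSETS)
    return max(ASSETS, key=Q.get)

for _ in range(50_000):
    attack, defend = pick(Q_adv), pick(Q_def)
    # Zero-sum payoff: the adversary scores only when it hits an unprotected asset.
    payoff = 1.0 if attack != defend else -1.0
    Q_adv[attack] += ALPHA * (payoff - Q_adv[attack])
    Q_def[defend] += ALPHA * (-payoff - Q_def[defend])

print({a: round(v, 2) for a, v in Q_def.items()})  # defender's learned values
```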
This paper proposes a secure data aggregation and source-channel coding algorithm, called SAC, that implements secure data aggregation and compression, along with error correction, in wireless sensor networks by employing the multiple-input turbo (MIT) code that we recently introduced for source and channel coding. If there is no direct communication between two sensor nodes, SAC applies Slepian-Wolf coding principles, performing source encoding with the MIT code. If there is explicit communication between two sensor nodes, the node with more residual energy performs data aggregation and source encoding. When the bit error rate is unacceptable, the MIT code is used for channel coding as well. Security is achieved by encrypting the data with shared keys. To reduce energy consumption, latency, and memory requirements, the MIT code employs partial interleavers.
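A sketch of the role-selection logic as described, with invented node records and an assumed BER threshold (the MIT encoding itself is not reproduced here):

```python
def coding_plan(node_a, node_b, direct_link, ber, ber_threshold=1e-3):
    """Pick SAC's operating mode for a pair of nodes (sketch).
    With an explicit link, the node holding more residual energy aggregates and
    source-encodes; without one, both encode independently (Slepian-Wolf style)
    and the sink decodes jointly. A poor channel adds channel coding."""
    plan = {"channel_coding": ber > ber_threshold}
    if direct_link:
        agg = node_a if node_a["energy"] >= node_b["energy"] else node_b
        plan.update(mode="aggregate-then-encode", aggregator=agg["id"])
    else:
        plan.update(mode="slepian-wolf", aggregator=None)
    return plan

print(coding_plan({"id": "n1", "energy": 0.7}, {"id": "n2", "energy": 0.4},
                  direct_link=True, ber=5e-3))
# {'channel_coding': True, 'mode': 'aggregate-then-encode', 'aggregator': 'n1'}
```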
The nodes in wireless sensor networks (WSNs) utilize the radio frequency (RF) channel to communicate. Given that the RF channel is the primary communication channel, many researchers have developed techniques for securing it. However, the RF channel is not the only interface into a sensor. The sensing components, which are primarily designed to sense characteristics of the outside world, can also be used (or misused) as a communication (side) channel. In this paper, we characterize the side channels of various sensory components (i.e., light sensor, acoustic sensor, and accelerometer). While previous work has focused on using these side channels to improve the security and performance of a WSN, we seek to determine whether the side channels have enough capacity to be used for malicious activity. Specifically, we evaluate the feasibility and practicality of the side channels using today's sensor technology and illustrate that these channels have enough capacity to enable the transfer of common, well-known malware. The ultimate goal of this work is to illustrate the need for intrusion detection systems (IDSs) that monitor not only the RF channel but also the values returned by the sensory components.
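A back-of-the-envelope capacity estimate in the spirit of this analysis; the sample rate, level count, and reliability factor below are hypothetical, not the paper's measurements:

```python
import math

def side_channel_capacity(sample_rate_hz, distinct_levels, reliability=1.0):
    """Rough capacity: symbols/second x bits/symbol, derated for noise and sync loss."""
    return reliability * sample_rate_hz * math.log2(distinct_levels)

def transfer_time_s(payload_bytes, capacity_bps):
    """Time to push a payload of the given size through the side channel."""
    return payload_bytes * 8 / capacity_bps

# e.g., a light sensor sampled at 50 Hz distinguishing 4 intensity levels:
cap = side_channel_capacity(sample_rate_hz=50, distinct_levels=4, reliability=0.8)
print(f"capacity ~ {cap:.0f} bit/s")                         # 80 bit/s here
print(f"4 KB payload ~ {transfer_time_s(4096, cap):.0f} s")  # roughly 7 minutes
```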
Data aggregation in wireless sensor networks eliminates data redundancy, thereby improving bandwidth usage and energy utilization. This paper presents a secure data aggregation protocol, called SRDA (secure reference-based data aggregation), for wireless sensor networks. To reduce the number of bits transmitted, sensor nodes compare their raw sensed data with a reference value and transmit only the difference. In addition to reducing the number of transmitted bits, SRDA establishes secure connectivity among sensor nodes without any online key distribution. The security level of the communication links is gradually increased toward higher-level cluster heads, since intercepting a packet at higher levels of the clustering hierarchy yields a summary of a large number of lower-level transmissions. Simulation results show that the proposed protocol yields significant savings in energy consumption while preserving data security.
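A minimal sketch of the reference-based differencing, with invented field widths: the node transmits a narrow difference when the reading stays near the agreed reference, falling back to the full-width raw value otherwise:

```python
def encode_reading(raw, reference, field_bits=16, diff_bits=6):
    """SRDA-style differential transmission (sketch): send only the small
    difference from the reference when it fits, else the raw value."""
    diff = raw - reference
    limit = 1 << (diff_bits - 1)
    if -limit <= diff < limit:
        return ("diff", diff, diff_bits)   # few bits on the air
    return ("raw", raw, field_bits)        # occasional full-width fallback

def decode_reading(kind, value, reference):
    return reference + value if kind == "diff" else value

ref = 2048                                 # reference value both ends agree on
kind, val, bits = encode_reading(2051, ref)
assert decode_reading(kind, val, ref) == 2051
print(kind, val, f"{bits} bits instead of 16")
```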