IoT anomaly detection is challenging because IoT anomalies are rare and labels are scarce. Recent weakly-supervised approaches, such as Feature Encoding with AutoEncoder and Weakly-supervised Anomaly Detection (FeaWAD) and its improved version (iFWAD), address this scarcity by building detectors from a combination of unlabeled data and a small set of labeled anomalies. While effective, these methods impose no constraints during the feature learning stage to separate normal regions from anomalies. Notably, the Shrink AutoEncoder encourages normal data to cluster around the origin while preserving space for anomalies. Drawing inspiration from the Shrink AutoEncoder, this study introduces Shrink iFWAD (sFWAD), which embeds a shrink regularizer into iFWAD. This term forces the feature encoder of sFWAD to map normal data close to zero while pushing IoT anomalies further away from zero, which makes it easier for the anomaly score generator of sFWAD to identify IoT anomalies. The proposed method is evaluated against state-of-the-art weakly-supervised techniques and other common anomaly detection methods on the N-BaIoT dataset. Experimental results indicate that sFWAD often outperforms both recent weakly-supervised methods and the common techniques in IoT anomaly detection. For unknown (new) IoT anomalies, the Missed Detection Rate of sFWAD (0.008) is much lower than those of iFWAD (0.026) and RoSAS (0.015).
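To make the idea of a shrink regularizer concrete, the following is a minimal NumPy sketch under stated assumptions, not the sFWAD loss itself. It assumes unlabeled data is mostly normal and penalizes its squared latent norm (the Shrink AutoEncoder idea), and it assumes a hinge term with a hypothetical `margin` that pushes labeled anomalies away from the origin. The function name `shrink_regularizer` and the hinge form are illustrative choices, not taken from the paper.

```python
import numpy as np


def shrink_regularizer(z, is_anomaly, margin=5.0):
    """Hypothetical shrink-style regularizer on latent codes (illustrative only).

    z: (n, d) array of latent codes produced by a feature encoder.
    is_anomaly: length-n booleans; True for the small labeled anomaly set,
        False for unlabeled data, which is ASSUMED to be mostly normal.
    margin: hypothetical radius that labeled anomalies should exceed.
    """
    z = np.asarray(z, dtype=float)
    is_anomaly = np.asarray(is_anomaly, dtype=bool)
    sq_norm = np.sum(z ** 2, axis=1)   # ||z_i||^2 for each sample
    norm = np.sqrt(sq_norm)            # ||z_i|| for each sample
    # Unlabeled (assumed normal) samples: shrink toward the origin.
    normal_term = sq_norm[~is_anomaly]
    # Labeled anomalies: penalize only while they lie inside the margin.
    anomaly_term = np.maximum(0.0, margin - norm[is_anomaly]) ** 2
    terms = np.concatenate([normal_term, anomaly_term])
    return float(terms.mean()) if terms.size else 0.0
```

In a training loop this term would typically be added to the detector's main objective with a weight, e.g. `loss = main_loss + lam * shrink_regularizer(z, is_anomaly)`, where `lam` is another hypothetical hyperparameter. The hinge keeps anomalies that are already far from zero from dominating the loss.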
In the Internet of Things, sensor devices often generate massive amounts of sensory data across multiple domains and applications. Identifying IoT malware in such a huge amount of IoT data is often challenging. In our previous studies, analytic techniques were applied to reduce dimensionality and discover valuable information in the original data. In particular, a Self-Organizing Map (SOM)-based classifier combined with an AutoEncoder was used to build an end-to-end IoT malware detection model. However, the SOM-based classifier has a limitation: new instances may be misclassified if they are mapped to unlabeled neurons in the SOM. To address this issue, this study proposes a novel hybrid of the SOM-based classifier and well-known classification algorithms such as K-Nearest Neighbors, Support Vector Machine, Softmax, and Random Forest. In this hybrid, the classification methods help to correctly assign labels to instances mapped to unlabeled neurons. In addition, this article investigates hyperparameter optimization methods for tuning SOM hyperparameters. The proposed methods were tested on the N-BaIoT dataset under various experimental settings. Experimental results show that SOMKNN (the hybrid of the SOM-based classifier and K-Nearest Neighbors) often performs better than stand-alone techniques, including the SOM classifier.
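The fallback mechanism of such a hybrid can be illustrated with a small, self-contained NumPy sketch. It assumes a basic online SOM with a Gaussian neighborhood, majority-vote neuron labeling, and a plain KNN vote for instances whose best-matching unit (BMU) received no training samples. All function names and hyperparameters (`train_som`, grid size, learning-rate schedule, `k`) are hypothetical and not the authors' implementation.

```python
import numpy as np


def train_som(X, rows=3, cols=3, epochs=20, lr0=0.5, sigma0=1.0, seed=0):
    """Minimal online SOM (illustrative; schedules are assumptions)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(rows * cols, X.shape[1]))  # neuron weight vectors
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    n_steps, step = epochs * len(X), 0
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            frac = step / n_steps
            lr = lr0 * (1 - frac)                  # linearly decaying learning rate
            sigma = sigma0 * (1 - frac) + 1e-3     # shrinking neighborhood radius
            bmu = np.argmin(np.sum((W - x) ** 2, axis=1))
            d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            h = np.exp(-d2 / (2 * sigma ** 2))     # Gaussian neighborhood on the grid
            W += lr * h[:, None] * (x - W)
            step += 1
    return W


def bmu_of(W, X):
    """Index of the best-matching neuron for every row of X."""
    return np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2), axis=1)


def label_neurons(W, X, y):
    """Majority label per neuron; neurons hit by no sample stay unlabeled."""
    bmus = bmu_of(W, X)
    labels = {}
    for n in np.unique(bmus):
        vals, counts = np.unique(y[bmus == n], return_counts=True)
        labels[int(n)] = vals[np.argmax(counts)]
    return labels


def knn_predict(X_train, y_train, x, k=3):
    """Plain majority-vote KNN using squared Euclidean distance."""
    nearest = y_train[np.argsort(np.sum((X_train - x) ** 2, axis=1))[:k]]
    vals, counts = np.unique(nearest, return_counts=True)
    return vals[np.argmax(counts)]


def som_knn_predict(W, neuron_labels, X_train, y_train, X_new, k=3):
    """Use the SOM label when the BMU is labeled, otherwise fall back to KNN."""
    preds = []
    for x, n in zip(X_new, bmu_of(W, X_new)):
        if int(n) in neuron_labels:
            preds.append(neuron_labels[int(n)])
        else:
            preds.append(knn_predict(X_train, y_train, x, k))
    return np.array(preds)
```

The KNN call could be swapped for any other classifier (SVM, Softmax, Random Forest) without changing the control flow. Only instances that land on unlabeled neurons ever reach it, so the SOM still handles the common case.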
Malicious software, known as malware, has become an urgent and serious threat to computer security, so automatic malware classification techniques have received increasing attention. In recent years, deep learning (DL) techniques for computer vision have been successfully applied to malware classification by visualizing malware files as images and then classifying those images with DL models. Although DL-based classification systems have been shown to be much more accurate than conventional ones, they have also been shown to be vulnerable to adversarial attacks. However, little research has considered the danger that adversarial attacks pose to image-based malware classification systems. This paper proposes a gradient-based adversarial attack method against image-based malware classification systems that introduces perturbations into the resource section of PE files. Experimental results on the Malimg dataset show that, with small perturbations, the proposed method can successfully attack convolutional neural network (CNN) malware classifiers.
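The core idea of a gradient-based attack confined to one region of a malware image can be sketched with a single signed-gradient (FGSM-style) step. To keep the gradient analytic, this sketch swaps in a logistic-regression classifier for the CNN. It also assumes a hypothetical binary `mask` marking which pixels come from the resource section. The paper's exact gradient method, step schedule, and byte-level mapping may differ. Writing the perturbed pixels back into a valid PE file is not shown.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def predict(w, b, x):
    """Toy stand-in classifier: 1 = true malware family, 0 = anything else."""
    return int(w @ x + b > 0)


def masked_gradient_attack(w, b, x, y, mask, eps=0.1):
    """One signed-gradient step that only touches pixels where mask == 1.

    x: flattened grayscale malware image scaled to [0, 1].
    y: true label (0 or 1) of x.
    mask: 1 for pixels ASSUMED to come from the resource section, else 0.
    """
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w                         # d(cross-entropy)/dx for logistic regression
    x_adv = x + eps * np.sign(grad_x) * mask     # ascend the loss, only inside the mask
    return np.clip(x_adv, 0.0, 1.0)              # stay in the valid pixel range
```

Restricting the update with `mask` mirrors the constraint that only resource-section bytes may change. In a real attack, the perturbed values would also have to be quantized back to byte values.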
The Internet of Things, with a billion connected devices, can generate a huge amount of data daily. This poses challenges to security tasks (e.g., identifying IoT malware). Our previous studies used analytic techniques to reduce the data size and extract valuable information. Currently, clustering is a key technique for many data-driven applications, and it has been widely studied with different distance functions and algorithms. One research direction is to use representation learning for clustering. This research proposes combining a Deep Clustering AutoEncoder (DCAE) with anomaly detection algorithms to form an end-to-end anomaly detection framework. The DCAE maps the data from the original space to a lower-dimensional latent space, where it iteratively minimizes the clustering loss. The DCAE output is then fed to algorithms such as Isolation Forest (IF), K-Nearest Neighbors (KNN), Local Outlier Factor (LOF), and One-Class Support Vector Machine (OCSVM) to identify anomalies. The proposed model is evaluated on nine recent IoT devices in the N-BaIoT dataset. The experimental results show that the new latent representation significantly improves the performance of the IoT outlier detection methods. The model's running time is also recorded to assess its suitability for practical applications.
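The two stages of such a framework can be sketched in NumPy as follows. The sketch assumes a k-means-style clustering loss (mean squared distance from each latent code to its nearest centroid) and a KNN anomaly score (distance to the k-th nearest normal training code). The latent codes are assumed to come from an already trained encoder. The DCAE's actual clustering loss and the paper's detector settings may differ, and both function names are hypothetical.

```python
import numpy as np


def clustering_loss(Z, centroids):
    """Mean squared distance from each latent code to its nearest centroid.

    Z: (n, d) latent codes; centroids: (c, d) cluster centers.
    This k-means objective is an ASSUMED form of the clustering loss.
    """
    d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)  # (n, c)
    return float(d2.min(axis=1).mean())


def knn_anomaly_scores(Z_train, Z_test, k=5):
    """Anomaly score = Euclidean distance to the k-th nearest training code.

    Z_train is assumed to contain (mostly) normal latent codes, so larger
    scores indicate points that lie far from the normal clusters.
    """
    d = np.sqrt(((Z_test[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=2))
    return np.sort(d, axis=1)[:, k - 1]
```

A decision threshold could, for example, be taken as a high quantile of the scores on held-out normal data. The same latent codes could also be passed to scikit-learn's `IsolationForest`, `LocalOutlierFactor`, or `OneClassSVM` in place of the hand-written KNN score.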