In this paper, we propose and test a system that indexes a large collection of HTML documents (i.e., an entire web site) and automatically generates context-relevant inline text links between pairs of related documents (i.e., web pages). The goal of the system is threefold: to increase user interaction with the site being browsed, to discover relevant keywords for each document, and to cluster the documents effectively into semantically significant groupings. The quality of the links improves over time through passive collection of user feedback. Our system can be deployed as a web service and has been tested on offline datasets as well as a live web site. A distinctive feature of our system is that it supports datasets that grow or change over time.
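The abstract above does not say how related documents or keywords are identified. As an illustration only, one common baseline is TF-IDF weighting with cosine similarity; the sketch below assumes that technique, and every name in it (`tokenize`, `tfidf_vectors`, `top_keywords`, `suggest_links`, the threshold value) is hypothetical rather than taken from the system described.

```python
import math
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and keep purely alphabetic whitespace-separated tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def tfidf_vectors(docs):
    """Map each document name to a {term: tf-idf weight} dictionary.

    Assumption: idf = log(N / df), so a term found in every document gets weight 0.
    """
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter()
    for words in tokenized.values():
        doc_freq.update(set(words))
    vectors = {}
    for name, words in tokenized.items():
        counts = Counter(words)
        total = len(words) or 1  # avoid division by zero for empty documents
        vectors[name] = {
            w: (c / total) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_keywords(vector, k):
    """Return the k highest-weighted terms of one document."""
    ranked = sorted(vector.items(), key=lambda item: item[1], reverse=True)
    return [term for term, _ in ranked[:k]]


def suggest_links(docs, threshold):
    """Return (doc_a, doc_b, score) for every pair at or above the threshold."""
    vectors = tfidf_vectors(docs)
    links = []
    for a, b in combinations(sorted(vectors), 2):
        score = cosine(vectors[a], vectors[b])
        if score >= threshold:
            links.append((a, b, score))
    return links


if __name__ == "__main__":
    # Toy pages invented for this example.
    pages = {
        "a": "python web crawler tutorial",
        "b": "python web scraping guide",
        "c": "chocolate cake recipe",
    }
    for a, b, score in suggest_links(pages, threshold=0.05):
        print(a, b, round(score, 3))
```

In a sketch like this, the passive user feedback mentioned in the abstract could be used to adjust the threshold or to re-rank candidate pairs, although the abstract does not state how feedback is actually incorporated.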
Despite recent advancements in malicious website detection and phishing mitigation, the security ecosystem has paid little attention to Fraudulent e-Commerce Websites (FCWs), such as fraudulent shopping websites, fake charities, and cryptocurrency scam websites. Even worse, there are no active large-scale mitigation systems or publicly available datasets for FCWs. In this paper, we first propose an efficient and automated approach to gather FCWs through crowdsourcing. We identify eight different types of non-phishing FCWs and derive key defining characteristics. Then, we find that anti-phishing mitigation systems, such as Google Safe Browsing, have a detection rate of just 0.46% on our dataset. We create a classifier, BEYOND PHISH, to identify FCWs using manually defined features based on our analysis. A user-study validation of BEYOND PHISH on never-before-seen data (used for neither training nor testing) indicates that our system has a high detection rate of 98.34% and a low false positive rate of 1.34%. Lastly, we collaborated with a major Internet security company, Palo Alto Networks, as well as a major financial services provider, to evaluate our classifier on manually labeled real-world data. The model achieves a false positive rate of 2.46% and a detection rate of 94.88%, showing potential for real-world defense against FCWs.
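For readers unfamiliar with the two metrics reported above, the following minimal sketch shows how a detection rate (true positive rate) and a false positive rate are conventionally computed from confusion-matrix counts. The function name and the counts used in the example are hypothetical and are not taken from the paper's evaluation.

```python
def detection_metrics(tp, fn, fp, tn):
    """Return (detection_rate, false_positive_rate) from confusion-matrix counts.

    detection_rate      = TP / (TP + FN): share of fraudulent sites that were flagged
    false_positive_rate = FP / (FP + TN): share of benign sites wrongly flagged
    """
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one fraudulent and one benign sample")
    return tp / (tp + fn), fp / (fp + tn)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to make the arithmetic easy to follow.
    rate, fpr = detection_metrics(tp=90, fn=10, fp=5, tn=95)
    print(f"detection rate: {rate:.2%}, false positive rate: {fpr:.2%}")
```

Because the two rates are computed over different populations (fraudulent versus benign samples), a classifier can score well on one and poorly on the other, which is why the abstract reports both.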
We describe the design of a distributed game-playing environment in which students develop software player strategies. The framework has three main components: the game server, which runs as a RESTful web service on the Internet; the game client, which runs on the student's computer; and the graphical interface, which runs inside a web browser on the student's computer. Our earlier framework ran all components locally and in a single programming language. The new framework supports single-user sessions, in which the student-implemented player plays against another software player (possibly faculty-supplied) or against a human player. It also supports multi-user sessions, in which student players on two or more separate computers can play against each other in a single game. Supported by the NSF, award ID 1044721.
Phishing is a critical threat to Internet users. Although an extensive ecosystem serves to protect users, phishing websites are growing in sophistication, and they can slip past the ecosystem's detection systems, and subsequently cause real-world damage, with the help of evasion techniques. Sophisticated client-side evasion techniques, known as cloaking, leverage JavaScript to enable complex interactions between potential victims and the phishing website, and can thus be particularly effective in slowing or entirely preventing automated mitigations. Yet, neither the prevalence nor the impact of client-side cloaking has been studied. In this paper, we present CrawlPhish, a framework for automatically detecting and categorizing client-side cloaking used by known phishing websites. We deploy CrawlPhish over 14 months between 2018 and 2019 to collect and thoroughly analyze a dataset of 112,005 phishing websites in the wild. By adapting state-of-the-art static and dynamic code analysis, we find that 35,067 of these websites have 1,128 distinct implementations of client-side cloaking techniques. Moreover, we find that attackers' use of cloaking grew from 23.32% initially to 33.70% by the end of our data collection period. Detection of cloaking by our framework exhibited low false-positive and false-negative rates of 1.45% and 1.75%, respectively. We analyze the semantics of the techniques we detected and propose a taxonomy of eight types of evasion across three high-level categories: User Interaction, Fingerprinting, and Bot Behavior. Using 150 artificial phishing websites, we empirically show that each category of evasion technique is effective in avoiding browser-based phishing detection (a key ecosystem defense). Additionally, through a user study, we verify that the techniques generally do not discourage victim visits. Therefore, we propose ways in which our methodology can be used to not only improve the ecosystem's ability to mitigate phishing websites with client-side cloaking, but also continuously identify emerging cloaking techniques as they are launched by attackers.
Although the Internet of Things (IoT) incorporates millions of heterogeneous devices to provide advanced intelligent services and has greatly impacted our lives over time, it has a huge blind spot because its design favors connectivity over security. Myriad efforts have been made to secure it, but it remains one of the most lucrative, and often one of the easiest, targets for attackers. IoT devices remain at higher risk of attack due to their intrinsic properties, which include but are not limited to extreme heterogeneity, a mostly plug-and-play nature, computational limitations, improper patch management, unnecessary open ports, default or missing security credentials, and extensive use of reusable open-source software. To address these security concerns, we need to thoroughly understand IoT devices' vulnerabilities, the associated attacks, and how criminal services can abuse these devices. In this paper, we present recent advances in IoT security vulnerabilities and criminal services by empirically identifying the major vulnerable IoT devices and the cyber attacks that cyber criminals use to exploit them. Additionally, we present a mapping of vulnerabilities, criminal services, attacks, and potential solutions against such vulnerabilities and attacks. We also present the different approaches in tabular form for side-by-side comparison.
Phishing attacks trick victims into disclosing sensitive information. To counter them, we explore machine learning and deep learning models leveraging large-scale data. We discuss models built on different kinds of data and present multiple deployment options to detect phishing attacks.