    A Keyword Based Strategy for Spam Topic Discovery from the Internet
    Citations: 4 | References: 8 | Related Papers: 10
    Abstract:
    The increasing volume of spam has become a serious threat not only to the Internet but also to society. However, discovering spam on the Internet effectively and efficiently remains a great challenge. Content-based filtering is one of the mainstream methods for addressing the problem. This paper proposes a content-based spam topic detection strategy built on keyword extraction. In particular, spam topics are detected using a multi-feature topic model with clue keywords, which integrates the corresponding features of news, BBS, and blog content. We obtain a minimum cost of 0.282 on the TDT4 evaluation corpus and a satisfaction rate of 93.3% on the Golaxy public opinion monitoring system of ICT, which is more effective than traditional methods. The experiments show that this algorithm is effective for spam topic detection.
    Keywords:
    Forum spam
    Mainstream
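The "minimum cost" quoted in the abstract is presumably the minimum normalized detection cost used in TDT evaluations; the abstract does not spell this out, so treat that reading as an assumption. Under that assumption, a minimal sketch of the metric looks like the following. The default parameters (`c_miss=1.0`, `c_fa=0.1`, `p_target=0.02`) are the values conventionally used in TDT evaluations, not values stated in the paper, and the function names are illustrative.

```python
def detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Normalized TDT-style detection cost for one operating point.

    p_miss and p_fa are the miss and false-alarm probabilities in [0, 1].
    The default costs and prior are assumed, conventional TDT settings.
    """
    raw = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
    # Normalize by the cheaper of the two trivial systems
    # (always answer "no" vs. always answer "yes").
    norm = min(c_miss * p_target, c_fa * (1.0 - p_target))
    return raw / norm


def min_detection_cost(operating_points):
    """Minimum normalized cost over (p_miss, p_fa) pairs from a threshold sweep."""
    return min(detection_cost(p_miss, p_fa) for p_miss, p_fa in operating_points)
```

A system that misses every target and raises no false alarms scores 1.0 on this scale, so lower values indicate a system that beats the trivial baseline.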
    Unsolicited commercial email, or spam, is recognized as a problem that disrupts email communication and costs the community dearly. To protect recipients from receiving spam, anti-spam measures built on technologies such as filters and block lists have been deployed widely. There is some evidence that certain anti-spam measures based on the purported origin of the spam cause unintended consequences relating to equity of access, which we term digital redlining. Spammers have an interest in bypassing such measures by obscuring the real origin of their messages. Investigating these effects means we need to determine the true origin of spam, despite the efforts of spammers to confuse us and spam filters. The aim of finding the true origin of spam differs from the objective of most anti-spam developers, who are mainly interested in identifying spam when it knocks on their front door (the mail server). In this paper we discuss why the difference between originator and delivery host matters when investigating digital redlining. We also highlight some of the difficulties we face when trying to determine the originating host as opposed to the delivering host.
    Forum spam
    Citations (0)
    According to recent research and statistics by Symantec, a significant amount of all global web traffic and email traffic is marked as spam. A spambot is a robot that maliciously traverses the World Wide Web (WWW) and gathers information, email addresses, etc. for the spammer. The growing sophistication of spambots has led to Spam 2.0, in which unsolicited content infiltrates legitimate Web 2.0 applications. This leads to various unwanted outcomes, such as spam pages appearing among the top search engine results due to excessive use of popular terms, inflated web page visit rates, spam emails, and wasted resources. Here we present an efficient method to detect web spambots in the presence of decoy actions by applying efficient approximate string-matching techniques. Our preliminary experimental results show that the proposed method successfully classifies web spambots in the presence of decoy actions, which helps eliminate spam in Web 2.0 applications.
    Forum spam
    Spamdexing
    Decoy
    Web crawler
    Web traffic
    Deep Web
    Citations (3)
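The idea of detecting spambots despite decoy actions via approximate string matching can be sketched as follows. This is not the paper's algorithm, which the abstract does not specify; it is a plain Levenshtein edit distance used as a stand-in. The one-letter action encoding (R = register, L = login, P = post, V = view), the bot pattern, and the `max_distance` threshold are all hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two action strings (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def looks_like_bot(session, bot_patterns, max_distance=2):
    """Flag a session whose action string is within max_distance edits
    of any known bot pattern, so a few inserted decoys do not hide it."""
    return any(edit_distance(session, p) <= max_distance for p in bot_patterns)


# Hypothetical data: a known bot pattern and two observed sessions.
BOT_PATTERNS = ["RLPPP"]
decoy_session = "RLVPPVP"   # bot pattern with two decoy "view" actions
human_session = "VVVLV"     # mostly browsing
```

With these assumed inputs, `looks_like_bot(decoy_session, BOT_PATTERNS)` is true because two insertions separate the session from the pattern, while the browsing-heavy session falls outside the threshold.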
    How do we keep our channels of electronic communication, both individual and group, open, while keeping out inappropriate and unrelated materials, such as spam? Does someone other than the intended recipient have the right to control what electronic mail users see? Might this lead to censorship? If others DO have the right to control what e-mail users see, how should this filtering or censorship occur? Are users aware of this filtering? If others are NOT controlling what users receive, what can users themselves do to control their environments to limit the amount of incoming spam? These are some of the topics that this CHI panel will address.
    Forum spam
    Opt-in email
    Electronic mail
    Citations (0)
    The sending of unsolicited communications (commonly known as ‘spam’) is considered as a great intrusion into the privacy of the user of electronic communications services, and is therefore regulated in Article 13 of the ePrivacy directive. At the time of the adoption of the directive, the most common ways of spamming were via telephone, fax, electronic mail and SMS. Technological progress, however, has since created more types of spamming, one of which is Bluespam, i.e., the action of sending spam to Bluetooth-enabled devices, such as mobile phones, PDAs or laptop computers. Although, at first sight, it would seem that Bluespam should be considered as any other kind of spam, and would therefore fall under the ambit of Article 13 of the ePrivacy directive, a closer look reveals that the answer is in fact not so obvious.
    Forum spam
    Citations (1)
    A spammer needs three elements to run a spam operation: a list of victim email addresses, content to be sent, and a botnet to send it. Each of these three elements is critical to the success of the spam operation: a good email list should be composed of valid email addresses, good email content should both convince the reader and evade anti-spam filters, and a good botnet should send spam efficiently. Given how critical these three elements are, actors specializing in each of them have emerged in the spam ecosystem. Email harvesters crawl the web and compile email lists, botmasters infect victim computers and maintain efficient botnets for spam dissemination, and spammers rent botnets and buy email lists to run spam campaigns. Previous research suggested that email harvesters and botmasters sell their services to spammers in a prosperous underground economy. No rigorous research has been performed, however, on understanding the relations between these three actors. This paper aims to shed some light on the relations between harvesters, botmasters, and spammers. By disseminating email addresses on the Internet, fingerprinting the botnets that contact these addresses, and looking at the content of these emails, we can infer the relations between the actors involved in the spam ecosystem. Our observations can be used by researchers to develop more effective anti-spam systems.
    Botnet
    Forum spam
    Citations (32)
    The rise of spam in the last decade has been staggering, with the rate of spam exceeding that of legitimate email. While conjectures exist on how spammers gain access to email addresses to spam, most work in the area of spam containment has either focused on better spam filtering methodologies or on understanding the botnets commonly used to send spam. In this paper, we aim to understand the origins of spam. We post dedicated email addresses to record how and where spammers go to obtain email addresses. We find that posting an email address on public Web pages yields immediate and high-volume spam. Surprisingly, even simple email obfuscation approaches are still sufficient today to prevent spammers from harvesting emails. We also find that attempts to find open relays continue to be popular among spammers. The insights we gain on the use of Web crawlers used to harvest email addresses and the commonalities of techniques used by spammers open the door for radically different follow-up work on spam containment and even systematic enforcement of spam legislation at a large scale.
    Forum spam
    Botnet
    Opt-in email
    Citations (26)
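The finding that simple obfuscation still defeats harvesting can be illustrated with a naive regex-based harvester. This is only a sketch: the abstract does not say which obfuscation schemes were tested or how real harvesters parse pages, so the pattern, the `[at]`/`[dot]` scheme, and the sample strings are assumptions.

```python
import re

# A deliberately simple pattern of the kind a basic harvester might use.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def harvest(page_text):
    """Return every substring of page_text that looks like a plain email address."""
    return EMAIL_RE.findall(page_text)


plain = "Contact: alice@example.com"
obfuscated = "Contact: alice [at] example [dot] com"

print(harvest(plain))       # -> ['alice@example.com']
print(harvest(obfuscated))  # -> []
```

A harvester would need scheme-specific rewriting rules to recover the second address, which is consistent with the observation that even lightweight obfuscation raises the cost of harvesting.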
    The most widely recognized form of spam is e-mail spam; however, the term "spam" is also used to describe similar abuses in other media. Spam 2.0 (or Web 2.0 spam) refers to spam content that is hosted on online Web 2.0 applications. In this paper, we provide a definition of Spam 2.0, identify and explain the different entities within Spam 2.0, discuss new difficulties associated with Spam 2.0, outline its significance, and list possible countermeasures. The aim of this paper is to provide the reader with a complete understanding of this new form of spamming.
    Forum spam
    Spamdexing
    CAPTCHA
    Countermeasure
    Citations (45)
    In this paper, we study the topical characteristics of spam hosts. To categorize spam hosts, we extract link spam structures from multiple time snapshots of a Japanese Web archive using graph algorithms. Next, we define several spam topic categories and classify the spam hosts in those structures into these topics using their uniform resource locators (URLs) and a machine learning approach. We analyze the spam topic distribution on the Web in different years and observe how spam topics change over time.
    Forum spam
    Spamdexing
    Citations (0)
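Classifying spam hosts into topics from their URLs alone can be sketched with a small multinomial naive Bayes model over URL tokens. The abstract does not name the learning algorithm, features, or topic categories, so the classifier choice, the tokenizer, the stop-token list, the topic labels, and the example URLs below are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed noise tokens that carry no topical signal.
STOP_TOKENS = {"http", "https", "www", "com", "net", "org"}


def url_tokens(url):
    """Split a URL into lowercase alphabetic tokens, dropping short and stop tokens."""
    parts = re.split(r"[^a-z]+", url.lower())
    return [t for t in parts if len(t) > 2 and t not in STOP_TOKENS]


class UrlTopicClassifier:
    """Multinomial naive Bayes over URL tokens with add-one smoothing."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, urls, topics):
        for url, topic in zip(urls, topics):
            tokens = url_tokens(url)
            self.token_counts[topic].update(tokens)
            self.doc_counts[topic] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, url):
        tokens = url_tokens(url)
        total_docs = sum(self.doc_counts.values())
        best_topic, best_score = None, -math.inf
        for topic, n_docs in self.doc_counts.items():
            counts = self.token_counts[topic]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)  # smoothed likelihood
            if score > best_score:
                best_topic, best_score = topic, score
        return best_topic


# Hypothetical training data with made-up hosts and topic labels.
clf = UrlTopicClassifier().fit(
    ["http://cheap-pills-online.example.com", "http://pills-pharmacy.example.net",
     "http://casino-poker-bonus.example.com", "http://poker-slots.example.org"],
    ["pharmacy", "pharmacy", "gambling", "gambling"],
)
```

Fitting one such model per archive snapshot and comparing the predicted topic counts across years would be one way to observe a shift in topic distribution, under the same assumptions.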