    Definition of spam 2.0: New spamming boom
    Citations: 45
    References: 17
    Related papers: 10
    Abstract:
    The most widely recognized form of spam is e-mail spam; however, the term "spam" is also used to describe similar abuses in other media. Spam 2.0 (or Web 2.0 spam) refers to spam content that is hosted on online Web 2.0 applications. In this paper, we provide a definition of Spam 2.0, identify and explain the different entities within Spam 2.0, discuss new difficulties associated with Spam 2.0, outline its significance, and list possible countermeasures. The aim of this paper is to provide the reader with a complete understanding of this new form of spamming.
    Keywords:
    Forum spam
    Spamdexing
    CAPTCHA
    Countermeasure
    Combating Web spam is one of the greatest challenges for Web search engines. State-of-the-art anti-spam techniques focus mainly on detecting particular spam strategies, such as content spamming and link-based spamming. Although these anti-spam approaches have had much success, they encounter problems when fighting against a continuous barrage of new types of spamming techniques. We attempt to solve the problem from a new perspective, by noticing that queries that are more likely to lead to spam pages/sites have the following characteristics: 1) they are popular or reflect heavy demand from search engine users and 2) there are usually few key resources or authoritative results for them. From these observations, we propose a novel method that is based on click-through data analysis and propagates the spamicity score iteratively between queries and URLs, starting from a few seed pages/sites. Once we obtain the seed pages/sites, we use the link structure of the click-through bipartite graph to discover other pages/sites that are likely to be spam. Experiments show that our algorithm is both efficient and effective in detecting Web spam. Moreover, combining our method with popular anti-spam techniques such as TrustRank yields an improvement over each technique taken individually.
    Spamdexing
    Forum spam
    Click-through rate
    Deep Web
    Citations (17)
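The abstract above describes propagating a spamicity score between queries and URLs over a click-through bipartite graph, starting from a few seed pages. The following is only a minimal sketch of that general idea, not the authors' algorithm: the function name `propagate_spamicity`, the click-weighted averaging rule, the `damping` factor, the iteration count, and the example data are all assumptions made for illustration.

```python
from collections import defaultdict


def propagate_spamicity(clicks, seed_urls, iterations=10, damping=0.85):
    """Hypothetical score propagation on a query-URL click graph.

    clicks: dict mapping (query, url) -> positive click count.
    seed_urls: set of URLs assumed to be known spam.
    Returns (url_score, query_score), each mapping a node to a value in [0, 1].
    """
    # Build weighted adjacency lists for both sides of the bipartite graph.
    q_to_u = defaultdict(dict)
    u_to_q = defaultdict(dict)
    for (query, url), count in clicks.items():
        if count <= 0:
            continue  # ignore empty edges so the averages below are well defined
        q_to_u[query][url] = count
        u_to_q[url][query] = count

    url_score = {u: (1.0 if u in seed_urls else 0.0) for u in u_to_q}
    query_score = {q: 0.0 for q in q_to_u}

    for _ in range(iterations):
        # A query's score is the click-weighted average of its URLs' scores.
        for query, urls in q_to_u.items():
            total = sum(urls.values())
            query_score[query] = sum(c * url_score[u] for u, c in urls.items()) / total
        # A non-seed URL's score is the damped click-weighted average of its
        # queries' scores; seed URLs stay pinned at 1.0.
        for url, queries in u_to_q.items():
            if url in seed_urls:
                continue
            total = sum(queries.values())
            avg = sum(c * query_score[q] for q, c in queries.items()) / total
            url_score[url] = damping * avg
    return url_score, query_score


# Made-up click log: two URLs share a spam-prone query, one is unrelated.
example_clicks = {
    ("cheap pills", "spam1.example"): 10,
    ("cheap pills", "spam2.example"): 10,
    ("python docs", "docs.example"): 20,
}
url_score, query_score = propagate_spamicity(example_clicks, {"spam1.example"})
```

In this toy graph the unlabeled URL that shares a query with the seed picks up a positive score, while the URL reached only through an unrelated query stays at zero. A real system would need a principled update rule, convergence checks, and a threshold, none of which the abstract specifies.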
    Definition: Web spam refers to a host of techniques that subvert the ranking algorithms of web search engines and cause them to rank certain search results higher than they otherwise would. Examples of such techniques include content spam (populating web pages with popular and often highly monetizable search terms), link spam (creating links to a page in order to increase its link-based score), and cloaking (serving different versions of a page to search engine crawlers than to human users). Web spam is annoying to search engine users and disruptive to search engines; therefore, most commercial search engines try to combat web spam. Combating web spam consists of identifying spam content with high probability and – depending on policy – downgrading it during ranking, eliminating it from the index, no longer crawling it, and tainting affiliated content. The first step – identifying likely spam pages – is a classification problem amenable to machine learning techniques. Spam classifiers take a large set of diverse features as input, including content-based features, link-based features, DNS and domain-registration features, and implicit user feedback. Commercial search engines treat their precise set of spam-prediction features as extremely proprietary, and features (as well as spamming techniques) evolve continuously as search engines and web spammers are engaged in a continuing “arms race.”
    Spamdexing
    Web crawler
    Forum spam
    Search engine optimization
    Citations (0)
    According to recent research and statistics by Symantec, a significant amount of all global web traffic and email traffic is marked as spam. A spambot is essentially a robot that maliciously traverses the World Wide Web (WWW) and gathers information, email addresses, etc., for the spammer. The growing sophistication of spambots has led to the emergence of Spam 2.0, in which unsolicited content infiltrates legitimate Web 2.0 applications. This leads to various unwanted outcomes, such as spam pages appearing among the top search engine results due to excessive use of popular terms, inflated page-visit rates, spam emails, and wasted resources. Here we present an efficient method to detect web spambots in the presence of decoy actions by applying efficient approximate string-matching techniques. Our preliminary experimental results show that the proposed method is successful in classifying web spambots in the presence of decoy actions, hence eliminating spam in Web 2.0 applications.
    Forum spam
    Spamdexing
    Decoy
    Web crawler
    Web traffic
    Deep Web
    Citations (3)
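The abstract above mentions detecting spambots in the presence of decoy actions through approximate string matching, without giving details. As a hedged illustration only, the sketch below assumes each web action is encoded as one character and uses the classic Sellers dynamic program (edit distance against any substring) to look for a known spam action pattern inside a session. The action alphabet, the pattern `"RLP"`, the error budget `max_errors`, and both function names are hypothetical and not taken from the paper.

```python
def min_edit_distance_substring(pattern, text):
    """Smallest edit distance between `pattern` and any substring of `text`."""
    # prev[i] holds the cost of aligning pattern[:i] so that it ends at the
    # previous text position; before any text is read, that cost is i.
    prev = list(range(len(pattern) + 1))
    best = prev[-1]
    for ch in text:
        curr = [0]  # a match may start at any text position at no cost
        for i, p in enumerate(pattern, start=1):
            cost = 0 if p == ch else 1
            curr.append(min(
                prev[i - 1] + cost,  # match or substitute
                prev[i] + 1,         # extra symbol in the text (e.g. a decoy action)
                curr[i - 1] + 1,     # pattern symbol missing from the text
            ))
        best = min(best, curr[-1])
        prev = curr
    return best


def looks_like_spambot(session, spam_patterns, max_errors=1):
    """Flag a session if any known pattern occurs with at most `max_errors` edits."""
    return any(
        min_edit_distance_substring(pattern, session) <= max_errors
        for pattern in spam_patterns
    )


# Hypothetical encoding: R = register, L = log in, P = post, V = view, S = search.
spam_patterns = ["RLP"]
decoyed_bot_session = "VRLVP"  # register, log in, decoy view, post
human_session = "VSVVS"        # browsing and searching only
```

With this encoding, the decoy view inserted between logging in and posting costs a single edit, so the session still matches the pattern within the error budget, whereas a session sharing no actions with the pattern does not.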
    Spamdexing
    Web crawler
    Forum spam
    Search engine optimization
    Citations (16)
    Web spam refers to techniques that try to manipulate search engine ranking algorithms in order to raise a web page's position in search engine results. In the best case, spammers encourage viewers to visit their sites and provide undeserved advertisement gains to the page owner. In the worst case, they use malicious content in their pages and try to install malware on the victim's machine. Spammers use three kinds of spamming techniques to get a higher ranking score: link-based techniques, hiding techniques, and content-based techniques. Existing spam pages cause distrust of search engine results. This not only wastes visitors' time but also wastes a great deal of search engine resources. Hence, spam detection methods have been proposed as a solution for web spam in order to reduce the negative effects of spam pages. Experimental results show that some of these techniques work well and can find spam pages more accurately than others. This paper classifies web spam techniques and the related detection methods.
    Spamdexing
    Forum spam
    Web crawler
    Deep Web
    Search engine optimization
    Citations (0)
    Web spam can significantly degrade the quality of search engines. Early web spamming techniques mainly manipulated page content. Since linkage information is widely used in web search, link-based spamming has also developed. So far, many techniques have been proposed to detect link spam. Those approaches are basically built on link-based web ranking methods.
    Spamdexing
    Link (geometry)
    Link analysis
    Forum spam
    Citations (43)
    Web spam refers to techniques that try to manipulate search engine ranking algorithms in order to raise a web page's position in search engine results. In the best case, spammers encourage viewers to visit their sites and provide undeserved advertisement gains to the page owner. In the worst case, they use malicious content in their pages and try to install malware on the victim's machine. Spammers use three kinds of spamming techniques to get a higher ranking score: link-based techniques, hiding techniques, and content-based techniques. Existing spam pages cause distrust of search engine results. This not only wastes visitors' time but also wastes a great deal of search engine resources. Hence, spam detection methods have been proposed as a solution for web spam in order to reduce the negative effects of spam pages. Experimental results show that some of these techniques work well and can find spam pages more accurately than others. This paper classifies web spam techniques and the related detection methods.
    Spamdexing
    Forum spam
    Web crawler
    Deep Web
    Citations (15)
    Web spam potentially causes three deleterious effects: unnecessary work for crawlers and search engines; diversion of traffic away from legitimate businesses; and annoyance to search engine users through poorer results. Past research on web spam has focused on spamming techniques, spam suppression techniques, and methods for classifying web content as spam or non-spam. Here we focus on the deterioration of search result quality caused by the presence of spam in a country-scale web. We present a framework for measuring the degradation in quality of search results caused by the presence of web spam. We index the 80-million-page UK2006 web spam collection on one machine. We trial the proposed framework in an experiment with the UK2006 collection and demonstrate that simple removal of spam pages from result sets can increase result quality. We conclude that the framework is a reasonable vehicle for research in this area and outline changes necessary for planned future experiments.
    Spamdexing
    Forum spam
    Web crawler
    Search engine optimization
    Citations (7)
    A high ranking of a Web site in search engines can be directly correlated with high revenues. This amplifies the phenomenon of Web spamming, which can be defined as preparing or manipulating any features of Web documents or hosts to mislead search engines’ ranking algorithms and gain an undeservedly high position in search results. Web spam markedly degrades the quality of information available on the Web and thus affects the whole Web community, including search engines. The struggle between search engines and spammers is ongoing: both sides apply increasingly sophisticated techniques and counter-techniques against each other. In this paper, we first present general background on the Web spam phenomenon. We then explain why the machine learning approach is so attractive for combating Web spam. Finally, we provide results of our experiments aimed at addressing certain open questions. We investigate the quality of the data provided as the Web Spam Reference Corpus, widely used by the research community as a benchmark, and propose some improvements. We also try to address the question of parameter tuning for cost-sensitive classifiers, and we delve into the possibility of using linguistic features for distinguishing spam from non-spam.
    Spamdexing
    Forum spam
    Web crawler
    Citations (2)