Abstract
Web search engines have become indispensable tools for finding content. As the popularity of the Web has increased, the efforts to exploit the Web for commercial, social, or political advantage have grown, making it harder for search engines to discriminate between truthful signals of content quality and deceptive attempts to game search engines’ rankings. This problem is further complicated by the open nature of the Web, which allows anyone to write and publish anything, and by the fact that search engines must analyze ever-growing numbers of Web pages. Moreover, increasing expectations of users, who over time rely on Web search for information needs related to more aspects of their lives, further deepen the need for search engines to develop effective counter-measures against deception.
In this monograph, we consider the effects of the adversarial relationship between search systems and those who wish to manipulate them, a field known as “Adversarial Information Retrieval”. We show that search engine spammers create false content and misleading links to lure unsuspecting visitors to pages filled with advertisements or malware. We also examine work from the past decade or so that aims to detect such spamming activities so that spam pages can be removed or their effect on the quality of results reduced.
Research in Adversarial Information Retrieval continues to evolve, both in traditional areas (e.g., link spam) and in newer areas such as click fraud and spam in social media, demonstrating that this conflict is far from over.
Details
Published: Jan 22, 2011
Vol/Issue: 4(5)
Pages: 377-486
Cite This Article
Carlos Castillo, Brian D. Davison (2011). Adversarial Web Search. Foundations and Trends® in Information Retrieval, 4(5), 377-486. https://doi.org/10.1561/1500000021