Abstract
Blogs have recently emerged as a new open, rapidly evolving, and reactive publishing medium on the Web. Rather than being managed by a central entity, the content of the blogosphere (the collection of all blogs on the Web) is produced by millions of independent bloggers, who can write about virtually anything. This open publishing paradigm has led to a growing mass of user-generated content on the Web, which can vary tremendously in both format and quality when looked at in isolation, but which can also reveal interesting patterns when observed in aggregate. One field particularly interested in studying how information is produced, consumed, and searched in the blogosphere is information retrieval. In this survey, we review the published literature on searching the blogosphere. In particular, we describe the phenomenon of blogging and the motivations for searching for information on blogs. We cover both the search tasks underlying blog searchers’ information needs and the most successful approaches to these tasks. These include blog post and full blog search tasks, as well as blog-aided search tasks, such as trend and market analysis. Finally, we describe the publicly available resources that support research on searching the blogosphere.
Details
Published: Jul 30, 2012
Vol/Issue: 6(1)
Pages: 1-125
Cite This Article
Rodrygo L. T. Santos, Craig Macdonald, Richard McCreadie, et al. (2012). Information Retrieval on the Blogosphere. Foundations and Trends® in Information Retrieval, 6(1), 1-125. https://doi.org/10.1561/1500000026