Abstract
How can a single person understand what’s going on in a collection of millions of documents? This is an increasingly common problem: sifting through an organization’s e-mails, understanding a decade’s worth of newspapers, or characterizing a scientific field’s research. Topic models are a statistical framework that helps users understand large document collections: not just to find individual documents but to understand the general themes present in the collection. This survey describes recent academic and industrial applications of topic models, with the goal of equipping a young researcher to build their own applications of topic models. In addition to topic models’ effective application to traditional problems like information retrieval, visualization, statistical inference, multilingual modeling, and linguistic understanding, this survey also reviews topic models’ ability to unlock large text collections for qualitative analysis. We review their successful use by researchers to help understand fiction, non-fiction, scientific publications, and political texts.
How can a single person understand what’s going on in a collection of millions of documents? This is an increasingly widespread problem: sifting through an organization’s e-mails, understanding a decade’s worth of newspapers, or characterizing a scientific field’s research. This monograph explores the ways that humans and computers make sense of document collections through tools called topic models. Topic models are a statistical framework that helps users understand large document collections: not just to find individual documents but to understand the general themes present in the collection.
Applications of Topic Models describes the recent academic and industrial applications of topic models. In addition to topic models’ effective application to traditional problems like information retrieval, visualization, statistical inference, multilingual modeling, and linguistic understanding, Applications of Topic Models also reviews topic models’ ability to unlock large text collections for qualitative analysis. It reviews their successful use by researchers to help understand fiction, non-fiction, scientific publications, and political texts.
Applications of Topic Models is aimed at readers who have some knowledge of document processing, a basic understanding of probability, and an interest in a range of application domains. It discusses the information needs of each application area and how those specific needs affect models, curation procedures, and interpretations. The authors hope that, by the end of the monograph, readers will be excited enough to build their own topic models. It should also interest topic model experts, since its coverage of diverse applications may expose models and approaches they have not seen before.
Details
Published: Jul 20, 2017
Vol/Issue: 11(2-3)
Pages: 143-296
Cite This Article
Jordan Boyd-Graber, Yuening Hu, David Mimno (2017). Applications of Topic Models. Foundations and Trends® in Information Retrieval, 11(2-3), 143-296. https://doi.org/10.1561/1500000030