journal article Apr 05, 2017

Enhancing Validity in Observational Settings When Replication is Not Possible

DOI: 10.1017/psrm.2017.5
Abstract
We argue that political scientists can provide additional evidence for the predictive validity of observational and quasi-experimental research designs by minimizing the expected prediction error or generalization error of their empirical models. For observational and quasi-experimental data not generated by a stochastic mechanism under the researcher’s control, the reproduction of statistical analyses is possible but replication of the data-generating procedures is not. Estimating the generalization error of a model for this type of data and then adjusting the model to minimize this estimate (regularization) provides evidence for the predictive validity of the study by decreasing the risk of overfitting. Estimating generalization error also allows for model comparisons that highlight underfitting: when a model generalizes poorly due to missing systematic features of the data-generating process. Thus, minimizing generalization error provides a principled method for modeling relationships between variables that are measured but whose relationships with the outcome(s) are left unspecified by a deductively valid theory. Overall, the minimization of generalization error is important because it quantifies the expected reliability of predictions in a way that is similar to external validity, consequently increasing the validity of the study’s conclusions.
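The abstract describes estimating generalization error and then tuning the model to minimize that estimate. The abstract does not name a specific estimator, so the following is only a minimal sketch of one common way to do this: k-fold cross-validation over a grid of ridge penalties. The one-predictor ridge model, the function names (`fit_ridge`, `cv_error`, `select_penalty`), the penalty grid, and the synthetic data are all assumptions made for illustration and are not taken from the article.

```python
import random


def fit_ridge(xs, ys, lam):
    """Closed-form ridge slope for a one-predictor model with no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def cv_error(xs, ys, lam, k=5):
    """Estimate generalization error (mean squared error) by k-fold cross-validation."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        # Observation i is held out in fold i mod k, so each point is tested exactly once.
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        beta = fit_ridge(train_x, train_y, lam)
        total += sum((ys[i] - beta * xs[i]) ** 2 for i in test_idx)
    return total / n


def select_penalty(xs, ys, grid, k=5):
    """Return the penalty with the smallest cross-validated error, plus all errors."""
    errors = {lam: cv_error(xs, ys, lam, k) for lam in grid}
    best = min(errors, key=errors.get)
    return best, errors


# Synthetic data (hypothetical): a weak linear signal plus noise.
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(40)]
ys = [0.5 * x + rng.gauss(0, 1) for x in xs]

grid = [0.0, 0.1, 1.0, 10.0, 100.0]
best_lam, errors = select_penalty(xs, ys, grid)
print("selected penalty:", best_lam)
print("estimated generalization error:", errors[best_lam])
```

Comparing the cross-validated error across candidate penalties is what allows both overfitting (too little shrinkage) and underfitting (too much shrinkage) to show up as higher held-out error.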
Metrics
Citations: 17
References: 76
Details
Published: Apr 05, 2017
Vol/Issue: 6(2)
Pages: 365-380
Cite This Article
Christopher J. Fariss, Zachary M. Jones (2017). Enhancing Validity in Observational Settings When Replication is Not Possible. Political Science Research and Methods, 6(2), 365-380. https://doi.org/10.1017/psrm.2017.5