journal article Open Access Aug 04, 2023

Ensemble learning for score likelihood ratios under the common source problem

DOI: 10.1002/sam.11637
Abstract
Machine learning-based score likelihood ratios (SLRs) have emerged as alternatives to traditional likelihood ratios and Bayes factors for quantifying the value of evidence when contrasting two opposing propositions. When developing a conventional statistical model is infeasible, machine learning can be used to construct a (dis)similarity score for complex data and to estimate the ratio of the conditional distributions of the scores under each proposition. Under the common source problem, the opposing propositions address whether two items come from the same source. To develop their SLRs, practitioners create datasets using pairwise comparisons from a background population sample. These comparisons result in a complex dependence structure that violates the independence assumption made by many popular methods. We propose a resampling step to remedy this lack of independence and an ensemble approach to enhance the performance of SLR systems. First, we introduce a source-aware resampling plan to construct datasets in which the independence assumption is met. Using these newly created sets, we train multiple base SLRs and aggregate their outputs into a final value of evidence. Our experimental results show that this ensemble SLR can outperform a traditional SLR approach in terms of the rate of misleading evidence and discriminatory power, and that it yields more consistent results.
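The workflow described in the abstract (resample comparisons in a source-aware way, train several base SLRs, aggregate them) can be illustrated with a minimal sketch. It is not the authors' implementation, and every specific choice below is an assumption made for illustration only: scalar features, a log absolute difference as the dissimilarity score, normal densities fitted to the scores, a one-third split of sources for same-source pairs, and the mean of the base log10 SLRs as the aggregate. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean

EPS = 1e-9  # avoids log(0) if two features coincide exactly


def score(a, b):
    """Hypothetical dissimilarity score: log absolute difference of two scalars."""
    return math.log(abs(a - b) + EPS)


def source_aware_sample(by_source, rng):
    """Draw comparisons so that each source contributes to at most one pair."""
    sources = list(by_source)
    rng.shuffle(sources)
    n_same = len(sources) // 3  # assumed split, chosen to balance the two classes
    # Same-source pairs: two distinct items from one source.
    same = [score(*rng.sample(by_source[s], 2)) for s in sources[:n_same]]
    # Different-source pairs: one item from each of two not-yet-used sources.
    rest = sources[n_same:]
    diff = [score(rng.choice(by_source[s1]), rng.choice(by_source[s2]))
            for s1, s2 in zip(rest[0::2], rest[1::2])]
    return same, diff


def log_normal_pdf(dist, x):
    """Log density of a fitted normal, computed directly to avoid underflow."""
    z = (x - dist.mean) / dist.stdev
    return -0.5 * z * z - math.log(dist.stdev) - 0.5 * math.log(2 * math.pi)


def fit_base_slr(same, diff):
    """Fit one base SLR: ratio of score densities under the two propositions."""
    f_same = NormalDist.from_samples(same)
    f_diff = NormalDist.from_samples(diff)

    def log10_slr(s):
        return (log_normal_pdf(f_same, s) - log_normal_pdf(f_diff, s)) / math.log(10)

    return log10_slr


def fit_ensemble(by_source, n_models=25, seed=0):
    """Train base SLRs on independent resamples and average their log10 outputs."""
    rng = random.Random(seed)
    models = [fit_base_slr(*source_aware_sample(by_source, rng))
              for _ in range(n_models)]

    def ensemble_log10_slr(a, b):
        s = score(a, b)
        return mean(m(s) for m in models)

    return ensemble_log10_slr


def make_background(n_sources=60, n_items=5, seed=1):
    """Synthetic background sample: each source has its own centre plus noise."""
    rng = random.Random(seed)
    by_source = {}
    for s in range(n_sources):
        centre = rng.gauss(0.0, 5.0)
        by_source[s] = [rng.gauss(centre, 0.5) for _ in range(n_items)]
    return by_source


background = make_background()
slr = fit_ensemble(background)
close = slr(1.00, 1.05)  # two very similar items
far = slr(1.00, 9.00)    # two very different items
print(f"log10 SLR (close pair): {close:.2f}")
print(f"log10 SLR (far pair):   {far:.2f}")
```

Because each source is used in at most one comparison per resample, no two scores in a training set share an item or a source, which is the kind of independence the abstract refers to. A positive log10 SLR supports the same-source proposition and a negative value supports the different-source proposition.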
Details
Published: Aug 04, 2023
Vol/Issue: 16(6)
Pages: 528-546
Funding: National Institute of Standards and Technology; Center for Statistics and Applications in Forensic Evidence (Award: 70NANB15H176)
Cite This Article
Federico Veneri, Danica M. Ommen (2023). Ensemble learning for score likelihood ratios under the common source problem. Statistical Analysis and Data Mining: An ASA Data Science Journal, 16(6), 528-546. https://doi.org/10.1002/sam.11637