journal article Feb 13, 2024

Measuring and Mitigating Gender Bias in Legal Contextualized Language Models

Abstract
Transformer-based contextualized language models are the state of the art in several natural language processing (NLP) tasks and applications. Despite their utility, contextualized models can contain human-like social biases, as their training corpora generally consist of human-generated text. Evaluating and removing social biases in NLP models has been a major research endeavor. In parallel, the use of NLP in the legal domain, known as legal NLP or computational law, has also been growing. Eliminating unwanted bias in legal NLP is crucial, since the law profoundly affects people's lives. In this work, we focus on the gender bias encoded in BERT-based models. We propose a new template-based bias measurement method with a new bias evaluation corpus using crime words from the FBI database. This method quantifies the gender bias present in BERT-based models for legal applications. Furthermore, we propose a new fine-tuning-based debiasing method using the European Court of Human Rights (ECtHR) corpus to debias legal pre-trained models. We test the debiased models’ language understanding performance on the LexGLUE benchmark to confirm that the underlying semantic vector space is not perturbed during the debiasing process. Finally, we propose a bias penalty for the performance scores to emphasize the effect of gender bias on model performance.


Metrics
Citations: 11
References: 96
Details
Published: Feb 13, 2024
Vol/Issue: 18(4)
Pages: 1-26
Funding
TUBITAK 1001 grant Award: 120E346
BAGEP 2023 Young Scientist Award
Cite This Article
Mustafa Bozdag, Nurullah Sevim, Aykut Koç (2024). Measuring and Mitigating Gender Bias in Legal Contextualized Language Models. ACM Transactions on Knowledge Discovery from Data, 18(4), 1-26. https://doi.org/10.1145/3628602