Information Extraction and Reasoning in Legal Texts
Project dates (estimated):
October 2021 - September 2025
Name of the PhD student:
Claire Barale
Supervisors:
Michael Rovatsos - School of Informatics
Nehal Bhuta - Law School
Project aims:
This research focuses on integrating knowledge into language models and enhancing their reasoning abilities. Specifically, my PhD explores legal information extraction as a foundational step in advancing natural language processing for the legal domain.
Legal language is a distinct linguistic register with unique challenges for NLP. Simply increasing pretraining or post-training data is not always effective, especially given the scarcity of high-quality legal text and proprietary restrictions.
Beyond developing new resources and methods for legal NLP, I evaluate how current language models process legal texts, investigate their limitations, and identify strategies to improve legal language modeling.
Disciplines and subfields engaged:
Artificial intelligence and machine learning
Legal decision making
Human-centered computing
Cognitive computing
AI Ethics
Research Themes:
Ethics of Algorithms
Bias and Discrimination in Machine Learning
Ethics of Algorithmic Decision-Making
Algorithmic Accountability and Responsibility
Ethics of Human-Machine Interaction
Ethics of Automation
Ethics of Knowledge Augmentation
Emerging Technology and Human Identity
AI Automation and Human Autonomy
Related outputs:
Grants and Awards:
Awarded the Bloomberg Data Science PhD Fellowship starting July 2023.
Won the Best Paper Award for "Empowering Refugee Claimants and their Lawyers: Using Machine Learning to Examine Decision-Making in Refugee Law", presented at the International Conference on Artificial Intelligence and Law 2023 (ICAIL 2023) Doctoral Consortium.
Publications and Presentations:
Drish Mali, Rubash Mali, Claire Barale. 2024. Information Extraction for Planning Court Cases. In Proceedings of the Natural Legal Language Processing Workshop (NLLP) at EMNLP 2024.
Aryo Pradipta Gema, Joshua Ong Jun Leang, Giwon Hong, Alessio Devoto, Alberto Carlo Maria Mancino, Rohit Saxena, Xuanli He, Yu Zhao, Xiaotang Du, Mohammad Reza Ghasemi Madani, Claire Barale, Robert McHardy, Joshua Harris, Jean Kaddour, Emile van Krieken, Pasquale Minervini. Are We Done with MMLU? NAACL 2025.
Claire Barale, Mark Klaisoongnoen, Pasquale Minervini, Michael Rovatsos, and Nehal Bhuta. 2023. AsyLex: A Dataset for Legal Language Processing of Refugee Claims. In Proceedings of the Natural Legal Language Processing Workshop 2023, pages 244–257, Singapore. Association for Computational Linguistics.
Claire Barale, Michael Rovatsos, and Nehal Bhuta. 2023. Do Language Models Learn about Legal Entity Types during Pretraining? In Proceedings of the Natural Legal Language Processing Workshop 2023, pages 25–37, Singapore. Association for Computational Linguistics.
Co-organized a workshop, NLP and Network Analysis in Financial Applications, at the 4th ACM Conference on AI in Finance held on November 7th 2023 in New York.
Presented "fAsyLex: Accelerating Legal NLP through Comparative Analysis of Multi-GPU Approaches" at the Women in High Performance Computing Workshop (WHPC) at SC2023, Denver, CO.
Automated Refugee Case Analysis: An NLP Pipeline for Supporting Legal Practitioners, Claire Barale, Michael Rovatsos, and Nehal Bhuta, Findings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2023, Toronto, Canada.
Human-centered computing in legal NLP - An application to refugee status determination, Claire Barale, Proceedings of the Second Workshop on Bridging Human–Computer Interaction and Natural Language Processing, 2022.
Enabling Ethical Human-AI Reasoning in International Law, talk by Claire Barale, Artificial Intelligence and its Applications Institute (AIAI) Seminar, 20 June 2022.