Personal ICAIL 2021 Highlights

Jun 25, 2021

In this post, I list the papers I found most interesting at the ICAIL 2021 conference. I selected them mainly for their relevance to our research project.

Converting Copyright Legislation into Machine-Executable Code: Interpretation, Coding Validation and Legal Alignment
Authors: Witt, Alice; Huggins, Anna; Governatori, Guido; Buckley, Joshua
Abstract: A critical challenge in “Rules as Code” (“RaC”) initiatives is ensuring legal accuracy. In this paper, we present the preliminary results of a two-week, first of its kind experiment that aims to shed light on how different legally trained people interpret and translate Australian Commonwealth legislation into machine-readable code. We find that coders collaboratively agreeing on key legal terms, or atoms, before commencing independent coding work can significantly increase the similarity of their encoded rules. Participants nonetheless made a range of divergent interpretive choices, which we argue are most likely due to: (1) the complexity of statutory interpretation, (2) encoded provisions having varying levels of granularity and (3) the functionality of our coding language. Based on these findings, we draw an important distinction between processes for technical validation of encoded rules, which focus on ensuring rules adhere to select coding languages and conventions, and processes of legal alignment, which we conceptualise as enhancing congruence between the encoded rules and the true meaning of the statutory provisions in line with the modern approach to statutory interpretation. We argue that these processes are distinct but both critically important in enhancing the accuracy of RaC. We conclude by underlining the need for multi-disciplinary expertise across specific legal subject matters, statutory interpretation and technical programming in RaC initiatives.

A Combined Rule-Based and Machine Learning Approach for Automated GDPR Compliance Checking
Authors: El Hamdani, Rajaa; Mustapha, Majd; Restrepo Amariles, David; Troussel, Aurore; Meeus, Sébastien; Krasnashchok, Katsiaryna
Abstract: The General Data Protection Regulation (GDPR) requires data controllers to implement end-to-end compliance. Controllers must therefore ensure that the terms agreed with the data subject and their own obligations under GDPR are respected in the data flows from data subject to controllers, processors and sub processors (i.e. data supply chain). This paper seeks to contribute to bridging both ends of compliance checking through a two-pronged study. First, we conceptualize a framework to implement a document-centric approach to compliance checking in the data supply chain. Second, we develop specific methods to automate compliance checking of privacy policies. We test a two-module system, where the first module relies on NLP to extract data practices from privacy policies. The second module encodes GDPR rules to check the presence of mandatory information. The results show that the text-to-text approach outperforms local classifiers and enables the extraction of both coarse-grained and fine-grained information with only one model. We implement an end-to-end evaluation of our system on a dataset of 30 privacy policies annotated by legal experts. We conclude that this approach could be generalized to other documents in the data supply chain as a means to improve end-to-end compliance.
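
To make the second module more concrete for myself, here is a minimal sketch of a rule check for mandatory information. It assumes the first (NLP) module has already produced a list of labels for a policy. The label names and the `check_policy` function are hypothetical and purely illustrative; they are neither the authors' rule set nor a complete list of GDPR requirements.

```python
# Hypothetical labels for information a privacy policy is expected to contain.
# This list is illustrative only and not taken from the paper.
REQUIRED_INFORMATION = {
    "controller_identity",
    "processing_purposes",
    "retention_period",
    "data_subject_rights",
}


def check_policy(extracted_labels, required=REQUIRED_INFORMATION):
    """Report which required labels are absent from the extracted ones."""
    missing = sorted(required - set(extracted_labels))
    return {"compliant": not missing, "missing": missing}


# Labels that an extraction module might have found in one policy (made up).
extracted = ["controller_identity", "processing_purposes"]
result = check_policy(extracted)
print(result)
```

The rule layer stays trivial on purpose: all of the difficulty sits in extracting reliable labels from the policy text, which is what the paper's first module addresses.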

Discovering the Rationale of Decisions: Towards a Method for Aligning Learning and Reasoning
Authors: Steging, Cornelis Cor; Renooij, Silja; Verheij, Bart
Abstract: In AI and law, systems that are designed for decision support should be explainable when pursuing justice. In order for these systems to be fair and responsible, they should make correct decisions and make them using a sound and transparent rationale. In this paper, we introduce a knowledge-driven method for model-agnostic rationale evaluation using dedicated test cases, similar to unit-testing in professional software development. We apply this new method in a set of machine learning experiments aimed at extracting known knowledge structures from artificial datasets from fictional and non-fictional legal settings. We show that our method allows us to analyze the rationale of black box machine learning systems by assessing which rationale elements are learned or not. Furthermore, we show that the rationale can be adjusted using tailor-made training data based on the results of the rationale evaluation.
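
The unit-testing analogy can be illustrated with a small sketch. Everything here is my own assumption rather than the authors' setup: the toy rule in `ground_truth`, the deliberately flawed `black_box` model, and the test sets. The idea is that each dedicated test set varies a single condition of the known rule, so a low score points to the rationale element the model failed to learn.

```python
def ground_truth(case):
    # Toy known rule (hypothetical): eligible if adult AND resident.
    return case["age"] >= 18 and case["resident"]


def black_box(case):
    # Stand-in for a trained model that only learned the age condition.
    return case["age"] >= 18


def evaluate_rationale(model, test_sets):
    """Score the model separately on each dedicated test set."""
    report = {}
    for element, cases in test_sets.items():
        correct = sum(model(c) == ground_truth(c) for c in cases)
        report[element] = correct / len(cases)
    return report


# Each test set varies one condition while keeping the other satisfied.
TEST_SETS = {
    "age": [{"age": a, "resident": True} for a in (16, 17, 18, 19)],
    "resident": [{"age": 30, "resident": r} for r in (True, False)],
}

report = evaluate_rationale(black_box, TEST_SETS)
print(report)
```

In this toy setting the model scores perfectly on the age cases but fails half of the residency cases, which is the kind of signal that overall accuracy on a mixed test set could hide.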

CriminelBART: A French Canadian Legal Language Model Specialized in Criminal Law
Authors: Garneau, Nicolas; Gaumond, Eve; Lamontagne, Luc; Déziel, Pierre-Luc
Abstract: Learning language representations is a key component in many natural language processing tasks, and their usefulness is most often challenged by the target domain and vocabulary. It has been shown that language models are surprisingly efficient at learning and transferring such representations to specific domains. However, the more specialized a target domain is, the harder the transfer, and thus proper fine-tuning is required. This is why we introduce CriminelBART, a French Canadian Legal Language Model specialized in Criminal Law. CriminelBART has been trained exclusively on criminal data. Therefore, the model learned specialized language representation for the criminal domain and not any other area of law. We illustrate its usefulness within two tasks; the first one, semantic textual similarity, is discriminative in the sense that we analyze the impact of having good language representation for textual classification involving semantic reasoning. The other one analyzes the generative capabilities of CriminelBART with a suite of Cloze Tests. Those are the first stepping stones in this very unique and particular arena that is French-Canadian criminal law.

Labels distribution matters in performance achieved in legal judgment prediction task
Authors: Salaün, Olivier; Langlais, Philippe; Benyekhlef, Karim
Abstract: Legal judgment prediction (LJP) can be formalized as text classification tasks in which models are given the factual description of a dispute and must return some labels that can be either the verdict decided by the judge or some other information such as relevant law articles or charge prediction. The literature shows that the use of articles as input features helps in improving the classification performance. In our work, we designed a verdict prediction task as text classification based on landlord-tenant tribunal decisions and we applied a BERT-based model to which we fed different article-based representations. Although the addition of such features helps in gaining up to an extra 3.5% in exact match, it delivers mitigated results in terms of macro-averaged F1 score as such an approach only improves the prediction of the most frequent labels but fails at predicting the least frequent ones. We also notice that some conditions must apply for the articles-based features to improve the F1 score of some verdict labels. All in all, these experiments suggest that pre-trained and fine-tuned transformer-based models are not scalable as is for legal reasoning in real life scenarios as they would only excel in accurately predicting the most recurrent verdicts to the detriment of other legal outcomes.
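
The gap between exact match and macro-averaged F1 is easy to reproduce on toy data. The sketch below is a single-label simplification with made-up verdict labels, not the paper's dataset or evaluation code. A classifier that always predicts the most frequent verdict looks fine on accuracy but poor on macro F1, because every label counts equally in the average.

```python
def accuracy(y_true, y_pred):
    """Fraction of cases where the predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, imbalanced verdict labels: 8 frequent, 2 rare.
y_true = ["granted"] * 8 + ["rejected"] + ["partially_granted"]
# A degenerate classifier that always predicts the majority label.
y_pred = ["granted"] * 10

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.2f} macro_f1={f1:.2f}")
```

Here accuracy is 0.80 while macro F1 is about 0.30, since the two rare labels each contribute an F1 of zero.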

Context-Aware Legal Citation Recommendation using Deep Learning
Authors: Huang, Zihan; Low, Charles; Teng, Mengqiu; Zhang, Hongyi; Ho, Daniel E.; Krass, Mark; Grabmair, Matthias
Abstract: Lawyers and judges spend a large amount of time researching the proper legal authority to cite while drafting decisions. In this paper, we develop a citation recommendation tool that can help improve efficiency in the process of opinion drafting. We train four types of machine learning models, including a citation-list based method (collaborative filtering) and three context-based methods (text similarity, BiLSTM and RoBERTa classifiers). Our experiments show that leveraging local textual context improves recommendation, and that deep neural models achieve decent performance. We show that non-deep text-based methods benefit from access to structured case metadata, but deep models only benefit from such access when predicting from context of insufficient length. We also find that, even after extensive training, RoBERTa does not outperform a recurrent neural model, despite its benefits of pretraining. Our behavior analysis of the RoBERTa model further shows that predictive performance is stable across time and citation classes.

Evaluating Document Representations for Content-based Legal Literature Recommendations
Authors: Ostendorff, Malte; Ash, Elliott; Ruas, Terry; Gipp, Bela; Moreno-Schneider, Julian; Rehm, Georg
Abstract: Recommender systems assist legal professionals in finding relevant literature for supporting their case. Despite its importance for the profession, legal applications do not reflect the latest advances in recommender systems and representation learning research. Simultaneously, legal recommender systems are typically evaluated in small-scale user studies without any publicly available benchmark datasets. Thus, these studies have limited reproducibility. To address the gap between research and practice, we explore a set of state-of-the-art document representation methods for the task of retrieving semantically related US case law. We evaluate text-based (e.g., fastText, Transformers), citation-based (e.g., DeepWalk, Poincaré), and hybrid methods. We compare in total 27 methods using two silver standards with annotations for 2,964 documents. The silver standards are newly created from Open Case Book and Wikisource and can be reused under an open license facilitating reproducibility. Our experiments show that document representations from averaged fastText word vectors (trained on legal corpora) yield the best results, closely followed by Poincaré citation embeddings. Combining fastText and Poincaré in a hybrid manner further improves the overall result. Besides the overall performance, we analyze the methods depending on document length, citation count, and the coverage of their recommendations. We make our source code, models, and datasets publicly available.
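
The best-performing representation is simple enough to sketch: average the word vectors of a document and rank other documents by cosine similarity. The sketch below uses a hand-made three-dimensional vector table in place of real fastText vectors, and the documents and function names are my own assumptions, so it only illustrates the mechanics and not the authors' pipeline.

```python
import math

# Toy stand-in for pretrained word vectors (real ones have ~300 dimensions).
TOY_VECTORS = {
    "contract": [0.9, 0.1, 0.0],
    "breach": [0.8, 0.2, 0.1],
    "damages": [0.7, 0.3, 0.0],
    "murder": [0.0, 0.9, 0.4],
    "sentence": [0.1, 0.8, 0.5],
}


def doc_vector(tokens, vectors, dim=3):
    """Average the vectors of all known tokens (zero vector if none)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(query_id, docs, vectors, k=1):
    """Return the ids of the k documents most similar to the query."""
    query = doc_vector(docs[query_id], vectors)
    scored = [
        (cosine(query, doc_vector(tokens, vectors)), doc_id)
        for doc_id, tokens in docs.items()
        if doc_id != query_id
    ]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)[:k]]


# Hypothetical tokenized documents.
DOCS = {
    "A": ["contract", "breach"],
    "B": ["breach", "damages"],
    "C": ["murder", "sentence"],
}

top = recommend("A", DOCS, TOY_VECTORS)
print(top)
```

With these toy vectors, the contract-law document "A" is matched to "B" rather than to the criminal-law document "C".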

Structural Text Segmentation of Legal Documents
Authors: Aumiller, Dennis; Almasian, Satya; Lackner, Sebastian; Gertz, Michael
Abstract: The growing complexity of legal cases has led to increasing interest in legal information retrieval systems that can effectively satisfy user-specific information needs. However, such downstream systems typically require documents to be properly formatted and segmented, which is often done with relatively simple pre-processing steps, disregarding topical coherence of segments. Systems generally rely on representations of individual sentences or paragraphs, which may lack crucial context, or document-level representations, which are too long for meaningful search results. To address this issue, we propose a segmentation system that can predict topical coherence of subsequent text segments spanning several paragraphs, effectively segmenting a document and providing a more balanced representation for downstream applications. We build our model on top of popular transformer networks and formulate structural text segmentation as topical change detection, by performing a series of independent classifications that allows for efficient fine-tuning on task-specific data. We crawl a novel dataset consisting of roughly 74,000 online Terms-of-Service documents, including hierarchical topic annotations, which we use for training. Results show that our proposed system significantly outperforms baselines, and adapts well to structural peculiarities of legal documents. We release both data and trained models to the research community for future work.
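
Framing segmentation as topical change detection can be sketched as a loop over adjacent paragraphs, where each pair is classified independently as "same topic" or "topic change". In the sketch below, `same_topic` is a crude word-overlap heuristic that I use as a placeholder for the fine-tuned transformer classifier; the threshold, function names and example paragraphs are all my own assumptions.

```python
def same_topic(par_a, par_b, threshold=0.2):
    """Placeholder classifier: Jaccard word overlap above a threshold."""
    words_a = set(par_a.lower().split())
    words_b = set(par_b.lower().split())
    if not words_a or not words_b:
        return False
    overlap = len(words_a & words_b) / len(words_a | words_b)
    return overlap >= threshold


def segment(paragraphs, classifier=same_topic):
    """Group consecutive paragraphs; open a new segment on a topic change."""
    segments = []
    for paragraph in paragraphs:
        if segments and classifier(segments[-1][-1], paragraph):
            segments[-1].append(paragraph)
        else:
            segments.append([paragraph])
    return segments


# Made-up Terms-of-Service style paragraphs.
paragraphs = [
    "we collect your email address",
    "we collect your payment details",
    "you may cancel the subscription anytime",
]

segments = segment(paragraphs)
print(segments)
```

Because each decision only looks at one pair of paragraphs, any binary classifier can be swapped in for the placeholder without changing the segmentation loop.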

A Dataset for Evaluating Legal Question Answering on Private International Law
Authors: Sovrano, Francesco; Palmirani, Monica; Distefano, Biagio; Sapienza, Salvatore; Vitali, Fabio
Abstract: Private International Law (PIL) is a complex legal domain that presents frequent conflicting norms between the hierarchy of legal sources, legal domains, and the adopted procedures. Scientific research on PIL reveals the need to create a bridge between European and national laws. In this context, legal experts have to access heterogeneous sources, being able to recall all the norms and to combine them using case-laws and following the principles of interpretation theory. This clearly poses a daunting challenge to humans, whenever Regulations change frequently or are big enough in size. Automated reasoning over legal texts is not a trivial task, because legal language is very specific and in many ways different from a commonly used natural language. When applying state-of-the-art language models to legalese understanding, one of the challenges is always to figure out how to optimally use the available amount of data. This makes it hard to apply state-of-the-art sub-symbolic question answering algorithms on legislative texts, especially the PIL ones, because of data scarcity. In this paper we try to expand previous works on legal question answering, publishing a larger and more curated dataset for the evaluation of automated question answering on PIL.

Anonymization of German Legal Court Rulings
Authors: Glaser, Ingo; Schamberger, Tom; Matthes, Florian
Abstract: In the legal domain, many legal documents such as court decisions and contracts are regularly anonymized. This process requires text sequences with high sensitivity to be identified and neutralized to secure sensitive information from third parties. Usually, this process is performed manually by trained employees. Therefore, anonymization is generally considered an expensive and inefficient process. This work proposes a machine learning approach for the automatic identification of sensitive text elements in German legal court decisions and provides an implementation. For this task, different deep neural network architectures based on generally pre-trained contextual embeddings as well as trained word embeddings are evaluated. Because of the lack of non-anonymized data sets, an approach to create pseudonymized data sets is proposed as well.

Case-level Prediction of Motion Outcomes in Civil Litigation
Authors: McConnell, Devin J.; Zhu, James; Pandya, Sachin S.; Aguiar, Derek Cole

Lex Rosetta: Transfer of Predictive Models Across Languages, Jurisdictions, and Legal Domains
Authors: Savelka, Jaromir; Westermann, Hannes; Benyekhlef, Karim; Alexander, Charlotte S.; Grant, Jayla C.; Amariles, David Restrepo; El-Hamdani, Rajaa; Meeus, Sebastien; Troussel, Aurore; Araszkiewicz, Michal; Ashley, Kevin D.; Ashley, Alexandra; Branting, Karl L.; Falduti, Mattia; Grabmair, Matthias; Harasta, Jakub; Novotna, Tereza; Tippett, Elizabeth; Johnson, Shiwanni

Wrap up
The conference covered a wide range of tasks, with some papers coming more from the technical side and others more from the legal side. What surprised me was that many papers did not use state-of-the-art NLP methods such as Transformers, or even RNNs.

Written by Joel Niklaus