E-NER - An Annotated Named Entity Recognition Corpus of Legal Text

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

E-NER - An Annotated Named Entity Recognition Corpus of Legal Text. / Au, Ting Wai Terence; Lampos, Vasileios; Cox, Ingemar J.

NLLP 2022 - Natural Legal Language Processing Workshop 2022, Proceedings of the Workshop. Association for Computational Linguistics (ACL), 2022. p. 246-255.

Harvard

Au, TWT, Lampos, V & Cox, IJ 2022, E-NER - An Annotated Named Entity Recognition Corpus of Legal Text. in NLLP 2022 - Natural Legal Language Processing Workshop 2022, Proceedings of the Workshop. Association for Computational Linguistics (ACL), pp. 246-255, 4th Natural Legal Language Processing Workshop, NLLP 2022, co-located with the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, 08/12/2022. <https://aclanthology.org/2022.nllp-1.22>

APA

Au, T. W. T., Lampos, V., & Cox, I. J. (2022). E-NER - An Annotated Named Entity Recognition Corpus of Legal Text. In NLLP 2022 - Natural Legal Language Processing Workshop 2022, Proceedings of the Workshop (pp. 246-255). Association for Computational Linguistics (ACL). https://aclanthology.org/2022.nllp-1.22

Vancouver

Au TWT, Lampos V, Cox IJ. E-NER - An Annotated Named Entity Recognition Corpus of Legal Text. In NLLP 2022 - Natural Legal Language Processing Workshop 2022, Proceedings of the Workshop. Association for Computational Linguistics (ACL). 2022. p. 246-255

Author

Au, Ting Wai Terence ; Lampos, Vasileios ; Cox, Ingemar J. / E-NER - An Annotated Named Entity Recognition Corpus of Legal Text. NLLP 2022 - Natural Legal Language Processing Workshop 2022, Proceedings of the Workshop. Association for Computational Linguistics (ACL), 2022. pp. 246-255

Bibtex

@inproceedings{c3c976b34eee4e4db5d91c9aaf00ec7e,

title = "E-NER - An Annotated Named Entity Recognition Corpus of Legal Text",

abstract = "Identifying named entities such as a person, location or organization, in documents can highlight key information to readers. Training Named Entity Recognition (NER) models requires an annotated data set, which can be a time-consuming labour-intensive task. Nevertheless, there are publicly available NER data sets for general English. Recently there has been interest in developing NER for legal text. However, prior work and experimental results reported here indicate that there is a significant degradation in performance when NER methods trained on a general English data set are applied to legal text. We describe a publicly available legal NER data set, called E-NER, based on legal company filings available from the US Securities and Exchange Commission's EDGAR data set. Training a number of different NER algorithms on the general English CoNLL-2003 corpus but testing on our test collection confirmed significant degradations in accuracy, as measured by the F1-score, of between 29.4% and 60.4%, compared to training and testing on the E-NER collection.",

author = "Au, {Ting Wai Terence} and Vasileios Lampos and Cox, {Ingemar J.}",

note = "Publisher Copyright: {\textcopyright} 2022 Association for Computational Linguistics.; 4th Natural Legal Language Processing Workshop, NLLP 2022, co-located with the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 ; Conference date: 08-12-2022",

year = "2022",

language = "English",

pages = "246--255",

booktitle = "NLLP 2022 - Natural Legal Language Processing Workshop 2022, Proceedings of the Workshop",

publisher = "Association for Computational Linguistics (ACL)",

address = "United States",

}

RIS

TY - GEN

T1 - E-NER - An Annotated Named Entity Recognition Corpus of Legal Text

AU - Au, Ting Wai Terence

AU - Lampos, Vasileios

AU - Cox, Ingemar J.

PY - 2022

Y1 - 2022

AB - Identifying named entities such as a person, location or organization, in documents can highlight key information to readers. Training Named Entity Recognition (NER) models requires an annotated data set, which can be a time-consuming labour-intensive task. Nevertheless, there are publicly available NER data sets for general English. Recently there has been interest in developing NER for legal text. However, prior work and experimental results reported here indicate that there is a significant degradation in performance when NER methods trained on a general English data set are applied to legal text. We describe a publicly available legal NER data set, called E-NER, based on legal company filings available from the US Securities and Exchange Commission's EDGAR data set. Training a number of different NER algorithms on the general English CoNLL-2003 corpus but testing on our test collection confirmed significant degradations in accuracy, as measured by the F1-score, of between 29.4% and 60.4%, compared to training and testing on the E-NER collection.

UR - http://www.scopus.com/inward/record.url?scp=85154582292&partnerID=8YFLogxK

M3 - Article in proceedings

AN - SCOPUS:85154582292

SP - 246

EP - 255

BT - NLLP 2022 - Natural Legal Language Processing Workshop 2022, Proceedings of the Workshop

PB - Association for Computational Linguistics (ACL)

T2 - 4th Natural Legal Language Processing Workshop, NLLP 2022, co-located with the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022

Y2 - 8 December 2022

ER -

ID: 358726773

Department of Computer Science