Named entities are the persons, places, companies, and similar items that a text explicitly mentions using proper nouns. The process of finding named entities in a text and classifying them by semantic type is called named entity recognition. Named entity resolution is the process of linking a mention of a name in text to a pre-existing database entry, which grounds the mention in something analogous to a real-world entity. For example, a mention of a judge by name might be resolved to a database entry for a specific judge of a specific district of a specific state. Recognition and resolution of named entities can be leveraged in a number of ways, including providing hypertext links to information stored about a particular judge: their education, who appointed them, their other case opinions, etc.
This paper discusses named entity recognition and resolution in legal documents such as US case law, depositions, pleadings, and other trial documents. The types of entities include judges, attorneys, companies, jurisdictions, and courts.
We outline three methods for named entity recognition: lookup, context rules, and statistical models. We then describe an actual system for finding named entities in legal text and evaluate its accuracy. Similarly, for resolution, we discuss our blocking techniques, our resolution features, and the supervised and semi-supervised machine learning techniques we employ for the final matching.
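To make the terminology concrete, the following is a minimal Python sketch of two of the recognition methods named above (lookup and context rules) and of blocking as a candidate-narrowing step before resolution. It is an illustration under stated assumptions, not the system described in this paper: the gazetteer entries, the regular expression, the judge records, the last-name blocking key, and every function name are hypothetical. Statistical recognition and the learned final matching are omitted.

```python
import re
from collections import defaultdict

# Hypothetical gazetteer for lookup-based recognition: surface form -> type.
GAZETTEER = {"Example District Court": "COURT", "Acme Corporation": "COMPANY"}

# Hypothetical context rule: a title cue word followed by capitalized tokens.
JUDGE_RULE = re.compile(r"\b(?:Judge|Justice)\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)")

# Hypothetical authority records that mentions would be resolved against.
JUDGE_DB = [
    {"id": 1, "name": "Jane Doe", "district": "Example District"},
    {"id": 2, "name": "John Doe", "district": "Sample District"},
    {"id": 3, "name": "Richard Roe", "district": "Example District"},
]


def recognize(text):
    """Return (mention, type) pairs found by lookup, then by the context rule."""
    found = []
    # Lookup: report every gazetteer entry that occurs verbatim in the text.
    for surface, etype in GAZETTEER.items():
        if surface in text:
            found.append((surface, etype))
    # Context rule: the cue word "Judge" or "Justice" signals a person name.
    for match in JUDGE_RULE.finditer(text):
        found.append((match.group(1), "JUDGE"))
    return found


def build_blocks(records):
    """Group records by lowercased last name (an assumed blocking key)."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[rec["name"].split()[-1].lower()].append(rec)
    return blocks


def candidates(mention, blocks):
    """Return only the records that share the mention's blocking key."""
    return blocks.get(mention.split()[-1].lower(), [])


text = ("Judge Jane Doe of the Example District Court "
        "denied the motion by Acme Corporation.")
entities = recognize(text)
blocks = build_blocks(JUDGE_DB)
judge_candidates = candidates("Jane Doe", blocks)
```

In this sketch, blocking leaves two candidate records for the mention "Jane Doe" (both judges with the last name Doe). A final matching step, such as a classifier over resolution features like the district, would then choose between them.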