2013 | Book

Linguistic Identity Matching

Authors: Bertrand Lisbach, Victoria Meyer

Publisher: Springer Fachmedien Wiesbaden

About this book

Regulation, risk awareness and technological advances are increasingly drawing identity search functionality into business, security and data management processes, as well as fraud investigations and counter-terrorist measures.

Over the years, a number of techniques have been developed for searching identity data, traditionally focusing on logical algorithms. These techniques often failed to take into account the complexities of language and culture that provide the rich variations seen in names used around the world. A new paradigm has now emerged for understanding the way that identity data should be searched. This new approach focuses on understanding the influences that languages, writing systems and cultural conventions have on proper names.

A must-read for anyone involved in the purchase, design or use of identity matching systems, this book describes how linguistic knowledge can be used to create a more reliable and precise identity search, and looks at the practical benefits that can be achieved by implementing third-generation linguistic search technology.

Table of Contents

Frontmatter

Introduction to Linguistic Identity Matching

Frontmatter
Chapter 1. Basic Concepts
Abstract
The terms “identity matching” and “name matching” are sometimes mistakenly used interchangeably. In fact, name matching is not the only component in an identity matching process, though it is clearly the most challenging. Identity matching is a technique employed to determine whether or not two separate data records relate to the same person (where persons can be either legal or natural, and a legal person could be a company, foundation, trust or other organisation).
Matching two such “person records” typically involves calculating a metric to describe the similarity of the characteristics displayed by each. In the case of a natural person these characteristics include the name, date of birth, nationality and address as well as various reference codes such as passport and social security numbers. The characteristics of a legal person may include the name, legal status and place and date of incorporation, while its reference numbers may include tax codes and incorporation references. In both cases, in the absence of a unique identifying code, the highest weight is usually given to the person’s name, as this is a key determinant of identity in most societies.
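The weighted comparison described above can be sketched roughly as follows. This is a minimal illustration only, not the authors' method: the weights, the attribute names and the use of `difflib.SequenceMatcher` as the per-attribute metric are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher

# Hypothetical weights (an assumption for this sketch). The name carries the
# highest weight, as the abstract describes for records that lack a unique
# identifying code.
WEIGHTS = {"name": 0.5, "date_of_birth": 0.2, "nationality": 0.1, "address": 0.2}


def attribute_similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1]; difflib's ratio is only a stand-in metric."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(rec_a: dict, rec_b: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of attribute similarities.

    Only attributes that are present (non-empty) in both records contribute,
    so a missing address does not count against the match.
    """
    total = 0.0
    weight_sum = 0.0
    for attr, weight in weights.items():
        if rec_a.get(attr) and rec_b.get(attr):
            total += weight * attribute_similarity(rec_a[attr], rec_b[attr])
            weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


if __name__ == "__main__":
    a = {"name": "Mary Anne Evans", "date_of_birth": "1819-11-22", "nationality": "GB"}
    b = {"name": "Mary Ann Evans", "date_of_birth": "1819-11-22", "nationality": "GB"}
    print(round(record_similarity(a, b), 3))
```

A generic string metric such as this one treats names as plain character sequences, which is exactly the limitation the later chapters discuss.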
Bertrand Lisbach, Victoria Meyer
Chapter 2. The Application of Identity Matching Techniques
Abstract
Although not a primary business process in itself, identity matching plays a crucial role behind the scenes in many modern organisations. As a discrete function that can be incorporated into a variety of different business processes, identity matching has a wide-ranging scope that covers areas as diverse as risk management, customer support and data integration. Even within smaller organisations, there are often multiple processes that rely on identity matching techniques in some form. In larger organisations there may be dozens of systems incorporating the searching and matching of identity data, including customer service applications, human resource systems, data quality solutions, credit control functions and regulatory compliance systems.
This chapter describes how strong identity matching solutions can help manage risks in a wide variety of areas. These risk management advantages are maximised if identity matching methods are standardised so that comparable results are achieved wherever identity matching is applied within an organisation. In addition, the diversity of potential uses for identity matching techniques means that there are often significant cost and efficiency advantages to be gained from the standardisation of identity matching practices across an organisation, particularly in the areas of technology development, training, operations and licensing fees. The advantages of standardisation and a possible framework for achieving this are considered in Chapter 14.
This chapter looks more closely at the wide variety of application areas for identity matching techniques and the different ways in which getting the identity matching technology right can benefit an organisation.
Bertrand Lisbach, Victoria Meyer
Chapter 3. Introduction to Proper Names
Abstract
A proper name can refer to any unique entity. In the context of identity matching, the proper names of individuals, organisations and other entities, such as ships, are the most relevant. Throughout this book, the term “proper name” is used in a restricted sense to refer to these classes of names.
This chapter looks in particular at the proper names of natural persons. It considers the basics of what makes up a person’s name, together with the historical and cultural development of naming practices around the world. This lays the foundation for understanding the different features of proper names that are important for identity matching purposes and which will be considered in more detail in later chapters.
This book uses the term “naming system” to describe the structure of proper names and the function of the different parts within them, as well as the cultural norms and legal processes that might see a name changing over time. With language and culture playing such a huge role in the development of proper names, there are obviously very many different naming systems in use around the world. To highlight the differences in global naming systems, this chapter uses examples from four of the most widely used: the Western, Russian, Arabic and Chinese.
Bertrand Lisbach, Victoria Meyer
Chapter 4. Transcription
Abstract
The previous chapter considered the variations that can be introduced when proper names are converted from one naming system to another and explained that this frequently occurs when non-Western names are stored in a database designed around Western naming conventions. This chapter considers a related but significantly more problematic source of variation in proper names: the variations that are caused by transferring proper names from one writing system to another.
There may be many valid romanised versions of the same non-Latin names, and the differences between them can be considerable. This is a particularly important consideration in the design of any identity matching system used for compliance, law enforcement or national security purposes, as watch lists published for use in such systems often focus on romanised versions of names, despite including a significant number of names originating from countries where the Latin script is not used.
Bertrand Lisbach, Victoria Meyer
Chapter 5. Derivative Forms of Names
Abstract
It is not uncommon for a person to be known by a name other than their primary, official name. Some of these other names may be alternate forms or derivatives of the officially registered name. Others may be completely different, with no connection at all to the official name. Such alias names may be used for many different reasons. Criminals may seek alternate identities to escape detection, entertainers may choose to perform under a stage name and writers may allow themselves the freedom afforded by a pseudonym. In these cases, there is no predictable connection between an individual’s official name and the alternate name they have chosen for themselves. Eric Arthur Blair, for example, is more widely known as the writer George Orwell, and many readers are unaware that the classics created by George Eliot are, in fact, the work of Mary Anne Evans.
Bertrand Lisbach, Victoria Meyer
Chapter 6. Phonetically Similar Names
Abstract
It has long been recognised that phonetically-motivated misspelling is a frequent cause of variation in names. Some of the earliest attempts at name matching were based on an understanding that names are often misspelt in a way that sounds similar, so that Taylor may be misspelt as Tailer, or Wight as White. Such errors may result if the person writing the name is doing so from hearing it spoken rather than from seeing it written out. It may also be that the writer has seen the name written out but remembers only the sound pattern rather than the exact spelling. In case of doubt, a person may be more likely to recall the spelling that is most familiar to them.
This chapter looks at variations in names that occur as a result of the relationship between pronunciation and spelling. Pronunciation is a complex topic and one that is highly language specific, so any consideration of the effect of pronunciation on spelling must be made in the context of the relevant language. However, even within individual languages, the various peculiarities and ambiguities in pronunciation mean that great care has to be taken in trying to identify names that are pronounced in the same way.
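As an illustration of an early phonetic approach, the sketch below implements American Soundex, a well-known phonetic code for English surnames. The abstract does not name a specific method, so the choice of Soundex here is an assumption made for the example. It groups Taylor with Tailer, but it assigns different codes to Wight and White, which reflects the caution above about identifying names that are pronounced in the same way.

```python
# Letter groups of American Soundex; vowels, H, W and Y carry no digit.
_GROUPS = (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
)
CODES = {letter: digit for letters, digit in _GROUPS for letter in letters}


def soundex(name: str) -> str:
    """Return the four-character American Soundex code of a name."""
    # Keep only the basic Latin letters A-Z; accents and punctuation are dropped.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    encoded = [letters[0]]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = CODES.get(ch, "")
        if code and code != prev:
            encoded.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept after it
    return ("".join(encoded) + "000")[:4]


if __name__ == "__main__":
    for pair in (("Taylor", "Tailer"), ("Wight", "White")):
        codes = [soundex(n) for n in pair]
        print(pair, codes, "match" if codes[0] == codes[1] else "no match")
```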
Bertrand Lisbach, Victoria Meyer
Chapter 7. Typos
Abstract
The term “variation” is relatively neutral and does not necessarily imply that a mistake has been made. Chapter 4 looked at transcription variants, which are technically neither correct nor incorrect. Similarly, the derivative forms of names discussed in Chapter 5 are not technically incorrect variations; they are simply more or less likely to be used to refer to a given person. In contrast, the phonetically-driven variations considered in Chapter 6 are errors in the spelling of a person’s name, though as linguistically motivated errors they form some of the most frequent misspellings of person names.
Bertrand Lisbach, Victoria Meyer

Name Matching Methods

Frontmatter
Chapter 8. Name Matching Methods of the First Generation
Abstract
Computer algorithms have been used for automatic name matching for over half a century. For the most part, earlier efforts used methods that were not originally developed for matching names but were borrowed from other disciplines. Those that did focus specifically on comparing proper names did so in a largely superficial way. As a result, none of the first generation of methods used for matching names (G1 Methods) effectively cover the requirements of global name matching, though some do still have a limited role to play when used in conjunction with other, more sophisticated techniques.
Bertrand Lisbach, Victoria Meyer
Chapter 9. Second Generation Name Matching Methods
Abstract
Second generation name matching methods (G2 Methods) are direct progressions from G1 Methods. They have solved some of the most obvious problems of their predecessors and, as a result, provide marked improvements in both precision and recall. For the most part, they represent relatively recent developments and so have been designed with more modern technological capabilities in mind. Overall, they provide clear advantages over G1 Methods, but the market has not adopted them as enthusiastically as might have been expected.
One reason for this may be that the advent of G2 Methods cannot really be seen as a conceptual revolution in identity matching theory. In much the same way as the early tweaking of the original G1 Methods, G2 Methods represent a technology-driven optimisation of existing solutions; neither linguistics nor onomastic research features strongly in their design. Examples of their benefits are usually given with reference to familiar examples from the Anglo-Saxon community, while the true global context of international name matching continues to be held at arm’s length.
Bertrand Lisbach, Victoria Meyer
Chapter 10. Third Generation Name Matching Methods
Abstract
The previous chapters have demonstrated that none of the main sources of variation in names can be comprehensively covered by any combination of G1 or G2 Methods. Nowhere is this more apparent than in the coverage of linguistic variations, particularly those caused by the use of different transcription standards. This is to be expected, given the limited role that linguistics and onomastics played in the development of the original matching methods and the fact that many G2 Methods represent technology-driven enhancements of their earlier counterparts.
Name matching solutions of the third generation (G3 Solutions) differ from G2 Methods in that they do not simply build on an existing matching technique. Instead, the starting point for G3 Solutions is a careful examination of all the different classes of variation within names. This is the core of the paradigm shift which sets G3 Solutions apart from their predecessors: the solution design follows a comprehensive analysis of the causes of variation in spelling.
Bertrand Lisbach, Victoria Meyer
Chapter 11. Benchmark Study
Abstract
Earlier chapters theorised about the performance of G1, G2 and G3 Methods and commented on their potential contribution to identity matching systems. This chapter presents the findings of a simple, practical study that has been conducted to demonstrate the relative performance of selected methods.
The test reflects a simplified version of the benchmark studies that should be carried out prior to any investment in identity matching technology. Full application evaluations are designed to test a variety of technical and functional criteria, and are discussed further in Chapter 13. The assessment described here has been stripped to its core in order to highlight the recall and precision that can be achieved with each of the selected match methods when matching the major classes of variation in names.
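For readers unfamiliar with the two measures, recall and precision for a single query can be computed as in the sketch below. The function name and the example names are hypothetical and do not come from the book's benchmark study.

```python
def precision_recall(returned: set, relevant: set) -> tuple:
    """Compute (precision, recall) for one search.

    returned: the records the match method reported as hits
    relevant: the records that are true variants of the searched name
    """
    true_positives = len(returned & relevant)
    # Precision: share of reported hits that are correct.
    precision = true_positives / len(returned) if returned else 0.0
    # Recall: share of the true variants that were found.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical result set for a search on "Taylor".
    returned = {"Tailer", "Tayler", "Tyler", "Tiller"}
    relevant = {"Tailer", "Tayler", "Tailor"}
    precision, recall = precision_recall(returned, relevant)
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

A method that returns more candidates tends to raise recall at the cost of precision, which is why the two figures are reported together.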
Bertrand Lisbach, Victoria Meyer

Into the New Paradigm

Frontmatter
Chapter 12. Name Matching and Identity Matching
Abstract
The matching of proper names plays a key role in any identity matching process; in many cases, the name may be the only criterion entered into the search. Name matching is also the one element of identity matching that is most directly affected by the paradigm shift towards linguistic matching, which is why this text has so far focused on the way that names are matched. However, a person’s name is only one feature of their identity and many identity matching processes allow for a broad range of search attributes. This chapter looks at the non-name characteristics that are most frequently included in the search profile and introduces the additional requirements that they bring to the design of a comprehensive identity search solution.
Bertrand Lisbach, Victoria Meyer
Chapter 13. Evaluation of Identity Matching Software
Abstract
There are several commercially available tools designed to search identity data and many more software suites that contain an identity matching component. Assessing the matching performance of such products is no trivial task, requiring specialist expertise that is available in only a very few commercial or administrative organisations. In the absence of specialist identity matching resources, businesses often look to market position as an indicator of quality.
In many software areas this may be a reasonable approach, but in a market where many businesses are ill-equipped to evaluate the performance of identity matching components, this has led to a situation where popularity begets wider popularity, with actual functionality often having a surprisingly low impact on this process. As a result, the matching performance of the most widely recognised search-related products often displays little correlation with their popularity in the market.
Bertrand Lisbach, Victoria Meyer
Chapter 14. A Linguistic Search Standard
Abstract
The identity matching industry is currently experiencing a shift in the paradigm on which it has been based for decades. Traditional methods still dominate the search technology market, but the improved quality achieved by linguistic search techniques is already evident in a number of software products. Vendors of identity matching tools are starting to build linguistic capabilities into their applications in response to growing expectations from corporate users, regulators, law enforcement and the general public. However, the increased flexibility offered by linguistic techniques has led to less consistency in the way search technology is applied.
Larger organisations frequently maintain many different identity searches across multiple locations and business processes, often with greatly differing configurations. Significant effort can be saved in the design and testing of these different search processes if their individual configurations are underpinned by the same basic principles governing the definition of true and false positive hits. Consistency across different search processes can also lead to greater operational efficiency and fewer problems with satisfying the requirements of auditors, regulators and other external bodies.
Bertrand Lisbach, Victoria Meyer
Backmatter
Metadata
Title
Linguistic Identity Matching
Authors
Bertrand Lisbach
Victoria Meyer
Copyright year
2013
Publisher
Springer Fachmedien Wiesbaden
Electronic ISBN
978-3-8348-2095-2
Print ISBN
978-3-8348-1370-1
DOI
https://doi.org/10.1007/978-3-8348-2095-2
