2019 | OriginalPaper | Book chapter

17. How to Generate Unbiased Data

Author: Tobias Baer

Published in: Understand, Manage, and Prevent Algorithmic Bias

Publisher: Apress

Abstract

One motto of our times could be "data is the new gold"; however, it will shine only if it is pure and free of dirt. Biased data can be lethally polluted and thus worthless. For example, a tax authority once asked me to help them build an algorithm that would direct customs inspectors to the containers in the port that were most likely to hold contraband. The project could not go ahead because the only data they had came from a very limited number of customs inspections their officers had carried out in the past year. The problem: customs inspectors had chosen which containers to check, and they had complete freedom in how to conduct the checks. They may, for instance, have opened only the first box that fell into their hands and accepted the shipment as containing "Louis Vuitton Croisette handbags" because duffel bags with a "Luis Vitton" label seemed close enough; or they may have completely emptied the container and carefully checked the L's, O's, and T's of the Louis Vuitton stamp on two dozen bags, knowing that variations in these letters are among the most frequent tell-tale signs of fake bags. Either way, the recorded outcomes reflect the inspectors' choices and diligence, not just what the containers actually held.

Footnotes
1
This is the simple version of the rule. The more complicated version would take into account a technique called stratification. For example, if I believe that the persona of the shipper is more important than the individual container, a more efficient sampling strategy might randomly select five containers from each ship regardless of the ship's size. This means that the probability of a particular container being checked is a lot smaller if it is on a Triple E class ship (which can carry up to 18,000 containers) than if it is on a small vessel carrying just 250 containers (5 in 18,000 versus 5 in 250, i.e., 1 in 50). The selection is nevertheless still random, and by applying weights (capturing for each container what fraction of the ship's total cargo it represents) we can even undo our stratification when estimating the coefficients of the algorithm.
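The stratified scheme in this footnote can be illustrated with a minimal Python sketch. It assumes each ship is its own stratum and that up to five containers are drawn at random from every ship. Each sampled container then carries an inverse-probability weight, n / k: the number of its ship's containers it stands in for. `Ship`, `stratified_sample`, `estimate_share`, and the ship names are hypothetical. A single weighted share also stands in for the algorithm's coefficients; this is a simplification, not the author's implementation.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Ship:
    name: str
    n_containers: int  # container IDs are assumed to be 0 .. n_containers - 1


# One sampled container: (ship name, container ID, design weight)
Draw = Tuple[str, int, float]


def stratified_sample(ships: List[Ship], per_ship: int = 5,
                      rng: Optional[random.Random] = None) -> List[Draw]:
    """Draw up to `per_ship` containers uniformly at random from every ship.

    Each ship is its own stratum. A container on a ship holding n containers
    is selected with probability k / n (k = number drawn from that ship), so
    its design weight is n / k: how many of that ship's containers it
    stands in for.
    """
    if rng is None:
        rng = random.Random()
    draws: List[Draw] = []
    for ship in ships:
        if ship.n_containers == 0:
            continue  # nothing to inspect on an empty ship
        k = min(per_ship, ship.n_containers)
        weight = ship.n_containers / k
        for container_id in rng.sample(range(ship.n_containers), k):
            draws.append((ship.name, container_id, weight))
    return draws


def estimate_share(draws: List[Draw], flagged: Callable[[str, int], bool],
                   weighted: bool = True) -> float:
    """Estimate the share of *all* containers for which `flagged` is true.

    weighted=True counts each draw with its design weight, which undoes the
    stratification; weighted=False counts every draw once, which implicitly
    treats the sample as if it had been drawn uniformly from all containers.
    """
    total = hits = 0.0
    for ship_name, container_id, w in draws:
        w = w if weighted else 1.0
        total += w
        if flagged(ship_name, container_id):
            hits += w
    return hits / total


# Usage: one Triple E class ship and one small vessel, five draws from each.
ships = [Ship("triple_e", 18_000), Ship("feeder", 250)]
draws = stratified_sample(ships, per_ship=5, rng=random.Random(42))


def on_feeder(ship_name: str, _container_id: int) -> bool:
    return ship_name == "feeder"


print(estimate_share(draws, on_feeder, weighted=False))  # 0.5
print(estimate_share(draws, on_feeder))  # 250 / 18250, about 0.0137
```

Half of the ten draws come from the small vessel, so the unweighted sample suggests that half of all containers are on it. The weights (3,600 per draw from the large ship, 50 per draw from the small one) restore the true share of 250 out of 18,250. If the inspection model is fitted with a library that accepts per-observation weights, the same weights would be passed there, for example via the `sample_weight` argument of scikit-learn's `LogisticRegression.fit`.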
 
Metadata
Title
How to Generate Unbiased Data
Author
Tobias Baer
Copyright year
2019
Publisher
Apress
DOI
https://doi.org/10.1007/978-1-4842-4885-0_17
