
About this book

This book presents a philosophical approach to probability and probabilistic thinking, considering the underpinnings of probabilistic reasoning and modeling, which effectively underlie everything in data science. The ultimate goal is to call into question many standard tenets and lay the philosophical and probabilistic groundwork and infrastructure for statistical modeling. It is the first book devoted to the philosophy of data aimed at working scientists and calls for a new consideration in the practice of probability and statistics to eliminate what has been referred to as the "Cult of Statistical Significance."

The book explains the philosophy of these ideas and not the mathematics, though there are a handful of mathematical examples. The topics are logically laid out, starting with basic philosophy as related to probability, statistics, and science, and stepping through the key probabilistic ideas and concepts, and ending with statistical models.
Its jargon-free approach asserts that standard methods, such as out-of-the-box regression, cannot help in discovering cause. This new way of looking at uncertainty ties together disparate fields — probability, physics, biology, the “soft” sciences, computer science — because each aims at discovering cause (of effects). It broadens the understanding beyond frequentist and Bayesian methods to propose a Third Way of modeling.



Chapter 1. Truth, Argument, Realism

Truth exists and we can know it. The universe (all there is) also exists and we can know it. Further, universals exist and we can know these, too. Any skepticism about truth, reality, or universals is self-refuting. There are two kinds of truth: ontological and epistemological, comprising existence and our understanding of existence. Tremendous disservice has been done by ignoring this distinction. There are two modes of truth: necessary and local or conditional. A necessary truth is a proposition that is so based on a chain of reasoning from indubitable axioms or sense impressions. A local truth, and most truths are local, is so based on a set of premises assumed or believed true. From this seemingly trivial observation, everything flows, and it is why so-called Gettier problems and the like aren’t problems after all. Science is incapable of answering questions about itself; the belief that it can is called scientism. Faith, belief, and knowledge are differentiated.
William Briggs

Chapter 2. Logic

Logical truth is conditional, as are all necessary and local truths, on the premises given or assumed. Logic is the study of the relation between propositions, between premises and conclusion, that is. So too is probability, which is the continuation, fullness, or completion of logic. All arguments use language, and therefore the terms, definitions, and grammar of language are part of the tacit premises in every argument. It is well to bring these tacit premises out when possible. Logic, like mathematics, is not empirical, though observations may inform logic and math, and logic and math may be used on empirical propositions. Probability, because it is part of logic, is also not empirical; and it, too, can be used on empirical propositions. Syllogistic is preferred over symbolic logic for its ease of understanding; syllogisms are an ideal way of grouping evidence. The fundamental principles of logic ultimately are not formal in a sense to be defined. Finally, not all fallacies are what they seem.
William Briggs

Chapter 3. Induction and Intellection

There is no knowledge more certain than that provided by induction. Without induction, no argument could, as they say, get off the ground floor. All arguments must trace eventually back to some foundation. This foundational knowledge is first present in the senses; through intellection, i.e. induction, first principles, universals, and essences are discovered. Induction is what accounts for our being certain, after observing only a finite number of instances or even one and sometimes even none, that all flames are hot, that all men are mortal, that for all natural numbers x and y, if x = y, then y = x, and for providing content and characteristics of all other universals and axioms. Induction is analogical; it is of five different kinds, some more and some less reliable. That this multiplicity is generally unknown accounts for a great deal of the controversy over induction. Arguments are not valid because of their form but because of their content.
William Briggs

Chapter 4. What Probability Is

Probability is, like logic, an argument. Logic is the study of the relation between propositions, and so is probability. Like logic, probability is not a real or physical thing: it does not exist, it is not ontological. It cannot be measured with any apparatus, like mass or energy can. Like logic, probability is a measure of certainty of some proposition in relation to given or assumed premises—and only on these, and no other, premises, and this includes the tacit premises of language. All probability, without exception, is therefore conditional. Probability is widely misunderstood for two main reasons: the confusion between ontological and epistemological truth, and the conflation of acts or decisions with probability. We know the proposition “Mike is green” is true given “All dragons are green and Mike is a dragon”. This is an epistemological conditional, or local, truth. But we also know the major part of the premise is ontologically false because there are no dragons, green or otherwise. Counterfactuals are always ontologically false; i.e. they begin with premises known observationally to be false. Yet counterfactuals can have meaningful (epistemological) probabilities. Counterfactuals are surely meaningful epistemologically but never ontologically. Not all probabilities are quantifiable; most are not.
William Briggs
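
The claim in Chapter 4 that probability is always conditional on stated premises can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a set of premises can be written as a finite list of admissible states and that probability is the fraction of those states in which the proposition holds; the function `probability`, the state dictionaries, and the three-colour alternative premise are all hypothetical and are not taken from the book.

```python
from fractions import Fraction


def probability(proposition, premises):
    """Fraction of the states admitted by the premises in which the proposition holds.

    Assumption (not from the book): premises are encoded as a finite list of
    admissible states, each counted equally.
    """
    states = list(premises)
    if not states:
        raise ValueError("premises admit no states; the probability is undefined")
    favourable = sum(1 for state in states if proposition(state))
    return Fraction(favourable, len(states))


def mike_is_green(state):
    """The proposition of interest: 'Mike is green'."""
    return state["colour"] == "green"


# Premises from the text: "All dragons are green and Mike is a dragon".
all_dragons_green = [{"name": "Mike", "colour": "green"}]

# Hypothetical alternative premises: "Dragons are green, red, or gold and Mike is a dragon".
dragons_three_colours = [
    {"name": "Mike", "colour": colour} for colour in ("green", "red", "gold")
]

p_given_first = probability(mike_is_green, all_dragons_green)
p_given_second = probability(mike_is_green, dragons_three_colours)

print(p_given_first)   # 1
print(p_given_second)  # 1/3
```

The proposition is the same in both calls; only the premises change, and the probability changes with them. Neither value says anything about whether dragons exist.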

Chapter 5. What Probability Is Not

Logic is not an ontological property of things. You cannot, for instance, extract a syllogism from the existence of an object; the imagined syllogism is not somehow buried deep in the folds of the object waiting to be measured by some sophisticated apparatus. Logic is the relation between propositions, and these relations are not physical. A building can be twice as high as another building; the “twice” is the relation, but what exists physically are only the two buildings. Probability is also the relation between sets of propositions, so it too cannot be physical. Once propositions are set, the relation between them is also set and is a deducible consequence, i.e. the relation is not subjective, a matter of opinion. Mathematical equations are lifeless creatures; they do not “come alive” until they are interpreted, so that probability cannot be an equation. Probability is a matter of our understanding. Subjective probability is therefore a fallacy. The most common interpretation of probability, limited relative frequency, also confuses ontology with epistemology and therefore gives rise to many fallacies.
William Briggs

Chapter 6. Chance and Randomness

Randomness is not a thing, neither is chance. Both are measures of uncertainty and express ignorance of causes. Because randomness and chance are not ontologically real, they cannot cause anything to happen. Immaterial measures of information are never and can never be physically operative. It is always a mistake, and the cause of vast confusion, to say things like “due to chance”, “games of chance”, “caused by random (chance, spontaneous) mutations”, “these results are significant”, “these results are not explainable by chance”, “random effects”, “random variable”, and the like. All this holds in quantum mechanics, where the evidence for physical chance appears strongest. What also follows, although it is not at first apparent, is that simulations are not needed. This statement will appear striking and even obviously false, until it is understood that the so-called “randomness” driving simulations is anything but random. Coincidences are defined and their relation to cause explained. The ties between information theory and probability are given.
William Briggs
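
The remark in Chapter 6 that the "randomness" driving simulations is anything but random can be seen in a minimal Python sketch. It assumes a simulation built on the standard library's pseudo-random generator; the function `simulate_dice` and the seed value are hypothetical choices made for illustration.

```python
import random


def simulate_dice(seed, n_throws=5):
    """Return n_throws simulated die throws from a generator started at `seed`."""
    rng = random.Random(seed)  # a deterministic algorithm, fully fixed by the seed
    return [rng.randint(1, 6) for _ in range(n_throws)]


first_run = simulate_dice(seed=42)
second_run = simulate_dice(seed=42)

# Same seed, same algorithm, same "random" throws every time.
print(first_run == second_run)  # True
```

Anyone who knows the seed and the algorithm can state every value in advance; the output is uncertain only to someone who lacks that information.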

Chapter 7. Causality

Cause is analogical. There is not one type, flavor, or aspect of cause, but four: formal, material, efficient, and final or teleological. Most causation concerns events which occur not separately, as in this before that, but simultaneously, where simultaneous events can be spread through time. Many causal data are embedded in time, and there are two types of time series which are often confused: per se and accidental. These should not be mistaken for non-causal data series (the most common) which are all accidental. All causes are activations of potentials by something actual. A vase is potentially a pile of shards. It is made actually a pile of shards by an actual baseball. All four aspects of the cause are there: form of shards, clay fragments, efficient bat, and the pile itself as an end. Deterministic (and probability) models are epistemological; essential causal models are ontological and express true understanding of the nature of a thing. Causes, if they exist and are present, must always be operative, a proposition that has deep consequences for probability modeling. Falsifiability is rarely of interest, and almost never happens in practice. And under-determination, i.e. the possibility of causes other than those under consideration, will always be with us.
William Briggs

Chapter 8. Probability Models

A model is an argument. Models are collections of various premises which we assign to an observable proposition, i.e. an observable. Modelling reverses the probability equation: the proposition of interest or conclusion, i.e. the observable Y, is specified first, after which premises X thought probative of the observable are sought or discovered. The ultimate goal is to discover just those premises X which cause or which determine Y. Absent these (and there may be many causes of Y), it is hoped to find X which give Y probabilities close to 0 or 1, given X in its various states. Measures of X’s importance are given. A model’s usefulness depends on what decisions are made with it, and how costly and rewarding those decisions are. Proper scores which help define usefulness are given. Probability models can and do have causative elements. Some probability models are even fully causal or deterministic in the sense given in the last chapter, but are treated as probabilistic in practice. Tacit premises are added to the predictions from these models, which adds uncertainty. Bayes is not all it’s cracked up to be. The origin and limitations of parameters and parametric models are given.
William Briggs
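
Chapter 8 mentions proper scores as a way of judging a model's predictions of observables. The abstract does not say which scores the book uses; the sketch below assumes the Brier score, one well-known proper score, and uses made-up predictions and outcomes purely for illustration.

```python
def brier_score(predicted_probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes.

    Lower is better; 0 means every prediction was certain and correct.
    """
    if not predicted_probs or len(predicted_probs) != len(outcomes):
        raise ValueError("need equally long, non-empty lists of predictions and outcomes")
    squared_errors = [(p - y) ** 2 for p, y in zip(predicted_probs, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical observed outcomes of an observable Y (1 = happened, 0 = did not).
outcomes = [1, 0, 1, 1, 0]

# Hypothetical predictions Pr(Y | X) from two models.
informative_model = [0.9, 0.1, 0.8, 0.7, 0.2]
uninformative_model = [0.5, 0.5, 0.5, 0.5, 0.5]

print(brier_score(informative_model, outcomes))    # approximately 0.038
print(brier_score(uninformative_model, outcomes))  # 0.25
```

The score compares predictions with what was actually observed, so it needs no hypothesis test or parameter estimate. Whether the gap between the two models matters still depends on the decisions made with them.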

Chapter 9. Statistical and Physical Models

Statistical models are probability models and physical models are causal or deterministic or mixed causal-deterministic-probability models applied to observable propositions. It is observations which turn probability into statistics. Statistical and physical models are thus verifiable, and all use statistics in their verification. All models should be verified, but most aren’t. Classical modeling emphasizes hypothesis or “significance” testing and estimation. No hypothesis test, Bayesian or frequentist, should ever be used. Death to all p-values or Bayes factors! Hypothesis testing does not prove or show cause; therefore, embedded in every test used to claim cause is a fallacy. If cause is known, probability isn’t needed. Neither should parameter-centric (estimation, etc.) methods be used. Instead, use only probability, make probabilistic predictions of observables given observations and other premises, then verify these predictions. Measures of model goodness and observational relevance are given in a language which requires no sophisticated mathematical training to understand. Speak only in terms of observables and match models to measurement. Hypothesis-testing and parameter estimation are responsible for a pandemic of over-certainty in the sciences. Decisions are not probability, a fact with many consequences.
William Briggs

Chapter 10. Modelling Goals, Strategies, and Mistakes

Highlighted here are only a few of the most egregious and common mistakes made in modeling. Particular models are not emphasized so much as how model results should be communicated. The goal of probability models is to quantify uncertainty in an observable Y given assumptions or observations X. That and nothing more. This, and only this, form of model result should be presented. Regression is of paramount importance. The horrors to thought and clear reasoning committed in its name are legion. Scarcely any user of regression knows its limitations, mainly because of the fallacies of hypothesis testing and the over-certainty of parameter-based reporting. The Deadly Sin of Reification is detailed. The map is not the territory, though this fictional land is unfortunately where many choose to live. When the data do not match a theory, it is often the data that are suspected, not the theory. Models should never take the place of actual data, though they often do, particularly in time series. Risk is nearly always exaggerated. The fallacious belief that we can quantify the unquantifiable is responsible for scientism. “Smoothed” data are often given pride of place over actual observations. Over-certainty rampages across the land and leads to irreproducible results.
William Briggs

