A creative abduction approach to scientific and knowledge discovery
Introduction
We take the notion of ‘knowledge discovery in databases’ (KDD) to mean methods that generate new, plausible, useful, and intelligible knowledge for observed events. Similar definitions can be found in the literature, differing mostly in emphasizing individual features of the produced knowledge [2], [17].
In this paper, we advocate an approach to knowledge discovery that is based on abductive reasoning, an inference scheme originally introduced by Peirce [10]. The standard formulation describes abduction as an inference to a hypothesis C that would explain the evidence E, given the law E←C. This form of abduction became a prevalent reasoning mechanism in many fields of artificial intelligence such as diagnosis, natural language understanding, default reasoning, database updates, planning, and high-level vision [5], [7], [12], [13]. From a knowledge discovery point of view, however, the standard form of abduction is rather uninteresting since, in principle, all the knowledge needed to explain the observations is already given in the problem formulation.
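To make the limitation of the standard scheme concrete, the following is a minimal sketch of non-creative abduction as backward chaining over propositional Horn rules. It is an illustration under stated assumptions, not the method of this paper: the toy rule base `RULES`, the atom names, the abducible set `ABDUCIBLES`, and the helpers `explain` and `minimal` are all hypothetical. The point it shows is that every hypothesis the procedure can return is drawn from the predefined set of abducibles, so no new vocabulary ever enters the theory.

```python
from itertools import product

# Hypothetical toy theory: each head maps to alternative rule bodies (conjunctions),
# i.e. "wet_grass <- rained" and "wet_grass <- sprinkler_on".
RULES = {
    "wet_grass": [["rained"], ["sprinkler_on"]],
    "sprinkler_on": [["timer_set"]],
}
# Hypothetical abducibles: the only atoms that may be assumed as hypotheses.
ABDUCIBLES = {"rained", "timer_set"}


def explain(goal, rules, abducibles, depth=0, max_depth=20):
    """Return hypothesis sets (frozensets of abducibles) that, with the rules, entail goal."""
    if goal in abducibles:
        return [frozenset([goal])]
    if depth > max_depth:  # crude guard against cyclic rule sets
        return []
    explanations = []
    for body in rules.get(goal, []):
        # Every atom of the body must be explained; combine one explanation per atom.
        per_atom = [explain(atom, rules, abducibles, depth + 1, max_depth) for atom in body]
        for combo in product(*per_atom):
            explanations.append(frozenset().union(*combo))
    return explanations


def minimal(hypotheses):
    """Keep only subset-minimal hypothesis sets."""
    unique = set(hypotheses)
    return [h for h in unique if not any(other < h for other in unique)]
```

For the observation `wet_grass`, `minimal(explain("wet_grass", RULES, ABDUCIBLES))` yields the two alternatives `{"rained"}` and `{"timer_set"}`; an observation with no rule and no matching abducible has no explanation at all.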
Schurz [15] observed that Peirce actually introduced two forms of abductive inference: the first, which he calls non-creative, corresponds to the scheme mentioned above. The second form infers a disposition of certain objects that would explain a set of local temporal (empirical) regularities involving those objects. For instance, the hypothesis that a has the disposition of (electric) conductivity explains the local temporal regularity ‘whenever object a is subject to a voltage source, a conducts current’. Since the predicate denoting the disposition is not already part of the theory, he calls this form of reasoning creative abduction. In addition to the abduced disposition, a new rule is inferred expressing that the set of empirical regularities is ‘caused’ by the objects' disposition. In order to state a causal relationship, the hypothesized disposition is also required to unify a given corpus of knowledge, which means that the abduced disposition can explain a set of correlated regularities.
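The creative step can be sketched as follows, as a simplification under explicit assumptions rather than Schurz's formal account. Each empirical regularity is treated as an atomic label (the names `R1`, `R2`, ... are hypothetical), objects are grouped by the exact set of regularities they exhibit, and a fresh predicate (hypothetically named `D1`, `D2`, ...) is invented for every group that is local (at least one but not all objects) and that bundles at least two co-occurring regularities; the threshold of two is an assumption. The output contains both parts described above: the abduced disposition facts and the new rules stating that the regularities are ‘caused’ by the disposition.

```python
from collections import defaultdict


def fresh_name(used, prefix="D"):
    """Return the first name D1, D2, ... that is not yet part of the vocabulary."""
    i = 1
    while f"{prefix}{i}" in used:
        i += 1
    return f"{prefix}{i}"


def abduce_dispositions(observations, vocabulary):
    """observations: dict mapping object -> set of regularity labels it exhibits.

    Returns a list of (new_predicate, disposition_facts, causal_rules).
    """
    used = set(vocabulary)  # copy, so the caller's vocabulary is not mutated
    groups = defaultdict(set)
    for obj, regs in observations.items():
        groups[frozenset(regs)].add(obj)

    results = []
    for regs, objs in groups.items():
        # Assumption: only bundles of >= 2 co-occurring regularities are worth unifying,
        # and the regularities must be local (not exhibited by every object).
        if len(regs) < 2 or len(objs) == len(observations):
            continue
        name = fresh_name(used)  # the predicate is new: not part of the theory
        used.add(name)
        facts = sorted(f"{name}({o})" for o in objs)
        rules = sorted(f"{r}(x) <- {name}(x)" for r in regs)
        results.append((name, facts, rules))
    return results
```

For objects `a` and `b` that both exhibit `R1`, `R2`, and `R3`, and an object `c` that exhibits none of them, the sketch invents one predicate `D1` with facts `D1(a)`, `D1(b)` and the three rules `Ri(x) <- D1(x)`.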
The creative form of abduction can be used to accomplish many kinds of knowledge discovery tasks. In this paper, we will explore two such tasks. The first one is scientific discovery where a disposition (or cause) is invented to explain multiple correlated empirical regularities. The second one employs a weaker form of creative abduction that is broadly applicable to KDD tasks, such as Web usage mining [3].
The rest of the paper is organized as follows. In Section 2, creative forms of abduction found in the literature are discussed. Section 3 is devoted to introducing disposition-creative abduction and its application to scientific discovery. In Section 4, goal-creative abduction, a weaker form of disposition-creative abduction, is introduced and its relevance to Web usage mining is demonstrated. Section 5 discusses related work, and Section 6 concludes the paper.
Related work
In this section, we offer a short primer on abductive reasoning, and discuss two forms of abduction found in the literature that can be called creative since some ‘new’ hypothesis is invented to explain observed events. Here, a new hypothesis is a piece of knowledge that comes in two different syntactic forms:
- Element-creative abduction. The hypothesis is a constant denoting a hitherto unknown object (element) of the domain. Element-creative abduction is a method for scientific discovery, i.e.
Creative abduction for scientific discovery
Schurz [15] argues that empirical regularities applying to some (at least one but not all) objects of a domain, called local temporal regularities (or simply empirical regularities), can be explained by hypothesizing an ‘intrinsic property’ or ‘disposition’ of those objects, and calls this form of inference ‘abduction to a disposition’. Then he shows that if the regularities are correlated, this form of abduction is able to significantly unify a given corpus of knowledge. In this case, the
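Why correlation matters for unification can be seen from a simple statement count. This count is an illustrative assumption of this sketch, not Schurz's own unification measure: suppose m objects a_1, ..., a_m share n correlated regularities R_1, ..., R_n, and compare stating every regularity for every object with abducing one disposition D.

```latex
\begin{align*}
\text{without } D:&\quad \{R_j(a_i) \mid 1 \le i \le m,\ 1 \le j \le n\}
  &&\Rightarrow\ m \cdot n \text{ statements}\\
\text{with } D:&\quad \{D(a_i) \mid 1 \le i \le m\} \cup \{R_j(x) \leftarrow D(x) \mid 1 \le j \le n\}
  &&\Rightarrow\ m + n \text{ statements}\\
&\quad m n > m + n \iff (m-1)(n-1) > 1
\end{align*}
```

For m = 4 objects and n = 3 regularities, this gives 12 versus 7 statements. For a single regularity (n = 1), the disposition saves nothing and even adds one statement, which is consistent with the requirement that a disposition explain a set of correlated regularities.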
Creative abduction for knowledge discovery
We will develop goal-creative abduction analogously to disposition-creative abduction. However, goal-creative abduction is weaker in the sense that it assumes ‘regularities’ of a rather simple and easily available format. As an example, we discuss Web usage mining based on records in a Web server log.
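A minimal sketch of the Web usage setting follows, under several assumptions that are not taken from this paper: log records are in the Common Log Format, the client host stands in for a user, only successful (2xx) requests count as visits, and clients with exactly the same set of visited pages are assumed to share a goal. The goal names `G1`, `G2`, ..., the threshold `min_clients`, and the sample log lines are hypothetical. Each abduced goal G corresponds to a rule of the form visited(x, p) ← G(x) for every page p in the shared pattern.

```python
import re
from collections import defaultdict

# Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST) (\S+)[^"]*" (\d{3}) \S+')

# Hypothetical sample records.
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /tickets HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2000:13:56:01 -0700] "GET /timetable HTTP/1.0" 200 1024',
    '10.0.0.2 - - [10/Oct/2000:14:02:11 -0700] "GET /timetable HTTP/1.0" 200 1024',
    '10.0.0.2 - - [10/Oct/2000:14:03:40 -0700] "GET /tickets HTTP/1.0" 200 2326',
    '10.0.0.3 - - [10/Oct/2000:15:10:00 -0700] "GET /about HTTP/1.0" 200 512',
    '10.0.0.3 - - [10/Oct/2000:15:11:00 -0700] "GET /missing HTTP/1.0" 404 0',
]


def pages_per_client(lines):
    """Map each client host to the set of pages it fetched successfully."""
    visits = defaultdict(set)
    for line in lines:
        m = CLF.match(line)
        if m and m.group(3).startswith("2"):  # keep only 2xx responses
            visits[m.group(1)].add(m.group(2))
    return visits


def abduce_goals(visits, min_clients=2):
    """Invent a goal predicate for each page pattern shared by enough clients.

    Returns {goal_name: (sorted clients, sorted pages)}.
    """
    by_pattern = defaultdict(set)
    for client, pages in visits.items():
        by_pattern[frozenset(pages)].add(client)
    goals = {}
    for pattern in sorted(by_pattern, key=sorted):  # deterministic naming order
        clients = by_pattern[pattern]
        if len(pattern) >= 2 and len(clients) >= min_clients:
            goals[f"G{len(goals) + 1}"] = (sorted(clients), sorted(pattern))
    return goals
```

On the sample log, clients `10.0.0.1` and `10.0.0.2` both visit `/tickets` and `/timetable`, so one goal `G1` is hypothesized for them; the failed request for `/missing` is ignored. Exact pattern matching keeps the sketch short; a realistic miner would group approximately similar access patterns instead.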
Discussion
We presented two forms of creative abduction that are intended to cover different enterprises within KDD. The first one is called disposition-creative abduction and can be applied to the task of theory formation in scientific discovery. Disposition-creative abduction is closely related to Reichenbach's principle of the common cause [14], i.e. whenever two events A and B are correlated statistically (or deterministically), then they must have a (temporally prior) common cause. The assumption of
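Reichenbach's principle is commonly made precise by the conditions of a conjunctive fork: a candidate common cause C screens off A from B, P(A∧B|C) = P(A|C)P(B|C) and P(A∧B|¬C) = P(A|¬C)P(B|¬C), and raises the probability of each effect, P(A|C) > P(A|¬C) and P(B|C) > P(B|¬C). The sketch below checks these conditions, together with the correlation P(A∧B) > P(A)P(B), on a joint distribution over three binary events. It covers only the statistical case with 0 < P(C) < 1 and does not check temporal priority; the helper names and the two toy distributions are hypothetical.

```python
from itertools import product


def P(joint, pred):
    """Probability of the event selected by pred(a, b, c)."""
    return sum(p for (a, b, c), p in joint.items() if pred(a, b, c))


def cond(joint, pred, given):
    """Conditional probability P(pred | given); assumes P(given) > 0."""
    both = P(joint, lambda a, b, c: pred(a, b, c) and given(a, b, c))
    return both / P(joint, given)


def is_conjunctive_fork(joint, tol=1e-9):
    A = lambda a, b, c: a
    B = lambda a, b, c: b
    AB = lambda a, b, c: a and b
    C = lambda a, b, c: c
    notC = lambda a, b, c: not c
    correlated = P(joint, AB) > P(joint, A) * P(joint, B) + tol
    # C and not-C each screen off A from B.
    screens_c = abs(cond(joint, AB, C) - cond(joint, A, C) * cond(joint, B, C)) <= tol
    screens_not_c = abs(cond(joint, AB, notC) - cond(joint, A, notC) * cond(joint, B, notC)) <= tol
    # C raises the probability of both effects.
    raises = cond(joint, A, C) > cond(joint, A, notC) and cond(joint, B, C) > cond(joint, B, notC)
    return correlated and screens_c and screens_not_c and raises


def build_joint(pc, pa_c, pb_c, pa_nc, pb_nc):
    """Joint over (A, B, C) in which A and B are independent given C and given not-C."""
    joint = {}
    for a, b, c in product([True, False], repeat=3):
        pa = pa_c if c else pa_nc
        pb = pb_c if c else pb_nc
        joint[(a, b, c)] = (pc if c else 1 - pc) * (pa if a else 1 - pa) * (pb if b else 1 - pb)
    return joint


# Hypothetical example: C is a genuine common cause of A and B.
FORK = build_joint(0.5, 0.9, 0.9, 0.1, 0.1)
# Hypothetical counterexample: A and B always coincide, but C is irrelevant to them.
NO_FORK = {(True, True, True): 0.25, (True, True, False): 0.25,
           (False, False, True): 0.25, (False, False, False): 0.25}
```

In `FORK`, A and B are correlated (P(A∧B) = 0.41 against P(A)P(B) = 0.25) and C satisfies all fork conditions, so `is_conjunctive_fork(FORK)` holds; in `NO_FORK`, A and B are correlated as well, but C does not screen them off, so C fails as a common cause.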
Conclusion
In this paper, we advance creative abduction as a unifying framework for knowledge discovery. Variants of creative abduction subsume important discovery tasks such as theory formation and revision, data mining, and inductive concept learning. In particular, we focus on a form of creative abduction where a disposition or goal is hypothesized in order to explain observed regularities: disposition-creative and goal-creative abduction. Both of them produce knowledge that is new, plausible,
Acknowledgements
This research is supported by the Research Grant (1999–2003) for the Future Program (‘Mirai Kaitaku’) from the Japan Society for the Promotion of Science (JSPS).
References (17)
- et al., SL method for computing a near-optimal solution using linear and non-linear programming in cost-based hypothetical reasoning, Knowledge-based Systems (2002).
- et al., Networked bubble propagation: a polynomial-time hypothetical reasoning method for computing near-optimal solutions, Artificial Intelligence (1997).
- Probabilistic Horn abduction and Bayesian networks, Artificial Intelligence (1993).
- Principles of human computer collaboration for knowledge discovery in science, Artificial Intelligence (1999).
- et al., Content-based, collaborative recommendation, Communications of the ACM (1997).
- et al., From data mining to knowledge discovery in databases, AI Magazine (1996).
- Y. Fu, K. Sandhu, M.-Y. Shih, Clustering of web users based on access patterns, in: Proceedings Workshop on Web Usage...
- D. Heckerman, A tutorial on learning with Bayesian networks, Technical Report, Microsoft Research, Advanced Technology...