Predicting carcinoid heart disease with the noisy-threshold classifier

https://doi.org/10.1016/j.artmed.2006.09.003

Summary

Objective

To predict the development of carcinoid heart disease (CHD), which is a life-threatening complication of certain neuroendocrine tumors. To this end, a novel type of Bayesian classifier, known as the noisy-threshold classifier, is applied.

Materials and methods

Fifty-four cases of patients who suffered from a low-grade midgut carcinoid tumor, of whom 22 developed CHD, were obtained from the Netherlands Cancer Institute (NKI). Eleven attributes that are known at admission were used to classify whether a patient develops CHD. Classification accuracy and area under the receiver operating characteristic (ROC) curve of the noisy-threshold classifier are compared with those of the naive-Bayes classifier, logistic regression, the decision-tree learning algorithm C4.5, and a decision rule formulated by an expert physician.

Results

The noisy-threshold classifier showed the best classification accuracy, with 72% of cases correctly classified, although the differences were statistically significant only with respect to logistic regression and C4.5. The noisy-threshold classifier attained an area under the ROC curve of 0.66, equal to that of the physician's decision rule.

Conclusions

The noisy-threshold classifier compared favorably with other state-of-the-art classification algorithms, and performed as well as a decision rule formulated by the physician. Furthermore, the semantics of the noisy-threshold classifier make it a useful machine-learning technique in domains where multiple causes influence a common effect.

Introduction

Bayesian networks have become a widely accepted formalism for reasoning under uncertainty by providing a concise representation of a joint probability distribution over a set of random variables [1]. This distribution is factorized according to an associated acyclic directed graph (ADG) that represents the independence structure between the random variables. However, the construction of a Bayesian network that fully captures this independence structure for a realistic domain has proven to be a difficult task. It requires either manual specification of the ADG by means of available expert knowledge, or large amounts of high-quality data when we resort to structure learning.
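Concretely, the factorization referred to here is the standard Bayesian-network chain rule, with pa(Xi) denoting the parents of Xi in the ADG:

```latex
% Chain-rule factorization of the joint distribution according to
% the ADG, with pa(X_i) the parents of X_i in the graph.
P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{pa}(X_i)\bigr)
```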

An alternative to constructing an ADG that fully captures the independence structure between variables within the domain is to use a fixed or severely constrained graph topology for classification purposes. In the latter context we call a Bayesian network a Bayesian classifier. The use of Bayesian methods in medicine was first proposed by Ledley and Lusted in their classic 1959 paper [2], and one of the first successful implementations of Bayesian classifiers in medicine was De Dombal’s system for the diagnosis of acute abdominal pain [3]. The classifier that was used assumes independence of the symptoms given the disease, and is known as the naive-Bayes classifier. Over the years, many different Bayesian classifier architectures have been proposed, many of which focus on lifting the independence assumptions of the naive-Bayes classifier [4]. However, a standard technique such as logistic regression, which is used extensively in medicine, can also be interpreted in terms of a Bayesian classifier architecture (Fig. 1). Other examples of Bayesian classifier architectures can be found in refs. [5], [6], [7].
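To make the naive-Bayes independence assumption concrete, the following is a minimal sketch of such a classifier; it is illustrative only, as the attribute names and toy data are hypothetical and this is not the implementation evaluated in this study.

```python
# Minimal naive-Bayes sketch: attributes are assumed conditionally
# independent given the class. Hypothetical attribute names and data.
from collections import defaultdict

def train_naive_bayes(records, smoothing=1.0):
    """records: list of (attribute_dict, class_label) pairs with binary attributes."""
    class_counts = defaultdict(float)
    attr_counts = defaultdict(lambda: defaultdict(float))
    for attrs, label in records:
        class_counts[label] += 1
        for name, value in attrs.items():
            attr_counts[(label, name)][value] += 1

    def predict(attrs):
        total = sum(class_counts.values())
        scores = {}
        for label, count in class_counts.items():
            score = count / total  # prior P(class)
            for name, value in attrs.items():
                counts = attr_counts[(label, name)]
                # Laplace-smoothed estimate of P(attribute = value | class)
                score *= (counts[value] + smoothing) / (count + 2 * smoothing)
            scores[label] = score
        return max(scores, key=scores.get)

    return predict

# Hypothetical usage with two made-up binary attributes:
predict = train_naive_bayes([({"flushing": 1, "diarrhea": 0}, "no CHD"),
                             ({"flushing": 1, "diarrhea": 1}, "CHD")])
print(predict({"flushing": 1, "diarrhea": 1}))
```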

Although the actual joint probability distribution and the one represented by the Bayesian classifier typically differ considerably, this approach can still yield good results with respect to the classification task [8]. However, a weakness of this approach is that the ad hoc restrictions placed on the underlying graph effectively reduce the Bayesian network to a black-box model, making the relation between properties of the domain and the classification outcome difficult to understand. This is an undesirable property, especially in medicine, where ideally one wants to be able to interpret how the classification outcome (such as a diagnosed disease or patient prognosis) relates to the available domain knowledge (its causes). The explanation of drawn conclusions is required to increase the acceptance of machine-learning techniques in practice [9], [10].

In this paper, we employ a novel Bayesian classifier, introduced in ref. [11], that facilitates this interpretation as it explicitly provides a semantics in terms of cause and effect relationships [12]. This noisy-threshold classifier is based on a generalization of the well-known noisy-or model, which has already been used for the purpose of text classification in ref. [13]. In order to demonstrate the merits of the noisy-threshold classifier in a medical context, we apply the technique to the prediction of carcinoid heart disease (CHD), a serious condition that arises as a complication of certain neuroendocrine tumors [14]. We demonstrate that the noisy-threshold classifier performs competitively with state-of-the-art classification techniques for this medically relevant problem. Furthermore, an expert physician at the Netherlands Cancer Institute (NKI) was consulted, and it is demonstrated how her knowledge concerning CHD relates to the parameters that were estimated for the noisy-threshold classifier.
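For reference, the standard noisy-or parameterization, and the threshold generalization as we understand it from ref. [11], can be sketched as follows; the exact parameterization used later in the paper may differ in detail.

```latex
% Noisy-OR: the effect e is present when at least one cause fires;
% a present cause C_i fires independently with probability p_i.
P(e \mid C) = 1 - \prod_{i:\, C_i \text{ present}} (1 - p_i)

% Noisy-threshold generalization: e is present when at least \tau of the
% intermediate cause variables fire (\tau = 1 recovers the noisy-OR model).
P_\tau(e \mid C) = \sum_{\substack{I \subseteq \{1,\dots,n\} \\ |I| \ge \tau}}
  \prod_{i \in I} q_i \prod_{i \notin I} (1 - q_i),
\qquad
q_i = \begin{cases} p_i & \text{if } C_i \text{ is present} \\ 0 & \text{otherwise.} \end{cases}
```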

This paper proceeds as follows. Section 2 introduces the necessary preliminaries and discusses the semantics of the noisy-threshold model, whereas Section 3 describes the medical problem. The use of the noisy-threshold model as a Bayesian classifier is discussed in Section 4. The results on the classification task and the medical interpretation by the expert physician are presented in Section 5. The paper ends with some concluding remarks in Section 6.

Section snippets

Bayesian networks

Bayesian networks provide a compact factorization of a joint probability distribution over a set of random variables by exploiting the notion of conditional independence [1]. Conditional independence can be represented by an acyclic directed graph (ADG) G consisting of vertices V(G) and arcs A(G), and relies on the notion of d-separation [1]. Let G be an ADG and P a joint probability distribution over a set of random variables X = {X1, …, Xn}. We assume that there is a one-to-one correspondence

Carcinoid heart disease

Carcinoid tumors belong to the group of neuroendocrine tumors, which are known for the production of vasoactive agents in the presence of metastatic disease, usually hepatic (liver) metastases. Among these agents, serotonin is the most important, leading to the characteristic carcinoid syndrome of flushes and diarrhea. The other main characteristic feature of neuroendocrine tumors is the slow progression of most tumors if the histology shows a low-grade pattern [26].

Serotonin

Classifier construction

Construction of a noisy-threshold classifier (NTC) proceeds as follows. First, we determine the cause variables C and the effect variable E that are used in the classifier. In the context of a classifier, the cause variables stand for the attributes and the effect variable stands for the class variable. Second, we need to determine the positive states of the variables. In the CHD domain, the positive states are simply defined as the presence of attributes that affect the presence of the
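As an illustration of how such a classifier produces a prediction once its parameters have been estimated, the following is a minimal sketch assuming the threshold semantics described in Section 2; the attribute names and firing probabilities are hypothetical and are not the values estimated in this study.

```python
# Minimal noisy-threshold prediction sketch (hypothetical attribute names
# and parameters). Each attribute observed in its positive state "fires"
# independently with probability p[name]; the effect (CHD) is predicted
# present when at least `tau` attributes fire.

def prob_effect(present, p, tau):
    """P(at least tau of the present causes fire).

    present: attribute names observed in their positive state
    p:       dict mapping attribute name -> firing probability
    tau:     threshold (tau = 1 gives the noisy-OR model)
    """
    # Dynamic programming over the number of fired causes
    # (a Poisson-binomial distribution).
    dist = [1.0]  # dist[k] = P(exactly k causes fired so far)
    for name in present:
        q = p[name]
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - q)   # this cause does not fire
            new[k + 1] += prob * q     # this cause fires
        dist = new
    return sum(dist[tau:])

# Hypothetical usage with made-up attributes and parameters:
p = {"elevated_5HIAA": 0.6, "hepatic_metastases": 0.4, "flushing": 0.2}
print(prob_effect(["elevated_5HIAA", "flushing"], p, tau=1))  # noisy-OR
print(prob_effect(["elevated_5HIAA", "flushing"], p, tau=2))
```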

Classification performance

Table 2 lists the classification accuracy for noisy-threshold classifiers Pτ1 to Pτ12. The noisy-threshold classifier Pτ6 is selected, based on the validation set Dvalidate, and shows the best classification accuracy of 0.72 on the test set Dtest. Note that this considerably exceeds the classification accuracy of 0.54 for the noisy-or classifier Pτ1.
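The threshold-selection step described above can be sketched as follows; the fit and accuracy helpers are hypothetical placeholders, since the actual experimental pipeline is not reproduced here.

```python
# Sketch of selecting the threshold tau on a validation set.
# `fit` and `accuracy` are hypothetical helpers standing in for
# parameter estimation and evaluation; illustrative only.

def select_threshold(fit, accuracy, D_train, D_validate, max_tau):
    best_tau, best_acc = None, -1.0
    for tau in range(1, max_tau + 1):
        model = fit(D_train, tau)           # estimate firing probabilities for this tau
        acc = accuracy(model, D_validate)   # accuracy on the held-out validation cases
        if acc > best_acc:
            best_tau, best_acc = tau, acc
    return best_tau
```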

In order to test how well the NTC performs compared with the physician, and with the other classification algorithms that were discussed in Section

Conclusions

The noisy-threshold classifier is a novel type of classifier that has a well-defined semantics in terms of causes and effect. Due to the independence assumptions that are made by the classifier, parameters can be reliably estimated without needing to resort to huge amounts of data. This is an important feature since many domains are characterized by limited amounts of data, as discussed in ref. [39]. Learning Bayesian classifiers from data is to be contrasted with the construction of a full

Acknowledgements

This research was sponsored by the Netherlands Organization for Scientific Research (NWO) under grant numbers 612.066.201 and FN4556. We would like to thank the anonymous reviewers for their valuable comments.

References (41)

  • D. Spiegelhalter et al., Statistical and knowledge-based approaches to clinical decision-support systems, with an application in gastroenterology, J R Stat Soc (1984)
  • M. Sahami, Learning limited dependence Bayesian classifiers
  • N. Friedman et al., Bayesian network classifiers, Machine Learn (1997)
  • J. Cheng et al., Comparing Bayesian network classifiers
  • P. Domingos et al., On the optimality of the simple Bayesian classifier under zero-one loss, Machine Learn (1997)
  • C. Lacave et al., A review of explanation methods for Bayesian networks, Knowledge Eng Rev (2002)
  • R. Jurgelenaite et al., EM algorithm for symmetric causal independence models
  • J. Vomlel, Exploiting functional dependence in Bayesian network inference
  • J. Zuetenhorst et al., Carcinoid heart disease: the role of urinary 5-HIAA excretion and plasma levels of TGF-β and FGF, Cancer (2003)
  • N. Zhang et al., Exploiting causal independence in Bayesian network inference, J Artif Intell Res (1996)