6.1 Original Approach
We describe two classical learning models from CLT that have been applied to learning DL concept expressions and ontologies. We start with the classical PAC learning model and then describe the exact learning model.
In the PAC learning model, a learner receives classified examples drawn according to a probability distribution and attempts to create a hypothesis that approximates the target. The aim is to bound the probability that a hypothesis constructed by the learner misclassifies an example. This approach can be applied to any learning framework. Within this model, one can investigate the complexity of learning an abstract target, such as a DL concept, an ontology, or the weights of an NN.
We now formalise this model. Let \({\mathfrak {F}} = (\mathcal {E} , \mathcal {L} , \mu )\) be a learning framework. A probability distribution \(\mathcal {D}\) over \(\mathcal {E}\) is a function mapping events in a \(\sigma\)-algebra E of subsets of \(\mathcal {E}\) to \([0, 1] \subset {\mathbb {R}}\) such that \(\mathcal {D} (\mathcal {E} ) = 1\) and \(\mathcal {D} (\bigcup _{i \in I} X_{i}) = \sum _{i \in I} \mathcal {D} (X_{i})\) for pairwise disjoint events \(X_{i}\in E\), where I is a countable set of indices. Given a target \(t \in \mathcal {L}\), let \({\mathsf{EX}}^{\mathcal {D} }_{{{\mathfrak {F}},t}}\) be the oracle that takes no input and outputs a classified example \((e,\ell _{t}(e))\), where \(e \in \mathcal {E}\) is sampled according to the probability distribution \(\mathcal {D}\), and \(\ell _{t}(e) =1\) if \(e \in \mu (t)\) and \(\ell _{t}(e) =0\) otherwise. An example query is a call to the oracle \({\mathsf{EX}}^{\mathcal {D} }_{{{\mathfrak {F}},t}}\). A sample generated by \({\mathsf{EX}}^{\mathcal {D} }_{{{\mathfrak {F}},t}}\) is a (multi-)set of indexed classified examples, independently and identically distributed according to \(\mathcal {D}\), obtained by calling \({\mathsf{EX}}^{\mathcal {D} }_{{{\mathfrak {F}},t}}\).
A learning framework \({\mathfrak {F}}\) is PAC learnable if there is a function \(f : (0, 1)^{2} \rightarrow {\mathbb {N}}\) and a deterministic algorithm such that, for every \(\epsilon , \delta \in (0, 1) \subset {\mathbb {R}}\), every probability distribution \(\mathcal {D}\) on \(\mathcal {E}\), and every target \(t \in \mathcal {L}\), given a sample of size \(m \ge f(\epsilon , \delta )\) generated by \({\mathsf{EX}}^{\mathcal {D} }_{{{\mathfrak {F}},t}}\), the algorithm always halts and outputs \(h \in \mathcal {L}\) such that with probability at least \((1 - \delta )\) over the choice of m examples in \(\mathcal {E}\), we have that \(\mathcal {D} (\mu (h) \oplus \mu (t)) \le \epsilon\). If the number of computation steps used by the algorithm is bounded by a polynomial function \(p (|t |, |e|, 1/\epsilon , 1/\delta )\), where e is the largest example in the sample generated by \({\mathsf{EX}}^{\mathcal {D} }_{{{\mathfrak {F}},t}}\), then \({\mathfrak {F}}\) is PAC learnable in polynomial time.
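To make the definition more tangible, the following sketch shows how a generic PAC learner interacts with the sampling oracle; the use of Python and the names (`SamplingOracle`, `fit`, `sample_size`) are illustrative assumptions rather than part of the formal model.

```python
from typing import Callable, List, Tuple

# A classified example (e, l_t(e)) as returned by the oracle EX^D_{F,t}.
LabeledExample = Tuple[object, int]
# The sampling oracle itself: takes no input, returns one classified example.
SamplingOracle = Callable[[], LabeledExample]


def pac_learn(ex: SamplingOracle,
              fit: Callable[[List[LabeledExample]], object],
              sample_size: Callable[[float, float], int],
              epsilon: float, delta: float) -> object:
    """Generic PAC harness: request m >= f(epsilon, delta) i.i.d. classified
    examples from EX^D_{F,t} and hand them to a (hypothetical) fitting
    procedure that searches the hypothesis space L for some hypothesis h."""
    m = sample_size(epsilon, delta)
    sample = [ex() for _ in range(m)]   # m example queries
    return fit(sample)
```

A PAC learnability proof then has to exhibit a sample-size function and a fitting procedure such that, for every distribution \(\mathcal {D}\) and every target t, the returned hypothesis h satisfies \(\mathcal {D} (\mu (h) \oplus \mu (t)) \le \epsilon\) with probability at least \(1 - \delta\).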
PAC Problem. Given a learning framework, decide whether it is PAC learnable in polynomial time.
In the classical PAC approach, the probability distribution \(\mathcal {D}\) is unknown to the learner, and the algorithm should provide a probabilistic bound for any possible \(\mathcal {D}\). We now describe the exact learning model. In this model, a learner tries to identify an abstract target known by a teacher, also called an oracle, by interacting with the teacher [2]. The most successful protocol is based on membership and equivalence queries. As with the PAC learning model, this model can be used to formulate learning problems within the context of any kind of learning framework.
We formalise these notions as follows. Given a learning framework \({\mathfrak {F}} = (\mathcal {E} , \mathcal {L} , \mu )\), we are interested in the exact identification of a target concept representation \(t \in \mathcal {L}\) by posing queries to oracles. Let \({\mathsf{MQ}}_{{\mathfrak {F}},t}\) be the oracle that takes as input some \(e \in \mathcal {E}\) and returns ‘yes’ if \(e \in \mu (t)\) and ‘no’ otherwise. A membership query is a call to the oracle \({\mathsf{MQ}}_{{\mathfrak {F}},t}\). For every \(t \in \mathcal {L}\), we denote by \({\mathsf{EQ}}_{{\mathfrak {F}},t}\) the oracle that takes as input a hypothesis concept representation \(h \in \mathcal {L}\) and returns ‘yes’ if \(\mu (h) = \mu (t)\) and a counterexample \(e \in \mu (h) \oplus \mu (t)\) otherwise, where \(\oplus\) denotes the symmetric set difference. There is no assumption regarding which counterexample in \(\mu (h) \oplus \mu (t)\) is chosen by the oracle. An equivalence query is a call to the oracle \({\mathsf{EQ}}_{{\mathfrak {F}},t}\). In this model, if examples are interpretations or entailments, the notion of ‘equivalence’ coincides with logical equivalence.
A learning framework \({\mathfrak {F}}\) is exactly learnable if there is a deterministic algorithm such that, for every \(t \in \mathcal {L}\), it eventually halts and outputs some \(h\in \mathcal {L}\) with \(\mu (h) = \mu (t)\). Such an algorithm is allowed to call the oracles \({\mathsf{MQ}}_{{\mathfrak {F}},t}\) and \({\mathsf{EQ}}_{{\mathfrak {F}},t}\). If the number of computation steps used by the algorithm is bounded by a polynomial \(p(|t |,|e |)\), where \(t \in \mathcal {L}\) is the target and \(e \in \mathcal {E}\) is the largest counterexample seen so far, then \({\mathfrak {F}}\) is exactly learnable in polynomial time.
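As a concrete (if schematic) reading of this definition, the sketch below models the two oracles as callables and runs the usual query loop; the `refine` procedure, which updates the hypothesis from a counterexample and may pose membership queries, is an assumed framework-specific ingredient, not something fixed by the model.

```python
from typing import Callable, Optional, Tuple

# MQ_{F,t}: given an example e, answers whether e is in mu(t).
MembershipOracle = Callable[[object], bool]
# EQ_{F,t}: given a hypothesis h, answers (True, None) if mu(h) = mu(t),
# and (False, e) for some counterexample e in mu(h) (+) mu(t) otherwise.
EquivalenceOracle = Callable[[object], Tuple[bool, Optional[object]]]


def exact_learn(initial_hypothesis: object,
                refine: Callable[[object, object, MembershipOracle], object],
                mq: MembershipOracle,
                eq: EquivalenceOracle) -> object:
    """Generic exact-learning loop: pose equivalence queries and, on each
    counterexample, refine the hypothesis (possibly with membership queries)
    until the equivalence oracle answers 'yes'."""
    h = initial_hypothesis
    ok, counterexample = eq(h)
    while not ok:
        h = refine(h, counterexample, mq)
        ok, counterexample = eq(h)
    return h
```

Polynomial time exact learnability additionally requires that the total number of computation steps, and hence of queries, stays polynomial in \(|t|\) and in the size of the largest counterexample seen so far.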
Exact Problem. Given a learning framework, decide whether it is exactly learnable in polynomial time.
In Theorem 1, we recall an interesting connection between the exact learning model and the PAC model extended with membership queries: if there is a polynomial time exact learning algorithm for a learning framework \({\mathfrak {F}}\) that is allowed to make membership queries, then \({\mathfrak {F}}\) is PAC learnable with membership queries in polynomial time. The converse of Theorem 1 does not hold [6]. That is, there is a learning framework that is PAC learnable in polynomial time (even without membership queries) but not exactly learnable in polynomial time.
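The idea behind Theorem 1 is that each equivalence query can be simulated by the sampling oracle: draw enough classified examples and treat any example on which the current hypothesis disagrees with its label as a counterexample; if no disagreement is found, the hypothesis is accepted as probably approximately correct. The sketch below follows Angluin's sample-size choice for the i-th simulated query; the function names are illustrative.

```python
import math
from typing import Callable, Optional, Tuple

LabeledExample = Tuple[object, int]


def simulated_eq(ex: Callable[[], LabeledExample],
                 hypothesis: Callable[[object], bool],
                 i: int, epsilon: float, delta: float
                 ) -> Tuple[bool, Optional[object]]:
    """Simulate the i-th equivalence query with the sampling oracle EX^D_{F,t}."""
    # Sample size for the i-th simulated query; summed over all queries,
    # the overall failure probability stays below delta.
    m = math.ceil((1.0 / epsilon) * (math.log(1.0 / delta) + i * math.log(2)))
    for _ in range(m):
        e, label = ex()
        if hypothesis(e) != bool(label):
            return False, e   # behaves like a counterexample from EQ_{F,t}
    return True, None         # accept: h is probably approximately correct
```

Membership queries, in contrast, cannot in general be answered from samples, which is why they remain as queries in the resulting PAC algorithm.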
6.2 Building DL Ontologies
The PAC learning model has already been applied to learn DL concept expressions formulated in the DL CLASSIC [9, 14] (see also [36]). The main difficulty in adapting the PAC approach for learning DL ontologies is the complexity of this task. In the PAC learning model, one is normally interested in polynomial time complexity; however, in many DLs, such as \(\mathcal{ALC}\), the entailment problem has superpolynomial time complexity, and entailment checks are often important to combine the information present in the classified examples.
It has been shown that the \({{\mathcal {E}}\mathcal {L}}\) fragments \({{\mathcal {E}}\mathcal {L}} _{{\mathsf{lhs}}}\) and \({{\mathcal {E}}\mathcal {L}} _{{\mathsf{rhs}}}\), which allow only conjunctions of concept names on the right-hand side and on the left-hand side of CIs, respectively, are exactly learnable from entailments in polynomial time [24, 25, 37]; however, this is not the case for \({{\mathcal {E}}\mathcal {L}}\) itself. The learning framework is the one in Example 1 and the problem statement is the same as in the original approach. By Theorem 1, the results for \({{\mathcal {E}}\mathcal {L}} _{{\mathsf{lhs}}}\) and \({{\mathcal {E}}\mathcal {L}} _{{\mathsf{rhs}}}\) transfer to the PAC learning model extended with membership queries. These results show how changes in the ontology language can affect the complexity of searching for a suitable ontology in the hypothesis space. The main difficulty of implementing this model is that it is based on oracles, in particular on an equivalence query oracle. Fortunately, as already mentioned, such equivalence queries can be simulated by the sampling oracle of the PAC learning model to achieve PAC learnability (Theorem 1) [2].
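To make the syntactic restrictions on \({{\mathcal {E}}\mathcal {L}} _{{\mathsf{lhs}}}\) and \({{\mathcal {E}}\mathcal {L}} _{{\mathsf{rhs}}}\) concrete, the following concept inclusions (with hypothetical concept and role names chosen only for illustration) show what each fragment admits.

```latex
% Hypothetical concept and role names, for illustration only.
\begin{align*}
  &\exists \mathsf{hasParent}.\mathsf{Human} \sqsubseteq \mathsf{Human}
    && \text{allowed in } \mathcal{EL}_{\mathsf{lhs}}\\
  &\mathsf{Human} \sqsubseteq \exists \mathsf{hasParent}.\mathsf{Human}
    && \text{allowed in } \mathcal{EL}_{\mathsf{rhs}}\\
  &\exists \mathsf{hasChild}.\mathsf{Human} \sqsubseteq \exists \mathsf{hasParent}.\mathsf{Human}
    && \text{in } \mathcal{EL} \text{ but in neither fragment}
\end{align*}
```

Only the first two CIs fall within the respective fragments; the third is expressible in \({{\mathcal {E}}\mathcal {L}}\) but, having complex concepts on both sides, belongs to neither fragment.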