We now present selected approaches for retrieval and reasoning with arguments from the knowledge level of the architecture.
4.1 Matching Similar Claims by Textual Similarity
In an initial study [14], we evaluated different methods for claim similarity. We built upon the groundwork of Wachsmuth et al. [29], who set up an argument search engine based on crawling and indexing arguments from four debate portals. Since their corpus was not freely available at that time, we built a comparable corpus with 63,250 claims and about 695,000 premises by crawling the same portals. For our evaluation we used 232 claims from this corpus on the topic energy. To determine these claims, we first identified the 44 words most similar to energy using a pretrained word2vec [21] model, and then randomly chose 232 query claims amongst all claims containing at least one of them. We then evaluated how well 196 text similarity methods implemented in Apache Lucene performed in finding relevant result claims for these query claims. To build a gold standard, we constructed a result pool for each query from the top five results of each method, resulting in a total of 3,622 (query, result) pairs. Each pair was then assessed by two annotators on a scale from 1 (semantically dissimilar) to 5 (semantically equal). For each method, the result quality was then measured using the established nDCG metric [17]. Our results show that the widely used BM25 method [26] performs very well with an nDCG@5 of 0.7944, but an even better performance (0.8355) was achieved by a combination of Axiomatic Approaches for IR and Divergence from Randomness (DFR) [1]. Using a second set of relevance assessments for (query claim, result premise) pairs on a binary scale, our experiments also support the intuitive assumption that, given a query claim, the premises of a similar claim are more relevant to the query claim than those of a dissimilar one.
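To make the evaluation measure concrete, the following minimal sketch computes nDCG@k for one query from graded assessments. It assumes the common formulation with linear gain and a logarithmic (base 2) rank discount, and it normalises against the ideal ordering of the pooled assessments; the exact variant used in the study is not stated here, and the function names and toy grades are purely illustrative.

```python
import math
from typing import Sequence


def dcg_at_k(grades: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k graded results.

    Assumption: linear gain, log2 discount (rank 1 is divided by log2(2) = 1).
    """
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(grades[:k]))


def ndcg_at_k(ranked_grades: Sequence[float],
              pool_grades: Sequence[float],
              k: int = 5) -> float:
    """nDCG@k of one ranking, normalised by the best ordering of the pool."""
    ideal = dcg_at_k(sorted(pool_grades, reverse=True), k)
    return dcg_at_k(ranked_grades, k) / ideal if ideal > 0 else 0.0


# Hypothetical grades (1..5) of the results one method returned for one query,
# and the grades of all pooled results for that query.
method_ranking = [4, 5, 2, 3, 1]
pooled = [5, 4, 4, 3, 3, 2, 1, 1]
score = ndcg_at_k(method_ranking, pooled, k=5)
```

Averaging this score over all query claims yields one quality value per similarity method.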
4.2 A Probabilistic Ranking Framework for Argument Retrieval
For finding good premises for a query claim from a large corpus of already mined arguments, we proposed a principled probabilistic ranking framework [13]. Given a controversial claim or topic, the system first identifies highly similar claims in the corpus, and then clusters and ranks their supporting and attacking premises, taking clusters of claims as well as the stances of query and premises into account.
The description of the whole framework is beyond the scope of this paper. We only sketch the approach for finding supporting premises to a query claim; finding attacking premises is analogous. Given a large corpus of claims and premises, we first create a set of disjoint claim clusters \(\Gamma=\{\gamma_{1},\gamma_{2},\ldots\}\) where each cluster \(\gamma_{j}\) consists of claims with the same meaning. Analogously, we create a set of disjoint premise clusters \(\Pi=\{\pi_{1},\pi_{2},\ldots\}\) consisting of premises with the same meaning. Our goal is to find the best clusters of supporting premises \(\pi^{+}\) for a query \(q\). To do so, we estimate the probability of relevance \(P(\pi^{+}|q)\) for each \(\pi^{+}\in\Pi\). This probability is high if many premises from the cluster strongly support claims relevant to the query claim. To quantify this, we consider the probability \(P(c|q)\) that claim \(c\) is relevant for query \(q\) and the probability \(P(p^{+}|c,q)\) that a user would pick premise \(p\) amongst all supporting premises of \(c\). We then obtain \(P(p^{+}|q)\) by adding \(P(c|q)\cdot P(p^{+}|c,q)\) over all claims in the corpus, and can compute \(P(\pi_{j}^{+}|q)\) as the sum of \(P(p^{\prime+}|q)\) over all premises \(p^{\prime+}\in\pi_{j}^{+}\).
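The two summations described above can be sketched as follows. This is an illustration only: it assumes that the probabilities \(P(c|q)\) and \(P(p^{+}|c,q)\) have already been estimated, and the dictionary layout, identifiers, and toy values are hypothetical.

```python
from collections import defaultdict


def rank_premise_clusters(p_claim, p_premise_given_claim, cluster_of):
    """Rank premise clusters by their estimated P(pi+ | q).

    p_claim:               dict claim -> P(c | q)
    p_premise_given_claim: dict (premise, claim) -> P(p+ | c, q)
    cluster_of:            dict premise -> id of its premise cluster
    """
    # P(p+ | q): sum of P(c | q) * P(p+ | c, q) over all claims c.
    p_premise = defaultdict(float)
    for (premise, claim), prob in p_premise_given_claim.items():
        p_premise[premise] += p_claim.get(claim, 0.0) * prob

    # P(pi+ | q): sum of P(p+ | q) over all premises in the cluster.
    p_cluster = defaultdict(float)
    for premise, prob in p_premise.items():
        p_cluster[cluster_of[premise]] += prob

    return sorted(p_cluster.items(), key=lambda item: item[1], reverse=True)


# Toy example with two claims and three supporting premises.
p_claim = {"c1": 0.6, "c2": 0.4}
p_premise_given_claim = {("p1", "c1"): 0.5, ("p2", "c1"): 0.5, ("p3", "c2"): 1.0}
cluster_of = {"p1": "A", "p2": "B", "p3": "A"}
ranking = rank_premise_clusters(p_claim, p_premise_given_claim, cluster_of)
```

In the toy data, cluster A collects probability mass from premises of both claims and is therefore ranked above cluster B.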
We can estimate \(P(c|q)\) with standard text retrieval methods; in our experiments, we use DFR, the best method for claim retrieval (see Sect. 4.1). Regarding premises, we prefer premises that appear often within a claim cluster but disfavour premises that appear within most or even all claim clusters; this is the same principle as in the tf-idf weight [27]. We thus estimate \(P(p^{+}|c)\) as the product of two frequency statistics (plus normalisation): the premise frequency \(\operatorname{pf}(p^{+},c)\), i.e. the frequency with which \(p\) is used as support for claims equivalent to \(c\) (i.e. within \(c\)’s claim cluster), and the inverse claim frequency \(\operatorname{icf}(p^{+})\), i.e. the inverse number of claim clusters for which \(p\) is used as support.
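A minimal sketch of this estimate is given below. It makes several assumptions that are not fixed by the description above: premises are identified by an id shared by equivalent premises, the inverse claim frequency is the plain reciprocal of the cluster count (no logarithm), and the normalisation makes the scores within one claim cluster sum to one. All names and data are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_premise_probs(support_pairs, claim_cluster):
    """Estimate P(p+ | c) for all premises supporting claims in one cluster.

    support_pairs: list of (premise_id, claim_cluster_id), one entry per
                   observed use of a premise as support for a claim
    claim_cluster: id of the claim cluster that contains c
    """
    # pf(p+, c): how often p supports claims within c's claim cluster.
    pf = Counter(p for p, cluster in support_pairs if cluster == claim_cluster)

    # icf(p+): reciprocal of the number of claim clusters p supports.
    clusters_per_premise = defaultdict(set)
    for p, cluster in support_pairs:
        clusters_per_premise[p].add(cluster)

    raw = {p: count / len(clusters_per_premise[p]) for p, count in pf.items()}
    total = sum(raw.values())
    # Assumed normalisation: scores within the cluster sum to one.
    return {p: score / total for p, score in raw.items()} if total else {}


pairs = [("p1", "g1"), ("p1", "g1"), ("p2", "g1"), ("p2", "g2"), ("p2", "g3")]
probs = estimate_premise_probs(pairs, "g1")
```

Here `p1` is used twice and only within cluster `g1`, whereas `p2` is spread over three clusters, so `p1` receives most of the probability mass.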
We evaluated our ranking framework using the dataset introduced in Sect. 4.1. We calculated embeddings of all claims and premises with BERT [11]. We then clustered the claims in an offline operation with agglomerative clustering [15] and obtained clusters by applying a dynamic tree cut [18]. Premise clusters relevant to the query are determined with the same method at query time, considering the premises of the claims most similar to the query and, for each of these premises, the ten most similar premises as determined by BM25. We randomly picked 30 query claims out of the 232 claims. As a baseline system, we implemented the approach proposed by Wachsmuth et al. [29]. Two annotators assessed the 1,195 premises retrieved by at least one system on a three-point relevance scale. Our approach significantly outperformed the baseline for nDCG@5.
4.3 Case-Based Reasoning for Retrieval and Adaptation of Argument Graphs
Besides methods from information retrieval, we also investigated case-based reasoning (CBR) methods [2, 25] applied to cases in the form of argument graphs. CBR is a method from knowledge-based problem solving based on experiential knowledge, called cases. It allows the retrieval of cases similar to a query but also the adaptation of cases towards the query. Thus, retrieval methods from CBR can be used as an alternative approach to information retrieval, and they are particularly useful for whole argument graphs, as their argumentative structure can be considered during similarity assessment. Further, adaptation methods from CBR can be applied to the adaptation of argument graphs. Both issues are under investigation in the project.
In our work [4, 20] we aim at retrieving and adapting argument graphs from a repository (called case-base in CBR terminology). Formally, an argument graph is a semantically labeled directed graph represented as a tuple \(A=(N,E,\tau,\lambda,t)\) [3]. \(N\) is the set of nodes and \(E\subseteq N\times N\) is the set of directed edges connecting two nodes. \(\tau:N\to\mathcal{T}\) assigns each node a type and \(\lambda:N\to\mathcal{L}\) assigns each node a semantic description from a language \(\mathcal{L}\). \(t\in\mathcal{L}\) describes the overall topic of the argument represented in the graph. The types \(\mathcal{T}\) follow the AIF standard [9] so that a node can either be an I‑node with natural language propositional content or an S‑node characterized by the respective argumentation scheme. The mapping function \(\lambda\) is used to link a semantic representation to a node. For an I‑node \(n\), \(\lambda(n)\) is the original textual representation (possibly after traditional pre-processing such as stopword removal) together with a semantic representation of this text in the form of a vector, produced by a sentence encoder.
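The tuple \(A=(N,E,\tau,\lambda,t)\) can be mirrored by a small data structure. The sketch below is one possible encoding, not the representation used in our implementation: class and field names are hypothetical, and \(\lambda(n)\) is reduced to a plain text label without the embedding vector.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeType(Enum):
    """Node types following the I-node / S-node distinction of AIF."""
    I_NODE = "I"
    S_NODE = "S"


@dataclass(frozen=True)
class Node:
    node_id: str
    node_type: NodeType  # tau(n)
    label: str           # lambda(n): proposition text or scheme name


@dataclass
class ArgumentGraph:
    topic: str                                                   # t
    nodes: dict[str, Node] = field(default_factory=dict)         # N
    edges: set[tuple[str, str]] = field(default_factory=set)     # E

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, source: str, target: str) -> None:
        # E is a subset of N x N, so both end nodes must already exist.
        if source not in self.nodes or target not in self.nodes:
            raise ValueError("both end nodes must be part of the graph")
        self.edges.add((source, target))


# A premise supporting a claim via an S-node (texts are invented examples).
graph = ArgumentGraph(topic="energy")
graph.add_node(Node("n1", NodeType.I_NODE, "Coal plants should be shut down."))
graph.add_node(Node("n2", NodeType.S_NODE, "Support"))
graph.add_node(Node("n3", NodeType.I_NODE, "Coal plants emit large amounts of CO2."))
graph.add_edge("n3", "n2")
graph.add_edge("n2", "n1")
```

A partial graph with a single I‑node and no edges is a valid instance of the same structure, which is convenient when the structure is also used for queries.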
A query to be used in retrieval is also an argument graph or a partial argument graph, which can consist of one or a few (maybe linked) nodes only. For example, a claim with a few premises can be used as a query to retrieve a set of graphs that contribute additional premises for the claim or other sub-graphs supporting or attacking the premises in the query.
For case retrieval, we developed a graph-based similarity measure that assesses the similarity between a query graph \(QA\) and a case graph \(CA\) from the repository. The graph similarity is computed from two local measures: a node similarity measure \(\operatorname{sim}_{N}(n_{q},n_{c})\) for a node \(n_{q}\) from the query argument graph \(QA\) and a node \(n_{c}\) from the case argument graph \(CA\), and an edge similarity measure \(\operatorname{sim}_{E}(e_{q},e_{c})=0.5\cdot(\operatorname{sim}_{N}(e_{q}.l,e_{c}.l)+\operatorname{sim}_{N}(e_{q}.r,e_{c}.r))\) for an edge \(e_{q}\) from \(QA\) and an edge \(e_{c}\) from \(CA\), where \(e.l\) and \(e.r\) denote the two nodes linked by an edge \(e\).
To construct a global graph similarity value, an admissible mapping \(m\) is applied which maps nodes and edges from \(QA\) to \(CA\), such that only nodes of the same type (I‑nodes to I‑nodes and S‑nodes to S‑nodes) are mapped. Edges can only be mapped if the nodes they link are mapped as well by \(m\). For a given mapping \(m\), let \(sn_{i}\) be the node similarities \(\operatorname{sim}_{N}(n_{i},m(n_{i}))\) and \(se_{i}\) the edge similarities \(\operatorname{sim}_{E}(e_{i},m(e_{i}))\). The similarity for a query graph \(QA\) and a case graph \(CA\) given a mapping \(m\) is the normalised sum of the node and edge similarities: \(\operatorname{sim}_{m}(QA,CA)=(sn_{1}+\cdots+sn_{n_{N}}+se_{1}+\cdots+se_{n_{E}})/(n_{N}+n_{E})\). Finally, the similarity of \(QA\) and \(CA\) is the similarity of an optimal mapping \(m\), which can be computed using an \(A^{*}\) search [3], i.e., \(\operatorname{sim}(QA,CA)=\max_{m}\{\operatorname{sim}_{m}(QA,CA)\mid m\text{ is admissible}\}\).
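For very small graphs, the maximisation over admissible mappings can be illustrated by exhaustive enumeration. The sketch below deliberately replaces the \(A^{*}\) search by brute-force backtracking and is only meant to show what is being optimised. It assumes that mappings are injective and may leave query nodes unmapped, that a query edge is mapped whenever both of its end nodes are mapped to nodes linked by an edge in the case graph, and that the normalisation counts the nodes and edges of the query graph. The data layout and the toy node similarity are hypothetical.

```python
def best_mapping_similarity(q_nodes, q_edges, c_nodes, c_edges, sim_n):
    """Maximum similarity over all admissible mappings (brute force).

    q_nodes, c_nodes: dict node_id -> (node_type, label)
    q_edges, c_edges: set of (source_id, target_id)
    sim_n:            function (query_node, case_node) -> value in [0, 1]
    """
    q_ids = list(q_nodes)
    norm = len(q_nodes) + len(q_edges)
    if norm == 0:
        return 0.0
    best = 0.0

    def score(mapping):
        node_sum = sum(sim_n(q_nodes[q], c_nodes[c]) for q, c in mapping.items())
        edge_sum = 0.0
        for left, right in q_edges:
            if left in mapping and right in mapping \
                    and (mapping[left], mapping[right]) in c_edges:
                # Edge similarity: mean of the two end-node similarities.
                edge_sum += 0.5 * (
                    sim_n(q_nodes[left], c_nodes[mapping[left]])
                    + sim_n(q_nodes[right], c_nodes[mapping[right]])
                )
        return (node_sum + edge_sum) / norm

    def extend(index, mapping, used):
        nonlocal best
        if index == len(q_ids):
            best = max(best, score(mapping))
            return
        q = q_ids[index]
        extend(index + 1, mapping, used)  # option: leave q unmapped
        for c, (c_type, _) in c_nodes.items():
            # Admissibility: same node type, each case node used at most once.
            if c not in used and c_type == q_nodes[q][0]:
                mapping[q] = c
                used.add(c)
                extend(index + 1, mapping, used)
                used.remove(c)
                del mapping[q]

    extend(0, {}, set())
    return best


def exact_label_sim(q_node, c_node):
    """Toy node similarity: 1.0 for identical labels, else 0.0."""
    return 1.0 if q_node[1] == c_node[1] else 0.0


query_nodes = {"q1": ("I", "coal pollutes"), "q2": ("S", "support")}
query_edges = {("q1", "q2")}
case_nodes = {"c1": ("I", "coal pollutes"), "c2": ("S", "support"),
              "c3": ("I", "ban coal")}
case_edges = {("c1", "c2"), ("c2", "c3")}
similarity = best_mapping_similarity(
    query_nodes, query_edges, case_nodes, case_edges, exact_label_sim)
```

The number of candidate mappings grows exponentially with the number of nodes, which is why a guided search such as \(A^{*}\) is needed for graphs of realistic size.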
For similarity-based retrieval of argument graphs from a case base, a linear retrieval approach that applies the graph similarity to every case should be avoided due to unacceptable retrieval times caused by the complexity of \(A^{*}\) search as well as the complexity of the involved node similarity measures. Thus, we applied a two-phase approach, which divides the retrieval into an efficient pre-filter stage followed by a phase in which only the filtered cases are assessed in depth using the complex graph similarity measure. We implemented the pre-filter as a linear similarity-based retrieval of the cases based only on the semantic similarity of the topic vector \(t\) [4]. The filter selects the \(k\) most similar cases, which are passed to the second phase, where they are ranked by a linear assessment using the graph-based similarity described above.
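The two-phase scheme can be sketched in a few lines. The sketch assumes two given similarity functions, a cheap topic similarity and an expensive graph similarity; the function names and the toy cases with precomputed scores are hypothetical and only serve to show the control flow.

```python
import heapq


def two_phase_retrieve(query, cases, topic_sim, graph_sim, k):
    """Pre-filter by topic similarity, then rank the survivors in depth."""
    # Phase 1: cheap linear scan, keep the k cases with the most similar topic.
    candidates = heapq.nlargest(k, cases, key=lambda case: topic_sim(query, case))
    # Phase 2: expensive graph similarity, computed for the k candidates only.
    return sorted(candidates, key=lambda case: graph_sim(query, case), reverse=True)


# Toy cases whose similarities to the query are stored directly.
cases = [
    {"id": "a", "topic": 0.9, "graph": 0.2},
    {"id": "b", "topic": 0.8, "graph": 0.7},
    {"id": "c", "topic": 0.1, "graph": 0.99},
]
ranked = two_phase_retrieve(
    query=None,
    cases=cases,
    topic_sim=lambda q, case: case["topic"],
    graph_sim=lambda q, case: case["graph"],
    k=2,
)
ranked_ids = [case["id"] for case in ranked]
```

In the toy data, case `c` has the highest graph similarity but is discarded by the pre-filter. This illustrates the trade-off of the approach: a smaller \(k\) reduces retrieval time but increases the risk of missing relevant cases.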
This approach depends significantly on the methods used to assess the similarity of nodes. For S‑nodes representing argument schemes, the similarity is determined according to the closeness of the schemes within a taxonomic ontology of argument schemes [20]. To this end, we apply a similarity measure proposed by Wu and Palmer [31] that considers the depth of the two schemes to be compared and the length of the taxonomy path to their closest common predecessor. For I‑nodes, the textual information can be compared by textual similarity measures. In order to capture the semantic closeness of the I‑nodes, we investigated various word and sentence embedding methods for assessing the similarity.
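The Wu and Palmer measure can be sketched on a toy taxonomy. The sketch uses the usual formulation \(2\cdot\operatorname{depth}(lcs)/(\operatorname{depth}(a)+\operatorname{depth}(b))\), where \(lcs\) is the closest common predecessor, and assumes a single-rooted taxonomy in which the root has depth 1. The scheme names and their arrangement are invented for illustration and do not reproduce the ontology used in [20].

```python
def path_to_root(node, parent):
    """Return the list [node, parent of node, ..., root]."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def wu_palmer(a, b, parent):
    """Wu-Palmer similarity of two nodes in a single-rooted taxonomy."""
    path_a = path_to_root(a, parent)
    ancestors_b = set(path_to_root(b, parent))
    # Closest common predecessor: first node on a's path that also lies on b's.
    lcs = next(node for node in path_a if node in ancestors_b)

    def depth(node):
        return len(path_to_root(node, parent))  # the root has depth 1

    return 2 * depth(lcs) / (depth(a) + depth(b))


# Hypothetical scheme taxonomy, given as child -> parent.
parent = {
    "source-based": "scheme",
    "practical reasoning": "scheme",
    "expert opinion": "source-based",
    "position to know": "source-based",
}
siblings = wu_palmer("expert opinion", "position to know", parent)
distant = wu_palmer("expert opinion", "practical reasoning", parent)
```

Schemes that share a specific common predecessor (here the two source-based schemes) obtain a higher similarity than schemes that only meet at the root.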
In a first paper [4], we used plain word2vec Skip-gram embeddings (WV) [21] applied to the pre-processed node text (tokenisation and an optional stopword removal). The similarity between two I‑nodes is then assessed using the cosine similarity applied to the aggregated embedding vectors of the words in the pre-processed text. We further extended this investigation by considering various alternative embedding approaches [20] as well as combinations of them with alternative vector similarity measures. In particular, the unsupervised methods fastText [7] and GloVe [24] (word embeddings) as well as the distributed memory model of paragraph vectors (DV) [19] (sentence embedding) have been applied. In addition, the supervised sentence embedding methods InferSent [10] (based on BiLSTMs) and the Universal Sentence Encoder [8] variants USE‑T and USE‑D have been investigated, as well as various combinations based on vector concatenation. In experiments using the semantically extended Potsdam Microtexts Corpus [23], USE‑T achieved the highest Average Precision of \(0.972\), whereas WV achieved the highest nDCG@10 of \(0.877\).
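The word-embedding-based I‑node similarity can be illustrated with a small self-contained sketch. It assumes that the word vectors are aggregated by their arithmetic mean and that tokenisation is a simple whitespace split; both are simplifications, and the two-dimensional vectors are invented placeholders for real pretrained embeddings.

```python
import math


def mean_vector(tokens, embeddings):
    """Average the vectors of all tokens that have an embedding."""
    vectors = [embeddings[token] for token in tokens if token in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two vectors, 0.0 if one of them is a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def inode_similarity(text_a, text_b, embeddings, stopwords=frozenset()):
    """Cosine similarity of the averaged word vectors of two node texts."""
    def tokenise(text):
        return [word for word in text.lower().split() if word not in stopwords]

    vec_a = mean_vector(tokenise(text_a), embeddings)
    vec_b = mean_vector(tokenise(text_b), embeddings)
    if vec_a is None or vec_b is None:
        return 0.0  # no known word in at least one of the texts
    return cosine(vec_a, vec_b)


# Invented 2-dimensional vectors standing in for pretrained embeddings.
toy_embeddings = {"coal": [1.0, 0.0], "energy": [0.8, 0.6], "tax": [0.0, 1.0]}
related = inode_similarity("the coal energy", "energy", toy_embeddings,
                           stopwords=frozenset({"the"}))
unrelated = inode_similarity("coal", "tax", toy_embeddings)
```

Sentence encoders such as USE replace the averaging step by a learned encoding of the whole text, while the final vector comparison stays the same.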
Besides their use in retrieval, we also investigated the use of the argument graph similarity measures for clustering the argument graphs in the repository w.r.t. their similarity [6]. Clusters of graphs can then be used for further research on generalisation of graphs as a pre-processing step for argument graph adaptation. In addition, we approach argument graph adaptation by analogical reasoning. For this purpose, we further enhance the argument graph representation by identifying noun chunks in the text of the I‑nodes and linking them to concepts in the ConceptNet knowledge graph [28] as a means to represent background knowledge. Based on the knowledge graph, various substitutions of the concepts can be performed as a means for argument adaptation. For example, generalisations can be determined which can be further specialised differently towards the concepts in the query node. Also, shortest paths in the knowledge graph between the core concepts occurring in the I‑nodes of an argument graph can be determined as a source for analogical transfer to different concepts occurring in the query. Respective methods are currently being implemented and tested.