Top

Autonomous Agents and Multi-Agent Systems

Published in:

Open Access 01-10-2022

Towards an axiomatic approach to truth discovery

Authors: Joseph Singleton, Richard Booth

Published in: Autonomous Agents and Multi-Agent Systems | Issue 2/2022

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Patentsearch

Off

Abstract

The problem of truth discovery, i.e., of trying to find the true facts concerning a number of objects based on reports from various information sources of unknown trustworthiness, has received increased attention recently. The problem is made interesting by the fact that the relative believability of facts depends on the trustworthiness of their sources, which in turn depends on the believability of the facts the sources report. Several algorithms for truth discovery have been proposed, but their evaluation has mainly been performed experimentally by computing accuracy against large datasets. Furthermore, it is often unclear how these algorithms behave on an intuitive level. In this paper we take steps towards a framework for truth discovery which allows comparison and evaluation of algorithms based instead on their theoretical properties. To do so we pose truth discovery as a social choice problem, and formulate various axioms that any reasonable algorithm should satisfy. Along the way we provide an axiomatic characterisation of the baseline ‘Voting’ algorithm—which leads to an impossibility result showing that a certain combination of the axioms cannot hold simultaneously—and check which axioms a particular well-known algorithm satisfies. We find that, surprisingly, our more fundamental axioms do not hold, and propose modifications to the algorithms to partially fix these problems.

This paper is an extended version of our previous work [35].

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 Introduction

There is an increasing amount of data available in today’s world, particularly from the web, social media platforms and crowdsourcing systems. The openness of such platforms makes it simple for a wide range of users to share information quickly and easily, potentially reaching a wide international audience. It is inevitable that amongst this abundance of data there are conflicts, where data sources disagree on the truth regarding a particular object or entity. For example, low-quality sources may mistakenly provide erroneous data for topics on which they lack expertise.

Resolving such conflicts and determining the true facts is therefore an important task. Truth discovery has emerged as a set of techniques to achieve this by considering the trustworthiness of sources [7, 19, 27]. The general principle is that true facts are those claimed by trustworthy sources, and trustworthy sources are those that claim believable facts. Application areas include real-time traffic navigation [14], drug side-effect discovery [30], crowdsourcing and social sensing [29, 38, 47].

For a simple example of a situation where trust can play an important role in conflict resolution, consider the following example.

Example 1.1

Let o and p represent two images for which crowdsourcing workers are asked to provide labels (in the truth discovery terminology, o and p are called objects). Consider workers (the data sources) s, t, u and v who put forward potential labels f, g for o, and h, i for p, as shown in Fig. 1; such potential answers are termed facts. In the graphical representation, sources, facts and objects are shown from left to right, and the edges indicate claims made by sources and the objects to which facts relate.

Without considering trust information, the label for o appears a tie, with both options f and g receiving one vote from sources s and t respectively.

Taking a trust-aware approach, however, we can look beyond object o to consider the trustworthiness of s and t. Indeed, when it comes to object p, t agrees with two extra sources u and v, whereas s disagrees with everyone. In principle there could be hundreds of extra sources here instead of just two, in which case the effect would be even more striking. We may conclude that s is less trustworthy than t. Returning to o, we see that g is supported by a more trustworthy source, and conclude that it should be accepted over f.

Many truth discovery algorithms have been proposed in the literature with a wide range of techniques used, e.g. iterative heuristic-based methods [17, 34], probabilistic models [45], maximum likelihood estimation and optimisation-based methods [28], and neural network models [24, 31, 39]. It is common for such algorithms to be evaluated empirically by running them against real-world or synthetic datasets for which the true facts are already known; this allows accuracy and other metrics to be calculated, and permits comparison between algorithms (see [37] for a systematic empirical evaluation of this kind). This may be accompanied by some theoretical analysis, such as calculating run-time complexity [19], proving convergence of an iterative algorithm [46], or proving convergence to the ‘true’ facts under certain assumptions on the distribution of source trustworthiness [18, 41, 42].

A limitation of this kind of analysis is that the results only apply narrowly to particular algorithms, due to the assumptions made (for instance, that claims from sources follow a particular probability distribution). Such assumptions can be problematic in domains where the desired truth is somewhat ‘fuzzy’; for example, image classification problems and determining the copyright status of books.¹

In this work we take first steps towards a more general approach, in which we aim to study truth discovery without reference to any specific methodology or probabilistic framework. To do so we note the similarities between truth discovery and problems such as judgment aggregation [15], voting theory [50] ranking and recommendation systems [1‐3, 36] in which the axiomatic approach of social choice has been successfully applied. In taking the axiomatic approach one aims to formulate axioms that encode intuitively desirable properties that an algorithm may possess. The interaction between these axioms can then be studied; typical results include impossibility results, where it is shown that a set of axioms cannot hold simultaneously, and characterisation results, where it is shown that a set of axioms are uniquely satisfied by a particular algorithm.

Such analysis brings a new normative perspective to the truth discovery literature. This complements empirical evaluation: in addition to seeing how well an algorithm performs in practise on test datasets, one can check how well it does against theoretical properties that any ‘reasonable’ algorithm should satisfy. The satisfaction (or failure) of such properties then shines new light on the intuitive behaviour of an algorithm, and may guide development of new ones.

With this in mind, we develop a simplified framework for truth discovery in which axioms can be formulated, and go on to give both an impossibility result and an axiomatic characterisation of a baseline voting algorithm. We also analyse the class of recursive truth discovery algorithms, which includes many existing examples from the literature. In particular, we analyse the well-known algorithm Sums [34] with respect to the axioms.

However, as a first step towards a social choice perspective of truth discovery, our framework involves a number of simplifying assumptions not commonly made in the truth discovery literature.

Manipulation and collusion. Some of our axioms assume sources are not manipulative: they provide claims in good faith, and do not aim to misinform or artificially improve their standing with respect to the truth discovery algorithm. We also assume sources act independently, i.e. they do not collude with or copy one another.
Object correlations. We do not model correlations between the objects of interest in the truth discovery problem. For example, in a crowdsourcing setting it may be known in advance that two objects o and p are similar, so that the true labels for o and p are correlated; this cannot be expressed in our framework.
Ordinal outputs. For the most part, the outputs of our truth discovery methods consist of rankings of the sources and facts. Thus, we describe when a source is considered more trustworthy than another, but do not assign precise numerical values representing trustworthiness. This breaks with tradition in the truth discovery literature, but is a common point of view in social choice theory.

At first glance these are strong assumptions, and rule out potential applications of our version of truth discovery. However, we argue that the problem is non-trivial even in this simplified setting, and that interesting axioms can still be put forth. The framework as set out here lays the groundwork for these assumptions to be lifted in future work.

The paper is organised as follows. Our framework is introduced and formally defined in the next section. Section 3 provides examples of truth discovery algorithms from the literature expressed in the framework. In Sect. 4 we formally introduce the axioms and present an impossibility result showing a subset of these cannot all be satisfied simultaneously. The examples of Sect. 3 are then revisited in Sect. 5, where we analyse them with respect to the axioms and propose modifications to resolve some axiom failures. In Sect. 6 we extend the framework to allow variable domains of sources, facts and objects, and give an impossibility result similar to that of Sect. 4. We discuss the axioms in Sect. 7 and related work in Sect. 8. We conclude in Sect. 9. Missing proofs are given in appendix A.

2 An idealised framework for truth discovery

In this section we define our formal framework, which provides the key definitions required for theoretical discussion and analysis of truth discovery methods.

For most of the paper, we consider a fixed domain of finite and mutually disjoint sets ${\mathcal {S}}$, ${\mathcal {F}}$ and ${\mathcal {O}}$ throughout, called the sources, facts and objects respectively. All definitions and axioms will be stated with respect to these sets.²

2.1 Truth discovery networks

A core definition of the framework is that of a truth discovery network, which represents the input to a truth discovery problem. We model this as a tripartite graph with certain constraints on its structure, in keeping with approaches taken throughout the truth discovery literature [19, 45].

Definition 2.1

A truth discovery network (hereafter a TD network) is a directed graph $N = (V, E)$ where $V = {\mathcal {S}}\cup {\mathcal {F}}\cup {\mathcal {O}}$, and $E \subseteq ({\mathcal {S}}\times {\mathcal {F}}) \cup ({\mathcal {F}}\times {\mathcal {O}})$ has the following properties:

For each $f \in {\mathcal {F}}$ there is a unique $o \in {\mathcal {O}}$ with $(f, o) \in E$, denoted $\mathsf {obj}_N(f)$. That is, each fact is associated with exactly one object.

For $s \in {\mathcal {S}}$ and $o \in {\mathcal {O}}$, there is at most one directed path from s to o. That is, sources cannot claim multiple facts for a single object.

$({\mathcal {S}}\times {\mathcal {F}}) \cap E$ is non-empty. That is, at least one claim is made.

We will say that s claims f when $(s, f) \in E$. Let ${\mathcal {N}}$ denote the set of all TD networks.

Figure 1 (page 2) provides an example of a TD network. Note that there is no requirement that a source makes a claim for every object, or even that a source makes any claims at all. This reflects the fact that truth discovery datasets are in practise extremely sparse, i.e. each individual source makes few claims. Conversely, we allow for facts that receive no claims from any sources.

Also note that the object associated with a fact f is not fixed across all networks. This is because we view facts as labels for information that sources may claim, not the pieces of information themselves. Similarly, we consider objects simply as labels for real-world entities. Whilst a particular piece of information has a fixed entity to which it pertains, the labels do not.³

A special case of our framework is the binary case in which every object has exactly two associated facts. This brings us close to the setting studied in judgment aggregation [15] and, specifically (since sources do not necessarily claim a fact associated to every object) to the setting of binary aggregation with abstentions [9, 11]. An important difference, however, is that for simplicity we do not assume any constraints on the possible configurations of true facts across different objects. That is, any combination of facts is feasible. In judgment aggregation such an assumption has the effect of neutralising the impossibility results that arise in that domain (see, e.g., [9]). We shall see that that is not the case in our setting.

To simplify the notation in what follows, for a network $N=(V, E)$ we write $\mathsf {facts}_N(s) = \{f \in {\mathcal {F}}: (s, f) \in E\}$ for the set of facts claimed by a source s, and $\mathsf {src}_N(f) = \{s \in {\mathcal {S}}: (s, f) \in E\}$ for the set of sources claiming a fact f.

2.2 Truth discovery operators

Having defined the input to a truth discovery problem, the output must be defined. Contrary to many approaches in the truth discovery literature which output numeric trust scores for sources and belief scores for facts [17, 34, 45, 47‐49], we consider the primary output to be rankings of the sources and facts. To the extent that we do consider numeric scores, it is only to induce a ranking. This is because we are chiefly interested in ordinal properties rather than quantitative values. Indeed, for the theoretical analysis we wish to perform it is only important that a source is more trustworthy than another; the particular numeric scores produced by an algorithm are irrelevant.

Moreover, the scores produced by existing algorithms may have no semantic meaning [34], and so referring to numeric values is not meaningful when comparing across algorithms. In this case it is only the rankings of sources and facts that can be compared, which is further motivation for our choice. This point of view is also common across the social choice literature.

However, numerical scores do provide valuable information for comparing sources and facts given a fixed algorithm. For example, the magnitude of the difference in trust scores for sources s and t tells us something about confidence: a small difference indicates low confidence in distinguishing s and t—even if one is ranked above the other—whereas a large difference indicates high confidence. In this sense our decision to primarily deal with ordinal outputs (and ordinal axioms) is another simplifying assumption compared to typical truth discovery settings.

For a set X, let ${\mathcal {L}}(X)$ denote the set of all total preorders on X, i.e. the set of transitive, reflexive and complete binary relations on X. We define a truth discovery operator as a function which maps networks to rankings of sources and facts.

Definition 2.2

An ordinal truth discovery operator T (hereafter TD operator) is a mapping $T: {\mathcal {N}}\rightarrow {\mathcal {L}}({\mathcal {S}}) \times {\mathcal {L}}({\mathcal {F}})$. We shall write $T(N) = (\sqsubseteq _N^T, \preceq _N^T)$, i.e. $\sqsubseteq _N^T$ is a total preorder on ${\mathcal {S}}$ and $\preceq _N^T$ is a total preorder on ${\mathcal {F}}$.

Intuitively, the relation $\sqsubseteq _N^T$ is a measure of source trustworthiness in the network N according to T, and $\preceq _N^T$ is a measure of fact believability; $s_1 \sqsubseteq _N^T s_2$ means that source $s_2$ is at least as trustworthy as source $s_1$, and $f_1 \preceq _N^T f_2$ means fact $f_2$ is at least as believable as fact $f_1$. The notation $\sqsubset _N^T$ and $\simeq _N^T$ will be used to denote the strict and symmetric orders induced by $\sqsubseteq _N^T$ respectively. For fact rankings, $\prec _N^T$ and $\approx _N^T$ are defined similarly. Note that for simplicity the fact ranking $\preceq _N^T$ compares all facts, even those associated with different objects in N.

To capture existing truth discovery methods we introduce numerical operators, which assign each source a numeric trust score and each fact a belief score.

Definition 2.3

A numerical TD operator is a mapping $T: {\mathcal {N}}\rightarrow {\mathbb {R}}^{{\mathcal {S}}\cup {\mathcal {F}}}$, i.e. T assigns to each TD network N a function $T(N) = T_N: {\mathcal {S}}\cup {\mathcal {F}}\rightarrow {\mathbb {R}}$. For $s \in {\mathcal {S}}$, $T_N(s)$ is the trust score for s in the network N according to T; for $f \in {\mathcal {F}}$, $T_N(f)$ is the belief score for f. The set of all numerical TD operators will be denoted by ${\mathcal {T}}_{Num}$.

Note that any numerical operator T naturally induces an ordinal operator ${\hat{T}}$, where ${s_1 \sqsubseteq _N^{{\hat{T}}} s_2}$ iff ${T_N(s_1) \le T_N(s_2)}$, and ${f_1 \preceq _N^{{\hat{T}}} f_2}$ iff ${T_N(f_1) \le T_N(f_2)}$. Henceforth we shall write $\sqsubseteq _N^T$, $\preceq _N^T$ without explicitly defining the induced ordinal operator ${\hat{T}}$.

It is worth noting that yet other truth discovery algorithms output neither rankings nor numeric scores for facts, but only a single ‘true’ fact for each object [10, 28, 43]. This is also the approach taken in judgment aggregation, where an aggregation rule selects which formulas are to be taken as true. In the case of finitely many possible facts, such algorithms can be modelled in our framework as numerical operators where $T_N(f) = 1$ for each identified ‘true’ fact f, and $T_N(g) = 0$ for other facts g. To go in the reverse direction and obtain the ‘true’ facts according to an operator, one may simply select the set of facts for each object that rank maximally.

3 Examples of truth discovery operators

Our framework can capture some operators that have been proposed in the truth discovery literature. In this section we provide two concrete examples: Voting, which is a simple approach commonly used as a baseline method, and Sums [34]. We go on to outline the class of recursive operators—of which Sums is an instance—which contains many more examples from the literature.

3.1 Voting

In Voting, we consider each source to cast ‘votes’ for the facts they claim, and facts are ranked according to the number of votes received. Clearly this method disregards the source trustworthiness aspect of truth discovery, as a vote from one source carries as much weight as a vote from any other. As such, Voting cannot be considered a serious contender for truth discovery. It is nonetheless useful as a simple baseline method against which to compare more sophisticated methods.

Definition 3.1

Voting is the numerical operator defined as follows: for any network $N \in {\mathcal {N}}$, $s \in {\mathcal {S}}$ and $f \in {\mathcal {F}}$, $T_N(s) = 1$ and $T_N(f) = |\mathsf {src}_N(f)|$.

Consider the network N shown in Fig. 1. Facts f, g and h each receive one vote, whereas i receives 3. The fact ranking induced by Voting is therefore $f \approx g \approx h \prec i$. On the other hand, all sources receive a trust score of 1 and therefore rank equally.

3.2 Sums

Sums [34] is a simple and well-known operator adapted from the Hubs and Authorities [21] algorithm for ranking web pages. The algorithm operates iteratively and recursively, assigning each source and fact a sequences of scores, with the final scores taken as the limit of the sequence.

Initially, scores are fixed at a constant value of 1/2. The trust score for each source is then updated by summing the belief score of its associated facts. Similarly, belief scores are updated by summing the trust scores of the associated sources. To prevent these scores from growing without bound as the algorithm iterates, they are normalised at each iteration by dividing each trust score by the maximum across all sources (belief scores are normalised similarly).

Expressed in our framework, we have that if T is the (numerical) operator giving the scores at iteration n, then the pre-normalisation scores at iteration $n+1$ are given by $T'$, where

$$\begin{aligned} T'_N(s) = \sum _{f \in \mathsf {facts}_N(s)}{T_N(f)}; \quad T'_N(f) = \sum _{s \in \mathsf {src}_N(f)}{T'_N(s)} \end{aligned}$$

(3.1)

Consider again the network N shown in Fig. 1. It can be shown that, with T denoting the limiting scores from Sums with normalisation, we have $T_N(s) = 0$, $T_N(t) = 1$, and $T_N(u) = T_N(v) = \sqrt{2} / 2$. The induced ranking of sources is therefore $s \sqsubset u \simeq v \sqsubset t$.

For fact scores, we have $T_N(f) = 0$, $T_N(g) = \sqrt{2} - 1$, $T_N(h) = 0$ and $T_N(i) = 1$, so the ranking is $f \approx h \prec g \prec i$. Note that fact g fares better under Sums than Voting, due to its association with the highly-trusted source t.

3.3 Recursive truth discovery operators

The iterative and recursive aspect of Sums is hoped to result in the desired mutual dependence between trust and belief scores: namely that sources claiming high-belief facts are seen as trustworthy, and vice versa. In fact, this recursive approach is near universal across the truth discovery literature (see for instance [14, 17, 28, 44, 48, 49]). As such it is appropriate to identify the class of recursive operators as an important subset of ${\mathcal {T}}_{Num}$. To make a formal definition we first define an iterative operator.

Definition 3.2

An iterative operator is a sequence $(T^n)_{n \in {\mathbb {N}}}$ of numerical operators. An iterative operator is said to converge to a numerical operator $T^*$ if $\lim _{n \rightarrow \infty }{T_N^n(z)}=T_N^*(z)$ for all networks N and $z \in {\mathcal {S}}\cup {\mathcal {F}}$. In such case the iterative operator can be identified with the ordinal operator induced by its limit $T^*$.

Note that it is possible that an iterative operator $(T^n)_{n \in {\mathbb {N}}}$ converges for only a subset of networks. In such case we can consider $(T^n)_{n \in {\mathbb {N}}}$ to converge to a ‘partial operator’ and identify it with the induced partial ordinal operator; that is, a partial function ${\mathcal {N}}\rightarrow {\mathcal {L}}({\mathcal {S}}) \times {\mathcal {L}}({\mathcal {F}})$. Recursive operators can now be defined as those iterative operators where $T^{n+1}$ can be obtained from $T^n$.

Definition 3.3

An iterative operator $(T^n)_{n \in {\mathbb {N}}}$ is said to be recursive if there is a function $U: {\mathcal {T}}_{Num}\rightarrow {\mathcal {T}}_{Num}$ such that $T^{n+1} = U(T^n)$ for all $n \in {\mathbb {N}}$.

In this context the mapping $U: {\mathcal {T}}_{Num}\rightarrow {\mathcal {T}}_{Num}$ is called the update function, and the initial operator $T^1$ is called the prior operator. For a prior operator T and update function U, we write $\mathsf {rec}(T, U)$ for the associated recursive operator; that is, $T^1 = T$ and $T^{n+1}=U(T^n)$.

Returning to Sums, we see that Eq. (3.1) defines a mapping ${\mathcal {T}}_{Num}\rightarrow {\mathcal {T}}_{Num}$ and consequently an update function $U^{\text {Sums}}$. The normalisation step can be considered a separate update function $\mathsf {norm}$ which maps any numerical operator T to $T'$, where⁴

$$\begin{aligned} T_N'(s) = \frac{T_N(s)}{\max \limits _{x \in {\mathcal {S}}}{|T_N(x)|}}, \quad T_N'(f) = \frac{T_N(f)}{\max \limits _{y \in {\mathcal {F}}}{|T_N(y)|}} \end{aligned}$$

It can then be seen that Sums is the recursive operator $\mathsf {rec}(T^{\text {fixed}}, \mathsf {norm}\circ U^{\text {Sums}})$, where $T^{\text {fixed}}_N \equiv 1/2$.

Many other existing algorithms proposed in the literature can also be realised as recursive operators in the framework, such as Investment, PooledInvestment [34], TruthFinder [45], LDT [48] and others.

4 Axioms for truth discovery

Having laid out the formal framework, we now introduce axioms for truth discovery. To start with, we consider axioms which encode a desirable theoretical property that we believe any ‘reasonable’ operator T should satisfy. Several properties of this nature can be obtained by adapting existing axioms from the social choice literature (e.g. from voting [8], ranking systems [2, 36] and judgement aggregation [15]), to our framework.

However, the correspondence between truth discovery and classical social choice problems—such as voting—has its limits. To show this, we translate the famous Independence of Irrelevant Alternatives (IIA) axiom [4] to our setting, and argue that it is actually an undesirable property. Indeed, it will be seen that this translated axiom, in combination with two basic desirable axioms, leads to Voting-like behaviour in every network, which is undesirable for the reasons given in Sect. 3.1. Furthermore, a slight strengthening of the IIA axiom completely characterises the fact ranking component of Voting. These results formalise the intuition that truth discovery’s consideration of source-trustworthiness leads to fundamental differences from classical social choice problems.

Afterwards, we will revisit the specific operators of the previous section to check which axioms are satisfied.

4.1 Coherence

As mentioned previously, a guiding principle of truth discovery is that sources claiming highly believed facts should be seen as trustworthy, and that facts backed by highly trusted sources should be seen as believable.

Whilst this intuition is difficult to formalise in general, it is possible to do so in particular cases where there are obvious means by which to compare the set of facts for two sources (and vice versa). This situation is considered in the axiomatic analysis of ranking and reputation systems under the name Transitivity [2, 36], and we adapt it to truth discovery in this section. A preliminary definition is required.

Definition 4.1

Let T be a TD operator, N be a TD network and $Y, Y' \subseteq {\mathcal {F}}$. We say Y is less believable than $Y'$ with respect to N and T if there is a bijection $\varphi : Y \rightarrow Y'$ such that $f \preceq _N^T \varphi (f)$ for each $f \in Y$, and ${\hat{f}} \prec _N^T \varphi ({\hat{f}})$ for some ${\hat{f}} \in Y$.

For $X, X' \subseteq {\mathcal {S}}$ we define X less trustworthy than $X'$ with respect to N and T in a similar way.

In plain English, Y less believable than $Y'$ means that the facts in each set can be paired up in such a way that each fact in $Y'$ is at least as believable as its counterpart in Y, and at least one fact in $Y'$ is strictly more believable. Now, consider a situation where $\mathsf {facts}_N(s_1)$ is less believable than $\mathsf {facts}_N(s_2)$. In this case the intuition outlined above tells us that $s_2$ provides ‘better’ facts, and should thus be seen as more trustworthy than $s_1$. A similar idea holds if $\mathsf {src}_N(f_1)$ is less trustworthy than $\mathsf {src}_N(f_2)$ for some facts $f_1, f_2$. We state this formally as our first axiom.

Axiom 1

(Coherence) For any network N, $\mathsf {facts}_N(s_1)$ less believable than $\mathsf {facts}_N(s_2)$ implies $s_1 \sqsubset _N^T s_2$, and $\mathsf {src}_N(f_1)$ less trustworthy than $\mathsf {src}_N(f_2)$ implies $f_1 \prec _N^T f_2$.

Coherence can be broken down into two sub-axioms: Source-Coherence, where the first implication regarding source rankings is satisfied; and Fact-Coherence, where the second implication is satisfied. We take Coherence to be a fundamental desirable axiom for TD operators.

4.2 Symmetry

Our next axiom requires that rankings of sources and facts should not depend on their ‘names’, but only on the structure of the network. To state it formally, we need a notion of when two networks are essentially the same but use different names.

Definition 4.2

Two TD networks N and $N'$ are equivalent if there is a graph isomorphism $\pi$ between them that preserves sources, facts and objects, i.e., $\pi (s) \in {\mathcal {S}}$, $\pi (f) \in {\mathcal {F}}$ and $\pi (o) \in {\mathcal {O}}$ for all $s \in {\mathcal {S}}$, $f \in {\mathcal {F}}$ and $o \in {\mathcal {O}}$. In such case we write $\pi (N)$ for $N'$.

Axiom 2

(Symmetry) Let N and $N' = \pi (N)$ be equivalent networks. Then for all $s_1, s_2 \in {\mathcal {S}}$, $f_1, f_2 \in {\mathcal {F}}$, we have $s_1 \sqsubseteq _N^T s_2\ \text {iff } \pi (s_1) \sqsubseteq _{N'}^T \pi (s_2)$ and $f_1 \preceq _N^T f_2\ \text {iff } \pi (f_1) \preceq _{N'}^T \pi (f_2)$.

In the theory of voting in social choice, Symmetry as above is expressed as two axioms: Anonymity, where output is insensitive to the names of voters, and Neutrality, where output is insensitive to the names of alternatives [50]. Analogous axioms are also used in judgment aggregation.

Symmetry can also be broken down into sub-axioms where the above need only hold for a subset of permutations $\pi$ satisfying some condition: Source-Symmetry (where $\pi$ must leave facts and objects fixed) and Fact-Symmetry (where $\pi$ leaves sources and objects fixed). For truth discovery we have the additional notion of objects, and thus Object-Symmetry can defined be similarly.

4.3 Fact ranking axioms

Next, we introduce axioms that dictate the ranking of particular facts in cases where there is an ‘obvious’ ordering. Unanimity and Groundedness express the idea that if all sources are in agreement about the status of a fact, then an operator should respect this in its verdict. Two obvious ways in which sources can be in agreement are when all sources believe a fact is true, and when none believe a fact is true.

Axiom 3

(Unanimity) Suppose $N \in {\mathcal {N}}$, $f \in {\mathcal {F}}$, and $\mathsf {src}_N(f) = {\mathcal {S}}$. Then for any other $g \in {\mathcal {F}}$, $g \preceq _N^T f$.

Axiom 4

(Groundedness) Suppose $N \in {\mathcal {N}}$, $f \in {\mathcal {F}}$, and $\mathsf {src}_N(f) = \emptyset$. Then for any other $g \in {\mathcal {F}}$, $f \preceq _N^T g$.

That is, f cannot do better than to be claimed by all sources when T satisfies Unanimity, and cannot do worse than to be claimed by none when T satisfies Groundedness. Unanimity here is a truth discovery rendition of the same axiom in judgment aggregation, and can also be compared to the weak Paretian property in voting [8]. Groundedness is a version of the same axiom studied in the analysis of collective annotation [25].

The next axiom is a monotonicity property, which states that if f receives extra support from a new source s, then its ranking should receive a strictly positive boost.⁵ Note that we do not make any judgement on the new ranking of s.

Axiom 5

(Monotonicity) Suppose $N \in {\mathcal {N}}$, $s \in {\mathcal {S}}$, $f \in {\mathcal {F}}\setminus \mathsf {facts}_N(s)$. Write E for the set of edges in N, and let $N'$ be the network in which s claims f; i.e. the network with edge set

$$\begin{aligned} E' = \{(s, f)\} \cup E \setminus \{(s, g) : g \ne f, \mathsf {obj}_N(g) = \mathsf {obj}_N(f) \} \end{aligned}$$

Then for all $g \ne f$, $g \preceq _N^T f\ \text {implies } g \prec _{N'}^T f$.

Note that the axioms in this section assume sources do not have ‘negative’ trust levels. That is, we assume that support from even the most untrustworthy source still has a positive effect on the believability of a fact. Consequently, these axioms are not suitable in the presence of knowledgable but malicious sources who always claim false facts. Indeed, otherwise a fact claimed only by a ‘negative’ source should rank strictly worse than a fact with no sources, but this goes against Groundedness. Similarly, receiving extra support from a negative source should worsen a fact’s ranking, contrary to Monotonicity. Moreover, Monotonicity implicitly assumes sources act independently, i.e. they do not collude with one another.⁶

While these assumptions may appear somewhat strong, we argue that this ‘simple’ case—with no ‘negative’ sources or collusion—is already non-trivial and permits interesting axiomatic analysis. We therefore view Unanimity, Groundedness and Monotonicity as desirable properties for TD operators.

4.4 Independence axioms

We now come to exploring the differences between truth discovery and other social choice problems via independence axioms. In voting, this takes the form of Independence of Irrelevant Alternatives (IIA), which requires that the ranking of two alternatives A and B depends only on the individual assessments of A and B, not on some ‘irrelevant’ alternative C.

An analogous truth discovery axiom states that the ranking of facts $f_1$ and $f_2$ for some object o depends only on the claims relating to o. Intuitively, this is not a desirable property. Indeed, we have already seen in example 1.1 that the claims for object p in the network from Fig. 1 can play an important role in determining the ranking of f and g for object o, but the adapted IIA axiom precludes this.

This undesirability can be made precise. First, we must state the axiom formally.

Axiom 6

(Per-object Independence (POI)) Let $o \in {\mathcal {O}}$. Suppose $N_1$, $N_2$ are networks such that $F_o = \mathsf {obj}_{N_1}^{-1}(o) = \mathsf {obj}_{N_2}^{-1}(o)$ and $\mathsf {src}_{N_1}(f) = \mathsf {src}_{N_2}(f)$ for each $f \in F_o$. Then the restrictions of $\preceq _{N_1}^T$ and $\preceq _{N_2}^T$ to $F_o$ are equal; that is, $f_1 \preceq _{N_1}^T f_2\ \text {iff }f_1 \preceq _{N_2}^T f_2$ for all $f_1, f_2 \in F_o$.

Considering Fig. 1 again, POI implies that the ranking of f and g remains the same if the claims for h and i are removed. But in this case, Symmetry implies $f \approx g$. Similarly, the ranking of h and i remains the same if the claims for f and g are removed. In this case, Symmetry together with Monotonicity implies $h \prec i$, since $|\mathsf {src}_N(h)| < |\mathsf {src}_N(i)|$.

This observation forms the basis of the following result, which formalises the undesirability of POI: in the presence of our less controversial requirements of Symmetry and Monotonicity, it forces Voting-like behaviour within $\mathsf {obj}_N^{-1}(o)$ for each $o \in {\mathcal {O}}$. We note that, for the special case of binary networks, similar results have been shown in the literature on binary aggregation with abstentions [9].

Theorem 4.1

Let T be any operator satisfying Symmetry, Monotonicity and POI. Then for any $N \in {\mathcal {N}}$, $o \in {\mathcal {O}}$ and $f_1, f_2 \in \mathsf {obj}_N^{-1}(o)$ we have $f_1 \preceq _N^T f_2$ iff $|\mathsf {src}_N(f_1)| \le |\mathsf {src}_N(f_2)|$.

Proof

(sketch) We will sketch the main ideas of the proof here with some technical details omitted; see appendix A for the full proof. Let N be a network, o be an object and $f_1, f_2 \in \mathsf {obj}_N^{-1}(o)$. Consider $N'$ obtained by removing from N all claims for objects other than o. By POI, we have $f_1 \preceq _N^T f_2$ iff $f_1 \preceq _{N'}^T f_2$. Since $|\mathsf {src}_N(f_j)| = |\mathsf {src}_{N'}(f_j)|$ also ($j \in \{1,2\}$), it is sufficient for the proof to show that $f_1 \preceq _{N'}^T f_2$ iff $|\mathsf {src}_{N'}(f_1)| \le |\mathsf {src}_{N'}(f_2)|$.

For the ‘if’ direction, first suppose $|\mathsf {src}_{N'}(f_1)| = |\mathsf {src}_{N'}(f_2)|$. Let $\pi$ be the permutation which swaps $f_1$ with $f_2$ and swaps each source in $\mathsf {src}_{N'}(f_1)$ with one in $\mathsf {src}_{N'}(f_2)$; then we have $\pi (N') = N'$, and Symmetry of T gives $f_1 \approx _{N'}^T f_2$. In particular $f_1 \preceq _{N'}^T f_2$ as required.

Otherwise, $|\mathsf {src}_{N'}(f_2)| - |\mathsf {src}_{N'}(f_1)| = k > 0$. Consider $N''$ where k sources from $\mathsf {src}_{N'}(f_2)$ are removed, and all other claims remain. By Symmetry as above, $f_1 \approx _{N''}^T f_2$. Applying Monotonicity k times we can produce $N'$ from $N''$ and get $f_1 \prec _{N'}^T f_2$ as desired.

For the ‘only if’ statement, suppose $f_1 \preceq _{N'}^T f_2$ but, for contradiction, $|\mathsf {src}_{N'}(f_1)| > |\mathsf {src}_{N'}(f_2)|$. Applying Monotonicity again as above we get $f_1 \succ _{N'}^T f_2$ and the required contradiction. $\square$

Recall that Coherence formalises the idea that source-trustworthiness should inform the fact ranking, and vice versa. Clearly Voting does not conform to this idea, and in fact even the object-wise voting patterns in Theorem 4.1 are incompatible with Coherence. This can easily be seen in the network in Fig. 1 where, regarding object p, we have $|\mathsf {src}_N(h)| < |\mathsf {src}_N(i)|$ (hence $h \prec ^T_N i$) and, regarding object o, we have $|\mathsf {src}_N(f)| = |\mathsf {src}_N(g)|$ (hence $f \approx ^T_N g$). Hence $\mathsf {facts}_N(s)$ is less believable than $\mathsf {facts}_N(t)$. If Coherence held this would give $s \sqsubset _N^T t$, but then $\mathsf {src}_N(f)$ is less trustworthy than $\mathsf {src}_N(g)$, giving $f \prec _N^T g$—a contradiction. From this discussion and Theorem 4.1 we obtain as a corollary the following first impossibility result for truth discovery.

Theorem 4.2

There is no TD operator satisfying Coherence, Symmetry, Monotonicity and POI.

Given that Theorem 4.1 characterises the fact ranking of Voting for facts relating to a single object, it is natural to ask if there is a stronger form of independence that guarantees this behaviour across all facts. As our next result shows, the answer is yes, and the necessary axiom is obtained by ignoring the role of objects altogether for fact ranking.

Axiom 7 (Strong Independence)

For any networks $N_1, N_2$ and facts $f_1, f_2$, if $\mathsf {src}_{N_1}(f_j) = \mathsf {src}_{N_2}(f_j)$ for each $j \in \{1, 2\}$ then $f_1 \preceq _{N_1}^T f_2$ iff $f_1 \preceq _{N_2}^T f_2$.

That is, the ranking of two facts $f_1$ and $f_2$ is determined solely by the sources claiming $f_1$ and $f_2$. In particular, the fact-object affiliations and claims for facts other than $f_1, f_2$ are irrelevant when deciding on $f_1$ versus $f_2$. Note that Strong Independence implies POI. We have the following result.

Theorem 4.3

Suppose $|{\mathcal {O}}| \ge 3$. Then an operator T satisfies Strong Independence, Monotonicity and Symmetry if and only if for any network N and $f_1, f_2 \in {\mathcal {F}}$ we have

$$\begin{aligned} f_1 \preceq _N^T f_2 \iff |\mathsf {src}_N(f_1)| \le |\mathsf {src}_N(f_2)| \end{aligned}$$

Theorem 4.3 can be seen as a characterisation of the class of TD operators that rank facts in the same way as Voting. The proof is similar to that of Theorem 4.1, but uses a different transformation to obtain a modified network $N'$ in the first step.

We have established that neither POI nor Strong Independence are satisfactory axioms for truth discovery, and a weaker independence property is required. Figure 1 can help us once again in this regard. Whereas POI and Strong Independence would say that facts h and i are irrelevant to f, the argument with Coherence for Theorem 4.2 suggests otherwise due the indirect links via the sources. We therefore propose that only when there is no (undirected) path between two nodes can we consider them to be truly irrelevant to each other. That is, nodes are relevant to each other iff they lie in the same connected component of the network.

Our final rendering of independence states that the ordering of two facts in the same connected component does not depend on any claims outside of the component, and similarly for sources.

Axiom 8 (Per-component Independence (PCI))

For any TD networks $N_1$, $N_2$ with a common connected component G, the restrictions of $\sqsubseteq _{N_1}^T$ and $\sqsubseteq _{N_2}^T$ to $G \cap {\mathcal {S}}$ are equal, and the restrictions of $\preceq _{N_1}^T$ and $\preceq _{N_2}^T$ to $G \cap {\mathcal {F}}$ are equal; that is, $s_1 \sqsubseteq _{N_1}^T s_2\ \text {iff } s_1 \sqsubseteq _{N_2}^T s_2$ and $f_1 \preceq _{N_1}^T f_2\ \text {iff } f_1 \preceq _{N_2}^T f_2$ for $s_1, s_2 \in G \cap {\mathcal {S}}$ and $f_1, f_2 \in G \cap {\mathcal {F}}$.

In analogy with Source/Fact Coherence and Source/Fact Symmetry, it is possible to split the two requirements of PCI into sub-axioms Source-PCI (in which only the constraint on source ranking is imposed) and Fact-PCI (in which only the fact ranking is constrained).

Note that while our framework can be easily adapted to require by definition that a network is itself connected (and therefore has only one connected component), we have found that datasets with multiple connected components do indeed occur in practise.⁷ This means that failure of PCI is a real issue, and consequently we consider PCI to be another core axiom that all reasonable operators should satisfy.

5 Satisfaction of the axioms

With the axioms formally defined, we can now consider whether they are satisfied by the example operators of Sect. 3. Voting can be analysed outright; for Sums we require some preliminary results giving sufficient conditions for iterative and recursive operators to satisfy various axioms. It will be seen that neither Voting nor Sums satisfy all our desirable axioms, but it is possible to modify each operator to gain some improvement with respect to the axioms.

5.1 Voting

As the simplest operator, we consider Voting first. The following theorem shows that all axioms except Coherence are satisfied. Since Coherence is a fundamental principle of truth discovery, and we actually consider POI and Strong Independence to be undesirable, this formally rules out Voting as a viable operator.

Theorem 5.1

Voting satisfies Symmetry, Unanimity, Groundedness, Monotonicity, POI, Strong Independence and PCI. Voting does not satisfy Coherence.

The proof is straightforward, and is deferred to appendix A. Note that once Symmetry, Monotonicity and POI are shown, the fact that Voting fails Coherence follows from our impossibility result (Theorem 4.2), and Fig. 1 serves as an explicit counterexample.

5.2 Iterative and recursive operators

In this section we give sufficient conditions for iterative and recursive operators to satisfy various axioms. These results will be useful in what follows when analysing Sums, although they may also be applied more generally to other operators.

Coherence. To analyse whether the limit of a recursive operator satisfies Coherence, we consider how the update function U behaves when the difference in belief scores between the facts of $s_1$ and $s_2$ is ‘small’ (and similarly for the sources of $f_1$, $f_2$). To that end, we introduce a numerical variant of a set of facts Y being ‘less believable’ than $Y'$.

Definition 5.1

Let T be a numerical TD operator, N a network, $Y, Y' \subseteq {\mathcal {F}}$ and $\varepsilon , \rho > 0$. We say Y is $(\varepsilon , \rho )$-less believable than $Y'$ with respect to N and T if there is a bijection $\varphi : Y \rightarrow Y'$ such that $T_N(f) - T_N(\varphi (f)) \le \varepsilon$ for all $f \in Y$, and $T_N({\hat{f}}) - T_N(\varphi ({\hat{f}})) \le \varepsilon - \rho$ for some ${\hat{f}} \in Y$.

For $X, X' \subseteq {\mathcal {S}}$, we define X $(\varepsilon , \rho )$-less trustworthy than $X'$ similarly.

This generalises definition 4.1 by relaxing the requirement that ${f \preceq _N^T \varphi (f)}$, and instead requiring that f can only be more believable than $\varphi (f)$ by some threshold $\varepsilon > 0$. Definition 4.1 is recovered in the limiting case $\varepsilon \rightarrow 0$. We obtain a sufficient condition on the update function U for a recursive operator to satisfy Source-Coherence.

Lemma 5.1

Let $U: {\mathcal {T}}_{Num}\rightarrow {\mathcal {T}}_{Num}$. For any prior operator $T^{\text {prior}}$, $\mathsf {rec}(T^{\text {prior}}, U)$ satisfies Source-Coherence if the following condition is satisfied: there exist $C, D > 0$ such that for all networks N and numerical operators T it holds that if $\mathsf {facts}_N(s_1)$ is $(\varepsilon , \rho )$-less believable than $\mathsf {facts}_N(s_2)$ with respect to N and T, then $T'_N(s_1) - T'_N(s_2) \le C\varepsilon - D\rho$, where $T' = U(T)$.

The proof of Lemma 5.1 uses the following result, the proof of which is a straightforward application of the definition of the limit.

Lemma 5.2

Let N be a truth discovery network and $(T^n)_{n \in {\mathbb {N}}}$ be a convergent iterative operator with limit $T^*$. Then for $f_1, f_2 \in {\mathcal {F}}$, $f_1 \preceq _N^{T^*} f_2$ if and only if

$$\begin{aligned} \forall \varepsilon > 0\ \exists K \in {\mathbb {N}}: \forall n \ge K: T_N^n(f_1) - T_N^n(f_2) \le \varepsilon \end{aligned}$$

Also, $f_1 \prec _N^{T^*} f_2$ if and only if

$$\begin{aligned} \exists \rho> 0 : \forall \varepsilon > 0\ \exists K \in {\mathbb {N}}: \forall n \ge K: T_N^n(f_1) - T_N^n(f_2) \le \varepsilon - \rho \end{aligned}$$

Analogous statements for source rankings also hold.

Proof (Lemma 5.1)

Let N be a network. Suppose U has the stated property and that $\mathsf {rec}(T^{\text {prior}}, U) = (T^n)_{n \in {\mathbb {N}}}$ converges to $T^*$. Suppose $\mathsf {facts}_N(s_1)$ is less trustworthy than $\mathsf {facts}_N(s_2)$ with respect to N and $T^*$ under a bijection $\varphi$. We must show that $s_1 \sqsubset _N^{T^*} s_2$.

Now, there is some ${\hat{f}} \in \mathsf {facts}_N(s_1)$ with ${\hat{f}} \prec _N^{T^*} \varphi ({\hat{f}})$. The second part of Lemma 5.2 therefore applies; let $\rho$ be as given there. Now let $\varepsilon > 0$. Since $f \preceq _N^{T^*} \varphi (f)$ for each $f \in \mathsf {facts}_N(s_1)$, we may apply Lemma 5.2 with $f, \varphi (f)$ and ${\bar{\varepsilon }} = \varepsilon / C$ to get that there is $K \in {\mathbb {N}}$ such that

$$\begin{aligned} T_N^n(f) - T_N^n(\varphi (f)) \le {\bar{\varepsilon }} \end{aligned}$$

and

$$\begin{aligned} T_N^n({\hat{f}}) - T_N^n(\varphi ({\hat{f}})) \le {\bar{\varepsilon }} - \rho \end{aligned}$$

for all $n \ge K$. In other words, $\mathsf {facts}_N(s_1)$ is $({\bar{\varepsilon }}, \rho )$-less believable than $\mathsf {facts}_N(s_2)$ with respect to N and $T^n$ for all $n \ge K$.

Now, recall that $T^{n+1}=U(T^n)$. For $m \ge K' = K + 1$ we therefore have, applying our condition on U,

$$\begin{aligned} T_N^m(s_1) - T_N^m(s_2) \le C{\bar{\varepsilon }} - D\rho = \varepsilon - D\rho \end{aligned}$$

Since $D\rho$ is positive and does not depend on $\varepsilon$, we get $s_1 \sqsubset _N^{T^*} s_2$ by Lemma 5.2. This shows that $T^*$ satisfies Source-Coherence. $\square$

A similar result gives conditions under which Fact-Coherence is satisfied.

Lemma 5.3

$\mathsf {rec}(T^{\text {prior}}, U)$ satisfies Fact-Coherence if there exist $E, F > 0$ such that for all networks N and numerical operators T it holds that if $\mathsf {src}_N(f_1)$ is $(\varepsilon , \rho )$-less trustworthy than $\mathsf {src}_N(f_2)$ with respect to N and $T'$, then $T'_N(f_1) - T'_N(f_2) \le E\varepsilon - F\rho$, where $T' = U(T)$.

Proof

The proof proceeds in an identical way to Lemma 5.1; the only difference is that we may simply take $K' = K$ in the final step. $\square$

Note that there is asymmetry between Lemmas 5.1 and 5.3—in the condition on U in Lemma 5.1 we have $\mathsf {facts}_N(s_1)$ $(\varepsilon , \rho )$-less trustworthy than $\mathsf {facts}_N(s_2)$ with respect to T, whereas in Lemma 5.3 the corresponding condition is with respect to $T' = U(T)$. This reflects the manner in which Sums and other TD operators are typically defined: source trust scores are updated based on the fact scores of the previous iteration, whereas fact belief scores are updated based on the (new) trust scores in the current iteration.

Also note that the above results still hold if U has the stated property only for ‘small’ $\varepsilon$; that is, if there is a constant $0< \lambda < 1$ such that the property holds for all $\rho$ and for all $\varepsilon < \lambda \rho$.

Symmetry and PCI. When considering either Symmetry or PCI for an iterative operator $(T^n)_{n \in {\mathbb {N}}}$, it is not enough to know that each $T^n$ satisfies the relevant axiom. The following example illustrates this fact for Symmetry.

Example 5.1

Fix some ${\hat{f}} \in {\mathcal {F}}$, and define an iterative operator by

$$\begin{aligned} T_N^n(s)= & {} 1\\ T_N^n(f)= & {} {\left\{ \begin{array}{ll} |\mathsf {src}_N(f)| + (1 - \frac{1}{n + 1}) &{} \text { if } |\mathsf {src}_N(f)| = |\mathsf {src}_N({\hat{f}})| \\ |\mathsf {src}_N(f)| &{} \text { otherwise } \end{array}\right. } \end{aligned}$$

That is, each $T^n$ is a modification of Voting in which we boost the score of all facts tied with ${\hat{f}}$ under Voting by $1 - \frac{1}{n+1}$. Since this additional weight is (strictly) less than 1 for each n, the ordinal operator induced by $T^n$ is simply Voting, and therefore satisfies Symmetry. However, it is easy to see that the limit operator $T^*$ has $T_N^*({\hat{f}}) = |\mathsf {src}_N({\hat{f}})| + 1$; this means $T^*$ uses extra information beyond the structure of the network N in its ranking (namely, the identity of a selected fact ${\hat{f}}$) which violates Symmetry.

Using a similar tactic, one can construct a sequence of numerical operators $(T^n)_{n \in {\mathbb {N}}}$ such that each $T^n$ satisfies PCI, but the limit operator $T^*$ does not.

Fortunately, there is a natural strengthening of both Symmetry and PCI for numerical operators which is preserved in the limit. Let us say that a numerical operator T satisfies numerical Symmetry if for any equivalent networks $N, \pi (N)$ we have $T_N(z) = T_{\pi (N)}(\pi (z))$ for all $z \in {\mathcal {S}}\cup {\mathcal {F}}$. Similarly, T satisfies numerical PCI if for any networks $N_1$ and $N_2$ with a common connected component G, we have $T_{N_1}(z) = T_{N_2}(z)$ for all $z \in G \cap ({\mathcal {S}}\cup {\mathcal {F}})$. Clearly numerical Symmetry implies Symmetry, and numerical PCI implies PCI. The following result is immediate.

Lemma 5.4

Suppose $(T^n)_{n \in {\mathbb {N}}}$ converges to $T^*$. Then

If $T^n$ satisfies numerical Symmetry for each $n \in {\mathbb {N}}$, then $T^*$ satisfies Symmetry.
If $T^n$ satisfies numerical PCI for each $n \in {\mathbb {N}}$, then $T^*$ satisfies PCI.

As a consequence of Lemma 5.4, any recursive operator $\mathsf {rec}(T^\text {prior}, U)$ satisfies Symmetry whenever $T^{\text {prior}}$ satisfies numerical Symmetry and U preserves numerical Symmetry, in the sense that U(T) satisfies numerical Symmetry whenever T does (and similarly for PCI).

Unanimity, Groundedness and Monotonicity. In contrast to Symmetry and PCI, both Unanimity and Groundedness are preserved when taking the limit of an iterative operator.

Lemma 5.5

Suppose $(T^n)_{n \in {\mathbb {N}}}$ converges to $T^*$. Then

If $T^n$ satisfies Unanimity for each $n \in {\mathbb {N}}$, then $T^*$ satisfies Unanimity.
If $T^n$ satisfies Groundedness for each $n \in {\mathbb {N}}$, then $T^*$ satisfies Groundedness.

For Monotonicity, we require the following (stronger) property to hold for each $T^n$.

Definition 5.2

A numerical operator T satisfies Improvement if for each $N, N'$ and f as in the statement of Monotonicity, we have $\delta (f) > \delta (g)$ for all $g \ne f$, where

$$\begin{aligned} \delta (g) = T_{N'}(g) - T_N(g) \end{aligned}$$

In this case we write $\rho _{N,N'} = \min _{g \ne f}{(\delta (f) - \delta (g))} > 0$.

Here $\delta (g)$ is the amount by which the belief score for g increases when going from the network N to $N'$. Improvement simply says that when adding a new source to a fact f, it is f that sees the largest increase.

Proposition 5.1

Suppose $(T^n)_{n \in {\mathbb {N}}}$ converges to $T^*$, and $T^n$ satisfies Improvement for each $n \in {\mathbb {N}}$. Suppose also that $\inf _{n \in {\mathbb {N}}}{\rho _{N,N'}^n} > 0$ for each $N, N'$ arising in the statement of Monotonicity. Then $T^*$ satisfies Monotonicity.

Proof

Let $N, N'$ and f be as in the statement of Monotonicity, and suppose $g \preceq _N^{T^*} f$ for some $g \ne f$. We will show $g \prec _{N'}^{T^*} f$ using Lemma 5.2.

Write $\rho ^* = \inf _{n \in {\mathbb {N}}}{\rho _{N,N'}^n} > 0$ and let $\varepsilon > 0$. Since $g \preceq _N^{T^*} f$, there is $K \in {\mathbb {N}}$ such that $T^n_N(g) - T^n_N(f) \le \varepsilon$ for all $n \ge K$. For such n, we have

$$\begin{aligned} T_{N'}^n(g) - T_{N'}^n(f)&= (T_N^n(g) + \delta ^n(g)) - (T_N^n(f) + \delta ^n(f)) \\&= \underbrace{T_N^n(g) - T_N^n(f)}_{\le \varepsilon } - \underbrace{(\delta ^n(f) - \delta ^n(g))}_{\ge \rho _{N,N'}^n}\\&\le \varepsilon - \rho _{N,N'}^n \\&\le \varepsilon - \rho ^* \end{aligned}$$

By Lemma 5.2, we have $g \prec _{N'}^{T^*} f$ as required. $\square$

The requirement that $\inf _{n \in {\mathbb {N}}}{\rho _{N,N'}^n} > 0$ is a technical condition which ensures the strict inequality $g \prec _{N'}^{T^*} f$ holds in the limit, as required for Monotonicity. If this condition fails $T^*$ still satisfies a natural ‘weak Monotonicity’ axiom, in which the strict inequality $g \prec _{N'}^{T^*} f$ is replaced with $g \preceq _{N'}^{T^*} f$.

5.3 Sums

We come to the axiomatic analysis of Sums. Coherence and the simpler axioms are satisfied here, and the undesirable independence axioms (POI and Strong Independence) are not. However, Monotonicity and PCI do not hold. Since PCI is one of our most important axioms that we expect any reasonable operator to satisfy, this potentially limits the usefulness of Sums in practise.

Theorem 5.2

Sums satisfies Coherence, Symmetry, Unanimity and Groundedness. Sums does not satisfy POI, Strong Independence, PCI or Monotonicity.

Proof (sketch)

Symmetry, Unanimity and Groundedness can be easily shown from Lemmas 5.4 and 5.5; the details can be found in the appendix. In the remainder of the proof, $(T^n)_{n \in {\mathbb {N}}}$ will denote the iterative operator Sums, $T^*$ will denote the limit operator, and $U = \mathsf {norm}\circ U^{\text {Sums}}$ will denote the update function for Sums.

Coherence. We will show Source-Coherence using Lemma 5.1. The argument for Fact-Coherence is similar (using Lemma 5.3) and can be found in the appendix.

Suppose $N \in {\mathcal {N}}$, $T \in {\mathcal {T}}_{Num}$, $\varepsilon , \rho > 0$, and $\mathsf {facts}_N(s_1)$ is $(\varepsilon , \rho )$-less believable than $\mathsf {facts}_N(s_2)$ with respect to N and T under a bijection $\varphi : \mathsf {facts}_N(s_1) \rightarrow \mathsf {facts}_N(s_2)$. By definition there is ${\hat{f}} \in \mathsf {facts}_N(s_1)$ such that $T_N({\hat{f}}) - T_N(\varphi ({\hat{f}})) \le \varepsilon - \rho$. By the remark after the proof of Lemma 5.1, we may assume without loss of generality that $\varepsilon < \frac{1}{|{\mathcal {F}}|} \rho$.

Recall that the update function for Sums is $U = \mathsf {norm}\circ U^{\text {Sums}}$. Write $T' = U^{\text {Sums}}(T)$ and ${\tilde{T}} = U(T) = \mathsf {norm}(U^{\text {Sums}}(T))$ so that ${\tilde{T}} = \mathsf {norm}(T')$. We must show that ${\tilde{T}}_N(s_1) - {\tilde{T}}_N(s_2) \le C\varepsilon - D\rho$ for some constants $C, D > 0$.

Note at this stage that it is possible to further weaken the hypotheses of Lemma 5.1: the result follows if U has the stated property not for all operators T, but only for those such that $T = T^n$ for some $n \in {\mathbb {N}}$. Next, note that if $T'_N(x) = 0$ for all $x \in {\mathcal {S}}$ then trust and belief scores are 0 in all subsequent iterations, and thus all sources rank equally in the limit $T^*$. But this means the hypothesis for Source-Coherence cannot be satisfied (there are no strict inequalities). We may therefore assume without loss of generality that $T'_N(x) \ne 0$ for at least one $x \in {\mathcal {S}}$. Therefore, by definition of $\mathsf {norm}$,

$$\begin{aligned} {\tilde{T}}_N(s) = \alpha T'_N(s) \end{aligned}$$

where

$$\begin{aligned} \alpha = \frac{1}{\max \limits _{x \in {\mathcal {S}}}{|T'_N(x)|}} \end{aligned}$$

Applying the definition of $U^{\text {Sums}}$ and using the pairing of $\mathsf {facts}_N(s_1)$ and $\mathsf {facts}_N(s_2)$ via $\varphi$, we have

$$\begin{aligned} {\tilde{T}}_N(s_1) - {\tilde{T}}_n(s_2)&= \alpha [T'_N(s_1) - T'_N(s_2)] \\&= \alpha \left[ \sum _{f \in \mathsf {facts}_N(s_1)}{T_N(f)} - \sum _{f \in \mathsf {facts}_N(s_1)}{T_N(\varphi (f))} \right] \\&= \alpha \sum _{f \in \mathsf {facts}_N(s_1)}{ \Big ( T_N(f) - T_N(\varphi (f)) \Big ) } \\&= \underbrace{\alpha }_{> 0} \left[ \underbrace{ \Big ( T_N({\hat{f}}) - T_N(\varphi ({\hat{f}})) \Big ) }_{\le \varepsilon - \rho } + \sum _{f \in \mathsf {facts}_N(s_1) \setminus \{{\hat{f}}\} }{ \underbrace{ \Big ( T_N(f) - T_N(\varphi (f)) \Big ) }_{\le \varepsilon } } \right] \\&\le \alpha \left[ \varepsilon - \rho + \sum _{f \in \mathsf {facts}_N(s_1) \setminus \{{\hat{f}}\} }{ \varepsilon } \right] \\&\le \alpha \cdot \underbrace{ \Big (|{\mathcal {F}}|\varepsilon - \rho \Big ) }_{< 0} \end{aligned}$$

To complete the proof, we need to find a lower bound for $\alpha$ that is independent of T and N (note that a lower bound on $\alpha$ is required since $|{\mathcal {F}}|\varepsilon - \rho$ is negative). It is here that we use the assumption that $T = T^n$ for some $n \in {\mathbb {N}}$. Since $T^n_N(x) \in [0,1]$ for any $n \in {\mathbb {N}}$ and $x \in S$, we have

$$\begin{aligned} |T_N'(x)| = T_N'(x) = \sum _{f \in \mathsf {facts}_N(x)}{ \underbrace{T_N(f)}_{\le 1} } \le |\mathsf {facts}_N(x)| \le |{\mathcal {F}}| \end{aligned}$$

and so

$$\begin{aligned} \alpha = \frac{1}{\max \limits _{x \in {\mathcal {S}}}{|T'_N(x)|}} \ge \frac{1}{|{\mathcal {F}}|} \end{aligned}$$

Combining this with the above bound for ${\tilde{T}}_N(s_1) - {\tilde{T}}_n(s_2)$, we get

$$\begin{aligned} {\tilde{T}}_N(s_1) - {\tilde{T}}_n(s_2) \le \frac{1}{|{\mathcal {F}}|} \Big ( |{\mathcal {F}}|\varepsilon - \rho \Big ) = \varepsilon - \frac{1}{|{\mathcal {F}}|}\rho \end{aligned}$$

Taking $C = 1$ and $D = \frac{1}{|{\mathcal {F}}|}$, the hypotheses of Lemma 5.1 are satisfied; thus Sums satisfies Source-Coherence.

POI, Strong Independence, PCI and Monotonicity. The remaining axioms are handled by counterexamples derived from the network shown in Fig. 2. It can be shown that, if N denotes this network, we have $T_N^*(f) = T_N^*(g) = 0$, so $f \approx _N^{T^*} g$.

Let $N'$ denote the network whose claims are just those of the top connected component. Then it can be shown that $T_{N'}^*(f) = 1$ and $T_{N'}^*(g) = 0$, i.e. $g \prec _{N'}^{T^*} f$. However it is easily verified that our three independence axioms, if satisfied, would each imply $f \preceq _N^{T^*} g$ iff $f \preceq _{N'}^{T^*} g$. Therefore none of POI, Strong Independence and PCI can hold for Sums.

For Monotonicity, consider the network $N''$ obtained from N by removing the edge (u, g). Then we still have $T_{N''}^*(f) = T_{N''}^*(g) = 0$, and in particular $f \preceq _{N''}^{T^*} g$. Returning to N amounts to adding extra support for the fact g. Monotonicity would give $f \prec _N^{T^*} g$ here, but this is clearly false. Hence Monotonicity is not satisfied by Sums. $\square$

The key to the counterexamples derived from Fig. 2 in the above proof lies in the lower connected component, which—restricted to ${\mathcal {S}}\cup {\mathcal {F}}$—is a connected bipartite graph. That is, each source $x_i$ claims all facts in the component, and each fact $y_j$ is claimed by all sources in the component. Moreover, sources elsewhere in the network claim fewer facts than the $x_i$, and facts elsewhere are claimed by fewer sources than the $y_j$.

Since Sums assigns scores by a simple sum, this results in the scores for the $x_i$ and $y_j$ dominating those of the other sources and facts. The normalisation step then divides these scores by the (comparatively large) maximum. As the next result shows, under certain conditions this causes scores to decrease exponentially and become 0 in the limit. In particular, we can generate pathological examples such as Fig. 2 where a whole connected component receives scores of 0, which leads to failure of Monotonicity and the independence axioms.

Proposition 5.2

Let N be a network. Suppose there is $X \subseteq {\mathcal {S}}$, $Y \subseteq {\mathcal {F}}$ such that

$\mathsf {facts}_N(x) = Y$ for each $x \in X$

$\mathsf {src}_N(y) = X$ for each $y \in Y$

$\mathsf {facts}_N(s) \cap Y = \emptyset$ and $|\mathsf {facts}_N(s)| \le \frac{|Y|}{2}$ for each $s \in {\mathcal {S}}\setminus X$

$\mathsf {src}_N(f) \cap X = \emptyset$ and $|\mathsf {src}_N(f)| \le \frac{|X|}{2}$ for each $f \in {\mathcal {F}}\setminus Y$

Then, with $(T^n)_{n \in {\mathbb {N}}}$ denoting Sums, for all $n > 1$ we have

$$\begin{aligned} T_N^n(s)\le & {} \frac{1}{2^{n-1}} \quad (s \in {\mathcal {S}}\setminus X) \\ T_N^n(f)\le & {} \frac{1}{2^{n-1}} \quad (f \in {\mathcal {F}}\setminus Y) \\ T_N^n(x)= & {} 1 \quad (x \in X) \\ T_N^n(y)= & {} 1 \quad (y \in Y) \end{aligned}$$

In particular, if $T^*$ denotes the limit of Sums then $T_N^*(s) = T_N^*(f) = 0$ for all $s \in {\mathcal {S}}\setminus X$ and $f \in {\mathcal {F}}\setminus Y$.

Proof

We proceed by induction. The result is easy to show in the base case $n = 2$ since $|\mathsf {facts}_N(s)| \le \frac{1}{2}|\mathsf {facts}_N(x)|$ for any $x \in X$ and $s \notin X$ (and similarly for facts). Assume the result holds for some $n > 1$. Write $T' = U^{\text {Sums}}(T^n)$, so that $T^{n+1} = \mathsf {norm}(T')$. If $s \notin X$ then $\mathsf {facts}_N(s) \subseteq {\mathcal {F}}\setminus Y$, so

$$\begin{aligned} T'_N(s) = \sum _{f \in \mathsf {facts}_N(s)}{ \underbrace{T^n_N(f)}_{\le \frac{1}{2^{n-1}}} } \le \frac{|\mathsf {facts}_N(s)|}{2^{n-1}} \le \frac{\frac{1}{2}|Y|}{2^{n - 1}} = \frac{|Y|}{2^{(n+1) - 1}} \end{aligned}$$

Similarly, if $f \notin Y$ then $\mathsf {src}_N(f) \subseteq {\mathcal {S}}\setminus X$, so

$$\begin{aligned} T'_N(f) = \sum _{s \in \mathsf {src}_N(f)}{ \underbrace{T'_N(s)}_{\le \frac{|Y|}{2^{(n+1) - 1}}} } \le \frac{|\mathsf {src}_N(f)| \cdot |Y|}{2^{(n+1) - 1}} \le \frac{\frac{1}{2}|X| \cdot |Y|}{2^{(n+1) - 1}} = \frac{|X| \cdot |Y|}{2^{(n+2) - 1}} \end{aligned}$$

On the other hand, the fact that $T_N^n(x) = T_N^n(y) = 1$ for $x \in X$ and $y \in Y$ gives

$$\begin{aligned} T'_N(x)= & {} \sum _{y \in Y}{ T^n_N(y) } = |Y| \\ T'_N(y)= & {} \sum _{x \in X}{T'_N(x)} = |X| \cdot |Y| \end{aligned}$$

Clearly the $x \in X$ and $y \in Y$ are the sources and facts with maximal trust and belief scores, respectively. This means that after normalisation via $\mathsf {norm}$, $T_N^{n+1}(x) = T_N^{n+1}(y) = 1$ and for $s \notin X$ and $f \notin Y$,

$$\begin{aligned} T_N^{n+1}(s)= & {} \frac{T'_N(s)}{|Y|} \le \frac{1}{2^{(n+1) - 1}} \\ T_N^{n+1}(f)= & {} \frac{T'_N(f)}{|X| \cdot |Y|} \le \frac{1}{2^{(n+2) - 1}} \le \frac{1}{2^{(n+1) - 1}} \end{aligned}$$

This shows that the claim holds for $n+1$; by induction, the proof is complete. $\square$

5.4 Modifying Voting and Sums

So far we have seen that neither of the basic operators Voting or Sums are completely satisfactory with respect to the axioms of Sect. 4. Armed with the knowledge of how and why certain axioms fail, one may wonder whether it is possible to modify the operators accordingly so that the axioms are satisfied. Presently we shall show that this is partially possible both in the case of Voting and Sums.

5.4.1 Voting

A core problem with Voting is that it fails Coherence. Indeed, all sources are ranked equally regardless of the ‘votes’ for facts, so in some sense it is obvious that the source ranking does not cohere with the fact ranking.⁸ An easy improvement is to explicitly construct the source ranking to guarantee Source-Coherence.

Definition 5.3

For a network N, define a binary relation $\mathrel {\lhd }_N$ on ${\mathcal {S}}$ by $s_1 \mathrel {\lhd }_N s_2$ iff $\mathsf {facts}_N(s_1)$ is less believable than $\mathsf {facts}_N(s_2)$ with respect to Voting. The numerical operator SC-Voting (Source-Coherence Voting) is defined by

$$\begin{aligned} T_N^{SCV}(s) = |\{t \in {\mathcal {S}}: t \mathrel {\lhd }_N s \}|, \quad T_N^{SCV}(f) = |\mathsf {src}_N(f)| \end{aligned}$$

It can be seen that SC-Voting satisfies Source-Coherence, which is a significant improvement over regular Voting. Since $\mathrel {\lhd }_N$ relies on ‘global’ properties on N, however, this comes at the expense of Source-PCI. Satisfaction of the other axioms is inherited from Voting.

Theorem 5.3

SC-Voting satisfies Source-Coherence, Symmetry, Unanimity, Groundedness, Monotonicity, Fact-PCI, POI and Strong Independence. It does not satisfy Fact-Coherence or Source-PCI.

The following properties of ${\mathrel {\lhd }_N}$ are useful for showing Source-Coherence.

Lemma 5.6

$\mathrel {\lhd }_N$ is transitive and irreflexive.

Proof

For transitivity, suppose $s \mathrel {\lhd }_N t$ and $t \mathrel {\lhd }_N u$. Then $\mathsf {facts}_N(s)$ is less believable than $\mathsf {facts}_N(t)$ (with respect to Voting) via some bijection $\varphi : \mathsf {facts}_N(s) \rightarrow \mathsf {facts}_N(t)$, and $\mathsf {facts}_N(t)$ is less believable than $\mathsf {facts}_N(u)$ via some bijection $\psi : \mathsf {facts}_N(t) \rightarrow \mathsf {facts}_N(u)$. It is easily seen that $\mathsf {facts}_N(s)$ is less believable than $\mathsf {facts}_N(u)$ via the composition $\theta = \psi \circ \varphi$, so $s \mathrel {\lhd }_N u$.

For irreflexivity, suppose for contradiction that $s \mathrel {\lhd }_N s$ for some $s \in {\mathcal {S}}$, i.e. $F = \mathsf {facts}_N(s)$ is less believable than itself under some bijection $\varphi : F \rightarrow F$. Then ${f \preceq _N^T \varphi (f)}$ for each $f \in F$, and there is ${\hat{f}}$ such that ${{\hat{f}} \prec _N^T \varphi ({\hat{f}})}$. Iterating applications of $\varphi$, we get

$$\begin{aligned} {\hat{f}} \prec _N^T \varphi ({\hat{f}}) \preceq _N^T \varphi (\varphi ({\hat{f}}) \preceq _N^T \cdots \preceq _N^T \varphi ^n({\hat{f}}) \end{aligned}$$

(5.1)

for each $n \ge 1$, where $\varphi ^n$ is the n-th iterate of $\varphi$ and T denotes Voting.

Since F is finite, the sequence $\varphi ({\hat{f}}), \varphi (\varphi ({\hat{f}})), \ldots$ must repeat at some point, i.e. there is $i < j$ such that $\varphi ^i({\hat{f}}) = \varphi ^j({\hat{f}})$. But then injectivity of $\varphi$ implies that ${\hat{f}} = \varphi ^{j - i}({\hat{f}})$. Taking $n = j - i$ in Eq. (5.1) we get ${\hat{f}} \prec _N^T {\hat{f}}$—a contradiction. $\square$

Proof (Theorem 5.3 (sketch))

Note that SC-Voting inherits Unanimity, Groundedness, Monotonicity, Fact-PCI, POI and Strong Independence from Voting, since these axioms only refer to the rankings of facts (which is the same for SC-Voting as for Voting).

We take the remaining axioms in turn. To simplify notation, write $W_N(s) = \{t \in {\mathcal {S}}: t \mathrel {\lhd }_N s \}$ in what follows.

Source-Coherence. Suppose $\mathsf {facts}_N(s_1)$ is less believable than $\mathsf {facts}_N(s_2)$ with respect to $T^{SCV}$. We need to show $s_1 \sqsubset _N^{T^{SCV}} s_2$.

Note that since the fact ranking for $T^{SCV}$ coincides with Voting, we have $s_1 \mathrel {\lhd }_N s_2$. Transitivity of ${\mathrel {\lhd }_N}$ gives $W_N(s_1) \subseteq W_N(s_2)$. Moreover, $s_1 \in W_N(s_2)$ but by irreflexivity, $s_1 \notin W_N(s_1)$. Therefore $W_N(s_1) \subset W_N(s_2)$, which means $T_N^{SCV}(s_1) = |W_N(s_1)| < |W_N(s_2)| = T_N^{SCV}(s_2)$, i.e. $s_1 \sqsubset _N^{T^{SCV}} s_2$ as required.

Symmetry. Since the fact ranking of $T^{SCV}$ is the same as Voting, which satisfies Symmetry, we only need to show that $s_1 \sqsubseteq _N^{T^{SCV}} s_2$ iff $\pi (s_1) \sqsubseteq _{\pi (N)}^{T^{SCV}} \pi (s_2)$ for all equivalent networks $N, \pi (N)$ and $s_1, s_2 \in {\mathcal {S}}$.

In can be shown, and we do so in the appendix, that the Symmetry of Voting implies a symmetry property for ${\mathrel {\lhd }_N}$ and ${\mathrel {\lhd }_{\pi (N)}}$: we have $s_1 \mathrel {\lhd }_N s_2$ iff $\pi (s_1) \mathrel {\lhd }_{\pi (N)} \pi (s_2)$. Consequently, $t \in W_N(s_i)$ iff $\pi (t) \in W_{\pi (N)}(\pi (s_i))$; in particular, $|W_N(s_i)| = |W_{\pi (N)}(\pi (s_i))|$. This means

$$\begin{aligned} s_1 \sqsubseteq _N^{T^{SCV}} s_2&\iff |W_N(s_1)| \le |W_N(s_2)| \\&\iff |W_{\pi (N)}(\pi (s_1))| \le |W_{\pi (N)}(\pi (s_2))| \\&\iff \pi (s_1) \sqsubseteq _{\pi (N)}^{T^{SCV}} \pi (s_2) \end{aligned}$$

as required for Symmetry.

Fact-Coherence Consider the network shown in Fig. 3. We have $f \approx g \approx i \prec h$. Source-Coherence between s and t gives $t \sqsubset s$. If Fact-Coherence held we would then get $g \prec f$, but this is not the case.

Source-PCI Let $N_1$ denote the top connected component of the network shown in Fig. 4, and let $N_2$ denote the network as a whole. The fact ranking is the same in both networks: $g \approx h \approx i \prec f$. In $N_1$ sources s and t claim a different number of facts, so neither $s \mathrel {\lhd }_{N_1} t$ nor $t \mathrel {\lhd }_{N_1} s$. Consequently $W_{N_1}(s) = W_{N_1}(t) = \emptyset$ and $s \simeq _{N_1}^{T^{SCV}} t$.

In $N_2$ sources t and u can be compared for Source-Coherence, and we see that $u \mathrel {\lhd }_{N_2} t$ since $i \preceq _{N_2}^{T^{SCV}} g$ and $h \prec _{N_2}^{T^{SCV}} f$. Hence $W_{N_2}(t) = \{u\}$ and $W_{N_2}(s) = \emptyset$, which means $s \sqsubset _{N_2}^{T^{SCV}} t$. This contradicts Source-PCI, which requires the ranking of s and t to be the same in both networks. $\square$

Note that the idea behind SC-Voting can be generalised beyond Voting: it is possible to define $\mathrel {\lhd }_N$ in terms of any operator T, and to construct a new operator $T'$ whose source ranking is given according to $\mathrel {\lhd }_N$ as above, and whose fact ranking coincides with that of T. This ensures $T'$ satisfies Source-Coherence whilst keeping the existing fact ranking from T.

Moreover we can go in the other direction and ensure Fact-Coherence whilst retaining the source ranking of T by defining a relation $\mathrel {\blacktriangleleft }_N$ on ${\mathcal {F}}$ in a analogous manner to $\mathrel {\lhd }_N$, and proceeding similarly.

5.4.2 Sums

Our main concern with Sums is the failure of PCI and Monotonicity. We have seen that this is in some sense caused by the normalisation step: in Fig. 2 the scores of f, g etc go to 0 in the limit after dividing the ‘global’ maximum score across the network, which happens to come from a different connected component.

A natural fix for PCI is to therefore divide by the maximum score within each component. In this case the score for a source s depends only on the structure of the connected component in which it lies, which is exactly what is required for PCI.

However, this approach does not negate the undesirable effects of proposition 5.2. Indeed, suppose the network in Fig. 2 was modified so that fact $y_1$ is associated with object o instead of $p_1$. Clearly proposition 5.2 still applies after this change, and all sources and facts shown now belong to the same connected component. Therefore the ‘per-component Sums’ operator gives the same result as Sums itself, and in particular our Monotonicity counterexample still applies. Perhaps even worse, one can show that Coherence fails for this operator. We consider the loss of Coherence too high a price to pay for PCI.

Instead, let us take a step back and consider if normalisation is truly necessary. On the one hand, without normalisation the trust and belief scores are unbounded and therefore do not converge. On the other, we are not interested in the numeric scores for their own sake, but rather for the rankings that they induce. It may be possible that whilst the scores diverge without normalisation, the induced rankings do converge to a fixed one, which we may take as the ‘ordinal limit’. This is in fact the case. We call this new operator UnboundedSums.

Definition 5.4

UnboundedSums is the recursive operator $\mathsf {rec}(T^{\text {prior}}, U^{\text {Sums}})$ where $T_N^{\text {prior}}(s) = 1$, $T_N^{\text {prior}}(f) = |\mathsf {src}_N(f)|$ and $U^{\text {Sums}}$ is defined as in Sect. 3.2.⁹

Theorem 5.4

UnboundedSums is ordinally convergent in the following sense: there is an ordinal operator $T^*$ such that for each network N there exists $J_N \in {\mathbb {N}}$ such that $T_N^n(s_1) \le T_N^n(s_2)$ iff $s_1 \sqsubseteq _N^{T^*} s_2$ for all $n \ge J_N$ and $s_1, s_2 \in {\mathcal {S}}$ (and similarly for facts).

That is, the rankings induced by $T^n$ remain constant after $J_N$ iterations, and are identical to those of $T^*$.

Proof

The proof will use some results from linear algebra, so we will work with a matrix and vector representation of UnboundedSums. Fix an enumeration ${\mathcal {S}}= \{s_1,\ldots ,s_k\}$ of ${\mathcal {S}}$ and ${\mathcal {F}}= \{f_1,\ldots ,f_l\}$ of ${\mathcal {F}}$. Write M for the $k \times l$ matrix given by

$$\begin{aligned} {[}M]_{ij} = {\left\{ \begin{array}{ll} 1 &{} \text { if } s_i \in \mathsf {src}_N(f_j) \\ 0 &{} \text { otherwise } \end{array}\right. } \quad (1 \le i \le k, 1 \le j \le l) \end{aligned}$$

We also write $v_n$ and $w_n$ for the vectors of trust and belief scores of UnboundedSums at iteration n; that is

$$\begin{aligned} v_n= & {} [T_N^n(s_1), \ldots , T_N^n(s_k)]^\top \in {\mathbb {R}}^k\\ w_n= & {} [T_N^n(f_1), \ldots , T_N^n(f_l)]^\top \in {\mathbb {R}}^l \end{aligned}$$

where $(T^n)_{n \in {\mathbb {N}}}$ denotes UnboundedSums.

Multiplication by M encodes the update step of UnboundedSums: it is easily shown that $v_{n+1} = Mw_n$ and $w_{n+1} = M^{\top }v_{n+1}$. Writing $A = MM^\top \in {\mathbb {R}}^{k \times k}$, we have $v_{n+1} = Av_n$, and therefore $v_{n+1} = A^n v_1$.

To show that the rankings of UnboundedSums remain constant after finitely many iterations, we will show that for each $s_p, s_q \in {\mathcal {S}}$ there is $J_{pq} \in {\mathbb {N}}$ such that ${{\,\mathrm{sign}\,}}([v_n]_p - [v_n]_q)$ is constant for all $n \ge J_{pq}$. Since $[v_n]_p$ and $[v_n]_q$ are the trust scores of $s_p$ and $s_q$ respectively in the n-th iteration, this will show that the ranking of $s_p$ and $s_q$ remains the same after $J_{pq}$ iterations. Since there are only finitely many pairs of sources, we may then take $J_N$ as the maximum value of $J_{pq}$ over all pairs (p, q), and the entire source ranking ${\sqsubseteq _N^{T^n}}$ of UnboundedSums remains constant for $n \ge J_N$. An almost identical argument can be carried out for the fact ranking, and these together will prove the result.

So, fix $s_p, s_q \in {\mathcal {S}}$. Write $\delta _n = [v_n]_p - [v_n]_q$. First note that $A = MM^\top$ is symmetric, so the spectral theorem gives the existence of k orthogonal eigenvectors $x_1, \ldots , x_k$ for A [5, Theorem 7.29]. Let $\lambda _1, \ldots , \lambda _k$ be the corresponding eigenvalues. Form a $(k \times k)$-matrix P whose i-th column is $x_i$, and let $D = \text {diag}(\lambda _1,\ldots ,\lambda _k)$. Then A can be diagonalised as $A = PDP^{-1}$. It follows that for any $n \in {\mathbb {N}}$, $A^n = PD^nP^{-1}$.

Now, since $x_1,\ldots ,x_k$ are orthogonal, P is an orthogonal matrix, i.e. $P^\top = P^{-1}$. Hence $A^n = PD^nP^\top$. Note that

$$\begin{aligned} PD^n = \begin{bmatrix} x_1 \mid \ldots \mid x_k \end{bmatrix} \begin{bmatrix} \lambda _1^n &{} &{} \\ &{} \ddots &{} \\ &{} &{} \lambda _k^n \\ \end{bmatrix} = \begin{bmatrix} \lambda _1^n x_1 \mid \ldots \mid \lambda _k^n x_k \end{bmatrix} \end{aligned}$$

and

$$\begin{aligned} P^{\top }v_1 = \begin{bmatrix} x_1 \\ - \\ \vdots \\ - \\ x_k \end{bmatrix} v_1 = \begin{bmatrix} x_1 \cdot v_1 \\ \vdots \\ x_k \cdot v_1 \end{bmatrix} \end{aligned}$$

which means

$$\begin{aligned} v_{n+1} = A^nv_1 = PD^nP^{\top }v_1 = \begin{bmatrix} \lambda _1^n x_1 \mid \ldots \mid \lambda _k^n x_k \end{bmatrix} \begin{bmatrix} x_1 \cdot v_1 \\ \vdots \\ x_k \cdot v_1 \end{bmatrix} = \sum _{i=1}^{k}{(x_i \cdot v_1) \lambda _i^n x_i} \end{aligned}$$

We obtain an explicit formula for $\delta _{n+1}$:

$$\begin{aligned} \delta _{n+1} = [v_n]_p - [v_n]_q = \sum _{i=1}^{k}{ (x_i \cdot v_1) \lambda _i^n \left( [x_i]_p - [x_i]_q \right) } = \sum _{i=1}^{k}{r_i \lambda _i^n} \end{aligned}$$

(5.2)

where $r_i = (x_i \cdot v_1)\left( [x_i]_p - [x_i]_q \right)$. Note that $r_i$ does not depend on n.

Now, it is easy to see that $A = MM^\top$ is positive semi-definite, which means its eigenvalues $\lambda _1,\ldots ,\lambda _k$ are all non-negative. We re-index the sum in Eq. (5.2) by grouping together the $\lambda _i$ which are equal, to get

$$\begin{aligned} \delta _{n+1} = \sum _{t=1}^{K}{R_t \mu _t^n} \end{aligned}$$

where $K \le k$, each $R_t$ is a sum of the $r_i$ (whose corresponding $\lambda _i$ are equal), and the $\mu _t$ are distinct and non-negative. Assume without loss of generality that $\mu _1> \mu _2> \cdots > \mu _K \ge 0$. If $R_t = 0$ for all t, then clearly ${{\,\mathrm{sign}\,}}(\delta _{n+1}) = {{\,\mathrm{sign}\,}}(0) = 0$ which is constant, so we are done. Otherwise, let T be the minimal t such that $R_t \ne 0$. We may also assume $\mu _T > 0$ (otherwise we necessarily have $\mu _T = 0$, $T=K$ and ${{\,\mathrm{sign}\,}}(\delta _{n+1}) = {{\,\mathrm{sign}\,}}(R_T \cdot 0^n)$ which is again constant 0). Observe that

$$\begin{aligned} \delta _{n+1}= & {} R_T \mu _T^n + \sum _{t=T + 1}^{K}{R_t \mu _t^n} \\= & {} \mu _T^n \left[ R_T + \sum _{t=T + 1}^{K}{ R_t \left( \frac{\mu _t}{\mu _T}\right) ^n } \right] \end{aligned}$$

By our assumption on the ordering of the $\mu _t$, we have $\mu _t < \mu _T$ in the sum. Consequently $|\mu _t / \mu _T| < 1$, and $(\mu _t / \mu _T)^n \rightarrow 0$ as $n \rightarrow \infty$. This means

$$\begin{aligned} \lim _{n \rightarrow \infty }{\left[ R_T + \sum _{t=T + 1}^{K}{ R_t \underbrace{\left( \frac{\mu _t}{\mu _T}\right) ^n}_{\rightarrow 0} } \right] } = R_T \ne 0 \end{aligned}$$

Since this limit is non-zero, there is $J_{pq} \in {\mathbb {N}}$ such that the sign of term in square brackets is equal to $S = {{\,\mathrm{sign}\,}}R_T \in \{1, -1\}$ for all $n \ge J_{pq}$. Finally, for such n we have

$$\begin{aligned} {{\,\mathrm{sign}\,}}\delta _{n+1} = {{\,\mathrm{sign}\,}}\left( \underbrace{\mu _T^n}_{> 0} \left[ R_T + \sum _{t=T + 1}^{K}{ R_t \left( \frac{\mu _t}{\mu _T}\right) ^n } \right] \right) = {{\,\mathrm{sign}\,}}\left( R_T + \sum _{t=T + 1}^{K}{ R_t \left( \frac{\mu _t}{\mu _T}\right) ^n } \right) = S \end{aligned}$$

i.e. ${{\,\mathrm{sign}\,}}\delta _n$ is constant for $n \ge J_{pq} + 1$. This completes the proof.¹⁰$\square$

In light of Theorem 5.4, we may consider UnboundedSums itself as an ordinal operator $T^*$, where $s \sqsubseteq _N^{T^*} t$ iff $s \sqsubseteq _N^{T^{J_n}} t$ for each network N (and similarly for the fact ranking). Moreover, with the normalisation problems aside, UnboundedSums provides an example of a principled operator satisfying our two key axioms—Coherence and PCI. In particular, we escape the undesirable behaviour of Sums in Fig. 2; whereas Sums trivialises the ranking of sources and facts in the upper connected component, UnboundedSums allows meaningful ranking (e.g. we have $g \prec f$). In particular, the counterexample for Monotonicity for Sums is no longer a counterexample for UnboundedSums. We conjecture that UnboundedSums also satisfies Monotonicity, but this remains to be proven. For example, we have experimentally verified that UnboundedSums satisfies all the specific instances of Monotonicity with the starting network N as in Fig. 1.

Theorem 5.5

UnboundedSums satisfies Coherence, Symmetry, Unanimity, Groundedness and PCI. UnboundedSums does not satisfy POI and Strong Independence.

Proof (sketch)

The proof that UnboundedSums satisfies Symmetry, PCI, Unanimity and Groundedness is routine, and we leave the details to the appendix. In what follows, let $(T^n)_{n \in {\mathbb {N}}}$ denote UnboundedSums, $T^*$ denote the ordinal limit of UnboundedSums, and for a network N let $J_N$ be as in Theorem 5.4. Then the rankings in N induced by $T^n$ for $n \ge J_N$ are the same as $T^*$.

Coherence. First we show Source-Coherence. Let N be a network and suppose $\mathsf {facts}_N(s_1)$ is less believable than $\mathsf {facts}_N(s_2)$ with respect to N and $T^*$. Let $\varphi$ and ${\hat{f}}$ be as in the definition of less believable.

Let $n \ge J_N$. Then $f \preceq _N^{T^*} \varphi (f)$ and ${\hat{f}} \prec _N^{T^*} \varphi ({\hat{f}})$ for each $f \in \mathsf {facts}_N(s_1)$ means $T_N^n(f) \le T_N^n(\varphi (f))$ and $T_N^n({\hat{f}}) < T_N^n(\varphi ({\hat{f}}))$. Hence

$$\begin{aligned} T_N^{n+1}(s)&= \sum _{f \in \mathsf {facts}_N(s_1)}{T_N^n(f)} \\&= T_N^n({\hat{f}}) + \sum _{f \in \mathsf {facts}_N(s_1) \setminus \{{\hat{f}}\}}{T_N^n(f)} \\&< T_N^n(\varphi ({\hat{f}})) + \sum _{f \in \mathsf {facts}_N(s_1) \setminus \{{\hat{f}}\}}{T_N^n(\varphi (f))} \\&= \sum _{f \in \mathsf {facts}_N(s_1)}{T_N^n(\varphi (f))} \\&= \sum _{g \in \mathsf {facts}_N(s_2)}{T_N^n(g)} \\&= T_N^{n+1}(s_2) \end{aligned}$$

i.e. $T_N^{n+1}(s_1) < T_N^{n+1}(s_2)$. But $T_N^{n+1}$ gives the same ranking as $T_N^n$ and therefore the same ranking as $T^*$, so we get $s_1 \sqsubset _N^{T^*} s_2$ as required.

For Fact-Coherence, suppose $\mathsf {src}_N(f_1)$ is less trustworthy than $\mathsf {src}_N(f_2)$ with respect to N and $T^*$. Again, let $n \ge J_N$ and $\varphi$, ${\hat{s}}$ be as in the definition of less trustworthy. Recall that belief scores for facts in $T_N^n$ are obtained from trust scores in $T_N^n$; applying the same argument as above we get $T_N^n(f_1) < T_N^n(f_2)$ and consequently $f_1 \preceq _N^{T^*} f_2$ as required. Hence $T^*$ satisfies Coherence.

POI and Strong Independence. To show POI and Strong Independence are not satisfied, consider the network N shown in Fig. 5. It can be seen (e.g. by induction) that

$$\begin{aligned} T_N^n(f) = 1, \quad T_N^n(g) = 2^{n-1} \end{aligned}$$

for all $n \in {\mathbb {N}}$. Consequently $f \prec _N^{T^*} g$.¹¹

Now let $N'$ be the network in which the claim (t, h) is removed. Since $\mathsf {src}_N(f) = \mathsf {src}_{N'}(f) = \{s\}$ and $\mathsf {src}_N(g) = \mathsf {src}_{N'}(g) = \{t\}$, both POI and Strong Independence imply $f \preceq _N^{T^*} g$ iff $f \preceq _{N'}^{T^*} g$. Therefore assuming either of POI or Strong Independence we get $f \prec _{N'}^{T^*} g$. However is is also clear that

$$\begin{aligned} T_{N'}^n(f) = T_{N'}^n(g) = 1 \end{aligned}$$

for all $n \in {\mathbb {N}}$, so $f \approx _{N'}^{T^*} g$. This is a contradiction, so neither POI nor Strong Independence are satisfied. $\square$

To summarise this section, Table 1 shows which axioms are satisfied by each of the operators.

Table 1

Satisfaction of the axioms for the various operators. Recall that POI and Strong Independence are undesirable properties

	Voting	SC-voting	Sums	U-sums
Source-coherence	X	$\checkmark$	$\checkmark$	$\checkmark$
Fact-coherence	$\checkmark$	X	$\checkmark$	$\checkmark$
Symmetry	$\checkmark$	$\checkmark$	$\checkmark$	$\checkmark$
Unanimity	$\checkmark$	$\checkmark$	$\checkmark$	$\checkmark$
Ground.	$\checkmark$	$\checkmark$	$\checkmark$	$\checkmark$
Mon.	$\checkmark$	$\checkmark$	X	?
Source-PCI	$\checkmark$	X	X	$\checkmark$
Fact-PCI	$\checkmark$	$\checkmark$	X	$\checkmark$
POI	$\checkmark$	$\checkmark$	X	X
Str. indep.	$\checkmark$	$\checkmark$	X	X

6 Variable domain truth discovery

So far, we have considered an arbitrary but fixed (finite) domain of sources, facts and objects $({\mathcal {S}}, {\mathcal {F}}, {\mathcal {O}})$. Our operators and axioms were defined with respect to this domain. However, the operators do not depend on the domain: they can be defined for any choice of ${\mathcal {S}}$, ${\mathcal {F}}$ and ${\mathcal {O}}$. In this section we generalise the framework so that these sets are no longer fixed. This allows new situations to be modelled, such as new sources entering the network. Adapting the definition of a TD operator to this case, we can then see how the ranking of facts changes as new sources are added. Such variable domain operators are then analogues of variable electorate voting rules in social theory.

Formally, let ${\mathbb {S}}$, ${\mathbb {F}}$ and ${\mathbb {O}}$ be countably infinite sets of sources, facts and objects respectively. A domain is a triple ${\mathcal {D}}= ({\mathcal {S}}, {\mathcal {F}}, {\mathcal {O}})$, where ${\mathcal {S}}\subseteq {\mathbb {S}}$, ${\mathcal {F}}\subseteq {\mathbb {F}}$ and ${\mathcal {O}}\subseteq {\mathbb {O}}$ are finite, non-empty sets. We think of ${\mathbb {S}}$, ${\mathbb {F}}$ and ${\mathbb {O}}$ as being the ‘universe’ of possible sources, facts and objects, and a domain as the (finite) sets of entities under consideration in a particular TD problem. Given a domain ${\mathcal {D}}= ({\mathcal {S}}, {\mathcal {F}}, {\mathcal {O}})$, we define ${\mathcal {D}}$-networks and ${\mathcal {D}}$-operators as in definitions 2.1 and 2.2.

Definition 6.1

A variable domain operator T is a mapping which maps each domain ${\mathcal {D}}$ to a ${\mathcal {D}}$-operator $T_{\mathcal {D}}$.

Note that for a domain ${\mathcal {D}}= ({\mathcal {S}}, {\mathcal {F}}, {\mathcal {O}})$ and a ${\mathcal {D}}$-network N, $\sqsubseteq _N^{T_{\mathcal {D}}}$ and $\preceq _N^{T_{\mathcal {D}}}$ are rankings only over the set of sources ${\mathcal {S}}$ and ${\mathcal {F}}$ in the domain ${\mathcal {D}}$, not all of ${\mathbb {S}}$ and ${\mathbb {F}}$. If ${\mathcal {D}}$ is clear from context, we write $\sqsubseteq _N^T$ and $\preceq _N^T$ without explicit reference to the domain.

Since all the axioms so far were stated with respect to a fixed but arbitrary domain, they can be easily lifted to the variable domain case. For instance, we say a variable domain operator T satisfies Coherence if $T_{\mathcal {D}}$ satisfies the instance of Coherence for domain ${\mathcal {D}}$, for all ${\mathcal {D}}$, and similar for the other axioms.

But we can now go further, and introduce axioms which make use of several domains. First, we generalise Symmetry to act across domains. Say networks $N, N'$ in domains ${\mathcal {D}}, {\mathcal {D}}'$ respectively are equivalent if there is a graph isomorphism $\pi$ between them such that $\pi (s) \in {\mathcal {S}}'$, $\pi (f) \in {\mathcal {F}}'$ and $\pi (o) \in {\mathcal {O}}'$ for all $s \in {\mathcal {S}}$, $f \in {\mathcal {F}}$ and $o \in {\mathcal {O}}$.

Axiom 9 (Isomorphism)

Let N and $N' = \pi (N)$ be equivalent networks. Then for all $s_1, s_2 \in {\mathcal {S}}$, $f_1, f_2 \in {\mathcal {F}}$, we have $s_1 \sqsubseteq _N^T s_2$ iff $\pi (s_1) \sqsubseteq _{N'}^T \pi (s_2)$ and $f_1 \preceq _N^T f_2$ iff $\pi (f_1) \preceq _{N'}^T \pi (f_2)$.

Like Symmetry, Isomorphism simply says that operators only care about the structure of the network, not the particular ‘names’ chosen for sources, facts and objects. Symmetry is obtained as the special case where N and $N'$ are equivalent when seen as networks in a common domain ${\mathcal {D}}$. All the operators of Sects. 3 and 5.4 satisfy Isomorphism.

Next we introduce a new monotonicity property. In what follows, for a network $N = (V, E)$ in domain $({\mathcal {S}}, {\mathcal {F}}, {\mathcal {O}})$, $f \in {\mathcal {F}}$ and ${\mathcal {S}}' \subseteq {\mathbb {S}}$ finite and disjoint from ${\mathcal {S}}$, write $N + ({\mathcal {S}}', f)$ for the network in domain $({\mathcal {S}}\cup {\mathcal {S}}', {\mathcal {F}}, {\mathcal {O}})$ with edge set $E \cup \{(s, f) \mid s \in {\mathcal {S}}'\}$, i.e. the extension of N where a collection of ‘fresh’ sources ${\mathcal {S}}'$ each claim f. For example, Fig. 6 shows $N + ({\mathcal {S}}', h)$ for the network N from Fig. 1 and new sources ${\mathcal {S}}' = \{x_1, \ldots , x_4\}$.

Axiom 10 (Eventual Monotonicity)

Let ${\mathcal {D}}= ({\mathcal {S}}, {\mathcal {F}}, {\mathcal {O}})$ be a domain and N a ${\mathcal {D}}$-network. Then for all $f, g \in {\mathcal {F}}$, $f \ne g$, there is a finite, non-empty set ${\mathcal {S}}' \subseteq {\mathbb {S}}$ with ${\mathcal {S}}\cap {\mathcal {S}}' = \emptyset$ and $g \prec _{N + ({\mathcal {S}}', f)}^T f$.

This axiom says that, given any pair of distinct facts f, g, it is possible to add enough new claims for f to make f strictly more believable than g. Intuitively, this is less demanding that Monotonicity, which requires that f can become strictly more believable than g with the addition of just one additional claim. Note that Eventual Monotonicity is not possible to state in the fixed domain case (e.g. consider N already containing claims from all the available sources in ${\mathcal {S}}$).

When paired with Isomorphism, Eventual Monotonicity takes on a form similar to postulates for Improvement and Majority operators in belief merging [22, 23]: there is a threshold $n \in {\mathbb {N}}$ such that f becomes strictly more believable than g after n new claims are added for f. That is, the identities of the new sources ${\mathcal {S}}'$ are irrelevant; it is just the size of ${\mathcal {S}}'$ that matters. We require a preliminary lemma.

Lemma 6.1

Suppose a variable domain operator T satisfies Isomorphism. Let ${\mathcal {D}}= ({\mathcal {S}}, {\mathcal {F}}, {\mathcal {O}})$ be a domain, N a network in ${\mathcal {D}}$ and $f \in {\mathcal {F}}$. Then for all non-empty, finite sets ${\mathcal {S}}'_1, {\mathcal {S}}'_2 \subseteq {\mathbb {S}}$ disjoint from ${\mathcal {S}}$ with $|{\mathcal {S}}'_1| = |{\mathcal {S}}'_2|$,

$$\begin{aligned} {\preceq _{N + ({\mathcal {S}}'_1, f)}^T} = {\preceq _{N + ({\mathcal {S}}'_2, f)}^T} \end{aligned}$$

Proof

Write ${\mathcal {D}}_1 = ({\mathcal {S}}\cup {\mathcal {S}}'_1, {\mathcal {F}}, {\mathcal {O}})$ and ${\mathcal {D}}_2 = ({\mathcal {S}}\cup {\mathcal {S}}'_2, {\mathcal {F}}, {\mathcal {O}})$. Then $N + ({\mathcal {S}}'_i, f)$ is a network in domain ${\mathcal {D}}_i$ (for $i \in \{1, 2\}$). Since $|{\mathcal {S}}'_1| = |{\mathcal {S}}'_2|$ by assumption, there is a bijection $\varphi : {\mathcal {S}}'_1 \rightarrow {\mathcal {S}}'_2$. Define a mapping $\pi$ from ${\mathcal {D}}_1$ to ${\mathcal {D}}_2$ by

$$\begin{aligned} \pi (s) = {\left\{ \begin{array}{ll} s,&{} s \in {\mathcal {S}}\\ \varphi (s),&{} s \in {\mathcal {S}}'_1 \end{array}\right. } \quad (s \in {\mathcal {S}}\cup {\mathcal {S}}'_1) \end{aligned}$$

and $\pi (g) = g$, $\pi (o) = o$ for $g \in {\mathcal {F}}$ and $o \in {\mathcal {O}}$. Then it is easily verified that $\pi$ is an isomorphism from $N + ({\mathcal {S}}'_1, f)$ to $N + ({\mathcal {S}}'_2, f)$. For $g_1, g_2 \in {\mathcal {F}}$, we have $g_1 \preceq _{N + ({\mathcal {S}}'_1, f)}^T g_2$ iff $\pi (g_1) \preceq _{N + ({\mathcal {S}}'_2, f)}^T \pi (g_2)$ by Isomorphism. Since $\pi (g_1) = g_1$ and $\pi (g_2) = g_2$, this shows ${\preceq _{N + ({\mathcal {S}}'_1, f)}^T} = {{\preceq _{N + ({\mathcal {S}}'_2, f)}^T}}$. $\square$

Note that since ${\mathbb {S}}$ is infinite and domains are finite, for any $n \in {\mathbb {N}}$ and any domain ${\mathcal {D}}= ({\mathcal {S}}, {\mathcal {F}}, {\mathcal {O}})$ there is always some ${\mathcal {S}}' \subseteq {\mathbb {S}}$, disjoint from ${\mathcal {S}}$, with $|{\mathcal {S}}'| = n$. For operators T satisfying Isomorphism, write $\preceq _{N + (n \times f)}^T$ for $\preceq _{N + ({\mathcal {S}}', f)}^T$; Lemma 6.1 guarantees this is well-defined (i.e. does not depend on the particular choice of ${\mathcal {S}}'$). That is, $\preceq _{N + (n \times f)}^T$ is the fact ranking resulting from adding n new claims for f from fresh sources. We obtain an equivalent characterisation of Eventual Monotonicity, whose proof is almost immediate given Lemma 6.1.

Proposition 6.1

Suppose T satisfies Isomorphism. Then T satisfies Eventual Monotonicity if and only if for all domains ${\mathcal {D}}= ({\mathcal {S}}, {\mathcal {F}}, {\mathcal {O}})$, all networks N in ${\mathcal {D}}$ and distinct $f, g \in {\mathcal {F}}$, there is $n \in {\mathbb {N}}$ such that $g \prec _{N + (n \times f)}^T f$.

Proof (sketch)

‘if’: To show Eventual Monotonicity, take any ${\mathcal {S}}' \subseteq {\mathbb {S}}\setminus {\mathcal {S}}$ of size n.

‘Only if’: Given that Eventual Monotonicity holds, simply take $n = |{\mathcal {S}}'|$. $\square$

We can now show that all operators studied so far—when lifted to the variable domain case—satisfy Eventual Monotonicity.

Theorem 6.1

Voting, Sums, SC-Voting and UnboundedSums satisfy Eventual Monotonicity.

Proof (sketch)

Let ${\mathcal {D}}= ({\mathcal {S}}, {\mathcal {F}}, {\mathcal {O}})$ be a domain, N a network in ${\mathcal {D}}$ and $f, g \in {\mathcal {F}}$. Given that Isomorphism holds for each operator, we sketch the proof via proposition 6.1.

For Voting and SC-Voting, we may simply take $n = 1 + |\mathsf {src}_N(g)|$. For Sums and UnboundedSums, take $n = 2 |{\mathcal {S}}| \cdot |{\mathcal {F}}|$. Write $N' = N + ({\mathcal {S}}', f)$ for some ${\mathcal {S}}' \subseteq {\mathbb {S}}\setminus {\mathcal {S}}$ with $|{\mathcal {S}}'| = n$

If $(T^k)_{k \in {\mathbb {N}}}$ denotes Sums, one can show by induction that $T^k_{N'}(f) = 1$ and $T^{k}_{N'}(h) \le \frac{1}{2}$ for any $h \ne f$ and $k > 1$, and thus $g \prec _{N'}^{T^Sums {}} f$.

Similarly, letting $(T^k)_{k \in {\mathbb {N}}}$ denote UnboundedSums, one can show by induction that $T^{k}_{N'}(f) > T^{k}_{N'}(h)$ for $h \ne f$, and thus $g \prec _{N'}^{T^UnboundedSums {}} g$.

$\square$

To conclude this section, we show that the impossibility result of Theorem 4.2 holds in the variable domain case if one replaces Monotonicity with Eventual Monotonicity and Symmetry with Isomorphism.

Theorem 6.2

There is no variable domain operator satisfying Coherence, Isomorphism, Eventual Monotonicity and POI.

Proof

For contradiction, suppose T is an operator satisfying the stated axioms. Let N be the network from Fig. 1, viewed as a network in domain $(\{s, t, u, v\}, \{f, g, h, i\}, \{o, p\})$. Applying Eventual Monotonicity with i and h, we have that there is $N'$ with $i \prec _{N'}^T h$, where $N' = N + ({\mathcal {S}}', h)$ for some ${\mathcal {S}}' \subseteq {\mathbb {S}}\setminus \{s, t, u, v\}$. Since $N'$ only adds claims for p-facts, POI applied to object o and Isomorphism give $f \approx _{N'}^T g$ (e.g. consider $\pi$ which simply swaps s with t and f with g). From Source-Coherence we get $t \sqsubset _{N'}^T s$. But $\mathsf {src}_{N'}(f) = \{s\}$ and $\mathsf {src}_{N'}(g) = \{t\}$, so Fact-Coherence gives $g \prec _{N'}^T f$: contradiction! $\square$

7 Discussion

In this section we discuss the axioms and their limitations. First, the version of Monotonicity we consider is a strict one: a new claim for f gives f a strictly positive boost in the fact believability ranking. This is also the case for Eventual Monotonicity in the variable domain case, where we require that some number of new claims make f strictly more believable than any other fact g. As noted in Sect. 4.3, this assumes there is no collusion between sources. Indeed, suppose sources $s_1$, $s_2$ are in collusion. For example, $s_2$ may agree to unconditionally back up all claims made by $s_1$. In this case a claim of f from $s_1$ alone should carry just as much weight as the claim from both $s_1$ and $s_2$. However, Monotonicity requires that f becomes strictly more believable when moving to the latter case.

A natural solution is to simply relax the strictness requirement. We obtain the following weak version of Monotonicity.

Axiom 11 (Weak Monotonicity)

Let $N, s, f, N'$ be as in the statement of Monotonicity. Then for all $g \ne f$, $g \preceq _N^T$ implies $g \preceq _{N'}^T f$.

Weak Monotonicity only says says that extra support for a fact f does not make f less believable. Clearly Monotonicity implies Weak Monotonicity, but not vice versa. In the collusion example, an operator may select to leave the fact ranking unchanged when a new report of f from $s_2$ arrives; this is inconsistent with Monotonicity but consistent with Weak Monotonicity. The weak analogue of Eventual Monotonicity can be defined in the same way.

In the same spirit, one could consider versions of Coherence only using weak comparisons. Say $\mathsf {facts}_N(s_1)$ is weakly less believable than $\mathsf {facts}_N(s_2)$ iff the condition in definition 4.1 holds, but without the requirement that some ${\hat{f}} \in \mathsf {facts}_N(s_1)$ is strictly less believable than its counterpart $\varphi ({\hat{f}})$ in $\mathsf {facts}_N(s_2)$, and define $\mathsf {src}_N(f_1)$ weakly less trustworthy than $\mathsf {src}_N(f_2)$ in a similar way. The weak analogue of Coherence is as follows.

Axiom 12 (Weak Coherence)

For any network N, $\mathsf {facts}_N(s_1)$ weakly less believable than $\mathsf {facts}_N(s_2)$ implies $s_1 \sqsubseteq _N^T s_2$, and $\mathsf {src}_N(f_1)$ weakly less trustworthy than $\mathsf {src}_N(f_2)$ implies $f_1 \preceq _N^T f_2$.

Note that Coherence does not imply Weak Coherence. This is because Weak Coherence relaxes both the consequent and the antecedent in the implications in the statement of the axiom. Whereas Coherence imposes no constraint when $\mathsf {facts}_N(s_1)$ is only weakly less believable than $\mathsf {facts}_N(s_2)$, Weak Coherence requires $s_1 \sqsubseteq _N^T s_2$. Consequently, the ‘weakness’ of Weak Coherence refers to its use of weak comparisons between sources and facts, not its logical strength in relation to Coherence.

A natural question now arises. Does the impossibility result of Theorem 4.2 still hold with these new axioms? We have an answer in the negative: the ‘flat’ operator, which sets all sources and facts equally ranked in all networks, satisfies all the axioms of the would-be impossibility.

Proposition 7.1

Define an operator T by $s_1 \simeq _N^T s_2$ and $f \approx _N^T f_2$ for all networks N, sources $s_1, s_2$ and facts $f_1, f_2$. Then T satisfies Coherence, Weak Coherence, Symmetry, Weak Monotonicity and POI.

Proof

Coherence holds vacuously since we can never have $\mathsf {facts}_N(s_1)$ less believable than $\mathsf {facts}_N(s_2)$ or $\mathsf {src}_N(f_1)$ less believable than $\mathsf {src}_N(f_2)$. Since any weak ranking holds for T, the other axioms are straightforward to see. $\square$

This shows that (strict) Monotonicity is required for the impossibility result, since the result is no longer true when relaxing to Weak Monotonicity.

We now consider the new axioms in relation to the operators. First, Weak Coherence.

Proposition 7.2

Voting, Sums and UnboundedSums satisfy Weak Coherence

Proof (sketch)

Voting Since $s_1 \sqsubseteq _N^{T^Voting {}} s_2$ always holds, Weak Source-Coherence clearly holds. For Weak Fact-Coherence, suppose $\mathsf {src}_N(f_1)$ is weakly less trustworthy than $\mathsf {src}_N(f_2)$. Then there is a bijection $\varphi : \mathsf {src}_N(f_1) \rightarrow \mathsf {src}_N(f_2)$, so $|\mathsf {src}_N(f_1)| = |\mathsf {src}_N(f_2)|$. By definition of Voting, $f_1 \approx _N^{T^Voting {}} f_2$. In particular, $f_1 \prec _N^{T^Voting {}} f_2$.

Sums First, one may adapt definition 5.1 to a numerical variant of a set of facts Y being weakly less believable than $Y'$, by dropping all references to $\rho$. We then have an analogue of Lemma 5.1, and Weak Coherence for Sums follows by an argument similar to the one used to show Coherence using Lemma 5.1.

UnboundedSums The proof that UnboundedSums satisfies Coherence can be adapted in a straightforward way to show Weak Coherence. $\square$

Proposition 7.2 indicates that Weak Coherence may in fact be too weak to capture the original intuition behind Coherence—namely, that there should be a mutual dependence between trustworthy sources and believable facts—since it does not even rule out Voting. Instead, Weak Coherence can be seen as a simple requirement which only rules out undesirable behaviour, and complements (strict) Coherence.

Since Monotonicity implies Weak Monotonicity, it is clear that Voting satisfies Weak Monotonicity. We conjecture that Weak Monotonicity also holds for Sums and UnboundedSums, but this remains to be proven.¹²

In this section we discuss related work.

Ranking systems. Altman and Tennenholtz [2] initiated axiomatic study of ranking systems. First we discuss their framework in relation to ours, and then turn to their axioms. In their framework, a ranking system F maps any (finite) directed graph $G = (V, E)$ to a total preorder $\le _G^F$ on the vertex set V. In their view this is a variation of the classical social choice setting, in which the set of voters and alternatives coincide. Nodes $v \in V$ “vote" on their peers in V by a form of approval voting [26]: an edge $v \rightarrow u$ is interpret as a vote for u from v. A ranking system then outputs a ranking of V, following the general intuition that the more “votes" v receives (i.e. the more incoming edges), the higher v should rank. As with the ranking of facts in truth discovery, this does not necessarily mean ranking nodes simply by the number of votes received, since the quality of the voters should also be taken in account. For example, a ranking system may prioritise nodes which receive few votes from highly ranked nodes over those with many votes from lower ranked nodes.

Note that our truth discovery networks N are themselves directed graphs on the vertex set ${\mathcal {S}}\cup {\mathcal {F}}\cup {\mathcal {O}}$. However, naively applying a ranking system to N directly makes little sense: sources never receive any “votes", and the resulting ranking includes objects, which do not need to be ranked in our setting. Perhaps a more sensible approach is to consider the bipartite graph $G_N = (V_N, E_N)$ associated with a network N, where

$$\begin{aligned} V_N = {\mathcal {S}}\cup {\mathcal {F}}, \qquad E_N = \bigcup _{(s, f) \in N}{ \{(s, f), (f, s)\} }. \end{aligned}$$

That is, we take the edges from sources to facts together with the reversal of such edges. The edges in $G_N$ have some intuitive interpretation: a source votes for the facts which it claims are true, and a fact votes for the sources who vouch for it. Any ranking system F thus gives rise to a truth discovery operator, where $s_1 \sqsubseteq _N^T s_2$ iff $s_1 \le _{G_N}^F s_2$, and similar for facts.

However, some characteristic aspects of the truth discovery problem are lost in this translation to ranking systems. Notably, the objects play no role at all in $G_N$. Sources and facts are also treated symmetrically, where they perhaps should not be. For example, a fact f receiving more claims than g is beneficial for f, all else being equal (see Monotonicity), but a source s claiming more facts than t does not tell us anything about the relative trustworthiness of s and t.

While other choices of $G_N$ may be possible to alleviate some of these problems, we believe the truth discovery is sufficiently specialised beyond general graph ranking so that a bespoke modelling is required to capture its nuances appropriately. Our framework provides this novel contribution.

In [2], Altman and Tennenholtz also introduce axioms for ranking systems. Their first set of axioms deal with the transitive effects of voting when the alternatives are the voters themselves. As mentioned in Sect. 4, these axioms provided the inspiration for Coherence. The core idea is that if the predecessors of a node v are weaker than those of u, then v should be ranked below u. If v additionally has more predecessors, v should rank strictly below. Coherence applies this idea both in the direction of sources-to-facts (Fact-Coherence) and from facts-to-sources (Source-Coherence). A notable difference is that we only consider the case where the number of sources for two facts (or the number of facts, for two sources) is the same. For example, a source claiming more facts does not give it the strict boost Transitivity would dictate. Under the mapping $N \mapsto G_N$ described above, any ranking system satisfying Transitivity induces a truth discovery operator which satisfies Coherence.

The other axiom in [2] is an independence axiom RIIA (ranked independence of irrelevant alternatives), which adapts the classical IIA axiom from social choice theory to the ranking system setting, although in a different manner to our independence axioms of Sect. 4.4. We describe the axiom in rough terms, deferring to the paper for the technical details. Suppose the relative ranking of $u_1$’s predecessors compared to $u_2$’s predecessors is the same as that of $v_1$’s compared to $v_2$’s. Then RIIA requires $u_1 \le u_2$ iff $v_1 \le v_2$. Informally, “the relative ranking of two agents must only depend on the pairwise comparison of the ranks of their predecessors" [2]. While we do not have an analogous axiom, the idea can be adapted to truth discovery networks. Intuitively, such an axiom would state that the ranking of two facts depends only on comparisons between their corresponding sources (and similar for the ranking of sources).

However, the main result of Altman and Tennenholtz is an impossibility: Transitivity is incompatible with RIIA. Moreover, the result remains true even when restricting to bipartite graphs, such as $G_N$ described above. Accordingly, we can expect a similar impossibility result to hold in the truth discovery setting between Coherence and any analogue of RIIA.

PageRank. PageRank [33] is a well-known algorithm for ranking web pages based on the hyperlink structure of the web, viewed as a directed graph. It has also been studied through the lens of social choice and characterised axiomatically [1, 40].¹³ In [1] the authors propose several invariance axioms, each of which requires that the ranking of pages is not affected by a certain transformation of the web graph. For example, the axiom Self Edge says that adding a self loop from a page a to itself does not change the relative ranking of other pages, and results in a strictly positive boost for a (c.f. Monotonicity). However, if we identify a truth discovery network N with the graph $G_N$ as described above, most of the transformations involved do not respect the bipartite, symmetric structure of $G_N$. That is, the transformed graph does not correspond to any $G_{N'}$, for a network $N'$. Consequently, the PageRank axioms have no truth discovery counterpart in our setting. The only exception is Isomorphism, where the transformation in question is graph isomorphism; this axiom is analogous to our Symmetry and Isomorphism axioms. However, since PageRank is similar to the Hubs and Authorities [21] algorithm on which Sums is based—which also uses the link structure of the web to rank pages—we expect there may be additional axioms which can be expressed both for general graphs and truth discovery networks, satisfied by PageRank and Sums. We leave the task of finding such axioms to future work.

9 Conclusion

In this paper we formalised a mathematical framework for truth discovery. While a number of simplifying assumptions were made compared to the mainstream truth discovery literature, we are able to express several algorithms in the framework. This provided the setting for the axiomatic method of social choice to be applied. To our knowledge, this is the first such axiomatic treatment in this context.

It was possible to adapt many axioms from social choice theory and related areas. In particular, the Transitivity axiom studied in the context of ranking systems [2, 36] took on new life in the form of Coherence, which we consider a core axiom for TD operators. We proceeded to establish the differences between our setting and classical social choice by considering independence axioms. This led to an impossibility result and an axiomatic characterisation of the baseline Voting method.

On the practical side, we analysed the existing TD algorithm Sums and found that, surprisingly, it fails PCI. This is a serious issue for Sums which has not been discussed in the literature to date, and its discovery here highlights the benefits of the axiomatic method. To resolve this, we suggested a modification to Sums—which we call UnboundedSums—for which PCI is satisfied. However, while UnboundedSums resolves axiomatic problems with Sums, it may introduce computational difficulties (since the numeric scores involved grow without bound). We leave further investigation of such issues to future work.

A restriction of our analysis is that only one ‘real-world’ algorithm was considered. Further axiomatic analysis of algorithms provides a deeper understanding of how algorithms operate on an intuitive level, but is made difficult by the complexity of the state-of-the-art truth discovery methods. New techniques for establishing the satisfaction (or otherwise) of axioms would be helpful in this regard.

There is also scope for extensions to our model of truth discovery in the framework itself. For example, even in the variable domain setting of Sect. 6, we make the somewhat simplistic assumption that there are only finitely many possible facts for sources to claim. This effectively means we can only consider categorical values; modelling an object whose domain is the set of real numbers, for example, is not straightforward in our framework.

Next, our model does not account for any associations or constraints between objects, whereas in reality the belief in a fact for one object may strengthen or weaken our belief in other facts for related objects. These types of constraints or correlations have been studied both on the theoretical side (e.g. in judgment aggregation) and practical side in truth discovery [44].

The axioms can also be further refined to relax some of the simplifying assumptions we make regarding source attitudes; e.g. that they do not collude or attempt to manipulate. Most notably, Monotonicity should be weakened to account for such sources.

Finally, it may be argued that truth discovery as formulated in this paper risks simply to find consensus among sources, rather than the truth. To remedy this, the framework could be extended to model the possible states of the world and thus the ground truth (c.f. [32]). Upon doing so one could investigate how well, and under what conditions, an operator is able to recover the truth from a TD network. Such truth-tracking methods have also been studied in judgment aggregation and belief fusion [16, 20].

Declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

previous article Scalar reward is not enough: a response to Silver, Singh, Precup and Sutton (2021)

next article Towards explainable interactive multiobjective optimization: R-XIMO

Proofs

Proof of theorem 4.1

The following lemma is required before the proof.

Lemma A.1

Suppose a network $N=(V, E)$ contains claims only for a single object $o \in {\mathcal {O}}$; that is, there exists $o \in {\mathcal {O}}$ such that $(s, f) \in E$ implies $obj_N(f) = o$ for all $s \in {\mathcal {S}}, f \in {\mathcal {F}}$. Then for any Symmetric operator T and $f_1, f_2 \in {\mathcal {F}}$, $|\mathsf {src}_N(f_1)| = |\mathsf {src}_N(f_2)| > 0$ implies $f_1 \approx _N^T f_2$.

Proof

Suppose N has the stated property, T satisfies symmetry, and $|\mathsf {src}_N(f_1)| = |\mathsf {src}_N(f_2)| > 0$. Then there is a bijection $\varphi : \mathsf {src}_N(f_1) \rightarrow \mathsf {src}_N(f_2)$. Note that since $f_1$ and $f_2$ are for the same object no source can claim both facts, i.e. $\mathsf {src}_N(f_1) \cap \mathsf {src}_N(f_2) = \emptyset$.

Define a permutation $\pi$ by

$$\begin{aligned} \pi (s)= & {} {\left\{ \begin{array}{ll} \varphi (s) &{} \text { if } s \in \mathsf {src}_N(f_1) \\ \varphi ^{-1}(s) &{} \text { if } s \in \mathsf {src}_N(f_2) \\ s &{} \text { otherwise } \end{array}\right. } \\ \pi (f)= & {} {\left\{ \begin{array}{ll} f_2 &{} \text { if } f = f_1 \\ f_2 &{} \text { if } f = f_2 \\ f &{} \text { otherwise } \end{array}\right. } \end{aligned}$$

and $\pi (o) = o$ for all $o \in {\mathcal {O}}$. That is, $\pi$ swaps facts $f_1$ and $f_2$, and swaps the sources of $f_1$ with their counterparts in $f_2$. Note that $\pi = \pi ^{-1}$.

Write $N' = \pi (N)$. We claim that $N' = N$. Write $E, E'$ for the edges in N and $N'$ respectively. First we will show $E \subseteq E'$. Suppose $(s, f) \in E$. There are three cases.

Case 1 $f=f_1$. Here we have $(s, f_1) \in E$, so $s \in \mathsf {src}_N(f_1)$. Consequently $\pi (s) = \varphi (s) \in \mathsf {src}_N(f_2)$, i.e. $(\pi (s), f_2) \in E$. By the definition of a graph isomorphism we get $(\pi (\pi (s)), \pi (f_2)) \in E'$. Noting that $\pi (f_2) = f_1 = f$ and $\pi (\pi (s)) = s$ (since $\pi =\pi ^{-1}$), we have $(s, f) \in E'$ as desired.
Case 2 $f = f_2$. Similar to the above case, here we have $s \in \mathsf {src}_N(f_2)$ and so $\pi (s) = \varphi ^{-1}(s) \in \mathsf {src}_N(f_1)$, i.e. $(\pi (s), f_1) \in E$. As before, applying the definition of a graph isomorphism and using $\pi =\pi ^{-1}$, we get $(s, f) \in E'$.
Case 3 $f \notin \{f_1, f_2\}$. By hypothesis f relates to the same object as $f_1$ and $f_2$. This means $s \notin \mathsf {src}_N(f_1)$ and $s \notin \mathsf {src}_N(f_2)$, since otherwise s would make claims for multiple facts for a single object. Hence we have $\pi (s)=s$ and $\pi (f)=f$. This means $(s, f) = (\pi (s), \pi (f)) \in E'$ as required.

To complete the claim $E \subseteq E'$, suppose $(f, o) \in E$. There are again three cases: $f = f_1$, $f = f_2$, or $f \notin \{f_1,f_2\}$. In each case the definition of $\pi$ and $\pi (N)$ easily yield $(f, o) \in E'$. Hence $E \subseteq E'$.

Now for the reverse direction: we must show $E' \subseteq E$. Let $(x, y) \in E'$. By definition of a graph isomorphism, we have $(\pi ^{-1}(x), \pi ^{-1}(y)) \in E$. Using $\pi ^{-1} = \pi$ and the first part we get $(\pi (x), \pi (y)) = (\pi ^{-1}(x), \pi ^{-1}(y)) \in E \subseteq E'$. The definition of a graph isomorphism then yields $(x, y) \in E$ and so $E' \subseteq E$. Hence $E = E'$ and $N = N'$.

To conclude the proof, we apply Symmetry of T to get

$$\begin{aligned} f_1 \preceq _N^T f_2&\iff \pi (f_1) \preceq _{N'}^T \pi (f_2) \\&\iff f_2 \preceq _{N'}^T f_1 \\&\iff f_2 \preceq _N^T f_1 \end{aligned}$$

and so $f_1 \approx _N^T f_2$ as required. $\square$

Proof (Theorem 4.1)

Suppose T is an operator satisfying Symmetry, Monotonicity and POI. Let $N \in {\mathcal {N}}$, $o \in {\mathcal {O}}$ and $f_1, f_2 \in \mathsf {obj}_N^{-1}(o)$. We need to show that $f_1 \preceq _N^T f_2$ iff $|\mathsf {src}_N(f_1)| \le |\mathsf {src}_N(f_2)|$.

Let $N'$ be the network obtained from N by removing all claims for facts other than those for object o; that is, $N' = (V, E')$ where E is the set of edges in N and

$$\begin{aligned} E' = (E \cap ({\mathcal {S}}\times \mathsf {obj}_N^{-1}(o))) \cup (E \cap ({\mathcal {F}}\times {\mathcal {O}})) \end{aligned}$$

Note that the fact-object affiliations are the same in $N'$ as in N, and the set of sources for each fact in $\mathsf {obj}_N^{-1}(o)$ is the same. Therefore POI applies, and it is sufficient to show that $f_1 \preceq _{N'}^T f_2$ iff $|\mathsf {src}_{N'}(f_1)| \le |\mathsf {src}_{N'}(f_2)|$.

First suppose $|\mathsf {src}_{N'}(f_1)| \le |\mathsf {src}_{N'}(f_2)|$. If $|\mathsf {src}_{N'}(f_1)| = |\mathsf {src}_{N'}(f_2)|$, then we have $f_1 \approx _{N'}^T f_2$ by Symmetry and Lemma A.1; in particular $f_1 \preceq _{N'}^T f_2$. Otherwise $|\mathsf {src}_{N'}(f_2)| - |\mathsf {src}_{N'}(f_1)| = k > 0$. Removing k sources from $f_2$ to obtain a new network $N''$, we can apply the lemma to get $f_1 \approx _{N''}^T f_2$. We may then add these sources back to obtain $N'$ again; k applications of Monotonicity then give $f_1 \prec _{N'}^T f_2$ as required.

To complete the proof we show that $f_1 \preceq _{N'}^T f_2$ implies $|\mathsf {src}_{N'}(f_1)| \le |\mathsf {src}_{N'}(f_2)|$. Indeed, suppose $f_1 \preceq _{N'}^T f_2$ but $|\mathsf {src}_{N'}(f_1) > |\mathsf {src}_{N'}(f_2)|$. Then the argument above gives $f_1 \succ _{N'}^T f_2$, which is clearly a contradiction. Hence the proof is complete. $\square$

Proof of theorem 4.3

The proof of this theorem is similar in spirit to that of Theorem 4.1. Like in that case, a preliminary result is required first.

Lemma A.2

Let N be a network and $f_1, f_2 \in {\mathcal {F}}$. Write $o_1 = \mathsf {obj}_{N}(f_1)$, $o_2 = \mathsf {obj}_N(f_2)$. Suppose N has the following properties:

There is $o^* \in {\mathcal {O}}\setminus \{o_1, o_2\}$ such that $f \in {\mathcal {F}}\setminus \{f_1, f_2\} \implies \mathsf {obj}_{N}(f) = o^*$; and

$\mathsf {src}_{N}(f) = \emptyset$ for all $f \in {\mathcal {F}}\setminus \{f_1, f_2\}$.

Then for any operator T satisfying Symmetry, $|\mathsf {src}_{N}(f_1)| = |\mathsf {src}_{N}(f_2)|$ implies $f_1 \approx _{N}^T f_2$.

Proof

The proof is similar to that of Lemma A.1. Suppose $|src_{N}(f_1)| = |\mathsf {src}_{N}(f_2)|$. Write

$$\begin{aligned} Q_1&= \mathsf {src}_{N}(f_1) \setminus \mathsf {src}_{N}(f_2) \\ Q_2&= \mathsf {src}_{N}(f_2) \setminus \mathsf {src}_{N}(f_1) \end{aligned}$$

Then $|Q_1| = |Q_2|$, so there exists a bijection $\varphi : Q_1 \rightarrow Q_2$. Define a permutation $\pi$ as follows:

$$\begin{aligned} \pi (s)= & {} {\left\{ \begin{array}{ll} \varphi (s) &{} \text { if } s \in Q_1 \\ \varphi ^{-1}(s) &{} \text { if } s \in Q_2 \\ s &{} \text { otherwise } \end{array}\right. } \\ \pi (f)= & {} {\left\{ \begin{array}{ll} f_2 &{} \text { if } f = f_1 \\ f_1 &{} \text { if } f = f_2 \\ f &{} \text { otherwise } \end{array}\right. } \\ \pi (o)= & {} {\left\{ \begin{array}{ll} o_2 &{} \text { if } o = o_1 \\ o_1 &{} \text { if } o = o_2 \\ o &{} \text { otherwise } \end{array}\right. } \end{aligned}$$

That is, $\pi$ swaps $f_1$ and $f_2$, swaps $o_1$ and $o_2$, and swaps sources in $Q_1$ with their counterparts in $Q_2$. Note that $\pi = \pi ^{-1}$. Write $N' = \pi (N)$. We claim that $N' = N$. Write $E, E'$ for the edges in N and $N'$ respectively. First we show that $E \subseteq E'$. For this, first suppose $(s, f) \in E$ for some $s \in {\mathcal {S}}$, $f \in {\mathcal {F}}$. By definition of E, either $f = f_1$ or $f = f_2$.

Case 1 $f = f_1$. Here $\pi (f) = f_2$, and we have either $s \in Q_1$ or $s \in \mathsf {src}_{N}(f_1) \cap \mathsf {src}_{N}(f_2)$. In the first case, $\pi (s) = \varphi (s) \in Q_2 \subseteq \mathsf {src}_{N}(f_2) = \mathsf {src}_{N}(\pi (f))$. In the second case $\pi (s) = s \in \mathsf {src}_{N}(f_2) = \mathsf {src}_{N}(\pi (f))$. In either case, $(\pi (s), \pi (f)) \in E$.

Applying the definition of a graph isomorphism we get $(\pi (\pi (s)), \pi (\pi (f))) \in E'$. But $\pi = \pi ^{-1}$, so this means $(s, f) \in E'$ as desired.
Case 2 $f = f_2$. This case is similar. Here $\pi (f) = f_1$. If $s \in Q_2$, then $\pi (s) = \varphi ^{-1}(s) \in Q_1 \subseteq \mathsf {src}_{N}(f_1) = \mathsf {src}_{N}(\pi (f))$. Otherwise $s \in \mathsf {src}_{N}(f_1) \cap \mathsf {src}_{N}(f_2)$ and $\pi (s) = s \in \mathsf {src}_{N}(f_1) = \mathsf {src}_{N}(\pi (f))$. Again, we have $(\pi (s), \pi (f)) \in E$ in either case, so $(s, f) \in E'$.

Note that these two cases cover all possibilities since by hypothesis $\mathsf {src}_{N}(f) = \emptyset$ if $f \notin \{f_1, f_2\}$.

Next, suppose $(f, o) \in E$. If $f = f_1$ then $o = o_1$, so $(\pi (f), \pi (o)) = (f_2, o_2) \in E$. Similarly if $f = f_2$ then $o = o_2$ and $(\pi (f), \pi (o)) = (f_1, o_1) \in E$. If $f \notin \{f_1, f_2\}$ then $\pi (f) = f$ and $o = o^*$, so $\pi (o) = o$. We see that in all cases, $(\pi (f), \pi (f)) \in E$. Applying the same argument as in case 1 above, we see that $(f, o) \in E'$. This shows $E \subseteq E'$.

To complete the claim that $N = N'$ we must show $E' \subseteq E$. This can be shown using the same argument used in Lemma A.1. Indeed, suppose $(x, y) \in E'$. Then by definition of a graph isomorphism, $(\pi ^{-1}(x), \pi ^{-1}(y)) \in E$. Using the fact that $\pi =\pi ^{-1}$ and $E \subseteq E'$ we get $(\pi (x), \pi (y)) \in E'$, which then yields $(x, y) \in E$ as required. Hence $E = E'$ and $N = N'$.

Finally, using Symmetry of T we have

$$\begin{aligned} f_1 \preceq _N^T f_2&\iff \pi (f_1) \preceq _{\pi (N)}^T \pi (f_2) \\&\iff f_2 \preceq _{N'}^T f_1 \\&\iff f_2 \preceq _N^T f_1 \end{aligned}$$

Consequently $f_1 \approx _N^T f_2$. $\square$

Proof (Theorem 4.3)

The ‘if’ direction is clear since Voting satisfies Strong Independence, Monotonicity and Symmetry (see Theorem 5.1). For the other direction, suppose T satisfies the stated axioms. Let N be a network and $f_1, f_2 \in {\mathcal {F}}$. We will construct a network $N'$ where all claims for facts other than $f_1, f_2$ are removed, and these facts are grouped under a single object. To do so, let $o_1 = \mathsf {obj}_N(f_1)$, $o_2 = \mathsf {obj}_N(f_2)$ and take $o^* \in {\mathcal {O}}\setminus \{o_1, o_2\}$. Define an edge set $E'$ by

$$\begin{aligned}&(s, f) \in E' \iff f \in \{f_1, f_2\} \text { and } s \in \mathsf {src}_N(f) \\&(f, o) \in E' \iff (f \in \{f_1, f_2\} \text { and } o = \mathsf {obj}_N(f)) \text { or } (f \notin \{f_1, f_2\} \text { and } o = o^*) \end{aligned}$$

Then let $N'$ be the network with edge set $E'$. Note that $\mathsf {src}_{N'}(f_j) = \mathsf {src}_N(f_j)$. By Strong Independence it is therefore sufficient to show that $f_1 \preceq _{N'}^T f_2$ iff $|src_{N'}(f_1)| \le |\mathsf {src}_{N'}(f_2)|$. Note also that $N'$ satisfies the hypothesis of Lemma A.2.

Now, suppose $|\mathsf {src}_{N'}(f_1)| \le |\mathsf {src}_{N'}(f_2)|$. If $|\mathsf {src}_{N'}(f_1)| = |\mathsf {src}_{N'}(f_2)|$ then by Lemma A.2$f_1 \approx _{N'}^T f_2$, and in particular $f_1 \preceq _{N'}^T f_2$.

Otherwise, $|\mathsf {src}_{N'}(f_2)| - |\mathsf {src}_{N'}(f_1)| = k > 0$. Consider $N''$ where k sources from $\mathsf {src}_{N'}(f_2)$ are removed, and all other claims remain. By the lemma, $f_1 \approx _{N''}^T f_2$. Applying Monotonicity k times we can produce $N'$ from $N''$ and get $f_1 \prec _{N'}^T f_2$ as desired.

For the other implication, suppose $f_1 \preceq _{N'}^T f_2$ and, for contradiction, $|\mathsf {src}_{N'}(f_1)| > |\mathsf {src}_{N'}(f_2)|$. Applying Monotonicity again as above gives $f_1 \succ _{N'}^T f_2$ and the required contradiction. $\square$

Proof of theorem 5.1

Proof

We will show that Voting satisfies Symmetry, Unanimity, Groundedness, Monotonicity, POI, Strong Independence and PCI, and that Coherence is not satisfied. For Symmetry and PCI we use the (stronger) numerical variants numerical Symmetry and numerical PCI, introduced in Sect. 5.2. T will denote the (numerical) Voting operator in what follows.

Symmetry. Suppose N and $\pi (N)$ are equivalent networks. Let $f \in {\mathcal {F}}$. By definition of equivalent networks we have $s \in \mathsf {src}_N(f)$ iff $\pi (s) \in \mathsf {src}_{\pi (N)}(\pi (f))$ for all $s \in {\mathcal {S}}$. Consequently $\pi$ restricted to $\mathsf {src}_N(f)$ is a bijection into $\mathsf {src}_{\pi (N)}(\pi (f))$, and hence

$$\begin{aligned} T_N(f) = |\mathsf {src}_N(f)| = |\mathsf {src}_{\pi (N)}(\pi (f))| = T_{\pi (N)}(\pi (f)) \end{aligned}$$

Now let $s \in {\mathcal {S}}$. Clearly we have $T_N(s) = 1 = T_{\pi (N)}(\pi (s))$. Hence T satisfies numerical Symmetry and therefore Symmetry.

Unanimity and Groundedness. Suppose $N \in {\mathcal {N}}$ and $f \in {\mathcal {F}}$. If $\mathsf {src}_N(f) = {\mathcal {S}}$ then for any $g \in {\mathcal {F}}$,

$$\begin{aligned} T_N(g) = |\mathsf {src}_N(g)| \le |{\mathcal {S}}| = |\mathsf {src}_N(f)| = T_N(f) \end{aligned}$$

so $g \preceq _N^{T} f$ and Unanimity is satisfied. If instead $\mathsf {src}_N(f) = \emptyset$, we have

$$\begin{aligned} T_N(g) = |\mathsf {src}_N(g)| \ge 0 = |\mathsf {src}_N(f)| = T_N(f) \end{aligned}$$

so $f \preceq _N^{T} g$ and Groundedness is satisfied.

Monotonicity. Let $N, N', s$ and f be as given in the statement of Monotonicity. It is clear that $|\mathsf {src}_{N'}(f)| = |\mathsf {src}_N(f)| + 1$. Also, for any $g \in {\mathcal {F}}$, $g \ne f$, the set of sources in $N'$ is the same as in N but with s possibly removed. Hence $|\mathsf {src}_{N'}(g)| \le |\mathsf {src}_N(g)$. Therefore $g \preceq _N^{T} f$ implies

$$\begin{aligned} |\mathsf {src}_{N'}(g)| \le |\mathsf {src}_N(g)| \le |\mathsf {src}_N(f)| < |\mathsf {src}_{N'}(f)| \end{aligned}$$

and so $g \prec _{N'}^{T} f$ as required.

Independence axioms. Next we show Strong Independence, which implies POI. Suppose $N_1, N_2 \in {\mathcal {N}}$, $f_1, f_2 \in {\mathcal {F}}$ and $\mathsf {src}_{N_1}(f_j) = \mathsf {src}_{N_2}(f_j)$ for each $j \in \{1, 2\}$. Clearly we have

$$\begin{aligned} T_{N_1}(f_j) = |\mathsf {src}_{N_1}(f_j)| = |\mathsf {src}_{N_2}(f_j)| = T_{N_2}(f_j) \end{aligned}$$

Consequently

$$\begin{aligned} f_1 \preceq _{N_1}^{T} f_2&\iff T_{N_1}(f_1) \le T_{N_1}(f_2) \\&\iff T_{N_2}(f_1) \le T_{N_2}(f_2) \\&\iff f_1 \preceq _{N_2}^{T} f_2 \end{aligned}$$

as required for Strong Independence.

For PCI we proceed as with Symmetry by showing numerical PCI. Let $N_1, N_2$ have a common connected component G. Let $f \in G \cap {\mathcal {F}}$. By definition of a connected component, $s \in \mathsf {src}_{N_1}(f)$ iff $s \in \mathsf {src}_{N_2}(f)$, so $\mathsf {src}_{N_1}(f) = \mathsf {src}_{N_2}(f)$. Hence

$$\begin{aligned} T_{N_1}(f) = |\mathsf {src}_{N_1}(f)| = |\mathsf {src}_{N_2}(f)| = T_{N_2}(f) \end{aligned}$$

For $s \in G \cap {\mathcal {S}}$, we trivially have $T_{N_1}(s) = 1 = T_{N_1}(s)$. Hence numerical PCI is satisfied.

Coherence. The violation of Coherence follows from Theorem 4.2, since we have already shown that Symmetry, Monotonicity and POI are satisfied. $\square$

Proof of lemma 5.2

Proof

The first statement follows easily from the definition of the limit. We shall prove only the second one.

First we prove the ‘if’ direction. Write $D = T^*_N(f_1) - T^*_n(f_2)$. We need to show that $D < 0$. Write $d_n = T_N^n(f_1) - T_N^n(f_2)$ so that $D = \lim _{n \rightarrow \infty }{d_n}$. Take $\varepsilon = \rho / 2 > 0$. Then for sufficiently large n we have $d_n \le -\rho / 2 < 0$. Taking $n \rightarrow \infty$, we have $D = \lim _{n \rightarrow \infty }{d_n} \le -\rho / 2 < 0$ as required.

For the ‘only if’ direction, suppose $D < 0$. Let $\rho = -D$. Then for any $\varepsilon > 0$, by the definition of the limit there is $K \in {\mathbb {N}}$ such that $|d_n - D| < \varepsilon$ for $n \ge K$; in particular, $d_n < \varepsilon + D = \varepsilon - \rho$ as required. $\square$

Proof of theorem 5.2

The following results will be helpful to simplify the Proof of Theorem 5.2.

Lemma A.3

$\mathsf {norm}$ has the following properties.

$\mathsf {norm}$ preserves numerical Symmetry, in the sense that $\mathsf {norm}(T)$ satisfies numerical Symmetry whenever T does.

$\mathsf {norm}$ leaves rankings unchanged, in the following sense. For $T \in {\mathcal {T}}_{Num}$, $N \in {\mathcal {N}}$, $s_1, s_2 \in {\mathcal {S}}$, $f_1, f_2 \in {\mathcal {F}}$,

$$\begin{aligned} s_1 \sqsubseteq _N^T s_2&\iff s_1 \sqsubseteq _N^{\mathsf {norm}(T)} s_2 \\ f_1 \preceq _N^T f_2&\iff f_1 \preceq _N^{\mathsf {norm}(T)} f_2 \end{aligned}$$

Proof

For part (i), suppose T satisfies numerical Symmetry, and write $T' = U(T)$. Let N and $\pi (N)$ be equivalent networks. First note that

$$\begin{aligned} \max _{x \in {\mathcal {S}}}{|T_N(x)|} = \max _{x \in {\mathcal {S}}}{|T_{\pi (N)}(\pi (x))}| = \max _{x \in {\mathcal {S}}}{|T_{\pi (N)}(x)|} \end{aligned}$$

where the second equality follows since $\pi$ restricted to ${\mathcal {S}}$ is a surjection into ${\mathcal {S}}$ by the definition of equivalent networks. If this maximum is 0, then $T'_N(s)=0=T'_{\pi (N)}(s)$ for all $s \in {\mathcal {S}}$. Otherwise,

$$\begin{aligned} T'_N(s) = \frac{T_N(s)}{\max \limits _{x \in {\mathcal {S}}}{|T_N(x)|}} = \frac{T_{\pi (N)}(\pi (s))}{\max \limits _{x \in {\mathcal {S}}}{|T_{\pi (N)}(x)|}} = T'_{\pi (N)}(\pi (s)) \end{aligned}$$

One can show that $T'_N(f) = T'_{\pi (N)}(\pi (f))$ by an identical argument. Hence $T'=U(T)$ satisfies numerical Symmetry also.

Now we prove part (ii). First suppose $s_1 \sqsubseteq _N^T s_2$. Write $T' = \mathsf {norm}(T)$. We have $T'_N(x) = \alpha T_N(x)$ for some $\alpha \ge 0$ and all $x \in {\mathcal {S}}$ (either $\alpha = 1 / \max _{x \in {\mathcal {S}}}{|T_N(x)|}$ or $\alpha = 0$). We therefore have

$$\begin{aligned} s_1 \sqsubseteq _N^T s_2&\implies T_N(s_1) \le T_N(s_2) \\&\implies \alpha T_N(s_1) \le \alpha T_N(s_2) \\&\implies T'_N(s_1) \le T'_N(s_2) \\&\implies s_1 \sqsubseteq _N^{T'} s_2 \end{aligned}$$

as desired.

Now suppose $s_1 \sqsubseteq _N^{T'} s_2$, i.e. $\alpha T_N(s_1) \le \alpha T_N(s_2)$. If $\alpha > 0$ then dividing by $\alpha$ readily gives $s_1 \sqsubseteq _N^T s_2$. Otherwise, $\alpha = 0$. This means $\max _{x \in {\mathcal {S}}}{|T_N(x)|} = 0$, and thus $T_N(x) = 0$ for all $x \in {\mathcal {S}}$. In particular $T_N(s_1) = 0 \le 0 = T_N(s_2)$ so $s_1 \sqsubseteq _N^T s_2$.

The second statement regarding fact ranking may be shown using an identical argument. $\square$

Corollary A.1

$\mathsf {norm}$ preserves Coherence, Unanimity, Groundedness and PCI.

Proof (Theorem 5.2)

Throughout this proof, $(T^n)_{n \in {\mathbb {N}}}$ will denote the iterative operator Sums, $T^*$ will denote the limit operator, and $U = \mathsf {norm}\circ U^{\text {Sums}}$ will denote the update function for Sums.

Coherence. Source-Coherence was shown in the body of the paper. The proof that Fact-Coherence is satisfied is similar, and uses Lemma 5.3. Suppose $N \in {\mathcal {N}}$, $T = T^n$ for some $n \in {\mathbb {N}}$, $\varepsilon , \rho > 0$, and $\mathsf {src}_N(f_1)$ is $(\varepsilon , \rho )$-less trustworthy than $\mathsf {src}_N(f_2)$ with respect to N and ${\tilde{T}}$ under a bijection $\varphi$, where ${\tilde{T}} = U(T)$. Let ${\hat{s}} \in \mathsf {src}_N(f_1)$ be such that ${\tilde{T}}_N(s) - {\tilde{T}}_N(\varphi (s)) \le \varepsilon - \rho$.

Write $T' = U^{\text {Sums}}(T)$ so that ${\tilde{T}} = \mathsf {norm}(T')$, and set

$$\begin{aligned} \alpha = \frac{1}{\max \limits _{x \in {\mathcal {S}}}{|T'_N(x)|}} \end{aligned}$$

We may assume without loss of generality that $\varepsilon < \frac{1}{|{\mathcal {S}}|}\rho$. Note that for $s \in {\mathcal {S}}$, ${\tilde{T}}_N(s) = {\alpha }T'_N(s)$ and therefore $T'_N(s) = \frac{1}{\alpha }{\tilde{T}}_N(s)$. Writing

$$\begin{aligned} \beta = \frac{1}{\max \limits _{y \in {\mathcal {F}}}{|T'_N(y)|}} \end{aligned}$$

and applying a similar argument as for showing Source-Coherence, we find

$$\begin{aligned} {\tilde{T}}_N(f_1) - {\tilde{T}}_N(f_2)&= \beta \sum _{s \in \mathsf {src}_N(f_1)}{ \Big ( T'_N(s) - T'_N(\varphi (s)) \Big ) } \\&= \frac{\beta }{\alpha } \sum _{s \in \mathsf {src}_N(f_1)}{ \Big ( {\tilde{T}}_N(s) - {\tilde{T}}_N(\varphi (s)) \Big ) } \\&= \frac{\beta }{\alpha } \left[ \underbrace{ \Big ( {\tilde{T}}_N({\hat{s}}) - {\tilde{T}}_N(\varphi ({\hat{s}})) \Big ) }_{\le \varepsilon - \rho } + \sum _{s \in \mathsf {src}_N(f_1) \setminus \{{\hat{s}}\} }{ \underbrace{ \Big ( {\tilde{T}}_N(s) - {\tilde{T}}_N(\varphi (s)) \Big ) }_{\le \varepsilon } } \right] \\&\le \frac{\beta }{\alpha } \cdot \underbrace{ \Big (|{\mathcal {S}}|\varepsilon - \rho \Big ) }_{< 0} \end{aligned}$$

Now we need to bound $\beta / \alpha$ from below. Since we assume $T = T^n$ for some $n \in {\mathbb {N}}$, for any $y \in {\mathcal {F}}$ we have

$$\begin{aligned} |T'_N(y)| = \sum _{s \in \mathsf {src}_N(y)}{ \underbrace{T'_N(s)}_{\le |{\mathcal {F}}|} } \le |\mathsf {src}_N(y)| \cdot |{\mathcal {F}}| \le |{\mathcal {S}}| \cdot |{\mathcal {F}}| \end{aligned}$$

Therefore

$$\begin{aligned} \beta \ge \frac{1}{|{\mathcal {S}}| \cdot |{\mathcal {F}}|} \end{aligned}$$

Next, we claim there is some fact ${\bar{f}} \in {\mathcal {F}}$ with $T_N({\bar{f}}) \ge 1 / 2$ and $\mathsf {src}_N({\bar{f}}) \ne \emptyset$. Indeed, if $T = T^1 = T^{\text {fixed}}$ then take any fact with at least one associated source.¹⁴ Otherwise, since we assume not all scores are 0 in the limit, there is some ${\bar{f}}$ with $T_N({\bar{f}}) = 1$ due to the application of $\mathsf {norm}$. Clearly $\mathsf {src}_N({\bar{f}}) \ne \emptyset$, since we would have $T_N({\bar{f}}) = 0$ otherwise.

Let ${\bar{x}} \in \mathsf {src}_N({\bar{f}})$. Then

$$\begin{aligned} |T'_N({\bar{x}})| = T'_N({\bar{x}}) = \underbrace{T_N({\bar{f}})}_{\ge 1 / 2} + \underbrace{ \sum _{f \in \mathsf {facts}_N({\bar{x}}) \setminus \{{\bar{f}}\}}{ T_N(f) } }_{\ge 0} \ge \frac{1}{2} \end{aligned}$$

This means

$$\begin{aligned} \frac{1}{\alpha } = \max _{x \in {\mathcal {S}}}{|T'_N(x)|} \ge |T'_N({\bar{x}})| \ge \frac{1}{2} \end{aligned}$$

and so, finally,

$$\begin{aligned} \frac{\beta }{\alpha } \ge \frac{1}{|{\mathcal {S}}| \cdot |{\mathcal {F}}|} \cdot \frac{1}{2} \end{aligned}$$

Combined with what was shown before, this means

$$\begin{aligned} {\tilde{T}}_N(f_1) - {\tilde{T}}_N(f_2) \le \frac{1}{2 \cdot |{\mathcal {S}}| \cdot |{\mathcal {F}}|} \Big (|{\mathcal {S}}|\varepsilon - \rho \Big ) \end{aligned}$$

and Fact-Coherence follows from Lemma 5.3.

Symmetry. As a consequence of Lemma 5.4, to show Symmetry it is sufficient to show that $T^{\text {fixed}}$ satisfies numerical Symmetry, and that $U = \mathsf {norm}\circ U^{\text {Sums}}$ preserves numerical Symmetry. Since $T^{\text {fixed}}$ is constant with value 1/2, it is clear that numerical Symmetry is satisfied. Moreover, Lemma A.3 part (i) already shows that $\mathsf {norm}$ preserves numerical Symmetry, so we only need to show that $U^{\text {Sums}}$ does.

To that end, suppose $T \in {\mathcal {T}}_{Num}$ satisfies numerical symmetry, and write $T' = U^{\text {Sums}}(T)$. Let N and $\pi (N)$ be equivalent networks and $s \in {\mathcal {S}}$. Then

$$\begin{aligned} T'_{\pi (N)}(\pi (s))&= \sum _{y \in \mathsf {facts}_{\pi (N)}(\pi (s))}{T_{\pi (N)}(y)} \end{aligned}$$

Note that $f \in \mathsf {facts}_N(s)$ iff $\pi (f) \in \mathsf {facts}_{\pi (N)}(\pi (s))$. Rephrased slightly, we have $y \in \mathsf {facts}_{\pi (N)}(\pi (s))$ iff $\pi ^{-1}(y) \in \mathsf {facts}_N(s)$. Hence we may make a ‘substitution‘ $f = \pi ^{-1}(y)$ and sum over $\mathsf {facts}_N(s)$, i.e.

$$\begin{aligned} T'_{\pi (N)}(\pi (s)) = \sum _{f \in \mathsf {facts}_N(s)}{T_{\pi (N)}(\pi (f))} \end{aligned}$$

Applying numerical symmetry for T, we get

$$\begin{aligned} T'_{\pi (N)}(\pi (s))&= \sum _{f \in \mathsf {facts}_N(s)}{T_N(f)} \\&= T'_N(s) \end{aligned}$$

Following the same tactic, one may also show that $T'_{\pi (N)}(\pi (f)) = T'_N(f)$ for all $f \in {\mathcal {F}}$. Hence $U^{\text {Sums}}$ preserves numerical Symmetry, and we are done.

Unanimity and Groundedness.

Unanimity and Groundedness can be proved together using Lemma 5.5 and corollary A.1. By these results it is sufficient that $T^{\text {fixed}}$ satisfies Unanimity and Groundedness—this is trivial—and that $U^{\text {Sums}}$ preserves them.

Suppose T satisfies Unanimity and Groundedness and write $T' = U^{\text {Sums}}(T)$. Assume without loss of generality that $T = T^n$ for some $n \in {\mathbb {N}}$ so that $T'_N \ge 0$. Suppose $N \in {\mathcal {N}}$, $f \in {\mathcal {F}}$ and that $\mathsf {src}_N(f) = {\mathcal {S}}$. Let $g \in {\mathcal {F}}$. We must show that $g \preceq _N^{T'} f$. We have

$$\begin{aligned} T'_N(g) = \sum _{s \in \mathsf {src}_N(g)}{T'_N(s)} \le \sum _{s \in {\mathcal {S}}}{T'_N(s)} = T'_N(f) \end{aligned}$$

i.e. $g \preceq _N^{T'} f$ as required for Unanimity. For Groundedness, suppose $\mathsf {src}_N(f) = \emptyset$. We must show $f \preceq _N^{T'} g$. Indeed, the sum in the expression for $T'_N(f)$ is taken over the empty set, which by convention is 0. Since $T'_N(g) \ge 0$, we are done. $\square$

Proof of theorem 5.3

Proof

Here we give only the technical details for the argument showing SC-Voting satisfies Symmetry, since the results for the other axioms were given in the main text.

Symmetry. Since Voting satisfies Symmetry, it is clear that $f_1 \preceq _N^{T^{SCV}} f_2$ iff $\pi (f_1) \preceq _{\pi (N)}^{T^{SCV}} \pi (f_2)$ for any equivalent networks N and $\pi (N)$. We need to show that $s_1 \sqsubseteq _N^{T^{SCV}} s_2$ iff $\pi (s_1) \sqsubseteq _{\pi (N)}^{T^{SCV}} \pi (s_2)$.

First we will show that $\mathrel {\lhd }_N$ and $\mathrel {\lhd }_{\pi (N)}$ have a similar symmetry property: $s_1 \mathrel {\lhd }_N s_2$ iff $\pi (s_1) \mathrel {\lhd }_{\pi (N)} \pi (s_2)$. Indeed, suppose $s_1 \mathrel {\lhd }_N s_2$. Then there is a bijection $\varphi : \mathsf {facts}_N(s_1) \rightarrow \mathsf {facts}_N(s_2)$ with $f \preceq _N^{T^{SCV}} \varphi (f)$, and there is some ${\hat{f}}$ with ${\hat{f}} \prec _N^{T^{SCV}} \varphi ({\hat{f}})$.

It can be seen that $\pi$ restricted to $\mathsf {facts}_N(s_i)$ is a bijection into $\mathsf {facts}_{\pi (N)}(\pi (s_i))$. Let $\pi _1$ and $\pi _2$ denote these restrictions for $i=1, 2$ respectively. Set $\theta = \pi _2 \circ \varphi \circ \pi _1^{-1}$, so that $\theta$ maps $\mathsf {facts}_{\pi (N)}(\pi (s_1))$ into $\mathsf {facts}_{\pi (N)}(\pi (s_2))$. As a composition of bijections, $\theta$ is itself bijective.

Let $g \in \mathsf {facts}_{\pi (N)}(\pi (s_1))$. Write $f = \pi _1^{-1}(g) \in \mathsf {facts}_N(s_1)$. By the property of $\varphi$, we have

$$\begin{aligned} f \preceq _N^{T^{SCV}} \varphi (f) \end{aligned}$$

By the symmetry property of the fact-ranking (which follows from symmetry of Voting), we can apply $\pi$ to the above to get

$$\begin{aligned} \pi (f) \preceq _{\pi (N)}^{T^{SCV}} \pi (\varphi (f)) \end{aligned}$$

Since $f \in \mathsf {facts}_N(s_1)$ and $\varphi (f) \in \mathsf {facts}_N(s_2)$, we have $\pi (f) = \pi _1(f)$ and $\pi (\varphi (f)) = \pi _2(\varphi (f))$. Using this fact in the above inequality and recalling $f = \pi ^{-1}(g)$ we get

$$\begin{aligned} g = \pi _1(f) = \pi (f) \preceq _{\pi (N)}^{T^{SCV}} \pi (\varphi (f)) = \pi _2(\varphi (f)) = \pi _2(\varphi (\pi _1^{-1}(g))) = \theta (g) \end{aligned}$$

i.e. $g \preceq _{\pi (N)}^{T^{SCV}} \theta (g)$. Applying the same argument with ${\hat{g}} = \pi _1^{-1}({\hat{f}})$ we get ${\hat{g}} \prec _{\pi (N)}^{T^{SCV}} \theta ({\hat{g}})$.

This shows that $\mathsf {facts}_{\pi (N)}(\pi (s_1))$ is less believable than $\mathsf {facts}_{\pi (N)}(\pi (s_2))$ with respect to SC-Voting (whose fact-ranking coincides with Voting) in $\pi (N)$ under $\theta$. Hence $\pi (s_1) \mathrel {\lhd }_{\pi (N)} \pi (s_2)$.

We have shown $s_1 \mathrel {\lhd }_N s_2 \implies \pi (s_1) \mathrel {\lhd }_{\pi (N)} \pi (s_2)$. For the converse implication, apply the same argument starting from $\pi (s_1) \mathrel {\lhd }_{\pi (N)} \pi (s_2)$ with the $\pi ^{-1}$.

Next, we note that for $i=1, 2$ and any $t \in {\mathcal {S}}$,

$$\begin{aligned} t \in W_N(s_i)&\iff t \mathrel {\lhd }_N s_i \\&\iff \pi (t) \mathrel {\lhd }_{\pi (N)} \pi (s_i) \\&\iff \pi (t) \in W_{\pi (N)}(\pi (s_i)) \end{aligned}$$

Consequently $\pi$ restricted to $W_N(s_i)$ is a bijection into $W_{\pi (N)}(\pi (s_i))$, which means $|W_N(s_i)| = |W_{\pi (N)}(\pi (s_i))|$. Finally, this means

as required for Symmetry. $\square$

Proof of theorem 5.5

Proof

Here we show that UnboundedSums satisfies Symmetry, PCI, Unanimity and Groundedness, since the other axioms were dealt with in the main body of the paper.

Throughout the proof, let $(T^n)_{n \in {\mathbb {N}}}$ denote UnboundedSums, $T^*$ denote the ordinal limit of UnboundedSums, and for a network N let $J_N$ be as in Theorem 5.4. Then the rankings in N induced by $T^n$ for $n \ge J_N$ are the same as $T^*$.

Symmetry. In the Proof of Theorem 5.2, we saw that the update function $U^{\text {Sums}}$ preserves numerical Symmetry, in the sense that if T satisfies numerical Symmetry then $U^{\text {Sums}}(T)$ does also. Since it is clear that the prior operator for UnboundedSums satisfies numerical Symmetry, $T^n$ satisfies numerical Symmetry and consequently Symmetry for all $n \in {\mathbb {N}}$.

Now, let N and $\pi (N)$ be equivalent networks. Let $J, J' \in {\mathbb {N}}$ be such that $T^*(N)$ and $T^*(\pi (N))$ are given by $T_N^J$ and $T_{\pi (N)}^{J'}$ respectively and take $n \ge \max \{J, J'\}$. For $s_1, s_2 \in {\mathcal {S}}$ we have by Symmetry of $T^n$,

$$\begin{aligned} s_1 \sqsubseteq _N^{T^*} s_2&\iff s_1 \sqsubseteq _N^{T^n} s_2 \\&\iff \pi (s_1) \sqsubseteq _{\pi (N)}^{T^n} \pi (s_2) \\&\iff \pi (s_1) \sqsubseteq _{\pi (N)}^{T^*} \pi (s_2) \end{aligned}$$

as required for Symmetry. Using an identical argument, one can show that $f_1 \preceq _N^{T^*} f_2$ iff $\pi (f) \preceq _{\pi (N)}^{T^*} \pi (f_2)$. Hence $T^*$ satisfies Symmetry.

PCI. As with Symmetry, we will show that $T^n$ satisfies numerical PCI, and consequently PCI, for all $n \in {\mathbb {N}}$. Let $N_1, N_2$ be networks with a common connected component G. Let $s \in G \cap {\mathcal {S}}$ and $f \in G \cap {\mathcal {F}}$. Note that $\mathsf {facts}_{N_1}(s) = \mathsf {facts}_{N_2}(s)$ and $\mathsf {src}_{N_1}(f) = \mathsf {src}_{N_2}(f)$ since by definition a source is connected to its facts and vice versa. For $n = 1$ we have

$$\begin{aligned}&T_{N_1}^1(s) = 1 = T_{N_2}^1(s) \\&T_{N_1}^1(f) = |\mathsf {src}_{N_1}(f)| = |\mathsf {src}_{N_2}(f)| = T_{N_2}^1(f) \end{aligned}$$

so $T^1$ has numerical PCI. Supposing $T^n$ has numerical PCI for some $n \in {\mathbb {N}}$, we have

$$\begin{aligned} T_{N_1}^{n+1}(s) = \sum _{g \in \mathsf {facts}_{N_1}(s)}{ \underbrace{T_{N_1}^n(g)}_{=T_{N_2}^n(g)} } = \sum _{g \in \mathsf {facts}_{N_2}(s)}{ T_{N_2}^n(g) } = T_{N_2}^{n+1}(s) \end{aligned}$$

and similarly

$$\begin{aligned} T_{N+1}^{n+1}(f) = T_{N_2}^{n+1}(f) \end{aligned}$$

Hence, by induction, $T^n$ has numerical PCI for all $n \in {\mathbb {N}}$, and we are done.

Unanimity and Groundedness. For Unanimity, suppose $\mathsf {src}_N(f) = {\mathcal {S}}$. For any $g \in {\mathcal {F}}$ and $n \in {\mathbb {N}}$ we have

$$\begin{aligned} T_N^n(g)&= \sum _{s \in \mathsf {src}_N(g)}{T_N^n(s)} \\&\le \sum _{s \in \mathsf {src}_N(g)}{T_N^n(s)} + \sum _{s \in {\mathcal {S}}\setminus \mathsf {src}_N(g)}{T_N^n(s)} \\&= \sum _{s \in {\mathcal {S}}}{T_N^n(s)} \\&= \sum _{s \in \mathsf {src}_N(f)}{T_N^n(s)} \\&= T_N^n(f) \end{aligned}$$

so $g \preceq _N^{T^n} f$ for all $n \in {\mathbb {N}}$. Since the ranking of $T^*$ corresponds to $T^n$ for large n, we have $g \preceq _N^{T^*} f$ as required

For Groundedness, suppose $\mathsf {src}_N(f) = \emptyset$. Then $T_N^n(f) = 0$ for all $n \in {\mathbb {N}}$. For any $g \in {\mathcal {F}}$, we have $T_N^n(g) \ge 0 = T_N^n(f)$. Consequently $f \preceq _N^{T^n} g$ for all $n \in {\mathbb {N}}$. As above, this gives $f \preceq _N^{T^*} g$ as required. $\square$

https://www.nytimes.com/2019/08/19/technology/amazon-orwell-1984.html

We generalise to variable domains in Sect. 6.

For example, when implementing truth discovery algorithms in practise it is common to assign integer IDs to the ‘facts’ and ‘objects’; the algorithm then operates using only the integer IDs. In this case there is no reason to require that fact 17 is always associated with object 4, for example, and the same principle applies in our framework.

If $\max _{x \in {\mathcal {S}}}{|T_N(x)|} = 0$ then the above is ill-defined; we set $T_N'(s) = 0$ for all s in this case. Fact belief scores are defined similarly if the maximum is 0.

One could also consider the weak version, in which we only require $g \preceq _{N'}^T f$ in the consequent; we discuss this in Sect. 7.

Note that collusion has been studied in the truth discovery literature (e.g. [6, 12, 13]).

For example, the Book and Restaurant datasets found at the following web page each contain two connected components: http://lunadong.com/fusionDataSets.htm

Fact-Coherence is vacuously satisfied, however: since all sources rank equally we can never have $\mathsf {src}_N(f_1)$ less trustworthy than $\mathsf {src}_N(f_2)$.

Note that to simplify proof of ordinal convergence we use a different prior operator to Sums, but this does not effect the operator in any significant way.

The argument which shows that the difference between fact belief scores is also eventually constant in sign is almost identical. Write $B = M^{\top }M$, and observe that $w_{n+1} = B^nw_1$. Since B is also symmetric and positive semi-definite, the proof goes through as above.

Note that g ranks higher than f in this network simply because t makes more claims than s, and each fact is claimed only by a single source. This questionable property of UnboundedSums is inherited from Sums.

Indeed, we conjectured in Sect. 5 that the stronger axiom (strict) Monotonicity holds for UnboundedSums. As with Monotonicity, experimental evidence from various starting networks N suggests that Weak Monotonicity is likely to hold.

Wąs and Skibski [40] axiomatise the numerical scores of PageRank, whereas Altman and Tennenholtz [1] axiomatise the resulting ranking. Moreover, Wąs and Skibski point out that Altman and Tennenholtz in fact only consider a simplified version of PageRank called Katz prestige, defined only for strongly connected graphs.

Note that this is always possible since a truth discovery network contains at least one claim by definition.

Altman, A., & Tennenholtz, M. (2005). Ranking systems: The pagerank axioms. In Proceedings of the 6th ACM Conference on Electronic Commerce (pp. 1–8). ACM.

Altman, A., & Tennenholtz, M. (2008). Axiomatic foundations for ranking systems. Journal of Artificial Intelligence Research, 31(1), 473–495.MathSciNetCrossRefMATH

Andersen, R., Borgs, C., Chayes, J., Feige, U., Flaxman, A., Kalai, A., Mirrokni, V., & Tennenholtz, M. (2008). Trust-based recommendation systems: An axiomatic approach. In Proceedings of the 17th International Conference on World Wide Web, WWW ’08 (pp. 199–208). ACM. https://doi.org/10.1145/1367497.1367525.

Arrow, K. J. (1952). Social choice and individual values. Ethics, 62(3), 220–222.CrossRef

Axler, S. (2014). Linear Algebra Done Right. Springer International Publishing.MATH

Balakrishnan, R., & Kambhampati, S. (2011). Sourcerank: Relevance and trust assessment for deep web sources based on inter-source agreement. In Proceedings of the 20th International Conference on World Wide Web (pp. 227–236).

Berti-Equille, L., & Borge-Holthoefer, J. (2015). Veracity of data: From truth discovery computation algorithms to models of misinformation dynamics. Synthesis Lectures on Data Management, 7(3), 1–155.CrossRef

Brandt, F., Conitzer, V., Endriss, U., Lang, J., & Procaccia, A. D. (2016). Introduction to computational social choice. In F. Brandt, V. Conitzer, U. Endriss, J. Lang, & A. D. Procaccia (Eds.), Handbook of Computational Social Choice. Cambridge University Press.CrossRefMATH

Christoff, Z., & Grossi, D. (2017). Binary voting with delegable proxy: An analysis of liquid democracy. In Proc. TARK 2017.

10.

Ding, H., Gao, J., & Xu, J. (2016). Finding global optimum for truth discovery: Entropy based geometric variance. In Proc. 32nd International Symposium on Computational Geometry (SoCG 2016).

11.

Dokow, E., & Holzman, R. (2010). Aggregation of binary evaluations with abstentions. Journal of Economic Theory, 145, 544–561.MathSciNetCrossRefMATH

12.

Dong, X. L., Berti-Equille, L., & Srivastava, D. (2009). Integrating conflicting data: The role of source dependence. Proceedings of the VLDB Endowment, 2(1), 550–561.CrossRef

13.

Dong, X. L., Berti-Equille, L., & Srivastava, D. (2009). Truth Discovery and Copying Detection in a Dynamic World. Proceedings of the VLDB Endowment, 2(1), 562–573. https://doi.org/10.14778/1687627.1687691.CrossRef

14.

Du, Y., Sun, Y. E., Huang, H., Huang, L., Xu, H., Bao, Y., & Guo, H. (2019). Bayesian co-clustering truth discovery for mobile crowd sensing systems. IEEE Transactions on Industrial Informatics. https://doi.org/10.1109/TII.2019.2896287.CrossRef

15.

Endriss, U. (2016). Judgment aggregation. In F. Brandt, V. Conitzer, U. Endriss, J. Lang, & A. D. Procaccia (Eds.), Handbook of Computational Social Choice. Cambridge University Press.MATH

16.

Everaere, P., Konieczny, S., Marquis, P. (2010). The epistemic view of belief merging: Can we track the truth?. In ECAI (pp. 621–626).

17.

Galland, A., Abiteboul, S., Marian, A., & Senellart, P. (2010). Corroborating information from disagreeing views. In Proceedings of the Third ACM International Conference on Web Search and Data Mining, WSDM ’10 (pp. 131–140). ACM. https://doi.org/10.1145/1718487.1718504.

18.

Ghosh, A., Kale, S., & McAfee, P. (2011). Who moderates the moderators?. In Proceedings of the 12th ACM Conference on Electronic Commerce—EC 11. ACM Press . https://doi.org/10.1145/1993574.1993599.

19.

Gupta, M., & Han, J. (2011). Heterogeneous network-based trust analysis: A survey. ACM SIGKDD Explorations Newsletter, 13(1), 54–71. https://doi.org/10.1145/2031331.2031341.CrossRef

20.

Hartmann, S., & Sprenger, J. (2012). Judgment aggregation and the problem of tracking the truth. Synthese, 187, 209–221.MathSciNetCrossRefMATH

21.

Kleinberg, J. M. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 604–632. https://doi.org/10.1145/324133.324140.MathSciNetCrossRefMATH

22.

Konieczny, S., & Pérez, R. P. (2002). Merging information under constraints: A logical framework. Journal of Logic and computation, 12(5), 773–808.MathSciNetCrossRefMATH

23.

Konieczny, S., & Pérez, R. P. (2008). Improvement operators. In G. Brewka, J. Lang (Eds.), Principles of Knowledge Representation and Reasoning: Proceedings of the Eleventh International Conference, KR 2008, Sydney, Australia, 16–19 Sept, 2008 (pp. 177–187). AAAI Press. http://www.aaai.org/Library/KR/2008/kr08-018.php.

24.

Kotonya, N., & Toni, F. (2020). Explainable automated fact-checking for public health claims. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 7740–7754).

25.

Kruger, J., Endriss, U., Fernandez, R., & Qing, C. (2014). Axiomatic analysis of aggregation methods for collective annotation. In Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems, AAMAS ’14 (pp. 1185–1192). International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC. http://dl.acm.org/citation.cfm?id=2617388.2617437.

26.

Laslier, J. F., & Sanver, M. R. (2010). Handbook on Approval Voting. Springer Science & Business Media.CrossRefMATH

27.

Li, Y., Gao, J., Meng, C., Li, Q., Su, L., Zhao, B., et al. (2016). A survey on truth discovery. ACM SIGKDD Explorations Newsletter, 17(2), 1–16. https://doi.org/10.1145/2897350.2897352.CrossRef

28.

Li, Y., Li, Q., Gao, J., Su, L., Zhao, B., Fan, W., & Han, J. (2016). Conflicts to harmony: A framework for resolving conflicts in heterogeneous data by truth discovery. IEEE Transactions on Knowledge and Data Engineering, 28(8), 1986–1999. https://doi.org/10.1109/TKDE.2016.2559481CrossRef

29.

Ma, F., Li, Y., Li, Q., Qiu, M., Gao, J., Zhi, S., Su, L., Zhao, B., Ji, H., & Han, J. (2015). FaitCrowd: Fine Grained Truth Discovery for Crowdsourced Data Aggregation. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’15 (pp. 745–754). ACM. https://doi.org/10.1145/2783258.2783314. Event-place: Sydney, NSW, Australia.

30.

Ma, F., Meng, C., Xiao, H., Li, Q., Gao, J., Su, L., & Zhang, A. (2017). Unsupervised discovery of drug side-effects from heterogeneous data sources. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’17 (pp. 967–976). ACM. https://doi.org/10.1145/3097983.3098129.

31.

Marshall, J., Argueta, A., & Wang, D. (2017). A neural network approach for truth discovery in social sensing. In 2017 IEEE 14th International Conference on Mobile Ad Hoc and Sensor Systems (MASS) (pp. 343–347). IEEE.

32.

Meir, R., Amir, O., Cohensius, G., Ben-Porat, O., & Xia, L. (2019). Truth discovery via proxy voting. CoRR. arXiv:abs/1905.00629.

33.

Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank Citation Ranking: Bringing Order to the Web. Technical Report 1999-66, Stanford InfoLab. http://ilpubs.stanford.edu:8090/422/.

34.

Pasternack, J., & Roth, D. (2010). Knowing what to believe (when you already know something). In Proceedings of the 23rd International Conference on Computational Linguistics, COLING ’10(pp. 877–885). Association for Computational Linguistics, Stroudsburg, PA, USA. http://dl.acm.org/citation.cfm?id=1873781.1873880.

35.

Singleton, J., & Booth, R. (2020). An axiomatic approach to truth discovery. In Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS ’20 (pp. 2011–2013). International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC.

36.

Tennenholtz, M. (2004). Reputation systems: An axiomatic approach. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, UAI ’04 (pp. 544–551). AUAI Press, Arlington, Virginia, United States. http://dl.acm.org/citation.cfm?id=1036843.1036909.

37.

Waguih, D. A., & Berti-Equille, L. (2014). Truth discovery algorithms: An experimental evaluation. arXiv preprint. arXiv:1409.6428.

38.

Wang, D., Kaplan, L., Le, H., & Abdelzaher, T. (2012). On truth discovery in social sensing: A maximum likelihood estimation approach. In Proceedings of the 11th International Conference on Information Processing in Sensor Networks, IPSN ’12 (pp. 233–244). ACM. https://doi.org/10.1145/2185677.2185737. Event-place: Beijing, China.

39.

Wang, Y., Ma, F., Jin, Z., Yuan, Y., Xun, G., Jha, K., Su, L., & Gao, J. (2018). Eann: Event adversarial neural networks for multi-modal fake news detection. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 849–857).

40.

Wąs, T., & Skibski, O. (2020). Axiomatic characterization of pagerank. https://doi.org/10.48550/ARXIV.2010.08487. arXiv:2010.08487.

41.

Xiao, H. (2018). Multi-sourced information trustworthiness analysis: Applications and theory. Ph.D. thesis, University at Buffalo, State University of New York.

42.

Xiao, H., Gao, J., Wang, Z., Wang, S., Su, L., & Liu, H. (2016). A truth discovery approach with theoretical guarantee. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16 (pp. 1925–1934). ACM. https://doi.org/10.1145/2939672.2939816.

43.

Yang, Y., Bai, Q., & Liu, Q. (2018). On the discovery of continuous truth: A semi-supervised approach with partial ground truths. In Web Information Systems Engineering—WISE 2018 (pp. 424–438). Springer International Publishing. https://doi.org/10.1007/978-3-030-02922-7_29.

44.

Yang, Y., Bai, Q., & Liu, Q. (2019). A probabilistic model for truth discovery with object correlations. Knowledge-Based Systems, 165, 360–373. https://doi.org/10.1016/j.knosys.2018.12.004.CrossRef

45.

Yin, X., Han, J., & Yu, P. S. (2008). Truth discovery with multiple conflicting information providers on the web. IEEE Transactions on Knowledge and Data Engineering, 20(6), 796–808. https://doi.org/10.1109/TKDE.2007.190745CrossRef

46.

Yin, X., & Tan, W. (2011). Semi-supervised truth discovery. In Proceedings of the 20th International Conference on World Wide Web, WWW ’11 (pp. 217–226). ACM. https://doi.org/10.1145/1963405.1963439. Event-place: Hyderabad, India.

47.

Zhang, D. Y., Han, R., Wang, D., & Huang, C. (2016-12). On robust truth discovery in sparse social media sensing. In 2016 IEEE International Conference on Big Data (Big Data) (pp. 1076–1081). https://doi.org/10.1109/BigData.2016.7840710.

48.

Zhang, L., Qi, G. J., Zhang, D., & Tang, J. (2018). Latent dirichlet truth discovery: Separating trustworthy and untrustworthy components in data sources. IEEE Access, 6, 1741–1752. https://doi.org/10.1109/ACCESS.2017.2780182CrossRef

49.

Zhi, S., Zhao, B., Tong, W., Gao, J., Yu, D., Ji, H., & Han, J. (2015). Modeling truth existence in truth discovery. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’15 (pp. 1543–1552). ACM. https://doi.org/10.1145/2783258.2783339.

50.

Zwicker, W. S. (2016). Introduction to the theory of voting. In F. Brandt, V. Conitzer, U. Endriss, J. Lang, & A. D. Procaccia (Eds.), Handbook of Computational Social Choice. Cambridge University Press.

Title: Towards an axiomatic approach to truth discovery
Authors: Joseph Singleton
Richard Booth
Publication date: 01-10-2022
Publisher: Springer US
Published in: Autonomous Agents and Multi-Agent Systems / Issue 2/2022
Print ISSN: 1387-2532
Electronic ISSN: 1573-7454
DOI: https://doi.org/10.1007/s10458-022-09569-3

	Voting	SC-voting	Sums	U-sums
Source-coherence	X	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
Fact-coherence	\(\checkmark\)	X	\(\checkmark\)	\(\checkmark\)
Symmetry	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
Unanimity	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
Ground.	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
Mon.	\(\checkmark\)	\(\checkmark\)	X	?
Source-PCI	\(\checkmark\)	X	X	\(\checkmark\)
Fact-PCI	\(\checkmark\)	\(\checkmark\)	X	\(\checkmark\)
POI	\(\checkmark\)	\(\checkmark\)	X	X
Str. indep.	\(\checkmark\)	\(\checkmark\)	X	X

Springer Professional

Towards an axiomatic approach to truth discovery

Abstract

Publisher's Note

1 Introduction

2 An idealised framework for truth discovery

2.1 Truth discovery networks

2.2 Truth discovery operators

3 Examples of truth discovery operators

3.1 Voting

3.2 Sums

3.3 Recursive truth discovery operators

4 Axioms for truth discovery

4.1 Coherence

4.2 Symmetry

4.3 Fact ranking axioms

4.4 Independence axioms

5 Satisfaction of the axioms

5.1 Voting

5.2 Iterative and recursive operators

5.3 Sums

5.4 Modifying Voting and Sums

5.4.1 Voting

5.4.2 Sums

6 Variable domain truth discovery

7 Discussion

9 Conclusion

Declarations

Conflict of interest

Publisher's Note

Proofs

Proof of theorem 4.1

Proof of theorem 4.3

Proof of theorem 5.1

Proof of lemma 5.2

Proof of theorem 5.2

Proof of theorem 5.3

Proof of theorem 5.5

Premium Partner

Springer Professional

Abstract

Publisher's Note

1 Introduction

2 An idealised framework for truth discovery

2.1 Truth discovery networks

2.2 Truth discovery operators

3 Examples of truth discovery operators

3.1 Voting

3.2 Sums

3.3 Recursive truth discovery operators

4 Axioms for truth discovery

4.1 Coherence

4.2 Symmetry

4.3 Fact ranking axioms

4.4 Independence axioms

5 Satisfaction of the axioms

5.1 Voting

5.2 Iterative and recursive operators

5.3 Sums

5.4 Modifying Voting and Sums

5.4.1 Voting

5.4.2 Sums

6 Variable domain truth discovery

7 Discussion

8 Related work

9 Conclusion

Declarations

Conflict of interest

Publisher's Note

Proofs

Proof of theorem 4.1

Proof of theorem 4.3

Proof of theorem 5.1

Proof of lemma 5.2

Proof of theorem 5.2

Proof of theorem 5.3

Proof of theorem 5.5

Other articles of this Issue 2/2022

Guest editorial: special issue on fair division

Generalized kings and single-elimination winners in random tournaments

Minimality and comparison of sets of multi-attribute vectors

Topological influence and locality in swap schelling games

A framework for the comparison of agent-based models

On-line estimators for ad-hoc task execution: learning types and parameters of teammates for effective teamwork

Premium Partner