2.1 Energy Minimization
Most works on group synchronization require minimizing an energy function. We first describe a general energy minimization framework for group synchronization and then review relevant previous works. This framework uses a metric
\(d_{\mathcal {G}}\) defined on
\({\mathcal {G}}\) and a function
\(\rho \) from
\({\mathbb {R}}_+^{|E|}\) to
\({\mathbb {R}}_+\). We remark that
\({\mathbb {R}}_+\) denotes the set of nonnegative numbers and
\(|\cdot |\) denotes the cardinality of a set. The general framework aims to solve
$$\begin{aligned} \min _{g_i\in {\mathcal {G}}} \rho \left( \left( d_\mathcal {G}(g_{ij},g_ig_j^{-1})\right) _{ij\in E} \right) . \end{aligned}$$
(4)
Natural examples of
\(\rho \) include the sum of
pth powers of elements, where
\(p>0\), the number of nonzero elements, and the maximal element.
The elements of
\({\mathbb {Z}}_2\),
\(S_m\) and
SO(
d) (that is, the most common groups that arise in synchronization problems) can be represented by orthogonal matrices with sizes
\(N=1\),
m and
d, respectively. For these groups, it is common to identify each
\(g_i\),
\(i \in [n]\), with its representing matrix, choose
\(d_\mathcal {G}\) as the Frobenius norm of the difference of two group elements (that is, their representing matrices),
\(\rho (\cdot )= \Vert \cdot \Vert _{\nu }^{\nu }\), where
\(\nu =2\) or
\(\nu =1\), and consider the following minimization problem
$$\begin{aligned} \min _{g_i\in {\mathcal {G}}} \sum _{ij\in E} \Vert g_ig_j^{-1}-g_{ij}\Vert _F^\nu . \end{aligned}$$
(5)
The best choice of
\(\nu \) depends on
\({\mathcal {G}}\), the underlying noise and the corruption model. For Lie groups,
\(\nu =2\) is optimal under Gaussian noise, and
\(\nu =1\) is more robust to outliers (i.e., robust to significantly corrupted group ratios). For some examples of discrete groups, such as
\({\mathbb {Z}}_2\) and
\(S_N\),
\(\nu =2\) is information-theoretically optimal for both Gaussian noise and uniform corruption.
For
\(\nu =2\), one can form an equivalent formulation of (
5). It uses the block matrix
\(\varvec{Y}\in \mathbb {R}^{nN\times nN}\), where
\(\varvec{Y}_{ij}=g_{ij}\) if
\(ij\in E\) and
\(\varvec{Y}_{ij}=\varvec{0}_{N\times N}\), otherwise. Its solution is a block matrix
\(\varvec{X}\in {\mathbb {R}}^{nN\times nN}\), whose [
i,
j]-th block,
\(i,j \in [n]\), is denoted by
\(\varvec{X}_{ij}\). It needs to satisfy
\(\varvec{X}_{ij} = g_ig_j^T\), where
\(\{g_i\}_{i \in [n]}\) is the solution of (
5), or equivalently,
\(\varvec{X}=\varvec{x}\varvec{x}^T\), where
\(\varvec{x}=(g_i)_{i\in [n]}\in {\mathbb {R}}^{nN\times N}\). In order to obtain this,
\(\varvec{X}\) needs to be positive semi-definite of rank
N and its blocks need to represent elements of
\({\mathcal {G}}\), where the diagonal ones need to be the identity matrices. For
SO(2), it is more convenient to represent
\(g_i\) and
\(g_{ij}\) by elements of
U(1), the unit circle in
\({\mathbb {C}}\), and thus replace
\(\mathbb {R}^{2n\times 2}\) and
\({\mathbb {R}}^{2n\times 2n}\) with
\(\mathbb {C}^{n\times 1}\) and
\({\mathbb {C}}^{n\times n}\). Using these components, the equivalent formulation can be written as
$$\begin{aligned} \begin{array} { c l } { \underset{ \varvec{X}\in {\mathbb {R}} ^ { nN \times nN }}{ {\text { max }} } } &{} { {\text { Tr }} ( \varvec{X}^T\varvec{Y}) } \\ { \text{ subject } \text{ to } } &{} {\{\varvec{X}_{ij}\}_{i,j=1}^n\subset {\mathcal {G}}}\\ { } &{} { \varvec{X}_ { i i } = \varvec{I}_{N\times N} , i = 1 , \ldots , n } \\ { } &{} { \varvec{X}\succeq \varvec{0} } \\ { } &{} { \text {rank}\,(\varvec{X})=N.} \end{array} \end{aligned}$$
(6)
The above formulation is commonly relaxed by removing its two nonconvex constraints: rank
\((\varvec{X})=N\) and
\(\{\varvec{X}_{ij}\}_{i,j=1}^n\subset {\mathcal {G}}\). The solution
\({\hat{\varvec{X}}}\) of this relaxed formulation can be found by an SDP solver. One then commonly computes its top
N eigenvectors and stacks them as columns to obtain the
\(n \times 1\) vector of
\(N \times N\) blocks,
\({\tilde{\varvec{x}}}\) (note that
\({\tilde{\varvec{x}}}{\tilde{\varvec{x}}}^T\) is the best rank-N approximation of
\({\hat{\varvec{X}}}\) in Frobenius norm). Next, one projects each of the
N blocks of
\({\tilde{\varvec{x}}}\) (of size
\(N\times N\)) onto
\({\mathcal {G}}\). This whole procedure, which we refer to in short as SDP, is typically slow to implement [
41]. A faster common method, which we refer to as Spectral, applies a similar procedure while ignoring all constraints in (
6). In this case, the highly relaxed solution of (
6) is
\({\hat{\varvec{X}}}: = \varvec{Y}\) and one only needs to find its top
N eigenvectors and project their blocks on the group elements [
41].
Formulation (
6) and its SDP relaxation first appeared in the celebrated work of Goemans and Williamson [
18] on the max-cut problem. Their work can be viewed as a formulation for solving
\({\mathbb {Z}}_2\) synchronization. Amit Singer [
41] proposed the generalized formulation and its relaxed solutions for group synchronization, in particular, for angular synchronization.
The exact recovery for
\({\mathbb {Z}}_2\) synchronization is studied in [
2,
5] by assuming an Erdős-Rényi graph, where each edge is independently corrupted with probability
\(q<1/2\). Abbe et al. [
2] specified an information-theoretic lower bound on the average degree of the graph in terms of
q. Bandeira [
5] established asymptotic exact recovery for SDP for
\({\mathbb {Z}}_2\) synchronization w.h.p. (with high probability) under the above information-theoretic regime. Montanari and Sen [
32] studied the detection of good edges, instead of their recovery, under i.i.d. additive Gaussian noise.
Asymptotic exact recovery for convex relaxation methods of permutation synchronization appears in [
10,
35]. In [
35], noise is added to the relative permutations in
\(S_N\). The permutations are represented by
\(N \times N\) matrices, and the elements of the additive
\(N \times N\) noise matrix are i.i.d.
\(N(0,\eta ^2)\). In this setting, exact recovery can be guaranteed when
\(\eta ^2<{(n/N)}/{(1+4(n/N)^{-1})}\) as
\(nN\rightarrow \infty \). An SDP relaxation, different from (
6), is proposed in [
10,
22]. It is shown in [
22] that for fixed
N and probability of corruption less than 0.5, their method exactly recovers the underlying permutations w.h.p. as
\(n\rightarrow \infty \). We remark that [
22] assumes element-wise corruption of permutation matrices which is different from ours. An improved theoretical result is given by Chen et al. [
10], which matches the information-theoretic bound.
Rotation synchronization has been extensively studied [
3,
8,
19,
21,
30,
45]. In order to deal with corruption, it is most common to use
\(\ell _1\) energy minimization [
8,
21,
45]. For example, Wang and Singer formulated a robust
SO(
d) synchronization, for any
\(d \ge 2\), as the solution of (
5) with
\(\nu =1\) and
\({\mathcal {G}} = SO(d)\). Inspired by the analysis of [
27,
48], they established asymptotic and probabilistic exact recovery by the solution of their minimization problem under the following very special probabilistic model: The graph is complete or even Erdős-Rényi, the corruption model for edges is Bernoulli with corruption probability less than a critical probability
\(p_c\) that depends on
d, and the corrupted rotations are i.i.d. sampled from the Haar distribution on
SO(
d). They proposed an alternating direction augmented Lagrangian method for practically solving their formulation, but their analysis only applies to the pure minimizer.
A somewhat similar problem to group synchronization is camera location estimation [
20,
33,
34,
38]. It uses the noncompact group
\({\mathbb {R}}^3\) with vector addition, and its input includes possibly corrupted measurements of
\(\{Tg_{ij}^*\}_{ij\in E}\), where
\(T(g_{ij}^*)=g_{ij}^*/\Vert g_{ij}^*\Vert \) and
\(\Vert \cdot \Vert \) denotes the Euclidean norm. The application of
T distorts the group structure and may result in loss of information.
For this problem, other forms of energy minimization have been proposed, which often differ from the framework in (
5). The first exact recovery result for a specific energy minimization algorithm was established by Hand, Lee and Voroniski [
20]. The significance of this work is in the weak assumptions of the corruption model, whereas in the previously mentioned works on exact recovery [
2,
5,
12,
22,
35,
45], the corrupted group ratios followed very specific probability distributions. More specifically, the main model in [
20] assumed an Erdős-Rényi graph
G([
n],
E) with parameter
p for connecting edges and an arbitrary corrupted set of edges
\(E_b\), whose corruption is quantified by the maximal degree of
\(G([n],E_b)\) divided by
n, which is denoted by
\(\epsilon _b\). The transformed group ratios,
\(T(g_{ij})\), are
\(T(g_{ij}^*)\) for
\(ij \in E_g\) and are arbitrarily chosen in
\(S^2\), the unit sphere, for
\(ij \in E_b\). They established exact recovery under this model with
\(\epsilon _b=O(p^5/\log ^3 n)\). A similar exact recovery theory for another energy minimization algorithm, namely the least unsquared deviations (LUD) [
33], was established by Lerman, Shi and Zhang [
28], but with the stronger corruption bound,
\(\epsilon _b=O(p^{7/3}/\log ^{9/2} n)\).
Huang et al. [
23] solved an
\(\ell _1\) formulation for 1D translation synchronization, where
\({\mathcal {G}}={\mathbb {R}}\) with regular addition. They proposed a special version of IRLS and provided a deterministic exact recovery guarantee that depends on
\(\epsilon _b\) and a quantity that uses the graph Laplacian.
2.2 Synchronization Methods Based on Cycle Consistency
Previous methods that use the cycle consistency constraint in (
2) only focus on synchronizing camera rotations. Additional methods use a different cycle consistency constraint to synchronize camera locations. Assuming that
\({\mathcal {G}}\) lies in a metric space with a metric
\(d_{{\mathcal {G}}}(\cdot \,,\cdot )\), the corruption level in a cycle
L can be indicated by the cycle inconsistency measure
\( d_{{\mathcal {G}}}(g_L\,, e_{{\mathcal {G}}}) \), where
\(g_L\) was defined in (
3). There exist few works that exploit such information to identify and remove the corrupted edges. A likelihood-based method [
47] was proposed to classify the corrupted and uncorrupted edges (relative camera motion) from observations
\(d_{{\mathcal {G}}}(g_L\,, e_{{\mathcal {G}}})\) of many sampled
L’s. This work has no theoretical guarantees. It seeks to solve the following problem:
$$\begin{aligned} \max _{x_{ij}\in \{0,1\}}\prod _{L}\Pr \left( \{x_{ij}\}_{ij\in L}|d_{{\mathcal {G}}}(g_L\,, e_{{\mathcal {G}}})\right) . \end{aligned}$$
(7)
The variables
\(\{x_{ij}\}_{ij \in E}\) provide the assignment of edge
ij in the sense that
\(x_{ij}={\mathbf {1}}_{\{ij\in E_g\}}\), where
\({\mathbf {1}}\) denotes the indicator function. One of the proposed solutions in [
47] is a linear programming relaxation of (
7). The other proposed solution of (
7) uses belief propagation. It is completely different from the message passing approach proposed in this work.
Shen et al. [
37] find a cleaner subset of edges by searching for consistent cycles. In particular, if a cycle
L of length
m satisfies
\(d_{{\mathcal {G}}}(g_L\,, e)<\epsilon /\sqrt{m}\), then all the edges in the cycle are treated as uncorrupted. However, this approach lacks any theoretical guarantees and may fail in various cases. For example, the case where edges are maliciously corrupted and some cycles with corrupted edges satisfy
\(d_{\mathcal {G}}(g_L\,, e)<\epsilon /\sqrt{m}\).
An iterative reweighting strategy, referred to as IR-AAB, was proposed in [
38] to identify corrupted pairwise directions when estimating camera locations. Experiments on synthetic data showed that IR-AAB was able to detect exactly the set of corrupted pairwise directions that were uniformly distributed on
\(S^2\) with low or medium corruption rate. However, this strategy was only restricted to camera location estimation and no exact recovery guarantees were provided for the reweighting algorithm. We remark that our current work is a generalization of [
38] to compact group synchronization problems. We also provide a message-passing interpretation for the ideas of [
38] and stronger mathematical guarantees in our context, but we do not address here the camera location estimation problem.
2.3 Message Passing Algorithms
Message passing algorithms are efficient methods for statistical inference on graphical models. The most famous message passing algorithm is belief propagation (BP) [
46]. It is an efficient algorithm for solving marginal distribution or maximizing the joint probability density of a set of random variables that are defined on a Bayesian network. The joint density and the corresponding Bayesian network can be uniquely described by a factor graph that encodes the dependencies of factors on the random variables. In particular, each factor is considered as a function of a small subset of random variables and the joint density is assumed as the product of these factors. The BP algorithm passes messages between the random variables and factors in the factor graph. When the factor graph is a tree, BP is equivalent to dynamic programming and can converge in finite iterations. However, when the factor graph contains loops, BP has no guarantee of convergence and accuracy. The BP algorithm is applied in [
47] to solve the maximal likelihood problem (
7). However, since the factor graph defined in [
47] contains many loops, there are no convergence and accuracy guarantees of the solution.
Another famous class of message passing algorithms is approximate message passing (AMP) [
14,
36]. AMP can be viewed as a modified version of BP, and it is also used to compute marginal distribution and maximal likelihood. The main advantage of AMP over BP is that it enjoys asymptotic convergence guarantees even on loopy factor graphs. AMP was first proposed by Donoho, Maleki and Montanari [
14] to solve the compressed sensing problem. They formulated the convex program for this problem as a maximal likelihood estimation problem and then solved it by AMP. Perry et al. [
36] apply AMP to group synchronization over any compact group. However, they have no corruption and only assume additive i.i.d. Gaussian noise model, where they seek an asymptotic solution that is statistically optimal.
Another message passing algorithm [
12] was proposed for
\({\mathbb {Z}}_2\) synchronization. It assigns probabilities of correct labeling to each node and each edge. These probabilities are iteratively passed and updated between nodes and edges until convergence. There are several drawbacks of this method. First of all, it cannot be generalized to other group synchronization problems. Second, its performance is worse than SDP under high corruption [
12]. At last, no theoretical guarantee of exact recovery is established. We remark that this method is completely different from the method proposed here.