We propose a covering type solution to select a subset of modifications to explain observed Hi–C interactions between restriction sites. We assume each Hi–C interaction to be covered by at least one modification pair. Similar covering problems have been studied in primer selection and haplotyping (Halldórsson et al.
2004), [24]. Let
\(x_{mn}\) be a binary variable taking value 1 when modification
m interacts with modification
n, and let
\(y_{m}\) be a binary variable taking value 1 when modification
m is in the solution. Without loss of generality, we assume each restriction site to have at least a single modification. Otherwise, we remove restriction sites to which there is no mapped modification. We assume that two histone modifications can interact only when both modifications individually belong to the solution. The resulting Program (
1)–(
5) is defined as follows:
$$\begin{aligned}&\mathrm{argmin}_{Y}\,\sum _{m \in M}\,w_{m}y_{m} \end{aligned}$$
(1)
$$\begin{aligned}&\text {s.t. }\;\; \sum _{(m, c^{u}_{m}) \in H[u]}\,\left( \sum _{(n, c^{v}_{n}) \in H[v]} \,x_{mn}\right) \quad \ge 1,\quad (u, v) \in E \end{aligned}$$
(2)
$$\begin{aligned}&\quad \;\;\; x_{mn} \le y_{m}, \nonumber \\&\quad \;\;\; x_{mn} \le y_{n},\quad m \le n \in M^{2} \end{aligned}$$
(3)
$$\begin{aligned}&\quad \;\;\; x_{mn} \ge 0,\quad \;\, m \le n \in M^{2} \end{aligned}$$
(4)
$$\begin{aligned}&\quad \;\;\; y_{m} \ge 0,\quad \;\;\; m \in M \end{aligned}$$
(5)
where
\(w_{m}\) is the cost of adding modification
m to the solution. We define
\(w_{m}\) as
$$\begin{aligned} w_{m} = \frac{\sum _{(u, v) \notin E}\,\min (c^{u}_{m}, c^{v}_{m})}{\left( {\begin{array}{c}R\\ 2\end{array}}\right) - |E|} \end{aligned}$$
(6)
which increases if
m exists highly across non-interacting restriction sites. This heuristic weighting scheme penalizes the modifications that are also seen across non-interacting sites which cannot be penalized by the unweighted problem formulation. Constraint (
2) ensures that each interaction is covered by at least one modification pair existing in the corresponding sites, and constraint (
3) ensures that a modification pair can cover an interaction only when both histone modifications belong to the solution. Since interactions are independent sets of modification pairs, we can replace constraints (
3) by the following stronger set of constraints:
$$\begin{aligned} \sum _{(n, c^{u}_{n}) \in H[u]}\,x_{mn}\,+\,\sum _{(n, c^{v}_{n}) \in H[v]}\,x_{mn} \le y_{m},\quad m \in M, \, (u, v) \in E \end{aligned}$$
(7)
Let
\(Q = \max _{(u,v) \in E}(|H[u]|\,|H[v]|)\) be maximum size of histone modification pairs that can cover an interaction, C
hromatinC
overage is NP-hard, and Program (
1)–(
5) with the replaced constraint can be approximated by
\(O(\sqrt{Q\log (|E|)})\) as in Theorem
1 which follows from approximation-preserving reduction to
Minimum Weight Multicolored Subgraph Problem (MWMCSP) ( (Hajiaghayi et al.
2006). This is achieved by solving its LP relaxation and running a randomized rounding, adding each modification
m to the solution with probability
\(y_{m}\). If constraints are still not satisfied after rounding, we keep adding
\(y_{m}\) with the maximum number of satisfied constraints increase per unit cost (
\(w_{m}\)) until solution is satisfied. Problem is of polynomial size in the order of variables and constraints; the number of variables is
\(\frac{M(M+1)}{2} + M\), number of constraints is
\(E + ME\).