Irreversible samplers from jump and continuous Markov processes

Abstract

In this paper, we propose irreversible versions of the Metropolis–Hastings (MH) and Metropolis-adjusted Langevin algorithm (MALA) with a main focus on the latter. For the former, we show how one can simply switch between different proposal and acceptance distributions upon rejection to obtain an irreversible jump sampler (I-Jump). The resulting algorithm has a simple implementation akin to MH, but with the demonstrated benefits of irreversibility. We then show how the previously proposed MALA method can also be extended to exploit irreversible stochastic dynamics as proposal distributions in the I-Jump sampler. Our experiments explore how irreversibility can increase the efficiency of the samplers in different situations.


References

  • Andrieu, C., Thoms, J.: A tutorial on adaptive MCMC. Stat. Comput. 18, 343–373 (2008)

  • Bardenet, R., Doucet, A., Holmes, C.: On Markov chain Monte Carlo methods for tall data. arXiv:1505.02827 (2015)

  • Bardenet, R., Doucet, A., Holmes, C.: Towards scaling up Markov chain Monte Carlo: an adaptive subsampling approach. In: Proceedings of the 31st International Conference on Machine Learning (ICML’14) (2014)

  • Barp, A., Briol, F.-X., Kennedy, A. D., Girolami, M.: Geometry and dynamics for Markov chain Monte Carlo. arXiv:1705.02891 (2017)

  • Bartlett, M.S.: Smoothing periodograms from time-series with continuous spectra. Nature 161, 686–687 (1948)

  • Bierkens, J., Fearnhead, P., Roberts, G.: The Zig-Zag process and super-efficient sampling for Bayesian analysis of big data. arXiv:1607.03188 (2016)

  • Bierkens, J., Roberts, G.: A piecewise deterministic scaling limit of Lifted Metropolis–Hastings in the Curie-Weiss model. arXiv:1509.00302 (2016)

  • Bierkens, J.: Non-reversible Metropolis–Hastings. Stat. Comput. 26, 1–16 (2015)

  • Bouchard-Côté, A., Vollmer, S.J., Doucet, A.: The bouncy particle sampler: A non-reversible rejection-free Markov chain Monte Carlo method. arXiv:1510.02451 (2016)

  • Bou-Rabee, N., Owhadi, H.: Long-run accuracy of variational integrators in the stochastic context. SIAM J. Numer. Anal. 48, 278–297 (2010)

  • Chen, C., Ding, N., Carin, L.: On the convergence of stochastic gradient MCMC algorithms with high-order integrators. In: Advances in Neural Information Processing Systems 28 (NIPS’15), pp. 2278–2286 (2015)

  • Chen, T., Fox, E. B., Guestrin, C.: Stochastic gradient Hamiltonian Monte Carlo. In: Proceedings of the 31st International Conference on Machine Learning (ICML’14) (2014)

  • Chen, F., Lovász, L., Pak, I.: Lifting Markov chains to speed up mixing. In: Proceedings of the 31st Annual ACM Symposium on Theory of Computing (STOC’99), pp. 275–281 (1999)

  • Chen, T.-L., Hwang, C.-R.: Accelerating reversible Markov chains. Stat. Probab. Lett. 83(9), 1956–1962 (2013)

  • Chib, S., Greenberg, E.: Understanding the Metropolis-Hastings algorithm. Am. Stat. 49(4), 327–335 (1995)

  • Crooks, G.: Entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences. Phys. Rev. E 60, 2721–2726 (1999)

  • Dembo, A., Deuschel, J.-D.: Markovian perturbation, response and fluctuation dissipation theorem. Ann. Inst. H. Poincaré Probab. Stat. 46, 822–852 (2010)

  • Deuschel, J.D., Stroock, D.W.: Large Deviations. American Mathematical Society, Providence (2001)

  • Diaconis, P., Holmes, S., Neal, R.M.: Analysis of a nonreversible Markov chain sampler. Ann. Appl. Probab. 10, 726–752 (2000)

  • Ding, N., Fang, Y., Babbush, R., Chen, C., Skeel, R. D., Neven, H.: Bayesian sampling using stochastic gradient thermostats. In: Advances in Neural Information Processing Systems 27 (NIPS’14) (2014)

  • Duane, S., Kennedy, A.D., Pendleton, B.J., Roweth, D.: Hybrid Monte Carlo. Phys. Lett. B 195(2), 216–222 (1987)

  • Duncan, A.B., Lelièvre, T., Pavliotis, G.A.: Variance reduction using nonreversible Langevin samplers. J. Stat. Phys. 163(3), 457–491 (2016)

  • Flegal, J.M., Vats, D., Jones, G.L.: Strong consistency of multivariate spectral variance estimators in Markov chain Monte Carlo. arXiv:1507.08266 (2016)

  • Flegal, J.M., Vats, D., Jones, G.L.: Multivariate output analysis for Markov chain Monte Carlo (2017)

  • Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B.: Bayesian Data Analysis. Chapman and Hall, Boca Raton (2004)

  • Geyer, C.J.: Practical Markov chain Monte Carlo. Stat. Sci. 7, 473–483 (1992)

  • Girolami, M., Calderhead, B.: Riemann manifold Langevin and Hamiltonian Monte Carlo methods. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 73(2), 123–214 (2011)

  • Gustafson, P.: A guided walk Metropolis algorithm. Stat. Comput. 8(4), 357–364 (1998)

  • Hastings, W.K.: Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109 (1970)

  • Hatano, T., Sasa, S.-I.: Steady-state thermodynamics of Langevin systems. Phys. Rev. Lett. 86, 3463–3466 (2001)

  • Horowitz, A.M.: A generalized guided Monte Carlo algorithm. Phys. Lett. B 268(2), 247–252 (1991)

  • Hwang, C.-R., Hwang-Ma, S.-Y., Sheu, S.-J.: Accelerating Gaussian diffusions. Ann. Appl. Probab. 3(3), 897–913 (1993)

  • Hwang, C.-R., Hwang-Ma, S.-Y., Sheu, S.-J.: Accelerating diffusions. Ann. Appl. Probab. 15(2), 1433–1444 (2005)

  • Jansen, S., Kurt, N.: On the notion(s) of duality for Markov processes. Probab. Surv. 11, 59–120 (2014)

  • Jarner, S.F., Roberts, G.O.: Convergence of heavy-tailed Monte Carlo Markov chain algorithms. Scand. J. Stat. 34(4), 781–815 (2007)

  • Kaiser, M., Jack, R.L., Zimmer, J.: Acceleration of convergence to equilibrium in Markov chains by breaking detailed balance. J. Stat. Phys. 168(2), 259–287 (2017)

  • Kim, S., Shephard, N., Chib, S.: Stochastic volatility: likelihood inference and comparison with ARCH models. Rev. Econ. Stud. 65, 361–393 (1998)

  • Komorowski, T., Landim, C., Olla, S.: Fluctuations in Markov Processes—Time Symmetry and Martingale Approximation. Springer, Berlin (2012)

  • Korattikara, A., Chen, Y., Welling, M.: Austerity in MCMC land: cutting the Metropolis-Hastings budget. In: Proceedings of the 31st International Conference on Machine Learning (ICML’14) (2014)

  • Kou, S.C., Zhou, Q., Wong, W.H.: Discussion paper: equi-energy sampler with applications in statistical inference and statistical mechanics. Ann. Stat. 34(4), 1581–1619 (2006)

  • Kwon, C., Ao, P., Thouless, D.J.: Structure of stochastic dynamics near fixed points. Proc. Natl. Acad. Sci. 102(37), 13029–13033 (2005)

  • Leimkuhler, B., Shang, X.: Adaptive thermostats for noisy gradient systems. SIAM J. Sci. Comput. 38(2), A712–A736 (2016)

  • Leimkuhler, B., Matthews, C., Tretyakov, M.: On the long-time integration of stochastic gradient systems. Proc. R. Soc. A 470, 20140120 (2014)

  • Lelièvre, T., Nier, F., Pavliotis, G.A.: Optimal non-reversible linear drift for the convergence to equilibrium of a diffusion. J. Stat. Phys. 152, 237–274 (2013)

  • Liu, C., Zhu, J., Song, Y.: Stochastic gradient geodesic MCMC methods. In: Advances in Neural Information Processing Systems 29 (NIPS’16), pp. 3009–3017 (2016)

  • Liu, J.S.: Monte Carlo Strategies in Scientific Computing. Springer, Berlin (2001)

  • Lu, X., Perrone, V., Hasenclever, L., Teh, Y.W., Vollmer, S.J.: Relativistic Monte Carlo. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS’17) (2017)

  • Ma, Y.-A., Chen, T., Fox, E.: A complete recipe for stochastic gradient MCMC. In: Advances in Neural Information Processing Systems 28 (NIPS’15), pp. 2899–2907 (2015)

  • Ma, Y.-A., Qian, H.: Universal ideal behavior and macroscopic work relation of linear irreversible stochastic thermodynamics. New J. Phys. 17(6), 065013 (2015)

  • Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E.: Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 1087–1092 (1953)

  • Neal, R.M.: Improving asymptotic variance of MCMC estimators: non-reversible chains are better. arXiv:math/0407281 (2004)

  • Neal, R.M.: Bayesian Learning for Neural Networks. Springer, Berlin (1996)

  • Neal, R.M.: MCMC using Hamiltonian dynamics. Handb. Markov Chain Monte Carlo 54, 113–162 (2010)

  • Ottobre, M., Pillai, N.S., Pinski, F.J., Stuart, A.M.: A function space HMC algorithm with second order Langevin diffusion limit. Bernoulli 22(1), 60–106 (2016)

  • Patterson, S., Teh, Y.W.: Stochastic gradient Riemannian Langevin dynamics on the probability simplex. In: Advances in Neural Information Processing Systems 26 (NIPS’13) (2013)

  • Pavliotis, G.A.: Stochastic Processes and Applications. Springer, Berlin (2014)

  • Pazy, A.: Semigroups of Linear Operators and Applications to Partial Differential Equations. Springer, Berlin (1983)

  • Poncet, R.: Generalized and hybrid Metropolis-Hastings overdamped Langevin algorithms. arXiv:1701.05833 (2017)

  • Priestley, M.B.: Spectral Analysis and Time Series. Academic, San Diego (1981)

  • Qian, H.: A decomposition of irreversible diffusion processes without detailed balance. J. Math. Phys. 54, 053302 (2013)

  • Qian, H., Qian, M., Tang, X.: Thermodynamics of the general diffusion process: time-reversibility and entropy production. J. Stat. Phys. 107, 1129 (2002)

  • Rey-Bellet, L., Spiliopoulos, K.: Irreversible Langevin samplers and variance reduction: a large deviations approach. Nonlinearity 28(7), 2081 (2015)

  • Rey-Bellet, L., Spiliopoulos, K.: Improving the convergence of reversible samplers. J. Stat. Phys. 164(3), 472–494 (2016)

  • Robert, C., Casella, G.: Monte Carlo Statistical Methods, 2nd edn. Springer, Berlin (2004)

  • Roberts, G.O., Stramer, O.: Langevin diffusions and Metropolis-Hastings algorithms. Methodol. Comput. Appl. Probab. 4, 337–357 (2002)

  • Shang, X., Zhu, Z., Leimkuhler, B., Storkey, A.: Covariance-controlled adaptive Langevin thermostat for large-scale Bayesian sampling. In: Advances in Neural Information Processing Systems 28 (NIPS’15) (2015)

  • Shi, J., Chen, T., Yuan, R., Yuan, B., Ao, P.: Relation of a new interpretation of stochastic differential equations to Itô process. J. Stat. Phys. 148(3), 579–590 (2012)

  • Tak, H., Meng, X.-L., van Dyk, D. A.: A repulsive-attractive Metropolis algorithm for multimodality. arXiv:1601.05633 (2016)

  • Turitsyn, K.S., Chertkov, M., Vucelja, M.: Irreversible Monte Carlo algorithms for efficient sampling. Physica D 240(4–5), 410–414 (2011)

  • Villani, C.: Hypocoercivity. American Mathematical Society, Providence (2009)

  • Vucelja, M.: Lifting—a nonreversible Markov chain Monte Carlo algorithm. arXiv:1412.8762 (2015)

  • Welling, M., Teh, Y.W.: Bayesian learning via stochastic gradient Langevin dynamics. In: Proceedings of the 28th International Conference on Machine Learning (ICML’11), pp. 681–688 (2011)

  • Wu, S.-J., Hwang, C.-R., Chu, M.T.: Attaining the optimal Gaussian diffusion acceleration. J. Stat. Phys. 155(3), 571–590 (2014)

  • Xifara, T., Sherlock, C., Livingstone, S., Byrne, S., Girolami, M.: Langevin diffusions and the Metropolis-adjusted Langevin algorithm. Stat. Probab. Lett. 91, 14–19 (2014)

  • Yin, L., Ao, P.: Existence and construction of dynamical potential in nonequilibrium processes without detailed balance. J. Phys. A 39(27), 8593 (2006)

Acknowledgements

This work was supported in part by ONR Grant N00014-15-1-2380, NSF CAREER Award IIS-1350133 and the TerraSwarm Research Center sponsored by MARCO and DARPA. We thank Samuel Livingstone, Paul Fearnhead, Galin Jones, Hong Qian and Michael I. Jordan for helpful suggestions and discussions. Y.-A. Ma would like to thank Sebastian J. Vollmer for directing him to the reference (Komorowski et al. 2012). We also thank the reviewers for their thoughtful comments and suggestions.

Author information

Correspondence to Yi-An Ma.

Appendices

Samplers from continuous Markov processes

1.1 Proof of Theorem 1 (existence of \(\mathbf {Q}(\mathbf {z})\) in (4))

The proof of Theorem 1 combines two sets of ideas that appear in different fields; accordingly, we write the proof in two steps.

  1. Plug the stationary solution \(\pi (\mathbf {z})\) into (2), and observe that

    $$\begin{aligned} \sum _i \dfrac{\partial }{\partial \mathbf {z}_i} \left\{ \sum _j\dfrac{\partial }{\partial \mathbf {z}_j}\Big (\mathbf {D}_{ij}(\mathbf {z})\pi (\mathbf {z}) \Big ) - \mathbf {f}_i\pi (\mathbf {z}) \right\} = 0. \end{aligned}$$
  2. Constructively prove that if entries of

    $$\begin{aligned} \left\{ \sum _j\dfrac{\partial }{\partial \mathbf {z}_j}\Big (\mathbf {D}_{ij}(\mathbf {z})\pi (\mathbf {z}) \Big ) - \mathbf {f}_i\pi (\mathbf {z}) \right\} \end{aligned}$$

    belong to \(L^1(\mathbb {R}^d)\), then there exists a matrix \(\mathbf {Q}(\mathbf {z})\) with entries in \(W^{1,1}(\pi )\), such that \(\sum _{j} \dfrac{\partial }{\partial \mathbf {z}_j} \Big ( \mathbf {Q}_{ij}(\mathbf {z}) \pi (\mathbf {z}) \Big ) = \mathbf {f}_i\pi (\mathbf {z}) - \sum _j\dfrac{\partial }{\partial \mathbf {z}_j}\Big (\mathbf {D}_{ij}(\mathbf {z})\pi (\mathbf {z}) \Big )\).

Here we denote by \(L^1(\mathbb {R}^d)\) the space of Lebesgue integrable functions on \(\mathbb {R}^d\), and by \(W^{1,1}(\pi )\) the Sobolev space of functions whose values and weak derivatives are integrable with respect to the density \(\pi \) times the Lebesgue measure. Step 1 can be found in the literature on continuous Markov processes (Qian et al. 2002; Dembo and Deuschel 2010; Qian 2013; Deuschel and Stroock 2001; Villani 2009; Pavliotis 2014). Step 2 builds on earlier works on stochastic models in fluid dynamics and homogenization (Komorowski et al. 2012).

Step 1 is accomplished by noting that the un-normalized density function \(\pi (\mathbf {z})\propto p^s(\mathbf {z})\) is also a stationary solution of (2). Hence,

$$\begin{aligned} \sum _i \dfrac{\partial }{\partial \mathbf {z}_i} \left\{ \sum _j\dfrac{\partial }{\partial \mathbf {z}_j}\Big (\mathbf {D}_{ij}(\mathbf {z})\pi (\mathbf {z}) \Big ) - \mathbf {f}_i\pi (\mathbf {z}) \right\} = 0. \end{aligned}$$
(39)

Step 2 provides a possible form of \(\mathbf {Q}\) in terms of \(\mathbf {D}\), \(\mathbf {f}\) and \(\pi \). First, compare the right-hand sides of (2) and (4) and observe that they are equivalent if and only if there exists \(\mathbf {Q}(\mathbf {z})\) such that:

$$\begin{aligned} \sum _{j} \dfrac{\partial }{\partial \mathbf {z}_j} \Big ( \mathbf {Q}_{ij}(\mathbf {z}) \pi (\mathbf {z}) \Big ) = \mathbf {f}_i\pi (\mathbf {z}) - \sum _j\dfrac{\partial }{\partial \mathbf {z}_j}\Big (\mathbf {D}_{ij}(\mathbf {z})\pi (\mathbf {z}) \Big ). \end{aligned}$$
(40)

Since the entries of \(\left\{ \sum _j\dfrac{\partial }{\partial \mathbf {z}_j}\Big (\mathbf {D}_{ij}(\mathbf {z})\pi (\mathbf {z}) \Big ) - \mathbf {f}_i\pi (\mathbf {z}) \right\} \) belong to \(L^1(\mathbb {R}^d)\), its Fourier transform exists:

$$\begin{aligned} \hat{{\mathbf {F}}}_i({\mathbf {k}}) =&\int _{\mathbb {R}^{d}} \text {d}\mathbf {z}\left( {\mathbf {f}}_i(\mathbf {z}) \pi (\mathbf {z}) - \sum \limits _{j} \dfrac{\partial }{\partial \mathbf {z}_j} \Big ({\mathbf {D}}_{ij}(\mathbf {z}) \pi (\mathbf {z})\Big )\right) \\&\quad e^{-2\pi \mathrm{i}\ {\mathbf {k}}^T \mathbf {z}}. \end{aligned}$$

Therefore, we can take \(\mathbf {Q}(\mathbf {z})\) such that

$$\begin{aligned} {\mathbf {Q}}_{ij}(\mathbf {z}) \pi (\mathbf {z}) = {\displaystyle \int _{\mathbb {R}^{d}} \dfrac{{\mathbf {k}}_j\hat{{\mathbf {F}}}_i({\mathbf {k}})-{\mathbf {k}}_i\hat{{\mathbf {F}}}_j({\mathbf {k}})}{(2\pi \mathrm{i})\cdot \sum \limits _l {\mathbf {k}}_l^2} e^{2\pi \mathrm{i} \sum \limits _l {\mathbf {k}}_l \mathbf {z}_l} \text {d}{\mathbf {k}} }, \end{aligned}$$
(41)

where entries of \(\mathbf {Q}(\mathbf {z})\) belong to \(W^{1,1}(\pi )\).

Remark 1

It is worth noting that the condition of

$$\begin{aligned} \left\{ \sum _j\dfrac{\partial }{\partial \mathbf {z}_j}\Big (\mathbf {D}_{ij}(\mathbf {z})\pi (\mathbf {z}) \Big ) - \mathbf {f}_i\pi (\mathbf {z}) \right\} \in L^1(\mathbb {R}^d) \end{aligned}$$

can be rewritten for the vector field related to the SDE as: \(\left\{ \mathbf {f}(\mathbf {z}) + \mathbf {D}(\mathbf {z})\nabla H(\mathbf {z}) - \varGamma ^{\mathbf {D}}(\mathbf {z}) \right\} \in L^1(\pi )\), where \(\varGamma ^{\mathbf {D}}_i(\mathbf {z}) = \sum _{j=1}^d \dfrac{\partial }{\partial \mathbf {z}_j} {\mathbf {D}}_{ij}(\mathbf {z})\). We see that the condition is relatively mild for the purpose of constructing samplers.
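To illustrate how mild the condition is (a worked example of ours, not from the original text): for a Gaussian target \(\pi (\mathbf {z})\propto e^{-H(\mathbf {z})}\) with \(H(\mathbf {z}) = \frac{1}{2}\mathbf {z}^T\varSigma ^{-1}\mathbf {z}\), a constant diffusion matrix \(\mathbf {D}\) and a constant antisymmetric matrix \(\mathbf {Q}\), the drift \(\mathbf {f}(\mathbf {z}) = -(\mathbf {D}+\mathbf {Q})\varSigma ^{-1}\mathbf {z}\) gives \(\varGamma ^{\mathbf {D}}(\mathbf {z}) = 0\) and

$$\begin{aligned} \mathbf {f}(\mathbf {z}) + \mathbf {D}(\mathbf {z})\nabla H(\mathbf {z}) - \varGamma ^{\mathbf {D}}(\mathbf {z}) = -\mathbf {Q}\varSigma ^{-1}\mathbf {z}, \end{aligned}$$

which is linear in \(\mathbf {z}\) and hence belongs to \(L^1(\pi )\) for the Gaussian \(\pi \).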

1.2 Reversible and irreversible continuous dynamics for sampling

As has been previously noted (Qian et al. 2002; Dembo and Deuschel 2010; Qian 2013; Deuschel and Stroock 2001; Villani 2009; Pavliotis 2014), the stochastic dynamics of Eq. (4) and the corresponding (3) can be decomposed into a reversible Markov process and an irreversible process. Formally, this can be elucidated by the infinitesimal generator \(\mathcal {G}[\cdot ]\) of the stochastic process (3) (see Pavliotis 2014; Komorowski et al. 2012 for more background). For ease of derivation, we work with the Hilbert space \(L^2(\pi )\) of square-integrable functions with respect to \(\pi (\mathbf {z})\), equipped with inner product \(\langle \phi , \varphi \rangle _\pi = \int _{\mathbb {R}^{d}} \bar{\phi }(\mathbf {z}) \varphi (\mathbf {z}) \pi (\mathbf {z}) \text {d}\mathbf {z}\). For \(\phi (\mathbf {z})\in W^{2,2}(\pi )\) (i.e., \(\phi \) and its second-order weak derivatives belong to \(L^2(\pi )\)),

$$\begin{aligned} \mathcal {G} [\phi (\mathbf {z})] =&\nabla ^T \Big ( \big (\mathbf {D}(\mathbf {z}) - \mathbf {Q}(\mathbf {z})\big ) \nabla \phi (\mathbf {z}) \Big ) \nonumber \\&- \nabla H(\mathbf {z})^T \big (\mathbf {D}(\mathbf {z}) - \mathbf {Q}(\mathbf {z})\big ) \nabla \phi (\mathbf {z}). \end{aligned}$$
(42)

The adjoint of \(\mathcal {G}\) in \(L^2(\pi )\) is then:

$$\begin{aligned} \mathcal {G}^* [\phi (\mathbf {z})]&= \nabla ^T \Big ( \big (\mathbf {D}(\mathbf {z}) + \mathbf {Q}(\mathbf {z})\big ) \nabla \phi (\mathbf {z}) \Big ) \nonumber \\&\quad - \nabla H(\mathbf {z})^T \big (\mathbf {D}(\mathbf {z}) + \mathbf {Q}(\mathbf {z})\big ) \nabla \phi (\mathbf {z}). \end{aligned}$$
(43)

Therefore, \(\mathcal {G}\) decomposes into a self-adjoint part \(\mathcal {G}^S [\phi (\mathbf {z})] = \nabla ^T \Big ( \mathbf {D}(\mathbf {z}) \nabla \phi (\mathbf {z}) \Big ) - \nabla H(\mathbf {z})^T \mathbf {D}(\mathbf {z}) \nabla \phi (\mathbf {z})\) and an anti-self-adjoint part \(\mathcal {G}^A [\phi (\mathbf {z})] = \nabla ^T \Big ( \mathbf {Q}(\mathbf {z}) \nabla \phi (\mathbf {z}) \Big ) - \nabla H(\mathbf {z})^T \mathbf {Q}(\mathbf {z}) \nabla \phi (\mathbf {z})\). The self-adjoint operator \(\mathcal {G}^S\) corresponds to the reversible Markov process, while the anti-self-adjoint operator \(\mathcal {G}^A\) corresponds to the irreversible process.

It can be seen that the reversible process is determined solely by the diffusion matrix \(\mathbf {D}\), and the evolution of its probability density function is:

$$\begin{aligned}&\dfrac{\partial }{\partial t} p(\mathbf {z};t)\nonumber \\&\quad = \nabla ^T \cdot \bigg ( \mathbf {D}(\mathbf {z}) \left[ p(\mathbf {z};t) \nabla H(\mathbf {z}) + \nabla p(\mathbf {z};t) \right] \bigg ) \nonumber \\&\quad = \sum _{i,j=1}^d \dfrac{\partial ^2}{\partial \mathbf {z}_i \partial \mathbf {z}_j} \bigg (\mathbf {D}_{ij}(\mathbf {z}) p(\mathbf {z};t)\bigg ) \nonumber \\&\qquad + \sum _{i=1}^d \dfrac{\partial }{\partial \mathbf {z}_i} \left( \left[ \sum _j\mathbf {D}_{ij}(\mathbf {z}) \dfrac{\partial H(\mathbf {z})}{\partial \mathbf {z}_j} - \varGamma ^{\mathbf {D}}_i(\mathbf {z})\right] p(\mathbf {z};t) \right) . \end{aligned}$$
(44)

Here, \(\varGamma ^{\mathbf {D}}_i(\mathbf {z}) = \sum _{j=1}^d \dfrac{\partial }{\partial \mathbf {z}_j} {\mathbf {D}}_{ij}(\mathbf {z})\). Under Itô’s convention, (44) corresponds to reversible Brownian motion in a potential force field on a Riemannian manifold specified by the diffusion matrix \(\mathbf {D}(\mathbf {z})\): \(\text {d}\mathbf {z}= \big [ - \mathbf {D}(\mathbf {z}) \nabla H(\mathbf {z}) + \varGamma ^{\mathbf {D}}(\mathbf {z})\big ] \text {d}t + \sqrt{2 \mathbf {D}(\mathbf {z})} \text {d}\mathbf {W}(t)\). This is referred to as Riemannian Langevin dynamics (Girolami and Calderhead 2011; Xifara et al. 2014). When \(\mathbf {D}(\mathbf {z})\) is positive definite, the reversible Markov dynamics enjoy good statistical regularity and drive the system to converge to the stationary distribution.
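As a concrete illustration (our sketch, not code from the paper), the following Euler–Maruyama discretization simulates this reversible dynamics for a one-dimensional standard normal target \(H(z) = z^2/2\), with a position-dependent diffusion \(D(z) = 1 + 0.5\sin ^2(z)\) chosen purely for illustration. Note that the uncorrected discretization carries an \(O(\text {d}t)\) bias, which is precisely what the Metropolis adjustment in MALA removes:

```python
import numpy as np

def riemannian_langevin_1d(n_steps=50_000, dt=0.01, seed=0):
    """Euler-Maruyama discretization of
    dz = [-D(z) H'(z) + D'(z)] dt + sqrt(2 D(z)) dW,
    for H(z) = z^2 / 2 and an illustrative D(z) = 1 + 0.5 sin(z)^2."""
    rng = np.random.default_rng(seed)
    grad_H = lambda z: z                      # H(z) = z^2 / 2
    D = lambda z: 1.0 + 0.5 * np.sin(z) ** 2
    dD = lambda z: np.sin(z) * np.cos(z)      # Gamma^D reduces to dD/dz in 1D
    z, out = 0.0, np.empty(n_steps)
    for t in range(n_steps):
        drift = -D(z) * grad_H(z) + dD(z)
        z += drift * dt + np.sqrt(2.0 * D(z) * dt) * rng.standard_normal()
        out[t] = z
    return out

draws = riemannian_langevin_1d()
print(draws.mean(), draws.var())              # roughly 0 and 1, up to O(dt) bias
```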

The irreversible process is determined solely by \(\mathbf {Q}\), with its probability density function evolving according to:

$$\begin{aligned} \dfrac{\partial }{\partial t} p(\mathbf {z};t) =&\nabla ^T \cdot \bigg ( \mathbf {Q}(\mathbf {z}) \left[ p(\mathbf {z};t) \nabla H(\mathbf {z}) + \nabla p(\mathbf {z};t) \right] \bigg ) \nonumber \\ =&\nabla ^T \cdot \Bigg ( \Big [\mathbf {Q}(\mathbf {z}) \nabla H(\mathbf {z}) - \varGamma ^{\mathbf {Q}}(\mathbf {z}) \Big ] {p(\mathbf {z};t)} \Bigg ). \end{aligned}$$
(45)

Here, \(\varGamma ^{\mathbf {Q}}_i(\mathbf {z}) = \sum _{j=1}^d \dfrac{\partial }{\partial \mathbf {z}_j} {\mathbf {Q}}_{ij}(\mathbf {z})\). The last line of (45) is a Liouville equation, which describes the evolution of the density \({p(\mathbf {z};t)}\) under the conserved, deterministic dynamics \(\text {d}\mathbf {z}/\text {d}t = -\mathbf {Q}(\mathbf {z}) \nabla H(\mathbf {z}) + \varGamma ^{\mathbf {Q}}(\mathbf {z})\), with \(\pi (\mathbf {z})\) as its invariant measure.

Combining the dynamics of (44) and (45) leads to a general SDE, (3), with stationary distribution \(\pi (\mathbf {z})\). Previous methods (Roberts and Stramer 2002; Xifara et al. 2014; Welling and Teh 2011; Patterson and Teh 2013; Neal 2010) have primarily focused on either the purely reversible or the purely irreversible process, as we make explicit in Sect. 2.3.
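To make the combined dynamics concrete (again a sketch of ours under simplifying assumptions, not the paper's implementation), consider a standard 2D Gaussian target with constant \(\mathbf {D} = \mathbf {I}\) and a constant antisymmetric \(\mathbf {Q}\); then \(\varGamma ^{\mathbf {D}} = \varGamma ^{\mathbf {Q}} = 0\) and (3) reduces to \(\text {d}\mathbf {z} = -(\mathbf {I} + \mathbf {Q})\,\mathbf {z}\,\text {d}t + \sqrt{2}\,\text {d}\mathbf {W}(t)\):

```python
import numpy as np

def irreversible_gaussian_sampler(q=1.0, n_steps=100_000, dt=0.01, seed=0):
    """Euler-Maruyama discretization of dz = -(D + Q) grad H dt + sqrt(2 D) dW
    for H(z) = ||z||^2 / 2, D = I, and a constant antisymmetric Q
    (q = 0 recovers plain overdamped Langevin dynamics)."""
    rng = np.random.default_rng(seed)
    Q = np.array([[0.0, q], [-q, 0.0]])       # antisymmetric circulation
    M = np.eye(2) + Q                         # D + Q
    z = np.zeros(2)
    out = np.empty((n_steps, 2))
    for t in range(n_steps):
        z = z - (M @ z) * dt + np.sqrt(2.0 * dt) * rng.standard_normal(2)
        out[t] = z
    return out

# Both q = 0 and q = 1 target N(0, I); q != 0 adds an irreversible swirl
# around the mode that typically decorrelates the chain faster.
print(irreversible_gaussian_sampler(q=1.0).std(axis=0))   # approx [1, 1]
```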

Irreversible jump processes for MCMC

1.1 Equivalence of (10) and (11)

We introduce a symmetric bivariate function \(S(\mathbf {x},\mathbf {z}) = S(\mathbf {z},\mathbf {x}) = \dfrac{1}{2}\Big (W(\mathbf {z}|\mathbf {x})\pi (\mathbf {x}) + W(\mathbf {x}|\mathbf {z})\pi (\mathbf {z})\Big )\), and an antisymmetric bivariate function \(A(\mathbf {x},\mathbf {z}) = -A(\mathbf {z},\mathbf {x}) = \dfrac{1}{2}\Big (W(\mathbf {z}|\mathbf {x})\pi (\mathbf {x}) - W(\mathbf {x}|\mathbf {z})\pi (\mathbf {z})\Big )\) for (10). A different form of the jump process (10) can be written according to \(S(\mathbf {x},\mathbf {z})\) and \(A(\mathbf {x},\mathbf {z})\) as

$$\begin{aligned} \dfrac{\partial p(\mathbf {z}|\mathbf {y};t)}{\partial t} =&\int _{\mathbb {R}^d} \text {d}\mathbf {x}\bigg [ S(\mathbf {x},\mathbf {z}) \dfrac{p(\mathbf {x}|\mathbf {y};t)}{\pi (\mathbf {x})} - S(\mathbf {x},\mathbf {z}) \dfrac{p(\mathbf {z}|\mathbf {y};t)}{\pi (\mathbf {z})} \\&+ A(\mathbf {x},\mathbf {z}) \dfrac{p(\mathbf {x}|\mathbf {y};t)}{\pi (\mathbf {x})}\bigg ]. \end{aligned}$$

Plugging \(p(\mathbf {z}|\mathbf {y};t) = \pi (\mathbf {z})\) into the above equation, we find that \(\pi (\mathbf {z})\) is a stationary solution as long as \(\int _{\mathbb {R}^d} A(\mathbf {x},\mathbf {z}) \text {d}\mathbf {x}= 0\). Since \(\dfrac{S(\mathbf {x},\mathbf {z}) + A(\mathbf {x},\mathbf {z})}{\pi (\mathbf {x})}\) denotes a transition probability density, \({S(\mathbf {x},\mathbf {z}) + A(\mathbf {x},\mathbf {z})}>0\) for any \(\mathbf {x}\) and \(\mathbf {z}\). The restriction that \(S(\mathbf {x},\mathbf {z}) \pi ^{-1}(\mathbf {x})\) and \( A(\mathbf {x},\mathbf {z}) \pi ^{-1}(\mathbf {x})\) are bounded is imposed for the practical purpose of proposing samples in (12). The requirement that \(\pi (\mathbf {z})\) be a stationary distribution of the jump process thus translates into simpler constraints on \(S\) and \(A\).
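In a finite state space the criterion \(\int _{\mathbb {R}^d} A(\mathbf {x},\mathbf {z}) \text {d}\mathbf {x}= 0\) becomes a statement about column sums, which is easy to verify numerically. The following sketch (our illustration, not from the paper; any ergodic row-stochastic matrix will do) builds the flux \(F(\mathbf {x},\mathbf {z}) = W(\mathbf {z}|\mathbf {x})\pi (\mathbf {x})\) of a generic irreversible chain and checks the decomposition:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
W = rng.random((n, n))
W /= W.sum(axis=1, keepdims=True)          # W[x, z] = W(z | x), row-stochastic

evals, evecs = np.linalg.eig(W.T)          # stationary pi: left eigenvector
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()

F = pi[:, None] * W                        # probability flux F(x, z) = W(z|x) pi(x)
S = 0.5 * (F + F.T)                        # symmetric part
A = 0.5 * (F - F.T)                        # antisymmetric part

print(np.allclose(A.sum(axis=0), 0.0))     # sum_x A(x, z) = 0  <=>  pi is stationary
print(np.allclose(S.sum(axis=0), pi))      # the symmetric part alone carries pi
```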

1.2 Verifying Condition 3.1 on \(A(\mathbf {y},\mathbf {y}^p,\mathbf {z},\mathbf {z}^p)\) in Sect. 3.3

The antisymmetric function \(A(\mathbf {y},\mathbf {y}^p,\mathbf {z},\mathbf {z}^p)\) (expressed in (21)) of (20) can be written as:

$$\begin{aligned}&A(\mathbf {y}, \mathbf {y}^p,\mathbf {z},\mathbf {z}^p) \\&\quad = \dfrac{1}{2 \varDelta t} \Big ( \pi (\mathbf {y})\pi (\mathbf {y}^p) p(\mathbf {z},\mathbf {z}^p|\mathbf {y},\mathbf {y}^p;\varDelta t) \\&\qquad - \pi (\mathbf {z})\pi (\mathbf {z}^p) p(\mathbf {y},\mathbf {y}^p|\mathbf {z},\mathbf {z}^p;\varDelta t) \Big ) \\&\quad = \dfrac{1}{2} \delta (\mathbf {z}^p -\mathbf {y}^p) \Big (\mathfrak {F}(\mathbf {y},\mathbf {y}^p,\mathbf {z},\mathbf {z}^p) - \mathfrak {F}(\mathbf {z},\mathbf {z}^p,\mathbf {y},\mathbf {y}^p)\Big ) \\&\qquad - \dfrac{1}{2} \delta ( \mathbf {z}^p+\mathbf {y}^p)\delta (\mathbf {z}-\mathbf {y}) \\&\qquad \cdot \int _{\mathbb {R}^d} \Big (\mathfrak {F}(\mathbf {y},\mathbf {y}^p,\mathbf {x},-\mathbf {z}^p) - \mathfrak {F}(\mathbf {z},\mathbf {z}^p,\mathbf {x},-\mathbf {y}^p)\Big )\text {d}\mathbf {x}. \end{aligned}$$

Below we prove that, as required,

$$\begin{aligned} \int _{\mathbb {R}^{d+d^p}} A(\mathbf {y},\mathbf {y}^p,\mathbf {z},\mathbf {z}^p) \ \text {d}\mathbf {y}\ \text {d}\mathbf {y}^p = 0. \end{aligned}$$

Proof

$$\begin{aligned}&\int _{\mathbb {R}^{d+d^p}} A(\mathbf {x},\mathbf {x}^p,\mathbf {z},\mathbf {z}^p) \ \text {d}\mathbf {x}\ \text {d}\mathbf {x}^p \\&\quad = \dfrac{1}{2} \int _{\mathbb {R}^{d}} \Big (\mathfrak {F}(\mathbf {y},\mathbf {z}^p,\mathbf {z},\mathbf {z}^p) - \mathfrak {F}(\mathbf {z},\mathbf {z}^p,\mathbf {y},\mathbf {z}^p)\Big ) \ \text {d}\mathbf {y}\\&\qquad - \dfrac{1}{2} \int _{\mathbb {R}^{d}} \Big (\mathfrak {F}(\mathbf {z},-\mathbf {z}^p,\mathbf {x},-\mathbf {z}^p) - \mathfrak {F}(\mathbf {z},\mathbf {z}^p,\mathbf {x},\mathbf {z}^p)\Big )\text {d}\mathbf {x}. \end{aligned}$$

One can check that in (23) and (19),

$$\begin{aligned} \widetilde{f}(\mathbf {z},\cdot \ |\mathbf {y}, -\mathbf {y}^p) = \widetilde{g}(\mathbf {z},\cdot \ |\mathbf {y},\mathbf {y}^p). \end{aligned}$$

Hence,

$$\begin{aligned} \mathfrak {F}(\mathbf {y},-\mathbf {y}^p,\mathbf {z},-\mathbf {z}^p)=\mathfrak {F}(\mathbf {z},\mathbf {z}^p,\mathbf {y},\mathbf {y}^p). \end{aligned}$$

Therefore,

$$\begin{aligned}&\int _{\mathbb {R}^{d+d^p}} A(\mathbf {x},\mathbf {x}^p,\mathbf {z},\mathbf {z}^p) \ \text {d}\mathbf {x}\ \text {d}\mathbf {x}^p \nonumber \\&\quad = \dfrac{1}{2} \int _{\mathbb {R}^{d}} \Big (\mathfrak {F}(\mathbf {z},-\mathbf {z}^p,\mathbf {y},-\mathbf {z}^p) - \mathfrak {F}(\mathbf {z},\mathbf {z}^p,\mathbf {y},\mathbf {z}^p)\Big ) \ \text {d}\mathbf {y}\nonumber \\&\qquad - \dfrac{1}{2} \int _{\mathbb {R}^{d}} \Big (\mathfrak {F}(\mathbf {z},-\mathbf {z}^p,\mathbf {x},-\mathbf {z}^p) - \mathfrak {F}(\mathbf {z},\mathbf {z}^p,\mathbf {x},\mathbf {z}^p)\Big )\text {d}\mathbf {x}= 0. \end{aligned}$$
(46)

\(\square \)

Proof of Theorem 2 (relation between forward process and adjoint process)

We first prove that, at the level of infinitesimal generators, the backward transition probability density of the adjoint process and the forward transition probability density are related by \(\pi (\mathbf {y}) P\big (\mathbf {z}| \mathbf {y}; \text {d}t \big ) = \pi (\mathbf {z}) P^\dag \big (\mathbf {y}| \mathbf {z}; \text {d}t \big )\). Taking path integrals with respect to the infinitesimal generators then leads to the conclusion.

As is standard, we use two arbitrary smooth test functions \(\psi (\mathbf {y})\) and \(\phi (\mathbf {z})\). Using the definition of the infinitesimal generator of the processes P and \(P^\dag \), \(\mathcal {G} [\phi (\mathbf {y})] \text {d}t = \int P\big (\mathbf {z}| \mathbf {y}; \text {d}t \big ) \phi (\mathbf {z}) \text {d}\mathbf {z}- \phi (\mathbf {y})\), we obtain

$$\begin{aligned}&\int \int \text {d}\mathbf {y}\text {d}\mathbf {z}\ {P\big (\mathbf {z}| \mathbf {y}; \text {d}t\big )} {\pi (\mathbf {y})} \ \psi (\mathbf {y}) \phi (\mathbf {z}) \\&\quad = \langle \psi , (I+ \text {d}t \ \mathcal {G})[\phi ] \rangle _\pi , \end{aligned}$$

and

$$\begin{aligned}&\int \int \text {d}\mathbf {y}\text {d}\mathbf {z}\ {P^\dag \big (\mathbf {y}| \mathbf {z}; \text {d}t\big )} {\pi (\mathbf {z})} \ \psi (\mathbf {y}) \phi (\mathbf {z}) \\&\quad = \langle (I+ \text {d}t \ \mathcal {G}^*)[\psi ], \phi \rangle _\pi . \end{aligned}$$

Since \(\mathcal {G}\) and \(\mathcal {G}^*\) are adjoint in \(L^2(\pi )\), i.e., \(\langle \psi , \mathcal {G}[\phi ] \rangle _\pi = \langle \mathcal {G}^*[\psi ], \phi \rangle _\pi \),

$$\begin{aligned} \pi (\mathbf {y}) P\big (\mathbf {z}| \mathbf {y}; \text {d}t \big ) = \pi (\mathbf {z}) P^\dag \big (\mathbf {y}| \mathbf {z}; \text {d}t \big ). \end{aligned}$$

Then we take path integrals over the forward path and the backward one. Using the Markov properties,

$$\begin{aligned}&P(\mathbf {z}_N,t_N|\mathbf {z}_0,t_0)\\&\quad = \int \cdots \int \prod _{i=1}^{N-1}\text {d}\mathbf {z}_i\prod _{i=0}^{N-1} P(\mathbf {z}_{i+1},t_{i+1}|\mathbf {z}_i,t_i); \end{aligned}$$

and

$$\begin{aligned}&P^\dag (\mathbf {z}_0,t_N|\mathbf {z}_N,t_0) \\&\quad = \int \cdots \int \prod _{i=1}^{N-1}\text {d}\mathbf {z}_i\prod _{i=0}^{N-1} P^\dag (\mathbf {z}_{i},t_{i+1}|\mathbf {z}_{i+1},t_i)\\&\quad = \int \cdots \int \prod _{i=1}^{N-1}\text {d}\mathbf {z}_i\prod _{i=0}^{N-1} \dfrac{\pi (\mathbf {z}_i)}{\pi (\mathbf {z}_{i+1})} P(\mathbf {z}_{i+1},t_{i+1}|\mathbf {z}_{i},t_i)\\&\quad = \dfrac{\pi (\mathbf {z}_0)}{\pi (\mathbf {z}_{N})} P(\mathbf {z}_N,t_N|\mathbf {z}_0,t_0). \end{aligned}$$

Taking the time interval between \(t_i\) and \(t_{i+1}\) to be infinitesimal, we obtain \(\dfrac{P(\mathbf {z}^{(T)},T|\mathbf {z}^{(t)},t)}{P^\dag (\mathbf {z}^{(t)},T|\mathbf {z}^{(T)},t)} = \dfrac{\pi (\mathbf {z}^{(T)})}{\pi (\mathbf {z}^{(t)})}\). The same conclusion can be reached by proving that the semigroups \(e^{t\mathcal {G}}\) and \(e^{t\mathcal {G}^*}\) generated by \(\mathcal {G}\) and \(\mathcal {G}^*\) are also adjoint to each other (Jansen and Kurt 2014; Pazy 1983; Poncet 2017).
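The generator-level identity has a transparent finite-state analogue: for a transition matrix P with stationary distribution \(\pi \), the adjoint (time-reversed) kernel is \(P^\dag (\mathbf {y}|\mathbf {z}) = \pi (\mathbf {y})P(\mathbf {z}|\mathbf {y})/\pi (\mathbf {z})\). A quick numerical sanity check of ours (any ergodic row-stochastic matrix works):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)        # P[y, z] = P(z | y), row-stochastic

# stationary distribution: left eigenvector of P with eigenvalue 1
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()

# adjoint kernel: P_adj[z, y] = pi[y] * P[y, z] / pi[z]
P_adj = (pi[:, None] * P).T / pi[:, None]

print(np.allclose(P_adj.sum(axis=1), 1.0))   # P_adj is stochastic
print(np.allclose(pi @ P_adj, pi))           # it shares the stationary pi
print(np.allclose(pi[:, None] * P, (pi[:, None] * P_adj).T))
# last check: pi(y) P(z|y) == pi(z) P_adj(y|z), elementwise in (y, z)
```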

Experiments

1.1 Parameter settings for the irreversible jump sampler

In the 1D experiments, we use \(\beta = 1.2\) for the normal distribution case (where the length of the region of definition is 10) and \(\beta = 0.8\) for the log-normal distribution (where the length of the region of definition is 5). The acceptance rate is around \(50\%\) in these cases. Due to the irreversibility of the sampler, a high acceptance rate can be maintained while reducing the autocorrelation time.
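For concreteness, here is a minimal 1D sketch of ours of an irreversible jump chain with a half-space Gaussian proposal of scale \(\beta \), in the spirit of the guided walk of Gustafson (1998), which the I-Jump sampler generalizes: the direction flips only upon rejection, so accepted moves keep drifting the same way. This is illustrative code under those assumptions, not the paper's I-Jump implementation (in particular, it has no bounded region of definition):

```python
import numpy as np

def guided_walk(log_pi, z0=0.0, beta=1.2, n_iter=100_000, seed=0):
    """Lifted MH in the spirit of Gustafson (1998): an auxiliary direction
    p in {-1, +1} makes the half-space Gaussian proposal irreversible;
    p is reversed only on rejection."""
    rng = np.random.default_rng(seed)
    z, p = z0, 1.0
    samples = np.empty(n_iter)
    n_accept = 0
    for t in range(n_iter):
        z_prop = z + p * abs(beta * rng.standard_normal())  # move in direction p
        if np.log(rng.random()) < log_pi(z_prop) - log_pi(z):
            z = z_prop            # accept: keep moving the same way
            n_accept += 1
        else:
            p = -p                # reject: reverse direction
        samples[t] = z
    return samples, n_accept / n_iter

# Standard normal target with beta = 1.2, echoing the 1D setting above
draws, acc = guided_walk(lambda z: -0.5 * z**2)
print(acc, draws.mean(), draws.std())   # acceptance rate, mean ~ 0, sd ~ 1
```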

In the visual comparison of samplers of Fig. 2, we use \(\beta =0.15\) (where the length of the region of definition is \(2 \times 2\)). In the 2D bimodal experiments, we use \(\beta =0.4\) (where the length of the region of definition is \(6 \times 3\)). In the 2D multimodal experiments, we take \(\beta =1.5\) (where the length of the region of definition is \(14 \times 14\)). For the 2D correlated distribution, we take \(\beta =0.25\) (where the length of the region of definition is \(5 \times 2\)).

1.2 Effect of dimensionality on I-Jump versus MH

In this appendix, we explore how the relative improvement of I-Jump over MH scales with dimensionality. We consider a standard normal distribution and double the dimension from one comparison to the next. For the I-Jump sampler, we use the vanilla half-space Gaussian proposal. It can be observed from Table 4 that the potential benefits of irreversibility in the I-Jump sampler slowly diminish with increasing dimensionality.

Table 4 Comparison of \(\widehat{\mathrm{ESS}}\) per second of runtime for I-Jump versus MH with different dimensions of Gaussian target distributions
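To illustrate how \(\widehat{\mathrm{ESS}}\) can be estimated from a chain, the following is a sketch of ours in the spirit of the initial-positive-sequence rule of Geyer (1992); the paper itself relies on multivariate spectral variance estimators (cf. Flegal et al. 2016), so treat this only as an illustration:

```python
import numpy as np

def ess(x):
    """Effective sample size n / (1 + 2 * sum of autocorrelations), truncating
    the sum when a consecutive pair rho_k + rho_{k+1} turns non-positive,
    in the spirit of Geyer's (1992) initial positive sequence estimator."""
    n = len(x)
    xc = x - x.mean()
    acf = np.correlate(xc, xc, mode="full")[n - 1:]
    acf /= np.arange(n, 0, -1) * x.var()     # autocorrelation at lags 0..n-1
    tau = 1.0
    for k in range(1, n - 1, 2):
        pair = acf[k] + acf[k + 1]
        if pair <= 0.0:
            break
        tau += 2.0 * pair
    return n / tau

# Sanity check on an AR(1) chain, whose integrated autocorrelation time
# is (1 + rho) / (1 - rho), so ESS should be close to n (1 - rho) / (1 + rho).
rng = np.random.default_rng(0)
rho, n = 0.9, 20_000
chain = np.empty(n)
chain[0] = 0.0
for t in range(1, n):
    chain[t] = rho * chain[t - 1] + np.sqrt(1 - rho**2) * rng.standard_normal()
print(ess(chain), "vs theory", n * (1 - rho) / (1 + rho))
```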

Cite this article

Ma, YA., Fox, E.B., Chen, T. et al. Irreversible samplers from jump and continuous Markov processes. Stat Comput 29, 177–202 (2019). https://doi.org/10.1007/s11222-018-9802-x
