
A Machine Learning-based Source Property Inference for Compact Binary Mergers


Published 2020 June 12 © 2020. The American Astronomical Society. All rights reserved.
Citation: Deep Chatterjee et al. 2020, ApJ, 896, 54. DOI: 10.3847/1538-4357/ab8dbe


Abstract

The detection of the binary neutron star merger GW170817 was the first success story of multi-messenger observations of compact binary mergers. The inferred merger rate, along with the increased sensitivity of the ground-based gravitational-wave (GW) network in the present LIGO/Virgo and future LIGO/Virgo/KAGRA observing runs, strongly hints at detections of binaries that could potentially have an electromagnetic (EM) counterpart. A rapid assessment of properties that could lead to a counterpart is essential to aid time-sensitive follow-up operations, especially robotic telescopes. At minimum, the possibility of a counterpart requires a neutron star (NS). The physics of tidal disruption is also important: it determines the remnant matter post-merger, the dynamics of which could produce the counterparts. The main challenge, however, is that the binary system parameters, such as masses and spins estimated by the real-time, GW template-based searches, are often dominated by statistical and systematic errors. Here, we present an approach that uses supervised machine learning to mitigate such selection effects and report, in real time, the possibility of counterparts based on the presence of an NS component and the presence of remnant matter post-merger.


1. Introduction

The first two observing runs of the LIGO detectors (Aasi et al. 2015) and the Virgo detector (Acernese et al. 2014) witnessed a remarkable level of participation from the electromagnetic (EM) astronomy community in the search for EM counterparts of gravitational-wave (GW) detections from coalescing binaries (Abbott et al. 2019a, 2019b). As the detectors become more sensitive, the projected detection rates of such events will increase (Abbott et al. 2018). Technological improvement is not confined to GW detectors alone. Current and upcoming telescope facilities, such as the Zwicky Transient Facility (Kulkarni 2016) and the Large Synoptic Survey Telescope (Ivezić et al. 2019), operating on a timeline consistent with LIGO/Virgo operations, plan to participate in the follow-up efforts (see Graham et al. 2019, for example).

Observers are interested in the presence of neutron stars (NSs) in coalescing binaries. This is a minimum condition for there to be matter post-merger. The dynamics of matter in the extreme environment of the aftermath of a compact binary merger is responsible for the EM phenomena associated with GWs. Binary black hole (BBH) mergers, therefore, are not expected to have an associated counterpart, since they are vacuum solutions of Einstein's field equations. Even in the presence of an NS, other effects, like the equation of state (EoS) of the NS(s), or the mass and spin of the companion BH, play crucial roles in the tidal disruption and the amount of matter ejected. For a neutron star black hole (NSBH) system, tidally disrupted material from the NS could form an accretion disk around the central BH. High temperatures in the disk could lead to neutrino annihilation that pair-produces electrons and positrons, which in turn annihilate to power a short gamma-ray burst (GRB). This could also happen via extraction of rotational energy from the BH due to magnetic field lines threading the BH horizon (Blandford & Znajek 1977). In the case of unbound ejecta, r-process nucleosynthesis can power a kilonova (Lattimer & Schramm 1974; Li & Paczyński 1998; Korobkin et al. 2012; Barnes & Kasen 2013; Tanaka & Hotokezaka 2013; Kasen et al. 2015). For a binary neutron star (BNS) system, even if the tidal interaction is not strong enough, the two bodies will eventually come into physical contact, resulting in shocks that expel neutron-rich material. This will result in a kilonova, as seen in the case of GW170817 (Abbott et al. 2017; Arcavi et al. 2017; Coulter et al. 2017; Kasliwal et al. 2017; Lipunov et al. 2017; Soares-Santos et al. 2017; Tanvir et al. 2017). The interaction of the ejecta with the surrounding medium can result in synchrotron emission, observable in X-rays and the radio over weeks to months.
There can be relativistic outflows, which could result in a GRB, as seen for GW170817; there could also be cases of prompt collapse where GRB generation could be suppressed (Ruiz & Shapiro 2017). Nevertheless, the generation of some EM messenger is highly probable. Therefore, data products that predict the existence of matter are useful in EM counterpart follow-up operations.

An accurate computation of remnant matter requires general-relativistic numerical simulations of compact mergers. These are expensive, and only a few (≲100) such simulations have been performed to date. Such simulations are also not possible given the timescale of discoveries and generic target-of-opportunity follow-ups of GW candidates. Empirical fits to the numerical relativity results, however, have been performed, and can serve such real-time inference. For example, Foucart (2012) and Foucart et al. (2018) devised empirical fits to predict the combined mass of the accretion disk, the tidal tail, and the ejecta remaining outside the final BH for the case of an NSBH merger. Note, however, that such fits often require more input than what is available from the real-time GW data. For example, the fits mentioned above require the compactness of the NS, which is not a parameter inferred by the GW searches. The NS EoS, which is not strongly constrained, has to be assumed in order to infer the compactness.
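For instance, since these fits take the NS compactness $C={{GM}}_{\mathrm{NS}}/({{Rc}}^{2})$ as input, an assumed EoS enters simply through its mass–radius relation. A minimal sketch (the radius value below is illustrative, not a statement about any particular EoS):

```python
# Solar mass in geometric units: G * M_sun / c^2 ~= 1.4766 km
M_SUN_KM = 1.4766

def compactness(m_ns_msun, radius_km):
    """Dimensionless NS compactness C = G M / (R c^2)."""
    return m_ns_msun * M_SUN_KM / radius_km

# A 1.4 Msun NS with a 15 km radius (a stiff-EoS-like value) gives C ~ 0.14;
# softer EoSs give smaller radii and hence larger compactness.
print(compactness(1.4, 15.0))
```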

The second LIGO/Virgo observing run, O2, saw the first effort to provide real-time data products to aid EM follow-up operations from ground- and space-based facilities (Abbott et al. 2019a). These included sky localization maps (Singer & Price 2016; Singer et al. 2016) and source classification of the binary, which included the following:

  • 1.  
    the probability that there was at least one neutron star in the binary, $p({\mathtt{HasNS}})$; and
  • 2.  
    the probability that there was non-zero remnant matter, $p({\mathtt{HasRemnant}})$, considering the mass and spin of the components, based on the Foucart (2012) fit.

For a BNS merger, we expect some matter to be expelled (see Table 1 of Shibata & Hotokezaka 2019 for different scenarios). Therefore, we expect the result $p({\mathtt{HasNS}})=1;$ $p({\mathtt{HasRemnant}})=1$. At the other extreme, BBH coalescences will not leave remnant matter, since they are vacuum solutions, i.e., $p({\mathtt{HasNS}})=0;$ $p({\mathtt{HasRemnant}})=0$. Hence, $p({\mathtt{HasRemnant}})$ is most relevant for NSBH systems. Here, the mass and spin of the BH determine the tidal disruption of the NS. A lower mass and higher spin imply a smaller innermost stable circular orbit, which allows the NS to inspiral closer to the BH. The tidal force exerted by the BH, which also increases with spin, then tears the NS apart. This leaves remnant matter post-merger. However, if the NS is compact, or the tidal forces are not sufficiently strong, the NS is swallowed whole by the BH, leaving no remnant. The type and morphology of the EM counterparts generated depend on the amount of matter ejected and its properties. Pannarale & Ohme (2014) considered the conditions for short GRB production in the context of LIGO/Virgo observations of NSBHs. More recent work has tried to understand the morphology of kilonovae from NSBH mergers considering the density structure of the ejected matter, opacity properties, the viewing angle, and other factors (see Barbieri et al. 2019; Hotokezaka & Nakar 2020, for example). However, accurate modeling is still in its infancy. Thus, the presence of remnant matter is a conservative proxy for the presence of counterparts, still more constraining than the presence of an NS component alone, despite the model dependence, i.e., the assumption of an NS EoS and the usage of a particular fit. The rationale behind computing two quantities is to give flexibility to observing partners in follow-up operations.

The main challenge in this inference, however, is to handle detection uncertainties in the parameter recovery of the real-time GW template-based searches. This was done in O2 via an effective Fisher formalism using an ambiguity region around the parameters of the triggered template. The algorithm used for O2 is described in Section 3.3.2 of Abbott et al. (2019a), and briefly summarized in Section 2 below. While it accounted for statistical uncertainties, the systematic errors in the low-latency GW template-based analysis were not considered. Here we consider the problem differently. We treat the problem as binary classification, and present a new technique that is based on supervised learning. This not only improves the speed and accuracy, but also removes runtime dependencies that were required during O2 operations. Also, this technique provides the flexibility to incorporate astrophysical rates of binary populations in the universe.

In the third LIGO/Virgo observing run, O3, these data products (and a few more) continue to be part of the public alerts. In this work, we make a slight modification to the nomenclature. The $p({\mathtt{HasRemnant}})$ quantity was referred to as the EM-bright classification probability in Abbott et al. (2019a). Here, we refer to both these quantities collectively as source properties, following the O3 LIGO/Virgo public alert user guide. These values indicate the chances of matter remaining post-merger, the dynamics of which can launch EM counterparts. For example, the combination $p({\mathtt{HasNS}})=1;$ $p({\mathtt{HasRemnant}})=0$ indicates a conservative measure of the presence of matter: just the presence of an NS. However, the combination $p({\mathtt{HasNS}})=1;$ $p({\mathtt{HasRemnant}})=1$ is a stronger indication of the presence of a counterpart, despite some model dependence.

The organization of the paper is as follows. In Section 2 we provide a brief review of the ellipsoid-based inference used in O2. In Section 3, we present the inference using a supervised learning method, the KNeighborsClassifier (Pedregosa et al. 2011), trained on injection campaigns from the GstLAL search pipeline (Messick et al. 2017) used by LIGO/Virgo in routine search sensitivity analyses during O2, and we test the performance of the machine-learned inference. In Section 4, we conclude and propose using this method to report the source properties, $p({\mathtt{HasNS}})$ and $p({\mathtt{HasRemnant}})$, in future operations.

2. Ellipsoid-based Classification

2.1. Low-latency Searches

LIGO/Virgo searches for transient GW signals fall into two broad categories: modeled compact binary coalescence (CBC) searches (Adams et al. 2016; Hooper et al. 2012; Messick et al. 2017; Nitz et al. 2018; Abbott et al. 2019b) and un-modeled burst searches (Klimenko et al. 2016; Lynch et al. 2017). In this work, we are concerned with the former. The modeled searches use a discrete template bank of CBC waveforms to carry out matched filtering on the data. This is further broken down into real-time online analysis and calibration-corrected offline analysis. The online low-latency searches report CBC events with sub-minute latency. They use waveform templates that are characterized by the masses, (m1, m2), and the dimensionless spins of the binary components aligned/anti-aligned with the orbital angular momentum of the binary, $({\chi }_{1}^{z},{\chi }_{2}^{z})$. They report a best-matching template based on an appropriate detection statistic. We call the parameters of this template, $\{{m}_{1},{m}_{2},{\chi }_{1}^{z},{\chi }_{2}^{z}\}$, the point estimate. These data can be used for low-latency source property inference.
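Schematically, the point estimate is just the template in the bank that maximizes the detection statistic. A toy illustration (the bank rows and SNR values below are made up for demonstration):

```python
import numpy as np

# Toy template bank: each row is (m1, m2, chi1z, chi2z)
bank = np.array([
    [1.4, 1.4, 0.00, 0.00],
    [1.6, 1.2, 0.02, 0.00],
    [5.0, 1.4, 0.50, 0.00],
])

# Hypothetical matched-filter SNRs of each template against one event
snr = np.array([8.1, 9.4, 5.2])

# The point estimate is the parameter set of the best-matching template
point_estimate = bank[np.argmax(snr)]
print(point_estimate)
```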

2.2. Capturing Detection Uncertainties

Since the source property inference is to be done based on the point estimates, the obvious pitfall in the inference is: How accurate are the point estimates compared to the true parameters of the source? The primary goal of detection pipelines is to maximize detection efficiency at fixed false-alarm probability. While some parameters, like the chirp mass,

${ \mathcal M }={({m}_{1}{m}_{2})}^{3/5}/{({m}_{1}+{m}_{2})}^{1/5},$    Equation (1)

on which the signal strongly depends, are measured accurately, others, like the individual mass or spin components, are often inconsistent with the true parameters. Accurate parameter recovery is left to Bayesian parameter estimation analyses (Veitch et al. 2015; Ashton et al. 2019; Biwer et al. 2019).
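As a quick numerical illustration of why the chirp mass is the well-measured combination, two binaries with quite different mass ratios can share nearly the same chirp mass and are therefore nearly degenerate to a matched-filter search at leading order (the helper function name is ours):

```python
def chirp_mass(m1, m2):
    """Chirp mass of Equation (1): the mass combination on which the
    GW phase evolution depends most strongly."""
    return (m1 * m2) ** 0.6 / (m1 + m2) ** 0.2

# A canonical equal-mass BNS vs. an unequal-mass system:
# the chirp masses agree to ~1% even though the mass ratios differ.
print(chirp_mass(1.4, 1.4))
print(chirp_mass(1.6, 1.25))
```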

Consider the case of the GstLAL search (Messick et al. 2017; Mukherjee et al. 2018; Sachdev et al. 2019) in Figure 1. Here, we compare fake GW signals, whose parameters we know a priori, to the recovered template, i.e., the point estimate, obtained from injecting the fake signals into detector noise and running the pipeline. Note that the recovered masses can sometimes be significantly different from the injected values, leading to an erroneous classification of the systems based on point estimates alone. To alleviate this problem, attempts were made to capture the uncertainty in the recovery of the parameters using an effective Fisher formalism (Cho et al. 2013). This method allows us to construct an ellipsoidal region of the parameter space around the point estimate that captures the uncertainty in the parameters under the Fisher approximation. This was used to create confidence regions in the parameter estimation code RapidPE (Pankow et al. 2015), from which it was implemented in the EM-Bright pipeline to construct 90% confidence regions in three dimensions: chirp mass, symmetric mass ratio, and effective spin. This ellipsoidal region was populated uniformly with one thousand points (besides the original triggered point). The fraction of these ellipsoid samples that had ${m}_{2}\lt {m}_{\max }^{\mathrm{NS}}$ constituted the $p({\mathtt{HasNS}})$ value, while the fraction that had non-vanishing disk mass, ${M}_{\mathrm{disk}}\gt 0$, from the Foucart (2012) fit constituted the $p({\mathtt{HasRemnant}})$ value.
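The ellipsoid-sampling step described above amounts to a small Monte Carlo: draw points uniformly inside an uncertainty ellipsoid in (chirp mass, symmetric mass ratio, effective spin), convert back to component masses, and take the fraction satisfying the NS condition. In the sketch below the ellipsoid center and semi-axes are illustrative placeholders, not the Fisher-matrix values the pipeline computes:

```python
import numpy as np

rng = np.random.default_rng(0)

def uniform_in_ellipsoid(center, semi_axes, n):
    """Draw n points uniformly inside an axis-aligned ellipsoid."""
    dim = len(center)
    # Uniform directions on the unit sphere; radii ~ u^(1/dim) for
    # volume-uniform sampling.
    v = rng.normal(size=(n, dim))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    r = rng.uniform(size=(n, 1)) ** (1.0 / dim)
    return center + r * v * semi_axes

def component_masses(mchirp, eta):
    """Invert (chirp mass, symmetric mass ratio) -> (m1, m2)."""
    mtot = mchirp * eta ** (-3.0 / 5.0)
    delta = np.sqrt(np.clip(1.0 - 4.0 * eta, 0.0, None))
    return 0.5 * mtot * (1 + delta), 0.5 * mtot * (1 - delta)

# Illustrative 90% ellipsoid around a fiducial trigger: (Mc, eta, chi_eff)
center = np.array([1.7, 0.24, 0.0])
semi_axes = np.array([0.005, 0.01, 0.1])   # placeholder axes
samples = uniform_in_ellipsoid(center, semi_axes, 1000)
m1, m2 = component_masses(samples[:, 0], samples[:, 1])

# Fraction of ellipsoid samples with an NS-mass secondary
p_has_ns = np.mean(m2 < 3.0)
print(p_has_ns)
```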

Figure 1.

Figure 1. In this figure we compare the mass and spin recovery of one of the search pipelines, GstLAL (Messick et al. 2017), for injections that meet the false-alarm rate threshold of Equation (2). Upper panel: this panel shows the (m1, m2) pairs of a Gaussian-distributed BNS population $\sim { \mathcal N }[1.33{M}_{\odot },0.09{M}_{\odot }]$ (see Table 1). The left plot shows the masses injected following a normal distribution, as mentioned in Table 1, colored by the injected primary aligned spin component, ${\chi }_{1}^{z}$. The right plot shows the recovered masses colored by the recovered ${\chi }_{1}^{z}$. It can be seen that the distribution in the recovered space is significantly different from that in the injected space. One may also see that the recovered spin values may be higher than the injected ones, especially in the case of higher mass-ratio recoveries. Lower panel: this panel shows the injected values of the primary and secondary masses against their recovered values for low-mass injections. This is an example where one can see the systematic effect of the primary mass being recovered at higher values than injected. The secondary follows the opposite trend: the recovered value is less than the injected value. The effect also exists at higher mass ranges. Both plots are colored by the recovered ${\chi }_{1}^{z}$ values. Note that recoveries with ${m}_{1}^{\mathrm{rec}}\gt 2{M}_{\odot }$ (both panels) have higher values of recovered ${\chi }_{1}^{z}$. This is because the GstLAL search uses templates with low spins for masses $\leqslant 2{M}_{\odot }$ and high spins above that (see Figures 1 and 2 in Mukherjee et al. (2018), for example). Even values slightly above $2{M}_{\odot }$ may result in high spin values compared to the injections.


3. Machine Learning-based Classification

The method of uncertainty ellipsoids handles the statistical uncertainties of the parameters from the low-latency search pipelines. However, the underlying Fisher approximation is only suitable in cases with a high signal-to-noise ratio, when the parameter uncertainties are expected to be Gaussian-distributed (see Section 2 of Cutler & Flanagan 1994, for example). Also, it is not robust at capturing any bias that a search might have. Such trends are seen, for example, in Figure 1, where the m1 parameter is recovered to be larger than the injected value, while the m2 parameter is recovered to be smaller. Such uncertainties are often the dominant source of error in this inference. While they decrease as the significance increases, they may be pronounced otherwise. Capturing and correcting such selection effects can be done by supervised machine-learning algorithms. By injecting fake signals into real noise, performing the search, and comparing the recovered parameters with the original parameters of the injections, one obtains the map between the injected and recovered parameters. This is qualitatively illustrated in Figure 2. Given a broad training set, the supervised algorithm learns this map. The training features are the recovered parameters obtained after running the search; however, the labels of having an NS or remnant are determined from the injected values. It should be highlighted that we are not using machine learning to predict the recovered parameters from the injected values, or vice versa. Rather, we use it for binary classification, correcting for selection biases that could otherwise have given an erroneous answer from the point estimate. We return the probability that the binary had a component less than $3{M}_{\odot }$, which we assume to be a conservative upper limit on the NS mass, and the probability that it had remnant matter based on the Foucart et al. (2018) (hereafter F18) expression.

Figure 2.

Figure 2. This figure is a qualitative illustration of the binary classification treatment of the problem. The top panel represents the true parameter space of binaries, i.e., the injected parameters in this case, where the two colors represent satisfying either of the conditions in Equations (3) and (4). The lower panel is the parameter space of the recovery, i.e., what the search reports. For the training process, the parameters in the recovered space are the features, while the label is inferred from the actual parameters. A fiducial detection during the production running is represented by the × mark in this plane. The probability of this fiducial detection being either of the two binary classes is determined from the nearest neighbors in the recovered parameter space.


3.1. Injection Campaign

In this study, we use a broad injection set that samples the space of compact binaries. The distributions of the masses and spins are tabulated in Table 1. The injections are simulated waveforms placed in real detector noise at specific times. The BNS injections use the SpinTaylorT4 approximant (Buonanno et al. 2009), while the NSBH and BBH injections use the SEOBNR approximant (Bohé et al. 2017). We consider the injections made in two stretches of detector operation from O2 (see Table B1 for times). The population contains uniform/log-uniform distributions of the masses, and both aligned and isotropic distributions of the spins. It was used for the spacetime volume sensitivity analysis of the GstLAL search in Abbott et al. (2019b). In particular, injection campaigns were conducted for all astrophysical categories (BNS, NSBH, BBH) to analyze search sensitivity. We use the results, as a byproduct, to train our algorithm.

Table 1.  The Table Lists the Different Population Features Used in the Injection Campaign

Type Mass Distribution Spin Distribution Num. Injections
BBH $U[\mathrm{log}{m}_{1},\mathrm{log}{m}_{2}]$ $| {\chi }^{\max }| =0.99$ (Isotropic) 4.0 × 104
BBH $U[\mathrm{log}{m}_{1},\mathrm{log}{m}_{2}]$ $| {\chi }^{\max }| =0.99$ (Aligned) 1.9 × 104
BNS ${ \mathcal N }[1.33{M}_{\odot },0.09{M}_{\odot }]$ $| {\chi }^{\max }| =0.05$ (Isotropic) 1.6 × 104
BNS $U[{m}_{1},{m}_{2}]$ $| {\chi }^{\max }| =0.40$ (Isotropic) 1.6 × 104
NSBH $U[\mathrm{log}{m}_{1},\mathrm{log}{m}_{2}]$ $| {\chi }_{\mathrm{NS}}^{\max }| =0.40;$ $| {\chi }_{\mathrm{BH}}^{\max }| =0.99$ (Aligned) 1.9 × 104
NSBH $\delta ({m}_{1}-5{M}_{\odot },{m}_{2}-1.4{M}_{\odot })$ $| {\chi }_{\mathrm{NS}}^{\max }| =0.05;$ $| {\chi }_{\mathrm{BH}}^{\max }| =0.99$ (Aligned/Isotropic) 1.6 × 104/1.5 × 104
NSBH $\delta ({m}_{1}-10{M}_{\odot },{m}_{2}-1.4{M}_{\odot })$ $| {\chi }_{\mathrm{NS}}^{\max }| =0.05;$ $| {\chi }_{\mathrm{BH}}^{\max }| =0.99$ (Aligned/Isotropic) 1.7 × 104/1.3 × 104
NSBH $\delta ({m}_{1}-30{M}_{\odot },{m}_{2}-1.4{M}_{\odot })$ $| {\chi }_{\mathrm{NS}}^{\max }| =0.05;$ $| {\chi }_{\mathrm{BH}}^{\max }| =0.99$ (Aligned/Isotropic) 1.8 × 104/1.3 × 104

Note. This includes signals in the three categories of CBC signals: binary black hole (BBH), neutron star black hole (NSBH), and binary neutron star (BNS). The BBH category has both aligned and isotropic spin distributions. The BNS category has high-spinning and low-spinning systems to account for isolated high-spinning neutron stars and galactic binaries. The NSBH category includes δ-function distributions along with a uniform log mass distribution. The symbols $U,{ \mathcal N },\delta $ denote uniform, normal, and delta function distributions, respectively. These injections densely sample possible populations of binaries. The numbers of found injections that passed the FAR threshold in Equation (2) and were used in training are listed in the rightmost column. The campaign uses the SpinTaylorT4 approximant for BNS injections, and the effective-one-body approximant calibrated to numerical relativity, SEOBNR, for NSBH and BBH injections.


For an injection campaign such as this one, fake GW signals are put into real detector noise, and then the search is run in the same way as for the production data. Whether an injection is recovered depends on the noise properties and on the GW intrinsic (masses and spins) and extrinsic (distance, sky location, etc.) parameters. Since we are using real data, the dynamic variation of the power spectral density is taken into account (see Table B1 for the stretch of data used, and the splitting of the data into chunks). Not all injections are found by the searches, either because of low signal strength, or because they are at a sky location where the detectors are not sensitive. The search reports triggers based on coincidence across multiple detectors and the passing of a detection statistic threshold. The triggers are assigned a false-alarm rate (FAR) based on the frequency of background triggers that are assigned an equal or more significant value of the detection statistic. If the time of an injection coincides with the time of the recovery of a trigger, the injection is considered found. For this study, we further subsample to the set where the FAR of the recovered triggers corresponding to found injections is less than one per month,

$\mathrm{FAR}\lt 1\,{\mathrm{month}}^{-1}.$    Equation (2)

This leaves us with ∼2.0 × 105 injections to train our supervised algorithm. The breakdown into different populations is shown in Table 1. This FAR threshold is reasonable since the LIGO/Virgo public alerts in the third observing run consider a FAR threshold of one per two months, further modified by a trials factor that considers the number of independent searches (see https://emfollow.docs.ligo.org/userguide/).

3.2. Training Features and Performance

For the HasNS quantity, to label an injection as having an NS we use

${m}_{2}\leqslant 3{M}_{\odot }.$    Equation (3)

The value $\approx 3{M}_{\odot }$ has been regarded as a traditional and conservative upper limit on the NS maximum mass. The limit comes from the causality condition that the sound speed be less than the speed of light. The exact numbers, however, differ based on how the high core density is matched to the low crustal density, which is of the order of the nuclear density. If the EoS is known up to about twice the nuclear density, one obtains the $\approx 3{M}_{\odot }$ upper limit (see, for example, Rhoades & Ruffini 1974; Kalogera & Baym 1996; Lattimer 2012). Observational evidence from pulsars obeys this limit (see Table 1 of Lattimer 2012). The total mass of the GW170817 system, $\approx 2.74{M}_{\odot }$, also provides an observational upper limit. However, the system could have undergone prompt collapse to form a BH, ejecting some mass prior to collapse (see Section 2.2 of Friedman (2018), and references therein, for a discussion). Some GW template-based searches also regard $3{M}_{\odot }$ as the upper boundary for placing BNS templates (Nitz et al. 2018). Thus, Equation (3) is a conservative and fundamental inference of the presence of an NS. Note, however, that compact objects other than those in BNS, NSBH, and BBH systems that satisfy Equation (3) would be included in this inference. Our inference is based only on the secondary mass, and we do not prejudge the nature of the object.

For the HasRemnant quantity, to label an injection as having remnant matter, we use the F18 empirical fit to check for non-vanishing remnant matter (see Equation (4) therein for the expression),

${\hat{M}}_{\mathrm{rem}}^{{\rm{F}}18}\gt 0.$    Equation (4)

The F18 fit requires the compactness of the NS, and hence an EoS model. For this work, we use the 2H EoS (Kyutoku et al. 2010), which has a maximum NS mass of $2.83{M}_{\odot }$. Note that this value is not to be confused with the value in Equation (3), which is the one considered for the HasNS categorization. The value $2.83{M}_{\odot }$ for HasRemnant comes from the usage of a particular model EoS. We apply the condition in Equation (4) only to the injections that have primary mass above $2.83{M}_{\odot }$ and secondary mass below this value, i.e., NSBH systems based on this EoS. The injections with both masses less than $2.83{M}_{\odot }$ are labeled as having remnant matter, while those with both masses above this value are labeled as not having remnant matter, based on the assumption that BNS mergers will always produce some remnant matter, while BBH mergers never will. The 2H EoS is unusually stiff, resulting in NS radii of ∼15–16 km, but it errs toward larger values of the remnant matter, and is therefore a conservative choice in the sense of not misclassifying a CBC that has remnant matter as one without, given the uncertainty in the EoS. This could be extended to compute disk masses based on different EoS models reported in the literature, giving each of them individual astrophysical weights and obtaining an EoS-averaged disk mass, and thereby a $p({\mathtt{HasRemnant}})$ marginalized over the EoS.
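An F18-style remnant-mass check can be sketched as follows. The coefficient values (α ≈ 0.406, β ≈ 0.139, γ ≈ 0.255, δ ≈ 1.761) are quoted from the published fit to the best of our knowledge, and the function returns the remnant mass in units of the NS baryonic mass, so this is an illustration rather than the pipeline's implementation:

```python
import numpy as np

def isco_radius(chi):
    """Kerr ISCO radius in units of the BH mass (Bardeen et al. 1972);
    chi > 0 denotes prograde spin."""
    z1 = 1 + (1 - chi**2) ** (1/3) * ((1 + chi) ** (1/3) + (1 - chi) ** (1/3))
    z2 = np.sqrt(3 * chi**2 + z1**2)
    return 3 + z2 - np.sign(chi) * np.sqrt((3 - z1) * (3 + z1 + 2 * z2))

def remnant_mass_f18(m_bh, chi_bh, m_ns, compactness):
    """F18-style fit for the NSBH remnant baryon mass outside the BH,
    in units of the NS baryonic mass. Coefficients are the published
    best-fit values as we recall them (approximate)."""
    alpha, beta, gamma, delta = 0.406, 0.139, 0.255, 1.761
    q = m_bh / m_ns
    eta = q / (1 + q) ** 2            # symmetric mass ratio
    x = (alpha * (1 - 2 * compactness) / eta ** (1/3)
         - beta * isco_radius(chi_bh) * compactness / eta
         + gamma)
    return max(x, 0.0) ** delta

# A nonspinning 5 Msun BH swallows a C = 0.18 NS whole (no remnant),
# while a rapidly spinning one, with its smaller ISCO, disrupts it.
print(remnant_mass_f18(5.0, 0.0, 1.4, 0.18))
print(remnant_mass_f18(5.0, 0.9, 1.4, 0.18))
```

The spin dependence enters entirely through the ISCO radius, which is why a high prograde spin is so effective at producing remnant matter.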

We restrict to the part of the parameter space on which the classification strongly depends. We choose the following set of training features:

${\boldsymbol{\beta }}=\{{m}_{1},{m}_{2},{\chi }_{1}^{z},{\chi }_{2}^{z},\mathrm{SN}\}.$    Equation (5)

The reason to use more parameters than those used to label the injections is that the recovered parameters are correlated (see Figure 3). For example, the masses are expected to be positively correlated, since the chirp mass is recovered fairly accurately and is an increasing function of the individual masses. There can also exist biases in the recovery due to degeneracies in the space of CBC GW signals. For example, high spin recovery is associated with high mass ratio. Regarding the choice of the feature set, the masses and spins are natural, since they are the intrinsic properties of the binary on which the source properties depend. As a detection-specific property, we use the signal-to-noise ratio, labeled SN, since it captures the general statistical uncertainty in the recovered parameters.
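Inspecting such feature correlations on a training set is a one-liner with NumPy. The data below are a synthetic toy mimicking the trends discussed (chirp-mass-driven mass correlation, high spin templates above $2{M}_{\odot }$); the real features come from the search results:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Toy "recovered" parameter set: m1 and m2 move together through the
# well-measured chirp mass; high primary spin accompanies high primary mass.
m2 = rng.uniform(1.0, 3.0, n)
m1 = m2 * rng.uniform(1.0, 3.0, n)
chi1 = np.where(m1 > 2.0, rng.uniform(0, 0.9, n), rng.uniform(0, 0.05, n))
chi2 = rng.uniform(0, 0.05, n)
snr = rng.uniform(4, 30, n)

features = np.column_stack([m1, m2, chi1, chi2, snr])
corr = np.corrcoef(features, rowvar=False)   # 5x5 correlation matrix
print(np.round(corr, 2))
```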

Figure 3.

Figure 3. This is the correlation matrix of the recovered parameters that form our training set. The masses are expected to be correlated since there is a preference toward detecting heavier masses. The primary spin shows a strong correlation with the primary mass; however, the secondary spin recovery is not as correlated with the secondary mass. The signal-to-noise ratio is mildly correlated with the remaining parameters, which is expected since it is a detector-frame quantity largely independent of the intrinsic source properties.


With this set, we use the machinery of supervised learning provided by the scikit-learn library (Pedregosa et al. 2011) to train a binary classifier based on the search results. Once trained, the classifier outputs a probability $p({\mathtt{HasNS}})$ or $p({\mathtt{HasRemnant}})$ given arbitrary but physical values of ${\boldsymbol{\beta }}$. We tested the performance using two non-parametric algorithms: KNeighborsClassifier and RandomForestClassifier, both provided in the scikit-learn library. We found that the former outperforms the latter in our case, and it is used for this study. We train it using 11 neighbors: twice the number of dimensions, plus one to break ties. The collection of parameters of a point estimate is a point in this parameter space. To obtain the probability of this point having a secondary mass $\leqslant 3{M}_{\odot }$, or having some remnant matter based on the F18 expression, we use the nearest neighbors from the training set, weighting them by the inverse of their distance from the fiducial point,

$p=\displaystyle \frac{{\sum }_{K}^{{\prime} }{w}_{K}}{{\sum }_{K}{w}_{K}},$    Equation (6)

where the primed sum in the numerator runs over the neighbors of the fiducial point that satisfy Equation (3) or (4), the sum in the denominator runs over all neighbors, and ${w}_{K}=1/{d}_{K}$ (${w}_{K}=1$) for the inverse-distance (uniform) weighting. We also used the Mahalanobis metric (Mahalanobis 1936) in the space of ${\boldsymbol{\beta }}$, where the distance, and therefore the nearest neighbors, are determined via

$d({\boldsymbol{x}})=\sqrt{{({\boldsymbol{x}}-\tilde{{\boldsymbol{x}}})}^{{\rm{T}}}\,{{\rm{\Sigma }}}^{-1}\,({\boldsymbol{x}}-\tilde{{\boldsymbol{x}}})},$    Equation (7)

where $\tilde{{\boldsymbol{x}}}$ is the mean and Σ is the covariance matrix of the training set. This is done to handle correlations between the parameters. However, we find that the metric or weighting scheme used does not affect the result significantly (see Table 2).
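The classifier setup described above can be sketched with scikit-learn. The data below are a synthetic stand-in: in production the features are the GstLAL-recovered parameters and the labels come from the injected values:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)
n = 4000

# "Injected" secondary masses set the ground-truth HasNS label ...
m2_inj = rng.uniform(1.0, 10.0, n)
labels = (m2_inj <= 3.0).astype(int)

# ... while the training features are noisy "recovered" parameters.
m2_rec = m2_inj * rng.normal(1.0, 0.1, n)
m1_rec = m2_rec * rng.uniform(1.0, 3.0, n)
chi1_rec = rng.uniform(-0.9, 0.9, n)
chi2_rec = rng.uniform(-0.05, 0.05, n)
snr = rng.uniform(4.0, 30.0, n)
X = np.column_stack([m1_rec, m2_rec, chi1_rec, chi2_rec, snr])

# 11 neighbors (twice the dimension, plus one), inverse-distance
# weighting as in Equation (6).
clf = KNeighborsClassifier(n_neighbors=11, weights="distance")
clf.fit(X, labels)

# Probability that a fiducial point estimate corresponds to HasNS = 1.
point_estimate = np.array([[2.0, 1.4, 0.02, 0.01, 12.0]])
p_has_ns = clf.predict_proba(point_estimate)[0, 1]
print(p_has_ns)
```

The Mahalanobis variant of Equation (7) can be requested by passing `metric="mahalanobis"`, `metric_params={"VI": np.linalg.inv(np.cov(X, rowvar=False))}`, and `algorithm="brute"` to the constructor.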

Table 2.  Percentage Misclassification When Using a Threshold of p(HasNS/HasRemnant) = 0.5 to Infer a Binary to Have a Counterpart, as a Function of the Fraction of the Data Set Used for Training and Testing Purposes

Fraction   Misclassification % p(HasNS)              Misclassification % p(HasRemnant)
           Uniform  Inverse Distance  Mahalanobis    Uniform  Inverse Distance  Mahalanobis
0.1        3.21     3.38              4.24           4.04     4.34              3.77
0.2        3.03     2.98              3.80           3.65     3.62              3.28
0.5        2.91     2.96              n/a            3.00     2.92              n/a
0.9        2.83     2.80              n/a            2.65     2.64              n/a
1.0        2.83     2.82              n/a            2.59     2.59              n/a

Notes. One way to think about this table is as an example of an external partner deciding to follow up CBCs that report $p({\mathtt{HasNS}})\gt 0.5$ (or $p({\mathtt{HasRemnant}})\gt 0.5$). The table lists the fraction of such observations that would be false positives. Out of the fraction of the total data set used (leftmost column), we train on 90% and test on the remaining 10%, cycling the training/testing split to obtain predictions on all points in the set. Both uniform and inverse-distance weighting of the nearest neighbors are used in all cases. We see that the answer starts to converge when using ≳50% of the total data set. To verify that the correlations between the parameters (shown in Figure 3) do not affect the prediction and the impurity, we also trained using the Mahalanobis metric (Mahalanobis 1936) in the parameter space of Equation (5).(a) The misclassification does not change significantly with the weighting scheme or the metric used.(b)

(a) See https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.DistanceMetric.html for the implementation in the scikit-learn framework. (b) Cross-validation with the Mahalanobis metric is expensive and was performed only for small fractions of the total training data.

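The train/test cycling described in the table notes can be sketched with scikit-learn's K-fold utilities, so that every point receives a prediction from a model trained on the other 90% of the set. The data here are synthetic stand-ins for the recovered injection parameters and labels, not the injection campaign itself.

```python
# Sketch: 10-fold cross-validation, cycling the 90%/10% train/test split
# so that predictions cover the full data set.
import numpy as np
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))           # stand-in recovered parameters
y = (X[:, 0] > 0.0).astype(int)         # stand-in HasNS-like labels

cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_predict(
    KNeighborsClassifier(n_neighbors=11, weights="distance"),
    X, y, cv=cv, method="predict_proba",
)[:, 1]

# Misclassification fraction at a p > 0.5 follow-up threshold:
misclass = np.mean((scores > 0.5) != y.astype(bool))
```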

3.3. Receiver Operating Characteristic (ROC) Curve

In the case of perfect performance, one expects the trained algorithm to predict $p({\mathtt{HasNS}})=1$ ($p({\mathtt{HasRemnant}})=1$) from the recovered parameters of the fake injections that originally had an NS (had remnant matter). On the other hand, in the absence of an NS component we also do not expect any remnant matter, and hence expect $p({\mathtt{HasNS}})=p({\mathtt{HasRemnant}})=0$. To test the accuracy of the classifier, we trained the algorithm on 90% of the data set and tested it on the remaining 10%, cycling the training/testing combination over the full data set. The results are shown in Figure 5. While most of the binaries are correctly classified, as shown in the histogram plot (left panel) for the two quantities, a small fraction does not receive a perfect score ($p({\mathtt{HasNS}})=1$). The choice of threshold value when selecting binaries for follow-up operations determines the resulting impurity fraction. For example, if we use $p({\mathtt{HasNS}})\geqslant 0.5$, shown as a dashed vertical line in the upper left panel of Figure 5, the contribution of the "No NS" histogram to the right of that line constitutes the false positives. The variation of the efficiency (true-positive rate) with the false-positive rate, as a function of the applied threshold, is shown in the right panels of Figure 5. Some example values are listed in Table 3. The threshold could be set depending on the desired efficiency or, alternatively, a chosen false-positive rate. Note that the ROC curve depends on the relative rates of the different astrophysical sources. In this injection campaign, each population has been densely sampled without considering the relative rates. However, the methodology carries over directly to an injection campaign curated based on astrophysical rate estimates of mergers as more observations are made.
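The efficiency versus false-positive-rate curves described above can be computed with scikit-learn's `roc_curve`. The labels and scores below are synthetic stand-ins for the injection labels and the cross-validated $p({\mathtt{HasNS}})$ values.

```python
# Sketch: true-positive rate vs. false-positive rate as a function of the
# p(HasNS) threshold, computed with sklearn.metrics.roc_curve.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=1000)   # 1 = injection had an NS (stand-in)
scores = np.clip(labels + rng.normal(scale=0.3, size=1000), 0, 1)

fpr, tpr, thresholds = roc_curve(labels, scores)

# Pick the threshold achieving a desired false-positive rate, say 5%:
idx = np.searchsorted(fpr, 0.05)
chosen = thresholds[min(idx, len(thresholds) - 1)]
```

Sweeping `chosen` over the returned `thresholds` array reproduces the kind of trade-off tabulated in Table 3.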

Table 3.  Example Values of True Positives and False Positives for Varying Values of the Threshold Used in Figure 5

Threshold TP(HasNS) FP(HasNS) TP(HasRemnant) FP(HasRemnant)
0.07 0.999 0.144 0.995 0.106
0.27 0.995 0.096 0.979 0.040
0.51 0.986 0.061 0.949 0.014
0.80 0.959 0.028 0.894 0.003
0.94 0.900 0.010 0.822 0.001

Note. The column containing threshold values corresponds to the color bar in both right panels of Figure 5. The true-positive and false-positive values are to be read off based on the HasNS/HasRemnant case.


The predictions for a parameter sweep on the (m1, m2) values are shown in Figure 4. Considering the $p({\mathtt{HasNS}})$ plot, perfect performance of the search would have rendered the entire region below the ${m}_{2}=3{M}_{\odot }$ line as $p({\mathtt{HasNS}})=1$. In reality, we expect a fuzzy boundary around the ${m}_{2}=3{M}_{\odot }$ line, as shown in the figure. $p({\mathtt{HasRemnant}})$ behaves as expected with increasing spin values: larger aligned primary spin enlarges the region with a non-vanishing remnant mass.
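A parameter sweep of this kind can be sketched as follows. The classifier, the feature ordering (m1, m2, χ1z, S/N), and the training data are illustrative stand-ins rather than the trained model of this work; the essential pattern is evaluating a fitted classifier on a mass grid with the remaining parameters held fixed.

```python
# Sketch: evaluate a trained classifier on a grid of (m1, m2) point
# estimates, with spin and signal-to-noise ratio fixed for one panel.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Stand-in trained classifier: label 1 when the smaller mass < 3 Msun.
rng = np.random.default_rng(2)
X_train = np.column_stack([
    rng.uniform(1, 50, 2000),   # m1 (Msun)
    rng.uniform(1, 50, 2000),   # m2 (Msun)
    rng.uniform(-1, 1, 2000),   # chi1z
    rng.uniform(4, 30, 2000),   # S/N
])
y_train = (np.minimum(X_train[:, 0], X_train[:, 1]) < 3.0).astype(int)
clf = KNeighborsClassifier(n_neighbors=11).fit(X_train, y_train)

# Sweep the masses, holding spin and S/N fixed.
m1, m2 = np.meshgrid(np.linspace(1, 50, 50), np.linspace(1, 50, 50))
chi1z, snr = 0.0, 10.0
grid = np.column_stack([
    m1.ravel(), m2.ravel(),
    np.full(m1.size, chi1z), np.full(m1.size, snr),
])
p_has_ns = clf.predict_proba(grid)[:, 1].reshape(m1.shape)
```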


Figure 4. This figure shows the predictions of the trained binary classifier upon performing a parameter sweep on the (m1, m2) values. Each point on the plots is analogous to a point estimate. We feed the trained classifier with arbitrary recovered parameter values and evaluate the predictions. Left panel: $p({\mathtt{HasNS}})$ predictions on the parameter space. We sweep over the masses, keeping the spin and signal-to-noise ratio values fixed in each individual plot, incrementing the former as we move down. The horizontal line corresponds to ${m}_{2}=3{M}_{\odot }$, around which we expect a fuzzy region due to the detection uncertainties. Also, note that the performance is not very affected by increasing spin values, since our original classification did not depend on it. Small changes are, however, expected due to correlation between the parameters during recovery (see Figure 3). Right panel: $p({\mathtt{HasRemnant}})$ predictions on the parameter space. The region denoting non-zero remnant matter shows a more constrained classification about the presence of matter compared to just having an NS in the binary. Also, note that unlike $p({\mathtt{HasNS}})$, $p({\mathtt{HasRemnant}})$ is strongly affected by the primary spin, as expected. The red curve in this panel represents the contour ${M}_{\mathrm{rem}}({m}_{1}^{\mathrm{rec}},{m}_{2}^{\mathrm{rec}},{\chi }_{1}^{z\ \mathrm{rec}})=0{M}_{\odot }$, calculated from recovered parameters using Equation (4) of Foucart et al. (2018). Note that the ${M}_{\mathrm{rem}}$ expression applies to NSBH systems and requires an NS EoS that sets a maximum mass for the NS. In this study, we use the 2H EoS (Kyutoku et al. 2010), which has a maximum mass of $2.83{M}_{\odot }$. Mass components above this maximum mass are considered BHs, which do not leave remnant matter upon coalescence. This explains the kink in the red curve in the top two panels.


Figure 5. This figure shows the receiver operating characteristic curve for the classifier: the true-positive rate against the false-positive rate as a function of the threshold used to classify binaries as having an NS or having remnant matter. Top panel: the left figure is a histogram of the p(HasNS) values for the injections that represented a binary that had an NS and for those that did not. In the limit of perfect performance, the values for the former (latter) should be at p(HasNS) = 1 (p(HasNS) = 0). The true-positive and false-negative performance are decided by the threshold applied to make the decision. For example, using a value of p(HasNS) = 0.5 (dotted–dashed vertical line) would imply that all the values to the right of the line are classified as having an NS. While such a decision captures most of the true NS-bearing binaries, there is a small misclassification fraction. The right figure shows these fractions as a function of the threshold. Bottom panel: same as the top panel, except the values correspond to binaries with remnant matter post-merger.


4. Conclusion

A low-latency inference of the presence of a neutron star or of post-merger remnant matter in a compact binary merger provides crucial information about whether the binary will have an EM counterpart, and hence whether it is worth following up for observing partners. Such time-sensitive inferences have to be carried out from the low-latency point-estimate parameters provided by real-time GW search pipelines. However, the point-estimate masses and spins can be inaccurate. Bayesian parameter estimation provides the best answer to such inferences, but requires hours to days to complete. In order to correct for such systematics in low latency, we use supervised machine learning on the parameter recovery of the GstLAL online search pipeline from LIGO/Virgo operations. The result is a binary classifier, trained on an injection campaign, that learns these systematics. Once trained, the real-time evaluation of arbitrary binaries takes under a second. This method adapts to changes in the template banks of the low-latency search algorithms, provided corresponding injection campaigns are conducted. It likewise adapts to changes in the noise power spectral density of the interferometers, which naturally manifest in the performance of the search. While we have used a broad training set for the purposes of this paper, this methodology could be extended to incorporate astrophysical rates by curating injection campaigns based on our knowledge of the rates of binary mergers.
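The sub-second evaluation of a single candidate's point estimates can be illustrated as follows; the classifier, training data, and feature ordering are stand-ins for the trained model described above, not the production pipeline.

```python
# Sketch: once a nearest-neighbor classifier is trained, scoring a single
# candidate's point estimates is a fast lookup in the fitted structure.
import time
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(10000, 4))         # stand-in training features
y = (X[:, 1] < 0).astype(int)           # stand-in labels
clf = KNeighborsClassifier(n_neighbors=11).fit(X, y)

event = np.array([[1.4, 1.3, 0.0, 12.0]])  # a BNS-like point estimate (stand-in)
t0 = time.perf_counter()
p = clf.predict_proba(event)[0, 1]
elapsed = time.perf_counter() - t0
```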

This work was supported by NSF grants No. PHY-1700765, PHY-1912649, and PHY-1626190. D.C. acknowledges the use of computing resources of the LIGO Data Grid and facilities provided by Leonard E. Parker Center for Gravitation, Cosmology and Astrophysics at University of Wisconsin-Milwaukee. D.C. would like to thank Jolien Creighton, Siddharth Mohite, and Duncan Meacher for helpful discussions. The authors would like to thank the anonymous referee for helpful comments.

Software: scikit-learn (Pedregosa et al. 2011), Matplotlib (Hunter 2007), scipy (Virtanen et al. 2020), numpy (van der Walt et al. 2011), pandas (McKinney 2010), jupyter (https://jupyter.org/), SQLAlchemy (https://www.sqlalchemy.org/).

Appendix A: Parameter Sweep Showing Variation with Signal-to-noise Ratio

In this section, we present an extension of the parameter sweep results shown in Figure 4. Here we again sweep over the (m1, m2) values but keep the spins fixed, varying only the signal-to-noise ratio. The result is shown in Figure A1. The uncertainty in the recovered parameters is expected to decrease with increasing signal-to-noise ratio, which manifests as a shrinking of the fuzzy region separating the bright ($p({\mathtt{HasNS}})=1$/$p({\mathtt{HasRemnant}})=1$) and dark ($p({\mathtt{HasNS}})=0$/$p({\mathtt{HasRemnant}})=0$) regions.


Figure A1. This figure is an extension of Figure 4. Here we see the behavior of the predictions from the binary classifiers as the signal-to-noise ratio of recovery increases. Left panel: variation in $p({\mathtt{HasNS}})$ with signal-to-noise. Right panel: variation in $p({\mathtt{HasRemnant}})$ with signal-to-noise.


Appendix B: GstLAL Injection Sets

In this section, we report the calendar dates for the data chunks used in this study; they are tabulated in Table B1. The chunks cover most of the duration of the observing run, although they are not all contiguous, corresponding to breaks in the run. Three-detector injections were performed only in the last month of the second observing run; their total duration, and hence the number of missed and found injections, is therefore smaller. In future work, we plan to reanalyze the performance of the classifier based on injection campaigns in the third observing run as they are performed.

Table B1.  Calendar Times for Two Detector (H1L1) Chunks of LIGO O2 Data

GstLAL chunk Start date End date
Chunk 02 Wed Nov 30 16:00:00 GMT 2016 Fri Dec 23 00:00:00 GMT 2016
Chunk 03 Wed Jan 04 00:00:00 GMT 2017 Sun Jan 22 08:00:00 GMT 2017
Chunk 04 Sun Jan 22 08:00:00 GMT 2017 Fri Feb 03 16:20:00 GMT 2017
Chunk 05 Fri Feb 03 16:20:00 GMT 2017 Sun Feb 12 15:30:00 GMT 2017
Chunk 06 Sun Feb 12 15:30:00 GMT 2017 Mon Feb 20 13:30:00 GMT 2017
Chunk 07 Mon Feb 20 13:30:00 GMT 2017 Tue Feb 28 16:30:00 GMT 2017
Chunk 08 Tue Feb 28 16:30:00 GMT 2017 Fri Mar 10 13:35:00 GMT 2017
Chunk 09 Fri Mar 10 13:35:00 GMT 2017 Sat Mar 18 20:00:00 GMT 2017
Chunk 10 Sat Mar 18 20:00:00 GMT 2017 Mon Mar 27 12:00:00 GMT 2017
Chunk 11 Mon Mar 27 12:00:00 GMT 2017 Tue Apr 04 16:00:00 GMT 2017
Chunk 12 Tue Apr 04 16:00:00 GMT 2017 Fri Apr 14 21:25:00 GMT 2017
Chunk 13 Fri Apr 14 21:25:00 GMT 2017 Sun Apr 23 04:00:00 GMT 2017
Chunk 14 Sun Apr 23 04:00:00 GMT 2017 Mon May 08 16:00:00 GMT 2017
Chunk 15 Fri May 26 06:00:00 GMT 2017 Sun Jun 18 18:30:00 GMT 2017
Chunk 16 Sun Jun 18 18:30:00 GMT 2017 Fri Jun 30 02:30:00 GMT 2017
Chunk 17 Fri Jun 30 02:30:00 GMT 2017 Sat Jul 15 00:00:00 GMT 2017
Chunk 18 Sat Jul 15 00:00:00 GMT 2017 Thu Jul 27 19:00:00 GMT 2017

Note. We consider the injections performed by the GstLAL search in these durations for this study. The time series is available at https://www.gw-openscience.org/data/.


Footnotes

  • More precisely, this is true for low-mass systems where the waveform is dominated by the inspiral phase. For heavier BBH systems, the total mass, ${m}_{1}+{m}_{2}$, is recovered accurately.

  • 10. ${m}_{\max }^{\mathrm{NS}}=2.83{M}_{\odot }$ was used during O2 operations. This is the maximum allowed mass of an NS assuming the 2H EoS.

  • 11. In GW parameter estimation, m1 refers to the primary (larger) mass component, while m2 refers to the secondary (smaller) mass component. Likewise, ${\chi }_{1}^{z}$ (${\chi }_{2}^{z}$) refers to the aligned spin component of the primary (secondary).

  • 12. The other search in Abbott et al. (2019b; see Section 7 therein), PyCBC, conducted broad campaigns for BBH populations only. The method presented here, however, can be extended to any general CBC search given suitable training data.

  • 13. The nearest-neighbor algorithm also fits best with the intuition of a map by which the injected parameters, with the right labels, are carried over to the recovered set, rather than a decision tree made by relational operations (which look like "linear cuts") in the parameter space at every branch.
