Evaluating performance improvements

We have evaluated our implementation of density based clustering by measuring its effect on the detection of simulated signals injected into typical single detector LIGO data, and its effect on the rate of false detections.

In order to evaluate the false detection rate, the QPipeline was first applied to single detector data without injected signals. This was performed both with and without clustering. Since detectable GWB events are expected to be extremely rare in the few hours of data considered here, and since we have set our thresholds to yield event rates up to $\sim$ 1 Hz and not demanded coincidence between multiple detectors to reject false events, we can safely identify false events as those events in a given data stretch whose total normalized energy exceeded a specified detection threshold. Three sets of false events were identified: unclustered events, clustered events, and combined events formed by the union of unclustered and clustered events. The resulting false event rates as a function of detection threshold are shown in Figure 5.
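The false event rate at a given threshold is simply the number of events whose total normalized energy exceeds that threshold, divided by the analyzed livetime. The following is a minimal sketch of that bookkeeping, not the QPipeline implementation: the function name `false_event_rate`, the example energies, and the 10 second livetime are hypothetical. The combined set is formed here by concatenating the two lists, which is an assumption and would double count any event present in both sets.

```python
from bisect import bisect_right


def false_event_rate(energies, thresholds, livetime_seconds):
    """Return the rate (Hz) of events whose energy exceeds each threshold.

    energies: normalized energies of events from signal-free data (hypothetical input).
    thresholds: detection thresholds at which to evaluate the rate.
    livetime_seconds: duration of the analyzed data stretch.
    """
    if livetime_seconds <= 0:
        raise ValueError("livetime_seconds must be positive")
    ordered = sorted(energies)
    total = len(ordered)
    rates = []
    for threshold in thresholds:
        # bisect_right counts events with energy <= threshold,
        # so the remainder strictly exceed the threshold.
        n_above = total - bisect_right(ordered, threshold)
        rates.append(n_above / livetime_seconds)
    return rates


# Illustrative values only; these are not measured LIGO data.
unclustered = [12.0, 15.5, 30.0, 18.2]
clustered = [14.0, 45.0]
combined = unclustered + clustered  # assumed union; may double count shared events
thresholds = [10.0, 16.0, 40.0]

print(false_event_rate(unclustered, thresholds, 10.0))  # [0.4, 0.2, 0.0]
print(false_event_rate(combined, thresholds, 10.0))     # [0.6, 0.3, 0.1]
```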

An adverse effect of using density based clustering is the occasional rejection of highly localized signals, regardless of the detection threshold. This is due to the tendency of density based clustering to exclude isolated triggers. It is evident in Figure 6, where the detection efficiency for sinusoidal Gaussians does not converge to 100 percent for the case of clustered triggers. To overcome this, we have also considered the performance of a search consisting of the union of clustered and unclustered triggers, and compared it with that of clustered triggers only and unclustered triggers only. The resulting combined detection efficiency for sinusoidal Gaussians at a given energy threshold is then comparable to that of the unclustered case, as shown in Figure 6.

Another possible solution to this problem is to reduce the required number of tiles within the neighborhood radius to zero, permitting single tile clusters. Classical hierarchical clustering also provides an alternative to density based clustering that permits single tile clusters. Since the focus of this paper is on improved performance for signals that are extended in time and/or frequency, we have not considered either of these alternative choices here.
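To illustrate why isolated triggers are rejected, and why lowering the required neighbor count to zero would retain them, consider the following simplified neighbor-count test. It is a sketch only and not the full density based clustering algorithm: the Euclidean distance in normalized time-frequency units, the function name `keep_dense_tiles`, and all parameter values are assumptions made for illustration.

```python
import math


def keep_dense_tiles(tiles, radius, min_neighbors):
    """Keep tiles that have at least min_neighbors other tiles within radius.

    tiles: list of (time, frequency) pairs in normalized units (assumed metric).
    radius: neighborhood radius in the same normalized units.
    min_neighbors: required number of other tiles inside the neighborhood.
    """
    kept = []
    for i, (t_i, f_i) in enumerate(tiles):
        neighbors = sum(
            1
            for j, (t_j, f_j) in enumerate(tiles)
            if j != i and math.hypot(t_i - t_j, f_i - f_j) <= radius
        )
        if neighbors >= min_neighbors:
            kept.append((t_i, f_i))
    return kept


# Three nearby tiles and one isolated tile (hypothetical coordinates).
tiles = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (10.0, 10.0)]

print(len(keep_dense_tiles(tiles, radius=1.0, min_neighbors=1)))  # 3
print(len(keep_dense_tiles(tiles, radius=1.0, min_neighbors=0)))  # 4
```

With a requirement of one neighbor the isolated tile is discarded, whereas a requirement of zero neighbors retains it as a single tile cluster.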

The lower false event rate observed in Figure 5 for clustered triggers at low detection thresholds is associated with the rejection of isolated noise events, as described in Section 4. At high detection thresholds, the opposite is true: the presence of transient non-stationary "glitches" in the data that are extended in time and/or frequency causes the false event rate of clustered triggers to exceed that of unclustered triggers.

**Figure 5:** The false event rate of the search algorithm as a function of detection threshold when applied to typical LIGO data. The false event rate is shown for three different trigger sets: unclustered triggers, clustered triggers, and the union of clustered and unclustered triggers.
$\includegraphics[angle=0,width=100mm]{figures/falserate}$

**Figure 6:** Comparison of the detection efficiency vs. detection threshold (left) and receiver operator characteristic (ROC) (right) of the search algorithm, with and without clustering, applied to the detection of simulated inspiral (top), white noise burst (middle), and sinusoidal Gaussian (bottom) waveforms injected 200 times into typical LIGO data at fixed SNR.

Inspiral signals, SNR 25
$\includegraphics[angle=0,width=75mm]{figures/inspiral_25_significance-efficiency}$	$\includegraphics[angle=0,width=75mm]{figures/inspiral_25_roc}$
White noise burst signals, SNR 25
$\includegraphics[angle=0,width=75mm]{figures/noiseburst_25_significance-efficiency}$	$\includegraphics[angle=0,width=75mm]{figures/noiseburst_25_roc}$
Sinusoidal Gaussian signals, SNR 10
$\includegraphics[angle=0,width=75mm]{figures/sinegaussian_10_significance-efficiency}$	$\includegraphics[angle=0,width=75mm]{figures/sinegaussian_10_roc}$

To evaluate the effect of clustering on the detection of signals, we next applied the QPipeline to the recovery of simulated signals injected into the same single detector data. Again, this was performed both with and without clustering. An injection was identified as detected if an event was observed above the detection threshold within 1 second of the time of the injected signal. We define the detection efficiency as the fraction of injected signals that were correctly detected, and evaluate this efficiency as a function of detection threshold for signals injected at a constant signal to noise ratio.
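The detection criterion above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the QPipeline code: the function name `detection_efficiency`, the representation of events as (time, energy) pairs, and the example values are hypothetical, and "within 1 second" is taken to mean an absolute time difference of at most 1 second.

```python
def detection_efficiency(injection_times, events, threshold, window=1.0):
    """Fraction of injections with an above-threshold event within the window.

    injection_times: times of the injected signals, in seconds.
    events: list of (time, normalized_energy) pairs (assumed representation).
    threshold: detection threshold on normalized energy.
    window: maximum allowed |event time - injection time|, in seconds.
    """
    if not injection_times:
        raise ValueError("at least one injection is required")
    detected = 0
    for t_inj in injection_times:
        found = any(
            energy > threshold and abs(t_event - t_inj) <= window
            for t_event, energy in events
        )
        if found:
            detected += 1
    return detected / len(injection_times)


# Illustrative values only.
injections = [100.0, 200.0, 300.0, 400.0]
events = [(100.2, 50.0), (200.5, 8.0), (301.5, 60.0), (399.4, 30.0)]

# The second injection is too weak and the third event is 1.5 s away,
# so two of the four injections count as detected.
print(detection_efficiency(injections, events, threshold=20.0))  # 0.5
```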

In order to characterize the performance of density based clustering for a variety of signal morphologies, we have repeated this analysis for five different waveform families. They include simple Gaussian pulses, sinusoidal Gaussian pulses, and the fundamental ringdown mode of perturbed black holes, which represent signals that are highly localized in the time-frequency plane; and the inspiral phase of coalescing binary compact objects and band-limited time-windowed white noise bursts, which are both extended in the time-frequency plane. Within each waveform family, signals were injected with random parameters such as time, frequency, duration, bandwidth, and mass.

Among non-localized signals, inspirals and white noise bursts represent two extremes: white noise bursts fill a large time-frequency region, whereas inspirals, while extended in time and frequency, occupy only a small portion of that region. Three of the waveform families are ad-hoc: simple Gaussian pulses, sinusoidal Gaussian pulses, and white noise bursts. Two of the waveform families are astrophysical: inspirals and ringdowns. While we are not designing a search to target only known waveforms such as ringdowns and inspirals, they are nonetheless useful test cases because they are astrophysically motivated and because they can form a basis for comparison with other existing searches, including matched filter searches.

On the left side panels of Figure 6, we present the resulting detection efficiency as a function of detection threshold for three of the waveform families that we have considered, representing both ad-hoc and astrophysical, as well as localized and extended. On the right side panels of Figure 6, we report the receiver operator characteristic (ROC) for each waveform, which combines the measured false rate from Figure 5 with the detection efficiencies from the left side panels.
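The construction of an ROC from these two measurements can be sketched as follows: for each detection threshold, the false event rate and the detection efficiency are evaluated and paired. This is an illustrative sketch only. The function name `roc_points`, the use of the largest recovered energy per injection (with `None` marking an injection with no nearby event), and all numerical values are assumptions.

```python
def roc_points(background_energies, recovered_energies, thresholds, livetime_seconds):
    """Return sorted (false_rate_hz, efficiency) pairs, one per threshold.

    background_energies: energies of events from signal-free data.
    recovered_energies: for each injection, the largest event energy found
        near the injection time, or None if no event was found (assumed input).
    thresholds: detection thresholds to scan.
    livetime_seconds: duration of the signal-free data stretch.
    """
    if not recovered_energies or livetime_seconds <= 0:
        raise ValueError("need at least one injection and a positive livetime")
    points = []
    for threshold in thresholds:
        n_false = sum(1 for e in background_energies if e > threshold)
        n_found = sum(
            1 for e in recovered_energies if e is not None and e > threshold
        )
        points.append(
            (n_false / livetime_seconds, n_found / len(recovered_energies))
        )
    # Sort by false rate so the points trace the curve from left to right.
    return sorted(points)


# Illustrative values only; these are not measured results.
background = [5.0, 12.0, 25.0, 40.0]
recovered = [30.0, 8.0, None, 50.0]

print(roc_points(background, recovered, [10.0, 20.0, 45.0], 100.0))
# [(0.0, 0.25), (0.02, 0.5), (0.03, 0.5)]
```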

The results indicate that for the extended waveforms, such as the inspiral and noise burst waveforms, clustering increases search efficiency and significantly improves the resulting ROC by approximately an order of magnitude in false rate. The primary reason for this improved performance is the increase in measured signal energy due to clustering, which is evident as increased detection efficiency in the left hand side of Figure 6.

Although clustering provides a marked improvement for the detection of signals that are extended in time and frequency, Figure 6 indicates that clustering also adversely impacts the performance of the search for localized waveforms. In particular, the ROC for sinusoidal Gaussians is worse by roughly a factor of 3 in false rate due to the addition of clustering. The primary cause of this decreased performance is the higher false event rate, which is due to the increased significance of detector glitches after clustering, and is evident in Figure 5. For signals that are extended in time and/or frequency this higher false event rate is more than compensated by the significant improvement in detection efficiency, but for more localized signals there is no improvement in detection efficiency to compensate for the increased false rate. In practice, we expect the presence of such detector glitches to be largely mitigated by the requirement of a coincident and consistent observation of a gravitational wave in multiple detectors, as well as the absence of a signal in environmental monitors. As a result, the decreased performance for localized signals may also be somewhat mitigated.