XSPEC and low count spectra

Using XSPEC on spectra with small numbers of counts per bin requires some care. In the following discussion we consider ways to ensure that we get an unbiased estimate of the parameters. We do not consider the problem of deciding whether the model with these parameters is a good fit.

Chi-squared : diagnostics and explanations

It is well-known that chi-squared doesn't work properly when there are significant numbers of bins in the spectrum with only a few counts. Folk wisdom is that there should be at least 5 counts in every bin. A good diagnostic for problems is if the best fit model is biased low relative to the data. The problems occur because the denominator in chi-squared is the variance for each bin. However, we don't necessarily know that variance so have to estimate it. The standard estimate is simply to use the data (assuming a Poisson distribution). Now, consider what happens if the data in a bin is an upward fluctuation or a downward fluctuation. Because we use the observed data as the denominator then a downward fluctuation contributes more to chi-squared than an upward fluctuation by the same amount. The net result is that the best-fit model is biased downwards.

For instance, this example shows that the data is systematically above the model at energies above 2 keV. Note that I binned up the plot using setplot rebin to make it look a bit better but this command does not bin the data used in the fit.

Use cstat when there is no background

In the simple case where the background is non-existent or negligible and you can assume that the counts in a given channel are just a Poisson variable with mean given by the model-predicted counts in that channel then the best solution is to change the fit statistic to cstat (statistic cstat). The same model and response used above now gives the fit below which no longer has a downward bias at energies above 2 keV.

Confidence regions estimation proceeds in an identical fashion to that for the chi-squared statistic.

Here is an example of the effects of bias in chi-squared. Performing 1000 simulations of a Chandra spectrum with 1000 counts and a power-law index of 1.7 and then recovering the power-law index gives a mean and 90% range of

       Standard chi-squared    1.41     (-0.15, +0.16) 
       Gehrels chi-squared     1.88     (-0.22, +0.25) 
       Churazov chi-squared    1.69     (-0.15, +0.16)  
       Maximum Likelihood      1.70     (-0.13, +0.14) 

When there is background it is more complicated

Fit both the source and background files simultaneously

When possible, the best way to approach this is to simultaneously fit to both the source and background files with a background model for the background file and a combination of source model and background model for the source file. The parameters for the background models should be linked. A simple example would be set up in xspec as follows

  XSPEC12> data 1:1 source.pi 2:2 background.pi
  XSPEC12> model phabs(apec) + pow

where phabs(apec) is the source model and pow the background model. You will now be prompted for parameters for data group 1 (ie the source file) then data group 2 (ie the background file). For the second (background) set of parameters freeze all the parameters of phabs(apec) and set the normalization to zero. Link the second set of parameters for the pow to the first set of parameters. Now fit using cstat.

The problems with this method are :

Use a modified cstat

xspec has a modified version of cstat (W, actually a profile likelihood) for the case when there is a background file. Our simple example now becomes

  XSPEC12> data source.pi 
  XSPEC12> back background.pi
  XSPEC12> model phabs(apec)

This works well in many cases however is known to fail sometimes when there are very few total counts in the spectrum. In this case, try grouping the spectrum so that there is at least one count in each bin. I have no idea why this helps but it does.

Bin up the spectrum

An alternative approach is simply bin up the spectrum so that there are enough counts in each bin that the chi-squared bias is not significant. Note that this should be done using rbnpha or grppha, not within xspec using setplot rebin. Ideally, the binning scheme should not depend on the exact number of counts in each bin - ie the group min option in grppha - but a deterministic scheme decided on before looking at the spectrum. This has method has the obvious disadvantage that the binning erases information.

Change the denominator in chi-squared

Since the problem with chi-squared arises because the estimator we use for the variance does not work well we can try a different estimator. One such scheme is to calculate an average number of counts over the surrounding spectral bins and use that as an estimator for the variance. This is done in xspec by changing the chi-squared weighting using the command weight churazov.

low count spectra (last edited 2012-04-27 17:02:07 by KeithArnaud)