Statistical methods in XSPEC

This page summarizes statistical methods used in XSPEC. It is under construction.

Parameter estimation

The purpose of parameter estimation is to determine the best-fit parameter values for the data sets in use and the model defined. A statistic is calculated from the data and model, and the parameters are varied until this statistic is minimized. XSPEC can use either of two families of statistics.

Chi-squared (stat chi)

The chi-squared statistic is calculated as:

χ² = ∑ (D_i - M_i)² / σ_i²

where D_i are the observed counts, M_i the model-predicted counts, and σ_i² the variance in bin i.

The observed and predicted counts are straightforward; however, in general, we do not know the true variance and have to estimate it. If the errors are Normal then we simply use the variance associated with each bin as the estimator. There are no particular issues with this as long as the variances are reasonable estimates.

If the errors are Poisson then the situation is more complex. The default option (weight standard) is to use the observed number of counts as an estimator for the underlying variance (which, for a Poisson distribution, equals the underlying mean). It is important to realize that this introduces a bias. Downward fluctuations will be weighted more heavily than upward fluctuations because, while the numerator of chi-squared for the bin will be the same, the denominator will be smaller for the downward fluctuation.

An obvious alternative is to use the predicted counts from the model as an estimator for the Poisson variance (weight model). This does not have the bias problem of the standard method; however, in practice it turns out to be unstable and can drive the fit away from the best parameters. A clever alternative was suggested by Churazov et al. and appears to work well (weight churazov). In this case the variance is estimated from the mean of nearby channels. This reduces the size of the bias since the variance is less dependent on the fluctuation of an individual bin. It does require the expected counts per bin to be relatively constant over the bins being averaged to estimate the variance.
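The bias of the standard weighting can be seen in a small numerical sketch. The Python below is illustrative only and is not XSPEC code: the function name chi_squared and its weight argument are hypothetical, and it assumes every bin has non-zero counts. It covers only the "standard" and "model" estimators; the Churazov weighting is omitted because its averaging scheme is not specified on this page.

```python
import numpy as np


def chi_squared(data, model, weight="standard"):
    """Chi-squared for binned counts (hypothetical helper, not an XSPEC API).

    weight="standard": observed counts estimate the Poisson variance.
    weight="model":    model-predicted counts estimate the Poisson variance.
    Assumes the chosen variance estimate is non-zero in every bin.
    """
    data = np.asarray(data, dtype=float)
    model = np.asarray(model, dtype=float)
    if weight == "standard":
        variance = data
    elif weight == "model":
        variance = model
    else:
        raise ValueError("weight must be 'standard' or 'model'")
    return float(np.sum((data - model) ** 2 / variance))


# One bin, model predicts 10 counts; fluctuations of -2 and +2 counts.
model = [10.0]
low = chi_squared([8.0], model)    # (8 - 10)^2 / 8
high = chi_squared([12.0], model)  # (12 - 10)^2 / 12
low_m = chi_squared([8.0], model, weight="model")
high_m = chi_squared([12.0], model, weight="model")
print(low, high, low_m, high_m)
```

With the standard weighting the downward fluctuation contributes more to chi-squared than the upward one of the same size, while with the model weighting the two contribute equally.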

C statistic (stat cstat)

If the fluctuations in the counts in each bin are solely Poisson then the probability of seeing the observation given the model (the likelihood) is:

Pr = Π M_i^(D_i) exp(-M_i) / D_i!

The best fit will occur when this probability is maximized, so the statistic we minimize is the negative of the natural log of the probability. We can choose a normalization for the statistic by adding terms which are independent of the model, since they will not change the best fit. The particular version used in XSPEC is:

C = 2 ∑ ( M_i - D_i + D_i (ln D_i - ln M_i) )
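One way to arrive at the expression above is to subtract the negative log-likelihood of a "saturated" model, in which each M_i is set equal to D_i, and multiply by two. The symbol Pr_sat is introduced here for illustration only and does not appear elsewhere on this page; the subtracted terms depend only on the data, so the best fit is unchanged.

```latex
\begin{align}
-\ln \mathrm{Pr} &= \sum_i \left( M_i - D_i \ln M_i + \ln D_i! \right) \\
-\ln \mathrm{Pr}_{\mathrm{sat}} &= \sum_i \left( D_i - D_i \ln D_i + \ln D_i! \right) \\
C = 2 \left( \ln \mathrm{Pr}_{\mathrm{sat}} - \ln \mathrm{Pr} \right)
  &= 2 \sum_i \left( M_i - D_i + D_i \left( \ln D_i - \ln M_i \right) \right)
\end{align}
```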

This particular normalization of the statistic has the advantage that it tends to χ² in the limit of large D_i.
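The large-count limit can be checked numerically. The following Python is a minimal sketch, not XSPEC code: the function name cstat is hypothetical, and bins with zero observed counts are handled by taking D ln D to be zero, which is the limiting value.

```python
import numpy as np


def cstat(data, model):
    """C = 2 * sum(M - D + D * (ln D - ln M)) for unsubtracted counts.

    Hypothetical helper, not an XSPEC API. Assumes model > 0 in every bin.
    """
    data = np.asarray(data, dtype=float)
    model = np.asarray(model, dtype=float)
    # D * ln(D / M) tends to 0 as D tends to 0, so zero-count bins
    # contribute only the M - D term.
    log_term = np.zeros_like(data)
    nonzero = data > 0
    log_term[nonzero] = data[nonzero] * np.log(data[nonzero] / model[nonzero])
    return float(2.0 * np.sum(model - data + log_term))


# Large counts: a 1-sigma upward fluctuation on a model of 10000 counts.
c_large = cstat([10100.0], [10000.0])
chi2_large = (10100.0 - 10000.0) ** 2 / 10000.0

# Small counts: the two statistics differ noticeably.
c_small = cstat([3.0], [1.0])
chi2_small = (3.0 - 1.0) ** 2 / 1.0

print(c_large, chi2_large)
print(c_small, chi2_small)
```

In the large-count case the two values agree to better than one percent, while in the small-count case they do not, which is why the choice of statistic matters most for spectra with few counts per bin.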

This statistic only works on spectra which have not been background-subtracted. If background is important and it can be modeled then the best solution is to simultaneously fit both the source and background spectra using the C statistic. If there is no simple model for the background, or this method is inappropriate, the problem becomes what is known in the statistics literature as the contaminated Poisson. The general solution to the contaminated Poisson is not known; however, there are several approaches.

XSPEC uses the profile likelihood, which is obtained by allowing each spectral channel to have one free parameter for the predicted background counts. We can then write a joint likelihood for the source and background observations, with the parameters being those we care about for the model and those we do not care about for each background channel. The profile likelihood is the total likelihood optimized over all the background parameters. In this case there is an analytic solution for the profile likelihood in terms of the observations and the source model predicted counts, and this is used in XSPEC (see appendix B of the manual).
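The profiling step can be illustrated with a simplified sketch. The Python below is not the XSPEC implementation and does not reproduce the manual's notation. It assumes a source observation with counts S_i and exposure t_s, a background observation with counts B_i and exposure t_b, a source model rate m_i, and one free background rate b_i per channel, with all counts and rates positive. All names are hypothetical. Setting the derivative of the joint log-likelihood with respect to b_i to zero gives a quadratic in b_i, whose positive root is used here.

```python
import numpy as np


def profiled_background(S, B, m, t_s, t_b):
    """Per-channel background rate maximizing the joint Poisson likelihood
    at fixed source model rate m (hypothetical helper; positive inputs assumed).

    Solves T*b^2 + (T*m - S - B)*b - B*m = 0 with T = t_s + t_b.
    """
    S, B, m = (np.asarray(x, dtype=float) for x in (S, B, m))
    T = t_s + t_b
    d = T * m - S - B
    return (-d + np.sqrt(d * d + 4.0 * T * B * m)) / (2.0 * T)


def neg_log_like(S, B, m, b, t_s, t_b):
    """Joint negative log-likelihood of source and background spectra,
    dropping the model-independent ln(S!) and ln(B!) terms."""
    S, B, m, b = (np.asarray(x, dtype=float) for x in (S, B, m, b))
    mu_s = t_s * (m + b)  # expected counts in the source observation
    mu_b = t_b * b        # expected counts in the background observation
    return float(np.sum(mu_s - S * np.log(mu_s) + mu_b - B * np.log(mu_b)))


S = [20.0, 35.0]   # source-region counts (made-up illustrative values)
B = [5.0, 8.0]     # background counts (made-up illustrative values)
m = [1.5, 2.5]     # source model rate per channel
t_s, t_b = 10.0, 10.0

b_hat = profiled_background(S, B, m, t_s, t_b)
profile = neg_log_like(S, B, m, b_hat, t_s, t_b)
# Moving the background rates away from b_hat should not lower the statistic.
worse_up = neg_log_like(S, B, m, b_hat * 1.05, t_s, t_b)
worse_down = neg_log_like(S, B, m, b_hat * 0.95, t_s, t_b)
print(b_hat, profile, worse_up, worse_down)
```

In a fit, the function of the source model parameters that gets minimized would be the profiled value, with b_hat recomputed each time the model rates change. The expression documented in appendix B of the manual remains the authoritative form.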

Parameter estimation

Chi-squared (stat chi)

C statistic (stat cstat)

Confidence intervals

Fisher matrix

Error command

Goodness-of-fit

Chi-squared

goodness command

Model comparison

Bayesian methods