This page summarizes statistical methods used in XSPEC. It is under construction.

Parameter estimation

The purpose of parameter estimation is to determine the best-fit parameter values of the defined model for the data sets in use. A statistic is calculated from the data and model, and the parameters are varied until this statistic is minimized. XSPEC can use either of two families of statistics.

Chi-squared (stat chi)

The chi-squared statistic is calculated by:

χ² = Σ_i (D_i − M_i)² / σ_i²

where D_i are the observed counts, M_i the model predicted counts, and σ_i² the variance.

The observed and predicted counts are straightforward; however, in general, we do not know the true variance and have to estimate it. If the errors are Normal then we just use the variance associated with each bin as the estimator. There are no particular issues with this as long as the variances are reasonable estimates.

If the errors are Poisson then the situation is more complex. The default option (weight standard) is to use the observed number of counts as an estimator for the underlying variance (which equals the underlying mean). It is important to realize that this introduces a bias: downward fluctuations are weighted more heavily than upward fluctuations because, while the numerator of chi-squared for the bin is the same, the denominator is smaller for the downward fluctuation. An obvious alternative is to use the predicted counts from the model as the estimator for the Poisson variance (weight model). This does not have the bias problem of the standard method; however, in practice it turns out to be unstable and can drive the fit away from the best parameters. A clever alternative was suggested by Churazov et al. and appears to work well (weight churazov). In this case the variance is estimated from the mean of nearby channels. This reduces the size of the bias since the variance is less dependent on the fluctuation of an individual bin. It does require the expected counts/bin to be relatively constant over the bins being averaged to estimate the variance.
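The three variance estimators can be sketched as follows. This is illustrative stand-alone code, not XSPEC's implementation; the function names are made up, and the Churazov-style estimator is simplified to a running mean of the observed counts over a few neighbouring channels.

```python
# Sketch of the three Poisson-variance estimators discussed above
# (hypothetical stand-alone code, not XSPEC's actual implementation).

def chi_squared(observed, model, variances):
    """Sum of (D_i - M_i)^2 / sigma_i^2 over all bins."""
    return sum((d - m) ** 2 / v for d, m, v in zip(observed, model, variances))

def weight_standard(observed, model):
    # Variance estimated by the observed counts in each bin.
    return [max(d, 1.0) for d in observed]   # guard against zero-count bins

def weight_model(observed, model):
    # Variance estimated by the model-predicted counts.
    return list(model)

def weight_churazov(observed, model, half_width=2):
    # Variance estimated from a running mean of nearby channels, which
    # damps the bias coming from the fluctuation of an individual bin.
    n = len(observed)
    variances = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        variances.append(sum(observed[lo:hi]) / (hi - lo))
    return variances
```

Running these on the same data shows the bias described above: with standard weighting, a bin that fluctuates downward contributes more to chi-squared than the same-sized upward fluctuation does under model weighting.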

C statistic (stat cstat)

If the fluctuations in the counts in each bin are solely Poisson then the probability of seeing the observation given the model (the likelihood) is:

L = Π_i M_i^(D_i) e^(−M_i) / D_i!

The best fit will occur when this probability is maximized, so the statistic we minimize is the negative of the log (base e) probability. We can choose a normalization for the statistic by adding terms which are independent of the model, since they will not change the best fit. The particular version used in XSPEC is:

C = 2 Σ_i [ M_i − D_i + D_i (ln D_i − ln M_i) ]

This particular normalization of the statistic has the advantage that it tends to χ² in the limit of large D_i.
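This limiting behaviour is easy to check numerically. The sketch below (not XSPEC code) compares the C statistic with chi-squared using model-predicted variances for bins with large counts; the two agree to a fraction of a percent.

```python
import math

# Numerical check (a sketch, not XSPEC code) that the C statistic,
#   C = 2 * sum( M_i - D_i + D_i * ln(D_i / M_i) ),
# approaches chi-squared with model variances when counts are large.

def cstat(observed, model):
    return 2.0 * sum(m - d + d * math.log(d / m)
                     for d, m in zip(observed, model))

def chi2_model_weights(observed, model):
    return sum((d - m) ** 2 / m for d, m in zip(observed, model))

# Bins with ~1000 counts each, so the Poisson distribution is close to Gaussian.
observed = [1030.0, 980.0, 1011.0]
model = [1000.0, 1000.0, 1000.0]
```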

This statistic only works on spectra which have not been background-subtracted. If background is important and can be modeled then the best solution is to fit both the source and background spectra simultaneously using the C statistic. If there is no simple model for the background, or this method is inappropriate, the problem becomes what is known in the statistics literature as the contaminated Poisson. The general solution to the contaminated Poisson is not known; however, there are several approaches.

XSPEC uses the profile likelihood, which is obtained by allowing each spectral channel to have one free parameter for the predicted background counts. We can then write a joint likelihood for the source and background observations, with parameters being those we care about for the model and those we do not care about for each background channel. The profile likelihood is the total likelihood optimized over all the background parameters. In this case there is an analytic solution for the profile likelihood in terms of the observations and the source model predicted counts, and this is what XSPEC uses (see appendix B of the manual). The drawback is that the profile likelihood does not have the same statistical properties as the likelihood and may be biased. Some problems have been reported with XSPEC when fitting models to spectra with < 100 counts in total.
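The profiling idea can be illustrated for a single channel. In this hypothetical, simplified setup the source region sees S counts with expectation m + f (model plus background) and the background region sees B counts with expectation r·f, where r is the background-to-source exposure ratio; XSPEC uses the analytic per-channel solution from appendix B of the manual, while here we simply scan f numerically.

```python
import math

# One-channel illustration (hypothetical, simplified) of profiling out a
# background nuisance parameter f. Source region: S counts, expected m + f.
# Background region: B counts, expected r * f.

def neg_log_like(S, B, m, f, r):
    # Poisson log-likelihoods, dropping the model-independent ln(D!) terms.
    return -(S * math.log(m + f) - (m + f)
             + B * math.log(r * f) - r * f)

def profile_stat(S, B, m, r, steps=20000, fmax=None):
    """Minimize over the background parameter f for a fixed source model m."""
    if fmax is None:
        fmax = 2.0 * max(S, B, 1)
    best = float("inf")
    for i in range(1, steps + 1):
        f = fmax * i / steps
        best = min(best, neg_log_like(S, B, m, f, r))
    return best
```

In a real fit this profiled statistic is then minimized over the source model parameters m; the numeric scan here stands in for the closed-form solution of the quadratic stationarity condition in f.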

Confidence intervals

The best-fit parameters on their own do not provide much useful information; we usually want some idea of the likely range of the parameters. Formally, the X% confidence interval on a parameter is defined so that if the observation is repeated many times and the confidence interval calculated each time, then X% of these confidence intervals will contain the true value of the parameter. Note that this is not the customary interpretation, which is to say that there is an X% chance of the true value of the parameter falling within the confidence interval. We will discuss this further in the section on Bayesian methods.

Fisher matrix

The errors written out for each parameter at the end of the fit are the one sigma errors based on the Fisher information matrix. The Fisher matrix is calculated from the second derivatives of the statistic with respect to the parameters at the best fit (where the first derivatives are zero). We invert this matrix to determine the covariance matrix, then use the square roots of the diagonal elements as the one sigma error estimates. If there is any significant correlation between parameters these errors will be underestimates.
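The procedure can be sketched for a two-parameter chi-squared surface (the names and the finite-difference scheme below are illustrative, not XSPEC internals): compute the Hessian numerically, halve it to get the Fisher matrix for chi-squared, invert, and take square roots of the diagonal.

```python
# Sketch (assumed names, not XSPEC internals): one-sigma errors from the
# second derivatives of a fit statistic at its minimum. For chi-squared
# the covariance matrix is the inverse of half the Hessian.

def hessian(stat, params, h=1e-4):
    """Second derivatives of stat() at params by central finite differences."""
    n = len(params)
    H = [[0.0] * n for _ in range(n)]
    def at(deltas):
        return stat([p + d for p, d in zip(params, deltas)])
    for i in range(n):
        for j in range(n):
            d = [0.0] * n; d[i] += h; d[j] += h; f_pp = at(d)
            d = [0.0] * n; d[i] += h; d[j] -= h; f_pm = at(d)
            d = [0.0] * n; d[i] -= h; d[j] += h; f_mp = at(d)
            d = [0.0] * n; d[i] -= h; d[j] -= h; f_mm = at(d)
            H[i][j] = (f_pp - f_pm - f_mp + f_mm) / (4 * h * h)
    return H

def one_sigma_errors_2d(stat, best_fit):
    """Invert the 2x2 Fisher matrix (Hessian / 2), sqrt of diagonal."""
    H = hessian(stat, best_fit)
    a, b = H[0][0] / 2, H[0][1] / 2
    c, d = H[1][0] / 2, H[1][1] / 2
    det = a * d - b * c
    cov = [[d / det, -b / det], [-c / det, a / det]]
    return [cov[0][0] ** 0.5, cov[1][1] ** 0.5]
```

For uncorrelated parameters this recovers the input sigmas exactly; with correlations present, the diagonal-only estimate is the underestimate mentioned above.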

Error command

A better confidence interval estimate is obtained using the error command. This command works by stepping away from the best-fit value of the requested parameter. For each fixed new value of the parameter, all other parameters are varied until the best fit is attained. The 90% confidence interval limits are then given by the parameter values for which the statistic is 2.706 greater than its best-fit value. In a similar manner, the confidence region of two or more correlated parameters can be determined using the steppar command. For two parameters, the contour plot produced by plot contour shows the 68, 90, and 99% confidence regions.
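The delta-statistic criterion behind the error command can be sketched in one dimension (illustrative code, not XSPEC's): walk away from the best-fit value until the statistic exceeds the minimum by 2.706, then bisect to the crossing. In the real command, evaluating the statistic at each trial value involves refitting all other free parameters; here stat() is just a function of one parameter.

```python
# Find the upper 90% confidence limit on one parameter by locating where
# the statistic rises 2.706 above its minimum (sketch, not XSPEC code).

def upper_bound(stat, best, delta=2.706, tol=1e-8):
    target = stat(best) + delta
    lo, hi = best, best + 1.0
    while stat(hi) < target:          # expand until the crossing is bracketed
        lo, hi = hi, best + 2 * (hi - best)
    while hi - lo > tol:              # bisect to the crossing point
        mid = 0.5 * (lo + hi)
        if stat(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For a parabolic statistic ((p − p0)/σ)² this returns p0 + σ·√2.706 ≈ p0 + 1.645σ, the familiar 90% one-parameter limit.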

Goodness-of-fit

The fit and error commands give the best-fit parameter values and confidence intervals for some model but do not tell us whether the model is appropriate. If the model does not fit the observations well then any estimated parameter values will be meaningless. To check this we need a goodness-of-fit criterion to determine whether the observed data could have been produced by the hypothesized model.

Chi-squared

This is the test that everyone knows: if χ² per degree of freedom (the number of degrees of freedom being the number of spectral bins minus the number of fitted parameters) is approximately 1 then the fit is reasonable. This is only valid if the errors are Gaussian. XSPEC writes out the χ², the reduced χ², and the null hypothesis probability. This last is the probability of the data being drawn from the underlying model (i.e. source model and instrumental response). An important point to note about reduced χ² is that a small value (<< 1) also indicates a problem - likely that the errors have been overestimated or are not Gaussian.
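The null hypothesis probability is the chi-squared survival function, P(χ² > x), evaluated at the fitted statistic for the given degrees of freedom. As a sketch (this is not how XSPEC computes it), the closed form for an even number of degrees of freedom is easy to write down:

```python
import math

# Null hypothesis probability P(chi^2 > x) for an EVEN number of degrees
# of freedom, using the closed-form series
#   exp(-x/2) * sum_{j=0}^{dof/2 - 1} (x/2)^j / j!
# (sketch only; the general case needs the regularized incomplete gamma).

def chi2_null_probability(x, dof):
    assert dof % 2 == 0, "this closed form only holds for even dof"
    term, total = 1.0, 1.0
    for j in range(1, dof // 2):
        term *= (x / 2) / j
        total += term
    return math.exp(-x / 2) * total
```

For example, with 2 degrees of freedom the familiar 5% threshold sits at χ² ≈ 5.99.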

C statistic

The C statistic does not provide a goodness-of-fit measure; however, the XSPEC version is defined so as to tend to χ² in the case of a large number of counts. This means that the C statistic will tend to a value equal to the number of degrees of freedom for a correct model, but it is dangerous to use this as a test.

In XSPEC v11 we also write out the Akaike Information Criterion and the Bayesian Information Criterion, but we have not added these to v12 yet.
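Both criteria are simple penalized versions of the fit statistic. The sketch below assumes the statistic equals −2 ln(maximum likelihood) up to model-independent constants (true for the C statistic as normalized above); lower values indicate the preferred model.

```python
import math

# Information criteria as penalized fit statistics (sketch):
#   AIC = stat + 2k,   BIC = stat + k ln(n)
# where k is the number of fitted parameters and n the number of bins,
# assuming stat = -2 ln(max likelihood) up to constants.

def aic(stat, n_params):
    return stat + 2 * n_params

def bic(stat, n_params, n_bins):
    return stat + n_params * math.log(n_bins)
```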

goodness command

If the data are Poisson (i.e. counting statistics are the only error source) then the goodness command can be used to estimate whether the fit is reasonable. The goodness command repeatedly generates fake data based on the current model and calculates the fit statistic. The distribution of fake fit statistics is then compared with that observed. If most of the fake fits are better than the observed one then it is unlikely that the observed data were drawn from the model. The goodness command has an additional option which, for each faked data set, uses a set of model parameters randomly generated using the covariance matrix instead of the best-fit parameters.
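The core Monte Carlo loop can be sketched as follows (illustrative code, not XSPEC's implementation; for simplicity it uses chi-squared with model variances as the statistic and skips refitting each fake data set):

```python
import math
import random

# Monte Carlo sketch of the goodness idea: draw fake Poisson data sets from
# the best-fit model and compare the observed statistic with the fakes'.

def poisson_sample(rng, lam):
    # Knuth's multiplication method; fine for the modest means used here.
    limit, k, p = math.exp(-lam), 0, 1.0
    while p > limit:
        k += 1
        p *= rng.random()
    return k - 1

def goodness(observed, model, n_fakes=200, seed=42):
    rng = random.Random(seed)
    def stat(data):
        return sum((d - m) ** 2 / m for d, m in zip(data, model))
    obs_stat = stat(observed)
    better = sum(1
                 for _ in range(n_fakes)
                 if stat([poisson_sample(rng, m) for m in model]) < obs_stat)
    return better / n_fakes   # fraction of fakes that fit better than the data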

Model comparison

Often we do not know the correct scientific model for a source, so we try several different models. For each model we can find the best-fit parameters and statistic value. We then want to know whether one of these models is better than the others.

Nested models

The case of nested models occurs when we have two models, one of which is a subset of the other. A very common case is that of adding an emission line to a model and then asking whether the data justify the addition. A very common procedure is to use the F-test, which compares the χ² statistics for the two models. Do not do this. The derivation of the standard equation used to interpret the F-test implicitly assumes that the simpler model does not lie on the boundary of the more complex model. This is not true for the case of an emission line, because the boundary of the emission line model is a normalization of zero, which is precisely the model without the extra component. The only way to avoid this is to fit the line without prejudice as to whether it is in emission or absorption (i.e. in XSPEC, allow the normalization to be negative).

We can take an approach similar to that of the goodness command. Consider the case of testing for an emission line. We define a statistic as the ratio of the likelihoods of the model with and without the emission line (the difference of the χ² values in the Gaussian error case). We calculate this statistic for the observed data. We then generate fake data sets based on the model without the emission line. To do this we randomly generate a set of model parameters based on the covariance matrix (similar to the sim option on the goodness command), then use these parameters to create the faked data. We fit these data with both models and calculate the likelihood ratio statistic. If the true model from which the observed data are drawn does not have an emission line, then we expect the actual statistic to lie within the distribution of faked statistics. If there really is an emission line then the observed statistic should not be consistent with the distribution of faked statistics.
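A stripped-down sketch of this simulation is below. To keep both fits analytic it assumes Gaussian errors (σ = 1 in every bin), so each fit is linear least squares and Δχ² plays the role of the likelihood ratio; the null model is a constant, the alternative adds a fixed-shape line with a free normalization. For simplicity the fakes use the null best fit directly rather than drawing parameters from the covariance matrix. All names are illustrative, not XSPEC's.

```python
import math
import random

X = list(range(20))
# Fixed line profile: hypothetical Gaussian at channel 10, width 2.
LINE = [math.exp(-0.5 * ((x - 10.0) / 2.0) ** 2) for x in X]

def fit_constant(y):
    """chi^2 and best-fit level of the constant-only (null) model."""
    a = sum(y) / len(y)
    return sum((yi - a) ** 2 for yi in y), a

def fit_constant_plus_line(y):
    """chi^2 of constant + line-normalization model (2x2 normal equations)."""
    n = len(y)
    sg, sgg = sum(LINE), sum(g * g for g in LINE)
    sy, sgy = sum(y), sum(g * yi for g, yi in zip(LINE, y))
    det = n * sgg - sg * sg
    a = (sy * sgg - sg * sgy) / det
    b = (n * sgy - sg * sy) / det
    return sum((yi - a - b * g) ** 2 for yi, g in zip(y, LINE))

def line_significance(y, n_fakes=200, seed=1):
    rng = random.Random(seed)
    chi2_0, a0 = fit_constant(y)
    observed_delta = chi2_0 - fit_constant_plus_line(y)
    fake_deltas = []
    for _ in range(n_fakes):
        fake = [a0 + rng.gauss(0.0, 1.0) for _ in X]   # fakes from null model
        d0, _ = fit_constant(fake)
        fake_deltas.append(d0 - fit_constant_plus_line(fake))
    # p-value: fraction of null-model fakes at least as extreme as observed.
    p = sum(1 for d in fake_deltas if d >= observed_delta) / n_fakes
    return observed_delta, fake_deltas, p
```

A small p means the observed Δχ² sits outside the distribution produced by the line-free model, supporting the presence of the line. Note that every fake Δχ² is non-negative, since the constant model is nested in the constant-plus-line model.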

There is a Tcl script to help with this method, although it is difficult to write a completely general script so it may need modification for particular cases.

Non-nested models

We deal with non-nested models analogously to nested models, although we do have to decide which model is the null hypothesis (equivalent in the example above to the model without the emission line).

Bayesian methods

statistical methods in XSPEC (last edited 2008-09-08 20:17:27 by KeithArnaud)