Next: 5 dataset_2d Tool
Up: User's Guide for the
Previous: 3 Event Browser
4 dataset_1d Tool
The dataset_1d tool is used to study the distribution of univariate datasets. If supplied with a simple vector containing N measurements of some quantity, dataset_1d will estimate the probability density function that describes the dataset, for example by computing a histogram of the data, and will allow the user to visualize and fit that density. Analysis may be restricted to a subset of the data by specifying a range of values to include or exclude.
The following sections describe the specific capabilities of dataset_1d.
- As described in Sections 3.3.2 & 3.3, Event Browser can send one or more named datasets into dataset_1d.
- The menu selection File -> Load 1-D Dataset can be used to read a column of data from a FITS binary table or an ASCII file.
- From the IDL prompt you can load a vector of data into dataset_1d (Section 8).
A droplist just above the plot controls the basic mathematical entity that
is plotted.
- Scatter Plot: The value of each 1-D datapoint is simply plotted
against the datapoint's index. The plot symbol and color may be changed
by pressing the Edit button.
- Density Function: A binned density function for the 1-D dataset is
estimated and plotted. The Y-axis values are normalized by the bin size. Thus they represent the number of datapoints per unit of the quantity being measured (e.g. seconds, eV, pixels), rather than the number of datapoints falling in each bin, which is how histograms are often displayed.
The line style, color, bin size, bin phase, and error bar presentation
may be changed by pressing the Edit button. By default, the
density function is simply a scaled histogram. An error (sigma)
estimate for each bin is made based on simple Poisson counting
statistics, i.e. the error on a bin with N events is $\sqrt{N}$. Bins
with 0 events are arbitrarily assigned an error of 1.
You may, however, specify that the histogram should be smoothed. Note that a smoothed histogram made with a small bin size approximates
the result obtained by the kernel smoothing method often
recommended by statisticians in the field of non-parametric density
estimation [Silverman1986]. Kernel smoothing avoids spurious
features often found in histograms - features that change dramatically
when the phase of the bins is changed. Unfortunately, I do not know
how to put error bars on a smoothed histogram.
- Distribution Function: A distribution function (the integral
of the density) for the 1-D dataset is plotted.
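The density and distribution functions described above can be illustrated outside the tool. The following Python sketch is not dataset_1d's own code (the tool is written in IDL); the function names binned_density and distribution_function are hypothetical. It assumes that the bin phase simply shifts the bin grid, that the per-bin error is divided by the bin size along with the counts, and that the distribution function is normalized to 1. None of these details is confirmed by this guide.

```python
import math


def binned_density(data, bin_size, bin_phase=0.0):
    """Return (left_edges, density, sigma) for a 1-D dataset.

    Bin edges lie at bin_phase + k * bin_size (assumed meaning of "phase").
    """
    # Index of the grid bin that holds the smallest datapoint.
    first = math.floor((min(data) - bin_phase) / bin_size)
    counts = {}
    for x in data:
        k = math.floor((x - bin_phase) / bin_size) - first
        counts[k] = counts.get(k, 0) + 1

    edges, density, sigma = [], [], []
    for k in range(max(counts) + 1):
        n = counts.get(k, 0)
        # Poisson error sqrt(N); empty bins are arbitrarily given an error of 1.
        err = math.sqrt(n) if n > 0 else 1.0
        edges.append(bin_phase + (first + k) * bin_size)
        # Divide by the bin size: datapoints per unit of the measured quantity.
        density.append(n / bin_size)
        # Assumption: the error is scaled the same way as the counts.
        sigma.append(err / bin_size)
    return edges, density, sigma


def distribution_function(data):
    """Return sorted values and the fraction of datapoints <= each value."""
    xs = sorted(data)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


# Four datapoints, bins of width 2 starting at 0: counts are 3, 0, 1.
edges, density, sigma = binned_density([0.5, 1.2, 1.7, 5.0], bin_size=2.0)
# density is [1.5, 0.0, 0.5]; the empty middle bin gets sigma 1 / 2.0 = 0.5.
```

Changing bin_phase in this sketch moves datapoints across bin boundaries, which is the source of the spurious, phase-dependent features mentioned above.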
4.3 Axis & Title Controls
Axis ranges are specified by pan-and-zoom-style controls found below and to the
left of the plot window.
- Arrow buttons: These buttons pan the plot window left & right.
- Zoom-, Zoom+: These buttons change the range of the axis.
- Auto: Setting this button makes the axis range follow the data.
- Range: This button prompts you to choose the axis
range by clicking the mouse on the plot.
- Center: This button prompts you to click on the plot location that should become the new center of the plot.
- X-edit/Y-edit: This button brings up a dialog box that lets you
type values for the axis endpoints, lets you choose a logarithmic style, and
lets you specify the margins to the left & right of the plot (where the
Y-title goes).
When the 1-1 button is checked, a 1-1 aspect ratio is maintained and the Zoom, Center, and Range buttons affect both axes. For example, with 1-1 checked you can "zoom in" on an image feature either by pressing either Range button and selecting the corners of the region you want
to display, or by pressing Center, clicking on the feature, and then pressing Zoom to scale.
The Titles button brings up a dialog box that lets you specify
miscellaneous properties of the plot.
- window dimensions (the size of the plot window on the screen)
- titles
- date annotation
- marker positions: Two markers exist in the plot coordinate system,
displayed as red plus signs. Their plot coordinates may be changed in this dialog box, and, if world coordinates have been defined, their positions in that system are also displayed here. The markers are used for defining regions of interest and for specifying positions used in various analyses (see below).
The Big Marker may also be moved with the left mouse button
and the Small Marker may be moved with the right mouse button.
Clicking the middle mouse button will display information about the
nearest datapoint, density function bin, or distribution function sample.
If world coordinates have been defined for the axes, the mouse position in those coordinates is displayed continuously as the mouse is moved.
4.4 Selected Dataset Droplist
The Univariate Analysis widget can analyze multiple 1-D datasets, plotting density functions for each on the same plot. Each
dataset has a unique name - the selected dataset's name is shown
in a droplist to the left of the mode droplist. Many controls pertain
only to this selected dataset.
The File menu is used to print the display and to save the density
and distribution functions to FITS files. Dialog boxes will appear to
let you configure PostScript parameters and choose filenames.
At the top of the widget are controls that let you specify an interval of
data values that define a region-of-interest. For example, to compute
statistics on a range of data you would either:
- Move the two markers to the ends of the range by clicking
the left & right mouse buttons.
- Press the Use Markers button.
- Change the left-hand droplist from None to Stats.
or
- Type in the range endpoints in the Edit dialog box.
- Change the left-hand droplist from None to Stats.
The right-hand droplist may be used to exclude rather than include a
range of data.
If the left-hand droplist is set to Filter, then datapoints
falling outside the ROI are excluded from analysis. Don't forget, if
you wish this filter to propagate to the Working Dataset you must press
the Apply Filter button (Section 3.3.1).
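The include/exclude behavior of the region-of-interest can be sketched in a few lines of Python. This is an illustration only, not the tool's implementation; roi_select and roi_stats are hypothetical names, and treating the range endpoints as inclusive is an assumption.

```python
import statistics


def roi_select(data, lo, hi, exclude=False):
    """Keep datapoints inside [lo, hi], or outside it when exclude is True.

    Endpoints are treated as inclusive (an assumption for this sketch).
    """
    if exclude:
        return [x for x in data if not (lo <= x <= hi)]
    return [x for x in data if lo <= x <= hi]


def roi_stats(values):
    """Basic statistics on the selected datapoints."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        # The sample standard deviation needs at least two points.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


data = [1, 2, 3, 4, 5, 10]
inside = roi_select(data, 2, 4)                 # include mode: [2, 3, 4]
outside = roi_select(data, 2, 4, exclude=True)  # exclude mode: [1, 5, 10]
stats = roi_stats(inside)
```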
The selected dataset (optionally filtered) may be fit to a
gaussian+polynomial probability model by direct application of the Maximum
Likelihood Method, i.e. by directly maximizing the likelihood of
the data (see Chapter 10 of [Bevington and Robinson1992]).
One advantage of this method is that it does not use a density function
estimated from the univariate dataset. Such density estimates invariably
require choosing arbitrary parameters, such as histogram bin size and phase.
The menu item Fit Setup creates controls that allow you to
specify the number of gaussian components and the order of the
polynomial background component in the model, as well as supply initial
values for the model parameters. If you press the button labeled
``Mouse'' you will be asked to click on the density plot to define the
initial parameter values for the current gaussian component.
Individual model parameters may be frozen at the value you supplied by
changing the droplist next to the parameter from ``free'' to
``fixed''. Up to three gaussian components are allowed, although only
the parameters for one gaussian are displayed at a time. A gaussian
component will be used in the fit if its initial amplitude parameter
is non-zero OR if its amplitude parameter is marked ``free''.
Similarly, a polynomial term will be used in the fit if its initial
coefficient parameter is non-zero OR if its coefficient parameter is
marked ``free''.
To perform the fit, choose the Perform Fit menu item. Since the
Maximum Likelihood Method evaluates the model at each datapoint, the
fit will be slow for large datasets. We constrain the integral of the
model over the range of the data to be equal to the number of
datapoints. Thus, the amplitudes of the gaussian components plus the
coefficients of the polynomial terms cannot all be free parameters.
One of them is chosen to be a derived parameter so that the model
integral comes out right. As a result, the number of free parameters,
reported in the fit result message, is often one less than you expect.
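The normalization constraint can be made concrete with a small Python sketch. It assumes the simplest case of one gaussian plus a constant (order-0 polynomial) background, and it picks the gaussian amplitude as the derived parameter; which parameter the tool actually derives is not stated here. All function names are hypothetical. The sketch only evaluates the negative log-likelihood; a real fit would pass it to a minimizer.

```python
import math
import random


def gaussian_integral(mu, sigma, lo, hi):
    """Integral over [lo, hi] of exp(-(x - mu)^2 / (2 sigma^2))."""
    root2 = math.sqrt(2.0)
    return sigma * math.sqrt(math.pi / 2.0) * (
        math.erf((hi - mu) / (sigma * root2))
        - math.erf((lo - mu) / (sigma * root2))
    )


def derived_amplitude(n_points, mu, sigma, c0, lo, hi):
    """Amplitude that makes the model integrate to n_points over [lo, hi]."""
    return (n_points - c0 * (hi - lo)) / gaussian_integral(mu, sigma, lo, hi)


def neg_log_likelihood(data, mu, sigma, c0):
    """Unbinned negative log-likelihood; the model is evaluated at every datapoint."""
    lo, hi = min(data), max(data)
    amp = derived_amplitude(len(data), mu, sigma, c0, lo, hi)
    total = 0.0
    for x in data:
        model = amp * math.exp(-0.5 * ((x - mu) / sigma) ** 2) + c0
        total -= math.log(model)
    return total


# Synthetic data: 500 gaussian events on top of 500 uniformly distributed events.
random.seed(1)
data = [random.gauss(5.0, 1.0) for _ in range(500)]
data += [random.uniform(0.0, 10.0) for _ in range(500)]

nll_true = neg_log_likelihood(data, mu=5.0, sigma=1.0, c0=50.0)
nll_off = neg_log_likelihood(data, mu=7.0, sigma=1.0, c0=50.0)
# The parameters used to generate the data give the lower (better) value.
```

Here mu, sigma, and c0 are the free parameters and the amplitude is derived, so the count of free parameters is three rather than four. The loop over every datapoint is also why the fit slows down for large datasets.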
The Kolmogorov-Smirnov statistic is used to characterize the
goodness-of-fit (see Section 14.3 of [Press1992] and
Section 4.5.2 of [Babu and Feigelson1996]) by comparing the model probability
density to the density you've estimated from the data (the plot you're
looking at). Thus, the choices you've made in computing the density
estimate (bin size, bin phase, smoothing) may slightly affect the KS
statistic. Unfortunately, this fitting method does not conveniently
produce estimates for the errors on the fit parameters.
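For reference, the Kolmogorov-Smirnov statistic is the largest absolute difference between two cumulative distributions. The Python sketch below uses the standard unbinned form, comparing the empirical distribution of the datapoints to a model cumulative distribution. That is a substitution made for simplicity: as described above, the tool compares the model to the binned density estimate. The name ks_statistic and the uniform model are hypothetical.

```python
def ks_statistic(data, model_cdf):
    """Largest gap between the empirical and model cumulative distributions.

    model_cdf(x) must return the model's cumulative probability at x (0 to 1).
    """
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = model_cdf(x)
        # The empirical distribution steps from (i - 1) / n to i / n at x,
        # so check the gap on both sides of the step.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def uniform_cdf(x):
    """Cumulative distribution of a uniform model on [0, 1]."""
    return min(max(x, 0.0), 1.0)


# Five evenly spaced points against a uniform model: every gap is 0.1.
d = ks_statistic([0.1, 0.3, 0.5, 0.7, 0.9], uniform_cdf)
```

A small value of d means the model tracks the data closely. Turning d into a significance level requires the KS probability distribution, which is omitted here.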
If you would prefer to perform a traditional least squares fit of your
density function to a gaussian+polynomial model, then you must export
the density function to the function_1d tool (see the next section).
Least squares fitting is fast and will give you error estimates on the
parameters, but you'll face the angst of choosing bin sizes, phases,
and smoothing and worrying about what happens to the weighting of bins
that have small numbers (including zero) of counts.
If desired, the density of the selected dataset and/or the model
function may be exported to a function_1d tool for further analysis.
For example, suppose you wanted to try fitting your dataset with both a
single gaussian model and with a double gaussian model, and you wanted
to produce a plot that shows both models and the density estimate
(histogram) for the data. This cannot be done in the Univariate
Analysis widget because only one fit exists at a time. However, by
exporting all the functions you're working with (the density function
plus the fit functions you create) to a function_1d tool (which knows
how to work with multiple functions), you can create the plot you
need.
- Perform the first fit
- Export the first fit - this creates a new function_1d tool
- Change the number of gaussian components (Fit Setup) then
perform the second fit
- Export the second fit - it is added to the function_1d tool you
recently created
- Export the density function itself
- In the function_1d tool, edit the function descriptions as
desired, turn on the legend, adjust the plot symbols, line styles, and
colors as desired, adjust the axes as desired, and print.
Patrick Broos
Penn State Department of Astronomy
2013-01-08