
4 dataset_1d Tool

The dataset_1d tool is used to study the distribution of univariate datasets. If supplied with a simple vector containing N measurements of some quantity, dataset_1d will estimate the probability density function that describes the dataset, for example by computing a histogram of the data, and will allow the user to visualize and fit that density. Analysis may be restricted to a subset of the data by specifying a range of values to include or exclude. The following sections describe the specific capabilities of dataset_1d.
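As an illustration of the histogram-style density estimate mentioned above, here is a minimal Python sketch. It is not the tool's own code; the function name histogram_density and its bin_size and phase parameters are hypothetical, chosen only to show how the bin size and bin phase enter the estimate.

```python
import math


def histogram_density(data, bin_size, phase=0.0):
    """Estimate a probability density by histogramming `data`.

    Bin edges sit at phase + k * bin_size for integer k. Returns
    (edges, density), where density[i] is the fraction of datapoints
    in bin i divided by bin_size, so the density integrates to 1.
    """
    if not data:
        raise ValueError("data must be non-empty")
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")

    # Left edge of the first bin: the largest grid point <= min(data).
    lo = phase + math.floor((min(data) - phase) / bin_size) * bin_size
    n_bins = int(math.floor((max(data) - lo) / bin_size)) + 1

    counts = [0] * n_bins
    for x in data:
        # Clamp to guard against floating-point rounding at the edges.
        idx = min(max(int((x - lo) / bin_size), 0), n_bins - 1)
        counts[idx] += 1

    edges = [lo + i * bin_size for i in range(n_bins + 1)]
    density = [c / (len(data) * bin_size) for c in counts]
    return edges, density
```

Calling histogram_density on the same data with a different bin_size or phase generally gives a visibly different density, which is why those choices matter when a fit depends on the binned estimate.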

4.1 Getting Datasets into dataset_1d

4.2 Mode Droplist

A droplist just above the plot controls the basic mathematical entity that is plotted.


4.3 Axis & Title Controls

Axis ranges are specified by pan-and-zoom-style controls found below and to the left of the plot window. When the 1-1 button is checked, a 1-1 aspect ratio is maintained and the Zoom, Center, and Range buttons affect both axes. For example, with 1-1 checked you can "zoom in" on an image feature either by pressing either Range button and selecting the corners of the region you want to display, or by pressing Center, clicking on the feature, and then pressing Zoom to scale.

The Titles button brings up a dialog box that lets you specify miscellaneous properties of the plot.

The Big Marker may also be moved with the left mouse button and the Small Marker may be moved with the right mouse button. Clicking the middle mouse button will display information about the nearest datapoint, density function bin, or distribution function sample. If world coordinates have been defined for the axes, the mouse position in those coordinates is displayed continuously as the mouse is moved.


4.4 Selected Dataset Droplist

The Univariate Analysis widget can analyze multiple 1-D datasets, plotting density functions for each on the same plot. Each dataset has a unique name; the selected dataset's name is shown in a droplist to the left of the mode droplist. Many controls pertain only to this selected dataset.

4.5 File Menu

The File menu is used to print the display and to save the density and distribution functions to FITS files. Dialog boxes will appear to let you configure PostScript parameters and choose filenames.

4.6 Region-of-interest Controls

At the top of the widget are controls that let you specify an interval of data values that defines a region-of-interest (ROI). For example, to compute statistics on a range of data you would either:
  1. Move the two markers to the ends of the range by clicking the left & right mouse buttons.

  2. Press the Use Markers button.

  3. Change the left-hand droplist from None to Stats.
or
  1. Type in the range endpoints in the Edit dialog box.

  2. Change the left-hand droplist from None to Stats.

The right-hand droplist may be used to exclude rather than include a range of data.

If the left-hand droplist is set to Filter, then datapoints falling outside the ROI are excluded from analysis. Don't forget, if you wish this filter to propagate to the Working Dataset you must press the Apply Filter button (Section 3.3.1).
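The include/exclude behavior of the region-of-interest can be sketched in Python as follows. The function names apply_roi and basic_stats are hypothetical, and treating the interval endpoints as inclusive is an assumption, since the text above does not say how points exactly on the boundary are handled.

```python
def apply_roi(data, low, high, exclude=False):
    """Return the datapoints selected by the interval [low, high].

    With exclude=False, points inside the interval are kept.
    With exclude=True, points outside the interval are kept.
    Endpoints are treated as inside (an assumption).
    """
    if low > high:
        low, high = high, low  # tolerate endpoints given in either order
    return [x for x in data if (low <= x <= high) != exclude]


def basic_stats(values):
    """Return (count, mean, sample standard deviation) of `values`."""
    n = len(values)
    if n == 0:
        raise ValueError("no datapoints selected")
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1) if n > 1 else 0.0
    return n, mean, var ** 0.5
```

For example, basic_stats(apply_roi(data, 2.0, 4.0)) corresponds roughly to choosing Stats with an included range, while passing exclude=True corresponds to the right-hand droplist's exclude setting.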

4.7 Analysis Menu, Fitting

The selected dataset (optionally filtered) may be fit to a gaussian+polynomial probability model by direct application of the Maximum Likelihood Method, i.e. by directly maximizing the likelihood of the data (see Chapter 10 of [Bevington and Robinson1992]). One advantage of this method is that it does not use a density function estimated from the univariate dataset. Such density estimates invariably require choosing arbitrary parameters, such as histogram bin size and phase.

The menu item Fit Setup creates controls that allow you to specify the number of gaussian components and the order of the polynomial background component in the model, as well as supply initial values for the model parameters. If you press the button labeled "Mouse" you will be asked to click on the density plot to define the initial parameter values for the current gaussian component. Individual model parameters may be frozen at the value you supplied by changing the droplist next to the parameter from "free" to "fixed". Up to three gaussian components are allowed, although only the parameters for one gaussian are displayed at a time. A gaussian component will be used in the fit if its initial amplitude parameter is non-zero or if its amplitude parameter is marked "free". Similarly, a polynomial term will be used in the fit if its initial coefficient parameter is non-zero or if its coefficient parameter is marked "free".

To perform the fit, choose the Perform Fit menu item. Since the Maximum Likelihood Method evaluates the model at each datapoint, the fit will be slow for large datasets. We constrain the integral of the model over the range of the data to be equal to the number of datapoints. Thus, the amplitudes of the gaussian components plus the coefficients of the polynomial terms cannot all be free parameters. One of them is chosen to be a derived parameter so that the model integral comes out right. As a result, the number of free parameters, reported in the fit result message, is often one less than you expect.
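The normalization constraint can be illustrated with a small Python sketch for a model with one gaussian plus a constant (order-0 polynomial) background. This is not the tool's implementation. The function names are hypothetical, and making the gaussian amplitude the derived parameter is an assumption made for illustration; the text above does not say which parameter the tool derives.

```python
import math


def gauss_integral(mu, sigma, a, b):
    """Integral of exp(-(x - mu)^2 / (2 sigma^2)) over [a, b]."""
    s = sigma * math.sqrt(2.0)
    return (sigma * math.sqrt(2.0 * math.pi) * 0.5
            * (math.erf((b - mu) / s) - math.erf((a - mu) / s)))


def derived_amplitude(mu, sigma, c0, a, b, n):
    """Gaussian amplitude that makes the model integrate to n on [a, b].

    The model is amp * exp(-(x - mu)^2 / (2 sigma^2)) + c0, so
    amp * gauss_integral + c0 * (b - a) = n fixes amp once the
    other three parameters are chosen.
    """
    return (n - c0 * (b - a)) / gauss_integral(mu, sigma, a, b)


def log_likelihood(data, mu, sigma, c0):
    """Log-likelihood of the data, with the amplitude derived.

    The model is evaluated once per datapoint, which is why the cost
    grows with the size of the dataset.
    """
    a, b, n = min(data), max(data), len(data)
    amp = derived_amplitude(mu, sigma, c0, a, b, n)
    total = 0.0
    for x in data:
        f = amp * math.exp(-0.5 * ((x - mu) / sigma) ** 2) + c0
        if f <= 0.0:
            return float("-inf")  # model must be positive at every datapoint
        total += math.log(f)
    return total
```

In this sketch the model has four parameters (amplitude, mu, sigma, c0), but only three are free because the amplitude follows from the constraint. A fit would maximize log_likelihood over the free parameters with whatever optimizer is at hand.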

The Kolmogorov-Smirnov statistic is used to characterize the goodness-of-fit (see Section 14.3 of [Press1992] and Section 4.5.2 of [Babu and Feigelson1996]) by comparing the model probability density to the density you've estimated from the data (the plot you're looking at). Thus, the choices you've made in computing the density estimate (bin size, bin phase, smoothing) may slightly affect the KS statistic. Unfortunately, this fitting method does not conveniently produce estimates for the errors on the fit parameters.
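To show why the binning choices can leak into the goodness-of-fit number, here is a minimal Python sketch of a Kolmogorov-Smirnov style statistic computed from two binned densities. It assumes, without confirmation from the text above, that both densities are accumulated bin by bin into cumulative distributions and that the statistic is the largest absolute difference between them. The function name ks_statistic_binned is hypothetical.

```python
def ks_statistic_binned(data_density, model_density, bin_size):
    """Largest absolute gap between two cumulative distributions.

    Both inputs are densities sampled on the same equal-width bins.
    Each is accumulated into a running cumulative distribution, and
    the maximum absolute difference over all bins is returned.
    """
    if len(data_density) != len(model_density):
        raise ValueError("densities must be sampled on the same bins")

    data_cdf = 0.0
    model_cdf = 0.0
    d_max = 0.0
    for d, m in zip(data_density, model_density):
        data_cdf += d * bin_size
        model_cdf += m * bin_size
        d_max = max(d_max, abs(data_cdf - model_cdf))
    return d_max
```

Because the cumulative sums are only compared at bin boundaries, changing the bin size or phase changes where the comparison is made, which is consistent with the small dependence on binning described above.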

If you would prefer to perform a traditional least squares fit of your density function to a gaussian+polynomial model, then you must export the density function to the function_1d tool (see the next section). Least squares fitting is fast and will give you error estimates on the parameters, but you'll face the angst of choosing bin sizes, phases, and smoothing, and of worrying about what happens to the weighting of bins that have small numbers (including zero) of counts.

4.8 Analysis Menu, Exporting

If desired, the density of the selected dataset and/or the model function may be exported to a function_1d tool for further analysis. For example, suppose you wanted to try fitting your dataset with both a single gaussian model and with a double gaussian model, and you wanted to produce a plot that shows both models and the density estimate (histogram) for the data. This cannot be done in the Univariate Analysis widget because only one fit exists at a time. However, by exporting all the functions you're working with (the density function plus the fit functions you create) to a function_1d tool (which knows how to work with multiple functions), you can create the plot you need.
  1. Perform the first fit

  2. Export the first fit - this creates a new function_1d tool

  3. Change the number of gaussian components (Fit Setup) then perform the second fit

  4. Export the second fit - it is added to the function_1d tool you recently created

  5. Export the density function itself

  6. In the function_1d tool, edit the function descriptions as desired, turn on the legend, adjust the plot symbols, line styles, and colors as desired, adjust the axes as desired, and print.


Patrick Broos
Penn State Department of Astronomy
2013-01-08