next up previous contents pdf.png
Next: A. Excerpt from ASCA Up: User's Guide for the Previous: 9 IDL Configuration

Subsections



10 Other Tools


10.1 Catalog Matching

The TARA package includes a tool (match_xy.pro) to match astronomical catalogs using positional uncertainties for individual sources plus a match significance threshold, rather than using a fixed matching radius as is commonly employed. This tool can be used to match catalogs from different observatories (e.g. to find infrared counterparts to Chandra sources), or can be used to find the union of Chandra source lists obtained from multiple source detection runs.

10.1.1 Inputs

The inputs to match_xy are two catalogs (a ``master'' and a ``slave'') plus a match significance threshold (described later). Every catalog entry consists of two positional coordinates (X,Y), uncertainties for the coordinates expressed as Gaussian standard deviations ( $\sigma_X, \sigma_Y$), and a unique identifier (e.g. sequence number) for the entry. If a known reference frame offset between the catalogs is provided to match_xy (see examples) then that offset is applied to one of the catalogs at the start of the algorithm.

10.1.2 Algorithm

The first stage of the algorithm consists of hypothesis testing. For every possible pair of sources from the two catalogs we test the hypothesis that the two physical sources are spatially coincident, i.e. that the two observed positions are random samples drawn from Gaussian distributions with identical means (the true position of the single source) and with the reported Gaussian standard deviations (position errors). If the hypothesis is true then the X and Y offsets between the two observed positions will be drawn from a 2-D Gaussian probability distribution with zero mean and with X/Y variances equal to the sums of the reported variances. We reject this hypothesis if the observed positional offsets fall outside a specified confidence region of this two-dimensional Gaussian. The confidence region is specified by a match significance threshold; for example a match significance threshold of 0.99 defines a confidence region which includes 99% of the 2-D Gaussian probability distribution. This threshold directly controls the fraction of true matches which you are willing to miss due to chance (assuming the actual position errors are distributed as claimed).

The second stage of the algorithm seeks to pare down the large set of significant matches produced by the first stage, which can include many-to-one and one-to-many relationships, to a reasonable one-to-one set of matches destined to be accepted by the observer as ``true'' associations. This stage is asymmetric; the matches are examined from the point of view of one of the catalogs, the ``master catalog''. We refer to the most significant match of each master source as its Primary Match (PM), and refer to any other significant matches as ``Secondary Matches'' (SM); if no significant match exists we refer to the source as ``isolated''. This stage seeks to label the PMs as either ``successful'', meaning that the observer will accept them as true associations, or ``failed'', meaning that the PM should be examined carefully. The labeling is constructed so that the successful PMs form one-to-one relationships. Imagine how a human with the two catalogs plotted together on paper might construct such a labeling by circling (accepting) matches in order of their likelihood while avoiding circling any source twice. The algorithm uses the same ``greedy'' approach, processing the matches from the first stage in order of their significance, and disallowing many-to-one and one-to-many relationships. When a PM is accepted as ``successful'' any remaining PMs which involve either participating source are labeled as ``failed''. Thus for example if two master sources are significantly close to a single slave source, the closer master will be labeled as a successful PM and the other will be labeled as a failed PM. The term ``failed'' is meant to convey that a match would have been accepted as a true association if another more likely association had not interfered.

The third stage of the algorithm produces ds9 region files, plots, and statistics which help the observer understand and examine the matching results.

Although the matching algorithm operates with only a single slave catalog, the matching process can be repeated with several slave catalogs, producing a composite catalog with columns from several slave catalogs.

10.1.3 Limitations of the Algorithm

Spatial coincidence is the only match criterion used by the algorithm. When there are astophysical justifications for using other aspects of sources (e.g. photometry) to judge the probability of association, then more sophisiticated algorithms are helpful--see the literature for various Bayesian matching algorithms.

The current algorithm does not estimate estimate absolute probabilities for the validity of each individual match, based on the local density of sources, as some algorithms in the literature do.

10.1.4 Outputs

Our typical matching process (described in the next section) produces a binary ``IDL save file'' containing a wide table (an IDL array of structures) with a row for each master source. The table contains all the columns found in all the catalogs there were matched. The counterpart columns contain data when there was either a ``successful primary match'' or a ``failed primary match'' (defined above). Below are examples of how this structure array can be manipulated in IDL, using an ACIS/2MASS match as an example.
; Configure the IDL Astronomy Library.
astrolib

; Load the match "save file" into IDL
restore, /VERBOSE, 'acis_twomass_cat.sav'

; To save typing, give the structure array a shorter name.
cat = ACIS_TWOMASS_CAT

; Examine the contents of the structure array.
help, cat
help, /ST, cat
help, /ST, cat.acis
help, /ST, cat.twomass

; Identify the "successful primary matches" using the "match type" field.
is_match = (cat.twomass.type EQ 2)

; Print selected catalog columns for all the good matches.
forprint, SUBSET=where(is_match), cat.acis.label, cat.twomass.designation, cat.twomass.K_CIT

; Send that output to a file instead of to the screen.
forprint, SUBSET=where(is_match), cat.acis.label, cat.twomass.designation, cat.twomass.K_CIT, TEXTOUT='table.txt', /NOCOMMENT

; Print selected catalog columns for only the matches that meet some criterion.
k_bright = (cat.twomass.k_cit LT 8.0)                                                           
forprint, SUBSET=where(is_match AND k_bright), cat.acis.label, cat.twomass.designation, cat.twomass.K_CIT

The ``forprint'' command is in Wayne Landsman's IDL Astronomy Library; it has a FORMAT option to control the format of its output. Note that older versions of this tool do not have the SUBSET option used in the example above.

The process also produces a ds9 region file (described below) containing graphics which help the observer visualize the matching results. The meaning of the symbols and colors used are recorded in the header of the region file. The coordinate system used in the region file is the (X,Y) coordinate system used in the catalogs (see below). The regions have``group tags'', which you can see in the ds9 window Region:Groups, that help you select the classes of regions described below. This is a convenient way to select and delete a class of regions that you do not wish to see (e.g. the unmatched ``slave'' entries), or select and change the color of a region class.

The third (optional) output is a ``union'' (merged) catalog containing the union of the source lists, i.e. with matches (duplicates) represented only once. The position of a duplicate source is taken from the more accurate of the two matching positions. For example if you have multiple catalogs for the same field produced by different source detection runs this union catalog output could be adopted as the final source list for the field.

10.1.5 Usage

Whatever catalogs you're working with must be read into the IDL data structure match_xy uses to represent catalogs, which is an array of structures containing the following tags:
ID: a unique long integer source identifier
(X,Y): a position in some coordinate system. It does not have to be a Cartesian tangent plane system, but typically would be.
( $X_{ERR}, Y_{ERR}$): 1-sigma position error estimates, possibly different for each source.
The program file match_xy.pro contains catalog reader functions for a variety of catalog formats that we have encountered. You should be able to easily modify these examples to read in whatever catalogs you're working with. All these existing catalog readers convert RA and DEC celestial coordinates found in the catalog (and their errors) to a cartesian (X,Y) tangent plane coordinate system, e.g. the ``sky'' coordinate system used by the Chandra-ACIS data.


If your goal is to form the union of (i.e. merge) multiple catalogs assumed to lie in the same reference frame then an example set of IDL calls would be:

.run match_xy

; Read in three catalogs.
cat_A = build_mycat('catA.txt')
cat_B = build_mycat('catB.txt')
cat_C = build_mycat('catC.txt')

; Pick one of the catalogs to be the master.
match_xy, the_match,   cat_A, 'A', /INIT

; Choose a match significance.
sig = 0.99

; Match to the second catalog.
match_xy, the_match,   cat_B, 'B', sig, UNION_CAT=cat_AB

; Match the AB union to the third catalog.
match_xy, the_match,  cat_AB, 'AB', /INIT
match_xy, the_match,  cat_C,  'C',  sig, UNION_CAT=cat_ABC
The array of structures cat_ABC is the union of the three catalogs.


If your goal is to find 2MASS and GLIMPSE counterparts to an ACIS catalog then an example set of IDL calls would be:

.run match_xy

; Read in the catalogs.
event2wcs_astr = get_astrometry_from_eventlist('acis.evt')

acis_cat    = build_AE_cat     ('catACIS.fits')
twomass_cat = build_2mass_cat  ('cat2MASS.txt'  , event2wcs_astr)
glimpse_cat = build_glimpse_cat('catGLIMPSE.txt', event2wcs_astr)

; Let the ACIS catalog be the master.
match_xy, the_match, acis_cat, 'ACIS', /INIT

; Choose a match significance.
sig = 0.99

; Match to the slaves and analyze the reference frame offsets.
match_xy, the_match, twomass_cat, 'TWOMASS', sig
match_xy, the_match, glimpse_cat, 'GLIMPSE', sig
match_xy_analyze, the_match, composite_cat

; From the plots and statistics presented, choose reference frame offsets and rematch.
match_xy, the_match, twomass_cat, 'TWOMASS', sig, XSHIFT=-0.125, YSHIFT= 0.574
match_xy, the_match, glimpse_cat, 'GLIMPSE', sig, XSHIFT= 0.074, YSHIFT=-0.066

; Confirm the catalogs are now well aligned.
; Construct & save a ds9 region file and a composite catalog
match_xy_analyze, the_match, composite_cat

save, event2wcs_astr, composite_cat, FILE='composite_cat.sav'
The composite catalog composite_cat is an array of structures carrying all the information from the master catalog plus information from all the the matching slave catalogs. The ds9 region file (described above) named ``composite_cat.reg'' is produced.


If you wish to use only a subset of the sources for estimating reference frame offsets you can supply a mask vector to the match_xy_analyze routine via the keyword ANALYSIS_MASK. For example:

; Use only on-axis ACIS sources with more than 10 counts.
acis_mask = (acis_cat.THETA LE 2) AND (acis_cat.SRC_CNTS GE 10)

match_xy_analyze, the_match, composite_cat, ANALYSIS_MASK=acis_mask
A more careful approach would also mask out inaccurate entries in the slave catalog.

10.1.6 Systematic Astrometric Offsets

We must emphasize how important it is to remove reference frame offsets between catalogs before matching (using XSHIFT/YSHIFT as shown above)! Misaligned catalogs will cause many true associations to be missed and in crowded fields can easily lead to numerous false associations. Included in match_xy.pro is a program called match_xy_estimate_offset which can be used to iteratively estimate the XSHIFT and YSHIFT values that should be applied to a slave catalog.

The XSHIFT/YSHIFT offsets you supply affect the coordinates of the objects in the region files that match_xy produces, and affect the X and Y columns found in the catalogs that match_xy produces. World coordinates found in the original catalogs (e.g. columns RA and DEC) are NOT altered. The region file produced by match_xy uses the X/Y coordinate system employed by the matching algorithm, which include any XSHIFT/YSHIFT you supplied, rather than celestial coordinates. In our own work, all our catalog readers convert RA and DEC celestial coordinates found in the catalog to the ``sky'' (X,Y) tangent plane coordinate system used by the Chandra-ACIS data, which has been carefully aligned to an astrometric reference and thus has no XSHIFT/YSHIFT applied within match_xy. Thus, we easily convert the output region file from X/Y coordinates to celestial coordinates by displaying it on top of the ACIS data, configuring ds9 to use celestial coordinates, and re-saving.

10.1.7 Performance Characterization

Sometimes a catalog's position errors will represent only the random component of the uncertainty related to having a finite number of source photons, and a bright source will report a tiny error. Because the algorithm uses position errors to determine the confidence in a match, a source with tiny errors will have great difficulty matching anything, particularly if any systematic positions errors (e.g. remaining reference frame offsets) exist. Thus you may find it necessary to impose an arbitrary ``floor'' on the positions errors, as is done in the functions build_wavdetect_cat() and build_AE_cat() supplied in match_xy.pro, in order to get the brightest sources to match their obvious counterparts.

We strongly recommend that you visualize the matching results using the region file described above, considering whether you're comfortable with the results. Such visual review can often reveal defects or omissions in the catalogs you're using. You should carefully examine all sources marked as ``isolated'' to make sure you do not see obvious matches that were missed due to uncorrected reference frame offsets or unreasonably small position error estimates. Various categories of regions in the ds9 region file are identified using the ds9 tag property. The categories (tag values) can be seen and selected via the ds9 menu selection Region:Groups, making it easy to remove groups you don't want to see (e.g. unused slave sources) or to change the appearance of all the members of a group.

Included in match_xy.pro is a program called match_xy_simulate which can be used to statistically characterize the performance of the matching process, e.g. estimating the number of false matches identified. Such simulations and analysis are described in ``A Catalog of Chandra X-ray Sources in the Carina Nebula'' (Broos et. al, 2011) and in ``The Young Stellar Population in M17 Revealed by Chandra'' (Broos et al., 2006).


10.2 Adaptive Kernel Image Smoothing

The program adaptive_density_2d.pro implements a simple adaptive kernel image smoothing algorithm that accounts for variations in the exposure of the observation. Note that this simple algorithm spreads out point sources more than the CIAO tool csmooth because this algorithm lacks the csmooth feature whereby bright pixels are smoothed early on, and then removed from the image so they don't spread to neighboring regions. We have typically used adaptive_density_2d to study diffuse emission in images in which the point sources have been masked.

10.2.1 The Algorithm

Three sets of smoothing kernels are available, parameterized by the kernel size variable $r = \{0,1,2,3,\ldots\}$:
  1. tophat

    \begin{displaymath}k(r,i,j) = 1 \mbox{ when } i^2 + j^2 \leq r^2 \end{displaymath}

  2. 2-D Gaussian

    \begin{displaymath}k(0,i,j) = \delta(i,j) = 1 \mbox{ when } i=0,j=0 \end{displaymath}


    \begin{displaymath}k(r,i,j) = \exp^{-\frac{i^2+j^2}{2r^2}} \end{displaymath}

  3. Epanechnikov

    \begin{displaymath}k(r,i,j) = 1 - \frac{i^2+j^2}{(r+1)^2} \mbox{ when } i^2 + j^2 < (r+1)^2\end{displaymath}

For a particular kernel size, $r=R$, the flux at position $(X,Y)$ is estimated as

\begin{displaymath}
flux(X,Y) = \frac{counts}{exposure}
\end{displaymath} (1)

where

\begin{displaymath}counts = \sum_{x \in OB} \sum_{y \in OB} k(R,x-X,y-Y) image(x,y) \end{displaymath}


\begin{displaymath}exposure = \sum_{x \in OB} \sum_{y \in OB} k(R,x-X,y-Y) emap(x,y) \end{displaymath}

and where $OB$ is the set of pixels in the ``observed'' region (positive exposure) of the scene, $image$ is the integer-valued image of observed counts, and $emap$ is the exposure map.

To derive an estimate of the error on the flux value, we first note that each pixel in our observed counts image, $image(x,y)$, is of course a sample from an unknown underlying Poisson distribution associated with that pixel. We make the simplifying assumption that all the pixels under the kernel have the same true flux (which we just estimated). By multiplying that flux by the pixel's exposure value, we then compute the mean number of counts that should be observed in each pixel if the image was repeatedly observed

\begin{displaymath}mean(x,y) = flux(X,Y) emap(x,y)\end{displaymath}

Each (Poisson) pixel's variance is then assumed to be that mean value. We then propagate the variances of the distributions through the linear equation (1) in the usual way to obtain an error for our flux estimate:

\begin{displaymath}error = \frac{\sqrt{\sum_{x \in OB} \sum_{y \in OB} [k(R,x-X,y-Y)]^2 flux(X,Y) emap(x,y)}}{exposure} \end{displaymath}

The flux and flux error are used to define a simple ``significance'' value,

\begin{eqnarray*}
significance & = & \frac{flux}{error} \\
& = & \frac{\sum_{x...
... \in OB} \sum_{y \in OB} [k(R,x-X,y-Y)]^2 flux(X,Y) emap(x,y)}}
\end{eqnarray*}

which increases as $r$ increases since more counts are encompassed under the kernel. The observer supplies a significance threshold, and at each position $(X,Y)$ the flux is estimated using the smallest kernel size, $r$, that produces a significance exceeding the threshold. Thus, regions of low flux are more heavily smoothed than regions of higher flux. A map of the kernel sizes chosen is returned by the smoothing program.

Application of the kernel to the exposure map in equation (1) is required for accurate smoothing of any wide-field ACIS image because the exposure varies significantly due to chip gaps and bad regions of the detector, as shown in the example exposure map in Figure 13. The algorithm naturally handles unobserved regions, such as areas explicitly masked by the observer and areas outside the field of view of ACIS, because those zero-count and zero-exposure pixels do not affect either the flux or flux error computations. As you would expect, an unobserved region tends to cause neighboring observed regions to be more heavily smoothed since a larger kernel must be used to find an adequate number of counts. Note that the algorithm will estimate a flux for unobserved pixels (zero exposure) using the nearby observed regions. This is of course a form of interpolation.

The smoothing program recognizes a negative value in the exposure map as a flag specifying that the position is ``off-field'', meaning that no flux estimate should be computed there. We flag unobserved pixels in this way for two purposes:

Figure 13: An example ACIS exposure map after masking point sources.
\includegraphics[width=0.5\textwidth]{1874_background_emap}

10.2.2 Example Usage

Suppose we have a high-resolution (say 1024x1024) image (foo.img) and corresponding exposure map (foo.emap), and we have a region file which defines appropriate masks to remove the point sources. To study diffuse emission we would start by masking out the point sources:
dmcopy "foo.img [exclude sky=region(mask.reg)][opt full]" foo_masked.img
dmcopy "foo.emap[exclude sky=region(mask.reg)][opt full]" foo_masked.emap

The next step would be to consider what resolution (image size) is desirable for the final smoothed flux image. In this example we're studying diffuse emission, so 1024x1024 resolution is excessive. In any case only the largest monitors can display such an image at full resolution, and a figure in a paper could never render such high resolution. Thus, in this example, we will rebin (in IDL) to 512x512:

img       = readfits('foo_masked.img')
emap      = readfits('foo_masked.emap')
full_emap = readfits('foo.emap')

img       = fix(frebin(img,      512,512,/TOTAL))
emap      =     frebin(emap,     512,512,/TOTAL)
full_emap =     frebin(full_emap,512,512,/TOTAL)

If the image scene includes the field edges, where the exposure tapers off to zero due to dither, we would typically declare pixels with low exposure, say $<10\%$ nominal, to be "off-field" to prevent pixels with very small relative exposure from producing artifacts in the flux image.

off_field = where(full_emap LT (max(full_emap)/10.))
img [off_field] =  0
emap[off_field] = -1

The call to adaptive_density_2d.pro, for a Gaussian kernel, would then look like this:

adaptive_density_2d, img, min_significance, EMAP=emap, /GAUSS, $
                     flux_map, error_map, radius_map
The three output images (flux_map, error_map, radius_map) contain the flux, flux error, and kernel sizes. The minimum significance parameter controls the smoothness of the flux map produced. The optional keyword MAX_RADIUS can be used if the default maximum kernel size parameter (40) is not suitable. If the Epanechnikov kernel is desired, supply /EPAN instead of /GAUSS. If the tophat kernel is desired omit both /EPAN & /GAUSS.

In the example above the exposure map in the masked regions contained the value zero. As described above, flux values are computed for those pixels using whatever kernel is required to find enough counts. If this sort of interpolation over the holes is not desired we can flag the holes as ``off-field'' so they will be retained in the smoothed image:

off_field = where(emap EQ 0)
img [off_field] =  0
emap[off_field] = -1

Alternatively, we might want to smooth over the edges of the holes, leaving the centers of the larger holes off-field:

on_field  = (emap GT 0)
off_field = where(smooth(float(on_field),3, /EDGE) EQ 0)
img [off_field] =  0
emap[off_field] = -1


next up previous contents pdf.png
Next: A. Excerpt from ASCA Up: User's Guide for the Previous: 9 IDL Configuration
Patrick Broos
Penn State Department of Astronomy
2013-01-08