The TARA package includes a tool (match_xy.pro) to match astronomical catalogs using positional uncertainties for individual sources plus a match significance threshold, rather than using a fixed matching radius as is commonly employed. This tool can be used to match catalogs from different observatories (e.g. to find infrared counterparts to Chandra sources), or can be used to find the union of Chandra source lists obtained from multiple source detection runs.
The second stage of the algorithm seeks to pare down the large set of significant matches produced by the first stage, which can include many-to-one and one-to-many relationships, to a reasonable one-to-one set of matches destined to be accepted by the observer as ``true'' associations. This stage is asymmetric; the matches are examined from the point of view of one of the catalogs, the ``master catalog''. We refer to the most significant match of each master source as its Primary Match (PM), and refer to any other significant matches as ``Secondary Matches'' (SM); if no significant match exists we refer to the source as ``isolated''. This stage seeks to label the PMs as either ``successful'', meaning that the observer will accept them as true associations, or ``failed'', meaning that the PM should be examined carefully. The labeling is constructed so that the successful PMs form one-to-one relationships. Imagine how a human with the two catalogs plotted together on paper might construct such a labeling by circling (accepting) matches in order of their likelihood while avoiding circling any source twice. The algorithm uses the same ``greedy'' approach, processing the matches from the first stage in order of their significance, and disallowing many-to-one and one-to-many relationships. When a PM is accepted as ``successful'' any remaining PMs which involve either participating source are labeled as ``failed''. Thus for example if two master sources are significantly close to a single slave source, the closer master will be labeled as a successful PM and the other will be labeled as a failed PM. The term ``failed'' is meant to convey that a match would have been accepted as a true association if another more likely association had not interfered.
The third stage of the algorithm produces ds9 region files, plots, and statistics which help the observer understand and examine the matching results.
Although the matching algorithm operates with only a single slave catalog, the matching process can be repeated with several slave catalogs, producing a composite catalog with columns from several slave catalogs.
The current algorithm does not estimate estimate absolute probabilities for the validity of each individual match, based on the local density of sources, as some algorithms in the literature do.
; Configure the IDL Astronomy Library. astrolib ; Load the match "save file" into IDL restore, /VERBOSE, 'acis_twomass_cat.sav' ; To save typing, give the structure array a shorter name. cat = ACIS_TWOMASS_CAT ; Examine the contents of the structure array. help, cat help, /ST, cat help, /ST, cat.acis help, /ST, cat.twomass ; Identify the "successful primary matches" using the "match type" field. is_match = (cat.twomass.type EQ 2) ; Print selected catalog columns for all the good matches. forprint, SUBSET=where(is_match), cat.acis.label, cat.twomass.designation, cat.twomass.K_CIT ; Send that output to a file instead of to the screen. forprint, SUBSET=where(is_match), cat.acis.label, cat.twomass.designation, cat.twomass.K_CIT, TEXTOUT='table.txt', /NOCOMMENT ; Print selected catalog columns for only the matches that meet some criterion. k_bright = (cat.twomass.k_cit LT 8.0) forprint, SUBSET=where(is_match AND k_bright), cat.acis.label, cat.twomass.designation, cat.twomass.K_CIT
The ``forprint'' command is in Wayne Landsman's IDL Astronomy Library; it has a FORMAT option to control the format of its output. Note that older versions of this tool do not have the SUBSET option used in the example above.
The process also produces a ds9 region file (described below) containing graphics which help the observer visualize the matching results. The meaning of the symbols and colors used are recorded in the header of the region file. The coordinate system used in the region file is the (X,Y) coordinate system used in the catalogs (see below). The regions have``group tags'', which you can see in the ds9 window Region:Groups, that help you select the classes of regions described below. This is a convenient way to select and delete a class of regions that you do not wish to see (e.g. the unmatched ``slave'' entries), or select and change the color of a region class.
The third (optional) output is a ``union'' (merged) catalog containing the union of the source lists, i.e. with matches (duplicates) represented only once. The position of a duplicate source is taken from the more accurate of the two matching positions. For example if you have multiple catalogs for the same field produced by different source detection runs this union catalog output could be adopted as the final source list for the field.
If your goal is to form the union of (i.e. merge) multiple catalogs assumed to lie in the same reference frame then an example set of IDL calls would be:
.run match_xy ; Read in three catalogs. cat_A = build_mycat('catA.txt') cat_B = build_mycat('catB.txt') cat_C = build_mycat('catC.txt') ; Pick one of the catalogs to be the master. match_xy, the_match, cat_A, 'A', /INIT ; Choose a match significance. sig = 0.99 ; Match to the second catalog. match_xy, the_match, cat_B, 'B', sig, UNION_CAT=cat_AB ; Match the AB union to the third catalog. match_xy, the_match, cat_AB, 'AB', /INIT match_xy, the_match, cat_C, 'C', sig, UNION_CAT=cat_ABCThe array of structures cat_ABC is the union of the three catalogs.
If your goal is to find 2MASS and GLIMPSE counterparts to an ACIS catalog then an example set of IDL calls would be:
.run match_xy ; Read in the catalogs. event2wcs_astr = get_astrometry_from_eventlist('acis.evt') acis_cat = build_AE_cat ('catACIS.fits') twomass_cat = build_2mass_cat ('cat2MASS.txt' , event2wcs_astr) glimpse_cat = build_glimpse_cat('catGLIMPSE.txt', event2wcs_astr) ; Let the ACIS catalog be the master. match_xy, the_match, acis_cat, 'ACIS', /INIT ; Choose a match significance. sig = 0.99 ; Match to the slaves and analyze the reference frame offsets. match_xy, the_match, twomass_cat, 'TWOMASS', sig match_xy, the_match, glimpse_cat, 'GLIMPSE', sig match_xy_analyze, the_match, composite_cat ; From the plots and statistics presented, choose reference frame offsets and rematch. match_xy, the_match, twomass_cat, 'TWOMASS', sig, XSHIFT=-0.125, YSHIFT= 0.574 match_xy, the_match, glimpse_cat, 'GLIMPSE', sig, XSHIFT= 0.074, YSHIFT=-0.066 ; Confirm the catalogs are now well aligned. ; Construct & save a ds9 region file and a composite catalog match_xy_analyze, the_match, composite_cat save, event2wcs_astr, composite_cat, FILE='composite_cat.sav'The composite catalog composite_cat is an array of structures carrying all the information from the master catalog plus information from all the the matching slave catalogs. The ds9 region file (described above) named ``composite_cat.reg'' is produced.
If you wish to use only a subset of the sources for estimating reference frame offsets you can supply a mask vector to the match_xy_analyze routine via the keyword ANALYSIS_MASK. For example:
; Use only on-axis ACIS sources with more than 10 counts. acis_mask = (acis_cat.THETA LE 2) AND (acis_cat.SRC_CNTS GE 10) match_xy_analyze, the_match, composite_cat, ANALYSIS_MASK=acis_maskA more careful approach would also mask out inaccurate entries in the slave catalog.
The XSHIFT/YSHIFT offsets you supply affect the coordinates of the objects in the region files that match_xy produces, and affect the X and Y columns found in the catalogs that match_xy produces. World coordinates found in the original catalogs (e.g. columns RA and DEC) are NOT altered. The region file produced by match_xy uses the X/Y coordinate system employed by the matching algorithm, which include any XSHIFT/YSHIFT you supplied, rather than celestial coordinates. In our own work, all our catalog readers convert RA and DEC celestial coordinates found in the catalog to the ``sky'' (X,Y) tangent plane coordinate system used by the Chandra-ACIS data, which has been carefully aligned to an astrometric reference and thus has no XSHIFT/YSHIFT applied within match_xy. Thus, we easily convert the output region file from X/Y coordinates to celestial coordinates by displaying it on top of the ACIS data, configuring ds9 to use celestial coordinates, and re-saving.
We strongly recommend that you visualize the matching results using the region file described above, considering whether you're comfortable with the results. Such visual review can often reveal defects or omissions in the catalogs you're using. You should carefully examine all sources marked as ``isolated'' to make sure you do not see obvious matches that were missed due to uncorrected reference frame offsets or unreasonably small position error estimates. Various categories of regions in the ds9 region file are identified using the ds9 tag property. The categories (tag values) can be seen and selected via the ds9 menu selection Region:Groups, making it easy to remove groups you don't want to see (e.g. unused slave sources) or to change the appearance of all the members of a group.
Included in match_xy.pro is a program called match_xy_simulate which can be used to statistically characterize the performance of the matching process, e.g. estimating the number of false matches identified. Such simulations and analysis are described in ``A Catalog of Chandra X-ray Sources in the Carina Nebula'' (Broos et. al, 2011) and in ``The Young Stellar Population in M17 Revealed by Chandra'' (Broos et al., 2006).
The program adaptive_density_2d.pro implements a simple adaptive kernel image smoothing algorithm that accounts for variations in the exposure of the observation. Note that this simple algorithm spreads out point sources more than the CIAO tool csmooth because this algorithm lacks the csmooth feature whereby bright pixels are smoothed early on, and then removed from the image so they don't spread to neighboring regions. We have typically used adaptive_density_2d to study diffuse emission in images in which the point sources have been masked.
For a particular kernel size, , the flux at position is estimated as
To derive an estimate of the error on the flux value, we first note that each pixel in our observed counts image, , is of course a sample from an unknown underlying Poisson distribution associated with that pixel. We make the simplifying assumption that all the pixels under the kernel have the same true flux (which we just estimated). By multiplying that flux by the pixel's exposure value, we then compute the mean number of counts that should be observed in each pixel if the image was repeatedly observed
The flux and flux error are used to define a simple ``significance'' value,
Application of the kernel to the exposure map in equation (1) is required for accurate smoothing of any wide-field ACIS image because the exposure varies significantly due to chip gaps and bad regions of the detector, as shown in the example exposure map in Figure 13. The algorithm naturally handles unobserved regions, such as areas explicitly masked by the observer and areas outside the field of view of ACIS, because those zero-count and zero-exposure pixels do not affect either the flux or flux error computations. As you would expect, an unobserved region tends to cause neighboring observed regions to be more heavily smoothed since a larger kernel must be used to find an adequate number of counts. Note that the algorithm will estimate a flux for unobserved pixels (zero exposure) using the nearby observed regions. This is of course a form of interpolation.
The smoothing program recognizes a negative value in the exposure map as a flag specifying that the position is ``off-field'', meaning that no flux estimate should be computed there. We flag unobserved pixels in this way for two purposes:
dmcopy "foo.img [exclude sky=region(mask.reg)][opt full]" foo_masked.img dmcopy "foo.emap[exclude sky=region(mask.reg)][opt full]" foo_masked.emap
The next step would be to consider what resolution (image size) is desirable for the final smoothed flux image. In this example we're studying diffuse emission, so 1024x1024 resolution is excessive. In any case only the largest monitors can display such an image at full resolution, and a figure in a paper could never render such high resolution. Thus, in this example, we will rebin (in IDL) to 512x512:
img = readfits('foo_masked.img') emap = readfits('foo_masked.emap') full_emap = readfits('foo.emap') img = fix(frebin(img, 512,512,/TOTAL)) emap = frebin(emap, 512,512,/TOTAL) full_emap = frebin(full_emap,512,512,/TOTAL)
If the image scene includes the field edges, where the exposure tapers off to zero due to dither, we would typically declare pixels with low exposure, say nominal, to be "off-field" to prevent pixels with very small relative exposure from producing artifacts in the flux image.
off_field = where(full_emap LT (max(full_emap)/10.)) img [off_field] = 0 emap[off_field] = -1
The call to adaptive_density_2d.pro, for a Gaussian kernel, would then look like this:
adaptive_density_2d, img, min_significance, EMAP=emap, /GAUSS, $ flux_map, error_map, radius_mapThe three output images (flux_map, error_map, radius_map) contain the flux, flux error, and kernel sizes. The minimum significance parameter controls the smoothness of the flux map produced. The optional keyword MAX_RADIUS can be used if the default maximum kernel size parameter (40) is not suitable. If the Epanechnikov kernel is desired, supply /EPAN instead of /GAUSS. If the tophat kernel is desired omit both /EPAN & /GAUSS.
In the example above the exposure map in the masked regions contained the value zero. As described above, flux values are computed for those pixels using whatever kernel is required to find enough counts. If this sort of interpolation over the holes is not desired we can flag the holes as ``off-field'' so they will be retained in the smoothed image:
off_field = where(emap EQ 0) img [off_field] = 0 emap[off_field] = -1
Alternatively, we might want to smooth over the edges of the holes, leaving the centers of the larger holes off-field:
on_field = (emap GT 0) off_field = where(smooth(float(on_field),3, /EDGE) EQ 0) img [off_field] = 0 emap[off_field] = -1