Source Grouping#

Introduction#

In Point Spread Function (PSF) photometry, the accuracy of measuring a source’s brightness can be compromised by the light from nearby sources. When sources are close to each other, their individual light profiles overlap, affecting the fit of the PSF model used for measurement. To address this, a grouping algorithm can be employed to combine neighboring sources into distinct sets that are then analyzed simultaneously.

The primary objective of this grouping is to ensure that the light from any source within one group does not significantly spill over into the area where a source in another group is being measured. This method of creating and analyzing smaller groups of sources is more computationally efficient than attempting to fit a model to all the sources in an image at once, a task that is often impractical, especially in densely populated star fields.

A straightforward method for this grouping was introduced by Stetson (1987). This algorithm determines whether a given source’s light profile interferes with that of any other source by using a “critical separation” parameter. This parameter sets the minimum distance required between two source for them to be placed in separate groups. Typically, this critical separation is defined as a multiple of the stellar full width at half maximum (FWHM), which is a measure of the source’s apparent size.

Getting Started#

To group sources, Photutils includes a tool called SourceGrouper. This class organizes sources into groups by applying a technique known as hierarchical agglomerative clustering, which uses a distance-based criterion. This functionality is implemented using the scipy.cluster.hierarchy.fclusterdata function from the SciPy library.

Typically, to group sources during PSF fitting, one would provide a SourceGrouper object, configured with a minimum separation distance, directly to one of the PSF photometry classes. However, for the purpose of illustration, we will show how to use the SourceGrouper class independently to group stars within a sample image.

The first step is to generate a simulated astronomical image that contains sources modeled as 2D Gaussians, which we will accomplish using the make_psf_model_image function:

>>> from photutils.datasets import make_noise_image
>>> from photutils.psf import CircularGaussianPRF, make_psf_model_image
>>> shape = (256, 256)
>>> fwhm = 4.7
>>> psf_model = CircularGaussianPRF(fwhm=fwhm)
>>> psf_shape = (11, 11)
>>> n_sources = 100
>>> flux = (500, 1000)
>>> border_size = (7, 7)
>>> data, stars = make_psf_model_image(shape, psf_model, n_sources,
...                                    model_shape=psf_shape,
...                                    flux=flux,
...                                    border_size=border_size, seed=123)
>>> noise = make_noise_image(shape, mean=0, stddev=2, seed=123)
>>> data += noise

The make_psf_model_image provides two outputs: the image itself, which we call data, and a table containing the positions and fluxes of the stars, which we call stars. The x and y coordinates of the stars are located in the x_0 and y_0 columns of this table.

Let’s display the image:

>>> import matplotlib.pyplot as plt
>>> fig, ax = plt.subplots(figsize=(8, 8))
>>> ax.imshow(data, origin='lower')

(Source code, png, hires.png, pdf, svg)

With the simulated data ready, we can now identify groups of stars. The first step is to create a SourceGrouper object. For this example, we will define the minimum separation between stars in different groups (min_separation) as 2.5 times the full width at half maximum (FWHM). The FWHM value is known because it was used to create the simulated stars. In a real-world scenario, you would first need to measure the FWHM from the actual star images:

>>> from photutils.psf import SourceGrouper
>>> fwhm = 4.7
>>> min_separation = 2.5 * fwhm
>>> grouper = SourceGrouper(min_separation)

After initializing the SourceGrouper, we apply it to the x and y coordinates of the stars. While we are using the known, true positions from our simulated data, you would typically use a source detection tool to find the star positions in an actual image:

>>> import numpy as np
>>> x = np.array(stars['x_0'])
>>> y = np.array(stars['y_0'])
>>> group_ids = grouper(x, y)
>>> print(group_ids[:20])  # first 20 group IDs
[ 1  2  3  4  5  6  7  8  9 10 11  4  6  3 12 13 14 15 16 17]

The result of this process is an array of integers, group_id, where each integer represents the group to which the corresponding star. Stars that share the same group ID are considered part of the same group.

When performing PSF photometry, you can add the group IDs to the initial parameters table (init_params) that is passed to the photometry tool. If you provide these group IDs, a SourceGrouper does not need to be passed to the photometry class, as the grouping will already be defined.

Returning a SourceGroups Object#

Alternatively, you can set the return_groups_object keyword to True when calling the SourceGrouper object, and it will return a SourceGroups object instead of an array of integers:

>>> groups = grouper(x, y, return_groups_object=True)
>>> print(type(groups))
<class 'photutils.psf.groupers.SourceGroups'>

In this case, groups is a SourceGroups object that contains the grouping results and provides convenient methods for analysis. This object stores the source coordinates, group IDs, and provides properties and methods to analyze the grouping.

The grouping algorithm separated the 100 stars into 65 distinct groups:

>>> print(groups.n_groups)
65

You can access the group IDs directly from the groups attribute, which is an array of integers corresponding to the input star coordinates. Stars with the same group ID belong to the same group:

>>> print(groups.groups[:20])  # first 20 group IDs
[ 1  2  3  4  5  6  7  8  9 10 11  4  6  3 12 13 14 15 16 17]

Similar to above, you can add the group IDs from groups.groups to the initial parameters table (init_params) that is passed to the photometry tool to define the source grouping.

To find the positions of the stars in group 3, you can use the get_group_sources method:

>>> x_group3, y_group3 = groups.get_group_sources(3)
>>> print(x_group3, y_group3)
[60.32708921 58.73063714] [147.24184586 158.0612346 ]

The SourceGroups object also provides useful properties and methods to analyze the grouping results:

>>> # Get the size of each group for each source
>>> sizes = groups.sizes
>>> print(f'Group sizes: {sizes[:5]}')  # first 5
Group sizes: [1 2 2 5 2]

>>> # Get the mapping of group IDs to group sizes
>>> size_map = groups.size_map
>>> print(f'Size map: {list(size_map.items())[:5]}')  # first 5
Size map: [(1, 1), (2, 2), (3, 2), (4, 5), (5, 2)]

>>> print(f'Largest group size: {max(size_map.values())}')
Largest group size: 5

>>> # Get a list of group IDs that have the largest group size
>>> largest_group_ids = ([gid for gid, size in size_map.items()
...                       if size == max(size_map.values())])
>>> print(f'Largest group IDs: {largest_group_ids}')
Largest group IDs: [4]

>>> # Get the centroid of group 5
>>> xy_center = groups.group_centers[5]
>>> print(f'Group 5 center: {xy_center}')
Group 5 center: (48.35899721341876, 73.85258893310564)

To visualize the results, we can use the plot method, which draws color-coded circles around each star to show which stars have been grouped together:

>>> import matplotlib.pyplot as plt
>>> fig, ax = plt.subplots(figsize=(8, 8))
>>> ax.imshow(data, origin='lower')
>>> groups.plot(radius=fwhm, ax=ax, lw=2, seed=123)

(Source code, png, hires.png, pdf, svg)

You can also label each group with its ID by setting the label_groups keyword:

>>> import matplotlib.pyplot as plt
>>> fig, ax = plt.subplots(figsize=(8, 8))
>>> ax.imshow(data, origin='lower')
>>> groups.plot(radius=fwhm, ax=ax, lw=2, seed=123,
                label_groups=True, label_offset=(6, 6))

(Source code, png, hires.png, pdf, svg)