Synergies between low- and intermediate-redshift galaxy population classifications revealed with unsupervised machine learning

(with Malgorzata Siudek, Agnieszka Pollo, Samir Salim, Kasia Malek, Ivan Baldry, and collaborators)

Check back soon for more info on this project!

Click here to view a poster about this work (presented at NAM 2019, Lancaster, where it won Best Student Poster).

Testing a cosmological galaxy simulation with unsupervised machine learning

(with Ivan Baldry, Paulo Lisboa, Rob Crain, and collaborators)

Check back soon for more info on this project!

Reproducible k-means clustering in galaxy feature data from the GAMA survey

(with Lee Kelvin, Ivan Baldry, Paulo Lisboa, Steve Longmore, Chris Collins, and collaborators)

My first paper reports a test of the k-means clustering algorithm, guided by a unique cluster evaluation approach, as a tool for exploring the large, multidimensional datasets expected from the next generation of extragalactic surveys (e.g. Euclid). Our Monte Carlo approach identifies suitable values of k for modelling a given sample by comparing the reproducibility of clustering outcomes at each value of k with the reproducibility at other values. Reproducibility (a.k.a. stability) is an underrated tool for cluster evaluation. The approach is fast, robust, and flexible: it may be adapted for use with any clustering algorithm and any sample. A Python 3 package that implements the approach is available, along with a notebook providing instructions for its use, at this link.
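To give a flavour of the idea, here is a minimal sketch of a stability check. It is not the code from the package, and several choices in it are assumptions made purely for illustration: the data are synthetic, k-means is a bare-bones Lloyd's iteration started from randomly chosen points, and agreement between runs is measured with the adjusted Rand index (the paper's own measure may differ). The function names kmeans_labels, adjusted_rand_index, and stability are hypothetical.

```python
import numpy as np


def kmeans_labels(X, k, rng, n_iter=100):
    """Plain Lloyd's k-means from one random initialisation; returns labels."""
    # Start from k distinct data points chosen at random.
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every centre, shape (n, k).
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centre to the mean of its points; keep it if it is empty.
        new_centres = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels


def adjusted_rand_index(a, b):
    """Chance-corrected agreement between two labellings of the same points."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    # Contingency table: how many points fall in cluster i of a and j of b.
    table = np.zeros((ai.max() + 1, bi.max() + 1), dtype=np.int64)
    np.add.at(table, (ai, bi), 1)

    def pairs(n):
        return n * (n - 1) / 2.0

    sum_cells = pairs(table).sum()
    sum_rows = pairs(table.sum(axis=1)).sum()
    sum_cols = pairs(table.sum(axis=0)).sum()
    expected = sum_rows * sum_cols / pairs(len(a))
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labellings
    return (sum_cells - expected) / (max_index - expected)


def stability(X, k, n_runs=20, seed=0):
    """Mean pairwise agreement between repeated k-means runs at a given k."""
    rng = np.random.default_rng(seed)
    runs = [kmeans_labels(X, k, rng) for _ in range(n_runs)]
    scores = [
        adjusted_rand_index(runs[i], runs[j])
        for i in range(n_runs)
        for j in range(i + 1, n_runs)
    ]
    return float(np.mean(scores))


# Synthetic stand-in data: two well-separated blobs in two features.
demo_rng = np.random.default_rng(42)
X = np.vstack([
    demo_rng.normal(loc=(0.0, 0.0), scale=1.0, size=(100, 2)),
    demo_rng.normal(loc=(20.0, 0.0), scale=1.0, size=(100, 2)),
])

# Compare reproducibility across several values of k.
for k in range(2, 7):
    print(f"k = {k}: stability = {stability(X, k):.3f}")
```

Values of k whose scores stand out against their neighbours are the candidates worth inspecting. In practice, a library implementation such as scikit-learn's KMeans and adjusted_rand_score could stand in for the two hand-written helpers.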

We tested our clustering approach using a sample of 7338 galaxies taken from the Galaxy And Mass Assembly (GAMA) survey. We characterised the galaxies using five features, each relevant to galaxy evolution: stellar mass, u-r colour, Sérsic index, half-light radius, and specific star formation rate. Reproducible clustering was found at k = 2, 3, 5, and 6. The clustering outcomes at each of these values of k agreed with the established bimodality of the galaxy population. The outcomes at k = 5 and 6 appeared to indicate distinct evolutionary pathways of galaxies through the green valley, consistent with suggestions in other recent publications.

Figure: Clusters in the colour-mass plane.

Click here to view a poster about this work (presented at EWASS 2018, Liverpool, and STFC Summer School in AI & ML 2018, London).