Where is Waldo (and his friends)? A comparison of anomaly detection algorithms for time-domain astronomy


Our understanding of the Universe has progressed through deliberate, targeted studies of known phenomena, like the supernova campaigns that enabled the discovery of the accelerated expansion of the Universe, as much as through serendipitous, unexpected discoveries. The discoveries of the Jovian moons and of interstellar objects like 1I/‘Oumuamua forced us to rethink the framework through which we explain the Universe and to develop new theories. Recent surveys, like the Catalina Real-time Transient Survey and the Zwicky Transient Facility, and upcoming ones, like the Rubin Observatory Legacy Survey of Space and Time, explore the parameter space of astrophysical transients at all time scales, from hours to years, and offer the opportunity to discover new, unexpected phenomena. In this paper, we investigate strategies to identify novel objects and to contextualize them within large time-series data sets, in order to facilitate the discovery of new objects and new classes of objects and the physical interpretation of their anomalous nature. We compare tree-based and manifold-learning algorithms for anomaly detection, applied to a data set of light curves from the Kepler observatory that includes the bona fide anomalous Boyajian’s star. We assess the impact of pre-processing and feature engineering schemes, and we investigate the astrophysical nature of the objects that our models identify as anomalous by augmenting the Kepler data with Gaia color and luminosity information. We find that using multiple models in combination is a promising strategy not only to identify novel time series but also to find objects that share phenomenological and astrophysical characteristics with them, which facilitates the interpretation of their anomalous characteristics.