We developed a new research platform, QUOTAS, that combines observational and simulated data to facilitate the study of supermassive black hole (SMBH) populations1. Multi-wavelength observations of accreting sources and cosmological simulations that incorporate black-hole growth as part of galaxy formation have grown in volume and complexity. A further data deluge is expected as a large number of observational facilities, both in space and on the ground, come online soon. Our research tools need to be updated commensurately to leverage data-driven discovery in this upcoming era. The QUOTAS database is tailored to harness ongoing developments in machine learning (ML) techniques and algorithms for new discoveries2,3,4,5. QUOTAS co-locates these disparate kinds of data to permit detailed analysis of the growth and evolution of SMBHs over cosmic time.

In the pilot version, QUOTAS comprises all detected optical quasars at z ≥ 3 alongside simulated data spanning the same cosmic epochs. At the highest redshifts, only the most luminous quasars are detected, which gives us an incomplete and biased census of the SMBH population. Our goal is to go beyond these extreme, rare sources and uncover the more representative and ubiquitous quasars at early epochs, which will permit building more comprehensive and robust models for the formation and growth history of SMBHs as a function of redshift. The queryable QUOTAS database and the associated Colab notebooks for analysing the data, along with illustrative examples that demonstrate the ease of use, are available on the public Google Kaggle platform.

This project was carried out in collaboration with Google-X (Jack Hidary & Joe Tricot). QUOTAS has been developed and optimized to use machine learning algorithms to make predictions for associations between supermassive black holes, their host galaxies and their parent dark matter halos. The pilot version of the project focuses on supermassive black holes at z > 3 and corresponding simulated slices from The First Billion Years (FiBY) & Legacy Simulations.


Using ML, we were able to populate the expanded Legacy 1 Gpc^3 dark-matter-only simulation volume with quasars from the smaller IllustrisTNG simulation box, which includes the full suite of baryonic physics and feedback. We then compared the properties of the simulated quasar population at z ~ 3 with the observed sources from the SDSS.
Intriguingly, we find that the simulations are unable to reproduce the properties of the observed quasar population because the efficiency of AGN feedback is too high, which stunts the growth of supermassive black holes prematurely. It is well understood that simulations have difficulty reproducing the observed quasar population at z > 6, and not just because of the rarity of these sources. We now find that they are unable to do so even at later times, at z ~ 3.
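
The idea of populating a large dark-matter-only volume with quasars learned from a smaller hydrodynamical box can be illustrated with a toy example. The sketch below is not the QUOTAS pipeline: it swaps the ML model for a simple least-squares fit in log space, and every number in it (the slope, intercept, scatter, mass ranges and catalogue sizes) is invented for illustration. It fits a relation between halo mass and black-hole mass on a synthetic "hydro" sample, then applies that relation, with scatter, to a synthetic "dark-matter-only" halo catalogue.

```python
import random
import statistics


def fit_log_relation(log_mhalo, log_mbh):
    """Least-squares fit of log M_BH = slope * log M_halo + intercept.

    Also returns the standard deviation of the residuals, used later as scatter.
    """
    mean_x = statistics.fmean(log_mhalo)
    mean_y = statistics.fmean(log_mbh)
    sxx = sum((x - mean_x) ** 2 for x in log_mhalo)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_mhalo, log_mbh))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (slope * x + intercept) for x, y in zip(log_mhalo, log_mbh)]
    return slope, intercept, statistics.pstdev(residuals)


def populate(log_mhalo, slope, intercept, scatter, rng):
    """Assign a black-hole mass to each halo using the fitted relation plus scatter."""
    return [slope * x + intercept + rng.gauss(0.0, scatter) for x in log_mhalo]


rng = random.Random(42)

# Hypothetical "true" relation used only to generate synthetic training data.
TRUE_SLOPE, TRUE_INTERCEPT, TRUE_SCATTER = 1.5, -11.0, 0.3

# Synthetic stand-in for the small hydrodynamical box (halo and black-hole masses).
hydro_log_mhalo = [rng.uniform(11.0, 13.0) for _ in range(2000)]
hydro_log_mbh = [
    TRUE_SLOPE * x + TRUE_INTERCEPT + rng.gauss(0.0, TRUE_SCATTER)
    for x in hydro_log_mhalo
]

# Learn the halo-to-black-hole mapping from the small box.
slope, intercept, scatter = fit_log_relation(hydro_log_mhalo, hydro_log_mbh)

# Synthetic stand-in for the large dark-matter-only volume (halo masses only).
dmo_log_mhalo = [rng.uniform(11.0, 14.0) for _ in range(5000)]
dmo_log_mbh = populate(dmo_log_mhalo, slope, intercept, scatter, rng)

print(f"fitted slope={slope:.2f}, intercept={intercept:.2f}, scatter={scatter:.2f}")
```

Note that the toy dark-matter-only catalogue extends to higher halo masses than the training sample, so the fitted relation is extrapolated there. A real application would need to check whether the small box samples the massive halos that host luminous quasars well enough to justify that step.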