I am getting my PhD in astrophysics from UC Berkeley. My research focuses on observational cosmology, which boils down to analyzing images of the night sky. I help develop a pipeline that automatically detects stars and galaxies in multi-wavelength images and fits models to them. From the sample of galaxies we collect, it is possible to answer questions like, “how has the expansion rate of the Universe changed over time?” The Sloan Digital Sky Survey (SDSS) showed that you can do this with a large enough sample (about a million galaxies) covering a large enough region of the sky (1/3 of it). My team is carrying out the Legacy Survey to detect 30x more, and up to 6x fainter, galaxies than SDSS. The most distant galaxies are about 10 billion light-years away.
Our goal is to create a 2D map of the positions of about 30 million galaxies extracted from more than 100 TB of images. Given the locations, we can take a spectrum of each galaxy (i.e. how bright it is at many wavelengths of visible light) and infer how far away it is. From the resulting 3D map we can measure the expansion rate of the Universe at different points in time (Eisenstein & Hu 1998; Eisenstein et al. 2005; Seo & Eisenstein 2007; Butler et al. 2017).
The primary goal of my thesis is to measure the statistical bias and variance of our pipeline. How does our completeness depend on whether a galaxy is bright or faint, blue or red, big or small? How well does our pipeline handle image artifacts, instrument issues, or transient objects?
To answer all of this, John Moustakas, a professor at Siena College, and I developed the obiwan code. It runs Monte Carlo simulations of the pipeline by adding fake galaxies at random locations in the images, running the pipeline on the modified images, and repeating.
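To give a feel for the idea, here is a minimal Python sketch of an injection-recovery experiment. It is not the obiwan code: the function names (`toy_pipeline_detects`, `injection_recovery`), the Gaussian noise model, and the 5-sigma detection threshold are all assumptions made up for illustration, and the "pipeline" is reduced to a single noisy flux measurement instead of real images. It injects fake galaxies of known brightness, checks which ones the toy pipeline recovers, and reports completeness, bias, and variance for each brightness.

```python
import random


def toy_pipeline_detects(true_flux, noise_sigma, threshold, rng):
    """Hypothetical stand-in for the real pipeline.

    Adds Gaussian noise to the injected flux and returns the measured
    flux if it exceeds threshold * noise_sigma, otherwise None.
    """
    measured = true_flux + rng.gauss(0.0, noise_sigma)
    return measured if measured > threshold * noise_sigma else None


def injection_recovery(fluxes, n_per_flux=2000, noise_sigma=1.0,
                       threshold=5.0, seed=0):
    """Inject n_per_flux fake galaxies at each true flux and summarize recovery."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    results = []
    for flux in fluxes:
        recovered = []
        for _ in range(n_per_flux):
            measured = toy_pipeline_detects(flux, noise_sigma, threshold, rng)
            if measured is not None:
                recovered.append(measured)

        # Completeness: fraction of injected galaxies that were detected.
        completeness = len(recovered) / n_per_flux
        if recovered:
            mean = sum(recovered) / len(recovered)
            # Bias: mean measured flux minus the true injected flux.
            bias = mean - flux
            # Variance of the measured flux among recovered galaxies.
            variance = sum((m - mean) ** 2 for m in recovered) / len(recovered)
        else:
            bias = variance = float("nan")

        results.append({"flux": flux, "completeness": completeness,
                        "bias": bias, "variance": variance})
    return results


if __name__ == "__main__":
    for row in injection_recovery([2.0, 5.0, 10.0]):
        print(row)
```

Even in this toy version, galaxies near the detection threshold are recovered only part of the time, and the ones that are recovered come out brighter on average than they really are, because only upward noise fluctuations make the cut. Effects of this kind are what the real simulations are meant to quantify.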