The two-point correlation function of the galaxy distribution is a key cosmological observable that allows us to constrain the dynamical and geometrical state of our Universe. To measure the correlation function we need to know both the galaxy positions and the expected galaxy density field. The expected field is commonly specified using a Monte Carlo sampling of the volume covered by the survey and, to minimize additional sampling errors, this random catalog has to be much larger than the data catalog. Correlation function estimators compare data-data pair counts to data-random and random-random pair counts, where random-random pairs usually dominate the computational cost. Future redshift surveys will deliver spectroscopic catalogs of tens of millions of galaxies. Given the large number of random objects required to guarantee sub-percent accuracy, it is of paramount importance to improve the efficiency of the algorithm without degrading its precision. We show both analytically and numerically that splitting the random catalog into a number of subcatalogs of the same size as the data catalog when calculating random-random pairs, and excluding pairs across different subcatalogs, provides the optimal error at fixed computational cost. For a random catalog fifty times larger than the data catalog, this reduces the computation time by a factor of more than ten without affecting estimator variance or bias.
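The split scheme lends itself to a compact implementation. Below is a minimal sketch, assuming Euclidean point positions and illustrative function names (not the authors' code): the random catalog is shuffled, cut into subcatalogs of the data-catalog size, and random-random pairs are counted only within each subcatalog, so the pair-count cost drops by roughly the number of subcatalogs.

```python
import numpy as np
from scipy.spatial import cKDTree

def pair_counts(points_a, points_b, edges):
    """Pair counts between two point sets in the given radial bins."""
    tree_a, tree_b = cKDTree(points_a), cKDTree(points_b)
    return np.diff(tree_a.count_neighbors(tree_b, edges))

def rr_split(randoms, n_data, edges, seed=0):
    """Random-random counts from the split scheme: shuffle, split into
    subcatalogs of size n_data, and count pairs only within each one.
    The normalization (double counting, self pairs, total pair number)
    must match the convention used for the DD and DR terms."""
    randoms = np.random.default_rng(seed).permutation(randoms)
    n_split = len(randoms) // n_data
    counts = np.zeros(len(edges) - 1)
    for chunk in np.array_split(randoms[: n_split * n_data], n_split):
        counts += pair_counts(chunk, chunk, edges)
    return counts, n_split
```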
We investigate the cosmological constraints that can be expected from measurement of the cross-correlation of galaxies with cosmic voids identified in the Euclid spectroscopic survey, which will include spectroscopic information for tens of millions of galaxies over $15\,000$ deg$^2$ of the sky in the redshift range $0.9\leq z<1.8$. We do this using simulated measurements obtained from the Flagship mock catalogue, the official Euclid mock that closely matches the expected properties of the spectroscopic data set. To mitigate anisotropic selection-bias effects, we use a velocity field reconstruction method to remove large-scale redshift-space distortions from the galaxy field before void-finding. This allows us to accurately model contributions to the observed anisotropy of the cross-correlation function arising from galaxy velocities around voids as well as from the Alcock-Paczynski effect, and we study the dependence of constraints on the efficiency of reconstruction. We find that Euclid voids will be able to constrain the ratio of the transverse comoving distance $D_{\rm M}$ to the Hubble distance $D_{\rm H}$ to a relative precision of about $0.3\%$, and the growth rate $f\sigma_8$ to a precision of between $5\%$ and $8\%$, in each of four redshift bins covering the full redshift range. In the standard cosmological model, this translates to a statistical uncertainty of $\Delta\Omega_\mathrm{m}=\pm0.0028$ on the matter density parameter from voids, better than can be achieved from either Euclid galaxy clustering or weak lensing individually. We also find that voids alone can measure the dark energy equation of state to $6\%$ precision.
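The quantity constrained by the Alcock-Paczynski test here is the ratio $D_{\rm M}/D_{\rm H}$, which in a flat $\Lambda$CDM background depends only on $\Omega_{\rm m}$. A short illustrative calculation (parameter values are assumptions, not taken from the paper) shows how a sub-percent measurement of the ratio maps onto an $\Omega_{\rm m}$ uncertainty of the quoted order:

```python
import numpy as np
from scipy.integrate import quad

def dm_over_dh(z, omega_m):
    """D_M/D_H = E(z) * int_0^z dz'/E(z') in flat LambdaCDM; the Hubble
    constant cancels, so the ratio constrains Omega_m directly."""
    e_of_z = lambda zp: np.sqrt(omega_m * (1 + zp) ** 3 + 1 - omega_m)
    integral, _ = quad(lambda zp: 1 / e_of_z(zp), 0, z)
    return e_of_z(z) * integral

# Illustrative: the fractional shift of the ratio at z = 1.35 for
# Delta Omega_m = 0.0028 is ~0.2%, comparable to the quoted 0.3%
# per-bin distance precision combined over four redshift bins.
r0, r1 = dm_over_dh(1.35, 0.31), dm_over_dh(1.35, 0.31 + 0.0028)
print((r1 - r0) / r0)
```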
(Abridged) The Euclid mission is expected to discover thousands of z>6 galaxies in three Deep Fields, which together will cover a ~40 deg$^2$ area. However, the limited number of Euclid bands and the availability of ancillary data could make the identification of z>6 galaxies challenging. In this work, we assess the degree of contamination by intermediate-redshift galaxies (z=1-5.8) expected for z>6 galaxies within the Euclid Deep Survey. This study is based on ~176,000 real galaxies at z=1-8 in a ~0.7 deg$^2$ area selected from the UltraVISTA ultra-deep survey, and ~96,000 mock galaxies with 25.3$\leq$H<27.0, which altogether cover the range of magnitudes to be probed in the Euclid Deep Survey. We simulate Euclid and ancillary photometry from the fiducial, 28-band photometry, and fit spectral energy distributions (SEDs) to various combinations of these simulated data. Our study demonstrates that identifying z>6 galaxies with Euclid data alone will be very effective, with a z>6 recovery of 91(88)% for bright (faint) galaxies. For the UltraVISTA-like bright sample, the percentage of z=1-5.8 contaminants amongst apparent z>6 galaxies as observed with Euclid alone is 18%, which is reduced to 4(13)% by including ultra-deep Rubin (Spitzer) photometry. Conversely, for the faint mock sample, the contamination fraction with Euclid alone is considerably higher at 39%, and is minimized to 7% when including ultra-deep Rubin data. For UltraVISTA-like bright galaxies, we find that Euclid (I-Y)>2.8 and (Y-J)<1.4 colour criteria can separate contaminants from true z>6 galaxies, although these are applicable to only 54% of the contaminants, as many have unconstrained (I-Y) colours. In the most optimistic scenario, these cuts reduce the contamination fraction to 1% whilst preserving 81% of the fiducial z>6 sample. For the faint mock sample, colour cuts are infeasible.
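For concreteness, the colour criteria quoted above amount to a simple selection. A minimal sketch (array names are illustrative) that also tracks the sources with unconstrained (I-Y) colours, to which the cuts cannot be applied:

```python
import numpy as np

def highz_colour_selection(i_minus_y, y_minus_j):
    """Apply the (I-Y) > 2.8 and (Y-J) < 1.4 cuts from the text.
    NaN marks an unconstrained (I-Y) colour; such sources cannot be
    classified by the cuts and are returned separately."""
    constrained = np.isfinite(i_minus_y)
    selected = constrained & (i_minus_y > 2.8) & (y_minus_j < 1.4)
    return selected, ~constrained
```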
Recent cosmic shear studies have shown that higher-order statistics (HOS) developed by independent teams can outperform standard two-point estimators in terms of statistical precision, thanks to their sensitivity to the non-Gaussian features of large-scale structure. The aim of the Higher-Order Weak Lensing Statistics (HOWLS) project is to assess, compare, and combine the constraining power of $10$ different HOS on a common set of $Euclid$-like mocks, derived from N-body simulations. In this first paper of the HOWLS series we compute the non-tomographic ($\Omega_{\rm m}$, $\sigma_8$) Fisher information for the one-point probability distribution function, peak counts, Minkowski functionals, Betti numbers, persistent homology Betti numbers and heatmap, and scattering transform coefficients, and compare them to the shear and convergence two-point correlation functions in the absence of any systematic bias. We also include forecasts for three implementations of higher-order moments, but these cannot be robustly interpreted because the Gaussian likelihood assumption breaks down for these statistics. Taken individually, we find that each HOS outperforms the two-point statistics by a factor of around $2$ in the precision of the forecasts, with some variations across statistics and cosmological parameters. When combining all the HOS, this increases to a $4.5$ times improvement, highlighting the immense potential of HOS for cosmic shear cosmological analyses with $Euclid$. The data used in this analysis are publicly released with the paper.
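The forecasts quoted above follow the standard Gaussian-likelihood Fisher construction. A minimal sketch (the derivative and covariance inputs would come from the simulations; a Hartlap-type correction for a mock-estimated covariance is omitted for brevity):

```python
import numpy as np

def fisher_matrix(derivs, cov):
    """F_ab = (d mu / d p_a)^T C^{-1} (d mu / d p_b), where derivs[a] is
    the derivative of the mean statistic with respect to parameter p_a
    (here Omega_m and sigma_8) and cov is the data covariance."""
    inv_cov = np.linalg.inv(cov)
    return np.einsum('ai,ij,bj->ab', derivs, inv_cov, derivs)

def marginalised_errors(derivs, cov):
    """Marginalised 1-sigma forecasts from the inverse Fisher matrix."""
    return np.sqrt(np.diag(np.linalg.inv(fisher_matrix(derivs, cov))))
```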
Forthcoming large photometric surveys for cosmology require precise and accurate photometric redshift (photo-z) measurements for the success of their main science objectives. However, to date, no method has been able to produce photo-zs at the required accuracy using only the broad-band photometry that those surveys will provide. An assessment of the strengths and weaknesses of current methods is a crucial step in the eventual development of an approach to meet this challenge. We report on the performance of 13 photometric redshift codes, assessing both their single-value redshift estimates and their redshift probability distributions (PDZs), on a common set of data, focusing particularly on the 0.2-2.6 redshift range that the Euclid mission will probe. We designed a challenge using emulated Euclid data drawn from three photometric surveys of the COSMOS field. The data were divided into two samples: a calibration sample, for which photometry and redshifts were provided to the participants, and a validation sample, containing only the photometry, to ensure a blinded test of the methods. Participants were invited to provide a single-value redshift estimate and a PDZ for each source in the validation sample, along with a rejection flag indicating the sources they consider unfit for use in cosmological analyses. The performance of each method was assessed through a set of informative metrics, using cross-matched spectroscopic and highly accurate photometric redshifts as the ground truth. We show that the rejection criteria set by participants are efficient in removing strong outliers, that is to say sources for which the photo-z deviates by more than 0.15(1+z) from the spectroscopic redshift (spec-z). We also show that, while all methods are able to provide reliable single-value estimates, several machine-learning methods do not manage to produce useful PDZs. We find that no machine-learning method provides good results in the regions of galaxy colour-space that are sparsely populated by spectroscopic redshifts, for example z > 1. However, they generally perform better than template-fitting methods at low redshift (z < 0.7), indicating that template-fitting methods do not use all of the information contained in the photometry. We introduce metrics that quantify both photo-z precision and completeness of the samples (post-rejection), since both contribute to the final figure of merit of the science goals of the survey (e.g., cosmic shear from Euclid). Template-fitting methods provide the best results in these metrics, but we show that a combination of template-fitting results and machine-learning results with rejection criteria can outperform any individual method. On this basis, we argue that further work in identifying how to best select between machine-learning and template-fitting approaches for each individual galaxy should be pursued as a priority.
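Two of the headline numbers above are easy to pin down in code. A minimal sketch of the point-estimate metrics (the strong-outlier threshold is the one quoted in the text; the NMAD is one common choice of robust scatter, assumed here for illustration):

```python
import numpy as np

def photoz_point_metrics(z_phot, z_spec):
    """Scaled residuals, robust NMAD scatter, and the fraction of strong
    outliers with |z_phot - z_spec| > 0.15 (1 + z_spec)."""
    dz = (z_phot - z_spec) / (1 + z_spec)
    nmad = 1.4826 * np.median(np.abs(dz - np.median(dz)))
    outlier_fraction = np.mean(np.abs(dz) > 0.15)
    return nmad, outlier_fraction
```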
The Euclid mission is expected to image millions of galaxies at high resolution, providing an extensive dataset with which to study galaxy evolution. Because galaxy morphology is both a fundamental parameter and one that is hard to determine for large samples, we investigate the application of deep learning in predicting the detailed morphologies of galaxies in Euclid using Zoobot, a convolutional neural network pretrained with 450 000 galaxies from the Galaxy Zoo project. We adapted Zoobot for use with emulated Euclid images, generated based on Hubble Space Telescope COSMOS images, with labels provided by volunteers in the Galaxy Zoo: Hubble project. We experimented with different numbers of galaxies and various magnitude cuts during the training process. We demonstrate that the trained Zoobot model successfully measures detailed galaxy morphology in emulated Euclid images. It effectively predicts whether a galaxy has features, and identifies and characterises various features such as spiral arms, clumps, bars, discs, and central bulges. When compared to volunteer classifications, Zoobot achieves mean vote fraction deviations of less than 12% and an accuracy of above 91% for the confident volunteer classifications across most morphology types. However, the performance varies depending on the specific morphological class. For the global classes, such as disc or smooth galaxies, the mean deviations are less than 10%, with only 1000 training galaxies necessary to reach this performance. On the other hand, for more detailed structures and complex tasks, such as detecting and counting spiral arms or clumps, the deviations are slightly higher, at around 12% with 60 000 galaxies used for training. In order to enhance the performance on complex morphologies, we anticipate that a larger pool of labelled galaxies is needed, which could be obtained using crowdsourcing. We estimate that, with our model, the detailed morphology of approximately 800 million galaxies of the Euclid Wide Survey could be reliably measured, and that approximately 230 million of these galaxies would display features. Finally, our findings imply that the model can be effectively adapted to new morphological labels. We demonstrate this adaptability by applying Zoobot to peculiar galaxies. In summary, our trained Zoobot CNN can readily predict morphological catalogues for Euclid images.
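The adaptation step described above is standard transfer learning. The sketch below illustrates the general pattern with a stock torchvision backbone purely as a stand-in; Zoobot's actual architecture, loss (it models vote counts rather than plain vote fractions), and API differ, so everything here is an assumption for illustration:

```python
import torch
from torch import nn
from torchvision import models

# Illustrative transfer learning, not Zoobot's API: load a pretrained CNN,
# swap the head for the morphology outputs, and finetune only the top layers.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
n_outputs = 10  # hypothetical number of vote-fraction outputs
backbone.fc = nn.Sequential(nn.Linear(backbone.fc.in_features, n_outputs),
                            nn.Sigmoid())  # vote fractions lie in [0, 1]

for name, param in backbone.named_parameters():
    # Freeze everything except the last residual block and the new head.
    param.requires_grad = name.startswith(('layer4', 'fc'))

optimizer = torch.optim.Adam(
    (p for p in backbone.parameters() if p.requires_grad), lr=1e-4)
loss_fn = nn.MSELoss()  # regress volunteer vote fractions
```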
The Euclid Space Telescope will provide deep imaging at optical and near-infrared wavelengths, along with slitless near-infrared spectroscopy, across ~15,000 deg$^2$ of the sky. Euclid is expected to detect ~12 billion astronomical sources, facilitating new insights into cosmology, galaxy evolution, and various other topics. To optimally exploit the expected very large data set, there is a need to develop appropriate methods and software. Here we present a novel machine-learning-based methodology for the selection of quiescent galaxies using broad-band Euclid I_E, Y_E, J_E, H_E photometry, in combination with multiwavelength photometry from other surveys. The ARIADNE pipeline uses meta-learning to fuse decision-tree ensembles, nearest-neighbours, and deep-learning methods into a single classifier that yields significantly higher accuracy than any of the individual learning methods separately. The pipeline has `sparsity-awareness', so that missing photometry values are still informative for the classification. Our pipeline derives photometric redshifts for galaxies selected as quiescent, aided by the `pseudo-labelling' semi-supervised method. After application of the outlier filter, our pipeline achieves a normalized mean absolute deviation of $\lesssim 0.03$ and a fraction of catastrophic outliers of $\lesssim 0.02$ when measured against the COSMOS2015 photometric redshifts. We apply our classification pipeline to mock galaxy photometry catalogues corresponding to three main scenarios: (i) Euclid Deep Survey with ancillary ugriz, WISE, and radio data; (ii) Euclid Wide Survey with ancillary ugriz, WISE, and radio data; and (iii) Euclid Wide Survey only. Our classification pipeline outperforms UVJ selection, in addition to the Euclid I_E-Y_E, J_E-H_E and u-I_E, I_E-J_E colour-colour methods, with improvements in completeness and the F1-score of up to a factor of 2. (Abridged)
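The fusion step described above is a form of stacked generalisation. A minimal sketch with scikit-learn, not the ARIADNE code (in particular, scikit-learn estimators would need imputation of missing photometry, unlike the sparsity-aware pipeline):

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Base learners mirroring the families named above: a tree ensemble,
# nearest-neighbours, and a small neural network.
base_learners = [
    ("forest", RandomForestClassifier(n_estimators=300)),
    ("knn", KNeighborsClassifier(n_neighbors=15)),
    ("mlp", MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500)),
]
# The meta-learner is trained on the base learners' class probabilities.
quiescent_classifier = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    stack_method="predict_proba",
)
# quiescent_classifier.fit(X_train, y_train)  # y: 1 = quiescent, 0 = other
```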
We present in this paper the general real- and redshift-space clustering properties of galaxies as measured in the first data release of the VIPERS survey. VIPERS is a large redshift survey designed to probe the distant Universe and its large-scale structure at $0.5 < z < 1.2$. We describe in this analysis the global properties of the sample and discuss the survey completeness and associated corrections. This sample allows us to measure the galaxy clustering with an unprecedented accuracy at these redshifts. From the redshift-space distortions observed in the galaxy clustering pattern we provide a first measurement of the growth rate of structure at $z = 0.8$: $f\sigma_8 = 0.47 \pm 0.08$. This is completely consistent with the predictions of standard cosmological models based on Einstein gravity, although this measurement alone does not discriminate between different gravity models.
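As a consistency check of the kind alluded to above, the $\Lambda$CDM prediction can be evaluated with the common growth-rate approximation $f(z) \approx \Omega_{\rm m}(z)^{0.55}$. A sketch with indicative parameter values (assumptions for illustration, not the paper's fit):

```python
import numpy as np
from scipy.integrate import quad

def fsigma8_lcdm(z, omega_m0=0.3, sigma8_0=0.8):
    """f(z) sigma_8(z) in flat LambdaCDM with f ~ Omega_m(z)^0.55 and
    sigma_8(z) = sigma_8(0) D(z), D(z)/D(0) = exp(-int_0^z f dz'/(1+z'))."""
    e2 = lambda zp: omega_m0 * (1 + zp) ** 3 + 1 - omega_m0
    f = lambda zp: (omega_m0 * (1 + zp) ** 3 / e2(zp)) ** 0.55
    growth_log, _ = quad(lambda zp: f(zp) / (1 + zp), 0, z)
    return f(z) * sigma8_0 * np.exp(-growth_log)

print(fsigma8_lcdm(0.8))  # ~0.45, consistent with 0.47 +/- 0.08
```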
LensMC is a weak lensing shear measurement method developed for Euclid and Stage-IV surveys. It is based on forward modelling to deal with convolution by a point spread function of comparable size to many galaxies; sampling the posterior distribution of galaxy parameters via Markov Chain Monte Carlo; and marginalisation over nuisance parameters for each of the 1.5 billion galaxies observed by Euclid. The scientific performance is quantified through high-fidelity images based on the Euclid Flagship simulations and emulation of the Euclid VIS images; realistic clustering with a mean surface number density of 250 arcmin$^{-2}$ ($I_{\rm E}<29.5$) for galaxies and 6 arcmin$^{-2}$ ($I_{\rm E}<26$) for stars; and a diffraction-limited chromatic point spread function with a full width at half maximum of $0.^{\!\prime\prime}2$ and spatial variation across the field of view. Objects are measured with a density of 90 arcmin$^{-2}$ ($I_{\rm E}<26.5$) in 4500 deg$^2$. The total shear bias is broken down into measurement (our main focus here) and selection effects (which will be addressed elsewhere). We find: measurement multiplicative and additive biases of $m_1=(-3.6\pm0.2)\times10^{-3}$, $m_2=(-4.3\pm0.2)\times10^{-3}$, $c_1=(-1.78\pm0.03)\times10^{-4}$, $c_2=(0.09\pm0.03)\times10^{-4}$; a large detection bias with a multiplicative component of $1.2\times10^{-2}$ and an additive component of $-3\times10^{-4}$; and a measurement PSF leakage of $\alpha_1=(-9\pm3)\times10^{-4}$ and $\alpha_2=(2\pm3)\times10^{-4}$. When model bias is suppressed, the obtained measurement biases are close to the Euclid requirement and largely dominated by undetected faint galaxies ($-5\times10^{-3}$). Although significant, model bias will be straightforward to calibrate given the weak sensitivity.
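The multiplicative and additive biases quoted above are defined through the linear model $g_{\rm obs} = (1 + m)\,g_{\rm true} + c$ per shear component. A toy recovery on synthetic numbers (noise artificially suppressed so the fit converges on a small sample; real calibrations use far larger simulation volumes and shape-noise cancellation):

```python
import numpy as np

rng = np.random.default_rng(1)
g_true = rng.uniform(-0.05, 0.05, 1_000_000)
# Synthetic "measurements" with an injected m and c and suppressed noise.
g_obs = (1 - 3.6e-3) * g_true - 1.78e-4 + rng.normal(0, 0.01, g_true.size)

# Fit g_obs = (1 + m) g_true + c; polyfit returns (slope, intercept).
slope, intercept = np.polyfit(g_true, g_obs, 1)
m_hat, c_hat = slope - 1, intercept
print(f"m = {m_hat:+.4f}, c = {c_hat:+.2e}")  # recovers ~ -0.0036, -1.8e-4
```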