# Advanced statistical tools for astrophysics

The physics of the interstellar medium is highly complex, involving many processes acting on a vast range of spatial and temporal scales. The nonlinearity of the interplay between these processes leads to the emergence of non-Gaussian structures such as filaments, sheets, and voids.

To characterize these structures is a necessary first step towards understanding their physical origin, but usual statistical tools such as power spectra are insufficient for non-Gaussian fields. We have adapted tools developed in data science to tackle this problem, the Wavelet Scattering Transform (WST) and Wavelet Phase Harmonics (WPH). Inspired by convolutional neural networks, but without the necessity of a training stage, they provide comprehensive yet low-dimensionality statistical descriptions of complex fields.

An illustration of the denoising of Planck polarization data using the method based on WPH statistics described in Regaldo-Saint Blancard et al. (2021).

## Software

PyWST is a public Python package designed to perform statistical analyses of two-dimensional data with the Wavelet Scattering Transform (WST) and the Reduced Wavelet Scattering Transform (RWST). The WST/RWST provide convenient sets of coefficients that describe non-Gaussian data in a comprehensive way.

*Note:* For GPU-accelerated WST computations, take a look at Kymatio (on which part of this code is based).

## Papers

A new approach for the statistical denoising of Planck interstellar dust polarization data

A&A, submitted, 2021PDF

##### Short Description

This paper presents a novel method to clean the noise from polarization maps, based on the different spatial statistics of the noise and the signal, using a loss function tailored on the Wavelet Phase Harmonics diagnostics.

##### Abstract

Dust emission is the main foreground to Cosmic Microwave Background (CMB) polarization. Its statistical characterization must be derived from the analysis of observational data because the precision required for a reliable component separation is far greater than currently achievable with physical models of the turbulent magnetized interstellar medium. This letter takes a significant step towards this goal by proposing a method that retrieves non-Gaussian statistical characteristics of dust emission from noisy Planck polarization observations at 353 GHz. We devise a statistical denoising method based on the Wavelet Phase Harmonics (WPH) statistics, which characterize the coherent structures in non-Gaussian random fields and define a generative model of the data. The method is validated on mock data combining a dust map from a magnetohydrodynamic simulation and Planck noise maps. The denoised map reproduces the true power spectrum down to scales where the noise power is an order of magnitude larger than that of the signal. It remains highly correlated to the true emission and retrieves some of its non-Gaussian properties. Applied to Planck data, the method provides a new approach towards building a generative model of dust polarization that will characterize the full complexity of the dust emission.

New interpretable statistics for large-scale structure analysis and generation

Phys. Rev. D, 102, 10, 2020PDF

##### Short Description

This paper introduces the Wavelet Phase Harmonics, and demonstrates their ability to efficiently capture non-Gaussian statistics, including such usual signatures as bispectra and Minkowski functionals, on simulations of the large-scale structure of the Universe.

##### Abstract

We introduce wavelet phase harmonics (WPH) statistics: interpretable low-dimensional statistics that describe 2D non-Gaussian fields. These statistics are built from WPH moments, which were recently introduced in the data science and machine learning community. We apply WPH statistics to projected 2D matter density fields from the Quijote N -body simulations of the large-scale structure of the Universe. By computing Fisher information matrices, we find that the WPH statistics place more stringent constraints on four of five cosmological parameters when compared to statistics based on the combination of the power spectrum and bispectrum. We also use the WPH statistics with a maximum entropy model to statistically generate new 2D density fields that accurately reproduce the probability density function, the mean and standard deviation of the power spectrum, the bispectrum, and Minkowski functionals of the input density fields. Although other methods are efficient for either parameter estimates or statistical syntheses of the large-scale structure, WPH statistics are the first statistics that achieve state-of-the-art results for both tasks as well as being interpretable.

Statistical description of dust polarized emission from the diffuse interstellar medium. A RWST approach

A&A, 642, 217, 2020PDF

##### Short Description

This paper generalizes the RWST model to polarized emission maps, and demonstrates that the same model used in total intensity remains applicable.

##### Abstract

The statistical characterization of the diffuse magnetized interstellar medium (ISM) and Galactic foregrounds to the cosmic microwave background (CMB) poses a major challenge. To account for their non-Gaussian statistics, we need a data analysis approach capable of efficiently quantifying statistical couplings across scales. This information is encoded in the data, but most of it is lost when using conventional tools, such as one-point statistics and power spectra. The wavelet scattering transform (WST), a low-variance statistical descriptor of non-Gaussian processes introduced in data science, opens a path towards this goal. To establish the methodology, we applied the WST to noise-free maps of dust polarized thermal emission computed from a numerical simulation of magnetohydrodynamical turbulence in the diffuse ISM. We analyzed normalized complex Stokes maps and maps of the polarization fraction and polarization angle. The WST yields a few thousand coefficients; some of them measure the amplitude of the signal at a given scale, and the others characterize the couplings between scales and orientations. The dependence on orientation can be fitted with the reduced wavelet scattering transform (RWST), an angular model introduced in previous works for total intensity maps. The RWST provides a statistical description of the polarization maps, quantifying their multiscale properties in terms of isotropic and anisotropic contributions. It allowed us to exhibit the dependence of the map structure on the orientation of the mean magnetic field and to quantify the non-Gaussianity of the data. We also used RWST coefficients, complemented by additional constraints, to generate random synthetic maps with similar statistics. Their agreement with the original maps demonstrates the comprehensiveness of the statistical description provided by the RWST. This work is a step forward in the analysis of observational data and the modeling of CMB foregrounds. We also release PyWST, a public Python package to perform WST and RWST analyses of two-dimensional data.

The RWST, a comprehensive statistical description of the non-Gaussian structures in the ISM

A&A, 629, 115, 2019PDF

##### Short Description

This paper introduces the Wavelet Scattering Transform in an astrophysical context and demonstrates that it may be further reduced to the RWST, a statistical description with a small number of coefficients that characterizes complex structures arising from nonlinear phenomena.

##### Abstract

The interstellar medium (ISM) is a complex nonlinear system governed by the interplay between gravity and magneto-hydrodynamics, as well as radiative, thermodynamical, and chemical processes. Our understanding of it mostly progresses through observations and numerical simulations, and a quantitative comparison between these two approaches requires a generic and comprehensive statistical description of the emerging structures. The goal of this paper is to build such a description, with the purpose of permitting an efficient comparison that is independent of any specific prior or model. We started from the wavelet scattering transform (WST), a low-variance statistical description of non-Gaussian processes, which was developed in data science and encodes long-range interactions through a hierarchical multiscale approach based on the wavelet transform. We performed a reduction of the WST through a fit of its angular dependencies. This allowed us to gather most of the information it contains into a few components whose physical meanings are identified and describe for instance isotropic and anisotropic behaviours. The result of this paper is the reduced wavelet scattering transform (RWST), a statistical description with a small number of coefficients that characterizes complex structures arising from nonlinear phenomena, in particular interstellar magnetohydrodynamical (MHD) turbulence, independently of any specific priors. The RWST coefficients encode moments of order up to four, have reduced variances, and quantify the couplings between scales. To show the efficiency and generality of this description, we applied it successfully to the following three kinds of processes that are a priori very different: fractional Brownian motions, MHD simulations, and Herschel observations of the dust thermal continuum in a molecular cloud. With fewer than 100 RWST coefficients when probing six scales and eight angles on 256 by 256 maps, we were able to perform quantitative comparisons, infer relevant physical properties, and produce realistic synthetic fields.