Daniel Alejandro Saavedra Morales. Pontificia Universidad Católica de Chile
A Bayesian Approach to Model Selection for Heavy-Tailed Data.
Sala 3, Facultad de Matemáticas
Abstract:
Heavy-tailed distributions have long been studied because of their many applications in fields such as economics, natural-disaster modeling, signal processing, and the social sciences. In particular, there is extensive research on power-law distributions, with density $p(x) \propto x^{\alpha}$ for large $x$, and on their generalization, distributions with regularly varying tails ($\mathcal{RV}_\alpha$), which behave like a power law in the tail up to a slowly varying factor.
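For reference, the standard definition of regular variation is sketched below, written with the index convention used above so that the power law $x^{\alpha}$ is the simplest member of $\mathcal{RV}_\alpha$. Whether the talk applies it to the density or to the survival function is not stated in the abstract; the example function $x^{\alpha}\log x$ is added here only for illustration.

```latex
f \in \mathcal{RV}_\alpha
\iff
\lim_{t \to \infty} \frac{f(tx)}{f(t)} = x^{\alpha} \quad \text{for every } x > 0
\iff
f(x) = x^{\alpha} L(x), \quad L \in \mathcal{RV}_0 .
```

Here $L$ is slowly varying, meaning $L(tx)/L(t) \to 1$ as $t \to \infty$. For example, $f(x) = x^{\alpha} \log x$ belongs to $\mathcal{RV}_\alpha$ without being an exact power law.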
Although numerous approaches have been developed to study tail behavior in univariate and multivariate data, as well as in the presence of regressors, many of them set an arbitrary threshold or percentile above which the model is fitted. This discards the information contained in the body of the distribution. Other work, particularly within the Bayesian framework, uses all the observed data to estimate heavy-tailed densities; however, these models tend to be difficult to handle, especially when model selection is required.
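To make the threshold issue concrete, here is a minimal Python sketch of the Hill estimator, one classical threshold-based tail-index estimator chosen here for illustration (the abstract does not name a specific method). The function name `hill_tail_index` and the simulated |Student-t| data are hypothetical. Note that it estimates the index $a > 0$ of the survival function, $P(X > x) \approx x^{-a}$; for a density in $\mathcal{RV}_\alpha$ with $\alpha < -1$ this corresponds to $a = -(\alpha + 1)$. Only the top $k$ observations enter the estimate.

```python
import numpy as np

def hill_tail_index(x, k):
    """Hill estimate of the tail index a in P(X > x) ~ x^(-a) L(x).

    Only the k largest observations enter the estimate; the other n - k
    points (the body of the distribution) are ignored.
    """
    xs = np.sort(np.asarray(x, dtype=float))[::-1]   # descending order statistics
    if not 1 <= k < len(xs):
        raise ValueError("k must satisfy 1 <= k < len(x)")
    if xs[k] <= 0:
        raise ValueError("the threshold X_(k+1) must be positive")
    # gamma_hat = (1/k) * sum_{i<k} log(xs[i] / xs[k]); the threshold is xs[k]
    gamma_hat = np.mean(np.log(xs[:k]) - np.log(xs[k]))
    return 1.0 / gamma_hat

rng = np.random.default_rng(0)
# |Student-t| with 3 degrees of freedom: true tail index a = 3,
# but the body of the distribution is not a power law.
sample = np.abs(rng.standard_t(3, size=5000))

for k in (50, 250, 1000, 2500):
    print(f"k = {k:4d} (top {k / len(sample):.0%} of data): "
          f"a_hat = {hill_tail_index(sample, k):.2f}")
```

With small $k$ the estimate is noisy; with large $k$ it absorbs observations from the body of the distribution and becomes biased, which is why the threshold choice matters.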
This project has two main objectives. The first is to propose Bayesian model selection in flexible regression models for heavy-tailed $\mathcal{RV}_\alpha$ distributions, using a simple yet flexible model, namely a Gaussian mixture model under a dependent Dirichlet process (DDP-GMM), fitted in the logarithmic space of the observations, where $\mathcal{RV}_\alpha$ distributions become light-tailed (for example, the logarithm of a Pareto variable has an exponential tail). Working in this space facilitates model selection through a Spike and Slab methodology, because it allows the marginal likelihood to be computed analytically.
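The log-space argument can be illustrated with a deliberately simplified Python sketch. It replaces the DDP-GMM by a single Gaussian component, i.e. a conjugate normal/inverse-gamma linear regression on $\log y$, which is the simplest case in which the marginal likelihood has a closed form, and then applies a point-mass spike and slab by enumerating all subsets of regressors. The hyperparameters `v0`, `a0`, `b0`, `prior_incl`, the function names, and the simulated data are all assumptions for illustration, not the model of the talk.

```python
import itertools
import numpy as np
from scipy.special import gammaln

def log_marginal_likelihood(y, X, v0=10.0, a0=2.0, b0=2.0):
    """Closed-form log p(y | X) for y = X b + e, e ~ N(0, s2 I), with the
    conjugate prior b | s2 ~ N(0, s2 * v0 * I), s2 ~ InvGamma(a0, b0)."""
    n, p = X.shape
    prec_n = X.T @ X + np.eye(p) / v0            # posterior precision (up to 1/s2)
    m_n = np.linalg.solve(prec_n, X.T @ y)       # posterior mean of b
    a_n = a0 + 0.5 * n
    b_n = b0 + 0.5 * (y @ y - m_n @ prec_n @ m_n)
    _, logdet_prec_n = np.linalg.slogdet(prec_n)
    return (-0.5 * n * np.log(2 * np.pi)
            - 0.5 * p * np.log(v0) - 0.5 * logdet_prec_n   # 0.5 * log(|V_n| / |V_0|)
            + a0 * np.log(b0) - a_n * np.log(b_n)
            + gammaln(a_n) - gammaln(a0))

def inclusion_probabilities(y, Z, prior_incl=0.5):
    """Posterior inclusion probabilities under a point-mass spike and slab:
    each candidate regressor enters independently with prob. prior_incl;
    the intercept is always included."""
    n, p = Z.shape
    models, log_post = [], []
    for gamma in itertools.product([0, 1], repeat=p):
        cols = [j for j in range(p) if gamma[j]]
        X = np.column_stack([np.ones(n)] + [Z[:, j] for j in cols])
        k = len(cols)
        log_prior = k * np.log(prior_incl) + (p - k) * np.log(1 - prior_incl)
        models.append(gamma)
        log_post.append(log_marginal_likelihood(y, X) + log_prior)
    log_post = np.array(log_post)
    weights = np.exp(log_post - log_post.max())   # normalize in a stable way
    weights /= weights.sum()
    return np.array(models).T @ weights           # P(gamma_j = 1 | y) for each j

rng = np.random.default_rng(42)
n = 300
Z = rng.normal(size=(n, 3))                       # three candidate regressors
tail = rng.pareto(2.0, size=n) + 1.0              # Pareto(2) multiplicative noise: heavy-tailed
y = np.exp(1.0 + 0.8 * Z[:, 0]) * tail            # only the first regressor matters
log_y = np.log(y)                                 # log of Pareto noise is exponential: light-tailed
print(inclusion_probabilities(log_y, Z).round(3))
```

After taking logs, the multiplicative Pareto noise becomes an additive exponential error, so the regression in log space has light-tailed errors and the first regressor receives an inclusion probability close to one. In the full DDP-GMM, the single Gaussian would be replaced by a covariate-dependent mixture.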
The second objective is to develop a model selection strategy using flexible regression for heavy-tailed $\mathcal{RV}_\alpha$ data. To this end, a Bayesian quantile regression will be proposed for both low and high percentiles, with errors following a mixture of asymmetric Laplace distributions in which the scale parameters are mixed under a normalized generalized gamma (NGG) process. A Spike and Slab methodology will then be used for model selection, identifying which regressors are relevant for the quantiles in the tails of the distribution.
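As a building block for the second objective, the following Python sketch implements a single asymmetric Laplace kernel in the common parametrization where the location $\mu$ is the $\tau$-quantile, together with the check loss $\rho_\tau(u) = u(\tau - \mathbf{1}\{u < 0\})$. It shows why this likelihood is used for quantile regression: with $\sigma$ fixed, maximizing it over $\mu$ is equivalent to minimizing the total check loss, whose minimizer is the sample $\tau$-quantile. The NGG mixture over scale parameters is not implemented, and the function names are hypothetical.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile (check) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0).astype(float))

def ald_logpdf(y, mu, sigma, tau):
    """Log density of the asymmetric Laplace distribution whose tau-quantile is mu:
    f(y) = tau * (1 - tau) / sigma * exp(-rho_tau((y - mu) / sigma))."""
    z = (np.asarray(y, dtype=float) - mu) / sigma
    return np.log(tau * (1 - tau) / sigma) - check_loss(z, tau)

def ald_cdf(y, mu, sigma, tau):
    """Closed-form CDF of the same distribution; ald_cdf(mu, mu, sigma, tau) == tau."""
    z = (np.asarray(y, dtype=float) - mu) / sigma
    left = tau * np.exp((1 - tau) * np.minimum(z, 0.0))        # branch for z < 0
    right = 1 - (1 - tau) * np.exp(-tau * np.maximum(z, 0.0))  # branch for z >= 0
    return np.where(z < 0, left, right)

rng = np.random.default_rng(1)
tau = 0.95
y = np.log(rng.pareto(2.0, size=2000) + 1.0)   # log of Pareto(2) data (light-tailed)

# With sigma fixed, maximizing the ALD log-likelihood over mu on a grid
# is equivalent to minimizing the total check loss.
grid = np.linspace(y.min(), y.max(), 4001)
loglik = np.array([ald_logpdf(y, m, 1.0, tau).sum() for m in grid])
mu_hat = grid[np.argmax(loglik)]
print(f"grid ALD estimate: {mu_hat:.3f}, sample quantile: {np.quantile(y, tau):.3f}")
```

In a regression, $\mu$ would be replaced by a linear predictor $x^\top \beta_\tau$, and the Spike and Slab prior would act on the components of $\beta_\tau$.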