Seminario de Estadística


2017-09-08
12:00hrs.
Daniela Castro. King Abdullah University of Science and Technology (Saudi Arabia)
Spatial analysis of U.S. precipitation extremes: a local likelihood approach for estimating complex tail dependence structures in high dimensions
Sala 2, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
In order to model the complex non-stationary dependence structure of precipitation extremes over the entire contiguous U.S., we propose a flexible local approach based on factor copula models. Specifically, using Gaussian location mixture processes, we assume that there is a common, unobserved, random factor affecting the joint dependence of all measurements in small regional neighborhoods. Choosing this common factor to be exponentially distributed, one can show that the joint upper tail of the resulting copula is asymptotically equivalent to that of the (max-stable) Hüsler-Reiss copula or its Pareto process counterpart; the so-called exponential factor copula model therefore captures tail dependence but, unlike the latter, has weakening dependence strength as events become more extreme, a feature commonly observed in precipitation data. To describe the stochastic behavior of extreme precipitation events over the U.S., we embed the exponential factor model in a more general non-stationary model and, under the assumption of local stationarity, fit its locally stationary counterpart to high threshold exceedances. This allows us to gain flexibility while making inference feasible for such a large and complex dataset. Adopting a local censored likelihood approach, inference is made on a fine spatial grid, and local model fitting is performed taking advantage of distributed computing resources and of the embarrassingly parallel nature of this estimation method. The local model is efficiently fitted at all grid points, and uncertainty is measured using a block bootstrap procedure. Simulation results show that our approach adequately captures complex dependencies on a local scale, thereby providing valuable input for regional risk assessment. Additionally, our data application shows that the model can flexibly represent extreme rainfall characteristics on a continental scale. A comparison between past and current U.S. rainfall data suggests that extremal dependence might be stronger nowadays than during the first half of the twentieth century in some areas, which has important implications for regional flood risk assessment.
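
For intuition, here is a minimal simulation sketch of the kind of construction described above: a Gaussian process plus a single shared exponential factor, whose joint upper tail then exhibits tail dependence. The grid, correlation range, and quantile level are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch of an exponential factor construction (assumed toy setup):
# X(s) = Z(s) + V, with Z a stationary Gaussian process and V ~ Exp(1)
# a single random factor shared by all sites in a neighborhood.
import numpy as np

rng = np.random.default_rng(1)
sites = np.linspace(0, 1, 50)                    # 1-D grid of locations
dist = np.abs(sites[:, None] - sites[None, :])   # pairwise distances
cov = np.exp(-dist / 0.3)                        # exponential correlation, range 0.3 (assumed)
L = np.linalg.cholesky(cov + 1e-10 * np.eye(50))

n_rep = 5000
Z = rng.standard_normal((n_rep, 50)) @ L.T       # Gaussian replicates
V = rng.exponential(scale=1.0, size=(n_rep, 1))  # common exponential factor
X = Z + V                                        # factor model sample

# Empirical tail dependence between two sites:
# P(both exceed their q-quantile) / (1 - q), positive under tail dependence.
q = 0.98
u = np.quantile(X, q, axis=0)
chi = np.mean((X[:, 0] > u[0]) & (X[:, 10] > u[10])) / (1 - q)
print(f"empirical chi between sites 0 and 10 at q={q}: {chi:.3f}")
```
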
2017-07-07
12:00hrs.
Gabriel Muñoz. Pontificia Universidad Católica de Chile
Identification in Rasch-type models: converting 1PL-AG into 2*1PL
Sala 2, Facultad de Matemáticas
Abstract:

Identifiability in Rasch-type models is of utmost importance, since it establishes a bijection between the parameters that are identified by nature and the parameters of interest. While for some of these models it has been possible to identify the parameters of interest (1PL, 1PL-G, 2PL), for others it remains an open problem (1PL-AG, 3PL). The techniques usually employed to solve these problems relate the parameters of interest to one another and then impose restrictions on the parameter space. In this talk, we show a way of relating different models (1PL-AG with 1PL, and 3PL with 2PL) in order to seek the identification of the parameters of interest from results already obtained for simpler models.
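
For reference, one common parameterization of the 1PL-AG model (a one-parameter logistic model with a common discrimination and item-specific guessing) is sketched below; the exact notation and parameterization may differ from the speaker's.

```latex
% One common parameterization of the 1PL-AG model: person ability \theta_p,
% item difficulty \beta_i, common discrimination \alpha, guessing parameter c_i.
P(Y_{pi} = 1 \mid \theta_p) \;=\; c_i + (1 - c_i)\,
\frac{\exp\{\alpha\,\theta_p - \beta_i\}}{1 + \exp\{\alpha\,\theta_p - \beta_i\}}.
```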

2017-06-30
12:00 hrs.
Giovanni Motta. Pontificia Universidad Católica de Chile
Cholesky decomposition for time-varying covariances and auto-covariances
Sala 2, Facultad de Matemáticas
Abstract:

In this work we introduce a positive definite non-parametric estimator of the covariance matrix while permitting different smoothing parameters. This estimator is based on the Cholesky decomposition of a pre-estimator of the covariance matrix.

Kernel-type smoothers of the cross-products are based on the same bandwidth for all the variances and co-variances. Though this approach guarantees positive semi-definiteness of the estimated covariance matrix, the use of one common bandwidth for all the entries might be restrictive, as variances and co-variances are in general characterized by different degrees of smoothness. On the other hand, a kernel-type estimator based on different smoothing parameters may deliver an estimated matrix which is not necessarily positive semi-definite. The estimator we propose in this paper is based on the Cholesky decomposition of a preliminary raw estimate of the local covariances. This new approach allows for different smoothing bandwidths while preserving positive definiteness. This approach is particularly appealing for locally stationary time series. In particular, we address the estimation problems for two types of time-varying covariance matrices: the contemporaneous covariance matrix of a multivariate non-stationary time series, and the auto-covariance matrix of a univariate non-stationary time series.
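
A toy sketch of the two-step idea, with illustrative (assumed) bandwidths: smooth the cross-products once with a pilot bandwidth, take the Cholesky factor at each time point, re-smooth each factor entry with its own bandwidth, and recompose, so positive semi-definiteness holds by construction.

```python
# Sketch of the Cholesky-based estimator for a time-varying covariance matrix.
# All bandwidth choices below are illustrative assumptions.
import numpy as np

def kernel_smooth(y, bw):
    """Nadaraya-Watson smoother of a sequence y over rescaled time t/T."""
    T = len(y)
    t = np.arange(T) / T
    w = np.exp(-0.5 * ((t[:, None] - t[None, :]) / bw) ** 2)  # Gaussian kernel
    return (w @ y) / w.sum(axis=1)

rng = np.random.default_rng(0)
T, d = 500, 3
X = rng.standard_normal((T, d)) * (1 + np.arange(T)[:, None] / T)  # toy nonstationary series

# Step 1: raw local covariances from cross-products, one common pilot bandwidth.
raw = np.einsum('ti,tj->tij', X, X)
pilot = np.stack([kernel_smooth(raw[:, i, j], bw=0.05)
                  for i in range(d) for j in range(d)], axis=1).reshape(T, d, d)

# Step 2: Cholesky factor of the pilot estimate at each time point.
chol = np.linalg.cholesky(pilot + 1e-8 * np.eye(d))

# Step 3: re-smooth each Cholesky entry with its own bandwidth, then recompose.
bws = 0.02 + 0.03 * rng.random((d, d))            # entry-specific bandwidths (assumed)
L_s = np.zeros_like(chol)
for i in range(d):
    for j in range(i + 1):
        L_s[:, i, j] = kernel_smooth(chol[:, i, j], bws[i, j])
Sigma_hat = np.einsum('tik,tjk->tij', L_s, L_s)   # positive semi-definite for every t
```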

2017-06-23
12:00hrs.
Carlos Sing-Long. Ingeniería Matemática y Computacional, Escuela de Ingeniería
Computational Methods for Unbiased Risk Estimation
Sala 2, Facultad de Matemáticas
Abstract:

In many engineering applications, one seeks to estimate, recover or reconstruct an unknown object of interest from an incomplete set of linear measurements. Mathematically, the unknown object can be represented as the solution to an underdetermined system of linear equations. In recent years it has been shown that it is possible to recover the true object by exploiting a priori information about its structure, such as sparsity in compressed sensing or low-rank in matrix completion. However, in practice the measurements are corrupted by noise and exact recovery is not possible.

A popular approach to address this issue is to solve an unconstrained convex optimization problem to obtain an estimate that both explains the measurements and resembles the known structural characteristics of the true object. The objective function quantifies the trade-off between data fidelity and structural fidelity, which is usually controlled by a single regularization parameter. One possible criterion for selecting the value of this parameter is to minimize an unbiased estimate for the prediction error as a surrogate for the true prediction risk. Unfortunately, evaluating this estimate requires an expression for the weak divergence of the predicted observations. Therefore, it is necessary to characterize the regularity of the solution to the convex optimization problem with respect to the measurements.

In this talk I will present a conceptual and practical framework to study the regularity of the solution to a popular class of such optimization problems. The approach consists of using an auxiliary optimization problem that characterizes the smoothness of the predicted observations. In particular, we can relate the analytic singularities of the predicted observations with the geometric singularities of the feasible set to this problem. I will then present a disciplined approach for obtaining closed-form expressions for the derivatives of the predicted measurements that are amenable to computation. Finally, I will explain how the expressions establish a connection between the geometry of the convex optimization problem and the unbiased estimate for the prediction risk.
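
As a concrete instance of these ideas, the classical case of soft-thresholding with identity measurements admits a closed-form divergence (the number of active coordinates), which yields Stein's unbiased risk estimate (SURE) directly; the sparse setup below is a toy assumption.

```python
# SURE for soft-thresholding with identity measurements: the divergence of
# y -> soft(y, lam) is the number of coordinates surviving the threshold.
import numpy as np

def soft(y, lam):
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

rng = np.random.default_rng(2)
n, sigma = 200, 1.0
x0 = np.zeros(n); x0[:10] = 5.0                    # sparse truth (toy assumption)
y = x0 + sigma * rng.standard_normal(n)

lams = np.linspace(0.0, 4.0, 81)
sure = []
for lam in lams:
    xhat = soft(y, lam)
    div = np.sum(np.abs(y) > lam)                  # closed-form weak divergence
    sure.append(np.sum((y - xhat) ** 2) - n * sigma**2 + 2 * sigma**2 * div)

best = lams[int(np.argmin(sure))]
print(f"lambda minimizing SURE: {best:.2f}")
print(f"true risk at that lambda: {np.sum((soft(y, best) - x0) ** 2):.1f}")
```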

2017-06-09
12:00 hrs.
Ricardo Bórquez. Departamento de Economía Agraria, Facultad de Agronomía e Ingeniería Forestal
Financial Markets: Controversial or Untestable Theories?
Sala 2, Facultad de Matemáticas
Abstract:

This paper demonstrates a measurability property of no-arbitrage prices which precludes martingale law identification based on price information. Theoretical model hypotheses defined on martingale laws are therefore untestable using prices. Examples of such untestable models are found in the literature on rational bubbles, efficient markets and, more recently, arbitrage pricing.

2017-06-02
12:00hrs.
Mauricio Castro. Universidad de Concepción
Recent advances in censored regression models
Sala 2, Facultad de Matemáticas
Abstract:

Recently, statistical models where the dependent variable is censored have been studied in different fields, namely econometric analysis, clinical assays, and biostatistical analysis, among others. In practice, it is quite common to assume Gaussianity for the random components of the model, due mainly to the computational flexibility it affords for parameter estimation.

However, such an assumption may not be realistic. In fact, the likelihood-based and Bayesian inferences for censored models can be seriously affected by the presence of atypical observations, skewness and/or the misspecification of the distributions for random terms.

The objective of this talk is to provide a review of statistical models for censored data using non-Gaussian families of distributions to model human immunodeficiency virus (HIV) dynamics.
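
To fix ideas, here is a minimal sketch of the Gaussian (Tobit-type) censored log-likelihood that serves as the point of departure, where left-censored observations (e.g., viral loads below a detection limit) contribute a CDF term instead of a density term; the synthetic data are assumptions for illustration only.

```python
# Left-censored Gaussian regression (Tobit-type) log-likelihood sketch.
import numpy as np
from scipy.stats import norm

def tobit_loglik(beta, sigma, X, y, lower):
    """Log-likelihood with responses left-censored at `lower`."""
    mu = X @ beta
    cens = y <= lower                               # censored cases
    ll = norm.logpdf(y[~cens], loc=mu[~cens], scale=sigma).sum()
    ll += norm.logcdf(lower, loc=mu[cens], scale=sigma).sum()
    return ll

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
beta_true = np.array([1.0, -0.5])
y = X @ beta_true + rng.standard_normal(200)
lower = -1.0
y = np.maximum(y, lower)                            # censor below the limit
print(f"log-likelihood at the truth: {tobit_loglik(beta_true, 1.0, X, y, lower):.1f}")
```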

2017-05-26
12:00 hrs.
Nedret Billor. Department of Mathematics and Statistics, Auburn University
Robust Inference in Functional Data Analysis
Sala 2, Facultad de Matemáticas
Abstract:

In the last twenty years, a substantial amount of attention has been drawn to the field of functional data analysis. While the study of probabilistic tools for infinite-dimensional variables started at the beginning of the 20th century, statistical models and methods for functional data have only really been developed in the last two decades, as many scientific fields involving applied statistics have started measuring and recording massive continuous data thanks to rapid technological advancements. The methods developed in this field mainly require homogeneity of the functional data, that is, data free of outliers. However, methods for data containing outliers have only recently been studied. In this talk, we focus on the effect of outliers on functional data analysis techniques. We then introduce robust estimation and variable selection methods for a special functional regression model, as well as a simultaneous confidence band for the mean function of functional data. Simulation studies and data applications are presented to compare the performance of the proposed methods with non-robust techniques.

2017-05-19
12:00 hrs.
Isabelle Beaudry. Pontificia Universidad Católica de Chile
Correcting for nonrandom recruitment with RDS data: design-based and Bayesian approaches
Sala 2, Facultad de Matemáticas
Abstract:

Respondent-driven sampling (RDS) is a sampling mechanism that has proven very effective for sampling hard-to-reach human populations connected through social networks. A small number of individuals, typically known to the researcher, are initially sampled and asked to recruit a small fixed number of their contacts who are also members of the target population. Subsequent sampling waves are produced by peer recruitment until a desired sample size is achieved. However, the researcher's lack of control over the sampling process has posed several challenges to producing valid statistical inference from RDS data. For instance, participants are generally assumed to recruit completely at random among their contacts, despite growing empirical evidence that suggests otherwise and the substantial sensitivity of most RDS estimators to this assumption. The main contributions of this study are to parameterize alternative recruitment behaviors and to propose design-based and model-based (Bayesian) estimators that correct for nonrandom recruitment.
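
A toy simulation of the recruitment mechanism just described, using an Erdős-Rényi contact network and uniform-at-random recruitment (precisely the assumption the talk proposes to relax); network size, seed count, and coupon number are illustrative assumptions.

```python
# Toy RDS recruitment: seeds recruit up to `coupons` contacts, wave by wave,
# until the target sample size is reached.
import numpy as np

rng = np.random.default_rng(5)
n, p, coupons, target = 500, 0.02, 3, 100
adj = rng.random((n, n)) < p                       # Erdos-Renyi contact network (assumed)
adj = np.triu(adj, 1)
adj = adj | adj.T                                  # symmetric, no self-ties

sampled = set(rng.choice(n, size=5, replace=False))  # seeds known to the researcher
wave = list(sampled)
while wave and len(sampled) < target:
    nxt = []
    for i in wave:
        contacts = [j for j in np.flatnonzero(adj[i]) if j not in sampled]
        for j in rng.permutation(contacts)[:coupons]:  # uniform-at-random recruitment
            if len(sampled) < target and j not in sampled:
                sampled.add(j)
                nxt.append(j)
    wave = nxt
print(f"sample size reached: {len(sampled)}")
```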

2017-04-21
12:00hrs.
Felipe Osorio. Instituto de Estadística, Pontificia Universidad Católica de Valparaíso
The Gradient Test for Extremum Estimators
Sala 2, Facultad de Matemáticas
Abstract:

In this work, the gradient test proposed by Terrell [Comp. Sci. Stat. 34: 206-215, 2002] is introduced in the context of estimators that arise as the extremum of an objective function. This general class of estimators, commonly known as extremum estimators, provides a unified framework for studying different estimation procedures that share common principles. In this talk, we focus mainly on testing nonlinear hypotheses and on the application of the gradient test to influence diagnostics. The methodology is applied to test the equality of Sharpe ratios associated with the returns of pension funds from the Chilean pension system.
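
For reference, Terrell's gradient statistic takes the simple form below, written here in standard notation (assumed, not necessarily the speaker's).

```latex
% Terrell's gradient statistic for H_0: \theta = \theta_0, with S the score
% (gradient of the objective), \hat\theta the unrestricted and \tilde\theta the
% restricted extremum estimator; unlike the Wald and score statistics, no
% information-matrix estimate needs to be inverted.
T_G \;=\; S(\tilde\theta)^{\top}(\hat\theta - \tilde\theta).
```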

2017-03-06
12:00hrs.
Garritt Page. Brigham Young University
Estimation and Prediction in the Presence of Spatial Confounding for Spatial Linear Models
Sala 2, Facultad de Matemáticas
Abstract:

In studies that produce data with spatial structure, it is common that covariates of interest vary spatially in addition to the error. Because of this, the error and covariate are often correlated. When this occurs, it is difficult to distinguish the covariate effect from residual spatial variation. In an i.i.d. normal error setting, it is well known that this type of correlation produces biased coefficient estimates, but predictions remain unbiased. In a spatial setting, recent studies have shown that coefficient estimates remain biased, but spatial prediction has not been addressed. The purpose of this paper is to provide a more detailed study of coefficient estimation in spatial models when covariate and error are correlated, and then to begin a formal study of spatial prediction. This is carried out by investigating properties of the generalized least squares estimator and the best linear unbiased predictor when a spatial random effect and a covariate are jointly modeled. Under this setup, we demonstrate that the mean squared prediction error can be reduced when covariate and error are correlated.
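
The two objects under study are standard; for a model y = Xβ + ε with Var(ε) = Σ, they take the familiar forms below (notation assumed, not the paper's).

```latex
% Generalized least squares estimator and best linear unbiased predictor at a
% new location s_0, for y = X\beta + \varepsilon, \mathrm{Var}(\varepsilon) = \Sigma,
% and c = \mathrm{Cov}(\varepsilon(s_0), \varepsilon):
\hat\beta_{GLS} = (X^{\top}\Sigma^{-1}X)^{-1}X^{\top}\Sigma^{-1}y,
\qquad
\hat y(s_0) = x(s_0)^{\top}\hat\beta_{GLS} + c^{\top}\Sigma^{-1}\big(y - X\hat\beta_{GLS}\big).
```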

2017-01-11
12:00hrs.
Marc G. Genton. King Abdullah University of Science and Technology (KAUST), Saudi Arabia
Computational Challenges with Big Environmental Data
Sala 2, Facultad de Matemáticas
Abstract:

Two types of computational challenges arising from big environmental data are discussed. The first type occurs with multivariate or spatial extremes. Indeed, inference for max-stable processes observed at a large collection of locations is among the most challenging problems in computational statistics, and current approaches typically rely on less expensive composite likelihoods constructed from small subsets of data. We explore the limits of modern state-of-the-art computational facilities to perform full likelihood inference and to efficiently evaluate high-order composite likelihoods. With extensive simulations, we assess the loss of information of composite likelihood estimators with respect to a full likelihood approach for some widely used multivariate or spatial extreme models. The second type of challenge occurs with the emulation of climate model outputs. We consider fitting a statistical model to over 1 billion global 3D spatio-temporal temperature data points using a distributed computing approach. The statistical model exploits the gridded geometry of the data and parallelization across processors. It is therefore computationally convenient and allows us to fit a non-trivial model to a data set with a covariance matrix comprising 10^{18} entries. We provide 3D visualization of the results. The talk is based on joint work with Stefano Castruccio and Raphael Huser.
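
The composite-likelihood idea in its simplest form, sketched generically: sum bivariate log-densities over nearby pairs of sites only. A bivariate Gaussian density stands in here for the far costlier bivariate max-stable density; the covariance model and distance cutoff are illustrative assumptions.

```python
# Generic pairwise composite log-likelihood over nearby site pairs.
import numpy as np
from scipy.stats import multivariate_normal

def pairwise_cl(theta, data, coords, max_dist=0.5):
    """Sum of bivariate log-densities over site pairs closer than max_dist."""
    sill, range_par = theta
    n_sites = coords.shape[0]
    ll = 0.0
    for i in range(n_sites):
        for j in range(i + 1, n_sites):
            h = np.linalg.norm(coords[i] - coords[j])
            if h > max_dist:                       # only nearby pairs: cheap but lossy
                continue
            rho = np.exp(-h / range_par)           # exponential correlation (assumed)
            cov = sill * np.array([[1.0, rho], [rho, 1.0]])
            ll += multivariate_normal.logpdf(data[:, [i, j]], cov=cov).sum()
    return ll

rng = np.random.default_rng(6)
coords = rng.random((15, 2))                       # random site locations (toy)
data = rng.standard_normal((200, 15))              # replicates at each site (toy)
print(f"pairwise composite log-likelihood: {pairwise_cl((1.0, 0.3), data, coords):.1f}")
```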


2017-01-11
11:00hrs.
Ying Sun. King Abdullah University of Science and Technology (KAUST), Saudi Arabia
Total Variation Depth for Functional Data
Sala 2, Facultad de Matemáticas
Abstract:

There has been extensive work on data depth-based methods for robust multivariate data analysis. Recent developments have moved to infinite-dimensional objects such as functional data. In this work, we propose a new notion of depth, the total variation depth, for functional data. As a measure of depth, its properties are studied theoretically, and the associated outlier detection performance is investigated through simulations. Compared to magnitude outliers, shape outliers are often masked among the rest of the samples and are harder to identify. We show that the proposed total variation depth has many desirable features and is well suited for outlier detection. In particular, we propose to decompose the total variation depth into two components associated with shape and magnitude outlyingness, respectively. This decomposition allows us to develop an effective procedure for outlier detection and useful visualization tools, while naturally accounting for the correlation in functional data. Finally, the proposed methodology is demonstrated using real datasets of curves, images, and video frames. The talk is based on joint work with Huang Huang.

2016-12-16
12:00hrs.
Fernanda de Bastiani. Pontificia Universidad Católica de Chile
Flexible Regression and Smoothing: Gaussian Markov random field models in GAMLSS
Sala 2, Facultad de Matemáticas
Abstract:

This work gives a brief history of GAMLSS and describes the modelling and fitting of Gaussian Markov random field components within a GAMLSS model. This allows any or all of the parameters of the response variable distribution to be modelled using explanatory variables and spatial effects. The response variable distribution is allowed to be a non-exponential family distribution. A new R package developed to achieve this is presented. We use Gaussian Markov random fields to model the spatial effect in Munich rent data and explore some features and characteristics of the data. The potential of using spatial analysis within GAMLSS is discussed. We argue that the flexibility of parametric distributions, the ability to model all the parameters of the distribution, and the diagnostic tools of GAMLSS provide an ideal environment for modelling spatial features of data.
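
For concreteness, a generic intrinsic Gaussian Markov random field prior of the kind used for such spatial effects is sketched below; the exact specification in the package may differ.

```latex
% Generic intrinsic GMRF prior for a spatial effect b over regions, built from
% the neighbourhood graph (n_i = number of neighbours of region i, \tau a
% precision parameter, Q^{-} a generalized inverse):
b \sim N\!\left(0,\, \tau^{-1} Q^{-}\right),
\qquad
Q_{ij} =
\begin{cases}
 n_i & i = j,\\
 -1 & i \sim j \ (\text{regions } i, j \text{ adjacent}),\\
 0 & \text{otherwise.}
\end{cases}
```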

2016-12-02
12:00hrs.
Gabriel Martos. Pontificia Universidad Católica de Valparaíso
Discrimination surfaces for region-specific brain asymmetry analysis
Sala 2, Facultad de Matemáticas
Abstract:
Discrimination surfaces are introduced as a diagnostic for localizing brain regions where discrimination between diseased and non-diseased subjects is highest. An applied goal of interest is to conduct a brain asymmetry analysis so as to localize brain regions where schizophrenia patients differ most from healthy controls. Joint work with Miguel de Carvalho.
2016-11-25
12:00hrs.
Giovanni Motta. Pontificia Universidad Católica de Chile
Spatial Identification of Epilepsy Regions
Sala 2, Facultad de Matemáticas
Abstract:

The surgical outcomes of patients suffering from neocortical epilepsy are not always successful. The main difficulty in the treatment of neocortical epilepsy is that current technology has limited accuracy in mapping neocortical epileptogenic tissue (see Haglund and Hochman 2004). It is known that the optical spectroscopic properties of brain tissue are correlated with changes in neuronal activity. The method of mapping these activity-evoked optical changes is known as imaging of intrinsic optical signals (ImIOS). Activity-evoked optical changes measured in neocortex are generated by changes in cerebral hemodynamics (i.e., changes in blood oxygenation and blood volume).

ImIOS has the potential to be useful for both clinical and experimental investigations of the human neocortex. However, its usefulness for human studies is currently limited because intra-operatively acquired ImIOS data are noisy. To improve the reliability and usefulness of ImIOS for human studies, it is desirable to find appropriate statistical methods for the removal of noise artifacts and for the statistical analysis of the data (see Lavine et al. 2011).

In this paper we introduce a novel flexible tool, based on spatial statistical representation of ImIOS, that allows for source localization of the epilepsy regions. In particular, our model incorporates spatial correlation between the location of the epileptic region(s) and the neighboring regions, non-stationarity of the observed time series, and heartbeat/respiration cyclical components. The final goal is clustering (dimension reduction) of the pixels in regions, in order to localize the epilepsy regions for the craniectomy.

The advantage of our approach compared with previous approaches is twofold. Firstly, we use a non-parametric specification, rather than the (more restrictive) parametric or polynomial-based specifications. Secondly, we provide a statistical method – based on the spatial information – that is able to identify the clusters in a data-driven way, rather than via the (sometimes arbitrary) ad hoc approaches currently used.

To demonstrate how our method might be used for intra-operative neurosurgical mapping, we provide an application of the technique to optical data acquired from a single human subject during direct electrical stimulation of the cortex.

2016-11-18
12:00hrs.
Karine Bertin. Cimfav - Universidad de Valparaíso
Adaptive Density Estimation on Bounded Domains
Sala 2, Facultad de Matemáticas
Abstract:
We study the estimation, in $L_p$-norm, of a density function defined on $[0, 1]^d$. We construct a new family of kernel density estimators that do not suffer from the so-called boundary bias problem, and we propose a data-driven procedure based on the Goldenshluger-Lepski approach that jointly selects a kernel and a bandwidth. We derive two estimators that satisfy oracle-type inequalities and that are also proved to be adaptive over a scale of anisotropic or isotropic Sobolev-Slobodetskii classes. The main interest of the isotropic procedure is to obtain adaptive results without any restriction on the smoothness parameter.
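
A quick numerical illustration of the boundary-bias problem the talk addresses, using the classical reflection fix (not the paper's kernel family) on uniform data; the bandwidth and grid are assumptions.

```python
# Boundary bias of a standard Gaussian KDE on [0,1], versus a simple
# reflection-corrected estimator (a classical fix, not the paper's estimator).
import numpy as np

def kde(x, grid, h):
    """Gaussian kernel density estimate of sample x on the given grid."""
    k = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / h) ** 2)
    return k.mean(axis=1) / (h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 2000)                        # true density is 1 on [0,1]
grid = np.linspace(0, 1, 101)
h = 0.05                                           # bandwidth (assumed)

naive = kde(x, grid, h)
reflected = kde(np.concatenate([x, -x, 2 - x]), grid, h) * 3  # reflect at both edges

print(f"naive estimate at 0:     {naive[0]:.2f}  (true value 1)")
print(f"reflected estimate at 0: {reflected[0]:.2f}")
```
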
2016-11-11
12:00hrs.
José Quinlan. Pontificia Universidad Católica de Chile
Parsimonious Hierarchical Modeling Using Repulsive Distributions
Sala 2, Facultad de Matemáticas
Abstract:

Employing nonparametric methods for density estimation has become routine in Bayesian statistical practice. In this regard, models based on discrete nonparametric priors, such as Dirichlet process mixture (DPM) models, are very attractive choices due to their flexibility and tractability. However, a common problem in fitting DPMs or other discrete models to data is that they tend to produce a large number of (sometimes) redundant clusters. In this work we propose a method that produces parsimonious mixture models (i.e., mixtures that avoid creating redundant clusters) without sacrificing flexibility or model fit. The method is based on the idea of repulsion, that is, that any two mixture components are encouraged to be well separated. We propose a family of d-dimensional probability densities whose coordinates tend to repel each other in a smooth way. The induced probability measure has close relations with Gibbs measures, graph theory, and point processes. We investigate its global properties and explore its use in the context of mixture models for density estimation. Computational techniques are detailed and we illustrate the method's utility with some well-known data sets.
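
Schematically, repulsive priors of this kind take the product form below, where a base density is tempered by pairwise repulsion terms; this is a generic sketch, not necessarily the talk's exact specification.

```latex
% Generic repulsive prior on component locations \mu_1, \dots, \mu_k: g is a
% base density and h is increasing with h(0) = 0, so configurations with nearly
% coincident components receive vanishing prior mass.
\pi(\mu_1, \dots, \mu_k) \;\propto\; \Big[\prod_{i=1}^{k} g(\mu_i)\Big]
\prod_{i < j} h\big(d(\mu_i, \mu_j)\big).
```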

2016-11-04
12:00hrs.
Rodrigo Rubio. Pontificia Universidad Católica de Chile
Similarity-Based Clustering for Stock Market Extremes
Sala 2, Facultad de Matemáticas
Abstract:
The analysis of the magnitude and dynamics of extreme losses in a stock market is essential from an investor's viewpoint. An important question of applied interest is: how should stocks that are more similar with respect to those features be grouped into different categories? In this talk we discuss methods of similarity-based clustering for statistics of heteroscedastic extremes, which allow us to assemble stocks that are more similar from the viewpoint of the scedasis and/or the tail index.
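
For readers unfamiliar with the term, the scedasis arises in the usual heteroscedastic-extremes formulation, sketched below in standard (assumed) notation.

```latex
% Heteroscedastic extremes: independent observations X_1, \dots, X_n with
% distributions F_{n,1}, \dots, F_{n,n} share a common tail F up to a frequency
% function c (the scedasis), continuous and integrating to one:
\lim_{x \to x^{*}} \frac{1 - F_{n,i}(x)}{1 - F(x)} \;=\; c\!\left(\tfrac{i}{n}\right),
\qquad \int_{0}^{1} c(s)\,ds = 1.
```
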
2016-10-21
12:00hrs.
Ramsés Mena. Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, México
Some Markov processes derived from exchangeability
Sala 2, Facultad de Matemáticas
Abstract:
Exploiting the symmetry inherent in the concept of exchangeability, together with Bayesian learning, leads to an attractive construction of Markov processes. This approach is considerably general as far as distributional and dependence assumptions are concerned. We will discuss several particular cases, in both discrete and continuous time. Time permitting, we will present some generalizations to processes taking values in spaces of probability measures, together with some of their applications.
2016-10-19
12:00hrs.
Mingan Yang. San Diego State University
Bayesian semiparametric latent variable model: an application to a fibroid tumor study
Sala 5, Facultad de Matemáticas
Abstract:
In parametric hierarchical models, it is standard practice to place mean and variance constraints on the latent variable distributions for the sake of identifiability and interpretability. Because incorporation of such constraints is challenging in semiparametric models that allow latent variable distributions to be unknown, previous methods either constrain the median or avoid constraints. In this article, we propose a centered stick-breaking process (CSBP), which induces mean and variance constraints on an unknown distribution in a hierarchical model. This is accomplished by viewing an unconstrained stick-breaking process as a parameter-expanded version of a CSBP. An efficient blocked Gibbs sampler is developed for approximate posterior computation. The methods are illustrated through a simulated example and an epidemiologic application.
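
For orientation, the ordinary (unconstrained) stick-breaking construction that the centered stick-breaking process modifies is sketched below; the centering step itself, which enforces the mean and variance constraints, is not shown.

```python
# Ordinary stick-breaking weights (truncated), the building block of the CSBP.
import numpy as np

rng = np.random.default_rng(4)
alpha, K = 1.0, 25                                 # concentration and truncation level (assumed)
v = rng.beta(1.0, alpha, size=K)                   # stick fractions
w = v * np.concatenate([[1.0], np.cumprod(1 - v)[:-1]])  # weights: w_k = v_k * prod(1 - v_j, j<k)
atoms = rng.standard_normal(K)                     # atom locations from a base measure
print(f"sum of truncated weights: {w.sum():.3f}")
print(f"mean of the random measure: {np.sum(w * atoms):.3f}")
```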