Statistics Seminar

   
2017-06-23
12:00hrs.
Carlos Sing-Long. Ingeniería Matemática y Computacional, Escuela de Ingeniería
Computational Methods For Unbiased Risk Estimation
Sala 2, Facultad de Matemáticas
Abstract:

In many engineering applications, one seeks to estimate, recover or reconstruct an unknown object of interest from an incomplete set of linear measurements. Mathematically, the unknown object can be represented as the solution to an underdetermined system of linear equations. In recent years it has been shown that it is possible to recover the true object by exploiting a priori information about its structure, such as sparsity in compressed sensing or low-rank in matrix completion. However, in practice the measurements are corrupted by noise and exact recovery is not possible.

A popular approach to address this issue is to solve an unconstrained convex optimization problem to obtain an estimate that both explains the measurements and resembles the known structural characteristics of the true object. The objective function quantifies the trade-off between data fidelity and structural fidelity, which is usually controlled by a single regularization parameter. One possible criterion for selecting the value of this parameter is to minimize an unbiased estimate for the prediction error as a surrogate for the true prediction risk. Unfortunately, evaluating this estimate requires an expression for the weak divergence of the predicted observations. Therefore, it is necessary to characterize the regularity of the solution to the convex optimization problem with respect to the measurements.

In this talk I will present a conceptual and practical framework to study the regularity of the solution to a popular class of such optimization problems. The approach consists of using an auxiliary optimization problem that characterizes the smoothness of the predicted observations. In particular, we can relate the analytic singularities of the predicted observations to the geometric singularities of the feasible set of this auxiliary problem. I will then present a disciplined approach for obtaining closed-form expressions for the derivatives of the predicted measurements that are amenable to computation. Finally, I will explain how these expressions establish a connection between the geometry of the convex optimization problem and the unbiased estimate for the prediction risk.
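
As a rough illustration of the parameter selection criterion described in this abstract, the sketch below evaluates Stein's unbiased risk estimate in the simplest setting, which is an assumption made here and not the setting of the talk: the measurement operator is the identity, the noise is Gaussian with known standard deviation sigma, and the regularizer is the l1 norm, so the estimate is entrywise soft thresholding and its weak divergence is the number of entries above the threshold. The function names are hypothetical.

```python
import math


def soft_threshold(y, lam):
    """Entrywise minimizer of 0.5 * (x - y_i)^2 + lam * |x|."""
    return [math.copysign(abs(v) - lam, v) if abs(v) > lam else 0.0 for v in y]


def sure_soft_threshold(y, lam, sigma):
    """Unbiased risk estimate for soft thresholding under Gaussian noise.

    Assumes identity measurements and known noise level sigma.
    """
    n = len(y)
    fit = soft_threshold(y, lam)
    residual = sum((f - v) ** 2 for f, v in zip(fit, y))
    # Weak divergence of the estimate: one for every entry that survives.
    divergence = sum(1 for v in y if abs(v) > lam)
    return -n * sigma ** 2 + residual + 2 * sigma ** 2 * divergence


def select_lambda(y, grid, sigma):
    """Pick the grid value with the smallest risk estimate."""
    return min(grid, key=lambda lam: sure_soft_threshold(y, lam, sigma))
```

For example, select_lambda(y, [0.0, 0.5, 1.0, 2.0], sigma) scans a small grid of candidate regularization parameters. For more general problems the divergence has no such simple form, which is the difficulty the abstract addresses.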

2017-06-09
12:00 hrs.
Ricardo Bórquez. Departamento de Economía Agraria, Facultad de Agronomía e Ingeniería Forestal
Financial Markets: Controversial Or Untestable Theories?
Sala 2, Facultad de Matemáticas
Abstract:

This paper demonstrates a measurability property of no-arbitrage prices, which precludes martingale law identification based on price information. Theoretical model hypotheses defined on martingale laws are therefore untestable using prices. Examples of such untestable models are found in the literature of rational bubbles, efficient markets, and, more recently, of arbitrage pricing.

2017-06-02
12:00hrs.
Mauricio Castro. Universidad de Concepción
Recent Advances In Censored Regression Models
Sala 2, Facultad de Matemáticas
Abstract:

Recently, statistical models where the dependent variable is censored have been studied in different fields, namely econometric analysis, clinical assays, and biostatistical analysis, among others. In practice, it is quite common to assume Gaussianity for the random components of the model, mainly because of the computational flexibility this gives for parameter estimation.

However, such an assumption may not be realistic. In fact, the likelihood-based and Bayesian inferences for censored models can be seriously affected by the presence of atypical observations, skewness and/or the misspecification of the distributions for random terms.

The objective of this talk is to provide a review of statistical models for censored data using non-Gaussian families of distributions to model human immunodeficiency virus (HIV) dynamics.

2017-05-26
12:00 hrs.
Nedret Billor. Department of Mathematics and Statistics, Auburn University
Robust Inference In Functional Data Analysis
Sala 2, Facultad de Matemáticas
Abstract:

In the last twenty years, a substantial amount of attention has been drawn to the field of functional data analysis. While the study of probabilistic tools for infinite-dimensional variables started at the beginning of the 20th century, statistical models and methods for functional data have only really been developed in the last two decades, since many scientific fields involving applied statistics have started measuring and recording massive continuous data due to rapid technological advancements. The methods developed in this field mainly require homogeneity of functional data, namely data free of outliers. However, methods that work in the presence of outliers have only recently been studied. In this talk, we focus on the effect of outliers on functional data analysis techniques. Then we introduce robust estimation and variable selection methods for a special functional regression model, as well as a simultaneous confidence band for the mean function of functional data. Simulation studies and data applications are presented to compare the performance of the proposed methods with the non-robust techniques.

2017-05-19
12:00 hrs.
Isabelle Beaudry. Pontificia Universidad Católica de Chile
Correcting For Nonrandom Recruitment With RDS Data: Design-Based And Bayesian Approaches
Sala 2, Facultad de Matemáticas
Abstract:

Respondent-driven sampling (RDS) is a sampling mechanism that has proven very effective for sampling hard-to-reach human populations connected through social networks. A small number of individuals, typically known to the researcher, are initially sampled and asked to recruit a small fixed number of their contacts who are also members of the target population. Subsequent sampling waves are produced by peer recruitment until a desired sample size is achieved. However, the researcher's lack of control over the sampling process has posed several challenges to producing valid statistical inference from RDS data. For instance, participants are generally assumed to recruit completely at random among their contacts, despite growing empirical evidence that suggests otherwise and the substantial sensitivity of most RDS estimators to this assumption. The main contributions of this study are to parameterize alternative recruitment behaviors and to propose design-based and model-based (Bayesian) estimators that correct for nonrandom recruitment.

2017-04-21
12:00hrs.
Felipe Osorio. Instituto de Estadística, Pontificia Universidad Católica de Valparaíso
Gradient Test For Extremum Estimators
Sala 2, Facultad de Matemáticas
Abstract:

In this work we extend the gradient test proposed by Terrell [Comp. Sci. Stat. 34: 206-215, 2002] to the context of estimators that arise as the extremum of an objective function. This general class of estimators, frequently known as "extremum estimators", provides a general framework for studying different estimation procedures that share common principles. In this talk, we focus mainly on nonlinear hypothesis tests and on the application of the gradient test to influence diagnostics. The methodology is applied to test the equality of the Sharpe ratios associated with the returns of pension funds in the Chilean pension system.

2017-03-06
12:00hrs.
Garritt Page. Brigham Young University
Estimation And Prediction In The Presence Of Spatial Confounding For Spatial Linear Models
Sala 2, Facultad de Matemáticas
Abstract:

In studies that produce data with spatial structure, it is common that covariates of interest vary spatially in addition to the error. Because of this, the error and covariate are often correlated. When this occurs, it is difficult to distinguish the covariate effect from residual spatial variation. In an iid normal error setting, it is well known that this type of correlation produces biased coefficient estimates but predictions remain unbiased. In a spatial setting, recent studies have shown that coefficient estimates remain biased, but spatial prediction has not been addressed. The purpose of this paper is to provide a more detailed study of coefficient estimation from spatial models when covariate and error are correlated and then begin a formal study regarding spatial prediction. This is carried out by investigating properties of the generalized least squares estimator and the best linear unbiased predictor when a spatial random effect and a covariate are jointly modeled. Under this setup we demonstrate that the mean squared prediction error is possibly reduced when covariate and error are correlated.

2017-01-11
12:00hrs.
Marc G. Genton. King Abdullah University of Science and Technology (KAUST), Saudi Arabia
Computational Challenges With Big Environmental Data
Sala 2, Facultad de Matemáticas
Abstract:

Two types of computational challenges arising from big environmental data are discussed. The first type occurs with multivariate or spatial extremes. Indeed, inference for max-stable processes observed at a large collection of locations is among the most challenging problems in computational statistics, and current approaches typically rely on less expensive composite likelihoods constructed from small subsets of data. We explore the limits of modern state-of-the-art computational facilities to perform full likelihood inference and to efficiently evaluate high-order composite likelihoods. With extensive simulations, we assess the loss of information of composite likelihood estimators with respect to a full likelihood approach for some widely used multivariate or spatial extreme models. The second type of challenge occurs with the emulation of climate model outputs. We consider fitting a statistical model to over 1 billion global 3D spatio-temporal temperature data points using a distributed computing approach. The statistical model exploits the gridded geometry of the data and parallelization across processors. It is therefore computationally convenient and allows us to fit a non-trivial model to a data set with a covariance matrix comprising $10^{18}$ entries. We provide 3D visualization of the results. The talk is based on joint work with Stefano Castruccio and Raphael Huser.

2017-01-11
11:00hrs.
Ying Sun. King Abdullah University of Science and Technology (KAUST), Saudi Arabia
Total Variation Depth For Functional Data
Sala 2, Facultad de Matemáticas
Abstract:

There has been extensive work on data depth-based methods for robust multivariate data analysis. Recent developments have moved to infinite-dimensional objects such as functional data. In this work, we propose a new notion of depth, the total variation depth, for functional data. As a measure of depth, its properties are studied theoretically, and the associated outlier detection performance is investigated through simulations. Compared to magnitude outliers, shape outliers are often masked among the rest of the samples and harder to identify. We show that the proposed total variation depth has many desirable features and is well suited for outlier detection. In particular, we propose to decompose the total variation depth into two components that are associated with shape and magnitude outlyingness, respectively. This decomposition allows us to develop an effective procedure for outlier detection and useful visualization tools, while naturally accounting for the correlation in functional data. Finally, the proposed methodology is demonstrated using real datasets of curves, images, and video frames. The talk is based on joint work with Huang Huang.

2016-12-16
12:00hrs.
Fernanda de Bastiani. Pontificia Universidad Católica de Chile
Flexible Regression And Smoothing: Gaussian Markov Random Field Models In GAMLSS
Sala 2, Facultad de Matemáticas
Abstract:

This work gives a brief history of GAMLSS and describes the modelling and fitting of Gaussian Markov random field components within a GAMLSS model. This allows modelling of any or all the parameters of the distribution for the response variable using explanatory variables and spatial effects. The response variable distribution is allowed to be a non-exponential family distribution. A new package developed in R to achieve this is presented. We use Gaussian Markov random fields to model the spatial effect in Munich rent data and explore some features and characteristics of the data. The potential of using spatial analysis within GAMLSS is discussed. We argue that the flexibility of parametric distributions, the ability to model all the parameters of the distribution, and the diagnostic tools of GAMLSS provide an ideal environment for modelling spatial features of data.

2016-12-02
12:00hrs.
Gabriel Martos. Pontificia Universidad Católica de Valparaíso
Discrimination Surfaces For Region-Specific Brain Asymmetry Analysis
Sala 2, Facultad de Matemáticas
Abstract:
Discrimination surfaces are introduced as a diagnostic for localizing brain regions where discrimination between diseased and non-diseased subjects is higher. An applied goal of interest is to conduct a brain asymmetry analysis so as to localize brain regions where schizophrenia patients differ most from healthy controls. Joint work with Miguel de Carvalho.
2016-11-25
12:00hrs.
Giovanni Motta. Pontificia Universidad Católica de Chile
Spatial Identification Of Epilepsy Regions
Sala 2, Facultad de Matemáticas
Abstract:

The surgical outcomes of patients suffering from neocortical epilepsy are not always successful. The main difficulty in the treatment of neocortical epilepsy is that current technology has limited accuracy in mapping neocortical epileptogenic tissue (see Haglund and Hochman 2004). It is known that the optical spectroscopic properties of brain tissue are correlated with changes in neuronal activity. The method of mapping these activity-evoked optical changes is known as imaging of intrinsic optical signals (ImIOS). Activity-evoked optical changes measured in neocortex are generated by changes in cerebral hemodynamics (i.e., changes in blood oxygenation and blood volume).

ImIOS has the potential to be useful for both clinical and experimental investigations of the human neocortex. However, its usefulness for human studies is currently limited because intra-operatively acquired ImIOS data is noisy. To improve the reliability and usefulness of ImIOS for human studies, it is desirable to find appropriate statistical methods for removing noise artifacts and for analyzing the data (see Lavine et al. 2011).

In this paper we introduce a novel flexible tool, based on a spatial statistical representation of ImIOS, that allows for source localization of the epilepsy regions. In particular, our model incorporates spatial correlation between the location of the epileptic region(s) and the neighboring regions, non-stationarity of the observed time series, and heartbeat/respiration cyclical components. The final goal is clustering (dimension reduction) of the pixels into regions, in order to localize the epilepsy regions for the craniectomy.

The advantage of our approach compared with previous approaches is twofold. Firstly, we use a non-parametric specification, rather than the (more restrictive) parametric or polynomial-based specification. Secondly, we provide a statistical method, based on the spatial information, that is able to identify the clusters in a data-driven way, rather than relying on the (sometimes arbitrary) ad hoc approaches currently in use.

To demonstrate how our method might be used for intra-operative neurosurgical mapping, we provide an application of the technique to optical data acquired from a single human subject during direct electrical stimulation of the cortex.

2016-11-18
12:00hrs.
Karine Bertin. CIMFAV - Universidad de Valparaíso
Adaptive Density Estimation On Bounded Domains
Sala 2, Facultad de Matemáticas
Abstract:
We study the estimation, in $L_p$-norm, of a density function defined on $[0, 1]^d$. We construct a new family of kernel density estimators that do not suffer from the so-called boundary bias problem, and we propose a data-driven procedure, based on the Goldenshluger-Lepski approach, that jointly selects a kernel and a bandwidth. We derive two estimators that satisfy oracle-type inequalities and that are also proved to be adaptive over a scale of anisotropic or isotropic Sobolev-Slobodetskii classes. The main interest of the isotropic procedure is to obtain adaptive results without any restriction on the smoothness parameter.
2016-11-11
12:00hrs.
José Quinlan. Pontificia Universidad Católica de Chile
Parsimonious Hierarchical Modeling Using Repulsive Distributions
Sala 2, Facultad de Matemáticas
Abstract:

Employing nonparametric methods for density estimation has become routine in Bayesian statistical practice. In this regard, models based on discrete nonparametric priors such as Dirichlet Process Mixture (DPM) models are very attractive choices due to their flexibility and tractability. However, a common problem in fitting DPMs or other discrete models to data is that they tend to produce a large number of (sometimes) redundant clusters. In this work we propose a method that produces parsimonious mixture models (i.e. mixtures that avoid creating redundant clusters), without sacrificing flexibility or model fit. This method is based on the idea of repulsion, that is, that any two mixture components are encouraged to be well separated. We propose a family of d-dimensional probability densities whose coordinates tend to repel each other in a smooth way. The induced probability measure has a close relation with Gibbs measures, graph theory, and point processes. We investigate its global properties and explore its use in the context of mixture models for density estimation. Computational techniques are detailed, and we illustrate the method's utility with some well-known data sets.

2016-11-04
12:00hrs.
Rodrigo Rubio. Pontificia Universidad Católica de Chile
Similarity-Based Clustering For Stock Market Extremes
Sala 2, Facultad de Matemáticas
Abstract:
The analysis of the magnitude and dynamics of extreme losses in a stock market is essential from an investor's viewpoint. An important question of applied interest is: “How can we group into different categories the stocks that are more similar from the viewpoint of those features?” In this talk we discuss methods of similarity-based clustering for statistics of heteroscedastic extremes, which allow us to assemble stocks that are more similar from the viewpoint of the scedasis and/or tail index.
2016-10-21
12:00hrs.
Ramsés Mena. Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, México
Some Markov Processes Derived From Exchangeability
Sala 2, Facultad de Matemáticas
Abstract:
Using the symmetry inherent in the concept of exchangeability together with the learning mechanism of the Bayesian method, an attractive construction of Markov processes emerges. This approach is considerably general as far as distributional and dependence assumptions are concerned. We will discuss several particular cases, in both discrete and continuous time. Time permitting, we will present some generalizations to processes taking values in spaces of probability measures, along with some of their applications.
2016-10-19
12:00hrs.
Mingan Yang. San Diego State University
Bayesian Semiparametric Latent Variable Model: An Application On Fibroid Tumor Study
Sala 5, Facultad de Matemáticas
Abstract:
In parametric hierarchical models, it is standard practice to place mean and variance constraints on the latent variable distributions for the sake of identifiability and interpretability. Because incorporation of such constraints is challenging in semiparametric models that allow latent variable distributions to be unknown, previous methods either constrain the median or avoid constraints. In this article, we propose a centered stick-breaking process (CSBP), which induces mean and variance constraints on an unknown distribution in a hierarchical model. This is accomplished by viewing an unconstrained stick-breaking process as a parameter-expanded version of a CSBP. An efficient blocked Gibbs sampler is developed for approximate posterior computation. The methods are illustrated through a simulated example and an epidemiologic application.
2016-10-14
12:00hrs.
Carlos Díaz Ávalos. Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas - UNAM, México
Spatial Statistics: Answering Relevant Questions In Environmental And Health Sciences
Sala 2, Facultad de Matemáticas
Abstract:
Spatial statistics is a branch that gained momentum in the late 1980s, when worldwide concern about environmental problems arose. This led to the development of methods applicable to the environmental sciences, in which random fields of continuous, discrete, and point type are considered. Today most countries face serious environmental problems, most notably forest fires, pollution, and epidemiological problems.

This talk presents an overview of the spatial statistics methods suitable for analysis on the three types of support and shows three applications to real data, with a brief review of research topics that remain open.
2016-10-07
12:00hrs.
Giovanni Motta. Pontificia Universidad Católica de Chile
Local Polynomials For Time-Varying Correlations: Adaptivity Versus Positivity
Sala 2, Facultad de Matemáticas
Abstract:
In this paper we propose a new nonparametric method to estimate the time-varying correlation between two non-stationary time series. Linear smoothers of the cross-products are based on the same bandwidth for both numerator (covariance) and denominator (variances). This approach guarantees two important properties: the estimated correlation is bounded between minus one and one, and the resulting correlation matrix is positive semi-definite. However, the use of one common bandwidth for both numerator and denominator appears to be restrictive, as the covariance and the variances are in general characterized by different degrees of smoothness. On the other hand, a kernel-type estimator based on different smoothing parameters for numerator and denominator has two drawbacks. First, the ratio between time-varying numerators and denominators is not necessarily bounded between minus one and one; as a consequence, the resulting correlation matrix is not necessarily positive semi-definite. Second, the estimated bandwidths that are optimal for estimating the covariance and the variances are not necessarily optimal for estimating the ratio. The estimator we propose in this paper is based on local smoothing of the sign of the cross-products, which does not require distinguishing between numerator and denominator. Our novel method can be used to estimate the time-varying AR coefficients and time-varying spectra of locally stationary time series.
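
The boundedness property of the common-bandwidth estimator mentioned in this abstract can be checked with a small sketch. This is not the estimator proposed in the talk; it is a local-constant (Nadaraya-Watson) smoother with a Gaussian kernel, and it assumes that both series have mean zero. Because the same nonnegative weights appear in the numerator and the denominator, the Cauchy-Schwarz inequality keeps the ratio between minus one and one. The function names are hypothetical.

```python
import math


def kernel_weights(t, n, bandwidth):
    """Gaussian kernel weights centred at time index t, normalised to sum to one."""
    raw = [math.exp(-0.5 * ((s - t) / bandwidth) ** 2) for s in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def local_correlation(x, y, bandwidth):
    """Time-varying correlation of two zero-mean series with one common bandwidth."""
    n = len(x)
    out = []
    for t in range(n):
        w = kernel_weights(t, n, bandwidth)
        # The same weights smooth the cross-products and both squared series.
        cov = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        var_x = sum(wi * xi * xi for wi, xi in zip(w, x))
        var_y = sum(wi * yi * yi for wi, yi in zip(w, y))
        out.append(cov / math.sqrt(var_x * var_y))
    return out
```

Using one bandwidth for the covariance and another for the variances would break the shared-weights argument, which is the trade-off between adaptivity and positivity that the abstract describes.
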
2016-07-07
Pedro Jodrá. Universidad de Zaragoza
On The Log-Extended Exponential-Geometric Distribution And Applications In Business Research
Sala 2 (Víctor Ochsenius), Facultad de Matemáticas, at 12:00 hrs.