Statistics Seminar


2018-09-14
12:00hrs.
Alejandro Murua. Universidad de Montreal
Cox regression with Potts-driven latent clusters model
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
We consider a Bayesian nonparametric survival regression model with latent partitions. Our goal is to predict survival and to cluster survival patients within the context of building prognosis systems. We propose the Potts clustering model as a prior on the covariate space so as to drive cluster formation on individuals and/or Tumor-Node-Metastasis stage system patient blocks. For any given partition, our model assumes an interval-wise Weibull distribution for the baseline hazard rate. The number of intervals is unknown; it is estimated with a lasso-type penalty given by a sequential double exponential prior. Estimation and inference are done with the aid of MCMC. To simplify the computations, we use Laplace's approximation to estimate some constants and to propose parameter updates within MCMC. We illustrate the methodology with an application to cancer survival.

Seminar organized by the Centro para el Descubrimiento de Estructuras en Datos Complejos (MiDaS).
http://midas.mat.uc.cl
2018-09-07
12:00hrs.
Luis Gutierrez. Departamento de Estadística, Pontificia Universidad Católica de Chile
A Bayesian nonparametric multiple testing procedure for comparing several treatments against a control
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
We propose a Bayesian nonparametric strategy to test for differences between a control group and several treatment regimes. Most of the existing tests for this type of comparison are based on the differences between location parameters. In contrast, our approach identifies differences across the entire distribution, avoids strong modeling assumptions over the distributions for each treatment, and accounts for multiple testing through the prior distribution on the space of hypotheses. The proposal is compared to other commonly used hypothesis testing procedures under simulated scenarios. A real application is also analyzed with the proposed methodology.

Seminar organized by the Centro para el Descubrimiento de Estructuras en Datos Complejos (MiDaS).
http://midas.mat.uc.cl
2018-08-31
12:00hrs.
Claudia Wehrhahn. University of California, Santa Cruz
A Bayesian approach to Disease Clustering using restricted Chinese restaurant processes
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
Disease clustering models, whose goal is to detect clusters of regions with unusually high incidence, are of importance in epidemiology and public health. We describe a restricted Chinese restaurant process that constrains clusters to be formed of contiguous regions. The model is illustrated using synthetic data sets and in an application to oral cancer in Germany. The performance of the model is compared to the disease clustering model proposed by Knorr-Held and Rasser [Biometrics, 1, 56 (2000)].

Seminar organized by the Centro para el Descubrimiento de Estructuras en Datos Complejos (MiDaS).
http://midas.mat.uc.cl
2018-08-24
12:00hrs.
Garritt Page. Department of Statistics, Brigham Young University
Temporal and Spatio-Temporal Random Partition Models
Auditorio Ninoslav Bralic, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
Data that are spatially referenced often represent an instantaneous point in time at which the spatial process is measured. Because of this, it is becoming more common to monitor spatial processes over time. We propose capturing the temporal evolution of dependent structures by jointly modeling a sequence of partitions indexed by time. We derive a few characteristics of the joint model and show how it impacts dependence at the observation level. Computation strategies are detailed, and the method is applied to Chilean standardized testing scores.

Seminar organized by the Centro para el Descubrimiento de Estructuras en Datos Complejos (http://bnp.mat.uc.cl).

2018-08-17
12:00hrs.
Evan Ray. Mount Holyoke College
Ensemble Forecasts of Infectious Disease
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
Real-time forecasts of measures of the timing and severity of the spread of infectious disease are important inputs to public policy officials planning interventions designed to slow or stop the spread of the disease. A wide variety of models have been developed to generate these forecasts, using different data sources and model structures. In general, no single modeling approach always outperforms all other approaches. In recent forecasting competitions focusing on influenza in the United States, models with very different structures and using different data sources have performed at or near the top of the rankings. Here we describe two recent experiments using ensemble approaches to combine the forecasts from several different component models. We show that these ensembles have improved performance relative to the individual component models, in terms of having better performance on average and more consistent performance across multiple seasons.

Seminar organized by the Centro para el Descubrimiento de Estructuras en Datos Complejos (http://bnp.mat.uc.cl).

2018-08-09
16:00hrs.
Victor Hugo Lachos. Department of Statistics, University of Connecticut
Censored Regression Models for Complex Data
Sala 3, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
Measurement data can be subject to upper and/or lower detection limits because of the limitations of the experimental apparatus. A complication arises when these continuous measures present heavy-tailed behavior, because inference can be seriously affected by the misspecification of their parametric distribution. For such data structures, we discuss some useful models and strategies for robust estimation. The practical utility of the proposed methods is exemplified using real datasets.
2018-08-08
16:00hrs.
Mariangela Guidolin. Dipartimento di Scienze Statistiche, Università di Padova
On inverse product cannibalisation: A new Lotka-Volterra model for asymmetric competition in the ICTs
Sala 2, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
Product cannibalisation is a well-known phenomenon in marketing and technological research, describing the case when a new product steals sales from another product under the same brand. An extremely special case of cannibalisation may occur when the older product reacts to the competitive strength of the newer one, absorbing the corresponding market shares. Given its special character, we call this phenomenon inverse product cannibalisation. We suppose that a case of inverse cannibalisation is observed between two products of Apple Inc., the iPhone and the more recent iPad, and that the first has been able to succeed at the expense of the second. To explore this hypothesis from a diffusion of innovations perspective, we propose a modified Lotka-Volterra model for mean trajectories in asymmetric competition, allowing us to test the presence and extent of the inverse cannibalisation phenomenon. A SARMAX refinement integrates the short-term predictions with seasonal and autodependent components. A non-dimensional representation of the proposed model shows that the penetration of the second technology has been beneficial for the first, both in terms of the market size and life cycle length.
2018-08-06
16:00hrs.
Bruno Scarpa. Dipartimento di Scienze Statistiche, Università di Padova
Bayesian modelling of networks in business intelligence problems
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
Complex network data problems are increasingly common in many fields of application. Our motivation is drawn from strategic marketing studies monitoring customer choices of specific products, along with co-subscription networks encoding multiple purchasing behavior.

Data are available for several agencies within the same insurance company, and our goal is to efficiently exploit co-subscription networks to inform targeted advertising of cross-sell strategies to currently mono-product customers. We address this goal by developing a Bayesian hierarchical model, which clusters agencies according to common mono-product customer choices and co-subscription networks. Within each cluster, we efficiently model customer behavior via a cluster-dependent mixture of latent eigenmodels. This formulation provides key information on mono-product customer choices and multiple purchasing behavior within each cluster, informing targeted cross-sell strategies. We develop simple algorithms for tractable inference, and assess performance in simulations and an application to business intelligence.
2018-06-08
12:00hrs.
Fabio Lopes. Instituto de Estadística, Pontificia Universidad Católica de Valparaíso
Extensions of the birth-and-assassination process and their applications
Sala 1, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
In this talk, we introduce some extensions of the birth-and-assassination (BA) process. The BA process is a variant of the continuous-time branching process which was introduced by Aldous and Krebs. In this model, each individual reproduces independently at rate $\lambda$ throughout its lifetime, but it is not at risk of dying until its parent's death. A typical realization of this process resembles a finite collection of "clans", where only the current leaders of clans can be killed. This model behaves differently from the classical branching process, and has found interesting applications in queueing theory and the spread of rumors.
 
The extensions we will introduce can be seen as versions of the BA process with infinitely many types and mutations. We will illustrate them with some applications related to immunology.

This is joint work with C. Grejo (USP), F. Machado (USP) and A. Roldán-Correa (U. Antioquia).
2018-06-01
12:00hrs.
Christian Caamaño Carrillo. Departamento de Estadística, Universidad del Bío-Bío
Modeling and estimation of some non Gaussian random fields
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:

In this work, we propose two types of models for the analysis of regression and dependence of positive and continuous spatio-temporal data, and of continuous spatio-temporal data with possible asymmetry and/or heavy tails. For the first case, we propose two (possibly non-stationary) random fields with Gamma and Weibull marginals. Both random fields are obtained by transforming a rescaled sum of independent copies of squared Gaussian random fields. For the second case, we propose a random field with t marginal distribution. We then consider two possible generalizations allowing for possible asymmetry. In the first approach, we obtain a skew-t random field by mixing a skew Gaussian random field with an inverse square root Gamma random field. In the second approach, we obtain a two-piece t random field by mixing a specific binary discrete random field with a half-t random field.

We study the associated second-order properties and, in the stationary case, the geometrical properties. Since maximum likelihood estimation is computationally infeasible even for relatively small data sets, we propose the use of the pairwise likelihood. The effectiveness of our proposal for the Gamma and Weibull cases is illustrated through a simulation study and a re-analysis of the Irish wind speed data (Haslett and Raftery, 1989), without any prior transformation of the data as in previous statistical analyses. For the t and asymmetric t cases, we present a simulation study to show the performance of our method.

2018-05-25
12:00hrs.
Moreno Bevilacqua. Instituto de Estadística, Universidad de Valparaíso
Estimation and prediction of Gaussian random field using generalized Wendland covariance functions under fixed domain asymptotics
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
Some results on the estimation and prediction of Gaussian random fields with covariance models belonging to the generalized Wendland class, under fixed domain asymptotics, are presented. As in the Matérn case, this class allows for a continuous parameterization of the smoothness of the underlying Gaussian random field, while additionally being compactly supported.

We first study the equivalence of two Gaussian measures with Matérn and generalized Wendland covariance models. Then we give strong consistency and the asymptotic distribution of the maximum likelihood estimator of the microergodic parameter associated with the generalized Wendland covariance model, under fixed domain asymptotics. Finally, we give some results on the (misspecified) best linear unbiased predictor under fixed domain asymptotics, when the true model is Matérn and the misspecified model is generalized Wendland.
2018-05-18
12:00hrs.
Orietta Nicolis. Instituto de Estadística, Universidad de Valparaíso
Spatio-temporal models for environmental risk assessment
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
The main objective of this work is the use of spatio-temporal statistical models to build environmental risk maps for natural and anthropogenic disasters, in order to improve the assessment, prevention and mitigation of their impacts. To this end, the spatial and temporal variability of georeferenced points and observations is analyzed, the dependence on exogenous variables is studied, and risk maps are produced. Although all data describing environmental phenomena (such as earthquakes, forest fires, avalanches, air pollution, etc.) can be characterized by temporal and spatial variability, different assumptions must be made for a correct definition of the model. Some case studies will be shown using the Chilean earthquake catalogue and air pollution data from Santiago de Chile.
2018-04-06
12:00hrs.
Cristian Meza. CIMFAV, Facultad de Ingeniería, Universidad de Valparaíso
A Bayesian approach for the segmentation of series with a functional effect
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
In some application fields, series are affected by two different types of effects: abrupt changes (or change-points) and functional effects. We propose a Bayesian approach that allows us to estimate these two parts. The underlying piecewise-constant part (associated with the abrupt changes) is expressed as the product of a lower triangular matrix and a sparse vector, and the functional part as a linear combination of functions from a large dictionary, from which we want to select the relevant ones. The problem thus becomes one of global sparse estimation, and a Stochastic Search Variable Selection approach is used to this end. The performance of our proposed method is assessed using simulation experiments. Applications to three real datasets from geodesy, agronomy and economics are also presented.
2017-12-06
12:00hrs.
Guillermina Eslava. Departamento de Matemáticas, Facultad de Ciencias, UNAM
Log-linear models, logistic regression and Bayesian networks: their relationship and application
Sala 1, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:

Log-linear models are statistical models useful for explaining the interrelationship among a set of discrete random variables $X = (X_1, \ldots, X_p)$. The model is expressed through the logarithm of the joint probability function, $\log(P(X_1 = x_1,\ldots,X_p = x_p))$. If one of these variables, say $X_1 \in \{0, 1\}$, is taken as the response variable and the rest as explanatory variables, we can consider a logistic regression model for the log-odds $\log(P(X_1 = 1|X_2, \ldots, X_p)/P(X_1 = 0|X_2,\ldots, X_p))$.

These two models are closely related in the sense that each of them can be used to express the conditional probability of $X_1$ given the remaining variables, $P(X_1 = 1|X_2 = x_2,\ldots, X_p = x_p)$. In general, the conditional probability $P(X_1 = 1|X_2 = x_2,\ldots,X_p = x_p)$ derived from the log-linear model and the one derived from the logistic regression are not equal.

In this talk we give the condition under which this conditional probability is the same under a hierarchical log-linear model and under a logistic regression. An application example is presented that illustrates the usefulness of the models and the relationship between them. Additionally, we present the same examples from the perspective of probabilistic directed graphical models, also known as Bayesian networks.

2017-12-04
12:00hrs.
Mogens Bladt. Institute for Applied Mathematics and Systems, UNAM, México
Fitting phase-type scale mixtures to heavy-tailed data and distributions
Sala 1, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:

If X has a phase-type distribution and N is any positive discrete random variable, then we say that the distribution of X · N belongs to the class of NPH distributions. Such distributions preserve the tractability and generality of phase-type distributions (often allowing for explicit solutions to stochastic models and being dense in the class of distributions on the positive reals) but with a different tail behaviour, which is basically dictated by the tail of N. We thereby gain a tool for specifying distributions with a “body” shaped by X and with a tail defined by N. After reviewing the construction and basic properties of distributions from the NPH class, we will consider the problem of their estimation. To this end we will employ the EM algorithm, using a method similar to that for finite-dimensional phase-type distributions. We consider the fitting of an NPH distribution to observed data, (left-, right- and interval-) censored data, theoretical distributions and histograms, and present a couple of examples.

2017-11-24
12:00hrs.
Juan F. Olivares. Universidad de Atacama
Inference and bias reduction in the normal and elliptical structural model
Sala 2, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:

In this work we study elliptical structural models under the approach in which the errors are independent and have an elliptical marginal distribution. We adopt an approach studied by Gleser (1992), for which we obtain an alternative and convenient representation of the usual form of the log-likelihood function. This representation involves a new parameterization that yields a more parsimonious model, transforming the classical measurement error model into a regression model with the usual random design matrix, but heteroscedastic. Moreover, this parameterization has the advantage of establishing a useful connection between the usual multivariate regression models and multivariate measurement error models. The identification problems present in normal structural models also arise in elliptical structural models. Therefore, additional assumptions are needed to make the estimation problem feasible. These assumptions can be regarded as standard in the measurement error literature. We also determine the score vector and the expected information matrix for the reparameterized model. Simple expressions are given for computing the elements of the information matrix, in which only some univariate moments must be computed numerically. An illustration of the elliptical distributions is considered. Finally, since maximum likelihood estimators are biased in the context of measurement error models, we implement two algorithms to correct the bias of the estimates. In particular, we consider the bias reduction method proposed by Firth (1993), which has the advantage of not depending directly on the existence of the maximum likelihood estimators.

2017-11-10
12:00hrs.
Daniel Taylor Rodriguez. Portland State University
Intrinsic Bayesian analysis for occupancy models
Sala 2, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
Occupancy models are used to determine the probability that a species is present at a location while controlling for the effect of imperfect detection. The data collected for this type of analysis include multiple predictors intended to characterize habitat suitability as well as the ease with which the species of interest can be detected. However, not all of the collected predictors turn out to be relevant, so it is necessary to identify the variables that carry some explanatory value. The usual practice is to use the Akaike information criterion (AIC). Given the absence of alternatives adapted to this type of response, we propose the first noninformative Bayesian strategy. We build priors on the parameter space based on the methodology of intrinsic priors, and incorporate priors on the model space that control for multiple testing and respect the hierarchical order of the predictors when interactions and second-order (or higher) terms are considered. The method adequately controls the inclusion of false positives without compromising its ability to identify relevant predictors. We validate the performance of the methodology through simulations and using real data.
2017-11-03
12:00hrs.
Brajendra Sutradhar. Carleton University, Ottawa, Canada and Memorial University, St. John's, Canada
Semi-parametric Models for Longitudinal Count, Binary and Multinomial Data
Sala 2, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:

In this talk, I will demonstrate the use of dynamic models to fit (1) longitudinal count responses such as the repeated number of yearly physician visits by an individual; (2) longitudinal binary responses such as the repeated asthma status (yes or no) of an individual over several months; and (3) longitudinal multinomial responses such as the repeated stress levels (low, medium, high) of an individual worker over a period of a few years. These dynamic models are developed to accommodate the correlations of the repeated responses and then to find the regression effects of certain primary covariates on the repeated responses. In some situations, it may be necessary to add a non-parametric function in certain secondary covariates to the regression function. The extended models are referred to as longitudinal semi-parametric models. Estimation theory for the model parameters and the non-parametric functions will be discussed. The models and the inference methodologies will also be illustrated by numerical examples.

2017-10-23
12:00hrs.
Claudia Wehrhahn. University of California, Santa Cruz
A Bayesian non parametric approach for human mobility modeling
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
Mobility models are widely used in disciplines as diverse as engineering, computer science, sociology, and ecology. For instance, human mobility models are a key tool in the design and evaluation of wireless network protocols. There is a vast literature on human mobility models, but most proposed models rely on relatively simple assumptions about human behavior. In this talk, we present a Bayesian non parametric model for human mobility. First, we discuss a non-homogeneous time-dependent Poisson process in which the intensity function is modeled using multivariate Bernstein polynomials. Then we discuss how to incorporate human interactions into the model by means of repulsive Matérn point processes. The performance of the model is illustrated on simulated and real traces for groups of individuals collected by GPS.
2017-10-06
12:00hrs.
Tamara Broderick. Massachusetts Institute of Technology
Fast Quantification of Uncertainty and Robustness with Variational Bayes
Sala 2, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
In Bayesian analysis, the posterior follows from the data and a choice of a prior and a likelihood. These choices may be somewhat subjective and reasonably vary over some range. Thus, we wish to measure the sensitivity of posterior estimates to variation in these choices. While the field of robust Bayes has been formed to address this problem, its tools are not commonly used in practice. We demonstrate that variational Bayes (VB) techniques are readily amenable to fast robustness analysis. Since VB casts posterior inference as an optimization problem, its methodology is built on the ability to calculate derivatives of posterior quantities with respect to model parameters. We use this insight to develop local prior robustness measures for mean-field variational Bayes (MFVB), a particularly popular form of VB due to its fast runtime on large data sets. A potential problem with MFVB is that it has a well-known major failing: it can severely underestimate uncertainty and provides no information about covariance. We generalize linear response methods from statistical physics to deliver accurate uncertainty estimates for MFVB---both for individual variables and coherently across variables. We call our method linear response variational Bayes (LRVB).