
2013 | Book

Advances in Theoretical and Applied Statistics

Editors: Nicola Torelli, Fortunato Pesarin, Avner Bar-Hen

Publisher: Springer Berlin Heidelberg

Book Series: Studies in Theoretical and Applied Statistics


About this book

This volume includes contributions selected after a double blind review process and presented as a preliminary version at the 45th Meeting of the Italian Statistical Society. The papers provide significant and innovative original contributions and cover a broad range of topics including: statistical theory; methods for time series and spatial data; statistical modeling and data analysis; survey methodology and official statistics; analysis of social, demographic and health data; and economic statistics and econometrics.

Table of Contents

Frontmatter

Statistical Theory

Frontmatter
1. Default Priors Based on Pseudo-Likelihoods for the Poisson-GPD Model

Extreme values are usually modeled with the peaks-over-threshold approach by means of the Poisson-Generalized Pareto Distribution (Poisson-GPD). This model is governed by three parameters: the Poisson rate and the scale and shape of the GPD. The quantity of interest in many applications is the mean return level, which is a function of the Poisson-GPD parameters. Moreover, the shape parameter of the GPD is itself of interest, in order to gain more insight into the underlying extremal process. For a suitable orthogonal parametrization, we derive matching priors for the shape, scale, and Poisson rate parameters based on an approximate conditional pseudo-likelihood. The formal rule used here to obtain such priors in some cases leads to the same priors obtained with Jeffreys’ and reference procedures. Moreover, we provide a formal proof that the marginal priors for the shape and scale parameters are each second-order matching priors. We estimate the coverages of the corresponding posterior credible intervals and apply our approach to an example from hydrology.
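The mean return level has a well-known closed form under the Poisson-GPD model; a minimal sketch follows (the threshold, scale, shape, and rate values used in the test are illustrative assumptions, not figures from the chapter):

```python
import math

def return_level(u, sigma, xi, lam, m):
    """m-period return level under a Poisson-GPD (POT) model.

    u     : threshold
    sigma : GPD scale parameter
    xi    : GPD shape parameter
    lam   : Poisson rate of threshold exceedances per period
    m     : return period (in the same units as lam)
    """
    if abs(xi) < 1e-12:
        # limiting (Gumbel-type) form as xi -> 0
        return u + sigma * math.log(lam * m)
    return u + (sigma / xi) * ((lam * m) ** xi - 1.0)
```

Orthogonality of the parametrization matters for the priors, but the return level itself only requires the three parameters above.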

Stefano Cabras
2. Robustness, Dispersion, and Local Functions in Data Depth

Data depth is a rapidly growing area in nonparametric statistics, especially suited for the analysis of multidimensional data. This chapter covers influence functions and robustness, depth-based dispersion measures, and a generalization of the basic notion of depth function, called local depth, able to deal with multimodal data.
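In the univariate case the basic notion of depth reduces to a one-line computation; a sketch of halfspace (Tukey) depth, whose multivariate robustness properties are the kind studied in the chapter:

```python
def tukey_depth_1d(x, sample):
    """Univariate halfspace (Tukey) depth of point x with respect to a
    sample: the smaller of the two tail proportions at x.  The sample
    median attains maximal depth; outlying points get depth near 0."""
    n = len(sample)
    left = sum(1 for v in sample if v <= x)
    right = sum(1 for v in sample if v >= x)
    return min(left, right) / n
```

Local depth, as discussed in the chapter, restricts such computations to neighbourhoods so that each mode of a multimodal sample gets its own depth peak.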

Mario Romanazzi, Claudio Agostinelli
3. New Distribution-Free Plotting Position Through an Approximation to the Beta Median

Even with modern software, graphical techniques are often used to visualize data and to determine the actual underlying distribution. In this chapter, a new and better distribution-free formula is obtained via an axiomatic approach leading to a new approximation to the median of the Beta distribution. A comparative study, carried out also by means of extensive Monte Carlo simulation, shows the advantages of the new solution, especially in estimating the median return period, for each considered sample size (N = 5, 15, 30, 50).
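For context, the classical median-based plotting positions approximate the median of Beta(i, n − i + 1), the distribution of the i-th uniform order statistic; a sketch of Filliben's well-known version (illustrative only, not the new formula proposed in the chapter):

```python
def filliben_positions(n):
    """Classical median-based plotting positions (Filliben, 1975):
    approximations to the median of Beta(i, n - i + 1).
    The two endpoints use the exact medians of the extremes."""
    pos = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    pos[0] = 1 - 0.5 ** (1 / n)   # exact median of the minimum
    pos[-1] = 0.5 ** (1 / n)      # exact median of the maximum
    return pos
```

Any improved Beta-median approximation slots into the same role: the i-th position is plotted against the i-th order statistic of the sample.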

Pasquale Erto, Antonio Lepore
4. On Gaussian Compound Poisson Type Limiting Likelihood Ratio Process

Different change-point type models encountered in statistical inference for stochastic processes give rise to different limiting likelihood ratio processes. Recently it was established that one of these likelihood ratios, which is an exponential functional of a two-sided Poisson process driven by some parameter, can be approximated (for sufficiently small values of the parameter) by another one, which is an exponential functional of a two-sided Brownian motion. In this chapter we consider yet another likelihood ratio, which is an exponential functional of a two-sided compound Poisson process driven by some parameter. We establish that the compound Poisson type likelihood ratio can also be approximated by the Brownian type one for sufficiently small values of the parameter. We also discuss the asymptotics for large values of the parameter.

Sergueï Dachian, Ilia Negri
5. Archetypal Symbolic Objects

Symbolic Data Analysis has represented an important innovation in statistics since its first presentation by E. Diday in the late 1980s. Most of the interest has been in the statistical analysis of Symbolic Data, which represent complex data structures where variables can assume more than a single value. Symbolic Data thus allow one to describe classes of statistical units as a whole. Furthermore, other entities can be defined in the realm of Symbolic Data: Symbolic Objects (SOs), defined in terms of the relationships between two different knowledge levels. This article introduces a new type of SO based on archetypal analysis.

M. R. D’Esposito, F. Palumbo, G. Ragozini
6. Lorenz Zonoids and Dependence Measures: A Proposal

Recently, the analysis of ordered and unordered categorical variables has assumed a relevant role, especially with regard to the evaluation of customer satisfaction and of health and educational effectiveness. In such contexts, the study of dependence relations among the variables involved is an attractive research field. However, the categorical nature of the variables does not always allow the application of existing standard dependence measures, since categorical data are not specified on a metric scale; such methods are more appropriate in a purely quantitative setting, because they are based on the Euclidean distance. We aim to overcome these restrictions by extending the study of dependence in a quali-quantitative perspective. The idea is to employ specific statistical tools, namely Lorenz curves and the so-called Lorenz zonoids. A novel Lorenz zonoid-based relative dependence measure is proposed as an alternative to the partial correlation coefficient for establishing the contribution of each categorical covariate in a multiple linear regression model.
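The Lorenz curve underlying these zonoids is computed directly from the data; a minimal sketch of the empirical curve (the zonoid lifts this one-dimensional construction to higher dimensions):

```python
def lorenz_points(values):
    """Points of the empirical Lorenz curve: cumulative population share
    against cumulative share of the (sorted) variable, starting at (0, 0)."""
    xs = sorted(values)
    total, n = sum(xs), len(xs)
    pts, cum = [(0.0, 0.0)], 0.0
    for i, v in enumerate(xs, 1):
        cum += v
        pts.append((i / n, cum / total))
    return pts
```

With perfect equality the curve coincides with the diagonal; the further it bows below the diagonal, the stronger the concentration, which is the geometric information the zonoid-based measure exploits.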

Emanuela Raffinetti, Paolo Giudici
7. Algebraic Generation of Orthogonal Fractional Factorial Designs

The joint use of counting functions, Hilbert bases, and Markov bases allows one to define a procedure generating all the fractional factorial designs that satisfy a given set of orthogonality constraints [Fontana, Pistone and Rogantin (JSPI, 2000), Pistone and Rogantin (JSPI, 2008)]. The general case of mixed-level designs, without restrictions on the number of levels of each factor (such as powers of a prime number), is studied. The generation problem is reduced to finding the positive integer solutions of a linear system of equations [e.g., Carlini and Pistone (JSTP, 2007)]. This new methodology has been tested on some significant classes of fractional factorial designs, including mixed-level orthogonal arrays and sudoku designs [Fontana and Rogantin in Algebraic and Geometric Methods in Statistics, CUP (2009)]. For smaller cases the complete generating set of all the solutions can be computed; for larger cases we resort to the random generation of a sample solution.

Roberto Fontana, Giovanni Pistone

Methods for Time Series, Spatial and Functional Data

Frontmatter
8. A Functional Spatio-Temporal Model for Geometric Shape Analysis

In this chapter we consider a functional spatio-temporal model for shape objects represented by landmark data. The model describes a time-varying deformation of the ambient space in which the objects of interest lie. The use of basis functions, defined by principal warps in space and time, facilitates both the model specification and the fitting of the data in Procrustes tangent coordinates. The fitted model can be interpreted either just in terms of the finite set of landmarks at the given set of time points, or in terms of a deformation of the space which varies continuously in time. The method is illustrated on a facial expression dataset.

Lara Fontanella, Luigi Ippoliti, Pasquale Valentini
9. Vector Threshold Moving Average Models: Model Specification and Invertibility

In this chapter we propose a class of nonlinear time series models in which the underlying process has a threshold structure and each regime follows a vector moving average model. We call this class of processes Threshold Vector Moving Average. The stochastic structure is presented, and alternative model specifications are proposed. The invertibility of the model is discussed in detail and, in this context, empirical examples show some features that distinguish the stochastic structure under analysis from other linear and nonlinear time series models widely investigated in the literature.

Marcella Niglio, Cosimo Damiano Vitale
10. A Regionalization Method for Spatial Functional Data Based on Variogram Models: An Application on Environmental Data

This chapter proposes a Dynamic Clustering Algorithm (DCA) as a new regionalization method for spatial functional data. The method looks for the best partition optimizing a criterion of spatial association among functional data and, at the same time, uncovers a summary of the variability structure of each cluster. The performance of the proposal is assessed through an application to real data.

Elvira Romano, Antonio Balzanella, Rosanna Verde
11. Spectral Decomposition of the AR Metric

This work investigates a spectral decomposition of the AR metric, proposed as a measure of structural dissimilarity among ARIMA processes. Specifically, the metric is related to the variance of a stationary process, so that its behaviour in the frequency domain helps to detect how unobserved components generated by the parameters of both phenomena concur in determining the obtained distance. The foundations of the metric are briefly recalled, and the main consequences of the proposed decomposition are discussed with special reference to some specific stochastic processes, in order to improve the interpretative content of the AR metric.

Maria Iannario, Domenico Piccolo
12. Nonlinear Nonstationary Model Building by Genetic Algorithms

Many time series exhibit both nonlinearity and nonstationarity. Though both features have often been taken into account separately, few attempts have been made to model them simultaneously. We consider threshold models and present a general model allowing for several different regimes both in time and in levels, where regime transitions may happen according to self-exciting, smoothly varying, or piecewise linear threshold mechanisms. Since fitting such a model involves the choice of a large number of structural parameters, we propose a procedure based on genetic algorithms, evaluating models by means of a generalized identification criterion. The proposed model building strategy is applied to a financial index.

Francesco Battaglia, Mattheos K. Protopapas
13. Using the Autodependogram in Model Diagnostic Checking

In this chapter the autodependogram is contextualized in model diagnostic checking for nonlinear models by studying the lag-dependencies of the residuals. Simulations are considered to evaluate its effectiveness in this context. An application to the Swiss Market Index is also provided.

Luca Bagnato, Antonio Punzo

Statistical Modelling and Data Analysis

Frontmatter
14. Refined Estimation of a Light Tail: An Application to Environmental Data

In this chapter, we consider a recent class of generalized negative moment estimators of a negative extreme value index, the primary parameter in statistics of extremes. Apart from the usual integer parameter k, related to the number of top order statistics involved in the estimation, these estimators depend on an extra real parameter θ, which makes them highly flexible and possibly second-order unbiased for a large variety of models. We are interested not only in the adaptive choice of the tuning parameters k and θ, but also in an application of these semi-parametric estimators to the analysis of sets of environmental and simulated data.

M. Ivette Gomes, Lígia Henriques-Rodrigues, Frederico Caeiro
15. Model-Based Classification of Clustered Binary Data with Non-ignorable Missing Values

A hierarchical logistic regression model with nested, discrete random effects is proposed for the unsupervised classification of clustered binary data with non-ignorable missing values. An EM algorithm is proposed that essentially reduces to the iterative estimation of a set of weighted logistic regressions from two augmented datasets, alternated with weight updating. The proposed approach is applied to a sample of Chinese older adults, to cluster subjects according to their cognitive impairment and their ability to cope with a Mini-Mental State Examination questionnaire.

Francesco Lagona
16. A Model for Correlated Paired Comparison Data

Paired comparison data arise when objects are compared in couples. This type of data occurs in many applications. Traditional models developed for the analysis of paired comparison data assume independence among all observations, but this seems unrealistic because comparisons with a common object are naturally correlated. A model that introduces correlation between comparisons with at least a common object is discussed. The likelihood function of the proposed model involves the approximation of a high dimensional integral. To overcome numerical difficulties a pairwise likelihood approach is adopted. The methodology is illustrated through the analysis of the 2006/2007 Italian men’s volleyball tournament and the 2008/2009 season of the Italian water polo league.
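The classical independence baseline that the chapter relaxes is the Bradley-Terry model, in which each comparison depends only on the worth parameters of the two objects; a minimal sketch (parameter values in the test are illustrative):

```python
import math

def bt_prob(beta_i, beta_j):
    """Bradley-Terry model: probability that object i beats object j,
    given log-worth parameters beta_i and beta_j.  Under this baseline
    all comparisons are independent; the chapter's model instead
    correlates comparisons sharing a common object."""
    return 1.0 / (1.0 + math.exp(beta_j - beta_i))
```

Equal worths give a fair coin, and the two orderings of any pair sum to one, which is what makes the comparison probabilities depend only on worth differences.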

Manuela Cattelan, Cristiano Varin
17. Closed Skew Normal Stochastic Frontier Models for Panel Data

We introduce a stochastic frontier model for longitudinal data where a subject random effect coexists with a time independent random inefficiency component and with a time dependent random inefficiency component. The role of the closed skew normal distribution in this kind of modeling is stressed.

Roberto Colombi
18. How Far Can Man Go?

In this chapter we address the question: what is the largest jump within man’s reach, given today’s state of the art? To answer it we use the best of the best, i.e., data collected on the best “jumpers” from World Athletics Competitions—the “Long Jump Men Outdoors” event. Our approach to the problem is based on the probability theory of extreme values (EVT) and the corresponding statistical techniques. We use only the top performances of the world top lists. Our estimated ultimate record, i.e., the right endpoint of the distribution of jump lengths, tells us what can be inferred about the possible personal best mark, given today’s knowledge, sports conditions and rules.

Isabel Fraga Alves, Laurens de Haan, Cláudia Neves
19. Joint Modeling of Longitudinal and Time-to-Event Data: Challenges and Future Directions

In longitudinal studies measurements are often collected on different types of outcomes for each subject. These may include several longitudinally measured responses (such as blood values relevant to the medical condition under study) and the time at which an event of particular interest occurs (e.g., death, development of a disease or dropout from the study). These outcomes are often separately analyzed; however, in many instances, a joint modeling approach is either required or may produce a better insight into the mechanisms that underlie the phenomenon under study. In this chapter we provide a general overview of the joint modeling framework, discuss its main features, and we refer to future directions.

Dimitris Rizopoulos
20. A Class of Linear Regression Models for Imprecise Random Elements

The linear regression problem of a fuzzy response variable on a set of real and/or fuzzy explanatory variables is investigated. The notion of LR fuzzy random variable is introduced in this connection, leading to the probabilization of the center and the left and right spreads of the response variable. A specific metric is suggested for coping with this type of variable. A class of linear regression models is then proposed for the center and for suitable transforms of the spreads, in order to satisfy the nonnegativity conditions for the latter. A least squares solution for estimating the parameters of the models is derived, along with a goodness-of-fit measure and an associated hypothesis testing procedure. Finally, the results of a real-life application are illustrated.

Renato Coppi, Maria Brigida Ferraro, Paolo Giordani
21. A Model-Based Dimension Reduction Approach to Classification of Gene Expression Data

The monitoring of the expression profiles of thousands of genes has proved particularly promising for biological classification, particularly for cancer diagnosis. However, microarray data present major challenges due to their complex, multiclass nature and the overwhelming number of variables characterizing gene expression profiles. We introduce a methodology that combines a dimension reduction method with classification based on finite mixtures of Gaussian densities. Information on the dimension reduction subspace is based on the variation of component means for each class, which in turn are obtained by modeling the within-class distribution of the predictors through finite mixtures of Gaussian densities. The proposed approach is applied to the leukemia data, a well-known dataset in the microarray literature. We show that the combination of dimension reduction and model-based clustering is a powerful technique for finding groups among gene expression data.

Luca Scrucca, Avner Bar-Hen
22. Exploiting Multivariate Outcomes in Bayesian Inference for Causal Effects with Noncompliance

A Bayesian approach to causal inference in the presence of noncompliance to assigned randomized treatment is considered. It exploits multivariate outcomes for improving estimation of weakly identified models, when the usually invoked exclusion restriction assumptions are relaxed. Using artificial data sets, we analyze the properties of the posterior distribution of causal estimands to evaluate the potential gains of jointly modeling more than one outcome. The approach can be used to assess robustness with respect to deviations from structural identifying assumptions. It can also be extended to the analysis of observational studies with instrumental variables where exclusion restriction assumptions are usually questionable.

Alessandra Mattei, Fabrizia Mealli, Barbara Pacini
23. Fuzzy Composite Indicators: An Application for Measuring Customer Satisfaction

Composite indicators should ideally measure multidimensional concepts which cannot be captured by a single variable. In this chapter, we suggest a method based on fuzzy set theory for the construction of a fuzzy synthetic index of a latent phenomenon (e.g., well-being, quality of life, etc.), using a set of manifest variables measured on different scales (quantitative, ordinal and binary). A few criteria for assigning values to the membership function are discussed, as well as criteria for defining the weights of the variables. For ordinal variables, we propose a fuzzy quantification method based on the sampling cumulative function and a weighting system which takes into account the relative frequency of each category. An application regarding the results of a survey on the users of a contact center is presented.
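For ordinal variables, one common quantification based on the sampling cumulative function (in the spirit of the "totally fuzzy and relative" approach of Cheli and Lemmi; the chapter's own weighting scheme differs) rescales the empirical CDF so the lowest category gets membership 0 and the highest gets 1:

```python
from collections import Counter

def fuzzy_memberships(ordinal_sample, categories):
    """Fuzzy membership values for ordered categories, based on the
    sampling cumulative function F:
        mu(c) = (F(c) - F(c_min)) / (1 - F(c_min)).
    categories must list the levels in increasing order; assumes the
    lowest category is not the only one observed."""
    n = len(ordinal_sample)
    counts = Counter(ordinal_sample)
    F, cum = {}, 0
    for c in categories:
        cum += counts.get(c, 0)
        F[c] = cum / n
    f1 = F[categories[0]]
    return {c: (F[c] - f1) / (1 - f1) for c in categories}
```

Category frequencies thus drive the spacing of membership values, so rare high categories contribute larger jumps, which is the idea behind frequency-based weighting.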

Sergio Zani, Maria Adele Milioli, Isabella Morlini

Survey Methodology and Official Statistics

Frontmatter
24. Symmetric Association Measures in Effect-Control Sampling

In order to measure the association between an exposure variable X and an outcome variable Y, we introduce the effect-control sampling design and consider the family of symmetric association measures. Focusing on the case of binary exposure and outcome variables, a general estimator of such measures is proposed and its asymptotic properties are discussed. We define an allocation procedure for a stratified effect-control design which is optimal in terms of the variance of this estimator. Finally, small-sample behavior is investigated by Monte Carlo simulation for a measure belonging to the family which we believe to be particularly interesting, as it possesses the appealing property of being normalized.
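A classical example of a symmetric association measure for a 2×2 exposure-outcome table that is also normalized is Yule's Q (shown here for orientation; the chapter studies a general family and does not necessarily single out this member):

```python
def yule_q(a, b, c, d):
    """Yule's Q for the 2x2 table [[a, b], [c, d]] of exposure by outcome:
    Q = (ad - bc) / (ad + bc).  Symmetric in the roles of the two
    variables and normalized to the interval [-1, 1]."""
    return (a * d - b * c) / (a * d + b * c)
```

Symmetry here means swapping the roles of exposure and outcome (transposing the table) leaves the measure unchanged.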

Riccardo Borgoni, Donata Marasini, Piero Quatto
25. Spatial Misalignment Models for Small Area Estimation: A Simulation Study

We propose a class of misaligned data models for addressing typical small area estimation (SAE) problems. In particular, we extend hierarchical Bayesian atom-based models for spatial misalignment to the SAE context, enabling the use of auxiliary covariates available on areal partitions non-nested with the small areas of interest, along with survey estimates for planned domains that are also misaligned with these small areas. We model the latent characteristic of interest at the atom level as a Poisson variate with mean arising as the product of population size and incidence. Spatial random effects are introduced using either a CAR model or a process specification. For the latter, incidence is a function of a Gaussian process model for the spatial point pattern over the entire region, and atom counts are obtained by integrating the point process over atoms. In the proposed class of models, benchmarking to large area estimates is automatically satisfied. A simulation study examines the capability of the proposed models to improve on traditional SAE model estimates.

Matilde Trevisani, Alan Gelfand
26. Scrambled Response Models Based on Auxiliary Variables

We discuss the problem of obtaining reliable data on a sensitive quantitative variable without jeopardizing respondent privacy. The information is obtained by asking respondents to perturb the response through a scrambling mechanism. A general device allowing for the use of multi-auxiliary variables is illustrated, as well as a class of estimators for the unknown mean of a sensitive variable. A number of scrambled response models are presented, and others are discussed in terms of the efficiency of the estimates and the privacy guaranteed to respondents.
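One classical scrambling device (multiplicative scrambling; shown for orientation, not as the chapter's specific model) has each respondent report Z = Y·S, where S is drawn from a known scrambling distribution, so that the sensitive mean is recovered as the sample mean of the reports divided by E[S]:

```python
import random

def scrambled_mean_estimate(reports, s_mean):
    """Estimate the mean of a sensitive variable Y from multiplicatively
    scrambled reports Z = Y * S, with S independent of Y and E[S] = s_mean
    known to the analyst but the realized S unknown for each respondent."""
    return sum(reports) / len(reports) / s_mean

# Simulated survey: true mean of Y is 50; scrambler S ~ Uniform(0.5, 1.5),
# so E[S] = 1 (all values here are illustrative assumptions).
random.seed(0)
reports = [random.gauss(50, 10) * random.uniform(0.5, 1.5)
           for _ in range(20000)]
est = scrambled_mean_estimate(reports, 1.0)
```

The individual report reveals neither Y nor S, which is the source of the privacy protection; the price is extra variance in the estimator, the efficiency/privacy trade-off the chapter analyzes.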

Pier Francesco Perri, Giancarlo Diana
27. Using Auxiliary Information and Nonparametric Methods in Weighting Adjustments

Weighting adjustments are commonly used in survey estimation to compensate for unequal selection probabilities, nonresponse, noncoverage, and sampling fluctuations from known population values. Over time many weighting methods have been proposed, mainly in the nonresponse framework. These methods generally make use of auxiliary variables to reduce the bias of the estimators and improve their efficiency. Frequently, a substantial amount of auxiliary information is available and the choice of the auxiliary variables and the way in which they are employed may be significant. Moreover, the efficacy of weighting adjustments is often seen as a bias–variance trade-off. In this chapter, we analyze these aspects of the nonresponse weighting adjustments and investigate the properties of mean estimators adjusted by individual response probabilities estimated through nonparametric methods in situations where multiple covariates are both categorical and continuous.

Emilia Rocco
28. Open Source Integer Linear Programming Solvers for Error Localization in Numerical Data

Error localization problems can be converted into Integer Linear Programming problems. This approach provides several advantages and guarantees finding a set of erroneous fields having minimum total cost. In this setting, each erroneous record produces an Integer Linear Programming model that must be solved, which requires specific solution software called Integer Linear Programming solvers; some of these solvers are available as open source software. A study of the performance of internationally recognized open source Integer Linear Programming solvers, compared to a reference commercial solver, on real-world data having only numerical fields is reported. The aim was to produce a stressing test environment for selecting the most appropriate open source solver for performing error localization in numerical data.

Gianpiero Bianchi, Renato Bruni, Alessandra Reale
29. Integrating Business Surveys
Guidelines and Principles Based on the Belgian Experience

In the past few years the modernization of business surveys has been extensively discussed by several NSIs in Europe. The MEETS program (Modernisation of European Enterprise and Trade Statistics) was launched by EUROSTAT for the years 2008–2013 to encourage the reform process at the European level, in order to identify new areas for business statistics, to enhance the integration of data collection and treatment, and to improve the harmonization of methods and concepts in business statistics. At Statistics Belgium the debate has mainly taken the form of a revision of concepts and methods in business surveys, with the aim of reducing survey costs for the administration and the response burden for enterprises. In the present contribution, the issue of integration of business surveys is tackled with a variable-oriented approach, using classification techniques.

Maria Caterina Bramati

Social Statistics and Demography

Frontmatter
30. A Measure of Poverty Based on the Rasch Model

Asking questions about income is often a hard task: nonresponse rates are high, and the reliability of the answers collected is sometimes poor. In this contribution we propose a measure of economic status based on a set of wealth-related items. The measure is based on the Rasch model, which allows different relevance weights to be estimated for each item: difficulty parameters indicate the severity of each of the situations described by the items, while ability parameters indicate the poverty level of each unit. This poverty measure is computed on survey data collected on a sample of 2,465 households living in Veneto. Analyses conducted on the estimated poverty measure indicate good consistency with the expected relationships, and confirm the possibility of obtaining this kind of estimate from indirect measures.
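The Rasch model's core is a single logistic equation linking a unit's latent level to an item's difficulty; a minimal sketch (the interpretation of the latent trait as poverty severity follows the chapter, but the parameter values in the test are illustrative):

```python
import math

def rasch_prob(theta, beta):
    """Rasch model: probability that a unit with latent level theta
    endorses a 0/1 item with difficulty beta."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def rasch_loglik(theta, betas, responses):
    """Log-likelihood of one unit's 0/1 response pattern, given item
    difficulties betas, under conditional independence of the items."""
    ll = 0.0
    for beta, x in zip(betas, responses):
        p = rasch_prob(theta, beta)
        ll += math.log(p) if x == 1 else math.log(1.0 - p)
    return ll
```

Estimating the difficulty parameters jointly with the unit-level parameters yields exactly the per-item relevance weights described in the abstract.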

Maria Cristiana Martini, Cristiano Vanin
31. Chronic Poverty in European Mediterranean Countries

This chapter investigates the characteristics of chronic poverty among youth in Southern European countries. These countries have the highest levels of poverty in Europe, and welfare systems unable to smooth social inequalities effectively. The main aim of this chapter is to investigate how the socio-demographic and economic characteristics of the long-term poor among Mediterranean youth differ. The intensity of poverty over time is measured by a very recent longitudinal poverty index based on the rationale of cumulative hardship. We test the effects of many covariates on the incidence and intensity of chronic poverty, controlling for the main socio-demographic and economic characteristics, using a ZINB model. The peculiar mix of a lack of effective state institutions and a strong presence of families explains why many factors that are usually significantly related to poverty do not play an important role here. A strong inertial effect is due to the intensity of poverty at the start of the time window. Italian youth show the worst performance.

Daria Mendola, Annalisa Busetta
32. Do Union Formation and Childbearing Improve Subjective Well-being? An Application of Propensity Score Matching to a Bulgarian Panel

The link between childbearing, union formation and subjective well-being is still under-investigated. A key problem is disentangling causal effects, a challenge when the interplay between life course pathways and states of mind is investigated. Here we use propensity score matching estimates applied to panel data to demonstrate how the birth of a first child or entry into union increase individuals’ psychological well-being and reduce disorientation in Bulgaria, a transition country with lowest-low fertility and postponement of union formation. Sensitivity analyses confirm the robustness of our findings to heterogeneous levels of hidden bias.
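The matching step at the heart of this design can be sketched compactly: once propensity scores are available, each treated unit is paired with the closest control and the outcome differences are averaged. A minimal 1:1 nearest-neighbour sketch (in practice the scores come from a fitted model such as a logistic regression; the data here are purely illustrative):

```python
def att_nearest_neighbour(treated, controls):
    """Average treatment effect on the treated (ATT) via 1:1
    nearest-neighbour matching on the propensity score, with replacement.
    treated, controls: lists of (propensity_score, outcome) pairs."""
    diffs = []
    for ps_t, y_t in treated:
        # closest control on the propensity score metric
        ps_c, y_c = min(controls, key=lambda c: abs(c[0] - ps_t))
        diffs.append(y_t - y_c)
    return sum(diffs) / len(diffs)
```

Sensitivity analysis, as used in the chapter, then asks how strong an unobserved confounder would have to be to overturn the matched estimate.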

Emiliano Sironi, Francesco C. Billari
33. Health and Functional Status in Elderly Patients Living in Residential Facilities in Italy

Over the last 50 years the aging of the population in Italy has been one of the fastest among developed countries, and healthcare professionals have witnessed a rapid increase in the complexity of the case mix of older patients. In Italy in 2006, Residential Facilities (RFs) cared for 230,468 people aged 65 and over. Due to the increase in the overall proportion of the aged in the general population (particularly in those over 85) and the sharp decline in the number of extended families (with the consequent reduction in informal support), the probability of an increase in the number of RF residents in future years is very high. The objective of this work is twofold. Firstly, to report on the availability of institutional care in Italy by analysing the territorial distribution of residential facilities, rehabilitation centres and the hospital structure with the intent of gathering both quantitative and qualitative data. Secondly, to examine the health conditions of the elderly in these institutions. Functional status, multiple pathology and medical conditions requiring care have been evaluated in 1,215 elderly subjects living in Residential Facilities across five Italian regions.

Giulia Cavrini, Claudia Di Priamo, Lorella Sicuro, Alessandra Battisti, Alessandro Solipaca, Giovanni de Girolamo
34. Dementia in the Elderly: Health Consequences on Household Members

The growing number of the oldest-old will cause an increase in the number of mentally ill elderly persons in the population, given that no positive evolution of senile dementia is expected in the near future. Living in the household is the best strategy to contain the pace of mental deterioration, to better manage the disease, and to maintain as long as possible the vigilance of the person with dementia. But dementia is one of the most devastating impairments, both for the persons affected by it and for their entire network of family and friends: its impact on life and well-being is high for all persons living with the elderly person with dementia, and greatest for his or her caregiver. The aim of this work is to evaluate the impact of the presence of an elderly person with dementia on the perceived health of co-resident household members, using data from the Italian health interview survey carried out by the Italian National Institute of Statistics (Istat) in 2005.

V. Egidi, M. A. Salvatore, L. Gargiulo, L. Iannucci, G. Sebastiani, A. Tinto
35. Asset Ownership of the Elderly Across Europe: A Multilevel Latent Class Analysis to Segment Countries and Households

Wealth is a useful measure of the socio-economic status of the elderly, because it might reflect both accumulated socio-economic position and potential for current consumption. A growing number of papers have studied household portfolio in old age, both from a financial point of view (i.e. in the framework of the life-cycle model) and from a marketing perspective. In this chapter, we aim at providing new evidence on this issue both at the household and country level, by investigating similarities and differences in the ownership patterns of several financial and real assets among elderly in Europe. To do so, we exploit the richness of information provided by SHARE (Survey of Health, Ageing and Retirement in Europe), an international survey on ageing that collects detailed information on several aspects of the socio-economic condition of the European elderly. Given the hierarchical structure of the data, the econometric solution we adopt is a multilevel latent class analysis, which allows us to obtain simultaneously country and household segments.

Omar Paccagnella, Roberta Varriale
36. The Longevity Pattern in Emilia Romagna, Italy: A Spatio-temporal Analysis

In this chapter, we investigate the pattern of longevity in an Italian region at the municipality level in two different periods. Spatio-temporal modeling is used to tackle, in both periods, the random variations in the occurrence of long-lived individuals due to the rareness of such events in small areas. This method allows us to exploit spatial proximity by smoothing the observed data, as well as to control for the effects of a set of regressors. As a result, clusters of areas characterized by extreme indices of longevity are well identified, and the temporal evolution of the phenomenon can be depicted. A joint analysis of male and female longevity by cohort in the two periods is conducted by specifying a set of hierarchical Bayesian models.

Giulia Roli, Rossella Miglio, Rosella Rettaroli, Alessandra Samoggia
37. Material Deprivation and Incidence of Lung Cancer: A Census Block Analysis

We study the relationship between the incidence of lung cancer in males in the Tuscan region and material deprivation defined at the census block level. We developed a bivariate hierarchical Bayesian model to assess the completeness of registration of incidence data, and we proposed a series of random effect hierarchical Bayesian models to estimate the degree of association with material deprivation. Model comparison is addressed by a modified Deviance Information Criterion. We estimated a 3.36% increase in the risk of lung cancer per one-standard-deviation increase in material deprivation at the census block level, versus 5.87% at the municipality level. The random slope models reported a paradoxically negative effect of material deprivation at the census block level in some areas. Spatially structured random intercept models behaved better, and random slope models were penalized by their extra complexity.

Laura Grisotto, Dolores Catelan, Annibale Biggeri
38. Mining Administrative Health Databases for Epidemiological Purposes: A Case Study on Acute Myocardial Infarctions Diagnoses

We present a pilot data mining analysis of the subset of the Public Health Database (PHD) of the Lombardia Region concerning hospital discharge data on Acute Myocardial Infarctions without ST-segment elevation (NON-STEMI). The analysis is carried out using nonlinear semi-parametric and parametric mixed effects models, in order to detect different patterns of growth in the number of NON-STEMI diagnoses within the 30 largest clinical structures of the Lombardia Region over the period 2000–2007. The analysis is a seminal example of statistical support to decision makers in a clinical context, aimed at monitoring the diffusion of new procedures and the effects of health policy interventions.

Francesca Ieva, Anna Maria Paganoni, Piercesare Secchi

Economic Statistics and Econometrics

Frontmatter
39. Fractional Integration Models for Italian Electricity Zonal Prices

In the last few years we have observed an increasing interest in deregulated electricity markets. Only a few papers, to the authors' knowledge, have considered the Italian Electricity Spot market, since it was deregulated only recently. This contribution is an investigation with emphasis on price dynamics, accounting for technologies, market concentration, and congestions, as well as extreme spiky behavior. We aim to understand how technologies, concentration, and congestions affect the zonal prices, since all these combine to bring about the single national price (prezzo unico d'acquisto, PUN). Implementing Reg–ARFIMA–GARCH models, we draw policy indications based on the empirical evidence that technologies, concentration, and congestions do affect Italian electricity prices.

Angelica Gianfreda, Luigi Grossi
40. A Generalized Composite Index Based on the Non-Substitutability of Individual Indicators

Composite indices for ranking territorial units are widely used by many international organizations in order to measure economic, social and environmental phenomena. The literature offers a great variety of aggregation methods, all with their pros and cons. In this chapter, we propose an alternative composite index which, starting from a linear aggregation, introduces penalties for the units with “unbalanced” values of the indicators. As an example of application, we consider the set of individual indicators of the Technology Achievement Index (TAI) and present a comparison between some traditional methods and the proposed index.

Matteo Mazziotta, Adriano Pareto
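The penalty-for-unbalance idea described in the abstract above can be illustrated with a small numerical sketch. The specific formula below (a mean of standardized indicator scores corrected by a coefficient-of-variation penalty) is an assumed, simplified variant for illustration, not necessarily the exact index proposed in the chapter:

```python
import numpy as np

def penalized_composite(X, penalty_sign=-1):
    """Composite index: mean of standardized indicators, corrected by a
    penalty proportional to their 'unbalance' across indicators.
    X: (n_units, n_indicators). Standardization: mean 100, std 10."""
    Z = 100 + 10 * (X - X.mean(axis=0)) / X.std(axis=0)
    M = Z.mean(axis=1)                  # unit-level mean of standardized scores
    S = Z.std(axis=1)                   # unit-level spread across indicators
    cv = S / M                          # coefficient of variation ("unbalance")
    return M + penalty_sign * S * cv    # penalize units with unbalanced values

# Three hypothetical territorial units, three indicators
X = np.array([[0.2, 0.5, 0.8],
              [0.8, 0.5, 0.2],
              [0.5, 0.2, 0.8]])
scores = penalized_composite(X)
# With penalty_sign=-1, each score is at most the plain linear aggregation
```

The penalty never rewards an unbalanced profile: two units with the same average standardized score are ranked so that the more balanced one comes out ahead, which is the non-substitutability property discussed in the abstract.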
41. Evaluating the Efficiency of the Italian University Educational Processes through Frontier Production Methods

The university efficiency indicators, computed as partial indices following the input–output approach, are subject to criticism, mainly concerning the problem of comparability of institutions due to the influence of several environmental factors. The aim of this study is to construct comparable efficiency indicators for the university educational processes by using frontier production methods, which enable us to solve the problem. The indicators are computed for the Italian State Universities using a new data set referring to the cohort of students enrolled in the academic year 2004/2005. A comparison with the corresponding partial indicators usually employed is presented.

Luigi Biggeri, Tiziana Laureti, Luca Secondi
42. Modeling and Forecasting Realized Range Volatility

In this chapter, we estimate, model, and forecast Realized Range Volatility, a realized measure and estimator of the quadratic variation of financial prices. This quantity was introduced early in the literature and is based on the high-low range observed at high frequency during the day. We consider the impact of microstructure noise in high frequency data and correct our estimates following a known procedure. Then, we model the Realized Range accounting for the well-known stylized effects present in financial data. We consider an HAR model with asymmetric effects with respect to the volatility and the return, and GARCH and GJR-GARCH specifications for the variance equation. Moreover, we consider a non-Gaussian distribution for the innovations. The analysis of the forecast performance over the different periods suggests that the introduction of asymmetric effects with respect to the returns and the volatility in the HAR model results in a significant improvement in point forecasting accuracy.

Massimiliano Caporin, Gabriel G. Velo
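The high-low range estimator underlying the Realized Range can be sketched in a few lines. The scaling constant 1/(4 ln 2) is the Parkinson factor that makes each squared log-range unbiased for the interval variance of a driftless Brownian motion; the simulation below is an illustrative assumption and does not reproduce the chapter's microstructure-noise correction:

```python
import numpy as np

def realized_range(high, low):
    """Daily realized range: sum over intraday intervals of the squared
    log high-low range, scaled by the Parkinson factor 1/(4 ln 2)."""
    high, low = np.asarray(high, float), np.asarray(low, float)
    return np.sum(np.log(high / low) ** 2) / (4.0 * np.log(2.0))

# Illustrative check on a simulated geometric Brownian price path
rng = np.random.default_rng(0)
sigma = 0.2                      # true daily volatility (daily variance 0.04)
n_int, n_sub = 78, 60            # 78 five-minute intervals, finely sub-sampled
dt = 1.0 / (n_int * n_sub)
logp = np.cumsum(sigma * np.sqrt(dt) * rng.standard_normal(n_int * n_sub))
prices = np.exp(logp).reshape(n_int, n_sub)
rr = realized_range(prices.max(axis=1), prices.min(axis=1))
# rr should be near sigma**2 = 0.04, up to noise and discretization bias
```

Because the range uses every price inside an interval rather than only its endpoints, this estimator is more efficient than the squared-return realized variance at the same sampling frequency, which is the motivation stated in the abstract.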
43. Clusters and Equivalence Scales

Equivalence scales (S) are difficult to estimate: even apparently solid microeconomic foundations do not necessarily lead to consistent results. We contend that this depends on "style" effects: households with the same economic resources and identical "needs" (e.g. the same number of members) may spend differently, following unobservable inclinations ("style" or "taste"). We submit that these style effects must be kept under control if one wants to obtain unbiased estimates of S. One way of doing this is to create clusters of households with different resources (income) and different demographic characteristics (number of members) but a similar "economic profile", in terms of both standard of living and "style". Cluster-specific scales, and the general S that derives from their average, prove defensible on theoretical grounds and are empirically reasonable and consistent.

Gustavo De Santis, Mauro Maltagliati
44. The Determinants of Income Dynamics

Models of income distribution more or less succeed in linking the current level of household (or individual) income to household (or individual) characteristics. However, they are typically far less satisfactory in explaining income dynamics. Gibrat's model proves helpful in highlighting the predominant role of randomness in the short run (here, 2–4 years), and this explains why other systematic influences are difficult to identify. One empirical regularity that does emerge, however, is that small incomes tend to increase more, and with more variability, than large ones. The traditional version of Gibrat's model does not incorporate this peculiarity, but this shortcoming can be overcome with a relatively minor modification of the original model.

Gustavo De Santis, Giambattista Salinari
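Gibrat's model mentioned in the abstract above reduces to a multiplicative random walk: income is repeatedly multiplied by i.i.d. shocks that are independent of the current level, so the cross-sectional variance of log-income grows linearly with time. A minimal simulation (the parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)

def gibrat(y0, n_periods, mu=0.0, sigma=0.1):
    """Classic Gibrat process: y_{t+1} = y_t * exp(eps_t), with
    eps_t ~ N(mu, sigma^2) drawn independently of the current level y_t."""
    y = np.array(y0, dtype=float)
    for _ in range(n_periods):
        y *= np.exp(rng.normal(mu, sigma, size=y.shape))
    return y

# Start everyone at the same income and run a short-run (2-4 year) horizon
incomes = gibrat(np.full(100_000, 1.0), n_periods=4)
# Under Gibrat, Var(log y_t) = t * sigma^2, here 4 * 0.01 = 0.04,
# and growth rates are uncorrelated with the starting level.
```

One way to capture the regularity noted in the abstract (small incomes growing more, and more erratically) would be to let mu and sigma decrease with y; whether this matches the authors' specific modification is not stated in the abstract.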
45. Benchmarking and Movement Preservation: Evidences from Real-Life and Simulated Series

The benchmarking problem arises when time series data for the same target variable are measured at different frequencies with different levels of accuracy, and there is the need to remove discrepancies between annual benchmarks and the corresponding sums of the sub-annual values. Two widely used benchmarking procedures are the modified Denton Proportionate First Differences (PFD) and the Causey and Trager Growth Rates Preservation (GRP) techniques. In the literature it is often claimed that the PFD procedure produces results very close to those obtained through the GRP procedure. In this chapter we study the conditions under which this result holds, by looking at an artificial and a real-life economic series, and by means of a simulation exercise.

Tommaso Di Fonzo, Marco Marini
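The modified Denton PFD technique named above can be sketched as a constrained least-squares problem: choose benchmarked values x_t minimizing the sum of squared first differences of the ratios x_t/p_t to the preliminary series p_t, subject to the annual sums of x hitting the benchmarks. The small solver below, via the KKT (Lagrange) linear system, is a schematic sketch with hypothetical data, not the authors' implementation:

```python
import numpy as np

def denton_pfd(p, benchmarks, per_year=4):
    """Modified Denton Proportionate First Differences benchmarking.
    Minimizes sum_t (x_t/p_t - x_{t-1}/p_{t-1})^2 subject to the annual
    sums of x equalling the benchmarks, via the KKT linear system."""
    p = np.asarray(p, float)
    b = np.asarray(benchmarks, float)
    n, k = len(p), len(b)
    assert n == k * per_year
    D = np.diff(np.eye(n), axis=0)           # first-difference operator
    Pinv = np.diag(1.0 / p)                  # maps x to the ratios x/p
    H = 2.0 * Pinv @ D.T @ D @ Pinv          # Hessian of the PFD objective
    A = np.kron(np.eye(k), np.ones((1, per_year)))  # annual aggregation
    K = np.block([[H, A.T], [A, np.zeros((k, k))]])
    rhs = np.concatenate([np.zeros(n), b])
    return np.linalg.solve(K, rhs)[:n]

# Quarterly preliminary series for two years (sums 400 and 404),
# to be benchmarked to annual totals 410 and 418
p = np.array([98.0, 100.0, 102.0, 100.0, 99.0, 101.0, 103.0, 101.0])
x = denton_pfd(p, [410.0, 418.0])
```

The GRP criterion instead preserves the growth rates x_t/x_{t-1} themselves, which is nonlinear and is usually solved iteratively; the claim examined in the chapter is that the two procedures often give very similar results.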
46. Cumulation of Poverty Measures to Meet New Policy Needs

Reliable indicators of poverty and social exclusion are an essential monitoring tool. Policy research and application increasingly require statistics disaggregated to lower levels and smaller subpopulations. This paper addresses some statistical aspects relating to improving the sampling precision of such indicators for subnational regions, in particular through the cumulation of data.

Vijay Verma, Francesca Gagliardi, Caterina Ferretti
47. Long Memory in Integrated and Realized Variance

The long memory properties of the integrated and realized volatility are investigated under the assumption that the instantaneous volatility is driven by a fractional Brownian motion. The equality of their long memory degrees is proved in the ideal situation in which prices are observed continuously. In this case, the spectral densities of the integrated and realized volatility coincide.

Eduardo Rossi, Paolo Santucci de Magistris
Metadata
Title
Advances in Theoretical and Applied Statistics
Editors
Nicola Torelli
Fortunato Pesarin
Avner Bar-Hen
Copyright Year
2013
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-35588-2
Print ISBN
978-3-642-35587-5
DOI
https://doi.org/10.1007/978-3-642-35588-2