2018 | Book

Studies in Theoretical and Applied Statistics

SIS 2016, Salerno, Italy, June 8-10

About this book

This book includes a wide selection of the papers presented at the 48th Scientific Meeting of the Italian Statistical Society (SIS2016), held in Salerno on 8-10 June 2016. Covering a wide variety of topics ranging from modern data sources and survey design issues to measuring sustainable development, it provides a comprehensive overview of the current Italian scientific research in the fields of open data and big data in public administration and official statistics, survey sampling, ordinal and symbolic data, statistical models and methods for network data, time series forecasting, spatial analysis, environmental statistics, economic and financial data analysis, statistics in the education system, and sustainable development. Intended for researchers interested in theoretical and empirical issues, this volume provides interesting starting points for further research.

Table of Contents

Frontmatter

Advances in Survey Methods and New Sources in Public Statistics

Frontmatter
Robustness in Survey Sampling Using the Conditional Bias Approach with R Implementation
Abstract
The classical tools of robust statistics have to be adapted to the finite population context. Recently, a unified approach for robust estimation in surveys has been introduced. It is based on an influence measure called the conditional bias, which makes it possible to take into account the particular finite population framework and the sampling design. In the present paper, we focus on the design-based approach and we recall the main properties of the conditional bias and how it can be used to define a general class of robust estimators of a total. The link between this class and the well-known winsorized estimators is detailed. We also recall how the approach can be adapted for estimating domain totals in a robust and consistent way. The implementation in R of the proposed methodology is presented with some functions that estimate the conditional bias, calculate the proposed robust estimators and compute the weights associated with the winsorized estimator for particular designs. One function for consistently computing domain totals is also proposed.
Cyril Favre-Martinoz, Anne Ruiz-Gazen, Jean Francois Beaumont, David Haziza
Methodological Perspectives for Surveying Rare and Clustered Population: Towards a Sequentially Adaptive Approach
Abstract
Sampling a rare and clustered trait in a finite population is challenging: traditional sampling designs usually require a large sample size in order to obtain reasonably accurate estimates, resulting in a considerable investment of resources for the detection of only a small number of cases. A notable example is the case of WHO’s tuberculosis (TB) prevalence surveys, crucial for countries that bear a high TB burden, where the prevalence of cases is still less than 1%. In the latest WHO guidelines, spatial patterns are not explicitly accounted for, with the risk of missing a large number of cases; moreover, cost and logistic constraints can pose further problems. After reviewing the methodology in use by WHO, the use of adaptive and sequential approaches is discussed as natural alternatives to overcome the limits of the current practice. A simulation study is presented to highlight possible advantages and limitations of these alternatives, and an integrated approach, combining both adaptive and sequential features in a single sampling strategy, is advocated as a promising methodological perspective.
Federico Andreis, Emanuela Furfaro, Fulvia Mecatti
Age Management in Italian Companies. Findings from Two INAPP Surveys
Abstract
The aim of this paper is to analyze the behavior of Italian companies and the solutions adopted for keeping and reintegrating ageing workers in the labour market, as well as the strategies implemented for their professional enhancement, starting from the results of two INAPP surveys. The first is a quantitative survey on the attitude of employers in small and medium-sized enterprises (SMEs) towards ageing workers; the second is a qualitative study of age management best practices in large companies. As a result, emerging trends in large enterprises will be highlighted, with a focus on similarities and differences with respect to SMEs, in order to identify a feasible development perspective.
Maria Laura Aversa, Paolo Emilio Cardone, Luisa D’Agostino
Generating High Quality Administrative Data: New Technologies in a National Statistical Reuse Perspective
Abstract
Statistical reuse of administrative data is limited by serious data quality issues; these concerns are particularly acute in innovative contexts, in which only administrative data can provide the required granularity and fit to the real processes. The semantics and meta-information of administrative data, in particular, can hardly be mapped onto general notation standards: the public administration (PA), as a purchaser of services, is often unable to give its suppliers effective guidance on how data must be denoted with a view to statistical reuse. In this paper we discuss how new technologies, and the semantic web in particular, may provide unprecedented methods and instruments to support the PA in steering its suppliers towards high-quality administrative data; the joint role of the National Statistical Institute and public bodies, as owners of administrative data, is also discussed.
Manlio Calzaroni, Cristina Martelli, Antonio Samaritani
Exploring Solutions for Linking Big Data in Official Statistics
Abstract
Official statistics has acknowledged the value of big data and has started exploring the use of diverse sources in several domains. Sometimes, big data objects can be easily connected to statistical units. If a unit identifier is available, linking big data to existing statistical micro data can enlarge the content, coverage, accuracy and timeliness of official statistics; Internet-scraped data, for example, could be used with this aim. In this setting, new challenges arise in data integration compared with linking administrative data. In this work, we describe a real case of integration of web-scraped data and a statistical register of agritourisms, specifying the novelties and challenges of the procedure.
Tiziana Tuoto, Daniela Fusco, Loredana Di Consiglio

Recent Debates in Statistics and Statistical Algorithms

An Algorithm for Finding Projections with Extreme Kurtosis
Abstract
Projection pursuit is a multivariate statistical technique aimed at finding interesting low-dimensional data projections. A projection pursuit index is a function which associates a data projection to a real value measuring its interestingness: the higher the index, the more interesting the projection. Consequently, projection pursuit looks for the data projection which maximizes the projection pursuit index. The absolute value of the fourth standardized cumulant is a prominent projection pursuit index. In the general case, a projection achieving either minimal or maximal kurtosis poses computational difficulties. We address them by an algorithm which converges to the global optimum, whose computational advantages are illustrated with air pollution data.
Cinzia Franceschini, Nicola Loperfido
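
As a rough illustration of the index named in the abstract above, the following sketch evaluates the absolute fourth standardized cumulant (excess kurtosis) of a one-dimensional projection. It only scores a given direction; it does not reproduce the authors' optimization algorithm. The function names and the toy data are hypothetical.

```python
import math


def excess_kurtosis(values):
    """Fourth standardized cumulant: m4 / m2^2 - 3 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2) - 3.0


def project(data, direction):
    """Project each row of `data` onto the unit vector along `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    return [sum(x * u for x, u in zip(row, unit)) for row in data]


def kurtosis_index(data, direction):
    """Projection pursuit index: absolute excess kurtosis of the projection."""
    return abs(excess_kurtosis(project(data, direction)))


# Hypothetical toy data: the first coordinate is a symmetric two-point variable.
data = [(-1.0, 0.0), (1.0, 1.0), (-1.0, 2.0), (1.0, 3.0)]
index_first_axis = kurtosis_index(data, (1.0, 0.0))
```

Because the index is standardized, rescaling the direction vector leaves it unchanged, so a search over directions can be restricted to unit vectors.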
Maxima Units Search (MUS) Algorithm: Methodology and Applications
Abstract
An algorithm for extracting identity submatrices of small rank and pivotal units from large and sparse matrices is proposed. The procedure has already been satisfactorily applied for solving the label switching problem in Bayesian mixture models. Here we introduce it on its own and explore possible applications in different contexts.
Leonardo Egidi, Roberta Pappadà, Francesco Pauli, Nicola Torelli
DESPOTA: An Algorithm to Detect the Partition in the Extended Hierarchy of a Dendrogram
Abstract
DESPOTA is a method proposed to seek the best partition among the ones hosted in a dendrogram. The algorithm visits nodes from the tree root toward the leaves. At each node, it tests the null hypothesis that the two descending branches sustain only one cluster of units through a permutation test approach. At the end of the procedure, a partition of the data into clusters is returned. This paper focuses on the interpretation of the test statistic using a data–driven approach, exploiting a real dataset to show the details of the test statistic and the algorithm in action. The working principle of DESPOTA is shown in the light of the Lance–Williams recurrence formula, which embeds all types of agglomeration methods.
Davide Passaretti, Domenico Vistocco
The p-value Case, a Review of the Debate: Issues and Plausible Remedies
Abstract
We review the recent debate on the lack of reliability of scientific results and its connections to the statistical methodologies at the core of the discovery paradigm. Null hypothesis significance testing (NHST), in particular, has often been related to, if not blamed for, the present situation. We argue that a loose relation exists: although NHST, if properly used, could not be seen as a cause, some common misuses may mask or even favour bad practices leading to the lack of reliability. We discuss various proposals which have been put forward to deal with these issues.
Francesco Pauli

Statistical Models and Methods for Network Data, Ordinal and Symbolic Data

Frontmatter
A Dynamic Discrete-Choice Model for Movement Flows
Abstract
We consider data where we have individuals affiliated with at most one organisational unit and where the interest is in modelling changes to these affiliations over time. This could be the case of people working for organisations or people living in neighbourhoods. We draw on dynamic models for social networks to propose an actor-oriented model for how these affiliations change over time. These models specifically take into account constraints of the system and allow for the system to be observed at discrete time-points. Constraints stem from the fact that for example not everybody can have the same job or live in the same neighbourhood, something which induces dependencies among the decisions marginally. The model encompasses two modelling components: a model for determining the termination of an affiliation; and a discrete-choice model for determining the new affiliation. For estimation we employ a Bayesian data-augmentation algorithm, that augments the observed states with unobserved sequences of transitions. We apply the proposed methods to a dataset of house-moves in Stockholm and illustrate how we may infer the mechanisms that sustain and perpetuate segregation on the housing market.
Johan Koskinen, Tim Müller, Thomas Grund
On the Analysis of Time-Varying Affiliation Networks: The Case of Stage Co-productions
Abstract
Multiple Correspondence Analysis and Multiple Factor Analysis have proved appropriate for visually analyzing affiliation (two-mode) networks. However, more could be said about the use of these tools within the positional approach of social network analysis, relying upon the ways in which both these factorial methods and blockmodeling can lead to an appraisal of positional equivalences. This paper presents a joint approach that combines all these methods in order to perform a positional analysis of time-varying affiliation networks. We present an application to an affiliation network of theatre companies involved in stage co-productions over four seasons. The study shows how the joint use of Multiple Factor Analysis and blockmodeling helps us understand network positions and the longitudinal affiliation patterns characterizing them.
Giancarlo Ragozini, Marco Serino, Daniela D’Ambrosio
Similarity and Dissimilarity Measures for Mixed Feature-Type Symbolic Data
Abstract
This paper presents some preliminary results for the similarity and dissimilarity measures based on the Cartesian System Model (CSM) that is a mathematical model to manipulate mixed feature-type symbolic data. We define the notion of concept size for the description of each object in the feature space. By extending the notion to the concept sizes of the Cartesian join and the Cartesian meet of the descriptions for objects, we can obtain various similarity and dissimilarity measures. We present especially asymmetric and symmetric similarity measures useful for pattern recognition problems.
Manabu Ichino, Kadri Umbleja
Dimensionality Reduction Methods for Contingency Tables with Ordinal Variables
Abstract
Several extensions of correspondence analysis that cope with the possible ordinal structure of the variables have been introduced in the literature. They usually obtain a graphical representation of the interdependence between the rows and columns of a contingency table, by using several tools for the dimensionality reduction of the involved spaces. These tools are able to enrich the interpretation of the graphical planes, also providing additional information with respect to the usual singular value decomposition. The main aim of this paper is to suggest a unified theoretical framework for several methods of correspondence analysis coping with ordinal variables.
Luigi D’Ambra, Pietro Amenta, Antonello D’Ambra

Forecasting Time Series

Frontmatter
Extended Realized GARCH Models
Abstract
We introduce a new class of models that extends the Realized GARCH models of Hansen et al. (J Appl Econom 27:877–906, 2012, [10]). Our model generalizes the original specification of Hansen et al. along three different directions. First, it features a time-varying volatility persistence. Namely, the shock response coefficient in the volatility equation adjusts to the time-varying accuracy of the associated realized measure. Second, our framework allows the inclusion of multiple realized measures in a parsimonious way. Finally, it allows for heteroskedasticity of the noise component in the measurement equation. The appropriateness of the proposed class of models is appraised by means of an application to a set of stock returns data.
Richard Gerlach, Giuseppe Storti
Updating CPI Weights Through Compositional VAR Forecasts: An Application to the Italian Index
Abstract
Worldwide, monthly CPIs are mostly calculated as weighted averages of price relatives with fixed base weights. The main source for estimating CPI weights is the National Accounts, whose complexity in terms of data collection, estimation of aggregates and validation procedures leads to several months of delay in the release of the figures. The result is a Laspeyres formula that is not fully consistent, since the weights do not refer to the same period as the base prices: they are one year older and are then corrected for the inflation that has elapsed. In this paper we propose to forecast CPI weights via a compositional VAR model, to obtain more up-to-date weights and, consequently, a more up-to-date measure of inflation through CPIs.
Lisa Crosato, Biancamaria Zavanella
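
The fixed-weight calculation mentioned at the start of the abstract above can be sketched as a weighted average of price relatives. This is a minimal illustration, not the authors' compositional VAR procedure; the item names, prices and weights are invented for the example.

```python
def laspeyres_index(base_prices, current_prices, base_weights):
    """Weighted average of price relatives (current / base) with fixed base weights.

    Weights are normalized to sum to one, and the index is scaled so that
    the base period equals 100.
    """
    total_weight = sum(base_weights.values())
    return 100.0 * sum(
        (base_weights[item] / total_weight)
        * (current_prices[item] / base_prices[item])
        for item in base_weights
    )


# Hypothetical two-item basket.
base_prices = {"food": 2.0, "energy": 5.0}
current_prices = {"food": 2.2, "energy": 6.0}
base_weights = {"food": 0.75, "energy": 0.25}

index_value = laspeyres_index(base_prices, current_prices, base_weights)
```

Here the price relatives are 1.1 and 1.2, so the index is close to 112.5. Replacing `base_weights` with forecast weights changes only the last argument, which is where updated weights would enter.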
Prediction Intervals for Heteroscedastic Series by Holt-Winters Methods
Abstract
The paper illustrates a procedure to calculate prediction intervals in the case of heteroscedasticity using Holt-Winters (HW) methods. The procedure has been applied to the Italian daily electricity prices (PUN) of the year 2014; the prediction intervals have then been compared to those provided by an ARIMA-GARCH model. The intervals obtained with HW methods were very similar to the others, but easier to calculate. Moreover, the HW procedure is more flexible in dealing with periodic volatility, as shown in the case study.
Paolo Chirico

Spatial Analysis and Issues on Ecological and Environmental Statistics

Frontmatter
Measuring Residential Segregation of Selected Foreign Groups with Aspatial and Spatial Evenness Indices. A Case Study
Abstract
Over the last decades there have been important methodological advances in measuring residential segregation, especially concerning spatial indices. After a discussion of the fundamental concepts and approaches, some of the numerous indices are introduced. We focus in particular on the best-known aspatial and spatial indices in the dimension of evenness, namely segregation and dissimilarity indices. The contribution is based on data on the geographic distribution of selected foreign groups resident in the census enumeration areas that form the Local Labour Market Area (LLMA) of Rome. Data refer to the 2001 and 2011 population censuses. Applying the indices to the LLMA of Rome serves as a test of the practical and potential usefulness of the proposed measures and their possible interpretation.
Federico Benassi, Frank Heins, Fabio Lipizzi, Evelina Paluzzi
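
For readers unfamiliar with evenness measures, here is a minimal sketch of the classical aspatial dissimilarity index, D = 0.5 · Σ |a_i/A − b_i/B|, computed over areal units. It is assumed here to be the textbook form of the index; the spatial variants discussed in the chapter are not covered, and the counts are invented.

```python
def dissimilarity_index(group_a, group_b):
    """Aspatial dissimilarity index between two groups across areal units.

    `group_a[i]` and `group_b[i]` are the counts of each group in unit i.
    The result lies in [0, 1]: 0 means identical distributions across
    units, 1 means the two groups never share a unit.
    """
    total_a = sum(group_a)
    total_b = sum(group_b)
    return 0.5 * sum(
        abs(a / total_a - b / total_b) for a, b in zip(group_a, group_b)
    )


# Hypothetical counts in four enumeration areas.
foreign_group = [30, 10, 5, 5]
reference_group = [100, 100, 100, 100]
d_value = dissimilarity_index(foreign_group, reference_group)
```

The value can be read as the share of one group that would have to change unit for the two distributions to coincide.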
Space-Time FPCA Clustering of Multidimensional Curves
Abstract
In this paper we focus on finding clusters of multidimensional curves with spatio-temporal structure, applying a variant of a k-means algorithm based on the principal component rotation of data. The main advantage of this approach is that it combines the clustering functional analysis of the multidimensional data with smoothing methods based on generalized additive models, which cope with both the spatial and the temporal variability, and with functional principal components, which take into account the dependency between the curves.
Giada Adelfio, Francesca Di Salvo, Marcello Chiodi
The Power of Generalized Entropy for Biodiversity Assessment by Remote Sensing: An Open Source Approach
Abstract
The assessment of species diversity in relatively large areas has always been a challenging task for ecologists, mainly because of the intrinsic difficulty of judging the completeness of species lists and of undertaking sufficient and appropriate sampling. Since the variability of the remotely sensed signal is expected to be related to landscape diversity, it could be used as a good proxy of diversity at species level. It has been demonstrated that the relation between species and landscape diversity measured from remotely sensed data or land use maps varies with scale. While traditional metrics supply point descriptions of diversity, the generalized entropy framework offers a continuum of possible diversity measures, which differ in their sensitivity to rare and abundant reflectance values. In this paper, we aim at: (i) discussing the ecological background behind the importance of measuring diversity based on generalized entropy and (ii) providing a test of an Open Source tool, with its source code, for calculating it. We expect that the subject of this paper will stimulate discussions on the opportunities offered by Free and Open Source Software to calculate landscape diversity indices.
Duccio Rocchini, Luca Delucchi, Giovanni Bacaro
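
The idea of a continuum of diversity measures can be illustrated with Rényi's entropy, H_α = ln(Σ p_i^α) / (1 − α). The abstract does not name the formula, so treating "generalized entropy" as Rényi's entropy is an assumption of this sketch, as are the function name and the example proportions. This is not the open source tool tested in the chapter.

```python
import math


def renyi_entropy(proportions, alpha):
    """Renyi entropy of order `alpha` for a vector of relative abundances.

    alpha = 0 gives the log of the number of classes present, alpha -> 1
    gives Shannon entropy (handled as a special case), and larger alpha
    puts more weight on the abundant classes.
    """
    p = [x for x in proportions if x > 0]
    if abs(alpha - 1.0) < 1e-12:
        return -sum(x * math.log(x) for x in p)
    return math.log(sum(x ** alpha for x in p)) / (1.0 - alpha)


# Hypothetical relative frequencies of three reflectance classes in a window.
window = [0.7, 0.2, 0.1]
profile = [renyi_entropy(window, a) for a in (0.0, 1.0, 2.0)]
```

Evaluating the function over a grid of `alpha` values yields a diversity profile rather than a single number, which is the sense in which the framework is a continuum.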
An Empirical Approach to Monitoring Ship CO2 Emissions via Partial Least-Squares Regression
Abstract
The Kyoto Protocol and the competitiveness of the shipping market have been urging shipping companies to pay increasing attention to the monitoring of ship energy efficiency. At the same time, new monitoring data acquisition systems on modern ships have led to an overload of navigation data that has to be fully exploited via statistical methodologies. For this purpose, an empirical approach based on Partial Least-Squares regression is introduced by means of a real case study, in order to give practical indications on CO2 emission control and to support the prognosis of faults.
Antonio Lepore, Biagio Palumbo, Christian Capezza

Statistics and the Education System

Frontmatter
Promoting Statistical Literacy to University Students: A New Approach Adopted by Istat
Abstract
Istat, the Italian National Statistical Institute, has been pursuing the aim of promoting statistical literacy for many years. The recent (2013) establishment of a territorial network of experts in dissemination activities is a further effort in this direction. A new project is devoted to university students. The new approach is gradual: (i) to assess the statistical literacy of students; (ii) to identify the statistical requirements of professors; (iii) to design standardized educational packages aimed at improving students’ ability to read data and statistical information; (iv) to guide students towards statistical thinking through laboratories. Implications of the new approach are discussed.
Alessandro Valentini, Monica Carbonara, Giulia De Candia
From South to North? Mobility of Southern Italian Students at the Transition from the First to the Second Level University Degree
Abstract
In the last decades, the Italian University System has undergone several structural reforms aimed at making it more internationally competitive. Among them, the introduction of university financial autonomy has triggered an “internal” competition among universities to attract students from the entire country. Students’ enrollment at the first level has decreased significantly, especially after the economic crisis of 2008, while students’ migration from the South to the Central and Northern regions of the country has increased. These phenomena have created further inequalities within the country and a cultural and socio-economic loss for the South that does not appear to be slowing down. While Italian internal mobility at the first level has been previously investigated, second level mobility has received little attention. This work attempts to fill this gap by analyzing the transition from first to second level university degree courses of Southern Italian students in terms of macro-regional mobility. The data were provided by the Italian Ministry of Education, University and Research. They are national-level longitudinal administrative micro-data on the educational careers of the freshmen enrolled in first level Italian university degree courses in 2008–09 and followed up to 2014. We use a discrete-time competing risk model to detect the determinants of the choices of Southern Italian students after their bachelor degree: discontinuing university, enrolling in a second level university degree course at a Southern university, or moving to Central or Northern universities. We analyze the role played by demographic variables, time elapsed to obtain the first level degree, performance in the previous schooling career, etc., in order to provide mover or stayer profiles of Southern bachelors.
Marco Enea
Monitoring School Performance Using Value-Added and Value-Table Models: Lessons from the UK
Abstract
Since 1992, the UK Government has published so-called ‘school league tables’ summarizing the average attainment and progress made by students in each state-funded secondary school in England. In this article, we statistically critique and compare prominent past, current and forthcoming value-added and value-table measures of school performance. We discuss the advantages and disadvantages of these different measures as well as their underlying statistical models.
George Leckie, Harvey Goldstein

Economic and Financial Data Analysis

Frontmatter
Indexing the Normalized Worthiness of Social Agents
Abstract
A class of indexes is proposed to evaluate the “worthiness” of the performance of social agents (e.g. governors of health-care districts, schools, etc.), which is fully standardized on the conventional reference framework specified by the policy-maker. An interdisciplinary attempt is made herein to integrate concepts and methods from different fields (management and political science, decision theory, statistics, economics, artificial intelligence, etc.). The performance is interpreted from the viewpoint of the policy-maker, who pursues an overall goal through a sequential planning of goals. The index is adapted to the data of the reference standard-agent and is also normalized on the conventional behavior which has been specified by stakeholders through the setting of a probabilistic model. Pseudo-Bayes tools are used in the normalization process.
Giulio D’Epifanio
Financial Crises and Their Impacts: Data Gaps and Innovation in Statistical Production
Abstract
Financial crises damage output and social cohesion. Lack of timely and accurate data makes it more difficult to assess the build-up of risks. Information gaps can also limit the ability to respond to crises. This calls for better data to monitor economic and financial risks. Several measures taken by the international official statistics community address information needs. These include efforts to fill the data gaps, ensure policy relevance of key indicators, and measure the “unmeasured” complex dimensions of economy and society. Harnessing new data sources and promoting innovation in statistical production processes are key to improving timeliness and adequacy of statistical information services. Nowcasting and predictive analytics can enhance the provision of early warnings about crises.
Emanuele Baldacci
European Welfare Systems in Official Statistics: National and Local Levels
Abstract
In the last decades, European welfare systems have undergone continuous reforms in the light of financial pressures. Monitoring changes requires considering several dimensions of welfare systems, such as the composition of risks and needs covered, the rules for accessing benefits or the type of social benefits delivered. Finally, it is relevant to take into account the geographical area where beneficiaries live, since in some countries local governments are assigned managing and, sometimes, legislative competencies in social protection areas. This paper aims at exploring official statistics on European welfare systems by focusing on social benefits. The objective is to assess whether available statistics allow one to compare the level and the kind of social benefits delivered across European countries at both national and sub-national levels. We focus on the Italian case to provide some examples.
Alessandra Coli, Barbara Pacini
Financial Variables Analysis by Inequality Decomposition
Abstract
This paper illustrates the use of methods related to inequality decomposition for the analysis of financial variables. By means of the overlapping component and of the between-group inequality, it is possible to detect and assess the main factors determining the cross-sectional variability of assets.
Michele Costa

Sustainable Development: Theory, Measures and Applications

Frontmatter
A Novel Perspective in the Analysis of Sustainability, Inclusion and Smartness of Growth Through Europe 2020 Indicators
Abstract
The comparison of different territorial areas according to multiple factors raises the challenge of representing synthetically the complexity of multidimensional phenomena, such as the targets of growth promoted by the Europe 2020 strategy. We considered data for 10 years in order to highlight the evolution of the similarities and dissimilarities of the 28 European countries over the whole period. The analysis is centred on a technique which combines cluster analysis with the use of a composite indicator, thus making it possible to identify countries according both to their structural characteristics and to their overall performance. We also look at convergence processes among countries and link our results to GDP growth to better qualify countries’ patterns of development.
Elena Grimaccia, Tommaso Rondinella
The Italian Population Behaviours Toward Environmental Sustainability: A Study from Istat Surveys
Abstract
The interest of the scientific community in environmental protection issues aimed at guaranteeing future sustainability is constantly increasing. For this reason, the environmental social sciences have, in recent years, been addressing the interrelationships between population and environment. In Italy, an informative contribution comes from both the traditional Istat Multipurpose Survey “Aspects of daily life” and the more recent Istat Survey “Energy Consumption of Households”. The aims of the paper are: (1) to propose synthetic measures of PECB and PEEB (respectively, Pro-Environmental Curtailment and Efficiency Behaviours), by a methodology which facilitates the replicability of the analysis over time, on the basis of the upcoming Istat surveys on these topics; (2) to analyse the determinants of pro-environmental behaviours of Italian citizens, by examining the direction of the relationships with socio-demographic and other relevant characteristics, using a multivariate data analysis approach.
Isabella Mingo, Valentina Talucci, Paola Ungaro
Estimating the at Risk of Poverty Rate Before and After Social Transfers at Provincial Level in Italy
Abstract
Considering the local areas where citizens live is fundamental to investigate deprivation and social exclusion, particularly in a period of increasing financial difficulties and reduction of public funding. In this work we estimate the at risk of poverty rate of Italian households before and after social transfers at provincial level. To obtain these estimates we use data coming from the EU-SILC 2013 survey and data coming from the population census and administrative archives in a small area estimation framework, since the design of EU-SILC survey does not allow for reliable direct estimation at provincial level. Our results, besides indicating the essential role of social transfers in the reduction of the at risk of poverty rate, allow a sub-national analysis of the phenomenon of interest that would be lost by using traditional statistical techniques.
Caterina Giusti, Stefano Marchetti
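
The indicator estimated in the chapter above can be illustrated with a direct, unweighted calculation: the share of units whose income falls below a poverty line set at 60% of the median. This sketch ignores survey weights and the small area estimation models used by the authors. Applying the same line, derived from income after transfers, to both income vectors is an assumption of the example, and all figures are invented.

```python
from statistics import median


def poverty_line(incomes, share=0.6):
    """Poverty line as a fixed share of the median income (60% by default)."""
    return share * median(incomes)


def at_risk_of_poverty_rate(incomes, line):
    """Proportion of units with income strictly below the poverty line."""
    return sum(1 for y in incomes if y < line) / len(incomes)


# Hypothetical equivalised incomes for five households (thousands of euros).
income_after_transfers = [8.0, 13.0, 20.0, 20.0, 30.0]
income_before_transfers = [5.0, 10.0, 20.0, 20.0, 30.0]

# Assumption: one threshold, taken from income after transfers, for both rates.
line = poverty_line(income_after_transfers)
rate_before = at_risk_of_poverty_rate(income_before_transfers, line)
rate_after = at_risk_of_poverty_rate(income_after_transfers, line)
```

The difference between `rate_before` and `rate_after` is a simple reading of how far transfers reduce the rate. With only a handful of sampled households per province, such direct estimates become unstable, which is the motivation for small area methods.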
Metadata
Title
Studies in Theoretical and Applied Statistics
Editors
Prof. Cira Perna
Prof. Monica Pratesi
Prof. Anne Ruiz-Gazen
Copyright Year
2018
Electronic ISBN
978-3-319-73906-9
Print ISBN
978-3-319-73905-2
DOI
https://doi.org/10.1007/978-3-319-73906-9