
2023 | Book

Models for Data Analysis

SIS 2018, Palermo, Italy, June 20–22


About this book

The 49th Scientific Meeting of the Italian Statistical Society was held in June 2018 in Palermo, with more than 450 attendees. There were plenary sessions as well as specialized, solicited, and contributed sessions.

This volume collects a selection of twenty extended contributions covering a wide range of applied and theoretical issues, in line with modern trends in the statistical sciences. Topics include modern textual analysis, sensory analysis, social inequalities, demography, modern modeling of functional and high-dimensional data, and many others.

This volume is addressed to academics, PhD students, professionals and researchers in applied and theoretical statistical models for data analysis.

Table of Contents

Frontmatter
Environmental Vibration Data Analysis for Damage Detection on a Civil Engineering Structure
Abstract
Management of the useful life and safety performance of the infrastructure of a motorway network is currently a topic of great interest. The dynamic behavior of civil engineering structures has usually been studied by means of ambient vibration observations, analyzing them with methods of Operational Modal Analysis and the so-called Peak-Picking technique. The present paper reports the results of an alternative multivariate statistical approach, specifically Principal Component Analysis (PCA), for ambient vibration data, collected to detect suspected structural damage on some specific highway bridge spans in Sicily. The method consists of comparing the system structure of undamaged spans with that of a span suspected of damage, each represented by an optimal subspace determined by PCA. The distance between the subspaces, measured by means of the maximum principal angle between them, provides evidence regarding damage in the span under investigation.
Gianna Agrò
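The subspace comparison described in this abstract can be sketched as follows. This is a minimal illustration, assuming NumPy is available; the function names `pca_basis` and `max_principal_angle` and the toy data are hypothetical and not taken from the chapter. The cosines of the principal angles are the singular values of the product of two orthonormal bases, so the maximum angle corresponds to the smallest singular value.

```python
import numpy as np

def pca_basis(X, k):
    """Orthonormal basis (as columns) of the top-k principal subspace of X.

    Rows of X are observations, columns are variables (e.g. sensor channels).
    """
    Xc = X - X.mean(axis=0)                      # center each variable
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:k].T                              # first k right singular vectors

def max_principal_angle(A, B):
    """Largest principal angle (in radians) between the column spaces of A and B."""
    Qa, _ = np.linalg.qr(A)                      # orthonormalize both bases
    Qb, _ = np.linalg.qr(B)
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    # Clip guards against rounding slightly outside [0, 1].
    return float(np.arccos(np.clip(cosines.min(), 0.0, 1.0)))

# Toy data (made up): variation almost entirely along the first axis.
X_ref = np.array([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                  [0.0, 0.1, 0.0], [0.0, -0.1, 0.0]])
angle = max_principal_angle(pca_basis(X_ref, 1), np.array([[1.0], [0.0], [0.0]]))
```

A small angle suggests the two spans share a similar dominant structure, while an angle close to pi/2 suggests they differ. How large an angle counts as evidence of damage is a modelling decision that this sketch does not address.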
Using Differential Geometry for Sparse High-Dimensional Risk Regression Models
Abstract
With the introduction of high-throughput technologies in clinical and epidemiological studies, the need for inferential tools that are able to deal with fat data structures, i.e., a relatively small number of observations compared to the number of features, is becoming more prominent. In this paper we propose an extension of the dgLARS method to high-dimensional risk regression models. The main idea of the proposed method is to use the differential geometric structure of the partial likelihood function in order to select the optimal subset of covariates.
Luigi Augugliaro, Ernst C. Wit, Hassan Pazira, Javier González, Fentaw Abegaz, Angelo M. Mineo
Perceived Benefits and Individual Characteristics of Internationally Mobile Students: A Discrete Latent Variable Analysis
Abstract
In this paper, the potential benefits of studying abroad during higher education are analysed on the basis of an online survey administered to about 1600 students of a medium-sized university located in the North of Italy (the University of Bergamo) who spent a credit mobility experience abroad during the academic years from 2008/09 to 2014/15. Two dimensions are specifically investigated: the impact of the international experience on the students’ skills, and the fulfilment of expectations concerning the international experience. A two-dimensional latent class Item Response Theory model under a concomitant variable approach is used to assess the item responses. The results show that international mobility positively affects students’ perception of personality development and improves their language and soft skills. Mobile students report gains in terms of personal growth and enhanced employability at home and abroad. However, individual characteristics influence the latent class membership probability.
Silvia Bacci, Valeria Caviezel, Anna Maria Falzoni
Consumers’ Preferences for Coffee Consumption: A Choice Experiment Integrated with Tasting and Chemical Analyses
Abstract
This study proposes an innovative approach for analysing consumer preferences for coffee by integrating a choice experiment with a guided tasting and chemical analysis. Firstly, two types of coffee with different sensory profiles (100% Arabica, and Arabica and Robusta blends) were chosen from mass market retailers; subsequently, a guided tasting was included to analyse the role of the sensory descriptors. An optimal design for the choice experiment was planned in order to achieve the joint purpose of the efficient estimation of the attributes, and the assessment of the information obtained from the guided tasting. The same choice experiment was administered twice, i.e. before and after the guided tasting. Random Utility Models were applied to better evaluate the consumers’ behaviour.
Rossella Berni, Nedka D. Nikiforova, Patrizia Pinelli
Urban Transformations and the Spatial Distribution of Foreign Immigrants in Messina
Abstract
Messina exhibits a fragmented urban structure, a consequence of past historical events, mainly the 1908 earthquake. After this tragic event, Messina experienced economic downturns and nowadays it passively endures, rather than manages, its considerable mercantile traffic. The fragmented urban fabric affects the residential location of foreign migrants. Related literature distinguishes between two sources of spatial segregation: apparent contagion (i.e. economic inhomogeneities affecting the urban context) and true contagion (individual preference to live close to ethnically similar neighbors). We use point pattern analysis to assess residual clustering of migrant households while adjusting for economic inhomogeneity. We implement a case–control approach to avoid confounding between the two sources: migrant households represent cases, while a random sample of natives constitutes the controls. Results show that Sri Lankans, Filipinos (exceeding one kilometer), and Romanians exhibit the highest voluntary segregation, contributing to the creation of spatial clusters that boost the polycentric structure of Messina.
Francesca Bitonti, Angelo Mazza, Massimo Mucciardi, Luigi Scrofani
Cultural Participation and Social Inequality in the Digital Age: A Multilevel Cross-National Analysis in Europe
Abstract
Cultural participation is considered a necessary element of social equity, able to generate positive effects on individual opportunities and quality of life as a whole. Adopting a cross-national perspective, this study considers both traditional and new cultural practices deriving from the rise of new technologies, aiming to analyse how social inequality affects cultural participation in European countries in the digital age. The main specific goals are the following: (1) elaboration of a synthetic index of Cultural Participation at European level; (2) identification of the determinants of participation at both individual and country level; (3) test of the interactions between some country features and individual characteristics, indicators of social differences, to verify their effects on cultural participation. The empirical analysis is based on 26,053 respondents aged 15 years and over, collected by the Special Eurobarometer survey n. 399 containing comparable data on cultural participation. To take into account country characteristics, some variables have been taken from other statistical sources (Eurostat). The data were analysed using Nonlinear Principal Component Analysis and multilevel regression models.
Laura Bocci, Isabella Mingo
Reducing Bias of the Matching Estimator of Treatment Effect in a Nonexperimental Evaluation Procedure
Abstract
The traditional matching methods for the estimation of treatment parameters are often affected by selectivity bias due to the endogenous joint influence of latent factors on the assignment to treatment and on the outcome, especially in a cross-sectional framework. In this study, we show that the influence of unobserved factors involves a cross-correlation between the endogenous components of propensity scores and causal effects. We propose a correction for the bias effect of this correlation on matching results, adopting a state-space model to identify and estimate the unobserved factors. A Monte Carlo experiment supports this finding.
Maria Gabriella Campolo, Antonino Di Pino Incognito, Edoardo Otranto
Gender Gap Assessment and Inequality Decomposition
Abstract
We propose to measure and evaluate gender gaps and gender inequalities by means of the decomposition of an inequality measure. A three-term decomposition of the Gini index is applied, which also makes it possible to account for the role of overlapping between the female and male subpopulations. We develop a unified framework for the evaluation of the gender gap, linking traditional measures, based on subgroup income means, to the approach related to inequality decomposition, and showing how the overlapping component represents a key issue in gender gap analysis. An analysis of the income distribution of Italian households shows how gender gaps represent a major source of inequality, without particular improvements during the last 20 years.
Michele Costa
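To make the idea of a three-term Gini decomposition concrete, the sketch below splits the overall Gini index into a within-group term, a between-group term based on subgroup means, and an overlap term obtained as the residual. This is one common textbook form and is only an assumption here; the chapter may define the three components differently. The function names and the income figures are hypothetical.

```python
def gini(values):
    """Gini index from the mean absolute difference over all ordered pairs."""
    n = len(values)
    mu = sum(values) / n
    total = sum(abs(a - b) for a in values for b in values)
    return total / (2 * n * n * mu)

def gini_three_terms(groups):
    """Return (G, within, between, overlap) for a dict: group name -> incomes.

    within  : sum over groups of (population share * income share * group Gini)
    between : Gini obtained when every income is replaced by its group mean
    overlap : residual G - within - between (zero when groups do not overlap)
    """
    pooled = [y for ys in groups.values() for y in ys]
    n = len(pooled)
    mu = sum(pooled) / n
    within = 0.0
    shares = []                                   # (population share, group mean)
    for ys in groups.values():
        p = len(ys) / n
        m = sum(ys) / len(ys)
        within += p * (p * m / mu) * gini(ys)
        shares.append((p, m))
    between = sum(pk * ph * abs(mk - mh)
                  for pk, mk in shares for ph, mh in shares) / (2 * mu)
    g_total = gini(pooled)
    return g_total, within, between, g_total - within - between

# Made-up incomes: the first pair of groups is separated, the second overlaps.
separated = gini_three_terms({"women": [1, 2, 3], "men": [4, 5, 6]})
overlapping = gini_three_terms({"women": [1, 4, 5], "men": [2, 3, 6]})
```

In the separated case the overlap term vanishes, so the gap in means explains all inequality not found within groups. In the overlapping case the pooled Gini is unchanged but most of the between-group contribution moves into the overlap term, which is why a comparison of means alone can understate or misread the gender gap.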
Functional Linear Models for the Analysis of Similarity of Waveforms
Abstract
In seismology, methods based on waveform similarity analysis are adopted to identify sequences of events characterized by similar fault mechanisms and propagation patterns. Seismic waves can be considered as spatially interdependent, three-dimensional curves depending on time, and waveform similarity analysis can be configured as a functional clustering approach in which membership is assessed by the shape of the temporal patterns. To extract the most important information from the recorded signals, we propose the use of metadata related to the waves as covariates of a functional response regression model. The temporal patterns of these effects, as well as of the residual component obtained after taking into account the most relevant predictors, are investigated in order to detect a cluster structure. The implemented clustering techniques are based on functional data depth.
Francesca Di Salvo, Renata Rotondi, Giovanni Lanzano
Capturing Measurement Error Bias in Volatility Forecasting by Realized GARCH Models
Abstract
This paper proposes generalisations of the Realized GARCH model in three different directions. First, heteroskedasticity of the noise term in the measurement equation is modelled by letting the variance of the measurement error vary over time as a function of an estimator of the Integrated Quarticity obtained from intra-daily returns. Second, to account for attenuation bias effects, volatility dynamics are allowed to depend on the accuracy of the realized measure by letting the response coefficient of the lagged realized measure be a function of the time-varying variance of the volatility measurement error. As a result, the model tends to assign more weight to lagged volatilities when they are measured more accurately. Finally, a further extension is proposed by introducing an additional explanatory variable into the measurement equation, aiming to quantify the bias due to the effect of jumps.
Richard Gerlach, Antonio Naimoli, Giuseppe Storti
Zero Inflated Bivariate Poisson Regression Models for a Sport (in)activity Data Analysis
Abstract
In this paper, we analyze a sport (in)activity case study using a zero inflated bivariate Poisson model. We use the “(in)activity” term in order to embrace both active and passive sport participation (practicing or watching a sport, respectively). The paper investigates the determinants of sport (in)activity: the frequency and the probability of sports participation. It distinguishes between genuine “non-participants” and those who do not participate at the time of the survey but might under different circumstances.
Maria Iannario, Ioannis Ntzoufras, Claudia Tarantola
Network-Based Dimensionality Reduction for Textual Datasets
Abstract
There is an increasing interest in developing statistical tools for extracting information from textual datasets. In a text mining framework, a knowledge discovery process typically implies the reduction of the vocabulary dimensionality, via a feature selection or a feature extraction approach. Here we propose a strategy designed to reduce the dimensionality of textual datasets through a network-based procedure. Network tools allow performing the reduction taking into account the association relations among terms used in the texts. The effectiveness of this strategy is shown by analysing a set of tweets about the recent COVID-19 global pandemic.
Michelangelo Misuraca, Germana Scepi, Maria Spano
Assessing the Performance of the Italian Translations of Modified MEIM, EIS and FESM Scales to Measure Ethnic Identity: A Case Study
Abstract
Measuring the ethnic identity of linguistic minorities is a research problem which can be tackled starting from a clear operational definition of the construct. This paper presents the performance of the Italian translations of various scales widely used in the relevant literature, which have been modified for the aims of this study and used in research conducted in 2016 in the Arbereshe Municipalities of Piana degli Albanesi and Santa Cristina Gela (Province of Palermo). These scales consist of modifications of the Multigroup Ethnic Identity Measure (MEIM), Ethnic Identity Scale (EIS) and Familial Ethnic Socialization Measure (FESM). The psychometric properties were analysed for all these scales in terms of reliability and unidimensionality, using Classical Test Theory (CTT) and Item Response Theory (IRT). The latter proved useful in suggesting further improvements regarding the scales.
Antonino Mario Oliveri, Gabriella Polizzi
Towards Global Monitoring: Equating the Food Insecurity Experience Scale (FIES) and Food Insecurity Scales in Latin America
Abstract
In order to face food insecurity as a global phenomenon, it is essential to rely on measurement tools that guarantee comparability across countries. Although the official indicators adopted by the United Nations in the context of the Sustainable Development Goals (SDGs) and based on the Food Insecurity Experience Scale (FIES) already embed cross-country comparability, other experiential scales of food insecurity currently employ national thresholds, and issues of comparability thus arise. In this work we address comparability of food insecurity experience-based scales by presenting two different studies. The first one involves the FIES and three national scales (ELCSA, EMSA and EBIA) currently included in national surveys in Guatemala, Ecuador, Mexico and Brazil. The second study concerns the adult and children versions of these national scales. Different methods from the equating practice of the educational testing field are explored, both classical methods and methods based on Item Response Theory (IRT).
Federica Onori, Sara Viviani, Pierpaolo Brutti
Position Weighted Decision Trees for Ranking Data
Abstract
Preference data represent a particular type of ranking data where a group of people gives their preferences over a set of alternatives. Within this framework, distance-based decision trees represent a non-parametric tool for identifying the profiles of subjects giving a similar ranking. This paper aims at detecting, in the framework of (complete and incomplete) ranking data, the impact of differently structured weighted distances for building decision trees. By means of simulations, we will compute the impact of higher/lower homogeneity in groups and different weighting structures both on splitting and on consensus ranking. The distances that will be used satisfy Kemeny’s axioms and, accordingly, a modified version of the rank correlation coefficient \(\tau_x\), originally proposed by Emond and Mason, will be introduced and used for rank aggregation and class labelling in the tree leaves.
Antonella Plaia, Simona Buscemi, Mariangela Sciandra
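For readers unfamiliar with \(\tau_x\), the following sketch computes the standard, unweighted coefficient of Emond and Mason, not the position-weighted modification developed in the chapter. Each ranking is turned into a score matrix with entry 1 when item i is ranked ahead of or tied with item j, -1 when it is ranked behind, and 0 on the diagonal; \(\tau_x\) is the sum of the elementwise products divided by n(n-1). The function names are hypothetical, and rankings are assumed to be lists of ranks where a lower number means a better position.

```python
def score_matrix(ranks):
    """Emond-Mason score matrix: 1 if i is ahead of or tied with j, -1 if behind."""
    n = len(ranks)
    return [[0 if i == j else (1 if ranks[i] <= ranks[j] else -1)
             for j in range(n)]
            for i in range(n)]

def tau_x(ranks_a, ranks_b):
    """Unweighted tau_x rank correlation between two rankings of the same items."""
    n = len(ranks_a)
    a = score_matrix(ranks_a)
    b = score_matrix(ranks_b)
    total = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n))
    return total / (n * (n - 1))

same = tau_x([1, 2, 3], [1, 2, 3])        # identical rankings
reverse = tau_x([1, 2, 3], [3, 2, 1])     # fully reversed rankings
tied = tau_x([1, 1, 2], [1, 1, 2])        # identical rankings containing a tie
```

Because ties are scored as 1 rather than 0, a ranking with ties still has correlation 1 with itself, which is the property that distinguishes \(\tau_x\) from Kendall's tau-b in this setting.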
European Funds and Regional Convergence: From the European Context to the Italian Scenario
Abstract
Inclusive economic growth and territorial cohesion are central points of the EU agenda. Regional funds are the main instrument of European Regional Policy, aimed at increasing employment levels in the Union and reducing the territorial divides between backward and forward regions. This work aims to verify the effectiveness of 2007–2013 EU funding by means of Difference-in-Differences regression on official data referring to European NUTS-2 regions. The aim is twofold. First, we verify whether the regional funds narrowed employment disparities in the European context. Second, we focus on Italy as one of the largest beneficiary countries of funds. The results suggest a general ineffectiveness of funds across the European countries and an even worse scenario in Italy. The quality of institutions, the fund management by the national and regional governments and the monitoring activities are the main causes of failure of regional policies in Italy.
Gennaro Punzo, Mariateresa Ciommi, Gaetano Musella
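The logic of a Difference-in-Differences estimate can be shown with the simplest two-group, two-period case. This is only an illustration with made-up employment rates; the chapter uses a regression on NUTS-2 data, presumably with covariates, which this sketch does not reproduce. The function name `did_estimate` is hypothetical.

```python
from statistics import mean

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-by-two Difference-in-Differences estimate.

    Change in the treated group minus change in the control group; under the
    parallel-trends assumption this isolates the effect of the treatment.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Hypothetical employment rates (%) for funded and non-funded regions.
funded_before = [55.0, 57.0, 56.0]
funded_after = [58.0, 60.0, 59.0]
other_before = [65.0, 66.0, 67.0]
other_after = [67.0, 68.0, 69.0]

effect = did_estimate(funded_before, funded_after, other_before, other_after)
```

With these invented numbers, funded regions gain 3 points and the others gain 2, so the estimate is 1 point. An estimate near zero would correspond to the kind of ineffectiveness the abstract reports.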
A BoD Composite Indicator to Measure the Italian “Sole 24 Ore” Quality of Life
Abstract
The measurement of Quality of Life (QoL) is still a widely discussed topic in the literature. In Italy, the newspaper “Il Sole 24 Ore” publishes a famous ranking that highlights strong disparities among provinces. In this paper, the “Il Sole 24 Ore” and BoD-DEA methods are compared in order to show how different types of normalization and aggregation significantly influence the results, making these rankings very fragile and questionable.
Mariantonietta Ruggieri, Gianna Agrò, Erasmo Vassallo
Trends and Random Walks in Mortality Series
Abstract
The notion that time series cannot be properly dealt with until their nature has been established is nowadays largely accepted among economists, less so among demographers. In this paper, based on theoretical considerations and empirical data, we prove that mortality evolves over time following a geometric random walk with drift. If this is true, other series too must follow a non-stationary path, for instance person-years and survivors in mortality tables, and survivors in actual populations. In the empirical part of the paper, we carry out 160 tests on age-specific log-mortality rates in France and England-Wales (at ages 0–79) over the years 1850–2016. The DS (difference stationary), not TS (trend stationary) nature of the series emerges clearly, probably with just one unit root.
Giambattista Salinari, Gustavo De Santis
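A geometric random walk with drift in mortality is equivalent to an ordinary random walk with drift in log-mortality: each year's value equals the previous one plus a constant drift plus a random shock. The sketch below simulates such a series and takes first differences, which is the transformation that makes a difference stationary (DS) series stationary. All parameter values are made up, the function names are hypothetical, and no formal unit root test of the kind carried out in the chapter is attempted here.

```python
import random

def simulate_log_mortality(n_years, drift, sigma, start, seed=0):
    """Random walk with drift: x[t] = x[t-1] + drift + Gaussian shock."""
    rng = random.Random(seed)                 # local generator for reproducibility
    series = [start]
    for _ in range(n_years - 1):
        series.append(series[-1] + drift + rng.gauss(0.0, sigma))
    return series

def first_differences(series):
    """Year-on-year changes; for a DS series these fluctuate around the drift."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

# Hypothetical values: log-mortality falling by about 1.5% per year.
log_m = simulate_log_mortality(n_years=167, drift=-0.015, sigma=0.03, start=-4.0)
changes = first_differences(log_m)
average_change = sum(changes) / len(changes)
```

In a trend stationary (TS) series, shocks fade and the series returns to a fixed trend line. Here each shock is carried forward permanently, so the level wanders while the differences remain centred on the drift.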
Fuzzy and Model Based Clustering Methods: Can We Fruitfully Compare Them?
Abstract
In recent years, fuzzy and model-based approaches to clustering have received a great deal of attention and have been increasingly used in several empirical contexts. Even if they are very different from a theoretical point of view, they are similar in practice. In fact, model-based clustering gives posterior probabilities of membership in each component, with each component treated as a cluster. Fuzzy clustering assigns observations to clusters through fuzzy membership degrees, while no probabilistic assumption is made to represent the clusters. The aim of this work is to compare the performance of some clustering methods belonging to the two approaches, in terms of recovering the true clusters, in a large-scale simulation study.
Alessio Serafini, Luca Scrucca, Marco Alfò, Paolo Giordani, Maria Brigida Ferraro
An Analysis of Misclassification Rates in Rater Agreement Studies
Abstract
This study aims at investigating, via a Monte Carlo simulation, the performance of two non-parametric benchmarking procedures for characterizing the extent of rater agreement in non-asymptotic conditions. The performance of each procedure has been evaluated by computing an overall weighted misclassification rate; moreover, in order to investigate whether the procedures overestimate or underestimate the level of agreement, misclassification frequencies have been computed for each agreement category.
Amalia Vanacore, Maria Sole Pellegrino
Backmatter
Metadata
Title
Models for Data Analysis
Editors
Eugenio Brentari
Marcello Chiodi
Ernst-Jan Camiel Wit
Copyright Year
2023
Electronic ISBN
978-3-031-15885-8
Print ISBN
978-3-031-15884-1
DOI
https://doi.org/10.1007/978-3-031-15885-8
