
2021 | Book

Advances in Contemporary Statistics and Econometrics

Festschrift in Honor of Christine Thomas-Agnan


About this book

This book presents a unique collection of contributions on modern topics in statistics and econometrics, written by leading experts in the respective disciplines and their intersections. It addresses nonparametric statistics and econometrics, quantiles and expectiles, and advanced methods for complex data, including spatial and compositional data, as well as tools for empirical studies in economics and the social sciences. The book was written in honor of Christine Thomas-Agnan on the occasion of her 65th birthday. Given its scope, it will appeal to researchers and PhD students in statistics and econometrics alike who are interested in the latest developments in their field.

Table of Contents

Frontmatter

Nonparametric Statistics and Econometrics

Frontmatter
Profile Least Squares Estimators in the Monotone Single Index Model

We consider least squares estimators of the finite-dimensional regression parameter $$\boldsymbol{\alpha}$$ in the single index regression model $$Y=\psi (\boldsymbol{\alpha }^T\boldsymbol{X})+\varepsilon$$, where $$\boldsymbol{X}$$ is a d-dimensional random vector, $${\mathbb E}(Y|\boldsymbol{X})=\psi (\boldsymbol{\alpha }^T\boldsymbol{X})$$, and $$\psi$$ is monotone. It has been suggested to estimate $$\boldsymbol{\alpha}$$ by a profile least squares estimator, minimizing $$\sum _{i=1}^n(Y_i-\psi (\boldsymbol{\alpha }^T\boldsymbol{X}_i))^2$$ over monotone $$\psi$$ and $$\boldsymbol{\alpha}$$ on the boundary $$\mathcal {S}_{d-1}$$ of the unit ball. Although this suggestion has been around for a long time, it is still unknown whether the estimator is $$\sqrt{n}$$-convergent. We show that a profile least squares estimator, using the same pointwise least squares estimator for fixed $$\boldsymbol{\alpha}$$, but using a different global sum of squares, is $$\sqrt{n}$$-convergent and asymptotically normal. The difference between the corresponding loss functions is studied, and a comparison with other methods is also given.
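
As a toy illustration of the profile idea (not the authors' estimator), the inner least squares problem over monotone $$\psi$$ can be solved by isotonic regression and the criterion then profiled over the direction. The base-R sketch below does this for d = 2, restricting to increasing $$\psi$$, with simulated data and an angle parametrization of the sphere.

```r
set.seed(1)
n <- 200; d <- 2
X <- matrix(rnorm(n * d), n, d)
alpha0 <- c(3, 4) / 5                              # true direction, ||alpha0|| = 1
Y <- drop((X %*% alpha0)^3) + rnorm(n, sd = 0.5)   # psi(t) = t^3 is increasing

## inner step: for a fixed direction, fit psi by isotonic regression (isoreg)
## and return the residual sum of squares of the profiled criterion
rss <- function(theta) {
  alpha <- c(cos(theta), sin(theta))               # alpha on the unit circle S_1
  ind <- drop(X %*% alpha)
  fit <- isoreg(ind, Y)                            # least squares over increasing psi
  sum((Y[order(ind)] - fit$yf)^2)
}

## outer step: profile the criterion over the direction
opt <- optimize(rss, interval = c(-pi, pi))
alpha_hat <- c(cos(opt$minimum), sin(opt$minimum))
alpha_hat                                          # should be close to alpha0
```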

Fadoua Balabdaoui, Piet Groeneboom
Optimization by Gradient Boosting

Gradient boosting is a state-of-the-art prediction technique that sequentially produces a model in the form of linear combinations of elementary predictors (typically decision trees) by solving an infinite-dimensional convex optimization problem. We provide in the present paper a thorough analysis of two widespread versions of gradient boosting, and introduce a general framework for studying these algorithms from the point of view of functional optimization. We prove their convergence as the number of iterations tends to infinity and highlight the importance of having a strongly convex risk functional to minimize. We also present a reasonable statistical context ensuring consistency properties of the boosting predictors as the sample size grows. In our approach, the optimization procedures are run forever (that is, without resorting to an early stopping strategy), and statistical regularization is basically achieved via an appropriate $$L^2$$ penalization of the loss and strong convexity arguments.
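
For readers who want to see the generic algorithm in action, here is a minimal L2-gradient-boosting loop with tree stumps on simulated data; the learner (rpart), the shrinkage value, and the iteration count are arbitrary illustrative choices, not those analyzed in the chapter.

```r
library(rpart)            # assumed available; any regression tree learner would do
set.seed(2)
n <- 500
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

Fhat <- rep(mean(y), n)   # initial predictor (the constant risk minimizer)
nu <- 0.1                 # shrinkage / step size (illustrative value)
for (m in 1:200) {
  r <- y - Fhat           # negative gradient of the squared-error loss
  stump <- rpart(r ~ x, data = data.frame(x = x, r = r),
                 control = rpart.control(maxdepth = 1, cp = 0))
  Fhat <- Fhat + nu * predict(stump, data.frame(x = x))
}
mean((y - Fhat)^2)        # in-sample squared error after 200 iterations
```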

Gérard Biau, Benoît Cadre
Nonparametric Model-Based Estimators for the Cumulative Distribution Function of a Right Censored Variable in a Small Area

In survey analysis, the estimation of the cumulative distribution function (cdf) is of great interest as it facilitates the derivation of mean/median estimators for both populations and sub-populations (i.e. domains). We focus on small domains and consider the case where the response variable is right censored. Under this framework, we propose a nonparametric model-based estimator that extends the cdf estimator of Casanova (2012) to the censored case: it uses auxiliary information in the form of a continuous covariate and utilizes nonparametric quantile regression. We then employ simulations to compare the constructed estimator with the model-based cdf estimator of Casanova and Leconte (2015) and the Kaplan–Meier estimator (Kaplan and Meier 1958), both of which use only information contained within the domain: the quantile-based estimator performs better than these two estimators for very small domain sample sizes. Access times to the first job for young female graduates in the Occitania region are used to illustrate the new methodology.
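
As background, the Kaplan-Meier benchmark mentioned above can be written in a few lines of base R; this product-limit sketch on simulated right-censored data is purely illustrative and is not the chapter's quantile-based estimator.

```r
set.seed(3)
n <- 50
T0 <- rexp(n, rate = 1)              # latent survival times
C  <- rexp(n, rate = 0.5)            # censoring times
time   <- pmin(T0, C)                # observed (possibly censored) times
status <- as.numeric(T0 <= C)        # 1 = observed event, 0 = censored

km_cdf <- function(time, status, t) {
  d <- sort(unique(time[status == 1]))          # distinct event times
  S <- cumprod(sapply(d, function(s) {
    at_risk <- sum(time >= s)
    events  <- sum(time == s & status == 1)
    1 - events / at_risk                        # product-limit factor
  }))
  Sfun <- stepfun(d, c(1, S))                   # Kaplan-Meier survival function
  1 - Sfun(t)                                   # cdf F(t) = 1 - S(t)
}
km_cdf(time, status, t = c(0.5, 1, 2))
```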

Sandrine Casanova, Eve Leconte
Relaxing Monotonicity in Endogenous Selection Models and Application to Surveys

This paper considers endogenous selection models, in particular, nonparametric ones. Estimating the unconditional law of the outcomes is possible when one uses instrumental variables. Using a selection equation which is additively separable in a one-dimensional unobservable has the sometimes undesirable property of instrument monotonicity. We present models which allow for nonmonotonicity and are based on nonparametric random coefficients indices. We discuss their nonparametric identification and apply these results to inference on nonlinear statistics such as the Gini index in surveys when the nonresponse is not missing at random.

Eric Gautier
B-Spline Estimation in a Survey Sampling Framework

Nonparametric regression models have been used increasingly in recent years to model survey data and to incorporate auxiliary information efficiently in order to improve the estimation of totals, means, or other study parameters such as the Gini index or the poverty rate. B-spline nonparametric regression has the benefit of being very flexible in modeling nonlinear survey data while retaining many similarities and properties of classical linear regression. This method has proved efficient for deriving a unique system of weights that allows many study parameters to be estimated efficiently and simultaneously. Applications to real and simulated survey data have shown its high efficiency. This paper aims at giving a review of applications of B-spline nonparametric regression in a survey sampling framework and design-based approach. Handling item nonresponse by B-spline modeling is also considered. The review also includes new properties and improved consistency rates for the suggested penalized and unpenalized B-spline estimators.

Camelia Goga
Computational Outlier Detection Methods in Sliced Inverse Regression

Sliced inverse regression (SIR) focuses on the relationship between a dependent variable y and a p-dimensional explanatory variable x in a semiparametric regression model in which the link relies on an index $$x'\beta$$ and a link function f. SIR allows estimating the direction of $$\beta$$ that forms the effective dimension reduction (EDR) space. Based on the estimated index, the link function f can then be estimated nonparametrically using a kernel estimator. This two-step approach is sensitive to the presence of outliers in the data. The aim of this paper is to propose computational methods to detect outliers in this kind of single-index regression model. Three outlier detection methods are proposed and their numerical behaviors are illustrated on a simulated sample. To discriminate outliers from “normal” observations, they use IB (in-bags) or OOB (out-of-bags) prediction errors from subsampling or resampling approaches. These methods, implemented in R, are compared with each other in a simulation study. An application to real data is also provided.
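
A compact base-R version of the SIR step described above (slice the response, then eigen-decompose the between-slice covariance in the metric of the total covariance) may help fix ideas; the chapter's outlier diagnostics would be built on top of such a fit. The data and slice number below are invented.

```r
set.seed(4)
n <- 400; p <- 5
x <- matrix(rnorm(n * p), n, p)
beta <- c(1, 1, 0, 0, 0) / sqrt(2)
y <- drop((x %*% beta)^3) + rnorm(n, sd = 0.5)

sir <- function(x, y, H = 10) {
  n <- nrow(x)
  xc <- scale(x, center = TRUE, scale = FALSE)
  Sigma <- crossprod(xc) / n                         # total covariance
  slices <- cut(rank(y, ties.method = "first"), H, labels = FALSE)
  means <- do.call(rbind, lapply(split(as.data.frame(xc), slices), colMeans))
  props <- tabulate(slices, H) / n
  M <- t(means) %*% (props * means)                  # between-slice covariance
  eig <- eigen(solve(Sigma, M))                      # generalized eigenproblem
  Re(eig$vectors[, 1])                               # first EDR direction
}
b <- sir(x, y)
b / sqrt(sum(b^2))                                   # compare with beta (up to sign)
```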

Hadrien Lorenzo, Jérôme Saracco
Uncoupled Isotonic Regression with Discrete Errors

In Rigollet and Weed (2019), an estimator was proposed for the uncoupled isotonic regression problem. It was shown that a so-called minimum Wasserstein deconvolution estimator achieves the rate $$\log \log n / \log n$$. Furthermore, it was shown that for normally distributed errors, this rate is optimal. In this note, we will show that for error distributions supported on a finite set of points, this rate can be improved to the order of $$n^{-1/(2p)}$$ for $$L_p$$-risks. We also show that this rate is optimal and cannot be improved for Bernoulli errors.

Jan Meis, Enno Mammen

Quantiles and Expectiles

Frontmatter
Partially Linear Expectile Regression Using Local Polynomial Fitting

This chapter deals with partially linear expectile regression using local polynomial fitting as a basic smoothing technique in the various steps. The advantage of the estimation method is that an explicit expression for an optimal choice of the bandwidth (matrix) can be established, and based on this, a rule-of-thumb bandwidth selector is presented. A small simulation study demonstrates that the estimation method with this data-driven choice of the bandwidth performs very well. An illustration with a real data example is provided.
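
To recall what an expectile is, the following base-R sketch computes the tau-expectile of a sample as the fixed point of an asymmetrically weighted mean; it is a standalone illustration of the loss being minimized, not the chapter's partially linear local polynomial estimator.

```r
## the tau-expectile minimizes E[ |tau - 1{y <= e}| * (y - e)^2 ]; its first-order
## condition is a weighted-mean equation, solved here by fixed-point iteration
expectile <- function(y, tau = 0.5, tol = 1e-10) {
  e <- mean(y)
  repeat {
    w <- ifelse(y > e, tau, 1 - tau)     # asymmetric weights
    e_new <- sum(w * y) / sum(w)         # weighted-mean update
    if (abs(e_new - e) < tol) return(e_new)
    e <- e_new
  }
}
set.seed(5)
y <- rexp(1000)
c(mean = mean(y), e90 = expectile(y, 0.90))
```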

Cécile Adam, Irène Gijbels
Piecewise Linear Continuous Estimators of the Quantile Function

In Blanke and Bosq (2018), families of piecewise linear estimators of the distribution function F were introduced. It was shown that they reduce the mean integrated squared error (MISE) of the empirical distribution function $$F_n$$ and that the minimal MISE was reached by connecting the midpoints $$\left(\frac{X_k^{*}+ X^{*}_{k+1}}{2}, \frac{k}{n}\right)$$, with $$X_1^{*},\dotsc ,X_n^{*}$$ the order statistics. In this contribution, we consider the reciprocal estimators, built respectively for known and unknown support of the distribution, for estimating the quantile function $$F^{-1}$$. We prove that these piecewise linear continuous estimators again strictly improve the MISE of the classical sample quantile function $$F_n^{-1}$$.
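
A sketch of the construction, under the assumption that the quantile estimator linearly interpolates the points $$\left(\frac{k}{n}, \frac{X_k^{*}+ X^{*}_{k+1}}{2}\right)$$; endpoint handling for known versus unknown support is omitted.

```r
set.seed(6)
n <- 100
x <- sort(rnorm(n))                       # order statistics X*_1, ..., X*_n
mid <- (x[-n] + x[-1]) / 2                # midpoints between consecutive order stats
p   <- (1:(n - 1)) / n

## piecewise linear continuous quantile estimator on (1/n, (n-1)/n);
## approxfun returns NA outside that range
Qn <- approxfun(p, mid)
Qn(c(0.1, 0.5, 0.9))
qnorm(c(0.1, 0.5, 0.9))                   # true quantiles, for comparison
```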

Delphine Blanke, Denis Bosq
Single-Index Quantile Regression Models for Censored Data

When the dimension of the covariate space is high, semiparametric regression models become indispensable to gain flexibility while avoiding the curse of dimensionality. These considerations become even more important for incomplete data. In this work, we consider the estimation of a semiparametric single-index model for conditional quantiles with right-censored data. Iteratively applying the local-linear smoothing approach, we simultaneously estimate the linear coefficients and the link function. We show that our estimating procedure is consistent and we study its asymptotic distribution. Numerical results are used to show the validity of our procedure and to illustrate the finite-sample performance of the proposed estimators.

Axel Bücher, Anouar El Ghouch, Ingrid Van Keilegom
Extreme $$L^p$$-quantile Kernel Regression

Quantiles are recognized tools for risk management and can be seen as minimizers of an $$L^1$$-loss function, but they do not define coherent risk measures in general. Expectiles, meanwhile, are minimizers of an $$L^2$$-loss function and do define coherent risk measures; they have started to be considered as good alternatives to quantiles in insurance and finance. Quantiles and expectiles belong to the wider family of $$L^p$$-quantiles. We propose here to construct kernel estimators of extreme conditional $$L^p$$-quantiles. We study their asymptotic properties in the context of conditional heavy-tailed distributions, and we show through a simulation study that taking $$p \in (1,2)$$ may allow extreme conditional quantiles and expectiles to be recovered accurately. Our estimators are also showcased on a real insurance data set.
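
Since $$L^p$$-quantiles are defined as minimizers of an asymmetric power loss, a one-sample version can be computed directly; this sketch (invented data, optimize() over the sample range) shows how p interpolates between the quantile case p = 1 and the expectile case p = 2. The extreme conditional versions studied in the chapter additionally require kernel smoothing and tail extrapolation.

```r
## the tau-th L^p-quantile minimizes E[ |tau - 1{y <= q}| * |y - q|^p ];
## L^p-quantiles always lie between min(y) and max(y), so optimize() on
## range(y) is safe
lp_quantile <- function(y, tau, p) {
  loss <- function(q) mean(abs(tau - (y <= q)) * abs(y - q)^p)
  optimize(loss, interval = range(y))$minimum
}
set.seed(7)
y <- rt(5000, df = 4)                      # heavy-tailed sample
c(quantile_p1   = lp_quantile(y, 0.99, p = 1),
  intermediate  = lp_quantile(y, 0.99, p = 1.5),
  expectile_p2  = lp_quantile(y, 0.99, p = 2))
```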

Stéphane Girard, Gilles Stupfler, Antoine Usseglio-Carleve
Robust Efficiency Analysis of Public Hospitals in Queensland, Australia

In this study, we utilize various approaches for efficiency analysis to explore the state of efficiency of public hospitals in Queensland, Australia, in the year 2016/17. Besides traditional nonparametric approaches like DEA and FDH, we also use a more recent and very promising robust approach: order-$$\alpha$$ quantile frontier estimators (Aragon et al. 2005). Upon obtaining the individual estimates from the various approaches, we also analyze performance at a more aggregate level, the level of Local Hospital Networks, by using an aggregate efficiency measure constructed from the estimated individual efficiency scores. Our analysis suggests that the relatively low efficiency of some Local Hospital Networks in Queensland can be partially explained by the fact that the majority of their hospitals are small and located in remote areas.

Bao Hoang Nguyen, Valentin Zelenyuk
On the Behavior of Extreme d-dimensional Spatial Quantiles Under Minimal Assumptions

Spatial or geometric quantiles are among the most celebrated concepts of multivariate quantiles. The spatial quantile $$\mu _{\alpha ,u}(P)$$ of a probability measure P over $$\mathbb {R}^d$$ is a point in $$\mathbb R^d$$ indexed by an order $$\alpha \in [0,1)$$ and a direction u in the unit sphere $$\mathcal {S}^{d-1}$$ of $$\mathbb R^d$$, or equivalently by a vector $$\alpha u$$ in the open unit ball of $$\mathbb R^d$$. Recently, Girard and Stupfler (2017) proved that (i) the extreme quantiles $$\mu _{\alpha ,u}(P)$$ obtained as $$\alpha \rightarrow 1$$ exit all compact sets of $$\mathbb R^d$$ and that (ii) they do so in a direction converging to u. These results help in understanding the nature of these quantiles: the first result is particularly striking, as it holds even if P has a bounded support, whereas the second one clarifies the delicate dependence of spatial quantiles on u. However, they were established under assumptions imposing that P is non-atomic, so that it is unclear whether they hold for empirical probability measures. We improve on this by proving these results under much milder conditions, allowing for the sample case. This prevents using gradient condition arguments, which makes the proofs very challenging. We also weaken the well-known sufficient condition for the uniqueness of finite-dimensional spatial quantiles.
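
Concretely, the empirical spatial quantile can be computed as the minimizer of the sample version of $$m \mapsto {\mathbb E}\left(\Vert X - m\Vert + \alpha \, u^T (X - m)\right)$$; the following sketch (simulated data, plain optim()) only illustrates the definition, not the chapter's results.

```r
spatial_quantile <- function(X, alpha, u) {
  obj <- function(m) {
    dev <- sweep(X, 2, m)                          # rows are X_i - m
    mean(sqrt(rowSums(dev^2)) + alpha * dev %*% u) # empirical criterion
  }
  optim(colMeans(X), obj)$par                      # start from the mean
}
set.seed(8)
X <- matrix(rnorm(500 * 2), ncol = 2)
spatial_quantile(X, alpha = 0.9, u = c(1, 0))      # far out in direction u
```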

Davy Paindaveine, Joni Virta
Modelling Flow in Gas Transmission Networks Using Shape-Constrained Expectile Regression

The flow of natural gas within a gas transmission network is studied with the aim of modelling high-demand situations. Knowledge about the latter can be used to optimise such networks. The analysis of data using shape-constrained expectile regression provides deeper insights into the behaviour of gas flow within the network. The models describe the dependence of the maximal daily gas flow on air temperature, including further effects such as day of the week and type of node. Particular attention is given to spatial effects. Geoadditive models offer a combination of such effects and are easily estimated with penalised mean regression. In order to put special emphasis on the highest demands, we use expectile regression, a quantile-like extension of mean regression that offers the same flexibility. Additional assumptions on the influence of the temperature can be added via shape constraints. The forecast of gas loads for very low temperatures based on this approach and the application of the obtained results are discussed.

Fabian Otto-Sobotka, Radoslava Mirkov, Benjamin Hofner, Thomas Kneib

Spatial Statistics and Econometrics

Frontmatter
Asymptotic Analysis of Maximum Likelihood Estimation of Covariance Parameters for Gaussian Processes: An Introduction with Proofs

This article provides an introduction to the asymptotic analysis of covariance parameter estimation for Gaussian processes. Maximum likelihood estimation is considered. The aim of this introduction is to be accessible to a wide audience and to present some existing results and proof techniques from the literature. The increasing-domain and fixed-domain asymptotic settings are considered. Under increasing-domain asymptotics, it is shown that in general all the components of the covariance parameter can be estimated consistently by maximum likelihood and that asymptotic normality holds. In contrast, under fixed-domain asymptotics, only some components of the covariance parameter, constituting the microergodic parameter, can be estimated consistently. Under fixed-domain asymptotics, the special case of the family of isotropic Matérn covariance functions is considered. It is shown that only a combination of the variance and spatial scale parameter is microergodic. A consistency and asymptotic normality proof is sketched for maximum likelihood estimators.
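
A minimal numerical companion, assuming the exponential covariance $$c(h) = \sigma^2 \exp(-h/\rho)$$ (Matérn with smoothness 1/2) on [0, 1]: the sketch below simulates one path and maximizes the Gaussian likelihood; under fixed-domain asymptotics, only the microergodic ratio $$\sigma^2/\rho$$ would be consistently estimated here.

```r
set.seed(9)
n <- 200
s <- sort(runif(n))                            # locations in [0, 1] (fixed domain)
h <- abs(outer(s, s, "-"))                     # distance matrix
sigma2 <- 1; rho <- 0.3
y <- drop(t(chol(sigma2 * exp(-h / rho))) %*% rnorm(n))   # one Gaussian path

negloglik <- function(par) {                   # par = (log sigma2, log rho)
  K <- exp(par[1]) * exp(-h / exp(par[2]))
  R <- chol(K + 1e-10 * diag(n))               # jitter for numerical stability
  z <- backsolve(R, y, transpose = TRUE)       # solves t(R) z = y
  sum(log(diag(R))) + 0.5 * sum(z^2)           # -loglik up to an additive constant
}
fit <- optim(c(0, 0), negloglik)
exp(fit$par)                                   # (sigma2_hat, rho_hat): individually noisy
exp(fit$par[1]) / exp(fit$par[2])              # microergodic sigma2 / rho: stable
```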

François Bachoc
Global Scan Methods for Comparing Two Spatial Point Processes

In many scientific areas such as forestry, ecology, or epidemiology, deciding whether two spatial point patterns are equally distributed is an important issue. This work proposes an adaptation of spatial scan methods, originally designed for local cluster detection, in order to test for the global similarity between two spatial point patterns. We design two spatial global scan statistics based on likelihood ratio on the one hand and on moments on the other, and explain how to compute their significance. A simulation procedure is conducted to compare these global scan methods to others based on kernel density estimation or second-order summary statistics. We also apply them to a dataset of wildfires registered in France.

Florent Bonneu, Lionel Cucala
Assessing Spillover Effects of Spatial Policies with Semiparametric Zero-Inflated Models and Random Forests

The aim of this work is to estimate the variation over time of the spatial spillover effects of a public policy devoted to boosting rural development in France over the period 1993–2002. At the micro data level, it is often observed that the dependent variable, such as local employment in a municipality, does not vary over time, so that we face a kind of zero-inflated phenomenon that cannot be handled with a classical continuous response model or propensity score approaches. We consider two recent nonparametric techniques that are able to deal with this estimation issue. The first approach consists in fitting two generalized additive models to estimate both the probability of no variation and the variation over time of the continuous part of the response. The second approach is based on random forests, which can naturally handle a response that is a mixture of a discrete and a continuous component. Instead of estimating average treatment effects, we take advantage of the flexibility of the nonparametric approaches to estimate what the potential outcome would have been under treatment of a municipality, as well as treatment of the neighboring municipalities, for some particular municipalities chosen as being representative or of particular interest. The results provide evidence of interesting patterns of temporal spatially-mediated spillover effects of the policy, with relevant nonlinear effects. Policy spillovers matter, even if they are generally not high in magnitude, for some municipalities with specific demographic and economic characteristics.

Hervé Cardot, Antonio Musolesi
Spatial Autocorrelation in Econometric Land Use Models: An Overview

This chapter provides an overview of the literature on econometric land use models including spatial autocorrelation. These models are useful to analyze the determinants of land use changes and to study their implications for the environment (carbon stocks, water quality, biodiversity, ecosystem services). Recent methodological advances in spatial econometrics have improved the quality of econometric models allowing them to identify more precisely the determinants of land use changes and make more accurate land use predictions. We review the current state of the literature on studies which account explicitly for spatial autocorrelation in econometric land use models or in the environmental impacts of land use.

Raja Chakir, Julie Le Gallo
Modeling Dependence in Spatio-Temporal Econometrics

This chapter is concerned with lattice data that have a temporal label as well as a spatial label, where these spatio-temporal data appear in the “space-time cube” as a time series of spatial lattice (regular or irregular) processes. The spatio-temporal autoregressive (STAR) models have traditionally been used to model such data but, importantly, one should include a component of variation that models instantaneous spatial dependence as well. That is, the STAR model should include the spatial autoregressive (SAR) model as a subcomponent, for which we give a generic form. Perhaps more importantly, we illustrate how noisy and missing data can be accounted for by using the STAR-like models as process models, alongside a data model and potentially a parameter model, in a hierarchical statistical model (HM).

Noel Cressie, Christopher K. Wikle
Guidelines on Areal Interpolation Methods

The objective of this article is to delve deeper into the understanding and practical implementation of classical areal interpolation methods using R software. Based on a survey paper from Do et al. (Spat Stat 14:412–438, 2015), we focus on four classical methods used in the area-to-area interpolation problem: point-in-polygon, areal weighting interpolation, dasymetric method with auxiliary variable and dasymetric method with control zones. Using the departmental election database for Toulouse in 2015, we find that the point-in-polygon method can be applied if the sources are much smaller than the targets; the areal interpolation method provides good results if the variable of interest is related to the area, but otherwise, a good alternative is to use the dasymetric method with another auxiliary variable; and finally, the dasymetric method with control zones allows us to benefit from both areal interpolation and dasymetric method and, from that perspective, seems to be the best method.
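
The arithmetic behind areal weighting interpolation is simple enough to state in full: each source-zone total is shared among target zones in proportion to intersection areas, under the homogeneity assumption. A toy base-R computation with invented areas and counts:

```r
## intersection areas (rows: 2 source zones, columns: 3 target zones);
## the numbers are made up for illustration
A <- rbind(c(4, 1, 0),
           c(0, 2, 3))
Y_source <- c(1000, 600)              # e.g., counts of voters per source zone
W <- A / rowSums(A)                   # share of each source falling in each target
Y_target <- colSums(W * Y_source)     # interpolated target-zone counts
Y_target                              # totals are preserved: sums to 1600
```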

Van Huyen Do, Thibault Laurent, Anne Vanhems
Predictions in Spatial Econometric Models: Application to Unemployment Data

In the context of localized unemployment rates in France, we study the issue of prediction in spatial econometric models for areal data, applying the prediction formulas gathered and derived in Goulard et al. (Spatial Economic Analysis, 12(2–3), 304–325, 2017). To model regional unemployment taking into account local interactions, we estimate several spatial econometric model specifications, namely the spatial autoregressive (SAR) and spatial Durbin (SDM) models, as well as the SLX model. We consider both types of predictions, namely in-sample and out-of-sample prediction. We show that prediction can be a complementary method to testing procedures for model comparison.
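
As a reminder of the reduced form on which such prediction formulas are built, in the SAR model $$y = \rho W y + X\beta + \varepsilon$$ the simplest in-sample predictor is $$\hat{y} = (I - \rho W)^{-1} X\beta$$. The sketch below uses a toy ring neighborhood and invented parameters; it is not one of the refined predictors compared in the chapter.

```r
set.seed(12)
n <- 6
W <- matrix(0, n, n)
W[cbind(1:n, c(2:n, 1))] <- 1            # ring neighborhood, already row-standardized
rho <- 0.4                               # invented spatial autocorrelation parameter
beta <- c(1, 2)
X <- cbind(1, rnorm(n))
A <- diag(n) - rho * W
y <- solve(A, X %*% beta + rnorm(n))     # simulate from the SAR reduced form
y_hat <- solve(A, X %*% beta)            # E[y | X] under the (known-parameter) model
cbind(y, y_hat)
```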

Thibault Laurent, Paula Margaretic
Lagrangian Spatio-Temporal Nonstationary Covariance Functions

The Lagrangian reference frame has been used to model spatio-temporal dependence of purely spatial second-order stationary random fields that are being transported. This modeling paradigm involves transforming a purely spatial process to spatio-temporal by introducing a transformation in the spatial coordinates. Recently, it has been used to capture dependence in space and time of transported purely spatial random fields with second-order nonstationarity. However, under this modeling framework, the presence of mechanisms enforcing second-order nonstationary behavior introduces considerable challenges in parameter estimation. To address these, we propose a new estimation methodology which includes modeling the second-order nonstationarity parameters by means of thin plate splines and estimating all the parameters via two-step maximum likelihood estimation. In addition, through numerical experiments, we tackle the consequences of model misspecification. That is, we discuss the implications, both in the stationary and nonstationary cases, of fitting Lagrangian spatio-temporal covariance functions to data generated from non-Lagrangian models, and vice versa. Lastly, we apply the Lagrangian models and the new estimation technique to analyze particulate matter concentrations over Saudi Arabia.

Mary Lai O. Salvaña, Marc G. Genton

Compositional Data Analysis

Frontmatter
Logratio Approach to Distributional Modeling

Distributional data, such as age distributions of populations, can be treated as continuous or discrete data, but the main interest is in the relative information, e.g., in terms of ratios (or logratios) between the different age classes. Here we present a unifying framework for the discrete and the continuous case based on the theory of Bayes spaces. While the discrete case is more widely treated in the literature, the continuous case allows a link to be made to functional data analysis. Moreover, the methodological framework of Bayes spaces can also be used to develop methods for analyzing several distributional variables simultaneously. It turns out that the centered logratio transformation is a convenient tool for practical computations. Two real data examples illustrate the usefulness of this framework.
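
The centered logratio transformation mentioned above takes two lines of base R: log-transform each composition and center by its log geometric mean, so that the scores of each composition sum to zero. A minimal sketch with invented compositions:

```r
clr <- function(x) {
  lx <- log(x)
  sweep(lx, 1, rowMeans(lx))        # subtract the log geometric mean row-wise
}
comp <- rbind(c(0.2, 0.3, 0.5),
              c(0.1, 0.1, 0.8))     # two 3-part compositions
z <- clr(comp)
rowSums(z)                          # clr scores sum to zero in each row
```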

Peter Filzmoser, Karel Hron, Alessandra Menafoglio
A Spatial Durbin Model for Compositional Data

A compositional linear model (regression of a scalar response on a set of compositions) for areal data is proposed, where observations are not independent but present spatial autocorrelation. Specifically, we build on the spatial Durbin model, since it produces unbiased coefficient estimates compared to other spatial linear regression models (including the spatial error model, the spatial autoregressive model, the Kelejian-Prucha model, and the spatial Durbin error model). The orthonormal log-ratio (olr) transformation based on a sequential binary partition of compositions and the maximum likelihood estimation method are employed to estimate the new model. We also evaluate the proposed estimators on a simulated and a real dataset.

Tingting Huang, Gilbert Saporta, Huiwen Wang
Compositional Analysis of Exchange Rates

Triangular arbitrage in the foreign exchange market of a group of countries exists whenever it is possible to make a profit by buying and selling their currencies using the spot exchange rates. Working in the framework of the Aitchison geometry, and using characterizations of the absence of triangular arbitrage, we present two applications to the currencies of Brazil (Real), the European Union (Euro), Great Britain (Pound Sterling), and the United States of America (US Dollar). The first application refers to the Special Drawing Rights, an asset created by the International Monetary Fund to provide liquidity to the member countries. The exchange rates matrix is projected onto the subspace of no-arbitrage exchange rate matrices, and its unique eigenvector associated with a non-null eigenvalue is shown to be compositional and close to the Special Drawing Rights. The second application studies the relative exchange rate bubbles among the countries. It uses the closest no-arbitrage matrix of an exchange rate matrix and the purchasing power parity values for the fundamental exchange rates to analyze the dynamics of those bubbles. These applications show the potential of the compositional approach for analyzing matrices of exchange rates.
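
For intuition: with reciprocal quotes, absence of triangular arbitrage means the log-rate matrix has the form $$L_{ij} = w_i - w_j$$, and for such log-antisymmetric matrices the Frobenius projection onto the no-arbitrage subspace takes w equal to the row means of log R. The rates below are invented numbers, and this is a simplified stand-in for the projection used in the chapter.

```r
R <- rbind(c(1,      0.92,   1.25),
           c(1/0.92, 1,      1.36),
           c(1/1.25, 1/1.36, 1))    # 3 currencies, reciprocal quotes (toy values)
L <- log(R)                         # log-antisymmetric by construction
w <- rowMeans(L)
R_star <- exp(outer(w, w, "-"))     # closest no-arbitrage rate matrix
R_star
max(abs(R - R_star))                # rough size of the arbitrage opportunities
```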

Wilfredo L. Maldonado, Juan José Egozcue, Vera Pawlowsky-Glahn
Log-contrast and Orthonormal Log-ratio Coordinates for Compositional Data with a Total

Compositional data require an appropriate statistical analysis because they provide the relative importance of the parts of a whole. Methods based on log-ratio coordinates give a consistent framework for analyzing this type of data. Any statistical model including variables created using the original parts should be formulated according to the geometry of the simplex. This geometry includes the log-contrast: a simple way to express a set of log-ratios in a linear form. Basic concepts and properties of log-ratios, log-contrasts, and orthonormal coordinates are revisited. In addition, we introduce an approach that includes both the log-ratio orthonormal coordinates and an auxiliary variable carrying absolute information. We illustrate the approach through the principal component analysis and discriminant analysis of real data sets.

Josep Antoni Martín-Fernández, Carles Barceló-Vidal
Independent Component Analysis for Compositional Data

Compositional data represent a specific family of multivariate data, where the information of interest is contained in the ratios between parts rather than in absolute values of single parts. The analysis of such specific data is challenging as the application of standard multivariate analysis tools on the raw observations can lead to spurious results. Hence, it is appropriate to apply certain transformations prior to further analysis. One popular multivariate data analysis tool is independent component analysis. Independent component analysis aims to find statistically independent components in the data and as such might be seen as an extension to principal component analysis. In this paper, we examine an approach of how to apply independent component analysis on compositional data by respecting the nature of the latter and demonstrate the usefulness of this procedure on a metabolomics dataset.
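
A sketch of the pipeline in miniature: map the compositions to logratio coordinates, then run ICA on those coordinates. The code assumes the fastICA package is installed and uses a hand-built orthonormal contrast matrix; the data are simulated, not the metabolomics example.

```r
library(fastICA)                                  # assumed available
set.seed(16)
n <- 300
## simulate 3-part compositions from two independent non-Gaussian latent signals
S <- cbind(runif(n, -1, 1), rexp(n) - 1)
Z <- S %*% matrix(c(1, 0.5, 0.3, 1), 2, 2)        # mixed logratio coordinates
V <- cbind(c(1, -1, 0) / sqrt(2),                 # orthonormal contrast (ilr-type)
           c(1, 1, -2) / sqrt(6))
comp <- exp(Z %*% t(V))
comp <- comp / rowSums(comp)                      # back to the simplex

## analysis: logratio coordinates first, then ICA on the coordinates
z <- log(comp) %*% V                              # contrast columns sum to zero,
                                                  # so centering cancels out
ica <- fastICA(z, n.comp = 2)
cor(ica$S, S)                                     # recovered vs latent signals
```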

Christoph Muehlmann, Kamila Fačevicová, Alžběta Gardlo, Hana Janečková, Klaus Nordhausen
Diet Quality and Food Sources in Vietnam: First Evidence Using Compositional Data Analysis

Food environments have been evolving rapidly in lower-middle-income countries. Nevertheless, little is known about the impact of these changes on diet quality. Thanks to the availability of detailed data on Vietnamese household consumption, this chapter presents a set of first results on the association between food sources and diet quality. These results highlight the contrasts between three Vietnamese districts located on an urban to rural gradient. We used recent advances in compositional data analysis to take into account the compositional nature of the share data describing the different food sources: principal balances as a tool for summarizing information carried by share data and techniques to deal with observed zero-valued shares.

Michel Simioni, Huong Thi Trinh, Tuyen Thi Thanh Huynh, Thao-Vy Vuong

Tools for Empirical Studies in Economics and Social Sciences

Frontmatter
Mobility for Study and Professional Integration: An Empirical Overview of the Situation in France Based on the Céreq Generational Surveys

This chapter serves to elucidate the empirical reality of geographical mobility among students and young graduates, based on data from five generational surveys conducted by Céreq. Our study shows that the degree of mobility between students' region of origin, region of education, and region of employment is relatively low: fewer than one in three high school graduates move to another region for their university studies, and fewer than one in three university graduates move to another region to find employment. The children of senior executives and holders of Master's degrees are more likely to move to another region to pursue further education or find employment. Furthermore, more than half of such interregional movements correspond to people returning home. These results appear to demonstrate that individuals remain strongly geographically rooted: relatively few people move, and some of those movements correspond to people returning home.

Bastien Bernela, Liliane Bonnal, Pascal Favard
Toward a FAIR Reproducible Research

Two major movements are actively at work to change the way research is done, shared, and reproduced. The first is the reproducible research (RR) approach, which has never been easier to implement given the current availability of tools and DIY manuals. The second is the FAIR (Findable, Accessible, Interoperable, and Reusable) approach, which aims to support the availability and sharing of research materials. We show here that despite the efforts made by researchers to improve the reproducibility of their research, the initial goals of RR remain mostly unmet. There is great demand, both within the scientific community and from the general public, for greater transparency and for trusted published results. As a scientific community, we need to reorganize the diffusion of all materials used in a study and to rethink the publication process. Researchers and journal reviewers should be able to easily use research materials for reproducibility, replicability, or reusability purposes or for exploration of new research paths. Here we present how the research process, from data collection to paper publication, could be reorganized and introduce some already available tools and initiatives. We show that even in cases in which data are confidential, journals and institutions can organize and promote “FAIR-like RR” solutions where not only the published paper but also all related materials can be used by any researcher.

Christophe Bontemps, Valérie Orozco
“One Man, One Vote” Part 2: Measurement of Malapportionment and Disproportionality and the Lorenz Curve. A: Introduction and Measurement Tools

The main objective of this paper is to explore and estimate the departure from the “One Man, One Vote” principle in the context of political representation, and its consequences for distributive politics. To measure the inequalities in the representation of territories (geographical under/over-representation) or opinions/parties (ideological under/over-representation), we import, with some important qualifications and adjustments, the Lorenz curve, an important tool in the economics of income distribution. We subsequently consider some malapportionment and disproportionality indices. We provide several applications of these concepts in Chap. 32.
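
To make the imported tool concrete, here is a minimal representation Lorenz curve with invented data: constituencies are sorted by seats per inhabitant, cumulative population share is plotted against cumulative seat share, and a Gini-type malapportionment index is read off by the trapezoidal rule. This is only the textbook construction, not the adjusted indices of the chapter.

```r
pop   <- c(100, 200, 300, 400)       # populations of 4 constituencies (invented)
seats <- c(2, 2, 3, 3)               # seats allocated to each
ord <- order(seats / pop)            # sort by representation per capita
p <- cumsum(pop[ord])   / sum(pop)   # cumulative population share
s <- cumsum(seats[ord]) / sum(seats) # cumulative seat share
p0 <- c(0, p); s0 <- c(0, s)

## Gini-type index: one minus twice the area under the Lorenz curve
gini <- 1 - sum(diff(p0) * (s0[-1] + s0[-length(s0)]))
plot(p0, s0, type = "l", xlab = "population share", ylab = "seat share")
abline(0, 1, lty = 2)                # perfect apportionment diagonal
gini
```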

Olivier de Mouzon, Thibault Laurent, Michel Le Breton
“One Man, One Vote” Part 2: Measurement of Malapportionment and Disproportionality and the Lorenz Curve. B: Applications

This chapter contains applications of the tools for the evaluation of malapportionment and disproportionality already presented in Chap. 31. They are applied to the 2010 Electoral College and to French parliamentary and local elections, with special attention to the electoral reform of 2015. In these applications, the Lorenz curve ordering is almost conclusive; consequently, the Gini and DK indices are aligned and complement the almost complete ranking derived from the Lorenz curve.

Olivier de Mouzon, Thibault Laurent, Michel Le Breton
Visualizing France with Cartograms

France has a long tradition of using statistical (choropleth) maps, which use shading to represent the spatial distribution of a variable, such as population, by department. Such maps lead the observer to underestimate the importance of urban areas, especially Paris. A solution that complements the choropleth map is to create a cartogram, which deliberately distorts each department so that the area is in proportion to the variable (such as population). Shading can then be used to show a second variable, typically representing density, on the same map. We illustrate the use of cartograms for the case of metropolitan France, with maps that show the spatial distribution of social housing, unemployment, immigration, suicides, election patterns, and the advance of COVID-19. The maps are relatively straightforward to construct, using ArcMap, but attention is needed to the use of colors and classifications. The cartograms reveal patterns that would not be clear based solely on traditional statistical maps.

Jonathan Haughton, Dominique Haughton
Kernel and Dissimilarity Methods for Exploratory Analysis in a Social Context

While most statistical methods for prediction or data mining have been built for data made of independent observations of a common set of p numerical variables, many real-world applications do not fit in this framework. A more common and general situation is the case where a relevant similarity or dissimilarity can be computed between the observations, providing a summary of their relations to each other. This setting is related to the kernel framework, which has made it possible to extend most standard statistical supervised and unsupervised methods to any type of data for which a relevant kernel can be obtained. The present chapter aims at presenting kernel methods in general, with a specific focus on the less studied unsupervised framework. We illustrate their usefulness by describing the extension of self-organizing maps and by proposing an approach to combine kernels in an efficient way. The overall approach is illustrated on categorical time series in a social-science context, showing how the choice of a given type of dissimilarity or group of dissimilarities can influence the output of the exploratory analysis.

Jérôme Mariette, Madalina Olteanu, Nathalie Vialaneix
Of Particles and Molecules: Application of Particle Filtering to Irrigated Agriculture in Punjab, India

We present an estimation method for agricultural crop yield functions when unobserved productivity depends on water availability that is only partially observed. Using the setting of Bayesian nonlinear filtering for estimating hidden Markov models, we discuss joint estimation of state variables and parameters in a structural production model with potentially endogenous regressors. An extension to particle filtering with resampling, the convolution filter based on kernel regularization, is then discussed. We apply this nonparametric method to estimate a system of structural equations for rice crop yield and unobserved productivity on panel data for 10 districts in Punjab, India. Results based on computer-intensive resampling steps illustrate the value of convolution particle filtering techniques, with a low interquartile range of time-varying estimates. We compare fertilizer elasticity estimates with and without accounting for unobserved productivity, and we find a significant relationship between unobserved productivity and nitrogen fertilizer input when the former is conditioned on district-level climate variables (summer rainfall, potential evapotranspiration).
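
For readers unfamiliar with the underlying machinery, here is a generic bootstrap particle filter with multinomial resampling on a toy nonlinear state-space model; it is not the convolution filter with kernel regularization used in the paper, nor its crop-yield system. All model parameters below are invented.

```r
set.seed(15)
T_len <- 50; N <- 1000
x_true <- numeric(T_len)
x_true[1] <- rnorm(1)
for (t in 2:T_len) x_true[t] <- 0.8 * x_true[t - 1] + rnorm(1, sd = 0.5)
y <- x_true^3 / 5 + rnorm(T_len, sd = 0.3)        # nonlinear observation equation

xp <- rnorm(N)                                    # initial particle cloud
x_filt <- numeric(T_len)
for (t in 1:T_len) {
  if (t > 1) xp <- 0.8 * xp + rnorm(N, sd = 0.5)  # propagate through the state eq.
  w <- dnorm(y[t], mean = xp^3 / 5, sd = 0.3)     # reweight by the likelihood
  w <- w / sum(w)
  x_filt[t] <- sum(w * xp)                        # filtering mean at time t
  xp <- sample(xp, N, replace = TRUE, prob = w)   # multinomial resampling
}
cor(x_filt, x_true)                               # tracking quality of the filter
```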

Alban Thomas
Metadata
Title
Advances in Contemporary Statistics and Econometrics
Editors
Abdelaati Daouia
Anne Ruiz-Gazen
Copyright Year
2021
Electronic ISBN
978-3-030-73249-3
Print ISBN
978-3-030-73248-6
DOI
https://doi.org/10.1007/978-3-030-73249-3
