The Representative Household Survey
The first classic type of statistical infrastructure examined here has its origins in France in the 1950s. It was imported from the United States by public statisticians working at the INSEE (Institut National de la Statistique et des Études Économiques, the French National Institute of Statistics and Economic Studies) in the immediate post-war period, who had spent time across the Atlantic training in the latest methodological innovations.[6] Among other tools, they brought back the random sample survey (then used in the area of employment). This technique led to the adoption of the now standard notion of statistical representativeness, providing the necessary conditions for the description of the national population as a whole, as opposed to the earlier targeting of specific subpopulations (Desrosières, 2008, vol. 2, ch. 8, p. 194). This period was characterized by a strong belief in the scientificity of a ‘new statistical language’ for economic matters (Desrosières, 2008, vol. 2, ch. 3), which coincided with the desire to depict a post-war society in full reconstruction. Numerous surveys were created at the time, forming a first point of departure for repeated studies still active today. Described as ‘structural’, these surveys dealt with general themes that both organized administrative action and reported on French daily life: employment (1950), housing (1955), family budgets (1956), health (1960), training and employment skills (1963), time-use (1966). The ‘programme of priority surveys on standards of living’, adopted in 1965 under the Fifth Plan, reflects how closely this type of survey matched the objectives of knowing and managing social and economic life pursued at the time by the General Planning Commissioner.
The questionnaires of these surveys are usually short and in paper format.[7] They have only a few filters (technical indications that determine whether to ask one or a set of questions) and the general principle is to use the same questions for the entire population, with identical formulations and response options for all respondents. They follow the model of a social identity card resembling, for example, the census report which de facto defines the principal socio-demographic characteristics of the population. There is thus only space for the ‘major variables’,[8] or those of an administrative nature, approved by public statistics and assumed to be unanimously and uniformly understood. These variables generally go hand in hand with legal categories or are derived from institutions, such as civil status registers for sex and age, nationality and country of birth for geographic origin, diplomas or nationally certified training for level of education, administrative subdivisions (departments, regions) for place of residence or work, contractual terms for professional situation (type of contract, working time). As suggested by Michel Gollac (1997), the law saves on construction costs, as there is a shared belief that the categories that refer to the law are solid.
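The notion of a filter, defined above as an indication that determines whether a question is asked, can be made concrete with a minimal sketch. The question keys, wordings and the single filter below are hypothetical and are not taken from any actual INSEE questionnaire; the sketch only assumes that a filter is a condition evaluated on the answers already collected.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Answers = Dict[str, str]


@dataclass
class Question:
    key: str
    text: str
    # Filter: returns True when the question must be asked.
    # None means the question is put to every respondent.
    ask_if: Optional[Callable[[Answers], bool]] = None


# Hypothetical short questionnaire: three unfiltered 'major variables'
# and one question guarded by a filter.
QUESTIONNAIRE: List[Question] = [
    Question("sex", "Sex?"),
    Question("age_group", "Age group?"),
    Question("employed", "Are you currently employed? (yes/no)"),
    Question(
        "contract",
        "Type of contract?",
        ask_if=lambda answers: answers.get("employed") == "yes",
    ),
]


def questions_asked(questionnaire: List[Question], scripted: Answers) -> List[str]:
    """Return the keys of the questions actually put to one respondent.

    `scripted` plays the role of the respondent: it maps question keys
    to the answers that would be given if the question were asked.
    """
    asked: List[str] = []
    collected: Answers = {}
    for question in questionnaire:
        if question.ask_if is not None and not question.ask_if(collected):
            continue  # the filter routes the interview past this question
        asked.append(question.key)
        if question.key in scripted:
            collected[question.key] = scripted[question.key]
    return asked


not_employed = {"sex": "F", "age_group": "30-39", "employed": "no"}
employed = {"sex": "M", "age_group": "40-49", "employed": "yes", "contract": "permanent"}
path_a = questions_asked(QUESTIONNAIRE, not_employed)
path_b = questions_asked(QUESTIONNAIRE, employed)
```

With so few filters, almost every respondent follows the same path through the questionnaire, which is what allows identical formulations and response options for the entire population.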
Due to high production costs up until the introduction of micro-computing (survey samples are smaller in size and their analysis is constrained by limited automated processing capacities), the results often take the form of tables or charts, with few intersecting variables: on one side there are indicators on employment, housing, health, etc., and on the other side those corresponding to socio-demographic characteristics (sex, age cohort, nationality, region of residence, etc.). Their purpose is to provide ‘photographs’, thematic snapshots of society, so as to gain an understanding of its organization and functioning. The periodic reissue of the surveys provides insight into macro-social dynamics. The results are produced according to schemas of a structural-functionalist inspiration that govern the elaboration of the surveys. It is in this manner that the demographic, social or economic behaviour of the population and its households is studied: the economy and social matters primarily being the domain of the ‘head’ (a man)[9]; the domestic and familial reserved for women, their spouses.[10]
Public statistics considers the household to be the central unit of analysis. Indeed, this is the title given, for almost twenty-five years, to the section primarily responsible for designing statistical infrastructure on population at INSEE: the Population and Household Department (from 1966 to 1989). The notion of household separates the interior (private) from the exterior (public) according to a strict gender division of roles. In addition to accommodating a male-dominated vision of the world, this concept also reflects a holistic vision of society and the economy. By law, men and women long had clearly assigned roles within the household. Women could not work without their husband’s permission until 1965 in France, and divorce by mutual consent was not introduced until 1975.[11] In this context, the statistical household is seen as a full-fledged economic actor in terms of income, consumption, savings, or economic expectations. As such, according to the monthly business survey (still in place today), households have opinions, independent of the men and women, parents or children, who compose them.
The statistical nomenclature of socio-professional categories occupies a special place in this survey model. Its success was total during the three decades following its creation in the early 1950s (Desrosières & Thévenot, 1988). Broken down at the level of head of household or father, these categories are systematically used in statistical tables, evincing class inequalities or those of social origin.[12] This ‘major variable’ is emblematic of the back and forth between public statistics (and its surveys) and the socio-economic administration in France at the time: on the one hand, statistical nomenclature draws on social categories, occupational subdivisions which, backed by the law, are in place within companies and administrations; on the other hand, it is used directly by social actors, whether under the General Planning Commissioner or, to mention just one example, in the indexation of the minimum wage which gives rise to national negotiations between labour unions and employers’ organizations.
The scope of these surveys is usually households in ordinary accommodations in metropolitan France, which compose the statistical heart of society, an echo of the electoral body. The non-zero probability of selection of households that organize the sampling procedures can, in fact, be thought of as the equivalent of a statistical right to vote. Furthermore, the methods of analysis used are essentially summation techniques, much like the adding up of votes in an election. The notion of representativeness is central here, its statistical meaning lending political acceptance to the term. If, in addition, we consider the particular role played by the law and institutions, this survey model certainly seems emblematic of the representative democracy of intermediary bodies characteristic of France from the 1950s to the 1980s. Indeed, the General Planning Commissioner constituted one of the primary transmission channels in organizing government and social partner participation in the elaboration of medium-term policies based, specifically, on predictions from statistical surveys.
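The analogy between selection probabilities and summation can be illustrated with a minimal sketch of a weighted total, in which each sampled household counts for the inverse of its probability of selection (the estimator usually attributed to Horvitz and Thompson). The function names and the figures are hypothetical illustrations, not a description of INSEE’s actual estimation procedures.

```python
from typing import List, Sequence


def sampling_weights(selection_probs: Sequence[float]) -> List[float]:
    """Turn selection probabilities into weights (1 / probability).

    A probability of zero would amount to having no 'statistical right
    to vote': the household could never be represented, so it is rejected.
    """
    weights = []
    for p in selection_probs:
        if not 0 < p <= 1:
            raise ValueError("selection probabilities must lie in (0, 1]")
        weights.append(1.0 / p)
    return weights


def weighted_total(values: Sequence[float], selection_probs: Sequence[float]) -> float:
    """Estimate a population total as a weighted sum over the sample."""
    if len(values) != len(selection_probs):
        raise ValueError("one selection probability is needed per sampled unit")
    weights = sampling_weights(selection_probs)
    return sum(w * y for w, y in zip(weights, values))


# Hypothetical sample of four households: 1 if the household reports
# an employed member, 0 otherwise.
has_employed_member = [1, 0, 1, 1]
probs = [0.5, 0.5, 0.25, 0.25]

estimated_households = sum(sampling_weights(probs))              # 2 + 2 + 4 + 4
estimated_with_employment = weighted_total(has_employed_member, probs)  # 2 + 0 + 4 + 4
```

Here each sampled household ‘speaks’ for two or four households of the population, and the published figure is obtained by simply adding up these weighted contributions.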
This political-administrative tone is found in the term ‘survey’ itself. While it has certainly been used in a generic sense since the post-war period within the French statistical community, it also refers, originally, to the search for information within a judicial framework, then by further extension the systematic collection of testimonies or documentation aimed at clarifying an issue or dispute.[13] These different definitions share the implicit meaning of an ‘unveiling’; of obtaining private, sometimes secret or hidden, information. The word also recalls the asymmetry of the survey context, which has long been reinforced by the sociological profile of INSEE interviewers (former gendarmes and military personnel) (see Dussert, 1996). This asymmetry is, moreover, particularly significant in that it distinguishes between ordinary households and public agents. The statistical survey, notably that which follows this first model, is marked by the State’s seal: official notification letters often accompanied by an obligation to respond, the professional card of the interviewer knocking on the door bearing the colours of the flag. Such elements give off an air of administrative questioning.
This survey model was for many years the only one that existed. Then progressively, starting in the late 1970s and particularly in the 1980s, two other models were developed. That said, this initial form has not disappeared, but rather continues to exist, giving rise to hybridization between original and emergent models with micro-computerization and the integration of longitudinal questionings (see ‘matched panels’ below). Today they are essentially annual surveys by wave (on the labour force, housing, etc.) which, as a continuation of their antecedents from the 1950s and 1960s, compose the ‘backbone’ of the INSEE statistical infrastructure.[14] There have, however, been two notable developments. First, they are increasingly governed by European regulations, for the purposes of updating social descriptions at the continental level in the form of reporting indicators or national barometers relying on several ‘core variables’. While similar to the ‘major variables’ mentioned above, the latter differ in not always referring to institutional categories, in the absence of common institutions at the European level.[15] Second, they aim to more fully cover the entire population, either by surveying segments of the population usually labelled as “outside the scope”, such as the homeless or those in institutions (prisons, health or social establishments), or by geographical extension of existing statistical infrastructure (such as for the overseas departments).
The Biographical Investigation
The second type of statistical infrastructure on population examined here has its origins at INSEE during a time of reflection and critique of the social sciences in France starting in the mid-1970s with, for example, the shift towards Pierre Bourdieu’s critical sociology, affirmation of the work of Michel Foucault, and the start of Luc Boltanski and Laurent Thévenot’s pragmatic sociology. A two-fold movement of diversification thus began to shape large-scale public statistical infrastructure. First, in terms of the variables used, with less primacy given to institutional categories in the questionnaires, and more openness towards the social sciences, whereby theoretical advances and methodological observations of ordinary practices provided new ways of questioning the world. Secondly and more broadly, the themes of such infrastructure diversified, going well beyond the economic behaviours of households and the socio-demographic characteristics of their members.[16]
The ‘biographical investigation’ questionnaires stand out for their length and evident distancing from examinations of a more administrative nature. They follow a linear path much like the biographical interviews of interpretive sociology,[17] and frequently employ retrospective questions and timelines to reconstruct respondents’ trajectories. In this regard, two practical protocols were developed. The timelines can, on the one hand, rely on paper chronologies allowing respondents to mark their own points of reference (a birth, a move, a promotion, etc.) and in this way reconstruct different parallel accounts (i.e. familial, residential, professional, etc.). On the other hand, or simultaneously, the timelines can be assessed using resources offered by computerization, allowing stories to unfold gradually as a function of past events. Thanks to filters and the configuration of successive questions, the survey fits the life of the respondent like a glove. In this model, the survey situation targets the unit of time and of place in order to ensure the coherence of responses, which rely on memory recall, sensitivity, and the perception of contexts, and thus depend on the interaction between interviewer and interviewee.
The objective of ‘biographical investigations’ is less about consistency with official categories (specific to the ‘representative household surveys’) or the pureness of ‘matched panels’ (see below) than the sincerity and coherence of the responses provided. This statistical infrastructure model has contributed to the development of new questions relative to emotional experiences, whether they be physical or mental, using a subjective (perception or opinion; feeling or emotion) or more objective (ordinary situations, practical experiences) approach. In this way, violence, suffering or hardship, physical ailments or bodily nuisances, satisfaction or happiness, freedom, etc., become ‘statisticable’ notions. Simultaneously, information on the temporal context or local environment is collected, on different levels or according to different timeframes, often fixed by the respondents themselves. Multiple ‘nested’ descriptive circles can thus be identified: from the closest members who compose the ‘living unit’ or ‘relations’ (terms which invite moving beyond the alleged unicity of the ‘household’) to the furthest, such as the social class or geographical area to which one feels belonging.
Examples of statistical infrastructure within this model, which share some or all of its features, are as diverse in their themes as in the government departments or administrative bodies that produce them. They were originally carried out mostly by INED (the French Institute for Demographic Studies), where in the 1980s demographers began implementing statistical modelling for the analysis of biographies (Courgeau & Lelièvre, 1989),[18] and then a decade later, multilevel or contextual analyses (Baccaini & Courgeau, 1997).[19] Such statistical infrastructures have, however, also subsequently been used by a number of other public institutions. Two particularly stand out for their attention to the biographical dimension, to perceptions of past situations, and the importance given to the contexts in which personal trajectories unfold. The first, the Health and Career Path survey (2006 and 2010; Santé et itinéraire professionnel) asks respondents to reconstruct their entire professional careers while also indicating major health events, with the objective of understanding how health and work influence one another over time. The second, the Life History - Construction of Identities survey (2003; Histoire de vie - construction des identités) combines a complete retrospective timeline (residence, family, employment, economic well-being) with questions aimed at understanding the articulation of the latter with different facets of personal identity (e.g. family, work, friends, hobbies, health, origins, etc.). As explained in the survey’s guideline note, the idea is to account for the multiple processes of identifying individuals with places, groups, histories, values: “the individual bears multiple identities” whose “main dimensions must be explored” (see Héran, 1998). These two statistical infrastructures both leave a great deal of freedom to the respondents in how they mark their biographical itineraries and give more room to their subjectivity.
More than one method has been developed to use data collected in this way. Duration models, chronograms and, more generally, life course analysis methods all aim to understand procedural logics, successive choices, bifurcations, potential disruptions or protections of a given trajectory.[20] In a different way, exploratory factorial analyses can show both structural oppositions within the population and the coherence of answers for each respondent. Results can lead to a first, inductive, modelling of areas previously little explored and where a full understanding has not been reached in the absence of structuring ‘major variables’.[21] More generally, “biographical investigations” seem to be consistent with the desire to reconcile the holism and individualism we find in the “new sociologies” described by Philippe Corcuff (2007 [1995]).[22] In two different registers and disciplines, the methods derived from Amartya Sen’s capability approach in economics and the multilevel analyses used in demography (Baccaini & Courgeau, 1997) have been adopted in efforts to understand the effect of situated interactions and local contexts.[23]
These approaches all have in common the fact that statistical representativeness is not the primary concern. Certainly the statistical infrastructure relies on a random sampling procedure and the subsequent analyses often use weights based on the latter, but this use is secondary in that it is the processes, the consistencies or oppositions, that are of particular interest. Echoing the sociological interview principle, the methods share a reasoning ‘by row’ at least as much as ‘by column’; in the sense that they first follow the logic of the respondents, not that of the variables. This sort of thinking is present from the very conception of the questionnaires—the queries are formulated using verbatim accounts from sociological studies—and of their computerization (filters linked to previous responses, which act like reminders during an interview). The term ‘investigation’ used here recalls the exploratory dimension of this statistical infrastructure: simultaneously as a study of ‘rows’ (that is to say, an attempt to reconstruct, for each person, their complex biographical history, their subjectivity, their social inscription) and taken as a whole, in many ways following an exploratory research approach.
The Matched Panels
The third form of statistical infrastructure responds to a micro-causalist agenda, which differs both from grand narratives which collective entities deploy as historical causes, or biographical narrations where causality is presented in a singular way. The statistical infrastructure that follows this logic was made possible by unprecedented advances in information processing, and maintains a close relationship with micro-econometrics. Long dominated by macro-structural approaches, they were developed along with individual computerization in the 1990s before becoming dominant in the next decade. Like the ‘biographical investigation’, this statistical infrastructure uses a large number of variables. However, in contrast, their collection of information does not necessarily suppose a specific unit of place and of time. While the information gathered can certainly draw on questions asked by an interviewer during a single interview, the latter is just as likely to derive from subsequent interviews (possibly with different interviewers and respondents for the same panel unit) or from matching with external data of diverse origins. In fact, the origin or the situated consistency of the collected information is not of all that much importance. What counts is their quality, thought of as intrinsic, and the ability of the statistical infrastructure to amass pertinent variables for each respondent so as to be able to saturate the explanatory models with the phenomena under analysis.
When the classic format of the face-to-face interview is used, this form of statistical infrastructure relies on a complex questionnaire including numerous case disjunctions, where computerization plays a crucial role. Multiple filters and parameter settings make it possible to adjust the questioning and adapt the formulations so as to obtain, for each respondent, the best measure of the targeted variables. Computerized questionnaires are not modelled on either the social identity card of the ‘representative household survey’, or the carefully tailored ‘biographical investigation’. They are foremost routines, computer programmes, difficult to grasp in their entirety. Indeed, the initial structuring requires a paper format translation, necessary in order to give analysts a clear understanding. The current Labour Force Survey (Enquête Emploi) questionnaire provides a good example: its paper version is more than 80 pages long (compared to the simple DIN-A3 sheet in its original form); its reading, broken up by numerous filters and parameter settings, underlines the complex task of trying to construct a linear, narrative version. More broadly, an intricate set of documents (technical notes, training guide, survey lexicon or glossary, etc.) is necessary for using the ‘data’. These questionnaires, fragmented and complex, in reality blend into the object they target, i.e. panel individuals; direct questioning is one way (among others) of obtaining information deemed pertinent. The collected variables rarely, or secondarily, refer to practical experience or to institutional categories, but rather, ideally and in principle, to observed facts[24] in accordance with theoretical categories which, besides, derive more from the natural sciences than the social sciences. They are ideally continuous, being categorical only when imposed by constraints (conditioning of reality or resulting from data collection arrangements). It is about ‘good variables’ (to use a common econometric expression), and not ‘ordinary variables’ or ‘major variables’ which are both too dependent on the administrative, spatial or temporal context. Education, for example, is thus ideally measured by number of years of study, and not by diplomas obtained (for which recognition by the State can vary over time) or the level of education reported (susceptible to perception bias). The aim of establishing causalities, scientifically demonstrated and ideally universal in scope, necessitates discarding variables that do not relate to any theory or may be endogenous. The term ‘data’ is preferred to ‘responses’; the constructed and declarative nature of the information must be neutralized in the analysis, following an objectivist plan that aims to get rid of limits associated with the subjectivity of the respondents or with conventions linked to institutions in the development of variables.
This type of statistical infrastructure is also characterized by a specific way of viewing time. As in the ‘biographical investigations’, longitudinal information is of central importance. In contrast, however, here such information is collected prospectively through data collection or matching of information repeated over time. Retrospective examinations are seen as tainted by bias, echoing the work of Karl Popper in the immediate post-war period, later analysed in detail and criticized by Luc Boltanski (2012), whereby history, the past, memory, are all seen as obstacles to the establishment of scientific truth, which should be timeless. If time plays a role, it is only as a source of variation and turned towards the future; it makes it possible to identify causalities between actual and subsequent events. It is not about understanding historical processes, which are viewed as impossible to grasp other than as partial constructions (loss of respondents from one examination to another) or biased reconstructions (memory). It is a question of having repeated measures, of independent observations over time of the same panel units, so as to establish statistical links between various changes that have affected them over the period under analysis. At INSEE, a shift in orientation towards prospective panel data occurred in the mid-2000s (Chaleix & Lollivier, 2004), in part explaining our use of the term ‘matched panels’ to describe this model. The adjective ‘matched’ emphasizes the matching used to collect various information on the individuals panelled.
Examples of such statistical infrastructure, today numerous, are most often prospective panels. They tend to dominate public statistics and have contributed to transforming ‘representative household surveys’, leading to a hybridization of the two models. This is the case, for example, with regard to the Labour Force Survey (Enquête Emploi), whose longitudinal dimension was significantly reinforced in 2003, when it came to rely on a ‘continual’[25] statistical infrastructure. Whereas once it was (in its original form) emblematic of our first survey model, two important characteristics now attest to its hybridization. First, its original objective (establishment of the unemployment rate) has shifted from a reporting logic (made possible by the assumed strength of institutions themselves to give meaning to a direct question on labour status, notably unemployment) to a factual logic where only the combination of specific criteria referring to the past week (or reference period) determines unemployed status under the International Labour Office definition (Goux, 2003). Second, the increase in the number of interview waves has allowed for the development of panel analyses.
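The contrast between the reporting logic and the factual logic can be sketched as follows. The three criteria used here (being without work during the reference week, being available for work, actively seeking work) are a deliberately simplified rendering of the International Labour Office definition; the field names are hypothetical, and the operational thresholds and exceptions applied in the actual Labour Force Survey are not reproduced.

```python
from dataclasses import dataclass


@dataclass
class ReferenceWeekAnswers:
    # Factual items about the reference week (hypothetical field names).
    worked_at_least_one_hour: bool
    available_for_work: bool
    actively_sought_work: bool
    # Direct, self-declared status: the former 'reporting logic'.
    self_declared_status: str


def reported_status(answers: ReferenceWeekAnswers) -> str:
    """Reporting logic: the respondent's own declaration is taken as given."""
    return answers.self_declared_status


def factual_status(answers: ReferenceWeekAnswers) -> str:
    """Factual logic: only the combination of criteria determines the status."""
    if answers.worked_at_least_one_hour:
        return "employed"
    if answers.available_for_work and answers.actively_sought_work:
        return "unemployed"
    return "inactive"


# A respondent who declares being unemployed but has stopped looking for work.
discouraged = ReferenceWeekAnswers(
    worked_at_least_one_hour=False,
    available_for_work=True,
    actively_sought_work=False,
    self_declared_status="unemployed",
)
# A respondent who declares being a student but worked a few hours that week.
working_student = ReferenceWeekAnswers(
    worked_at_least_one_hour=True,
    available_for_work=False,
    actively_sought_work=False,
    self_declared_status="student",
)
```

In both hypothetical cases the two logics diverge: the declaration no longer carries the classification, which now rests entirely on the combination of observed criteria.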
The European Community Household Panel (1994–2001) and its successor, the Survey on Income and Living Conditions (2006) provide additional examples of hybridizations. In addition, the Permanent Demographic Sample (Echantillon démographique permanent; census subsamples matched with civil status, electoral participation or, since 2011, social-fiscal data) and the Dads panel are older examples, which have recently been enhanced and whose exploitation has been strengthened.
Classic themes are addressed in this form of statistical infrastructure (e.g. housing, education, employment, health, etc.). They refer to areas of public action, but in a renewed form compared to the ‘representative household survey’ model. The emphasis is placed on individual change, its determinants and consequences, rather than on understanding broad macro-structural dynamics in an effort, for example, to organize national planning and accounting. In the shift towards micro-statistics, global management of the economy and society is no longer the central concern, as it was during the time of the planners, but rather scientific expertise, a priori neutral and independent, allowing to evaluate the capacity of public policies to change, within a given domain, individual behaviour. In this regard, incentive theory, sometimes implicitly, plays a determinant role. The political dimension of this form of statistical infrastructure is thus naturalized: their design integrates political objectives in advance, in the definition of eligible populations and target variables. Certain forms of this statistical infrastructure are particularly inventive in their efforts to evaluate public policies, such as the Panel on State-aided contracts (Panel des contrats aidés) carried out in 2008 and 2014 by the Ministry responsible for labour and employment. In this case, the policies were evaluated using a quasi-experimental model directly imported from the natural sciences (notably medical research).
While specific, this ‘matched panel’ sub-model (described here as ‘quasi-experimental’) is no less important from a symbolic point of view. The INSEE report on longitudinal data (Chaleix & Lollivier, 2004) in fact accords a central place to the latter: panels must be “targeted at specific populations (that is chosen in reference to the evaluation process of a given social policy, which usually means that collection begins before the introduction of the policy in question, so as to be able to observe change over time)”, “coupled with information from administrative registers (to limit survey time and assure greater reliability of certain information)” and follow a “sampling plan that includes a control group” (Chaleix & Lollivier, 2004, p. 22).
In ‘matched panels’, the notion of representativeness is secondary. The descriptive results are usually but a first step, which aims to verify that there is no bias in the structure of the sample excluding an identifiable segment (according to observable characteristics) of the population of interest. Indeed, for panels, maintaining the initial representativeness of the samples is particularly challenging due to attrition. Although dynamic weighting can be developed, helping to guarantee representativeness over time, most studies restrict their analyses to cylindrical samples (i.e. to individuals who responded in all the waves of the panel) and do not use weights. These methodological options highlight the tension between survey specialists and econometricians,[26] and more broadly between the aims of representatively describing populations and estimating factors associated with a specific situation or behaviour. The methods used, most often micro-econometric, neither need nor take into account the representative structure of the samples analysed, as if the ‘data’ were exhaustive.[27] It is the size of the sample, or the repetition of observations (within a population and/or over time) and the (large) number and (‘good’) quality of variables that are determinant in demonstrating causalities ‘purified’ of composition effects or selection processes.
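The restriction to a cylindrical sample, and the attrition that motivates it, can be sketched in a few lines. The panel below is a hypothetical toy example: each unit is described only by the set of waves in which it responded, and no weights are used, in line with the practice described above.

```python
from typing import Dict, List, Set


def cylindrical_sample(responses: Dict[str, Set[int]], n_waves: int) -> List[str]:
    """Keep only the units that responded in every wave (1..n_waves)."""
    required = set(range(1, n_waves + 1))
    return sorted(unit for unit, waves in responses.items() if required <= waves)


def retention_by_wave(responses: Dict[str, Set[int]], n_waves: int) -> List[float]:
    """Share of wave-1 respondents still responding at each wave (attrition profile)."""
    initial = [unit for unit, waves in responses.items() if 1 in waves]
    if not initial:
        raise ValueError("no unit responded in the first wave")
    return [
        sum(1 for unit in initial if wave in responses[unit]) / len(initial)
        for wave in range(1, n_waves + 1)
    ]


# Hypothetical three-wave panel: unit identifier -> waves with a response.
panel = {
    "a": {1, 2, 3},
    "b": {1, 2},      # lost after wave 2
    "c": {1, 3},      # missed wave 2, came back in wave 3
    "d": {1, 2, 3},
}

balanced_units = cylindrical_sample(panel, n_waves=3)
retention = retention_by_wave(panel, n_waves=3)
```

In this toy panel only half of the initial units survive the restriction, although three quarters of them respond at any given wave, which gives a sense of how far a cylindrical sample can drift from the initial sampling design.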
Econometric modelling generally identifies one or more response variables (a behaviour or a situation that should evolve or improve in the population, such as educational attainment, participation in the labour market, etc.) and a large variety of explanatory variables. Among the latter, an analytical distinction is made between so-called variables ‘of interest’ and ‘control’ variables. The former, which correspond to potential public action levers (financial or in-kind support, for example), dominate the reflection while the latter delineate as precisely as possible the socio-economic profile, geographical or professional environment, etc., of the respondents. Ideally, as mentioned above, control variables should be exogenous (that is, not depend on either the response variables or the variables of interest). They can thus either be an assumed stable property of the respondents (e.g. sex, date of birth, age at end of initial study, etc.) or a characteristic of their environment that they have not chosen.[28] The number and the precision of these control variables condition the degree of purity of the statistical associations between the response variables and the variables of interest. They define what econometricians call observable heterogeneity and are not generally described in published results. They are only occasionally used in analyses of subpopulations (e.g. women or men, those with low levels of education, etc.) aimed at evaluating the differential effectiveness of the policies studied.
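The role of control variables in ‘purifying’ an association can be illustrated with a minimal ordinary least squares sketch written in plain Python. The data are fabricated for the purpose of the illustration and constructed without noise: a hypothetical outcome equals 1 + 2 × support + 0.5 × years of study, with support granted more often to respondents with fewer years of study. The variable names are assumptions, and the sketch shows a controlled correlation, not a causal estimate.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: regressors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(regressors: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least squares fit; returns [intercept, coefficient of each regressor]."""
    design = [[1.0] * len(y)] + [[float(v) for v in col] for col in regressors]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in design] for ci in design]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in design]
    return solve_linear_system(xtx, xty)  # normal equations (X'X) beta = X'y


# Fabricated data: variable of interest (support), control (years of study).
support = [0, 0, 0, 1, 1, 1]
years_of_study = [10, 12, 14, 8, 10, 12]
outcome = [1 + 2 * s + 0.5 * e for s, e in zip(support, years_of_study)]

naive = ols([support], outcome)                      # no control variable
controlled = ols([support, years_of_study], outcome)  # with the control
```

Without the control, the coefficient on support is half its constructed value, because it also absorbs the lower education of supported respondents; adding years of study as a control recovers the coefficient built into the data.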
Establishing causality relies on even more demanding and sophisticated methods than those leading to controlled correlations. Without going into the technical details of the three main methods employed (instrumental and panel econometrics, experimental matching), we can emphasize that all use the temporal and contextual richness of individual data to eliminate any source of endogeneity (reverse causality, omitted variable, etc.) and to ensure perfect control of individual heterogeneity (observable and unobservable[29]). Such types of statistical infrastructure differ from the two presented above, as much in their objectives as in their formats and analytical methods. Undoubtedly, these models lean on different social science disciplines and the influence of the latter on French public statistics over the last seventy years. But they are not limited to this influence, in the sense that methods circulate and spread from one discipline to another, and the ‘models’ that compose the three types of statistical infrastructure identified here take on, in fact, a normative dimension as they are implemented by public statisticians.
Homo Statisticus: Three Types of Being Constructed by the Statistical Infrastructure
The description of the three models provides several initial characteristics of the beings to which the different statistical infrastructures refer. In what follows, we summarize their features using the same terminology as that employed by Alain Supiot to describe the three pillars of his homo juridicus (Supiot, 2007 [2005]), with which the three forms of homo statisticus share a surprisingly close kinship.
We have chosen to call the being defined in and by ‘representative household surveys’ the subject. This subject has a more pronounced political dimension than the beings of the other two models. Its tone is administrative, as we have seen, with its official socio-demographic characteristics, the institutional variables defining its identity. From the outset, the characteristics of the subject refer to the collective beings with which it is assumed to identify (at a minimum during the survey), according to the process described by Nicolas Dodier (1996) of temporarily accepting assimilation to the pragmatic condition of statistical identity. This identity is borne by the ‘major variables’ of ‘representative household surveys’: the subject is the representative, among others, of collective beings; it is a member of a class, a category, or a group. Its singularity is not, in a sense, detachable from its properties, which are familial and sexed status within the household (‘head’ then ‘reference person’, or spouse, child, etc.), age group (e.g. youth 30), social group (e.g. managers or blue-collar workers, farmers, etc.), nationality, or geographical unit of residence (region or department).
Subjects are thus only indirectly represented, by their categories of belonging: institutions such as the family, school, company, or administration, and more broadly the State (as source and guarantor of the law) primarily determine the categories that express, as much to society as to themselves, the surveyed beings. It is not, however, solely the range of collective belongings that public statistics uses to characterize its subject, but also the economic behaviour and social roles expected of the subject. This assignment is particularly clear with regard to the role of women, about whom statisticians ask: how should they assist in the reconstruction of post-war France, by having children (according to a fertility logic whose military inspiration has shifted to a production aspiration) or by providing a source of additional labour-power (Amossé & de Peretti, 2011)? More broadly, analyses of sex and age groups, migratory status or social group, region of residence, are difficult to separate from a planning vision that poses, or rather imposes, behaviours based on a rationale of matching social characteristics to expected socio-economic conditions. According to sex, age, social class, etc., public statistics sees, but also and above all foresees, average, or probable, conditions for its subjects, in terms of school, work, family, etc.; the training-employment matrices initially developed under the General Planning Commissioner provide a good example.
By their categories of belonging, the subjects are integrated into a holistic representation of the economy and of society, as cogs in a mechanism. These beings are only active subjects to the extent that they accept their assigned role of contributing to the collective future. They are actors and acted upon, subjects of and subjected by public statistics, much like the two facets (active and passive) of the notion of ‘subject’ that Alain Supiot stresses in Homo juridicus (Supiot, 2007 [2005], p. 17). In descriptions of the tables and graphs that result from these surveys, it is the collective beings that are the subjects of sentences. 31 In this way, they take on a certain realness, becoming common nouns (‘women’, ‘youth’, ‘managers’, etc.) that circulate in public administration services, in scientific publications and events, in meetings between social partners, and even in ordinary situations. The first survey model thus carries a vision of the world made up of subjects that are inseparable from their institutional social properties.
The parallel between electoral representation and statistical representativeness, central to this type of survey, illustrates in another way the political dimension of these subjects. With their non-zero probability of selection, according to the random sampling procedure, they are part of a ‘statisticable’ population, analogous to the electorate. Linking of the surveyed being to the whole is thus achieved in a similar way to the mode of voting, by aggregation-summation or, technically, with the use of survey weights. The subject is not only represented by collective entities, but also compared to the whole, to French society. From this point of view, recent change in the ‘representative household survey’ is not insignificant. Whether in terms of shifts due to processes of European harmonization or extension to margins previously considered ‘outside the scope’, this statistical infrastructure introduces two new types of subjects: French peri-subjects and European proto-subjects, which join in statistically defining the economy and society without, however, having a political representation that is, as of yet, anything more than marginal or nascent.
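The aggregation-summation by survey weights mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of INSEE practice: each respondent is given a weight equal to the inverse of an assumed, strictly positive probability of selection, and population figures are obtained by summing weights. The names `Respondent`, `inclusion_prob`, `employed`, `survey_weight`, `estimated_total` and `estimated_share` are hypothetical and introduced only for this example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Respondent:
    """One surveyed being (hypothetical structure for illustration)."""
    inclusion_prob: float  # assumed probability of selection, must be > 0
    employed: bool         # example characteristic to aggregate


def survey_weight(r: Respondent) -> float:
    """Number of people in the population that this respondent 'stands for'."""
    if not 0.0 < r.inclusion_prob <= 1.0:
        raise ValueError("inclusion probability must lie in (0, 1]")
    return 1.0 / r.inclusion_prob


def estimated_total(sample: Iterable[Respondent]) -> float:
    """Estimated number of employed people: sum of weights of employed respondents."""
    return sum(survey_weight(r) for r in sample if r.employed)


def estimated_share(sample: Iterable[Respondent]) -> float:
    """Estimated share of employed people: weighted total over the sum of all weights."""
    respondents = list(sample)
    total_weight = sum(survey_weight(r) for r in respondents)
    if total_weight == 0:
        raise ValueError("sample is empty")
    return estimated_total(respondents) / total_weight


if __name__ == "__main__":
    sample = [
        Respondent(inclusion_prob=0.5, employed=True),    # weight 2
        Respondent(inclusion_prob=0.5, employed=False),   # weight 2
        Respondent(inclusion_prob=0.25, employed=True),   # weight 4
        Respondent(inclusion_prob=0.25, employed=False),  # weight 4
    ]
    print(estimated_total(sample))  # 6.0
    print(estimated_share(sample))  # 0.5
```

The guard on `inclusion_prob` mirrors the condition stated in the text: a being with a zero probability of selection has no weight and therefore cannot be linked to the whole.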
Even if sub-models (retrospective, subjective, situated) can be identified within ‘biographical investigations’, the corresponding statistical infrastructures draw on homines statistici with similar characteristics, which we call here persons. As highlighted above, such statistical infrastructure shares a certain kinship with sociology’s semi-structured interviews in that it attempts to capture statistically the singularity of the personal situations of the beings it addresses. These beings are constructed as the questionnaire proceeds: the questions follow a biographical trajectory, and sometimes adapt to the latter. The numerous open-ended questions take into account, in an exploratory fashion, the subjectivity of the respondents, allowing for the expression of opinions or assessments. The co-construction of these beings and of the questionnaires underlines a critical difference, symbolically, from the subjects of the ‘representative household surveys’. Indeed, the statistical infrastructure of this second model grants greater freedom to the beings it addresses; these beings are not entirely pre-constructed as can be the case for subjects (who must accept that their singularity is limited to institutional categories) or individuals (whose characteristics must respond to the requisites of scientific theories, as we will see below).
It is neither the ‘major’ nor the ‘good’ variables that define the person, but the closest possible account of their experiences, their practices, their feelings, their wishes. This form of statistical infrastructure seeks, in fact, to reveal the various facets of an identity, possibly plural, constructed over time, as well as the multiple interactions that persons have with their loved ones or their environment, their resources, and their capacity to take action and plan for the future. The biographical investigation is in itself an experience, during which the person surrenders, body and soul, to the interviewer, much like the notion of ‘person’ described by Alain Supiot. 32 Some infrastructures, such as that underlying the investigation on Violence Against Women (2000; violences envers les femmes), have moreover raised practical and legal issues for public statistics, in their necessary consideration of the emotional burden and moral commitments faced by interviewees and interviewers.
The practical, sometimes intimate, identity of the persons considered is thus revealed during the administering of the questionnaire, and the tone of the interviews often remains present in the production and reception of the results: to the personal involvement of the respondents corresponds an empathetic recall by statisticians as readers. The process of identification, as described by Dodier (1996), here calls upon the experience, sometimes symbolic, of a common humanity (and not the sentiment of collective belonging as for the subjects). From one end of the statistical chain to the other, attention is paid to the body and emotions, to interactions and confrontations with other people, to biographical bifurcations and disruptions. Although no single analytical framework is used in this type of statistical infrastructure, the capability approach developed by economist-philosopher Amartya Sen (which, for example, has inspired recent forms of statistical infrastructure on educational pathways) corresponds quite closely to the notion of person. It describes beings who are a priori capable of desiring, of expressing themselves, of making themselves understood, of learning, and of working (when they have good quality jobs), and finally of finding a harmonious balance between work and family (Bonvin & Farvaque, 2007). 33 More broadly, this logic is consistent with political subjectification processes linked to struggles for social rights.
We use the word individual for the beings corresponding to the ‘matched panels’. This being can be seen primarily as a way of identifying causalities between two groups of variables (response variables and variables of interest, to use econometric terminology). The corresponding statistical infrastructure is thus not really focused on individuals per se, either in and of themselves (as persons) or as members of the collective entities to which they belong (as subjects). From a literal point of view, the term recalls a basic unit that cannot be divided. Certainly, the composite nature of the statistical definition of these beings, resulting from an assembling of matched data, brings to mind Arjun Appadurai’s (2016) concept of the “dividual” in reference to actors in the finance sector who, like the derivatives they manipulate, are socially divided. Reference might also be made to the algorithmic beings of big data, made up of multiple data marks and imprints (Rouvroy & Berns, 2013). That said, the resemblance is closest to the individual described by Alain Supiot in Homo juridicus (Supiot, 2007 [2005], p. 13): the individual is at once identical and unique, indivisible and stable (it is the “basic accounting unit par excellence”) and, in its irreducible unicity (“unknowable essence and containing its own end”), a free being (“substantial ego[s] that freely forge social links rather than being fashioned by them”).
The individuals in this third statistical infrastructure model have these qualities. They are characterized by stable and immutable properties, which neither determine nor are determined externally. These beings are different from subjects in that they are not a priori integrated into collective entities. Unlike persons, they do not have interactions or connections that they build over the course of time and that contribute in turn to their construction. Analytical methods for ‘matched panels’ are most often based on so-called ‘i.i.d.’ hypotheses, meaning that the statistical observations, or the individuals at a given moment in time, are independent and identically distributed. With regard to principles, and beyond specific modelling, dependence is not possible either between individuals or between their successive states over time. These beings are, in this way, alone and without history. They have neither depth nor belonging. They are a support for statistical identification, this essential heterogeneity on which the models rest, but which they ultimately aim to make disappear. Similar to the creation of data files for the use of heterogeneous information, their characteristics are removable: they are statistical tools (controls), not the objects of analysis, in the absence of a descriptive plan followed by analyses using ‘matched panels’.
An aggregation of situations, individual states or behaviours, results from these statistical models. The statistical individuals of the ‘matched panels’ are not strictly speaking the individuals of standard economics. Although they share numerous traits, their definition is empirical and not theoretical; it is not necessarily assumed that they are driven only by their preferences or utility. There is, in any event, no explicit place for communities, classes, or groups established by belonging, nor for local context, familiar environment, or close circles. Like atoms without connections to one another, individuals cross time and space. Their trajectories are certainly influenced by their environment, which acts, however, only as an external element and is not, a priori, modified in turn. Moreover, this influence does not fundamentally change individuals. Time is made up of events, of shocks, that have consequences but do not contribute to building a history, either personal or collective.
As basic beings of this type of statistical infrastructure, individuals have something almost fictional about them. They are both syntactically central (symbolizing and, especially, allowing the micro-statistical shift that characterizes the analyses) and semantically absent, upstream and downstream of data collection. They are necessary, but must also be overcome: they are seen as obstacles, as reflected by the notion of ‘unobservable heterogeneity’ and the clear need for the results to be ‘purified’. The polysemy of the term ‘identify’ characterizes well this present-absent dynamic. It is not about the interviewees, interviewers, researchers, or readers being able to identify with the individuals who are produced by these statistical infrastructures (as with subjects or with persons, but in different ways). Rather it is about being able, thanks to them but also by making them disappear, to statistically identify equations, i.e. to assess statistical associations between variables that should be able to be separated from the observations on which they rely. Individuals are supports and not the objects of identification. This sacrifice of diversity is undoubtedly characteristic of statistical methods of analysis more generally, which aim to establish results that are valid beyond single cases. However, a non-trivial observation is that identification resources do exist in the first two statistical infrastructure models, whether in the institutional manner of subjects, through the collective entities to which they belong, or in empathy with persons “having declared that…”, “suffering from…”, “hoping that…” (to take several examples of phrases used in the ‘biographical investigations’). These resources disappear for individuals who, as we have seen, can be represented as mosaics that, before being assembled, were composed of heterogeneous, fragmented pieces.
In analyses produced using this last statistical infrastructure, phrases most often directly link two types of variables: the first, those of interest, reflecting (possibly as a consequence) the fact of having benefited from a public policy; the second measuring the evolution of a situation, of a state on a market (housing, education, employment, etc.).
Conclusion
Over a period of about seventy years, the French public statistical infrastructure on the general population has diversified. Today, three models co-exist: the original ‘representative household surveys’, to which ‘biographical investigations’ and ‘matched panels’ were successively added. These models have different specificities, not only in terms of their formats but also their theoretical frameworks, objectives, methods of analysis, and visions of public action. They imply three different types of beings who represent three variations of humanity (here in its statistical version) in as many ‘ideal types’: the subject, the person, and the individual. The last few decades have seen a blossoming of these last two beings, who previously were as if restrained by the sole logic of the subject in the ‘representative household surveys’.
The originality of these two typologies (of statistical infrastructure and of beings) is relative. There are echoes of divisions already highlighted and analysed within the social sciences and their sub-disciplines. Indeed, the subjects of the ‘representative household surveys’ (and the collective entities that represent them) are primarily found in the structural analysis of populations, macroeconomics, structural sociology, and social history of grand narratives. The persons of the ‘biographical investigations’ have instead followed the reorientations of demography (interaction between events, diversity of populations, link to context), sociology (biographical shift, ‘new’ sociologies, etc.) and history (critical shift and microhistory). The individuals of the ‘matched panels’ are analysed by micro-econometricians, neoclassical economists or rational choice sociologists, with ties to experimental psychology or contract law. Another limit of the proposed typology is that, one might argue, only certain statistical infrastructures possess all the characteristics used to describe the models. Yet, the strength of these proposals is their empirical basis, the way that they precisely aggregate a large number of traits that result in a ternary interplay of oppositions. Following the intuitions of French pioneering work on the social history of statistics, the analysis of quantitative tools seems particularly instructive for organizing the plurality of ways of seeing reality. That said, the transformation of statistical beings (the tensions, conflicts, and hegemonic drive of notions associated with subjects, persons and individuals) does not solely correspond to the evolution of social theories in France. There is every reason to think that the elements discussed here have a more general scope, and are relevant to the “politics of statistics” (Desrosières, 1998 [1993]).
What can we conclude, politically, from this diversification of statistical infrastructure and statistical beings? First, that the emergence of individuals, who occupy both a central and evanescent place in their statistical infrastructure, echoes critiques often levelled against neoliberalism, where incentivized individuals would be deemed free, but without much power, whereas the subjects of a Keynesian planner, admittedly acted upon, were socially protected and retained their capacity for collective action thanks to institutions (Castel & Haroche, 2001). Our third model is no less political, nor any less related to the State than the first. The role of the State is, however, profoundly transformed, in that it aims to establish market instruments entailing feedback loops on individual behaviours, much like the corresponding form of government (neoliberal ‘city’) described by Alain Desrosières (2014) or the absent State proposed by Robert Salais (2015). The personal logic complicates this description, and cannot be understood only as a parenthesis between subjects and individuals. This logic nuances the ‘individualization’ movement broadly employed (in the social sciences as well as in public debate) to describe contemporary change, but whose ambivalence can be sensed. The real ability to act belongs to persons, the only beings to which the statistical infrastructure accords a value per se. This evolution leaves somewhat open, however, the question of their collective aggregation, or the political forms with which they may be associated.