1 Introduction
Retrospective reports in survey interviews and questionnaires are subject to many types of recall error, which may affect their completeness, consistency and dating accuracy (
Schwarz and Sudman 1994;
Scott and Alwin 1998;
Van der Vaart 1996;
Van der Vaart et al. 1995). In the social and the medical sciences, where many studies focus on the reconstruction of life histories, concerns about this problem have led to the development of so-called calendar instruments, or timeline techniques (
Freedman et al. 1988;
Sobell et al. 1988). These data collection procedures offer an alternative to regular survey questionnaires, which usually consist of lists of chronologically ordered standardized questions, organized in thematic blocks.
Calendar and timeline methods are aimed at helping respondents gain better access to long-term memory by providing them with a graphical time frame (for an example see the Appendix) in which life history information can be represented (
Van der Vaart 2004). This stimulates the respondent to relate, visually and/or mentally, the timing among several kinds of events. Inconsistencies in reports are more easily discovered and one event may prompt the recall of another. Additionally, detailed sequences of events are easier to record since they can be marked graphically in the time frame. In recent years the application of these techniques in social research has been growing rapidly. This fact is illustrated by their integration into large-scale, longitudinal social surveys such as the German Life History Study (
Brückner and Mayer 1998); the Panel Study of Income Dynamics in the USA (
Belli et al. 2001); and the Process of Social Integration of Young Adults in the Netherlands (
Van der Vaart 1996). There are some small (historical) differences between the concepts ‘calendar’ and ‘timeline’, as will be pointed out later, but here we will use the word ‘calendar’ for both methods.
Even though the general assumption is that calendar methods improve data quality, there has been little methodological research into their effectiveness. Given the fact that using these aided recall procedures tends to increase operational costs, more methodological insights are needed. In the past two decades, several authors (e.g.,
Freedman et al. 1988) have described the method in detail, most of the time focusing on the specific type of calendar that they used in their own study. Still, little is known about the effects of calendar instruments on data quality, measured in terms of completeness and consistency of the data and the occurrence of dating errors. Only recently have a number of experimental studies been conducted, the results of which indicate that calendars can indeed be beneficial to data quality.
This article presents an overview of calendar instruments currently used in different fields of research and their effects on the accuracy of retrospective reports. Firstly, we will provide an overview of the most important areas of application and present the instrument’s rationale. Secondly, design features and the suitability of the instrument for different modes of data collection will be described. Thirdly, there will be a detailed discussion of the effects of several types of calendar instruments on data quality. Fourthly, we will present the consequences for operational costs and a summary of interviewer and respondent evaluations of those instruments. Finally, we will draw conclusions from our review and offer suggestions for further research into specific features of calendar procedures.
4 Effects of calendar methods on data quality
4.1 Evaluations of data quality
As early as in the late 1960s
Balán et al. (1969) concluded, in what was probably the first (non-experimental) evaluation of calendar methods, that the calendar instrument had the following advantages over traditional question-list surveys:
1.
It improved the completeness of reports by enabling the interviewer to detect ‘gaps’ in the data provided by the respondent.
2.
Inconsistencies in the account could be detected by the interviewer or by the respondent himself. The respondent could then correct his original account.
3.
It facilitated recall for distinct events, by displaying those events as part of a sequence. This (supposedly) led to a reduction of omissions.
4.
It improved timing of recalled events by allowing the respondent to relate events and dates from different life domains to each other.
Although the study by Balán and his colleagues did not have an experimental design, their observations are still valid today. The expected positive effects of calendar methods on completeness, consistency, recall and timing—as well as the implied effective mechanisms of the calendar—are main issues in evaluations of calendar methods.
Over the years several authors, though not explicitly referring to the Balán-study, have tested one or more of the four statements mentioned above. Many have also included more general observations about the data collection process, such as experiences with different modes of data collection (see previous section), respondent-interviewer rapport, and consequences for the duration of the interview. The body of research on calendar methods also includes a few psychometric studies on reliability and/or validity of data collected with calendar instruments.
Our review of the methodological literature reveals that the quality of the data collected with calendar instruments has been evaluated in multiple ways. The studies can be grouped into three categories:
2.
Studies in which the agreement between data collected with a calendar instrument and external data sources is measured, but no comparisons are made with regular questionnaires. External data sources include physicians’ records (
Rosenberg et al. 1983;
Wingo et al. 1988) or reports from earlier waves of longitudinal surveys (
Freedman et al. 1988).
We will first turn to the first group and present some findings based on indirect comparisons between calendars and traditional questionnaires. After that, results from the ‘agreement studies’ (groups 2 and 3) will be presented.
4.2 Indirect comparisons between calendar data and regular survey data
The focus of the first group of studies is mainly restricted to indirect measures of data quality, in particular consistency of the data based on logical arguments (e.g., in most societies, there should be no overlap between marriages), completeness of the data (e.g., the detection of “gaps” in employment histories) and patterns in recalled dates (such as the use of prototypical values, i.e., “heaping”). Since these studies do not include an external standard of comparison, they cannot provide direct evidence for the superiority of calendar data in terms of accuracy. However, as will be illustrated below, they do provide some indications that the calendar method overall performs better in collecting recall data than the traditional question-list.
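The indirect measures described above can be illustrated with a minimal sketch. It is not taken from any of the reviewed studies: the encoding of an episode as a pair of inclusive month indices and the function names `find_overlaps` and `months_unaccounted` are assumptions made purely for illustration.

```python
from itertools import combinations
from typing import List, Tuple

# Hypothetical encoding: an episode is (start_month, end_month), both inclusive,
# counted as month indices within the reference period.
Episode = Tuple[int, int]


def find_overlaps(episodes: List[Episode]) -> List[Tuple[Episode, Episode]]:
    """Return every pair of episodes that share at least one month.

    Useful as a logical consistency check for states assumed to be
    mutually exclusive (e.g., marriages in most societies).
    """
    ordered = sorted(episodes)
    # After sorting, a starts no later than b, so they overlap
    # exactly when b starts on or before the month in which a ends.
    return [(a, b) for a, b in combinations(ordered, 2) if b[0] <= a[1]]


def months_unaccounted(episodes: List[Episode], period_start: int, period_end: int) -> int:
    """Count the months of the reference period not covered by any episode."""
    covered = set()
    for start, end in episodes:
        covered.update(range(max(start, period_start), min(end, period_end) + 1))
    return (period_end - period_start + 1) - len(covered)


# Illustrative (made-up) employment history over a 36-month reference period.
jobs = [(1, 12), (10, 20), (25, 36)]
overlapping_pairs = find_overlaps(jobs)
gap_months = months_unaccounted(jobs, 1, 36)
```

In this made-up history the first two jobs overlap for three months and four months remain unaccounted for. Whether an overlap signals a reporting error or a real situation depends on the states involved, so a flag of this kind is an indication to probe further, not evidence of inaccuracy.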
A split-ballot comparison between a calendar method and a traditional questionnaire in a fertility study (
Becker and Sosa 1992) indicated that the use of the calendar resulted in more consistent reports. It demonstrated that the calendar method resulted in less superposition of (supposedly) mutually exclusive behaviors: significantly less overlap of advanced pregnancy and contraception use was reported in the calendar condition (1.3%) than in the traditional interview (10.3%). Also supporting positive calendar effects, an interaction was found between the recency of the behavior and the effect of the calendar (
Goldman et al. 1989). Goldman and her colleagues note that the calendar instrument was especially effective in enhancing recall of contraceptive use in the beginning of the reference period. A similar effect was found in a study of domestic violence victimization (
Yoshihama et al. 2005). The results indicate that the higher lifetime victimization rates in the calendar condition arose because more respondents reported incidents that took place in the distant past.
Studies that evaluated retrospective data in terms of completeness mostly concluded that the calendar method performs better than the traditional question-list. Calendars were found to be more helpful in reducing the amount of time unaccounted for in the respondent’s life course (
Engel et al. 2001b;
Goldman et al. 1989). This reduction is likely to be due to the visual nature of the calendar, which makes it easier for the interviewer to detect those left-out periods and ask the respondent about them (
Balán et al. 1969). Overall, calendars appear to result in higher numbers of reported events and episodes, which is usually interpreted as a positive effect (
Becker and Sosa 1992;
Engel et al. 2001b).
Regarding the heaping of reported event dates, which occurs when respondents report prototypical values (e.g., courses starting in September, or “the accident happened two years ago”) instead of the actual values, only a few studies are known to evaluate calendar effects (see also the next section). In an experimental evaluation
Goldman et al. (1989) found that the calendar method significantly reduced heaping in reports of contraceptive use. While in the traditional questionnaire condition a disproportionate number of women rounded durations to prototypical values of 6, 12, 24, 36, and 48 months of use, this hardly occurred in the calendar condition. It should be noted, however, that this difference was probably enhanced by the coding protocol: in the questionnaire condition, interviewers could record durations in either months or years, whereas in the calendar condition they were instructed to always code durations in months.
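One simple way to quantify heaping of this kind is the share of reported durations that fall exactly on prototypical values. The sketch below is an illustration only and is not the measure used by Goldman et al. (1989); the function name `heaping_share` and the two example samples are made up, while the default prototypical values are those mentioned above.

```python
from typing import Iterable, Sequence


def heaping_share(durations_months: Iterable[int],
                  prototypical: Sequence[int] = (6, 12, 24, 36, 48)) -> float:
    """Return the proportion of reported durations equal to a prototypical value."""
    durations = list(durations_months)
    if not durations:
        raise ValueError("no durations reported")
    heaped = sum(1 for d in durations if d in prototypical)
    return heaped / len(durations)


# Made-up samples of reported durations of use, in months.
questionnaire_reports = [6, 12, 12, 24, 7, 13, 30, 48]
calendar_reports = [5, 11, 12, 23, 7, 13, 30, 47]

share_questionnaire = heaping_share(questionnaire_reports)
share_calendar = heaping_share(calendar_reports)
```

A share well above what the true distribution of durations would produce suggests rounding. Because the true distribution is usually unknown, a comparison between conditions is more informative than either value alone, and it remains sensitive to differences in coding protocol.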
4.3 Agreement between calendar data and external sources
The second and third group of studies focus on direct assessments of agreement between the recalled information and the external information: in particular concerning the number of events, their characteristics and the duration or dates of events. Some authors turned to data sources such as doctors’ records (
Rosenberg et al. 1983), purchase records (
Van der Vaart and Glasner 2005) or population registers (
Auriat 1993) to validate the retrospective reports. In the absence of this type of validating information, authors compared calendar data with respondents’ earlier (concurrent) reports from the same longitudinal study (
Belli et al. 2001;
Freedman et al. 1988;
Van der Vaart 1996). It can be argued that comparisons of the latter type are an assessment of (test-retest) reliability rather than of validity (
Dex 1995). Nevertheless, it seems reasonable to assume that the amount of error is smaller in concurrent than in retrospective reports, since the former are less affected by memory bias. As illustrated below, the results of both types of studies generally suggest that the calendar method has beneficial effects on data quality.
4.4 Non-experimental validation studies
While non-experimental agreement studies do not compare the performance of the calendar method to the performance of other methods, they do give an indication of the quality of calendar data. In this line,
Rosenberg et al. (1983) performed a record check study, which did not include a comparison with another type of questionnaire. Using doctors’ records as validation measures, the authors report an agreement of 90% between the calendar data and the records for month-specific use of oral contraceptives. The mean duration of the reference period was 33 months. The agreement between physicians’ records and self-reports decreased when brand and dose of contraceptive were also considered.
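To make the notion of month-specific agreement concrete, the following sketch computes the percentage of months in which a calendar report and a record give the same status. It is a hypothetical illustration, not the procedure of Rosenberg et al. (1983); the function name `monthly_agreement` and the example data are assumptions.

```python
from typing import Sequence


def monthly_agreement(calendar: Sequence[bool], records: Sequence[bool]) -> float:
    """Return the percentage of months in which both sources report the same status.

    Each sequence holds one value per month of the reference period:
    True if use was reported or recorded in that month, False otherwise.
    """
    if len(calendar) != len(records) or not calendar:
        raise ValueError("both sources must cover the same non-empty period")
    matches = sum(c == r for c, r in zip(calendar, records))
    return 100.0 * matches / len(calendar)


# Made-up ten-month example: the sources disagree in one month only.
calendar_report = [True] * 8 + [False] * 2
doctor_record = [True] * 7 + [False] * 3
agreement_pct = monthly_agreement(calendar_report, doctor_record)
```

A percentage of this kind treats every month equally and does not correct for agreement expected by chance, which matters when a status is very common or very rare across the reference period.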
High levels of data quality were also reported in non-experimental longitudinal studies. In their evaluation of calendar questionnaires
Hoppin et al. (1998) report very high test-retest reliability of pesticide use when respondents were contacted by telephone one to three weeks after the original interview. A more detailed study of test-retest reliability of the calendar method—the time between the interviews being eight to fourteen months—resulted in very high agreement for reported life event anchors such as marriages, or immigration (
Engel et al. 2001a).
Freedman et al. (1988) compared respondents’ self-reports from two waves (1980 and 1985) of a longitudinal study. In the 1985 wave, a calendar instrument was used. The authors found an 87% agreement between school attendance reported concurrently in 1980 and retrospectively in 1985. Part-time school attendance was remembered less well than either full-time attendance or no attendance. Responses about work in 1980 were less consistent. Here, the agreement between waves was 72%. The general tendency to underreport unemployment in retrospective surveys was not fully compensated for by the calendar.
Thus, several life course studies that applied event history calendars report relatively high correspondence between retrospective calendar data and matching responses or collateral reports obtained beforehand. Similar results are found in small-scale medical studies on health timelines (e.g.,
Searles et al. 2000). Although these results suggest positive effects of the calendar procedures on recall accuracy, they lack an experimental design: since there is no control condition, it has not been demonstrated whether these results would have been different in a study without aided recall procedures.
4.5 Quasi-experimental studies
In the 1996 study
Van der Vaart (1996,
2004) developed and tested a calendar method (in these studies called a ‘timeline’) that was filled out by the respondents during a face-to-face interview and was subsequently used as a visual recall aid. The calendar was tested in a field experiment on educational careers during the second wave of a longitudinal social survey, comparing the retrospective reports with reports during the first wave four years before (the recall period was 4–8 years). As compared to the regular questionnaire procedure, adding the calendar enhanced data quality with respect to the number of educational courses followed, the starting year of the courses, and the entire sequence of types of courses taken. Although the calendar reduced recall error in the dates of courses, it did so for absolute error only: it did not affect telescoping (i.e., the direction of the net error in dates) and neither did it diminish the heaping effect in reported dates. The calendar was shown to be most effective if respondents had to perform relatively difficult retrieval tasks in terms of recency, saliency, and frequency of the target behaviour (e.g., for respondents who had followed a great number of courses).
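The distinction between absolute error and net error in dates can be shown with a small sketch. It assumes that reported and validated years are available for the same events; the function name `dating_errors`, the sign convention and the example values are illustrative assumptions, not the measures used in the study.

```python
from statistics import mean
from typing import Dict, Sequence


def dating_errors(reported_years: Sequence[int], true_years: Sequence[int]) -> Dict[str, float]:
    """Return the mean absolute error and the mean net (signed) error in reported dates.

    Assumed sign convention: a positive difference means the event was
    dated later than it actually occurred.
    """
    if len(reported_years) != len(true_years) or not reported_years:
        raise ValueError("need matching, non-empty lists of dates")
    diffs = [r - t for r, t in zip(reported_years, true_years)]
    return {
        "mean_absolute_error": mean([abs(d) for d in diffs]),
        "mean_net_error": mean(diffs),
    }


# Made-up starting years of four courses: reported versus validated.
errors = dating_errors([1990, 1992, 1995, 1989], [1991, 1991, 1995, 1991])
```

Here the dates are off by one year on average, while the net error is half a year in the earlier direction. A recall aid can reduce the first quantity without changing the second, because errors in opposite directions cancel out in the net error.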
Comparable results were found by
Belli et al. (2001,
2004) who evaluated an event history calendar by means of a field experiment integrated into a longitudinal household study on social and economic behaviours. All interviews in this study were conducted via telephone in 1998. Respondents were asked for retrospective reports on the number and the duration of events that occurred in 1996. The quality of the 1998 reports using either a calendar—that was visible to the interviewer only—or a question-list, was assessed using data from the same respondents collected one year earlier on events in 1996.
Compared to the question-list survey, the calendar instrument resulted in significant difference scores, indicating positive effects, for three out of six topics: the number of moves, the number of jobs and the number of persons entering the residence. No differences in data quality were found regarding the number of persons leaving the residence, whether having received children aid and whether having received food stamps. Regarding four out of six continuous measures, the calendar method led to significantly higher correlations with the 1996 reports than the question-list. This applied to ‘income’ (a) and the durations of periods ‘being unemployed’ (b), periods ‘missing work due to illness’ (c) and periods ‘missing work due to illness of others’ (d). No differences in correlations were found for the duration of periods ‘working’ (e) and periods ‘on vacation’ (f). In spite of the effects on correlations, hardly any differences in mean errors were found between the two conditions.
Finally, the experimental record check study by
Van der Vaart and Glasner (2005) generally confirmed the findings of both field experiments presented above. In this study a calendar was employed as a visual aid for respondents in a telephone survey. Unlike most calendar instruments used in the social sciences, this calendar aimed to enhance the recall of singular events (the purchase of pairs of glasses) instead of episodes. The respondents’ retrospective reports about a recall period of over 7 years were compared to database information on three issues: the
price and the
date of the latest purchase of pairs of glasses and the
number of purchases. Hardly any effects could be established regarding the
number of purchases due to a lack of variation in this measure. Regarding both the
price and the
date of the purchase this study demonstrated that:
(a)
The calendar had positive effects on recall accuracy, although it did not affect telescoping (net error in dates);
(b)
A more difficult recall task—in terms of the saliency and recency—led to greater recall errors;
(c)
Employing the calendar was especially effective in enhancing recall accuracy when the respondents’ recall task was relatively difficult, that is: for less salient and less recent purchases.
As will be discussed in more detail below, a downside of this procedure was that the response rate in the calendar condition was quite low. Sending respondents the calendar instrument beforehand probably increased the risk of refusal.
Overall the results of these experimental studies—that compared the calendar method and the question-list method by using external validating data—are mixed but quite promising. They demonstrated that the calendar method exerted positive effects on recall accuracy for different types of data and never led to worse data quality.
6 Evaluations of the interviewing process
Since the calendar method affects the questioning procedure and the tasks of both the interviewer and the respondent substantially, it probably has motivational effects in addition to memory effects. However, whether these motivational effects prove to be positive or negative is hard to say. On the one hand, a calendar procedure may have positive motivational effects since it is a less well-known method and asks for a more active approach by the interviewer and/or respondent. This may suggest to the interviewer and respondent that their task is important and needs precision, as is the case for aided recall methods in general (
Sudman and Bradburn 1983). But on the other hand, these very same characteristics might also cause fatigue, meaning that the task may be too burdensome for the interviewer and/or the respondent (
Billiet et al. 1984). While researchers usually appreciate the positive influence of calendar instruments on data quality, they also note that the application of a calendar method often increases the length of the interview and sometimes of the interviewer training time (see the previous section). However, reports of experiences with calendars as instruments for data collection are generally quite positive. This is especially true for interviewers’ and respondents’ subjective evaluations of the instrument (
Martyn and Martin 2003).
Results of evaluation surveys among interviewers who worked with calendar instruments suggest that these instruments are perceived as being interesting to work with (
Belli et al. 2004;
Freedman et al. 1988). Among other things, the interviewers’ preference for calendar instruments over regular questionnaires is attributed to the fact that inconsistencies in the data can be removed while the respondent is still available for clarification (
Balán et al. 1969;
Belli et al. 2001;
Goldman et al. 1989). Generally these instruments are also perceived to yield better data quality than do standard questionnaires. Interviewers feel that calendar instruments help respondents to understand questions better (
Belli et al. 2001). Interviewers also think that calendars make the interview “more enjoyable” for the respondent (
Freedman et al. 1988). Similarly interviewers found that the calendar helped them discuss sensitive issues with their respondents (
Martyn and Martin 2003).
Interviewer perceptions that respondents prefer calendar instruments to traditional questionnaires are confirmed by respondents’ feedback on the interviewing process. In the study by
Martyn and Martin (2003), mentioned above, respondents reported that the calendar made it easier for them to discuss sensitive issues with the interviewer because they could refer to the information they had written down in the calendar. In addition they stated that the calendar did ‘jog their memory’ by helping them relate events from several life domains to each other. Respondents are also reported to enjoy filling in the calendar (
Hoppin et al. 1998). In a comparison of a calendar-based questionnaire with a traditional questionnaire, respondents appeared to be more patient and cooperative in the calendar condition (
Engel et al. 2001b). They were more concerned with data quality when a calendar was used, and sometimes even asked for a copy of the completed calendar to take home with them. Caspi and his colleagues (1996) note that, in a pilot study, respondents actively corrected the information in the interviewer-administered paper-and-pencil calendar. Literature on the use of satisficing strategies by respondents suggests that this higher involvement with the interviewing process and objective should have a positive effect on data quality (e.g.,
Holbrook et al. 2003).
On the whole, the respondent feedback about calendar instruments is often positive. This is especially true for interviews in which respondents can see the calendar and use it as a visual recall aid (e.g.,
Engel et al. 2001b;
Hoppin et al. 1998). When calendars are used mainly as a tool for the interviewer, interviewer evaluations of the procedure tend to be more positive than respondent evaluations (
Belli 2000).
7 Conclusion and discussion
This review illustrates that applications of calendar instruments in social research have been growing rapidly during the last decade. Calendar instruments are used in a wide variety of research fields and with very diverse populations. The instruments are used in personal as well as in telephone interviews, and they serve either purely as a recall aid or as an instrument of data collection. In some studies respondents filled in calendars according to written instructions and without the assistance of an interviewer. Computerized versions of calendar instruments are available for both CATI and CAPI applications, but not (yet) for self-completed questionnaires.
Methodological studies on the use of calendar techniques show mixed results with regard to effects on data quality. Effects are found for some issues, but not for others. Sometimes those effects are strong, but mostly they are modest. Most calendar instruments are found to increase the completeness of respondents’ accounts. This is especially true for the reduction of ‘gaps’, i.e., time unaccounted for. Beneficial effects are also often reported on the accuracy of the number and characteristics of reported events. From the available empirical evidence, it can be concluded that calendars also enhance consistency of responses. Additionally, they are reported to lead to a reduction of dating error, though results on telescoping and heaping are mixed. Furthermore, several studies report interaction effects of the calendar application with the difficulty of the recall task, indicating that calendars might be especially helpful for recalling less recent, more frequent and less salient events. However, up to the present the number of methodological studies is limited and most of the supposed beneficial effects of the calendar reported in this review need more empirical evidence.
The potential operational costs, in terms of increased interview duration and additional interviewer training, as well as consequences for sampling, are not clear yet: both positive and negative consequences were reported. The interviewer response to calendar instruments is generally good, while respondent evaluations seem to depend on the degree to which respondents can see and experience the calendar. This ‘user evaluation’ is important. Since working with calendar instruments requires some extra effort, it is crucial to keep up interviewer and respondent motivation. Overall, it is encouraging that the operational problems that arose in this review do not appear to be insurmountable. Again, more research is needed, which may then lead to ‘best practices’.
We should be aware of the fact that applying calendar procedures can also be counterproductive. They may create consistency and completeness in the data that does not represent higher validity but biased reconstructions instead. This artifact may arise especially if respondents’ behavior is less consistent than is assumed by the researcher. It is possible, for example, that women use contraceptive methods during pregnancy (
Becker and Sosa 1992), or that somebody holds a full-time and a part-time job at the same time. Likewise, in one culture certain events or states (e.g., multiple simultaneous marriages) might be mutually exclusive and thus reflect response inconsistency, while they are not in another culture (
Axinn et al. 1999). While in many cases removal of such inconsistencies will result in more valid data, it will lead to error in other cases. Correspondingly, when it comes to ‘time unaccounted for’ in the respondent’s life history, asking the respondent to ‘fill in the gaps’ might not always be a good decision. When the gap stands for an episode that the respondent cannot recall during the interview, he or she might stretch previous or subsequent episodes in order to reduce the amount of time unaccounted for, leading to decreased data quality (
Auriat 1993).
The literature review revealed that the growing interest in calendar methods resulted in many different applications with many different names, which suggests that researchers do not take enough advantage of each other’s efforts. It also appeared that most applications are characterized by a modest theoretical foundation, irrespective of the apparent fact that calendar methodology may utilize organizing principles of autobiographical memory (
Belli 1998;
Belli et al. 2001). Theories and insights from cognitive science and related fields (see
Tourangeau et al. 2000) can be further employed to formulate a theoretical framework on recall bias and calendar techniques in social surveys. It is clear that a calendar procedure has to be attuned to the subject matter of the specific survey. Different topics (e.g., educational history versus alcohol consumption) and different populations also entail differences in the requested information and thus different recall tasks. Computer-assisted applications may be helpful to adjust the procedures to a certain (group) profile that is derived from the respondent’s answers to earlier questions from the questionnaire, as has been done in ‘pre-loading’ electronic questionnaires (
Hoogendoorn 2004).
However, from the review it appears that it often remains unknown and unspecified which design characteristics of calendars have which kinds of effects on recall accuracy. Therefore a main line of future methodological research would be to perform experimental studies that can shed light on this relationship. Those studies might address topics such as the optimal length of the reference period, number of domains included for maximum effects, effectiveness of different kinds of landmarks, or the effectiveness of the calendar together with different modes of data collection. The issue of non-response also remains very important. Apart from that, further research is needed regarding the roles of the interviewer and respondent. This includes questions like: when should the calendar be used as a questionnaire in itself and when as a separate recall aid? How active and/or steering should the role of the interviewer be? In what way should the interviewer probe and/or help the respondent fill out the calendar?
In sum, our proposal for future research involves a research program that includes at least the following variables:
1.
Independent variables: properties of calendar techniques, such as types of cues and landmarks, the bounding of reference periods, and the graphical design.
2.
Conditional variables: the difficulty of the recall task (like the length of the recall interval), group differences (e.g., age groups) and modes of data collection.
3.
Dependent variables: data quality of responses to factual, retrospective questions on the occurrence, properties, dates and duration of life events in different life domains.
Both small-scale experiments and experimental studies in large-scale surveys are required in order to provide information on the working and effectiveness of calendar applications. In order to further develop and adjust the rationale of the calendar methodology, studies are needed that explicitly test the theoretical ideas that underlie a specific calendar application. It is obvious that many fields might benefit from such methodological studies into calendar procedures. Calendar instruments emerge from the literature as promising methods to enhance retrospective reports. A more systematic approach in the development of calendar methodology might increase their relevance and effectiveness substantially.