Review Article
Do open access articles have greater citation impact?: A critical review of the literature

https://doi.org/10.1016/j.joi.2007.04.001

Abstract

The last few years have seen the emergence of several open access options in scholarly communication which can broadly be grouped into two areas referred to as ‘gold’ and ‘green’ open access (OA). In this article we review the literature examining the relationship between OA status and citation counts of scholarly articles. Early studies showed a correlation between the free online availability or OA status of articles and higher citation counts, and implied causality without due consideration of potential confounding factors. More recent investigations have dissected the nature of the relationship between article OA status and citations. Three non-exclusive postulates have been proposed to account for the observed citation differences between OA and non-OA articles: an open access postulate, a selection bias postulate, and an early view postulate. The most rigorous study to date (in condensed matter physics) showed that, after controlling for the early view postulate, the remaining difference in citation counts between OA and non-OA articles is explained by the selection bias postulate. No evidence was found to support the OA postulate per se; i.e. article OA status alone has little or no effect on citations. Further studies using a similarly rigorous approach are required to determine the generality of this finding.

Introduction

With the advent of the Internet and electronic publishing, new models of scholarly communication have emerged that simultaneously complement and challenge established systems. Although the term ‘open access’ (OA) is taken broadly to mean that accessing, downloading, and reading material is free to the entire population of Internet users, several options for the provision of that access have emerged. These can be grouped into two broad models: ‘gold’ and ‘green’. The gold model uses a traditional journal publication system, but shifts the financial model. Instead of a subscriber paying to read the final version of a peer-reviewed article, an author or sponsor pays to publish the article, and reading the article is free to anyone wishing to do so. A journal may operate wholly under this model, or may use a hybrid model combining subscription and article sponsorship. The green model of OA relies on depositing the author's manuscript of an article in an institutional or subject-based electronic archive, either in the form of a pre-print (as submitted to a journal for peer review) or as a final copy of the peer-reviewed edited full text (a post-print). A less rigorous but increasingly common form of archiving is the use of individual author webpages, outside of a structured archive. Articles that are posted as pre-prints may subsequently be accepted for publication in a journal and may then also be archived as post-prints, sometimes, but not universally, replacing the pre-print version. Economically, green OA relies on the sustainability of the existing journal system as, unlike gold OA, it does not provide any financial support for journals.

An increasing amount of research on the effects of OA models on scholarly communication has emerged in recent years, and it is clear that the methods for performing this research have been developing alongside the new models that they study. One of the foremost questions asked is: ‘Do open access research articles have a greater citation impact?’ Another way of asking this question at the most personal level for the authors of journal articles is: ‘Will my research paper(s), and therefore will I, get a citation benefit from the gold and green open access models?’ In this article we survey the original research literature on this topic to date, with a particular emphasis on methodological issues, and highlight areas in which further research is required. As this is an evolving area of research, the terminology has yet to become fixed; we follow the terminology used by the authors of each article under discussion.

Methodological issues in citation analysis

A citation is defined as the listing of a previously published article in the reference section of a current work; this is usually taken to imply the relevance of the cited article to the current work. Information about articles and the citations between them is collected in databases known as citation indexes. The best-known example of a citation index is Thomson Scientific's Web of Science®, which now contains about 40 million bibliographic records and over 550 million citations from the

Correlations of online availability and increased citations

The first research that showed a correlation between articles made available online and higher citations was carried out by Lawrence (2001a, 2001b). Lawrence based his study solely on conference proceedings articles in computer science and related disciplines that were listed in the DBLP Computer Science Bibliography. He assessed the availability of a corresponding full-text article online and the number of citations (excluding author self-citations) received to date using the

Correlations of open access and increased citations

The first study to assess the effect of green OA relating to published journal articles (and not simply to conference proceedings and ‘online availability’) was published by Harnad and Brody (2004), and elements of these data were included in later papers (Brody, 2004; Harnad et al., 2004). Over 95,000 pre-print manuscripts in physics and mathematics deposited in the subject-based repository arXiv (http://www.arXiv.org) were matched with the final published journal article indexed in Thomson

When should citation counting begin?

None of the studies discussed so far have taken account of the critical dimension of temporal progression: that is, the time difference between when an article is made available online or deposited in an electronic archive and when it is published. The sole consideration is whether or not the article was freely available at the time of the study. Furthermore, the relative timing of publication and the counting of references to the article must be precisely defined (ideally imposing a fixed
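The effect of the counting start date can be illustrated with a minimal sketch. It assumes a fixed one-year counting window purely for illustration; the function name, the window length, and all dates below are hypothetical and are not taken from any of the studies reviewed here.

```python
from datetime import date, timedelta


def citations_in_window(start, citing_dates, window_days=365):
    """Count citations dated within `window_days` of `start` (start inclusive, end exclusive)."""
    window_end = start + timedelta(days=window_days)
    return sum(1 for d in citing_dates if start <= d < window_end)


# Hypothetical article: deposited as a pre-print six months before journal publication.
deposited = date(2003, 1, 1)
published = date(2003, 7, 1)

# Hypothetical dates of the works that cite it.
citing_dates = [
    date(2003, 3, 1),
    date(2003, 9, 1),
    date(2004, 2, 1),
    date(2004, 5, 1),
]

# Three counting conventions applied to the same article.
from_deposit = citations_in_window(deposited, citing_dates)
from_publication = citations_in_window(published, citing_dates)
at_time_of_study = len(citing_dates)  # no window: everything received to date

print(from_deposit, from_publication, at_time_of_study)
```

With these illustrative dates the same article receives a different count under each convention, which is why the start date and the length of the counting period need to be stated explicitly when OA and non-OA articles are compared.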

Deconstructing the open access citation effect

All of the studies discussed above were concerned with demonstrating a difference between average citation counts to articles that were made available online and those that were not. While some implied a causal relationship, most acknowledged selection bias as a possible explanation for the observed citation patterns, and some also noted differences in the effective citation life-times of the two groups. The articles discussed in this section represent a new phase in the development of the

What does open access mean for individual authors?

In an attempt to investigate the effect of gold open access on citations, Eysenbach (2006) undertook an analysis of articles published in the latter half of 2004 in a single hybrid gold journal, the Proceedings of the National Academy of Sciences (PNAS). PNAS is a large multidisciplinary journal, publishing in areas as diverse as biochemistry, neuroscience, genetics, biophysics, chemistry, evolution, microbiology, and plant sciences. Articles whose cost of publication was borne by the authors

Conclusions

We posed two questions at the beginning of this article: firstly ‘Do open access research articles have a greater citation impact?’ and secondly ‘Will my research paper(s), and therefore will I, get a citation benefit from the gold and green open access models?’ These two questions in turn represent the main stages in the development of the research literature on this subject. While early work was simply concerned with seeking a positive correlation between open access and citation counts, more

Acknowledgements

The authors would like to thank Jeffrey Aronson at the Department of Clinical Pharmacology, Oxford University, and Henry Small, Chief Scientist at Thomson Scientific, for helpful comments on the manuscript.

References (27)

  • J. Bar-Ilan. An ego-centric citation analysis of the works of Michael O. Rabin based on multiple citation indexes. Information Processing and Management (2006)
  • S. Harnad et al. The access/impact problem and the green and gold roads to open access. Serials Review (2004)
  • M.J. Kurtz et al. The effect of use and access on citations. Information Processing and Management (2005)
  • K. Anderson et al. Publishing online-only peer-reviewed biomedical literature: three years of citation, author perception, and usage experience. Journal of Electronic Publishing (2001)
  • K. Antelman. Do open-access articles have a greater citation impact? College & Research Libraries (2004)
  • K. Antelman. Letter to the Editor: Response to Philip Davis. College & Research Libraries (2006)
  • Antelman, K., Bakkalbasi, N., Goodman, D., Hajjem, C., & Harnad, S. (2005). Evaluation of algorithm performance on...
  • K.W. Boyack. Mapping knowledge domains: Characterizing PNAS. Proceedings of the National Academy of Sciences of the United States of America (2004)
  • Brody, T. (2004). Citation analysis in the open access world. Available at http://eprints.ecs.soton.ac.uk/10000 (link...
  • P.M. Davis. Letter to the Editor: Do open-access articles have a greater citation impact? College & Research Libraries (2006)
  • P.M. Davis et al. Does the arXiv lead to higher citations and reduced publisher downloads for mathematics articles? Scientometrics (2007)
  • G. Eysenbach. Citation advantage of open access articles. PLoS Biology (2006)
  • W. Glänzel et al. Does co-authorship inflate the share of self-citations? Scientometrics (2004)