Journal of Informetrics

Volume 11, Issue 1, February 2017, Pages 128-151

Regular article
Three practical field normalised alternative indicator formulae for research evaluation

https://doi.org/10.1016/j.joi.2016.12.002

Highlights

  • Introduces a new field normalised indicator formula with confidence limits.

  • Introduces a field normalised indicator for the proportion of cited articles.

  • Introduces a practical strategy for funders to use alternative indicators for research evaluation.

Abstract

Although altmetrics and other web-based alternative indicators are now commonplace in publishers’ websites, they can be difficult for research evaluators to use because of the time or expense of the data, the need to benchmark in order to assess their values, the high proportion of zeros in some alternative indicators, and the time taken to calculate multiple complex indicators. These problems are addressed here by (a) a field normalisation formula, the Mean Normalised Log-transformed Citation Score (MNLCS) that allows simple confidence limits to be calculated and is similar to a proposal of Lundberg, (b) field normalisation formulae for the proportion of cited articles in a set, the Equalised Mean-based Normalised Proportion Cited (EMNPC) and the Mean-based Normalised Proportion Cited (MNPC), to deal with mostly uncited data sets, (c) a sampling strategy to minimise data collection costs, and (d) free unified software to gather the raw data, implement the sampling strategy, and calculate the indicator formulae and confidence limits. The approach is demonstrated (but not fully tested) by comparing the Scopus citations, Mendeley readers and Wikipedia mentions of research funded by Wellcome, NIH, and MRC in three large fields for 2013–2016. Within the results, statistically significant differences in both citation counts and Mendeley reader counts were found even for sets of articles that were less than six months old. Mendeley reader counts were more precise than Scopus citations for the most recent articles and all three funders could be demonstrated to have an impact in Wikipedia that was significantly above the world average.

Introduction

Citation analysis is now a standard part of the research evaluation toolkit. Citation-based indicators are relatively straightforward to calculate and are inexpensive compared to peer review. Cost is a key issue for evaluations designed to inform policy decisions because these tend to cover large numbers of publications but may have a restricted budget. For example, reports on government research policy or national research performance can include citation indicators (e.g., Elsevier, 2013; Science-Metrix, 2015), as can programme evaluations by research funders (Dinsmore, Allen, & Dolby, 2014). Although funding programme evaluations can be conducted by aggregating end-of-project reviewer scores (Hamilton, 2011), this does not allow benchmarking against research funded by other sources in the way that citation counts do. The increasing need for such evaluations is driven by a recognition that public research funding must be accountable (Jaffe, 2002) and by the need for charitable organisations to monitor their effectiveness (Hwang & Powell, 2009).

The use of citation-based indicators has many limitations. Some well discussed issues, such as the existence of negative citations, systematic failures to cite important influences and field differences (MacRoberts and MacRoberts, 1996, Seglen, 1998, MacRoberts and MacRoberts, 2010), can be expected to average out when using appropriate indicators and comparing large enough collections of articles (van Raan, 1998). Other problems are more difficult to deal with, such as language biases within the citation databases used for the raw data (Archambault, Vignola-Gagne, Côté, Larivière, & Gingras, 2006; Li, Qiao, Li, & Jin, 2014). More fundamentally, the ultimate purpose of research, at least from the perspective of many funders, is not to understand the world but to help shape it (Gibbons et al., 1994). An important limitation of citations is therefore that they do not directly measure the commercial, cultural, social or health impacts of research. This has led to the creation and testing of many alternative types of indicators, such as patent citation counts (Jaffe, Trajtenberg, & Henderson, 1993; Narin, 1994), webometrics/web metrics (Thelwall & Kousha, 2015a) and altmetrics/social media metrics (Priem, Taraborelli, Groth, & Neylon, 2010; Thelwall & Kousha, 2015b). These indicators can exploit information created by non-scholars, such as industrial inventors’ patents, and may therefore reflect non-academic types of impacts, such as commercial value.

A practical problem with many alternative indicators (i.e., those not based on citation counts) is that there is no simple cheap source for them. It can therefore be time-consuming or expensive for organisations to obtain, say, a complete list of the patent citation counts for all of their articles. This problem is exacerbated if an organisation needs to collect the same indicators for other articles so that they can benchmark their performance against the world average or against other similar organisations. Even if the cost is the same as for citation counts, alternative indicators need to be calculated in addition to, rather than instead of, citation counts (e.g., Dinsmore et al., 2014; Thelwall, Kousha, Dinsmore, & Dolby, 2016) and so their costs can outweigh their value. This can make it impractical to calculate a range of alternative indicators to reflect different types of impacts, despite this seeming to be a theoretically desirable strategy. The problem is compounded by alternative indicator data usually being much sparser than citation counts (Kousha & Thelwall, 2008; Thelwall, Haustein, Larivière, & Sugimoto, 2013; Thelwall & Kousha, 2008). For example, in almost all Scopus categories, over 90% of articles have no patent citations (Kousha & Thelwall, in press-b). The low values involved make it more important to use statistical methods to detect whether differences between groups of articles are significant. Finally, the highly skewed nature of citation counts and most alternative indicator data causes problems with simple methods of averaging to create indicators, such as the arithmetic mean, and complicates the task of identifying the statistical significance of differences between groups of articles.

This article addresses the above problems and introduces a relatively simple and practical strategy to calculate a set of alternative indicators for a collection of articles in an informative way. The first component of the strategy is the introduction of a new field normalisation formula, the Mean Normalised Log-transformed Citation Score (MNLCS) for benchmarking against the world average. As argued below, this is simpler and more coherent than a previous similar field normalisation approach to deal with skewed indicator data. The second component is the introduction of a second new field normalisation formula, the Equalised Mean-based Normalised Proportion Cited (EMNPC), that targets sparse data, and an alternative, the Mean-based Normalised Proportion Cited (MNPC). The third component is a simple sampling strategy to reduce the amount of data needed for effective field normalisation. The final component is a single, integrated software environment for collecting and analysing the data so that evaluators can create their own alternative indicator reports for a range of indicators with relative ease. The methods are illustrated with a comparative evaluation of the impact of the research of three large medical funders using three types of data: Scopus citation counts; Mendeley reader counts; and Wikipedia citations.

Section snippets

Mean normalised log-transformed citation score

The citation count of an article must be compared to the citation counts of other articles in order to be assessed. The same is true for collections of articles and a simple solution would be to calculate the average number of citations per article for two or more collections so that the values can be compared. This is a flawed approach for the following reasons that have led to the creation of improved methods.

Older articles tend to be more cited than younger articles (Wallace, Larivière, & Gingras, 2009) …
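The MNLCS formula itself is cut off in this excerpt. Purely as an illustration, the following Python sketch assumes one plausible reading of the definition: each article's citation count c is transformed to ln(1 + c), divided by the world mean of ln(1 + c) for the same field and year, and the resulting ratios are averaged over the group. The function name and data layout are hypothetical, and this is not the paper's own software.

```python
import math
from collections import defaultdict

def mnlcs(group, world):
    """Sketch of a Mean Normalised Log-transformed Citation Score.

    group: list of (field_year, citations) tuples for the evaluated set.
    world: list of (field_year, citations) tuples for the benchmark sample.
    Assumes (as an interpretation, not a quotation of the paper) that the
    normaliser is the world mean of ln(1 + citations) per field/year.
    """
    # World mean of ln(1 + c) for each field/year combination.
    totals, counts = defaultdict(float), defaultdict(int)
    for fy, c in world:
        totals[fy] += math.log(1 + c)
        counts[fy] += 1
    world_mean = {fy: totals[fy] / counts[fy] for fy in totals}

    # Normalise each group article by its field/year world mean, then average.
    ratios = [math.log(1 + c) / world_mean[fy]
              for fy, c in group if world_mean.get(fy)]
    return sum(ratios) / len(ratios)
```

Because ln(1 + c) is far less skewed than raw citation counts, the group mean of the normalised values is a more stable estimate and lends itself to the simple confidence limits discussed later.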

Equalised mean-based normalised proportion cited

If a data set for an indicator includes many zeros (i.e., uncited articles or articles with an alternative indicator score of 0) then, as argued above, the MNLCS confidence limits may be wide or undefined. A possible solution to this would be to remove all journals with too many zeros (Bornmann & Haunschild, 2016a) but this produces a biased subset of journals and may reduce the sample sizes substantially if there is a high overall proportion of zeros. A different approach is to calculate the
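The EMNPC formula is also truncated here. As a rough sketch of the general idea only (comparing the proportion of cited articles with the world, without letting any one field dominate), the code below assumes the indicator is the group's pooled proportion cited divided by the world's pooled proportion cited over the same fields and years, with the world counts drawn from equal-sized samples per field; the paper's exact definition may differ.

```python
def emnpc(group_cited, group_total, world_cited, world_total):
    """Illustrative sketch of an equalised proportion-cited indicator.

    Each argument maps a field/year key to a count of articles.  The formula
    shape is an assumption: pooled group proportion cited divided by pooled
    world proportion cited, with world counts drawn from equal-sized samples
    per field so that large fields do not dominate the benchmark.
    """
    fields = set(group_total) & set(world_total)
    group_prop = (sum(group_cited[f] for f in fields)
                  / sum(group_total[f] for f in fields))
    world_prop = (sum(world_cited[f] for f in fields)
                  / sum(world_total[f] for f in fields))
    return group_prop / world_prop
```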

Mean-based normalised proportion cited

As suggested by an anonymous referee, an alternative approach for calculating the proportion of articles cited is to echo the MNCS: for each article, replace its citation count with the reciprocal of the world proportion cited for its field and year if the citation count is positive, and with 0 otherwise. Let $p_{gf} = s_{gf}/n_{gf}$ be the proportion of articles cited for group $g$ in field/year $f$ and let $p_{wf} = s_{wf}/n_{wf}$ be the proportion of the world's articles cited in field/year $f$. Then

$$r_i = \begin{cases} 0 & \text{if } c_i = 0 \\ 1/p_{wf} & \text{if } c_i > 0 \end{cases}$$

where article $i$ …
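The averaging step that turns the $r_i$ values into the MNPC is cut off above; by analogy with the MNCS, the sketch below assumes the indicator is simply the mean of the $r_i$ values for the group. The names are illustrative.

```python
def mnpc(articles, world_prop_cited):
    """Sketch of the Mean-based Normalised Proportion Cited.

    articles: list of (field_year, citations) tuples for the group.
    world_prop_cited: dict mapping field_year -> world proportion cited p_wf.
    Each cited article contributes 1 / p_wf and each uncited article 0;
    the indicator is assumed to be the mean of these values.
    """
    r = [(1.0 / world_prop_cited[fy]) if c > 0 else 0.0
         for fy, c in articles]
    return sum(r) / len(r)
```

Under this reading, a value above 1 indicates that a larger share of the group's articles are cited than would be expected from the world proportions for their fields and years.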

Research questions

This article primarily introduces a strategy for calculating field normalisation formulae and associated confidence intervals. The following research questions are designed to demonstrate the indicators rather than to give conclusive evidence of their value.

  • RQ1: Are MNLCS, MNPC and EMNPC practical in the sense of being able to distinguish between different groups?

  • RQ2: Are EMNPC and MNPC preferable to MNLCS when the proportion of cited articles is low?

Data and methods

Large medical research funders were selected to test the new formulae because these are important users of citation analysis and they fund research within a relatively narrow area. The National Institutes of Health (NIH) conducts and funds biomedical research in the U.S.A. The U.K. equivalent is the Medical Research Council (MRC). The U.K.-based Wellcome Trust biomedical research charity is the largest similar non-government research funder. All are based in advanced English-speaking nations

Results

This section describes the results from the perspective of evaluating the funders and the research questions are returned to in the discussion. Here, graphs with confidence intervals are reported rather than hypothesis tests because these reveal effect sizes (Cumming, 2012) as well as being suitable for situations where multiple comparisons are possible. For the MNLCS, the effect size is in terms of the ratio of the average logged citations per article for a group to the world average. For EMNPC
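To illustrate this style of reporting, the sketch below plots indicator values with their confidence limits as error bars against a reference line at 1 (the world average). The function and any values passed to it are purely illustrative and are not results from the paper.

```python
import matplotlib.pyplot as plt

def plot_indicator_with_limits(labels, values, lower, upper):
    """Plot field normalised indicator values with confidence limits.

    labels: group names; values: indicator point estimates;
    lower/upper: the corresponding confidence limits.
    A horizontal line at 1 marks the world average, so an error bar
    lying wholly above (below) the line indicates a group significantly
    above (below) the world average.
    """
    yerr = [[v - lo for v, lo in zip(values, lower)],
            [hi - v for v, hi in zip(values, upper)]]
    plt.errorbar(range(len(values)), values, yerr=yerr, fmt="o", capsize=4)
    plt.axhline(1.0, linestyle="--", color="grey")
    plt.xticks(range(len(labels)), labels)
    plt.ylabel("Indicator value (world average = 1)")
    plt.tight_layout()
    plt.show()
```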

Discussion

The main limitation of this study is that only three groups were investigated (MRC, NIH, Wellcome) and that the empirical results are therefore not conclusive. They primarily serve to illustrate the new methods introduced, show that the claims are broadly credible, and demonstrate that the formulae are capable of generating useful results. A practical limitation of the indicators is that some subject categories in Scopus and WoS contain periodicals that are rarely cited (e.g., trade

Conclusions

This article introduced new field (and year) normalised formulae for indicators, MNLCS, MNPC and EMNPC, all of which can be used to generate estimates of how far above or below the world average a set of articles is. It is possible to calculate approximate confidence intervals for them without bootstrapping as long as the raw data approximately follows the discretised lognormal distribution (MNLCS) and does not have too many zeros (MNPC), and so they seem to be superior to many previous
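The confidence interval formulae themselves are not reproduced in this excerpt. The sketch below shows one standard approximation consistent with the description: a normal-approximation interval for the group mean of ln(1 + c), divided by the world mean, which is treated here as known without error. The paper's actual procedure may differ (for example, by also propagating the uncertainty in the world mean).

```python
import math

def mnlcs_confidence_limits(group_logs, world_mean_log, z=1.96):
    """Approximate 95% confidence limits for an MNLCS-style ratio.

    group_logs: ln(1 + citations) values for the group (field/year
                normalisation applied beforehand if required).
    world_mean_log: benchmark world mean of ln(1 + citations), treated
                    here as fixed, which is a simplifying assumption.
    """
    n = len(group_logs)
    mean = sum(group_logs) / n
    variance = sum((x - mean) ** 2 for x in group_logs) / (n - 1)
    se = math.sqrt(variance / n)
    return (mean - z * se) / world_mean_log, (mean + z * se) / world_mean_log
```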

Author contributions

Mike Thelwall: Conceived and designed the analysis; collected the data; contributed data or analysis tools; performed the analysis; wrote the paper.

References (66)

  • M. Thelwall, Are there too many uncited articles? Zero inflated variants of the discretised lognormal and hooked power law distributions, Journal of Informetrics (2016)

  • M. Thelwall, The precision of the arithmetic mean, geometric mean and percentiles for citation data: An experimental simulation modelling approach, Journal of Informetrics (2016)

  • M. Thelwall, The discretised lognormal and hooked power law distributions for complete citation data: Best options for modelling and regression, Journal of Informetrics (2016)

  • A.F. van Raan et al., Rivals for the crown: Reply to Opthof and Leydesdorff, Journal of Informetrics (2010)

  • M.L. Wallace et al., Modeling a century of citation distributions, Journal of Informetrics (2009)

  • L. Waltman et al., Towards a new crown indicator: Some theoretical considerations, Journal of Informetrics (2011)

  • É. Archambault et al., Benchmarking scientific output in the social sciences and humanities: The limits of existing databases, Scientometrics (2006)

  • B.J.R. Bailey, Confidence limits to the risk ratio, Biometrics (1987)

  • R.A. Berk et al., Statistical inference for apparent populations, Sociological Methodology (1995)

  • K.A. Bollen, Apparent and nonapparent significance tests, Sociological Methodology (1995)

  • L. Bornmann et al., How to normalize Twitter counts? A first attempt based on journals in the Twitter Index, Scientometrics (2016)

  • G. Cumming, Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis (2012)

  • A. Dinsmore et al., Alternative perspectives on impact: The potential of ALMs and altmetrics to inform funders about research impact, PLoS Biology (2014)

  • F. de Moya-Anegón et al., Coverage analysis of Scopus: A journal metric approach, Scientometrics (2007)

  • D. de Solla Price, A general theory of bibliometric and other cumulative advantage processes, Journal of the American Society for Information Science (1976)

  • Elsevier, International comparative performance of the UK research base – 2013 (2013)

  • G. Eysenbach, Can tweets predict citations? Metrics of social impact based on Twitter and correlation with traditional metrics of scientific impact, Journal of Medical Internet Research (2011)

  • E.C. Fieller, Some problems in interval estimation, Journal of the Royal Statistical Society Series B (1954)

  • M. Gibbons et al., The new production of knowledge: The dynamics of science and research in contemporary societies (1994)

  • W. Gunn, Social signals reflect academic impact: What it means when a scholar adds a paper to Mendeley, Information Standards Quarterly (2013)

  • S. Hamilton, Evaluation of the ESRC’s participation in European collaborative research projects (ECRPs) (2011)

  • H. Hwang et al., The rationalization of charity: The influences of professionalism in the nonprofit sector, Administrative Science Quarterly (2009)

  • A.B. Jaffe et al., Geographic localization of knowledge spillovers as evidenced by patent citations, The Quarterly Journal of Economics (1993)