Three practical field normalised alternative indicator formulae for research evaluation
Introduction
Citation analysis is now a standard part of the research evaluation toolkit. Citation-based indicators are relatively straightforward to calculate and are inexpensive compared to peer review. Cost is a key issue for evaluations designed to inform policy decisions because these tend to cover large numbers of publications but may have a restricted budget. For example, reports on government research policy or national research performance can include citation indicators (e.g., Elsevier, 2013; Science-Metrix, 2015), as can programme evaluations by research funders (Dinsmore, Allen, & Dolby, 2014). Although funding programme evaluations can be conducted by aggregating end-of-project reviewer scores (Hamilton, 2011), this does not allow benchmarking against research funded by other sources in the way that citation counts do. The increasing need for such evaluations is driven by a recognition that public research funding must be accountable (Jaffe, 2002) and by the need for charitable organisations to monitor their effectiveness (Hwang & Powell, 2009).
The use of citation-based indicators has many limitations. Some well-discussed issues, such as the existence of negative citations, systematic failures to cite important influences and field differences (MacRoberts & MacRoberts, 1996; Seglen, 1998; MacRoberts & MacRoberts, 2010), can be expected to average out when appropriate indicators are used to compare large enough collections of articles (van Raan, 1998). Other problems are more difficult to deal with, such as language biases within the citation databases used for the raw data (Archambault, Vignola-Gagne, Côté, Larivière, & Gingras, 2006; Li, Qiao, Li, & Jin, 2014). More fundamentally, the ultimate purpose of research, at least from the perspective of many funders, is not to understand the world but to help shape it (Gibbons et al., 1994). An important limitation of citations is therefore that they do not directly measure the commercial, cultural, social or health impacts of research. This has led to the creation and testing of many alternative types of indicators, such as patent citation counts (Jaffe, Trajtenberg, & Henderson, 1993; Narin, 1994), webometrics/web metrics (Thelwall & Kousha, 2015a) and altmetrics/social media metrics (Priem, Taraborelli, Groth, & Neylon, 2010; Thelwall & Kousha, 2015b). These indicators can exploit information created by non-scholars, such as industrial inventors' patents, and may therefore reflect non-academic types of impacts, such as commercial value.
A practical problem with many alternative indicators (i.e., those not based on citation counts) is that there is no simple, cheap source for them. It can therefore be time-consuming or expensive for organisations to obtain, say, a complete list of the patent citation counts for all of their articles. This problem is exacerbated if an organisation needs to collect the same indicators for other articles so that it can benchmark its performance against the world average or against other similar organisations. Even if the cost is the same as for citation counts, alternative indicators need to be calculated in addition to, rather than instead of, citation counts (e.g., Dinsmore et al., 2014; Thelwall, Kousha, Dinsmore, & Dolby, 2016) and so their costs can outweigh their value. This can make it impractical to calculate a range of alternative indicators to reflect different types of impacts, despite this seeming to be a theoretically desirable strategy. The problem is compounded by alternative indicator data usually being much sparser than citation counts (Kousha & Thelwall, 2008; Thelwall, Haustein, Larivière, & Sugimoto, 2013; Thelwall & Kousha, 2008). For example, in almost all Scopus categories, over 90% of articles have no patent citations (Kousha & Thelwall, in press-b). These low values make it more important to use statistical methods to detect whether differences between groups of articles are significant. Finally, the highly skewed nature of citation counts and most alternative indicator data causes problems for simple averaging methods, such as the arithmetic mean, and complicates the task of identifying the statistical significance of differences between groups of articles.
This article addresses the above problems and introduces a relatively simple and practical strategy to calculate a set of alternative indicators for a collection of articles in an informative way. The first component of the strategy is the introduction of a new field normalisation formula, the Mean Normalised Log-transformed Citation Score (MNLCS) for benchmarking against the world average. As argued below, this is simpler and more coherent than a previous similar field normalisation approach to deal with skewed indicator data. The second component is the introduction of a second new field normalisation formula, the Equalised Mean-based Normalised Proportion Cited (EMNPC), that targets sparse data, and an alternative, the Mean-based Normalised Proportion Cited (MNPC). The third component is a simple sampling strategy to reduce the amount of data needed for effective field normalisation. The final component is a single, integrated software environment for collecting and analysing the data so that evaluators can create their own alternative indicator reports for a range of indicators with relative ease. The methods are illustrated with a comparative evaluation of the impact of the research of three large medical funders using three types of data: Scopus citation counts; Mendeley reader counts; and Wikipedia citations.
Mean normalised log-transformed citation score
The citation count of an article can only be assessed by comparison with the citation counts of other articles. The same is true for collections of articles, and a simple solution would be to calculate the average number of citations per article for two or more collections so that the values can be compared. This simple approach is flawed for several reasons, which have led to the creation of improved methods.
Older articles tend to be more cited than younger articles (Wallace, Larivière, & Gingras, 2009), and typical citation counts also differ substantially between fields, so citation counts must be normalised for both field and publication year before collections are compared.
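To make this concrete, the sketch below computes an MNCS-style mean of log-transformed citation counts: each article's ln(1 + citations) is divided by the world mean of ln(1 + citations) for its field and year, and the ratios are averaged. It is an illustrative reconstruction rather than the article's published code; the pandas layout and the column names 'field', 'year' and 'citations' are assumptions.

```python
import numpy as np
import pandas as pd

def mnlcs(group: pd.DataFrame, world: pd.DataFrame) -> float:
    """Illustrative MNLCS: mean of field/year-normalised log citation counts.

    Both frames are assumed to hold one row per article with columns
    'field', 'year' and 'citations' (non-negative integers).
    """
    world = world.assign(log_c=np.log1p(world["citations"]))
    # World average of ln(1 + citations) per field/year stratum.
    world_means = (
        world.groupby(["field", "year"])["log_c"].mean().rename("world_mean")
    )
    group = group.assign(log_c=np.log1p(group["citations"]))
    merged = group.merge(
        world_means.reset_index(), on=["field", "year"], how="left"
    )
    # Normalise each article by its stratum's world mean, then average.
    # (Assumes every stratum has a positive world mean.)
    return float((merged["log_c"] / merged["world_mean"]).mean())
```

A value of 1 indicates world-average impact for the group; values above or below 1 indicate above- or below-average impact.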
Equalised mean-based normalised proportion cited
If a data set for an indicator includes many zeros (i.e., uncited articles or articles with an alternative indicator score of 0) then, as argued above, the MNLCS confidence limits may be wide or undefined. A possible solution would be to remove all journals with too many zeros (Bornmann & Haunschild, 2016a), but this produces a biased subset of journals and may reduce the sample sizes substantially if there is a high overall proportion of zeros. A different approach is to calculate the proportion of articles with a non-zero score in each field and year and to compare this with the corresponding world proportion, giving each field equal weight; this is the Equalised Mean-based Normalised Proportion Cited (EMNPC), sketched below.
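A minimal sketch of this calculation, assuming (as the name suggests) that each field/year stratum receives equal weight, and using illustrative column names 'field', 'year' and 'score'; this is not the article's published procedure:

```python
import pandas as pd

def emnpc(group: pd.DataFrame, world: pd.DataFrame) -> float:
    """Illustrative EMNPC: ratio of group to world proportions of articles
    with a non-zero score, giving each field/year stratum equal weight."""
    # Proportion of articles with a non-zero score per field/year stratum.
    p_group = (group["score"] > 0).groupby([group["field"], group["year"]]).mean()
    p_world = (world["score"] > 0).groupby([world["field"], world["year"]]).mean()
    # Only compare strata present in both data sets.
    common = p_group.index.intersection(p_world.index)
    # Equal weight per stratum: average the proportions before taking the ratio.
    return float(p_group.loc[common].mean() / p_world.loc[common].mean())
```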
Mean-based normalised proportion cited
As suggested by an anonymous referee, an alternative approach for calculating the proportion of articles cited is to echo the MNCS and, for each article, replace its citation count by the reciprocal of the world proportion cited for the field and year if the citation count is positive, and by 0 otherwise. Let $p_{gf}$ be the proportion of articles cited for group $g$ in field/year $f$ and let $p_f$ be the proportion of the world's articles cited in field/year $f$. Then

$$\mathrm{MNPC}_g = \frac{1}{\sum_f n_{gf}} \sum_f n_{gf}\,\frac{p_{gf}}{p_f},$$

where $n_{gf}$ is the number of the group's articles in field/year $f$; averaging the replaced scores within a field/year gives $p_{gf}/p_f$, so the formula is the sample-size weighted mean of these ratios.
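In code, this definition can be implemented directly: replace each article's score by the reciprocal of the world proportion cited for its field/year when the score is positive, 0 otherwise, and take the mean. The sketch below is illustrative; the column names 'field', 'year' and 'score' are assumptions.

```python
import pandas as pd

def mnpc(group: pd.DataFrame, world: pd.DataFrame) -> float:
    """Illustrative MNPC, following the prose definition above."""
    # World proportion of articles with a non-zero score per field/year.
    p_world = (
        (world["score"] > 0)
        .groupby([world["field"], world["year"]])
        .mean()
        .rename("p_world")
        .reset_index()
    )
    merged = group.merge(p_world, on=["field", "year"], how="left")
    # Cited articles contribute 1 / p_world; uncited articles contribute 0.
    # (Assumes p_world > 0 for every stratum containing a cited group article.)
    replaced = (merged["score"] > 0) / merged["p_world"]
    return float(replaced.mean())
```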
Research questions
This article primarily introduces a strategy for calculating field normalisation formulae and associated confidence intervals. The following research questions are designed to demonstrate the indicators rather than to give conclusive evidence of their value.
- RQ1: Are MNLCS, MNPC and EMNPC practical in the sense of being able to distinguish between different groups?
- RQ2: Are EMNPC and MNPC preferable to MNLCS when the proportion of cited articles is low?
Data and methods
Large medical research funders were selected to test the new formulae because they are important users of citation analysis and they fund research within a relatively narrow area. The National Institutes of Health (NIH) conducts and funds biomedical research in the U.S.A. The U.K. equivalent is the Medical Research Council (MRC). The U.K.-based Wellcome Trust biomedical research charity is the largest similar non-governmental research funder. All three are based in advanced English-speaking nations.
Results
This section describes the results from the perspective of evaluating the funders; the research questions are returned to in the discussion. Graphs with confidence intervals are reported rather than hypothesis tests because they reveal effect sizes (Cumming, 2012) and are suitable for situations where multiple comparisons are possible. For the MNLCS, the effect size is the ratio of the average logged citations per article for a group to the world average. For EMNPC and MNPC, it is the group's proportion of cited articles relative to the corresponding world proportion.
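One standard way to attach an approximate interval to a ratio of two independent sample means, of the kind used for these effect sizes, is the delta method. The sketch below is illustrative and not necessarily the exact interval construction used in the article.

```python
import numpy as np

def ratio_ci(group_vals: np.ndarray, world_vals: np.ndarray, z: float = 1.96):
    """Approximate 95% CI for mean(group) / mean(world) via the delta method.

    Illustrative only: assumes the two samples are independent and large
    enough for the normal approximation (e.g., log-transformed citations).
    """
    a, b = group_vals.mean(), world_vals.mean()
    se_a = group_vals.std(ddof=1) / np.sqrt(len(group_vals))
    se_b = world_vals.std(ddof=1) / np.sqrt(len(world_vals))
    ratio = a / b
    # Var(a/b) ~ (a/b)^2 * (se_a^2/a^2 + se_b^2/b^2) for independent means.
    se_ratio = abs(ratio) * np.sqrt((se_a / a) ** 2 + (se_b / b) ** 2)
    return ratio - z * se_ratio, ratio + z * se_ratio
```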
Discussion
The main limitation of this study is that only three groups were investigated (MRC, NIH, Wellcome), so the empirical results are not conclusive. They primarily serve to illustrate the new methods introduced, show that the claims are broadly credible, and demonstrate that the formulae are capable of generating useful results. A practical limitation of the indicators is that some subject categories in Scopus and WoS contain periodicals that are rarely cited (e.g., trade journals).
Conclusions
This article introduced new field (and year) normalised formulae for indicators, MNLCS, MNPC and EMNPC, all of which can be used to estimate how far above or below the world average a set of articles is. Approximate confidence intervals can be calculated for them without bootstrapping as long as the raw data approximately follow the discretised lognormal distribution (MNLCS) and do not have too many zeros (MNPC), and so they seem to be superior to many previous approaches.
Author contributions
Mike Thelwall: Conceived and designed the analysis; collected the data; contributed data or analysis tools; performed the analysis; wrote the paper.
References (66)

- Bornmann, L., & Haunschild, R. (2016). Normalization of Mendeley reader impact on the reader- and paper-side: A comparison of the mean discipline normalized reader score (MDNRS) with the mean normalized reader score (MNRS) and bare reader counts. Journal of Informetrics.
- Bornmann, L., Leydesdorff, L., & Mutz, R. (2013). The use of percentiles and percentile rank classes in the analysis of bibliometric data: Opportunities and limits. Journal of Informetrics.
- Fairclough, R., & Thelwall, M. (2015). National research impact indicators from Mendeley readers. Journal of Informetrics.
- Fairclough, R., & Thelwall, M. (2015). More precise methods for national research citation impact comparisons. Journal of Informetrics.
- Li, Qiao, Li, & Jin (2014). Chinese-language articles are not biased in citations: Evidences from Chinese-English bilingual journals in Scopus and Web of Science. Journal of Informetrics.
- Lundberg, J. (2007). Lifting the crown—citation z-score. Journal of Informetrics.
- Opthof, T., & Leydesdorff, L. (2010). Caveats for the journal and field normalizations in the CWTS (Leiden) evaluations of research performance. Journal of Informetrics.
- Thelwall, M., & Sud, P. (2016). National, disciplinary and temporal variations in the extent to which articles with more authors have more impact: Evidence from a geometric field normalised citation indicator. Journal of Informetrics.
- Thelwall, M. (2016). Are the discretised lognormal and hooked power law distributions plausible for citation data? Journal of Informetrics.
- Thelwall, M. (2016). Citation count distributions for large monodisciplinary journals. Journal of Informetrics.