The h index for research assessment: Simple and popular, but shown by mathematical analysis to be inconsistent and misleading

02/17/2020
by Ricardo Brito, et al.

Citation distributions are lognormal. We use 30 lognormally distributed synthetic series of numbers that simulate real series of citations to investigate the consistency of the h index. Using the lognormal cumulative distribution function, the equation that defines the h index can be formulated; this equation shows that h has a complex dependence on the number of papers (N). Specifically, beyond a certain limit, an increase in N has a very small effect on h, which contradicts rational expectations. We also investigate the correlation between h and the number of papers exceeding various citation thresholds, from 5 to 500 citations. The best correlation, found for the threshold of 100 citations, is not linear, and numerous data points deviate from the general trend. The size-independent indicator h/N shows no correlation with the probability of publishing a paper exceeding any of the citation thresholds. In contrast with the h index, the total number of citations in the series shows a high linear correlation with the number of papers exceeding the thresholds of 10 and 50 citations. Dividing by N, the mean number of citations correlates with the probability of publishing a paper that exceeds any level of citations. The dependence is never linear, but for the thresholds of 20 and 30 citations, the deviation from linearity is low. Thus, in synthetic series, the number of citations and the mean number of citations are much better indicators of research performance than h and h/N. We argue that this conclusion can be extended to real citation series.
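
The dependence of h on N described above can be illustrated with a minimal sketch. It is not the authors' code, and the abstract does not give their exact equation. The sketch assumes the defining condition takes the form N * P(X >= h) = h, where X is lognormal with log-mean mu and log-standard deviation sigma. The function names and the values mu = 1.5 and sigma = 1.1 are hypothetical, chosen only for illustration.

```python
import math
import random


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def lognormal_survival(x, mu, sigma):
    """P(X >= x) for a lognormal X with log-mean mu and log-sd sigma."""
    if x <= 0:
        return 1.0
    z = (math.log(x) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))


def expected_h(n, mu, sigma):
    """Solve n * P(X >= h) = h by bisection (assumed form of the h equation)."""
    lo, hi = 0.0, float(n)
    for _ in range(100):
        mid = (lo + hi) / 2
        if n * lognormal_survival(mid, mu, sigma) > mid:
            lo = mid  # more than mid papers exceed mid citations: h is larger
        else:
            hi = mid
    return (lo + hi) / 2


def synthetic_series(n, mu, sigma, seed=0):
    """Integer citation counts drawn from a lognormal (hypothetical parameters)."""
    rng = random.Random(seed)
    return [int(rng.lognormvariate(mu, sigma)) for _ in range(n)]


if __name__ == "__main__":
    MU, SIGMA = 1.5, 1.1  # illustrative values, not taken from the paper
    for n in (100, 1000, 10000):
        series = synthetic_series(n, MU, SIGMA)
        print(n, h_index(series), round(expected_h(n, MU, SIGMA), 1))
```

Under these assumptions, a tenfold increase in N raises the solved h by far less than a factor of ten, which is consistent with the weak dependence on N that the abstract reports for large series.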
