Wednesday, December 9, 2015

Impact factors are not the same thing as reliability

A very common issue that I see among scientists and science lovers (skeptics, enthusiasts, etc.) is the idea that impact factors are useful in determining the quality and reliability of a scientific study. Some take it to the point that anything with an impact factor less than 10 is questionable. Unfortunately, the problems with this idea become apparent once the impact factor is actually defined. A journal's impact factor for a given year is the number of citations received that year by the articles the journal published in the previous two years, divided by the number of articles the journal published in those two years.

This is the calculation used:
Figure 1: Calculation for journal impact factor.
A= total cites in 1992
B= 1992 cites to articles published in 1990-91 (this is a subset of A)
C= number of articles published in 1990-91
D= B/C = 1992 impact factor
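
As a quick illustration of Figure 1, here is a minimal Python sketch of the same arithmetic. The function name and the example numbers are my own assumptions for illustration and do not come from any real journal.

```python
def impact_factor(cites_to_window: int, articles_in_window: int) -> float:
    """Return D = B / C from Figure 1.

    cites_to_window    -- B: citations this year to articles from the previous two years
    articles_in_window -- C: number of articles published in those two years
    """
    if articles_in_window <= 0:
        raise ValueError("the two-year window must contain at least one article")
    return cites_to_window / articles_in_window


# Hypothetical journal: 1,200 citations in 1992 to the 400 articles it published in 1990-91.
example_if = impact_factor(1200, 400)
print(example_if)  # 3.0
```

Note that A, the total number of citations in 1992, never enters the result. Only the two-year window matters.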

All this means is that the impact factor tells you how popular the previous two years' articles were in the community (as measured by number of citations). Because the calculation is so simple, it is prone to manipulations that artificially inflate it. There have been quite a few articles and papers detailing the issues with the impact factor, including a recent one that talks about some of the damage that impact factor mania leads to. It has gotten to the point where some people simply accept work published in high impact factor journals like Science, Nature and Cell without examining it, despite the dangers of doing so. Even worse, the impact factor has become part and parcel of the process of getting grants, positions, and tenure, despite some people decrying this practice.

How did we get to the point where the impact factor is the be-all and end-all of academic progress and the sole measure of reliability? Part of the problem is that some institutions, grant panels, and science communicators have used the impact factor as a quick measure of reliability. Take what this blogger has said about the impact factor:

"One of the better ways to ascertain scientific research quality to examine the quality of the journal where the research was published. Articles in high quality journals are cited more often, because those journals attract the best scientific articles (which are cited more). Yes, it’s a self-fulfilling system, but that’s not all bad. Some of the most prestigious journals have been around for a century or more, and their reputation is deserved.

Obviously, the best articles are sent to these journals partially because of the prestige of the journal, but also because the peer review is so thorough. Journals use a metric called “impact factor” that essentially states how many times an average article is cited by other articles in an index (in this case for all medical journals)."

Not only is this an incorrect explanation of what an impact factor is (remember, it is the number of citations in a given year to articles from the previous two years divided by the number of articles published in those two years, a window-limited ratio rather than the overall average number of citations per article that the quote implies), but it also puts the impact factor on the same level as quality of peer review and reliability. Although it might be true that journals like Cell, Nature and Science are harder to publish in, they are also very specific in what they are interested in publishing (known in academia as scope) and tend to publish flashier pieces. For example, 30 years ago it would have been common to publish either the complete or partial genome of a virus in Science or Nature. These days you are more likely to publish such a paper in Genome Announcements (GA) or Archives of Virology (AoV). Does this mean that the peer review in GA or AoV is not rigorous, or that the research published there is of lesser quality than what was previously published in Science or Nature? That is not likely to be the case. Advancements in technology have eliminated the novelty (a big draw for journals like Science and Nature). In fact, it is likely that the genome coverage is higher and the sequences are more reliable in recent papers than in the days when a researcher called the bases on a long polyacrylamide sequencing gel. Does this mean that one journal is better than the other? No, they just have different scopes and therefore different foci.

The aforementioned blogger does mention that impact factors aren't the sole determinant of reliability; however, they then come back to impact factors as a shortcut for determining reliability.

"As an independent, objective method to judge the quality of published research, Impact Factor is one of the best available."

Sadly, nothing could be further from reality. This assumes that journals with high impact factors rarely have to retract articles due to fraud or error. That is not the case, as high impact factor journals have more retractions on average than lower impact factor journals. One possible explanation is that journals with a high impact factor have more concrete plans to deal with retractions. However, this has thus far only been studied in high impact factor journals with similar editorial practices regarding retractions, so it does not account for the increase in retractions as the impact factor increases.

Photo caption: Correlation between impact factor and retraction index. The 2010 journal impact factor is plotted against the retraction index as a measure of the frequency of retracted articles from 2001 to 2010 (see text for details). Journals analyzed were Cell, EMBO Journal, FEMS Microbiology Letters, Infection and Immunity, Journal of Bacteriology, Journal of Biological Chemistry, Journal of Experimental Medicine, Journal of Immunology, Journal of Infectious Diseases, Journal of Virology, Lancet, Microbial Pathogenesis, Molecular Microbiology, Nature, New England Journal of Medicine, PNAS, and Science. Credit: Fang et al., 2011, Figure 1.

Another variable in this discussion is that the impact factor is very dependent on the field and the focus of the journal. A perfect example of this is the impact factor of journals focusing on entomology. Even the best of these journals only reach an impact factor of 1-2. Does this mean the research in these journals is not reliable and therefore suspect? Of course not, and here is why. The researchers who will read and cite articles from entomology journals are either entomologists or work in a closely related field. A biomedical researcher is not likely to cite from an entomology journal unless they work with arthropods. It quickly becomes apparent that this is a numbers game. If a field has more people working in it, there will be more citations for journals in that field. Likewise, if a journal has a broader audience base, then it will have more people reading it and citing it.
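
To make the numbers game concrete, here is a toy Python sketch. It is a deliberately oversimplified model of my own, not a description of how citations actually accumulate: it assumes two journals of identical size and identical quality, where the only difference is how many papers per year are written by the community that might cite them. All names and numbers are hypothetical.

```python
def toy_impact_factor(citing_papers_per_year: int,
                      cites_per_100_papers: int,
                      articles_in_window: int) -> float:
    """Toy model: each block of 100 citing papers makes a fixed number of
    citations to the journal's articles from the two-year window."""
    total_cites = citing_papers_per_year * cites_per_100_papers / 100
    return total_cites / articles_in_window


# Same journal size (300 articles in the window) and same citing behavior
# (20 citations per 100 papers); only the size of the citing community differs.
small_field = toy_impact_factor(2_000, 20, 300)    # hypothetical niche field
large_field = toy_impact_factor(40_000, 20, 300)   # hypothetical broad field

print(f"small field: {small_field:.2f}")
print(f"large field: {large_field:.2f}")
```

In this toy model the journal with the larger citing community ends up with an impact factor twenty times higher, even though nothing about the quality of either journal was changed.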

So how do the virology journals make out in this impact factor game? I've put together a list of the 2014 impact factors for the virology-specific journals I could find. Note: I did not include general focus journals that include papers on virology in this list to help illustrate how field-specific journals suffer from this system.

*Journal publisher is on Beall's List of Predatory Publishers

Notice that none of these journals, even the review journals, has an impact factor above 10, which people often use as their minimum for trustworthiness. So does this mean that these journals are automatically suspect? What are we to make of a journal from a publisher on Beall's List having as high an impact factor as other highly respected virology journals? The answer to both of these questions is simple: there must be something else going on here. In order to determine exactly what is going on, we need to dig deeper into what the impact factor was meant to convey and how it originated.

The idea of the impact factor dates back to the 1920s, with a proposal to produce an index that university librarians could use to determine which journals were worth purchasing. In 1955, it was mentioned by Dr. Eugene Garfield with the intention of building an index for scientific papers so that outdated or fraudulent work could be put to rest. This original index was meant to catalog individual papers and record the number of citations each had received. This idea lives on in Google Scholar citations, ResearchGate, Web of Science and several others. Take my Google Scholar page for example. One can easily see which papers I have published and how many times they have been cited. The original idea for the impact factor was to make citing work easier for researchers. Over time, the idea of the impact factor evolved from comparing individual papers to comparing journals. Dr. Garfield originally intended the journal impact factor to be used to evaluate an author when their most recent papers had not had time to be cited. Dr. Garfield also had this to say about comparing journals across disciplines: "In any case, comparisons should not be made across disciplines unless the data are normalized in to take into account the various parameters that affect impact." In practice this normalization is not done, and the raw impact factors are compared directly.
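
To give a rough sense of what normalizing across disciplines could look like, here is a minimal Python sketch that divides each journal's raw impact factor by the median impact factor of its own field. This is only one simple approach that I am assuming for illustration. It is not Dr. Garfield's method or any official metric, and the journal names and numbers are made up.

```python
from statistics import median

# Hypothetical (journal, field, raw impact factor) records; all values are made up.
journals = [
    ("Entomology Journal A", "entomology", 1.8),
    ("Entomology Journal B", "entomology", 0.9),
    ("Entomology Journal C", "entomology", 0.6),
    ("Biomedical Journal A", "biomedicine", 12.0),
    ("Biomedical Journal B", "biomedicine", 6.0),
    ("Biomedical Journal C", "biomedicine", 3.0),
]


def field_normalized(records):
    """Divide each raw impact factor by the median impact factor of its field."""
    by_field = {}
    for _, field, jif in records:
        by_field.setdefault(field, []).append(jif)
    medians = {field: median(values) for field, values in by_field.items()}
    return {name: jif / medians[field] for name, field, jif in records}


normalized = field_normalized(journals)
for name, score in normalized.items():
    print(f"{name}: {score:.2f}")
```

With these made-up numbers, the top entomology journal (raw impact factor 1.8) and the top biomedical journal (raw impact factor 12.0) both come out at about twice their field's median, which is a very different picture from the one the raw values give.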

The usefulness of the impact factor is hotly debated among bibliometricians, with some pointing out rather large flaws in the system. Among the flaws are the ease with which the impact factor can be manipulated, the evolution of the journal impact factor from a selection criterion for library purchasing into the be-all and end-all of academia, the fact that negative citations carry the same weight as positive citations, the fact that the importance of a paper may only be noticed after the window the impact factor counts, and the fact that some papers that are never cited still have an impact on the field. A recurring theme among the bibliometricians is that the impact factor is being misused to judge the quality of a journal rather than its influence. Some are quite blunt in their disapproval of this practice. Take what Dr. Anthony van Raan, director of the Center for Science and Technology Studies at Leiden University in the Netherlands, has said about this habit: "If there is one thing every bibliometrician agrees, it is that you should never use the journal impact factor to evaluate research performance for an article or for an individual - that is a mortal sin." Several alternatives have been proposed, but they would likely suffer the same fate as the impact factor in that they would be misappropriated as a way to judge quality.

So what can we do? There is no easy answer or shortcut for determining the quality of a particular paper. The only surefire way to judge an article's quality is to examine its methods, results and conclusions. Another option is to look for a meta-analysis or a review covering the article in question, as these are often written by experts who have an intimate knowledge of the topic and can determine the impact of the article. However, this runs into the issue that scientists are human, with jealousies and grudges that can color their views on work by those they dislike. There really isn't an easy way around it. Sometimes, you just have to go to the data to see the quality of a paper.

So what's the takeaway from this? From my research on the history and intended use of the impact factor, I found that:
  1. The impact factor does not measure reliability, just journal popularity
  2. Higher impact factor journals have higher retraction rates, and this cannot be explained away as a difference in editorial practices, because only high impact factor journals have been studied for this
  3. A high impact factor does not mean that the articles published in a journal are of high quality
  4. The impact factor of journals in different disciplines and scopes cannot be directly compared without further mathematical adjustment 
  5. There is no real shortcut for determining the quality of a research paper other than examining it critically 
