Monday, January 2, 2017

What are isogenic lines and why should they be used to study GE traits?

There has been quite a lot of talk about the latest paper from Seralini's group, which claims that there are substantial metabolome differences between genetically engineered (GE) corn and non-GE corn. The paper was published in an online journal run by the Nature group (and not in Nature, as some websites are claiming). At first glance, the paper seems to detail some seriously concerning results. However, when one examines the methodologies used, several glaring issues emerge that challenge the conclusions drawn from those results. Many others have addressed several of the methodological problems with this study, but I'd like to focus on the corn lines that were used and the claim that they were isogenic, as the entire experiment hinges on using the correct lines.

To start with, I need to explain what an isogenic line is, as most people (even scientists outside of the plant sciences) do not know. The term never refers to a single line on its own but to a set of at least two lines. Genetically, these lines differ by only a few genes and are identical beyond that. Achieving this is nearly impossible, so researchers use near-isogenic lines (NILs), which are at least 99% genetically identical. Generally speaking, NILs are not available for purchase and must be generated. To do this, a donor plant with the gene of interest (in this case, a GE trait like glyphosate resistance or Bt production) is crossed with what is called the recurrent parent (see figure below). Using genetic markers (in a process called marker-assisted selection), progeny with the appropriate genes are then backcrossed to the recurrent parent, and the trait is selected for in each generation, until the progeny are 99% genetically identical to the recurrent parent. It takes time and effort to generate an NIL like this, and there simply are no shortcuts to getting there.

Figure caption: In plant breeding, selected individuals are crossed to introduce or combine desired trait characteristics into new offspring; this necessitates numerous generations of backcrossing to establish the desired trait characteristics fully. Each successive backcross increases the genetic similarity of the new offspring to the recurrent parent, e.g. 75% similar at BC1 through to 99.2% by BC6. These numbers are based on how much of the recurrent parent genome can be theoretically regained at each step; however slight variations can occur. Marker-assisted methodologies that utilize DNA markers to enable selection of plant individuals that contain the greatest number of favorable alleles can reduce the number of generations required to get close to 99% similarity as adopted in the generation of the inbred variants of this study. From Harrigan et al., 2016 via PMC. DOI: 10.1007/s11306-016-1017-6
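
The percentages in the caption follow from simple halving: each backcross is expected to replace, on average, half of the remaining donor genome with recurrent parent genome. Here is a minimal sketch of that arithmetic. The function name is my own, and it assumes the idealized case of no selection and no linkage, so real plants will deviate somewhat from these expected values.

```python
def recurrent_parent_fraction(n_backcrosses: int) -> float:
    """Expected fraction of the genome from the recurrent parent.

    n_backcrosses = 0 is the initial F1 cross (50%); each later backcross
    halves the remaining donor contribution. Idealized: ignores selection,
    linkage drag, and chance variation between individual plants.
    """
    if n_backcrosses < 0:
        raise ValueError("n_backcrosses must be zero or greater")
    return 1.0 - 0.5 ** (n_backcrosses + 1)


# Print the expected similarity for the F1 and for BC1 through BC6.
for n in range(0, 7):
    label = "F1" if n == 0 else f"BC{n}"
    print(f"{label}: {recurrent_parent_fraction(n) * 100:.1f}%")
```

Under these assumptions BC1 comes out at 75% and BC6 at about 99.2%, matching the caption. Marker-assisted selection can get there in fewer generations because it picks the progeny that happen to carry more of the recurrent parent genome than the average.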

So what do NILs have to do with this new study and why do they matter? 
In order for the authors to clearly demonstrate that the differences seen are due to the GE trait (NK603, resistance to glyphosate), they need to use lines that have nearly identical genetic backgrounds. This is because it is well known that different lines have different transcript expression patterns, metabolomes, flavors, etc. For example, this paper by Wen et al. looked at the differences in the kernel metabolomes from different corn lines. Such differences are not unexpected, as different lines can have different phenotypes. One only has to look at a seed catalog to see the variation that is available for crops such as tomatoes or apples, and yet each of these is a single species. Because of this, an NIL is needed to demonstrate that observed differences are due to the gene of interest and not the normal variation that is seen between lines.

In this study, the authors state that they are using isogenic lines in several places, but in the materials and methods they state they used the "closest" isogenic lines. For example, they state in the abstract that they used isogenic lines (see figure below).

In the materials and methods they state this: 

These are not isogenic or near-isogenic lines. The DKC stands for DEKALB seeds, which is owned by Monsanto, and the numbers are the identifiers used by the company to denote the line. These numbers are akin to a catalog number used to sell seed. The lines offered one year may not be the same ones offered the next. It's also important to note that these numbers do not denote lineage, which is a carefully guarded trade secret for seed companies. The authors did not provide any information on the lineage of the two lines (DKC 2678 or DKC 2675 [which was also labeled as DKC 2575 in some places in the manuscript]), and just because they have similar numbers does not mean that they are genetically related. To help illustrate this, I found a DEKALB catalog from 2012. On page 5, there is a line, DKC 27-55, that has VT Double PRO technology (a stacked Bt trait). On page 6, there is a line called DKC 27-45 that has a single Bt trait (YieldGard Corn Borer) and Roundup Ready 2. DKC 27-55 was new for 2012, while DKC 27-45 had been on the market for a while and was a recommended line for growing silage (for more on what silage is, see this video from the Peterson Farm Brothers harvesting silage for their cattle). Although these two lines have similar DKC numbers, they have different traits and very different uses. Beyond just picking two lines with similar catalog numbers, there is another issue with their choice of lines: they used hybrids.

With hybrids, two unrelated lines are crossed and the resulting progeny have higher yield through a phenomenon called hybrid vigor (heterosis). It's very common for corn to be hybridized, and it's something to be mindful of in a study looking at genetic effects, because it introduces a source of genetic variation if the same parent is not used in the hybridization process. If the lines did not share the same hybridization parent, then they would not be isogenic even if the original two lines were isogenic (and no evidence is provided that they were). The authors ordered seeds that were already hybridized and provided no information on which parent was used to make each hybrid. Since there is no guarantee that the two hybrids share the same genes from the same hybridization parent, they cannot be considered isogenic. To treat them as such is bad science.

The choice of lines used in this study introduces several major sources of variation that are impossible to account for. Because the lines are not isogenic (or near-isogenic) and there is no information on whether they were hybridized to the same parent line, it is impossible to say if the observed differences are due to the transgenic trait or due to the fact that lines with differing genetics were used. The risk of misinterpreting the results is far too great, and this type of experiment is too expensive to waste money like that. This is one of the reasons why researchers generate their own NILs. Other issues with the study include the poor plot design, which lacks randomization (or any other standard design for an experiment like this), the lack of replication (different blocks within the field, other locations, repeated growing seasons, etc.), and the lack of information about these lines. No pedigree is offered and these lines are no longer on the market (I couldn't find any information on them online), so we can't say how likely it is that they are closely related. This is bad science, and this paper never should have made it through the peer review process with these and other major methodological issues intact (not to mention all of the grammatical and typographic errors).

This type of experiment really required the use of NILs that the researchers made themselves by crossing and backcrossing. Luckily, such an experiment was published prior to the submission of this paper. Harrigan et al. (2016) examined this same issue, but the experimental design of the Harrigan study was superior in every way. The Harrigan study generated NILs for the same trait that the Seralini paper examined (NK603); however, they generated four transgenic lines and four non-transgenic lines and also included the recurrent parent (as a control). These lines were hybridized with two different female testers and planted in a randomized complete block design with three replicate blocks in three separate locations (Illinois, Minnesota, and Nebraska). The metabolomes were then measured by GC-MS. They found few differences between the lines, and genetic variation between the lines accounted for more of those differences than the GE trait did; no trait-specific effect was found.
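
For readers unfamiliar with a randomized complete block design, here is a rough sketch of how such a field layout can be generated. This is not the Harrigan group's actual procedure: the entry labels, tester names, function name, and seed are all placeholders of mine, and the entry list simply follows my reading of the design described above (nine lines crossed with two testers). The point is only that every entry appears exactly once in every block and that plot order is shuffled independently within each block.

```python
import random


def rcbd_layout(entries, locations, n_blocks, seed=0):
    """Return {(location, block): plot order} for a randomized complete block design.

    Each block holds every entry exactly once ("complete"), and the plot
    order is shuffled separately for each block ("randomized").
    """
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    layout = {}
    for location in locations:
        for block in range(1, n_blocks + 1):
            order = list(entries)
            rng.shuffle(order)
            layout[(location, block)] = order
    return layout


# Hypothetical labels: four GE NILs, four non-GE NILs, and the recurrent parent.
lines = (
    [f"GE_NIL_{i}" for i in range(1, 5)]
    + [f"nonGE_NIL_{i}" for i in range(1, 5)]
    + ["recurrent_parent"]
)
testers = ["tester_A", "tester_B"]  # placeholder names for the two female testers
entries = [f"{line} x {tester}" for line in lines for tester in testers]

locations = ["Illinois", "Minnesota", "Nebraska"]
layout = rcbd_layout(entries, locations, n_blocks=3)

for (location, block), order in layout.items():
    print(location, "block", block, "first plot:", order[0])
```

Because each block contains the full set of entries, differences between blocks (soil, drainage, and so on) and between locations can be separated from differences between the lines themselves.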

As I stated above, there are numerous other issues with the methodology used in the Seralini paper beyond the use of improper lines. Generating NILs is time- and labor-intensive, but if you are truly trying to answer this type of question, it is simply something that must be done. You cannot take shortcuts with experiments like this, and you cannot assume that two lines are related just because they have similar numbers in a seed catalog. Shortcuts in science lead to bad results that are a waste of time and money.