Commentary: New EPO study has serious flaws
Trevor Connor holds a master’s degree in exercise bioenergetics and nutrition from Colorado State University.
A recent study published in The Lancet Haematology has been stirring up the cycling world with what appears to be a surprising conclusion. It claims that the use of erythropoietin (EPO) may actually have no performance benefits for cyclists. A few journalists have gone so far as to say this study shows that Lance Armstrong may have won his Tours fairly.
That’s a dramatic claim. Even the study authors conclude by saying that EPO does not improve road race performance. But it is critical to delve deeper into the study before trusting the conclusions.
For the most part, this was a well-conducted study. Furthermore, it’s a needed study. For all the debate about doping, the taboo nature of doping products has in large part led scientists to avoid controlled research on their benefits. Instead, the focus has been on detection. But, to fight it, we need to understand what athletes gain from illicit products.
This study sought to do just that. And it used the full battery of standard maximal and sub-maximal in-lab measures to test the effects of EPO, including lactate threshold, max power, sub-maximal power, both ventilatory thresholds, efficiency, and a variety of other measures. This study’s in-lab testing was very thorough.
The study also addressed a common criticism of exercise science research — that often the in-lab tests of performance (such as a time trial to exhaustion) do not simulate actual racing. So, the study authors used a simulated race similar to what professionals might experience at the Tour: A 131-kilometer “stage” to the top of Mont Ventoux.
The in-lab testing provides a number of insights into the effects of EPO. It clearly shows greater improvements in the riders who took EPO. However, while the real-world simulation up Ventoux was a noble idea, several methodological issues negate any conclusions about real-world performance and undercut the dramatic claims in the media.
Issue One: The study claims no improvements in performance without measuring past performance
Unfortunately, the authors are inconsistent about making comparisons within groups or across groups to draw their conclusions. They use an across-group comparison to draw within-group conclusions. That is simply not good study design.
To make this point, let’s use a fictional example of a cholesterol drug study. A within-group comparison would measure the subjects’ cholesterol levels before, during, and after taking the drug. It would then look for changes in their levels.
A stronger study — known as a placebo-controlled study — would use the same experimental design but have a second group taking a placebo. The changes in cholesterol levels over the course of the study would be compared to the placebo group to see if they were significantly different.
Now, imagine you participated in this study and the researchers only tested your cholesterol at the end of the study. Then they told you that the drug had no effect on your cholesterol levels, because your levels were no different from those of the placebo group. You would be left asking, “But were they the same at the start of the study, and did my cholesterol levels change?”
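To make the cholesterol example concrete, here is a minimal Python sketch. The readings are hypothetical numbers invented purely for illustration, and the function names are my own. It contrasts an end-of-study comparison across groups with a comparison of each group's change.

```python
# Hypothetical cholesterol readings (mg/dL), invented for illustration only.
drug_before, drug_after = 240.0, 200.0
placebo_before, placebo_after = 200.0, 200.0


def end_only_difference(treated_after, control_after):
    """Across-group comparison using only the final measurement."""
    return treated_after - control_after


def difference_in_changes(treated_before, treated_after,
                          control_before, control_after):
    """Compare each group's own change, then subtract the placebo change."""
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    return treated_change - control_change


print(end_only_difference(drug_after, placebo_after))
print(difference_in_changes(drug_before, drug_after,
                            placebo_before, placebo_after))
```

With these made-up numbers, the end-only comparison returns 0.0, which looks like "no effect," while the change-based comparison returns -40.0. The two groups finished level only because the drug group started higher.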
In the lab, the authors of the EPO study were careful to measure the participants at multiple time points. There, they saw bigger improvements in the EPO group. But the race up Ventoux — their test of real-world performance — was only done once. At the end of the study.
A large and inappropriate assumption was made: The EPO and placebo groups would have performed the same before the study. That’s a dangerous assumption. If you doped for a year and then matched Fabian Cancellara in a time trial, no one would claim that the EPO had no benefit on your performance.
A proper scientific study should never make assumptions about the relative level of groups without testing that comparison. In this case, it appears that the two groups may not have been equal at the start of the study: Across multiple parameters, the placebo group appears to have been stronger at the outset. To give just a few examples, the placebo group had a starting maximal power output of 4.36 W/kg compared to 4.19 W/kg for the EPO group. Their lactate threshold power of 3.90 W/kg was higher than the EPO group’s at 3.84 W/kg. Their ventilatory threshold (a good measure of submaximal power) was 50.291 mL·min⁻¹·kg⁻¹ compared to 48.912 mL·min⁻¹·kg⁻¹.
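For a sense of scale, the baseline gaps quoted above can be expressed as percentages. This is a rough Python sketch using only the figures in this article. The dictionary layout and function name are illustrative, and the calculation says nothing about statistical significance.

```python
# Baseline values as quoted in this article, in (placebo, EPO) order.
baseline = {
    "max power (W/kg)": (4.36, 4.19),
    "lactate threshold power (W/kg)": (3.90, 3.84),
    "ventilatory threshold (mL/min/kg)": (50.291, 48.912),
}


def percent_gap(placebo, epo):
    """How much higher the placebo value is, as a percent of the EPO value."""
    return (placebo - epo) / epo * 100.0


gaps = {name: percent_gap(p, e) for name, (p, e) in baseline.items()}

for name, gap in gaps.items():
    print(f"{name}: placebo higher by {gap:.1f}%")
```

The gaps come out at roughly 4 percent, 1.6 percent, and 2.8 percent, all in the placebo group's favor.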
It’s important to point out that the differences at the start of the study may not be statistically significant. (For the true science geeks out there, no p-value or confidence interval was given.) Often in research, differences are just artifacts and should be ignored. Many researchers will tell you that an “eyeball test” will sometimes show a clear trend but that, due to a low number of participants (as in this study), they are forced to conclude the differences are not significant.
The fact is that the EPO group did experience statistically greater improvements in multiple parameters. So, it could be the researchers are right to conclude that in-lab improvements didn’t translate to real-world performance improvements. Or, it could be the eyeball test is correct — the EPO group was weaker to start and most of their improvements just raised them to the placebo group’s level.
Without a before-and-after real-world performance test to assess changes, we will never know what contributed to the Ventoux results. Most importantly, no conclusions can be drawn about relative improvements in real-world performance.
Issue Two: The study group is not representative of professionals
Due to the nature of their criteria, the researchers gathered a very unusual study group. The authors admit to one issue: To be included, a cyclist had to discontinue his or her membership in any organization that had anti-doping regulations. So, most participants either did not race or raced minimally.
Participants also had to be able to perform a 4.0 W/kg or greater maximal exercise test. Ultimately, 121 riders were excluded based on fitness. The final pool of riders had a VT2 (threshold) of 306 watts or 3.99 W/kg, which is very strong for amateur riders. Yet, these riders averaged 4.9 hours (EPO group) and 5.9 hours (placebo group) of training per week. As a coach who has worked with a lot of amateur riders, I find it unusual to see a cyclist who can put out that power on that little training. The authors say this group “closely approached the level of elite cyclists.” However, professional cyclists train 20-30 hours per week.
It is unclear how the unusual nature of the participants affected the results. The participants are clearly not representative of pro riders or typical amateurs. They are certainly naturally gifted. The important question is, do those differences influence the effectiveness of EPO and negate extrapolating their results to pros?
Issue Three: The low training volume may not allow a true test of EPO’s benefits
You can dope a couch potato to “Armstrong-ian” levels — they still aren’t going to win any bike races. In interviews, admitted dopers say time and again that they still had to do the training. They also frequently state that the biggest benefits are in recovery — they can train harder and recover better. At five hours per week of training, the study participants would have seen no recovery benefits. The researchers admit to this issue.
Further, if doping enhances the benefits of training, then with such low training volume one would expect the performance gains from EPO to be minimal. Frankly, the in-lab improvements seen in the doped group were impressive, especially considering they were under such minimal training load. With the small number of participants, the unusual nature of the participants, and their low training volume, I would not expect the EPO group to perform better. The fact that they did see significantly better in-lab improvements is telling.
Issue Four: “Mimicking a professional road race” was an inappropriate performance test for these participants
Cycling studies are often criticized for using “performance” tests that do not match real-world racing. So, while the focus of the study was in-lab testing, the study authors should be commended for attempting a real-world performance test.
However, the test basically simulated a stage of a professional race. The cyclists rode 110 kilometers with 5,000 feet of climbing to the base of Ventoux. Then, they raced up Ventoux (21.5 kilometers, 5,282 feet of climbing) for a total of 131.5 kilometers and 10,282 feet of climbing. The average time up Ventoux was one hour and 40 minutes. If we assume the ride to Ventoux averaged 31 kph, that’s a total of roughly five hours and 10 minutes. In other words, this simulated race was longer than what the EPO group averaged in training per week. This group of riders was ill-prepared to race such a stage.
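The arithmetic behind that estimate can be checked with a short Python sketch. The distances, climbing figures, Ventoux time, and weekly training hours are the ones quoted in this article. The 31 kph approach speed is the assumption stated above, not a measured value, and the variable names are illustrative.

```python
# Figures quoted in this article.
approach_km = 110.0
ventoux_km = 21.5
approach_climb_ft = 5000
ventoux_climb_ft = 5282
ventoux_minutes = 100.0           # one hour and 40 minutes
epo_weekly_training_hours = 4.9

# Assumption from the text, not a measured value.
assumed_approach_kph = 31.0

total_km = approach_km + ventoux_km
total_climb_ft = approach_climb_ft + ventoux_climb_ft

approach_minutes = approach_km / assumed_approach_kph * 60.0
total_minutes = approach_minutes + ventoux_minutes
hours, minutes = divmod(round(total_minutes), 60)

print(f"{total_km} km, {total_climb_ft} ft of climbing")
print(f"Estimated ride time: {hours} h {minutes} min")
print(total_minutes > epo_weekly_training_hours * 60.0)
```

Under that speed assumption the stage comes to about 5 h 13 min, which exceeds the EPO group's 4.9 hours of average weekly training.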
The cyclists in this study were so ill-prepared for this professional-level stage that their performance was likely more about grit and a natural ability to tolerate fatigue than physiological strength. Doping doesn’t help “grit.” This is evidenced by the fact that hemoglobin and hematocrit concentrations correlated with all in-lab maximal and sub-maximal physiological parameters in the EPO group but did not correlate with Ventoux times at all.
As further evidence that this race was more about fatigue tolerance, the subjects had a tested sub-max power of 3.66 and 3.72 W/kg for the placebo and EPO groups, respectively. They climbed Ventoux at a much lower 3.09 and 3.03 W/kg, respectively. They were fatigued.
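To put the drop-off in perspective, this small Python sketch expresses each group's Ventoux power as a fraction of its tested sub-max power. It uses only the W/kg figures quoted above, and the names are illustrative.

```python
# W/kg figures quoted in this article.
lab_submax = {"placebo": 3.66, "EPO": 3.72}
ventoux = {"placebo": 3.09, "EPO": 3.03}

# Fraction of tested sub-max power each group sustained on the climb.
fraction = {group: ventoux[group] / lab_submax[group] for group in lab_submax}

for group, value in fraction.items():
    print(f"{group}: {value:.1%} of tested sub-max power")
```

Both groups climbed at well under their tested sub-max power: roughly 84 percent for the placebo group and 81 percent for the EPO group.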
Finally, in past studies lab parameters such as lactate threshold have correlated well with on-road flat and hill climb time trials. In this study, the authors point out that the correlation was “moderate at best.” Again, this demonstrates that physiological assets did not drive their performance.
The researchers ultimately imply that lab results aren’t a great predictor of real-world performance. Given multiple past studies showing otherwise, it is more likely that the on-road test in this study was simply a poor indicator of performance.
There were elements of this study that were very well-conducted. The study authors should be commended for including a “real world” performance test. However, it’s unfortunate that the media has latched onto the results of the on-road performance test and not the in-lab results.
The lab testing was very well conducted, very thorough, and clearly showed that the EPO group experienced greater improvements despite their low training volume. The race up Ventoux was a noble idea but poorly conducted.
Ultimately, it was clearly inappropriate for the authors to conclude that EPO “did not improve road race performance” based on the race up Ventoux.