[Please note: changes were made to this blog on 27/7/15, following discussion between Ioana Cristea and Tom Johnsen – see comments below].
Mental Elf readers are well aware of criticisms directed at psychotherapy’s prodigal son: cognitive behaviour therapy (CBT). Unarguably the most studied and most recommended form of psychotherapy, CBT has nonetheless been shown to have some problems (see my previous blogs on CBT for adult depression and wait-list control exaggerating the efficacy of CBT).
This may explain why critics particularly rejoiced at this recent meta-analysis (Johnsen and Friborg, 2015), and why it attracted so much attention and almost indiscriminate praise, both among researchers and in the media. Publication in psychology’s number 1 journal, Psychological Bulletin, seemed to additionally guarantee its trustworthiness. But did it?
Disclosure: I am currently involved in a re-analysis of the Johnsen and Friborg study (2015).
Methods
The primary objective of this meta-analysis was to examine whether published clinical CBT trials for depressive disorders showed a historical change, in the sense of a decline in their treatment effects over time, independent of other study-related variables. The authors included both uncontrolled and controlled trials (randomised or not) in their meta-analysis.
They used an impressive set of exclusion criteria. Studies were excluded if:
- The implemented therapy was not “pure” CBT (e.g. mindfulness-based CBT)
- Unipolar depression was not the primary diagnosis
- Participants were not adults
- Therapy was not implemented by a trained CBT therapist
- The psychological intervention was not intended to treat depression
- Outcome was not measured with the Hamilton Rating Scale for Depression (HRSD) or the Beck Depression Inventory (BDI)
- Patients had acute physical illnesses, bipolar or psychotic disorders
- Treatment was not implemented as individual face-to-face therapy
- Patients had a BDI score lower than 13.5
For studies that did not include a control group, effect sizes (ES) were computed as the standardised mean difference, by subtracting the post- from the pre-intervention mean and dividing by the standard deviation of the change score. For controlled trials, effect sizes were calculated separately for the intervention and control groups, using a similar procedure. The authors also computed remission rates, defined as the number of patients who completed treatment with a BDI score below a predefined cut-off of 10.
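To make the computation concrete, here is a minimal sketch in Python of the pre-post effect size and the remission rate as described above; the function names and the illustrative numbers are mine, not the authors’.

```python
# Minimal sketch of the effect size and remission rate definitions described above.
# Numbers are illustrative only; they are not data from the meta-analysis.

def prepost_effect_size(mean_pre, mean_post, sd_change):
    """Within-group standardised mean difference:
    (pre-treatment mean - post-treatment mean) / SD of the change scores.
    A positive value indicates improvement (lower depression scores)."""
    return (mean_pre - mean_post) / sd_change

def remission_rate(post_bdi_scores, cutoff=10.0):
    """Proportion of treatment completers with a post-treatment BDI score
    below the predefined cut-off of 10."""
    remitted = sum(1 for score in post_bdi_scores if score < cutoff)
    return remitted / len(post_bdi_scores)

# Hypothetical CBT arm: large pre-post change relative to the change-score SD
print(prepost_effect_size(mean_pre=26.4, mean_post=14.1, sd_change=8.7))  # ~1.41
print(remission_rate([7, 9, 12, 15, 6, 11, 8, 10]))                       # 0.5
```

For controlled trials, the same pre-post calculation is simply applied separately to the intervention and control arms.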
Results
The authors identified 70 trials, of which 52 were randomised controlled trials (RCTs) and the rest were non-randomised trials. Apart from computing overall effect sizes, thereby combining RCTs and non-randomised studies, they also made an unusual split. They computed controlled effect sizes (from RCTs with a waitlist or treatment as usual control) and within-study design effect sizes (from all non-randomised trials PLUS the CBT arm of RCTs that, in the authors’ words, could not be qualified as controlled in the present analyses, even though they were RCTs, such as comparisons with medication). What is essential to note is that only the former category (controlled effect sizes) consisted exclusively of RCTs.
53 within-study design effect sizes and 17 controlled effect sizes (out of which 15 were wait-list comparisons) were analysed.
- There was a negative relationship between the ESs of CBT based on the BDI and publication year (p<.001). Subgroup analysis indicated that a similar relationship was evident among both within-study design (p<.001) and controlled studies (p<.05). (A simplified sketch of this kind of meta-regression follows this list.)
- A similar trend of the ESs of CBT decreasing with publication year was shown on the HRSD (p=.01). However, in this case, while the relationship was evident in the within-study design (p<.01), it was not significant in the controlled studies (p=.51).
- Remission rates were also negatively related to publication year (p<.01). Unfortunately, the authors did not report separate results for within-study design and controlled effect sizes.
- The waiting list control group did not show a similar trend of decreasing ESs across time (p=.48).
- Analyses excluding studies with small sample sizes (arbitrarily defined as n<20) yielded the same significant negative trend (p=.02). Again, the authors failed to report separate results for within-study design and controlled studies.
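The publication-year analyses summarised above are, in essence, meta-regressions of effect size on year of publication. Below is a deliberately simplified sketch of such an analysis using inverse-variance weighted least squares; the data are invented, and the actual paper used more elaborate (mixed-effects) models. Adding a second predictor column (for example, sample size) is how one would test the moderators discussed further down.

```python
# Simplified sketch of a meta-regression of effect size on publication year,
# using inverse-variance weighted least squares. All numbers are invented.
import numpy as np

years = np.array([1981., 1988., 1995., 2002., 2009., 2014.])
es    = np.array([1.9, 1.7, 1.5, 1.3, 1.2, 1.1])          # study effect sizes
var   = np.array([0.09, 0.08, 0.06, 0.05, 0.04, 0.04])    # their sampling variances
w     = 1.0 / var                                          # inverse-variance weights

# Weighted least squares: ES_i = b0 + b1 * year_i + error_i
X = np.column_stack([np.ones_like(years), years])
W = np.diag(w)
b = np.linalg.solve(X.T @ W @ X, X.T @ W @ es)
print(f"estimated change in ES per year: {b[1]:+.4f}")     # negative = declining trend
```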
Conclusion
The authors concluded that:
The main finding was that the treatment effect of CBT showed a declining trend across time and across both measures of depression (the BDI and the HRSD).
The authors also discuss possible reasons for the decreasing effects of CBT, ranging from deviations from the therapy manual, to reduced treatment fidelity and the dynamics of the placebo effect. Noting that the placebo effect tends to be stronger for new treatments, they wonder whether that may have been the case for CBT, and whether, as time passed, positive expectations about CBT dwindled. In fact, they even worry that their own meta-analysis might further weaken faith in CBT.
But for anyone familiar with the methodological aspects of meta-analyses, this particular one does not engender a loss of confidence in CBT. It does, however, elicit considerable loss of faith in meta-analyses and in their reliability and usefulness.
Limitations
- The most important limitation is the combination of uncontrolled and controlled trials, or rather of non-randomised and randomised ones. Randomisation of participants to treatment groups serves to ensure that sources of bias are equally distributed between these groups, with the only difference between them being the intervention. Non-randomised trials are subject to a whole array of sources of bias, which we have no adequate way of gauging. Effects in these trials might be due to many factors other than the intervention, such as the passing of time, non-specific factors like participants’ expectations (the placebo effect), the participants being particular cases, variables unbeknownst to the experimenter being responsible for change, and so on. This problem is of course made worse in uncontrolled trials, where we have absolutely no way of knowing whether effects were due to the specific nature of the intervention at all.
- Why, then, may the reader justly ask, do we even have non-randomised and indeed uncontrolled trials? Well, in some cases, for pragmatic reasons, it is impossible to randomise participants to treatment conditions. Maybe the disease is so rare or so serious that randomisation would be unfeasible or unethical. Maybe the treatment is so new and untested that one needs to see if it’s worth pursuing at all or if it doesn’t carry serious side-effects. Fortunately, none of these is the case for CBT, which has plenty of RCTs.
- The authors seem completely oblivious to the many recent meta-analyses of the efficacy of CBT in depression. At least 4 such meta-analyses including comparisons between CBT and a control group have been conducted since 2013 (Cuijpers et al, 2013; Barth et al, 2013; Furukawa et al, 2014; Chen et al, 2014) and, without exception, all of them exclusively included RCTs, with the numbers of CBT versus presumably non-active control group comparisons (waitlist, no treatment, treatment as usual, placebo) ranging from 49 to 115. In contrast, the authors of this meta-analysis included only 17 such group comparisons. This difference is staggering and difficult to explain, even if we assume more restrictive inclusion criteria.
- One meta-analysis (Chen et al, 2014) was also a historical analysis, looking at changes in the quality and quantity of psychotherapy trials (including CBT) for depression. It revealed a relevant, albeit unsurprising, fact about RCTs for depression: most of the trial quality criteria considered improve over time, and this improvement was particularly marked in CBT trials. So it is plausible that the apparent decrease in the efficacy of CBT for depression over time might simply be a by-product of increasing trial quality. Johnsen and Friborg did look at study quality and found no moderating effect, but given their hotchpotch of uncontrolled and controlled trials and their limited sample of CBT studies, this analysis is not very informative.
- Related to this, another aspect that has changed over time is sample size, with earlier studies having smaller samples. This is an important confounder, as it is well established for treatments in general that larger studies yield smaller effect sizes, and almost all meta-analyses of psychotherapy for depression have found evidence of this small sample bias. The authors did redo their analysis using an arbitrarily defined cut-off for sample size, but did not examine whether sample size significantly moderated effect sizes, or the relationship between effect sizes and publication year.
- An important missing analysis concerns heterogeneity, which the authors say they analysed, but which I was unable to find in the paper. Given their combination of studies, heterogeneity was probably very high; so high, in fact, as to indicate there is not much point in combining these studies at all (a brief sketch of how heterogeneity is quantified follows this list). It is telling that in the most homogeneous sample of studies (RCTs) the decreasing trend of CBT was much less evident.
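For readers unfamiliar with how heterogeneity is usually quantified, here is a minimal sketch of Cochran’s Q and the I² statistic; the effect sizes below are invented and are not taken from the paper. An I² around 90% means that most of the observed variability reflects genuine between-study differences rather than sampling error, which is exactly why pooling such a mixed bag of studies is questionable.

```python
# Minimal sketch of Cochran's Q and the I² heterogeneity statistic.
# Effect sizes and variances are invented for illustration.
import numpy as np

es  = np.array([0.4, 1.1, 1.8, 0.2, 2.3])        # study effect sizes
var = np.array([0.05, 0.04, 0.06, 0.05, 0.07])   # their sampling variances
w   = 1.0 / var                                   # inverse-variance weights

pooled = np.sum(w * es) / np.sum(w)               # fixed-effect pooled estimate
Q      = np.sum(w * (es - pooled) ** 2)           # Cochran's Q
df     = len(es) - 1
I2     = max(0.0, (Q - df) / Q) * 100             # % of variability beyond chance
print(f"Q = {Q:.1f}, I² = {I2:.0f}%")             # here roughly 93%: pooling is dubious
```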
Implications
I think the most important implication of this systematic review is not whether the initial efficacy of CBT has vanished into thin air, or what the reasons for that might be, but whether we can really put our faith in these reviews anymore.
It has always been argued that a major advantage of meta-analyses is that, by aggregating many trials, they can provide a more objective, balanced view of a field, in which sources of bias are effectively controlled. But if researchers conducting these reviews can obtain such widely different results, if their methodological choices can have such an influence over the results, and indeed if publication in the number one journal in a field is no guarantee, should end users still trust the objectivity and reliability of meta-analyses?
Links
Primary paper
Johnsen TJ, Friborg O (2015) The effects of cognitive behavioral therapy as an anti-depressive treatment is falling: A meta-analysis (PDF). Psychol. Bull. 141, 747–768. doi:10.1037/bul0000015
Other references
Barth J, Munder T, Gerger H, Nüesch E, Trelle S, Znoj H, Jüni P, Cuijpers P (2013) Comparative efficacy of seven psychotherapeutic interventions for patients with depression: a network meta-analysis. PLoS Med. 10, e1001454. doi:10.1371/journal.pmed.1001454
Chen P, Furukawa TA, Shinohara K, Honyashiki M, Imai H, Ichikawa K, Caldwell DM, Hunot V, Churchill R (2014) Quantity and quality of psychotherapy trials for depression in the past five decades. J. Affect. Disord. 165, 190–195. doi:10.1016/j.jad.2014.04.071 [Abstract]
Cuijpers P, Berking M, Andersson G, Quigley L, Kleiboer A, Dobson KS (2013) A meta-analysis of cognitive-behavioural therapy for adult depression, alone and in comparison with other treatments. Can. J. Psychiatry Rev. Can. Psychiatr. 58, 376–385. [Abstract]
Furukawa TA, Noma H, Caldwell DM, Honyashiki M, Shinohara K., Imai H, Chen P, Hunot V, Churchill R (2014) Waiting list may be a nocebo condition in psychotherapy trials: a contribution from network meta-analysis. Acta Psychiatr. Scand. doi:10.1111/acps.12275 [Abstract]
No-one serves garlic trifle, so why serve meta-analyses w/ randomised & non-randomised trials? http://t.co/DSajLmVa3p http://t.co/7Qz9Pp1gVa
@Mental_Elf lol!!!!!!!!!!!! :-D
@Mental_Elf @ian_hamilton_ In all fairness, I do serve garlic trifle.
My dinner parties are sparsely attended.
@Mental_Elf @carotomes seriously? http://t.co/KURoVNfEIz
Crisis of faith? Instead of CBT, we should be worrying about meta-analyses http://t.co/tf7xZQXpLM #MentalHealth http://t.co/uAMiZQ9cZ4
Brilliant blog @Mental_Elf should this review make us question metaanalysis techniques rather than the therapy itself http://t.co/jreBDp0zwk
Crisis of faith, but in what? My 13th post @Mental_Elf analyses everybody’s favorite recent meta-analysis: http://t.co/tOB3j4N0h3
Read @Zia_Julia’s critique of the recent meta-analysis suggesting that the antidepressant effect of CBT is falling http://t.co/yoZSBCVQS4
Psychol Bull meta not differentiating controlled vs uncontrolled trials, wtf? @Zia_Julia @Mental_Elf show value of post-publctn peer review
RT @Mental_Elf: Are Tom Johnson and Oddgeir Friborg from @UiTromso on Twitter? We’ve blogged about their research today http://t.co/yoZSBCV…
@OliverBurkeman Interested in yr thoughts on our blog of Johnsen & Friborg study you wrote about on Jul 3 http://t.co/yoZSBCVQS4 #CBT
@Mental_Elf This is interesting – thanks for it!
@Mental_Elf All psycho-shamanism decline. Better improve outside (REAL) factors than explain people that their ‘selves’ are just wrong.
Hi @psych_writer My comment on yr article about Johnsen & Friborg study http://t.co/xkpyhmWAjX Pls read @Zia_Julia http://t.co/yoZSBCVQS4
Meta-analysis- CBT for depression becoming less effective http://t.co/uFkIROAEH1 But see these criticisms @mental_elf http://t.co/UJa6IhjbR5
@ResearchDigest @Mental_Elf Good article. Link here meta analysis comparing ES for low vs high qual RCTs for deprsn http://t.co/4LS0fgcqiV
Hi, this meta-analysis is actually referenced in the post. And I blogged about it 4 @Mental Elf: http://www.nationalelfservice.net/mental-health/depression/a-meta-analysis-of-cognitive-behavioral-therapy-for-adult-depression-the-winner-takes-it-all/
@ResearchDigest @Mental_Elf A sign of the rise of third wave therapies? Recent meta suggested trend for #ACT better for depression than CBT
RT @Mental_Elf: Some things in life you can rely on, but this meta-analysis of #CBT isn’t one of them http://t.co/n1HWFUWHlo http://t.co/Ii…
Crisis of faith? Instead of #CBT, we should be worrying about #metaanalyses http://t.co/yMQRNtbQGw @Mental_Elf on #systematicreviews
The problem is not meta-analysis, but the inappropriate use of meta-analysis http://t.co/yPlgy3SlYj #cbt
#CBT #depression @Zia_Julia has written some great critical blogs for us, but this one takes the biscuit http://t.co/yoZSBCVQS4
Hello @APA Is there a Twitter account for the Psychological Bulletin? We’d like some feedback on a blog post please http://t.co/yoZSBCVQS4
@Mental_Elf @APA while you’re waiting fancy a read of this just out: http://t.co/kZyVPV5HWU
@Mental_Elf Not one dedicated to that specific journal, but you can certainly get in touch with @APA_Journals. Thanks!
@APA_Journals Please can someone from the Psychological Bulletin respond to our blog post? http://t.co/yoZSBCVQS4 #CBT #MetaAnalysis
@Mental_Elf great article…but I’m still shocked that no one is suggesting that the results of the initial study have been impacted by…
@Mental_Elf the fact that in an era in which we are glued to phones, that CBT clinicians have tried to get ppl to carry around a peice…
@Mental_Elf of paper to do homework! Surely CBT has suffered because of this and surely this will be corrected as CBT goes mobile!
A #ClarityBadger to @Zia_Julia for her brilliant critique of this #CBT bashing meta-analysis http://t.co/yoZSBDdrJC http://t.co/jHULhiKBpH
@Mental_Elf Aww thank you! Love badgers. And clarity.
Thanks for a great blog post, Ioana. I share your concern that caution is warranted with the abundance of meta-analysis and the different methods used. It instantly raised a few points in my head that I’d love to hear your thoughts about.
1. You write about meta-analyses including randomised and non-randomised studies and wonder why non-randomised studies should be included in the first place (for fields where plenty of RCT evidence is available). However, a recent article has pointed out that in small to moderate-sized samples, simple randomisation techniques may not be so foolproof in reducing bias. It was proposed that comparability of intervention groups may be more important in reducing bias. As such, should meta-analyses not only take into account whether randomised allocation was used, but also the comparability of the intervention groups when evaluating the quality of primary studies and aggregating the evidence? Stratified randomisation could be more important than simple randomisation.
Reference: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0132102#abstract0
2. Several meta-analyses have evaluated the efficacy of CBT for depression and one may wonder how much a new meta-analysis still adds. My biggest concern with several related meta-analyses is the potential double counting of the evidence. Although each meta-analysis generally accounts for this, I think the greater danger is that policy makers see these meta-analyses and then aggregate their results, which is likely to be a more harmful way of overstating the evidence when a single primary study is included in four different meta-analyses.
Reference: http://www.biomedcentral.com/1471-2288/9/10/
We could leave it up to the reader to check previously published meta-analyses on a similar subject and adjust their conclusions accordingly, but shouldn’t it be our responsibility as researchers to make sure our work isn’t falsely interpreted or overstated?
It is already a requirement to register the protocol for new systematic reviews and meta-analyses and researchers are required to highlight whether there is a potential overlap with previous studies. However, it leaves the interpretation of the consequences of this overlap to the reader.
I was thinking about a way forward and was wondering: we quantify the quality of research with checklists, so can’t we also quantify the overlap with previous meta-analyses? Authors could be required to declare the percentage of overlap with previous meta-analyses: for example, the percentage of participants/studies in the present meta-analysis whose data have been included in a similar meta-analysis that is interested in the same outcomes but may have used different inclusion criteria. Perhaps meta-analyses should be required to have a minimum % of unique participants/studies, or at the bare minimum provide a thorough quality appraisal and quantitative evidence regarding the contribution of having included studies that weren’t yet included in previous meta-analyses.
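To make the idea concrete, a rough sketch of such an overlap metric could look like the following (the study identifiers are hypothetical):

```python
# Rough sketch of the proposed overlap metric: the percentage of studies in a new
# meta-analysis that already appear in previous meta-analyses on the same topic.
# Study identifiers are hypothetical.

def overlap_percentage(new_studies, previous_studies):
    """% of studies in the new meta-analysis already covered by earlier ones."""
    new, prev = set(new_studies), set(previous_studies)
    if not new:
        return 0.0
    return 100.0 * len(new & prev) / len(new)

new_ma  = {"StudyA1995", "StudyB2001", "StudyC2007", "StudyD2013"}
earlier = {"StudyA1995", "StudyC2007", "StudyE1989"}
print(overlap_percentage(new_ma, earlier))  # 50.0
```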
I don’t believe these suggestions are the right answer, but perhaps it’s time for a discussion on how to avoid the abundance of meta-analyses that may leave readers more confused in their interpretations (it could turn into a pick ‘n’ mix, where you find a meta-analysis that supports your own ideas) and to ensure that we’re not overstating the evidence by counting the same evidence twice, three times or even more (within a single meta-analysis, as well as across multiple meta-analyses).
Curious to hear your thoughts.
Kind regards,
Leen
Hi. Thank you for these comments.
For the first, while I understand the point and this is certainly an interesting article, it does not mean, however, that including non-randomized trials is acceptable when there is the alternative of enough RCTs. Yes, randomization is not enough and most likely there are better ways to do it, but in its absence there is really no way of establishing equivalence between groups, because you can never be sure you have measured all the relevant variables. Also, so far there is no clear standard for ranking randomization methods, in the same way there is no standard for ranking methods of dealing with incomplete data (even if evidently some methods are better than others). So what I am trying to say is that while I am sympathetic to this argument, and maybe randomization in itself is not enough, it is still better than not having it.
Regarding your second point, I have to disagree. The article you cite in fact refers to including data on the same subjects within the same meta-analysis; it doesn’t apply to different meta-analyses on the same topic. Including studies that used the same sample within the same meta-analysis is clearly a problem (they are counted more than once) and in fact most meta-analyses, and certainly all the ones I cited in this post, specifically checked for this. If a paper mentioned it used a sample that was also reported in another paper, one of these papers was not considered, and authors were asked to confirm whether there was sample overlap. From what I have seen, people doing meta-analyses are generally careful about this. However, I completely disagree that it’s wrong to have more meta-analyses on the same studies. I did not have the space to detail this in my post, but the 4 I cited for depression all approached different (and in my mind very relevant) research questions. This is also why the number of studies varied; for instance, Furukawa et al excluded comparisons with treatment as usual. What you signal would only be a problem if different meta-analyses of overlapping studies arrived at different conclusions; if they don’t, and they report their results completely, I think it just serves to strengthen confidence in the results. In the case of the particular 4 meta-analyses I cited, their results, at least for the parts that were common, were very similar. And then of course each had its own original results, related to the particular research question it posed.
Thanks for your quick response.
I do understand that the link I gave for the second point referred to double counting of primary studies within one meta-analysis, and I’m aware that most meta-analyses take this into account. I also see the value in conducting multiple meta-analyses with different approaches, and I understand that these may include some of the same studies. The potential danger that I see is when a report states that ‘the efficacy of CBT has been evaluated in four separate meta-analyses with N = ### and concluded that there is a moderate effect of ES = ### for CBT compared to TAU’. The issue here is that when collating evidence from multiple meta-analyses, authors do not always acknowledge the amount of overlap between them, and may report a total N across meta-analyses that is greater than the true number of unique participants, which would overstate the evidence.
Excellent measured critique of Johnson & Friborg meta-analysis of CBT for depression. @BABCP @DCoPUK @morriseric http://t.co/amu6lXtkbe
What does the future hold for meta-analyses? Join the discussion on our blog http://t.co/yoZSBCVQS4 http://t.co/mBfQIsNDDX
Most popular blog this week? It’s @Zia_Julia setting the record straight about *that* CBT meta-analysis http://t.co/yoZSBCVQS4
Article: Instead of #CBT we should be worrying about #MetaAnalyses http://t.co/AGkXmZMIgx
Hello Mental Elf, its posters & readers. And thank you, Ioana, for your interest in our article.
I’m the first author of the critiqued analysis, and would like to take this opportunity to correct some of the factually incorrect information provided by the author of the blog post. I am, quite frankly, extremely surprised and disappointed by the discrepancy between what the article actually says was performed, and what Ioana has interpreted, and consequently stated, in her post.
I’m in the middle of my vacation right now, so I’ll just quickly address the main inaccuracies, which also happen to be the critique’s focal points.
I must categorically deny that our meta-analysis comprises only 15 RCTs. We actually included 52 such studies in our paper. In fact, we made efforts to include every available published RCT on the subject, if those studies met our (not very strict at all) inclusion criteria (for example, by utilising the BDI/HRSD, and by reporting pre/post scores and standard deviations). If anyone is aware of a specific, relevant paper that wasn’t included, and that could be found in the databases we looked through, I would appreciate it a great deal if you let me know.
Sadly, it is quite obvious that the author of the critique has not carefully read the very article she is now blazing down on. The real number of included RCTs can easily be found in the method section (Effect sizes, page 6), the results section (Studies and Participants, page 7), AND in the descriptive table 1. There, every study and its characteristics (including whether it was an RCT or another study design) are summed up in a very clear and concise manner.
How the conclusion of only 15 included RCTs was reached, I am at a loss to explain. Upon reading the whole article, such a misconception seems incomprehensible.
The other factual error I will take the time to object to is the point made regarding the mixing of RCTs and non-randomised trials. We actually ran several meta-regression analyses in which the non-randomized trials were excluded. The most comprehensive of these, consisting of the 50-plus RCTs (page 7, AND table 3, page 13), revealed exactly the same results, showing a significant fall in ES over time (p<.001).
In addition to these two major misconceptions, there is a whole range of other factual errors in this blog post, which I sadly don’t have the opportunity or time to go into right now. I must, however, urge the readers of this blog to please read the article carefully. Perhaps you’ll view our study in a more lenient (and well-informed) manner.
Lastly, I would like to stress that the aim of our study was not to discredit CBT in any way, shape or form. We believe it’s an excellent treatment method, which I myself use frequently with my own patients, including those who are depressed.
My best regards,
Tom J Johnsen.
Dear Dr. Johnsen
Thank you for your comment. I did read the paper carefully. I would modestly like to point out that, in general, if readers (and I am by no means alone) do not understand something presented in a paper well enough, one might also consider the clarity of the data presentation. I assure you I have been studying this paper carefully.
On to your points: my error about the number of RCTs came from a combination of your PRISMA flowchart (which, along with its unusual presentation, states, no offence, 17 RCTs) and page 7, where, looking at the controlled studies condition, I found the number 15, which referred to the waitlist-controlled studies. I meant to write 17; clearly this was a slip. However, checking your data now, as you instructed, there are indeed 52 RCTs in total and 17 in the controlled condition, although to be honest I am even more confused about how exactly they were used (because at this point some RCTs seem to go into the “within” or “uncontrolled” category and others into what you call “the controlled” condition). You state: “The vast majority of these intervention studies were drawn from different randomized controlled trials, but because of methodological choices or study design issues, they could not be categorized as CTs in the present analysis.” I am sorry, but this is very confusing, not to mention that once again it goes against the principle of randomization to put together the CBT condition from RCTs (even those without a waitlist control; participants were still randomized) with the CBT condition from uncontrolled, non-randomized or what you call clinical field studies (which you do not define, but looking at them they clearly are not randomized). I will, however, amend the post to reflect this issue and the uncertainties I still have around it.
On your other point, however, I disagree and admit no error. I carefully checked both table 3 and the results on page 7. For some measures you do report separate controlled and uncontrolled results; for others you don’t. My point about creating this kind of hotchpotch is that you really do not know what you are combining.
I would have liked to see separate meta-regression results for ALL of your outcomes for the controlled and uncontrolled ESs (to use your terms), particularly since results, as one would expect, are much more modest for the controlled ESs (for the HRSD not even significant).
The paper only gives heterogeneity values in table 4 (again, an unusual choice), but looking at them they are just huge, almost all around 90%. This says a lot about your combination of studies, and in fact means that these studies should not be combined at all, at least not with this kind of heterogeneity.
Once again, thank you for the comments. I am certainly not one to go easy on CBT, so I don’t think there is a need for reassurance on that front. However, your paper did get a lot of media scrutiny (as usual, focused just on the conclusions and your, I hope you will agree, speculative interpretations). I think it’s only fair to expect that the methodology is also scrutinized; in the end, that is what post-publication peer review and informing the public should be about.
Best
Ioana
[…] which the authors discuss, but which are nevertheless all the available hypotheses. Ioana Cristea, of the psychology blog The Mental Elf, offers a different hypothesis: the meta-analysis is not well […]