Depression has a global (and European) point prevalence of about 4.4% (Baxter et al 2014); this translates into over 2.5 million Britons depressed at any one time.
It is estimated that up to 50% of cases are not identified in primary care and general medical practice (Cleare et al 2015).
NICE, in their depression clinical guideline (National Institute for Health and Care Excellence, 2009), do not advocate general screening for depression (e.g. of everyone attending their GP) but do recommend using the two Whooley questions (see Box) for ‘case finding’ in patients at increased risk of depression, such as those with long-term physical health conditions. NICE acknowledge that this recommendation is based on limited evidence about diagnostic accuracy and, it could be added, about whether there is any benefit from doing so.
Whooley* questions for depression
1. During the last month, have you often been bothered by feeling down, depressed or hopeless? (YES/NO)
2. During the last month, have you often been bothered by little interest or pleasure in doing things? (YES/NO)
YES to one or both questions is taken as a positive screen for depression.
* So called after the first author of the original publication (Whooley et al, 1997).
Given that the Whooley questions feature prominently in NICE guidance, Bosanquet et al (2015) set out to determine their diagnostic accuracy in a systematic review, and also to examine the value of an additional question, sometimes used alongside the two, asking whether the person wants help.
Methods
The authors undertook a systematic review and meta-analysis, searching a wide range of electronic databases, including sources of ongoing and unpublished studies and the grey literature, from 1994 (when the Whooley questions were first published) until April 2015. The search strategy is available in the supplementary material.
At least two reviewers selected studies using a pre-piloted form. Included studies had to use the standard Whooley wording (or a translation of it) and scoring (see Box), with no restriction on how the questions were administered (self-administration was allowed). The comparator was a gold-standard diagnostic interview for major depression based on either the Diagnostic and Statistical Manual (DSM) or the International Classification of Diseases (ICD), and studies had to report enough data to extract 2×2 contingency tables (i.e. the numbers of true positive, true negative, false positive and false negative results).
A bivariate diagnostic meta-analysis was undertaken to obtain pooled estimates of sensitivity, specificity, likelihood ratios and diagnostic odds ratios (ORs), with their 95% confidence intervals (CIs). The bivariate model takes account of the precision with which sensitivity and specificity were estimated in each study, and incorporates and estimates the between-study variability in both.
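For readers less familiar with these measures, the sketch below shows how sensitivity, specificity, the likelihood ratios and the diagnostic OR follow from a single study's 2×2 table. The counts are invented for illustration, and the sketch assumes no cell is zero; it does not attempt the bivariate random-effects pooling used in the review, which requires fitting a mixed model across studies.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """One study's 2x2 table: Whooley result versus the reference interview."""
    tp: int  # Whooley positive, depressed on interview
    fp: int  # Whooley positive, not depressed on interview
    fn: int  # Whooley negative, depressed on interview
    tn: int  # Whooley negative, not depressed on interview


def accuracy(t: TwoByTwo) -> dict[str, float]:
    """Single-study accuracy measures; assumes no cell of the table is zero."""
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    lr_pos = sensitivity / (1 - specificity)  # factor by which a positive raises the odds
    lr_neg = (1 - sensitivity) / specificity  # factor by which a negative lowers the odds
    dor = (t.tp * t.tn) / (t.fp * t.fn)       # diagnostic OR, equal to lr_pos / lr_neg
    return {"sensitivity": sensitivity, "specificity": specificity,
            "lr_pos": lr_pos, "lr_neg": lr_neg, "dor": dor}


# Hypothetical study of 100 people, 20 of whom are depressed on interview.
example = TwoByTwo(tp=19, fp=28, fn=1, tn=52)
for name, value in accuracy(example).items():
    print(f"{name}: {value:.2f}")
```

The diagnostic OR is computed directly from the cells here; within a single study it always equals the positive likelihood ratio divided by the negative one.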
Subgroup analyses were pre-specified, and potential causes of heterogeneity were examined.
Results
Ten studies met the inclusion criteria, ranging in size from 89 to 1,025 participants, with the prevalence of depression varying from 3.3% to 34%. Six studies used the questions in English, and in most studies clinicians administered them.
- For all studies, the pooled sensitivity was high at 0.95 (CI 0.88 to 0.97), with a lower pooled specificity of 0.65 (CI 0.56 to 0.74).
- The pooled positive likelihood ratio was 2.78 (CI 2.16 to 3.57) and the pooled negative likelihood ratio 0.07 (CI 0.03 to 0.16). This means that a positive result only modestly increases the likelihood that the person has depression (e.g. to no more than about 40% when the prevalence is below 20%), whereas a negative result makes depression much less likely.
- The diagnostic OR (the odds of the test being positive if the subject has depression relative to the odds of the test being positive if the subject does not) was a healthy 36.91 (CI 17.52 to 77.76).
- The level of between-study heterogeneity was low (I² = 24.1%), suggesting that the studies tended to be measuring the same thing; in the exploration of heterogeneity, only the prevalence of depression significantly influenced the findings.
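The interpretation of the likelihood ratios above rests on Bayes' theorem in odds form: post-test odds = pre-test odds × likelihood ratio. A minimal sketch, plugging in the pooled point estimates and ignoring their confidence intervals, with the prevalences chosen to span the range seen in the included studies:

```python
def post_test_probability(pre_test: float, lr: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds * LR."""
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


LR_POS, LR_NEG = 2.78, 0.07  # pooled point estimates reported in the review

for prevalence in (0.05, 0.10, 0.20, 0.34):
    after_pos = post_test_probability(prevalence, LR_POS)
    after_neg = post_test_probability(prevalence, LR_NEG)
    print(f"pre-test {prevalence:.0%}: {after_pos:.0%} after a positive, "
          f"{after_neg:.1%} after a negative")
```

At a prevalence of 20% this gives about 41% after a positive screen and under 2% after a negative one, in line with the figures quoted above.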
Analysis of the five primary care studies gave similar results. There were too few studies with similarly phrased ‘help’ questions to pool; in general, also requiring the person to acknowledge a need for help appeared to decrease the sensitivity and increase the specificity of the test.
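Why a help question should lower sensitivity and raise specificity follows from the logic of serial ("both must be positive") testing. The sketch below assumes, purely for illustration, that the two questions err independently given true depression status (conditional independence), which is unlikely to hold exactly for questions on related topics. Only the Whooley figures come from the review; the help-question figures are hypothetical.

```python
def serial_and(sens1: float, spec1: float,
               sens2: float, spec2: float) -> tuple[float, float]:
    """Combined accuracy when a screen counts as positive only if BOTH tests are positive.

    Assumes conditional independence: given true depression status,
    the two tests' errors are unrelated.
    """
    sensitivity = sens1 * sens2                  # a case must be detected by both tests
    specificity = 1 - (1 - spec1) * (1 - spec2)  # a non-case is misclassified only if both tests err
    return sensitivity, specificity


WHOOLEY = (0.95, 0.65)        # pooled sensitivity and specificity from the review
HELP_QUESTION = (0.80, 0.60)  # hypothetical figures, for illustration only

sens, spec = serial_and(*WHOOLEY, *HELP_QUESTION)
print(f"sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Under this rule sensitivity cannot rise and specificity cannot fall, whatever the help question's own accuracy, which matches the direction the review observed.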
Conclusions
This meta-analysis confirmed the findings of previous reviews and individual studies. The questions are efficient at ruling out depression when the population prevalence is low (e.g. <20%), but they are not an efficient way to identify depression. A positive screen needs to be followed by a standard clinical assessment, and most of those assessed would turn out not to be depressed.
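The point that most people with a positive screen would not be depressed can be made concrete with natural frequencies. The sketch below applies the pooled point estimates (sensitivity 0.95, specificity 0.65) to a hypothetical 1,000 people at an assumed prevalence of 10%; both the cohort size and the prevalence are illustrative choices, not figures from the review.

```python
def screen_counts(n: int, prevalence: float,
                  sens: float, spec: float) -> dict[str, float]:
    """Expected 2x2 counts when n people at a given prevalence are screened."""
    depressed = n * prevalence
    not_depressed = n - depressed
    tp = depressed * sens       # depressed and screen positive
    fn = depressed - tp         # depressed but missed
    tn = not_depressed * spec   # not depressed and screen negative
    fp = not_depressed - tn     # not depressed but screen positive
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


counts = screen_counts(1000, 0.10, 0.95, 0.65)
positives = counts["tp"] + counts["fp"]
print(f"{positives:.0f} screen positive, of whom {counts['fp']:.0f} "
      f"({counts['fp'] / positives:.0%}) are not depressed; "
      f"{counts['fn']:.0f} depressed people are missed")
```

With these assumptions, about 410 people screen positive and would need a full assessment, of whom 315 (roughly 77%) are not depressed, while only about 5 depressed people are missed.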
Strengths and limitations
The systematic review and meta-analysis were carried out to a high standard according to current best practice. Unfortunately, the authors were unable to reach definitive conclusions about the use of additional questions asking whether help was needed. The paper includes a very useful Bayesian graph of pre-test versus post-test probabilities, which shows how the value of positive and negative results changes with the population prevalence of depression, and that the questions are clinically useful only for excluding depression at low prevalence.
The authors acknowledge potential methodological issues in the included studies that could have led to test performance being overestimated: in particular, most studies did not exclude people already known to have depression, and some were not blinded.
It is worth pointing out that the Whooley questions are not independent of the ‘gold’ standards used to make the syndromal diagnosis of major depression, and it could be argued that this study is really a confirmation of the obvious; the real surprise would have been if the results had been different. The Whooley questions are extremely similar to the two core symptoms in the DSM system, at least one of which needs to be elicited to make the diagnosis. In fact, it is difficult to see how a diagnosis of major depression can be made without endorsing at least one Whooley question (hence the high sensitivity). The lower specificity is explained by the further symptoms required for diagnosis (to reach a minimum threshold of five symptoms).
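The nesting argument can be checked by brute force. This sketch enumerates every present/absent pattern of the nine DSM symptoms, assumes the two Whooley questions are identical to the two core symptoms, and applies only the "five or more symptoms, including at least one core symptom" rule, ignoring duration, impairment and exclusion criteria. These are simplifying assumptions, and the counts are of symptom patterns rather than patients, so they say nothing about real-world specificity.

```python
from itertools import product

N_SYMPTOMS = 9
CORE = (0, 1)  # assumed ordering: 0 = depressed mood, 1 = loss of interest or pleasure


def whooley_positive(symptoms: tuple[bool, ...]) -> bool:
    """Simplification: YES to either Whooley question == a core symptom present."""
    return any(symptoms[i] for i in CORE)


def meets_symptom_criterion(symptoms: tuple[bool, ...]) -> bool:
    """Five or more of nine symptoms, at least one core (other DSM criteria ignored)."""
    return sum(symptoms) >= 5 and whooley_positive(symptoms)


patterns = list(product((False, True), repeat=N_SYMPTOMS))
diagnosed = [p for p in patterns if meets_symptom_criterion(p)]
positive = [p for p in patterns if whooley_positive(p)]

# Sensitivity is built in: no pattern meets the criterion without a core symptom.
assert all(whooley_positive(p) for p in diagnosed)
print(f"{len(diagnosed)} of {len(positive)} Whooley-positive patterns meet the criterion")
```

Every pattern meeting the criterion is Whooley positive, but only 227 of the 384 Whooley-positive patterns meet it, mirroring the high sensitivity and lower specificity described above.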
What this study cannot do is shed light on two key issues. First, whether screening or case finding makes any clinical difference (e.g. Goldberg et al, 1998) and, indeed, whether the benefits (e.g. appropriate treatment) outweigh the potential harms (e.g. increased assessment time, or over-diagnosis and inappropriate treatment). Second, whether it is enough simply to exclude or identify depression when considering psychological distress and disorders. There is a danger that thinking only about ‘screening out’ depression may get in the way of recognising anxiety disorders. These are nearly as common as depression (Baxter et al 2014), occur in similar ‘high risk’ populations, cause significant morbidity and warrant treatment (Baldwin et al 2014).
Summary
The Whooley questions are sensitive but not specific in identifying major depression as defined by accepted diagnostic systems, which is not surprising given their high similarity to core symptoms that must be present (but are not sufficient on their own) to make the diagnosis. Although a negative test may help to rule out the syndrome of major depression in populations with a low prevalence, there is no reason to believe it performs well in excluding equally important anxiety disorders, so it cannot be relied on to exclude the broader range of common psychiatric diagnoses. Where there is a high suspicion of depression in an individual, a full clinical assessment for depression is warranted; the Whooley questions may be a good place to start, but they are not enough on their own.
Links
Primary paper
Bosanquet K, Bailey D, Gilbody S, et al (2015). Diagnostic accuracy of the Whooley questions for the identification of depression: a diagnostic meta-analysis. BMJ Open 5:e008913.
Other references
Baldwin DS, Anderson IM, Nutt DJ, et al (2014). Evidence-based pharmacological treatment of anxiety disorders, post-traumatic stress disorder and obsessive-compulsive disorder: a revision of the 2005 guidelines from the British Association for Psychopharmacology. J Psychopharmacol 28:403-39.
Baxter AJ, Scott KM, Ferrari AJ, et al (2014). Challenging the myth of an “epidemic” of common mental disorders: trends in the global prevalence of anxiety and depression between 1990 and 2010. Depress Anxiety 31:506-16.
Cleare A, Pariante CM, Young AH, et al (2015). Evidence-based guidelines for treating depressive disorders with antidepressants: a revision of the 2008 British Association for Psychopharmacology guidelines. J Psychopharmacol 29:459-525.
Goldberg D, Privett M, Ustun B, et al (1998). The effects of detection and treatment on the outcome of major depression in primary care: a naturalistic study in 15 cities. Br J Gen Pract 48:1840-4.
National Institute for Health and Care Excellence (2009). Clinical Guideline 90. Depression in adults (update): full guideline.
Whooley M, Avins A, Miranda J, et al (1997). Case-finding instruments for depression. Two questions are as good as many. J Gen Intern Med 12:439-45.