Purpose of Studies Evaluating Diagnostic Accuracy
The aim of studies evaluating diagnostic accuracy is to determine how well the results of the test under investigation (the index test) agree with the patient's underlying condition or disease, as established by the reference standard. Note that the reference standard may be another laboratory test, a radiological examination, a post-hoc clinical consensus, a discharge diagnosis, the impression of the attending physician, or any combination of these.
More Common Sources of Bias in Diagnostic Studies
1. Selection (sampling) bias—Who is in the study?
The criteria by which patients are included in, or excluded from, a study may not constitute bias per se; however, such criteria may limit the ability to generalize study findings to other populations (external validity). For example, a study evaluating the diagnostic accuracy of prostate-specific antigen that includes only men over 70 years of age with an abnormal digital rectal examination would probably produce findings of limited value for population screening of asymptomatic men over 40 years of age. Selection bias is more common in diagnostic case-control study designs, in which the reference standard is typically done first. Selection criteria also change the disease prevalence in the study and, as a result, change the positive and negative predictive values reported from the study.
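The dependence of predictive values on prevalence can be illustrated with a short calculation. The sketch below applies Bayes' theorem to a fixed sensitivity and specificity at two prevalences; the function name and all numbers are hypothetical and chosen only for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) computed from sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test: sensitivity 0.90, specificity 0.80.
# Compare a low-prevalence screening population with a
# high-prevalence, highly selected study population.
for prevalence in (0.02, 0.30):
    ppv, npv = predictive_values(0.90, 0.80, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With these assumed values the same test has a positive predictive value of roughly 0.08 at 2% prevalence and roughly 0.66 at 30% prevalence, even though sensitivity and specificity are unchanged.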
2. Spectrum bias—How sick are the subjects in the study?
Spectrum bias is a form of selection bias in which the patients selected for inclusion in the study have a narrow spectrum of disease severity. For example, a new cardiac marker will appear to perform best in a diagnostic accuracy study if only patients with documented large myocardial infarctions are included (“late spectrum” patients). The results of such a study are typically of limited use if the test is to be employed for all patients presenting to the ED with chest pain suspicious for AMI.
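One way to see the effect is to treat overall sensitivity as a weighted average of the sensitivity within each severity stratum. The sketch below is a simplified model, not a method from the source; the strata, patient counts, and per-stratum sensitivities are all hypothetical.

```python
def pooled_sensitivity(strata):
    """Overall sensitivity for a mix of severity strata.

    strata: list of (number_of_diseased_patients, sensitivity_in_that_stratum).
    """
    total = sum(n for n, _ in strata)
    detected = sum(n * sens for n, sens in strata)
    return detected / total


# Hypothetical per-stratum sensitivities for a cardiac marker.
large_mi_only = [(100, 0.98)]
full_spectrum = [(100, 0.98),   # large infarctions
                 (200, 0.85),   # moderate infarctions
                 (200, 0.60)]   # small or early infarctions

print(f"large MI only: {pooled_sensitivity(large_mi_only):.3f}")
print(f"full spectrum: {pooled_sensitivity(full_spectrum):.3f}")
```

Under these assumptions the narrow-spectrum study reports a sensitivity of 0.98, while the same marker applied across the full spectrum detects about 78% of cases.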
3. Incorporation bias—Does the “disease” include the “test”?
When the index test, or a close congener, is included as part of the reference standard, the measured diagnostic accuracy of the index test will almost always be inflated. For example, in a study evaluating the diagnostic accuracy of serum amylase in the determination of acute pancreatitis, the reference standard should not include serum lipase or anything like it. If the reference standard of such a study were to include “classical symptoms,” radiographic findings, or elevated serum pancreatic amylase, then the study would most likely have incorporation bias. Since elevated amylase and elevated lipase are strongly correlated in this patient group, the study would become, to a great extent, a study of the correlation between serum pancreatic amylase and serum lipase.
4. Verification (workup) bias—Are the final diagnoses of all study subjects confirmed by the same reference standard?
When some study subjects have their diagnosis confirmed with one reference standard and the remainder are subjected to another reference standard, the study may have differential verification bias. Most of the time a more invasive or expensive reference standard is applied to those with a positive index test result, and another, less invasive or less expensive, test is reserved for those with a negative index test. For example, in a study evaluating the diagnostic accuracy of d-dimer for pulmonary embolism, those with a positive d-dimer test might undergo pulmonary angiography while those with a negative test might have a ventilation-perfusion scan.
When only a portion of the final study population undergoes diagnostic confirmation with the reference standard, the study may have partial verification bias. Most often subjects are selectively chosen to undergo the reference standard when the results of the index test are positive. In some cases ethical considerations may restrict confirmation with the reference standard to subjects with a positive index test. For example, in a study evaluating the diagnostic accuracy of d-dimer for pulmonary embolism, it might be considered too risky to perform pulmonary angiography on patients with a negative d-dimer assay.
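The arithmetic behind partial verification bias can be sketched with expected counts. The example below starts from a hypothetical "true" 2x2 table, assumes every index-test-positive subject but only 10% of index-test-negative subjects receives the reference standard, and computes sensitivity and specificity from the verified subjects alone. The function names and all counts are assumptions made for illustration.

```python
def sens_spec_from_counts(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from the cells of a 2x2 table."""
    return tp / (tp + fn), tn / (tn + fp)


def verified_counts(tp, fp, fn, tn, frac_pos_verified, frac_neg_verified):
    """Expected 2x2 counts among subjects who actually got the reference standard."""
    return (tp * frac_pos_verified, fp * frac_pos_verified,
            fn * frac_neg_verified, tn * frac_neg_verified)


# Hypothetical full cohort: 100 diseased, 900 non-diseased,
# true sensitivity 0.80 and true specificity 0.90.
full = (80, 90, 20, 810)  # tp, fp, fn, tn

true_sens, true_spec = sens_spec_from_counts(*full)

# All index-test positives verified, only 10% of negatives verified.
observed = verified_counts(*full, frac_pos_verified=1.0, frac_neg_verified=0.1)
naive_sens, naive_spec = sens_spec_from_counts(*observed)

print(f"true:  sensitivity={true_sens:.3f}  specificity={true_spec:.3f}")
print(f"naive: sensitivity={naive_sens:.3f}  specificity={naive_spec:.3f}")
```

In this scenario the naive analysis overstates sensitivity (about 0.98 instead of 0.80) and understates specificity (about 0.47 instead of 0.90), because false negatives and true negatives are the cells that go unverified.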
5. Comparison bias—Who are the “normal” subjects in the study?
The choice of control group characteristics can influence the diagnostic accuracy reported from a study. For example, a study of prostate-specific antigen (PSA) that compares patients with prostate cancer to a control group with benign prostatic hypertrophy (BPH) will report a different diagnostic accuracy than one in which the control group consists of asymptomatic men over age 40. A study using the latter control group would likely report better diagnostic accuracy.
6. Test precision issues—How good are the measurements for the index test and reference standard?
When either the index test or the reference standard is significantly lacking in precision, random error will creep into the final analysis and precision bias will be introduced into the study. Many papers fail to mention parameters describing the analytic precision of index tests or the inter-rater reliability of reference standard tests. Consequently, a manuscript that reports a coefficient of variation or kappa statistic is much easier to evaluate for quality.
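As a rough illustration of the two statistics mentioned above, the sketch below computes a coefficient of variation from replicate measurements of one sample and Cohen's kappa from a 2x2 table of two raters' calls. The function names, the replicate values, and the agreement counts are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(replicates):
    """Coefficient of variation (%) = sample standard deviation / mean * 100."""
    return stdev(replicates) / mean(replicates) * 100.0


def cohens_kappa(both_pos, a_pos_b_neg, a_neg_b_pos, both_neg):
    """Cohen's kappa for two raters making a positive/negative call."""
    n = both_pos + a_pos_b_neg + a_neg_b_pos + both_neg
    observed = (both_pos + both_neg) / n
    a_pos = (both_pos + a_pos_b_neg) / n
    b_pos = (both_pos + a_neg_b_pos) / n
    # Agreement expected by chance if the two raters were independent.
    expected = a_pos * b_pos + (1.0 - a_pos) * (1.0 - b_pos)
    return (observed - expected) / (1.0 - expected)


# Hypothetical replicate results for one control sample (arbitrary units).
replicates = [98, 102, 100, 101, 99]
print(f"CV = {coefficient_of_variation(replicates):.2f}%")

# Hypothetical agreement table for two readers of the reference standard.
print(f"kappa = {cohens_kappa(40, 10, 5, 45):.2f}")
```

Here the two hypothetical readers agree on 85% of cases, but because half of that agreement would be expected by chance, kappa comes out at 0.70.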
7. Threshold bias—How is a positive test defined?
Post-hoc modification of index test thresholds can lead to threshold bias, in which either sensitivity or specificity is overstated. For example, a study in which the threshold for serum B-type natriuretic peptide (BNP) is very high (e.g., 1000 pg/mL) will report a very high specificity, since virtually all control subjects (without CHF) will have BNP results well below the threshold. Similarly, if the threshold is set very low (e.g., 10 pg/mL), the study will report a very high sensitivity, since almost all subjects with CHF will have BNP results much greater than the threshold. A solution to this type of bias is to report either a single parameter for overall diagnostic accuracy (e.g., diagnostic odds ratio) or a receiver operating characteristic (ROC) curve.
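The trade-off can be made concrete by sweeping the threshold over a small data set. The sketch below uses invented BNP values for subjects with and without CHF, reports sensitivity and specificity at three thresholds, and then computes two threshold-independent or single-number summaries: the area under the ROC curve (via the pairwise-comparison definition) and the diagnostic odds ratio at one threshold. All values and function names are hypothetical.

```python
def sens_spec(diseased, healthy, threshold):
    """Sensitivity and specificity when 'positive' means value >= threshold."""
    sens = sum(v >= threshold for v in diseased) / len(diseased)
    spec = sum(v < threshold for v in healthy) / len(healthy)
    return sens, spec


def roc_auc(diseased, healthy):
    """Area under the ROC curve: probability that a randomly chosen diseased
    subject has a higher value than a randomly chosen healthy subject
    (ties count as one half)."""
    wins = sum((d > h) + 0.5 * (d == h) for d in diseased for h in healthy)
    return wins / (len(diseased) * len(healthy))


def diagnostic_odds_ratio(sens, spec):
    """DOR = odds of a positive test in diseased / odds in non-diseased.
    Undefined when sens or spec equals 0 or 1."""
    return (sens / (1.0 - sens)) / ((1.0 - spec) / spec)


# Hypothetical BNP results in pg/mL.
chf = [60, 150, 320, 480, 700, 950, 1200, 1800]
no_chf = [8, 15, 30, 45, 70, 110, 180, 400]

for threshold in (10, 100, 1000):
    sens, spec = sens_spec(chf, no_chf, threshold)
    print(f"threshold={threshold:>4}  sensitivity={sens:.3f}  specificity={spec:.3f}")

print(f"AUC = {roc_auc(chf, no_chf):.3f}")
print(f"DOR at 100 pg/mL = {diagnostic_odds_ratio(*sens_spec(chf, no_chf, 100)):.1f}")
```

In this toy data set, moving the threshold from 10 to 1000 pg/mL shifts sensitivity from 1.0 to 0.25 and specificity from 0.125 to 1.0, while the area under the curve stays the same because it does not depend on any single cut-off.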
8. Review bias—Is there appropriate blinding in the study?
If those performing the index test are aware of reference standard determinations then forward review bias is introduced into the study. Likewise, if those making the reference standard determination are aware of the results of the index test then reverse review bias is introduced into the study. Proper blinding of those conducting the index test and reference standard will prevent review biases in studies of diagnostic accuracy.
9. Disease progression bias—How long did it take between the index test and reference standard determination?
Disease progression bias may occur when a significant amount of time elapses between performance of the index test and confirmation of the diagnosis using the reference standard. As the time between the index test and the reference standard determination increases, the study becomes more of a prognostic accuracy study than a diagnostic accuracy study.
© 2005, Brad Brimhall, MD, MPH