Abstract

This chapter provides an overview of the principles of evidence based medicine, and the purpose of this introductory chapter is to present an overview of the process which led to their recommendations.

The phrase Evidence Based Medicine (EBM) came into widespread use after 1992 following a publication by Guyatt et al. [5], and is now commonly agreed to mean “…the conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients. The practice of evidence based medicine means integrating individual clinical expertise with the best available external clinical evidence from systematic research.” It also means the “…thoughtful identification and compassionate use of individual patients’ predicaments, rights, and preferences in making clinical decisions…” [13]. The practice of EBM can be carried out by using the following principles: (1) ask a clinical question, (2) locate the evidence, (3) appraise and synthesize the evidence, and (4) apply the evidence [12].

Ask the Clinical Question

On the face of it, asking the clinical question is straightforward. A patient problem is presented and a question arises. For example, Mrs. Smith presents with painless jaundice and a diagnosis of periampullary carcinoma. In considering the surgical options, you consider whether a pylorus preserving pancreaticoduodenectomy rather than a standard Whipple procedure should be performed. Going directly to Google with the key words “pylorus preserving pancreaticoduodenectomy”, we obtain 47,900 hits, while Wikipedia returns 2 hits. Clearly, neither of these extremes is satisfactory in determining a surgical approach. A useful step is to convert this specific clinical question about Mrs. Smith to a form that will allow us to search for the relevant evidence. The PICO format, which is used throughout this book, is a useful tool for this purpose.
The P stands for Patient or Population and specifies the patient group to which the question refers. In this case it may be: (a) all patients undergoing a pancreaticoduodenectomy, (b) women over the age of 50, (c) Caucasian women over 50, or (d) Caucasian women over 50 who have previously undergone a cholecystectomy. Each iteration of the definition of the population is more specific than the last. These details are important, but we may limit the information available to us if we define our population of interest too narrowly.

The I is for the Intervention or exposure of interest, and specifies what has happened to a group of patients, such as an operation or a diagnostic test. In our example the intervention we are considering is a pylorus preserving pancreaticoduodenectomy. However, there could also be specific issues that are considered important, such as the method of reconstruction used or the use of drains.

The C refers to the Comparator that we are interested in. In this case it is a standard Whipple procedure, but again we should be mindful of details of the standard procedure that may be important for our specific question.

S.K. Srinathan

The O stands for the Outcome of interest. It is very important to be specific about the outcome of interest, as various studies are likely to have used outcomes in their design that differ from the one you are interested in. One study may have focused on gastric emptying, whereas another may have focused on blood loss during the procedure. It is worthwhile to identify each outcome of interest in the specific clinical scenario and to rank them in order of importance to the patient and surgeon, so that an overall assessment of the utility of an intervention can be made.

Taking these features of the clinical question into account, we can frame the scenario for Mrs.
Smith in the following PICO question: In patients with periampullary carcinoma or carcinoma of the pancreatic head, does a pylorus preserving pancreaticoduodenectomy result in (1) less blood loss, (2) a lower incidence of delayed gastric emptying, and (3) lower operative mortality than a standard Whipple procedure?

P: Patients with a periampullary carcinoma or carcinoma of the pancreatic head
I: Pylorus preserving pancreaticoduodenectomy with the use of drains
C: Standard Whipple operation with the use of drains
O: (1) operative mortality, (2) delayed emptying, (3) blood loss

When reviewing the chapters in this book, it is worth considering whether the PICO questions chosen by the authors are sufficiently similar to your own formulation of the question for their findings and recommendations to apply to your specific case.

Find the Evidence

Often the first step in a literature search is to go to PubMed, the interface to the Medline database of citations in the National Library of Medicine in the United States. However, a search of “pylorus preserving pancreaticoduodenectomy” produces 781 citations. This is more than we can reasonably go through for the purposes of answering a specific question for a patient. If we instead use the Clinical Queries page in PubMed, which uses an algorithm to deliver focused studies relevant to clinical practice [10], we obtain citations for 35 systematic reviews and 45 clinical studies, which is much more manageable. Alternative search engines, such as TRIPdatabase (http://…) and SUMsearch (…), which draw on multiple databases including Medline, EMBASE, and databases of guidelines and technology, may also be used. Last, but certainly not least, is the expertise available through your local medical librarian, who will be well versed in the methods of constructing a PICO question and finding the relevant information from the medical literature.
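As an illustration only, the PICO question for Mrs. Smith can be written down as a small data structure from which a starting search string is assembled. This is a minimal sketch: the class name `PicoQuestion` and the helper `search_string` are hypothetical, the shortened search terms are chosen here for illustration, and the simple AND-joined string is not the algorithm behind PubMed Clinical Queries.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PicoQuestion:
    """One clinical question in PICO form (hypothetical helper class)."""
    population: str
    intervention: str
    comparator: str
    outcomes: Tuple[str, ...]  # ordered from most to least important

    def search_string(self) -> str:
        """Join the P, I and C terms with AND as a starting search string.

        Outcomes are deliberately left out in this sketch, on the
        assumption that a starting search should stay broad.
        """
        terms = (self.population, self.intervention, self.comparator)
        return " AND ".join(f'"{term}"' for term in terms)


# Terms shortened from the chapter's PICO question for illustration.
mrs_smith = PicoQuestion(
    population="periampullary carcinoma",
    intervention="pylorus preserving pancreaticoduodenectomy",
    comparator="Whipple procedure",
    outcomes=("operative mortality", "delayed gastric emptying", "blood loss"),
)

print(mrs_smith.search_string())
```

Keeping the outcomes as an ordered tuple mirrors the advice to rank outcomes by importance when the question is first formulated.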
1 Finding and Appraising the Evidence: EBM and GRADE

Appraise the Studies

Once we have found the studies of interest, the next step is to identify the “best evidence”. The concept of “best evidence” assumes a hierarchy of evidence. But in order to apply a hierarchy, it is important to understand the types of study designs and their use in answering specific types of clinical questions. Grimes et al. [7] provide a useful taxonomy of study designs (Fig. 1.1).

Fig. 1.1 Algorithm for classification of types of clinical research (Grimes and Schulz [7], Reprinted with permission from Elsevier)

In general, questions related to the superiority of one intervention over another (or no intervention) are best answered by experimental studies in which one group of patients is assigned to the intervention by a bias free method, while another receives a comparison intervention. The gold standard for the experimental study is a well-designed randomized trial. Other types of clinical questions, such as those of prognosis, are appropriately answered using cohort studies, while questions of diagnosis rely on comparing the performance of a diagnostic test to a gold standard.

All study types have the potential for any number of biases which may lead to a finding which deviates from the “truth” [8]. The tools of critical appraisal are used to determine the type and extent of these biases in the design and conduct of the study, and to make a judgment of how they may have affected the findings of the study and the extent to which they undermine our confidence in the validity of the findings. There are many excellent resources and tools to guide us in the specifics of appraising the medical literature and practicing EBM, and these are listed in the recommended readings.

What happens when, despite the best formulation of a question and literature search, we are unable to find the high quality systematic review or randomized trial to guide us?
Do we abandon the principles of EBM? Again from Sackett: “Evidence based medicine is not restricted to randomized trials and meta-analyses. It involves tracking down the best external evidence with which to answer our clinical questions…. However, some questions about therapy do not require randomized trials (successful interventions for otherwise fatal conditions) or cannot wait for the trials to be conducted. And if no randomized trial has been carried out for our patient’s predicament, we must follow the trail to the next best external evidence and work from there” [13].

Although we can approach each problem we face by formulating a question and finding the best available evidence, individual clinicians are unlikely to have the time or resources to do this for all possible scenarios. To illustrate: our example PICO question generated 171 results using PubMed. There were 50 reviews, 74 relevant trials or studies, 3 guidelines and 44 other possibly relevant titles. It took an experienced medical librarian about 2.5 hours to identify these studies, and this does not include the time necessary to actually read and appraise these documents.

The alternative to searching for each question has been standard textbooks, which seek to distill the evidence and guide clinical practice. The authors of these textbooks have always made decisions about which studies to consider, and judgments about their confidence in making recommendations based on this evidence. However, these judgments and decisions have not been transparent. And although many schemes which grade the level of evidence are in use and have been increasingly adopted in textbooks, it is not clear on what basis these grading decisions were arrived at [2]. A good systematic review makes transparent the question, the search strategy, the rules for inclusion of studies, and the basis on which the quality of each study is determined.
However, the final assessment of the overall quality of evidence and the subsequent recommendation arising from this evidence is often obscure. In order to address this deficiency, this book has adopted the GRADE system to make transparent the decision-making about the quality of evidence, the factors considered in making a recommendation, and the statement about the strength of this recommendation. The reader may disagree with certain judgments made by the authors, but the reason for disagreement will hopefully be clear with the GRADE system, and readers can make up their own minds whether the conclusions drawn by the authors are on the whole reasonable or valid. The key component of GRADE is that it explicitly separates the process of evaluating the quality of the evidence for an intervention from the process of making a recommendation for its adoption (or not).

The GRADE System

The GRADE system defines quality in the following way: “In the context of a systematic review, the ratings of the quality of evidence reflect the extent of our confidence that the estimates of the effect are correct. In the context of making recommendations, the quality ratings reflect the extent of our confidence that the estimates of an effect are adequate to support a particular decision or recommendation” [3]. It is the latter definition that applies in this book, and the authors have included a discussion of their clinical experience that brings into play the necessity of balancing conflicting factors in making a recommendation. A more thorough discussion is provided by Andrews et al. and Brozek et al. [1, 4]. The GRADE table used in this book lays out the justification for these decisions, and it is instructive to describe in detail the components of the table.
This example of a GRADE table is from Karanicolas et al. (Tables 1.1 and 1.2) [9, 11]:

Table 1.1 The GRADE system

Initial quality of the body of evidence, by study design:
  Randomized trials: High
  Observational studies: Low

Lower if:
  Risk of bias: −1 Serious; −2 Very serious
  Inconsistency: −1 Serious; −2 Very serious
  Indirectness: −1 Serious; −2 Very serious
  Imprecision: −1 Serious; −2 Very serious
  Publication bias: −1 Likely; −2 Very likely

Higher if:
  Large effect: +1 Large; +2 Very large
  Dose response: +1 Evidence of a gradient
  All plausible residual confounding: +1 Would reduce a demonstrated effect; +2 Would suggest a spurious effect if no effect was observed

Quality of a body of evidence:
  High ⊕⊕⊕⊕
  Moderate ⊕⊕⊕
  Low ⊕⊕
  Very low ⊕

Derived from: Balshem et al. [3]

Table 1.2 GRADE profile for systematic review comparing pylorus preserving to standard Whipple procedure by Karanicolas et al.

Columns 2 to 7 form the quality assessment; columns 8 to 11 form the summary of findings.

| Outcome | No. of studies (no. of participants) | Study limitations (a) | Consistency | Directness | Precision | Publication bias | Relative effect (95% CI) (d) | Best estimate of Whipple group risk | Absolute effect (95% CI) | Quality |
|---|---|---|---|---|---|---|---|---|---|---|
| Five year mortality | 3 (229) | Serious limitations (−1) | No important inconsistency | Direct | No important imprecision | Unlikely | 0.98 (0.87–1.11) | 82.50% | 20 less/1,000; 120 less to 80 more | +++, moderate |
| In-hospital mortality | 6 (490) | Serious limitations (−1) | No important inconsistency | Direct | Imprecision (−1) (c) | Unlikely | 0.40 (0.14–1.13) | 4.90% | 20 less/1,000; 50 less to 10 more | ++, low |
| Blood transfusions (units) | 5 (320) | Serious limitations (−1) | No important inconsistency | Direct | No important imprecision | Unlikely | – | 2.45 units | −0.66 (−1.06 to −0.25); favours pylorus preservation | +++, moderate |
| Biliary leaks | 3 (268) | Serious limitations (−1) | No important inconsistency | Direct | Imprecision (−1) (c) | Unlikely | 4.77 (0.23–97.96) | 0 | 20 more/1,000; 20 less to 50 more | ++, low |
| Hospital stay (days) | 5 (446) | Serious limitations (−1) | No important inconsistency | Direct | Imprecision (−1) (c) | Unlikely | – | 19.17 days | −1.45 (−3.28 to 0.38); favours pylorus preservation | ++, low |
| Delayed gastric emptying | 5 (442) | Serious limitations (−1) | Unexplained heterogeneity (−1) (b) | Direct | Imprecision (−1) (c) | Unlikely | 1.52 (0.74–3.14) | 25.50% | 110 more/1,000; 80 less to 290 more | +, very low |

Derived from: Karanicolas et al. [11]
(a) Unclear allocation concealment in all studies, patients blinded in only one study, outcome assessors not blinded in any study, >20% loss to follow-up in three studies, not analysed using intention to treat in one study
(b) I² = 72.6%, P = 0.006
(c) Confidence interval includes possible benefit from both surgical approaches
(d) Relative risks (95% confidence intervals) are based on random effect models

The Header

The general title of the clinical question being considered.

Sub Heading

A question broken up into the PICO format of patient or population, the setting, the intervention, and the comparison to which the intervention is being made. The question is the one of interest to the author of the table, and may or may not reflect the evidence which addresses this question.

Outcomes

The key component of the GRADE process is to focus on the outcomes to which the evidence applies. Individual studies may focus on differing outcomes of interest. It is often the case that many studies address common outcomes reflecting benefit, but do not reliably report on other outcomes, especially harm. It is possible that, with the same question and the same group of studies, the quality of evidence supporting an intervention is high for one outcome but not for others.
This latter point is one of the reasons that, when formulating the question, it is useful to list the outcomes of interest in order of importance.

Justification for Quality Assessment

In the GRADE system, a judgment is made whether the overall quality of evidence for each outcome is High, Moderate, Low, or Very Low. Initially, evidence from RCTs is considered to be High quality evidence, while observational studies start off as Low quality. Whether the overall body of evidence moves up or down the ranking is determined by the features of the studies; Table 1.1 [3] specifies the features which move a body of evidence up or down the ranking.

Study Limitation

The first judgment relates to possible deficiencies in the study designs themselves. These are determined during the critical appraisal process and include features such as the adequacy of randomization and blinding.

Inconsistency

Different studies may come to different conclusions, either qualitatively (e.g. the intervention works vs. it doesn’t) or in the degree to which a treatment works (i.e. the effect size differs). A measure of this in systematic reviews is the degree of heterogeneity, often reported as the I² value, and this is illustrated in our example when examining delayed gastric emptying. This heterogeneity can be due to differences in the patient population studied, the nature of the intervention, the means of measuring outcomes, or other study design features.

Directness

This is the degree to which the studies actually address the question we are interested in. The results may be indirect because the study population is different from the one we are interested in, or because the intervention differs substantially from what we are interested in. This differs slightly from the above in that indirectness refers to the whole body of evidence in relation to our specific question.
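The chapter reports the I² value without defining it. A minimal sketch, assuming the commonly used definition of I² in terms of Cochran's Q statistic and its degrees of freedom (number of studies minus one), is given below. The function name `i_squared` is hypothetical, and the Q value in the usage example is back-calculated here from the reported I² for five studies; it is not a figure taken from the review.

```python
def i_squared(q: float, df: int) -> float:
    """Return I^2 as a percentage from Cochran's Q and its degrees of freedom.

    Assumed definition: I^2 = max(0, (Q - df) / Q) * 100, where df is the
    number of studies minus one. Negative values are truncated to zero.
    """
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Five studies give df = 4; a Q of about 14.6 (back-calculated, not
# reported in the source) corresponds to the I^2 shown for delayed
# gastric emptying in Table 1.2.
print(round(i_squared(14.6, 4), 1))  # 72.6
```

Read this way, an I² of 72.6% suggests that most of the observed variation between the study results reflects real differences between studies rather than chance, which is why the outcome was downgraded for unexplained heterogeneity.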
Precision

Studies may report effects with wide confidence intervals, where the values at the upper and lower bounds would suggest different clinical actions. In our example, the five year mortality associated with pylorus preserving pancreaticoduodenectomy is expected to lie between 120 fewer and 80 more deaths per 1,000 patients (Table 1.2). Wide confidence intervals are most often driven by too small a sample size in a study.

Publication Bias

We may suspect publication bias when the preponderance of the available evidence comes from a number of small studies, most of which have been commercially funded. This may suggest that studies not showing an effect have not been published, which biases the evidence.

Features Increasing Quality of Observational Studies

Large Magnitude of Effect

In well designed observational studies, if a large and plausible effect is observed (relative risk of greater than 5 or less than 0.2), there is reasonable confidence that the effect is not due to confounding. This is the reason why one doesn’t really require an RCT to determine if parachutes are effective.

Dose Response Gradient

A finding in observational studies that increases our confidence in a cause and effect relationship is the demonstration of a dose response effect, for example an increased risk of bleeding with increasing INR.

All Plausible Confounding Would Reduce the Demonstrated Effect or Increase It if No Effect Was Observed

A confounder is a factor related to both a predictor and an outcome, but which is not in the causal link between the predictor and the outcome. If a likely confounder acts opposite to the way one would expect, then it is possible that the true effect is underestimated. For example, if high risk patients do at least as well with a surgical procedure as those at low risk, it more strongly suggests that there is a true effect of the surgical intervention, and this would increase our confidence and thus the quality rating of the evidence.
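The bookkeeping behind a GRADE rating can be sketched in a few lines. This is an illustration only: GRADE ratings rest on judgment rather than mechanical arithmetic, the function name `grade_quality` and the dictionary keys are hypothetical, and the examples treat the studies in Table 1.2 as randomized trials, which is what the ratings in that table imply.

```python
from typing import Dict, Optional

LEVELS = ("Very low", "Low", "Moderate", "High")


def grade_quality(
    randomized: bool,
    downgrades: Dict[str, int],
    upgrades: Optional[Dict[str, int]] = None,
) -> str:
    """Return a quality level from a starting point and judged adjustments.

    Randomized trials start at High and observational studies at Low.
    Each entry in `downgrades` or `upgrades` is the number of levels
    (1 or 2) judged appropriate for that feature.
    """
    start = 3 if randomized else 1
    score = start - sum(downgrades.values()) + sum((upgrades or {}).values())
    score = max(0, min(score, len(LEVELS) - 1))  # stay within the four levels
    return LEVELS[score]


# Judgments copied from three rows of Table 1.2.
print(grade_quality(True, {"study limitations": 1}))  # Moderate
print(grade_quality(True, {"study limitations": 1, "imprecision": 1}))  # Low
print(grade_quality(
    True,
    {"study limitations": 1, "inconsistency": 1, "imprecision": 1},
))  # Very low
```

The three calls correspond to five year mortality, in-hospital mortality, and delayed gastric emptying, and return the same ratings as the table.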
Summary of Findings

The last part of the table is a summary of findings, where the estimate of relative effect, the baseline risk with the standard therapy, and the absolute effects of the intervention are reported. A measure of the absolute effect is crucial for making a recommendation: although one intervention may be more effective in comparison to another, the overall effect in absolute numbers may be small. In our example the absolute risk of bleeding is only decreased by 1%. Another example is a baseline risk of pneumonia of 1% which, with the addition of preoperative antibiotics, drops to 0.7%. A change in absolute risk of 0.3% is unlikely to be of clinical significance despite there being a 30% relative risk reduction, which in many cases would be considered of considerable “clinical significance”.

The final component of the GRADE system is to make a recommendation. In assessing the quality of the evidence necessary to make the recommendation, those making the recommendation should specify which of the various outcomes are crucial to it. In our example it is reasonable to conclude that the quality of the evidence is low, since that is the quality for the crucial outcome of perioperative mortality. It could be argued that the 5-year survival is more important, in which case the quality of evidence is moderate (Table 1.2).

Once the quality of evidence has been determined, a recommendation is made. This is a separate process from determining the quality of evidence. A recommendation is either strong or weak, where “The strength of a recommendation is defined as the extent to which one can be confident that the desirable consequences of an intervention outweigh its undesirable consequences” [1]. A strong recommendation is one where, from the clinician’s point of view, most patients should receive the intervention, as the expected benefits comfortably outweigh the undesirable effects.
In these situations there is usually little need for extensive discussion about the merits of the intervention. Weak recommendations, on the other hand, may be appropriate in some patients, but require more thorough discussion of the benefits and adverse effects of the treatment (Table 1.3) [4].

Ultimately, decisions about the care of an individual patient fall to the surgeon and the patient, and take into account not just the external evidence for a particular course of action but, crucially, the patient’s own preferences and values and the practical ability of the surgeon to deliver on this decision in their own specific environment.

Acknowledgment

I would like to acknowledge Tania Gottschalk and Gordon Guyatt for their advice and assistance.
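As a worked illustration of the contrast between relative and absolute effects drawn in the Summary of Findings section, the pneumonia example can be checked with a few lines of arithmetic. This is a minimal sketch; the function name `risk_reductions` is hypothetical, and risks are expressed as proportions between 0 and 1.

```python
from typing import Tuple


def risk_reductions(baseline_risk: float, treated_risk: float) -> Tuple[float, float]:
    """Return (absolute risk reduction, relative risk reduction).

    Both risks are proportions, so 1 % is written as 0.01.
    """
    if not 0 < baseline_risk <= 1:
        raise ValueError("baseline_risk must be in the interval (0, 1]")
    absolute = baseline_risk - treated_risk
    relative = absolute / baseline_risk
    return absolute, relative


# Pneumonia example from the text: 1 % baseline risk, 0.7 % with antibiotics.
arr, rrr = risk_reductions(0.01, 0.007)
print(f"absolute risk reduction: {arr:.1%}")                 # 0.3%
print(f"relative risk reduction: {rrr:.0%}")                 # 30%
print(f"fewer cases per 1,000 patients: {arr * 1000:.0f}")   # 3
```

Expressed per 1,000 patients, as in Table 1.2, the same 30% relative reduction amounts to about 3 fewer cases, which makes the small size of the absolute effect easier to see.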

2. Atkins D, Best D, Briss PA, Eccles M, Falck-Ytter Y, Flottorp S, et al. Grading quality of evidence and strength of recommendations. BMJ (Clin Res Ed). 2004;328(7454):1490. doi:10.1136/bmj.328.7454.1490.
3. Balshem H, Helfand M, Schünemann HJ, Oxman AD, Kunz R, Brozek J, et al. GRADE guidelines: 3. Rating the quality of evidence. J Clin Epidemiol. 2011;64(4):401–6. doi:10.1016/j.jclinepi.2010.07.015.
4. Brożek JL, Akl EA, Compalati E, Kreis J, Terracciano L, Fiocchi A, et al. Grading quality of evidence and strength of recommendations in clinical practice guidelines Part 3 of 3. The GRADE approach to developing recommendations. Allergy. 2011;66(5):588–95. doi:10.1111/j.1398-9995.2010.02530.x.
5. Evidence-Based Medicine Working Group. Evidence-based medicine. A new approach to teaching the practice of medicine. JAMA. 1992;268(17):2420–5. doi:10.1001/jama.1992.03490170092032.
6. GATE in ACP. 2013. GATE in ACP, 1–4.
7. Grimes DA, Schulz KF. An overview of clinical research: the lay of the land. Lancet. 2002;359(9300):57–61. doi:10.1016/S0140-6736(02)07283-5.
8. Grimes DA, Schulz KF. Bias and causal associations in observational research. Lancet. 2002;359(9302):248–52. doi:10.1016/S0140-6736(02)07451-2.
9. Guyatt GH, Oxman AD, Kunz R, Vist GE, Falck-Ytter Y, Schünemann HJ, GRADE Working Group. What is “quality of evidence” and why is it important to clinicians? BMJ (Clin Res Ed). 2008;336(7651):995–8. doi:10.1136/bmj.39490.551019.BE.
10. Haynes RB. Optimal search strategies for retrieving scientifically strong studies of treatment from Medline: analytical survey. BMJ (Clin Res Ed). 2005;330(7501):1179. doi:10.1136/bmj.38446.498542.8F.
11. Karanicolas PJ, Davies E, Kunz R, Briel M, Koka HP, Payne DM, et al. The pylorus: take it or leave it? Systematic review and meta-analysis of pylorus-preserving versus standard Whipple pancreaticoduodenectomy for pancreatic or periampullary cancer. Ann Surg Oncol. 2007;14(6):1825–34. doi:10.1245/s10434-006-9330-3.
12. Sackett DL. Evidence-based medicine. Semin Perinatol. 1997;21(1):3–5.
13. Sackett DL, Rosenberg WMC, Gray JAM, Haynes RB, Richardson WS. Evidence based medicine: what it is and what it isn’t. BMJ (Clin Res Ed). 1996;312(7023):71–2. doi:10.1136/bmj.312.7023.71.
