<![CDATA[Anesthesia & Analgesia - Statistical Grand Rounds: Advanced Statistical Methods for Research Best Practice]]>
https://journals.lww.com/anesthesia-analgesia/pages/collectiondetails.aspx?TopicalCollectionId=190
en-us
Fri, 07 Aug 2020 18:42:40 -0500
Wolters Kluwer Health RSS Generator
https://cdn-images-journals.azureedge.net/anesthesia-analgesia/XLargeThumb.00000539-202008000-00000.CV.jpeg
https://journals.lww.com/anesthesia-analgesia/Fulltext/2019/08000/Segmented_Regression_and_Difference_in_Difference.45.aspx
<![CDATA[Segmented Regression and Difference-in-Difference Methods: Assessing the Impact of Systemic Changes in Health Care]]>Perioperative investigators and professionals increasingly seek to evaluate whether implementing systematic practice changes improves outcomes compared to a previous routine. Cluster randomized trials are the optimal design to assess a systematic practice change but are often impractical; investigators, therefore, often select a before–after design. In this Statistical Grand Rounds, we first discuss biases inherent in a before–after design, including confounding due to periods being completely separated by time, regression to the mean, the Hawthorne effect, and others. Many of these biases can be at least partially addressed by using appropriate designs and analyses, which we discuss. Our focus is on segmented regression of an interrupted time series, which does not require a concurrent control group; we also present alternative designs including difference-in-difference, stepped wedge, and cluster randomization. Conducting segmented regression well requires a sufficient number of time points within each period, along with a robust set of potentially confounding variables. This method compares preintervention and postintervention changes over time, divergences in the outcome when an intervention begins, and trends observed with the intervention compared to trends projected without it. Difference-in-difference methods add a concurrent control, enabling yet stronger inference. When done well, the discussed methods permit robust inference on the effect of an intervention, albeit still requiring assumptions and having limitations. Methods are demonstrated using an interrupted time series study in which anesthesiologists took responsibility for an adult medical emergency team from internal medicine physicians in an attempt to improve outcomes.]]>Wed, 11 Sep 2019 13:20:35 GMT-05:0000000539-201908000-00045
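The segmented regression described above can be sketched as an ordinary least-squares fit with four terms: a baseline level, a preintervention slope, a level change at the intervention, and a slope change afterward. This is a minimal illustration only; the function names and the parameterization are assumptions, and the sketch omits the confounder adjustment and autocorrelation handling that a real analysis would need.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_segmented_regression(times, outcomes, change_point):
    """Fit y = b0 + b1*t + b2*post + b3*(t - change_point)*post.

    b2 is the immediate level change at the intervention and b3 is the
    change in slope after it (hypothetical parameterization).
    """
    rows = []
    for t in times:
        post = 1.0 if t >= change_point else 0.0
        rows.append([1.0, float(t), post, (t - change_point) * post])
    k = 4
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcomes)) for i in range(k)]
    return solve_linear_system(xtx, xty)
```

With noise-free data generated from known coefficients, the fit returns those coefficients, which is a quick way to check that the design matrix matches the intended interpretation.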
https://journals.lww.com/anesthesia-analgesia/Fulltext/2019/04000/Quantile_Regression_and_Its_Applications__A_Primer.28.aspx
<![CDATA[Quantile Regression and Its Applications: A Primer for Anesthesiologists]]>Multivariable regression analysis is a powerful statistical tool in biomedical research with numerous applications. While linear regression can be used to model the expected value (ie, mean) of a continuous outcome given the covariates in the model, quantile regression can be used to compare the entire distribution of a continuous response or a specific quantile of the response between groups. The advantage of the quantile regression methodology is that it allows for understanding relationships between variables outside of the conditional mean of the response; it is useful for understanding an outcome at its various quantiles and comparing groups or levels of an exposure on those quantiles. We present quantile regression in a 3-step approach: determining that quantile regression is desired, fitting the quantile regression model, and interpreting the model results. We then apply our quantile regression analysis approach using 2 illustrative examples from the 2015 American College of Surgeons National Surgical Quality Improvement Program Pediatric database, and 1 example utilizing data on duration of sensory block in rats.]]>Wed, 11 Sep 2019 13:21:39 GMT-05:0000000539-201904000-00028
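To make the contrast with mean-based regression concrete, the sketch below shows the check (pinball) loss that quantile regression minimizes, in the simplest case of a model with no covariates, where the minimizer is a sample quantile. The function names are hypothetical and this is an illustration of the loss, not the fitting procedure used in the article.

```python
def check_loss(residual, tau):
    """Quantile ("pinball") loss for one residual at quantile level tau."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_check_loss(values, candidate, tau):
    """Sum of check losses when every value is predicted by `candidate`."""
    return sum(check_loss(v - candidate, tau) for v in values)


def sample_quantile_by_check_loss(values, tau):
    """Return the observed value that minimizes the total check loss."""
    return min(sorted(values), key=lambda c: total_check_loss(values, c, tau))


durations = [1, 2, 3, 4, 100]  # made-up, skewed data
median = sample_quantile_by_check_loss(durations, 0.5)
mean = sum(durations) / len(durations)
```

Here the single extreme value pulls the mean far above the median, which is the kind of situation in which modeling a quantile rather than the conditional mean can be more informative.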
https://journals.lww.com/anesthesia-analgesia/Fulltext/2018/10000/Propensity_Score_Methods__Theory_and_Practice_for.38.aspx
<![CDATA[Propensity Score Methods: Theory and Practice for Anesthesia Research]]>Observational data are often readily available or less costly to obtain than conducting a randomized controlled trial. With observational data, investigators may statistically evaluate the relationship between a treatment or therapy and outcomes. However, inherent in observational data is the potential for confounding arising from the nonrandom assignment of treatment. In this statistical grand rounds, we describe the use of propensity score methods (ie, using the probability of receiving treatment given covariates) to reduce bias due to measured confounders in anesthesia and perioperative medicine research. We provide a description of the theory and background appropriate for the anesthesia researcher and describe statistical assumptions that should be assessed in the course of a research study using the propensity score. We further describe 2 propensity score methods for evaluating the association of treatment or therapy with outcomes, propensity score matching and inverse probability of treatment weighting, and compare to covariate-adjusted regression analysis. We distinguish several estimators of treatment effect available with propensity score methods, including the average treatment effect, the average treatment effect for the treated, and average treatment effect for the controls or untreated, and compare to the conditional treatment effect in covariate-adjusted regression. We highlight the relative advantages of the various methods and estimators, describe analysis assumptions and how to critically evaluate them, and demonstrate methods in an analysis of thoracic epidural analgesia and new-onset atrial arrhythmias after pulmonary resection.]]>Wed, 11 Sep 2019 13:22:04 GMT-05:0000000539-201810000-00038
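The estimators named above differ in how each patient is weighted once a propensity score is available. The following sketch uses the standard inverse probability of treatment weights for the average treatment effect (ATE), the effect in the treated (ATT), and the effect in the controls (ATC). It assumes the propensity scores have already been estimated; the function names are illustrative.

```python
def iptw_weights(treated, propensity, estimand="ATE"):
    """Return one weight per subject for the chosen estimand.

    treated: 1/0 flags; propensity: estimated P(treatment | covariates).
    """
    weights = []
    for z, e in zip(treated, propensity):
        if estimand == "ATE":
            w = 1.0 / e if z else 1.0 / (1.0 - e)
        elif estimand == "ATT":
            w = 1.0 if z else e / (1.0 - e)
        elif estimand == "ATC":
            w = (1.0 - e) / e if z else 1.0
        else:
            raise ValueError("estimand must be 'ATE', 'ATT', or 'ATC'")
        weights.append(w)
    return weights


def weighted_mean_difference(outcomes, treated, weights):
    """Weighted mean outcome in treated minus weighted mean in controls."""
    def wmean(flag):
        num = sum(w * y for y, z, w in zip(outcomes, treated, weights) if z == flag)
        den = sum(w for z, w in zip(treated, weights) if z == flag)
        return num / den
    return wmean(1) - wmean(0)
```

Scores near 0 or 1 produce very large weights, which is one reason the overlap assumption needs to be checked before the weights are used.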
https://journals.lww.com/anesthesia-analgesia/Fulltext/2018/04000/Error_Grid_Analysis_for_Arterial_Pressure_Method.16.aspx
<![CDATA[Error Grid Analysis for Arterial Pressure Method Comparison Studies]]>The measurement of arterial pressure (AP) is a key component of hemodynamic monitoring. A variety of innovative AP monitoring technologies have recently become available. The decision to use these technologies must be based on their measurement performance in validation studies. These studies are AP method comparison studies comparing a new method (“test method”) with a reference method. In these studies, different comparative statistical tests are used, including correlation analysis, Bland-Altman analysis, and trending analysis. These tests provide information about the statistical agreement without adequately providing information about the clinical relevance of differences between the measurement methods. To overcome this problem, we propose an “error grid analysis” for AP method comparison studies that illustrates the clinical relevance of measurement differences. We constructed smoothed consensus error grids with calibrated risk zones derived from a survey among 25 specialists in anesthesiology and intensive care medicine. Differences between measurements of the test and the reference method are classified into 5 risk levels ranging from “no risk” to “dangerous risk”; the classification depends on both the differences between the measurements and the measurements themselves. Based on worked examples and data from the Multiparameter Intelligent Monitoring in Intensive Care II database, we show that the proposed error grids give information about the clinical relevance of AP measurement differences that cannot be obtained from Bland-Altman analysis. Our approach also offers a framework for adapting the error grid analysis to different clinical settings and patient populations.]]>Wed, 11 Sep 2019 13:23:42 GMT-05:0000000539-201804000-00016
https://journals.lww.com/anesthesia-analgesia/Fulltext/2017/10000/An_Appraisal_of_the_Carlisle_Stouffer_Fisher.46.aspx
<![CDATA[An Appraisal of the Carlisle-Stouffer-Fisher Method for Assessing Study Data Integrity and Fraud]]>Data fabrication and scientific misconduct have been recently uncovered in the anesthesia literature, partly via the work of John Carlisle. In a recent article in Anaesthesia, Carlisle analyzed 5087 randomized clinical trials from anesthesia and general medicine journals from 2000 to 2015. He concluded that in about 6% of studies, data comparing randomized groups on baseline variables, before the given intervention, were either too similar or dissimilar compared to that expected by usual sampling variability under the null hypothesis. Carlisle used the Stouffer-Fisher method of combining P values in Table 1 (the conventional table reporting baseline patient characteristics) for each study, then calculated trial P values and assessed whether they followed a uniform distribution across studies. Extreme P values targeted studies as likely to contain data fabrication or errors. In this Statistical Grand Rounds article, we explain Carlisle’s methods, highlight perceived limitations of the proposed approach, and offer recommendations. 
Our main findings are (1) independence was assumed between variables in a study, which is often false and would lead to “false positive” findings; (2) an “unusual” result from a trial cannot easily be concluded to represent fraud; (3) utilized cutoff values for determining extreme P values were arbitrary; (4) trials were analyzed as if simple randomization was used, introducing bias; (5) not all P values can be accurately generated from summary statistics in a Table 1, sometimes giving incorrect conclusions; (6) using small numbers of P values to assess outlier status within studies is not reliable; (7) utilized method to assess deviations from expected distributions may stack the deck; (8) P values across trials assumed to be independent; (9) P value variability not accounted for; and (10) more detailed methods needed to understand exactly what was done. It is not yet known to what extent these concerns affect the accuracy of Carlisle’s results. We recommend that Carlisle’s methods be improved before widespread use (applying them to every manuscript submitted for publication). Furthermore, lack of data integrity and fraud should ideally be assessed using multiple simultaneous statistical methods to yield more confident results. More sophisticated methods are needed for nonrandomized trials, randomized trial data reported beyond Table 1, and combating growing fraudster sophistication. We encourage all authors to more carefully scrutinize their own reporting. Finally, we believe that reporting of suspected data fraud and integrity issues should be done more discreetly and directly by the involved journal to protect honest authors from the stigma of being associated with potential fraud.]]>Wed, 11 Sep 2019 13:24:13 GMT-05:0000000539-201710000-00046
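For readers unfamiliar with combining P values, the sketch below implements the two classical combination rules after which the method is named: Stouffer's sum of z-scores and Fisher's sum of log P values. It is not Carlisle's exact procedure, whose details the abstract says are incompletely described. Both rules assume the P values are independent, which is the assumption questioned in finding (1); the function names are illustrative.

```python
import math
from statistics import NormalDist


def stouffer_combined_p(p_values):
    """One-sided Stouffer combination: Z = sum(z_i) / sqrt(k)."""
    nd = NormalDist()
    z_scores = [nd.inv_cdf(1.0 - p) for p in p_values]
    z_combined = sum(z_scores) / math.sqrt(len(z_scores))
    return 1.0 - nd.cdf(z_combined)


def fisher_combined_p(p_values):
    """Fisher combination: -2 * sum(ln p_i) is chi-square with 2k df."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # test statistic / 2
    # For even degrees of freedom (2k) the chi-square upper tail has a
    # closed form: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    tail_sum = sum(half_stat ** i / math.factorial(i) for i in range(k))
    return math.exp(-half_stat) * tail_sum
```

With a single P value either rule returns that P value unchanged, which is a convenient sanity check.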
https://journals.lww.com/anesthesia-analgesia/Fulltext/2016/03000/Limitations_of_Significance_Testing_in_Clinical.30.aspx
<![CDATA[Limitations of Significance Testing in Clinical Research: A Review of Multiple Comparison Corrections and Effect Size Calculations with Correlated Measures]]>Modern clinical research commonly uses complex designs with multiple related outcomes, including repeated-measures designs. While multiple comparison corrections and effect size calculations are needed to more accurately assess an intervention’s significance and impact, understanding the limitations of these methods in the case of dependency and correlation is important. In this review, we outline methods for multiple comparison corrections and effect size calculations and considerations in cases of correlation and summarize relevant simulation studies to illustrate these concepts.]]>Wed, 11 Sep 2019 13:25:29 GMT-05:0000000539-201603000-00030
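As a concrete reference point for the corrections discussed, the sketch below computes Bonferroni and Holm adjusted P values. These two procedures are chosen here only as familiar examples; the abstract does not say which corrections the review covers. Both control the familywise error rate under any dependence structure, but they can be conservative when the outcomes are strongly correlated. The function names are illustrative.

```python
def bonferroni_adjust(p_values):
    """Multiply each P value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm_adjust(p_values):
    """Holm step-down adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never falls below
        # the adjusted value of a smaller raw P value.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```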
https://journals.lww.com/anesthesia-analgesia/Fulltext/2016/03000/Net_Reclassification_Improvement.29.aspx
<![CDATA[Net Reclassification Improvement]]>When adding new markers to existing prediction models, it is necessary to evaluate the models to determine whether the additional markers are useful. The net reclassification improvement (NRI) has gained popularity in this role because of its simplicity, ease of estimation, and understandability. Although the NRI provides a single-number summary describing the improvement new markers bring to a model, it also has several potential disadvantages. Any improved classification by the new model is weighted equally, regardless of the direction of reclassification. In prediction models that already identify the high- and low-risk groups well, a positive NRI may not mean better classification of those with medium risk, where it could make the most difference. Also, overfitting, or otherwise misspecified training models, produce overly positive NRI results. Because of the unaccounted for uncertainty in the model coefficient estimation, investigators should rely on bootstrapped confidence intervals rather than on tests of significance. Keeping in mind the limitations and drawbacks, the NRI can be helpful when used correctly.]]>Wed, 11 Sep 2019 13:26:24 GMT-05:0000000539-201603000-00029
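The categorical NRI can be sketched directly from its definition: the net proportion of patients with events who move to a higher risk category plus the net proportion of patients without events who move to a lower one. The function name and the integer coding of risk categories are assumptions made for illustration.

```python
def net_reclassification_improvement(old_category, new_category, event):
    """Return (NRI, event component, nonevent component).

    Categories are ordered integers (higher = higher predicted risk);
    event is a 1/0 flag for whether the outcome occurred.
    """
    up_e = down_e = n_e = 0
    up_n = down_n = n_n = 0
    for old, new, had_event in zip(old_category, new_category, event):
        if had_event:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_n += 1
            up_n += new > old
            down_n += new < old
    nri_events = (up_e - down_e) / n_e
    nri_nonevents = (down_n - up_n) / n_n
    return nri_events + nri_nonevents, nri_events, nri_nonevents
```

Returning the two components separately reflects the concern raised above: the single summed number hides whether the improvement came from patients with events or from those without.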
https://journals.lww.com/anesthesia-analgesia/Fulltext/2016/01000/Quantifying_the_Diversity_and_Similarity_of.36.aspx
<![CDATA[Quantifying the Diversity and Similarity of Surgical Procedures Among Hospitals and Anesthesia Providers]]>In this Statistical Grand Rounds, we review methods for the analysis of the diversity of procedures among hospitals, the activities among anesthesia providers, etc. We apply multiple methods and consider their relative reliability and usefulness for perioperative applications, including calculations of SEs. We also review methods for comparing the similarity of procedures among hospitals, activities among anesthesia providers, etc. We again apply multiple methods and consider their relative reliability and usefulness for perioperative applications. The applications include strategic analyses (e.g., hospital marketing) and human resource analytics (e.g., comparisons among providers). Measures of diversity of procedures and activities (e.g., Herfindahl and Gini-Simpson index) are used for quantification of each facility (hospital) or anesthesia provider, one at a time. Diversity can be thought of as a summary measure. Thus, if the diversity of procedures for 48 hospitals is studied, the diversity (and its SE) is being calculated for each hospital. Likewise, the effective numbers of common procedures at each hospital can be calculated (e.g., by using the exponential of the Shannon index). Measures of similarity are pairwise assessments. Thus, if quantifying the similarity of procedures among cases with a break or handoff versus cases without a break or handoff, a similarity index represents a correlation coefficient. There are several different measures of similarity, and we compare their features and applicability for perioperative data. We rely extensively on sensitivity analyses to interpret observed values of the similarity index.]]>Wed, 11 Sep 2019 14:05:35 GMT-05:0000000539-201601000-00036
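The diversity measures named above are simple functions of the proportion of cases in each procedure. The sketch below computes the Herfindahl index, the Gini-Simpson index, and the effective number of procedures as the exponential of the Shannon index, for one facility at a time. The function names are illustrative, and the standard errors discussed in the article are not computed.

```python
import math


def proportions(counts):
    """Convert case counts per procedure to proportions, dropping zeros."""
    total = sum(counts)
    return [c / total for c in counts if c > 0]


def herfindahl(counts):
    """Sum of squared proportions (higher = more concentrated)."""
    return sum(p * p for p in proportions(counts))


def gini_simpson(counts):
    """One minus the Herfindahl index (higher = more diverse)."""
    return 1.0 - herfindahl(counts)


def shannon_effective_number(counts):
    """exp(Shannon index): the effective number of common procedures."""
    shannon = -sum(p * math.log(p) for p in proportions(counts))
    return math.exp(shannon)
```

For a facility performing four procedures equally often, the effective number is 4; it falls toward 1 as one procedure comes to dominate.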
https://journals.lww.com/anesthesia-analgesia/Fulltext/2015/08000/Tracking_Changes_in_Cardiac_Output__Statistical.34.aspx
<![CDATA[Tracking Changes in Cardiac Output: Statistical Considerations on the 4-Quadrant Plot and the Polar Plot Methodology]]>When comparing 2 technologies for measuring hemodynamic parameters with regard to their ability to track changes, 2 graphical tools are omnipresent in the literature: the 4-quadrant plot and the polar plot recently proposed by Critchley et al. The polar plot is thought to be the more advanced statistical tool, but care should be taken when it comes to its interpretation. The polar plot excludes possibly important measurements from the data. The polar plot transforms the data nonlinearly, which may prevent the data from being seen clearly. In this article, we compare the 4-quadrant and the polar plot in detail and thoroughly describe advantages and limitations of each. We also discuss pitfalls concerning the methods to prepare the researcher for the sound use of both methods. Finally, we briefly revisit the Bland-Altman plot for use in this context.]]>Wed, 11 Sep 2019 14:06:40 GMT-05:0000000539-201508000-00034
https://journals.lww.com/anesthesia-analgesia/Fulltext/2014/09000/Bernoulli_Cumulative_Sum__CUSUM__Control_Charts.25.aspx
<![CDATA[Bernoulli Cumulative Sum (CUSUM) Control Charts for Monitoring of Anesthesiologists’ Performance in Supervising Anesthesia Residents and Nurse Anesthetists]]>We describe our experiences in using Bernoulli cumulative sum (CUSUM) control charts for monitoring clinician performance. The supervision provided by each anesthesiologist is evaluated daily by the Certified Registered Nurse Anesthetists (CRNAs) and/or anesthesia residents with whom they work. Each of 9 items is evaluated (1 = never, 2 = rarely, 3 = frequently, 4 = always). The score is the mean of the 9 responses. Choosing thresholds for low scores is straightforward, <2.0 for CRNAs and <3.0 for residents. Bernoulli CUSUM detection of low scores was within 50 ± 14 (median ± quartile deviation) days rather than 182 days without use of CUSUM. The true positive detection of anesthesiologists with incidences of low scores greater than the chosen “out-of-control” rate was 14 of 14. The false-positive detection rate was 0 of 29. This CUSUM performance exceeded that of Shewhart individual control charts, for which the smallest threshold sufficiently large to detect 14 of 14 true positives had false-positive detection of 16 of 29 anesthesiologists. The Bernoulli CUSUM assumes that scores are known right away, which is untrue. However, CUSUM performance was insensitive to this assumption. The Bernoulli CUSUM assumes statistical independence of scores, which also is untrue. For example, when an evaluation of an anesthesiologist 1 day by a CRNA had a low score, there was an increased chance that another CRNA working in a different operating room on the same day would also give that same anesthesiologist a low score (P < 0.0001). This correlation among scores does affect the Bernoulli CUSUM, such that detection is more likely. 
This is an advantage for our continual process improvement application since it flags individuals for further evaluation by managers while maintaining confidentiality of raters.]]>Wed, 11 Sep 2019 14:07:13 GMT-05:0000000539-201409000-00025
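A Bernoulli CUSUM of the kind described can be sketched in its log-likelihood-ratio form: each evaluation adds a positive increment if the score is low and a negative one otherwise, the running sum is floored at zero, and a signal is raised when it reaches a threshold. The in-control rate `p0`, out-of-control rate `p1`, and `threshold` below are hypothetical values chosen for illustration, not those used in the study.

```python
import math


def bernoulli_cusum(low_score_flags, p0, p1, threshold):
    """Return (CUSUM path, index of first signal or None).

    low_score_flags: 1 if an evaluation was a low score, else 0.
    p0 / p1: in-control and out-of-control probabilities of a low score.
    """
    inc_low = math.log(p1 / p0)              # added for a low score
    inc_ok = math.log((1 - p1) / (1 - p0))   # added (negative) otherwise
    s = 0.0
    path = []
    signal_index = None
    for i, flag in enumerate(low_score_flags):
        s = max(0.0, s + (inc_low if flag else inc_ok))
        path.append(s)
        if signal_index is None and s >= threshold:
            signal_index = i
    return path, signal_index


# Hypothetical example: three consecutive low scores trigger a signal.
path, first_signal = bernoulli_cusum([0, 0, 1, 1, 1], p0=0.1, p1=0.3, threshold=3.0)
```

Because clustered low scores accumulate quickly, positive correlation among evaluations makes a signal more likely, which matches the behavior reported above.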
https://journals.lww.com/anesthesia-analgesia/Fulltext/2013/10000/Understanding_the_Mechanism__Mediation_Analysis_in.27.aspx
<![CDATA[Understanding the Mechanism: Mediation Analysis in Randomized and Nonrandomized Studies]]>In comparative clinical studies, a common goal is to assess whether an exposure, or intervention, affects the outcome of interest. However, just as important is to understand the mechanism(s) for how the intervention affects outcome. For example, if preoperative anemia was shown to increase the risk of postoperative complications by 15%, it would be important to quantify how much of that effect was due to patients receiving intraoperative transfusions. Mediation analysis attempts to quantify how much, if any, of the effect of an intervention on outcome goes through prespecified mediator, or “mechanism,” variable(s), that is, variables sitting on the causal pathway between exposure and outcome. Effects of an exposure on outcome can thus be divided into direct and indirect, or mediated, effects. Mediation is claimed when 2 conditions are true: the exposure affects the mediator and the mediator (adjusting for the exposure) affects the outcome. Understanding how an intervention affects outcome can validate or invalidate one’s original hypothesis and also facilitate further research to modify the responsible factors, and thus improve patient outcome. We discuss the proper design and analysis of studies investigating mediation, including the importance of distinguishing mediator variables from confounding variables, the challenge of identifying potential mediators when the exposure is chronic versus acute, and the requirements for claiming mediation. Simple designs are considered, as well as those containing multiple mediators, multiple outcomes, and mixed data types. Methods are illustrated with data collected by the National Surgical Quality Improvement Project (NSQIP) and utilized in a companion paper which assessed the effects of preoperative anemic status on postoperative outcomes.]]>Wed, 11 Sep 2019 14:07:50 GMT-05:0000000539-201310000-00027
https://journals.lww.com/anesthesia-analgesia/Fulltext/2013/09000/A_Review_of_Analysis_and_Sample_Size_Calculation.23.aspx
<![CDATA[A Review of Analysis and Sample Size Calculation Considerations for Wilcoxon Tests]]>When a study uses an ordinal outcome measure with unknown differences in the anchors and a small range such as 4 or 7, use of the Wilcoxon rank sum test or the Wilcoxon signed rank test may be most appropriate. However, because nonparametric methods are at best indirect functions of standard measures of location such as means or medians, the choice of the most appropriate summary measure can be difficult. The issues underlying use of these tests are discussed. The Wilcoxon-Mann-Whitney odds directly reflects the quantity that the rank sum procedure actually tests, and thus it can be a superior summary measure. Unlike the means and medians, its value will have a one-to-one correspondence with the Wilcoxon rank sum test result. The companion article appearing in this issue of Anesthesia & Analgesia (“Aromatherapy as Treatment for Postoperative Nausea: A Randomized Trial”) illustrates these issues and provides an example of a situation for which the medians imply no difference between 2 groups, even though the groups are, in fact, quite different. The trial cited also provides an example of a single sample that has a median of zero, yet there is a substantial shift for much of the nonzero data, and the Wilcoxon signed rank test is quite significant. These examples highlight the potential discordance between medians and Wilcoxon test results. Along with the issues surrounding the choice of a summary measure, there are considerations for the computation of sample size and power, confidence intervals, and multiple comparison adjustment. In addition, despite the increased robustness of the Wilcoxon procedures relative to parametric tests, some circumstances in which the Wilcoxon tests may perform poorly are noted, along with alternative versions of the procedures that correct for such limitations.]]>Wed, 11 Sep 2019 14:08:25 GMT-05:0000000539-201309000-00023
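The Wilcoxon-Mann-Whitney odds can be sketched from all pairwise comparisons between two groups. The sketch assumes the usual definition, in which p is the probability that a value from the second group exceeds a value from the first, with ties counted as one half, and the odds are p / (1 - p). The function name and the example data are hypothetical.

```python
import math


def wmw_odds(group_a, group_b):
    """Odds that a random value from group_b exceeds one from group_a."""
    wins = ties = 0
    for a in group_a:
        for b in group_b:
            if b > a:
                wins += 1
            elif b == a:
                ties += 1
    p = (wins + 0.5 * ties) / (len(group_a) * len(group_b))
    return math.inf if p == 1.0 else p / (1.0 - p)


# Made-up ordinal scores: both groups have a median of 0,
# yet the second group tends to score higher.
odds = wmw_odds([0, 0, 0, 1, 1], [0, 0, 0, 3, 4])
```

In this made-up example the medians are identical, but the odds are about 1.38, illustrating how medians and the rank sum test can disagree.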
https://journals.lww.com/anesthesia-analgesia/Fulltext/2013/05000/Estimating_Surgical_Case_Durations_and_Making.23.aspx
<![CDATA[Estimating Surgical Case Durations and Making Comparisons Among Facilities: Identifying Facilities with Lower Anesthesia Professional Fees]]>Consumer-driven health care relies on transparency in cost estimates for surgery, including anesthesia professional fees. Using systematic narrative review, we show that providing anesthesia costs requires that each facility (anesthesia group) estimate statistics, reasonably the mean and the 90% upper prediction limit of case durations by procedure. The prediction limits need to be calculated, for many procedures, using Bayesian methods based on the log-normal distribution. Insurers and/or governments lack scheduled durations and procedures and cannot practically infer these estimates because of the large heterogeneities among facilities in the means and coefficients of variation of durations. Consequently, the insurance industry cannot provide the cost information accurately from public and private databases. Instead, the role of insurers and/or governments can be to identify facilities with significantly briefer durations (costs to the patient) than average. Such comparisons of durations among facilities should be performed with correction for the effects of the multiple comparisons. Our review also has direct implications to the potentially more important issue of how to study the association between anesthetic durations and patient morbidity and mortality. When pooling duration data among facilities, both the large heterogeneity in the means and coefficients of variation of durations among facilities need to be considered (e.g., using “multilevel” or “hierarchical” models).]]>Wed, 11 Sep 2019 14:09:06 GMT-05:0000000539-201305000-00023
https://journals.lww.com/anesthesia-analgesia/Fulltext/2012/06000/Joint_Hypothesis_Testing_and_Gatekeeping.25.aspx
<![CDATA[Joint Hypothesis Testing and Gatekeeping Procedures for Studies with Multiple Endpoints]]>A claim of superiority of one intervention over another often depends naturally on results from several outcomes of interest. For such studies the common practice of making conclusions about individual outcomes in isolation can be problematic. For example, an intervention might be shown to improve one outcome (e.g., pain score) but worsen another (e.g., opioid consumption), making interpretation difficult. We thus advocate joint hypothesis testing, in which the decision rule used to claim success of an intervention over its comparator with regard to the multiple outcomes is specified a priori, and the overall type I error is protected. Success might be claimed only if there is a significant improvement detected in all primary outcomes, or alternatively, in at least one of them. We focus more specifically on demonstrating superiority on at least one outcome and noninferiority (i.e., not worse) on the rest. We also advocate the more general “gatekeeping” procedures (both serial and parallel), in which primary and secondary hypotheses of interest are a priori organized into ordered sets, and testing does not proceed to the next set, i.e., through the “gate,” unless the significance criteria for the previous sets are satisfied, thus protecting the overall type I error. We demonstrate methods using data from a randomized controlled trial assessing the effects of transdermal nicotine on pain and opioids after pelvic gynecological surgery. Joint hypothesis testing and gatekeeping procedures are shown to substantially improve the efficiency and interpretation of randomized and nonrandomized studies having multiple outcomes of interest.]]>Wed, 11 Sep 2019 14:09:32 GMT-05:0000000539-201206000-00025
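The simplest special case of serial gatekeeping has one hypothesis per gate, which gives the fixed-sequence procedure: hypotheses are tested in their prespecified order at the full alpha, and testing stops at the first nonsignificant result. The sketch below shows only that special case; the function name, outcome labels, and P values are hypothetical, and the article's procedures for sets containing several hypotheses are more general.

```python
def fixed_sequence_test(ordered_hypotheses, alpha=0.05):
    """Test (name, p_value) pairs in order; stop at the first failure.

    Returns a list of (name, rejected) decisions. Hypotheses after a
    closed gate are not rejected, whatever their P values.
    """
    decisions = []
    gate_open = True
    for name, p_value in ordered_hypotheses:
        rejected = gate_open and p_value < alpha
        decisions.append((name, rejected))
        gate_open = rejected
    return decisions


# Hypothetical ordering and P values.
results = fixed_sequence_test(
    [("pain score", 0.01), ("opioid consumption", 0.20), ("nausea", 0.001)]
)
```

The third hypothesis is not rejected despite its small P value because the gate closed at the second; this is what protects the overall type I error without splitting alpha.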
https://journals.lww.com/anesthesia-analgesia/Fulltext/2011/11000/Importance_of_Appropriately_Modeling_Procedure_and.38.aspx
<![CDATA[Importance of Appropriately Modeling Procedure and Duration in Logistic Regression Studies of Perioperative Morbidity and Mortality]]>Multiple logistic regression studies frequently are performed with duration (e.g., operative time) included as an independent variable. We use narrative review of the statistical literature to highlight that when the association between duration and outcome is presumptively significant, the procedure itself (e.g., video-assisted thoracoscopic lobectomy or thoracotomy lobectomy) needs to be tested for inclusion in the logistic regression. If the procedure is a true covariate but excluded in lieu of category of procedure (e.g., lung resection), estimates of the odds ratios for other independent variables are biased. In addition, actual durations are sometimes used as the independent variable, rather than scheduled (forecasted) durations. Only the scheduled duration is known when a patient would be randomized in a trial of preoperative or intraoperative intervention and/or meets with the surgeon and anesthesiologist preoperatively. By reviewing the literature about logistic regression and about predicting case duration, we show that the use of actual instead of scheduled duration can result in biased logistic regression results.]]>Wed, 11 Sep 2019 14:11:31 GMT-05:0000000539-201111000-00038
https://journals.lww.com/anesthesia-analgesia/Fulltext/2011/06000/Design_and_Analysis_of_Studies_with_Binary__Event.35.aspx
<![CDATA[Design and Analysis of Studies with Binary-Event Composite Endpoints: Guidelines for Anesthesia Research]]>Composite endpoints consisting of several binary events, such as distinct perioperative complications, are frequently chosen as the primary outcome in anesthesia studies (and in many other clinical specialties) because (1) no single outcome fully characterizes the disease or outcome of interest, and/or (2) individual outcomes are rare and statistical power would be inadequate for any single one. Interpreting a composite endpoint is challenging because components rarely meet the ideal criteria of having comparable clinical importance, frequency, and treatment effects. We suggest guidelines for forming composite endpoints and show advantages of newer versus conventional statistical methods for analyzing them. Components should be a parsimonious set of outcomes, which when taken together, well represent the disease of interest and are very plausibly related to the intervention. Adding components that are too narrow, redundant, or minimally influenced by the study intervention compromises interpretation of results and reduces power. We show that multivariate (i.e., multiple outcomes per patient) methods of analyzing a binary-event composite provide distinct advantages over standard methods such as any-versus-none, count of events, or evaluation of individual events. Multivariate methods can incorporate clinical importance weights, compensate for events occurring at varying frequencies, assess treatment effect heterogeneity, and are often more powerful than alternative statistical approaches. Methods are illustrated with an American College of Surgeons National Surgical Quality Improvement Program registry study that evaluated the effects of smoking on major perioperative outcomes, and with a clinical trial comparing the effects of crystalloids and colloids on major complications.
Sample data files and SAS code are included for convenience.]]>Wed, 11 Sep 2019 14:10:36 GMT-05:0000000539-201106000-00035
https://journals.lww.com/anesthesia-analgesia/Fulltext/2011/03000/Equivalence_and_Noninferiority_Testing_in.34.aspx
<![CDATA[Equivalence and Noninferiority Testing in Regression Models and Repeated-Measures Designs]]>Equivalence and noninferiority designs are useful when the superiority of one intervention over another is neither expected nor required. Equivalence trials test whether a difference between groups falls within a prespecified equivalence region, whereas noninferiority trials test whether a preferred intervention is either better or at least not worse than the comparator, with worse being defined a priori. Special designs and analyses are needed because neither of these conclusions can be reached from a nonsignificant test for superiority. Using the data from a companion article, we demonstrate analyses of basic equivalence and noninferiority designs, along with more complex model-based methods. We first give an overview of methods for design and analysis of data from superiority, equivalence, and noninferiority trials, including how to analyze each type of design using linear regression models. We then show how the analogous hypotheses can be tested in a repeated-measures setting in which there are multiple outcomes per subject. We especially address interactions between the repeated factor, usually time, and treatment. Although we focus on the analysis of continuous outcomes, extensions to other data types as well as sample size consideration are discussed.]]>Wed, 11 Sep 2019 14:12:11 GMT-05:0000000539-201103000-00034
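The decision rules for equivalence and noninferiority can be sketched with confidence intervals for a treatment difference, such as a regression coefficient with its standard error. The sketch assumes a normal approximation (a t distribution would usually replace it in small samples), assumes that larger values of the difference favor the new treatment, and uses hypothetical function names and margins.

```python
from statistics import NormalDist


def confidence_interval(estimate, standard_error, level):
    """Two-sided normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * standard_error, estimate + z * standard_error


def is_equivalent(estimate, standard_error, margin, alpha=0.05):
    """Two one-sided tests at alpha each: the (1 - 2*alpha) interval
    must lie entirely inside (-margin, +margin)."""
    low, high = confidence_interval(estimate, standard_error, 1 - 2 * alpha)
    return -margin < low and high < margin


def is_noninferior(estimate, standard_error, margin, alpha=0.025):
    """One-sided test at alpha: the lower confidence limit must
    exceed -margin (the prespecified definition of 'worse')."""
    low, _ = confidence_interval(estimate, standard_error, 1 - 2 * alpha)
    return low > -margin
```

A nonsignificant superiority test does not enter either rule; the conclusion depends only on where the interval sits relative to the prespecified margin.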
https://journals.lww.com/anesthesia-analgesia/Fulltext/2011/10000/Analysis_of_Variance_of_Communication_Latencies_in.34.aspx
<![CDATA[Analysis of Variance of Communication Latencies in Anesthesia: Comparing Means of Multiple Log-Normal Distributions]]>Anesthesiologists rely on communication over periods of minutes. The analysis of latencies between when messages are sent and responses obtained is an essential component of practical and regulatory assessment of clinical and managerial decision-support systems. Latency data including times for anesthesia providers to respond to messages have moderate (n > 20) sample sizes, large coefficients of variation (e.g., 0.60 to 2.50), and heterogeneous coefficients of variation among groups. Highly inaccurate results are obtained both by performing analysis of variance (ANOVA) in the time scale and by performing it in the log scale and then taking the exponential of the result. To overcome these difficulties, one can perform calculation of P values and confidence intervals for mean latencies based on log-normal distributions using generalized pivotal methods. In addition, fixed-effects 2-way ANOVAs can be extended to the comparison of means of log-normal distributions. Pivotal inference does not assume that the coefficients of variation of the studied log-normal distributions are the same, and can be used to assess the proportional effects of 2 factors and their interaction. Latency data can also include a human behavioral component (e.g., complete other activity first), resulting in a bimodal distribution in the log-domain (i.e., a mixture of distributions). An ANOVA can be performed on a homogeneous segment of the data, followed by a single group analysis applied to all or portions of the data using a robust method, insensitive to the probability distribution.]]>Wed, 11 Sep 2019 14:13:00 GMT-05:0000000539-201110000-00034
https://journals.lww.com/anesthesia-analgesia/Fulltext/2011/04000/Analysis_of_Interventions_Influencing_or_Reducing.30.aspx
<![CDATA[Analysis of Interventions Influencing or Reducing Patient Waiting While Stratifying by Surgical Procedure]]>Facilitation of the coordination of presurgical care is desirable not only from the patients' perspective, but also for increasing operating room productivity of surgeons and anesthesiologists. Times from each patient's first referral to a surgeon until surgery will be available on a vast scale from regional health information exchanges. Treatments (interventions) can include, for example, case management and use of health system networks with common electronic medical records. We developed a method to compare waiting times between treatment (intervention) groups, while stratifying by procedure, despite (1) highly skewed but non-lognormally distributed data, (2) highly heterogeneous sample sizes among groups and procedures, and (3) many combinations of groups and procedures with small sample sizes, resulting in estimated means and variances having substantial uncertainty. Corresponding results obtained by analyzing data from a health system were as follows. (1) The method uses a random-effects model to accommodate procedure heterogeneity and is otherwise distribution free. Log transformation reduced the skewness in waiting time data, making the distribution-free first-order Taylor series approximation analysis of proportional changes between treatments (interventions) reasonable. However, when lognormal distributions were assumed instead of using the random-effects distribution-free analysis, the estimate of treatment effect was biased. (2) Repeating the analysis without stratification by procedure also resulted in biased estimates. (3) There is a nearly unlimited number of different procedures, most of them rare, so that considering procedure as a random effect was appropriate. Over the ranges of estimated parameter values, prior Monte Carlo simulation studies showed that meta-analysis using the simple method of moments was appropriate.
However, because many treatment/procedure combinations have small sample sizes, confidence intervals for the treatment effect were too narrow unless the degrees of freedom were corrected. Nevertheless, the resulting statistical methodology is straightforward to apply because the data needed are just the summary statistics and the method involves just a series of equations to be followed in a stepwise manner (e.g., in a spreadsheet program such as Microsoft Office Excel).]]>Wed, 11 Sep 2019 14:13:38 GMT-05:0000000539-201104000-00030
https://journals.lww.com/anesthesia-analgesia/Fulltext/2011/10000/An_Introduction_to_Multilevel_Modeling_for.33.aspx
<![CDATA[An Introduction to Multilevel Modeling for Anesthesiologists]]>In population-based research, subjects are frequently in clusters with shared features or demographic characteristics, such as age range, neighborhood, physician, and common comorbidities. Classification into clusters also applies at broader levels. Physicians are classified by physician group or by practice site; hospitals can be characterized by size, location, or demographics. Hierarchical, nested structures pose unique challenges in the conduct of research. Data from nested structures may be interdependent because of similarities among subjects in a cluster, while nesting at multiple levels makes it difficult to know whether findings should be applied to the individual or to the larger group. Statistical tools, known variously as hierarchical linear modeling, multilevel modeling, mixed linear modeling, and other terms, have been developed in the education and social science fields to deal effectively with these issues. Our goal in this article is to review the implications of hierarchical, nested data organization and to provide a step-by-step tutorial of how multilevel modeling could be applied to a problem in anesthesia research using current, commercially available software.]]>Wed, 11 Sep 2019 14:14:16 GMT-05:0000000539-201110000-00033