Is Reduction of Pocket Probing Depth Correlated with the Baseline Value or is it "Mathematical Coupling"?
1 Biostatistics Unit, Academic Unit of Epidemiology and Health Services Research, Medical School, University of Leeds, 24 Hyde Terrace, Leeds, LS2 9LN, UK. *Corresponding author: m.s.gilthorpe{at}leeds.ac.uk
Previous studies using correlation or regression analysis have shown that treatment effects, measured by the change in clinical parameters, are often associated with the baseline values of the same parameters. These studies, however, have a methodological weakness. Correlation/regression between baseline measures and the derived change variable invalidates the usual test of the null hypothesis that the correlation/regression coefficient is zero. This is due to the phenomenon of mathematical coupling. To investigate its impact on the observed correlation/regression coefficient when the true coefficient is zero, we used random simulations of hypothetical data to model the treatment of periodontal pockets. The results showed a high probability of obtaining statistically significant correlation/regression coefficients. To separate this artificial effect of mathematical coupling from the true underlying biological relationship, appropriate analytical strategies must be applied to re-evaluate previous evidence within the periodontal literature.
Key Words: correlation coefficient, regression coefficient, regression to the mean, periodontal research
In the periodontal literature, many studies have used correlation or regression to demonstrate that the treatment effect, measured by the change in a clinical parameter such as pocket probing depth, clinical attachment level, or gingival recession, is strongly associated with the baseline value of the same parameter. Many studies have reported that the baseline value is a good indicator of the change following treatment (Newcomb and Nixon, 1990; Machtei et al., 1993, 1994; Selvig et al., 1993; Mora et al., 1996; Pini Prato et al., 1996; Falk et al., 1997; Mombelli et al., 1997). These studies, however, share a common methodological weakness: they examine the correlation or regression slope between a baseline variable measured prior to treatment and a derived variable based on the change between the respective pre-treatment and post-treatment values. In his classic book, Andersen (1990) warned against this phenomenon, known as mathematical coupling (MC), since it distorts the relationship between variables and violates basic assumptions underpinning correlation and regression. Classic statistical procedures for testing the null hypothesis, i.e., that the correlation coefficient or the regression slope is zero, become invalid. Andersen devoted a whole chapter of his book to illustrating this statistical consequence with various examples, and several authors have called for an end to this misuse of correlation/regression analysis (Archie, 1981; Altman, 1982, 1991). For this study, we simulated and assessed clinical research data on the treatment of periodontal pockets to determine the extent to which MC alone can produce an "observed" relationship between baseline pocket depth and the change in pocketing following surgery when there is no "true" underlying effect.
We simulated hypothetical studies to evaluate the efficacy of a new periodontal therapy. The chosen outcome was the change of pocket probing depth (PPD), as derived from pre- and post-treatment pocket depth measurements. The relationship linking baseline PPD (X), post-treatment PPD (Y), and the corresponding change variable (Z), is:
Simulations were undertaken to test whether baseline PPD appeared to be a good predictor of the treatment outcome, i.e., the change in PPD, when there was no such relationship (S = 0). In practice, the "observed" pre-treatment (x) and post-treatment (y) measurements contain some measurement error (ex and ey), such that x = X + ex and y = Y + ey, where both error distributions have zero mean and equal variance (the square of the standard deviation). For the purposes of this study, it was further assumed that these errors are uncorrelated with each other and independent of the unobserved values (X and Y). In a typical treatment study, sites are selected according to treatment need, not randomly. Thus, for a site to be included in a hypothetical study, its pocket depth had to reach a pre-defined minimum, here at least 4 mm. To model this, we simulated a population of 1000 error-free X values, generated as normally distributed random numbers with a mean baseline PPD of 9 mm, based on typical values observed in the periodontal literature (Laurell et al., 1998). Observed pre-treatment PPD measurements were then derived (x = X + ex). The population standard deviation (PSD) was taken to be 2 mm or 3 mm, yielding reference ranges (i.e., the range within which 95% of the observations lie) of 5 to 13 mm and 3 to 15 mm initial PPD, respectively. A key factor that determines the effect of MC is the ratio (R) of the population SD to the measurement error SD (ESD) (Hayes, 1988). Ratios of 1, 2, and 3 were considered, giving error SDs ranging from 0.7 mm (PSD = 2 mm, R = 3) to 3.0 mm (PSD = 3 mm, R = 1). The simulated population was sampled without replacement to yield a hypothetical study of N sites. Whenever a sampled pre-treatment PPD was less than 4 mm, the site was excluded and sampling continued until the required study size was attained.
In this manner, no hypothetical study contained sites with observed pre-treatment PPD values of less than 4 mm. We considered study sizes of N = 10, 30, 50, and 500 to assess the effect of study size on statistical power (i.e., the probability of finding that a coefficient differs significantly from zero). We calculated post-treatment error-free PPD values (Y) from the pre-treatment error-free values (X) according to Eq. 2, thereby assuming no change between measurement occasions apart from that induced by the new therapy. The observed post-treatment PPD measurements were also derived (y = Y + ey); any negative observation was set to zero. The observed changes in PPD (z) were calculated by applying the relationship in Eq. 1 to the observed values (x, y) instead of the unobserved values (X, Y). It can be shown that the correlation/regression coefficient is independent of the baseline mean PPD and of the overall treatment effect (A) (Moreno et al., 1986), provided that simulated values are not truncated; here, however, negative post-treatment PPD values were truncated to zero. To investigate the effect of truncation, we explored two options for the overall reduction in PPD following treatment: no overall mean reduction (A = 0) and a 4 mm overall mean reduction (A = 4). Each hypothetical population and its associated study sample were simulated 10,000 times for every scenario considered, using the modeling and simulation software MLwiN (Rasbash et al., 2000). For each simulation, the Pearson (parametric) correlation and the Spearman (non-parametric) rank correlation were assessed by means of the two-tailed t test (Kirkwood, 1992), with significance at the 5% level. Because the t statistics for the Pearson correlation and the regression slope are identical under their respective null hypotheses, results for the slope are not presented.
The empirical median values for all statistics, along with 95% confidence intervals, were derived for each set of simulations.
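The simulation design described above can be sketched in plain Python. This is not the authors' MLwiN code, and several details are assumptions: Eq. 2 is not reproduced in this section, so the sketch assumes that with S = 0 the true post-treatment depth is simply Y = X − A (a constant true change); it tests only the Pearson correlation (the Spearman rank correlation is omitted); it uses 200 rather than 10,000 replicates by default; and the two-tailed 5% critical t values are taken from standard tables rather than computed. All function and parameter names are hypothetical.

```python
import math
import random

# Two-tailed 5% critical values of Student's t for df = N - 2 with
# N = 10, 30, 50, 500 (standard t-table values).
T_CRIT = {8: 2.306, 28: 2.048, 48: 2.011, 498: 1.965}


def pearson_r(a, b):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


def simulate_study(rng, n_sites, psd, ratio, a=0.0, mean=9.0,
                   pop_size=1000, min_depth=4.0):
    """Simulate one hypothetical study; return observed baseline x and change z."""
    esd = psd / ratio                      # measurement-error SD, from R = PSD / ESD
    true_x = [rng.gauss(mean, psd) for _ in range(pop_size)]
    obs_x = [t + rng.gauss(0.0, esd) for t in true_x]
    # Sample without replacement; skip sites whose observed baseline is < 4 mm.
    order = rng.sample(range(pop_size), pop_size)
    chosen = [i for i in order if obs_x[i] >= min_depth][:n_sites]
    if len(chosen) < n_sites:
        raise ValueError("too few eligible sites in the simulated population")
    xs, zs = [], []
    for i in chosen:
        true_y = true_x[i] - a             # ASSUMED form of Eq. 2 when S = 0
        obs_y = max(0.0, true_y + rng.gauss(0.0, esd))  # negative depths set to 0
        xs.append(obs_x[i])
        zs.append(obs_x[i] - obs_y)        # observed change z = x - y
    return xs, zs


def proportion_significant(n_sites, psd, ratio, a=0.0, reps=200, seed=1):
    """Share of replicate studies whose Pearson r(x, z) tests significant at 5%."""
    rng = random.Random(seed)
    crit = T_CRIT[n_sites - 2]
    hits = 0
    for _ in range(reps):
        xs, zs = simulate_study(rng, n_sites, psd, ratio, a)
        r = pearson_r(xs, zs)
        t = r * math.sqrt((n_sites - 2) / (1.0 - r * r))  # t test of H0: rho = 0
        hits += abs(t) > crit
    return hits / reps


for n in (10, 30, 50, 500):
    print(n, proportion_significant(n, psd=2.0, ratio=1.0))
```

Each call estimates the share of studies of a given size in which a spurious baseline-change correlation tests as significant. Varying `ratio`, `n_sites`, and `a` is one way to explore the scenarios listed above; the exact proportions depend on the seed and replicate count.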
The results of the computer simulations for no overall mean treatment effect are summarized in Table 1.
As the ratio of population to measurement error SD, R = PSD/ESD, increased, the proportion of coefficients that spuriously tested as significant fell. The larger the study size, the more likely a significant coefficient was to be observed. An overall treatment effect of a 4 mm reduction in PPD following therapy increased the number of simulated negative post-treatment PPD values truncated to zero and gave rise to an elevated likelihood of spuriously significant results (Table 2).
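Part of the pattern with R can be seen analytically. Under simplifying assumptions (a constant true change for S = 0, taken here to mean Y = X − A; errors independent of each other and of X; no baseline selection and no truncation), the observed change is z = A + ex − ey, so Cov(x, z) = ESD², Var(x) = PSD² + ESD², and Var(z) = 2·ESD². The expected spurious correlation then depends only on R. The sketch below (function name hypothetical) ignores the truncation included in the simulations, so it approximates rather than reproduces the tabulated results.

```python
import math


def expected_rho(ratio):
    """Expected correlation between observed baseline x and observed change z.

    Assumes a constant true change (S = 0), independent measurement errors
    with equal SD, no baseline selection and no truncation, where
    ratio = R = PSD / ESD.  Then rho = 1 / (sqrt(2) * sqrt(R**2 + 1)).
    """
    return 1.0 / (math.sqrt(2.0) * math.sqrt(ratio ** 2 + 1.0))


for r in (1, 2, 3):
    print(f"R = {r}: expected r(x, z) = {expected_rho(r):.3f}")
# R = 1: expected r(x, z) = 0.500
# R = 2: expected r(x, z) = 0.316
# R = 3: expected r(x, z) = 0.224
```

The spurious correlation shrinks as R grows but never reaches zero, which is consistent with the falling, but non-zero, proportion of significant coefficients.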
Archie (1981) described four types of mathematical coupling, and statistical results based on any of them should be interpreted with care. The common problem in each type is that one variable contains, directly or indirectly, all or part of another variable. In this hypothetical study, as in the previous periodontal literature, the correlation/regression between the change in a clinical parameter and its baseline value is an instance of Type III mathematical coupling. This type arises through addition, subtraction, multiplication, or division of one variable by another, and it is the most common type in clinical research (Archie, 1981). The MC phenomenon casts doubt upon the validity of conclusions reported in many studies, because any inference about the relationship between change and baseline values drawn from this type of analysis could be highly erroneous, yielding misleading results and, consequently, incorrect conclusions. Within medicine, MC has been raised in studies on the use of calcium channel blockers to reduce blood pressure in patients with hypertension (Gill et al., 1985), surgical treatment of obesity (Halverson and Koehler, 1981), protein clearance in patients undergoing dialysis (Lowrie, 1996), and oxygen consumption in relation to oxygen delivery (Yu et al., 1996; Granton et al., 1998). By contrast, we are not aware that this issue has been discussed in the periodontal literature, even though assessing an outcome in relation to initial disease severity is common in periodontal research.
There has been some discussion in the periodontal literature of regression to the mean (RTM) (Blomqvist, 1987; Egelberg, 1989; Gunsolley et al., 2001), whereby, owing to measurement error or within-subject variation, initially high values tend subsequently to be recorded as lower, and initially low values as higher (Yudkin and Stratton, 1996). RTM is a special case of MC, in which coupling occurs through the "observed" variables as a consequence of measurement error. However, MC can occur without RTM. The concepts of MC without RTM and of MC that is entirely RTM are illustrated (Figs. 1 and 2).
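A short calculation shows how coupling alone, with no measurement error and hence no RTM, produces non-zero coefficients. Assume, purely for illustration, that X and Y are uncorrelated and have equal variance σ²; then for the change X − Y:

```latex
\begin{aligned}
\operatorname{Cov}(X,\, X - Y) &= \operatorname{Var}(X) - \operatorname{Cov}(X, Y) = \sigma^2,
&\quad \operatorname{Var}(X - Y) &= \operatorname{Var}(X) + \operatorname{Var}(Y) - 2\operatorname{Cov}(X, Y) = 2\sigma^2,\\
\rho_{X,\,X-Y} &= \frac{\sigma^2}{\sigma \cdot \sqrt{2}\,\sigma} = \frac{1}{\sqrt{2}} \approx 0.707,
&\quad \beta_{X-Y \mid X} &= \frac{\operatorname{Cov}(X,\, X - Y)}{\operatorname{Var}(X)} = 1 .
\end{aligned}
```

Both the correlation and the regression slope differ from zero even though X and Y are unrelated; the association is created entirely by X appearing on both sides.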
A recent meta-analysis showed that most periodontal studies in the literature have used 15 or more patients in either the test or the control group (Laurell et al., 1998). Within this study, the simulated sample size was increased from 10, through 30 and 50, to 500. When there was no underlying relationship between baseline and change, MC was more likely to lead to a significant finding in larger studies. The likelihood of an incorrect significant result due to MC approached 100% asymptotically as study size increased. This is because the standard error of a correlation/regression coefficient decreases with increasing study size, so the likelihood of obtaining significance for a non-zero coefficient, even a spurious one, increases. However, it should not be inferred that small studies are preferable simply because they are less likely to yield false-positive results; all statistical tests have less power in smaller studies, so the likelihood of detecting a genuine underlying biological relationship between change and baseline would also be reduced. For all study sizes, the simulations clearly demonstrated a non-negligible probability of obtaining spurious significance in correlation/regression due to MC when in fact there was no underlying relationship between baseline outcome and change following treatment. Including only pre-treatment sites with at least 4 mm pocketing, and truncating simulated negative post-treatment PPD values to zero, replicate real-life patient recruitment and the physical limits on treatment outcome following surgery. However, these restrictions had minimal impact on the role of MC. There was also minimal difference between the two correlation methods adopted, which was perhaps to be expected given the underlying assumption of a linear relationship between pre- and post-treatment PPD values.
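The dependence on study size can also be approximated without simulation. The sketch below swaps in Fisher's z transformation, an analytic approximation that the authors did not use, to estimate how often a two-tailed 5% test of zero correlation rejects when the true, purely coupling-induced correlation is `rho` and `n` sites are studied. It ignores baseline selection and truncation; the function name and the illustrative value rho = 0.2 are assumptions.

```python
import math
from statistics import NormalDist


def approx_rejection_prob(rho, n, alpha=0.05):
    """Approximate P(two-tailed test of H0: rho = 0 rejects) for n sites.

    Fisher's z transform: atanh(r) is roughly Normal(atanh(rho), 1 / (n - 3)).
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1.0 - alpha / 2.0)
    shift = math.atanh(rho) * math.sqrt(n - 3)
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


# Illustrative small spurious correlation; study sizes as in the simulations.
for n in (10, 30, 50, 500):
    print(n, round(approx_rejection_prob(0.2, n), 3))
```

For a fixed non-zero correlation the estimated rejection probability rises toward 1 as n grows, mirroring the asymptotic behaviour described above; with rho = 0 it returns the nominal 5%.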
Based on our study and other medical literature, it is evident that the effect of MC cannot be neglected in periodontal research. One solution, proposed and recommended by many authors (Oldham, 1962; Altman, 1982, 1991), is to correlate the average of the pre- and post-treatment values with the change variable. A more sophisticated regression approach is multilevel modeling (Gilthorpe et al., 2000). By constructing a random coefficient model (Gilthorpe et al., 2001), with the outcome modeled over time and the pre- and post-treatment measures nested within subjects, the required correlation is that between the random intercept and the random slope (Bryk and Raudenbush, 1992). Without such strategies, the interpretation and conclusions of previous studies that fail to address the MC phenomenon are questionable. Moreover, the biological and functional mechanisms put forward to explain the strong connection between baseline values and treatment effects are suspect. We therefore strongly recommend that previous periodontal studies whose statistical evidence is tainted by MC be critically reviewed and re-analyzed. While the proposed biological mechanisms and clinical associations between different parameters and measurements might be genuine, the initial conclusions need to be re-examined in view of this problem. Until the artificial effect of MC has been separated from the "true" biological relationship, evidence of a relationship between baseline value and change following surgery remains questionable and potentially misleading. Careful formulation of the study question and knowledge of the measurements (in particular, estimates of measurement error) are crucial for adopting appropriate strategies that avoid such problems in future periodontal research.
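Oldham's remedy can be checked numerically. Because Cov(x + y, x − y) = Var(x) − Var(y), the average of pre- and post-treatment values is uncorrelated with the change whenever the two occasions have equal variance, which holds when the true change is constant. The sketch below compares the naive correlation r(x, z) with Oldham's r((x + y)/2, z); its names are hypothetical, and it assumes Y = X − A with S = 0, a 9 mm mean, and no baseline selection or truncation. The multilevel approach is not attempted here.

```python
import math
import random


def pearson_r(a, b):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


def naive_vs_oldham(n=100_000, psd=2.0, ratio=1.0, a=0.0, seed=7):
    """Return (r(x, z), r((x + y) / 2, z)) for data with no true baseline effect."""
    rng = random.Random(seed)
    esd = psd / ratio
    xs, ys = [], []
    for _ in range(n):
        true_x = rng.gauss(9.0, psd)
        true_y = true_x - a                      # ASSUMED: constant true change (S = 0)
        xs.append(true_x + rng.gauss(0.0, esd))  # observed pre-treatment x
        ys.append(true_y + rng.gauss(0.0, esd))  # observed post-treatment y
    zs = [x - y for x, y in zip(xs, ys)]         # observed change z
    means = [(x + y) / 2.0 for x, y in zip(xs, ys)]
    return pearson_r(xs, zs), pearson_r(means, zs)


naive, oldham = naive_vs_oldham()
print(f"naive r(x, z) = {naive:.3f}; Oldham r(mean, z) = {oldham:.3f}")
```

With R = 1 the naive coefficient should sit near 0.5 while Oldham's should sit near zero; the exact printed values depend on the random seed.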
The first author was self-funded, while the remaining two authors were funded by the UK government's Higher Education Funding Council for England (HEFCE).
A supplemental appendix to this article is published electronically only at http://www.dentalresearch.org. Received for publication June 27, 2001. Revision received July 8, 2002. Accepted for publication July 24, 2002.
Journal of Dental Research, Vol. 81, No. 10, 722-726 (2002)
0.707 and 1.0, respectively, i.e., not zero (Fig. 1b).






