Statistical Issues for Combining Multiple Caries Diagnostics for Demonstrating Caries Efficacy
Division of Biostatistics, Indiana University School of Medicine, 1050 Wishard Blvd, RG 4101, Indianapolis, IN 46260, USA. Corresponding author: bkatz{at}iupui.edu
ABSTRACT
Caries efficacy in clinical trials has been based primarily on visual examinations supplemented by Fiber Optic Transillumination (FOTI) and radiography, with the assessments combined at the surface level to classify each surface as to its caries status. Newer caries diagnostic techniques measure the caries process in a quantitative manner and thus yield continuous rather than ordinal results. The objective of this study was to examine various methods for the analysis of multiple outcomes in clinical trials and to compare their usefulness for the analysis of caries trials. Four global tests (rank sum, ordinary least squares, general least squares, and generalized estimating equations) and two caries indices (based on average and maximum values of the methods) were evaluated with the use of one-year follow-up data from 1063 children in a recent caries trial. A new hybrid method was also developed and evaluated. All of the methods performed well when the diagnostic measures showed product differences in caries in the same direction. Ease of use, interpretability, and distributional assumptions must be considered before a consensus method for analysis of multiple diagnostic measures in caries trials can be determined.
Key Words: statistical analysis, caries diagnostics, clinical trials
INTRODUCTION
Caries efficacy in clinical trials has been based primarily on visual examinations. The most common outcome measure for caries trials is the number of surfaces that are sound or unerupted at baseline but exhibit caries at follow-up. This caries increment has traditionally been analyzed by analysis of covariance (ANCOVA) (Grainger et al., 1984). Covariates commonly included in the analysis are age, gender, and caries history. These variables are sometimes used as stratification factors in the design of the trial (Kingman, 1984).
Alternative statistical analysis methods for caries trials have also been proposed. These include efforts to include the ordinal nature of the clinical exam (Fleiss, 1984) and the analysis of the caries increment as a count. The latter analyses use modeling procedures with a Poisson error structure (Hujoel et al., 1994; Bohning et al., 1999). Most recently, Caplan et al. (1999) proposed the incidence density method, an epidemiological approach which is based on the number of surfaces at risk over time.
Some caries trials have incorporated radiographic assessment in addition to visual exam into the calculation of the caries increment. Decisions are made at the surface level, where a surface is considered carious if either measure shows evidence of a lesion. Although not generally thought of as a score based on multiple outcomes, this combined increment is a caries index that uses the maximum value of the available methods at each surface. The additional information provided by x-rays or other imaging methods presented few statistical challenges. One of the few formal analyses of multiple outcomes in caries trials was the use of multivariate analysis of covariance to analyze data from four caries trials (Geary et al., 1992). However, measures of dental health other than caries (including plaque, gingivitis, and calculus) were of interest in this analysis.
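The combined visual and radiographic increment described above can be read as a maximum over methods at each surface. The following is a minimal sketch of that reading, not the scoring procedure of any particular trial. It assumes each method gives a 0/1 score per surface (1 = lesion), treats a baseline score of 0 as "sound or unerupted", and uses hypothetical names and toy values throughout.

```python
def combined_increment(baseline, followup_by_method):
    """Count at-risk surfaces that any method scores as carious at follow-up.

    baseline: list of 0/1 per surface (1 = carious at baseline).
    followup_by_method: dict mapping a method name to a list of 0/1 per surface.
    """
    increment = 0
    for s in range(len(baseline)):
        if baseline[s] == 1:
            continue  # only surfaces sound at baseline can add to the increment
        # Maximum over methods: the surface is carious if any method shows a lesion.
        increment += max(scores[s] for scores in followup_by_method.values())
    return increment


# Hypothetical scores for five surfaces of one child (illustrative, not trial data).
baseline = [0, 0, 1, 0, 0]
followup = {
    "visual":       [0, 1, 1, 0, 0],
    "radiographic": [0, 0, 1, 1, 0],
}
increment = combined_increment(baseline, followup)
visual_only = combined_increment(baseline, {"visual": followup["visual"]})
```

In this toy case the visual exam alone finds one new lesion, while the combined rule finds two, because the radiograph flags a surface the visual exam missed.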
The caries process is not a step function where surfaces or teeth transition instantly from sound to cavitation. Nor can visual exams or radiographic methods with several categories instead of two adequately describe the process. Caries is a more gradual disease process, with demineralization and remineralization occurring over time. Caries lesions occur when demineralization is the dominant process. As a better reflection of the biology, the new caries diagnostic tools attempt to measure the caries process on a continuum in a quantitative manner. Thus, they yield continuous rather than ordinal or dichotomous results for each surface. In addition, these methods lead to unbalanced data, since they are usually available for only a subset of the surfaces of interest. One possibility for incorporating these data into the traditional analysis of covariance method for caries trials would be to dichotomize the continuous measures and then classify the surface as caries if any of the measures (including visual exam) was above the threshold. However, establishing useful cut-off values for these new measures is a difficult process and results in a large amount of lost information. The use of the continuous data from these new methods holds the most promise for increasing the efficiency of future caries trials. There has been a variety of proposed methods for analyzing clinical trials with multiple outcomes, although not specifically for caries trials. Pocock (1997) looked at many of the proposed methods and discussed some of the associated practical and statistical issues. 
Although there are several possible classifications, we have chosen to characterize the methods broadly into four main themes: (1) define each of the diagnostic measures as a primary or secondary outcome; (2) perform tests for each measure but formally control the type I error rate; (3) combine the data from the multiple outcomes into a single global test; and (4) construct a combined endpoint or index based on all of the methods. The first approach depends on a pre-specified decision rule to determine the "success" of the trial. If there is only a single primary outcome, then the success of the trial depends solely on that measure. Other pre-specified decision rules based on combinations of primary and secondary outcomes are also possible (Chi, 1998). The second method is to adjust the p-values for the individual tests (e.g., Bonferroni) or to control the type I error rate in some other way (Cook and Farewell, 1996; Zhang et al., 1997). We will not consider either of these approaches further, since they can easily yield different conclusions for each outcome, leading to uncertain interpretation of the results and/or conflict between the sponsor and the regulatory agencies.
O'Brien (1984) presented three global tests for multiple outcomes and showed that, for situations where group differences would be expected to be in the same direction for all measures, they had superior power to the traditional multivariate analyses. A brief description of the rank-sum, ordinary least-squares (OLS), and general least-squares (GLS) procedures follows. For the rank-sum test, the value for each subject is ranked for each outcome, and then the ranks are summed across outcomes within subject. For large sample sizes, this sum can then be compared among groups by standard parametric analyses (e.g., ANCOVA). The OLS and GLS procedures require that the outcomes be transformed to a common scale. For continuous measures, this is usually accomplished by converting each value to a Z-score.
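The rank-sum step described above can be sketched in a few lines. This is an illustrative sketch only: the function names and toy values are hypothetical, ties are given average ranks (an assumption, since the text does not say how ties are handled), and the later group comparison (e.g., ANCOVA on the summed ranks) is not shown.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def rank_sum_scores(outcomes: Sequence[Sequence[float]]) -> List[float]:
    """outcomes[j][i] is outcome j for subject i; returns one rank sum per subject."""
    totals = [0.0] * len(outcomes[0])
    for column in outcomes:
        # Rank all subjects on this outcome, then add each rank to that subject's total.
        for i, rank in enumerate(average_ranks(column)):
            totals[i] += rank
    return totals


# Hypothetical toy data: two outcomes measured on four subjects (not trial data).
toy_outcomes = [
    [0.2, 0.5, 0.5, 0.9],  # outcome 1, with one tie
    [10, 30, 20, 40],      # outcome 2, on a different scale
]
scores = rank_sum_scores(toy_outcomes)
```

Because each outcome is ranked separately, the two outcomes contribute on the same footing even though their original scales differ, which is why no transformation to a common scale is needed for this test.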
The OLS method is equivalent to a repeated-measures analysis of variance, where the outcomes are the repeated factor. This yields equal weights for all of the outcomes. The GLS procedure is similar, except that the outcomes do not receive equal weights. The weights are computed from the inverse of the sample covariance matrix. The OLS and GLS tests are equivalent when the outcomes are equally correlated.
More recently, several other global tests have been proposed. One such method proposes treating the outcome measures as correlated data and then using generalized estimating equations (GEE) to perform a global analysis (Liang and Zeger, 1986; Lefkopoulou and Ryan, 1993). In addition, an approximate likelihood ratio test (Tang et al., 1989) and a multivariate linear mixed-model approach (Sammel et al., 1999) have also been developed. A simple method suggested by Wittes is to use the maximum value of the multiple outcome measures as the value for each subject and then perform a standard analysis (Follman, 1995).
One thing that all of these methods have in common, with the notable exception of the rank-sum test, is that they require that the outcomes have the same scale. Since this is rarely the case, each outcome is transformed to a common scale. The mix of ordinal and continuous data in caries trials makes this an important issue.
The construction of a caries index could possibly be accomplished with the use of multivariate statistical methods such as factor analysis. However, we have chosen to develop candidate indices by extending current practice and combining data from different methods at the surface level. In the remainder of this paper, we will illustrate several of the global tests and construct some candidate indices using data from a recently completed caries trial.
MATERIALS & METHODS
Trial Design
Global Tests
Four global tests were implemented for the data from this trial: rank sum, OLS, GLS, and GEE, the last with an unstructured working correlation matrix. These were chosen primarily for their ease of implementation. As noted above, the OLS, GLS, and GEE tests require that the outcomes be transformed to the same scale. For DD and ECM, this was done by transforming the observed value at each surface into a z-score (observed value minus baseline mean, all divided by the baseline standard deviation). For the CFX, observations in each of the 7 categories were converted to a standard normal scale by transforming the median percentile within each category according to the normal distribution (i.e., the 50th percentile transforms to 0, and the 97.5th percentile converts to 1.96). For all three measures, the data were then averaged over surfaces to yield the value for each individual.
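The two transformations described above can be illustrated with a short sketch. It rests on several assumptions that the text does not settle: the "median percentile" of a category is taken as the midpoint of that category's cumulative-proportion interval at baseline, the sample (n - 1) standard deviation is used for the z-score, and all function names and data are hypothetical.

```python
from collections import Counter
from statistics import NormalDist, mean, stdev


def z_scores(values, baseline_values):
    """(observed value - baseline mean) / baseline SD, applied surface by surface."""
    mu = mean(baseline_values)
    sd = stdev(baseline_values)  # assumption: sample SD with n - 1 denominator
    return [(v - mu) / sd for v in values]


def category_normal_scores(baseline_categories):
    """Map each ordinal category to the normal quantile of its median percentile."""
    counts = Counter(baseline_categories)
    n = len(baseline_categories)
    scores = {}
    cumulative = 0
    for category in sorted(counts):
        lower = cumulative / n           # proportion of surfaces below this category
        cumulative += counts[category]
        upper = cumulative / n           # proportion at or below this category
        # Assumed reading of "median percentile": midpoint of the interval.
        scores[category] = NormalDist().inv_cdf((lower + upper) / 2)
    return scores


def subject_value(surface_scores):
    """Average the transformed surface scores to get one value per individual."""
    return mean(surface_scores)


# Hypothetical continuous readings (not trial data).
z = z_scores([1.0, 2.0, 3.0], baseline_values=[1.0, 2.0, 3.0])

# Hypothetical ordinal data: 95 surfaces in category 0 and 5 in category 1.
ordinal_scores = category_normal_scores([0] * 95 + [1] * 5)
one_child = subject_value([ordinal_scores[c] for c in [0, 0, 1, 0]])
```

In the ordinal toy example, category 1 covers the 95th to 100th percentiles, so its median percentile is 97.5 and its score is about 1.96, matching the correspondence quoted in the text.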
Caries Indices
Hybrid Method
RESULTS
Univariate Analyses
Global Tests
The transformations for the three outcomes to a standard normal scale were reasonably successful. For example, the means and standard deviations of the average scores across surfaces for the baseline exam are as follows: CFX 0.078 (0.16), DD 0.003 (0.38), and ECM 0.003 (0.45). All of the distributions were generally bell-shaped but slightly skewed, with CFX showing the greatest skewness. It should be noted that the subject values are the means across surfaces, so that the standard deviation among individuals would be expected to be less than 1.0, and CFX would have the smallest standard deviation, since it is based on more surfaces. The analyses were implemented on the differences between baseline and follow-up for each transformed measure, and age and randomization strata were included as covariates. Difference scores were chosen to adjust each technique directly for its baseline value, rather than including all three baseline values as covariates in the analysis. When the original data were used (Table 2
Caries Indices
Hybrid Methods
DISCUSSION
All of the methods presented in this paper perform well when the diagnostic measures all show product differences in the same direction. In the data from the caries trial, this was not the case, and the augmented data are a better illustration of the analysis methods. The properties of these diagnostic measures are not the focus of this paper, but none of the currently available new measurement methods has yet been sufficiently validated to be widely accepted in modern caries clinical trials. In addition, we made a series of decisions concerning transformation of the diagnostic results and the construction of our indices. Although these decisions were unlikely to make much difference in our results, there would need to be agreement on these issues if a "standard analysis" for caries trials is to be developed.
Simulation studies comparing some of the global tests have shown that the GLS, which is the most flexible, tends to perform at least as well as the other tests in most situations and better in some (O'Brien, 1984). The rank-sum test has the major advantage that no transformations are needed, and descriptive statistics can be presented on the original scales. However, as with most non-parametric analyses, if the assumptions of a parametric approach can be met, the rank-sum test will be less powerful.
The caries indices may have great appeal, since they are essentially an extension of current practice. This is particularly true of the MAX index. One drawback of the new indices is that the resulting scale has no biological interpretation and is dependent on the baseline distribution of each measure. Perhaps in the future, a standard transformation could be used across trials, which would yield comparable numbers. Finally, the hybrid method performed as well as the others and exhibited more homogeneous results across the analysis methods. This may indicate that it is more robust, but it is difficult to draw conclusions from a single trial.
The hybrid method does allow the investigator to examine different areas of the mouth but is the most computationally unwieldy. Still, it can handle the unbalanced data in the fairest way. The major goal of adding new diagnostic tests to a caries trial is to increase our ability to detect differences among treatments. Analysis of the augmented data clearly shows that all of these methods are able to increase the power of a clinical caries trial if the diagnostic methods are an accurate and precise measure of the caries process.
FOOTNOTES
Presented at the International Consensus Workshop on Caries Clinical Trials, Glasgow, Scotland, January 7–10, 2002.
REFERENCES
Journal of Dental Research, Vol. 83, No. suppl 1,
C109-C112 (2004)