Analysis of Clinical Trials Involving Non-cavitated Caries Lesions
Department of Biostatistics & Epidemiology, Cleveland Clinic Foundation/Wb4, 9500 Euclid Avenue, Cleveland, OH 44195, USA
Corresponding author: pimrey{at}bio.ri.ccf.org
ABSTRACT
Treatments to halt or reverse the progression of non-cavitated caries lesions are of increasing interest. Diagnostic technologies under development offer potential for the assessment of gradual progression and regression of such lesions. Many therapies directed at correcting demineralization-remineralization imbalance should, in principle, protect enamel similarly across lesion severities from initiation to near cavitation. If this is so, and if acceptable reproducibility and predictive validity can be demonstrated for a diagnostic of acceptable cost, then clinical trials of agents to prevent cavitation can become more efficient by the use of outcome indices that reflect, in addition to cavitation, the expansion and regression of non-cavitated lesions. However, to achieve such a benefit will require data analyses that fully exploit ordinal or continuous-scale outcome measures. We consider comparison of such measures of lesion status between treatment groups, with most attention to ordinal categorical data. Interim data from a clinical trial in Lithuanian children are used for illustration.
Key Words: dental caries, clinical trials, data analysis, ordinal categorical data, diagnostic modalities
INTRODUCTION
Clinical trials of anticaries agents typically analyze a primary outcome variable that attempts to represent the net conversion from apparently sound to frankly cavitated teeth or tooth surfaces in each subject, over a time interval appropriate to the relevant therapies and subject population. Such analyses may be based on any of three classes of outcome variables. The cumulative incidence class of measures uses changes in counts of teeth or surfaces that have or may have become cavitated (for instance, change in D3MFT or a variant, possibly with some correction for misclassification error) over a fixed observation period. The incidence density class uses changes in such measures in relation to accumulated exposure, defined as proportional to the number of observed surfaces or teeth at risk and to the period of observation. The time-to-cavitation class uses the period of survival of a non-cavitated tooth or surface until cavitation is observed or observation ceases. Although their differences are meaningful, all these approaches share a common limitation: They rely on a binary classification of surfaces as simply non-cavitated or cavitated. This reliance has historically been dictated by the limitations of diagnostic methods but is at odds with our current understanding of cavitation as resulting from failure of a normal demineralization/remineralization process that continuously affects every dental surface. Indeed, a consensus is emerging that the future of dental practice in developed countries lies in the early recognition of incipient caries and in therapeutic intervention to prevent initial cavitation. A recent US NIH Consensus Statement (Bowersox, 2001) noted that "Digitally acquired and postprocessed images have great potential in the detection of noncavitated caries and in the diagnosis of secondary caries.
Promising new diagnostic techniques are emerging, including fiber-optic transillumination and light and laser fluorescence.... At this time the panel senses a paradigm shift in the management of dental caries toward improved diagnosis of early noncavitated lesions and treatment for prevention and arrest of such lesions." (For relevant reviews, see, e.g., Featherstone and Fried, 2001; Stookey and Gonzalez-Cabezas, 2001.) However, no index based entirely on binary classifications of surfaces as non-cavitated or frankly cavitated can capture the intermediate information on progression and regression of lesions that is becoming available from new diagnostic technologies and that should offer, at least in principle, the most sensitive indication of therapeutic benefit from agents meant to halt and/or reverse early lesions. For this purpose, we require measurement scales that distinguish among different levels of progression toward cavitation or remineralization of an individual surface, and analytic techniques that can efficiently use the information that is accumulated when such ordinal, interval, or ratio-scaled data are observed longitudinally on multiple surfaces. Techniques for analyses of continuous (interval or ratio scale) data dominate a century of statistical literature and most statistics texts. There is also a very large body of literature on methods for ordinal data, but the core of this literature is as yet somewhat less well-known and perhaps more subject to misconception. This paper will briefly consider several topics in the analysis of correlated non-binary observations. We will emphasize but not entirely confine ourselves to consideration of ordinal categorizations, such as may arise from a combination of visual and tactile observations or from grouped continuous observations. 
For simplicity, it will be assumed that a single primary outcome variable has been specified initially, that the central focus is on progression/regression of a gradual process rather than avoidance of a "concluding" event such as cavitation, and that interest is in a single terminal measurement of the primary outcome, or in change between baseline and terminal measurements. Generalizations to repeated follow-up observations are not difficult. We discuss some issues relevant to analyses of caries data in such situations, emphasizing "semiparametric" methods that are relatively light on statistical assumptions, and illustrate them with data also examined by Katz and Huntington (2004).
SCORES AND MODELS FOR ORDINAL CATEGORIZATIONS
Differences in Mean Scores
But the grounds for concern are easily misconstrued, and the criticism is often inappropriate. Choice of scores affects the relative sensitivity (statistical power) of resulting tests of treatment differences in different situations, but does not affect the fundamental validity of these tests. Scores should be selected in advance of the data, to be consistent with the clinical severity of the lesions they describe or the progress of the underlying pathologic process. This ensures that disparities in aggregated scores will reasonably represent disparities in disease burden, and hence be readily interpreted and accepted by the scientific and clinical communities. Within the realm of clinically meaningful possibilities, scores should also be selected to yield high power against the types of treatment differences suggested by the modes and speeds of action of the test agents. Since power is typically insensitive to moderate variations in the spacing of scores, precise spacing is rarely an issue (Wainer, 1976). Where the criteria of biological relevance and statistical power are irreconcilable, even after possible restriction of the subject population, the grounds for interest in the new treatment are suspect. (Note that in clinical trials of efficacy and superiority, economic and scientific interests coincide, so that all parties share an interest in using scores of high power. In equivalence trials, where this is not necessarily the case, controversy may be avoided by importing outcomes and associated scores from efficacy and superiority studies.) Finally, as will be seen below, analyses of ordinal variables that avoid use of pre-specified scores often incorporate implicit forms of scoring that may also be controversial. Once scores have been selected, several analytic choices remain, including:
Although technical issues of implementation differ, these are essentially the same choices faced in the analysis of continuous outcomes, or of other variables for which continuous approximating distributions are appropriate, such as subject-level averages of discrete surface or tooth scores.
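A minimal sketch of a subject-level comparison of mean scores. It assumes each subject contributes a list of 0-based category indices (one per surface) and that the score vector is fixed in advance; the Welch-type statistic, the helper names, and the data are hypothetical choices for illustration, and stratification and baseline adjustment are ignored. Because the statistic is unchanged by any increasing linear rescaling of the scores, only the relative spacing of the scores can alter the result, in line with the remark that scores affect power but not validity.

```python
from math import sqrt
from statistics import mean, variance


def subject_mean_score(categories, scores):
    """Average pre-specified score over one subject's surfaces.

    categories: 0-based ordinal category index for each observed surface.
    scores: score assigned to each category, fixed before seeing the data.
    """
    return mean(scores[c] for c in categories)


def compare_mean_scores(arm_a, arm_b, scores):
    """Return (difference in mean scores, B minus A, and a Welch-type t statistic).

    Each arm is a list of subjects; each subject is a list of categories.
    Surfaces are summarized within each subject first, so the subject is
    the unit of analysis and within-mouth correlation is not an issue.
    """
    a = [subject_mean_score(s, scores) for s in arm_a]
    b = [subject_mean_score(s, scores) for s in arm_b]
    diff = mean(b) - mean(a)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return diff, diff / se


# Hypothetical data: three ordered categories (0 = sound, 2 = most severe).
arm_a = [[0, 0, 1], [0, 1, 1], [0, 0, 0], [1, 2, 0]]
arm_b = [[1, 1, 2], [0, 2, 2], [1, 1, 1], [2, 2, 0]]

for scores in ([0, 1, 2], [0, 2, 4], [0, 1, 3]):
    diff, t = compare_mean_scores(arm_a, arm_b, scores)
    print(scores, round(diff, 3), round(t, 3))
```

The first two score vectors differ only by a factor of two, so they give identical t statistics; the third changes the spacing and therefore can change the statistic.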
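Distributional Shift Models
Let $\pi_{ij}$ be the probability that a random subject who receives treatment $i$ ends therapy in outcome category $j$, for $i = 1,2$ and $j = 1,\ldots,J$. We consider two commonly used ordinal models, each of which is readily extended to deal with multiple sites within a subject in the presence of covariates.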
Equal adjacent-odds ratios
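The equal adjacent-odds ratio model may be regarded as a simplification of a model known alternatively as the multicategory, polychotomous, or generalized logit model (Agresti, 1996; Stokes et al., 2000). Under this model, the adjacent-category log odds ratio

$$\log OR_{j,j-1} = \log\left(\frac{\pi_{2j}/\pi_{2,j-1}}{\pi_{1j}/\pi_{1,j-1}}\right)$$

takes the same value for all $j = 2,\ldots,J$; in other words, the treatment 2 odds between adjacent categories are all the same multiple, $OR$, of the odds between these categories under treatment 1. A consequence is that the odds of being in category $j$ relative to category 1 under treatment 2 equal the corresponding treatment 1 odds multiplied by $OR^{\,j-1}$, a power of the common odds ratio that increases with $j$:

$$\frac{\pi_{2j}/\pi_{21}}{\pi_{1j}/\pi_{11}} = OR^{\,j-1}.$$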
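A small numerical sketch of the equal adjacent-odds ratio model, assuming hypothetical category probabilities for treatment 1 and a hypothetical common log odds ratio. The helper builds treatment 2 probabilities that satisfy the model exactly and then recovers the adjacent-category log odds ratios; applied to observed counts instead, the same function gives empirical adjacent-odds ratios whose spread across categories is one informal check on the model.

```python
from math import exp, log


def adjacent_log_odds_ratios(p1, p2):
    """log OR_{j,j-1} (treatment 2 vs 1) for each adjacent pair of categories.

    p1, p2: category probabilities (or counts) for treatments 1 and 2,
    ordered from least to most severe. Returns J - 1 values.
    """
    return [
        log((p2[j] / p2[j - 1]) / (p1[j] / p1[j - 1]))
        for j in range(1, len(p1))
    ]


def equal_adjacent_odds_probs(p1, log_or):
    """Treatment 2 probabilities implied by the equal adjacent-odds model.

    pi_2j is proportional to pi_1j * exp(log_or * (j - 1)), so the odds of
    category j versus category 1 are multiplied by OR ** (j - 1).
    Indices are 0-based here: category j (1-based) has index j - 1.
    """
    weights = [p * exp(log_or * k) for k, p in enumerate(p1)]
    total = sum(weights)
    return [w / total for w in weights]


p1 = [0.40, 0.30, 0.20, 0.10]          # hypothetical treatment 1 distribution
p2 = equal_adjacent_odds_probs(p1, 0.5)
print([round(x, 4) for x in adjacent_log_odds_ratios(p1, p2)])  # → [0.5, 0.5, 0.5]
```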
When the equal adjacent-odds ratio model adequately fits the data, the parameter
In this connection, it is relevant to note that the
Proportional odds
For treatment $i$, define the cumulative odds of an outcome above category $j$ as

$$\gamma_{ij} = \frac{\sum_{k=j+1}^{J}\pi_{ik}}{\sum_{k=1}^{j}\pi_{ik}}$$

for $j = 1,\ldots,J-1$. Then the proportional odds model requires that the ratios of cumulative odds for treatment 2 relative to treatment 1, $\gamma_{2j}/\gamma_{1j}$, be identical for each of $j = 1,\ldots,J-1$. Remarks similar to those above regarding the adjacent-odds ratio parameter apply to the common value of these $J-1$ cumulative odds ratios.
There is, of course, no prima facie biological reason why either the equal adjacent-odds ratio or the proportional odds model should ever be a true representation of nature. As the statistician George Box famously said, "All models are wrong, but some are useful." However, if the ordered categories are presumed to reflect intervals of values of a continuous latent random variable, the two models have similar mathematical rationales that may in some circumstances be attractive. Specifically, the proportional odds model applies exactly if the outcome category for an individual observation is determined by the location, with respect to a sequence of cut-points, of the sum of a linear function of covariate values and a random observation from a logistic distribution (a symmetric continuous probability law similar to the normal distribution, but with a lower peak and heavier tails; Agresti, 2002). The cut-points, as well as the resulting probabilities and cumulative odds, may be estimated from data. Substitution of a normal distribution for the logistic in this setting implies that the uniform association model (for a two-group comparison, the equal adjacent-odds ratio model) will fit approximately, provided the number of cut-points is not too large (Goodman, 1981). Since the normal and logistic distributions are themselves similar, for many datasets both models will fit adequately and give similar results with respect to estimation of a treatment effect. This is not guaranteed, though. When neither model fits, particularly when the data do not conform to a general upward or downward shift across the range of the response variable distribution, the models may differ considerably in both numerical results and substantive implications.
The assumptions of each of these models and of other such alternatives may be evaluated by examining the odds ratios or cumulative odds ratios observed in a given dataset. Thus, the choice of how to represent ordinality in analysis becomes arguable on empirical statistical grounds, based on properties of the data that are quite separable from, and neutral regarding, the existence or absence of a treatment effect, in the spirit of Berry (1987) and other data analysts. However, incorporating ordinality in this fashion discards a clear linkage of category spacing to underlying biology. Thus, objectivity and statistical simplicity of interpretation may be gained at the possible expense of clinical interpretability. The models above are simple representatives meant to illustrate a substantial class of ordinal data models. [See Becker (1998) and Agresti (1984) for more thorough surveys.]
BASELINE ORDINAL COVARIATE DATA
In some contexts, parametric adjustment for baseline ordinal covariate data poses conceptual problems. Particularly when the number of categories is small and/or there is controversy over scoring, it may not seem reasonable to model the effects of covariates in the same manner as the treatment effect. For instance, when adjusting for the baseline value of the outcome measure, a linear covariance analysis may seem unreasonable because of boundary effects at very low and very high baseline values. Similarly, while the proportional odds assumption may be reasonable for the effect of treatment, it may not be reasonable for modeling effects of the baseline value of the outcome or other covariates. For the purposes of establishing a treatment effect while minimizing potentially controversial assumptions, a general method of non-parametric covariance analysis for randomized studies may be useful.
The approach assumes that the basic model under which treatment effect is to be estimated, exclusive of covariate adjustment, can be implemented by applying generalized least-squares (GLS) to a vector of functions of the observations, and that consistent estimates of the covariances between these functions and the within-treatment means or category proportions of covariates are available. The adjustment is then accomplished by appending to the GLS-based model for treatment effect a model that equates the expectations of the covariate means or proportions in the two treatment arms, and estimating the treatment parameter(s) under this joint model. The expected covariate means or proportions indeed must satisfy this model: They are equal by virtue of the randomization process through which subjects are allocated to treatments.
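Consider, for example, a two-group trial with a univariate response $y_{ij}$ by the $j$th subject in the $i$th treatment arm ($i = 1,2$; $j = 1,\ldots,n_i$), where a difference in mean responses estimates the treatment effect and $\mathbf{x}_{ij}$ is that subject's vector of baseline covariates. Let $\mathbf{f}_i = (\bar{y}_i, \bar{\mathbf{x}}_i^{\top})^{\top}$ be the vector of within-arm means, with the covariance matrices $\mathbf{V}_i$ of $\mathbf{f}_i$, $i = 1,2$, being estimated by the conventional empirical covariance matrices of the means, based on sums of cross-products across subjects. Write

$$\mathbf{F} = \mathbf{f}_2 - \mathbf{f}_1 = \begin{pmatrix} F_y \\ \mathbf{F}_x \end{pmatrix}.$$

Partitioning

$$\mathbf{V}_F = \mathbf{V}_1 + \mathbf{V}_2 = \begin{pmatrix} v_{yy} & \mathbf{v}_{xy}^{\top} \\ \mathbf{v}_{xy} & \mathbf{V}_{xx} \end{pmatrix}$$

conformably with $\mathbf{F}$, and using $E(\mathbf{F}_x) = \mathbf{0}$ by randomization, the covariate-adjusted estimate of treatment effect is then obtained as

$$b = F_y - \mathbf{v}_{xy}^{\top}\mathbf{V}_{xx}^{-1}\mathbf{F}_x,$$

with estimated variance $v_{yy} - \mathbf{v}_{xy}^{\top}\mathbf{V}_{xx}^{-1}\mathbf{v}_{xy}$.
EXAMPLE
As an illustration, we consider data from a two-year superiority trial of caries prevention products A and B, containing, respectively, 2500 and 1000 ppm fluoride as sodium monofluorophosphate, in 2141 Lithuanian schoolchildren between 11 and 15 years of age at baseline. Treatment A had previously been shown to be superior to B in anticariogenic activity over 24 months. The children were randomized to treatment within each of 12 strata defined by factorial combinations of gender (male or female), number of erupted second molar occlusal surfaces (1–2 vs. 3–4), and initial D1MFS score (< 17, 17–26, > 26). Two-year follow-up was planned; data from the one-year follow-up of 1063 children are available for this example. At baseline and 12 months, each subject was examined by visual inspection, by fiber-optic transillumination (FOTI) of approximal and posterior occlusal surfaces, and by radiography. DIAGNOdent (DD) was also used on approximal and posterior occlusal surfaces, and the Electric Caries Meter (ECM) on posterior occlusal surfaces, but we limit ourselves here to a seven-category ordinal classification based on combined visual, x-ray, and FOTI results: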
For consistency between analyses, we analyzed only those tooth surfaces for which full data were available at both examinations.
Table 1
We examine the results of various analyses and note how they perform in these circumstances. For analyses based on mean scores, five scoring schemes will be considered, all normalized to a range of zero through six:
Table 3
It is noteworthy and encouraging that binary scores distinguishing between the presence and absence of any non-cavitated or cavitated lesion (s4) yielded a statistically significant comparison, whereas those distinguishing only cavitated lesions (s5) did not statistically differentiate between these products on the basis of one-year change. The ordinal scores s3, which strongly emphasize pre-cavitated lesions, also approached statistical significance but were less sensitive than the binary scores s4, because the preponderance of the observed treatment effect was localized to very early lesions. Perhaps higher concentrations of fluoride are more effective against early than against later lesions. Also, while the scoring schemes clearly matter, note the similarity between results based on the widely disparate schemes s1, s2, s5, and even s3, supporting the relative insensitivity of results to moderate differences in scoring. The performance of these scores, of course, also reflects the (by no means immutable) manner in which the diagnostic methods were combined to form categories. The improvement in precision obtained by adjustment for baseline is apparent from comparison of the two rows of the table, but substantively the results of the unadjusted and adjusted analyses are similar. Subject-level non-parametric covariance analysis with equally spaced scores yields an estimated effect of 0.13 with p = 0.16. Analyses at the site-specific level, adjusting for clustering using elementary sample survey-based methods (Binder, 1983), should yield results very similar to those above.
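The site-level analyses mentioned above treat each subject's mouth as a cluster of sites. A minimal sketch of one elementary approach of this kind, assuming simple random sampling of subjects within each arm and ignoring stratification: the site-level mean score is a ratio estimator (total score over number of sites), and its variance is approximated by Taylor linearization with one residual per subject, in the spirit of Binder (1983). Function names and data are hypothetical.

```python
from math import sqrt


def ratio_mean_and_var(subjects):
    """Site-level mean score for one arm, treating subjects as clusters.

    subjects: list of per-subject lists of site scores.
    R = (sum of all site scores) / (number of sites) is a ratio estimator;
    its variance is approximated by linearization, using one residual per
    subject so that within-mouth correlation is absorbed into the
    between-subject variability.
    """
    m = len(subjects)
    totals = [sum(s) for s in subjects]
    counts = [len(s) for s in subjects]
    r = sum(totals) / sum(counts)
    mean_count = sum(counts) / m
    # Linearized residuals; they sum to zero at the estimated ratio.
    z = [(t - r * n) / mean_count for t, n in zip(totals, counts)]
    var_r = sum(zi * zi for zi in z) / (m * (m - 1))
    return r, var_r


def compare_site_level(arm_a, arm_b):
    """Difference in site-level mean scores (B minus A) and its z statistic."""
    ra, va = ratio_mean_and_var(arm_a)
    rb, vb = ratio_mean_and_var(arm_b)
    diff = rb - ra
    return diff, diff / sqrt(va + vb)


# Hypothetical site scores (0-6 scale) for two small arms.
arm_a = [[0, 1, 0, 2], [1, 1, 3], [0, 0, 0, 0, 1], [2, 2, 1]]
arm_b = [[0, 0, 1], [0, 1, 0, 0], [1, 0, 0, 0, 0], [0, 2]]
diff, z = compare_site_level(arm_a, arm_b)
print(round(diff, 3), round(z, 2))
```

When every subject contributes the same number of sites, this reduces to the subject-level analysis: the ratio equals the average of subject means, and its variance equals their sample variance divided by the number of subjects.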
In contrast, we now consider purely ordinal models in which no explicit category scores are used, but in which simplifying constraints are placed on probabilities through assumptions about ratios of adjacent generalized or cumulative odds. Fitting the equal adjacent-odds ratio model to the status of individual surfaces at the one-year examination, with sample survey-based adjustment for correlation among sites within the mouth, estimates the odds of a surface falling in the next higher category for subjects under Treatment B as 1.005 times the corresponding odds for surfaces in subjects under Treatment A. This effect does not attain statistical significance (Table 4
What can be learned from the differing results of these analyses? We assume, based on previous studies of these fluoride doses over longer time periods, that the treatment effect detected by some analyses is real. First, and unsurprisingly, we note that adjustment for predictive baseline covariables generally increases sensitivity, as supported by comparing the two rows of Table 3
The wide spread of adjacent category odds ratios noted earlier in Table 2
HEAVY-TAILED CONTINUOUS DATA
Methods for the analysis of correlated continuous data, such as may arise from new quantitative measures of surface mineralization (e.g., subtraction radiography, DIAGNOdent, and the Electric Caries Meter), are extensively developed and well-documented. We confine ourselves to remarking on an issue that is sometimes neglected in practice but may at times be pertinent to caries studies. It is commonly assumed that continuous measurements are more informative than classifications, and that continuous data are used most efficiently when their actual values are incorporated into descriptive summaries and hypothesis tests. This assumption is usually, but not always, correct. One exception is when continuous measurements are made with a great deal of error but related classifications can be made far more reliably. In that case, the apparent precision of continuous data is spurious, and the extra variability introduced by measurement error may be more damaging than reduction to a cruder but more reliable categorical representation. A second exception is when data arise from a distribution with very heavy tails relative to the Gaussian, such as the double exponential or Cauchy distributions. We generally think of heavy-tailed distributions as caused by outliers, and the Cauchy distribution (which has tails so heavy that its mean and variance do not exist, and most conventional properties of sample statistics do not hold) is often thought of as a theoretical oddity. But the Cauchy distribution is the probability law that governs the ratio of two independent standard Gaussian random variables, and thus its occurrence in practice is hardly inconceivable. For that matter, measurements from a technology might follow this or a similar law, unbeknownst to the user, if a random Gaussian denominator were used in a normalization process.
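To make the point about Gaussian ratios concrete, the simulation below draws ratios of independent standard Gaussian variables and compares the sample mean and the sample median as estimates of the center. The seed, sample sizes, and helper names are arbitrary choices for illustration. Because the mean of n standard Cauchy observations is itself standard Cauchy, the spread of the sample mean does not shrink as n grows, whereas the spread of the sample median does.

```python
import random
from statistics import mean, median, quantiles


def gaussian_ratio_sample(n, rng):
    """n draws of Z1 / Z2 for independent standard Gaussians Z1, Z2,
    which follow a standard Cauchy distribution."""
    return [rng.gauss(0.0, 1.0) / rng.gauss(0.0, 1.0) for _ in range(n)]


def iqr(values):
    """Interquartile range of a list of numbers."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


rng = random.Random(2002)  # fixed seed so the illustration is reproducible

# The quartiles of a standard Cauchy distribution are -1 and +1.
big = gaussian_ratio_sample(200_000, rng)
print("sample quartiles:", [round(q, 2) for q in quantiles(big, n=4)])

# Spread of the sample mean versus the sample median over repeated samples.
reps = [gaussian_ratio_sample(1_000, rng) for _ in range(200)]
print("IQR of sample means:  ", round(iqr([mean(r) for r in reps]), 3))
print("IQR of sample medians:", round(iqr([median(r) for r in reps]), 3))
```

The interquartile range of the sample means stays near 2, the value for a single Cauchy observation, while that of the medians is far smaller.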
As an example of somewhat heavy-tailed data in practice, the Fig.
Also, consider the choice among the t test, the Wilcoxon signed-rank test, and the sign test for testing the hypothesis that a distribution is centered at zero. This problem arises in clinical pre-post comparisons and many other applications. When the underlying distribution is Gaussian or close to it, as is usually assumed, the t test is clearly superior, and the sign test is markedly inefficient (its asymptotic relative efficiency to the t test is 2/π, about 0.64). But if the underlying distribution is Cauchy, the relative efficiency of the t test to the sign test declines to zero, because the t test depends upon the sample mean, and the mean of n Cauchy observations has the same distribution as a single observation, so increasing the sample size adds no information to it. Somewhat less pathologically, if the underlying distribution is double exponential, the sign test is twice as efficient as the t test asymptotically, and the Wilcoxon signed-rank test, although less efficient than the sign test in this case, is also preferable to the t test. There is no guarantee that continuous measurements using new diagnostic methodologies will closely follow a Gaussian or any other particularly convenient statistical model. Indeed, biological outliers may signal rapid demineralization, and substantial measurement errors are more common with new than with established diagnostic systems. Consequently, careful attention should be paid to distributional shapes, and non-parametric or other robust approaches to analysis may be of particular importance in the interpretation of clinical trial outcomes with the use of new diagnostic tools.
SUMMARY COMMENTS
Numerous methods are available for the analysis of ordinal categorizations or continuous data from caries-diagnostic tools. All analyses should properly account for correlation among measurements from the same subject, but the unit of analysis may be the site, tooth, or subject, as appropriate to answer a given scientific question.
Scoring of ordinal categories is a very useful technique that benefits from agreement on basic conceptualization of the underlying biological process and the expected nature of a treatment effect, but does not require consensus on precise distances between categories along an underlying biological continuum. Dichotomies are special and extreme cases of scoring that may discard valuable information. Ordinal statistical models that avoid explicit scoring also may assume strong constraints on the nature of the treatment effect. Caution is necessary in the design and analysis of studies of any treatment-outcome combination for which the measurable effect of the treatment during the observational time frame may vary substantially depending upon the baseline disease measurement, as apparently in the example described above. Such variation of the treatment effect with initial disease status renders results of statistical models based on general distributional shifts less relevant to the biological situation and more difficult to interpret. Continuous measurements need not be normally distributed, and standard parametric analyses of data with very heavy tails can miss effects that are readily found with more robust approaches. Consequently, there is no guarantee that use of a site-level ordinal or continuous demineralization-based outcome measure will produce a more efficient clinical trial than would classic dichotomization by cavitation. However, caveats notwithstanding, close attention to (i) measurement reliability, (ii) the nature of treatment differences that may reasonably be anticipated based on underlying caries biology and mechanisms of action, and (iii) the anticipated statistical properties of selected ordinal or continuous outcome measures should produce clinical trials that can be shorter and smaller than previously, because they use new diagnostic modalities to increase the harvest of information about treatment activity from each site and subject studied.
ACKNOWLEDGMENTS
The authors are grateful to Richard Chesters and Unilever Research for sharing the data used in the analytic examples, to Barry Katz for providing SAS® code to extract the ordinal variable analyzed in the "EXAMPLE" section, and to two anonymous referees for helpful comments.
FOOTNOTES
Presented at the International Consensus Workshop on Caries Clinical Trials, Glasgow, Scotland, January 7–10, 2002.
Journal of Dental Research, Vol. 83, Suppl. 1, C103–C108 (2004)