Journal of Dental Research

ARTICLES

Analysis of Clinical Trials Involving Non-cavitated Caries Lesions

P.B. Imrey1,* and A. Kingman2

1 Department of Biostatistics & Epidemiology, Cleveland Clinic Foundation/Wb4, 9500 Euclid Avenue, Cleveland, OH 44195, USA; and
2 Biostatistics Core, Division of Population and Health Promotion Sciences, National Institute of Dental and Craniofacial Research, National Institutes of Health, 45 Center Drive, Room 4As-25U, Bethesda, MD 20892-6401;

Correspondence: * corresponding author, pimrey{at}bio.ri.ccf.org

ABSTRACT

Treatments to halt or reverse the progression of non-cavitated caries lesions are of increasing interest. Diagnostic technologies under development offer potential for the assessment of gradual progression and regression of such lesions. Many therapies directed at correcting demineralization-remineralization imbalance should, in principle, protect enamel similarly across lesion severities from initiation to near cavitation. If this is so, and if acceptable reproducibility and predictive validity can be demonstrated for a diagnostic of acceptable cost, then clinical trials of agents to prevent cavitation can become more efficient by the use of outcome indices that reflect, in addition to cavitation, the expansion and regression of non-cavitated lesions. However, to achieve such a benefit will require data analyses that fully exploit ordinal or continuous-scale outcome measures. We consider comparison of such measures of lesion status between treatment groups, with most attention to ordinal categorical data. Interim data from a clinical trial in Lithuanian children are used for illustration.

Key Words: dental caries • clinical trials • data analysis • ordinal categorical data • diagnostic modalities

INTRODUCTION

Clinical trials of anticaries agents typically analyze a primary outcome variable that attempts to represent the net conversion from apparently sound to frankly cavitated teeth or tooth surfaces in each subject, over a time interval appropriate to the relevant therapies and subject population. Such analyses may be based on a member of any of three classes of outcome variables. The ‘cumulative incidence’ class of measures uses changes in counts of teeth or surfaces that have or may have become cavitated—for instance, change in D3MFT or a variant, possibly with some correction for misclassification error—over a fixed observation period. The ‘incidence density’ class uses changes in such measures in relation to accumulated exposure, defined as proportional to the number of observed surfaces or teeth at risk and to the period of observation. The ‘time to cavitate’ class uses the period of survival of a non-cavitated tooth or surface until cavitation is observed or observation ceases. Although their differences are meaningful, all these approaches share a common limitation: They rely on a binary classification of surfaces as simply non-cavitated or cavitated. This reliance has historically been dictated by the limitations of diagnostic methods but is at odds with our current understanding of cavitation as resulting from failure of a normal demineralization/remineralization process that continuously affects every dental surface.

Indeed, a consensus is emerging that the future of dental practice in developed countries lies in the early recognition of incipient caries and in therapeutic intervention to prevent initial cavitation. A recent US NIH Consensus Statement (Bowersox, 2001) noted that "Digitally acquired and postprocessed images have great potential in the detection of noncavitated caries and in the diagnosis of secondary caries. Promising new diagnostic techniques are emerging, including fiber-optic transillumination and light and laser fluorescence.... At this time the panel senses a paradigm shift in the management of dental caries toward improved diagnosis of early noncavitated lesions and treatment for prevention and arrest of such lesions." (For relevant reviews, see, e.g., Featherstone and Fried, 2001; Stookey and Gonzalez-Cabezas, 2001.) However, no index based entirely on binary classifications of surfaces as non-cavitated or frankly cavitated can capture the intermediate information on progression and regression of lesions that is becoming available from new diagnostic technologies and that should offer, at least in principle, the most sensitive indication of therapeutic benefit from agents meant to halt and/or reverse early lesions. For this purpose, we require measurement scales that distinguish among different levels of progression toward cavitation or remineralization of an individual surface, and analytic techniques that can efficiently use the information that is accumulated when such ordinal, interval, or ratio-scaled data are observed longitudinally on multiple surfaces.

Techniques for analyses of continuous (interval or ratio scale) data dominate a century of statistical literature and most statistics texts. There is also a very large body of literature on methods for ordinal data, but the core of this literature is as yet somewhat less well-known and perhaps more subject to misconception. This paper will briefly consider several topics in the analysis of correlated non-binary observations. We will emphasize but not entirely confine ourselves to consideration of ordinal categorizations, such as may arise from a combination of visual and tactile observations or from grouped continuous observations. For simplicity, it will be assumed that a single primary outcome variable has been specified initially, that the central focus is on progression/regression of a gradual process rather than avoidance of a "concluding" event such as cavitation, and that interest is in a single terminal measurement of the primary outcome, or in change between baseline and terminal measurements. Generalizations to repeated follow-up observations are not difficult. We discuss some issues relevant to analyses of caries data in such situations, emphasizing "semiparametric" methods that are relatively light on statistical assumptions, and illustrating using data also examined by Katz and Huntington (2004).

SCORES AND MODELS FOR ORDINAL CATEGORIZATIONS

Differences in Mean Scores
The conceptually simplest approach to analyzing an ordinal dependent variable is to assign to its categories numerical scores, increasing in severity of the corresponding lesions, and to analyze these scores using statistical models for continuous outcome measurements. Technical corrections for discreteness of the scores can be incorporated where needed, e.g., using Wald statistics (Stokes et al., 2000), but may well be unnecessary (e.g., Heeren and D’Agostino, 1987; Cohen, 2001). This approach is often viewed skeptically, however, because results depend on relative spacings of scores between pairs of adjacent categories, for which a given choice may be perceived as arbitrary.

But the grounds for concern are easily misconstrued, and the criticism is often inappropriate. Choice of scores affects the relative sensitivity (statistical power) of resulting tests of treatment differences in different situations, but does not affect the fundamental validity of these tests. Scores should be selected in advance of the data, to be consistent with the clinical severity of the lesions they describe or the progress of the underlying pathologic process. This ensures that disparities in aggregated scores will reasonably represent disparities in disease burden, and hence be readily interpreted and accepted by the scientific and clinical communities. Within the realm of clinically meaningful possibilities, scores should also be selected to yield high power against the types of treatment differences suggested by the modes and speeds of action of the test agents. Since power is typically insensitive to moderate variations in the spacing of scores, precise spacing is rarely an issue (Wainer, 1976). Where the criteria of biological relevance and statistical power are irreconcilable, even after possible restriction of the subject population, then the grounds for interest in the new treatment are suspect. (Note that in clinical trials of efficacy and superiority, economic and scientific interests coincide such that all parties share an interest in using scores of high power. In equivalence trials, where this is not necessarily the case, controversy may be avoided by importing outcomes and associated scores from efficacy and superiority studies.) Finally, as shall be seen below, analyses of ordinal variables that avoid use of pre-specified scores often incorporate implicit forms of scoring that may also be controversial.

Once scores have been selected, several analytic choices remain, including:

  1. whether to analyze the outcome measure with or without adjustment for baseline;
  2. the type of adjustment to use, if adjustment is indeed chosen (analysis of change scores, parametric covariance adjustment, non-parametric covariance adjustment, repeated-measures analysis, or other multivariate parametric analysis with accompanying test of contrast);
  3. whether to weight equally across sites, teeth, or subjects;
  4. the degree of covariance-structure modeling to use in adjusting for within-subject correlation (from analyzing subject-based summaries or using sample survey-based adjustments that require essentially no modeling [minimal], to generalized estimating equation marginal modeling [GEE], to random effects models possibly including pre-specified correlation structures [maximal modeling]); and
  5. how to deal with missing values and apparent measurement errors, such as the apparent disappearance of untreated cavitation.

Although technical issues of implementation differ, these are essentially the same choices to make in analysis of continuous outcome or other variables, such as subject level averages of discrete surface or tooth scores, for which the use of continuous approximating distributions is appropriate.
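
As an illustration of the mean-score approach, the following is a minimal sketch under stated assumptions: scores are equally spaced, each subject's surfaces are averaged into one subject-level mean, and the two arms are compared with a one-degree-of-freedom Wald statistic using separate variance estimates. The function names and the data are hypothetical and are not taken from the trial discussed later.

```python
import math

# Hypothetical equally spaced scores for a 7-category ordinal scale (categories 1..7).
SCORES = [0, 1, 2, 3, 4, 5, 6]

def subject_mean_score(categories, scores=SCORES):
    """Average score across one subject's surfaces (categories are 1-based)."""
    return sum(scores[c - 1] for c in categories) / len(categories)

def mean_and_var_of_mean(values):
    """Sample mean and estimated variance of that mean (s^2 / n)."""
    n = len(values)
    m = sum(values) / n
    s2 = sum((v - m) ** 2 for v in values) / (n - 1)
    return m, s2 / n

def mean_score_wald(arm_a, arm_b, scores=SCORES):
    """Difference in subject-level mean scores (B minus A) with a 1-df Wald test."""
    means_a = [subject_mean_score(s, scores) for s in arm_a]
    means_b = [subject_mean_score(s, scores) for s in arm_b]
    m_a, v_a = mean_and_var_of_mean(means_a)
    m_b, v_b = mean_and_var_of_mean(means_b)
    diff = m_b - m_a
    chi2 = diff ** 2 / (v_a + v_b)
    # Upper-tail probability of a chi-square with 1 df, i.e. P(|Z| > sqrt(chi2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return diff, chi2, p

# Made-up data: each inner list holds the categories of one subject's surfaces.
arm_a = [[1, 1, 1, 2], [1, 1, 2, 2], [1, 1, 1, 1], [1, 2, 2, 5]]
arm_b = [[1, 2, 2, 2], [1, 1, 2, 6], [2, 2, 2, 2], [1, 2, 2, 2]]
diff, chi2, p = mean_score_wald(arm_a, arm_b)
print(f"difference = {diff:.3f}, Wald chi-square = {chi2:.3f}, p = {p:.3f}")
```

Analyzing one summary per subject sidesteps within-subject correlation at the cost of weighting subjects equally regardless of how many surfaces each contributes, which is one of the analytic choices listed above.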

Distributional Shift Models
When explicit category scores are unavailable, models for category probabilities that incorporate ordinal information may be used instead. However, it should be acknowledged that these models incorporate ordinality by constraining relationships among probabilities in ways that may, in the substantive context, be no less arbitrary than explicit category scoring. We assume now, for the sake of simple illustration, a parallel two-arm clinical trial with a $J$-category ordinal subject-level outcome, and let $\pi_{ij}$ be the probability that a random subject who receives treatment $i$ ends therapy in outcome category $j$, for $i = 1,2$ and $j = 1,\ldots,J$. We consider two commonly used ordinal models, each of which is readily extended to dealing with multiple sites within a subject in the presence of covariates.

Equal adjacent-odds ratios
Define the odds of being in category $j$ vs. category $j'$ for the $i$th treatment arm as the ratio $o_{i;j,j'} = \pi_{ij}/\pi_{ij'}$, for $i = 1,2$ and $j,j' = 1,\ldots,J$; the "adjacent-category odds" as the $o_{i;j,j-1}$ for $i = 1,2$ and $j = 2,\ldots,J$; and the "adjacent-category odds ratios" as the respective quotients of these odds in the second and first treatment arms, $OR_{j,j-1} = o_{2;j,j-1}/o_{1;j,j-1}$. The "equal adjacent-odds ratio" model, also known as the "uniform" ordinal association model (Agresti, 1996), stipulates equality of the $OR_{j,j-1}$ $(= \Phi)$ for all $j = 2,\ldots,J$; in other words, the treatment 2 odds between adjacent categories are all the same multiple of the odds between these categories under treatment 1. A consequence is that the odds of being in category $j$ relative to category 1 under treatment 2 equal the same odds for treatment 1 multiplied by $\Phi^{j-1}$, a power of $\Phi$ that increases with $j$. On the log scale, we have $\log OR_{j,j'} = \log o_{2;j,j'} - \log o_{1;j,j'} = (j - j') \log \Phi$. More generally, the log odds ratios between pairs of categories, which jointly constitute one way of summarizing the relationship of outcome to treatment, are just multiples of $\log \Phi$, the multiplier being the number of categories separating the pair in the ordering.

The equal adjacent-odds ratio model may be regarded as a simplification of a model known alternatively as the multicategory, polychotomous, or generalized logit model (Agresti, 1996; Stokes et al., 2000). Under this model, the $\log OR_{j,j-1} = \log \Phi_j$ may differ for each adjacent category pair. Mathematically, only the restriction that all $\Phi_j = \Phi$ differentiates the equal adjacent-odds model from the generalized logit model. However, the difference is more profound than this suggests, because the generalized logit model without this restriction entirely neglects any information in the ordinality of the response categories. The restriction itself may be thought of as having two components. The first, an ordinality requirement that all $\log \Phi_j$ have equal sign (so that either $\Phi_j < 1$ for every $j$ or $\Phi_j > 1$ for every $j$), ensures that the treatment effect either consistently increases ($\Phi_j > 1$) or consistently decreases ($\Phi_j < 1$) ratios of all odds, with numerator following denominator in the category ordering, shifting the entire distribution toward one end of the scale or the other. The second, and more restrictive, component is the additional scaling requirement that this distributional shift be uniform in magnitude, in the sense that all $\Phi_j$ be equal.

When the equal adjacent-odds ratio model adequately fits the data, the parameter $\Phi$ represents, in a single number, the manner in which the distribution of ordinal outcomes shifts either upward or downward for treatment 2 in comparison with the distribution in treatment 1. Values of $\Phi$ close to 1 represent little or no shift, while $\Phi > 1$ indicates a shift toward higher outcomes with treatment 2 than with treatment 1 and, conversely, for $\Phi < 1$. Even if this model is only approximately true, a test of $H_0 : \Phi = 1$ or, equivalently, of $H_0 : \log \Phi = 0$, may be a powerful test of treatment differences, because an estimate $\hat{\Phi}$ of $\Phi$ may concentrate most of the treatment’s effect in a single parameter.

In this connection, it is relevant to note that the estimate $\hat{\Phi}$ is mathematically determined by the observed mean of equally spaced scores for the ordinal categories, and that alternative models of similar mathematical form are defined by any set of category scores, no matter what their spacing. The scores in these models impose a relative spacing on the odds between different categories rather than reflecting, as in "Differences in Mean Scores" above, a presumably inherent spacing between the clinical severity of the categories. It is not clear that one type of assumption is more or less arbitrary than the other.
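
The definitions in this subsection can be checked numerically. The sketch below is illustrative only: the treatment 1 probabilities and the value 1.2 for the common adjacent-odds ratio are hypothetical, and the helper names are not from any package. It builds the treatment 2 distribution implied by the equal adjacent-odds ratio model and then recovers the adjacent-category odds ratios from the two distributions.

```python
def adjacent_odds_ratios(p1, p2):
    """Adjacent-category odds ratios, arm 2 vs. arm 1, for categories j = 2..J.

    Lists are 0-based, so index j here corresponds to category j + 1.
    """
    return [(p2[j] / p2[j - 1]) / (p1[j] / p1[j - 1]) for j in range(1, len(p1))]

def shift_by_uniform_association(p1, phi):
    """Arm 2 probabilities when every adjacent-category odds ratio equals phi.

    Each arm 1 probability is multiplied by phi raised to (category - 1),
    then the result is renormalized to sum to one.
    """
    weights = [p * phi ** j for j, p in enumerate(p1)]
    total = sum(weights)
    return [w / total for w in weights]

p1 = [0.70, 0.15, 0.08, 0.04, 0.03]   # hypothetical treatment 1 probabilities
p2 = shift_by_uniform_association(p1, phi=1.2)
ors = adjacent_odds_ratios(p1, p2)
print([round(r, 6) for r in ors])

# Odds ratio between the last and first categories: phi to the power of the
# number of categories separating them.
print((p2[4] / p2[0]) / (p1[4] / p1[0]), 1.2 ** 4)
```

Applying `adjacent_odds_ratios` to observed category proportions, rather than model-generated ones, gives a quick informal check of whether a single common ratio is plausible.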

Proportional odds
The "proportional odds" model (Agresti, 1996; Stokes et al., 2000) is an alternative approach to representing distributional shift with a single parameter. Assuming that outcome categories are numbered in order of increasing severity, define the "cumulative odds" of being no higher than category $j$ under treatment $i$ as

$$c_{i;j} = \frac{\pi_{i1} + \cdots + \pi_{ij}}{\pi_{i,j+1} + \cdots + \pi_{iJ}}$$

for $j = 1,\ldots,J-1$. Then the proportional odds model requires that the ratios of cumulative odds for treatment 2 relative to treatment 1, $c_{2;j}/c_{1;j}$, be identical for each of $j = 1,\ldots,J-1$. Remarks similar to those above regarding $\Phi$ apply to the common value $\Psi$ of these $J-1$ cumulative odds ratios. Under the proportional odds model, $\Psi$ is an effective one-parameter representation of a distributional shift, having the same role as $\Phi$ under the equal adjacent-odds ratio model.
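
The cumulative odds and their ratios can be computed directly from two category distributions. The following is a small sketch with made-up three-category distributions; the function names are hypothetical.

```python
def cumulative_odds(p):
    """Odds of falling in category j or lower, for j = 1..J-1."""
    total = sum(p)
    odds, running = [], 0.0
    for pj in p[:-1]:
        running += pj
        odds.append(running / (total - running))
    return odds

def cumulative_odds_ratios(p1, p2):
    """Cumulative odds for arm 2 divided by those for arm 1, at each boundary."""
    return [c2 / c1 for c1, c2 in zip(cumulative_odds(p1), cumulative_odds(p2))]

p1 = [0.5, 0.3, 0.2]   # hypothetical treatment 1 distribution
p2 = [0.4, 0.3, 0.3]   # hypothetical treatment 2 distribution
ratios = cumulative_odds_ratios(p1, p2)
print([round(r, 4) for r in ratios])
```

Under the proportional odds model the printed ratios would all be equal; how far observed ratios depart from a common value is an informal indication of fit.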

There is, of course, no prima facie biological reason why either the equal adjacent-odds ratio or the proportional odds model should ever be a true representation of nature. As the statistician George Box famously said, "All models are wrong; some models are useful." However, if the ordered categories are presumed to reflect intervals of values of a continuous latent random variable, the two models have similar mathematical rationales that may in some circumstances be attractive. Specifically, the proportional odds model applies exactly if the outcome category for an individual observation is determined by the location, with respect to a sequence of cut-points, of a linear function of covariate values and a random observation from a logistic distribution (a symmetric continuous probability law similar to the standard normal distribution, but with lower center and heavier tails; Agresti, 2002). The cut-points, as well as the resulting probabilities and cumulative odds, may be estimated from data. Substitution of a normal distribution for the logistic in this setting implies that the uniform association model will fit approximately, provided the number of cut-points is not too large (Goodman, 1981). Since the normal and logistic distributions are themselves similar, for many datasets both models will fit adequately and give similar results with respect to estimation of a treatment effect. This is not guaranteed, though. When neither model fits, particularly when the data do not conform to a general upward or downward shift across the range of the response variable distribution, then the models may differ considerably in both numerical results and substantive implications.
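
The latent-variable rationale can be illustrated numerically. In the sketch below, which assumes hypothetical cut-points and a hypothetical location shift of 0.5 for treatment 2, the cumulative odds ratio is the same at every cut-point when the latent variable is logistic, and varies somewhat from cut-point to cut-point when a normal distribution is substituted.

```python
import math

def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def latent_cumulative_odds_ratios(cutpoints, shift, cdf):
    """Cumulative odds (category <= j) for arm 2 over arm 1 at each cut-point.

    Arm 1 has latent location 0 and arm 2 has latent location `shift`, so
    P(category <= j) is cdf(cut_j) in arm 1 and cdf(cut_j - shift) in arm 2.
    """
    ratios = []
    for c in cutpoints:
        f1, f2 = cdf(c), cdf(c - shift)
        ratios.append((f2 / (1.0 - f2)) / (f1 / (1.0 - f1)))
    return ratios

cuts = [-1.0, 0.0, 1.0, 2.0]   # hypothetical cut-points giving 5 categories
shift = 0.5                     # hypothetical latent shift for treatment 2
logistic_ratios = latent_cumulative_odds_ratios(cuts, shift, logistic_cdf)
normal_ratios = latent_cumulative_odds_ratios(cuts, shift, normal_cdf)
print("logistic:", [round(r, 4) for r in logistic_ratios])
print("normal:  ", [round(r, 4) for r in normal_ratios])
```

The constancy in the logistic case follows because the logistic cumulative odds at a point `x` equal `exp(x)`, so a location shift multiplies every cumulative odds by the same factor.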

The assumptions of each of these models and of other such alternatives may be evaluated by examining the odds ratios or cumulative odds ratios observed in a given dataset. Thus, the choice of how to represent ordinality in analysis becomes arguable on empirical statistical grounds, based on properties of the data that are quite separable from and neutral regarding the existence or absence of a treatment effect, in the spirit of Berry (1987) and other data analysts. However, the incorporation of ordinality in this fashion discards a clear linkage of spacing to underlying biology. Thus, objectivity and statistical simplicity of interpretation may be gained at the possible expense of clinical interpretability.

The models above are simple representatives meant to illustrate a substantial class of ordinal data models. [See Becker (1998) and Agresti (1984) for more thorough surveys.]

BASELINE ORDINAL COVARIATE DATA

In some contexts, parametric adjustment for baseline ordinal covariate data poses conceptual problems. Particularly when the number of categories is small and/or there is controversy over scoring, it may not seem reasonable to model the effects of covariates in the same manner as the treatment effect. For instance, when adjusting for the baseline value of the outcome measure, a linear covariance analysis may seem unreasonable because of boundary effects for quite low and high baseline values. Similarly, while the proportional odds assumption may be reasonable for the effect of treatment, it may not be reasonable for modeling effects of the baseline value of the outcome or other covariates.

For the purposes of establishing a treatment effect while minimizing potentially controversial assumptions, a general method of non-parametric covariance analysis for randomized studies may be useful. The approach assumes that the basic model under which treatment effect is to be estimated, exclusive of covariate adjustment, can be implemented by applying generalized least-squares (GLS) to a vector of functions of the observations, and that consistent estimates of the covariances between these functions and the within-treatment means or category proportions of covariates are available. The adjustment is then accomplished by appending to the GLS-based model for treatment effect a model that equates the expectations of the covariate means or proportions in the two treatment arms, and estimating the treatment parameter(s) under this joint model. The expected covariate means or proportions indeed must satisfy this model: They are equal by virtue of the randomization process through which subjects are allocated to treatments.

Consider, for example, a two-group trial with a univariate response $y_{ij}$ by the $j$th subject in the $i$th treatment arm ($i = 1,2$; $j = 1,\ldots,n_i$), where a difference $\bar{y}_2 - \bar{y}_1$ between the mean responses is of primary interest, and where a $k$-vector of covariates $x'_{ij} = (x_{ij1},\ldots,x_{ijk})$ is observed corresponding to $y_{ij}$. Then the model may be written as


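$$F = \begin{pmatrix} \bar{y}_2 - \bar{y}_1 \\ \bar{x}_2 - \bar{x}_1 \end{pmatrix}, \qquad E(F) = \begin{pmatrix} \beta \\ 0_k \end{pmatrix},$$

$$\mathrm{Var}(F) = V_1 + V_2,$$

with the $V_i$, $i = 1,2$, being estimated by the conventional empirical covariance matrices

$$V_i = \frac{1}{n_i(n_i - 1)} \sum_{j=1}^{n_i} \begin{pmatrix} y_{ij} - \bar{y}_i \\ x_{ij} - \bar{x}_i \end{pmatrix} \begin{pmatrix} y_{ij} - \bar{y}_i \\ x_{ij} - \bar{x}_i \end{pmatrix}'$$

based on sums of cross-products across subjects. Partitioning $V_F = V_1 + V_2 =$

$$\begin{pmatrix} V_{yy} & V'_{yx} \\ V_{yx} & V_{xx} \end{pmatrix},$$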

the covariate-adjusted estimate of treatment effect is then obtained as $\hat{\beta} = (\bar{y}_2 - \bar{y}_1) - V'_{yx}V_{xx}^{-1}(\bar{x}_2 - \bar{x}_1)$ with asymptotic estimated variance $v_{\hat{\beta}} = V_{yy} - V'_{yx}V_{xx}^{-1}V_{yx}$. The ratio of the squared covariate-adjusted effect to this variance, $\hat{\beta}^2/v_{\hat{\beta}}$, then gives a Wald chi-square statistic with df = 1 that may be used to test statistical significance of the treatment difference after adjustment for the covariates. [See Koch et al. (1998) for more detailed discussion, in a broader context, of this adjustment technique.]
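
For a single covariate (k = 1), the adjustment reduces to scalar arithmetic, which the following sketch illustrates. It is an assumption-laden illustration rather than the procedure of Koch et al.: the divisor n(n - 1) used for the covariance of the sample means is assumed, the function names are hypothetical, and the data are made up.

```python
def _mean(values):
    return sum(values) / len(values)

def cov_of_means(y, x):
    """Estimated variances and covariance of the sample means (ybar, xbar) in one arm."""
    n = len(y)
    my, mx = _mean(y), _mean(x)
    denom = n * (n - 1)  # assumed divisor: sample covariance divided by n
    v_yy = sum((a - my) ** 2 for a in y) / denom
    v_yx = sum((a - my) * (b - mx) for a, b in zip(y, x)) / denom
    v_xx = sum((b - mx) ** 2 for b in x) / denom
    return v_yy, v_yx, v_xx

def adjusted_effect(y1, x1, y2, x2):
    """Covariate-adjusted difference (arm 2 minus arm 1), its variance, and Wald chi-square."""
    # Sum the per-arm blocks to obtain V_yy, V_yx, V_xx for the difference vector.
    v_yy, v_yx, v_xx = (a + b for a, b in zip(cov_of_means(y1, x1), cov_of_means(y2, x2)))
    d_y = _mean(y2) - _mean(y1)
    d_x = _mean(x2) - _mean(x1)
    estimate = d_y - (v_yx / v_xx) * d_x
    variance = v_yy - v_yx ** 2 / v_xx
    return estimate, variance, estimate ** 2 / variance

# Made-up one-year scores (y) and baseline scores (x) for two small arms.
y1, x1 = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]
y2, x2 = [2.0, 3.0, 5.0, 6.0], [2.0, 3.0, 3.0, 5.0]
estimate, variance, chi2 = adjusted_effect(y1, x1, y2, x2)
print(f"adjusted effect = {estimate:.3f}, variance = {variance:.3f}, Wald chi-square = {chi2:.3f}")
```

When the two arms happen to have identical covariate means, the adjusted estimate coincides with the raw difference in mean responses, while its estimated variance is still reduced whenever response and covariate are correlated.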

EXAMPLE

As illustration, we consider data from a two-year superiority trial of caries prevention products A and B, containing, respectively, 2500 and 1000 ppm fluoride as sodium monofluorophosphate, in 2141 Lithuanian schoolchildren between 11 and 15 years of age at baseline. Treatment A had previously been shown to be superior to B in anticariogenic activity over 24 months. The children were randomized to treatment within each of 12 strata defined by factorial combinations of gender (male or female), second molar occlusal surfaces erupted (1–2 vs. 3–4), and initial D1MFS score (< 17, 17–26, > 26). Two-year follow-up was planned; data from one-year follow-up of 1063 children are available for this example. At baseline and 12 months, each subject was examined by visual inspection, Fiber Optic Transillumination (FOTI) for approximal surfaces and posterior occlusal surfaces, and radiography. DIAGNOdent (DD) was also used for approximal and occlusal posterior surfaces, and the Electric Caries Meter (ECM) as well for occlusal posterior surfaces, but we limit ourselves here to a seven-category ordinal classification based on combined visual, x-ray, and FOTI results:

  1. No evidence of caries by visual exam, x-ray, or FOTI.
  2. Visible white spot and/or vague shadow under FOTI.
  3. Brown spot.
  4. Cavity restricted to enamel.
  5. Visible non-cavitated lesion into dentin, and/or inner or outer half enamel lesion on x-ray, and/or definite shadow under FOTI.
  6. Visible cavity into dentin, and/or inner or outer half dentin lesion on x-ray.
  7. Visible cavity into pulp, filled surface, or tooth extracted due to caries.

For consistency between analyses, we analyzed only those tooth surfaces for which full data were available at both examinations.

Table 1 gives percentages of sites in each category, by treatment. The distributions differ little, with the exception of a 1.3% excess of surfaces with signs of caries under treatment B. Eighty-four percent (1.1 of 1.3%) of this excess consists of white spots or vague FOTI shadows (category 2). Thus, these data do not suggest a treatment effect beyond a very early stage of demineralization. This observation is also reflected in the observed adjacent category and cumulative odds ratios, shown in Table 2. In each case, the odds ratio formed using the boundary between first and second categories substantially exceeds all other odds ratios of the same type, which fluctuate around 1, the adjacent-odds ratios spanning a much wider range than the ratios of cumulative odds. Hence, neither distributional shift model describes these data well. Note also that in situations where treatment impact, as here, is confined to a limited range of the baseline data distribution, analyses distinguishing between ordinal categories outside that range may contaminate the restricted treatment effect with noise from data in regions where the treatment is ineffective.


Table 1. Distribution of One-year Site-specific Caries Categories, by Treatment

Table 2. Observed Adjacent Category and Cumulative Odds Ratios for One-year Site-specific Caries Categories

We examine the results of various analyses and note how they perform in these circumstances. For analyses based on mean scores, five scoring schemes will be considered, all normalized to a range of zero through six:

  s1: equal spacing, (0, 1, 2, 3, 4, 5, 6)
  s2: linearly increasing spacing, (2/7) × (0, 1, 3, 6, 10, 15, 21)
  s3: linearly decreasing spacing, (2/7) × (0, 6, 11, 15, 18, 20, 21)
  s4: binary scoring, any caries, 6 × (0, 1, 1, 1, 1, 1, 1)
  s5: binary scoring, cavity, 6 × (0, 0, 0, 1, 1, 1, 1)
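
The five schemes can be tabulated exactly to confirm the common zero-to-six range and to display the spacing between adjacent categories. This is a small illustrative sketch; the dictionary and function names are hypothetical.

```python
from fractions import Fraction

# Scores for categories 1..7 under each scheme, held as exact fractions.
SCHEMES = {
    "s1": [Fraction(v) for v in (0, 1, 2, 3, 4, 5, 6)],
    "s2": [Fraction(2, 7) * v for v in (0, 1, 3, 6, 10, 15, 21)],
    "s3": [Fraction(2, 7) * v for v in (0, 6, 11, 15, 18, 20, 21)],
    "s4": [Fraction(6) * v for v in (0, 1, 1, 1, 1, 1, 1)],
    "s5": [Fraction(6) * v for v in (0, 0, 0, 1, 1, 1, 1)],
}

def spacings(scores):
    """Differences between the scores of adjacent categories."""
    return [b - a for a, b in zip(scores, scores[1:])]

for name, scores in SCHEMES.items():
    print(name,
          [round(float(s), 2) for s in scores],
          [round(float(d), 2) for d in spacings(scores)])
```

The spacings make the intent of each scheme visible: s2 places more weight on distinctions among advanced lesions, s3 on distinctions among early lesions, and the binary schemes s4 and s5 put all of the weight on a single category boundary.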

Table 3 displays the results of simple analyses based upon these scores, either at the one-year examination unadjusted for baseline, or adjusted for baseline status by subtraction. Mean scores were obtained for each subject at both baseline and one-year examinations, and these means were analyzed at the subject level, so that the estimated treatment effects represent differences between the across-subject means, of the within-subject averages across sites, for Treatment B vs. Treatment A. For instance, the unadjusted results for s1 are compatible with a one-category increase for roughly one in 71 tooth surfaces among patients under Treatment B, or a one-category-increased change from baseline for one in roughly 83 tooth surfaces in such patients, beyond increases seen under Treatment A. Note that approximately two-fifths of the disadvantage of Treatment B seen in Table 1 was present at baseline. Baseline-adjusted differences between treatments are small, ranging from 0.004 to 0.046 on a 0 to 6 scale, depending upon scoring scheme. Similar results could have been obtained by analysis at the surface level with appropriate adjustment for within-subject correlation; for instance, using the svyreg command of the Stata™ statistical analysis package.


Table 3. Estimated Year One Treatment Effects, Based on One-year Status and One-year Change from Baseline, Using Five Scoring Schemes: Effect/Wald Statistic p-Value

It is noteworthy and encouraging that binary scores distinguishing between the presence and absence of any non-cavitated or cavitated lesion (s4) yield a statistically significant comparison, while those (s5) distinguishing only cavitated lesions did not statistically differentiate among these products on the basis of one-year change. The ordinal scores s3 that strongly emphasize pre-cavitated lesions also approached statistical significance, but were less sensitive than binary scores s4 because the preponderance of observed treatment effect was localized to very early lesions. Perhaps higher concentrations of fluoride are more effective against early than against later lesions. Also, while the scoring schemes clearly matter, note the similarity between results based on the widely disparate schemes s1, s2, s5, and even s3, supporting the relative insensitivity of results to moderate differences. The performance of these scores, of course, also reflects the (by no means immutable) manner in which the diagnostic methods were combined to form categories.

The improvement in precision obtained by adjustment for baseline is apparent in comparison of the two rows of the table, but substantively the results of the unadjusted and adjusted analyses are similar. Subject-level non-parametric covariance analysis with equally spaced scores yields an estimated effect of 0.13 with p = 0.16. Analyses at the site-specific level, adjusting for clustering using elementary sample survey-based methods (Binder, 1983), should yield results very similar to those above.

In contrast, we now consider purely ordinal models in which no explicit category scores are used, but in which simplifying constraints are placed on probabilities through assumptions about ratios of adjacent generalized or cumulative odds. Fitting the equal adjacent-odds ratio model to the status of individual surfaces at the one-year examination, with sample survey-based adjustment for correlation among sites within the mouth, estimates the odds of a surface falling in the next higher category for subjects under Treatment B as 1.005 times the corresponding odds for surfaces in subjects under Treatment A. This effect does not attain statistical significance (Table 4). Incorporation of strata into the model, and adjustment for baseline surface status either by additional stratification or by assuming equal spacing among baseline categories, does not substantially change this result. The analogous proportional odds models, however, do find a statistically significant treatment difference after adjustment for strata, with or without adjustment for surface status at baseline, as does the generalized logit model after adjustment for both baseline risk stratum and baseline surface score (Table 4). The equal adjacent-odds ratio and generalized logit models were fit, after reformulation as loglinear models, using the Stata™ svypois command with the desmat prefix; the proportional odds models were fit using Stata™ svyologit.


Table 4. Odds Ratios for More Advanced One-year Surface Lesion Category, and Wald Statistic P-values, from Equal Adjacent-odds Ratio, Proportional Odds, and Generalized Logit (P-Values only) Models

What can be learned from the differing results of these analyses? We assume, based on previous studies of these fluoride doses over longer time periods, that the treatment effect detected by some analyses is real. First, and unsurprisingly, we note that adjustment for predictive baseline covariables generally increases sensitivity, as supported by comparing the two rows of Table 3 and by moving rightward in Table 4. Second, a clear understanding of treatment mechanisms and effects prior to the initiation of a clinical trial can be used to advantage in planning an analysis, by selecting an approach with high power against the anticipated desirable outcome. Although we have argued that many treatment effects should be visible across the spectrum of non-cavitated lesions, in this instance anticipation that treatment differences would be observed only at the earliest stage of disease might have been used to pre-select the dichotomous analysis specified by scores s4, achieving greater sensitivity than other possibilities. This study population experienced a fairly high level of dental caries, both cavitated and non-cavitated. It would be interesting to investigate whether the apparently higher benefit of increased fluoride level on early non-cavitated lesions is reproducible in populations with lower cavitation rates. Third, the sensitivity of analyses based on ordinal models depends on how well the models portray the underlying circumstances. In this example, the proportional odds model performed robustly despite its apparent lack of fit, while the equal adjacent-odds ratio model was much less sensitive to treatment effect localized at one end of the scale. The generalized logit model, which has some sensitivity to treatment effects of any nature but is less powerful than equal adjacent or proportional odds models against a general distributional shift, had intermediate sensitivity.

The wide spread of adjacent category odds ratios noted earlier in Table 2 is virtually replicated by the fitted odds from the generalized logit model after adjustment for both strata and baseline status, confirming the poor fit of the equal adjacent-odds ratio model (data not shown). The adjusted cumulative odds remain less variable. Even when distributional shift does occur, one would anticipate that adjacent category odds models would be more vulnerable than proportional odds models to instability due to data sparseness such as found here in categories 4–6, because each adjacent category odds omits much of the data. Hence, such models may be less useful than alternatives when there is great imbalance in the distribution of observations across categories.
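
The contrast between adjacent category and cumulative odds can be illustrated with a minimal Python sketch. The counts below are hypothetical, chosen for arithmetic convenience, and are not data from the trial; the function and variable names are likewise illustrative. The second group differs from the first only by a shift of observations out of the lowest category, mimicking an effect localized at one end of the scale.

```python
def adjacent_odds_ratios(group_a, group_b):
    """Odds ratios (B relative to A) for category k+1 versus category k."""
    return [
        (group_b[k + 1] * group_a[k]) / (group_b[k] * group_a[k + 1])
        for k in range(len(group_a) - 1)
    ]


def cumulative_odds_ratios(group_a, group_b):
    """Odds ratios (B relative to A) for falling above each cut point."""
    ratios = []
    for cut in range(1, len(group_a)):
        a_low, a_high = sum(group_a[:cut]), sum(group_a[cut:])
        b_low, b_high = sum(group_b[:cut]), sum(group_b[cut:])
        ratios.append((b_high * a_low) / (b_low * a_high))
    return ratios


# Hypothetical surface counts in four ordered lesion categories.
group_a = [400, 200, 100, 50]
group_b = [300, 300, 150, 75]

adjacent = adjacent_odds_ratios(group_a, group_b)
cumulative = cumulative_odds_ratios(group_a, group_b)

print("adjacent category odds ratios:", adjacent)
print("cumulative odds ratios:       ", cumulative)
```

With these counts the adjacent category odds ratios are 2, 1, and 1, while the cumulative odds ratios are 2, 1.5, and 1.4. Each adjacent category ratio uses only two columns of the table, which is why such ratios become unstable when some categories are sparse, whereas every cumulative ratio draws on all of the observations.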

HEAVY-TAILED CONTINUOUS DATA

Methods for the analysis of correlated continuous data, such as may arise from new quantitative measures of surface mineralization (e.g., subtraction radiography, DIAGNOdent, and the Electric Caries Meter), are extensively developed and well-documented. We confine ourselves to remarking on an issue that is sometimes neglected in practice, but may at times be pertinent to caries studies.

It is commonly assumed that continuous measurements are more informative than classifications, and that continuous data are used most efficiently when their actual values are incorporated into descriptive summaries and hypothesis tests. This assumption is usually but not always correct. One exception is when continuous measurements are made with a great deal of error but related classifications can be made far more reliably. In that case, the apparent precision of continuous data is spurious, and the extra variability introduced by measurement error may be more damaging than would reduction to a cruder but more reliable categorical representation. A second exception is when data arise from a distribution with very heavy tails relative to the Gaussian, such as the double exponential or Cauchy distributions. We generally think of heavy-tailed distributions as caused by outliers, and the Cauchy distribution, whose tails are so heavy that its mean and variance do not exist and most conventional properties of sample statistics do not hold, is often thought of as a theoretical oddity. But the Cauchy distribution is the probability law that governs the ratio of two independent standard Gaussian random variables, and thus its occurrence in practice is hardly inconceivable. For that matter, measurements from a technology might take this or a similar form, unbeknownst to the consumer, if a random Gaussian denominator were used in a normalization process.
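
The ratio-of-Gaussians point can be checked by simulation. The following is a minimal sketch using only the Python standard library; the sample size, seed, and names are arbitrary choices made for illustration. It compares the simulated ratios with two properties of the standard Cauchy distribution: half of its mass lies within one unit of zero, and the probability of exceeding 10 in absolute value is 1 - (2/π) arctan(10), or roughly 6%.

```python
import math
import random


def gaussian_ratios(n, seed=2004):
    """Ratios of two independent standard Gaussian draws."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) / rng.gauss(0.0, 1.0) for _ in range(n)]


ratios = gaussian_ratios(100_000)

# Empirical fractions from the simulated ratios.
frac_within_one = sum(abs(r) <= 1 for r in ratios) / len(ratios)
frac_beyond_ten = sum(abs(r) > 10 for r in ratios) / len(ratios)

# Theoretical tail probabilities for comparison.
cauchy_beyond_ten = 1 - (2 / math.pi) * math.atan(10)
gaussian_beyond_ten = math.erfc(10 / math.sqrt(2))

print(f"fraction with |ratio| <= 1: {frac_within_one:.3f} (Cauchy: 0.500)")
print(f"fraction with |ratio| > 10: {frac_beyond_ten:.4f} "
      f"(Cauchy: {cauchy_beyond_ten:.4f}, Gaussian: {gaussian_beyond_ten:.1e})")
```

A standard Gaussian variable essentially never exceeds 10 in absolute value, so a tail fraction of several percent is a clear signature of the much heavier Cauchy tails.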

As an example of somewhat heavy-tailed data in practice, the Figure displays a quantile-quantile (QQ) plot of one-year ECM measurements from tooth 47 of the Lithuanian children. These ECM data are quite heavy-tailed, and not because of outliers. Normally distributed data would be expected to more or less track the solid diagonal line, while Cauchy data would be even more vertical at both upper and lower ends. The reason this matters is that parametric analyses are not robust to extreme heaviness in the tails, and conventional statistical intuition may fail dramatically in such circumstances, with major reductions in statistical efficiency relative to non-parametric methods. The most remarkable example is that sample means from the Cauchy distribution provide no more information about the center of the distribution than the first observation, no matter how large a sample is drawn!
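
The behavior of Cauchy sample means can be seen in a small simulation. This is a sketch under stated assumptions, not an analysis of the ECM data: the draws are from a standard Cauchy distribution, and the sample sizes, number of replicates, seed, and function names are illustrative choices. It measures the spread of sample means and sample medians across repeated samples by their interquartile range (IQR).

```python
import math
import random
import statistics


def cauchy_draw(rng):
    """One standard Cauchy draw via the inverse-CDF method."""
    return math.tan(math.pi * (rng.random() - 0.5))


def iqr(values):
    """Interquartile range of a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def spread_of_estimates(sample_size, replicates=4000, seed=47):
    """IQR of the sample mean and of the sample median across replicates."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(replicates):
        sample = [cauchy_draw(rng) for _ in range(sample_size)]
        means.append(sum(sample) / sample_size)
        medians.append(statistics.median(sample))
    return iqr(means), iqr(medians)


iqr_mean_1, iqr_median_1 = spread_of_estimates(1)
iqr_mean_201, iqr_median_201 = spread_of_estimates(201)

print(f"n = 1:   IQR of mean {iqr_mean_1:.2f}, IQR of median {iqr_median_1:.2f}")
print(f"n = 201: IQR of mean {iqr_mean_201:.2f}, IQR of median {iqr_median_201:.2f}")
```

In theory the mean of any number of standard Cauchy observations is itself standard Cauchy, with an IQR of 2, so the spread of the sample mean should stay near 2 at both sample sizes, while the spread of the sample median should shrink markedly as the sample grows.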


Figure. Quantile-quantile plot of distribution of ECM changes.

Also, consider the choice among the t test, the Wilcoxon signed-rank test, and the sign test for testing the hypothesis that a distribution is centered at zero. This problem arises in clinical pre-post comparisons and many other applications. When the underlying distribution is Gaussian or close to it, as is usually assumed, the t test is clearly superior, and the sign test is highly inefficient. But if the underlying distribution is Cauchy, then the relative efficiency of the t test to the sign test declines to zero, because the t test depends upon the mean, and increasing the sample size adds no information to the mean. Somewhat less pathologically, if the underlying distribution is double-exponential, both the sign test and the Wilcoxon signed-rank test are more efficient than the t test, with asymptotic relative efficiencies of 2 and 1.5, respectively. The Wilcoxon signed-rank test is often an attractive compromise, because it also loses little efficiency (about 5%) relative to the t test when the data are in fact Gaussian.
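
The comparison of the three tests can be explored with a rough simulation of rejection rates. The sketch below uses only the Python standard library and rests on several assumptions that are not taken from the article: a sample size of 50, a true shift of 0.5, unit-scale Gaussian, double-exponential, and Cauchy errors, a t critical value of approximately 2.01 for 49 degrees of freedom, an exact binomial sign test, and a normal approximation for the Wilcoxon signed-rank statistic. All names are illustrative.

```python
import math
import random

N, SHIFT, REPS = 50, 0.5, 2000  # illustrative settings, not from the article
T_CRIT = 2.01                   # approx. two-sided 5% point of t with 49 df
Z_CRIT = 1.96                   # two-sided 5% point of the standard normal


def t_rejects(x):
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / (n - 1)
    return abs(mean / math.sqrt(var / n)) > T_CRIT


def sign_rejects(x):
    pos = sum(v > 0 for v in x)
    m = sum(v != 0 for v in x)
    k = max(pos, m - pos)
    # Exact two-sided binomial p-value under a success probability of 1/2.
    tail = sum(math.comb(m, i) for i in range(k, m + 1)) / 2 ** m
    return min(1.0, 2 * tail) < 0.05


def wilcoxon_rejects(x):
    n = len(x)
    ordered = sorted(x, key=abs)  # ranks of |x|; ties are negligible here
    w_plus = sum(rank for rank, v in enumerate(ordered, start=1) if v > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return abs(w_plus - mean) / sd > Z_CRIT


def draw(rng, kind):
    if kind == "gaussian":
        return rng.gauss(0.0, 1.0)
    if kind == "double-exponential":
        return rng.choice((-1, 1)) * rng.expovariate(1.0)
    return math.tan(math.pi * (rng.random() - 0.5))  # standard Cauchy


def estimated_power(kind, seed=0):
    """Fraction of simulated samples in which each test rejects."""
    rng = random.Random(seed)
    hits = {"t": 0, "sign": 0, "wilcoxon": 0}
    for _ in range(REPS):
        x = [SHIFT + draw(rng, kind) for _ in range(N)]
        hits["t"] += t_rejects(x)
        hits["sign"] += sign_rejects(x)
        hits["wilcoxon"] += wilcoxon_rejects(x)
    return {name: count / REPS for name, count in hits.items()}


power = {kind: estimated_power(kind)
         for kind in ("gaussian", "double-exponential", "cauchy")}
for kind, rates in power.items():
    print(kind, rates)
```

Under these assumptions one would expect the t test to reject most often for Gaussian data and least often for Cauchy data. The exact rejection rates depend on the seed, the shift, and the sample size, so the output should be read as an illustration rather than as a benchmark.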

There is no guarantee that continuous measurements using new diagnostic methodologies will closely follow a Gaussian or any other particularly convenient statistical model. Indeed, biological outliers may signal rapid demineralization, and substantial measurement errors are more common with new than with established diagnostic systems. Consequently, careful attention should be paid to distributional shapes, and non-parametric approaches or other robust approaches to analysis may be of particular importance in the interpretation of clinical trial outcomes with the use of new diagnostic tools.

SUMMARY COMMENTS

Numerous methods are available for the analysis of ordinal categorizations or continuous data from caries-diagnostic tools. All analyses should properly account for correlation among measurements from the same subject, but the unit of analysis may be the site, tooth, or subject, as appropriate to answer a given scientific question. Scoring of ordinal categories is a very useful technique that benefits from agreement on basic conceptualization of the underlying biological process and the expected nature of a treatment effect, but does not require consensus on precise distances between categories along an underlying biological continuum. Dichotomies are special and extreme cases of scoring that may discard valuable information. Ordinal statistical models that avoid explicit scoring also may assume strong constraints on the nature of the treatment effect.

Caution is necessary in the design and analysis of studies of any treatment-outcome combination for which the measurable effect of the treatment during the observational time frame may vary substantially depending upon the baseline disease measurement, as apparently in the example described above. Such variation of the treatment effect with initial disease status renders results of statistical models based on general distributional shifts less relevant to the biological situation and more difficult to interpret.

Continuous measurements need not be normally distributed, and standard parametric analyses of data with very heavy tails can miss effects that are readily found with more robust approaches. Consequently, there is no guarantee that use of a site-level ordinal or continuous demineralization-based outcome measure will produce a more efficient clinical trial than would classic dichotomization by cavitation.

However, caveats notwithstanding, close attention to (i) measurement reliability, (ii) the nature of treatment differences that may reasonably be anticipated based on underlying caries biology and mechanisms of action, and (iii) the anticipated statistical properties of selected ordinal or continuous outcome measures should produce clinical trials that can be shorter and smaller than previously, because they use new diagnostic modalities to increase the harvest of information about treatment activity from each site and subject studied.

ACKNOWLEDGMENTS

The authors are grateful to Richard Chesters and Unilever Research for sharing the data used in the analytic examples, to Barry Katz for providing SAS® code to extract the ordinal variable analyzed in the "EXAMPLE" section, and to two anonymous referees for helpful comments.

FOOTNOTES

Presented at the International Consensus Workshop on Caries Clinical Trials, Glasgow, Scotland, January 7–10, 2002

REFERENCES

  • Agresti A (1984). Analysis of ordinal categorical data. New York: Wiley.
  • Agresti A (1996). An introduction to categorical data analysis. New York: Wiley.
  • Agresti A (2002). Categorical data analysis. 2nd ed. New York: Wiley.
  • Becker M (1998). Ordered categorical data. In: Encyclopedia of biostatistics. Vol. 4: Med-Pre. Armitage P, Colton T, editors. New York: Wiley, pp. 3186–3195.
  • Berry DA (1987). Logarithmic transformations in ANOVA. Biometrics 43:439–456.
  • Binder DA (1983). On the variance of asymptotically normal estimators from complex surveys. Internat Statist Rev 51:279–292.
  • Bowersox J (2001). National Institutes of Health consensus development conference statement: diagnosis and management of dental caries throughout life, March 26–28, 2001. J Am Dent Assoc 132:1153–1161.
  • Cohen ME (2001). Analysis of ordinal dental data: evaluation of conflicting recommendations. J Dent Res 80:309–313.
  • Featherstone JDB, Fried D (2001). Fundamental interactions of lasers with dental hard tissues. Med Laser Appl 16:181–194.
  • Goodman LA (1981). Association models and the bivariate normal for contingency tables with ordered categories. Biometrika 68:347–355.
  • Heeren T, D’Agostino R (1987). Robustness of the two independent samples t-test when applied to ordinal scaled data. Stat Med 6:79–90.
  • Katz BP, Huntington E (2004). Statistical issues for combining multiple caries diagnostics for demonstrating caries efficacy. J Dent Res 83(Spec Iss C):C109–C113.
  • Koch GG, Tangen CM, Jung JW, Amara IA (1998). Issues for covariance analysis of dichotomous and ordered categorical data from randomized clinical trials and nonparametric strategies for addressing them. Stat Med 17:1863–1892.
  • Stokes ME, Davis CS, Koch GG (2000). Categorical data analysis using the SAS® system. 2nd ed. Cary, NC: SAS Institute, Inc.
  • Stookey GK, Gonzalez-Cabezas C (2001). Emerging methods of caries diagnosis. J Dent Educ 65:1001–1006.
  • Wainer H (1976). Estimating coefficients in linear models: it don’t make no nevermind. Psycholog Bull 83:213–217.

Journal of Dental Research, Vol. 83, No. suppl 1, C103-C108 (2004)
DOI: 10.1177/154405910408301S21

