The Challenges of Validating Diagnostic Methods and Selecting Appropriate Gold Standards
Dept. of Dentistry and Oral Hygiene, University of Groningen, A. Deusinglaan 1, NL-9713 AV Groningen, The Netherlands; corresponding author, m.c.d.n.j.m.huysmans{at}med.rug.nl
ABSTRACT Caries diagnostic methods are usually methods for caries lesion detection and measurement. Caries lesions occur on a continuous scale of tissue damage, from subclinical surface changes to macroscopic cavities reaching the pulp. Any change of a lesion on this continuous scale offers the opportunity for the diagnosis of disease activity or remission. Research aimed at remineralizing agents may focus on lesions that are amenable to remineralization, and select a method that will measure small changes in early lesions. General caries management strategies depend on detecting all stages of lesion development, and methods covering early to late stages are preferred. This paper addresses some methodological issues in validating caries diagnostic methods. The available gold standards for caries lesions are discussed, with their suitability in different applications, and their "validity" as far as it is known or can be inferred. The gold standards are compared as far as their measurement of lesion parameters and reproducibility is concerned. Tentative conclusions are formulated, and recommendations for future research are given.
Key Words: dental caries diagnostic systems
INTRODUCTION
For a clear understanding of the subject of validation of caries diagnostic methods, we must first make sure that we know what we mean by caries. In recent years, the definition of caries has shifted from concentrating only on lesions, to concentrating on an out-of-balance dynamic system leading to net mineral loss. Lesions are of course still the outcome of this imbalance, but the diagnostic interest has shifted from "history of disease" (caries prevalence or DMFT, for instance) to "presently active disease" (caries incidence or mineral loss). In short, caries is a disease of mineral loss, and caries lesions are symptoms of this disease. Caries lesions occur on a continuous scale of tissue damage, from subclinical surface changes to macroscopic cavities reaching the pulp. Any change of a lesion on this continuous scale, whether forward (increasing damage) or backward (repair), offers the opportunity for the diagnosis of disease activity or remission.
Epidemiological surveys are still interested in caries experience, but increasingly they include not only end stages of lesion progression, but also early stages, to give a more complete picture of disease activity in a population. In the field of individual patient care, there is a need for diagnosing the actual disease, for the indication of preventive care, and for detailed detection of lesion stage, so that clinicians can judge the need for operative care. In caries clinical trials, the goal would be to measure relevant differences in progression with the smallest possible expense of time and patient numbers.
When looking at increasingly smaller increments of the process of lesion progression, we must ask ourselves the question, "How representative is that increment of the whole process?" Since the caries process involves alternating de- and remineralization, at what threshold of net mineral loss should it be termed "caries"?
Is the caries process the same in rate and mechanism over the whole range of lesion stages? In the planning of clinical trials for caries-preventive agents, similar questions must be answered. Does an agent that reduces the progression of early free smooth-surface lesions by 50% have the same effect throughout all the stages of lesion progression? Does it have the same effect on other tooth surfaces? It is the responsibility of the planner of clinical trials to consider whether the sample and the measured parameter have sufficient external validity for the results to be extrapolated to other populations.
WHICH PARAMETER OF A LESION TO MEASURE?
If one considers the definition of the caries process as a de-/remineralization imbalance leading to net mineral loss, it seems most logical to take mineral loss as the preferred parameter. Considering the direction of progression and the relationship between the defense reaction of the tooth and depth of progression, depth of mineral loss may be as relevant as volume of mineral loss. It is open to discussion whether depth measurements should be absolute (µm or mm) or relative (% of enamel, % of dentin). The latter is used predominantly, and it normalizes the results for various layer thicknesses of enamel and dentin. This may be useful, because treatment decisions are often based on relative lesion depth, and enamel thickness, for instance, has considerable variability. However, a comparison of lesion progression rates will then be less straightforward.
For most purposes, we would like to measure or detect lesion progression on a moderately fine scale. We are used to dividing lesion progression into about 4 or 5 stages, but at least double that number seems desirable. For clinical trials, the measurement of very small changes in mineral content could also be suitable. The diagnostic methods that are being developed for that purpose are primarily targeted at early enamel lesions.
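The trade-off between absolute and relative depth described above can be made concrete with a small calculation. The following is only an illustrative sketch: the function name and all depth and thickness values are hypothetical and are not taken from any of the cited studies.

```python
def relative_depth(lesion_depth_um: float, layer_thickness_um: float) -> float:
    """Express lesion depth as a percentage of the local layer thickness."""
    if layer_thickness_um <= 0:
        raise ValueError("layer thickness must be positive")
    return 100.0 * lesion_depth_um / layer_thickness_um


# Hypothetical example: two enamel lesions with the same absolute depth (400 µm)
# in enamel layers of different thickness (800 µm and 1600 µm).
thin_enamel = relative_depth(400.0, 800.0)
thick_enamel = relative_depth(400.0, 1600.0)

# The relative scores differ by a factor of two although the absolute
# depth, and hence the absolute amount of progression, is identical.
print(f"thin enamel:  {thin_enamel:.0f}% of enamel")
print(f"thick enamel: {thick_enamel:.0f}% of enamel")
```

The same absolute progression thus maps onto different relative scores, which is why progression rates expressed in relative units are harder to compare between teeth or surfaces.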
Two other parameters have been receiving attention recently: lesion activity and lesion infection (primarily of dentin). The concept of measuring lesion activity is very attractive: It would give instant, localized disease diagnosis. However, by its very name, activity implies a time-dependence. At most, a single measurement or assessment will give a derived value related to recent mineral loss. Suggested assessments have not yet been validated against actual progression. Most clinical visual criteria for detecting arrested lesions (color, hardness, surface gloss) may take so long to develop that longitudinal monitoring would have been as effective. However, as suggested earlier, activity measurements may aid in patient or lesion selection for clinical trials (Pitts, 2004).
The idea of measuring lesion infection originates mainly from clinical, operative dentistry, for determining caries excavation depth. The relationship between lesion infection and lesion progression is not yet completely clarified, and therefore using infection as a lesion progression measurement is not recommended. However, besides demineralization, there is another feature of the caries process that is associated with lesion infection: collagen breakdown. This feature may provide future measurement options for lesion progression in dentin.
VALIDATION
Recently, the confusion between accuracy and validity of a diagnostic method was highlighted (Ten Bosch and Angmar-Månsson, 2000). Accuracy is the ability of a method to measure or detect what it purports to measure or detect. One determines this by comparing the method with a reference method of measuring or detecting that same parameter. Validity of a diagnostic method is its ability to predict or determine the disease for which it is designed. It is not altogether unexpected that, in caries diagnosis, the two have become almost synonymous. Where caries was defined as lesions, the accurate detection of lesions was also a valid diagnosis.
Following the new definition of caries, an accurate measurement of the loss of very small amounts of mineral from a tooth surface would also yield a valid diagnostic method. But it implies that "validation studies" of caries diagnostic methods, comparing measured lesion extent or stage with a gold standard of lesion extent or stage, are really accuracy studies. In the following, however, we will continue to use the term validation for this process. True validation of such diagnostic methods can be achieved only clinically, where the real disease occurs.
A validating method should fit the aim of the diagnostic method under evaluation (Ten Bosch and Angmar-Månsson, 2000). Where dichotomous analysis has been logical from the old standpoint of disease diagnosis, it is almost useless from the standpoint of accuracy of measurements. There is only one situation where it may be suitable: if we are interested only in the formation of new initial lesions. Validating ordinal or even continuous diagnostic methods with a dichotomous or dichotomized gold standard is all too common and very hard to defend, because it reduces the continuous lesion progression to only 2 stages. Even with Receiver Operating Characteristic (ROC) analysis, which uses many or all values of the diagnostic method, this is hardly appropriate.
It is important to take into account the parameter that the diagnostic method is evaluating. Take for instance the new DIAGNOdent device, which probably measures some parameter of lesion infection or infiltration by bacterial products. This method has been validated by histological lesion depth and mineral loss (Lussi et al., 1999; Shi et al., 2001). However, would not an intermediate step between diagnostic method and gold standard be helpful? That is, calibrating the instrument for what it purports to measure, infection or infiltration, before establishing the correlation between lesion infection or infiltration and lesion progression.
This would yield more information on the suitability of the detected parameter for the aim of caries lesion detection or monitoring, and could also suggest other applications for the diagnostic device.
In most diagnostic method validations, it is necessary that the distribution of the disease in the sample reflect the distribution in the population in which the diagnostic method will be used (Ten Bosch and Angmar-Månsson, 2000). If it does not, calculated values for sensitivity and specificity may be either underestimated or overestimated with respect to application in the target population: underestimated when the sample contains too many "borderline" cases, and overestimated when too many "obvious" cases are included (Lussi, 1996; Vaarkamp et al., 2000). When ROC analysis is performed for a continuous or ordinal diagnostic method, the review authors suggested that it sufficed if the sample was well-distributed (Ten Bosch and Angmar-Månsson, 2000). However, when the target population has a disease distribution that is distinctly not well-distributed, the sensitivity and specificity values, and thus the ROC area, will still be unrealistic.
Optimal validation of a continuous diagnostic method involves a gold standard which is also continuous. An equally distributed sample will then ensure that the method is calibrated over the whole measuring range. This requirement of the sample being representative or even only well-distributed may be one of our biggest challenges, because extracted teeth are ever harder to come by and constitute an inherently biased group.
GOLD STANDARDS
Gold Standards for Changes in Mineral Content of Small Lesions
Gold Standards for (Stages in) the Entire Caries Lesion Process
The methodology of histological validation shows large variations (see Table). PLM and TMR seem to have been used mainly for the purpose of detecting zones of different mineral content within the lesion. For this purpose, PLM requires section imbibition with different media, yielding limited data on mineral content, and not improving the detection of lesion depth very much (Wefel et al., 1985). Traditional TMR is less suitable for complete process validation, since the procedure uses different section thickness and exposure parameters for enamel and dentin. This problem could be overcome by combining the information from two exposures. Radiographs of thicker sections (> 100 µm), either by Wavelength-Independent Microradiography (WIM) or microfocal macroradiography (MaR), may be an alternative (Herkströter and Ten Bosch, 1990; Ricketts et al., 1998). Normal radiographs of thick sections (± 700 µm) proved inadequate as a gold standard (Hintze et al., 1995). CLSMdye for entire process validation and CLSM with dentin autofluorescence (blue excitation light, yellow emission light) for dentin caries validation have been described recently (Banerjee and Boyde, 1998; Ricketts et al., 1998). Neither, however, has yet been adequately calibrated against demineralization.
Few formal comparisons of methods have been undertaken, but general observations have been made when more than one method was used. PLM was reported to detect smaller lesions than TMR (Mortimer, 1964; Fejerskov et al., 1976). For dentin (root) caries, a comparison among LM, PLM, and TMR showed qualitatively similar results, although certain features could be detected only with one method, e.g., the surface layer by TMR (Wefel et al., 1985). In one of the first studies of validation methods, SM of sections and radiography of sections, performed by four observers, were compared (Wenzel et al., 1994).
The agreement between the two methods was low: kappa values for the observers ranged between 0.16 and 0.53, with histology overall scoring deeper lesions (Wenzel et al., 1994).
Four validating methods were compared in a study that included unerupted third molars to provide a definitely negative (sound) control (Hintze et al., 1995). The methods evaluated were SM, film radiography (FR), and naked-eye inspection (NEI) of the same 700-µm sections, and TMR of selected sections reduced to 150 µm. The authors concluded that SM would be the method of choice, since it alone did not detect caries in the unerupted teeth, while detecting the highest number of lesions in the erupted teeth. It should be noted that FR and NEI are relatively uncommon methods of validation, and that the TMR radiographs were overexposed for dentin, making dentin lesion detection almost impossible.
In a study that compared MaR and CLSMdye of the same sections, the two standards agreed completely on a rough scoring scale (sound, enamel lesion, dentin lesion). Rank correlation between depth measurements was 0.93, but the CLSMdye measurements were significantly less deep (average difference, 0.41 mm) than the MaR measurements for dentin lesions (Ricketts et al., 1998). A later study compared CLSMdye with LM inspection, though only for enamel lesions (Ferreira-Zandoná et al., 1998). Reported data are limited, but the authors report a lower threshold for caries detection by CLSM and 78% agreement on lesion presence between the methods.
A large study comparing gold standards was published recently by ten Cate and co-workers (2000). In it, 100 exfoliated, deciduous teeth were subjected to 10 validating procedures (TMR, LM, laboratory-QLF, and PLM) in 5 different institutes. The validating parameter was simply the presence or absence of a lesion, and all the lesions were small. Agreement between and among different standards at the same institute was very low: kappa between 0.05 and 0.33.
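The MaR versus CLSMdye comparison above combines a high rank correlation with a systematic depth difference. The sketch below illustrates, on made-up numbers, why a rank correlation by itself says nothing about such an offset. The depth values are hypothetical and are not data from Ricketts et al. (1998); the helper functions are written out here only to keep the example self-contained, and they assume there are no tied values.

```python
from statistics import mean


def ranks(values):
    """Return ranks 1..n for the values (assumes no ties, as in the toy data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical lesion depths (mm) for the same five sections, scored by two
# gold standards; method B reads consistently 0.4 mm shallower than method A.
method_a = [0.9, 1.4, 1.8, 2.3, 2.9]
method_b = [depth - 0.4 for depth in method_a]

rho = spearman(method_a, method_b)
mean_difference = mean([a - b for a, b in zip(method_a, method_b)])

print(f"rank correlation: {rho:.2f}")
print(f"mean difference:  {mean_difference:.2f} mm")
```

In this toy case the ordering of the sections is identical for both methods, so the rank correlation is perfect even though every depth differs by the same offset. Reporting the mean difference alongside the correlation, as the cited study did, exposes that offset.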
Reproducibility/Repeatability of Gold Standards
A few recent studies have included a report on the reproducibility of their gold standard. Most of them involve LM or SM inspection of sectioned teeth. In a scoring system with only 3 scores, a repeated measurement of 30 teeth resulted in only 1 change (Ashley et al., 1998). Reported intra-observer kappa values vary considerably: between 0.37 and 0.82 (Pitts et al., 2001) and 0.93 (Cayley and Holt, 1997). Reported inter-observer kappa values for microscopic inspection are 0.75 (D1) and 0.74 (D3) (Fyffe et al., 2000), or between 0.44 and 0.64 (Pitts et al., 2001). A study comparing microscopic and radiographic validations (radiographs of thick sections) showed similar kappa values: microscope, 0.47 to 0.60; and radiography, 0.44 to 0.76 (Wenzel et al., 1994).
In the previously mentioned large comparison of gold standards, agreement for the same standard used at different institutes was reported as a range of positive scores: TMR, 11–61; microscopic inspection, 27–37; and in vitro QLF, 21–58 (ten Cate et al., 2000). Complete agreement on positive scores (3 institutes) was reached only for about 9, 11, and 11 teeth, respectively. It must be remarked that the TMR procedure was quite different at the different institutes, whereas the microscopic procedure was very similar. The authors emphasized that no attempts were made to reach consensus between and among the institutes, nor were specimens checked afterward for signs of damage during transportation or handling. Both actions might have led to higher rates of agreement.
In the studies above, the gold standard was used as an ordinal system, and agreement was typically analyzed as for a dichotomous or ordinal dataset, according to Cohen's kappa. It is advisable to use the gold standard instead as a continuous measure of mineral loss or lesion depth. One study which measured lesion depth with SM showed a 0.97 correlation coefficient between two independent observers (Vaarkamp et al., 1997b).
We must conclude that, although we have a gold standard for early lesions, a well-calibrated and reproducible gold standard for the complete caries process is not available.
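Most of the reproducibility figures quoted above are Cohen's kappa values. As a reminder of what that statistic measures, the following minimal sketch computes unweighted kappa for two observers. The scores are invented for illustration (0 = sound, 1 = enamel lesion, 2 = dentin lesion) and do not reproduce any of the cited datasets.

```python
from collections import Counter


def cohens_kappa(scores_a, scores_b):
    """Unweighted Cohen's kappa for two observers scoring the same items."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("both observers must score the same, non-empty set of items")
    n = len(scores_a)
    # Observed proportion of items on which the two observers agree.
    observed = sum(a == b for a, b in zip(scores_a, scores_b)) / n
    # Agreement expected by chance, from each observer's marginal frequencies.
    count_a, count_b = Counter(scores_a), Counter(scores_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: both observers used one identical category
    return (observed - expected) / (1.0 - expected)


# Hypothetical scores for ten sections by two observers.
observer_1 = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
observer_2 = [0, 0, 1, 1, 1, 2, 2, 2, 2, 0]

kappa = cohens_kappa(observer_1, observer_2)
print(f"kappa = {kappa:.2f}")
```

Here the observers agree on 7 of 10 sections, but after correction for chance agreement kappa is only about 0.55. Because unweighted kappa treats every disagreement as equally serious, it also discards the information that a one-stage difference is less severe than a two-stage difference, which is one argument for analyzing the gold standard on a continuous scale.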
Independence of Detection Method and Gold Standard
EXTRAPOLATION OF VALIDATION TO CLINICAL APPLICATION
Diagnostic methods may depend on clinical factors to very different degrees, and clinical studies should be performed to explore whether our methods measure lesion extent as accurately in vivo as they do in vitro. Several methods are available (Pine and Ten Bosch, 1996). A meta-analysis showed that the diagnostic performance results for in vivo studies were likely to be higher than for in vitro studies of the same diagnostic method (Van Rijkom and Verdonschot, 1995). Intuitively, it seems unlikely that this represents actual better clinical performance; it may well be related to an interaction with the different choice of gold standard. To optimize the extrapolation of in vitro studies to the clinical situation, we should (a) take into account the requirements for sample selection and analysis as mentioned before, and (b) simulate clinical circumstances in laboratory validation studies as much as is expected to be of influence and as is reasonably achievable. Furthermore, it seems a good idea to attempt to validate our clinical gold standards, for instance by using an in vitro set-up and a non-destructive "real" gold standard. Ultimately, the true validation of diagnostic methods must be by clinical disease progression itself.
CONCLUSIONS AND RECOMMENDATIONS
The tentative conclusion from the above must be that radiographic methods seem to be the most suitable gold standards, for their direct relationship to mineral loss and the volume of data they provide. For quantifying mineral content in small lesions, the standard of choice is TMR. However, for quantifying lesion depth in larger lesions, the problems of exposure parameters have not been solved. Double exposures, or thicker sections and WIM, or macroradiography may be an alternative, but none has been sufficiently calibrated and documented.
Also, for occlusal caries lesions, the effect of the irregular morphology may render radiographic methods less valuable. Microscopic inspection (usually stereomicroscopy) of sections is the most-used gold standard for lesion depth validation. Reports about reproducibility vary widely, but the use of a continuous, instead of an ordinal, analysis may improve it. Visual determination of lesion extent in dentin, however, is not well-defined or -calibrated. Dye or quinoline imbibition of sections may be of assistance and should be investigated further to arrive at a standardized methodology. Sectioning of the sample is necessary for all these gold standards and remains a factor of variation. The section analyzed may not correspond exactly with the site measured by the diagnostic method, or represent only part of it (as in volume measurements like the laser fluorescence measurements), and considerable parts of the tooth are lost in the sectioning process (typically from 120 to 250 µm per slice). Diagnostic methods may therefore be expected to perform better in longitudinal monitoring than a cross-sectional validation would imply (Ten Bosch and Angmar-Månsson, 2000). If sectioning is to be avoided, the future lies with non-destructive validating methods. The most appropriate at this moment is x-ray microtomography (Anderson et al., 1996). This technique has reached acceptable resolution but is still very expensive and time-consuming. Magnetic resonance imaging may be an alternative non-destructive gold standard, but resolution is still much too low. We suggest that future studies into caries validation should include:
FOOTNOTES
Presented at the International Consensus Workshop on Caries Clinical Trials, Glasgow, Scotland, January 7–10, 2002
Journal of Dental Research, Vol. 83, No. suppl 1, C48-C52 (2004)

