Validation of Self-reported Periodontal Disease: A Systematic Review
1 Department of Oral Health Policy and Epidemiology, Harvard School of Dental Medicine, 188 Longwood Avenue, Boston, MA 02115, USA; Correspondence: * corresponding author, kjoshipura{at}hsdm.harvard.edu
Self-report is an efficient and accepted means of assessing many population characteristics, risk factors, and diseases, but has rarely been used for periodontal disease (chronic periodontitis). The availability of valid self-reported measures of periodontal disease would facilitate epidemiologic studies on a much larger scale, allow for integration of new studies of periodontal disease within large ongoing studies, and facilitate lower-cost population surveillance of periodontitis. Several studies have been conducted to validate self-reported measures for periodontal disease, but results have been inconsistent. In this report, we conducted a systematic review of the validation studies. We reviewed the 16 studies that assessed the validity of self-reported periodontal and gingivitis measures against clinical gold standards. Seven of the studies included self-reported measures specific to gingivitis, four included measures only for periodontitis, and five included both gingivitis and periodontal measures. Three of the studies used a self-assessment method where they provided the patient with a detailed manual for performing a self-exam. The remaining 13 studies asked participants to self-report symptoms, presence of periodontal disease itself, or their recollection of a dental health professional diagnosing them or providing treatment for periodontal disease. The review indicates that some measures showed promise, but results varied across populations and self-reported measures. One example of a good measure is, "Has any dentist/hygienist told you that you have deep pockets?", which had a sensitivity of 55%, a specificity of 90%, positive predictive value of 77%, and negative predictive value of 75% against clinical pocket depth. Higher validity could be potentially obtained by the use of combinations of several self-reported questions and other predictors of periodontal disease.
Key Words: systematic review, self-report, validity, periodontal disease, gingivitis.
Self-report is an efficient and accepted means of assessing many diseases, such as cancer, cardiovascular disease (Newell et al., 1999), and juvenile rheumatoid arthritis (Wright et al., 1994), as well as risk factors for disease, such as diet (Willett, 1990; Rimm et al., 1992), physical activity (Wolf et al., 1994), high blood pressure (Tormo et al., 2000), and general health (Sheridan et al., 1998). In the United States, The Behavioral Risk Factor Surveillance System (BRFSS), a self-report survey system established in 1984 by the Centers for Disease Control and Prevention (CDC), is used extensively at the state and local levels to survey and track trends in diseases such as heart disease, cancer, stroke, and diabetes, and risk factors such as obesity, and has been used in recent years to monitor trends in dental visits, dental cleanings, and tooth loss (Battelle Memorial Institute, 1999). Self-report is used for overall oral health in other studies as well. For example, the Geriatric Oral Health Assessment Index (GOHAI) has been validated for use in populations diverse in ethnicity and age (Atchison and Dolan, 1990; Atchison et al., 1998; Tubert-Jeannin et al., 2003). Nonetheless, self-report has rarely been used for periodontal disease (chronic periodontitis). Investigators have questioned whether self-report can be used for this purpose. Studies evaluating the validity of self-reported measures for periodontal disease and gingivitis have reported inconsistent results. The development, implementation, and evaluation of public health interventions for periodontal disease will require that the diseases be monitored at several levels of the population. Current measures of periodontal disease are extremely resource-intensive and cannot be used in several state-based surveillance systems. The existence and use of valid, low-cost, and low-resource self-reported measures of periodontal disease would be beneficial in a variety of ways. 
It would facilitate epidemiological studies of periodontal disease on a much larger scale than is feasible with the present clinical measures, since much larger study populations could be reached by surveys rather than by clinical examination. Additionally, questions regarding periodontal disease could easily be added to ongoing studies to evaluate associations with other diseases and conditions. The use of self-report would allow for an easier and low-cost method of obtaining data for research and would support the creation of oral health programs (Siegal et al., 1988; Kallio, 1996). Self-assessment can additionally serve as a motivational tool for good oral hygiene (Kallio, 1996). Finally, self-reported measures would allow for surveillance of the periodontal condition of populations over time, in national, state, or regional surveillance programs. To date, no comprehensive review of the field has been published. In this report, we have reviewed all of the studies validating self-report of periodontal or gingival diseases. We did not necessarily expect to find a clear "yes" or "no" answer as to whether self-reported periodontal measures were valid. Our objective was not only to summarize the validity of different self-reported measures in different populations, but also to identify methods and measures which show promise for use and/or further development, testing, and refinement.
We sought to identify all studies that evaluated the validity of self-report of periodontal and gingival diseases using clinical measures as the standard.
Literature searches were performed via the Ovid Web Gateway (2000) Internet interface for MEDLINE. The search strategy (Table 1
We reviewed each of these 16 studies, and extracted data from each study in the following fields: population characteristics and sampling criteria, method of self-report (self-assessment, questionnaire, interview), self-reported questions, clinical gold standards, and results of the validation study. Information regarding the patients' self-reported signs, symptoms, perceptions, or knowledge of gingivitis or periodontal disease or treatment was included, whereas measures regarding perceived treatment needs or family history were discarded. A single abstracter performed the first abstraction, with consultation from a second author, as is accepted in the literature (Horvitz-Lennon et al., 2001). The second author subsequently verified all the results abstracted. Where there were discrepancies, they were discussed and resolved. We synthesized the information regarding the studies and the results into three tables. Self-reported questions are grouped according to topic, and measures are described according to the specific wording of the questions to the extent provided by the authors, along with the clinical gold standards that were used for validating the self-reported questions. Results presented are generally as reported by the authors, and include p-values, percent agreement, correlation coefficients, regression coefficients, sensitivity and specificity, predictive values, and simple descriptive measures. We have calculated additional statistics based on the data provided in the manuscript when needed and possible, as noted. We were unable to perform summary analyses, such as ROC curves, of the studies under review, due to the inconsistency of statistical measures reported. We considered a measure to have good validity when the sum of either sensitivity plus specificity or positive plus negative predictive values was 120% or above. This value was arbitrarily chosen; however, it represents the levels of the statistics that are accepted as good validity.
Changing the threshold of the gold standard definition would increase specificity at the cost of sensitivity, or vice versa. In this context of validation of measures that could be used for etiologic studies, surveys, or surveillance, it is hard to know the relative importance of sensitivity and specificity. Hence, it is important to look at the combination of sensitivity plus specificity or predictive values.
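The validity criterion described above can be made concrete with a short sketch. It assumes each self-reported measure is cross-classified against its clinical gold standard in a 2 x 2 table; the class name, the function name, and all counts below are hypothetical illustrations and are not taken from any of the reviewed studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from cross-classifying self-report against the clinical standard."""
    tp: int  # self-report positive, clinically positive
    fp: int  # self-report positive, clinically negative
    fn: int  # self-report negative, clinically positive
    tn: int  # self-report negative, clinically negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


def shows_good_validity(table: TwoByTwo, threshold: float = 1.20) -> bool:
    """Apply the review's rule: either paired sum must reach 120% (1.20)."""
    paired_sums = (
        table.sensitivity() + table.specificity(),
        table.ppv() + table.npv(),
    )
    return any(total >= threshold for total in paired_sums)


# Hypothetical counts, chosen only to illustrate the rule.
example = TwoByTwo(tp=55, fp=10, fn=45, tn=90)  # sensitivity 0.55, specificity 0.90
weak = TwoByTwo(tp=30, fp=20, fn=70, tn=80)     # sensitivity 0.30, specificity 0.80

print(shows_good_validity(example), shows_good_validity(weak))
```

Because the rule looks at a sum, a measure with low sensitivity can still qualify when its specificity is high, which is why the paired statistics are considered together rather than one at a time.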
The 16 studies gave us a large array of results, and we have presented only the most pertinent results from each study (Tables 3
The 16 studies are briefly summarized in Table 2. The studies varied by population characteristics. Six of the studies were conducted among school-age children (Nakashima et al., 1988, 1989; Schwarz, 1989; Kallio et al., 1994; Kallio, 1996; Taani and Alhaija, 2003), while ten were conducted among adults. Of the 16 reports, only three, all by the same group, were conducted in the United States (Joshipura et al., 1996, 2002; Pitiphat et al., 2002). Two of these reports consisted of populations of health professionals: one a cohort of dentists (Joshipura et al., 1996), and the other a cohort of non-dentist health professionals (Joshipura et al., 2002). The other US report (Pitiphat et al., 2002) performed separate validations among two different populations: one a group of veterans, and another, consecutive patients at a dental school clinic. Thus, there were actually 17 separate populations evaluated in the 16 publications. Of the 16 reports, three used a specified self-assessment method (Glavind and Attström, 1979; Kallio et al., 1990; Kallio, 1996). In these studies, participants were given written manuals detailing the procedures they should use for self-assessment and were asked to report their findings on the forms provided. The remaining 13 publications assessed symptoms or awareness of disease conditions by means of a questionnaire for self-report. The questionnaires were either administered in writing at the time of the patient visit (Kallio et al., 1994; Gilbert and Nuttall, 1999; Taguchi et al., 1999; Pitiphat et al., 2002; Taani and Alhaija, 2003), distributed by mail (Tervonen and Knuuttila, 1988; Schwarz, 1989; Joshipura et al., 1996, 2002; Unell et al., 1997; Buhlin et al., 2002), or given as an interview, conducted in person (Nakashima et al., 1988, 1989) or by telephone (Pitiphat et al., 2002).
The results of the validation studies are presented in Table 3
Results showing good validity are indicated by gray highlighting in Tables 3
As seen in Table 3
Question 1 in Table 3
Question 4 in Table 3
Often, questions were validated by more than one clinical gold standard. Question 2 in Table 3
As seen in Table 4
Question 1 in Table 4
Two measures for gingivitis showed good validity, as indicated by gray highlighting. Both measures were from the category "Bleeding from Gums". One measure was from the study by Gilbert and Nuttall (1999). The measure, "Gums have bled recently" (question 9 in Table 4
As indicated by gray highlighting in Tables 3
Sixteen (80%) of the 20 self-reported measures for periodontal disease in Table 3 The self-report questions repeated in two or more studies often showed conflicting results. For example, self-reported bleeding from gums was reported in six studies, and appropriate statistics were given for three of these. Of these, two studies (Gilbert and Nuttall, 1999; Buhlin et al., 2002) found that measure to be valid, and one did not. The two studies that showed good validity used different clinical measures, although both were measures of gingival bleeding, and had slight differences in wording across the self-reported measures ("gums have bled sometimes" and "gums have bled recently"). However, there does not seem to be any obvious factor explaining the discrepancy between studies that did and those that did not find self-reported measures to be valid, though many factors are likely to have an effect. Critics often question whether self-report is a valid measure at all. Self-report is considered a suitable measure in routine use for many different conditions and diseases. For example, the Behavioral Risk Factor Surveillance System (BRFSS) assessed self-reported diabetes. Studies have compared self-reported estimates of diabetes from BRFSS, with fasting serum glucose levels and medical records as gold standards (Bowlin et al., 1993; Martin et al., 2000; Nelson et al., 2001). The sensitivity values ranged from 67% to 80%, and specificity ranged from 98% to 99%, suggesting that persons without diabetes provided valid answers. Based on data for 1995, estimates for diabetes prevalence were 4.7% in BRFSS and compared well with a prevalence of 4.5% in the National Health Interview Survey, which used clinical information (Battelle Memorial Institute, 1999). Thus, self-reported diabetes appears to be valid in this context.
Other measures used in the BRFSS show validity similar to, or even weaker than, the measures for self-reported periodontal disease and gingivitis. Validation studies of self-reported hypercholesterolemia compared with clinical measurements resulted in sensitivity of 43% and specificity of 86% (sum = 129%). Comparing self-reported blood pressure with medical records yielded a sensitivity of 99% and specificity of 23% (sum = 122%). In contrast, the best measure for self-reported periodontal disease was by Buhlin et al. (2002), asking, "Has any dentist/hygienist told you that you have deep pockets?" (question 7, Table 3 To convey the extent of the impact of misclassification, one would need to compare the true prevalence, as defined by the clinical gold standards, with the observed prevalence from the self-reported measures. For example, in a hypothetical population of US adults, and a single self-report question with positive predictive value of 76% and negative predictive value of 74%, if the true prevalence is 35%, then the prevalence in the self-reports would be 19% (Barron, 1977; Flegal et al., 1986; Joshipura, 1995). Thus, there would be an underestimation of periodontal disease by 46%. Hence, if we use self-reports for estimating prevalence, we would need to use the formulae for correcting the prevalence estimates from self-report to arrive at the true prevalence.
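The hypothetical prevalence example above can be reproduced approximately with a minimal sketch. It assumes the standard identity that the true prevalence equals PPV times the observed (self-reported) prevalence plus (1 - NPV) times the remaining proportion; the formulae in the cited references may be expressed differently, and the function names here are illustrative only.

```python
def true_prevalence(observed: float, ppv: float, npv: float) -> float:
    """Correct a self-reported prevalence using predictive values.

    Of those reporting disease, a fraction `ppv` truly have it; of those
    reporting no disease, a fraction `1 - npv` truly have it.
    """
    return ppv * observed + (1.0 - npv) * (1.0 - observed)


def relative_underestimation(observed: float, true: float) -> float:
    """Share of the true prevalence that the self-report misses."""
    return (true - observed) / true


# Values from the hypothetical example in the text.
observed = 0.19
corrected = true_prevalence(observed, ppv=0.76, npv=0.74)
shortfall = relative_underestimation(observed, corrected)

print(f"corrected prevalence: {corrected:.3f}")
print(f"relative underestimation: {shortfall:.1%}")
```

Under this assumed identity, a self-reported prevalence of 19% corresponds to a true prevalence of about 35.5% and an underestimation of roughly 46%, in line with the rounded figures quoted in the text.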
Validity is likely to vary across the types of questions asked. Severe measures of periodontal disease, such as mobility, may be easier for the patients to notice in themselves. For example, self-reported "Think teeth loose or wobbly" [question 12 in Table 3 Three measures in our review evaluated the validity of self-reported mobility. One measure, self-assessed highest recorded tooth mobility score compared with professionally determined mobility (Glavind and Attström, 1979), showed good validity, based on our calculations from the authors' reported statistics, with sensitivity of 92% and specificity of 53% (SN + SP = 145%). The second measure, "think teeth loose or wobbly", compared with clinical mobility (Gilbert and Nuttall, 1999) showed good overall validity, based on our criteria of sensitivity + specificity > 120%. Although the sensitivity is only about 30% compared with the clinical gold standards, the > 90% specificity indicates that this can in fact be a reasonable measure, especially if combined with another measure that shows very high sensitivity. Hence, mobility does seem to be a relatively good measure. The third measure included mobility but combined it with bleeding, "self-assessed number of pockets exhibiting bleeding or mobility compared to (sic) clinically determined pocket depth" (Glavind and Attström, 1979), and did not report appropriate statistics for us to determine validity.
Not only is the idea behind the question important, but the specific wording also plays a role in making a patient understand what is being asked, and affects his or her ability to answer the question. For example, asking a patient, "Do you have periodontal disease?" is different from asking him or her, "Do you have gum problems?" or even, "Do you have gum disease?". Depending on the patient's background, different terms may have very different meanings. Wording may also help the patient to answer a question. Self-reported "Told by dentist/hygienist have gum disease" may trigger the memory of being in the dentist's office, enhancing the patient's ability to answer the self-report accurately. Accordingly, all three measures about professional diagnosis of periodontal disease show good validity (questions 7, 8, and 9 in Table 3 The manner in which the self-reported measure is asked or reported by the patient may also play a role in determining its validity. Having a patient follow a self-assessment protocol requires more discipline and skill, and these measures might be biased by the types of patients who can adhere to the protocol. In the reports compiled, self-assessment measures were used to determine toothpick- or toothbrush-induced bleeding (Glavind and Attström, 1979; Kallio et al., 1990; Kallio, 1996) or tooth mobility (Glavind and Attström, 1979). However, we cannot infer anything regarding the validity of self-assessment from this report: Two studies (Glavind and Attström, 1979; Kallio et al., 1990) did not report statistics appropriately for us to determine validity of their studies, and the other study (Kallio, 1996) did not show good validity. Written questionnaires might be better measures for some populations, although education level certainly determines the groups for which this would work best. It is likely that patient motivation and concentration may be lower when completing a mailed questionnaire vs. one given in person at the time of a visit to the dentist's office. Six of the seven studies that used a written questionnaire at the time of the patient visit and reported the appropriate statistics showed good validity. All four studies using a mailed questionnaire and reporting the appropriate statistics showed good validity. Telephone vs. live interviewing methods could also affect the efficacy of the self-reported measure. Adequate statistics were not given for us to compare the validity of telephone vs. personal interviewing tactics. The population studied is likely to play a role in the validity. Population characteristics such as disease status, socio-economic status (SES), and dental care utilization are all likely to affect the validity of self-report. Subjects who face more disease might be more aware of their periodontal status. SES certainly plays a role in the utilization of dental care, and, as stated earlier, it may be difficult for a person who does not visit the dentist to answer questions regarding professional diagnosis of periodontal disease. Additionally, awareness and perceptions of periodontal condition may differ among different SES levels and different levels of dental care utilization. Several of the measures reviewed showed low sensitivity, and this may be due to low dental care access and utilization. Patients who do not visit the dentist are less likely to be aware of their periodontal condition. Presumably, the sensitivity for these measures would be higher in populations with higher levels of dental care utilization. Dental care utilization affects a population's ability to self-report its oral condition. For example, there is recent evidence that dental care is available and used widely in the US (Vargas et al., 2003). The 2002 National Health Interview Survey by the CDC found that 87% of adults aged 18 years or older had contacted a dentist or other dental health professional within the preceding five years (Lethbridge-Cejku et al., 2004).
The 2002 BRFSS found that 71% of adults in a population over 18 (weighted to reflect the US population) had had a dental cleaning, and 70% had visited the dentist in the preceding 12 months. Only 28% had not visited the dentist or had a dental cleaning in the previous 12 months. High access to care may make an American population more adept at reporting its periodontal condition. Seven of the 12 self-reported measures for periodontal disease showing good validity were among US populations, although three of these measures were among health professionals, which might account for their better validity.
As seen in Tables 3
Additionally, besides the variations that occur in validating the self-reported measures with different clinical measures, we must question which clinical measures are most appropriate. Clinical measures should be similar in the context of the self-reported measures which they validate. For example, clinical measures specific to periodontal disease, such as attachment loss or bone loss, should be used to validate questions that are specific to periodontal disease. Several studies reported validation results for clinical measures that did not match the self-reported measure in symptoms or severity of disease, and this could explain the lack of good validity in these cases. For example, question 9 in Table 4 Determining which clinical measures are most appropriate for validation is a difficult task, and even more so when one considers what exactly makes a clinical measure valid. Clinical gold standards themselves lack inherent standardization; thus, the definition of periodontal disease varies according to which measure was used for diagnosis. Clinical measurements are difficult to standardize, due to variations in the exact site of placement of the probe, probing force, angulation, patient discomfort, degree of inflammation, and bleeding (Pihlstrom, 1992), and, when radiographs are used, in the angulation and technique. Additionally, there is no universally accepted threshold of periodontal disease, and comparisons with different thresholds of attachment loss, bone loss, or pocket depth will give different levels of validity. Complicating matters more, the clinical measures could themselves be considered surrogate endpoints (Prentice, 1989). For example, obesity, hypertension, and hypercholesterolemia are surrogate endpoints for the true outcomes of cardiovascular morbidity and mortality (Psaty et al., 1999). 
Gingival bleeding, pocket depth, mobility, and radiographic bone loss are measures that we have available to tell us a patient's risk for developing the true endpoints of periodontal disease. One can consider the true endpoints for periodontal disease as the health outcomes that result, such as tooth loss (Hujoel et al., 1997) leading to loss of function, pain, or loss of aesthetics. However, such endpoints cannot easily be measured in a standardized way. Loss of function and pain must be reported by the patients themselves; thus, self-report could be considered better than clinical measures if one considers function and patients' perceptions as the gold standards of periodontal disease. Further work must be done before the validity of self-report for periodontal disease and gingivitis can be determined. Investigators conducting future research in this area should keep in mind the results of previous studies as guidance. The studies above do not cover the breadth of possible questions that could be asked. Studies must be done in which several variations of the same question are compared, or combinations of questions are examined. The studies we have tabulated all had adequate sample size, with 60 or more participants, and were generally cross-sectional. However, we are limited by the few populations examined in the 16 studies, and future work should examine a greater diversity of populations. Only seven of the 16 studies were conducted on random samples; hence, investigators should keep in mind the greater generalizability the study may have with random sampling, when feasible. Overall, the specificity of measures was high, while sensitivity was low. The sensitivity might be improved by the addition of questions in parallel, rather than in series. If a patient incorrectly responds "no" to the original question, further questions are unable to detect the presence of a periodontal condition.
Allowing each question to be asked separately, although seemingly redundant, will allow each question its fair chance at detecting the condition. Several of the manuscripts that we reviewed did not contain all of the details that we wished to extract to make a true systematic comparison of the studies. There are several components of validation studies that would have allowed us to better analyze and compare the studies reviewed. A comprehensive format, such as that outlined in the Standards for Reporting of Diagnostic Accuracy STARD Checklist for Reporting of Diagnostic Accuracy Studies (Bossuyt et al., 2003), would ensure that population characteristics, self-reported and clinical measures, and diagnostic accuracy or validity are complete. A major limitation to the work in this review was the lack of uniformity and appropriateness in statistical analysis, and this prevented us from conducting a quantitative assessment of the validity in the field. The reporting of statistics in a standard and uniform manner is important for comparison of measures and allows for easier prediction of which measures might work best. We were unable to perform any formal quality assessments, such as those outlined by Antczak et al. (1986a,b), both due to the lack of uniform reporting among studies and because the methods are generally for the assessment of randomized control trials.
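The earlier point about asking questions in parallel rather than in series can be illustrated with a small sketch. It assumes that errors on the two questions are independent given true disease status, which is a simplifying assumption unlikely to hold exactly for related self-report items; the sensitivity and specificity values are hypothetical, and the function names are illustrative only.

```python
from typing import NamedTuple


class Accuracy(NamedTuple):
    sensitivity: float
    specificity: float


def combine_parallel(a: Accuracy, b: Accuracy) -> Accuracy:
    """Classify as positive if EITHER question is answered 'yes'."""
    sensitivity = 1.0 - (1.0 - a.sensitivity) * (1.0 - b.sensitivity)
    specificity = a.specificity * b.specificity
    return Accuracy(sensitivity, specificity)


def combine_series(a: Accuracy, b: Accuracy) -> Accuracy:
    """Classify as positive only if BOTH questions are answered 'yes'."""
    sensitivity = a.sensitivity * b.sensitivity
    specificity = 1.0 - (1.0 - a.specificity) * (1.0 - b.specificity)
    return Accuracy(sensitivity, specificity)


# Hypothetical questions with low sensitivity and high specificity.
q1 = Accuracy(sensitivity=0.30, specificity=0.90)
q2 = Accuracy(sensitivity=0.55, specificity=0.90)

parallel = combine_parallel(q1, q2)
series = combine_series(q1, q2)

print(f"parallel: SN={parallel.sensitivity:.3f}, SP={parallel.specificity:.3f}")
print(f"series:   SN={series.sensitivity:.3f}, SP={series.specificity:.3f}")
```

Under these assumptions, the parallel rule raises sensitivity above that of either question alone at some cost in specificity, whereas the series rule does the opposite, which matches the reasoning given for asking each question separately.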
Based on the literature synthesized above, self-report shows good potential for the assessment of periodontal disease. Thirteen self-reported measures of periodontal disease showed good validity compared with clinical gold standards. Results were less supportive of the validity of self-report for gingivitis, since only two measures of gingivitis showed validity. Several measures for periodontal disease were useful and valid in the populations examined, but the results so far have not consistently proved any single measure's superiority and qualification for use alone in a general population. Using several self-reported measures in combination may prove to be a good alternative. The best measures we found were, "Have you had periodontal disease with bone loss?" (question 4 in Table 3
The authors acknowledge Dr. Jeff Hyman for conducting an initial literature search, and Dr. Shuku Fujimaki for translating the Japanese articles. Brooke Blicher was supported by NIH Training Grant DEO7151. The project was supported by the Centers for Disease Control and Prevention, Division of Oral Health. Received for publication September 13, 2004. Accepted for publication June 10, 2005.
Journal of Dental Research, Vol. 84, No. 10, 881-890 (2005)