Stulberg classification system for evaluation of Legg-Calve-Perthes disease: Intra-rater and inter-rater reliability

Jeroen G. Neyt, Stuart L. Weinstein, Kevin F. Spratt, Lori Dolan, José Morcuende, Frederick R. Dietz, Greg Guyton, Robert Hart, Michelle Stevens Kraut, Gregory Lervick, Peter Pardubsky, Andrea Saterbak

Research output: Contribution to journal › Article

55 Citations (Scopus)

Abstract

Background: Researchers and clinicians commonly use the classification system of Stulberg et al. as a basis for treatment decisions during the active phase of Legg-Calve-Perthes disease because of its putative utility as a predictor of long-term outcome. It is generally assumed that this system has an acceptable degree of reliability. This assumption, however, is not convincingly supported by the literature.
Methods: The purpose of the present study was to assess the inter-rater and intra-rater reliability of the classification system of Stulberg et al. with use of a pre-test, post-test design. During the pre-test phase, nine raters independently used the system to evaluate the radiographs of skeletally mature patients who had been managed for Legg-Calve-Perthes disease. The intervention between the pre-test and post-test phases consisted of a consensus-building session during which all raters jointly arrived at standardized definitions of the various joint structures that are assessed with use of the classification system. The effect of these definitions on reliability then was assessed by reevaluating the radiographs during the post-test phase.
Results: The pre-test intra-rater reliability coefficients ranged from 0.709 to 0.915, and the post-test coefficients ranged from 0.568 to 0.874. The pre-test inter-rater reliability coefficients ranged from 0.603 to 0.732, and the post-test coefficients ranged from 0.648 to 0.744. Contributing to the variance was a lack of agreement concerning the assessment of joint structures and the way in which the raters translated these evaluations into a classification according to the system of Stulberg et al.
Conclusions: Although intra-rater reliability was marginally acceptable, the degree of variability between the classifications assigned by different raters - even after the intervention - calls into question the reliability of the system of Stulberg et al.; consequently, the validity of any treatment decisions, outcome evaluations, or epidemiological studies based on this system is also in question.

Original language: English (US)
Pages (from-to): 1209-1216
Number of pages: 8
Journal: Journal of Bone and Joint Surgery - Series A
Volume: 81
Issue number: 9
State: Published - Sep 1999
Externally published: Yes

Fingerprint

Legg-Calve-Perthes Disease
Joints
Epidemiologic Studies
Research Personnel
Outcome Assessment (Health Care)

ASJC Scopus subject areas

  • Surgery
  • Orthopedics and Sports Medicine

Cite this

Neyt, J. G., Weinstein, S. L., Spratt, K. F., Dolan, L., Morcuende, J., Dietz, F. R., ... Saterbak, A. (1999). Stulberg classification system for evaluation of Legg-Calve-Perthes disease: Intra-rater and inter-rater reliability. Journal of Bone and Joint Surgery - Series A, 81(9), 1209-1216.

@article{b301246626024e928265b88d3dab709b,
title = "Stulberg classification system for evaluation of Legg-Calve-Perthes disease: Intra-rater and inter-rater reliability",
abstract = "Background: Researchers and clinicians commonly use the classification system of Stulberg et al. as a basis for treatment decisions during the active phase of Legg-Calve-Perthes disease because of its putative utility as a predictor of long-term outcome. It is generally assumed that this system has an acceptable degree of reliability. This assumption, however, is not convincingly supported by the literature. Methods: The purpose of the present study was to assess the inter-rater and intra-rater reliability of the classification system of Stulberg et al. with use of a pre-test, post-test design. During the pre-test phase, nine raters independently used the system to evaluate the radiographs of skeletally mature patients who had been managed for Legg-Calve-Perthes disease. The intervention between the pre-test and post-test phases consisted of a consensus-building session during which all raters jointly arrived at standardized definitions of the various joint structures that are assessed with use of the classification system. The effect of these definitions on reliability then was assessed by reevaluating the radiographs during the post-test phase. Results: The pre-test intra-rater reliability coefficients ranged from 0.709 to 0.915, and the post-test coefficients ranged from 0.568 to 0.874. The pre-test inter-rater reliability coefficients ranged from 0.603 to 0.732, and the post-test coefficients ranged from 0.648 to 0.744. Contributing to the variance was a lack of agreement concerning the assessment of joint structures and the way in which the raters translated these evaluations into a classification according to the system of Stulberg et al.
Conclusions: Although intra-rater reliability was marginally acceptable, the degree of variability between the classifications assigned by different raters - even after the intervention - calls into question the reliability of the system of Stulberg et al.; consequently, the validity of any treatment decisions, outcome evaluations, or epidemiological studies based on this system is also in question.",
author = "Neyt, {Jeroen G.} and Weinstein, {Stuart L.} and Spratt, {Kevin F.} and Lori Dolan and Jos{\'e} Morcuende and Dietz, {Frederick R.} and Greg Guyton and Robert Hart and Kraut, {Michelle Stevens} and Gregory Lervick and Peter Pardubsky and Andrea Saterbak",
year = "1999",
month = "9",
language = "English (US)",
volume = "81",
pages = "1209--1216",
journal = "Journal of Bone and Joint Surgery - American Volume",
issn = "0021-9355",
publisher = "Journal of Bone and Joint Surgery Inc.",
number = "9",

}

TY - JOUR

T1 - Stulberg classification system for evaluation of Legg-Calve-Perthes disease

T2 - Intra-rater and inter-rater reliability

AU - Neyt, Jeroen G.

AU - Weinstein, Stuart L.

AU - Spratt, Kevin F.

AU - Dolan, Lori

AU - Morcuende, José

AU - Dietz, Frederick R.

AU - Guyton, Greg

AU - Hart, Robert

AU - Kraut, Michelle Stevens

AU - Lervick, Gregory

AU - Pardubsky, Peter

AU - Saterbak, Andrea

PY - 1999/9

Y1 - 1999/9

N2 - Background: Researchers and clinicians commonly use the classification system of Stulberg et al. as a basis for treatment decisions during the active phase of Legg-Calve-Perthes disease because of its putative utility as a predictor of long-term outcome. It is generally assumed that this system has an acceptable degree of reliability. This assumption, however, is not convincingly supported by the literature. Methods: The purpose of the present study was to assess the inter-rater and intra-rater reliability of the classification system of Stulberg et al. with use of a pre-test, post-test design. During the pre-test phase, nine raters independently used the system to evaluate the radiographs of skeletally mature patients who had been managed for Legg-Calve-Perthes disease. The intervention between the pre-test and post-test phases consisted of a consensus-building session during which all raters jointly arrived at standardized definitions of the various joint structures that are assessed with use of the classification system. The effect of these definitions on reliability then was assessed by reevaluating the radiographs during the post-test phase. Results: The pre-test intra-rater reliability coefficients ranged from 0.709 to 0.915, and the post-test coefficients ranged from 0.568 to 0.874. The pre-test inter-rater reliability coefficients ranged from 0.603 to 0.732, and the post-test coefficients ranged from 0.648 to 0.744. Contributing to the variance was a lack of agreement concerning the assessment of joint structures and the way in which the raters translated these evaluations into a classification according to the system of Stulberg et al.
Conclusions: Although intra-rater reliability was marginally acceptable, the degree of variability between the classifications assigned by different raters - even after the intervention - calls into question the reliability of the system of Stulberg et al.; consequently, the validity of any treatment decisions, outcome evaluations, or epidemiological studies based on this system is also in question.

UR - http://www.scopus.com/inward/record.url?scp=0032870427&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0032870427&partnerID=8YFLogxK

M3 - Article

C2 - 10505517

AN - SCOPUS:0032870427

VL - 81

SP - 1209

EP - 1216

JO - Journal of Bone and Joint Surgery - American Volume

JF - Journal of Bone and Joint Surgery - American Volume

SN - 0021-9355

IS - 9

ER -