Comparison of manual and automatic segmentation methods for brain structures in the presence of space-occupying lesions: A multi-expert study

M. A. Deeley, A. Chen, R. Datteri, J. H. Noble, A. J. Cmelak, E. F. Donnelly, A. W. Malcolm, L. Moretti, Jerry Jaboin, K. Niermann, Eddy S. Yang, David S. Yu, F. Yei, T. Koyama, G. X. Ding, B. M. Dawant

Research output: Contribution to journal (Article)

51 Citations (Scopus)

Abstract

The purpose of this work was to characterize expert variation in segmentation of intracranial structures pertinent to radiation therapy, and to assess a registration-driven atlas-based segmentation algorithm in that context. Eight experts were recruited to segment the brainstem, optic chiasm, optic nerves, and eyes of 20 patients who underwent therapy for large space-occupying tumors. Performance variability was assessed through three geometric measures: volume, Dice similarity coefficient, and Euclidean distance. In addition, two simulated ground truth segmentations were calculated via the simultaneous truth and performance level estimation algorithm and a novel application of probability maps. The experts and automatic system were found to generate structures of similar volume, though the experts exhibited higher variation with respect to tubular structures. No difference was found between the mean Dice similarity coefficient (DSC) of the automatic and expert delineations as a group at a 5% significance level over all cases and organs. The larger structures of the brainstem and eyes exhibited mean DSC of approximately 0.8-0.9, whereas the tubular chiasm and nerves were lower, approximately 0.4-0.5. Similarly low DSCs have been reported previously without the context of several experts and patient volumes. This study, however, provides evidence that experts are similarly challenged. The average maximum distances (maximum inside, maximum outside) from a simulated ground truth ranged from (-4.3, +5.4) mm for the automatic system to (-3.9, +7.5) mm for the experts considered as a group. Over all the structures in a rank of true positive rates at a 2 mm threshold from the simulated ground truth, the automatic system ranked second of the nine raters. This work underscores the need for large-scale studies utilizing statistically robust numbers of patients and experts in evaluating the quality of automatic algorithms.
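As a rough illustration of the Dice similarity coefficient named in the abstract, the sketch below computes DSC = 2|A ∩ B| / (|A| + |B|) for two segmentations represented as sets of voxel indices. The function name, the voxel-set representation, and the toy data are assumptions made for illustration only; they are not the study's implementation or data.

```python
from typing import Iterable, Set, Tuple

Voxel = Tuple[int, int, int]


def dice_similarity(a: Iterable[Voxel], b: Iterable[Voxel]) -> float:
    """Dice similarity coefficient, 2|A & B| / (|A| + |B|), of two voxel sets."""
    set_a: Set[Voxel] = set(a)
    set_b: Set[Voxel] = set(b)
    total = len(set_a) + len(set_b)
    if total == 0:
        # Convention assumed here: two empty segmentations agree perfectly.
        return 1.0
    return 2.0 * len(set_a & set_b) / total


# Hypothetical toy data: a 4x4 block and the same block shifted by one voxel.
block_expert = {(x, y, 0) for x in range(4) for y in range(4)}
block_auto = {(x + 1, y, 0) for x in range(4) for y in range(4)}

# Hypothetical one-voxel-wide "tube" and the same tube shifted sideways by one voxel.
tube_expert = {(x, 0, 0) for x in range(8)}
tube_auto = {(x, 1, 0) for x in range(8)}

print(dice_similarity(block_expert, block_auto))  # 0.75
print(dice_similarity(tube_expert, tube_auto))    # 0.0
```

In this toy case the same one-voxel shift costs the thin structure far more overlap than the compact one. That is a property of the metric on these made-up shapes, offered only as intuition for reading DSC values on small tubular structures.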

Original language: English (US)
Pages (from-to): 4557-4577
Number of pages: 21
Journal: Physics in Medicine and Biology
Volume: 56
Issue number: 14
DOI: https://doi.org/10.1088/0031-9155/56/14/021
State: Published - Jul 21 2011
Externally published: Yes

Fingerprint

Brain Stem
Brain
Optic Chiasm
Expert Systems
Atlases
Optic Nerve
Radiotherapy
Neoplasms
Therapeutics

ASJC Scopus subject areas

  • Radiology, Nuclear Medicine and Imaging
  • Radiological and Ultrasound Technology

Cite this

Comparison of manual and automatic segmentation methods for brain structures in the presence of space-occupying lesions: A multi-expert study. / Deeley, M. A.; Chen, A.; Datteri, R.; Noble, J. H.; Cmelak, A. J.; Donnelly, E. F.; Malcolm, A. W.; Moretti, L.; Jaboin, Jerry; Niermann, K.; Yang, Eddy S.; Yu, David S.; Yei, F.; Koyama, T.; Ding, G. X.; Dawant, B. M.

In: Physics in Medicine and Biology, Vol. 56, No. 14, 21.07.2011, p. 4557-4577.

Deeley, MA, Chen, A, Datteri, R, Noble, JH, Cmelak, AJ, Donnelly, EF, Malcolm, AW, Moretti, L, Jaboin, J, Niermann, K, Yang, ES, Yu, DS, Yei, F, Koyama, T, Ding, GX & Dawant, BM 2011, 'Comparison of manual and automatic segmentation methods for brain structures in the presence of space-occupying lesions: A multi-expert study', Physics in Medicine and Biology, vol. 56, no. 14, pp. 4557-4577. https://doi.org/10.1088/0031-9155/56/14/021
@article{808b9648d4464ef780aaa372b1842c07,
title = "Comparison of manual and automatic segmentation methods for brain structures in the presence of space-occupying lesions: A multi-expert study",
author = "Deeley, {M. A.} and A. Chen and R. Datteri and Noble, {J. H.} and Cmelak, {A. J.} and Donnelly, {E. F.} and Malcolm, {A. W.} and L. Moretti and Jerry Jaboin and K. Niermann and Yang, {Eddy S.} and Yu, {David S.} and F. Yei and T. Koyama and Ding, {G. X.} and Dawant, {B. M.}",
year = "2011",
month = "7",
day = "21",
doi = "10.1088/0031-9155/56/14/021",
language = "English (US)",
volume = "56",
pages = "4557--4577",
journal = "Physics in Medicine and Biology",
issn = "0031-9155",
publisher = "IOP Publishing Ltd.",
number = "14",

}

TY - JOUR

T1 - Comparison of manual and automatic segmentation methods for brain structures in the presence of space-occupying lesions

T2 - A multi-expert study

AU - Deeley, M. A.

AU - Chen, A.

AU - Datteri, R.

AU - Noble, J. H.

AU - Cmelak, A. J.

AU - Donnelly, E. F.

AU - Malcolm, A. W.

AU - Moretti, L.

AU - Jaboin, Jerry

AU - Niermann, K.

AU - Yang, Eddy S.

AU - Yu, David S.

AU - Yei, F.

AU - Koyama, T.

AU - Ding, G. X.

AU - Dawant, B. M.

PY - 2011/7/21

Y1 - 2011/7/21

UR - http://www.scopus.com/inward/record.url?scp=79960349087&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79960349087&partnerID=8YFLogxK

U2 - 10.1088/0031-9155/56/14/021

DO - 10.1088/0031-9155/56/14/021

M3 - Article

C2 - 21725140

AN - SCOPUS:79960349087

VL - 56

SP - 4557

EP - 4577

JO - Physics in Medicine and Biology

JF - Physics in Medicine and Biology

SN - 0031-9155

IS - 14

ER -