A Comparison of Lung Nodule Segmentation Algorithms: Methods and Results from a Multi-institutional Study

Jayashree Kalpathy-Cramer, Binsheng Zhao, Dmitry Goldgof, Yuhua Gu, Xingwei Wang, Hao Yang, Yongqiang Tan, Robert Gillies, Sandy Napel

Research output: Contribution to journal › Article

21 Citations (Scopus)

Abstract

Tumor volume estimation, as well as accurate and reproducible border segmentation in medical images, is important in the diagnosis, staging, and assessment of response to cancer therapy. The goal of this study was to demonstrate the feasibility of a multi-institutional effort to assess the repeatability and reproducibility of nodule borders and the volume estimate bias of computerized segmentation algorithms in CT images of lung cancer, and to provide results from such a study. The dataset used for this evaluation consisted of 52 tumors in 41 CT volumes (40 patient datasets and 1 dataset containing scans of 12 phantom nodules of known volume) from five collections available in The Cancer Imaging Archive. Three academic institutions developing lung nodule segmentation algorithms submitted results for three repeat runs for each of the nodules. We compared the performance of the algorithms using several measures of spatial overlap and volume. Nodule volumes ranged from 29 μl to 66 ml, and the nodules exhibited a diversity of shapes. Agreement in spatial overlap of segmentations was significantly higher for multiple runs of the same algorithm than between segmentations generated by different algorithms (p < 0.05), and was significantly higher on the phantom dataset than on the other datasets (p < 0.05). Algorithms differed significantly in the bias of the measured volumes of the phantom nodules (p < 0.05), underscoring the need for assessing performance on clinical data in addition to phantoms. Algorithms that most accurately estimated nodule volumes were not the most repeatable, emphasizing the need to evaluate both accuracy and precision. There were considerable differences between algorithms, especially in a subset of heterogeneous nodules, underscoring the recommendation that the same software be used at all time points in longitudinal studies.
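The abstract refers to measures of spatial overlap and volume bias without defining them on this page. As a minimal sketch, assuming the Dice similarity coefficient as the overlap measure and percent error against a phantom nodule's known volume as the bias measure (both are assumptions, not necessarily the study's exact definitions), the quantities could be computed as below. Segmentations are represented as sets of voxel indices so the example needs only the standard library; the helper names `dice`, `mean_pairwise_dice`, `volume_ml`, `percent_bias`, and `box` are hypothetical.

```python
from itertools import combinations
from math import prod

# A voxel is addressed by its (z, y, x) index; a segmentation is the set of
# voxels labeled as nodule. (Representation chosen for illustration only.)
Voxel = tuple[int, int, int]


def dice(a: set[Voxel], b: set[Voxel]) -> float:
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) (assumed overlap measure)."""
    if not a and not b:
        return 1.0  # two empty masks are treated as perfect agreement
    return 2 * len(a & b) / (len(a) + len(b))


def mean_pairwise_dice(runs: list[set[Voxel]]) -> float:
    """Average Dice over every pair of segmentations, e.g. repeat runs of one algorithm."""
    pairs = list(combinations(runs, 2))
    return sum(dice(a, b) for a, b in pairs) / len(pairs)


def volume_ml(mask: set[Voxel], spacing_mm: tuple[float, float, float]) -> float:
    """Segmented volume in millilitres: voxel count times voxel volume (1 ml = 1000 mm^3)."""
    return len(mask) * prod(spacing_mm) / 1000.0


def percent_bias(measured_ml: float, true_ml: float) -> float:
    """Signed relative volume error versus a phantom nodule of known volume (assumed bias measure)."""
    return 100.0 * (measured_ml - true_ml) / true_ml


def box(z0: int, z1: int, y0: int, y1: int, x0: int, x1: int) -> set[Voxel]:
    """Hypothetical helper: a rectangular toy 'nodule' covering the half-open index ranges."""
    return {(z, y, x) for z in range(z0, z1) for y in range(y0, y1) for x in range(x0, x1)}


if __name__ == "__main__":
    # Three toy repeat runs; run2 is shifted by one slice in z.
    run1 = box(2, 6, 2, 6, 2, 6)
    run2 = box(3, 7, 2, 6, 2, 6)
    run3 = box(2, 6, 2, 6, 2, 6)
    print(dice(run1, run2))  # 0.75 (48 shared voxels out of 64 + 64)
    print(mean_pairwise_dice([run1, run2, run3]))
    vol = volume_ml(run1, (0.5, 0.5, 1.0))
    print(vol, percent_bias(vol, 0.020))
```

Averaging Dice over repeat runs of one algorithm yields a repeatability score, while computing it across different algorithms' outputs for the same nodule yields the between-algorithm agreement that the abstract reports as significantly lower. Bias is only computable where the true volume is known, which is why it is restricted here to phantom nodules.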

Original language: English (US)
Pages (from-to): 1-12
Number of pages: 12
Journal: Journal of Digital Imaging
DOI: https://doi.org/10.1007/s10278-016-9859-z
State: Accepted/In press - Feb 3 2016
Externally published: Yes

Keywords

  • Computed tomography
  • Infrastructure
  • Lung cancer
  • Quantitative imaging
  • Segmentation

ASJC Scopus subject areas

  • Radiology, Nuclear Medicine and Imaging
  • Radiological and Ultrasound Technology
  • Computer Science Applications

Cite this

Kalpathy-Cramer, J., Zhao, B., Goldgof, D., Gu, Y., Wang, X., Yang, H., ... Napel, S. (Accepted/In press). A Comparison of Lung Nodule Segmentation Algorithms: Methods and Results from a Multi-institutional Study. Journal of Digital Imaging, 1-12. https://doi.org/10.1007/s10278-016-9859-z

@article{d37d0e2a6c4c4976a77270f0dc4ae70c,
title = "A Comparison of Lung Nodule Segmentation Algorithms: Methods and Results from a Multi-institutional Study",
keywords = "Computed tomography, Infrastructure, Lung cancer, Quantitative imaging, Segmentation",
author = "Jayashree Kalpathy-Cramer and Binsheng Zhao and Dmitry Goldgof and Yuhua Gu and Xingwei Wang and Hao Yang and Yongqiang Tan and Robert Gillies and Sandy Napel",
year = "2016",
month = "2",
day = "3",
doi = "10.1007/s10278-016-9859-z",
language = "English (US)",
pages = "1--12",
journal = "Journal of Digital Imaging",
issn = "0897-1889",
publisher = "Springer New York",

}

TY - JOUR

T1 - A Comparison of Lung Nodule Segmentation Algorithms

T2 - Methods and Results from a Multi-institutional Study

AU - Kalpathy-Cramer, Jayashree

AU - Zhao, Binsheng

AU - Goldgof, Dmitry

AU - Gu, Yuhua

AU - Wang, Xingwei

AU - Yang, Hao

AU - Tan, Yongqiang

AU - Gillies, Robert

AU - Napel, Sandy

PY - 2016/2/3

Y1 - 2016/2/3

KW - Computed tomography

KW - Infrastructure

KW - Lung cancer

KW - Quantitative imaging

KW - Segmentation

UR - http://www.scopus.com/inward/record.url?scp=84957603850&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84957603850&partnerID=8YFLogxK

U2 - 10.1007/s10278-016-9859-z

DO - 10.1007/s10278-016-9859-z

M3 - Article

C2 - 26847203

AN - SCOPUS:84957603850

SP - 1

EP - 12

JO - Journal of Digital Imaging

JF - Journal of Digital Imaging

SN - 0897-1889

ER -