Making sense of large data sets without annotations: Analyzing age-related correlations from lung CT scans

Yashin Dicente Cid, Artem Mamonov, Andrew Beers, Armin Thomas, Vassili Kovalev, Jayashree Kalpathy-Cramer, Henning Müller

Research output: Research › Conference contribution

  • 2 Citations

Abstract

The analysis of large data sets can help to gain knowledge about specific organs or specific diseases, just as big data analysis does in many non-medical areas. This article aims to extract information from 3D volumes, namely the visual content of lung CT scans of a large number of patients. In the described data set, little annotation is available: all patients were part of an ongoing screening program, and apart from age and gender no information on the patients or their findings was available for this work. This scenario is likely to occur regularly, as image data sets are produced and made available in increasingly large quantities, whereas manual annotations are often unavailable and clinical data such as text reports are harder to share. We extracted a set of visual features from 12,414 CT scans of 9,348 patients whose lungs were scanned in the context of a national lung screening program in Belarus. Lung fields were segmented by two segmentation algorithms, and only cases where both algorithms found the left and the right lung and the two segmentations agreed with a Dice coefficient above 0.95 were analyzed. This ensures that only good-quality segmentations were used to extract lung features. Patients ranged in age from 0 to 106 years. Data analysis shows that age can be predicted with fairly high accuracy for persons under 15 years. Relatively good results were also obtained between 30 and 65 years, where a steady trend is seen. For young adults and older people the results are weaker, as variability is very high in these groups. Several visualizations of the data show the evolution patterns of lung texture, size and density with age. The experiments allow learning how the lung evolves, and the results show that even with limited metadata, interesting information can be extracted from large-scale visual data.
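The segmentation quality filter described in the abstract keeps a scan only if both algorithms found a left and a right lung and their masks agree with a Dice coefficient above 0.95. The sketch below illustrates that rule under stated assumptions: each segmentation is represented as a hypothetical mapping from a lung label ("left", "right") to a set of voxel coordinates, and Dice is computed on the union of both lungs. The abstract does not say whether Dice was computed per lung or on the combined mask, and the names `dice` and `passes_consensus` are illustrative, not taken from the paper.

```python
from typing import Dict, Set, Tuple

Voxel = Tuple[int, int, int]
# Hypothetical representation: lung label ("left"/"right") -> set of voxel coordinates.
Segmentation = Dict[str, Set[Voxel]]


def dice(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) between two voxel sets."""
    total = len(a) + len(b)
    if total == 0:
        # Two empty masks are treated as a failed segmentation, not perfect agreement.
        return 0.0
    return 2.0 * len(a & b) / total


def passes_consensus(seg_a: Segmentation, seg_b: Segmentation,
                     threshold: float = 0.95) -> bool:
    """Keep a scan only if both algorithms found both lungs and their masks agree.

    Assumption: Dice is computed on the union of the left and right lung masks.
    """
    for seg in (seg_a, seg_b):
        # Each algorithm must report a non-empty left and right lung.
        if not seg.get("left") or not seg.get("right"):
            return False
    mask_a = seg_a["left"] | seg_a["right"]
    mask_b = seg_b["left"] | seg_b["right"]
    return dice(mask_a, mask_b) > threshold
```

Requiring the Dice coefficient of each lung separately to exceed the threshold would be a stricter variant of the same idea.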
These age-related changes (for example in lung volume or in the density histogram of the tissue) can also be taken into account when interpreting new cases. The database includes patients who had suspicious findings on a chest X-ray, so it is not a group of healthy people; only tendencies, not a model of a healthy lung at a specific age, can be derived.
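Two of the age-related features mentioned above, lung volume and the density histogram of lung tissue, can be derived from a CT volume and a lung mask. A minimal sketch, assuming the CT is given as a hypothetical mapping from voxel coordinate to Hounsfield unit (HU) value and that the voxel spacing in millimetres is known. The HU range of -1000 to 0 and the 100 HU bin width are illustrative choices, not values reported in the paper, and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Voxel = Tuple[int, int, int]


def lung_volume_ml(mask: Set[Voxel], spacing_mm: Tuple[float, float, float]) -> float:
    """Lung volume in millilitres: voxel count times voxel volume (1 ml = 1000 mm^3)."""
    dx, dy, dz = spacing_mm
    return len(mask) * dx * dy * dz / 1000.0


def density_histogram(
    ct_hu: Dict[Voxel, float],
    mask: Set[Voxel],
    lo: float = -1000.0,   # assumed lower HU bound (illustrative)
    hi: float = 0.0,       # assumed upper HU bound (illustrative)
    bin_width: float = 100.0,
) -> List[float]:
    """Normalised histogram of HU values inside the lung mask.

    Assumption: values outside [lo, hi) are clipped into the first or last bin.
    """
    n_bins = int((hi - lo) // bin_width)
    counts = [0] * n_bins
    for voxel in mask:
        idx = int((ct_hu[voxel] - lo) // bin_width)
        idx = min(max(idx, 0), n_bins - 1)  # clip out-of-range densities
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins
```

Normalising the histogram by the number of lung voxels makes it comparable across patients with different lung sizes, so volume and density remain separate features.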

Language: English (US)
Title of host publication: Medical Imaging 2017
Subtitle of host publication: Imaging Informatics for Healthcare, Research, and Applications
Publisher: SPIE
Volume: 10138
ISBN (Electronic): 9781510607217
DOIs: 10.1117/12.2255609
State: Published - 2017
Externally published: Yes
Event: Medical Imaging 2017: Imaging Informatics for Healthcare, Research, and Applications - Orlando, United States
Duration: Feb 15 2017 → Feb 16 2017

Other

Other: Medical Imaging 2017: Imaging Informatics for Healthcare, Research, and Applications
Country: United States
City: Orlando
Period: 2/15/17 → 2/16/17

Fingerprint

Computerized tomography
annotations
lungs
Lung
Datasets
Screening
Metadata
Visualization
Textures
Tissue
X rays
Experiments
Big data
screening
Belarus
metadata
chest
histograms
organs
learning

Keywords

  • Big data
  • Lung segmentation
  • Lung tissue analysis

ASJC Scopus subject areas

  • Atomic and Molecular Physics, and Optics
  • Electronic, Optical and Magnetic Materials
  • Biomaterials
  • Radiology, Nuclear Medicine and Imaging

Cite this

Dicente Cid, Y., Mamonov, A., Beers, A., Thomas, A., Kovalev, V., Kalpathy-Cramer, J., & Müller, H. (2017). Making sense of large data sets without annotations: Analyzing age-related correlations from lung CT scans. In Medical Imaging 2017: Imaging Informatics for Healthcare, Research, and Applications (Vol. 10138). [1013809] SPIE. DOI: 10.1117/12.2255609

Making sense of large data sets without annotations : Analyzing age-related correlations from lung CT scans. / Dicente Cid, Yashin; Mamonov, Artem; Beers, Andrew; Thomas, Armin; Kovalev, Vassili; Kalpathy-Cramer, Jayashree; Müller, Henning.

Medical Imaging 2017: Imaging Informatics for Healthcare, Research, and Applications. Vol. 10138 SPIE, 2017. 1013809.

Dicente Cid, Y, Mamonov, A, Beers, A, Thomas, A, Kovalev, V, Kalpathy-Cramer, J & Müller, H 2017, Making sense of large data sets without annotations: Analyzing age-related correlations from lung CT scans. in Medical Imaging 2017: Imaging Informatics for Healthcare, Research, and Applications. vol. 10138, 1013809, SPIE, Medical Imaging 2017: Imaging Informatics for Healthcare, Research, and Applications, Orlando, United States, 2/15/17. DOI: 10.1117/12.2255609
Dicente Cid Y, Mamonov A, Beers A, Thomas A, Kovalev V, Kalpathy-Cramer J et al. Making sense of large data sets without annotations: Analyzing age-related correlations from lung CT scans. In Medical Imaging 2017: Imaging Informatics for Healthcare, Research, and Applications. Vol. 10138. SPIE. 2017. 1013809. Available from: DOI: 10.1117/12.2255609
Dicente Cid, Yashin ; Mamonov, Artem ; Beers, Andrew ; Thomas, Armin ; Kovalev, Vassili ; Kalpathy-Cramer, Jayashree ; Müller, Henning. / Making sense of large data sets without annotations : Analyzing age-related correlations from lung CT scans. Medical Imaging 2017: Imaging Informatics for Healthcare, Research, and Applications. Vol. 10138 SPIE, 2017.
@inbook{cb02859d37a5454da8ebe86c215b103b,
title = "Making sense of large data sets without annotations: Analyzing age-related correlations from lung CT scans",
keywords = "Big data, Lung segmentation, Lung tissue analysis",
author = "{Dicente Cid}, Yashin and Artem Mamonov and Andrew Beers and Armin Thomas and Vassili Kovalev and Jayashree Kalpathy-Cramer and Henning Müller",
year = "2017",
doi = "10.1117/12.2255609",
volume = "10138",
booktitle = "Medical Imaging 2017",
publisher = "SPIE",

}

TY - CHAP

T1 - Making sense of large data sets without annotations

T2 - Analyzing age-related correlations from lung CT scans

AU - Dicente Cid,Yashin

AU - Mamonov,Artem

AU - Beers,Andrew

AU - Thomas,Armin

AU - Kovalev,Vassili

AU - Kalpathy-Cramer,Jayashree

AU - Müller,Henning

PY - 2017

Y1 - 2017

KW - Big data

KW - Lung segmentation

KW - Lung tissue analysis

UR - http://www.scopus.com/inward/record.url?scp=85020375908&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85020375908&partnerID=8YFLogxK

U2 - 10.1117/12.2255609

DO - 10.1117/12.2255609

M3 - Conference contribution

VL - 10138

BT - Medical Imaging 2017

PB - SPIE

ER -