Mining a clinical data warehouse to discover disease-finding associations using co-occurrence statistics.

Hui Cao, Marianthi Markatou, Genevieve B. Melton, Michael Chiang, George Hripcsak

Research output: Contribution to journal › Article

40 Citations (Scopus)

Abstract

This paper applies co-occurrence statistics to discover disease-finding associations in a clinical data warehouse. We used two methods, χ² statistics and the proportion confidence interval (PCI) method, to measure the dependence of pairs of diseases and findings, and then used heuristic cutoff values for association selection. An intrinsic evaluation showed that 94 percent of disease-finding associations obtained by χ² statistics and 76.8 percent obtained by the PCI method were true associations. The selected associations were used to construct knowledge bases of disease-finding relations (KB-χ², KB-PCI). An extrinsic evaluation showed that both KB-χ² and KB-PCI could assist in eliminating clinically non-informative and redundant findings from problem lists generated by our automated problem list summarization system.
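As a rough illustration of the chi-square step described in the abstract, the sketch below computes a Pearson chi-square statistic from a 2x2 co-occurrence table for one disease-finding pair. The function name, the counts, and the cutoff of 3.84 (the conventional 0.05 critical value at one degree of freedom) are assumptions made for this example; the abstract does not state the heuristic cutoffs the authors used, and this is not their implementation.

```python
def chi_square_2x2(both, disease_only, finding_only, neither):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table.

    both          -- records mentioning the disease and the finding
    disease_only  -- records mentioning the disease but not the finding
    finding_only  -- records mentioning the finding but not the disease
    neither       -- records mentioning neither
    """
    n = both + disease_only + finding_only + neither
    with_disease = both + disease_only
    without_disease = finding_only + neither
    with_finding = both + finding_only
    without_finding = disease_only + neither
    denom = with_disease * without_disease * with_finding * without_finding
    if denom == 0:
        # A margin is empty, so the statistic is undefined; treat as no evidence.
        return 0.0
    return n * (both * neither - disease_only * finding_only) ** 2 / denom


# Hypothetical cutoff: the 0.05 critical value for 1 degree of freedom.
# The paper's own heuristic cutoff values are not given in the abstract.
CHI2_CUTOFF = 3.84

# Hypothetical counts for a single disease-finding pair.
score = chi_square_2x2(both=30, disease_only=70, finding_only=20, neither=880)
is_candidate_association = score > CHI2_CUTOFF
```

A pair whose counts match what independence predicts scores zero, so only pairs that co-occur more or less often than expected pass the cutoff. The statistic alone does not distinguish positive from negative dependence.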

Original language: English (US)
Pages (from-to): 106-110
Number of pages: 5
Journal: AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
State: Published - 2005
Externally published: Yes

Fingerprint

Confidence Intervals
Knowledge Bases
Heuristics

ASJC Scopus subject areas

  • Medicine (all)

Cite this

Mining a clinical data warehouse to discover disease-finding associations using co-occurrence statistics. / Cao, Hui; Markatou, Marianthi; Melton, Genevieve B.; Chiang, Michael; Hripcsak, George.

In: AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium, 2005, p. 106-110.

@article{1c24e2ead93042d4a548b5a2bd1b883a,
title = "Mining a clinical data warehouse to discover disease-finding associations using co-occurrence statistics.",
abstract = "This paper applies co-occurrence statistics to discover disease-finding associations in a clinical data warehouse. We used two methods, chi2 statistics and the proportion confidence interval (PCI) method, to measure the dependence of pairs of diseases and findings, and then used heuristic cutoff values for association selection. An intrinsic evaluation showed that 94 percent of disease-finding associations obtained by chi2 statistics and 76.8 percent obtained by the PCI method were true associations. The selected associations were used to construct knowledge bases of disease-finding relations (KB-chi2, KB-PCI). An extrinsic evaluation showed that both KB-chi2 and KB-PCI could assist in eliminating clinically non-informative and redundant findings from problem lists generated by our automated problem list summarization system.",
author = "Cao, Hui and Markatou, Marianthi and Melton, {Genevieve B.} and Chiang, Michael and Hripcsak, George",
year = "2005",
language = "English (US)",
pages = "106--110",
journal = "AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium",
issn = "1559-4076",
publisher = "American Medical Informatics Association",
}

TY  - JOUR
T1  - Mining a clinical data warehouse to discover disease-finding associations using co-occurrence statistics.
AU  - Cao, Hui
AU  - Markatou, Marianthi
AU  - Melton, Genevieve B.
AU  - Chiang, Michael
AU  - Hripcsak, George
PY  - 2005
Y1  - 2005
N2  - This paper applies co-occurrence statistics to discover disease-finding associations in a clinical data warehouse. We used two methods, chi2 statistics and the proportion confidence interval (PCI) method, to measure the dependence of pairs of diseases and findings, and then used heuristic cutoff values for association selection. An intrinsic evaluation showed that 94 percent of disease-finding associations obtained by chi2 statistics and 76.8 percent obtained by the PCI method were true associations. The selected associations were used to construct knowledge bases of disease-finding relations (KB-chi2, KB-PCI). An extrinsic evaluation showed that both KB-chi2 and KB-PCI could assist in eliminating clinically non-informative and redundant findings from problem lists generated by our automated problem list summarization system.
AB  - This paper applies co-occurrence statistics to discover disease-finding associations in a clinical data warehouse. We used two methods, chi2 statistics and the proportion confidence interval (PCI) method, to measure the dependence of pairs of diseases and findings, and then used heuristic cutoff values for association selection. An intrinsic evaluation showed that 94 percent of disease-finding associations obtained by chi2 statistics and 76.8 percent obtained by the PCI method were true associations. The selected associations were used to construct knowledge bases of disease-finding relations (KB-chi2, KB-PCI). An extrinsic evaluation showed that both KB-chi2 and KB-PCI could assist in eliminating clinically non-informative and redundant findings from problem lists generated by our automated problem list summarization system.
UR  - http://www.scopus.com/inward/record.url?scp=34248363374&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=34248363374&partnerID=8YFLogxK
M3  - Article
C2  - 16779011
AN  - SCOPUS:34248363374
SP  - 106
EP  - 110
JO  - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
JF  - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
SN  - 1559-4076
ER  -