Algorithms for modeling distributions over large alphabets

Alon Orlitsky; Sajama; Narayana Santhanam; Krishnamurthy Viswanathan; Junan Zhang

Research output: Contribution to journal › Conference article › peer-review

Abstract

We consider the problem of modeling a distribution whose alphabet size is large relative to the amount of observed data. It is well known that conventional maximum-likelihood estimates do not perform well in that regime. Instead, we find the distribution maximizing the probability of the data's pattern. We derive an efficient algorithm for approximating this distribution. Simulations show that the computed distribution models the data well and yields general estimators that evaluate various data attributes as well as specific estimators designed especially for these tasks.
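To make the notion of a pattern concrete, here is a minimal Python sketch. It is not the efficient approximation algorithm the abstract refers to; it is a brute-force illustration under stated assumptions. The pattern of a sequence is taken to be the sequence with each symbol replaced by the order of its first appearance, and the pattern probability is computed by summing over every assignment of distinct alphabet symbols to pattern indices, which is only feasible for tiny alphabets. The function names `pattern` and `pattern_probability` are hypothetical and chosen for this illustration.

```python
from collections import Counter
from itertools import permutations
from math import prod


def pattern(seq):
    """Replace each symbol by the 1-based rank of its first appearance."""
    index = {}
    return [index.setdefault(s, len(index) + 1) for s in seq]


def pattern_probability(pat, p):
    """Probability that an i.i.d. sample drawn from p has pattern `pat`.

    Brute force (an assumption of this sketch, not the paper's method):
    sum over all ordered choices of distinct alphabet symbols, one per
    pattern index, of the product of p[symbol] ** multiplicity.
    """
    counts = Counter(pat)
    mults = [counts[i] for i in range(1, len(counts) + 1)]
    return sum(
        prod(q ** m for q, m in zip(assignment, mults))
        for assignment in permutations(p, len(mults))
    )


if __name__ == "__main__":
    print(pattern("abracadabra"))
    # Two distinct symbols observed once each: pattern [1, 2].
    # The empirical (maximum-likelihood) distribution is uniform over 2 symbols.
    print(pattern_probability([1, 2], [0.5, 0.5]))
    # A uniform distribution over a larger alphabet explains the pattern better.
    print(pattern_probability([1, 2], [0.1] * 10))
```

In this toy case the uniform distribution over ten symbols assigns the observed pattern a higher probability than the empirical distribution over two symbols, which illustrates why maximizing pattern probability can favor larger alphabets when data are scarce.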

Original language	English (US)
Pages (from-to)	306
Number of pages	1
Journal	IEEE International Symposium on Information Theory - Proceedings
State	Published - 2004
Externally published	Yes
Event	Proceedings - 2004 IEEE International Symposium on Information Theory - Chicago, IL, United States
Duration	Jun 27 2004 → Jul 2 2004

ASJC Scopus subject areas

Theoretical Computer Science
Information Systems
Modeling and Simulation
Applied Mathematics

Cite this

@article{d16c23e339254db188e2f5d956f6f6e1,
title = "Algorithms for modeling distributions over large alphabets",
abstract = "We consider the problem of modeling a distribution whose alphabet size is large relative to the amount of observed data. It is well known that conventional maximum-likelihood estimates do not perform well in that regime. Instead, we find the distribution maximizing the probability of the data's pattern. We derive an efficient algorithm for approximating this distribution. Simulations show that the computed distribution models the data well and yields general estimators that evaluate various data attributes as well as specific estimators designed especially for these tasks.",
author = "Alon Orlitsky and Sajama and Narayana Santhanam and Krishnamurthy Viswanathan and Junan Zhang",
year = "2004",
language = "English (US)",
pages = "306",
journal = "IEEE International Symposium on Information Theory - Proceedings",
issn = "2157-8097",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
note = "Proceedings - 2004 IEEE International Symposium on Information Theory ; Conference date: 27-06-2004 Through 02-07-2004",
}

TY - JOUR
T1 - Algorithms for modeling distributions over large alphabets
AU - Orlitsky, Alon
AU - Sajama,
AU - Santhanam, Narayana
AU - Viswanathan, Krishnamurthy
AU - Zhang, Junan
PY - 2004
Y1 - 2004
AB - We consider the problem of modeling a distribution whose alphabet size is large relative to the amount of observed data. It is well known that conventional maximum-likelihood estimates do not perform well in that regime. Instead, we find the distribution maximizing the probability of the data's pattern. We derive an efficient algorithm for approximating this distribution. Simulations show that the computed distribution models the data well and yields general estimators that evaluate various data attributes as well as specific estimators designed especially for these tasks.
UR - http://www.scopus.com/inward/record.url?scp=5044241234&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=5044241234&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:5044241234
SN - 2157-8097
SP - 306
JO - IEEE International Symposium on Information Theory - Proceedings
JF - IEEE International Symposium on Information Theory - Proceedings
T2 - Proceedings - 2004 IEEE International Symposium on Information Theory
Y2 - 27 June 2004 through 2 July 2004
ER -
