Always Good-Turing: Asymptotically optimal probability estimation

Alon Orlitsky; Narayana P. Santhanam; Junan Zhang

Research output: Contribution to journal › Conference article › peer-review

Abstract

While deciphering the German Enigma code during World War II, I.J. Good and A.M. Turing considered the problem of estimating a probability distribution from a sample of data. They derived a surprising and unintuitive formula that has since been used in a variety of applications and studied by a number of researchers. Borrowing an information-theoretic and machine-learning framework, we define the attenuation of a probability estimator as the largest possible ratio between the per-symbol probability assigned to an arbitrarily-long sequence by any distribution, and the corresponding probability assigned by the estimator. We show that some common estimators have infinite attenuation and that the attenuation of the Good-Turing estimator is low, yet larger than one. We then derive an estimator whose attenuation is one, namely, as the length of any sequence increases, the per-symbol probability assigned by the estimator is at least the highest possible. Interestingly, some of the proofs use celebrated results by Hardy and Ramanujan on the number of partitions of an integer. To better understand the behavior of the estimator, we study the probability it assigns to several simple sequences. We show that for some sequences this probability agrees with our intuition, while for others it is rather unexpected.
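The attenuation described in words above can be sketched as a formula. The notation here is an assumption introduced for illustration, not taken from the paper: $x^n = x_1 \dots x_n$ is a sequence of length $n$, $q$ is the estimator, and $\hat{p}(x^n)$ is the highest probability any distribution assigns to $x^n$. The paper itself may formalize the quantity differently.

```latex
% Highest probability assigned to x^n by any distribution p (assumed notation)
\hat{p}(x^n) = \sup_{p} \, p(x^n)

% Attenuation of the estimator q: the worst-case ratio of per-symbol
% probabilities (n-th roots) as the sequence length grows
\mathcal{A}(q) = \limsup_{n \to \infty} \, \sup_{x^n}
  \left( \frac{\hat{p}(x^n)}{q(x^n)} \right)^{1/n}
```

Under this reading, an attenuation of one means that $q(x^n)^{1/n}$ is asymptotically no smaller than $\hat{p}(x^n)^{1/n}$ for every sequence, while an infinite attenuation means that the per-symbol ratio is unbounded.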

Original language	English (US)
Pages (from-to)	179-188
Number of pages	10
Journal	Annual Symposium on Foundations of Computer Science - Proceedings
State	Published - 2003
Externally published	Yes
Event	Proceedings: 44th Annual IEEE Symposium on Foundations of Computer Science - FOCS 2003 - Cambridge, MA, United States Duration: Oct 11 2003 → Oct 14 2003

ASJC Scopus subject areas

Hardware and Architecture
General Computer Science

Cite this

@article{3ab902da0cee4afebbfcdb1b843da2fb,

title = "Always Good-Turing: Asymptotically optimal probability estimation",

abstract = "While deciphering the German Enigma code during World War II, I.J. Good and A.M. Turing considered the problem of estimating a probability distribution from a sample of data. They derived a surprising and unintuitive formula that has since been used in a variety of applications and studied by a number of researchers. Borrowing an information-theoretic and machine-learning framework, we define the attenuation of a probability estimator as the largest possible ratio between the per-symbol probability assigned to an arbitrarily-long sequence by any distribution, and the corresponding probability assigned by the estimator. We show that some common estimators have infinite attenuation and that the attenuation of the Good-Turing estimator is low, yet larger than one. We then derive an estimator whose attenuation is one, namely, as the length of any sequence increases, the per-symbol probability assigned by the estimator is at least the highest possible. Interestingly, some of the proofs use celebrated results by Hardy and Ramanujan on the number of partitions of an integer. To better understand the behavior of the estimator, we study the probability it assigns to several simple sequences. We show that for some sequences this probability agrees with our intuition, while for others it is rather unexpected.",

author = "Orlitsky, Alon and Santhanam, {Narayana P.} and Zhang, Junan",

year = "2003",

language = "English (US)",

pages = "179--188",

journal = "Annual Symposium on Foundations of Computer Science - Proceedings",

issn = "0272-5428",

note = "Proceedings: 44th Annual IEEE Symposium on Foundations of Computer Science - FOCS 2003 ; Conference date: 11-10-2003 Through 14-10-2003",

}

TY - JOUR

T1 - Always Good-Turing: Asymptotically optimal probability estimation

T2 - Proceedings: 44th Annual IEEE Symposium on Foundations of Computer Science - FOCS 2003

AU - Orlitsky, Alon

AU - Santhanam, Narayana P.

AU - Zhang, Junan

PY - 2003

Y1 - 2003

N2 - While deciphering the German Enigma code during World War II, I.J. Good and A.M. Turing considered the problem of estimating a probability distribution from a sample of data. They derived a surprising and unintuitive formula that has since been used in a variety of applications and studied by a number of researchers. Borrowing an information-theoretic and machine-learning framework, we define the attenuation of a probability estimator as the largest possible ratio between the per-symbol probability assigned to an arbitrarily-long sequence by any distribution, and the corresponding probability assigned by the estimator. We show that some common estimators have infinite attenuation and that the attenuation of the Good-Turing estimator is low, yet larger than one. We then derive an estimator whose attenuation is one, namely, as the length of any sequence increases, the per-symbol probability assigned by the estimator is at least the highest possible. Interestingly, some of the proofs use celebrated results by Hardy and Ramanujan on the number of partitions of an integer. To better understand the behavior of the estimator, we study the probability it assigns to several simple sequences. We show that for some sequences this probability agrees with our intuition, while for others it is rather unexpected.

UR - http://www.scopus.com/inward/record.url?scp=0345412700&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0345412700&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:0345412700

SN - 0272-5428

SP - 179

EP - 188

JO - Annual Symposium on Foundations of Computer Science - Proceedings

JF - Annual Symposium on Foundations of Computer Science - Proceedings

Y2 - 11 October 2003 through 14 October 2003

ER -
