Always Good Turing: Asymptotically optimal probability estimation

A. Orlitsky, N. P. Santhanam, Junan Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

13 Citations (Scopus)

Abstract

While deciphering the German Enigma code during World War II, I.J. Good and A.M. Turing considered the problem of estimating a probability distribution from a sample of data. They derived a surprising and unintuitive formula that has since been used in a variety of applications and studied by a number of researchers. Borrowing an information-theoretic and machine-learning framework, we define the attenuation of a probability estimator as the largest possible ratio between the per-symbol probability assigned to an arbitrarily-long sequence by any distribution, and the corresponding probability assigned by the estimator. We show that some common estimators have infinite attenuation and that the attenuation of the Good-Turing estimator is low, yet larger than one. We then derive an estimator whose attenuation is one, namely, as the length of any sequence increases, the per-symbol probability assigned by the estimator is at least the highest possible. Interestingly, some of the proofs use celebrated results by Hardy and Ramanujan on the number of partitions of an integer. To better understand the behavior of the estimator, we study the probability it assigns to several simple sequences. We show that for some sequences this probability agrees with our intuition, while for others it is rather unexpected.
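The classical Good-Turing idea the abstract builds on can be illustrated briefly: the total probability assigned to symbols not yet observed is estimated as the fraction of the sample made up of symbols seen exactly once. The sketch below shows only this classical "missing mass" estimate, not the attenuation-one estimator derived in the paper.

```python
from collections import Counter

def good_turing_missing_mass(sample):
    """Classical Good-Turing estimate of the total probability of
    unseen symbols: (number of symbols observed exactly once) / n."""
    counts = Counter(sample)
    n = len(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n

# In "abracadabra" only 'c' and 'd' occur exactly once, so the
# estimated probability that the next symbol is new is 2/11.
print(good_turing_missing_mass("abracadabra"))
```

A sample in which every symbol repeats yields a missing-mass estimate of zero, which matches the intuition that nothing in the data suggests new symbols remain.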

Original language: English (US)
Title of host publication: Proceedings - 44th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2003
Publisher: IEEE Computer Society
Pages: 179-188
Number of pages: 10
Volume: 2003-January
ISBN (Electronic): 0769520405
DOIs: 10.1109/SFCS.2003.1238192
State: Published - 2003
Externally published: Yes
Event: 44th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2003 - Cambridge, United States
Duration: Oct 11 2003 - Oct 14 2003

Other

Other: 44th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2003
Country: United States
City: Cambridge
Period: 10/11/03 - 10/14/03

Fingerprint

  • Probability distributions
  • Learning systems

Keywords

  • Computer science

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

Orlitsky, A., Santhanam, N. P., & Zhang, J. (2003). Always Good Turing: Asymptotically optimal probability estimation. In Proceedings - 44th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2003 (Vol. 2003-January, pp. 179-188). [1238192] IEEE Computer Society. https://doi.org/10.1109/SFCS.2003.1238192

@inproceedings{dd609f590cf44b8c80ae90589b4f6a89,
title = "Always Good Turing: Asymptotically optimal probability estimation",
abstract = "While deciphering the German Enigma code during World War II, I.J. Good and A.M. Turing considered the problem of estimating a probability distribution from a sample of data. They derived a surprising and unintuitive formula that has since been used in a variety of applications and studied by a number of researchers. Borrowing an information-theoretic and machine-learning framework, we define the attenuation of a probability estimator as the largest possible ratio between the per-symbol probability assigned to an arbitrarily-long sequence by any distribution, and the corresponding probability assigned by the estimator. We show that some common estimators have infinite attenuation and that the attenuation of the Good-Turing estimator is low, yet larger than one. We then derive an estimator whose attenuation is one, namely, as the length of any sequence increases, the per-symbol probability assigned by the estimator is at least the highest possible. Interestingly, some of the proofs use celebrated results by Hardy and Ramanujan on the number of partitions of an integer. To better understand the behavior of the estimator, we study the probability it assigns to several simple sequences. We show that for some sequences this probability agrees with our intuition, while for others it is rather unexpected.",
keywords = "Computer science",
author = "A. Orlitsky and Santhanam, {N. P.} and Junan Zhang",
year = "2003",
doi = "10.1109/SFCS.2003.1238192",
language = "English (US)",
volume = "2003-January",
pages = "179--188",
booktitle = "Proceedings - 44th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2003",
publisher = "IEEE Computer Society",

}

TY - GEN

T1 - Always Good Turing

T2 - Asymptotically optimal probability estimation

AU - Orlitsky, A.

AU - Santhanam, N. P.

AU - Zhang, Junan

PY - 2003

Y1 - 2003

N2 - While deciphering the German Enigma code during World War II, I.J. Good and A.M. Turing considered the problem of estimating a probability distribution from a sample of data. They derived a surprising and unintuitive formula that has since been used in a variety of applications and studied by a number of researchers. Borrowing an information-theoretic and machine-learning framework, we define the attenuation of a probability estimator as the largest possible ratio between the per-symbol probability assigned to an arbitrarily-long sequence by any distribution, and the corresponding probability assigned by the estimator. We show that some common estimators have infinite attenuation and that the attenuation of the Good-Turing estimator is low, yet larger than one. We then derive an estimator whose attenuation is one, namely, as the length of any sequence increases, the per-symbol probability assigned by the estimator is at least the highest possible. Interestingly, some of the proofs use celebrated results by Hardy and Ramanujan on the number of partitions of an integer. To better understand the behavior of the estimator, we study the probability it assigns to several simple sequences. We show that for some sequences this probability agrees with our intuition, while for others it is rather unexpected.

AB - While deciphering the German Enigma code during World War II, I.J. Good and A.M. Turing considered the problem of estimating a probability distribution from a sample of data. They derived a surprising and unintuitive formula that has since been used in a variety of applications and studied by a number of researchers. Borrowing an information-theoretic and machine-learning framework, we define the attenuation of a probability estimator as the largest possible ratio between the per-symbol probability assigned to an arbitrarily-long sequence by any distribution, and the corresponding probability assigned by the estimator. We show that some common estimators have infinite attenuation and that the attenuation of the Good-Turing estimator is low, yet larger than one. We then derive an estimator whose attenuation is one, namely, as the length of any sequence increases, the per-symbol probability assigned by the estimator is at least the highest possible. Interestingly, some of the proofs use celebrated results by Hardy and Ramanujan on the number of partitions of an integer. To better understand the behavior of the estimator, we study the probability it assigns to several simple sequences. We show that for some sequences this probability agrees with our intuition, while for others it is rather unexpected.

KW - Computer science

UR - http://www.scopus.com/inward/record.url?scp=33746333177&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33746333177&partnerID=8YFLogxK

U2 - 10.1109/SFCS.2003.1238192

DO - 10.1109/SFCS.2003.1238192

M3 - Conference contribution

AN - SCOPUS:33746333177

VL - 2003-January

SP - 179

EP - 188

BT - Proceedings - 44th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2003

PB - IEEE Computer Society

ER -