TY - JOUR
T1 - Always Good Turing
T2 - Asymptotically Optimal Probability Estimation
AU - Orlitsky, Alon
AU - Santhanam, Narayana P.
AU - Zhang, Junan
PY - 2003/10/17
Y1 - 2003/10/17
AB - While deciphering the Enigma code, Good and Turing derived an unintuitive, yet effective, formula for estimating a probability distribution from a sample of data. We define the attenuation of a probability estimator as the largest possible ratio between the per-symbol probability assigned to an arbitrarily long sequence by any distribution, and the corresponding probability assigned by the estimator. We show that some common estimators have infinite attenuation and that the attenuation of the Good-Turing estimator is low, yet greater than 1. We then derive an estimator whose attenuation is 1; that is, asymptotically it does not underestimate the probability of any sequence.
UR - http://www.scopus.com/inward/record.url?scp=0142084741&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0142084741&partnerID=8YFLogxK
U2 - 10.1126/science.1088284
DO - 10.1126/science.1088284
M3 - Article
C2 - 14564004
AN - SCOPUS:0142084741
SN - 0036-8075
VL - 302
SP - 427
EP - 431
JO - Science
JF - Science
IS - 5644
ER -