Always Good Turing: Asymptotically Optimal Probability Estimation

Alon Orlitsky; Narayana P. Santhanam; Junan Zhang

doi:10.1126/science.1088284

Research output: Contribution to journal › Article › peer-review

88 Scopus citations

Abstract

While deciphering the Enigma code, Good and Turing derived an unintuitive, yet effective, formula for estimating a probability distribution from a sample of data. We define the attenuation of a probability estimator as the largest possible ratio between the per-symbol probability assigned to an arbitrarily long sequence by any distribution, and the corresponding probability assigned by the estimator. We show that some common estimators have infinite attenuation and that the attenuation of the Good-Turing estimator is low, yet greater than 1. We then derive an estimator whose attenuation is 1; that is, asymptotically it does not underestimate the probability of any sequence.
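As a point of reference for the estimator the abstract starts from, here is a minimal sketch of the textbook (unsmoothed) Good-Turing formula. It is not the attenuation-1 estimator derived in the paper. The function name `good_turing` and its return shape are hypothetical, chosen only for illustration. The formula assigns total probability N_1/n to unseen symbols and (r+1)·N_{r+1}/(N_r·n) to each symbol seen r times, where N_r is the number of distinct symbols seen exactly r times and n is the sample size.

```python
from collections import Counter


def good_turing(sample):
    """Textbook, unsmoothed Good-Turing estimates (illustrative sketch only).

    Returns (missing_mass, per_symbol), where missing_mass is the total
    probability assigned to symbols not seen in the sample and per_symbol
    maps each observed symbol to its estimated probability.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")

    counts = Counter(sample)                  # symbol -> r (times seen)
    freq_of_freq = Counter(counts.values())   # r -> N_r

    # Probability mass reserved for unseen symbols: N_1 / n.
    missing_mass = freq_of_freq.get(1, 0) / n

    per_symbol = {}
    for symbol, r in counts.items():
        n_r = freq_of_freq[r]
        n_r_plus_1 = freq_of_freq.get(r + 1, 0)
        # A symbol seen r times gets (r + 1) * N_{r+1} / (N_r * n).
        # This is 0 whenever no symbol was seen r + 1 times, which is
        # why practical variants smooth the N_r values first.
        per_symbol[symbol] = (r + 1) * n_r_plus_1 / (n_r * n)

    return missing_mass, per_symbol


if __name__ == "__main__":
    missing, estimates = good_turing("aabbbcd")
    print(missing, estimates)
```

In the sample `"aabbbcd"` the most frequent symbol `b` receives estimate 0 because no symbol appears four times. This known weakness of the raw formula is separate from the attenuation analysis described in the abstract.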

Original language	English (US)
Pages (from-to)	427-431
Number of pages	5
Journal	Science
Volume	302
Issue number	5644
DOIs	https://doi.org/10.1126/science.1088284
State	Published - Oct 17 2003
Externally published	Yes

ASJC Scopus subject areas

General

Access to Document

10.1126/science.1088284

Cite this

@article{4bc76385cbcf4dd0b6a56e68014a4567,

title = "Always Good Turing: Asymptotically Optimal Probability Estimation",

abstract = "While deciphering the Enigma code, Good and Turing derived an unintuitive, yet effective, formula for estimating a probability distribution from a sample of data. We define the attenuation of a probability estimator as the largest possible ratio between the per-symbol probability assigned to an arbitrarily long sequence by any distribution, and the corresponding probability assigned by the estimator. We show that some common estimators have infinite attenuation and that the attenuation of the Good-Turing estimator is low, yet greater than 1. We then derive an estimator whose attenuation is 1; that is, asymptotically it does not underestimate the probability of any sequence.",

author = "Orlitsky, Alon and Santhanam, {Narayana P.} and Zhang, Junan",

year = "2003",

month = oct,

day = "17",

doi = "10.1126/science.1088284",

language = "English (US)",

volume = "302",

pages = "427--431",

journal = "Science",

issn = "0036-8075",

publisher = "American Association for the Advancement of Science",

number = "5644",

}

TY - JOUR

T1 - Always Good Turing

T2 - Asymptotically Optimal Probability Estimation

AU - Orlitsky, Alon

AU - Santhanam, Narayana P.

AU - Zhang, Junan

PY - 2003/10/17

Y1 - 2003/10/17

N2 - While deciphering the Enigma code, Good and Turing derived an unintuitive, yet effective, formula for estimating a probability distribution from a sample of data. We define the attenuation of a probability estimator as the largest possible ratio between the per-symbol probability assigned to an arbitrarily long sequence by any distribution, and the corresponding probability assigned by the estimator. We show that some common estimators have infinite attenuation and that the attenuation of the Good-Turing estimator is low, yet greater than 1. We then derive an estimator whose attenuation is 1; that is, asymptotically it does not underestimate the probability of any sequence.

UR - http://www.scopus.com/inward/record.url?scp=0142084741&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0142084741&partnerID=8YFLogxK

U2 - 10.1126/science.1088284

DO - 10.1126/science.1088284

M3 - Article

C2 - 14564004

AN - SCOPUS:0142084741

SN - 0036-8075

VL - 302

SP - 427

EP - 431

JO - Science

JF - Science

IS - 5644

ER -
