### Abstract

We consider the problem of modeling a distribution whose alphabet size is large relative to the amount of observed data. It is well known that conventional maximum-likelihood estimates do not perform well in that regime. Instead, we find the distribution maximizing the probability of the data's pattern. We derive an efficient algorithm for approximating this distribution. Simulations show that the computed distribution models the data well and yields general-purpose estimators that evaluate various data attributes as accurately as specialized estimators designed for those tasks.
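
To make the notion of a pattern concrete, here is a minimal Python sketch. It assumes the usual definition, in which each symbol is replaced by the order of its first appearance, and it computes a pattern's probability by brute force. It is not the efficient approximation algorithm the abstract refers to, which is not described on this page. The names `pattern` and `pattern_probability` are illustrative only.

```python
from itertools import permutations
from math import prod


def pattern(seq):
    """Replace each symbol by the 1-based order of its first appearance."""
    first_seen = {}
    out = []
    for symbol in seq:
        if symbol not in first_seen:
            first_seen[symbol] = len(first_seen) + 1
        out.append(first_seen[symbol])
    return tuple(out)


def pattern_probability(pat, probs):
    """Probability that an i.i.d. sample drawn from `probs` has pattern `pat`.

    Brute force (assumption: tiny alphabets only): sum over every way of
    assigning the pattern's indices to distinct alphabet symbols. The cost
    grows exponentially, so this is for illustration, not for real data.
    """
    num_distinct = max(pat, default=0)
    total = 0.0
    for assignment in permutations(range(len(probs)), num_distinct):
        # assignment[i - 1] is the alphabet symbol playing the role of index i
        total += prod(probs[assignment[i - 1]] for i in pat)
    return total
```

For the sample `"ab"`, `pattern("ab")` is `(1, 2)`. A uniform distribution on the two observed symbols, which is the conventional maximum-likelihood estimate, gives `pattern_probability((1, 2), [0.5, 0.5]) == 0.5`, whereas a uniform distribution on ten symbols gives about 0.9. This small case suggests why maximizing the pattern probability can favor an alphabet larger than the set of symbols actually observed.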

Original language | English (US) |
---|---|
Title of host publication | IEEE International Symposium on Information Theory - Proceedings |
Pages | 306 |
Number of pages | 1 |
State | Published - 2004 |
Externally published | Yes |
Event | Proceedings - 2004 IEEE International Symposium on Information Theory - Chicago, IL, United States. Duration: Jun 27 2004 → Jul 2 2004 |

### Other

Other | Proceedings - 2004 IEEE International Symposium on Information Theory |
---|---|
Country | United States |
City | Chicago, IL |
Period | 6/27/04 → 7/2/04 |

### ASJC Scopus subject areas

- Electrical and Electronic Engineering

### Cite this

**Algorithms for modeling distributions over large alphabets.** / Orlitsky, Alon; Sajama; Santhanam, Narayana; Viswanathan, Krishnamurthy; Zhang, Junan.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

*IEEE International Symposium on Information Theory - Proceedings.* pp. 306, Proceedings - 2004 IEEE International Symposium on Information Theory, Chicago, IL, United States, 6/27/04.

TY - GEN

T1 - Algorithms for modeling distributions over large alphabets

AU - Orlitsky, Alon

AU - Sajama,

AU - Santhanam, Narayana

AU - Viswanathan, Krishnamurthy

AU - Zhang, Junan

PY - 2004

Y1 - 2004

UR - http://www.scopus.com/inward/record.url?scp=5044241234&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=5044241234&partnerID=8YFLogxK

M3 - Conference contribution

SP - 306

BT - IEEE International Symposium on Information Theory - Proceedings

ER -