### Abstract

We consider the problem of modeling a distribution whose alphabet size is large relative to the amount of observed data. It is well known that conventional maximum-likelihood estimates perform poorly in that regime. Instead, we find the distribution that maximizes the probability of the data's pattern, and we derive an efficient algorithm for approximating this distribution. Simulations show that the computed distribution models the data well and yields general estimators that evaluate various data attributes as well as do specific estimators designed especially for these tasks.
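To make the idea of a pattern concrete, here is a minimal Python sketch. It assumes the usual definition of a pattern, in which each symbol is replaced by the order of its first appearance, and it computes the pattern probability by brute-force enumeration. This is not the efficient approximation algorithm the abstract refers to: the function names `pattern` and `pattern_probability` are hypothetical, and the enumeration is practical only for very small alphabets and short sequences.

```python
from collections import Counter
from itertools import permutations
from math import prod


def pattern(seq):
    """Replace each symbol by the (1-based) order of its first appearance."""
    first_seen = {}
    out = []
    for symbol in seq:
        if symbol not in first_seen:
            first_seen[symbol] = len(first_seen) + 1
        out.append(first_seen[symbol])
    return out


def pattern_probability(pat, p):
    """Probability that i.i.d. draws from p produce the pattern pat.

    Brute force (illustrative assumption, not the paper's method): sum over
    every way of assigning distinct alphabet symbols to the pattern indices.
    """
    counts = Counter(pat)
    # mults[j-1] = number of times pattern index j occurs
    mults = [counts[j] for j in range(1, len(counts) + 1)]
    total = 0.0
    for assignment in permutations(range(len(p)), len(mults)):
        total += prod(p[a] ** mu for a, mu in zip(assignment, mults))
    return total


if __name__ == "__main__":
    print(pattern("abracadabra"))
    # Two observations, both distinct: pattern [1, 2].
    print(pattern_probability([1, 2], [0.5, 0.5]))   # sequence ML estimate
    print(pattern_probability([1, 2], [0.1] * 10))   # uniform over 10 symbols
```

For the two-symbol sample `ab`, the conventional maximum-likelihood estimate puts probability 1/2 on each observed symbol and assigns the pattern `[1, 2]` probability 0.5, whereas a uniform distribution over ten symbols assigns it probability about 0.9. This illustrates why maximizing pattern probability can favor a larger alphabet than the one observed.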

| Original language | English (US) |
|---|---|
| Number of pages | 1 |
| Journal | IEEE International Symposium on Information Theory - Proceedings |
| State | Published - Oct 20 2004 |
| Event | Proceedings - 2004 IEEE International Symposium on Information Theory - Chicago, IL, United States. Duration: Jun 27 2004 → Jul 2 2004 |

### ASJC Scopus subject areas

- Theoretical Computer Science
- Information Systems
- Modeling and Simulation
- Applied Mathematics

## Cite this

Orlitsky, A., Sajama, Santhanam, N., Viswanathan, K., & Zhang, J. (2004). Algorithms for modeling distributions over large alphabets. *IEEE International Symposium on Information Theory - Proceedings*.