Improving Data-Driven Methods to Identify and Categorize Transgender Individuals by Gender in Insurance Claims Data

Jaclyn M.W. Hughto, Landon Hughes, Kim Yee, Jae Downing, Jacqueline Ellison, Ash Alpert, Guneet Jasuja, Theresa I. Shireman

Research output: Contribution to journal › Article › peer-review


Purpose: Prior algorithms enabled the identification and gender categorization of transgender people in insurance claims databases in which sex and gender are not simultaneously captured. However, these methods have been unable to categorize the gender of a large proportion of their samples. We improve upon these methods to identify the gender of a larger proportion of transgender people in insurance claims data.

Methods: Using 2001-2019 insurance claims data from Optum's Clinformatics® Data Mart, we adapted prior algorithms by combining diagnosis, procedure, and pharmacy claims to (1) identify a transgender sample and (2) stratify the sample by gender category (trans feminine and nonbinary [TFN], trans masculine and nonbinary [TMN], or unclassified). We used logistic regression to estimate the burden of 13 chronic health conditions, controlling for gender category, age, race/ethnicity, enrollment length, and census region.

Results: We identified 38,598 unique transgender people: 50% (n = 19,252) TMN, 26% (n = 10,040) TFN, and 24% (n = 9,306) unclassified. In adjusted models, relative to TMN people, TFN people had significantly higher odds of most chronic health conditions, including HIV, atherosclerotic cardiovascular disorder, myocardial infarction, alcohol use disorder, and drug use disorder. Notably, TMN individuals had significantly higher odds of post-traumatic stress disorder and depression than TFN individuals.

Conclusion: By combining complex administrative claims-based algorithms, we identified the largest U.S.-based sample of transgender individuals and inferred the gender of >75% of the sample. Adjusted models extend prior research documenting key health disparities by gender category. These methods may enable researchers to explore rare and sex-specific conditions in hard-to-reach transgender populations.
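The sample composition reported in the Results can be checked directly from the category counts. The minimal sketch below recomputes each category's share and the proportion of the sample whose gender was inferred (TMN plus TFN); the function and variable names are illustrative and are not part of the study's own code.

```python
def sample_composition(counts: dict[str, int]) -> tuple[int, dict[str, float]]:
    """Return the total N and each category's share of it, in percent."""
    total = sum(counts.values())
    shares = {category: 100 * n / total for category, n in counts.items()}
    return total, shares


# Category counts as reported in the abstract
COUNTS = {"TMN": 19_252, "TFN": 10_040, "Unclassified": 9_306}

total, shares = sample_composition(COUNTS)
# Share of the sample whose gender category could be inferred
classified = shares["TMN"] + shares["TFN"]

for category, share in shares.items():
    print(f"{category}: n = {COUNTS[category]:,} ({share:.1f}%)")
print(f"Total: {total:,}; gender inferred for {classified:.1f}%")
```

Running it prints shares of 49.9%, 26.0%, and 24.1% for a total of 38,598, with gender inferred for 75.9% of the sample, consistent with the rounded figures and the ">75%" claim in the abstract.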

Original language: English (US)
Pages (from-to): 254-263
Number of pages: 10
Journal: LGBT Health
Issue number: 4
State: Published - Jun 1 2022
Externally published: Yes


Keywords

  • health comorbidities
  • insurance
  • methods
  • transgender

ASJC Scopus subject areas

  • Dermatology
  • Obstetrics and Gynecology
  • Public Health, Environmental and Occupational Health
  • Psychiatry and Mental health
  • Urology
