Primate orbitofrontal cortex codes information relevant for managing explore-exploit tradeoffs

Vincent D. Costa, Bruno B. Averbeck

Research output: Contribution to journal › Article › peer-review

33 Scopus citations

Abstract

Reinforcement learning (RL) refers to the behavioral process of learning to obtain reward and avoid punishment. An important component of RL is managing explore-exploit tradeoffs, which refers to the problem of choosing between exploiting options with known values and exploring unfamiliar options. We examined correlates of this tradeoff, as well as other RL-related variables, in orbitofrontal cortex (OFC) while three male monkeys performed a three-armed bandit learning task. During the task, novel choice options periodically replaced familiar options. The values of the novel options were unknown, and the monkeys had to explore them to see if they were better than other currently available options. The identity of the chosen stimulus and the reward outcome were strongly encoded in the responses of single OFC neurons. These two variables define the states and state transitions in our model that are relevant to decision-making. The chosen value of the option and the relative value of exploring that option were encoded at intermediate levels. We also found that OFC value coding was stimulus-specific, as opposed to coding value independent of the identity of the option. The location of the option and the value of the current environment were encoded at low levels. Therefore, we found encoding of the variables relevant to learning and managing explore-exploit tradeoffs in OFC. These results are consistent with findings in the ventral striatum and amygdala and show that this monosynaptically connected network plays an important role in learning based on the immediate and future consequences of choices.
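
As a rough illustration of the task structure described in the abstract, the following Python sketch simulates a three-armed bandit in which a novel option periodically replaces a familiar one. It is not the authors' model or analysis code. The function name `run_bandit`, the parameters `swap_every` and `novelty_bonus`, the uniform reward probabilities, and the simple "posterior mean plus shrinking bonus" choice rule are all assumptions introduced here for illustration only.

```python
import random


def run_bandit(n_trials=200, swap_every=20, novelty_bonus=0.5, seed=0):
    """Simulate a three-armed bandit with periodic novel options.

    Hypothetical sketch: the agent and all parameters are illustrative
    assumptions, not the model used in the paper.
    """
    rng = random.Random(seed)

    def new_option():
        # Hidden reward probability plus the agent's running counts.
        return {"p": rng.random(), "rewards": 0, "n": 0}

    options = [new_option() for _ in range(3)]
    total_reward = 0
    novel_choices = 0

    def score(option):
        # Beta(1, 1) posterior mean of the reward probability (exploit term)
        # plus a bonus that shrinks as the option is sampled (explore term).
        mean = (option["rewards"] + 1) / (option["n"] + 2)
        return mean + novelty_bonus / (option["n"] + 1)

    for trial in range(1, n_trials + 1):
        # Periodically replace one familiar option with a novel one
        # whose value is unknown to the agent.
        if trial % swap_every == 0:
            options[rng.randrange(3)] = new_option()

        choice = max(range(3), key=lambda i: score(options[i]))
        chosen = options[choice]
        if chosen["n"] == 0:
            novel_choices += 1  # first-ever sample of this option

        reward = 1 if rng.random() < chosen["p"] else 0
        chosen["rewards"] += reward
        chosen["n"] += 1
        total_reward += reward

    return total_reward, novel_choices


if __name__ == "__main__":
    reward, explored = run_bandit()
    print(f"total reward: {reward}, first-time choices: {explored}")
```

With `novelty_bonus=0.5`, an unsampled option always scores higher than any sampled one under this rule, so the simulated agent tries every novel option once as soon as it appears. Setting `novelty_bonus=0.0` removes that drive, which makes the cost of never exploring easy to compare.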

Original language: English (US)
Pages (from-to): 2553-2561
Number of pages: 9
Journal: Journal of Neuroscience
Volume: 40
Issue number: 12
DOIs
State: Published - Mar 18 2020

Keywords

  • Decision-making
  • Explore-exploit
  • Monkey
  • Orbitofrontal cortex
  • Reinforcement learning

ASJC Scopus subject areas

  • General Neuroscience
