Frank et al

Frank et al. (2009) previously reported that the behavioral data were best fit with the simplifying assumption that subjects track the probability of positive RPEs, which can be accomplished by “counting phasic dopamine bursts,” rather than the specific expected reward values of the different responses. As such, θ consists of beta-distributed estimates, Beta(η,β), of the positive prediction errors expected for fast and slow responses (Figure 2). Parameters from alternative models in which expected reward magnitude
is tracked are strongly correlated with those from this model that tracks the probability of RPE. But model fits are superior for the RPE model, which also yields uncertainty estimates that are potentially more suitable for fMRI (see
Supplemental Information). Given the learned expected values, the difference of their means ($\mu_{\text{slow}}$, $\mu_{\text{fast}}$) contributes to response latency on trial t, scaled by free parameter ρ:

$$\rho\,[\mu_{\text{slow}}(t) - \mu_{\text{fast}}(t)] \tag{3}$$

It is important to clarify that though the reward statistics are tracked for different categorical actions (i.e., in terms of “fast” versus “slow”), the predicted RTs are continuous as a function of these statistics. More specifically, RTs are predicted to continuously adjust in proportion to the difference in mean reward statistics, in that a larger difference in values for fast and slow leads to larger changes in RT. Finally, the exploratory component of the model capitalizes on the uncertainty of the probability distributions to strategically explore those responses for which reward
statistics are most uncertain. Specifically, the model assumes that subjects explore uncertain responses to reduce this uncertainty. This component is computed as:

$$\text{Explore}(t) = \varepsilon\,[\sigma_{\text{slow}}(t) - \sigma_{\text{fast}}(t)], \tag{4}$$

where $\sigma_{\text{slow}}$ and $\sigma_{\text{fast}}$ are the uncertainties, quantified in terms of standard deviations of the probability distributions tracked by the Bayesian update rule (Figure 2), and ε is a free parameter controlling the degree to which subjects make exploratory responses in proportion to relative uncertainty. In the primary model, we constrained ε to be greater than 0 to estimate the degree to which relative uncertainty guides exploration, and to prevent the model fits from leveraging this parameter to account for variance related to perseveration during exploitation. However, we also report a series of alternate models for which ε is unconstrained (i.e., it is also allowed to go negative to reflect “ambiguity aversion”; Payzan-LeNestour and Bossaerts, 2011). These exploit and explore mechanisms, together with other components, afford quantitative fits of RT adjustments in this task, and the combined model is identical to that determined to provide the best fit in prior work.
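
To make the exploit and explore terms concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that each response category (fast, slow) carries a Beta(η,β) belief about the probability of a positive RPE, and that the "counting" update simply increments η after a positive RPE and β after a negative one. The uniform Beta(1,1) prior, the class name `BetaBelief`, the function names, and the parameter values are all hypothetical choices made for illustration. The other model components mentioned in the text are omitted.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class BetaBelief:
    """Beta(eta, beta) belief about the probability of a positive RPE."""
    eta: float = 1.0   # assumed prior pseudo-count of positive RPEs
    beta: float = 1.0  # assumed prior pseudo-count of negative RPEs

    def update(self, positive_rpe: bool) -> None:
        # Assumed conjugate "counting" update: one count per observed outcome.
        if positive_rpe:
            self.eta += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        # Mean of a Beta distribution.
        return self.eta / (self.eta + self.beta)

    @property
    def sd(self) -> float:
        # Standard deviation of a Beta distribution.
        n = self.eta + self.beta
        return sqrt(self.eta * self.beta / (n * n * (n + 1.0)))


def exploit_term(slow: BetaBelief, fast: BetaBelief, rho: float) -> float:
    """Equation 3: rho * [mu_slow(t) - mu_fast(t)]."""
    return rho * (slow.mean - fast.mean)


def explore_term(slow: BetaBelief, fast: BetaBelief, epsilon: float) -> float:
    """Equation 4: epsilon * [sigma_slow(t) - sigma_fast(t)]."""
    return epsilon * (slow.sd - fast.sd)


# Hypothetical trial history: three slow responses, none fast.
slow, fast = BetaBelief(), BetaBelief()
for outcome in (True, True, False):
    slow.update(outcome)

exploit = exploit_term(slow, fast, rho=2.0)      # illustrative rho
explore = explore_term(slow, fast, epsilon=1.0)  # illustrative epsilon > 0
print(f"exploit={exploit:.3f}, explore={explore:.3f}")
```

In this toy history the slow belief has the higher mean, so the exploit term is positive, while the untried fast response remains the more uncertain one, so with ε > 0 the explore term is negative. Under these assumptions the two terms pull the predicted RT in opposite directions, which illustrates why the text treats them as separate contributions.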
