Simplified to remove features unrelated to the present study, the experience-weighted attraction (EWA) model of Camerer and Ho (1999) is described by the following equations:

$$n_{c,t} = n_{c,t-1} \times \rho + 1 \quad \text{(Equation 1)}$$

and

$$v_{c,t} = \frac{v_{c,t-1} \times \phi \times n_{c,t-1} + \lambda_{t-1}}{n_{c,t}}. \quad \text{(Equation 2)}$$

Here, $n_{c,t}$ is the "experience weight" of choice $c$ (the blue or yellow stimulus) on trial $t$, which is updated on every trial using the experience decay factor $\rho$; $v_{c,t}$ is the value of choice $c$ on trial $t$; $\lambda_{t-1} \in \{0, 1\}$ is the outcome received in response to that choice; and $\phi$ is the decay factor for previous payoffs, analogous to the learning rate in the Rescorla-Wagner model. In particular, note that for $\rho = 0$, $n_{c,t}$ is everywhere 1 and the model reduces to Rescorla-Wagner, whereas for $\rho > 0$ the experience weights promote increasingly sluggish updating over time. Note that a rearrangement of the parameters is required to see the equivalence between these equations and Rescorla-Wagner: the Rescorla-Wagner learning rate, usually denoted $\alpha$, is here equivalent to $(1 - \phi)$, and the softmax inverse temperature $\beta$, below, is equivalent to the product $\beta\alpha$ in Rescorla-Wagner. This is because the values $v_{c,t}$ learned here are scaled by a constant factor of $1/\alpha$ relative to their Rescorla-Wagner equivalents. This rescaling makes the model more numerically stable at small $\alpha$.
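To make the update concrete, the following Python sketch applies Equations 1 and 2 for a single trial. It is a minimal illustration rather than the authors' code: the function name, the treatment of the experience weight as a single scalar shared across options, the restriction of the value update to the chosen option, and the initial values are assumptions made here for clarity.

```python
import numpy as np

def ewa_update(v, n_prev, chosen, outcome, rho, phi):
    """One trial of the simplified EWA update.

    v       : array of values, one per stimulus (e.g., blue and yellow)
    n_prev  : experience weight from the previous trial (assumed scalar here)
    chosen  : index of the chosen stimulus
    outcome : lambda in {0, 1}, the outcome received for that choice
    rho     : experience decay factor
    phi     : decay factor for previous payoffs
    """
    n = n_prev * rho + 1                                   # Equation 1
    v = v.copy()
    v[chosen] = (v[chosen] * phi * n_prev + outcome) / n   # Equation 2
    return v, n

# Example: with rho = 0 the update reduces to Rescorla-Wagner with
# learning rate alpha = 1 - phi (up to the 1/alpha scaling noted above).
v, n = np.zeros(2), 1.0
v, n = ewa_update(v, n, chosen=0, outcome=1, rho=0.0, phi=0.9)
```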
The hypothesis reflected by the second model (RP) is that perseverative behavior is caused by reduced learning from punishment, such that punishment of the previously rewarded stimulus has little effect, resulting in a failure to devalue that stimulus. This model is described by the following equations:

$$v_{c,t} = v_{c,t-1} + \alpha_{\text{pun}} \times (\lambda_{t-1} - v_{c,t-1}) + \alpha_{\text{rew}} \times (\lambda_{t-1} - v_{c,t-1}) \quad \text{(Equation 3)}$$

and

$$v_{\neg c,t} = v_{\neg c,t-1}, \quad \text{(Equation 4)}$$

where $\alpha_{\text{pun}}$ is the learning rate for punishment (0 on reward trials), $\alpha_{\text{rew}}$ is the learning rate for reward (0 on punishment trials), and $v_{\neg c,t}$ is the value of the unchosen option. Note that only the chosen stimulus is updated.
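A corresponding sketch of the RP update for one trial is below. The outcome coding (1 for a reward trial, 0 for a punishment trial) and the function name are assumptions for illustration; per Equation 4, only the chosen option's value changes.

```python
import numpy as np

def rp_update(v, chosen, outcome, alpha_rew, alpha_pun):
    """One trial of the RP model (Equations 3 and 4).

    outcome : lambda in {0, 1}; assumed 1 = reward trial, 0 = punishment trial
    """
    v = v.copy()
    # Only one learning rate is nonzero on any given trial:
    # alpha_rew applies on reward trials, alpha_pun on punishment trials.
    alpha = alpha_rew if outcome == 1 else alpha_pun
    v[chosen] += alpha * (outcome - v[chosen])   # Equation 3
    # Equation 4: the unchosen option's value is carried forward unchanged.
    return v

# Example call with assumed starting values:
v = rp_update(np.array([0.5, 0.5]), chosen=1, outcome=0,
              alpha_rew=0.3, alpha_pun=0.1)
```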
For both models, to select an action based on the computed values, we used a softmax choice function to compute the probability of each choice. For a given set of parameters, this equation gives the probability that the next choice is $i$, given the previous choices:

$$p(c_{t+1} = i) = \frac{e^{\beta Q(c=i,\,t+1)}}{\sum_j e^{\beta Q(c=j,\,t+1)}}. \quad \text{(Equation 5)}$$

Here, $\beta$ is the inverse temperature parameter. For both models, we fit all parameters separately to the choices of each individual (RP: $\alpha_{\text{pun}}$, $\alpha_{\text{rew}}$, $\beta$; EWA: $\phi$, $\rho$, $\beta$). To facilitate stable estimation across so large a group of subjects, we used weakly informative priors (Table 1) to regularize the estimated parameters toward realistic values, and thus used maximum a posteriori (MAP) estimation rather than maximum likelihood (Daw, 2011).
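The sketch below shows how Equation 5 and the trial-by-trial updates enter a MAP fit for one subject under the RP model. The optimizer call, the starting values, and the specific prior densities are placeholders standing in for the weakly informative priors of Table 1 (not reproduced here); choices are assumed coded as option indices and outcomes as $\lambda \in \{0, 1\}$.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import beta as beta_dist, gamma as gamma_dist

def softmax_prob(values, beta):
    """Equation 5: choice probabilities given values and inverse temperature beta."""
    z = beta * np.asarray(values, dtype=float)
    z -= z.max()                 # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def neg_log_posterior(params, choices, outcomes):
    """Negative log posterior of the RP model for one subject's choice data.

    The prior terms below are illustrative placeholders, not the Table 1 priors.
    """
    alpha_rew, alpha_pun, beta = params
    if not (0 < alpha_rew < 1 and 0 < alpha_pun < 1 and beta > 0):
        return np.inf
    v = np.zeros(2)
    nll = 0.0
    for c, lam in zip(choices, outcomes):
        p = softmax_prob(v, beta)
        nll -= np.log(p[c])                    # likelihood of the observed choice
        alpha = alpha_rew if lam == 1 else alpha_pun
        v[c] += alpha * (lam - v[c])           # RP update for the chosen option
    # Placeholder priors: Beta(2, 2) on learning rates, Gamma(2, scale=3) on beta.
    log_prior = (beta_dist.logpdf(alpha_rew, 2, 2)
                 + beta_dist.logpdf(alpha_pun, 2, 2)
                 + gamma_dist.logpdf(beta, 2, scale=3))
    return nll - log_prior

# MAP estimate for one (hypothetical) subject's choices and outcomes:
# result = minimize(neg_log_posterior, x0=[0.5, 0.5, 3.0],
#                   args=(choices, outcomes), method="Nelder-Mead")
```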