Crudely, subjects choose actions because they think that those actions lead to outcomes that they presently desire. By contrast, habitual instrumental behavior is supposed to have been stamped in by past reinforcement (Thorndike, 1911) and so is divorced from the current value of an associated outcome. Thus, key characteristics of habitual instrumental control include automaticity, computational efficiency, and inflexibility, while characteristics of goal-directed control include active deliberation, high computational cost, and an adaptive flexibility to changing environmental contingencies (Dayan, 2009). Whether behavior is goal directed is usually assayed in a test session using posttraining manipulations, which involve either reinforcer devaluation or contingency degradation. Consider a test session carried out in extinction, i.e., without ongoing reinforcement. In this case, there should be less instrumental responding for an outcome that has been devalued (for example, a food reinforcer that has just been rendered unpalatable) than for an outcome that has not. Importantly, this is only true if knowledge of a reinforcer’s current value (i.e., its desirability) exerts a controlling influence on performance; in other words, if task performance is mediated by a representation of the reinforcer (Adams and Dickinson, 1981). Conversely, habitual behavior comprises instrumental responding that continues to be enacted even when the outcome is undesired. Various circumstances promote habitual responding, notably extended training on interval schedules of reinforcement involving single actions and single outcomes (Dickinson and Charnock, 1985, Dickinson and Balleine, 2002 and Dickinson et al., 1983). The requirement for extensive experience is key, and it implies that behavior is initially goal directed but then becomes habitual over the course of experience. For completeness, we also mention the contingency criterion, wherein goal-directed behavior involves an encoding of the causal relationship between actions and their consequences. Consider a subject trained to press a lever to receive an outcome. If the outcome subsequently becomes equally available with and without a lever press, goal-directed control leads to a decrease in pressing (Dickinson and Balleine, 1994 and Dickinson and Charnock, 1985). The behavioral distinction between goal-directed and habitual control has provided the foundation for a wealth of lesion, inactivation, and pharmacological animal experiments investigating their neural bases. Rodent studies repeatedly highlight a dorsomedial striatum circuit that supports goal-directed behavior (Balleine, 2005, Corbit and Balleine, 2005 and Yin et al., 2005). Related studies show that a circuit centered on dorsolateral striatum supports habit-based behavior (Yin et al., 2004, Yin et al.

In the theory of reinforcement learning, the general problem to be solved is to use experience to identify a suitable control policy in an unknown or changing environment (Sutton and Barto, 1998). All motor learning can be conceptualized within this framework; even if there is no explicit reward structure, any task implicitly carries some notion of success or failure that can be encapsulated mathematically through a cost (reward) function. There are two broad categories of solution methods for such a problem. In a model-based approach, an explicit model of the dynamics of the environment is built from experience, and this is then used to compute the best possible course of action through standard methods of optimal control theory such as dynamic programming. Note that, in general, model-based control can also entail building a model of the reward structure of the task. In the case of motor control, however, we assume that the reward structure is unambiguous: success is achieved by the cursor reaching the target. In model-free control, by contrast, no such model of the task dynamics is built and instead the value of executing a given action in a given state is learned directly from experience based on subsequent success or failure. While a model-based learning strategy requires significantly less experience to obtain proficient control in an environment and offers greater flexibility (particularly in terms of the ability to generalize knowledge to other tasks), model-free approaches have the advantage of computational simplicity and are not susceptible to problems associated with learning inaccurate or imprecise models (Daw et al., 2005 and Dayan, 2009). Therefore, each approach can be advantageous in different circumstances. In sequential discrete decision-making tasks, the brain utilizes both model-based and model-free strategies in parallel (Daw et al., 2005, Daw et al., 2011, Fermin et al., 2010 and Gläscher et al., 2010). Theoretical treatments have argued that competition between these two mechanisms enables the benefits of each to be combined to maximum effect (Daw et al., 2005). Our results suggest that a similar scenario of model-based and model-free learning processes acting in parallel also occurs in the context of motor learning. Adaptation is the model-based component, while model-free components include use-dependent plasticity and operant reinforcement. It is important to note that although the terminology of model-free and model-based learning arises from the theory of reinforcement learning, this does not imply that adaptation is directly sensitive to reward. On the contrary, we believe that adaptation is indifferent to reward outcomes on individual trials, and is purely sensitive to errors in the predicted state of the hand or cursor.
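The contrast between the two solution methods described above can be made concrete with a toy example. The sketch below is only an illustration under assumed names and numbers (a single action, a single outcome, a learning rate of 0.2, 50 trials); it is not the model used in any of the cited studies. The model-free learner caches one action value, whereas the model-based learner stores the action-outcome transition and recomputes the value from the current outcome values.

```python
# Toy one-step task: one action ("press") that always yields one outcome ("food").
# All names and numbers are illustrative assumptions, not taken from the text.

ALPHA = 0.2      # model-free learning rate (assumed)
N_TRIALS = 50    # number of training trials (assumed)


def train(n_trials=N_TRIALS, alpha=ALPHA):
    """Return the cached model-free value and the learned transition counts."""
    q_model_free = 0.0                # value learned directly from reward
    transition_counts = {"food": 0}   # learned model: outcomes that follow "press"
    for _ in range(n_trials):
        outcome, reward = "food", 1.0  # environment: pressing always yields food
        # Model-free: move the cached value toward the experienced reward.
        q_model_free += alpha * (reward - q_model_free)
        # Model-based: record which outcome followed the action.
        transition_counts[outcome] += 1
    return q_model_free, transition_counts


def model_based_value(transition_counts, outcome_values):
    """Expected value of the action, computed from the learned model."""
    total = sum(transition_counts.values())
    return sum(count / total * outcome_values[outcome]
               for outcome, count in transition_counts.items())


q_mf, counts = train()
outcome_values = {"food": 1.0}
value_before = model_based_value(counts, outcome_values)

# Change the reward structure without giving the learner any new trials.
outcome_values["food"] = 0.0
value_after = model_based_value(counts, outcome_values)
```

In this sketch the model-based value follows the change in outcome value immediately, while the cached model-free value `q_mf` stays near 1 until further experience updates it, which is one way to picture the flexibility versus simplicity trade-off.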

, 2009). Interestingly, basal forebrain activation also causes a decrease in interneuronal correlation and increase in sensory-driven response reliability in the visual cortex (Figure 3C), and both effects contribute to improved coding of natural scenes (Goard and Dan, 2009). The strong similarity between the effects of attention and basal forebrain activation again suggests an involvement of the cholinergic system in selective attention. The decrease in interneuronal correlation may be mediated by mAChRs within the cortex (Goard and Dan, 2009; Metherate et al., 1992), whereas the improved visual responses of single cortical neurons could involve enhanced responses of thalamic neurons (Goard and Dan, 2009), nAChR-dependent augmentation of thalamocortical transmission (Disney et al., 2007), and/or mAChR-dependent firing rate increase within the cortex (Herrero et al., 2008; Soma et al., 2012). The recent finding that cholinergic activity can be modulated in a task-dependent manner (Parikh et al., 2007) further supports the plausibility of its involvement in attentional modulation. Of course, it is possible that the cholinergic input plays a permissive rather than instructive role. Selective attention is associated with local activity changes in neurons encoding the attended stimuli, but the neuromodulatory systems in general project diffusely to multiple brain regions. Although there is some topographical organization of the basal forebrain projections to the cortex (Zaborszky et al., 1999), whether there is sufficient spatial precision to support the local modulation by selective attention remains unclear. Another candidate pathway is the top-down feedback from higher-order cortical areas, such as the frontal eye field (FEF), to the visual cortical areas (Gregoriou et al., 2009; Moore and Fallah, 2004; Zhou and Desimone, 2011) (Figure 6). Interestingly, firing rate increases in the FEF induced by local application of a dopamine receptor antagonist mimicked the attentional modulation of V4 neuronal responses (Noudoost and Moore, 2011), suggesting that the effect of neuromodulators could also be mediated by activating the cortico-cortical glutamatergic pathways. While selective attention is typically associated with firing rate increase of the relevant neurons, behavioral arousal or task engagement in general does not always lead to enhanced responses. In the barrel cortex, behavioral arousal or engagement in the learning of a new task was found to suppress whisker-evoked responses (Castro-Alamancos, 2004a; Castro-Alamancos and Oldford, 2002). Similarly, smaller responses to brief tactile stimuli were observed in the rat during exploratory whisker movement than during quiet immobility (Fanselow and Nicolelis, 1999).

To accomplish this, we labeled evoked vesicles by using one action potential (AP) stimulation at the beginning of the 30-s-long dye exposure. Spontaneous labeling was performed via dye exposure for 60 s in the presence of tetrodotoxin (TTX) after 30 s of “presilencing” with TTX to ensure complete activity block (Figure 1A). Given the low release probability of excitatory hippocampal synapses (Murthy et al., 1997) and a very low rate of spontaneous endocytosis at these synapses (∼1 vesicle/min) (Murthy and Stevens, 1999 and Xu et al., 2009), we expected that these protocols would stain at most one vesicle per synapse in the majority of the synaptic population. To further ensure that only single vesicles were selected for analysis, we used previously described feature-detection software (Jaqaman et al., 2008) that was capable of identifying closely positioned particles at subdiffraction distances. Because synaptic vesicles (∼50 nm in diameter) are much smaller than the diffraction-limited resolution of conventional light microscopy, individual vesicles are expected to appear as puncta with a size and shape very similar to the point spread function (PSF) of our imaging system, which was predetermined using stationary fluorescent 40 nm beads (see Figures 1C and S1A–S1C). The detection software extracts the locations of objects within an image by fitting each detected feature with one or more appropriately positioned Gaussians, each with the same width as the PSF. A mixture-model fitting algorithm, which weighs the penalty from having multiple PSFs against the improvement of the fit in the form of an F test (cutoff p < 0.0005), is used to determine the optimal number of PSF features that would best represent each punctum (Jaqaman et al., 2008). Such iterative PSF fitting has been previously shown to achieve ∼100 nm resolution without the use of specialized superresolution imaging equipment (Thomann et al., 2002). In our experiments, more than one particle was indeed identified in a small number of synapses (<10%; Figures S1D and S1E). These cases were not analyzed further to avoid ambiguity of intersecting vesicle trajectories. To make sure that only single-vesicle trajectories were being analyzed, we plotted the histograms of integrated fluorescence values at the sites of functional synapses (as determined by our whole synapse stain/destain procedure; Figure 1A) for both spontaneous and evoked vesicle labeling (Figures S1D and S1E). The prediction of the number of vesicles labeled per functional synapse, as given by the fluorescence value histograms, closely agreed with the PSF feature counts from our detection software (Figures S1D and S1E, inset), providing an independent confirmation of the single-vesicle assertion.
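The idea of weighing extra PSFs against the improvement of the fit can be illustrated with the textbook F statistic for nested least-squares models. This is a generic sketch, not the detection software's actual procedure: the function name, the parameter counts (four parameters for one Gaussian plus background, three more per added Gaussian), and the residual values are all assumptions chosen for illustration.

```python
def nested_f_statistic(rss_simple, rss_complex,
                       n_params_simple, n_params_complex, n_pixels):
    """Textbook F statistic comparing a simpler fit with a nested, richer fit.

    rss_* are residual sums of squares; n_pixels is the number of fitted pixels.
    """
    extra_params = n_params_complex - n_params_simple
    residual_dof = n_pixels - n_params_complex
    if extra_params <= 0 or residual_dof <= 0:
        raise ValueError("complex model must add parameters and leave residual dof")
    improvement_per_param = (rss_simple - rss_complex) / extra_params
    noise_per_dof = rss_complex / residual_dof
    return improvement_per_param / noise_per_dof


# Hypothetical numbers: one PSF (x, y, amplitude, background = 4 parameters)
# versus two PSFs (7 parameters), fitted over 107 pixels.
f_value = nested_f_statistic(rss_simple=100.0, rss_complex=50.0,
                             n_params_simple=4, n_params_complex=7,
                             n_pixels=107)
```

The second PSF would be accepted only if `f_value` exceeds the critical value of the F distribution at the chosen cutoff (p < 0.0005 in the text); computing that critical value requires a statistics library and is left out of this sketch.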

The null direction sweep evoked similar excitation, but the inhibition preceded excitation (Figures 4A, right, and 4E). The magnitudes of the excitatory and inhibitory conductances for the opposing directions, and their ratio, did not differ significantly (Figures 4D and S4D; p > 0.05, paired t test). Therefore, excitation was suppressed to a larger extent by preceding inhibition in the null direction. Interestingly, with slower speed sweeps, we noticed that both preferred and null direction sweeps evoked large and transient excitatory conductances, whereas inhibitory conductances were scattered throughout the duration of FM sweeps (Figures S4C and S4D). This suggests that a coincident arrival of inhibitory inputs at the optimal speed might occur without regard to sweep directions. Twenty-six neurons in the CNIC were recorded under the voltage-clamp mode. Among them, 17 neurons’ membrane potential changes were also measured. The DSI of membrane potential changes was well correlated with the cell’s CF, whereas those of the excitatory and inhibitory inputs were not (Figure 4C). Group data demonstrated an amplitude-balanced inhibition and a temporally reversed inhibition evoked by opposing directions (Figures 4D and 4E). To further examine the contribution of the temporal asymmetry between excitation and inhibition to the direction selectivity, we used a single-compartment neuron model to simulate membrane potential responses (Figure S4E) (Zhou et al., 2010). When the excitatory input and the inhibitory input arrived at the same time, the membrane potential change was not strong enough to pass the action potential threshold to evoke spikes. However, when the excitatory input preceded the inhibitory input, especially by more than 2 ms, the amplitude of the depolarization increased nonlinearly and could exceed the spike threshold. In comparison, when the inhibitory input preceded the excitatory input, the membrane potentials were hyperpolarized first and then depolarized to a lesser extent, that is, below the threshold for all the tested temporal relationships. This implies that the direction-selective membrane potential output is sensitive to the temporal asymmetry of nonselective excitatory and inhibitory inputs received by DS neurons. To examine the synaptic mechanism underlying such temporal asymmetry of excitation and inhibition, and whether there is a coincident arrival of synaptic inputs, we next had to acquire the spectrotemporal pattern of both excitatory and inhibitory inputs within their receptive fields. FM sweeps can be decomposed into a series of tone pips with continuously changing frequencies.
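The dependence of the depolarization on the relative timing of excitation and inhibition, described above for the single-compartment model, can be sketched with a minimal conductance-based integrator. This is a stand-in written for illustration, not the model of Zhou et al. (2010): the reversal potentials, time constants, alpha-function synapses, and equal peak conductances are all assumed values, and no spike threshold is modeled.

```python
import math

# All parameter values below are assumptions chosen for illustration.
E_LEAK, E_EXC, E_INH = -65.0, 0.0, -80.0  # reversal potentials (mV)
TAU_M = 10.0     # membrane time constant (ms)
TAU_SYN = 1.0    # alpha-function time constant for both synapses (ms)
G_PEAK = 1.0     # peak synaptic conductance relative to leak (amplitude balanced)
DT = 0.01        # forward-Euler time step (ms)
T_END = 60.0     # simulated duration (ms)
EXC_ONSET = 20.0 # onset of the excitatory input (ms)


def alpha_conductance(t_since_onset):
    """Alpha-function conductance that peaks at G_PEAK after TAU_SYN ms."""
    if t_since_onset <= 0.0:
        return 0.0
    x = t_since_onset / TAU_SYN
    return G_PEAK * x * math.exp(1.0 - x)


def peak_depolarization(exc_lead_ms):
    """Peak depolarization above rest (mV).

    exc_lead_ms > 0: excitation precedes inhibition by that many ms.
    exc_lead_ms < 0: inhibition precedes excitation.
    """
    inh_onset = EXC_ONSET + exc_lead_ms
    v = E_LEAK
    peak = 0.0
    for step in range(int(T_END / DT)):
        t = step * DT
        g_exc = alpha_conductance(t - EXC_ONSET)
        g_inh = alpha_conductance(t - inh_onset)
        # Leak, excitatory, and inhibitory currents, in units of leak conductance.
        dv = (-(v - E_LEAK) - g_exc * (v - E_EXC) - g_inh * (v - E_INH)) / TAU_M
        v += DT * dv
        peak = max(peak, v - E_LEAK)
    return peak


peak_exc_first = peak_depolarization(10.0)   # excitation leads by 10 ms
peak_coincident = peak_depolarization(0.0)   # simultaneous arrival
peak_inh_first = peak_depolarization(-2.0)   # inhibition leads by 2 ms
```

With these assumed parameters, the depolarization is largest when excitation leads, because the excitatory response peaks before inhibition arrives. Whether a given lead crosses spike threshold depends on parameters that this sketch does not attempt to match.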

50%) in the frontal cortex of schizophrenics (Sokolov, 1998). Clearly, SNPs found in the Grik4 gene as markers of schizophrenia and bipolar disorders confirm a role for this gene as a risk factor for mood disorders. The SNPs vary with the disease, with SNPs at the center of the gene at chromosome 6q11 associated with schizophrenia and SNPs at the gene’s 3′ end associated with bipolar disorders. Clarification of the aspects mentioned above is critical for envisioning therapeutic opportunities. On the one hand, data from patients suggest that a pharmacologically mediated increase in KAR activity might be beneficial to protect against bipolar disorders, while on the other hand, behavioral data from mice (e.g., GluK4-deficient mice) may open the door to therapeutic opportunities for antagonists (e.g., of GluK4). However, this latter approach would be detrimental to other phenotypes, such as schizophrenia. The uncertainty of interpreting behavioral data in mice must also be borne in mind and, as mentioned above, reduced immobility of KO mice in the forced swimming test has been interpreted as antidepressant in some cases and as a sign of mania in others. A significant decrease in GluK2 mRNA expression has been reported in schizophrenic subjects (Porter et al., 1997). Interestingly, this gene maps close to a locus of schizophrenia susceptibility on chromosome 6 (6q16.3-q21) (Bah et al., 2004), although no association between this gene and schizophrenia could be demonstrated after studying 15 SNPs evenly distributed over the entire Grik2 region, ruling out a major role of GluK2 in the pathogenesis of schizophrenia (Shibata et al., 2002). However, several genome-wide studies have shown significant linkage between bipolar disorders and chromosome 6q21 (McQueen et al., 2005), where Grik2 maps, and GluK2 mRNA expression is also reduced in the brain of bipolar patients (Beneyto et al., 2007). Interestingly, Grik2 KO mice exhibit a variety of behaviors, including hyperactivity, aggressiveness, and sensitivity to psychostimulants, reproducing in mice the behavioral symptoms of mania in humans (Shaltiel et al., 2008). However, it is not currently possible to infer whether GluK2 is involved in the pathophysiology of mania and/or susceptibility to bipolar disorders, or if it is just related to some features of their symptoms. In one of the eight genomic loci linked to nonsyndromic autosomal recessive mental retardation in a study of 78 consanguineous Iranian families, gene defects were revealed precisely in an interval on chromosome 6q16.1-q21. This locus contains 25 annotated genes, including Grik2, which was screened for DNA mutations in patients with mental retardation. Only a single nonpolymorphic sequence change was detected, involving a deletion that removed exons 7 and 8 of the Grik2 gene (Motazacker et al., 2007).

, 1989). Most studies testing the BBS hypothesis investigated distributed neuronal activations within a given area (Singer and Gray, 1995). Yet, a stimulus activates neurons distributed across several brain areas and the BBS hypothesis is meant to apply also to such interareal neuronal assemblies. As V4 neurons with two stimuli in their RF dynamically represent the attended stimulus, the BBS hypothesis predicts that they should dynamically synchronize to those V1 neurons that represent the same, i.e., the attended, stimulus. This prediction is confirmed by our present results. Attention affected the gamma rhythm in area V1: while there was no significant attention effect on gamma power, there was a very reliable increase in gamma frequency. The absence of an attentional effect on gamma power in V1 disagrees with one previous study using small static bar stimuli (Chalk et al., 2010) and agrees with another previous study that used stimuli and a task very similar to those of our paradigm (Buffalo et al., 2011). The attentional increase in gamma peak frequency has not been reported before. It is intriguing, because attention to a stimulus is similar to an increase in stimulus contrast (Reynolds and Chelazzi, 2004), and higher contrast induces higher gamma-band frequencies in monkey area V1 (Figure S5A) (Ray and Maunsell, 2010). Higher contrast typically causes gamma power to increase (Henrie and Shapley, 2005; Chalk et al., 2010). Yet, for very high contrast levels, gamma power can saturate or even decrease, as is illustrated in Figure S5B, which explains why attention to our full-contrast stimuli did not lead to further gamma power enhancements. Figure 5 shows that the local gamma peaks had a certain width, overlapping for their larger parts. While the gamma peak frequency at the relevant V1 site was 2–3 Hz higher than at the irrelevant V1 site, it was 4–6 Hz higher than in V4. If one considered these slightly different gamma peak frequencies without the coherence results, then the simplest interpretation would be the following: the rhythms at the attended V1 site, the unattended V1 site, and the V4 site reflect three independent sine wave oscillators with slightly, but distinctly different, frequencies; the width of the respective frequency bands is due to moment-to-moment deviations from perfect sine waves of the respective frequencies; those deviations are irrelevant noise. This interpretation entails that the three oscillators constantly precess relative to each other, because their peak frequencies differ. For example, in monkey P, the V1-attended gamma peak frequency was 65.3 Hz and the V4 gamma peak frequency was 59.5 Hz (Figure 5), i.e., the peak frequencies differed by roughly 6 Hz.
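To make the precession argument concrete, the following short calculation uses the two peak frequencies quoted above for monkey P and asks how long two ideal, independent sine wave oscillators at those frequencies would take to slip by one full cycle relative to each other. The variable names are ours, and the idealization (perfect, noise-free oscillators) is exactly the interpretation under discussion, not a claim about the data.

```python
# Peak frequencies quoted in the text for monkey P.
f_v1_attended_hz = 65.3
f_v4_hz = 59.5

# Two independent oscillators drift apart at the difference frequency.
frequency_difference_hz = f_v1_attended_hz - f_v4_hz

# Time for the relative phase to advance by one full cycle (360 degrees).
slip_period_ms = 1000.0 / frequency_difference_hz

# Number of V1 gamma cycles that elapse during one full phase slip.
v1_cycles_per_slip = f_v1_attended_hz / frequency_difference_hz
```

Under this idealization, the relative phase would sweep through all values roughly every 170 ms, so no stable phase relation between the V1 and V4 rhythms could persist.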

Binned firing rates were then converted to z-scores and averaged across all units with positive EPM scores and all such transitions. As expected, units that fired preferentially in the closed arms had higher firing rates prior to leaving the closed arm (Figure 5C, upper panel). Consistent with predictive firing patterns, closed-arm-preferring unit firing rates began to decrease approximately 2.5 s before the mouse left the closed arm. Similarly, firing rates of open-arm-preferring units were low in the closed arms and began to increase several seconds before the transition point (Figure 5C, middle panel). During transitions back to the closed arms, firing rates of these neurons demonstrated complementary profiles (Figure 5D). In both types of transitions, units with negative (non-paradigm-related) EPM scores did not display consistent changes in firing rates. To quantitatively demonstrate predictivity, the time bins at which firing rates began to change were identified using a change point analysis (Gallistel et al., 2004). This method identifies the point at which the slope of the cumulative sum of the time series of interest changes significantly (Kolmogorov-Smirnov test, p < 0.01). The identified change points are indicated by arrows in Figures 5C and 5D. Note that in each case, mPFC single unit activity began to change 1.5–2.7 s prior to the exit from or entry into the closed arm, demonstrating that firing rates are not simply passively reflecting the location of the animal but rather foreshadowing behavior a few seconds into the future. To confirm these firing patterns using an unbiased approach, we used principal component analysis (Chapin, 2004) on firing rates of all units during arm transitions (Figures 5E and 5F). As predicted from the firing patterns described above, the first principal component (PC1) during each transition type appeared to closely follow the patterns of closed-arm- and open-arm-preferring units, with the PC1 value switching sign at or just prior to the transition point. Closed-arm- and open-arm-preferring units loaded inversely onto PC1 for each transition type. The above data demonstrate that mPFC single units fired differently in closed and open arms of the EPM. However, firing patterns shown in Figure 1 could be induced by differences between the closed and open arms that are unrelated to anxiety. One such confound is the geometric arrangement of the arms. It is possible, for example, that a cell that is active preferentially in the open arms is actually firing not because the animal is in the open arms, but rather, because it is walking in the north-south direction.
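The core step of the change point analysis mentioned above, locating where the slope of the cumulative sum changes, can be sketched as follows. This is a simplified illustration rather than the published algorithm: it finds only the single point at which the cumulative record deviates most from a straight line, and it omits the significance test (Kolmogorov-Smirnov, p < 0.01) applied in the text. The function name and the example firing rates are hypothetical.

```python
def change_point(series):
    """Index of the first sample after the largest slope change.

    The cumulative sum of `series` is compared with the straight line that
    joins its start and end; the sample count at which the deviation is
    largest marks the putative change point.
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two samples")
    cumulative = [0.0]
    for value in series:
        cumulative.append(cumulative[-1] + value)
    total = cumulative[-1]
    best_index, best_deviation = 0, 0.0
    for k in range(1, n):
        deviation = abs(cumulative[k] - total * k / n)
        if deviation > best_deviation:
            best_index, best_deviation = k, deviation
    return best_index


# Hypothetical binned firing rates: 30 bins at 5 Hz, then 10 bins at 2 Hz.
rates = [5.0] * 30 + [2.0] * 10
first_changed_bin = change_point(rates)
```

In this example the function returns 30, the first bin at the lower rate. A full analysis would then test whether the samples before and after that bin differ significantly before accepting the change point.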

For each task-related neuron, we then performed a three-way nested ANOVA with target position, distance, and color combination as factors using mean activity within a 100 ms window starting from color-change onset and slid along the trial in steps of 10 ms. If the neuron revealed a main effect of target position in at least three consecutive time bins, it was classified as a target-selection unit (Figure 3). (See Table S1 for the results.) The position (left or right) at which the unit produced the stronger response to the target was considered the neuron’s preferred location. In 64 out of the 122 target-selective neurons, we obtained data during the mapping task. We conducted in each unit a three-way ANOVA with target position, color, and motion direction as factors using mean firing rates within a 300 ms time window following stimulus onset. The proportion of neurons selective for each factor appears in Figure S2B. In order to determine whether such proportions were significantly different from those expected by chance, we compared them against the ones obtained through a simulation procedure. We simulated for each neuron firing rates for the same number of trials as during the task. These were obtained through an algorithm that chose for each condition n values (n, number of trials) from a normal distribution of responses with mean equal to the mean firing rate across the entire sample (over the same 300 ms following stimulus onset) and standard deviation equal to the average standard deviation across the sample. For the few cases of negative firing rates, the values were set to zero. We then performed the same three-way ANOVA on the simulated data. We ran the simulation and the ANOVA 100 times and obtained mean estimates of the proportions as well as confidence intervals. The mean proportions of cells that revealed a significant main effect were: 5.1% (color), 5.33% (side), and 5.25% (direction). Confidence intervals for all of them were between 4.5% and 5.82%, considerably overlapping with the real data corresponding to color and direction, but not target position. To quantify the ability of target-distracter discrimination by the group of 122 dlPFC neurons showing differences in firing rate between targets and distracters at the preferred location, we applied a ROC analysis. This analysis takes into account not only the differences in mean response between two conditions but also the response variability of a neuron in individual trials (Thompson et al., 1996). A derived measurement, the auROC, represents the probability with which, on the basis of firing rates, an ideal observer can reliably identify the target in the presence of a distracter. A value of 0.5 indicates that a given firing rate could have been elicited with equal probability by the target or the distracter at the neurons’ preferred location. A value of 1.
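The auROC measure described above can be computed directly from single-trial firing rates. The sketch below uses the standard equivalence between the area under the ROC curve and the probability that a randomly drawn target trial has a higher rate than a randomly drawn distracter trial (ties counted as one half). The function name and the example rates are hypothetical, and the authors' own implementation may differ.

```python
def au_roc(target_rates, distracter_rates):
    """Area under the ROC curve for discriminating target from distracter trials.

    Computed as the fraction of (target, distracter) trial pairs in which the
    target trial has the higher firing rate, counting ties as one half.
    """
    if not target_rates or not distracter_rates:
        raise ValueError("both conditions need at least one trial")
    score = 0.0
    for target in target_rates:
        for distracter in distracter_rates:
            if target > distracter:
                score += 1.0
            elif target == distracter:
                score += 0.5
    return score / (len(target_rates) * len(distracter_rates))


# Hypothetical single-trial firing rates (spikes/s) at the preferred location.
separable = au_roc([10.0, 12.0, 14.0], [1.0, 2.0, 3.0])
indistinguishable = au_roc([5.0, 5.0], [5.0, 5.0])
```

Because every pair of trials is compared, the measure reflects trial-to-trial variability as well as the difference in mean response.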

Frank et al. (2009) previously reported that the behavioral data were best fit with the simplifying assumption that subjects track the probability of positive RPEs, which can be accomplished by “counting phasic dopamine bursts,” rather than the specific expected reward values of the different responses. As such, θ consists of beta distributed, Beta(η,β), estimates of positive prediction errors expected for fast and slow responses (Figure 2). Parameters from alternative models in which expected reward magnitude is tracked are strongly correlated with those from this model that tracks the probability of RPE. But model fits are superior for the RPE model, which also yields uncertainty estimates that are potentially more suitable for fMRI (see Supplemental Information). Given the learned expected values, the difference of their means ($\mu_{\text{slow}}$, $\mu_{\text{fast}}$) contributes to response latency on trial t scaled by free parameter ρ:

$$\rho\,[\mu_{\text{slow}}(t) - \mu_{\text{fast}}(t)] \qquad (3)$$

It is important to clarify that though the reward statistics are tracked for different categorical actions (i.e., in terms of “fast” versus “slow”), the predicted RTs are continuous as a function of these statistics. More specifically, RTs are predicted to continuously adjust in proportion to the difference in mean reward statistics, in that a larger difference in values for fast and slow leads to larger changes in RT. Finally, the exploratory component of the model capitalizes on the uncertainty of the probability distributions to strategically explore those responses for which reward statistics are most uncertain. Specifically, the model assumes that subjects explore uncertain responses to reduce this uncertainty. This component is computed as:

$$\text{Explore}(t) = \varepsilon\,[\sigma_{\text{slow}}(t) - \sigma_{\text{fast}}(t)], \qquad (4)$$

where $\sigma_{\text{slow}}$ and $\sigma_{\text{fast}}$ are the uncertainties, quantified in terms of standard deviations of the probability distributions tracked by the Bayesian update rule (Figure 2), and ε is a free parameter controlling the degree to which subjects make exploratory responses in proportion to relative uncertainty. In the primary model, we constrained ε to be greater than 0 to estimate the degree to which relative uncertainty guides exploration, and to prevent the model fits from leveraging this parameter to account for variance related to perseveration during exploitation. However, we also report a series of alternate models for which ε is unconstrained (i.e., it is also allowed to go negative to reflect “ambiguity aversion”; Payzan-LeNestour and Bossaerts, 2011). These exploit and explore mechanisms, together with other components, afford quantitative fits of RT adjustments in this task, and the combined model is identical to that determined to provide the best fit in prior work.
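The exploit term of equation (3) and the explore term of equation (4) can be computed from the standard closed-form mean and standard deviation of a Beta(η,β) distribution. The sketch below is illustrative only: the function names, the example parameter values, and the choice of ρ and ε are assumptions, and the rule by which η and β are updated from trial outcomes is not shown.

```python
import math


def beta_mean(eta, beta):
    """Mean of a Beta(eta, beta) distribution."""
    return eta / (eta + beta)


def beta_sd(eta, beta):
    """Standard deviation of a Beta(eta, beta) distribution."""
    total = eta + beta
    return math.sqrt(eta * beta / (total ** 2 * (total + 1.0)))


def exploit_term(rho, slow, fast):
    """Equation (3): rho * [mu_slow - mu_fast]; slow/fast are (eta, beta) pairs."""
    return rho * (beta_mean(*slow) - beta_mean(*fast))


def explore_term(epsilon, slow, fast):
    """Equation (4): epsilon * [sigma_slow - sigma_fast]."""
    return epsilon * (beta_sd(*slow) - beta_sd(*fast))


# Hypothetical state: both responses look equally good on average, but the
# slow response has been sampled less often and is therefore more uncertain.
slow_params = (2.0, 2.0)
fast_params = (8.0, 8.0)
exploit = exploit_term(rho=1.0, slow=slow_params, fast=fast_params)
explore = explore_term(epsilon=1.0, slow=slow_params, fast=fast_params)
```

In this example the exploit term is zero because the means are equal, while the explore term is positive, so under a positive ε the model would shift responding toward the more uncertain slow response.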