Figure 4.
Dependence of the ramp on learning rate. The shape of the ramp, but not the height of its peak, depends on the learning
rate. The graph shows simulated activity near the time of the expected reward for pr = 0.5 and several learning rates, averaged
over both rewarded and unrewarded trials. According to TD learning with persistent
asymmetrically coded prediction errors, averaging over activity in rewarded and unrewarded
trials results in a ramp up to the time of reward. The height of the peak of the ramp
is determined by the ratio of rewarded to unrewarded trials; the breadth
of the ramp, however, is determined by the rate of back-propagation of these error signals from
the time of the (expected) reward to the time of the predictive stimulus. A higher
learning rate results in a larger fraction of the error propagating back, and thus
a broader, more pronounced ramp. With lower learning rates, the ramp becomes negligible, although the
positive activity (on average) at the time of reward is still maintained. Note that
although the learning rate used in the simulations depicted in Figure 1b,d was 0.8,
this should not be taken as the literal synaptic learning rate of the neural substrate,
given our schematic representation of the stimulus. In a more realistic representation
in which a population of neurons is active at every timestep, a much lower learning
rate would produce similar results.
Niv et al. Behavioral and Brain Functions 2005 1:6 doi:10.1186/1744-9081-1-6
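The dependence described in this caption can be illustrated with a minimal simulation sketch. It is not the authors' code and rests on several assumptions not stated in the caption: one value weight per timestep between stimulus and reward (a tapped-delay-line representation), no temporal discounting, learning driven by the unscaled prediction error, and negative errors scaled by a hypothetical factor `neg_scale = 1/6` only when the activity is read out. The function name and all parameter names are illustrative.

```python
import random


def simulate_average_activity(alpha, p_reward=0.5, n_steps=10,
                              n_trials=20000, burn_in=2000,
                              neg_scale=1 / 6, seed=0):
    """Trial-averaged, asymmetrically coded TD error at each timestep.

    Index 0 is stimulus onset; index n_steps is the time of the
    (expected) reward. All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    w = [0.0] * n_steps              # w[t]: value of timestep t after the stimulus
    totals = [0.0] * (n_steps + 1)   # summed coded activity per timestep

    for trial in range(burn_in + n_trials):
        reward = 1.0 if rng.random() < p_reward else 0.0

        # Stimulus onset is unpredicted, so the preceding value is taken as 0.
        deltas = [w[0]]

        # Between stimulus and reward: delta = V(t) - V(t-1), no reward yet.
        for t in range(1, n_steps):
            delta = w[t] - w[t - 1]
            w[t - 1] += alpha * delta    # learning uses the unscaled error
            deltas.append(delta)

        # Time of the expected reward: the trial ends, so the next value is 0.
        delta = reward - w[n_steps - 1]
        w[n_steps - 1] += alpha * delta
        deltas.append(delta)

        # Record activity only after the burn-in period.
        if trial >= burn_in:
            for t, d in enumerate(deltas):
                # Asymmetric coding: negative errors are compressed.
                totals[t] += d if d >= 0 else neg_scale * d

    return [total / n_trials for total in totals]


if __name__ == "__main__":
    for alpha in (0.1, 0.4, 0.8):
        activity = simulate_average_activity(alpha)
        # Show the last four timesteps, ending at the time of reward.
        print(alpha, [round(a, 3) for a in activity[-4:]])
```

Under these assumptions, the averaged activity at the time of reward should come out near (1 - 1/6) × 0.25 ≈ 0.21 for every learning rate, since it depends only on the reward probability and the scaling factor, whereas the entries just before it grow with the learning rate and become negligible when it is low.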