Back-propagation of prediction errors explains ramping activity. (a) The TD prediction error across each of six consecutive trials (top to bottom) from the simulation in Figure 1b, with reward probability pr = 0.5. Highlighted in red is the error at the time of the reward in the first of these trials, and its gradual back-propagation towards the time of the stimulus in subsequent trials. Block letters indicate the outcome of each trial (R = rewarded; N = not rewarded). The sequence of rewards preceding these trials is given at the top right. (b) The TD errors from these six trials, and from the four trials following them, superimposed. The red and green lines illustrate the envelope of the errors in these trials. Averaging over these trials yields no above-baseline activity (black line), because positive and negative errors each occur at random on 50% of trials and so cancel each other. (c) However, when prediction errors are represented asymmetrically above and below the baseline firing rate (here negative errors were scaled by d = 1/6 to simulate the asymmetric encoding of prediction errors by DA neurons), averaging over trials produces a ramp in activity, as illustrated by the black line. All simulation parameters are the same as in Figure 1b,d.
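The averaging argument in panels (b) and (c) can be illustrated with a minimal sketch. It is not the simulation behind the figure: only pr = 0.5 and d = 1/6 come from the caption. The number of time steps, the learning rate `alpha`, the discount `gamma`, the trial counts, and the function name `simulate_ramp` are all assumptions chosen for illustration. The sketch tracks only the interval between stimulus and reward, using one value per time step, and omits the error at stimulus onset.

```python
import random

def simulate_ramp(n_steps=10, p_r=0.5, alpha=0.5, gamma=1.0, d=1/6,
                  n_trials=20000, burn_in=1000, seed=0):
    """Average TD errors per time step, with and without asymmetric scaling.

    p_r and d follow the caption; every other default is an illustrative
    assumption, not a parameter taken from the paper.
    """
    rng = random.Random(seed)
    V = [0.0] * n_steps              # one value estimate per time step
    sum_raw = [0.0] * n_steps        # sum of unscaled errors
    sum_scaled = [0.0] * n_steps     # sum of errors with negatives scaled by d

    for trial in range(burn_in + n_trials):
        rewarded = rng.random() < p_r
        for t in range(n_steps):
            # The value after the final step is zero (the trial ends).
            v_next = V[t + 1] if t + 1 < n_steps else 0.0
            # The reward, if any, arrives at the last time step.
            r = 1.0 if (rewarded and t == n_steps - 1) else 0.0
            delta = r + gamma * v_next - V[t]   # TD prediction error
            V[t] += alpha * delta               # TD(0) update
            if trial >= burn_in:
                sum_raw[t] += delta
                sum_scaled[t] += delta if delta >= 0 else d * delta

    mean_raw = [s / n_trials for s in sum_raw]
    mean_scaled = [s / n_trials for s in sum_scaled]
    return mean_raw, mean_scaled

mean_raw, mean_scaled = simulate_ramp()
for t, (raw, scaled) in enumerate(zip(mean_raw, mean_scaled)):
    print(f"t={t:2d}  mean error={raw:+.4f}  asymmetric mean={scaled:+.4f}")
```

Under these assumptions the unscaled mean stays near zero at every time step, while the asymmetrically scaled mean is positive and grows towards the time of the reward. This is because the trial-to-trial fluctuations in the error are largest at the reward and are attenuated as they propagate back towards the stimulus.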
Niv et al. Behavioral and Brain Functions 2005 1:6 doi:10.1186/1744-9081-1-6