Stochastic gradient descent in multilayer networks of neuron-like units has led to dramatic recent progress on a variety of difficult AI problems. Now that we know how effective backpropagation can be in large networks, it is worth reconsidering the widely held belief that the cortex could not possibly be doing backpropagation. Drawing on joint work with Timothy Lillicrap, I will go through the main objections of neuroscientists and show that none of them is hard to overcome if we commit to representing error derivatives as temporal derivatives. This allows the same axon to carry both information about the presence of some feature in the input and information about the derivative of the cost function with respect to the input to that neuron. It predicts spike-timing-dependent plasticity, and it also explains why we cannot code velocity by the rate of change of features that represent position.
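The core idea of coding error derivatives as temporal derivatives can be illustrated with a minimal numerical sketch. The network, sizes, and loss below are illustrative assumptions, not the speaker's actual model: a unit's activity first represents a feature, then is nudged in the cost-reducing direction, so a downstream synapse can recover the error derivative from the rate of change of that same signal, and a local plasticity rule proportional to presynaptic activity times the postsynaptic temporal derivative points down the gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny two-layer network (all sizes and values are illustrative assumptions).
x = rng.normal(size=3)              # input features
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(1, 4))

h = np.tanh(W1 @ x)                 # hidden activity: "presence of a feature"
y = W2 @ h
target = np.array([1.0])

# Standard backprop: error derivative of a squared-error cost w.r.t. hidden activity.
dC_dy = y - target
dC_dh = W2.T @ dC_dy                # the signal backprop would send backwards

# Temporal-derivative coding: instead of a separate error channel, each unit's
# activity is nudged slightly in the direction that reduces the cost.
eps = 0.01
h_later = h - eps * dC_dh           # activity a short time later

# A receiving synapse recovers the error derivative from the rate of change
# of the very same signal that carries the feature information.
decoded = (h - h_later) / eps
assert np.allclose(decoded, dC_dh)

# An STDP-like local rule: weight change proportional to presynaptic activity
# times the postsynaptic temporal derivative. With the local tanh' factor,
# this equals minus the gradient of the cost w.r.t. W1, i.e. a descent step.
dh_dt = (h_later - h) / eps
dW1 = np.outer(dh_dt * (1 - h**2), x)
assert np.allclose(dW1, -np.outer(dC_dh * (1 - h**2), x))
```

Note the tension the sketch makes concrete: once the temporal derivative of an activity is reserved for carrying error information, it is no longer available to code a physical quantity such as velocity as the rate of change of position features.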
Part of a symposium to celebrate the work of Professor Sir David MacKay FRS, held on 14-15 March 2016.