journal article Mar 14, 1997

A Neural Substrate of Prediction and Reward

DOI: 10.1126/science.275.5306.1593
Abstract
The capacity to predict future events permits a creature to detect, model, and manipulate the causal structure of its interactions with its environment. Behavioral experiments suggest that learning is driven by changes in the expectations about future salient events such as rewards and punishments. Physiological work has recently complemented these studies by identifying dopaminergic neurons in the primate whose fluctuating output apparently signals changes or errors in the predictions of future salient and rewarding events. Taken together, these findings can be understood through quantitative theories of adaptive optimizing control.
References
81
[1]
Dickinson A., Contemporary Animal Learning Theory (Cambridge Univ. Press, Cambridge, 1980); N. J. Mackintosh, Conditioning and Associative Learning (Oxford Univ. Press, Oxford, 1983); C. R. Gallistel, The Organization of Learning (MIT Press, Cambridge, MA, 1990); L. A. Real, Science 253, 980 (1991).
[2]
Pavlov I. P., Conditioned Reflexes (Oxford Univ. Press, Oxford, 1927); B. F. Skinner, The Behavior of Organisms (Appleton-Century-Crofts, New York, 1938); J. Olds, Drives and Reinforcement (Raven, New York, 1977); R. A. Wise, in The Neuropharmacological Basis of Reward, J. M. Liebman and S. J. Cooper, Eds. (Clarendon Press, New York, 1989); N. W. White and P. M. Milner, Annu. Rev. Psychol. 43, 443 (1992); T. W. Robbins and B. J. Everitt, Curr. Opin. Neurobiol. 6, 228 (1996).
[3]
Rescorla R. A., Wagner A. R., in Classical Conditioning II: Current Research and Theory, A. H. Black and W. F. Prokasy, Eds. (Appleton-Century-Crofts, New York, 1972), pp. 64–69.
[5]
Pearce J. M. and Hall G., ibid. 87, 532 (1980).
[6]
Kamin L. J., in Punishment and Aversive Behavior, Campbell B. A., Church R. M., Eds. (Appleton-Century-Crofts, New York, 1969), pp. 279–296.
[7]
Sutton R. S., Barto A. G., Psychol. Rev. 88, 135 (1981); R. S. Sutton, Mach. Learn. 3, 9 (1988). 10.1037/0033-295x.88.2.135
[8]
Sutton R. S., Barto A. G., Proceedings of the Ninth Annual Conference of the Cognitive Science Society (Seattle, WA, 1987); in Learning and Computational Neuroscience, M. Gabriel and J. Moore, Eds. (MIT Press, Cambridge, MA, 1989). For specific application to eyeblink conditioning, see J. W. Moore et al., Behav. Brain Res. 12, 143 (1986).
[9]
Quartz S. R., Dayan P., Montague P. R., Sejnowski T. J., Soc. Neurosci. Abstr. 18, 1210 (1992);
[10]
Montague P. R., Dayan P., Nowlan S. J., Pouget A., Sejnowski T. J., in Advances in Neural Information Processing Systems 5, Hanson S. J., Cowan J. D., Giles C. L., Eds. (Morgan Kaufmann, San Mateo, CA, 1993), pp. 969–976.
[11]
Montague P. R., Dayan P., Sejnowski T. J., in Advances in Neural Information Processing Systems 6, Tesauro G., Cowan J. D., Alspector J., Eds. (Morgan Kaufmann, San Mateo, CA, 1994), pp. 598–605.
[12]
Montague P. R., Sejnowski T. J., Learn. Mem. 1, 1 (1994); 10.1101/lm.1.1.1
[13]
Montague P. R., in Neural-Network Approaches to Cognition: Biobehavioral Foundations, Donahoe J., Ed. (Elsevier, Amsterdam, in press);
[14]
Montague P. R. and Dayan P., in A Companion to Cognitive Science, Bechtel W. and Graham G., Eds. (Blackwell, Oxford, in press).
[16]
Montague P. R., Dayan P., Sejnowski T. J., "A framework for mesencephalic dopamine systems based on predictive Hebbian learning," J. Neurosci. 16, 1936 (1996). 10.1523/jneurosci.16-05-01936.1996
[17]
Other work has suggested an interpretation of monoaminergic influences similar to that taken above (8-12) [K. J. Friston, G. Tononi, G. N. Reeke, O. Sporns, G. M. Edelman, Neuroscience 59, 229 (1994); 10.1016/0306-4522(94)90592-4
[18]
J. C. Houk, J. L. Adams, A. G. Barto, in Models of Information Processing in the Basal Ganglia, J. C. Houk, J. L. Davis, D. G. Beiser, Eds. (MIT Press, Cambridge, MA, 1995), pp. 249–270]. Other models of monoaminergic influences have considered what could be called attention-based accounts (4) rather than prediction error-based explanations [D. Servan-Schreiber, H. Printz, J. D. Cohen, Science 249, 892 (1990)]. 10.7551/mitpress/4708.003.0020
[19]
Koob G. F., Semin. Neurosci. 4, 139 (1992); 10.1016/1044-5765(92)90012-q
[20]
Wise R. A. and Hoffman D. C., Synapse 10, 247 (1992); 10.1002/syn.890100307
[21]
DiChiara G., Drug Alcohol Depend. 38, 95 (1995). 10.1016/0376-8716(95)01118-i
[22]
Phillips A. G., Brooke S. M., Fibiger H. C., Brain Res. 85, 13 (1975); 10.1016/0006-8993(75)90998-1
[23]
Phillips A. G., Carter D. A., Fibiger H. C., ibid. 104, 221 (1976);
[25]
Phillips A. G., Mora F., Rolls E. T., Psychopharmacology 62, 79 (1979); 10.1007/bf00426039
[26]
Corbett D. and Wise R. A., Brain Res. 185, 1 (1980); 10.1016/0006-8993(80)90666-6
[28]
Wise R. A., Behav. Brain Sci. 5, 39 (1982); 10.1017/s0140525x00010372
[29]
Beninger R. J., Brain Res. Rev. 6, 173 (1983); 10.1016/0165-0173(83)90038-3
[30]
Beninger R. J. and Hahn B. L., Science 220, 1304 (1983); 10.1126/science.6857251
[31]
Beninger R. J., Brain Res. Bull. 23, 365 (1989); 10.1016/0361-9230(89)90223-2
[32]
LeMoal M. and Simon H., Physiol. Rev. 71, 155 (1991); 10.1152/physrev.1991.71.1.155
[34]
Schultz W., J. Neurophysiol. 56, 1439 (1986); 10.1152/jn.1986.56.5.1439
[35]
Romo R. and Schultz W., ibid. 63, 592 (1990);
[36]
Schultz W. and Romo R., ibid., p. 607;
[37]
Ljungberg T., Apicella P., Schultz W., ibid. 67, 145 (1992);
[39]
Mirenowicz J. and Schultz W., J. Neurophysiol. 72, 1024 (1994); 10.1152/jn.1994.72.2.1024
[40]
Schultz W. et al., in Models of Information Processing in the Basal Ganglia, Houk J. C., Davis J. L., Beiser D. G., Eds. (MIT Press, Cambridge, MA, 1995), pp. 233–248;
[42]
Recent experiments showed that the simple displacement of the time of reward delivery resulted in dopamine responses. In a situation in which neurons were not driven by a fully predicted drop of juice, activations reappeared when the juice reward occurred 0.5 s earlier or later than predicted. Depressions were observed at the normal time of juice reward only if reward delivery was late [J. R. Hollerman and W. Schultz, Soc. Neurosci. Abstr. 22, 1388 (1996)].
[44]
Bertsekas D. P. and Tsitsiklis J. N., Neurodynamic Programming (Athena Scientific, Belmont, MA, 1996).
[45]
Church R. M., in Contemporary Learning Theories: Instrumental Conditioning Theory and the Impact of Biological Constraints on Learning, Klein S. B., Mowrer R. R., Eds. (Erlbaum, Hillsdale, NJ, 1989), p. 41; J. Gibbon, Learn. Motiv. 22, 3 (1991).
[46]
Grossberg S., Schmajuk N. A., Neural Networks 2, 79 (1989); 10.1016/0893-6080(89)90026-9
[47]
Grossberg S. and Merrill J. W. L., Cognit. Brain Res. 1, 3 (1992). 10.1016/0926-6410(92)90003-a
[48]
Dayan P., Mach. Learn. 8, 341 (1992);
[49]
Dayan P. and Sejnowski T. J., ibid. 14, 295 (1994);
[50]
Jaakkola T., Jordan M. I., Singh S. P., Neural Computation 6, 1185 (1994). 10.1162/neco.1994.6.6.1185

Showing 50 of 81 references

Cited By
7,476
Why things get important: GPCRs in salience processing

Nina K. Blum, Rainer K. Reinscheid · 2026

Biochemical Pharmacology
Reinforcement Learning as Meta-induction

Igor Douven, Gerhard Schurz · 2026

Minds and Machines
Frustrative nonreward: A century of progress

Thomas A. Green, Ellen Leibenluft · 2026

Neuroscience & Biobehavioral Re...
Social stress and affiliative touch: A tale of two affective states

Lito Parapera Papantoniou, Orysia Vityk · 2026

iScience
The neurobiology of overeating

Garret D. Stuber, Valerie M. Schwitzgebel · 2025

Neuron
Striatal Dopamine Contributions to Skilled Motor Learning

Chris D. Phillips, Alexander T. Hodge · 2024

The Journal of Neuroscience
Metrics
Citations
7,476
References
81
Details
Published
Mar 14, 1997
Vol/Issue
275(5306)
Pages
1593-1599
Cite This Article
Wolfram Schultz, Peter Dayan, P. Read Montague (1997). A Neural Substrate of Prediction and Reward. Science, 275(5306), 1593-1599. https://doi.org/10.1126/science.275.5306.1593