References
[1] Barto, A.G., Bradtke, S.J. & Singh, S.P. (1991). Real-time learning and control using asynchronous dynamic programming (COINS technical report 91-57). Amherst: University of Massachusetts.
[2] Barto, A.G. & Singh, S.P. (1990). On the computational economics of reinforcement learning. In D.S. Touretzky, J. Elman, T.J. Sejnowski & G.E. Hinton (Eds.), Proceedings of the 1990 Connectionist Models Summer School. San Mateo, CA: Morgan Kaufmann.
[3] Bellman, R.E. & Dreyfus, S.E. (1962). Applied dynamic programming. RAND Corporation. https://doi.org/10.1515/9781400874651
[4] Chapman, D. & Kaelbling, L.P. (1991). Input generalization in delayed reinforcement learning: An algorithm and performance comparisons. Proceedings of the 1991 International Joint Conference on Artificial Intelligence (pp. 726-731).
[5] Kushner, H. & Clark, D. (1978). Stochastic approximation methods for constrained and unconstrained systems. Berlin, Germany: Springer-Verlag. https://doi.org/10.1007/978-1-4684-9352-8
[6] Lin, L. (1992). Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8. https://doi.org/10.1007/bf00992699
[7] Mahadevan & Connell (1991). Automatic programming of behavior-based robots using reinforcement learning. Proceedings of the 1991 National Conference on AI (pp. 768-773).
[8] Ross, S. (1983). Introduction to stochastic dynamic programming. New York: Academic Press.
[9] Sato, M., Abe, K. & Takeda, H. (1988). Learning control of finite Markov chains with explicit trade-off between estimation and control. IEEE Transactions on Systems, Man and Cybernetics, 18, pp. 677-684. https://doi.org/10.1109/21.21595
[10] Sutton, R.S. (1984). Temporal credit assignment in reinforcement learning. PhD Thesis, University of Massachusetts, Amherst, MA.
[11] Sutton, R.S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, pp. 9-44.
[12] Sutton, R.S. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. Proceedings of the Seventh International Conference on Machine Learning. San Mateo, CA: Morgan Kaufmann.
[13] Watkins, C.J.C.H. (1989). Learning from delayed rewards. PhD Thesis, University of Cambridge, England.
[14] Werbos, P.J. (1977). Advanced forecasting methods for global crisis warning and models of intelligence. General Systems Yearbook, 22, pp. 25-38.
Cited By: 7,429
A formal model for multiagent Q-learning on graphs

Jinzhuo Liu, Guangchen Jiang · 2025

Science China Information Sciences
Deep learning in computational mechanics: a review

Leon Herrmann, Stefan Kollmannsberger · 2024

Computational Mechanics
Metrics
Citations: 7,429
References: 14
Details
Published: May 1, 1992
Vol/Issue: 8(3-4)
Pages: 279-292
Cite This Article
Christopher J. C. H. Watkins, Peter Dayan (1992). Q-learning. Machine Learning, 8(3-4), 279-292. https://doi.org/10.1007/bf00992698
Related

You May Also Like

Random Forests

Leo Breiman · 2001

120,932 citations

Support-Vector Networks

Corinna Cortes, Vladimir Vapnik · 1995

32,049 citations

Support-vector networks

Corinna Cortes, Vladimir Vapnik · 1995

30,034 citations

Bagging predictors

Leo Breiman · 1996

16,268 citations

Induction of decision trees

J. R. Quinlan · 1986

9,120 citations