Bandit Based Monte-Carlo Planning

Levente Kocsis; Csaba Szepesvári

doi:10.1007/11871842_29

book chapter Jan 01, 2006

Lecture Notes in Computer Science pp. 282-293 · Springer International Publishing

References

14

[1]

Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite time analysis of the multiarmed bandit problem. Machine Learning 47(2-3), 235–256 (2002) 10.1023/a:1013689704352

[2]

Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The nonstochastic multiarmed bandit problem. SIAM Journal on Computing 32, 48–77 (2002) 10.1137/s0097539701398375

[3]

Barto, A.G., Bradtke, S.J., Singh, S.P.: Real-time learning and control using asynchronous dynamic programming. Technical report 91-57, Computer Science Department, University of Massachusetts (1991)

[4]

Billings, D., Davidson, A., Schaeffer, J., Szafron, D.: The challenge of poker. Artificial Intelligence 134, 201–240 (2002) 10.1016/s0004-3702(01)00130-8

[5]

Bouzy, B., Helmstetter, B.: Monte Carlo Go developments. In: van den Herik, H.J., Iida, H., Heinz, E.A. (eds.) Advances in Computer Games 10, pp. 159–174 (2004) 10.1007/978-0-387-35706-5_11

[6]

Chang, H.S., Fu, M., Hu, J., Marcus, S.I.: An adaptive sampling algorithm for solving Markov decision processes. Operations Research 53(1), 126–139 (2005) 10.1287/opre.1040.0145

[7]

Chung, M., Buro, M., Schaeffer, J.: Monte Carlo planning in RTS games. In: CIG 2005, Colchester, UK (2005)

[8]

Kearns, M., Mansour, Y., Ng, A.Y.: A sparse sampling algorithm for near-optimal planning in large Markovian decision processes. In: Proceedings of IJCAI 1999, pp. 1324–1331 (1999)

[9]

Lai, T.L., Robbins, H.: Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics 6, 4–22 (1985) 10.1016/0196-8858(85)90002-8

[10]

Péret, L., Garcia, F.: On-line search for solving Markov decision processes via heuristic sampling. In: de Mántaras, R.L., Saitta, L. (eds.) ECAI, pp. 530–534 (2004)

[11]

Sheppard, B.: World-championship-caliber Scrabble. Artificial Intelligence 134(1–2), 241–275 (2002) 10.1016/s0004-3702(01)00166-7

[12]

Smith, S.J.J., Nau, D.S.: An analysis of forward pruning. In: AAAI, pp. 1386–1391 (1994)

[13]

Tesauro, G., Galperin, G.R.: On-line policy improvement using Monte-Carlo search. In: Mozer, M.C., Jordan, M.I., Petsche, T. (eds.) NIPS 9, pp. 1068–1074 (1997)

[14]

Vanderbei, R.: Optimal sailing strategies, statistics and operations research program. University of Princeton (1996), http://www.sor.princeton.edu/~rvdb/sail/sail.html

Cited By

1,242

A novel tree search-based method for robust data-driven discovery of governing equations in complex network dynamics

Bingchen Dong, Zhenglin Liang · 2026

Nonlinear Dynamics

MCMC-Escape: Multi-Capacity Ordered Escape Routing Based on Monte-Carlo Tree Search

Jianxuan Yu, Zhenyi Gao · 2026

ACM Transactions on Design Automati...

Simulation Optimization of Spatiotemporal Dynamics in 3D Geometries

Bing Yao, Fabio Leonelli · 2025

IEEE Transactions on Automation Sci...

POMDP-Driven Cognitive Massive MIMO Radar: Joint Target Detection-Tracking in Unknown Disturbances

Imad Bouhou, Stefano Fortunati · 2025

IEEE Transactions on Radar Systems

Generative AI for designing and validating easily synthesizable and structurally novel antibiotics

Kyle Swanson, Gary Liu · 2024

Nature Machine Intelligence

Hybrid Parameter Search and Dynamic Model Selection for Mixed-Variable Bayesian Optimization

Hengrui Luo, Younghyun Cho · 2024

Journal of Computational and Graphi...

Automated machine learning: past, present and future

Mitra Baratchi, Can Wang · 2024

Artificial Intelligence Review

AdvSQLi: Generating Adversarial SQL Injections Against Real-World WAF-as-a-Service

Zhenqing Qu, Xiang Ling · 2024

IEEE Transactions on Information Fo...

Predictive chemistry: machine learning for reaction deployment, reaction development, and reaction discovery

Zhengkai Tu, Thijs Stuyver · 2023

Chemical Science

A UCB-Based Tree Search Approach to Joint Verification-Correction Strategy for Large-Scale Systems

Peng Xu, Xinwei Deng · 2023

IEEE Transactions on Systems, Man,...

Timing-Aware Qubit Mapping and Gate Scheduling Adapted to Neutral Atom Quantum Computing

Yongshang Li, Yu Zhang · 2023

IEEE Transactions on Computer-Aided...

Controlling chaotic itinerancy in laser dynamics for reinforcement learning

Ryugo Iwami, Takatomo Mihana · 2022

Science Advances

Partially Observable Markov Decision Processes and Robotics

Hanna Kurniawati · 2022

Annual Review of Control, Robotics,...

Monte-Carlo Robot Path Planning

Tuan Dam, Georgia Chalvatzaki · 2022

IEEE Robotics and Automation Letter...

Mastering Atari, Go, chess and shogi by planning with a learned model

Julian Schrittwieser, Ioannis Antonoglou · 2020

Nature

Machine Learning Guidance for Connection Tableaux

Michael Färber, Cezary Kaliszyk · 2020

Journal of Automated Reasoning

Cooperative Driving at Unsignalized Intersections Using Tree Search

Huile Xu, Yu-Zhong Zhang · 2020

IEEE Transactions on Intelligent Tr...

DeepStack: Expert-level artificial intelligence in heads-up no-limit poker

Matej Moravčík, Martin Schmid · 2017

Science

Single Photon in Hierarchical Architecture for Physical Decision Making: Photon Intelligence

Makoto Naruse, Martin Berthel · 2016

ACS Photonics

Bayesian Reinforcement Learning: A Survey

Mohammad Ghavamzadeh, Shie Mannor · 2015

Foundations and Trends® in Machine...

Details

Published: Jan 01, 2006
Pages: 282-293

Cite This Article

Levente Kocsis, Csaba Szepesvári (2006). Bandit Based Monte-Carlo Planning. Lecture Notes in Computer Science, 282-293. https://doi.org/10.1007/11871842_29
