Abstract
A reinforcement learning-based approach is proposed for designing multi-impulse rendezvous trajectories under linear relative motion. For relative motion in elliptical orbits, the relative state is propagated directly with the state transition matrix. The rendezvous problem is formulated as a Markov decision process that reflects the fuel consumption, the transfer time, the relative state, and the dynamical model. An actor–critic algorithm is used to train a policy that generates rendezvous maneuvers. Results from numerical optimization (e.g., differential evolution) are adopted as the expert data set to accelerate training. By deploying the policy network, multi-impulse rendezvous trajectories can be obtained on board. Moreover, the proposed approach is also applied to generate feasible solutions with many impulses (e.g., 20 impulses), which can serve as initial values for further optimization. Numerical examples with random initial states show that the proposed method is much faster than the evolutionary algorithm, at the cost of slightly worse performance indices.
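As a rough illustration of how one transition of such a Markov decision process might look, the sketch below applies an impulse and then propagates the relative state with a state transition matrix. It is a simplification under stated assumptions, not the authors' implementation: it uses the Clohessy–Wiltshire matrix for a circular reference orbit in place of the elliptical-orbit matrix the abstract refers to, and the names `cw_stm` and `step`, the state ordering, and the fuel-only reward are all hypothetical.

```python
import math


def cw_stm(n, t):
    """Clohessy-Wiltshire state transition matrix (circular-orbit assumption).

    n is the mean motion of the reference orbit (rad/s), t the coast time (s).
    State order: [x, y, z, vx, vy, vz], with x radial, y along-track,
    z cross-track.
    """
    s, c = math.sin(n * t), math.cos(n * t)
    return [
        [4 - 3 * c, 0.0, 0.0, s / n, 2 * (1 - c) / n, 0.0],
        [6 * (s - n * t), 1.0, 0.0, -2 * (1 - c) / n, (4 * s - 3 * n * t) / n, 0.0],
        [0.0, 0.0, c, 0.0, 0.0, s / n],
        [3 * n * s, 0.0, 0.0, c, 2 * s, 0.0],
        [-6 * n * (1 - c), 0.0, 0.0, -2 * s, 4 * c - 3, 0.0],
        [0.0, 0.0, -n * s, 0.0, 0.0, c],
    ]


def step(state, dv, coast, n):
    """One illustrative transition: apply impulse dv, then coast.

    Returns the next relative state and a reward. The reward here is a
    hypothetical choice that penalizes only the impulse magnitude.
    """
    # The impulse changes velocity instantaneously; position is unchanged.
    after_burn = list(state[:3]) + [v + d for v, d in zip(state[3:], dv)]
    phi = cw_stm(n, coast)
    next_state = [sum(a * b for a, b in zip(row, after_burn)) for row in phi]
    reward = -math.sqrt(sum(d * d for d in dv))
    return next_state, reward
```

A policy would supply `dv` and `coast` at each step. Terminal conditions and any transfer-time or terminal-state terms in the reward are deliberately left out, since the abstract does not specify them.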
Details
Published
Jan 01, 2023
Vol/Issue
3
Cite This Article
Longwei Xu, Gang Zhang, Shi Qiu, et al. (2023). Optimal Multi-impulse Linear Rendezvous via Reinforcement Learning. Space: Science & Technology, 3. https://doi.org/10.34133/space.0047