Journal of Intelligent Communication

Hybrid Evolutionary Reinforcement Learning for UAV Path Planning: Genetic Programming and Soft Actor Critic Integrations

Received: 09 September 2025
Published: 02 September 2025

Abstract

Unmanned Aerial Vehicle (UAV) path planning in unknown environments remains a significant challenge: Deep Reinforcement Learning (DRL) solutions are often hampered by slow convergence and unstable training dynamics. To address this gap, we introduce a Genetic Programming–seeded Soft Actor–Critic (GP+SAC) approach in which Genetic Programming produces high-quality trajectories that are injected into the SAC replay buffer as a “warm start,” preventing wasteful early exploration. Experiments in three benchmark grid environments demonstrate that GP+SAC converges significantly faster than the FA-DQN baseline, achieving higher episodic returns in fewer episodes under the same reward design. In the large environment, GP+SAC yields a mean path length of 30.55 units, comparable to FA-DQN’s 28.38 units, indicating that the accelerated convergence comes at little cost in path efficiency. Although GP+SAC obtains superior cumulative rewards, its training curves exhibit visible fluctuations, pointing to residual instability in highly constrained environments.
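The warm-start mechanism described above can be illustrated with a minimal sketch: transitions from trajectories produced by a Genetic Programming planner are pushed into the SAC replay buffer before any gradient updates, so early mini-batches draw on high-quality experience rather than random exploration. All names below (`ReplayBuffer`, `seed_with_gp`, the toy grid transitions) are illustrative assumptions, not the paper's actual implementation.

```python
import random
from collections import deque


class ReplayBuffer:
    """FIFO experience buffer holding (state, action, reward, next_state, done)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform sampling, as in standard off-policy SAC training.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)


def seed_with_gp(buffer, gp_trajectories):
    """Warm start: insert every transition of every GP-generated trajectory."""
    for trajectory in gp_trajectories:
        for transition in trajectory:
            buffer.push(transition)


# Toy GP trajectory on a grid; each transition is (s, a, r, s', done).
gp_trajectory = [((0, 0), "right", -1.0, (1, 0), False),
                 ((1, 0), "up",    -1.0, (1, 1), False),
                 ((1, 1), "up",    10.0, (1, 2), True)]

buffer = ReplayBuffer(capacity=10_000)
seed_with_gp(buffer, [gp_trajectory])
batch = buffer.sample(2)  # SAC updates now start from seeded experience
```

In a full agent the seeded buffer would then be mixed with transitions collected by the SAC policy itself, so the influence of the GP demonstrations gradually dilutes as on-policy experience accumulates.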
