To address both the "curse of dimensionality" in reinforcement learning and the problem of slow algorithm convergence, a reward optimization method for hierarchical reinforcement learning based on action sub-rewards was proposed.
Source: Research on Solutions to the Curse of Dimensionality in Reinforcement Learning (强化学习维数灾问题解决方法研究)
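To make the "curse of dimensionality" concrete: a flat tabular method must enumerate the joint state space. That space is the Cartesian product of all state variables, so the table grows exponentially with the number of variables. The following is a minimal illustration only. The sizes (10 values per variable, 4 actions) and the helper name `q_table_entries` are hypothetical. It does not implement the action sub-reward method, whose details are not given here.

```python
# Illustration only: size of a flat tabular Q-function as state variables are added.
# Assumption (hypothetical): every discrete state variable takes the same number of
# values. This is NOT the hierarchical action sub-reward method itself.


def q_table_entries(num_vars: int, values_per_var: int, num_actions: int) -> int:
    """Number of (state, action) entries a flat tabular Q-function needs."""
    # The joint state space is the Cartesian product of all variables,
    # so its size is values_per_var raised to the number of variables.
    num_states = values_per_var ** num_vars
    return num_states * num_actions


if __name__ == "__main__":
    # Each additional variable multiplies the table size by values_per_var.
    for d in (2, 5, 10, 20):
        print(d, q_table_entries(d, values_per_var=10, num_actions=4))
```

Hierarchical reinforcement learning is commonly motivated by this growth: splitting a task into subtasks lets each subtask work with a smaller portion of the state space instead of the full product.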