The policy iteration method is used to solve the problem.
文中应用策略迭代法求解。
The optimal allocation policy was obtained using policy iteration or value iteration.
采用策略迭代或值迭代的办法,可以求解系统的最优库存分配策略。
An appropriate selection of basis functions directly influences the learning performance of a policy iteration method during value function approximation.
该算法先用渐进方法进行多序列比对,然后通过迭代策略,利用上一轮多序列比对结果修正指导树,产生新一轮比对。
在策略迭代强化学习方法的值函数逼近过程中,基函数的合理选择直接影响方法的性能。
Because traditional theoretical methods such as policy iteration and value iteration usually cannot be used to optimize large-scale systems, we rely on simulation methods.
针对传统的理论优化方法如策略迭代、数值迭代不能适用于大规模系统的问题,我们采用仿真方法。
The first iteration of the policy issued too many permits, undermining demand.
政策实施之初发放太多的许可证会暗中损害需求。
Finally, an iterative algorithm for finding an optimal stationary policy is proposed, and a numerical example is provided to illustrate the application of the algorithm.
最后给出一个求解最优平稳策略的迭代算法,并提供一个数值例子以表明该算法的应用。