This paper presents a new Q-learning algorithm derived from a method for solving the optimal cost function.
To improve the learning speed and convergence rate of Q-learning, a typical reinforcement learning method in agent systems, and to let the learning process make full use of environmental information, this paper proposes a Q-learning algorithm based on experience knowledge.
Traditional Q-learning faces a trade-off between slow convergence and a tendency toward local convergence, so an improved Q-learning algorithm is proposed.
First, this dissertation proposes a fuzzy Q-learning algorithm that maps continuous state spaces to continuous action spaces through a fuzzy inference system and then learns a complete rule base.
This paper proposes a reinforcement learning model based on the ant colony algorithm, that is, one that combines the ant colony algorithm with Q-learning.
Q-learning is currently the mainstream reinforcement learning algorithm, but it has some problems of its own.
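For context, the textbook tabular Q-learning update that these sentences refer to can be sketched as follows. This is a minimal illustration, not taken from any of the cited papers; the function name `q_update`, the state and action labels, and the values of `alpha` and `gamma` are all assumptions chosen for the example.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-learning update in place and return the new value."""
    # Greedy estimate of the value of the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # One-step temporal-difference target.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action example; the Q-table starts at zero everywhere.
q = defaultdict(float)
actions = ["left", "right"]
new_value = q_update(q, "s0", "right", 1.0, "s1", actions)
```

Because each update uses only a single observed transition, value information spreads slowly through the table, which is one common reason for the slow convergence mentioned in several of these sentences.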
To address the slow update speed of Q-learning in task scheduling, a multi-step Q-learning scheduling algorithm is proposed that updates the value function using information from multiple steps.
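One generic way to update a value function from multi-step information is an n-step return, sketched below. This is an assumption for illustration only: the sentence does not specify the scheduling algorithm's actual update rule, and the names `n_step_target` and `multi_step_update` are hypothetical.

```python
def n_step_target(rewards, bootstrap_value, gamma=0.9):
    """Discounted sum of the observed rewards plus the discounted bootstrap value."""
    target = bootstrap_value
    # Fold rewards from the last step back to the first.
    for r in reversed(rewards):
        target = r + gamma * target
    return target


def multi_step_update(q_value, rewards, bootstrap_value, alpha=0.1, gamma=0.9):
    """Move q_value a fraction alpha toward the n-step target."""
    return q_value + alpha * (n_step_target(rewards, bootstrap_value, gamma) - q_value)


# Hypothetical two-step example with zero bootstrap value.
target = n_step_target([1.0, 1.0], 0.0, gamma=0.5)
updated = multi_step_update(0.0, [1.0, 1.0], 0.0, alpha=0.5, gamma=0.5)
```

Here `bootstrap_value` stands for the greedy Q-value estimate at the state reached after the last observed reward; with a single reward the target reduces to the ordinary one-step Q-learning target.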
The paper analyzes two cases of reader collision, presents an anti-collision algorithm based on Q-learning, and tests it in simulation.
The Q-learning reinforcement learning algorithm is applied to local path planning for mobile robots, solving the local path planning problem for a mobile robot in a complex environment.
A single behavior object contains the algorithm for optimizing the demonstrated group of acts; this optimization algorithm is based on Q-learning, a reinforcement learning method, and an artificial neural network.
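A new model based on Markov decision processes is proposed, and a corresponding new algorithm for dynamic grid service selection is implemented using the adaptive capability of improved Q-learning.
For service compositions that satisfy a Markov decision process, a grid service description model that supports incomplete information is proposed; it describes the entire life cycle of a service composition.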
Because the uncertainty of network intrusions leads to a high false alarm rate in anomaly detection systems, this paper presents an Anomaly Detection Model Based on Q-Learning Algorithm (QLADM). The model combines Q-learning, behavior intention tracking, and intrusion prediction, enabling detection of and response to unknown intrusions.