通过利用基于词频的权值计算,同时改进传统文本相似度计算概率模型,改进SVM算法实现邮件过滤系统。
By using term-frequency-based weighting and refining the traditional probabilistic model for text similarity, we improve the SVM algorithm to implement an email filtering system.
本文在大规模语料的基础上,利用语言模型中稀疏事件的概率估计方法对汉语的熵进行计算,并讨论了语料规模等因素对熵的影响。
Based on a large-scale corpus, this paper computes the entropy of Chinese using probability estimation methods for sparse events in language models, and discusses how factors such as corpus size affect the entropy.
基于概率的算法只考虑了训练集语料的概率模型,对于不同领域的文本的处理不尽如人意。
Probability-based algorithms consider only the probabilistic model of the training corpus, so they handle texts from different domains poorly.