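The document frequency (DF) of a term is the number of documents in the training corpus that contain that term. The document frequency method selects features with a relatively high document frequency; its aim is to discard features that occur too rarely in the training set and to keep ...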
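The definition above can be illustrated with a minimal sketch. It assumes documents are already tokenized into lists of strings; the function names, the toy corpus, and the `min_df` threshold are hypothetical choices for illustration and do not come from the quoted source.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        # set() ensures a term is counted once per document,
        # no matter how often it repeats inside that document.
        df.update(set(tokens))
    return df


def select_by_df(docs, min_df=2):
    """Keep only terms whose document frequency is at least min_df (assumed threshold)."""
    df = document_frequency(docs)
    return {term for term, count in df.items() if count >= min_df}


# Toy corpus of three tokenized documents (illustrative only).
corpus = [
    ["cat", "sat", "mat", "cat"],
    ["cat", "dog"],
    ["dog", "barked"],
]

df = document_frequency(corpus)
selected = select_by_df(corpus, min_df=2)
```

Here "cat" appears twice in the first document but still contributes only one to its document frequency, which is the property that separates DF from a raw term count.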
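... is the number of documents in the document set D that contain the feature term, called the document frequency; its reciprocal is called the inverse document frequency. Clearly, if a feature term plays an important role in representing document Di, it must have a high term frequency and a low document frequency (that is, a high inverse document frequency), so its weight ...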
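The relationship described above can be sketched as follows. The snippet is cut off before it states the weight formula, so the product of term frequency and reciprocal document frequency used here is an assumption, not the source's formula; many systems use a logarithmic inverse document frequency instead. All names and the toy corpus are hypothetical.

```python
from collections import Counter


def inverse_document_frequency(docs):
    """Return 1 / df for every term, following the reciprocal definition quoted above."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return {term: 1.0 / count for term, count in df.items()}


def term_weights(doc, idf):
    """Assumed weighting: term frequency in the document times inverse document frequency."""
    tf = Counter(doc)
    return {term: tf[term] * idf[term] for term in tf}


# Toy corpus of three tokenized documents (illustrative only).
corpus = [
    ["cat", "cat", "cat", "sat"],
    ["cat", "dog"],
    ["dog", "barked"],
]

idf = inverse_document_frequency(corpus)
weights = term_weights(corpus[0], idf)
```

A term that is frequent in one document and rare across the collection gets the largest weight under this assumed scheme, which matches the qualitative claim in the snippet.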
In the traditional document frequency (DF) method, the number of documents in a category that contain a term is the only information used for feature selection; the number of times the term appears within each document (its term frequency, TF) is not considered.
Source: "Feature Selection Method Based on Document Frequency" (基于文档频率的特征选择方法)
However, our analysis shows that because the DF method uses document frequency alone to measure a feature's discriminative power, it has two problems.
Still, the mathematical models used to rank results are usually some variation of the common term-frequency/inverse-document-frequency model, which is well-mapped territory.
We also looked at the impact related to frequency of searching, full-text indexing, and whether mail documents are filed or kept in the Inbox.