不幸的是,字词边界一般都被忽视掉了,大部分人都没有在意他的现实意义。
Unfortunately, boundaries are so often skimmed over that many do not recognize their real significance. For example, let’s say you want to match the word “import”
如果把“engineering”拿去匹配,正则引擎会先匹配到“engineer”,但接下来就遇到了字词边界,b,所以匹配不成功。
If the regex engine is given the word 'engineering', it will correctly match 'engineer'. The next word boundary, \ b, will not match.
该方法对分词后的语料中产生的连续单字词进行关联范数估计,利用右边缘扩展的方法判断词的边界。
This new method is based on association norm estimation, and searches for the word boundaries by right boundary expanding.
应用推荐