This process is known more formally as parsing (or, to be more precise, tokenizing, lexing, and parsing).
At the level of tokenizing, we are not yet interested in how the tokens are structured, just what they are.
Tokenizing/Lexing: breaking up a string of characters into meaningful (to the language) chunks, called tokens.
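To make that first step concrete, here is a minimal tokenizer sketch in Python. The token categories, the regular expressions, and the sample input "var a = 2" are illustrative assumptions for demonstration only, not any particular language's lexical grammar.

```python
import re

# Illustrative token categories and patterns (assumptions, not a real
# language's lexical grammar).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("NAME",   r"[A-Za-z_]\w*"),
    ("OP",     r"[=+\-*/]"),
    ("SKIP",   r"\s+"),        # whitespace is consumed but not emitted
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Break a string of characters into (kind, text) chunks called tokens."""
    for match in TOKEN_RE.finditer(source):
        kind = match.lastgroup
        if kind != "SKIP":
            yield kind, match.group()

print(list(tokenize("var a = 2")))
# [('NAME', 'var'), ('NAME', 'a'), ('OP', '='), ('NUMBER', '2')]
```

Note that the tokenizer only labels each chunk; deciding how the tokens relate to one another is left to the later parsing stage.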
The filters do any post-tokenizing work on the tokens (typically dropping out punctuation and commonly occurring words like "the", "an", "a", etc.).
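A rough sketch of that kind of post-tokenizing filtering, assuming the tokens arrive as plain strings; the stop-word list and the helper name filter_tokens are illustrative assumptions, not the API of any real analyzer.

```python
import string

# Illustrative stop-word list; real analyzers ship larger,
# language-specific lists.
STOP_WORDS = {"the", "an", "a", "and", "of", "to"}

def filter_tokens(tokens):
    """Strip surrounding punctuation, lowercase each token, and drop
    commonly occurring words so only meaningful terms remain."""
    for token in tokens:
        term = token.strip(string.punctuation).lower()
        if term and term not in STOP_WORDS:
            yield term

raw = ["The", "filters", "drop", "the", "punctuation,", "for", "example."]
print(list(filter_tokens(raw)))
# ['filters', 'drop', 'punctuation,'.strip(...) result: 'punctuation', 'for', 'example']
```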