Thus, an 8-bit single-byte character set (SBCS) is sufficient to represent the ASCII character set, as well as the character sets of many European languages.
因此,8位的单字节字符集(SBCS)足以表示ASCII字符集以及许多欧洲语言的字符集。
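The following minimal Python sketch shows the one-byte-per-character property in practice. It uses the standard `latin-1` codec (ISO-8859-1), which is one example of an 8-bit SBCS; the sample strings are arbitrary assumptions.

```python
# ISO-8859-1 (Latin-1) is an 8-bit single-byte character set (SBCS).
samples = ["Hello", "café", "Straße", "naïve"]

for text in samples:
    encoded = text.encode("latin-1")
    # In an SBCS every character maps to exactly one byte.
    assert len(encoded) == len(text)
    print(f"{text!a}: {len(text)} characters, {len(encoded)} bytes")

# 8 bits leave room for only 256 values, so other characters cannot be encoded.
try:
    "中".encode("latin-1")
except UnicodeEncodeError as exc:
    print("not representable in Latin-1:", exc.reason)
```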
It first explained key concepts, such as character and byte semantics with respect to string data.
首先介绍了一些关键概念,如针对字符串数据的字符语义和字节语义。
An individual character is usually encoded using a byte or more, depending upon the encoding used.
单个字符通常使用一个字节或多个字节进行编码,具体情况取决于使用的编码方式。
For documents that contain Unicode characters beyond the ASCII range, the parser must read and decode a multi-byte sequence for each such character.
对于包含ASCII以外的Unicode字符的文档,解析器必须为每个字符读取和转换多字节序列。
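The following minimal Python sketch shows the first step such a parser performs: reading the lead byte to decide how many bytes belong to the current character. The helper names `utf8_sequence_length` and `split_characters` are hypothetical. The sketch deliberately skips full validation (continuation-byte checks, overlong forms, surrogates), which a real parser must perform.

```python
def utf8_sequence_length(lead: int) -> int:
    """Hypothetical helper: bytes in a UTF-8 sequence, judged by its lead byte."""
    if lead < 0x80:            # 0xxxxxxx: ASCII, one byte
        return 1
    if lead >> 5 == 0b110:     # 110xxxxx: two-byte sequence
        return 2
    if lead >> 4 == 0b1110:    # 1110xxxx: three-byte sequence
        return 3
    if lead >> 3 == 0b11110:   # 11110xxx: four-byte sequence
        return 4
    # 10xxxxxx (a continuation byte) or 0xF8..0xFF cannot start a character.
    raise ValueError(f"invalid lead byte 0x{lead:02x}")


def split_characters(data: bytes) -> list[bytes]:
    """Hypothetical helper: split UTF-8 bytes into one sequence per character."""
    chunks = []
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        if i + n > len(data):
            raise ValueError("truncated sequence at end of input")
        chunks.append(data[i:i + n])
        i += n
    return chunks


print(split_characters("aé中😀".encode("utf-8")))
# [b'a', b'\xc3\xa9', b'\xe4\xb8\xad', b'\xf0\x9f\x98\x80']
```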
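The characters numbered 0 to 0x7f (127) encode to themselves as a single byte, and larger character values are encoded into 2 to 4 bytes (the original UTF-8 design allowed up to 6 bytes, but RFC 3629 restricts UTF-8 to 4).
编号0到0x7f(127)的字符被编码为与自身值相同的单个字节,值更大的字符则被编码为2到4个字节(最初的UTF-8设计允许最多6个字节,但RFC 3629将其限制为4个字节)。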
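As a quick check of these ranges, this Python sketch encodes one arbitrarily chosen character from each UTF-8 length class with the built-in codec and prints the resulting bytes.

```python
samples = ["A", "é", "中", "😀"]   # U+0041, U+00E9, U+4E2D, U+1F600
lengths = []
for ch in samples:
    encoded = ch.encode("utf-8")
    lengths.append(len(encoded))
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ')} ({len(encoded)} byte(s))")

# Even the highest code point, U+10FFFF, needs only 4 bytes under RFC 3629.
assert len(chr(0x10FFFF).encode("utf-8")) == 4
```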
The locale setting causes the %ls format specifier in printf to call the wcsrtombs function to convert the wide-character argument string into the locale-dependent multi-byte encoding.
语言环境设置会使printf中的%ls格式说明符调用wcsrtombs函数,以便将宽字符参数字符串转换成依赖语言环境的多字节编码。
Figure 6 shows a search for the character starting after the third byte, which would have found the second occurrence of the character "a" if all characters were single-byte.
在图6中,搜索第三个字节后的字符,如果所有的字符都是单字节字符的话应该搜索到第二次出现的“a”字符。
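Figure 6 is not reproduced here, so this Python sketch uses a hypothetical string whose first character, "中", occupies 3 bytes in UTF-8. It contrasts a search that starts after the third byte with one that starts after the third character.

```python
text = "中abca"              # hypothetical sample: first character is 3 bytes
data = text.encode("utf-8")  # b'\xe4\xb8\xadabca'

# Byte semantics: start searching at byte offset 3 (after the third byte).
byte_hit = data.find(b"a", 3)
# Character semantics: start searching after the third character.
char_hit = text.find("a", 3)

print(byte_hit)   # 3: the first "a", because "中" used bytes 0-2
print(char_hit)   # 4: the second "a"
```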
A byte of the form 10xxxxxx is a continuation byte; its six xxxxxx bit positions are filled with bits of the character's code point in binary representation.
形如10xxxxxx的字节是扩展字节,它的xxxxxx位位置由以二进制表示的字符代码点的位填充。
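To make the bit layout concrete, here is a hypothetical hand-written encoder, `encode_utf8`, that places the code point bits into the lead byte and the 10xxxxxx continuation bytes. It is a teaching sketch (for example, it does not reject surrogate code points), so it is checked against Python's built-in codec rather than meant to replace it.

```python
def encode_utf8(code_point: int) -> bytes:
    """Hypothetical encoder showing where the code point bits go."""
    if code_point < 0x80:
        return bytes([code_point])                        # 0xxxxxxx
    if code_point < 0x800:
        return bytes([0xC0 | (code_point >> 6),            # 110xxxxx
                      0x80 | (code_point & 0x3F)])         # 10xxxxxx
    if code_point < 0x10000:
        return bytes([0xE0 | (code_point >> 12),           # 1110xxxx
                      0x80 | ((code_point >> 6) & 0x3F),   # 10xxxxxx
                      0x80 | (code_point & 0x3F)])         # 10xxxxxx
    if code_point <= 0x10FFFF:
        return bytes([0xF0 | (code_point >> 18),           # 11110xxx
                      0x80 | ((code_point >> 12) & 0x3F),  # 10xxxxxx
                      0x80 | ((code_point >> 6) & 0x3F),   # 10xxxxxx
                      0x80 | (code_point & 0x3F)])         # 10xxxxxx
    raise ValueError("code point out of range")


for ch in "aé中😀":
    ours = encode_utf8(ord(ch))
    assert ours == ch.encode("utf-8")
    # Every byte after the lead byte matches the 10xxxxxx pattern.
    assert all((b & 0xC0) == 0x80 for b in ours[1:])
```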
Character vs. byte semantics of string data.
字符串数据的字符语义与字节语义对比。
One byte stores one character (e.g., a letter or a digit).
1字节存储1个字符(如一个字母或一个数字)。
The problem of byte positions being returned where character positions are expected can be solved by using character-based functions.
使用基于字符的函数可以解决将字节位置返回为字符位置的问题。
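A minimal Python sketch of the fix, assuming UTF-8 input: decode the bytes first, then search with a character-based function so that the returned index counts characters. The sample text is arbitrary.

```python
text = "naïve café"                 # "ï" and "é" are 2 bytes each in UTF-8
data = text.encode("utf-8")

# Byte-based search: the index counts bytes, inflated by the 2-byte "ï".
byte_pos = data.find("café".encode("utf-8"))
# Character-based search on the decoded string: the index counts characters.
char_pos = data.decode("utf-8").find("café")

print(byte_pos, char_pos)   # 7 6

# Mapping a byte offset to a character offset requires decoding the prefix.
assert len(data[:byte_pos].decode("utf-8")) == char_pos
```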
The program can no longer assume that one byte is one character, so all data has to be decoded from UTF-8 and then encoded back to UTF-8.
程序不能再假设一个字节就是一个字符,因此所有数据都必须先从UTF-8解码,再重新编码为UTF-8。
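The decode-process-encode cycle can be sketched in Python as follows. `uppercase_utf8` is a hypothetical processing step, chosen because uppercasing raw bytes only affects ASCII letters.

```python
def uppercase_utf8(raw: bytes) -> bytes:
    """Hypothetical processing step: decode, work on characters, re-encode."""
    text = raw.decode("utf-8")      # UTF-8 bytes -> characters
    result = text.upper()           # character-level operation
    return result.encode("utf-8")   # characters -> UTF-8 bytes


raw = "café".encode("utf-8")
print(raw.upper())           # b'CAF\xc3\xa9': bytes.upper() only touches ASCII
print(uppercase_utf8(raw))   # b'CAF\xc3\x89': "é" correctly became "É"
```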
This encoding scheme makes it possible to encode an ASCII character with one byte, and a non-ASCII character with multiple (up to 4) bytes.
这个编码方案可以用一个字节对ASCII字符进行编码,用多个字节(最多4 字节)对非 ASCII 字符进行编码。
But because the first character is a multi-byte character, the result is 3, which is the position of the first occurrence of the search string.
但是由于第一个字符是多字节字符,因此得到结果3,它是搜索字符串的第一次出现的位置。
However, in a multi-byte encoding, the length of a character in bytes varies with the encoding used, and each character can be one or more bytes long.
但是对于多字节编码,字符的字节长度随使用编码模式的不同而不同,每个字符的长度可能是一个字节或多个字节。
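The following Python sketch, using an arbitrary sample word, shows both points: the byte length of the same string differs by encoding, and within UTF-8 individual characters take one or more bytes.

```python
text = "Grüße"   # 5 characters
encodings = ["latin-1", "utf-8", "utf-16-le", "utf-32-le"]

byte_lengths = {enc: len(text.encode(enc)) for enc in encodings}
for enc, n in byte_lengths.items():
    print(f"{enc:>10}: {len(text)} characters, {n} bytes")

# Per-character byte lengths in UTF-8: "ü" and "ß" take 2 bytes each.
per_char = [len(ch.encode("utf-8")) for ch in text]
print(per_char)   # [1, 1, 2, 2, 1]
```

The Latin-1 row is the single-byte case, where the length in characters and the length in bytes coincide.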
Because the first character is multi-byte, the operation splits that character and produces corrupted output.
由于第一个字符是多字节的,因此会导致字符分解和错误输出。
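A minimal Python illustration, assuming UTF-8 data whose first character is the 2-byte "é": cutting by bytes splits the character, while cutting by characters does not.

```python
data = "éclair".encode("utf-8")    # "é" is b'\xc3\xa9'

# Byte-based truncation keeps only b'\xc3', half of the character.
# (A strict decode would raise UnicodeDecodeError; "replace" shows the damage.)
by_bytes = data[:1].decode("utf-8", errors="replace")
# Character-based truncation keeps the whole character.
by_chars = data.decode("utf-8")[:1]

print(ascii(by_bytes))   # '\ufffd' (the replacement character)
print(ascii(by_chars))   # '\xe9' ("é")
```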
Suppose you have a character whose UTF-8 encoding is 3 bytes long, and a string that contains only the first two bytes of that encoding.
假设您有一个UTF-8编码长度为3字节的字符,而字符串只包含该编码的前两个字节。
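Here is a Python sketch of that situation, using "€" (3 bytes in UTF-8) as the example character. A strict one-shot decode rejects the two-byte fragment, whereas an incremental decoder from the standard `codecs` module buffers it until the final byte arrives.

```python
import codecs

euro = "€".encode("utf-8")   # b'\xe2\x82\xac', a 3-byte character
partial = euro[:2]           # only the first two bytes

# A one-shot strict decode rejects the incomplete sequence.
try:
    partial.decode("utf-8")
except UnicodeDecodeError as exc:
    print("incomplete:", exc.reason)   # unexpected end of data

# An incremental decoder holds the partial bytes until the rest arrives.
decoder = codecs.getincrementaldecoder("utf-8")()
first = decoder.decode(partial)     # '' (no complete character yet)
second = decoder.decode(euro[2:])   # '€'
```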
English text uses only single-byte characters (such as ASCII), so it cannot contain partial characters.
英文文本只使用单字节字符(如ASCII),因此不会包含部分字符。
Standard ASCII assigns each character used in English a numeric value from 0 to 127 (decimal), so every character fits in a single byte.
标准ASCII为英语中使用的每个字符分配一个0到127(十进制)之间的数值,因此每个字符都能用单个字节表示。
Recognizing a character as a single unit, as opposed to a sequence of bytes, is a requirement for string manipulation involving multi-byte characters.
将字符看作一个单元而不是一个字节序列,这是进行多字节字符的字符串操作的必要条件。
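To illustrate why a character must be handled as a unit, this Python sketch reverses a hypothetical string containing the 2-byte "ñ", once byte by byte and once character by character. Python strings index by code point, which matches "character" here but not for every script (combining sequences, for instance, span several code points).

```python
text = "añb"                  # hypothetical sample with a 2-byte "ñ"
data = text.encode("utf-8")   # b'a\xc3\xb1b'

# Reversing bytes also reverses the bytes inside the multi-byte sequence.
by_bytes = data[::-1].decode("utf-8", errors="replace")
# Reversing characters keeps each multi-byte sequence intact.
by_chars = text[::-1]

print(ascii(by_bytes))   # contains '\ufffd' replacement characters
print(ascii(by_chars))   # 'b\xf1a', i.e. "bña"
```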
As a result, double-byte character input/output, as used by the Japanese, Korean, and Chinese character sets, is not supported.
这导致无法支持日文、韩文和中文字符集所使用的双字节字符输入/输出。
A final advantage of some (though not all) native XML databases is worth mentioning: they can retrieve the original, unparsed document character for character, or even byte for byte.
一些(不是全部)原生XML数据库带来的最后一个优点也值得一提,即能够逐字符甚至逐字节地检索原始的未解析文档。
Therefore, when writing applications that involve multi-byte character data, it is important to understand what constitutes a character.
因此,理解字符组成对编写应用程序处理多字节字符数据非常重要。
If you use a nullable VARCHAR column and only ASCII characters are involved (1 byte per character in UTF-8), the maximum character length that can be indexed is 1021 characters.
如果使用可空的VARCHAR列并且只涉及ASCII字符(在UTF-8中每个字符占1字节),那么可以建立索引的最大字符长度是1021个字符。
The UTF-8 encoding is easier to parse and manipulate than any other multi-byte encoding format.
UTF-8编码比任何其他多字节编码格式都更易于解析和操作。
In a single-byte character encoding scheme, a single byte constitutes a character, so the length of a single-byte string in characters is the same as its length in bytes.
对于单字节字符编码模式,一个字节组成一个字符,单字节字符串的长度与字符串的字节长度相同。
However, for multi-byte character encodings such as UTF-8, these numeric values do not follow character semantics.
但是对于多字节字符编码(如UTF-8),这些数值并不符合字符语义。
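A short Python sketch of this point: for an ASCII character, the numeric code value and its single stored byte coincide, but for a character such as "é" encoded in UTF-8, the stored byte values differ from the character's code point.

```python
# For all 128 ASCII characters, UTF-8 stores the code value itself in one byte.
assert all(chr(i).encode("utf-8") == bytes([i]) for i in range(128))

values = {ch: (ord(ch), list(ch.encode("utf-8"))) for ch in ["A", "é"]}
for ch, (code_point, stored) in values.items():
    print(f"U+{code_point:04X}: code value {code_point}, UTF-8 bytes {stored}")
# U+0041: code value 65, UTF-8 bytes [65]
# U+00E9: code value 233, UTF-8 bytes [195, 169]
```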
As mentioned above, ratified standards provide for multi-byte character storage and portability; as yet, though, there are no standards for input or rendering.
正如上面介绍的一样,有一些广为认可的标准为多字节存储和可移植性提供了一些便利;然而,现在还没有为输入和显示制定标准。
Unfortunately, there is no EBCDIC encoder by default, so we'll convert the value to a UTF-16LE byte array (for characters in the range U+0000 to U+00FF, this simply adds a "0" byte as the second byte of each character).
遗憾的是,缺省情况下没有EBCDIC编码器,所以我们将把值转换为UTF-16LE字节数组(对于U+0000到U+00FF范围内的字符,这只是为每个字符添加一个“0”字节作为第二个字节)。
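The byte layout described here can be checked with a short Python sketch. The original text does not name its runtime, so this only illustrates what the UTF-16LE conversion produces for a sample value.

```python
value = "ABC"
utf16 = value.encode("utf-16-le")
print(utf16)   # b'A\x00B\x00C\x00'

# For U+0000..U+00FF, UTF-16LE is each Latin-1 byte followed by a zero byte.
interleaved = bytes(b for byte in value.encode("latin-1") for b in (byte, 0))
assert utf16 == interleaved

# Outside that range the second byte is not zero: U+20AC is stored as AC 20.
print("€".encode("utf-16-le").hex())   # ac20
```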