Chinese character encoding

In computing, Chinese character encodings are used to represent text written in the CJK languages (Chinese, Japanese, and Korean) and, rarely, obsolete Vietnamese, all of which use Chinese characters. Several general-purpose character encodings accommodate Chinese characters, and some were developed specifically for Chinese.

In addition to Unicode (with its set of CJK Unified Ideographs), local encoding systems exist. The two primary 'legacy' local encoding systems are the Chinese Guobiao (or GB, 'national standard') system, used in Mainland China and Singapore, and the (mainly) Taiwanese Big5 system, used in Taiwan, Hong Kong and Macau. Guobiao is usually displayed using simplified characters and Big5 using traditional characters. There is, however, no mandated connection between the encoding system and the font used to display the characters; font and encoding are usually tied together for practical reasons.

The choice of encoding can also have political implications, as GB is the official standard of the People's Republic of China and Big5 is a de facto standard of Taiwan. In contrast to the situation with Japanese, there has been relatively little overt opposition to Unicode, which solves many of the issues involved with GB and Big5. Unicode is widely regarded as politically neutral, has good support for both simplified and traditional characters, and can be easily converted to and from GB and Big5. Furthermore, Unicode is not limited to Chinese: it can also represent text in many other scripts.
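The conversion between Unicode and the legacy encodings can be illustrated with a minimal sketch. It assumes the codecs bundled with CPython under the names "gb2312", "gbk" and "big5"; the helper names round_trip and can_encode are hypothetical and chosen only for this illustration.

```python
# Round-trip Unicode strings through the legacy GB and Big5 codecs
# that ship with CPython. Helper names are illustrative only.

def round_trip(text: str, encoding: str) -> bytes:
    """Encode text, check that decoding restores it, and return the bytes."""
    data = text.encode(encoding)
    assert data.decode(encoding) == text
    return data


def can_encode(text: str, encoding: str) -> bool:
    """Report whether every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


simplified = "汉字"   # simplified form of the word for "Chinese characters"
traditional = "漢字"  # traditional form of the same word

gb_bytes = round_trip(simplified, "gb2312")   # two bytes per character
big5_bytes = round_trip(traditional, "big5")  # two bytes per character

if __name__ == "__main__":
    print(gb_bytes.hex(), big5_bytes.hex())
    # GB 2312 covers simplified characters only, so the traditional
    # form is rejected, while the later GBK extension accepts it.
    print(can_encode(traditional, "gb2312"), can_encode(traditional, "gbk"))
```

Because a Python str is a sequence of Unicode code points, each legacy encoding here is reached through Unicode rather than by converting GB bytes to Big5 bytes directly.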
The Guobiao (GB) line of character encodings starts with the Simplified Chinese character set GB 2312, published in 1980. Two encoding schemes existed for GB 2312: the commonly used EUC-CN, an 8-bit encoding that uses one or two bytes per character, and HZ, a 7-bit encoding used for Usenet posts. A traditional variant called GB/T 12345 was published in 1990. In 1993, the EUC-CN form was extended into GBK to include all Unicode 1.1 CJK ideographs, abandoning the ISO 2022 model. As a result, GBK includes Traditional Chinese characters in addition to the simplified ones in GB 2312. GBK gained popularity through the widespread Code page 936 implementation found in Microsoft Windows 95. In 2000, GB 18030 was published as GBK's successor. This new encoding includes a four-byte form that encodes all Unicode code points not previously encoded. In 2005, a new edition of GB 18030 was published to include reference glyphs for scripts used by ethnic minorities in China, as well as glyphs from CJK Unified Ideographs Extension B, following updates to Unicode. Adobe-GB1 is the corresponding PostScript character set for the GB encodings.

The Big5 family of character encodings starts with the initial definition by the consortium of five companies in Taiwan that developed it. It is a double-byte character set (DBCS) somewhat similar to Shift JIS, often combined with a single-byte character set (SBCS) such as ASCII. Quite a few vendor and official extensions exist, of which ETEN, HKSCS (Hong Kong) and Big5-2003 (part of Taiwan's CNS 11643) are the best known. Adobe-CNS1 is the PostScript character set corresponding to the Big5 family of encodings.
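The byte widths described for the GB and Big5 families can be checked with a short sketch. It assumes CPython's bundled codecs "gb2312" (the EUC-CN form), "hz", "gbk", "gb18030" and "big5"; the helper encoded_widths is a hypothetical name used only here.

```python
# Compare how the GB and Big5 families lay out bytes, using the
# codecs bundled with CPython. encoded_widths is an illustrative helper.

def encoded_widths(text: str, encoding: str) -> list[int]:
    """Return the number of bytes each character occupies in the encoding."""
    return [len(ch.encode(encoding)) for ch in text]


# EUC-CN: ASCII stays one byte, a GB 2312 character takes two.
euc_cn_widths = encoded_widths("A汉", "gb2312")

# HZ carries the same GB 2312 characters using 7-bit bytes only.
hz_bytes = "汉字".encode("hz")
hz_is_7bit = all(b < 0x80 for b in hz_bytes)

# GBK and GB 18030 keep the two-byte GB 2312 codes unchanged.
same_code = (
    "汉".encode("gb2312") == "汉".encode("gbk") == "汉".encode("gb18030")
)

# A CJK Extension B character (U+20000) needs GB 18030's four-byte form.
ext_b_width = len("\U00020000".encode("gb18030"))

# Big5: a double-byte set combined with single-byte ASCII.
big5_widths = encoded_widths("A漢", "big5")

if __name__ == "__main__":
    print(euc_cn_widths, hz_is_7bit, same_code, ext_b_width, big5_widths)
```

Measuring widths per character works for the 8-bit encodings, but HZ is checked on a whole string because its escape sequences surround each run of GB characters rather than each character.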
