" Universal Multiple 8-Bit Encoded Character Set ( UCS ) "ie. the international standard ISO/IEC10646.1-1993, is used for the written form of all kinds of languages in the world and used for the representation, transmission, exchange, processing, storage, input and display.
1. Overall framework of UCS
The overall framework of the UCS coded character set is a four-dimensional coding space. It contains 128 three-dimensional groups (00 ~ 7F); each group contains 256 two-dimensional planes (00 ~ FF); each plane contains 256 one-dimensional rows (00 ~ FF); and each row contains 256 cells, or character positions (00 ~ FF). Each of these four coordinates is represented by one octet (an 8-bit binary number). Every character in UCS is therefore encoded with four octets, which identify its group, plane, row and cell in the coding space. This four-octet form is called the four-octet canonical form of UCS, written UCS-4.
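The group/plane/row/cell structure can be sketched in a few lines of Python. This is only an illustration: the names `UcsPosition`, `split_ucs4` and `join_ucs4` are hypothetical and do not come from the standard.

```python
from typing import NamedTuple


class UcsPosition(NamedTuple):
    """Coordinates of one character in the four-dimensional coding space."""
    group: int  # 00 ~ 7F
    plane: int  # 00 ~ FF
    row: int    # 00 ~ FF
    cell: int   # 00 ~ FF


def split_ucs4(code: int) -> UcsPosition:
    """Split a UCS-4 value into its group, plane, row and cell octets."""
    if not 0 <= code <= 0x7FFFFFFF:
        raise ValueError("UCS-4 values lie in 00000000 ~ 7FFFFFFF")
    return UcsPosition(
        group=(code >> 24) & 0xFF,
        plane=(code >> 16) & 0xFF,
        row=(code >> 8) & 0xFF,
        cell=code & 0xFF,
    )


def join_ucs4(pos: UcsPosition) -> int:
    """Rebuild the UCS-4 value from its four octets."""
    return (pos.group << 24) | (pos.plane << 16) | (pos.row << 8) | pos.cell


# Size of the whole coding space: 128 groups x 256 planes x 256 rows x 256 cells.
TOTAL_POSITIONS = 128 * 256 * 256 * 256
```

For example, `split_ucs4(0x00004E2D)` yields group 00, plane 00, row 4E and cell 2D, and `join_ucs4` reverses the split.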
2. Basic multilingual plane
In the UCS coding space, plane 00 of group 00 is called the Basic Multilingual Plane (BMP). It contains alphabetic, syllabic and ideographic characters, together with the various symbols and digits in common use.
The group octet and the plane octet of the Basic Multilingual Plane are both 00H. UCS stipulates that the group and plane octets of the canonical form may be omitted when both are 00H. The characters on the Basic Multilingual Plane can therefore be represented with two octets, forming a two-octet coded character set, written UCS-2.
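The relationship between the two forms can be sketched as follows. This is a minimal illustration that assumes the octets are held in big-endian order (group octet first); the function names `ucs4_to_ucs2` and `ucs2_to_ucs4` are hypothetical.

```python
def ucs4_to_ucs2(octets: bytes) -> bytes:
    """Drop the group and plane octets of a character on the BMP."""
    if len(octets) != 4:
        raise ValueError("a UCS-4 character is exactly four octets")
    group, plane = octets[0], octets[1]
    if group != 0x00 or plane != 0x00:
        raise ValueError("only group 00, plane 00 can be written as UCS-2")
    return octets[2:]  # the row and cell octets remain


def ucs2_to_ucs4(octets: bytes) -> bytes:
    """Restore the omitted group and plane octets (both 00H)."""
    if len(octets) != 2:
        raise ValueError("a UCS-2 character is exactly two octets")
    return b"\x00\x00" + octets
```

For example, the four octets 00 00 4E 2D shorten to 4E 2D, while a character outside plane 00 of group 00 is rejected because it has no two-octet form.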
The Basic Multilingual Plane is divided into four zones: A, I, O and R.
Zone A: code positions 0000 ~ 4DFF, 19903 character positions in total (the 19968 positions of the range less 65 control positions). This zone is used for encoding letters, syllables and all kinds of symbols; within it, 0000 ~ 001F and 007F ~ 009F are reserved for control characters.
Zone I: code positions 4E00 ~ 9FFF, 20992 character positions in total. This zone is used for the CJK unified ideographs, that is, the unified encoding of the Chinese characters used in China, Japan and Korea.
Zone O: code positions A000 ~ DFFF, 16384 character positions in total. This zone is currently unused and is reserved for future standardization.
Zone R: code positions E000 ~ FFFD, 8190 character positions in total. This zone is a restricted-use zone, used for private-use characters, presentation forms and compatibility characters.
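The zone boundaries listed above can be checked with a short Python sketch. The table `ZONES` and the helpers `zone_of` and `zone_size` are hypothetical names introduced only for this illustration.

```python
from typing import Optional

# First and last code position of each zone, as listed above.
ZONES = {
    "A": (0x0000, 0x4DFF),
    "I": (0x4E00, 0x9FFF),
    "O": (0xA000, 0xDFFF),
    "R": (0xE000, 0xFFFD),
}


def zone_of(code: int) -> Optional[str]:
    """Return the zone letter of a UCS-2 code position, or None if it lies in no zone."""
    for name, (first, last) in ZONES.items():
        if first <= code <= last:
            return name
    return None  # FFFE and FFFF fall outside the four zones


def zone_size(name: str) -> int:
    """Count every code position in the zone, control positions included."""
    first, last = ZONES[name]
    return last - first + 1
```

With these definitions, `zone_size` gives 20992 for zone I, 16384 for zone O and 8190 for zone R, matching the totals above. For zone A it gives 19968, the raw size of the range; the figure of 19903 quoted for zone A is smaller, presumably because the positions reserved for control characters are not counted.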
3. Encoding of the CJK unified Chinese characters
Zone I contains 20902 Chinese characters in the unified encoding, arranged in radical-stroke order. Of these, 17000 are Chinese characters used in China. Their source character sets are the simplified characters of the GB basic set; the traditional characters of the first, third and fifth auxiliary sets; the general table of contemporary Chinese characters; the posts and telecommunications character set; and CNS 11643, the "Standard Interchange Code for Generally Used Chinese Characters" of the Taiwan region. In addition, 58 characters used in Hong Kong and 92 "吏读" (Idu) characters used by the Korean ethnic group in the Yanbian region are included. The source character sets for the Chinese characters used in Japan and in Korea are the relevant national standards of Japan and Korea respectively.