The Simsun (Founder Extended) font includes more than 64,000 Chinese characters. Because the pronunciation of many of these characters is difficult (if not impossible) to determine, an input method based on a Chinese phonetic alphabet would have been difficult to produce. Microsoft therefore extended the zone bit code/inner code/Unicode input method found in the IMEs of prior versions of Windows and developed the enhancement-type zone bit code.
For the Chinese characters in GBK (i.e. U+4E00 to U+9FFF in Unicode), the user can continue to input as before: by zone bit code, by GBK inner code, or by Unicode code point. Characters in Extension A can be input only by Unicode code point, and characters in Extension B only by their four-byte surrogate code. In short, with the enhancement-type zone bit code the user can input not only GBK Chinese characters but also the Chinese characters of Extension A and Extension B.
The key is how to obtain the Unicode code point or surrogate code of these characters. The help file topic on the enhancement-type zone bit code includes a set of code tables for the Extension A and Extension B characters supported by the font. However, we do not recommend searching the vast code table directly. Below is a relatively simple method that helps the user find the code of a Chinese character faster. Before querying, install the Simsun (Founder Extended) font to make certain that the Chinese characters display correctly.
The code table has the following fields:
For the Chinese characters of the BMP (GBK and Extension A), use Unicode; for the Chinese characters of Extension B, you can use surrogate pairs.
The page number format for the Chinese Dictionary of Kang Xi is xxxx.xxx, and the page number format for the Chinese Language Dictionary is xxxxx.xxx. The data here comes from the first edition of the Chinese Dictionary of Kang Xi, January 1958.
The digits before the decimal point are the page number in the dictionary, and the first two digits after the decimal point give the position of the Chinese character on that page. The last digit is a flag. If it is 0, the character really does appear on that page. If it is 1, the character is not in the dictionary, and the page number and the two position digits indicate where the character would be placed according to its stroke count.
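The page number format described above can be split mechanically. The following is a minimal sketch in Python, not part of the IME or the help file; the names `PageRef` and `parse_page_ref` are hypothetical, and the sketch assumes the digits before the decimal point can be read as one plain number.

```python
from typing import NamedTuple


class PageRef(NamedTuple):
    page: int       # digits before the decimal point
    position: int   # first two digits after the decimal point
    listed: bool    # True when the last digit is 0 (character is on the page)


def parse_page_ref(field: str) -> PageRef:
    """Split a page reference such as '0078.010' into its three parts."""
    page_part, _, tail = field.partition(".")
    if not page_part.isdigit() or len(tail) != 3 or not tail.isdigit():
        raise ValueError(f"not a page reference: {field!r}")
    flag = tail[2]
    if flag not in ("0", "1"):
        raise ValueError(f"unexpected flag digit in {field!r}")
    return PageRef(int(page_part), int(tail[:2]), flag == "0")


if __name__ == "__main__":
    for ref in ("0078.010", "0106.041"):
        print(ref, parse_page_ref(ref))
```

A reference whose last digit is 1 comes back with `listed` set to false, meaning the page and position only show where the character would have been placed.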
Here are three examples:
㐀 U+3400 0078.010 10015.030 (a Chinese character of Extension A; the first character on page 78 of the Chinese Dictionary of Kang Xi. To input this character, you only need to enter 3400 in Unicode input mode.)
㑢 U+3462 0106.041 10156.141 (a Chinese character of Extension A that is not in the Chinese Dictionary of Kang Xi. According to its radical and stroke count, it would be the fourth character on page 106 of the Chinese Dictionary of Kang Xi. To input this character, you only need to enter 3462 in Unicode input mode.)
𠀀 D840DC00 00020000 0075.060 10011.070 (a Chinese character of Extension B; the sixth character on page 75 of the Chinese Dictionary of Kang Xi. D840DC00 is its surrogate encoding, while 00020000 is its Unicode code point. To input this character, you only need to enter the surrogate code D840DC00 in Unicode input mode.)
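The relation between the two codes in the last example (00020000 and D840DC00) follows the standard UTF-16 surrogate calculation. The sketch below shows that arithmetic in Python; the function names `to_surrogate_code` and `from_surrogate_code` are hypothetical and are not part of the IME.

```python
def to_surrogate_code(code_point: int) -> str:
    """Return the 8-hex-digit surrogate code for a supplementary character."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF have surrogate pairs")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return f"{high:04X}{low:04X}"


def from_surrogate_code(code: str) -> int:
    """Return the Unicode code point for an 8-hex-digit surrogate code."""
    if len(code) != 8:
        raise ValueError("expected eight hexadecimal digits")
    high, low = int(code[:4], 16), int(code[4:], 16)
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    print(to_surrogate_code(0x20000))                # D840DC00
    print(f"{from_surrogate_code('D840DC00'):08X}")  # 00020000
```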
Therefore, the user only needs to look up the character in the Chinese Dictionary of Kang Xi (or the Chinese Language Dictionary), note the page number on which it is found (xxxx.xxx for the Chinese Dictionary of Kang Xi, or xxxxx.xxx for the Chinese Language Dictionary), and search the code table with this page number string as the keyword to find the corresponding Unicode code point or surrogate pair. After that, you can input the character by activating this input method.
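The keyword search described above can be sketched as follows. This is an illustration only: the whitespace-separated row layout is an assumption based on the three examples in this article (the real code table in the help file may be laid out differently), and `SAMPLE_TABLE` and `find_input_codes` are hypothetical names.

```python
from typing import List

# Three sample rows copied from the examples in this article.
# Assumed layout: character, code(s), Kang Xi page, Chinese Language
# Dictionary page, separated by whitespace.
SAMPLE_TABLE = """\
㐀 U+3400 0078.010 10015.030
㑢 U+3462 0106.041 10156.141
𠀀 D840DC00 00020000 0075.060 10011.070
"""


def find_input_codes(table: str, page_key: str) -> List[str]:
    """Return the code to type for every row matching the page number."""
    codes = []
    for line in table.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        # The last two fields are the two dictionary page references.
        if page_key in fields[-2:]:
            code = fields[1]  # Unicode for the BMP, surrogate code for Extension B
            if code.startswith("U+"):
                code = code[2:]
            codes.append(code)
    return codes


if __name__ == "__main__":
    print(find_input_codes(SAMPLE_TABLE, "0106.041"))
    print(find_input_codes(SAMPLE_TABLE, "0075.060"))
```

The function returns a list because nothing in the assumed layout guarantees that a page number string is unique within the table.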
In Word 2002, you can type the Unicode code point directly into the document and then press ALT+X to convert it to the Chinese character. For a Chinese character of Extension B, leave out the first two leading zeroes. For example, for the character above (𠀀, Unicode 00020000), you would type 020000 and then press ALT+X. To find the Unicode code point of a character, you can also place the cursor immediately after the Chinese character and then press ALT+X to display the corresponding Unicode code point.
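For readers who want to check a code point outside Word, the same two-way conversion can be approximated in Python. This sketch only mirrors the idea of the ALT+X toggle and does not reproduce Word's behavior in detail; `hex_to_char` and `char_to_hex` are hypothetical helper names, and the output formatting (uppercase, at least four digits) is an assumption.

```python
def hex_to_char(hex_digits: str) -> str:
    """Turn typed hexadecimal digits (e.g. '020000') into the character."""
    code_point = int(hex_digits, 16)
    if code_point > 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    return chr(code_point)


def char_to_hex(character: str) -> str:
    """Turn a single character back into its hexadecimal code point."""
    if len(character) != 1:
        raise ValueError("expected exactly one character")
    return f"{ord(character):04X}"


if __name__ == "__main__":
    ch = hex_to_char("020000")   # the Extension B example from this article
    print(ch, char_to_hex(ch))
```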