Help:Components

From Wenlin Dictionaries

This page explains Chinese character components appearing in Zi (字典 Zìdiǎn) entries and the CDL (Character Description Language) database.

Components

Wenlin includes different kinds of component information for each character, and this information is stored in different places in the program databases, serving a variety of purposes. The usual view of component-related data in Zìdiǎn entries is as follows (taking the entry for 好 hǎo ‘good’ as an example).

  • components: 女子
  • list characters containing 好 as a component
  • radical 38 女

Each of these lines in the Zìdiǎn entry provides access to a different kind of component information. The “components” line itself lists all common components, and optionally includes special structural and other information (discussed below). The “list characters containing X” line provides access to characters in which X appears. The “radical” line organizes characters according to their radical and residual stroke count.
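As a rough illustration of the ordering behind the “radical” line, the sketch below sorts a few characters by radical number and then by residual stroke count (the strokes left after the radical is removed). The table and the function name are assumptions made for this example; they do not show how Wenlin stores or sorts this data internally.

```python
# Hypothetical toy table: character -> (radical number, residual stroke count).
# Illustrative only; this is not Wenlin's storage format.
RADICAL_INFO = {
    "字": (39, 3),
    "好": (38, 3),
    "子": (39, 0),
    "姓": (38, 5),
    "女": (38, 0),
}

def radical_order(info):
    """Return characters sorted by radical number, then residual stroke count."""
    return sorted(info, key=lambda ch: (info[ch][0], info[ch][1], ch))

print(radical_order(RADICAL_INFO))  # ['女', '好', '姓', '子', '字']
```

Here 好 sorts under radical 38 (女) with three residual strokes, after 女 itself and before 姓.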

There are several different systems of structural analysis of Chinese characters. In such systems, a given Chinese character may be analyzed as being composed of other characters. A character that is used in the writing of another character is termed a “component” of the character in which it appears.

This is analogous to the use of English letters to construct words. For example, just as the letter “g” can be identified as (the left-hand) part of the word “good”, so too the Chinese character 好 hǎo ‘good’ can be said to contain on its left the character 女 nǚ ‘woman’ (serving as component). And as the English word “good” can be spelled out letter by letter, the Chinese character 好 hǎo ‘good’ may be said to contain the two components 女 nǚ ‘woman’ and 子 zǐ ‘child’. In fact, such analysis of a character as “containing” one or more “components” is relatively high-level, compared, for example, to stroke-by-stroke “spelling” as in CDL, but nevertheless serves several practical purposes.

  • First, it provides a rudimentary description of the shape of the character. In the case of 好, it happens to be historically true that 好 contains both 女 and 子 as components, and this is easily seen in the modern form of the character.
  • Second, this simple analysis causes the Zìdiǎn entry for 好 to be automatically indexed under both the components 女 and 子. In the entry for 女 (or 子), you can follow the link for “characters containing...” in order to find 好.
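The automatic indexing described above can be pictured as an inverted index from components to characters. The following is only a minimal sketch under assumed data: the record dictionary and the function name are hypothetical, and Wenlin’s real database format is not shown here.

```python
from collections import defaultdict

# Hypothetical toy records: character -> its "components" line.
COMPONENT_LINES = {
    "好": "女子",
    "姓": "女生",
    "字": "宀子",
}

def build_component_index(records):
    """Map each component to the sorted list of characters that list it."""
    index = defaultdict(set)
    for char, components in records.items():
        for comp in components:
            index[comp].add(char)
    return {comp: sorted(chars) for comp, chars in index.items()}

INDEX = build_component_index(COMPONENT_LINES)
print(INDEX["女"])  # ['好', '姓']
print(INDEX["子"])  # ['好', '字']
```

Because the index is derived entirely from the component lines, adding or removing a component in a record and rebuilding the index changes which lists the character appears in.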

Caveats

Some characters don’t have a component analysis (for several reasons). When there is a component analysis, there is not always only one possible analysis, and experts might not always agree on the correctness of a given analysis. Moreover, “correctness” is sometimes a matter of purpose, rather than an absolute. What Wenlin identifies as a “component” for some purpose may not be historically accurate. Likewise, historically accurate analysis of the component structure may not always be obviously relevant to the immediate purposes of a student trying to learn/find/input a certain unknown character in finite time.

For example, the character 克 may have 兄 listed as one of its components, simply because the modern form of 克 seems to have 兄 at the bottom (and 十 at the top); but historically 克 and 兄 have nothing to do with each other (兄 is not a traditional component of 克). Nevertheless, it is easy to see 兄 in 克, and very useful for users to be able to access any character by way of easily recognizable parts.

The criteria used by Wenlin component analysis are neither rigid nor unambiguous. Certainly 意 contains 音 and 心 as components, since that is the true historical explanation. But we might want to list 立 as a component of 意 also, since anyone trying to learn details of the writing system might see 意 and understandably think it contained 立. To help the student find 意 when listing characters containing 立, we might want to list 立 as a component of 意. Then again, for some purposes we might prefer not to list 立 as a component. You decide! Simply by editing the database record for 意, you can add 立 or delete it as the case may be, and the index will automatically be updated accordingly.

For another example: 章 is historically derived from 音 and 十. But a student might reasonably imagine 章 was derived from 立 and 早. Certainly in the explanation of 章, we should say that 章 is from 音 and 十. But when we list the components, for the purpose of indexing, it might be kinder to the student to include all of the “components” 音十立 and 早 (and maybe even 日), to give the student a better chance of accessing the database record for 章 in the first place, if the student uses the command “List: Characters containing components” and specifies 立 as a component.

Advanced Features

Please note: the methods of searching by components described here are not yet implemented in our wiki; they are available in the desktop Wenlin Software for Learning Chinese.

This section describes advanced features relating to Zìdiǎn components and CDL (Character Description Language) components.

Wenlin stores information about the component structure of modern-style characters in two places: ① in the “components (#c)” line of the 字典 Zìdiǎn entry for that character; ② in the CDL font database. Users can run separate component searches on each of these databases.

Searches of the “#c” type may include one or more components in any order, and offer a special alternative view of the results. Searches of the CDL type may include only a single CJK CDL comp element.

Component data: “#c” component data is available for almost every common Chinese character, and for a great many uncommon ones as well (including almost all 《漢語大字典》 characters, and all Shuōwén seal characters). CDL component data is available for all characters in Unicode 6.0 CJK-related blocks.

Component order: The order of components is not distinctive for the purposes of the “List: Characters containing components” dialogue: inputting the components “女子” (in that order) gives the same results as inputting “子女” (in that order). In addition to finding the common modern character 好 hǎo ‘good’, if your “Options: Hanzi filter...” settings are set to maximum, you will also see the rare variant 𡥃 hǎo ‘good’ (in which the components are reversed, with 子 on the left and 女 on the right).

Alternative view: If an exclamation point (“!”) is prefixed to the list of components entered into the “List: Characters containing components” dialogue, you will see a different view of the result. Instead of each Zìdiǎn record’s “top line”, the result shows the “#c” component analysis line of each database record. This can be very useful, since you can then use the “Find:Search...” dialogue to search for components in such a list (perhaps with “Full=simple” enabled, or with regular expressions), to quickly find the exact character.
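The two behaviours described for “#c” searches (order-insensitive matching, and the “!” prefix that switches the view) can be sketched as follows. This is a toy model, not Wenlin’s implementation: the record table, the result line format, and the function name are all assumptions made for illustration.

```python
# Hypothetical toy records: character -> (top line, "#c" component line).
RECORDS = {
    "好": ("好 hǎo good", "女子"),
    "𡥃": ("𡥃 hǎo good (rare variant)", "子女"),
    "姓": ("姓 xìng surname", "女生"),
}

def list_containing(records, query):
    """Return one result line per character whose components include the whole query.

    A leading "!" selects the alternative view: the component line is shown
    instead of the record's top line.
    """
    show_components = query.startswith("!")
    wanted = set(query.lstrip("!"))  # a set, so the order of components is irrelevant
    results = []
    for char in sorted(records):
        top_line, comps = records[char]
        if wanted <= set(comps):
            results.append(f"{char} #c {comps}" if show_components else top_line)
    return results

print(list_containing(RECORDS, "女子") == list_containing(RECORDS, "子女"))  # True
print(list_containing(RECORDS, "!女子"))  # ['好 #c 女子', '𡥃 #c 子女']
```

Treating the query as a set is what makes “女子” and “子女” equivalent in this sketch.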

CDL comps: If the SHIFT key is held down when choosing “List: Characters containing components”, instead of searching “#c” lines in the Zìdiǎn, Wenlin enters a special mode in which it searches the CDL database instead. This provides an extremely powerful and fine-grained way to search the complete character database (including the entire Unicode 6.0 CJK character set). The actual CDL “comp” elements in CDL database records are examined recursively, in search of the exact component sought. (As of Wenlin 4, CDL “comp” search is limited to one CJK component at a time, and cannot be combined with the “!” prefix.)

Combining forms: CDL “comp” elements are sometimes special “combining forms” (a.k.a. “formal” or “graphical” variants), of relatively restricted distribution.

Consider the previous example of 好 hǎo ‘good’, this time with the Zìdiǎn entry “components” line displayed with the “advanced CDL (Character Description Language) features” option enabled in “Options:Advanced Options...”:

▷components: 女子 ; a total of 2 CDL comp elements (V=0): (子)

With that option enabled, Wenlin appends component information from the CDL database to the end of the line. In the above case, we see that CDL for 好[U+597d] (variant=0) has two CDL “comp” elements, a combining form of 女 nǚ ‘woman’ on the left, and 子 zǐ ‘son’ on the right. Note that the “#c” list (at the beginning of the line) has 女 (the independent character), while the CDL list has its dependent (combining) form.

As another example, consider the rare character 𡥃 (a variant of 好 hǎo ‘good’, with the same two components, but reversed). The form of 子[U+5b50] on the left of 𡥃 is 孑[U+5b51], which is both an independent character (completely different from 子 zǐ ‘son’) to be read “jié”, and also a “combining form” of 子 zǐ ‘son’. The CDL “comp” 孑[U+5b51] in 𡥃 is not listed among the “#c” components in the Zìdiǎn database record, so including 孑 in a simple “List: Characters containing components” search will not find the character 𡥃. However, if the SHIFT key is down when “List: Characters containing components” is selected, then the resulting list will include every character in the CDL database that has the 孑 ‘left-hand son’ CDL “comp” element at any depth of recursion (at present: “Total CDL descriptions with this comp, Direct and Indirect: 406”). If the “comp” is in the top level of the CDL description, then the character will be listed in the “Direct” list; if the “comp” occurs at a lower level (at a deeper depth of recursion, that is, as a “comp” of a “comp”), then the character will be listed in the “Indirect” list.
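The distinction between the “Direct” and “Indirect” lists can be illustrated with a small recursive search. The sketch below assumes a simplified model in which each CDL description is just a list of its top-level “comp” names; the table, the placeholder entries named HYPOTHETICAL-1 and HYPOTHETICAL-2, and both function names are invented for this example and do not reflect the actual CDL database or Wenlin’s search code.

```python
# Hypothetical toy model: description name -> list of top-level comp names.
# Entries named HYPOTHETICAL-* are placeholders, not real characters.
TOY_CDL = {
    "好": ["女 (combining form)", "子"],
    "𡥃": ["孑", "女"],
    "HYPOTHETICAL-1": ["口", "𡥃"],
    "HYPOTHETICAL-2": ["木", "HYPOTHETICAL-1"],
}

def occurs_in(cdl, name, target):
    """True if target is a comp of name at any depth (assumes no cycles)."""
    comps = cdl.get(name, [])
    return target in comps or any(occurs_in(cdl, c, target) for c in comps)

def search_comp(cdl, target):
    """Return (direct, indirect) lists of descriptions containing target."""
    direct, indirect = [], []
    for name, comps in cdl.items():
        if target in comps:
            direct.append(name)      # target is at the top level
        elif any(occurs_in(cdl, c, target) for c in comps):
            indirect.append(name)    # target is a comp of a comp
    return direct, indirect

direct, indirect = search_comp(TOY_CDL, "孑")
print(direct)    # ['𡥃']
print(indirect)  # ['HYPOTHETICAL-1', 'HYPOTHETICAL-2']
```

In this model 好 is not found when searching for 孑, because its description lists 子 rather than 孑.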

Wenlin developers have access to extensive structural information, including comprehensive stroke and component indexes. For details, please contact the CDL development team.

Shuowen Features

《說文解字·注》Seal Characters  􀙯􁡵􀱄􂥒􁻏 􀱬􀬣􂥒

In addition to component analyses for modern-style Chinese characters, Wenlin also includes component information for Seal 􀱬􀬣 (篆體) characters.

As in Zìdiǎn entries for modern-style characters, Zìdiǎn entries for Seal characters also include “component (#c)” lines listing Seal components, following the analysis of 段玉裁 (1815; Cook 2003).

Please note: the methods of searching by components described here are not yet implemented in our wiki; they are available in the desktop Wenlin Software for Learning Chinese.

The “List: Characters containing components” operations relating to “component (#c)” lines of modern-style characters also apply to Seal characters.

For example:

• Click the Seal character 􂊓 (好 hǎo ‘good’) to look up its entry in Wenlin’s Zìdiǎn;

The Seal Zìdiǎn entry opens. Note the “component” line in that entry:

▷篆字部: 􂈻􂥍

• Choose “List: Characters containing components”;

• Use the Grabber (or Copy/Paste) to insert the two components 􂈻􂥍 into the dialogue, and click “OK”.

The resulting list (which includes only one item) provides additional information on Seal character components at the end.

Some seal characters have no traditional component analysis. Such characters are termed “graphical primitives”, and although they may have obvious parts, these parts do not correspond to independent characters, and so are not termed components. For example, 􂥍 (子 zǐ ‘son’) has the following component analysis:

􂥍▷篆字部: 〇

Wherever “〇” occurs alone in the component analysis it indicates a “graphical primitive”. Although 􂥍 has parts identifiable as the child’s head, arms, body, none of these is traditionally associated with an independent character. For comparison, consider the component analysis for a traditional variant of 􂥍:

􂥎▷篆字部: 􂥍􁾗〇

In this case the “〇” in the component analysis follows a list of components, and so indicates that although there are parts of 􂥎 with obvious similarity to independent characters 􂥍 and 􁾗, and though 􂥍 must in fact be present in 􂥎, the traditional componential analysis (in the gloss in this edition of 《說文》) does not identify 􂥍 explicitly (only implicitly) as a component; likewise, the component 􁾗 (巛 → 巛 chuān ‘river’) is not really ‘river’ in this usage, but rather, a 象 ‘picture’ of the (flowing?) hair of the child. That is, the ‘river’ character is used without its usual meaning, simply as a stylized depiction of ‘hair’ (cp. the identical usage in 􁡞 ⇒ 𩠐 → 首 shǒu ‘head’).

For another example, consider the character 􁋑 ⇒ 𠅏 →克 kè ‘able’ (‘勝任’) already mentioned above. The traditional analysis alludes to the identification of a 􁎝 → 宀 mián ‘roof’ / 亠 tóu ‘lid’ component on the top (not 十 at all), but this is not explicit. The rest of the character structure is unclear even in Eastern Hàn times (121 A.D.): the lower part of 克 is associated (not with 兄, but) with a homophonous character 􀯫 ⇒ 刻 kè ‘carve, chop wood’, seen in an associated ancient form 􁋓. By this analysis, the top is not 􁎝 at all, but rather the axe head in the stump of a felled tree (cp. 􁋔􁣨). Even this analysis may be called into question, when one examines the various attested inscriptional forms from various periods associated with 克 by various paleographic authorities.

There are some 546 Seal entries which have only “〇” in the component analysis (out of 10706 entries total). These include many natural objects, body parts, and generic plant and animal characters. For example, 􁦗 ‘rock’, 􁶢 ‘water’, 􁨟􁨠 ‘horse’, 􀨤􀦸 ‘bird’. Many of the ancient variant forms listed in《說文解字》also lack component analyses. This is not to say that such characters do not have parts, or that componential analyses are impossible or completely unavailable. On the contrary, it is simply an indication that according to the regular methods of component analysis in this particular edition of《說文解字》there is no explicit analysis available in the Eastern Hàn text. The various commentaries will often speculate on character structure, with or without reference to independent characters or recurrent patterns, and sometimes with relation to inscriptional forms and historical developments.

In some cases a “〇” character in the component analysis may be followed by one or more Seal characters. For example, consider 祝:

􀀽▷篆字部: 􀀐􁝒􀌼􁝖〇􁝙

In this case, the traditional component analysis gives alternative explanations of the character structure: 􀀽 ‘curse/pray’ is composed of 􀀐 ‘heavens’ on the left, and on the right is either 􁝒 ‘man/legs’ over 􀌼 ‘mouth’, or else 􁝖 ䷹ ‘joy’ abbreviated (omitting the top “􀋮” component of 􀐂 in 􁝖). Despite the fact that the traditional analysis does not identify 􁝙 ‘elder brother’ as a component of 􀀽, we include “􁝙” as a comment after the “〇” character to facilitate component-based look-up (by a user who happens to look at 􀀽 and see 􁝙 as one of its components).
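The three uses of “〇” described in this section (alone, after a list of components, and followed by extra characters) suggest a simple way to read a Seal component line. The sketch below is an assumption-laden illustration: the function names are invented, and ASCII letters stand in for Seal characters so that the example does not depend on a Seal font.

```python
MARK = "\u3007"  # the "〇" marker used in Seal component lines

def parse_seal_components(line):
    """Split a component line into (components, has_marker, lookup_extras).

    components:    characters listed before the marker
    has_marker:    True if the marker occurs in the line
    lookup_extras: characters after the marker, added only to aid look-up
    """
    before, marker, after = line.partition(MARK)
    return list(before), bool(marker), list(after)

def is_graphical_primitive(line):
    """A line consisting of the marker alone denotes a graphical primitive."""
    return line == MARK

# ASCII letters are hypothetical stand-ins for Seal characters.
print(parse_seal_components("AB"))                 # (['A', 'B'], False, [])
print(parse_seal_components(MARK))                 # ([], True, [])
print(parse_seal_components("ABCD" + MARK + "E"))  # (['A', 'B', 'C', 'D'], True, ['E'])
```

A look-up index built from such lines could include both the components and the extras, while an etymological display could show only the components.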

The Eastern Hàn dictionary《說文解字》in the 清 Qīng Dynasty edition by 段玉裁 (1815; Cook 2003) is a primary traditional authority informing the opinions of modern paleographers.