Origins of Chinese Language

Written by Sally Guo Updated Jun. 15, 2021

The earliest Chinese writing consisted of the pictographic characters of the so-called Oracle-Bone script. By the late Han (BCE 206 – CE 220) Dynasty, more precisely during the Eastern Han (CE 25-220) period that formed its second half, a comprehensive set of more stylized characters was in use. These characters, officially called hanzi (han zi = "Han characters") though more often referred to today as "traditional characters", were catalogued by the lexicographer Xu Shen, who compiled the first Chinese dictionary, which included an analysis of the Chinese written character.*(3)

Xu's Dictionary

Xu's dictionary was already completed in CE 100, but the author withheld its publication until CE 121 for "political" reasons. Even a treatise on something as seemingly banal as the Chinese language might be seen as a kind of heresy by certain emperors, and if the treatise met with the disapproval of the court's favorite scholars, perhaps for reasons as petty as jealousy, its author might not only be denied scholarly recognition but might in fact be punished.

Xu divided the system of Chinese characters then in use into six categories: pictographs, simple ideographs, compound ideographs (an ideograph expresses an abstract idea, such as hope, or a number, etc.), phonetic loans, phonetic compounds, and derivative characters. As the reader can appreciate, by the time of Xu's scholarly work the written Chinese language had evolved enormously from its primitive, pictographic Shang Dynasty origins, with characters/words that included both a semantic (meaning-bearing) and a phonetic (pronunciation-bearing) component. In fact, fewer than 5% of the characters in use at the time of Xu's scholarly work were purely pictographic, the rest being stylized characters, most of which contained both a semantic and a phonetic component.

Structure of Chinese Characters

The typical hanzi character consists, as indicated, of two parts: a semantic component and a phonetic component. The "leading" (i.e., left-hand) part of a character, which is typically highly stylized, is generally the semantic component and has traditionally been referred to as a radical. Some radicals, however, represent the phonetic component. Xu Shen's dictionary listed 540 radicals, though this list puzzlingly included a number of redundancies (by the latter part of the Qing (CE 1644-1911) Dynasty, the number of radicals had been reduced to only 214).

The highly stylized hanzi character is made up of a combination of calligraphic dashes, or strokes. Whereas the words of Indo-European languages are made up of syllables which themselves are made up of letters, the words of the Chinese language are made up of characters, which, in turn, are made up of calligraphic strokes.*(4) Amazingly, there are only 12 standardized strokes in the Chinese language in use today.

Even though some hanzi characters consist of up to 64 combined strokes, every hanzi character, however complex, is composed of the aforementioned set of only 12 standardized strokes. Regardless of the number of combined strokes, a hanzi (traditional) character can therefore always be broken down into standardized strokes, as follows:

Since Chinese words do not consist of letters, the "alphabetization" of Chinese characters in an ancient character dictionary such as that compiled by the lexicographer Xu Shen is more complicated and is at least partly determined by the character's number of combined strokes (first come the single-stroke characters, then the double-stroke characters, and so on). Today, Chinese words are genuinely alphabetized with the help of Pinyin, after which the various alternative script styles are listed (see more on Pinyin, aka Hanyu Pinyin, and on similar earlier efforts to "translate" Chinese languages into a Western sound alphabet, aka Romanization or Latinization, in the orthography section below).
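The two ordering schemes described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a reconstruction of Xu Shen's actual arrangement or of any modern dictionary's collation rules. The names `Entry`, `by_stroke_count` and `by_pinyin` are hypothetical, the six sample characters are chosen for this example, and the Pinyin is written without tone marks for simplicity.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One dictionary entry (hypothetical structure for illustration)."""
    char: str      # the character itself
    strokes: int   # number of strokes needed to write it
    pinyin: str    # toneless Pinyin romanization


# A tiny hand-made sample; stroke counts are the standard ones.
ENTRIES = [
    Entry("山", 3, "shan"),   # mountain
    Entry("木", 4, "mu"),     # tree
    Entry("人", 2, "ren"),    # person
    Entry("好", 6, "hao"),    # good
    Entry("家", 10, "jia"),   # home/family
    Entry("一", 1, "yi"),     # one
]


def by_stroke_count(entries):
    """Order characters by stroke count: fewest strokes first."""
    ordered = sorted(entries, key=lambda e: (e.strokes, e.char))
    return [e.char for e in ordered]


def by_pinyin(entries):
    """Order characters alphabetically by their Pinyin spelling."""
    ordered = sorted(entries, key=lambda e: (e.pinyin, e.strokes))
    return [e.char for e in ordered]


if __name__ == "__main__":
    print("by strokes:", " ".join(by_stroke_count(ENTRIES)))
    print("by pinyin: ", " ".join(by_pinyin(ENTRIES)))
```

The same six characters come out in a different order under each scheme, which is why a reader must know which ordering a given dictionary uses before looking a character up.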

Below are some examples of Shang Dynasty period Oracle-Bone pictographs as well as a few pictographs that have survived down through time, such as the pictographs for a mountain, a tree, and a pig. These pictographs are for the most part readily recognizable, even if, in some instances, only symbolically so:

Note that the pictograph for a pig, while not representational in the conventional sense, is very symbolic of the pig if one imagines the diagonal hash marks as representative of the pig's (sow's) teats (a modern viewer might argue that the hash marks represent the ribs of the pig's elongated abdomen, but since ancient society was very much preoccupied with fertility, given that one's progeny was the only old-age retirement plan on offer, the sow was most likely the symbolic representation of the pig).

Similarly, the strokes of "mountain" and "tree" are also representational in a symbolic sense. Of all of the pictographs listed immediately above, the pictograph for a woman is perhaps the least "pictographic" to a modern viewer, though it was surely pictographic to a Shang Dynasty period viewer. In contrast, in early-period Sumerian, or pre-Cuneiform, the pictograph for a woman is instantly recognizable as representative of the female (at least to about half of the world's current population):

Comments on the above ideographs and pictographs

Interesting Facts about Chinese Characters

It is interesting to note the underlying fertility concept inherent in the compound ideographs for "good" and "home/family". I am assuming that the "pig under roof" imagery is symbolic of "mom, dad and a flock of kids" rather than any suggestion of a family being able to afford a pig (though either will work). If we assume that early Chinese people may have lived in raised houses (houses on stilts), as some Chinese ethnic minorities still do, then the sow might well have suckled her piglets on a bed of straw on the ground, underneath the human habitation part of the house, and would thus have been "under the roof". In contrast, the idea of a cow having to spend time "under the roof", rather than freely roaming the meadow in search of fresh green grass, is perhaps a good approximation of the notion of captivity, or prison.

The top part of the ideograph for "thought" (and perhaps likewise for "man") was originally "brain", i.e., brain + heart (thought) and brain/intellect + strength (man). It eventually got corrupted, as did many a Chinese word, in this instance by the homophone "field", which sounds the same as "brain". In the case of "man", either brain/intellect or field would work, since early agrarian man tilled the fields (and we have already established that written language arose thanks to the sedentary, as opposed to nomadic, lifestyle that agriculture brought with it), or at least he did the most strenuous part of this effort, even if the daily task of tending the field may eventually have been delegated to the female once the crop had begun to sprout.

It is also interesting to note that the "rain cloud" part of the compound ideograph for "thunder" is in fact a compound itself since part of the ideograph for "rain cloud" is a modified "roof" of sorts, which we might here interpret as "the sky".

Examples of compound forms that include a semantic radical and a phonetic component (note that the columns refer to the right-hand side phonetic component while the rows refer to the left-hand side semantic component/ radical – though appearing as a "denominator" (as in "To slander" below) where the phonetic component itself is compound):
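The row-and-column layout described above can be modeled as a small lookup table keyed by (semantic radical, phonetic component). The sketch below uses a handful of well-known traditional-character compounds chosen for this illustration; they are not necessarily the ones shown in the original figure, and the names `COMPOUNDS`, `by_phonetic` and `by_semantic` are hypothetical.

```python
# Each key is (semantic radical, phonetic component); each value is
# (resulting character, Pinyin, rough English gloss).
COMPOUNDS = {
    ("氵", "青"): ("清", "qīng", "clear"),
    ("日", "青"): ("晴", "qíng", "sunny"),
    ("忄", "青"): ("情", "qíng", "feeling"),
    ("言", "青"): ("請", "qǐng", "to ask"),
    ("氵", "可"): ("河", "hé", "river"),
    ("女", "馬"): ("媽", "mā", "mother"),
}


def by_phonetic(phonetic):
    """Return the characters in one 'column': same phonetic component."""
    return [char for (_, p), (char, _, _) in COMPOUNDS.items() if p == phonetic]


def by_semantic(semantic):
    """Return the characters in one 'row': same semantic radical."""
    return [char for (s, _), (char, _, _) in COMPOUNDS.items() if s == semantic]


if __name__ == "__main__":
    # Characters sharing the phonetic 青 have similar pronunciations.
    for (semantic, phonetic), (char, pinyin, gloss) in COMPOUNDS.items():
        if phonetic == "青":
            print(f"{semantic} + {phonetic} = {char} ({pinyin}, {gloss})")
```

Reading down a column (one phonetic component) gives characters with similar pronunciations but unrelated meanings, while reading across a row (one radical) gives characters that belong to a related area of meaning, such as water for 氵.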

The prevailing consensus among anthropologists is that the first writing, the prototype of written language, was the use of numerals. Since written language arose in response to the agrarian stage in mankind's development, this seems intuitive enough: anyone wealthy enough to store foodstuffs would have needed a system of bookkeeping to keep track of what s/he owned in the way of foodstuffs, containers, "money" (e.g., rare seashells), etc. (In fact, one of the first "agricultural products" of ancient Sumer was beer, which was brewed and kept in urns, presumably plugged to prevent the beer from going flat.)

We have seen an example of Chinese numerals ("one - two - three" in the Simple Ideographs image) above; here are the numbers 1 through 10, as well as a few compounds of the number 10:

A modern "writing" counterpart to the above "printed" numerals – which typically appear on banknotes, cheques, coins, etc. – are the following complex numeral forms:
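How the numerals 1 through 10 combine into compounds of 10 can be shown with a short Python sketch. It covers only 1 to 99 and assumes the usual modern conventions: the everyday digits 一 to 九 with 十 for ten, and the complex (financial) forms in their simplified-character variants 壹 to 玖 with 拾 for ten. The function name `to_chinese_numeral` and its `style` parameter are hypothetical, introduced for this example.

```python
ORDINARY_DIGITS = "一二三四五六七八九"    # 1 through 9, everyday forms
FINANCIAL_DIGITS = "壹贰叁肆伍陆柒捌玖"   # 1 through 9, complex forms
TEN = {"ordinary": "十", "financial": "拾"}


def to_chinese_numeral(n, style="ordinary"):
    """Write an integer from 1 to 99 in Chinese numerals.

    style is "ordinary" (e.g. 二十一) or "financial" (e.g. 贰拾壹).
    """
    if style not in TEN:
        raise ValueError(f"unknown style: {style!r}")
    if not 1 <= n <= 99:
        raise ValueError("only 1 through 99 are supported in this sketch")

    digits = ORDINARY_DIGITS if style == "ordinary" else FINANCIAL_DIGITS
    tens, units = divmod(n, 10)
    parts = []

    if tens:
        # Everyday writing drops the leading "one" in 10 to 19 (十, 十一);
        # financial writing spells it out (壹拾, 壹拾壹).
        if tens > 1 or style == "financial":
            parts.append(digits[tens - 1])
        parts.append(TEN[style])
    if units:
        parts.append(digits[units - 1])

    return "".join(parts)


if __name__ == "__main__":
    for value in (1, 10, 11, 20, 35, 99):
        print(value, to_chinese_numeral(value), to_chinese_numeral(value, "financial"))
```

The compounds are purely multiplicative and additive: 二十 is "two tens", and 三十五 is "three tens five". The complex forms follow the same pattern with harder-to-alter characters, which is why they are preferred where forgery is a concern.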

The writing script that is associated with the traditional (hanzi) characters is called wenyan (文言, in traditional characters), though it is known by many other names, including, in English, Regular script or Standard script (and there are three or four Pinyin names for the same concept!). Wenyan was the main, or "classical", form of Chinese writing throughout the period that began with the Han Dynasty and ended, roughly, with the last dynasty in the early 20th century (the new writing script was introduced in the 1920s, during the Republic of China (1912-49) period). In fact, in ancient times wenyan was borrowed by neighboring Japan, Korea, and Vietnam, until these cultures eventually developed their own writing scripts.

The wenyan script was made up of words that consisted of single-syllable characters, which naturally explains why the first dictionary of the Chinese language was a dictionary of the Chinese characters of the wenyan script.

Words consisting of multiple-syllable characters first arose with the introduction of Chinese language reforms during the 1920s, following the 1917 development of baihua, the written form of Mandarin in the vernacular style, by the Chinese scholar Hu Shi. Hu, a philosopher and historian, developed baihua in an effort to make the language of the literary classics, Classical Chinese (i.e., Traditional Chinese), more accessible to the masses. Hu's efforts paid off, with baihua becoming the official language of newspapers, periodicals, government documents, and even textbooks.

In itself it is a very curious phenomenon that even though for centuries there existed the spoken Chinese languages such as Mandarin and Cantonese (as well as other, lesser spoken Chinese languages), all educated Chinese people down through the ages continued to use the only extant version of written Chinese, Classical Chinese – that is, until the introduction of baihua and the Cantonese and other equivalents.

At about the same time that baihua was introduced, other scholars, including the Chinese-American linguist Zhao Yuanren, had begun to experiment with a written form of Mandarin. Similar, albeit unofficial, written forms of vernacular Cantonese, etc., have since made their appearance. Yet, until the early 20th century, all educated Chinese people relied on Classical Chinese as their sole written language. Today, however, baihua has become the official standard written language for all Chinese people, regardless of their spoken language, even if unofficial vernacular forms of written Cantonese, etc., exist.

In fact, for this reason (i.e., because the written forms, including the unofficial ones, of the various vernacular Chinese languages were modeled on Classical Chinese), individuals who were born after the written forms of the vernacular languages made their appearance can, with little difficulty, work their way through a Classical Chinese text. This is much easier for a Cantonese speaker who has learned baihua, which is the case for the educated.
