Recording of [ʐǎ]
Chapter 4: The Sounds of Standard Chinese, by Daniel Flaum (first version 2021-09-01; current version 2021-12-17)
Section 1: Introduction
Standard Chinese is a variety in the Mandarin Chinese group, itself the most widely spoken and influential of the Chinese language groups. The Mandarin family claims 70% of the total Chinese population as native speakers, and has the largest geographic distribution, although other Chinese languages occupy substantial portions of the country. The map below is from the Perry-Castañeda Library Map Collection, and illustrates Mandarin's extent.
Standard Chinese is one of the official working languages of the UN, making it particularly relevant worldwide in matters both political and economic (United Nations). It is an example of an isolating language, featuring a simple word structure: most words involve no more than two morphemes, and there is a focus on compounding and derivation, with very little reliance on inflection (Comrie, chap. 42).
This chapter uses the phonological inventory from The Phonology of Standard Chinese, 2nd edition, by San Duanmu. According to Duanmu, Standard Chinese has 21 consonantal phonemes and five vowels. Not counting tone, there are roughly 400 syllables, a small number that follows from the strict syllable structure noted by Comrie. Like nearly all Chinese languages, Standard Chinese has a tone system, in this case with four tonemes, which lets otherwise identical syllables be distinguished by pitch. If tones are considered, then Standard Chinese has about 1,300 syllables (Duanmu chap. 3.5).
How to interpret sound illustrations
This chapter uses several illustrations similar to this one:
These illustrations, composed using Praat, can appear overwhelming. But they succinctly display a wealth of information that helps to characterize different sounds.
The top half displays the waveform, the simplest possible visual representation of a sound, where periods of high and low air pressure are represented by peaks and troughs in the line, and time flows from left to right.
The bottom half overlays four separate sets of information. The grainy black and white field is a spectrogram. As in the waveform, time flows from left to right, but the vertical axis represents frequency. A third dimension, intensity, is shown using a range of grays from white to black. White at a point indicates that a particular frequency is absent at a particular time, while black indicates that it is present; the darker the point, the louder that frequency. The spectrogram ranges from 0 Hz to 5000 Hz, covering the frequencies that are most important in human speech.
The blue line traces the overall pitch of the sound over time; in the above example, the pitch rises slightly. The red series of dots marks the formants, frequency bands reinforced by the resonances of the vocal tract. Unlike the single pitch line, each formant can behave independently, so that one rises while another falls. Formants play an important role in determining the sounds of vowels.
Finally, the yellow line indicates intensity, or how loud the sound is at a point in time.
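For readers curious about what lies behind the gray field, here is a minimal Python sketch of how a spectrogram can be computed: the signal is cut into short overlapping frames, and a discrete Fourier transform measures how much energy each frequency carries in each frame. The function name, frame length, hop size, and Hann window are my own illustrative assumptions, not Praat's actual settings, and the naive transform used here is far slower than the FFT that real tools use.

```python
import cmath
import math


def spectrogram(samples, sample_rate, frame_len=256, hop=128):
    """Return (freqs, frames): frames[t][k] is the magnitude of frequency
    bin k in frame t. A naive DFT keeps this sketch dependency-free."""
    # Hann window tapers each frame's edges to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # bins from 0 Hz up to half the sample rate
    freqs = [k * sample_rate / frame_len for k in range(n_bins)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(n_bins):
            total = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            mags.append(abs(total))
        frames.append(mags)
    return freqs, frames


# A pure 1000 Hz tone sampled at 8000 Hz: its darkest band should sit at 1000 Hz.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(1024)]
freqs, frames = spectrogram(tone, rate)
peak = max(range(len(freqs)), key=lambda k: frames[0][k])
print(freqs[peak])  # 1000.0
```

Each inner list of `frames` corresponds to one vertical slice of the spectrogram; larger magnitudes would be drawn darker.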
Section 2: Consonants
Standard Chinese possesses 21 consonantal phonemes, or individual sounds from the perspective of speakers. The following table displays 19 of them. As in a standard International Phonetic Alphabet consonant chart, each column corresponds to a place of articulation, and each row except the last corresponds to a manner of articulation. The final row, which the standard chart lacks, displays affricates. Although SC treats affricates as single phonemes, phonetically each is a stop released directly into a fricative at the same place of articulation, the two sounds pronounced in quick succession.
Manner | Labial | Dental/alveolar | Retroflex | Velar
Stop | p, pʰ | t, tʰ | - | k, kʰ
Nasal | m | n | - | ŋ
Fricative | f | s | ʂ, ʐ | x
Lateral approximant | - | l | - | -
Affricate | - | ts, tsʰ | ʈʂ, ʈʂʰ | -
Several things are worth noting. First, the dental place of articulation is the most used, including stops, affricates, fricatives, nasals, and liquids. Second, the majority of the consonants are voiceless.
Third, three of these consonants can be syllabic and may serve as the nucleus of a syllable: [m, n, ŋ]. Fourth, SC also uses the three semivowels [j, w, ɥ]. Semivowels are difficult in the sense that they blur the distinction between vowels and consonants (Trask). I have chosen to place them among the consonants based on their behavior: they may be used in the onsets and codas of syllables.
Besides the 19 consonants and 3 semivowels above, Standard Chinese also uses the alveolo-palatals [tɕ, tɕʰ, ɕ]: the unaspirated voiceless alveolo-palatal affricate, the aspirated voiceless alveolo-palatal affricate, and the voiceless alveolo-palatal fricative. There is nothing special about them compared to the other consonants already covered. They are commonly separated from the other consonants merely due to space constraints on printed IPA charts, similar to how the f-block is set apart in the most common layout of the Periodic Table of the Elements.
Selected consonantal phonemes
We now begin a more in-depth exploration of selected consonantal phonemes.
Aspiration in the labial stops [p, pʰ]
Standard Chinese uses aspiration to distinguish between phonemes. Aspiration is a relatively strong puff of breath that follows the release of a stop (Davenport and Hannahs, sec.5.2.4.4).
An example of this is found in the unaspirated voiceless labial stop [p] and the aspirated voiceless labial stop [pʰ]. We know they are separate phonemes because we have found minimal pairs that prove this. The two words below are the SC words [pai], meaning "to worship", and [pʰai], meaning "to send".
These two words can be distinguished in the above diagrams by noting how aspiration appears in the second one: it is the period extending from the beginning of the sound to just before the blue pitch line begins. During this period, the strong unvoiced flow of air produces sound spread widely over many different frequencies, creating a "neutral" noise not unlike air flowing from a car's ventilation system.
This period is an example of a positive voice onset time, where voicing begins after the stop has already been released. Voice onset time measures the interval between the release of a stop and the moment the vocal folds begin vibrating (Trask). In the first diagram, the voice onset time is close to zero, so that voicing begins at about the moment the stop is released.
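Since voice onset time is just a subtraction, a short Python sketch can make the sign convention explicit. The function names, the 20 ms cutoff for "near zero", and the timings in the usage lines are all hypothetical values invented for illustration, not measurements from the recordings above.

```python
def voice_onset_time(release_ms, voicing_onset_ms):
    """Voice onset time in milliseconds: when voicing starts minus when the
    stop is released. Negative values mean voicing began before the release."""
    return voicing_onset_ms - release_ms


def describe_vot(vot_ms, near_zero_ms=20):
    """Rough three-way label. The 20 ms cutoff is an illustrative assumption,
    not a value taken from this chapter or its sources."""
    if vot_ms < 0:
        return "negative (voicing leads the release)"
    if vot_ms <= near_zero_ms:
        return "near zero (unaspirated)"
    return "positive (aspirated)"


# Hypothetical timings in ms, invented for illustration only.
print(describe_vot(voice_onset_time(100, 110)))  # near zero (unaspirated)
print(describe_vot(voice_onset_time(100, 180)))  # positive (aspirated)
print(describe_vot(voice_onset_time(100, 60)))   # negative (voicing leads the release)
```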
Voicing in the retroflex fricatives [ʂ, ʐ]
Standard Chinese uses voicing to distinguish between phonemes. An example of this is found in the voiceless retroflex fricative [ʂ] and the voiced retroflex fricative [ʐ]. We know they are separate phonemes because we have found minimal pairs that prove this.
The two words below are the SC words [ʂa], meaning "to kill", and [ʐa], meaning "to disturb". These are not a true minimal pair, because the two words also differ in tone. They are used here because they were the closest pair available in the UCLA Phonetics Lab Archive. Tone will be discussed in greater detail later.
These two words can be distinguished much the same way we distinguished the unaspirated and aspirated labials previously: by looking at the period from the beginning of the sound to the beginning of the blue pitch line.
In the second diagram above, the voicing of the voiced retroflex fricative is evidenced by the blue pitch line, which extends through the beginning of the sound alongside the "neutral" noise of airflow that is prominent in fricatives generally. In the first diagram, by contrast, the pitch line does not appear until just after the fricative has ended.
Be careful, though: this time around we are not dealing with voice onset times. Voice onset times are used only to describe stop consonants (Davenport and Hannahs, sec.5.2.4.4), but these are fricatives which could be thought of as having aspiration "baked" in. If we were looking at stop consonants, however, then we would discuss how the voiced stop has a negative voice onset time, whereas the unvoiced stop would have a zero-length onset time.
Section 3: Vowels
Following is a table displaying the vowel phonemes used in the Standard Chinese language. The table follows the usual form of a vowel table, where columns correspond to the tongue's position along the sagittal axis and rows correspond to the tongue's position along the longitudinal axis. These are called the backness and height of the vowels, respectively.
Front
Central
Back
High
i, y
u
Mid
ə
Low
a
Of these, the vowels [y, u] are rounded.
Selected vowel phonemes
We now begin a more in-depth exploration of selected vowel phonemes. We begin by reviewing formants.
Understanding formants
A musical instrument may play a series of notes one at a time. Here, for example, is a spectrogram of a simple major scale:
But instruments may also play chords, such as these:
Different sets of notes may be chosen to produce chords of different qualities. One of the easiest to hear is the difference between a major chord and a minor chord, each repeated here twice:
They differ only in the middle note, but people tend to perceive the major chord as sounding "happy" or "good" while the minor chord sounds "sad" or "bad".
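The claim that the two chords differ only in the middle note can be checked with a little arithmetic. This Python sketch assumes twelve-tone equal temperament and a root of A4 = 440 Hz; the actual key of the recordings is not stated, so the numbers are illustrative.

```python
def equal_tempered(root_hz, semitones):
    """Frequency a given number of semitones above root_hz (12-tone equal temperament)."""
    return root_hz * 2 ** (semitones / 12)


def triad(root_hz, third_semitones):
    # Root, third (4 semitones for major, 3 for minor), and perfect fifth (7 semitones).
    return [round(equal_tempered(root_hz, s), 2) for s in (0, third_semitones, 7)]


major = triad(440.0, 4)
minor = triad(440.0, 3)
print(major)  # [440.0, 554.37, 659.26]
print(minor)  # [440.0, 523.25, 659.26]
```

Only the middle frequency changes, by about 31 Hz, yet listeners hear a clear difference in quality.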
An analogy can be drawn between chords and vowels. A formant is like a single note in a chord, while a vowel is like the entire chord: just as chords built from different notes have different qualities, vowels built from different formants sound different.
Backness in high vowels
Standard Chinese uses backness to distinguish between vowel phonemes. Backness refers to the position of the tongue along the sagittal axis that runs from the back of the head through to the front of the head. An example of this is found in the rounded high front vowel [y] and the rounded high back vowel [u].
Unfortunately, I was unable to locate an acceptable minimal pair in the UCLA Phonetics Lab Archive. Although recordings of a pair exist, they are of different speakers in different recording environments, making them unsuitable for comparative analysis. Instead, the examples below are a non-minimal pair that still illustrates the relevant acoustic differences. They are the words [ny], meaning "daughter", and [fu], meaning "father".
The two vowels in the above diagrams can be distinguished by their formants. Recall that formants are displayed as a series of red dots. In the first diagram, the second formant from the bottom is relatively high, mostly around 2500 Hertz. But in the second diagram, this formant is much lower, nearer 1100 Hertz.
This formant, called the second formant or F2, strongly correlates with the backness of a vowel. As the tongue moves toward the front of the mouth, F2 rises in frequency; as the tongue moves toward the back of the mouth, F2 falls.
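The correlation can be turned into a crude classifier. In this Python sketch, the 1800 Hz boundary is a hypothetical cutoff I chose between the two measurements discussed above; real boundaries vary by speaker, vowel height, and rounding, so this only illustrates the direction of the effect.

```python
def backness_from_f2(f2_hz, boundary_hz=1800.0):
    """Label a vowel 'front' or 'back' from its second formant.
    boundary_hz is a hypothetical cutoff, not a value from Duanmu."""
    return "front" if f2_hz >= boundary_hz else "back"


print(backness_from_f2(2500))  # front  (about the F2 of [y] in [ny])
print(backness_from_f2(1100))  # back   (about the F2 of [u] in [fu])
```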
Section 4: Syllables and Syllable Structure
The following discussion is based on Duanmu's chapter 4. Standard Chinese has a relatively simple syllable structure. The structures of twelve basic varieties are depicted below:
These twelve varieties are now repeated at the phoneme level using a pure text notation that agrees with the symbols used in the above diagram. C represents a consonant, V a vowel, G a glide (semivowel), and CG a single surface form of a Standard Chinese phoneme consisting of a consonant followed by a semivowel. The IPA symbol for lengthening a vowel (ː) is also used.
- Vː
- VV
- VC
- CVː
- CVV
- CVC
- GVː
- GVV
- GVC
- CGVː
- CGVV
- CGVC
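The twelve templates factor neatly into an optional onset (C, G, or CG) followed by one of three two-slot rhymes (Vː, VV, VC). The Python sketch below checks that this factoring reproduces exactly the list above; the onset/rhyme split and the regular expression are my own restatement of the list, not Duanmu's formalism.

```python
import itertools
import re

# Onset slot: empty, consonant, glide, or the single CG unit.
ONSETS = ["", "C", "G", "CG"]
# Rhyme: exactly two timing slots.
RHYMES = ["Vː", "VV", "VC"]
TEMPLATES = [onset + rhyme for onset, rhyme in itertools.product(ONSETS, RHYMES)]

# The same set as one pattern: an optional onset followed by exactly one rhyme.
SYLLABLE = re.compile(r"(?:C|G|CG)?(?:Vː|VV|VC)")


def is_template(shape):
    return SYLLABLE.fullmatch(shape) is not None


print(TEMPLATES)
print(is_template("CGVC"), is_template("CCVV"))  # True False
```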
Duanmu's varieties are far from the only scholarly position, however. Much of Duanmu's discussion is devoted to alternative views and to the reasoning behind Duanmu's own choices. (Part of this may be ascribed to phonology's longstanding struggle to define and use syllables effectively, as noted by Trask.) A selection of these issues follows.
The zero-onset issue
In basic syllable theory, a syllable may be composed of up to three parts: the onset, nucleus, and coda, of which only the nucleus is mandatory. The onset and coda are consonants (possibly consonant clusters), while the nucleus is a vowel, diphthong, or triphthong. In a language, specific sequences of phones may or may not be valid syllables depending on rules (or, as we shall see in section 5, constraints) that refine these syllable forms into a set of concrete syllables.
Scholars found that the onset in Standard Chinese appeared to break this mold, however. Although a syllable could have an empty onset, they noticed that syllables which would otherwise begin with a non-high vowel frequently received an onset instead, such as [ʔ].
Some scholars explained this with a "zero-onset", an abstract phoneme that, while silent, could interact with its environment and result in a pronounceable surface form. According to these scholars, the onset of a Standard Chinese syllable was mandatory, just like the nucleus; in those rare cases where it was not filled by a consonant or semivowel, it was filled instead by the zero-onset's silent form.
Duanmu used to support this explanation, but has since revised their position. The decision hinges on whether the effect of the zero-onset is driven by a natural need: that the vocal tract cannot easily pronounce a vowel without first producing a perceptible obstruent. Proponents of the zero-onset imply that this need is not in play. Opponents, including Duanmu, assert that it is, and that therefore the zero-onset is not truly a part of Standard Chinese phonology. Their conclusion means that Standard Chinese does have optional onsets, the same as most other languages.
The rhyme issue
In basic syllable theory, a syllable's onset, nucleus, and coda can be grouped into two parts: the onset, and the rhyme, which contains both the nucleus and the coda. In Standard Chinese, however, this picture is complicated by the CG phonemes.
Scholars have adopted various positions about whether the semivowel belongs to the onset or the rhyme, seen below.
These possibilities are now described in pure text. Clockwise from top left:
- Duanmu's position, that CG are surface forms of a single phoneme belonging to the onset.
- That C and G are separate, consecutive phonemes belonging to the onset.
- That C and G are separate phonemes, with the G belonging to the nucleus in the rhyme.
- That C and G are separate phonemes, with the G belonging to a dedicated "G-slot" which forms part of the onset.
- That C and G are separate phonemes, with the G belonging to a dedicated "G-slot" which forms part of the rhyme.
(The term "G-slot" is my own creation and not used by Duanmu, who mentions only that "some believe that G has its own slot.") In addition to the above five possibilities, there is a sixth, held by those who say that both the fourth and fifth possibilities are true and occur in variation.
Duanmu justifies their choice of a single CG phoneme surface form with several arguments. Here are summaries of the first four:
- That the English word [swei] ("sway") and the Standard Chinese word [swei] ("age") sound different, such that English [sw] can be more easily perceived as two sounds than Chinese [sw].
- That all observed CG sounds in Standard Chinese can be described as a single phone when phones are built using distinctive features.
- That the presence or absence of a semivowel does not significantly affect the perceived length of a syllable in minimal pairs.
- That there are observed instances where a CG surface form alternates with a simple C form.
This issue in particular illustrates how the Standard Chinese phonological literature has progressed, in this case through Duanmu's application of distinctive features.
The final position issue
Duanmu divides the twelve syllable structures from the beginning of this section into heavy and light syllables and asserts that it is reasonable to expect syllables of the same weight to also possess the same duration.
But they point out that the same syllable may take on different durations depending on whether it is followed by a pause. They use as an example the word for "horse":
- When followed by another word: [maː]
- When followed by a pause: [maːa]
Besides their duration, these two cases differ in their pronounced tone. We'll return to tone in section 6, but for now it is enough to know that this word uses a falling-rising tone, and the second case's greater length makes this significantly more evident. Duanmu claims that the lengthier latter case can be expressed in their syllable system by using either one or two syllables:
Unlike with the other issues, Duanmu does not choose a side here.
Section 5: Allophonic Alternations
The physical workings of the vocal tract, together with the differences between individual speakers' voices, mean that not all sequences of sounds are equally easy to pronounce. To cope with this, languages develop phonemes: abstract units whose different pronunciations do not change a word's meaning. During speech production, speakers select (usually unconsciously) which specific sound, or allophone, of a phoneme to use.
Over the years, linguists have developed different ways to describe and explain these variations. Two terms which are vital to this are surface form and underlying form. The underlying form is the phoneme itself, while the surface form is the allophone which ends up being chosen by the speaker for pronunciation.
The constraint-based approach
The following discussion is based on Duanmu's chapter 3. Duanmu begins by noting how previous scholars have differed in many ways about how to characterize Standard Chinese phonology. Such differences range from relatively trivial (such as which IPA symbol to use to transcribe the same sound) to more consequential (such as whether certain pairs of sounds are contrastive or not). The author then explains how they choose to resolve these differences when possible and which sides (or new positions) they take when resolution is not possible. One of the most important choices the author makes is whether to use a rule-based or constraint-based approach when dealing with how phonemes combine and vary into their allophones.
The rule-based approach is more traditional, wherein we write things such as
[ə] → [−back] / __[−back]
to mean that the sound [ə] loses its backness when it precedes a front vowel, so that it becomes [e] or [ɛ]. In this approach, a linguist may devise a set of rules which apply in a specified sequence like a pipeline, each rule taking as input the result of the previous rule's application.
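The pipeline idea can be sketched in a few lines of Python. Segments are represented as a list of IPA symbols, which is a simplifying assumption. The first rule is the [ə] → [−back] rule above, realized here as [e]; the second rule, deleting a word-final [i], is purely hypothetical and exists only to show that the order in which rules apply can change the result.

```python
FRONT_VOWELS = {"i", "y", "e"}  # assumption: the segments treated as [-back] triggers


def front_assimilation(segments):
    """[ə] -> [−back] / __[−back]: ə becomes e before a front vowel."""
    out = list(segments)
    for idx in range(len(out) - 1):
        if out[idx] == "ə" and out[idx + 1] in FRONT_VOWELS:
            out[idx] = "e"
    return out


def final_i_deletion(segments):
    """Hypothetical rule, for illustration only: delete a word-final i."""
    return list(segments[:-1]) if segments and segments[-1] == "i" else list(segments)


def derive(underlying, rules):
    """Apply rules in order; each rule takes the previous rule's output."""
    form = list(underlying)
    for rule in rules:
        form = rule(form)
    return "".join(form)


print(derive("əi", [front_assimilation, final_i_deletion]))  # e
print(derive("əi", [final_i_deletion, front_assimilation]))  # ə
```

In the second ordering the deletion removes the [i] before assimilation can see it, which is exactly the kind of ordering dependency a rule-based analysis must specify.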
The alternative is a constraint-based approach. Best known through Optimality Theory, which was originally developed for phonology, this approach recasts ordered rules as ranked constraints. The example rule above becomes
Two adjacent vowels must agree in backness
During evaluation, all possible surface forms are considered simultaneously. Candidates are compared on the highest-ranked constraint first, and lower-ranked constraints only break ties; the form whose violations are least serious by this measure "wins" and is selected for pronunciation.
For example, suppose we have the phoneme /ə/ with the three allophones [ə, e, ɤ]. Suppose also that we have two named constraints, ranked in the order listed:
- Front-agreement: Two adjacent vowels must agree in frontness
- Back-agreement: A vowel preceding a velar consonant must be a back vowel
Finally, suppose our phoneme appears in the environment [i_k]. Here we have a conflict. The first constraint demands that our phoneme become [e], but the second demands that it become [ɤ]. To work this out, we produce a constraint tableau:
/iək/ | Front-agreement | Back-agreement
[iək] | * | *
[iek] |   | *
[iɤk] | * |
Each row corresponds to one possible surface form, each the result of using one of our phoneme's allophones. Each column corresponds to one of our constraints, and an asterisk indicates that the surface form violates that constraint.
We can see that the first form, [iək], violates both constraints. The second form, [iek], violates only back-agreement. The third form violates only front-agreement.
Since front-agreement is ranked above back-agreement, it is more important not to violate front-agreement than back-agreement. Consequently, even though every possible form violates some constraint, [iek] wins and will be pronounced.
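The tableau can be evaluated mechanically. In this Python sketch, each constraint is a function that counts violations, and candidates are compared by their violation counts in rank order. The vowel and consonant classes and the function names are my own assumptions for illustration; this is a toy evaluator for the example above, not a full Optimality Theory implementation.

```python
VOWELS = set("iyeəaɤou")
FRONT = set("iye")   # assumption: symbols treated as front vowels
BACK = set("uɤo")    # assumption: symbols treated as back vowels
VELARS = set("kŋx")


def front_agreement(form):
    """One violation per pair of adjacent vowels that disagree in frontness."""
    return sum(1 for a, b in zip(form, form[1:])
               if a in VOWELS and b in VOWELS and (a in FRONT) != (b in FRONT))


def back_agreement(form):
    """One violation per vowel before a velar consonant that is not a back vowel."""
    return sum(1 for a, b in zip(form, form[1:])
               if a in VOWELS and b in VELARS and a not in BACK)


def evaluate(candidates, ranked_constraints):
    """Return the optimal candidate. Violation profiles are tuples in rank
    order, and Python compares tuples left to right, so a higher-ranked
    constraint decides before a lower-ranked one is consulted."""
    return min(candidates, key=lambda form: tuple(c(form) for c in ranked_constraints))


candidates = ["iək", "iek", "iɤk"]
print(evaluate(candidates, [front_agreement, back_agreement]))  # iek
print(evaluate(candidates, [back_agreement, front_agreement]))  # iɤk
```

Reversing the ranking makes [iɤk] win instead, which shows that in this approach the ranking does the work that rule ordering does in the rule-based approach.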
This constraint-based approach is Duanmu's choice for considering allophonic variation in Standard Chinese.
G-Spreading
Duanmu discusses a constraint they call G-Spreading, and how it produces surface forms from underlying forms. The "G" in G-Spreading stands for glide, or semivowel. The G-Spreading constraint is expressed as follows:
A high nuclear vowel spreads to the onset C like so
[Ci] → [Cji]
[Cu] → [Cwu]
[Cy] → [Cɥy]
The G-Spreading constraint describes how the first nuclear vowel of a syllable may demand a change in the surface form of the onset consonant. Specifically, the onset consonant becomes palatalized, labialized, or labio-palatalized, depending on the backness and rounding of the high vowel involved.
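The three mappings can be expressed as a small lookup. This Python sketch represents an onset and a rhyme as plain strings of IPA symbols, which is a simplifying assumption; the function name is hypothetical. When there is no onset consonant, it leaves the form unchanged, since the constraint as stated spreads the glide to an onset C.

```python
# Glide that each high vowel spreads to the onset, from the three mappings above.
GLIDE_FOR = {"i": "j", "u": "w", "y": "ɥ"}


def g_spread(onset, rhyme):
    """Return onset + rhyme, inserting the glide that matches a rhyme-initial
    high vowel after the onset consonant."""
    glide = GLIDE_FOR.get(rhyme[:1])
    if onset and glide:
        return onset + glide + rhyme
    return onset + rhyme


print(g_spread("t", "i"), g_spread("k", "u"), g_spread("n", "y"), g_spread("m", "a"))
```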
Variation in vowels
Duanmu describes how Standard Chinese's vowel phonemes surface in different forms depending on their environment. For example, the mid vowel may take one of three different surface forms when it appears in an open syllable:
- /ə/ → [eː] / j_
- /ə/ → [eː] / ɥ_
- /ə/ → [oː] / w_
- /ə/ → [ɤː] / elsewhere
In addition to these rules, /ə/ also becomes [oː] when it follows a labial consonant.
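Taken together, these conditions amount to a small decision procedure. The Python sketch below encodes the four environments plus the labial rule, using the labials from the consonant table; the function name is hypothetical, and an empty string stands for "no preceding segment", which falls under "elsewhere".

```python
LABIALS = {"p", "pʰ", "m", "f"}  # labial consonants from the consonant table


def mid_vowel_in_open_syllable(preceding):
    """Surface form of /ə/ in an open syllable, given the segment before it."""
    if preceding in ("j", "ɥ"):
        return "eː"
    if preceding == "w" or preceding in LABIALS:
        return "oː"
    return "ɤː"  # elsewhere


for before in ["j", "ɥ", "w", "p", "k", ""]:
    print(repr(before), "->", mid_vowel_in_open_syllable(before))
```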
But there are also cases where Standard Chinese vowels do not seem to vary. For example, there is no evidence that the long vowels [iː, uː, yː, aː] ever vary. For the high vowels, Duanmu suggests this is because they cannot vary without risking conflation with other vowels. For the low vowel, though, Duanmu can only say that it could vary but simply happens not to.
Unfortunately, Duanmu's descriptions are relatively thin and not always confident. For example, Duanmu's coverage of the mid vowel in closed syllables is only two sentences long:
In closed syllables, the mid vowel may have different shades of variation in different environments, but there is no evidence that such variation is relevant or required. For example, the mid vowel is somewhat rounded in [əu], but there is no phonological evidence that it has become [o].
Section 6: Focus on Tone in Standard Chinese
Thanks to their influential status on the world stage, Chinese languages are perhaps the best known examples of tone languages, where the pitch of the speaker's voice can create differences in the meaning of words. The following discussion is based on Duanmu's chapter 10.
Tonemes
Duanmu never uses the term "toneme", but since they describe how the four Standard Chinese tones surface differently in different kinds of syllables, perhaps they should. According to Trask, a toneme is "any one of two or more distinctive tones in a particular language which can serve alone to distinguish words. By analogy with phoneme." Just as a single underlying phoneme surfaces as one of its allophones, so does a toneme surface as an allotone.
The tonemes of Standard Chinese
Duanmu describes the four tonemes. The literature, such as in Comrie, is relatively consistent about which number corresponds to which toneme, and they are frequently introduced like so:
Tone | Common description
Tone 1 | High level
Tone 2 | High rising
Tone 3 | Dipping/falling
Tone 4 | High falling
But Duanmu prefers not to use such names, instead using T1, T2, T3, and T4 throughout.
The inconsistency of previous treatments of tone
Duanmu begins by pointing out how scholars have differed radically in the past over how to handle tone in Chinese languages, including fundamental matters such as how to transcribe tones and whether tone is a prosodic feature: a feature that applies to more than one sound at a time.
For example, Duanmu presents no fewer than five systems for transcribing tone. Here are three, one in each row:
System | Tone 1 | Tone 2 | Tone 3 | Tone 4
Numerals | ma1 | ma2 | ma3 | ma4
IPA tone letters | ma˥ | ma˦˥ | ma˧˩˧ | ma˥˩
Diacritics | mā | má | mǎ | mà
The first system uses numerals to identify the tone, the second uses IPA tone letters, and the third uses a set of diacritics. Duanmu notes that these systems make various tradeoffs, but that none of them are based on recent progress in the field.
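To make the relationship between the three systems concrete, here is a small Python sketch that converts the numeral system into the other two. The function names are hypothetical, the tone letters are copied from the table above, and the rule for where the diacritic goes is the usual pinyin convention rather than anything Duanmu specifies. The neutral tone and capital letters are not handled.

```python
import unicodedata

# Combining marks for the diacritic system: macron, acute, caron, grave.
DIACRITICS = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}
# Tone letters as given in the IPA row of the table above.
TONE_LETTERS = {1: "˥", 2: "˦˥", 3: "˧˩˧", 4: "˥˩"}


def split_numeral(syllable):
    """'ma3' -> ('ma', 3)."""
    return syllable[:-1], int(syllable[-1])


def to_tone_letters(syllable):
    base, tone = split_numeral(syllable)
    return base + TONE_LETTERS[tone]


def to_diacritic(syllable):
    """Mark a or e if present, o in 'ou', and otherwise the last vowel."""
    base, tone = split_numeral(syllable)
    if "a" in base:
        pos = base.index("a")
    elif "e" in base:
        pos = base.index("e")
    elif "ou" in base:
        pos = base.index("o")
    else:
        pos = max(i for i, ch in enumerate(base) if ch in "iouü")
    marked = base[:pos + 1] + DIACRITICS[tone] + base[pos + 1:]
    # NFC composes a letter plus a combining mark into one character where possible.
    return unicodedata.normalize("NFC", marked)


for n in range(1, 5):
    print(f"ma{n}", to_tone_letters(f"ma{n}"), to_diacritic(f"ma{n}"))
```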
Duanmu's approach to tones
Instead of trying to distill a usable tone system from the existing literature, Duanmu builds and justifies their own.
They begin by considering the phonetic foundation of tones, including their acoustic and articulatory properties. Duanmu chooses to focus on two muscles: the cricothyroid and the vocalis. The first is thought to drive the pitch of the voice: as the muscle contracts, the vocal folds are stretched tighter, resulting in faster vibrations and therefore a higher pitch. The second is thought to drive the breathiness of the voice, although its action is more complex than that of the cricothyroid. Duanmu mentions the vocalis because, although Standard Chinese does not rely on breathiness for contrast, the vocalis can also affect the pitch produced, and is used by Standard Chinese speakers.
Next, Duanmu builds two phonological features of a tone: pitch and register. Duanmu's phonological pitch is not the same as phonetic or acoustic pitch. Phonetic and acoustic pitches exist on a continuum, but Duanmu's phonological pitch is discrete, and so is register.
Duanmu then claims that pitch and register each may take on one of two values, and the four resulting combinations produce four tone levels. Phonological pitch can be thin or thick (raising or lowering the acoustic pitch, respectively), and register can be stiff or slack (likewise). The four tone levels are named and listed with their features below:
- Non-breathy H(igh), of thin pitch and stiff register
- Non-breathy L(ow), of thick pitch and stiff register
- Breathy H, of thin pitch and slack register
- Breathy L, of thick pitch and slack register
Duanmu's approach is not entirely new. They mention the work of a prior scholar, Yip, who developed an extremely similar system. But Duanmu's differs on a key point. Whereas Yip claims that the tone levels may be ordered from low to high based on their acoustic pitch, Duanmu's approach attempts to conform to evidence that this cannot be so. Duanmu is aware of instances where, at one moment in one syllable, one tone level X may be higher than another Y but, at another moment in a different syllable, X may change to be lower than Y. In other words, Duanmu's system does a better job of connecting with phonetic and acoustic reality.
With four tone levels established, Duanmu turns to constructing the possible allotones themselves (see the discussion of tonemes above). Ignoring register for the moment and limiting tones to at most three levels, Duanmu finds six:
- There are two level tones: H and L
- There are two simple contour (changing) tones: Rising (LH) and falling (HL)
- There are two complex contour tones: Rise-fall (LHL) and fall-rise (HLH)
Duanmu then adds register, which is influenced by a historical connection to the voicing of syllable onsets, doubling the possibilities to twelve.
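The count of six, then twelve, can be reproduced by enumeration. In this Python sketch, I assume that a tone shape is any sequence of H and L, at most three levels long, in which no level immediately repeats; this restriction is my own way of recovering Duanmu's list, not a rule stated in the chapter.

```python
import itertools


def contour_shapes(max_levels=3):
    """Tone shapes over H and L, up to max_levels long, with no level
    immediately repeated (an assumption that reproduces the six shapes)."""
    shapes = []
    for length in range(1, max_levels + 1):
        for seq in itertools.product("HL", repeat=length):
            if all(a != b for a, b in zip(seq, seq[1:])):
                shapes.append("".join(seq))
    return shapes


shapes = contour_shapes()
print(shapes)  # ['H', 'L', 'HL', 'LH', 'HLH', 'LHL']

# Crossing each shape with the two registers doubles the count.
with_register = list(itertools.product(shapes, ["stiff", "slack"]))
print(len(with_register))  # 12
```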
How tones relate to syllable structure
Duanmu's approach deliberately makes tone levels similar to phones: both are built with features. But does this mean that they can be combined, to produce a wider set of tone-bearing phones? It's not so simple.
The problem is that tones may require periods of different lengths to pronounce, and these periods do not trivially map to either syllables, rhymes, nuclei, or phones.
So Duanmu instead relies on moras as the tone-bearing unit. A mora is a unit that can receive a single tone level. In Duanmu's system, a single phone may receive a number of moras depending on its length and other properties, and tone levels are assigned to moras from left to right, irrespective of which phone the moras belong to. For example, see how moras behave in the words [mai] ("to sell") and [maː] ("to scold"), both of which use a falling tone:
Because the falling tone involves two tone levels, it needs two moras. In the word [mai], each vowel provides one mora. But in the word [maː], the single vowel [a] must be lengthened to make time for both required moras.
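Left-to-right assignment is easy to sketch. In this Python example, moras are listed by the phone they belong to, so the long [aː] contributes two entries. The function name is hypothetical, and letting the last level spread over any leftover moras is an assumption of this sketch, not a claim from the chapter.

```python
def assign_tone(moras, levels):
    """Link tone levels to moras one-to-one, from left to right."""
    if len(levels) > len(moras):
        # Too few moras: the vowel would have to be lengthened.
        raise ValueError(f"{len(levels)} tone levels need at least {len(levels)} moras")
    linked = list(zip(moras, levels))
    # Hypothetical spreading of the last level over any extra moras.
    linked += [(m, levels[-1]) for m in moras[len(levels):]]
    return linked


FALLING = ["H", "L"]
print(assign_tone(["a", "i"], FALLING))  # [mai]: [('a', 'H'), ('i', 'L')]
print(assign_tone(["a", "a"], FALLING))  # [maː]: [('a', 'H'), ('a', 'L')]
```

Calling `assign_tone(["a"], FALLING)` raises an error, mirroring the point that a short [a] has no room for both levels of the falling tone.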
The underlying forms of the tonemes
For each of the tonemes, Duanmu chooses the following underlying forms:
Toneme | Underlying form
Tone 1 | H
Tone 2 | LH
Tone 3 | L
Tone 4 | HL
Except for Tone 3, all tone levels have a stiff register and are non-breathy.
Tone 3 deserves particular attention because it is often introduced as a complex contour tone. Here is an example from the UCLA Phonetics Lab Archive of the word for "horse", where the blue pitch line can be seen to fall and then rise:
For the fall, Duanmu argues that it is caused by the breathiness of this tone and is so slight that it should not count (Duanmu 237). For the rise, Duanmu argues that it is driven by causes external to the toneme. Specifically, when the rise appears it is because it is needed to avoid a polarity reversal with the following tone.
Section 7: References
- “Chinese, Standard Chinese.” UCLA Phonetics Lab Archive, http://archive.phonetics.ucla.edu/Language/CMN/cmn.html.
- Comrie, Bernard. The World’s Major Languages. 3rd ed., 2020.
- Davenport, Michael, and S. J. Hannahs. Introducing Phonetics and Phonology. 4th ed., Routledge, 2020.
- Duanmu, San. The Phonology of Standard Chinese. 2nd ed., Oxford University Press, 2007.
- Trask, Robert L. A Dictionary of Phonetics and Phonology. Reprinted, Routledge, 2006.
- United Nations. “Official Languages.” United Nations, https://www.un.org/en/our-work/official-languages.