Int J Speech Technol (2017) 20:305–326 DOI 10.1007/s10772-017-9409-1
Discourse prosody planning in native (L1) and nonnative (L2) (L1-Bengali) English: a comparative study Shambhu Nath Saha1 · Shyamal Kr Das Mandal1
Received: 28 September 2016 / Accepted: 10 April 2017 / Published online: 19 April 2017 © Springer Science+Business Media New York 2017
Abstract This paper presents a comparative study of discourse-level speech planning in L1 and L2 (L1 Bengali) English, investigating how the two speaker groups differ in the organization of discourse-level speech planning. For this purpose, English speech of the same discourse, read by 10 L1 English and 40 L1 Bengali speakers, is analyzed in terms of prosodic and acoustic cues by applying a hierarchical discourse prosody framework. Between-group differences in discourse-level speech planning are found in speech rate, in the locations of discourse boundary breaks, and in the size and scope of speech planning and chunking units. The results show that the speech rate of L1 English speakers is higher than that of L2 English speakers, and that L2 English speakers realize more boundary breaks than L1 English speakers at every discourse level, which indicates that L2 English speakers use more intermediate chunking units and more larger-scale planning units than L1 English speakers. Between-group differences are also found in the analysis of the phrase component at the prosodic phrase level and the accent component at the prosodic word level. These findings can be attributed to L2 English speakers’ improper phrasing, improper word-level prominence and an ambiguous distinction between content words and function words. The study concludes that the deficiencies in L1 Bengali speakers’ English discourse-level speech planning strategy, compared to L1 English speakers, are due to the influence of L1 (Bengali) prosody at the L2 discourse level.
Keywords Discourse · Discourse-level speech planning · Discourse prosody framework · F0 contour · Phrase component · Accent component · Prosodic phrase · Prosodic word · Content word · Function word · Most prominent word · Less prominent word
* Shambhu Nath Saha [email protected] · Shyamal Kr Das Mandal [email protected]
1 Centre for Educational Technology, Indian Institute of Technology, Kharagpur, India
1 Introduction
English is an international language of communication, and its importance as a second language, both spoken and written, is growing. The blending of English with local languages and dialects has given rise to a wide variety of world Englishes, which exhibit rich variation in pronunciation, lexicon and grammar. Understanding the range of variation present in the English spoken in the world today is therefore a fundamental issue for the development of English language education as well as spoken language science and technology. Asia is an important market for English language education, and it is important to learn about Asian language speakers’ English and identify its features. Therefore, the Asian English Speech cOrpus Project (AESOP) was launched in order to construct a common, shared, large-scale English speech corpus of Asian language speakers (Meng et al. 2009). Its primary aim is to collect nonnative (L2) English speech corpora using a common recording platform and to compare English speech corpora from different Asian countries using a consistent set of core materials, in order to derive a set of core properties common to all varieties of Asian English and to discover features that are particular to individual dialects.
In India, ‘English’ serves as a language of wider communication among the people. Moreover, India has a large number of regional varieties of English, each different from the others in certain ways and retaining to some extent the phonetic patterns of the Indian language spoken in that particular region. These regional varieties of English are sometimes not even mutually intelligible (Bansal 1966). India has 1683 spoken languages and more than 30,000 dialects (Chaudhary 2009). Because the spoken English of L2 speakers is influenced by their L1, there are several regional varieties of spoken English in India. The spoken English of a speaker may also vary with the dialect of the speaker’s mother tongue, so it depends not only on the local language but also on the particular dialect of that language. Thus, research on the spoken English of Indian speakers from a multidisciplinary perspective is urgently needed. Like other Eastern Indo-Aryan languages, Bengali arose from the eastern Middle Indo-Aryan dialects of Magadhi Prakrit and Pali. It is the official state language of the eastern Indian state of West Bengal and the national language of Bangladesh. With nearly 230 million total speakers, Bengali is one of the most spoken languages in the world, ranking fifth (Lewis et al. 2009). Dialect-wise, Bengali is divided into two main branches: Western and Eastern. The Western branch consists of the Rarha (South), Varendra (North Central) and Kamrupa (North Bengal) dialect clusters. Rarha is further sub-divided into South Western Bengali (SWB) and Western Bengali; Standard Colloquial Bengali (SCB) is spoken around Kolkata (Bhattacharya 1988). This study considered L1 speakers of Standard Colloquial Bengali. There are certain differences in English utterances between L1 English and L1 Bengali speakers. One of the most common causes of these differences is the difference in prosodic pattern between English and Bengali.
English is a stress-timed language with quasi-uniform durations between consecutive stressed syllables (Roach 1998). The English pitch contour also has specific linguistic and paralinguistic importance. Bengali, on the other hand, is a syllable-timed language in which all syllables, whether stressed or unstressed, tend to occur at regular time intervals and have quasi-uniform durations (Hayes and Lahiri 1991), which generates a very different speech prosody (rhythm, intonation) compared with English. The F0 contour has a significant role in measuring prosodic differences between L1 and L2 English speech. As the importance of English for communication grows day by day, it is necessary for L1 Bengali speakers to produce English speech correctly. There are many varieties of L2 English due to the influence of L1 prosody at the L2 discourse level. Therefore, a systematic study of the various factors that influence L1 and L2 strategies in the organization of discourse-level speech planning is required for second language acquisition.
Discourse is defined as a piece of language behavior that involves multiple utterances and multiple participants. In linguistics, discourse may be defined as a unit of language longer than a single sentence. More broadly, discourse is the use of spoken or written language in a social context. Prosody is always present in spoken discourse, where it is known as discourse prosody. In discourse prosody, utterances are phrased into constituents and are hierarchically organized into various domains at different levels of the discourse-level speech planning organization. Speakers’ planning of spoken discourse, known as discourse prosody planning, varies from language to language. It involves more elaborate planning of discourse structure (DS) in addition to information structure (IS). Basically, discourse prosody planning specifies speech strategies in relation to prosody at every level of the discourse-level speech planning organization. The purpose of discourse prosody planning is to minimize the difficulty and processing load of speech production, as well as to improve the comprehensibility, intelligibility, and naturalness of speech. The objective of this paper is:
• To conduct a comparative study of discourse-level speech planning in English spoken by L1 English and L1 Bengali speakers in order to investigate the between-group differences at every level of the discourse-level speech planning organization. For this purpose, a hierarchical discourse prosody framework, HPG (Hierarchy of Prosodic Phrase Group), is used in data analysis.
2 Literature review
There are different types of studies on the prosodic realization of discourse structure. Research by Lehiste (1975), Hirschberg and Grosz (1992) and Den Ouden et al. (2009) has identified systematic correlations between aspects of discourse structure and prosodic measures of pitch, pause duration, intensity, and speech rate. This indicates that the prosody of speech can carry cues to the structure of discourse. One common feature of discourse with which prosody is correlated is the depth of discourse boundary. There are diverse criteria to identify boundaries of different size. These criteria include orthographic markers of paragraph boundaries (Lehiste 1975, 1982) and intuitive analyses of breaks in the discourse, either by the experimenter (Yule 1980) or by the participants themselves (Swerts 1997). Other work creates measures of boundary size using a specific theory of discourse structure, e.g. Den Ouden et al. (2009) use Rhetorical Structure Theory (Mann and Thompson 1988) and Hirschberg and Grosz (1992) use the Grosz and Sidner (1986) model. The prosodic measures most commonly found to correlate with discourse structure have been pause duration and pitch maxima, though others have been explored as well. Pause durations have tended to be longer at larger discourse boundaries (Den Ouden et al. 2009; Lehiste 1982). The results from Silverman (1987) and Mayer et al. (2006) suggest that intonation may play a more important role at the level of discourse. Mayer et al. (2006) found that pause and pitch manipulations together biased interpretation but either alone had no effect. On the other hand, Silverman (1987) found a stronger effect when both pause and pitch information were available, but the pitch manipulations alone still had a significant effect. Recently, there have been some comparative studies of discourse-level speech planning in L1 and L2 English based on the relationship between discourse structure and prosody. In particular, Tseng (2006) addressed the higher-level organization of discourse-level speech planning based on Mandarin Chinese speech prosody and performed acoustic analyses, including F0 patterns, syllable duration and intensity distribution, at every level of this organization. As a result, a hierarchical multi-layer, multi-phrase prosody framework, HPG, was postulated that accounts for higher-level discourse organization. Furthermore, in the studies by Chen et al. (2016), Chen and Wei-te Fang (2016) and Visceglia et al. (2010), L1 English and L1 Taiwan Mandarin English speech data were analyzed in terms of prosodic and acoustic cues by applying the HPG framework to find similarities and differences in discourse-level speech planning between L1 and L2 English. The results showed that both groups produced similar configurations of acoustic contrasts to signal discourse units and boundaries, but L1 English speakers were found to produce these cues more robustly.
Moreover, between-group differences in discourse units were also found in the distribution of discourse break levels and break locations, where L2 English speakers used more intermediate chunking units and fewer large-scale planning units compared to L1 English speakers in the discourse-level speech planning organization. Realization of the prosodic cues to information structure is a challenge for L2 speakers, and failure to realize focus may reduce L2 speakers’ level of comprehensibility and may contribute to listeners’ difficulty in extracting their intended meaning or in following their discourse structure. Recent research by Chen and Wei-te Fang (2015) investigates differences in the acoustic cues used to realize L1 and L2 focus (prominence), and determines whether L2 differences can be attributed to transfer of L1 prosodic strategies to realize focus (prominence). Visceglia et al. (2012) investigated differences in the realization of English narrow focus by L1 speakers of North American English, Taiwan Mandarin, and Beijing Mandarin. The results of this study showed that the differences between L1 and L2 realization of narrow focus can be attributed to transfer of L1-specific prosodic features, L2-specific processing strategies, or a combination of the two. Tseng and Su (2014) investigated discourse chunking and information arrangement through analysis of perceived degrees of prominence. Comparison of L1 and L2 (L1 Mandarin) English showed that emphasis patterns are shared regardless of language and speech type; the difference lies only in their distribution. Moreover, L2 speakers used less varied patterns as well as fewer types of variation, and L2 speakers’ arrangements of information structure were less complex and less varied than those of L1 speakers. This study hypothesized that higher-level discourse planning through patterns of chunking and phrasing, and information structure through placement of emphases by degree, may further cause prosodic deviations. The F0 contour has a significant role in measuring prosodic differences between L1 and L2 speech. Comparative studies of L1 and L2 discourse structure focus on features of speakers’ F0 production and identify differences in the use of prosodic cues by L2 speakers (Anderson-Hsieh et al. 1992; Chen and Tseng 2015; Hirschberg and Pierrehumbert 1986; Pickering 2001; Tyler et al. 1988; Tyler and Davies 1990; Wennerstrom 1998). In a study of 30 speakers from three L1 backgrounds performing a read-aloud task, Wennerstrom (1994) found that while Spanish L2 speakers and an L1 speaker control group marked a new topic with a high pitch onset, Thai and Japanese L2 speakers demonstrated no paragraph-initial pitch change. In a later study, Wennerstrom (1998) examined 10–12 min lectures given by 18 L1 Mandarin Chinese speakers and found a significant correlation between the students’ scores on a spoken exam and their ability to indicate rhetorical units by expanding their pitch range.
Studies by Beckman and Pierrehumbert (1986) and Nariai and Tanaka (2008) compared Japanese and English prosody and revealed that Japanese accents have a fixed tonal shape and that the only sources of variation seem to be different choices of phrasing and of pitch range, which produced deficiencies in Japanese English. Furthermore, Pickering (2004) found that while L1 English speakers produce a hierarchical system of prosodic units to create semantic cohesion within spoken paragraphs and to signal aspects of information structure, there is evidence that L1 Mandarin international teaching assistants are much less adept at using prosodic cues in this way, even when they are necessary to clarify aspects of sequencing and emphasis. Xiaoli et al. (2009) performed a comparative study of the prosody of English and Chinese and found that prosodic differences between L1 and L2 English lie mainly in the location of stress and in the types of boundary tone and pitch accent. In the studies by Visceglia et al. (2010) and Tseng and Su (2014), L1-L2 comparisons of English and Mandarin were performed using F0 contours across prosodic units of different size. The results showed that English requires sharper high/low contrasts by phrase and by word, while Mandarin does not rely on contrast degree to realize phonological and prosodic differentiation; the studies also showed that L2 English sounds flatter than L1 English. Busa and Urbani (2011) carried out a preliminary investigation of the differences in pitch range in English spoken as L1 and L2 and found that Italian speakers of English have a narrower pitch range and less pitch variation than L1 English speakers, resulting in a prosodic difference between L1 and L2 speech. From the review of the literature, it is reasonable to suggest that a comparative study of discourse planning in L1 and L2 English, based on the relationship between discourse structure and prosody, is an important issue in L2 acquisition. In the present study, English speech of the same discourse, produced by L1 English and L1 Bengali speakers, is analyzed using prosodic as well as acoustic cues to the structure of discourse in order to investigate between-group differences in the discourse-level speech planning of L1 and L2 (L1 Bengali) English speakers.
3 Speech material
The material used for the present study was Aesop’s fable “The North Wind and the Sun” (Mondonedo 1999), which covers a large range of segmental and suprasegmental characteristics of English. This phonetically balanced passage, recommended by the IPA, contains 144 syllables, 113 words, 8 independent clauses, 5 dependent clauses, 5 sentences, 3 paragraphs and all English phonemes. The material was read by 10 (5 male, 5 female) L1 American English speakers aged between 21 and 28 years (detailed description given in Appendix 1). They were all originally from the United States (U.S.) and were undergraduate or postgraduate students. The material was also read by 40 (20 male, 20 female) L1 Bengali speakers whose native language was Standard Colloquial Bengali (SCB) (detailed description given in Appendix 2). All L1 Bengali speakers were aged between 20 and 35 years and had studied English as a second language for a minimum of 10 years. They were originally from Kolkata in West Bengal and had either completed undergraduate degree studies or were continuing their postgraduate studies or research. The speech was recorded using AESOP’s recording toolkit on AESOP’s specified recording platform (detailed description given in Appendix 3). To ensure fluent reading, the speakers were instructed to read the text several times before recording and then to read the material aloud. The speech was digitized at a sampling rate of 16 kHz with a resolution of 16 bits/sample.
4 Methodology
To compare L1 and L2 (L1 Bengali) English speech planning at discourse level, the hierarchical multi-layer discourse prosody framework HPG (Hierarchy of Prosodic Phrase Group) was used in the present study. HPG is a fundamental framework of discourse prosodic organization and is based on the different discourse units located inside the different levels of boundary breaks across the speech flow (Tseng 2006). Figure 1 shows a schematic representation of the hierarchical discourse prosody framework HPG. From the bottom up, the layered nodes are syllables (SYL), prosodic words (PW), prosodic phrases (PPh), breath groups (BG) and prosodic phrase groups (PG), which correspond to speech paragraphs, and the discourse. Optional discourse markers (DM) and prosodic fillers (PF) exist between phrases (House 2013) and act as linkers and transitions within and across PGs. These constituents are associated with discourse boundary breaks B1 to B5 respectively. B1 denotes a syllable boundary at the SYL layer; B2 denotes a minor break at the PW layer; B3 denotes a major break at the PPh layer; B4 denotes a break at the BG layer, where the speaker is out of breath and takes a full breath; and B5 denotes the longest break, at the PG layer (Tseng et al. 2005). When a speech paragraph is relatively short and does not exceed the speaker’s breathing cycle, the top two layers of the framework, BG and PG, merge into the PG layer. In the passage “The North Wind and the Sun”, the discourse units and the 5 levels of discourse boundaries (B1 through B5) associated with them were manually tagged by linguists based on syntactic rules. By default, the boundary breaks, discourse units and their relationships are SYL/B1 < PW/B2 < PPh/B3 < BG/B4 < PG/B5.
Fig. 1 A schematic representation of hierarchical discourse prosody framework (tree with Discourse at the top, followed by PG, BG, PPh, PW and SYL nodes, with optional DM nodes between phrases)
Next, for each speaker, the recorded speech data was examined manually using the open source software Praat to identify a physical pause (silence) or a break in the speech event (in the case of a vocalic boundary) at each boundary break tagged manually in the passage, in order to verify the presence of the discourse unit. To investigate the differences between the speech planning of L1 and L2 English speaker groups at different discourse levels, the prosodic cues of speech rate, discourse boundary break realization, chunking and planning unit size, and pause duration at different discourse boundaries were analyzed by applying the HPG protocol to L1 and L2 English speech of the same discourse. Since a change of breath (boundary B4) and a change of paragraph (boundary B5) are assumed to reflect different levels and units of higher-level discourse planning, the consistency of discourse boundaries B4/B5 within the L1 and L2 English speaker groups was also measured to assess the consistency of discourse planning in the text. In addition, to measure prosodic differences between L1 and L2 English speech at discourse level, an acoustic analysis of the F0 contour was performed at the PPh and PW levels.
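The nesting SYL/B1 < PW/B2 < PPh/B3 < BG/B4 < PG/B5 implies that a unit at a given layer is closed by any break of that layer's level or higher. A minimal Python sketch of this chunking rule is given below. The function name `chunk_units`, the token/break-index data layout and the toy break labels are illustrative assumptions; they are not the authors' tagging tool or the actual labels of the passage.

```python
# Hypothetical illustration of HPG-style chunking; the break labels are invented.

def chunk_units(tokens, breaks, level):
    """Group tokens into units closed by any break index >= level.

    tokens: list of syllable strings
    breaks: break index (1..5) following each token
    level:  2 for PW, 3 for PPh, 4 for BG, 5 for PG
    """
    if len(tokens) != len(breaks):
        raise ValueError("each token needs exactly one following break index")
    units, current = [], []
    for token, brk in zip(tokens, breaks):
        current.append(token)
        if brk >= level:      # a higher-level break also closes lower-level units
            units.append(current)
            current = []
    if current:               # trailing tokens without a closing break
        units.append(current)
    return units


syllables = ["the", "north", "wind", "and", "the", "sun"]
breaks = [1, 1, 2, 1, 1, 3]   # assumed labels, for illustration only

pw = chunk_units(syllables, breaks, 2)    # prosodic words
pph = chunk_units(syllables, breaks, 3)   # prosodic phrases
print(pw)
print(pph)
```

With these toy labels the B3 break closes both the second prosodic word and the only prosodic phrase, which is the behavior the hierarchy requires.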
5 Results
5.1 Analysis of speech rate
In this analysis, speech rate was calculated in four ways: syllables per second, syllables per minute, words per second and words per minute. The passage “The North Wind and the Sun” contains 5 sentences, 113 words, and 144 syllables. The duration of each of the 5 sentences for each speaker was measured manually using the open source software Praat (Boersma and Weenink 2004). Next, the speech rate for each speaker was calculated from the overall duration and the total number of syllables and words of the 5 sentences. Finally, the mean speech rate over all speakers in each speaker group was calculated in syllables and in words. Table 1 presents the means/standard deviations of speech rate for the L1 and L2 English speaker groups. The result of the t test (for syllables: p = 0.089, p < 0.1; for words: p = 0.084, p < 0.1) implies that there was a statistically significant (p < 0.1) difference in speech rate between the L1 and L2 English speaker groups. The t test and the results in Table 1 indicate that the speech rate of L2 English speakers was lower than that of L1 English speakers, which implies that L1 English speakers were faster than L2 English speakers in English read speech. Moreover, based on the standard deviations, within-group variability of L2 English speakers was much higher than that of L1 English speakers.

Table 1 Speech rate by unit of measurement and speaker group
Speaker group   Syl/s (µ/σ)   Syl/min (µ/σ)   Words/s (µ/σ)   Words/min (µ/σ)
L1              4.30/0.33     258.14/19.95    3.33/0.25       199.620/14.92
L2              3.82/0.49     229.12/29.64    2.95/0.38       177.007/22.88

5.2 Analysis of discourse boundary break
From the manual labels of discourse boundaries in the passage “The North Wind and the Sun”, it was observed that the passage contains 61 B2, 33 B3, 13 B4, and 3 B5 boundary breaks. For each speaker, the recorded speech data was examined manually using the open source software Praat to identify a physical pause (silence) or a break in the speech event (in the case of a vocalic boundary) at each of the boundary breaks tagged in the passage. If a physical pause or break in the speech event was identified, the speaker was considered to have realized the corresponding boundary break; otherwise, the boundary break was considered not realized by the speaker. In this way, the read speech data of all speakers was examined manually for each boundary break, and the total number of boundary breaks of each type realized by each speaker group was counted. Next, the percentage of realized boundary breaks was calculated for each speaker group relative to the number of manually tagged boundary breaks of that type in the passage. Table 2 shows the result of the analysis of discourse boundary breaks for the L1 and L2 English speaker groups. The result of the t test (B2: p = 0.047, p < 0.05; B3: p = 0.049, p < 0.05; B4: p = 0.049, p < 0.05; B5: p = 0.049, p < 0.05) shows that there was a statistically significant difference in the percentage of realized discourse breaks between the two speaker groups at every discourse level. The largest difference between the L1 and L2 English speaker groups was found at the B3 level. The result shows that L2 English speakers realized more B2, B3, B4 and B5 breaks than L1 English speakers. Therefore, L2 English speakers used more intermediate chunking units at the PW and PPh levels than L1 English speakers, and L2 English speakers also used more larger-scale planning units at the BG and PG levels than L1 English speakers. This means that L1 and L2 English speakers use different strategies to plan discourse units.

Table 2 Realization of discourse boundary break by speaker group
Boundary break   L1 (%)   L2 (%)
B2               50.16    58.52
B3               45.45    60.30
B4               80.00    83.84
B5               81.87    82.30

5.3 Chunking and planning unit size
From the analysis of discourse boundary breaks (discussed in Sect. 5.2), the total number of discourse units of each type realized by each speaker group was obtained. Next, the total number of syllables and words in each type of discourse unit for each speaker group was measured manually. Finally, the size of each discourse unit was calculated from these two measured values. Table 3 shows the size of each discourse unit in number of syllables and words for the L1 and L2 English speaker groups. From the result of the t test (shown in Table 4), it is observed that there was a statistically significant difference (p < 0.01) in the size of each discourse unit between the two speaker groups, except for the PG unit (p > 0.01). Table 3 and the t test imply that L1 PW, PPh, and BG contained more syllables and words than L2 PW, PPh and BG respectively, but that L1 PG contained the same number of syllables and words as its L2 counterpart. Therefore, the size of PW, PPh, and BG for L1 English speakers was larger than for L2 English speakers. The most substantial difference in size between the two speaker groups was at the PPh and BG levels. However, the size of PG for L1 and L2 English speakers was the same, because PG boundary locations correspond to paragraph breaks in the text for both L1 and L2 English speaker groups.
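The size calculation described in Sect. 5.3 appears to amount to dividing the total number of syllables (or words) by the number of realized units of a given type. The sketch below makes that assumption explicit; the function name `mean_unit_size` is hypothetical, and only the PG check uses figures stated in the paper (144 syllables and 113 words in 3 paragraphs).

```python
def mean_unit_size(total_items, unit_count):
    """Average number of syllables (or words) per discourse unit.

    Assumes size = total items / number of realized units, which is an
    interpretation of the procedure, not a formula given by the authors.
    """
    if unit_count <= 0:
        raise ValueError("unit_count must be positive")
    return total_items / unit_count


# PG level: the passage has 144 syllables and 113 words in 3 paragraphs.
pg_syllables = mean_unit_size(144, 3)
pg_words = mean_unit_size(113, 3)
print(f"{pg_syllables:.2f} syllables per PG")
print(f"{pg_words:.2f} words per PG")
```

Under this reading, the PG values agree with the PG sizes reported in Table 3 up to rounding, which is consistent with PG boundaries coinciding with the paragraph breaks of the text.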
Since the planning strategies of speakers depend on the use of intermediate chunking units (PW and PPh), the result of this analysis suggests that L1 English speakers are able to plan on a larger scale than L2 English speakers in the organization of discourse-level speech planning. The smaller chunks in L2 English may also explain L2 English speakers’ inconsistent placement of B4/B5 boundaries.

Table 3 Size of chunking and planning units by speaker group and unit of measurement
Discourse unit   Syllable number            Word number
                 L1           L2            L1           L2
PW (µ/σ)         3.15/0.10    2.53/0.08     2.19/0.09    1.99/0.05
PPh (µ/σ)        7.61/0.38    4.21/0.19     5.42/0.26    3.30/0.06
BG (µ/σ)         13.70/0.84   11.75/0.57    12.30/0.64   9.22/0.44
PG (µ/σ)         48.00/1.49   48.00/2.32    37.66/1.38   37.66/1.78

Table 4 Result of t test between both speaker groups by prosodic unit and unit of measurement
Discourse unit   Syllable number        Word number
PW               p = 0.008, p < 0.01    p = 0.007, p < 0.01
PPh              p = 0.009, p < 0.01    p = 0.009, p < 0.01
BG               p = 0.008, p < 0.01    p = 0.008, p < 0.01

5.4 Among-speaker B4/B5 consistency
Among-speaker consistency at paragraph boundaries (B4/B5) is derived from L1 and L2 English speech respectively and compared to observe whether L2 English speakers organize discourse as consistently and systematically as L1 English speakers. One indicator, average consistency, is used to quantify among-speaker consistency of discourse boundaries B4/B5. Average consistency (AC) is defined as follows:

\[ AC = \frac{1}{BN} \sum_{B_i = 1}^{BN} \left( \frac{1}{SN} \sum_{S_j = 1}^{SN} C_{S_j B_i} \right) \tag{1} \]

where SN and BN represent the number of speakers and the number of B4/B5 boundaries respectively; $S_j$ and $B_i$ denote the index of the speaker and of the B4/B5 boundary respectively; $C_{S_j B_i} = 1$ when the jth speaker realizes the ith B4/B5 boundary and $C_{S_j B_i} = 0$ when the jth speaker does not. Table 5 shows the average among-speaker consistency (in percentage) at B4/B5 for L1 and L2 English speakers. The result of the t test (p = 0.099, p < 0.1) indicates that there was a statistically significant difference in average consistency between the two speaker groups. From the result shown in Table 5 and the result of the t test, it is observed that average consistency was higher in L1 than in L2. Thus, it seems that L1 English speakers have a high level of agreement on the planning structure for a fixed text, unlike L2 English speakers.

Table 5 Among-speaker consistency by B4/B5 by speaker group
Speaker group   L1       L2
                81.54%   76.92%
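As an illustration of Eq. 1, the following sketch computes the average consistency from a speaker-by-boundary matrix of 0/1 realization flags. The function name `average_consistency` and the toy matrix are assumptions made for the example; the values are not taken from the study's data.

```python
def average_consistency(realized):
    """Average among-speaker consistency, following Eq. 1.

    realized[j][i] is 1 if speaker j produced the i-th B4/B5 boundary, else 0.
    """
    sn = len(realized)         # SN: number of speakers
    bn = len(realized[0])      # BN: number of B4/B5 boundaries
    if any(len(row) != bn for row in realized):
        raise ValueError("every speaker needs one flag per boundary")
    # Inner sum: share of speakers who realized each boundary.
    per_boundary = [sum(row[i] for row in realized) / sn for i in range(bn)]
    # Outer sum: mean of those shares over all boundaries.
    return sum(per_boundary) / bn


# Toy data: 3 speakers, 4 boundaries (invented for illustration).
toy = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
ac = average_consistency(toy)
print(f"{100 * ac:.2f}%")   # 75.00%
```

Because both sums are plain averages, AC equals the overall proportion of realized speaker-boundary pairs, here 9 out of 12.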
5.5 Analysis of pause duration
In this analysis, the duration of the pause (silence) was measured manually, using the Praat acoustic analysis software, at every realized discourse break boundary (manually tagged in the passage “The North Wind and the Sun”) in the speech data of each speaker. Table 6 shows the means/standard deviations of pause duration (ms) for the L1 and L2 English speaker groups at different discourse break boundaries. Table 7 presents the result of the t test between the L1 and L2 English speaker groups at different discourse break boundaries. The result of the t test shows that there was a statistically significant difference (p < 0.1) in pause duration between the two speaker groups at each discourse break boundary. The result of the t test and the result shown in Table 6 imply that the pause duration of L1 English speakers was higher than that of L2 English speakers at the B2, B4 and B5 boundaries, but lower at the B3 boundary. Furthermore, based on the standard deviations, within-group variability of pause duration of L1 English speakers was higher than that of L2 English speakers at the B2 and B4 boundaries, but lower at the B3 and B5 boundaries. Moreover, the variation of pause duration at the B3 boundary for L2 English speakers was found to be greater than the variation at the B2/B4 boundaries. For L1 English speakers, a much higher variation of pause duration was found at the B4 boundary than at the B2/B3 boundaries. These results indicate that pause duration can be used to differentiate break boundaries in English discourse.

Table 6 Pause duration (ms) by break boundary and speaker group
Speaker group   B2             B3             B4              B5
L1 (µ/σ)        222.58/34.50   300.74/43.87   480.23/112.92   678.3/124.60
L2 (µ/σ)        214.14/31.95   374.68/92.04   447.67/55.79    639.2/179.86

Table 7 Result of t test between both speaker groups by break boundary
Break boundary   t test result
B2               p = 0.099, p < 0.1
B3               p = 0.072, p < 0.1
B4               p = 0.093, p < 0.1
B5               p = 0.095, p < 0.1

5.6 Analysis of F0 contour
The F0 contour plays an important role in expressing the prosodic information of an utterance. According to Fujisaki’s command-response model (Fujisaki and Hirose 1984; Hirose and Fujisaki 1982), the logarithmic F0 contour [loge F0(t)] consists of slowly varying components, called phrase components, rapidly varying components, called accent components, and the logarithm of the baseline value (Fb) of F0 (loge Fb) (Fujisaki and Ohno 1995; Fujisaki et al. 1998). Phrase components represent the syntactic structure of a sentence and the mode of utterance, while accent components represent the lexical tone of syllables and the lexical accent of words. A prosodic phrase is defined by the presence of a single phrase component, and a prosodic word is defined by the presence of one or more accent components (Acharya et al. 2011, 2013). In this study, the logarithmic F0 contour [loge F0(t)] of every speaker’s utterance was decomposed into phrase components and accent components using the empirical mode decomposition (EMD) technique.
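For reference, the command-response model cited in Sect. 5.6 is usually written as loge F0(t) = loge Fb plus a sum of phrase components plus a sum of accent components. The sketch below synthesizes a contour from that standard formulation. It illustrates the model only, not the EMD-based decomposition applied in this study. The time constants (alpha = 3.0, beta = 20.0, gamma = 0.9) are commonly used values assumed here, and all command timings and amplitudes are invented.

```python
import math

ALPHA, BETA, GAMMA = 3.0, 20.0, 0.9   # assumed typical constants, not from this study


def phrase_response(t):
    """Impulse response of the phrase control mechanism, Gp(t)."""
    return ALPHA ** 2 * t * math.exp(-ALPHA * t) if t >= 0 else 0.0


def accent_response(t):
    """Step response of the accent control mechanism, Ga(t), capped at GAMMA."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + BETA * t) * math.exp(-BETA * t), GAMMA)


def log_f0(t, fb, phrase_cmds, accent_cmds):
    """loge F0(t) = loge Fb + phrase components + accent components."""
    phrase = sum(ap * phrase_response(t - t0) for ap, t0 in phrase_cmds)
    accent = sum(aa * (accent_response(t - t1) - accent_response(t - t2))
                 for aa, t1, t2 in accent_cmds)
    return math.log(fb) + phrase + accent


# Invented commands: one phrase command and two accent commands.
fb = 100.0                                          # baseline frequency in Hz
phrase_cmds = [(0.5, 0.0)]                          # (magnitude Ap, onset T0 in s)
accent_cmds = [(0.4, 0.2, 0.5), (0.3, 0.8, 1.1)]    # (amplitude Aa, onset T1, offset T2)

for t in (0.0, 0.35, 1.0, 3.0):
    f0 = math.exp(log_f0(t, fb, phrase_cmds, accent_cmds))
    print(f"t = {t:.2f} s  F0 = {f0:.1f} Hz")
```

The phrase term is the slowly decaying component and the accent terms are the local, rapidly varying ones, matching the distinction between phrase and accent components drawn in the text.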
5.6.1 EMD analysis

EMD is a signal processing technique that decomposes a complex signal into elementary, almost orthogonal components that do not overlap in frequency and that together match the signal itself very well (Rilling et al. 2003). Because the approach is algorithmic, the components cannot be expressed in closed form. The basic principle of the EMD technique is to decompose a signal into a sum of band-limited functions, called intrinsic mode functions (IMFs) (Huang et al. 1998). Each IMF satisfies two basic conditions: (i) it is symmetric with respect to the local mean, and (ii) the number of extrema and the number of zero crossings must be the same or differ at most by one. The EMD technique is an iterative process that ends when a predefined stopping criterion is satisfied. At the end of the process, we have:
$$x(t) = \sum_{i=1}^{n} d_i(t) + m_n(t) \tag{2}$$
where x(t) is the input signal, m_n(t) is the residue and d_i(t) is the ith intrinsic mode function. Each d_i has the same number of zero crossings and extrema (or they differ at most by one) and is symmetric with respect to the local mean. In the EMD technique, the number of maxima and minima decreases when going from one residual to the next, and the corresponding spectral supports are expected to decrease accordingly; as a result, complete decomposition is achieved in a finite number of steps. The basic reasons to use EMD in this analysis are:
• It is an automatic decomposition technique and fully adaptive.
• The different oscillating components produced by EMD match the signal itself very well.
• EMD is algorithmic; its components are not expressed in closed form.
• A small change in a signal component can be detected properly by using EMD.
• There are different techniques to decompose a composite signal into elementary components, such as the Fourier transform, the short time Fourier transform (STFT) and wavelet decomposition. The limitation of the Fourier transform is that it cannot determine the strength of a frequency component over a specific interval of time. The STFT was introduced to overcome this limitation. Since the STFT analyzes a signal with a fixed window size, it fails to capture components whose frequency varies. Wavelet decomposition emerged to overcome this drawback; its window size changes dynamically to analyze the time–frequency behavior of the signal. All these techniques analyze the signal on a fixed and predetermined basis, whereas the basis functions of EMD are not fixed and adapt to the varying nature of the signal.
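The sifting procedure behind Eq. (2) can be sketched as follows. This is a deliberately simplified illustration, not the implementation used in the study: the envelopes are piecewise linear instead of the cubic splines normally used in EMD, the end points of the signal are used as envelope knots, and the stopping criterion is a fixed number of sifting passes. All three choices, as well as the function names and the test signal, are assumptions.

```python
import math


def local_extrema(x):
    """Indices of strict interior local maxima and minima of the sequence x."""
    maxima = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] > x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] < x[i + 1]]
    return maxima, minima


def envelope(x, knots):
    """Piecewise-linear envelope through x at the knot indices (end points added)."""
    knots = [0] + knots + [len(x) - 1]
    env = []
    for a, b in zip(knots, knots[1:]):
        for i in range(a, b):
            w = (i - a) / (b - a)
            env.append((1 - w) * x[a] + w * x[b])
    env.append(x[-1])
    return env


def sift(x, n_sift=10):
    """Extract one IMF candidate by repeatedly subtracting the envelope mean."""
    h = list(x)
    for _ in range(n_sift):  # fixed number of passes (assumed stopping rule)
        maxima, minima = local_extrema(h)
        if not maxima or not minima:
            break
        upper, lower = envelope(h, maxima), envelope(h, minima)
        h = [v - (u + l) / 2 for v, u, l in zip(h, upper, lower)]
    return h


def emd(x, max_imfs=8):
    """Decompose x into IMFs d_i and a residue m_n with x = sum(d_i) + m_n."""
    imfs, residue = [], list(x)
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) + len(minima) < 3:  # too few oscillations left
            break
        d = sift(residue)
        imfs.append(d)
        residue = [r - v for r, v in zip(residue, d)]
    return imfs, residue


# Synthetic contour (assumed): slow trend + slow oscillation + fast oscillation.
n = 200
signal = [5.0 + 0.002 * i + 0.3 * math.sin(2 * math.pi * i / 100)
          + 0.1 * math.sin(2 * math.pi * i / 12) for i in range(n)]
imfs, residue = emd(signal)
rebuilt = [sum(d[i] for d in imfs) + residue[i] for i in range(n)]
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
```

Whatever envelope and stopping rule are chosen, the IMFs and the residue add back up to the input, which is the property stated in Eq. (2); `max_error` measures the deviation caused by floating-point rounding.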
In this study, the speech of every speaker was divided into utterances, where an utterance boundary was identified by a pause longer than 300 ms. The F0 contour, extracted from each utterance with the open source software Praat, was discrete in nature. To obtain a continuous F0 contour, cubic spline interpolation was applied to the discrete F0 contour (Narusawa et al. 2002). Using the EMD technique, the continuous logarithmic F0 contour [loge F0(t)] was decomposed into a number of IMFs and a residue. The DC value (residue) and the last IMF were added to obtain the phrase component, and the IMF with maximum energy was taken as the accent component (Acharya et al. 2011, 2013).

5.6.2 Prosodic phrase realization

According to English intonation phonology (Pierrehumbert 1980), a prosodic phrase (intermediate phrase/phonological phrase) is identified by a slow variation from fall (low) to rise (high) in the global pattern of the F0 contour. For an utterance, the time at which the change from fall to rise occurred was measured manually from the global pattern of the F0 contour using the Praat acoustic analysis software. Next, for the same utterance, the time at which the change from fall to rise occurred in the global pattern of the phrase component (obtained with the EMD technique) was also measured manually. For 90.18% of utterances (out of 30 utterances of L1 English speakers and 80 utterances of L1 Bengali speakers), the times obtained from the global pattern of the F0 contour and from the corresponding phrase component were the same; that is, a prosodic phrase of an utterance was also realized by a change from fall to rise in the global pattern of the phrase component. Hence, in this analysis, the change from fall to rise in the global pattern of the phrase component of L1 and L2 English speakers was used to mark the prosodic phrase.

5.6.3 Prosodic word realization

According to English intonation phonology (Pierrehumbert 1980), a prosodic word is identified by a rapid variation from fall (low) to rise (high) in the local pattern of the F0 contour. For an utterance, the time at which the change from fall to rise occurred was measured manually from the local pattern of the F0 contour using the Praat acoustic analysis software. Next, for the same utterance, the time at which the change from fall to rise occurred in the local pattern of the accent component (obtained with the EMD technique) was also measured manually. For 88.91% of utterances (out of 30 utterances of L1 English speakers and 80 utterances of L1 Bengali speakers), the times obtained from the local pattern of the F0 contour and from the corresponding accent component were the same; that is, a prosodic word of an utterance was also realized by a change from fall to rise in the local pattern of the accent component. Hence, in this analysis, the change from fall to rise in the local pattern of the accent component of L1 and L2 English speakers was used to mark the prosodic word.

This section describes a comparative study of L1 and L2 (L1 Bengali) English speakers’ phrase components at the prosodic phrase (PPh) level and accent components at the prosodic word (PW) level, and reports the differences in discourse planning between the two speaker groups at the PPh and PW levels.

5.6.4 Analysis of phrase component at PPh level

In this analysis, the phrase components of 110 utterances of L1 English speakers and 440 utterances of L2 English speakers were compared with respect to phrasing, pitch height and the nature of the pitch peak at the PPh level. From this comparison, six types of utterance patterns were found for L2 English speakers that differed from those of L1 English speakers.
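The component selection rule and the fall-to-rise criterion described above can be sketched as follows. In the study the fall-to-rise times were measured manually in Praat; the automatic detection of local minima shown here is only an illustrative counterpart. The function names and the toy IMF values are assumptions.

```python
def select_components(imfs, residue):
    """Phrase component = residue + last IMF; accent component = IMF with maximum energy."""
    phrase = [r + d for r, d in zip(residue, imfs[-1])]
    energies = [sum(v * v for v in d) for d in imfs]
    accent = imfs[energies.index(max(energies))]
    return phrase, accent


def fall_to_rise_times(component, times):
    """Times at which a component changes from falling to rising (local minima)."""
    return [times[i] for i in range(1, len(component) - 1)
            if component[i - 1] > component[i] <= component[i + 1]]


# Toy decomposition (assumed values, not study data): two IMFs and a constant residue.
toy_imfs = [[1.0, -1.0, 1.0, -1.0, 1.0],
            [0.2, 0.1, 0.0, 0.1, 0.2]]
toy_residue = [5.0, 5.0, 5.0, 5.0, 5.0]
toy_times = [0.0, 0.5, 1.0, 1.5, 2.0]  # seconds

phrase, accent = select_components(toy_imfs, toy_residue)
phrase_boundaries = fall_to_rise_times(phrase, toy_times)  # candidate prosodic phrase marks
word_boundaries = fall_to_rise_times(accent, toy_times)    # candidate prosodic word marks
```

In the toy data the slowly varying phrase component has a single turning point and the rapidly varying accent component has two, mirroring the global and local patterns used to mark prosodic phrases and prosodic words.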
Fig. 2 Prosodic phrasing position by a L1 English speaker, b L2 English (L1 Bengali) speaker
Type 1 Consider the sample utterance “The North Wind and the Sun were disputing which was the stronger” produced by an L1 and an L2 English speaker. Figure 2a, b show the F0 contour and the prosodic phrasing positions of this utterance for the L1 and L2 English speakers, respectively. From Fig. 2a, it is observed that: (i) The L1 English speaker’s utterance consisted of two prosodic phrases: prosodic phrase 1 (PP1) (0.02–1.38 s): The North Wind and the Sun; prosodic phrase 2 (PP2) (1.4–3.27 s): were disputing which was the stronger. (ii) The utterance was clearly phrased by pitch height; that is, there was a substantial difference in pitch height between the two consecutive phrases, so that they were clearly differentiated. (iii) The pitch peak in the latter prosodic phrase (PP2) was lower than that of the foregoing prosodic phrase (PP1). From Fig. 2b, it is observed that: (i) The L2 English speaker’s utterance consisted of three prosodic phrases instead of two: prosodic phrase 1 (PP1) (0.34–1.23 s): the North Wind; prosodic phrase 2 (PP2) (1.24–2.96 s): and the Sun were disputing; prosodic phrase 3 (PP3) (3.06–4.46 s): which was the stronger. The last prosodic phrase was an additional phrase. Hence, the phrasing of the L2 English speaker’s utterance did not match that of the L1 English speaker’s utterance, which resulted in improper phrasing. (ii) Unlike the L1 English speaker’s utterance, the utterance was not clearly phrased by pitch height; that is, there was no substantial difference in pitch height between consecutive phrases, so that the phrases were not clearly differentiated. (iii) Unlike the L1 English speaker’s utterance, the pitch peaks of all prosodic phrases were at the same height. Therefore, this utterance of the L2 English speaker showed improper phrasing, was not clearly phrased by pitch height, and had a pitch peak pattern that did not match that of the L1 English speaker’s utterance.

Type 2 Consider the sample utterance “And so the North Wind was obliged to confess” produced by an L1 and an L2 English speaker. Figure 3a, b represent the F0 contour and the prosodic phrasing positions of this utterance for the L1 and L2 English speakers, respectively. From Fig. 3a, it is observed that: (i) The L1 English speaker’s utterance had three prosodic phrases: prosodic phrase 1 (PP1) (0.05–2.37 s): and so the North Wind was; prosodic phrase 2 (PP2) (2.4–4.47 s): obliged; prosodic phrase 3 (PP3) (4.49–5.09 s): to confess. (ii) Pitch height clearly phrased the utterance. (iii) The pitch peak in each latter prosodic phrase was lower than that of the foregoing prosodic phrase (PP2 lower than PP1, and so on). From Fig. 3b, it is observed that: (i) The L2 English speaker’s utterance had two prosodic phrases instead of three: prosodic phrase 1 (PP1) (0.04–1.54 s): and so the North Wind was; prosodic phrase 2 (PP2) (1.6–2.78 s): obliged to confess. In this case, the L2 English speaker’s utterance showed improper phrasing because it used fewer phrase components than the L1 English speaker’s utterance. (ii) The utterance was clearly phrased by pitch height, like that of the L1 English speaker. (iii) Unlike the L1 English speaker’s utterance, the pitch peak in the latter prosodic phrase (PP2) was higher than that of the foregoing prosodic phrase (PP1). Therefore, this utterance of the L2 English speaker showed improper phrasing and was clearly phrased by pitch height, but the nature of its pitch peaks did not match that of the L1 English speaker’s utterance.

Fig. 3 Prosodic phrasing position by a L1 English speaker, b L2 English (L1 Bengali) speaker

Type 3 Consider the sample utterance “take his cloak off should be considered stronger than the other” produced by an L1 and an L2 English speaker. Figure 4a, b show the F0 contour and the prosodic phrasing positions of this utterance for the L1 and L2 English speakers, respectively. From Fig. 4a, it is observed that: (i) The L1 English speaker’s utterance consisted of three prosodic phrases: prosodic phrase 1 (PP1) (0.07–0.78 s): take his cloak off; prosodic phrase 2 (PP2) (0.82–1.78 s): should be considered; prosodic phrase 3 (PP3) (1.81–3.17 s): stronger than the other. (ii) The utterance was clearly phrased by pitch height. (iii) Pitch peaks in latter prosodic phrases
were lower than those of the foregoing prosodic phrases. On the other hand, it is observed from Fig. 4b that: (i) The L2 English speaker’s utterance consisted of four prosodic phrases instead of three: prosodic phrase 1 (PP1) (0.22–1.07 s): take his cloak off; prosodic phrase 2 (PP2) (1.08–2.03 s): should be; prosodic phrase 3 (PP3) (2.04–2.96 s): considered stronger; prosodic phrase 4 (PP4) (2.98–3.73 s): than the other. The last prosodic phrase was an additional phrase. Hence, the L2 English speaker’s phrasing did not match that of the L1 English speaker’s utterance because of the additional phrase component, which resulted in improper phrasing. (ii) Unlike the L1 English speaker’s utterance, the utterance was not clearly phrased by pitch height. (iii) Pitch peaks in latter prosodic phrases were lower than those of the foregoing prosodic phrases, as for the L1 English speaker. Hence, this utterance of the L2 English speaker showed improper phrasing and was not clearly phrased by pitch height, but the nature of its pitch peaks matched that of the L1 English speaker’s utterance.

Fig. 4 Prosodic phrasing position by a L1 English speaker, b L2 English (L1 Bengali) speaker

Fig. 5 Prosodic phrasing position by a L1 English speaker, b L2 English (L1 Bengali) speaker

Type 4 Consider the sample utterance “They agreed that the one who first succeeded in making the traveler” produced by an L1 and an L2 English speaker. Figure 5a, b show the F0 contour and the prosodic phrasing positions of this utterance for the L1 and L2 English speakers, respectively. From Fig. 5a, it is observed that: (i) The L1 English speaker’s utterance had two prosodic phrases: prosodic phrase 1 (PP1) (0.03–1.58 s): they agreed that the one; prosodic phrase 2 (PP2) (1.6–4.04 s): who first succeeded in making the traveler. (ii) Pitch height clearly phrased the utterance. (iii) The pitch peak in the latter prosodic phrase (PP2) was lower than that of the foregoing prosodic phrase (PP1). On the other hand, from Fig. 5b it is observed that: (i) The L2 English speaker’s utterance had two prosodic phrases: prosodic phrase 1 (PP1) (0.01–1.54 s): they agreed that the one; prosodic phrase 2 (PP2) (1.56–4.34 s): who first succeeded in making the traveler. In this case, the phrasing of the L2 English speaker’s utterance matched that of the L1 English speaker’s utterance, which resulted in proper phrasing. (ii) Unlike the L1 English speaker’s utterance, the utterance was not clearly phrased by pitch height. (iii) Unlike the L1 English speaker’s utterance, the pitch peak in the latter prosodic phrase (PP2) was higher than that of the foregoing prosodic phrase (PP1). Hence, this utterance of the L2 English speaker showed proper phrasing, but it was not clearly phrased by pitch height
and the nature of its pitch peaks did not match that of the L1 English speaker’s utterance.

Type 5 Consider the sample utterance “when a traveler came along wrapped in a warm cloak” produced by an L1 and an L2 English speaker. Figure 6a, b represent the F0 contour and the prosodic phrasing positions of this utterance for the L1 and L2 English speakers, respectively. From Fig. 6a, it is observed that: (i) The L1 English speaker’s utterance consisted of two prosodic phrases: prosodic phrase 1 (PP1) (0.06–1.48 s): when a traveler came along; prosodic phrase 2 (PP2) (1.5–2.61 s): wrapped in a warm cloak. (ii) Pitch height clearly phrased the utterance. (iii) The pitch peak in the latter prosodic phrase (PP2) was lower than that of the foregoing prosodic phrase (PP1). From Fig. 6b, it is observed that: (i) The L2 English speaker’s utterance consisted of two prosodic phrases: prosodic phrase 1 (PP1) (0.05–1.3 s): when a traveler came along; prosodic phrase 2 (PP2) (1.4–3.08 s): wrapped in a warm cloak. The phrasing of the L2 English speaker’s utterance matched that of the L1 English speaker’s utterance, which resulted in proper phrasing. (ii) The utterance was clearly phrased by pitch height, like that of the L1 English speaker. (iii) Unlike the L1 English speaker’s utterance, the pitch peak in the latter prosodic phrase (PP2) was higher than that of the foregoing prosodic phrase (PP1). Therefore, this utterance of the L2 English speaker showed proper phrasing and was clearly phrased by pitch height, but the nature of its pitch peaks did not match that of the L1 English speaker’s utterance.

Fig. 6 Prosodic phrasing position by a L1 English speaker, b L2 English (L1 Bengali) speaker

Type 6 Consider the sample utterance “and at last the North Wind gave up the attempt” produced by an L1 and an L2 English speaker. Figure 7a, b show the F0 contour and the prosodic phrasing positions of this utterance for the L1 and L2 English speakers, respectively. From Fig. 7a, it is observed that: (i) The L1 English speaker’s utterance had three prosodic phrases: prosodic phrase 1 (PP1) (0.07–1.22 s): And at last; prosodic phrase 2 (PP2) (1.24–1.64 s): the North Wind; prosodic phrase 3 (PP3) (1.7–2.49 s): gave up the attempt. (ii) The utterance was clearly phrased by pitch height. (iii) The pitch peak in each latter prosodic phrase was lower than that of the foregoing prosodic phrase (PP2 lower than PP1, and so on). On the other hand, from Fig. 7b, it is observed that: (i) The L2 English speaker’s utterance had two prosodic phrases instead of three: prosodic phrase 1 (PP1) (0.09–1.27 s): and at last; prosodic phrase 2 (PP2) (1.3–2.61 s): the North Wind gave up the attempt. As a result, the L2 English speaker’s utterance showed improper phrasing because it used fewer phrase components than the L1 English speaker’s utterance. (ii) The utterance was clearly phrased by pitch height, like that of the L1 English speaker. (iii) The pitch peak in the latter prosodic phrase (PP2) was lower than that of the foregoing prosodic phrase (PP1), as for the L1 English speaker. Therefore, this utterance of the L2 English speaker showed improper phrasing, but it was clearly phrased by pitch height and the nature of its pitch peaks matched that of the L1 English speaker’s utterance.

Fig. 7 Prosodic phrasing position by a L1 English speaker, b L2 English (L1 Bengali) speaker
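The phrasing comparison applied in the six types above can be expressed as a small rule. The sketch below is an illustration only: the function name and the category labels are assumptions, and the comparison in the study was made by inspection rather than by code. The phrase texts are those reported for Types 1, 4 and 6.

```python
def _words(phrases):
    """Lower-case word lists, one per prosodic phrase."""
    return [phrase.lower().split() for phrase in phrases]


def phrasing_category(l1_phrases, l2_phrases):
    """Compare the L2 phrasing of an utterance with the L1 phrasing of the same utterance."""
    l1, l2 = _words(l1_phrases), _words(l2_phrases)
    if l1 == l2:
        return "proper phrasing"
    if len(l2) > len(l1):
        return "additional phrase component"
    if len(l2) < len(l1):
        return "fewer phrase components"
    return "same number, different word distribution"


# Phrase texts as reported for Types 1, 4 and 6.
type1 = phrasing_category(
    ["The North Wind and the Sun", "were disputing which was the stronger"],
    ["the North Wind", "and the Sun were disputing", "which was the stronger"],
)
type4 = phrasing_category(
    ["they agreed that the one", "who first succeeded in making the traveler"],
    ["they agreed that the one", "who first succeeded in making the traveler"],
)
type6 = phrasing_category(
    ["And at last", "the North Wind", "gave up the attempt"],
    ["and at last", "the North Wind gave up the attempt"],
)
```

Pitch height clarity and the pitch peak pattern would need numeric peak values and a threshold, neither of which is given in the text, so they are left out of this sketch.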
Results from the comparison of phrase components of the L1 and L2 English speaker groups at the PPh level are summarized as follows:
• For L1 English speakers, each utterance was clearly phrased by pitch height; that is, there was a substantial difference in pitch height between two consecutive phrases of the utterance, so that the phrases were clearly differentiated. For L2 English speakers, 60.6% of utterances were clearly phrased by pitch height, like those of L1 English speakers. The remaining 39.4% of utterances were not clearly phrased by pitch height, unlike those of L1 English speakers.
• In each utterance of L1 English speakers, pitch peaks in latter prosodic phrases were lower than those of the foregoing prosodic phrases. For L2 English speakers, pitch peaks in latter prosodic phrases were lower than those of the foregoing prosodic phrases in 24.2% of utterances, as for L1 English speakers; pitch peaks of all prosodic phrases were at the same height in 18.2% of utterances; and pitch peaks of latter prosodic phrases were higher than those of the foregoing prosodic phrases in the remaining 57.6% of utterances, unlike L1 English speakers.
• In 27.2% of the utterances of L2 English speakers, the prosodic phrases matched the prosodic phrases of L1 English speakers’ utterances, which resulted in proper phrasing. In 72.8% of the utterances of L2 English speakers, the prosodic phrases did not match those of L1 English speakers’ utterances, which resulted in improper phrasing. Table 8 shows the possible reasons for improper phrasing and the share of L2 English speakers’ utterances (out of the 72.8%) that showed improper phrasing for each reason.
• 12.1% of the utterances of L2 English speakers showed proper phrasing, were clearly phrased by pitch height, and had pitch peaks in latter prosodic phrases that were lower than those of the foregoing prosodic phrases, like those of L1 English speakers. The remaining 87.9% of the utterances of L2 English speakers differed from the utterances of L1 English speakers because of improper phrasing, unclear phrasing by pitch height, a different nature of pitch peaks, or a combination of these.

Table 9 shows the number of utterances, out of the 87.9% of utterances of L2 English speakers, for all possible combinations of the three criteria (phrasing, pitch height, nature of pitch peak) that result in differences from the utterances of L1 English speakers.

Table 8 Reasons and number of utterances of L2 English speakers to involve in improper phrasing

| Reasons for improper phrasing | Number of L2 English speakers’ utterances (%) |
| --- | --- |
| Frequent use of an additional phrase component | 36.4 |
| Use of fewer phrase component compared to L1 English speakers | 15.2 |
| Number of prosodic phrase is same with that of L1 English speakers, but lexical word distribution among prosodic phrases is different from that of L1 English speakers | 21.2 |
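The percentages in this summary come from labelling each L2 utterance on the three criteria and counting the combinations. A minimal sketch of that tally follows; the labels below are hypothetical and are not the study’s data, and the label names are assumptions.

```python
from collections import Counter

# Hypothetical per-utterance labels: (phrasing, pitch-height clarity, pitch peak pattern).
# These five rows are illustrative only and are NOT taken from the study.
labels = [
    ("proper", "clear", "lower"),
    ("improper", "clear", "higher"),
    ("improper", "unclear", "same"),
    ("improper", "clear", "higher"),
    ("proper", "unclear", "higher"),
]
L1_LIKE = ("proper", "clear", "lower")  # combination shown by every L1 utterance

counts = Counter(labels)
share = {combo: 100 * n / len(labels) for combo, n in counts.items()}
l1_like_share = share.get(L1_LIKE, 0.0)      # analogous to the 12.1% reported above
different_share = 100 - l1_like_share        # analogous to the 87.9% reported above
```

The same counting, restricted to utterances labelled as improperly phrased and grouped by reason, yields a breakdown of the kind shown in Table 8.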
5.6.5 Analysis of accent component at PW level

In this analysis, the accent components of 110 utterances of L1 English speakers and 440 utterances of L2 English speakers were compared with respect to the lexical stress of words and word level prominence at the PW level. Since the pitch peaks of stressed words are comparatively higher than those of unstressed words, every position where a peak of the accent component appears denotes the position of a stressed word. Therefore, the time at which each pitch peak occurred was measured manually from the accent component, and the word at the corresponding time point in the local pattern of the F0 contour (taken as the stressed word) was identified using the Praat acoustic analysis software. In English, prominence is given to some words in an utterance to express the most important meaning of the utterance (Pierrehumbert 1980). The most prominent word is the last stressed word (accented word) of the prosodic phrase, and the less prominent word is the stressed word (accented word) that precedes the most prominent word in the same prosodic phrase (Chen et al. 2009). From the comparison between the accent components of the two speaker groups, four types of utterance patterns were found for L2 English speakers that differed from those of L1 English speakers.

Fig. 8 Prosodic word position by a L1 English speaker, b L2 English (L1 Bengali) speaker

Table 9 Stressed words in prosodic phrases of L1 English speaker’s utterance

| Prosodic phrase (PP) | Stressed word: content word | Stressed word: function word |
| --- | --- | --- |
| PP1 | sun (0.86 s) | – |
| PP2 | disputing (1.76 s), stronger (2.60 s) | – |

Table 10 Word level prominence in prosodic phrases of L1 English speaker’s utterance

| Prominent word | PP1 | PP2 |
| --- | --- | --- |
| Most prominent word | Sun | stronger |
| Less prominent word | – | disputing |
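The prominence rule quoted above (the last stressed word of a prosodic phrase is the most prominent, and the stressed word preceding it is the less prominent) can be sketched as follows. The function name and data layout are assumptions; the stressed words and peak times are those listed in Table 9 for the L1 English speaker.

```python
def prominence(stressed_by_phrase):
    """Map each prosodic phrase to its (most prominent, less prominent) stressed word."""
    result = {}
    for phrase, stressed in stressed_by_phrase.items():
        # Order the stressed words by the time of their accent peak.
        ordered = [word for word, _ in sorted(stressed, key=lambda item: item[1])]
        most = ordered[-1] if ordered else None
        less = ordered[-2] if len(ordered) > 1 else None
        result[phrase] = (most, less)
    return result


# Stressed words and accent peak times (s) of the L1 English speaker (Table 9).
l1_stressed = {
    "PP1": [("sun", 0.86)],
    "PP2": [("disputing", 1.76), ("stronger", 2.60)],
}
l1_prominence = prominence(l1_stressed)
```

Applied to the Table 9 entries, the rule gives “sun” for PP1 (with no less prominent word) and “stronger” preceded by “disputing” for PP2, which agrees with Table 10.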
Type 1 Consider the sample utterance “The North Wind and the Sun were disputing which was the stronger” produced by an L1 and an L2 English speaker. Both speakers had two prosodic phrases in their utterances: prosodic phrase 1 (PP1): The North Wind and the Sun; prosodic phrase 2 (PP2): were disputing which was the stronger; that is, the L2 English speaker showed proper phrasing. Figure 8a, b show the F0 contour and the prosodic word positions of this utterance for the L1 and L2 English speakers, respectively. From Fig. 8a, it is observed that: (i) The L1 English speaker’s utterance consisted of three prosodic words: prosodic word 1 (PW1) (0.02–1.33 s): The North Wind and the Sun; prosodic word 2 (PW2) (1.35–2.16 s): were disputing; prosodic word 3 (PW3) (2.19–3.29 s): which was the stronger. (ii) From Table 9, it is observed that the L1 English speaker’s function words were unstressed, whereas content words were stressed. (iii) Table 10 shows the word level prominence in the prosodic phrases of the L1 English speaker’s utterance. From Fig. 8b, it is observed that: (i) The L2 English speaker’s utterance consisted of five prosodic words: prosodic word 1 (PW1) (0.08–0.8 s): The North Wind; prosodic word 2 (PW2) (0.9–1.79 s): and the Sun; prosodic word 3 (PW3) (1.8–2.34 s): were disputing; prosodic word 4 (PW4) (2.38–2.78 s): which; prosodic word 5 (PW5) (2.8–3.67 s): was the stronger. The L2 English speaker used two prosodic words (PW1, PW2) in place of the single prosodic word PW1 of the L1 English speaker, and two prosodic words (PW4, PW5) in place of the single prosodic word PW3 of the L1 English speaker, which resulted in accent redundancy. (ii) Table 11 shows the stressed words in the prosodic phrases of the L2 English speaker’s utterance. From Tables 9 and 11, it is observed that only the stressed word ‘disputing’ was common to the L1 and L2 English speakers; the stressed words ‘sun’ and ‘stronger’ of the L1 English speaker were not stressed by the L2 English speaker, which resulted in a stress mismatch. Unlike the L2 English speaker, the L1 English speaker did not put stress on function words; that is, the L2 English speaker could not realize function words and content words properly. (iii) Table 12 shows the word level prominence in the prosodic phrases of the L2 English speaker’s utterance. From Tables 10 and 12, it is observed that the L2 English speaker’s most prominent words did not match those of the L1 English speaker, and only the less prominent word ‘disputing’ matched that of the L1 English speaker, which resulted in improper prominence. Hence, this utterance of the L2 English speaker showed proper phrasing, improper prominence and redundant use of accent components.

Table 11 Stressed words in prosodic phrases of L2 English speaker’s utterance

| Prosodic phrase (PP) | Stressed word: content word | Stressed word: function word |
| --- | --- | --- |
| PP1 | North (0.38 s) | the (1.23 s) |
| PP2 | disputing (2.08 s) | which (2.38 s), the (3.23 s) |

Table 12 Word level prominence in prosodic phrases of L2 English speaker’s utterance

| Prominent word | PP1 | PP2 |
| --- | --- | --- |
| Most prominent word | the (1.23 s) | the (3.23 s) |
| Less prominent word | North | disputing |

Table 13 Stressed words in prosodic phrases of L1 English speaker’s utterance

| Prosodic phrase (PP) | Stressed word: content word | Stressed word: function word |
| --- | --- | --- |
| PP1 | when (0.20 s), traveler (0.68 s), came (1.05 s) | – |
| PP2 | wrapped (1.67 s), warm (2.40 s) | – |

Table 14 Word level prominence in prosodic phrases of L1 English speaker’s utterance

| Prominent word | PP1 | PP2 |
| --- | --- | --- |
| Most prominent word | came | warm |
| Less prominent word | traveler | wrapped |
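The stress comparison made for Type 1 (Tables 9 and 11) amounts to a few set operations, sketched below. The function-word list is a small illustrative assumption and not the inventory used in the study; the stressed words are those listed in Tables 9 and 11.

```python
# Illustrative function-word list (assumption); the study's own classification may differ.
FUNCTION_WORDS = {"the", "which", "a", "in", "and", "was", "were", "of", "that"}


def compare_stress(l1_stressed, l2_stressed):
    """Summarize how the L2 stress pattern relates to the L1 stress pattern."""
    l1, l2 = set(l1_stressed), set(l2_stressed)
    return {
        "common": sorted(l1 & l2),
        "missed_by_l2": sorted(l1 - l2),
        "stressed_function_words": sorted(w for w in l2 if w in FUNCTION_WORDS),
    }


l1_words = ["sun", "disputing", "stronger"]               # Table 9 (L1 speaker)
l2_words = ["north", "the", "disputing", "which", "the"]  # Table 11 (L2 speaker)
report = compare_stress(l1_words, l2_words)
```

For these inputs, only “disputing” is common, “sun” and “stronger” are missed by the L2 speaker, and the L2 speaker stresses the function words “the” and “which”, which is the stress mismatch described for Type 1.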
Type 2 Consider the sample utterance “when a traveler came along wrapped in a warm cloak” produced by an L1 and an L2 English speaker. Both speakers had two prosodic phrases in their utterances: prosodic phrase 1 (PP1): When a traveler came along; prosodic phrase 2 (PP2): wrapped in a warm cloak; that is, the L2 English speaker showed proper phrasing. Figure 9a, b show the F0 contour and the prosodic word positions of this utterance for the L1 and L2 English speakers, respectively. From Fig. 9a, it is observed that: (i) The L1 English speaker’s utterance had five prosodic words: prosodic word 1 (PW1) (0.06–0.4 s): when a; prosodic word 2 (PW2) (0.42–0.86 s): traveler; prosodic word 3 (PW3) (0.88–1.38 s): came along; prosodic word 4 (PW4) (1.4–2.06 s): wrapped in a; prosodic word 5 (PW5) (2.07–2.64 s): warm cloak. (ii) From Table 13, it is observed that the L1 English speaker’s function words were unstressed, unlike content words. (iii) Table 14 shows the word level prominence in the prosodic phrases of the L1 English speaker’s utterance. On the other hand, from Fig. 9b, it is observed that: (i) The L2 English speaker’s utterance had five prosodic words: prosodic word 1 (PW1) (0.05–0.35 s): When a; prosodic word 2 (PW2) (0.4–0.92 s): traveler came; prosodic word 3 (PW3) (0.93–1.57 s): along; prosodic word 4 (PW4) (1.59–2.54 s): wrapped in a; prosodic word 5 (PW5) (2.55–3.1 s): warm cloak. The number of prosodic words of the L2 English speaker was the same as that of the L1 English speaker, but the lexical word distribution among the prosodic words differed. (ii) Table 15 shows the stressed words in the prosodic phrases of the L2 English speaker’s utterance. From Tables 13 and 15, it is observed that the stressed words ‘came’, ‘wrapped’ and ‘warm’ were common to the L1 and L2 English speakers’ utterances, but the content words ‘when’ and ‘traveler’ stressed by the L1 English speaker were not stressed by the L2 English speaker, which resulted in a stress mismatch. The L1 English speaker did not put stress on function words, but the L2 English speaker did. This means that the L2 English speaker could not differentiate function words and content words properly. (iii) Table 16 shows the word level prominence in the prosodic phrases of the L2 English speaker’s utterance. From Tables 14 and 16, it is observed that only one most prominent word, ‘warm’, and one less prominent word, ‘wrapped’, of the L2 English speaker matched those of the L1 English speaker; as a result, the L2 English speaker showed improper prominence. Hence, this utterance of the L2 English speaker showed proper phrasing and improper prominence, and although the number of prosodic words was the same as that of the L1 English speaker, the lexical word distribution among the prosodic words differed from that of the L1 English speaker.

Fig. 9 Prosodic word position by a L1 English speaker, b L2 English (L1 Bengali) speaker

Table 15 Stressed words in prosodic phrases of L2 English speaker’s utterance

| Prosodic phrase (PP) | Stressed word: content word | Stressed word: function word |
| --- | --- | --- |
| PP1 | came (0.38 s) | a (0.18 s), along (1.26 s) |
| PP2 | wrapped (1.91 s), warm (2.85 s) | in (2.33 s) |

Table 16 Word level prominence in prosodic phrases of L2 English speaker’s utterance

| Prominent word | PP1 | PP2 |
| --- | --- | --- |
| Most prominent word | along | warm |
| Less prominent word | came | wrapped |

Table 17 Stressed words in prosodic phrases of L1 English speaker’s utterance

| Prosodic phrase (PP) | Stressed word: content word | Stressed word: function word |
| --- | --- | --- |
| PP1 | sun (0.60 s) | – |
| PP2 | stronger (1.22 s), two (1.86 s) | – |

Type 3 Consider the sample utterance “that the Sun was the stronger of the two” produced by an L1 and an L2 English speaker. There were two prosodic phrases in the L1 English speaker’s utterance: prosodic phrase 1 (PP1): that the Sun was the; prosodic phrase 2 (PP2): stronger of the two. There were also two prosodic phrases in the L2 English speaker’s utterance: prosodic phrase 1 (PP1): that the Sun; prosodic phrase 2 (PP2): was the stronger of the two. The number of prosodic phrases was the same as that of the L1 English speaker, but the lexical word distribution among the prosodic phrases differed; that is, the L2 English speaker showed improper phrasing. Figure 10a, b represent the F0 contour of prosodic
Fig. 10 Prosodic word position by a L1 English speaker, b L2 English (L1 Bengali) speaker
13
320
Int J Speech Technol (2017) 20:305–326
Table 18 Word level prominence in prosodic phrases of L1 English speaker’s utterance
Table 21 Stressed words in prosodic phrases of L1 English speaker’s utterance
Prominent word
Prosodic phrase (PP)
Prosodic phrase (PP)
Most prominent word Less prominent word
PP1
PP2
sun –
two stronger
Table 19 Stressed words in prosodic phrases of L2 English speaker’s utterance Prosodic phrase (PP)
PP1 PP2
Stressed word Content word
Function word
sun (0.99 s) stronger (1.64 s), two (2.26 s)
– the (2.13 s)
Table 20 Word level prominence in prosodic phrases of L2 English speaker’s utterance Prominent word
Most prominent word Less prominent word
Prosodic phrase (PP) PP1
PP2
sun –
two stronger
word position of these utterances by L1 and L2 English speakers respectively. From Fig. 10a, it is observed that: (i) L1 English speaker’s utterance consisted of three prosodic words—prosodic word 1 (PW1) (0.09–0.9 s): that the Sun was the; prosodic word 2 (PW2) (0.92–1.49 s): stronger; prosodic word 3 (PW3) (1.5–2.1 s): of the two; (ii) From Table 17, it is observed that function words were unstressed by L1 English speaker, unlike content words. (iii) Table 18 shows the word level prominence in prosodic phrases of L1 English speaker’s utterance.
PP1 PP2 PP3
Content word
Function word
last (0.59 s) wind (1.40 s) attempt (2.05 s)
– – –
From Fig. 10b, it is observed that: (i) L2 English speaker’s utterance consisted of five prosodic words— prosodic word 1 (PW1) (0.06–0.63 s): that; prosodic word 2 (PW2) (0.64–1.06 s): the sun; prosodic word 3 (PW3) (1.07–1.4 s): was the; prosodic word 4 (PW4) (1.45–1.92 s): stronger of; prosodic word 5 (PW5) (1.96–2.28 s): the two. The number of prosodic word of L2 English speaker was higher than that of L1 English speaker due to the use of additional accent component. (ii) Table 19 shows the stressed words in prosodic phrases of L2 English speaker’s utterance. From Tables 17 and 19, it is observed that content words sun (0.87 s) in PP1, stronger (1.64 s), two (1.26 s) in PP2 were stressed, which were matched with the stressed content words of L1 English speaker. Function words ‘the’ (1.23 s) in PP2 was stressed by L2 English speaker, but L1 English speaker did not put stress on function words. This means that, L2 English speaker could not realize function words and content words properly. (iii) Table 20 shows the word level prominence in prosodic phrases of L2 English speaker’s utterance. From Tables 18 and 20, it is observed that most prominent and less prominent words of L2 English speaker were matched with that of L1 English speaker; as a result, L2 English speaker’s utterance was involved in proper prominence. Hence, this utterance of L2 English speaker was involved in improper phrasing, proper prominence and use of additional accent component.
Fig. 11 Prosodic word position by a L1 English speaker, b L2 English (L1 Bengali) speaker
Type 4 Consider the sample utterance “and at last the North Wind gave up the attempt” produced by an L1 and an L2 English speaker. The L1 English speaker produced three prosodic phrases: prosodic phrase 1 (PP1): and at last; prosodic phrase 2 (PP2): the North Wind; prosodic phrase 3 (PP3): gave up the attempt. The L2 English speaker also produced three prosodic phrases: prosodic phrase 1 (PP1): and at last the; prosodic phrase 2 (PP2): North Wind gave up; prosodic phrase 3 (PP3): the attempt. Although the number of prosodic phrases was the same as that of the L1 English speaker, the distribution of lexical words among the prosodic phrases differed; as a result, the L2 English speaker’s utterance involved improper phrasing. Figure 11a, b show the F0 contour of prosodic word position of these utterances by the L1 and L2 English speakers respectively. From Fig. 11a, it is observed that: (i) The L1 English speaker’s utterance had three prosodic words: prosodic word 1 (PW1) (0.07–1.05 s): and at last; prosodic word 2 (PW2) (1.06–1.65 s): the North Wind; prosodic word 3 (PW3) (1.67–2.48 s): gave up the attempt. (ii) From Table 21, it is observed that the L1 English speaker stressed content words but left function words unstressed. (iii) Table 22 shows the word level prominence in prosodic phrases of the L1 English speaker’s utterance. From Fig. 11b, it is observed that: (i) The L2 English speaker’s utterance had three prosodic words: prosodic word 1 (PW1) (0.1–1.15 s): and at last the; prosodic word 2 (PW2) (1.17–2.24 s): North Wind gave up; prosodic word 3 (PW3) (2.25–3.03 s): the attempt. The number of prosodic words of the L2 English speaker was the same as that of the L1 English speaker, but the distribution of lexical words among the prosodic words differed. (ii) Table 23 shows the stressed words in prosodic phrases of the L2 English speaker’s utterance.
From Tables 21 and 23, it is observed that only the stressed word ‘last’ was common to the L1 and L2 English speakers; the words ‘wind’ and ‘attempt’, stressed by the L1 English speaker, were not stressed in the L2 English speaker’s utterance, resulting in a stress mismatch. The function word ‘the’ (2.66 s) in PP3 was stressed by the L2 English speaker, whereas the L1 English speaker did not stress any function word; that is, the L2 English speaker could not properly realize the difference between function words and content words.
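The stress comparison for this Type 4 utterance amounts to a few set operations on the stressed words of the two speakers. The sketch below is illustrative only: the word sets are those reported for the L1 and L2 speakers in the text, while `compare_stress` and the `match_rate` measure (shared stressed content words divided by L1 stressed content words) are assumptions introduced here, not quantities defined by the study.

```python
# Stressed words reported for the Type 4 utterance (lower-cased for comparison).
L1_STRESSED_CONTENT = {"last", "wind", "attempt"}
L2_STRESSED_CONTENT = {"last", "north"}
L2_STRESSED_FUNCTION = {"the"}


def compare_stress(l1_content, l2_content, l2_function):
    """Compare L2 stress placement against the L1 reference.

    match_rate is a hypothetical summary measure: the share of L1 stressed
    content words that the L2 speaker also stressed.
    """
    shared = l1_content & l2_content
    return {
        "shared": sorted(shared),
        "missed_by_l2": sorted(l1_content - l2_content),
        "extra_in_l2": sorted(l2_content - l1_content),
        "stressed_function_words": sorted(l2_function),
        "match_rate": round(len(shared) / len(l1_content), 3),
    }


type4_stress = compare_stress(
    L1_STRESSED_CONTENT, L2_STRESSED_CONTENT, L2_STRESSED_FUNCTION
)
print(type4_stress)
```

A non-empty `missed_by_l2` list corresponds to the stress mismatch described above, and a non-empty `stressed_function_words` list to stress placed on function words.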
Table 22 Word level prominence in prosodic phrases of L1 English speaker’s utterance
Prosodic phrase (PP) | Most prominent word | Less prominent word
PP1 | last | –
PP2 | wind | –
PP3 | attempt | –

Table 23 Stressed words in prosodic phrases of L2 English speaker’s utterance
Prosodic phrase (PP) | Stressed content word | Stressed function word
PP1 | last (0.59 s) | –
PP2 | North (1.19 s) | –
PP3 | – | the (2.66 s)

Table 24 Word level prominence in prosodic phrases of L2 English speaker’s utterance
(iii) Table 24 shows the word level prominence in prosodic phrases of the L2 English speaker’s utterance. From Tables 22 and 24, it is observed that, as for the L1 English speaker, there was no less prominent word in the L2 English speaker’s utterance, but only the most prominent word ‘last’ was common to the two speakers’ utterances; as a result, the L2 English speaker’s utterance involved improper prominence. Hence, this utterance of the L2 English speaker involved improper phrasing and improper prominence; the number of prosodic words was the same as that of the L1 English speaker, but the distribution of lexical words among the prosodic words differed.
Results from the comparison of accent components at the PW level for the L1 and L2 English speaker groups are summarized as follows:
• In 81.8% of the L2 English speakers’ utterances, the number of prosodic words was higher than that of L1 English speakers. Table 25 shows the reasons why, and the share in which, L2 English speakers’ utterances (out of these 81.8%) had more prosodic words than those of L1 English speakers. In the remaining 18.2% of the L2 English speakers’ utterances, the number of prosodic words was the same as that of L1 English speakers, but the distribution of lexical words among the prosodic words differed.
• For L1 English speakers, 94.6% of content words were stressed, while function words were unstressed. Of the content words stressed by L1 English speakers, L2 English speakers stressed 61.3%, resulting in stress mismatch. L2 English speakers also stressed 48.7% of function words, unlike L1 English speakers; that is,
Prosodic phrase (PP) | Most prominent word | Less prominent word
PP1 | last | –
PP2 | North | –
PP3 | the | –
Table 25 Reason and number of L2 English speakers’ utterances to have more prosodic words
Reason for more prosodic words | Number of L2 English speakers’ utterances (%)
Redundant use of accent component | 63.6
Use of additional accent component | 18.2
Table 26 L2 English speakers’ utterances with all possible combinations of two criteria
Criteria | Number of L2 English speakers’ utterances (%)
Proper phrasing, improper prominence | 22.6
Improper phrasing, proper prominence | 18.2
Improper phrasing, improper prominence | 54.6
L2 English speakers could not properly differentiate content words from function words.
• L2 English speakers realized 27.3% of the most prominent words and 26.7% of the less prominent words of L1 English speakers, resulting in improper prominence.
Only 4.6% of the L2 English speakers’ utterances showed proper phrasing with proper prominence, like those of L1 English speakers; the remaining 95.4% differed from the L1 English speakers’ utterances because of improper phrasing, improper prominence, or both. Table 26 shows the share of utterances, out of these 95.4%, associated with each possible combination of the two criteria (phrasing and prominence) that caused the difference from L1 English speakers’ utterances.
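The percentages reported in Tables 25 and 26 can be cross-checked against the totals quoted in the text. The short sketch below does only that arithmetic, using `Decimal` to avoid floating-point rounding noise; the variable names are illustrative assumptions.

```python
from decimal import Decimal

# Percentages of L2 utterances, copied from Tables 25 and 26.
TABLE_25 = {
    "Redundant use of accent component": Decimal("63.6"),
    "Use of additional accent component": Decimal("18.2"),
}
TABLE_26 = {
    "Proper phrasing, improper prominence": Decimal("22.6"),
    "Improper phrasing, proper prominence": Decimal("18.2"),
    "Improper phrasing, improper prominence": Decimal("54.6"),
}

# Utterances with more prosodic words than L1, and the remainder.
more_pw_total = sum(TABLE_25.values())
same_pw_total = Decimal("100") - more_pw_total

# Utterances differing from L1 in phrasing and/or prominence, and the remainder.
differing_total = sum(TABLE_26.values())
proper_total = Decimal("100") - differing_total

print("More prosodic words than L1:", more_pw_total)
print("Same number of prosodic words:", same_pw_total)
print("Differing from L1 utterances:", differing_total)
print("Proper phrasing and prominence:", proper_total)
```

The table rows add up to the 81.8% and 95.4% totals stated in the text, and the remainders match the reported 18.2% and 4.6%.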
6 Discussions
The analysis of discourse unit size and discourse boundary break realization shows that the L2 English (L1 Bengali) speakers produced more and shorter PPh and PW discourse units than the L1 English speakers. These results indicate that L2 English speakers divide English speech into shorter chunks than L1 English speakers in order to reduce the difficulty posed by large-scale discourse. Similarly, L1 Greek, Spanish, Korean and Mandarin speakers also divide English utterances into smaller chunking units than L1 English speakers (Hewings 1995). Therefore, this appears to be a universal L2 speech planning strategy to minimize the difficulty and processing load of speech production. Since L2 English speakers produced smaller and more numerous intermediate chunking units (PPh, PW) and large-scale planning units (BG, PG) than L1 English speakers, L1 English speakers are capable of planning on a larger scale than L2 English speakers at every level of discourse-level speech planning organization. The smaller intermediate chunks in L2 English speech also reflect the inconsistent positioning of B4/B5 boundaries. Furthermore, the L1/L2 consistency results for paragraph boundaries (B4/B5) show that average among-speaker consistency was 4.62 percentage points higher in L1 than in L2 (81.54 vs. 76.92%). This result may suggest that L2 English speakers’ planning of paragraph units is less consistent than that of L1 English speakers.
The analysis of pause duration reveals that, for L2 English speakers, the variation of pause duration at the B3 boundary is higher than at the B2 and B4 boundaries, which suggests that acoustic cues other than pause are more important in differentiating boundary strength. These findings are consistent with the findings on pause duration of Taiwan Mandarin (L1) speakers in the same English discourse (Visceglia et al. 2010).
The analysis of phrase components at the PPh level reveals that each utterance of L1 English speakers is clearly phrased by pitch height, whereas not all utterances of L2 English (L1 Bengali) speakers are clearly phrased by pitch height, and the differences between clearly phrased and not clearly phrased utterances are sometimes ambiguous. For L1 English speakers, pitch peaks in later prosodic phrases of each utterance are lower than those of the foregoing prosodic phrases, but L2 English speakers do not realize this pattern properly. This finding indicates that L1 English speakers produce simple declarative sentences with a falling pitch pattern, and that L2 English speakers cannot acquire this pattern properly because of wrong utterance planning.
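The falling pitch pattern described above, in which each later prosodic phrase has a lower pitch peak than the preceding one, can be expressed as a simple check on a sequence of per-phrase F0 peaks. The following is a minimal sketch under stated assumptions: the function `has_falling_pattern` is illustrative, and the peak values are hypothetical numbers chosen for demonstration, not measurements from this study.

```python
def has_falling_pattern(peaks_hz):
    """Return True if every phrase peak is lower than the one before it."""
    return all(
        later < earlier for earlier, later in zip(peaks_hz, peaks_hz[1:])
    )


# Hypothetical peak F0 values (Hz), one per prosodic phrase; not measured data.
l1_like_peaks = [210.0, 185.0, 160.0]  # steadily falling peaks
l2_like_peaks = [195.0, 205.0, 170.0]  # second peak rises above the first

l1_like_falling = has_falling_pattern(l1_like_peaks)
l2_like_falling = has_falling_pattern(l2_like_peaks)
print(l1_like_falling, l2_like_falling)
```

A strict comparison is used here; a tolerance in Hz or semitones could be added if small peak differences should not count as a violation.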
Furthermore, L2 English speakers show improper phrasing for one of three reasons: frequent use of an additional phrase component, use of fewer phrase components than L1 English speakers, or use of the same number of phrase components as L1 English speakers but with a different word distribution among the prosodic phrases. The analysis of phrase components shows that the largest share (36.4%) of the L2 English speakers’ utterances that involved improper phrasing is due to frequent use of an additional phrase component; that is, L2 English speakers tend to produce more prosodic phrases in English utterances than L1 English speakers.
On the other hand, the analysis of accent components at the PW level reveals that L1 English speakers stress content words but not function words, whereas L2 English (L1 Bengali) speakers stress both types of words; that is, L2 English speakers cannot properly differentiate content words from function words, and the distinction is ambiguous to them. L2 English speakers produce more prosodic words than L1 English speakers because of the redundant use of accent components or the use of additional accent components. For L2 English speakers, prominence is realized on words that are irrelevant to sentence structure or is not realized at all, resulting in improper prominence.
Considering all the results obtained from the analysis of F0 components at the prosodic levels (PPh, PW), it seems that the main reasons behind the defects in the prosodic patterns of L2 English speakers’ speech are improper phrasing, improper word level prominence, improper pitch patterns, ambiguous differences between clearly and not clearly phrased utterances, and ambiguous differences between content words and function words. The English speech of L1 Bengali speakers will have better (i.e. closer to L1 English) prosodic patterns if these deficiencies are removed.
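The three kinds of improper phrasing named in this discussion can be turned into a small decision rule. The sketch below is an assumption-laden illustration rather than the authors' procedure: it uses the number of prosodic phrases as a stand-in for the number of phrase components, and the function name `classify_phrasing` and its return labels are hypothetical. The example phrases are those reported for the Type 4 utterance.

```python
def classify_phrasing(l1_phrases, l2_phrases):
    """Label an L2 phrasing relative to the L1 phrasing of the same sentence.

    Assumption: one prosodic phrase corresponds to one phrase component.
    """
    if len(l2_phrases) > len(l1_phrases):
        return "additional phrase component"
    if len(l2_phrases) < len(l1_phrases):
        return "fewer phrase components"
    # Same count: compare which words fall into each phrase, ignoring case.
    l1_words = [phrase.lower().split() for phrase in l1_phrases]
    l2_words = [phrase.lower().split() for phrase in l2_phrases]
    if l1_words != l2_words:
        return "same number, different word distribution"
    return "proper phrasing"


# Prosodic phrases of the Type 4 utterance as reported in the text.
L1_TYPE4 = ["and at last", "the North Wind", "gave up the attempt"]
L2_TYPE4 = ["and at last the", "North Wind gave up", "the attempt"]

type4_label = classify_phrasing(L1_TYPE4, L2_TYPE4)
print(type4_label)
```

For the Type 4 utterance the rule reports the same number of phrases with a different word distribution, in line with the description of that example.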
7 Conclusions
This study reported the results of a systematic analysis of the various factors that influence L1 and L2 (L1 Bengali) English speech strategies at the discourse level. Between-group differences in discourse units with respect to discourse boundary breaks and the size of chunking and planning units represent the variation between L1 and L2 English discourse-level speech planning. Moreover, taking L1 English speakers as the norm, the between-group differences in consistency at paragraph boundaries (B4/B5) should be interpreted as inconsistency of L2 English speakers in realizing higher-level discourse-constrained chunking, continuation or termination among paragraphs as systematically and clearly as L1 English speakers. As a result, this inconsistency may also be considered a higher-level feature that may impede the overall intelligibility of L2 English speech. The results of this comparative study also represent the most common causes of deficiencies in the English speech planning of L1 Bengali speakers compared to L1 English speakers at the discourse level. The differences between speaker groups found in this study concern L2 English speakers’ slower speech rate, greater production of shorter intermediate chunking units and large-scale planning units, inconsistent positioning of paragraph boundaries (B4/B5), greater use of accent components at the PW level and phrase components at the PPh level, lower pitch range, improper phrasing, improper word level prominence and ambiguous difference between content words and function words in discourse-level speech planning.
Further analysis of the same data set may reveal other between-group differences at different levels of information sequencing and structure for L1 and L2 English speakers. The experiments need more samples and subjects, and future studies need to investigate in more detail the between-group differences in the temporal domain, the intensity domain and information planning. In addition, cross-L1 comparisons will be carried out in future to find whether the between-group differences represent universal L2 processing strategies and limitations. English prosody differs among individuals and varies widely across dialects. Therefore, how the English of an L1 Bengali speaker is affected by the speaker’s dialect will be investigated in a future study. Some differences in prosodic patterns are also presumed to depend on whether Bengali subjects understand the meaning of a sentence, so this is also an issue to be studied in the future.
Appendix 1: detailed information of L1 American English speakers
ID | Gender | Age (years) | Educational level
1 | Male | 23 | Undergraduate
2 | Male | 25 | Postgraduate
3 | Male | 21 | Undergraduate
4 | Male | 22 | Undergraduate
5 | Male | 27 | Postgraduate
6 | Female | 28 | Postgraduate
7 | Female | 24 | Postgraduate
8 | Female | 26 | Postgraduate
9 | Female | 22 | Undergraduate
10 | Female | 23 | Undergraduate
Appendix 2: detailed information of L1 Bengali speakers
ID | Gender | Age (years) | Educational level | Discipline | Number of years studied English
7 | Male | 25 | Postgraduate | Engineering | 10
18 | Male | 27 | Postgraduate | Engineering | 12
32 | Male | 22 | Undergraduate | Science | 10
39 | Female | 20 | Undergraduate | Engineering | 11
40 | Female | 23 | Undergraduate | Arts | 14
9 | Male | 22 | Undergraduate | Science | 12
31 | Male | 31 | PhD | Education | 21
10 | Male | 24 | Undergraduate | Engineering | 16
25 | Male | 34 | PhD | Engineering | 25
38 | Female | 32 | PhD | Engineering | 24
8 | Female | 27 | Postgraduate | Humanity | 19
14 | Male | 26 | Postgraduate | Arts | 20
1 | Male | 35 | PhD | Engineering | 27
26 | Male | 28 | Postgraduate | Science | 21
3 | Female | 29 | Postgraduate | Social Science | 23
13 | Female | 26 | Postgraduate | Engineering | 16
19 | Female | 32 | PhD | Engineering | 23
20 | Male | 35 | PhD | Engineering | 25
21 | Female | 28 | Postgraduate | Science | 20
4 | Male | 25 | Undergraduate | Engineering | 17
2 | Male | 27 | Postgraduate | Engineering | 18
23 | Male | 21 | Undergraduate | Law | 11
28 | Female | 32 | PhD | Engineering | 22
36 | Male | 25 | Undergraduate | Engineering | 16
35 | Female | 20 | Undergraduate | Humanity | 12
37 | Female | 25 | Postgraduate | Arts | 17
6 | Male | 29 | PhD | Science | 23
24 | Female | 31 | Undergraduate | Science | 26
34 | Female | 27 | Postgraduate | Engineering | 21
16 | Male | 33 | PhD | Engineering | 27
30 | Female | 35 | PhD | Social Science | 28
33 | Female | 24 | Undergraduate | Social Science | 17
12 | Male | 29 | PhD | Engineering | 22
11 | Female | 35 | PhD | Social Science | 23
17 | Female | 30 | Postgraduate | Science | 20
15 | Female | 29 | Postgraduate | Engineering | 21
29 | Female | 22 | Undergraduate | Engineering | 15
22 | Male | 26 | Postgraduate | Engineering | 19
27 | Female | 21 | Undergraduate | Science | 14
5 | Male | 35 | Postgraduate | Social Science | 29
Appendix 3: AESOP’s specified recording setup
The speech of L1 and L2 (L1 Bengali) English speakers was recorded using AESOP’s recording tool kit with AESOP’s specified recording platform. A detailed description of AESOP’s specified recording setup is given below.
• Recording environment
Recording can be conducted in a quiet room, such as a seminar room, lab, or classroom.
• Sound card
The Sennheiser PC155 comes equipped with a built-in sound card. No driver installation is required.
• Recording machine
Either a desktop PC or a laptop may be used. Connect the headset to the PC or laptop. The CUHK-SIAT recording tool is compatible with the following operating system: Windows XP Service Pack 2.
• Recording tool kit
The CUHK-SIAT recording tool was developed by the Chinese University of Hong Kong, in collaboration with Shenzhen Institutes of Advanced Technology. It was subsequently modified to fit the requirements of this project.
• Audio file format
Source: microphone
Sampling rate: 16 kHz
Bit depth: 16-bit
Channel: mono
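A recording can be checked against the audio file format listed above (16 kHz, 16-bit, mono) before analysis. The sketch below is not part of the AESOP or CUHK-SIAT tooling; it assumes the recordings are stored as uncompressed WAV files and uses only Python's standard `wave` module. The helper names `read_format`, `matches_spec` and `write_silence` are illustrative, and the demonstration files are generated silence rather than real recordings.

```python
import os
import tempfile
import wave

# Target format from the recording setup: 16 kHz, 16-bit (2 bytes), mono.
EXPECTED = {"sample_rate_hz": 16000, "sample_width_bytes": 2, "channels": 1}


def read_format(path):
    """Read sampling rate, sample width and channel count from a WAV file."""
    with wave.open(path, "rb") as wav:
        return {
            "sample_rate_hz": wav.getframerate(),
            "sample_width_bytes": wav.getsampwidth(),
            "channels": wav.getnchannels(),
        }


def matches_spec(path):
    """Return True if the WAV file has exactly the expected format."""
    return read_format(path) == EXPECTED


def write_silence(path, seconds=0.1, sample_rate=16000, channels=1, sample_width=2):
    """Write a short silent WAV file (demonstration helper only)."""
    n_frames = int(seconds * sample_rate)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00" * n_frames * channels * sample_width)


with tempfile.TemporaryDirectory() as tmp_dir:
    good_path = os.path.join(tmp_dir, "good.wav")
    bad_path = os.path.join(tmp_dir, "bad.wav")
    write_silence(good_path)
    write_silence(bad_path, sample_rate=44100, channels=2)
    good_ok = matches_spec(good_path)
    bad_ok = matches_spec(bad_path)

print(good_ok, bad_ok)
```

In practice `matches_spec` would be called on each recorded file, and files that fail the check would be re-recorded or resampled before prosodic analysis.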
References
Acharya, S., Mandal, D., & Kumar, S. (2011). Prosodic word boundary detection for Bangla based on empirical mode decomposition of F0 contour. In: Proceedings of Oriental COCOSDA International Conference on Speech Database and Assessments (pp. 1–5). Hsinchu, Taiwan.
Acharya, S., Mandal, D., & Kumar, S. (2013). Prosodic word and phrase boundary detection based on F0 contour analyses using empirical mode decomposition. In: Proceedings of Oriental COCOSDA International Conference on Speech Database and Assessments (pp. 1–5). Delhi, India.
Anderson-Hsieh, J., Johnson, R., & Koehler, K. (1992). The relationship between native speaker judgments of nonnative pronunciation and deviance in segmentals, prosody, and syllable structure. Language Learning, 42(4), 529–555.
Bansal, R. K. (1966). The intelligibility of Indian English: Measures of the intelligibility of connected speech, and sentence and word material, presented to listeners of different nationalities. Unpublished Doctoral Dissertation, University of London.
Beckman, M. E., & Pierrehumbert, J. B. (1986). Intonational structure in Japanese and English. Phonology, 3(01), 255–309.
Bhattacharya, K. (1988). Bengali phonetic reader (Vol. 28). Mysore: Central Institute of Indian Languages.
Boersma, P., & Weenink, D. (2004). Retrieved May 15, 2015 from http://www.fon.hum.uva.nl/praat/.
Busa, M. G., & Urbani, M. (2011). A cross linguistic analysis of pitch range in English L1 and L2. In: Proceedings of International Congress of Phonetic Sciences (pp. 380–383). Hong Kong.
Chaudhary, S. (2009). Foreigners and foreign languages in India: A sociolinguistic history. Cambridge: Cambridge University Press.
Chen, H., Fang, W., & Tseng, C. (2016). Prosodic prompts and information planning units in continuous speech—Relative allocation and compensation of prosodic highlight. In: Proceedings of 12th Phonetic Conference of China (pp. 21–26). Tongliao, China.
Chen, H. K. Y., & Tseng, C. Y. (2015). Information content, weighting and distribution in continuous speech prosody-A cross-genre comparison. In: Proceedings of Oriental COCOSDA (pp. 75–80). Shanghai, China.
Chen, H. K. Y., & Wei-te Fang, C. Y. T. (2015). Advance Prosodic Indexing—Acoustic realization of prompted information projection in continuous speeches and discourses. In: Proceedings of Oriental COCOSDA (pp. 31–35). Shanghai, China.
Chen, H. K. Y., & Wei-te Fang, C. Y. T. (2016). The convergence of perceived prosodic highlight for discourse prosody. In: Proceedings of Speech Prosody (pp. 654–658). Boston.
Chen, S. W., Wang, B., & Xu, Y. (2009). Closely related languages, different ways of realizing focus. In: Proceedings of Interspeech (pp. 1007–1010). Brighton, United Kingdom.
Den Ouden, H., Noordman, L., & Terken, J. (2009). Prosodic realizations of global and local structure and rhetorical relations in read aloud news reports. Speech Communication, 51(2), 116–129.
Fujisaki, H., & Hirose, K. (1984). Analysis of voice fundamental frequency contours for declarative sentences of Japanese. Journal of the Acoustical Society of Japan (E), 5(4), 233–242.
Fujisaki, H., & Ohno, S. (1995). Analysis and modeling of fundamental frequency contours of Greek utterances. In: Proceedings of EUROSPEECH (Vol. 2, pp. 985–988). Madrid, Spain.
Fujisaki, H., Ohno, S., & Wang, C. (1998). A command-response model for F0 contour generation in multilingual speech synthesis. In: The Third ESCA/COCOSDA Workshop (ETRW) on Speech Synthesis. Jenolan Caves House, Blue Mountains, Australia.
Grosz, B. J., & Sidner, C. L. (1986). Attention, intentions, and the structure of discourse. Computational Linguistics, 12(3), 175–204.
Hayes, B., & Lahiri, A. (1991). Bengali intonational phonology. Natural Language & Linguistic Theory, 9(1), 47–96.
Hewings, M. (1995). Tone choice in the English intonation of nonnative speakers. International Review of Applied Linguistics in Language Teaching, 33(3), 251–265.
Hirose, K., & Fujisaki, H. (1982). Analysis and synthesis of voice fundamental frequency contours of spoken sentences. In: Proceedings of International Conference on Acoustics, Speech, and Signal Processing (Vol. 7, pp. 950–953). Paris, France.
Hirschberg, J., & Grosz, B. (1992). Intonational features of local and global discourse structure. In: Proceedings of the workshop on Speech and Natural Language (pp. 441–446). Stroudsburg, PA, USA.
Hirschberg, J., & Pierrehumbert, J. (1986). The intonational structuring of discourse. In: Proceedings of the 24th annual meeting on Association for Computational Linguistics (pp. 136–144). New York, USA.
House, J. (2013). Developing pragmatic competence in English as a lingua franca: Using discourse markers to express (inter) subjectivity and connectivity. Journal of Pragmatics, 59, 57–67.
Huang, N. E., Shen, Z., Long, S. R., Wu, M. C., Shih, H. H., Zheng, Q., Yen, N. C., Tung, C. C., & Liu, H. H. (1998). The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. In: Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences (Vol. 454, No. 1971, pp. 903–995).
Lehiste, I. (1975). The phonetic structure of paragraphs. In: Proceedings of Structure and process in speech perception (pp. 195–206). Berlin: Springer.
Lehiste, I. (1982). Some phonetic characteristics of discourse. Studia Linguistica, 36(2), 117–130.
Lewis, M. P., Simons, G. F., & Fennig, C. D. (2009). Ethnologue: Languages of the world (Vol. 9). Dallas, TX: SIL international.
Mann, W. C., & Thompson, S. A. (1988). Rhetorical structure theory: Toward a functional theory of text organization. Text-Interdisciplinary Journal for the Study of Discourse, 8(3), 243–281.
Mayer, J., Jasinskaja, E., & Kolsch, U. (2006). Pitch range and pause duration as markers of discourse hierarchy: perception experiments. In: Proceedings of International Speech Communication Association (pp. 473–476). Pittsburgh, Pennsylvania.
Meng, H., Tseng, C. Y., Kondo, M., Harrison, A. M., & Visceglia, T. (2009). Studying L2 suprasegmental features in Asian Englishes: a position paper. In: Proceedings of International Speech Communication Association (pp. 1715–1718). Brighton.
Mondonedo, M. R. (1999). Handbook of the International Phonetic Association. A Guide to the Use of the international Phonetic Alphabet. Cambridge: Cambridge University Press.
Nariai, T., & Tanaka, K. (2008). A study of pitch patterns of Japanese English analyzed via comparative linguistic features of English and Japanese. In: Proceedings of International Speech Communication Association (pp. 776–779). Brisbane, Australia.
Narusawa, S., Minematsu, N., Hirose, K., & Fujisaki, H. (2002). A method for automatic extraction of model parameters from fundamental frequency contours of speech. In: Proceedings of Acoustics, Speech, and Signal Processing (Vol. 1, pp. 506–509). Orlando, Florida.
Pickering, L. (2001). The role of tone choice in improving ITA communication in the classroom. TESOL Quarterly, 35(2), 233–255.
Pickering, L. (2004). The structure and function of intonational paragraphs in native and nonnative speaker instructional discourse. English for Specific Purposes, 23(1), 19–43.
Pierrehumbert, J. B. (1980). The phonology and phonetics of English intonation. Doctoral Dissertation, Massachusetts Institute of Technology.
Rilling, G., Flandrin, P., & Goncalves, P. (2003). On empirical mode decomposition and its algorithms. In: IEEE-EURASIP workshop on nonlinear signal and image processing (Vol. 3, pp. 8–11). Grado, Italy.
Roach, P. (1998). English phonetics and phonology: A practical course (2nd edn.). Cambridge: Cambridge University Press.
Silverman, K. E. A. (1987). The structure and processing of fundamental frequency contours. Unpublished Doctoral Dissertation, University of Cambridge, Cambridge, U.K.
Swerts, M. (1997). Prosodic features at discourse boundaries of different strength. The Journal of the Acoustical Society of America, 101(1), 514–521.
Tseng, C. Y. (2006). Higher level organization and discourse prosody. In: Proceedings of the Second International Symposium on Tonal Aspects of Languages (pp. 23–34). La Rochelle, France.
Tseng, C. Y., Cheng, Y. C., & Chang, C. (2005). Sinica COSPRO and Toolkit—Corpora and platform of Mandarin Chinese fluent speech. In: Proceedings of Oriental COCOSDA International Conference on Speech Database and Assessments (pp. 23–28). Indonesia.
Tseng, C. Y., & Su, C. Y. (2014). L2 discourse and information planning and their prosodic implications. In: Proceedings of Oriental COCOSDA International Conference on Speech Database and Assessments (pp. 65–70). Phuket, Thailand.
Tyler, A., & Davies, C. (1990). Cross-linguistic communication missteps. Text-Interdisciplinary Journal for the Study of Discourse, 10(4), 385–412.
Tyler, A. E., Jefferies, A. A., & Davies, C. E. (1988). The effect of discourse structuring devices on listener perceptions of coherence in non-native university teacher’s spoken discourse. World Englishes, 7(2), 101–110.
Visceglia, T., Su, C. Y., & Tseng, C. Y. (2012). Comparison of English narrow focus production by L1 English, Beijing and Taiwan Mandarin speakers. In: Proceedings of Oriental COCOSDA International Conference on Speech Database and Assessments (pp. 47–51). Macau, China.
Visceglia, T., Tseng, C. Y., Su, Z. Y., & Huang, C. F. (2010). Discourse prosody planning in L1 and L2 English. In: Proceedings of Oriental COCOSDA International Conference on Speech Database and Assessments (pp. 24–25). Kathmandu, Nepal.
Wennerstrom, A. (1994). Intonational meaning in English discourse. Applied Linguistics, 15(4), 399–420.
Wennerstrom, A. (1998). Intonation as cohesion in academic discourse. Studies in Second Language Acquisition, 20(01), 1–25.
Xiaoli, J., Xia, W., & Aijun, L. (2009). Intonation patterns of yes-no questions for Chinese EFL learners. In: Proceedings of Oriental COCOSDA International Conference on Speech Database and Assessments (pp. 47–51). Urumqi.
Yule, G. (1980). Speakers’ topics and major paratones. Lingua, 52(1–2), 33–47.