UCL talk, Feb 11th

Talk at University College London Speech Science Forum. Link is here. Slides for the talk are here. Recording is available here (Access Passcode: 71&Gx#4E)

Gestural coordination in the living lexicon of spoken words

Language varieties show variety-specific patterns of gestural coordination, where gestures are forces (dynamics) that exert control over articulatory movements (kinematics); see, e.g., Browman & Goldstein (1986). By hypothesis, the dimensions of gestural control are those that serve phonological function, e.g., supporting contrast in the lexicon.

I start by illustrating this point with a comparison of Russian palatalized consonants, e.g., /pʲ/, /bʲ/, /mʲ/, with articulatorily similar English sequences /pj/, /bj/, /mj/. High temporal resolution articulatory tracking, using Electromagnetic Articulography (EMA), reveals systematic differences in coordination corresponding to differing phonological functions: complex segments (Russian) vs. segment sequences (English).

I next present cases in which linguistic context conditions systematic changes in gestural coordination. First, in Tokyo Japanese, high vowel devoicing can trigger the categorical loss of a lingual gesture for the vowel and subsequent reorganization of gestural coordination (Shaw & Kawahara 2018, 2021). Second, in Mandarin Chinese, certain morpho-syntactic environments condition a shift in gestural timing, which shortens syllable duration and precipitates a loss of lexical tone. This last case is particularly informative when compared with diaspora Tibetan, where tone loss has proceeded without gestural reorganization (Geissler et al., 2021). These patterns are consistent with a characterization of the human lexicon in terms of a relatively small number of gestures and coordination modes, organized to support phonological function and sensitive to linguistic context.

I close by presenting two additional cases, also drawn from Mandarin and Japanese, that challenge the completeness of this view of the lexicon, showing both (1) that the lexicon absorbs contextual prosodic influences, leading to gradient shifts in phonetic form (Tang & Shaw, 2021) and (2) that words can resist influences of prosodic context (Kawahara, Shaw, & Ishihara, 2021). Taken together, the data suggest that a low-dimensional characterization of the lexicon in terms of discrete gestures and coordination modes co-exists with a representation of higher-dimensional phonetic parameterization.


Browman, C., & Goldstein, L. (1986). Towards an Articulatory Phonology. Phonology Yearbook, 3, 219-252.

Geissler, C., Shaw, J. A., Fang, H., & Tiede, M. (2021). Eccentric C-V timing across speakers of diaspora Tibetan with and without lexical tone contrasts. Proceedings of the 12th International Seminar on Speech Production, Yale University, 4 pp.

Kawahara, S., Shaw, J.A., & Ishihara, S. (2021). Assessing the prosodic licensing of wh-in-situ in Japanese: A computational-experimental approach. Natural Language & Linguistic Theory. https://doi.org/10.1007/s11049-021-09504-3

Shaw, J. A., & Kawahara, S. (2018). The lingual articulation of devoiced /u/ in Tokyo Japanese. Journal of Phonetics, 66, 100-119. https://doi.org/10.1016/j.wocn.2017.09.007

Shaw, J. A., & Kawahara, S. (2021). More on the articulation of devoiced /u/ in Tokyo Japanese: effects of surrounding consonants. Manuscript, Yale University and Keio University. 47 pp.

Tang, K., & Shaw, J. A. (2021). Prosody leaks into the memories of words. Cognition, 210, 104601. https://doi.org/10.1016/j.cognition.2021.104601


New Glossa paper

Mandal, S., Best, C. T., Shaw, J. A., & Cutler, A. (2020). Bilingual phonology in dichotic perception: A case study of Malayalam and English voicing. Glossa: a journal of general linguistics 5(1):73. 1-17. DOI: https://doi.org/10.5334/gjgl.853

Abstract: Listeners often experience cocktail-party situations, encountering multiple ongoing conversations while tracking just one. Capturing the words spoken under such conditions requires selective attention and processing, which involves using phonetic details to discern phonological structure. How do bilinguals accomplish this in L1-L2 competition? We addressed that question using a dichotic listening task with fluent Malayalam-English bilinguals, in which they were presented with synchronized nonce words, one in each language in separate ears, with competing onsets of a labial stop (Malayalam) and a labial fricative (English), both voiced or both voiceless. They were required to attend to the Malayalam or the English item, in separate blocks, and report the initial consonant they heard. We found that perceptual intrusions from the unattended to the attended language were influenced by voicing, with more intrusions on voiced than voiceless trials. This result supports our proposal for the feature specification of consonants in Malayalam-English bilinguals, which makes use of privative features, underspecification and the “standard approach” to laryngeal features, as against “laryngeal realism”. Given this representational account, we observe that intrusions result from phonetic properties in the unattended signal being assimilated to the closest matching phonological category in the attended language, and are more likely for segments with a greater number of phonological feature specifications.



New JASA paper

Shaw, J. A., & Tyler, M. D. (2020). Effects of vowel coproduction on the timecourse of tone recognition. The Journal of the Acoustical Society of America, 147(4), 2511-2524. pdf

Abstract: Vowel contrasts tend to be perceived independently of pitch modulation, but it is not known whether pitch can be perceived independently of vowel quality. This issue was investigated in the context of a lexical tone language, Mandarin Chinese, using a printed word version of the visual world paradigm. Eye movements to four printed words were tracked while listeners heard target words that differed from competitors only in tone (test condition) or also in onset consonant and vowel (control condition). Results showed that the timecourse of tone recognition is influenced by vowel quality for high, low, and rising tones. For these tones, the time for the eyes to converge on the target word in the test condition (relative to control) depended on the vowel with which the tone was coarticulated, with /a/ and /i/ supporting faster recognition of high, low, and rising tones than /u/. These patterns are consistent with the hypothesis that tone-conditioned variation in the articulation of /a/ and /i/ facilitates rapid recognition of tones. The one exception to this general pattern, the absence of any effect of vowel quality on falling tone perception, may be due to fortuitous amplification of the harmonics relevant for pitch perception in this context.

Talk at BLS

My talk at the Berkeley Linguistics Society workshop “Phonological representations: at the crossroad between gradience and categoricity” (Feb 7-8) was entitled “Finding phonological structure in vowel confusions across English accents”. The talk draws a connection between some collaborative work on cross-accent speech perception (Shaw et al. 2018, 2019) and contrastive feature hierarchies, in the sense of Dresher (2009).

The slides are available here.




New paper in Frontiers

“Spatially Conditioned Speech Timing: Evidence and Implications” is part of the Frontiers research topic “Models and Theories of Speech Production”. The paper provides evidence that the temporal coordination of articulatory gestures in speech is sensitive to the moment-by-moment location of speech organs (tongue, lips), a result which has implications for mechanisms of speech motor control, including the balance between feed-forward and state-based feedback control.


Patterns of relative timing between consonants and vowels appear to be conditioned in part by phonological structure, such as syllables, a finding captured naturally by the two-level feedforward model of Articulatory Phonology (AP). In AP, phonological form (gestures and the coordination relations between them) receives an invariant description at the inter-gestural level. The inter-articulator level actuates gestures, receiving activation from the inter-gestural level and resolving competing demands on articulators. Within this architecture, the inter-gestural level is blind to the location of articulators in space. A key prediction is that inter-gestural timing is stable across variation in the spatial position of articulators. We tested this prediction by conducting an Electromagnetic Articulography (EMA) study of Mandarin speakers producing CV monosyllables, consisting of labial consonants and back vowels in isolation. Across observed variation in the spatial position of the tongue body before each syllable, we investigated whether inter-gestural timing between the lips, for the consonant, and the tongue body, for the vowel, remained stable, as is predicted by feedforward control, or whether timing varied with the spatial position of the tongue at the onset of movement. Results indicated a correlation between the initial position of the tongue gesture for the vowel and C-V timing, suggesting that inter-gestural timing is sensitive to the position of the articulators, possibly relying on somatosensory feedback. Implications of these results and possible accounts within the Articulatory Phonology framework are discussed.

Shaw, J. A., & Chen, W.-r. (2019). Spatially Conditioned Speech Timing: Evidence and Implications. Frontiers in Psychology, 10, 2726. https://doi.org/10.3389/fpsyg.2019.02726
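
For readers who want to see the logic of the test described in the abstract, here is a minimal sketch. It is not the analysis code from the paper. The numbers are invented toy values, constructed so that lag shrinks as distance grows, and the names `pearson_r`, `tongue_distance_mm`, and `cv_lag_ms` are hypothetical. The idea is simply that strict feedforward control predicts no relationship between where the tongue starts and C-V timing (r near 0), whereas spatially conditioned timing predicts a reliable correlation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented toy values for illustration only, NOT data from the study.
# Hypothetical distance of the tongue body from the vowel target at movement onset (mm).
tongue_distance_mm = [4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
# Hypothetical C-V lag: vowel gesture onset minus consonant gesture onset (ms).
cv_lag_ms = [60.0, 52.0, 49.0, 41.0, 33.0, 30.0]

r = pearson_r(tongue_distance_mm, cv_lag_ms)
print(f"r = {r:.2f}")
```

With these toy values the correlation is strongly negative by construction. A real analysis would also need to handle repeated measures within speakers and items, which a single pooled correlation like this one ignores.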

AMP talk & poster: Oct 11

I’ll be representing a couple of research projects at the Annual Meeting on Phonology (AMP).

Titles and links to abstracts are below:

Poster: Kevin Tang (University of Florida) and Jason Shaw (Yale University). Sentence prosody leaks into the lexicon: evidence from Mandarin Chinese

Talk: Shigeto Kawahara (Keio University), Jason Shaw (Yale University) and Shinichiro Ishihara (Lund University). Do Japanese speakers always prosodically group wh-elements and their licenser? Implications for Richards’ (2010) theory of wh-movement 

USC colloquium talk: Sept 23

Title and abstract from colloquium talk at USC, Sept 23, 2019:

The temporal geometry of phonology

Abstract: Languages differ in how the spatial dimensions of the vocal tract, i.e., constriction location/degree, are organized to express phonological form. Languages also differ in temporal geometry, i.e., how sequences of vocal tract constrictions are organized in time. The most comprehensive accounts of temporal organization to date have been developed within the Articulatory Phonology framework, where phonological representations take the form of temporally coordinated action units, known as gestures (Browman & Goldstein, 1986; Gafos & Goldstein, 2012; Goldstein & Pouplier, 2014). A key property of Articulatory Phonology is the feed-forward control of articulation by ensembles of temporally organized gestures.

In this talk, I first make explicit how the temporal geometry of phonology conditions language-specific patterns of phonetic variation. Through computational simulation, I illustrate how distinct temporal geometries for syllable types and segment types (complex segments vs. segment sequences) structure phonetic variation. Model predictions are tested on experimental phonetic data from English (Shaw, Durvasula, & Kochetov, 2019; Shaw & Gafos, 2015), Arabic (Shaw, Gafos, Hoole, & Zeroual, 2011), Japanese (Shaw & Kawahara, 2018) and Russian (Kochetov, 2006; Shaw et al., 2019). Phonological structure formalized as ensembles of local coordination relations between articulatory gestures (Gafos, 2002) and implemented in stochastic models (Gafos, Charlow, Shaw, & Hoole, 2014; Shaw & Gafos, 2015) reliably describes patterns of temporal variation in these languages. These results crucially rely on feed-forward control of gestures. I close with data from Mandarin Chinese which present a potential challenge to strict feed-forward control. Unexpectedly, inter-gestural coordination in Mandarin appears to be sensitive to the spatial position of articulators: gestures begin earlier in time just when they are farther in space from their target. To account for the Mandarin data, I explore the possibility that gestures are temporally organized according to spatial targets, which requires a combination of feedback and feed-forward control, and discuss some implications of the proposal for speech perception and sound change.


Browman, C., & Goldstein, L. (1986). Towards an Articulatory Phonology. Phonology Yearbook, 3, 219-252.

Gafos, A. I., Charlow, S., Shaw, J. A., & Hoole, P. (2014). Stochastic time analysis of syllable-referential intervals and simplex onsets. Journal of Phonetics, 44, 152-166.

Gafos, A. (2002). A grammar of gestural coordination. Natural Language and Linguistic Theory, 20, 269-337.

Gafos, A., & Goldstein, L. (2012). Articulatory representation and organization. In A. C. Cohn, C. Fougeron, & M. K. Huffman (Eds.), The Oxford Handbook of Laboratory Phonology (pp. 220-231).

Goldstein, L., & Pouplier, M. (2014). The Temporal Organization of Speech. The Oxford handbook of language production, 210-240.

Kochetov, A. (2006). Syllable position effects and gestural organization: Articulatory evidence from Russian. In L. G. Goldstein, D. H. Whalen, & C. Best (Eds.), Laboratory Phonology 8 (pp. 565-588). Berlin: de Gruyter.

Shaw, J. A., Durvasula, K., & Kochetov, A. (2019). The temporal basis of complex segments. In Sasha Calhoun, Paola Escudero, Marija Tabain & Paul Warren (eds.) Proceedings of the 19th International Congress of Phonetic Sciences, Melbourne, Australia 2019 (pp. 676-680). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

Shaw, J. A., & Gafos, A. I. (2015). Stochastic Time Models of Syllable Structure. PLoS One, 10(5), e0124714.

Shaw, J. A., Gafos, A. I., Hoole, P., & Zeroual, C. (2011). Dynamic invariance in the phonetic expression of syllable structure: a case study of Moroccan Arabic consonant clusters. Phonology, 28(3), 455-490.

Shaw, J. A., & Kawahara, S. (2018). The lingual articulation of devoiced /u/ in Tokyo Japanese. Journal of Phonetics, 66, 100-119. https://doi.org/10.1016/j.wocn.2017.09.007