Coralie Cram and I have a poster at the LSA in New York (Jan 4-8, 2024) looking into variation in palatal stops in 11 Australian languages. This is proof-of-concept work looking at the structure of variation in duration, release burst presence, and intensity contours for palatals. It’s also a chance for us to figure out a pipeline for the corpus materials we are collecting and preparing.

Here’s the full poster: LSA2024_Palatals


Sample sizes for Australian phonetics corpora

Coralie Cram and I have been trying to figure out how many tokens are a safe minimum for the corpora we are investigating. I have an ongoing project on comparative phonetics of Australian languages and one of the issues we continually come across is how much data we need, and whether we should include corpora with relatively small amounts of data, or should stick to the largest corpora. Of course, we understand that the answer is likely to be “it depends” – it depends on the clarity of recordings and the number of speakers, for example. But we should still be able to come up with some general guidelines: is 100 tokens enough, for example? Or does it need to be more like 500? This poster has some preliminary findings, using data from my Bardi corpus. Here’s a link to the full poster.


In the poster, we take two datasets for Bardi: the wordlist data that was used for the 2012 Bardi JIPA sketch and a set of narrative recordings. The narratives are much more like the general-purpose field collections that we are working with from archives, while the wordlist dataset is smaller but has higher-quality recordings and more careful speech.

Methods are on the poster, but in brief they involve taking different subsets of the full dataset and using the Kolmogorov-Smirnov test to evaluate differences in sample means and variance. We looked at short vowels and made no attempt to remove mistracked formants or outliers.
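For readers who want a feel for the general idea, here is a minimal sketch of that kind of subsampling check. It is not the poster's actual pipeline: the formant values are synthetic, the function names (`ks_statistic`, `ks_pvalue`, `share_of_matching_subsamples`) are made up for illustration, and the p-value uses the standard large-sample approximation rather than an exact test. It draws random subsets of a dataset at a given proportion and counts how often a two-sample Kolmogorov-Smirnov test fails to distinguish the subset from the full set.

```python
import bisect
import math
import random


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic D: the largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in set(a) | set(b):
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_pvalue(d, n, m):
    """Asymptotic two-sided p-value (Kolmogorov distribution); rough for small samples."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.1:  # the series converges slowly here, and the true value is ~1
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, 101))
    return max(0.0, min(1.0, 2 * total))


def share_of_matching_subsamples(full, fraction, trials=100, alpha=0.05, seed=1):
    """Proportion of random subsets that the KS test does not separate from `full`.

    Caveat: each subset is drawn from `full`, so the two samples are not
    independent; treat the result as a rough stability check only.
    """
    rng = random.Random(seed)
    k = max(2, round(fraction * len(full)))
    matches = 0
    for _ in range(trials):
        subset = rng.sample(full, k)
        p = ks_pvalue(ks_statistic(subset, full), len(subset), len(full))
        if p >= alpha:
            matches += 1
    return matches / trials


# Synthetic stand-in for F1 measurements (Hz) of one short vowel; not real Bardi data.
rng = random.Random(0)
fake_f1 = [rng.gauss(500, 60) for _ in range(800)]

for fraction in (0.1, 0.3, 0.5):
    share = share_of_matching_subsamples(fake_f1, fraction)
    print(f"{fraction:.0%} of tokens: {share:.0%} of subsets match the full set")
```

With real data you would run this separately for each vowel and formant, and probably with more trials. A library routine such as SciPy's `ks_2samp` would do the same job as the hand-rolled functions here; they are written out only to keep the sketch dependency-free.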

We found that for the wordlist data, we needed more than about 50% of the data (so, about 400 tokens across 4 vowels) to replicate the sample characteristics of the full dataset. For the narrative data, it was about 30% of the data (more like 2300 tokens), but the variance was much higher.

