LSA2024-Palatals

Posted on January 3, 2024 by Bowern, Claire

Coralie Cram and I have a poster at the LSA in New York (Jan 4-8 2024) looking into variation in palatal stops in 11 Australian languages. This is proof-of-concept work looking at the structure of variation in duration, release burst presence, and intensity contours for palatals. It’s also a chance for us to figure out a pipeline for the corpus materials we are collecting and preparing.

Here’s the full poster: LSA2024_Palatals

References

Babinski, Sarah. 2020. Lexical stress: Phonetic variation under phonological stability in 23 Australian languages. In LabPhon17. online.

Babinski, Sarah. 2022. Archival Phonetics & Prosodic Typology in Sixteen Australian Languages. Yale University.

Dixon, R. M. W. 1980. The languages of Australia (Cambridge Language Surveys). Cambridge: Cambridge University Press.

Fletcher, Janet & Andrew Butcher. 2014. Sound patterns of Australian languages. In Harold Koch & Rachel Nordlinger (eds.), The Languages and Linguistics of Australia: A Comprehensive Guide. Walter de Gruyter GmbH & Co KG.

Round, Erich R. 2023. Segment inventories. In Claire Bowern (ed.), The Oxford Guide to Australian Languages (Oxford Guides to the World’s Languages), Chapter 10. Oxford, New York: Oxford University Press.

Tabain, Marija. 2023. Articulatory and acoustic phonetics. In Claire Bowern (ed.), The Oxford Guide to Australian Languages (Oxford Guides to the World’s Languages), Chapter 9. Oxford, New York: Oxford University Press.

Tabain, Marija & Richard Beare. 2011. A Spectral Analysis of Stop Bursts in Pitjantjatjara. In ICPhS, 1934–1937.

Sample sizes for Australian phonetics corpora

Posted on January 3, 2024 by Bowern, Claire

Coralie Cram and I have been trying to figure out how many tokens are a safe minimum for the corpora we are investigating. I have an ongoing project on comparative phonetics of Australian languages and one of the issues we continually come across is how much data we need, and whether we should include corpora with relatively small amounts of data, or should stick to the largest corpora. Of course, we understand that the answer is likely to be “it depends” – it depends on the clarity of recordings and the number of speakers, for example. But we should still be able to come up with some general guidelines: is 100 tokens enough, for example? Or does it need to be more like 500? This poster has some preliminary findings, using data from my Bardi corpus. Here’s a link to the full poster.

LSA2024_Bardi

In the poster, we take two datasets for Bardi: the wordlist data that was used for the 2012 Bardi JIPA sketch (DOI: https://doi.org/10.1017/S0025100312000217) and a set of narrative recordings. The narratives are much more like the general-purpose field collections that we are working with from archives, while the wordlist data is a smaller set of higher-quality recordings of more careful speech.

Methods are on the poster, but in brief they involve taking different subsets of the full dataset and using the Kolmogorov-Smirnov test to evaluate differences in sample means and variance. We looked at short vowels and made no attempt to remove mistracked formants or outliers.
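For readers who want a feel for the subsampling idea, here is a minimal sketch in Python. It is not the code behind the poster (the references point to R packages), and the details are assumptions: the function names are made up for illustration, the formant values are synthetic rather than Bardi data, and the decision rule uses the standard large-sample critical value for the two-sample Kolmogorov-Smirnov statistic at alpha = 0.05. Because each subset is drawn from the full dataset itself, the two samples are not independent, so the rejection rates should be read as a rough guide only.

```python
import bisect
import math
import random


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The largest gap between two step functions occurs at an observed value.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_critical_value(n, m, c_alpha=1.358):
    """Large-sample critical value; c_alpha = 1.358 corresponds to alpha = 0.05."""
    return c_alpha * math.sqrt((n + m) / (n * m))


def rejection_rate(full, fraction, trials=200, seed=1):
    """Share of random subsets of the given size that differ detectably from the full set."""
    rng = random.Random(seed)
    k = max(2, round(fraction * len(full)))
    rejections = 0
    for _ in range(trials):
        subset = rng.sample(full, k)  # sampling without replacement
        if ks_statistic(subset, full) > ks_critical_value(k, len(full)):
            rejections += 1
    return rejections / trials


def demo():
    # Synthetic stand-in for F1 measurements of one vowel (Hz); NOT real data.
    rng = random.Random(42)
    full = [rng.gauss(500.0, 80.0) for _ in range(800)]
    for fraction in (0.05, 0.1, 0.3, 0.5):
        rate = rejection_rate(full, fraction, trials=50)
        print(f"{fraction:.0%} of tokens: rejection rate {rate:.2f}")


if __name__ == "__main__":
    demo()
```

In practice one would run something like this separately for each vowel and each formant, and would probably want a library implementation of the test rather than a hand-rolled one.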

We found that for the wordlist data, we needed more than about 50% of the data (so, about 400 tokens across 4 vowels) to replicate the sample characteristics of the full dataset. For the narrative data, it was about 30% of the data (more like 2300 tokens), but the variance was much higher.

References

Arnold, T. B., Emerson, J. W., & R Core Team and contributors worldwide. (2022). dgof: Discrete Goodness-of-Fit Tests (1.4). https://cran.r-project.org/web/packages/dgof/index.html

Bombien, L., & Winkelmann, R. (2023). Formants and their bandwidths. https://cran.r-project.org/web/packages/wrassp/vignettes/wrassp_intro.html

Dockum, R., & Bowern, C. (2019). Swadesh wordlists are not long enough. Language Documentation and Description, 16. https://doi.org/10.25894/ldd112

Stanley, J. A., & Sneller, B. (2023). Sample size matters in calculating Pillai scores. The Journal of the Acoustical Society of America, 153(1), 54–67. https://doi.org/10.1121/10.0016757

Whalen, D. H., & McDonough, J. (2015). Taking the Laboratory into the Field. Annual Review of Linguistics, 1(1), 395–415. https://doi.org/10.1146/annurev-linguist-030514-124915

Whalen, D. H., DiCanio, C., & Dockum, R. (2022). Phonetic documentation in three collections: Topics and evolution. Journal of the International Phonetic Association, 52(1), 95–121. https://doi.org/10.1017/S0025100320000079

Two recent Voynich talks

Posted on July 15, 2021 by Bowern, Claire

I’ve started to do enough public Voynich-related activities that it is time to add a page for them.

Paradisec @100

Posted on March 8, 2021 by Bowern, Claire

I recently gave a talk at Paradisec’s conference in honour of their 100th terabyte of data. The YouTube version is here. It’s a companion to the ICLDC talk, and it looks at digital archiving and potential issues in using archival materials.

ICLDC

Posted on March 8, 2021 by Bowern, Claire

Sarah Babinski and I recently gave a talk on Digital Linguistics and the Archive. The actual talk is here on YouTube. We talked about how using archives for language work is not as straightforward as it could be, and about some of the most important issues that can be addressed.

Claire Bowern

Professor, Yale Linguistics

Category Archives: Research
