Program
Linguistics
College
Arts and Sciences
Student Level
Doctoral
Location
Student Union Building, Ballroom C
Start Date
8-11-2021 11:00 AM
End Date
8-11-2021 1:00 PM
Abstract
Linguistic research and language instruction have benefited from linguistic corpora (digital databases containing numerous texts written in one language). For low-resource languages where written records are scarce, creating a corpus can be a challenge, but it has the benefit of supporting conservation and revitalization. My project consists of creating a corpus of Diné Bizaad (Navajo) based on existing, publicly available narratives written in the 1950s. I present examples from the corpus and discuss challenges faced in annotating the first 5000 words. One problem that pertains to Native American languages in general is that only a few speakers are trained in linguistic analysis. Learning how to disentangle the meaning in words is necessary for creating a grammatically annotated corpus, and it requires long training that is offered only at certain institutions. Another challenge results from the particular structure of the language: even for linguists, the annotation of Navajo words is not straightforward, since grammatical rules are word-specific. Unlike in English or Spanish, word components in Navajo carry both lexical and grammatical information, which means that every word has a different grammatical pattern. Resources like the Navajo Dictionary and the Analytical Lexicon help resolve ambiguities in annotation. In addition, students and instructors of the Navajo Language Program at UNM have sufficient expertise to collaborate on this project. With the free software FieldWorks Language Explorer and the repository Language Depot, the corpus can be accessed, downloaded, and expanded. The goal of building this corpus is to facilitate empirical research on Navajo and to improve education by providing enough data to develop exercises and assignments.
Lukas's Poster
Creating a Corpus of Navajo Historical Narratives: Prospects and Challenges