•   When: Thursday, February 1, 2024, from 10:30 AM to 11:30 AM
•   Speakers: Heather Lent
•   Location: ENGR 4201

Abstract: Only a handful of the world’s languages benefit from modern language technology, such as online search and automatic translation. Because state-of-the-art solutions require massive amounts of data, many languages will continue to be left behind, as smaller communities cannot generate enough textual data to power modern NLP approaches. In this talk, we will focus on Creoles, a collection of languages spoken by many communities across the globe, which evolved from historical linguistic contact between unrelated languages. Notably, Creoles are largely stigmatised due to their historical ties with colonisation and slavery, and the consequences of this stigmatisation are also evident in NLP today, as resources and technologies for these languages are almost non-existent. Considering this context, we will discuss the specific challenges of developing NLP for Creoles and our efforts to create resources and viable models for Creoles across a variety of tasks, and finally argue that Creoles offer unique possibilities for understanding the underlying mechanisms of transfer learning.

 

Bio: Heather Lent is a postdoctoral researcher at Aalborg University in Denmark. Her primary research interests pertain to “low resource” scenarios within NLP (i.e., overcoming the limitations of working with insufficient data). In conjunction with this, Heather is particularly interested in strengthening areas of NLP that could be described as “vulnerable”, whether that be NLP for vulnerable languages (e.g., Creoles) or addressing the vulnerabilities inherent to the NLP models we create (e.g., backdoor attacks), so that safe NLP can be available to all. Prior to her postdoc, Heather completed her PhD in NLP at the University of Copenhagen in 2022, where she first began her work in Creole NLP. Before that, she had several years’ experience working in bio/medical NLP, another under-resourced domain.

 
