Reducing Confusion in Active Learning for Part-Of-Speech Tagging

Active learning (AL) uses a data selection algorithm to select useful training samples to minimize annotation cost. This is now an essential tool for building low-resource syntactic analyzers such as part-of-speech (POS) taggers. Existing AL …

Automatic Extraction of Rules Governing Morphological Agreement

Creating a descriptive grammar of a language is an indispensable step for language documentation and preservation. However, at the same time it is a tedious, time-consuming task. In this paper, we take steps towards automating this process by …

It's not a Non-Issue: Negation as a Source of Error in Machine Translation

As machine translation (MT) systems progress at a rapid pace, questions of their adequacy linger. In this study we focus on negation, a universal, core property of human language that significantly affects the semantics of an utterance. We …

X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models

Language models (LMs) have proven surprisingly successful at capturing factual knowledge by completing cloze-style fill-in-the-blank questions such as *Punta Cana is located in _blank_.* However, while knowledge is both written and queried in many …
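Cloze-style querying of this kind can be sketched with the Hugging Face `transformers` fill-mask pipeline; this is a minimal illustration of the idea, not the X-FACTR evaluation code, and the choice of `bert-base-uncased` is only an example:

```python
from transformers import pipeline

# A masked LM fills in the [MASK] token of a cloze-style prompt,
# returning candidate completions ranked by probability.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

results = unmasker("Punta Cana is located in [MASK].")
for candidate in results:
    # Each candidate carries the predicted token and its score.
    print(candidate["token_str"], round(candidate["score"], 3))
```

Multilingual probing as in X-FACTR amounts to posing such prompts in many languages (with multi-token answers handled specially) and checking the predictions against a knowledge base.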

Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages

Multilingual NLP

An exciting research direction that we pursue at GMU NLP is building multilingual and polyglot systems. The world's languages often share similar characteristics, and training systems cross-lingually allows us to leverage these similarities and overcome data-scarcity issues.

Optimizing Data Usage via Differentiable Rewards

Predicting Performance for Natural Language Processing Tasks

Given the complexity of combinations of tasks, languages, and domains in natural language processing (NLP) research, it is computationally prohibitive to exhaustively test newly proposed models on each possible experimental setting. In this work, we …

Should All Cross-Lingual Embeddings Speak English?

Most recent work in cross-lingual word embeddings is severely Anglocentric. The vast majority of lexicon induction evaluation dictionaries are between English and another language, and the English embedding space is selected by default as the hub …

The CMU-LTI submission to the SIGMORPHON 2020 Shared Task 0: Language-Specific Cross-Lingual Transfer

This paper describes the CMU-LTI submission to the SIGMORPHON 2020 Shared Task 0 on typologically diverse morphological inflection. The (unrestricted) submission uses the cross-lingual approach of our last year's winning submission (Anastasopoulos …