ArbDialectID at MADAR Shared Task 1: Language Modelling and Ensemble Learning for Fine-Grained Arabic Dialect Identification

In this paper, we present a dialect identification system (ArbDialectID) that competed in Task 1 of the MADAR shared task, MADAR Travel Domain Dialect Identification. We build a coarse-grained and a fine-grained identification model to predict the label (corresponding to a dialect of Arabic) of a given text. We build two language models by extracting features at two levels (words and characters). We first build a coarse-grained identification model that classifies each sentence into one of six dialects, then use this label as a feature for the fine-grained model, which classifies the sentence among 26 dialects from different Arab cities. Finally, we apply an ensemble voting classifier to both subsystems. Our system ranked 1st, achieving an F-score of 67.32%. Both the models and our feature engineering tools are made available to the research community.
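
The coarse-to-fine pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes add-one-smoothed naive Bayes scorers as stand-ins for the word-level and character-level language models, soft voting (summed log-scores) as the ensemble step, and gold coarse labels as the extra feature at training time. All names (NaiveBayes, TwoStageIdentifier, soft_vote) are hypothetical, and the sentences and labels (R1, C2, ...) are synthetic placeholders rather than MADAR data.

```python
import math
from collections import Counter, defaultdict

def word_feats(text):
    # Word-level features: whitespace tokens.
    return text.split()

def char_feats(text, n=3):
    # Character-level features: n-grams over the space-padded sentence.
    padded = f" {text} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (assumed stand-in model)."""

    def __init__(self, extract):
        self.extract = extract
        self.counts = defaultdict(Counter)  # label -> feature counts
        self.priors = Counter()             # label -> number of sentences
        self.vocab = set()

    def fit(self, texts, labels, extra=None):
        # `extra` optionally holds one list of additional features per sentence.
        for i, (text, label) in enumerate(zip(texts, labels)):
            feats = self.extract(text) + (extra[i] if extra else [])
            self.counts[label].update(feats)
            self.priors[label] += 1
            self.vocab.update(feats)
        return self

    def log_scores(self, text, extra=None):
        feats = self.extract(text) + (extra or [])
        total = sum(self.priors.values())
        scores = {}
        for label, prior in self.priors.items():
            denom = sum(self.counts[label].values()) + len(self.vocab)
            score = math.log(prior / total)
            for feat in feats:
                score += math.log((self.counts[label][feat] + 1) / denom)
            scores[label] = score
        return scores

def soft_vote(*score_dicts):
    # Ensemble step: sum the per-label log-scores of all subsystems.
    combined = defaultdict(float)
    for scores in score_dicts:
        for label, score in scores.items():
            combined[label] += score
    return max(combined, key=combined.get)

class TwoStageIdentifier:
    """Coarse label first, then fine label with the coarse label as a feature."""

    def fit(self, texts, regions, cities):
        extractors = (word_feats, char_feats)
        self.coarse = [NaiveBayes(e).fit(texts, regions) for e in extractors]
        tags = [[f"COARSE={r}"] for r in regions]  # gold labels (assumption)
        self.fine = [NaiveBayes(e).fit(texts, cities, tags) for e in extractors]
        return self

    def predict(self, text):
        region = soft_vote(*(m.log_scores(text) for m in self.coarse))
        tag = [f"COARSE={region}"]  # predicted coarse label fed to stage two
        city = soft_vote(*(m.log_scores(text, tag) for m in self.fine))
        return region, city

# Synthetic placeholder data: two coarse classes, four fine classes.
TEXTS = ["alpha beta gamma", "alpha beta delta",
         "omega sigma theta", "omega sigma kappa"]
REGIONS = ["R1", "R1", "R2", "R2"]
CITIES = ["C1", "C2", "C3", "C4"]

if __name__ == "__main__":
    model = TwoStageIdentifier().fit(TEXTS, REGIONS, CITIES)
    print(model.predict("alpha beta delta"))
```

Feeding the coarse label in as an ordinary feature keeps the second stage a plain classifier, so a wrong coarse prediction biases the fine-grained decision but does not rule out any city.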

Saad M. (Member of the Collaboration Group)

2019-01-01

Year: 2019
ISBN: 9781950737321
Publication type: Conference paper in proceedings volume

Files in this record:

File	Size	Format
W19-4632.pdf (open access; publisher's version; Creative Commons license)	606.23 kB	Adobe PDF

Note: Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International License. See https://aclanthology.org/volumes/W19-46/

Use this identifier to cite or link to this document: https://hdl.handle.net/11587/561299
