Morphology-based vs unsupervised word clustering for training language models for Serbian

Ostrogonac S.; Pakoci, Edvin; Sečujski, Milan; Mišković, Dragiša

Please use this identifier to cite or link to this item: https://open.uns.ac.rs/handle/123456789/814

Title:	Morphology-based vs unsupervised word clustering for training language models for Serbian
Authors:	Ostrogonac S.; Pakoci, Edvin; Sečujski, Milan; Mišković, Dragiša
Issue date:	1-Jan-2019
Journal:	Acta Polytechnica Hungarica
Abstract:	© 2019, Budapest Tech Polytechnical Institution. All rights reserved. When training language models (especially for highly inflected languages), some applications require word clustering to mitigate the problem of insufficient training data or storage space. The goal of word clustering is to group words that can be well represented by a single class in terms of their probabilities of appearing in different contexts. This paper presents comparative results obtained with different approaches to word clustering when training class N-gram models for Serbian, as well as models based on recurrent neural networks. One approach is unsupervised word clustering based on an optimized version of Brown’s algorithm, which relies on bigram statistics. The other approach is based on morphology and requires expert knowledge and language resources. Four different types of textual corpora, representing different functional styles, were used in the experiments. The language models were evaluated by both perplexity and word error rate. The results show a notable advantage of introducing expert knowledge into the word clustering process.
URI:	https://open.uns.ac.rs/handle/123456789/814
ISSN:	17858860
DOI:	10.12700/APH.16.2.2019.2.11
Appears in collections:	FTN Publikacije/Publications
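
The abstract refers to class N-gram models evaluated by perplexity. As a rough illustration only, and not the authors' implementation, the sketch below shows a class bigram model in which P(w_i | w_{i-1}) is approximated as P(c_i | c_{i-1}) · P(w_i | c_i). The class name `ClassBigramModel`, the add-alpha smoothing, the toy sentences, and the `NOUN`/`VERB` labels are all assumptions made for this example; the word-to-class mapping could come from either a morphological lexicon or an unsupervised clustering.

```python
import math
from collections import Counter


class ClassBigramModel:
    """Toy class bigram model (hypothetical, for illustration only)."""

    def __init__(self, word2class, alpha=1.0):
        self.word2class = word2class               # each word belongs to exactly one class
        self.classes = sorted(set(word2class.values()))
        self.alpha = alpha                         # add-alpha smoothing (an assumption)
        self.class_bigrams = Counter()             # counts of (previous class, class)
        self.history = Counter()                   # counts of a class used as history
        self.word_counts = Counter()               # counts of each word
        self.class_counts = Counter()              # counts of each class

    def train(self, sentences):
        for sent in sentences:
            prev = "<s>"                           # sentence-start marker
            for w in sent:
                c = self.word2class[w]
                self.class_bigrams[(prev, c)] += 1
                self.history[prev] += 1
                self.word_counts[w] += 1
                self.class_counts[c] += 1
                prev = c

    def prob(self, prev_class, word):
        # P(word | prev_class) = P(class | prev_class) * P(word | class).
        # Words whose class never occurred in training are not handled here.
        c = self.word2class[word]
        k = len(self.classes)
        p_class = (self.class_bigrams[(prev_class, c)] + self.alpha) / (
            self.history[prev_class] + self.alpha * k
        )
        p_word = self.word_counts[word] / self.class_counts[c]
        return p_class * p_word

    def perplexity(self, sentences):
        log_prob, n = 0.0, 0
        for sent in sentences:
            prev = "<s>"
            for w in sent:
                log_prob += math.log2(self.prob(prev, w))
                n += 1
                prev = self.word2class[w]
        return 2 ** (-log_prob / n)


# Hypothetical word-to-class mapping and toy training data.
word2class = {"pas": "NOUN", "mačka": "NOUN", "trči": "VERB", "spava": "VERB"}
train_sentences = [["pas", "trči"], ["mačka", "spava"], ["pas", "spava"]]

model = ClassBigramModel(word2class)
model.train(train_sentences)
test_perplexity = model.perplexity([["mačka", "trči"]])
print(test_perplexity)
```

Because counts are pooled per class, the word pair "mačka trči" receives a non-zero probability even though it never occurs in the toy training data, which is the data-sparsity benefit the abstract attributes to word clustering.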


SCOPUS™ citations: 9 (checked on 03.05.2024)

Page views: 23 (last week: 0; last month: 0; checked on 10.05.2024)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
