Please use this identifier to cite or link to this item: https://open.uns.ac.rs/handle/123456789/814
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Ostrogonac S. | en |
| dc.contributor.author | Pakoci, Edvin | en |
| dc.contributor.author | Sečujski, Milan | en |
| dc.contributor.author | Mišković, Dragiša | en |
| dc.date.accessioned | 2019-09-23T10:11:19Z | - |
| dc.date.available | 2019-09-23T10:11:19Z | - |
| dc.date.issued | 2019-01-01 | en |
| dc.identifier.issn | 17858860 | en |
| dc.identifier.uri | https://open.uns.ac.rs/handle/123456789/814 | - |
| dc.description.abstract | © 2019, Budapest Tech Polytechnical Institution. All rights reserved. When training language models (especially for highly inflective languages), some applications require word clustering in order to mitigate the problem of insufficient training data or storage space. The goal of word clustering is to group words that can be well represented by a single class in terms of their probabilities of appearing in different contexts. This paper presents comparative results obtained by using different approaches to word clustering when training class N-gram models for Serbian, as well as models based on recurrent neural networks. One approach is unsupervised word clustering based on an optimized version of Brown’s algorithm, which relies on bigram statistics. The other approach is based on morphology, and it requires expert knowledge and language resources. Four different types of textual corpora, representing different functional styles, were used in the experiments. The language models were evaluated by both perplexity and word error rate. The results show a notable advantage of introducing expert knowledge into the word clustering process. | en |
| dc.relation.ispartof | Acta Polytechnica Hungarica | en |
| dc.title | Morphology-based vs unsupervised word clustering for training language models for Serbian | en |
| dc.type | Journal/Magazine Article | en |
| dc.identifier.doi | 10.12700/APH.16.2.2019.2.11 | en |
| dc.identifier.scopus | 2-s2.0-85063165476 | en |
| dc.identifier.url | https://api.elsevier.com/content/abstract/scopus_id/85063165476 | en |
| dc.relation.lastpage | 197 | en |
| dc.relation.firstpage | 183 | en |
| dc.relation.issue | 2 | en |
| dc.relation.volume | 16 | en |
| item.fulltext | No Fulltext | - |
| item.grantfulltext | none | - |
| crisitem.author.dept | Fakultet tehničkih nauka, Departman za energetiku, elektroniku i telekomunikacije | - |
| crisitem.author.parentorg | Fakultet tehničkih nauka | - |
Appears in Collections: FTN Publikacije/Publications

Scopus citations: 9 (checked on May 3, 2024)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.