Please use this identifier to cite or link to this item:
https://open.uns.ac.rs/handle/123456789/3624
Title: Language model optimization for a deep neural network based speech recognition system for Serbian
Authors: Pakoci, Edvin; Popović, Boris; Pekar, Darko
Issue Date: 1-Jan-2017
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract: © Springer International Publishing AG 2017. This paper presents the results obtained using several variants of trigram language models in a large vocabulary continuous speech recognition (LVCSR) system for the Serbian language, based on the deep neural network (DNN) framework implemented within the Kaldi speech recognition toolkit. This training approach allows parallelization across several threads on either multiple GPUs or multiple CPUs, and provides a natural-gradient modification to the stochastic gradient descent (SGD) optimization method. Acoustic models are trained over a fixed number of training epochs, with parameter averaging at the end. The paper discusses recognition using different language models trained with Kneser-Ney or Good-Turing smoothing, as well as several pruning parameter values. Results on a test set containing more than 120,000 words and varied utterance types are analyzed and compared to reference results obtained with GMM-HMM speaker-adapted models on the same speech database. Online and offline recognition results are also compared to each other. Finally, the effect of additional discriminative training using a language model prior to the DNN stage is explored.
URI: https://open.uns.ac.rs/handle/123456789/3624
ISBN: 9783319664286
ISSN: 0302-9743
DOI: 10.1007/978-3-319-66429-3_48
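The abstract contrasts language models built with Kneser-Ney and Good-Turing smoothing. As a minimal sketch of the Kneser-Ney idea (absolute discounting of observed n-gram counts, interpolated with a "continuation" probability based on how many distinct contexts a word follows), here is an illustrative bigram version in plain Python. The function names and the toy corpus are hypothetical; the paper itself uses full trigram models built with standard LM toolkit tooling, not this code.

```python
from collections import Counter

def kneser_ney_bigram(tokens, discount=0.75):
    """Interpolated Kneser-Ney smoothed bigram model (illustrative sketch).

    Returns a function prob(word, context) giving P_KN(word | context).
    This is a didactic bigram version, not the trigram models or the
    toolkit implementation used in the paper.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    contexts = Counter(tokens[:-1])                     # c(v): context counts
    # Continuation count: in how many distinct contexts does w appear?
    continuation = Counter(w for (_, w) in bigrams)
    # Number of distinct words that follow each context v.
    distinct_followers = Counter(v for (v, _) in bigrams)
    total_bigram_types = len(bigrams)

    def prob(word, context):
        c_v = contexts[context]
        p_cont = continuation[word] / total_bigram_types
        if c_v == 0:                                    # unseen context: back off fully
            return p_cont
        c_vw = bigrams[(context, word)]
        # Interpolation weight: discount mass redistributed over followers.
        lam = discount * distinct_followers[context] / c_v
        return max(c_vw - discount, 0) / c_v + lam * p_cont

    return prob

# Tiny hypothetical corpus, just to exercise the model.
corpus = "the cat sat on the mat the cat ate".split()
p = kneser_ney_bigram(corpus)
# For any seen context, the distribution over the vocabulary sums to 1.
```

The key design point, relevant to the smoothing comparison in the paper, is that the backoff term uses continuation counts (context diversity) rather than raw unigram frequency, which is what distinguishes Kneser-Ney from simpler absolute-discounting or Good-Turing schemes.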
Appears in Collections: | FTN Publikacije/Publications |
SCOPUS™ Citations: 8 (checked on May 10, 2024)
Page view(s): 20 (last week: 7, last month: 0; checked on May 10, 2024)
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.