Please use this identifier to cite or link to this item:
https://open.uns.ac.rs/handle/123456789/3625
| Title: | End-to-end large vocabulary speech recognition for the Serbian language |
| Authors: | Popović, Boris; Pakoci, Edvin; Pekar, Darko |
| Issue Date: | 1-Jan-2017 |
| Journal: | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
| Abstract: | © Springer International Publishing AG 2017. This paper presents the results of a large vocabulary speech recognition system for the Serbian language, developed using the Eesen end-to-end framework. Eesen involves training a single deep recurrent neural network, containing a number of bidirectional long short-term memory (LSTM) layers, that models the mapping between speech and a set of context-independent lexicon units. Compared with other competitive speech recognition systems, this approach reduces the amount of expert knowledge needed to develop one. Training is based on connectionist temporal classification (CTC), while decoding allows the use of weighted finite-state transducers (WFSTs), which makes decoding much faster and more efficient than in other similar systems. A corpus of approximately 215 h of audio data (about 171 h of speech and 44 h of silence, from 243 male and 239 female speakers) was used, with about 90% for training and about 10% for testing. On a set of more than 120,000 words, a word error rate (WER) of 14.68% and a character error rate (CER) of 3.68% were achieved. |
| URI: | https://open.uns.ac.rs/handle/123456789/3625 |
| ISBN: | 9783319664286 |
| ISSN: | 0302-9743 |
| DOI: | 10.1007/978-3-319-66429-3_33 |
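Two mechanisms named in the abstract can be made concrete with a short example: how CTC maps a frame-level label path to an output label sequence, and how WER and CER are computed. The following is a minimal Python sketch under stated assumptions, not the Eesen implementation: the blank symbol `<b>`, all function names, and the choice to ignore spaces when counting characters for CER are hypothetical. The greedy per-frame collapse shown here also stands in for, and is much simpler than, the WFST-based decoding described in the abstract.

```python
from typing import Callable, List, Optional, Sequence

BLANK = "<b>"  # hypothetical blank label; real systems use their own symbol/index


def ctc_collapse(path: Sequence[str], blank: str = BLANK) -> List[str]:
    """Greedy CTC mapping: merge consecutive repeats, then drop blanks."""
    labels: List[str] = []
    prev: Optional[str] = None
    for label in path:
        if label != prev and label != blank:
            labels.append(label)
        prev = label
    return labels


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of a reference token
                cur[j - 1] + 1,          # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            )
        prev = cur
    return prev[-1]


def error_rate(refs: Sequence[str], hyps: Sequence[str],
               tokenize: Callable[[str], List[str]]) -> float:
    """Corpus-level error rate in percent: total edits / total reference tokens."""
    edits = sum(edit_distance(tokenize(r), tokenize(h)) for r, h in zip(refs, hyps))
    total = sum(len(tokenize(r)) for r in refs)
    return 100.0 * edits / total


def wer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    # Word tokens are whitespace-separated.
    return error_rate(refs, hyps, str.split)


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    # Assumption: spaces are ignored when counting characters.
    return error_rate(refs, hyps, lambda s: list(s.replace(" ", "")))


if __name__ == "__main__":
    print(ctc_collapse(["a", "a", BLANK, "a", "b", "b"]))  # ['a', 'a', 'b']
    refs, hyps = ["dobar dan svima"], ["dobar dan svim"]
    print(f"WER = {wer(refs, hyps):.2f}%")  # WER = 33.33%
    print(f"CER = {cer(refs, hyps):.2f}%")  # CER = 7.69%
```

In the usage example, one wrong word out of three gives a WER of 33.33%, while the same mistake is a single missing character out of 13 and gives a CER of 7.69%. This illustrates why a CER can be much lower than the WER on the same output. Because both rates divide edit counts by the number of reference tokens, they can exceed 100% when a hypothesis contains many insertions.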
Appears in Collections: | FTN Publikacije/Publications |
Scopus citations: 9 (checked on May 10, 2024)
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.