Please use this identifier to cite or link to this item: https://open.uns.ac.rs/handle/123456789/3625
Title: End-to-end large vocabulary speech recognition for the Serbian language
Authors: Popović, Boris
Pakoci, Edvin 
Pekar, Darko 
Issue Date: 1-Jan-2017
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract: © Springer International Publishing AG 2017. This paper presents the results of a large vocabulary speech recognition system for the Serbian language, developed using the Eesen end-to-end framework. Eesen involves training a single deep recurrent neural network, containing a number of bidirectional long short-term memory layers, that models the connection between speech and a set of context-independent lexicon units. This approach reduces the amount of expert knowledge needed to develop competitive speech recognition systems. Training is based on connectionist temporal classification, while decoding allows the use of weighted finite-state transducers. This provides much faster and more efficient decoding in comparison to other similar systems. A corpus of approximately 215 h of audio data (about 171 h of speech and 44 h of silence, from 243 male and 239 female speakers) was employed for training (about 90%) and testing (about 10%). On a set of more than 120,000 words, a word error rate of 14.68% and a character error rate of 3.68% are achieved.
URI: https://open.uns.ac.rs/handle/123456789/3625
ISBN: 9783319664286
ISSN: 0302-9743
DOI: 10.1007/978-3-319-66429-3_33
Appears in Collections:FTN Publikacije/Publications
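
The abstract reports a word error rate (WER) and a character error rate (CER) without defining them. The sketch below shows the conventional definition: the edit distance between the reference and the recognized text, divided by the reference length. The function names and the sample sentences are illustrative assumptions and do not come from the paper, and counting spaces as characters in the CER is also an assumption, since the record does not say how the authors scored them.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Edit distance over whitespace-separated words, per reference word."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def character_error_rate(reference, hypothesis):
    """Edit distance over characters (spaces included: an assumption)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example sentences, not taken from the paper's test set.
reference = "dobar dan svima"
hypothesis = "dobar dan svim"
print(f"WER: {word_error_rate(reference, hypothesis):.2%}")
print(f"CER: {character_error_rate(reference, hypothesis):.2%}")
```

A single wrong character makes the whole word count as an error, so the WER is normally well above the CER, which is consistent with the 14.68% and 3.68% figures reported above.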

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.