Please use this identifier to cite or link to this item: https://open.uns.ac.rs/handle/123456789/15353
DC Field | Value | Language
dc.contributor.author | Suzić, Siniša | en
dc.contributor.author | Delić, Tijana | en
dc.contributor.author | Pekar, Darko | en
dc.contributor.author | Delić, Vlado | en
dc.contributor.author | Sečujski, Milan | en
dc.date.accessioned | 2020-03-03T14:59:36Z | -
dc.date.available | 2020-03-03T14:59:36Z | -
dc.date.issued | 2019-01-01 | en
dc.identifier.issn | 1785-8860 | en
dc.identifier.uri | https://open.uns.ac.rs/handle/123456789/15353 | -
dc.description.abstract | © 2019, Budapest Tech Polytechnical Institution. All rights reserved. The paper proposes a novel deep neural network (DNN) architecture aimed at improving the expressiveness of text-to-speech (TTS) synthesis by learning the properties of a particular speech style from a multi-speaker, multi-style speech corpus and transplanting it into the speech of a new speaker whose actual speech in the target style is missing from the training corpus. In most research on this topic, speech styles are identified with corresponding emotional expressions, which is the approach adopted in this research as well, and the entire process is conventionally referred to as "emotion transplantation". The proposed architecture builds on the concept of the shared hidden layer DNN architecture, originally used for multi-speaker modelling, principally by introducing the style code as an auxiliary input. In this way, the mapping between linguistic and acoustic features performed by the DNN is made style dependent. The results of both subjective and objective evaluation of the quality of the synthesized speech, as well as of the quality of style reproduction, show that when the emotional speech data available for training are limited, the performance of the proposed system represents a small but clear improvement over the state of the art. The baseline system is based on the standard approach, which uses both a speaker code and a style code as auxiliary inputs. | en
dc.relation.ispartof | Acta Polytechnica Hungarica | en
dc.title | Style transplantation in neural network-based speech synthesis | en
dc.type | Journal/Magazine Article | en
dc.identifier.doi | 10.12700/APH.16.6.2019.6.11 | en
dc.identifier.scopus | 2-s2.0-85068819967 | en
dc.identifier.url | https://api.elsevier.com/content/abstract/scopus_id/85068819967 | en
dc.relation.lastpage | 189 | en
dc.relation.firstpage | 171 | en
dc.relation.issue | 6 | en
dc.relation.volume | 16 | en
item.grantfulltext | none | -
item.fulltext | No Fulltext | -
crisitem.author.dept | Fakultet tehničkih nauka, Departman za energetiku, elektroniku i telekomunikacije | -
crisitem.author.dept | Fakultet tehničkih nauka | -
crisitem.author.dept | Fakultet tehničkih nauka, Departman za energetiku, elektroniku i telekomunikacije | -
crisitem.author.dept | Fakultet tehničkih nauka, Departman za energetiku, elektroniku i telekomunikacije | -
crisitem.author.parentorg | Fakultet tehničkih nauka | -
crisitem.author.parentorg | Univerzitet u Novom Sadu | -
crisitem.author.parentorg | Fakultet tehničkih nauka | -
crisitem.author.parentorg | Fakultet tehničkih nauka | -
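The abstract describes conditioning a shared hidden layer DNN on an auxiliary style code, so that the same shared weights map linguistic to acoustic features differently per style. A minimal sketch of that conditioning idea, using only the standard library; all dimensions, names, and weights are illustrative assumptions, not taken from the paper:

```python
import math
import random

random.seed(0)

# Illustrative dimensions (assumptions, not from the paper)
LING_DIM = 10     # linguistic feature vector size
STYLE_DIM = 3     # one-hot style code, e.g. neutral / happy / sad
HIDDEN = 16       # shared hidden layer size
ACOUSTIC_DIM = 5  # predicted acoustic feature vector size

def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)]
            for _ in range(rows)]

# Shared weights: common to all speakers and styles.
W1 = rand_matrix(LING_DIM + STYLE_DIM, HIDDEN)
W2 = rand_matrix(HIDDEN, ACOUSTIC_DIM)

def forward(linguistic, style_code):
    """Map linguistic features to acoustic features, conditioned on style.

    Appending the style code to the input makes the learned
    linguistic-to-acoustic mapping style dependent, while the
    hidden-layer weights themselves stay shared."""
    x = linguistic + style_code  # concatenate auxiliary style input
    h = [math.tanh(sum(xi * W1[i][j] for i, xi in enumerate(x)))
         for j in range(HIDDEN)]                       # shared hidden layer
    return [sum(hj * W2[j][k] for j, hj in enumerate(h))
            for k in range(ACOUSTIC_DIM)]              # acoustic prediction

ling = [random.uniform(-1, 1) for _ in range(LING_DIM)]
neutral = [1.0, 0.0, 0.0]
happy = [0.0, 1.0, 0.0]

# Same linguistic input, different style codes -> different acoustic output.
out_neutral = forward(ling, neutral)
out_happy = forward(ling, happy)
```

Because the style code enters as an input rather than selecting a separate network, a style learned from other speakers can in principle be applied to a speaker for whom no data in that style exists, which is the "transplantation" the abstract refers to.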
Appears in Collections:FTN Publikacije/Publications

SCOPUS™ Citations: 10 (checked on May 10, 2024)
Page view(s): 32 (last week: 7, last month: 2; checked on May 10, 2024)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.