Please use this identifier to cite or link to this item: https://open.uns.ac.rs/handle/123456789/2043
DC Field | Value | Language
dc.contributor.author | Suzić, Siniša | en
dc.contributor.author | Delić, Tijana | en
dc.contributor.author | Ostrogonac, S. | en
dc.contributor.author | Đuričić, Jasna | en
dc.contributor.author | Pekar, Darko | en
dc.date.accessioned | 2019-09-23T10:19:14Z | -
dc.date.available | 2019-09-23T10:19:14Z | -
dc.date.issued | 2018-01-01 | en
dc.identifier.issn | 2078-9181 | en
dc.identifier.uri | https://open.uns.ac.rs/handle/123456789/2043 | -
dc.description.abstract | © 2018 St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences. All rights reserved. Modern text-to-speech systems generally achieve good intelligibility. One of their main drawbacks, however, is a lack of expressiveness compared to natural human speech: it is unpleasant when an automated system conveys positive and negative messages in exactly the same way. The introduction of parametric methods in speech synthesis made it possible to easily change speaker characteristics and speaking styles. This paper presents a simple method for incorporating styles into synthesized speech by using style codes. The proposed method requires only a couple of minutes of target-style speech and a moderate amount of neutral speech. It is successfully applied to both hidden-Markov-model-based and deep-neural-network-based synthesis, with the style code given as an additional input to the model. Listening tests confirmed that deep-neural-network synthesis achieves better style expressiveness than hidden-Markov-model synthesis. It is also shown that the quality of speech synthesized by deep neural networks in a given style is comparable to that of speech synthesized in the neutral style, even though the neutral-speech database is about 10 times larger. DNN-based TTS with style codes is further investigated by comparing the quality of speech produced by single-style and multi-style modeling systems. Objective and subjective measures confirmed that there is no significant difference between the two approaches. | en
dc.relation.ispartof | SPIIRAS Proceedings | en
dc.title | Style-code method for multi-style parametric text-to-speech synthesis | en
dc.type | Journal/Magazine Article | en
dc.identifier.doi | 10.15622/sp.60.8 | en
dc.identifier.scopus | 2-s2.0-85057534493 | en
dc.identifier.url | https://api.elsevier.com/content/abstract/scopus_id/85057534493 | en
dc.relation.lastpage | 240 | en
dc.relation.firstpage | 216 | en
dc.relation.issue | 60 | en
dc.relation.volume | 5 | en
item.fulltext | No Fulltext | -
item.grantfulltext | none | -
crisitem.author.dept | Fakultet tehničkih nauka, Departman za energetiku, elektroniku i telekomunikacije | -
crisitem.author.parentorg | Fakultet tehničkih nauka | -
Appears in collections: FTN Publikacije/Publications
Show simple item record
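The abstract describes giving a style code as an additional input to the acoustic model. A minimal sketch of that idea, assuming (as is common in multi-style DNN TTS, though not confirmed by this record) that the style code is a one-hot vector appended to every frame of linguistic input features; the style inventory and dimensions below are hypothetical:

```python
import numpy as np

# Hypothetical style inventory; the paper's actual styles are not listed here.
STYLES = ["neutral", "happy", "sad"]

def style_code(style: str) -> np.ndarray:
    """Return a one-hot style code for the given style name."""
    code = np.zeros(len(STYLES))
    code[STYLES.index(style)] = 1.0
    return code

def add_style_code(linguistic_features: np.ndarray, style: str) -> np.ndarray:
    """Append the style code to each frame of linguistic features.

    linguistic_features: array of shape (num_frames, feature_dim).
    Returns an array of shape (num_frames, feature_dim + num_styles),
    which would then be fed to the DNN acoustic model.
    """
    num_frames = linguistic_features.shape[0]
    codes = np.tile(style_code(style), (num_frames, 1))
    return np.concatenate([linguistic_features, codes], axis=1)

# Example: 100 frames of 300-dim features become 303-dim after adding the code.
feats = np.random.randn(100, 300)
augmented = add_style_code(feats, "happy")
print(augmented.shape)  # (100, 303)
```

Because the same network weights serve all styles, a few minutes of target-style data can suffice alongside a larger neutral corpus, which matches the multi-style modeling setup the abstract compares against single-style training.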

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.