|
Definition of naturalness

The term naturalness is used in a number of studies to evaluate speech synthesisers. It is not frequent in the music acoustics literature, however, and it is often used without a clear definition attached to it.

Both in music and in speech synthesis, it is fruitful to consider naturalness as an attribute that makes the listener think that the performer or the speaker is a human, i.e. that the sound is humanlike. We therefore introduce the term "humanlikeness" and define it as a perceptual scale with a human performer at one end and a computer performance towards the other end. [Review paper in preparation: Farner and Ternström]

Speech naturalness

This part of the project currently examines how cost functions may be adapted to the perceived naturalness of different types of discontinuities at a join in a vowel, in particular artifacts due to pitch and spectrum discontinuities. Preliminary results have been presented at the Interspeech conference (Bjørkan, Svendsen, and Farner, 2005).

Music naturalness

In music synthesis, humanlike gestures and a good instrument model seem to be important factors for the generation of natural sound. As a first approach, we have assumed that a MIDI-controlled digital clarinet synthesiser based on physical models is a sufficiently good instrument model. The advantage of this approach is that the model may be played by a human musician or by a computer, so that their performances may be compared directly.

A human musician was asked to play the same melody 20 times with the same expression, and the playing parameters (MIDI) were recorded. The systematic behaviour of the parameters was extracted by averaging over the performances, and the fluctuation was quantified by the standard deviation.
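The averaging step can be sketched as follows. This is only a minimal illustration, assuming the recorded parameter curves have already been time-aligned and resampled to a common length; the function name `mean_and_fluctuation`, the array layout, and the synthetic data are hypothetical and not taken from the project.

```python
import numpy as np

def mean_and_fluctuation(takes):
    """Split repeated performances into a systematic part and a fluctuation.

    takes: array-like of shape (n_takes, n_samples), one row per performance
           (assumed to be time-aligned beforehand).
    Returns the point-wise mean curve and the point-wise sample standard
    deviation across performances.
    """
    takes = np.asarray(takes, dtype=float)
    mean_curve = takes.mean(axis=0)         # systematic behaviour
    std_curve = takes.std(axis=0, ddof=1)   # fluctuation between performances
    return mean_curve, std_curve

# Synthetic stand-in for 20 recorded performances of one parameter curve.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
ideal = 64 + 40 * np.sin(np.pi * t)         # hypothetical underlying gesture
takes = ideal + rng.normal(scale=3.0, size=(20, t.size))

mean_curve, std_curve = mean_and_fluctuation(takes)
```

Playing back `mean_curve` alone would correspond to a performance with the fluctuations removed, while `std_curve` indicates how much the musician varied at each point of the melody.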
When the averaged parameters were reinjected into the synthesiser (the natural fluctuations thus having been removed), part of the naturalness seemed to be lost, indicating that such fluctuations contribute to naturalness (Farner, Kronland, Voinier, and Ystad, 2005 and 2006a). The plan was to verify and quantify this with
listening tests:
The playing parameters (the blowing pressure, BC) from the above
performances (unmodified condition V+F+) were modified along two
orthogonal axes: the variation of the note velocity was removed
(condition V-), and the fluctuations during each note were either fixed to a
level depending on the maximum and the mean of the BC of that
note (F-) or simplified by an ADSR envelope (F*). These
modifications gave 6 versions of the melody (see figure below and
listen to the resulting sound), and
the listeners are presented with 5 different performances in each of these
conditions, 30 stimuli in total [Paper in preparation: Farner, Kronland, Behne, Voinier, and Ystad].

[Figure: the 6 versions of the melody]

Listening examples:
References
|