Prosody and phonemes
The benefit of adding prosodic and further spectral features to purely cepstral features is investigated for the recognition of phonemes in eight different speaking styles ranging from informal to formal. As prosodic information is best analyzed on a supra-segmental level, the whole temporal context of a phoneme is exploited by applying statistical functionals. In this way, 521 acoustic features are obtained and evaluated per descriptor and per functional, either by a de-correlating floating search or by classification performance. The classifier of choice is the Support Vector Machine, which has recently been found highly suitable for this task. The database is the open IFA corpus, which contains 178 k hand-segmented and hand-labeled instances of 47 Dutch phonemes. As a result, a significant gain is observed for segment-based over frame-based processing and, for the informal styles, from the inclusion of pitch and formant information. Overall, phonemes are recognized at 76.58% accuracy. The analysis of feature influence provides useful insight for artificial speech production in the considered speaking styles.
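The segment-based idea, applying statistical functionals to a frame-level descriptor over the whole phoneme, can be illustrated with a minimal sketch. The function name `segment_functionals`, the choice of six functionals, and the pitch values are assumptions made for illustration only; the abstract does not specify which descriptors and functionals make up the 521 features.

```python
import numpy as np


def segment_functionals(contour):
    """Map a variable-length frame-level contour to a fixed-length vector.

    The functionals below (mean, standard deviation, minimum, maximum,
    range, linear regression slope) are an assumed, illustrative set.
    """
    x = np.asarray(contour, dtype=float)
    if x.size == 0:
        raise ValueError("segment contains no frames")
    # Slope of a least-squares line over the frame index; 0 for one frame.
    slope = np.polyfit(np.arange(x.size), x, 1)[0] if x.size > 1 else 0.0
    return np.array(
        [x.mean(), x.std(), x.min(), x.max(), x.max() - x.min(), slope]
    )


# Hypothetical pitch contour (Hz) for the frames of one phoneme segment.
pitch = [100.0, 110.0, 120.0, 130.0]
features = segment_functionals(pitch)
```

Whatever the number of frames in a phoneme, the resulting vector has the same length, so segments of different duration can be passed to a static classifier such as a Support Vector Machine.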