1887
Volume 19, Issue 2
  • ISSN 1387-6732
  • E-ISSN: 1570-6001

Abstract

Software systems convert between graphemes and phonemes using lexicon-based, rule-based or data-driven techniques. The system presented here combines these techniques in a hybrid design that converts between graphemes and phonemes bi-directionally, adds linguistic and educational information about the relationships between graphemes and phonemes, and estimates the likelihood that the generated output is correct. We describe the components from which the system is built and determine its accuracy by running tests on two data sources, the BasisSpellingBank and a second data source, comparing the results to Nunn’s (1998) rule-based conversion system. The system converts phonemes to graphemes and vice versa with a precision of 81% and 86%, respectively, when tested on the BasisSpellingBank, and 80% and 81% when tested on the second data source. It proves to be a powerful new conversion tool.
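The hybrid idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lexicon, the rewrite rules, the phoneme symbols, the function name `convert` and the coverage-based confidence score are all assumptions made for illustration. The sketch tries a lexicon lookup first, falls back to ordered grapheme-to-phoneme rules, and returns a rough estimate of how reliable the output is.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon (spelling -> phoneme string); not taken from the article.
LEXICON: Dict[str, str] = {"kat": "kAt", "boek": "buk"}

# Hypothetical ordered rewrite rules (grapheme -> phoneme); longer graphemes come first.
RULES: List[Tuple[str, str]] = [
    ("oe", "u"),
    ("a", "A"),
    ("b", "b"),
    ("k", "k"),
    ("t", "t"),
]


def convert(word: str) -> Tuple[str, float]:
    """Return (phonemes, confidence) for a spelled word.

    Lexicon hits get confidence 1.0. Rule-based output gets the share of
    letters that were covered by a rule (an assumed, simplistic estimate).
    """
    if word in LEXICON:
        return LEXICON[word], 1.0

    output: List[str] = []
    covered = 0
    i = 0
    while i < len(word):
        for grapheme, phoneme in RULES:
            if word.startswith(grapheme, i):
                output.append(phoneme)
                covered += len(grapheme)
                i += len(grapheme)
                break
        else:
            # No rule matched: copy the letter through and leave it uncovered.
            output.append(word[i])
            i += 1

    confidence = covered / len(word) if word else 0.0
    return "".join(output), confidence
```

Calling `convert("kat")` takes the lexicon path, while `convert("toek")` is handled entirely by the rules. A word containing a letter with no rule, such as `"bax"`, still yields output but with a lower confidence value. A data-driven component, which the abstract also mentions, is left out of this sketch.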

/content/journals/10.1075/wll.19.2.02bee
2017-05-22
2025-02-14

References

  1. Busser, Bertjan, Walter Daelemans & Antal van den Bosch
    (1999) Machine learning of word pronunciation: the case against abstraction. In Eurospeech99: 2123–2126.
  2. Cranshoff, Betty & Johan Zuidema
    (2010) Van Dale Basisspellinggids. Utrecht/Antwerpen: Van Dale Lexicografie.
  3. Daelemans, Walter
    (1988) GRAFON: a grapheme-to-phoneme conversion system for Dutch. Proceedings, 12th international conference on computational linguistics (COLING-88), vol. 1: 133–138.
  4. Daelemans, Walter & Antal van den Bosch
    (1993) Data-oriented methods for grapheme-to-phoneme conversion. Proceedings of EACL 6: 45–53.
  5. (1997) Language-independent data-oriented grapheme-to-phoneme conversion. In Jan P.H. van Santen, Richard W. Sproat, Joseph P. Olive & Julia Hirschberg (eds.), Progress in speech synthesis, Section 2: 77–89. New York: Springer-Verlag.
  6. (2001) TREETALK: memory-based word phonemisation. In Robert I. Damper (ed.), Data-driven techniques in speech synthesis, Chapter 7. Cambridge: MIT Press.
  7. Daelemans, Walter, Antal van den Bosch & Ton Weijters
    (1996) IGTree: Using trees for compression and classification in lazy learning algorithms. Artificial Intelligence Review 11: 407–423.
  8. Daelemans, Walter & Helmer Strik
    (2002) Het Nederlands in taal- en spraaktechnologie: prioriteiten voor basisvoorzieningen. Een rapport in opdracht van de Nederlandse Taalunie. Second version.
  9. Damper, Robert & John Eastmond
    (1997) Pronunciation by analogy: Impact of implementational choices on performance. Language and Speech 40(1): 1–23.
  10. Decadt, Bart, Jacques Duchateau, Walter Daelemans & Patrick Wambacq
    (2002) Memory-based phoneme-to-grapheme conversion. Language and Computers 45(1): 47–61.
  11. Galescu, Lucian & James F. Allen
    (2001) Bi-directional conversion between graphemes and phonemes using a joint n-gram model. Proceedings of the 4th ISCA Tutorial and Research Workshop on Speech Synthesis 4: 103–108.
  12. Geeraerts, Dirk
    (2002) Groot woordenboek van de Nederlandse taal (CD-ROM, version 1.0 Plus). Utrecht/Antwerpen: Van Dale Lexicografie.
  13. Hamming, Richard
    (1950) Error detecting and error correcting codes. Bell System Technical Journal 29(2): 147–160.
  14. Heemskerk, Josée & Wim Zonneveld
    (2000) Uitspraakwoordenboek. Utrecht: Het Spectrum.
  15. Hoste, Veronique, Steven Gillis & Walter Daelemans
    (2000) Machine learning for modeling Dutch pronunciation variation. 10th Meeting on Computational Linguistics in the Netherlands: 73–84. Utrecht Institute of Linguistics OTS.
  16. Jones, Daniel
    (1996) Analogical natural language processing. London: UCL Press.
  17. Jongenburger, Willy & Vincent J. van Heuven
    (1993) Sandhi processes in natural and synthetic speech. In Vincent J. van Heuven & Louis C.W. Pols (eds.), Analysis and synthesis of speech, strategy research towards high quality text-to-speech-generation: 261–276. Berlin: Mouton de Gruyter.
  18. Levenshtein, Vladimir
    (1966) Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10(8): 707–710.
  19. Marchand, Yannick & Robert Damper
    (2000) A multistrategy approach to improving pronunciation by analogy. Computational Linguistics 26(2): 195–219.
  20. Nunn, Anneke M.
    (1998) Dutch orthography. A systematic investigation of the spelling of Dutch words (Doctoral dissertation). The Hague: Holland Academic Graphics.
  21. Nunn, Anneke M. & Vincent J. van Heuven
    (1993) MORPHON: lexicon-based text-to-phoneme conversion and phonological rules. In Vincent J. van Heuven & Louis C.W. Pols (eds.), Analysis and synthesis of speech, strategy research towards high quality text-to-speech-generation: 77–99. Berlin: Mouton de Gruyter.
  22. Santen, Jan P.H. van, Richard W. Sproat, Joseph P. Olive & Julia Hirschberg
    (1997) Progress in speech synthesis, Section 2. New York: Springer-Verlag.
  23. Skousen, Royal
    (1989) Analogical modeling of language. Dordrecht: Kluwer Academic Publishers.
  24. Zuidema, Johan & Anneke Neijt
    (2012) Verkennend onderzoek naar de wenselijkheid en de haalbaarheid van een verrijking van de Woordenlijst Nederlandse Taal ten behoeve van spellingonderwijs. Nijmegen: Radboud Universiteit Nijmegen. Available online: taalunieversum.org/sites/tuv/files/downloads/rapport%20VWS%2015022013.pdf.
  25. (to appear). The BasisSpellingBank – spelling knowledge stored in a lexicon of triplets.