1887
Volume 33, Issue 1
  • ISSN 0213-2028
  • E-ISSN: 2254-6774

Abstract

Phonetic algorithms have made it possible to detect phonetically similar words successfully. However, such algorithms depend on the language for which they were designed, so they are generally not optimized for Spanish. This article therefore presents the PFS algorithm and its variant PFS-US, phonetic algorithms that take into account the phonology of the Spanish spoken in central Mexico and were designed to detect phonetically similar words in large word sets. We examine this phonological consideration through a comparative analysis against four other state-of-the-art phonetic algorithms. To that end, language-independent metrics were defined for evaluating phonetic algorithms in general. These metrics are based on the structure of the groups of mutually phonetically similar words and on their relation to words that are not similar to any other. In addition, the resources generated are shared freely for use and analysis.
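
To make the idea of grouping phonetically similar words concrete, the following minimal sketch may help. It is not the PFS or PFS-US algorithm, whose rules are not given in this abstract. The function `toy_key`, its handful of spelling-to-sound rules, and the two summary counts are all hypothetical and chosen only for illustration. The sketch assigns each word a rough phonetic key, groups words that share a key, and counts how many groups contain similar words versus how many words are similar to no other.

```python
from collections import defaultdict

# Hypothetical toy rules, NOT the PFS algorithm: a few spelling-to-sound
# merges that are common in Latin American Spanish, applied in order.
_ACCENTS = str.maketrans("áéíóúü", "aeiouu")
_RULES = [
    ("ch", "#"),   # protect the digraph "ch" from the "c" and "h" rules
    ("qu", "k"),
    ("ll", "y"),
    ("ce", "se"),
    ("ci", "si"),
    ("c", "k"),
    ("z", "s"),
    ("v", "b"),
    ("h", ""),     # a lone "h" is silent
]

def toy_key(word: str) -> str:
    """Return a rough phonetic key for a Spanish word (illustrative only)."""
    key = word.lower().translate(_ACCENTS)
    for old, new in _RULES:
        key = key.replace(old, new)
    return key

def group_by_key(words):
    """Map each phonetic key to the list of words that produce it."""
    groups = defaultdict(list)
    for word in words:
        groups[toy_key(word)].append(word)
    return dict(groups)

def summarize(groups):
    """Count groups of similar words and words similar to no other."""
    similar = sum(1 for members in groups.values() if len(members) > 1)
    isolated = sum(1 for members in groups.values() if len(members) == 1)
    return {"similar_groups": similar, "isolated_words": isolated}

words = ["vaca", "baca", "casa", "caza", "hola", "ola", "mesa"]
groups = group_by_key(words)
summary = summarize(groups)
```

With this word list, "vaca" and "baca" share a key, as do "casa" and "caza" and "hola" and "ola", while "mesa" remains on its own. The two counts in `summarize` are only a stand-in for the kind of group-structure information the abstract describes, not the metrics defined in the article.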

/content/journals/10.1075/resla.18002.her
2020-08-21
2020-09-26
