Volume 175, Issue 1
  • ISSN 0019-0829
  • E-ISSN: 1783-1490

Abstract

Knowledge of multiword expressions (MWEs) is key to learning a foreign language, but their teaching remains hindered by the lack of lists of expressions linked to pedagogical aims. In this paper, we present an extended version of the PolylexFLE database, which contains 4,525 French MWEs of three types: idioms, collocations and fixed expressions. In order to propose exercises that follow the difficulty scale of the Common European Framework of Reference for Languages (CEFR), we used a mixed approach (manual and automatic) to annotate 1,186 expressions with CEFR levels. The paper focuses mainly on the automatic procedure, which first uses a pattern-based system to identify the expressions from the PolylexFLE database (and their variants) in a corpus of pedagogical texts labelled with CEFR levels. In a second step, the distribution of each expression across this corpus is estimated and transformed into a single CEFR level. Finally, the automatic approach is evaluated by 52 learners of French as a foreign language.
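The two automatic steps described in the abstract (pattern-based identification of an expression and its variants in a CEFR-labelled corpus, then collapsing its distribution into one level) can be illustrated with a minimal sketch. It assumes the corpus is already lemmatized, that a "variant" is simply the same lemma sequence with a few inserted tokens, and that the distribution is mapped to the lowest level whose frequency reaches a threshold. These are illustrative assumptions, not the authors' actual patterns or mapping rule; the names `matches`, `count_by_level`, `distribution_to_level`, `max_gap` and `threshold` are hypothetical, and the toy sentences are invented.

```python
from collections import Counter

# CEFR scale, ordered from easiest to hardest.
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def matches(pattern, lemmas, max_gap=2):
    """Return True if the lemmas in `pattern` occur in order in `lemmas`,
    allowing at most `max_gap` inserted tokens between consecutive components
    (e.g. "avoir peur" also matches "avoir très peur").
    Greedy: each component is matched at its first position in the window."""
    for start, lemma in enumerate(lemmas):
        if lemma != pattern[0]:
            continue
        pos = start
        found_all = True
        for component in pattern[1:]:
            window = lemmas[pos + 1 : pos + 2 + max_gap]
            if component not in window:
                found_all = False
                break
            pos += 1 + window.index(component)
        if found_all:
            return True
    return False


def count_by_level(pattern, corpus, max_gap=2):
    """Count the sentences of each CEFR level that contain the pattern.
    `corpus` is a list of (level, lemmatized_sentence) pairs."""
    counts = Counter()
    for level, lemmas in corpus:
        if matches(pattern, lemmas, max_gap):
            counts[level] += 1
    return counts


def distribution_to_level(counts, sizes, threshold=1.0):
    """Collapse a per-level distribution into one CEFR level: the lowest level
    whose frequency per 1,000 sentences reaches `threshold` (hypothetical rule)."""
    for level in CEFR_LEVELS:
        if sizes.get(level, 0) == 0:
            continue  # no texts at this level in the corpus
        per_thousand = 1000 * counts.get(level, 0) / sizes[level]
        if per_thousand >= threshold:
            return level
    return None  # the expression never reaches the threshold


# Toy, invented corpus of lemmatized sentences with CEFR labels.
corpus = [
    ("A1", ["je", "avoir", "faim"]),
    ("A2", ["il", "avoir", "très", "peur"]),
    ("B1", ["elle", "avoir", "peur", "la", "nuit"]),
    ("B1", ["nous", "prendre", "le", "train"]),
]
sizes = Counter(level for level, _ in corpus)
counts = count_by_level(["avoir", "peur"], corpus)
print(counts, distribution_to_level(counts, sizes))
```

On this toy corpus, "avoir peur" is found once at A2 (as the variant "avoir très peur") and once at B1, so the sketch assigns it A2. A lowest-level-above-threshold rule is only one option; a majority or weighted-average mapping over the distribution would be equally plausible, and any threshold would need tuning against manually annotated expressions.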

DOI: 10.1075/itl.22031.tod
2024-04-05
2025-02-09