Boosting LLM performance with generative question-answer pairs via Wh-transformation

Abstract

This study explores methods for improving the performance of offline Large Language Models (LLMs) using generative question-answer (QA) pairs. Existing research highlights the effectiveness of example-based prompts and QA pairs in improving LLM robustness and contextual understanding (Takahashi et al. 2023, Chowdhury & Chadha 2024). However, generating domain-specific QA pairs remains challenging because datasets are scarce across diverse industrial sectors. To address this issue, we propose an adaptive approach that employs Generative Grammar (Chomsky 1957) to convert industry-specific statements into questions, thereby facilitating QA pair creation. We compare the efficacy of this method with that of LLM-generated QA pairs. The proposed approach not only reduces the labor typically associated with prompt engineering but also provides a transparent and systematic framework for question generation through controlled wh-movement transformations. Initial findings indicate that QA pairs generated via these transformational rules substantially enhance LLM performance in industrial chatbot applications by enriching contextual information, and they point to promising directions for future LLM research and downstream applications.
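To make the idea of rule-based question generation concrete, the following is a minimal sketch of how a declarative statement might be turned into QA pairs by wh-substitution, wh-fronting, and subject-auxiliary inversion. It is not the authors' implementation: the `Clause` structure, the function names, the fixed wh-word "what", and the toy English sentence are all assumptions made for illustration, and the sketch assumes the statement has already been parsed into its constituents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """A pre-parsed declarative clause (hypothetical structure for this sketch)."""
    subject: str
    auxiliary: str
    verb: str
    obj: str


def statement(clause: Clause) -> str:
    """Render the clause as a declarative sentence."""
    subject = clause.subject[0].upper() + clause.subject[1:]
    return f"{subject} {clause.auxiliary} {clause.verb} {clause.obj}."


def object_question(clause: Clause, wh_word: str = "what") -> tuple[str, str]:
    """Replace the object with a wh-word, front it, and invert subject and auxiliary.

    The displaced object becomes the answer of the QA pair.
    """
    question = f"{wh_word.capitalize()} {clause.auxiliary} {clause.subject} {clause.verb}?"
    return question, clause.obj


def subject_question(clause: Clause, wh_word: str = "what") -> tuple[str, str]:
    """Replace the subject with a wh-word; word order is otherwise unchanged.

    The replaced subject becomes the answer of the QA pair.
    """
    question = f"{wh_word.capitalize()} {clause.auxiliary} {clause.verb} {clause.obj}?"
    return question, clause.subject


def qa_pairs(clause: Clause) -> list[tuple[str, str]]:
    """Generate one QA pair per questioned constituent."""
    return [object_question(clause), subject_question(clause)]


if __name__ == "__main__":
    example = Clause(subject="the pump", auxiliary="can", verb="move", obj="coolant")
    print(statement(example))
    for question, answer in qa_pairs(example):
        print(question, "->", answer)
```

Because each question is produced by an explicit rule, every QA pair can be traced back to the constituent it questions, which is the kind of transparency the abstract attributes to transformational rules. A real system would need a parser and language-specific rules, which this sketch leaves out.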

/content/journals/10.1075/consl.24039.wan
2026-04-10
2026-05-11

References

  1. Chen, Shuangshuang
    2024 Resolving Chinese anaphora with ChatGPT. Proceedings of the 2024 International Conference on Asian Language Processing (IALP), ed. by Rui Liu, Lei Wang, Feilong Bao, Yanfeng Lu, Cunhang Fan and Minghui Dong, –. New York: Institute of Electrical and Electronics Engineers.
    https://doi.org/10.1109/IALP63756.2024.10661112
  2. Cheng, Lisa Lai-Shen
    1991 On the Typology of Wh-questions. Doctoral Dissertation, Massachusetts Institute of Technology, Cambridge, MA.
  3. Chomsky, Noam
    1957 Syntactic Structures. The Hague, Netherlands: Mouton.
    https://doi.org/10.1515/9783112316009
  4. Chomsky, Noam
    1970 Remarks on nominalization. Readings in English Transformational Grammar, ed. by Roderick Jacobs and Peter Rosenbaum, –. Washington, D.C.: Georgetown UP.
  5. Chomsky, Noam
    1973 Conditions on transformations. A Festschrift for Morris Halle, ed. by Stephen R. Anderson and Paul Kiparsky, –. New York: Holt, Rinehart & Winston.
  6. Chomsky, Noam
    1993 A Minimalist Program for Linguistic Theory. Cambridge, MA: The MIT Press.
  7. Chowdhury, Arijit Ghosh, and Aman Chadha
    2024 Generative data augmentation using LLMs improves distributional robustness in question answering. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, ed. by Neele Falk, Sara Papi and Mike Zhang, –. Stroudsburg, PA: Association for Computational Linguistics.
    https://doi.org/10.18653/v1/2024.eacl-srw.20
  8. Chung, Meng-Hsuan, and Chao-Ting Tim Chou
    2025 Climbing towards the NLU of the universal reading of shei ‘who’. Concentric.:–.
    https://doi.org/10.1075/consl.24041.chu
  9. Huang, Cheng-Teh James
    1982 Logical Relations in Chinese and the Theory of Grammar. Doctoral Dissertation, Massachusetts Institute of Technology, Cambridge, MA.
  10. Jackendoff, Ray
    1977 X-bar Syntax: A Study of Phrase Structure. Cambridge, MA: The MIT Press.
  11. Li, Aijun
    2002 Chinese prosody and prosodic labeling of spontaneous speech. Proceedings of Speech Prosody 2002, ed. by Bernard Bel and Isabelle Marlien, –. Aix-en-Provence, France: Laboratoire Parole et Langage, Université de Provence.
    https://doi.org/10.21437/SpeechProsody.2002-6
  12. Li, Shengnan, Weiguang Qu, Tingxin Wei, Junsheng Zhou, Yanhui Gu, and Bin Li
    2021 A survey of Chinese anaphora resolution. Artificial Intelligence and Security: 7th International Conference (ICAIS 2021), ed. by Xingming Sun, Xiaorui Zhang, Zhihua Xia and Elisa Bertino, –. Dublin, Ireland: Springer International Publishing.
    https://doi.org/10.1007/978-3-030-78609-0_16
  13. Li, Yen-Hui Audrey
    1992 Indefinite wh in Mandarin Chinese. Journal of East Asian Linguistics.:–.
    https://doi.org/10.1007/BF00130234
  14. Lin, Jo-Wang
    1996 Polarity Licensing and Wh-phrase Quantification in Chinese. Doctoral dissertation, University of Massachusetts at Amherst, Amherst, MA.
  15. Lin, Jo-Wang
    1998 On existential polarity wh-phrases in Chinese. Journal of East Asian Linguistics.:–.
    https://doi.org/10.1023/A:1008284513325
  16. Lu, Sin-En, Bo-Han Lu, Chao-Yi Lu, and Richard Tzong-Han Tsai
    2022 Exploring methods for building dialects-Mandarin code-mixing corpora: A case study in Taiwanese Hokkien. Findings of the Association for Computational Linguistics (EMNLP-2022), ed. by Yoav Goldberg, Zornitsa Kozareva and Yue Zhang, –. Abu Dhabi, United Arab Emirates: Association for Computational Linguistics.
    https://doi.org/10.18653/v1/2022.findings-emnlp.469
  17. Nunan, David
    1993 Introducing Discourse Analysis. London: Penguin Group.
  18. Pater, Joe
    2019 Generative linguistics and neural networks at 60: Foundation, friction, and fusion. Language.:–.
    https://doi.org/10.1353/lan.2019.0009
  19. Rajpurkar, Pranav, Jian Zhang, Konstantin Lopyrev, and Percy Liang
    2016 SQuAD: 100,000+ questions for machine comprehension of text. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP-2016), ed. by Jian Su, Kevin Duh and Xavier Carreras, –. Austin, TX: Association for Computational Linguistics.
    https://doi.org/10.18653/v1/D16-1264
  20. Stowell, Tim
    1981 Origins of Phrase Structure. Doctoral dissertation, Massachusetts Institute of Technology, Cambridge, MA.
  21. Sukthanker, Rhea, Soujanya Poria, Erik Cambria, and Ramkumar Thirunavukarasu
    2020 Anaphora and coreference resolution: A review. Information Fusion:–.
    https://doi.org/10.1016/j.inffus.2020.01.010
  22. Takahashi, Kosuke, Takahiro Omi, Kosuke Arima, and Tatsuya Ishigaki
    2023 Training Generative Question-answering on Synthetic Data Obtained from an Instruct-tuned Model. Retrieved September 27, 2024, from https://arxiv.org/abs/2310.08072
  23. Tsai, Wei-Tien Dylan
    1994 On Economizing the Theory of A-Bar Dependencies. Doctoral dissertation, Massachusetts Institute of Technology, Cambridge, MA.
  24. Wang, Wen-jet, Chia-jung Chen, Chia-ming Lee, Chien-yu Lai, and Hsin-hung Lin
    2019a Articut: Chinese Word Segmentation and POS Tagging System (Version 274) [Computer program]. Retrieved October 3, 2024, from https://api.droidtown.co
  25. Wang, Wen-jet, Chia-jung Chen, Chia-ming Lee, Chien-yu Lai, and Hsin-hung Lin
    2019b Linguistics-Oriented Keyword Interface NLU System (Version 4.0) [Computer program]. Retrieved October 3, 2024, from https://api.droidtown.co
  26. Wolfram, Stephen
    1985 Analytical and Empirical Mathematics with Computers (Report No. 8540). Princeton, NJ: The Institute for Advanced Study.
  27. Zhou, Lexin, Wout Schellaert, Fernando Martínez-Plumed, Yael Moros-Daval, Cèsar Ferri, and José Hernández-Orallo
    2024 Larger and more instructable language models become less reliable. Nature:–.
    https://doi.org/10.1038/s41586-024-07930-y
  28. Zhu, Peide, Zhen Wang, Claudia Hauff, Jie Yang, and Avishek Anand
    2022 Answer quality aware aggregation for extractive QA crowdsourcing. Findings of the Association for Computational Linguistics (EMNLP 2022), ed. by Yoav Goldberg, Zornitsa Kozareva and Yue Zhang, –. Stroudsburg, PA: The Association for Computational Linguistics.
    https://doi.org/10.18653/v1/2022.findings-emnlp.457