Volume 24, Issue 1
  • ISSN 0929-9971
  • E-ISSN: 1569-9994


Due to its specific linguistic properties, the language found in clinical records has been characterized as a distinct sublanguage. Even within the clinical domain, though, there are major differences in language use, which has led to more fine-grained distinctions based on medical fields and document types. However, previous work has mostly neglected the influence of term variation. By contrast, we propose to integrate the potential for term variation in the characterization of clinical sublanguages. By analyzing a corpus of clinical records, we show that the different sections of these records vary systematically with regard to their lexical, terminological and semantic composition, as well as their potential for term variation. These properties have implications for automatic term recognition, as they influence the performance of frequency-based term weighting.






  • Article Type: Research Article
  • Keywords: clinical sublanguage; Dutch; electronic health records; term variation