Volume 1, Issue 1
  • ISSN 2542-9477
  • E-ISSN: 2542-9485

Abstract

Shlomo Argamon is Professor of Computer Science and Director of the Master of Data Science Program at the Illinois Institute of Technology (USA). In this article, he reflects on the current and potential relationship between register and the field of computational linguistics. He applies his expertise in computational linguistics and machine learning to a variety of problems in natural language processing. These include stylistic variation, forensic linguistics, authorship attribution, and biomedical informatics. He is particularly interested in the linguistic structures used by speakers and writers, including linguistic choices that are influenced by social variables such as age, gender, and register, as well as linguistic choices that are unique or distinctive to the style of individual authors. Argamon has been a pioneer in computational linguistics and NLP research in his efforts to account for and explore register variation. His computational linguistic research on register draws inspiration from Systemic Functional Linguistics, Biber’s multi-dimensional approach to register variation, as well as his own extensive experience accounting for variation within and across text types and authors. Argamon has applied computational methods to text classification and description across registers – including blogs, academic disciplines, and news writing – as well as the interaction between register and other social variables, such as age and gender. His cutting-edge research in these areas is certain to have a lasting impact on the future of computational linguistics and NLP.

/content/journals/10.1075/rs.18015.arg
2019-04-26
2019-10-19

  • Article Type: Research Article
Keyword(s): computational linguistics, natural language processing, style, stylistics, text classification