Volume 21, Issue 4
  • ISSN 1018-2101
  • E-ISSN: 2406-4238


Text mining aims at constructing classification models and finding interesting patterns in large text collections. This paper investigates the utility of applying these techniques to media analysis, more specifically to support discourse analysis of news reports about the 2007 Kenyan elections and post-election crisis in local (Kenyan) and Western (British and US) newspapers. It illustrates how text mining methods can assist discourse analysis by finding contrast patterns which provide evidence for ideological differences between local and international press coverage. Our experiments indicate that the most significant differences pertain to the interpretive frame of the news events: whereas the newspapers from the UK and the US focus on ethnicity in their coverage, the Kenyan press concentrates on sociopolitical aspects.




  • Article Type: Research Article
Keyword(s): Discourse analysis; Ideology; Kenyan elections; Pragmatics; Text mining