Volume 13, Issue 3
  • ISSN: 1878-9714
  • E-ISSN: 1878-9722



Extremist online networks reportedly tend to use Twitter and other social networking sites (SNS) to issue propaganda and recruitment statements. Traditional machine learning models may encounter problems in this context because of the peculiarities of microblogging sites and the way these networks interact, both among themselves and with other networks. Moreover, state-of-the-art approaches have focused on non-transparent techniques that cannot be audited, so although they are the top-performing techniques, it is impossible to check whether the resulting models are actually fair. In this paper, we present a semi-supervised methodology that uses our feature selection algorithm (Francisco and Castro 2020) to detect expressions that are biased towards extremist content. With the help of human experts, the relevant expressions are filtered and then used to retrieve further extremist content, iteratively yielding a set of relevant and accurate expressions. These discriminatory expressions have been shown to produce less complex models that are easier to comprehend, and thus improve model transparency. We present close to 70 expressions discovered with this method, together with validation tests of the algorithm in several different contexts.
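
The iterative loop described in the abstract (select biased expressions, let experts filter them, retrieve further content, repeat) can be illustrated with a minimal Python sketch. This is not the feature selection algorithm of Francisco and Castro (2020): the scoring rule (the share of occurrences that fall in the positive class), the thresholds, the function names `expressions`, `biased_expressions` and `discover`, and the `approve` callback standing in for the human expert are all assumptions made for illustration, and the toy strings are neutral placeholders rather than real data.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable, Iterable


def expressions(text: str, max_len: int = 2) -> set[str]:
    """Return the word n-grams (up to max_len words) found in a text."""
    tokens = text.lower().split()
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    }


def biased_expressions(
    positive: Iterable[str],
    negative: Iterable[str],
    min_share: float = 0.8,   # assumed threshold, not taken from the paper
    min_count: int = 2,       # assumed threshold, not taken from the paper
) -> list[str]:
    """Expressions seen mostly in positive texts (hypothetical scoring rule)."""
    pos = Counter(e for t in positive for e in expressions(t))
    neg = Counter(e for t in negative for e in expressions(t))
    return sorted(
        e for e, count in pos.items()
        if count >= min_count and count / (count + neg[e]) >= min_share
    )


def discover(
    seed: Iterable[str],
    negative: Iterable[str],
    pool: Iterable[str],
    approve: Callable[[str], bool],  # stands in for the human expert
    rounds: int = 5,
) -> set[str]:
    """Iteratively grow a set of expert-approved expressions."""
    positive, remaining = list(seed), list(pool)
    accepted: set[str] = set()
    reviewed: set[str] = set()
    for _ in range(rounds):
        # Only show the expert candidates that have not been reviewed yet.
        candidates = [
            e for e in biased_expressions(positive, negative)
            if e not in reviewed
        ]
        reviewed.update(candidates)
        new = {e for e in candidates if approve(e)}
        if not new:
            break
        accepted |= new
        # Retrieve further texts containing any accepted expression.
        hits = [t for t in remaining if expressions(t) & accepted]
        positive.extend(hits)
        remaining = [t for t in remaining if t not in hits]
    return accepted


seed = ["alpha beta now", "alpha beta again"]
negative = ["gamma delta now", "gamma again"]
pool = ["alpha beta zeta eta", "zeta eta alpha beta", "zeta eta only", "gamma zeta"]
found = discover(seed, negative, pool, approve=lambda e: " " in e)
print(sorted(found))  # prints ['alpha beta', 'zeta eta']
```

In this toy run the second expression only becomes visible after the first round has pulled additional texts out of the pool, which is the point of iterating. A real setting would replace the `approve` callback with an actual expert review step and the pool filter with a query against the microblogging platform.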



  1. Alharbi, Ahmed S. M., and Elise de Doncker
    2019 ‘Twitter Sentiment Analysis with a Deep Neural Network: An Enhanced Approach Using User Behavioral Information’. Cognitive Systems Research 54: 50–61.
    https://doi.org/10.1016/j.cogsys.2018.10.001
  2. Al-Salemi, Bassam, Shahrul Azman Mohd Noah, and Mohd Juzaiddin Ab Aziz
    2016 ‘RFBoost: An Improved Multi-Label Boosting Algorithm and Its Application to Text Categorisation’. Knowledge-Based Systems 103 (July): 104–17.
    https://doi.org/10.1016/j.knosys.2016.03.029
  3. Alvari, Hamidreza, Soumajyoti Sarkar, and Paulo Shakarian
    2019 ‘Detection of Violent Extremists in Social Media’. ArXiv:1902.01577 [Cs], February. arxiv.org/abs/1902.01577
    https://doi.org/10.1109/ICDIS.2019.00014
  4. Ashktorab, Zahra, Christopher Brown, Manojit Nandi, and Aron Culotta
    2014 ‘Tweedr: Mining Twitter to Inform Disaster Response’. In ISCRAM.
  5. Benigni, Matthew C., Kenneth Joseph, and Kathleen M. Carley
    2017 ‘Online Extremism and the Communities That Sustain It: Detecting the ISIS Supporting Community on Twitter’. PLOS ONE 12 (12): e0181405.
    https://doi.org/10.1371/journal.pone.0181405
  6. Caropreso, Maria Fernanda, Stan Matwin, and Fabrizio Sebastiani
    2001 ‘A Learner-Independent Evaluation of the Usefulness of Statistical Phrases for Automated Text Categorization’, 15.
  7. Cowan, Nelson
    2001 ‘The Magical Number 4 in Short-Term Memory: A Reconsideration of Mental Storage Capacity’. The Behavioral and Brain Sciences 24 (1): 87–114; discussion 114–185.
    https://doi.org/10.1017/S0140525X01003922
  8. Deng, Xuelian, Yuqing Li, Jian Weng, and Jilian Zhang
    2019 ‘Feature Selection for Text Classification: A Review’. Multimedia Tools and Applications 78 (3): 3797–3816.
    https://doi.org/10.1007/s11042-018-6083-5
  9. Ding, Jianli, and Liyang Fu
    2018 ‘A Hybrid Feature Selection Algorithm Based on Information Gain and Sequential Forward Floating Search’. Journal of Intelligent Computing 9 (3): 93.
    https://doi.org/10.6025/jic/2018/9/3/93-101
  10. FAT/ML
    n.d. ‘Principles for Accountable Algorithms and a Social Impact Statement for Algorithms’. Accessed 8 January 2019. www.fatml.org/resources/principles-for-accountable-algorithms
  11. Forman, George
    2003 ‘An Extensive Empirical Study of Feature Selection Metrics for Text Classification’. Journal of Machine Learning Research 3 (March).
  12. Francisco, Manuel, and Juan Luis Castro
    2020 ‘Discriminatory Expressions to Produce Interpretable Models in Microblogging Context’. ArXiv:2012.02104 [Cs], November. arxiv.org/abs/2012.02104
  13. Galavotti, Luigi, Fabrizio Sebastiani, and Maria Simi
    2000 ‘Experiments on the Use of Feature Selection and Negative Evidence in Automated Text Categorization’. In Research and Advanced Technology for Digital Libraries, edited by José Borbinha and Thomas Baker, 59–68. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer.
    https://doi.org/10.1007/3-540-45268-0_6
  14. Go, Alec, Richa Bhayani, and Lei Huang
    2009 ‘Twitter Sentiment Classification Using Distant Supervision’. Processing 150 (January).
  15. Harris, Zellig S.
    1954 ‘Distributional Structure’. Word 10 (2–3): 146–62.
    https://doi.org/10.1080/00437956.1954.11659520
  16. Kotzias, Dimitrios, Misha Denil, Nando de Freitas, and Padhraic Smyth
    2015 ‘From Group to Individual Labels Using Deep Features’. In KDD ’15.
    https://doi.org/10.1145/2783258.2783380
  17. Kubat, Miroslav
    2017 An Introduction to Machine Learning. Cham: Springer International Publishing.
    https://doi.org/10.1007/978-3-319-63913-0
  18. Largeron, Christine, Christophe Moulin, and Mathias Géry
    2011 ‘Entropy Based Feature Selection for Text Categorization’. In ACM Symposium on Applied Computing, edited by William C. Chu, W. Eric Wong, Mathew J. Palakal, and Chih-Cheng Hung, 924–28. TaiChung, Taiwan: ACM.
    https://doi.org/10.1145/1982185.1982389
  19. Miller, George A.
    1956 ‘The Magical Number Seven, plus or Minus Two: Some Limits on Our Capacity for Processing Information’. Psychological Review 63 (2): 81–97.
    https://doi.org/10.1037/h0043158
  20. Misangyi, Vilmos F., Jeffery A. LePine, James Algina, and Francis Goeddeke Jr.
    2016 ‘The Adequacy of Repeated-Measures Regression for Multilevel Research: Comparisons With Repeated-Measures ANOVA, Multivariate Repeated-Measures ANOVA, and Multilevel Modeling Across Various Multilevel Research Designs’. Organizational Research Methods, June.
    https://doi.org/10.1177/1094428105283190
  21. O’Dair, M., and A. Fry
    2019 ‘Beyond the Black Box in Music Streaming: The Impact of Recommendation Systems upon Artists’. Popular Communication.
    https://doi.org/10.1080/15405702.2019.1627548
  22. Periñán-Pascual, Carlos, and Francisco Arcas-Túnez
    2019 ‘Detecting Environmentally-Related Problems on Twitter’. Biosystems Engineering, Intelligent Systems for Environmental Applications, 177 (January): 31–48.
    https://doi.org/10.1016/j.biosystemseng.2018.10.001
  23. Phillips, Avery
    2018 ‘The Moral Dilemma of Algorithmic Censorship’. Becoming Human: Artificial Intelligence Magazine. 27 August 2018. https://becominghuman.ai/the-moral-dilemma-of-algorithmic-censorship-6d7b6faefe7
  24. Rudin, Cynthia
    2018 ‘Please Stop Explaining Black Box Models for High Stakes Decisions’. ArXiv:1811.10154 [Cs, Stat], November. arxiv.org/abs/1811.10154
  25. Rutkowski, Leszek, Ryszard Tadeusiewicz, Lotfi A. Zadeh, and Jacek M. Zurada
    2008 Artificial Intelligence and Soft Computing – ICAISC 2008: 9th International Conference Zakopane, Poland, June 22–26, 2008, Proceedings. Springer Science & Business Media.
    https://doi.org/10.1007/978-3-540-69731-2
  26. Senthil, Kumar B. and Varma E. Bhavitha
    2016 ‘A Different Type of Feature Selection Methods for Text Categorization on Imbalanced Data’ 5 (9): 7.
  27. Sparck-Jones, Karen
    1972 ‘A Statistical Interpretation of Term Specificity and Its Application in Retrieval’. Journal of Documentation 28 (1): 11–21.
    https://doi.org/10.1108/eb026526
  28. Twitter Inc.
  29. ‘Twitter Usage Statistics – Internet Live Stats’. 2013. www.internetlivestats.com/twitter-statistics/
  30. Villena-Román, Julio, Sara Lana-Serrano, Eugenio Martínez-Cámara, and José Carlos González-Cristóbal
    2013 ‘TASS – Workshop on Sentiment Analysis at SEPLN’. Procesamiento del Lenguaje Natural 50 (0): 37–44.
  31. Wang, Hao, Dogan Can, Abe Kazemzadeh, François Bar, and Shrikanth Narayanan
    2012 ‘A System for Real-Time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle’. In Proceedings of the ACL 2012 System Demonstrations, 115–20. ACL ’12. Stroudsburg, Penn.: Association for Computational Linguistics. dl.acm.org/citation.cfm?id=2390470.2390490
  32. Wu, Guohua, Liuyang Wang, Nailiang Zhao, and Hairong Lin
    2015 ‘Improved Expected Cross Entropy Method for Text Feature Selection’. In 2015 International Conference on Computer Science and Mechanical Automation (CSMA), 49–54.
    https://doi.org/10.1109/CSMA.2015.17
  33. Xu, Yan, Gareth Jones, Jintao Li, Bin Wang, and Chunming Sun
    2007 ‘A Study on Mutual Information-Based Feature Selection for Text Categorization’. Journal of Computational Information Systems 3 (March).
  34. Xue, Bing, Mengjie Zhang, and Will Browne
    2013 ‘Particle Swarm Optimization for Feature Selection in Classification: A Multi-Objective Approach’. IEEE Transactions on Cybernetics 43 (December): 1656–71.
    https://doi.org/10.1109/TSMCB.2012.2227469
  35. Zhao, Z., M. Gao, J. Yu, Y. Song, X. Wang, and M. Zhang
    2018 ‘Impact of the Important Users on Social Recommendation System’. Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST 252: 425–34.
    https://doi.org/10.1007/978-3-030-00916-8_40
  36. Zheng, Hai-Tao, Zhe Wang, Wei Wang, Arun Kumar Sangaiah, Xi Xiao, and Congzhi Zhao
    2018 ‘Learning-Based Topic Detection Using Multiple Features’. Concurrency and Computation-Practice & Experience 30 (15): e4444.
    https://doi.org/10.1002/cpe.4444
  37. Zheng, Zhaohui, Xiaoyun Wu, and Rohini Srihari
    2004 ‘Feature Selection for Text Categorization on Imbalanced Data’. ACM SIGKDD Explorations Newsletter 6 (1): 80–89.
    https://doi.org/10.1145/1007730.1007741
