Chapter 10. Challenges and strategies for beginners to solve research questions with DH methodologies on a corpus of multilingual Philippine periodicals

A frequently mentioned problem in Digital Humanities (DH) is the difficult fit between Humanities research questions and DH methodologies. This chapter is therefore conceived as a meta-chapter that explains the problems encountered, and the strategies adopted, when exploring the multilingual repository of Philippine periodicals built within the project “Strengthening Digital Research at the UP System” in order to study how the image of China evolved in these periodicals. The two main challenges in analysing the periodicals were (1) problematic OCR output and (2) research across multilingual publications. The chapter lists literature and research projects that have approached similar questions and challenges in comparable corpora, and suggests some tools to address them.

  • Affiliations: 1: University of Antwerp

