Chapter 7. Automatic text classification of disciplinary texts


This research classifies the academic texts in the PUCV-2006 Corpus of Spanish using two automatic classification methods and compares their performance. Both methods rely on lexical-semantic content words shared across the academic texts of the corpus. The methods compared are Multinomial Naive Bayes and Support Vector Machine. Each identifies a small group of shared words that, through their statistical weights, help assign a new text to one of the four disciplinary areas represented in the corpus. The results show that Support Vector Machine classifies academic texts efficiently. Using this method, we were able to identify the disciplinary domain of an academic text automatically from a reduced number of shared content lexemes, with high performance even in highly refined disciplines such as Psychology and Social Work.
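To make the idea of classifying by statistically weighted shared words concrete, the following is a minimal sketch of a Multinomial Naive Bayes classifier written from scratch. It is not the implementation used in the study. The toy token lists, the English words, the restriction to two of the four areas, and the function names `train_multinomial_nb` and `classify` are all assumptions made for illustration. The Support Vector Machine side of the comparison is not shown.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: each document is a list of content lexemes.
# These are NOT taken from the PUCV-2006 corpus.
TRAIN_DOCS = [
    ["cognition", "memory", "behavior", "therapy"],
    ["therapy", "emotion", "cognition", "patient"],
    ["community", "family", "welfare", "intervention"],
    ["welfare", "community", "policy", "family"],
]
TRAIN_LABELS = ["Psychology", "Psychology", "Social Work", "Social Work"]


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods per class."""
    vocab = sorted({word for doc in docs for word in doc})
    docs_per_class = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)

    model = {}
    for label, n_docs in docs_per_class.items():
        total_words = sum(word_counts[label].values())
        denom = total_words + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(docs)),
            # The per-word weight: how strongly a lexeme points to this class.
            "log_lik": {
                word: math.log((word_counts[label][word] + alpha) / denom)
                for word in vocab
            },
        }
    return model


def classify(model, doc):
    """Return the class with the highest posterior log score for `doc`."""
    scores = {}
    for label, params in model.items():
        score = params["log_prior"]
        for word in doc:
            # Words never seen in training are ignored.
            if word in params["log_lik"]:
                score += params["log_lik"][word]
        scores[label] = score
    return max(scores, key=scores.get)


MODEL = train_multinomial_nb(TRAIN_DOCS, TRAIN_LABELS)
print(classify(MODEL, ["memory", "therapy", "family"]))
print(classify(MODEL, ["community", "policy"]))
```

In this sketch, a new text is assigned to the area whose shared words carry the largest combined weight, which is the general behavior the abstract attributes to both methods. A real comparison would use the full corpus and a held-out evaluation set.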

