Identifying aboutgrams in engineering texts
This paper uses a new computer-mediated methodology, concgramming, to identify the aboutness of a text. Concgrams are the raw products of the concgramming process and consist of up to five co-occurring words, irrespective of whether constituency variation (i.e. AB, A*B, where * represents an intervening word) and/or positional variation (i.e. AB, BA) is present. Two engineering research articles are concgrammed to identify the most frequently occurring two-word lexical concgrams. The most frequent two-word lexical concgrams for each text are examined to determine whether the words simply co-occur or are meaningfully associated. Once this has been done, a provisional list of “aboutgrams” is drawn up for each text and tentatively taken to represent its aboutness. These lists are then checked first against a specialised corpus of engineering texts and then against a general reference corpus. Those aboutgrams on the lists that are consistently more frequent in the text than in the two corpora are put forward as representing the aboutness of the text. In the study, the lists of aboutgrams are compared with single-word frequency lists to evaluate the advantages to be gained from determining aboutness by means of phraseology rather than key words. The conclusion is that aboutgrams are a better means of uncovering the aboutness of the texts.
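
The procedure outlined above can be illustrated with a minimal Python sketch. It is not the concgramming software used in the study. The stop list, the `span` parameter, the function names and the use of raw relative frequency for the corpus comparison are all assumptions made for illustration. The sketch counts two-word lexical concgrams as unordered pairs within a window, so that AB, BA and A*B all count as instances of the same concgram, and then keeps only those pairs that are relatively more frequent in the text than in every reference corpus. The manual step of judging whether co-occurring words are meaningfully associated is not automated here.

```python
from collections import Counter

# Hypothetical stop list; a real study would use a much fuller list
# of grammatical words to isolate "lexical" concgrams.
STOPWORDS = {"the", "a", "an", "of", "in", "is", "and", "to", "was", "by"}


def two_word_concgrams(tokens, span=4):
    """Count unordered pairs of lexical words that co-occur with the
    second word among the `span` tokens following the first.

    Storing each pair as a frozenset ignores positional variation
    (AB vs BA); scanning a window rather than adjacent tokens only
    allows constituency variation (AB vs A*B).
    """
    counts = Counter()
    for i, origin in enumerate(tokens):
        if origin in STOPWORDS:
            continue
        for j in range(i + 1, min(i + 1 + span, len(tokens))):
            other = tokens[j]
            if other in STOPWORDS or other == origin:
                continue
            counts[frozenset((origin, other))] += 1
    return counts


def candidate_aboutgrams(text_tokens, reference_corpora, top_n=10, span=4):
    """Return the top_n concgrams of the text that are relatively more
    frequent in the text than in every reference corpus (assumed criterion:
    occurrences per token)."""
    text_counts = two_word_concgrams(text_tokens, span)
    references = [(two_word_concgrams(c, span), len(c)) for c in reference_corpora]
    kept = []
    for pair, count in text_counts.most_common(top_n):
        text_rate = count / len(text_tokens)
        if all(text_rate > (ref[pair] / n if n else 0.0) for ref, n in references):
            kept.append((tuple(sorted(pair)), count))
    return kept


# Toy data, invented purely for illustration.
text = "the steel beam was loaded and the beam of steel failed".split()
reference = "the bridge was built of concrete and stone".split()
result = candidate_aboutgrams(text, [reference], top_n=1, span=2)
print(result)
```

In the toy text, "steel beam" and "beam of steel" are counted as two instances of the same concgram, which a contiguous, order-sensitive bigram count would treat as unrelated.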