Dimensions of incongruity in register humour
Register-based humour consists of texts in which most of the language is in one style or tone, except for one or two words whose tone (or register) differs radically from the rest. It is not initially clear how to define register formally in terms of constructs such as literariness, archaism or formality. We have adopted a perspective in which words are located in a multi-dimensional space, and incongruity between words corresponds to a relatively large distance between them within this space. To construct the space so that it brings out differences relevant to register, we base each dimension on a word’s frequency of occurrence in a particular corpus of texts. We assembled a number of corpora that are likely to differ in tone or register, and for each word in a text we compute its frequency within every corpus. These numbers are then used to plot the word’s position in our abstract space. The most successful technique, both for building the space and for computing outliers, was tested on the task of distinguishing humorous texts from plain newspaper sentences, where it performed quite well.
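The idea of placing words in a corpus-frequency space and looking for outliers can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the work itself: the two toy corpora, the add-one smoothed log relative frequency used as the coordinate, Euclidean distance as the measure of incongruity, and the function names `frequency_vector` and `most_incongruous`.

```python
import math
from collections import Counter

# Hypothetical toy corpora standing in for collections that differ in register.
CORPORA = {
    "archaic": "thou art wise and thou dost know thy way verily".split(),
    "news": ("the minister said the budget would fund the road "
             "and the minister said so").split(),
}

# Precompute token counts and corpus sizes once.
COUNTS = {name: Counter(tokens) for name, tokens in CORPORA.items()}
SIZES = {name: len(tokens) for name, tokens in CORPORA.items()}


def frequency_vector(word):
    """One coordinate per corpus: log of add-one smoothed relative frequency.

    The smoothing and the log transform are assumptions of this sketch.
    """
    return [
        math.log((COUNTS[name][word] + 1) / (SIZES[name] + 1))
        for name in sorted(CORPORA)
    ]


def most_incongruous(words):
    """Return the word whose mean Euclidean distance to the others is largest."""
    vectors = {w: frequency_vector(w) for w in words}

    def mean_distance(word):
        others = [w for w in words if w != word]
        return sum(math.dist(vectors[word], vectors[o]) for o in others) / len(others)

    return max(words, key=mean_distance)


if __name__ == "__main__":
    text = ["the", "minister", "said", "verily"]
    print(most_incongruous(text))
```

In this toy setting the word that is frequent in one corpus but absent from the other ends up far from the rest of the text. A realistic version would need much larger corpora and a principled choice of normalisation and distance measure.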