Empirical evaluation: Towards an automated index of lexical variety
This chapter proposes an objective approach to the formal analysis of literary prose in English in order to investigate the relation between lexical density and judgments of canonicity. Drawing on the Russian Formalists' concept of literariness and on the notion of lexical variety, it designs a mathematical index relating three variables that take the materiality of the text into consideration: (a) relative frequency of lexical bundles, (b) lexical bundle type/token ratio, and (c) word type/token ratio. The index is described and illustrated with 46 canonical and non-canonical literary works. Statistical analysis shows no significant relation between lexical richness and the classification of works as canonical, indicating that judgments of canonicity may be influenced by factors other than the text itself.
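As a rough illustration of the three variables named above, the following Python sketch computes them for a single text. It is not the chapter's index: the abstract does not state how the variables are combined, so no combined score is attempted. The tokenizer, the bundle length `n=3`, the recurrence threshold `min_count=2`, and the function names `tokenize` and `index_components` are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphabetic word tokens (apostrophes kept).
    return re.findall(r"[a-z']+", text.lower())


def index_components(text, n=3, min_count=2):
    """Return the three variables for one text.

    Assumption: a "lexical bundle" is any word n-gram that occurs
    at least `min_count` times in the text.
    """
    tokens = tokenize(text)
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(ngrams)

    # Keep only recurring n-grams as bundles.
    bundles = {gram: c for gram, c in counts.items() if c >= min_count}
    bundle_tokens = sum(bundles.values())  # total bundle occurrences
    bundle_types = len(bundles)            # distinct bundles

    return {
        # (a) share of all n-gram positions occupied by bundles
        "bundle_rel_freq": bundle_tokens / len(ngrams) if ngrams else 0.0,
        # (b) distinct bundles divided by bundle occurrences
        "bundle_ttr": bundle_types / bundle_tokens if bundle_tokens else 0.0,
        # (c) distinct words divided by total words
        "word_ttr": len(set(tokens)) / len(tokens) if tokens else 0.0,
    }


if __name__ == "__main__":
    sample = "the cat sat on the mat the cat sat on the rug"
    print(index_components(sample))
```

Because type/token ratios fall as texts get longer, any comparison across works of different lengths would need some form of length normalization, for example fixed-size samples. How the chapter handles this is not stated in the abstract.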