Register variation in school EFL textbooks

This study applies additive Multi-Dimensional Analysis (MDA) (Biber 1988) to explore the linguistic characteristics of ‘school English’ or ‘textbook English’. It seeks to find out how text registers commonly featured in English as a Foreign Language (EFL) textbooks differ from comparable registers found outside the EFL classroom. To this end, a Textbook English Corpus (TEC) of 43 coursebooks used in European schools is mobilised. The texts from six textbook register subcorpora and three target language corpora are mapped onto Biber’s (1988) ‘Involved vs. Informational’ dimension of General English. Register accounts for 63% of the variance in these dimension scores in the TEC. Additional factors such as textbook level, series and country of publication/use only play a marginal role in mediating textbook register variation. Textbook dialogues score considerably lower than the Spoken BNC2014, whereas Textbook Fiction scores closest to its corresponding reference Youth Fiction Corpus. Pedagogical and methodological implications are discussed.


School English as a Foreign Language (EFL) textbooks
Although no reliable data on textbook usage is available, it would appear that virtually all lower secondary EFL classrooms in Europe are equipped with textbooks. In most cases, they are the de facto interpretation of the curriculum and their tables of contents dictate the syllabus (cf. Vellenga 2004). At lower secondary level, few additional materials are used; hence, textbooks can be assumed to be the main source of language input for at least the first four to five years of EFL learning at secondary school (Usó-Juan & Martínez-Flor 2010: 424). If, as postulated by usage-based approaches, language learning is driven by frequency and frequency distributions of exemplars within constructions (cf. Ellis & Collins 2009), understanding what characterises the type of language that learners are exposed to via their textbooks is crucial to understanding learner language development.
Though corpus-based textbook analysis can be traced back to the pioneering work of Dieter Mindt in the 1980s, secondary school (as opposed to university-level) EFL textbook language remains a relatively understudied area. As for the transfer of corpus linguistic insights into EFL textbooks, the much-awaited breakthrough has yet to materialise (cf. Römer 2006). Although some textbook authors and publishers have started to make use of corpora, the rise in the number of corpus-informed pedagogical publications appears to primarily apply to learner dictionaries, grammars, English for Special Purposes (ESP) and English for Academic Purposes (EAP) textbooks (Meunier & Gouverneur 2009: 180-181). With few exceptions (e.g., Cambridge University Press), general EFL textbooks, especially those designed for national primary and secondary school markets, remain largely unaffected by these developments (personal communication with French and German publishers).
The school EFL textbooks examined in the present study are designed to provide sufficient materials for a whole school year's worth of (in most cases compulsory) English lessons at lower secondary school in France, Germany and Spain, where communicative approaches to foreign language teaching are favoured. They thus explain and provide exercises for grammar and vocabulary, as well as include tasks designed to develop reading, writing, speaking, listening and mediation skills. It can be assumed that the majority of the texts featured in these textbooks have been (co-)written by the authors of the textbooks since only very few texts, mostly from the fiction register, are clearly labelled as extracts or simplified versions of original texts (e.g., novels, newspaper articles).

Textbook English studies
Previous studies suggest that EFL textbooks tend to have a strong focus on edited, professionally-written language, which may not correspond to secondary school EFL learners' target language (Le Foll 2020).
The language features studied in previous Textbook English research range from individual words (e.g., Conrad 2004 on the preposition though) and phraseological patterns (e.g., Gouverneur 2008 on high-frequency verbs) to grammatical structures (e.g., Barbieri & Eckhardt 2007 on reported speech; Römer 2004 on modals), and this research has more rarely ventured into the study of pragmatics (e.g., hedging in ESP/EAP textbooks, Hyland 1994) and spoken grammar (Gilmore 2004). However, each of these studies focused on only one or, at most, a handful of individual features. Taken together, these studies provide valuable insights into "the kind of synthetic English" (Römer 2004: 185) that pupils are exposed to via their school textbooks. However, three crucial aspects have commonly been neglected in past endeavours to study the language of EFL/ESL textbooks.
First, interactions between the frequencies of individual linguistic features have generally not been considered. Usage-based approaches to language acquisition, however, claim that the co-occurrence information that learners perceive in language input "is stored as points in a multi-dimensional space at coordinates, and that speakers process this stored linguistic information in ways that allow them to identify (under certain conditions and defined by various types of frequency occurrences) abstract linguistic patterns" (Rautionaho & Deshors 2018: 229). Thus, whilst some influential studies have helped us understand how EFL/ESL learners can be misled by their textbooks to make unidiomatic use of specific linguistic features (e.g., progressive aspect, Römer 2005), only a multivariate approach can paint the full picture as to how "Textbook English" -as a whole -differs from the English that language learners will later encounter outside the classroom.
The second frequently neglected aspect concerns potential register differences between the various types of texts typically featured in school foreign language textbooks. It has long been established that situational characteristics of texts are a major driver of functional linguistic variation (cf., e.g., Biber 2012; Gray & Egbert 2019). Given that school EFL textbooks may feature, for example, extracts of a short story, a dialogue, instructions, and exercises on any double page, Textbook English cannot be meaningfully examined without taking a register-based approach. Up until now, however, register variation within EFL textbooks has largely been ignored (however, see Miller 2011 with respect to university-level ESL textbooks). In the few cases where register has been taken into consideration, the focus has almost exclusively been on the representations of spoken language, e.g., Mindt (1987, 1995) and Römer (2005), who compared the dialogues of secondary school EFL textbooks to corpora of spoken and pseudo-spoken native speaker English. However, to the author's best knowledge, other textbook registers, such as fiction, instructions, or informative texts, have yet to be explored in EFL textbooks.
Finally, previous quantitative corpus-based studies of textbook language have usually been undertaken at the corpus level (rather than at coursebook volume, chapter, unit or individual text level) and have therefore not been able to take the potential impact of the varying proficiency levels of the textbooks or any potential idiosyncrasies of textbook authors, editors or publishers into consideration.

A multivariate exploration of textbook English
Consequently, this study aims to explore the specificities of Textbook English by: a. accounting for a broad range of lexical, grammatical and semantic features, b. taking account of potential register differences within textbooks, and c. using statistical methods that can model the potential effects and interactions of textbook register, series and proficiency level.
To do so, Biber's MDA framework is applied to the study of register variation in school EFL textbooks. In his pioneering study, Biber (1988) elaborated a robust model of language variation in written and spoken English along six dimensions (cf. Nini 2014, 2019 for an empirical validation of its generalisation to new texts using the Brown Corpus). The MDA framework uses factor analysis to reduce the co-occurrence patterns of a large set of lexico-grammatical features to a parsimonious set of latent factors, which are functionally interpreted (cf. Biber 1988; Conrad & Biber 2001; Berber Sardinha & Biber 2014). Biber's (1988) model of general written and spoken English was elaborated on the basis of the co-occurrence patterns of 67 (largely automatically tagged) linguistic features observed in a large corpus covering a broad range of registers, including face-to-face conversation, press, official documents, letters, etc. Post-1988, two approaches to register variation studies applying MDA have emerged. The first, 'additive' MDA, compares one or more new or more specialised registers to the dimensions of an earlier analysis, most commonly Biber's (1988) model of General English (cf. Conrad 1996, 2001; Biber et al. 2002, 2004), all of which relied on Biber's 1988 model as their baseline. Conrad (1996, 2001/2013) applied Biber's (1988) model to research articles and university-level textbooks. She compared dimension scores for the two registers (research articles/textbooks) and two disciplines (ecology/history). On Biber's first dimension, all disciplinary texts clustered at the negative, informational end of the scale, thus pointing to overall high informational density.
However, fine-grained analyses of the many features that were entered in the MDA revealed notable differences between the two academic registers: the research articles featured more nouns, prepositions, attributive adjectives, and longer words, thus conveying information that is more densely packed than the textbooks that, by contrast, tended to feature more linguistically redundant explanations and examples.
In addition to textbook evaluation, additive MDA may also be used in the development of pedagogical materials: Zuppardo (2013) applied the method to compare the language of aircraft manuals to Biber's (1988) model. The results revealed the salient linguistic features of this specialised register. These can be used by teachers and textbook authors to develop ESP/EAP materials.
Whilst Conrad (1996, 2001/2013) and Zuppardo (2013) have demonstrated the potential of additive MDA in textbook language research, this method has yet to be applied to secondary school EFL textbooks. In fact, as mentioned above, EFL textbook studies have, so far, largely been univariate and, with few exceptions, have mostly ignored potential register-based linguistic variation within textbooks. It is quite probable that the sheer complexity of carrying out an MDA has hitherto been prohibitive to applying Biber's (1988) model to applied research questions such as register variation in school EFL textbooks (cf. Nini 2019: 92). This study will therefore also investigate the potential of using a freely available, all-in-one programme (Nini 2019), which automatically tags, counts, and computes dimension scores for the first five of Biber's (1988) dimensions, for the analysis of register variation in secondary school EFL textbooks.

Aims and research questions
This paper aims to overcome some of the limitations of past EFL textbook studies by applying MDA to explore linguistic variation within school EFL textbooks and thus provide a more comprehensive view of the defining characteristics of Textbook English. It therefore seeks to tackle the following research questions:

RQ1. What is the extent of the linguistic variation across the major registers of Textbook English? Do some textbook series show significantly more or less register-based variation? Do the proficiency levels of textbooks significantly interact with register-based variation?

RQ2. To what extent do Textbook English registers differ from situationally-similar, naturally-occurring registers? Are any significant differences observed between different textbook series and/or the proficiency levels of individual textbook volumes?

RQ3. What are some of the defining linguistic features that characterise Textbook English registers as compared to situationally-similar target language registers?
In addition, the strengths and limitations of applying additive MDA to the investigation of Textbook English using readily available software are considered and discussed.

Textbook English corpus (TEC)
The data explored in this paper is part of the Textbook English Corpus (TEC) (Le Foll in preparation). The TEC is made up of all the texts printed in 43 EFL coursebooks used in secondary schools in France, Germany and Spain, as well as the transcripts of the accompanying audio and video materials (see Table 1). Nine best-selling textbook series from eight major publishers are represented. Each series corresponds to the first four or five years of English instruction at secondary school level.
To be able to compare pedagogical materials used in different educational systems, each textbook was labelled for proficiency level on a universal scale of A to E (see Table 1): level A textbooks correspond to the first year of EFL instruction at secondary level, in other words, beginner level to roughly A1 on the CEFR scale (Council of Europe 2004), whilst level E corresponds to the fifth year (CEFR B1-B2). French textbook series only cover the first four years of secondary school (which take place at Collèges), which is why, whenever possible, a textbook from the same publisher corresponding to the fifth year of instruction (the first year of Lycée) was added. At the time of corpus compilation, Le Livre Scolaire did not produce any textbooks for Lycées. Each of the 43 textbook volumes was digitised and manually subdivided into text units, where one exercise, reading passage, or transcript corresponds to one text unit. At the same time, these texts were also coded for eight major textbook registers: Conversation, Informative texts, Fiction, Personal Correspondence (letters, diary entries, social media posts, and e-mails), Instructional texts (instructions and explanations), Poetry (songs and poems), Other texts (timetables, shopping lists, etc.) and Words & Phrases (e.g., contextless words and sentences from exercises). The categories Other texts and Words & Phrases were not analysed in the context of this paper. Example texts of the six textbook registers examined here can be found in the Appendix.
The coding was carried out by the author and a student research assistant. The coding scheme was developed following a cyclical categorisation process and was tested by having both coders blind-annotate three full textbook volumes and comparing the results. Inter-rater agreement rate was found to be satisfactorily high (96.65%). The only notable difficulty consisted in distinguishing between individual sentences and isolated words/phrases; hence these two categories were merged into one in the final annotation scheme. The use of custom macros activated using keyboard shortcuts considerably facilitated the XML annotation process and reduced the potential for inattention errors (Le Foll 2020).
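The reported agreement rate can be understood as simple percentage agreement between the two coders. The sketch below illustrates this measure only; it is not the study's actual annotation tooling, and the register labels in the example are invented.

```python
# Illustrative sketch (not the study's actual tooling): simple percentage
# agreement between two coders who assigned register labels to the same
# sequence of text units. The example labels below are invented.
def percent_agreement(labels_a, labels_b):
    """Proportion of text units given the same register label by both coders."""
    assert len(labels_a) == len(labels_b), "Coders must label the same units"
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

coder_1 = ["Conversation", "Fiction", "Instructional", "Poetry"]
coder_2 = ["Conversation", "Fiction", "Informative", "Poetry"]
print(percent_agreement(coder_1, coder_2))  # 0.75
```

Note that raw percentage agreement does not correct for chance agreement; chance-corrected measures such as Cohen's kappa are often reported alongside it.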
The majority of textbook texts are too short for normalised linguistic feature counts to be meaningful. Linguists attempting to apply MDA to social media texts face a similar problem. To solve this issue in their multi-dimensional analysis of Twitter data, Clarke & Grieve (2017: 2) opted for binary feature frequencies (i.e., whether a feature is present or absent within a tweet) rather than relative frequencies. If, as Clarke & Grieve did, one considers a single tweet (as opposed to a thread of tweets) as a single text, this approach is very sensible because single tweets have, by corpus linguistic standards, a very small maximum character limit (currently 280 characters) and as a result, relative frequencies would largely depend on tweet length. The case of textbook texts, however, is much more complex: whilst many textbook texts are as short as a tweet (e.g., brief instructions, short rhymes), countless others run well over 1,000 words (e.g., short stories, newspaper articles). Indeed, defining text units in school EFL textbooks is a particularly challenging task. Numerous possibilities arise (cf. Le Foll 2020). Up until now, entire textbook volumes have often been conceived as single texts. However, as highlighted in Section 1.3, such an approach entirely ignores the variety of text registers encountered within a single textbook volume. A second approach might consider all the texts of one register found within a chapter or unit of a textbook volume to constitute one text. In some cases, this may be justified because texts within a textbook unit will often be thematically related and may therefore form a coherent whole; however, this will depend on the textbook series and is not always consistent across an entire textbook series, either.
In addition to the problem of defining text units, the great variety of text lengths encountered in school EFL textbooks must also be considered. Whilst there is no standard minimum text length for MDA studies, in order to carry out an additive MDA based on Biber's 1988 model, the type/token ratio variable must be calculated on the basis of the first 400 words of any text (Biber 1988: 238-239). It has long been established that type/token ratios must be calculated on the basis of text samples of equal text length as this lexical diversity measure is highly sensitive to text length (e.g., Brezina 2018: 58). Consequently, texts shorter than 400 words could not be included in the present analysis.
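The length-controlled type/token ratio described above can be sketched as follows. This is a simplified illustration with deliberately naive tokenisation; the MAT's actual implementation differs.

```python
# Simplified sketch of a length-controlled type/token ratio: only the first
# 400 words of a text enter the calculation, so that the measure is not
# confounded by text length. Tokenisation here is deliberately naive.
def ttr_first_400(text, sample_size=400):
    tokens = text.lower().split()[:sample_size]
    if len(tokens) < sample_size:
        raise ValueError("Text is shorter than the required 400-word sample")
    return len(set(tokens)) / sample_size
```

Texts under 400 words raise an error here, mirroring the exclusion criterion applied in the present analysis.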
In light of both the great variety of text lengths encountered in school EFL textbooks and the fact that the majority are under 400 words, shorter texts within each textbook volume and register were collated into longer text files. This means that, for example, a number of short, consecutive instructional texts from any one textbook volume were combined until a total word count of at least 400 words was reached. This was done sequentially within each textbook volume so that short files from within a chapter/unit or across directly adjacent chapters/units are grouped together. Hence, the collated text files also correspond to the progression that the learners are expected to make. This resulted in the exclusion of Poetry texts from thirteen volumes, Fiction texts from seven volumes, and Informative texts from two volumes because the texts of these registers did not total at least 400 words. Following these data preparation steps, 1,949 textbook text files were created (hereafter collectively referred to as the TEC, see Table 2).
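The sequential collation step can be illustrated as follows. The handling of a trailing remainder under 400 words is an assumption made for this sketch; the paper only states that registers never reaching the threshold in a volume were excluded.

```python
# Illustrative sketch: consecutive short texts of the same register within a
# textbook volume are concatenated, in their order of appearance, until each
# collated file reaches at least 400 words. A final remainder under 400 words
# is dropped in this sketch (an assumption; the paper does not specify this).
def collate(texts, min_words=400):
    files, current, count = [], [], 0
    for text in texts:
        current.append(text)
        count += len(text.split())
        if count >= min_words:
            files.append(" ".join(current))
            current, count = [], 0
    return files
```

Because collation proceeds in order of appearance, each resulting file groups texts from the same or directly adjacent chapters/units, preserving the learners' expected progression.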

Target language reference corpora
In answering RQ2 and RQ3, this paper focuses on three major textbook registers (Conversation, Fiction and Informative texts) by comparing these three subcorpora of the TEC with reference corpora of situationally-similar target language registers. This section briefly outlines the composition of these reference corpora.

Spoken BNC2014
The Textbook Conversation subcorpus is compared to the Spoken BNC2014, an 11.4-million-word corpus of 1,251 orthographically transcribed conversations among L1 speakers in the U.K. (Love et al. 2017). The Spoken BNC2014 is rich in metadata and has been manually anonymised; however, for the purposes of this study, all mark-ups have been eliminated and anonymising tags replaced with placeholders of the corresponding word class (e.g., all anonymised place names have been replaced by IVYBRIDGE).

Youth fiction corpus (YFC)
The Fiction subcorpus of the TEC was compared to the Youth Fiction Corpus (YFC), which consists of 300 novels targeted at teenagers and young adults (Le Foll in preparation). This is a better match for the narrative texts featured in school EFL textbooks than the fiction included in Biber's 1988 corpus, both in terms of target readership and publication dates. For the present study, four random samples of approximately 5,000 words were extracted from each of the 300 books in the corpus (splitting was performed at sentence boundaries, hence the slightly varying word counts), except for three short stories, which were only sampled once each in full. With a total of 1,191 YFC texts, this procedure resulted in a number of texts comparable to that of the Spoken BNC2014.
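The sampling procedure can be sketched like this. Sentence splitting here is naive and the exact implementation used for the YFC is not documented in this sketch; it only illustrates why the samples vary slightly around 5,000 words.

```python
# Hedged sketch of the YFC sampling: a novel is cut into consecutive chunks
# of at least ~5,000 words, splitting only at (naively detected) sentence
# boundaries, after which four chunks are drawn at random per book.
import random
import re

def sample_novel(text, n_samples=4, target_words=5000, seed=42):
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current, count = [], [], 0
    for sentence in sentences:
        current.append(sentence)
        count += len(sentence.split())
        if count >= target_words:             # cut only once the target is
            chunks.append(" ".join(current))  # reached, hence varying lengths
            current, count = [], 0
    if current:                               # keep any shorter final remainder
        chunks.append(" ".join(current))
    random.seed(seed)
    return random.sample(chunks, min(n_samples, len(chunks)))
```

Cutting only at sentence boundaries explains the "slightly varying word counts" mentioned above: each chunk runs past 5,000 words until the current sentence ends.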

Informative texts for teens corpus (ITTC)
The Informative Texts for Teens Corpus (ITTC) was built by first retrieving over 10,000 texts from 14 popular web domains of news and information specially targeted at English-speaking teenagers. Care was taken to include a broad range of topics, including current affairs, science, technology, history, and entertainment (Le Foll in preparation). Of these, 4,895 text files were under 400 words and were thus discarded for the MDA. Following a stratified sampling approach, 100 texts from each web domain were then randomly selected. This number was chosen to approximately match the number of texts in the other two reference corpora. Fewer than 100 texts longer than 400 words were retrieved from two domains; for these, the full domain datasets were retained. The final selection thus consisted of 1,414 text files (see Table 3).
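The stratified sampling described above can be sketched as follows; the domain names in the usage example are invented placeholders, not the web domains actually sampled for the ITTC.

```python
# Illustrative sketch of the ITTC sampling: per web domain, up to 100 texts
# of at least 400 words are drawn at random; domains with fewer than 100
# eligible texts are retained in full. Domain names are placeholders.
import random

def stratified_sample(texts_by_domain, per_domain=100, min_words=400, seed=1):
    random.seed(seed)
    selection = []
    for domain in sorted(texts_by_domain):
        eligible = [t for t in texts_by_domain[domain]
                    if len(t.split()) >= min_words]
        selection.extend(random.sample(eligible,
                                       min(per_domain, len(eligible))))
    return selection
```

Stratifying by domain ensures that no single, text-heavy website dominates the reference corpus.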

Comparative additive MDA
For reasons of space, this paper focuses on register variation in secondary school EFL textbooks along Biber's (1988) first dimension, 'Involved vs. Informational Production'. With its 23 features that contribute to higher dimension scores (positive loadings) and six that contribute to lower scores (negative loadings) (see Table 8), this dimension is the most powerful predictor of register variation in Biber's corpus of general English: register differences account for 84% of the variance in Dimension 1 scores (Biber 1988: 126-127). It has since proven to be a stable and robust baseline for additive MDAs across a wide range of domains (cf. Egbert & Mahlberg 2020: 82). Furthermore, this dimension's 'involved/oral/verbal' vs. 'informational/literate/nominal' opposition has, for a range of languages and domains, almost universally emerged as the strongest and most stable predictor of variation in full MDAs post-1988 (Biber 2014).

Tagging and counting linguistic features
To conduct an additive MDA using Biber's (1988) original model as "a base-rate knowledge of English" (Nini 2019: 70), it is necessary to tag and count exactly the same 67 features used in Biber's original study. This was achieved using the Multidimensional Analysis Tagger (hereafter MAT; Nini 2014, 2019): a freely available programme that aims to replicate the original Biber Tagger. It tags all 67 lexical, grammatical and semantic features using the regular expressions described in Biber (1988: 211-245), and normalises all feature frequencies to the number of occurrences per 100 words. The validity and reliability of the MAT as compared to the Biber Tagger have been demonstrated by Nini (2019: 92).
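The normalisation step the MAT performs is a simple rate conversion, which can be sketched as:

```python
# Minimal sketch of the MAT's normalisation step: raw feature counts are
# converted to occurrences per 100 words of running text.
def per_100_words(raw_count, text_length):
    return raw_count / text_length * 100

print(per_100_words(50, 200))  # 25.0 occurrences per 100 words
```

Normalising to a common base makes feature frequencies comparable across texts of different lengths.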

Computing the mean dimension scores for the new registers
To compute dimension scores, normalised counts must be standardised to prevent frequent features from having a disproportionate influence on the model. In an additive MDA, however, z-scores are not calculated on the basis of the features' means and standard deviations in the corpora under study, but rather in the original corpus from which the baseline model was derived. Consequently, texts whose normalised count for any one variable is equal to the variable mean in Biber's corpus (1988: 77) have a z-score of 0. Positive z-scores indicate that a feature occurs more frequently than on average across Biber's corpus, whilst negative z-scores indicate below-average normalised counts. Finally, to compute the dimension scores of the new texts, the z-scores of the features with positive loadings are added and those of the features with negative loadings are subtracted. The standardisation step and the computation of the dimension scores were also performed using the MAT.
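The additive scoring logic can be sketched as follows. The feature norms and loadings below are invented placeholders for illustration only; they are not Biber's (1988) published values, and a real implementation would cover all 67 features.

```python
# Hedged sketch of additive dimension-score computation. Z-scores are based
# on feature means and standard deviations from Biber's (1988) corpus, not
# on the corpora under study; the dimension score then sums the z-scores of
# positively loading features and subtracts those of negatively loading ones.
# NOTE: the norms and loadings below are invented placeholders.
BIBER_NORMS = {                    # feature: (mean, sd), per 100 words
    "private_verbs": (1.8, 1.0),
    "present_tense": (7.7, 2.0),
    "nouns": (18.0, 3.0),
}
POSITIVE_FEATURES = {"private_verbs", "present_tense"}
NEGATIVE_FEATURES = {"nouns"}

def dimension_score(normalised_counts):
    def z(feature):
        mean, sd = BIBER_NORMS[feature]
        return (normalised_counts[feature] - mean) / sd
    return (sum(z(f) for f in POSITIVE_FEATURES)
            - sum(z(f) for f in NEGATIVE_FEATURES))
```

Because the norms come from Biber's original corpus, scores for new texts remain directly comparable to the register scores of his 1988 model.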

Computing dimension scores for additional reference corpora
In theory, conducting an additive MDA makes it possible to compare "new" registers to Biber's "old" general English registers without resorting to any additional reference corpora. However, in this study, three target language reference corpora are also mapped onto Biber's first dimension for comparison with the registers of the TEC. Both theoretical and methodological reasons justify this additional step.
First, although the registers included in Biber's 1988 model undoubtedly provide useful comparison points for EFL textbook registers, any differences observed, say between Biber's fiction registers and the fiction featured in EFL textbooks, could potentially be due to different target readerships. Indeed, the fiction subcorpora of the Lancaster-Oslo/Bergen (LOB) Corpus of British English, on which Biber based his original analysis, predominantly contain samples from literature aimed at an adult readership, rather than secondary school students. Further, the corpora from which Biber's model was derived consist of texts published in 1961 (LOB; Johansson, Leech, & Goodluck 1978) and spoken material recorded between 1953 and 1987 (London-Lund; Svartvik & Quirk 1980). Modern EFL textbooks, however, can reasonably be expected to reflect more recent language change, especially in the conversation register.
Second, whilst Nini (2014, 2019) demonstrated the overall reliability of the MAT, his analyses pointed to minor differences in some feature counts as compared to the original Biber Tagger. Needless to say, the results of dimension score comparisons are more likely to be valid if the exact same method is used to tag and count the features of any corpora to be compared.
Consequently, this additive MDA compares register variation across six Textbook English registers, and additionally compares their Dimension 1 scores to three target language corpora.

Comparing dimension scores
To compare different registers on any one dimension, the mean dimension scores of all the texts in any one register can be compared to each other. Such comparisons have typically been tested and quantified using ANOVAs and coefficients of determination (e.g., Biber 1988: 95; Biber et al. 2004: 64; Gray 2015: 216; Berber Sardinha & Veirano Pinto 2019: 6), or with non-parametric Kruskal-Wallis ANOVAs (Muhammad 2020). More recently, the use of predictive Discriminant Function Analysis (DFA) as a post-hoc analysis has been proposed to verify the robustness of dimensions as predictors of register (e.g., Crossley, Allen, & McNamara 2014; Crossley, Kyle, & Römer 2019; Veirano Pinto 2019). However, a crucial assumption of both ANOVAs and DFAs is that the data points be independent of each other (cf. Gries 2015; Winter 2019: chaps 14-15; on the consequences of using DFA on non-independent data, cf. Mundry & Sommer 2007). In the context of the present additive MDA, and, indeed, in many corpus linguistic studies, this assumption is not met: each textbook series has largely been written by the same group of authors, so that texts from within one series are not truly independent. Similarly, the YFC and the ITTC consist of several samples from any one book or web domain (see 2.2.2-2.2.3).
As a result, linear mixed-effects models were computed using the R package lme4 (Bates et al. 2015). First, register variation within Textbook English is modelled on Biber's Dimension 1. To estimate the relationships between textbook register and Dimension 1 scores, a model was fitted with a random effect structure consisting of by-series varying intercepts and by-series varying slopes for each register to account for the non-independence of texts from within one textbook series. Dimension 1 scores are the outcome variable. Textbook register and textbook level are modelled as fixed-effect predictors. In addition, their two-way interaction term is also fitted, since it can be hypothesised that, as the proficiency of learners increases, the dimension scores of textbook texts within a register may move closer to their target language equivalents. For instance, upper-intermediate fictional texts from textbooks may be more like teenage/young adult fiction than a short story printed in a beginner textbook. If this were true, we would expect Dimension 1 scores for some registers to increase as learners are expected to become more proficient, whilst they may decrease for others.
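The specification of this first model can be written out as follows. The notation is added here for clarity; the (treatment-coded) categorical predictors register and level are abstracted into single terms, with i indexing texts and j indexing textbook series.

```latex
\mathrm{Dim1}_{ij} = \beta_0
  + \beta_1\,\mathrm{register}_{ij}
  + \beta_2\,\mathrm{level}_{ij}
  + \beta_3\,(\mathrm{register} \times \mathrm{level})_{ij}
  + u_{0j} + u_{1j}\,\mathrm{register}_{ij}
  + \varepsilon_{ij},
\qquad
(u_{0j}, u_{1j}) \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}),
\qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

In lme4 syntax, this corresponds to a formula of the form `Dimension1 ~ Register * Level + (1 + Register | Series)` (the variable names are illustrative).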
To compare the Dimension 1 scores of Textbook Conversation, Fiction and Informative texts with the three corresponding target language reference corpora, a second mixed effects model was computed. In this model, the random effect structure consists of by-source varying intercepts and slopes, where 'source' is a factor variable with nine levels corresponding to the textbook series of the TEC, 300 levels corresponding to the books of the YFC, 14 levels corresponding to the web domains of the ITTC, and one level for the Spoken BNC2014. These levels were chosen as the best available proxies to capture the variation inherent to each (group of) author(s)/editor(s). The fixed effects are corpus type (Textbook vs. Target Language Reference), register (Conversation, Fiction and Informative texts) and their two-way interaction.
For data sparsity reasons, a subset of the data that excluded the textbook register Poetry was used for all statistical modelling since several textbook volumes do not include any poems or songs longer than 400 words that could therefore be entered in the MDA (see 2.1.1).
Model diagnostic plots were inspected to check the assumptions of linearity, homogeneity of variance, and the normal distribution of residuals of the model (i.e., the differences between the observed and fitted values).
In the model summaries, the CI ranges reported are 95% confidence intervals. The R² values reported summarise the predictive power of the fixed effects only (marginal R²) and of both fixed and random effects (conditional R²); they were computed using the R package sjPlot (Lüdecke 2020). The estimates of relative contrast effects between each register under study were calculated using the default parameters of the emmeans package (Lenth 2020). P-value adjustment followed the Tukey method (confidence level = 0.95).

Results and discussion
Section 3.1 explores intra-textbook linguistic variation by comparing six Textbook English registers on Biber's Dimension 1 (RQ1). Large within-register dispersions are further examined and examples of salient features that contribute to strikingly low or high scores are discussed in context. This is followed, in Section 3.2, by a more fine-grained comparison of three key textbook registers (Conversation, Informative texts and Fiction) to three comparable target language corpora (see Section 2.2) with the aim of investigating the extent to which textbook registers differ from similar registers encountered outside the classroom (RQ2). The results of this comparative additive MDA provide answers to RQ3 which seeks to pinpoint the linguistic features which most contribute to these differences. Limitations of the method are discussed throughout the results and summarised in the concluding discussion (see Section 4).
As illustrated in Figure 1, textbook register is clearly a strong predictor of Dimension 1 scores among textbook texts. A simple model featuring only register as a fixed effect and by-series varying intercepts already accounts for some 63% of the variance in Dimension 1 scores (marginal R² ≈ 0.63, conditional R² ≈ 0.66). Although model comparisons revealed that the proficiency level of textbooks is also a significant predictor of Dimension 1 scores (χ²(4) = 52.27, p < 0.001, as compared to the baseline model), its predictive power is very weak (marginal R² ≈ 0.03, conditional R² ≈ 0.08). We can thus conclude that text register within textbooks is a much stronger driver of linguistic variation than the proficiency levels the textbooks are designed for. The full model for intra-textbook variation along Dimension 1 is summarised in Table 4. It is a fairly good predictor of Dimension 1 scores, with a predictive power of 65% with fixed predictors only, and 71% with both fixed and random effects. Figure 2 presents a visualisation of the model summarised in Table 4. In addition to providing a visualisation of the model fit, Figure 2 also serves as a reminder of the categories for which there is only sparse or no data: e.g., there are few Personal Correspondence texts, the textbook series Piece of Cake (POC) and Solutions only go as far as Level D, and some series feature very few or no Fiction texts at certain levels (see 2.1.1).
With Textbook Conversation scoring highest and Textbook Informative at the bottom of the scale, the distribution of scores on this first dimension echoes Biber's original as well as subsequent additive MDAs. The results indicate that textbook authors do make different, register-based linguistic choices when crafting the texts of secondary school EFL textbooks. Indeed, Table 5 shows that the register means for Dimension 1 are all significantly different from each other (p < .001), except for the Informative-Instructional, Conversation-Personal Correspondence, and Fiction-Personal Correspondence contrasts (as illustrated in Figure 2, the latter two are likely due to the fact that there are relatively few Personal Correspondence texts in the TEC). Thus, these results confirm the need to examine textbook language under the lens of register. Indeed, textbook register appears to have a much larger impact on the choice and frequencies of linguistic features of the texts featured in textbooks than the proficiency level of the textbook, or the linguistic idiosyncrasies of its authors (as, admittedly imperfectly, captured in the textbook series variable).

The specificities of textbook English registers
Having examined the extent of register variation within school EFL textbooks, this section compares three major textbook registers (Conversation, Fiction and Informative texts) with comparable target language reference corpora (see Section 2.2) on Biber's Dimension 1. The distribution of scores, as calculated with the MAT, is illustrated in Figure 3. Although Textbook Conversation scored highest among the textbook registers, the Spoken BNC2014 displays considerably higher scores than Textbook Conversation (Textbook Conversation: x̄ = 15.75, SD = 7.89; Spoken BNC2014: x̄ = 26.02, SD = 4.04). Crucially, this difference is, in fact, even greater because the results plotted in Figure 3 correspond to the unaltered MAT output, in which the Dimension 1 scores of the Spoken BNC2014 are artificially deflated by the absence of punctuation marks in its transcriptions. Indeed, the Biber Tagger and, as its faithful "copy", the MAT require the presence of punctuation marks and/or prosodic boundary markers to identify five of the 22 features with positive loadings on Biber's Dimension 1: stranded prepositions, discourse particles, non-phrasal clause coordination, sentence relatives and direct WH-questions (Biber 1988: Appendix II). The transcription scheme of the Spoken BNC2014, however, does not include any punctuation marks other than question marks (Love, Hawtin, & Hardie 2018: 37-38). Thus, for example, following the operationalisation of the discourse particle variable used in Biber's original MDA, only discourse particles preceded by a punctuation mark are tagged and counted.
Consequently, the five aforementioned features that rely on punctuation and/or prosodic boundary markers had to be excluded from the Dimension 1 scores of the Spoken BNC2014 and, for comparability, also from those of Textbook Conversation. This means that, in this particular case, it is not possible to apply Biber's (1988) model one-to-one and, consequently, to rely solely on Nini's (2014) MAT tool to compare Textbook Conversation with transcriptions of authentic conversation, unless the latter include punctuation marks. To bypass this limitation, adjusted scores were calculated in R by adding the z-scores (which the MAT helpfully outputs as a tab-separated file) of all the unproblematic features with positive loadings and subtracting those of the features with negative loadings.
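This adjustment can be sketched as follows. The snippet below is a simplified Python analogue of the R computation described above: the feature abbreviations follow Biber's (1988)/the MAT's conventions, but the z-score values and the shortened feature lists are invented for demonstration purposes.

```python
import pandas as pd

# Hypothetical MAT-style z-score output (one row per text); values invented.
z = pd.DataFrame({
    "PRIV": [1.2, -0.3],   # private verbs (positive loading on Dim 1)
    "CONT": [0.8, -0.5],   # contractions (positive loading)
    "XX0":  [0.5, 0.1],    # analytic negation (positive loading)
    "NN":   [-0.9, 1.4],   # nouns (negative loading)
    "AWL":  [-1.1, 1.0],   # average word length (negative loading)
})

# The punctuation-dependent features (stranded prepositions, discourse
# particles, etc.) are simply left out of these lists, yielding the
# "adjusted" Dimension 1 score used for the Textbook Conversation vs.
# Spoken BNC2014 comparison.
positive = ["PRIV", "CONT", "XX0"]
negative = ["NN", "AWL"]

adjusted_dim1 = z[positive].sum(axis=1) - z[negative].sum(axis=1)
```

In the study itself, this summation was of course performed over all of the remaining Dimension 1 features, not the five shown here.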
The model summarised in Table 6 takes these adjusted, comparable Dimension 1 scores as the outcome variable for the Textbook Conversation and Spoken BNC2014 corpora. The model's reference levels are Corpus [Textbook] and Register [Conversation]. Hence, Table 6 shows that the estimated Dimension 1 score for naturally-occurring conversation is 16.01 points higher than that estimated for Textbook Conversation (the intercept, 14.65), i.e., 30.66. The estimated score for the ITTC is −7.5, i.e., 22.15 points lower than the intercept.

Textbook conversation
As illustrated in Figure 4, the exclusion of the features that rely on punctuation for their operationalisation further widens the gap between naturally-occurring conversation and textbook dialogues (see Table 7). Table 8 sheds light on the linguistic features which most contribute to these strikingly low Dimension 1 scores for Textbook Conversation. All the features listed in Table 8, except amplifiers, possibility modals, second person pronouns and indefinite pronouns, contribute to textbook dialogues obtaining lower scores on this dimension.
Note (Table 8). Features with positive loadings in red, with negative loadings in blue. Significance testing was performed with independent two-tailed Wilcoxon tests (* = p < .001 after Holm correction).

As compared to the Spoken BNC2014, the greatest underuses in Textbook Conversation are observed in the frequencies of hedges (e.g., sort of), that-deletion (marked [THATD] in the example below) and the pronoun it. Furthermore, WH-clauses (e.g., do you know what I mean), causatives (e.g., because, cos), DO as a main verb, emphatics (e.g., just, really), analytic negation, contractions, demonstrative pronouns and private verbs (e.g., THINK, KNOW, BELIEVE, SEE, MEAN) are also considerably more frequent in naturally-occurring conversation (e.g., Excerpt (1)) than in textbook representations thereof (e.g., Excerpt (2)).
(1) it's the the erm whatever you call it greenfly yes it's er s that sort of greenflies yes it's it's erm something from the greenflies I think rather than it's not the tree itself it's the fact that it's the aphids erm producing something do you think they drink too drink too much of this and it makes them ill? I think [THATD] they go they go too too mad on the on the sap and it just produces all this sticky goo oh gosh I didn't know <BNC2014: SRWD>

Nouns, on the other hand, appear to be considerably overrepresented in pedagogical dialogues (as in Excerpt (2)). These high noun counts correlate positively with high frequencies of prepositional phrases, attributive adjectives, higher type/token ratios and longer words, all of which weigh negatively on this dimension. These features, together with relatively low frequencies of the features with positive loadings discussed above, frequently make textbook dialogues sound like rather unlikely transcripts of real-life conversations, e.g.:

(2) Man: Is that your favourite British dish?
Woman: Well, I like roast beef a lot. But my real favourite is waking up in the morning to the smell of a full English breakfast. Or Welsh breakfast, or the full Irish breakfast. Or the Ulster fry. Or the Scottish breakfast. Eggs, bacon and lots of other tasty things. It's more or less the same wherever you go in the British Isles. It's just the name that changes.

Man: Is that what you have for breakfast every day?
Woman: Well, not every day, but sometimes at weekends. And of course, at hotels you can usually have the full cooked breakfast if you like. Tastes great with a nice cup of tea. By the way, did you know that people in the British Isles drink around three kilos of tea every year?

Man: Three kilos?
Woman: Yes, that's over ten times as much tea as people in Germany drink.
Can you pass the milk and sugar, please? <TEC: Access G 3>
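The per-feature comparisons underlying Table 8 rest on independent two-tailed Wilcoxon (rank-sum) tests with Holm correction. A minimal Python sketch of this testing procedure is given below; the per-text feature frequencies are simulated for illustration and do not reproduce the study's data:

```python
import numpy as np
from scipy.stats import ranksums
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)
# Invented normalised per-text frequencies for three Dimension 1 features
# in textbook dialogues (first array) vs. naturally-occurring conversation.
features = {
    "hedges":        (rng.normal(1.0, 0.5, 100), rng.normal(4.0, 1.0, 100)),
    "that_deletion": (rng.normal(2.0, 1.0, 100), rng.normal(6.0, 1.5, 100)),
    "nouns":         (rng.normal(250, 30, 100), rng.normal(180, 25, 100)),
}

# Two-tailed Wilcoxon rank-sum test per feature, then Holm correction
pvals = [ranksums(textbook, reference).pvalue
         for textbook, reference in features.values()]
reject, p_holm, _, _ = multipletests(pvals, alpha=0.001, method="holm")
```

With effect sizes as large as those simulated here, all three contrasts survive the Holm-corrected p < .001 threshold, as most of the Table 8 contrasts do in the study.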
By contrast, textbook conversations with comparatively high Dimension 1 scores feature more verbal features, such as present tense forms, contractions, negation, first and second person and it pronouns, as well as higher normalised counts of discourse markers, amplifiers, hedges, direct WH-questions and stranded prepositions than the majority of textbook dialogues, e.g.:

(3) Jack: Lily, there's no way I'm going to recognise a model, it doesn't matter how famous she is. But I tell you what - I bet it isn't her. What's a famous model going to be doing in a shopping mall in our town?

The model summarised in Table 4 does not lend support to the hypothesis that the dialogues featured in more advanced textbooks have higher, hence more authentic-like, Dimension 1 scores. In fact, some of the Level A textbook dialogues score comparatively high on Dimension 1 owing to their restricted vocabulary, shorter utterances and frequent turns, leading to lower type/token and higher verb/noun ratios (e.g., Excerpt (4)). By contrast, many of the texts intended to represent spoken interactions in more advanced textbooks are characterised by a much more nominal style with high informational density, thus featuring high type/token ratios, many prepositions and longer words (e.g., Excerpt (5)).

(4) Lucy: So?
Sam: I'm at Plymstock school too.

(5) They were in dire straits and wanted to escape poverty. They had to take care of themselves. They worked hard, and slowly they got richer and managed to build a new life. They saw the US as a land of freedom and opportunity, where everyone could work hard and be successful. <TEC: Piece of Cake 3e>

Textbook informative texts
In contrast to Textbook Conversation, which appears to be considerably less "oral" than the Spoken BNC2014 data, Informative texts in EFL textbooks tend to score higher on Biber's (1988) first dimension than the Informative Texts for Teens Corpus (ITTC) (estimated x̄ difference = 6.14, p = 0.002). The features which most contribute to this mean difference are first and second person pronouns, DO as a main verb, contractions and amplifiers. The prevalence of these features reflects the often informal, "chatty" tone of the Informative texts featured in school EFL textbooks, e.g.:

(6) So how can you help yourself to remember things better in the long term?
Well, there are several things you can do. One of them is to make sure you pay attention and take in the information properly in the first place. Others are to do with the effort you make to remember it afterwards.
[…] Don't wait to revise until exam time - by then it's too late! Although the human brain is amazingly powerful, most people only use a tiny amount of its power. The brain is like a muscle. If you don't exercise it, it loses its strength and deteriorates. If you want to develop and improve your mind and make the most of it, you need to do regular mental exercises. In spite of all our potential brain power, we can easily forget 80% of what we learn in hours unless we make a special attempt to remember it. <TEC: Achievers B2>
The text from which Excerpt (6) was extracted corresponds to the mean Dimension 1 score of the Textbook Informative subcorpus. By way of comparison, Excerpt (7) scores around the mean score of the ITTC. The latter is characterised by more nouns, prepositions, attributive adjectives and longer words.
(7) Ayanna Pressley has won her election, making her the first black woman to represent Massachusetts in the House of Representatives, Boston.com reports. She ran unopposed in Massachusetts's 7th district. Before the polls closed on election day, she urged people on Twitter to vote. "Today, we are powerful. There are only a few hours left to get out the vote. Go #vote for progressive candidates who will fight for equity and justice," she tweeted. "Vote for activist leaders who will work in and with community. Vote, because this is your democracy and your voice matters." <ITTC: teenvogue.com>

In both the textbook and the reference corpus, Informative texts that score lowest on Dimension 1 tend to include bullet point lists and thus feature a high proportion of nominal sentences, as well as many attributive adjectives, a high type/token ratio and longer words, e.g.:

Textbook fiction
In contrast to the two textbook registers discussed above, the difference in mean Dimension 1 scores between Textbook Fiction and the reference Youth Fiction Corpus (YFC) is not significant (estimated x̄ difference = −0.53, SE = 1.71, p = 0.78). Fiction usually consists of alternating narration and fictional speech. Thus, novels with a high proportion of dialogue inevitably score high on Biber's first dimension, whilst those with longer descriptive passages score lower. Indeed, additive MDAs of 19th-century novels have shown large significant differences on Biber's Dimension 1 between narrative passages, which are more associated with features corresponding to the informational end of the scale, and fictional speech, which is more associated with features characteristic of involvement and interaction (Egbert & Mahlberg 2020: 85; cf. Biber & Finegan 1994). These findings imply that this dimension is not best suited to examining the potentially defining characteristics of Textbook Fiction (cf. Le Foll in preparation for comparisons on Biber's (1988) other dimensions). That said, the non-significant difference in Dimension 1 scores for Textbook Fiction and the YFC does suggest that they feature similar proportions of narration and fictional speech.
In addition, the model estimates for the Dimension 1 scores of Textbook English registers listed in Table 4 make clear that the small but significant effect of textbook level on Dimension 1 scores is driven by its interaction with the Fiction register: Textbook Fiction tends towards marginally lower Dimension 1 scores as the proficiency level of the textbooks increases. Though statistically significant, this finding must be approached with caution: not only are the effect sizes very small, but Figure 2 also shows some missing data in the Fiction register. Nonetheless, beginner textbooks tend to feature more dialogue-heavy fictional writing, leading to a greater use of first and second person pronouns, verbal contractions, negation and demonstrative pronouns (see Excerpt (9)), than more advanced teaching materials (Excerpt (10)) or youth fiction novels (Excerpt (11)), which, on average, both feature many more prepositions, nouns and attributive adjectives. Moreover, beginner textbooks that have not yet introduced past tense forms rely on present-tense narration, which also contributes to higher Dimension 1 scores (e.g., Excerpt (9)), in contrast to the narrative texts of more advanced textbooks (e.g., Excerpt (10)) and the majority of novels sampled in the YFC, which largely feature past-tense narration (e.g., Excerpt (11)).
(9) 'Very funny,' Lucy says. 'I think this is just a silly trick. I don't believe a word.' 'A silly trick?' the Time Lord laughs. 'Ha, ha, ha, just look at this, you silly girl!' The lights in the Planetarium flicker again, and on the huge screen, Lucy, Sandy and Asim can see pictures of Greenwich - and it already looks very different. There aren't many old people any more, and children are looking down at clothes that are too big for them. Then they hear the scary voice again.

Conclusion and recommendations
This study has demonstrated that Biber's (1988) model of General English can successfully be used as a baseline to explore register variation within secondary school EFL textbooks. The fact that register explains 63% of the variance observed in Dimension 1 scores across six major registers of the TEC confirms the need to account for register in textbook language studies. Mixed-effects models were used to explore additional factors that could potentially explain some of the variation observed, notably the style of the authors, editors and/or publishers of specific textbook series, as well as the proficiency levels of the textbooks. Compared to register, these were shown to play only a marginal role in mediating textbook language variation (RQ1). The only significant interaction between textbook register and proficiency level was observed in the Fiction register, which is easily explained by the fact that the past tense is not featured in beginner-level textbooks, meaning that these rely on present-tense narration instead, thus leading to higher Dimension 1 scores than narrative texts from more advanced textbooks.
In answer to RQ2, the most striking differences between the textbook and reference registers were observed in the Conversation register: on Biber's (1988) Dimension 1, Textbook Conversation scores considerably lower than the Spoken BNC2014. This is largely due to the much more nominal style of textbook dialogues, which also tend to feature longer speaker turns, longer words and higher type/token ratios. Thus, textbook dialogues appear to primarily function as reinforcers of the vocabulary students are expected to learn, rather than as models of realistic spontaneous spoken interactions. Excluding the features that rely on punctuation for their operationalisations, the most underrepresented Dimension 1 features in Textbook Conversation are hedges, that-deletions, WH-clauses and it pronouns.
On average, the Informative texts of school EFL textbooks were found to be more interactional and spoken-like than the texts featured on informative websites targeted at English-speaking teenagers; they tend to feature considerably more present tense verbs, contractions, and first and second person pronouns.
Textbook Fiction scores closest to its corresponding reference corpus of Youth Fiction novels. Tellingly, the fictional, narrative texts featured in secondary school EFL textbooks are the most likely to be extracts or adaptations of works that were not originally penned for pedagogical purposes, i.e., extracts of original novels or short stories of the kind included in the YFC. In addition, some publishers (e.g., Klett, personal communication) contract experienced fiction authors to write such texts. However, the analysis also made clear that further explorations of this register ought to be conducted on other dimensions of Biber's (1988) model: Dimension 2, 'Narrative vs. Non-narrative Concerns', in particular, may yield more salient results (Le Foll in preparation).
From a methodological point of view, a number of issues in applying Biber's (1988) Dimension 1, 'Involved vs. Informational', to the registers of secondary school EFL textbooks and comparable target language registers have been highlighted. Solutions to overcome issues related to the non-independence of texts from the same textbook series (Section 2.3.4), text length (Section 2.1.1) and the punctuation-dependent operationalisation of some of the features (Section 3.2.1) were discussed and implemented. The latter two issues have made clear that, in spite of the availability and ease of use of the MAT, Biber's (1988) model of spoken and written English cannot be applied to secondary school EFL textbook registers "out of the box". First, we have seen that it requires careful consideration (and coding skills) to extract individual texts from the textbooks, calculate the length of each text and collate shorter texts in order to reach the 400-word threshold needed to calculate the type/token ratio that loads onto Biber's Dimension 1.
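This text preparation step can be sketched as follows. The snippet is a deliberately simplified illustration: the function names are invented, greedy collation of consecutive texts is only one of several plausible strategies, and it assumes (as in Biber 1988) that the type/token ratio is computed over the first 400 words of each text.

```python
def collate(texts, threshold=400):
    """Greedily merge consecutive short texts until each chunk reaches the
    word threshold; leftover words join the last chunk (simplified sketch)."""
    chunks, current = [], []
    for text in texts:
        current.extend(text.split())
        if len(current) >= threshold:
            chunks.append(" ".join(current))
            current = []
    if current:
        if chunks:
            chunks[-1] += " " + " ".join(current)
        else:
            chunks.append(" ".join(current))
    return chunks

def type_token_ratio(text, window=400):
    """TTR over the first `window` words, as in Biber (1988)."""
    tokens = [t.lower() for t in text.split()[:window]]
    return len(set(tokens)) / len(tokens)

sample = ["the cat sat on the mat"] * 100   # toy corpus of very short texts
chunks = collate(sample)
ttr = type_token_ratio(chunks[0])
```

Real pipelines would additionally need to track each chunk's register, series and level metadata so that collated chunks remain homogeneous.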
Second, the fact that Biber's first dimension includes five linguistic features with operationalisations that rely on punctuation is also clearly a limitation for comparing the dialogues of textbooks to naturally occurring conversation. Either one chooses a reference corpus of spoken English that includes punctuation (with all the transcription reliability issues that this implies), or, as was done here, the offending variables must be manually removed and the adjusted dimension scores calculated outside of the MAT. Finally, there is a risk that the results of this multi-feature analysis of Textbook Conversation were skewed because Biber's noun variable aggregates common and proper nouns, and many textbook dialogues include the name of the speaker at the start of every turn (see Excerpts (12) and (13)). This will undoubtedly have inflated the relative frequencies of nouns in these textbook dialogues. Thus, for a more precise investigation of register variation in secondary school EFL textbooks, future projects include conducting a full MDA with more appropriate linguistic features (e.g., excluding some very rare features and adding more salient ones) and feature operationalisations (e.g., removing the need for punctuation and separating proper nouns from the total noun count).
Nonetheless, the relative simplicity of conducting additive MDAs and the availability of the MAT (Nini 2014), which largely automates the process (see Section 2.3.1), has the advantage of making the methodology accessible beyond academia. Given its potential for the evaluation of textbook language, it is hoped that the method may be of interest to textbook authors, editors, publishers and representatives of educational authorities. Though it is by no means claimed that it could or should be used as a unique solution, Biber's (1988) framework has been shown to provide a valuable synthesis of the relative frequencies of many relevant linguistic features that can help to distinguish particularly unnatural-sounding texts from more natural-sounding ones. Since it captures functional variation along an involved/oral vs. informational/literate continuum, Dimension 1 lends itself particularly well to the examination of representations of spoken language. Thus, a high score on Biber's Dimension 1, such as that scored by the dialogue quoted in Excerpt (12) (Dim1 = 38.19 as calculated by the MAT; items that contributed to this high score are in bold), points to a pedagogical text that is likely to paint a more authentic picture of natural conversation than one with a much lower score, e.g., Excerpt (13).

(12) Nick: Yes, I know you always leave it there. And it's always in the way. This is a pretty small place, Amy. So perhaps just for once you could put your backpack somewhere where it isn't in the way, hmm?
Amy: You don't own this place, Nick. So don't try and tell me what to do. I came in early to get some things done. I put my backpack on the floor. You deal with it! <TEC: English in Mind 4>

Thus, textbook dialogues that score particularly low could be flagged as potentially worth re-examining or revising.
For example, Excerpt (13) scored −6.10 on Dimension 1, the result of a considerably higher type/token ratio and longer average word length than most natural conversations, as well as of its many complex nominal phrases, which lead to high relative frequencies of prepositions and attributive adjectives, all of which contribute to negative Dimension 1 scores.

Journalist: Why?
Donna: Why? He has just won four golds and two silver medals and he is a record holder. The dream came true. Incredible. That's why he is nicknamed "the Baltimore Bullet". He symbolises determination, generosity, hope… great values. You see, he's a role model! He will be remembered forever. <TEC: New Mission 2e>

Crucially, whilst it can be said that textbook dialogues such as Excerpt (12) expose learners to interactional, genuinely conversation-like language that they are likely to encounter outside the classroom, texts such as Excerpt (13) cannot be considered realistic models for EFL learners to acquire spontaneous spoken language comprehension and/or production skills. Such texts can, of course, be argued to serve other pedagogical purposes; e.g., the high lexical diversity of Excerpt (13) may be specifically aimed at increasing learners' passive vocabulary range. However, where the aim is to present learners with spontaneous, spoken English, low Dimension 1 scores can act as a helpful warning sign that revision ought to be considered. Inversely, when textbook Informative texts score particularly high on Dimension 1, this is a sign that they are unlikely to be of use as models for students to acquire the skills necessary to write their own informative texts or read for information independently outside the classroom; hence, here too, corpus-informed revisions should be considered. For example, Excerpt (13) could be improved by consulting a corpus of spoken language, such as the Spoken BNC2014 (Love et al. 2017), and adding some of the frequent lexico-grammatical features of spontaneous, interactional speech. The resulting, revised version is likely to include higher relative frequencies of the features that contribute to high scores on Biber's (1988) Dimension 1 (see Table 8). For example, the proposed revised dialogue printed below as Example (14) features more private verbs (e.g.
think, forget), that-deletions, contractions, present tense verbs, first and second person pronouns, analytic negations (didn't he), emphatics (really), causative subordination (because), discourse particles (well, you know), hedges (kind of), sentence relatives, WH-questions, possibility modals, non-phrasal coordination and final prepositions than the original textbook dialogue in (13). As Excerpt (14) shows, such additions also naturally lead to revised dialogues with lower type/token ratios, shorter average word lengths and, in particular, lower noun/verb ratios, all of which likewise contribute to higher Dimension 1 scores.
(14) Journalist: I'm Sally Gordon, reporting from Leicester Square in London and the place is full of sports fans. Let's see who we can talk to. Excuse me, Sir. Can I ask you who's your sports hero?

The present results indicate that textbook dialogues with high Dimension 1 scores are more likely to be appropriate models for EFL learners to acquire the skills necessary to navigate natural conversation. In particular, this includes the competent use of a variety of fluency-enhancing strategies to overcome planning phases and manage turn-taking in spontaneous conversation. Previous learner corpus research has shown that EFL learners significantly underuse discourse and vagueness markers as compared to native speakers and, instead, tend to rely more on filled and unfilled pauses and/or a very limited set of such markers (e.g., Müller 2005; Götz 2013; Gilquin 2016; Dumont 2018). It has already been suggested that this oft-observed underuse of discourse markers in learner speech "might stem from the fact that an explicit teaching of discourse markers as a fluency-enhancing strategy has not been systematically integrated into EFL textbooks" (Wolk, Götz, & Jäschke 2020: 4; cf. Römer 2005; Gilquin 2016). The results outlined in Section 3.2.1, in which the dialogues of 43 secondary school EFL textbooks were compared to transcriptions of naturally occurring native-speaker conversations along Biber's (1988) Dimension 1, lend support to this hypothesis.

To conclude, this paper has demonstrated that textbook authors, editors, publishers and educational authorities may want to consider applying additive MDA as part of a wide range of methods for textbook evaluation and revision purposes. However, given the limitations highlighted above, further research is needed to arrive at a comprehensive model of the linguistic specificities of the different registers of secondary school EFL textbooks as compared to situationally similar target language registers.
Nonetheless, this preliminary study based on the first dimension of Biber's (1988) model has confirmed that Textbook English cannot be adequately modelled without considering register-based linguistic variation. It has also shown that robust statistical methods must be employed to additionally account for any linguistic variation inherent to the proficiency levels of the textbooks, as well as the idiosyncrasies of individual textbook series (and thereby of their authors, editors and/or publishers).
(10) bye, carrying heavy bags or running to catch trains. A very tall man was standing completely still near the exit. Why was he wearing summer clothes in this weather? And why was he looking straight at me? <TEC: Solutions Pre-intermediate>

Personal correspondence
Ally McKoene > WestHigh Bros December 1 near University Heights, IA via mobile Your best feature is definitely your kindness and I'm sure everyone else agrees! You have tons of kindness in your heart and your compliments can light up anyone's face. You guys are some of the kindest people I've met and I'm so glad that