Multiple Correspondence Analysis, Newspaper Discourse and Subregister: A Case Study of Discourses of Islam in the British Press


This article introduces a new method for grouping keywords and examines the extent to which it also allows analysts to explore the interaction of discourse and subregister. It uses a multivariate statistical technique, Multiple Correspondence Analysis (MCA), to reveal dimensions of keywords which co-occur across the texts of a corpus. These dimensions are then interpreted in terms of the discourses to which they contribute within the data, thus forming the basis of a corpus-assisted discourse analysis. The approach is demonstrated through an analysis of the discourses used to represent Muslims and Islam in a corpus of UK national newspaper articles on these topics published between 2010 and 2019. The approach reveals an interaction between discourse and subregister; hence, this article argues that (corpus-assisted) discourse analysts need to account for subregister as a level of meaningful variation when analysing press discourse.

the discourses associated with some object of study (Baker et al., 2013). Yet aggregation is an issue entailed by the use of standard keyword approaches which contrast one dataset against another. The explicit and implicit structure that may be present in each dataset is, effectively, ignored. Where that structure exists explicitly in metadata, it is possible to achieve some degree of disaggregation by conducting multiple comparisons of structured subparts of each corpus. For example, Baker et al. (2013) undertook comparisons of their corpus's subparts (e.g. broadsheets and tabloids) to try to disaggregate their keyword results. That subdivision was enabled by metadata. Their analyses showed that individual keywords could relate to numerous discourses.
Those discourses were aggregated in the set of keywords, but they represented an important, implicit structuring of the data which analysts must disaggregate through close reading and the use of other corpus-based tools. Attempts to identify discrete discourses through clustering, specifically topic modelling, have only demonstrated that topic modelling is not fit for the purpose of discourse analysis (Brookes and McEnery, 2019). Another persistent issue with keyword studies is their focus on presence rather than absence, yet absence can be as meaningful as presence in discourse analysis (Schroeter and Taylor, 2018), and patterns of presence and absence across a corpus may meaningfully interact (Partington, 2014).
Our approach, keyword co-occurrence, largely addresses the issues of aggregation and absence. This new method groups keywords based on their co-occurrence across the texts of a corpus, with each resulting set of keywords representing a distinct pattern of co-variation. The method is grounded in the notion of linguistic co-occurrence: frequent patterns of co-occurring linguistic features tend to have at least one underlying communicative function (Biber, 1988). Linguistic co-occurrence informs Multi-Dimensional Analysis (MDA) (Biber, 1988) and short-text MDA (Clarke, 2019), both of which identify sets of lexical and grammatical features that frequently co-occur across the texts of a corpus. Standard MDA measures the relative frequencies of lexico-grammatical features and subjects these to a multivariate statistical technique called factor analysis (Biber, 1988). Factor analysis identifies patterns across numerous measured variables which can be explained in terms of latent, or underlying, constructs.
However, standard MDA was not suitable for our study because of the nature of our data. MDA works with relative frequencies of linguistic features, yet the relative frequencies of most grammatical features are typically reliable estimates only in text samples of more than 1,000 words (Biber, 1993). As noted, the overwhelming majority of texts in our corpus are 1,000 words or less. Hence, we turned to short-text MDA, which measures the presence or absence of features across the texts, allowing absence, presence and their relationship to one another to be accounted for. This information is then processed using Multiple Correspondence Analysis (MCA), which identifies and visualises relationships between three or more categorical variables. MCA was popularised by Benzécri (1979), who used it to analyse sociological data from questionnaires, as it can be used to observe relationships between individuals (e.g. people who answered questions similarly or dissimilarly) as well as between variables (i.e. which answers tend to be selected together and which are rarely selected together).
MCA visualises the relationships between individuals and variables in terms of distance, producing two clouds of points: the points in one cloud represent the individuals and the points in the other represent the categories of the variables. The distance between points reflects how similar their distributions are. For example, with Benzécri's questionnaire data, points representing people are closer in the space if they give the same responses to the questions, while points representing responses are closer if they are distributed similarly across the people. So, responses that tend to be selected by the same people lie close together in the space. MCA is used in short-text MDA much as factor analysis is used in standard MDA: to identify the major patterns of linguistic co-occurrence across texts. Conceptually, the method proposed here is similar to short-text MDA. However, rather than analysing lexical and grammatical features, we analyse keywords produced through keyness analysis.
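To make the notion of distance concrete, the sketch below computes the squared distance between two individuals (here, articles) in the standard MCA formulation, d²(i, i') = (1/Q) Σ_k (y_ik - y_i'k)² / f_k, where Q is the number of variables, y_ik is 1 if individual i has category k (else 0), and f_k is the proportion of individuals with category k. It is an illustrative Python sketch, not the authors' implementation, and the toy articles are invented. Because rare categories have small f_k, disagreeing on a rare category pushes two points further apart than disagreeing on a common one.

```python
from typing import Dict, List


def mca_distance_sq(i: int, j: int, rows: List[Dict[str, str]]) -> float:
    """Squared MCA distance between individuals i and j.

    rows holds one dict per individual, mapping each variable to the
    category chosen (e.g. "P" for present, "A" for absent).
    """
    n = len(rows)
    variables = list(rows[0])
    q = len(variables)
    total = 0.0
    for var in variables:
        a, b = rows[i][var], rows[j][var]
        if a == b:
            continue  # identical indicator columns for this variable
        # The two individuals differ on two indicator columns (a and b);
        # each adds (1 - 0)^2 / f_k, where f_k is that category's share.
        f_a = sum(1 for r in rows if r[var] == a) / n
        f_b = sum(1 for r in rows if r[var] == b) / n
        total += 1 / f_a + 1 / f_b
    return total / q


# Hypothetical presence (P) / absence (A) data for four articles.
articles = [
    {"army": "P", "bomb": "P", "love": "A"},
    {"army": "P", "bomb": "P", "love": "A"},
    {"army": "A", "bomb": "A", "love": "P"},
    {"army": "P", "bomb": "A", "love": "A"},
]
print(mca_distance_sq(0, 1, articles))  # 0.0: identical keyword profiles
print(mca_distance_sq(0, 3, articles))  # differ only on bomb
print(mca_distance_sq(0, 2, articles))  # differ on every keyword
```

Article 0 ends up much closer to article 3 (one shared difference) than to article 2, mirroring how MCA places texts with similar keyword profiles near one another.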
Given the central role that keywords have in our study, we now summarise and contextualise the rationale for our methodological choices. We describe how these were operationalised and make some initial observations about the limitations of the approach taken, a theme to which we return in Section 6. This paper is part of a broader project examining the representation of Islam and Muslims in the UK press over time. The project has two aims. The first is addressed in this paper: we wish to see whether the MCA approach can identify the dominant discourses of Islam and Muslims through keywords, according to their co-occurrence across the texts of the corpus, and to assess whether this approach confirms, challenges or further illuminates the findings of Baker and McEnery (2019). The second aim, addressed in Clarke et al. (forthcoming), builds upon the current paper by using the approach introduced here to track changes in press representations of Islam and Muslims over time.
Hence, our broader project constrains our keyword extraction approach. Because we wish to match Baker and McEnery's (2019) study closely in order to assess whether the MCA approach confirms their findings, we needed to extract keywords in a similar way. So, following Baker and McEnery (2019), we used log-likelihood (Dunning, 1993) as our keyword statistic, discarding keywords that did not have a log-likelihood value of 3.84 or above (ensuring that our keywords had p < 0.05). To prepare the data for the MCA approach in this study, we then eliminated keywords which did not occur in at least 5% of texts in the target corpus (an indicative dispersion threshold that any candidate keyword must pass) and reduced the keyword list further by applying an upper bound for dispersion (keywords must not occur in more than 95% of texts).
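The filtering steps just described can be sketched in Python as follows. This is not the authors' code: the log-likelihood follows the widely used two-corpus formulation (expected frequencies derived from pooled counts), texts are assumed to be pre-tokenised lists of lowercase words, and keeping only words relatively more frequent in the target than in the reference (positive keywords) is our assumption, since the text does not say whether negative keywords were retained.

```python
import math
from collections import Counter
from typing import List, Set

LL_CRITICAL = 3.84  # critical value for p < 0.05 with 1 degree of freedom


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood (G2) for a word occurring a times in a target corpus of
    c tokens and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def extract_keywords(target: List[List[str]], reference: List[List[str]],
                     min_disp: float = 0.05, max_disp: float = 0.95) -> Set[str]:
    """Keywords of target vs reference, filtered by LL and text dispersion."""
    tgt = Counter(tok for text in target for tok in text)
    ref = Counter(tok for text in reference for tok in text)
    c, d = sum(tgt.values()), sum(ref.values())
    # Number of target texts containing each word (document frequency).
    doc_freq = Counter(tok for text in target for tok in set(text))
    keywords = set()
    for word, a in tgt.items():
        b = ref[word]
        if a / c <= b / d:
            continue  # assumption: keep over-represented (positive) keywords only
        if log_likelihood(a, b, c, d) < LL_CRITICAL:
            continue
        share = doc_freq[word] / len(target)
        if min_disp <= share <= max_disp:
            keywords.add(word)
    return keywords


# Toy data: "islam" is key but occurs in 100% of target texts, so it fails
# the 95% upper bound; "bomb" is key and occurs in 50% of target texts.
target = [["bomb", "the", "islam"]] * 10 + [["the", "cat", "islam"]] * 10
reference = [["the", "cat"]] * 20
print(extract_keywords(target, reference))  # {'bomb'}
```

The two dispersion bounds operate on the share of target texts containing a word, not on its token frequency, which is what lets them screen out both clumped and near-ubiquitous words.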
One innovation we introduced relates to granularity: the extraction of keywords in Baker and McEnery (2019) was achieved at a very coarse level of granularity by contrasting two time periods, i.e. 1998-2009 (Baker et al., 2013) and 2010-2014 (Baker and McEnery, 2019). A consequence of this is that, without the 1998-2009 corpus, we could not compute exactly the same keywords as Baker and McEnery (2019). We could not simply compare the 2010-2014 corpus with our 2015-2019 corpus, as that would only enable us to achieve the broader project's second aim and not the assessment of the MCA approach. So, to approximate the keywords from the 2019 study closely, we divided both corpora by year, using each year's sub-corpus as the reference corpus for the following year's sub-corpus, which served as the target corpus. For example, to obtain keywords for 2016, we compared the 2016 sub-corpus against the 2015 one. By dividing the corpus into yearly sub-corpora, we were, to the best of our ability, able to assess the aboutness of the 2010-2014 corpus; however, this aboutness is relative to each previous year within that corpus rather than to articles published between 1998 and 2009. When compared with the keywords from Baker and McEnery (2019), we found many of the same keywords, but there are also discrepancies, some of which, we accept, are likely to be the result of this approach.
The consolidated keyword list includes 567 items (see Appendix I). By combining the keyword lists into a single list, we investigate how all the keywords co-occur across all the texts in the corpus, rather than only a subset of the keywords in a subset of the corpus. We appreciate that some may see the merging of the lists as implying that we are treating a sub-corpus's keyword list as representing the whole corpus. Rather, we are treating this as a possibility to be tested instead of disregarding it. Overall, our merged keyword list represents a set of variables computed from the corpus which reflect the aboutness of particular years. We seek to uncover patterns of variation in the corpus according to these variables. As with any study investigating patterns of variation amongst variables, the approach is limited by which variables are included. This keyword extraction approach ignores words whose frequency is stable across all the years, as these will not be identified as keywords. Additionally, different keyword extraction techniques would likely produce somewhat different results.
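The year-on-year design and the merging of the resulting lists can be sketched as below. The extractor is passed in as a function, so any keyness procedure (for instance one implementing the log-likelihood and dispersion filters described above) can be plugged in; the `toy_extract` function is hypothetical and simply returns words absent from the reference year. In this sketch the earliest year serves only as a reference corpus; how the first year of the data was handled is not stated, so treat that as an assumption.

```python
from typing import Callable, Dict, List, Set, Tuple

Texts = List[List[str]]  # each text is a list of tokens


def yearly_keywords(texts_by_year: Dict[int, Texts],
                    extract: Callable[[Texts, Texts], Set[str]]
                    ) -> Tuple[Dict[int, Set[str]], Set[str]]:
    """Compare each year (target) with the previous year (reference) and
    return the per-year keyword lists plus their union."""
    years = sorted(texts_by_year)
    per_year: Dict[int, Set[str]] = {}
    for prev, curr in zip(years, years[1:]):
        per_year[curr] = extract(texts_by_year[curr], texts_by_year[prev])
    merged = set().union(*per_year.values())
    return per_year, merged


def toy_extract(target: Texts, reference: Texts) -> Set[str]:
    """Hypothetical stand-in for a real keyness test."""
    ref_vocab = {tok for text in reference for tok in text}
    return {tok for text in target for tok in text if tok not in ref_vocab}


corpus = {2015: [["vote", "eu"]], 2016: [["vote", "brexit"]], 2017: [["brexit", "trump"]]}
per_year, merged = yearly_keywords(corpus, toy_extract)
print(per_year)        # {2016: {'brexit'}, 2017: {'trump'}}
print(sorted(merged))  # ['brexit', 'trump']
```

Taking the union means a word that is key in only one year still becomes a variable for every article, which is what allows its co-occurrence to be examined across the whole corpus.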
Consequently, the approach taken here is constrained by the project's broader aims and the previous study. Future research could explore different approaches, contrasting and comparing the results.
Having merged the keyword lists, we recorded the presence or absence of each keyword in each article using a Perl program which ran through each file in the corpus and noted, in a data matrix, whether each keyword was present or absent in that file. Table 1 is an excerpt of the data matrix. Each row is an article in the corpus, each column represents a keyword, and each cell records whether the given keyword is present or absent in the corresponding newspaper article. Metadata for each article was added to the data matrix, including the publication date, newspaper name, and article length (in word tokens). In the third step, this data matrix was subjected to MCA using the 'FactoMineR' package (Lê et al., 2008) in R, with the keywords as active variables and the metadata as supplementary variables. This produced a series of dimensions representing the most common patterns of co-occurring keywords across the texts and indicated the association of the newspapers with the dimensions.
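The matrix-building step was done in Perl; the following is an equivalent sketch in Python, for illustration only. It assumes a simple lowercase tokeniser (the regular expression is hypothetical), matches multi-word keywords such as white house as consecutive tokens, codes each cell as P or A, and writes a CSV that could then be read into R for the MCA. Article length here is the number of tokens produced by that tokeniser, which may differ from the word counts used in the study.

```python
import csv
import io
import re
from typing import Dict, List

TOKEN_RE = re.compile(r"[a-z0-9']+")  # hypothetical tokeniser


def contains(tokens: List[str], keyword: str) -> bool:
    """True if the (possibly multi-word) keyword occurs as consecutive tokens."""
    parts = keyword.lower().split()
    n = len(parts)
    return any(tokens[i:i + n] == parts for i in range(len(tokens) - n + 1))


def build_matrix(articles: List[Dict[str, str]], keywords: List[str]) -> List[Dict[str, str]]:
    """One row per article: metadata columns, then P/A for each keyword."""
    rows = []
    for art in articles:
        tokens = TOKEN_RE.findall(art["text"].lower())
        row = {"newspaper": art["newspaper"], "date": art["date"],
               "length": str(len(tokens))}
        for kw in keywords:
            row[kw] = "P" if contains(tokens, kw) else "A"
        rows.append(row)
    return rows


def to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialise the matrix with a header row, ready for import into R."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


articles = [
    {"text": "Police said the army left the White House.",
     "newspaper": "Guardian", "date": "2016-03-25"},
    {"text": "Love and hope.", "newspaper": "Sun", "date": "2016-03-26"},
]
matrix = build_matrix(articles, ["army", "white house", "love"])
print(to_csv(matrix))
```

Coding cells as the strings P and A, rather than 1 and 0, keeps each keyword a categorical variable with two categories, which is the form MCA expects.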
MCA shows this by assigning contributions and coordinates to each category of a keyword (presence _P and absence _A) for each dimension. For example, Table 2 presents the coordinates and contributions for the categories (presence and absence) of the keyword army for Dimensions 1, 2 and 3. Of these three dimensions, the presence of army contributes most to Dimension 2. (2010), we only interpreted the categories of keywords contributing above the average contribution, as these represent the most distinguishing patterns of variation.
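The interpretation criterion can be sketched as follows. For a given dimension, the contributions of all active categories sum to 100%, so the average contribution is 100/K, where K is the number of active categories (with 567 binary keywords, K = 1,134). The function keeps the categories above that average and splits them into the two poles by the sign of their coordinate. The input values are invented for illustration and are not those in Table 2; in practice they would come from the MCA output (to our understanding, the category coordinates and contributions that FactoMineR reports for active variables).

```python
from typing import Dict, List, Tuple


def salient_categories(contrib: Dict[str, float], coord: Dict[str, float]
                       ) -> Tuple[List[str], List[str]]:
    """Split above-average categories on one dimension into its two poles.

    contrib: category -> contribution (%) to the dimension, summing to 100.
    coord:   category -> coordinate on the same dimension.
    """
    threshold = 100 / len(contrib)  # average contribution, 100 / K
    salient = [c for c in contrib if contrib[c] > threshold]
    salient.sort(key=lambda c: contrib[c], reverse=True)  # strongest first
    positive = [c for c in salient if coord[c] > 0]
    negative = [c for c in salient if coord[c] < 0]
    return positive, negative


# Invented values for four categories (K = 4, so the threshold is 25%).
contrib = {"army_P": 40.0, "army_A": 5.0, "love_P": 30.0, "love_A": 25.0}
coord = {"army_P": 1.2, "army_A": -0.1, "love_P": -0.9, "love_A": 0.3}
print(salient_categories(contrib, coord))  # (['army_P'], ['love_P'])
```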
We interpreted each dimension in turn, starting with the first and continuing until we encountered a dimension from which no coherent discourse could be derived. MCA also assigned each article in the corpus a coordinate and contribution for each dimension, revealing which articles were most associated with the keyword co-occurrence patterns captured by the dimensions. To interpret the discourse associated with a dimension, we manually analysed the texts most strongly associated with it. One analyst ( In total, ten dimensions were explored; the tenth dimension was not coherent. Dimension 1 simply distinguished Short vs. Long texts, which is largely a consequence of examining the presence/absence of features (see Clarke, 2019). As this dimension reveals no particular insight into the representation of Islam and does not contribute helpfully to the discussion of subregister, we set it aside here. The keywords associated with the positive and negative sides of each dimension are given in Appendix II.
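Selecting texts for close reading can be sketched as ranking articles by their coordinate on a dimension and taking the most extreme ones on each side. The article identifiers and coordinates below are hypothetical, and the sketch ranks by coordinate only; an analyst might also, or instead, rank by each article's contribution.

```python
from typing import Dict, List, Tuple


def pole_exemplars(coords: Dict[str, float], n: int = 2) -> Tuple[List[str], List[str]]:
    """Return the n most extreme articles on the positive and negative poles."""
    ranked = sorted(coords, key=coords.get)  # most negative first
    negative = [a for a in ranked if coords[a] < 0][:n]
    positive = [a for a in reversed(ranked) if coords[a] > 0][:n]
    return positive, negative


# Hypothetical article coordinates on one dimension.
coords = {"art1": 1.5, "art2": -1.1, "art3": 0.2, "art4": -0.7}
print(pole_exemplars(coords))  # (['art1', 'art3'], ['art2', 'art4'])
```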

Results
We now present Dimensions 2-9, in each case describing the discourse associated with the assemblage of keywords which characterises that dimension. Using Benzécri's modified rates on the eigenvalues (Benzécri, 1992: 412), these dimensions account for 89% of the variance in the data. Our consistent finding is that keywords whose absence was associated with one pole of a dimension tended to have their presence associated with the opposite pole.
So, to avoid repetition, we do not comment on the absences in what follows. Throughout our analyses, we were sensitive to the possibility that a discourse was linked strongly to a subregister within newspapers, and we note whether this was the case in the title of each dimension (link to subregister: yes/no). Finally, in each section we consider the association of the individual newspapers with the dimension. Because newspaper was included as a supplementary variable, MCA produces an overall coordinate for the texts from each newspaper on each dimension, similar to factor scores in factor analysis. Thus, we explore whether, in our corpus, each dimension is general to the newspapers studied or whether there is notable variation between the newspapers. We also comment on whether there is a notable trend in the placement of the newspapers on the dimension with regard to their political leaning or type (i.e. popular 'tabloid' newspapers versus quality 'broadsheet' newspapers). Note that this approach achieves another layer of disaggregation, from an overview of the newspapers to a view of them relative to one another (full results for each dimension are given in Appendix III).
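The 89% figure reported above uses Benzécri's modified rates rather than raw eigenvalue percentages, which in MCA tend to understate how much structure the leading dimensions capture. With Q active variables, only eigenvalues above the mean 1/Q are retained; each is transformed to λ' = (Q/(Q - 1))² (λ - 1/Q)², and the rates are these values as shares of their total. The sketch below is illustrative and the eigenvalues are invented.

```python
from typing import List


def modified_rates(eigenvalues: List[float], q: int) -> List[float]:
    """Benzécri's modified rates for an MCA with q active variables.

    Only eigenvalues above the mean 1/q are kept; each is transformed to
    (q / (q - 1))**2 * (lam - 1/q)**2 and expressed as a share of the total.
    """
    mean = 1 / q
    adjusted = [(q / (q - 1)) ** 2 * (lam - mean) ** 2
                for lam in eigenvalues if lam > mean]
    total = sum(adjusted)
    return [a / total for a in adjusted]


# Invented eigenvalues for an MCA with q = 4 active variables.
rates = modified_rates([0.5, 0.35, 0.25, 0.1], q=4)
print([round(r, 3) for r in rates])  # [0.862, 0.138]
# A cumulative figure for a run of dimensions (e.g. Dimensions 2-9)
# would be the sum of the corresponding slice, such as sum(rates[1:9]).
```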

Dimension 2: War, Conflict and Terrorism vs. Reporting of Everyday Life and Events (link to subregister: yes)
Dimension 2 is interpreted as opposing keywords which, on the positive side, are used in news reports discussing War, Conflict, and Terrorism with keywords on the negative side used in opinion pieces and/or feature articles to discuss everyday life and events. Thus, this dimension not only distinguishes articles by topic, but also by communicative style and subregister.
The keywords strongly associated with positive Dimension 2 include those related to war (e.g. fighters, soldiers, weapons), conflict (e.g. violence, murder) and terrorism (e.g. suicide, bombing, terrorists), as well as keywords which describe people and places (e.g. citizens, members, mr, spokesman) and times and dates (e.g. friday, november, yesterday) that are tied to the events being reported. Other keywords depict ongoing investigations (e.g. investigation, emerged, involved, described) and are used in articles reporting on news events related to war, terrorism and conflict, such as the article "Armed police shoot man 'carrying a bomb in a rucksack after he takes a woman hostage' at Brussels tram station as they swoop on terror suspects linked to 'imminent attack in France'" (MailOnline, 25.03.16).
By contrast, the keywords strongly associated with negative Dimension 2 are used to describe entities and encode personal opinions and feelings (e.g. love, kind, hope). Unlike those on positive Dimension 2, these keywords are not connected by a consistent topic. However, some of them are used to discuss politics (e.g. Brexit, win, politics) and business (e.g. job, money, business). Overall, these keywords are used in the articles associated with negative Dimension 2 to encode personal opinions and stances on a range of topics, including politics, work and business, as opposed to war, terrorism and conflict. For example, a Guardian article entitled 'What is an Ideal Childhood?' (17.10.15) asks five celebrities for their views on an ideal childhood. One, the poet Lemn Sissay, talks about the benefit of parents believing in something (politically or religiously), such as the Qur'an, to get the child to think about who they are.
This first meaningful dimension indicates that the most common variation among the articles in our corpus is between those which report on war, terrorism and conflict and those which do not. After Dimension 1, this dimension represents the best fit of the data, indicating that war, conflict and terrorism is a commonly represented discourse in the articles. This is consistent with previous research which found that war and conflict was the most common press discourse of Islam between 1998 and 2009. It also supports Baker et al.'s (2013) finding that opinion pieces are an important subregister within which strong stances predominate.
If we look at how the individual newspapers relate to this dimension, we find no overall trend, but the Express (0.2) is most associated with the war, terrorism and conflict discourse, whereas the Sun (-0.36) is most associated with everyday life and events.
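Newspaper values such as the Express's 0.2 and the Sun's -0.36 are coordinates of the categories of a supplementary variable. A minimal sketch of the standard MCA transition formula follows: a supplementary category's coordinate on a dimension is the mean coordinate of the articles in that category divided by the square root of the dimension's eigenvalue. We assume, but have not verified here, that this matches the scaling FactoMineR applies to supplementary categories; the article coordinates and eigenvalue below are invented.

```python
import math
from collections import defaultdict
from typing import Dict, List


def supplementary_coords(ind_coords: List[float], groups: List[str],
                         eigenvalue: float) -> Dict[str, float]:
    """Coordinate of each supplementary category (e.g. newspaper) on one
    dimension: mean article coordinate / sqrt(eigenvalue)."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for x, g in zip(ind_coords, groups):
        sums[g] += x
        counts[g] += 1
    scale = math.sqrt(eigenvalue)
    return {g: sums[g] / counts[g] / scale for g in sums}


# Invented article coordinates on one dimension, eigenvalue 0.25.
coords = supplementary_coords([0.4, 0.2, -0.3, -0.1],
                              ["Express", "Express", "Sun", "Sun"], 0.25)
print({g: round(v, 2) for g, v in coords.items()})  # {'Express': 0.6, 'Sun': -0.4}
```

Because supplementary variables do not enter the computation of the dimensions, newspapers can be placed on each dimension without influencing which keyword patterns are found.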

Dimension 3: Foreign Affairs vs. Domestic Affairs (link to subregister: no)
Dimension 3 is interpreted as opposing keywords on the positive side that are used in reporting on foreign affairs with keywords on the negative side that are used in reporting on local and domestic affairs.
Many of the keywords strongly associated with positive Dimension 3 refer to foreign and UK-based politicians (e.g. MPs, Trump's, Cameron, president)

Dimension 4: Western Political Conflict vs. Overseas Conflict (link to subregister: yes)
Dimension 4 is interpreted as contrasting Western political conflict on its positive side with overseas conflict on its negative side. The keywords on the negative side link to the subregister of travel guides and reviews.
Positive Dimension 4 is characterized by reporting which links Muslims to Western political conflict. The keywords strongly associated with the positive side of this dimension focus on terror attacks (e.g. attack, terror), political processes (e.g. meeting, response) and legal actors/actions (e.g. court, police, prison). Evaluation is apparent (wrong), as is reporting of speech and writing (e.g. read, said, told). The Muslim community, and often specifically the British Muslim community, is placed relative to the actors and actions discussed (muslims), especially with respect to hate crimes and discrimination experienced and enacted by them. The political contexts in which these events are situated are Western, more specifically the U.S. (e.g. Trump, white house), Europe (eu) and Britain (e.g. labour, parliament, prime minister), and are often placed in time (e.g. tuesday, yesterday). Many of the keywords co-occur in articles discussing the political far-right (right is a keyword) and the anti-Muslim bias expressed by such Finally, there is no overall trend between newspaper type or political affiliation and the employment of this discourse. However, the newspaper most associated with western political conflict is the Daily Mail (0.28), whereas the newspapers most associated with employing the discourse of overseas conflict are The People (-0.24) and The Times (-0.24).

Dimension 5: UK Policy vs. US Policy (link to subregister: no)
Dimension 5 is interpreted as opposing keywords on the positive side that are used in articles concerning UK policy with keywords on the negative side that are used in articles concerning U.S. policy.
Many of the keywords strongly associated with positive

Dimension 6: Globalisation vs. Tribalism
Dimension 6 is interpreted as opposing keywords on the positive side that focus on globalisation and the UK's position in the world economy with keywords on the negative side that focus on tribalism and an Othering of Muslims as 'Them'.
The positive side of this dimension includes keywords relating to UK politics, especially Brexit (e.g. brexit, result, vote), which co-occur in articles discussing the effects of the Brexit vote and of particular trade deals on the British pound and the world economy more broadly. A group of keywords refers to the economy and commodities (economy, oil, car, agreement, deal, plans, cut, hit, return), which feature in discussions of the global economy and international trade agreements. Other keywords are used to forecast and predict (e.g. expected, likely, possible) and to refer to business (team, company, business, agency), and they often occur in texts describing and gauging the prosperity of businesses. Many of the keywords are evaluative in terms of scale (e.g. biggest, large, major), and there are many temporal and frequency keywords (e.g. days, four, weeks, yesterday), as well as keywords referring to places (e.g. city, local, south). Overall, these keywords often co-occur in texts discussing globalisation, such as a particular country's role and influence in the global economy, various trade agreements, and the success of international businesses, in articles such as "FTSE 100 falters but oil prices jump after Iraq says it will 'co-operate' with Opec deal" (The Telegraph, 28.11.16).
The keywords on the negative side of Dimension 6 are identity-focused and are used to position groups and identities in opposition to each other in the sense that the identities and characteristics of these groups are presented as being distinct from others. The identities implicated in this are reflected in the keywords and include, among others, iraqi, islamist,   The seventh most common pattern of variation across the articles in our corpus thus involves articles that either critique the rise of the far-right and its promotion of anti-Muslim rhetoric, or which promote stories that describe radicalised Muslims, which ultimately contribute to a discourse of fear around Islam.
Except for The Express (0.09) and The Times (-0.1), Dimension 8 opposes tabloid newspapers on the negative side with broadsheet newspapers on the positive side. This indicates that reporting on the rise of the far-right is more often associated with broadsheet newspapers, while reporting on the radicalisation of British Muslims is more often associated with the tabloids.

Threats (link to subregister: no)
Dimension 9 is interpreted as opposing keywords which on the positive side are used in articles to discuss political processes regarding elections with those on the negative side which discuss political processes regarding security threats.
The positive side of this dimension is about political actors engaged in political conflict during elections. These keywords relate to the political processes feeding into an election: candidates stand; while the election is active, they are running; and at the end of the election, they may have won or lost in their bid for power. Political actors linked to major parties in the UK are prominent in these keywords and may be identified explicitly (e.g. david cameron), with reference to a role they hold (e.g. defence secretary) or be collectivized ( We identified no overall trend between newspaper type or political affiliation and the employment of these discourses. The News of the World (0.19) and The Sun (0.18) are most strongly associated with elections, while The Express (-0.31) is most associated with security threats.

Discussion
In terms of the goals we set ourselves in the paper, Dimensions 2 to 9 clearly allow us to achieve the goal of refreshing our understanding of the representation of Islam in UK newspapers. The dimensions themselves paint a picture broadly consistent with the results of Baker and McEnery (2019). Our success in achieving our first goal is evidence that we have fulfilled the second: we have demonstrated that MCA may help to organise keywords in a way that facilitates a corpus-assisted discourse analysis. Importantly, the technique deals well with the problem of aggregation in keyword studies. The MCA approach helped us identify meaningful discourses aligned to the groups of keywords on the dimensions. It also allowed us to identify keywords which linked to multiple discourses but with different senses: battle, for example, is a keyword which contributes to Dimension 4 to refer to literal overseas conflict, such as the 'Battle for Mosul', and to Dimension 9 to refer to election processes, such as the metaphorical battle for votes. The MCA technique provided an approach to grouping keywords grounded in statistical co-occurrence and enabled us to observe which articles exhibit these patterns of co-occurrence most and least strongly. While the approach did allow us to consider the issue of absence, in this study at least that was not a particularly productive avenue of enquiry, as absence and presence seemed largely to be two sides of the same coin.
Of more importance, potentially, our approach successfully highlighted that subregister plays a role in the representation of Islam. The subregisters we identified (in line with Biber and Conrad, 2019) with the assistance of MCA allow us to make some broad claims about the relationship between subregister and discourse. Firstly, not all subregisters link to discourses about Islam in our study. One notable example is letters/texts from readers, which was an important subregister linked to negative representations of Muslims and Islam in Baker et al. (2013). By contrast, the link with opinion columnists endures, while new links, to travel guides and obituaries, have been identified. Hence, we arrive at a second claim: the engagement of discourses of Islam with subregisters in the UK press is dynamic. While we cannot provide a comprehensive picture of the intersection of Islam and all subregisters in our data, we can comment on those we have seen and on those which we know to exist but do not see in our data, and that confirms the interactional and dynamic nature of this engagement. This in turn leads to a third claim that future research can explore: the dynamic interaction between subregister and discourse, in which the two interact to produce effects, is unlikely to be unique to Islam.
The claims made so far link subregister to effects in discourse, so we must next consider why the interaction exists and what its role in discourse is. In Baker et al. (2013), the subregister of letters to the editor played a role in the discourse: it was a legitimation strategy. What of the new subregisters identified here? Why have obituaries become important to the representation of Islam and Muslims? The explanation is given in the discussion of Dimension 7: pieces which appear to be obituaries are, in fact, strongly evaluative and use the subregister not to celebrate the life or lives in question, but to condemn them. In other words, they are delegitimation strategies. This is highly marked in the context of the obituary subregister, which normally serves 'the double purpose … of informing the general public of the demise of a well-known individual, and that of celebrating the contribution that the person has made to society' (Pinna and Brett, 2018: 123). In this case, the subregister appears within reportage to reverse both of those purposes: it tells the public about the death of a person with whom they are unfamiliar and simultaneously damns that person's contribution to society. So, the link between subregister and discourse is shown, once again, to connect clearly to achieving specific effects within the discourse. This finding echoes Biber and Conrad's (2019: 46) suggestion, made when discussing shifts of subregister within a conversation, that such a 'switch in purpose can be regarded as a shift in subregister from one kind of conversation to another', and that these shifts in purpose across different kinds of communication, including writing and speech, can be identified within the linguistic characteristics. We see precisely this sort of shift in our data: a shift to the obituary subregister within reportage signals a change of purpose within an article.
The situated nature of that switch inverts our expectations of what that subregister normally achieves, with the identification of the subregister in this case allowing the identification of distinct purposes that differentiate between specific subregisters (ibid).
Of course, we can question whether it is possible to determine newspaper subregisters, either automatically or using metadata in some suitably encoded corpus, to add further utility to the approach to keywords taken in this paper. The metadata approach can be dismissed swiftly: the news consolidation service we used to compile the data for this study, LexisNexis, does not provide reliable subregister data. Even if it did, the subtlety of the results for Dimension 7 should not be overlooked: there, we had evidence that texts appearing to be reportage can, in fact, have embedded within them a substantial portion of text that is effectively in another subregister, in this case obituary. This would pose a challenge both for news producers and for automated systems which try to assign subregisters to articles. For example, while the articles do provide a broad topic categorisation and the section of the newspaper in which the text occurred, the mapping of subregisters to this information is, at best, highly imprecise. Hence, the approach taken here places on the analyst the burden of identifying subregisters, while accepting that the technique used to cluster keywords helps in this process. What would help this process further would be a comprehensive study of the subregisters of newspaper texts; however, we are not aware of any such study.
A final issue that we should consider is the limitations we inherited from previous studies. As noted in Section 2, we used a keyword detection method used in previous studies, yet since those studies were published, other approaches to calculating keywords have been proposed, notably that of Egbert and Biber (2019). While future work could adopt such an approach, we anticipate that the differences it produces would be of limited scale, as the key innovation of that approach, a consideration of dispersion, has been acknowledged here by requiring keywords to appear in at least five percent of files in the corpus, hence eliminating the most egregious cases of poorly dispersed but frequent words becoming keywords. Such a simple approach to dispersion, as is common in the key-keyword approach, was shown by Egbert and Biber to produce results similar to those of their technique; hence, we expect differences to be matters of degree rather than absolutes.

Conclusion
This paper has introduced a new approach to conducting keyword analysis, which explores discourse through the lens of keyword co-occurrence in texts. Our analysis, which employed this approach to explore representations of Muslims and Islam in ten years of national newspaper coverage, identified the major dimensions that characterise this coverage through the qualitative exploration of co-occurring keywords in context, related to representational discourses. These dimensions, and their associated discourses, have indicated relative stability compared to the discourses described by Baker et al. (2013) and Baker and McEnery (2019). That is to say, though recent years have witnessed the emergence of new social actors, groups, contexts and events in reportage around Islam, representations continue to Other Muslims, by presenting them as especially violent and as adopting values and practices framed as different from those of the global West. This is a bleak outlook, but it is one that speaks to the power of these representations, such that they endure regardless of the specific people, places and events that are newsworthy at a given time.
Yet our analysis has highlighted one area of significant change. The approach introduced in this paper proved valuable for accessing the intersection of subregister and discourse in a corpus in which subregister was not explicitly marked. Through this analysis, we were able to link the presence of particular subregisters to representational discourses. As well as confirming a previously observed interaction between a subregister (e.g. opinion pieces) and discourse, we also identified the use of the obituary subregister as a rhetorical strategy, with texts invoking it serving, we argue, a delegitimatory function by discrediting the lives and contributions of deceased Muslim social actors. Notably, this rhetorical effect was often achieved by embedding one subregister (the obituary) within another (reportage). The overall effect, we argue, is a subversion of readers' usual expectations of the functions of obituaries.
The approach to keyword categorisation and analysis introduced in this paper has proven effective in providing a more nuanced account of keywords, one that is sensitive to the various senses and discourses that a single keyword can exhibit across the texts of a corpus. This approach helps to overcome the problem of keyword aggregation that frequently arises in corpus-assisted discourse studies. Such a consideration is relevant to studies of corpora comprising texts from different news outlets, as news reporting is an 'argumentative discourse genre' (Richardson, 2004: 227) and different news outlets can deploy a single (key)word when invoking distinct, even oppositional, discourses. Our analysis also suggests that (corpus-assisted) discourse analysts may benefit from accounting for the role of subregister in their analyses.
Again, this is of particular relevance to studies of news texts, which comprise multiple subregisters. Accounting for the interaction between subregister and discourse could represent a fruitful avenue of inquiry for researchers working in a critical vein, as our analysis has demonstrated the potential for news producers to subvert the conventions of particular subregisters in order to work potentially discriminatory discourses into their writing, and to do so in sections of the news where readers (and, perhaps more importantly, media monitors) would not usually expect to encounter them.
This paper necessarily presents a series of first steps in using the technique we have introduced. The most obvious next step, given that the keywords were extracted sequentially, is to track the Dimensions through time, and this work is underway (Clarke et al., fc.).
Additionally, given that we used a single keyword approach in order to keep our results comparable with those of previous studies, it would clearly be of interest to use different approaches to calculating keywords, whether by varying the keyness statistic or the comparison corpus, and to consider the extent to which these meaningfully change the dimensions identified in this paper. We have also demonstrated here how MCA may be used when texts cannot be reliably analysed using standard MDA. There are other approaches that we could have taken, such as sparse Principal Component Analysis (PCA; Zou et al., 2006), which, unlike our presence/absence approach, is sensitive to texts containing more than one instance of a keyword. Comparing the output of MCA and sparse PCA is thus another fruitful avenue for future work in this area. Finally, the intersection of discourse and subregister, which is apparent though not fully explored in this paper, suggests that a systematic approach to coding subregister in a large dataset would be of value to those interested in discourse analysis and (sub)register analysis alike.
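MCA is named but not specified in this section. The sketch below shows one standard way of computing it, assuming each keyword is coded as present/absent in each text (the coding that, unlike sparse PCA, ignores repeated instances). It is a textbook correspondence analysis of the indicator matrix written with NumPy, not the authors' implementation; the function name `mca` and the toy matrix are hypothetical.

```python
import numpy as np


def mca(binary):
    """Multiple Correspondence Analysis of a documents-by-keywords 0/1 matrix.

    Each keyword is treated as a two-category variable (absent/present), so the
    indicator matrix Z has 2Q columns for Q keywords; MCA is correspondence
    analysis (CA) of Z. Returns the principal inertias (squared singular
    values) and the principal coordinates of documents and categories.
    """
    X = np.asarray(binary, dtype=float)
    # A keyword present in every (or no) document yields an empty category column.
    if np.any(X.all(axis=0)) or np.any(~X.any(axis=0)):
        raise ValueError("drop keywords that are present in all or no documents")
    Z = np.hstack([1.0 - X, X])          # 'absent' columns, then 'present' columns
    P = Z / Z.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row (document) masses
    c = P.sum(axis=0)                    # column (category) masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)  # standardised residuals
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = U * sigma / np.sqrt(r)[:, None]
    col_coords = Vt.T * sigma / np.sqrt(c)[:, None]
    return sigma ** 2, row_coords, col_coords


# Hypothetical toy data: 5 documents x 3 keywords (1 = keyword present).
binary = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 1, 1], [1, 0, 1]]
inertias, docs_xy, cats_xy = mca(binary)
print(inertias.round(3))
print(docs_xy[:, :2].round(3))          # documents on the first two dimensions
```

For binary keywords the principal inertias sum to (J - Q)/Q = 1, where J = 2Q is the number of categories, which is a handy sanity check on the computation. Interpreting a dimension still means reading which "present" categories lie at each pole and returning to the keywords in context.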
able, according, accused, across, act, action, added, afghanistan, agency, ago, agreement, ahead, allowed, almost, along, also, although, always, american, among, announced, another, anyone, anything, appeared, arabia, area, areas, armed, army, around, arrived, articles, asked, attack, attacks, authorities, away, back, barack, battle, bbc, became, become, began, behind, best, better, big, biggest, black, body, bombing, border, brexit, bring, britain, britain.s, british, brother, brought, building, business, call, called, calling, calls, came, cameron, campaign, can, capital, car, carried, cent, central, centre, change, chief, child, children, church, citizens, city, civil, claim, claimed, claims, clear, come, comes, coming, comments, committee, company, conference, confirmed, conservative, continue, control, council, country, country.s, course, court, crime, crisis, cut, daily, david, day, days, de, deal, death, debate, december, decision, defence, described, despite, died, different, director, donald, done, due, early, east, economy, emerged, end, english, enough, eu, even, event, ever, every, everyone, everything, friends, front, full, future, gave, general, germany, get, getting, give, given, global, go, going, good, government, ground, group, groups, gun, half, happened, hard, hate, head, heard, held, help, history, hit, hold, home, hope, hospital, hours, house, however, huge, human, hundreds, images, important, incident, including, information, inside, instead, intelligence, international, interview, investigation, involved, iranian, iraq, iraqi, isil, isis, islam, islamic, islamist, issue, issues, its, january, jeremy, job, join, july, june, just, justice, keep, key, kill, killed, killing, kind, know, labour, large, last, late, latest, lead, leader, leaders, leadership, leading, least, leaving, led, left, legal, less, let, life, like, likely, line, little, lives, living, local, london, long, look, looking, lost, lot, love, made, main, major, make, makes, making, man, many, march, mass, may, means, media, meeting, member, members, men, message, met, michael, middle, might, militants, military, minister, minutes, moment, monday, money, months, morning, mosque, mother, move, mps, mr, much, murder, muslim, muslims, must, name, named, nation, national, need, never, new, news, next., night, nine, north, northern, nothing, november, now, number, october, office, officer, officers, official, often, oil, old, one, online, open, operation, opposition, order, others, outside, parents, parliament, part, party, past, pay, peace, people, perhaps, person, phone, place, plan, plans, play, point, police, political, politics, possible, post, posted, power, president, press, prime, prison, problem, public, put, question, questions, rather, read, real, received, talk, talks, team, tell, tensions, terror, terrorism, terrorists, thing, things, think, third, though, thousands, threat, three, thursday, time, times, today, together, told, top, towards, town, travel, tried, troops, trump, trump.s, try, trying, tuesday, turkish, turned, tv, twitter, two, uk, un, union, united, university, us, use, used, using, victims, violence, visit, vote, want, wanted, wants