From real-life situated discourse to video-stream data-mining: An argument for agent-oriented modeling for multimodal corpus compilation. Gu Yueguo, International Journal of Corpus Linguistics, 14, 433-466 (2009), doi = https://doi.org/10.1075/ijcl.14.4.01gu, publicationName = John Benjamins, issn = 1384-6655, abstract = This paper presents an argument for agent-oriented modeling (AOM) as a research methodology and a metalanguage for corpus linguistics. It is triggered by three closely related issues arising from compiling multimodal corpora such as the Spoken Chinese Corpora of Situated Discourse (SCCSD). Given a real-life situation, there are three types of representation: (i) the Written Word representation, (ii) audio recording, and (iii) video recording. It is shown that all three types are data-transformative and involve data loss, and that they are intrinsically flawed. The current multiple-layered approach to data integration is also shown to be inadequate. AOM is proposed as a potential solution to these problems. A modeling decision tree, levels of modeling, and a modeling schema written in XML are demonstrated. The philosophical basis of AOM and its theoretical implications are also discussed.