
Abstract
The use of very large social media datasets in corpus linguistics has obvious benefits. Such data represent a novel source of evidence when compared with structured digital text corpora. However, there is a clear need to assess critically how the effective reuse of data can be handled, how findings can be reproduced, and how results can be generalized. A relevant question concerns how data should be presented to ensure reproducibility and replicability. This article surveys the state of the art in descriptions of data collection and methodological transparency in 30 studies that used Twitter/X as their data source. The empirical section investigates how easy it would be to reproduce a study based on these descriptions. While we concentrate on evidence from one social media application, the discussion proceeds to present concrete steps that might be used to improve data management related to the reuse, discovery, and evaluation of social media data in general.