Visual world studies of conversational perspective taking
Visual-world eyetracking greatly expanded the potential for insight into how listeners access and use common ground during situated language comprehension. Past reviews of visual world studies on perspective taking have largely taken the diverging findings of the various studies at face value, and attributed these apparently different findings to differences in the extent to which the paradigms used by different labs afford collaborative interaction. Researchers are asking questions about perspective taking of an increasingly nuanced and sophisticated nature, a clear indicator of progress. But this research has the potential not only to improve our understanding of conversational perspective taking. Grappling with problems of data interpretation in such a complex domain has the unique potential to drive visual world researchers to a deeper understanding of how to best map visual world data onto psycholinguistic theory. I will argue against this interactional affordances explanation, on two counts. First, it implies that interactivity affects the overall ability to form common ground, and thus provides no straightforward explanation of why, within a single noninteractive study, common ground can have very large effects on some aspects of processing (referential anticipation) while having negligible effects on others (lexical processing). Second, and more importantly, the explanation accepts the divergence in published findings at face value. However, a closer look at several key studies shows that the divergences are more likely to reflect inconsistent practices of analysis and interpretation that have been applied to an underlying body of data that is, in fact, surprisingly consistent. The diverging interpretations, I will argue, are the result of differences in the handling of anticipatory baseline effects (ABEs) in the analysis of visual world data. ABEs arise in perspective-taking studies because listeners have earlier access to constraining information about who knows what than they have to referential speech, and thus can already show biases in visual attention even before the processing of any referential speech has begun. To be sure, these ABEs clearly indicate early access to common ground; however, access does not imply integration, since it is possible that this information is not used later to modulate the processing of incoming speech. Failing to account for these biases using statistical or experimental controls leads to over-optimistic assessments of listeners’ ability to integrate this information with incoming speech. I will show that several key studies with varying degrees of interactional affordances all show similar temporal profiles of common ground use during the interpretive process: early anticipatory effects, followed by bottom-up effects of lexical processing that are not modulated by common ground, followed (optionally) by further late effects that are likely to be post-lexical. Furthermore, this temporal profile for common ground radically differs from the profile of contextual effects related to verb semantics. Together, these findings are consistent with the proposal that lexical processes are encapsulated from common ground, but cannot be straightforwardly accounted for by probabilistic constraint-based approaches.