The direct object / sentential complement ambiguity that arises at the post-verbal NP in 'Tom admitted the students were right' can be eliminated by including the complementizer 'that'. Yet production experiments such as Ferreira and Dell (2000) suggest that speakers do not use 'that' to avoid such ambiguities, and may not even be aware of the ambiguity during production. At the same time, evidence suggests that there are systematic differences between examples that do and do not use the complementizer. Thompson and Mulac (1991) show that the epistemicity of the main clause and the topicality of the complement contribute to 'that' ellipsis. Hawkins (2002) argues that ellipsis is less likely when the subordinate subject is long, due to processing pressures to avoid lengthy ambiguity, and Ferreira and Dell (2000) suggest that ellipsis occurs to allow early mention of available material, such as when the main and subordinate clause subjects are the same.
These previous analyses have involved relatively small data sets, large enough to support the idea that each factor is a potential predictor of 'that' ellipsis, but too small to allow detailed examination of the relative strength of each factor. We prepared a database of the approximately 1.3 million spoken and written tokens from the British National Corpus containing any of the 100 sentential complement-taking verbs listed in Garnsey et al. (1997). Approximately 182,000 of these instances involved a sentential complement, either with or without 'that'. These examples were labeled for a variety of formal and semantic properties. The formal properties included the length of the subject and post-verbal NPs and their heads, and the log lexical frequency of the heads of the subject NP and the post-verbal NP. The semantic properties were obtained by automatically ranking the subject and post-verbal NPs and their heads on twenty semantic dimensions based on Latent Semantic Analysis (Deerwester et al. 1990). We then performed a variety of analyses on these data, including regression analyses to estimate how well the various factors predict the presence or absence of 'that'.
Starting from a baseline of 63%, we were able to correctly predict the presence or absence of 'that' 79% of the time. This provides clear evidence that the presence of the complementizer is governed by contextual factors. Additionally, we show that these factors conspire to reduce the occurrence of ambiguous cases such as the example in the first sentence.
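To make the comparison between a baseline and a regression-based prediction concrete, the following is a minimal sketch in Python. It is not the analysis used in the study: the ten toy tokens are invented, the single predictor (length of the subordinate subject in words) stands in for the much larger set of formal and semantic properties, and the names majority_baseline, fit_logistic and accuracy are hypothetical. The baseline here is assumed to be the accuracy of always guessing the more frequent outcome.

```python
import math


def majority_baseline(labels):
    """Accuracy of always guessing the more frequent class."""
    positives = sum(labels)
    return max(positives, len(labels) - positives) / len(labels)


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, x):
    """Probability of 'that' being present; weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


def fit_logistic(rows, labels, epochs=20000, lr=0.25):
    """Fit a logistic regression by batch gradient descent on the log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict_proba(weights, x) - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def accuracy(weights, rows, labels):
    """Share of tokens whose predicted class matches the observed one."""
    hits = sum(
        int((predict_proba(weights, x) >= 0.5) == bool(y))
        for x, y in zip(rows, labels)
    )
    return hits / len(rows)


# Invented toy data (NOT from the British National Corpus database):
# one hypothetical predictor per token, the subordinate-subject length in words.
rows = [[1], [1], [2], [2], [3], [4], [4], [5], [5], [6]]
# 1 = complementizer 'that' present, 0 = 'that' omitted.
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

weights = fit_logistic(rows, labels)
print("baseline accuracy:", majority_baseline(labels))
print("model accuracy:   ", accuracy(weights, rows, labels))
```

In a setting like the one described above, the accuracy would normally be measured on held-out tokens rather than on the tokens used for fitting; the sketch skips that step only to stay short.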
Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science, 41, 391-407.
Ferreira, V. S., & Dell, G. S. (2000). Effect of ambiguity and lexical availability on syntactic and lexical production. Cognitive Psychology, 40(4), 296-340.
Garnsey, S. M., Pearlmutter, N. J., Myers, E., & Lotocky, M. A. (1997). The contributions of verb bias and plausibility to the comprehension of temporarily ambiguous sentences. Journal of Memory & Language, 37(1), 58-93.
Hawkins, J. A. (2002). Symmetries and asymmetries: their grammar, typology and parsing. Theoretical Linguistics, 28(2), 95-150.
Thompson, S. A., & Mulac, A. (1991). The discourse conditions for the use of the complementizer 'that' in conversational English. Journal of Pragmatics, 15, 237-251.