next up previous
Next: Boland, Hartsuiker, Pickering, Postma: Up: Poster Session 2: Tuesday Previous: Ashby, Martin, Morris: Syllable

Leaving the past tense behind: computational studies of a morphologically-complex language

Jelena Mirkovic, Mark S. Seidenberg and Marc F. Joanisse
jelena@lcnl.wisc.edu
University of Wisconsin

We will describe computational modeling and behavioral research on Serbian, a language with a complex inflectional system. Most research on inflectional morphology has focused on English, an inflectionally impoverished language, and on one inflection, the past tense, for which a rule can be easily stated. Serbian noun morphology, the focus of our work, marks 3 genders, 2 numbers, and 7 cases. This system is quasiregular: many inflections are predictable and consistent, but there are numerous exceptions. We implemented a connectionist model to investigate the structure of this inflectional system, particularly the statistical contingencies that hold among different types of information.

The implemented model took localist specifications of lemma, number, gender, and case as input and learned to generate correct phonological forms for 3244 words. The model also produced plausible forms on a generalization test. We then used the model to generate a novel neighborhood statistic that accounted for a significant amount of variance in response latencies in a naming study conducted with 48 native speakers of Serbian.
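A localist specification of this kind can be pictured as one active unit per field. The sketch below is only an illustration under stated assumptions: the abstract does not give the unit layout, so the field order, the function names `one_hot` and `encode_input`, and the toy lemma count are hypothetical. The case and gender labels are the standard ones for Serbian nouns.

```python
# Illustrative sketch only: the unit layout is an assumption, not the
# authors' implementation.
GENDERS = ["masculine", "feminine", "neuter"]
NUMBERS = ["singular", "plural"]
CASES = ["nominative", "genitive", "dative", "accusative",
         "vocative", "instrumental", "locative"]


def one_hot(index, size):
    """Return a list of `size` zeros with a single 1 at `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def encode_input(lemma_index, n_lemmas, gender, number, case):
    """Concatenate localist codes for lemma, gender, number, and case.

    `n_lemmas` is a hypothetical lemma inventory size chosen by the caller.
    """
    return (one_hot(lemma_index, n_lemmas)
            + one_hot(GENDERS.index(gender), len(GENDERS))
            + one_hot(NUMBERS.index(number), len(NUMBERS))
            + one_hot(CASES.index(case), len(CASES)))


# Toy example: lemma 2 of 5, feminine accusative plural.
vec = encode_input(2, 5, "feminine", "plural", "accusative")
```

With a toy inventory of 5 lemmas the input has 5 + 3 + 2 + 7 = 17 units, exactly four of which are active for any given form.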

Subsequent versions of the model varied the types of information provided as input. Some theories assume that words are represented by lexical entries that specify information such as grammatical category and gender. Several studies, however, have shown that there are probabilistic phonological cues to grammatical category (e.g., Cutler et al., 1990; Kelly, 1992; Cassidy et al., 1999). We used variants of the model to examine whether there are phonological cues to gender in Serbian. In addition to the original model, with localist representations of lemma, case, number, and gender, a second model excluded information about gender. A third model represented lemma, case, and number, plus a semantic factor (animacy), which is relevant for generating some forms (e.g., masculine accusative singular). All three models learned the training corpus successfully. Initial learning was faster with more explicit input information, but all the models converged to 99% accuracy within the same time frame (3 million iterations).
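The three input configurations can be contrasted in a small sketch. Everything here beyond the list of fields per variant is an assumption made for illustration: the variant names, the two-unit coding of animacy, the field order, and the helper `build_input` are hypothetical and are not taken from the models themselves.

```python
# Illustrative sketch only: variant names and the animacy coding are
# assumptions, not the authors' implementation.
FIELDS = {
    "case": ["nominative", "genitive", "dative", "accusative",
             "vocative", "instrumental", "locative"],
    "number": ["singular", "plural"],
    "gender": ["masculine", "feminine", "neuter"],
    "animacy": ["inanimate", "animate"],
}

# Fields that accompany the lemma units in each model variant.
VARIANTS = {
    "full": ["case", "number", "gender"],
    "no_gender": ["case", "number"],
    "animacy": ["case", "number", "animacy"],
}


def build_input(variant, lemma_index, n_lemmas, features):
    """Build a localist input vector for the named variant."""
    vec = [0] * n_lemmas
    vec[lemma_index] = 1
    for field in VARIANTS[variant]:
        values = FIELDS[field]
        units = [0] * len(values)
        units[values.index(features[field])] = 1
        vec.extend(units)
    return vec


# Toy example: an animate masculine noun in the accusative singular.
features = {"case": "accusative", "number": "singular",
            "gender": "masculine", "animacy": "animate"}
widths = {name: len(build_input(name, 0, 4, features)) for name in VARIANTS}
```

In this toy setup with 4 lemmas, the variant without gender has the narrowest input, so any gender-dependent regularities would have to be recovered from the lemma and its phonological output rather than read off an explicit gender unit.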

The results provide evidence that a complex inflectional system such as the one for Serbian can be construed as a set of statistical, probabilistic relations among different types of information. A conjunction of phonological and semantic information is used in producing correctly inflected forms, with distinctions such as gender derived from correlated cues in the course of learning. This account suggests similarities in how inflectional morphology is processed in typologically diverse languages such as Serbian and English.

References

1. Cassidy, K. W., Kelly, M. H. & Sharoni, L. J. (1999): Inferring gender from name phonology. Journal of Experimental Psychology: General, 128, 3, 362-381.

2. Cutler, A., McQueen, J. & Robinson, K. (1990): Elizabeth and John: Sound patterns of men's and women's names. Journal of Linguistics, 26, 471-482.

3. Kelly, M. H. (1992): Using sound to solve syntactic problems: The role of phonology in grammatical category assignments. Psychological Review, 99, 2, 349-364.


Patrick Sturt 2003-08-15