J Psycholinguist Res DOI 10.1007/s10936-014-9337-z

Polysemy Advantage with Abstract But Not Concrete Words

Bernadet Jager · Alexandra A. Cleland

© Springer Science+Business Media New York 2014

B. Jager (corresponding author) · A. A. Cleland, School of Psychology, University of Aberdeen, William Guild Building, Old Aberdeen, AB24 3FX, UK

Abstract It is a robust finding that ambiguous words are recognized faster than unambiguous words. More recent studies (e.g., Rodd et al. in J Mem Lang 46:245–266, 2002) now indicate that this ambiguity advantage may in reality be a polysemy advantage: caused by related senses (polysemy) rather than unrelated meanings (homonymy). We report two lexical decision studies that investigated the effects of polysemy with new word sets. In both studies, polysemy was factorially manipulated while homonymy was controlled for. In Experiment 1, where the stimulus set consisted solely of concrete nouns, there was no effect of polysemy. However, in Experiment 2, where the stimulus set consisted of a mix of abstract nouns, verbs, and adjectives, there was a significant polysemy advantage. Together, these two studies strongly suggest that polysemy affects abstract but not concrete nouns. In addition, they rule out several alternative explanations for these polysemy effects, e.g., sense dominance, age-of-acquisition, familiarity, and semantic diversity.

Keywords Polysemy · Concreteness · Ambiguity advantage · Mental lexicon · Lexical decision · Linear mixed-effects models

Many lexical decision studies have found that ambiguous words are recognized faster than unambiguous words (Azuma and Van Orden 1997; Borowsky and Masson 1996; Chumbley and Balota 1984; Hino and Lupker 1996; Hino et al. 2002, 1998, 2006; Jastrzembski 1981; Jastrzembski and Stanners 1975; Jastrzembski and Wittes 1982; Kellas et al. 1988; Lin and Ahrens 2010; Millis and Button 1989; Rubenstein et al. 1970, 1971; cf. Forster and Bednall 1976; Gernsbacher 1984). Although the assumption was often that this ambiguity advantage (Kawamoto et al. 1994) was caused by homonymy (ambiguity stemming from unrelated meanings; e.g., ‘bank’), more recent studies (Beretta et al. 2005; Rodd et al. 2002; Tamminen et al. 2006) have found indications that this effect is in reality due to polysemy (ambiguity stemming from related senses; e.g., ‘hook’). In contrast, these same studies did not find a homonymy advantage (Rodd et al. 2002, Experiment 2) and sometimes even found a homonymy disadvantage (Beretta et al. 2005; Rodd et al. 2002, Experiment 3; Tamminen et al. 2006).
Support for the existence of a polysemy advantage has been found with both visual (Beretta et al. 2005; Rodd et al. 2002, Experiment 2) and auditory presentation (Rodd et al. 2002, Experiment 3; Tamminen et al. 2006). However, all of these studies employed the same two stimulus sets (developed by Rodd et al. 2002). Additionally, those word sets were quite similar to each other, and in fact one set mostly consisted of a subset of the other. Therefore, the first aim of the current experiments was to take the essential next step: to replicate the polysemy advantage with new stimulus sets. A second goal of the current studies was to exclude alternative explanations for the polysemy advantage. Firstly, we wanted to confirm that the polysemy advantage is not due to homonymy. On the one hand, Rodd et al. (2002) already found that polysemy facilitated word processing whereas homonymy did not affect lexical decision times or even incurred a processing disadvantage (replicated by Beretta et al. 2005; Tamminen et al. 2006). On the other hand, the Rodd et al. stimulus set still comprised homonyms. Their presence could potentially have played a role in the findings. For example, polysemy often goes unnoticed whereas homonymy is a conspicuous type of ambiguity: it seems possible that participants detected that some words were ambiguous, which may have affected their responses. Furthermore, so far the polysemy explanation for the ambiguity advantage does not seem to have gained wide-spread acceptance yet. To ensure that the ambiguity advantage is not in any way due to homonymy, the current stimuli all consisted of non-homonyms. Therefore, findings of an ambiguity advantage cannot be attributed to homonymy. In addition, conditions were matched for a large number of additional variables, for example age of acquisition, frequency, and familiarity. These were then also potentially included as additional fixed effects (depending on whether they significantly contributed to variance). 
Thus, these variables could also be excluded as alternative explanations should a processing advantage be found. Lastly, the current experiments provided information about three additional variables of interest: concreteness, sense dominance, and semantic diversity. The results of Experiment 1 encouraged us to investigate the possibility that polysemy affects abstract but not concrete words, as well as the role of sense dominance. To this end, the stimuli employed in Experiment 2 consisted of words that tended to be more abstract than those in Experiment 1. In addition, whereas Experiment 1 employed only nouns, the second set was made up of an equal mix of words with a dominant noun sense, verb sense, or adjective/adverb sense. Finally, a closer look at the data showed that current results cannot be explained by means of semantic diversity, a new ambiguity measure recently developed by Hoffman et al. (2013).
Experiment 1: Polysemy Effects for Concrete Nouns The goal of Experiment 1 was to replicate the polysemy advantage with a new word set. To increase the odds that these new stimuli could be used in other tasks besides lexical decision (e.g., semantic categorization or priming), we selected words that were easily classifiable: concrete nouns for living beings and objects. Method Participants In return for course credit, 30 undergraduate students of the University of Aberdeen (25 women) participated in this study. Their age ranged from 17 to 25 (M = 20). They were all native speakers of English, and none of them reported reading or speech difficulties.
Table 1 Descriptive statistics for target stimuli Experiment 1

Polysemy condition    Few senses    Many senses
Example               Toe           Tie
N                     45            45
Senses                2.56          8.87
Lemma frequency       63.87         64.69
Familiarity           5.22          5.26
Concreteness          5.74          5.73
Letters               5.07          5.09
Syllables             1.40          1.40
Bigram frequency      8649          8142
Age of acquisition    5.96          6.00
Neighbours            5.00          4.58
Stimuli Previous studies (e.g., Jager and Cleland 2014; Rodd et al. 2002) found that questionnaires are liable to conflate polysemy and homonymy, so the current word sets were developed with the help of a dictionary, the online Wordsmyth Dictionary–Thesaurus (WDT; Parks et al. 1998). This dictionary allows for a fast search of words with few or many related senses. In addition, its listing of meanings and senses is straightforward and clear. The new stimulus set for Experiment 1 consisted of 90 concrete nouns. All target words can be found in “Appendix 1”; their properties have been summarized in Table 1. For 45 words, the WDT listed few senses (1–4 senses, M = 2.56), whereas the other 45 words had many senses (6 or more senses, M = 8.87). Number of senses differed significantly between the two conditions, t (88) = 14.11, p < .001. Conditions were matched for lemma frequency (mean per million and log-transformed), bigram frequency and number of neighbours (Baayen et al. 1993), familiarity and concreteness (Coltheart 1981), age of acquisition (Kuperman et al. 2012), as well as word length and number of syllables (all ps ≥ .602). The WDT listed only one meaning for each of these words, so they were all non-homonyms. The target stimuli referred to objects or living beings, with high concreteness scores signalling that the noun interpretation was dominant. To prevent participants from guessing the aim of the study and ensure that the task encompassed an appropriate amount of time, 152 filler words were added to the stimulus set. For every word (filler and target) a nonword was included as well, so 242 nonwords were presented in total. The nonwords were obtained by replacing one letter in each word so that they followed rules and conventions of English spelling as much as possible without actually forming a word. Nonwords mainly consisted of legal nonwords (do not sound like existing words) and a few pseudohomophones (pronounced like an existing word). 
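The one-letter replacement procedure for nonwords can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function name and the toy lexicon are hypothetical, and the code only enumerates one-letter substitutions that are absent from a supplied word list. It does not check whether a candidate follows English spelling conventions or whether it is a pseudohomophone, so those judgments would still have to be made separately.

```python
import string


def one_letter_nonwords(word, lexicon):
    """Return all strings that differ from `word` by exactly one letter
    and do not appear in `lexicon` (hypothetical helper, lowercase input)."""
    candidates = set()
    for position, original in enumerate(word):
        for letter in string.ascii_lowercase:
            if letter == original:
                continue  # substituting the same letter would reproduce the word
            candidate = word[:position] + letter + word[position + 1:]
            if candidate not in lexicon:
                candidates.add(candidate)
    return sorted(candidates)


# Toy lexicon for illustration only; a real application would need a full word list.
toy_lexicon = {"toe", "tie", "the", "ton", "top", "toy", "doe", "foe", "hoe", "woe"}
nonword_candidates = one_letter_nonwords("toe", toy_lexicon)
print(len(nonword_candidates), nonword_candidates[:5])
```

A candidate list of this kind would only be a starting point; the final choice of one nonword per word still requires screening against the criteria described above.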
Procedure Participants were presented with a series of letter strings, and instructed to indicate whether they saw a word or nonword. The experiment was presented by means of a Dell PC (Windows XP), using E-Prime software. Responses were recorded by means of a button box, and instructions emphasized repeatedly that both speed and accuracy were important. On each trial, a fixation cross appeared for 500 ms. The fixation point was followed by presentation of the letter string (Courier New, 28 points). The trial ended when the participant had
responded or 3,000 ms elapsed after presentation of the word. Following the end of the trial, the screen remained blank for 1,000 ms before presentation of the next fixation cross. Order of presentation was randomized for each participant. Prior to the experimental session, participants performed 30 practice trials (15 words, 15 nonwords) for which they received speed and accuracy feedback. For the actual experiment, stimuli were presented in one continuous block, which took around 15 min to complete. Analyses Data were analysed by means of linear mixed-effect models (Dixon 2008; Jaeger 2008; an extensive description of the method can be found in Baayen 2008; for a user-friendly overview tailored towards researchers without a strong computational background, see Cunnings 2012). Models were fitted by means of the “best-path” approach (e.g., Baayen 2008; Cunnings 2012), in which random slopes are only added if they significantly improve a model’s fit (as determined by means of ANOVAs). Research by Barr et al. (2013) has shown that performance of best-path models is comparable or superior to that of maximal models: they are similarly robust against type I errors (incorrectly rejecting the null hypothesis) as long as datasets are reasonably-sized (Barr et al. 2013, p. 269), and even have slightly lower chances of type II errors (incorrectly accepting the null hypothesis) in comparison with maximal models (Barr et al., p. 267). Furthermore, it was checked whether the models’ fits were significantly improved by including any of the eight matched variables: lemma frequency and bigram frequency (both log-transformed), age of acquisition, familiarity and concreteness, number of neighbours, letters, and syllables. Inclusion of additional variables was decided in a pre-determined way. First, models were progressively simplified by excluding variables one by one (weakest contributor first). 
The simplest model consisted only of the variable of interest (polysemy: few or many senses); the most complex model contained all variables. The nine models were then compared by means of an ANOVA. Additional variables beyond the variable of interest were only included if they significantly increased a model’s fit. Inclusion/exclusion of these additional variables will be mentioned when describing the optimal model, but only the effect of interest (polysemy) will be discussed. The models were implemented in R (R Development Core Team 2008). Reaction times were log-transformed (as recommended in Baayen 2008) and apart from the factorial effect of interest (polysemy: few or many senses) all independent variables were centred to reduce collinearity within the model (Jaeger 2010). Models for the continuous dependent variable reaction time were implemented by means of the function lmer(), those for the binomial dependent variable accuracy with the function glmer(), both from the package lme4. In addition we used the package LMERConvenienceFunctions (for removing reaction time outliers). Currently there is no agreement about the optimal way to estimate significance for effects obtained with the function lmer(), so as suggested by Cunnings (2012) we decided to use a formula from Baayen (2008, p. 248): p = 2 * (1 - pt(abs(X), Y - Z)). In this formula, X is the t-value, Y is the number of observations, and Z is the number of fixed effect parameters including the intercept (so Z comes down to the total number of fixed effects plus 1). Binomial data such as accuracy scores can be analysed with the function glmer(), which in contrast to the function lmer() does provide significance levels. Therefore, no additional calculations were needed for that variable.
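The significance formula quoted from Baayen (2008) can be sketched outside R as follows. This is an illustration under stated assumptions, not the authors' script: R's pt() is replaced by a Simpson's-rule integration of the Student t density using only the Python standard library, and the function names t_pdf and two_sided_p are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t_value, n_obs, n_fixed, steps=2000):
    """Two-sided p-value, p = 2 * (1 - pt(abs(X), Y - Z)).

    X = t_value, Y = n_obs (number of observations),
    Z = n_fixed (fixed-effect parameters including the intercept).
    The area between 0 and |t| is integrated with Simpson's rule
    (`steps` must be even); this stands in for R's pt().
    """
    df = n_obs - n_fixed
    upper = abs(t_value)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * area)


# Values reported for the reaction time models in Experiments 1 and 2.
print(round(two_sided_p(-0.35, 2530, 4), 3))  # about 0.726
print(round(two_sided_p(-2.55, 2744, 4), 3))  # about 0.011
```

With several thousand observations the degrees of freedom Y - Z are so large that the t distribution is practically normal, which is why the result depends only weakly on the exact value of Z.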
Table 2 Mean reaction times and error rates for Experiment 1

Polysemy       RT in ms (SD)    ER in % (SD)
Few senses     548 (81)         3.29 (2.99)
Many senses    545 (75)         3.06 (2.28)

Reaction times (RT) in ms, error rates (ER) as percentages; both with between-subject standard deviations (SD) in brackets
Results and Discussion Due to a typing error, one of the words from the many senses condition (‘plug,’ which had been turned into ‘plu’) had to be excluded from the analysis, together with a matched word from the few senses condition (‘doll’). Number of senses still differed significantly between conditions ( p < .001), while the other seven variables did not (all ps ≥ .650). Target trials were excluded from analyses if reaction times were 3.5 standard deviations above/below each participant’s mean per condition (1.02 % of trials). Of the remaining trials, participants’ mean error rate ranged from 0 to 8 % (M = 3.18 %). Error trials were excluded for the reaction time analyses. For those filtered data, participants’ mean reaction times ranged from 413 to 665 ms (M = 547 ms). A summary of the results has been provided in Table 2. The best fitting model for reaction times (N = 2,530 log-likelihood = 604.14) included random intercepts for items and subjects as well as three fixed effects: the variable of interest (polysemy: few or many senses) and two further variables that significantly increased the model’s fit: frequency (χ2 (1) = 26.23, p < .001) and age of acquisition (χ2 (1) = 9.42, p = .002). Adding random slopes for polysemy did not significantly increase the model’s fit (χ2 (2) = 0.55, p = .760), nor did addition of further variables (all ps > .05). With Y = 2,530 and Z = 4, the 3 ms difference between words with few and many senses did not reach significance, t = −0.35, p = .726. For error rates, adding random slopes did not significantly increase the model’s fit (χ2 (2) = 0.47, p = .790). The addition of two variables improved the model’s fit: number of letters (χ2 (1) = 6.58, p = .010) and frequency (χ2 (1) = 4.62, p = .032). Adding any further variables did not make a significant difference (all ps > .05). 
Thus, the best fitting model for error rates (N = 2,613, log-likelihood = −354.34) included random intercepts for items and subjects and three fixed effects. Error rates were not affected by polysemy, z = 0.18, p = .860. They were almost identical for words with few and many senses. In Experiment 1, polysemy did not have any effect on reaction time or error rate. Several explanations for this result can be ruled out. Firstly, it is almost certainly not due to a lack of power since effects of polysemy were not only absent statistically, but also numerically: reaction times and error rates were very similar for the two conditions. Secondly, the findings are not likely to be caused by the nonwords that were used. Previous studies found a polysemy advantage with both legal nonwords (Rodd et al. 2002, Experiment 3; Tamminen et al. 2006) and pseudohomophones (Beretta et al. 2005; Rodd et al. 2002, Experiment 2), so the choice for one or the other should not make a difference in this case. The most noticeable way in which Experiment 1 differed from previous studies is the fact that all current stimuli were concrete nouns. Words had a mean concreteness score of 5.74, and 82 out of 88 words had a concreteness score over 5. In contrast, the stimulus sets developed by Rodd et al. (2002) consisted of a more even mix of different word types. Therefore, we suspected that the highly concrete nature of the current stimuli may have caused the absence of the polysemy advantage.
Indeed, findings of a lexical decision study by Tokowicz and Kroll (2007) strongly suggest that the ambiguity advantage may indeed be sensitive to concreteness. They performed regression analyses over lexical decision times and error rates to ambiguous nouns and found that the ambiguity advantage interacted with concreteness: the effect was only present for nouns with low concreteness scores. Tokowicz and Kroll did not specify which kind of lexical ambiguity they intended to investigate, but only 29 out of their 399 nouns had more than one meaning entry in the WDT (Parks et al. 1998). Furthermore, the 260 words for which the average number of responses provided in the pre-test by Tokowicz and Kroll was below the overall mean (1.45) had on average 1.04 meaning entries in the WDT, and 4.09 sense entries. For the 139 words for which on average more than 1.45 responses were given, the number of meaning entries was very similar (1.17), but the number of sense entries was much higher (7.79). Thus it seems likely that the interaction effect found by Tokowicz and Kroll may have been driven by polysemy rather than homonymy, an interpretation that is strengthened by the absence of a polysemy effect in the current experiment. A second possibility is that the lack of a polysemy effect was not so much due to the concreteness of the words, but because of the strongly dominant interpretations of those words. For Experiment 1, concreteness ratings were taken from the MRC psycholinguistic database (Coltheart 1981) which is based on questionnaire ratings. Similarly, Tokowicz and Kroll (2007) also obtained concreteness scores by means of a questionnaire. However, when participants have to rate words for concreteness, a non-noun interpretation of the word can also lower an average concreteness score. For example, the noun senses of words such as ‘lie,’ ‘sin,’ or ‘thought’ are admittedly quite abstract, but these words could also easily be interpreted as verbs. 
The same is not the case for highly-concrete words. Although concrete words such as ‘comb’ or ‘stone’ have several verb senses according to the WDT, the activities referred to by these verb senses often involve the objects indicated by the noun senses (‘He carefully combed his six remaining hairs,’ ‘They stoned that unfortunate woman’), suggesting that the object interpretation of these words is dominant. Thus it seems possible that the lack of a polysemy advantage for highly-concrete words was because those words tended to have one very dominant sense, whereas the same may not have been true for the abstract words. Experiment 2: Polysemy Effects for Abstract Words Experiment 2 was conducted to investigate the two alternative explanations for the lack of a polysemy effect in Experiment 1: high concreteness or strong sense dominance. To this end, a second new word set was developed for Experiment 2: a mix of more abstract nouns, verbs, and adjectives/adverbs. Method Participants As part of their course requirements, 30 undergraduate students (16 women) took part in Experiment 2. Their age ranged from 18 to 45 (M = 24). None of them had any reading or speech difficulties, and all of them reported that English was their first language. Stimuli The 96 target words were assigned to the few senses condition if the Wordsmyth Dictionary– Thesaurus (WDT; Parks et al. 1998) listed 6 senses or less for them (M = 4.08 senses), and
to the many senses condition if they had 7 or more senses (M = 10.25 senses). The number of senses differed significantly between the two conditions, t (94) = 11.91, p < .001. All words had one syllable, but other than that the two groups of words were matched on the same variables as those for the first experiment: word frequency, bigram frequency, and number of neighbours (Baayen et al. 1993), concreteness and familiarity (Coltheart 1981), age of acquisition (Kuperman et al. 2012), and number of letters (all ps ≥ .329). All words were non-homonyms (the WDT only provided one meaning entry for them). Words tended to be much more abstract than those for Experiment 1: they had a mean concreteness score of 3.99, and 80 out of 96 words had a concreteness score under 5. The target words can be found in “Appendix 2”, their properties are presented in Table 3. Words were selected to have either a dominant noun sense, verb sense, or adjective/adverb sense (32 words each). Sense dominance was confirmed in two ways: by means of a questionnaire and the WDT dictionary, which lists senses in order of frequency of use (see Parks et al. 1998; http://www.wordsmyth.net/?mode=history). Agreement with the WDT was 96 %: for 92 out of 96 words, the dominant sense was listed first. In a pen and paper questionnaire, forty participants who did not take part in the current study (age ranging from 17 to 30; M = 20, 28 women; all native speakers of English, no language difficulties) were asked to construct sentences with 32 of the words. Each participant saw an approximately equal number of words that were pre-classed as verbs, nouns, and adjectives (10 or 11). Order of the words was randomized for each participant. It was then analysed whether a word was used as a noun (or noun phrase), a verb (or verb phrase) or adjective/adverb. Agreement with the questionnaire was 94 %: for 90 out of 96 words, participants used the word in the dominant sense.

Table 3 Descriptive statistics for target stimuli Experiment 2

Polysemy condition    Few senses    Many senses
Example               Calm          Cool
N                     48            48
Senses                4.08          10.25
Lemma frequency       108.29        120.50
Familiarity           5.43          5.45
Concreteness          3.96          4.02
Letters               4.44          4.50
Bigram frequency      6763          7214
Age of acquisition    6.25          5.93
Neighbours            7.40          7.23

The 96 target words were accompanied by 92 filler words and 188 nonwords. These nonwords were again created by replacing a letter in existing words. Results and Discussion Procedure and analyses were almost identical to those for Experiment 1. The only change compared with the first experiment was that analyses could potentially include seven instead of eight additional variables. Number of syllables was not included this time because all words in the current stimulus set had one syllable. Target trials were excluded from analyses if reaction times were 3.5 standard deviations above/below each participant’s mean per condition (1.49 % of trials). Of the remaining trials, participants’ mean error rate ranged from 0 to 15 % (M = 3.28 %). Error trials were excluded for the reaction time analyses. For those filtered data, participants’ mean reaction times ranged from 417 to 691 ms (M = 541 ms). A summary of the results has been provided in Table 4.
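The trimming rule used in both experiments (excluding trials more than 3.5 standard deviations above or below each participant's mean per condition) can be sketched as follows. This is an illustration only: the authors report using an R package for this step, and the sketch assumes raw reaction times, the sample standard deviation, and a hypothetical list-of-dicts trial format.

```python
from collections import defaultdict
from statistics import mean, stdev


def trim_outliers(trials, cutoff=3.5):
    """Drop trials whose RT lies more than `cutoff` SDs from the mean of
    their participant-by-condition cell.

    `trials` is a list of dicts with the (hypothetical) keys
    'participant', 'condition', and 'rt'.
    """
    cells = defaultdict(list)
    for trial in trials:
        cells[(trial["participant"], trial["condition"])].append(trial["rt"])

    bounds = {}
    for key, rts in cells.items():
        if len(rts) < 2:
            # An SD is undefined for a single trial, so nothing is trimmed.
            bounds[key] = (float("-inf"), float("inf"))
            continue
        centre, spread = mean(rts), stdev(rts)
        bounds[key] = (centre - cutoff * spread, centre + cutoff * spread)

    kept = []
    for trial in trials:
        low, high = bounds[(trial["participant"], trial["condition"])]
        if low <= trial["rt"] <= high:
            kept.append(trial)
    return kept


# Made-up data: 19 ordinary trials and one very slow response in the same cell.
demo_trials = [{"participant": "p1", "condition": "few", "rt": 500} for _ in range(19)]
demo_trials.append({"participant": "p1", "condition": "few", "rt": 3000})
print(len(trim_outliers(demo_trials)))  # the slow trial is removed, leaving 19
```

Note that with a sample standard deviation a single extreme value can only exceed 3.5 SDs when the cell contains enough trials (roughly 15 or more), which is satisfied by the 44 to 48 target trials per participant and condition in these experiments.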
Table 4 Mean reaction times and error rates for Experiment 2

Polysemy       RT in ms (SD)    ER in % (SD)
Few senses     549 (77)         4.01 (5.05)
Many senses    533 (68)         2.55 (3.66)

Reaction times (RT) in ms, error rates (ER) as percentages; both with between-subject standard deviations (SD) in brackets
For reaction times, adding random slopes did not significantly increase the model’s fit (χ2 (2) = 1.03, p = .596). The addition of two variables improved the model’s fit: familiarity (χ2 (1) = 10.76, p = .001) and bigram frequency (χ2 (1) = 5.03, p = .025). Adding any further variables did not make a significant difference (all ps > .05). Thus, the best fitting model for reaction times (N = 2,744, log-likelihood = 649.89) included random intercepts for items and subjects as well as three fixed effects. With Y = 2,744 and Z = 4, the difference between words with few and many senses reached significance, t = −2.55, p = .011. Words with many senses were recognized 15 ms faster than words with few senses. The best fitting model for error rates (N = 2,837, log-likelihood = −362.71) included random intercepts for items and subjects as well as three fixed effects: polysemy (the variable of interest) and two further variables that significantly increased the model’s fit: number of letters (χ2 (1) = 15.46, p < .001) and word frequency (χ2 (1) = 7.94, p = .005). Adding random slopes for polysemy did not significantly increase the model’s fit (χ2 (2) = 0.16, p = .925), nor did addition of further variables (all ps > .05). Words with few senses elicited almost twice as many errors as words with many senses, but this polysemy effect did not reach significance, z = 1.35, p = .178. Making use of a newly-created stimulus set of abstract words, Experiment 2 replicated the polysemy advantage that was previously only found with the two word sets created by Rodd et al. (2002): words with many senses were recognized faster than words with few senses. The effect was even numerically similar to those found in earlier visual studies: 15 ms in the current experiment compared to 14 ms (Rodd et al. 2002, Experiment 2) and 20 ms (Beretta et al. 2005). These results provide further support for the psychological reality of polysemy, as well as for the hypothesis (Rodd et al. 
2002) that previous findings of an ambiguity advantage were due to polysemy rather than homonymy. In addition, the fact that the polysemy advantage was found for words with dominant senses strongly suggests that the lack of that same effect in highly concrete words in Experiment 1 and the Tokowicz and Kroll (2007) study was not due to sense dominance. Therefore, for now the most likely explanation seems to be that polysemy interacts with concreteness.
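The model comparisons reported above take the form of likelihood-ratio tests: the χ2 statistic is twice the difference in log-likelihood between two nested models, with degrees of freedom equal to the number of added parameters. The sketch below shows how the reported p-values follow from the reported χ2 values. It is a minimal illustration rather than the authors' R code; the function names are hypothetical, and only the closed forms for 1 and 2 degrees of freedom (the only cases reported here) are implemented.

```python
import math


def chi2_upper_tail(statistic, df):
    """Upper-tail probability of a chi-square statistic for df = 1 or 2."""
    if statistic < 0:
        raise ValueError("chi-square statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(statistic / 2))
    if df == 2:
        return math.exp(-statistic / 2)
    raise ValueError("this sketch only covers df = 1 and df = 2")


def likelihood_ratio_test(loglik_simple, loglik_complex, df):
    """Compare two nested models from their log-likelihoods."""
    statistic = 2 * (loglik_complex - loglik_simple)
    return statistic, chi2_upper_tail(statistic, df)


# Chi-square values reported in the text.
print(round(chi2_upper_tail(10.76, 1), 3))  # familiarity, Experiment 2
print(round(chi2_upper_tail(5.03, 1), 3))   # bigram frequency, Experiment 2
print(round(chi2_upper_tail(0.55, 2), 3))   # random slopes, Experiment 1
```

Small discrepancies in the third decimal place relative to the reported p-values can arise because the χ2 values in the text are themselves rounded.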
General Discussion The current results replicate findings by Tokowicz and Kroll (2007): an ambiguity advantage for abstract but not concrete nouns. Tokowicz and Kroll explained this pattern by referring to the findings by Rodd et al. (2002) and suggested (pp. 747, 753) that concrete words may have related (polysemous) senses while abstract words have unrelated (homonymous) meanings. However, if that were true, concrete words should show the polysemy advantage, whereas abstract words should exhibit a homonymy disadvantage, which is not at all what was found in either the Tokowicz and Kroll study or our own. In fact, our findings confirm that this
pattern still occurs when controlling for homonymy, and thus should be interpreted as an interaction between polysemy and concreteness. Beretta et al. (2005) and Tamminen et al. (2006) considered the findings of a polysemy advantage as support for the notion that related senses are not represented separately in the mental lexicon, but are derived from an underspecified single entry (e.g., Frisson and Pickering 1999). However, merely stating that polysemous words are represented as a single entry does not explain why words with many related senses are recognized faster than words with few related senses. Thus, both Beretta et al. and Tamminen et al. refer to two related explanations (also suggested by Rodd et al. 2002) that involve information beyond the word level: higher context availability or higher context independence for words with many senses than for words with few senses. The current findings could be quite easily incorporated within these explanations. In fact, Schwanenflugel et al. (1988) used the context availability account to explain the faster recognition of concrete words. They argued that concrete words are recognized faster in lexical decision studies because they have more contextual information associated with them. This abundance of contextual information could conceivably result in a ceiling effect, nullifying a similarly-working bonus of having multiple senses. Hoffman et al. (2013) based their new ambiguity measure on contextual information as well. They too assumed that polysemous words do not have distinct senses. Instead, they argued that a word’s interpretation varies continuously depending on the particular context in which it appears, and that context variability rather than context availability affects the processing of ambiguous words. 
This semantic diversity (semD) was defined by estimating the semantic similarities of different linguistic contexts within a large text corpus; greater differences between contexts meant higher semD ratings. Hoffman et al. proposed that semantic diversity could provide an objective measure of ambiguity, and provided semD values for over 30,000 words. To investigate whether the current polysemy findings could be explained by semantic diversity, we looked up the semantic diversity scores for the words used in Experiments 1 and 2 (no score was provided for the word ‘spank,’ so instead we used the score of the related word ‘spanking’). We then used this information to test the semantic diversity explanation in two ways. First we checked whether semantic diversity scores differed between our polysemy conditions. No difference was found for either the 88 stimuli analysed for Experiment 1 ( p = .217) or the 96 words employed in Experiment 2 ( p = .140). Then we entered semantic diversity as an additional variable in the reaction time analyses of both experiments. In neither case would semantic diversity have contributed more strongly to reaction time variance than polysemy. For Experiment 1, the effect of polysemy did not significantly affect reaction times. However, semantic diversity would have been the weakest contributor to reaction time variability (t = 0.08), and therefore the first variable to be removed from the model. Comparison with the remaining models confirmed that addition of semantic diversity would not have significantly improved the model’s fit (χ2 (1) = 0.01, p = 0.928). For Experiment 2, the effect of polysemy reached significance. In contrast, semantic variability was again a weak contributor to reaction time variance (t = 0.63). This time, it would have been the 4th variable to be removed from the model. Again, an ANOVA confirmed that addition of the semantic diversity scores did not improve the model’s fit (χ2 (1) = 0.43, p = .516). 
These additional analyses show that semantic diversity does not provide an alternative explanation for the polysemy effects found in the current study. Firstly, semantic diversity scores did not differ between the polysemy conditions in either of the experiments. On the one hand, the similar semantic diversity for the two conditions could theoretically explain the absence of a polysemy effect in Experiment 1. On the other hand, the same pattern would then
have to be found for Experiment 2. However, despite the fact that conditions were matched for semantic diversity in the second experiment as well, it did show a polysemy advantage. Secondly, semantic diversity did not contribute more strongly to reaction time variance than polysemy did, and including it into models did not increase their fits. Therefore, we can conclude that semantic diversity did not cause the polysemy advantage in Experiment 2, nor its absence in Experiment 1. Instead of explaining polysemy effects by means of context effects, it can also be addressed at the word level itself. Rodd et al. (2004) developed a parallel distributed processing (PDP) model in which they conceived of related senses as flexible interpretations of one lexical entry. The senses all fall within a large attractor basin, thereby increasing the probability of lexical activation. Thus, the model predicts a polysemy advantage in lexical decision tasks. In this explanation, the wider basin does not directly reflect individual senses but instead indicates increased flexibility (or reduced specificity). So this account invokes a single lexical entry (as was also done by Beretta et al. 2005; Tamminen et al. 2006), but the effect of polysemy can be interpreted as taking place within the lexical representation rather than resorting to extraword information such as context. However, it should be noted that the internal structure is still most likely the result of such extraneous activity. Although the current implementation of the Rodd et al. model only focused on ambiguity, it seems quite possible to incorporate interaction effects with concreteness by having this variable affect the size of attractor basins. For example, attractor basins of concrete words could already be very large, thus causing additional benefits of multiple senses to be negligible. Following Schwanenflugel et al. 
(1988), potential mechanisms behind such an enlarged attractor basin for concrete words could then be similar to those for polysemy: context availability/independence.
So far, there are not many testable hypotheses regarding the polysemy advantage. Moreover, the ones that do exist (context availability/independence, semantic diversity, larger attractor basins) are based on the same premises (e.g., single lexical entries) and therefore result in similar predictions. It may therefore not be surprising that only a few polysemy studies have been published since the groundbreaking work by Rodd et al. (2002). We argue that the best way of propelling polysemy research forward is to conduct more experiments with a wider variety of stimuli.
An efficient way to speed up the development of new stimulus sets is to construct them on the basis of dictionary information. As was shown in a previous study (Jager and Cleland 2014), dictionaries are as well suited as questionnaires for defining ambiguity, and in fact have some strong advantages over questionnaires when the distinction between homonymy and polysemy is essential. Constructing stimulus sets by means of a dictionary does not depend on the availability of participants, thus allowing researchers to prepare an experiment when that most valuable of resources is not available (for example, during the summer months). In addition, it allows researchers with little lab time (such as project students) to prepare their study before the testing season starts.
The current studies provide new insights into the nature of polysemy effects. Firstly, together they replicated the finding (Tokowicz and Kroll 2007) that polysemy affects abstract but not concrete words. Secondly, Experiment 1 ruled out an alternative explanation for that pattern: it does not seem to be due to sense dominance. Thirdly, Experiment 2 also ruled out several alternative explanations for the polysemy advantage itself. 
All words were non-homonyms, so the effect cannot be due to unrelated meanings. Furthermore, the conditions were matched on a large number of variables, which can therefore also be excluded as alternative explanations, for example age of acquisition, familiarity, and word frequency. In addition, semantic diversity was found not to play a role.
In conclusion, the current results replicate polysemy effects with new stimulus sets: the polysemy advantage itself (Rodd et al. 2002), as well as its interaction with concreteness
(Tokowicz and Kroll 2007). In addition, several alternative explanations for these effects were ruled out. Therefore, these findings further strengthen the case for the psychological reality of polysemy: ambiguity stemming from a high number of related senses.
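The model comparison mentioned in the discussion (adding semantic diversity to a model did not improve its fit) can be illustrated with a likelihood-ratio test between two nested models. The Python sketch below is illustrative only: the function name and the log-likelihood values are hypothetical, and the text does not state that this particular test was the comparison used.

```python
import math


def lrt_one_parameter(loglik_reduced, loglik_full):
    """Likelihood-ratio test for two nested models that differ by ONE parameter.

    loglik_reduced: maximised log-likelihood of the model without the predictor
    loglik_full:    maximised log-likelihood of the model with the predictor
    Returns the test statistic and its p-value under a chi-square(1) reference.
    """
    # The full model can never fit worse than the reduced one; clamp at zero
    # to guard against tiny negative values caused by rounding.
    statistic = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical log-likelihoods, NOT values from the reported analyses.
    without_semd = -1502.40  # e.g. polysemy plus control predictors
    with_semd = -1502.05     # the same model plus semantic diversity
    stat, p = lrt_one_parameter(without_semd, with_semd)
    print(f"chi-square(1) = {stat:.2f}, p = {p:.3f}")
```

A statistic below roughly 3.84 corresponds to p > .05 for one added parameter, which is the situation described above: the extra predictor does not reliably improve the fit.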
Appendix 1: Target Stimuli for Experiment 1
See Table 5.
Table 5 Experiment 1: words with (a) few senses, (b) many senses
(a) Few senses
Word      Senses  Word      Senses  Word      Senses
Barn      1       Driver    2       Cousin    3
Helmet    1       Student   2       Friend    3
Statue    1       Husband   2       Hero      3
Coffin    1       Banker    2       Boat      4
Vase      1       Hunter    2       Phone     4
Bullet    1       Duke      2       Bread     4
Poster    1       Car       3       Egg       4
Nun       1       Grape     3       Fruit     4
Monk      1       Basket    3       Toe       4
Lawyer    1       Desk      3       Map       4
Thief     1       Carpet    3       Cage      4
Missile   2       Cheek     3       Plank     4
Flask     2       Priest    3       Bottle    4
Sword     2       Badge     3       Chest     4
Weapon    2       Chief     3       Doll*     4
(b) Many senses
Word      Senses  Word      Senses  Word      Senses
Doctor    6       Boss      7       Guide     9
Jockey    6       Guard     7       Stone     10
Butcher   6       Balloon   8       Trunk     10
Rebel     6       Key       8       Father    10
Skirt     7       Ribbon    8       Table     11
Gun       7       Pearl     8       Baby      11
Diamond   7       Bomb      8       Nurse     11
Needle    7       Boot      8       Fool      11
Nail      7       Flute     8       Hammer    12
Anchor    7       Scout     8       Train     12
Plate     7       Captain   8       Sink      12
Leaf      7       Judge     8       Master    13
Rod       7       Trumpet   9       Tie       15
Shovel    7       Saddle    9       Crown     21
Shield    7       Chain     9       Plug*     9
Sense entries were counted in the Wordsmyth Dictionary–Thesaurus (Parks et al. 1998). Two words were removed from the analysis of Experiment 1; these have been marked with an asterisk (*).
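The dictionary-based construction of stimulus sets discussed in the main text can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function name is hypothetical, the cut-offs (at most 4 senses for "few", at least 6 for "many") are simply the ranges observed in Table 5, and "lamp" is an invented entry included only to show an excluded item.

```python
from typing import Dict, List, Tuple


def split_by_sense_count(
    sense_counts: Dict[str, int], few_max: int, many_min: int
) -> Tuple[List[str], List[str], List[str]]:
    """Assign words to a few-senses or many-senses condition.

    Words whose sense count falls strictly between the two cut-offs are
    excluded, so that the two conditions do not overlap.
    """
    if few_max >= many_min:
        raise ValueError("few_max must be smaller than many_min")
    few, many, excluded = [], [], []
    for word, n_senses in sorted(sense_counts.items()):
        if n_senses <= few_max:
            few.append(word)
        elif n_senses >= many_min:
            many.append(word)
        else:
            excluded.append(word)
    return few, many, excluded


# Sense counts copied from Table 5, except "lamp", which is hypothetical.
counts = {"barn": 1, "sword": 2, "boat": 4, "doctor": 6, "crown": 21, "lamp": 5}
few, many, excluded = split_by_sense_count(counts, few_max=4, many_min=6)
print("few senses: ", few)
print("many senses:", many)
print("excluded:   ", excluded)
```

In practice the candidate lists would still have to be matched on control variables (for example word frequency and familiarity) before use; the sketch covers only the sense-count split.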
Appendix 2: Target Stimuli for Experiment 2
See Table 6.
Table 6 Experiment 2: words with (a) few senses, (b) many senses
(a) Few senses
Adjectives/adverbs  Nouns               Verbs
Word      Senses    Word      Senses    Word      Senses
Glad      3         Threat    2         Cope      2
Both      3         Skill     2         Spank     2
Mild      3         Wealth    2         Seem      3
Cruel     3         Aunt      2         Stare     3
Coarse    3         Mood      3         Dare      4
Dumb      4         Sale      3         Beg       4
Dense     4         Gift      3         Choose    4
Sad       4         Birth     4         Earn      4
Sick      4         Gang      4         Seek      4
Main      5         Cell      4         Heal      4
Proud     5         Height    5         Ache      4
Ripe      5         Dirt      5         Learn     5
Poor      6         Sauce     5         Build     5
Calm      6         Coin      5         Send      6
Pale      6         Tank      5         Weigh     6
Tall      6         Branch    6         Grow      6
(b) Many senses
Adjectives/adverbs  Nouns               Verbs
Word      Senses    Word      Senses    Word      Senses
Late      7         Dread     7         Claim     7
Weak      7         Pride     7         Crawl     7
Broad     7         Fault     7         Melt      8
Sore      7         Pain      7         Lose      9
Wide      9         Coast     7         Care      9
Near      10        Soul      8         Steal     10
Quick     10        Crowd     8         Kill      10
Tough     10        Cloud     8         Put       11
Best      11        Bomb      8         Cheat     11
Thin      11        Rope      8         Crush     11
Tight     11        Ghost     9         Hunt      12
Blind     12        Edge      9         Throw     14
Cool      13        Sponge    10        Wear      15
Deep      13        Tongue    10        Pull      16
Wild      14        Dream     11        Hang      16
Rough     14        Tune      11        Break     25
Sense entries were counted in the Wordsmyth Dictionary–Thesaurus (Parks et al. 1998).
References
Azuma, T., & Van Orden, G. C. (1997). Why SAFE is better than FAST: The relatedness of a word’s meanings affects lexical decision times. Journal of Memory and Language, 36, 484–504.
Baayen, R. H. (2008). Analyzing linguistic data: A practical introduction to statistics using R. Cambridge: Cambridge University Press.
Baayen, R. H., Piepenbrock, R., & Van Rijn, H. (1993). The CELEX lexical database [CD-ROM]. Philadelphia, PA: Linguistic Data Consortium, University of Pennsylvania.
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68, 255–278.
Beretta, A., Fiorentino, R., & Poeppel, D. (2005). The effects of homonymy and polysemy on lexical access: An MEG study. Cognitive Brain Research, 24, 57–65.
Borowsky, R., & Masson, M. E. J. (1996). Semantic ambiguity effects in word identification. Journal of Experimental Psychology: Learning Memory and Cognition, 22, 63–85.
Chumbley, J. I., & Balota, D. A. (1984). A word’s meaning affects the decision in lexical decision. Memory and Cognition, 12, 590–606.
Coltheart, M. (1981). The MRC psycholinguistic database. Quarterly Journal of Experimental Psychology, 33, 334–338. Retrieved from: http://www.psy.uwa.edu.au/mrcdatabase/uwa_mrc.htm.
Cunnings, I. (2012). An overview of mixed-effects statistical models for second language researchers. Second Language Research, 28, 369–382.
Dixon, P. (2008). Models of accuracy in repeated-measures designs. Journal of Memory and Language, 59, 447–456.
Forster, K. I., & Bednall, E. S. (1976). Terminating and exhaustive search in lexical access. Memory and Cognition, 4, 53–61.
Gernsbacher, M. A. (1984). Resolving 20 years of inconsistent interactions between lexical familiarity and orthography, concreteness, and polysemy. Journal of Experimental Psychology: General, 113, 256–281.
Hino, Y., & Lupker, S. J. (1996). Effects of polysemy in lexical decision and naming: An alternative to lexical access accounts. Journal of Experimental Psychology: Human Perception and Performance, 22, 1331–1356.
Hino, Y., Lupker, S. J., & Pexman, P. M. (2002). Ambiguity and synonymy effects in lexical decision, naming, and semantic categorization tasks: Interactions between orthography, phonology, and semantics. Journal of Experimental Psychology: Learning Memory and Cognition, 28, 686–713.
Hino, Y., Lupker, S. J., Sears, C. R., & Ogawa, T. (1998). The effects of polysemy for Japanese katakana words. Reading and Writing, 10, 395–424.
Hino, Y., Pexman, P. M., & Lupker, S. J. (2006). Ambiguity and relatedness effects in semantic tasks: Are they due to semantic coding? Journal of Memory and Language, 55, 247–273.
Hoffman, P., Lambon Ralph, M. A., & Rogers, T. T. (2013). Semantic diversity: A measure of semantic ambiguity based on variability in the contextual usage of words. Behavior Research Methods, 45, 718–730.
Frisson, S., & Pickering, M. J. (1999). The processing of metonymy: Evidence from eye movements. Journal of Experimental Psychology: Learning Memory and Cognition, 25, 1366–1383.
Jaeger, T. F. (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language, 59, 434–446.
Jaeger, T. F. (2010). Common issues and solutions in regression modelling (mixed or not). Presentation at Brain and Cognitive Sciences, University of Rochester, USA, May 4 2010. Retrieved from: https://www.hlp.rochester.edu/resources/recordedHLPtalks/PennStateRegression10/PennState-Day2.pdf (Feb 2014)
Jager, B., & Cleland, A. A. (2014). Defining lexical ambiguity: A comparison of methodologies. Manuscript submitted for publication.
Jastrzembski, J. E. (1981). Multiple meanings, number of related meanings, frequency of occurrence, and the lexicon. Cognitive Psychology, 13, 278–305.
Jastrzembski, J. E., & Stanners, R. F. (1975). Multiple word meanings and lexical search speed. Journal of Verbal Learning and Verbal Behavior, 14, 534–537.
Jastrzembski, J. E., & Wittes, R. (1982). Effects of word frequency and number of meanings for fast and slow readers. Contemporary Educational Psychology, 7, 195–200.
Kawamoto, A. H., Farrar IV, W. T., & Kello, C. T. (1994). When two meanings are better than one: Modeling the ambiguity advantage using a recurrent distributed network. Journal of Experimental Psychology: Human Perception and Performance, 20, 1233–1247.
Kellas, G., Ferraro, F. R., & Simpson, G. B. (1988). Lexical ambiguity and the timecourse of attentional allocation in word recognition. Journal of Experimental Psychology: Human Perception and Performance, 14, 601–609.
Kuperman, V., Stadthagen-Gonzalez, H., & Brysbaert, M. (2012). Age-of-acquisition ratings for 30,000 words. Behavior Research Methods, 44, 978–990.
Lin, C.-J. C., & Ahrens, K. (2010). Ambiguity advantage revisited: Two meanings are better than one when accessing Chinese nouns. Journal of Psycholinguistic Research, 39, 1–19.
Millis, M. L., & Button, S. B. (1989). The effect of polysemy on lexical decision time: Now you see it, now you don’t. Memory and Cognition, 17, 141–147.
Parks, R., Ray, J., & Bland, S. (1998). Wordsmyth English dictionary–Thesaurus. [ONLINE]. University of Chicago. http://www.wordsmyth.net.
R Development Core Team. (2008). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org.
Rodd, J. M., Gaskell, M. G., & Marslen-Wilson, W. D. (2002). Making sense of semantic ambiguity: Semantic competition in lexical access. Journal of Memory and Language, 46, 245–266.
Rodd, J. M., Gaskell, M. G., & Marslen-Wilson, W. D. (2004). Modelling the effects of semantic ambiguity in word recognition. Cognitive Science, 28, 89–104.
Rubenstein, H., Garfield, L., & Millikan, J. A. (1970). Homographic entries in the internal lexicon. Journal of Verbal Learning and Verbal Behavior, 9, 487–494.
Rubenstein, H., Lewis, S. S., & Rubenstein, M. A. (1971). Homographic entries in the internal lexicon: Effects of systematicity and relative frequency of meanings. Journal of Verbal Learning and Verbal Behavior, 10, 57–62.
Schwanenflugel, P. J., Harnishfeger, K. K., & Stowe, R. W. (1988). Context availability and lexical decisions for abstract and concrete words. Journal of Memory and Language, 27, 499–520.
Tamminen, J., Cleland, A. A., Quinlan, P. T., & Gaskell, M. G. (2006). Processing semantic ambiguity: Different loci for meanings and senses. In Proceedings of the twenty-eighth annual conference of the cognitive science society (pp. 2222–2227). Mahwah, NJ: Lawrence Erlbaum Associates.
Tokowicz, N., & Kroll, J. F. (2007). Number of meanings and concreteness: Consequences of ambiguity within and across languages. Language and Cognitive Processes, 22, 727–779.