Perception & Psychophysics, 2004, 66 (3), 363-376

Traditionally, research into the mental representation of phonetic categories has focused on the relationship between discrimination and classification of speech sounds on a stimulus continuum. The first such experiment was performed by Liberman, Harris, Hoffman, and Griffith (1957). Their hypothesis was that discrimination of certain speech sounds would be limited by classification; two different stimuli would be discriminated only to the extent that they were classified differently (this was later referred to as categorical perception; Eimas, 1963). Liberman et al. concluded that their results did not agree with their own "extreme assumption": Discrimination results were better than predicted from the classification results. The difference presumably represented the listener's ability to distinguish the speech sounds not solely on the basis of the phonemic labels, but also on the basis of acoustic differences. Despite this conclusion, however, this first study is often cited as typically demonstrating categorical perception (for a review, see Repp, 1984). And even though a clear relationship between discrimination and classification has rarely been demonstrated in subsequent research, the results are often interpreted as indicating absolutely categorical perception (Macmillan, 1987). In other words, there is no explicit criterion for the maximum difference between discrimination and classification results that would still be compatible with categorical perception. In this article, we will use the original definition by Liberman et al.: Perception is fully categorical only if there is no significant difference between phoneme categorization (i.e., predicted discrimination) and actually measured discrimination. In all other cases, one can talk only about various degrees of categorical perception.
A great deal depends, of course, on the model used to derive the prediction; more about this will be said below.

This article is about the effects of different discrimination tasks on categorical perception. But that is not what we set out to do. Our original intention was to investigate the perception of vowels and to try to find an answer to the question of why vowels are generally perceived much less categorically than, for example, stop consonants (see, e.g., Fry, Abramson, Eimas, & Liberman, 1962; Repp, 1981). The results of Experiment 1, however, did not provide an answer to this question, but did raise questions about the various aspects of the experimental task that had been used, thus changing the focus of the research and of this article. Since the design of Experiment 1 was, of course, determined completely by the original hypothesis, this hypothesis will be briefly introduced, before we shift attention to the discrimination task.

The Original Hypothesis: Categorical Perception of Vowels?

There is a clear difference in degree of categorical perception between stop consonants and vowels. An explanation proposed by Pisoni (1973, 1975) and Tartter (1981) is that this may be due to differences in cue duration. The essential acoustic cues for stop consonants are rapidly changing formant transitions and a brief noise burst (Liberman et al., 1957; Tartter, 1982; see also Sawusch, Nusbaum, & Schwab, 1980, Experiment 3). In contrast, vowels are assumed to remain uniform over a much longer duration (Delattre, Liberman, Cooper, & Gerstman, 1952). This difference in cue duration has an effect on the availability of auditory memory for these two classes of speech sounds.

Copyright 2004 Psychonomic Society, Inc. We thank Sieb Nooteboom for his thoughtful comments and suggestions. We are grateful to Theo Veenker for programming help. We also express our appreciation to Aad Houtsma, Cecile Kuijpers, and Frank Wijnen.
Correspondence concerning this article should be addressed to M. E. H. Schouten, Utrecht Institute of Linguistics OTS, Utrecht University, Trans 10, 3512 JK Utrecht, The Netherlands (e-mail: bert.schouten@let.uu.nl).

Note—This article was accepted by the previous editorial team, headed by Neil Macmillan.

Categorical perception depends on the discrimination task

E. GERRITS and M. E. H. SCHOUTEN
Utrecht University, Utrecht, The Netherlands

Speech sounds are said to be perceived categorically. This notion is usually operationalized as the extent to which discrimination of stimuli is predictable from phoneme classification of the same stimuli. In this article, vowel continua were presented to listeners in a four-interval discrimination task (2IFC with flankers, or 4I2AFC) and a classification task. The results showed that there was no indication of categorical perception at all, since observed discrimination was found not to be predictable from the classification data. Variation in design, such as different step sizes or longer interstimulus intervals, did not affect this outcome, but a 2IFC experiment (without flankers, or 2I2AFC) involving the same stimuli elicited the traditional categorical results. These results indicate that the four-interval task made it difficult for listeners to use phonetic information and, hence, that categorical perception may be a function of the type of task used for discrimination.
According to Pisoni (1973) and Fujisaki and Kawashima (1971), discrimination may be performed in an auditory mode or in a labeling mode (our terms). The auditory (or psychoacoustic) mode is supposed to use only bottom-up stimulus information, whereas in the labeling (or phonetic) mode a subject first categorizes the stimuli and then bases the discrimination decision on the categories to which he has assigned the stimuli. The short cue duration for consonants is responsible for the inferior performance of the auditory mode. Presumably, the decay of rapidly changing acoustic information is too fast to make an auditory comparison of consonantal stimuli possible, with the result that discrimination is performed in a phonetic mode. This is not the case for vowels, which are, consequently, perceived much less categorically. Discussing the results of Pisoni (1973), Schouten and Van Hessen (1992) suggested that this low degree of categorical perception of vowels could be due to the nature of the stimulus material. Up to then, the vowels used as stimulus material had been modeled on productions in isolated words. When produced in isolated words, vowels are lengthened (hyperarticulation). We hypothesized that, in running speech, temporal reduction and more complex spectral coding of the vowel would make vowel perception more categorical. To test this hypothesis, we intended to study the difference in perception between vowels spoken in isolated words and in a text read at a fast rate. However, as will be shown in the Results section, we obtained no categorical perception in either condition, so the original hypothesis could not be confirmed. There was no visible relationship between observed and predicted discrimination, and observed discrimination was actually worse than would be predicted from classification. The focus of our interest then shifted to the question of why we had failed to find even a hint of categorical perception.
The New Hypothesis: The Effect of the Discrimination Task on Categorical Perception

Great care had been taken over the choice of a discrimination task that would test the original hypothesis. If at all possible, we wanted to select a task that would leave a subject free to use both auditory and phonetic information and would not encourage the use of one type of information at the expense of the other—that is, would not "push" the subjects into a particular mode. We therefore examined the available tasks, speculating about subject behavior in each of them. In Experiment 1, only one task was used, so only one of these speculations about subject behavior could be tested.

The ABX and AXB Tasks

The prototypical discrimination test used for assessing categorical perception is the ABX task, in which each trial consists of three intervals and a subject has to decide whether stimulus X in the third interval is the same as A in the first interval or B in the second interval (Liberman et al., 1957). In view of the relatively short time span of auditory memory (200–300 msec), however, the rather high degree of categorical perception often found with the ABX task may, according to Massaro and Cohen (1983), reflect the exclusive use of phonetic memory. Subjects may try to remember both the auditory memory traces and the labels assigned to the A and B sounds. By the time X is presented, these auditory traces may have faded away. If they have, the subjects must rely on the labels they have assigned to A and B. This strategy may produce results indicating a high degree of categorical perception.
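The label-only strategy just described can be made concrete with a small Monte-Carlo sketch (ours, not the authors'; the classification probabilities are hypothetical). It models a listener who labels all three ABX intervals independently, answers by matching X's label to A's or B's, and guesses whenever the labels are uninformative:

```python
import random

def abx_by_labels(p_a, p_b, trials=100_000, seed=1):
    """Simulate ABX discrimination driven purely by phoneme labels.

    p_a, p_b: probability of labeling stimulus A (resp. B) as category 1.
    X is A or B with equal probability; the listener labels A, B, and X
    independently, answers "A" if X's label matches A's label, and
    guesses whenever A and B happen to receive the same label.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        x_is_a = rng.random() < 0.5
        lab_a = rng.random() < p_a
        lab_b = rng.random() < p_b
        lab_x = rng.random() < (p_a if x_is_a else p_b)
        if lab_a == lab_b:
            correct += rng.random() < 0.5          # labels uninformative: guess
        else:
            correct += (lab_x == lab_a) == x_is_a  # follow the matching label
    return correct / trials

# A pair straddling the category boundary is discriminated well ...
print(abx_by_labels(0.9, 0.1))   # approx. 0.82
# ... while a within-category pair stays near chance.
print(abx_by_labels(0.9, 0.8))   # approx. 0.5
```

The simulation reproduces the signature of categorical perception from labels alone: high accuracy only where the pair crosses a category boundary.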
Moreover, as B. Schouten, Gerrits, and Van Hessen (2003), using a signal detection analysis, have shown, the ABX task is subject to a very strong bias toward the response "B." In theory, this need not worry us, since a signal detection analysis should provide us with a clear separation between sensitivity and bias, but in practice, the greater the bias, the less likely the conditions for such an analysis are to be met. The use of a variant of the ABX task, AXB, in which the second stimulus is identical to either the first or the third one and is close in time to both, has yielded contradictory results. Van Hessen and Schouten (1999) reported that their subjects often ignored the third stimulus, thus annulling the expected advantage over ABX. Gerrits (2001, pp. 42–49), however, found considerable differences between AXB and AX discrimination.

The 2I2AFC Task

Similar problems hold for another paradigm: two-interval two-alternative forced-choice (2I2AFC) discrimination. In the 2I2AFC paradigm, the stimuli are always different, and the subject has to determine the order in which they are presented (AB or BA). This makes it necessary to explain to the subjects what the term order means and even to mention the phoneme categories in the instructions, at the risk of encouraging labeling behavior (Schouten & Van Hessen, 1992). Response bias is much smaller than that in the ABX task; it favors a response that says "the first stimulus is closer to the phoneme prototype than the second" (B. Schouten et al., 2003).

The AX Task

To avoid strategies that rely exclusively on labeling, we need a task that reduces the load on auditory memory. Such a task could be AX (same–different) discrimination. In an AX discrimination experiment, the subject has to determine whether the two stimuli in a trial are the same or different. A disadvantage of this paradigm is that subjects may decide to respond "different" only if they are very sure of their decision.
This means that AX may be strongly biased: A subject's response could be completely dominated by a subjective phoneme-based criterion, very close to one end of a scale between same and different.

The 4IAX and 4I2AFC Tasks

A discrimination test that is regarded as sensitive to auditory differences between speech stimuli is the 4IAX task (Pisoni, 1975). In the 4IAX test, two pairs of stimuli are presented on every trial; one pair is the same, and one pair is different (e.g., AB–AA, AA–BA, or BA–BB), and the
subject has to decide which pair contains the odd one out. The 4IAX task is assumed to be more sensitive to purely auditory cues, since a correct decision can be largely based on bottom-up auditory information and is thought not to be subject to strong top-down skewing by subjective criteria, such as phoneme boundaries. However, we decided not to use a 4IAX task, but a task in which important aspects of the 2I2AFC and the 4IAX tasks are combined: the 4I2AFC task (see, e.g., Heller & Trahiotis, 1995; Trahiotis & Bernstein, 1990). In this task, a subject is expected to be free to use both auditory processing and phoneme labeling. The A and B stimuli are presented randomly in the two possible orders AABA or ABAA, with a 50% a priori probability. Stimulus A at the beginning and end of each quadruplet functions as a reference. The subject has to decide whether the odd one out occurred in the second or in the third interval. The flanking stimuli are there to make direct auditory comparisons of the stimuli easier, and they may also make a low-bias 4IAX type of strategy possible, in which two differences are compared. On the other hand, the 2I2AFC aspect (order detection: AB or BA, leaving the two flanking A stimuli out of consideration) may encourage labeling. This task, therefore, should be a useful diagnostic instrument for determining whether and to what extent categorical perception occurs and was the focus of interest in Experiment 1. The aim of this article is to find out whether the 4I2AFC task really does allow both auditory and phonetic processing.
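The quadruplet structure of such a trial can be sketched in a few lines (our illustration; function and stimulus names are hypothetical, not taken from the authors' experiment software):

```python
import random

def make_4i2afc_block(a, b, n_trials=64, seed=7):
    """Build one block of 4I2AFC trials for the stimulus pair (a, b).

    Each quadruplet is AABA or ABAA with 50% a priori probability;
    stimulus `a` flanks every trial in the first and last intervals,
    and the correct answer is the interval (2 or 3) that holds the
    odd one out, `b`.
    """
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        if rng.random() < 0.5:
            trials.append({"intervals": (a, b, a, a), "correct": 2})  # ABAA
        else:
            trials.append({"intervals": (a, a, b, a), "correct": 3})  # AABA
    return trials

block = make_4i2afc_block("stim4", "stim5")
print(len(block))                                                # 64
print(all(t["intervals"].count("stim5") == 1 for t in block))    # True
```

Note how the reference stimulus occupies three of the four intervals, so a listener can compare two candidate differences, as in 4IAX, or treat the middle two intervals as a 2I2AFC order judgment.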
EXPERIMENT 1

Method

Stimuli. The stimulus material consisted of two continua of eight vowel stimuli ranging from /u/ to /i/ in a /pVp/ context. The vowels /u/ and /i/ were selected because it was expected that the differences between the two speech conditions would be greater with these two vowels than with most other vowels, due to the relatively long articulatory trajectories required to reach them. The first step in stimulus generation was recording the vowels /u/ and /i/ in the meaningful words /pup/ and /pip/, produced both in isolation at a rather slow rate and in a text that was read aloud at a rapid speech rate by a male native speaker of Dutch. The speaker was instructed to read the text three times at an increasing speech rate. The third recording was selected, since it was the most rapidly read version, measured as the total amount of reading time for the whole text. In each of the two conditions, there were nine repetitions by the speaker of the /pup/ and /pip/ words. Five phonetically untrained listeners identified 30-msec segments from these vowels in an open-set identification task (12 monophthongs). The vowels spoken in isolated words, or word vowels, were significantly more often identified as /u/ and /i/ than were the vowels from the fast text, or text vowels (65% vs. 32% for /u/ and 42% vs. 7% for /i/). Other frequently used response categories were /o/ for /u/ and /y/ for /i/. All the words were rated on a 7-point acceptability scale by a listening panel that consisted of five phoneticians. The word pairs that were used as endpoints in the two stimulus continua were selected on the basis of acceptability and of matching vowel duration within a pair. The acoustic differences between the word and the text vowels were determined with analyses of duration and formant frequency. The text vowels were temporally reduced, as compared with the more carefully articulated word vowels.
The duration of the word /u/ was 90 msec; its steady-state component was 60 msec. The duration of the text /u/ was 70 msec, with a steady-state component of 30 msec, a reduction of 22% and 50%, respectively. The duration of the isolated word /i/ was 90 msec; its steady-state component was 50 msec. And the duration of the text /i/ was 70 msec, with a steady-state component of 15 msec, a reduction of 22% and 70%, respectively. This temporal reduction is comparable to the reduction reported in the Dutch studies by Schouten and Pols (1979) and Van Son and Pols (1990). The temporal vowel reduction found by Schouten and Pols was 28%. The reduction of steady-state segment duration was, on average, 38%. Van Son and Pols found a temporal reduction of 15% between vowels in a text that was read at a normal speech rate and those in a text read as fast as possible. An analysis of the formant frequencies of the vowels in the two speech conditions was also attempted. Since formant extraction failed with the text vowels (no second formant for /u/ could be found, nor a third formant for /i/), it was impossible to quantify the degree of spectral reduction. The absence of these formants, however, suggests some loss of spectral detail in the text vowels. The stimuli in the continua between the original utterances were obtained by interpolation between the relative amplitudes of the spectral envelopes of the vowels. The interpolation method had been used successfully in studies on categorical perception by Schouten and Van Hessen (1992) and Van Hessen and Schouten (1992). This method was preferred to working in the formant domain: Since no second formant could be defined for text /u/, interpolating between the formants of the text vowels was impossible. Moreover, in this way, we avoided the risk that, after having listened for a while to stimuli in which only one or two parameters were varied, some subjects would learn to attend selectively to those parameters.
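The reduction percentages quoted above follow directly from the reported durations; a quick arithmetic check (illustrative only, the helper name is ours):

```python
def reduction_pct(word_ms, text_ms):
    """Percentage reduction of the text vowel relative to the word vowel."""
    return round(100 * (word_ms - text_ms) / word_ms)

# /u/: total duration 90 -> 70 msec, steady state 60 -> 30 msec
print(reduction_pct(90, 70), reduction_pct(60, 30))   # 22 50
# /i/: total duration 90 -> 70 msec, steady state 50 -> 15 msec
print(reduction_pct(90, 70), reduction_pct(50, 15))   # 22 70
```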
The experimental design was intended to motivate the listeners to focus on the speech signal as a whole. (Van Hessen & Schouten, 1999, have shown that there is an increase in categorical perception as synthesis quality improves from a simple synthesis by rule, via linear predictive coding (LPC) synthesis, to the more complex method used in the present study.) The importance of stimuli in which more than one parameter is varied was also mentioned by Liberman (1996), who predicted that, with proper synthesis, when the acoustic signal changes in all relevant aspects and not just one cue is varied, the discrimination functions will come much closer to being perfectly categorical. The first step in the interpolation method was an analysis of the spectral envelopes of the original vowels in terms of the phases and amplitudes of up to 70 spectral components between 80 and 5000 Hz, depending on spectral density. Before interpolation, the signal was split into a source spectrum and a filter spectrum by means of cepstral deconvolution. The spectral envelopes of the eight stimuli, obtained by means of seven linear interpolation steps between each of the 70 pairs of spectral components, were then reconvolved with the original source spectrum of the /u/. The interpolation was always done in overlapping 25.6-msec time frames over the full length of the vowel (frame shift was 6.4 msec). Parameters such as F0, duration, and voice quality remained constant. For more details of this procedure, see Schouten and Van Hessen (1992) or Van Hessen (1992). Stimulus generation resulted in two continua of eight stimuli that sounded completely natural and convincingly like utterances from the original speaker. In each continuum, the initial /p/ and the final /p/ of the stimuli were copied from the original word /pup/.
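The core of the interpolation step can be sketched as component-wise linear mixing (a toy illustration: five components stand in for the up-to-70 measured ones, and the cepstral source-filter separation and reconvolution are omitted):

```python
def interpolate_envelopes(env_a, env_b, n_stimuli=8):
    """Component-by-component linear interpolation between two spectral
    envelopes, yielding a continuum whose endpoints are the originals.
    Because every interpolated amplitude lies between the two endpoint
    amplitudes, no spectral peak can appear that is absent from both
    endpoints.
    """
    steps = n_stimuli - 1
    return [
        [(1 - k / steps) * a + (k / steps) * b for a, b in zip(env_a, env_b)]
        for k in range(n_stimuli)
    ]

# Toy envelopes: a low-frequency peak for /u/, a high-frequency peak for /i/
env_u = [1.0, 0.8, 0.2, 0.1, 0.0]
env_i = [0.2, 0.1, 0.3, 0.9, 1.0]
continuum = interpolate_envelopes(env_u, env_i)
print(len(continuum))                                  # 8
print(continuum[0] == env_u, continuum[-1] == env_i)   # True True
```

This boundedness property is exactly why, as the pilot experiment below reports, no intermediate /y/ percept can arise: the method only lowers and raises peaks already present in the endpoints.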
In a pilot experiment, the stimuli of the two continua were identified (open set) by a listening panel that consisted of five phoneticians, all well-trained listeners. The listeners' identification responses were always /u/ or /i/. In none of the cases were the stimuli identified as the Dutch vowel /y/, which might have been expected, because the F2 of this central vowel lies between those of /u/ and /i/. The absence of an intermediate /y/ is a result of the interpolation method, since it moves from one vowel to another by progressively lowering and raising spectral peaks. Formant peaks that do not occur in either endpoint cannot occur in any of the interpolated stimuli.
The fundamental frequency and duration of the stimuli were the same as those of the original /pup/. In the word vowel continuum, F0 was 120 Hz, and stimulus duration was 215 msec (vowel, 90 msec; steady state, 60 msec). The duration of the stimuli in the text vowel continuum was 187 msec (vowel, 70 msec; steady state, 30 msec), and F0 was 125 Hz.

Interstimulus interval. Since auditory memory is time dependent, it was important to make a considered decision about the interstimulus interval (ISI). If this interval exceeds the life span of auditory memory, all that will be left of the first stimulus will be a representation coding its relationship to the other stimuli in the experiment, to preestablished categories, or to both (Pisoni, 1973). Massaro (1972a, 1972b, 1974) tried to determine the time a sound pattern is held in some unanalyzed form. His results indicated that a processing time of approximately 250 msec is sufficient for recognition of a speech signal. These results were in agreement with those of Dorman, Kewley-Port, Brady, and Turvey (1977) and Plomp (1964). Cowan and Morse (1986), Pisoni (1973), and Van Hessen and Schouten (1992) tested the effect of varying the ISI on discrimination performance for speech sounds. On the basis of Massaro's (1974) results, within-category discrimination should decrease with an increasing interval, reflecting the fading of the memory trace. This is in agreement with the results of Van Hessen and Schouten (1992), who found a decrease of within-category stop-consonant discrimination with increases in ISI from 100 to 300 msec. The between-category results of the discrimination studies by Cowan and Morse (1986), Pisoni (1973), and Van Hessen and Schouten (1992) confirmed the notion that processing of the auditory signal is not terminated after 100–200 msec.
Their results indicated that when listeners use a labeling strategy to compare stimuli, discrimination improves as an effect of increasing ISI: Discrimination performance increases rapidly between 100 and 500 msec, reaches a maximum between 500 and 1,000 msec, and falls gradually as the ISI increases further. On the basis of these results, we assumed that, after an ISI of more than 100–200 msec, labeling processes would take over from direct auditory comparison. We did not want to use an interval shorter than 200 msec, since this might have increased the chance of mutual masking among the stimuli. In line with Massaro (1974), Pisoni (1975), and the within-category results of Van Hessen and Schouten (1992), we therefore decided to use 200-msec intervals, hoping that the required information would be available for direct auditory comparison of successive stimuli in a trial (200 msec was, of course, a nominal interval, indicating only the silences between successive stimuli; the vowels were additionally separated by the durations of two instances of /p/, which added a total of 117 and 125 msec to the intervals between the text and the word vowels, respectively).

Subjects. The subjects were 19 students at the Faculty of Arts at Utrecht University. They had no known hearing deficits and were all native speakers of Dutch. They were paid a fixed hourly rate.

Design. Experiment 1 consisted of six tests, three for each of two vowel continua, involving the same subjects. The tests were fixed discrimination, roving discrimination, and classification. The subjects took the tests in a fixed order, counterbalancing fixed and roving discrimination and the word and text vowel continua across subjects, but the classification tests were always performed after all the discrimination tests. The discrimination used in the experiment was the 4I2AFC task (AABA/ABAA; the subjects had to indicate whether Stimulus B occurred in the second or the third interval).
The stimuli in the second and third intervals always differed by one step along the continuum; the number of comparisons was, therefore, seven. The intertrial interval was determined by the response time. The ISI within a trial was 200 msec. In the fixed-discrimination experiment, only one stimulus pair (A and B) was presented during a block of trials. The fixed-discrimination test consisted of seven blocks, one for each of the stimulus pairs, which were clearly separated from each other. Each block contained 64 trials, 32 for each of the two possible combinations, AABA and ABAA. The order of blocks was randomized for the fixed-discrimination experiment. In the roving-discrimination experiment, the A and B stimuli to be discriminated were drawn randomly from the total range of stimuli and, thus, varied from trial to trial. In the roving-discrimination test, 7 × 64 trials were presented. In the classification test, all eight stimuli had to be identified 64 times in a random order. Classification involved a forced choice between two alternatives, the vowels /u/ and /i/. There was no response time limit.

Procedure. The stimuli were presented to the subjects over headphones in a sound-treated booth. In the discrimination tests, it was stressed that differences between the stimuli would be small and, in most cases, could be detected only by listening carefully to all details of a stimulus. No phoneme labels were mentioned in the instructions, but the subjects were told that three of the four stimuli were going to be identical and that the oddball was either the second or the third one. They responded by mouse clicking on one of two response fields (labeled "2" and "3") on a computer screen. After the response had been made, visual feedback of the correct answer was given, so that the subject was able to judge and possibly improve his or her performance.
Discrimination training consisted of 128 trials (responses to which were not stored) and was intended to familiarize the subjects with their task. In the fixed-discrimination context, the first 10 trials of every block were considered practice and were not included in the data analysis. In classification, one stimulus was played on each trial, and the subject had to identify it by mouse clicking on a response field labeled "oe" or "ie" (/u/ or /i/). The only training consisted of 16 trials, two repetitions of the eight stimuli in the continuum, presented randomly.

Results

The results of Experiment 1 are displayed in Figures 1–3 and in Table 1. Figures 1 and 2 show the classification and discrimination data for the word vowels and the text vowels, respectively. The data displayed in the figures represent the averages of 19 subjects' individual d′ scores (the individual subjects' data can be found on the World-Wide Web: www.let.uu.nl/~bert.schouten/personal/gerrits.htm). The numbers (n) along the abscissa refer to stimulus pairs, consisting of the (n) and (n + 1) stimuli; n is, therefore, a number between 1 and 7. The d′ score at Stimulus Pair 6, for example, represents the discrimination of Stimulus 6 and Stimulus 7. The stimuli in Pair 1 resemble /u/, and the stimuli in Pair 7 sound like /i/. The discrimination d′ scores were calculated by subtracting z(FA) from z(H) and dividing by √2 (Macmillan & Creelman, 1991, p. 121). The results of the classification test are presented as predicted discrimination scores. The transformation of the classification data into predicted discrimination was done as follows. For each pair of stimuli A and B and response alternatives /u/ and /i/, the proportion of /u/ responses to stimulus A (position n) was regarded as an estimate of p(H), and the proportion of /u/ responses to stimulus B (position n + 1) was taken as an estimate of p(FA). The classification d′ values were determined by subtracting z(FA) from z(H).
The values of p(H) and p(FA) were limited to the .99–.01 range, which
meant that the maximum d′ values that could be obtained were 4.65/√2 = 3.29 for discrimination and 4.65 for classification. Figures 1–3 show that there was not much difference between the two vowel conditions. We expected the data in Figure 1 to be less categorical than those in Figure 2 and, hence, the d′ difference scores in Figure 3 to be much higher for word vowels than for text vowels. This clearly was not the case: The word vowels were not perceived less categorically than the text vowels. Neither figure shows anything like the expected relationship between observed and predicted discrimination, so we can conclude that there is no indication of categorical perception for either of the two vowel conditions.

The results of an analysis of variance (ANOVA) confirmed what is shown by the figures. Fixed independent variables were task (three levels), vowel condition (two levels), and stimulus (seven levels, nested under vowel condition). Cell variance was over 19 subjects. Performance was not significantly affected by vowel condition [F(1,41) = 0.12, p = .731]. The effect of task was significant [F(2,41) = 14.62, p < .01]. There was also a significant effect of stimulus and a significant interaction between task and stimulus [F(6,41) = 5.58, p < .01, and F(12,41) = 4.56, p < .01, respectively]. A Newman–Keuls test on the task factor revealed a significant difference between the means for roving discrimination and classification [F(795,2) = 12.45, p < .01]. A Newman–Keuls test on the stimulus factor (word vowel continuum) showed a significant peak for fixed discrimination at Stimulus Pairs 1 and 2 and a significant peak for classification at Pair 3. A similar test on the data from the text vowel continuum revealed a significant peak in the classification curve at Stimulus Pairs 3, 4, and 5.

Figure 1. Predicted discrimination (classification) and actually measured discrimination (fixed and roving) for the word vowels.
4I2AFC, four-interval two-alternative forced choice.

Figure 2. Predicted discrimination (classification) and actually measured discrimination (fixed and roving) for the text vowels. 4I2AFC, four-interval two-alternative forced choice.
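The d′ computations described in the Results section can be sketched with the standard library's inverse normal CDF (a minimal sketch; the clamping range and the √2 correction are from the text, the function names and the example proportions are ours):

```python
from statistics import NormalDist

Z = NormalDist().inv_cdf          # z-transform: inverse standard-normal CDF

def clamp(p, lo=0.01, hi=0.99):
    """Limit proportions to the .01-.99 range so the z-transform stays finite."""
    return min(max(p, lo), hi)

def observed_dprime(h, fa):
    """Observed 4I2AFC discrimination: d' = [z(H) - z(FA)] / sqrt(2)."""
    return (Z(clamp(h)) - Z(clamp(fa))) / 2 ** 0.5

def predicted_dprime(p_u_n, p_u_n1):
    """Predicted discrimination for pair (n, n+1): the proportion of /u/
    responses to stimulus n serves as H, that to stimulus n+1 as FA."""
    return Z(clamp(p_u_n)) - Z(clamp(p_u_n1))

# The ceilings quoted in the text: H = .99, FA = .01
print(round(observed_dprime(0.99, 0.01), 2))    # 3.29
print(round(predicted_dprime(0.99, 0.01), 2))   # 4.65
```

The two printed ceilings reproduce the maxima given in the text (3.29 for discrimination, 4.65 for classification).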
The two vowel conditions were compared in two different analyses: a calculation of difference scores and a calculation of a categorical perception (CP) index, a measure of the degree of categorical perception. In Figure 3, the two vowel conditions are compared by calculating the difference scores between obtained discrimination and discrimination predicted by the classification data for each stimulus pair (as in Pisoni, 1975, but here in terms of averaged individual d′ differences). If text vowels are perceived more categorically than word vowels, we should expect a smaller difference between the obtained and the predicted functions for the former condition than for the latter (if perception is fully categorical, the difference scores should be zero). A series of paired-samples sign tests on the d′ difference scores between classification and discrimination in Figure 3 revealed that there were only two stimulus pairs with a significant effect of vowel condition: Stimulus Pairs 3 and 5. At Stimulus Pair 3, the difference between obtained and predicted discrimination of word vowels was higher than it was for text vowels, and at Stimulus Pair 5 the opposite occurred: The difference between obtained and predicted discrimination of text vowels was higher than that for the word vowels. Difference scores estimate the difference in degree of categorical perception between word and text vowels. However, in an ANOVA or sign test, only means are compared, and not the shapes of the various functions. Therefore, another criterion for degree of categorical perception has been proposed, the so-called CP index introduced by Van Hessen and Schouten (1999). Van Hessen and Schouten (1999) have shown that the CP index can be used to estimate differences in categorical perception between various stimulus synthesis modes.
They calculated the CP index as follows:

CP = [r / (1 + 0.2 × mean|d′_class − d′_discr|)] × 100.    (1)

In this equation, CP is the degree of categorical perception ranging from 0 to 100 (or to −100, in the case of negative correlations), r is the coefficient of correlation between classification and discrimination, and the denominator contains a term determined by the averaged absolute differences between the data points of the classification and the discrimination functions, multiplied by a constant chosen in such a way that the full range can be used. This equation expresses the degree of categorical perception as a function of the resemblance (numerator) and proximity (denominator) of the two functions. In Table 1, the CP indices, averaged over the 19 subjects, are shown for the word vowels versus the text vowels. Our hypothesis that word vowels are perceived less categorically than text vowels would predict that the CP index of the word vowels should be lower than that of the text vowels. This was not really the case: All CP indices were effectively zero. Across the board, the CP indices presented the same picture as the difference scores: There was no indication of categorical perception for either of the two vowel conditions.

Discussion

The lack of any clear correlation between the discrimination and the classification results is counterintuitive to anyone who has ever spent any time comparing classification and discrimination within a categorical perception context and makes it impossible to evaluate the original hypothesis about categorical perception of vowels. Moreover, not only was there no evidence of the expected relationship, but also the subjects did not even manage to detect any differences between stimuli to which, during classification, they gave, fairly consistently, different phonetic labels. There appears to be something about the 4I2AFC discrimination task that prevents subjects from using phonetic information. If this is true, it could point to two different perceptual strategies: an auditory comparison of stimulus information during discrimination and phonetic labeling during classification. In other words, during discrimination, listeners are in an auditory mode and, during classification, in a phonetic mode. Such a conclusion would be almost the exact opposite of our prediction.

Figure 3. Difference scores between the obtained and the predicted data for each stimulus pair in Figures 1 and 2. The dashed lines represent the difference scores for the word vowels; the solid lines represent the difference scores for the text vowels.

Table 1
Mean Categorical Perception Indices and Standard Errors of the Mean for Four-Interval Two-Alternative Forced-Choice (4I2AFC) Discrimination of Word Vowels and Text Vowels

                           Fixed          Roving
Vowels          Task      M     SE      M     SE
Word vowels     4I2AFC    7      8      1      9
Text vowels     4I2AFC    6      9      1     10
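Equation 1 can be expressed as a small function (a sketch: the scaling constant c is the authors' free parameter, and c = 0.2 is an assumed value here, not independently confirmed; r is the Pearson correlation between the two d′ functions, as the verbal description indicates):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def cp_index(d_class, d_discr, c=0.2):
    """CP index (roughly -100..100): correlation between the classification
    and discrimination d' functions (resemblance), damped by their mean
    absolute separation (proximity).  The constant c is assumed, not
    taken from the paper."""
    r = pearson_r(d_class, d_discr)
    sep = sum(abs(a - b) for a, b in zip(d_class, d_discr)) / len(d_class)
    return r / (1 + c * sep) * 100

peaked = [0.5, 1.0, 3.0, 1.0, 0.5]   # classification d' with a boundary peak
flat = [1.0, 1.2, 1.0, 1.2, 1.0]     # discrimination unrelated to the peak
print(round(cp_index(peaked, peaked)))   # 100: identical functions, fully categorical
print(round(cp_index(peaked, flat)))     # small magnitude: little resemblance
```

Identical classification and discrimination functions give an index of 100 (full categorical perception), whereas a flat, weakly correlated discrimination function stays near zero, which is the pattern Table 1 reports.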