A PARALLELOGRAM MODEL OF TIMBRE ANALOGIES

Thesis for the Degree of M.A.
MICHIGAN STATE UNIVERSITY
DAVID EDWARD EHRESMAN
1977

ABSTRACT

A PARALLELOGRAM MODEL OF TIMBRE ANALOGIES

By

David Edward Ehresman

Recent multidimensional scaling studies have found that, of the acoustical properties on which tones produced by musical instruments differ, two or three are important in the perception of timbre. These findings were replicated and the dimensions were used as the basis for a parallelogram model of timbre analogies.

Fifteen naturalistic tones were synthesized using an analysis based additive synthesis technique. The complex time varying amplitude and frequency/phase functions obtained during the analysis step were simplified by replacing them with straight line segment approximations during the synthesis step. Five musically sophisticated and five musically untrained subjects rated the dissimilarity of all possible pairs of the 15 tones. The multidimensional scaling of these data was interpreted in two dimensions. The most potent dimension was interpreted in terms of the spectral energy distribution of the tones. The second dimension was related to the attack portion of the tones, that is, whether the onset of the higher harmonics was synchronous or asynchronous. These interpretations agree with previous research.

The parallelogram model of timbre analogies is based on a mapping of the stimulus tones onto a multidimensional space. The model assumes that for an analogy of the form A:B::C:? there is some ideal analogy point, I, that will complete a parallelogram. The prediction of the model is that in solving timbre analogies a subject will choose the alternative, D, which is closest to I in the multidimensional space. Three alternatives to the parallelogram model were tested.
One possibility is that the subjects are unable to use the directional information that is the basis of the parallelogram model. In this situation, a subject might choose as the best solution to the analogy the alternative D which is most similar to B, the terminal tone of the first half of the analogy. A second possibility is that subjects use only the most salient dimension in solving the analogies, projecting the parallelogram onto that axis. This model predicts that subjects will choose as the best solution the alternative D which is closest to the ideal analogy point in that one dimension. The third alternative model is a combination of the first two. It predicts that the similarity of the terminal tones along the one dimension is the basis for choosing the best solution.

Using the 15 tones from the first experiment, forty timbre analogies were formed; each analogy had four alternative solutions. Nine subjects from the scaling phase of the study rank ordered the four alternatives as to which best completed the timbre analogy. The parallelogram model best predicted subjects' solutions to the timbre analogies. The effects of musical training were not reflected in either the scaling solution or performance on the analogy task.

A PARALLELOGRAM MODEL OF TIMBRE ANALOGIES

By

David Edward Ehresman

A THESIS

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

MASTER OF ARTS

Department of Psychology

1977

ACKNOWLEDGMENTS

I would like to express my appreciation to Dr. James Zacks, Dr. Gordon Wood, and Dr. William Hartmann for their help in serving as members of my thesis committee. I am also very grateful to Dr. David Wessel, my committee chairman, whose guidance, assistance, and inspiration made this work possible. My special thanks go to Dr. Judy Frankmann and Joey Mazzella for their editorial assistance and to my wife, Mary Anne, for her understanding support.
Finally, I would like to thank Toni Tryon for her willing help in preparing this manuscript.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
    Timbre
    Analogical Reasoning
SYNTHESIS OF STIMULI
EXPERIMENT 1: DISSIMILARITY JUDGMENTS OF NATURALISTIC TONES
    Stimuli
    Procedure
    Subjects
    Results and discussion
EXPERIMENT 2: A TEST OF THE PARALLELOGRAM MODEL
    Stimuli
    Procedure
    Subjects
    Results and discussion
CONCLUSION
REFERENCES
APPENDIX

LIST OF TABLES

TABLE
1    KYST stress values.
2    INDSCAL goodness-of-fit correlations.
3    Rank order data averaged over all subjects and all analogies.
4    Rank order data for musically sophisticated subjects.
5    Rank order data for musically untrained subjects.
6    Rank order data from analogies of the form A:C::B:D.
7    Rank order data from analogies of the form A:B::C:D.

LIST OF FIGURES

FIGURE
1    Oboe 1 (O1) amplitude envelope.
2    Oboe 2 (O2) amplitude envelope.
3    Clarinet 1 (C1) amplitude envelope.
4    Clarinet 2 (C2) amplitude envelope.
5    English horn (EH) amplitude envelope.
6    Bassoon (BN) amplitude envelope.
7    Flute (FL) amplitude envelope.
8    Saxophone 1 (X1) amplitude envelope.
9    Saxophone 2 (X2) amplitude envelope.
10   Saxophone 3 (X3) amplitude envelope.
11   French horn (FH) amplitude envelope.
12   Trumpet (TP) amplitude envelope.
13   Cello 1 (S1) amplitude envelope.
14   Cello 2 (S2) amplitude envelope.
15   Cello 3 (S3) amplitude envelope.
16a  Composite Shepard diagram for five musically sophisticated subjects.
16b  Composite Shepard diagram for five musically untrained subjects.
17   Two dimensional subspace of the INDSCAL three dimensional weight space.
18   INDSCAL two dimensional weight space.
19   INDSCAL two dimensional timbre space.
20   INDSCAL X coordinates versus the first moment of the average amplitude of the harmonics.
21   INDSCAL Y coordinates versus a weighted standard deviation of the onset times of the upper harmonics.
22   A two dimensional subspace of Grey's (1975) three dimensional INDSCAL timbre space.
23a  Two dimensional KYST timbre space for musically sophisticated subjects.
23b  Two dimensional KYST timbre space for musically untrained subjects.
24   ID distance versus the proportion of subjects who ranked an alternative as the best solution.
25   ID distance versus the logarithm of the proportion of subjects who ranked an alternative as the best solution.
26   Observed versus predicted proportion of subjects who ranked each alternative as the best solution.

INTRODUCTION

The spectral energy distribution of a complex tone and the temporal properties of its attack provide major cues in the perception of its timbre. Recent multidimensional scaling studies report timbre spaces with these properties as salient dimensions. A parallelogram model of timbre analogies is based on such a multidimensional timbre space. The assumption is made that for an analogy of the form A:B::C:?, where A, B, and C are points in the timbre space, there is some ideal analogy point, I, that completes a parallelogram. This model predicts that the alternative which is closest to this ideal analogy point will be chosen as the best solution to the analogy.

Timbre

Timbre is not a clearly defined term; it is usually described negatively, in terms of the qualities that are left after loudness and pitch have been determined. The definition given by the American Standards Association (1960) is such a definition by exclusion: "Timbre is that attribute of auditory sensation in terms of which a listener can judge that two sounds similarly presented and having the same loudness and pitch are dissimilar." "Similarly presented" presumably implies that the stimuli are of equal duration and have the same spatial location.

The classical theory of timbre perception originated with Helmholtz.
His argument, based on Ohm's Acoustical Law (Ohm, 1843), stated that differences in timbre are a result of the presence and the strength of harmonics in the tone and that the phase relationships among the harmonics make little difference (Helmholtz, 1877/1954). It is important to note that Helmholtz studied only the steady-state portion of complex tones, choosing to ignore the dynamically changing phenomena which are characteristic of naturalistic auditory tones.

Helmholtz was able to show that the steady-state portion of musical and vocal sounds is composed of sets of harmonics and that the ear can distinguish a number of these harmonics individually. After Ohm, he reasoned that the ear performs a Fourier analysis on a tone and thereby identifies the amplitude pattern of the resulting series of harmonically related sinusoids which forms the basis of timbre judgment.

A modification of Helmholtz's classical theory makes use of the notion of formant regions. A formant is a frequency range in which the amplitudes of the harmonics are considerably higher than the amplitudes of the harmonics in the neighboring regions; a formant shows up as a peak in the spectral envelope. This model, which originated in speech perception research, contends that it is the formants, not the harmonics, that provide the major cues in timbre judgments (Fletcher, 1934; Bartholomew, 1945; Slawson, 1968).

The classical theories of timbre dealt only with steady-state tones. Musical tones, however, are usually considered to consist of three segments: (1) the attack, the portion of the tone in which it builds in amplitude, (2) the steady-state, the portion which is reached at the end of the attack, in which the tone remains stable, and (3) the decay, the portion of the tone in which its amplitude decreases until it has finished sounding.
An important question is to what extent the perception of timbre is determined by the attack and decay portions of a tone, where many dynamic changes occur in the spectral distribution.

Several studies indicate that the acoustical details in the attack transient are very important for the identification of the instrument which produced a tone. In one of these studies (Saldanha and Corso, 1964), trained musicians identified the instrument which produced a tonal stimulus when the tone consisted of (1) the initial transients and a short steady-state (1/3 sec.), (2) the initial transients, a short steady-state, and the decay transients, (3) the initial transients, a long steady-state (9 sec.), and the decay transients, (4) a short steady-state without the characteristic attack or decay, or (5) a short steady-state and the decay transients. A tone for each condition was obtained from eight wind instruments and two string instruments. The absence of the decay transients (Group 1 vs. Groups 2 and 3; Group 4 vs. Group 5) appeared to have minimal effect on the ability to recognize the instrument (i.e., label the timbre). However, the absence of the characteristic attack transient (Groups 1, 2, and 3 vs. Groups 4 and 5) was detrimental to performance.

Berger (1964) found similar results for 10 wind instruments. He asked musicians to identify the instrument which produced a tone when the stimulus consisted of (1) the unaltered tone, (2) the steady-state portion of the tone, or (3) the fundamental component of the tone with the harmonics filtered out. As expected, the unaltered tones were easier to identify than those without the transients, which in turn were easier to identify than those with the harmonics filtered out. Wedin and Goude (1972) also noted that attack transients were important in the identification of the timbre of a tone.
In addition, Saldanha and Corso (1964) found that the timbre of tones played with vibrato, a temporally dynamic phenomenon, was easier to identify than the timbre of tones played without vibrato.

This work in timbre perception was valuable to investigators in the related field of music synthesis. With a growing awareness that temporal events play an important role in timbre perception, these researchers sought a way to synthesize naturalistic tones with dynamic attack transients. Tones were analyzed to determine the temporal envelope (time vs. amplitude) for several or all of the harmonics in the tone. Part or all of this information was then used to synthesize tones.

Strong and Clark (1967a, 1967b) determined the steady-state spectral envelope (frequency vs. amplitude) and several temporal envelopes for three subsets of harmonics associated with each of nine wind instruments. Using this information, they synthesized tones with the aid of a digital computer. Music students were able to identify the synthesized tones with 66% accuracy as compared to 85% accuracy for the natural tones. After observing the effects of systematically exchanging envelopes among instruments, Strong and Clark concluded that tones from some instruments had unique spectral envelopes and that for these tones the spectral envelope was more important for correct identification than was the temporal envelope. Tones from other instruments did not have unique spectral envelopes, and for these tones the temporal envelope was as important as or more important than the spectral envelope in the identification process.

Grey (1975) approached the synthesis problem somewhat differently. First, he determined the time varying amplitude and frequency functions for each of the harmonics present. Then he synthesized tones using all of the information to test the validity of the analysis (full data tones).
Next he systematically simplified the information used in the synthesis to determine the effect of particular types of information. The simplifications he used were: (1) representing the complex time variant amplitude and frequency functions with small numbers of straight line segments (line segment approximation tones), (2) excluding any clearly delineated initial attack segments which contained low-amplitude inharmonicities (cut attack tones), and (3) substituting constant frequencies for the time variant frequency functions while retaining the time variant line segment approximations for the amplitude functions (constant frequency tones).

Grey (1975) tested for discriminability between the five types of tones, i.e., the original tones in digitized form, the full data tones, and the three types of simplified tones; he also asked his musically trained listeners to rate the subjective distances between tones. The results of both of these measures indicated that the full data tones were an adequate representation of the original tone. Verbal reports of the listeners indicated that differences between the original and the full data tones were difficult to detect, and when detected, the stimuli were described as tones from the same instrument played with a different articulation or style of playing. The line segment approximation tones were very similar to the full data tones; however, the constant frequency tones and the cut attack tones were too discriminable to be of general use. Thus, it appears that perceptually convincing renditions of naturalistic tones can be obtained when the details of amplitude and frequency variation in each harmonic are approximated with relatively few linear segments.

Several studies have used multidimensional scaling (MDS) techniques (Shepard, 1962a, 1962b; Kruskal, 1964a, 1964b; Carroll and Chang, 1970) to gain a better understanding of timbre perception.
Using matrices of dissimilarities (or similarities) between objects, these MDS techniques attempt to represent the dissimilarities as distances between points in an n-dimensional space (usually a two or three dimensional Euclidean space). The resulting "picture" can be viewed in two ways: (1) as a very useful data reduction suggesting hypotheses for new lines of research, or (2) more speculatively, as a model of the way the objects are perceived.

Plomp and Steeneken (1969) were the first to use MDS as a model of timbre perception. However, their study used only steady-state tones. Tones which included the transients were used by Wessel (1973), who had music students rate the dissimilarity of tones played on nine orchestral instruments. Using MDS as a data reduction tool, he embedded the instruments in a two dimensional Euclidean space. One dimension differentiated among timbres by the distribution of energy in the steady-state region of the tones. The energy of the tones on one end of this dimension was located predominantly in the lower harmonics, while tones on the other end had more energy located in the higher harmonics. The second dimension was more difficult to interpret using a single physical characteristic of the tones. The tones tended to be grouped by family (i.e., brass, woodwinds, and strings). This dimension appeared to be related to temporal properties of the tone, i.e., differences in the attack segment. Wessel and Grey (in press) also scaled the similarity judgments of nine instrument tones reported by Wedin and Goude (1972). The results were similar to those described above and strongly supported the important role of attack transients in providing perceptual distinctions among instruments.

Grey (1975) used MDS techniques to analyze the similarity of the 16 "line segment approximation" tones described earlier. Grey noted three ways in which these tones differed from the tones used by Wedin and Goude (1972) and Wessel (1973).
They were (1) shorter in duration, (2) synthesized naturalistic tones rather than actual natural tones, and (3) experimentally equalized for pitch, loudness, and perceived duration.

His data reduction yielded not two, but three dimensions. One dimension, the spectral energy distribution, paralleled the first dimension found in previous studies (Wessel, 1973; Wedin and Goude, 1972 as scaled by Wessel and Grey, in press). A second dimension reflected patterns in the onset-offset portion of the tones. At one extreme of this dimension all of the upper harmonics entered and exited at approximately the same time or with synchrony; at the other extreme the upper harmonics entered and exited gradually or asynchronously. This dimension also related fairly well to musical instrument families. The third dimension also focused on the attack segment of the tones. Tones were differentiated by the presence of high frequency, low amplitude, usually inharmonic energy during the attack segment as opposed to low frequency inharmonic energy, or at least the absence of high frequency inharmoniousness in the attack. Since two of Grey's dimensions were interpreted in terms of the attack transients, it appears that they are encompassed by Wessel's second dimension. The consistency of findings and the complementary nature of the results in these studies clearly pinpoint important attributes of tones which need to be taken into consideration in future studies of timbre.

Analogical Reasoning

Rumelhart and Abrahamson (1973) have presented an intuitively appealing theoretical model of analogical reasoning based on MDS techniques. They assume that the elements to be used in forming analogies have been embedded in a multidimensional space. Their model states that for an analogy of the form A:B::C:?, there is some "ideal analogy point" (Rumelhart and Abrahamson, 1973, p.
4), "I", that completes the analogy such that line segments connecting the four points in the multidimensional space will form the sides of a parallelogram. In other words, there is some vector, CI, which is parallel to and equal in length to the AB vector. The coordinates of this ideal analogy point, I, can be computed from the following formula:

\[ I(j) = C(j) + B(j) - A(j), \qquad j = 1, \ldots, n \tag{1} \]

where I(j), C(j), B(j), and A(j) refer to the coordinate on the jth dimension of points I, C, B, and A respectively, and n is the dimension of the multidimensional space.

This model, which will be referred to as the parallelogram model of analogical reasoning, predicts that for an analogy of the form A:B::C:D1, D2, D3, or D4, the probability that a particular alternative will be chosen as the best solution to the analogy is a monotonic decreasing function of the distance between that point and the point "I" (the ID distance) in the multidimensional space. Rumelhart and Abrahamson found support for this model by using a three dimensional space of animal names obtained from a scaling study by Henley (1969).

A second experiment reported in the same paper found further support for the model. Forming analogies from the same set of animal names they (1) replicated the first experiment, (2) showed that the probability of choosing a particular alternative did not depend on the particular analogy problem, provided that the distances between the ideal solution point and the alternative were approximately equal, (3) found support for the idea that the monotonic decreasing function which relates the probability that an alternative will be chosen as the best solution to the ID distance is an exponential one, and (4) found that the 2nd, 3rd, and 4th best solutions were also predicted by the distance between the alternative and the ideal analogy point.

An implicit assumption of the parallelogram model is that subjects are able to judge the similarity of the vectors involved in an analogy.
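As a rough illustration of Equation 1 and of the ID distance prediction, the following Python sketch computes the ideal analogy point and ranks four alternatives by their distance from it. The coordinates are invented for illustration and are not taken from any scaling solution; the function names are hypothetical, and the exponential choice rule with rate `alpha` is only one assumed form of a monotonic decreasing function.

```python
import math

def ideal_point(a, b, c):
    """Equation 1: I(j) = C(j) + B(j) - A(j) on every dimension j."""
    return [cj + bj - aj for aj, bj, cj in zip(a, b, c)]

def distance(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((pj - qj) ** 2 for pj, qj in zip(p, q)))

def rank_alternatives(a, b, c, alternatives):
    """Alternative names ordered from smallest to largest ID distance."""
    i = ideal_point(a, b, c)
    return sorted(alternatives, key=lambda name: distance(i, alternatives[name]))

def choice_probabilities(a, b, c, alternatives, alpha=1.0):
    """Assumed rule: p(D) is proportional to exp(-alpha * ID distance)."""
    i = ideal_point(a, b, c)
    weights = {n: math.exp(-alpha * distance(i, d)) for n, d in alternatives.items()}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}

# Hypothetical two dimensional coordinates (illustration only).
A, B, C = (0.0, 0.0), (1.0, 0.5), (0.2, -0.4)
ALTS = {"D1": (1.2, 0.1), "D2": (0.9, 0.6), "D3": (-0.3, 0.2), "D4": (1.5, 1.0)}

RANKING = rank_alternatives(A, B, C, ALTS)
PROBS = choice_probabilities(A, B, C, ALTS)
```

In this invented example D1 lies almost exactly on the ideal point, so it is ranked first and receives the largest choice probability under the assumed rule.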
If subjects are unable to appreciate the directional information implied by the concept of vectors, they may solve the analogies by choosing the alternative D (the endpoint of the CD vector) which is most similar to B (the endpoint of the AB vector). This alternative hypothesis will be referred to as the similarity of terminal tones model.

A second assumption is that subjects are able to use multidimensional information in solving an analogy. However, due to the complexity of the timbre analogy task, subjects may resort to using only the most potent dimension in solving an analogy. This possibility gives rise to two more alternative hypotheses. The first of these potent dimension hypotheses is based on the parallelogram model. It states that subjects project the parallelogram onto the most potent dimension and proceed as in the parallelogram model. This hypothesis predicts the choice of the alternative D which is closest to the ideal analogy point along the one dimension as the best solution. The second potent dimension hypothesis is based on the similarity of terminal tones hypothesis. The prediction of this model is that the alternative D which is closest to tone B along the potent dimension will be selected as the best solution to an analogy.

The purpose of the following two experiments is to test whether Rumelhart and Abrahamson's parallelogram model will predict subjects' choices of the best solutions to timbre analogies more accurately than the three alternative hypotheses. As discussed previously, recent MDS solutions of tones of different timbre (Grey, 1975; Wedin and Goude, 1972 as scaled by Wessel and Grey, in press; Wessel, 1973) have resulted in two or three interpretable axes. In Experiment 1, a scaling of 15 tones of different timbre gave rise to a two dimensional timbre space that is comparable to those found by other researchers.
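The four hypotheses differ only in the reference point (the ideal analogy point or tone B) and in the dimensions that enter the distance computation. The sketch below states each one as a distance to an alternative D. It assumes that the most potent dimension is stored at index 0; the function names, the model labels, and the coordinates are hypothetical and were chosen so that the four models disagree.

```python
import math

def _dist(p, q, dims):
    """Euclidean distance restricted to the listed dimensions."""
    return math.sqrt(sum((p[j] - q[j]) ** 2 for j in dims))

def model_distances(a, b, c, d, potent=0):
    """Distance that each hypothesis uses to score alternative d."""
    all_dims = range(len(a))
    ideal = [c[j] + b[j] - a[j] for j in all_dims]
    return {
        "parallelogram": _dist(ideal, d, all_dims),
        "terminal_tones": _dist(b, d, all_dims),
        "potent_parallelogram": _dist(ideal, d, [potent]),
        "potent_terminal_tones": _dist(b, d, [potent]),
    }

def predicted_choice(a, b, c, alternatives, model, potent=0):
    """Name of the alternative with the smallest distance under one model."""
    return min(
        alternatives,
        key=lambda name: model_distances(a, b, c, alternatives[name], potent)[model],
    )

# Hypothetical coordinates (illustration only); the ideal point is (1.5, 1.0).
A, B, C = (0.0, 0.0), (1.0, 0.0), (0.5, 1.0)
ALTS = {"D1": (1.4, 0.9), "D2": (1.05, 0.1), "D3": (1.5, -1.0), "D4": (1.0, 2.0)}

CHOICES = {
    m: predicted_choice(A, B, C, ALTS, m)
    for m in ("parallelogram", "terminal_tones",
              "potent_parallelogram", "potent_terminal_tones")
}
```

With these invented points each model selects a different alternative, which is the kind of analogy that can discriminate among the hypotheses.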
Experiment 2 used this scaling solution to test Rumelhart and Abrahamson's (1973) parallelogram model of analogical reasoning in the timbre domain.

SYNTHESIS OF STIMULI

In the following experiments, subjects were asked to make judgments about 15 tones with different timbres. The specific tones that were used were 15 of Grey's (1975) line segment approximation tones, as discussed in the Introduction. These tones were originally played on the following musical instruments: oboe (2 different instruments and players), English horn, bassoon, Eb clarinet, bass clarinet, flute, alto saxophone (2 tones from one instrument, played at p and mf), soprano saxophone, trumpet, French horn, and cello (3 tones from one instrument, played normally, muted sul tasto, and sul ponticello). All 15 tones were played near the pitch of Eb above middle C (approximately 311 Hz), at approximately the same loudness level, and with durations between 280 and 400 milliseconds. These tones were then analyzed by Grey using the heterodyne filter method (Moorer, 1973), which produces time variant amplitude and frequency/phase functions for each harmonic. The tones were then resynthesized by summing a set of harmonic sinusoids that were controlled in time by the amplitude and frequency functions obtained in the analysis stage. This process is called analysis based additive synthesis. The line segment approximation tones were synthesized by replacing the extremely complex time variant amplitude and frequency/phase functions with a small number of straight line segments and using these to control the harmonic sinusoids.

The tones used in this study were synthesized at Michigan State University using a specialized additive synthesis program implemented on a PDP-11/40 digital computer (See Appendix).
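The general idea of additive synthesis from line segment functions can be sketched in a few lines of Python with NumPy. This is not the program described in the Appendix; it is a minimal illustration that assumes each harmonic is specified by two lists of (time, value) endpoints, one for amplitude and one for frequency in Hz, and that phase is obtained by accumulating the instantaneous frequency. The function names and the two-harmonic example tone are hypothetical.

```python
import numpy as np

SAMPLE_RATE = 25_000  # samples per second, the rate reported for these tones

def envelope(breakpoints, t):
    """Straight line segments through (time, value) endpoints, evaluated at times t."""
    times, values = zip(*breakpoints)
    return np.interp(t, times, values)

def synthesize(harmonics, duration, sample_rate=SAMPLE_RATE):
    """Sum one sinusoid per harmonic.

    harmonics: list of (amplitude_breakpoints, frequency_breakpoints) pairs,
    with times in seconds and frequencies in Hz.
    """
    t = np.arange(int(round(duration * sample_rate))) / sample_rate
    wave = np.zeros_like(t)
    for amp_bp, freq_bp in harmonics:
        amp = envelope(amp_bp, t)
        freq = envelope(freq_bp, t)
        # Phase is the running sum of instantaneous frequency (an assumption
        # about how the frequency function controls the oscillator).
        phase = 2.0 * np.pi * np.cumsum(freq) / sample_rate
        wave += amp * np.sin(phase)
    return wave

# Hypothetical two-harmonic tone near 311 Hz, 300 ms long (illustration only).
TONE = synthesize(
    [
        ([(0.0, 0.0), (0.05, 1.0), (0.25, 0.8), (0.30, 0.0)], [(0.0, 311.0), (0.30, 311.0)]),
        ([(0.0, 0.0), (0.08, 0.5), (0.25, 0.3), (0.30, 0.0)], [(0.0, 622.0), (0.30, 622.0)]),
    ],
    duration=0.30,
)
```

Giving the second harmonic a later amplitude peak than the first, as in this example, is one simple way to produce an asynchronous onset of the upper harmonics.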
The data were supplied by John Grey in the form of break-point tables (i.e., a list of the endpoints of each line segment) of amplitude and frequency/phase functions for each harmonic of each of the tones. The values of the amplitude and frequency/phase functions (as determined from the break-point table) for each time period were used to control the sampling from a sinusoid for each harmonic, and these samples were summed to yield the waveform value for that time period. This process was repeated until the entire tone was determined and stored on a magnetic disk in digital form. Figures 1-15 display the waveforms of these tones. Time is on the X axis, amplitude is on the Y axis, and frequency (with the fundamental in the background) is on the Z axis. A sampling rate of 25,000 samples per second was used. During the experiments the tones were played via a 16 bit digital to audio converter (DAC), constructed by Three Rivers Computer Corporation (Kriz, 1975).

Figure 1. Oboe 1 (O1) amplitude envelope.

Figure 2. Oboe 2 (O2) amplitude envelope.

Figure 3. Clarinet 1 (C1) amplitude envelope.

Figure 4. Clarinet 2 (C2) amplitude envelope.

Figure 5. English horn (EH) amplitude envelope.

Figure 6. Bassoon (BN) amplitude envelope.

Figure 7. Flute (FL) amplitude envelope.

Figure 8. Saxophone 1 (X1) amplitude envelope.

Figure 9. Saxophone 2 (X2) amplitude envelope.

Figure 10. Saxophone 3 (X3) amplitude envelope.

Figure 11. French horn (FH) amplitude envelope.

Figure 12. Trumpet (TP) amplitude envelope.
Figure 13. Cello 1 (S1) amplitude envelope.

Figure 14. Cello 2 (S2) amplitude envelope.

Figure 15. Cello 3 (S3) amplitude envelope.

EXPERIMENT 1: DISSIMILARITY JUDGMENTS OF NATURALISTIC TONES

Stimuli

The 210 possible pairs of the 15 tones were formed and randomized for each subject.

Procedure

Subjects were asked to judge how dissimilar each pair of tones was. The tones were presented to subjects in a sound chamber via the DAC over a Philips 532 Motional Feedback loudspeaker. Subjects sat approximately two and one half feet from the speaker. The tones were presented in pairs; in order to hear a pair, the subject pressed a button switch which was interfaced with the PDP-11/40. The subject was allowed to listen to each pair as many times as he or she desired and then registered a dissimilarity judgment by using a linear potentiometer. Each judgment (the position of the potentiometer) was read by the PDP-11/40 via a DR11C interface when a second button switch was depressed, and was stored on a magnetic disk for later analysis. Subjects were given 20 practice trials to become familiar with the procedure.

Subjects

Five musically sophisticated and five musically untrained subjects were recruited from students and faculty at Michigan State University.

Results and discussion

The dissimilarity judgments were analyzed using two MDS programs, KYST (Kruskal, 1964a, 1964b; Young and Torgerson, 1967; Shepard, 1962a, 1962b; Torgerson, 1958) and INDSCAL (Carroll and Chang, 1970).
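For readers unfamiliar with how a dissimilarity matrix becomes a spatial configuration, the following Python sketch shows classical (Torgerson) metric scaling with NumPy. It is a stand-in for illustration only: it is neither KYST, which fits a monotone (nonmetric) relation, nor INDSCAL, which also estimates subject weights. Averaging the two presentation orders of each pair is an assumption made here, and the function names and the test configuration are hypothetical.

```python
import numpy as np

def classical_mds(dissimilarities, n_dims=2):
    """Embed objects in n_dims dimensions from a square dissimilarity matrix."""
    d = np.asarray(dissimilarities, dtype=float)
    d = 0.5 * (d + d.T)  # assumption: average the two orders of each pair
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double centering turns squared distances into scalar products.
    b = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)
    top = np.argsort(eigvals)[::-1][:n_dims]
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

def pairwise_distances(points):
    """Euclidean distances between all pairs of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical configuration of five objects (illustration only).
TRUE_POINTS = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [1.5, 1.5], [-1.0, 0.5]])
RECOVERED = classical_mds(pairwise_distances(TRUE_POINTS), n_dims=2)
```

When the input distances are exactly Euclidean, as in this example, the interpoint distances are recovered exactly, although the configuration may be rotated or reflected relative to the original.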
Data from the musically sophisticated subjects and from the musically untrained subjects were analyzed separately using the KYST program. Three, two, and one dimensional solutions were obtained for both sets of data. The goodness-of-fit of these solutions is assessed by stress measures, which are shown in Table 1. Large values of stress indicate a poor solution. For both groups, the goodness of fit for the three dimensional solution was not substantially better than that for the two dimensional solution. Stress for the one dimensional solutions was markedly higher than that for the two dimensional solutions. Therefore, the two dimensional solution seemed most appropriate.

Table 1. KYST stress values.

Number of                     Stress
Dimensions      Musically           Musically
                Sophisticated       Untrained
Three           0.21                0.22
Two             0.28                0.30
One             0.43                0.43

The Shepard diagrams (solution distances versus the data, fitted with the monotone regression line) for both two dimensional solutions are shown in Figure 16. Although the stress was within acceptable limits, there was still considerable scatter.

Figure 16a. Composite Shepard diagram for five musically sophisticated subjects.

Figure 16b. Composite Shepard diagram for five musically untrained subjects.

The goodness-of-fit measure for INDSCAL is the correlation between the data and the distances in the solution. Again the three dimensional solution was little better than the two dimensional solution (Table 2); however, the correlation did decrease when a one dimensional solution was obtained.

Table 2. INDSCAL goodness-of-fit correlations.

Number of
Dimensions      Correlation
Three           0.65
Two             0.62
One             0.54

An examination of the subjects' weight space for the three dimensional INDSCAL solution (Figure 17) yields yet another reason for choosing the two dimensional solution: Subject U4 was the only one to place any weight on the third dimension.
Since there was no substantial improvement in goodness-of-fit for the third dimension and since only one subject used the third INDSCAL dimension, the two dimensional solution seemed to be a more appropriate representation of the data.

Figure 17. Two dimensional subspace of the INDSCAL three dimensional weight space.

Figures 18 and 19 show the INDSCAL two dimensional subjects' weight space and group timbre space. The horizontal dimension of the timbre space closely corresponds to the first dimension found by Wessel and to Grey's Y dimension. At one extreme are the tones, from instruments such as the French horn and the cellos, which have most of their energy located in the lower harmonics. At the other extreme are the tones, such as those produced by the saxophones and oboes, which have more of their energy located in the higher harmonics. One can think of this as a mellow to bright continuum.

Figure 18. INDSCAL two dimensional weight space.

Figure 19. INDSCAL two dimensional timbre space.

The first moment of the average amplitude of the harmonics can be used as a quantitative measure of this energy distribution dimension. The average amplitude of the kth harmonic, AA(k), was computed as:

\[ AA(k) = \frac{1}{b - a} \int_{a}^{b} f(t)\, dt, \qquad (2) \]

where f(t) = the amplitude of the kth harmonic at time t, a = the start time of the tone, and b = the stop time of the tone. The first moment of the average amplitudes for a tone was calculated as:

\[ M = \frac{\sum_{k=1}^{n} k \, AA(k)}{\sum_{k=1}^{n} AA(k)}, \qquad (3) \]

where M = the first moment, k = the harmonic number, and n = the number of harmonics in the tone.
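As a rough illustration of equations 2 and 3, the following Python sketch computes the average amplitude of each harmonic from a straight-line-segment envelope and then the first moment of those averages. The function names, the breakpoint-list representation, and the example envelope values are assumptions made for illustration; they are not taken from the analysis programs used in this study. The sketch also assumes that each envelope is zero outside its breakpoints and that all breakpoints lie between the start and stop times of the tone.

```python
def average_amplitude(times, amps, a, b):
    """Equation 2: integrate a straight-line-segment envelope f(t) over the
    tone and divide by its duration (b - a).

    times, amps: breakpoint times and amplitudes (hypothetical representation).
    a, b: start and stop times of the tone.
    """
    area = 0.0
    for i in range(len(times) - 1):
        # Area of the trapezoid under one straight-line segment (exact).
        area += 0.5 * (amps[i] + amps[i + 1]) * (times[i + 1] - times[i])
    return area / (b - a)


def first_moment(avg_amps):
    """Equation 3: M = sum(k * AA(k)) / sum(AA(k)) for k = 1..n.

    avg_amps[k - 1] holds AA(k) for harmonic k.
    """
    weighted = sum(k * aa for k, aa in enumerate(avg_amps, start=1))
    return weighted / sum(avg_amps)


# Made-up envelopes for a two-harmonic tone lasting from t = 0.0 to t = 1.0.
envelopes = [
    ([0.0, 0.1, 1.0], [0.0, 1.0, 0.0]),  # harmonic 1
    ([0.0, 0.2, 1.0], [0.0, 0.5, 0.0]),  # harmonic 2
]
aa = [average_amplitude(t, y, 0.0, 1.0) for t, y in envelopes]
moment = first_moment(aa)
```

A larger value of the moment indicates that more of the energy lies in the higher harmonics, which corresponds to the bright end of the continuum described above.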
This measure of energy distribution is highly correlated (r = 0.85) with the X coordinates of the INDSCAL solution (Figure 20).

Figure 20. INDSCAL X coordinates versus the first moment of the average amplitude of the harmonics.

As in previous work, the second dimension appears to be determined by the attack portion of the tones. The onsets of the upper harmonics in the tones at one extreme of this dimension tend to be asynchronous; at the other extreme the onsets look much more synchronous. The tones at the asynchronous extreme, from instruments such as the clarinets and the saxophones, have a roughness or raggedness in the attack; the tones which are more synchronous in the onsets of the upper harmonics, created by instruments such as the trumpet and the bassoon, have a much sharper or clearer attack.

A weighted standard deviation of the start times of the upper harmonics, beginning with harmonic six, was chosen as a quantitative measure of this dimension. The standard deviation was calculated as:

\[ WSD = \sum_{k=6}^{n} \frac{\left(t(k) - \bar{t}\right)^{2} \, AMP(k)}{TOTAMP}, \qquad (4) \]

where WSD = the weighted standard deviation, t(k) = the onset time of the kth harmonic, t̄ = the mean start time of the upper harmonics, n = the number of harmonics in the tone, AMP(k) = the energy in harmonic k, and TOTAMP = the total amount of energy in the tone. The start of a harmonic was defined to be the time at which its amplitude was 1.5 (about 20 dB). These weighted standard deviations correlate highly (r = -0.78) with the Y coordinates of the INDSCAL solution (Figure 21).

A comparison of the two dimensional INDSCAL solution with a two dimensional projection of Grey's Y and X dimensions (Figure 22) reveals that these two scaling solutions are very similar. This supports the position that the scaling solution is a reasonable one.
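The onset measure of equation 4 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name and arguments are hypothetical, the mean start time is taken as the unweighted mean over the upper harmonics, and the sum is returned as the printed equation gives it. Because the measure is called a standard deviation, a square root may be intended, so the sketch exposes that as an option rather than deciding the question.

```python
import math


def weighted_onset_spread(onsets, energy, first_upper=6, take_root=False):
    """Weighted spread of the onset times of the upper harmonics (equation 4).

    onsets[k - 1]: onset time t(k) of harmonic k.
    energy[k - 1]: energy AMP(k) of harmonic k.
    first_upper:   first harmonic counted as "upper" (six in the text).
    take_root:     assumption switch; True returns the square root of the sum.
    """
    n = len(onsets)
    upper = range(first_upper, n + 1)
    # Unweighted mean start time of the upper harmonics (an assumption).
    t_bar = sum(onsets[k - 1] for k in upper) / len(upper)
    # TOTAMP: total energy in the tone, summed over all harmonics.
    totamp = sum(energy)
    spread = sum(
        (onsets[k - 1] - t_bar) ** 2 * energy[k - 1] for k in upper
    ) / totamp
    return math.sqrt(spread) if take_root else spread


# Made-up example: eight harmonics of equal energy, staggered upper onsets.
staggered = weighted_onset_spread([0, 0, 0, 0, 0, 10, 20, 30], [1.0] * 8)
# Upper harmonics that start together give a spread of zero.
together = weighted_onset_spread([0, 0, 0, 0, 0, 10, 10, 10], [1.0] * 8)
```

Under this reading, tones with ragged, asynchronous attacks yield larger values than tones whose upper harmonics start together.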
However, there is a puzzling discrepancy between Grey's interpretation of the synchronous-asynchronous dimension and the one just offered. While both studies obtained this dimension, Grey claimed that the woodwinds tend to be synchronous while the strings and brasses tend to be asynchronous. This study claims exactly the opposite, that the woodwinds tend to be asynchronous while the strings and brasses tend to be synchronous. It is important to note that Grey arrived at his interpretation by looking at spectrograms of his tones but failed to give any quantitative measure to support his interpretation.

Figure 21. INDSCAL Y coordinates versus a weighted standard deviation of the onset times of the upper harmonics.

Figure 22. A two dimensional subspace of Grey's (1975) three dimensional INDSCAL timbre space.

One possible problem with the quantitative interpretation provided here is that some of the harmonics have frequency glides. That is, at the start of the tones some frequencies are not at their proper harmonic values but glide to that value as the tone progresses. Thus a second way of defining the start of a harmonic is the time at which its frequency reaches some proportion of its proper harmonic value. Exploration with this definition of start time failed to provide as good a quantitative interpretation as did the amplitude definition of start time. The modified measure also failed to resolve the discrepancy, since the strings and brasses still tended to be synchronous while the woodwinds tended to be asynchronous. The start of a harmonic could also be defined to be that time at which its frequency reaches some proportion of its proper value and its amplitude exceeds some threshold.
Exploration with this measure also failed to equal or improve upon the original measure, and again the discrepancy remained unresolved.

As can be seen from the subjects' weight space shown in Figure 18, there were no systematic differences between the musically sophisticated subjects and those with no musical training. Three of the untrained subjects and the five musical subjects clustered together at a point which indicates that they were giving approximately equal weight to both dimensions. Subject U5 gave less weight to the first dimension than did the others, and subject U4 gave little weight to either dimension. This conclusion is further substantiated by the two KYST solutions (Figure 23). Not only are the stress values very nearly the same, but the solutions themselves are similar to each other. Note also that the two dimensional KYST solutions are roughly comparable to the INDSCAL solution. These convergent solutions serve to increase our confidence in the interpretations that have been provided.

Figure 23a. Two dimensional KYST timbre space for musically sophisticated subjects.

Figure 23b. Two dimensional KYST timbre space for musically untrained subjects.

EXPERIMENT 2: A TEST OF THE PARALLELOGRAM MODEL

Stimuli

In this experiment, Rumelhart and Abrahamson's (1973) parallelogram model of analogical reasoning was tested using the timbre space derived in Experiment 1. Twenty timbre analogies of the form A:B::C:D1, D2, D3, or D4 were formed as follows: The 15 tones were arranged in random order; the first three in this order were chosen as A, B, and C of the first analogy, the second three were used to form the second analogy, and so on. When the list was exhausted the tones were rerandomized and the procedure repeated until 20 analogies had been formed.
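The drawing procedure just described, together with the parallelogram construction that defines the ideal analogy point for an analogy A:B::C:?, can be sketched in Python as follows. The function names, the use of a seeded random generator, and the coordinates in the example are hypothetical and serve only as illustration; the sketch does not reproduce the program actually used to build the stimuli.

```python
import random


def draw_triples(tones, n_analogies, rng):
    """Shuffle the tones, take consecutive triples (A, B, C), and reshuffle
    whenever the list is exhausted, until n_analogies triples exist."""
    triples = []
    while len(triples) < n_analogies:
        order = list(tones)
        rng.shuffle(order)
        for i in range(0, len(order) - 2, 3):
            triples.append(tuple(order[i:i + 3]))
            if len(triples) == n_analogies:
                break
    return triples


def ideal_point(a, b, c):
    """Fourth vertex of the parallelogram on A, B, C: I = C + (B - A)."""
    return tuple(ci + bi - ai for ai, bi, ci in zip(a, b, c))


# Fifteen placeholder tone labels and 20 analogies, as in the text.
tones = [f"T{i}" for i in range(1, 16)]
analogies = draw_triples(tones, 20, random.Random(0))

# Made-up two dimensional coordinates for one triple.
point = ideal_point((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
```

Exchanging B and C leaves the computed point unchanged, since the sum C + B - A does not depend on their order.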
For each of the analogies thus formed, the coordinates of the ideal analogy point, "I", were calculated and the distances between each of the remaining 12 tones and "I" were computed. Four alternative solutions to the analogy were chosen such that each analogy had an alternative in each of the following ranges: 0.00-0.25 units from "I", 0.25-0.50 units from "I", 0.50-0.75 units from "I", and 0.75-1.00 units from "I". The units which were used are the ones produced by the INDSCAL program as shown in Figure 19. If it was not possible to choose alternatives to meet these conditions, that analogy was discarded and another one formed as above. If more than one tone fell within a given range, the one closest to the lower boundary was chosen. The four chosen alternatives were also ordered randomly for each subject.

If A is to B as C is to D, then it ought to be the case that A is to C as B is to D. In terms of the parallelogram model, the above two analogies are exactly the same parallelogram. Therefore, for each of the 20 analogies formed above, another analogy was formed which had the same components but had the second and third elements reversed. In other words, the analogies were of the form A:C::B:D1, D2, D3, or D4.

Procedure

The analogies were presented to subjects using the audio equipment described in Experiment 1. A trial consisted of the four alternative forms of an analogy (A:B::C:Dn, where Dn is one of the four alternative solutions). Subjects selected one of the alternative forms by depressing one of four button switches; each alternative form was randomly associated with one switch. Each subject listened to each alternative as many times as he or she wished. Subjects were asked to rank order each of the four alternatives as to how well it completed the analogy. The rank order was indicated by rearranging the order of the four button switches until they were in the appropriate order.
Subjects then pressed a fifth switch which signaled to the computer that the rank order was ready to be entered; subjects then pressed each of the four switches in the appropriate order, and this was read by the PDP-11/40 via the DR11C interface and stored on a magnetic disk for later analysis.

Subjects

Nine of the ten people who served as subjects in Experiment 1 served as subjects in this experiment. Subject M3 was unable to participate.

Results and discussion

The parallelogram model predicted subjects' responses on the timbre analogy task better than any of the alternative hypotheses. Of the alternative hypotheses, the potent dimension similarity of terminal tones model was the least satisfactory. This model predicts that the alternative D closest to the tone B along the potent dimension will be chosen as the best solution. However, the correlation between the one dimensional BD distance and the proportion of subjects who ranked that particular alternative as the best solution was quite low (r = -0.31).

The other two alternative hypotheses were slightly better. The potent dimension parallelogram model predicts that the alternative D which is chosen as the best solution to an analogy is the one with the shortest ID distance along the potent dimension. In this case, the one dimensional ID distances correlated poorly with the proportion of subjects who ranked that alternative D as the best solution (r = -0.39). The prediction of the similarity of terminal tones model is that the alternative D closest to tone B in the multidimensional space will be chosen as the best solution to an analogy. The correlation between the BD distances and the proportions of subjects choosing a particular D as the best solution was the same as for the previous model (r = -0.39).

The parallelogram model predicts that the probability of choosing a given alternative as the best solution is inversely related to the ID distance. Table 3 lists the proportion of responses, averaged over subjects and analogies, for which the Ith closest alternative to the ideal analogy point was ranked as the Jth best solution, where I is the row index and J is the column index. Column one of this table shows that the prediction was indeed fulfilled. In fact, the distance between an alternative and the ideal analogy point predicts not only the best solution, but the rank ordering of all four alternatives. Only one exception occurred (see column two).

Table 3. Rank order data averaged over all subjects and all analogies.

RANK DISTANCE                SUBJECT-ASSIGNED RANK
OF THE ALTERNATIVE     ---------------------------------
FROM I                    1        2        3        4
        1               0.422    0.303    0.156    0.119
        2               0.322    0.283    0.217    0.178
        3               0.169    0.267    0.358    0.206
        4               0.086    0.147    0.269    0.497

Figure 24 shows the scatter diagram of the ID distance versus the proportion of subjects who ranked that alternative D as the best solution. The product moment correlation coefficient is -0.52. This correlation is not too disappointing when one takes into consideration that the goodness of fit measure (r = 0.62) of the timbre space places a kind of ceiling on subsequent correlations derived from the distances, in a way analogous to the limit that reliability places on validity in testing theory.

Figure 24. ID distance versus the proportion of subjects who ranked an alternative as the best solution.

By assuming Luce's choice rule (Luce, 1959; Atkinson, Bower, and Crothers, 1965) the probability of choosing a given alternative as the best solution can be predicted. Luce's choice rule states that the probability of choosing any given alternative X_i from the set of alternatives X_1, ..., X_n is given by

\[ \Pr(X_i \mid X_1, \ldots, X_n) = p_i = \frac{v(d_i)}{\sum_{j=1}^{n} v(d_j)}, \qquad (5) \]

where d_i = the distance between X_i and I, and v(d_i) is a monotonically decreasing function of its argument.
The additional assumption was made that

\[ v(x) = \exp(-ax), \qquad (6) \]

where x and a are positive numbers. The exponential function was chosen for two reasons: (1) Shepard (1957) found a good fit to an exponential generalization function over a similarly derived space and (2) Rumelhart and Abrahamson found a good fit to an exponential function in their work with the parallelogram model. Taking the natural logarithm of both sides of equation 5 yields

\[ \ln p_i = \ln v(d_i) - \ln \Bigl[ \sum_{j=1}^{n} v(d_j) \Bigr]. \qquad (7) \]

Substituting exp(-ax) for v(x) (equation 6), we get

\[ \ln p_i = -a d_i - \ln \Bigl[ \sum_{j=1}^{n} \exp(-a d_j) \Bigr]. \qquad (8) \]

This function states that the parameter, -a, can be estimated by the slope of the regression line fit to the ID distance versus the natural logarithm of the proportion of subjects who ranked that alternative as the best solution (Figure 25). Alternatives which were never chosen as the best solution were eliminated while making this calculation. This yielded an estimate of -a = -1.33. This parameter was then used to predict the proportion of subjects who ranked each alternative as the best solution. Figure 26 shows the observed versus the predicted proportion of subjects who ranked each alternative as the best solution. Again this correlation (r = 0.52) is quite acceptable when compared to the goodness of fit measure.

Given that there is no systematic clustering in the INDSCAL subjects' weight space, it is not surprising that there are no systematic differences in the analogy judgments attributable to musical training. Table 4 gives the proportion of responses, averaged over the musical subjects and the analogies, for which the Ith closest alternative to I was ranked as the Jth best solution. Table 5 gives these same results averaged over the musically untrained subjects. As one would expect, there were no systematic differences between these two tables and their entries were highly correlated (r = 0.89).
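The choice-rule prediction of equations 5 through 8 and the slope estimate described earlier in this section can be sketched in Python as follows. The function names and the example distances are hypothetical. The regression is ordinary least squares on pooled points, which treats the normalizing term of equation 8 as a common intercept; that simplification is an assumption of this sketch.

```python
import math


def luce_probabilities(distances, a):
    """Equations 5 and 6: p_i = exp(-a * d_i) / sum_j exp(-a * d_j)."""
    weights = [math.exp(-a * d) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


def estimate_decay(distances, proportions):
    """Estimate a as minus the least-squares slope of ln(proportion) on
    distance (equation 8). Alternatives never chosen (proportion 0) are
    dropped, as in the text, because their logarithm is undefined."""
    pts = [(d, math.log(p)) for d, p in zip(distances, proportions) if p > 0]
    n = len(pts)
    mean_d = sum(d for d, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((d - mean_d) ** 2 for d, _ in pts)
    sxy = sum((d - mean_d) * (y - mean_y) for d, y in pts)
    return -sxy / sxx


# Made-up ID distances for the four alternatives of one analogy, using the
# estimate a = 1.33 reported in the text.
predicted = luce_probabilities([0.10, 0.30, 0.60, 0.90], 1.33)
```

The predicted proportions sum to one for each analogy and fall off with distance from the ideal point.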
If the analogies work equally well along both dimensions then the analogies of the form A:C::B:D should work as well as those of the form A:B::C:D. Table 6 and Table 7 summarize the data for these two forms of the analogies averaged over all subjects. Since the subjects' weight space (Figure 18) shows that subjects tend to put nearly equal weights on both dimensions, it is not surprising that these two tables are quite similar (r = 0.92).

Figure 25. ID distance versus the logarithm of the proportion of subjects who ranked an alternative as the best solution.

Figure 26. Observed versus predicted proportion of subjects who ranked each alternative as the best solution.

Table 4. Rank order data for musically sophisticated subjects.

RANK DISTANCE                SUBJECT-ASSIGNED RANK
OF THE ALTERNATIVE     ---------------------------------
FROM I                    1        2        3        4
        1               0.406    0.331    0.131    0.131
        2               0.325    0.294    0.219    0.162
        3               0.213    0.187    0.375    0.225
        4               0.056    0.187    0.275    0.481

Table 5. Rank order data for musically untrained subjects.

RANK DISTANCE                SUBJECT-ASSIGNED RANK
OF THE ALTERNATIVE     ---------------------------------
FROM I                    1        2        3        4
        1               0.435    0.280    0.175    0.110
        2               0.320    0.275    0.215    0.190
        3               0.135    0.330    0.345    0.190
        4               0.110    0.115    0.265    0.510

Table 6. Rank order data from analogies of the form A:C::B:D.

RANK DISTANCE                SUBJECT-ASSIGNED RANK
OF THE ALTERNATIVE     ---------------------------------
FROM I                    1        2        3        4
        1               0.389    0.344    0.167    0.100
        2               0.361    0.283    0.200    0.156
        3               0.189    0.239    0.367    0.206
        4               0.061    0.133    0.267    0.539

Table 7. Rank order data from analogies of the form A:B::C:D.

RANK DISTANCE                SUBJECT-ASSIGNED RANK
OF THE ALTERNATIVE     ---------------------------------
FROM I                    1        2        3        4
        1               0.456    0.261    0.144    0.139
        2               0.283    0.283    0.233    0.200
        3               0.150    0.294    0.350    0.206
        4               0.111    0.161    0.272    0.456

CONCLUSION

The principal dimension of the timbre space which was obtained in the first experiment replicated the results of previous studies. This not only supports the idea that a reasonable solution was obtained, but also increases one's confidence that the energy distribution provides an important cue in the perception of timbre.

The fact that the second dimension was interpreted in terms of the attack transient also agrees with previous research. However, the details of exactly what property (or properties) of the transient provides the perceptual cues are not totally clear. An experiment comparing three variations of stimulus tones could provide further insight about this problem.

This experiment involves three conditions. The first condition would repeat the scaling part of this study with the tones experimentally equalized for pitch, loudness, and duration. The tones scaled in the second condition would be synthesized by replacing the frequency functions used in the first condition with frequency functions which are exactly at the proper harmonic value, thus removing the frequency glides as perceptual cues. The tones in the third condition would be synthesized by replacing the amplitude functions of the original tones with trapezoidal functions that have the same average amplitude as the original function. In this condition, the amplitude envelope cannot provide any perceptual cues. A separate dissimilarity matrix and scaling solution would be found for each condition.
By comparing the scaling solutions from the fixed frequency and the trapezoidal amplitude tones to the original tones solution, one will be able to better understand what effect the amplitude variations and the frequency glides have on the perception of timbre. If the fixed frequency scaling solution corresponds more closely to the original solution than the trapezoidal amplitude solution does, one will be able to conclude that the perceptual cues provided by the amplitude envelope are more important in the perception of timbre than those cues provided by the frequency glides. This would be consistent with the synchronous-asynchronous interpretation of the second dimension. However, if the trapezoidal amplitude scaling solution corresponds more closely to the original solution than the fixed frequency solution does, one will conclude that the frequency glides provide important cues in the perception of timbre. This would require that the synchronous-asynchronous interpretation be modified or rejected.

Another method of evaluating the validity of the interpretations that have been given to timbre spaces is to manipulate directly the properties of the tone. If the centroid of the energy distribution and the variability of the onset times in the upper harmonics play the important role which this study suggests, it should be possible to synthesize tones that map onto a predetermined area of the timbre space. Such results would provide additional strong support for this interpretation.

Although there is still room for improvement, the parallelogram model predicts the solutions to timbre analogies more accurately than alternative models. Furthermore, the model accomplishes this even though the particular tones in the present study were only approximately equalized for pitch, loudness, and duration. Although one rarely hears precisely equalized tones in real life situations, a replication with tones that have been equalized experimentally is still desirable.
An even more important step would be to carry out an analogy experiment based on a more orderly timbre space. Assuming the interpretation of the timbre space is correct, it should be possible to construct a space where tones actually occur at the ideal analogy point. Thus it would be possible to get a better test of the parallelogram model.

These results open interesting and challenging avenues for composers and musicians. The concept of timbre analogies suggests that the idea of melodic transposition might now be extended from the domain of pitch to that of timbre.

REFERENCES

American Standards Association. American standard acoustical terminology (S1.1-1960). New York: American Standards Association, Inc., 1960.

Atkinson, R. C., Bower, G. H., and Crothers, E. J. Introduction to mathematical learning theory. New York: Wiley, 1965.

Bartholomew, W. T. Acoustics of music. New York: Prentice-Hall, Inc., 1945.

Berger, K. W. Some factors in the recognition of timbre. Journal of the Acoustical Society of America, 1964, 36, 1888-1891.

Carroll, J. D., and Chang, J. J. Analysis of individual differences in multidimensional scaling via an N-way generalization of "Eckart-Young" decomposition. Psychometrika, 1970, 35, 283-319.

Fletcher, H. Loudness, pitch and the timbre of musical tones and their relation to the intensity, the frequency and the overtone structure. Journal of the Acoustical Society of America, 1934, 6, 59-69.

Grey, J. M. An exploration of musical timbre. Unpublished doctoral dissertation, Stanford University, 1975.

Helmholtz, H. L. F. [On the sensations of tone as a physiological basis for the theory of music] (A. J. Ellis, trans.). New York: Dover, 1954. (Originally published, 1877.)

Henley, N. M. A psychological study of the semantics of animal terms. Journal of Verbal Learning and Verbal Behavior, 1969, 8, 176-184.

Kriz, J. S. A 16-bit A-D-A conversion system for high-fidelity audio research.
IEEE Transactions on Acoustics, Speech, and Signal Processing, 1975, 23, 146-149.

Kruskal, J. B. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 1964a, 29, 1-27.

Kruskal, J. B. Nonmetric multidimensional scaling: A numerical method. Psychometrika, 1964b, 29, 115-129.

Luce, R. D. Individual choice behavior: A theoretical analysis. New York: Wiley, 1959.

Moorer, J. A. The heterodyne filter as a tool for analysis of transient waveforms (Tech. Rep. CS-379). Stanford University, Computer Science Department, July 1973.

Ohm, G. S. Über die Definition des Tones, nebst daran geknüpfter Theorie der Sirene und ähnlicher tonbildender Vorrichtungen. Annalen der Physik und Chemie, 1843, 59, 513-565.

Plomp, R., and Steeneken, H. J. M. Effects of phase on the timbre of complex tones. Journal of the Acoustical Society of America, 1969, 46, 409-421.

Rumelhart, D. E., and Abrahamson, A. A. A model for analogical reasoning. Cognitive Psychology, 1973, 5, 1-28.

Saldanha, E. L., and Corso, J. F. Timbre cues for recognition of musical instruments. Journal of the Acoustical Society of America, 1964, 36, 2021-2026.

Shepard, R. N. Stimulus and response generalization: A stochastic model relating generalization to distance in psychological space. Psychometrika, 1957, 22, 325-345.

Shepard, R. N. Analysis of proximities: Multidimensional scaling with an unknown distance function I. Psychometrika, 1962a, 27, 125-140.

Shepard, R. N. Analysis of proximities: Multidimensional scaling with an unknown distance function II. Psychometrika, 1962b, 27, 219-246.

Slawson, A. W. Vowel quality and musical timbre as functions of spectrum envelopes and fundamental frequency. Journal of the Acoustical Society of America, 1968, 43, 87-101.

Strong, W., and Clark, M. Perturbations of synthetic orchestral wind-instrument tones. Journal of the Acoustical Society of America, 1967a, 41, 277-285.

Torgerson, W. S. Theory and methods of scaling.
New York: Wiley, 1958.

Wedin, L., and Goude, G. Dimensional analysis of the perception of instrumental timbre. Scandinavian Journal of Psychology, 1972, 13, 228-240.

Wessel, D. L. Psychoacoustics and music. PAGE: Bulletin of the Computer Arts Society, 1973, 30.

Wessel, D. L., and Grey, J. M. Conceptual structures for the representation of musical material. In H. B. Lincoln (Ed.), Composition and sound synthesis with computers. Cornell University Press, in press.

Young, F. W., and Torgerson, W. S. TORSCA: A Fortran-4 program for Shepard-Kruskal multidimensional scaling analysis. Behavioral Science, 1967, 12, 498.

APPENDIX

IADD: An Interpolating Additive Synthesis Program

C     PROGRAM "IADD"
C
C     WRITTEN BY DAVID EHRESMAN
C     DEPARTMENT OF PSYCHOLOGY
C     MICHIGAN STATE UNIVERSITY
C     EAST LANSING, MI 48824
C
C     THIS PROGRAM, WHICH HAS BEEN IMPLEMENTED ON A PDP 11/40
C     MINI-COMPUTER AT MICHIGAN STATE, DOES ADDITIVE SYNTHESIS
C     USING AN INTERPOLATING OSCILLATOR. THE CONTROL PARAMETERS
C     ARE SAMPLING RATE, BEATS PER SECOND, INITIAL PHASE, AND
C     SCALE FACTOR.
C
C     THE ADDITIVE SYNTHESIS DONE BY THIS PROGRAM IS BASED ON
C     STRAIGHT LINE APPROXIMATIONS OF THE COMPLEX TIME VARYING
C     AMPLITUDE AND FREQUENCY FUNCTIONS OF EACH HARMONIC TO BE
C     INCLUDED IN THE SYNTHESIZED TONE. THIS PROGRAM USES THE
C     BREAKPOINT INFORMATION FROM THESE FUNCTIONS TO CONTROL
C     THE SYNTHESIS.
C
C     THE INPUT FILE MUST BE IN THE FOLLOWING FORMAT:
C       LINE 1 - AMP. LABEL (MAX. OF 8 CHAR.);
C       LINE 2 - TIME OF FIRST AMP. BREAKPOINT (I5) FOR HARMONIC 1;
C       LINE 3 - AMPLITUDE AT FIRST BREAKPOINT (F10.0) FOR HARMONIC 1.
C     THIS IS REPEATED UNTIL ALL THE AMPLITUDE DATA FOR THE 1ST
C     HARMONIC HAS BEEN ENTERED. THE NEXT LINE MUST CONTAIN 999
C     WHICH ACTS AS A DELIMITER. THIS IS FOLLOWED BY THE
C     FREQUENCY (HZ) BREAKPOINT DATA FOR HARMONIC 1 USING THE
C     SAME FORMAT AS ABOVE. DATA FOR EACH OF THE REMAINING
C     HARMONICS MUST BE ENTERED USING THE SAME FORMAT.
C
C     THIS PROGRAM CAN PROCESS A MAX. OF 29 HARMONICS WITH A
C     MAX. OF 19 BREAKPOINTS/HARMONIC.
C
C     THE OUTPUT IS STORED ON A DISK IN A FORMAT SUITABLE FOR
C     READING THRU A DAC.
C
C     THE FOLLOWING SUBROUTINES AND FUNCTIONS ARE NEEDED TO RUN
C     THIS PROGRAM:
C       (1) LINE  - COMPUTES SLOPE AND CONSTANT FOR A LINE
C                   DEFINED BY TWO POINTS
C       (2) OSCIL - COMPUTES A SAMPLE WHEN GIVEN AMP., SAMPLE
C                   INCREMENT, AND PHASE INFORMATION
C       (3) SOUT  - PACKS SAMPLES IN A FORMAT SUITABLE FOR USE
C                   WITH OUR DAC
C       (4) SYSLIB ROUTINES
C
C ---------- IADD ----------
C
      DOUBLE PRECISION EXT
      DIMENSION AMP (20,30), ITIME1 (20,30), FREQ (20,30),
     $   I1 (30), I2 (30), ITIME2 (20,30), SLOPE1 (30),
     $   SLOPE2 (30), CONST1 (30), CONST2 (30), PHS (30),
     $   ISPEC (39)
      COMMON S, SINE (511)
     $   /BUFFER/ ICHAN, INDEX, AMAX, SAMP (256), IBUFF (256)
      DATA INDEX /0/, AMAX /0.0/, EXT /6RDATSND/,
     $   SAMP /256*0.0/, IBEL /"007/
C** READ CONTROL PARAMETERS
      WRITE (7,60)
   60 FORMAT ('$','ENTER SAMPLING RATE (F10.0) - ')
      READ (5,120) SAMRAT
      WRITE (7,70)
   70 FORMAT ('$','ENTER BEATS PER SECOND (F10.0) - ')
      READ (5,120) BEAT
      WRITE (7,75)
   75 FORMAT ('$','ENTER INITIAL PHASE (F10.0) - ')
      READ (5,120) PHASE
C** FIGURES SAMPLES PER BEAT
      SPB = SAMRAT/BEAT
      WRITE (7,76)
   76 FORMAT ('$','ENTER AMPLITUDE SCALE FACTOR (F10.0) - ')
      READ (5,120) SCALE
C** READ OUTPUT & INPUT FILE NAMES IN STANDARD CSI
C** FORMAT AND OPEN FILES FOR I/O
      WRITE (5,79)
   79 FORMAT ('$','ENTER COMMAND STRING - ')
      IF (ICSI (ISPEC, EXT, , , 0) .NE. 0) STOP 'INVALID CSI STRING'
      IF (IASIGN (10, ISPEC (16), ISPEC (17), 0, 32) .NE. 0)
     $   STOP 'NO CHANNEL FOR INPUT'
      ICHAN = IGETC ()
      IF (ICHAN .LT. 0) STOP 'NO CHANNEL FOR OUTPUT'
      IF (IENTER (ICHAN, ISPEC (1), ISPEC (5)) .LT. 0)
     $   STOP 'NO CHANNEL OR NOT ENOUGH DISK SPACE'
C** STORE A 512 SAMPLE SINE WAVE
      TEMP = (2. * 3.14159265)/511.
      DO 80 J = 0,511
      SINE (J) = SIN (TEMP*J)
   80 CONTINUE
C** LOOP FOR MAX. OF 29 HARMONICS
      DO 190 J = 1,30
C** READ AMP. BREAKPOINT DATA FOR HARMONIC J
      READ (10,100,END=200) LABEL
  100 FORMAT (A12)
      DO 130 I = 1,20
      READ (10,110) ITIME1 (I,J)
      IF (ITIME1 (I,J) .EQ. 999) GO TO 150
      READ (10,120) AMP (I,J)
  110 FORMAT (I5)
  120 FORMAT (F10.0)
C** CONVERT BEATS TO SAMPLE NUMBER
      ITIME1 (I,J) = (ITIME1 (I,J)-1) * SPB + 1
  130 CONTINUE
C** READ FREQ. BREAKPOINT DATA FOR HAR