A GENERAL COMPILER CAPABLE OF LEARNING

Thesis for the Degree of M.S.
MICHIGAN STATE UNIVERSITY
Richard Fairbanks Arnold
1958

A GENERAL COMPILER CAPABLE OF LEARNING

By RICHARD FAIRBANKS ARNOLD

AN ABSTRACT

Submitted to the College of Science and Arts, Michigan State University of Agriculture and Applied Science, in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE, Department of Statistics

Approved:

ABSTRACT

This thesis describes in general terms a routine for a high-speed electronic digital computer that is capable of accepting a program written in any precise language and producing a program for any other language, in this case, the operation code of a computer.

A brief description is given of some existing programming techniques, including assembly, compiling, and interpretive routines. Then a proposed compiler is described which operates by forming random programs and then testing them for acceptability. A method of dividing the compilations into parts is discussed and methods for testing these parts suggested.

A method of improving the expected time of compilation is discussed that involves allowing the random program generator to vary the probabilities with which it selects orders depending on the values of certain variables present at the time of selection of the orders. Information about the relationship of acceptable orders to the values of these variables is digested and stored as the compiler operates. Two methods of storing this information are suggested, one using a model analogous to that of linear regression, the second based on the recognition of patterns among the variables.
Finally, mention is made of other uses that might be made of this general approach with regard to translation of human languages and game playing machines.

A GENERAL COMPILER CAPABLE OF LEARNING

By RICHARD FAIRBANKS ARNOLD

A THESIS

Submitted to the College of Science and Arts, Michigan State University of Agriculture and Applied Science, in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE, Department of Statistics

1958

ACKNOWLEDGMENTS

The author wishes to acknowledge the direction and encouragement of Professor Gerard P. Weeg during the development and writing of this thesis. I should also like to thank Professors Leo Katz and Louis L. McQuitty for their helpful suggestions and Professor Charles F. Wrigley without whose personal interest and guidance this work would not have been possible. This research was sponsored in part by a grant from the National Science Foundation to the Computer Laboratory of Michigan State University.

INTRODUCTION

The problem of programming has more and more come into focus as one of the biggest obstacles to increasing the versatility and to decreasing the operating costs of computers. At present, two main problems confront the computer users. These are: (1) Programs written for one computer cannot in general be used on one of a different design; therefore, the same problem must be programmed as many times as there are computers. (2) The machine languages themselves are difficult to program in. This has led to many efforts to allow the programmer to program in a language similar to that used in the programmer's own field of interest. In such a case, the computer itself either compiles a routine in its own language, or it interprets the meaning of the programmer's language and performs some preset instruction or group of instructions.
Although these methods, which shall be described in more detail in Chapter I, have accomplished many of their aims, they have created new difficulties, partly because of their success and partly because of their inherent deficiencies. As shall be seen later, both of these problems are essentially the same, and a solution to the first will provide one for the second. This thesis is an attempt to summarize research on these problems and to outline a mode of attack on a general problem which not only seems to possess the solution for the programming problem but also perhaps has a general applicability to the using of computers to solve problems "heuristically."

Chapter I will describe in more detail the current techniques and some of their limitations, particularly the notion of an interpretive routine, which will be found to be one of the most powerful tools available. Chapter II will develop the notion of a randomly behaving program generator and tester and show how such a scheme would be applied to the problems discussed in Chapter I. Some of the specific difficulties involved will be considered, and ways in which they might be overcome will be suggested. Finally, in Chapter II it will be shown that we may achieve superior results by breaking down a program to be translated into parts to avoid excessively large expected running times. Chapter III will suggest a manner in which the program generator might be modified so that gradually the expected time for a translation may be reduced considerably as the scheme itself continues to operate.

In conclusion, the method used for solution of the two problems will be shown applicable to other areas. In particular, the application of the method to game playing machines will be mentioned. This naturally leads to certain analogies with the behavior of organisms.

CHAPTER I

LANGUAGE CONVERSION

A.
Compilers and Assembly Routines

Since most computers use numerical order codes and almost all of the high-speed computers use the binary rather than the decimal system, one obvious way to facilitate the construction of programs is to write a routine which accepts routines written in decimal form and which performs the necessary operations to convert to binary. Furthermore, this routine would allow the programmer to use mnemonic devices instead of the numerical operation codes themselves. This is the simplest kind of compiler.

A compiler is a routine written in a machine language which accepts a subject program and which constructs an object program equivalent to the subject program; but the compiler does not execute the program. By a subject program is meant a program written in any language, called the pseudo code. An object program is a program constructed by a compiler, the language of which is again any language, not necessarily that language in which the compiler is written. Finally, two programs will be said to be equivalent if, given identical inputs, the outputs are also identical. Thus, the most general compiler will accept a subject program in a language different from that of the compiler and will construct an object program in yet another language.

If, however, the pseudo code is primarily the same as that used in the compiler, with the inclusion of a relatively few codes not acceptable to the machine for which the object program is to be written, then the compiler is called an assembler or assembly routine. Another feature that has been incorporated into such assemblers is the use of relative addresses in the subject program which, when included with the desired initial address specified as a parameter, permits the same subroutine, for example, to be stored anywhere in memory.
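As a modern illustration of this relative-addressing feature, the resolution step can be sketched in Python. The tags and instruction format here are hypothetical, not those of any actual assembler of the period:

```python
# A minimal sketch of relative-address assembly: each subject-program
# address carries a label saying whether it is "fixed" (absolute) or
# relative to the location where the routine will be stored.

def assemble(subject, base):
    """Resolve each (opcode, address, tag) triple against a load address.

    tag "F" marks a fixed address, tag "R" one relative to `base`.
    Returns the object routine as (opcode, resolved_address) pairs.
    """
    obj = []
    for opcode, addr, tag in subject:
        resolved = addr if tag == "F" else base + addr
        obj.append((opcode, resolved))
    return obj

# The same subroutine can be stored anywhere in memory simply by
# changing the initial-address parameter.
routine = [("LOAD", 2, "R"), ("ADD", 100, "F"), ("STORE", 3, "R")]
print(assemble(routine, 500))  # loaded at 500
print(assemble(routine, 900))  # loaded at 900
```

The point of the sketch is only that one parameter, the initial address, relocates every relative reference at once.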
Each address in the relatively programmed subroutine must have an accompanying label indicating whether it is "fixed" or relative to the locations where the program is being stored. An example of such a routine is the "Decimal Order Input" used by the Illiac class computers. If the computer has a permanent storage of subroutines in a medium- or low-speed access memory, then an additional feature may be added which would provide for the bringing in of an entire subroutine by means of one order. Along with this would be a feature enabling the assembler to allocate memory space. The "Share Assembly Program" for the I.B.M. 704 is an example of such a routine.

Several steps higher in sophistication is the true compiler. This performs approximately the same operations as an assembly program except that it contains within itself so many subroutines that the entire object routine may be expressed in a pseudo language distinct from the language of the object routine. This would most likely be the language of mathematics, and thus a novice could program with relative efficiency. An example of a compiler is "Fortran," again for the I.B.M. 704. "Fortran" also provides for the saving of information that would otherwise have to be recomputed if there is a need for it further along in the program. Compilers may also optimize the use of indexing registers.

The next step would seem to be, then, that a common subject language should be agreed upon for all computers, and then all that would be necessary would be that the compiler for a machine be written in order that that computer might take full advantage of other programs. This indeed would be a boon to programmers, and such an endeavor to standardize the programming language is being made. I.B.M. seems to be set on Fortran as its basic programming language, subject of course to modifications.
Already a routine called a preprocessor is available called "Fortransit" which, given a program using a subset of the "Fortran" code, will produce a program in "I.T.," which is itself the subject language of an I.B.M. 650 compiler. Similar preprocessors and compilers are being made available for the various models of the 705, the 709, and the "STRETCH" computers.

B. Interpretive Routines

An earlier attempt to solve the programming problem led to the creation of interpretive routines. An interpretive routine is a machine language program which accepts a portion of a subject program, selects from a predetermined set of subroutines an object subroutine in that machine language, and then executes that object program before proceeding to the next portion of the subject program.

Early in the development of program libraries, it was found that certain problems used a certain group of subroutines repeatedly. It is convenient to group these subroutines into a package. Then a "macro" or "pseudo" instruction is made to represent each subroutine, and an interpreter is added to the set of subroutines. The interpreter is capable of recognizing the pseudo instructions and of transferring control to a designated subroutine. After the subroutine is executed, control returns to the interpreter, which then examines the next pseudo instruction and repeats the process. It is necessary for the interpreter to have a "control counter"; that is to say, it must know the memory location of the pseudo order to be executed. It is also necessary to have pseudo arithmetic registers. It would be quite easy then to have pseudo instructions that transfer control by resetting the pseudo control counter, and the transfer might be made contingent upon the contents of the pseudo arithmetic registers, creating "test" instructions.
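The mechanism just described (a pseudo control counter stepping through pseudo instructions, each dispatched to a subroutine, with "test" instructions that transfer control by resetting the counter) can be sketched in Python. The order names and register set are illustrative only, not those of any actual interpreter:

```python
def interpret(pseudo_program, data):
    """Minimal interpretive routine: a pseudo control counter steps through
    pseudo orders, each dispatched to a subroutine (here modeled as Python
    code). Control never passes to the pseudo orders themselves."""
    acc = 0          # pseudo arithmetic register
    counter = 0      # pseudo control counter
    output = []
    while counter < len(pseudo_program):
        op, arg = pseudo_program[counter]
        counter += 1
        if op == "LOAD":
            acc = data[arg]
        elif op == "ADD":
            acc += data[arg]
        elif op == "JUMPNEG":      # a "test" instruction: the transfer is
            if acc < 0:            # contingent on the pseudo register and
                counter = arg      # made by resetting the pseudo counter
        elif op == "PRINT":
            output.append(acc)
    return output

prog = [("LOAD", 0), ("ADD", 1), ("JUMPNEG", 4), ("PRINT", 0)]
print(interpret(prog, [3, 4]))    # [7]
print(interpret(prog, [3, -9]))   # [] since the negative sum jumps past PRINT
```

Note that the loop examines each pseudo instruction and dispatches; the pseudo program is data, never executed directly, exactly as the text requires.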
In fact, the interpretive routine possesses exactly the same operating characteristics as a computer, and, therefore, it is possible by an interpretive routine to simulate any conceivable computer. Also, it is important to see that the control of the computer is never transferred to the pseudo orders themselves, which would of course be meaningless to the machine. One common interpretive routine is a floating point routine. That is one in which a number N = a·b^c is represented, for example, in a computer by (a, c). Other interpretive routines may interpret algebraic symbols and include subroutines for computing the elementary and certain transcendental functions.

An example of the use of an interpretive routine to simulate a computer has been the recent effort by the author on a routine for the MISTIC which simulates the I.B.M. 650. The purpose of writing this routine was to make available for use the library of 650 subroutines. As with all interpretive routines, this routine contains an additional instruction in its repertoire which transfers control out of the interpretive mode to direct execution of MISTIC orders. Thus, it is possible to mix MISTIC and 650 subroutines in the same program. Although, as shall be seen, this scheme has crippling drawbacks, it is indeed a solution of the programming problem since it allows the interchangeability of programs. If someone were to write an interpretive routine for the Remington Rand 1103A to simulate MISTIC, he would gain not only all the MISTIC programs but all of the 650 programs, since the "pseudo MISTIC" could simulate the 650, and the 1103A would be operating in a kind of second order interpretive mode. If a computer C_i can interpretively accept the language of computer C_(i-1) for i = 1, 2, ..., n, then computer C_n can accept the language of computers C_0 through C_(n-1). The mode of interpretation between computers C_r and C_s, for s < r, is said to be of the order r - s.
If now computer C_n is the same as computer C_0, then a total of only n interpretive routines is needed for each computer to simulate all the other computers.

Before considering the disadvantages of interpretive schemes, a few more possibilities will be examined. It is possible that the structure of the arithmetic unit of the computer being simulated be entirely different from the one performing the simulation. For example, a computer might be simulated which would operate only with matrices. It would also be possible to simulate a computer which differs only slightly from the computer performing the simulation. Consider, for example, an interpretive routine which carries out the orders of MISTIC with the exception that it would retain, say, the ten orders previously executed. If the interpretive routine received an order that would ordinarily cause MISTIC to hang up, it could immediately print out these ten previously executed orders. This would give the programmer invaluable information, since, at present, when code checking by executing the program directly, he has no way of knowing exactly the sequence of orders leading to a hang up and thus where he made his mistake.

Finally, it should be noticed that the pseudo codes used by compilers such as "Fortran" could be adapted to an interpretive routine. Even though the routine would be complicated, it would be considerably less work to write than the compiler itself. Thus, interpretive routines can be used not only to go from one machine to another but also to go from any desired language that expresses a defined set of operations.

Unfortunately, the obvious difficulty with any interpretive routine is that it takes much too long to operate. In the author's 650 interpretive routine, for example, the necessity of separating two addresses from the operation code is extremely time consuming, involving several multiplications and divisions, because the 650 is decimal and the MISTIC binary.
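To make the cost concrete: a 650 instruction word packs a two-digit operation code, a four-digit data address, and a four-digit instruction address into a ten-digit decimal number, and on a binary machine those fields do not fall on bit boundaries, so they can only be separated arithmetically. A minimal sketch of the splitting itself (not of the MISTIC's actual arithmetic):

```python
# Each field split from a ten-digit decimal word costs a division on a
# binary machine, which is the time sink the text describes.

def split_650_word(word):
    """word = op*10**8 + data_addr*10**4 + instr_addr (decimal fields)."""
    op, rest = divmod(word, 10**8)               # one division
    data_addr, instr_addr = divmod(rest, 10**4)  # and another
    return op, data_addr, instr_addr

print(split_650_word(6512340042))  # (65, 1234, 42)
```

On a decimal machine the same fields are available by digit position, with no division at all, which is why the interpretation is cheap for the 650 itself and expensive for MISTIC.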
In the case of the 650 to MISTIC routine this is acceptable because the MISTIC is a much faster machine than the 650, and, therefore, the actual running time of a 650 program using the interpreter is not much different than if the same program were being executed by a real 650. However, if one were trying to adapt programs of a computer of comparable speed, the interpretive routine would be much too cumbersome. Second or higher order interpretive schemes would operate fantastically slowly.

What, then, are the difficulties associated with compilers of the "Fortran" type? First of all, they are difficult and lengthy to write. The initial "Fortran" effort took 25 man years. This could be contrasted with the two months taken to write the 650 interpretive routine. A second difficulty of compilers such as "Fortran" is that although they are becoming more and more versatile, they still fail to express certain types of operations, and it has become necessary to make it possible to adapt the compiler so that the "Fortran" language may be temporarily left and programming done in a language closer to the initial machine language. Of course, this is a desirable feature for a compiler to have, but it does not solve the initial problem for which it was created, namely, to avoid machine languages completely. A further disadvantage is that as a compiler becomes adapted for use on more than one computer, many of the "coding tricks" will have to be avoided. This may be desirable from the point of view of the compiler writer, but optimum programs will not be written. This will be particularly true as newer generations of computers are built which may have less similarity to present ones than we expect. Also, as new computers come into existence, it is likely that it will become increasingly difficult for even the trained programmer to use the machines efficiently. Whether or not compilers and their improved offspring are the answer for the future only time can tell.
However, the author would like to offer a different approach to the problem which, although it will probably not prove to be a better solution for the present, may indeed offer a much better line of attack for the future.

Let us review first what is desired. We would like a scheme that would accept a program, either in the pseudo code of a compiler or in a different machine language, and produce an optimum program for a given machine. We would like this program to take full advantage of all the special features of the machine as well as be coded in a fashion such that running time and program length are at a minimum. We would also wish not to do too much work in adapting this scheme to a new machine. Since we may not ourselves know just how best to program a given machine, we would like it if the scheme itself could find the optimum coding procedures and apply them. Finally, and this is the one requirement which will be most difficult to meet, the scheme must produce a program in a reasonable length of time and not require a tremendous amount of memory space. With the exception of the last requirement, the author believes his scheme capable of satisfying all of these, and we hope that this last one too may be met. The two chapters that follow are devoted to the development of this scheme.

CHAPTER II

A. The Compiler

This chapter will be devoted to describing a new kind of compiler. It is similar in function to other compilers in that it accepts a program in language A and produces an equivalent program in a language B. Since the requirement of our languages is that each order must specify exactly an operation, these languages define computers and, therefore, we may sometimes speak of computers A and B.

The compiler operates as follows. It first generates a program in language B at random, as will be described below. Although the program must be of a specified length, it may be any conceivable combination of orders.
It then proceeds to determine whether this candidate program is equivalent to the given program in language A. The method of determining the acceptability of the program involves the use of two interpretive routines, A and B, which are capable of executing the orders in the two languages. The subject program and candidate program are then both executed using identical data to ascertain whether the candidate program B is capable of producing identically the same output. If it is determined that they are equivalent, then this program will be punched out and the translation will be complete. If not, a new candidate program is produced and tested in a similar fashion. The process is repeated until an acceptable program is produced.

B. The Random Program Generator

The random program generator will have the following characteristics. Provided with the length of program desired, it will compose one in the following way. The first order of the program will be chosen at random from all possible orders that might appear there, so that the probabilities associated with the choosing of all orders are identical. Then the second order will be chosen in the same manner, and then each successive order in the program until it is complete. This becomes the candidate program. This method could generate any conceivable program of a given number of orders. However, the probability that a particular program is generated is exactly the same for all programs. The program generator will utilize a random number generating routine, and the range of the random numbers will be partitioned in such a way that, by assigning equal intervals to each order and selecting that order which corresponds to the interval containing the random number, a program may be generated in the desired fashion. We recognize that the computer can generate only "pseudo" random numbers and that this will introduce difficulties. It is not too much to expect that the numbers be distributed rectangularly.
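The generation scheme just described (partition the range of the random numbers into k equal intervals, one per order, and take the order whose interval contains the number) might be sketched as follows; the order list is illustrative only:

```python
import random

def generate_candidate(order_set, length, rand=random.random):
    """Compose a candidate program of the given length, choosing each order
    with equal probability by partitioning [0, 1) into k equal intervals
    and selecting the order whose interval contains the random number."""
    k = len(order_set)
    program = []
    for _ in range(length):
        r = rand()                        # pseudo random number in [0, 1)
        program.append(order_set[int(r * k)])
    return program

orders = ["LOAD", "ADD", "STORE", "JUMP"]
print(generate_candidate(orders, 5))
```

Any conceivable program of the given length can appear, each with the same probability, which is exactly the uniform behavior the text specifies.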
Of more importance, however, is the sequence that the generation of these numbers may take. It will be necessary that the random numbers not appear in a sequence such that the corresponding programs do not contain certain sequences of orders. It will be necessary to continue watching for this possibility even if the compiler works, since it may be producing only certain kinds of programs.

C. The Acceptability Criterion

1. The Interpretive Routine Simulation of Computers A and B

The simulation of the A computer will be straightforward. All one really needs to know is what the output is for a given input. However, we shall provide the interpretive routine that executes the A program with one additional feature, an "address stop." That is, the programmer, or in this case perhaps some other part of the compiler, shall be able to specify, before transferring control to the A interpreter, a particular location which, if appearing in the control counter, will cause the routine to jump to some previously designated location outside of the interpretive routine itself. Any order which would otherwise cause the computer to stop would be treated in a similar fashion.

The B computer will be simulated in a like manner with the address stop feature. The B interpreter will also be equipped with a "clock" that will keep track of the running time of the program it is executing as if it were being executed directly by a real computer B. Thus, it will be possible to discriminate between two acceptable programs which have different running times. Also, by placing a limit on the length of time a program may run on interpreter B, it will have a means of discovering and stopping endless looping. The "clock" would be checked before each order were executed to see if the maximum time permitted had been exceeded.
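A sketch of the B interpreter's bookkeeping follows. The `execute_order` and `order_time` hooks are hypothetical stand-ins for the actual order-execution subroutines and timing tables:

```python
class TimeLimit(Exception):
    """Raised when the simulated running time exceeds the allowed maximum:
    the means of discovering and stopping endless looping."""

def run_with_clock(program, execute_order, order_time, max_time,
                   address_stop=None):
    """Sketch of interpreter B. Before each order is executed the "clock"
    is checked against the time limit, and an address-stop match in the
    control counter returns control to the caller."""
    clock = 0.0
    counter = 0
    state = {}
    while counter < len(program):
        if counter == address_stop:
            return clock, state            # jump out of the interpreter
        if clock > max_time:
            raise TimeLimit(clock)
        order = program[counter]
        clock += order_time(order)         # as if run on a real computer B
        counter = execute_order(order, counter, state)
    return clock, state

# Hypothetical usage: NOP advances the counter, LOOP resets it forever.
nop_time = lambda order: 1.0
step = lambda order, c, s: (0 if order == "LOOP" else c + 1)
print(run_with_clock(["NOP", "NOP", "NOP"], step, nop_time, 10.0))
try:
    run_with_clock(["LOOP"], step, nop_time, 10.0)
except TimeLimit:
    print("endless loop stopped")
```

The accumulated clock value also supports the other use mentioned in the text: discriminating between two acceptable programs with different running times.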
One final specification will be that these interpretive routines shall be in some standard form, as it is hoped that to adapt the compiler to any other two languages, it will only be necessary to put in the interpretive routines for those languages and to provide the random program generator with a list of the orders comprising the object language. This is greatly to be desired since the writing of interpretive routines themselves is a relatively simple task. It will be found in Chapter III that additional features on both of the interpretive routines may be desirable, but, for now, no additional features will be required.

2. The Use of Randomly Generated Data in Acceptability Tests

The definitions of equivalent programs in Chapter I only considered the inputs and the outputs of the two programs being compared. It would be possible, of course, to require that for two programs to be considered equivalent, they use the same algorithm. However, given a program it is not possible to recover the algorithm used, and the only alternatives are either that the routine itself be treated as the algorithm or that we consider only results. It is not desirable that the program work correctly only for certain values of the data variables. Therefore, many sets of data must be run successfully before acceptance is final. Random sets of data will be generated and input using the random number generating routine. It will of course only be necessary to use one set of data until a candidate program succeeds in producing the correct output for that one. It will also be necessary that the range of the data variables be specified along with the subject routine. The final acceptance of a program will require a certain specified number of correct runs. The procedure will be that at the completion of the running of a program, the output will be examined and compared to that of computer A.
If it is different, the candidate program will be rejected, as it would also be if a specified time on the "clock" were exceeded or if the computer hung up. If the output is found to be identical to that of computer A, then a new set of data would be generated and run through computer A with the subject program to determine the correct output, and computer B would be given the same data and the candidate program executed again. When an acceptable program is found, it may either be punched out on cards or tape or else retained and a search made for a better program in terms of word length and running time.

The compiler as it has been described would be fortunate indeed to produce a program of moderate size in the life of the machine, much less within the hour or so maximum that might be allowable. Nevertheless, aside from this difficulty, this compiler would indeed do everything else that was desired. If it were allowed to run long enough so that it could choose among the best of many acceptable programs, it would not only tailor the program to the machine involved but also in fact find many new "coding tricks" that a programmer might never stumble upon. How then can the expected time involved in a translation be cut down? Two methods will be discussed. Chapter III considers how the program generating part of the compiler may be replaced by a unit which will gradually modify the probabilities associated with the generation of programs in a manner such that they will produce acceptable programs more frequently. The remainder of this chapter deals with the possibility of breaking the subject program into parts and translating one part at a time.

D. Sectioning The Subject Program

1. Criteria for Sectioning

If it is desired to break the subject program into sections, it will be necessary that for a given section there must be a criterion for the acceptability of that section.
Also, it would be required that this criterion be such that if all the sections were correct, the entire program would also be correct. If a method of sectioning can be found and these requirements met, the compiler could then handle one section at a time in the same way it previously handled the whole program, and such a procedure might be much quicker.

The acceptance or rejection of the program described in the previous section was based on the fact that both computers had input and output devices and, furthermore, that these input and output devices were enough alike in the form in which they handled the data to make comparison easy. However, when considering a section of a program, we can no longer define correctness in these terms, because it is unlikely that input and output operations even occur in that section. The input and output devices were treated as equivalent parts of the computer. However, other parts may also be treated as equivalent, and this will make possible the development of adequate criteria for the testing of sections. Since it will be the procedure to test sections by executing them, it will be necessary to specify exactly what state the computer should be in at the termination of the execution of a section, and since this must be determined by the state of computer A at the same stage, there must be ways of expressing the state of computer A in terms of the possible states of computer B. One obvious equivalence of parts is that the memories will be analogous. It might be said that the state of A is equivalent to the state of B if their memories contain identical information. If it were required that every part of computer A have an equivalent part in B, then the normal operations of B would have to be abandoned and B made to mimic A's every move.
This indeed would be undesirable since, for example, if it were required that B be in an equivalent state to A after every order, then, even if it were a decimal machine, it would have to find a way in which it could do operations such as forming logical products, which are extremely awkward in any system other than a binary system. Therefore, equivalent parts shall only be defined when the equivalence is natural. Initially, this will mean that the memories, control counters, and input and output devices must be made to correspond.

If only equivalent parts may be examined, it must then be required that only those parts of the computer may contain information relevant to the program. This then requires that our sections be constructed so that if the state of all other parts of either computer A or B were altered at the time when tests for equivalent content were being made, there would be no interference with the correct operation of the program. An example of this would occur if we chose our section to end just prior to an operation that would clear one of the arithmetic registers. At this point, any random number might be inserted in that register without interfering with the correct operation of the program. This requirement will be used in sectioning the subject program. The subject program will be "cut" between those orders where the state of all non-equivalent parts of computer A may be altered without affecting the running of the program, and the procedure will be that all possible "cuts" will be made, random numbers inserted in those non-equivalent parts, and the program continued. Those cases where no errors are introduced will then be recorded and the program divided at those points.

The comparison of the inputs and outputs, if any, and the conditions of the control counters is fairly straightforward.
The addresses specified in the control counter would have to be compared by ascertaining whether or not the sections referred to by the two addresses are equivalent. The comparison of the two memories is more difficult. Regarding memory itself, it is clear that locations which have in them numbers that are computed functions of the data should be compared for equality. Orders which have been modified by a section must somehow be examined to see if they will perform correctly in their modified form. Also, the contents of other locations may have information which depends for its form on the order code of the particular computer, for example, "binary switches" and "dummy" orders. It is necessary then to establish the correctness of such locations by linking them with other sections so that we can conclude that if that section is acceptable, then the operation upon the location in memory which was subsequently used in that section is also correct. For each section this entails making a list of sections whose acceptance must come prior to the acceptance of that section. In some cases it may be necessary to work backwards from sections which only operate on data and may be checked immediately, through several different sections, the acceptance of each one being a necessary requisite for the acceptance of a prior one.

Admittedly, the above discussion is not complete, and there may be situations arising which we have not considered. Nevertheless, it is felt that the method itself is powerful and can probably be adapted to the new difficulties as they arise.

2. Expected Running Times

The purpose of sectioning the program is of course that it is desirable not to throw out a program completely when only a part of it is incorrect. However, though it might seem obvious that the expected running time would be reduced, this does not itself follow just because the program is being compiled in sections, as shall be seen.
Consider first the probability that an entire program generated by the random program generator is exactly a given program. If the program contains Q orders, each of which might be any of k different orders, the probability will be 1/k^Q, and the expected number of trials until it is constructed will be k^Q. Now, if the program is sectioned into g parts of lengths m_1, m_2, ..., m_g, then the probability of a particular candidate section being correct is 1/k^(m_i), the expected number of trials for that section is k^(m_i), and the expected number of trials for the entire program is

Σ (i = 1 to g) k^(m_i).

However, this is not the situation; in fact, there are many acceptable programs. If there are c acceptable programs of length Q, then the expected number of trials to hit one of them will be k^Q/c; and if the program is divided into g sections with lengths m_1, m_2, ..., m_g, and there are c_i acceptable ways of writing the ith section, then the expected number of trials to translate the entire program will be

Σ (i = 1 to g) k^(m_i)/c_i.

If the product c_1 c_2 ... c_g = c, then it could be concluded that the expected number of trials for the procedure using sectioning is much less than that for the other one, provided that the minimum expected number of trials for any section is greater than g^(1/(g-1)). For g > 1 we have 1 < g^(1/(g-1)) <= 2, and the expected number of trials for a section will never become that low. However, by requiring the program to achieve certain criteria at many points during its execution, many programs that would satisfy the criterion of identical outputs will be rejected. Therefore, in general the product of the c_i is less than c. There is then no a priori method of saying conclusively that sectioning will lower the expected number of trials, although it seems that it would except in unusual circumstances. The point is, however, that it might at some point be possible to pool some of the sections together and reduce expected running times.
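The comparison of expected trial counts above is easy to evaluate numerically. The sketch below simply computes the two expressions, k^Q/c for whole-program generation and the sum of k^(m_i)/c_i for sectioned generation; the numbers in the usage example are invented for illustration.

```python
def expected_trials_whole(k, Q, c):
    """Expected random-generation trials for a whole program of Q orders,
    with k alternatives per order and c acceptable programs: k**Q / c."""
    return k ** Q / c

def expected_trials_sectioned(k, lengths, counts):
    """Expected trials with sectioning: the sum of k**m_i / c_i over
    sections of lengths m_i with c_i acceptable forms each."""
    return sum(k ** m / c for m, c in zip(lengths, counts))
```

For instance, with k = 10, a six-order program having a single acceptable form costs 10^6 expected trials generated whole, but only 300 when split into three two-order sections each with one acceptable form, which is the reduction the argument above anticipates when the product of the c_i equals c.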
In concluding this chapter, then, we shall summarize by saying that a method of dividing the translation of the subject program into parts has been suggested. Criteria for judging the acceptability of these parts have been specified, and consideration has been given to the question of how the sectioning of the subject program might affect the expected running time.

CHAPTER III

THE ADAPTION OF LEARNING MODELS TO THE COMPUTER

A. The Alteration of the Random Program Generator to Permit Learning

In Chapter II the random program generator selected each of the k possible alternatives at each step with equal probabilities. It is entirely possible to allow the orders to be selected with other probabilities. Given a set {P} of probability vectors P = (p_1, p_2, ..., p_k) such that Σ p_i = 1, where p_i is to be associated with order O_i, the selection of an order may be made in the following manner. Given a pseudo-random number R such that 0 < R <= 1, and a vector P, then O_j will be selected when

p_1 + ... + p_(j-1) < R <= p_1 + ... + p_j.

If R has a rectangular distribution, as it should have, then O_j will be selected with probability p_j. A specific selection of an order shall constitute a response, and the set of probability numbers (p_1, p_2, ..., p_k) shall be abbreviated P.

Intuitively, it is felt that some P's will be more satisfactory than others, and a method must be found to arrive at some "best" one. First, some measure of performance is needed. The measure might include not only how often a particular P produces an acceptable program, but also whether the program is concise and has a short running time. However, it will be more convenient at present only to consider U(P) = Pr{a candidate program using P is acceptable}. It would then be desirable to find the maximum of this function where P ranges over the set {P}, and use the corresponding P. At present there is no information about this function. However, a guess might be made that it is continuous but has several modes.
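The selection rule just given, under which O_j is chosen when the running sum of the p_i first reaches the random number R, is a cumulative draw and can be sketched directly. This is a minimal illustration, not code from the thesis; the fallback on the last index guards only against floating-point shortfall in the cumulative sum.

```python
import random

def select_order(p, r=None):
    """Select an order index from probability vector p: order j is
    chosen when sum(p[:j]) < r <= sum(p[:j+1]), with r drawn uniformly
    when not supplied."""
    r = random.random() if r is None else r
    cum = 0.0
    for j, pj in enumerate(p):
        cum += pj
        if r <= cum:
            return j
    return len(p) - 1   # guard against floating-point shortfall
```

Because the intervals partition (0, 1], a rectangular R lands in the jth interval with probability exactly p_j, which is the property the text requires.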
It shall be assumed, however, that it has but one, the hope being that if any of the modes is discovered, a fairly satisfactory selection of orders will ensue. One method that might be used would be to start out with the p_i's equal. Then, as in all of the methods discussed in this chapter, there will be a modification of the p_i's whenever an acceptable program is achieved, or, if sections are being dealt with, an acceptable section. Given that a particular order O_c were in the acceptable section, the current set of p_i's would be replaced by

(p_1 - p_1/d, p_2 - p_2/d, ..., p_c + (1 - p_c)/d, ..., p_k - p_k/d),

where d >= 1 is a parameter that reflects the magnitude of the desired change; the components of the new vector still sum to one. This procedure would be repeated for each order in the acceptable section. The expected value of P should approach the value at which U(P) has its mode. It will be desirable to let d be a function of the number of modifications already made, so that as the modal value is approached, the variance about it will be decreased. This procedure might be called learning because it permits an increment in performance to occur as a result of the "experience" of the compiler.

B. The Stimulus Situation

Consider now the situation of the response selector at a given point in its selection of orders for a candidate section. Previously, the P's used for the selection of each order in the program have been identical. However, if P_y is the set of probability vectors that might be associated with the selection of a particular response y, it would seem that max U(P_y) will occur at different values of P for different values of y. In order to devise a method for obtaining these different values of P, there must first be a method of classifying the y's. It may seem at first that y could be classified according to its location alone in the program, but this would not help a great deal. It is clear that most of the deviation of P_y from P may be explained in terms of y's relation to the orders around it in the program.
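The probability-modification step of the preceding section takes only a few lines. One plausible reading of the (partly garbled) update in the original is that every component shrinks by p_i/d while the accepted order O_c gains (1 - p_c)/d, which keeps the vector summing to one; that reading is this sketch's assumption.

```python
def reinforce(p, c, d=4.0):
    """Shift probability toward accepted order index c: each component
    loses p_i/d, and the accepted order's net change is +(1 - p_c)/d,
    so the vector still sums to one.  d >= 1 sets the step size; larger
    d means gentler steps, and d = 1 drives p_c straight to 1."""
    assert d >= 1
    q = [pi - pi / d for pi in p]
    q[c] += 1.0 / d          # net change for c: (1 - p_c)/d
    return q
```

Repeated application after each acceptable section concentrates probability on orders that keep appearing in accepted programs, and growing d over time shrinks the step size as the text suggests.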
For example, P_y should certainly depend on the selection that has already been made for (y-1). This can be treated as if max P_y (that value of P at which U(P) has its maximum) and (y-1) formed a bivariate distribution, where (y-1) is a discrete variable that can take on any of k values. In general, not only will max P_y depend on (y-1) but also on (y-2), (y-3), and in fact on many other variables that might not even be defined in the candidate program.

Other variables that should be considered are the particular variables that make up the subject section and the contents of certain locations in both of the interpretive routines used. While the relationship of the subject orders might be obvious, that of the interpretive routines is probably not. The interpretive routines do contain variables in the sense that (y-1) is a variable; for example, the "left-right counter" necessary in an interpretive routine simulating MISTIC, which has two orders stored in one location in memory. In fact, it might be suspected that since a human being can know all there is to know of importance to programming about a computer by examining an interpretive routine simulating it, a great deal might indeed be gained by considering the relationship between max P_y and some of these variables.

The reason for the concern with the dependence of max P_y on all these variables is that prior to the selecting of the value of y, the particular values that these other variables have taken will be known; and if the relationship is also known, a better P_y may be used. Certain definitions follow naturally. If X = (x_1, x_2, ..., x_n) is a set of variables whose values may be determined prior to the selection of order y, and S = (x_1i, x_2i, ..., x_ni) is a set of particular values for X, then max P_y.S shall mean that value of {P} for which U(P_y.S) achieves its maximum, where U(P_y.S) = Pr{a response chosen using P_y.S is acceptable given that X = S}.
S is called the stimulus vector, and x_1, x_2, ..., x_n stimulus variables. The execution of the previous candidate program, if it failed, may also prove of interest, since if the routine "hung up," the last order executed may well be the one at fault. It is seen then that there are many variables, defined by their particular relationship to y, that might be considered, some of which are easily accessible at the time of selecting y, while others, such as the orders themselves comprising the comparable sections, may have been overwritten before that time. Although no more consideration shall be given to the more obscure variables, it might be mentioned how one might go about saving them. The most obvious method would be to provide in the interpretive routines themselves for the storage of the values of some of these variables in some inviolable location. This would be wise to do for those variables which we can a priori guess will be desired. However, since all possible stimulus variables cannot be considered, it is hoped that some continuous selection and rejection of just which ones shall be considered can be engaged in. This might involve the use of blocking orders or the use of an interpretive routine to execute the other interpretive routines themselves.

C. The Selection of the Response

Considering again the selection of the value of y, it is desired to find max U(P_y.S). If the best P_y.S were determined independently for each possible S, if d = 1 (that is, if p_c becomes 1 when c is the correct response for the specified S), and if all possible stimulus variables were included, the scheme would have these characteristics. It would classify completely all the correct responses corresponding to the given stimulus vectors. But the same situation, that is S, would never be encountered twice unless the compiler were again asked to translate the same program.
Therefore, it would never be any better off than a scheme which took no cognizance of the stimulus situation. Of course, this method is impossible because of the space requirements. Restricting the number of stimulus variables might be one way out. However, if each such variable could take on the average 20 values, the p's were expressed to five binary digits, and k = 20, the 400,000 words of 40 bits necessary to store the p's for just four variables would be prohibitive.

D. A Linear Model

Clearly the trouble lies at least partly in allowing the max P_y.S to be freely determined for every S. While it is true that in general the effect of one stimulus variable x_j on max P_y.S will depend upon the values of the other stimulus variables, it is not necessarily the case that the effect of x_j on max P_y.S will change for every change in the value of any stimulus variable. A method of determining max P_y.S given S that ignores the interaction among stimulus variables might be developed along lines that parallel linear regression. The model assumes that max P_y.S is a linear combination of the max P_y.x_ji, where P_y.x_ji represents the probability vector associated with the ith of the possible values of the stimulus variable x_j. Given then an S = (x_1i, x_2i, ..., x_ni), the value P_y.S to be used in the selection of y would be

P_y.S = b_1(max P_y.x_1i) + b_2(max P_y.x_2i) + ... + b_n(max P_y.x_ni).

The max P_y.x_ji's would be estimated in the standard way; P_y.x_ji would be modified to increase p_c when O_c was an acceptable order occurring at y while x_j had value i. The weight b_j should be increased as the individual probabilities within {P_y.x_ji} tend away from max P (max P taken in the sense of the last section). This could be accomplished by letting b_j increase with Σ (over h) (P_y.x_jih - P_h)^2, where P_y.x_jih is the hth component of P_y.x_ji and P_h is the hth component of max P. Likewise, b_j should be decreased as x_j shows increasing correlation with the other variables of X.
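The linear combination above can be sketched in a few lines, with one added step: since a weighted sum of probability vectors with arbitrary b_j's need not itself sum to one, this sketch renormalizes the result. The renormalization, the weights, and the vectors in the example are all this sketch's assumptions rather than the thesis's fitted values.

```python
def combine(per_variable_P, b):
    """Linear-model sketch: blend the per-stimulus-variable probability
    vectors P_y.x_ji (one per variable, chosen by the value observed in
    S) with weights b_j, then renormalize so the result is again a
    probability vector over the k orders."""
    k = len(per_variable_P[0])
    raw = [sum(bj * P[h] for bj, P in zip(b, per_variable_P))
           for h in range(k)]
    total = sum(raw)
    return [x / total for x in raw]
```

Storage now grows with the sum, not the product, of the variables' value counts, which is what makes the regression-like model tractable where the full lookup table was not.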
It is suggested that a measure of how much b_j should be lowered because of its correlation with other variables might be some function of the similarity of the particular P_y.x_ji to the other n-1 vectors determined by the particular S.

This model might be made even more flexible if each P_y.x_ji had its own b_ji. This would permit "differential validity" within a particular stimulus variable, but it would also necessitate some correction factor, since a combination of such vectors would not in general yield a probability vector. A method for the selection and rejection of stimulus variables would follow naturally: at periodic intervals the x_j whose b_j (or b_ji's) were lowest would be discarded, and a new one selected at random from the remaining variables not currently represented in the stimulus variable set.

E. A Pattern Analytic Model

Another method of relating S to P_y is still in the process of development. A brief description will be given, however, because it is felt to be superior to the above method and also to parallel more closely the functioning of organisms. It is suggested that a programmer's tentative choice of an order is determined by the relation of the particular S to certain patterns around which his perceptions are organized. An analogous scheme for the computer would work in the following manner. Given an S whose stimulus variables each may take only the values 1 and 0, this scheme would classify S according to its similarity to each of many patterns A_1, A_2, ..., A_t (populations), where A_i is defined by a set of values a_1i, a_2i, ..., a_ni with a_ji = Pr{x_j = 1 | A_i}, x_j being the usual stimulus variable.
The similarity of a given S to A_i would be measured by some monotonically increasing function of the likelihood of S given A_i, say S_S.A_i. Each A_i would have an associated P_y.A_i, and

P_y.S = Σ (i = 1 to t) S_S.A_i P_y.A_i.

Such a system would allow certain combinations of stimulus variables to have an effect on P_y entirely the reverse of the separate effect of either in the absence of the other. The problem of determining both the best set of A_i's and their corresponding P_y.A_i's is considerably more difficult than in the previously described scheme, and we have in mind no complete method for their determination, since it is not even clear to us just what kind of criterion we can apply to the selection of the A's. It might be said that the modification of the A's and P_y.A_i's would in general follow these rules.

1) The extent to which each pattern A_i should be modified to resemble a given S, at the point at which an acceptable response has been determined, should be:
a) A direct function of S_S.A_i.
b) An inverse function of the number of other patterns which are more similar.
c) An inverse function of the probability of obtaining the acceptable response given P_y.A_i.

2) The extent to which each P_y.A_i should be modified should be:
a) Primarily a direct function of S_S.A_i.
b) An inverse function of the number of other patterns whose S_S.A_i's, and whose likelihoods of the acceptable response given their P_y.A_i's, are also greater.
c) A direct function of the likelihood of the correct response given P_y.A_i.

F. Conclusion

Still other methods have been considered, but they all seem either clumsy or to offer little hope for any fast improvement. However, if there were a scheme that required an extraordinarily large number of acceptable responses to be given it before it could begin to operate efficiently, it might be possible to combine it with another scheme in such a fashion that as it became efficient it would gradually replace the other scheme.
This would be a kind of evolution within the compiler. At this point it seems that the best course of action will be actually to try to write a compiler that embodies the principle of sectioning and a learning model similar to one of those described. Since the majority of the ideas in the development of the compiler described in this thesis have arisen with less concern for finding a "best way" than for finding "any way," it might prove that if no compiler of this sort were attempted until the theory were complete, a great deal would indeed be lost. However, this does not mean, of course, that we should be satisfied with a scheme just because it works, since only by further developing the theory will it be possible to increase the learning and generalizing ability of machines.

It should be mentioned that the scheme which we have conceived of primarily as a compiler might be adaptable to many other problems. Consider, for example, the playing of games such as tic-tac-toe or checkers. The latter would entail developing considerably more sophisticated criteria, since in checkers it would not be desirable to classify moves as only "acceptable" or "unacceptable." It might be found that a system for classifying the results of moves could be constructed using a pattern analytic method similar to the one developed for responding to stimuli. Since the response selector of the compiler responds variably to a particular stimulus pattern, it should be capable of learning to play games of mixed strategy. The use of the pattern classification scheme in the construction of human language translation machines should also be investigated; the recurring patterns of grammar in two languages are similar to the patterns that exist in the two machine languages.
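Before concluding, the similarity-weighted mixture of Section E can be given a minimal sketch. Here the likelihood of S given A_i is stood in for by the fraction of agreeing 0/1 components, a deliberate simplification, since the thesis leaves the similarity function open; the normalization of the similarity scores into weights is likewise this sketch's assumption.

```python
def pattern_mix(S, patterns, P_vectors, eps=1e-6):
    """Pattern-analytic sketch: score the 0/1 stimulus vector S against
    each pattern A_i by the fraction of agreeing components (a stand-in
    for the likelihood of S given A_i), normalize the scores into
    weights, and return the weighted mixture of the patterns'
    probability vectors P_y.A_i."""
    sims = [sum(1 for s, a in zip(S, A) if s == a) / len(S) + eps
            for A in patterns]
    total = sum(sims)
    weights = [w / total for w in sims]
    k = len(P_vectors[0])
    return [sum(w * P[h] for w, P in zip(weights, P_vectors))
            for h in range(k)]
```

Because every pattern contributes in proportion to its similarity, a stimulus close to two patterns at once yields a blend of their response tendencies, which is exactly the interaction effect the lookup table and the linear model could not express.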
In conclusion, it is our hope that we have suggested a scheme which will lead to the development of similar schemes of higher sophistication that will be able increasingly to invade those areas of problem solving that have hitherto been reserved for human beings.