A GENERAL COMPILER CAPABLE OF LEARNING

Thesis for the Degree of M.S.
MICHIGAN STATE UNIVERSITY
Richard Fairbanks Arnold
1958

A GENERAL COMPILER CAPABLE OF LEARNING

By RICHARD FAIRBANKS ARNOLD

AN ABSTRACT

Submitted to the College of Science and Arts, Michigan State University of Agriculture and Applied Science, in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE, Department of Statistics

Approved:

ABSTRACT

This thesis describes in general terms a routine for a high-speed electronic digital computer that is capable of accepting a program written in any precise language and producing a program for any other language, in this case, the operation code of a computer.

A brief description is given of some existing programming techniques, including assembly, compiling, and interpretive routines. Then a proposed compiler is described which operates by forming random programs and then testing them for acceptability. A method of dividing the compilations into parts is discussed and methods for testing these parts suggested.

A method of improving the expected time of compilation is discussed that involves allowing the random program generator to vary the probabilities with which it selects orders depending on the values of certain variables present at the time of selection of the orders. Information about the relationship of acceptable orders to the values of these variables is digested and stored as the compiler operates. Two methods of storing this information are suggested, one using a model analogous to that of linear regression, the second based on the recognition of patterns among the variables.
Finally, mention is made of other uses that might be made of this general approach with regard to translation of human languages and game playing machines.

A GENERAL COMPILER CAPABLE OF LEARNING

By RICHARD FAIRBANKS ARNOLD

A THESIS

Submitted to the College of Science and Arts, Michigan State University of Agriculture and Applied Science, in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE, Department of Statistics

1958

ACKNOWLEDGMENTS

The author wishes to acknowledge the direction and encouragement of Professor Gerard P. Weeg during the development and writing of this thesis. I should also like to thank Professors Leo Katz and Louis L. McQuitty for their helpful suggestions and Professor Charles F. Wrigley without whose personal interest and guidance this work would not have been possible. This research was sponsored in part by a grant from the National Science Foundation to the Computer Laboratory of Michigan State University.

INTRODUCTION

The problem of programming has more and more come into focus as one of the biggest obstacles to increasing the versatility and to decreasing the operating costs of computers. At present, two main problems confront the computer users. These are: (1) Programs written for one computer cannot in general be used on one of a different design; therefore, the same problem must be programmed as many times as there are computers. (2) The machine languages themselves are difficult to program in. This has led to many efforts to allow the programmer to program in a language similar to that used in the programmer's own field of interest. In such a case, the computer itself either compiles a routine in its own language, or it interprets the meaning of the programmer's language and performs some preset instruction or group of instructions.
Although these methods, which shall be described in more detail in Chapter I, have accomplished many of their aims, they have created new difficulties, partly because of their success and partly because of their inherent deficiencies. As shall be seen later, both of these problems are essentially the same, and a solution to the first will provide one for the second. This thesis is an attempt to summarize research on these problems and to outline a mode of attack on a general problem which not only seems to possess the solution for the programming problem but also perhaps has a general applicability to the using of computers to solve problems "heuristically."

Chapter I will describe in more detail the current techniques and some of their limitations, particularly the notion of an interpretive routine, which will be found to be one of the most powerful tools available. Chapter II will develop the notion of a randomly behaving program generator and tester and show how such a scheme would be applied to the problems discussed in Chapter I. Some of the specific difficulties involved will be considered, and ways in which they might be overcome will be suggested. Finally, in Chapter II it will be shown that we may achieve superior results by breaking down a program to be translated into parts to avoid excessively large expected running times. Chapter III will suggest a manner in which the program generator might be modified so that gradually the expected time for a translation may be reduced considerably as the scheme itself continues to operate.

In conclusion, the method used for solution of the two problems will be shown applicable to other areas. In particular, the application of the method to game playing machines will be mentioned. This naturally leads to certain analogies with the behavior of organisms.

CHAPTER I

LANGUAGE CONVERSION

A.
Compilers and Assembly Routines

Since most computers use numerical order codes and almost all of the high-speed computers use the binary rather than the decimal system, one obvious way to facilitate the construction of programs is to write a routine which accepts routines written in decimal form and which performs the necessary operations to convert to binary. Furthermore, this routine would allow the programmer to use mnemonic devices instead of the numerical operation codes themselves. This is the simplest kind of compiler.

A compiler is a routine written in a machine language which accepts a subject program and which constructs an object program equivalent to the subject program; but the compiler does not execute the program. By a subject program is meant a program written in any language, called the pseudo code. An object program is a program constructed by a compiler, the language of which is again any language, not necessarily that language in which the compiler is written. Finally, two programs will be said to be equivalent if, given identical inputs, the outputs are also identical. Thus, the most general compiler will accept a subject program in a language different from that of the compiler and will construct an object program in yet another language.

If, however, the pseudo code is primarily the same as that used in the compiler, with the inclusion of a relatively few codes not acceptable to the machine for which the object program is to be written, then the compiler is called an assembler or assembly routine. Another feature that has been incorporated into such assemblers is the use of relative addresses in the subject program which, when included with the desired initial address specified as a parameter, permits the same subroutine, for example, to be stored anywhere in memory.
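As a modern illustration of this relative-addressing feature, the resolution step can be sketched in Python. The tags and instruction format here are hypothetical, not those of any actual assembler of the period:

```python
# A minimal sketch of relative-address assembly: each subject-program
# address carries a label saying whether it is "fixed" (absolute) or
# relative to the location where the routine will be stored.

def assemble(subject, base):
    """Resolve each (opcode, address, tag) triple against a load address.

    tag "F" marks a fixed address, tag "R" one relative to `base`.
    Returns the object routine as (opcode, resolved_address) pairs.
    """
    obj = []
    for opcode, addr, tag in subject:
        resolved = addr if tag == "F" else base + addr
        obj.append((opcode, resolved))
    return obj

# The same subroutine can be stored anywhere in memory simply by
# changing the initial-address parameter.
routine = [("LOAD", 2, "R"), ("ADD", 100, "F"), ("STORE", 3, "R")]
print(assemble(routine, 500))  # loaded at 500
print(assemble(routine, 900))  # loaded at 900
```

The point of the sketch is only that one parameter, the initial address, relocates every relative reference at once.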
Each address in the relatively programmed subroutine must have an accompanying label indicating whether it is "fixed" or relative to the locations where the program is being stored. An example of such a routine is the "Decimal Order Input" used by the Illiac class computers. If the computer has a permanent storage of subroutines in a medium- or low-speed access memory, then an additional feature may be added which would provide for the bringing in of an entire subroutine by means of one order. Along with this would be a feature enabling the assembler to allocate memory space. The "Share Assembly Program" for the I.B.M. 704 is an example of such a routine.

Several steps higher in sophistication is the true compiler. This performs approximately the same operations as an assembly program except that it contains within itself so many subroutines that the entire object routine may be expressed in a pseudo language distinct from the language of the object routine. This would most likely be the language of mathematics, and thus a novice could program with relative efficiency. An example of a compiler is "Fortran," again for the I.B.M. 704. "Fortran" also provides for the saving of information that would otherwise have to be recomputed if there is a need for it further along in the program. Compilers may also optimize the use of indexing registers.

The next step would seem to be, then, that a common subject language should be agreed upon for all computers, and then all that would be necessary would be that the compiler for a machine be written in order that that computer might take full advantage of other programs. This indeed would be a boon to programmers, and such an endeavor to standardize the programming language is being made. I.B.M. seems to be set on Fortran as its basic programming language, subject of course to modifications.
Already a routine called a preprocessor is available called "Fortransit" which, given a program using a subset of the "Fortran" code, will produce a program in "I.T.," which is itself the subject language of an I.B.M. 650 compiler. Similar preprocessors and compilers are being made available for the various models of the 705, the 709, and the "STRETCH" computers.

B. Interpretive Routines

An earlier attempt to solve the programming problem led to the creation of interpretive routines. An interpretive routine is a machine language program which accepts a portion of a subject program, selects from a predetermined set of subroutines an object subroutine in that machine language, and then executes that object program before proceeding to the next portion of the subject program.

Early in the development of program libraries, it was found that certain problems used a certain group of subroutines repeatedly. It is convenient to group these subroutines into a package. Then a "macro" or "pseudo" instruction is made to represent each subroutine, and an interpreter is added to the set of subroutines. The interpreter is capable of recognizing the pseudo instructions and of transferring control to a designated subroutine. After the subroutine is executed, control returns to the interpreter, which then examines the next pseudo instruction and repeats the process. It is necessary for the interpreter to have a "control counter"; that is to say, it must know the memory location of the pseudo order to be executed. It is also necessary to have pseudo arithmetic registers. It would be quite easy then to have pseudo instructions that transfer control by resetting the pseudo control counter, and the transfer might be made contingent upon the contents of the pseudo arithmetic registers, creating "test" instructions.
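The mechanism just described (a pseudo control counter stepping through pseudo instructions, each dispatched to a subroutine, with "test" instructions that transfer control by resetting the counter) can be sketched in Python. The order names and register set are illustrative only, not those of any actual interpreter:

```python
def interpret(pseudo_program, data):
    """Minimal interpretive routine: a pseudo control counter steps through
    pseudo orders, each dispatched to a subroutine (here modeled as Python
    code). Control never passes to the pseudo orders themselves."""
    acc = 0          # pseudo arithmetic register
    counter = 0      # pseudo control counter
    output = []
    while counter < len(pseudo_program):
        op, arg = pseudo_program[counter]
        counter += 1
        if op == "LOAD":
            acc = data[arg]
        elif op == "ADD":
            acc += data[arg]
        elif op == "JUMPNEG":      # a "test" instruction: the transfer is
            if acc < 0:            # contingent on the pseudo register and
                counter = arg      # made by resetting the pseudo counter
        elif op == "PRINT":
            output.append(acc)
    return output

prog = [("LOAD", 0), ("ADD", 1), ("JUMPNEG", 4), ("PRINT", 0)]
print(interpret(prog, [3, 4]))    # [7]
print(interpret(prog, [3, -9]))   # [] since the negative sum jumps past PRINT
```

Note that the loop examines each pseudo instruction and dispatches; the pseudo program is data, never executed directly, exactly as the text requires.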
In fact, the interpretive routine possesses exactly the same operating characteristics as a computer, and, therefore, it is possible by an interpretive routine to simulate any conceivable computer. Also, it is important to see that the control of the computer is never transferred to the pseudo orders themselves, which would of course be meaningless to the machine. One common interpretive routine is a floating point routine. That is one in which a number N = a·b^c is represented, for example, in a computer by (a, c). Other interpretive routines may interpret algebraic symbols and include subroutines for computing the elementary and certain transcendental functions.

An example of the use of an interpretive routine to simulate a computer has been the recent effort by the author on a routine for the MISTIC which simulates the I.B.M. 650. The purpose of writing this routine was to make available for use the library of 650 subroutines. As with all interpretive routines, this routine contains an additional instruction in its repertoire which transfers control out of the interpretive mode to direct execution of MISTIC orders. Thus, it is possible to mix MISTIC and 650 subroutines in the same program. Although, as shall be seen, this scheme has crippling drawbacks, it is indeed a solution of the programming problem since it allows the interchangeability of programs. If someone were to write an interpretive routine for the Remington Rand 1103A to simulate MISTIC, he would gain not only all the MISTIC programs but all of the 650 programs, since the "pseudo MISTIC" could simulate the 650, and the 1103A would be operating in a kind of second order interpretive mode. If a computer C_i can interpretively accept the language of computer C_(i-1) for i = 1, 2, ..., n, then computer C_n can accept the language of computers C_0 through C_(n-1). The mode of interpretation between computers C_r and C_s, for s < r, is said to be of the order r - s.
If now computer C_n is the same as computer C_0, then a total of only n interpretive routines is needed for each computer to simulate all the other computers.

Before considering the disadvantages of interpretive schemes, a few more possibilities will be examined. It is possible that the structure of the arithmetic unit of the computer being simulated be entirely different from the one performing the simulation. For example, a computer might be simulated which would operate only with matrices. It would also be possible to simulate a computer which differs only slightly from the computer performing the simulation. Consider, for example, an interpretive routine which carries out the orders of MISTIC with the exception that it would retain, say, the ten orders previously executed. If the interpretive routine received an order that would ordinarily cause MISTIC to hang up, it could immediately print out these ten previously executed orders. This would give the programmer invaluable information, since, at present, when code checking by executing the program directly, he has no way of knowing exactly the sequence of orders leading to a hang up and thus where he made his mistake.

Finally, it should be noticed that the pseudo codes used by compilers such as "Fortran" could be adapted to an interpretive routine. Even though the routine would be complicated, it would be considerably less work to write than the compiler itself. Thus, interpretive routines can be used not only to go from one machine to another but also to go from any desired language that expresses a defined set of operations.

Unfortunately, the obvious difficulty with any interpretive routine is that it takes much too long to operate. In the author's 650 interpretive routine, for example, the necessity of separating two addresses from the operation code is extremely time consuming, involving several multiplications and divisions, because the 650 is decimal and the MISTIC binary.
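To make the cost concrete: a 650 instruction word packs a two-digit operation code, a four-digit data address, and a four-digit instruction address into a ten-digit decimal number, and on a binary machine those fields do not fall on bit boundaries, so they can only be separated arithmetically. A minimal sketch of the splitting itself (not of the MISTIC's actual arithmetic):

```python
# Each field split from a ten-digit decimal word costs a division on a
# binary machine, which is the time sink the text describes.

def split_650_word(word):
    """word = op*10**8 + data_addr*10**4 + instr_addr (decimal fields)."""
    op, rest = divmod(word, 10**8)               # one division
    data_addr, instr_addr = divmod(rest, 10**4)  # and another
    return op, data_addr, instr_addr

print(split_650_word(6512340042))  # (65, 1234, 42)
```

On a decimal machine the same fields are available by digit position, with no division at all, which is why the interpretation is cheap for the 650 itself and expensive for MISTIC.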
In the case of the 650 to MISTIC routine this is acceptable because the MISTIC is a much faster machine than the 650, and, therefore, the actual running time of a 650 program using the interpreter is not much different than if the same program were being executed by a real 650. However, if one were trying to adapt programs of a computer of comparable speed, the interpretive routine would be much too cumbersome. Second or higher order interpretive schemes would operate fantastically slowly.

What, then, are the difficulties associated with compilers of the "Fortran" type? First of all, they are difficult and lengthy to write. The initial "Fortran" effort took 25 man years. This could be contrasted with the two months taken to write the 650 interpretive routine. A second difficulty of compilers such as "Fortran" is that although they are becoming more and more versatile, they still fail to express certain types of operations, and it has become necessary to make it possible to adapt the compiler so that the "Fortran" language may be temporarily left and programming done in a language closer to the initial machine language. Of course, this is a desirable feature for a compiler to have, but it does not solve the initial problem for which it was created, namely, to avoid machine languages completely. A further disadvantage is that as a compiler becomes adapted for use on more than one computer, many of the "coding tricks" will have to be avoided. This may be desirable from the point of view of the compiler writer, but optimum programs will not be written. This will be particularly true as newer generations of computers are built which may have less similarity to present ones than we expect. Also, as new computers come into existence, it is likely that it will become increasingly difficult for even the trained programmer to use the machines efficiently. Whether or not compilers and their improved offspring are the answer for the future only time can tell.
However, the author would like to offer a different approach to the problem which, although it will probably not prove to be a better solution for the present, may indeed offer a much better line of attack for the future.

Let us review first what is desired. We would like a scheme that would accept a program, either in the pseudo code of a compiler or in a different machine language, and produce an optimum program for a given machine. We would like this program to take full advantage of all the special features of the machine as well as be coded in a fashion such that running time and program length are at a minimum. We would also wish not to do too much work in adapting this scheme to a new machine. Since we may not ourselves know just how best to program a given machine, we would like it if the scheme itself could find the optimum coding procedures and apply them. Finally, and this is the one requirement which will be most difficult to meet, the scheme must produce a program in a reasonable length of time and not require a tremendous amount of memory space. With the exception of the last requirement, the author believes his scheme capable of satisfying all of these, and we hope that this last one too may be met. The two chapters that follow are devoted to the development of this scheme.

CHAPTER II

A. The Compiler

This chapter will be devoted to describing a new kind of compiler. It is similar in function to other compilers in that it accepts a program in language A and produces an equivalent program in a language B. Since the requirement of our languages is that each order must specify exactly an operation, these languages define computers and, therefore, we may sometimes speak of computers A and B.

The compiler operates as follows. It first generates a program in language B at random, as will be described below. Although the program must be of a specified length, it may be any conceivable combination of orders.
It then proceeds to determine whether this candidate program is equivalent to the given program in language A. The method of determining the acceptability of the program involves the use of two interpretive routines, A and B, which are capable of executing the orders in the two languages. The subject program and candidate program are then both executed using identical data to ascertain whether the candidate program B is capable of producing identically the same output. If it is determined that they are equivalent, then this program will be punched out and the translation will be complete. If not, a new candidate program is produced and tested in a similar fashion. The process is repeated until an acceptable program is produced.

B. The Random Program Generator

The random program generator will have the following characteristics. Provided with the length of program desired, it will compose one in the following way. The first order of the program will be chosen at random from all possible orders that might appear there, so that the probabilities associated with the choosing of all orders are identical. Then the second order will be chosen in the same manner, and then each successive order in the program until it is complete. This becomes the candidate program. This method could generate any conceivable program of a given number of orders. However, the probability that a particular program is generated is exactly the same for all programs. The program generator will utilize a random number generating routine, and the range of the random numbers will be partitioned in such a way that, by assigning equal intervals to each order and selecting that order which corresponds to the interval containing the random number, a program may be generated in the desired fashion. We recognize that the computer can generate only "pseudo" random numbers and that this will introduce difficulties. It is not too much to expect that the numbers be distributed rectangularly.
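The generation scheme just described (partition the range of the random numbers into k equal intervals, one per order, and take the order whose interval contains the number) might be sketched as follows; the order list is illustrative only:

```python
import random

def generate_candidate(order_set, length, rand=random.random):
    """Compose a candidate program of the given length, choosing each order
    with equal probability by partitioning [0, 1) into k equal intervals
    and selecting the order whose interval contains the random number."""
    k = len(order_set)
    program = []
    for _ in range(length):
        r = rand()                        # pseudo random number in [0, 1)
        program.append(order_set[int(r * k)])
    return program

orders = ["LOAD", "ADD", "STORE", "JUMP"]
print(generate_candidate(orders, 5))
```

Any conceivable program of the given length can appear, each with the same probability, which is exactly the uniform behavior the text specifies.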
Of more importance, however, is the sequence that the generation of these numbers may take. It will be necessary that the random numbers not appear in a sequence such that the corresponding programs do not contain certain sequences of orders. It will be necessary to continue watching for this possibility even if the compiler works, since it may be producing only certain kinds of programs.

C. The Acceptability Criterion

1. The Interpretive Routine Simulation of Computers A and B

The simulation of the A computer will be straightforward. All one really needs to know is what the output is for a given input. However, we shall provide the interpretive routine that executes the A program with one additional feature, an "address stop." That is, the programmer, or in this case perhaps some other part of the compiler, shall be able to specify, before transferring control to the A interpreter, a particular location which, if appearing in the control counter, will cause the routine to jump to some previously designated location outside of the interpretive routine itself. Any order which would otherwise cause the computer to stop would be treated in a similar fashion.

The B computer will be simulated in a like manner with the address stop feature. The B interpreter will also be equipped with a "clock" that will keep track of the running time of the program it is executing as if it were being executed directly by a real computer B. Thus, it will be possible to discriminate between two acceptable programs which have different running times. Also, by placing a limit on the length of time a program may run on interpreter B, it will have a means of discovering and stopping endless looping. The "clock" would be checked before each order were executed to see if the maximum time permitted had been exceeded.
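A sketch of the B interpreter's bookkeeping follows. The `execute_order` and `order_time` hooks are hypothetical stand-ins for the actual order-execution subroutines and timing tables:

```python
class TimeLimit(Exception):
    """Raised when the simulated running time exceeds the allowed maximum:
    the means of discovering and stopping endless looping."""

def run_with_clock(program, execute_order, order_time, max_time,
                   address_stop=None):
    """Sketch of interpreter B. Before each order is executed the "clock"
    is checked against the time limit, and an address-stop match in the
    control counter returns control to the caller."""
    clock = 0.0
    counter = 0
    state = {}
    while counter < len(program):
        if counter == address_stop:
            return clock, state            # jump out of the interpreter
        if clock > max_time:
            raise TimeLimit(clock)
        order = program[counter]
        clock += order_time(order)         # as if run on a real computer B
        counter = execute_order(order, counter, state)
    return clock, state

# Hypothetical usage: NOP advances the counter, LOOP resets it forever.
nop_time = lambda order: 1.0
step = lambda order, c, s: (0 if order == "LOOP" else c + 1)
print(run_with_clock(["NOP", "NOP", "NOP"], step, nop_time, 10.0))
try:
    run_with_clock(["LOOP"], step, nop_time, 10.0)
except TimeLimit:
    print("endless loop stopped")
```

The accumulated clock value also supports the other use mentioned in the text: discriminating between two acceptable programs with different running times.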
One final specification will be that these interpretive routines shall be in some standard form, as it is hoped that to adapt the compiler to any other two languages, it will only be necessary to put in the interpretive routines for those languages and to provide the random program generator with a list of the orders comprising the object language. This is greatly to be desired since the writing of interpretive routines themselves is a relatively simple task. It will be found in Chapter III that additional features on both of the interpretive routines may be desirable, but, for now, no additional features will be required.

2. The Use of Randomly Generated Data in Acceptability Tests

The definitions of equivalent programs in Chapter I only considered the inputs and the outputs of the two programs being compared. It would be possible, of course, to require that for two programs to be considered equivalent, they use the same algorithm. However, given a program it is not possible to recover the algorithm used, and the only alternatives are either that the routine itself be treated as the algorithm or that we consider only results. It is not desirable that the program work correctly only for certain values of the data variables. Therefore, many sets of data must be run successfully before acceptance is final. Random sets of data will be generated and input using the random number generating routine. It will of course only be necessary to use one set of data until a candidate program succeeds in producing the correct output for that one. It will also be necessary that the range of the data variables be specified along with the subject routine. The final acceptance of a program will require a certain specified number of correct runs. The procedure will be that at the completion of the running of a program, the output will be examined and compared to that of computer A.
If it is different, the candidate program will be rejected, as it would also be if a specified time on the "clock" were exceeded or if the computer hung up. If the output is found to be identical to that of computer A, then a new set of data would be generated and run through computer A with the subject program to determine the correct output, and computer B would be given the same data and the candidate program executed again. When an acceptable program is found, it may either be punched out on cards or tape or else retained and a search made for a better program in terms of word length and running time.

The compiler as it has been described would be fortunate indeed to produce a program of moderate size in the life of the machine, much less within the hour or so maximum that might be allowable. Nevertheless, aside from this difficulty, this compiler would indeed do everything else that was desired. If it were allowed to run long enough so that it could choose among the best of many acceptable programs, it would not only tailor the program to the machine involved but also in fact find many new "coding tricks" that a programmer might never stumble upon. How then can the expected time involved in a translation be cut down? Two methods will be discussed. Chapter III considers how the program generating part of the compiler may be replaced by a unit which will gradually modify the probabilities associated with the generation of programs in a manner such that they will produce acceptable programs more frequently. The remainder of this chapter deals with the possibility of breaking the subject program into parts and translating one part at a time.

D. Sectioning The Subject Program

1. Criteria for Sectioning

If it is desired to break the subject program into sections, it will be necessary that for a given section there must be a criterion for the acceptability of that section.
Also, it would be required that this criterion be such that if all the sections were correct, the entire program would also be correct. If a method of sectioning can be found and these requirements met, the compiler could then handle one section at a time in the same way it previously handled the whole program, and such a procedure might be much quicker.

The acceptance or rejection of the program described in the previous section was based on the fact that both computers had input and output devices and, furthermore, that these input and output devices were enough alike in the form in which they handled the data to make comparison easy. However, when considering a section of a program, we can no longer define correctness in these terms, because it is unlikely that input and output operations even occur in that section. The input and output devices were treated as equivalent parts of the computer. However, other parts may also be treated as equivalent, and this will make possible the development of adequate criteria for the testing of sections. Since it will be the procedure to test sections by executing them, it will be necessary to specify exactly what state the computer should be in at the termination of the execution of a section, and since this must be determined by the state of computer A at the same stage, there must be ways of expressing the state of computer A in terms of the possible states of computer B. One obvious equivalence of parts is that the memories will be analogous. It might be said that the state of A is equivalent to the state of B if their memories contain identical information. If it were required that every part of computer A have an equivalent part in B, then the normal operations of B would have to be abandoned and B made to mimic A's every move.
This indeed would be undesirable since, for example, if it were required that B be in an equivalent state to A after every order, then, even if it were a decimal machine, it would have to find a way in which it could do operations such as forming logical products, which are extremely awkward in any system other than a binary system. Therefore, equivalent parts shall only be defined when the equivalence is natural. Initially, this will mean that the memories, control counters, and input and output devices must be made to correspond.

If only equivalent parts may be examined, it must then be required that only those parts of the computer may contain information relevant to the program. This then requires that our sections be constructed so that if the state of all other parts of either computer A or B were altered at the time when tests for equivalent content were being made, there would be no interference with the correct operation of the program. An example of this would occur if we chose our section to end just prior to an operation that would clear one of the arithmetic registers. At this point, any random number might be inserted in that register without interfering with the correct operation of the program. This requirement will be used in sectioning the subject program. The subject program will be "cut" between those orders where the state of all non-equivalent parts of computer A may be altered without affecting the running of the program, and the procedure will be that all possible "cuts" will be made, random numbers inserted in those non-equivalent parts, and the program continued. Those cases where no errors are introduced will then be recorded and the program divided at those points.

The comparison of the inputs and outputs, if any, and the conditions of the control counters is fairly straightforward.
The addresses specified in the control counter would have to be compared by ascertaining whether or not the sections referred to by the two addresses are equivalent. The comparison of the two memories is more difficult. Regarding memory itself, it is clear that locations which have in them numbers that are computed functions of the data should be compared for equality. Orders which have been modified by a section must somehow be examined to see if they will perform correctly in their modified form. Also, the contents of other locations may have information which depends for its form on the order code of the particular computer, for example, "binary switches" and "dummy" orders. It is necessary then to establish the correctness of such locations by linking them with other sections so that we can conclude that if that section is acceptable, then the operation upon the location in memory which was subsequently used in that section is also correct. For each section this entails making a list of sections whose acceptance must come prior to the acceptance of that section. In some cases it may be necessary to work backwards from sections which only operate on data and may be checked immediately, through several different sections, the acceptance of each one being a necessary requisite for the acceptance of a prior one.

Admittedly, the above discussion is not complete, and there may be situations arising which we have not considered. Nevertheless, it is felt that the method itself is powerful and can probably be adapted to the new difficulties as they arise.

2. Expected Running Times

The purpose of sectioning the program is of course that it is desirable not to throw out a program completely when only a part of it is incorrect. However, though it might seem obvious that the expected running time would be reduced, this does not itself follow just because the program is being compiled in sections, as shall be seen.
Consider first the probability that an entire program generated by the random program generator is exactly a given program. If the program contains Q orders, each of which might be any of k different orders, the probability will be 1/k^Q, and the expected number of trials until it is constructed will be k^Q. Now, if the program is sectioned into g parts of lengths m_1, m_2, ..., m_g, then the probability of a particular candidate section being correct is 1/k^(m_i), the expected number of trials for that section is k^(m_i), and the expected number of trials for the entire program is

Σ (i = 1 to g) k^(m_i).

However, this is not the situation; in fact, there are many acceptable programs. If there are c acceptable programs of length Q, then the expected number of trials to hit one of them will be k^Q/c; and if the program is divided into g sections with lengths m_1, m_2, ..., m_g, and there are c_i acceptable ways of writing the ith section, then the expected number of trials to translate the entire program will be

Σ (i = 1 to g) k^(m_i)/c_i.

If the product c_1 c_2 ... c_g = c, then it could be concluded that the expected number of trials for the procedure using sectioning is much less than that for the other one, provided that the minimum expected number of trials for any section is greater than g^(1/(g-1)). For g > 1 we have 1 < g^(1/(g-1)) <= 2, and the expected number of trials for a section will never become that low. However, by requiring the program to achieve certain criteria at many points during its execution, many programs that would satisfy the criterion of identical outputs will be rejected. Therefore, in general the product of the c_i is less than c. There is then no a priori method of saying conclusively that sectioning will lower the expected number of trials, although it seems that it would except in unusual circumstances. The point is, however, that it might at some point be possible to pool some of the sections together and reduce expected running times.
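The comparison of expected trial counts above is easy to evaluate numerically. The sketch below simply computes the two expressions, k^Q/c for whole-program generation and the sum of k^(m_i)/c_i for sectioned generation; the numbers in the usage example are invented for illustration.

```python
def expected_trials_whole(k, Q, c):
    """Expected random-generation trials for a whole program of Q orders,
    with k alternatives per order and c acceptable programs: k**Q / c."""
    return k ** Q / c

def expected_trials_sectioned(k, lengths, counts):
    """Expected trials with sectioning: the sum of k**m_i / c_i over
    sections of lengths m_i with c_i acceptable forms each."""
    return sum(k ** m / c for m, c in zip(lengths, counts))
```

For instance, with k = 10, a six-order program having a single acceptable form costs 10^6 expected trials generated whole, but only 300 when split into three two-order sections each with one acceptable form, which is the reduction the argument above anticipates when the product of the c_i equals c.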
In concluding this chapter, then, we shall summarize by saying that a method of dividing the translation of the subject program into parts has been suggested. Criteria for judging the acceptability of these parts have been specified, and consideration has been given to the question of how the sectioning of the subject program might affect the expected running time.

CHAPTER III

THE ADAPTION OF LEARNING MODELS TO THE COMPUTER

A. The Alteration of the Random Program Generator to Permit Learning

In Chapter II the random program generator selected each of the k possible alternatives at each step with equal probabilities. It is entirely possible to allow the orders to be selected with other probabilities. Given a set {P} of probability vectors P = (p_1, p_2, ..., p_k) such that Σ p_i = 1, where p_i is to be associated with order O_i, the selection of an order may be made in the following manner. Given a pseudo-random number R such that 0 < R <= 1, and a vector P, then O_j will be selected when

p_1 + ... + p_(j-1) < R <= p_1 + ... + p_j.

If R has a rectangular distribution, as it should have, then O_j will be selected with probability p_j. A specific selection of an order shall constitute a response, and the set of probability numbers (p_1, p_2, ..., p_k) shall be abbreviated P.

Intuitively, it is felt that some P's will be more satisfactory than others, and a method must be found to arrive at some "best" one. First, some measure of performance is needed. The measure might include not only how often a particular P produces an acceptable program, but also whether the program is concise and has a short running time. However, it will be more convenient at present only to consider U(P) = Pr{a candidate program using P is acceptable}. It would then be desirable to find the maximum of this function where P ranges over the set {P}, and use the corresponding P. At present there is no information about this function. However, a guess might be made that it is continuous but has several modes.
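The selection rule just given, under which O_j is chosen when the running sum of the p_i first reaches the random number R, is a cumulative draw and can be sketched directly. This is a minimal illustration, not code from the thesis; the fallback on the last index guards only against floating-point shortfall in the cumulative sum.

```python
import random

def select_order(p, r=None):
    """Select an order index from probability vector p: order j is
    chosen when sum(p[:j]) < r <= sum(p[:j+1]), with r drawn uniformly
    when not supplied."""
    r = random.random() if r is None else r
    cum = 0.0
    for j, pj in enumerate(p):
        cum += pj
        if r <= cum:
            return j
    return len(p) - 1   # guard against floating-point shortfall
```

Because the intervals partition (0, 1], a rectangular R lands in the jth interval with probability exactly p_j, which is the property the text requires.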
It shall be assumed, however, that it has but one, the hope being that if any of the modes is discovered, a fairly satisfactory selection of orders will ensue. One method that might be used would be to start out with the p_i's equal. Then, as in all of the methods discussed in this chapter, there will be a modification of the p_i's whenever an acceptable program is achieved, or, if sections are being dealt with, an acceptable section. Given that a particular order O_c were in the acceptable section, the current set of p_i's would be replaced by

(p_1 - p_1/d, p_2 - p_2/d, ..., p_c + (1 - p_c)/d, ..., p_k - p_k/d),

where d >= 1 is a parameter that reflects the magnitude of the desired change; the components of the new vector still sum to one. This procedure would be repeated for each order in the acceptable section. The expected value of P should approach the value at which U(P) has its mode. It will be desirable to let d be a function of the number of modifications already made, so that as the modal value is approached, the variance about it will be decreased. This procedure might be called learning because it permits an increment in performance to occur as a result of the "experience" of the compiler.

B. The Stimulus Situation

Consider now the situation of the response selector at a given point in its selection of orders for a candidate section. Previously, the P's used for the selection of each order in the program have been identical. However, if P_y is the set of probability vectors that might be associated with the selection of a particular response y, it would seem that max U(P_y) will occur at different values of P for different values of y. In order to devise a method for obtaining these different values of P, there must first be a method of classifying the y's. It may seem at first that y could be classified according to its location alone in the program, but this would not help a great deal. It is clear that most of the deviation of P_y from P may be explained in terms of y's relation to the orders around it in the program.
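The probability-modification step of the preceding section takes only a few lines. One plausible reading of the (partly garbled) update in the original is that every component shrinks by p_i/d while the accepted order O_c gains (1 - p_c)/d, which keeps the vector summing to one; that reading is this sketch's assumption.

```python
def reinforce(p, c, d=4.0):
    """Shift probability toward accepted order index c: each component
    loses p_i/d, and the accepted order's net change is +(1 - p_c)/d,
    so the vector still sums to one.  d >= 1 sets the step size; larger
    d means gentler steps, and d = 1 drives p_c straight to 1."""
    assert d >= 1
    q = [pi - pi / d for pi in p]
    q[c] += 1.0 / d          # net change for c: (1 - p_c)/d
    return q
```

Repeated application after each acceptable section concentrates probability on orders that keep appearing in accepted programs, and growing d over time shrinks the step size as the text suggests.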
For example, P_y should certainly depend on the selection that has already been made for (y-1). This can be treated as if max P_y (that value of P at which U(P) has its maximum) and (y-1) formed a bivariate distribution, where (y-1) is a discrete variable that can take on any of k values. In general, not only will max P_y depend on (y-1) but also on (y-2), (y-3), and in fact on many other variables that might not even be defined in the candidate program.

Other variables that should be considered are the particular variables that make up the subject section and the contents of certain locations in both of the interpretive routines used. While the relationship of the subject orders might be obvious, that of the interpretive routines is probably not. The interpretive routines do contain variables in the sense that (y-1) is a variable; for example, the "left-right counter" necessary in an interpretive routine simulating MISTIC, which has two orders stored in one location in memory. In fact, it might be suspected that since a human being can know all there is to know of importance to programming about a computer by examining an interpretive routine simulating it, a great deal might indeed be gained by considering the relationship between max P_y and some of these variables.

The reason for the concern with the dependence of max P_y on all these variables is that prior to the selecting of the value of y, the particular values that these other variables have taken will be known; and if the relationship is also known, a better P_y may be used. Certain definitions follow naturally. If X = (x_1, x_2, ..., x_n) is a set of variables whose values may be determined prior to the selection of order y, and S = (x_1i, x_2i, ..., x_ni) is a set of particular values for X, then max P_y.S shall mean that value of {P} for which U(P_y.S) achieves its maximum, where U(P_y.S) = Pr{a response chosen using P_y.S is acceptable given that X = S}.
S is called the stimulus vector, and x_1, x_2, ..., x_n stimulus variables. The execution of the previous candidate program, if it failed, may also prove of interest, since if the routine "hung up," the last order executed may well be the one at fault. It is seen then that there are many variables, defined by their particular relationship to y, that might be considered, some of which are easily accessible at the time of selecting y, while others, such as the orders themselves comprising the comparable sections, may have been overwritten before that time. Although no more consideration shall be given to the more obscure variables, it might be mentioned how one might go about saving them. The most obvious method would be to provide in the interpretive routines themselves for the storage of the values of some of these variables in some inviolable location. This would be wise to do for those variables which we can a priori guess will be desired. However, since all possible stimulus variables cannot be considered, it is hoped that some continuous selection and rejection of just which ones shall be considered can be engaged in. This might involve the use of blocking orders or the use of an interpretive routine to execute the other interpretive routines themselves.

C. The Selection of the Response

Considering again the selection of the value of y, it is desired to find max U(P_y.S). If the best P_y.S were determined independently for each possible S, if d = 1 (that is, if p_c becomes 1 when c is the correct response for the specified S), and if all possible stimulus variables were included, the scheme would have these characteristics. It would classify completely all the correct responses corresponding to the given stimulus vectors. But the same situation, that is S, would never be encountered twice unless the compiler were again asked to translate the same program.
Therefore, it would never be any better off than a scheme which took no cognizance of the stimulus situation. Of course, this method is impossible because of the space requirements. Restricting the number of stimulus variables might be one way out. However, if each such variable could take on the average 20 values, the p's were expressed to five binary digits, and k = 20, the 400,000 words of 40 bits necessary to store the p's for just four variables would be prohibitive.

D. A Linear Model

Clearly the trouble lies at least partly in allowing the max P_y.S to be freely determined for every S. While it is true that in general the effect of one stimulus variable x_j on max P_y.S will depend upon the values of the other stimulus variables, it is not necessarily the case that the effect of x_j on max P_y.S will change for every change in the value of any stimulus variable. A method of determining max P_y.S given S that ignores the interaction among stimulus variables might be developed along lines that parallel linear regression. The model assumes that max P_y.S is a linear combination of the max P_y.x_ji, where P_y.x_ji represents the probability vector associated with the ith of the possible values of the stimulus variable x_j. Given then an S = (x_1i, x_2i, ..., x_ni), the value P_y.S to be used in the selection of y would be

P_y.S = b_1(max P_y.x_1i) + b_2(max P_y.x_2i) + ... + b_n(max P_y.x_ni).

The max P_y.x_ji's would be estimated in the standard way; P_y.x_ji would be modified to increase p_c when O_c was an acceptable order occurring at y while x_j had value i. The weight b_j should be increased as the individual probabilities within {P_y.x_ji} tend away from max P (max P taken in the sense of the last section). This could be accomplished by letting b_j increase with Σ (over h) (P_y.x_jih - P_h)^2, where P_y.x_jih is the hth component of P_y.x_ji and P_h is the hth component of max P. Likewise, b_j should be decreased as x_j shows increasing correlation with the other variables of X.
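The linear combination above can be sketched in a few lines, with one added step: since a weighted sum of probability vectors with arbitrary b_j's need not itself sum to one, this sketch renormalizes the result. The renormalization, the weights, and the vectors in the example are all this sketch's assumptions rather than the thesis's fitted values.

```python
def combine(per_variable_P, b):
    """Linear-model sketch: blend the per-stimulus-variable probability
    vectors P_y.x_ji (one per variable, chosen by the value observed in
    S) with weights b_j, then renormalize so the result is again a
    probability vector over the k orders."""
    k = len(per_variable_P[0])
    raw = [sum(bj * P[h] for bj, P in zip(b, per_variable_P))
           for h in range(k)]
    total = sum(raw)
    return [x / total for x in raw]
```

Storage now grows with the sum, not the product, of the variables' value counts, which is what makes the regression-like model tractable where the full lookup table was not.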
It is suggested that a measure of how much b_j should be lowered because of its correlation with other variables might be some function of the similarity of the particular P_y.x_ji to the other n-1 vectors determined by the particular S.

This model might be made even more flexible if each P_y.x_ji had its own b_ji. This would permit "differential validity" within a particular stimulus variable, but it would also necessitate some correction factor, since a combination of such vectors would not in general yield a probability vector. A method for the selection and rejection of stimulus variables would follow naturally: at periodic intervals the x_j whose b_j (or b_ji's) were lowest would be discarded, and a new one selected at random from the remaining variables not currently represented in the stimulus variable set.

E. A Pattern Analytic Model

Another method of relating S to P_y is still in the process of development. A brief description will be given, however, because it is felt to be superior to the above method and also to parallel more closely the functioning of organisms. It is suggested that a programmer's tentative choice of an order is determined by the relation of the particular S to certain patterns around which his perceptions are organized. An analogous scheme for the computer would work in the following manner. Given an S whose stimulus variables each may take only the values 1 and 0, this scheme would classify S according to its similarity to each of many patterns A_1, A_2, ..., A_t (populations), where A_i is defined by a set of values a_1i, a_2i, ..., a_ni with a_ji = Pr{x_j = 1 | A_i}, x_j being the usual stimulus variable.
The similarity of a given S to A_i would be measured by some monotonically increasing function of the likelihood of S given A_i, say S_S.A_i. Each A_i would have an associated P_y.A_i, and

P_y.S = Σ (i = 1 to t) S_S.A_i P_y.A_i.

Such a system would allow certain combinations of stimulus variables to have an effect on P_y entirely the reverse of the separate effect of either in the absence of the other. The problem of determining both the best set of A_i's and their corresponding P_y.A_i's is considerably more difficult than in the previously described scheme, and we have in mind no complete method for their determination, since it is not even clear to us just what kind of criterion we can apply to the selection of the A's. It might be said that the modification of the A's and P_y.A_i's would in general follow these rules.

1) The extent to which each pattern A_i should be modified to resemble a given S, at the point at which an acceptable response has been determined, should be:
a) A direct function of S_S.A_i.
b) An inverse function of the number of other patterns which are more similar.
c) An inverse function of the probability of obtaining the acceptable response given P_y.A_i.

2) The extent to which each P_y.A_i should be modified should be:
a) Primarily a direct function of S_S.A_i.
b) An inverse function of the number of other patterns whose S_S.A_i's, and whose likelihoods of the acceptable response given their P_y.A_i's, are also greater.
c) A direct function of the likelihood of the correct response given P_y.A_i.

F. Conclusion

Still other methods have been considered, but they all seem either clumsy or to offer little hope for any fast improvement. However, if there were a scheme that required an extraordinarily large number of acceptable responses to be given it before it could begin to operate efficiently, it might be possible to combine it with another scheme in such a fashion that as it became efficient it would gradually replace the other scheme.
This would be a kind of evolution within the compiler. At this point it seems that the best course of action will be actually to try to write a compiler that embodies the principle of sectioning and a learning model similar to one of those described. Since the majority of the ideas in the development of the compiler described in this thesis have arisen with less concern for finding a "best way" than for finding "any way," it might prove that if no compiler of this sort were attempted until the theory were complete, a great deal would indeed be lost. However, this does not mean, of course, that we should be satisfied with a scheme just because it works, since only by further developing the theory will it be possible to increase the learning and generalizing ability of machines.

It should be mentioned that the scheme which we have conceived of primarily as a compiler might be adaptable to many other problems. Consider, for example, the playing of games such as tic-tac-toe or checkers. The latter would entail developing considerably more sophisticated criteria, since in checkers it would not be desirable to classify moves as only "acceptable" or "unacceptable." It might be found that a system for classifying the results of moves could be constructed using a pattern analytic method similar to the one developed for responding to stimuli. Since the response selector of the compiler responds variably to a particular stimulus pattern, it should be capable of learning to play games of mixed strategy. The use of the pattern classification scheme in the construction of human language translation machines should also be investigated; the recurring patterns of grammar in two languages are similar to the patterns that exist in the two machine languages.
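Before concluding, the similarity-weighted mixture of Section E can be given a minimal sketch. Here the likelihood of S given A_i is stood in for by the fraction of agreeing 0/1 components, a deliberate simplification, since the thesis leaves the similarity function open; the normalization of the similarity scores into weights is likewise this sketch's assumption.

```python
def pattern_mix(S, patterns, P_vectors, eps=1e-6):
    """Pattern-analytic sketch: score the 0/1 stimulus vector S against
    each pattern A_i by the fraction of agreeing components (a stand-in
    for the likelihood of S given A_i), normalize the scores into
    weights, and return the weighted mixture of the patterns'
    probability vectors P_y.A_i."""
    sims = [sum(1 for s, a in zip(S, A) if s == a) / len(S) + eps
            for A in patterns]
    total = sum(sims)
    weights = [w / total for w in sims]
    k = len(P_vectors[0])
    return [sum(w * P[h] for w, P in zip(weights, P_vectors))
            for h in range(k)]
```

Because every pattern contributes in proportion to its similarity, a stimulus close to two patterns at once yields a blend of their response tendencies, which is exactly the interaction effect the lookup table and the linear model could not express.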
In conclusion, it is our hope that we have suggested a scheme which will lead to the development of similar schemes of higher sophistication that will be able increasingly to invade those areas of problem solving that have hitherto been reserved for human beings.