INFORMATION-ENTROPY CONCEPTS FOR NUTRITIONAL SYSTEMS
Dissertation for the Degree of Ph.D.
MICHIGAN STATE UNIVERSITY
JEROME PAUL HARPER
1976
This is to certify that the
thesis entitled
Information—Entropy Concepts for Nutritional Systems
presented by
Jerome Paul Harper
has been accepted towards fulfillment
of the requirements for
Ph. D. degree in Agricultural Engineering
Major professor
ABSTRACT
INFORMATION-ENTROPY CONCEPTS
FOR NUTRITIONAL SYSTEMS
BY
Jerome Paul Harper
The objective of this dissertation is to view
nutritive processes as communication systems for trans—
mitting dietary nutritional information. This study
utilizes information theory concepts in the analysis of
nutritive communication systems. The concept of
information-entropy is used to derive the information
capacity of the system's dietary inputs and metabolic
requirements.
The major process investigated is the system trans-
mitting information as amino acids for protein metabolism.
First, a gene-protein channel is defined and hypothesized
to be the determinant of the metabolic information—
entropy requirements for amino acids. A nutritive
communication system is then postulated which contains
five basic components: (1) information source (food
protein), (2) encoder (intestinal amino acid transport
system), (3) channel (circulatory system), (4) decoder
(cellular amino acid transport system), and (5) receiver
(cellular amino acid pool). The transmission capacity of
amino acid information depends upon cellular metabolic
requirements which control the decoding capacity, and
thus the overall transmission efficiency. Cost of trans-
mission is defined as the ability of an information
source to satisfy metabolic requirements during a fixed
time period. A familiar rank-frequency distribution of
information theory, Zipf's law, is employed to order
source proteins on the basis of metabolic cost. Net
protein value is shown to be proportional to the inverse
protein rank (quality). A single channel model yields a
protein ranking similar to chemical score, while the
multichannel model generates a ranking similar to Oser's
essential amino acid index. The multichannel model could
be adapted to consider amino acid catabolism by the liver
(an important loss of information in the channel), and
predict a new protein ranking termed the "essential amino
acid retention index."
The other study concerns the information-entropy
of carbohydrate polymers. The hydrolysis of these poly-
mers is regarded as a metabolic encoding process. The
dietary carbohydrate message has to be reduced to the
monomer or dimer form if it is to be transmitted through
the nutritional channel (i.e., circulatory system). The
cost of encoding (time/monomer) is equated with the
inverse activity of enzymatic hydrolysis (monomers/time).
The ranking of carbohydrate message length (degree of
polymerization) with respect to the rate of encoding
(hydrolytic activity) is shown to be identical to the
ordering scheme dictated by Zipf's law.
Approved
Major Professor
Approved
Department Chairman
INFORMATION-ENTROPY CONCEPTS
FOR NUTRITIONAL SYSTEMS
BY
Jerome Paul Harper
A DISSERTATION
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
DOCTOR OF PHILOSOPHY
Department of Agricultural Engineering
1976
To Christine
ACKNOWLEDGMENTS
I wish to thank my major professor Dr. J. B.
Gerrish for his guidance and assistance on this disser-
tation and during my doctoral candidacy.
Also, I wish to thank the members of my com-
mittee, Dr. M. Z. v. Krzywoblocki, Dr. J. W. Thomas,
Dr. D. K. Anderson, and Dr. J. B. Holtman for their
thoughts and efforts in the preparation of my dissertation.
In particular, I want to acknowledge Dr. Krzywoblocki,
"Ziggy" to his friends and students, for his tutelage in
the field of information theory.
TABLE OF CONTENTS

                                                   Page
LIST OF TABLES . . . . . . . . . . . . . . . . .    vi
LIST OF FIGURES  . . . . . . . . . . . . . . . .    ix
LIST OF TERMS  . . . . . . . . . . . . . . . . .     x

Chapter
   I. INTRODUCTION
  II. LITERATURE REVIEW AND PERSPECTIVE
      2.1 Some Thermodynamic Aspects of Entropy
      2.2 Entropy and Information Theory
      2.3 Entropy, Information and Biology
      2.4 Entropy, Information and Nutrition
 III. INFORMATION AND THE QUALITY OF PROTEINS
      3.1 Indices of Protein Quality . . . . . .    44
      3.2 An Information-Entropy Model of
          Protein Quality  . . . . . . . . . . .    54
      3.3 Analysis of Information-Entropy
          Approach . . . . . . . . . . . . . . .    77
  IV. INFORMATION AND THE HYDROLYSIS OF
      CARBOHYDRATE POLYMERS  . . . . . . . . . .    94
      4.1 Aspects of Carbohydrate Structure,
          Hydrolysis and Metabolism  . . . . . .    94
      4.2 An Encoding Model for Carbohydrate
          Information  . . . . . . . . . . . . .    98
      4.3 Assessment of the Carbohydrate
          Information-Entropy Analysis . . . . .   107
   V. DISCUSSION . . . . . . . . . . . . . . . .   122
      5.1 Nitrogen Retention and Information-
          Entropy  . . . . . . . . . . . . . . .   122
      5.2 The Information-Entropy Model for
          Protein Metabolism: Summary  . . . . .   133
      5.3 Information-Entropy and Polymers:
          An Appraisal . . . . . . . . . . . . .   135
  VI. CONCLUSIONS  . . . . . . . . . . . . . . .   138
REFERENCES . . . . . . . . . . . . . . . . . . .   139
LIST OF TABLES

Table                                              Page
3.1.1  Listings of Biological Value, Net Pro-
       tein Utilization, Net Protein Value,
       and Protein Efficiency Ratio Scores
       with their Respective Rankings
       (Source: FAO)  . . . . . . . . . . . . .    49
3.1.2  Matrix of Correlation Coefficients
       Relating Biological Value, Net Protein
       Utilization, Net Protein Value, and
       Protein Efficiency Ratio Scores  . . . .    50
3.1.3  Matrix of Correlation Coefficients
       Relating Biological Value, Net Protein
       Utilization, Net Protein Value, and
       Protein Efficiency Ratio Ranks (ps)  . .    50
3.3.1  Amino Acid Content (micromoles per gram
       N) of Food Proteins (Source: Eggum)  . .    80
3.3.2  Amino Acid Content (micromoles per gram
       N) of Food Proteins (Source: FAO)  . . .    81
3.3.3  Biological Values and Net Protein Values
       of Sixteen Test Diets of Rats and Baby
       Pigs (Source: Eggum) . . . . . . . . . .    83
3.3.4  Information-Entropy Indices for Sixteen
       Different Food Proteins  . . . . . . . .    84
3.3.5  Matrix of Correlation Coefficients for
       Information-Entropy Measures Versus Net
       Protein Values and Biological Values of
       Rats and Baby Pigs (based on amino acid
       content of dietary protein)  . . . . . .    85
3.3.6  Matrix of Spearman's Rank Correlation
       Coefficients for Ranks of Information-
       Entropy Measures Versus Net Protein
       Values and Biological Values of Rats
       and Baby Pigs (based on amino acid
       content of dietary protein)  . . . . . .    86
3.3.7  Matrix of Correlation Coefficients for
       Information-Entropy Measures Versus Net
       Protein Values and Biological Values of
       Rats and Baby Pigs (based on available
       amino acid content of dietary protein) .    88
3.3.8  Matrix of Spearman's Rank Correlation
       Coefficients for Ranks of Information-
       Entropy Measures Versus Net Protein
       Values and Biological Values of Rats
       and Baby Pigs (based on available amino
       acid content of dietary protein) . . . .    88
3.3.9  Matrix of Correlation Coefficients for
       Zipfian (log-log) Analysis of
       Information-Entropy Model  . . . . . . .    90
3.3.10 Matrix of Slopes of Zipfian (log-log)
       Analysis of Information-Entropy Model  .    91
       Degree of Polymerization and Enzyme
       Kinetic Data of Amylose  . . . . . . . .   111
       Degree of Polymerization and Activity
       Data of Cellulose, with activity in
       (moles/sec.) x 10^-9 . . . . . . . . . .   111
       Correlation and Regression Analysis of
       Activity Data Versus Degree of
       Polymerization . . . . . . . . . . . . .   112
       Hydrolysis of Amylose Polymers with
       β-Amylase, and Degree of Polymeriza-
       tion . . . . . . . . . . . . . . . . . .   115
       Correlation and Regression Analysis of
       Hydrolysis and Degree of Polymeriza-
       tion . . . . . . . . . . . . . . . . . .   115
       Chain Length Fractionalization of a
       Polydisperse Carbohydrate System as a
       Function of Its Degradation  . . . . . .   119
       Chain Length Behavior of Polydisperse
       Carbohydrate Systems . . . . . . . . . .   120
5.1.1  Correlation Coefficients Among Essential
       Amino Acid Retention Indices and
       Experimental Protein Values of Rats
       and Pigs . . . . . . . . . . . . . . . .   130
       Linear Regression Coefficients Among
       the Essential Amino Acid Retention
       Indices and Experimental Net Protein
       Values of Rats and Pigs  . . . . . . . .   130
       Slopes for Regression Analysis Among
       Essential Amino Acid Indices and
       Experimental Net Protein Values of
       Rats and Pigs  . . . . . . . . . . . . .   130
LIST OF FIGURES

Figure                                             Page
3.2.1  Graphical Representation of Transcrip-
       tion of Genetic Information in Protein
       Synthesis  . . . . . . . . . . . . . . .    56
       Idealized Communication System for
       Transmission of Genetic Information  . .    59
       Idealized Communication System for
       the Transmission of Amino Acid
       Informational Molecules  . . . . . . . .    64
       Hx(EAA), the Average Information-
       Entropy, Versus Zipfian-Rank-Ordering
       for Net Protein Value for Rats . . . . .    92
       The Degree of Polymerization Versus
       Enzymatic Activity for β-Amylase . . . .   109
       The Degree of Polymerization Versus
       Enzymatic Activity for Cellulose A
       (Penicillium Notatum)  . . . . . . . . .   110
       Graph of Information-Entropy Indices
       Ix(EAARe) and Ix(EAAe) Versus Net Pro-
       tein Value for Rats (Source: Eggum)  . .   131
       Graph of Information-Entropy Indices
       Ix(EAARe) and Ix(EAAe) Versus Net Pro-
       tein Value for Pigs (Source: Eggum)  . .   132
LIST OF TERMS
B
b; b'
Canonical normalization constant
Constants
Total enzyme activity
Enzymatic activity for reaction
with one substrate bond
jth amino acid
Maximum jth amino acid frequency,
magnitude of amino acid variable
at receiver
Absolute ranking of jth amino acid
Frequency of jth amino acid in
proteins x and s
Body nitrogen of protein-fed
animals
Constant
Constants
Body nitrogen of non-protein-fed
animals
Biological value
Biological value of jth amino acid
Channel capacity of a system
Total costs of message
Cost of ith or jth symbol or word
Capacity of jth channel at source
and receiver, respectively
D(sj); D(xj)
DP
DPj
d(x,t)
e
e-subscript
E
EAA
EAAI
EFj
F
f-subscript
Cost of source aa. and receiver
aaj, respectively
Total cost of message
Chemical score
Concentration of protein in diet
Digestibility
Hydrolysis coefficient
Digestibility of protein
Degree of polymerization
Length of message
Distribution function
Energy unit
Denotes Eggum as data source
Macroscopic energy
Essential amino acid set
Essential amino acid index
Replica energy in canonical
ensemble
Encoding efficiency
Fecal nitrogen
Denotes FAO as data source
Catabolism factor
Endogenous fecal nitrogen
Conservation factor
Boltzmann H-function
Information of jth symbol
Ix(CS)
Ixo(EAAR); Ix(EAAR)
Ixo(NPV); Ix(NPV)
Maximum of H-function
nth order multivariate information
capacity
Independent nth order multivariate
information capacity
Independent bivariate information
capacity
Bivariate information capacity
Entropy of receiver
Information-entropy content of x
Characteristic information-entropy
of protein
Total information-entropy of jth
variable
Average information-entropy of
essential amino acid set
Channel transmission rate at source
jth channel transmission rate
Stored information
Univariate and bivariate divergence
Information index
Essential amino acid retention
index
Information-entropy indices
Constant for converting ln to log2
Boltzmann's constant
Km
m(DPj)
min Hs(aaj)
min Hx(aaj)
NPU
NPV
NPV(Xj); NPV(Sj)
NPVx(EAA); NPVs(EAA)
Michaelis-Menten constant
Frequency of messages with length DPj
Log frequency of Ix(CS) for standard protein
Log frequency of Ix(CS) for protein s
Chemical potential of jth particle
Total number of particles or words
Total number of words of cost cj
Total number of glucose units in
source
Number of glucose monomers in
system
Nitrogen intake
Nitrogen intake of non-protein-fed
animals
Net protein utilization
Net protein value
Net protein value of jth amino acid
Log-averages of net protein values
Number of jth particles
Permutability factor
Canonical probability function
Protein efficiency ratio
Grand canonical probability function
P(i); P(j)
P(ij)
Pj
P(j/i)
Pmc
Probability of ith or jth element
Joint probability of ith and jth
elements
Probability of jth state
Conditional probability of jth
state
Microcanonical probability function
Heat
Redundancy
Spearman's rho
Relative rank of jth amino acid in
protein x
Rank of protein x
Entropy
Substrate concentration
Statistical entropy of macrosystem
Absolute temperature
Time
Urinary nitrogen
Endogenous urinary nitrogen
Reaction velocity
Maximum reaction velocity
Partition function
CHAPTER I
INTRODUCTION
The title of this dissertation contains a compound
word, "information-entropy." This word carries with it
two different and distinct concepts which together repre—
sent somewhat of a union of information theory and ther-
modynamics. This union can be viewed as generalizing the
study of thermodynamics.
My notion of a generalized theory of thermo-
dynamics revolves about a simple hypothesis on the
occurrence of events. All events occur more or less
frequently for two reasons: (a) There exist physical
reasons favoring certain states, and (b) there exist
some mental reasons favoring certain states. Therefore,
if one is to interpret phenomena, a theory, methodology,
or principle is needed to quantify the frequency of
physical phenomena, and understand what quantification
means.
In science, the quantification of observable phe—
nomena is accomplished by calculating the entropy of the
system. In fact, the Second Law of Thermodynamics
implies that systems which cannot be quantified because
their behavior is so random have no utility. It was the
purpose of men such as Boltzmann, with his permutability
factor, and Gibbs, with his interpretation of the
behavior of volume in phase space, to develop within the
science of thermodynamics the ability to quantify nature.
Given that entropy is a measure which strives to
quantify, how does it relate to information theory?
Information theory is the science of quantification.
Using its concepts and applying its theorems allows us to
understand how we are quantifying a system. This is why
I will later stress the importance of Zipf's law, an
empirical rank-size rule, in its information theory con-
text of frequency-cost relationships. The empirical
observation of Zipfian behavior is of little benefit if
one does not recognize that such behavior depicts a sys-
tem's organization. It is, then, no mere coincidence that
thermodynamic measures of entropy and the communication
measure of information are similar.
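The rank-size behavior invoked here can be sketched numerically. The following minimal Python check (the frequency table is an ideal Zipfian one, assumed purely for illustration) verifies the signature of Zipf's law: log frequency falls linearly with log rank, with slope -1.

```python
import math

def zipf_frequencies(n_ranks, c=1000.0):
    """Ideal Zipfian frequencies: f(r) = C / r for rank r."""
    return [c / r for r in range(1, n_ranks + 1)]

def loglog_slope(freqs):
    """Least-squares slope of log f versus log rank."""
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# For exactly Zipfian data the log-log plot is a straight line of slope -1.
slope = loglog_slope(zipf_frequencies(50))
```

Empirical rank-frequency data would of course scatter about this line; the slope near -1 is what marks a system as Zipfian.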
Because it is important to quantify phenomena, I
have sought in my thesis to develop an information-
entropy methodology for analyzing nutritional systems.
Frequency, pattern, and organization are important con-
cepts in nutrition and a proper format should be devel-
oped for quantifying them. My conceptualization of
entropy in thermodynamics is that of quantification.
There is little difference between the meanings of
"entropy" in thermodynamics and "information" in infor-
mation theory. Both strive to achieve the same end,
quantification, which is the deeper meaning of entropy.
The quantification of nutritional systems is
begun by assuming the organism under study exists as a
biological information processing system. The informa-
tion being processed here is nutritional information.
The source of nutritional information for the organism is
the diet, which contains a vast array of different nutri—
tional signals, each signal varying in its frequency of
occurrence and each diet providing a distinctive pattern
of signals. This information is then fed into a highly
organized biological communication system which distrib—
utes and integrates the nutritional information to provide
the necessary chemical order (nutrients) for the continu-
ance of the organism's metabolic processes.
The nutritive system which best fits into the
above sequence of events is the protein metabolic system.
An extensive study of this system will be presented in
the text, and the relationships between the frequencies
of nutritional information (amino acids) and different
nutritional frequency patterns of proteins can be used
as a measure of the protein quality of the diet.
The carbohydrate study measures the information
or entropy of macromolecules. The amount of information
which they possess is based upon the frequency of
nutritional signals (glucose units) per molecule. The
ability of the carbohydrate message to be interpreted
(hydrolyzed) by the organism's enzymes depends on mes-
sage length.
The concept of "information—entropy" as used in
this study is thus defined as "the frequency of a nutri-
tional event" (e.g., the occurrence of an amino acid in
the diet or a glucose unit in a polymer). This nutritional
frequency is then shown to favor a particular metabolic
state.
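A minimal numerical sketch of this definition (the diets and occurrence counts below are hypothetical, chosen only for illustration): the information-entropy of a frequency table of nutritional events is simply -Σ p ln p over the event probabilities.

```python
import math

def information_entropy(counts):
    """Information-entropy H = -sum p_j ln p_j of an event-frequency table."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total)
                for c in counts.values() if c > 0)

# Hypothetical amino-acid occurrence counts in two diets.
uniform_diet = {aa: 10 for aa in ["lys", "met", "thr", "trp"]}
skewed_diet = {"lys": 31, "met": 5, "thr": 3, "trp": 1}

h_uniform = information_entropy(uniform_diet)  # ln 4, the maximum for 4 signals
h_skewed = information_entropy(skewed_diet)    # lower: a more predictable pattern
```

A diet whose nutritional signals occur with equal frequency carries the maximum information-entropy; a skewed pattern carries less.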
CHAPTER II
LITERATURE REVIEW AND PERSPECTIVE
The relationship between entropy, information and
biology is complex. The following chapter presents
classical and statistical mechanical ideas on entropy,
their relationships to information, and the role of
information—entropy in biological systems. This chapter
is partially an historical review and partially a com-
mentary on the subject. Its purpose is to provide the
reader with a perspective on both subjects' scientific
aspects and my own conceptualization of the interrela—
tionships among entropy, information and biology.
2.1 Some Thermodynamic Aspects
of Entropy
Perhaps the best way to present the concept of
information-entropy is to begin with the development of
the entropy principle in thermodynamics. Early ideas
on heat were based on the study of steam engines, and
Sadi Carnot's (1) notes in 1824 on the efficiencies of
these engines are often regarded as the starting point
of thermodynamics. The concept of entropy was intimately
associated with views on the nature of energy, which was
thought to possess one of two qualities: (1) to be free
and available to do mechanical work, or (2) to be bound
and incapable of mechanical work. A qualitative degrada-
tion of energy from the free to the bound form was invari-
ably observed to occur, and several rules were formulated
to describe this phenomenon.
Clausius (2) stated, "Heat can never, of itself,
flow from a colder to a hotter temperature." Thomson's
position, on the other hand, was, "It is impossible to
derive mechanical effect from any portion of matter by
cooling it below the temperature of the coldest sur-
roundings" (3).
These statements are similar, and the principle
of physics which was derived from them is known as the
Second Law of Thermodynamics. Clausius presented in
1865 the classic formulation of this law:
The entropy of the universe at all times
moves toward a maximum.
The sense of entropy here is like the notion of bound
or latent energy with the added constraint of being
quantified at a particular temperature. The following
equation shows the relationship (4):
Entropy = Bound energy / Absolute temperature . (2.1.1)
A refinement of the above relationship was
obtained from analysis of the behavior of an elementary
heat engine in the Carnot cycle. It was reasoned that
if heat, q, were allowed in or out of the cycle only in
infinitesimally small increments, then q could be
approximated by its differential form, dq. Utilizing
this differential form the Carnot cycle heat behavior can
be described by the following integral form:
∮ dq/T = 0 (reversible), (2.1.2)
where T is the absolute temperature. The above is an
interesting mathematical form, for it is an equation which
implies that one has uncovered an exact differential mea-
sure, another state variable, different from energy, which
describes a system's thermodynamic behavior as the state
changes from a to b. The new state function was entropy,
S, and was formally defined:
dS = dq/T (for a reversible process) (2.1.3)
A mathematical approach for the derivation of caloric
entropy proposed by Caratheodory (5) verified that dq/T
is an exact differential only if the process is reversible.
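The closed-loop property of dq/T, equation (2.1.2), can be checked numerically. The sketch below assumes an ideal monatomic gas; the temperatures and volumes are arbitrary illustrative values. Summing q/T over the four legs of a Carnot cycle recovers zero, which is what makes dS = dq/T a state function.

```python
import math

R = 8.314      # J/(mol K), gas constant
gamma = 5 / 3  # heat-capacity ratio for a monatomic ideal gas

def carnot_entropy_sum(T_hot, T_cold, V1, V2, n=1.0):
    """Sum of q/T over the four legs of a Carnot cycle (ideal gas).

    Isothermal legs contribute q/T = n R ln(V_out/V_in); adiabatic legs
    exchange no heat, so they contribute nothing.
    """
    # Adiabatic legs fix V3 and V4 through T V^(gamma-1) = constant.
    ratio = (T_hot / T_cold) ** (1 / (gamma - 1))
    V3, V4 = V2 * ratio, V1 * ratio
    dS_hot = n * R * math.log(V2 / V1)   # isothermal expansion at T_hot
    dS_cold = n * R * math.log(V4 / V3)  # isothermal compression at T_cold
    return dS_hot + dS_cold              # adiabats add zero

loop = carnot_entropy_sum(500.0, 300.0, 1.0, 3.0)  # vanishes for any choice
```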
Clausius used the efficiency concept to illustrate
a general difference in the entropy behavior between
reversible and irreversible processes. His result, known
as the Inequality of Clausius, states that the efficiency
of a reversible cyclic process, like the Carnot cycle,
is always greater than that of an irreversible cycle.
Clausius' comprehension of the fact that all real (i.e.,
irreversible) processes result in an entropy increase of
their surroundings, led to his statement of the Second
Law of Thermodynamics, which was the climactic effort
of the classicists in heat theory.
Because of the macroscopic nature of the Carnot
cycle study, it is not amenable to mechanistic analysis.
Such analysis requires a more detailed description of
the phenomenon, a microscopic picture, so that the macro-
scopic parameters such as temperature, pressure, etc.,
are understood as the aggregated mechanical behavior of
the elementary masses (atoms, molecules) in the system.
In a mechanical model, the possibility exists for defin—
ing the state variable, entropy, irrespective of the
process being reversible or in equilibrium. The ground—
work for such a model began in the late nineteenth cen—
tury and was to be the basis for understanding the
formulation of entropy.
The development of a mechanical explanation of
the Second Law of Thermodynamics can primarily be
attributed to Ludwig Boltzmann. His explanation of the
Second Law has become the mainstay of statistical
mechanics. Boltzmann began his studies in 1866 (6), which
were highlighted in 1872 (7) with the publication of a
long memoir which gave the first derivation of the
irreversible increase of entropy based on the laws of
mechanics and also upon those of probability. In this
memoir Boltzmann presented a mathematical proof of the
second thermodynamic law by illustrating the uniqueness
of Maxwell's velocity distribution law (8) as a descrip-
tor of the equilibrium state. Maxwell had shown for
gases that his distribution was stationary, and Boltzmann
expanded the application of this proof by demonstrating
that whatever the initial state of a gas, it approaches
a limit in the distribution of Maxwell.
In this proof Boltzmann derived a partial dif-
ferential equation for a distribution function d(x,t),
with respect to time, the distribution function repre—
senting the number of molecules per unit volume with
kinetic energies at time t lying within an interval of
x to (x + dx). He showed that Maxwell's function is
stationary and makes ∂d(x,t)/∂t vanish. The next phase
called for the introduction of an auxiliary function
called H, defined:
H = ∫0∞ d(x,t) {ln [d(x,t)/√x] - 1} dx , (2.1.4)
which he proved can only decrease with time due to the
symmetry characteristics of the collision process and
the possibility of inverse collisions. It was then
shown that Maxwell's distribution function minimizes
the H function, proving that regardless of the initial
distribution of d(x,t) the final or equilibrium state
is realized in the Maxwell distribution. Even more
important than this fact, Boltzmann pointed out, was
that the quantity H was proportional (with a negative
proportionality constant) to the entropy of the gas.
Needless to say, this result caused a consider—
able degree of interest and before long criticisms of his
approach arose. It was Boltzmann's response (9) to one
of his critics that resulted in the formulation of the
second thermodynamic law as an expression of the laws of
probability. He showed that the entropy of a state is
reflected by its probability and an increase in entropy
merely reflects a shift from less to more probable states.
He employed a discrete model to illustrate this proba—
bilistic nature of entropy. It was hypothesized that a
collection of N particles possessed energies which were
integral multiples, j, of a basic energy unit, e. The
number of particles having the j × e energy is denoted
by nj, such that the sum over j of the nj equals N.
For a complete assessment of this system of par—
ticles, a listing of all the individual molecular ener—
gies would be required. To attain this assessment a
permutability measure, P, the number of different
arrangements (microstates) for a given distribution was
constructed by the equation:
P = N!/(n0! n1! ... nj!) . (2.1.5)
Boltzmann then reasoned that the most probable distribution
was the one where P is maximized. To find the maximum
for P he first used Stirling's approximation for factori-
als (10) and proceeded to deduce the following equality:
ln P = -Σj nj ln nj + constant . (2.1.6)
Recall equation (2.1.4) and recognize the similarities
between nj and d(x,t), and that the negative of the H
function is entropy. The beauty of Boltzmann's proof
becomes readily apparent: he has, first, found the dis-
tribution which maximizes P; second, shown the relation-
ship between P and entropy; and third, shown that the nj's
will have a Maxwell distribution when the entropy of the
system is maximum. The classic formulation Boltzmann gave
for statistical entropy, Sm, of the macrosystem in terms
of its microstate distribution is:
Sm = kb · ln P (2.1.7)
where kb is known as Boltzmann's constant, which is deter-
mined by dividing the gas constant by Avogadro's number.
The notion of probability is ascertained from P,
the permutability factor. Consider the logarithmic
expansion of equation (2.1.5) and apply Stirling's
approximation; the result has the form:
ln P ≈ N(ln N - 1) - Σj nj(ln nj - 1)
     = N ln N - Σj nj ln nj
     = -N Σj (nj/N) ln(nj/N) . (2.1.8)
The quantity nj/N is identical to the probability of the
jth microstate, which shall be denoted as Pj. Substitut-
ing equation (2.1.8) into equation (2.1.7) for the
macrosystem entropy:
Sm = -kb N Σj Pj ln Pj . (2.1.9)
The behavior of this function is identical to that in
equation (2.1.3), and generates the most probable dis-
tributions when P is maximized.
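Equation (2.1.8) can be checked directly for a finite system. The sketch below uses arbitrary occupation numbers chosen only for illustration; it compares the exact ln P, computed from log-factorials, with Boltzmann's Stirling-based form, and the two agree to within a fraction of a percent for N in the thousands.

```python
import math

def ln_permutability(occupations):
    """Exact ln P for P = N! / (n0! n1! ... nj!), via the log-gamma function."""
    N = sum(occupations)
    return math.lgamma(N + 1) - sum(math.lgamma(n + 1) for n in occupations)

def stirling_estimate(occupations):
    """Boltzmann's approximation ln P ~ -N sum (nj/N) ln(nj/N), eq. (2.1.8)."""
    N = sum(occupations)
    return -N * sum((n / N) * math.log(n / N) for n in occupations if n > 0)

occ = [400, 300, 200, 100]  # hypothetical occupation numbers, N = 1000
exact = ln_permutability(occ)
approx = stirling_estimate(occ)
rel_err = abs(exact - approx) / exact  # shrinks as N grows
```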
Because one is greatly limited in his knowledge
of the microsystem structure, another successful approach
which overcame such limitations appeared at the beginning
of the twentieth century. It was developed by Gibbs
(11). The Gibbsian approach was proposed to show how
microscopic "behavior" determined the total thermodynamic
picture. Primarily, this was attained by employing an
abstraction which Gibbs called an ensemble. In essence,
this was a statistical-mechanical theory which could be
generalized as a statistical theory of systems of dif-
ferential equations (12).
An ensemble can be defined as a collection of a
large number of identical replicas of the representative
system. These replicas are all independently performing
the identical irreversible process. The main assumption
of ensemble theory is that the instantaneous macroscopic
state is related to the average of the replicas' states
taken over all the replicas.
The ensemble's macroscopic conditions can dic-
tate the probability distributions for each replica by
affecting energy and motion. Then, for different macro-
scopic conditions, different ensemble types can be
identified. The following are the three most common
ensembles employed (13):
Microcanonical ensemble: A statistical ensemble
of closed, energetically isolated systems in a
constant volume. A replica here can be thought
of as being enclosed in an adiabatic shell where
neither exchange of energy nor of particles is
allowed. The rigidity of the constraints implies
that a simple probability function holds for this
type of ensemble. The microcanonical probability
function, Pmc, is constant and of the form
Pmc = 1/P . (2.1.10)
The Maxwell-Boltzmann distribution can be used to
calculate the probability function of a micro-
canonical ensemble.
Canonical ensemble: A statistical ensemble in
thermal contact with a thermostat. Here the
replica is permitted to exchange energy with
another system whose energy is so large by com-
parison that its state remains unchanged. The
canonical probability function, Pc, is there-
fore a function of the energy Ej of each
replica and its form is exponential:
Pc = A exp(-B'Ej) . (2.1.11)
A is a constant fixed by normalization; B' is the
inverse of the product of the Boltzmann constant
and the absolute temperature. The term exp(-B'Ej)
is called the Boltzmann factor.
Grand canonical ensemble: A statistical ensemble
which can exchange both energy and particles with
its surroundings. Such an ensemble can be con-
ceived of as a box in contact with a thermostat
and possessing permeable walls. The grand canoni-
cal probability function, Pgc, is based both on
considerations of energy and particles:
Pgc = A exp(-B'Ej + B' Σk nk μk) , (2.1.12)
where μk is the chemical potential of the kth
particle type.
After deciding what ensemble to employ, the
remaining problem in the Gibbs approach is that of com—
puting what is known as the partition function. The
partition function is probably the most important concept
in statistical mechanics today, from which important
thermodynamic variables (including entropy) can be esti-
mated. The partition function is a very simple mathe-
matical form that depicts the distribution or partitioning
of the system among the various energy levels or quantum
states. To calculate it, sum the Boltzmann factors for all
the different states (14):
Z = Σj exp(-B'Ej) . (2.1.13)
The partition function has an important statistical rela-
tionship to the probability of the system:
Pj = nj/N = exp(-B'Ej)/Z , (2.1.14)
which gives the function immense utility in thermodynamic
calculations.
An expression for entropy in terms of the parti-
tion function can be readily deduced. Recall that the
classical definition for entropy is the reversible dif—
ferential energy change divided by the absolute tempera-
ture. As the energy changes:h1theprocess so will the
partition function with respect to B' and Ej(15):
d 1n z = -E dB' — (B'/N) 2 nj dEj (2.1.15)
3'
(ln Z is preferred to Z because of its additive proper-
ties).
By performing a Legendre transformation on equa—
tion (2.1.15) and collecting terms, we cause the differ-
ential energy or heat change of the system to become:
dE = T · dS = d(ln Z + B'E)/B' (2.1.16)
or, alternatively, the entropy is:
dS = kb · d(ln Z) + d(E/T) (2.1.17)
S = kb · ln Z + E/T . (2.1.18)
Equation (2.1.17) can be converted to the Boltzmann
equation for entropy after we recognize two relationships:
E = Σj Pj Ej (2.1.19)
-B'Ej = ln(Pj Z) . (2.1.20)
Equation 2.1.19 states that the macroscopic
energy of the system is the expected value determined
from the energies of each microstate, and equation
(2.1.20) is the logarithmic form of equation (2.1.14).
Substituting into equation (2.1.18) we have:
S = kb ln Z + kb Σj Pj (B'Ej)
  = kb ln Z - kb Σj Pj ln Pj - kb Σj Pj ln Z
  = -kb Σj Pj ln Pj . (2.1.21)
The above equation is identical to equation (2.1.9), the
Boltzmann entropy, divided by N.
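Equation (2.1.21) can be verified numerically. The following sketch uses hypothetical energy levels (the only physical inputs are the Boltzmann constant and a temperature) and computes the entropy both as kb ln Z + E/T and as -kb Σ Pj ln Pj; the two routes agree.

```python
import math

KB = 1.380649e-23  # J/K, Boltzmann's constant

def canonical_entropy(energies, T):
    """Two routes to the canonical entropy of a discrete system.

    Returns (kb ln Z + E/T, -kb sum Pj ln Pj); equation (2.1.21)
    says these are equal.
    """
    beta = 1.0 / (KB * T)                        # B' in the text
    factors = [math.exp(-beta * e) for e in energies]
    Z = sum(factors)                             # partition function, eq. (2.1.13)
    probs = [f / Z for f in factors]             # Pj = exp(-B'Ej)/Z, eq. (2.1.14)
    E = sum(p * e for p, e in zip(probs, energies))  # macroscopic energy, eq. (2.1.19)
    s_gibbs = KB * math.log(Z) + E / T               # eq. (2.1.18)
    s_prob = -KB * sum(p * math.log(p) for p in probs)  # eq. (2.1.21)
    return s_gibbs, s_prob

levels = [0.0, 1e-21, 2e-21, 5e-21]  # hypothetical energy levels (J)
s1, s2 = canonical_entropy(levels, 300.0)
```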
In spite of the obvious similarities between the
Gibbsian approach and that of Boltzmann, there are also
pertinent differences. Boltzmann's entropy is a measure
which does not consider interparticle forces, and thus
neglects the effects of potential energy and the effect
of interparticle forces on pressure; Gibbs' entropy takes
into account all the energy and total pressure (16). The
question then arises, what is the true thermodynamic
entropy of a system?
The current thought on this question tends to the
viewpoint that true thermodynamic entropy is difficult
to define because the partitioning or experimental con-
ditions of the system depend upon the human element (17).
This "anthropomorphic" aspect of entropy imparts a con—
siderable degree of arbitrariness, making a definition
of true thermodynamic entropy essentially impossible.
However, it should be remembered that irrespective of
the manner in which the system's partitioning is accom-
plished, the partition-dependent behavior is not arbi—
trary but follows a course dictated by the Second Law.
Consequently, when studying the entropic behavior of a
system the most difficult problem is to state what ques-
tions we want to resolve and to formulate entropic
measures which allow their resolution.
2.2 Entropy and Information Theory
An important aspect of the entropy concept not
usually accounted for in traditional thermodynamic
approaches to entropy is its information attribute.
The first to recognize the relationship between
entropy and information was Szilard (18) in 1929, who
related the usage of information to the production of
entropy. This was approximately twenty years before
the development of information theory by Shannon (19) in
1948, and its value has only recently been recognized.
Shannon's contribution to science was significant, since
for years investigators had tried to formulate a useful
measure of information for communication engineering
(20). Several names stand out in the early years:
Hartley (21) with his theory on information transmission,
using the logarithm of number of symbols as an informa—
tional measure, and Gabor (22), working on time-frequency
uncertainty and the logon concept. However, it was
Shannon who clarified the confused situation with his
theory.
Like Hartley, Shannon used a logarithmic measure
of the number of symbols as his measure of information.
Formally, Shannon's information measure for the jth sym—
bol, h_j, is defined as the negative logarithm of the
symbol's probability:

h_j = -ln P_j .   (2.2.1)
Shannon recognized his measure determined not the quan-
tity of information the symbol conveyed, but rather the
uncertainty of information. The Shannon measure can
also be applied to messages; this is accomplished by
determining the expected value of all symbols in the
message:
H = -Σ_j P_j ln P_j .   (2.2.2)
The H-function has an absolute maximum when the proba-
bilities for all the symbols are equal (23):

H_max = -ln P_j .   (2.2.3)
Using equations (2.2.2) and (2.2.3) we can explore more
deeply the meaning of Shannon's information measure.
The notion of uncertainty is easily deduced, for as H
increases, the symbols become more equiprobable and the
ability to distinguish their information content
decreases, or alternatively, our uncertainty about them
increases. Another way to regard uncertainty is as a
measure of the number of degrees of freedom. The lower
the uncertainty the fewer degrees of freedom or the
greater the uncertainty the more degrees of freedom the
system has. The degrees of freedom concept also denotes
the idea of capacity and usually "information capacity"
is what Shannon's measure of information is called.
Information capacity means "message variety" to the com—
munications engineer, a useful parameter in designing
communication systems.
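As a quick numerical illustration of equations (2.2.2) and (2.2.3), the sketch below (with an assumed four-symbol alphabet) shows that a skewed distribution has a smaller information capacity than the equiprobable one:

```python
import math

def shannon_h(probs):
    """Shannon's information capacity H (eq. 2.2.2), in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# An assumed four-symbol alphabet: one skewed, one equiprobable.
skewed = [0.7, 0.1, 0.1, 0.1]
uniform = [0.25, 0.25, 0.25, 0.25]

h_skewed = shannon_h(skewed)
h_max = shannon_h(uniform)   # absolute maximum, -ln(1/4) = ln 4 (eq. 2.2.3)
```

As the symbols approach equiprobability, H rises toward ln 4, the point of maximum uncertainty, or "message variety."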
Given the measure of information in equation
(2.2.2), it is now much easier to see the relationship of
entropy, in equations (2.1.9) and (2.1.21), to informa-
tion.
The progression from information theory to thermo-
dynamics is accomplished by first relating information
theory to statistical mechanics, via the partition func-
tion, and thus to the various statistical-mechanical
analogs of the laws of thermodynamics. This was first
done by Jaynes (24, 25) and his conclusions have since
been verified by others (26, 27, 28). The agreement
between information theory and thermodynamic entropy has
become such a well—accepted relationship that many cur—
rent textbooks on thermodynamics and statistical
mechanics rely heavily on the concept of information
when presenting these subjects (29, 30, 31). Perhaps
the following quote of J. von Neumann can best summarize
the roles of information and thermodynamic entropy (32):
The thermodynamical methods of measuring
entropy were known in the mid—nineteenth cen-
tury. Already in the early work on statistical
physics it was observed that entropy was
closely connected with the idea of information:
Boltzman found entropy proportional to the loga—
rithm of the number of alternatives which are
possible for a physical system after all the
information that one possesses about the system
macroscopically (that is, on the directly,
humanly observable scale) has been recorded.
In other words, it (entropy) is proportional to
the amount of missing information.
Current investigations employing information theory
methodology in the study of the thermodynamics of open
systems (33) and chemical systems (34) have begun to
assess the use of information—entropy in engineering
disciplines other than telecommunications—related areas.
Both theoretical developments and wide application of
the information theory methodology came about during the
1950s (35) and brought the subject out of its infancy.
The 3rd London Symposium on Information Theory is an
excellent illustration of the diversity of subjects
examined using the new concepts (36). The topics for
papers given ranged from studies on computers, elec-
tronics, statistics and mathematics to those on animal
welfare, political theory, psychology, anthropology,
economics, and anatomy, which are subjects divorced from
traditional communication applications. This period also
saw the rise of coding theory (37, 38), an important step in
increasing the utility of information theory for solving
the problems of a rapidly expanding telecommunications
industry.
The sixties provided less innovation than the
fifties, and more time for reflection on the theory's
fundamental concepts and tenets (39). Emphasis was
placed on coding theory, in particular on decoding
algorithms (40), and these studies have remained the
basic thrust of its mathematical development (41). In
addition to direct telecommunication applications,
information theory began to be firmly established in
several other disciplines. Psychology proved a fertile
area for the application of information theory, where the
individual was regarded as an information processing
device (42).
The economic adaptations of the theory also found
acceptance. The theory of information-entropy was used
as a mathematical tool for analyzing industrial
concentration (43), the future prices of stocks (44), and
as an accounting methodology (45). Utilization of this
concept has become so widespread that an entropy law for
general economic processes has been proposed (46).
Interesting, but perhaps esoteric, is the application of
an information theory observer—critic to evaluation of
the philosophical arguments of Aristotle (47), and the
application of the theory to detective work in crimi—
nology (48). However, an important new application of
information theory was in the field of biology, where
many interesting new facts were discovered about the
information transcription of the genetic code onto the
protein space (49, 50).
To understand the information concepts which will
be employed in this work, it is necessary to discuss addi—
tional terminology and theorems pertinent to the study.
The best place to begin is with the extension of Shan-
non's information capacity, equation (2.2.2), to nth order
Markov processes or multivariate analysis. The idea
behind this extension is that the possession of an infor-
mation measure based on the joint probability of B and E
could be used to construct the entropy of the word BE,
that of B and E and T, the word BET, and so forth.
Shannon called such information calculations n-gram
entropies (51). Used in this manner the joint probability
has a multivariate connotation, but this would change if
we considered the same message or word coming along a
telegraph wire. Such a message is different because it
is dynamic and can be thought of as a stochastic process.
Because outcomes (words) in such a process would be dis-
crete the formulation of H in a Markovian sense is pos—
sible (19).
Let us first look at the bivariate case involving
a pair of events and define the joint probability of the
ith and jth elements:
P(ij) = P(i) P(j/i) (2.2.4)
where P(ij) is the joint probability, P(i) the proba-
bility of i and P(j/i) the conditional probability of j
given i. The bivariate information capacity, H_2, equals
the following (52):
H_2 = -Σ_i Σ_j P(ij) ln P(ij)

    = -Σ_i Σ_j P(i)P(j/i) ln [P(i)P(j/i)]   (2.2.5)

and if P(i) and P(j) are independent, H_2 becomes

H_2* = -Σ_i Σ_j P(i)P(j) ln [P(i)P(j)]

     = -Σ_i P(i) ln P(i) - Σ_j P(j) ln P(j) .   (2.2.6)

Note that by subtracting H_2 from H_2* we get a new measure,
the divergence from independence, or a measure of 1st-order
Markov memory. Extending this to multivariate analysis
or an nth-order Markov chain we have

H_n = -Σ_i Σ_j ... Σ_n P(i)P(j/i) ... P(n/n-1) ln [P(i)P(j/i) ... P(n/n-1)]   (2.2.7)

and if all the probability sets are independent:

H_n* = -Σ_i P(i) ln P(i) - Σ_j P(j) ln P(j) - ... - Σ_n P(n) ln P(n) .   (2.2.8)

The difference between H_n* and H_n would be the same as an
nth-order measure of Markov memory.
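The bivariate case and the divergence from independence can be sketched numerically as follows; the joint distribution here is an assumed example with built-in dependence:

```python
import math

def h(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

# Assumed joint distribution P(ij) over two binary symbols, with dependence.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Marginal probabilities P(i) and P(j).
p_i = [joint[(0, 0)] + joint[(0, 1)], joint[(1, 0)] + joint[(1, 1)]]
p_j = [joint[(0, 0)] + joint[(1, 0)], joint[(0, 1)] + joint[(1, 1)]]

h2 = h(joint.values())       # bivariate capacity, eq. (2.2.5)
h2_star = h(p_i) + h(p_j)    # independent case, eq. (2.2.6)
memory = h2_star - h2        # 1st-order Markov memory; zero iff independent
```

For this dependent distribution the memory measure is strictly positive; replacing the joint probabilities with the products of the margins drives it to zero.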
The information-entropy measure expresses a
capacity for freedom or variety and is sometimes referred
to as "potential" information. Often, it is desirable
to speak in terms of the order or "stored" information,
I_s, of a system. Logically, the order or stored informa-
tion is the difference between the maximum disorder or
entropy of the system and its actual entropy:

I_s = H_max - H .   (2.2.9)
Obviously, the notion of stored information can be
extended to any of the multivariate cases, giving a
measure of order for each case of dependence.
The idea of "stored information" in information
theory terminology is usually conveyed by the concept
of redundancy, R. Redundancy, as explained by Weaver
(53), reflects that fraction of the message which is
ordered or repetitive:
R = I_s/H_max = 1 - H/H_max .   (2.2.10)
The error in communicating a message is reduced as the
redundancy increases. Thus, the redundancy concept pro-
vides an indication of the reliability of the system.
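Stored information and redundancy follow directly from the entropy; a short calculation on an assumed four-symbol code illustrates equations (2.2.9) and (2.2.10):

```python
import math

def h(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

# Assumed symbol probabilities for a four-symbol code.
probs = [0.5, 0.25, 0.125, 0.125]

h_actual = h(probs)
h_max = math.log(len(probs))   # maximum disorder, all symbols equiprobable
i_s = h_max - h_actual         # stored information, eq. (2.2.9)
r = i_s / h_max                # redundancy, eq. (2.2.10)
```

Here r lies strictly between 0 (no order) and 1 (complete order), and the two forms of equation (2.2.10) agree identically.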
Noise and channel capacity are two other common
terms. The channel capacity, C, is the maximum rate at
which information or entropy flows through a channel (the
physical medium of information transmission). Channel
capacity has the units symbols per unit time (54):
C = ln(n)/t ,   (2.2.11)

where ln n is sometimes referred to as the entropy of one
channel. Noise is one of the limiting factors in the
efficiency of transmission through a channel; it is the
error between the message sent and that received. The
processes of encoding and decoding the message are the
pertinent factors determining the noise of a channel.
One of Shannon's more important theorems describes the
potential for reducing the noise of a channel (19):
Let a discrete channel have the capacity C
and a discrete source the entropy per second H.
If H ≤ C there exists a coding system such that
the output of the source can be transmitted over
the channel with an arbitrarily small frequency
of errors (or an arbitrarily small equivocation).
If H > C it is possible to encode the source so
that the equivocation is less than H - C + ε
where ε is arbitrarily small. There is no
method of encoding which gives an equivocation
less than H - C.
The theorem implies that we cannot eliminate noise in the
channel but we can "learn to live with it."
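A minimal numeric reading of equation (2.2.11) and the coding theorem follows; the symbol count, symbol duration, and source rate are all assumed values for illustration:

```python
import math

# Assumed channel: 32 distinguishable symbols, 0.01 s per symbol.
n = 32
t = 0.01
C = math.log(n) / t          # channel capacity, eq. (2.2.11), nats per second

# Assumed source entropy rate, nats per second.
H_rate = 250.0

# Shannon's theorem: arbitrarily reliable coding exists iff H <= C.
reliable_coding_exists = H_rate <= C
```

With these numbers C ≈ 347 nats/s, so a 250 nats/s source can in principle be transmitted with arbitrarily small error, noise notwithstanding.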
A question generally addressed in information
theory concerns the capacity of a set of symbols (words)
in a code (language) to transfer information based on
their respective durations (lengths). The duration or
length of a word is related to its cost, because the more
symbols needed to convey a bit of information, the greater
the cost and the less efficient such a transfer becomes.
One might suspect then that the frequency of a word in a
language would be related to its cost; the longer words
being less frequent and the shorter ones more frequent.
An analysis by Mandelbrot (55) showed the most efficient
coding scheme per unit cost satisfied the aforementioned
suspicion. However, the result achieved by Mandelbrot
by an information theory approach had already been
realized years before by Zipf (56, 57) through empirical
analyses of language. The principle or law discovered
by Zipf and rationalized by Mandelbrot can be expressed
either by:
a) a relation between the frequency of occur-
rence of an event and the number of different
events occurring with that frequency, or
b) a relation between the frequency of occur-
rence of an event and its rank when the
events are ordered with respect to frequency
of occurrence.
Zipf's law is a power law which can be linearized by
employing its logarithmic form. Mathematically, the law
states that the logarithm of a word's rank in a language
or code equals the negative of the logarithm of its fre—
quency plus a constant equal to the logarithm of the
frequency of the word with rank one.
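In this logarithmic form the law is easy to verify for a synthetic rank-frequency table; the rank-one frequency below is an arbitrary assumption:

```python
import math

f1 = 1000.0                             # assumed frequency of the rank-one word
ranks = range(1, 11)
freqs = [f1 / rank for rank in ranks]   # Zipf: frequency inversely proportional to rank

# Linearized form: ln(rank) = -ln(frequency) + ln(f1).
checks = [abs(math.log(rank) - (-math.log(f) + math.log(f1)))
          for rank, f in zip(ranks, freqs)]
```

Every residual in `checks` is zero to machine precision, confirming that a pure inverse rank-frequency relation plots as a straight line of slope minus one on log-log axes.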
Mandelbrot's method for deducing Zipf's law
began by calculating the best probability rule for words
of a varying cost, cj. The idea for obtaining this rule
was very similar to Boltzman's for finding the maximum
of ln P or the maximizing of the number of microstates,
yielding the most probable distribution. P was inter-
preted here as the total number of different messages
from N words:

ln P = -N Σ_j P_j ln P_j   (2.2.12)

and was maximized by the method of Lagrange multipliers with
the constraints that the sum of the P_j's equals one and
that the total cost of the message of N words, C_N, equals
the sum of the costs for each jth symbol:

C_N = Σ_j n_j c_j = N Σ_j P_j c_j .   (2.2.13)
The result was that the probability of the jth word must be
related exponentially to the jth cost to get the maximum
number of different words for a given cost:

P_j = exp(-b c_j)   (2.2.14)

where b is a constant.
The next step was to find the number of words
N(c_j) of cost c_j. This problem boiled down to a finite
difference problem:

N(c_j) = Σ_k n_k N(c_j - c_k)   (2.2.15)

which states that any one of the n_k words can be used to
construct a message of cost (c_j - c_k) to build a word of
total cost c_j. For a stable code, the finite difference
solution of equation (2.2.15) is:

N(c_j) = A_1 exp(b'c_j) + A_2   (2.2.16)

where A_1, A_2, and b' are constants, and where (1/b') times
the logarithm of A_1 equals the negative of the cost
of the initial condition, c_1.
By solving both equations (2.2.14) and (2.2.16)
for c_j, assuming A_2 equals zero and sorting by increasing
cost or rank, the order of N(c_j) as determined by its
cost is:

(b/b') ln(Rank[N(c_j)]) = (b/b') ln A_1 - ln P_j   (2.2.17)

(b/b') ln(Rank[N(c_j)]) = -b c_1 - ln P_j

                        = ln P_1 - ln P_j   (2.2.18)

which is a generalized form of Zipf's law and dictates an
ordering for the number of words of cost c_j in a message
of N words, which maximizes total message variety
(efficiency).
Recently, Kozachkov (58) showed that a ratio of
b/b' equal to one maximized the total message variety.
He stated that the overall number of different messages
is:
P' = Σ_j P_j   (2.2.19)

and by determining P_j from equation (2.2.18) and substi-
tuting he got:

P_j = P_1 N(c_j)^(-(b/b'))

P' = P_1 Σ_{j=1}^{J} N(c_j)^(-(b/b')) .   (2.2.20)
Then he calculated the sum over J different words for
three cases: b/b' greater than one, equal to one, and
less than one:
P' = (P_1/(1 - (b/b'))) J^(1 - (b/b')) ,   b/b' < 1

   = P_1 ln J ,   b/b' = 1

   = P_1/((b/b') - 1) ,   b/b' > 1 .   (2.2.21)
The maximum P' clearly is for b/b' equal to one as J
approaches infinity, and we can thus write Zipf's law:

ln P_j = ln P_1 - ln(Rank_j) .   (2.2.22)
Kozachkov said that when a hierarchical structure or sys—
tem followed Zipf's law with a slope of minus one, its
organizability was maximum because the information
capacity at every level in the hierarchy was maximized
relative to the overall information capacity. This prin—
ciple was recently used as an indicator of national city-
size integration (59).
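Kozachkov's three cases in equation (2.2.21) reflect how a sum of the form Σ j^(-b/b') behaves; the sketch below checks this numerically, under the assumption that the rank j itself can stand in for N(c_j) as the summation variable:

```python
def variety_sum(s, J):
    """Sum_{j=1}^{J} j**(-s), standing in for eq. (2.2.20) with the
    rank j used directly as the summation variable (an assumption)."""
    return sum(j ** (-s) for j in range(1, J + 1))

J = 100_000
grow = variety_sum(0.5, J)       # b/b' < 1: grows like J**(1 - b/b')
log_grow = variety_sum(1.0, J)   # b/b' = 1: grows like ln J
converge = variety_sum(1.5, J)   # b/b' > 1: approaches a constant
```

The three regimes separate cleanly even at modest J: the first sum grows without bound as a power of J, the second only logarithmically, and the third has essentially converged.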
The far—reaching expressions of Zipf's law in
nature are very striking phenomena. Zipf himself expanded
the scope of his studies beyond that of word distribution
to distribution of interval frequency in classical music,
city-size frequency, product-manufacturing frequency,
retail store frequency, job-occupation frequency, newspaper
circulation frequency, charge account frequency in depart-
ment stores, frequency of telephone messages through
interchanges, and other examples which obeyed his rule (57).
Many empirical laws developed by others are Zipfian;
Pareto's law of income distribution (60), one of the
more notable, has often been cited as instrumental
in formulating the graduated income tax structure of today.
The Lotka distribution law (61) deals with the frequency
of scientific writing. The biologist's law of allometric
growth, and allometry in general (62), are good examples
of the organization of parts within the organism work-
ing through self-regulation for the benefit of the whole.
The dose-response curves (63) demonstrate the inherent
ability within an organism to interpret an incoming
coded chemical message and elicit a response dependent
upon the message magnitude (chemical frequency). Zipf's
law is a powerful tool for assessing the organizability
of various physical, biological or social systems.
The importance of information-entropy criteria
in statistical inference was demonstrated by Tribus
(64). He utilized the principle logically set forth by
Jaynes (65) and Cox (66) that the maximum entropy formal-
ism elicits the minimally prejudiced probability distri-
bution. This is true since the maximum entropy state is
achieved when the hypothesis favors no inference more
than another. Tribus then proposed a method for calcu—
lating the probability distribution which would maximize
the entropy function under the constraints imposed by
the standard statistical distributions such as uniform,
exponential, gaussian, gamma, beta, etc. He then stipu-
lated that the particular probability density distribu-
tion which maximizes the entropy function yields the
minimally prejudiced probabilities. On this basis he
formulated an entropy inference test:
ΔS = N[Σ_i P(i) ln P(i) + Σ_j P(j) ln P(j) - Σ_i Σ_j P(ij) ln P(ij)]   (2.2.23)
which, if one recalls equations (2.2.5) and (2.2.6), is
the difference between the distribution bivariate entropy,
H_2, and the maximum bivariate entropy, H_2*, multiplied
by N. Thus, this entropy inference test
measures the independence between two distributions, where
independence between the two sets implies that there is
no information difference between them. If one of the
distributions is prejudiced, then equation (2.2.23) mea-
sures the bias in our observed distribution. Tribus
showed that if the information difference between the
two distributions was small, the quantity -ΔS/N equals
the Chi-square statistic.
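The behavior of the inference test can be sketched as follows. The joint distributions are assumed examples, and the sketch checks only the qualitative property that -ΔS/N is positive for a dependent pair and zero for an independent one:

```python
import math

def delta_s(joint, N):
    """Tribus' entropy inference test, eq. (2.2.23)."""
    p_i, p_j = {}, {}
    for (i, j), p in joint.items():
        p_i[i] = p_i.get(i, 0.0) + p
        p_j[j] = p_j.get(j, 0.0) + p
    term = lambda probs: sum(p * math.log(p) for p in probs if p > 0)
    return N * (term(p_i.values()) + term(p_j.values()) - term(joint.values()))

# Assumed example distributions over two binary variables.
dependent = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

N = 100
bias = -delta_s(dependent, N) / N      # H_2* - H_2, positive when dependent
none = -delta_s(independent, N) / N    # zero when independent
```

The quantity -ΔS/N is exactly the divergence from independence of Section 2.2, which is why it vanishes for the equiprobable joint distribution.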
The importance of Tribus' work and other informa-
tion theory applications is that they demonstrate the
role of information in our world. The individuals men-
tioned in this section have uncovered that role: for
example, Jaynes in statistical mechanics, Mandelbrot in
Zipf's law, and Tribus in statistical inference. In sub—
sequent chapters, I will expand the perspective on
information into the area of nutrition to more fully
examine the utilization of information by the living
system.
2.3 Entropy, Information and Biology
The validity of applying information theory
methodology to other fields, aside from those directly
related to communication systems, was a point of some
debate after Shannon's inception of the approach (67).
Some felt that information theory was an approach which
could only be justifiably used in telecommunication
applications. However, such a viewpoint defines the
theory's value only in terms of its immediate success.
One cannot deny communication engineers their well—
deserved credit for laying the sound mathematical founda—
tions of the methodology, but such limited range for
application unduly restricts information theory develop—
ment to the parochial aspirations of this discipline.
Unlike the communication engineers, physical, biological,
and societal scientists could not as easily put their
encoders and decoders on the table and confirm the the-
oretical predictions of the theory. Because the
biophysicist could not take the cell's DNA and examine
its sequence nor the psychologist take a brain apart to
examine its neuron network, these early ventures of
information theory often resulted in frustration. This
frustration led to a decline in interest that has slowly
begun to be reversed with the determination of biochemical
and biophysical structures and functions of living systems.
The biological field is intimately associated with
the field of thermodynamics, and the important roles
played by entropy, information, order and control are
recognized throughout the discipline (68, 69, 70, 71).
The state of knowledge in several biological fields has
reached the level where application of the information-
entropy concept can have and has had a significant impact
on the interpretation of experimental studies.
A significant area of information theory applica-
tion in biology is in the field of neurophysiology. The
theoretical basis for applying information theory to the
nervous system was put forth by von Neumann in the mid-
fifties (72). His approach drew analogies between an
information synthesis of a reliable system's unreliable
components and neurological systems. Since this work,
experimental studies have tended to support the utility
of information analysis in neural systems. These appli-
cations have studied the encoding mechanism (73), trans-
mission and multiplexing of neural information (74, 75),
and the general physiology of nervous cells (76) and
systems (77).
Another prime arena of biological information
theory applications is genetics, and I would like to pre-
sent one of the major works on the subject as an illus-
tration of the potential the entropic approach possesses.
At the beginning of the last decade, a new understanding
arose concerning the relationship between the structure
of DNA and that of proteins (78). The genetic code, as
this relationship is commonly called, was deciphered (79);
it is a universal biological language for the storage and
transmission of cellular information essential to metabolism
and behavior. Code words are formed by a
sequence of three nucleic acids which link together
forming a strand of DNA. Each sequence translates into
an amino acid which during transcription of the code is
catalytically joined to others to form proteins.
An impressive study on the genetic code has been
done by Gatlin (80). Her work examined the univariate
and bivariate information capacities, equations (2.2.2),
H, and (2.2.5), H_2; the stored information, equation
(2.2.9), I_s; and the redundancy, equation (2.2.10), R, of
nucleic acid sequences in DNA. The results of her
examination have led to new insights not only on the
grammar aspect of the genetic code but also on the evo-
lutionary process in nature.
Recall that Is, stored information, is our
entropic measure for divergence from equiprobability or
equality in the univariate case or divergence from inde-
pendence in the bivariate case. Let us denote I_s1 as
univariate divergence and I_s2 as bivariate divergence.
Both these measures have a special meaning with regard
to language. I_s1, the divergence from symbol equality,
is the main determinant in a language of its message or
word variety. The frequency of a language's symbols
thus dictates the available vocabulary. I_s2, the diver-
gence from independence, is the 1st-order measure of a
language's grammar. The degree of dependence imparts
redundancy or fidelity into the message by dictating the
symbols' relationships to each other (i.e., grammar).
Gatlin calculated I_s1 and I_s2 based on the per-
centage of guanine and cytosine, for phage, virus,
bacteria, plant, insect, and vertebrate organisms. Of
course, univariate information-entropy studies had been
done on DNA before, but not bivariate. The union of
univariate and bivariate information—entropy concepts pre-
sented in the format of a language study is a significant
advance in genetics, for it gives a new methodology for
interpreting the biochemistry of cellular information
storage which would not be possible without information
theory.
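A toy version of Gatlin's calculation, run on an assumed short nucleotide string rather than real DNA data, might look like this:

```python
import math
from collections import Counter

def h(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

seq = "ATGCGCGCATATGCGCATGC"   # assumed toy sequence, not real DNA data

# I_s1: divergence of single-base frequencies from equiprobability.
mono = Counter(seq)
h1 = h([c / len(seq) for c in mono.values()])
i_s1 = math.log2(4) - h1

# I_s2: divergence of adjacent-base (doublet) frequencies from independence.
pairs = list(zip(seq, seq[1:]))
joint = Counter(pairs)
n = len(pairs)
h2 = h([c / n for c in joint.values()])
h_first = h([c / n for c in Counter(a for a, b in pairs).values()])
h_second = h([c / n for c in Counter(b for a, b in pairs).values()])
i_s2 = (h_first + h_second) - h2   # never negative
```

Both divergences are non-negative by construction; applied to real sequences, I_s1 tracks base composition (e.g., guanine plus cytosine content) while I_s2 tracks nearest-neighbor grammar.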
How much further this new methodology might be
applied was also demonstrated by Gatlin in this study in
her application of the results to the evolutionary
process. [Such an application of information theory had
been previously suggested (81).] Her application of
information-entropy measures enabled her to hypothesize
a new theory about the course of evolution in nature.
The name given to this theory was Shannonian
evolution. Using the measures I_s1 and I_s2, Gatlin made
several important observations. One was that the nonver—
tebrate species showed considerable variation in the
frequency of guanine plus cytosine, whereas the verte-
brates displayed little variation. Also, it was noted
that the invertebrates had a significantly greater vari-
ation in both I81 and I82 than the vertebrates. After
extensive analysis of the situation her conclusion was
that the evolution from invertebrate to vertebrate life
forms has proceeded in two phases. The first was where
Is decreased; the second, where 152 increased.
2
Such an evolutionary course is significant in the
context of information theory because I_s2 governs the
relative degrees of variety and fidelity in a code. If
we assume that the overall redundancy or order in the
code increases consistently as one advances up the evo-
lutionary hierarchy, then, the first phase of decreasing
I_s2 can be regarded as the period when I_s1 is increasing.
I_s1 affects the message variety of the code and conse-
quently this first evolutionary phase can be looked on
as a search for an optimal alphabet for message variety.
The second phase of increasing I_s2 depicts an increase
in the grammar or dependence of symbols in the code.
This evolutionary process is similar to a child learning
how to read and write. First, the alphabet is taught
with simple spelling (a form of grammar). After the
alphabet has been mastered (end of evolutionary phase
one), the development of reading and writing skills
involves learning increasingly more grammar, I_s2, such as
advanced spelling, syntax, etc. (evolutionary phase two).
Gatlin's explanation of evolution through
information theory allows us to understand Darwin's
theory in the context of modern evidence in genetics.
Other biological information systems amenable to
information—entropy analysis exist. If the genetic
system is a living information storage system, the meta-
bolic or nutrition system can be regarded as the main-
tenance system of an organism. Such a nutritional system
operates by encoding information (food) it receives, and
channeling it for use in growth. Although simply stated,
the control of the various metabolic information processes
is very complex and the information messages are not as
neatly visualized here as in the genetic system. However,
by employing an information—entropy analysis of nutritional
systems, an integrated format uniting information stor-
age by the genes and information transmission in the
metabolic control process can begin to be understood.
2.4 Entropy, Information and Nutrition
The aim of this thesis is to employ the concept
of entropy in the context of information theory and to
use this measure, information-entropy, to analyze various
metabolic processes. The first question to be addressed
is whether Second Law principles hold for living systems.
Schroedinger (82) stated that Second Law behavior does
hold for these systems, but qualified this statement by
claiming life to be a steady—state process which preserves
the entropy of the individual organism at the expense of
increasing the entropy of its environment. Essentially,
the organism maintains itself by consuming low entropy
substances and transforming them into higher entropy com—
pounds. Given that such thermodynamic behavior is valid
for biosystems, can we extrapolate from what could be
called an energy—entropy basis to an information—entropy
basis?
A direct translation of the phenomenological laws
of the thermodynamic branch to the information branch was
thought by von Neumann to be consistent and a logical
step. He expressed this opinion in the following
manner (83):
There is reason to believe that the general
degeneration laws, which hold when entropy is
used as a measure of the hierarchic position of
energy, have valid analogs when entropy is used
as a measure of information. On this basis one
may suspect the existence of connections between
thermodynamics and new extensions of logics.
Fong (84) also perceives congruency between the thermo-
dynamic and information behaviors of biologically active
systems, finding the laws for the creation and dissipa—
tion of information consistent with those of Prigogine's
(85) thermodynamic theory of structure, stability and
fluctuations.
A biological dissipative process can best be
understood in processes such as catabolism. By studying
the complete catabolism of alanine in the mammalian body,
it can be demonstrated through the use of an information-
entropy approach that a dissipation of information occurs
during catabolism as one would expect a dissipation or
increase in entropy to happen. The formula for the
complete respiration (catabolism) of alanine is (86):
4 CH3CH(NH2)COOH + 12 O2 → 10 CO2 + 10 H2O + 2 CO(NH2)2 .   (2.4.1)
Using equation (2.2.2) to calculate H, our information-
entropy measure, the molecular information in the above
process will be dissipated if H(products) is greater than
H(reactants). The molecular probability on each side of
the equation can be equated, using the respective mole
fractions of each compound. The following information—
entropy measures can be calculated from equation (2.4.1):
H(reactants) = -0.25 k ln 0.25 - 0.75 k ln 0.75

             = 0.811 bits/molecule,   (2.4.2)

and

H(products) = -0.454 k ln 0.454 - 0.454 k ln 0.454 - 0.092 k ln 0.092

            = 1.351 bits/molecule   (2.4.3)
where k is a constant factor for converting from natural
logarithms to base 2 logarithms. Results in information
theory are typically expressed as the number of binary
digits, termed "bits."
The amount of information present or stored in
the system is given by equation (2.2.9), and Hmax can be
calculated when the five different molecular species in
the above catabolic reaction are equiprobable (i.e.,
equal mole fractions). The stored information of the
respective systems is:
I_s(reactants) = H_max - H(reactants) = 2.322 - 0.811

               = 1.511 bits/molecule,   (2.4.4)

and

I_s(products) = H_max - H(products) = 2.322 - 1.351

              = 0.971 bits/molecule.   (2.4.5)
The information change in going from reactants to products
is IS(reactants) minus Is(products), which equals 0.54
bits/molecule and indicates that information is being
dissipated.
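The arithmetic in equations (2.4.2) through (2.4.5) can be checked with a few lines; the mole fractions are taken directly from the stoichiometry of equation (2.4.1):

```python
import math

def h_bits(counts):
    """Eq. (2.2.2) applied to mole fractions, expressed in bits."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts)

# Molecule counts from eq. (2.4.1): 4 alanine + 12 O2 -> 10 CO2 + 10 H2O + 2 urea.
h_react = h_bits([4, 12])      # eq. (2.4.2): about 0.811 bits/molecule
h_prod = h_bits([10, 10, 2])   # eq. (2.4.3): about 1.35 bits/molecule
h_max = math.log2(5)           # five species assumed equiprobable: about 2.322

i_react = h_max - h_react      # eq. (2.4.4): about 1.511
i_prod = h_max - h_prod        # eq. (2.4.5): about 0.97
dissipated = i_react - i_prod  # about 0.54 bits/molecule
```

The computed values reproduce the figures above to rounding, confirming that roughly half a bit of molecular information per molecule is dissipated by the catabolic step.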
Supposing that the basic dissipative laws of
thermodynamics can be transferred to those of information,
a detailed inspection of other information—entropy proc-
esses is necessary. Coding of information is an attribute
commonly seen in biological systems. An example of a
biological code is the genetic code.
However, information coding is not as readily
seen in nutrition as in genetics, the reason being that
the nutritional control system is a complex information
hierarchy. There is not a universal code such as that
in genetics which translates through a well-defined gene-
protein channel, but rather, many different codes and
channels make up the nutritional system. Perhaps the
most obvious code of relevance to nutritional studies is
that which can be termed the "protein code" (87). The
term "protein code" arises from the relationship between
the genetic code and proteins. If one assumes that DNA
is a coded sequence of nucleic acids, it logically fol-
lows that the amino acid sequence generated by the gene
itself is coded in some manner. Therefore, the protein
code can be envisioned as providing the basis for the
creation of chemical informational molecules whose
function is dependent upon protein structure (e.g.,
enzymes).
Whether a protein can be formed from a nucleic
acid message depends upon the availability of amino acids
within the organism. Those amino acids which cannot be
synthesized by the cell must come from the diet, and the
proportion of such essential amino acids in the diet, as
well as quantities of nonessential amino acids, will
affect synthesis of cellular protein. This interrelation-
ship provides a basis for justifying application of
information-entropy to nutrition and for relating it to
previous work on information theory in biology.
Of course, the protein system is but one system
for metabolic transformations. However, the above dis—
cussion allows one to visualize how entropy and informa-
tion concepts can be incorporated into an analysis of
nutritional systems. In the following chapters, I will
present more detailed discussions and analyses of the
relationship of amino acid nutrition to overall protein
metabolism, and of some aspects of carbohydrate bio—
chemistry, with the aid of information theory. Through
these studies I hope to elucidate the importance of the
concept of information—entropy in nutritional systems.
CHAPTER III
INFORMATION AND THE QUALITY OF PROTEINS
Nutritionists have developed many concepts over
the years, both qualitative and quantitative, to express
the value or worth of a food protein. These standards
have been formulated by using combinations of chemical
and biological analyses. This chapter will present an
information—entropy approach for ascertaining a nutritive
protein code and then show the relationships among the
various indicators of protein quality and the information-
entropy of the diet.
3.1 Indices of Protein Quality
Before applying the principles of information the-
ory to protein quality, a description of the most common
indices of protein quality is in order. The following
concepts will be discussed: biological value, digesti-
bility, net protein value, net protein utilization,
protein efficiency ratio, chemical score, and essential
amino acid index. Although it is not a complete or
exhaustive list of concepts for assessing protein quality,
this set of indices is representative of the more com—
monly used parameters.
The "biological value" (BV) is one useful estimate
of protein quality, involving a nitrogen balance approach.
This measure was defined in 1909 by Thomas (88) as the
fraction of absorbed nitrogen retained within the organism
for maintenance and growth. It may be expressed mathe-
matically as (89):
    BV = [N_I - (F - F_k) - (U - U_k)] / [N_I - (F - F_k)] ,   (3.1.1)
where N_I is the nitrogen intake, F is fecal nitrogen, F_k
is endogenous fecal nitrogen, U is urinary nitrogen, and
U_k is endogenous urinary nitrogen. The endogenous fecal
and urinary nitrogen can be determined by feeding a
nitrogen—free diet or one containing a small amount of
high quality protein (90). Estimates of biological value
which do not correct for endogenous nitrogen losses are
termed "apparent biological values."
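As a concrete sketch, the balance arithmetic of equation (3.1.1) can be written out directly. The nitrogen figures below are hypothetical, chosen only to exercise the formula.

```python
def biological_value(n_i, f, f_k, u, u_k):
    """True biological value, eq. (3.1.1): nitrogen retained
    divided by nitrogen absorbed."""
    absorbed = n_i - (f - f_k)       # intake less true fecal loss
    retained = absorbed - (u - u_k)  # less true urinary loss
    return retained / absorbed

# Hypothetical balance data, grams of nitrogen:
bv = biological_value(n_i=10.0, f=2.0, f_k=0.5, u=3.0, u_k=1.0)
# absorbed = 8.5 g, retained = 6.5 g, so BV = 6.5 / 8.5
```

Dropping the endogenous corrections (f_k = u_k = 0) gives the apparent biological value described above.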
"Digestibility" (D) is probably one of the oldest
qualitative indicators used in nutritional studies. It
denotes the fraction of the food nitrogen which is
absorbed, and is calculated (89):
    D = [N_I - (F - F_k)] / N_I .                              (3.1.2)
Like biological value, digestibility is a nitrogen bal-
ance index, and is classified as "true" or "apparent"
depending upon the inclusion or exclusion of endogenous
nitrogen losses in its determination.
Another nitrogen balance method which is a com-
bination of the previous two indices was put forth by
Bender and Miller in 1953 (91). Essentially, this new
index, originally called "net protein value" (NPV), is
equivalent to biological value times digestibility, and
expresses the amount of nitrogen retained divided by the
total nitrogen intake:
    NPV = [N_I - (F - F_k) - (U - U_k)] / N_I .                (3.1.3)
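Because biological value, digestibility, and net protein value share the same balance terms, the identity NPV = BV x D holds exactly. A sketch with hypothetical nitrogen data:

```python
def digestibility(n_i, f, f_k):
    """True digestibility D, eq. (3.1.2): absorbed N over intake N."""
    return (n_i - (f - f_k)) / n_i

def biological_value(n_i, f, f_k, u, u_k):
    """True biological value, eq. (3.1.1): retained N over absorbed N."""
    absorbed = n_i - (f - f_k)
    return (absorbed - (u - u_k)) / absorbed

def net_protein_value(n_i, f, f_k, u, u_k):
    """NPV, eq. (3.1.3): retained N over intake N."""
    return (n_i - (f - f_k) - (u - u_k)) / n_i

# Hypothetical balance data, grams of nitrogen:
data = dict(n_i=10.0, f=2.0, f_k=0.5, u=3.0, u_k=1.0)
npv = net_protein_value(**data)
product = biological_value(**data) * digestibility(10.0, 2.0, 0.5)
```

The product of the first two functions reproduces the third, since the absorbed-nitrogen term cancels.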
Several years later, Bender and Miller proposed
a shortened method for determining what is effectively
the same quantity as net protein value. The difference
was that this new index was approached through a carcass
analysis method rather than by nitrogen balance; the name
coined for this parameter was "net protein utilization"
(NPU) (92). Net protein utilization was defined:
    NPU = [B - (B_k - N_Ik)] / N_I ,                           (3.1.4)
where B is the body nitrogen of the animals fed the test
protein, and B_k and N_Ik are the body nitrogen and nitro-
gen intake of the group fed the nonprotein diet.
All the aforementioned tests for protein are
usually noted as being conducted either under "standard-
ized" or under "operative" conditions (89). Standardized
measurements are those made under maintenance conditions,
whereas operative ones are those made under other defined
conditions. Sometimes a suffix indicating the percentage
of protein in the diet is used (e.g., NPU10). These are
important constraints to recall when interpreting these
tests, for the quality of the protein depends greatly on
the purpose for which it is required (e.g., growth or
maintenance).
Typically, net protein utilization and net pro-
tein value are taken as measurements of the same quantity,
and are not distinguished between in the literature (93).
However, for my purposes a distinction will be made. The
term "net protein value" will refer to those measures
calculated by multiplying digestibility times biological
value (i.e., those done by a balance method), while "net
protein utilization" will denote measures determined by
a carcass analysis method.
The final biological estimate of protein quality
to be presented here is the "protein efficiency ratio"
(PER). It is a parameter proposed in 1919 by Osborne
et al. (94), and defined as the "gain in body weight
divided by weight of protein consumed." This is a very
popular index, primarily because of the ease with which
it can be determined. Previously, the determination of
this ratio was conducted at several levels of nitrogen
intake. In this manner, an optimal level of protein
intake could be identified for a maximum gain in weight.
Generally, good correspondence between gain in body weight
and gain in body protein exists; however, PER is not always
an acceptable evaluation procedure (95), and is not as
reliable an indicator of protein quality as the other
indices.
Although these indices are determined experi—
mentally in quite different ways, biological value, net
protein value, and net protein utilization are based
upon the same criterion, retained nitrogen, and should
measure essentially the same thing (96). The protein
efficiency ratio would be an approximate measure of this
criterion, also. Table 3.1.1 gives a listing of 21 dif—
ferent food proteins taken from an FAO compilation of
biological data (97), for which the scores of the above
four indices were found (protein level of diets from
which scores were derived was 10%). The table includes
both the actual score and a ranking of the proteins
based upon their respective scores. Table 3.1.2 lists
linear correlation coefficients (98) which were calcu-
lated using paired scores of the indices, and all
regressions are significant at P < 0.01. This cross-
correlation analysis shows a very good relationship
TABLE 3.1.1.--Listings of Biological Value, Net Protein Utilization,
              Net Protein Value, and Protein Efficiency Ratio Scores
              with their Respective Rankings (Source: FAO).

                 Biological     Net Protein    Net Protein    Protein
                 Value          Utilization    Value          Efficiency Ratio
                 Score Rank     Score Rank     Score Rank     Score Rank
Egg, whole 93.7 1 93.5 1 90.9 1 3.92 1
Wheat, whole 64.7 14 40.3 20 58.8 11 1.53 18.5
Maize 59.4 18 51.1 17 54.5 15 1.18 20
Casein 79.7 4 72.1 4 76.8 3 2.86 5
Fish, meal 81.1 3 65.8 7 76.2 4 3.42 3
Soybean 72.8 8 61.4 8 65.9 6 2.32 7
Groundnut 54.5 20 42.7 19 47.2 19 1.65 15
Sunflower 69.6 10 58.1 9 57.0 12 2.10 13
Lentil 44.6 21 29.7 21 37.9 21 0.93 21
Rice, polished 64.0 15 57.2 10 62.7 9 2.18 11
Wheat, germ 73.6 7 67.0 5 64.9 7 2.53 6
Cottonseed 67.2 11 52.7 14 53.5 16 2.25 9
Linseed 70.8 9 55.6 11.5 59.8 10 2.11 12
Sesame 62.0 17 53.4 13 50.7 17 1.77 14
Milk, whole 84.5 2 81.6 2 81.9 2 3.09 4
Beef, muscle 74.3 6 66.9 6 73.8 5 2.30 8
Lima beans 66.5 12.5 51.5 16 47.9 18 1.53 18.5
Peas 63.7 16 46.7 18 55.8 14 1.57 16.5
Pigeon peas 57.1 19 52.1 15 44.4 20 1.57 16.5
Brewer's yeast 66.5 12.5 55.6 11.5 56.1 13 2.24 10
Fish, muscle 76.0 5 79.5 3 64.6 8 3.55 2
TABLE 3.1.2.--Matrix of Correlation Coefficients Relat-
ing Biological Value, Net Protein Utiliza—
tion, Net Protein Value, and Protein
Efficiency Ratio Scores.
BV NPV NPU
NPV 0.942 1.000 0.881
NPU 0.924 0.881 1.000
PER 0.906 0.859 0.918
TABLE 3.1.3.—-Matrix of Correlation Coefficients Relat-
ing Biological Value, Net Protein Utiliza-
tion, Net Protein Value, and Protein
Efficiency Ratio Ranks (Spearman's rho).
BV NPV NPU
NPV 0.911 1.000 0.865
NPU 0.901 0.865 1.000
PER 0.893 0.839 0.928
between raw scores of the various indices. On the other
hand, Table 3.1.3 has Spearman's rho (99) correla-
tion coefficients for the ratings (P < 0.01). Basically,
rho can be regarded as a regression coefficient for two
ranked variables. Spearman's index additionally tells
us that not only are the scores highly correlated, but
so are the relative rankings we derive from them. This
lends support to the contention that all these tests are
a measure of the same variable.
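The cross-correlation in Tables 3.1.2 and 3.1.3 can be reproduced in miniature. The sketch below computes Pearson's r, and Spearman's rho as Pearson's r on ranks (valid here because the subset has no ties), for eight of the BV and NPV scores in Table 3.1.1.

```python
import math

def pearson_r(x, y):
    """Linear (Pearson) correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Rank scores descending (1 = highest); assumes no ties."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

# BV and NPV scores for eight proteins taken from Table 3.1.1:
bv = [93.7, 64.7, 59.4, 79.7, 44.6, 84.5, 72.8, 81.1]
npv = [90.9, 58.8, 54.5, 76.8, 37.9, 81.9, 65.9, 76.2]

r = pearson_r(bv, npv)                  # correlation of raw scores
rho = pearson_r(ranks(bv), ranks(npv))  # Spearman's rho on the rankings
```

Both coefficients exceed 0.9 for this subset, in agreement with the full-table values reported above.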
The previously mentioned measures of protein
quality all involve a combination of biological and
chemical methods of analysis. Nutritionists have ardu-
ously sought a simple chemical procedure for determina-
tion of protein quality which would be as accurate as
the experimental measure of biological value. The
incentive is that biological tests are expensive and
time-consuming.
One of the first attempts to minimize biological
testing utilized chemical score (CS). Employing the
principle of the limiting essential amino acid as a
justification for their method, Mitchell and Block (100)
calculated a mathematical regression between their chemi-
cal scores and the biological values of 23 different
proteins.
Chemical score is represented by the minimum
amino acid ratio of amino acids in a test protein to
those in a standard protein; it was first advanced as:

    CS = min(ax_j) / as_j ,                                    (3.1.5)

where min(ax_j) is the content of the jth essential amino
acid which is most limiting in a test protein and as_j is
the content of the jth essential amino acid in the
standard protein (usually egg), expressed in units of
milligrams of amino acid per gram protein-N or grams per
16 grams protein-N.
However, the chemical score method considers
only one amino acid in the protein, so a scoring method
was sought that would include more amino acids of the
food protein.
A variation of the above approach, which incor-
porates all the essential amino acids of a protein into
an index of quality, was conceived by Oser (101) and has
come to be known as the Essential Amino Acid Index (EAAI).
This index has been used to estimate the biological
value of a food protein relative to that of a standard
protein.
The EAAI is basically a determination of the
geometric mean of a set of ratios. These ratios are the
same as those used in the chemical score procedure (i.e.,
the ratio of essential amino acid concentration in an
arbitrary food protein x relative to its concentration
in a standard protein). The standard protein used is
egg and Oser assigns to egg the biological value of 100.
The following mathematical formula is used to
calculate the index:
    EAAI = [(ax_1/as_1)(ax_2/as_2) ... (ax_10/as_10)]^(1/10) x 100 .   (3.1.7)
Oser had two additional rules he employed in determining
the EAAI: (1) the maximum value of the ratio for any
essential amino acid will never exceed 1.0, and (2) the
minimum ratio will never be less than 0.01.
The first of these assumptions is based on the
view that any quantity of an amino acid in excess of that
possessed by the standard is not needed by the organism
for growth. Thus, the surplus may be disregarded. In
the second assumption, the justification is that there
always exist certain endogenous sources of protein (e.g.,
intestinal enzymes, tissue degradation) which will supply
some of any essential amino acid.
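The geometric mean of equation (3.1.7), together with Oser's two clamping rules, can be sketched as follows. The amino acid contents are hypothetical, and three acids stand in for the ten.

```python
import math

def eaai(test_aa, standard_aa):
    """Essential Amino Acid Index (Oser): geometric mean of the
    test/standard ratios, each clamped to [0.01, 1.0], times 100."""
    ratios = []
    for aa, std in standard_aa.items():
        r = test_aa[aa] / std
        ratios.append(max(0.01, min(r, 1.0)))  # Oser's rules 1 and 2
    return 100.0 * math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical contents, mg per g protein-N; methionine exceeds the egg
# standard here and is therefore clamped to a ratio of 1.0:
egg = {"lysine": 436, "methionine": 210, "tryptophan": 106}
test = {"lysine": 300, "methionine": 250, "tryptophan": 95}

score = eaai(test, egg)
```

By construction the standard protein scores exactly 100, and any surplus of a single amino acid cannot raise the index, reflecting Oser's first rule.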
The limitations of amino acid scoring methods
involve the factors which influence the digestibility of
the protein. For example, when the essential amino acids
are not completely available for metabolism, due to malab-
sorption or some other factor, the tendency of any index
based solely upon their content is to overestimate the
real biological value. Consequently, these indices are
most accurate for those proteins which are more completely
digestible and lose accuracy as the protein digestibility
decreases.
A closing observation concerns the accuracy of
these indices in assessing protein quality. In a recent
appraisal of protein quality, Bender considered it suf-
ficient to classify proteins as poor, moderate and good,
and thought that Oser's EAAI or Mitchell's chemical score
were as good indices as any to use in assessing the multi-
plicity of protein needs (102). Thus, the salient point
seems to be that the amino acid content of the food pro—
tein may be one of the best indices of protein quality
available, and most certainly is a major determining
factor of the biological methods presented in this
section.
3.2 An Information—Entropy Model
of Protein Quality
The purpose of this section is to view protein
nutrition, or more specifically amino acid nutrition,
through an information-entropy lens. The study of protein
nutrition is extremely complex, but the main criteria for
nutritional well-being are dictated by the organism's
growth and maintenance requirements. As noted previously,
various indices judge protein quality by estimating that
fraction of the protein which is retained for growth or
maintenance, depending upon experimental constraints.
Consequently, any model which addresses the problem of
protein quality must consider the disparity between pro-
tein needs during growth and those of maintenance, and
the ways in which such variation affects protein quality.
I wish to begin my development of an information-
entropy model in a discussion of the information flow
from the genetic space to the protein space. My objective
is not to extensively discuss the transcription of DNA
into protein, but rather to outline the processes
involved. The following sequence depicts the transcrip-
tion of the information from DNA (103):
(1) transcription from nuclear DNA template
to messenger RNA (mRNA)
(2) mRNA to cytoplasm
(3) attachment of 30S and 50S ribosomal
RNA (rRNA) subunits (called ribosomes)
to mRNA
(4) activation of amino acid by reaction
with transfer RNA (tRNA) forming
aminoacyl-tRNA
(5) aminoacyl—tRNA is directed to appro-
priate codon on mRNA
(6) synthesis of peptide bond by rRNA—
mRNA—aminoacyl-tRNA-protein complex
(7) termination of peptide chain by chain-
terminating codon.
This sequence is graphically illustrated in Figure 3.2.1.
The above relates how information coded on the
DNA template passes to a protein through RNA intermedi—
aries. As was previously mentioned in sections 2.3 and
2.4, the information present in the DNA can be defined
by an information-entropy measure, H_n, based on the DNA's
nucleic acid frequency and sequence. This idea can be
further refined to stipulate that each protein-generating
DNA template has its own individual information-entropy
content. If these templates can each be assumed to be
structurally unique, then their information-entropy
[Figure 3.2.1.--Graphical Representation of Transcription of Genetic Information in Protein Synthesis.]
contents would also be unique. Given the information
transfer of protein synthesis, the information-entropy
level of the DNA template determines the amino acid com-
position of the protein. From the information theory
viewpoint, such a nuclear DNA template can be regarded
as an information—entropy message source, and the gen-
eration of protein structure can be accomplished by a
biological communication system through which the message
is sent.
Five basic components make up this information
communication system: (1) a message for transmission,
(2) an encoding device, (3) a channel, (4) a decoding
device, and (5) a message transmitted or received. The
message sent over this system must be derived from a DNA
template in the genetic structure of the organism. The
encoding device should consist of enzymes such as RNA
polymerase, which encode the DNA message into messenger
RNA. I define the messenger RNA as the channel for this
system, for it carries the nucleic acid message from the
nucleus to the cytoplasm, where synthesis or decoding
occurs. Decoding devices for the channel are the ribo—
somal RNA subunits, aminoacyl—tRNA, and various protein
initiating factors. The decoding process seems to be the
most complex step of all, involving many phases; it could
be viewed as a highly redundant process to ensure accu—
rate decoding of the message. The message is received
by the growing peptide chain, which upon completion
results in the protein—coded molecular form of the nucleic
acid message. The relationships described among these
biological phenomena and the information communication
system are illustrated in Figure 3.2.2.
One aspect of an information system thus far
ignored in the discussion of a gene-protein communication
system is the notion of noise. Because of the high speci—
ficity of the encoding and decoding devices (e.g.,
enzymes, tRNA, etc.), virtually error—free translation
from the DNA to protein occurs (103). Such noiseless
transmission in the system allows the information-entropy
content of the DNA template to be equivalent to the
information-entropy content of the protein, for with
error-free transmission in communication systems, the
entropies of input and output are identical (104).
The above conservation principle between source
and receiver information-entropies is very important,
because we can now use it to explain the changes in
protein pattern requirements during growth. A rapid rate
of accretion of protein begins at birth and decreases as
the animal grows older (105). This rapid protein reten—
tion results both in a higher amino acid intake require—
ment, and in alteration of the pattern of amino acid
requirements between young organisms and adults. The
higher amino acid intake requirement is easily
[Figure 3.2.2.--Idealized Communication System for Transmission of Genetic Information: Source (nuclear DNA template) -> Encoder (RNA polymerase) -> Channel (messenger RNA) -> Decoder (ribosomal RNA subunits, transfer RNA) -> Received Message (finished protein).]
rationalized by observation of the increased demand for
these compounds in protein synthesis. However, the
alterations in the pattern requirements of amino acids
during growth are not so readily explained.
During growth, a phasic development of various
organs (e.g., liver, brain, skeletal muscle, etc.) occurs,
and each organ's growth has its own particular amino acid
pattern (105). The development of these various organ
systems must be caused by the expression of a particular
genetic region on the chromosomes of the organism. If
the assumption, previously put forth, that each such
region possesses a unique information-entropy content,
is valid, then our "conservation of information-entropy"
principle dictates that for the gene-protein channel the
information-entropy content of the corresponding protein
must also be unique. Recall that uniqueness can be
defined as a particular set or pattern of symbol fre—
quencies in information theory, meaning that each unique
protein has a distinct pattern of amino acid frequencies.
Thus, the differences in the pattern requirements between
young and adult organisms can be understood to result
from the differences in the information-entropy levels
of genetic expression taking place during the early and
later stages of development.
The "conservation of information-entropy" princi-
ple explains how protein metabolism and amino acid
requirements can be affected by genetic expression. Now
I wish to relate the gene-protein communication systems
model depicted in Figure 3.2.2 to amino acid consumption
by the organism. The description of protein synthesis
illustrates the importance of amino acids within the
cytoplasm (called the "amino acid pool"). The presence
of many amino acids in the pool results from membrane
transport of the plasma amino acids into the cell (106).
The primary source of plasma amino acids is the diet
(107). Consequently, the decoding of the genetic message
is greatly dependent upon dietary amino acids, and par-
ticularly upon the essential amino acids, because they
affect the amino acid pool composition and size.
Let us momentarily review the physiological phe-
nomena involved in transmitting dietary amino acids to
the tissue cell. Most dietary amino acids are found in
the polymer form. The peptide bonds linking the amino
acids must first be broken to free them for absorption
and utilization by the organism. This bond-breaking proc-
ess is termed "hydrolysis," and begins in the stomach and
is completed in the small intestine (108). The freed
amino acids, coming from endogenous as well as exogenous
sources, and also occasionally some small peptides, are
taken through the intestinal wall by several transport
systems, with each system transporting only a certain
set of amino acids. After absorption the amino acids
enter the portal blood.
The first major organ the dietary amino acids
encounter is the liver, which plays a central role in
allocating these compounds to the other body tissues
(109). Approximately 70% to 100% of the absorbed amino
acids are taken up by the liver. Four possible fates
await the acids absorbed here: (1) catabolism, (2) syn-
thesis into plasma proteins, (3) release as free amino
acids, and (4) storage as a part of the liver's labile
amino acid reserve. The last three play important roles
in supplying remaining body tissues with amino acids,
although complete mechanisms for accomplishing this, par-
ticularly for the plasma proteins, are not fully under-
stood. However, free amino acids in the plasma are
transported into the cell and affect the intracellular
amino acid pool. Thus, the role of the liver is that of
a regulator which temporarily stores the dietary amino
acids until they are required by other organs.
The overall effect of the above is that the
capacity of a cell to carry on protein synthesis is
directly dependent upon the ability of the diet to fulfill
the anabolic requirements of the organism.
From the information perspective, the above proc-
esses can be explained in terms of a communication system.
First, we have an information source, the food protein,
which contains a coded message of amino acids. The
message possesses a certain information capacity deter-
mined in this case by its word frequency (or amino acid
frequency). The message is then encoded (i.e., protein
is digested and transported) for transmission through the
communication channel (i.e., circulatory system), then
decoded (i.e., transported into cell) and directed to
some final destination (i.e., cellular amino acid pool).
With the inclusion of the "noise" concept in the system
(i.e., those inefficiencies such as the incomplete diges-
tion of the dietary protein or poor absorption of amino
acids from the gut), an essentially complete communica-
tion system has been described for the transmission of
the nutritive information in a food protein to the recep-
tor amino acid pools in the organism. Figure 3.2.3 is a
representation of this system relating the aspects of
nutrition and information theory.
Out of this basic concept of a communication sys-
tem, will be developed some nutritional information-
entropy measures. The first question concerns the nature
of our measures of information. Informational units are
amino acids, of which there are approximately twenty.
In the cellular pool, each of these amino acids indepen-
dently maintains a particular level or concentration as
a function of various metabolic outlets (106).
[Figure 3.2.3.--Idealized Communication System for the Transmission of Amino Acid Informational Molecules: Source (food protein) -> Encoder (intestinal amino acid transport system) -> Channel (circulatory system) -> Decoder (cellular amino acid transport system) -> Receiver (cellular amino acid pool).]
The combinatorial approach proposed by Kolmogorov
(110), a maximum entropy formalization of Shannon's
method, is very appropriate for this system. In this
approach we assume that a variable, x, containing N ele-
ments, has an information-entropy content, H(x), equal
to k ln N. Note this formulation for the information-
entropy of variable x is exactly the same as Shannon's
maximum entropy expression, equation (2.2.3), where all
the P_j's are equivalent and equal to 1/N. Kolmogorov
expanded this approach for a set of variables, x_1, ...,
x_j, ..., x_n, each capable of taking on values, N_1, ...,
N_j, ..., N_n, such that the information-entropy of this
set is defined:

    H(x_1, ..., x_j, ..., x_n) = H(x_1) + ... + H(x_j) + ... + H(x_n)
                               = k ln N_1 + ... + k ln N_j + ... + k ln N_n .   (3.2.1)
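The additivity in equation (3.2.1) is simply the logarithm of a product of element counts. A minimal numerical check, with k set to 1 and hypothetical counts:

```python
import math

K = 1.0  # entropy constant; K = 1/ln 2 would give bits

def h(n):
    """Combinatorial information-entropy of a variable with n elements."""
    return K * math.log(n)

# Hypothetical element counts N_1, N_2, N_3 for three independent variables:
counts = [4, 10, 25]
h_set = sum(h(n) for n in counts)  # eq. (3.2.1): entropies add

# Additivity is equivalent to taking the log of the product of counts:
assert abs(h_set - K * math.log(4 * 10 * 25)) < 1e-12
```

A variable with a single element carries no information-entropy, since ln 1 = 0.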
Thus, for my nutritive amino acid communication
system, the variables are the twenty different amino
acids present in the cellular pool, where each possesses
a particular magnitude dependent upon the overall meta-
bolic state of the cell. A similar situation holds for
the dietary amino acids, but the magnitude of each of
these variables is characteristic of the protein fed.
Now, let's formalize this communication system into the
above combinatorial format. Starting with our information-
entropy source, we stipulate that a given amino acid vari-
able, aa_j, has a magnitude, c^x ax_j, for food protein x,
where c^x is the concentration of protein in the diet
(based on molar units of protein), and ax_j is the amino acid
content of the protein (based on molar units of aa_j per
molar unit of protein). The resultant magnitude of the
amino acid variable is calculated on the basis of moles
of aa_j input to the system. Therefore, the total
information-entropy of the jth variable is defined:

    H_x^t(aa_j) = k ln (c^x ax_j) ,                            (3.2.2)
and a characteristic protein information-entropy, which
is based upon the typical or characteristic amino acid
spectrum of the dietary protein, as:

    H_x(aa_j) = k ln (ax_j) .                                  (3.2.3)
For the message source, the channel transmission rate,
I(aa_j), can be defined:

    I(aa_j) = (k/Δt) ln (c^x ax_j) .                           (3.2.4)
The amino acid variable at the receiving end of
the communication system has magnitude am_j for the jth
amino acid variable. This quantity is also mole-based.
If we were to consider a value for am_j based on a single
cell it would be that amount of amino acid necessary to
provide for all the cellular metabolic needs. However,
the total nutrition of the organism must be considered,
and not only one cell. Hence, the value of am_j will be
that amount necessary to fulfill the metabolic require-
ments of all the cells in the organism. The information-
entropy for the receiving end is:
    H_r(aa_j) = k ln (am_j) .                                  (3.2.5)
Having defined the information-entropies of the
source and receiver, the next aspect of the amino acid
nutritive communication system to be viewed is the notion
of "channel capacity." The theorem on page 26 states
that to minimize transmission errors (noise), the channel
capacity must be greater than or equal to that of the
source. Ability to minimize transmission error is highly
desirable for any organism and nature would probably not
design a system which violated conditions allowing error
minimization. To minimize transmission error, the rela-
tionship which holds for the entropy of the source and
the channel capacity must also hold between the entropy
of encoder and decoder. That is, entropy of the decoding
device must be greater than or equal to that of the
encoding device. Also, the information-entropy capacity
of the source cannot be greater than that of the encoder.
If channel capacity is much greater than that of the
decoder, then decoder entropy becomes the determinant of
low error transmission. Assuming this is the case, the
decoder entropy determines the channel capacity for
error-free transmission. As was previously mentioned,
the decoder entropy's jth variable is determined by the
metabolic requirements of the receiving end of the system
for that amino acid. Consequently, the jth channel
capacity, C_xj, can be taken to be the information-
entropy of the respective receiver:

    C_xj = (1/Δt) H_r(aa_j) = (k/Δt) ln (am_j) .               (3.2.6)
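Equations (3.2.4) and (3.2.6) can be compared numerically. In the sketch below the molar quantities are hypothetical and k and Δt are set to 1; the assertion encodes the low-error condition that channel capacity must not fall below the transmission rate.

```python
import math

K, DT = 1.0, 1.0  # entropy constant and channel time duration (arbitrary units)

def transmission_rate(c_x, ax_j):
    """I(aa_j) = (k / dt) ln(c^x ax_j), eq. (3.2.4)."""
    return (K / DT) * math.log(c_x * ax_j)

def channel_capacity(am_j):
    """C_xj = (k / dt) ln(am_j), eq. (3.2.6)."""
    return (K / DT) * math.log(am_j)

# Hypothetical molar quantities: dietary supply versus metabolic requirement.
c_x, ax_j = 2.0, 30.0  # protein concentration; moles aa_j per mole protein
am_j = 80.0            # moles aa_j required by the receiving cells

rate = transmission_rate(c_x, ax_j)
capacity = channel_capacity(am_j)
```

Here the metabolic requirement (80 moles) exceeds the dietary supply (60 moles), so the capacity bound is respected.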
Both the channel transmission rates and channel
capacities are measures of amino acid frequencies of our
system. These amino acid frequencies are very much like
word frequencies in any spoken language. That amino
acids are words and not symbols can be argued along
genetic lines. From the genetic code we know that amino
acids are coded by a combination of nucleic acid symbols,
as we similarly use letters to code words. In this vein,
amino acids can be regarded as words. The importance of
interpreting amino acids as words is that this interpre-
tation offers the opportunity to utilize Zipf's law and
obtain a cost—frequency ranking or ordering scheme.
The main proposition of Zipf's law is that the
total cost for a message of N words, C_N, equals:

    C_N = Σ_i n_i c_i .                                        (3.2.7)
For our system we sum over only one i value because we
are transmitting through a one-word channel. Therefore,
equation (3.2.7) reduces to:

    C_N = n_j c_j .                                            (3.2.8)
Before proceeding further, let us define more
fully what is meant by "cost." The total cost, C_N, can
be visualized as the total time available for the jth
amino acid to do an appointed number of metabolic tasks
for the best growth or maintenance performance by the
organism. The number of metabolic tasks which can be
performed during C_N is n_j. Associated with each n_j is
an individual cost, c_j, which is the average performance
time for each task in the time period C_N. Now, if there
is some ideal number of tasks the jth amino acid must
perform during C_N and n_j is less than the ideal, an
inefficiency arises. The degree to which the system is
inefficient is measured by c_j, the average task perform-
ance time.
Given the above definition, let us proceed to
deduce Zipf's law as Mandelbrot did. Fortunately, the com-
binatorial information-entropy formulation is much easier
to handle than a probabilistic form. Thus, the finite
difference approach is not needed to obtain the cost-
frequency relationship. We begin by setting C_N equal to
Δt, the time duration of our channel. By definition, the
ideal number of metabolic tasks required of aa_j over the
time Δt is am_j, and the number which can be supplied by
food protein x is c^x ax_j. Substituting these values into
equation (3.2.8) we get:

    Δt = (am_j)(c_mj) = (c^x ax_j)(c_xj) ,                     (3.2.9)

which becomes:

    c_xj / c_mj = am_j / (c^x ax_j) ,                          (3.2.10)
where c_xj and c_mj are the individual metabolic performance
costs of source aa_j and receiver aa_j, respectively. The
left side of equation (3.2.10) is a ranking or ordering
function of the ability to perform the ideal number of
metabolic tasks relative to the number permitted by
dietary limitation. The ranking is absolute if c_xj is
restricted only to integer multiples of c_mj, and the
readily identifiable integer sequence results. The rank-
ing is relative if c_xj is any other real multiple of c_mj.
The above formulation is exactly equivalent to
Zipf's law if we use an absolute ranking condition,
ar_xj · c_mj = c_xj. Making the above substitution, and
taking the logarithms of both sides, we obtain:

    ln [(ar_xj)(c_mj) / c_mj] = ln ar_xj = ln am_j - ln (c^x ax_j) ,   (3.2.11)
where ar_xj is the absolute rank-order of protein x,
am_j is the maximum aa_j frequency of the system, and
c^x ax_j is the aa_j frequency of protein x. Equation
(3.2.11) is an exact formulation of Zipf's law.
However, because there is not sufficient evidence
to assume that c_xj is always some integer multiple of
c_mj, the relative ranking form of Zipf's law will be
used:

ln (c_xj / c_mj) = ln r_xj = ln a_mj - ln c^x a_xj ,        (3.2.12)
where r_xj is the relative rank-order.
Terms in the above formula should look familiar,
because after multiplying by a constant factor, k/At, the
formula becomes the difference between channel capacity
and the transmission rate. This allows one to see the
continuity between the operation of the nutritive amino
acid communication system and a metabolic cost-frequency
ranking dictated by Zipf's law. The objective now is to
utilize this concept of relative rank-order to relate our
information-entropy analysis to some of the indices of
protein quality discussed in the previous chapter.
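To make the rank-frequency link concrete, here is a minimal numerical sketch (all quantities are hypothetical illustrations, not values from this study): the relative rank-order r_xj is the ratio a_mj / (c^x a_xj), and k ln r_xj, scaled by 1/Δt, equals channel capacity minus transmission rate.

```python
import math

# Hypothetical values for a single amino acid channel j (illustration only):
a_mj = 120.0   # ideal (maximum) aa_j task frequency of the receiver
c_x = 0.8      # concentration factor of food protein x
a_xj = 50.0    # aa_j frequency in protein x
k = 1.0        # entropy constant (nats)
dt = 1.0       # channel time duration

# Relative rank-order, equation (3.2.12): ln r_xj = ln a_mj - ln(c^x a_xj)
r_xj = a_mj / (c_x * a_xj)

# Multiplying k ln r_xj by 1/dt gives channel capacity minus transmission rate
capacity = (k / dt) * math.log(a_mj)
rate = (k / dt) * math.log(c_x * a_xj)
assert abs((k / dt) * math.log(r_xj) - (capacity - rate)) < 1e-12
print(r_xj)  # 3.0
```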
We start by taking the antilogarithm of equation
(3.2.12) and shifting around some terms:
c^x a_xj = a_mj / r_xj .        (3.2.13)
Recall that c^x a_xj is the greatest possible quantity of
the jth essential amino acid from food protein x that can
be utilized by the organism. For the time being, let us
assume that all of the jth amino acid content of protein
x is utilized for protein synthesis by the organism, and
is thereby retained. If we divide equation (3.2.13) by
a factor which we assume constant over Δt, the total
protein nitrogen intake, N_I, expressions for both the
biological value of the jth amino acid of x, BV(x_j), and
the net protein value, NPV(x_j) (or, similarly, net protein
utilization) result:

BV(x_j) = NPV(x_j) = c^x a_xj / N_I = (a_mj / N_I) · (1 / r_xj) .        (3.2.14)
This states that biological value and net protein value
for a single amino acid are inversely proportional to the
relative rank-orders.
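A quick numerical sketch of equation (3.2.14) (all numbers invented for illustration): with a_mj and N_I held fixed, doubling the relative rank halves BV and NPV.

```python
# Sketch of equation (3.2.14), with invented values (not data from the text):
# BV(x_j) = NPV(x_j) = c^x a_xj / N_I = (a_mj / N_I) * (1 / r_xj)
a_mj = 120.0   # maximum aa_j frequency
N_I = 200.0    # total protein nitrogen intake, assumed constant over dt

for r_xj in (1.0, 2.0, 4.0):          # relative rank-orders
    bv = (a_mj / N_I) * (1.0 / r_xj)  # inverse proportionality to rank
    print(r_xj, bv)                   # 0.6, then 0.3, then 0.15
```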
By taking c^x a_xj and c^s a_sj for two different food
proteins, x and s, and dividing them, we get:

r_sj / r_xj = (c^x a_xj) / (c^s a_sj) .        (3.2.15)
The following identity is seen from equation (3.2.14):
r_sj / r_xj = BV(x_j) / BV(s_j) = NPV(x_j) / NPV(s_j) ,        (3.2.16)
and equation (3.2.15) becomes:

BV(x_j) / BV(s_j) = NPV(x_j) / NPV(s_j) = (c^x a_xj) / (c^s a_sj) .        (3.2.17)
The above equation is a combined form of Zipf's law for
two variables of different rank. The important point here
is that under our constraint of complete absorption and
retention of the aaj, two protein quality indices have
been generated from our information-entropy rule, Zipf's
law.
Before removing some constraints from equation
(3.2.17) and seeing what happens to BV and NPV, I wish
to demonstrate an invariance of these indices when the
protein concentrations of two proteins are identical.
Let us assume c^s equal to c^x. The first thing which we
notice is that:
BV(x_j) / BV(s_j) = NPV(x_j) / NPV(s_j) = a_xj / a_sj ,        (3.2.18)
or the relationship between the protein quality indices
remains unchanged. This allows us to approximate the
total source entropies by the characteristic information-
entropy of the food protein inputted to the system.
The first constraint I will remove is that of 100%
digestibility. Removing this constraint changes equation
(3.2.18) to:
[D(x_j) · BV(x_j)] / [D(s_j) · BV(s_j)] = NPV(x_j) / NPV(s_j) = a_xj / a_sj ,        (3.2.19)
where D(sj) is the digestibility of protein s and D(xj)
is that of protein x. Removing the digestibility con-
dition causes no change in our NPV relationship, but
does alter our BV, leading us to conclude that NPV is an
index more amenable to information-entropy analysis than
BV.
From the previous equation, the chemical score
index is readily derived. First, we limit consideration
of amino acids to those which are essential. Then, we
make protein s, usually egg, our standard protein,
against whose essential amino acid levels we will compare
those of protein x. The minimum axj/asj ratio will be
equal to the chemical score (CS). By taking the logarithm
of equation (3.2.19) the information-entropy explanation
of chemical score is made obvious:

k ln [CS] = min k ln [NPV(x_j) / NPV(s_j)] = min k ln (a_xj / a_sj) .        (3.2.20)
The chemical score is a measure of the amino acid channel
which transmits the least information.
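As an illustration of equation (3.2.20) (the amino acid levels below are invented, not the dissertation's data), the chemical score is the minimum test-to-standard ratio over the essential amino acid channels, i.e., the channel transmitting the least information:

```python
import math

# Invented essential amino acid frequencies (per g N) for a test protein x
# and a standard protein s (e.g., egg); not values from the text.
a_x = {"lysine": 3.2, "threonine": 2.1, "methionine": 0.9}
a_s = {"lysine": 4.0, "threonine": 2.5, "methionine": 1.8}

# Chemical score = min_j (a_xj / a_sj), the antilog form of equation (3.2.20)
cs = min(a_x[j] / a_s[j] for j in a_x)
limiting = min(a_x, key=lambda j: a_x[j] / a_s[j])

k = 1.0
print(limiting, cs, round(k * math.log(cs), 3))  # methionine 0.5 -0.693
```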
The essential amino acid index also can be
deduced from my information-entropy analysis. Instead
of searching through the essential amino acid channels
for the one with the minimum unknown/standard ratio, we
take the average of all the essential aaj channels:
k ln EAAI = (1/10) Σ_{j ∈ EAA} k ln [NPV(x_j) / NPV(s_j)]
          = (1/10) Σ_{j ∈ EAA} k ln (a_xj / a_sj) .        (3.2.21)
The essential amino acid index is, thus, an average
essential amino acid transmission rate through the
system, and if we neglect Oser's condition for rejecting
that fraction of a_xj greater than a_sj, the EAAI is
equivalent to the average entropy over the essential
amino acid set (EAA) as defined by equation (3.2.1).
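Equation (3.2.21) can be sketched numerically as well (the ten channel ratios below are invented): the EAAI is the antilog of the average log ratio, i.e., the geometric mean of the ten essential amino acid ratios.

```python
import math

# Invented ratios a_xj / a_sj for the ten essential amino acid channels
ratios = [0.8, 0.9, 1.1, 0.7, 1.0, 0.95, 0.85, 1.2, 0.75, 0.9]

# k ln EAAI = (1/10) sum_j k ln(a_xj / a_sj), equation (3.2.21), with k = 1
ln_eaai = sum(math.log(r) for r in ratios) / 10.0
eaai = math.exp(ln_eaai)

# Equivalent closed form: the geometric mean of the ratios
geometric_mean = math.prod(ratios) ** 0.1
assert abs(eaai - geometric_mean) < 1e-9
print(round(eaai, 3))
```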
Now I will summarize the material presented thus
far in this section. First, a foundation for a nutritive
amino acid communication system was developed, by relat-
ing a schematic of an idealized communication system to
protein metabolism. Next, the type of information flow-
ing through the system (i.e., amino acids) was discussed.
The concept of "channel" was defined separately for each
amino acid, the basis of this approach being the highly
specific decoding mechanisms of the cell, and a
mathematical formulation of maximum and actual channel
transmission was developed. The point was then made that
the actual information transmission over a channel, after
encoding and decoding had completely taken place for one
message, reflected the amino acid frequency in the
source protein message. It was then shown that the word
frequency in the channel was associated with a cost in
the same way that cost is reflected by word frequency in
any other code. A Zipfian distribution between the
word frequencies of the source protein and order with
respect to their costs was shown to hold. The relation-
ship between biological value, net protein value, and
rank was deduced, and resulted in general log-linear
relations among word frequency, biological value and net
protein value. On the assignment of various (experi-
mentally controlled) values to the terms of our Zipfian
equation, the nutritional indices of biological value,
net protein value, chemical score, and essential amino
acid index were obtained with the proper constraints.
This section has proposed a theoretical model for
a nutritional protein communication system with analysis
by information theory. The results of the analysis can
be shown to correspond to some experimental and empirical
protein quality indices. Such an analysis of protein
nutrition is significant because it illustrates that
information—entropy measures of protein nutrition can be
recognized as relevant indicators of quality. Informa-
tion theory complements these chemical indices by pro-
viding them with a causal relation to nutritional protein
evaluation. An analysis with experimental data of the
concepts thus far presented follows.
3.3 Analysis of Information-
Entropy Approach
An analysis of the information-entropy model will
be undertaken in this section. Specifically, the model
predicts correspondence between amino acid content of
proteins and experimental measures of their protein
quality, namely, biological value and net protein value,
and this will be investigated. This analysis will be
accomplished using published data on the amino acid
contents of proteins and on their respective biological
data.
Three information-entropy measures will be
studied. Two utilize the notion of the average
information-entropy of the essential amino acid set.
Using equation (3.2.1) we define this variable for pro-
tein x as:

H_x(EAA) = (1/10) Σ_{j ∈ EAA} k ln a_xj .        (3.3.1)
The characteristic protein form is used because the level
of protein in the experimental diet was constant. In the
previous section Oser's essential amino acid index
was derived from this information-entropy measure and
related to a log-average of the aaj net protein values.
I shall denote the log-average of the net protein
values for protein x as NPV_x(EAA), and for protein s as
NPV_s(EAA). Equation (3.2.21) becomes:
NPV_x(EAA) - NPV_s(EAA) = H_x(EAA) - H_s(EAA) .        (3.3.2)
If the assumption is made that the NPV for each aa_j in
the standard is equal to 1.00, NPV_s(EAA) equals zero and
NPV_x(EAA) becomes:

NPV_x(EAA) = H_x(EAA) - H_s(EAA) .        (3.3.3)
The above is a logarithmic form of the EAAI, discounting
Oser's rules. The antilog form of equation (3.3.3)
should be an approximation of the true experimental NPV.
This antilog form will be defined as I_x(NPV) if Oser's
conditions are not utilized, and as I*_x(NPV) with his
conditions intact. Both are information-entropy indices.
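A sketch of the two antilog indices (the frequencies are invented, and the capping step is my reading of Oser's condition, which rejects the fraction of a_xj exceeding a_sj):

```python
import math

# Invented essential amino acid frequencies for a test protein x and a
# standard protein s; not values from the dissertation.
a_x = [3.2, 2.1, 0.9, 4.5, 1.8, 2.6, 1.1, 3.0, 2.2, 1.5]
a_s = [4.0, 2.5, 1.8, 3.0, 2.0, 2.4, 1.0, 3.5, 2.0, 1.6]

def h(freqs):
    """Average information-entropy over the EAA set, equation (3.3.1), k = 1."""
    return sum(math.log(a) for a in freqs) / len(freqs)

# I_x(NPV): antilog of H_x(EAA) - H_s(EAA), no constraint on the ratios
i_npv = math.exp(h(a_x) - h(a_s))

# I*_x(NPV): cap each a_xj at a_sj first (assumed reading of Oser's rule)
capped = [min(ax, asj) for ax, asj in zip(a_x, a_s)]
i_star_npv = math.exp(h(capped) - h(a_s))

assert i_star_npv <= i_npv  # capping can only lower the index
print(round(i_npv, 3), round(i_star_npv, 3))
```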
The other information-entropy measure is one
found in equation (3.2.20), which generates the chemical
score index. The antilog form for the left side of that
equation will be denoted as I_x(CS).
To avoid the hazards involved with collecting
values from widely scattered literature citations, I have
primarily selected data from a single extensive investi-
gation of the amino acid content of proteins and of their
effect on protein quality. The work to which I refer is
that of Bjorn O. Eggum (111) in studies carried out at
the Institute of Animal Science, Department of Animal
Physiology, Copenhagen. However, utilizing one source
also has its risks and to take into account various
peculiarities or errors in Eggum's work another source
on the amino acid content of proteins was employed.
These data are presented in an FAO compilation of amino
acid data, entitled The Amino Acid Content of Foods and
Biological Data of Proteins (97).
Sixteen different protein diets were studied in
Eggum's experiments. The essential amino acid contents
of these test diets are given in Table 3.3.1 for Eggum's
study, while Table 3.3.2 lists the amino acid contents
for similar food proteins found in the FAO report.
Unfortunately, Eggum's study did not include tryptophan
values. It is readily seen by comparison that the other
amino acid values in Tables 3.3.1 and 3.3.2 are very
similar. Therefore, I am going to use the FAO tryptophan
values with Eggum's other amino acid values in subsequent
calculations of information—entropy indices. Some com-
promises had to be made: FAO "meat and bone meal" data
was utilized in place of "meat and bone scraps"; "oatmeal"
was substituted for "oats" and "dehulled oats" in the
TABLE 3.3.1.--Amino Acid Content (micromoles per gram N) of Food
Proteins (Source: Eggum). [The tabulated values for the sixteen
test diets are not legible in this scan.]
TABLE 3.3.2.--Amino Acid Content (micromoles per gram N) of Food
Proteins (Source: FAO). [The tabulated values are not legible in
this scan.]
50:50 mixed diets of dehulled oats with skim milk powder
and soybean meal; whole meal rye for rye; soybean,
groundnut, and sunflower seed for the respective meals;
milk powder for skim milk powder; and a 60% milk powder
+ 40% oatmeal mixture for the pig prestarter.
The results of Eggum's biological tests are
listed in Table 3.3.3. Two of the protein quality indi-
ces discussed in section 3.1 were measured, biological
value and net protein value. Two different test animals
were used: rats, each initially weighing 75 grams, and
baby pigs, about 16 days old. The rats were fed 150 mg N
once daily. The balance period was 5 days, and the feed-
ing regimen was initiated 4 days before. The baby pigs
were fed 6 times daily, and the protein level in their
diets was 3.84% N of dry matter. The pigs were condi-
tioned for 6 days before the balance period, which was
4 days long.
Table 3.3.4 lists the information model's three
predicted values for the experimental net protein values,
which reflect various constraints upon the model (results
are given on a scale of 0 to 100). I_x(NPV) is the uncon-
strained EAAI, whereas I*_x(NPV) employs Oser's rules, and
I_x(CS) takes account of the limiting amino acid con-
straint in chemical scoring. The subscripts e and f
denote Eggum's and FAO data sources, respectively. A
linear regression analysis (98) was done between the
TABLE 3.3.3.--Biological Values and Net Protein Values of Sixteen
Test Diets of Rats and Baby Pigs (Source: Eggum). [The tabulated
values are not legible in this scan.]
TABLE 3.3.4.--Information-Entropy Indices for Sixteen Different
Food Proteins. [The tabulated values are not legible in this
scan.]
experimental biological values and net protein values of
both rats and pigs and the three information-entropy
indices. Table 3.3.5 lists the correlation coefficients
(98) of this regression analysis. All correlations
between the information-entropy indices and biological
evaluations are highly significant (P < 0.005 for all
regressions). Because the digestibility of the protein
affects the model's ability to predict this parameter,
the correlation of the information-entropy indices is
somewhat poorer for biological value than for net protein
value. This was anticipated, however, and a reasonable
correlation still exists between the model's predictions
TABLE 3.3.5.--Matrix of Correlation Coefficients for Information-
Entropy Measures Versus Net Protein Values and
Biological Values of Rats and Baby Pigs (based on
amino acid content of dietary protein).

              Net Protein   Biological    Net Protein   Biological
              Value (Rats)  Value (Rats)  Value (Pigs)  Value (Pigs)
I_x(NPV_e)       0.843         0.706         0.795         0.673
I*_x(NPV_e)      0.898         0.794         0.817         0.729
I_x(CS_e)        0.914         0.896         0.702         0.675
I_x(NPV_f)       0.856         0.704         0.786         0.634
I*_x(NPV_f)      0.880         0.748         0.806         0.677
I_x(CS_f)        0.951         0.916         0.777         0.729
and biological value. The two information indices,
I_x(NPV) and I*_x(NPV), in which were considered the total
essential amino acid contents of the proteins, gave a
more consistent interspecies correlation than did the
chemical scoring estimate. This seems to suggest that a
total entropy criterion based on all essential amino acid
channels gives better results than relying on the entropy
of a single channel.
A ranking of the sixteen proteins based on their
respective scores was done. Spearman's rank correlation
coefficients (99) were then computed. The correlations
TABLE 3.3.6.--Matrix of Spearman's Rank Correlation Coefficients for
Ranks of Information-Entropy Measures Versus Net Pro-
tein Values and Biological Values of Rats and Baby
Pigs (based on amino acid content of dietary protein).

              Net Protein   Biological    Net Protein   Biological
              Value (Rats)  Value (Rats)  Value (Pigs)  Value (Pigs)
I_x(NPV_e)       0.824         0.629         0.546         0.513
I*_x(NPV_e)      0.859         0.685         0.615         0.582
I_x(CS_e)        0.854         0.848         0.476         0.499
I_x(NPV_f)       0.888         0.727         0.594         0.526
I*_x(NPV_f)      0.897         0.747         0.641         0.538
I_x(CS_f)        0.950         0.906         0.553         0.582
among the rank-orderings, Table 3.3.6, as dictated by
biological testing and information-entropy, were sig-
nificant (P < 0.05) for all regressions. This analy-
sis shows that not only does the information-entropy
model generate significantly correlated scoring, but
the rankings of these scores are also consistent.
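The rank comparison above can be sketched with a small Spearman routine (the scores below are invented, not values from the tables); for untied ranks the usual formula is rho = 1 - 6 Σd² / (n(n² - 1)).

```python
# Sketch of Spearman's rank correlation for two untied score lists;
# the scores below are invented, not values from the tables.
def ranks(scores):
    # rank 1 = highest score, matching a protein-quality ordering
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    r = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

model_index = [92.1, 75.4, 66.0, 81.3, 58.7]  # e.g., an I_x(NPV)-style score
biological = [89.5, 66.2, 68.9, 79.0, 55.1]   # e.g., a measured NPV

print(round(spearman(model_index, biological), 3))  # 0.9
```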
Eggum determined the individual amino acid avail-
abilities for the food proteins, and thus performance of
the model was tested using quantities of available amino
acids as information sources. Each of the information-
entropy indices was recalculated using the fraction of the
protein amino acids which was available. A linear regres-
sion analysis and rank correlation were again done, and
the resultant correlation coefficients are given in
Tables 3.3.7 and 3.3.8, respectively. The results of this
analysis indicate that the model predictions are rela-
tively uninfluenced by the use of the original amino acid
content of the protein as opposed to the use of their
respective availabilities. Only the I_x(CS) index, based
on the chemical scoring assumption, exhibits a consistent
improvement.
The final analysis undertaken in this section
was an examination of the use of Zipf's law in the model.
If a Zipfian relationship is present, a log-log plot of
our measure of information frequency versus the ranking
TABLE 3.3.7.--Matrix of Correlation Coefficients for
Information-Entropy Measures Versus Net
Protein Values and Biological Values of
Rats and Baby Pigs (based on available
amino acid content of dietary protein).

                           I_x(NPV)   I*_x(NPV)   I_x(CS)
Net Protein Value (Rats)     0.832      0.887      0.944
Biological Value (Rats)      0.643      0.700      0.866
Net Protein Value (Pigs)     0.811      0.864      0.805
Biological Value (Pigs)      0.675      0.817      0.705
TABLE 3.3.8.--Matrix of Spearman's Rank Correlation
Coefficients for Ranks of Information-
Entropy Measures Versus Net Protein Values
and Biological Values of Rats and Baby Pigs
(based on available amino acid content of
dietary protein).

                           I_x(NPV)   I*_x(NPV)   I_x(CS)
Net Protein Value (Rats)     0.782      0.812      0.918
Biological Value (Pigs)      0.551      0.700      0.589
[The remaining rows of this table are not legible in this scan.]
function should be linear. The information frequencies
for our information variables are given as follows: for
I_x(NPV) the logarithm of the information frequency is
H_x(EAA), while for I*_x(NPV) the log frequency is H_x(EAA)
modified by Oser's rules, which will be denoted H*_x(EAA),
and that of I_x(CS) is min H_x(aa_j). The ranking function
is the inverse of the experimental net protein value or
biological value index (scaled to 1.0), as stipulated in
section 3.2. The amino acid frequency for the chemical
scoring method was calculated by multiplying IX(CS) by
the log-average amino acid frequency for egg protein,
anti-log HX(NPV) of egg. The logarithmic form of the
resultant frequency is denoted min H*_x(aa_j). This stan-
dardization procedure is done to remove the variation which
would occur in the analysis if the raw min H_x(aa_j) were used.
The results of a correlation and regression analysis are
presented in Tables 3.3.9 and 3.3.10. Table 3.3.9 gives
the correlation coefficients. All correlations are at
least significant at the P < 0.01 level and these results
tend to support the cost-frequency behavior of Zipf's law.
However, the most interesting aspect of the analy-
sis is found in Table 3.3.10, which lists the slopes of
the Zipfian regression analyses. The information-entropy
model would predict a slope of -1.00, but except for sev-
eral values, our regression analyses yield slopes of
approximately -0.50. Figure 3.3.1 illustrates this result
TABLE 3.3.9.--Matrix of Correlation Coefficients for Zipfian
(Log-Log) Analysis of Information-Entropy Model. [The tabulated
values are not legible in this scan.]
TABLE 3.3.10.--Matrix of Slopes of Zipfian (Log-Log) Analysis of
Information-Entropy Model. [The tabulated values are not legible
in this scan.]
Figure 3.3.1.--H*_x(EAA), the Average Information-Entropy,
Versus Zipfian Rank-Ordering for Net Protein Value for Rats.
[Figure not reproducible in this scan: a plot of information
frequency (net protein value) against log(rank-ordering),
showing a fitted slope of -0.526 for the data against the
model-predicted slope of -1.000.]
for the information-entropy model of the net protein
value rank-ordering of rats. This apparent error in the
model is due to a constraint we still have operating,
namely, total retention of all amino acids fed into the
system. This effect will be explored more fully in the
discussion when the model is assessed in light of all
evidence.
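The slope test behind Table 3.3.10 can be sketched as an ordinary least-squares fit on log-log axes (the points below are synthetic, generated with slope -0.5 to mimic the observed behavior rather than taken from the data):

```python
import math

# Synthetic Zipfian check: log(frequency) versus log(rank) should be
# linear, and an ordinary least-squares fit recovers the slope.
rank_values = [1.0, 1.5, 2.0, 3.0, 5.0, 8.0]
log_rank = [math.log10(r) for r in rank_values]
log_freq = [4.0 - 0.5 * x for x in log_rank]  # exact line of slope -0.5

def ols_slope(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

slope = ols_slope(log_rank, log_freq)
assert abs(slope + 0.5) < 1e-9
print(round(slope, 3))  # -0.5
```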
In general, the information—entropy model cor-
relates with both scores and rankings of such indices as
biological value and net protein value. These results
tend to indicate that information theory could provide
a causal interpretation for underlying biological phe-
nomena and serve as an aid in rationalizing observed
nutritional behavior.
CHAPTER IV
INFORMATION AND THE HYDROLYSIS OF
CARBOHYDRATE POLYMERS
The release of simple sugars from complex glucose
polymers is directly related to the nutritional values of
these substances. A relationship between chain length
and hydrolysis can be deduced by an information-entropy
analysis. The organization of carbohydrate information
is related to the polymer's length, and this structural
information affects the rate of hydrolysis. The following
presents an information-entropy approach for discerning
the above relationship and verifies the analysis with
experimental data on such degradative processes.
4.1 Aspects of Carbohydrate Structure,
Hydrolysis and Metabolism
Unlike protein structure, carbohydrate structure
is usually based on the frequency of only one type of
chemical information, namely, glucose (112). Glucose is
a hexose or six-carbon sugar. Glucose is commonly found
in polymer forms of which there are two main linear
classes, amyloses and celluloses. The difference between
amylose and cellulose is structural: the way the glucose
monomers are bonded into chains differs. In carbohy-
drates the general picture is of a single information
unit linked together in different ways, whereas in protein
nutrition there are many different information units
linked together in one way (i.e., the peptide bond).
An amylose bond originates at the α-position on
the asymmetric carbon of glucose (113) and the cellulose
bond at the β-position (114). Primarily, the linkages go from
the C-1 to the carbon at the C-4 position in the adjoin-
ing molecule when the linear polymers are formed. Aside
from these linear bonds, branching ones also exist, the
most common being the α-1,6. This discussion will be
limited to the linear 1-4 linkages.
Both amylose and cellulose must be degraded, so
their glucose can be made available in monomer form,
before these substances possess nutritional value.
Degradation is accomplished by bond-specific enzymes.
Those glucan hydrolase enzymes which react with the
α-1,4 linked glucose of amylose are known as amylases
(115), while those which react with the β-1,4 linkages of
cellulose are called cellulases (116). These glucan
hydrolases operate in two different ways. The exo-enzyme
mechanism attacks the nonreducing end of the polymer,
cleaving off disaccharides in an endwise fashion. The
endo-enzyme mechanism attacks the internal linkages of
the polymer, randomly breaking it down, initially into a
mixture of di- and tri-saccharides, and finally into a
mixture of mono- and di-saccharides.
The residual di-saccharides and the few tri-
saccharides produced by polymer degradation are readily
hydrolyzed into glucose by an enzymatic class known as
the glucosidases. These enzymes are specific for the
α- or β-bonds of two- or three-unit glucose chains and
react poorly or not at all with polymers of greater chain
length. Maltase is the common name of the glucosidase
degrading the α-linked di-saccharide, maltose, whereas
cellobiase is the enzyme acting on the β-dimer of glucose,
cellobiose.
Once glucose is obtained from the digestion of
carbohydrate polymers, one of its metabolic functions is
to provide the organism with energy. This energy is
obtained by the breakdown of glucose into carbon dioxide
and water. Two metabolic pathways are needed to derive
the nutritional energy from glucose (117). The Embden-
Meyerhof pathway takes glucose and converts it into two
molecules of pyruvate with some generation of biochemical
energy (ATP). The pyruvate is then oxidized, with the
loss of a carbon, into an acetyl group which enters the
tricarboxylic acid cycle and is completely oxidized into
carbon dioxide and water with significant generation of ATP.
Given the above metabolic role of carbohydrates,
how is their nutritional value measured? Digestibility
is the main criterion for ascertaining the value of car-
bohydrate. In the routine analysis of feeds, the nutri-
tional benefit of carbohydrate results from the
digestibility of two fractions, the crude fiber fraction
and the nitrogen-free extract. The chemical procedure for
this fractionation of feeds was developed over 100 years
ago and is known as the Weende method (118). The crude
fiber fraction consists basically of cellulose, lignin,
and other structural polysaccharides. The nitrogen-free
extract consists of amylose, sugars, lignin, and material
known as hemicellulose. Generally, then, the digesti-
bilities of cellulose and amylose components of the feed
are studied. Once the digestibilities of crude fiber
and nitrogen-free extract fractions are known, the contri-
butions to digestible and metabolizable energy of the
cellulosidic and amylosidic components can be determined
(119). The conversion factor for carbohydrates used in
calculating the digestible energy from the digestible
crude fiber and nitrogen-free extract fractions is
4 kcal/g. Also, if only the crude fiber and nitrogen-
free extract fractions are considered, the digestible
energy equals the metabolizable energy.
A chemical analysis of the feedstuff ingested by
an animal will give the necessary data on the quantity of
carbohydrate in the diet. The energy contribution of
these carbohydrates to the organism's metabolism is
ascertained by experimentally determining the digesti-
bility of this fraction. Given the polymeric nature of
carbohydrates and the role these polymers have in nutri-
tion, their structure-function behavior could be patterned
after an information-entropic behavior similar to that in
the previous analysis of proteins. Just what information-
entropic rules are followed in carbohydrate metabolism
will be examined in the following section by analyzing how
the polymeric structure of carbohydrates affects the rate
and extent of their hydrolysis.
4.2 An Encoding Model for
Carbohydrate Information
In Chapter III the information-entropy approach
was used to analyze the transmission of nutritional pro-
tein information. The capacity of the decoder was
important in the previous analysis because it reflected
the optimal metabolic requirement for amino acids or
protein information. This demand had to be uniquely
satisfied for each essential word (amino acid), because
each was missing information required for development.
The information requirements in carbohydrate nutrition
differ from those for protein nutrition.
First, a carbohydrate requirement does not exist
per se, but rather, an energy requirement does. The
primary function of carbohydrates is to satisfy the energy
requirement, a requirement also fulfilled by proteins and
lipids. Therefore, carbohydrates do not supply essential
information as proteins do. The channel and decoding
capacities are not as relevant information-entropy
parameters for carbohydrates as they are for proteins.
This is because the excessive carbohydrate storage capa-
bility of the organism places no limit upon the trans-
mission of the carbohydrate message once it enters the
channel. Consequently, the carbohydrate information
encoded (i.e., transported) into the organism identifies
the carbohydrate nutritional contribution.
In the previous section the main carbohydrate
messages, amylose and cellulose, and their basic informa-
tion unit, glucose, were discussed. To be properly
encoded, a carbohydrate polymer message must be reduced
to the monomer form. This encoding process is analogous
to the enzymatic digestion of the polymer and we can
think of the amylase and cellulase enzymes as encoding
devices. Typically, the efficiency of the encoding proc-
ess is related to the length of the message to be encoded.
The longer the message, the greater the cost of encoding
and the lower the efficiency.
Message length in carbohydrate chemistry is
synonymous with the degree of polymerization, DP. Con-
sider a carbohydrate source which is monodisperse (i.e.,
all molecules have the same degree of polymerization)
(120). If N_g equals the total number of glucose units in
this source, the frequency of messages, m(DPj) with length
DPj, equals Ng divided by DPj. Given the relationship
between message length and encoding efficiency, Zipf's
law will order message length with respect to encoding
cost. Equation (2.2.22) gives this rank-frequency
relation:
k ln m(DPj) = k ln Ng - k ln (Rankj) , (4.2.1)

or, alternatively,

k ln (Ng/DPj) = k ln Ng - k ln (Rankj) . (4.2.2)

Solving equation (4.2.2) for Rankj yields the relation-
ship:

k ln (Rankj) = k ln Ng - k ln (Ng/DPj)
             = k ln DPj . (4.2.3)
Equation (4.2.3) gives both the logarithm of the
rank and the absolute redundancy, which, if divided by
k ln Ng, becomes the relative redundancy. The encoding
efficiency, EFj, equals one minus the relative redundancy,
and for carbohydrates has the following form:

EFj = 1 - [k ln DPj / k ln Ng] = [k ln (Ng/DPj)] / [k ln Ng]
    = 1 - [k ln (Rankj) / k ln Ng] . (4.2.4)
The above equation is identical to Reza's definition of
encoding efficiency (121): the entropy of the original
message ensemble, k ln (Ng/DPj), divided by the maximum
information, k ln Ng, times the average length of the
encoded message (equal to one for monomers). The inter-
pretation of the encoding process for a carbohydrate
message, then, is that as the cost of encoding increases
(i.e., as the ranking of messages with respect to their
degree of polymerization increases) the efficiency of
the encoding process decreases.
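The efficiency relation just derived can be sketched numerically. The sketch below is illustrative only: the values of Ng and DPj are invented, and k is taken as 1 since it cancels in the ratio of equation (4.2.4).

```python
import math

def encoding_efficiency(dp, n_g):
    """Encoding efficiency of a monodisperse carbohydrate message,
    EFj = 1 - k ln(DPj) / (k ln Ng), per equation (4.2.4)."""
    return 1.0 - math.log(dp) / math.log(n_g)

# Illustrative source: 10**6 glucose units arranged in chains of length DPj.
n_g = 10**6
for dp in (10, 100, 1000):
    print(dp, round(encoding_efficiency(dp, n_g), 3))
# Efficiency falls as chain length (and hence rank) increases:
# 10 -> 0.833, 100 -> 0.667, 1000 -> 0.5
```

The longer the message, the closer its information content comes to the maximum k ln Ng, and the less efficient its encoding into monomers.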
Now comes the question of relating the above
result from our information-entropy analysis to a rele-
vant nutritional index. Let us begin by defining the cost
associated with encoding further. Equation (2.2.13) gives
the total message cost which can accommodate an
information-entropy analysis. By considering a mono-
disperse carbohydrate message where the frequency, nj,
of words can be determined by dividing the total number
of symbols (monomers) present, Ng, by the message length
or degree of polymerization, DPj, the total cost, CJ,
becomes:

CJ = nj cj = (Ng/DPj) cj . (4.2.5)
An expression for the degree of polymerization based
upon the above variables is:

DPj = (Ng/CJ) cj . (4.2.6)
The variable cj is, in the context of our defi-
nition, the cost (i.e., time) per word. The total
activity of an enzyme is defined as the substrate con-
sumed per unit time (122). Consideration of the dimen-
sions of these two variables indicates that they are
inversely related; cj is equal to the inverse of the total
enzymatic activity. It is also logical that the enzymatic
activity is a determinant of the cost of encoding because
the encoding device for the carbohydrate message is an
enzyme. Given that this relationship is correct, a link
between a physical measure of carbohydrate hydrolysis
(i.e., enzyme activity) and information encoding can be
established. Substituting the total enzymatic activity,
aj, for a carbohydrate polymer of degree of polymerization
j, we obtain:

DPj = Ng / (CJ aj) . (4.2.7)
Let us now consider two monodisperse systems of different
degrees of polymerization, each with a total of Ng mono-
mers. The total costs, CJ and CI, of each system are
equal, but the individual message costs, cj and ci, will
be different. This is logical because the message
frequency of the system possessing the lower degree of
polymerization increases proportionally as its individual
message cost decreases. Now, by taking the logarithm of
equation (4.2.7) for the jth and ith polymer systems and
calculating the difference between them, the following
is seen to be true:
k ln DPj - k ln DPi = k ln (ai/aj) . (4.2.8)
Note that a similar result is obtained by computing the
difference between the jth and ith systems using equa-
tion (4.2.3), and results would be identical if the
inverse total enzymatic activity were equal to the rank.
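The consistency of equations (4.2.7) and (4.2.8) is easy to verify numerically; in the sketch below the values of Ng, CJ, and the two degrees of polymerization are arbitrary illustrations.

```python
import math

N_G, C_TOTAL = 1.0e6, 50.0  # arbitrary shared Ng and total cost CJ

def activity(dp):
    """Total enzymatic activity implied by equation (4.2.7):
    aj = Ng / (CJ * DPj), i.e. inversely proportional to chain length."""
    return N_G / (C_TOTAL * dp)

dp_i, dp_j = 3000.0, 750.0              # ith system more highly polymerized
lhs = math.log(dp_j) - math.log(dp_i)   # k ln DPj - k ln DPi, with k = 1
rhs = math.log(activity(dp_i) / activity(dp_j))  # k ln (ai/aj)
print(abs(lhs - rhs) < 1e-12)           # equation (4.2.8) holds; prints True
```

Because the shared constants Ng and CJ cancel, the log-difference of the chain lengths equals the log-ratio of the activities for any choice of the illustrative constants.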
Assuming that the enzymatic activity does have
such a relationship to rank, what is the significance?
Firstly, equations (4.2.7) and (4.2.3) imply that as rank
of DPj increases, the total enzymatic activity will
decrease, or alternatively, the respective polymer cost
will increase. Then, if we assume DPi to be greater
than DPj, the ratio ai/aj measures the relative rate of
hydrolysis as the more highly polymerized ith system is
degraded to a less polymerized state, the jth system.
This activity ratio is also known as the yield or recovery
(122) of the activity at the nth step of a reaction, com—
pared with some reference level. Since as the polymer is
degraded, it attains a lower degree of polymerization,
the activity increases proportionally; the activity
recovered by the polymer being degraded can be thought
of as proportional to the degree to which it has been
hydrolyzed. Therefore, the activity ratio ai/aj,
measuring the proportion of the activity recovered by
the molecule during degradation, can be viewed as an
estimate of a hydrolysis coefficient, Di. Setting
ai/aj equal to Di, the degree of hydrolysis the ith poly-
mer system has undergone when it possesses a degree of
polymerization of j, and substituting into equation
(4.2.8), yields:
k ln DPj = k ln DPi + k ln Di , (4.2.9)
which relates the information-entropy measure, message
length, to the degree of hydrolysis. The hydrolytic mea-
sure of equation (4.2.9) is identical to the difference
between the encoding efficiency of the ith system and
that of the jth system. Using equation (4.2.4) to
determine EFi and EFj, the difference equals:

EFi - EFj = [k ln (Ng/DPi)] / [k ln Ng] - [k ln (Ng/DPj)] / [k ln Ng]
          = [k ln (DPj/DPi)] / [k ln Ng] , (4.2.10)

which is proportional to the extent of hydrolysis, Di, in
equation (4.2.9).
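Equation (4.2.9) can be sketched directly: with k = 1, the hydrolysis coefficient is just the ratio of the two chain lengths. The DP values below are invented for illustration.

```python
import math

def hydrolysis_coefficient(dp_i, dp_j):
    """Di from equation (4.2.9): k ln DPj = k ln DPi + k ln Di,
    so Di = DPj / DPi for the ith system degraded to the jth."""
    return dp_j / dp_i

# Illustration: an amylose of DP 3000 degraded to DP 750.
d_i = hydrolysis_coefficient(3000, 750)
print(d_i)   # → 0.25
# ln Di reproduces the log-difference of the chain lengths:
print(abs(math.log(d_i) - (math.log(750) - math.log(3000))) < 1e-12)   # → True
```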
Thus, the cost-efficiency reasoning encompassed
by Zipf's law has led to ordering of the hydrolyses of
carbohydrate messages based on their lengths. This rela-
tionship in turn has been shown to be identical to the
encoding efficiency of the carbohydrate message, which
is probably the most sensitive variable in the trans—
mission of carbohydrate information. The agreement
between this approach and experimental data will be
demonstrated in the following section.
Before assessing the information—entropy method
for estimating the hydrolysis of amylose, a modification
is necessary because when activities of enzymes are
determined, the experimental conditions are constrained
so the concentration or word frequency, nj, instead of
Ng, the total monomer concentration, is constant. Thus,
equation (4.2.7) becomes:

DPj = (Ngj/CJ)(1/aj) , (4.2.11)
where Ngj is the number of glucose monomers in the jth
system, equal to DPj times nj. Given equation (4.2.11),
the difference between the jth and ith polymer systems
is:
k ln DPj - k ln DPi = k ln [(Ngj CI)/(Ngi CJ)] + k ln (ai/aj) . (4.2.12)

In order to see how equation (4.2.12) differs from equa-
tion (4.2.8), both the relationships between Ngj and Ngi
and those between CJ and CI must be known.
If we denote DPi as greater than DPj, the experi-
mental constraint of nj being equal to ni implies that
Ngi equals Ngj times the ratio of DPi to DPj. That the
ratio of CI to CJ is equal to that of DPi to DPj can be
shown by first changing Ngi to Ngj times the DPi/DPj
ratio and substituting it into equation (4.2.11), which
allows a solution of DPj in terms of CI. Alternatively,
equation (4.2.7) gives a solution of DPj in terms of CJ.
Equating these two expressions for DPj shows the ratio of
CI/CJ equal to aj/ai, and the activity ratio given by
equation (4.2.8) equal to the ratio DPi/DPj. Putting
together these relationships, the ratio term of equation
(4.2.12) becomes:

(Ngj/Ngi)(DPi/DPj) = 1 , (4.2.13)

the logarithm of which equals zero. Therefore, the
experimental modification of making the initial
concentrations (i.e., word frequencies) equal necessi-
tates no adjustment of the information-entropy relation-
ship and equation (4.2.9) is valid.
The remaining question concerns the validity of
this information-entropy approach in determining an
experimental value such as enzymatic activity. If the
mathematical relations previously developed can be con-
firmed by experimental means, then more faith can be
placed in the viability of the information approach as a
methodology for understanding enzyme hydrolysis. The
following section will assess the agreement of experi-
mental data with information-entropy theory.
4.3 Assessment of the Carbohydrate
Information-Entropy Analysis
Because the conditions of the information-
entropic analysis were based upon a very selective type
of experimental situation, the first part of this assess-
ment will focus on monodisperse systems. However, the
approach will later be modified so that hydrolyses of
polydisperse systems can be calculated. The total
activity of the enzyme will be determined from velocity
of the reaction (moles/unit time) as given by the
Michaelis-Menten equation (123), and using the kinetic
constants characteristic of the enzyme under study.
The data for amyloses were taken from a paper by
Husemann and Pfannemuller (124), who experimentally
determined the kinetic constants Vm, the maximum reaction
velocity, and Km, the Michaelis-Menten constant, for two
amylases, B-amylase and phosphorylase synthetase (sources
of the enzymes in the studies were not discernible). Both
are exo-enzymes: B-amylase cleaves off maltose units
from the non—reducing end of the polymer, while phos-
phorylase cleaves or adds glucose-l-phosphate units to the
ends of polymers (125). The experiment was done with
amylose having degrees of polymerization from 750 to
3,815 where the molecular weight distribution for each
polymer was narrow (i.e., approximating a monodiSperse
solution). These amylose chains served as sites of
degradation for B-amylase action or sites of synthesis
for phosphorylase. The substrate concentration used in
calculating the total activities of the enzymes was 0.06 M.
Table 4.3.1 (page 111) presents the experimental kinetic data
for these enzymes while a graphic display of these rela-
tionships to DPj for B-amylase can be found in Figure
4.3.1 (page 109). The results of a correlation and
regression analysis (P < 0.05) between total enzymatic
activity and degree of polymerization are found in Table
4.3.3 (page 112).
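The Table 4.3.3 analysis can be reproduced in outline from the B-amylase columns of Table 4.3.1. The sketch below fits an ordinary least-squares line to ln DPj versus ln aj, the axes of Figure 4.3.1; the slope it yields will not match the published -1.33 exactly, since the original's logarithm base and rounding are not stated.

```python
import math

# B-amylase data from Table 4.3.1: degree of polymerization
# and total activity aj (micromoles maltose/sec).
dp = [750, 1775, 2875, 3815]
aj = [10.29, 6.25, 3.99, 2.92]

def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)

# Regress ln DPj on ln aj, matching the axes of Figure 4.3.1.
slope = ols_slope([math.log(a) for a in aj], [math.log(d) for d in dp])
print(round(slope, 2))   # negative and near the theoretical -1
```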
Data on monodisperse solutions of carboxymethyl
cellulose for four different cellulase complexes, after
an experiment by Almon and Eriksson (126), were used to
investigate the chain length—activity relationship for
[Figure: log2-scale plot of amylose degree of polymerization versus B-amylase activity (10^-6 moles/sec).]
Figure 4.3.1.--The Degree of Polymerization Versus
Enzymatic Activity for B-Amylase.
[Figure: log-scale plot of cellulose degree of polymerization versus cellulase activity (10^-9 moles/sec).]
Figure 4.3.2.--The Degree of Polymerization Versus
Enzymatic Activity for Cellulase A
(Penicillium Notatum).
TABLE 4.3.1.--Degree of Polymerization and Enzyme
Kinetic Data of Amylose.

                 B-Amylase                  Phosphorylase
DPj        Km*      Vm**     aj**      Km†      Vm††     aj††
750        66.6     10.3     10.29     19.0     22.2     22.19
1,775      39.4     6.25     6.25      13.0     14.7     14.70
2,875      29.6     4.0      3.99      6.75     8.04     8.04
3,815      23.3     3.45     2.92      3.25     4.50     4.50

*micromoles maltose.
**micromoles maltose/sec.
†micromoles glucose.
††micromoles glucose/sec.
TABLE 4.3.2.--Degree of Polymerization and Activity Data
of Cellulose, with Activity in (moles/
sec.) x 10^-9.

DPj     Cellulase A*   Cellulase B**   Cellulase C†   Cellulase D††
112         124            109             126            90
118          85             68              66            68
128         158            185             248           156
211          58             60              80            86
291          69             57              57            60
323          42             25              35            30
351          37             16              31            29
625          17             10              19            12
871          6.2            6.2             7.3           7.1
885          8.0            5.9             5.9           5.5
988         14.3            6.5             8.9          11.2

*Cellulase purified from Penicillium Chrysogenim Notatum.
**Cellulase dialyzed from Aspergillus Oryzae Niger.
†Cellulase partially purified from Aspergillus Oryzae Niger.
††Cellulase purified from Stereum Sanguinolentum.
TABLE 4.3.3.--Correlation and Regression Analysis of
Activity Data Versus Degree of Polymeriza-
tion.

Enzyme            Correlation Coefficient    Slope    y-Intercept
B-Amylase                0.99               -1.33       14.41
Phosphorylase            0.95               -0.97       14.21
Cellulase A*             0.95               -0.72       12.09
Cellulase B**            0.99               -0.65       11.46
Cellulase C†             0.93               -0.57       11.26
Cellulase D††            0.97               -0.70       11.85

*Purified from Penicillium C. Notatum.
**Dialyzed from Aspergillus O. Niger.
†Partially purified from Aspergillus O. Niger.
††Purified from Stereum Sanguinolentum.
cellulose. The carboxy-methyl-substituted cellulose was
used because samples with a narrow molecular distribution
were more readily attainable. The degree of substitution
on the cellulose had a range from 0.8 to 1.0. The
activity was calculated by relating the changes in vis—
cosity to enzymatic degradation and the enzymatic
activity was calculated through the number of bonds
broken per unit time; refer to the above paper if addi-
tional information on determination of the enzymatic
activity is necessary.
The cellulases employed are from three sources:
Penicillium Chrysogenim Notatum, Aspergillus Oryzae Niger,
and Stereum Sanguinolentum. These cellulases are all
complexes of exo—enzymes and endo—enzymes (116), and
random as well as endwise degradation occurs. The rela—
tionship between enzymatic activity and degree of poly—
merization is given in Table 4.3.2 (page 111). A
graphical presentation for the Penicillium Notatum com—
plex, illustrating the respective activity versus chain
length behavior, is found in Figure 4.3.2 (page 110).
The results of correlation and regression analysis
(P < 0.0005) on the cellulases' enzymatic activities are
given in Table 4.3.3 (page 112).
The activity—degree of polymerization data indi—
cate that the log—linear correlation between DPj and aj
based on equation (4.2.7) is good for both the amyloses
and celluloses. This is an important confirmation of my
approach, because equation (4.2.7) was derived from an
information analysis of carbohydrate encoding, and also
is the foundation for subsequent equations relating the
information approach to enzyme hydrolysis. The regression
analysis yields a considerable variation from the pre-
dicted slope of minus one, a necessary condition for
maximal information ordering in the system. Only one
enzyme, phosphorylase, has a slope close to unity——the
others differ. This behavior does not detract from the
information—entropy approach but rather implies that these
other systems are not most effectively organized. The
carboxymethyl celluloses are certainly not, because sub—
stitution of the cellulose polymer is known to have
unpredictable effects on the enzyme-substrate reaction
(126), which could account for the cellulose-cellulase
deviations from unity. The deviation for B-amylase can
perhaps be attributed to its mode of enzyme action.
Phosphorylase adds glucose monomers to the chain, whereas
B-amylase cleaves off maltose, a glucose dimer; there-
fore, B-amylase acts on only about half as many bonds as
would an enzyme cleaving monomers. Thus, a slope between
-2 and -1 should be expected because B-amylase's action
is relatively quicker. The information-entropy activity
relation holds both in the synthesis and degradation of
carbohydrate polymers.
The relationship between hydrolysis, Dj, and
degree of polymerization, DPj, of carbohydrate molecules
in vitro, can be shown for amyloses (127, 128). Four
different hydrolyses were conducted on narrowly distribu-
ted amylose polymers with B-amylase, using substrates with
different initial degrees of polymerization. The results
of these experiments are summarized in Table 4.3.4.
Hydrolysis, Dj, is expressed on a scale of 100 instead
of 1, and the logarithm of zero is designated as equal to
zero. The correlation and regression analysis (see Table
4.3.5) shows a lower degree of correlation and signifi-
cance than that seen in previous analysis (P < 0.1 for
TABLE 4.3.4.--Hydrolysis of Amylose Polymers with B-Amylase, and
Degree of Polymerization.

   Test 1        Test 2        Test 3        Test 4      Natural Amylose
DPj    Dj     DPj    Dj     DPj    Dj     DPj    Dj     DPj    Dj
3,150  0.0%   1,230  0.0%   800    0.0%   795    0.0%   2,600  0.0%
2,050  20%    730    47.5%  560    29.5%  575    24.5%  2,500  35.5%
2,110  40%    525    68.0%  350    84.0%  280    78.0%  2,580  53.5%
1,550  70%    350    91.0%  --     --     --     --     2,200  72.0%
Table 4.3.5.--Correlation and Regression Analysis of
Hydrolysis and Degree of Polymerization.

         Correlation Coefficient    Slope    y-Intercept
Test 1          0.85               -0.14       11.72
Test 2          0.89               -0.22       10.33
Test 3          0.92               -0.17        9.71
Test 4          0.89               -0.21        9.74
test 1 and 2; P < 0.15 for 3 and 4). However, the paucity
and limited range of data could considerably bias the
results of this analysis.
Note that as the range for a particular experiment
increases, so does the correlation coefficient. The
slopes of all the lines also differ from the theoretical
line of minus one.
This deviation of slopes from -1.0 is perhaps best
understood after the influence of the reaction order in
the encoding process is ascertained. Typically, enzyme
reactions are viewed as 1st order: substrate and enzyme
reacting on a one-to-one basis. However, B-amylase
reacts with an average of 4.3 linkages per encounter
(129), by a multichain mechanism, yielding a reaction
order of 4.3. Such a situation necessitates a revised
definition of enzyme activity. If we denote the enzy-
matic activity for a reaction between the enzyme and one
substrate bond as a1j, then the total enzymatic activity
equals the nth product, because a1j reflects the proba-
bility of reaction at one reaction site on the enzyme
molecule, and the joint probability that n sites will
react determines total enzyme activity and equals the
nth product of the activities. Therefore, assuming
that the activity at each site is identical, the total
enzymatic activity equals:

aj = (a1j)^n . (4.3.1)
The cost function, cj, which was initially taken to be
the inverse of the total enzymatic activity, is now seen
to be the inverse of the site activity, a1j. Hence, cj
must be redefined in terms of site activities:

cj = (aj)^(-1/n) , (4.3.2)
which, substituted into equation (4.2.7), gives:

DPj = Ng / [CJ (aj)^(1/n)] . (4.3.3)
This equation dictates a generalized equation for
hydrolysis:

k ln DPj - k ln DPi = (1/n) k ln Di . (4.3.4)

Using this result for a reaction order, n, equal to 4.3
for B-amylase, a slope of -0.232 is expected, closer to the
average slope of -0.185 which results from experimental
data. Why does the previous analysis of DPj versus
activity not exhibit similar behavior? The reason is that
the regression analysis done on the activities listed in
Tables 4.3.1 and 4.3.2 was effectively a comparison of
the rank orders of the enzymatic activities. The rank
ordering of activity for a particular enzyme will be the
same for k ln a1j or for some constant multiple of it,
n k ln a1j, equal to k ln aj. The digestion data were a
part of the dynamic analysis of the relative rates of
hydrolysis, which are dependent upon reaction order.
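The slope correction implied by equation (4.3.4) is simply the reciprocal of the reaction order; a minimal sketch:

```python
def expected_slope(n):
    """Slope predicted by equation (4.3.4): -1/n, where n is the number
    of substrate linkages attacked per enzyme encounter."""
    return -1.0 / n

print(round(expected_slope(1.0), 3))   # → -1.0, the first-order case
print(round(expected_slope(4.3), 3))   # → -0.233 (the text quotes -0.232)
```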
The behavior of monodisperse carbohydrate polymers
differs from that of natural or polydisperse polymers.
The information analysis can be modified to encompass
polydisperse systems. We begin by defining the contri-
bution of the DPj polymer fraction to the polydisperse
activity of system i as k ln DPj times its probable
occurrence, Pj, and do likewise for the DPi polymer.
Thus, the proportional change in the degree of poly-
merization of the system is:

Σ Pj k ln DPj - Σ Pi k ln DPi = k ln (average chain length ratio) . (4.3.4)
In Table 4.3.4, the degree of polymerization appears
independent of the degree of hydrolysis. Can equation
(4.3.4) predict such behavior?
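For a polydisperse system the single chain length is replaced by a probability-weighted mean of k ln DP; the two distributions below are invented for illustration.

```python
import math

def mean_log_dp(dist):
    """Sum of Pj * k ln DPj over a polydisperse system (k = 1);
    dist maps each DPj to its probability of occurrence Pj."""
    return sum(p * math.log(dp) for dp, p in dist.items())

before = {3000: 0.5, 2000: 0.3, 1000: 0.2}   # hypothetical distributions
after  = {1500: 0.4, 1000: 0.4,  500: 0.2}

# k ln (average chain length ratio) for the degradation, per the text:
print(round(mean_log_dp(after) - mean_log_dp(before), 3))
```

A negative change signals degradation toward shorter chains; a value near zero signals a distribution whose mean log chain length is unaffected by hydrolysis.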
Husemann and Pfannemuller (128) studied the dis-
tribution of polymers in polydisperse systems as a func—
tion of hydrolysis. Table 4.3.6 presents the chain
length distribution.
A deviation from maximum organizability was noted
in the Zipfian analyses conducted in section 3.3, where
the slopes differed greatly from unity. Both the "under—
organized" and "overorganized" systems are present.
Since the organizability of the protein information sys-
tem is dependent upon the amounts of amino acids flowing
through the channels, an underorganized system having
b/b' values less than one reflects the loss of organiza-
tion in the system resulting from catabolism. The over-
organized systems, those with b/b' greater than one,
reflect the conservation of amino acids in the protein
information system; that is, the information transmission
rate of the jth channel is increased so more information
can pass through this channel.
Introduction of the catabolic concept into our
system effectively allows rejection of the complete
retention hypothesis. The Zipf's law equation for a
single amino acid becomes:
k ln axj = k ln amj - fc k ln rxj , (5.1.6)
3 J C X]
or, for the entire essential amino acid set:

Hx(EAA) = Hm(EAA) - fc k ln rx , (5.1.7)
where rx is the rank of protein x, Hm(EAA) is the rank
one log-frequency, and fc is a catabolism factor
accounting for the loss of organization in the system
due to amino acid destruction by the liver.
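Equation (5.1.7) describes a straight line in the logarithm of rank; in the sketch below the rank-one log-frequency is invented, and the catabolism factor 0.55 merely sits in the range quoted below for rats.

```python
import math

def h_x(rank, h_m, f_c, k=1.0):
    """Information-entropy of the protein at a given rank under
    equation (5.1.7): Hx(EAA) = Hm(EAA) - fc * k ln(rank)."""
    return h_m - f_c * k * math.log(rank)

for r in (1, 2, 4, 8):
    print(r, round(h_x(r, h_m=10.0, f_c=0.55), 3))
# Rank 1 keeps the full log-frequency; each doubling of rank
# lowers Hx by fc * ln 2, the loss attributed to catabolism.
```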
The regression analyses done in section 3.3 give,
for the information-entropy measures based on the essen-
tial amino acid set, a range for fc of -0.517 to -0.608
for rats, and -0.601 to -0.700 for pigs. The interpreta-
tion of this result is that between 40% and 50% of the
essential amino acids in the diets of rats are catabolized,
and between 30% and 40% of those in the diets of baby pigs
are converted to urea. Both of these values are compar-
able to the experimentally determined value for mature
dogs of approximately 56%. Younger animals such_as the
experimental pigs and rats could be expected to have a
higher nitrogen retention.
If these urea production figures for baby pigs
and rats are correct, a new use for the information-
entropy model has arisen. In addition to being able to
predict protein quality indices, the model can in turn be
used to estimate the degree of catabolism of essential
amino acids for various species of animals. The experi-
mental method for measuring this phenomenon is very
difficult, and the information-entropy model may quite
possibly provide an adequate alternative.
The interpretation of an overorganized system as
overcompensating and thus reflecting essential amino
acid conservation is supported by the regression analyses
in section 3.3. A formula similar to equation (5.1.7)
can be employed to reflect amino acid conservation:
min Hx(aaj) = min Hs(aaj) - fo k ln rxj , (5.1.8)

where fo is a conservation factor which represents the
overcompensation in the system due to essential amino
acid conservation, and min Hs(aaj) is the log frequency
for the standard protein.
Returning to the regression analyses in section
3.3, the range for fo for rats is -1.287 to -1.487, while
the range for pigs is -1.097 to -1.460. These results
imply a conservation of the limiting essential amino acid.
The degree to which it is conserved is not readily evi-
dent: whereas the slope for underorganized systems can
range from 0.0 to 1.0, that for overorganized systems
goes from 1.0 to infinity. Nonetheless, the information-
entropy model conforms to the rule that the most limiting
amino acid is conserved by the organism.
The question now arises as to how the inclusion
of fc will affect the ability of the information-entropy
model to predict net protein value, the protein quality
index. The mathematics involved are quite simple and for
the essential amino acid set, the logarithm of NPV,
NPVx(EAA), is:

log NPVx(EAA) = (1/fc) Hx(EAA) - (1/fc) Hs(EAA) . (5.1.9)
The predicted NPV resulting from utilizing fc will be
denoted Ix(EAAR) and known as the "essential amino acid
retention index":
Ix(EAAR) = antilog [(1/fc) Hx(EAA) - (1/fc) Hs(EAA)] , (5.1.10)
when based on log-frequency Hx(EAA), or Ix°(EAAR) for
log-frequency Hx°(EAA).
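Equation (5.1.10) can be sketched as follows; the two log-frequencies are invented, and the antilog is taken base e to match the k ln form of the preceding equations (the original's base is not stated).

```python
import math

def retention_index(h_x, h_s, f_c):
    """Essential amino acid retention index of equation (5.1.10):
    Ix(EAAR) = antilog( (1/fc) Hx(EAA) - (1/fc) Hs(EAA) )."""
    return math.exp(h_x / f_c - h_s / f_c)

# Hypothetical test and standard log-frequencies, with fc = 0.5 as in the text.
print(round(retention_index(9.7, 10.0, 0.5), 3))   # → 0.549
print(retention_index(10.0, 10.0, 0.5))            # → 1.0 for the standard itself
```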
Utilizing equation (5.1.10) with fc = 0.5,
Ix(EAAR) and Ix°(EAAR) were determined from the data in
section 3.3. Table 5.1.1 lists the correlation coeffi-
cients among Ix(EAAR) and Ix°(EAAR), and the experimental
net protein values for rats and pigs. Table 5.1.2 lists
the slopes of a linear regression analysis for the above
variables and Table 5.1.3 lists the corresponding
y-intercept values.
The correlation coefficients are not very dif-
ferent from those obtained by regression without fc.
However, the regression analysis shows a much improved
picture in the ability of the model to accurately pre-
dict the true score of net protein value for rats but the
results for pigs indicate an fc = 0.5 may be too high a
catabolism factor. This is graphically illustrated in
Figures 5.1.1 and 5.1.2 for rats and pigs, respectively.
These graphs show the relationships of the model's
information-entropy measures, Ix°(EAA) and Ix°(EAAR), to
TABLE 5.1.1.--Correlation Coefficients Among Essential
Amino Acid Retention Indices and Experi-
mental Protein Values of Rats and Pigs.

                            Net Protein      Net Protein
                            Value (Rats)     Value (Pigs)
FAO Data:    Ix(EAARf)         0.854            0.806
             Ix°(EAARf)        0.879            0.826
Eggum Data:  Ix(EAARe)         0.851            0.819
             Ix°(EAARe)        0.910            0.842
TABLE 5.1.2.--Linear Regression Coefficients Among the
Essential Amino Acid Retention Indices and
Experimental Net Protein Values of Rats and
Pigs.

                            Net Protein      Net Protein
                            Value (Rats)     Value (Pigs)
FAO Data:    Ix(EAARf)         0.642            0.533
             Ix°(EAARf)        0.657            0.543
Eggum Data:  Ix(EAARe)         0.678            0.573
             Ix°(EAARe)        0.800            0.650
TABLE 5.1.3.--Y-Intercepts for Regression Analysis Among
Essential Amino Acid Indices and Experi-
mental Net Protein Values of Rats and Pigs.

                            Net Protein      Net Protein
                            Value (Rats)     Value (Pigs)
FAO Data:    Ix(EAARf)         23.7             35.5
             Ix°(EAARf)        24.0             35.8
Eggum Data:  Ix(EAARe)         20.7             32.3
If S >> S', then equation (5.3.1) reduces to the form of
equation (5.3.2), in which the activity varies inversely
with the degree of polymerization:

a ∝ 1/DPj . (5.3.3)
This is the exact conclusion of the information—entropy
analysis. Thus, a valid analog to the encoding model
presented in this chapter can be found in a special appli—
cation of enzyme kinetics. Extension of the model to
predict digestibilities had questionable results. From
my viewpoint, this was due to the lack of adequate data.
However, the regression analysis did not indicate that
the log-log relationship between degree of polymerization
and digestibility had complete merit. Rather, the level
of significance of the results did not offer sufficient
cause for acceptance.
The most pertinent notion gleaned from this
analysis is the inverse proportionality between enzyme
activity and degree of polymerization. It indicates that
introduction of an information-entropy formalism may be
possible for the study of enzyme kinetics.
CHAPTER VI
CONCLUSIONS
1. The information-entropy model of protein
metabolism can be employed to assess the nutritional
quality of proteins. This is accomplished by relating
the amino acid content of proteins in the diet to net
protein value. The model's output is similar to other
amino acid scoring approaches such as Oser's essential
amino acid index and chemical score.
2. A new index, the "essential amino acid reten-
tion index," was postulated from the information—entropy
model. It was as well correlated to net protein value as
other chemical scoring methods, and permitted the catabo-
lism of ingested amino acids to be accounted for.
3. An information-entropy analysis of polymer
length versus the activity of enzymatic hydrolysis, by a
cost—frequency analysis of encoding, was well correlated
with experimental data on the subject.
4. The extension of the information-entropy study
for the estimation of a hydrolysis coefficient could not
be adequately correlated with experimental data.
REFERENCES
(1) S. Carnot, Reflections on the Motive-power of Heat,
1824, trans. by R. H. Thurston, ed. by E. Mendoza
(New York: Dover Publications, 1960), p. 7.
(2) R. Clausius, The Mechanical Theory of Heat (London:
Macmillan and Co., 1879), p. 78.
(3) W. Thomson (Kelvin), "On the Dynamical Theory of
Heat with Numerical Results Deduced from Mr. Joule's
Equivalent of a Thermal Unit, M. Regnault's Obser-
vations on Steam," Trans. of the Royal Society of
Edinburgh (March, 1851), reprinted in The Second
Law of Thermodynamics, ed. by W. F. Magie (New York:
Harper & Brothers, 1899), pp. 111-148.
(4) J. C. Maxwell, Theory of Heat (10th ed.; London:
Longmans, Green & Co., 1921), p. 189.
(5) C. Caratheodory, Math. Ann., No. 67 (1909), p. 355,
cited by S. Blinder, “Caratheodory's Formulation of
the Second Law," Physical Chemistry, ed. by W. Dost
(New York: Academic Press, 1971), pp. 613-637.
(6) M. J. Klein, "The Development of Boltzman's Statis-
tical Ideas," The Boltzman Equation, ed. by E. G. D.
Cohen and W. Thirring (New York: Springer-Verlag,
1973).
(7) L. Boltzman, "Further Studies on the Thermal Equi-
librium of Gas Molecules," Wiener Berichte, v. 66
(1872), cited by L. Boltzman, Lectures on Gas
Theory (1896), trans. by S. Brush (Berkeley: Uni-
versity of California Press, 1964), p. 52.
(8) J. C. Maxwell, Matter and Motion (New York: Dover
Publications, Inc., 1877).
(9) L. Boltzman, "Observations on One Problem of the
Mechanics of Heat Theory," Wiener Berichte, v. 76
(1877), cited in L. Boltzman, Lectures on Gas
Theory (1910), trans. by S. Brush (Berkeley: Uni-
versity of California Press, 1964), p. 58.
(10) G. Arfken, Mathematical Methods for Physicists (New
York: Academic Press, 1965).

(11) J. W. Gibbs, Elementary Principles in Statistical
Mechanics (New Haven: Yale University Press, 1902).

(12) E. H. Kerner, Gibbs Ensemble: Biological Ensemble
(New York: Gordon and Breach, Science Publishers,
1972), p. vii.

(13) J. Kestin and J. R. Dorfman, A Course in Statistical
Thermodynamics (New York: Academic Press, 1971),
pp. 178-181.

(14) Ibid., p. 196.

(15) Ibid., p. 199.

(16) E. T. Jaynes, "Gibbs vs. Boltzman Entropies," Amer.
J. Physics, v. 33 (1965), pp. 391-398.

(17) A. Grunbaum, "Is the Coarse-grained Entropy of
Classical Statistical Mechanics an Anthropomor-
phism," Modern Developments of Thermodynamics,
ed. by B. Gallor (New York: J. Wiley & Sons, 1974),
pp. 413-428.

(18) L. Szilard, "On the Decrease of Entropy in a Thermo-
dynamic System by the Intervention of Intelligent
Beings," Z. Phy., v. 53 (1929), trans. in Behavioral
Science, v. 9 (October, 1964), pp. 301-310.

(19) C. E. Shannon, "A Mathematical Theory of Communica-
tion," Bell System Tech. J., v. 27 (July-October,
1948), pp. 379 and 623.

(20) J. R. Pierce, "The Early Days of Information Theory,"
IEEE Trans. on Info. Thy., v. IT-19, No. 1 (January,
1973).

(21) R. V. L. Hartley, "Transmission of Information,"
Bell System Tech. J., v. 7 (July, 1928), pp. 535-563.

(22) D. Gabor, "New Possibilities in Speech Transmission,"
J. Inst. Elect. Eng. (London), v. 94 (November, 1947),
pp. 369-390.

(23) A. I. Khinchin, Mathematical Foundations of Informa-
tion Theory (New York: Dover Publications, 1957),
pp. 9-13.
(24) E. T. Jaynes, "Information Theory and Statistical
Mechanics," Physical Review, v. 106 (1957).

(25) E. T. Jaynes, "Information Theory and Statistical
Mechanics, II," Physical Review, v. 108, No. 2
(1957).

(26) M. Tribus, P. T. Shannon, and R. B. Evans, "Why
Thermodynamics is a Logical Consequence of Informa-
tion Theory," AIChE J. (March, 1966), p. 244.

(27) P. Ziesche, "About a New Introduction of Entropy
in Statistical Mechanics due to Macke," Proc. of
Coll. on Info. Thy., v. II, ed. by A. Renyi (Buda-
pest, Hungary: J. Bolyai Math. Soc., 1968), pp.
515-518.

(28) J. Fritz, "Information Theory and Thermodynamics
of Gas Systems," Proc. of Coll. on Info. Thy.,
v. I, ed. by A. Renyi (Budapest, Hungary: J.
Bolyai Math. Soc., 1968), pp. 167-175.

(29) M. Tribus, Thermostatics and Thermodynamics: An
Information Theory Approach (Princeton, N.J.:
Van Nostrand, 1961).

(30) R. Baierlein, Atoms and Information Theory (San
Francisco: Freeman and Company, 1971).

(31) D. N. Zubarev, Nonequilibrium Statistical Thermo-
dynamics (New York: Consultants Bureau, 1974),
pp. 38-45 and 100-103.

(32) J. von Neumann, Theory of Self-Reproducing Automata
(Urbana, Illinois: University of Illinois Press,
1966), pp. 60-61.

(33) G. Jumarie, "Further Advances on the General
Thermodynamics of Open Systems via Information
Theory: Effective Entropy, Negative Information,"
Int. J. of Sys. Sci., v. 6 (1975), pp. 249-269.

(34) I. N. Taganov, "Information Simulation of Multi-
factor Systems in Chemistry and Chemical Engineer-
ing," Theoretical Foundations of Chemical Engineer-
ing, English translation from the Russian, v. 9,
No. 2 (1975, translated January, 1976), pp. 223-
228.
(35) D. Slepian, "Information Theory in the Fifties," IEEE Trans. on Info. Thy., v. IT-19, No. 2 (March, 1973), pp. 143-148.

(36) C. Cherry, ed., Information Theory: 3rd London Symposium (London: Butterworths, 1956).

(37) R. W. Hamming, "Error Detecting and Error Correcting Codes," Bell System Tech. J., v. 29 (1950), pp. 147-160.

(38) C. E. Shannon, "Certain Results in Coding Theory for Noisy Channels," Inform. Control, v. 1 (September, 1957), pp. 6-25.

(39) A. J. Viterbi, "Information Theory in the Sixties," IEEE Trans. on Info. Thy., v. IT-19, No. 3 (May, 1973), pp. 257-262.

(40) E. R. Berlekamp, Algebraic Coding Theory (New York: McGraw-Hill, 1968).

(41) J. K. Wolf, "A Survey of Coding Theory: 1967-1972," IEEE Trans. on Info. Thy., v. IT-19, No. 4 (July, 1973), pp. 381-389.

(42) W. R. Garner, Uncertainty and Structure as Psychological Concepts (New York: John Wiley and Sons, Inc., 1962), pp. 28-32.

(43) H. Theil, Economics and Information Theory (Amsterdam, Netherlands: North-Holland Publishing Co., 1967).

(44) J. M. Cozzolino and M. J. Zahner, "The Maximum-Entropy Distribution of Future Market Price of Stock," Operations Research, v. 21 (1973), pp. 1200-1211.

(45) B. Lev, "Accounting and Information Theory," Studies in Accounting Research (American Accounting Association, 1969).

(46) N. Georgescu-Roegen, The Entropy Law and the Economic Process (Cambridge, Massachusetts: Harvard University Press, 1971).

(47) L. W. Rosenfield, Aristotle and Information Theory (The Hague, Netherlands: Humanities Press, 1971).
(48) M. A. P. Willmer, "Information Theory and the Measurement of Detective Performance," Kybernetes.

(49) L. L. Gatlin, "The Entropy Maximum of Protein," Math. Biosci., v. 13 (1972), pp. 213-227.

(50) M. Hasegawa and T. Yano, "The Genetic Code and the Entropy of the Protein," Math. Biosci., v. 24 (1975), pp. 169-182.

(51) C. E. Shannon, "Prediction and Entropy of Printed English," Bell System Tech. J., v. 30 (1951), p. 50.

(52) J. F. Young, Information Theory (New York: Wiley-Interscience, 1971), pp. 50-58.

(53) C. E. Shannon and W. Weaver, The Mathematical Theory of Communication (1st paperback ed.; Urbana, Illinois: University of Illinois Press, 1949).

(54) L. P. Hyvarinen, Information Theory for Engineers (New York: Springer-Verlag, 1968), pp. 15-17.

(55) B. Mandelbrot, Jeux de communication (Institut de Statistique de l'Université de Paris, 1953), cited by L. Brillouin, Science and Information Theory (New York: Academic Press, Inc., 1962), pp. 28-47.

(56) G. K. Zipf, The Psycho-Biology of Language (Cambridge, Massachusetts: MIT Press, 1935).

(57) G. K. Zipf, Human Behavior and the Principle of Least Effort (Reading, Massachusetts: Addison-Wesley Press, Inc., 1949).

(58) L. S. Kozachkov, "Certain Integral Properties of Information Systems of Hierarchic Type," Kibernetika (1974).

(59) V. T. Coates, Revitalization of Small Communities: Transportation Options, U.S. Department of Transportation, DOT-TST-75-1 (May, 1974).

(60) V. Pareto, Manual of Political Economy, trans. by A. S. Schwier, ed. by A. S. Schwier and A. N. Page (New York: A. M. Kelley, 1971).
(61) J. Lotka, "The Frequency Distribution of Scientific Productivity," J. Acad. Sci. (Washington, D.C., No. 12, 1926).

(62) J. Huxley, Problems of Relative Growth (2nd ed.; New York: Dover Press, 1972).

(63) T. A. Loomis, Essentials of Toxicology (Philadelphia: Lea & Febiger, 1968).

(64) M. Tribus, Rational Descriptions, Decisions and Designs (New York: Pergamon Press, 1969).

(65) E. T. Jaynes, Probability Theory in Science and Engineering (Dallas, Texas: Mobil Oil Research Laboratory, 1959).

(66) R. T. Cox, The Algebra of Probable Inference (Baltimore, Maryland: Johns Hopkins Press, 1961), pp. 35-65.

(67) C. E. Shannon, "The Bandwagon," IEEE Trans. Info. Thy., v. IT-2 (March, 1956), p. 3.

(68) H. P. Yockey, R. L. Platzman and H. Quastler, editors, Symposium on Information Theory in Biology (New York: Pergamon Press, 1958).

(69) S. M. Dancoff and H. Quastler, editors, Essays on the Use of Information Theory in Biology (Urbana, Illinois: University of Illinois Press, 1953).

(70) W. M. Elsasser, The Physical Foundations of Information Theory in Biology (New York: Pergamon Press, 1958).

(71) E. Samuel, Order: In Life (Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1972).

(72) J. von Neumann, "Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components," Ann. of Math. Studies, No. 34 (1956), pp. 43-98.

(73) B. Michaels and R. A. Chaplain, "The Encoder Mechanism of Receptor Neurons," Kybernetik, v. 13.

(74) M. Abeles, "Transmission of Information by the Axon: II--The Channel Capacity," Biological Cybernetics, v. 19 (1975), pp. 121-125.
(75) R. F. Quick and T. A. Reichert, "Multi-Channel Models of Human Vision: Bandwidth Considerations," Kybernetik, v. 12 (1973), pp. 141-144.

(76) A. B. Kogan and O. G. Chorajan, "Some Information Theory Applications to the Physiology of the Nervous Cell," Kybernetes, v. 2 (1973), pp. 77-78.

(77) R. Eckhorn and B. Popel, "Rigorous and Extended Application of Information Theory to the Afferent Visual System of the Cat--II: Experimental Results," Biological Cybernetics, v. 17 (1975), pp. 7-17.

(78) M. W. Nirenberg and H. Matthaei, "Dependence of Cell-Free Protein Synthesis in E. coli upon Naturally Occurring or Synthetic Polyribonucleotides," Proc. Nat. Acad. Sci. (USA), v. 47 (1961), p. 1588.

(79) H. R. Mahler and E. H. Cordes, Biological Chemistry (New York: Harper and Row, 1971), p. 783.

(80) L. L. Gatlin, Information Theory and the Living System (New York: Columbia University Press, 1972).

(81) L. M. Spetner, "Information Transmission in Evolution," IEEE Transactions on Information Theory, v. IT-14, No. 1 (1968), pp. 3-6.

(82) E. Schroedinger, What is Life? (Cambridge, England: Cambridge University Press, 1945).

(83) J. von Neumann, Theory of Self-Reproducing Automata (Urbana, Illinois: University of Illinois Press, 1966), p. 61.

(84) P. Fong, "Thermodynamic and Statistical Theory of Life: An Outline," Biogenesis, Evolution, Homeostasis--A Symposium by Correspondence, ed. by A. Locker (Berlin, Germany: Springer-Verlag, 1975), pp. 93-100.

(85) P. Glansdorff and I. Prigogine, Thermodynamic Theory of Structure, Stability and Fluctuations (New York: Wiley-Interscience, 1971).

(86) L. A. Maynard and J. K. Loosli, Animal Nutrition (New York: McGraw-Hill Book Co., 1969), p. 367.

(87) M. Hasegawa and T. Yano, "The Genetic Code and the Entropy of a Protein," Mathematical Biosciences, v. 24 (1975), pp. 169-182.
(88) K. Thomas, "Über die biologische Wertigkeit der Stickstoffsubstanzen in verschiedenen Nahrungsmitteln," Arch. Anat. u. Physiol., Physiol. Abstract (1909), pp. 212-302, cited by Maynard and Loosli, op. cit., p. 459.

(90) H. H. Mitchell and G. G. Carman, "The Biological Value of the Protein Nitrogen Mixtures of Patent White Flour and Animal Foods," J. Biol. Chem., v. 68 (1926), pp. 183-215.

(91) A. E. Bender and D. S. Miller, "A Brief Method for Estimating the Value of Protein," Biochem. J., v. 53 (1953), p. vii.

(92) D. S. Miller and A. E. Bender, "The Determination of the Net Utilization of Proteins by a Shortened Method," British J. of Nutrition, v. 9 (1955), pp. 382-390.

(93) J. M. McLaughlan and J. A. Campbell, "Methodology of Protein Evaluation," Mammalian Protein Metabolism, Vol. III, ed. by H. N. Munro (New York: Academic Press, 1969), pp. 391-422.

(94) T. B. Osborne, L. B. Mendel and E. L. Ferry, "A Method of Expressing Numerically the Growth-Promoting Value of Proteins," J. Biological Chem., v. 37 (1919), p. 223.

(95) D. V. Frost, "Methods of Measuring the Nutritive Value of Proteins, Protein Hydrolyzates, and Amino Acid Mixtures," Protein and Amino Acid Nutrition, ed. by A. A. Albanese (New York: Academic Press, 1959), pp. 225-274.

(96) D. M. Hegsted, "Assessment of Protein Quality," Improvement of Protein Nutriture, National Academy of Sciences (1974), pp. 64-88.

(97) Amino Acid Content of Foods and Biological Data on Proteins, FAO: FAO Nutritional Studies, Rome, Italy (1970), No. 24.

(98) S. B. Richmond, Statistical Analysis (2nd ed.; New York: Ronald Press Co., 1964), pp. 424-465.

(99) R. J. Senter, Analysis of Data (Glenview, Illinois: Scott, Foresman, & Co., 1969), pp. 440-445.
(100) H. H. Mitchell and R. J. Block, "Some Relationships Between the Amino Acid Contents of Proteins and their Nutritional Values for the Rat," J. Biol. Chem., v. 163 (1946), p. 599.

(101) B. L. Oser, "Method for Integrating Essential Amino Acid Content in the Nutritional Evaluation of Proteins," J. Amer. Dietetic Assoc., v. 27 (1951), pp. 396-402.

(102) A. E. Bender, "Rat Assays for Protein Quality--A Reappraisal," Proc. 9th International Congress on Nutrition (Mexico City, Mexico: 1972), v. 3, reprinted by Karger Basel (1975), pp. 310-320.

(103) H. N. Munro, "A General Survey of Mechanisms Regulating Protein Metabolism in Mammals," Mammalian Protein Metabolism, ed. by H. N. Munro (New York: Academic Press, 1970), v. 4, pp. 3-130.

(104) A. M. Rosie, Information and Communication Theory (London, England: Van Nostrand Reinhold Co., 1973), p. 90.

(105) H. H. Williams, A. E. Harper, D. M. Hegsted, et al., "Nitrogen and Amino Acid Requirements," Improvement of Protein Nutriture, National Academy of Sciences (1974), pp. 23-63.

(106) H. N. Munro, "Free Amino Acids and Their Role in Regulation," Mammalian Protein Metabolism, v. 4, ed. by H. N. Munro (New York: Academic Press, 1970), pp. 299-386.

(107) J. M. McLaughlan and A. B. Morrison, "Dietary Factors Affecting Plasma Amino Acid Concentrations," Protein Nutrition and Free Amino Acid Patterns, ed. by J. H. Leathem (New Brunswick, New Jersey: Rutgers University Press, 1968), pp. 3-18.

(108) C. Gitler, "Protein Digestion and Absorption in Non-Ruminants," Mammalian Protein Metabolism, v. 1, ed. by H. N. Munro (New York: Academic Press, 1964), pp. 35-70.

(109) D. H. Elwyn, "Modification of Plasma Amino Acid Pattern by the Liver," Protein Nutrition and Free Amino Acid Patterns, ed. by J. H. Leathem (New Brunswick, New Jersey: Rutgers University Press, 1968), pp. 88-106.
(110) A. N. Kolmogorov, "Three Approaches to the Quantitative Definition of Information," Problems of Information Transmission, v. 1, No. 1 (1965), pp. 3-11.

(111) B. O. Eggum, A Study of Certain Factors Influencing Protein Utilization in Rats and Pigs (Copenhagen, Denmark: I Kommission hos Landhusholdningsselskabets Forlag, 1973).

(112) I. Danishefsky, R. L. Whistler and F. A. Bettelheim, "Introduction to Polysaccharides," The Carbohydrates, Chemistry and Biochemistry, ed. by W. Pigman and D. Horton (New York: Academic Press, 1970), v. II-A, pp. 375-410.

(113) C. T. Greenwood, "Starch and Glycogen," The Carbohydrates, Chemistry and Biochemistry, ed. by W. Pigman and D. Horton (New York: Academic Press, 1970), v. II-B, pp. 471-513.

(114) E. B. Cowling, "Structural Features of Cellulose," ed. by E. T. Reese, Advances in the Enzymatic Hydrolysis of Cellulose and Related Materials (New York: Pergamon Book Press, 1963).

(115) W. S. Whelan, "Enzymatic Explorations of the Structures of Starch and Glycogen," Biochemistry.

(116) K. W. King, "Enzymes of the Cellulase Complex," ed. by R. F. Gould, Cellulases and Their Applications, Advances in Chemistry Series 95 (Washington, D.C.: American Chemical Society Publications, 1969), pp. 7-26.

(117) A. White, P. Handler, and E. Smith, Principles of Biochemistry (New York: McGraw-Hill, Inc., 1973).

(118) Weende Method, cited by L. A. Maynard and J. K. Loosli, Animal Nutrition (New York: McGraw-Hill, Inc., 1969), pp. 76-77.

(119) Biological Energy Interrelationships and Glossary of Energy Terms, National Academy of Sciences (Washington, D.C.: Printing and Publishing Office, NAS, 1966), Publication No. 1411.

(120) R. A. Gibbons, "Polydispersity," Nature, v. 200 (November 16, 1963), pp. 665-666.
(121) F. M. Reza, An Introduction to Information Theory (New York: McGraw-Hill, 1961), pp. 132-135.

(122) H. R. Mahler and E. H. Cordes, Biological Chemistry (New York: Harper & Row, 1966), p. 230.

(123) L. Michaelis and M. L. Menten, Biochem. Z., v. 49 (1913), p. 333.

(124) E. Husemann and B. Pfannemuller, "An Investigation of the Kinetics of β-Amylase and Phosphorylase: The Dependence of Reaction Velocity on the Chain Length of the Amylose," D. Makromole. Chem., v. 87 (1965), pp. 139-151.

(125) W. Z. Hassid, "Biosynthesis of Sugars and Polysaccharides," The Carbohydrates, Chemistry and Biochemistry, ed. by W. Pigman and D. Horton (New York: Academic Press), v. II-A, pp. 302-373.

(126) K. E. Almin and K. E. Eriksson, "Influence