INFORMATION-ENTROPY CONCEPTS FOR NUTRITIONAL SYSTEMS

Dissertation for the Degree of Ph.D.
MICHIGAN STATE UNIVERSITY
JEROME PAUL HARPER
1976

This is to certify that the thesis entitled "Information-Entropy Concepts for Nutritional Systems" presented by Jerome Paul Harper has been accepted towards fulfillment of the requirements for the Ph.D. degree in Agricultural Engineering.

Major professor

ABSTRACT

INFORMATION-ENTROPY CONCEPTS FOR NUTRITIONAL SYSTEMS

By Jerome Paul Harper

The objective of this dissertation is to view nutritive processes as communication systems for transmitting dietary nutritional information. This study utilizes information theory concepts in the analysis of nutritive communication systems. The concept of information-entropy is used to derive the information capacity of the system's dietary inputs and metabolic requirements.

The major process investigated is the system transmitting information in amino acids for protein metabolism. First, a gene-protein channel is defined and hypothesized to be the determinant of the metabolic information-entropy requirements for amino acids. A nutritive communication system is then postulated which contains five basic components: (1) information source (food protein), (2) encoder (intestinal amino acid transport system), (3) channel (circulatory system), (4) decoder (cellular amino acid transport system), and (5) receiver (cellular amino acid pool). The transmission capacity of amino acid information depends upon cellular metabolic requirements which control the decoding capacity, and thus the overall transmission efficiency. Cost of transmission is defined as the ability of an information source to satisfy metabolic requirements during a fixed time period. A familiar rank-frequency distribution of information theory, Zipf's law, is employed to order source proteins on the basis of metabolic cost. Net protein value is shown to be proportional to the inverse protein rank (quality). A single-channel model yields a protein ranking similar to chemical score, while the multichannel model generates a ranking similar to Oser's essential amino acid index. The multichannel model could be adapted to consider amino acid catabolism by the liver (an important loss of information in the channel), and predict a new protein ranking termed the "essential amino acid retention index."

The other study concerns the information-entropy of carbohydrate polymers. The hydrolysis of these polymers is regarded as a metabolic encoding process. The dietary carbohydrate message has to be reduced to the monomer or dimer form if it is to be transmitted through the nutritional channel (i.e., circulatory system). The cost of encoding (time/monomer) is equated with the inverse activity of enzymatic hydrolysis (monomers/time). The ranking of carbohydrate message length (degree of polymerization) with respect to the rate of encoding (hydrolytic activity) is shown to be identical to the ordering scheme dictated by Zipf's law.
< Approved M» S~ Migfr Professéf' Approved Mi :M Department Chairman INFORMATION-ENTROPY CONCEPTS FOR NUTRITIONAL SYSTEMS BY Jerome Paul Harper A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Agricultural Engineering 1976 To Christine ii ACKNOWLEDGMENTS I wish to thank my major professor Dr. J. B. Gerrish for his guidance and assistance on this disser- tation and during my doctoral candidacy. Also, I wish to thank the members of my com- mittee, Dr. M. Z. v. Krzywoblocki, Dr. J. W. Thomas, Dr. D. K. Anderson, and Dr. J. B. Holtman for their thoughts and effortsixrthe preparation<1fmy dissertation. In particular, I want to acknowledge Dr. Krzywoblocki, “Ziggy" to his friends and students, for his tutelage in the field of information theory. LIST OF LIST OF LIST OF Chapter I. II. III. IV. TABLE OF CONTENTS TABLES . .' . . . . . . . . . FIGURES . . . . . . . . . . TERMS . . . . . . . . . . . INTRODUCTION . . . . . . . . . LITERATURE REVIEW AND PERSPECTIVE . . 2.1 Some Thermodynamic Aspects of Entropy . . . 2.2 Entropy and Information Theory . 2.3 Entropy, Information and Biology . 2.4 Entropy, Information and Nutrition INFORMATION AND THE QUALITY OF PROTEINS 3.1 Indices of Protein Quality . . 3.2 An Information—Entropy Model of Protein Quality . . . 3.3 Analysis of Information— —Entropy Approach . . . . . . . INFORMATION AND THE HYDROLYSIS OF CARBOHYDRATE POLYMERS . . . . . 4.1 Aspects of Carbohydrate Structure, Hydrolysis and Metabolism . 4.2 An Encoding Model for Carbohydrate Information . . 4.3 Assessment of the Carbohydrate Information— Entropy Analysis . . DISCUSSION . . . . . . . . . . 5.1 Nitrogen Retention and Information— Entropy . . . . . . . . . Page vi ix 44 54 77 94 94 98 107 122 122 Chapter Page 5.2 The Information-Entropy Model for Protein Metabolism: Summary . . . 133 5.3 Information— —Entropy and Polymers: An Appraisal . . . . . . . 135 VI. CONCLUSIONS . . . . . . . . . . 138 REFERENCES . . . . . . . . . . . . . 139 v LI ST OF TABLES Table Page 3.1.1 Listings of Biological Value, Net Pro- tein Utilization, Net Protein Value, and Protein Efficiency Ratio Scores with their Respective Rankings (Source: FAO) . . . . . . . . . 49 3.1.2 Matrix of Correlation Coefficients Relating Biological Value, Net Protein Utilization, Net Protein Value, and Protein Efficiency Ratio Scores . . . 50 3.1.3 Matrix of Correlation Coefficients Relating Biological Value, Net Protein Utilization, Net Protein Value, and Protein Efficiency Ratio Ranks (ps) . . 50 3.3.1 Amino Acid Content (micromoles per gram N) of Food Proteins (Source: Eggum) . . 80 3.3.2 Amino Acid Content (micromoles per gram N) of Food Proteins (Source: FAO) . . 81 3.3.3 Biological Values and Net Protein Values of Sixteen Test Diets of Rats and Baby Pigs (Source: Eggum) . . . . . . . 83 3.3.4 Information-Entropy Indices for Sixteen Different Food Proteins . . . . . . 84 3.3.5 Matrix of Correlation Coefficients for Information-Entropy Measures Versus Net Protein Values and Biological Values of Rats and Baby Pigs (based on amino acid content of dietary protein) . . . 85 3.3.6 Matrix of Spearman's Rank Correlation Coefficients for Ranks of Information- Entropy Measures Versus Net Protein Values and Biological Values of Rats and Baby Pigs (based on amino acid content of dietary protein) . . . . . 
86 vi Table 3.3.7 3.3.8 3.3.9 3.3.10 Matrix of Correlation Coefficients for Information-Entropy Measures versus Net Protein Values and Biological Values of Rats and Baby Pigs (based on available amino acid content of dietary protein) . . . . . . . Matrix of Spearman's Rank Correlation Coefficients for Ranks of Information- EntrOpy Measures Versus Net Protein Values and Biological Values of Rats and Baby Pigs (based on avilable amino acid content of dietary protein) . . Matrix of Correlation Coefficients for Zipfian (log-log) Analysis of Information-Entropy Model . . . . Matrix of Slopes of Zipfian (log-log) Analysis of Information-Entropy Model Degree of Polymerization and Enzyme Kinetic Data of Amylose . . . . . Degree of Polymerization and Activity Data of Cellulose, with activity in (moles/sec.) x 10'9 . . . . . . Correlation and Regression Analysis of Activity Data Versus Degree of Polymerization . . . . . . . . Hydrolysis of Amylose Polymers with B-Amylase, and Degree of Polymeriza— tion 0 C O O C O O I O O 0 Correlation and Regression Analysis of Hydrolysis and Degree of Polymeriza- tion 0 O O O O O 0 O O O 0 Chain Length Fractionalization of a Polydisperse Carbohydrate System as a Function of Its Degradation . . . . Chain Length Behavior of Polydisperse Carbohydrate Systems . . . . . . vii Page 88 88 9O 91 111 111 112 115 115 119 120 Table 5.1.1 Page Correlation Coefficients Among Essential Amino Acid Retention Indices and Experimental Protein Values of Rats and Pigs . . . . . . . . . . . 130 Linear Regression Coefficients Among the Essential Amino Acid Retention Indices and Experimental Net Protein Values of Rats and Pigs . . . . . . 130 Slopes for Regression Analysis Among Essential Amino Acid Indices and Experi- mental Net Protein Values of Rats and Pigs . . . . . . . . . . . . 130 viii Figure 3.2.1 LIST OF FIGURES Graphical Representation of Transcrip— tion of Genetic Information in Protein Synthesis . . . . . . . . . . Idealized Communication System for Transmission of Genetic Information . Idealized Communication System for the Transmission of Amino Acid Informational Molecules . . . . . Hg(EAA), the Average Information— Entropy, Versus Zipfian-Rank—Ordering for Net Protein Value for Rats . . . The Degree of Polymerization Versus Enzymatic Activity for B-Amylase . . The Degree of Polymerization Versus Enzymatic Activity for Cellulose A (Penicillium Notatum) . . . . . . Graph of Information—Entropy Indices Ig(EEARe) and I§(EEAe) Versus Net Pro— tein Value for Rats (Source: Eggum) . Graph of Information-Entropy Indices Ig(EAARe) and 1%(EAAe) Versus Net Pro— tein Value for Pigs (Source: Eggum) . ix 56 59 64 92 109 110 131 132 B! b; b' LIST OF TERMS Canonical normalization constant Constants Total enzyme activity Enzymatic activity for reaction with one substrate bond .th . . 3 amino acrd Maximum jth amino acid frequency, magnitude of amino acid variable at receiver Absolute ranking of jth amino acid Frequency of jth proteins x and 3 amino acid in Body nitrogen of protein-fed animals Constant Constants Body nitrogen of non-protein-fed animals 1 Biological value Biological value of jth amino acid Channel capacity of a system Total costs of message Cost of ith or jth Capacity of jth channel at source and receiver, respectively symbol or word D 8.); D x.) (J (3 DP DP. 3 d(x,t) e e-subscript E EAA EAAI EF. 3 F f-subscript Cost of source aa. 
and receiver aaj, respectively Total cost of message Chemical score Concentration of protein in diet Digestibility Hydrolysis coefficient Digestibility of protein Degree of polymerization Length of message Distribution function Energy unit Denote Eggum as data source Macroscopic energy Essential amino acid set Essential amino acid index Replica energy in canonical ensemble Encoding efficiency Fecal nitrOgen Denotes FAO as data source Catabolism factor Endoqenous fecal nitrogen Conservation factor Boltzman H-function Information of jth symbol xi IX(CS) I (EAAR); Xo Ix(EAAR) I (NPV) ; x o I (NPV) X Maximum of H-function nth order multivariate information capacity Independent nth order multivariate information capacity Independent bivariate information capacity Bivariate information capacity Entropy of receiver Information-entropy content of x Characteristic information-entropy of protein Total information-entropy of jth variable Average information-entropy of essential amino acid set Channel transmission rate at source .th . . 3 channel transmiss10n rate Stored information Univariate and bivariate divergence Information index Essential amino acid retention index Information—entrOpy indices Constant for converting 1n to log2 Boltzman's constant xii K m m DP. ( 3) min Hs(aaj) min Hx(aaj) NPU NPV NPV(Xj); NPV(Sj) NPVx(EAA); NPVS(EAA) Michaelis-Menten constant Frequency of messages with length DP. 3 Log frequency of IX(CS) for stan- dard protein Log frequency of IX(CS) for protein 5 Chemical potential of jth particle Total number of particles or words Total number of words of cost cj Total number of glucose units in source Number of glucose monomers in system Nitrogen intake Nitrogen intake of non-protein-fed animals Net protein utility Net protein value Net protein value of jth amino acid Log-averages of net protein values Number of jth particles Permutability factor Canonical probability function Protein efficiency ratio Grand canonical probability function xiii P(i); P(j) P(ij) P. J P(j/i) mc Probability of ith or jth element Joint probability of ith and jth elements . . .th Probability of 3 state Conditional probability of jth state Microcanonical probability function Heat Redundancy Spearman's rho h Relative rank of jt amino acid in protein x Rank of protein x Entropy Substrate concentration Statistical entropy of macrosystem Absolute temperature Time Urinary nitrogen Endogenous urinary nitrogen Reaction velocity Maximum reaction velocity Partition function CHAPTER I INTRODUCTION The title of this dissertation contains a compound word, "information-entropy." This word carries with it two different and distinct concepts which together repre— sent somewhat of a union of information theory and ther- modynamics. This union can be viewed as generalizing the study of thermodynamics. My notion of a generalized theory of thermo- dynamics revolves about a simple hypothesis on the occurrence of events. All events occur more or less frequently for two reasons: (a) There exist physical reasons favoring certain states, and (b) there exist some mental reasons favoring certain states. Therefore, if one is to interpret phenomena, a theory, methodology, or principle is needed to quantify the frequency of physical phenomena, and understand what quantification means. In science, the quantification of observable phe— nomena is accomplished by calculating the entropy of the system. 
In fact, the Second Law of Thermodynamics implies that systems which cannot be quantified because their behavior is so random have no utility. It was the purpose of men such as Boltzman, with his permutability factor, and Gibbs, with his interpretation of the behavior of volume in phase space, to develop within the science of thermodynamics the ability to quantify nature. Given that entrOpy is a measure which strives to quantify, how does it relate to information theory? Information theory is the science of quantification. Using its concepts and applying its theorems allows us to understand how we are quantifying a system. This is why I will later stress the importance of Zipf's law, an empirical rank-size rule, in its information theory con- text of frequency-cost relationships. The empirical observation of Zipfian behavior is of little benefit if one does not recognize that such behavior depicts a sys- tem's organization. It is, then, no mere coincidence that thermodynamic measures of entropy and the communication measure of information are similar. Because it is important to quantify phenomena, I have sought in my thesis to develOp an information- entropy methodology for analyzing nutritional systems. Frequency, pattern, and organization are important con- cepts in nutrition and a proper format should be devel- oped for quantifying them. My conceptualization of entrOpy in thermodynamics is that of quantification. Little difference between the meanings of "entropy" in thermodynamics, and "information" in information theory exists. Both strive to achieve the same end, quantifica- tion, which is the deeper meaning of entropy. The quantification of nutritional systems is begun by assuming the organism under study exists as a biological information processing system. The informa- tion being processed here is nutritional information. The source of nutritional information for the organism is the diet, which contains a vast array of different nutri— tional signals, each signal varying in its frequency of occurrence and each diet providing a distinctive pattern of signals. This information is then fed into a highly organized biological communication system which distrib— utes and integrates the nutritional information to provide the necessary chemical order (nutrients) for the continu- ance of the organism's metabolic processes. The nutritive system which best fits into the above sequence of events is the protein metabolic system. An extensive study of this system will be presented in the text, and the relationships between the frequencies of nutritional information (amino acids) and different nutritional frequency patterns of proteins can be used as a measure of the protein quality of the diet. The carbohydrate study measures the information or entropy of macromolecules. The amount of information which they possess is based upon the frequency of nutritional signals (glucose units) per molecule. The ability of the carbohydrate message to be interpreted (hydrolyzed) by the organism's enzymes depends on mes- sage length. The concept of "information—entropy" as used in this study is thus defined as "the frequency of a nutri- tional event" (e.g., the occurrence of an amino acid in the diet or glucose unit insipolymer). This nutritional frequency is then shown to favor a particular metabolic state. CHAPTER II LITERATURE REVIEW AND PERSPECTIVE The relationship between entropy, information and biology is complex. 
The following chapter presents classical and statistical mechanical ideas on entropy, their relationships to information, and the role of information-entropy in biological systems. This chapter is partially an historical review and partially a commentary on the subject. Its purpose is to provide the reader with a perspective on both subjects' scientific aspects and my own conceptualization of the interrelationships among entropy, information and biology.

2.1 Some Thermodynamic Aspects of Entropy

Perhaps the best way to present the concept of information-entropy is to begin with the development of the entropy principle in thermodynamics. Early ideas on heat were based on the study of steam engines, and Sadi Carnot's (1) notes in 1824 on the efficiencies of these engines are often regarded as the starting point of thermodynamics. The concept of entropy was intimately associated with views on the nature of energy, which was thought to possess one of two qualities: (1) to be free and available to do mechanical work, or (2) to be bound and incapable of mechanical work. A qualitative degradation of energy from the free to the bound form was invariably observed to occur, and several rules were formulated to describe this phenomenon.

Clausius (2) stated, "Heat can never, of itself, flow from a colder to a hotter temperature." Thomson's position, on the other hand, was, "It is impossible to derive mechanical effect from any portion of matter by cooling it below the temperature of the coldest surroundings" (3). These statements are similar, and the principle of physics which was derived from them is known as the Second Law of Thermodynamics. Clausius presented in 1865 the classic formulation of this law: The entropy of the universe at all times moves toward a maximum.

The sense of entropy here is like the notion of bound or latent energy with the added constraint of being quantified at a particular temperature. The following equation shows the relationship (4):

Entropy = Bound energy / Absolute temperature   (2.1.1)

A refinement of the above relationship was obtained from analysis of the behavior of an elementary heat engine in the Carnot cycle. It was reasoned that if heat, q, were allowed in or out of the cycle only in infinitesimally small increments, then q could be approximated by its differential form, dq. Utilizing this differential form, the Carnot cycle heat behavior can be described by the following integral form:

∮ dq/T = 0   (reversible),   (2.1.2)

where T is the absolute temperature. The above is an interesting mathematical form, for it is an equation which implies that one has uncovered an exact differential measure, another state variable, different from energy, which describes a system's thermodynamic behavior as the state changes from a to b. The new state function was entropy, S, and was formally defined:

dS = dq/T   (for a reversible process)   (2.1.3)

A mathematical approach for the derivation of caloric entropy proposed by Caratheodory (5) verified that dq/T is an exact differential only if the process is reversible. Clausius used the efficiency concept to illustrate a general difference in the entropy behavior between reversible and irreversible processes. His result, known as the Inequality of Clausius, states that the efficiency of a reversible cyclic process, like the Carnot cycle, is always greater than that of an irreversible cycle.
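A quick numerical illustration of equation (2.1.3) may help; it is not from the original text, and the values of q and T below are assumed, chosen only to show the arithmetic.

```python
# Minimal sketch of dS = dq/T for a reversible, isothermal heat transfer
# (equation 2.1.3).  The numbers are assumed values for illustration only.
q = 1500.0   # heat absorbed reversibly, in joules (assumed)
T = 300.0    # absolute temperature, in kelvin (assumed)

dS = q / T   # entropy change of the system, in J/K
print(f"Entropy change: {dS:.2f} J/K")   # prints 5.00 J/K
```

For an irreversible path between the same two states the entropy change of the system is the same state-function value, but it then exceeds the integral of dq/T taken along that path, which is the differential form of the Clausius inequality.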
Clausius' comprehension of the fact that all real (i.e., irreversible) processes result in an entropy increase of their surroundings led to his statement of the Second Law of Thermodynamics, which was the climactic effort of the classicists in heat theory.

Because of the macroscopic nature of the Carnot cycle study, it is not amenable to mechanistic analysis. Such analysis requires a more detailed description of the phenomenon, a microscopic picture, so that the macroscopic parameters such as temperature, pressure, etc., are understood as the aggregated mechanical behavior of the elementary masses (atoms, molecules) in the system. In a mechanical model, the possibility exists for defining the state variable, entropy, irrespective of the process being reversible or in equilibrium. The groundwork for such a model began in the late nineteenth century and was to be the basis for understanding the formulation of entropy.

The development of a mechanical explanation of the Second Law of Thermodynamics can primarily be attributed to Ludwig Boltzman. His explanation of the Second Law has become the mainstay of statistical mechanics. Boltzman began his studies in 1866 (6), which were highlighted in 1872 (7) with the publication of a long memoir which gave the first derivation of the irreversible increase of entropy based on the laws of mechanics and also upon those of probability. In this memoir Boltzman presented a mathematical proof of the second thermodynamic law by illustrating the uniqueness of Maxwell's velocity distribution law (8) as a descriptor of the equilibrium state. Maxwell had shown for gases that his distribution was stationary, and Boltzman expanded the application of this proof by demonstrating that whatever the initial state of a gas, it approaches a limit in the distribution of Maxwell.

In this proof Boltzman derived a partial differential equation for a distribution function d(x,t) with respect to time, the distribution function representing the number of molecules per unit volume with kinetic energies at time t lying within an interval of x to (x + dx). He showed that Maxwell's function is stationary and makes ∂d(x,t)/∂t vanish. The next phase called for the introduction of an auxiliary function called H, defined:

H = ∫_0^∞ d(x,t) {ln [d(x,t)/√x] - 1} dx,   (2.1.4)

which he proved can only decrease with time due to the symmetry characteristics of the collision process and the possibility of inverse collisions. It was then shown that the Maxwell distribution function minimizes the H function, proving that regardless of the initial distribution of d(x,t) the final or equilibrium state is realized in the Maxwell distribution. Even more important than this fact, Boltzman pointed out, was that the quantity H was proportional (with a negative proportionality constant) to the entropy of the gas.

Needless to say, this result caused a considerable degree of interest and before long criticisms of his approach arose. It was Boltzman's response (9) to one of his critics that resulted in the formulation of the second thermodynamic law as an expression of the laws of probability. He showed that the entropy of a state is reflected by its probability and an increase in entropy merely reflects a shift from less to more probable states. He employed a discrete model to illustrate this probabilistic nature of entropy. It was hypothesized that a collection of N particles possessed energies which were integral multiples, j, of a basic energy unit, ε.
The number of particles having the j × ε energy is denoted by n_j, such that the sum over j of the n_j equals N. For a complete assessment of this system of particles, a listing of all the individual molecular energies would be required. To attain this assessment a permutability measure, P, the number of different arrangements (microstates) for a given distribution, was constructed by the equation:

P = N! / (n_0! n_1! ... n_j!)   (2.1.5)

Boltzman then reasoned that the most probable distribution was the one where P is maximized. To find the maximum for P he first used Stirling's approximation for factorials (10) and proceeded to deduce the following equality:

ln P = -Σ_j n_j ln n_j + constant.   (2.1.6)

Recall equation (2.1.4) and recognize the similarities between n_j and d(x,t) and that the negative of the H function is entropy. The beauty of Boltzman's proof becomes readily apparent: he has, first, found the distribution which maximizes P; second, shown the relationship between P and entropy; and third, knows the n_j's will have a Maxwell distribution when the entropy of the system is maximum. The classic formulation Boltzman gave for statistical entropy, S_m, of the macrosystem in terms of its microstate distribution is:

S_m = k_b ln P   (2.1.7)

where k_b is known as Boltzman's constant, which is determined by dividing the gas constant by Avogadro's number. The notion of probability is ascertained from P, the permutability factor. Consider the logarithmic expansion of equation (2.1.5) and apply Stirling's approximation; the result has the form:

ln P ≈ N(ln N - 1) - Σ_j n_j (ln n_j - 1)
     = N ln N - Σ_j n_j ln n_j
     = -N Σ_j (n_j/N) ln (n_j/N).   (2.1.8)

The quantity n_j/N is identical to the probability of the jth microstate, which shall be denoted as P_j. Substituting equation (2.1.8) into equation (2.1.7) for the macrosystem entropy:

S_m = -k_b N Σ_j P_j ln P_j.   (2.1.9)

The behavior of this function is identical to that in equation (2.1.3), and it generates the most probable distributions when P is maximized.

Because one is greatly limited in his knowledge of the microsystem structure, another successful approach which overcame such limitations appeared at the beginning of the twentieth century. It was developed by Gibbs (11). The Gibbsian approach was proposed to show how microscopic "behavior" determined the total thermodynamic picture. Primarily, this was attained by employing an abstraction which Gibbs called an ensemble. In essence, this was a statistical-mechanical theory which could be generalized as a statistical theory of systems of differential equations (12).

An ensemble can be defined as a collection of a large number of identical replicas of the representative system. These replicas are all independently performing the identical irreversible process. The main assumption of ensemble theory is that the instantaneous macroscopic state is related to the average of the replicas' states taken over all the replicas. The ensemble's macroscopic conditions can dictate the probability distributions for each replica by affecting energy and motion. Then, for different macroscopic conditions, different ensemble types can be identified. The following are the three most common ensembles employed (13):

Microcanonical ensemble: A statistical ensemble of closed, energetically isolated systems in a constant volume; a replica here can be thought of as being enclosed in an adiabatic shell where neither exchange of energy nor of particles is allowed.
The rigidity of the constraints implies that a simple probability function holds for this type of ensemble. The microcanonical probability function, P_mc, is constant and of the form

P_mc = 1/P.   (2.1.10)

The Maxwell-Boltzman distribution can be used to calculate the probability function of a microcanonical ensemble.

Canonical ensemble: A statistical ensemble in thermal contact with a thermostat; here the replica is permitted to exchange energy with another system whose energy is so large by comparison that its state remains unchanged. The canonical probability function, P_c, is therefore a function of the energy E_j of each replica and its form is exponential:

P_c = A exp(-B'E_j).   (2.1.11)

A is a constant fixed by normalization; B' is the inverse of the product of the Boltzman constant and the absolute temperature. The term exp(-B'E_j) is called the Boltzman factor.

Grand canonical ensemble: A statistical ensemble which can exchange both energy and particles with its surroundings; such an ensemble can be conceived of as a box in contact with a thermostat and possessing permeable walls. The grand canonical probability function, P_gc, is based both on considerations of energy and particles:

P_gc = A exp(-B'E_j - B' Σ_j n_j μ_k)   (2.1.12)

where μ_k is the chemical potential of the jth particle type.

After deciding what ensemble to employ, the remaining problem in the Gibbs approach is that of computing what is known as the partition function. The partition function is probably the most important concept in statistical mechanics today, from which important thermodynamic variables (including entropy) can be estimated. The partition function is a very simple mathematical form that depicts the distribution or partitioning of the system among the various energy levels or quantum states. To calculate, sum the Boltzman factors for all the different states (14):

Z = Σ_j exp(-B'E_j).   (2.1.13)

The partition function has an important statistical relationship to the probability of the system:

P_j = n_j/N = exp(-B'E_j)/Z   (2.1.14)

which gives the function immense utility in thermodynamic calculations.

An expression for entropy in terms of the partition function can be readily deduced. Recall that the classical definition for entropy is the reversible differential energy change divided by the absolute temperature. As the energy changes in the process, so will the partition function with respect to B' and E_j (15):

d ln Z = -E dB' - (B'/N) Σ_j n_j dE_j   (2.1.15)

(ln Z is preferred to Z because of its additive properties). By performing a Legendre transformation on equation (2.1.15) and collecting terms, we cause the differential energy or heat change of the system to become:

dE = T dS = d(ln Z + B'E)/B'   (2.1.16)

or, alternatively, the entropy is:

dS = k_b d(ln Z) + d(E/T)   (2.1.17)

S = k_b ln Z + E/T.   (2.1.18)

Equation (2.1.17) can be converted to the Boltzman equation for entropy after we recognize two relationships:

E = Σ_j P_j E_j   (2.1.19)

-B'E_j = ln(P_j Z).   (2.1.20)

Equation (2.1.19) states that the macroscopic energy of the system is the expected value determined from the energies of each microstate, and equation (2.1.20) is the logarithmic form of equation (2.1.14). Substituting into equation (2.1.18) we have:

S = k_b ln Z + k_b Σ_j P_j (B'E_j)
  = k_b ln Z - k_b Σ_j P_j ln P_j - k_b Σ_j P_j ln Z
  = -k_b Σ_j P_j ln P_j.   (2.1.21)

The above equation is identical to equation (2.1.9), the Boltzman entropy, divided by N.
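The chain from equation (2.1.13) to equation (2.1.21) can be checked numerically. The sketch below is a minimal illustration (mine, not from the original), assuming an arbitrary three-level energy spectrum and units in which k_b = 1; it computes the partition function, the canonical probabilities, and then the entropy both as k_b ln Z + E/T and as -k_b Σ_j P_j ln P_j, and the two agree as equation (2.1.21) requires.

```python
import math

# Minimal numerical check of equations (2.1.13)-(2.1.21).
# Assumptions: an arbitrary three-level energy spectrum and units with k_b = 1.
E_levels = [0.0, 1.0, 2.0]    # replica energies E_j (assumed values)
T = 1.5                       # absolute temperature (assumed value)
kb = 1.0                      # Boltzman's constant in these units
B = 1.0 / (kb * T)            # B' = 1 / (k_b T)

Z = sum(math.exp(-B * E) for E in E_levels)            # equation (2.1.13)
P = [math.exp(-B * E) / Z for E in E_levels]           # equation (2.1.14)
E_mean = sum(p * E for p, E in zip(P, E_levels))       # equation (2.1.19)

S_partition = kb * math.log(Z) + E_mean / T            # equation (2.1.18)
S_boltzman = -kb * sum(p * math.log(p) for p in P)     # equation (2.1.21)

print(S_partition, S_boltzman)   # both print the same value (about 0.965 here)
```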
In spite of the obvious similarities between the Gibbsian approach and that of Boltzman, there are also pertinent differences. Boltzman's entropy is a measure which does not consider interparticle forces, and thus neglects the effects of potential energy and the effect of interparticle forces on pressure; Gibbs' entropy takes into account all the energy and total pressure (16). The question then arises, what is the true thermodynamic entropy of a system?

The current thought on this question tends to the viewpoint that true thermodynamic entropy is difficult to define because the partitioning or experimental conditions of the system depend upon the human element (17). This "anthropomorphic" aspect of entropy imparts a considerable degree of arbitrariness, making a definition of true thermodynamic entropy essentially impossible. However, it should be remembered that irrespective of the manner in which the system's partitioning is accomplished, the partition-dependent behavior is not arbitrary but follows a course dictated by the Second Law. Consequently, when studying the entropic behavior of a system the most difficult problem is to state what questions we want to resolve and to formulate entropic measures which allow their resolution.

2.2 Entropy and Information Theory

An important aspect of the entropy concept not usually accounted for in traditional thermodynamic approaches to entropy is its information attribute. The first to recognize the relationship between entropy and information was Szilard (18) in 1929, who related the usage of information to the production of entropy. This was approximately twenty years before the development of information theory by Shannon (19) in 1948, and its value has only recently been recognized. Shannon's contribution to science was significant, since for years investigators had tried to formulate a useful measure of information for communication engineering (20). Several names stand out in the early years: Hartley (21), with his theory on information transmission, using the logarithm of the number of symbols as an informational measure, and Gabor (22), working on time-frequency uncertainty and the logon concept. However, it was Shannon who clarified the confused situation with his theory.

Like Hartley, Shannon used a logarithmic measure of the number of symbols as his measure of information. Formally, Shannon's information measure for the jth symbol, h_j, is defined as the negative logarithm of the symbol's probability:

h_j = -ln P_j.   (2.2.1)

Shannon recognized his measure determined not the quantity of information the symbol conveyed, but rather the uncertainty of information. The Shannon measure can also be applied to messages; this is accomplished by determining the expected value of all symbols in the message:

H = -Σ_j P_j ln P_j.   (2.2.2)

The H-function has an absolute maximum when the probabilities for all the symbols are equal (23):

H_max = -ln P_j.   (2.2.3)

Using equations (2.2.2) and (2.2.3) we can explore more deeply the meaning of Shannon's information measure. The notion of uncertainty is easily deduced, for as H increases, the symbols become more equiprobable and the ability to distinguish their information content decreases, or alternatively, our uncertainty about them increases. Another way to regard uncertainty is as a measure of the number of degrees of freedom. The lower the uncertainty the fewer degrees of freedom, or the greater the uncertainty the more degrees of freedom the system has.
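A small numerical sketch of equations (2.2.1) through (2.2.3) follows; it is mine rather than the author's, and the four symbol probabilities are invented solely for illustration.

```python
import math

# Illustration of equations (2.2.1)-(2.2.3) for an invented four-symbol source.
P = [0.50, 0.25, 0.15, 0.10]   # assumed symbol probabilities

h = [-math.log(p) for p in P]             # equation (2.2.1), per-symbol measure
H = sum(p * hj for p, hj in zip(P, h))    # equation (2.2.2), expected value
H_max = math.log(len(P))                  # equation (2.2.3): equiprobable symbols
                                          # give H_max = -ln(1/n) = ln n

print(f"H = {H:.3f} nats, H_max = {H_max:.3f} nats")
# H falls below H_max because the symbols are not equiprobable;
# dividing both by ln 2 expresses the same quantities in bits.
```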
The degrees of freedom concept also denotes the idea of capacity, and usually "information capacity" is what Shannon's measure of information is called. Information capacity means "message variety" to the communications engineer, a useful parameter in designing communication systems.

Given the measure of information in equation (2.2.2), it is now much easier to see the relationship of entropy, in equations (2.1.9) and (2.1.21), to information. The progression from information theory to thermodynamics is accomplished by first relating information theory to statistical mechanics, via the partition function, and thus to the various statistical-mechanical analogs of the laws of thermodynamics. This was first done by Jaynes (24, 25) and his conclusions have since been verified by others (26, 27, 28). The agreement between information theory and thermodynamic entropy has become such a well-accepted relationship that many current textbooks on thermodynamics and statistical mechanics rely heavily on the concept of information when presenting these subjects (29, 30, 31). Perhaps the following quote of J. von Neumann can best summarize the roles of information and thermodynamic entropy (32):

The thermodynamical methods of measuring entropy were known in the mid-nineteenth century. Already in the early work on statistical physics it was observed that entropy was closely connected with the idea of information: Boltzman found entropy proportional to the logarithm of the number of alternatives which are possible for a physical system after all the information that one possesses about the system macroscopically (that is, on the directly, humanly observable scale) has been recorded. In other words, it (entropy) is proportional to the amount of missing information.

Current investigations employing information theory methodology in the study of the thermodynamics of open systems (33) and chemical systems (34) have begun to assess the use of information-entropy in engineering disciplines other than telecommunications-related areas.

Both theoretical developments and wide application of the information theory methodology came about during the 1950s (35) and brought the subject out of its infancy. The 3rd London Symposium on Information Theory is an excellent illustration of the diversity of subjects examined using the new concepts (36). The topics for papers given ranged from studies on computers, electronics, statistics and mathematics to those on animal welfare, political theory, psychology, anthropology, economics, and anatomy, which are subjects divorced from traditional communication applications. This period also saw the rise of coding theory (37, 38), an important step increasing the utility of information theory for solving the problems of a rapidly expanding telecommunications industry.

The sixties provided less innovation than the fifties, and more time for reflection on the theory's fundamental concepts and tenets (39). Emphasis was placed on coding theory, in particular on decoding algorithms (40), and these studies have remained the basic thrust of its mathematical development (41). In addition to direct telecommunication applications, information theory began to be firmly established in several other disciplines. Psychology proved a fertile area for the application of information theory, where the individual was regarded as an information processing device (42). The economic adaptations of the theory also found acceptance.
The theory of information-entropy was used as a mathematical tool for analyzing industrial concentration (43), the future prices of stocks (44), and as an accounting methodology (45). Utilization of this concept has become so widespread that an entropy law for general economic processes has been proposed (46). Interesting, but perhaps esoteric, is the application of an information theory observer-critic to evaluation of the philosophical arguments of Aristotle (47), and the application of the theory to detective work in criminology (48). However, an important new application of information theory was in the field of biology, where many interesting new facts were discovered about the information transcription of the genetic code onto the protein space (49, 50).

To understand the information concepts which will be employed in this work, it is necessary to discuss additional terminology and theorems pertinent to the study. The best place to begin is with the extension of Shannon's information capacity, equation (2.2.2), to nth order Markov processes or multivariate analysis. The idea behind this extension is that the possession of an information measure based on the joint probability of B and E could be used to construct the entropy of the word BE, that of B and E and T, the word BET, and so forth. Shannon called such information calculations n-gram entropies (51). Used in this manner the joint probability has a multivariate connotation, but this would change if we considered the same message or word coming along a telegraph wire. Such a message is different because it is dynamic and can be thought of as a stochastic process. Because outcomes (words) in such a process would be discrete, the formulation of H in a Markovian sense is possible (19).

Let us first look at the bivariate case involving a pair of events and define the joint probability of the ith and jth elements:

P(ij) = P(i) P(j/i)   (2.2.4)

where P(ij) is the joint probability, P(i) the probability of i, and P(j/i) the conditional probability of j given i. The bivariate information capacity, H_2, equals the following (52):

H_2 = -Σ_i Σ_j P(ij) ln P(ij) = -Σ_i Σ_j P(i)P(j/i) ln [P(i)P(j/i)]   (2.2.5)

and if P(i) and P(j) are independent, H_2 becomes

H_2^0 = -Σ_i Σ_j P(i)P(j) ln [P(i)P(j)] = -Σ_i P(i) ln P(i) - Σ_j P(j) ln P(j).   (2.2.6)

Note that by subtracting H_2 from H_2^0 we get a new measure, the divergence from independence, or a measure of 1st-order Markov memory. Extending this to multivariate analysis, or an nth-order Markov chain, we have

H_n = -Σ_i Σ_j ... Σ_n P(i)P(j/i) ... P(n/n-1) ln [P(i)P(j/i) ... P(n/n-1)]   (2.2.7)

and if all the probability sets are independent:

H_n^0 = -Σ_i P(i) ln P(i) - Σ_j P(j) ln P(j) - ... - Σ_n P(n) ln P(n).   (2.2.8)

The difference between H_n^0 and H_n would be the same as an nth-order measure of Markov memory.

The information-entropy measure expresses a capacity for freedom or variety and is sometimes referred to as "potential" information. Often, it is desirable to speak in terms of the order or "stored" information, I_s, of a system. Logically, the order or stored information is the difference between the maximum disorder or entropy of the system and its actual entropy:

I_s = H_max - H.   (2.2.9)

Obviously, the notion of stored information can be extended to any of the multivariate cases, giving a measure of order for each case of dependence. The idea of "stored information" in information theory terminology is usually conveyed by the concept of redundancy, R.
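To make equations (2.2.4) through (2.2.6) and (2.2.9) concrete, the following sketch (an invented 2 x 2 joint distribution, not data from this study) computes the bivariate capacity H_2, the independent capacity H_2^0 built from the marginals, and their difference, the divergence from independence; a divergence of zero would mean the joint distribution factors into its marginals.

```python
import math

# Invented 2x2 joint distribution P(ij) for two binary positions; the values
# are assumptions chosen only to illustrate equations (2.2.4)-(2.2.6).
P_joint = [[0.40, 0.10],
           [0.15, 0.35]]

P_i = [sum(row) for row in P_joint]          # marginal P(i)
P_j = [sum(col) for col in zip(*P_joint)]    # marginal P(j)

H2 = -sum(p * math.log(p) for row in P_joint for p in row)   # equation (2.2.5)
H2_indep = -sum(p * math.log(p) for p in P_i + P_j)          # equation (2.2.6)

divergence = H2_indep - H2   # divergence from independence (1st-order memory)
print(f"H2 = {H2:.3f}, H2_0 = {H2_indep:.3f}, divergence = {divergence:.3f}")
```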
Redundancy, as explained by Weaver (53), reflects that fraction of the message which is ordered or repetitive:

R = I_s/H_max = 1 - H/H_max.   (2.2.10)

The error in communicating a message is reduced as the redundancy increases. Thus, the redundancy concept provides an indication of the reliability of the system.

Noise and channel capacity are two other common terms. The channel capacity, C, is the maximum rate at which information or entropy flows through a channel (the physical medium of information transmission). Channel capacity has the units symbols per unit time (54):

C = ln(n)/t,   (2.2.11)

where ln n is sometimes referred to as the entropy of one channel. Noise is one of the limiting factors in the efficiency of transmission through a channel; it is the error between the message sent and that received. The processes of encoding and decoding the message are the pertinent factors determining the noise of a channel. One of Shannon's more important theorems describes the potential for reducing the noise of a channel (19):

Let a discrete channel have the capacity C and a discrete source the entropy per second H. If H ≤ C there exists a coding system such that the output of the source can be transmitted over the channel with an arbitrarily small frequency of errors (or an arbitrarily small equivocation). If H > C it is possible to encode the source so that the equivocation is less than H - C + ε where ε is arbitrarily small. There is no method of encoding which gives an equivocation less than H - C.

The theorem implies that we cannot eliminate noise in the channel but we can "learn to live with it."

A question generally addressed in information theory concerns the capacity of a set of symbols (words) in a code (language) to transfer information based on their respective durations (lengths). The duration or length of a word is related to its cost, because the more symbols needed to convey a bit of information, the greater the cost and the less efficient such a transfer becomes. One might suspect then that the frequency of a word in a language would be related to its cost, the longer words being less frequent and the shorter ones more frequent. An analysis by Mandelbrot (55) showed the most efficient coding scheme per unit cost satisfied the aforementioned suspicion. However, the result achieved by Mandelbrot by an information theory approach had already been realized years before by Zipf (56, 57) through empirical analyses of language. The principle or law discovered by Zipf and rationalized by Mandelbrot can be expressed either by:

a) a relation between the frequency of occurrence of an event and the number of different events occurring with that frequency, or

b) a relation between the frequency of occurrence of an event and its rank when the events are ordered with respect to frequency of occurrence.

Zipf's law is a power law which can be linearized by employing its logarithmic form. Mathematically, the law states that the logarithm of a word's rank in a language or code equals the negative of the logarithm of its frequency plus a constant equal to the logarithm of the frequency of the word with rank one.

Mandelbrot's method for deducing Zipf's law began by calculating the best probability rule for words of a varying cost, c_j. The idea for obtaining this rule was very similar to Boltzman's for finding the maximum of ln P or the maximizing of the number of microstates, yielding the most probable distribution.
P was interpreted here as the total number of different messages from N words:

ln P = -N Σ_j P_j ln P_j   (2.2.12)

and was maximized by the method of Lagrange multipliers with the constraints that the sum of the P_j's equals one and that the total cost of the message of N words, C_N, equals the sum of the costs for each jth symbol:

C_N = Σ_j n_j c_j = N Σ_j P_j c_j.   (2.2.13)

The result was that the probability of the jth word must be related exponentially to the jth cost to get the maximum number of different words for a given cost:

P_j = exp(-b c_j)   (2.2.14)

where b is a constant. The next step was to find the number of words N(c_j) of cost c_j. This problem boiled down to a finite difference problem:

N(c_j) = Σ_k n_k N(c_j - c_k)   (2.2.15)

which states that any one of the n_k words can be used to construct a message of cost (c_j - c_k) to build a word of total cost c_j. For a stable code, the finite difference solution of equation (2.2.15) is:

N(c_j) = A_1 exp(b'c_j) + A_2   (2.2.16)

where A_1, A_2, and b' are constants, and where the inverse of b' times the logarithm of A_1 equals the negative of the cost of the initial condition, c_1.

By solving both equations (2.2.14) and (2.2.16) for c_j, assuming A_2 equals zero and sorting by increasing cost or rank, the order of N(c_j) as determined by its cost is:

(b/b') ln(Rank[N(c_j)]) = (b/b') ln A_1 - ln P_j   (2.2.17)

(b/b') ln(Rank[N(c_j)]) = -b c_1 - ln P_j = ln P_1 - ln P_j   (2.2.18)

which is a generalized form of Zipf's law and dictates an ordering for the number of words of cost c_j in a message of N words, which maximizes total message variety (efficiency).

Recently, Kozachkov (58) showed that a ratio of b/b' equal to one maximized the total message variety. He stated that the overall number of different messages is:

P = Σ_j P_j   (2.2.19)

and by determining P_j from equation (2.2.18) and substituting he got:

P = P_1 Σ_(j=1)^J N(c_j)^(-(b/b')).   (2.2.20)

Then he calculated the sum over J different words for three cases: b/b' greater than one, equal to one, and less than one:

P = [P_1/(1 - (b/b'))] J^(1-(b/b')),   b/b' < 1
  = P_1 ln J,                          b/b' = 1
  = P_1/[(b/b') - 1],                  b/b' > 1   (2.2.21)

The maximum P clearly is for b/b' equal to one as J approaches infinity, and we can thus write Zipf's law:

ln P_j = ln P_1 - ln(Rank_j).   (2.2.22)

Kozachkov said that when a hierarchical structure or system followed Zipf's law with a slope of minus one, its organizability was maximum because the information capacity at every level in the hierarchy was maximized relative to the overall information capacity. This principle was recently used as an indicator of national city-size integration (59).

The far-reaching expressions of Zipf's law in nature are very striking phenomena. Zipf himself expanded the scope of his studies beyond that of word distribution to distribution of interval frequency in classical music, city-size frequency, product-manufacturing frequency, retail store frequency, job-occupation frequency, newspaper circulation frequency, charge account frequency in department stores, frequency of telephone messages through interchanges, and other examples which obeyed his rule (57).
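A brief sketch of equation (2.2.22) in action follows; the rank-frequency data are synthetic (generated to follow a 1/rank rule, not drawn from Zipf's tabulations), and the recovered log-log slope of minus one is the value Kozachkov associates with maximum organizability.

```python
import math

# Synthetic rank-frequency series obeying Zipf's law (equation 2.2.22):
# P_j proportional to 1/rank_j.  P1 is an assumed rank-one frequency.
ranks = list(range(1, 51))
P1 = 0.20
P = [P1 / r for r in ranks]          # so ln P_j = ln P_1 - ln rank_j

# Least-squares slope of ln P_j against ln rank_j
x = [math.log(r) for r in ranks]
y = [math.log(p) for p in P]
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)
num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
den = sum((xi - x_bar) ** 2 for xi in x)

print(f"log-log slope = {num / den:.3f}")   # -1.000 for an exactly Zipfian series
```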
The biologist's law of allometric growth, and allometry in general (62), are good examples of the organization of parts within the organism work- ing through self-regulation for the benefit of the whole. The dose-response curves (63) demonstrate the inherent ability within an organism to interpret an incoming coded chemical message and elicit a response dependent upon the message magnitude (chemical frequency). Zipf's law is a powerful tool for assessing the organizability of various physical, biological or social systems. The importance of information-entropy criteria in statistical inference was demonstrated by Tribus (64). He utilized the principle logically set forth by Jaynes (65) and Cox (66) that the maximum entropy formal- ism elicits the minimally prejudiced probability distri- bution. This is true since the maximum entropy state is achieved when the hypothesis favors no inference more than another. Tribus then proposed a method for calcu— lating the probability distribution which would maximize the entropy function under the constraints imposed by the standard statistical distributions such as uniform, 32 exponential, gaussian, gamma, beta, etc. He then stipu- lated that the particular probability density distribu- tion which maximizes the entropy function yields the minimally prejudiced probabilities. On this basis he formulated an entropy inference test: AS = N1) P(i) 1n P(i) + 2 P(J) ln P(J) i j - Z Z P(ij) 1n P(ij)] (2.2.23) 1 1 which, if one recalls equations (2.2.5) and (2.2.6), is the difference between the maximum bivariate entropy, Hg, and the distribution bivariate entropy, H2, quantity multiplied by N. Thus, this entropy inference test measures the independence between two distributions where independence between the two sets implies that there is no information difference between them. If one of the distributions is prejudiced, then equation (2.2.22) mea- sures the bias in our observed distribution. Tribus showed that if the information difference between the two distributions was small, the quanity -AS/N equals the Chi-square statistic. The importance of Tribus' work and other informa- tion theory applications is that they demonstrate the role of information in our world. The individuals men- tioned in this section have uncovered that role: for 33 example, Jaynes in statistical mechanics, Mandelbrot in Zipf's law, and Tribus in statistical inference. In sub— sequent chapters, I will expand the perspective on information into the area of nutrition to more fully examine the utilization of information by the living system. 2.3 Entropy, Information and Biology The validity of applying information theory methodology to other fields, aside from those directly related to communication systems, was a point of some debate after Shannon's inception of the approach (67). Some felt that information theory was an approach which could only be justifiably used in telecommunication applications. However, such a viewpoint defines the theory's value only in terms of its immediate success. One cannot deny communication engineers their well— deserved credit for laying the sound mathematical founda— tions of the methodology, but such limited range for application unduly restricts information theory develop— ment to the parochial aspirations of this discipline. Unlike the communication engineers, physical, biological, and societal scientists could not as easily put their encoders and decoders on the table and confirm the the- oretical predictions of the theory. 
Because the biophysicist could not take the cell's DNA and examine 34 its sequence nor the psychologist take a brain apart to examine its neuron network, these early ventures of information theory often resulted in frustration. This frustration led to a decline in interest that has slowly begun to be reversed with the determination of biochemical and biophysical structures and functions of living systems. The biological field is intimately associated with the field of thermodynamics, and the important roles played by entropy, information, order and control are recognized throughout the discipline (68, 69, 70, 71). The state of knowledge in several biological fields has reached the level where application of the information- entrOpy concept can have and has had a significant impact on the interpretation of experimental studies. A significant area of information theory applica- tion in biology is in the field of neurophysiology. The theoretical basis for applying information theory to the nervous system was put forth by von Neumann in the mid- fifties (72). His approach drew analogies between an information synthesis of a reliable system's unreliable components and neurological systems. Since this work, experimental studies have tended to support the utility of information analysis in neural systems. These appli- cations have studied the encoding mechanism (73), trans- mission and multiplexing of neural information (74, 75), 35 and the general physiology of nervous cells (76) and systems (77). Another prime arena of biological information theory applications is genetics, and I would like to pre- sent one of the major works on the subject as an illus- tration of the potential the entropic approach possesses. At the beginning of the last decade, a new understanding arose concerning the relationship between the structure of DNA and that of proteins (78). The genetic code, as this relationship is commonly called, a universal biological language for the storage and transmission of cellular information essential to metabolism and behav— ior, was deciphered (79). Code words are formed by a sequence of three nucleic acids which link together forming a strand of DNA. Each sequence translates into an amino acid which during transcription of the code is catalytically joined to others to form proteins. An impressive study on the genetic code has been done by Gatlin (80). Her work examined the univariate and bivariate information capacities,eqpations(2.2.2), H, and (2.2.5), H2; the stored information, equation (2.2.9), I and redundancy, equation (2.2.10), R, of S; nucleic acid sequences in DNA. The results of her examination have led to new insights not only on the grammar aspect of the genetic code but also on the evo- lutionary process in nature. 36 Recall that Is, stored information, is our entropic measure for divergence from equiprobability or equality in the univariate case or divergence from inde- pendence in the bivarate case. Let us denote ISl as univariate divergence and I$2 as bivariate divergence. Both these measures have a special meaning with regard to language. 151’ the divergence from symbol equality, is the main determinant in a language of its message or word variety. The frequency of a language's symbols thus dictates the available vocabulary. ISZ’ the diver- gence from independence, is the lSt—order measure of a language's grammar. The degree of dependence imparts redundancy or fidelity into the message by dictating the symbols' relationships to each other (i.e., grammar). 
Gatlin calculated I81 and IS2 based on the per— centage of guanine and cytosine, for phage, virus, bacteria, plant, insect, and vertebrate organisms. Of course, univariate information-entropy studies had been done on DNA before, but not bivariate. The union of univariate and bivariate information—entropy concepts pre- sented in the format of a language study is a significant advance in genetics, for it gives a new methodology for interpreting the biochemistry of cellular information storage which would not be possible without information theory. 37 How much further this new methodology might be applied was also demonstrated by Gatlin in this study in her application of the results to the evolutionary process. [Such an application of information theory had been previously suggested (81).] She was able because of her application of information-entropy measures to hypothesize a new theory about the course of evolution in nature. The name given to this theory was Shannonian evolution. Using the measures Isl and 152, Gatlin made several important observations. One was that the nonver— tebrate species showed considerable variation in the frequency of guanine plus cytosine, whereas the verte- brates displayed little variation. Also, it was noted that the invertebrates had a significantly greater vari- ation in both I81 and I82 than the vertebrates. After extensive analysis of the situation her conclusion was that the evolution from invertebrate to vertebrate life forms has proceeded in two phases. The first was where Is decreased; the second, where 152 increased. 2 Such an evolutionary course is significant in the context of information theory because the Is2 governs the relative degrees of variety and fidelity in a code. If we assume that the overall redundancy or order in the code increases consistently as one advances up the evo- lutionary hierarchy, then, the first phase of decreasing IS can be regarded as the period when ISl is increasing. 2 38 151 affects the message variety of the code and conse- quently this first evolutionary phase can be looked on as a search for an optimal alphabet for message variety. The second phase of increasing 182 depicts an increase in the grammar or dependence of symbols in the code. This evolutionary process is similar to a child learning how to read and write. First, the alphabet is taught with simple spelling (a form of grammar). After the alphabet has been mastered (end of evolutionary phase one), the development of reading and writing skills involves learning increasingly more grammar, I52, as advanced spelling, syntax, etc. (evolutionary phase two). Gatlin's explanation of evolution through the information theory allows us to understand Darwin's theory in the context of modern evidence in genetics. Other biological information systems amenable to information—entropy analysis exist. If the genetic system is a living information storage system, the meta- bolic or nutrition system can be regarded as the main- tenance system of an organism. Such a nutritional system operates by encoding information (food) it receives, and channeling it for use in growth. Although simply stated, the control of the various metabolic information processes is very complex and the information messages are not as neatly visualized here as in the genetic system. 
However, by employing an information—entropy analysis of nutritional 39 systems, an integrated format uniting information stor- age by the genes and information transmission in the metabolic control process can begin to be understood. 2.4 Entropy, Information and Nutrition The aim of this thesis is to employ the concept of entropy in the context of information theory and to use this measure, information-entropy, to analyze various metabolic processes. The first question to be addressed is whether Second Law principles hold for living systems. Schroedinger (82) stated that Second Law behavior does hold for these systems, but qualified this statement by claiming life to be a steady—state process which preserves the entropy of the individual organism at the expense of increasing the entropy of its environment. Essentially, the organism maintains itself by consuming low entropy substances and transforming them into higher entropy com— pounds. Given that such thermodynamic behavior is valid for biosystems, can we extrapolate from what could be called an energy—entropy basis to an information—entropy basis? A direct translation of the phenomenological laws of the thermodynamic branch to the information branch was thought by von Neumann to be consistent and a logical step. He expressed this opinion in the following manner (83): _._ 40 There is reason to believe that the general degeneration laws, which hold when entropy is used as a measure of the hierarchic position of energy, have valid analogs when entropy is used as a measure of information. On this basis one may suspect the existence of connections between thermodynamics and new extensions of logics. Fong (84) also perceives congruency between the thermo- dynamic and information behaviors of biologically active systems, finding the laws for the creation and dissipa— tion of information consistent with those of Prigogine's (85) thermodynamic theory of structure, stability and fluctuations. A biological dissipative process can best be understood in processes such as catabolism. By studying the complete catabolism of alanine in the mammalian body, it can be demonstrated through the use of an information- entropy approach that a dissipation of information occurs during catabolism as one would expect a dissipation or increase in entrOpy to happen. The formula for the complete respiration (catabolism) of alanine is (86): 4 CH3CH(NH2)COOH + 12 O + 10 C0 + 10 H O 2 2 2 + 2 CO(NH2)2 . (2.4.1) Using equation (2.2.2) to calculate H, our information- entrOpy measure, the molecular information in the above process will be dissipated if H(products) is greater than H(reactants). The molecular probability on each side of l1 41 the equation can be equated, using the respective mole fractions of each compound. The following information— entropy measures can be calculated from equation (2.4.1): H(reactants) = -0.25 k 1n 0.25 — 0.75 k ln 0.75 = 0.811 bits/molecule, (2.4.2) and H(products) = -0.454 k 1n 0.454 — 0.454 k In 0.454 - 0.092 k 1n 0.092 = 1.351 bits/molecule (2.4.3) where k is a constant factor for converting from natural logarithms to base 2 logarithms. Results in information theory are typically expressed as the number if binary digits, termed "bits." The amount of information present or stored in the system is given by equation (2.2.9), and Hmax can be calculated when the five different molecular species in the above catabolic reaction are equiprobable (i.e., equal mole fractions). 
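The arithmetic of equations (2.4.2) and (2.4.3) can be checked directly from the mole fractions of the two sides of the reaction. The short script below is only a verification sketch; it assumes the balanced respiration equation of (2.4.1), 4 CH3CH(NH2)COOH + 12 O2 -> 10 CO2 + 10 H2O + 2 CO(NH2)2, evaluates H = -sum p_i log2 p_i for each side, and computes H_max = log2 5 for the five molecular species together with the stored-information values that follow in the next equations. Small differences from the figures quoted in the text are rounding.

    import math

    def H(mole_fractions):
        # Information-entropy (bits/molecule) of a set of mole fractions.
        return -sum(p * math.log2(p) for p in mole_fractions if p > 0)

    def mole_fractions(mixture):
        total = sum(mixture.values())
        return [n / total for n in mixture.values()]

    # Balanced respiration of alanine: 4 alanine + 12 O2 -> 10 CO2 + 10 H2O + 2 urea
    reactants = {"alanine": 4, "O2": 12}
    products = {"CO2": 10, "H2O": 10, "urea": 2}

    H_react = H(mole_fractions(reactants))   # about 0.811 bits/molecule
    H_prod = H(mole_fractions(products))     # about 1.35 bits/molecule
    H_max = math.log2(5)                     # five species, about 2.322 bits

    Is_react = H_max - H_react               # stored information, reactant side
    Is_prod = H_max - H_prod
    print(f"H(reactants) = {H_react:.3f}, H(products) = {H_prod:.3f}")
    print(f"Is(reactants) = {Is_react:.3f}, Is(products) = {Is_prod:.3f}")
    print(f"information dissipated = {Is_react - Is_prod:.3f} bits/molecule")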
The informations of the respective systems are: I (reactants) = H - H(reactants) = 2.322 — 0.811 s max = 1.511 bits/molecule, (2.4.4) and Is(products) = Hmax — H(products) = 2.322 — 1.351 = .971 bits/molecule. (2.4.5) l1 42 The information change in going from reactants to products is IS(reactants) minus Is(products), which equals 0.54 bits/molecule and indicates that information is being dissipated. Supposing that the basic dissipative laws of thermodynamics can be transferred to those of information, a detailed inspection of other information—entropy proc- esses is necessary. Coding of information is an attribute commonly seen in biological systems. An example of a biological code is the genetic code. However, information coding is not as readily seen in nutrition as in genetics, the reason being that the nutritional control system is a complex information hierarchy. There is not a universal code such as that in genetics which translates through a well-defined gene- protein channel, but rather, many different codes and channels make up the nutritional system. Perhaps the most obvious code of relevance to nutritional studies is that which can be termed the "protein code" (87). The term "protein code" arises from the relationship between the genetic code and proteins. If one assumes that DNA is a coded sequence of nucleic acids, it logically fol- lows that the amino acid sequence generated by the gene itself is coded in some manner. Therefore, the protein code can be envisioned as providing the basis for the creation of chemical informational molecules whose lr 43 function is dependent upon protein structure (e.g., enzymes). Whether a protein can be formed from a nucleic acid message depends upon the availability of amino acids within the organism. Those amino acids which cannot be synthesized by the cell must come from the diet, and the proportion of such essential amino acids in the diet, as well as quantities of nonessential amino acids, will affect synthesis of cellular protein. This interrelation- ship provides a basis for justifying application of information-entropy to nutrition and for relating it to previous work on information theory in biology. Of course,theprotein system is but one system for metabolic transformations. However, the above dis— cussion allows one to visualize how entropy and informa- tion concepts can be incorporated into an analysis of nutritional systems. In the following chapters, I will‘ present more detailed discussions and analyses of the relationship of amino acid nutrition to overall protein metabolism, and of some aspects of carbohydrate bio— chemistry, with the aid of information theory. Through these studies I hope to elucidate the importance of the concept of information—entropy in nutritional systems. CHAPTER III INFORMATION AND THE QUALITY OF PROTEINS Nutritionists have developed many concepts over the years, both qualitative and quantitative, to express the value or worth of a food protein. These standards have been formulated by using combinations of chemical and biological analyses. This chapter will present an information—entropy approach for ascertaining a nutritive protein code and then show the relationships among the various indicators of protein quality and the information- entropy of the diet. 3.1 Indices of Protein Quality Before applying the principles of information the- ory to protein quality, a description of the most common indices of protein quality is in order. 
The following concepts will be discussed: biological value, digestibility, net protein value, net protein utilization, protein efficiency ratio, chemical score, and essential amino acid index. Although it is not a complete or exhaustive list of concepts for assessing protein quality, this set of indices is representative of the more commonly used parameters.

The "biological value" (BV) is one useful estimate of protein quality, involving a nitrogen balance approach. This measure was defined in 1909 by Thomas (88) as the fraction of absorbed nitrogen retained within the organism for maintenance and growth. It may be expressed mathematically as (89):

    BV = [N_I - (F - F_k) - (U - U_k)] / [N_I - (F - F_k)] ,    (3.1.1)

where N_I is the nitrogen intake, F is fecal nitrogen, F_k is endogenous fecal nitrogen, U is urinary nitrogen, and U_k is endogenous urinary nitrogen. The endogenous fecal and urinary nitrogen can be determined by feeding a nitrogen-free diet or one containing a small amount of high quality protein (90). Estimates of biological value which do not correct for endogenous nitrogen losses are termed "apparent biological values."

"Digestibility" (D) is probably one of the oldest qualitative indicators used in nutritional studies. It denotes the fraction of the food nitrogen which is absorbed, and is calculated (89):

    D = [N_I - (F - F_k)] / N_I .    (3.1.2)

Like biological value, digestibility is a nitrogen balance index, and is classified as "true" or "apparent" depending upon the inclusion or exclusion of endogenous nitrogen losses in its determination.

Another nitrogen balance method, which is a combination of the previous two indices, was put forth by Bender and Miller in 1953 (91). Essentially, this new index, originally called "net protein value" (NPV), is equivalent to biological value times digestibility, and expresses the amount of nitrogen retained divided by the total nitrogen intake:

    NPV = [N_I - (F - F_k) - (U - U_k)] / N_I .    (3.1.3)

Several years later, Bender and Miller proposed a shortened method for determining what is effectively the same quantity as net protein value. The difference was that this new index was approached through a carcass analysis method rather than by nitrogen balance; the name coined for this parameter was "net protein utilization" (NPU) (92). Net protein utilization was defined:

    NPU = [B - (B_k - N_k)] / N_I ,    (3.1.4)

where B is the body nitrogen of the animals fed the test protein, and B_k and N_k are the body nitrogen and nitrogen intake of the group fed the nonprotein diet.

All the aforementioned tests for protein quality are usually noted as being conducted either under "standardized" or under "operative" conditions (89). Standardized measurements are those made under maintenance conditions, whereas operative ones are those made under other defined conditions. Sometimes a suffix indicating the percentage of protein in the diet is used (e.g., NPU_10). These are important constraints to recall when interpreting these tests, for the quality of the protein depends greatly on the purpose for which it is required (e.g., growth or maintenance).

Typically, net protein utilization and net protein value are taken as measurements of the same quantity, and are not distinguished in the literature (93). However, for my purposes a distinction will be made.
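Because equations (3.1.1)-(3.1.4) are simple ratios of nitrogen-balance quantities, a small worked example may make their relationships concrete. The numbers below are purely illustrative placeholders, not values from any study cited here, and the function names are my own; the sketch simply evaluates the indices for one hypothetical feeding trial and confirms that NPV = D x BV.

    def biological_value(NI, F, Fk, U, Uk):
        # Fraction of absorbed nitrogen that is retained, eq. (3.1.1).
        absorbed = NI - (F - Fk)
        return (absorbed - (U - Uk)) / absorbed

    def digestibility(NI, F, Fk):
        # Fraction of intake nitrogen that is absorbed, eq. (3.1.2).
        return (NI - (F - Fk)) / NI

    def net_protein_value(NI, F, Fk, U, Uk):
        # Fraction of intake nitrogen that is retained, eq. (3.1.3).
        return (NI - (F - Fk) - (U - Uk)) / NI

    def net_protein_utilization(B, Bk, Nk, NI):
        # Carcass-analysis analogue of NPV, eq. (3.1.4).
        return (B - (Bk - Nk)) / NI

    # Hypothetical nitrogen balance (grams N over the balance period).
    NI, F, Fk, U, Uk = 10.0, 2.5, 0.5, 3.0, 1.0
    BV = biological_value(NI, F, Fk, U, Uk)
    D = digestibility(NI, F, Fk)
    NPV = net_protein_value(NI, F, Fk, U, Uk)
    print(f"BV = {BV:.3f}, D = {D:.3f}, NPV = {NPV:.3f}, D*BV = {D * BV:.3f}")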
The term "net protein value" will refer to those measures calculated by multiplying digestibility times biological value (i.e., those done by a balance method), while "net protein utilization" will denote measures determind by a carcass analysis method. The final biological estimate of protein quality to be presented here is the "protein efficiency ratio" (PER). It is a parameter proposed in 1919 by Osborne et_§l. (94), and defined as the "gain in body weight divided by weight of protein consumed." This is a very popular index, primarily because of the ease with which 48 it can be determined. Previously, the determination of this ratio was conducted at several levels of nitrogen intake. In this manner, an optimal level of protein intake could be identified for a maximum gain in weight. Generally, good correspondence between gain in body weight and gain in body protein exists, however PER is not always an acceptable evaluation procedure (95), and is not as reliable an indicator of protein quality as the other indices. Although these indices are determined experi— mentally in quite different ways, biological value, net protein value, and net protein utilization are based upon the same criterion, retained nitrogen, and should measure essentially the same thing (96). The protein efficiency ratio would be an approximate measure of this criterion, also. Table 3.1.1 gives a listing of 21 dif— ferent food proteins taken from an FAO compilation of biological data (97), for which the scores of the above four indices were found (protein level of diets from which scores were derived was 10%). The table includes both the actual score and a ranking of the proteins based upon their respective scores. Table 3.1.2 lists linear correlation coefficients (98) which were calcu- lated using paired scores of the indices, and all regressions are significant at P < 0.01. This cross- correlation analysis shows a very good relationship 49 firw-‘r ”‘7" TABLE 3.1.l.——Listings of Biological Value, Net Protein Utilization, Net Protein Value, and Protein Efficiency Ratio Scores with their Respective Rankings (Source: FAO). Biological Net Protein Net Protein Eifiziigcy Value Utilization Value Ratio Score Rank Score Rank Score Rank Score Rank Egg, whole 93.7 1 93.5 1 90.9 1 3.92 1 Wheat, whole 64.7 14 40.3 20 58.8 11 1.53 18.5 Maize 59.4 18 51.1 17 54.5 15 1.18 20 Casein 79.7 4 72.1 4 76.8 3 2.86 5 Fish, meal 81.1 3 65.8 7 76.2 4 3.42 3 Soybean 72.8 8 61.4 8 65.9 6 I 2.32 7 Groundnut 54.5 20 42.7 19 47.2 19 1.65 15 Sunflower 69.6 10 58.1 9 57.0 12 2.10 13 Lentil 44.6 21 29.7 21 37.9 21 0.93 21 Rice, polished 64.0 15 57.2 10 62.7 9 2.18 11 Wheat, germ 73.6 7 67.0 5 64.9 7 2.53 6 Cottonseed 67.2 11 52.7 14 53.5 16 2.25 9 Linseed 70.8 9 55.6 11.5 59.8 10 2.11 12 Sesame 62.0 17 53.4 13 50.7 17 1.77 14 Milk, whole 84.5 2 81.6 2 81.9 2 3.09 4 Beef, muscle 74.3 6 66.9 6 73.8 5 2.30 8 Lima beans 66.5 12.5 51.5 16 47.9 18 1.53 18.5 Peas 63.7 16 46.7 18 55.8 14 1.57 16.5 Pigeon peas 57.1 19 52.1 15 44.4 20 1.57 16.5 Brewer's yeast 66.5 12.5 55.6 11.5 56.1 13 2.24 10 Fish, muscle 76.0 5 79.5 3 64.6 8 3.55 2 ._ --.;J 50 TABLE 3.1.2.--Matrix of Correlation Coefficients Relat- ing Biological Value, Net Protein Utiliza— tion, Net Protein Value, and Protein Efficiency Ratio Scores. BV NPV NPU NPV 0.942 1.000 0.881 NPU 0.924 0.881 1.000 PER 0.906 0.859 0.918 TABLE 3.1.3.—-Matrix of Correlation Coefficients Relat- ing Biological Value, Net Protein Utiliza- tion, Net Protein Value, and Protein Efficiency Ratio Ranks (Spearman's ps). 
            BV       NPV      NPU
    NPV   0.911    1.000    0.865
    NPU   0.901    0.865    1.000
    PER   0.893    0.839    0.928

between raw scores of the various indices. On the other hand, Table 3.1.3 has Spearman's rho (p_s) (99) correlation coefficients for the ratings (P < 0.01). Basically, p_s can be regarded as a regression coefficient for two ranked variables. Spearman's index additionally tells us that not only are the scores highly correlated, but so are the relative rankings we derive from them. This lends support to the contention that all these tests are a measure of the same variable.

The previously mentioned measures of protein quality all involve a combination of biological and chemical methods of analysis. Nutritionists have arduously sought a simple chemical procedure for determination of protein quality which would be as accurate as the experimental measure of biological value. The incentive is that biological tests are expensive and time-consuming.

One of the first attempts to minimize biological testing utilized chemical score (CS). Employing the principle of the limiting essential amino acid as a justification for their method, Mitchell and Block (100) calculated a mathematical regression between their chemical scores and the biological values of 23 different proteins. Chemical score is represented by the minimum ratio of amino acids in a test protein to those in a standard protein; it was first advanced as:

    CS = min_j (ax_j / as_j) ,    (3.1.5)

where min(ax_j) is the content of the jth essential amino acid which is most limiting in a test protein and as_j is the content of the jth essential amino acid in the standard protein (usually egg), expressed in units of milligrams of amino acid per gram protein-N or grams per 16 grams protein-N. However, the chemical score method considers only one amino acid in the protein, so a scoring method was sought that would include more amino acids of the food protein.

A variation of the above approach, which incorporates all the essential amino acids of a protein into an index of quality, was conceived by Oser (101) and has come to be known as the Essential Amino Acid Index (EAAI). This index has been used to estimate the biological value of a food protein relative to that of a standard protein. The EAAI is basically a determination of the geometric mean of a set of ratios. These ratios are the same as those used in the chemical score procedure (i.e., the ratio of essential amino acid concentration in an arbitrary food protein x relative to its concentration in a standard protein). The standard protein used is egg, and Oser assigns to egg the biological value of 100. The following mathematical formula is used to calculate the index:

    EAAI = [(ax_1/as_1)(ax_2/as_2) ... (ax_10/as_10)]^(1/10) x 100 .    (3.1.7)

Oser had two additional rules he employed in determining the EAAI: (1) the maximum value of the ratio for any essential amino acid will never exceed 1.0, and (2) the minimum ratio will never be less than 0.01. The first of these assumptions is based on the view that any quantity of an amino acid in excess of that possessed by the standard is not needed by the organism for growth. Thus, the surplus may be disregarded. In the second assumption, the justification is that there always exist certain endogenous sources of protein (e.g., intestinal enzymes, tissue degradation) which will supply some of any essential amino acid.
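Equation (3.1.5) and the EAAI of equation (3.1.7), including Oser's two capping rules, translate directly into a few lines of code. The amino acid figures below are invented placeholders rather than values from the FAO tables used later, and only five of the ten essential amino acids are listed for brevity; the sketch shows only the mechanics of the two scoring procedures.

    import math

    def chemical_score(test, standard):
        # Minimum essential-amino-acid ratio, eq. (3.1.5).
        return min(test[aa] / standard[aa] for aa in standard)

    def eaai(test, standard, lo=0.01, hi=1.0):
        # Geometric mean of the amino acid ratios x 100, eq. (3.1.7),
        # with Oser's rules: cap each ratio at 1.0 and floor it at 0.01.
        ratios = [min(hi, max(lo, test[aa] / standard[aa])) for aa in standard]
        return 100.0 * math.exp(sum(map(math.log, ratios)) / len(ratios))

    # Illustrative essential amino acid contents (arbitrary but consistent units);
    # the full index would include all ten essential amino acids.
    egg = {"lys": 7.0, "thr": 5.0, "met_cys": 5.5, "trp": 1.7, "leu": 8.8}
    test_protein = {"lys": 4.2, "thr": 4.5, "met_cys": 6.0, "trp": 1.2, "leu": 8.0}

    print(f"chemical score = {100 * chemical_score(test_protein, egg):.1f}")
    print(f"EAAI           = {eaai(test_protein, egg):.1f}")

In this fabricated case lysine is the limiting amino acid, so the chemical score is set by the lysine channel alone, while the EAAI averages over every essential amino acid ratio.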
The limitations of amino acid scoring methods involve the factors which influence the digestibility of the protein. For example, when the essential amino acids areIuM:completely available for metabolism, due to malab- sorption or some other factor, the tendency of any index based solely upon their content is to overestimate the real biological value. Consequently, these indices are most accurate for those proteins which are more completely digestible and lose accuracy as the protein digestibility decreases. A closing observation concerns the accuracy of these indices in assessing protein quality. In a recent appraisal of protein quality, Bender considered it suf- ficient to classify proteins as poor, moderate and good, 54 and thought that Oser's EAAI or Mitchell's chemical score were as good indices as any to use in assessing the multi- plicity of protein needs (102). Thus, the salient point seems to be that the amino acid content of the food pro— tein may be one of the best indices of protein quality available, and most certainly is a major determining factor of the biological methods presented in this section. 3.2 An Information—Entropy Model of Protein Quality The purpose of this section is to view protein-, or more specifically, amino acid—nutrition with an information—entropy lens. The study of protein nutri— tion is extremely complex, but the main criteria for nutritional well-being are dictated by the organism's growth and maintenance requirements. As noted previously, various indices judge protein quality by estimating that fraction of the protein which is retained for growth or maintenance, depending upon experimental constraints. Consequently, any model which addresses the problem of protein quality must consider the disparity between pro- tein needs during growth and those of maintenance, and the ways in which such variation affects protein quality. I wish to begin my development of an information- entropy model in a discussion of the information flow from the genetic space to the protein space. My objective 55 is not to extensively discuss the transcription of DNA into protein, but rather to outline the processes involved. The following sequence depicts the transcrip- tion of the information from DNA (103): (l) transcription from nuclear DNA template to messenger RNA (mRNA) (2) mRNA to cytoplasm (3) attachement of 30S and 50S ribosomal RNA (rRNA) subunits (called ribosomes) to mRNA (4) activation of amino acid by reaction with transfer RNA (tRNA) forming aminoacyl-tRNA (5) aminoacyl—tRNA is directed to appro- priate codon on mRNA (6) synthesis of peptide bond by rRNA— mRNA—aminoacyl-tRNA-protein complex (7) termination of peptide chain by chain- terminating codon. This sequence is graphically illustrated in Figure 3.2.1. The above relates how information coded on the DNA template passes to a protein through RNA intermedi— aries. As was previously mentioned in sections 2.3 and 2.4, the information present in the DNA can be defined by an information—entropy measure, Hn’ based on the DNA's nucleic acid frequency and sequence. This idea can be further refined to stipulate that each protein-generating DNA template has its own individual information-entropy content. If these templates can each be assumed to be structurally unique, then their information-entropy 56 $ .mflmmsucmm cflmuoum CH coaumEH0mcH owumcmw mo GOHMQHHO 1mcmua mo coaumucmmmnmmm Hmownm6H011.H.~.m mnsmflm mmum dzmH EmmHQOpmo coaumfluomcmuu ou dza Alt] ® 9 ANZME . 
.«ZG manhnfimfi Homaosq ummHosz 57 contents would also be unique. Given the information transfer of protein synthesis, the information-entropy level of the DNA template determines the amino acid com- position of the protein. From the information theory viewpoint, such a nuclear DNA template can be regarded as an information—entropy message source, and the gen- eration of protein structure can be accomplished by a biological communication system through which the message is sent. Five basic components make up this information communication system: (1) a message for transmission, (2) an encoding device, (3) a channel, (4) a decoding device, and (5) a message transmitted or received. The message sent over this system must be derived from a DNA template in the genetic structure of the organism. The encoding device should consist of enzymes such as RNA polymerase, which encode the DNA message into messenger RNA. I define the messenger RNA as the channel for this system, for it carries the nucleic acid message from the nucleus to the cytoplasm, where synthesis or decoding occurs. Decoding devices for the channel are the ribo— somal RNA subunits, aminoacyl—tRNA, and various protein initiating factors. The decoding process seems to be the most complex step of all, involving many phases; it could be viewed as a highly redundant process to ensure accu— rate decoding of the message. The message is received 58 by the growing peptide chain, which upon completion results in the protein—coded molecular form of the nucleic acid message. The relationships described among these biological phenomena and the information communication system are illustrated in Figure 3.2.2. One aspect of an information system thus far ignored in the discussion of a gene-protein communication system is the notion of noise. Because of the high speci— ficity of the encoding and decoding devices (e.g., enzymes, tRNA, etc.), virtually error—free translation from the DNA to protein occurs (103). Such noiseless transmission in the system allows the information-entropy content of the DNA template to be equivalent to the information—entropy content.ofthe protein, for with error-free transmission in communication systems, the entropies of input and output are identical (104). The above conservation principle between source and receiver information-entropies is very important, because we can now use it to explain the changes in protein pattern requirements during growth. A rapid rate of accretion of protein begins at birth and decreases as the animal grows older (105). This rapid protein reten— tion results both in a higher amino acid intake require— ment, and in alteration of the pattern of amino acid requirements between young organisms and adults. The higher amino acid intake requirement is easily .GOHmeHomcH oaumcmw mo GOHmmHEmcmHB How Emummm cofluMOAGDEEOO pmwflammpH11.N.m.m musmfim ©0>Hoowm . prooma Hmccmzo Hopoocm monsom mmmmmmz cflmuonm mmmnmEMHom Umsmflcflm mpflcsnsm 42m mzm HMEOmOQfiH ‘ZMH — 59 . dzmp .9 .. Hmmcwmmma mzm Hmwmcmuu mgmHmfimu 1H>omocflfim £29 60 rationalized by observation of the increased demand for these compounds in protein synthesis. However, the alterations in duapattern requirements of amino acids during growth are not so readily explained. During growth, a phasic development of various organs (e.g., liver, brain, skeletal muscle, etc.) occurs, and each organ's growth has its own particular amino acid pattern (105). 
The development of these various organ systems must be caused by the expression of a particular genetic region on the chromosomes of the organism. If the assumption, previously put forth, that each such region possesses a unique information-entropy content, is valid, then our "conservation of information-entropy" principle dictates that for the gene-protein channel the information-entropy content of the corresponding protein must also be unique. Recall that uniqueness can be defined as a particular set or pattern of symbol fre— quencies in information theory, meaning that each unique protein has a distinct pattern of amino acid frequencies. Thus, the differences in the pattern requirements between young and adult organisms can be understood to result from the differences in the information-entropy levels of genetic expression taking place during the early and later stages of development. The "conservation of information-entropy" princi- ple explains how protein metabolism and amino acid 61 requirements can be affected by genetic expression. Now II wish to relate the gene-protein communication systems model depicted in Figure 3.2.2 to amino acid consumption by the organism. The description of protein synthesis illustrates the importance of amino acids within the cytoplasm (called the "amino acid pool"). The presence of many amino acids in the pool results from membrane transportcflfthe plasma amino acids into the cell (106). The primary source of plasma amino acids is the diet (107). Consequently, the decoding of the genetic message is greatly dependent upon dietary amino acids, and par- ticularly upon the essential amino acids, because they affect the amino acid pool composition and size. Let us momentarily review the physiological phe- nomena involved in transmitting dietary amino acids to the tissue cell. Most dietary amino acids are found in the polymer form. The peptide bonds linking the amino acids must first be broken to free them for absorption and utilization by the organism. This bond-breaking proc- ess is termed "hydrolysis," and begins in the stomach and is completed in the small intestine (108). The freed amino acids, coming from endogenous as well as exogenous sources, and also occasionally some small peptides, are taken through the intestinal wall by several transport systems, with each system transporting only a certain 62 set of amino acids. After absorption the amino acids enter the portal blood. The first major organ the dietary amino acids encounter is the liver, which plays a central role in allocating these compounds to the other body tissues (109). Approximately 70% to 100% of the absorbed amino acids are taken up by the liver. Four possible fates await the acids absorbed here: (1) catabolism, (2) syn- thesis into plasma proteins, (3) release as free amino acids, and (4) storage as a part of the liver's labile amino acid reserve. The last three play important roles in supplying remaining body tissues with amino acids, although complete mechanisms for accomplishing this, par- ticularly for the plasma proteins, are not fully under- stood. However, free amino acids in the plasma are transported into the cell and affecttfluaintracellular amino acid pool. Thus, the role of the liver is that of a regulator which temporarily stores the dietary amino acids until they are required by other organs. 
The overall effect of the above is that the capacity of a cell to carry on protein synthesis is directly dependent upon the ability of the diet to fulfill the anabolic requirements of the organism. From the information perspective, the above proc- esses can be explained in terms of a communication system. First, we have an information source, the food protein, 63 which contains a coded message of amino acids. The message possesses a certain information capacity deter- mined in this case by its word frequency (or amino acid frequency). The message is then encoded (i.e., protein is digested and transported) for transmission through the communiation channel (i.e., circulatory system), then decoded (i.e., transported into cell) and directed to some final destination (i.e., cellular amino acid pool). With the inclusion of the "noise" concept in the system (i.e., those inefficiencies such as the incomplete diges- tion of the dietary protein or poor absorption of amino acids from the gut), an essentially complete communica- tion system has been described for the transmission of the nutritive information in a food protein to the recep- tor amino acid pools in the organism. Figure 3.2.3 is a representation of this system relating the aspects of nutrition and information theory. Out of this basic concept of a communication sys- tem, will be developed some nutritional information- entropy measures. The first question concerns the nature of our measures of information. Informational units are amino acids, of which there are approximately twenty. In the cellular pool, each of these amino acids indepen- dently maintains a particular level or concentration as a function of various metabolic outlets (106). 64 .mmasomaoz HMGOHquHomcH baud 09.23 mo 9 scammwamcmua on» How Emumwm cod» 1moa§§=00 pmuflammlel . m . m . m mHsmHm a . 0 mwgmw AW fig.‘® 9 «HAND w Akmumwm gonnadoufio Emummm unommamua UHU< 2.... \. o 0 ......m... HMHDHHOO Emumwm p m “Hommcmna baud ocflfifi HoneymmucH Hm>flmomm Hopooma Hmccmno Hmpoocm ouunom omMmmmz 65 The combinatorial approach proposed by Kolmogorov (110), a maximum entropy formalization of Shannon's method, is very appropriate for this system. In this approach we assume that a variable, x, containing N ele- ments, has an information-entropy content, H(x), equal to k ln N. Note this formulation for the information- entropy of variable x is exactly the same as Shannon's maximum entropy expression, equation (2.2.3), where all the Pj's are equivalent and equal to l/N. Kolmogorov expanded this approach for a set of variables, x 1' ..., xj, ..., xn, each capable of taking on values, N1, ..., Nj’ ..., Nn' such that the information-entropy of this set is defined: H(xl,...,xj,...xn) H(xl) + --- + H(xj) + ---H(xn) k ln N + --- + k ln N. + ... l J + k 1n Nn . (3.2.1) Thus, for my nutritive amino acid communication system, the variables are the twenty different amino acids present in the cellular pool, where each possesses a particular magnitude dependent upon the overall meta- bolic state of the cell. A similar situation holds for the dietary amino acids, but the magnitude of each of these variables is characteristic of the protein fed. Now, let's formalize this communication system into the 66 above combinatorial format. 
Starting with our information-entropy source, we stipulate that a given amino acid variable, aa_j, has a magnitude, c^x ax_j, for food protein x, where c^x is the concentration of protein in the diet (based on molar units of protein), and ax_j is the amino acid content of the protein (based on molar units of aa_j per molar unit of protein). The resultant magnitude of the amino acid variable is calculated on the basis of moles of aa_j input to the system. Therefore, the total information-entropy of the jth variable is defined:

    H_x^t(aa_j) = k ln(c^x ax_j) ,    (3.2.2)

and a characteristic protein information-entropy, which is based upon the typical or characteristic amino acid spectrum of the dietary protein, as:

    H_x(aa_j) = k ln(ax_j) .    (3.2.3)

For the message source, the channel transmission rate, I(aa_j), can be defined:

    I(aa_j) = (k / dt) ln(c^x ax_j) .    (3.2.4)

The amino acid variable at the receiving end of the communication system has magnitude am_j for the jth amino acid variable. This quantity is also mole-based. If we were to consider a value for am_j based on a single cell, it would be that amount of amino acid necessary to provide for all the cellular metabolic needs. However, the total nutrition of the organism must be considered, and not only one cell. Hence, the value of am_j will be that amount necessary to fulfill the metabolic requirements of all the cells in the organism. The information-entropy for the receiving end is:

    H_r(aa_j) = k ln(am_j) .    (3.2.5)

Having defined the information-entropies of the source and receiver, the next aspect of the amino acid nutritive communication system to be viewed is the notion of "channel capacity." The theorem on page 26 states that to minimize transmission errors (noise), the channel capacity must be greater than or equal to that of the source. Ability to minimize transmission error is highly desirable for any organism, and nature would probably not design a system which violated conditions allowing error minimization. To minimize transmission error, the relationship which holds for the entropy of the source and the channel capacity must also hold between the entropy of encoder and decoder. That is, the entropy of the decoding device must be greater than or equal to that of the encoding device. Also, the information-entropy capacity of the source cannot be greater than that of the encoder. If channel capacity is much greater than that of the decoder, then decoder entropy becomes the determinant of low-error transmission. Assuming this is the case, the decoder entropy determines the channel capacity for error-free transmission. As was previously mentioned, the decoder entropy's jth variable is determined by the metabolic requirements of the receiving end of the system for that amino acid. Consequently, the jth channel capacity, C_j, can be taken to be the information-entropy of the respective receiver:

    C_j = (1 / dt) H_r(aa_j) = (k / dt) ln(am_j) .    (3.2.6)

Both the channel transmission rates and channel capacities are measures of amino acid frequencies of our system. These amino acid frequencies are very much like word frequencies in any spoken language. That amino acids are words and not symbols can be argued along genetic lines. From the genetic code we know that amino acids are coded by a combination of nucleic acid symbols, as we similarly use letters to code words. In this vein, amino acids can be regarded as words.
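The quantities in equations (3.2.2)-(3.2.6) can be illustrated with a single amino acid channel. All the numbers below (protein concentration, amino acid content, metabolic requirement, channel time) are arbitrary placeholders chosen only to show how the transmission rate I(aa_j) compares with the channel capacity C_j; k is taken as 1/ln 2 so that the results come out in bits, as in the text.

    import math

    k = 1 / math.log(2)      # converts natural logarithms to bits
    dt = 1.0                 # channel time interval (arbitrary units)

    # Placeholder values for one essential amino acid channel j.
    c_x = 0.10               # dietary protein concentration (molar basis)
    ax_j = 350.0             # moles of aa_j per mole of protein x
    am_j = 50.0              # moles of aa_j required by the receiver over dt

    H_total = k * math.log(c_x * ax_j)          # eq. (3.2.2), total source entropy
    I_aaj = (k / dt) * math.log(c_x * ax_j)     # eq. (3.2.4), transmission rate
    C_j = (k / dt) * math.log(am_j)             # eq. (3.2.6), channel capacity

    print(f"H_x^t(aa_j) = {H_total:.2f} bits")
    print(f"I(aa_j)     = {I_aaj:.2f} bits per unit time")
    print(f"C_j         = {C_j:.2f} bits per unit time")
    # The difference C_j - I(aa_j) reappears below as the rank-order term.
    print(f"C_j - I(aa_j) = {C_j - I_aaj:.2f}")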
The importance of interpreting amino acids as words is that this interpretation offers the opportunity to utilize Zipf's law and obtain a cost-frequency ranking or ordering scheme. The main proposition of Zipf's law is that the total cost for a message of N words, C_N, equals:

    C_N = sum_i n_i c_i .    (3.2.7)

For our system we sum over only one i value because we are transmitting through a one-word channel. Therefore, equation (3.2.7) reduces to:

    C_N = n_j c_j .    (3.2.8)

Before proceeding further, let us define more fully what is meant by "cost." The total cost, C_N, can be visualized as the total time available for the jth amino acid to do an appointed number of metabolic tasks for the best growth or maintenance performance by the organism. The number of metabolic tasks which can be performed during C_N is n_j. Associated with each n_j is an individual cost, c_j, which is the average performance time for each task in the time period C_N. Now, if there is some ideal number of tasks the jth amino acid must perform during C_N and n_j is less than the ideal, an inefficiency arises. The degree to which the system is inefficient is measured by c_j, the average task performance time.

Given the above definition, let us proceed to deduce Zipf's law as Mandelbrot did. Fortunately, the combinatoric information-entropy formulation is much easier to handle than a probabilistic form. Thus, the finite difference approach is not needed to obtain the cost-frequency relationship. We begin by setting C_N equal to dt, the time duration of our channel. By definition, the ideal number of metabolic tasks required of aa_j over the time dt is am_j, and the number which can be supplied by food protein x is c^x ax_j. Substituting these values into equation (3.2.8) we get:

    dt = (am_j)(c_mj) = (c^x ax_j)(c_xj) ,    (3.2.9)

which becomes:

    c_xj / c_mj = am_j / (c^x ax_j) ,    (3.2.10)

where c_xj and c_mj are the individual metabolic performance costs of source aa_j and receiver aa_j, respectively. The left side of equation (3.2.10) is a ranking or ordering function of the ability to perform the ideal number of metabolic tasks relative to the number permitted by dietary limitation. The ranking is absolute if c_xj is restricted only to integer multiples of c_mj, and the readily identifiable integer sequence results. The ranking is relative if c_xj is any other real multiple of c_mj.

The above formulation is exactly equivalent to Zipf's law if we use an absolute ranking condition, ar_xj c_mj = c_xj. Making the above substitution, and taking the logarithms of both sides, we obtain:

    ln[(ar_xj)(c_mj) / c_mj] = ln(ar_xj) = ln(am_j) - ln(c^x ax_j) ,    (3.2.11)

where ar_xj is the absolute rank-order of protein x, am_j is the maximum aa_j frequency of the system, and c^x ax_j is the aa_j frequency of protein x. Equation (3.2.11) is an exact formulation of Zipf's law. However, because there is not sufficient evidence to assume that c_xj is always some integer multiple of c_mj, the relative ranking form of Zipf's law will be used:

    ln(c_xj / c_mj) = ln(r_xj) = ln(am_j) - ln(c^x ax_j) ,    (3.2.12)

where r_xj is the relative rank-order. Terms in the above formula should look familiar, because after multiplying by a constant factor, k/dt, the formula becomes the difference between the channel capacity and the transmission rate. This allows one to see the continuity between the operation of the nutritive amino acid communication system and a metabolic cost-frequency ranking dictated by Zipf's law.
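Equations (3.2.9)-(3.2.12) amount to reading the rank-order of a protein, for one amino acid channel, off the ratio of the metabolic requirement to the dietary supply. A minimal sketch with invented numbers: for each of several hypothetical source proteins the relative rank r_xj = am_j / (c^x ax_j) is computed, and the identity ln r_xj = ln am_j - ln(c^x ax_j) is checked.

    import math

    am_j = 50.0                      # ideal number of metabolic tasks over dt
    # Hypothetical dietary supplies c^x * ax_j of amino acid j for three proteins.
    supply = {"protein A": 50.0, "protein B": 35.0, "protein C": 20.0}

    for name, cx_axj in supply.items():
        r_xj = am_j / cx_axj                          # eq. (3.2.10): c_xj / c_mj
        log_form = math.log(am_j) - math.log(cx_axj)  # right side of eq. (3.2.12)
        assert abs(math.log(r_xj) - log_form) < 1e-12
        print(f"{name}: r_xj = {r_xj:.2f}, ln r_xj = {log_form:.3f}")

A protein that exactly meets the requirement ranks first (r = 1); poorer sources take higher ranks, and by the relations derived next their per-channel BV and NPV fall off as 1/r.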
The objective now is to utilize this concept of relative rank-order to relate our information-entropy analysis to some of the indices of protein quality discussed in the previous section. We start by taking the antilogarithm of equation (3.2.12) and shifting around some terms:

    c^x ax_j = am_j / r_xj .    (3.2.13)

Recall that c^x ax_j is the greatest possible quantity of the jth essential amino acid from food protein x that can be utilized by the organism. For the time being, let us assume that all of the jth amino acid content of protein x is utilized for protein synthesis by the organism, and is thereby retained. If we divide equation (3.2.13) by a factor which we assume constant over dt, the total protein nitrogen intake, N_I, expressions for both the biological value of the jth amino acid of x, BV(x_j), and the net protein value, NPV(x_j) (or, similarly, net protein utilization), result:

    BV(x_j) = NPV(x_j) = (c^x ax_j) / N_I = (am_j / N_I)(1 / r_xj) .    (3.2.14)

This states that biological value and net protein value for a single amino acid are inversely proportional to the relative rank-orders.

By taking c^x ax_j and c^s as_j for two different food proteins, x and s, and dividing them, we get:

    r_sj / r_xj = (c^x ax_j) / (c^s as_j) .    (3.2.15)

The following identity is seen from equation (3.2.14):

    r_sj / r_xj = BV(x_j) / BV(s_j) = NPV(x_j) / NPV(s_j) ,    (3.2.16)

and equation (3.2.15) becomes:

    BV(x_j) / BV(s_j) = NPV(x_j) / NPV(s_j) = (c^x ax_j) / (c^s as_j) .    (3.2.17)

The above equation is a combined form of Zipf's law for two variables of different rank. The important point here is that under our constraint of complete absorption and retention of the aa_j, two protein quality indices have been generated from our information-entropy rule, Zipf's law.

Before removing some constraints from equation (3.2.17) and seeing what happens to BV and NPV, I wish to demonstrate an invariance of these indices when the protein concentrations of two proteins are identical. Let us assume c^s equal to c^x. The first thing which we notice is that:

    BV(x_j) / BV(s_j) = NPV(x_j) / NPV(s_j) = ax_j / as_j ,    (3.2.18)

or the relationship between the protein quality indices remains unchanged. This allows us to approximate the total source entropies by the characteristic information-entropy of the food protein inputted to the system.

The first constraint I will remove is that of 100% digestibility. Removing this constraint changes equation (3.2.18) to:

    [D(x_j) BV(x_j)] / [D(s_j) BV(s_j)] = NPV(x_j) / NPV(s_j) = ax_j / as_j ,    (3.2.19)

where D(s_j) is the digestibility of protein s and D(x_j) is that of protein x. Removing the digestibility condition causes no change in our NPV relationship, but does alter our BV, leading us to conclude that NPV is an index more amenable to information-entropy analysis than BV.

From the previous equation, the chemical score index is readily derived. First, we limit consideration of amino acids to those which are essential. Then, we make protein s, usually egg, our standard protein, against whose essential amino acid levels we will compare those of protein x. The minimum ax_j/as_j ratio will be equal to the chemical score (CS). By taking the logarithm of equation (3.2.19) the information-entropy explanation of chemical score is made obvious:

    k ln(CS) = min_j { k ln[NPV(x_j) / NPV(s_j)] } = min_j { k ln(ax_j / as_j) } .    (3.2.20)

The chemical score is a measure of the amino acid channel which transmits the least information.

The essential amino acid index also can be deduced from my information-entropy analysis.
Instead of searching through the essential amino acid channels for the one with the minimum unknown/standard ratio, we take the average of all the essential aa_j channels:

    k ln(EAAI) = (1/10) sum_{j in EAA} k ln[NPV(x_j) / NPV(s_j)]
               = (1/10) sum_{j in EAA} k ln(ax_j / as_j) .    (3.2.21)

The essential amino acid index is, thus, an average essential amino acid transmission rate through the system, and if we neglect Oser's condition for rejecting that fraction of ax_j greater than as_j, the EAAI is equivalent to the average entropy over the essential amino acid set (EAA) as defined by equation (3.2.1).

Now I will summarize the material presented thus far in this section. First, a foundation for a nutritive amino acid communication system was developed, by relating a schematic of an idealized communication system to protein metabolism. Next, the type of information flowing through the system (i.e., amino acids) was discussed. The concept of "channel" was defined separately for each amino acid, the basis of this approach being the highly specific decoding mechanisms of the cell, and a mathematical formulation of maximum and actual channel transmission was developed. The point was then made that the actual information transmission over a channel, after encoding and decoding had completely taken place for one message, reflected the amino acid frequency in the source protein message. It was then shown that the word frequency in the channel was associated with a cost in the same way that cost is reflected by word frequency in any other code. A Zipfian distribution between the word frequencies of the source protein and their order with respect to cost was shown to hold. The relationship between biological value, net protein value, and rank was deduced, and resulted in general log-linear relations among word frequency, biological value and net protein value. On the assignment of various (experimentally controlled) values to the terms of our Zipfian equation, the nutritional indices of biological value, net protein value, chemical score, and essential amino acid index were obtained with the proper constraints.

This section has proposed a theoretical model for a nutritional protein communication system with analysis by information theory. The results of the analysis can be shown to correspond to some experimental and empirical protein quality indices. Such an analysis of protein nutrition is significant because it illustrates that information-entropy measures of protein nutrition can be recognized as relevant indicators of quality. Information theory complements these chemical indices by providing them with a causal relation to nutritional protein evaluation. An analysis with experimental data of the concepts thus far presented follows.

3.3 Analysis of Information-Entropy Approach

An analysis of the information-entropy model will be undertaken in this section. Specifically, the model predicts correspondence between the amino acid content of proteins and experimental measures of their protein quality, namely, biological value and net protein value, and this will be investigated. This analysis will be accomplished using published data on the amino acid contents of proteins and on their respective biological data. Three information-entropy measures will be studied. Two utilize the notion of the average information-entropy of the essential amino acid set. Using equation (3.2.1) we define this variable for protein x as:

    H_x(EAA) = (1/10) sum_{j in EAA} k ln(ax_j) .    (3.3.1)
The characteristic protein form is used because the level of protein in the experimental diet was constant. In the previous section Oser's essential amino acid index was derived from this information-entropy measure and related to a log-average of the aa_j net protein values. I shall denote the log-average of the net protein values for protein x as NPV_x(EAA), and for protein s as NPV_s(EAA). Equation (3.2.21) becomes:

    NPV_x(EAA) - NPV_s(EAA) = H_x(EAA) - H_s(EAA) .    (3.3.2)

If the assumption is made that the NPV for each aa_j in the standard is equal to 1.00, NPV_s(EAA) equals zero and NPV_x(EAA) becomes:

    NPV_x(EAA) = H_x(EAA) - H_s(EAA) .    (3.3.3)

The above is a logarithmic form of the EAAI, discounting Oser's rules. The antilog form of equation (3.3.3) should be an approximation of the true experimental NPV. This antilog form will be defined as I_x(NPV) if Oser's conditions are not utilized, and as I_x*(NPV) with his conditions intact. Both are information-entropy indices. The other information-entropy measure is one found in equation (3.2.20), which generates the chemical score index. The antilog form for the left side of that equation will be denoted as I_x(CS).

To avoid the hazards involved with collecting values from widely scattered literature citations, I have primarily selected data from a single extensive investigation of the amino acid content of proteins and of their effect on protein quality. The work to which I refer is that of Bjorn O. Eggum (111) in studies carried out at the Institute of Animal Science, Department of Animal Physiology, Copenhagen. However, utilizing one source also has its risks, and to take into account various peculiarities or errors in Eggum's work another source on the amino acid content of proteins was employed. These data are presented in an FAO compilation of amino acid data, entitled The Amino Acid Content of Foods and Biological Data of Proteins (97).

Sixteen different protein diets were studied in Eggum's experiments. The essential amino acid contents of these test diets are given in Table 3.3.1 for Eggum's study, while Table 3.3.2 lists the amino acid contents for similar food proteins found in the FAO report. Unfortunately, Eggum's study did not include tryptophan values. It is readily seen by comparison that the other amino acid values in Tables 3.3.1 and 3.3.2 are very similar. Therefore, I am going to use the FAO tryptophan values with Eggum's other amino acid values in subsequent calculations of information-entropy indices. Some compromises had to be made: FAO "meat and bone meal" data was utilized in place of "meat and bone scraps"; "oatmeal" was substituted for "oats" and "dehulled oats" in the 50:50 mixed diets of dehulled oats with skim milk powder and soybean meal; whole meal rye for rye; soybean, groundnut, and sunflower seed for the respective meals; milk powder for skim milk powder; and a 60% milk powder + 40% oatmeal mixture for the pig prestarter.

TABLE 3.3.1.--Amino Acid Content (micromoles per gram N) of Food Proteins (Source: Eggum). [The numerical entries of this table are not legibly recoverable from the scan.]

TABLE 3.3.2.--Amino Acid Content (micromoles per gram N) of Food Proteins (Source: FAO). [The numerical entries of this table are not legibly recoverable from the scan.]

The results of Eggum's biological tests are listed in Table 3.3.3. Two of the protein quality indices discussed in section 3.1 were measured, biological value and net protein value. Two different test animals were used: rats, each initially weighing 75 grams, and baby pigs, about 16 days old. The rats were fed 150 mg N once daily. The balance period was 5 days, and the feeding regimen was initiated 4 days before. The baby pigs were fed 6 times daily, and the protein level in their diets was 3.84% N of dry matter. The pigs were conditioned for 6 days before the balance period, which was 4 days long.

TABLE 3.3.3.--Biological Values and Net Protein Values of Sixteen Test Diets of Rats and Baby Pigs (Source: Eggum). [The numerical entries of this table are not legibly recoverable from the scan.]

Table 3.3.4 lists the information model's three predicted values for the experimental net protein values, which reflect various constraints upon the model (results are given on a scale of 0 to 100). I_x(NPV) is the unconstrained EAAI, whereas I_x*(NPV) employs Oser's rules, and I_x(CS) takes account of the limiting amino acid constraint in chemical scoring. The subscripts e and f denote Eggum's and FAO data sources, respectively.

TABLE 3.3.4.--Information-Entropy Indices for Sixteen Different Food Proteins. [The numerical entries of this table are not legibly recoverable from the scan.]

A linear regression analysis (98) was done between the experimental biological values and net protein values of both rats and pigs and the three information-entropy indices. Table 3.3.5 lists the correlation coefficients (98) of this regression analysis. All correlations between the information-entropy indices and the biological evaluations are highly significant (P < 0.005 for all regressions). Because the digestibility of the protein affects the model's ability to predict this parameter, the correlation of the information-entropy indices is somewhat poorer for biological value than for net protein value. This was anticipated, however, and a reasonable correlation still exists between the model's predictions and biological value. The two information indices, I_x(NPV) and I_x*(NPV), in which were considered the total essential amino acid contents of the proteins, gave a more consistent interspecies correlation than did the chemical scoring estimate. This seems to suggest that a total entropy criterion based on all essential amino acid channels gives better results than relying on the entropy of a single channel.

TABLE 3.3.5.--Matrix of Correlation Coefficients for Information-Entropy Measures Versus Net Protein Values and Biological Values of Rats and Baby Pigs (based on amino acid content of dietary protein).

                   Net Protein    Biological     Net Protein    Biological
                   Value (Rats)   Value (Rats)   Value (Pigs)   Value (Pigs)
    I_x(NPV_e)        0.843          0.706          0.795          0.673
    I_x*(NPV_e)       0.898          0.794          0.817          0.729
    I_x(CS_e)         0.914          0.896          0.702          0.675
    I_x(NPV_f)        0.856          0.704          0.786          0.634
    I_x*(NPV_f)       0.880          0.748          0.806          0.677
    I_x(CS_f)         0.951          0.916          0.777          0.729

A ranking of the sixteen proteins based on their respective scores was done. Spearman's rank correlation coefficients (99) were then computed. The correlations among the rank-orderings, Table 3.3.6, as dictated by biological testing and information-entropy, were significant (P < 0.05 for all regressions). This analysis shows that not only does the information-entropy model generate significantly correlated scoring, but the rankings of these scores are also consistent.

TABLE 3.3.6.--Matrix of Spearman's Rank Correlation Coefficients for Ranks of Information-Entropy Measures Versus Net Protein Values and Biological Values of Rats and Baby Pigs (based on amino acid content of dietary protein).

                   Net Protein    Biological     Net Protein    Biological
                   Value (Rats)   Value (Rats)   Value (Pigs)   Value (Pigs)
    I_x(NPV_e)        0.824          0.629          0.546          0.513
    I_x*(NPV_e)       0.859          0.685          0.615          0.582
    I_x(CS_e)         0.854          0.848          0.476          0.499
    I_x(NPV_f)        0.888          0.727          0.594          0.526
    I_x*(NPV_f)       0.897          0.747          0.641          0.538
    I_x(CS_f)         0.950          0.906          0.553          0.582

Eggum determined the individual amino acid availabilities for the food proteins, and thus performance of the model was tested using quantities of available amino acids as information sources. Each of the information-entropy indices was recalculated using the fraction of the protein amino acids which was available. A linear regression analysis and rank correlation were again done, and the resultant correlation coefficients are given in Tables 3.3.7 and 3.3.8, respectively.

TABLE 3.3.7.--Matrix of Correlation Coefficients for Information-Entropy Measures Versus Net Protein Values and Biological Values of Rats and Baby Pigs (based on available amino acid content of dietary protein).

                                I_x(NPV)    I_x*(NPV)    I_x(CS)
    Net Protein Value (Rats)      0.832       0.887       0.944
    Biological Value (Rats)       0.643       0.700       0.866
    Net Protein Value (Pigs)      0.811       0.864       0.805
    Biological Value (Pigs)       0.675       0.817       0.705

TABLE 3.3.8.--Matrix of Spearman's Rank Correlation Coefficients for Ranks of Information-Entropy Measures Versus Net Protein Values and Biological Values of Rats and Baby Pigs (based on available amino acid content of dietary protein).

                                I_x(NPV)    I_x*(NPV)    I_x(CS)
    Net Protein Value (Rats)      0.782       0.812       0.918
    [rows for Biological Value (Rats) and Net Protein Value (Pigs) are not legible in the scan]
    Biological Value (Pigs)       0.551       0.700       0.589

The results of this analysis indicate that the model predictions are relatively uninfluenced by the use of the original amino acid content of the protein as opposed to the use of the respective availabilities. Only the I_x(CS) index, based on the chemical scoring assumption, exhibits a consistent improvement.

The final analysis undertaken in this section was an examination of the use of Zipf's law in the model. If a Zipfian relationship is present, a log-log plot of our measure of information frequency versus the ranking function should be linear.
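The log-log test just described is an ordinary least-squares fit of log information frequency against log rank-ordering. The sketch below uses fabricated example points rather than the dissertation's data (Tables 3.3.9 and 3.3.10 report the actual coefficients and slopes); it only shows how such a slope would be estimated and compared with the Zipfian prediction of -1.

    import math

    def fit_slope(xs, ys):
        # Ordinary least-squares slope of y on x.
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx = sum((x - mx) ** 2 for x in xs)
        return sxy / sxx

    # Fabricated (rank, frequency) pairs standing in for 1/NPV versus H_x*(EAA).
    ranks = [1.0, 1.2, 1.5, 2.0, 2.6, 3.3]
    freqs = [4000, 3650, 3300, 2800, 2500, 2200]

    log_rank = [math.log10(r) for r in ranks]
    log_freq = [math.log10(f) for f in freqs]
    slope = fit_slope(log_rank, log_freq)
    print(f"fitted Zipfian slope = {slope:.3f} (the model predicts -1.000)")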
The information frequencies for our information variables are given as follows: for Ix(NPV) the logarithm of the information frequency is Hx(EAA); for I°x(NPV) the log frequency is Hx(EAA) modified by Oser's rules, which will be denoted H°x(EAA); and that of Ix(CS) is min Hx(aaj). The ranking function is the inverse of the experimental net protein value or biological value index (scaled to 1.0), as stipulated in section 3.2. The amino acid frequency for the chemical scoring method was calculated by multiplying Ix(CS) by the log-average amino acid frequency for egg protein, the antilog of Hx(NPV) of egg. The logarithmic form of the resultant frequency is denoted min H°x(aaj). This standardization procedure is done to remove the variation which would occur in the analysis if raw min Hx(aaj) were used.

The results of a correlation and regression analysis are presented in Tables 3.3.9 and 3.3.10. Table 3.3.9 gives the correlation coefficients. All correlations are at least significant at the P < 0.01 level, and these results tend to support the cost-frequency behavior of Zipf's law.

TABLE 3.3.9.--Matrix of Correlation Coefficients for Zipfian (log-log) Analysis of Information-Entropy Model.

However, the most interesting aspect of the analysis is found in Table 3.3.10, which lists the slopes of the Zipfian regression analyses. The information-entropy model would predict a slope of -1.00, but except for several values, our regression analyses yield slopes of approximately -0.50. Figure 3.3.1 illustrates this result for the information-entropy model of the net protein value rank-ordering of rats.

TABLE 3.3.10.--Matrix of Slopes of Zipfian (log-log) Analysis of Information-Entropy Model.

Figure 3.3.1.--H°x(EAA), the Average Information-Entropy, Versus Zipfian Rank-Ordering for Net Protein Value for Rats (fitted slope = -0.526; predicted slope = -1.000).

This apparent error in the model is due to a constraint we still have operating, namely, total retention of all amino acids fed into the system. This effect will be explored more fully in the discussion when the model is assessed in light of all evidence.

In general, the information-entropy model correlates with both scores and rankings of such indices as biological value and net protein value. These results tend to indicate that information theory could provide a causal interpretation for underlying biological phenomena and serve as an aid in rationalizing observed nutritional behavior.

CHAPTER IV

INFORMATION AND THE HYDROLYSIS OF CARBOHYDRATE POLYMERS

The release of simple sugars from complex glucose polymers is directly related to the nutritional values of these substances.
A relationship between chain length and hydrolysis can be deduced by an information-entropy analysis. The organization of carbohydrate information is related to the polymer's length, and this structural information affects the rate of hydrolysis. The following presents an information-entropy approach for discerning the above relationship and verifies the analysis with experimental data on such degradative processes.

4.1 Aspects of Carbohydrate Structure, Hydrolysis and Metabolism

Unlike protein structure, carbohydrate structure is usually based on the frequency of only one type of chemical information, namely, glucose (112). Glucose is a hexose or six-carbon sugar. Glucose is commonly found in polymer forms, of which there are two main linear classes, amyloses and celluloses. The difference between amylose and cellulose is structural: the way the glucose monomers are bonded into chains differs. In carbohydrates the general picture is of a single information unit linked together in different ways, whereas in protein nutrition there are many different information units linked together in one way (i.e., the peptide bond). An amylose bond originates at the α-position on the asymmetric carbon of glucose (113) and a cellulose bond at the β-position (114). Primarily, the linkages go from the C-1 to the carbon at the C-4 position in the adjoining molecule when the linear polymers are formed. Aside from these linear bonds, branching ones also exist, the most common being the α-1,6. This discussion will be limited to the linear 1-4 linkages.

Both amylose and cellulose must be degraded, so their glucose can be made available in monomer form, before these substances possess nutritional value. Degradation is accomplished by bond-specific enzymes. Those glucan hydrolase enzymes which react with the α-1,4 linked glucose of amylose are known as amylases (115), while those which react with the β-1,4 linkages of cellulose are called cellulases (116). These glucan hydrolases operate in two different ways. The exo-enzyme mechanism attacks the nonreducing end of the polymer, cleaving off disaccharides in an endwise fashion. The endo-enzyme mechanism attacks the internal linkages of the polymer, randomly breaking it down, initially into a mixture of di- and tri-saccharides, and finally into a mixture of mono- and di-saccharides.

The residual di-saccharides and the few tri-saccharides produced by polymer degradation are readily hydrolyzed into glucose by an enzymatic class known as the glucosidases. These enzymes are specific for the α- or β-bonds of two or three unit glucose chains and react poorly or not at all with polymers of greater chain length. Maltase is the common name of the glucosidase degrading the α-linked di-saccharide, maltose, whereas cellobiase is the enzyme acting on the β-dimer of glucose, cellobiose.

Once glucose is obtained from the digestion of carbohydrate polymers, one of its metabolic functions is to provide the organism with energy. This energy is obtained by the breakdown of glucose into carbon dioxide and water. Two metabolic pathways are needed to derive the nutritional energy from glucose (117). The Embden-Meyerhof pathway takes glucose and converts it into two molecules of pyruvate with some generation of biochemical energy (ATP). The pyruvate is then oxidized, with the loss of a carbon, into an acetyl group which enters the tricarboxylic acid cycle and is completely oxidized into carbon dioxide and water with significant generation of ATP.
Given the above metabolic role of carbohydrates, how is their nutritional value measured? Digestibility is the main criterion for ascertaining the value of carbohydrate. In the routine analysis of feeds, the nutritional benefit of carbohydrate results from the digestibility of two fractions, the crude fiber fraction and the nitrogen-free extract. The chemical procedure for this fractionation of feeds was developed over 100 years ago and is known as the Weende method (118). The crude fiber fraction consists basically of cellulose, lignin, and other structural polysaccharides. The nitrogen-free extract consists of amylose, sugars, lignin, and material known as hemicellulose. Generally, then, the digestibilities of cellulose and amylose components of the feed are studied.

Once the digestibilities of crude fiber and nitrogen-free extract fractions are known, the contributions to digestible and metabolizable energy of the cellulosidic and amylosidic components can be determined (119). The conversion factor for carbohydrates used in calculating the digestible energy from the digestible crude fiber and nitrogen-free extract fractions is 4 kcal/g. Also, if only the crude fiber and nitrogen-free extract fractions are considered, the digestible energy equals the metabolizable energy.

A chemical analysis of the feedstuff ingested by an animal will give the necessary data on the quantity of carbohydrate in the diet. The energy contribution of these carbohydrates to the organism's metabolism is ascertained by experimentally determining the digestibility of this fraction. Given the polymeric nature of carbohydrates and the role these polymers have in nutrition, their structure-function behavior could be patterned after an information-entropic behavior similar to that in the previous analysis of proteins. Just what information-entropic rules are followed in carbohydrate metabolism will be examined in the following section by analyzing how the polymeric structure of carbohydrates affects the rate and extent of their hydrolysis.

4.2 An Encoding Model for Carbohydrate Information

In Chapter III the information-entropy approach was used to analyze the transmission of nutritional protein information. The capacity of the decoder was important in the previous analysis because it reflected the optimal metabolic requirement for amino acids or protein information. This demand had to be uniquely satisfied for each essential word (amino acid), because each was missing information required for development.

The information requirements in carbohydrate nutrition differ from those for protein nutrition. First, a carbohydrate requirement does not exist per se, but rather, an energy requirement does. The primary function of carbohydrates is to satisfy the energy requirement, a requirement also fulfilled by proteins and lipids. Therefore, carbohydrates do not supply essential information as proteins do. The channel and decoding capacities are not as relevant information-entropy parameters for carbohydrates as they are for proteins. This is because the excessive carbohydrate storage capability of the organism places no limit upon the transmission of the carbohydrate message once it enters the channel. Consequently, the carbohydrate information encoded (i.e., transported) into the organism identifies the carbohydrate nutritional contribution.

In the previous section the main carbohydrate messages, amylose and cellulose, and their basic information unit, glucose, were discussed.
To be properly encoded, a carbohydrate polymer message must be reduced to the monomer form. This encoding process is analogous to the enzymatic digestion of the polymer, and we can think of the amylase and cellulase enzymes as encoding devices. Typically, the efficiency of the encoding process is related to the length of the message to be encoded. The longer the message, the greater the cost of encoding and the lower the efficiency.

Message length in carbohydrate chemistry is synonymous with the degree of polymerization, DP. Consider a carbohydrate source which is monodisperse (i.e., all molecules have the same degree of polymerization) (120). If Ng equals the total number of glucose units in this source, the frequency of messages, m(DPj), with length DPj equals Ng divided by DPj. Given the relationship between message length and encoding efficiency, Zipf's law will order message length with respect to encoding cost. Equation (2.2.22) gives this rank-frequency relation:

    k ln m(DPj) = k ln Ng - k ln(Rankj) ,                    (4.2.1)

or, alternatively,

    k ln(Ng/DPj) = k ln Ng - k ln(Rankj) .                   (4.2.2)

Solving equation (4.2.2) for Rankj yields the relationship:

    k ln(Rankj) = k ln Ng - k ln(Ng/DPj) = k ln DPj .        (4.2.3)

Equation (4.2.3) gives both the logarithm of the rank and the absolute redundancy, which, if divided by k ln Ng, becomes the relative redundancy. The encoding efficiency, EFj, equals one minus the relative redundancy, and for carbohydrates has the following form:

    EFj = 1 - (k ln DPj)/(k ln Ng) = (k ln(Ng/DPj))/(k ln Ng)
        = 1 - (k ln(Rankj))/(k ln Ng) .                      (4.2.4)

The above equation is identical to Reza's definition of encoding efficiency (121): the entropy of the original message ensemble, k ln(Ng/DPj), divided by the maximum information, k ln Ng, times the average length of the encoded message (equal to one for monomers). The interpretation of the encoding process for a carbohydrate message, then, is that as the cost of encoding increases (i.e., as the ranking of messages with respect to their degree of polymerization increases) the efficiency of the encoding process decreases.

Now comes the question of relating the above result from our information-entropy analysis to a relevant nutritional index. Let us begin by defining the cost associated with encoding further. Equation (2.2.13) gives the total message cost which can accommodate an information-entropy analysis. By considering a monodisperse carbohydrate message where the frequency, nj, of words can be determined by dividing the total number of symbols (monomers) present, Ng, by the message length or degree of polymerization, DPj, the total cost, CJ, becomes:

    CJ = nj cj = (Ng/DPj) cj .                               (4.2.5)

An expression for the degree of polymerization based upon the above variables is:

    DPj = (Ng/CJ) cj .                                       (4.2.6)

The variable cj is, in the context of our definition, the cost (i.e., time) per word. The total activity of an enzyme is defined as the substrate consumed per unit time (122). Consideration of the dimensions of these two variables indicates that they are inversely related; cj is equal to the inverse of the total enzymatic activity. It is also logical that the enzymatic activity is a determinant of the cost of encoding because the encoding device for the carbohydrate message is an enzyme. Given that this relationship is correct, a link between a physical measure of carbohydrate hydrolysis (i.e., enzyme activity) and information encoding can be established.
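Before the enzymatic-activity link is developed further in the next paragraphs, a minimal numerical sketch of equations (4.2.1) through (4.2.4) may help fix the idea. The source size Ng and the chain lengths below are hypothetical, chosen only to show how message frequency and encoding efficiency fall as the degree of polymerization grows.

    import math

    N_g = 1_000_000                              # hypothetical total number of glucose monomers
    chain_lengths = [1, 10, 100, 1_000, 10_000]  # hypothetical degrees of polymerization

    for dp in chain_lengths:
        m = N_g / dp                                      # message frequency, m(DPj) = Ng/DPj
        log_rank = math.log(dp)                           # eq. (4.2.3): k ln(Rank_j) = k ln DP_j
        efficiency = math.log(N_g / dp) / math.log(N_g)   # eq. (4.2.4); the constant k cancels
        print(f"DP = {dp:>6}  messages = {m:>9.0f}  ln(rank) = {log_rank:5.1f}  efficiency = {efficiency:.3f}")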
Substituting the total enzymatic activity, aj, for a carbohydrate polymer of degree of polymerization DPj, we obtain:

    DPj = (Ng/CJ)(1/aj) .                                    (4.2.7)

Let us now consider two monodisperse systems of different degrees of polymerization, each with a total of Ng monomers. The total costs, CJ and CI, of each system are equal, but the individual message costs, cj and ci, will be different. This is logical because the message frequency of the system possessing the lower degree of polymerization increases proportionally as its individual message cost decreases. Now, by taking the logarithm of equation (4.2.7) for the jth and ith polymer systems and calculating the difference between them, the following is seen to be true:

    k ln DPj - k ln DPi = k ln(ai/aj) .                      (4.2.8)

Note that a similar result is obtained by computing the difference between the jth and ith systems using equation (4.2.3), and the results would be identical if the inverse total enzymatic activity were equal to the rank.

Assuming that the enzymatic activity does have such a relationship to rank, what is the significance? Firstly, equations (4.2.7) and (4.2.3) imply that as the rank of DPj increases, the total enzymatic activity will decrease, or alternatively, the respective polymer cost will increase. Then, if we assume DPi to be greater than DPj, the ratio ai/aj measures the relative rate of hydrolysis as the more highly polymerized ith system is degraded to a less polymerized state, the jth system. This activity ratio is also known as the yield or recovery (122) of the activity at the nth step of a reaction, compared with some reference level. Since, as the polymer is degraded, it attains a lower degree of polymerization, the activity increases proportionally; the activity recovered by the polymer being degraded can be thought of as proportional to the degree to which it has been hydrolyzed. Therefore, the activity ratio ai/aj, measuring the proportion of the activity recovered by the molecule during degradation, can be viewed as an estimate of a hydrolysis coefficient, Di. Setting ai/aj equal to Di, the degree of hydrolysis the ith polymer system has undergone when it possesses a degree of polymerization of j, and substituting into equation (4.2.8), yields:

    k ln DPj = k ln DPi + k ln Di ,                          (4.2.9)

which relates the information-entropy measure, message length, to the degree of hydrolysis. The hydrolytic measure of equation (4.2.9) can also be expressed through the encoding efficiencies of the jth and ith systems. Using equation (4.2.4) to determine EFi and EFj, the difference between them equals:

    EFi - EFj = k ln(Ng/DPi)/(k ln Ng) - k ln(Ng/DPj)/(k ln Ng)
              = k ln(DPj/DPi)/(k ln Ng) ,                    (4.2.10)

which is proportional to the extent of hydrolysis, Di, in equation (4.2.9).

Thus, the cost-efficiency reasoning encompassed by Zipf's law has led to an ordering of the hydrolyses of carbohydrate messages based on their lengths. This relationship in turn has been shown to be identical to the encoding efficiency of the carbohydrate message, which is probably the most sensitive variable in the transmission of carbohydrate information. The agreement between this approach and experimental data will be demonstrated in the following section.
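The chain of identifications in equations (4.2.7) through (4.2.9) can be played out numerically. The sketch below uses hypothetical activities for a long (ith) and a short (jth) monodisperse system; it simply checks that reading the activity ratio as a hydrolysis coefficient reproduces the shorter chain length.

    import math

    # Hypothetical total enzymatic activities (substrate consumed per unit time)
    a_i = 2.0      # activity measured for the longer polymer system (DP_i)
    a_j = 10.0     # activity measured for the shorter polymer system (DP_j)
    DP_i = 3000.0  # hypothetical initial degree of polymerization

    # Equation (4.2.8): k ln DP_j - k ln DP_i = k ln(a_i / a_j)
    DP_j = DP_i * (a_i / a_j)

    # Equation (4.2.9): the activity ratio is read as the hydrolysis coefficient D_i
    D_i = a_i / a_j
    lhs = math.log(DP_j)
    rhs = math.log(DP_i) + math.log(D_i)
    print(f"DP_j = {DP_j:.0f}, D_i = {D_i:.2f}, ln DP_j = {lhs:.3f}, ln DP_i + ln D_i = {rhs:.3f}")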
Before assessing the information-entropy method for estimating the hydrolysis of amylose, a modification is necessary, because when activities of enzymes are determined, the experimental conditions are constrained so that the concentration or word frequency, nj, instead of Ng, the total monomer concentration, is constant. Thus, equation (4.2.7) becomes:

    DPj = (Ngj/CJ)(1/aj) ,                                   (4.2.11)

where Ngj is the number of glucose monomers in the jth system, equal to DPj times nj. Given equation (4.2.11), the difference between the jth and ith polymer systems is:

    k ln DPj - k ln DPi = k ln(Ngj/Ngi) + k ln(CI/CJ) + k ln(ai/aj) .     (4.2.12)

In order to see how equation (4.2.12) differs from equation (4.2.8), both the relationship between Ngj and Ngi and that between CJ and CI must be known. If we denote DPi as greater than DPj, the experimental constraint of nj being equal to ni implies that Ngi equals Ngj times the ratio of DPi to DPj. That the ratio of CI to CJ is equal to that of DPi to DPj can be shown by first changing Ngi to Ngj times the DPi/DPj ratio and substituting it into equation (4.2.11), which allows a solution of DPj in terms of CI; alternatively, equation (4.2.7) gives a solution of DPj in terms of CJ. Equating these two expressions for DPj shows the ratio CI/CJ equal to aj/ai, and the activity ratio, by equation (4.2.8), equal to the ratio DPi/DPj. Putting these relationships together, the additional ratio terms of equation (4.2.12) become:

    (Ngj/Ngi)(CI/CJ) = (DPj/DPi)(DPi/DPj) = 1 ,              (4.2.13)

the logarithm of which equals zero. Therefore, the experimental modification of making the initial concentrations (i.e., word frequencies) equal necessitates no adjustment of the information-entropy relationship, and equation (4.2.9) is valid.

The remaining question concerns the validity of this information-entropy approach in determining an experimental value such as enzymatic activity. If the mathematical relations previously developed can be confirmed by experimental means, then more faith can be placed in the viability of the information approach as a methodology for understanding enzyme hydrolysis. The following section will assess the agreement of experimental data with information-entropy theory.

4.3 Assessment of the Carbohydrate Information-Entropy Analysis

Because the conditions of the information-entropic analysis were based upon a very selective type of experimental situation, the first part of this assessment will focus on monodisperse systems. However, the approach will later be modified so that hydrolyses of polydisperse systems can be calculated. The total activity of the enzyme will be determined from the velocity of the reaction (moles/unit time) as given by the Michaelis-Menten equation (123), and using the kinetic constants characteristic of the enzyme under study.
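Since the total activity is obtained from the Michaelis-Menten velocity, a small helper along the following lines is all that is needed. The kinetic constants and substrate concentration shown are illustrative placeholders, not the values used in the experiments cited below.

    def total_activity(v_max, k_m, s):
        """Michaelis-Menten reaction velocity, v = Vm*S/(Km + S), used here as the
        total enzymatic activity a_j (substrate consumed per unit time)."""
        return v_max * s / (k_m + s)

    # Illustrative (hypothetical) kinetic constants for two chain lengths
    print(total_activity(v_max=10.0, k_m=60.0, s=50.0))   # shorter chains: higher Vm
    print(total_activity(v_max=3.5,  k_m=25.0, s=50.0))   # longer chains: lower Vm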
The data for amyloses were taken from a paper by Husemann and Pfannemuller (124), who experimentally determined the kinetic constants Vm, the maximum reaction velocity, and Km, the Michaelis-Menten constant, for two amylases, β-amylase and phosphorylase synthetase (sources of the enzymes in the studies were not discernible). Both are exo-enzymes: β-amylase cleaves off maltose units from the non-reducing end of the polymer, while phosphorylase cleaves or adds glucose-1-phosphate units to the ends of polymers (125). The experiment was done with amylose having degrees of polymerization from 750 to 3,815, where the molecular weight distribution for each polymer was narrow (i.e., approximating a monodisperse solution). These amylose chains served as sites of degradation for β-amylase action or sites of synthesis for phosphorylase. The substrate concentration used in calculating the total activities of the enzymes was 0.06 M. Table 4.3.1 (page 111) presents the experimental kinetic data for these enzymes, while a graphic display of these relationships to DPj for β-amylase can be found in Figure 4.3.1 (page 109). The results of a correlation and regression analysis (P < 0.05) between total enzymatic activity and degree of polymerization are found in Table 4.3.3 (page 112).

Figure 4.3.1.--The Degree of Polymerization Versus Enzymatic Activity for β-Amylase (log2 scales; activity in 10^-6 moles/sec).

TABLE 4.3.1.--Degree of Polymerization and Enzyme Kinetic Data of Amylose.

                    β-Amylase                     Phosphorylase
   DPj        Km*      Vm**     aj**         Km+      Vm++     aj++
   750        66.6     10.3     10.29        19.0     22.2     22.19
   1,775      39.4     6.25     6.25         13.0     14.7     14.70
   2,875      29.6     4.0      3.99         6.75     8.04     8.04
   3,815      23.3     3.45     2.92         3.25     4.50     4.50

   *micromoles maltose.  **micromoles maltose/sec.  +micromoles glucose.  ++micromoles glucose/sec.

Data on monodisperse solutions of carboxymethyl cellulose for four different cellulase complexes, after an experiment by Almin and Eriksson (126), were used to investigate the chain length-activity relationship for cellulose. The carboxymethyl-substituted cellulose was used because samples with a narrow molecular distribution were more readily attainable. The degree of substitution on the cellulose had a range from 0.8 to 1.0. The activity was calculated by relating the changes in viscosity to enzymatic degradation, and the enzymatic activity was calculated through the number of bonds broken per unit time; refer to the above paper if additional information on determination of the enzymatic activity is necessary. The cellulases employed are from three sources: Penicillium Chrysogenim Notatum, Aspergillus Oryzae Niger, and Stereum Sanguinolentum.

Figure 4.3.2.--The Degree of Polymerization Versus Enzymatic Activity for Cellulase A (Penicillium Notatum) (log scales; activity in 10^-9 moles/sec).

TABLE 4.3.2.--Degree of Polymerization and Activity Data of Cellulose, with activity in (moles/sec) x 10^-9.

   DPj     Cellulase A*    Cellulase B**    Cellulase C+    Cellulase D++
   112        124             109              126              90
   118         85              68               66              68
   128        158             185              248             156
   211         58              60               80              86
   291         69              57               57              60
   323         42              25               35              30
   351         37              16               31              29
   625         17              10               19              12
   871          6.2             6.2              7.3             7.1
   885          8.0             5.9              5.9             5.5
   988         14.3             6.5              8.9            11.2

   *Cellulase purified from Penicillium Chrysogenim Notatum.  **Cellulase dialyzed from Aspergillus Oryzae Niger.  +Cellulase partially purified from Aspergillus Oryzae Niger.  ++Cellulase purified from Stereum Sanguinolentum.

TABLE 4.3.3.--Correlation and Regression Analysis of Activity Data Versus Degree of Polymerization.

   Enzyme             Correlation Coefficient     Slope     x-intercept
   β-Amylase                  0.99                -1.33        14.41
   Phosphorylase              0.95                -0.97        14.21
   Cellulase A*               0.95                -0.72        12.09
   Cellulase B**              0.99                -0.65        11.46
   Cellulase C+               0.93                -0.57        11.26
   Cellulase D++              0.97                -0.70        11.85

   *Purified from Penicillium C. Notatum.  **Dialyzed from Aspergillus O. Niger.  +Partially purified from Aspergillus O. Niger.  ++Purified from Stereum Sanguinolentum.
These cellulases are all complexes of exo-enzymes and endo-enzymes (116), and random as well as endwise degradation occurs. The relationship between enzymatic activity and degree of polymerization is given in Table 4.3.2 (page 111). A graphical presentation for the Penicillium Notatum complex, illustrating the respective activity versus chain length behavior, is found in Figure 4.3.2 (page 110). The results of correlation and regression analysis (P < 0.0005) on the cellulases' enzymatic activities are given in Table 4.3.3 (page 112).

The activity-degree of polymerization data indicate that the log-linear correlation between DPj and aj based on equation (4.2.7) is good for both the amyloses and celluloses. This is an important confirmation of my approach, because equation (4.2.7) was derived from an information analysis of carbohydrate encoding, and also is the foundation for subsequent equations relating the information approach to enzyme hydrolysis. The regression analysis yields a considerable variation from the predicted slope of minus one, a necessary condition for maximal information ordering in the system. Only one enzyme, phosphorylase, has a slope close to unity; the others differ. This behavior does not detract from the information-entropy approach but rather implies that these other systems are not most effectively organized. The carboxymethyl celluloses are certainly not, because substitution of the cellulose polymer is known to have unpredictable effects on the enzyme-substrate reaction (126), which could account for the cellulose-cellulase deviations from unity. The deviation for β-amylase can perhaps be attributed to its mode of enzyme action. Phosphorylase adds glucose monomers to the chain, whereas β-amylase cleaves off maltose, a glucose dimer; therefore, β-amylase acts on only about half the bonds as would an enzyme cleaving monomers. Thus, a slope between -2 and -1 should be expected because β-amylase's action is relatively quicker. The information-entropy activity relation holds both in the synthesis and degradation of carbohydrate polymers.
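The log-log test of equation (4.2.7) is easy to reproduce from the β-amylase entries in Table 4.3.1. A minimal sketch, regressing the logarithm of chain length on the logarithm of total activity, gives a slope near -1.3 and a correlation near 0.99, in reasonable agreement with the values listed in Table 4.3.3 (small differences presumably reflect rounding and the fitting details of the original analysis).

    import numpy as np

    # Beta-amylase data from Table 4.3.1: degree of polymerization and total activity
    dp = np.array([750.0, 1775.0, 2875.0, 3815.0])
    a  = np.array([10.29, 6.25, 3.99, 2.92])      # micromoles maltose/sec

    # Equation (4.2.7) predicts a straight line when log(DP) is plotted against log(a)
    slope, intercept = np.polyfit(np.log(a), np.log(dp), 1)
    r = np.corrcoef(np.log(a), np.log(dp))[0, 1]
    print(f"slope = {slope:.2f}, correlation = {r:.2f}")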
The relationship between hydrolysis, Dj, and degree of polymerization, DPj, of carbohydrate molecules in vitro can be shown for amyloses (127, 128). Four different hydrolyses were conducted on narrowly distributed amylose polymers with β-amylase, using substrates with different initial degrees of polymerization. The results of these experiments are summarized in Table 4.3.4. Hydrolysis, Dj, is expressed on a scale of 100 instead of 1, and the logarithm of zero is designated as equal to zero. The correlation and regression analysis (see Table 4.3.5) shows a lower degree of correlation and significance than that seen in the previous analysis (P < 0.1 for tests 1 and 2; P < 0.15 for tests 3 and 4). However, the paucity and limited range of data could considerably bias the results of this analysis. Note that as the range for a particular experiment increases, so does the correlation coefficient. The slopes of all the lines also differ from the theoretical value of minus one.

TABLE 4.3.4.--Hydrolysis of Amylose Polymers with β-Amylase, and Degree of Polymerization.

     Test 1          Test 2          Test 3          Test 4        Natural Amylose
   DPj    Dj       DPj    Dj       DPj    Dj       DPj    Dj        DPj     Dj
   3,150   0%      1,230  0.0%     800    0.0%     795    0.0%      2,600   0.0%
   2,050  20%        730  47.5%    560   29.5%     575   24.5%      2,500  35.5%
   2,110  40%        525  68.0%    350   84.0%     280   78.0%      2,580  53.5%
   1,550  70%        350  91.0%    --     --       --     --        2,200  72.0%

TABLE 4.3.5.--Correlation and Regression Analysis of Hydrolysis and Degree of Polymerization.

             Correlation Coefficient     Slope     y-Intercept
   Test 1            0.85               -0.14        11.72
   Test 2            0.89               -0.22        10.33
   Test 3            0.92               -0.17         9.71
   Test 4            0.89               -0.21         9.74

This deviation of slopes from -1.0 is perhaps best understood after the influence of the reaction order in the encoding process is ascertained. Typically, enzyme reactions are viewed as 1st order, substrate and enzyme reacting on a one-to-one basis. However, β-amylase reacts with an average of 4.3 linkages per encounter (129), by a multichain mechanism, yielding a reaction order of 4.3. Such a situation necessitates a revised definition of enzyme activity. If we denote the enzymatic activity for a reaction between the enzyme and one substrate bond as a1j, then the total enzymatic activity equals the nth product, because a1j reflects the probability of reaction at one reaction site on the enzyme molecule, and the joint probability that n sites will react determines total enzyme activity and equals the nth product of the activities. Therefore, assuming that the activity at each site is identical, the total enzymatic activity equals:

    aj = (a1j)^n .                                           (4.3.1)

The cost function, cj, which was initially taken to be the inverse of the total enzymatic activity, is now seen to be the inverse of the site activity, a1j. Hence, cj must be redefined in terms of site activities:

    cj = 1/(aj)^(1/n) ,                                      (4.3.2)

which, substituted into equation (4.2.7), gives:

    DPj = (Ng/CJ)(1/(aj)^(1/n)) .                            (4.3.3)

This equation dictates a generalized equation for hydrolysis:

    k ln DPj = k ln DPi + (1/n) k ln Di .                    (4.3.4)

Using this result for a reaction order, n, equal to 4.3 for β-amylase, a slope of -0.232 is expected, closer to the average slope of -0.185 which results from the experimental data. Why does the previous analysis of DPj versus activity not exhibit similar behavior? The reason is that the regression analysis done on the activities listed in Tables 4.3.1 and 4.3.2 was effectively a comparison of the rank orders of the enzymatic activities. The rank ordering of activity for a particular enzyme will be the same for k ln a1j or for some constant multiple of it, n k ln a1j, equal to k ln aj. The digestion data were a part of the dynamic analysis of the relative rates of hydrolysis, which are dependent upon reaction order.

The behavior of monodisperse carbohydrate polymers differs from that of natural or polydisperse polymers. The information analysis can be modified to encompass polydisperse systems. We begin by defining the contribution of the DPj polymer fraction to the polydisperse activity of system i as k ln DPj times its probable occurrence, Pj, and do likewise for the DPi polymer. Thus, the proportional change in the degree of polymerization of the system is:

    Σ Pj k ln DPj - Σ Pi k ln DPi = k ln (average chain length ratio) .   (4.3.5)

In Table 4.3.4, the degree of polymerization appears independent of the degree of hydrolysis. Can equation (4.3.5) predict such behavior? Husemann and Pfannemuller (128) studied the distribution of polymers in polydisperse systems as a function of hydrolysis. Table 4.3.6 presents the chain length distribution

    - Cj ln Pj .                                             (5.1.5)

A deviation from maximum organizability was noted in the Zipfian analyses conducted in section 3.3, where the slopes differed greatly from unity. Both the "underorganized" and "overorganized" systems are present.
Since the organizability of the protein information system is dependent upon the amounts of amino acids flowing through the channels, an underorganized system having b/b' values less than one reflects the loss of organization in the system resulting from catabolism. The overorganized systems, those with b/b' greater than one, reflect the conservation of amino acids in the protein information system; that is, the information transmission rate of the jth channel is increased so more information can pass through this channel. Introduction of the catabolic concept into our system effectively allows rejection of the complete retention hypothesis. The Zipf's law equation for a single amino acid becomes:

    k ln axj = k ln amj - fc k ln rxj ,                      (5.1.6)

or, for the entire essential amino acid set:

    Hx(EAA) = Hm(EAA) - fc k ln rx ,                         (5.1.7)

where rx is the rank of protein x, Hm(EAA) is the rank-one log-frequency, and fc is a catabolism factor accounting for the loss of organization in the system due to amino acid destruction by the liver.

The regression analyses done in section 3.3 give, for the information-entropy measures based on the essential amino acid set, a range for fc of -0.517 to -0.608 for rats, and -0.601 to -0.700 for pigs. The interpretation of this result is that between 40% and 50% of the essential amino acids in the diets of rats are catabolized, and between 30% and 40% of those in the diets of baby pigs are converted to urea. Both of these values are comparable to the experimentally determined value for mature dogs of approximately 56%. Younger animals such as the experimental pigs and rats could be expected to have a higher nitrogen retention.

If these urea production figures for baby pigs and rats are correct, a new use for the information-entropy model has arisen. In addition to being able to predict protein quality indices, the model can in turn be used to estimate the degree of catabolism of essential amino acids for various species of animals. The experimental method for measuring this phenomenon is very difficult, and the information-entropy model may quite possibly provide an adequate alternative.

The interpretation of an overorganized system as overcompensating and thus reflecting essential amino acid conservation is supported by the regression analyses in section 3.3. A formula similar to equation (5.1.7) can be employed to reflect amino acid conservation:

    min Hx(aaj) = min Hs(aaj) - fo k ln rxj ,                (5.1.8)

where fo is a conservation factor which represents the overcompensation in the system due to essential amino acid conservation, and min Hs(aaj) is the log frequency for the standard protein. Returning to the regression analyses in section 3.3, the range for fo for rats is -1.287 to -1.487, while the range for pigs is -1.097 to -1.460. These results imply a conservation of the limiting essential amino acid. The degree to which it is conserved is not readily evident: whereas the slope for underorganized systems can range from 0.0 to 1.0, that for overorganized systems goes from 1.0 to infinity. Nonetheless, the information-entropy model conforms to the rule that the most limiting amino acid is conserved by the organism.
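As a small numerical illustration of how the catabolism factor is obtained, the sketch below fits equation (5.1.7) to hypothetical entropy-rank pairs (the actual values belong to the analyses of section 3.3 and are not reproduced here). Taking k ln(·) as a base-two logarithm, the fitted slope estimates -fc, and 1 - fc is then read as the catabolized fraction, which matches the ranges quoted above.

    import numpy as np

    # Hypothetical average information-entropy values Hx(EAA) and the corresponding
    # experimental rank-ordering rx of a few proteins (illustrative only)
    H_x  = np.array([11.10, 10.55, 10.00, 9.45, 8.90])
    rank = np.array([1.0, 2.0, 4.0, 8.0, 16.0])

    # Equation (5.1.7): Hx(EAA) = Hm(EAA) - fc * k ln(rx); with k ln taken as log2,
    # a straight-line fit gives -fc as the slope and Hm(EAA) as the intercept.
    slope, H_m = np.polyfit(np.log2(rank), H_x, 1)
    f_c = -slope
    print(f"f_c = {f_c:.2f}; catabolized fraction roughly {100 * (1 - f_c):.0f}%")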
The question now arises as to how the inclusion of fc will affect the ability of the information-entropy model to predict net protein value, the protein quality index. The mathematics involved are quite simple, and for the essential amino acid set the logarithm of NPV, NPVx(EAA), is:

    NPVx(EAA) = (1/fc) Hx(EAA) - (1/fc) Hs(EAA) .            (5.1.9)

The predicted NPV resulting from utilizing fc will be denoted Ix(EAAR) and known as the "essential amino acid retention index":

    Ix(EAAR) = antilog [(1/fc) Hx(EAA) - (1/fc) Hs(EAA)]     (5.1.10)

when based on the log-frequency Hx(EAA), or I°x(EAAR) for the log-frequency H°x(EAA). Utilizing equation (5.1.10) with fc = 0.5, Ix(EAAR) and I°x(EAAR) were determined from the data in section 3.3. Table 5.1.1 lists the correlation coefficients among Ix(EAAR) and I°x(EAAR) and the experimental net protein values for rats and pigs. Table 5.1.2 lists the slopes of a linear regression analysis for the above variables, and Table 5.1.3 lists the corresponding y-intercept values.

The correlation coefficients are not very different from those obtained by regression without fc. However, the regression analysis shows a much improved picture in the ability of the model to accurately predict the true score of net protein value for rats, but the results for pigs indicate fc = 0.5 may be too high a catabolism factor. This is graphically illustrated in Figures 5.1.1 and 5.1.2 for rats and pigs, respectively. These graphs show the relationships of the model's information-entropy measures, I°x(EAA) and I°x(EAAR), to

TABLE 5.1.1.--Correlation Coefficients Among Essential Amino Acid Retention Indices and Experimental Protein Values of Rats and Pigs.

                          Net Protein      Net Protein
                          Value (Rats)     Value (Pigs)
   FAO Data:
     Ix(EAARf)               0.854            0.806
     I°x(EAARf)              0.879            0.826
   Eggum Data:
     Ix(EAARe)               0.851            0.819
     I°x(EAARe)              0.910            0.842

TABLE 5.1.2.--Linear Regression Coefficients Among the Essential Amino Acid Retention Indices and Experimental Net Protein Values of Rats and Pigs.

                          Net Protein      Net Protein
                          Value (Rats)     Value (Pigs)
   FAO Data:
     Ix(EAARf)               0.642            0.533
     I°x(EAARf)              0.657            0.543
   Eggum Data:
     Ix(EAARe)               0.678            0.573
     I°x(EAARe)              0.800            0.650

TABLE 5.1.3.--Y-Intercepts for the Regression Analysis Among Essential Amino Acid Retention Indices and Experimental Net Protein Values of Rats and Pigs.

                          Net Protein      Net Protein
                          Value (Rats)     Value (Pigs)
   FAO Data:
     Ix(EAARf)               23.7             35.5
     I°x(EAARf)              24.0             35.8
   Eggum Data:
     Ix(EAARe)               20.7             32.3

>> S', then equation (5.3.1) has the following form:

    a = (Vm/Km) S' ,                                         (5.3.2)

or

    a ∝ 1/DPj .                                              (5.3.3)

This is the exact conclusion of the information-entropy analysis. Thus, a valid analog to the encoding model presented in this chapter can be found in a special application of enzyme kinetics. Extension of the model to predict digestibilities had questionable results. From my viewpoint, this was due to the lack of adequate data. However, the regression analysis did not indicate that the log-log relationship between degree of polymerization and digestibility had complete merit. Rather, the level of significance of the results did not offer sufficient cause for acceptance. The most pertinent notion gleaned from this analysis is the inverse proportionality between enzyme activity and degree of polymerization. It indicates that introduction of an information-entropy formalism may be possible for the study of enzyme kinetics.

CHAPTER VI

CONCLUSIONS

1. The information-entropy model of protein metabolism can be employed to assess the nutritional quality of proteins. This is accomplished by relating the amino acid content of proteins in the diet to net protein value.
The model's output is similar to other amino acid scoring approaches such as Oser's essential amino acid index and chemical score. 2. A new index, the "essential amino acid reten- tion index," was postulated from the information—entropy model. It was as well correlated to net protein value as other chemical scoring methods, and permitted the catabo- lism of ingested amino acids to be accounted for. 3. An information-entrOpy analysis of polymer length versus the activity of enzymatic hydrolysis, by a cost—frequency analysis of encoding, was well correlated with experimental data on the subject. 4. The extension of the information-entropy study for the estimation of a hydrolysis coefficient could not be adequately correlated with experimental data. 138 REFERENCES 139 REFERENCES (1) S. Carnot, Reflections on the Motive-power of Heat, 1824, trans. by R. H. Thurston, ed. by E. Mendoza (New York: Dover Publications, 1960), p. 7. (2) R. Clausius, The Mechanical Theory of Heat (London: Macmillan and Co., 1879), p. 78. (3) W. Thomson (Kelvin), "0n the Dynamical Theory of Heat with Numerical Results Deduced from Mr. Joule's Equivalent of a Thermal Unit, M. Regnault's Obser- vations on Steam," Trans. of the Royal Society of Edinburgh (March, 1851), reprinted in The Second Law of Thermodynamics, ed. by W. F. Magie (New York: Harper & Brothers, 1899), pp. 111-148. (4) J. C. Maxwell, Theory of Heat (10th ed.; London: Longmans, Green & Co., 1921), p. 189. (5) C. Caratheodory, Math. Ann., No. 67 (1909), p. 355, cited by S. Blinder, “Caratheodory's Formulation of the Second Law," Physical Chemistry, ed. by W. Dost (New York: Academic Press, 1971), pp. 613-637. (6) M. J. Klein, "The DevelOpment of Boltzman's Statis- tical Ideas," The Boltzman Equation, ed. by E. G. D. Cohen and W. Thirring (New York: Springer-Verlag, 1973). (7) L. Boltzman, "Further Studies on the Thermal Equi- librium of Gas Molecules," Wiener Berichte, v. 66 (1872), cited by L. Boltzman, Lectures on Gas Theory (1896), trans. by S. Brush (Berkeley: Uni- ver31ty of California Press, 1964), p. 52. (8) J. C. Maxwell, Matter and Motion (New York: Dover Publications, Inc., 1877). (9) L. Boltzman, "Observations on One Problem of the Mechanics of Heat Theory," Wiener Berichte, v. 76 (1877), cited in L. Boltzman, Lectures on Gas Theory (1910), trans. by S. Brush (Berkeley: Uni- versity of California Press, 1964), p. 58. 140 (10) (11) (12) (l3) (l4) (15) (16) (17) (18) (19) (20) (21) (22) (23) 141 G. Arfken, Mathematical Methods for Physicists (New York: Academic Press, 1965). J. W. Gibbs, Elementary Principles in Statistical Mechanics (New Haven: Yale University Press, 1902). E. H. Kerner, Gibbs Ensemble: Biological Ensemble (New York: Gordon and Breach, Science Publishers, 1972), p. vii. J. Kestin and J. R. Dorfman, A Course in Statistical Thermodynamics (New York: Academic Press, 1971), pp. 178-181. Ibid., p. 196. Ibid., p. 199. E. T. Jaynes, "Gibbs vs. Boltzman Entropies," Amer. J. Physics, v. 33 (1965), pp. 391—398. A. Grunbaum, "Is the Coarse-grained Entropy of Classical Statistical Mechanics an Anthropomor- phism," Modern Developments of Thermodynamics, ed. by B. Gallor (New York: J. Wiley & Sons, 1974), pp. 413--28. L. Szilard, "0n the Decrease of Entropy in a Thermo- dynamic System by the Intervention of Intelligent Beings," Z. Phy., v. 53 (1929), trans. in Behavioral Science, v. 9 (October, 1964), pp. 301-310. C. E. Shannon, "A Mathematical Theory of Communica- tion," Bell System Tech. J., v. 27 (July-October, 1948), pp. 379 and 623. J. 
R. Pierce, "The Early Days of Information Theory," IEEE Trans. on Info. Thy., v. IT-l9, No. 1 (January, 1973). R. V. L. Hartley, "Transmission of Information," Bell System Tech. J., v. 7 (July, 1928), pp. 535-563. D. Gabor, "New Possibilities in Speech Transmission," J. Inst. Elect. Eng. (London), v. 94 (November, 1947), pp. 369-390. A. I. Khinchin, Mathematical Foundations of Informa- tion Theory (New York: Dover Publications, 1957), pp. 9-13. (24) (25) (26) (27) (28) (29) (30) (31) (32) (33) (34) 142 E. T. Jaynes, "Information Theory and Statistical E. T. Jaynes, "Information Theory and Statistical Mechanics, II," Physical Review, v. 108, No. 2 (1957). M. Tribus, P. T. Shannon, and R. B. Evan, "Why Thermodynamics is a Logical Consequence of Informa- tion Theory," AIChE J. (March, 1966), p. 244. P. Ziesche, "About a New Introduction of Entropy in Statistical Mechanics due to Macke," Proc. of Coll. on Info. Thy., v. II, ed. by A. Renyi’(Buda- pest, Hungary: J. Bolyai Math. Soc., 1968), pp. 515-518. J. Fritz, "Information Theory and Thermodynamics of Gas Systems," Proc. of Coll. on Info. Thy., V. I, ed. by A. Renyi (Budapest, Hungary: J. Bolyai Math. Soc., 1968), pp. 167-175. M. Tribus, Thermostatics and Thermodynamics: An Information Theory Approach (Princeton, N.J.: Van Nostrand, 1961). R. Baierlain, Atoms and Information Theory (San Francisco: Freeman and Company, 1971). D. C. Zubaren, Nonequilibrium Statistical Thermo- dynamics (New York: Consultants Bureau, 1974), pp. 38-45 and 100—103. J. von Neumann, Theory of Self-Reproducing Automata (Urbana, Illinois: (University of Illinois Press, V1966), pp. 60-61. G. Jumarie, "Further Advances on the General Thermodynamics of Open Systems via Information Theory: Effective Entropy, Negative Information," Int. J. of Sys. Sci., v. 6, 1975, pp. 249-269. I. N. Taganov, "Information Simulation of Multi- factor Systems in Chemistry and Chemical Engineer- ing," Theoretical Foundations of Chemical Engineer- ing, English translation from the Russian, v. 9, No. 2 (1975, translated January, 1976), pp. 223- 228. (35) (36) (37) (38) (39) (40) (41) (42) (43) (44) (45) (46) (47) 143 D. Slepian, "Information Theory in the Fifties," IEEE Trans.'on Info.'Thy., v. IT-l9, No. 2 (March, 1973), PP. 143-148. C. Cherry, ed., Information Theory: 3rd London Symposium (London: Burtersworths, 1956). R. W. Hamming, "Error Detecting and Error Correct- ing Codes," Bell System Tech. J., v. 29 (1950), pp. 147-160. C. E. Shannon, "Certain Results in Coding Theory for Noisy Channels," Inform. Control, v. 1 (Septem- ber, 1957), pp. 6—25. A. J. Viterbi, "Information Theory in the Sixties," IEEE Trans. on Info., v. IT-19, No. -3 (May, 1973), pp. 257-262. E. R. Berlekamp, Algebraic Coding Theory (New York: McGraw-Hill, 1968). J. K. Wolf, "A Survey of Coding Theory: 1967-1972," IEEE Trans. on Info. Thy., v. IT—l9, No. -4 (July, 1973), PP. 381-389. W. R. Garner, Uncertainty and Structure as Psycho- logical Concepts (New York: John Wiley and Sons, Inc., 1962), pp. 28-32. H. Theil, Economics and Information Theory (Amster- dam, Netherlands: North-Holland Publishing Co., 1967). J. M. Cozzolino and M. J. Zahner, "The Maximum- EntrOpy Distribution of Future Market Price of Stock," Operation Research, v. 21 (1973), pp. 1200- 1211. B. Lev, "Accounting and Information Theory,“ Studies in Accounting-Research (American Accounting Assoc1- ation, 1969). N. Georgescu-Roegen, The Entropy Law and the Eco? nomic Process (Cambridge, Massachusetts: Harvard University Press, 1971). L. W. 
Rosenfield, Aristotle and Information Theory (The Hague, Netherlands: Humanities Press, 1971). (48) (49) (50) (51) (52) (53) (54) (55) (56) (57) (58) (59) (60) 144 M. A. P. Willmer, "Information Theory and the Mea- surement of Detective Performance,"'Kybernetes, L. L. Gatlin, "The Entropy Maximum of Protein," Math. Biosci., v. 13 (1972), pp. 213-227. M. Haegawa and T. Yanko, "The Genetic Code and the Entropy of the Protein," Math. Biosci., v. 24 (1975), pp. 169-182. C. E. Shannon, "Prediction and Entropy of Printed English," Bell System Tech. J., v. 30 (1951), p. 50. J. F. Young, Information Theory (New York: Wiley- Interscience, 1971), pp. 50—58. C. E. Shannon and W. Weaver, The Mathematical Theory of Communication (lst paperback ed.; Urbana, Illinois: University of Illinois Press, 1949). L. P. Hyvarinen, Information Theory for Engineers (New York: Springer-Verlag, 1968), pp. 15-17. B. Mandelbrot, Jeux de communication (Institut de Statistique de l'UniversitE’de Paris, 1953), cited by L. Brillouin, Science and Information Theory (New York: Academic Press, Inc., 1962), pp. 28-47. G. K. Zipf, The Psycho-Biology of Language (Cam- bridge, Massachusetts: MIT Press, 1935). G. K. Zipf, Human Behavior and the Principle of Least Effort (Reading, Massachusetts: Addison- Wesley Press, Inc., 1949). L. S. Kozachkov, "Certain Integral Properties of Information Systems of Hierarchic Type," Kiber- netica (1974). . V. T. Coates, Revitalization of Small Communities: .TranSportation Options, U.S. Department of Trans- portation, DOT-TST-75-l (May, 1974). V. Pareto, Manual of Political Economy, trans. by A. S. Schwier, ed. by A. S. Schwier and A. N. Page (New York: A. M. Kelley, 1971). (61) (62) (63) (64) '(65) (66) (67) (68) (69) (70) (71) (72) (73) (74) 145 J. Lotka, “The Frequency Distribution of Scientific Productivity," J.'Acad;'Sci. (Washington, D.C., No. 12, 1926). J. Huxley, Problems of Relative Growth (2nd ed.; New York: Dover Press, 1972). T. A. Loomis, Essentials of Toxicology (Philadel- phia: Lea & Febiger, 1968). M. Tribus, Rational Descriptions, Decisions and Designs (New York: Pergamon Press, 1969). E. T. Jaynes, Probability Theory in Science and Engineering (Dallas, Texas: Mobil Oil Research Laboratory, 1959). R. T. Cox, The Algebra of Probable Inference (Baltimore, Maryland: Johns Hopkins Press, 1961), pp. 35-65. C. E. Shannon, "The Bandwagon," IEEE Trans. Info. Thy., v. IT-2 (March, 1956), p. 3. H. P. Yockey, R. L. Platzman and H. Quastler, edi- tors, Symposium on Information Theory in Biology (New York: Pergamon Press, 1958). S. M. Danoff and H. Quastler, editors, Essays on the Use of Information TheoryiJIBiology (Urbana, Illinois: University of Illinois Press, 1953). W. M. Elsasser, The Physical Foundations of Informa- tion Theory in Biology (New York: Pergamon Press, 1958). E. Samuel, Order: In Life (Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1972). J. von Neumann, "Probabilistic Logics and the Syn- thesis of Reliable Organisms from Unreliable Com- ponents," Ann. of Math. Studies, No. 34 (1956), pp. 43-98. B. Michaels and R. A. Chaplain, "The Encoder Mechanism of Receptor Neurons," Kybernetic, v. 13 M. Abeles, "Transmission of Information by the Axon: II--The Channel Capacity," Biological Cybernetics, v. 19 (1975), pp. 121-125. .— (75) (76) (77) (78) (79) (80) (81) (82) (83) (84) (85) (86) (87) 146 R. F. Quick and T. A. Reichert, "Multi-Channel Models of Human Vision: Bandwidth Considerations," Kybernetic, v. 12 (1973), pp. 141-144. A. B. Kogan and O. G. 
Chorajan, "Some Information Theory Applications to the Physiology of the Nervous Cell," Kybernetes, v. 2 (1973), pp. 77-78. R. Eckhorn and B. Popel, "Rigorous and Extended Application of Information Theory to the Affernet Visual System of the Cat-II: Experimental Results," Biological Cybernetics, v. 17 (1975), pp. 7-17. M. W. Nirenberg and H. Matthaei, "Dependence of Cell—Free Protein Synthesis in E. coli upon Naturally Occurring or Synthetic Polyribonucleo- tides," Proc. Nat. Acad. Sci. (USA), v. 47 (1961), p. 1588. H. R. Mahler and E. H. Cordes, Biological Chemistry (New York: Harper and Row, 1971), p. 783. L. L. Gatlin, Information Theory and the Living System (New York: Columbia University Press, 1972). L. M. Spetner, "Information Transmission in Evolu- tion," IEEE Transactions on Information Theory, V. IT-l4] No. l (1968), pp. 3-6. E. Schroedinger, What is Life? (Cambridge, England: Cambridge University Press, 1945). J. von Neumann, Theory of Self-Reproducing Automata (Urbana, Illinois: University of Illinois Press, 1966), p. 61. P. Fong, "Thermodynamic and Statistical Theory of Life: An Outline," Biogenesis, Evolution, Homeo— stasis-—A Symposium bngorrespondence, ed. by A. Locker (Berlin, Germany: Springer-Verlag, 1975), pp. 93-100. P. Glansdorff and I. Prigogine, Thermodynamic Theory of Structure, Stability and Fluctuations (New York: Wiley-Interscience, 1971). L. A. Maynard and J. K. Loosli, Animal Nutrition (New York: McGraw-Hill Book Co., 1969), p. 367. M. Hawegawa and Y. Taka-aki, "The Genetic Code and the EntrOpy of a Protein," Mathematical Biosciences, v. 24 (1975), pp. 169—182. (88) (90) (91) (92) (93) (94) (95) (96) (97) (98) (99) 147 K. Thomas, "Uber die Biolische Wertigkeit der Stickstoff Substanzen in ver schieden Nahrung— smittel," Arch. Anat. u. Physiol., Physiol. Abstract (1909), pp. 212-302, cited by Maynard and Loosli, op. cit., p. 459. H. H. Mitchell and G. G. Carman, "The Biological Value of the Protein Nitrogen Mixtures of Patent White Flour and Animal Foods," J. Biol. Chem., v. 68 (1926), pp. 183-215. A. E. Bender and D. S. Miller, "A Brief Method for Estimating the Value of Protein," Biochem. J., v. 53 (1953), p. vii. D. S. Miller and A. E. Bender, "The Determination of the Net Utilization of Proteins by a Shortened Method," British J. of Nutrition, v. 9 (1955), pp. 382-390. ‘ J. M. McLaughlan and J. A. Campbell, "Methodology of Protein Evaluation," Mammalian Protein Metabolism, Vol. III, ed. by H. N. Munro (New York: Academic Press, 1969), pp. 391-422. T. B. Osborne, L. B. Mendel and E. L. Perry, "A Method of Expressing Numerically the Growth- Promoting Value of Proteins," J. Biological Chem., v. 37 (1919), p. 223. D. V. Frost, "Methods of Measuring the Nutritive Value of Proteins, Protein Hydrolyzates, and Amino Acid Mixtures," Protein and Amino Acid Nutrition, ed. by A. A. Albanese (New York: Academic Press, 1959). PP. 225-274. D. M. Hegsted, "Assessment of Protein Quality," Improvement of Protein Nutriture, National Academy of Sciences (1974), pp. 64-88. Amino Acid Content of Foods and Biological Data on Proteins, FAQ: FAQ Nutritive Studies, Rome, Italy (1970), NO. 24. S. B. Richmond, Statistical Analysis (2nd ed.; New York: Ronald Press Co., 1964), pp. 424-465. R. J. Senter, Analysis of Data (Glenview, Illinois: Scott, Foresman, & Co., 1969), pp. 440—445. .1 (100) (101) (102) (103) (104) £105) (106) (107) (1()£3) (1C)9I) 148 H. H. Mitchell and R. J. 
Block, "Some Relationships Between the Amino Acid Contents of Proteins and their Nutritional Values for the Rat," J. Biol. Chem., v. 163 (1946), p. 599. B. L. Oser, "Method for Integrating Essential Amino Acid Content in the Nutritional Evaluation of Proteins," J. Amer. Dietetic Assoc., v. 27 (1951), pp. 396-402. A. E. Bender, "Rat Assays for Protein Quality--A Reappraisal," Proc. 9th International Congress on Nutrition (MexiCo City, Mexico: 1972), v. 3, reprinted by Karger Basel (1975), pp. 310-320. H. N. Munro, "A General Survey of Mechanisms Regu- lating Protein Metabolism in Mammals," Mammalian Protein Metabolism, ed. by H. N. Munro (New York: Academic Press, 1970), v. 4, pp. 3-130. A. M. Rosie, Information and Communication Theory (London, England: Van Nostrand Reinhold Co., 1973), p. 90. ’ H. H. Williams, A. E. Harper, D. M. Hegsted, et al., "Nitrogen and Amino Acid Requirements," Improvement of Protein Nutriture, National Academy of Sciences (1974), pp. 23-63. H. N. Munro, "Free Amino Acids and Their Role in Regulation," Mammalian Protein Metabolism, v. 4, ed. by H. N. Munro (New York: Academic Press, 1970), pp. 299-386. J. M. McLaughlan and A. B. Morrison, "Dietary Fac- tors Affecting Plasma Amino Acid Concentrations," Protein Nutrition and Free Amino Acid Patterns, ed. by J. H. Leathem (New Brunswick, New Jersey: Rutgers University Press, 1968), pp. 3-18. C. Gitler, "Protein Digestion and Absorption in Non- Ruminants," Mammalian Protein Metabolism, v. 1, ed. by H. N. Munro (New York: Academic Press, 1964), pp. 35—70. D. H. Elwyn, "Modification of Plasma Amino Acid Pattern by the Liver," Protein Nutrition and Free Amino Acid Patterns, ed. by J. H. Leathem (New Brunswick, New Jersey: Rutgers University Press, 1968). pp. 88-106. (110) (111) (112) (113) (114) (115) (116) (117) (118) (119) (120) 149 A. N. Kolmogorov, "Three Approaches to the Quanti— tative Definition of Information," Problems of Information Transmission, v. 1, No. 1 (1965), pp. 3-11. B. O. Eggum, A Study of Certain Factors Influencing Protein Utilization in Rats and Pigs (COpenhagen, Denmark: I Kommission has Lanhusholdningsselskabets Forlag, 1973). I. Danishefsky, R. L. Whistler and F. A. Bettelheim, "Introduction to Polysaccharides," The Carbohy- drates, Chemistry and Biochemistry, ed. by W. Pigman and D. Horton (New York: Academic Press, 1970), v. II-A, pp. 375—410. "Starch in Glycogen," ed. by W. "The Carbohydrates, Chemistry Academic Press, 1970), C. T. Greenwood, Pigman and D. Horton, and Biochemistry (New York: v. II-B, pp. 471-513. E. B. Cowling, "Structural Features of Cellulose," ed. by E. T. Reese, Advances in the Enzymatic Hydrolysis of Cellulose and Related Materials (New York: Pergamon Book Press, 1963). W. S. Whelan, "Enzymatic Explorations of the Struc— tures of Starch and GlyCOgen," Biochemistry, K. W. King, "Enzymes of the Cellulase Complex," ed. by R. F. Gould, Cellulases and Their Applica- tions, Advances in Chemistry Series 95 (Washington, D.C.: American Chemical Society Publications, 1969), pp. 7-26. A. White, P. Handler, and E. Smith, Principles of Biochemistry (New York: McGraw—Hill, Inc., 1973), Weende Method, cited by L. A. Maynard and J. K. Loosli, Animal Nutrition (New York: McGraw-Hill, Inc., 1969), pp. 76-77. Biological Energy Interrelationships and Glossary of Energy Terms, National Academy of Sciences (Washington, D.C.: Printing and Publishing Office, NAS, 1966), Publication No. 1411. R. A. Gibbons, Polydispersity," Nature, v. 200 (November 16, 1963), pp. 665-666. 
(121) (122) (123) (124) (125) (126) (127) (128) (129) (130) 150 F. M. Reza, An Introduction to Information Theory (New York: McGraw-Hill, 1961), pp. 132-135. H. R. Mahler, E. H. Cordes, Biological Chemistry (New York: Harper & Row, 1966), p. 230. L. Michaelis and M. L. Menten, Biochem. Z., V. 49 (1913), p. 333. E. Husemann and B. Pfannemuller, "An Investigation of the Kinetics of B-Amylase and Phosphorylase: The Dependence of Reaction Velocity on the Chain Length of the Amylose," D. Makromole. Chem., V. 87 (1965), pp. 139-151. W. Z. Hassid, "Biosynthesis of Sugars and Poly- saccharides," The Carbohydrate's Chemistry and Biochemistry, ed. by W. Pigman and D. Horton (New York: Academic Press), v. II-A, pp. 302-373. K. E. Almin and K. E. Eriksson, "Influence