HWZMIIHHHIWMl l l \ \ [ WWIIHHNUJHIIWWIUI YHE CAPACITY ANS AMBIGUW 0F A TRANSQUCER l )l Thesés for the Degree 0‘? 2951.9. MECHEGAN STAE UNWERSWY WILLEAM MORGAN CONRER 1969 THESIS 0-169 This is to certify that the thesis entitled THE CAPACITY AND AMBIGUITY OF A TRANSDUCER presented by William Morgan Conner has been accepted towards fulfillment of the requirements for ___/7/2' 0 'degree in_Z//,(/..t7 fzém44é/z4'f Majdr professor Date 0/9/C 2]; /f//V LIBRARY ' Michigan State University ABSTRACT. THE CAPACITY AND AMBIGUITY OF A TRANSDUCER By William Morgan Conner Up to now little work has been done on examining the corre- spondence between a discrete, noisy information channel with memory and the unit square. The correspondence is made by associating the infinite sequences of the channel with the expansions of points in the unit interval. We begin such an investigation in this paper, although not in this generality. We introduce the element of memory but not noise, and examine the following particular type of noiseless channel with memory. Let S = {0,l,...,b-l} where b 2 2 is an integer and let m be a positive integer. Let 3m be the set of all func- * * tions f from Sm = SxSx...x into S. Now let f E 3m, let m factors a - X E (0,1] and 19C 2 xib 1 be the (unique) nonterminating base i=1 b expansion of x. We define a function f from (0,1] into a _ 'k [0,1] by f(x) = Zyb i where y =f (x ...x i=1 1 1 1 i = 1,2,... . We call f a transducer of memory m, and we i+m-l)’ call {xi} the input sequence and {yi} the received sequence. We give two definitions for the capacity of the transducer and show that these two values for the capacity are equal. We then show that the Hausdorff dimension of the set of all received sequences is equal to the transducer capacity. This result William Morgan Conner establishes a correspondence between a geometric property (dimension of the set of received sequences) and an informa- tion theoretic property (transducer capacity). It shows that the "size" (dimension) of the received set, which is an intu- itive measure of the "capacity", is equal to the quantity formally defined as the capacity. With geometric considerations as the motivation, we next define the ambiguity of the transducer. It is shown that the transducer has a homogeneity property by proving that the ambiguity is almost everywhere the same. The usual quantity used for ambiguity in information theory is a conditional entropy (called the equivocation by most authors) which repre- sents the ambiguity averaged over all received sequences. Our definition is pointwise and gives results involving almost everywhere statements which are somewhat more satisfying than statements involving averages. THE CAPACITY AND AMBIGUITY OF A TRANSDUCER BY William Morgan Conner A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Mathematics 1969 $0 1 77-4 7’ / ‘7“ ACKNOWLEDGMENTS The author wishes to thank his advisor, Dr. John R. Kinney, for suggesting the topic of this thesis, and for his helpful suggestions, patience, and encouragement throughout the course of the work. Thanks are due also to Dr. G.A. Hedlund of Yale University for sending cOpies of requested material to the author. ii II. III. IV. VI. TABLE OF CONTENTS INTRODUCTION .. ...... .... HAUSDORFF DIMENSION AND ENTROPY ... ...... THE CAPACITY OF THE TRANSDUCER .. ........ THE DIMENSION OF THE RECEIVED SET ... ........... . ...... THE AMBIGUITY OF THE TRANSDUCER ...... - .............. ... CONCLUSION ..... BIBLIOGRAPHY iii 13 18 29 31 I. Introduction Besicovitch [1.], Eggleston [ 8, 9 ] and others have cal- culated the Hausdorff dimension (see Section II) of subsets of the unit interval defined by placing certain restrictions on the digits of expansions of numbers. For example, let M(p), 0 s p S 1, be the set of points in the unit interval containing 1 in their dyadic expansions in the proportion p, i.e., x = .xlxz... belongs to M(p) if and only if lim n-1 2 xk = p. n-m k=l Eggleston [8] has shown that the dimension of M(p) is p logzp - (l-p)log2(l~p). (A simplified proof is obtained by using a general theorem due to Billingsley [5 ], p. 142.) Observing that this value for the dimension is the entropy of an information source, Kinney [18] and Billingsley [2,3,4 ] sought a connection between dimension theory and information theory by making the infinite sequences of symbols from the source correSpond to the expansions of points in the unit interval. Theorem 1 of Kinney's paper asserts the existence of a set of measure one whose dimension is the entropy of a Markov source. Dym [7] recently extended this theorem to general stationary, ergodic sources. Theorem 2 of Kinney's paper is concerned with noiseless coding and shows that the dimension of a certain set corresponding to the coded messages is equal to the capacity of the noiseless channel. Smorodinsky [20] recently extended this theorem to very general noiseless channels. 1 As yet little work has been done on examining the corre- Spondence between a discrete, noisy channel with memory and the unit square. Such an investigation is begun in this paper, although not in this generality. We introduce the element of memory but not noise, and examine the following particular type of noiseless channel with memory. Let S = (0,1,...,b-l} where b 2 2 is an integer and let m be a positive integer. Let 3m be the set of all * functions f from Stn = S X S X...X S into S. We note that m factors m crd 3b = bb where crd denotes the cardinal number of a set. * Let f 6 3m and let n be a positive integer. We de- fine a function fn from Sn+m-l into Sn as follows. Let _ nfim-l _ * X -' X1...Xn+m-1 e S and 18C yi f (xi...xi'Hn-l)’ i = 1,...,n. Then the sequence y = yl...yn 6 Sn, and we define *- fn(x) = y. Note that f1 = f . Now let x 6 (0,1] and let g xib-1 be the (unique) nonterminating base b expansion of1 i. We define a function f from (0,1] into [0,1] by f(x) = Elmyib-D1 where y1 = f*(xi...xi+m_1), i = 1,2,... . In 2h: terminology of Shannon [19], the function f is an example of a transducer of memory m. In the terminology of Feinstein [10], Billingsley [5 ], Khinchin [16], and others f is a noiseless, discrete channel (or code) of memory m. We will occasionally call {x1} the input sequence and {y1] the output (or received) sequence; and, following Shannon, we will call f a transducer of memory m. In Section II we define and give the basic properties of Hausdorff dimension and entropy. In Section III we give two definitions of the capacity of the transducer and show that they are equivalent. It is shown in Section IV that the dimension of the set of all received sequences is equal to the capacity. Finally, in Section V we define and examine the ambiguity of the transducer. The pound sign (#) will be used throughout this paper to denote the end of a proof. Also, all logarithms in this paper are to the base b. II. Hausdorff Dimension and Entropy In this section we give the definitions and basic properties of Hausdorff dimension and entropy, two concepts we will be using later. a 0 Let x 6 (0,1] and let x = z xib 1 be the nonterminat- i=1 ing base b expansion of x. Define bi(x) = x1 for all i, i.e., bi(x) is the ith digit of the base b expansion of x. A set of the form {x: bi(x) = 81, i = l,...,n}, where 81 E S, is denoted by [31,...,sn] and is called a cylinder of length b-n. Note that [31,...,sn] is a half-open (open on the left) b-adic interval of length b.n (i.e., an interval of the form (:3 , 13%] for some j, 0 s j 3 bn - l). The following definition of dimension in the unit interval, which extends Hausdorff's original definition, is given by Billingsley [5 ]. Let MA: (O,l], let a and p be positive real numbers, and let u be a probability measure on the Borel sets of. (O,l]. Define pamw) = inf if 9.071)“. where the infimum is taken over all u-p'-covdrings of M, a p-p-covering being a covering by cylinders V1 with p(vi) < p. It is clear that ua(M,p) s ua(M,p') for p' < p, so the limit Ham) = if; pawn) 4 exists (but may be infinite). It can be shown that for fixed M there is an a such that p. (M) I on for a< or and p. (M) = 0 O a o a for a >'ao. The number 00 is called the (Hausdorff) dimension of M with respect to p and is denoted by dimu M. We will denote Lebesgue measure by L and will write dim M instead of dim M. L Some prOperties of dimension are: (2.1) ua(M) > 0 implies dimh M 2 a; (2.2) dimh M > 0 implies ua(M) ==m; (2.3) ua(M) < m implies dimL M s a; (2.4) dim“ M < 0 implies Liam) = O; (2.5) O S dimh M s l; (2.6) dimh M = 0 if M is countable; and (2.7) dimh M = 1 if M is a Borel set with u(M) > O. Dimension is useful for indicating the "size" of a noncountable, measure zero set. Entropy was introduced into information theory by Shannon [19]; Kolmogorov and Sinai have extended the notion of entropy to general measure preserving transformations (see Billingsley [55]). Discussions of entropy in information theory may be found in Feinstein [10] or Dym [7 ]. Let B be the Borel sets in (0,1] and define a trans- formation T on (0,1] by Tx = [bx] for each x 6 (O,l], where {bx} denotes the fractional part of bx. The transformation T is a left shift on the digits of the base b expansion of x. If p.(T-]B) = p.(B) for all B E B, T is said to be measure pre- serving. In such a case we will say that u is stationary. We now define the entropy of a stationary probability measure u as follows. For each positive integer n define Hug") = '2 H([Sls°°-ssn])log H([sls°°'ssn])s where the summation is taken over all s ... s 6 Sn, and where l n we take 0 log 0 to be zero. It can be shown that the limit H(u) H(H) = lim _n___ nab n exists. We call H(u) the entropy of u. Similar definitions can be given for the unit square. De- fine a transformation TI on (0,1] X (O,l] by T1(x,y) = (Tx,Ty) for each (x,y) 6 (0,1] X (O,l]. As before a probability measure P on the Borel sets of (0,1] X (0,1] will be called stationary if T1 preserves P. For stationary P we define Hn(P) = -2 P([sl,...,sn,r1,...,rn])log P([sl,...,sn,r1,...,rn]), where [31,...,sn,r1,...,rn] = {(x,y): bi(x) = $1, bi(y) = ti, 1 s i s n}, and where the summation is taken over all s ... s ,r ... r 6 8“. Again it can be shown that the limit 1 n l n Hum , H(P) = lim n ,' n—m called the entropy of P, exists. III. The Capacity of the Transducer * Let f E 35 and let f be the corresponding transducer of memory m. Let n be a positive integer and let N(n) = crd fn(Sn*m-1), i.e., N(n) is the number of distinct output sequences of length . n+m-l , n which correspond to the b input sequences of length n+m-l. Following Shannon's terminology [19] we call C = lim lQEIEIEl nam n the capacity of the transducer. Theorem 3.1. The limit C exists and satisfies 0 s C s 1. Proof: Let k and n be positive integers. There are N(k) different ways in which a received sequence of length k+n may begin, and at most N(n) different ways in which it may end. Hence (3.2) N(k+n) s N(k)N (n). Also it is obvious that (3.3) N(k) s N(n) for k S n. We now follow a well-known procedure (see, for example, Feinstein [10], p. 85) to show that C exists. Let a = inf lEBEELEl n and let a > 0 be given. There exists an integer r such that $23:E££l s a.+ e- For any integer n 2 r define kn by (kn-l)r S nr< knr. By (3.3) and (3.2) we have log N(n) 3 log N(knr) 3 kn log N(r), and thus k r k r k log N(n) S n log Nit) S ___n__(a+€) = “ (a+€). n n r (kn-Dr kn-l k kn l 1. It follows that limnsup .28;§£El s 1+3. Since 6 was arbitrary, we have limnsup 125:!121 s a, and since legagjni 2 a for all n, As n approaches m, kn approaches a and hence El approaches wezhave limninf 123:§121 2 a. Thus C exists (and is equal to a). Clearly C 2 0 since 12535121 2 0 for all n. As for C s 1, we note that 125:5121 s 12532: = l for all n. # We now apply another definition of capaCity, introduced by Shannon [19] and expanded upon by others for noisy channels, to our transducer. Let T be the transformation defined in Section II, and let u be a probability measure on the Borel sets .52 Then T is called ergodic under u if for each B 6.6 such that T-lB = B, u(B) is either zero or one. In such a case we will say that u is ergodic, omitting reference to T. We will denote by 771 the set of all probability measures on B which are stationary (defined in Section II) and ergodic. For each x E (O,l] define a probability measure vx on 13 by letting vx assign unit mass to the point f(x). Let p 6 W1 and, for M 6 Ca where C, is the class of Borel sets in (0,1] X (O,l], define (3.4) MM) = f vx<{y: (m) e M})u(dx)- (0.1] To see that this integral is defined (i.e., that the integrand is measurable) we proceed as follows. Let B be a cylinder (b-adic interval). Then f-1(B) is the finite disjoint union of b-adic intervals and hence f-1(B) 618. Since the class of b-adic intervals generates 13 (i.e., the smallest o-algebra containing the class of b-adic intervals is 6), f is measurable with respect to 5'. Now let B be a fixed member of B. We show that the function g(x) = vx(B) is measurable with respect to .6. The function g assumes only two values, namely zero on the set D = {x: f(x) d B} and one on the set E = {x: f(x) 6 B}. Thus g will be measurable if D and E €18. Since E = f-1(B) and f is measurable, E 616, and since D is the complement of E, D 618. Thus vx(B), as a function of x, is measurable. We next show that the class .B of all sets M in CI for which the integrand in (3.4) is measurable is a monotone class. Let M1 c: M2 C... be a sequence of sets in .3. Then vx({y= (my) 6 L: MiD = vxaib': (my) 6 Mi}) = 11m vx({y: (my) 6 Mil) and hence U M1 6.5 since vx({y: (x,y) E U Mi})’ being the limit 1 i of measurable functions, is measurable. Similarly 0 Mi E-D for a decreasing sequence M1 :3 M2 :3... of sets in .8. 1Thus .3 is a monotone class. Also .0 contains all rectangles B X C, B,C 6.6, since the set {y: (x,y) E B X C} is either empty or C, and we showed above that vx(C) is~measurable. It follows easily that .8 contains the finite, disjoint unions of rectangles B X C, lO B,C E18. Thus by a well-known result (see, for example, Kingman and Taylor [17], p.18), we have .8 = Ca Hence the integrand in (3.4) is measurable for all M 6 CL It is easily verified that P(M) = u(projx{M n graph of f}), where projx denotes the projection on the x-axis, and that P is a probability measure. Also the set function A defined, for B 648, by (3.5) X(B) = P((0,1] X B) is easily seen to be a probability measure. We note that MB) = u({x: f(x) 6 3}). Theorem 3.6. P is stationary and A 6 77¢. Proof: We show first that for any B 676 we have -1 -l (3.7) [x: f(x) 6 T B} = T [x: f(x) 6 B}. Let A be the set on the left hand side of (3.7) and let A' be the set on the right hand side of (3.7). Let x E A and let xi = bi(x), i = 1,2,..., i.e., x1 is the ith digit of the non- terminating base b expansion of x. Then f(x) 6 T-lB which implies that f(x) = .y yly2 ... where y E S and the point .ylyz... E B. Thus f(Tx) = f(.x2x3...) = .y1y2... E B which is the condition that x 6 A'. Therefore (3.8) A<: A'. Now let x E A' and let xi = bi(x), i = 1,2,... . Then f(Tx) E B, i.e., f(.x2x3...) = .y1y2... where .ylyz... 6 B. It follows that f(.x1x2...) = .y ylyz... for some y E S, or 11 f (x) E T-1B. Thus x E A so (3.9) A' CIA. Inclusions (3.8) and (3.9) give (3.7). Now for any B E B, u({x: f(x) 6 T-lB]) x(T'lB) u n graph of f}) u({x: f(x) 6 T-IC} fl T-1B) 12 n(T'1{x: f(x) 6 C} n T-IB) u(T'1<{x: f(x) 6 c} n 3)) u([x: f(x) 6 C} O B) P(B x C). # Now for p EEWL all three measures u, A, and P are stationary so their entropies are defined. We let (3-10) Ru = H(u) + H(x) - H(P) and (3-11) 0' = Sup R . new The number R9 is called the rate of transmission of the trans- ducer with reSpect to u. See Billingsley [5:] for an intuitive interpretation of R“. We now show that C' can be called the transducer capacity by proving the following theorem. Theorem 3.12. C' = C. (2532;: For any integer n 2 l, the triple (Sn+m-l’ Sn, fn) forms a discrete, noiseless, memoryless Channel, where we think of a transmitted sequence x E Srflm"1 being received as the sequence fn(x) 6 S“. The capacity Cn of this channel is easily computed to be log N(n) (see Feinstein [10] for the definition of capacity of a memoryless channel). Feinstein [11] has shown that lim‘;9 3 0'. But since lim'EE = 11m.128:flifll = C, we have C = C'. # 11-1» 11"” [I‘m IV. The Dimension of the Received Set Let f* 6 3m and let f be the correSponding transducer of memory m. Let Y = f((O,l]) be the range of f (Y is the collection of all possible received sequences). In this section we show that the Hausdorff dimension of Y is C, the transducer capacity. ngmg_fl;l, The expression (3.10) for the rate of transmission R“ of the transducer reduces to Ru = H(x). Proof: Since by (3.10), Ru = H(u) + H(A) ' H(P): we must show that H(u) - H(P) = 0. We will make use of the following two facts: if p,q, and r are positive real numbers then (4-2) (P+q)log(p+q) 2 p 10g p + q 10g a and (4.3) (P+q)108(P+ O and c > 0 be given, and choose a positive integer k such that b.k < p and 1 C +'e > -23fiESEL . Then -k c+ LC+€(Y,p) sN(k)b ( 3’ 0 be given and choose u Ein so that Ru >'C' - 6. Then by Theorem 3.12 and Lemma 4.1 we have H(x) > C - so Now A E‘m by Theorem 3.6 and thus the 15 Shannon-McMillan-Breiman theorem (see Billingsley [55]) shows that , l (4.6) 11m - E'log k([b1(y),...,bn(y)]) = H(X) a.e. [A]- n—m Let M be the set of y's for which (4.6) holds. Then M n Y C M and by a general theorem due to Billingsley [5 ], p. 141, we have dim M n Y = H(x)dime n Y. Now A(M) = 1 and 1(Y) = p({x: f(x) 6 Y}) = u((O,1]) = 1 so 1(M 0 Y) = 1. Thus dime1W'Y = 1 by (2.7), so dimMnY=H(>\)>C'€o Since a was arbitrary, we have dim M n Y 2 C, and since MnYCY, we have dimdeimMnYzc. # We remark that dim M.n Y = H(x) can also be shown by using a theorem of Dym [ 7, Theorem 2]. Dym provides a direct proof not involving Billingsley's general theorem. We also note that both definitions of capacity given in Section III were used in proving Theorem 4.5. The C definition was used in showing dim Y s C and the C' definition was used in showing dim Y 2 C. We next construct a set Y' which has the same dimension as Y and which illustrates the structure of the received set. For each integer n 2 1, let II’I;"°°’IN(n) be the N(n) closed b-adic intervals of length b.n which cover Y. Let n n n Yn - I1 U 12 U...U IN(n)’ l6 and let It is clear that Y' :>Y since Yn D‘Y for all n, and since the dimension of a set is not changed by the addition of a countable number of points, the following theorem shows that dim Y' = dim Y. Theorem 4.7. The set difference Y' - Y is countable. .Proof: If Y' - Y = ¢ the theorem is true. Thus we assume Y' - Y # ¢ and we let y E Y' - Y. We assume for the moment that y is irrational; then the expansion y = 2 yib.1 is unique. i=1 For each integer n 2 1 there exists an integer jn satisfying 1 s jn s N(n) such that y E I: . Let En = f-1(1; ); n n En is the union of those half-open (open on the left) b-adic - +m-l intervals of length b (n ) which map under E into I? . n Since y is not an endpoint of any In , we have In 33 I:+l , jn jn n+1 and thus EnID En+l' By the finite intersection property (see, for example, 9 Hocking and Young [15], p. 19) E = n c En # ¢, where c En is n=l m the closure of En' Since y 4 Y, E' = f] En = ¢. Let x E E; n=l then x E c En for all n. Since E' = ¢, there exists a positive integer M such that x 4 En for all n 2 M. Therefore, for all n 2 M, x is the left hand endpoint of one of the intervals comprising En. Thus x is a b-adic point. Let x = :E‘EIxib-i be the terminating expansion of x. Then since for any n 2 M, x1,...,xn+m_1 represent one of the intervals of En, we have fn(x1 ... xn+m-l) = y1 ... yn. But since xi = O for all i larger than some positive integer, we l7 * . must have y. = f (O...O) for all i larger than some positive 1 E—EETOS integer. This is a contradiction since the expansion of y, y being irrational, is nonrepeating. Therefore, since assuming that the set Y' - Y contained an irrational point led to a contradic- tion, Y' - Y must contain only rational points. # That the set Y' - Y may indeed be nonempty is shown by the following example. * Example 4.8. Let b = 3, m = 1, and let f E 31 be defined * * * by f (O) = 1, f (l) = O, and f (2) = 2. Then the ternary m - fraction y = 2 yiB 1, y1 = l for all i, belongs to Y' since i=1 for each n the point f(.O...O 111...) matches y in the first n zeros n places. However, y E Y since the point 0 is not in the domain of f, and no other point can possibly map into y. V. The Ambiguity g£_the Transducer * Let f E 3m, m 2 l, and let f be the corresponding trans- ducer of memory m. Let y 6 (0,1] and for n 2 1, let Mao) = crd f;1(b1(y)...bn 0 and s > 0 be given, and choose a positive integer k such that _ _ 108 M (y) b (k+m 1) < p k and D +'e > k . Then LNG (M(y) .p) s Mk(y)b’ (Hm-1) (D+e) '108 (y) _ _ < Mk(y)b Mk b (m 1) (mg) g 1. Since p was arbitrary, it follows that LD+€(M(y)) s l, and thus dim M(y) S D+e by (2.3). Since a was arbitrary, we have (5.12) dim M(y) s D. Since (5,12) holds for all y E E, the proof is completed. # Example 5.13. We demonstrate in this example why for y a b-adic point, M(y) is defined by the set (5.1). Let b = 2, m = 2, and let f* 6 32 be defined by 5*(00) = £*(11) = o and f*(Ol) = f*(lO) = 1. Let y be the dyadic point .0111..., and note that .1000... is also equal to y. Now M1(y) = crd(f*)-1(O) = crd{00,1l} = 2. The two intervals (0,1/4] and (3/4,l] represented by the two sequences 00 and 11 do not cover f-1(y) since the point x = .0111... belongs to f-1(y) but does not belong to either (0,1/4] or (3/4,l]. However, if wearer-7F- 23 M(y) is defined to be the set (5.1) and not the set f-1(y), then it is clear that for all n 2 1, the M (Y) intervals n represented by the set of sequences f;1(bl(y)...bn(y)) will cover M(y), a property which is essential for the proof of Theorem 5.11. The discussion so far in this section has been for trans- ducers of memory m 2 2, since the matrices A1 are not defined for m = 1. We now examine the case m = 1 separately. * . Let f 6 3h with f the corresponding transducer of ). * memory 1. We represent f by a b X b matrix D = (di J * 0 s i, j s b-l, as follows. The entry di is one if f (i) = j J * and is zero if f (i) # j. Thus each row of D contains a single entry of one and all other entries of the row are zero. Define b-l L. = 2 d,,, 0 s j s b-l; L, is the number of elements of S i=0 ‘3 J * which map to j under f . Let u 6 W2 and define pi = p([i]), 0 s i s b-l and q. = 2 d ,p , O S i s b-l. We note that 1 defined by (3.5) 1 1:0 ji j . is such that k([i]) = qi’ 0 s i s b-l, since A([i]) = p({x: f(x) 6 [i]]) = z'u([j]) = 91’ where 2' denotes * the summation taken over those j for which f (j) = i. We now prove the equivalent of Theorem 5.8 for the case m = l. The proof is similar to that of Theorem 5.8, using Li instead of A1 and the ergodic theorem instead of the Furstenberg and Kesten theorem. Theorem 5.14. Let f be a transducer of memory 1, let u 67%, and let A be defined by (3.5). Then the ambiguity at the point 108 Mn(y) n y, lim n—a , exists and has the same value (namely, 24 l +...+' . qo 0g ‘0 qb_llog L ) for almost all y [A] b-l 'nggf: Let h:(y) be the number of occurrences of i, O s i s b-l, among the first n digits of the nonterminating base b expansion of y. By Theorem 3.6, A 6 711 and we may apply the pointwise ergodic theorem (see Billingsley [5 ], p. 13) to conclude that for each i, 0 S i s b-l, h:(y) n (5.15) lim n—cm for almost all y [x]. Let D., 0 s i s b-l, be the set of y's 1 i b-1 hn(y) for which (5.15) holds. Then for y 6 F = n Di’ lim n = q1 i=0 new for all i, and x(F) = 1 since x(Di) = l, O s i S b-l. Now since for each n 2 l we obviously have h°(y) hz'l(y) _ n Mn(y) - to ...L b-l ’ then 0 b-l 108 Mn(Y) hniy) ha (y) _——_= +00. —— n n log Lo + n log Lb-l’ where we take 00 to be one and 0 log 0 to be zero. Hence if 108 Pg1(y) y 6 F, we have lim-—-————-—- exists and is equal to new n qolog Lo +...+qb 1log L Since x(F) = l, the proof is com- b-l' pleted. # We now prove the equivalent of Theorem 5.11 for the case m = 1. In this case we are able to obtain an equality for the dimension of the ambiguity set rather than just an upper bound. We begin with a lemma. Lemma 5.16. Let y 6 (0,1] and let ED % U'[x1,...,xn] where U' denotes the union over those x .,xn such that 1,.. 25 fn(xl...xn) = bl(y)...bn(y) (there are clearly Mn(y) such sequences x1,...,xn). Then a: 0 En = M(y)- n=l * Proof: Let x 6 M(y). Then f (b(xi)) equals the ith digit of * the nonterminating expansion of y, i.e., f (b(xi)) = bi(y). Thus fn(b1(x)...bn(x)) = b1(y)...bn(y) for all n. Hence x 6 En so M(y)<: En for all n. Therefore (5.17) M(y) C: Q En. n=l Let x E 0 En. Then fn(bl(x)...bn(x)) = b1(y)...bn(y) n=1 for all n so f*(bn(x)) = bn(y) for all n. Thus x 6 M(y) SC (5.18) En<: M(y). l : ll:)8 Inclusions (5.17) and (5.18) give the desired result. # Theorem 5.19. Let f be a transducer of memory 1, let u 61%, and let A be defined by (3.5). Then b-l dim M(y) = 2: qilos Li i=0 for almost all y [1]- Proof: Let y 6 (O,l] and let y1 = b1(y) for all i 2 1. For k each integer k 2 l and each sequence ...xk 6 S , define a x1 on S by function pk 26 l/Mk(y) = 1/t,y ...L if fk(x1...xk) = y1.. -Y 1 YR k pk(x1...xk) = 0 otherwise. It is clear that (5.20) pk(xl...xk) 2 0 for all k. Also we have (5.21) z p1(i) = L /L = 1. iES y1 y1 We show next that (5.22) .2 pk+1(x1...xk1) = pk(x1...xk). 168 If pk(x1...xk) = 0 then fk(xl...xk) # yl...yk so fk+1(x1...xki) f yl...ykyk‘+1 for all i 6 8. Hence pk+1(x1...xk1) for all i 6 S so (5.22) is true in this case. If pk(x1...xk) = l/LYI...Lyn then fk*1(x1...xki) = yl...ykyk+1 for the L . . * . _ yk+l values of 1 for which f (1) yk+l° Hence p (x ...x i) is l/L ...L L for those L values of k“ 1 k y1 yk yR+1 yR+1 i and is zero for the remaining values of 1. Therefore 2? (Knox i)=L /L ...x, 165 k+1 1 k y1.4-1 y1 so (5.22) is also true in this case. Finally we see that for any seq of S, we have (5.23) lim pk+n(x1...xk n—cco yR+1 uence = pk(x1...xk) X 1". k .,x of elements m, u~‘._AM 27 It follows from (5.20), (5.21), (5.22) and (5.23) that there exists a probability measure vy on the unit interval such that vy([x1,. .. ,xk]) = pk(x1. . .xk) , k where x1...xk 6 S (see Billingsley [5 ], p. 35). Now let F be the set of y's for which Theorem 5.14 holds, and let y 6 F. If x 6 M(y) and if we set xi = bi(x) and yi = bi(y), then 108(1/L o a 0L ) log v ([x ,...,x ]) y y lim - ,‘y 1 n = lim - 1 n n n [1‘03 n—m 10 C C C 8 Lyl Lyn log Mn(y) b-1 = lim = lim ---———- = Z qolog L.. n n . i i nao new i=0 The next to the last equality follows from the fact that we obviously have Mn(Y) = L ...L , and the last equality follows yl yn from Theorem 5.14. By a theorem of Billingsley [55], p. 141, we now have b-l (5-24) dim M(y) = dim M(y) 2 q.log L. v i i y i=0 for all y 6 F. If En is defined as in Lemma 5.15, it is seen that an vy(En) - 1. Since gllEn = M(y) by Lemma 5.15 and Since {En} is a decreasing sequence, we have vy(M(y)) = 1. Hence dimv M(y) = l by (2.7) and thus from (5.24) we have Y b-l dim M(y) = 2 qilog L. i=0 1 28 for all y 6 F. Since x(F) = 1 by Theorem 5.14, the proof is completed. # For the case m = l we were able to calculate the dimension of the ambiguity set M(y) (Theorem 5.19), whereas in the case m 2 2, we were able to obtain only an upper bound on dim M(y) (Theorem 5.11). Conjecture is that for the case m 2 2, dim M(y) is actually equal to this upper bound. However, the method of proof of Theorem 5.19 cannot be used to prove this conjecture, since the consistency condition (5.22) may not hold for m 2 2. The following example demonstrates this fact. * Example 5.25. Let b = 2, m = 2, and let f 6 32 be defined * * * * by f (00) = o and f (01) = f (10) = f (11) = 1. The function f2 is then as follows: Domain Value Functional Value HHHHOCOO l—‘HOOl-‘HOO i-‘i-‘t-‘i-‘t-‘D-‘OO l-‘I-‘t-‘Ol-‘r-‘I—‘O 0 l O l O l O 1 Let y = .11; we see that M1(y) = crd {01, 10, 11] 3 and M2(y) = crd {010, 011, 101, 110, 111} = 5. If we define (as in the proof of Theorem 5.19) p2(00) = O, p2(01) = p2(10) = p2(ll) = 1/3 and p3(000) = p3(OOl) = p3(lOO) = o, p(010) = p(Oll) = p(lOl) = p(110) = p(lll) = 1/5, then it is clear that the consistency con- dition (5.22) does not hold. Thus for any y 6 (3/4,l], we cannot define a measure Vy as we did in Theorem 5.19. VI. Conclusion This paper has begun an examination of the relationship between an information channel and the unit square. In partic- ular, we have examined a special type of noiseless channel with memory called a transducer. This same transducer has been examined from a topological dynamics standpoint by Hedlund [14]. We now comment on our results and mention some difficulties encountered. Theorem 4.5 shows that capacity is a reasonable term for the quantity C in the following sense. The "size" (dimension) of the set of all possible received sequences is a measure of the "capacity" of the transducer, since under no circumstances can the transducer be made to create new output sequences and hence increase the "size" of this set. This interpretation of "size" as "capacity", and the fact that the quantity C is equal to this "size" (Theorem 4.5), make capacity a reasonable term for C. Attempts at calculating C explicitly by finding a difference equation satisfied by N(n) were unsuccessful. During these attempts it was discovered that the difference equation method used by Shannon [19, Appendix 1] for his finite state sources calculates the number of state sequences of a given length rather than the number of output sequences as desired. Conant [6] is also aware of this problem. 29 3O Theorems 5.8, 5.11, 5.14, and 5.19 show that the transducer has a homogeneity property since almost all received sequences have the same ambiguity. The usual quantity used for ambiguity in information theory is a conditional entrOpy (called the equi- vocation by most authors) which represents the ambiguity averaged over all received sequences. Our pointwise ambiguity has a geometric interpretation and gives results involving almost every- where statements which are somewhat more satisfying than state- ments involving averages. Yet to be resolved is the question of whether or not the inequality of Theorem 5.11 is actually an equality. One direction in which our work may be extended is to con- sider more general channels by introducing the element of noise. Another direction is to consider how to make the transducer invertible, i.e., find a subset of the unit interval of measure one such that the tranSducer is a one-to-one function onto the set of all received sequences when restricted to this subset. 10. ll. 12. 13. BIBLIOGRAPHY Besicovitch, A.S., On the sum of digits of real numbers represented in the dyadic system, Math. App. 110(1935), 321-330. Billingsley, P., Hausdorff dimension in probability theory, 111. J, Math. 4(1960), 187-209. , Hausdorff dimension in probability theory II, 111. i. Math. 5(1961), 291-298. , On the coding theorem for the noiseless channel, Ann. Math. Statist. 32(1961), 594-601. , Ergodic Theory and Information, John.Wi1ey and Sons, New York (1965). Conant, R.C., Channel capacity of Meore automate, Information and Control 12(1968), 453-465. Dym, H., On a class of monotone functions generated by ergodic sequences, Amer. Math. Monthly 75(1968), 594-601. Eggleston, H.G., The fractional dimension of a set defined by decimal properties, Quart. J, Math. Oxford Ser. 20(1949), 31-360 , Sets of fractional dimensions which occur in some problems of nunber theory, Proc. London Math. Soc. (2) 54 (1952), 42-93. Feinstein, A., Foundations pf Information Theory, McGraw- Hill, New York (1958). , 0n the coding theorem and its converse for finite- memory channels, lpformation and Control 2(1959), 25-44. Furstenberg, H. and H. Kesten, Products of random matrices, Ann. Math. Statist. 31(1960), 457-469. Hedlund, G.A., Mappings on sequence spaces, Communications Research Division Technical Report No. 1, von Neumann Hall, Princeton, New Jersey, February 1961. 31 14. 15. 16. 17. 18. 19. 20. 32 Hedlund, G.A., Transformations commuting with the shift, in Topological Dynamics, J. Auslander and W. Gottschalk, eds., W.A. Benjamin, New York (1968), 259-289. Hocking, J. and G. Young, Topology, AddisonJWesley, Reading, Massachusetts (1961). Khinchin, A.I., Mathematical Foundations pf Information Theory, Dover, New York (1957). Kingman, J.F.C. and S.J. Taylor, Introduction £2.Measure and Probabiligy, Cambridge University Press, Cambridge (1966). Kinney, J.R., Singular functions associated with Markov chains, Proc. Amer. Math. Soc. 9(1958), 603-608. Shannon, C.E. and W. Weaver, The Mathematical Theory pf Communication, University of Illinois Press, Urbana,(l964). (reprinted from Bell System Tech. g, 27(1948), 379-423, 623- 656.) Smorodinsky, H., The capacity of a general noiseless channel and its connection with Hausdorff dimension, Proc. Amer. Math. Soc. 19(1968), 1247-1254.