HWZMIIHHHIWMl

l
l
\

\
[

 
    

WWIIHHNUJHIIWWIUI

YHE CAPACITY ANS AMBIGUW
0F A TRANSQUCER

l

)l

Thesés for the Degree 0‘? 2951.9.
MECHEGAN STAE UNWERSWY
WILLEAM MORGAN CONRER
1969

THESIS

0-169

This is to certify that the

thesis entitled

THE CAPACITY AND AMBIGUITY OF A TRANSDUCER

presented by

William Morgan Conner

has been accepted towards fulfillment
of the requirements for

___/7/2' 0 'degree in_Z//,(/..t7 fzém44é/z4'f

Majdr professor

Date 0/9/C 2]; /f//V

   
     

  

LIBRARY '

Michigan State
University

 

 

ABSTRACT.
THE CAPACITY AND AMBIGUITY OF A TRANSDUCER
By

William Morgan Conner

Up to now little work has been done on examining the corre-
spondence between a discrete, noisy information channel with memory
and the unit square. The correspondence is made by associating
the infinite sequences of the channel with the expansions of points
in the unit interval. We begin such an investigation in this
paper, although not in this generality. We introduce the element
of memory but not noise, and examine the following particular
type of noiseless channel with memory.

Let S = {0,l,...,b-l} where b 2 2 is an integer and
let m be a positive integer. Let 3m be the set of all func-

* *
tions f from Sm = SxSx...x into S. Now let f E 3m, let

m factors

a -

X E (0,1] and 19C 2 xib 1 be the (unique) nonterminating base
i=1

b expansion of x. We define a function f from (0,1] into

a _ 'k

[0,1] by f(x) = Zyb i where y =f (x ...x
i=1 1 1 1

i = 1,2,... . We call f a transducer of memory m, and we

i+m-l)’
call {xi} the input sequence and {yi} the received sequence.
We give two definitions for the capacity of the transducer
and show that these two values for the capacity are equal. We
then show that the Hausdorff dimension of the set of all received

sequences is equal to the transducer capacity. This result

William Morgan Conner

establishes a correspondence between a geometric property
(dimension of the set of received sequences) and an informa-
tion theoretic property (transducer capacity). It shows that
the "size" (dimension) of the received set, which is an intu-
itive measure of the "capacity", is equal to the quantity
formally defined as the capacity.

With geometric considerations as the motivation, we next
define the ambiguity of the transducer. It is shown that the
transducer has a homogeneity property by proving that the
ambiguity is almost everywhere the same. The usual quantity
used for ambiguity in information theory is a conditional
entropy (called the equivocation by most authors) which repre-
sents the ambiguity averaged over all received sequences. Our
definition is pointwise and gives results involving almost
everywhere statements which are somewhat more satisfying than

statements involving averages.

THE CAPACITY AND AMBIGUITY OF A TRANSDUCER

BY

William Morgan Conner

A THESIS

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY
Department of Mathematics

1969

$0 1 77-4
7’ / ‘7“

ACKNOWLEDGMENTS

The author wishes to thank his advisor, Dr. John R. Kinney,
for suggesting the topic of this thesis, and for his helpful
suggestions, patience, and encouragement throughout the course

of the work.
Thanks are due also to Dr. G.A. Hedlund of Yale University

for sending cOpies of requested material to the author.

ii

II.

III.

IV.

VI.

TABLE OF CONTENTS

INTRODUCTION .. ...... ....

HAUSDORFF DIMENSION AND ENTROPY ... ......

THE CAPACITY OF THE TRANSDUCER .. ........

THE DIMENSION OF THE RECEIVED SET ... ........... . ......

THE AMBIGUITY OF THE TRANSDUCER ...... - .............. ...

CONCLUSION .....

BIBLIOGRAPHY

iii

13

18

29

31

I. Introduction

 

Besicovitch [1.], Eggleston [ 8, 9 ] and others have cal-
culated the Hausdorff dimension (see Section II) of subsets of
the unit interval defined by placing certain restrictions on the
digits of expansions of numbers. For example, let M(p),

0 s p S 1, be the set of points in the unit interval containing

1 in their dyadic expansions in the proportion p, i.e.,

x = .xlxz... belongs to M(p) if and only if lim n-1 2 xk = p.
n-m k=l

Eggleston [8] has shown that the dimension of M(p) is

p logzp - (l-p)log2(l~p). (A simplified proof is obtained by

using a general theorem due to Billingsley [5 ], p. 142.)

Observing that this value for the dimension is the entropy
of an information source, Kinney [18] and Billingsley [2,3,4 ]
sought a connection between dimension theory and information theory
by making the infinite sequences of symbols from the source
correSpond to the expansions of points in the unit interval.
Theorem 1 of Kinney's paper asserts the existence of a set of
measure one whose dimension is the entropy of a Markov source.
Dym [7] recently extended this theorem to general stationary,
ergodic sources. Theorem 2 of Kinney's paper is concerned with
noiseless coding and shows that the dimension of a certain set
corresponding to the coded messages is equal to the capacity of
the noiseless channel. Smorodinsky [20] recently extended this
theorem to very general noiseless channels.

1

As yet little work has been done on examining the corre-
Spondence between a discrete, noisy channel with memory and the
unit square. Such an investigation is begun in this paper,
although not in this generality. We introduce the element of
memory but not noise, and examine the following particular type
of noiseless channel with memory.

Let S = (0,1,...,b-l} where b 2 2 is an integer and
let m be a positive integer. Let 3m be the set of all

*
functions f from Stn = S X S X...X S into S. We note that

m factors

m
crd 3b = bb where crd denotes the cardinal number of a set.

*
Let f 6 3m and let n be a positive integer. We de-

fine a function fn from Sn+m-l into Sn as follows. Let

_ nﬁm-l _ *
X -' X1...Xn+m-1 e S and 18C yi f (xi...xi'Hn-l)’
i = 1,...,n. Then the sequence y = yl...yn 6 Sn, and we define

*-

fn(x) = y. Note that f1 = f .

Now let x 6 (0,1] and let g xib-1 be the (unique)
nonterminating base b expansion of1 i. We define a function
f from (0,1] into [0,1] by f(x) = Elmyib-D1 where
y1 = f*(xi...xi+m_1), i = 1,2,... . In 2h: terminology of
Shannon [19], the function f is an example of a transducer of
memory m. In the terminology of Feinstein [10], Billingsley
[5 ], Khinchin [16], and others f is a noiseless, discrete
channel (or code) of memory m. We will occasionally call
{x1} the input sequence and {y1] the output (or received)

sequence; and, following Shannon, we will call f a transducer

of memory m.

In Section II we define and give the basic properties of
Hausdorff dimension and entropy. In Section III we give two
definitions of the capacity of the transducer and show that they
are equivalent. It is shown in Section IV that the dimension of
the set of all received sequences is equal to the capacity.
Finally, in Section V we define and examine the ambiguity of the
transducer.

The pound sign (#) will be used throughout this paper to
denote the end of a proof. Also, all logarithms in this paper

are to the base b.

II. Hausdorff Dimension and Entropy

In this section we give the definitions and basic properties

of Hausdorff dimension and entropy, two concepts we will be using

later.
a 0
Let x 6 (0,1] and let x = z xib 1 be the nonterminat-
i=1
ing base b expansion of x. Define bi(x) = x1 for all i, i.e.,

bi(x) is the ith digit of the base b expansion of x. A set
of the form {x: bi(x) = 81, i = l,...,n}, where 81 E S, is
denoted by [31,...,sn] and is called a cylinder of length b-n.
Note that [31,...,sn] is a half-open (open on the left) b-adic
interval of length b.n (i.e., an interval of the form

(:3 , 13%] for some j, 0 s j 3 bn - l).

The following definition of dimension in the unit interval,
which extends Hausdorff's original definition, is given by Billingsley
[5 ]. Let MA: (O,l], let a and p be positive real numbers,
and let u be a probability measure on the Borel sets of. (O,l].
Define

pamw) = inf if 9.071)“.
where the infimum is taken over all u-p'-covdrings of M, a
p-p-covering being a covering by cylinders V1 with p(vi) < p.
It is clear that ua(M,p) s ua(M,p') for p' < p, so the limit
Ham) = if; pawn)
4

exists (but may be infinite). It can be shown that for fixed M

there is an a such that p. (M) I on for a< or and p. (M) = 0
O a o a

for a >'ao. The number 00 is called the (Hausdorff) dimension

of M with respect to p and is denoted by dimu M. We will

denote Lebesgue measure by L and will write dim M instead of

dim M.

L

Some prOperties of dimension are:

(2.1) ua(M) > 0 implies dimh M 2 a;

(2.2) dimh M > 0 implies ua(M) ==m;

(2.3) ua(M) < m implies dimL M s a;

(2.4) dim“ M < 0 implies Liam) = O;

(2.5) O S dimh M s l;

(2.6) dimh M = 0 if M is countable; and

(2.7) dimh M = 1 if M is a Borel set with u(M) > O.

Dimension is useful for indicating the "size" of a noncountable,
measure zero set.

Entropy was introduced into information theory by Shannon
[19]; Kolmogorov and Sinai have extended the notion of entropy to
general measure preserving transformations (see Billingsley [55]).
Discussions of entropy in information theory may be found in
Feinstein [10] or Dym [7 ].

Let B be the Borel sets in (0,1] and define a trans-
formation T on (0,1] by Tx = [bx] for each x 6 (O,l], where

{bx} denotes the fractional part of bx. The transformation T

is a left shift on the digits of the base b expansion of x.
If p.(T-]B) = p.(B) for all B E B, T is said to be measure pre-
serving. In such a case we will say that u is stationary.

We now define the entropy of a stationary probability

measure u as follows. For each positive integer n define

Hug") = '2 H([Sls°°-ssn])log H([sls°°'ssn])s

where the summation is taken over all s ... s 6 Sn, and where

l n
we take 0 log 0 to be zero. It can be shown that the limit
H(u)
H(H) = lim _n___
nab n

exists. We call H(u) the entropy of u.

Similar definitions can be given for the unit square. De-
fine a transformation TI on (0,1] X (O,l] by T1(x,y) = (Tx,Ty)
for each (x,y) 6 (0,1] X (O,l]. As before a probability measure
P on the Borel sets of (0,1] X (0,1] will be called stationary

if T1 preserves P. For stationary P we define
Hn(P) = -2 P([sl,...,sn,r1,...,rn])log P([sl,...,sn,r1,...,rn]),

where [31,...,sn,r1,...,rn] = {(x,y): bi(x) = $1, bi(y) = ti,
1 s i s n}, and where the summation is taken over all

s ... s ,r

... r 6 8“. Again it can be shown that the limit
1 n l n

Hum ,
H(P) = lim n ,'

n—m

 

called the entropy of P, exists.

III. The Capacity of the Transducer

 

*
Let f E 35 and let f be the corresponding transducer

of memory m. Let n be a positive integer and let

N(n) = crd fn(Sn*m-1),

i.e., N(n) is the number of distinct output sequences of length
. n+m-l ,
n which correspond to the b input sequences of length
n+m-l. Following Shannon's terminology [19] we call
C = lim lQEIEIEl
nam n
the capacity of the transducer.
Theorem 3.1. The limit C exists and satisfies 0 s C s 1.
Proof: Let k and n be positive integers. There are N(k)
different ways in which a received sequence of length k+n may
begin, and at most N(n) different ways in which it may end.

Hence
(3.2) N(k+n) s N(k)N (n).
Also it is obvious that
(3.3) N(k) s N(n) for k S n.
We now follow a well-known procedure (see, for example,
Feinstein [10], p. 85) to show that C exists. Let a = inf lEBEELEl

n
and let a > 0 be given. There exists an integer r such that

$23:E££l s a.+ e- For any integer n 2 r define kn by

(kn-l)r S nr< knr. By (3.3) and (3.2) we have

log N(n) 3 log N(knr) 3 kn log N(r),

 

 

and thus
k r k r k
log N(n) S n log Nit) S ___n__(a+€) = “ (a+€).
n n r (kn-Dr kn-l

k
kn
l
1. It follows that limnsup .28;§£El s 1+3. Since 6 was arbitrary,

we have limnsup 125:!121 s a, and since legagjni 2 a for all n,

 

As n approaches m, kn approaches a and hence El approaches

wezhave limninf 123:§121 2 a. Thus C exists (and is equal to a).

Clearly C 2 0 since 12535121 2 0 for all n. As for
C s 1, we note that 125:5121 s 12532: = l for all n. #

We now apply another definition of capaCity, introduced by
Shannon [19] and expanded upon by others for noisy channels, to
our transducer.

Let T be the transformation defined in Section II, and
let u be a probability measure on the Borel sets .52 Then T
is called ergodic under u if for each B 6.6 such that
T-lB = B, u(B) is either zero or one. In such a case we will
say that u is ergodic, omitting reference to T. We will denote
by 771 the set of all probability measures on B which are
stationary (defined in Section II) and ergodic.

For each x E (O,l] define a probability measure vx on
13 by letting vx assign unit mass to the point f(x). Let
p 6 W1 and, for M 6 Ca where C, is the class of Borel sets in

(0,1] X (O,l], define

(3.4) MM) = f vx<{y: (m) e M})u(dx)-
(0.1]

To see that this integral is defined (i.e., that the integrand is
measurable) we proceed as follows. Let B be a cylinder (b-adic
interval). Then f-1(B) is the finite disjoint union of b-adic
intervals and hence f-1(B) 618. Since the class of b-adic intervals
generates 13 (i.e., the smallest o-algebra containing the class
of b-adic intervals is 6), f is measurable with respect to 5'.
Now let B be a fixed member of B. We show that the
function g(x) = vx(B) is measurable with respect to .6. The
function g assumes only two values, namely zero on the set
D = {x: f(x) d B} and one on the set E = {x: f(x) 6 B}. Thus
g will be measurable if D and E €18. Since E = f-1(B) and
f is measurable, E 616, and since D is the complement of E,
D 618. Thus vx(B), as a function of x, is measurable.
We next show that the class .B of all sets M in CI for
which the integrand in (3.4) is measurable is a monotone class.

Let M1 c: M2 C... be a sequence of sets in .3. Then

vx({y= (my) 6 L: MiD = vxaib': (my) 6 Mi}) = 11m vx({y: (my) 6 Mil)

and hence U M1 6.5 since vx({y: (x,y) E U Mi})’ being the limit
1 i

of measurable functions, is measurable. Similarly 0 Mi E-D for

a decreasing sequence M1 :3 M2 :3... of sets in .8. 1Thus .3 is a
monotone class. Also .0 contains all rectangles B X C, B,C 6.6,
since the set {y: (x,y) E B X C} is either empty or C, and we

showed above that vx(C) is~measurable. It follows easily that

.8 contains the finite, disjoint unions of rectangles B X C,

lO

B,C E18. Thus by a well-known result (see, for example, Kingman
and Taylor [17], p.18), we have .8 = Ca Hence the integrand in
(3.4) is measurable for all M 6 CL

It is easily verified that P(M) = u(projx{M n graph of f}),
where projx denotes the projection on the x-axis, and that P
is a probability measure. Also the set function A defined, for

B 648, by
(3.5) X(B) = P((0,1] X B)

is easily seen to be a probability measure. We note that
MB) = u({x: f(x) 6 3}).
Theorem 3.6. P is stationary and A 6 77¢.

Proof: We show first that for any B 676 we have
-1 -l
(3.7) [x: f(x) 6 T B} = T [x: f(x) 6 B}.

Let A be the set on the left hand side of (3.7) and let A' be
the set on the right hand side of (3.7). Let x E A and let

xi = bi(x), i = 1,2,..., i.e., x1 is the ith digit of the non-
terminating base b expansion of x. Then f(x) 6 T-lB which
implies that f(x) = .y yly2 ... where y E S and the point
.ylyz... E B. Thus f(Tx) = f(.x2x3...) = .y1y2... E B which is

the condition that x 6 A'. Therefore
(3.8) A<: A'.

Now let x E A' and let xi = bi(x), i = 1,2,... . Then
f(Tx) E B, i.e., f(.x2x3...) = .y1y2... where .ylyz... 6 B. It

follows that f(.x1x2...) = .y ylyz... for some y E S, or

11

f (x) E T-1B. Thus x E A so
(3.9) A' CIA.

Inclusions (3.8) and (3.9) give (3.7).

Now for any B E B,

u({x: f(x) 6 T-lB])

x(T'lB)

u<T'1{x: f(x) 6 Bl)

H({Xt f(x) 6 Bl)

A(B),

where the next to the last equality follows by the stationarity
of n. Thus A is stationary.

If B 648 is such that T-lB = B, then by (3.7) we have

T-1[x: f(x) 6 B} = {x: f(x) 6 T-1B}

= {x: f(x) 6 B}.

It follows by the ergodicity of u that x(B) (which is equal to
u([x: f(x) 6 B])) is either zero or one. Thus I is ergodic.

To see that P is stationary it suffices to prove that
-l
P(T1 (B x C)) = P(B x C)
for all B,C €18 (Billingsley [5 ], p. 4). We see that

P(T-IB x T'IC)

P(T;1(B x C))

u<projx{<T'IB x T'1c> n graph of f})

u({x: f(x) 6 T-IC} ﬂ T-1B)

12

n(T'1{x: f(x) 6 C} n T-IB)

u(T'1<{x: f(x) 6 c} n 3))

u([x: f(x) 6 C} O B)

P(B x C). #

Now for p EEWL all three measures u, A, and P are

stationary so their entropies are defined. We let

(3-10) Ru = H(u) + H(x) - H(P)
and
(3-11) 0' = Sup R .

new

The number R9 is called the rate of transmission of the trans-
ducer with reSpect to u. See Billingsley [5:] for an intuitive
interpretation of R“. We now show that C' can be called the
transducer capacity by proving the following theorem.

Theorem 3.12. C' = C.

(2532;: For any integer n 2 l, the triple (Sn+m-l’ Sn, fn) forms
a discrete, noiseless, memoryless Channel, where we think of a
transmitted sequence x E Srﬂm"1 being received as the sequence
fn(x) 6 S“. The capacity Cn of this channel is easily computed
to be log N(n) (see Feinstein [10] for the definition of capacity
of a memoryless channel). Feinstein [11] has shown that

lim‘;9 3 0'. But since lim'EE = 11m.128:ﬂiﬂl = C, we have C = C'. #

11-1» 11"” [I‘m

IV. The Dimension of the Received Set

 

 

Let f* 6 3m and let f be the correSponding transducer
of memory m. Let Y = f((O,l]) be the range of f (Y is the
collection of all possible received sequences). In this section
we show that the Hausdorff dimension of Y is C, the transducer
capacity.
ngmg_ﬂ;l, The expression (3.10) for the rate of transmission
R“ of the transducer reduces to Ru = H(x).

Proof: Since by (3.10),

Ru = H(u) + H(A) ' H(P):

we must show that H(u) - H(P) = 0. We will make use of the
following two facts: if p,q, and r are positive real numbers

then

(4-2) (P+q)log(p+q) 2 p 10g p + q 10g a

and
(4.3) (P+q)108(P+<I) + r 10g r S (p+q+r)10g(p+q+'r).

Both facts follow from the monotonicity of the logarithm.

Now

“H(P) = 'ZP([xls'°°sxnsyls°°°syn])log P([x1:°°'sxnsy19°--syn])

(4.4) -Zu([x1,...,xn]ﬂ{x: f(x)€[y1,...,yn]])
10g H([xls°°°sxn]n{xz f(x)6[y1"°°:yn]})-

l3

14

The set [x1,...,xn] n {x: f(x) 6 [y1,...,yn]] is either empty
or is the disjoint union of cylinders (b-adic intervals) of
- +m-l
length b (n ). Then by using (4.2) and (4.3) on the summation

(4.4) it is seen that

Ham) 3 Hn(P) s H (u).

n+mrl

Thus

Hum) H (P)

s lim

 

 

H(u) = lim
H

= H(P)

H (p) H (u)
s lim -Ei%;l——- = lim

n—m
Hence H(u) = H(P). #

Theorem 4.5. dim Y = C.

 

Proof: Clearly, for each positive integer n, N(n) b-adic intervals
of length b.n will cover Y. Let p > O and c > 0 be given,

and choose a positive integer k such that b.k < p and

1
C +'e > -23ﬁESEL . Then
-k c+
LC+€(Y,p) sN(k)b ( 3’
<N(k)b'1°g “(10 = 1.

Since p was arbitrary, it follows that LC+e(Y) s l, and thus
dim Y s C + e by (2.3). Since 6 was arbitrary, we have
dim'Y S C.
We now show that dim Y 2 C. Let s > 0 be given and choose
u Ein so that Ru >'C' - 6. Then by Theorem 3.12 and Lemma 4.1

we have H(x) > C - so Now A E‘m by Theorem 3.6 and thus the

15

Shannon-McMillan-Breiman theorem (see Billingsley [55]) shows that
, l
(4.6) 11m - E'log k([b1(y),...,bn(y)]) = H(X) a.e. [A]-
n—m
Let M be the set of y's for which (4.6) holds. Then
M n Y C M and by a general theorem due to Billingsley [5 ],

p. 141, we have
dim M n Y = H(x)dime n Y.

Now A(M) = 1 and 1(Y) = p({x: f(x) 6 Y}) = u((O,1]) = 1 so

1(M 0 Y) = 1. Thus dime1W'Y = 1 by (2.7), so
dimMnY=H(>\)>C'€o

Since a was arbitrary, we have dim M n Y 2 C, and since
MnYCY, we have dimdeimMnYzc. #

We remark that dim M.n Y = H(x) can also be shown by
using a theorem of Dym [ 7, Theorem 2]. Dym provides a direct
proof not involving Billingsley's general theorem.

We also note that both definitions of capacity given in
Section III were used in proving Theorem 4.5. The C definition
was used in showing dim Y s C and the C' definition was used
in showing dim Y 2 C.

We next construct a set Y' which has the same dimension
as Y and which illustrates the structure of the received set.
For each integer n 2 1, let II’I;"°°’IN(n) be the N(n) closed

b-adic intervals of length b.n which cover Y. Let

n n n
Yn - I1 U 12 U...U IN(n)’

l6

and let

It is clear that Y' :>Y since Yn D‘Y for all n, and since the
dimension of a set is not changed by the addition of a countable
number of points, the following theorem shows that dim Y' = dim Y.

Theorem 4.7. The set difference Y' - Y is countable.

 

.Proof: If Y' - Y = ¢ the theorem is true. Thus we assume

Y' - Y # ¢ and we let y E Y' - Y. We assume for the moment that

y is irrational; then the expansion y = 2 yib.1 is unique.

i=1
For each integer n 2 1 there exists an integer jn
satisfying 1 s jn s N(n) such that y E I: . Let En = f-1(1; );
n n
En is the union of those half-open (open on the left) b-adic
- +m-l
intervals of length b (n ) which map under E into I? .
n
Since y is not an endpoint of any In , we have In 33 I:+l ,
jn jn n+1
and thus EnID En+l'
By the finite intersection property (see, for example,
9
Hocking and Young [15], p. 19) E = n c En # ¢, where c En is
n=l m
the closure of En' Since y 4 Y, E' = f] En = ¢. Let x E E;
n=l

then x E c En for all n. Since E' = ¢, there exists a positive
integer M such that x 4 En for all n 2 M. Therefore, for all
n 2 M, x is the left hand endpoint of one of the intervals
comprising En. Thus x is a b-adic point.

Let x = :E‘EIxib-i be the terminating expansion of x.

Then since for any n 2 M, x1,...,xn+m_1 represent one of the

intervals of En, we have fn(x1 ... xn+m-l) = y1 ... yn. But

since xi = O for all i larger than some positive integer, we

l7

* .
must have y. = f (O...O) for all i larger than some positive
1 E—EETOS

integer. This is a contradiction since the expansion of y, y
being irrational, is nonrepeating. Therefore, since assuming that
the set Y' - Y contained an irrational point led to a contradic-
tion, Y' - Y must contain only rational points. #

That the set Y' - Y may indeed be nonempty is shown by
the following example.

*
Example 4.8. Let b = 3, m = 1, and let f E 31 be defined

* * *
by f (O) = 1, f (l) = O, and f (2) = 2. Then the ternary
m -
fraction y = 2 yiB 1, y1 = l for all i, belongs to Y' since
i=1
for each n the point f(.O...O 111...) matches y in the first

n zeros
n places. However, y E Y since the point 0 is not in the

domain of f, and no other point can possibly map into y.

V. The Ambiguity g£_the Transducer

*
Let f E 3m, m 2 l, and let f be the corresponding trans-

ducer of memory m. Let y 6 (0,1] and for n 2 1, let

Mao) = crd f;1(b1(y)...bn<y)).

i.e., Mn(y) is the number of input sequences of length n+m-l

which map into the output sequence b (y)...bn(y). In this section

log Mn(y
we examine the quantity lim-—
n-m

transducer at the point y, and we examine the dimension of the

, called the ambiguity of the

set M(y) = f-1(y) = {x: f(x) = y}. We will call M(y) the
ambiguity set of y.
When y is a b-adic point, the set M(y) consists of two

disjoint parts, namely,

*
(5.1) A = {x: f (bi(x)...bi+m_1(x)) is the 1th digit of the non-
terminating expansion of y, for all i 2 l}
and
* b c o o o
(5.2) B - {x: f (bi(X)--- i+m_1(x)) is the 1th digit of the

terminating expansion of y, for all i 2 1].

In this case we wish to consider only the set A so, for y a
b-adic point, we redefine- M(y) to be the set A.

For the moment we restrict our attention to the case m 2 2.
We begin by defining a set of matrices and stating a theorem which

allow computation of Mn(y). These definitions and the theorem

18

19

appear in a report by Hedlund [13].

Let r and s be integers satisfying 0 s r s bm”1 -l,

rlrz...rmm1 and $152...sm_1 be the

b-adic representations of r and s respectively, i.e.,

o s s s b”'1 - 1, and let

and

= x1X2"'xk E 8k is said to begin with r and end

with 3 provided the initial (m—l)-sequence of x is r ...r

A sequence x

1 m-l
and the terminal (m-l)-sequence of x is sl"'Sm-l'
Let A1 = (a:s), 0 s i s b-l, be the square matrices of order
bm“1 defined as follows. For 0 s r, s s bm-1 - l, ai is the

['8

- * - '
number of members of f11(i) = (f ) 1(i) which begin with r
and end with 3.

Theorem 5.3. Let y = yl-o-yk E Sk and 18t W(Y) =‘W(Y1---Yk) =

y1 y2 YR
A A ...A . Then wrs’ the r,s entry of W(y), O s r,

s S bm-1 - l, is the number of members of fk1(yl'°°yk) which
begin with r and end with s.
12522;: The proof is by induction on the length k of the sequences.
See Hedlund [13]. #

By the weight of a matrix A, denoted by \A‘, we will mean
the sum of all the entries of A.
Corollary 5.4. If y E (O,l], then Mh(y) = |W(b1(y)...bn(y))‘,

We now show that the ambiguity exists and has the same value

for almost all y. We begin with a lemma.

20

Lemma 5.5. Let A = (a ) and B = (bi ) be two n X n matrices

ij j
with nonnegative entries. Then ‘AB‘ 3 IAl‘B‘.

Proof: We see that

n 1'1 II

(5.6) [A3 = 2 2 z a. b .
‘ i=1 j=l k=l 1k k3
and
““ nn nn
(5.7) A B = ( 2 2 a. )( 2 z b..).
i=1 j=l lj i=1 j=l 13

It is clear that each term in the summation (5.6) appears on the
right hand side of (5.7), and since all terms are nonnegative we
have |AB| s |A||B|. #

We next state a theorem of Furstenberg and Kesten [12,
Theorem 2] which is needed for the proof of Theorem 5.8.
Theorem (Furstenberg and Kesten). Let X1,X2,... be a metrically
' transitive, stationary stochastic process with values in the set
of k X k matrices. Define the norm of a matrix A = (aij) by
ij" and for any positive real number t, let

NA“ = max 2 la
+ i j + 1
log t = max(log t,0). If E(log “X H) < m, then

m logux'sc2'1...x1u = D

1i
W
with probability one, where D is equal to
an-l 1
11m-§$1°3“x ...X H)
n

n—m
Furstenberg and Kesten [12, Theorem 1]).

(a limit which is shown to exist by

Theorem 5.8. Let f be a transducer of memory m 2 2, let u 61m,
108 Mn(y)
and let A be defined by (3.5). Then lim
new
has the same value for almost all y [1]-

 

exists and

21

Proof: We define a stochastic process Y1,Y2,... with domain

m-l x bm-l

(0,1] and values in the set of b matrices as follows.

b (5')
For y 6 (0,1] define Yn(y) = ATn , where the subscript T
denotes the tranSpose operation. By Theorem 3.6 we have A 67W-
Then by the Furstenberg and Kesten theorem we have
(5.9) lim l logHY (y)...Y (y)\\
n n l
n—m
exists and has the same value for almost all y [A]- We may re-

write (5.9) as

b (y) bn ()0

lim l'logH(A 1 ...A )T“
n—uao
or
1
(5.10) :13 H logHWT(b1(y) . . .bn(y))u.

The only property of the norm used by Furstenberg and Kesten
in their proofs is that “AB“ 5 HAHHB“ for any two mabrices A and
B. Since the matrices A1 are all nonnegative, we have the same
inequality when the norm is replaced by the weight (Lemma 5.5).
Hence we may use the weight instead of the norm in (5.10). Then,

noting that

le<b1<y)...bn<y))l = Iwo1(y)...bn(y))\,
log M (y)
and using Corollary 5.4, we have lim-——-—E-——— exists and has
n—m

the same value for almost all y [A]- #

If we let D be the limit in Theorem 5.8, we have the

following theorem.

...—II

I: v v 7.7

22

Theorem 5.11. Let f be a transducer of memory m 2 2, let
p. 6771, and let A be defined by (3.5). Then dim M(y) s D for

almost all y [1]-

108 M (y)
Proof: Let E = {y: lim ___—EE—__ = D}. By Theorem 5.8, x(E) = l.
n—m
Let y E E. Clearly, for each positive integer n, Mn(y) b-adic
-(n-im-1)

intervals of length b will cover M(y). (It is here
that we use the fact that for y a b-adic point, M(y) is defined
by the set (5.1). See Example 5.13). Now.let p > 0 and s > 0
be given, and choose a positive integer k such that

_ _ 108 M (y)
b (k+m 1) < p k

and D +'e > k . Then

LNG (M(y) .p) s Mk(y)b’ (Hm-1) (D+e)

'108 (y) _ _
< Mk(y)b Mk b (m 1) (mg) g 1.

Since p was arbitrary, it follows that LD+€(M(y)) s l, and thus

dim M(y) S D+e by (2.3). Since a was arbitrary, we have
(5.12) dim M(y) s D.

Since (5,12) holds for all y E E, the proof is completed. #
Example 5.13. We demonstrate in this example why for y a b-adic
point, M(y) is defined by the set (5.1). Let b = 2, m = 2, and
let f* 6 32 be defined by 5*(00) = £*(11) = o and

f*(Ol) = f*(lO) = 1. Let y be the dyadic point .0111..., and
note that .1000... is also equal to y. Now

M1(y) = crd(f*)-1(O) = crd{00,1l} = 2. The two intervals (0,1/4]
and (3/4,l] represented by the two sequences 00 and 11 do not
cover f-1(y) since the point x = .0111... belongs to f-1(y)

but does not belong to either (0,1/4] or (3/4,l]. However, if

wearer-7F-

23

M(y) is defined to be the set (5.1) and not the set f-1(y),

then it is clear that for all n 2 1, the M (Y) intervals
n

represented by the set of sequences f;1(bl(y)...bn(y)) will
cover M(y), a property which is essential for the proof of
Theorem 5.11.

The discussion so far in this section has been for trans-
ducers of memory m 2 2, since the matrices A1 are not defined
for m = 1. We now examine the case m = 1 separately.

* .
Let f 6 3h with f the corresponding transducer of

).

*
memory 1. We represent f by a b X b matrix D = (di

J

*
0 s i, j s b-l, as follows. The entry di is one if f (i) = j

J

*
and is zero if f (i) # j. Thus each row of D contains a single
entry of one and all other entries of the row are zero. Define
b-l

L. = 2 d,,, 0 s j s b-l; L, is the number of elements of S
i=0 ‘3 J

*
which map to j under f .
Let u 6 W2 and define pi = p([i]), 0 s i s b-l and

q. = 2 d ,p , O S i s b-l. We note that 1 defined by (3.5)
1 1:0 ji j .
is such that k([i]) = qi’ 0 s i s b-l, since
A([i]) = p({x: f(x) 6 [i]]) = z'u([j]) = 91’ where 2' denotes
*

the summation taken over those j for which f (j) = i.

We now prove the equivalent of Theorem 5.8 for the case
m = l. The proof is similar to that of Theorem 5.8, using Li
instead of A1 and the ergodic theorem instead of the Furstenberg

and Kesten theorem.

Theorem 5.14. Let f be a transducer of memory 1, let u 67%,

 

and let A be defined by (3.5). Then the ambiguity at the point

108 Mn(y)

n

y, lim
n—a

, exists and has the same value (namely,

24

l +...+' .
qo 0g ‘0 qb_llog L ) for almost all y [A]

b-l
'nggf: Let h:(y) be the number of occurrences of i,

O s i s b-l, among the first n digits of the nonterminating
base b expansion of y. By Theorem 3.6, A 6 711 and we may
apply the pointwise ergodic theorem (see Billingsley [5 ], p. 13)
to conclude that for each i, 0 S i s b-l,

h:(y)

n

 

(5.15) lim

n—cm

for almost all y [x]. Let D., 0 s i s b-l, be the set of y's
1 i
b-1 hn(y)
for which (5.15) holds. Then for y 6 F = n Di’ lim n = q1
i=0 new
for all i, and x(F) = 1 since x(Di) = l, O s i S b-l.

 

Now since for each n 2 l we obviously have

h°(y) hz'l(y)

_ n
Mn(y) - to ...L

 

b-l ’
then
0 b-l
108 Mn(Y) hniy) ha (y)
_——_= +00. ——
n n log Lo + n log Lb-l’

where we take 00 to be one and 0 log 0 to be zero. Hence if
108 Pg1(y)
y 6 F, we have lim-—-————-—- exists and is equal to
new n

qolog Lo +...+qb 1log L Since x(F) = l, the proof is com-

b-l'
pleted. #

We now prove the equivalent of Theorem 5.11 for the case
m = 1. In this case we are able to obtain an equality for the
dimension of the ambiguity set rather than just an upper bound.
We begin with a lemma.
Lemma 5.16. Let y 6 (0,1] and let ED % U'[x1,...,xn] where

U' denotes the union over those x .,xn such that

1,..

25

fn(xl...xn) = bl(y)...bn(y) (there are clearly Mn(y) such

sequences x1,...,xn). Then

a:

0 En = M(y)-

n=l

*
Proof: Let x 6 M(y). Then f (b(xi)) equals the ith digit of
*

the nonterminating expansion of y, i.e., f (b(xi)) = bi(y).
Thus fn(b1(x)...bn(x)) = b1(y)...bn(y) for all n. Hence
x 6 En so M(y)<: En for all n. Therefore

(5.17) M(y) C: Q En.
n=l

Let x E 0 En. Then fn(bl(x)...bn(x)) = b1(y)...bn(y)
n=1
for all n so f*(bn(x)) = bn(y) for all n. Thus x 6 M(y)

SC

(5.18) En<: M(y).

l

:
ll:)8

Inclusions (5.17) and (5.18) give the desired result. #

Theorem 5.19. Let f be a transducer of memory 1, let u 61%,

 

and let A be defined by (3.5). Then

b-l
dim M(y) = 2: qilos Li
i=0

for almost all y [1]-
Proof: Let y 6 (O,l] and let y1 = b1(y) for all i 2 1. For

k
each integer k 2 l and each sequence ...xk 6 S , define a

x1
on S by

function pk

26

l/Mk(y) = 1/t,y ...L

if fk(x1...xk) = y1..

-Y

1 YR k
pk(x1...xk) =
0 otherwise.

It is clear that
(5.20) pk(xl...xk) 2 0 for all k.

Also we have
(5.21) z p1(i) = L /L = 1.

iES y1 y1

We show next that

(5.22) .2 pk+1(x1...xk1) = pk(x1...xk).
168
If pk(x1...xk) = 0 then fk(xl...xk) # yl...yk so
fk+1(x1...xki) f yl...ykyk‘+1 for all i 6 8. Hence pk+1(x1...xk1)
for all i 6 S so (5.22) is true in this case. If
pk(x1...xk) = l/LYI...Lyn then fk*1(x1...xki) = yl...ykyk+1 for
the L . . * . _
yk+l values of 1 for which f (1) yk+l° Hence
p (x ...x i) is l/L ...L L for those L values of
k“ 1 k y1 yk yR+1 yR+1

i and is zero for the remaining values of 1. Therefore

2? (Knox i)=L /L ...x,
165 k+1 1 k y1.4-1 y1

so (5.22) is also true in this case.
Finally we see that for any seq

of S, we have

(5.23) lim pk+n(x1...xk
n—cco

yR+1

uence

= pk(x1...xk)

X

1". k

.,x of elements

m, u~‘._AM

27

It follows from (5.20), (5.21), (5.22) and (5.23) that

there exists a probability measure vy on the unit interval such

that

vy([x1,. .. ,xk]) = pk(x1. . .xk) ,

k
where x1...xk 6 S (see Billingsley [5 ], p. 35).
Now let F be the set of y's for which Theorem 5.14
holds, and let y 6 F. If x 6 M(y) and if we set xi = bi(x)

and yi = bi(y), then

 

 

 

108(1/L o a 0L )
log v ([x ,...,x ]) y y
lim - ,‘y 1 n = lim - 1 n
n n
[1‘03 n—m
10 C C C
8 Lyl Lyn log Mn(y) b-1
= lim = lim ---———- = Z qolog L..
n n . i i
nao new i=0

The next to the last equality follows from the fact that we

obviously have Mn(Y) = L

...L , and the last equality follows
yl yn
from Theorem 5.14. By a theorem of Billingsley [55], p. 141, we
now have
b-l
(5-24) dim M(y) = dim M(y) 2 q.log L.
v i i
y i=0

for all y 6 F.

If En is defined as in Lemma 5.15, it is seen that

an
vy(En) - 1. Since gllEn = M(y) by Lemma 5.15 and Since {En}

is a decreasing sequence, we have vy(M(y)) = 1. Hence

dimv M(y) = l by (2.7) and thus from (5.24) we have
Y
b-l
dim M(y) = 2 qilog L.
i=0 1

28

for all y 6 F. Since x(F) = 1 by Theorem 5.14, the proof is
completed. #

For the case m = l we were able to calculate the dimension
of the ambiguity set M(y) (Theorem 5.19), whereas in the case
m 2 2, we were able to obtain only an upper bound on dim M(y)
(Theorem 5.11). Conjecture is that for the case m 2 2, dim M(y)
is actually equal to this upper bound. However, the method of
proof of Theorem 5.19 cannot be used to prove this conjecture,
since the consistency condition (5.22) may not hold for m 2 2.
The following example demonstrates this fact.

*
Example 5.25. Let b = 2, m = 2, and let f 6 32 be defined

 

* * * *
by f (00) = o and f (01) = f (10) = f (11) = 1. The function

f2 is then as follows:

Domain Value Functional Value

HHHHOCOO
l—‘HOOl-‘HOO
i-‘i-‘t-‘i-‘t-‘D-‘OO
l-‘I-‘t-‘Ol-‘r-‘I—‘O

0
l
O
l
O
l
O
1

Let y = .11; we see that M1(y) = crd {01, 10, 11] 3 and
M2(y) = crd {010, 011, 101, 110, 111} = 5. If we define (as in

the proof of Theorem 5.19) p2(00) = O, p2(01) = p2(10) = p2(ll) = 1/3
and p3(000) = p3(OOl) = p3(lOO) = o, p(010) = p(Oll) = p(lOl) =
p(110) = p(lll) = 1/5, then it is clear that the consistency con-
dition (5.22) does not hold. Thus for any y 6 (3/4,l], we cannot

define a measure Vy as we did in Theorem 5.19.

VI. Conclusion

This paper has begun an examination of the relationship
between an information channel and the unit square. In partic-
ular, we have examined a special type of noiseless channel with
memory called a transducer. This same transducer has been
examined from a topological dynamics standpoint by Hedlund [14].
We now comment on our results and mention some difficulties
encountered.

Theorem 4.5 shows that capacity is a reasonable term for
the quantity C in the following sense. The "size" (dimension)
of the set of all possible received sequences is a measure of
the "capacity" of the transducer, since under no circumstances
can the transducer be made to create new output sequences and
hence increase the "size" of this set. This interpretation of
"size" as "capacity", and the fact that the quantity C is
equal to this "size" (Theorem 4.5), make capacity a reasonable
term for C.

Attempts at calculating C explicitly by finding a
difference equation satisfied by N(n) were unsuccessful. During
these attempts it was discovered that the difference equation
method used by Shannon [19, Appendix 1] for his finite state
sources calculates the number of state sequences of a given
length rather than the number of output sequences as desired.
Conant [6] is also aware of this problem.

29

3O

Theorems 5.8, 5.11, 5.14, and 5.19 show that the transducer
has a homogeneity property since almost all received sequences
have the same ambiguity. The usual quantity used for ambiguity
in information theory is a conditional entrOpy (called the equi-
vocation by most authors) which represents the ambiguity averaged
over all received sequences. Our pointwise ambiguity has a
geometric interpretation and gives results involving almost every-
where statements which are somewhat more satisfying than state-
ments involving averages. Yet to be resolved is the question of
whether or not the inequality of Theorem 5.11 is actually an
equality.

One direction in which our work may be extended is to con-
sider more general channels by introducing the element of noise.
Another direction is to consider how to make the transducer
invertible, i.e., find a subset of the unit interval of measure
one such that the tranSducer is a one-to-one function onto the

set of all received sequences when restricted to this subset.

10.

ll.

12.

13.

BIBLIOGRAPHY

Besicovitch, A.S., On the sum of digits of real numbers
represented in the dyadic system, Math. App. 110(1935),
321-330.

Billingsley, P., Hausdorff dimension in probability theory,
111. J, Math. 4(1960), 187-209.

, Hausdorff dimension in probability theory II, 111.
i. Math. 5(1961), 291-298.

, On the coding theorem for the noiseless channel,
Ann. Math. Statist. 32(1961), 594-601.

, Ergodic Theory and Information, John.Wi1ey and
Sons, New York (1965).

Conant, R.C., Channel capacity of Meore automate,
Information and Control 12(1968), 453-465.

Dym, H., On a class of monotone functions generated by
ergodic sequences, Amer. Math. Monthly 75(1968), 594-601.

Eggleston, H.G., The fractional dimension of a set defined
by decimal properties, Quart. J, Math. Oxford Ser. 20(1949),
31-360

, Sets of fractional dimensions which occur in some
problems of nunber theory, Proc. London Math. Soc. (2) 54
(1952), 42-93.

Feinstein, A., Foundations pf Information Theory, McGraw-
Hill, New York (1958).

, 0n the coding theorem and its converse for finite-
memory channels, lpformation and Control 2(1959), 25-44.

Furstenberg, H. and H. Kesten, Products of random matrices,
Ann. Math. Statist. 31(1960), 457-469.

Hedlund, G.A., Mappings on sequence spaces, Communications

Research Division Technical Report No. 1, von Neumann Hall,
Princeton, New Jersey, February 1961.

31

14.

15.

16.

17.

18.

19.

20.

32

Hedlund, G.A., Transformations commuting with the shift,
in Topological Dynamics, J. Auslander and W. Gottschalk,
eds., W.A. Benjamin, New York (1968), 259-289.

Hocking, J. and G. Young, Topology, AddisonJWesley, Reading,
Massachusetts (1961).

Khinchin, A.I., Mathematical Foundations pf Information
Theory, Dover, New York (1957).

 

Kingman, J.F.C. and S.J. Taylor, Introduction £2.Measure
and Probabiligy, Cambridge University Press, Cambridge
(1966).

Kinney, J.R., Singular functions associated with Markov
chains, Proc. Amer. Math. Soc. 9(1958), 603-608.

Shannon, C.E. and W. Weaver, The Mathematical Theory pf
Communication, University of Illinois Press, Urbana,(l964).

(reprinted from Bell System Tech. g, 27(1948), 379-423, 623-
656.)

Smorodinsky, H., The capacity of a general noiseless channel
and its connection with Hausdorff dimension, Proc. Amer.
Math. Soc. 19(1968), 1247-1254.