.\_

‘ 1‘,"
‘v HE. ‘

Inn
V 'x 4-.

 

w
I I ‘
""«s~ Viv-
”nun.” .
u ”.1" \
va

a??? ‘
1;;‘7‘0‘7"
‘.

u
‘ .
“~

In _\,

.V,

x
1

Iv 5-"in ( '
- ,M‘u ‘1 \‘
3mm!" ’1-

l f ‘1

w' . 31"
' ' 1 4
’ "V ‘ . .V‘Y'Sv.
‘If-‘ﬂlﬁ‘l'f.

l' r w
‘ Alli!

1
H x \I,
”1'..-
("A
I
if;

,I . H
‘fwyfé l ‘ f ‘w‘. 1
S (44‘s't‘J-XU4‘MI’.‘ ‘

. 7‘ #3914014:

' I 7 ‘l I
' -,. u

l .

dual,” ‘
H' .s’l 2t
:2‘Ijt‘,’t ‘"

a,
4‘“: ‘
E W

Lib.
mid]
w».
‘ pxr‘.‘
. a'fg’A‘Q'

;.

,
MM"
44‘ '
w-
i"

. ._ . ,
,‘(Q‘N‘Kb'

‘ '4
‘4'}
,r

v

l «’14
‘ $741.
, 4»

II

2

, J l ( :r‘
3‘3:
.

”(Hr

’5’! 'Y'
{My

{WAY

I
u" n
Gill‘v‘}! '
4 §
(

E

. "I ' ‘
v‘: ﬂy}

3)} ‘
V» .v
.gv“:

‘ 0”}qu
I.“ {(435 ‘ d
l‘ 4 1 . ‘

n

; {5.}???
5: ' '3',“
ﬁn???“

J
, .
‘ M“

l“’
r:

....
"um. ‘

N’ﬂ'ﬂ

.a
14%:
‘aw

7AA,"

 

 

 

 

~ K."
'1'; u
"\ J‘
~ I.“~\‘

\ a‘.

V ‘I "
‘ . ‘ x ,
h 3:9533‘2’!‘
‘1‘ 4h V};

v
. v ; i

. ‘6‘
5:.“ ’3‘ ‘,
vx‘ l‘q»
~ 1?in
.mva‘v
,.v i» i
V ' ‘s
2‘».

I; .
4V;

4 v"
0".

no!
‘ .

\.;.
31’» ‘I. ‘4 ‘ 1“
‘4 Ma: 1'"

.1 {.1}, y ,I
, 43‘s“ .,.

31,1!44- ';
J. \V,‘
. r‘.

a

.‘ ‘

.. 4 .4-, A1).

’ I , :- w“
“(Ma 15‘1”" “"5?” ‘ ‘ 'Tt‘qf."~;‘,¢‘u,
(,Hu‘lp‘. l.,“l(‘l . ' ,l (K;
“I. "act”; '9' Q6,“ \
A345. 34} 0 7“,";

1-

 

y’k‘
(Fig:
"' ‘1‘. 4‘ FHA].
".
ﬂa- 1‘

l.-

 

 

J

[Willilllﬂ/lllllllllHill!!!llIll/IllH/ll/lll

93 00552 6797

This is to certify that the

dissertation entitled

DATA DISTRIBUTION STRATEGIES
FOR PARALLEL DATABASE ACCESSES

presented by
Myoungho Kim
has been accepted towards fulﬁllment

of the requirements for

#degree in om u er ience

MM

Major professor

 

Date ML

MSU is an Affirmative Action/Equal Opportunity Institution 0—12771

 

 

LIBRARY
Michigan State
University

 

 

 

A UM,...n._ 4‘_4

 

 

 

MSU

LIBRARIES
m

 

 

RETURNING MATERIALS:
Place in book drop to
remove this checkout from
your record. FINES will
be charged if book is
returned after the date
stamped below.

 

 

 

 

 

DATA DISTRIBUTION STRATEGIES
FOR PARALLEL DATABASE ACCESSES

By

Myoungho Kim

A DISSERTATION
Submitted to
Michigan State University
in partial fulﬁllment of the requirements

for the degree of

DOCTOR OF PHILOSOPHY

Department of Computer Science

1989

‘- , If -'
, Z» ”N i

57;:

ABSTRACT

DATA DISTRIBUTION STRATEGIES
FOR PARALLEL DATABASE ACCESSES

By

Myoungho Kim

With the advent of commercially available general purpose multiprocessing sys-
tems, the need for developing appropriate information processing systems are increas-
ingly recognized. Since many database applications require a large number of data
accesses with relatively less computation, exploiting parallel data accesses is important
to improve performance in parallel processing database systems. In this thesis we
investigate data distribution strategies for parallel processing of database systems. The
primary objective is to maximize throughput and minimize response time through con-
current data accesses. We propose database processing models as a general framework,
and then present data distribution strategies for three common types of database applica—
tions. Two of these applications are on multikey hash ﬁles and the third application is
on B-tree accesses. First, we present data distribution strategies for partial match
queries. The main contribution here is the development of new data distribution
methods called Fieldwise eXclusive-or (FX) distribution methods to achieve optimal ﬁle
distribution. An algebraic property of exclusive-or Operation along with ﬁeld transfor-
mation techniques are fundamental to these data distribution methods. We show that
the proposed data distribution methods perform better than the others proposed in the
past. We also present efﬁcient data construction methods based on the usage of multi-
key hash directory. Second, optimal distribution for parallel processing of multiattri-

bute range queries is investigated. Here, we show that for various types of

 

multiattribute range queries there are inherent limitations in achieving optimal distribu-
tion. We extend FX distribution methods to achieve optimal distribution for many use-
ful multiattribute range queries. For both partial match queries and multiattribute range
queries, sufﬁcient conditions for optimal distribution by the proposed distribution
methods are given. Finally, we present node partitioning schemes for B-tree type
indexes. The objective here is to develop a new parallel processing scheme for B-tree
type indexes stored in parallel disks. We show that parallel processing of the proposed
partitioned node B-trees performs better than parallel processing of conventional B-
trees. This work presents a new basis on which parallel processing systems for other

database applications can be designed.

To my parents Han Cheol Kim and
Kye Soon Lee

iv

ACKNOWLEDGEMENTS

I wish to express my appreciation to my advisor, Dr. Sakti Pramanik, for his gui-
dance over the years. Dr. Pramanik provided valuable comments and constant support
in every phase of the preparation of this work. I would also like to thank Dr. Lionel M.
Ni and Dr. Abdol Esfahanian for their many valuable suggestions and comments. I also
wish to express my appreciation to Dr. Dorian Feldman for his encouragement and sup-
port. Without their intellectual advice and inspiration, this dissertation would not have
been possible. I would like to thank all the people who helped me during my stay at

Michigan State University.

Finally, I would like to thank my parents, my wife and my son for their constant

support, patience, and love.

TABLE OF CONTENTS

List of Tables ............................................................................................................ ix
List of Figures .......................................................................................................... x
Chapter 1. Introduction .......................................................................................... 1
1.1. Database Machines ......................................................................................... 1
1.2. Main Memory Databases ................................................................................ 3
1.3. Concurrent Dynamic Search Structures ......................................................... 3
1.4. Classiﬁcation of Database Queries ................................................................. 4
1.5. Problem Statement .......................................................................................... 5
1.6. Thesis Overview ............................................................................................. 6
Chapter 2. Database Processing Models for Parallel Processing ........................ 8
2.1. General Parallel Processing Model for Database Systems ............................. 8
2.2. Two Stage Parallel Processing Model for Database Systems ........................ 10
Chapter 3. Parallel Processing Strategies for Partial Match Queries ................ 14
3.1. Introduction .................................................................................................... 14
3.2. Deﬁnitions and Terminology .......................................................................... 16
3.3. Basic FX Distribution Method ....................................................................... 19
3.4. Extended FX Distribution Methods ................................................................ 24
3.4.1. Field Transformation Functions for Partial Match Queries .................... 25
3.4.2. I and U Field Transformation Functions ................................................. 27
3.4.3. I and I Ux Field Transformation Functions .............................................. 29

vi

3.4.4. U and I Ux Field Transformation Functions ............................................. 35

3.4.5.1U1, IU2, . . . , [Ux Field Transformation Functions .............................. 40
3.4.6.1, U and I Ux Field Transformation Functions ......................................... 46
3.5. Performance Comparison with Other Distribution Methods .......................... 53
3.5.1. Probability of Strict Optimality ............................................................... 54
3.5.2. Average Response Time .......................................................................... 55
3.6. Data Construction Methods ............................................................................ 62
3.6.1. Data Construction Based on Real Global Directory ................................ 63
3.6.2. Data Construction Based on Virtual Global Directory ............................ 66

Chapter 4. Optimal Data Distribution for Multiattribute Range Queries

.................................................................................................................................... 68
4.1. Introduction .................................................................................................... 68
4.2. Deﬁnitions and Terminology .......................................................................... 69
4.3. Limitations of Perfect Optimal Distribution ................................................... 73

4.3.1. Type (0 - 2) Range Queries ..................................................................... 74
4.3.2. Type (0 - 1) Range Queries ..................................................................... 78
4.3.3. Type 0 Range Queries ............................................................................. 82
4.4. Optimal Data Distribution Methods for Range Queries ................................. 82
4.4.1. Optimal Distribution by Basic FX Distribution Methods ....................... 83
4.4.2. Field Transformation Functions for Range Queries ................................ 86
4.4.3.1 and UR Field Transformation Functions ............................................... 90
4.4.4.1 and UM Field Transformation Functions .............................................. 92
4.4.5. UR and UM Field Transformation Functions .......................................... 95
4.4.6.1, UR and UM Field Transformation Functions ....................................... 98
Chapter 5. Node Partitioning Schemes for B-trees .............................................. 100
5.1. Introduction .................................................................................................... 100
5.2. The PNB-tree .................................................................................................. 101
5.2.1. The PNB-tree Operations ........................................................................ 103
5.2.2. Parallel Disks Conﬁguration for PNB-trees ............................................ 104

vii

5.3. Motivations of PNB-trees ............................................................................... 106

5.3.1. Compressed Height .................................................................................. 106
5.3.2. Reduced Frequency of Tree Restructuring .............................................. 107

5.4. Performance Comparison ............................................................................... 107
5.4.1. Performance Models ............................................................................ 108

5.4.2. Average Data Retrieval Time .............................................................. 109

5.4.3. Threshold Points .................................................................................. 114

5.4.4. Update Performance ............................................................................ 115

5.5. Other Strategies for The PNB-tree Organization ......................................... 115
Chapter 6. Conclusions ........................................................................................... 118
List of References ..................................................................................................... 121

viii

LIST OF TABLES

Table 3.1. Response Time for F 1=F2=F3=F4=2, F 5=F 6% and M = 16 ............... 59
Table 3.2. Response Time for F 1=F 2=F 3:2, F 4=F 5=F 6:4 and M = 32 ............... 59
Table 3.3. Response Time for F 1=F 2=F3=F4=F 5=F 6:8 and M = 32 ................... 59
Table 3.4. Response Time for F 1=F 2=F3=F4=F 5=F 6:8 and M = 64 ................... 60
Table 3.5. Response Time for F1=2, F 2=F 3:4, F 4=F 5=F 6:8 and M =

128 ....................... . ..................................................................................................... 60
Table 3.6. Response Time for F 1=F2=F3=F4=4, F 5=F 6:8 and M = 256 ............. 60
Table 3.7. Response Time for F1=F2=F3=4, F4=F5=F6=8 and M = 512 ............. 61
Table 3.8. Response Time for F1=F2=F3=8, F4=F5=F6=16 and M = 512 ........... 61

ix

Figure 2.1.
Figure 2.2.
* Figure 2.3.
Figure 3.1.

Figure 3.2.
Figure 3.3.

Figure 3.4.

Figure 3.5.
Figure 3.6.
Figure 3.7.
Figure 3.8.
Figure 3.9.
Figure 4.1.
Figure 4.2.
Figure 4.3.

LIST OF FIGURES

General Parallel Processing Model ........................................................
Two Stage Parallel Processing Model ....................................................
Two Level Implementation of H1 for The Two Stage Model ...............
Basic FX Distribution ............................................................................
FX Distribution with I And U Transformation ......................................
FX Distribution with I And I U 1 Transformation ...................................
FX Distribution with I And I U 2 Transformation ...................................
FX Distribution with U And I U 1 Transformation ................................
FX Distribution with U And I U 2 Transformation .................................
FX Distribution with I, U And I U 2 Transformation ..............................
Strict Optimality When for Any Fields r and s, F ,F s 2 M ....................
Strict Optimality When for Any Fields r, s and t, F ,F SF, 2 M .............
Explicit Representation of Multikey Hash Functions ............................
Example Bucket Distributions ...............................................................
Nonexistence of Perfect Optimal Distribution for Type (0 - 2)

Range Queries ............................................................................................................

Figure 4.4.
Figure 4.5.

Bucket Distribution When F 1 = 4, F 2 = 4 and M = 8 ...........................

Nonexistence of Perfect Optimal Distribution for Type (O - 1)

Range Queries ............................................................................................................

Figure 4.6.
Figure 4.7.
Figure 4.8.
Figure 4.9.

Figure 5.1.

Basic FX Distribution When F 1 = F 2 = 4 and M = 4. .........................
FX Distribution with I And UR Transformation ....................................
FX Distribution with I And UM Transformation ...................................
FX Distribution with UR And UM Transformation ...............................
Partitioned Node B-tree ..........................................................................

9
11
13
19
29
35
36
39
4O
52
56
57
70
72

75
77

80
83
91
96
98
103

Figure 5.2.
Figure 5.3.
Figure 5.4.
Figure 5.5.

Figure 5.6.
Figure 5.7.

Figure 5.8.

PNB-tree After Insertion of Key Value 57 ............................................ 105

PNB-tree After Deletion of Key Value 71 ............................................. 105
Data Response Time with Various Data Request Rates ........................ 111
Data Response Time with Various Disk Speeds .................................... 113
Threshold Points ..................................................................................... 114
Tree Restructuring Cost ......................................................................... 116
Insertion Time ........................................................................................ 116

xi

CHAPTER 1

INTRODUCTION

There will be many applications for large databases that cannot be performed in an
acceptable response time by current database systems. Since one of the most signiﬁcant
ways of improving performance is through parallelism, the importance of parallel pro-
cessing in database systems has been increasingly recognized. Parallel processing in
database systems can increase throughput and minimize response time through con—
current data accesses as well as processing data in parallel. However, parallel process-
ing by itself does not necessarily lead to high performance. Some of the reasons are
attributed to overhead due to interprocessor communication, remote memory accesses
and data access conﬂicts. For parallel database operations external I/O also causes a
serious performance bottleneck [Bor83]. Stone [Sto87] shows that parallel query algo-
rithms in a multiprocessor system may perform poorly than efﬁcient serial algorithms on
a single processor system. The advantage of indexing to reduce external I/O trafﬁc was
also emphasized in that paper. There have been numerous works on improving perfor—
mance of database systems by specialized software/hardware techniques. These include
database machines, main memory resident databases and. concurrent dynamic search

Sti'llCtllI’CS.

1.1. Database Machines

Many proposals have been made in the past for machine architectures for efﬁcient
parallel processing of database operations. Early designs such as CASSM [Su79] and
RAP [Ozk75] exploit the logic per track idea which was ﬁrst proposed in [81070] as an

alternative to pure associative memories. They are SIMD architectures and focus on

1

parallel scanning of data on secondary storage devices. DIRECT [Dew79] is a logical
extension of the SIMD type associative processors which supports both intra-query and
inter-query concurrency. It assigns the number of processors dynamically to a query
and so it allows users to share the processing power simultaneously. These machines
can be characterized by functional replication approach because processors (or
memories) are basically similar to each other and capable of performing the same set of
functions in parallel. On the other hand, database machines such as RDBM [chh83]
and SiDBM [Le185] have used functional specialization approach which means that dif-

ferent functional units are used for different classes of database operations.

Some database machines such as DBMAC M583] and Delta [Kak85] use an attri-
bute based data model which employs vertical fragmentation concept. They use a verti-
cal fragmentation of a relation to minimize the cost of processing and transferring
unnecessary attributes from disks. Other database machines which use some form of

vertical fragmentation are DBC [Ban79] and a database computer proposed in [Tan83].

Due to the similarity between relational algebra tree and data ﬂow graph, the data
driven computation concept has been proposed as an effective operational method for
relational database machines [Bor82]. GAMMA, proposed in [Dew86], exploits
dataﬂow query processing techniques. In GAMMA, queries are compiled into a tree of
operators, and each operator is executed by one or more operator processes. With the
exception of a few control messages at the time a query is initiated, execution of an
operator is self-scheduling, i.e., when the process terminates execution, data ﬂows
between the processes without centralized control. There are also other proposals of

database machines using data ﬂow concepts [Bic83, Gli83].

Performance of various database machines are discussed in [Haw82, Hi186]. The
results show that the performance improvement depends on the query type as well as the

database machine architecture.

l.2. Main Memory Databases

Since inexpensive large main memory is becoming available, and many database
applications are I/O bound, some work has been done on main memory database sys-
tems to minimize data access time. ESP, presented in [GarL84], consists of multiple
processors, but it operates in SISD mode. ESP uses single data stream with all proces-
sors operating in lock step. The main focus of this machine is to reduce the global
memory access time. SiDBM presented in [Le185], on the other hand, has an MIMD
architecture in which the processors are functionally specialized. Both machines use

shared memory with common bus architecture.

Storage structure for main memory databases has been studied in the past. Dewitt
et al. discussed the access time of AVL trees and B-trees for various memory residency
factors of databases [Dew84]. They have also shown that hash based join algorithms
are very effective for large main memory database systems. Lehman et al. give perfor-
mance results of main memory databases using various types of access methods
[Leh86]. In that paper they have also proposed a T tree as an alternative to other tree
structures. The T tree is a balanced binary tree with many elements per node. It
exploits binary search nature of AVL trees, and good update and storage characteristics
of B-trees. Several other issues of main memory databases have also been investigated

in [Amm85].

1.3. Concurrent Dynamic Search Structures

There have been several proposals on concurrent dynamic search structures to sup-
port parallel processing. Many of them are based on tree structures such as B-trees and
binary search trees. For concurrent B-tree operations, the entire subtree of the highest
affected node is exclusively locked in [Sam76]. Bayer et al. present the algorithms
which Only lock out other writers until the actual modiﬁcations must be performed

[Bay77]. This approach distributes exclusive locks mostly in lower sections of the B-

tree and hence it increases the concurrency. Lehman et al. also described algorithms for
concurrent B-tree operations which use only a small number of locks for each tree mani-
pulating process [Leh81]. Concurrent search and insertion algorithms for AVL trees
and 2-3 trees are given in [E1180-a, E1180-b]. There are also several other work on con-

current manipulation of binary search trees [Kun80, Man84].

1.4 Classiﬁcation of Database Queries

Parallel processing strategy which is appropriate for one database application may
not be appropriate for another. For example, the granularity of parallel processing
which is suitable for one application may not be so for others. Thus, it may be neces-
sary to develop different parallel processing strategies for various types of applications.
We classify database queries based on their parallel processing characteristics, as fol-

lows :

(A1) Single query with multiple hits

(A2) Single query with a single hit

(A3) Single complex query

(A4) Multiple queries accessing the same relation
(A5) Multiple queries accessing different relations

Examples of queries of type A1 are partial match queries and range queries. Here,
intra-query parallel processing is advantageous because a single query requiring many
data records can be processed in parallel [Kim88, Pra88-b, Pra88-d]. Rosenau et al.
have also applied this type of parallel processing for projection operation on a relation

[R0587].

It is rather difﬁcult to exploit parallel processing of type A2 queries because apply-
ing parallelism for this type of query may require ﬁner granularity which may result in

lower throughput of the system. On the other hand, parallel processing may achieve the

lower bound on access time for these types of queries when appropriate software and
hardware architecture is used. Achieving and guaranteeing this lower bound are impor-
tant for many real-time critical applications. Parallel processing strategies for this type

of applications can be found in [Pra86, Pra87, Pra88-a, Pra88-c].

Queries of type A3 include join functions, sorting of ﬁles, and complex
qualiﬁcations. Several database machines which use functionally distributed architec-
tures have been proposed for this type of queries Hxl85, Sch83]. Intra-query parallel-
ism is also advantageous for this type of applications.

For A4 and A5 type queries, many independent queries can be processed in paral-
lel. Transaction processing is an example for these types. The throughput of the system
for these types of applications can be improved by maximizing concurrency among the
queries. Multidirectory hashing proposed in [Cev88] has been shown to be effective for

certain applications of these types.

1.5. Problem Statement

The primary objective of this work is to investigate data distribution strategies for
various types of database applications in parallel processing systems. In this thesis we
will mainly focus on the data distribution strategies for A1 and A2 type queries because
these queries are in fact the most basic types, and data distribution strategies for other
types of applications may be developed based on the strategies used in A1 and A2 type
queries. For queries of type A1, we will investigate optimal data distribution for multi-
key search queries such as partial match retrieval and multiattribute range queries.
Since multikey search queries access a set of records, appropriate data distribution is
very important to facilitate parallel data accesses. Though much work has been done in
the past on designing efﬁcient ﬁle structures for this type of applications, data distribu-
tion to enhance concurrency has not been considered much. We will investigate

optimal data distribution and appropriate data construction for this type of applications.

For type A2 queries, we will investigate parallel processing Of tree type indexes for
external databases. Since the B-tree is a common storage structure for this type Of
applications, node partitioning schemes to exploit parallel data accesses tO B-trees will

be investigated.

1.5. Thesis Overview

The remainder of this thesis is organized as follows. In chapter 2 we describe high
level abstraction Of database processing for parallel processing systems. The objective
Of this abstraction is to deﬁne a framework which can be used as a basis for more

speciﬁc implementation.

In chapter 3 we present Optimal data distribution for partial match retrieval type
queries. The main focus Of this chapter is the development Of new data distribution
methods, called Fieldwise eXclusive-or (FX) distribution methods, for maximizing data
access concurrency. FX distribution methods use bitwise exclusive-or Operation on the
ﬁeld values which are computed by multikey hashing. We show several useful charac-
teristics Of exclusive-or Operation for Optimal ﬁle distribution. Field transformation
techniques are presented to extend the scope Of Optimality in FX distribution methods.
We give sufﬁcient conditions for Optimal distribution by the proposed distribution
methods and show that these methods are Optimal for most partial match queries. We
describe the performance improvement by FX distribution methods over Other methods
proposed earlier. Efﬁcient data construction methods for FX distribution approach are

also discussed.

In chapter 4 we investigate ﬁle distribution problems for parallel processing of
multiattribute range queries. Optimal data distribution methods as well as the existence
and nonexistence Of perfect Optimality are investigated. It will be shown that there are
inherent limitations tO achieve Optimal distribution for various types Of range queries.

We give sufﬁcient conditions for the nonexistence Of perfect Optimal distribution for

certain types Of range queries. We extend the FX distribution methods for several use-
ful multiattribute range queries. Sufﬁcient conditions for Optimal distribution for mul-
tiattribute range queries will be described. It will be shown that the proposed data dis—

tribution methods are Optimal for a large class Of multiattribute range queries.

In chapter 5 we investigate the performance of various parallel processing methods
for B-trees. The main focus Of this chapter is a node partitioning scheme for B-trees to
exploit parallel data accesses. The proposed B-tree node partitioning scheme is based
on synchronized disks. Parallel processing Of partitioned node B-trees on asynchronous
disks are also discussed. We also show the performance improvement Of partitioned

node B-Uees over conventional B-trees. Chapter 6 contains concluding remarks.

CHAPTER 2

DATABASE PROCESSING MODELS
FOR PARALLEL PROCESSING

In this chapter we describe high level abstraction of database processing for paral-
lel processing systems. The Objective of this abstraction is to deﬁne a framework which

can be the basis of more speciﬁc implementation.

2.1. General Parallel Processing Model for Database Systems

We propose an abstract model for parallel processing of database systems. This is
shown in Figure 2.1. The model is based on distributing data and access structures to
enhance concurrency. In the ﬁgure, Q5,S represent a set Of parallel access nodes.
These can be main memory modules or disks, depending on the parallel processing
environment. Ai’s denote a set of access structures. As shown in the ﬁgure, there are
three mappings that are important for concurrent processing. They are (1) storage allo-
cation for data (2) storage allocation for access structures and (3) key to access structure
mapping.

Storage allocation for data, and storage allocation for access structures determine
the amount of physical access concurrency among the nodes. In practice, these issues
should be considered based on the architectural types Of parallel processing systems. In
other words, a number of factors such as the interconnection structure between the pro-
cessors and the access nodes may contribute to actual system performance. In this

thesis we assume that the parallel access nodes are logically single shared memory.

 

DATA

 

 

 

Storage Allocation Key-tO-Access Structure
for Data Mapping

Storage Allocation for

Access Structures

A:

     

Figure 2.1. General Parallel Processing Model

In the parallel processing system, the speed mismatch between computation and
data access time becomes a more serious problem than in the uniprocessor system due to
data access conﬂict if data are not evenly distributed over access nodes. Generally,
database applications require many data accesses with relatively less computation, and
so degradation of system performanc due to inappropriate storage allocation scheme

may be more signiﬁcant in database applications than in others.

In addition to physical memory (or device) access conﬂict, there are other sources
of conﬂict for concurrent accessing of data. This means that even though two data are
stored in different access nodes, access concurrency can still be reduced. One such
source is the lock contention because the data in the same locking entity cannot be
accessed concurrently when that entity is locked. It has been shown in [Cev88] that
lock conﬂict causes more serious overhead than the physical memory access conﬂict in
certain cases. This is because each resolution for a lock conﬂict takes at least a lock

duration time which is perhaps much longer than one remote memory access time.

10

Each lock duration time may be reduced by using ﬁner locking granularity. However,
lock conﬂict for critical shared variables, if exist, cannot be avoided. Unfortunately,
most access structures have critical parts which frequently need to be locked, e.g., root

node in the tree type index and directory size dependent variables in dynamic hashing.

Based on the above Observation, in order to achieve high concurrency multiple
independent access structures need to be constructed for each relation to reduce the
amount Of sharing for critical shared variables. By multiple access structures for each
relation, we mean that a set of records pertaining to one relation is partitioned horizon-
tally and each subset Of partition contributes to each independent access structure.
Thus, we need an appropriate mapping from a set of keys to a set of access structures.
Key to access structure mapping determines logical access concurrency while storage

allocation scheme for data and access structure determines physical access concurrency.

For many applications we can simplify the general model such that only a group of
data which are stored in the same access node, say Qi, contributes to a unique subset Of
access structures, say A j to A, which are also constructed in the same access node. By
this approach storage allocation strategy for data and access structures can be treated
integratedly, and the complexity of processing models can also be reduced. Two stage
parallel processing model shown in Figure 2.2 represents this idea. In the two stage pro-
cessing model the ﬁrst stage corresponds to data storage allocation and the second stage
corresponds to key to access structure mapping. The two stage model provides a more

systematic design procedure for developing parallel database systems.

2.2. TWO Stage Parallel Processing Model for Database Systems

The basic idea Of this model is to partition data mapping into two stages. As
shown in the ﬁgure, the ﬁrst stage, H1, is called Data Distribution algorithm and the

second stage, H2, is called Data Construction algorithm.

ll

 

QM—l

Figure 2.2. Two Stage Parallel Processing Model

The data distribution algorithm, which is the same as data storage allocation in the
general model, determines how the data is appropriately distributed to the parallel access
nodes so that maximum concurrency is achieved between the access nodes. The data
construction algorithm, on the other hand, determines the appropriate data structure to
minimize the access time. It receives data from the data distribution algorithm and then

create local access structures such as hashed or indexed ﬁles.
In general, the following strategies can be employed for data distribution :
(B 1) Declustering based on query’s data reference pattern
(B2) Random distribution
(B3) Objective speciﬁc declustering
(B4) Clustering based on data reference pattern

In method B 1, the data distribution technique takes advantage of the data reference
pattern of a query. For example, if a query references numerous records, the strategy

may be to distribute the data so that these records are stored uniformly among the nodes.

12

This approach is useful for A1 type applications discussed in chapter 1. In method B2,
records are randomly distributed between the nodes. This method is simple, but may
not guarantee a good distribution. In the objective speciﬁc method, records are allo-
cated to optimize certain objective functions. For example, in [Pra86] Pramanik et al.
propose a data distribution technique to construct multiple directories for a single rela-
tion, where a record is allocated to the node which has the smallest directory size. It
has been shown in that paper that this approach gives the minimum total directory size.
However, declustering of data may not be always beneﬁcial. For example, if the com-
munication cost between access nodes is large, clustering may give better performance

than declustering. SO, B4 type strategies may depend on the interconnection network.

Data distribution algorithms can be a functional mapping which depends only on
data values. It maps a set of data values into a set of nodes. For example, if node
addresses are determined by hashed values of input data, the distribution algorithm is a
mapping which is independent of time or other system parameters. On the other hand,
the distribution algorithm need not be functional. For example, random distribution
may map the same record to different nodes at different time. Since data accesses are
content-based for most database applications, it is advantageous to make a data distribu-

tion algorithm functional depending only on data values.

Let D be a set of data and ZM = [0, 1, . . . , M-l} be a set of parallel access nodes.
Let a data distribution algorithm be a function from D to ZM. Since actual data are usu-
ally unevenly distributed in the data domain, data distribution algorithms are commonly
designed based on the hashed values of data which are evenly distributed in hashed
address space. Thus, we deﬁne data distribution algorithm H1 as a composition of two
functions, H 1(1) and H 1(2), such that H 1‘” is a mapping from D to T and H 1(2) is a
mapping from T to ZM, where T is the set of hashed values. Figure 2.3 shows two level
implementation of H1 for the abstract model in Figure 2.2. This model can be thought

of as one class of the two stage model.

(2)
H1 Q0

 

H10)

 

 

 

DATA - T

 

 

 

 

 

QM—

Figure 2.3. Two Level Implementation Of H1 for The Two Stage Model

Let H2 be a hash-based data construction algorithm and LD be a set of entries in all
local directories generated by H2 for a given ﬁle system. If there exists one-to-One
correspondence between T and LD, T is called a real global directory. Otherwise, it is

called a virtual global directory.

When T is a real global directory, the set of all the local directories can be thought
of as a partition of T. H 1(1) is usually static because the use of dynamic hashing for
H 1(1) will cause signiﬁcant overhead due to intemode data movement. However, static
hashing scheme for H 1(1) may result in very sparse local directories or long overﬂow
chains. These problems can be avoided by using a virtual global directory, where the

actual local directory is determined by H2.

When T is a real global directory, the ratio lTl/IDI directly affects the data retrieval
time as well as storage utilization. On the other hand, it is more ﬂexible that T is used as
a virtual global directory. The comparison Of these two approaches will be described in
more detail in chapter 3. Functional distribution, and real/virtual global directory con-
cepts are used for parallel processing of partial match queries and range queries

presented in chapter 3 and 4.

CHAPTER 3

PARALLEL PROCESSING STRATEGIES
FOR PARTIAL MATCH QUERIES

3.1. Introduction

Many information processing systems involve the application which accesses the
records in a ﬁle having the values of a speciﬁc set Of attributes in common. A ﬁle is a
collection of records each of which is deﬁned as an ordered n-tuple (r0, . . . , r,,_1) of n
values which are the keys or attributes of the records. When a query is allowed to
specify conditions on more than one attributes, the search performed for this query is
called multikey searching, or associative searching. Multikey searching is used in
queries such as partial match retrieval and multiattribute range queries. A partial match
query is a query where some of the attributes are speciﬁed and the rest of them are
unspeciﬁed. For example, query q = [Age = *, Department = "mathematics", State =
"Ohio"] is a partial match query, where * denotes don’t care condition. Since each attri-
bute in a ﬁle may or may not be speciﬁed in a partial match query, a set of records need
to be retrieved. When a ﬁle is constructed on parallel devices, it is important to store
these records to maximize concurrency. In this chapter we investigate parallel process-
ing strategies for partial match queries. Optimal data distribution methods and

appropriate data construction methods are described.

It has been shown in [Rot74] that multikey hashing is effective for partial match
retrieval type applications. Multikey hash function, H, for a ﬁle consisting of 11 ﬁelds is
an ordered n functions <h1, - - - , hn> such that given a record r = <r1, ~ ' ~ , r" >, H(r)
= <h1(r1), ~ - - , h,,(r,,)>. H(r) is called a bucket. Rivest [Riv76] and Rothnie, et a1.

[Rot74] have independently proposed the use Of multikey hashing, as an alternative to
14

15

inverted ﬁles, to reduce the total search time for partial match retrieval type queries.
The design of multikey hash functions was considered in [Bur76]. The determination
of each ﬁeld size for minimum search time based on query statistics was also investi-
gated by [Aho79, B0179]. In [4] it has been shown that the problem of ﬁnding the
optimal ﬁeld sizes for multikey hashing scheme is NP-hard. The main focus Of those
research is on minimizing the total number of bucket accesses. Our objective in this

chapter is to achieve maximum parallelism by distributing buckets in multikey hashing.

There are a few heuristic methods for distributing data in partial match retrieval
type queries. Du, et al. have proposed Disk Modulo (DM) distribution method
[DuS82]. The DM distribution method is simple and elegant but does not work well in
many cases. For example, it may not give Optimal distribution if some of the ﬁeld sizes
are less than the given number of devices. SO, for a large number of parallel access
nodes, the DM distribution method may not be appropriate. Generalized Disk Modulo
(GDM) method has also been proposed in [DuSSZ] to overcome this problem. This
method gives a sufﬁcient condition to achieve optimal distribution. However, no gen-
eral method has been given to ﬁnd the optimal distribution parameters. In fact, the
problem of ﬁnding the optimal parameter values could be very complex [DuS82].
Since DM distribution method does not work well for binary cartesian product ﬁle
(binary cartesian product ﬁle is a cartesian product ﬁle in which each ﬁeld contains only
two elements), several heuristics have been proposed by [Du86, Sun85] to distribute
buckets in binary cartesian product ﬁles. These heuristics are also special cases of the
GDM distribution method. Several useful properties of these modulo based distribution
methods have also been given in [Sun87]. Other approaches such as data distribution
methods based on minimal spanning trees and short spanning paths have been proposed
in [Fan86].

We propose Fieldwise eXclusive-or (FX) distribution methods which give better

performance for a wider range of partial match queries than existing methods. The main

l6

idea of FX distribution methods is the use Of bitwise exclusive-or Operation on the ﬁeld
values which are computed by multikey hashing. Here, we show several useful charac—
teristics of exclusive-or operation for Optimal data distribution. Field transformation
techniques have been used to increase the scope Of Optimality in FX distribution

methods.

The rest Of this chapter is organized as follows. In section 3.2, we describe
deﬁnitions and terminology. In section 3.3, we deﬁne Basic FX distribution method
and its optimality conditions. In section 3.4, we present ﬁeld transformation techniques
to increase the scope of Optimality over the Basic FX distribution method. The
extended optimality conditions for these ﬁeld transformation schemes are also described.
In section 3.5, we compare the performance of FX distribution methods with those of the
other distribution methods proposed in the past. We discuss efﬁcient data construction

methods in section 3.6.

3.2. Deﬁnitions and Terminology
Before describing FX distribution method, it is necessary to introduce some nota-

tions as well as relevant deﬁnitions and assumptions.

Deﬁnition 3 .1 .

(a) f,- = {0, l, , Fi—l }, a set of hashed values of ﬁeld i by the i-th hashing function in
multikey hashing.

(b) F ,- denotes lﬁ I .

(c) M denotes the number of parallel devices.

((1) N is the set of all natural numbers including 0.

(e) ZM is the set of all integers from 0 to M-l.

(f) (am_1 a0)3 is a binary notation of an integer, where a,- is a binary digit.

17

I f,- | is assumed to be a power of 2 which is common for hash directory ﬁles for parti-
tioned hashing schemes [Aho79]. The number of devices M is also assumed to be a

power of 2. ZM will be frequently used to denote the set of devices.

Deﬁnition 32. Let f 1 xfzx xf,, be the set of all buckets, where x denotes the carte-
sian product of sets. When the given number of devices is M, data distribution method

is a function fromflezx xf,l to ZM.

Deﬁnition 3 .3 . Let R(q) be the set of buckets which satisfy qualiﬁcations for a partial
match query q. The distribution method is called strict optimal for a partial match
query in a given ﬁle system if each device has no more than [IR (q) l/M] number of

buckets.

Deﬁnition 3.4. When the distribution method is strict optimal for all possible partial

match queries in a given ﬁle system, it is called perfect optimal for that ﬁle system.

Deﬁnition 3 .5. The distribution method is called k-optimal, 0 S k S n, for a given ﬁle
system consisting of n ﬁelds, when it is strict optimal for all partial match queries which

have exactly k unspeciﬁed ﬁelds.

Thus, the distribution method is perfect optimal, if it is k-optimal for all k = 0, , n.
Note that some authors exclude cases where the number of unspeciﬁed ﬁelds is 0 (i.e.,
exact match) and the number of unspeciﬁed ﬁelds is n (i.e., retrieval Of whole ﬁle) from

partial match queries.

Deﬁnition 3 .6. [+] denotes exclusive-or operation between two bits. We will use the
same notation [+] to denote exclusive-or Operation between integers and sets of integers
as follows. When X = (am_1 a0)3 and Y = (b,,,_1 b0)3 are two integers, X [+] Y
= (am_1 [+] bm_1 a0 [+] b0)3. IfX is an integer and Y = {y1, , yL} is a set of
integers, X [+] Y is deﬁned as {X [+] y,- | y,- e Y} . If both X = {x1, ,xK} and Y =

{y1, ,yL} are sets of integers, X [+] Y is deﬁned as {x; [+] yj l x,- e X, yj e Y}

18

Forexample, ifX1 =2and Y1 =3, thenX1 [+] Y1 = 1. Isz =2and Y2 = {0, 1,2,3},
thean [+] Y2 = {0, 1,2, 3}.

Deﬁnition 3.7. [EKYD = Y1 [+] Y2 [+] Y3 - ' - [+] Y".

II
Note that [+] Operator is associative and [+] is a shorthand notation for performing
i=1

exclusive-or operation between sets of integers Y1, Y2, , Y,,.

We assume that precedence of multiplicative Operator * is higher than that Of
exclusive-or operator [+], which means that in the absence of parentheses, multiplication
is done before exclusive-or operation. But the precedence of [+] is assumed to be
higher than that of + (addition). We will leave out the multiplication operator * when-

ever there is no ambiguity (e.g., AB instead of A*B).

When d is a power of 2, and J1, J 2 e N, we will use the following relations impli-

citly between exclusive-or Operator and arithmetic Operators + and *.
(1) IfJ2<d, thenJ1d[+]Jz=Jld+Jz

(2) er [+]12d= (Jr [+]Jz)d

(3) J1d[+]J2d=Oifandonlyif Jl[+]J2=0

(4) IfJ1<dansz<d, thenJl[+]Jz<d

(5) (J1 [+]J2) mod d :01 mod d)[+] (12 mod d)

Since the proof is straightforward, we leave out the proof. Note that if d is not a power

of 2, the above relations may not hold.

Deﬁnition 3.8. Let (as_1 am_1 a0)3 be the binary notation of an integer l, where

m = logzM. We deﬁne TM : N —> ZM as a function such that TM(l) = (am_1 a0)3.

Thus, function TM returns only the rightmost logzM bits of domain values. Since we

can add arbitrarily large number of 0’s to the left of the given binary number of an

19

integer 1, function T M is always deﬁned for any power of 2 integer M. It is also easy to

see that for any Jl , 12 e N, TM(JI [+] 12) = TM(TM(JI) [+] 12) = TM(J1) [+] TM(JZ).

3.3. Basic FX Distribution Method

Let f 1 xfzx xfn be a set of all buckets. The Basic FX distribution method allo-

n

cates bucket <11, , Jn> into device TM [[ﬂUﬁ], where Jj e f,- forj = l, , n.
1:

Example 3.1. Figure 3.1 shows the bucket distribution by the Basic FX distribution
method, where f1 ={0,1},f2 = {0, 1, 2, 3, 4, 5, 6, 7} and M = 4. In the ﬁgure, binary
numbers are used for ﬁeld values and decimal numbers are used for Device No. (This

convention will be used in all the examples of FX distribution). Here, Device N0 =
TM [J 1 [+] J 2], where J 1 6 f1, J 2 6 f2 and TM returns the rightmost two bits of the

result of J 1 [+] J 2.

 

 

f1 f2 Device No
000 000 0
000 001 l
000 010 2
000 011 3
000 100 0
000 101 1
000 110 2
000 111 3
001 000 1
001 001 0
001 010 3
001 011 2
001 100 1
001 101 0
001 110 3
001 111 2

 

 

 

Figure 3.1. Basic FX Distribution

20

As shown in Figure 3.1, the Basic FX distribution method is strict optimal for any partial
match query in this ﬁle system. For example, if f1 value is (001)}; and f2 is
unspeciﬁed, then we have to access eight buckets <(001)B,(000)3>, ,
<(001)B,(111)B>. Since each device has two qualiﬁed buckets for this partial match

query, the Basic FX distribution method is strict Optimal for this query.

Lemma 3.1. ZM is a set which contains M different nonnegative integers from 0 to M-
1. Let k be some integer such that 0 S k 5 M-1. Then ZM [+] k = ZM.

Proof: Clearly, for any nonnegative integer Aa which is less than M, 0 S k [+] A, 3
M—1. Thus, it is sufﬁcient to show that for any two different elements of ZM, the results
of exclusive-or with k are different. Let A0 = (a,,,_la,,,_2 - ~ . a0)3 and Ab =
(b,,,_1b,,,_2 . - - b0)3, where m = logzM and at least one a,- ¢ bi, i.e., Aa¢ Ab. Let Aa
[+] k = (c,,,_1 --'c0)B and Ab [+] k = (d,,,_1 . ' «10)3. Assume Aa [+] k = Ab [+] k.
Then c,- = dj for all j = 0,, m—l. Since cj = d,- implies d} = bj, it follows that aj = b,-
for allj = O, , m-l. This contradicts the assumption of Ad ¢ Ab. Since 0 S k [+] A,- S
M—l for all A,- e ZM and k [+] A, at k [+] Ab for any two different A, and Ab in ZM, ZM

[+] k is the same as ZM. 1:]

Example 3.2. LetZs = {0,1, 2, 3, 4,5, 6, 7} and k = 3. Then Zg [+] k = {3, 2, l, 0,7,
6, 5, 4} = 23.

In the proof of Lemma 3.1, it is shown that for any two different nonnegative
integers I, J and any nonnegative integer k, I [+] k is different from J [+] k. We call

this XOR uniqueness property.

We can observe that the property described in Lemma 3.1 (this also implies XOR
uniqueness property) is very useful for optimal ﬁle distribution of partial match retrieval
type applications. This property is fundamental, and will be used in various places Of

this chapter to develop further techniques for Optimal distribution.

21

In the following Theorem 3.1 and Theorem 3.2 we will describe the types of partial

match queries for which the Basic FX distribution method gives Optimal distribution.

Theorem 3.1. The Basic FX distribution method is always O-optimal and l-Optimal.
Proof: (1) O—Optimal : This is trivially true.

(2) l-optimal : Let only one ﬁeld i be unspeciﬁed and T M [[J:}(J 1)] = h, where J j is the
speciﬁed value of ﬁeld j. Thus, h gives the projection of the rightmost logzM bits of the
value obtained by the exclusive-or of all the speciﬁed values of the query. There are
two cases, F,- SM and F,- > M. When F,- S M, for all I e f,~, TM(I) [+] h is different
from each other by XOR uniqueness property. Therefore, the distribution is optimal.
(Note that TM(A [+13 ) = TM(A) [+] TM(B ) = TM(A [+] Tu(B)) = TM(TM(A) [+] TM(B)))-
When F; > M, let F,- = A*M. By Lemma 3.1 1‘, [+] h = ZAIM. Since
#{ae ZArM ITM(Ot) = z} = A for any 2 e ZM, the Basic FX distribution method allocates
A number of buckets to each device. (Note that # denotes the cardinality of a set.)

Therefore, the distribution is Optimal. El

Theorem 3.1 says that the Basic FX distribution method is strict optimal for any partial
match query in which the number of unspeciﬁed ﬁelds is O or 1. Note that in the above
expression, #{Ote ZArM ITM(0t) = z} = A, A denotes the number of qualiﬁed buckets that

correspond to a particular device 2.

Theorem 3.2. For any partial match query which has two or more unspeciﬁed ﬁelds,
the Basic FX distribution method is strict optimal, if there exists at least one unspeciﬁed
ﬁeld i such that F ,- 2 M.

Proof: For partial match query q, let q(f) = {i1, i 2, , ik} be the set of unspeciﬁed
ﬁelds in which the size of at least one Of these ﬁelds is greater than or equal to M.

Without loss of eneralit , assume F - 2M, i.e., F ~ = A- *M for some nonne ative
g y ll ‘1 ll g

integer Ai,- Let h = TM [lel-2f)“ 1)], where J 1- denotes the speciﬁed value of ﬁeld j

22

which is not in q(f). Thus, h is the projection of the rightmost logzM bits of the value
Obtained by the exclusive-or of all the speciﬁed ﬁeld values of the query. By Lemma
3.1, h [+] fl, = ZA,‘*M and hence #{Jilefil | TM(h [+] 11",) =2} = Al, for all z e ZM.
Here, we have done exclusive-or of h with the set of values of the unspeciﬁed ﬁeld i 1.
At, gives the number of unique values of the unspeciﬁed ﬁeld i 1 that correspond to a
particular value 2. Now we will exclusive—or the set of values h [+] ft, with a value of
the unspeciﬁed ﬁeld i2. Let J; G fiz. By Lemma 3.1
#{Jilefi1 | TM(h [+]J,-1 [+]J,~2)=z} = At, for all z e ZM. Note that this will not
change the number of unique values of ﬁeld i1 that correspond to a particular 2. We
also used the property, TM(h [+] J}! [+] J52) = TM(h [+] J11 [+] TM(J,~2)). Thus,
#{(J,~l , Ji2)e fi, xf,2 I TM(h [+]];1 [+]J,-2)=z} = AilFi2 for all z e ZM. Here, the

number of unique values for each 2 is more by a factor F [2 because the size of the

domain has increased by a factor F ,2. By continuing this argument, #{(J,-l, ' - - Jik)e
k k k
filx xfik {TM h [+] [+}(J,'P) =2} =Aill—IFL', = (HE-p) / M for all z e ZM.
P: p= p=l
CI

Note that Theorem 3.1 works for partial match queries with zero or one unspeciﬁed
ﬁeld while Theorem 3.2 applies to partial match queries with more than one unspeciﬁed

ﬁelds.

Corollary 3.1. When all ﬁeld sizes are no less than the given number of devices M, the
Basic FX distribution method is perfect optimal.
Proof: This is a direct consequence of Theorem 3.2. CI

Theorem 3.1 and 3.2 show general characteristics of exclusive-or operation for
optimal distribution. However, the Basic FX distribution method does not give optimal
distribution for partial match queries with 2 or more unspeciﬁed ﬁelds, when the size Of

none of the unspeciﬁed ﬁelds is greater than or equal to M. For example, when M = 16

23

and all others are the same as in Example 3.1, the distribution is not optimal if both
ﬁelds are unspeciﬁed. Proposition 3.1 gives the conditions for Optimal distribution for

these cases.

Proposition 3.1. Let q(f) = {i1, i 2, , ik} be the set of unspeciﬁed ﬁelds for partial
match query q, where F j < M, for all j e q(f). FX distribution is strict Optimal for par-

tial match query q, if there exist a set of ﬁelds {[1, , ij} ; q(f) such that
j

Ifilx ><f,-jl 2M and #[(J,-l, J,-j)e fi,>< xfijl TM [[+}(Jip)]=z} =
p:

lfilx xfijl/Mforallze ZM.
Proof: Let FilX ' ' ' >'<F,'j =Ai;*M’Ai,- E N, Ai, Z 1. Let h = TM [l[+J’)(J1)], WhCI'C J1
’ a:

denotes the speciﬁed value of ﬁeld 1. The remainder of the proof is similar to that of

Theorem 3.2. C]

Proposition 3.1 says that even though the sizes of all the unspeciﬁed ﬁelds are less
than the given number of devices M, we can still guarantee optimal distribution, if (1)
there exists a subset of the unspeciﬁed ﬁelds, the size of whose cartesian product is
greater than or equal to M and (2) the records projected on these sets of ﬁelds are distri-
buted uniformly among the M devices. In other words, optimal distribution for a subset
of ﬁelds guarantees strict optimal distribution for many queries in which those ﬁelds are

unspeciﬁed.

However, when the size of none of the unspeciﬁed ﬁelds is greater than or equal to
M, the conditions given in Proposition 3.1 are not satisﬁed in the Basic FX distribution
method. Thus, in the next section we propose ﬁeld transformation techniques for the
ﬁelds whose sizes are less than the given number of devices M. These ﬁeld transforma-
tion techniques increase the scope of optimality by themselves, and can also utilize Pro-
position 3.1. The following paragraph exempliﬁes the idea of ﬁeld transformation tech—

niques.

24

Let f1 ={0,1}, f2 = {0, l, 2, 3, 4, 5, 6, 7} and M =16. As we discussed, the
Basic FX distribution method is not perfect optimal for this ﬁle system. Let X be an
one-tO-one mapping such that X (f 1) = {0, 8}. When the Basic FX distribution method
is applied for the set of ordered 2—tuples, X (f 1 )xfz, the distribution is perfect Optimal.
(It can be easily veriﬁed by substituting (1000);; for (001)B in f1 column of Figure 3.1.)
Now, our Objective is to ﬁnd a general one-to-one mapping, X, such that the Basic FX
distribution method for X (1‘1)fo gives optimal distribution. It will be shown that for
any values of | f 1 I, |f2 l and M, such mapping can be easily found.

We will present several ﬁeld transformation techniques which give the mapping, X,
described above. Even though the techniques developed here may not achieve perfect
optimal distribution in all the cases, this extended FX distribution method gives strict

optimal distribution for a large class of partial match query.

3.4. Extended FX Distribution Methods

In the previous section the Basic FX distribution method was described. In this
section we extend the Basic FX distribution method by using ﬁeld transformation tech-
niques.

Let f 1 xfzx xf,, be a set of all buckets. Extended FX distribution methods (the

Basic FX distribution method with ﬁeld transformation techniques) allocates bucket

<11, , J,,>,Jj 6 fl forj = 1, , n, into device TM , where

 

 

[+](Xj(Jj))
j=l

i) if Ifjl 2 M, X j is the identity function,

ii) if IfJ-l < M, X 1' is an element of set of injective (one-to—one) functions whose
domains are fl- and ranges are ZM.

X j- is called a ﬁeld transformation function.

When X j is the identity function for all j = 1, ..., n, Extended FX distribution

methods reduce to the Basic FX distribution method. It is easy to see that all the

25

lemmas and theorems that hold for the Basic FX distribution method also hold for

Extended FX distribution methods.

By the above deﬁnition, a ﬁeld transformation function can be any one-tO-one func-
tion. However, we are only interested in the ﬁeld transformation functions by which we
can achieve strict Optimal distribution for various types of partial match queries for

which the Basic FX distribution method does not give strict optimal distribution.

Since the ﬁelds whose sizes are no less than the given number of devices M, do not
cause any problem (whether it is speciﬁed or not), in this section we will focus only on
the ﬁelds whose sizes are less than M. From now on, we will simply call FX distribution

methods instead of Extended FX distribution methods.

In the following subsections we describe ﬁeld transformation functions which are

used for the ﬁelds whose sizes are less than the given number of devices.

3.4.1. Field Transformation Functions for Partial Match Queries

The ﬁeld transformation functions developed for FX distribution methods are as

follows.

Deﬁnition 3.9. Letﬁ = {0, . . . , Fl—l}, and let If, I and Marc power of 2.

(l) I z N —> N is an identity function.

(2) When Iﬁl <M, UM’ 'f" : f, —+ 2,, is afunction such that UM’ 'ﬁ'(z)=1dM"f",
wheredM'Urll =M/If,l.
(3) When lle < M, for x = 1, . . . , y in which y is a maximum integer such that
l f, It < M, 'IUXM' 'f" : f, —> ZM is a function such that
x
IUXM’ 'ﬁ' (1) =1 [+] [,[c+}ldkM’ '4'] where dkM’ 'f" = M/ If, l".
The function IUQI' 'f’ ' is a general notation for functions IUIIW’ 'f’ I , IUIZW‘ If’ I , . . . , etc.

For example, functions I UIIW’ If" and 103" W are deﬁned as follows :

26

(a) When I ﬁt < M, IUlM’If’l : f, —> 2,, is a funCtion such that

IUIM’ 'f"(l)=1[+]1dM"f",wheredM"f" = M/lf,|.

(b) When I ﬁl2<M, IUzM’W : f, —> 2,, is a function such that
IUZM' 'f"(1)=z[+] 1d,“ 'f" [+]1d2“"f", where dlM’W = M/Iﬁl, dzM’W =
M/IfIIZ.

Example 3.3. Letf1={0,1, 2, 3} andfg = {0,1} andM =16.

(a) 016.4%): {0. 4. 8. 12}. and IU116'4tn>= {0[+10. 1[+l4. 2[+]8, 3[+]12} = {0, 5.
10, 15}.

(b) 111216202) = {0, 1[+]8[+]4} = to, 13} and [(1316202) = {0, 1[+]8[+]4[+]2} = [0.
15}.

We have deﬁned basic transformation functions, I, U M' If" , 1U IM’ W, ,

M,|f,l

1U, which will be used in various combinations for optimal data distribution. For

example, for any values of If,- I, lfjl and M, it will be shown that FX distribution
methods distribute ordered 2-tuples in [(73) x U M’ W (fj) optimally.

When X {Ml’lf‘ ' and XZMZ’If’I functions are applied to ﬁelds 1', and j, two

transformation methods are said to be the same, if X 1 = X 2. For example, U M 1’ 'f‘ '

M2.If,-I M1,!f,l

and U
M2,lf,-I
1 s

are said to be the same transformation methods. But U and

M1,lf}l M2,!fjl

[U or IU1 and IUZ are said to be diﬁ’erent transformation

methods.

Because of notational complexity, we will leave out the superscripts M, l f,| from

transformation functions and their parameters whenever there is no ambiguity.

It is easy to see that for any proper subset f, of ZM where lf,| is some power of 2, U
transformation function satisﬁes the requirement of ﬁeld transformation functions (i.e.,
one-to-one function whose domain is f; and range is ZM). We will show later that I U 1,

. . . , I UJr transformation functions also satisfy this requirement. From the deﬁnition of U

27

transformation function we see that the transformed domain elements are equally
spaced. In other words, when they are linearly ordered, any two adjacent elements have
the same distance. Several useful properties of IU1, . . . , IUJr transformation functions

will be described in Lemma 3.7, Lemma 3.9, Lemma 3.11 and Lemma 3.12.
In the following sections, we will present various combinations of ﬁeld transforrna—
tion functions by which FX distribution methods give optimal distribution. Note that

we focus only on the ﬁelds whose sizes are less than the given number of devices.

3.4.2. I and U Field Transformation Functions

In this section we show in Theorem 3.3 that for any values of F ,-, F j and M, FX dis-
tribution methods distribute ordered 2-tuples in I (ﬁ) x U (fj) optimally. Theorem 3.3

uses Lemma 3.2 which is an extended version of Lemma 3.1.

Lemma 3.2. For a nonnegative integer L, let L = aw + b, where w is a power of 2, and
a, b are some nonnegative integers such that 0 S b S w-l. When Z, = {0, 1, , w-l },
Zw [+] L = {aw, aw+1, , (a+1)w-1}.

Proof: Let (1”,-1 - - - lk lk_1 - ~ - [0)3 be a binary notation of L, where [k has weight w.

Then, (1”,-1 [k)B = a. All elements in ZW has the form (0 ...0 bk_1 ' ~ ' b0)3. When

L [+] or for any or e Zw, the result should have the form (l,,,_1 - - ~ [k ck_1 .. ° c0)3
which is aw + (ck_1 c0)B, where (ck_1 co)3 is equal to (lk_1 10)B [+]
(bk_1 - . - b0)3. Thus the proof follows by Lemma 3.1. E]

Theorem 3.3. When there are only two ﬁelds i, j whose sizes are less than the given
number of devices M, the FX distribution method with I ()2) and U (fl) is perfect
optimal.

Proof: There are two cases, i.e., FiFj < M and FiFj 2 M. Let dj = M/Fj.

(case 1) FiFj < M (i.e., dj >Fi)

I(f,-) [+] U(fj) = {0, 1,, Fi-1}U{dj, dj+1, , dj+F,--1} U {M-dj, M-dj+1, ...,

‘1!

 

28

M-dJ-+F,--1} by Lemma 3.2. In the right hand side all sets are disjoint and largest ele-
ment is less than M because dj > F}. So, for all z e ZM,
#{(J,-,Jj) e fixfj | I(J,-) [+] U(Jj) = 2} S 1. Therefore, it is 2-optimal. 0 and l-optimal
come from Theorem 3.1.
(case 2) FiFJ-ZM (i.e., dj SF,-)
Let FiFj =A*M (i.e., dj = FilA). In order to be 2-optimal, for all z e ZM,
#{(J;,Jj) e fixfj l 10,-) [+] U(Jj) = 2} should be equal to A.

(1) 0 S U(Jj) S (A-1)dj
For each U(Jj) in this range, I(f,-) [+] U(Jj) = {0, 1,, Fi-l} by Lemma 3.2. Let this
set be So. Since there are A number of such U (J j) in this range, for all s 6 So
#{(J,-,Jj) e fixfj l OSU(Jj)S(A—l)dj, [(J,) [+] U(Jj) :3} = A.

(2) Adj S U(Jj-) S (2A-l)dj
For each U(Jj) in this range, I(f,-) [+] U(Jj) = {F}, Fi+1, --- , 2F,-—l} by Lemma 3.2.
Let this set be SI. Since there are A number of such U(Jj) in this range, for all s e 51
#{(J,-,Jj) e ﬁxfj | AdjSU(Jj)S(2A —1)dj, l(J,-) [+] U(Jj)=s} =A.

(M/Fi) (M/F;—1)Adj S U(Jj) S M-dj
For each U(Jj) in this range, [(ﬁ) [+] U(Jj) = {(M/F;—1)F,-, (M/Fi—1)F,~+1, . M—l}.
Let this set be SM/F,—1- Since there are A number of such U (J j) in this range, for all
s e SM,p‘._1, #{(J,-,Jj)e ﬂxfj | (M/Fi—1)AdjSU(Jj)SM-dj, I(J,~) [+] U(Jj)=s} = A.
M/F,—1
Since U Sp = ZM and there are A repetitions for each element in ZM through 1 (ft) [+]
p=0

U(Jj), Jj = O, - - - , Fj—l, it is 2—optimal. 0 and 1-optimal come from Theorem 3.1. C]

Example 3.4. Letfl = {0, 1, 2, 3},f2 = {0, l, 2, 3} and M =16. Figure 3.2 shows the
bucket distribution by FX and DM distribution methods. Note that I (f 1) = [0, l, 2, 3}

29

and U(fz) = {0, 4, 8, 12} are I and U transformed values of f1, f2 and denoted by
binary numbers. Here, Device N0 = TM(I (J 1) [+] U (12)) for FX distribution methods,
and Device N0 = (J 1 +12) mod 16 for the DM distribution method, where J 1 6 f1, J 2
6 f2.

 

 

f1 f2 I (f 1) U (f 2) Device No (FX) Device No (DM)
0 0 0000 0000 0 0
0 1 0000 0100 4 l
0 2 0000 1000 8 2
0 3 0000 1100 12 3
1 0 0001 0000 l 1
1 1 0001 0100 5 2
l 2 0001 1000 9 3
1 3 0001 1100 13 4
2 0 0010 0000 2 2
2 1 0010 0100 6 3
2 2 0010 1000 10 4
2 3 0010 1100 14 5
3 0 0011 0000 3 3
3 l 0011 0100 7 4
3 2 0011 1000 11 5
3 3 0011 1100 15 6

 

 

 

Figure 3.2. FX Distribution with I And U Transformation

The FX distribution in Figure 3.2 is perfect optimal. But in DM distribution, the
distribution is skewed. The GDM method can also give optimal distribution by multi-
plying 3 to the ﬁrst ﬁeld values and by 4 to the second ﬁeld values. However, these
parameters should be found by trial and error. On the other hand, FX distribution tech-

niques give a speciﬁc method.

3.4.3. I and IUx Field Transformation Functions
In this section we show in Theorem 3.4 that for any values of F ,-, F k, and the given
number of devices M, FX distribution methods distribute ordered 2—tuples in I ()2) x

IUx(fk) optimally. Lemma 3.3 shows IUX transformation functions are injective.

30

Lemma 3.7 shows a useful property of IUX transformation functions which is used to

prove Theorem 3.4. Lemma 3.7 is derived based on Lemma 3.4, 5.3 and 5.4.

Lemma 3.3. Let fk = {0, , Fk-l} such that F k" < M for some positive integer x.
Then, IUx(fk) is a set of Fk elements between 0 and M-1 (i.e., IU,(fk) is an injective
function).

Proof: (case 1 ) x = 1.

Assume otherwise. Let d1 = M /F k. Then there exist two different elements K 1,K2 e
fk such that K1 [+]K1d1= K2 [+] K2d1. Let K’ = K1 [+] K2. (Note that for any two
different nonnegative integers K 1 and K 2 there always exists positive integer K’ such
that K’ = K1 [+] K2). By substituting K1 [+] K’ for K2 we have K1d1 = K’ [+] K2d1.
Let K1 = (0 O a, a0)3, K2 = (O 0 b5 b0)3 and K’ = (O 0 e1e0)3, where r,
s, t denote leftmost bit positions in which the binary value is "1" in K1 , K 2, K’ , respec—
tively. This implies a, at b, because K ’ = K 1 [+] K 2. Since d 1 is a power of 2, d 1 = 26
for some positive integer c. Then, K1d1 = (O O a,+c a1+c ac 0)B, K2d1 = (0
0 bs+c b1+c bc 0)B, where a1+c at b,+c. Since e1+c is zero, a1+c at e,+C [+] b1+c.
This contradicts K 1 d 1 = K’ [+] K 2d 1.

(case 2) x > 1

Assume otherwise. Then there exist two different elements K 1, K 2 e fk such that
(K1[+]K1d1[+]...[+]K1dx)=(K2[+]K2d1[+]...[+]K2dx), where d1=M/Fki, i
= l, . . . , x. Since K1d1 and K 2d1 are the only ones which can have binary value "1"
between bit position logzd 1 and logZM (we considered the rightmost bit position to be
zero), the only way the above equation can be satisﬁed is K 1 = K 2. Therefore, contrad-

iction. El

Lemma 3.4. When there are only two ﬁelds i, k whose sizes are less than the given
number of devices M and F ,- 2 F k, the FX distribution method with I (f1) and I U 1 (fk) is

perfect optimal.

31
Proof: There are two cases, i.e., F1Fk < M and EB, 2M. Let dk = M/F;C and d,- =
M/F1.
(case 1) F1-F,c < M (i.e., dk >F1)
For all K e fk,
1(fi) [+] K [+] de={0, 1, Fi_1}[+]de
= {de, de-i-l, ' ‘ ' , de'i-Fi—l]
The ﬁrst equality holds by Lemma 3.1, and the second equality holds by Lemma 3.2.
Clearly, no element in ZM is repeated through I (f1) [+] IU1(K), K=O, F k—l because
dk > F1.
(case 2) FiFk 2 M (i.e., Ci}; 5 F1)
Let FiFk =AM. Then, F1=Adk and Adidk =M. Let K6 fk
i) OSK <A (i.e., OSde < F1)
For each K within this range,
1(fi) [+1K [+]de={0, 1, .Fi-1}[+]de
={0, l, , F1—1}
by Lemma 3.1. Since there are A number of such K’s within this range, there are A
repetitions for each element in Zp'. through I (f1) [+] [U 1 (K), K=0, A-l.
ii) A SK < 2A , (i.e., F1Sde < 2F1)
For each K within this range,
[(fi) [+1K [+1de = {0. 1, --- .Ft—1}[+]de
= {[71, Fri-1, ' ' ' , 2F1—1}
by Lemma 3.1 and Lemma 3.2. Since there are A number of such K’s within this range,

there are A repetitions for each element in Z 21:]. - Zp‘. through I (f1) [+] [U 1 (K), K=A,
2A-1.

32

d1) (d1—1)A SK < d1A (i.e., (d1—1)Adk Sde < Adidk)
For each K within this range,
I(f,-) [+] K [+] de = {0, 1, -~ , F1—1}[+]de
={(di_1)Fii(di_1)Fi+1v “' .diFi-l}

by Lemma 3.1 and Lemma 3.2. Therefore, there are A repetitions for each element in
ZM - ZM_pl.. Since there are A repetitions for each element in ZM through
[(12) [+] IU1 (K), K=0, , Fk-l, it is 2-optimal.

O and 1-optimal come from Theorem 3.1 for both case (1) and case (2). E]

Deﬁnition 3.10. When Fj < M and d = M/Fj, there are Fj intervals [0, d), [d, 2d), . . . ,
[M-d, M) from 0 to M with interval size d =M/Fj, where "[" and ")" denote "closed"

and "open", respectively.

Lemma 3.5. When there are only two ﬁelds i and k such that F 1F k = M, FX distribution
with I (f1) and IU1(fk) is perfect optimal if and only if there is exactly one element of
I U 1 (fk) at each interval from O to M with interval size M /F k-

Proof: Since F1Fk =M, in order to be perfect optimal distribution for all z e ZM,
#{(J1-,Jk) e f1xf,c | I(J,-) [+] IU1(Jk) = 2} should be equal to one.

(1) Only if : Assume otherwise. Let dk = M/F,c i.e., dk = F1. Then for some inter-
val [ldk, (l+l)dk), there exist two or more IU1(fk) elements. Let u and v be such ele-
ments in fk, i.e., Id;c S u [+] udk S (l+1)d;c — 1 and ldk S v [+] vdk S (l+1)d/c - 1. Then
I(f,-) [+] u [+] udk = I(f,-) [+] v [+] vd,c = {ldk, lei/(+1, , (l+1)dk-1}. The equality
holds by Lemma 3.1 and Lemma 3.2. Note that F; = dk. Clearly, the distribution is not
optimal. Therefore, contradiction.

(2) If : Let 5,, = {aF,-, aF1-+1, ,(a+1)F1-—1}. Since IU1(fk) =

33

’71-]
{idk+C1'li=0, , Fk-l, where OSCISdk}, IU1) [+]IU1(fk) = Use: = ZM by
a=0

Lemma 3.2. Therefore, it is perfect optimal. [:1

Lemma 3.6. For any F k which is less than M, there is exactly one element of IU1(fk)
at each interval from O to M with interval size M /Fk.

Proof: For any large Fk there always exist M and F1 such that M > F,- 2 Pk and
FiF,‘ =M (we can always ﬁnd such F,- and M). For such M, F,- and Fk, I (f1) [+]
IU1(fk) gives perfect optimal distribution by Lemma 3.4. For all such F k, by Lemma
3.5 there is exactly one element of IU1(fk) at each interval from O to M with interval

size M /F k. Thus the lemma follows C]

For example, whenf1={0, 1, 2, 3},f2 = {0, l, . . .7} and M: 16,1U1(f1) = {0, 5,10,
15} and IU1(f2) = {0, 3, 6, 5, 12, 15, 10, 9}. We can see that there is exactly one ele-

ment of [U 1 (f1) and I U 2 (fz) at each interval with size 4 and 2, respectively.

Lemma 3.7. Let f; = {0, , F j—l }, where F 1.16 < M for some positive integer x. Then,
there is exactly one element of IUx(fj-) at each interval from O to M with interval size
M/Fj.

Proof: When x =1, this is true by Lemma 3.6. Whenx 2 2, let d,- =M/F-i, i = 1, . . .,
x. Since (Jd2 [+] . . . [+] de) < d1 for all I e fj, the proof is clear. [3

Theorem 3.4. When there are only two ﬁelds i, j and the given number of devices is M
such that F ,- < M and F j" < M, for some positive integer x, the FX distribution method
with 1 (f1) and I Ux(fj) is perfect optimal.

Proof: Let d 1=M/Fj.

(case 1) F117} S M (i.e., d1 2F1)

By Lemma 3.7, there is exactly one element of IUx(fj) at each interval from O to M with
interval size d1. Thus, by Lemma 3.2 I (f1) [+] IUx(fj) is a set of all different F1F j ele—

ments between 0 and M-1.

34

(case 2) F1FI- > M (i.e., d1 <F,-)
Let F1-Fj =AM. Thus, F1=Ad1. Let IUx(fJ-) = {t0, t1, , tp}._1}, where
t0<t1 < <tp}_1.

(1)0 S t1< Ad1 (i.e., 0 S t1 < F,)
For each t1 within this range, I(f,-) [+] t, = {0, 1, , F1—1 }. Since there are A number of
such t1 elements within this range by Lemma 3.7, there are A repetitions for each ele-
ment iani through 10”,) [+] t1, 1 =0, --- , A—l.

(2) Ad1St1< 2Ad1 (i.e.,FiSt1< 2F1)
For each t1 within this range, I(f,-) [+] t1 = {F1, F1+1, ,2F1—1} by Lemma 3.2. Since
there are A number of such 1‘, elements within this range, there are A repetitions for each
element in Zzpl, - Zp‘, through I(f,~) [+] t1, 1 =A, -'- , 2A—1.

(M/F1) (M/F1—1)Ad1 S t1< M/F1-Ad2 (i.e., (M/F1—1)F1 S t1< M)
For each I, within this range, 1(f1) [+] t, = {(M/F1—1)F,~, (M/F1—1)F,-+1, ,
(M/F1—1)F1+F1—1} by Lemma 3.2. Since there are A number of such t1 elements

within this range, there are A repetitions for each element in ZM — Z (713.415. through
M
1(ft) [+] t1. 1 = (F-IM, ,M/FiA—l.
i

Since there are A repetitions for each element in ZM through 102-) [+] t1,

1 = 0, ..., , Fj—l, it is 2-optimal. 0 and l-optimal come from Theorem 3.1. CI

Example 3.5. Letf1 = {0, 1, 2, 3},f2 = {0, 1, 2, 3} and M = 16. Figure 3.3 shows the
bucket distribution by the FX distribution method with 1(f1) and IU1(fz). Here,
IU1(fz) = {0, 5, 10, 15} and Device N0 = TM(I(J1) [+]IU1(J2)), J1 6 f1, .12 6 f2.
We can see that the distribution of Figure 3.3 is perfect optimal.

Example 3.6. Let f1 = {0, l, 2, 3,4,5, 6, 7}, f2 = [0, 1} and M = 16. Figure 3.4
shows the bucket distribution by the FX distribution method with I (f1) and IU2(fz).

35

 

 

[(fl) 1U1(f2) DcviCCNO
0000 0000 0
0000 0101 5
0000 1010 10
0000 1111 15
0001 0000 l
0001 0101 4
0001 1010 11
0001 1111 14
0010 0000 2
(”10 0101 7
(X110 1010 8
0010 1111 13
0011 0000 3
0011 0101 6
0011 1010 9
(X111 1111 12

 

 

 

Figure 3.3. FX Distribution with I And [U 1 Transformation

Here,IU2(fz) = {0, l3} and Device No=TM(I(J1)[+]IU2(Jz)),J1e f1,J2 6 f2.

We can verify that the distribution of Figure 3.4 is perfect optimal.

3.4.4. U and IU,‘ Field Transformation Functions

We will show in Theorem 3.5 that for any values of F j, F k and M, FX distribution
methods distribute ordered 2-tuples in U (fj) x IUxo‘k) optimally. Lemma 3.8 and
Lemma 3.9 show useful properties of U and IU1 transformation functions, respectively.

These two lemmas will be used to prove Theorem 3.5.

Deﬁnition 3.1]. Let S = {51, . . . , 31} be a set of nonnegative integers. Then for any

nonnegative integer c we deﬁne S + c = {s1 + c, . . . , s, + c}.

Lemma 3.8. Letfj={0, 1, ...,Fj—l}such thatFj<M. Let dj=M/Fj. Then, for
any nonnegative integer J and c such that OSJ <Fj and O S c < dj,

U03) [+1 (de+c)=U(fj)+c.

36

 

 

10(1) IU2U2) DeviceNo
0000 0000 0
0000 1101 13
0001 0000 l
0001 1101 12
0010 0000 2
0010 1101 15
0011 0000 3
0011 1101 14
0100 0000 4
0100 1101 9
0101 0000 5
0101 1101 8
0110 0000 6
0110 1101 11
0111 0000 7
0111 1101 10

 

 

 

Figure 3.4. FX Distribution with I And I U 2 Transformation

Proof: By XOR uniqueness property, IU (fj) [+] JdJ-I = F j. Let ad}- 6 U (fj). Then
adj [+] de should be ij for some [3 e 2;]. Thus, adj [+] de e U(fj). Since
|U(fj) [+] del = F1 and ad,- [+] de e U(fj) for all ad,- 6 U(fj), U(fj) [+] de should
be equal to U(fj). de + c = de [+] c because c < d}. Thus,
U(fj)[+1-/dj[+]C=U(fj)[+]c=U(fJ-)+c. D

Lemma 3.9. Let f,- = {0, , F j—l} such that F jx<M for some positive integer x.
Then, for any K1, K2 6 fj, and for any (1 which is a power of 2, K1 mod d = K2 mod d
if and only if IU,(K1) mod d = IU,(K2) mod d.
Proof: The proof is an induction on x. Let d1 = M/F-i, i = 1, . . . , x.
(1) Basis. x = 1. We want show that K1 mod d = K2 mod d if and only if
(K1 [+] K1d1) mod d= (K2 [+] K2d1) mod d. Let or = logzd and [3 = log2d1.

Only if : When d S d 1, bit positions 0 to or — 1 of K1 and K 2 are not affected by

exclusive-or of K 1d 1 and K 2d 1. Therefore, this is true. When d > d1 , let K 1 mod d =

1|!

37

K2 mod d = (aa_2 a0)3. Then, (K1[+]K1d1) mod d =
(aa_2[+]aa_2_g aa_3[+]aa_3_13 - - ' a0)3. Clearly, (K2 [+] K2d1) mod d is also
(act-21+]act—2—s aa—3[+laa—3—B ' ‘ ‘ 00hr.

If : Assume otherwise. Then there exist K 1, K 2 such that (K 1 [+] K 1d1) mod (1 =
(K2 [+]K2d1) mod d, and K1 mod d at K; mod d. Let K1 mod d = (a01_2 -~ a0)3
and K2 mod d = (ba_2 - - . bo)3. Let i be the rightmost bit position such that a,- ¢ b1.
When i < B, clearly, contradiction. When i 2 B, a1[+]a1_15 = b1[+]b,-_B because
(K1 [+] K1d1) mod d = (K2 [+] K2d1) mod d by the assumption. Since i is the right-
most bit position such that a; at b1, this implies that a,-_13 = b1_13. Therefore, a contradic-
tion.

(2) Induction Step. Suppose this is true for x = p. We want to show that the lemma is
also true for x = p + 1. (Note that we consider only the case where F1” H < M). Let
(K1[+]K1d1[+]...[+]K1dp)=L1, and (K2 [+] K2d1 [+] . . . [+] szp)=L2.

Only If : K1dp+1 mod d is equal to szp+1 mod d. (It is easy to see that K1 mod d=
K2 mod d if and only if K1d’ mod d = sz’ mod d for any (1’ which is a power of 2.)
Since L1 mod d = L2 mod d by induction hypothesis, (L1 [+] K 1dp+1) mod d should be
equal to (L2 [+] K 2dp+1) mod d. Note that when d is a power of 2,
(L1 [+] K1dp+1) mod d: (L1 mod d) [+] (K1dp+1 mod d).

If : Assume otherwise. Then there exist K 1, K 2 such that
(K1[+]Kidil+] ' ' ' [+]Kidp+i) mod d = (K2[+1K2d1[+] ‘ ' ' [+]K2dp+1)m0d 4’ and
K1 mod d at K2 mod d. Let 0t =1og2d and B = logzdp+1. Let K1 mod (1 =
(aa_2 ' - - a0)3 and K2 mod d = (b 13-2 - - - b0)3. Let i be the rightmost bit position
such that a,- ¢ b1. The remainder of the proof is similar to that of case 1, i.e., x = 1.

This completes the induction. [1

Corollary 3.2. Let f,- = {0, , F j—l} such that F j" < M for some positive integer x.
Let S, i = O, . . . , d—l be a subset of IUx(fj) whose residue by modulus d is i, where d

is a power of 2 which is less than M. Then, for any such d, ISO! = |S1| = . . . =

38

[511.1 I.

Proof: This is a direct consequence of Lemma 3.9. [:1

Theorem 3.5. When there are only two ﬁelds j, k and the given number of devices is M
such that F j < M and F I“ < M for some positive integer x, the FX distribution method
with U (f,) and IUz(fk) is perfect optimal.
Proof: Let dj = M/Fj.
(case 1) Fij > M (i.e., Fk > dj)

Let Fij =AM. Then A =Fk/dj, i.e., Fk =Adj. Let K C- fk.

(1) K mod d1: 0 (i.e., K = de for some I 6 fl)
By Lemma 3.9, all IUx(K) elements such that K mod dj = 0 has the same residue co by
modulus dj. So, all such IUx(K) can be represented as adj + co for some variable or e
ij and some ﬁxed nonnegative integer (:0 which is less than d j. Since F k =Adj, there
are A number of IUx(K) such that K mod dj = 0. For such IUx(K), U(fj) [+] IUx(K) =
U (fj) + co by Lemma 3.8. Let this set be S 0. Then there are A repetitions for each
element in So through U(fj) [+] IUX(K), K = 0, d1, ~~ , (A—l)dj.

(2) K mod dj =1 (i.e., K =de + 1, for some I e fj)
By Lemma 3.9 all IUx(K) elements such that K mod dj = 1 has the same residue c1 by
modulus dj. For all such K U(fj) [+] IUX(K) = U(fj) + c1 by Lemma 3.8. Let this set
be S 1. Since there are A number of IU,(K) such that K mod d,- = 1, there are A repeti—
tions for each element in S1 through U(fj) [+] IUX(K), K =1, dj+1, -°~ , (A—1)a’j+1.

(dj-l) K mod dj = dj -1
By Lemma 3.8 and Lemma 3.9, for any IUx(K) such that K mod dj = dj—l,
U(f,-) [+] IUx(K) = U(fj) + Cdrl for some cdl._1 which is less than dj. Let this set be

$111.1. Then, there are A repetitions for each element in de_1 through U (fj) [+] I Ux(K ),

d—l
K = dj—l, - . - ,Adj-l. By Lemma 3.9, all c,-’s are different each other. So, (J51 =
1:0

39

ZM. Since there are A repetitions for each element in ZM through U (fj) [+] IUx(K),
K = O, - ~ - , F k—l, it is 2-optimal. 0 and l-optimal come from Theorem 3.1.

(case 2) Fij < M (i.e., Fk < dj)
The proof is almost the same as that of case 1. Here, enumeration ends at F k—l instead

of d j—l and there exists only one I Ux(K ) element at each step. El

Example 3.7. Letf1 = {0, 1, 2, 3},f2 = {0, 1, 2, 3} and M =16. Figure 3.5 shows the
FX distribution with U(f1) and IU1(fz). Here, U(f1): {0,4, 8, 12},IU1(fz)= {0,5,
10,15} and Device N0 = TM(U(J1) [+] IU1(12)), 116 f1, J2 6 f2. We can see that

the distribution of Figure 3.5 is perfect optimal.

 

 

U(fl) IU1(fZ) Device No
0000 0000 0
0000 0101 5
0000 1010 10
0000 1111 15
0100 0000 4
0100 0101 1
0100 1010 14
0100 1111 11
1000 0000 8
1000 0101 13
1000 1010 2
1000 1111 7
1100 0000 12
1100 0101 9
1100 1010 6
1100 1111 3

 

 

 

Figure 3.5. FX Distribution with U And [U 1 Transformation

Example 3.8. Let f1 = {0, 1, 2, 3, 4, 5, 6, 7},f2 = {0, 1} and M = 16. Figure 3.6
shows the FX distribution with U(f1) and IU2(fz). Here, U(f1) = {0, 2, 4, 6, 8, 10, 12,
14}. 1U2(fz) = {0, 13} and Device N0 = TM(U(Jr) [+] “1202», Jr 6 f1, 12 6 f2.

We can verify that the distribution of Figure 3.6 is perfect optimal.

40

 

 

U(fl) [U2(f2) DeviceNo
0000 0000 0
0000 1101 13
0010 0000 2
0010 1101 15
0100 0000 4
0100 1101 9
0110 0000 6
0110 1101 11
1000 0000 8
1000 1101 5
1010 0000 10
1010 1101 7
1100 0000 12
1100 1101 1
1110 0000 14
1110 1101 3

 

 

 

Figure 3.6. FX Distribution with U And IU2 Transformation

3.4.5. IU1 , IU2 , . . . , IUx Field Transformation Functions
In this section we will show that the set of IU1 transformation functions can give
optimal distribution for the case when there are several ﬁelds whose sizes are much less

than the given number of devices.

Deﬁnition 3.12. Let <k1,...,k,,> and <11,...,l,,>betwo ordered n-tuples. Then,
<k1,...,kn>=<[1,...,ln>,lfk1=11,...,kn=ln. Otherwise,<k1,...,k,,>¢<l1,.
.,l,,>.

Lemma 3.10. When there are only two ﬁelds i 1, i 2 such that F12 2 F11 and F122 < M
for the given number of devices M, the FX distribution method with IU 1(f1-1) and
[U 2(f1-2) is perfect optimal.

Proof: Assume otherwise. Then, there exist {J11,J12} ; f11, {121,122} ; fig such
that <111er> ¢ 41211222 and (J11[+1111d11)[+1(121[+11216121[+]er6122) =
(112[+]Jrzdlr)[+](122[+1122d21[+1122d22), Where 6111 = M/Fila 6121 = M/Fi2, 6122

41

(121/F12. (Note that F11F12 < M and d11 2 d21.) The above equation can be rewritten
as [(111 [+]-[12M 11 [+1021 [+1122)d21] [+] [(er [+]J22)d22[+1111 [+11 121+11211+11221 =
0. Since the second term (delimited by bracket) is less than d 21 , and the ﬁrst term does
not have binary value "1" between bit position 0 and log2d21 - 1, the only way the above
equation can be satisﬁed is that each term (delimited by bracket) should be equal to zero.

Thus, we have following two equations.
(J11[+1112)d11[+1(121[+1122)d21 = 0
(12111‘1122)€1221+1(111[+]J 121+11211+1122) = 0

(case 1) d11=d21
Then, J11[+]J12[+]Jz1[+]J22 = 0 from the ﬁrst equation. So, from the second equation
121 is equal to J22 which results in J11 = J12. This contradicts <J11,J21> ¢ <121,J22>.
(case 2) d11 > d21
Let d 11 = pd 21, where p is a power of 2. From the ﬁrst equation, J 21 [+]J 22 =
12 (J 11 [+]J 12). Substituting this in the second equation,
(1111+1112)Pd22[+1(111[+1112)[+](111[+1112)P = 0- Thus, the only way this equation

can be satisﬁed is J 11 = J 12 which implies 121 = J 22. Therefore, contradiction. C1

Deﬁnition 3.13. We deﬁne LMB(J), J e N, to be the leftmost bit position whose value
is "l" in the binary notation of J, and deﬁne RMB(J), J e N, to be the rightmost bit posi—
tion whose value is "1" in the binary notation of J. When J = 0, we deﬁne LMB(1) =

RMB(J)=-1.

For example, LMB(1) = O, LMB(3) = 1, LMB(10)= 3, RMB(1)= 0, RMB(3) = 0, and
RMB(lO) = 1. Here, we considered the rightmost bit position to be the bit position

ZCI'O.

Theorem 3.6. When there are only 1 ﬁelds, i 1, i2, il such that F11 .>_ F1(1_1) 2 2
F11 and F111 < M for the given number of devices M, the FX distribution method with

IU1(f,~1), , IU1(f,-1) is perfect optimal.

 

 

 

42

Proof: The proof is an induction on 1.

(1) When I: 2, this is true by Lemma 3.10.

(2) Suppose this is true until I = k—l. We want to show that this is true for l = k. For
convenience, let these k ﬁelds be 1,2, , k. (Note that we only consider the case where
sz 21:1, 11",," < M, and so F1F2...Fk < M.) Let F1, =p,,_1F,,_1 = pk-1pk_2Fk_2 = . ..
=pk-1pk-2 . - °p2F2 =pk_1pk_2 -~p2p1F1, where each pi, i = 1, , k-l, is either 1 or
some power of 2. Let d11 = M/F1, and d21 = M/Fz, dz2 = M0722, and . . . , and d11 =
M/Fk, . . . , d1,}, = M/Fk". Then,

dll=P1d21=P1P2d31=P1P2P3d41=---=p1P2°°'Pk—2d(k—l)l =P1P2 "'Pk—rdkr
6122 =P22432 =P22P32442 =- - -=P22P32~-Pk-22d(k—1)2 =P22P32---Pk—12dk2
433 =P33d43 =P33P43dss =. - - =P33P43---Pk—23d(k-1)3 =P33P43---Pk—13dk3

d (k —2)(k-2) = Pk-zk-Zd (k—1)(k —2) = Pk-Zk-ZPk—l k_2dk(k—2)
d(k-1)(Ic-1) =Pk—1k_l deC—l)
Assume otherwise, i.e., there exist {J11,J12} ; f1, . . . , [Jk1,Jk2} ; fk such that

d11, . . . ,Jk1>¢<]12, . . . , Jk2>and
(111[+1111d11)[+](121[+1121421[+1121d22)[+1 . . - [+](Jk1dk1[+] ---1+1Jk1dkk)=
(1121+11126111)[+](122[+1122421[+]Jzzd22)[+] - - . [+] (szdk2[+] - . . [+1Jk2dkk)

Here, Jkl it La. This is because if Jkl = sz, then <Jll, . . . , J(k_1)1> ¢
<I12, . . . , J (k_1)2> and hence the equality cannot be satisﬁed by the induction
hypothesis and XOR uniqueness property. The above equality can be rewritten as

[(111[+1712)d11[+](121[+]J22)421[+] . . - [+1 (J(k—1)1[+]J (k-1)2)d(k—1)1[+](Jkrl+l/kz)dk1][+]

[(121 [+1122)dzz [+] ~ . . [+](J(k-1)1 [+1J(k—1)2)d(k-1)2 [+](Jk1 [+1Jk2)dkzl [+1

43

[(J(It-2)1 [+11 (k—2)2)d(k-2)(k-2) [+] (J(k-1)1 1+1J(k—1)2)d(k—l)(k—2) [+1 (111 [+1Jk2)dk(k—2)1 [+]
[(J(k—1)l [+] J(k-l)2)d(k-l)(lz—l) [+] (Jkr [+] Jk2)dk(k-1)1 [+1

[(1111 [+] 112M111: [+1 (111 H1112) [+] (121 [+1122)[+]. - 1 [+1011 [+1 J1:2)1 =0-

Now, we will show that each term delimited by bracket (i.e., each line in the above
equation) should be equal to zero.

(1) The k-th term (i.e., the last term) should be equal to zero because the last term can-
not be greater than dk(k_1), and all other terms do not have binary value ”1" between bit
position 0 and logzdk(k_1) — 1,

(2) Suppose (k-1)-th term is not zero. Then, (J(k_1)1 [+] J(k_1)2)d(k_1)(k_1) 2 dk(k_2).
This is because all other terms do not have binary value "1" between bit position
logzdk(k_1) and lngdk(k_2) — 1. So, the only way this inequality can be satisﬁed is
LMB ((J(k—1)1 [+1 J(k-1)2)d(k—1)(k—1)) 2 RMB((Jkl [+] Jk2)dk(k-2))- (If not we can 566
immediately either (k-l)—th term should be zero or there is no way the whole equation
can be satisﬁed.) Since RMB ((J(k_1)1 [+] J(k_1)2)d(k_1)(k_1)) should be equal to
RMB((J/cr [+]Jk2)dk(k—1)). and RMB((Jkl [+] Jk2)dk(k—2)) -RMB((Jk1 [+]Jk2)dk(k—1))
= longk, it follows that (J (k_1)1 [+] J (k_1)2) > F k. This contradicts Fk_1 S Fk. Hence
(k-1)-th term should be equal to zero.

(3) Suppose (k-2)-th term is not zero. Then, (J (k_2)1 [+] J (k_2)2)d (k_2)(k_2) should be
greater than or equal to dk(k_3). This is because ﬁrst, all the other terms do not have
binary value "1" between bit position logzdk(k_2) and lngdk(k_3) — 1, and second, from
(2) (J (k—1)1 [+] J(k—1)2)d(k—1)(k—1) < dk(k—2) and this implies that
(J(k_1)1 [+]J(k_1)2)d(k_1)(k_2) is less than dk(k_3). (If we multiply Fk_1 to the left-
hand-side, and multiply F k to the right-hand-side of the inequality
(J (k_1)1 [+] J (k_1)2)d (k_1)(k_1) < dk(k_2), this statement follows). The remainder of the

proof is almost the same as that of case (2)

44

By continuing this argument, all the terms delimited by bracket should be equal to
zero. Thus, we have the following k equalities.

(1111+1112)P1 " 'Pk—l [+] (1211+1Jz2)P2 ' ' 'Pk-l [+1 - - - [+](1(k—1)1[+1J(k-1)2)Pk—1 [+1(Jk1[+11k2)= 0

(~1211‘1'1-722)I722 ' ' 'PIc-r2 [+1 - . . [+] (Jot-m[+11(Ic—1)2)Pk—12 [+1 (Jkl [+1J12) = 0

011-211 [+1J(k—2)2)pk—2k_2Pk—lk—z [+] (Jot—m[+]J(k—1)2)Pz-1k-2 [+1 (111114112): 0
(Jot—1n [+]J(k-1)2)pk-lk-1 [+] (11:1 [+1Jk2) = 0
(Jk1[+1Jk2)dkk [+1 (11114-1112) [+1 - - . [+1 (111 [+1Jk2) = 0

Now, assume pk_1 = 1. Then, from (k-1)-th equality (J (k_1)1[+]J (k_1)2) [+]
(Jk1[+]Jk2) =0, and so from the (k-2)—th equality J (k_2)1 = J (k_2)2. This results in
J(k_3)1 = J(k_3)2 from (k-3)-th equality, and results in J(k_4)1 = J(k_4)2, . . . , J11 =
J 12. So, from the last equality Jk1 = sz, and hence J(k_1)1 = J (k_1)2. This contradicts
<J11, . . . , Jk1> at <J12, ...,sz>. Thus,pk_1 should be greater than 1.

In the (k-1)—th equality, since pk_1 is a power of 2, LMB (Jk1[+]Jk2) should be
greater than LMB (J 01-111 [+]J (k_1)2). (Otherwise, (k-l)—th equality cannot be satisﬁed.)
By the similar reason, it is easy to see that LMB (Jk1[+]Jk2) > LMB(J1-1[+]J,~2), i = 1,
k-l. Now, from the last equality, since LMB (Jk1[+]Jk2) > LMB (J11[+]J1-2), i = 1, k-l
and dkk is a power of 2, there is no way the last equality can be satisﬁed. We have
shown that there do not exist {J11,J12} ; f1, . . . , {Jk1,Jk2} ; fk such that
<J11,...,Jk1> ¢ <J12,...,Jk2> and
IU1(J11) [+] . . . [+] IUk(Jk1) =IU1(J12) [+] . . . [+] IUk(Jk2). This completes the

proof for l = k. Thus the theorem follows by the principle of induction. [:1

The following example shows the proof of Theorem 3.6 when l = 3, i.e., for [U 1,

[U2 andIU3.

45

Example 3.9. Let a ﬁle consist of three ﬁelds 1, 2, 3 such that F 3 2 F 2 2 F 1 and F 33 <
M for the given number of devices M. Suppose FX distribution with [U 1(f 1), I U 2(fz)
and IU3(f3) is not perfect optimal. Then, there exist {J11,J12} gf1, {121,122} gfz,

{J31,J32} gf3 such that <J11,J21,J31> at <I12,J22,J32> and

(1111+lfrrd11)l+1(121 [+]ledz1 [+1121422)[+](131 [+1131d31 [+1131d32[+11316133) =
(1121+1J12411)[+](1221+]J22421[+1122d22)[+](132[+1132431[+1132432[+1132433)
where d11 = M/F1, d21 = M/Fz, an = d21/F2, (131 = M/F3, (132 = d31/F3, (133 =
d32/F3. Here, J31 at J32. This is because if J31 = J32, then <J11,J21> ¢ <J12,J22>

and hence the equality cannot be satisﬁed by Lemma 3.10 and XOR uniqueness pro-

perty. The above equality can be rewritten as,
[(111[+1112)d11 [+1 (121[+1122)¢121 [+] (131[+1132)d31] [+]
[(121 [+1122)d 22 [+] (J31 [+1132)432] [+]
[(131[+1132)d33 [+] (J 11 [+1112) [+] (121[+1122) [+] (J31[+1132)] = 0

Let F3 = p2F2 = p1p2F1, where each pi, i = 1,2 is either one or some power of 2.
Then, 411 =Prdzr =prpzdsri and 422 =P22432-

Now, we will show that each term delimited by bracket (i.e., each line) should be
equal to zero.
(1) The third term (i.e., the last term) should be equal to zero because the last term is
less than d32, and the ﬁrst and the second term do not have any binary value "1"
between bit position 0 and log2d32 -— 1.
(2) Suppose the second term is not equal to zero. Then, (J 21 [+]J 22)d 22 > d31 because
the ﬁrst term does not have binary value "1" between bit position log2d32 and log2d31 —
1. Hence, LMB ((J21[+]J22)d22) 2 RMB ((J 31 [+]] 32)d 31). Otherwise, there is no way
the equality can be satisﬁed. (Here, note that (121 [+]122)d22 < d21 S d 11). Since

RMB ((J21[+]J22)d22) should be equal to RMB ((J31[+]J32)d32)a and

'1!

46

RMB ((131 [+1132 )4 31) - RMB ((J 31 [+11 32)d 32) = 10g2F 3. it follows that (1211+1122) >
F3. This contradicts F 2 S F3. Thus, the second term should be equal to zero, and

hence the ﬁrst term is also zero. Now, we have the following three equalities.
(1111+1112)P1P2d31 [+] (1211+1122)P2d31 [+] (1311+1132)d31 = 0
(121 [+1122)P22(132 [+1 (1311+1132)432 = 0
(1311+1132)433 [+] (111144112) [+] (12114-1122) [+1 (13114-1132) = 0

Here, assume p2 = 1. Then from the second equality, (J 21 [+]J 22) [+] (J 31 [+]J 32) = 0,
and so from the ﬁrst equality J 11 = J 12 which results in J31 = J 32 in the last equality.
This contradicts J 31 at J 32. Thus, P2 should be greater than one.

In the second equality, since p2 is a power of 2, LMB (J 31 [+]] 32) should be greater
than LMB (J 21 [+]] 22). Thus, from the ﬁrst equality, it is easy to see that
LMB (J 31[+]J 32) > LMB (J 11[+]J 12). Then, there is no way the third equality can be
satisﬁed except J31 = J 32 which is contradiction. Therefore, the FX distribution

method with IU1 (f1), [U2(fz) and IU3 (f3) is perfect optimal.

Corollary 3.3. Let a ﬁle consist of 1 ﬁelds, 1, 2, , . . ,I such that F1 2 F1_1 2 2 F1
and F11 < M for the given number of devices M. Then, for any set of ﬁelds
{i 1, ..., ik} g {1, 2, , . . ,1}, the FX distribution method with IU1-1(f1-1), ... ,IU1k(f,-k)
is perfect optimal.

Proof: This is a direct consequence of Theorem 3.6. 1]

3.4.6. I, U and [UK Field Transformation Functions

In this section we show that FX distribution methods can always give perfect
optimal distribution as long as the number of ﬁelds, whose sizes are less than the given
number of devices, is no greater than 4. Lemma 3.11 and Lemma 3.12 shows useful

properties of 1U, transformation functions in proving Theorem 3.7.

47

Lemma 3.11. Let fk = {0, . . . , F k-l} such that F k‘ < M for some positive integer x.
Let d,- =M/Fki, i = 1, . . . , x. Let K1, K2 6 fk, and or be some power of 2 which is less
than F k. Then, K1 and K 2 are in the same interval of size or from 0 to F k if and only if
IUX(K1) and IU,(K2) are in the same interval of size ord1 from O to M.

Proof:

Only If : When K 1 and K 2 are in the same interval of size or, (K 1 [+] K 2) S or-l and
hence (K1 [+] K2)d1 S (a—1)d1. Now,

1Ux(K1)[+]1Ux(K2)

=(K11+]Krdr [+]---l+]K1dx)[+](K2[+lK2dl[+]---[+]K2dx)
=(K1[+1K2)d1[+][(K1[+1K142[+1 - - . [+]K1dx)[+1(K2[+1K2d2[+] - - - [+]K2dx)1
<(K1[+]K2)d1+d1

The last inequality holds because [(K1 [+]K1d2 [+]. . . [+] K1dx) [+]
(K2 [+]K2d2 [+] . . . [+] szx)] < d1 by the construction of di, i = 1, , x. Thus,
IUx(K1) [+] IUx(K2) < ord1. This implies that IUx(K1) and IUx(K2) are in the same
interval of size ad 1.

If : When IUx(K1) and IUx(K2) are in the same interval of size ord1,
(K1[+]K1d1[+]. . - [+]K1dx)[+l(K21+]K2dl[+1- - . [+]K2dx) < 041- 50.
(K1 [+]K2)d1 + [(K1[+]K1d2[+] - - . [+]K1dx)[+] (K2[+]K2d2[+1 - - -[+1K2dx)]
< (1411. Thus,(K1[+]K2)d1<0td1,andsoK1[+]K2<oc. Cl

Lemma 3.12. Let fk = {0, . . . , F k—l} such that F k" < M for some positive integer x.
Let d1 = M/Fki,i= 1,. . .x. Let w = ord1 in which Otis a power of 2, and let B be a
power of 2 which is less than or equal to or. For some nonnegative integer c which is
less than M/w, let 5,10”) = {K e f, | cw SIUx(K) < (c+1)w, K mod [3 = 1'}. Then for
any such w, a, [3, c, ISOWM = |S1(CW)I =. . . 1511“”) I.

Proof: By Lemma 3.7, for any nonnegative integer c which is less than M /w,

#{seIUxUk) I cw Ss < (c+1)w} = or. Let {K1, . . . K01} be the subset offk such that

‘

48

cw SIUX(K,-)<(c+1)w,i=1,...,0t. Then by Lemma 3.11, all K1, . . . , K1,1 are in the
same interval with size or, and hence {K 1, . . . , K 01} is a set of or consecutive integers.
Thus the lemma follows. 1:!

Theorem 3.7. When there are only three ﬁelds i, j, k whose sizes are less than the given
number of devices M, and F k" < M for some positive integer x which is greater than 1,
the FX distribution methods with I (f1), U (fj) and IUx(fk) is perfect optimal, if either (i)
there are at least 2 ﬁelds r and s such that r ,s e {i,j, k} and F,Fs 2 M or (ii) Fk 2 F]-
Proof.‘ It is clear that (i) is a sufﬁcient condition for perfect optimal distribution by
Theorem 3.1, 3.3, 3.4, 3.5 and Proposition 3.1. So, let us consider only the other case,
i.e., F,FS < M for any r, s e {i, j, k}. We want to show that (ii) is sufﬁcient for perfect
optimal distribution for this case. Let d,- = M/Fj and dkl = M/Fk’, l = 1 , - - - , x.
Then, dj > F11dj> F1, and d,- > (11,1 and d11 > F,-_dk1>F,-.

(case 1) FiFij 2 M

Let FiFj-Fk =AM, and let (1, = Bdk1 and dk1 = CF}, where A,B and C are some positive
integers. Then, FiM/dek =AM or F1=Adlek =ABdk2. Since dj > F1, 10",) [+]

U(fj)=So US1U...USFJ,_1,where

s0 = {0, 1, ...,F1—1}
$1 = {dj, dj'i'l, , dj+F;—1}

Si = {ldj,idj+1, , (i+1)dj+F1~—1}

SFJ-l = ((Fj—1)dj, (Fj—1)dj+1, , (Fj—1)dj+F1'—1}

Clearly, all 51’s are disjoint. By deﬁnition 3.11 S,-+c = {idj+c, id j+l+c, ,
id j+F ,-—l+c} for any positive integer c which is less than d j. Let K e fk. By Lemma
3.7 there is exactly one element of K [+]de2[+] . . . [+]deJr at each interval from 0 to

dk1 with interval size dk3 (Note that this can be obtained by substituting dk1 for M in

49

Lemma 3.7).

(1)0 S (K [+] K dkz [+] . . . [+]Kd1a) < F,-
Since F,- = ABdkz, there are AB number of IUx(K) such that
US (K [+] K dkz [+] . . . [+]Kd1a) < F1.
(1-1) Kmod B = O (i.e., de1 mod dj = 0)

F~-l
For each IUx(K) within this range, [(f,-) [+] U(fj) [+] IUX(K) = OS,- The equality
i=0

holds due to Lemma 3.1, and Lemma 3.2 (Note that K[+]de1[+] - - - [+]de2 =
de1[+](K [+] de2[+]...[+]Kd/a)). Let this set be T11. Since there are A number of
such I Ux(K )’s within this range by Lemma 3.12 (this statement follows by substitute dk1
for M in Lemma 3.12), there are A repetitions for each element in T11.

(1-2) KmodB =1 (i.e., de1 mod (11 =dk1)

F-—1
For each IUX(K) within this range, I(f,-) [+] U(f1) [+] IU,(K) = (J S; + dk1. The equal-

1-
ity holds due to Lemma 3.1 and Lemma 3.2. (Note that dk1<dj.) Let this set be T12.
Since there are A number of such IU1(K) elements within this range by Lemma 3.12,
there are A repetitions for each element in T12

(l-B) K mod B = B —1 (i.e., de1 mod (1, = (B -—1)dk1)
F-—1
For each IUx(K) within this range, I(f,-) [+] U(fj) [+] IU1(K) = L) S,- + (B —l)dk1. Let
i=0
this set be T13. Since there are A umber of such IUX(K) elements within this range,
there are A repetitions for each element in T1 3.
(2)F1$K[+]de2 < 2F;
(2-1) K mod B = 0

F-—1
For each IU1(K) within this range, [(f1) [+] U(fj) [+] IUX(K) = Q S,- + F1. Let this set
i=0

50

T21. By the same reason as previous, there are A repetitions for each element in T21 .

(2-B) KmodB=B—l
171—1
For each IUx(K) within this range, 1(f1) [+] U(fj) [+] [Ux(K) = US,- + (B —1)dk1 + F1.
1'-
(Note that dk1 >F1.) Let this set be T23. By the same reason as previous, there are A

repetitions for each element in T23.

(C) (C-1)F,- SK [+1de2 < CF,-
(C-l) KmodB =0

For each IUx(K) within this range, I(f,-) [+] U(f1) [+] IUx(K) = U S,- + (C—1)F,-. (Note
i=0

that CF ,- = dk1.) Let this set be T51. By the same reason as previous, there are A

repetitions for each element in TC 1.

(C-B) KmodB=B-1
For each IU1(K) within this range, I(f,-) [+] U(f1) [+] IU1(K) =

F.—1
US,- +(B—1)dk1 + (C —1)F,-. Let this set be TCB. By the same reason as previous,
i=0

there are A repetitions for each element in TCB. Now, it is not difﬁcult to see that
i=C,j=B . . . . .
U T1,- : ZM and all T1,- ’s are d1s301nt. Srnce we already show that there are A repen-
i=1.j=1
_ i=C,j=B .
trons for every element in U T,-- , it is 3-opt1mal. 2-optimal come from Theorem
i=1,j=1

51

3.3, 7, 8. 0 and l-optimal come from Theorem 3.1.
(case 2) F117ij < M

The proof is almost the same as that of case 1. Cl

Corollary 3.4. Let L be the number of ﬁelds whose sizes are less than the given
number of devices M. FX distribution methods can be always perfect optimal, if L S 3.

Proof: When L = 0, 1, 2, it follows from Theorem 3.1, 3.2, 3.3, 3.4, 3.5 and Proposi-
tion 3.1. When L = 3, let i, j, k are ﬁelds whose sizes are less than M and F,- 2 Pk 2 F1.
Ika2 2 M, apply I (f1), U (f,) and IU1(fk) transformation. Since F 1F k 2 M, the corollary
follows by (i) of Theorem 3.7. If F12 < M, apply 1(f1), U (f,-) and IU2(fk) transforma-

tion. Then the corollary follows by (ii) of Theorem 3.7. [:1

Example 3.10. Let f1 = {0, l, 2, 3},f2 = {0, 1},f3 = {0, 1} and M = 8. Figure 3.7
shows the FX distribution with I(f1), U(fz) and IU2(f3). Here, U(fz) = {0, 4} and
1U2(fs) = {0. 7}. and Device N0 =TM(I(J1)[+]U(12)[+]1U2(13)).Jr E f1.12 6 f2.
J3 6 f3. (Here, we can also verify that the FX distribution with I (f1), U (f;) and
IU1(f3) gives perfect optimal distribution because the ﬁle system of this example also

satisﬁes condition (i) of Theorem 3.7).

We have shown that by various combinations of ﬁeld transformation functions FX
distribution methods give strict optimal distribution for many types of partial match
queries. Here, it should be emphasized that these ﬁeld transformation techniques along
with Proposition 3.1 increase the scope of optimality considerably. This is because by
Proposition 3.1 optimal distribution for a subset of ﬁelds guarantees strict optimal distri-
bution for many partial match queries in which those ﬁelds are unspeciﬁed. For exam-
ple, let a partial match query unspecify ﬁelds 1, 2, 3, where I f 1 l = 8, l f2 I = 4, I f3 I =
8, and M = 32. When ﬁeld 1 is U transformed and ﬁeld 2 is I-transformed, then regard-
less of ﬁeld 3, the distribution is strict optimal for this query by Theorem 3.3 and Propo-

sition 3.1.

52

 

 

I(f1) U(fz) I(1203) DeviceNO
000 000 000 0
000 000 111 7
000 100 000 4
000 100 111 3
001 000 000 1
001 000 111 6
001 100 000 5
001 100 111 2
010 000 000 2
010 000 111 5
010 100 000 6
010 100 111 1
011 000 000 3
011 000 111 4
011 100 000 7
011 100 111 0

 

 

 

Figure 3.7. FX Distribution with I, U And I U 2 Transformation

Now, the summarized results of FX distribution methods are as follows (here, all
the theorems, lemmas and corollary in section 3.3 also hold for (Extended) FX distribu-

tion methods)

Let a ﬁle consist of 11 ﬁelds and there be M parallel devices. Let L be the number
of ﬁelds whose sizes are less than the given number of devices M. FX distribution
methods can be always perfect optimal, when L S 3. Let L be greater than or equal to
4. Let (1qu ) be the set of ﬁelds which are unspeciﬁed for partial match query q. Then,
FX distribution methods are strict optimal for partial match query q, if at least one of the

following conditions holds.
(1) lq,,(f)|=0 or 1
(2) there is at least one ﬁeld i e qu(f ) such that F1 2 M.

(3) | qu(f ) I = 2 and transformation methods of two ﬁelds in qu(f ) are different. (ref.

section 3.4.1.)

53

(4) I q“(f) I = 3 and either
(a) there are at least two ﬁelds i, j e qu(f ) such that F 1F ,- .>_ M and transformation
methods of two ﬁelds i and j are different, or
(b) transformation methods of three ﬁelds in qu(f) are I, U, IUJr for some x 2 2,

and the size of IUx transformed ﬁeld is not less than the size of U transformed

ﬁeld.
(5) | qu(f) | 2 4 and either
(a) there are at least two ﬁelds i, j e qu(f ) such that F 1F j 2 M and transformation
methods of two ﬁelds i and j are different, or
(b) there are at least three ﬁelds i, j, k e qu(f ) such that F 1F jF k 2 M and transfor-
mation methods of ﬁelds i, j and k are I, U, I UJr for some x 2 2, and the size of

I U, transformed ﬁeld is not less than the size of U transformed ﬁeld.

(c) the ﬁelds in qu(f ) satisfy Theorem 3.6

Note that these are only sufﬁcient and are not necessary conditions.

It is unfortunate that FX distribution methods do not always guarantee perfect
optimal distribution when the number of ﬁelds whose sizes are less than M, is greater
than or equal to 4 in general. In fact, it has been shown in [Sun87] that when the
number of ﬁelds whose sizes are less than the given number of devices, is greater than or
equal to 4, there is no method which always gives perfect optimal distribution. How-
ever, even for these cases we will show through performance experiments that FX distri-

bution methods still gives near optimal distribution for most queries.

3.5. Performance Comparison with Other Distribution Methods

In this section we compare the performance of FX distribution methods with those
of the DM and GDM distribution method. The performance comparisons are based on

the probability of strict optimality and the average response time for a given partial

54

match query. In section 3.5.1 the probability of strict optimal distribution for partial
match queries is given. In section 3.5.2 we compare the average response time for the

FX, DM and GDM distribution method.

For both section 3.5.1 and 3.5.2, it is assumed that the probability of each ﬁeld
being speciﬁed is same for all ﬁelds and some ﬁeld being speciﬁed is independent of

each other.

3.5.1. Probability of Strict Optimality

In this section we show that the probability of strict optimality for FX distribution
methods is much higher than the DM distribution method. Even for the worst case the
decrease of probability of strict optimality for FX distribution is not much. On the other
hand, in the DM distribution the decrease is quite large. Since no general method has
been given to determine the existence of parameter values for strict optimal distribution
in the GDM method, we compare FX distribution methods to DM distribution method
only.

Let the ﬁle consist of 7 ﬁelds and M = 32. Let F1: 2, F2 =F3 = 4,
F 4 = F5 =F6 = 8, F7 = 16. In the DM distribution method the probability of strict
optimal distribution for some partial match query is 0.0547 (computed from the
optimality conditions given in [DuS82]). On the other hand, in FX distribution methods
with IU1(f1), IU2(fz), U(f3), I(f4), U(fs), IU1(f6), 1(f7), the probability of strict
optimal for a partial match query is 0.9531 (computed from the sufﬁcient conditions in
section 3.3 and 3.4). Therefore, in this example FX distribution methods give much
higher probability of strict optimality than the DM distribution method.

Figure 3.8 and 3.9 show the percentage of strict optimal distribution for all possible
partial match queries in a given ﬁle system. In these ﬁgures DM denotes the results of
the DM distribution method and FX denotes the results of FX distribution methods.

Here, the results are computed from sufﬁcient conditions given for each method. Figure

55

3.8 shows the case where for any two ﬁelds r and 3 whose sizes are less than the given
number of devices, F ,F s 2 M. We used ﬁles with six and ten ﬁelds. In this ﬁgure FX

distribution methods used I, U and I U 1 transformation methods.

Figure 3.9 shows the percentage of strict optimal distribution when for any two
ﬁelds r and s whose sizes are less than the given number of devices, F ,F s < M, but for
any three ﬁelds r, s and t whose sizes are less than the given number of devices, F,F,F1
2 M. Here, in FX distribution methods I, U and I U 2 transformation methods are used.

These results show that FX distribution methods give high probability of strict
optimal distribution for partial match queries in typical ﬁle systems. However, since
there is no convenient way to compute probability of strict optimal distribution in more
general ﬁle systems, in the next section we give results of performance experiments

based on average response time.

3.5.2. Average Response Time

Deﬁnition 3.14. For a given partial match query q, r,-(q) is deﬁned as the number of
qualiﬁed buckets in device i for a partial match query q. We call this the response size
of device i for a partial match query q. Then, the largest response size for a partial

match query q is deﬁned as MAX(r 1 (q), r2(q), - . - rM_1(q)).

For the response time of a partial match query, we will consider two factors,
namely, largest response size and CPU computation time for bucket address calculation.
In parallel disks environment, largest response size is the most important factor, while in

main memory databases, CPU computation time is more important.

When systems are conﬁgured such that the data retrieval time for any device is
almost the same, the response time for a partial match query is determined by the device
which has the largest number of qualiﬁed buckets. For example, parallel disks con-
nected to one shared bus, or some of the multiprocessors based on multistage intercon-

nection networks are considered to be such systems.

Ill

 

Percentage of 60 _
Strict Optimal DM
Distribution 40 —

 

 

20 —
0 I I I I I I
O 1 2 3 4 5
Number of Fields Whose
Sizes Are Less Than M

(a) Number of Fields = 6

 

 

 

100
FX
80—
Percentage of 60— DM
Strict Optimal
Distribution 40-
20—-
0 I I I I I I I I I I
012345678910
Number of Fields Whose
Sizes Are Less Than M

(b) Number of Fields = 10

Figure 3.8. Strict Optimality When for Any Fields r and s, F ,F s 2 M

57

FX

Percentage of 60 _
Strict Optimal
Distribution 40-

 

DM

 

 

 

20 —
O I I I I I I
O 1 2 3 4 5 6
Number of Fields Whose
Sizes Are Less Than M
(a) Number of Fields = 6
1
00 FX
80 —

Percentage of 60 _ DM

Strict Optimal
Distribution 40 “

 

 

20—
0 I I I I I I I I I I
012345678910
Number of Fields-Whose
Sizes Are Less Than M

(b) Number of Fields = 10

Figure 3.9. Strict Optimality When for Any Fields r, s and t, F ,F SF 1 2 M

58

Table 3.1 through 3.8 show the largest response size of the DM, GDM and FX dis-
tribution methods for various ﬁle sizes with various number of parallel devices. The
number of ﬁelds is six for all these experiments. In all these tables, the ﬁrst column

denotes the number of unspeciﬁed ﬁelds.

For the GDM method, in order for comparison to be fair, we used seven different
sets of multiplication parameters. These sets are GDMl : 3, 11, 23,37, 49, 53 and
GDM2 : 5, 9, 31, 37, 53,59 and GDM3 :41, 43, 47, 51, 53,57 and GDM4 : 3, 5, 7, 11,
13,17 and GDM5 :3, 7,13, 43, 51, 57 and GDM6 : 2, 3, 5, 7,11,13 and GDM7 :2, 5,
11, 43, 51, 57. Here, the sets of parameters in GDMl, GDM2, GDM3, GDM4 and
GDM5 are chosen based on [DuS82], i.e., relative prime to the given number of devices.
GDM6 and GDM7 are used to include other cases.

For FX distribution methods, ﬁeld transformation functions applied in each experi-
ment are as follows.

Table 3.1:<I, U, IU2, IU3, I, IU1>, Table 3.2: <U, IU3, IU4, I, IU1, IU2>,
Table 3.3 : <l, U, IU1, I, U, IU1>, Table 3.4: <I, U, IU1, I, U, IU1>,

Table 3.5 : <IU4, U, IU3, I, IU1, IU2>, Table 3.6: <U, IU1, IU3, IU4, I, IU2>,
Table 3.7 : <U, IU3, IU4, I, IU1, IU2> and Table 3.8 : <I, U, IU2,I, U, IU2>.

Here, the sequence of transformation functions denotes the sequence of ﬁelds to which
these transformation functions are applied. Heuristics based on the theorems in section
3.3 and 3.4 are used to choose these transformation functions.

In all these tables, each entry is computed as an average value of largest response
sizes from all possible partial match queries for that entry. The tables show that except
for ﬁrst row out of table 3.4 and 3.8, FX distribution methods give smaller largest-
response-size than all the other methods. FX distribution is also very close to optimal.
It should also be noted that there may exist a set of multiplication parameters by which

the GDM method can give better performance than those of GDMl, GDM2, GDM3,

 

59

Table 3.1. Response Time for F 1=F2=F3=F4=2, F5=F6=4 and M = 16.

 

 

 

 

 

 

 

 

 

 

 

 

DM GDMl GDM2 GDM3 GDM4 GDM5 GDM6 GDM7 FX Optimal
2 2.1 1.3 1.7 1.4 1.3 1.5 1.3 1.5 1.1 1.0
3 4.4 2.2 3.1 2.2 2.2 2.4 2.3 2.6 1.6 1.2
4 10.3 4.4 6.2 4.3 4.2 4.7 4.5 5.1 3.0 2.7
5 22.3 8.7 12.3 8.2 8.2 9.2 9.0 10.7 6.7 6.7
6 52.0 18.0 26.0 17.0 18.0 19.0 20.0 22.0 16.0 16.0
Table 3.2. Response Time for F 1=F 22F 3:2, F 4=F 5=F 6:4 and M = 32.
DM GDMr GDM2 GDM3 GDM4 GDM5 GDM6 GDM7 FX Optimal
2 2.4 1.2 1.5 1.3 1.2 1.3 1.2 1.3 1.0 1.0
3 5.7 1.9 2.9 2.1 1.9 2.2 1.9 2.2 1.5 1.1
4 14.8 3.8 6.0 3.8 3.5 4.2 3.5 3.9 2.9 2.2
5 36.0 8.2 13.3 7.8 7.5 8.8 7.8 8.7 6.6 6.0
6 92.0 18.0 28.0 18.0 18.0 20.0 19.0 20.0 16.0 16.0
Table 3.3. Response Time for F 1=F2=F3=F4=F5=F6=8 and M = 32
DM GDMl GDM2 GDM3 GDM4 GDM5 GDM6 GDM7 FX Optimal
2 8.0 3.8 4.5 3.7 3.4 3.9 3.3 3.6 3.2 2.0
3 48.0 19.2 21.9 18.9 18.3 20.2 18.1 18.9 16.0 16.0
4 344.0 133.8 143.3 132.5 131.6 138.0 130.5 132.7 128.0 128.0
5 2460.0 1034.7 1058.3 1031.7 1029.3 1045.3 1026.3 1029.7 1024.0 1024.0
6 18152.0 8210.0 8292.0 8202.0 8200.0 8224.0 8196.0 8198.0 8192.0 8192.0

 

 

 

Table 3.4. Response Time for F 1=F 2=F 3=F4=F 5=F 6:8 and M = 64.

 

 

 

 

 

 

DM GDM1 GDM2 GDM3 GDM4 GDM5 GDM6 GDM7 FX optimal

2 8.0 2.4 2.8 2.4 2.1 2.9 2.1 2.2 2.4 1.0
3 48.0 10.7 11.8 10.6 9.9 12.4 10.2 10.3 8.0 8.0
4 344.0 69.0 73.3 67.5 67.1 75.4 68.3 68.1 64.0 64.0
5 2460.0 522.3 531.3 517.3 516.2 538.2 520.5 517.0 512.0 512.0
6 18152.0 4115.0 4148.0 4102.0 4102.0 4146.0 4114.0 4102.0 4096.0 4096.0
Table 3.5. Response Time for F1=2, F2=F3=4, F4=F5=F6=8 and M = 128.
DM GDM1 GDM2 GDM3 GDM4 GDM5 GDM6 GDM7 FX Optimal

2 4.1 1.2 1.3 1.3 1.2 1.2 1.0 1.4 1.0 1.0
3 17.8 2.8 3.1 2.7 2.5 3.0 2.6 2.9 1.9 1.5
4 81.9 9.6 9.2 8.7 8.2 9.9 8.9 8.8 6.5 6.3
5 351.3 35.3 33.5 32.8 32.2 35.0 35.8 31.8 29.3 29.3
6 1456.0 142.0 134.0 133.0 132.0 135.0 145.0 131.0 128.0 128.0

 

Table 3.6. Response Time for F1=F2=F3=F4=4, F5=F5=8 and M = 256.

 

 

 

DM GDM1 GDM2 GDM3 GDM4 GDM5 GDM6 GDM7 FX Optimal
2 4.3 1.1 1.1 1.2 1.0 1.1 1.1 1.1 1.0 1.0
3 17.6 1.8 2.0 1.9 2.2 1.8 2.6 1.9 1.4 1.0
4 79.2 5.2 5.1 4.8 6.7 5.0 8.2 4.9 3.7 2.7
5 352.0 18.2 17.3 16.8 26.7 17.5 34.2 17.3 14.7 13.3
6 1592.0 73.0 72.0 70.0 119.0 71.0 155.0 69.0 64.0 64.0

 

 

 

 

61

Table 3.7. Response Time for F 1=F2=F3=4, F4=F5=F6=8 and M = 512.

 

GDM3 GDM4 GDM5 GDM6 GDM7 FX Optimal

 

ONUIshWN

 

DM GDM1 GDM2
4.8 1.0 1.0 1.1 1.0 1.0 1.1 1.0 1.0 1.0
22.8 1.6 1.8 2.0 2.6 1.6 3.2 1.6 1.4 1.0
114.8 4.4 4.8 6.2 9.6 4.5 12.3 4.3 3.5 2.2

569.0 15.8 17.3 25.8 44.8 16.0 58.3 15.5 13.3 12.0
2848.0 70.0 75.0 122.0 220.0 70.0 289.0 69.0 64.0 64.0

 

 

Table 3.8. Response Time for F 1=F2=F3=8, F4=F5=F6=16 and M = 512.

 

DM GDM1 GDM2 GDM3 GDM4 GDM5 GDM6 GDM7 FX Optimal

 

OUIJKWN

 

9.6 1.3 1.4 1.4 1.3 1.3 1.7 1.3 2.3 1.0
91.2 5.3 5.7 5.6 7.7 5.5 10.0 5.5 5.1 3.2
911.2 39.9 40.1 42.2 70.2 40.5 90.3 40.5 37.3 35.2

395.5 392.7 408.67 700.2 397.7 909.5 397.3 384.0 384.0

9076.0
9176.0 4144.0 4096.0 4096.0

90404.0 4129.0 4112.0 4313.0 6969.0 4139.0

 

 

62

GDM4 and GDM5 in Table 3.1 through Table 3.8. However, even though such set of

parameters exists, those can only be found by trial and error method.

In disk based database systems the computation time is not signiﬁcant compared to
disk access time. But in main memory databases CPU computation time is important.
Thus, we will discuss CPU computation time for the FX and GDM distribution method.

We use optimized instruction codes for comparing CPU computation time. In the
GDM method we use AND operation to implement modulo function. This is possible
because the number of devices is assumed to be a power of 2. In FX distribution
method, since the multipliers for U and [UK transformation are always power of 2, we
can substitute multiplication by shift operation. Note that we cannot do this in the
GDM method because multipliers in the GDM method are usually chosen from prime or

odd numbers. Function TM is done by AND operation.

In MC68000 processor, computation time of FX methods take much less than that
of the GDM method. (In MC68000, XOR takes 8 cpu clock cycles, ADD takes 4 clock
cycles, AND takes 4 clock cycles, n bit shift takes 6 + 2n clock cycles. But multiplica-
tion takes 70 clock cycles). In intel 80286/80386 processor the ratios of clock cycles

between different operations are almost similar to those of MC68000.

For main memory databases FX distribution methods are much faster than the
GDM distribution method. The computation time of the DM distribution method is less
than that of FX distribution methods, but as is shown in the Table 3.1 through Table 3.8,

the DM distribution method is not suitable for a large number of parallel devices.

3.6. Data Construction Methods

In this section we discuss data construction methods for the ﬁle which is distributed
by FX distribution methods. Data distribution methods determine the amount of access
concurrency while data construction methods affect storage characteristics and time for

each bucket access. We will present two approaches of data construction based on the

63

usage of multikey hashed directory. Multikey hashing for a given ﬁle with n ﬁelds pro-
duce a subset of T, where T = f1 xf2>< xfn. As discussed in chapter 2, T can be used

as either a real global directory or a virtual global directory.

3.6.1. Data Construction Based on Real Global Directory

We will describe methods of using T as a real global directory. Let GD =
[0..F1—l, . . . , O..F,,—1] be a multi-dimensional array in which the range of the i-th
dimension is 0.171-1. This is the same range of multikey hashing for ﬁeld i. Here, GD
serves as a real global directory. Each element of GD contains the address of a bucket.
When the directory is centralized, the problem is trivial. However, for maximum access
concurrency the directory also needs to be distributed among the access nodes. Data

construction methods for this case will be investigated in the rest of this section.

FX distribution methods partition multi-dimensional array GD into M subsets,
where M is the number of access nodes. Since a directory is also distributed among the
nodes, we have to have efﬁcient storage rule to locate the elements of this multi-
dimensional array. In other words, for each element of GD we have to deﬁne the local
address in each device. The similar ideas for distributing the elements of an array have
been used in MDA memory [Bat77] and the prime memory system [Law82].

In order to determine the local address efﬁciently, we need a few techniques which

are described below.

Deﬁnition 3.15. In a given ﬁle system, a Minimal Pivot Set (MP8) is any set of ﬁelds in
which the size of the cartesian product of those ﬁelds is no less than the given number of

devices M, and any subset of an MP8 is not an MP8.

Let there exist an MP8 whose cardinality is no greater than 4 in a given ﬁle system (if
not, no efﬁcient method of using T as a real global directory has been found). Let p be

the one of MPS’s whose cardinality is minimum, and p’ be the set of all the remaining

64

ﬁelds. In multidimensional array GD, rearrange ﬁelds such that those ﬁelds in p

become rightmost dimensions.

When IpI = 2, let n-1 and n be two ﬁelds in p. Here, ﬁeld n-1 and n denote the
ﬁelds which correspond to (n-1)-th and n-th dimensions in GD, respectively. Then,
apply I—transformation for ﬁeld n-1 and U-transformation for ﬁeld n. When Ipl = 3, let
n-2, n-1 and n be three ﬁelds in p such that Fn_2 2 F,, 2 Fn_1. Then, apply I, U and
[U 2 transformation to ﬁeld n-2, n-1 and n, respectively. Note that the sequence of ﬁelds
is important in above two cases. When |p| = 1, apply I transformation to the ﬁeld in p.

Let the sequence of elements in GD is based on row-major ordering, i.e, index ele-
ments of high dimensions change ﬁrst. Then, the storage rule of this multi-dimensional
array is as follows :

For a bucket < J 1, . . . , J,, > produced by multikey hashing (the sequence of ﬁelds in
the given bucket is the same as that in GD, i.e., i-th ﬁeld denotes i—th dimension in GD),

(1) Device No is determined by the FX distribution methods.

(2) The local address for this bucket in the device is Z J1m- + l [2 J 1K1]/MJ, where 71

iep’ iep

n n
= H/H Fj]/M, andM: HF-,ifi¢n and7t,-=1,ifi=n.

=i+1 j=i+l

 

The correctness of the local address calculation can be shown by the following proposi-

tion.

Proposition 3.2. Let GD’ = [0..F1—1, . . . , 0..F ,1—1] be a multi-dimensional array,
n

where the content of element [J 1, , J,,] is TM [[+](X 1-(J j))] which is the same as the
i=1

device number computed by the FX distribution method. Let S =

(a0, . . . , aM_1, aM, . . , am, . . ) be a linear sequence of the contents of GD’ by row-

major ordering. Let S,- be a subsequence of S such that S,- =
F1 x . . . Xpn

(am, am“, . . . ,a(,-+1)M_1). Then, for any S,-, i =0, ~-- , —M———l , a set of

65

elements in S,- is 2114. (It is assumed that there exists an MP8 whose cardinality is no
greater than 4.)

Proof: When a cardinality of p is one (i.e., there exists a ﬁeld whose size is no less than
the given number of devices), or the size of the cartesian product of ﬁelds in p is M, it
simply follows by Lemma 3.2. For other cases, the arrangement of ﬁelds in p ensures
that sequence as shown below.

(case 1) Ipl = 2 (i.e., I(f,,_1) [+] U(fn))

The proof immediately follows by Lemma 3.8.

(case 2) 'PI = 3 (i.eo 1(fn—2) [+] U(fn—i) [+11U2(fn))

Since F,,_2 2F" 2F,,_1, F,,2 < M. Let Fn_2F,,_1F,, =AM and F,,-1F,, = %. Then,

F,,_2 =AD. Let U(fn_1) [+] IU2(f,,) = R. By Theorem 3.5, R has F,,_1F,, distinct ele-

ments which are between 0 to M-l. Let d,,_1=M/F,,_1, i.e., d,,_1 = F,,D. Since
Fn—l

d,,_1 > F,,, by Lemma 3.8, R = U [U (f,,_1) +c i] for F,, different number of ci’s which
i=0

are between 0 to d,,_1. Now it is sufﬁcient to show that there exists exactly one c,- for
each interval of size D between 0 to d,,_1. This guarantees that for any set W of D con-
secutive elements in f,,_2 which are all in the same interval of size D, R[+]W = ZM by

Lemma 3.2.

Now, let d,,1 =M/F,, and d,,2 = M/Fnz. We want to show that for any K1, K2 5
f,, such that K2 > K1, c,- = (K1[+]K1d,,1[+]K1d,,2) mod d,,_1 and cj =
(K2[+]K2d,,1[+]K2d,12) mod d,,_1 are in different intervals of size D. Let dn_1 = Bdn1.
When K 1 mod B at K 2 mod B, it is easy to see that c,- and cj cannot be in the same inter-
val of size D because d,,1>D. Let K1 mod B = K2 mod B. This implies that K1 and
K 2 are in the different intervals of size B. Therefore, by Lemma 3.12, K1[+]K1d,,2 and
K 2[+]K 2dn2 should be in different intervals of size Bdnz between 0 to dn_1. Since

Bdnz = D, the proof follows. E]

66

When a multikey hash function (91 is used for a ﬁle V, using T as a real global
directory is advantageous if for most t e T, (91—10) 6 V with reasonable average chain
length. However, an appropriate (91 may not be easily determined for dynamic ﬁles
because the ﬁle size is not known in advance. Hence, the real global directory may turn
out to be very sparse or to have long overﬂow chain. Here, it should be noted that
applying dynamic hash function for 81 will cause signiﬁcant overhead due to intemode
data movement. The virtual global directory approach described in the next section can

avoid this problem.

3.6.2. Data Construction Based on Virtual Global Directory

In the previous section we described the method of using T as a real global direc-

tory. In this section we describe the method of using T as a virtual global directory. '

The idea of the virtual global directory is to use one more hash function (92 which is
local in each device. Let < J 1, . . . , J,, > be an ordered n-tuple produced by multikey
hash function (91 for some record. The local hash function @2 uses this ordered n-tuple
as an input key for its local directory. Since (92 does not affect data distribution,
dynamic hash functions [Fag79, Lar78, Lit80] can be used as (92. When T is used as a
virtual global directory, only the local directories physically exist. Each local directory

can dynamically grow and shrink while the virtual global directory is static.

This two—level mapping data construction is much more ﬂexible than the real glo—
bal directory because the storage utilization of directories is not affected by @1, and
dynamic hash function for local directories can handle dynamic ﬁles. Thus, a virtual
global directory approach is appropriate for dynamic ﬁles.

One disadvantage of the virtual global directory approach is that it may cause more
probings to ﬁnd qualiﬁed records than in the real global directory. This is because dif—
ferent ordered n-tuples produced by (91 can be mapped into the same local directory

entry by @2.

67

Let V1 be the set of ordered n-tuples produced by @1 for a given ﬁle which are
allocated into the same device, and V2 be the range of local hash function (92 in that
device. Let p1 = NH and p2 = IV2l. Let t be an average number of elements in V1
which are mapped into the same v 6 V2 by 62. Then, the probability P that @2_1(v) =

 

1.11
. . l l
forveV2,1s 1venb P= 1—— . Letc= /1. Then,t=—-—.
111
This can be derived by using lim (1 — 1 ) = e‘l’c, and t = L . For exam-
p1_,°. c111 (l—P)p2

ple, 1: = 1.58 when c = l, and so the number of probings increases. (Detailed description
about average chain length for various hashing algorithms can be found in [Knu73].)
However, since T consists of cartesian product of all ﬁelds, many elements in T
may not correspond to any record. On the other hand, the two level hashing scheme
(i.e., the virtual global directory approach) can always achieve efﬁcient storage utiliza-
tion of the directories because the input of 62 is only those n-tuples which have
corresponding records. Even for the case when a real global directory has many empty
entries, the two level hashing scheme can always guarantee efﬁcient storage utilization

with slightly increased number of probings.

CHAPTER 4

OPTIMAL DATA DISTRIBUTION FOR
MULTIATTRIBUTE RANGE QUERIES

4.1. Introduction

In this chapter we describe optimal data distribution for multiattribute range
queries. A multiattribute range query, also called a region query or orthogonal range
query, is an intersection query in which a multiplicity of the attributes are allowed to be
range speciﬁed in a query’s qualiﬁcation. The multiattribute range query differs from
the partial match query in that the range speciﬁcation is allowed in the multiattribute

range query while it is not allowed in the partial match query.

Several ﬁle structures have been proposed for handling multiattribute range queries
(hereafter, a range query denotes a multiattribute range query). These include various
types of specialized tree structures and hash based accesses. In [Ben75] the multidi-
mensional binary search tree, also called k-d tree, has been proposed, which is an exten-
sion of the standard binary search tree. Worst case analyses of these trees can be found
in [Lee77]. There are other types of the tree structure for range searching [Ben79].
Hash based accesses for range queries has also been investigated in the past. In
[Bol81], Bolour proposed box-array addressing functions which is a composition of ran-
domizing function and order-preserving function. It has been shown in that paper that
this hashing scheme is effective for answering small range queries. Several other types
of the ﬁle structure for range searching can also be found in [Ben79, Knu73]. A lower
bound on the complexity of range queries has been presented in [Fre81]. The main
focus of those research is on minimizing the number of bucket accesses. However, they

did not consider data distribution to enhance access concurrency. Though there are
68

69

some heuristic data distribution methods for multikey search queries [Du882, Kim88],
range speciﬁcation in a query’s qualiﬁcation is not considered in those methods. In this
chapter we will mainly focus on optimal data distribution for multiattribute range

queries to facilitate parallel processing of this type of applications.

A partial match query can be thought of as a special case of a range query because
a range query reduces to a partial match query when each speciﬁed range size is only
one. Thus, a partial match query will be treated as a special case of a range query in the

rest of this chapter.

The rest of this chapter is organized as follows. In section 4.2 we describe under-
lying ﬁle structure, basic deﬁnitions and assumptions. In section 4.3 we show the
inherent limitations for optimal data distribution for some types of range queries. We
describe optimal data distribution methods for range queries in section 4.4. The

optimality conditions for these methods are also given.

4.2. Deﬁnitions And Terminology

In this section we describe underlying ﬁle structures as well as relevant deﬁnitions

and assumptions.

We use partitioned hashed directory as a ﬁle structure. In other words, the access
structure for a ﬁle with n secondary keys is a partitioned hashed directory consisting of n
ﬁelds. This directory is based on multikey hashing scheme, where each hash function is
order-preserving. Multikey hash function is described in chapter 3. Several order-
preserving hash functions have been proposed in [Gar86,Hsi88]. In the following para-
graph we also give a method for implementing order-preserving hashed accesses by

small tables.
One way of implementing order-preserving hash functions is to represent the func-
tion explicitly by a table. Figure 4.1 shows the explicit representation of n functions for

a ﬁle consisting of n ﬁelds, where hi denotes the function (i.e., table to show the function

70

values) applied to i-th attribute values. Each table of the ﬁgure the ﬁrst column denotes
the interval of key values and the second column denotes the image for any domain

value in that interval.

     

hi hn

Figure 4.1. Explicit Representation of Multikey Hash Functions

In this structure a ﬁle consisting of n ﬁelds (i.e., n secondary keys) needs n tables.
Let f,- be the projection of the second column of table i, i.e., f1 = {0, 1, . . . , 7} in Figure
4.1. A bucket is deﬁned as an ordered n-tuple, <b1, . . . , b,,>, such that b,- is an element
of f1. Then, the cartesian product f1 x . . . x f,, represents the multikey hash directory.
As we discussed in chapter 3, this directory can be used as either a global directory or a

virtual global directory.

This representation requires additional storage for the tables as in Figure 4.1. Let
us consider the storage overhead of these tables for the ﬁle with ﬁve ﬁelds. Suppose the
directory consists of thirty thousand (about 215) entries (if a bucket contains 33 records
in average, the ﬁle contains about one million records). When all the ﬁeld sizes are the
same, the size of each ﬁeld is 8. If each entry of the tables takes ten bytes, the storage
overhead is only 400 bytes because we need 5 * 8 = 40 entries altogether. Thus, the

storage overhead is negligible and these tables can be easily stored in main memory.

P

71

We use a bucket as a unit of data distribution. The main focus of this chapter is to
investigate optimal bucket distribution for range queries in multikey hashing. Since we
are dealing with multikey hash ﬁle which is the same as that in chapter 3, all the termi-
nology deﬁned in chapter 3 are also used in this chapter. In the rest of this section we

describe deﬁnitions which are necessary in this chapter.

Deﬁnition 4.1. r(u, v) denotes the range speciﬁcation between it and v, where both

boundaries are closed.

We assume that a range query can have three types of ﬁeld value speciﬁcation which are
single value, range, and don’t care in which case it is denoted by *. We will not use r(u,
v) to denote a single value (i.e., range size is equal to one) and don’t care (i.e.,
unspeciﬁed). We also do not allow cyclic ranges like r(5,2), i.e., 5, 6 , . . . , F ,~—l, 0, l,
2. In other words, whenever r( u, v) is a range speciﬁcation for ﬁeld i, it is implied that u

< v and, either it =00rv¢F,—l.

Deﬁnition 4.2. A range query for ﬁles with n ﬁelds is denoted by [A 1, A2, . . . , An],

where for each i = l, . . . , n, A,- = *, or r(u, v), or w, where u, v, w 6 f1 and u < v.

We deﬁne types of range queries based on the number of range speciﬁed ﬁelds in a

query’s qualiﬁcation.

Deﬁnition 4.3. When the number of range speciﬁed ﬁelds is or in a query q, query q is
called type or range query.

Example 4.1. Let q1, q2 and q3 be queries in the ﬁle (DEPT, AGE, STATE) such that
q1 = [Math, r(20, 27), Ohio], q2 = [*, r(20, 27), *] and q3 = [r(Math, Physics), r(20,
27), *]. Here, r(20, 27), r(Math, Physics) denote range speciﬁcation, and "*" denotes
don’t care condition, i.e., the ﬁeld value is unspeciﬁed. Then, q1 and q2 are type 1
range queries, and q3 is type 2 range query. By the above deﬁnition a partial match

query such as [Math, *, Ohio] is a type 0 range query.

72

Deﬁnition 4 .4. Let R (q) be the set of buckets which satisfy the qualiﬁcation for a range
query q. The distribution method is called strict optimal for a range query q in a given

ﬁle system if each device has no more than [IR (q) I/M] number of buckets.

Deﬁnition 45. When the distribution method is strict optimal for all queries in type 0
through type or range queries in a given ﬁle system, it is called perfect optimal for type

(0 - 0t) range queries in that ﬁle system.

Example 4.2. Let f1 = {0, l, 2, 3}, f2 = {0, l, 2, 3} and M = 8. Figure 4.2 shows
three bucket distributions denoted by Distlibution-l, Distribution-2 and Distribution-3.
For example, the bucket <3, 1> is stored in device 5 for Distribution-l, device 2 in

Distribution—2 and device 7 in Distribution-3.

 

 

f1 f 2 Distribution-l Distribution-2 Distribution-3
0 0 0 0 0
0 1 1 1 4
0 2 2 2 2
0 3 3 3 6
1 0 4 4 1
1 1 5 5 5
1 2 6 6 3
1 3 7 7 7
2 0 0 7 2
2 1 l 6 6
2 2 2 5 0
2 3 3 4 4
' 3 0 4 3 3
3 1 5 2 7
3 2 6 l 1
3 3 7 0 5

 

 

 

Figure 4.2. Example Bucket Distributions

Let q1 = [r(l, 2), *] and q2 = [*, r(O, 1)]. Then, the Distribution-1 is strict optimal
for query q 1 , but is not strict optimal for query q2. The Distribution-2 is strict optimal

for query qz, but is not strict optimal for query q1. The Distribution-3 is perfect

73

optimal for type (0 - 1) range queries, and hence it is strict optimal for both queries q1
and q2. However, this distribution is still not perfect optimal for type (0 - 2) range

queries because it is not strict optimal for query q3 = [r(O, 2), r(O, 2)].

Deﬁnition 4.6. Let a ﬁle consists of n ﬁelds, and S1, i = 1, . . . , n, be a subset of f1.
Then, M (S 1 , . . . , 5,.) denotes a set of devices at which bucket <b1, . . . , bn>is stored,
where b,- 6 5;. For convenience, when S,- = {s1}, i.e., S1 has a single element, we will

use s,- instead of {s1}.

For example, in Distribution-1 of Figure 4.2, M(0, 0) = {0}, M({0, l}, 0) = {0, 4} and
M({0,1, 2}, O) = {0, 4}.

Since the perfect optimal distribution is the most desirable, it is worth investigating
whether a perfect optimal distribution always exists. It will be shown that perfect
optimal distribution for certain types of range queries does not exist inherently in many
cases. In the following section we discuss nonexistence of perfect optimal distribution
for certain types of range queries. We give data distribution methods for optimal
bucket distribution for various types of range queries in later sections. It will be shown
that the proposed data distribution methods are perfect optimal for certain types of range

queries, and strict optimal for a large class of range queries.

4.3. Limitations of Perfect Optimal Distribution

In this section we will show that there are inherent limitations for achieving perfect
optimal distribution for certain types of range queries. In section 4.3.1 we show that for
ﬁles with two or more ﬁelds, perfect optimal distribution for type (0 - 2) range queries
does not exist in many cases. In section 4.3.2 it will be shown that for ﬁles with three
or more ﬁelds, perfect optimal distribution for type (0 - 1) range queries is not always
possible. In both sections the sufﬁcient conditions for the nonexistence of perfect

optimal distribution will be given. In section 4.3.3 we discuss perfect optimal

74

distribution for type 0 range queries.

4.3.1. Type (0 - 2) Range Queries

In this section we show that perfect optimal distribution for type (0 - 2) range

queries does not exist in many cases.

Lemma 4.1. When a ﬁle consists of n ﬁelds (n 2 2) and the given number of devices is
M, perfect optimal distribution for type (0 - 2) range queries does not exist if there are at
least two ﬁelds i and j, and two integers a and b such that 2 S a < F1, 3 S b S Fj and ab =
M.

Proof: It is sufﬁcient to prove the lemma for ﬁles consisting of only two ﬁelds.
Assume perfect optimal distribution for type (0 - 2) range queries exists. Thus, we have
a distribution for this ﬁle system which is perfect optimal for type (0 - 2) range queries.
LetS= {S1, . . . ,sa} beasubsetofﬁ,wherea_>.2ands1 >0, s2 =s1+1, . . . ,sa =
sa_1+1, and T = {t1, . . . , tb} be a subset offj, where b 2 3 and t2 = t1+1, . . . , tb =
tb_1+1 such that ab = M. In other words, S and T are a and b consecutive values in
ﬁeld i and j, respectively, where IS x Tl = M. Clearly, when A and B are sets of con-
secutive elements in ﬁeld i and j, respectively, such that IA x Bl < M, M(A, B) is a set
whose size is IA x Bl. Thus, the sets M(s1, T), . . . , M (sa, T) are mutually disjoint and
M (S, T) = ZM (otherwise, the distribution is not strict optimal for query q1 = [r(s1, sa),
r(t1, tb)]). Since the sets M(s1—1, T), . . . , M(sa—l, T) are also mutually disjoint, and
the size of cartesian product of these sets is M, M (s1—1, T) = M (so, T) (Figure 4.3
gives a pictorial view of these steps). We will show a contradiction for either case
when the size of T is even, or odd.

(case 1) b is even, (i.e., b 2 4)

Let T11 = {t1, . . . , rm}, and T12 = {t1,/2.11,. . . , tb}. The sets M(s1—l, T11) and
M (sa, T11) are disjoint (otherwise, the distribution is not strict optimal for query q2 =

[r(s1-1, Sa), r(t1, t1, 0)]. Note that the number of qualiﬁed buckets for qz is less than

75

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

1". fi Device No
51-1 _____
_ _T_11 _ F _ _>
_ _T_12 _ :
Sl _ _ _ E
. _T_11 _
_T_12 _ :
_ 5 Do not have
_ : qualiﬁed
1 E bucket for
Sr ————— I 43
_ -731 _ i
_ _T_12 _ E
s. _____ I
_ _T_1_1 _ L
_T_12

 

 

 

 

 

 

 

 

 

 

Figure 4.3. Nonexistence of Perfect Optimal Distribution for Type (0 - 2) Range Queries

76

M). Thus, M(s1—1,T11) is equal to M(sa, T12). and M(s1-1, T12) is equal to
M(sa’ T11)'

Now, let us consider range query q3 = [r (s1-1, so), r(t1, $12.11)]. Suppose the
bucket <s1—1, tb12+1> is stored at device mo, and the bucket <sa, 0,1211 > is stored at
device m1. Since tb,2+1 e T12 and M(s1—1, T12) = M(sa, T11), mo is an element of
M(sa, T11). Similarly, m1 is an element of M(s1-1, T11). Thus, device mo and m1

have at least two qualiﬁed buckets for query q3. However, the devices in

a-l
UM (sk, {t1,/2+2, , tbj) do not have any qualiﬁed buckets for query q3. This contrad-
k=l

lets the assumption.

(case 2) b is odd

Let T11, = {11, . . . , tb’} and T12, = {tb’+1: . . . , tb}, WhCI‘C b, = 'b—:1—. T116 8618

M (s1—1,T11’) and M (sa,T11’) are disjoint (otherwise, the distribution is not strict

optimal for query q4 = [r(s1—l, sa), r(t1, tbr)]. Note that the number of qualiﬁed buck-
ets for (14 is less than or equal to M because (a +1)(P;2*i) S ab when a 2 2 and b 2 3).

However, M(s1—1, T) = M(sa,T) and M(s1—1,T11’) n M(sa,T11’) = 0 cannot be
satisﬁed at the same time because |T11’I > ITI/2. This is a contradiction.

CI

Example 4.3. This example explains the proof of Lemma 4.1 through an example ﬁle
system. Let f1 = {0, 1, 2, 3}, f2 = {0, l, 2, 3} and M = 8. Note that this ﬁle system
satisﬁes the conditions of Lemma 4.1. Let buckets be distributed as in Figure 4.4,
where mk, k = 0, , 15, denotes an element in Z 3. Since we have only eight devices,

m1, at m, is not implied by u at v in the table.
Suppose Figure 4.4 is a perfect optimal distribution for type (0 - 2) range queries.

Then, (mo, m1, . . . , m7} = {m4, m5, . . . , m11} =Z3. Otherwise, the distribution is

not strict optimal for at least one of the range queries q1 = [r(O, 1), *] and q2 = [r(l, 2),

77

 

 

 

 

 

 

 

f1 f2 DeviceNo
0 0 m0
0 1 m1
0 2 m2
0 3 m3
1 0 m4
1 1 m5
1 2 m6
1 3 "l7
2 0 m3
2 1 m9
2 2 mm
2 3 m11
3 0 M12
3 1 "113
3 2 m14
3 3 "215

 

Figure 4.4. Bucket Distribution When F 1 = 4, F 2 = 4 and M = 8

*]. This implies that {m0, m1, m2, m3} = {m3, m9, m11), m11}. Since {m0, m1} and
{m3, m9] are disjoint (otherwise, the distribution is not strict optimal for range query q3
= [r(O. 2).r(0. 1) l). {mo.m1} = (mm. mm} and {mama} = {m81m9}. Let us con-
sider the range query q4 = [r(O, 2), r(O, 2)]. Then, the devices which have the qualiﬁed
buckets for this range query are m0, m1, m2, m4, m5, m6, m3, m9, m11). Here, m2 is
one of mg or 1719, and m11) is one of mg or m1. Thus, at least two devices have more
than one qualiﬁed buckets for this range query. However, m7 does not have any

qualiﬁed bucket for q3. This contradicts the assumption.

Lemma 4.2. When a ﬁle consists of n ﬁelds (n 2 2) and the given number of devices M
is equal to four, perfect optimal distribution for type (0 - 2) range queries does not exist
if there are at least two ﬁelds i and j such that F,- 2 3 and F J- 2 3.

Proof: It is sufﬁcient to prove the lemma for ﬁles consisting of only two ﬁelds. Sup-

pose perfect optimal distribution for type (0 - 2) range queries exists. Thus, we have a

78

distribution for this ﬁle system which is perfect optimal for type (0 - 2) range queries.
Let (mg, m 1 , m2, m3} be the set of four devices. Since any two of buckets <0, 0>, <0,
1> and <0, 2> should not be stored at the same device, let M (0,0) = {mg}, M(O, 1) =
{m1} and M(0,2) = {mg}. Since M(O, {0, 1}) and M(l, {0, 1}) are disjoint (otherwise,
the distribution is not strict optimal for query q1 = [r(O, l), r(O, l)]), and M(l, {0, 1})
and M(2, {0, 1}) are disjoint, M(O, {0, 1}) = M(2, {0,1 }). Since M(O, 0) is not equal to
M(2, 0), M(O, 0) = M(2, l) and M(O, l) = M(2, 0). Thus, bucket <2, 0> is stored at
device m1, and bucket <2, l> is stored at device m0. Since bucket <2, 2> cannot be
stored at any one of the devices m0, m1 and m2 (otherwise, the distribution is not strict
optimal for at least one of the range queries q2 = [2, r(O, 2)] and q3 = [r(O, 2), 2]), M(2,
2) = {m3}.

Now, for any allocation of bucket <1, l> to any one of the devices mg, m 1, m 2 and
m3, the distribution is not strict optimal for at least one of the queries q4 = [r(O, l), r(l,

2)] and q5 = [r(l, 2), r(l, 2)]. This is a contradiction. El

Theorem 4.1. When a ﬁle consists of n ﬁelds (n 2 2) and the given number of devices
is M, there does not exist a perfect optimal distribution for type (0 — 2) range queries if
( 1) there are at least two ﬁelds i and j, and two integers a and b such that 2 S a < F1, 3 S
b S F1 and ab = M, or (2) there are at least two ﬁelds i and j such that F,- 2 3, F1 2 3 and
M = 4.

Proof: This is a direct consequence of Lemma 4.1 and Lemma 4.2. [3

Thus, there does not exist a data distribution method which guarantees perfect optimal

distribution for type (0 - 2) range queries even for every ﬁle with only two ﬁelds.

4.3.2. Type (0 - 1) Range Queries

In this section we show that for ﬁles with three or more ﬁelds, there does not exist

perfect optimal distribution for type (0 - 1) range queries in many cases.

79

Theorem 4.2. When a ﬁle consists of n (n _>. 3) ﬁelds and the given number of devices
is M, there does not exist a data distribution which is strict optimal for any type 0 range
query that has at most two unspeciﬁed ﬁelds, and for any type 1 range query that has at
most one unspeciﬁed ﬁeld if (1) there are at least three ﬁelds i, j and k such that F ,- S F ,-
<Fk, and M <F1-F,C SFJ-Fk S 2M, and Fk 2 FJ- + Fij / M, and (2) there are two integers
bandc such thatij = ch =M.

Proof: It is sufﬁcient to prove the theorem for ﬁles consisting of only three ﬁelds. For
convenience, let these three ﬁelds be 1, 2 and 3 such that F 1 S F2 < F 3 and M < F 1F3
S F 2F 3 S2M and F3 2 F 2 + F 2F 3 / M. Suppose the theorem is not true. Then, we
have a distribution which is strict optimal for any query in the theorem. Let cF 3 = M
and bF2 =M. This implies that c < F1 and b < F3. Suppose bucket <0, 0, 0> is
stored at device m0, bucket <0, 0, 1> is stored at m1, . . . , and bucket <0, 0, F 3—1> is
stored at mp3_1. Clearly, for any u 6 f1, v 6 f2 and a set W which contains some con-
secutive elements in f3 such that IW | < M, M(u,v, W) is a set whose size is IW I, and
hence m0, . . . , m1.-3-1 are all different. We will show that buckets <0, 0, b> and <c, O,
b> should be stored at device mo. At ﬁrst, M(O, 0, f3) is equal to M(O, 0, f3) (other-
wise, the distribution is not strict optimal for at least one of the range queries q1 = [0,

r(O, c-l), *] and q2 = [0, r(l, c), *]). Thus, the possible set of devices at which bucket

<0, c, b> can be stored is (mo, . . . , mp3_1}. Bucket <0, 0, b> cannot be stored at any
one of the devices m 1, . . . , mb. Otherwise, for range query q3 = [0, *, r(l, b)], at least
one of the devices m1, . . . , m1, has two or more qualiﬁed buckets while there exist a

device which does not have any qualiﬁed bucket. Note that the number of qualiﬁed
buckets for q3 is M. Bucket <0, 0, b> cannot also be stored at any one of the devices
mb+1, . . . , mp3_1. Otherwise, the distribution is not strict optimal for range query q4 =
[0, *, r(b, F 3—1)]. Note that the number of qualiﬁed buckets for q 4 is less than or equal
to M. Thus, bucket < O, c, b> should be stored at device m0. (Figure 4.5 gives a pic-

torial view of these steps.)

80

 

 

 

 

 

 

 

 

 

f1 f2 f3 Device No
0 0 0

b
0 0 Fir-1
0 1
I) c-1
0 c 0
0 c b
0 c F3-1
1 .
c’-1 :
c 0 O
. o P
c 0 173—1

 

 

 

 

 

Figure 4.5. Nonexistence of Perfect Optimal Distribution for Type (0 - 1) Range Queries

Ill

81

Now, in order for the distribution to be strict optimal for queries q5 = [r(O, c-l), 0,
*] and 46 = [r(l, c), 0, *], M(c, 0, f3) should be the same as M(O, 0, f3). Thus, the pos-
sible set of devices at which bucket <c, 0, b> can be stored is {m0, . . . , m1.-3-1}.

Bucket <c, 0, b> cannot be stored at any one of the devices m 1 , . . . , mp3_1. Otherwise,

the distribution is not strict optimal for at least one of the queries 47 = [*, 0, r(1,b)] and
qg = [*, 0, r(b,F3—1)]. Thus, bucket <c, O, b> should be stored at device m0. How-
ever, this bucket distribution is not strict optimal for query q9 = [*, r(O, c), b] because
device m0 has at least two qualiﬁed buckets for q9 while there exists a device which
does not have any qualiﬁed bucket for q9. Note that cF 3 = M and F3 > F 2 + F 2F 3 /
M imply that the number of qualiﬁed buckets for qg is less than or equal to M. This

completes the proof. III

Corollary 4.1. When a ﬁle consists of n (n 2 3) ﬁelds and the given number of devices
is M, perfect optimal distribution for type (0 - 1) range queries does not exist if the same

conditions of Theorem 4.2 are satisﬁed.

Corollary 4.2. When a ﬁle consists of n (n 2 3) ﬁelds and the given number of devices
is M, there does not exist a data distribution which is strict optimal for any range query
that has at most one range speciﬁed and at most one unspeciﬁed ﬁeld if (1) there are at
least three ﬁelds i, j and k such that F,- < Fj < Fk, and M < F1Fk S Fij S 2M, and F1, 2
F j + F 1F], / M, and (2) there are two integers b and e such that bFJ- = CF 1, = M.

Proof: It can be observed that Theorem 4.2 is proved by considering range queries (q1
through qg) which have at most one unspeciﬁed ﬁeld. The only one possible exception
is query qo which may have two unspeciﬁed ﬁelds if F j = c + 1. When the condition F ,-
S F, < Fk is replaced by F,- < F j < F k from those of Theorem 4.2, the query q9 is
guaranteed to be a type 1 range query (i.e., r(O, c) is a range speciﬁcation) because F j >

F; 2 c + 1. Thus, the corollary follows. III

By Corollary 4.2, there does not exist a data distribution method which guarantees strict

82

optimal distribution for any range query that has at most one range speciﬁed and at most
one unspeciﬁed ﬁeld even for every ﬁle with only three ﬁelds. This also implies that
there is no data distribution method which guarantees perfect optimal distribution for

type (0 - 1) range queries for every ﬁle with only three ﬁelds.

4.3.3. Type 0 Range Queries

When each ﬁeld size and the given number of devices are power of 2, by Corollary
3.4 perfect optimal distribution for type 0 range queries is always possible if the number
of ﬁelds whose sizes are less than the given number of devices, is no greater than four.
The data distribution methods for type 0 range queries are also presented in chapter 3.
In [Sun87] it has been shown that perfect optimal distribution does not exist for binary

cartesian product ﬁles with n ﬁelds, if M 2 4 and n 2 [logZMJ + 2. This result implies

that perfect optimal distribution for type 0 range queries is not always possible for ﬁles
with four or more ﬁelds. However, for general ﬁle systems, sufﬁcient condition for
either existence or nonexistence of perfect optimal distribution for type 0 range queries

has not been found.

We have shown through Theorem 4.1 and Theorem 4.2 that there are inherent limi-
tations to achieve perfect optimal distribution for range queries even for a ﬁle consisting
of small number of ﬁelds. In the following sections we present optimal data distribution
methods for range queries. These methods are the extended version of FX distribution

methods presented in chapter 3.

4.4. Optimal Data Distribution Methods for Range Queries

In this section we present FX distribution methods for range queries. We describe
basic optimality conditions in section 4.4.1. In section 4.4.2 and the following sections
we will present ﬁeld transformation functions developed for range queries. The condi-

tions to achieve optimal distribution by these ﬁeld transformation techniques will also be

83

described. For convenience we assume, from now on, that each ﬁeld size and the given

number of devices M are power of 2.

4.4.1. Optimal Distribution by Basic FX Distribution Method

In this section we describe conditions for optimal distribution for various types of
range queries by using the Basic FX distribution method. The deﬁnition of the Basic

FX distribution method is given in section 3.3.

Example 4.4. Figure 4.6 shows the bucket distribution by the Basic FX distribution
method, where f1 = {0, l, 2, 3}, f2 = {0, 1, 2, 3} and M = 4. Here, Device N0 =
TM [J1 [+] J 2], where J1 6 f1, J 2 6 f2 and TM returns the rightmost two bits of the

result of J 1 [+] J 2.

 

 

f1 f2 Device No
000 000 0
000 001 1
000 010 2
000 011 3
001 000 1
001 001 0
001 010 3
001 011 2
010 000 2
010 001 3
010 010 0
010 011 1
011 000 3
011 001 2
011 010 1
011 011 0

 

 

 

Figure 4.6. Basic FX Distribution When F 1 = F 2 = 4 and M = 4.

We can verify that the distribution of Figure 4.6 is perfect optimal for type (0 - 1)

range queries but is not perfect optimal for type (0 - 2) range queries because it is not

84

strict optimal for query [r(O, 1), r(O, 1)]. In fact, by Theorem 4.1 there does not exist a

perfect optimal distribution for type (0 - 2) range queries for this ﬁle system.

Since we have shown that FX distribution methods are strict optimal for mosr class of
type 0 range queries in chapter 3, we will not discuss the cases when only type 0 range

queries are involved.

Lemma 4.3. For any type 0 range query which has at least one unspeciﬁed ﬁeld whose
size is greater than or equal to M, all the devices have equal number of qualiﬁed buck-
ets by the Basic FX distribution method.

Proof: The proof is almost the same as that of Theorem 3.2. III

Theorem 4.3. The Basic FX distribution method is strict optimal for any range query,
for which there exists at least one unspeciﬁed ﬁeld whose size is greater than or equal to
the given number of devices M.

Proof: Since a range query is a set of partial match (i.e., type 0 range) queries, and by
Lemma 4.3 we know that equal number of qualiﬁed buckets are distributed in all the

devices for each partial match query, the proof immediately follows. I]

Corollary 4.3. The Basic FX distribution method is strict optimal for any range query,
for which there exists at least one range speciﬁed ﬁeld such that the speciﬁed range size
is an integral multiple of the given number of devices M.

Proof: When the speciﬁed range size is F1, it is easy to see that the effect of the Basic
FX distribution method for this range is the same as that of the Basic FX distribution
method for an unspeciﬁed ﬁeld whose size is F, (note that for any integers J 1 and J 2,

TM(J 1 [+] J 2) = T M(J 1) [+] T M(J 2)). Thus, the corollary follows. III

Theorem 4.4. The Basic FX distribution method is strict optimal for any range query,
for which there is at most one range speciﬁed ﬁeld and all the other ﬁelds are speciﬁed
as single values.

Proof: The proof is almost the same as that of Theorem 3.1. E]

1.

85

Theorem 4.5. When all the ﬁeld sizes are greater than or equal to the given number of
devices M, the Basic FX distribution method is perfect optimal for type (0 - 1) range
queries.
Proof: When all the ﬁelds are speciﬁed as a single value, the proof is trivial. When at
least one ﬁeld is unspeciﬁed, the proof follows by Theorem 4.3. The only remaining
case is when one ﬁeld is speciﬁed as a range and all the other ﬁelds are speciﬁed as sin-
gle values. The proof for this case follows by Theorem 4.4. III

Theorem 4.3, 4.4 and 4.5 show sufﬁcient conditions for optimal distribution by the
Basic FX distribution method. However, the Basic FX distribution method does not
give optimal distribution for many types of range queries. The following proposition

gives the conditions for optimal distribution for these cases.

Proposition 4.1. Let q(f) = {i1, i2, , ik} be the set of range speciﬁed or unspeciﬁed
ﬁelds for a range query q. Let S1} be the set of elements in the range when ﬁeld i J- is
range speciﬁed, or be f1}. when ﬁeld i ,- is unspeciﬁed. Then, FX distribution methods are
strict optimal for a range query q, if there exists a set of ﬁelds {1' 1, , i j} c; q(f) such

that |S1l>< ><S,-j| is an integral multiple of M, and #{(J,-1, J,-}.)e
1'

S11>< x51}, | TM [H1019] :2} = |S,-lx x51). I/Mfor allze ZM.
p:

Proof: The proof is similar to that of Theorem 3.2. [I

Let S,- be the set of elements in the range when ﬁeld i is range speciﬁed, or be f1 when
ﬁeld i is unspeciﬁed. Proposition 4.1 says that we can guarantee strict optimal distribu-
tion for a given range query, if

(1) there exists a subset of the range speciﬁed or unspeciﬁed ﬁelds {i 1, . . . , i I} such

that |S1l x - - - x51]. | is an integral multiple of M, and

(2) the records projected on these sets of ﬁelds are distributed uniformly among the M

devices.

l!I

86

In other words, optimal distribution for a subset of ﬁelds guarantees strict optimal distri-

bution for many queries in which those ﬁelds are unspeciﬁed or range speciﬁed.

However, when the size of none of the ﬁelds is greater than or equal to M, the con-
ditions given in Proposition 4.1 are not satisﬁed in the Basic FX distribution method.
Thus, in the next section we propose ﬁeld transformation techniques for the ﬁelds whose
sizes are less than the given number of devices M. Note that the deﬁnition of ﬁeld
transformation functions is given in section 3.4. These ﬁeld transformation techniques
increase the scope of optimality over that of the Basic FX distribution method. By the
deﬁnition of ﬁeld transformation functions (i.e., one—to—one function), it is easy to see
that all the lemmas, theorems and proposition that hold for the Basic FX distribution
method also hold for FX distribution methods (i.e., the Basic FX distribution method

with ﬁeld transformation functions).

4.4.2. Field Transformation Functions for Range Queries

In the previous section conditions for optimal distribution have been described
when the Basic FX disribution method is used. In this section we present ﬁeld transfor-
mation techniques which can improve the performance over the Basic FX distribution
method signiﬁcantly. The following paragraph exempliﬁes the idea.

Whenf1 = {0, 1, 2, 3},f2 = {0, 1, 2, 3} and M = 8, the disribution by the Basic FX
distribution method is not strict optimal for many range queies (we can easily see that
the distribution in Figure 4.6 is not strict optimal for many range queries when M = 8).
Suppose X is an one-to-one mapping such that X (0) = O, X (1) = 4, X (2) = 2, X (3) = 6.
When each bucket <u, v > in the ﬁle f1 x f2 is stored at the device it [+] X (v), the dis-
tribution is strict optimal for any range query in which ﬁeld 1 is not speciﬁed as a range.
(It can be easily veriﬁed by substituting (100)B, (010),; and (110)19 for (0003, (010)];
and (011)B, respectively in f2 column of Figure 4.6. In fact this distribution is perfect

optimal for type (0 - 1) range queries.) Thus, our objective is to ﬁnd such mapping, X

'1:

87

in general, for the given ﬁle system. It will be shown that for any values of F 1, F 2 and
M, such mapping can be easily found. The following proposition gives sufﬁcient condi-

tion for the mapping X.

Proposition 4.2. Let a ﬁle consist of two ﬁelds i, j whose sizes are less than the given
number of devices M, and X be an injective function from N to ZM. Then, FX distribu—
tion method with I-transformation for ﬁeld i and X-transformation for ﬁeld j is strict
optimal for any range query in which ﬁeld i is not range speciﬁed, if for any L consecu-
tive valuesJ1, . . . ,JL in ﬁeldjsuch thatLSM/Fi, {X(J1), . . . ,X(JL)} = {kxFi +0): I
x=0, . . . ,L-l, andforallx,k,, e N,kx <M/F1,0Scx <F,-, andko at. ..¢kL_1}.

The proof for the proposition is straightforward from Lemma 3.2 (note that [-
transformation function denotes the identity function). Based on Proposition 4.2, we

describe an idea of how to ﬁnd the mapping X in the following paragraph.

Let F j = 8 and M = 16. The sizes of ﬁeld i in which we are interested are 2, 4, and
8. Note that when the size of ﬁeld i is greater than or equal to M, by Theorem 4.3 and
Theorem 4.1 the distribution is strict optimal for any range query in which ﬁeld i is not
range speciﬁed. When F,- = 8, in order to satisfy the condition of the mapping X in Pro-
position 42, X (0), X (1), . . . , X (7) should be in interval [0, 8) and [8, l6), alternately.
Here, "[" and "(" denote "closed" and "open", respectively. When F,- = 4, X (0), X (l), . .
. , X(7) should be in interval [0, 4), [4, 8), [8, 12) and [12, 16) in turn. Since the
sequence of intervals also has to satisfy the case when F; = 8, the above sequence of
intervals has to be reordered such as [0, 4), [8, 12), [4, 8) and [12, 16). When F,- = 2,
X(O), X(l), . . . , X(7) should be in interval [0, 2), [2, 4), [4, 6), [6, 8), [8, 10), [10, 12),
[12, 14), [14, 16) in turn. Similarly, we have to reorder the sequence of intervals such
as [0, 2), [8, 10), [4, 6), [12, 14),[2, 4), [10, 12), [6, 8), [14, 16).

This idea can be stated as follows : When (a,,,_1 . . . a0)3 is the binary notation of

J e fj, where m = logzM,

88

(1) For any consecutive two J 1, J 2 e fj, the most signiﬁcant bit values of X (J 1) and
X (12) have to be different

(2) For any consecutive four J 1, 12, J3, J4 e fj, two most signiﬁcant bits (i.e., left-
most two bits) of X (J1), X (12), X (J3) and X (J4) have to be all different.

(3) Thus, in general for any consecutive d values J1, . . . , Jd e fj, where d is a power

of 2, the logzd most signiﬁcant bits ofX (J1), . . . , X (13) have to be all different.

Based on these observations we propose the following ﬁeld transformation functions.

Deﬁnition 4.7. Let f; = {0, . . . , Fl—l} such that |f1l is a power of 2, and let
(a, a1ao)B bethe binary notation ofle f1. Then, URM : f1 —) ZM isafunction

such that URM(I) = (a0a1 ° - ° am_2a,,,_1)3, where M is a power of 2 and m = logzM.

Since we can add arbitrarily large number of 0’s to the left of the given binary number

of an integer 1, function UK” is always deﬁned for any power of 2 integer M.

Deﬁnition 4.8. Letﬁ = {0, . . . , F1—1] such that If1l < M, where |f1l and M are power
of 2. Then, UMM' If” : f, —> ZM is a function such that

UMM’ '4' (1) = URMa) [+] (1 mod dM‘ 'f" ), where dM' '1’" = M/ I f, I .

Example 4.5. When f1 = {0, 1, 2, 3],f2 = {0, 1, 2, 3, 4, 5, 6, 7} and M =16,
(a) 08160.) = {0 8. 4. 121 and UMW‘tfi): {0.9.6.15}.

(b) 01216082) =-{0, 8, 4, 12,2, 10, 6, 14}, and (11416302) = {0,9, 4, 13,2, 11,6,
15}.

Lemma 4.4. When there are only two ﬁelds i, j such that I fjl is less than the given
number of devices M, functions URM and UM M‘F’ satisfy the condition of the function
X in Proposition 4.2.

Proof: The proof of the lemma is straightforward from the deﬁnition of URM and

MP,

UM CI

89

From the deﬁnition of the function UR, we can observe the following relations.

UR-property 1. Let f1 = {0, . . . , Fl—l} such that |f1l < M. Then, for any I 6 f1,
URM(1) = URF‘(1)* M/F1.

The proof of UR—property 1 is straightforward from deﬁnition. Since UR F’(f,) = fl,
UR-property 1 says that when the domain size of a function UR is less than the range

size, the values of function UR are the multiples of the range size / the domain size.
UR-property 2. When f1 = {0, . . . ,F1—1} such that F, < M, URM(f1) = UM‘F’(f1).

Since for any I 6 f1 such that If, I < M, URM(l) is a multiple of M/FI, and for any two
different 11 and 12 6 f1, URM(I1) :t URM(12), it is easy to see UR-property 2. Thus, by
UR-property 2, all the lemmas and theorems in [12, 20] which hold for U—transformation
functions also hold for UR-transformation functions. Several useful characteristics of
the function UM will be described in section 4.5.4 and 4.5.5.

Because of notational complexity we will use the following conventions : When
the parameter M of a function UR denotes the given number of devices, we will leave
out the parameter by default. When the parameter of a function UR does not mean the
given number of devices, we will use the notation BI instead of UR. For example, UR F’
in UR-property 1 will be denoted by BIF'. The parameter M and |f1I of function UM
will be left out whenever there is no ambiguity.

Proposition 4.2 deﬁnes the class of function X such that FX distribution methods
with I (f1) and X (fj) gives strict optimal distribution for any range query in which ﬁeld i
is not range speciﬁed. That proposition is only for the case when ﬁeld i is [-
transformed. In the following proposition we generalize Proposition 4.2 to include the

case when other transformation functions are applied to ﬁeld i.

Proposition 4.3. When a ﬁle consists of two ﬁelds 1', j whose sizes are less than the

given number of devices M, let X ,- and X j be transformation functions applied to ﬁeld i

90

and j, respectively. Let d,- = M/F1. Then, FX distribution methods with X1(f,-) and

X j(fj) are strict optimal for any range query in which ﬁeld i is not range speciﬁed if

(1) X1(f,-) [+]Xj(J1) and X1(f,-) [+]X1-(Jz) are disjoint for any J1, J2 e f,- such that
J1[+]J2 <d,-, and

(2) X1(f,-) [+] Xj(J) =X1-(f1) [+] Xj(J[+](Xd1~) for any J e f, and for any a e N such that
J[+]01d,- SFj-l.

Proof: (1) and (2) in the proposition guarantee that for any J e fj, the sets M (f1, J),

M(f1,J+l), . . . , M(f,-,J+d,~-l) are disjoint if J+d1—1 S Fj-l, and M(f1,J) =

M(f1, J+0td,-) for any or e N such that J+ord1 S Fj—l. Thus, the proof follows.

El

Proposition 4.3 gives general conditions which guarantee strict optimal distribution for a
range query in which ﬁeld i is not range speciﬁed. These conditions will be used fre-

quently in proving theorems.

4.4.3. I and UR Field Transformation Functions

In this section we show that for any two ﬁelds i, j and the given number of devices
M such that F ,- < M, FX distribution methods are perfect optimal for type (0 - 1) range
queries when I-transformation is applied to ﬁeld i and UR-transformation is applied to

ﬁeld j.

Theorem 4.6. When there are only two ﬁelds i, j such that F j is less than the given
number of devices M, the FX distribution method with I—transformation for ﬁeld i and
UR-transformation for ﬁeld j is (1) perfect optimal for type (0 - 1) range queries when
F1Fj > M, and (2) perfect optimal for type (0 - 2) range queries when F 1F }- S M.

Proof: (case 1) F1FJ- >M

When one ﬁeld is range speciﬁed and the other ﬁeld is speciﬁed as a single value, the

proof follows from Theorem 4.4. When ﬁeld i is range speciﬁed and ﬁeld j is

91

unspeciﬁed, the proof follows from UR-property 2 (i.e., UR (fj) = U (fj)) and Lemma
3.8. When ﬁeld j is range speciﬁed and ﬁeld i is unspeciﬁed, the proof follows by
Lemma 4.4 and Proposition 4.2. When there is no range speciﬁcation, the proof follows
by UR-property 2 and Theorem 3.4.

(case 2) HF S M

From UR-property 2 and Lemma 3.8, each device has at most one bucket, and therefore

the proof follows. C]

Example 4.6. Let f1 = {0, 1, 2, 3},f2 = {0, l, 2, 3} and M = 8. Figure 4.7 shows the
bucket distribution by the FX distribution method with I (f1) and UR (fz). Here,
UR (fz) = {0, 4, 2, 6} and Device N0 = TM(I(J1) [+] UR (12)). J16 f1, 12 6 f2.

 

 

I (f 1) UR Ifz) Device No
000 000 0
000 100 4
000 010 2
000 110 6
001 000 1
001 100 5
001 010 3
001 110 7
010 000 2
010 100 6
010 010 0
010 110 4
011 000 3
011 100 7
011 010 1
011 110 5

 

 

 

Figure 4.7. FX Distribution with I And UR Transformation

We can verify that for any type 0 and type 1 range query the distribution of Figure 4.7 is

strict optimal.

92

Before we describe the next section, it is worth considering the following question.
Does there exist three ﬁeld transformation functions X 1, X 2 and X3 such that for any
two ﬁelds whose sizes are less than M, FX distribution methods with any two of X 1, X 2,
X3 transformation functions are perfect optimal for type (0 - 1) range queries ? Unfor-
nately, there do not exist such transformation functions because otherwise, it immedi-
ately contradicts Theorem 4.2. For example, let a ﬁle consist of three ﬁelds 1, 2 and 3
such that F 1 = 4, F 2 = 4 and F3 = 8, and let M = 16. Suppose the above such functions
X 1, X 2 and X3 exist, and let us apply X 1, X 2 and X3 transformation functions to ﬁeld 1,
2 and 3, respectively. Then, it is easy to see that the distribution for this example ﬁle
system should be strict optimal for any type 0 range query which has at most two
unspeciﬁed ﬁelds, and strict optimal for any type 1 range query which has at most one
unspeciﬁed ﬁeld. However, by Theorem 4.2 we know that there does not exist such dis-
tribution for this ﬁle system. This implies there does not exist a transformation function
Y such that FX distribution methods with I and Y-transformation as well as UR and Y~
transformation is perfect optimal for type (0 - 1) range queries for every ﬁle with two
ﬁelds. Thus, it is inevitable to have some restrictions for perfect optimal distribution for
type (0 - 1) range queries when I and UM-transformation, and UR and UM-

transformation are considered.

4.4.4. I and UM Field Transformation Functions

In this section we show that for any values of F,-, F k and the given number of dev-
ices M such that F1 S F k < M, FX distribution methods are perfect optimal for type (0 -
1) range queries when I—transformation is applied to ﬁeld i and UM-transformation is

applied to ﬁeld k.

Lemma 4.5. When there are only two ﬁelds i and k whose sizes are less than the given
number of devices M, the FX distribution method with I—transformation for ﬁeld i and

UM-transformation for ﬁeld k is strict optimal for any range query in which ﬁeld i is not

93

range speciﬁed.
Proof: This is a direct consequence of Lemma 4.4 and Proposition 4.2.

III

Lemma 4.6. When the size of ﬁeld k is less than the given number of devices M, let d1,
= M /F 1,. Then, for any two different nonnegative integers J 1 and J 2 such that J 1 and
J; are within the same interval of size dk (i.e., J1 [+] J2 < dk), UM (fk) [+] J1 and
UM (fk) [+] J 2 are disjoint.

Proof: LetJ1=J’dk[+]c1andJ2 =J’dk [+] c2, where 1’, c1, c2 6 N, and c1 and c2
are less than dk. Assume the lemma is not true. Then, there exist two different K 1, K 2

e fk such that
UR (K1) [+] (K1 mod dk)[+1JIdk [+101 = UR (K2) [+] (K2 mod dk)I+lJ’d/r [+102

Note that UM (K 1) = UR (K 1) [+] (K 1 mod dk) by deﬁnition. After removing J ’dk

from both sides of the above equality, we have

UR (K1) [+] (K1 mod dk)[+lc1 = UR (K2) [+1 (K2 mod dk)[+102
This implies that UR (K1) [+] UR (K2) = [(K1 mod dk) [+] c1] [+] [(K2 mod dk) [+] c2].
Since UR (K1) [+] UR (K2) > dk by UR-property 1, and (K1 mod dk) [+] c1,
(K 2 mod dk) [+] c2 is less than dk, there is no way the above equality can be satisﬁed.

This is a contradiction. CI

Lemma 4.7. When the size of ﬁeld k is less than the given number of devices M,
UM (fk) [+] UM (K) = UM (fk) for any K e fk.

Proof: Since UM (fk) [+] UM (K) are a set of F k different nonnegative integers, it is
sufﬁcient to show that for any two different K 1, K 2 in fk, (UM (K 1) [+] UM (K 2)) e
UM(fk). Let dk=M/Fk. Now,

UM (K1) [+] UM (K2) = [UR (K1) [+] (K1 mod 41.)] [+] [UR (K2) [+] (K2 mod 40]

= UR (K1) [+] UR (K2) [+] [(K1 [+] K2) mod dk]

94

By UR-property 1, UR (K1) [+] UR (K2) is equal to [3150(1) [+] 31’” * (K2)]dk (note
that we deﬁned BIK to denote URK when the parameter K of function UR does not mean

the given number of devices M). This implies that
UR (K1) [+1 UR (K2) =BIF"(K1 [+] K2)dk = UR (K1 [+] K2)
Thus, UM(K1) [+] UM(Kz) = UR (K1 [+] K2) [+] [(KlI+lK2) mod dk] =

UM (K 1 [+]K2). Since K 1 [+] K 2 e fk by Lemma 3.2, the proof follows. E]

Lemma 4.8. Let the size of ﬁeld k be less than the given number of devices M. Then
for any K e fk, UM (fk) [+] UR (K) = UM (fk) [+] (K mod dk), Where d1, = M/Fk.
Proof: Let K e fk. Then,
UM (fit) [+] UR (K) = UM (fk) [+] UR (K) [+] (K mod dk) [+] (K mod dk)
= UM (fr) [+] UM (K) [+] (K mod dk)

Since UM (fk) [+] UM (K) = UM (fk) by Lemma 4.7, the proof follows. E]

Lemma 4.9. When there are only two ﬁelds i and k whose sizes are less than the given
number of devices M and F; S F k, the FX distribution method with I-transformation for
ﬁeld i and UM-transformation for ﬁled k is strict optimal for any range query in which
ﬁeld k is not range speciﬁed.

Proof: Let dk = M/Fk. By Lemma 4.6, for any two different K 1 and K 2 which are in
the same interval of size dk, UM (fk) [+] K1 and UM (fk) [+] K 2 are disjoint. Thus, by
Proposition 4.3 it is sufﬁcient to show that for any I 6 f1, UM (fk) [+] J =

UM (fk) [+] (J [+] 0111,, ), where Otis any nonnegative integer such that ordk < F1. Now,
UM (ft) [+1 out = UM (it) [+] UR (31h (00)
= UM (ft) [+] (31%.) mod do

The ﬁrst equality holds because UR (31“ (01)) = BIF * (31F * (q))dk by UR-property 1, and

BI F‘ (81 F‘ (01)) = or. The second equality holds by Lemma 4.8. Since at < F k/d;c

II!

95

(because F1- SFk), this implies that BI F‘ (at) is a multiple of dk for all at = 0, . . . ,
F1/dk—l. Thus, BIF‘ (0t) mod dk = 0 and therefore UM (fk) [+] ordk = UM (fk) for all or =
0, . . . , F1/dk—1. This completes the proof. [I

Theorem 4.7. When there are only two ﬁelds i and k whose sizes are less than the
given number of devices M, the FX distribution method with I-transformation for ﬁeld i
and UM-transformaton for ﬁeld k is (l) strict optimal for any range query in which ﬁeld
i is not range speciﬁed if F; > F k, and (2) perfect optimal for type (0 - 1) range queries if
F ,- S Fk, and (3) perfect optimal for type (0 - 2) range queries if F 1F k S M.

Proof: For (1) and (2), the theorem is a direct consequence of Lemma 4.5 and Lemma
4.9. When F 1F k S M, since F 1—1 < M /F k, the theorem is a direct consequence of

Lemma 4.6. E]

Example 4.7. Let f1 = {0, 1, 2, 3},f2 = {0, 1, 2, 3} and M = 8. Figure 4.8 shows the
bucket distribution by FX distribution method with 1 (f1) and UM (fz). Here, UM (fz)
= {0, 5, 2, 7} and Device No=TM(I(J1)[+]UM(J2)),J1e f1,J2 6 f2.

We can verify that for any type 0 and type 1 range query the distribution in the ﬁgure is

strict optimal.

4.4.5. UR and UM Field Transformation Functions

In this section we discuss the cases of optimal distribution when UR and UM ﬁeld
transformation functions are used.
Lemma 4.10. Let the size of ﬁeld k be less than the given number of devices M.
Then, for any K e fk, K mod (1 = UM (K) mod d, where d is a power of 2 which is less
than or equal to M /Fk.

Proof.‘ Let dk = M/Fk.
UM (K) mod d = [UR (K) [+] (K mod dk)] mod d

= [UR (K) mod d] [+] [(K mod dk) mod d]

96

 

 

[(fl) UM (fz) Device No
000 000 0
000 101 5
000 010 2
000 111 7
001 000 1
001 101 4
001 010 3
001 111 6
010 000 2
010 101 7
010 010 0
010 111 5
011 000 3
011 101 6
011 010 l
011 111 4

 

 

 

Figure 4.8. FX Distribution with I And UM Transformation

Since UR (K) is a multiple of dk by UR-property 1, and dk is a multiple of d,
UR (K) modd = 0 and (K mod d1,) modd = K mod d. Thus, the proof follows.
[:1

Lemma 4.11. When there are only two ﬁelds j and k whose sizes are less than the
given number of devices M, the FX distribution method with UR-transformation for
ﬁeld j and UM-transformation for ﬁeld k is strict optimal for any range query in which
ﬁeld j is not range speciﬁed, if (1) F ,- 2 F k or (2) F ,F k S M.

Proof: When one ﬁeld is range speciﬁed and the other ﬁeld is speciﬁed as a single
value, the proof follows from Theorem 4.4. Thus, we have only to consider the case
when ﬁeld j is unspeciﬁed, and ﬁeld k is range speciﬁed or unspeciﬁed.

(case 1) F,- ZFk

Let dj = M/Fj and dk = M/Fk. Then, dj S dk. By UR-property 2, UR(fj) = U(f1), and
by Lemma 4.10, UM (K) mod dk = K mod dk for any K in fk. Since d ,- S dk, by Lemma

97

3.8 the proof follows.
(case 2) F ij S M

This is a direct consequence Lemma 4.10 and Lemma 3.8. CI

Lemma 4.12. When there are only two ﬁelds j and k whose sizes are less than the
given number of devices M, the FX distribution method with UR-transformation for
ﬁeld j and UM-transformation for ﬁeld k is strict optimal for any range query in which
ﬁeld k is not range speciﬁed, if (1) F ,- S F1, or (2) F in S M.

Proof: ’ (case 1) F j S Fk

Let dk = M /F 1,. By Lemma 4.6, we know for any two different nonnegative integers J1
and J 2 such that J 1 and J 2 are within the same interval of size dk (i.e., J 1 [+] J 2 < dk),
UM (fk) [+] J1 and UM (fk) [+]12 are disjoint. Thus, by Proposition 4.3 it is sufﬁcient
to show that for any J e fj, UM (fk) [+] UR (J) = UM (fk) [+] UR (J [+] adk), where or is
any nonnegative integer such that ordk < Fj. Now, since J is also an element of fk
because F J- S Fk, by Lemma 4.8 UM (fk) [+] UR (J) = UM (fk) [+] (J mod dk) for any J

e fj. This implies that
UM (fit) [+] UR (J [+] adk) = UM (fir) [+] [(J [+] (Id/c) mod dk]
= UM(f/t) [+] (J mod dk)

Note that J [+] ord,c is also an element of fk. Thus, by Proposition 4.3 the proof follows.
(case 2) Fij S M

This is a direct consequence of Lemma 4.10 and Lemma 3.8. CI

Theorem 4.8. When there are only two ﬁelds j and k whose sizes are less than the
given number of devices M, the FX distribution method with UR-transformation for
ﬁeld j and UM-transformation for ﬁeld k is (1) strict optimal for any range query in
which ﬁeld j is not range speciﬁed if F j 2 F k, and (2) strict optimal for any range query
in which ﬁeld k is not range speciﬁed if F J- S F k, and (3) perfect optimal for type (0 - 1)
range queries if F ,- =F k, and (4) perfect optimal for type (0 - 2) range queries if

I!

98

F ij S M.
Proof: The theorem is a direct consequence of Lemma 4.11, Lemma 4.12 and Lemma

3.8. [:1

Example 4.8. Let f1 = {0, 1, 2, 3},f2 = {0, 1, 2, 3} and M = 8. Figure 4.9 shows the
bucket distribution by FX distribution with UR (f1) and UM (fz). Here, UR (f1) = {0,
4, 2, 6}, UM(f2) = {0, 5, 2, 7} and Device N0 = TM(UR(J1)[+]UM(12)), J16 f1, J2
6 f2.

 

 

UR (f1) UM (fz) Device No
000 000 0
000 101 5
000 010 2
000 111 7
100 000 4
100 101 1
100 010 6
100 111 3
010 000 2
010 101 7
010 010 0
010 111 5
110 000 6
110 101 3
110 010 4
110 111 1

 

 

 

Figlire 4.9. FX Distribution with UR And UM Transformation

We can verify that the distribution is perfect optimal for type (0 - 1) range queries.

4.4.6. 1, UR and UM Field Transformation Functions

In this section we discuss the cases of optimal disribution when 1, UR and UM-

transformation functions are used.

99

Theorem 4.9. When there are only three ﬁelds i, j and k such that F ,- SF j SF 1, < M,
FX distribution methods can be always (1) perfect optimal for type 0 range queries, and
(2) strict optimal for any type 1 range query which has at most one unspeciﬁed ﬁeld if F j
=Fk, orFiFk SM.

Proof: (1) is true by Corollary 3.4. Thus, let us consider only the case (2). When
F,- S F j =Fk, let ﬁeld i be I-transformed, ﬁeld j be UR-transformed and ﬁeld k be UM-
transforrned. When F1Fk SM, let ﬁeld i be UR-transformed, ﬁeld j be I-transformed
and ﬁeld k be UM-transformed. Then, for both cases, the theorem is a direct conse-

quence of Theorem 4.6, Theorem 4.7 and Theorem 4.8. CI

Note that by Theorem 4.2 there does not exist a data distribution method which guaran—
tees strict optimal distribution for any range query in Theorem 4.9 for every ﬁle with

three ﬁelds.

We have shown through lemmas and theorems that FX distribution methods along
with various combinations of ﬁeld transformation functions give optimal distribution for
many types of range queries. Here, it should be emphasized that the scope of optimality
is increased signiﬁcantly by these ﬁeld transformation techniques along with Proposition
4.1. This is because by Proposition 4.1 optimal distribution for a subset of the ﬁelds
guarantees strict optimal distribution for many range queries in which those ﬁelds are

range speciﬁed or unspeciﬁed.

CHAPTER 5

NODE PARTITIONING SCHEMES FOR B-TREES

5.1. Introduction

In chapter 3 and 4 we have described data distribution strategies for multikey
search queries which access a set of records for each query. These are type A1 applica-
tions based on the classiﬁcation in section 1.4. In this chapter we will investigate data
distribution strategies for parallel processing of type A2 applications, i.e., the database
query which accesses a record based on primary key. The object is to enhance con-
currency by parallel processing of tree type index structure for the database stored in the
external storage. Parallel processing strategy for key based access main memory data-

bases can be found in [Cev88].

B-trees are the most commonly used tree type index structure for external data-
bases [Bay72, Com79]. For the database stored in the secondary storage, the height of
an index tree is the most important parameter in data retrieval time. One approach to
reduce the height of a B-tree is to partition a ﬁle horizontally, and construct a B—tree for
each subset of the partition. These B-trees are searched in parallel. When the database
size is D and is partitioned into p subsets, the height of each B-tree is approximately
logf(D /p), where f is the fan-out of an index node. Another approach to reduce the
height of a B-tree is to use a large node B-tree for the whole ﬁle. When the size of
index nodes is increased p times, the height of a B-tree becomes about longD. The
second method gives smaller height because D > pf for most practical cases. However,
this approach alone may not improve the overall performance because large nodes result

in long block transfer time and more main memory processing time.

101

We propose a node partitioning scheme for large node B-trees for parallel process-
ing. In the proposed scheme, each node is partitioned into multiple subnodes to be dis-
tributed for parallel processing. We also propose a new approach to process these sub-
nodes. We will call the proposed B-tree structure Partitioned Node B-trees (PNB-
trees), while the standard B-trees will be called conventional B-trees. With respect to
the two stage processing model in Figure 2.2, PNB-trees are based on partitioned node

B-tree data construction, and random or object speciﬁc data distribution.

The main results presented in this chapter are that the parallel processing of PNB-
trees reduces access time, increases throughput and minimizes the frequency of tree res-
tructuring. The basic structure of PNB-trees is based on B-trees. However, we use a
different approach to process and search PNB-trees. The PNB-tree approach exploits
parallel scanning by distributing multiple subnodes of a B-tree node among parallel

disks.

The rest of this chapter is organized as follows. In section 5.2 we present the basic
structure of PNB-trees, and the search and update algorithms. The hardware environ-
ment for the PNB-tree construction is also discussed in this section. The important
parameters of the PNB-tree are described in section 5.3. Section 5.4 presents a perfor-
mance comparison between PNB-trees and conventional B-trees. Finally, PNB-tree

construction for various disk technology is discussed in section 5.5.

5.2. The PNB-tree

PNB-trees are constructed on synchronized disks where read/write heads of all the
disks are located at the same position. Synchronizing disks for disk interleaving was
proposed in [Kim86] where multiple disks are interleaved like main memory interleav-
ing. It has been shown in that paper that the response time improves considerably by
using disk interleaving techniques when block size is large. This is because synchron-

ized disk interleaving increases data transfer rate signiﬁcantly. In PNB-trees,

PI}

102

synchronized disks are used to exploit searching in parallel.

The PNB-tree is a B—tree with each node partitioned into s subnodes, and all the
subnodes of a node are stored on s different disks. The format of an index node is
(¢,P0)(K1,P1)(K2,P2) - - - (Km,P,,,) where K,- is the key value and P,- is the address of
the child subnode in each disk. This index node is partitioned into 3 subnodes, and all
the subnodes of a node are accessed by the same address pointer. All the subnodes have
the same format except the ﬁrst one, where the key value in the ﬁrst record is omitted.
The leaf nodes which contain the data are also partitioned into s subnodes. However,
PNB-trees differ from conventional B-trees in that the key values within a node in
PNB-trees are not required to be in sorted order. This needs extra computation to deter-
mine the child address, but the amount of data movement between disks is reduced con-
siderably because a subnode does not overﬂow. The rule of locating a data record is
deﬁned recursively as follows :

Let R = {(4) , Po),(K1, P1), (K2 , P2), ~° , (Km , Pm)} be a current index

node. Then the data record with a key value K is in the descendant node

pointed by P , if (Ky , P,) e R - {(1) ,P0)} and (K —Ky) is the minimum non-
negative value among all (K —K,-) , i = l,...,m . If there is no nonnegative

(K —- K1) , i = l,...,m, the desired record is in the descendant node pointed by

P0.

An example of a PNB-tree with a set of key values is given in Figure 5.1. It shows that
each node is partitioned into two subnodes being stored on disks D1 and Dz. Note that
the values in a node are not kept in sorted order. When the key value e.g., 25 needs to
be searched, we ﬁrst compute the differences 25 —- Ky for all the key values Ky in the
root node. These values are <—7, 4, -66, -46>. By following the pointer associated with
Ky such that 25 - Ky = 4, the node <21, 25, 23, 29> is obtained. Here it is assumed that

all data values are stored in leaf nodes.

jl

103

Dr D2
3 2 9171

 

\O

 

 

 

D1 D2 '01 02101 DZID1 D I
, II IIIII IIII IIIII Eltlll IIIII IIEI

 

2
IIII 12l-

 

 

Figure 5.1. Partitioned Node B-tree

The reason for the use of unsorted nodes is to minimize the inter-device data move—
ment. Suppose that a node is in sorted order, and is partitioned and stored as above.
Whenever a subnode overﬂows, data movement between disks is required. However,
this does not happen in unsorted node construction because inserted records can be

placed in any subnode of a node.

5.2.1. The PNB-tree Operations

(1) LOOKUP

The lookup of a record starts from the root node and continues until the leaf. All
the subnodes of a node are searched in parallel. Let P be the address of the node
currently being accessed. Initially, P is the address of the root. The lookup procedure
consists of two phases. At ﬁrst, (K , P) is broadcast to all the disks, where K is the
desired key value. If P is the address of a leaf node, ﬁnd the record and stop. If not,
each disk ﬁnds an index record (K1, P1) in its subnode such that (K —K,-) is the
minimum nonnegative value. In the second phase, ﬁnd the minimum of these minimum
nonnegative values at each disk. If no nonnegative value is found at any disk, repeat

the ﬁrst and the second phase by replacing P with P0. Otherwise, repeat the same by

104

replacing P with P j, where (K j , P j) is an index record and K J- is the overall minimum

nonnegative value.

(2) INSERTION

To insert a record, apply the lookup procedure to ﬁnd the desired leaf node. We
then insert the record into any place within the node, if it has empty room. If the node
is already full, it is sorted and then split into two nodes. The effect of node splitting on
its parent node is handled recursively in the same way as in standard B-trees. Figure
5.2 shows the PNB-tree conﬁguration after the key value 57 has been inserted into the

tree of Figure 5.1.

(3) DELETION

The node underﬂow in PNB_trees is handled the same way as in standard B-trees.
PNB-trees have an advantage over B-trees because the holes created by deletion can be
easily ﬁlled by any records inserted. This is because the values within a node need not
be kept in sorted order. In conventional B-trees the holes can also be used but only at
the cost of sorting the node each time. Figure 5.3 shows the PNB—tree conﬁguration

when key value 71 is deleted from the tree of Figure 5.2.

5.2.2. Parallel Disks Conﬁguration for PNB-trees

For each disk we assume a simple microprocessor with a few thousand bytes of
memory. All the subnodes of a node are processed locally by these processors. Only
the matched record is sent to the host processor. For the second phase of a lookup pro-
cedure, we use a minimum ﬁnding hardware module to which all the disks are directly
connected. This module is a simple extension of a multiple input comparator. As soon
as each disk ﬁnds an index record (K1-,P1) such that "the desired key value — K1" is
minimum nonnegative value of that disk, it sends that index record to this minimum

ﬁnding hardware. If a disk does not ﬁnd a nonnegative value, it sends (—1, null). The

105

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Dr 1132
III 9 III
A D ‘D D, .D
91 - 98 — 7 - 87 -
Figure 5.2. PNB-tree After Insertion of Key Value 57
Dz
. 51 III
D2 D 1 02 D1 1)2

 

 

 

 

I 87 II —II II21II25II 23II29I|

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

IIII

EIEI

 

 

 

 

 

Figure 5.3. PNB-tree After Deletion of Key Value 71

 

 

106

minimum ﬁnding hardware determines the overall minimum nonnegative value and then
broadcasts the corresponding address of the child node to all the disks. When all the
disks reply with (—1, null), the address of a child node is the ﬁrst pointer in the ﬁrst sub-
node of a node. Since the communication involved is very simple and communication
distance is short, we expect that the time for these operations is negligible compared to

disk access time.

5.3. Motivations of PNB-trees

The motivations of PNB-trees are to reduce the tree height and to reduce the fre-
quency of tree restructuring. This is because the height of index trees and the frequency
of tree restructuring are very important parameters for the performance of external data-

bases.

5.3.1. Compressed Height

Let s be the number of disks, n be the number of records, r be the size of a data
record, and k be the key size. Let the block size be b bytes and the pointer size be p
bytes, where a block is a unit of data transfer. This block corresponds to a subnode in
PNB—trees and a node in conventional B-trees. Let h be the height of a PNB-tree, where
h includes the level of data nodes. To compute the worst case height we assume every
block to be half-full except the root node. Then, the maximum number of leaf nodes is

approximately

shb b "'2
k+p 2(k+p)

Since we need about 2nr/b data blocks (leaf nodes), the relation

shb b h—z > _2n_r
k+p 2(k+p) _ b

 

should be satisﬁed [Knu73]. For all the examples in this chapter we will use r = 200

107

bytes, k = 30 bytes andp = 4 bytes. Then, b 2 2500 if h = 3, s = 4 and n = 1 M. There-
fore, 3 Kbyte block size is enough for making 2 level indices in PNB-trees. Even for
large ﬁles with 60 M records, the block size of 5 Kbyte guarantees 2 level indices in
PNB-trees when eight disks are used. On the other hand, 4 level indices are needed for
this ﬁle when the conventional B-tree of 5 Kbyte block size is used. To guarantee 2
level indices in conventional B-trees we need to use the block size of about 48 Kbyte.

In this case the block transfer time is quite signiﬁcant.

The node size of PNB-trees is large, but I/O and computation time improves
signiﬁcantly because the large node is split into smaller subnodes and these subnodes are
processed in parallel. Since the height of a B-tree decreases with increasing node size,
the number of disk accesses in PNB-trees is minimized. Furthermore, the block transfer
time is reduced because only the subnode contributes to the block transfer time. Note
that the height can also be reduced in conventional B-trees by making the node size
large. But this results in a long block transfer time and more main memory processing

time.

5.3.2. Reduced Frequency of Tree Restructuring

The B-tree restructuring due to node overﬂow is expensive. One way to reduce the
frequency of tree restructuring is to use a large node. When the node size increases k
times, the frequency of tree restructuring due to node overﬂow reduces by a factor of
l/ck, where c is slightly larger than one. This is because the number of nodes in a B-
tree decreases almost linearly with node sizes. The detailed analysis for computing the

average number of nodes in a B-tree is given in [Yao79].

5.4. Performance Comparison

In this section we compare the response time of PNB-trees with conventional B-

trees which are stored on parallel disks.

108

5.4.1. Performance Models

For conventional B-trees each data request is directed to the disk which contains
the corresponding B-tree. Thus, a request queue is formed in front of each disk. In

PNB—trees all data requests form one queue.

Let X be a random variable for the disk service time of a single disk. Then, X =
S(seek time) + L(rotational latency) + T(block transfer time). For conventional B-trees
on parallel disks Rotational Positioning Sensing (RPS) miss delay should be added for
disk access time [Kim86]. RPS miss delay happens due to the channel contention by
concurrent disk I/Os. Let X ,- be the disk service time for disk i in conventional parallel
disks. Then, X1 = S + L + T + RPS 1. We deﬁne the disk response time to be the disk
service time plus the queue waiting time for the disk service. The queue waiting time is
computed as follows. The arrival process for disk access requests is assumed to be
Poisson process with mean R. L is uniforrrrly distributed with mean one half of one disk
rotation time, S is assumed to be exponentially distributed and the number of RPS,-
rrrisses is assumed to be geometrically distributed [Kim86]. Since the distributions of S,
L ,T and RPS,- are known, the mean and the variance of X 1 can be computed. We choose
M/G/l queueing model to compute average queue waiting time. In M/G/l queueing
model, the queue waiting time for disk i, denoted W1, is given by

ammmwmﬂ
2(1-Ri*E 1X11)
where R,- is the request rate for disk i [Ross85]. Thus, the average disk response time for
disk i is E [X1] + W,-. Let Z ,- be the disk response time for a j—th level node of a B-tree.
We deﬁne the data response time Z to be Z 1 +22 + . - . + 2),, where h is the height of
the B-tree. Thus, the average data response time E [Z] = hE [Z I]. Here, for all
j = 1, ..., h,
s

E%P%2@WHWD
i=1

109

where sis the number of disks. For the PNB—tree model,

R (Var (X)+E [X 12)

E[Zj]=E[X]+ 2(1—R*E[X])

S
for all j = l, . . . , h, where R is equal to 2R1. Note that there is only one queue and
i=1

there is no RPS rrriss delay in the PNB-tree model.

As described in section 5.2, PNB-trees use two phase processing to locate a record.
In the ﬁrst phase, computation time is overlapped with data transfer time because pro-
cessing can be done concurrently with the data transfer between a disk and a local
buffer. For the second phase, we assume that it takes less than one millisecond to ﬁnd

the overall minimum value.

5.4.2. Average Data Retrieval Time

In this section we use IBM 3380 disk parameters which are E[S] = 7.2 ms, E[L] =
8.3 ms and block transfer rate = 3 Mbyte/sec. We will compare the performance of
conventional B—trees on parallel disks with PNB-trees on synchronized disks. As typical
examples, four disks and eight disks are used. In conventional parallel disks, data
requests are usually skewed for some disks. The following skewed data request patterns

are used in the performance experiments.

1) 2)
DI D2 D3 D4 DI DZ D3 D4 D5 06 D7 D8
.458 .371 .153 .018 .388 .225 .153 .102999 .000001 .010 .054 .068

Here, Di represents disk i and the number below Di represents the probability that some
request is a disk i request. Note that there is no skewed effect in the PNB-tree model
because there is only one queue. Let the number of records be 1 M and all other param-
eters be the same as in section 5.3.1. When 4 disks are used, the data response time for

conventional B-trees is 70.07 ms while data response time for PNB-tree is 54.09 ms.

110

We have used 4 Kbyte block size with a data request rate of 2 per second. When 8
disks are used, block size is 2 Kbyte and the data request rate is 2 per second, the data
response time of conventional B-trees is 77.85 ms and that for the PNB-trees is 51.85
ms. The root node is assumed to be resident in main memory for all response time com-
putation. The height of the B-tree and the PNB-tree is computed with the assumption
that every node is half-full. We see that the use of small block size for conventional B-
trees may increase the data response time due to the increased height. In PNB-trees we
can have smaller block size with the same height by using more disks. Thus, the
PNB-tree takes advantage of less block transfer time. In conventional disk systems with
small block size, the block transfer time is negligible but the time due to increased disk
accesses is signiﬁcant.

Many parameters need to be considered in comparing the response time of conven-
tional B-trees on parallel disks and PNB-trees on synchronized disks. This is because
the height of B-trees depends on the block size, the number of records, record size and
key ﬁeld size. We will use the same values for these parameters as in section 5.3.1 for
the following experiments.

We compare three different B-tree processing models. The ﬁrst is conventional
B-trees with height 4. The second is conventional B—trees with height 3, and the third is
PNB-trees with height 3. The second model uses an increased block size to reduce the
height of the conventional B-tree. In the ﬁgures B, LB and PNB denote the response

time of the ﬁrst, second and third model, respectively.

Figure 5.4 shows response time for various data request rate when IBM 3380 disk
parameters are used. In Figure 5.4.(a), we used a database containing 1 M records with
eight parallel disks. The block sizes for conventional B-trees are 4 Kbyte and 14 Kbyte.
The block size for the PNB-trees is 2 Kbyte. In the ﬁgure PNB-trees are the most
efﬁcient, until 12 data requests/sec. Conventional B-trees of height 3 by using a large

block size is always better than the conventional B-trees of height 4. But for very large

111

100 —

9O —

80 —

Data Response 70 _
Time (ms) 60 — B
50 —

40 —

30

LB

PNB
IIIIIIIIIjj
0 2 4 6 810121416182022

 

 

Data Request Rate

(a) l M Records

200 a

180 —

160 —~

140 —

Data Response 120 —
100 —

Time (ms) 80 _ LB
60 —

40 —

20 a

B
PNB

 

 

I I I I I I I I I I j
0 2 4 6 8 10 12 1416 18 2022
Data Request Rate

(b) 60 M Records

Figure 5.4. Data Response Time with Various Data Request Rates

112

ﬁles this is not the case. Figure 5.4.(b) shows the case for 60 M records with eight
disks. Here, we use 10 Kbyte and 48 Kbyte block sizes for the conventional B-trees,
and 6 Kbyte block size for the PNB-trees. The PNB-trees are the most efﬁcient, until 21
data requests/sec. Using the large block size in the conventional B—trees turns out to be
the worst. This is because for 48 Kbyte block size the block transfer time is comparable
to one disk access time. Furthermore, it suffers severe RPS miss delay due to the long
block transfer time. On the other hand, PNB-trees perform better than conventional B-

trees for both small and large block sizes.

Figure 5.5 shows the response time comparison for various disk speeds, when data
request rate is 20/sec. Disk speed i in these ﬁgures denotes M x "average disk access
time of IBM 3380 disk". Figure 5.5.(a) and 5.5.(b) show the results for 4 parallel disks
with 1 M records and for 8 parallel disks with 60M records, respectively. In Figure
5.5.(a) 4 Kbyte and 14 Kbyte block sizes are used for conventional B-trees, and 4 Kbyte
block size is used in PNB-trees. In Figure 5.5.(b) we used 10 Kbyte and 48 Kbyte block
sizes for conventional B-trees and 6 Kbyte block size for PNB-trees. In this ﬁgure we
also see that PNB-trees perform better than conventional B-trees for both small and

large block sizes.

The saturation request rate is deﬁned to be a data request rate which makes the
queue length inﬁnite. The PNB-tree achieves a larger saturation request rate than the
conventional B-tree in the worst case, where every request goes to the same disk. Note
that in PNB-trees the worst case is the same as the average case. When the number of
records is 1 M and 8 disks are used with 4 Kbyte block size, the PNB-tree is saturated at
21 data requests/sec, but the conventional B-tree is saturated at 14 data requests/sec.

Therefore, PNB-trees can handle larger data request rate in the worst case.
One problem of PNB-tree organization is an increased queue length. This is
because all disk requests go to the same queue. For normal data request rate, the queue

length is usually very small(less than 0.1). But for high data request rate(e.g., more

113

 

 

35 —
30 —
25 -—
Data Response 20 ._
B
Time (ms) 15 -
10 —
5 __ PNB
0 | | I I I | I I I
1 2 3 4 6 7 10
Disk Speed
(a) 1 M Records

50 —1
40 —

Data Response 30 -

 

 

Time (ms) 201 LB
10— B
PNB
O I I I I I I I I I
1 2 3 4 5 6 7 8 9 10
Disk Speed
(b) 60 M Records

Figure 5.5. Data Response Time with Various Disk Speeds

114

than 15 data requests per second which are more than 45 disk access requests per
second, if the height of a B—tree is 3), the queue waiting time is no longer negligible and

the PNB-tree suffers from long queue waiting time.

In the next section the range of data request rates until which the beneﬁt of PNB-
trees is larger than the increased queue waiting time will be given. Note that the range
of data request rates resulting in long queue waiting time is quite dependent on the disk

speed.

5.4.3. Threshold Points

We deﬁne the threshold point to be the maximum data request rate until when the
response time of the PNB—tree is smaller than the conventional B-tree. Figure 5.6
shows threshold points for various disk speeds. l M records and 4 parallel disks are

used.

Threshold

Points

 

0 I I I I I
0 2 4 6 8 10

 

Disk Speed

Figure 5.6. Threshold Points

In the ﬁgure we can observe that threshold limit increases almost linearly with the disk

speed.

115

5.4.4. Update Performance

As we discussed in section 5.3.2, PNB-trees have an advantage of reduced number
of tree restructuring for insertion/deletion operation. Figure 5.7 shows the total number
of tree restructuring and Figure 5.8 shows the total amount of time for creating ﬁles of
different ﬁle sizes for conventional B-trees and PNB-trees. Here, eight IBM 3380 disks
with the conventional B—tree of 4 Kbyte and 32 Kbyte block size, and the PNB-tree of 4
Kbyte block size are used. In both ﬁgures, B denotes the conventional B-tree of 4 Kbyte
block size, LB denotes the conventional B-tree of 32 Kbyte and PNB denotes the PNB—
tree of 4 Kbyte block size. In Figure 5.7 the conventional B-tree of 32 Kbyte block size
has almost the same shape as that of the PNB-tree. When the insertion/deletion request
rate is high, the advantage of the reduced number of tree restructuring for the PNB-tree

is quite signiﬁcant.

5.5. Other Strategies for The PNB-tree Organization

In synchronized disks, discussed so far, we use one pointer to locate all the sub-
nodes of a node. When asynchronous disks are used for PNB-trees, multiple pointers
are needed to locate multiple subnodes. Note that asynchronous disks do not use syn-
chronizing disk-arm. In spite of a little bit extra storage to maintain multiple pointers,
this approach gives much better overall storage utilization. This is because the records
can be stored atany place within a node since key values need not be kept in sorted
order. Therefore, we can store the data in a compact form with variable number of sub—
nodes for each node. For this implementation, all the tree operation algorithms are the
same as those in section 5.2. One problem of this approach is the increased disk access
time because the disk access time is determined by the worst case disk position (note

that the second phase of lookup operation requires responses of all disks).

Another approach is to use synchronized disks and keep the key values ordered

within a node. By keeping ordered key values in a node we can eliminate associated

116

500 1

450 —

400 —

Number of 350 _
300 —

Restructuring 250 _ B
200 —
150 —
100 —-
50 — PNB
0

(* 10“)

 

 

IIIIIIIIIIII
051015202530354045505560

Number of Records (* 106)

Figure 5.7. Tree Restructuring Cost

50 _
45 _
40 _
35 — B
30 —
Time 25 _ LB

20—

3
(* 10 SEC) 15_ PNB
10—

5_
0

Insertion

 

 

I I I I I I I I
0 10 20 30 40 506070 80

Number of Records (* 104)

Figure 5.8. Insertion Time

117

minimum ﬁnding hardware for each disk. This approach has the problem of increased
number of tree restructuring due to subnode overﬂow. Note that restructuring operation
is expensive.

We can also use asynchronous disks and keep the key values ordered within a
node. The disk access time for a successful search does not increase in this method.
This is because for a successful search the decision can be made immediately in one disk
without collecting responses from all the disks. Therefore, for a successful search the
disk access time is the same as that of synchronized disks. However, this approach
requires multiple pointers and has the same problem as synchronized disks with ordered

key values.

[It

CHAPTER 6

CONCLUSIONS

We have investigated data distribution strategies for various database applications.
We have proposed a general database parallel processing model and a two stage data—
base parallel processing model which can be used as a framework for more speciﬁc
implementation. The general database parallel processing model shows important
issues of parallel database systems. The two stage database parallel processing model is
derived from the general database parallel processing model and is suitable for many
database applications. The objective of these models is to facilitate the design of paral-
lel processing database systems. We have applied these abstract models to three
speciﬁc database applications. These are partial match queries, multiattribute range

queries and key based accesses on B-tree type index structures.

We have investigated concurrent data accesses for partial match queries. Much
research has been done on designing efﬁcient multikey hashing schemes for partial
match retrieval type applications. We have focused on optimal bucket distribution in
multikey hashing to achieve maximum access concurrency for partial match queries.
We have presented FX distribution methods which are based on exclusive-or operation.
Field transformation techniques have been used to increase the scope of optimality in
FX distribution methods. Performance of FX distribution methods has been compared
with those of the other existing methods for typical ﬁle systems. We have shown that
FX distribution methods give higher probability of strict optimality than the existing
methods. We have also shown that the query response time of FX distribution methods

is better than that of the others.

118

119

Concurrent data accesses on multiattribute range queries has also been investi-
gated. Though much work has been done in designing ﬁle systems for multiattribute
range queries to reduce the number of bucket accesses, appropriate data distribution
among the access nodes has not been considered for these applications to enhance access
concurrency. There are several proposals for data distribution methods for multikey
accesses. However, they did not consider range speciﬁcation in a query. We have
investigated optimal data distribution for multiattribute range queries in parallel process-
ing ﬁle systems. Since perfect optimal distribution is the most desirable, the existence
of perfect optimal distribution has been investigated. We have shown for various types
of range queries that there are inherent limitations to achieve optimal distribution. The
results show that optimal distribution does not exist in many cases even for ﬁles with
two ﬁelds. We have given sufﬁcient conditions for the nonexistence of perfect optimal
distribution for certain types of range queries. We have also developed data distribution
methods for several useful multiattribute range queries. Sufﬁcient conditions for
optimal distribution by these proposed methods have been given. It has been shown
that the proposed data distribution methods are perfect optimal for certain types of mul-
tiattribute range queries, and strict optimal for a large class of multiattribute range

queries.

We have proposed data distribution strategies for B—tree nodes to improve the
response time of ﬁle accesses. This approach is based on the partitioned node B-tree
called the PNB-tree. We compared the performance of PNB-trees with conventional B-
trees on parallel disks. PNB-trees reduce the height of B-trees and exploit intra-node
parallel processing. We have shown that the height reduction of the B-tree along with
parallel processing within a node gives better performance than parallel processing of
conventional B-trees on parallel disks. We have also shown that the frequency of tree

restructuring due to node overﬂow is considerably reduced in PNB-trees.

120

This thesis investigated optimal data distribution for multikey hash ﬁles and node

partitioning schemes for B-trees to enhance access concurrency. Some areas of future

research that extend the work in this thesis are described below.

(1)

(2)

(3)

(4)

We have proposed a general model and two stage processing model. Though these
models are independent of the parallel processing architecture, they need to be
tuned and further subdivided into several prototypes to reﬂect hardware architec-

tures such as cube type interconnection.

In many cases we do not know what the best achievable distribution is for a given
ﬁle systems. It is worth while to investigate the existence of optimal distribution
for these ﬁle systems. Knowledge of the existence of optimal distribution will

help to deve10p methods for optimal distribution for these ﬁle systems.

FX distribution methods have been developed for ﬁle systems in which each ﬁeld
size and the number of parallel access nodes are power of 2. FX distribution

methods need to be extended for non power of 2 ﬁle systems.

PNB-trees are developed for ﬁles on secondary storage devices. The application
of PNB-tree approach to main memory databases needs to be investigated. Perfor-
mance analyses of PNB-trees are mainly based on disk access time because disk
access time is the most important parameter for the performance of external data-
bases. However, performance analyses of PNB-trees in main memory databases

will require different set of parameters.

LIST OF REFERENCES

 

[Aho79]

[Amm85]

[Ban79]

[Bat77]

[Bay72]

[Bay79]

[Ben75]

[Ben79]

[B1083]

[B0179]

[B0181]

LIST OF REFERENCES

Aho, A.V. and Ullman, J.D., "Optimal Partial-Match Retrieval When
Fields Are Independently Speciﬁed," ACM Trans. on Database Systems,
Vol. 4 No.2 , June 1979, pp. 168-179.

Ammann, A., Hanrahan, M. and Krishnamurthy, R.,"Design of a Memory
Resident DBMS,’ Proc. IEEE COMPCON, 1985, pp. 54-57.

Banerjee, J., Hsiao, D.K. and Kannam, K., "DBC - A Database Computer
for Very Large Databases," IEEE Trans. on Computers, Vol. c-28, No. 6,
June 1979. PP. 414-429.

Batcher, K.E., "The Multidimensional Access Memory in STARAN",
IEEE Trans. on Computers, Feb. 1977, pp. 174-177.

Bayer, R. and McCreight, E.M., "Organization and Maintenance of Large
Ordered Indices," Acta Informatica, 1:3, 1972, pp. 173-189.

Bayer, R. and Schkolnick, H., "Concurrency of Operations on B-trees,"
Acta Informatica, Vol.9, 1977, pp. 1-21.

Bentley, J.L., "Multi-dimensional Binary Search Trees Used for Associa-
tive Searching," Commu. of the ACM Vol. 18, No. 9, 1975, pp. 509-517.

Bentley, J.L. and Friedman, J.H., "Data Structures for Range Searching,"
ACM Computing Surveys, Vol. 11, No. 4, Dec. 1979, pp. 397-409.

Bic, L. and Hartman, R.L., "A Network Oriented Dataﬂow System," Data-
base machines, Leilich, HO. and Missikoff, M., eds., Springer-Verlag,
1983, pp. 166—187.

Bolour, A., "Optimality Properties of Multiple-key Hashing Functions,"
JACM, Vol.26, No.2, April 1979, PP. 196-210.

Bolour, A., "Optimal Retrieval Algorithms for Small Region Queries,"
SIAM J. Comput. Vol. 10, No. 4, Nov. 1981, pp. 721-741.

121

[Bor82]

[Bor83]

[Bur76]

[Cev88]

[Com79]

[Dew79]

[Dew86]

[Dew84]

[Du85]

[Du86]

[DuS82]

[E1180-a]

122

Boral, H. and Dewitt, D.J., "Applying Dataﬂow Techniques to Database
Machines," IEEE Computer, August 1982, pp. 57-63.

Boral, H. and Dewitt, D.J., "Database Machines : An Idea Whose Time
Has Passed? A Critique of the Future of Database Machines," Database
machines, Leilich, HO. and Missikoff, M., eds., Springer-Verlag, 1983,
pp. 166-187.

Burkhard, W.A., "Hashing and Trie Algorithms for Partial Match
Retrieval," ACM Trans. on Database Systems, Vol. 4, No. 2, June 1976,
pp. 175-187.

Ceverance, C. and Pramanik, S., "A High Speed KDL-RAM File System
For Parallel Computers," Techinical Report, Computer Science Depart-
ment, Michigan State University, Sept. 10, 1988.

Comer, D., "The Ubiquitous B-trees," ACM Computing Surveys, vol. 11,
no. 2, June 1979, pp. 121-136.

Dewitt, D.J., "DIRECT-A Multiprocessor Organization for Supporting a
Relational Database Management System," IEEE Trans. on Computers,
June 1979, pp. 395-406.

Dewitt, D.J., Gerber, R.H., Graefe, G., Heytens, M., Kumar, KB. and
Muralikrishna, M., "GAMMA : A Performance Dataﬂow Database
Machines," Proc. Conf. on Very Large Data Bases, 1986, pp. 228—237.

Dewitt, D.J., Katz, R., Olken, F., Shapiro, L, Stonebraker, M. and Wood,
D., "Implementation Techniques for Main Memory Database Systems,"
Proc. ACM SIGMOD, 1984. PP. 1-8.

Du, H.C., "On the File Design Problem for Partial Match Retrieval," IEEE
Trans. on Software Eng. Vol. SE-l 1, No. 2, Feb. 1985, pp. 213-222.

Du, H.C., "A Heuristic Disk Allocation Method for Binary Cartesian Pro-
duct Files," BIT 1986, pp.138-147.

Du, HQ and Sobolewski, J.S., "Disk Allocation for Cartesian Product
Files on Multiple-Disk Systems," ACM Trans. on Database Systems, Vol.
7 No. 1, March 1982, pp.82-101.

Ellis, C., "Concurrent Search and Insertion in AVL Trees," IEEE Trans. on
Computers, Vol. c-29, Sept. pp. 811-817.

[Ell80-b]

[Fag79]

[Fan86]

[Fre81]

[Gar86]

[GarL84]

[G1183]

[Haw82]

[Hil86]

[Hsi78]

[Hsi88]

[Kak85]

123

Ellis, C., "Concurrent Search and Insertion in 2-3 Trees," Acta Infonna-
tica, Vol. 14, 1980, pp. 63-86.

Fagin, R., Nievergelt, J., Pippenger, N. and Strong, H.R., "Extendible
Hashing - A Fast Access Method For Dynarrric Files,"ACM Trans. on
Database Systems, Vol. 4 No. 3, Sept. 1979, pp.315-344.

Fang, M.T., Lee, R.C.T. and Chang, CO, "The idea of De-clustering and
Its Applications," Proc. Conf. on Very Large Data Bases," Aug. 1986,
pp.181-188.

Fredman, M.L., "A Lower bound on the Complexity of Orthogonal Range
Queries," JACM, Vol. 28, No. 4, Oct. 1981, pp. 696-705.

Garg, AK. and Gotlieb, C.C., "Order-Preserving Key Transformations,"
ACM Trans. on Database Systems, Vol. 11, No. 2, June 1986, pp. 213
-234.

Garcia-Molina, H., Lipton, R]. and Valdes, J., "A Massive Memory
Machine," IEEE Trans. on Computers, Vol. c-23, No. 5, May 1984, pp.
391-399.

Glinz, M., "A Dataﬂow Retrieval Unit for a Relational Database
Machine," Database machines, Leilich, HO. and Missikoff, M., eds.,
Springer-Verlag, 1983, pp. 166-187.

Hawthorn, PB. and Dewitt, D.J., "Performance Analysis of Alternative
Database Machine Architecture," IEEE Trans. on Software Eng, Vol. SE-
8, Jan. 1982, pp. 61-75.

Hillyer, B., Shaw, DE. and Nigam, A., "NON-VON’s Performance on
Certain Database Benchmarks," IEEE Trans. on Software Eng. Vol. SE-
12, No. 4, April 1986. PP. 577-583.

Hsiao, D.K. and Baum, R.I., "Concepts and Capabilities of a Database
Computer," ACM Trans. on Database Systems, 1978, pp. 347-384.

Hsiao, Y.-S. and Tharp, A.L., "Adaptive hashing," Inform. Systems, Vol.
13, No. l, 1988. Pp. 111-127.

Kakuta,T., Miyazaki,N., Shibayama,S., Yokota,H. and Murakami,K.,"The
Design and Implementation of Relational Database Machine Delta," Data-
base Machines, Fourth International Workshop, 1985, pp. 13-34.

[Kh088]

[Kim86]

[Kim8 8]

[Knu73]

[Kun80]

[Lar7 8]

[Law82]

[Lee77]

[Leh81]

[Leh86]

[Lei78]

[Le185]

[LitSO]

[Mis83]

124

Khoshaﬁan, S., Valduriez, P. and Copeland, G., "Parallel Query Processing
for Complex Objects," Int’l Conf. on Data Engineering, 1988, pp. 202 -
209.

Kim, M. Y., "Synchronized Disk Interleaving", IEEE Trans. on Computers
Vol.c-35, No.11, Nov. 1986, pp. 978-988.

Kim, M. and Pramanik, 8., "Optimal File Distribution For Partial Match
Retrieval," Proc. ACM SIGMOD, 1988, pp. 173-182.

Knuth, DE. The Art of Computer Programming, Vol. 3 : Sorting and
Searching, Addison-Wesley, 1973, pp. 517-537.

Kung, H.T. and Lehman, P.L., "Concurrent Manipulation of Binary Search
Trees," ACM Trans. on Database Systems, Vol. 5, No. 3, Sept. 1980, pp.
354-382.

Larson, P., "Dynanric Hashing," BIT, 1978, pp. 184-201.

Lawrie, DH. and Vora, OR, "The Prime Memory System for Array
Access," IEEE Trans. on Computers, May 1982, pp. 435-442.

Lee, D.T. and Wong, C.K., "Worst-Case Analysis for Region and Partial
Region Searches in Multidimensional Binary Search Trees and Balanced
Quad Trees," Acta Informatica Vol. 9, 1977, pp. 23-29.

Lehman, PL. and Yao, S.B., "Efﬁcient Locking for Concurrent Operations
on B-Trees," ACM Trans. on Database Systems, Vol. 6, No. 4, Dec. 1981,
pp. 650-670.

Lehman, TJ. and Carey, M.J., "Query Processing in Main Memory Data-
base Management Systems," Proc. ACM SIGMOD, 1986, pp. 239-250.

Leilich, H.O., Stiege, G. and Zeidler, H.Ch., "A Search Processor for Data-
base Management Systems," Proc. of the 4th Int’l Conf. on Very Large
Data Bases, 1978, pp. 280-287.

Leland, MD. and Roome, W.D., "The Silicon Database Machine" Data-
base Machines, Fourth International Workshop, 1985, pp. 169-189.

Litwin, W., "Linear Hashing : A New Tool for File and Table Addressing,"
Proc. of the 6th Int’l Conf. on Very Large Data Bases, 1980, pp.212-223.

Missikoff, MJ. and Terranova, M., "The Architecture of Relational Data-
base Computer Known as DBMAC," Advanced Database Machine

[Ozk75]

[Pra86]
[Pra87]

[Pra88-a]

[Pra88-b]

[Pra88-c]

[Pra88-d]

[Riv76]

[R0587]

[Ross85]

[Rot74]

[Sam76]

125

Architecture, D.k. Hsia0(ed), Prentice Hall, 1983, pp. 109-129.

Ozkarahan, C.A., Schuster,S. A., and Smith, K. C., "RAP-An Associative
Processor for Database Management," Proc. NCC, AFIPS, 1975, pp. 379-
387.

Pramanik, S. and Davis, H., ”Multi Directory Hashing," Technical Report,
Computer Science Department, Michigan State University, Oct. 15, 1986.

Pramanik,S. and Kim, M., "HCB_tree : A B_tree Structure for Parallel
Processing," Proc. Int’l Conf. on Parallel Processing, 1987, pp. 140-146.

Pramanik, S. and Kim, M., "HCB_tree : A Height Compressed B_tree for
Parallel Processing," Information Processing Letters, November 1988, pp.
213-220.

Pramanik, S. and Kim, M.,"Generalized Parallel Processing Models for
Database Systems," Proc. Int’l Conf. on Parallel Processing, 1988, pp.
76-83.

Pramanik, S. and Kim, M., "Parallel Processing of Large Node B-trees,"
IEEE Trans. on Computers (to be published).

Pramanik, S. and Kim, M., "Optimal Data Distribution for Parallel Pro-
cessing of Multiattribute Range Queries," Technical Report, Computer
Science Department, Michigan State University, Nov. 15, 1988.

Rivest, R.L., "Partial-Match Retrieval Algorithms," SIAM J. Computing,
Vol.5, No.1, March 1976, pp. 19-50.

Rosenau, T. and Jajodia, S., "Parallel Relational Database Operations on
the Butterﬂy Parallel Processor : Projection Results," Technical Report,
Naval Research Laboratory, July 1987.

Ross, S.M., Introduction to probability Models, Academic Press, 1985, 3rd
Edition, pp. 355-399.

Rothnie, J.B.Jr. and Lozano,T.,"Attribute based ﬁle organization in a
paged memory environment," Comm. ACM, Vol.17, No.2, 1974, pp. 63-
69.

Samadi, B., "B-trees in A System with Multiple Users," Information Pro-
cessing Letters, October 1976, pp. 107-112.

[Sch83]

[S1070]

[Sto87]

[Su79]

[Sun85]

[Sun87]

[Yao78]

126

Schweppe,H., Zeidler,H.Ch., Hell,W., Leilich,H.O., Stiege,G. and
Teich,W.,"RDBM-A Dedicated Multiprocessor System for Database
Management," Advanced Database Machine Architecture, Hsiao, D.K. ed.,
Prentice Hall, 1983, pp. 36-86.

Slotnick, D.L., "Logic Per Track Devices," Advances in Computers, V01.
10, Academic Press, 1970, pp. 291-296.

Stone, H., "Parallel Querying of Large Databases : A Case Study," IEEE
Computer, Oct. 1987, pp. 11-21.

Su, S.Y.W., Nguyen, L.H., Eman, A. and Lipovski, G]. "The Architec-
tural Features and Implementation Techniques of multicell CASSM,"
IEEE Trans. on Computers, June 1979, pp. 430-445.

Sung, Y.Y., "Parallel Searching for Binary Cartesian Product Files," Proc.
ACM CSC Conf., 1985, pp. 163-172.

Sung, Y.Y., "Performance Analysis of Disk Modulo Allocation Method for
Cartesian Product Files," IEEE Trans. on Software Eng., Vol. SE-13, No.
9, Sept. 1987, pp. 1018-1026 .

Yao, A.,"Random 2-3 trees," Acta Informatica, Vol. 9, 1978, pp. 159-170.

 

“llllllllllllllllllll