la!!b£$:.:.$
is}: i.

E. .1.

12.1.21.
.3...- . K

 

UNEIV RS ITY LIBRARI

llllllllllllllllHlllHHllll lllllllllll

3 1293 01410 7365

 

 

 

 

 

 

 

||||l|l

 

 

 

 

 

 

This is to certify that the

dissertation entitled

RECURSIVE QUERY PROCESSING
IN LARGE DATABASES

presented by

Sungwon Jung

has been accepted towards fulﬁllment
of the requirements for

PhD degree in Computer Science

g ' Q‘KMMLL‘

Major professor

 

 

 

Date December 18, 1995

MS U is an Affirmative Action/Equal Opportunity Institution 0-12771

 

 

LIBRARY
Michigan State
Unlversity

 

 

 

PLACE III RETURN BOX to roman thb checkout from your record.
TO AVOID FINES return on or before date duo.

    

DATE DUE DATE DUE DATE DUE

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

MSU In An Al'ﬁrmlutlvo Action/Equal Opporlmly Ira-mulch
W

RECURSIVE QUERY PROCESSING IN LARGE DATABASES
By

Sungwon Jung

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DEGREE OF PHILOSOPHY

Department of Computer Science

1995

ABSTRACT
RECURSIVE QUERY PROCESSING IN LARGE DATABASES
By

Sungwon Jung

Recursive database structures are the fundamental processing elements in numer-
ous important applications including navigation systems, recursive rule processing
in knowledge base systems, and processing IS-A hierarchy in object-oriented sys-
tems. Query processing for these types of structures is the focus of numerous current
database research projects, because traditional approaches are not suitable for this
kind of query processing. In this thesis, we have investigated query processing meth—
ods for large recursive relational databases. We have used automobile navigation
systems as one of the model applications. More general models for recursive queries

using transitive closures have also been investigated.

In automobile navigation systems, a topographical road map can be deﬁned by a
large recursive relation, and “ﬁnding shortest paths” using this relation is a recursive
query. We have developed a H iTz' graph model for structuring large recursive relations
to speed up the shortest path computation. The H iTi graph model provides a novel

approach to abstracting and structuring a topographical road map in a hierarchical

fashion. We then proposed a new shortest path algorithm named SPAH, which
utilizes the hierarchical abstraction of a topographical road map for its computation.
Our performance analysis of SPAH showed that it dramatically reduces the search
space. Within the H iTi graph framework, we also studied the parallel processing
for intra and inter query Shortest path problems. Our empirical analysis of these
two problems revealed that the inter query Shortest path problem provides more
opportunity for scalable parallelism than the intra query Shortest path problem.

We then studied recursive query processing for distributed fragments of recursive
relations. The major performance issues involved were the description and location
of recursive fragments in a distributed environment. Traditional approaches to de-
scribing and locating a fragment in non-recursive databases were not effective here.
We have developed a new method for describing and locating recursive fragments
based on the mathematical properties of lattices. We showed that lattice structures
provide a good theoretical and practical basis for describing and locating recursive
fragments. The performance of this lattice approach was analyzed both theoretically

and experimentally in this thesis.

© Copyright 1995 by Sungwon Jung
All Rights Reserved

To my parents

ACKNOWLEDGMENTS

I wish to express my gratitude and appreciation to my thesis advisor Dr. Sakti
Pramanik for his consistent guidance and encouragement right from the beginning,
and his ﬁnancial support. I am grateful for many discussions and invaluable comments
he provided.

I would like to thank my guidance committee members, Dr. Moon Jung Chung,
Dr. .lon Sticklen, and Dr. David Rovner, for their help and guidance.

I must express my heartful thanks to my father, BongJo Jung and my mother,
JungIIee Kim for their Sincere pray during my studying here. I also thank my two
sisters, Heewon Jung and Soonwon Jung, and my brother Sungkyu Jung for their

encouragement.

vi

TABLE OF CONTENTS

LIST OF TABLES ix
LIST OF FIGURES x
1 Introduction 1
1.1 Problem Description ............................. 1
1.2 Previous Works and Our Approach ..................... 4
1.3 Dissertation Outline ............................. 11

2 HiTi Graph Model of Large Recursive Relations for the Shortest

Path Computation 12
2.1 Introduction .................................. 14
2.2 Formal Framework and Description of HiTi Graphs ............ 19
2.3 Use of HiTi Graphs for Computing Shortest Route ............ 28
2.3.1 Shortest Path Algorithm SPAH ...................... 29
2.4 Performance Analysis ............................. 35
2.4.1 Comparison between SPAH and A* algorithm ............. 37
2.4.2 Effects of edge cost distribution ...................... 38
2.4.3 Effects of the number of hierarchical levels ................ 40
2.5 Updating HiTi Graph ............................ 42
2.5.1 Edge Deletion ................................ 42
2.5.2 Edge Addition ............................... 43
2.6 Conclusion ................................... 44
3 HiTi Graph-based Parallel Shortest Path Algorithms 46
3.1 Introduction .................................. 47
3.2 Intra Query Parallel Shortest Path Computation .............. 49
3.2.1 Use of HiTi Graph for Identifying Middle Nodes ............. 51
3.2.2 Efﬁcient Stopping Criteria ......................... 53
3.2.3 Formal Description of PASPAH ...................... 55
3.2.4 Performance Evaluation .......................... 58
3.3 Inter Query Parallel Shortest Paths Computation ............. 62
3.3.1 Formal Description of ISPAH Algorithm ................. 62
3.3.2 Performance Evaluation .......................... 63
3.4 Conclusion ................................... 65

vii

viii

4 Description and Location of Distributed ﬁagments of Large Recur-

sive Relations 67
4.1 Introduction .................................. 69
4.2 Fragment Description by Lattice Structures ................ 71
4.2.1 Maximal and Minimal Nodes Approach ................. 75
4.2.2 Lattice Approach .............................. 76
4.2.3 Lattice Cover Problem is NP — complete . . . .. ............. 79
4.2.4 A General Heuristic for Finding a Good Lattice Cover ......... 83
4.2.5 Near Optimal Lattice Cover Size ..................... 91
4.3 Finding Relevant Fragments ......................... 94
4.4 Updating Recursive Relations ........................ 101
4.4.1 Edge Update ................................ 102
4.4.2 Node Addition and Deletion ........................ 107
4.4.3 Performance Analysis of Update Algorithms ............... 107
4.5 Conclusion ................................... 111
5 Conclusion and Future Research 112
5.1 Contribution of This Dissertation ...................... 112
5.2 Future Research ................................ 115
APPENDICES 116
A HiTi Graph Based Distributed 'D‘ansitive Closure Algorithm 116
B Empirical Analysis of Computing Shortest Path for Road Map

Queries 124
8.1 Performance comparison of Shortest path algorithms ........... 124
8.2 Conclusion ................................... 129
C A Survey of Database Problems in Intelligent 'h‘ansportation Sys-

tem 131
CI Introduction .................................. 131
C.2 Data Models for Vehicle Navigation Systems ................ 134
C21 Indexing Schemes for accessing multidimensional road objects ..... 135
C22 Large Database Issues ........................... 139
(1.2.3 Update ................................... 140
C.3 Queries .................................... 141
C.3.1 Optimization ................................ 142
C.3.2 Query Types ................................ 143
CA Positioning Technology ............................ 144
C5 Current Systems ............................... 146

BIBILIOGRAPHY 149

LIST OF TABLES

2.1 N}, B}, and W} of SC} in Figure 2.2 .................... 22
2.2 Boundary nodes, between and within edges of level 1, 2, and 3 subgraphs

in Figure 2.4 ................................ 25
2.3 Effects of levels of HiTi graphs on H" when 10 S lNgll S 20 ....... 40
2.4 Effects of levels of HiTi graphs on H” when 4 S [Nil] S 8 ......... 40
2.5 Effects of levels of HiTi graphs on '1'" .................. .. . 41
4.1 An example of relation R .......................... 72
4.2 An example of relation R’ .......................... 72
4.3 Newly created nodes for each set S.- for transforming from MCP to LCP 82
4.4 Lattice L and Set 3;, ............................. 90

2.1
2.2
2.3
2.4
2.5
2.6
2.7
2.8
2.9
2.10
2.11

3.1
3.2
3.3
3.4
3.5
3.6

3.7
3.8
3.9

4.1
4.2
4.3
4.4
4.5
4.6
4.7
4.8
4.9
4.10
4.11
4.12
4.13
4.14

LIST OF FIGURES

Examples of within and between connection of CROMS .......... 17
A digraph G and its level 1 subgraph SC}, SC; and SC; ......... 20
An example of level 3 subgraph tree ST .................. 24
An example of a digraph C(V, E) partitioned according to ST in Figure 2.3 25
A level 2 HiTi graph corresponding to the digraph in Figure 2.4 ..... 27
An example of level 4 subgraph tree ST .................. 29
SPAH: Finding a Minimum Cost Path from START to DEST ..... 34
A* Algorithm: Finding a Minimum Cost Path from START to DEST . 36
A 800 x 800 grid graph partitioned according to the level 4 ST ...... 37
Performance comparison between A* and Algorithm S PAH ....... 38
Effect of edge cost on the performance of MA and MD .......... 39
Relative size of explored search space for A* and 0T0par ........ 50
Explored search space size of PAS PAH .................. 50
Level 1 HiTi graph created from level 1 subgraphs ............. 51
PASPAH: Find a shortest path from S' in 3G}, to T in SC} ........ 56
An example of sets I and II ......................... 59
Performance of PASPAH, SPAH, MOTOpar, and MOTO over: a) SET

I b) SET 11 ............................... 61
ISPAH algorithm ............................... 63
Performance of ISPAH ............................ 64
Performance of MISPAH ........................... 65
A digraph G for an acyclic relation ..................... 74
Three F -subgraphs .............................. 74
An Example of DAG GR(VR, ER) ...................... 76
Two lattice covers for the F -subgraph G2 .................. 78
Partial order sets created from MCP to LCP ................ 82
A digraph G and F-Subgraph of G ..................... 84
Algorithm 1: Partition an Unsuitable Candidate lattice L =< at, y > . . . 85
Changes of SGL(VL, EL) in Algorithm 1 .................. 86
Algorithm 2: Find a Lattice Cover for an F-subgraph ........... 88
A digraph G consisting of two F-Subgraphs ................. 89
Performance analysis for Algorithm 2 .................... 93
Procedure Block(;r,t) ............................. 97
a) DAG for the whole recursive relation b) Fragment table ........ 98
Algorithm 3: Find the Relevant Fragments ................. 99

X

4.15
4.16
4.17
4.18
4.19

A.1
A.2
A.3
A.4
A5
A6

8.1
8.2
8.3
8.4
85

xi

Procedure AddEdge((:c, y), S) ........................
A DAG consisting of two F -Subgraphs ...................
Procedure DeleteEdge((;r, y), s) .......................
Effects of updates on ILCpl for a) non-dense and b) dense F-Subgraphs .
Snapshots of lLCpl and Q after 50 random updates ............

Algorithm TC.1: AscendingPhase (U, j, k) .................
Algorithm TC.2: DescendingPhase (Z, j,i) .................
G'( V, E) partitioned according to level 3 subgraph tree ..........
A level 2 HiTi graph created from Figure A.2 ...............
Algorithm TC.3: CoordinatorSite j(H,TC,k) ...............
Algorithm TC.4: ParticipantSite 5(F, m, j) ................

Dijkstra’s shortest path algorithm ......................
Nicholson’s Shortest path algorithm .....................
Performance comparison between Dijkstra and Nicholson’s algorithms . .
Our two tree expansion shortest path algorithm ..............
Performance comparison between Nicholson’s and Our algorithms . . . .

Chapter 1

Introduction

1.1 Problem Description

Recursive relations have become one of the dominant types of relations in next gen-
eration databases such as deductive and object-oriented databases. The recursive
relation can be easily viewed as a directed graph. Each instance of attributes partic-
ipating in the transitive closure represents a node in the graph, and each tuple in the
relation represents an arc [53]. The above recursive relation is usually called an edge
relation, where each tuple corresponds to a labeled edge. Traditionally, the recursive
relation refers to this edge relation. Queries are recursive if they are deﬁned over re-
cursive relations and their processing requires transitive closure computations. These
recursive queries also must access detailed information about the nodes. Thus, we are
deﬁning another type of relation, called a node relation, where each tuple represents
the detailed information of a node in the graph.

A signiﬁcant amount of research has been done on recursive relations, primar-

2

ily in computing transitive closures [4, 42, 46, 49, 50, 57, 74, 83, 99, 102]. This is
- because every linearly recursive query requires transitive closure computation [52].
The-Shortest path problem is a Special case of the transitive closure computation
problem. The shortest path problem is fundamental for processing road map queries.
Very large recursive relations are needed to store topographical road maps. Previ-
ously suggested transitive closure or graph traversal algorithms are not suitable for
shortest path computations on topographical road maps because of their large search
space.

In this thesis, we will ﬁrst study an efﬁcient database organization method that
allows us to speed up the single pair shortest path (SPSP) computation from source
to destination nodes on large topographical road maps. We will then investigate a
fast SPSP algorithm that takes advantage of our proposed data organization method.
The efﬁciency of our proposed SPSP algorithm and data organization method will be
thoroughly analyzed on two dimensional grid graphs. Two dimensional grid graphs
are considered to be typical examples of topographical road maps [64, 91]. Grid
graphs not only closely model topographical road maps but also provide road map
databases for controlled analysis of our algorithms.

This thesis wll then examine the parallel processing for intra and inter query SPSP
problems on large topographical road maps. Intro query SPSP problems deal with
parallelizing a Single transaction of SPSP computation for further speedup, while inter
parallel SPSP problems deal with computing multiple SPSP transactions in parallel.
The inter query SPSP problem arises in the domain of automobile navigation systems

where a single central server provides route information to automobile navigators. In

3
these systems, many vehicles send their SPSP computation requests to a central
server and the server must be able to compute these multiple SPSP computation
transactions within a reasonable amount of time. Thus, it is very useful to develop
an efﬁcient inter query parallel SPSP algorithm.

In a centralized database, the primary concern of recursive query processing is
an efﬁcient transitive closure computation (i.e., a Shortest path computation for to-
pographical road maps). However, in a distributed database system, the problems
of describing and locating data are interrelated, and their optimization is an impor-
tant goal for recursive query processing. One important optimization parameter for
describing and locating distributed data-is the data transmission cost, because this
is a major performance bottleneck in distributed query processing. In distributed
database systems, it is often convenient and beneﬁcial to fragment and distribute
data according to referencing frequency or locality. By storing only frequently ac-
cessed data at the site, we can signiﬁcantly reduce the data maintenance and trans-
mission cost. In recursive relations, frequently accessed data are likely to be related
through the transitive relationships between them.

Most papers on parallel and distributed computation of transitive closure (includ-
ing the shortest path computation) have used either predicates or hash functions to
describe, distribute, and locate the data [32, 45, 47, 74, 84, 102]. However, the above
two methods are not suitable for describing and locating the distributed fragments
because they do not properly capture transitive relationships between data. In this
thesis, we will investigate the problem of describing distributed fragments of the re-

cursive relation that capture transitive relationships between data. Next, we will

4

show how to locally determine the location of the remote fragments of the recursive
relation. By locally determining the location of remote fragments, we can avoid a

Significant amount of communication cost for recursive query processing.

1.2 Previous Works and Our Approach

Shortest path problems have been the fundamental problems in computer science.
Therefore, a Significant amount of research has been done on these problems, and
excellent survey papers can be found in [23, 25, 31, 34, 81]. In the domain of auto-
mobile navigation systems where a database contains large recursive relations such
as topographical road maps, one of the primary functions is to compute the shortest
route from the current location (i.e. where a driver is) to the destination (i.e. where
the driver wants to go). Thus, computing SPSP is also essential for processing of
road map queries. As a database problem, the SPSP problem has been traditionally
studied as a special case of Single source and all pair Shortest path problems in the
context of recursive query processing [7, 8, 46, 65, 86, 101], transitive closure computa-
tion [3, 21, 22, 40, 48, 49, 52, 53, 57, 70, 103], and database query languages [2, 26, 71].
However, due to the large search Space a topographical road map creates [64], these

single source or all pair shortest paths algorithms are not suitable for computing
SPSP on this kind of map.

Dijkstra ﬁrst proposed an algorithm developed Speciﬁcally for SPSP problems [24].
This algorithm further reduces the search space required by single—source and all-pair

shortest path algorithms. However, the search Space Dijkstra’s algorithm has to

5

explore still remains very large when large topographical road maps are considered.
Nicholson [77] proposed a scheme that ﬁnds SPSP by alternately building two trees,
one rooted at the source node and the other rooted at the destination node. By
exploring the search space from both ends simultaneously, his algorithm can reduce
the explored search Space of Dijkstra’s algorithm by roughly half.

Another approach to reducing the explored search Space is to use semantic in-
formation about the domain of topographical road maps. That is, each node in the
graph corresponding to a topographical road map has its distinct coordinates repre-
sented in terms of longitude and latitude. By taking advantages of this information,
we can estimate the cost of the Shortest path between any two nodes. Either Eu-
clidean or Manhattan distances are usually used for the estimation of shortest path
cost between nodes. A* algorithm [30] is a typical example of this approach. A*
algorithm using Manhattan distance estimation gives an optimal shortest path only
when edge costs are uniform. This is because when edge costs are nonuniform, the
Manhattan distance is usually overestimated. Unlike A* algorithm using Manhattan
distance estimation, A* algorithm using Euclidean distance estimation always gives
optimal Shortest paths. The performance of A* algorithm was empirically analyzed
on grid graphs as well as on actual topographical road maps by Pearl and Shekhar
et. al. in [80,91].

Pearl [80] showed that A* algorithm was more efﬁcient than Breadth First Search
(BFS) [20] when the database fit in main memory. However, Pearl’s analysis did not
examine the effect of path length and edge costs on the relative performance of A*

algorithm. This was later analyzed by Shekhar et. al. [91]. Their analysis was done on

6

10 x 10, 20 x 20, and 30 x 30 grid graphs with different edge-cost distributions and one
actual digitized Minneapolis road map consisting of 1089 nodes and 3300 edges. Their
analysis shows that A* outperforms BFS if the path (source,destination) is much
smaller than the diameter of the graph, or if the edge—cost distribution is skewed.

The explored search space can be further reduced by combining A* algorithm and
Nicholson’s two-tree building approach. That is, each tree is expanded by using A*
algorithm. Mohr and Pasche [72] proposed a new SPSP algorithm named 0T0, which
is a hybrid of A* and Nicholson’s algorithms. They provided an empirical analysis of
0T0, Nicholson’s, A*, and Dijkstra’s algorithms on three grid graphs (i.e., 100 x 100
nodes with three different edge cost distributions) and one road map of Switzerland
(i.e., 3937 nodes and 12500 edges). For their empirical analysis, they used Manhattan
distance as the estimation of shortest path cost between nodes. Their analysis showed
that 0T0 outperforms Dijkstra and A* algorithms. The 0T0 algorithm was also
Shown to perform far better than N icholson’s if the degree of edge cost variation is
not high.

Agrawal and J agadish [5] developed a data organization approach to further speed
up SPSP computation. Their approach was to precompute some partially precom-
puted path information and then use it at run time to prune the explored search
space. This data organization technique assumes that a graph (i.e., topographical
road map) can be decomposed into a set of subgraphs. This assumption is reasonable
in the domain of road maps, since a large road map can be easily decomposed into
a set of small sub-road maps. While this approach gives much faster computation

time than all the algorithms discussed above, it has additional storage overhead and

update complexity.

Although the approach of Agrawal and Jagadish speeds up the Shortest path
computation Signiﬁcantly, it does not optimize processing the shortest path queries
based on road map navigation. The reason for this is that the shortest path obtained
from their approach still includes all the detail intermediate nodes between the source
and destination, and navigators do not usually need this information. This problem
would be very serious if we needed to compute a shortest path on a large topographical
road map on which the discretized interval is ﬁne—grained, and the two end nodes
on the path are far apart. To cope with this problem, we have developed a H iTi
(Hierarchical mulTi) graph model of very large recursive relations (e.g., topographical
road maps).

The H iTi graph provides a novel approach to abstracting and structuring road
maps in a hierarchical fashion. It is modeled after the mechanism people use to select
a route on a road map. That is, when a person wants to ﬁnd the shortest route
from a current location in one state to a destination point in another, he/ she usually
reads road maps in a deceasing map-scale order. For example, a driver who wants
to ﬁnd the shortest path from a current location in East Lansing to a destination
in Los Angeles, will consult the highway road map for the entire U.S.A. ﬁrst. The
person will then ﬁnd a more detailed highway connection by reading state road maps
(i.e., Michigan, and California). Finally, the driver will consult city road maps of
East Lansing and Los Angeles to ﬁnd very detailed information about the starting
point and destination. By maintaining road maps in this hierarchical structure, H iTi

graphs provide the following three features:

8

1. They allow the computation of a Shortest path without having to search through

all the detailed maps of the intermediate states and cities.

2. They provide a basis for storing road maps at various levels of hierarchical ab—
straction. Thus, a more controlled storage management suitable for navigation

systems (e.g., automobile navigation systems) with limited available storage can

be built.

3. They are able to support various types of database queries on hierarchical ab-

stractions of a road map.

HiTi graphs, therefore, provide a powerful framework for implementing road-map
queries (e.g., shortest-path computation) as well as for storing very large topograph~
ical road maps. We developed a new Shortest-path algorithm named S' PAH , which
utilizes H iTi graph structure. By taking advantage of hierarchical edge levels in the
H iTi graph, SPAH Signiﬁcantly reduces the explored search space. The performance
of SPAH can be further improved if Euclidean distance estimation is used. Although
S PAH reduces the explored search space signiﬁcantly, its performance improvement
is limited by the computation power of a single processor.

In order to overcome this performance limitation, we studied parallel processing
for Shortest path problems. In this study, we considered not only intra but also inter
query SPSP problems. Intro query parallel SPSP problems deal with parallelizing a
single pair Shortest path transaction, whereas inter query SPSP problems deal with
parallelizing multiple Single-pair shortest-path computations. Mohr and Pasche [72]

proposed an intra parallel SPSP algorithm called 0T0par, which is a parallel im-

9

plementation of the 0T0 algorithm mentioned above. That is, two processors are
used in OTOpar, and each processor executes A* algorithm with Manhattan distance
estimation to build its corresponding tree. To the best of our knowledge, no previous
work has been done on inter query SPSP problems.

The 0T0par algorithm is not scalable because its use of only two processors
limits its performance improvement. Therefore, we need a parallel intro query SPSP
algorithm that is more scalable and ﬁne-grained than OTOpar. In this thesis, we
proposed both intra query and inter query parallel SPSP algorithms named PAS PAH
and M I SPAH respectively. These two algorithms are based on the HiTi graph
structure. The H iTi graph structure provides the opportunity for ﬁne-grained parallel
processing for intra query SPSP problems. This is because multiple parallel shortest-
path computations can be initiated based on the nodes at a higher level of abstraction
of a topographical road map.

This far, we have discussed recursive query processing in a centralized database
system where data are stored at a single location. In this environment, the primary
goal of optimizing recursive query processing is to speed up either Shortest path
computation or, more generally, transitive closure computation. Location of the data
in recursive relations is not an issue because they are all in one location. However, in a
distributed database system, recursive relations are often fragmented and distributed
over various sites to reduce the data maintenance cost and the data access cost. Each
site stores the data (i.e., fragments) that are frequently accessed at that sites. In such
a system, the reduction of transmission of data between sites is also very critical for

the optimization of recursive query processing. Thus, we need an efﬁcient fragment

10

description and location method to avoid the signiﬁcant communication costs for
query processing. This could be achieved by keeping remote fragment descriptions in
local Sites.

Most papers on parallel and distributed computation of transitive closure (includ-
ing the shortest path) have used either predicates or hash functions to fragment,
distribute, and identify the data in recursive relations [32, 45, 47, 74, 84, 102]. Huang
et.al. [45] and Valduriez et.al. [102], studied the parallel computation of transitive
closures where a hash function is applied to fragment a recursive relation. Gan-
guly et.al. [32] suggested a framework for the parallel processing of datalog queries
where hash functions are also used to fragment recursive relations. Nejdl et.al. [74]
studied evaluating recursive queries in distributed databases in which recursive rela-
tions are fragmented by predicates. However, these fragmentation criteria, deﬁned by
hash functions or predicates, are not suitable for fragmenting recursive relations in
distributed environments. This is because they do not properly capture the referenc-
ing locality of data, which is of paramount importance to the success of distributed
database systems. Locality often comes from the transitive relationships between
data. Stated another way, relevant data, frequently accessed by recursive queries,
are likely to be related in the transitive closures. Hence, in distributed databases,
fragmentation criteria based on the transitive relationship are considered to be more
suitable for recursive relations than traditional criteria.

Based on the criteria deﬁned by transitive relationships, we developed a method
to describe and locate the fragments of a recursive relation. Our method uses lattice

structures. Lattice structures provide a powerful mathematical representation tool

11

for the description of recursive fragments. A distinct set of lattices describes each
recursive fragment. This lattice representation provides a concise but ﬂexible frag-
ment description. It is concise in that a large recursive fragment is represented by
a small set of lattices. At the same time, it is ﬂexible in that it can represent any
type of recursive fragment. Finding the optimal lattice descriptions of fragments is
Shown to be an NP-complete problem. We proposed an efﬁcient heuristics for this
problem. The performance of our heuristics was analyzed theoretically as' well as
empirically. The empirical analysis Showed that our heuristics gives a near-optimal
lattice description. The lattice approach creates unique update problems. We have

extensively analyzed the performance of our proposed update algorithms.

1.3 Dissertation Outline

The structure of this thesis is as follows: Chapter 2 presents a HiTi graph model
of large recursive relations for efficiently computing an optimal single pair shortest
path. In Chapter 3, parallel processing for intra and inter single pair shortest path
problems is discussed. Chapter 4 presents a methodology for describing and locating
the distributed fragments of recursive relations in a distributed database. Chapter 5
concludes the thesis with a list of future research issues.

The following parts of this thesis have been published: The content of chapter 2
appears in [59]. The content of Chapter 4 appears in [82]. The content of Appendix
A appears in [58]. Appendix B [60] and C [104] are being submitted. Chapter 3 will.

be submitted after additional work has been done.

Chapter 2

HiTi Graph Model Of Large

Recursive Relations for the

Shortest Path Computation

In navigation systems, a primary task is to compute the minimum cost route from the
current location to the destination. One of the major problems for navigation systems
is that a signiﬁcant amount of computation time is required to ﬁnd a minimum
cost path when the topographical road map is large. Since navigation systems are
real time systems, it is critical that the path be computed while satisfying a time
constraint. An efﬁcient database organization method is proposed for structuring a
topographical road map to speed up the computation of a minimum cost path. Data
organization methods previously suggested either do not allow optimal minimum cost
path generation or require searching all intermediate nodes on the minimum cost path.

12

13

In navigation systems, a navigator may not have to know all the intermediate nodes to
go from the current location to destination. In this chapter, we propose a new graph
model named HiTi (Hierarchical mulTi graph model), for efﬁciently computing an
optimal minimum cost path. Multiple levels of geographical boundaries (e.g. cities,
counties, and states) can be easily mapped into the hierarchical structure of the
proposed graph model. Various levels of hierarchical abstractions can also serve as
the basis for efficient storage management for large road maps. Thus, it provides the
basis to develop a more controlled storage management suitable for navigation systems
with limited available storage (e.g., navigation system inside an automobile). Based
on HiTi graph model, we propose a new single pair minimum cost path algorithm.
We empirically Show that our proposed algorithm dramatically reduces the explored
search space. Further, we empirically analyze our algorithm by varying both edge

cost distribution and hierarchical level number of H iTi graphs.

14

2.1 Introduction

In navigation systems, a primary function is to ﬁnd possible routes from the current
location (e.g., where a driver is currently positioned) to the destination (e.g., where
the driver wants to go) with a minimum expected cost. For this purpose, they use a
topographical road map which is in the form of the following recursive relation :

topographical.road-map (source, destination, cost)

where the cost attribute indicates, for example, a minimum expected time of travel
from point source to point destination. Another applicable cost can be the Shortest

distance between the two end points.

One of the major difﬁculties of navigation systems is the size of the topographical
road map data. It requires about 2.4 Gbytes of storage to store a small 100 mi x
100 mi map discretized at 100 feet intervals [5, 64]. Thus, the size of data involved is
very large when larger maps are considered. Minimum cost route computation with
this large amount of road map data requires a Signiﬁcant amount of computation time.
Since navigation systems are real time systems, it is critical to compute a minimum

cost route satisfying a time constraint.

Previously suggested transitive closure or graph traversal algorithms [4, 46, 49,
50, 57, 83, 86, 99] are not directly applicable to topographical-road-map (source,
destination, cost) for the computation of a minimum cost path due to the very large
volume of data they have to search. Thus, we need an efﬁcient database organization

method for structuring the topographical road map to speed up the computation of

15

a minimum cost path. In this regard, two different approaches have been studied in
the past. One approach is to develop a database structure which gives a suboptimal
minimum cost path quickly. The other is to develop the database structure which

gives an optimal Shortest path.

For the suboptimal shortest path generation, Ishikawa et. al., Shapiro et. al., and
Liu et. al. used road hierarchies (i.e., Freeways, Highways, . . ., Side roads) to speed up
minimum cost routes [51, 68, 89, 107]. They used multiple levels of hierarchical details
of road maps to cut down the unnecessary search Space. Ishikawa et. al. [51] applied
Dijkstra algorithm to the hierarchically structured road maps. Shapiro et. al. [89]
proposed a new graph structure, named LGS' (Level Graph Structure ) which models
the road hierarchies theoretically. Based on LGS, Shapiro et. al. gave a new algorithm
which generates approximate shortest paths rapidly. Their study Showed that the
length of the path produced by LGS converges rapidly to that of the actual minimum
cost path as the distance between the source and destination nodes increases. Liu et.
al. [68] studied integrating Dijkstra’s algorithm with an Al knowledge—based approach

and case-based reasoning for computing a minimum cost path.

For the optimal Shortest path generation, Agrawal and Jagadish recently studied
a data organization technique which precomputes and stores some partial path infor-
mation [5]. They use the precomputed partial path information to prune the search
space when computing a minimum cost path. While their approach Speeds up the
computation of a minimum cost path, it does not optimize processing minimum cost

path queries based on road map navigation. The reason is that their approach still

16

requires searching all the intermediate nodes on a minimum cost path. In navigation
systems, it is not necessary for a navigator to know all the intermediate nodes on
a minimum cost path. This can be illustrated by the following example. Consider
an automobile navigation system where a driver wants to ﬁnd a minimum cost path
from a place in East Lansing to a destination point in New York city. He/she needs
to know more about detailed route information near East Lansing and New York
city than those in other intermediate places (e.g., Toledo, and Pittsburgh). It is not
likely that that person has to know about detailed route information within all the
intermediate cities. The rational for this is based on the following two observations.
First, a driver likes to have detailed route information near East Lansing to get into
an interstate highway from a current location in East Lansing. Second, he/she needs
to know detailed route information near New York city to get into the local road

which leads to the destination point in New York city.

The above problem would be Very serious if we need to compute a minimum cost
path on a large topographical road map where its discretized interval is ﬁne-grained
and two end nodes on the path are far apart. To cope with this problem, we have
developed a H iTi (Hierarchical mulTi) graph model of very large recursive relations
(e.g., topographical road maps). In the following paragraphs, we will ﬁrst describe

Hi Ti graphs informally followed by a more formal description of the model.

Consider a topographical road map viewed as a directed graph G( V, E). Nodes in
V correspond to discretized grid points representing map objects in a road map. Edges

in E correspond to the connections between the nodes in V. Then, regional bound-

17

aries partition a road map C(V, E) into a set of Component ROad Maps (CROM)
exclusively. Each CROM can be viewed as a subgraph with its boundary nodes deﬁn-
ing the boundary of the CROM. Connectivity between CROMS is represented by the
connectivity of their boundary nodes. One CROM has direct connections with an-
other, if boundary nodes of the former are directly connected to boundary nodes of
the latter. We call this kind of connectivity as between connections of CROMS. For
each CROM, we precompute the cost for each pair of connected boundary nodes. If
two boundary nodes of the same CROM are connected by a path solely contained
within the CROM, we call this a within connection. By using the between and within
connections, we can partition the entire road map. Examples of between and within
connections of CROMS are shown in Figure 1.1 where dotted lines represent regional

boundaries.

Withinconnection forCROM 1:
“MD!

Within connection for CROM 2:
1(B.C).(B.D).(C.D)l
Within connection for CROM 3:

[(EG).(B.F).(F.G)I

Betweenconneetion forCROM I and2:
1(A.B)l

Between connection for CROM 2 and 3:
((C3).(D.F)l

Between connection for CROM] and 3:
101.0”

 

 

Figure 2.1: Examples of within and between connection of CROMS

HiTi graph is a graph whose nodes are the boundary nodes of the CROMS and

edges are the within and between connections of CROMS. Note that a CROM can be

18

deﬁned to contain a set of CROMS, thus creating a multi level hierarchy. Thus, we
ﬁrst need to determine a set of the lowest level CROMS which exclusively partition an
entire road map. We call these CROMS as level 1 CROMS. Then, we can recursively
construct a set of level k CROMS by grouping a set of geographically adjacent level
19-] CROMS where k 2 2. These sets of level I, 2, . .. ,k CROMS form a complete
balanced tree structure where the root node of the tree is a whole road map. For
example, consider a topographical road map covering the entire area of U.S.A. The
U.S.A. consists of 4 subregions which are Eastern, Midwestern, Southern and Western
regions. These 4 regions consist of their subregions which are the states of U.S.A.
Each state consists of a set of counties and each county consists of a set of cities. If
a city road map is deﬁned as basic i.e., level 1, CROMS, then level 2 CROMS are

county road maps, level 3 CROMS the state road maps, and SO on.

Given a complete balanced tree representing level I, 2, ,k CROMS, level 1
CROMS will be the leaf nodes, i.e., level 1 node of the tree. Each level 2 nodes ‘of the
tree will represent the level 2 CROMS and capture the between and within connections
of the level 1 CROMS. Thus, in general, a complete balanced tree has 1 thru k levels
corresponding to 1 thru k level CROMS. A level i (S k) node of the tree corresponds
to between connections of level i CROMS plus level 1 within connections. Note that
level 1 within connections capture the between and within connections of the level
1— 1 CROMS that corresponds to the child nodes of the tree. A HiTi graph is named
the level I: Hi Ti graph if its nodes and edges represent boundary nodes, between and

within connections of level I, .‘3, ,k CROMS. The HiTi graph constructed for the

19

United States example above is a level 4 Hi Ti graph.

The rest of the paper is organized as follows. In section 2.2, a formal framework
and description of HiTi graphs are discussed. In section 2.3, we propose and empiri-
cally analyzed a new Single pair shortest path algorithm that takes advantage of Hi Ti
graphs. Section 2.5 discusses the effects of updates on HiTi graphs. Finally, section

2.6 gives concluding remarks.

2.2 Formal Framework and Description of HiTi

Graphs

Recursive relations have the generic form R (attl, attz, att3) where attributes satisfy

the following 3 conditions:

1. attl and att-z are the key attributes of R and share the same domain

2. att, and att-z are the recursive join attributes and their corresponding values

are related by some transitive relationship

3. att3 describes the transitive relationship between attl and attg

Based on the above generic form, a recursive relation can be viewed as a directed
graph G(V, E). Each node in V is represented by the value of attl. Each edge in E

is represented by values of (attl, attg, att3).

20

Suppose that a recursive relation G(V, E) is partitioned into a set of subgraphs

(i.e. SG] (141,531), SG; (V21,E._}), ..., SG1 (VulnEl ) ) such that:

1lj 111

V,‘OV.,‘u-~uv‘ =v, E,‘uE,§u---uE‘ CE

n1 n1

Vilﬂl/j1 = w and £3,105};- = Q) where 1Si,an] and iyéj

We name SG} (W, EB) a level 1 subgraph i and the connections between all level 1
subgraphs are captured in PE1 2 E— (E,1 U E} U- - -U E)“ ). Then, each (:r,y,z) E PEl

has the property that a: 6 V,‘ and y 6 VJ} where i 9i j. Note that (:r,y,z) corresponds

to (attl, attg, att3) where a: = attl, y = attg, z = att3.

Deﬁnition 2.2.1 Let N,1 denotes a set of nodes in SC} (Vil, E3) connected to/from
nodes in other subgraphs SG] (VJ-1, E,‘) where i 79 j and 1 S i,j S n1. Speciﬁcally,
N,‘ ={;r|(a:,y,z) 6 PE1 /\ :1: 6 V,‘} U {y|(;r,y,z) 6 PE1 /\ y 6 14‘}. Set

N-1 is called level I boundary nodes of SG}.

3

 

Figure 2.2: A digraph G and its level 1 subgraph SG], SG; and 5G;

21

For example, consider Figure 2.2 where a digraph G and its subgraphs SG}, SG; and
5G}, are Shown. Set N21 of subgraph SG; is { G, H, M, O }. Each level 1 subgraph
is described and identiﬁed by its boundary nodes, since they exclusively belong to
one level 1 subgraph. Based on boundary nodes of 56'}, deﬁnition 2.2.2 gives formal

deﬁnitions of level 1 between and within edge sets B} and W,‘ of SG].

Deﬁnition 2.2.2 Let { SG} ,1,E,l) | j = l,n1 } be a set of level I subgraphs of
graph G(V, E) with the corresponding N}. Then, for each SG} where 1 S i S n1,
B.’ ={ (mas/.2) l ($.y.z) 6 PE‘ A w 6 V9 } and W.‘ = { ($,y,fz(2=,y)) l ($.31) E

(N1 x N3) A (x’ﬂ’y in $03) /\ as ye y}.

Each set B} contains an edge (:r,y,z) if there is an edge (x,y,z) in PE1 where a:
is in SG} and y is in a different level 1 subgraph. In other words, set B} contains
connectivity information between level 1 subgraph SG} and other level 1 subgraphs

of G(V, E). Then, the following condition holds:

1 _ n 1
PE ._ u,;,B,-

11.1

From the above, it can be noted that edges in szlBJl exclusively partition graph

G(V, E) into a set of subgraphs SG] (V,‘, E]), SG; (V21, Eg), ..., 5G,!“ (an1, E3“).

Each set W} contains an edge (2:, y, fz(:r,y)) if there exists a directed path from
the boundary nodes .7: to y of 5G}. Function fz(a:,y) gives an aggregated cost (i.e.
from node 2: to y) with respect to :3 within SG}. In other words, each edge in W?

represents additional precomputed connectivity information of boundary nodes within

22
SG,! . Thus,

it I n 1 n 1

AS an example, Table 2.2 shows the corresponding values of N}, B} and W} of the

three subgraphs Shown in Figure 2.2. In the following example, fz(:r, y) gives the cost

of a minimum expected time of travel between node 2: and y.

 

N-1 B.1 w}

 

 

 

 

 

 

 

 

1 {Or} {(D,G,2),(F,H,3)} {(P,D,4)}
2 {G’H’MIO} {(OIQI2)I(MIR74)7(G1FI2)} {(G1016)’(G1M13)’(H1M13)}
3 {QiR} 0 (0

 

Table 2.1: N}, B}, and W-1 of SG} in Figure 2.2

I

So far, we have introduced level 1 subgraphs and their associated information such
as level 1 Boundary node sets, and between and within edge sets. In general, based
on these level 1 subgraphs and their associated information, we can recursively deﬁne
level I: subgraphs SG]c and their corresponding N}, B}, and W} for any h 2 2. Their

details are discussed in the following deﬁnitions.

Definition 2.2.3 Let \II"‘1 = { 5qu (Vik'l, Eff-l) I i = 1,nk_1 } be the set of
all level k-I subgraphs of graph G(V, E) where k 2 2. Then, a level I: subgraph
SG]c (Vf, Ef) is deﬁned as a subgraph induced by all nodes of level k-I subgraphs in

(I) Q ‘11,“1 where [CPI Z 2.

After all level k subgraphs in \II" are deﬁned, the following three requirements must

be satisﬁed:

1. Each level k-I subgraph in ‘1‘,“1 is a subgraph of some. level I: subgraph in ‘1”.

23

2. All level It subgraphs in W“ do not overlap among themselves. That is,

V,"uV,*u-~LJV’° = V, Equg‘uo-bEﬁk c E

nk

VfﬂVf = 0 and EfﬂEf = Q) where 1Si,an;c and iaéj

3. All edges belonging to level 1: between edge sets are removed from level k-I

between edges sets. AS a result, (Uj‘éj' Bf"l ﬂ UjvﬁlBJ-k) = (0.

By satisfying the preceding ﬁrst two requirements, all subgraphs in U53 Q” are

related to each other in a complete balanced tree structure where the root node of
the tree symbolizes G(V, E). This complete balanced tree is named a subgraph tree
(ST). The root node of ST is considered as a level k+I subgraph and it has all level
I: subgraphs as child nodes. Recursively, each node symbolizing a level I: subgraph
has a set of level k-I subgraphs as child nodes. All leaf nodes of ST symbolizes level
1 subgraphs. This subgraph tree is named level k+1 ST if the root of ST symbolizes
level k+1 subgraph (i.e. G( V, E)). The following Figure 2.3 shows an example of level
3 subgraph tree. This tree structure shows a level 3 subgraph SG? induced by the
nodes in level 2 subgraphs (i.e. S G?, S G: and S G3) It also shows a level 2 subgraph

3G? induced by the nodes in the two level 1 subgraphs SG] and SG;.

For each level It subgraph SGf" of ST where 1 S i S nk, there exist corresponding
level I: boundary nodes (i.e. N1“) and level 1: between and within edge sets (i.e. B!“ and
W,"). The semantics of the level I: boundary nodes, level 1: between and within edge

sets are same as that of level 1 boundary nodes, level 1 between and within edge sets.

24

so?
so, so; so; so} so5 so,

Figure 2.3: An example of level 3 subgraph tree ST

They are formally deﬁned in deﬁnition 2.2.4.

Definition 2.2.4 Let PE" = E — ([3]c U EfUn-U Eikl- Then, N," = { :c | (z,y,z)
6 PEkAxEI/iij{yl(x,y,;/)E PEkAyel/f},B,-k={(x,y,z)](:1:,y,z)

k k k k k M”) -
6 PE /\ we V, }andW,- ={ (:r,y,fz(:r,y)) | (x,y) E (N,- x N,)/\ (x ——i’ y in
5G5“) /\ :1: # y }. Function fz(;r,y) gives an aggregated cost (i.e. from node a: to y)

with respect to 2 within SGf.

As can be noticed from the above deﬁnition, PE" = @118}. Note that of: Bf"
= U32? Bf" — Uj-‘éle. In order to Show how NI“, Bf, and W!“ are obtained, we use

the digraph in Figure 2.4 as an example.

From Figure 2.4, we got PE1 2 { (4,6,2), (5,7,3), (5,20,7), (10,11,4), (10,12,6),
(14,16,4), (l5,14,3), (l7,19,5), (l7,20,4), (18,20,2), (23,25,4), (24,25,5) } and PE2 =
{ (5,20,7), (10,11,4), (10,12,6), (l7,19,5), (l7,20,4), (18,20,2) }. By utilizing these
PE] and PE), we can obtain boundary nodes, within and between edges of level 1
and 2 subgraphs. These are Shown in Table 2.2. Note that in Table 2.2, between

edge sets of SG; and SGl, do not contain edge(s) in {(5,20,7), (10,11,4), (10,12,6)}

 

 

 

 

 

 

 

 

Figure 2.4: A11 example of a digraph G( V, E) partitioned according to ST in Figure
2.3

and {(l7,19,5), (l7,20,4), (18,20,2) } respectively. This is due to the last requirement

related to Deﬁnition 2.2.3. Thus, in general, given level k+1 ST, edges in B!“ of S Gf

capture connections between boundary nodes of level k-I child subgraphs of S'G]c and.

those of the rest of level k-I subgraphs. Between edges of level k-I child subgraphs of i

S G? capture connections between boundary nodes of themselves.

 

 

 

 

 

i N} W} B:

1 {4,5} 0 {(4,6,2),(5,7,3)}

2 {6,7,10} {(6,10,3),(7,10,3)} ll

3 {11,12,14} {(11,14,2),(12,14,1)} {(14,16,4)}

4 {15,16,17,18} {(16,15,3),(16,17,2),(16,18,2)} {(15,14,3)}

5 {1920,2324} {(20,23,2),(20,24,3)} {(23,25,4),(24,25,5)}

6 {25} 0 0

i N} W} B?

1 {5.10} {511016)} {(5.2017).(10.11141.00.12.61}
2 {11,12,17,18} {(1l,17,8),(11,18,8),(12,17,7),(12,18,7)} {(l7,19,5),(17,20,4),(18,20,2)}
3 {19,20} 0 0

1 N? w? B?

1 0 0 0

 

 

 

 

 

 

Table 2.2: Boundary nodes, between and within edges of level 1, 2, and 3 subgraphs
in Figure 2.4

 

26

We have given basic deﬁnitions necessary to deﬁne a Hi Ti graph. However, these
deﬁnitions (i.e. Nf, BI“, and Wik) do not provide an efﬁcient way of obtaining their
corresponding values. A primary reason for this inefﬁciency is that their deﬁnitions
require obtaining their values by using V," and Ef, which is computationally expen-
sive. Thus, we need a more efficient way of computing these values. The following
deﬁnitions provide a recursive way of obtaining the values of N,-", Bf, and W," based

on the values of Nik", Bf", and Wf'l for all level It — l subgraphs in \Ilk‘l.

Deﬁnition 2.2.5 Let D = { I, 2?, ..., nk_1 } where nk_1 = [Wk-1|. Suppose level
I: subgraph SG]c (Vik, BI“) is induced by all nodes in level k4 subgraphs SGf’l, V
t E I where I C D. Then, N!“ = { :c I (x,y,z) E UjiLI'Bj-P'l A :1: E Utethk'l A

y ¢ UtelNzk-l } U i y 1 (3313/12) 6 Ujléi’Bjc-l A y E Utethk-l A 33 if UtEINtk-l }

Deﬁnition 2.2.6 Let D = { I, 2, ..., 711..-} } where nk_1 = [Wk-1|. Suppose level I:
subgraph SG]c (Vf, BI“) is induced by all nodes in level k-I subgraphs 50:”, V t E I
where I C D. Then, Bf = { (:c,y,z) | (z,y,z) E U'-"‘"‘B'~"l A a: E Utele’l A

1:1 J

y E Utethk-l }

Deﬁnition 2.2.7 Let D = { 1, f3, ..., 721..-, } where nk_1 = |W"“|. Suppose level I:

subgraph SGf (Vik, El“) is induced by all nodes in level k-I subgraphs SGf’l, V t E I
k _ k k III-1'1 ) .

where I C D. Then, W,- — { (:r,y,fz(a:,y)) | (x,y) E (N,- x N,) A (a: ——i y in

UtEl(Wtk-l U Bic-l) A 37 E y I-

By taking advantages of the above three deﬁnitions, we can efﬁciently obtain
values of different levels of boundary node, within, and between edge sets. Based on

within and between edge sets, the formal deﬁnition of Hi Ti graph is given as below:

27

Deﬁnition 2.2.8 A HiTi graph is a directed labeled graph deﬁned in terms of between
and within edges associated with all nodes of subgraph tree ST. A HiTi graph deﬁned
over level k+I subgraph tree ST is called level It HiTi graph. It is represented by
Hk(Pk,Ak) where Pk = { :1: I (.r,y,z) E Uf=,U;-‘j;_,B; } U {y I (2:,y,z) e Uf=,U;-‘;,B;
} and A" = I (may, [11W1fz($1y)l) | ($,y,fz($,y)) E U?‘=1W} f0?“ 1 = 11k } U {

(x,y,[l,B,z]) I (:r,y,z) E U?‘=1BJ’- for l=1,k}.

Note that the symbols “1” and “W”( “B ”) in the deﬁnition of an edge in A’c represent
the edge is level 1 within(resp. between) edge. An example of level 2 HiTi graph

corresponding to the digraph in Figure 2.4 is shown in Figure 2.5.

 

 

. (1:21 6 .
./'. "'3’
5 IIJJI ‘ my, '°

\ [31"07 /’

 

 

Figure 2.5: A level 2 HiTi graph corresponding to the digraph in Figure 2.4

We have introduced basic concepts and formal deﬁnitions of a level I: Hi Ti graph

in this section. It is easy to see that the overhead of a level 11: Hi Ti graph comes from

n.

the within edges in sets UleUFIWj. Thus, for the HiTi graph model to be practical,

we introduce one requirement. That is, IUf=1U;-‘_'__1W;I should be small compared with
IEI in G(V, E). In practice, IUf=1U3‘;,W;I is likely to be very small. This is because

|W;I can not be greater than INjI x Ile where the boundary nodes of 5G;- are usually

quite small compared to IVJ-‘I.

28
2.3 Use of HiTi Graphs for Computing Shortest

Route

In this section, we discuss the use of HiTi graphs for computing a minimum cost
path. It is a Simple bookkeeping problem to keep track of nodes on a minimum cost
path; hence we will focus on an algorithm for simply ﬁnding a minimum cost of the

path. We ﬁrst introduce a set of basic notations which will be used in the rest of this
paper.

Deﬁnition 2.3.1 Let ST represent a level I: subgraph tree. Assume set X consist of

a set of subgraphs (i.e. nodes) of ST. Then, S],(X) { y I y is a level I subgraph
which is an ancestor of each subgraph in X }, SA(X) = UleSﬂX), and SC(X) = {
YI Y is a direct child node of a subgraph in X }. Note that Sc(X) = X for all leaf

node subgraphs S G} in X.

Deﬁnition 2.3.2 Assume set X consist of a set of subgraphs. Then, SB(X) and
Sw(X) consist of between and within edge sets associated with all the subgraphs in X

respectively. SN(X) consists of boundary node sets associated with all subgraphs in X.

Deﬁnition 2.3.3 Let S G} and SG} be two distinct level I subgraphs deﬁned in a level

I: subgraph tree ST. Then, LU B_g(,-;(SG1,SG}) is the least leveled common ancestor

subgraph of SC] and SC].

29

The examples of the deﬁnition 2.3.1, 2.3.2 and 2.3.3 are illustrated through the
level 4 subgraph tree shown in Figure 2.6. Assume that X = { SGé, SG]l }. Then,
91(x )- — { 9 9a.. 1, 9W) = { .903. .90: }. SAX) = { 90;. 90:, 903,
9'6"}, 5'6“ 111 . 9G5}, S(,~(X) = { sag, SGI, }, SC(S'2(X )) ={ 9G8, L 9G2” SGIO, SGI,,
3612 }1SB(X) = { 8211 311 }. Sw(X) = { W31. W111 1.5N(X) = { N819N111 }. and

LUBSG(SG,‘,, 501,) is 50;.

SO

/\ /I\ /I\
/\,l A \ \/,\

i 5(i. 50i 5011 SG\QSGIOSGII SGiszis 561456155016 SGiv SGis i9

Figure 2.6: An example of level 4 subgraph tree ST

2.3.1 Shortest Path Algorithm SPAH

We now describe a Shortest Path Algorithm based on H iTi graph (S PAH ) S PAH
takes advantage of Hi T i graphs to speed up the computation of a minimum cost path.
Our algorithm explores at most ES search space necessary for computing a shortest

cost path. The size of ES is shown in the The following theorem.

30

Theorem 2.3.1 Let ST represent a level k+I subgraph tree where level Is: HiTi graph
H"(P’°,Ak) is deﬁned. Assume that we want to compute a minimum cost path from
START node in SG,1 to DEST node in SG] Then, the explored search space ES is
at most the union of E}, E} and A’k Q A" where A’k = SB(SC(SA({SG},SG]}))) U

sw(90(.9..({9G:.96:11)).

Proof: Let I be the level number of LUBSG(SG,I, SGj). Furthermore, let E31

13; u E}. u SB(SC(U’ 9;;(S(;:,SC;))) u .S'w(Sc(Uf,=,S,';(SG},SG}))) and ES2

n=1“

II

= -S'B(SC(U:T—_l+152(LUBSGISGl1SUD») U Sw(SC(Uii:i+1SKILUBSGISGia 501)»)-
Then, we can represent ES as a union of BS, and E32. From ES, it is easy to
see that at least ES} search space is necessary to compute a minimum cost path
from the nodes START and DEST. Hence, we only need to show that E32 is
also a necessary search space. Assume that the shortest cost path from the nodes
START to DEST is 2:0 = START —> $1 —> :22 —> —> :rm —> mm“ —> xm+2
—> xm+3 —> -—> em“ = DEST. It is possible that ({xm} U u;‘=+,,‘{+2{x,,})
c .9~<.9c(u:.....92(.9G:,90;») and 9.... e 9~(so(u.’:::..9::(LUBsa(SG.%SCI)»
where (x,,,,:r,,,+1,[C,B,zI) and (:r,,,+1,:rm+2,[(,B,z]) are level C between edges for

some C such that 1 S (I S k. D

From theorem 2.3.1, it is easy to see that how Hi Ti graphs can signiﬁcantly reduce
the search space. That is, without using HiTi graph, ES would be U?;1E,l where 72,

represents a total number of level 1 edges sets. Furthermore, theorem 2.3.1 allows

more controlled storage management that is suitable for navigation systems (e.g.

31
automobile navigation system) where available storage is limited. This is possible
because we do not need to have all level 1 edge sets to compute a minimum cost
path. Instead, we only need a part of the level I: HiTi graph and two level 1 edge
sets (i.e. one for START node and the other for DEST node). Note that the major

storage overhead of a topographical road map comes from the size of level 1 edge sets.

We take a level 4 subgraph tree in Figure 2.6 to give an example of theorem 3.1.
Assume that we wants to ﬁnd a minimum cost path from node START in SG] to node
DEST in SGia- Then, our search space ES consists ofE,1 U Eis U SB(SC(SA(SGI)))

U SWISCISAISGi)» UizGISBISGi’) U SWISGill UiinlsBISGi) U SWISGill-

Within the search Space E S, SPAH traverses edges from node START in S G} to
node DEST in SG] to ﬁnd a minimum cost (i.e. with respect to cost 2) path. Its edge
traversal consists of two phases, the ascending and descending phases. The ascending
phase corresponds to the period of edge traversal from node START to the boundary
nodes in SN(S:,'1(SG;-)) where l is a level number of LU Bsa(SGI,SGj-). Similarly,
the descending phase corresponds to the period of edge traversal from the nodes in
SN(.5'34'1(SG])) to node DEST. During the ascending and descending phases, SPAH
traverses edges in a non-decreasing and non-increasing edge level order respectively.
Note that each of the considered paths has its own thread of processing. Some of
the paths are still in the ascending phase, whereas the others are in the descending
phase. The following proposition 2.3.1 provides the theoretical basis of the detailed

ascending method given in SPAH.

32
Proposition 2.3.1 Given a level I: HiTi graph, assume that I be the level number of
LUBSG(SGI,SG]) where node START E V,1 and DEST 6 Vi]. Assume that level
11 to r edges are incident on node :1: such that :1: E (V,1 U SN(SC(SA(SG})))) and 0 S
u S r. In order to ﬁnd the shortest path from node a: to any node in SN(S§"'(SG})),

it is enough to follow only level p edges from node a: where min(l — 1, 7') S p S r.

Proof: Since level 11 to r edges are incident on node x, we only need to Show that
it is not necessary to follow level ,11 to min(l — l, T) — 1 edges from node 3:. Then, we

need to consider two cases when l— 1 S r and l— l > 7. Let SGI, — 1 be SQ'I(SG}).

0 Case I — 1 S r: In this case, we need to show that we do not have to visit
the nodes adjacent to :1: by following level p to l -— 2 edges for the shortest
path computation within 5fo1 from node a: to the boundary nodes of SGL“.
Since level 1 — 1 edges are incident on node 2:, node a: is in SN(SGI,"1). From
the deﬁnitions of WI", it is true that W1“ contains all the shortest path
connections between the boundary nodes of SGI,’1 within SGL“. Thus, all
shortest paths between node :1: and y within SGI,"l are also captured through
level l— 1 within edges (1:,y, [l — l, W, z]) 6 W1" where y E SN(SGI,‘1), which

proves this case.

0 Case I - l > r: In this case, we need to show that we do not have to visit the
nodes adjacent to :1: by following level In to r — 1 edges for the shortest path
computation within SGI,‘l from node a: to the boundary nodes of S G5,“. Since
level 7’ edges are incident on node 2:, node a: is in SN(SG;) where SC; is a

descendant subgraph of SGI,’1 in ST. Then, it is clear from the deﬁnition of

33
HiTi graphs that all paths generated from node a: by following level p to r — 1
edges must pass through level 7’ boundary node(s) of SGb’. From the deﬁnitions
of W;, it is true that W; contains all the Shortest path connections between
the boundary nodes of SC; within SGI. Thus, all Shortest paths from node :1:
to y within SGZ are also captured through level 1' within edges (1:,y, Ir, W, 2])

6 W; where y E Sit/(SGI), which proves this case.

We take Figure 2.3, 2.4, and 2.5 to exemplify proposition 2.3.1. Assume that
we want to ﬁnd the shortest path from node 5 in SG] to node 25 in SGé. Since
LUBSG(SGI,SG(1,) = 3, l = 3. Thus, we have SN(S§(SG,15)) = N32 = {19,20,25}.
From Figure 2.4 and 2.5, we can see that three edges are incident on node 5. They
are: one level 1 between edge (5, 7, 3), one level 2 within edge (5, 10, 6), and one level 2
between edge (5,20, 7). Then, by the above proposition, in order to ﬁnd the shortest
path from node 5 to any node in {19,20,25}, we do not need to follow the level 1

between edge (5, 7, 3) from node 5.

After obtaining a minimum cost path from S PAH shown in Figure 2.7, a navigator
(e.g. driver) may want to ﬁnd a more ﬁne-grained path connection on some edges (i.e.
high level edges) with a level number greater than 0. This can be accomplished by
specializing a high level edge. High level within edges are specialized by representing
them in terms of lower level edges. For this purpose, we keep actual shortest path

information for each corresponding within edges. A high level between edge cannot

 

34

 

begin
Step 1:
Assume we have a level It Hi Ti graph deﬁned on a level I: + 1 ST;
Find SG} and SG} to which START and DEST belong respectively;
We maintain G(V, E) together with H"(Pk, A") as adjacency lists;
Let 1 be the level number of LUBgc;(SG}, SGI);
f°r(P=k;p>=1;P--)
Mark all boundary nodes in SN(SC(S2(SG]))) with level p — 1;
Step 2:
/\(START) = 0;
FSet = { START}; ESet = 0;
while (FSet 96 0){
Select u from F'Set with minimum /\(u) + f(u, DEST);
/* the function f (u, DEST) estimates the cost of
the shortest path from node u to DEST */
FSet = FSet-{ u}; ESet: ESet U { u };
if (u = DEST) stop;
Let r be the highest level number of the edges incident on node u;
if (u is NOT marked) { /* Start Ascending Phase */
for each level p where min(l — l,r) S p S 1' {
for each edge u —z-> v with level p {
if ((11 ¢ FSet) and (u g ESet)) {
Mo) = Mu) + z;
FSet = FSetU{11};}
else {
if (Mv) > Mu) + 2) Mv) = Mu) + z; 1 } } }
else { /* u is marked */ /* Start Descending Phase */
Let [i be the marked level number on node u;
for each edge u ——z—> v with the levels from B to r {
if ((v Q ESet) and (v Q ESet)) I
X(v) = Mu) + z;
FSet = FSct U I v I;}
else {
if (Mv) > Mu) + 2) Mu) = Mu) + z; } } } 1

end

 

 

Figure 2.7: SPAH: Finding a Minimum Cost Path from START to DEST

35

be specialized in terms of lower level edges. This is because all between edges in

ULIU}; B; are the edges in G(V, E).

2.4 Performance Analysis

The algorithm SPAH can be classiﬁed as a variation of A* algorithm in that
it uses the function f(u,DEST) to estimate the cost of the shortest path from
node u to DEST. In the domain of road maps, the function f(u,DEST) com-
putes the Euclidean distance between the node u and DEST. This is possible
because the coordinates (i.e. longitude and latitude) of all nodes on a road map
are assumed to be available. Assuming (u.:1:,u.y) and (DEST.:r,DEST.y) are

the corresponding coordinates of the nodes u and DEST, f (u, DEST) computes

 

\ﬂDESTJE — u.a:)2 + (DEST.y - u.y)2. Since Euclidean distance is not an over—
estimated cost between node u and DEST, our algorithm ﬁnds the optimal shortest
path. The detailed proof for the optimality of A* algorithm is found in [30]. The
following Figure 2.8 shows A* algorithm which ﬁnds the shortest path from the nodes
START to DEST 011 G(V, E). Note that l(:r,y) represents the cost associated with

each edge (:r,y) E E.

A* shortest path algorithm are Shown to be more efﬁcient than the breadth-
ﬁrst search single pair shortest path algorithm when the database can ﬁt in main
memory [64, 80]. For algorithm SPAH, this is the case since we showed in Theorem

2.3.1 that its explored search space is at most ES which is small enough to ﬁt in main

36

memory.

 

begin
For each node u E V, Mu) = 00;
Let /\(START) = 0, ESet = {START}, and ESet = 0;
while (FSet 75 0) {
Select a node u in FSet for which /\(u) + f(u, DEST) is minimum;
FSet = FSet — {u} and ESet = ESet U {u};
If (u = DEST), then /\(u) is the Shortest path cost and Stop;
else {
For every edge (u,v) in E, if /\(v) > z\(u) + l(u,v) {
Let /\(v) = A(u) + l(u,v);
Let ESet = ESet U {12} if v a (ESet U ESet); } I}
end

 

 

 

Figure 2.8: A* Algorithm: Finding a Minimum Cost Path from START to DEST

For our empirical analysis of Algorithm S PAH , we create two dimensional grid
graphs G(V, E) with 4 adjacent nodes. Two dimensional grid graphs are considered
as typical examples of road maps [64, 91]. 111 grid graph G, IVI and IE I are equal to
800 x 800 nodes and 4 x 800 x 799 directed edges. From G( V, E), we create a level
4 subgraph tree ST where each level 1 subgraph SGI(V,~‘, E3) has IV,-1I = 100 x 100,
IE,-1I = 4 x 100 x 99. Thus, the level 4 subgraph tree ST consists of 64 level 1, 16 level

2, 4 level 3, and 1 level 4 subgraphs, which are shown in Figure 2.9.

We use the above level 4 subgraph tree ST to generate level 3 HiTi graph for the

empirical analysis of SPAH in the following subsections.

 

........................................

 

 

 

EEC] DEED [3.513 [:1 El g;j———.............
33GB in. - 1.13? 1:11;! 1:11;}, , ‘ “"‘ ...,...
EDD? EU: :‘Ej"t':i"§gtj""1jg5-—W~w
531:1-131 in. - :1 1:1 , 1;}? {g _ D»— . .... . 1....

 

 

 

............................
I...------~ p----—--o —-—----- ..-------o
:

-s-------v s-------o;

 

-~-------I ~-------o; -_~-------o \-------l:'

 

Figure 2.9: A 800 x 800 grid graph partitioned according to the level 4 ST

2.4.1 Comparison between SPAH and A* algorithm

To show the search Space savings of SPAH over the traditional A* algorithm (pre-
sented in [91]), we create 5 level 3 HiTi graphs where 10 S INEI S 20 and where
the edge cost is generated based on a uniform distribution [100,120] with 5 different
seeds. We represent IN,‘I as the total number of boundary nodes deﬁned on level 1
subgraph SG}. Next, we create 5 plain grid graphs which are simply unions of PE1
and level 1 subgraphs used in the above 5 level 3 HiTi graphs. For each level 3 HiTi
graph and plain grid graph created above, we compute 20 different Shortest paths
randomly preﬁxed pairs of source and destination nodes. Let H” and A, be the total
number of edges visited by S PAH and A* respectively. Note that throughout this
paper, we multiply the estimation of f (u, DEST) by 100 to normalize the estimation
with respect to the edge cost. The number 100 is used for this normalization because

the Euclidean distance between two adjacent nodes is l and the edge cost between

38

two nodes is at least 100. We compare SPAH and A* by observing the ratio An/Hn.
these values are then averaged over the 5 different level 3 Hi Ti and plain grid graphs
with the same source and destination nodes. They are shown in Figure 2.10 where
the numbers on :1: axis represent 20 ordered < source, destination > pairs with path

length increasing from 1 to 20.

35
30
25
20
15
10

 

Art/H11

 

 

 

l I I l l l l l l

 

0 2 4 6 8101214161820
the number of shortest paths

Figure 2.10: Performance comparison between A* and Algorithm S PAH

Figure 2.10 clearly shows how effectively Algorithm S PAH cuts down the search
space over the traditional A* algorithm. It is interesting to observe that the ratio
An/ Hn increases rapidly as the path lengths from source to destination increase. This
occurs because the search space A* needs to explore grows exponentially whereas that

of S PAH grows very slowly due to the hierarchical structure of H iTi graphs.

2.4.2 Effects of edge cost distribution

In this section, we studied the effects of edge cost distributions on the performance
of SPAH. For this study, we generate 4 x 5 (i.e. 4 uniform distributions with 5

seeds) level 3 HiT i graphs where 10 S INEI S 20. Note that the 4 uniform distribu-

39
tions [100,120], [100,200], [100,300], and [100,500] correspond to 20%, 100%, 200%,
and 400% variations of edge costs respectively. We apply S PAH to each level 3 Hi Ti
graph by randomly creating 50 different < source, destination > pairs and then aver-
aging the cost. Let MA and MD symbolize SPAH when the estimator f(u, DEST)
gives Euclidean distance and zero (i.e. no optimization based on Euclidean distance)
respectively. Figure 2.11 Shows the effect of edge cost distributions on the perfor-
mance of SPAH in terms of ﬂu (i.e. the total number of visited edges) of MA and

MD. Note that the values of H" are averaged over 5 seeds.

 

72000
70000
68000
66000
64000
Hn 62000
60000
58000
56000
54000
52000 1 J 1 l l J l
0 50 100 I50 200 250 300 350 400
Edge cost variation in %

IIIIIIII+I

 

 

 

 

Figure 2.11: Effect of edge cost on the performance of MA and MD

When f (u, DEST) gives Euclidean distance estimation, the performance of
SPAH deteriorates as the variation of edge cost is increased. The primary reason is
that increasing the variation degrades the quality of Euclidean distance estimation to
shortest path. This degradation of Euclidean distance estimation seems to have more
severe impact on the performance of MA for the ﬁrst part of edge cost variation (i.e.
between 20% and 200%) than for the rest (i.e. between 200% and 400%). Unlike

MA, MD gives a very stable performance with varying edge cost distribution.

40

2.4.3 Effects of the number of hierarchical levels

We examined the effects of different levels of Hi Ti graphs on the performance of
S PAH (i.e. MA and MD). We constructed different levels of HiTi graph out of the
same set of level 1 subgraphs. For this analysis, we create 2 (i.e. 4 g |N§| g 8 and
10 S INBI S 20) x 5 (i.e. an edge cost distribution [100,200] with 5 seeds) level 3
HiTi graphs. Similarly, we create 2 x 5 level 1 and level 2 HiTi graphs. Then, we

measure average H,, of MA and MD the same way as we did in section 3.2.2. They

are shown in Table 3.1 and 3.2.

 

level 1 H iTi graph

level 2 H iTi graph

level 3 H iTi graph

 

Hn for MA
H,, for MD

 

 

 

57428
72227

 

57336
71953

 

57063
69392

 

Table 2.3: Effects of levels of HiTi graphs on H" when 10 S IN,-1I S 20

 

level 1 H iTi graph

level 2 H iTi graph

level 3 H iTi graph

 

Hn for MA
Hﬂ for MD

 

 

 

50650
59684

 

50370
59642

 

50669
60114

 

Table 2.4: Effects of levels of HiTi graphs on H, when 4 S IN,‘| S 8

AS we can see from table 2.3 and 2.4, higher level Hi Ti graphs do not necessarily
guarantee the better performance when using S PAH than the lower level Hi Ti graphs.

It depends 011 the the entire search space for S PAH . The entire search space for

S PAH on a level k H iTi graph, represented by T", is formulated as follows:

 

 

 

 

41

nk 27111—1 _ "Is-2
T5 = 2 IE‘|+}:|W“I+—ZIW" 11+;— 21W“ 21+

i=1 i=1

+—Z|W?|+— EEZIW‘I forl=1,k

3i=l

Note that IEII represent the total number of edges of a level 1 subgraph and n}
represent the total number of level I subgraphs where 1 S l S k. If T” < T" where
p > q, then S PAH is likely to perform better on the higher level p HiTi graph than
on the lower level q HiT i graph. Otherwise, the higher level HiTi graph is not likely
to provide a performance advantage over the lower level Hi Ti graph. Table 2.5 shows
the values of T1, T2, and T3 corresponding to level 1,2, and 3 H iTi graphs used in

the tables 2.3 and 2.4.

 

T1 T2 T3
10 g IN,‘| g 20 99000 98737 98487
4 S IN,‘| S 8 82272 83364 83332

 

 

 

 

 

 

 

Table 2.5: Effects of levels of HiTi graphs on T"

Table 2.5 veriﬁes our conjecture on the performance of S PAH for different levels
of H iTi graphs. An interesting thing to note from the tables 2.3 and 2.4 is that the
search space (i.e. H,,) does not vary signiﬁcantly going beyond level 1 HiTi graph.
This is because our experiments were done with the grid graphs havingthe following
property. That is, the difference between T5 and Ti“ does not vary signiﬁcantly as i

increases. We believe that for most road maps this is the case. As a result, creating

42

higher level Hi Ti graphs does not contribute to the reduction in computation time.
From this observation, we can conclude that level i HiTi graph is good enough for

road map applications where the difference between Ti and TH“ is small.

2.5 Updating HiTi Graph

There are two different types of updates for a recursive relation G(V, E) (e.g. topo—
graphical road map). They are edge addition, and edge deletion. An edge addition
(deletion) can connect (resp. disconnect) two existing nodes belonging to either a

level 1 subgraph or two different level 111 subgraph where 1 S m S 11:.

Since updating G( V, E) may require modifying level I: HiTi graph, special update
problems associated with HiTi graph will need to be addressed. In the following

subsections, we will discuss the update problems on level I: HiTi graph.

2.5.1 Edge Deletion

Suppose edge (2:, y, z) is deleted from E. Then, we have two possible cases to consider.
The ﬁrst case case is when (x, y, z) is in level m between edge set where 1 S m S k.

The second case is when edge (x,y,z) is in E} where 1 S i S n].

The ﬁrst case may change boundary nodes (i.e. a: and y) to non-boundary nodes
and it may invalidate existing within edges. Boundary node :1: (y) becomes non-

boundary nodes if there is no other level m between edges incident on node a: (resp.

43

y). Assume that node x and y become non-boundary nodes. We need to delete the
nodes :1: and y from their corresponding boundary node sets. For this purpose, we
ﬁrst need to identify two level 1 subgraphs SG} and SG} where :1: E SN(SG}) and
y e SN(SG}). Then, we delete node :1: from SN( 7;,Sj,({SG}})) and node y from
SN(UI;1S]4({SG]})). Next, we ﬁnd all existing invalid within edges (a,b,fz(a,b)) in
Sw(SA({SG,l,SG]-})). A within edge (a,b,fz(a,b)) becomes invalid in the following

three cases.

case 4.1.1 Nodes a or b is :1: or y.
case 4.1.2 A path from node a to b is disconnected.

case 4.1.3 The value of fz(a,b) is not correct.

For the case 4.1.1 and 4.1.2, we delete (a,b, fz(a,b)) from the corresponding within
edge set. In the case 4.1.3, simply replace the incorrect value of fz(a, b) with a correct

one.

In the second case, the deletion of edge (:r,y,z) from E} may invalidate some
existing within edges in Sw(SA({SGI})). Thus, we need to identify all the existing
invalid within edges. After the identiﬁcation of all invalid within edges, we perform

the same actions as we did in the cases 4.1.2 and 4.1.3.

2.5.2 Edge Addition

Suppose a new edge (9:, y, z) is added to G(V, E). Then, there are two cases we need

to consider. The ﬁrst case is that edge (2:, y, 2) becomes a new level m between edge

44

where 1 S m S k. The second case is that edge (:r,y, z) is added to E}.

The ﬁrst case changes non-boundary node :1: (y) to boundary node if there is no
other level m between edges incident on node a: (resp. y). Assume that node :1: and
y become new boundary nodes. Then, it is necessary to add node x and y to their
corresponding boundary node sets. For this purpose, we need to identify two level
1 subgraph SG} and SG] where :1: E V,‘ and y E VJ-l. Then, we add node a: to
SN( I;S£,({SGI})) and node y to SN( I;,.S],({SG}})). Next, we add edge (:r,y,z)
to level m between edge set SB(S,',I‘({SG,‘}). The addition of this new level 772 between

edge may cause the following cases in Sw(SA({SG,l, SG}})).

case 4.2.1 Need to create new level ml within edges where 1 S m1 S 19.

case 4.2.2 Need to update f,(a,b) of some existing level m1 within edges where
1 S m, S k.

In the second case, the addition of edge (1:, y, z) to E3 may cause the same cases

as the cases 4.2.1 and 4.2.2 for edges in Sw(SA({SG}}).

2.6 Conclusion

In this chapter, we developed a new graph, called a Hi T i graph, to model very large
topographical road maps. Hi Ti graphs provides a powerful formal framework for
structuring topographical road map data in a hierarchical fashion. We have em-
pirically shown that our proposed shortest path algorithm, based on HiTi graphs,

signiﬁcantly reduces the search Space for computing the minimum cost path over a

45

very large topographical road map. Our algorithm is also empirically analyzed by
varying edge cost distributions and the number of levels of HiTi graphs. Finally, we
have investigated the problems of updating a Hi Ti graph in this paper. The applica-
tions of Hi Ti graph structure go beyond the domain of topographical road maps. It
can be applied to any very large recursive relations where hierarchical abstractions of

data are useful.

In addition to SPSP problem discussed in this chapter, there are quite a few other
database research problems associated with the automobile navigation system. They
include data models for storing large amounts of road map data, the determination
of current location in the database, and physical storage structure of road maps. We

give our survey on these problems in Appendix C of this thesis.

Chapter 3

HiTi Graph-based Parallel

Shortest Path Algorithms

In this chapter, we study for both intra and inter query parallel processing for SPSP
problems. Based on H iTi graph structure, we propose two parallel shortest path
algorithms named PASPAH and ISPAH for intra and inter query SPSP problems,
respectively. We empirically analyze the performance of PASPAH and ISPAH al-
gorithms on two-dimensional grid graphs by implementing them on BBN GP1000
shared memory multiprocessor system. The analysis of PAS PAH shows that the
average execution time of PASPAH is not signiﬁcantly better than those of S PAH
and M 0T Opar. From these performance, we conjecture that inter query shortest
path problem has more potential for the parallel processing than intra query shortest

path problem.

46

47

3.1 Introduction

A Single Pair Shortest Path (SPSP) computation is fundamental to topographical
road map queries. In order to reduce the large search space of topographical road
maps, a graph traversal approach is usually used. Although the graph traversal ap-
proach reduces the search Space signiﬁcantly, its search space is still large for most
real world road maps. To cope with this limitation, two new approaches are pro-
posed I5, 59]. Agrawal and Jagadish [5] studied SPSP computation in the context of
problem of performing efﬁcient search over disk-resident massive graphs. Jung and
Pramanik [59] proposed a new graph model, named Hi Ti graph model for this prob-
lem. These two methods narrow down the search space very signiﬁcantly at the cost
of storing the partially precomputed path information. However, their performance
improvement is still limited to a Single processor case. For interactive applications,
further reduction in response time is beneﬁcial. Thus, efﬁcient parallel SPSP algo-

rithms have been developed to improve the performance further.

Little research has been done on parallel SPSP algorithms. Mohr and Pasche [72]
presented a new parallel shortest path algorithm named OTOpar which is a parallel
implementation of 0T0 algorithm. OTOpar uses two processors where each processor
uses A* algorithm with Manhattan distance estimation to build its corresponding tree,
one rooted at the source node and the other rooted at the destination node. Their
empirical analysis shows that the average execution time of OTOparis roughly half of
0T0. However, OTOpar algorithm is not scalable at all. That is, their performance

improvement is limited by only two processors. Thus, we need more scalable parallel

48
SPSP algorithms than OTOpar.

In this chapter, we study the parallel processing for intra as well as inter query
shortest path problems. Intra query SPSP problem deals with parallelizing a single
transaction of SPSP problem. The parallel SPSP algorithm proposed by Mohr and
Pasche [72] belongs to this category. Inter query SPSP problem deals with paral-
lelizing multiple single-pair shortest-path computations. Inter query SPSP problem
arises in the domain of automobile navigation systems where many vehicles send their
shortest route computation requests to a. central server. Then the server must be able

to handle these multiple SPSP computation requests satisfying a real time constraint.

We have developed two parallel SPSP algorithms. One is for intra query SPSP
problem and the other for inter query SPSP problem. Both algorithms are based on
HiT i graph introduced in Chapter 2. Hi Ti graph structure provides the opportunity
for ﬁne-grained parallel processing for intra query SPSP problem. This is because
multiple parallel Shortest-path computations can be initiated based on the nodes at
a higher level of abstraction of a topographical road map. Note that H iTi graph
structure can capture a hierarchical abstraction of the topographical road map and

it is easy to identify the appropriate nodes at the higher level.

Even though we have done extensive analyses on parallel processing of H iTi graph,
we also have explored the possibilities of using H iTi graph in a distributed environ-
ment. 111 a distributed environment, H iTi graph provides an efﬁcient structure for
computing a distributed transitive closure. That is, it allows a dynamic decomposi-

tion of a transitive closure computation. This dynamic decomposition in turn allows

49

us to minimize the communication costs between the sites involved for the transitive
closure computation. Note that reducing the communication costs is a major opti-
mization goal for distributed transitive closure computations. We give the detailed
description of our proposed distributed transitive closure algorithm in the appendix

A of this thesis.

The rest of the chapter is organized as follows. In section 3.2, we give the detailed
description of intra query parallel Shortest path algorithm and empirically compare
its performance with those of the previous work. Section 3.3 discusses two inter
query parallel shortest path algorithms. We give empirical performance analysis of
these two inter query parallel algorithms in this section. Finally, section 3.4 gives the

concluding remarks.

3.2 Intra Query Parallel Shortest Path Computa-
tion

The main performance overhead of the shortest path computation comes from the size
of the search Space it needs to explore. In this section, we describe a PArallel Shortest
PAth algorithm based on H iTi graph (PASPAH). To Show the basic idea behind
PASPAH, we ﬁrst graphically show the explored search space of the algorithms
A* and OTOpar in Figure 3.1. Note that in the ﬁgure, the rectangle and eclipse
represent the entire and the explored search Space respectively. Arrows indicate the

direction of exploring the search space. It is intuitively obvious that A* algorithm

50

 

e) henﬂaedeeuchspeetorA‘nlgonuen b) beWmhw-Iwmtmm

Figure 3.1: Relative size of explored search Space for A* and OTOpar

explores the search space in an elliptical shape, which is shown in Figure 3.1 a. Since
OTOpar explores the search Space by applying A* algorithm from both the source
(i.e. S) and destination (i.e. T) nodes, the Size of its explored search Space becomes
smaller than that of A*. This is clearly illustrated in Figure 3.1 b. Based on this
observation, we can infer that, if a middle node M lying on the shortest path is
known, we can further narrow down the search space by applying A* algorithm from
the middle node to both source and destination, simultaneously. Our PAS PAH is

based on this idea which is illustrated in Figure 3.2.

 

 

 

 

 

 

 

Figure 3.2: Explored search space size of PASPAH

51
3.2.1 Use of HiTi Graph for Identifying Middle Nodes

The problem that remains is how to identify the middle node without computing the
shortest path. This problem can be partially resolved by taking advantage of HiTi

graph structure, which we explain through Figure 3.3.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 3.3: Level 1 HiTi graph created from level 1 subgraphs

In Figure 3.3, the rectangles symbolize the level 1 subgraphs SGl to SG9 and
their boundary nodes are represented by B1 through B24. Level 1 HiTi graph de-
ﬁned on these level 1 subgraphs is then easily represented by all the boundary nodes
interconnected by the dotted arrows. Assume that we want to compute the shortest
path from node S in SG7 to node T in 302. It is clear from the ﬁgure that the
shortest path from the nodes S to T must pass through at least one boundary node
of S G; and one boundary node of S G2. From this observation, we can approximate

the two possible sets of middle node candidates for the shortest path. They are

52

M} = {818, B21} and M2 = {B2, B3, B6} where set M] is chosen in this example.

Then, the shortest path cost from the nodes S to T, represented by
PSP_COST(S, T), can be computed by exploring the search space from each
node in M1 to S and T simultaneously. Let S P_COST(X , Y) represent the
shortest path cost from the nodes X and Y. To obtain PSP_COST(S, T), we
ﬁrst compute SP_GOST(B18,S), SP_C’OST(Bl8,T), SP-GOST(B21,S), and
SP..COST(B21, T) in parallel. Note that each processor use A* algorithm to compute
SP_COST(X, S) and SP-GOST(X, D) where X 6 M1. PSP-COST(S, T) is then
obtained by selecting the minimum of { SP_COST(BI8,S) + SP_COST(BI8,T),
SP_COST(B21,S) + SP_COST(B21,T) }. Note that we choose M1 over M2 since
the size of set M1 is smaller than that of M2. This way, we can minimize the number

of processors involved for the computation.

As it might be noticed from the above example, there is a possible performance
bottleneck problem in this approach. We re—take Figure 3.3 to explain the bot-
tleneck in the following example. Assume the shortest path computation from
the nodes A in SG4 to D in SG6. Since the number of the boundary nodes of
SG4 and SG6 are equal, we will randomly choose one of them. In this example,
set {38,811,315} is selected. Then, in order to compute SP-COST(A, D), our
method will compute SP_COST(BS,A), SP-COST(B8, D), SP-COST(BII,A),
SP_COST(B11,D), SP_COST(BI5,A), and SP_COST(BI5,D) in parallel. As
we can see from Figure 3.3, since node A is located a lot closer to node Bll than the

nodes B8 and 815, the actual shortest path from A to D is likely to pass through

53
the boundary node B11. In such case, SP-GOST(B11,A) and SP-GOST(B]1,D)

computations will be done a lot faster than the others.

However, we have to wait until the rest of computations end, although we al-
ready got SP-COST(Bll,A) and SP_COST(Bll,D). This waiting time is re-
quired to guarantee the optimality of SP-COST(BII,A) + SP-COST(B11,D)
among the rest of results, which becomes the performance bottleneck of PAS PAH .
To cope with the performance bottleneck, it is necessary to minimize the waiting
time. The minimization of the waiting time can be realized if we have a cor-
rect and efﬁcient stopping condition. The stopping condition allows us to decide
the optimality of SP_COST(Bll,A) + SP-GOST(BII,D) before completing all
the computation of SP_COST(B8, A), SP-C'OST(BS, D), SP-COST(815, A), and

SP-COS'T(815, D).

3.2.2 Efﬁcient Stopping Criteria

Before we explain stopping condition, we ﬁrst examine how S P-COST(B8, A) in the
above example is computed in A* algorithm. In A*, the search space is explored by
permanently labeling a node :1: the shortest path cost from B8 to :1: until the next
selected node :1: becomes the node A. The next unlabeled node a: is selected from the
search space such that SP_COST(B8,:1:) + EuDist(a:,A) is minimum. Note that
EuDist(:c,A) is the Euclidean distance estimation from the nodes :7: to A. Based
on this observation, we now formally describe the stopping condition in the following

proposition.

54

Proposition 3.2.1 Given a directed graph G(V, E), assume that the shortest path
from the nodes S to T must pass through at least one node from set B = { B}, 82,
..., B" }. Let set C Q B where SP-GOST(X, S) and SP_COST(X, T) are already
computed for each node X E (7 at time to. For each X E B — C, let N(X, S, to) and
N(X, T, to) be the next node at time to selected by A* algorithm for the computation
of SP-COST(X, S) and SP_GOST(X, T) respectively. Then, PSP-COST(S, T) is
costo = mianc { SP_COST(X, S) + SP_C’OST(X, T) } if costo S minyeB-c {
SP-COST(Y,N(Y, S, 150)) + EuDist(N(Y, S, to),S) + SP-COST(Y,N(Y,T,to)) +

EuDist(N(Y, T, to),T) }.

Proof: Assume that costo is not PS P-COST(S,T) although costo S miny53_c
{ SP-COST(Y,N(Y,S,to)) + EuDist(N(Y,S,to),S) + SP_COST(Y,N(Y,T,to))
+ EuDist(N(Y, T, to),T) }. Then, at some time t, > to, there must be
some node Z G B — C such that SP-COST(Z,S) + SP_COST(Z,T) <
costo. However, SP_C()ST(Z, S) + SP-COST(Z, T) cannot be smaller than
SP_COST(Z, N(Z, S, to)) + EuDist(N(Z, S, to),S) + SP_COST(Z, N(Z, T, to)) +
Eu Dist(N(Z, T, to), T). This is because
SP_COST(Z,N(Z,S,t)) + EuDist(N(Z,S,t),S) and SP-COST(Z,N(Z,T,t)) +
EuDist(N (Z, T,t),T) are always monotonically increasing in A* algorithm where
to < t g t]. As a result, we have costo g miny€3_o { SP-COST(Y,N(Y,S,to))
+ EuDist(N(Y, S, to),S) + SP-COST(Y,N(Y, T, to)) + EuDist(N(Y, T, to),T) } <

SP_COST(Z, S) + SP_COST(Z,T) < costo, which is a contradiction. Cl

55

To exemplify how proposition 3.2.1 works, we show the stopping con-
dition for the computation of SP-COST(A, D) in Figure 3.3. Assume
that SP-COST(BII,A) and SP_COST(B11,D) are obtained at time to.
Then, PSP_GOST(A,D) is costo = SP_COST(BII,A) + SP-GOST(BII,D)
if costo S min { SP-COST(B8,N(B8,A,to)) + EuDist(N(B8,A,to),A),

SP_COST(815, N(Bl5, D, 70)) + EuDist(N(Bl5, D, to), D) }.

3.2.3 Formal Description of PASPAH

In the previous section, we described the basic idea as well as a theoretical foundation
of PASPAH when level 1 HiTi graph is given. We now formally present PASPAH
in Figure 3.4 where we use the same notations as those deﬁned in the previous chapter
2.3. Our PASPAH is developed for Shared memory multiprocessor systems with the

following architecture assumptions:

0 All processors have their own local memory.

0 All Processors communicate with each others through globally shared memory.

0 All processors share disk.

Based on the preceding assumptions, PAS PAH accesses a level k + 1 subgraph tree,
level Is: HiTi graph H"(P",Ak), and level 1 subgraphs U?=‘1SG2(V,-1,E,-l), which are

stored in the globally Shared memory.

56

 

 

begin
Step 1: .
for(j=k+l;j>=l;j——) .
Mark all boundary nodes in SN(SC(S,’,(SG1, SGI))) with level j — 1;
Let l be the level number of LUBSG(SG;, SGI);
f0f(J'=l-1;j>=1;j--) _
Mark all boundary nodes in SN(SC(SQ(SG:, SG}))) with level j — 1;
Step 2: '
If(|3~(Sit-1(SGI))I .<. ISN(S.I4-1(SGi))l) 501;] = 32"(501);
else SGLII = Sfl(SG});
Step 3:
for each boundary node a: in SN(SGI,I1)
Enqueue (9:,S) and (2:,T) to BQ;
Step 4:
while (IBQI > 0) {
Dequeue (x,y) from BQ;
A next available processor performs S P(:c, y);
/* 'SP(:r, y) computes the shortest path from :1: to y */
/* It checks stopping criteria given in proposition 3.2.1
when the computation is done */}
end

 

Figure 3.4: PASPAH: Find a shortest path from S in SG; to T in SC]

 

57

PASPAH consists of four steps. At step 1, the boundary nodes of HiTi graph is
marked with edge level numbers. This marking step is necessary to avoid traversing
the unnecessary low level edges incident on the marked nodes. When a marked node
is visited, the corresponding marked level (i.e. lowJevel) Speciﬁes the lowest level
of the edges to be traversed from the marked node. Since the highest level (i.e.
hithevel) of the edges incident on the marked node can be easily identiﬁable, the
nodes expanded directly from the marked node are only those reachable through
the edges with the levels ranging from lowJevel to hithevel. This marking step
guarantees that the explored the search space of PASPAH is at most E,1 U E',1 U

53(5c(U"+'Sii(SGLSGI))) U Sw(SC(Ui:.’Si(SGl.303))-

i=1

At step 2, we determine the level I subgraph SGI;1 whose boundary nodes will
be used as the middle nodes as explained in the previous section 3.2.1. The step 3
is self explanatory. In step 4, the shortest path computation S P(:r,y) for each pairs
of nodes (2:,y) in BQ is performed in parallel by the available processors. S P(:1:,y)
computes the shortest path from the nodes :1: to y by using A* algorithm. Note that
At: algorithm in S P(:z:, y) explores the neighbor nodes of the currently expanding node
2 by following only those edges whose level number is not less than the marked level
number of the node 2. Whenever each processor ﬁnish its computation, it exclusively
checks the stopping condition described in the previous section 3.2.2. If the condition
is satisﬁed, then it aborts all the computations being performed by other processors.

Otherwise, it simply stops its computation.

58

3.2.4 Performance Evaluation

We empirically compare PASPAH with 0T0, OTOpar [72] and SPAH [59] algo-
rithms. S PAH is an improved A* algorithm which takes advantage of HiTi graph
structure. For fair comparisons, we modiﬁed 0T0 and OTOpar so that they also uti-
lize HiTi graph structure. In other words, in the modiﬁed 0T0 and OTOpar named
M 0T0 and M OTOpar respectively, we use S PAH algorithm for building two trees
from both source and destination nodes. Note that we use Euclidean distance for the
lower bound estimation in S PAH rather than Manhattan distance of OTOpar. This
is because Manhattan distance estimation does not guarantee the optimal shortest

path generation.

We implemented SPAH, MOTO, MOTOpar and PASPAH on a BBN GP1000
Shared memory multiprocessor system, which has a nonuniform memory architec-
ture. The BBN GP1000 multiprocessor currently consists of 85 nodes, each one with
4MBytes of local memory, linked together by a high speed butterﬂy switch. In this
system, the globally shared memory is the sum of the memories local to all processors.
Thus, the size of available main memory increases with increasing number of nodes

in the system. The BBN GP1000 system can have up to 250 processing nodes.

For our empirical analysis, we create two dimensional grid graphs G(V, E) with 4
adjacent nodes as we did in chapter 2.4. In grid graph G, IVI and IEI are equal to
400 x 400 nodes and 4 X 400 X 399 directed edges. From G(V, E), we create a level

4 subgraph tree ST where each level 1 subgraph SG}(V,-1,E}) has IV,‘I = 50 x 50,

59
IEII = 4 x 50 x 49. The level 2 subgraph tree ST consists of 64 level 1, 16 level 2, 4

level 3, and 1 level 4 subgraphs.

Based on the above level 4 subgraph tree, we create level 1 HiTi graph which
will be used throughout this section. To Show performance comparison between
PASPAH, MOTOpar, MOTO, and SPAH, we create 5 level 3 HiTi graphs where
3 S INEI S 8 and where the edge cost is generated based on a uniform distribution
[100,120] with 5 diﬁerent seeds. Note that INRI is the total number of boundary
nodes deﬁned 011 level 1 subgraph S G}. For each level 1 Hi Ti graph, we compute two
different sets SET I and SET II of 20 shortest paths randomly preﬁxed pairs of source
and destination nodes. SET I consists of the pairs whose source node location within
the corresponding level 1 subgraph is not far apart from the destination nodes. A

simpliﬁed example of these two sets is shown in Figure 3.5.

MbHIuw‘huet-M“i

9 . :J sg/V
\_' . “ :rj/1/1
0 ‘ . ..r/Tw
\. .444

(/

 

' v
\I
\\\3

. x I -.u,- x //‘3'

1V?“ 3544/3/4, /5,/. .9
&\-.:‘.\§~o Mfy/ﬂa xx,-
‘sEth-‘ek “Wu/ .(ouéu’

I

 

 

Figure 3.5: An example of sets I and II

We then compare the above 4 algorithms by observing their average execution
times of two sets SET 1 and SET 11 over 5 different level 1 HiTi graphs. The per-

formance of the four algorithms are given in Figure 3.6 a and b. As it is shown in

60

Figure 3.6, PASPAH requires 42 processors to achieve the maximum parallelism in
this empirical analysis. Figure 3.6 a Shows that PASPAH outperforms the rest of
the algorithms. However, Figure 3.6 b Shows that PAS PAH does not perform sig-
niﬁcantly better than S PAH and M OTOparl. This is because, when source nodes
comes from SET 11, the explored search space of PAS PAH becomes almost same
as those of SPAH and MOTOpm'. Thus, the average execution time of PASPAH
is not signiﬁcantly better than S PAH and M OTOpar. From this observation, we

conjecture that the parallel processing for intra query SPSP problem is not promising.

We have observed that S PAH performs better than M 0T0 and M OTOpar,
which is a different result from the one given in [72]. Our empirical observation
can be explained by the following two reasons. The ﬁrst reason is that HiTi graph
structure signiﬁcantly reduces the advantage of using the two tree expansion approach
given in [72]. In other words, the most of the nodes in the two level 1 subgraphs (i.e.
where source and destination nodes are in) are already explored when two trees start
to include the nodes in Hi Ti graph. This conjecture is veriﬁed in the appendix B of
this thesis where the two tree expansion approach performs far better than than that
of one tree expansion approach when H iTi graph is not used. The second reason is
that the lower bound estimation of M OTOpar is Euclidean distance which provides a
lot tighter bound than Manhattan distance of OTOpar in [72]. As a result, compared
with OTOpar, M OTOpar takes much longer time to stop building the two trees

before it ﬁnds the shortest path.

 

1In Figure 3.6 b, the minimum execution time of PAS PAH , SPAH , M OTOpar, and M 0T0
are 8.36, 8.46, 9.92, and 15.56 seconds respectively.

61

 

 

 

 

 

 

60 1 1 1 1 1 1 1 1
5O . paspah G— ..
spah ~+ -
40 moto B— —
motopar ->(- -
Seconds 30
20 .
lO
0 3 .9.
0 51015202530354045
the number of processors
85)
250‘11111111
paspah 9—
200 spah .+. . -
moto B—

I

150

motopar ->(- - ‘

Seconds

100

50

 

 

 

 

0
0 51015202530354045
the number of processors

b)

Figure 3.6: Performance of PASPAH, SPAH, MOTOpar, and MOTO over: a)
SET I b) SET 11

62
3.3 Inter Query Parallel Shortest Paths Computa-

tion

In this section, we study the parallel processing for computing inter query Shortest
path based 011 Hi Ti graph. Efﬁcient and fast computation of multiple SPSP shortest
paths are very critical for those automobile navigation systems where all route com-
putations are performed 011 a Single server. For this purpose, we propose a new inter

query parallel shortest path algorithm in the following subsection.

3.3.1 Formal Description of ISPAH Algorithm

As it was already shown in section 3.2.4, S PAH outperforms M 0T0 and M OTOpar.
Due to the number of processors P AS PAH requires, PAS PAH is not appropriate for
inter query parallel shortest paths computation. We therefore use S PAH as the unit
operation for parallelizing inter query Shortest paths computation. In other words,
S PAH is performed in parallel for each Shortest path computation request. The
detail description of the inter query parallel shortest path algorithm, named Inter

SPAH (ISPAH), is shown in Figure 3.7.

Algorithm ISPAH executes SPAH in parallel on a shared memory multiprocessor
system. For SPAH running on each processor, it accesses both local memory and
globally shared memory for the computation. Algorithm SPAH accesses the glob-
ally shared memory only when it needs to accesses G(V, E) or level ls: HiTi graph

Hk(Pk, Ah). Other than that, SPAH accesses the local memory. Note that there is

63

 

begin
Adjacency lists of G(V, E) and level It HiTi graph H"(P", Ak)
are stored in the globally shared memory;
Let R contain a set of source and destination nodes pairs (x,y);
for each (x,y) E R do in parallel {
Find SG; and SG; to which :1: and y belong respectively;
Let I be the level number of LUBS(_;(SG;, SGI);
f0r(P=l-1;P>=1;P--)
Locally mark all boundary nodes in SN(SC(SZ(SG]-))) with level p — 1;
Perform Step 2 of SPAH for (x,y); } '
/* SPAH is given in Figure 2.7 in section 2.3.1 */
end

 

 

 

Figure 3.7: ISPAH algorithm

no memory access contention for reading or writing a local memory. In the following

section, we empirically analyze the performance of I SPAH .

3.3.2 Performance Evaluation

We implemented I S PAH on a BBN GP1000 shared memory multiprocessor system.
For our empirical analysis, we used the same grid graph as described in section 3.2.4.
Based on these data sets, we computed 43 randomly preﬁxed shortest paths. The
performance is then measured in terms of the average speedup Tn, where Tﬂ represents
the total time taken by n processors to compute M shortest paths. We express the
speedup S,, of 12 processors as T}/ T,,. Figure 3.8 shows Sn as n is increased from 1 to

43.

As we can see from Figure 3.8, the Speedup of I S PAH increases almost linearly

up to 10 processors, after 10 it is beginning to level off, and after 25 processors,

64

 

4'5 I I I I f I I I

40 ISPAH e- ‘
35

30
Speed 25
U 1) 20

I’I I
I

I
I

 

 

 

 

0 51015202530354045
the number of processors

Figure 3.8: Performance of ISPAH

the performance is ever deteriorating. This occurs because of the globally shared
memory access contentions. In I S PAH , all processors have to access a single H iTi
graph in a globally Shared memory, which causes a severe memory access contention
when more than 25 processors are used. In order to verify our conjecture of this
memory access contention, we modiﬁed ISPAH so that an entire level I: H iTi graph
H "(Pk , A") is replicated on the local memory of each processor. This revised I S PAH
is named a Modiﬁed ISPAH (MISPAH). We empirically analyzed M I SPAH the
same way as we did for ISPAH. Figure 3.9 shows the speedup of MISPAH up
to 43 processors showing the advantage of replicating a level I: H iTi graph on each
processor. Although M l S PAH performs better than I S PAH , it is scalable up to
41 processors. From this analysis, we conjecture that the parallel processing for inter

query SPSP problems are much more promising than intra query SPSP problems.

 

451lTlTllT

40 t MISPAH e— "
35 ~ —
30 - -
Speed 25 L-
Up 20 ~
15 - -
10 —
5
0 v 1 1 1 l 4 #1 1

0 51015202530354045

the number of processors

T I

 

 

 

Figure 3.9: Performance of MISPAH

3.4 Conclusion

In this chapter, we studied the parallel processing for intra as well as inter query
shortest path problems on topographical road maps. Based on Hi T i graph structure,
we ﬁrst developed a parallel shortest path algorithm named PAS PAH for intra query
shortest path problem. Hi Ti graph structure provides an opportunity for developing
more scalable parallel shortest path algorithms than two tree building approach used
in M OTOpar algorithm. In M OTOpar algorithm, only two processors are used for
building two trees, one from source and the other from destination. As a result,
M OTOpar algorithm is not scalable at all. We empirically analyzed the performance
of PAS PAH by comparing its execution time on grid graphs with those of M OTOpar
and SPAH. We used a BBN GP1000 Shared memory multiprocessor system to
implement PASPAH, MOTOpar, and SPAH. The BBN GP1000 has a nonuniform
memory architecture. Note that SPAH presented in chapter 2 is the H iTi graph-

based sequential shortest path algorithm. Our empirical analysis shows that, although

66
PAS PAH is more scalable than M OTOpar, the average execution time of PASPAH

is not signiﬁcantly better than those of M OTOpar and S PAH . From this analysis,
we conjecture that the parallel processing for intra query shortest path problem is

not very promising.

For inter query Shortest path problem, we proposed a H iTi graph-based parallel
algorithm named I SPAH . We empirically analyzed the performance of I S PAH on
the BBN GP1000 shared memory multiprocessor system by measuring its speedup
on grid graphs as processors are increased. Our analysis shows that the speedup
of I S PAH increases almost linearly for up to 10 processors. However, I S PAH is
scalable up to 25 processors. This performance degradation occurs due to the severe
memory access conﬂicts of processors. We then presented an improved version of
I S PAH named M I S PAH which reduces the memory access conﬂicts through partial
data replication. The peformance analysis of M I S PAH shows that it is scalable up
to 41 processors. From this empirical analysis, we conjecture that inter query Shortest
path problem has more potential for the parallel processing than intra query shortest

path problem.

Chapter 4

Description and Location of
Distributed Fragments of Large

Recursive Relations

In a distributed environment, it is advantageous to fragment a relation and store the
fragments at various sites. 111 this chapter, based on the concept of lattice structures,
we develop a framework to study the fragmentation problems of distributed recursive
relations. Two of the fragmentation problems are how to describe and locate frag—
ments. Description and location methods previously suggested are more suitable in
parallel environments than in distributed databases. In this chapter, we propose a
method to describe and locate fragments based on lattice structures. Finding lattice
descriptions of fragments is shown to be an N P-complete problem. We analyze the

performance of lattice approach both theoretically and experimentally. This is done

67

68

by creating a database of recursive relations. The empirical analysis shows that our

proposed algorithms give near-optimal solutions.

69
4.1 Introduction

In distributed database systems, it is often convenient and beneﬁcial to fragment and
distribute data according to referencing frequency or locality. By having only fre-
quently accessed data stored in each local site, we can reduce the data maintenance
cost. For example, in a distributed database for designing automobiles or aircraft, it
is desirable to store the relevant and frequently-accessed parts together. This data
can be maintained in a Single recursive relation such as a parts/subparts relation.
There are three major problems in fragmenting recursive relations in distributed en-
vironment. The ﬁrst is how to fragment a recursive relation. The second and third is
how to describe and locate the fragments of the recursive relation. The focus of our

research is to investigate the second and third problems.

Fragmentation in traditional (non recursive) databases is accomplished through
the use of logical predicates, which the tuples of the fragments must satisfy. These
logical predicates can easily capture the referencing locality of non-recursive data.
They are formed by conjunction, disjunction, and negation statements deﬁned on the
attribute values of the tuples. The fragmentation criterion, or Simply criterion, for a
fragment F is a predicate which, when applied to a relation R, will determine which
tuples of R belong to F. This fragmentation of non-recursive relations is investigated

in [16].

Fragmenting recursive relations in distributed databases has not been extensively

studied. Most papers 011 parallel and distributed computation of transitive closures

70

have applied the traditional fragmentation criteria to recursive relations [32, 45, 47,
74, 84, 102]. However, this criteria, deﬁned by a predicate or a hash function, does not
properly capture the referencing locality of data in recursive relations. Locality often
comes from the transitive relationship between data. Stated another way, relevant
data, frequently accessed by recursive queries, are likely to be related in transitive
closures. Although the fragmentation based on the transitive relationship provides
less efﬁcient non-recursive query processing strategies for recursive relations than the
traditional predicate-based fragmentation, it gives a very powerful basis for an efﬁcient
fragment description and location technique suitable for recursive query processing.
An efﬁcient processing of recursive queries is important for large recursive relations

in distributed databases.

Based on the criteria deﬁned by transitive relationships, Houtsma, et al., studied
the parallel computation of transitive closures in [42, 43] and designed strategies
for fragmentation in [44]. However, Houtsma’s paper [44] did not discuss how to
describe and locate fragments in a distributed environment. In a distributed database,
we need an efﬁcient fragment description and location method because we can avoid
a signiﬁcant amount of communication cost for query processing by keeping remote
fragment descriptions in local sites. In this paper, we focus only on acyclic recursive
relations. Unless otherwise stated, recursive relations and acyclic recursive relations

will be used interchangeably.

The rest of the chapter is organized as follows. Section 4.2 describes a fragment

description method based on lattice structures. In Section 4.3, a fragment location

71

method is described. In section 4.4, we suggest methods to incrementally update
description of fragments when a recursive relation is modiﬁed. The performance
analysis of our update algorithm is also discussed in this section. Finally concluding

remarks are given in Section 4.5.

4.2 Fragment Description by Lattice Structures

In this section we use the properties of partial order sets (posets) to deﬁne and locate
fragments of acyclic recursive relations. The acyclic recursive relations, considered in
this paper, have the generic form R (attributel, attributeg, attributeg, . . . , attributem)

where the attributes of R satisfy the following:

I. attribute} and attribute; are the key attributes of R and share the same domain

2. attribute} and attribute; are the recursive join attributes and their correspond-

ing values are related by some transitive relationship

3. attributeg through attribute”, describe the relationship between attribute} and

attributeg

The above relation R is associated with relation R’ which maintains detail descriptions
0f attribute} of R. Thus, relation R’ has the generic form of R’ (attributel', attributeg’,

attribute3', ...,attribute,,’) where the attributes of R’ satisfy the following:

I. attribute}I of R’ is the same as either attribute] or attributeg of R

72

2. attribute,’ is the key attribute of R’.

3. attribute; through attribute,,' describe attribute},

Examples of relation R and R’ are Shown in Table 4.1 and Table 4.2 where R and R’

correspond to part / subpart and part relations respectively.

 

 

 

 

part subpart quantity
engine cylinder head 1
engine fan belt 1
cylinder head piston 4

 

 

 

 

 

Table 4.]: A11 example of relation R

 

 

 

 

part color weight dimension cost
engine black 800 3 by 2 by 2 1000
cylinder head gray 100 1 by 2 by 2 250
fan belt black 30 1 by 5 by 1 250
piston Silver 70 1 by l by 1 100

 

 

 

 

 

 

 

Table 4.2: An example of relation R’

The above relations part/subpart and part together can be viewed as a directed
acyclic graph (DAG) GR(VR, ER). Each node in V3 represent a tuple of the part
relation and symbolized by the key attribute of the part relation. Each edge in ER
represent a tuple of the part / subpart relation and symbolized by the key attributes

of the part/subpart relation. The values of attributes part and subpart are related
by an inclusion (i.e. transitive) relationship. For example, engine directly includes
cylinder head and fan belt while piston is indirectly included in engine. If we consider

this inclusion relationship as a partial order, then DAG GR(VR, ER) is a poset. A

73

partial order, :1: j y, is represented as a path in this DAG if there is a path of length

0 or more from node :1: to y.

In our fragmentation method, each site keeps ER while VB is fragmented exclu-
sively and distributed over various sites. Thus, a fragment is a subset of nodes in V3.
Since each site stores ER, our fragmentation technique is more effective when the ratio
of [ER] to Will is small. When IERI is much larger than IVRI, the beneﬁts of fragment-
ing ER are likely to exceed the beneﬁts of just fragmenting VR. However, fragmenting
ER causes one problem in our fragment description and location method presented
in the following sections. The problem is, if E; is fragmented and distributed, then
we cannot locally determine the location of remote fragments necessary for recursive

query processing.

Deﬁnition 4.2.1 Let GR(VR, ER) represent an acyclic recursive relation. Then, an
F—subgraph for a fragment F is a subgraph Gp(Vp,Ep) of the graph G3 induced by

the nodes offragment F.

For example, Figure 4.1 shows a digraph GR(VR, Eg) for a relation R. Suppose G3
has three fragments named F] = { 1, 2, 5, 8, 9 }, F2 = { 3 ,5, 7, 10, 11, 12}, and F3

={4,13}.

Figure 4.2 shows the three F-subgraphs G1, G2, and G3 induced by the fragments
F1, F2 and F3, respectively. The F—subgraph is not likely to be a set of randomly
unrelated nodes. More likely, it will be a connected component or a collection of
several connected components. This is because, as mentioned before, the referencing

locality of nodes in V3 often comes from the transitive relationships among the nodes.

 

Figure 4.1: A digraph G for an acyclic relation

An F -subgraph is also a poset, since each subset of a poset is itself a poset. Therefore,

the properties of posets can be used to describe and locate fragments.

 

MW 03

Figure 4.2: Three F -subgraphs

The descriptions of fragments are maintained in a fragment table at each site. A
fragment table is a binary relation of the form (f, d) where f is a fragment identi-
ﬁer and d is the corresponding description. A naive description (I of a fragment is
deﬁned as the set of key attribute values for nodes contained in the fragment. This

naive description is quite ﬂexible in the sense that any fragment can be described by
grouping the key attribute values of its elements. It is also easy to maintain when

updating fragments. However, this naive description suffers from the following serious

drawbacks:

75

1. Fragment description size is large

2. Fragment description size grows linearly with respect to fragment size

3. Communication of fragments’ descriptions with other sites is costly due to their

large size

4. A large search space is needed to ﬁnd the fragments containing certain nodes.

The above drawbacks can be very costly, especially when the original relation is very
large. A more efﬁcient approach is required for practical purposes. By taking advan-
tage of ER at each site, we can construct a concise but ﬂexible fragment description

method. This method is illustrated in the following subsections.

4.2.1 Maximal and Minimal Nodes Approach

We can represent a fragment by a set U of maximal nodes and a set of V minimal

nodes of the F-subgraph GF(VF, Ep). Let F be a fragment. Then

F: {.2le VRwhere ajzij aGU A bEV}

Thus, a node is in the fragment if it has an ancestor node in U and a descendant node
in V. The fragment F is then represented by the notation < U, V >. This approach,
however, is not complete because it cannot describe certain types of fragments. For

example, consider the DAG GR(VR, ER) in Figure 4.3.

76

@0

00

I
(9 0

Figure 4.3: An Example of DAG GR(VR, E3)

The fragment F = { 1, 2, 3, 7, 8, 9 } cannot be represented by this approach. This is

because the fragment represented by < {1, 7}, {3, 9} > contains the additional node
5. In order for this maximal and minimal nodes approach to be applicable, we need
the restriction which should be imposed on fragments. That is, for each fragment F
represented by < U, V >, the intersection between U U { all descendants of nodes
in U } and V U { all ancestors of nodes in V } should be equal to F. In the next

subsection, we give a representation method based on the lattice approach which does

not require the above restriction.

4.2.2 Lattice Approach

11) this approach, a fragment is described by a set of lattices deﬁned on the corre
Sponding F-subgraph Gp(Vp, Ep). A lattice is represented by L = < 2:1,,y1, > where
ml, is the least upper bound (LUB) and y}, is the greatest lower bound (GLB). Next,

We present a list of deﬁnitions which will be used throughout the rest of the paper.

Deﬁnition 4.2.2 Let GF(VF, Ep) represent an F—subgraph. Then, any pair of nodes

(any) in Vp is a candidate lattice < :r,y > for GI“ ifs: :5 y.

77

Deﬁnition 4.2.3 A candidate lattice < 1:,y > is called a simple lattice ifa: 74 y and

there is no 2 such that a: j z j y, z 76 x and z 75 y. A candidate lattice of the form

< 3,:1: > is called a trivial lattice.

Deﬁnition 4.2.4 Let CR(VR, ER) represent an acyclic recursive relation and
CF(VF,EF) be an F—subgraph of CR. A candidate lattice < 1:,y > for CF is suit-
able if the intersection between { a: } U { all descendants ofz } and{ y } U { all

ancestors ofy } forms a nonempty subset of VF.

Deﬁnition 4.2.5 Let L = < :c,y > be a suitable candidate lattice for Cp(Vp, Ep).

Then, lattice L covers node 2 E Vp ifs: j z j y.

Deﬁnition 4.2.6 Let CL be a set of suitable candidate lattices defined on the F-
subgraph Cp(Vp, Ep). Let LC Q CL. Then, we say LC is a lattice cover for CF if

every node of Vp is covered by a member of LC.

We will represent a fragment by a lattice cover. Unlike the maximal and minimal
nodes approach, the lattice approach is complete in that it can describe any fragment
by a set of suitable candidate lattices deﬁned on the corresponding F-subgraph. For
example, { < 1,3 >, < 7, 9 > } is a lattice cover precisely describing the fragment {
1, 2, 3, 7, 8, 9 } shown in Figure 4.3. Thus we can deﬁne a fragment F by a lattice

cover as follows:

F = {zlszjy for at least one <z,y>€LC}

78

More than one lattice cover exists for a given F-subgraph. Figure 4.4a and 4.4b show

two different lattice covers for the same F-subgraph G2 of Figure 4.2.

 

Figure 4.4: Two lattice covers for the F-subgraph G2

The above Figures 4.4a and 4.4b correspond to the lattice covers LC; = { < 3,10 >,

< 3,11 >, < 3,12 > } LC2 = { < 3,10 >, < 7,11 >, < 3,12i> } respectively.

So far, we have presented the basic idea for describing the fragment by using a set
of suitable candidate lattices called a lattice cover. One important thing to note from
this lattice approach is that our lattice representation scheme (i.e. L = < 1:1,,yL >)
has one representational ambiguity. That is, we cannot always uniquely reconstruct
any lattice L by using its LUB x1, and GLB yL alone. This problem will be shown
from the lattice L]: < 3,11 > = {3 j 6, 6 j 11, 3 j 7, 7 j 11 } in Figure 4.4a.
From L1, consider two lattices L2 = {3 j 6, 6 j 11 } and L3 = { 3 j 7, 7 j 11
} . Even though all three lattices L1, L2, and L3 are represented by the same LUB
and GLB (i.e. < 3,11 >), they are different lattices. However, this representational
ambiguity is not a problem with our lattice approach. This is because in our lattice

ELpproach, lattice L = < ethyl, > always refers to the one which maximally covers

79

nodes in GR(VR,ER). Thus, in the above example, lattice < 3,11 > refers to the

lattice L1.

The question that remains is what criteria will determine a good lattice cover for
an F-subgraph. We deﬁne a good lattice cover as one that minimizes the search space
for ﬁnding a lattice incorporating a given node. The search time will depend on the
number of lattices in a lattice cover and not on the size of each lattice. In other words,
the number of lattices in a good lattice cover should be as small as possible while the
amount of overlap among the lattices is not important. An index called lattice cover

size will be used to represent the number of lattices in a lattice cover.

In the next subsection, we show that ﬁnding a lattice cover with a minimal index

is NP — complete. We then present a heuristics for ﬁnding a good lattice cover.

4.2.3 Lattice Cover Problem is NP — complete

The lattice cover problem, LC P, is deﬁned as follows. Let P be a set of nodes where
the partial order j is deﬁned. For a given positive integer K, the problem is to
determine the existence of a lattice cover LC whose cover size S K such that every
node of P belongs to at least one of the lattices in LC. To prove it is NP — complete,
we will map the minimum cover problem [33] to LC P. The minimum Cover problem,
M C P, is stated as follows: Given a collection C of subsets of a ﬁnite set S with a

positive integer K g |C|, is there a subset C’ Q C with IC’I S K such that every

element of S belongs to at least one member of C’?

80

Before presenting a detailed proof for NP — completeness of LCP, we ﬁrst explains
the basic intuition on how to check if a given set of lattices is a lattice cover for the
F-subgraph. Given a lattice in a lattice cover, ﬁnding a set of nodes covered by
the lattice requires performing two transitive operations, one from the LUB (on the
original edge graph) and the other from the GLB (on the inverted edge graph) of
the lattice, and ﬁnding the intersection of the two node sets. Checking that a set of
lattices is a lattice cover involves taking the union of the sets of nodes covered by the
lattices and checking if this is identical to the sets of all nodes in a fragment. Next,

we prove that LCP is NP — complete.

Lemma 4.2.1 LCP is NP — complete

Proof: LC P is in NP, since a nondeterministic algorithm may guess a set of
suitable candidate lattices LC for a partial order set (P, j) such that |LC I S K, and

then check in polynomial time whether LC is a lattice cover.

We transform MCP to LCP. Let C = { S], 3;, ..., Sn } be a collection of subsets

of a ﬁnite set S. The poset (P, j) is constructed as follows. For each subset S.- E C,

we create four nodes, M,“ M,-,, N,, , and N,,. Next, we create direct partial orders

for each i such that M,, j N,, and Mg, j N,., wherei = 1, 2, . .. ,n. For every x E S,-

where i = 1, 2, .. . ,n, we create a direct partial order such that Mg, j .7: j N,,. This

newly constructed poset (P, j) has 2n maximal nodes (i.e. M1,, M1, M2,, M2,, . . .

a ".11, Mn_12, Mm, Mm) and 2n minimal nodes (i.e. N1“ N12 N21, N22, . . . ,Nn_1,,
N,,.],, NM, NM). From this poset (P, j), let set M and N consist of all maximal

and minimal nodes of P. Then, we create CL as such CL = { < x,y > : a: E M /\

 

81
y 6 N A 1' j y }.

Then, in any lattice cover LC Q CL, we must have at least 2n suitable candidate
lattices to cover both the n maximal nodes (i.e. M1,, M2,, ,Mn,) and the n
minimal nodes (i.e. N12, N22, ,N,g,). Furthermore, it is always possible to derive
the above 2n suitable candidate lattices as 2n simple lattices. This is because we
have Mg2 :5 Ng, and Mg, j Ng2 where i = 1, 2, ,n. Let a set X denote the above

2n simple lattices.

It is obvious that the lattices in X also cover the remaining 12 maximal nodes
(i.e. M1,, M2,, ,M,,,) and n minimal nodes (i.e. N1,, N2” ,an). Note that
no lattice in X covers any node in S because the lattices in X are simple lattices.
Therefore, if we can derive a lattice cover LC, where ILC I S (K + Zn), from this

poset (P, S), then we have a minimum cover C’ Q C with IC’I S K.

To see that this transformation can be performed in polynomial time, it sufﬁces to
observe that the total number of partial orders in (P, j) are bounded by a polynomial

of 0(ICI - max{|Sg| : Sg E C}). D

In order to demonstrate how the above transformation works, we will illustrate an
example as follows. Let MCP have the instances such as S = { 1, 2, 3, 4, 5 } and C
={51,52,33,S.g,55}where31={1,2,3},Sg={4,5},S3={1,2},Sg={3,
4 }, and 55 = { 5 }, From this MCP, we need to construct a poset (P, '1). For each
subset S,- E C, we ﬁrst create four nodes Mg,, Mg2, Ng,, and Ng,, which are shown in

Table 4.3.

 

82

 

 

 

 

 

 

 

 

 

 

 

i5; Mg1Mg°2 Ni1Ni2
1s,={1,2,3} A B K L
2 Sz={4,5} I J T U
3 S3={1,2} C D N O
4 s..={3,4} E F P Q
5 55:93} G H R s

 

Table 4.3: Newly created nodes for each set Sg for transforming from MCP to LCP

Next, we create direct partial orders for each set S'g where i = 1,2, . . ., 5. For 51,

the direct partial orders created are A :5 L, B j K, A j 1 j K, A j 2 j K, and A .

j 3 j K. The rest of direct partial orders created for other sets in S are shown in

Figure 4.5.

\ I \ I \ I \ I \
\ I I \ I \ I \ I
\ I \ I \ I '
\ I \ ’ I
X l is 2 ii is ‘ A
I\ ’\ I I\
I, \/ Ii ///i‘\/ \‘/
\ I \ I
V s ‘ r u r v I ‘
L K O N Q P S R U T

Figure 4.5: Partial order sets created from MCP to LCP

Note that in Figure 4.5, dashed arrows (undashed arrows) represent direct partial
orders constructed for newly created nodes (resp. nodes in Sg). Then, set CL consists
of suitable candidate lattices in { < B,K >, < A,L >, < A,K >, < A,N >,
< A,P >, < D,N >, < 0,0 >, < C,N >, < C,K >, < F,P >, < E,Q >,
< E,K >, < E,P >, < H,R >, < G,S>, < G,R>, < G,T >, < J,T>, < I,U >,
< I, R >, < I,T > }. It is easy to see from Figure 4.5 and CL that if we can derive

a lattice cover LC Q CL, where ILCI S K + 2 - 5, then we have a minimum cover

83
C’ c c with m s K.

4.2.4 A General Heuristic for Finding a Good Lattice Cover

In this subsection, we present a heuristics for ﬁnding a good lattice cover. The
following proposition will establish the relationship between the lattice cover size and

the number of maximal and minimal nodes in the F-subgraph Gp(Vp, Ep).

Proposition 4.2.1 Given an F-subgraph CF with M maximal nodes and N minimal

nodes, the index for any lattice cover of CF must be at least max(M, N).

Proof: Assume that M _>_ N. Since two maximal nodes can not belong to the
same lattice, we need at least M lattices to cover all those maximal nodes. Similarly,

if N 2 M, we need at least N lattices to cover those minimal nodes. D

From the proof of proposition 4.2.1, it is obvious that all maximal (minimal) nodes in
Cp must be used at least once as the LUB (resp. GLB) node of some lattice in any
lattice cover of CF. This is because a maximal (minimal) node cannot be covered by
any lattice whose LUB (resp. GLB) node is not a maximal (resp. minimal) node.
This property provides a good basis for choosing a set of suitable candidate lattices,

i.e. CL, from all possible suitable candidate lattices.

We restrict all suitable candidate lattices to consist of those whose LUB and GLB
nodes are chosen from maximal and minimal nodes (respectively) of F—subgraph. For

example, consider the F-subgraph G2 in Figure 4.2. The suitable candidate lattice

84

< 7,11 > deﬁned on G2 is not chosen as a member of set CL. This is because LUB
node 7 is not a maximal node of the F—subgraph G2. The set CL consists of the
lattices < 3,10 >, < 3,11 > and < 3,12 >. However, it is not always possible to
derive a lattice cover for a given CF from the subset of suitable candidate lattices we

just described. We take the DAG GR in Figure 4.6a as an example to illustrate this

if) mi A?
z W. n:

® 03 ®

a b

problem.

Figure 4.6: A digraph G and F-subgraph of G

From the graph G3 in Figure 4.6a and the fragment F = { 2, 5, 6, 9, 10, 13, 14 },
we get the corresponding CF shown in Figure 4.6b. It follows that CL = { < 2,13 >
} from which we cannot derive a lattice cover for CF. This is because set CL cannot
contain the unsuitable candidate lattice < 2,14 > which also includes additional
node 11 ¢ F'. Thus, we need to enumerate additional suitable candidate lattices to
derive a lattice cover. For this purpose, we present Algorithm 1 in Figure 4.7. It
obtains additional suitable lattices by partitioning an unsuitable lattice < x, y > into
a set of suitable lattices. The union of these suitable lattices then covers all nodes in

{zlszjy /\ zEVp}.

" .
i .

 

 

85

The input to the algorithm is VF, unsuitable lattice L =< :r, y > and a subgraph
SG;(V1, E1) induced by the elements of L =< :r, y >. The elements of L =< :r,y >
was obtained from the intersection between ({x} U {all descendants of x}) and ({y} U
{all ancestors of y}) by using ER. Note that the we represent E1 and ER as adjacency

lists.

 

begin
1. Let a subgraph SC1(V1, E1) corresponds to L = < :c,y >;
2. Let \I' = 0;
3. Obtain IV] = V] - V}: and V1, -'= V] — IV];
4. Let SGL(VL, EL) be a subgraph induced by nodes in V1,;
5. Let P, = 0, S] = {x}, and i = 2;
6. Let S-z = { v | (u,v) 6 EL for all uE S] };
7. If 32 = (ll then goto step 10;
8. For each v in 5;, let P,, = i;

5"

i = i + 1, 5'1 = $2, and goto step 6;
10. Find 5 = {(u,v) | (u,v) E EL A (u,w) E E1 where w 6 IVI};
ll. EL = EL—tiand N={u | (u,v)Eb};
12. Find w 6 N such that PW = min{ Pg, I v E N};
13. Let M consist of all maximal nodes in EL;
14. Let \II = \IIU {<u,w> |ueMA(ujw is true in EL)}i
15. From EL, obtain S<m,n> = {elm j c j n} for each < m,n >€ ‘1’;
16. Obtain VL = v,, — {cl 0 e 50....) where < m.n >6 ‘1'};
17. If VL = (D then goto step 20;
18. Let EL consist of those edges whose end nodes are in V1,;
19. Let N consists of all minimal nodes in EL and goto step 12;
20. return (\II, {SLIL 6 .\II});

end

 

 

 

Figure 4.7: Algorithm 1: Partition an Unsuitable Candidate lattice L =< 1:, y >

We re—take the DAG GR and CF in Figure 4.6 as an example to demonstrate
Algorithm 1. The input to this algorithm is a VF = { 2, 5, 6, 9, 10, l3, l4 }, the
unsuitable candidate lattice < 2,14 > and the subgraph SG;(V1, El) corresponding

to < 2,14 >. Figure 4.8a shows SCL(VL,EL) obtained at step 4. The value P,, of

86

each node v in V1, is also shown beside node v in Figure 4.8a. Note that due to edge

(5,14), P14 is initially set to 3 and then reset to 4.

in: xi: .,
’39:: A; A:

a) b) C)

Figure 4.8: Changes of S'CL(VL, EL) in Algorithm 1

At step 10, since node 6 connected to node 11 in [V], 6 contains edge (6, 10). We
delete edge(s) in 6 from EL at next step. Thus, the structure of $01, is changed,
which is shown in Figure 4.8b. After the executions of step 12 to 15, we obtain \II
= { < 2,6 > } and change SCI, This changed 50;, is shown in Figure 4.8c. Since
V1, is still non-empty, steps 12 to 16 are repeated. During this repetition, set \I' is
changed to \II = { < 2,6 >, < 5,14 > }. This set \I' contains suitable candidate
lattices partitioned from the unsuitable candidate lattice < 2, 14 >. Next, we discuss

the time complexity of Algorithm 1.

The primary complexity of Algorithm 1 comes from traversing edges in EL. Then,
it is easy to see that Algorithm 1 will not traverse more than |\II| - IEL| edges. Thus,

the time complexity is 0(|\II| - IELI).

87

We now present Algorithm 2 which generates a good lattice cover for an F-
subgraph. It is given in Figure 4.9. The input to Algorithm 2 is theyF-subgraph
Gp(Vp, Ep), edge set ER, and set M and N. Set M and N denote the sets of max-
inial and minimal nodes of the F—subgraph respectively. Note that edge set ER and

Ep are maintained as adjacency lists.

Algorithm 2 ﬁrst creates a set of suitable candidate lattices CL by using Algorithm
1 and the method we described in the early part of this subsection. Next, Algorithm
2 tries to identify essential lattices among the lattices in CL. The formal deﬁnition

of essential lattices is given in the following deﬁnition 4.2.7.

Deﬁnition 4.2.7 Let CL be a set of suitable candidate lattices for an F—subgraph
CF(VF,EF). Suppose that CL' = { LC], LC-z, ...,LCk } consists of all possible
lattice covers for CF where LCg Q CL fori = 1,2,...,k. Then, a lattice in CL is

essential if it belongs to any lattice cover LCg fori = 1,2, . . . ,k.

Identifying all the essential lattices in CL was shown to be NP-Hard [73]. How-
ever, some of the essential lattices can be identiﬁed by checking each node in the
F—subgraph. That is, if a node is covered by a single lattice in CL, then the lattice is
essential. After ﬁnding all identiﬁable essential lattices in CL, Algorithm 2 compute
a set of nodes, V, not covered by identiﬁed essential lattices in CL. It then repeatedly
apply a greedy heuristics to the nodes in V to select a lattice from CL that covers

the most remaining uncovered nodes in V (with ties broken arbitrary).

Figure 4.10 shows a digraph CR(VR,E’R) consisting of two F-subgraphs corre-

sponding to the fragments { l, 2, ,14, 15 } and { 16, 17, , 21, 22 }. The

88

 

 

Stage 1

1. Let LC = (b. For each node v in Vp, create the corresponding ances-
tor and the lattice set A,, and L3,, respectively. They are initialized
to (0. Create two sets EC L (Essential Candidate Lattice) and U C L
(Unsuitable Candidate Lattice). Let ECL = UCL = (0.

2. For each maximal node m in M, let A", = { m }. For every v which
is a descendant of m in Ep, let also A,, = A, U { m }. Therefore,
given a node v, its ancestor set A,, contains all the maximal nodes
which are ancestors of v. For each maximal node m in M, let Am =

{m}-

3. For each minimal node n in N, and each m 6 An, the node pair
(m,n) suggests a candidate lattice < m,n >. There are EneN IAnI
such pairs. Let CL denote the set of ZneN IAnI candidate lattices.

4. For each candidate lattice L =< 2:,y > in CL, create the corre-
sponding set S L by using E3. The elements of S L are obtained from
the intersection between ({ a: } U { all descendants of z }) and ({ y
} U { all ancestors of y }). If L is not suitable, move L from CL to
U C L.

5. For each L 6 CL, L5,, = L5,, U { L } ifv 6 51,. Let UCL' = 0.
For each node v in Vp such that L3,, = 0: UCL' = UCL’ U { L |
L e UCL A v e .91, }. 1f UCL’ = (2) goto step 8.

6. Apply Algorithm 1 to each unsuitable candidate lattice L 6 UCL’.
Let PC L consists of those suitable candidate lattices returned from
Algorithm 1.

7. For each L E PCL, L5,, = L5,, U { L } ifv 6 31,. CL = CL U
PCL.

8. For all v 6 VP, if L5,, contains only one lattice L, then move L from

CL to ECL.
9. Let L0 = LC U ECL and V}? = VF - ULeECLSL-

Stage 2
V = Vp; U = CL;
while ( V ¢ lb) {
Select an L E U that maximizes ISL 0 VI;

U=U—L;
V=V—SL;
LC=LCUL;}

LC is the lattice cover for the F-subgraph;

 

Figure 4.9: Algorithm 2: Find a Lattice Cover for an F -sub2raph

 

89
F-subgraph Cp(Vp, Ep) corresponding to the fragment { 1, 2, ,14, 15 } will be

used to demonstrate Algorithm 2 as follows.

01110121°l31
[01110111 m 0 0m

Om @ Gm

  

\ .
@III Guam Guam Gm

Figure 4.10: A digraph G consisting of two F-subgraphs

Stage 1
2. The ancestor set A,, of each node v in VF is shown beside node v in Figure 4.10.

3. The set CL of candidate lattices is obtained: CL = { < 1,12 >, < 1,13 >,

<1,14 >,<2,13 >,<2,14 >,<3,13 >,<3,14>,<3,15>}.

4. Table 4.4 shows the nodes of each candidate lattice L in CL. The nodes of each
lattice are computed by using the edge set ER. Among the candidate lattices

in CL, lattice < 1,12 > is not suitable, since it also covers nodes 20 and 21.

Thus, CL 2 CL — {< 1,12 >} and UCL ={ <1,12 > }.

5. In this step, wegot LS] ={< 1,13 >,< 1,14 > },L32={<2,l4 >,<2,15>
},L53={<3,13 >,<3,14 >,<3,15> },LS,=0,...,LS,5={<3,15>

}. Then, by the deﬁnition of UCL’, UCL’ = { < 1,12 > }.

90

 

 

 

 

 

 

 

11 SL L 51,

< 1,12> SL={1,4,9,12,20,21} <2,13> SL={2,6,10, 13}

< l,13> SL={1,4,5,9,10,13} <3,13> SL={3,7,10,13}

< l,14> SL={1,5, 10,14} <3,14> SL={3,7,8,10, 11,14}

< 2,14> SL={2,6,10,14} <3,15> SL={3,7,8,11,15}
Table 4.4: Lattice L and Set 3;,

6. By applying Algorithm 1 to unsuitable candidate lattice < 1,12 >, we got PC L
={<1,4>,<9,12>}.

7- In this step, by using Ep, we got 50,4) = { 1, 4 }, 5(9’12> = { 9, 12 }, L31
={<1,4>,<1,13>,<1,14>}, LS,={<1,4>,<1,13>},L39={
<1,13 >, < 9,12 > }, L512 = { < 9,12 > }. And CL = CL U { < 1,4 >,
< 9,12 > }.

8° The lattices < 9,12 > and < 3,15 > are essential because L312 : { < 9,12 >
}, L515 = { <3,15 > }. Hence, ECL = { <9,12 >, <3,15>}and CL ={
<1,4>,<1,13>,<2,13>,<2,14 >, <3,13 >, <3,14> }.

9‘ LC and V consist of{ < 9,12 >, < 3,15 > } and { 1, 2, 4, 5, 6,10, 13,14}
respectively.

Stage 2

A greedy method is performed in while loop. At the ﬁrst execution of while
loop, lattice < 1, 13 > will be selected from CL. Then lattice < 2,14 > will be

selected in the next execution of the while loop and Stage 2 will end. Therefore,

91

the lattice cover LC consists of{ < 9,12 >, < 3,15 >, <1,13 >, < 2,14 > },

which in this example is the minimal lattice cover.

4-2 -5 Near Optimal Lattice Cover Size

Corrnen ct al. [20] used the same greedy heuristics as the one in stage 2 of Algorithm
2- They derived a logarithmic ratio bound for their algorithm. Our greedy approxi-
rn ation algorithm has the same performance ratio bound in the worst case as the one
shown in [20]. This is because, in the worst case, our algorithm may not identify any
essential lattice. As a result, only the greedy heuristics of stage 2 is used for deriving

a l at t ice cover.

Let OPTU‘“) be the optimal solution and LCP(I) the heuristic solution for LCP
PTOblem. Cormen et al. prove in [20] that LCP(I) < (ln(max{|SL| : L 6 CL}) + 1) ,

O P T( I ‘). However, for our algorithm, this bound represents the worst case situation.

To measure the performance of Algorithm 2, we create random DAGs GR(VR, ER).
T118 generation of DAGs is based on a uniform distribution in the range [1,IVRI]
Where |VR| represents the total number of nodes. Then, we' create F-subgraphs by
partitioning 03. Each F-subgraph has a small number of maximal and minimal nodes
C0111 pared to lVFl (i.e. about 1 % of |Vp| on the average). Based on the above scheme,

we Create a set of random F-subgraphs Cp(Vp, Ep) as f0110W53

. Size of Vp is varied from 1000, 1500, to 2000 nodes.

. For leI = 1000, we create CF by varying IEpI from 1500 to 10000 with an

 

92

interval 500 edges.

. For IVpI = 1500, we create Gp by varying IEpI from 2500 to 10000 with an

interval 500 edges.

0 For lVFl = 2000, we create CF by varying IEFI from 3000 to 10000 with an

interval 500 edges.

0 For each ﬁxed size of VF and Ep, we create 10 Gp’s using 10 different seeds.

For each F-subgraph created as above, we compute the lattice cover size ILCFI and
the percentage ratio between the total number of essential lattices (i.e. IEC LI) and
sui table candidate lattices (i.e. ICLI), by using Algorithm 2. Note that CL (ECL) is
Obtained in step 7 (resp. step 8) of Algorithm 2. These values are then averaged over

the 10 F-subgraphs with same values of I VpI and IEpI. These are shown in Figure

4-11

Figure 4.11a shows that the lattice cover size |LCp| converges to the theoretically
DOSSible minimal lattice cover size as we increase lEFl- The theoretically possible
Irlinirnal lattice cover size is given in Proposition 2.1. From Figure 4.11b, we can
clearly observe that the value of P approaches close to 100 % as We decrease IEFI This
i ‘11 plies that the total number of essential lattices approaches close to the total number
of the suitable candidate lattices, as we decrease IEpI. Since IECLI S ILCpI S
ICLI, Algorithm 2 gives LCp close to the minimal lattice cover, as IEpI is decreased.
Therefore, we can conclude that Algorithm 2 gives near optimal lattice covers for both

dense and non-dense F-subgraphs. Note that dense F —subgraphs have much larger

93

number of edges per node than that for non-dense F-subgraphs.

50 l l T

 

ILCFI

 

 

 

 

 

 

ITIIIIIII

 

 

b) P = 100 - IECLI/ICLI

Figure 4.11: Performance analysis for Algorithm 2

Next, we discuss the complexity of Algorithm 2. Algorithm 2 consists of stage I
and 2. The time complexity of stage I is dominated by step 4. Let Nd (Na) be the
1hﬁximum number descendants (resp. ancestors) of node a: E M (resp. N). Let c, be
the total number of edges to be traversed to ﬁnd the above ancestor and descendant
nodes. Then step 4 will run in 0(e, + (Nd + Na) - ZnEN IAnI)- For stage 2, we can
ea8in see that its time complexity is 0(ICLI2 - IVpI). Thus, the time complexity of
the algorithm is 0m + (N. + N.) - gal/4.4 + ICle - val). Note that. as the

denseness of F-subgraph is decreased, the time complexity will approach to 0(eg +

 

94
(N4 + Na) - ZneN |A,,I). This is explained by the result shown in Figure 4.11b.

4-3 Finding Relevant Fragments

The fragments containing the nodes a query needs to access, are deﬁned as relevant
fragments of the query. Note these set of nodes may be a super set of the result
11 odes. In order to ﬁnd relevant fragments of the query, each site needs to access its
local fragment table and local edge set ER. Before discussing methods of locating the
relevant fragments, we ﬁrst introduce a set of basic notations which will be used in

t he rest of this paper. They are deﬁned as follows:

FT : the set of all lattices in the fragment table; Q : a set of nodes;
DQ : Q U { all descendants of nodes in Q };

AQ : Q U { all ancestors of nodes in Q };

Sgub: {x|<x,y>6FT}; Sgubq : {xIxESgubeE/lq };

Sggb:{yl<:r,y>EFT}; SglbqiiylyESgIbAl/GDQ }i

Note that DQ and AQ are computed from E3 maintained in adjacency lists. The

following proposition provides the basis for locating relevant fragments of the query.

Proposition 4.3.1 Suppose Q consists of a set of nodes needed by a query. Let A be
the set of lattices in FT which cover all nodes in Q. Then, the lattices in A are only

those lattices in T ={ < x,y > I :1: E Stubq A y 6 Sglbq A < :c,y >6 FT }.

95

Proof: We need to show that A Q T and T Q A to prove A E T.
Case 1: To prove A Q T, we need to show V < 2:,y > (< 1:,y >6 A —> < a:,y >6 T).
Assume that there is a lattice < u,v >6 A such that < u,v >6 T. Then, < u,v >
11) ust cover at least one node 2 6 Q. By the deﬁnition of T, u 6 Sgubq or '0 ¢ Sggbq.
However, since u j z j v and z 6 Q, u 6 AQ and v 6 Dq. This implies that
u 63 Sgubq and v 6 39150, which is a contradiction.
Case 2: To prove T Q A, we need to show V < 2:,y > (< a:,y >6 T —> < $,y >6 A).
Assume that there is a lattice < u,v >6 T such that < u,v >6 A. Then, by the
deﬁnition of T, < u,v >6 FT and it must cover at least one node in Q. Since A
Con tains all lattices in FT which cover all nodes in Q, A must contain < u,v >. This

contradicts our assumption. 0

Even though proposition 4.3.1 gives a general way of locating relevant fragments,

it is an inefﬁcient approach because it requires computing two transitive closures
(i-e. ﬁnding all ancestors and descendants) of all nodes in Q. Thus, we need an
efﬁcient algorithm which minimizes transitive closure computations. Fortunately,
for recursive queries, the nodes they need to access are likely to be a connected
Component or a collection of several connected components. In this case, we can
lhinimize computing two transitive closures of all nodes in Q by taking advantage of
the transitive relationships among the query nodes. Furthermore, if a recursive query
processing requires a transitive closure computation on ER, then either Q E DQ or

Q _=. Aq. Then, in either cases, no more transitive closure computation is necessary.

96

This will be proved in the following proposition 4.3.2 and 4.3.3.
Proposition 4.3.2 Let DQ E Q and ll ={ < 2:,y > I y 6 5'9ng A < :c,y >6 FT}.

Then, 11 E T.

Proof: We need to show that I] Q T and T Q IT to prove ll 5 T.
Case 1: To prove T Q II, we need to show V < x,y > (< :c,y >6 T —-1 < 2:,y >6 II).

For all < Ir,y >6 T, by deﬁnition of T, it is true that y 6 S9150 and < :c,y >6 FT.

Tl) us, all < :r,y >6 T belongs to ll.

Case 2: To prove ll Q T, we need to show V < x,y > (< :L‘,y >6 ll —+ < :c,y >6 T).
Assume that there is a lattice < u,v >6 ll such that < u,v >¢ T. Then, it must
be true that v 6 Sglbq, < u,v >6 FT and u 6 Szubq. By proposition 4.3.1, < u,v >
Callliot cover any node in Q. However, < u,v > covers the node v 6 Q, since

'1’ G Sggbq Q DQ E Q. Thus, < u, v >6 T, which is a contradiction. Cl

Proposition 4.3.3 Let Ag 5 Q and Q ={ < 3:,y > I a: 6 Siubq A < x,y >6 FT}.
Then, 9 E T.

Proof: The proof is similar to the one for Proposition 4.3.2.

Before presenting an efﬁcient algorithm (i.e. Algorithm 3) for ﬁnding relevant
fragments, we ﬁrst introduces a function named Block(x,t), which is called within
Algorithm 3. Note that parameter t represents either symbol “L” or “G”. Function

Block(z,C) (Block(:r,L)) returns a set of GLB (resp. LUB) nodes from which a given

97

node :1: is reachable. This function is described in Figure 4.12.

 

Block(m,t) /* a: : node; t: either C or L; */
begin
current = {1:}; Block 2 0;
Ift = G then .S = $915;
else s = Slug;
While ( current # 0 ) {
temp = current 0 3;
Block = Block U temp;
current = current — temp;
Ift = C then
current = {y|(:r,y) 6 ER A a: 6 current};
else /* t = L */
current = {:r|(:c,y) 6 ER A y 6 current}; }
return (Block);
end

 

 

 

Figure 4.12: Procedure Block(x,t)

To show how Block(r, t) works, we take Figure 4.13 as an example.

Figure 4.13a shows three F-subgraphs whose node information is stored at different
Sit-es. The fragment table corresponding to the DAG CH of Figure 4.13a is shown in
ITigure 4.13b. Note that this fragment table is stored at each site. From Figure 4.13b,
We have Sglb = I 6, 7, 13, 14, 15, 24, 20, 23 } and Slug, = { 1, 2, 8, 9, 16, 17, 21 }.
Then, it is easy to see from the procedure that Block(4,G) = {6,7}, Block(6,G) =

{6}, Block(‘23,L) = {9, 16, 17, 21}, and Block(21,L) = {21}.

We now describe Algorithm 3 in Figure 4.14. Let Q consists of a set of nodes a
query need to access. Algorithm 3 ﬁrst checks if either Q E DQ or Q 5 Ag. If it is,
Using proposition 4.3.2 and 4.3.3, Algorithm 3 easily locates relevant fragments. Oth-

erwise, the algorithm tries to ﬁnd relevant fragments taking advantage of transitive

98

   

 

Fragrant O Lattice
l <l.6>

47>

<8.l3>

 

 

 

 

 

<9.15>

<17,24>
<16.20>
d!.23>

 

 

 

 

 

 

wwwnnu-

 

a) b)

Figure 4.13: a) DAG for the whole recursive relation b) Fragment table

relationships among the nodes in Q.

We re—take the preceding Figure 4.13 to demonstrate Algorithm 3. Assume that
t-V'\Io recursive queries RQ1 and HQ; need to access nodes in Q1 = { 8, 10, 11, 13, 14 }
Ei-nd Q2 ={ 12, 18, 19, 20, 21 } respectively. We show how to ﬁnd relevant fragments

of RQ1 and RQ2 in the following two cases.

Case Rle We have [IQ = {8}, G’Q = {13,14}, EQ = { (8,10), (8,11), (10,13),
(11,13), (11,14) }, EL = { (6,8), (9,11) }, and Ea = 0. Since E0 = 0, we obtain
LT = {< 8,13 >, < 9, 14 >} at step 3. Thus, fragment 2 is the relevant fragment of

the query RQI .

99

 

 

begin
1. Compute LQ = Stub O Q and GQ = Syn, (1 Q;
2. Compute EQ = {(x,y)la: 6 Q A y 6 Q A ($.31) 6 ER},
EL = {(x,y)lx ¢ QAy 6 QA (33,31) 6 ER},
Ea ={($,y)|1‘ E Q /\ y 9! Q A ($.11) 6 ER};
31ng =Iothen /* Q2 Do */
{LT = {< x,y > Iy 6 CqA < x,y >6 FT}; Goto step 10; }
IfEL=lllthen /*QEAQ */
{LT = {< x,y > Ia: 6 LqA < x,y >6 FT}; Goto step 10;}
4. For each node a: 6 Q, L, = C, = 0;
5. so, = {$l($,y) e Ea} u Go;
81.0 = {yl($,y) 6 EL} U Lo;
6. For each a: 6 800’ C, = Block($,G);
For each a: 6 81,0, L, = Block(zr, L);
7. For each a: 6 Bag
For every y which is an ancestor of :1: within EQ
C, = C, U (1,;
8. For each a: 6 BLQ {
L’ = L,; G’ = 0,;
LT, = {< a,b> Ia e L.Abe (LA < a,b>6 FT};
while (LT, = 0) { /* No lattice covering node x is yet found */
L' = {uly E L’ A (my) 6 ER};
C’ = {va 6 G" A (x,v) 6 133};
For each u 6 L’, L, = L, U Block(u, L);
For each v 6 C’, C, = C, U Block(v, C);
LT,= {< a,b> |a6 L,Ab6C,A <a,b>6 FT}; }}
9. LT = U,€3LQ LT,;
10. All lattices in LT describe the fragments containing nodes in Q;
end

 

Figure 4.14: Algorithm 3: Find the Relevant Fragments

 

100
Case RQ2: We have LQ = {21}, CQ = {20}, EQ = { (18,20), (19,20), (20,12), (20,21)

}, EL = { (9,12), (16,18), (17,19)}, and E0 = { (12,15), (12,23), (21,23), (20,22) }.
At step 5, we obtain 800 = {12, 20, 21} and 8,0 = {12, 18, 19, 21}. At next step,
function Block-(2:, C) and Block-(w, L) gives 0,, = {15,23}, G20 = {20}, 621 = {23},
L1 2 = {9}, L13 = {16}, L19 = {17}, and L2] = {21}. Since nodes 18,19, and 20 are
tlle ancestors of each node :1: 6 Bag within Eq, G18 = { 15, 20, 23 }, C19 = { 15, 20,
23 } and C20 = { 15, 23 } are computed at step 7. In the following step, we identify
all the lattices covering nodes in BLq. By the deﬁnition of LT,, LT12 = {< 9,15 >},
LT” ={<16,‘20 >}, L7}, = (21, and LT21 = {g 21,23 >}. Since LT19 = 0, we need
to execute the while loop. In that loop, we got L’ = ID and G’ = {22}. Then, we
cOmpute C19 = { 15, 20, 23 } U {24}, which gives LT19 = {< 17,24 >}. Thus, the
lattices in { < 9,15 >, < 16,20 >, < 17,24 >, < 21,23 > } describe the relevant
friitgments (i.e. fragment 2 and 3) of RQg. Next, we discuss the time complexity of

A 1 gorithm 3.

The time complexity of Algorithm 3 is varied depending the following two cases.
The ﬁrst case is when EC; or EL is equal to (ll. In this case, the complexity is 0(IEQ U
EL U EGI + ILTI - IFTI). The second case is when the ﬁrst case is not true. Let e1,
l‘epresent the total number of edges belong to lattice L. Then, Algorithm 3 in the
Second case do not traverse more than 0(21,e LT IeLI) edges. Thus, the complexity is

0(Ztem ICLI + ILTI ' lFTl)-

101
4.4 Updating Recursive Relations

There are four different types of updates on a fragmented recursive relation
GR(VR,ER). They are node addition, node deletion, edge addition, and edge dele-
tion. A new node can be added to V3 or an existing node can be deleted from V3.

This node addition and deletion require updating Vp at the corresponding site. An
edge addition (deletion) can connect (resp. disconnect) two existing nodes belong-
in g to either same fragment or two different fragments. This edge update requires

revising the local copy of edge set ER at each site.

These 4 different types of updates must be performed by following the two update

constraints given below:

1. A new edge (as, y) can be added to ER if node a: and y already exists in VB.

‘2. A node a: can be deleted from V3 after all edges incident on node a: are deleted

from E R.

The preceding two constraints are introduced to make each type of update atomic.

The updates may also require revising the descriptions of some fragments (i.e.
lattice covers). If we have to recompute new lattice covers for those fragments affected
by updates, the overhead of maintaining the fragment description would be high.
However, we can avoid this re—computation by using an efﬁcient incremental update

algorithm. This is presented in the following subsections.

102
4.4.1 Edge Update

Suppose that an edge (x,y) is added (deleted) to (resp. from) DAG GR(VR, ER) at
site h. Then, by using Algorithm 3, site k can easily locate which fragments contain
Ilode a: or y. Assume that fragments F.- at site i and Pi at site j contain node a:

and y respectively. Note that i,j and k are not necessarily distinct. Based on the

above assumption, we describe the detailed update algorithms for edge addition and

d eletion.

E dge Addition

Site It sends the update information (i.e. an addition of edge (x,y)) to site i
alld j. After receiving the update information, site i (j) performs the procedure
AddEdge((:r,y),i) (resp. AddEdge((:r,y),j)). This procedure is described in Figure
4- 1 5. The DAG GR(VR,ER) in Figure 4.16 is used as an example to show the‘ex-
eoCution of procedure AddEdge((x,y),s). Figure 4.16 shows two F -subgraphs whose

1lodes are stored at site 1 and ‘2.

Assume that we add edge (13,15) first and then (9,7) to ER. Site 2 performs
AddEdge((13, 15),2) because both node 13 and 15 are available locally. Since A is an
empty set, we obtain L13 = {< 12,13 >} and L15 = {< 15,15 >}. Lattices < 12,13 >
and < 15, 15 > are merged into a new lattice < 12,15 > because lattice < 12,15 > is

suitable. Thus, lattice cover LC‘Z is changed to {< 8,10 >,< 12,15 >}.

Next, we consider edge (9, 7) addition to 133. Since site 1 and 2 stores node 9 and

 

103

AddEdge((x,y),s) /* (x,y): edge; 3: site number; */
begin
Let LC, be a lattice cover for a fragment F, at site .5;
Let Ep, be an edge set corresponding to F,;
Let Q1 = {10}, Q2 = {31};
By using En, compute $1.,le and 391502;
Obtain A = { < u,w > | u 6 31,50! /\ w 6 5,1qu /\ < u,w >6 LC, };
If ( == 0) {
/* No lattice becomes unsuitable by the addition of edge (x,y) */
Obtain L, = { < a,2: > I < a,;r >6 LC, };
Obtain L, = { < y,b> I < y,b>€ LC, };
133' = ER U {(X,)’)};
For each < a,x > in L,
For each < y,b > in L1’
If (lattice < a,b > is suitable) {
/* Suitability checking is done by using ER’ */
LC, = LC, — {< a,;r >,< y,b >};
LC, = LC, U {< a,b >}; }}
else { /* Some lattices in A may become unsuitable */
For each < u,w > in A
If (< u, w > is not suitable) {
/* Suitability checking is done by using ER’ */
LC, = LC, — {< u,w >};
Apply Algorithm 1 to lattice < u,w >;
Let LC, include those lattices returned from Algorithm 1; } }

 

end

 

Figure 4.15: Procedure AddEdge((:i:, y), s)

 

 

 

 

 

 

 

 

 

Site] Site2
3 9 8% .3 9
Lattice Cover LC]: Lattice Cover LC2:
{ <1.3>,<5.7> } { <8,10>.<12.l3>,<15,15> }

Figure 4.16: A DAG consisting of two F—subgraphs

 

104
7 respectively, the site 1 (2) performs AddEdge((9,7),l) (resp. AddEdge((9,7),2)).

For site 2, no change is made on lattice cover LC2 because of A = L9 = L7 = 0.
For site 1, we got A = { < 5,7 > }. Lattice < 5,7 > is not suitable because it
includes additional node 9 stored at site 2. Therefore, it is necessary to partition
lattice < 5, 7 > into a set of suitable lattices. By applying Algorithm 1, we partition

the unsuitable lattice < 5, 7 > into suitable lattices < 5,6 > and < 4, 7 >. Thus, we

obtain L(31={<1,3>,< 5,6 >, < 4,7 > }-

After performing AddEdge((;z:,y),i) (AddEdge((a:,y),j)), site i (resp. j) notifies
to site 1: its fragment description (i.e. lattice cover) changes. Upon receiving the
replies from site i and j, site k updates its local fragment table and ER. Then, site I:
broa-dca.sts these changes to all other sites. They in turn update their local fragment

table and ER.

We now discuss the time complexity of procedure AddEdge((a:,y),s). Its com-
pleXity is determined by the total number of edges to be traversed in ER'. Let LU B,
= { l I<l,g>E(AUL,)}andGLB,={g|<l,g>€(AULy)}whereset
A’ L;- and L, are shown in Figure 4.15. We denote a, (a,) as the total number of
edges to be traversed to compute all descendants (resp. ancestors) of each node in
LI] 8,7, (resp. GLBy). Then, it is easy to see that the time complexity of procedure

AddEdge((z,y),s) is 0(a, + ay).

105
Edge Deletion

Site Is: checks if fragments F, and F, are two distinct fragments. If they are, then
no update is necessary on the fragment table. Site Is updates only its local ER and

broadcast this change to all other sites. They in turn updates their local ER.

If F,- and F, are same fragment (i.e. i = j), then site k sends the update informa-
tion (i.e. a deletion of edge (a, y)) to site 2'. After receiving the update information,
site i performs the procedure DeleteEdge((m,y),i). This procedure is described in

Figure 4.17.

We re—use DAG GR(VR,ER) in Figure 4.16 as an example to show the exe-
cution of procedure DeleteEdge((x,y),s). Assume that we delete edge (8,9) ﬁrst
and then (8,11) from E3. Thus, site 2 ﬁrst executes DeleteEdge((8,9),2) and then
DeleteEdge((8,11),2). For DeleteEdge((8,9),2), we got A = {< 8,10 >}. Since
.5'1={8,10,11 } and 5'2 = {8, 9,10,11},LC2is{< 8,10 >, < 9,10 >,
< 12,13 >, < 15,15 > }. For DeleteEdge((8,11),2), we obtain A = {< 8,10 >}.
Lattice < 8, 10 > becomes unsuitable because S, is an empty set. It is easy to see from
the procedure how LC2 is changed to { < 8,8 >, < 11,10 >, < 9,10 >, < 12, 13 >,

<15,15> }.

After performing DeleteEdge((m, y),i), site i notiﬁes to site Is: its fragment descrip-
tion (i.e. lattice cover) changes. Upon receiving the reply from site 2', site 1: updates
its local fragment table and ER. Then, site k broadcasts these changes to all other

sites. They in turn update their local fragment table and ER. In the next paragraph,

v “J:_‘l‘_1__.i‘
- l
a
Q

106

we discuss the time complexity of procedure DeleteEdge((z, y),s).

 

DeleteEdge((zr,y),s) /* (x,y): edge; .5: site number; */
begin
Let LC, be a lattice cover for a fragment F, at site .9;
Let Ep, be an edge set corresponding to F,;
Let BF. : BF: —{(:c,y)},Q1={:c}and Q2 = {y};
By using En, compute $1,,le and Sglqu;
Obtain A = { < u,w > I u 6 5'1,le /\ w 6 391qu A < u,w >6 LC, };
If (A # 01) { /* Lattice in A may be affected by the deletion of edge (x,y) */
For each lattice < u,w > in A {

By using Eh, compute S] = { v I u j v j w};

If (5'1 == ) { /* Lattice < u,w > is not suitable */
LC,=LC,-{<u,w>};
LC,=LC,U{<u,x>,<y,w> };}

else { /* lattice < u,w > is still suitable */

By using EFL: compute 52 = { v I u j v j w};

If1|32|75|51|1 {

/* The number of nodes lattice < u,w > covers is decreased */
If(u;é;r) LC, = LC,U{<u,:r> };
If(yaﬁw) LC.=LC.U{<y,w>};}}}}

end

 

hill-I

 

 

 

Figure 4.17: Procedure DeleteEdge((z, y), s)

The time complexity of procedure DeleteEdge((m, y),s) is determined by the total
number of edges to be traversed in ER. Let LUB, = { u I < u,w > 6 A } and
GLB, = { w I < u,w > 6 A } where set A is referred in Figure 4.17. We denote d,
((1,) as the total number of edges to be traversed to compute all descendants (resp.

ancestors) of each node in LU B, (resp. CLBy). Then, it is easy to see that the time

complexity of procedure DeleteEdge((x, y),s) is 0(d, + d,).

 

 

107
4.4.2 Node Addition and Deletion

Suppose that a new node x is added to DAG GHQ/R, ER) at the site It. In this case,
we assume that site k wants to store node a: in its fragment. Site k ﬁrst creates a new
trivial lattice < 3:,3: >. It then updates its local fragment table by adding < m,n: >
to its fragment description. Site k broadcasts this change to all other sites. They in

turn update their local fragment table.

Suppose that a node a: is deleted from DAG GR(VR, ER) at the site h. Then, by
using Algorithm 3, site k can easily locate which fragment contains node 2:. Assume
that fragment F,- at site i contains node 11:. Note that i and k are not necessarily
distinct. Further note that, by the 2nd update constraint, all the edges incident on
node a: are already deleted. Therefore, except for a trivial lattice < :c, a: >, no lattice
in the description of fragment F,- covers node 3:. Site 1: updates its local fragment table
by deleting lattice < 13,2: > from the description of fragment E. It then broadcasts

this change to all other sites. They in turn update their local fragment table.

4.4.3 Performance Analysis of Update Algorithms

In this section, we discuss the performance analysis of our update algorithms given
in the previous section. We use random F-subgraphs for this analysis. Random

F -subgraphs GF(VF, Ep) are created the same way as that given in Section 4.2.5.

Let LCP be a lattice cover for an F-subgraph GF(VF, Ep). After each update on

CF, the size of LCF may change. We denote this changed LCF by SLO (Semi Local

 

 

108

Optimal)-LCF. Besides SLO-LCp, we consider two other LC ps after each update on
CF. They are TOP (Theoretically OPtimal)-LCF and N GO (Near Global Optimal)-
LCp. N GO-LCF is obtained by applying Algorithm 2 to CF. In order to show the
performance of our update algorithm, it is ideal to compare the size of SLO-LCF with
that of TOP-LCP. However, TOP-LCp is computationally very difﬁcult to achieve
due to the N P-completeness of LCP. Thus, as an alternative, we compare the size of

SLO-LCF with that of NGO-LCF.

 

200 1 I T r
180 SLO — ~
160 ..-
140 -

120 ~

ILCpl 100 L
I.

I
2
Q
C
l

 

 

 

 

0 22 0 0 8 100
Num er 0 ran om up ates

a) IVFI = 1000 and lEFl =1500

 

200 1 l i 1

180

160

140

120

|LCp| 100
80

60
40 I...
20 .....................

0 l L l l

0 22 0 0 8 100
Num er 0 ran om up ates

b) IVFI = 1000 and IEFI = 4000

     

llLllll

LA

 

 

 

Figure 4.18: Effects of updates on ILCpI for a) non-dense and b) dense F—subgraphs

Figure 4.18 shows the effect of random updates on the size of LC}: for a non-dense

as well as a dense F-subgraph. Note that random updates consist of node addition,

 

109

node deletion, edge addition, and edge deletion where each type of update is randomly
chosen. Figure 4.18a shows that, for a non-dense F-subgraph, the performance of our
update algorithms deteriorates very slowly. For a dense F-subgraph, the performance
deteriorates faster than a non—dense F-subgraph. This is shown in Figure 4.18b.
However, in the following paragraph, we will show that this performance degrade will

not cause the frequent application of Algorithm 2 to compute N GO-LCF.

We have measured the impact of updates on ILCFI for varying denseness of F-
subgraphs. In this experiment, ILCFI is measured after 50 random updates. We
averaged the value of ILCFI for 10 F -subgraphs using 10 different seeds, as we did in
section 4.2.5. The results are shown in Figure 4.19. From Figure 4.19a, it is easy to
see that the performance of our update algorithms is good for non-dense F—subgraphs
while it is not as good for dense F—subgraphs. For a non-dense F-subgraph, SLO-LCF
remains near optimal for a large number of random updates. This is seen in Figure
4.18a and 4.19a. Thus, we do not have to frequently apply Algorithm 2 to obtain
N GO LCF. For a dense F-subgraph 0,2, we do not need to frequently obtain N GO
LC p, either. This is justiﬁed from Figure 4.19b, which shows that the percentage ratio
between ILCFI and IVpI for SLO-LCp grows very slowly with increasing denseness
of F -subgraph. Therefore, we conclude that our update algorithms perform well
without requiring frequent application of Algorithm 2 for both dense and non-dense

F-subgraphs.

110

 

110

80
70

ILCFI 6°~ ..

50
40
30
20
10

l
100 *SLO <9—
90 WGO +-

llfllllllLl

 

 

 

0

1500 2000 2500 3000 3500 4000

IEFI
a) IVFI21000

 

100
90
80
7o
60
Q 50
4o
30
20

l 1 1 I
SLO @-
rNGO + '

lJllllll

Tlllrﬁ

 

 

104
0

 

 

1500 2000 250I)E 3000 3500 4000

PI

b) |Vp|=1000 and Q = 100 . ILCFI/IVFI

Figure 4.19: Snapsho

ts of ILCpI and Q after 50 random updates

111

4.5 Conclusion

In this chapter, we developed a distributed database model for studying the problem of
fragmentation of recursive relations. We used the mathematical properties of lattices
to investigate fragmentation schemes. We showed that the lattice structures are

powerful representation tools for distributed fragments of recursive relations. Both
theoretical and empirical performance analysis of the lattice approach are provided.
The empirical analysis veriﬁes the near optimality of the performance of our proposed
algorithm. We also showed that lattice approach provides an efﬁcient way of locating

the relevant fragments of a recursive query.

There are special problems with updating fragmented recursive relations which
do not occur in updating fragmented non—recursive relations. We discussed these
PFOblems and their solutions. The empirical performance analysis of these solutions

are also discussed.

Chapter 5

Conclusion and Future Research

5 - 1 Contribution of This Dissertation

We have examined recursive query processing strategies for large recursive relations.
Recursive query processing requires transitive closure computations. As a special
Case of transitive closure computation, the shortest path computation is essential
for recursive query processing for the large recursive relations of topographical road

In this thesis, we proposed an efficient recursive data organization method called
the HiTi graph model. The H iT i graph model allows structuring of large recur-
Sive relations for hierarchical abstraction. Based on the H iTi graph structure, we
de‘VeIOped a new fast shortest path algorithm named S' PAH . S PAH uses the pre-
C01I‘Dutations captured by the H iTi graph to speed up SPSP computation. These

112

113

precomputations are not used in the traditional SPSP algorithms, including A*. By
using these precomputations, S PAH dramatically reduces the explored search space.
This is shown through our empirical analysis of S PAH on grid graphs. We also stud-

ied intra and inter query parallel processing for shortest path computations within

the H iTi graph model framework.

The traditional intra query parallel shortest path algorithm known as 0T0par
is not scalable and was shown to be inefficient on the H iTi graph structure. We
proposed a new parallel intra query shortest path algorithm named PAS PAH . We
implemented PASPAH and MOTOpar on a BBN GP1000, Which has a nonuni-
forn] memory architecture. The BBN GP1000 multiprocessor currently consists of 85
nodes, linked together by a high speed butterﬂy switch. In this system, the globally
shared memory is the sum of the memories local to all processors. Our empirical anal-
y Sis of PASPAH on grid graphs showed that it is more scalable and performs better
‘31) an M OTOpar. However, our analysis also showed that the average execution time
of PAS PAH is not signiﬁcantly better than that of S PAH and M 0T0par. Note

th at M 0T0par is a version of OTOpar modiﬁed to utilize the H iTi graph structure.

Next, we proposed a new inter query parallel shortest path algorithm called
I ‘9 PAH. The performance evaluation of I S PAH was also performed on the BBN
G P 1 000 shared-memory multiprocessor system by using the same grid graphs as those
for PAS PAH . The empirical analysis of I S PAH revealed that I SPAH is scalable
up to 25 processors although the speedup of I S PAH increases almost linearly for up

1‘. .
O 1 0 processors. One of the major performance bottlenecks of I S PAH results from

114

the severe shared-memory access conﬂicts between processors. We then proposed an
improved version of I S PAH named M I S PAH , which reduces the memory-access
conﬂicts through partial data replication. As a result of this replication, M I S PAH
is scalable up to 41 processors. From this analysis, we conjecture that the parallel

processing for inter query SPSP problems is more promising than for intra query

S PSP problems.

We then studied the problems of describing and locating the fragments of large re-
cursive relations in a distributed database. These two problems are not primary query
optimization issues in a centralized database. In a centralized database, distribution
and location of the recursive data are easily handled through hashing or predicates.
Thus, its query optimization is primarily done on shortest path or transitive closure
computations. However, in a distributed database, the above two problems are the
major performance bottlenecks for recursive query processing. This is because the
COSt of processing recursive queries is strongly dependent on the cost of transmitting
information between sites in a distributed database. Traditional use of hashing is not
Sui table for distributed recursive databases, since they cannot capture the referencing
lOCEAJity of data. We used the mathematical properties of lattices to describe and
idelltify the distributed recursive fragments. We found that lattice structures provide
a. C'Dtliplete and sound description of recursive fragments. Our theoretical and empiri-
Cal performance analysis of the lattice approach veriﬁes its effectiveness in distributed

dat abases.

5.2

115
Future Research

Shortest path computation on topographical road maps requires accessing the
road segments on the disk. Since representation of topographical road maps
requires spatial data structures, we need spatial index schemes to enhance the
performance of shortest path computation. Traditionally, R, R+, R*, k-d, k-d-
b, MD, and GDB trees are used for indexing general spatial objects. However,
the above methods are not optimized for the identiﬁcation of road segments.
Thus, more efficient spatial index schemes for the road segments need to be

developed.

Physically clustering adjacent road segments together in the same disk sector is
an important research issue for the performance of shortest path computation.
1f adjacent road segments are not clustered, this will cause very frequent page

faults which in turn will slow down the shortest path computation.

In navigation systems, a shortest path may need to be recomputed because of
changing trafﬁc conditions, such as roadblocking. Instead of recomputing an
entire shortest path, it may be possible to optimize recomputation based on

previously computed path information.

The M I SPAH algorithm given in section 3.3.2 is scalable up to 41 processors.

More research is needed for further improvement.

APPENDICES

Appendix A

HiTi Graph Based Distributed '

’I‘ransitive Closure Algorithm

In a distributed database, H iTi graph can be used for describing and identifying local
relations (i.e. level 1 subgraphs). That is, each boundary node of a level 1 subgraph
1111i quely describes and identiﬁes the corresponding local relation. Furthermore, the
StI‘UCture of H iTi graph allows dynamic decomposition of a transitive closure compu-
tation. By utilizing the above two advantages, we developed an efﬁcient distributed
tram sitive closure computation. We ﬁrst describe two primary algorithms named TC.1
an d TC.2 which will be used for our proposed distributed transitive closure algorithm.

Their descriptions are given in Figures A.1 and A.2.

116

117

 

/* U : Relation U(a:,y); J. site number; */

/* k : the highest level number of a subgraph tree ST; */
begin

i— - 1; X: {30}; skip: TRUE,

BN’-— - boundary node information of SC(S’+1(X));
R} = FU.,.,nyBN}.sg,BNJ‘.site(U Ddy=:c BNj);
ifR' =0then{i=i—1;return((ll,z)}

L1.
WB;=(SB(SC(S'I,“(X)))USW(SC(S',+1(X))));

R— — 7IrR'..1:,WB".,y R; .,sg R' .site(Rj' Dal/=3 WB Ji)
while (R 75 0) {

skip = FALSE, R3: Rj U R,

R“ - "news; .,yBN;. sg,BN;. sitc(R Wyn WBil- R}? }
R; = ”I? .,.w: R' .y BN1. .99, BN'. sitc(Rj' My=3 BNj');
if (i— - - 1) then return(R1-, R2, , R}, i);
BN‘-+1k = boundary node information of SC(S'+2(X));
Rf?- = "R' .,zR' .;y,BN .sg,BN'-. site(Rj' Mv=r BN‘IH);
if (skip = TRUE) then R‘ = 0;
if R‘“- = 0 then return(R‘-, R3, ..., RE, i);
else{X= S'+1(X); i=i+1;

skip- — TRUE; goto Ll; }

end

 

 

 

Figure A.1: Algorithm TC.1: AscendingPhase (U,j,k)

These two algorithms use three different types of relations such as WBJ':(:r,y),

B N;(:r, sg, site), and R;(:::, y, sg, site). Relation WBj(a:,y) consists of tuples t where

t corresponds to either a level i between or within edge. Relation BN;(x,sg, site)

C011 tains tuples t where t.a: corresponds to a level i boundary node, t.sg corresponds
to the notation of the level i subgraph to which t.x belongs, and t.site corresponds to
tllﬁ site number storing the level 1 subgraph to which to: belongs. We assume that
the: subscript of each level 1 subgraph corresponds to a site number. For example,

hQ local relation corresponding to SC}3 lS stored at Site 3. Relation R3 1S used as

118

a working relation for computing a distributed transitive closure. Thus, each site j
has Rj-(x,y,sg, site) for each level i where 1 S i S k. Note that we use the same

notations as those deﬁned in the chapter 2.3 in TC.1 and TC.2 algorithms.

 

/* Z : Relation 2(219139135te); */
/* j: site number; i: level number; */

begin
x = STUD;

EN; = boundary node information of SC(X);
Z = Wz.:,z.,,,s~;.ag,s~;.me(Z NF, BNi')?
WBj = (SB(SC(X)) u swam»;
R = 7rZ.x,WB;.y,Z.sg,Z.site(Z ”11:2 W8”;
while (R # 0) {
skip = FALSE; Z = Z U R;
R = 7rR.z,WB;.y,R.sg,R.site(R NF, WBi) — Z i }
if (skip = FALSE) then
Z = 7rZ.z,Z.y,BN;.sg,BN;.aite(Z NF, BNJi);
return(Z);
end

 

 

 

Figure A.2: Algorithm TC.2: DescendingPhase (Z, j,i)

Algorithm AscendingPhase() traverses Hi Ti graph by following edges with non-
decreasing level number. To demonstrate Algorithm TC.1, we use the digraph and

its corresponding HiTi graph shown in the Figures A3 and A.4 respectively.

For site 1, which stores the local relation corresponding to so}, we have we;
e { (4,6), (5,7), (6,10) }, we,2 = { (10,11), (11,17), (17,19), (17,20) }, 131v,l = {
(4. so}, 1), (5, so}, 1), (0, 50;, 2), (7, 36;, 2), (10, so), 2) }, and 131v,2 = {
( 1 O, SG¥, 2), (11, 303, 3), (17, SGg, 4), (19, 50%, 5), (20, SC; 5) }. Assume that

at site 1, AscendingPhase(U, 1, 3) is invoked where U = { (4,4) }. The outputs of

 

 

 

Figure A.3: C(V, E) partitioned according to level 3 subgraph tree

 

. n 2 l1 2 I! 23 I 3
l ' 2 l “ i 1’ 3 an/
.3 , 7. IO/ I I6 .

Figure AA: A level 2 HiTi graph created from Figure A.2

AscendingPhase(U, 1, 3) are stored in relations R} and R? where R} = { (4, 6, SC;
2), (4, 10, 30.3,, 2) } and Rf = { (4, 11, .963, 3), (4, 17, 30;, 4), (4, 19, 503,, 5), (4,

20, 50:21, 5) }-

In contrast to AscendingPhase (U, j, k), DescendingPhase (Z, j, i) traverses
Hi Ti graph by following edges with a given level number. We use Hi Ti graph shown
in A.4 as an example to demonstrate Algorithm TC.2. Assume that at site 3,
l3'E=Sc.e11di11gP11ase(Z, 3, 2) is called where Z = { (4, 11, 503, 3), (4, 17, SC;, 4)
} 7 Set X is initialized with SC}3 and it becomes to contain 5C3. Then, site 3 has
BN3 = { (11, sag, 3), (14,301,3), (15, 561, 4), (16, $03,, 4), (17, 50;, 4) }, and
WE; = (11,14), (14,16), (16,15), (16,17), (15,14) }. After the execution of while
1001) and the join with BNg, Z = { (4, 11, SCQ, 3), (4, 17, SCI, 4), (4, 14, SC;, 3),

(4. 10, so}, 4) }.

120

Based on the above two algorithms, we discussed how to compute transitive clo-
sure in distributed database systems. In our distributed transitive closure algorithm,
we have two types of sites where a transitive closure is computed. They are coor-
dinator and participant sites. A site is named a coordinator if a transitive closure
computation is initially started at this site. The coordinator decomposes a transitive
closure computation into a set of sub-computations if needed. A site is called a par-
ticipant if it performs the sub-computation by the request of either a coordinator or
another participant. We give two algorithms TC.3 and TC.4 in Figures A5 and A.6

where the former is for a coordinator and the latter for a participant.

 

/* H: a set of nodes; */
/* TC: transitive closure of nodes in H */
/* k : the highest level number of a subgraph tree ST; */
begin
D] = "Halal-.171” Dds/=4c Ei); D2 = 0;

D2 = Dz U 01;

D] = "01.x,E;.y(DI NF: Ej) " Dz; }
AscendingPhase(Dg, j, k) returns (R}, R?, .. ., RI, 2);
for (m = i downto 1 by -1) {

Partition Rg" such that each partition 1"

share the same value of Ry‘sg;

Let 2 consists of PS of R3";

for each F 6 2 {

s = a majority value of f‘.site;
SendParticipant (F, m, J, 3); } }
TC = 02 U Wait_Result.From_Participant(R,*);
/* * represents a site number */
end

 

 

 

Figure A.5: Algorithm TC.3: CoordinatorSite j(H,TC,k)

121

 

/* F : Relation I‘(2:,y,sg, site); */
/* 171: level number; j: coordinator site number; */
begin
Receive(F, m, j, 4:);
/* * represents a site number */
if (m = 1) {
D = 7rl‘.a-,Ejl.y(r M =1: Ej)i R = 7rx.y(r)i
while ( D 75 0 ) {
R = R U D;
D = "D.z,EJ‘.y(D Dds/=3 Ej) — R; }
SendCoordinator(R, j); }
else {
DescendingPhase (F, j, m) returns Z;
Partition Z such that each partition T
share the same value of Z.sg;
Let 2 consists of Ts of Z;
for each T 6 Z {
Check if any value of T.site is 3;
If yes then h = 3;
else h = a majority value of T.site;
If (h 75 s) then notify coordinator site j
that site h will send the coordinator the
partial results of the transitive closure;
SendParticipant (T, m — 1, j, h); } }
end

 

 

 

Figure A.6: Algorithm TC.4: ParticipantSite 3(1‘, m, j)

To demonstrate Algorithm TC.3, we also use a recursive relation C(V, E) and
Hi Ti graph shown in Figures A3 and A.4 respectively. Assume that a recursive
query “Find all descendants of node 3” is issued at site 1. Then, site 1 becomes
COOrdinator and algorithm TC.3 is executed at site 1. Algorithm TC.3 ﬁrst computes
a. transitive closure of node 3 (i.e. H = { 3 }) with respect to SC} (V11, E11). As a
reSult, D; = { (3,4) } and AscendingPhase (D2, 1, 3) is called. After the execution

of AscendingPhase (Dz, 1, 3), we got i = 2, RI = 1 (31 6, 56;, 2), (31 101 5617

122
2) } and R? = (3, 11, SCg, 3), (3, 17, 5C3, 4), (3, 19, 5C3, 5), (3, 20, 5C3, 5)
}. R? is partitioned into two relations F, = { (3, 11, 30;, 3), (3, 17, SC;, 4) }
and F2 = { (3, 19, 503,, 5), (3, ‘20, 503,, 5) }. Next, coordinator site 1 sends its
decomposed sub-transitive closure computation requests to site 3 and 5 by executing
SendParticipant (1‘1, 2, 1, 3) and SendParticipant (F2, 2, 1, 5). Note that for 1“,, site
4 could be chosen as a participant since there is no majority site number of elements
in 1‘]. For RI, since all tuples share the same RI.sg (i.e. SCé), we have only one F;
= R} in Z and SendParticipant (1‘1, 1, 1, 2) is executed. Algorithm TC.4 is executed

at a participant site upon receiving the sub-transitive closure computation request

(e.g. SendParticipant (F1, 1, 1, 2)) from a coordinator.

To show 110w participant sites work, SendParticipant (1‘1, 2, 1, 3) and SendPar-
ticipant (1‘ 1, 1, 1, 2) mentioned in above paragraph are considered as examples. For
SendParticipant (F1, 2, 1, 3), since m = 2, site 3 ﬁrst calls DescendingPhase (F1, 1,
‘2) and gets Z = { (3, 11, SC},, 3), (3, 14, 503,, 3), (3, 16, SC}, 4), (3, 17, SC},

4) }. Relation Z is partitioned into two relations T1 = { (3, 11, SC?” 3), (3, 14,
490.33, 3) } and T2 = { (3, 16, SC}, 4), (3, l7, SCI, 4) }. For T1, site 3 sends its
deCOI‘DpOSCd transitive closure computation requests to itself by executing SendPar-
ticIipant (T1, 1, 3, 3). For T2, site 3 ﬁrst notiﬁes the coordinator site 1 that site 4
Will send partial results of the transitive closure. Site 3 then sends its decomposed
$11 b-transitive closure computation to site 4 by executing SendParticipant (T2, 1, 3,
4) - As for SendParticipant (F1, 1, 1, 2), since m = 1, site 2 computes a transitive

c1Osure with respect to SC; (V21, E3). Relation R contains { (3,6), (3,8), (3,9), (3,10)

123

} and site 2 sends R back to the coordinator site 1 by executing SendCoordinator (R,

1).

Appendix B

Empirical Analysis of Computing
Shortest Path for Road Map

Queries

B.1 ' Performance comparison of shortest path al-

gorithms

In this section, we empirically compare the performance of Dijkstra’s and Nicholson’s
Shortest path algorithms on road map graphs. Their pseudo-codes are described
i 11 Figure RI and R2 respectively. For the detailed demonstration of Nicholson’s
E11 gorithm, readers can refer to Nicholson’s paper in [77]. The inputs to the following
aJgorithms are graph C(V, E), source and destination nodes 3 and d respectively.

124

125

Note that [(x,y) represents the cost associated with each edge (x,y) 6 E.

 

1. For each node 11 6 V, A(u) = 00.

Let Ms) = 0, FS = {s}, and E5 = 0.

Select a node v. in PS for which Mn) is minimum.
FS = FS — {u} and ES = ES U {u}

If (u = d), then Mn) is the shortest path cost and Stop.

sweep

For every edge (u, v) in E, if A(v) > Mu) + l(v, v),

(a) Let /\(v) = Mu) + l(u,v).
(b) Let FS = FS U {v} if v ¢ (FS u ES).

 

 

7. Go to step 3.

 

Figure 8.1: Dijkstra’s shortest path algorithm

To compare the performance of Dij kstra’s and Nicholson’s algorithms on road map
graphs, we create two dimensional grid graphs CR(VR, ER) with 4 adjacent nodes. The
grid has N at N nodes where each row and column includes N nodes. Thus, in a graph
G3, IVRI and IERI are equal to N * N and 4 at N at (N - 1) respectively. These two
di mensional grid graphs are considered as typical examples of road maps [64]. Based
On the above scheme, we create a grid graph CR where IVRI = 200 a: 200 and lERl =
4 * 200*199. The cost of each edge in ER is generated based on a uniform distribution

[1 , 100] with 10 different seeds. As a result, we have 10 different grid graphs CRs.

For each grid graph created above, we compute 50 different shortest paths for
‘30 randomly preﬁxed pairs of source and destination nodes. Let D] J (N I C) be
the total number of nodes visited at step 6 (at step 5.(a).i and 6.(a).i) by Dijkstra’s
( Nicholson’s) algorithms. Note that we also count the nodes re—visited for DI J (N I C )

We compare the two algorithms by observing the ratio D] J / N I C . These values are

 

126

 

 

99°!"

For each node u 6 V, A,(u)=/\,(u)=oo. Let FS, = FS, = 0 and 2: = y = 0.
For each edge (s,u) in E, A,(u) = l(s,u) and FS, = FS, U {v}.
For each edge (d, v) in E, A,(v) = l(d, v) and FS, = FS, U {v}.

Let C, = min/Mu)” A,(u) and C, = minhm” A¢(v) where u 6 FS, and
U 6 FSg.

If (Cs S Ct):
(a) For each m for which A,(m) = minA’M” A,(u) for all u 6 FS,
i. For every edge (m, u) in E, if A,(u) > A,(m) + l(m,u),
A. A,(u) = A,(m) +l(m,u)
B. if 11 ¢ FS,, FS, = FS, U {u}.
(b) Reset :r = minA'M» A,(u).
. If (C; < C,),

(a) For each m for which At(m) = minMv)» A¢(v) for all v 6 FS,
i. For every edge (m, U) in E, if A¢(v) > A,(m) + l(m, v),
A. A,(v) = A,(m) + l(m,v)
B. if v ¢ FS,, FS, = FS, U {U}.
(b) Reset y = min/MM» A¢(v).
If minuer,qu. (481“) + Atlull S (minA.(w)>:c 8:0”) + minA¢(v)>y 81(0))

where w 6 FS, and v 6 FS,, the shortest path cost is minu(z\,(u) + At(u))
and Stop.

. Go to step 4.

 

Figure B.2: Nicholson’s shortest path algorithm

 

127

then averaged over the 10 different grid graphs with same source and destination

nodes. These are shown in Figure B.3.

 

DIJ
NIC

 

 

 

 

0 5101520253035404550
the number of shortest paths

Figure B.3: Performance comparison between Dijkstra and Nicholson’s algorithms

Figure B.3 clearly shows that Nicholson’s algorithm visit far less nodes than Dijk-
stra’s when they are applied to grid graphs. It is interesting to observe that the ratio
DI J / N I C is small for shorter paths. This happens when source and destination

nodes are closely located in the graph. However, even in this case, Nicholson’s algo-
rithm still performs slightly better than Dijkstra’s. Therefore, we can conclude from
Figure B.3 that Nicholson’s algorithm is superior to Dijkstra’s when graphs capture

road maps.

Next, we present our shortest path algorithm in Figure 3.4 which ﬁnds shortest
path by simultaneously constructing two trees, one rooted at the source node and
tlle other at the destination node. This simultaneous expansion of two trees is the
primary difference between our and Nicholson’s algorithms. Note that the two trees
in Nicholoson’s algorithm are expanded alternatively. That is, if one tree has the
IZlext shortest route than the other, the one with the next shortest route is expanded.

This alternating tree expansion may cause one tree to be expanded more than the

128

 

 

PPN

9"

5090719»

10.

11.

12.

13.

. For each node n 6 V, A,(u)=/\¢(u)=oo and Marked(u) = 0.

Let A,(s) = A,(d) = 0, Marked(s)=1 and Marked(d)=2.
Let FS,={s}, FS¢={t} and E, = E, = 0.

Select 711, and m, respectively from FS, and FS, such that A,(m,) and At(mt)
are minimum. Let CI = A,(m,) + At(m,).

Let FS, = FS, — {711,}, ES, = ES, u {m,}, FS, = FS, — {m,}, E3, =
ES¢U{m,}.

Obtain I = { u I Marked(u) = 2 for all u 6 FS, }.

[f I 75 0, C2 = min,€1(/\,(u) + A,(u)). Otherwise C2 = 00.

If (m, = 171,), the shortest path cost is min(C1,C2) and Stop.
1f (C2 5 C1), the shortest path cost is C2 and Stop.

If (Marked(m,) = 0), Marked(m,) = 1. 1f (Marked(m,) = O), Marked(m,) =
2.

For every edge (m,, u) in E,

(a) If A,(u) > A,(m,) + l(m,,u),
i. A,(u) = A,(m,) + l(m,,u)
ii. If u 9! (FS, 11 ES,), FS, 2 FS, u {ii}
For every edge (m,, U) in E,

(a) If )1,(v) > A¢(m¢) + l(m,,v),
i. /\¢(v) = /\¢(m¢) + l(m,,v)
ii.1fv g (FStUESg), FS¢ = F5: U {U}

Go to step 4.

 

Figure 3.4: Our two tree expansion shortest path algorithm

 

129

other. As a result, this unbalanced tree expansion may delay ﬁnding the shortest
path. However, in our approach we expand the shortest paths of the two trees. In
order to show the performance comparison between these two algorithms, we used the
same data set and methods as we used for those between Dijsktra’s and Nicholson’s.

This is shown in Figure B.5.

 

1.2IIIIII1II

I}: NIC/JUN e-
1 1

v

NIC 1-1
/ 110111
JUN 1:06
1.04

1.02

0.9; l l 1 l 1 l l l l

0 51015202530354045

the number of shortest paths

4
‘2

 

lllTlflll

lllllll

 

 

O1
C

Figure B.5: Performance comparison between Nicholson’s and Our algorithms

In Figure B.5, JUN represents the total number of nodes visited (including the
revisitation) at step 11.(a) and 12.(a) by our algorithm. Figure B.5 clearly veriﬁes
Our conjecture that Nicholson’s alternating tree expansion delays ﬁnding the shortest
Path. From this ﬁgure, we can conclude that our algorithm on the average always

performs better than Nicholson’s.

B .2 Conclusion

111 this appendix, we studied a single pair shortest path problem on the domain of
I‘Oad map queries. We have compared Nicholson’s shortest path algorithm with that of

Dijkstra. Our empirical analysis shows that Nicholson’s two tree expansion approach

 

130

outperforms Dijkstra’s one tree expansion method for road map queries. We prOposed
a more efﬁcient two tree expansion shortest path algorithm than Nicholson’s. This is

justiﬁed through empirical analysis.

Appendix C

A Survey of Database Problems in

Intelligent Transportation System

C. 1 Introduction

Intelligent Transportation Systems, ITS, are increasingly being used to automate the
navigation function of the automobile. The navigation infrastructures such as Global
Positioning Systems, GPS, and beacons have a profound impact on the design and
development of modern navigation systems. Along with this, ITS also requires power-
ful software systems including very large database system support. The development
of these infrastructures requires substantial resources involving coordination on a
mammoth scale. The US. government has played a major role in providing the nec-
essary leadership in the development of these infrastructures. Software systems and
databases have been developed by governmental as well as non-governmental agencies

131

 

 

132

such as Universities and corporations. These infrastructures and the software systems
are built on top of large digital map databases. Much work has been done in the area
of Geographic Information Systems to create large databases of digital maps. In this
paper we survey the database work which is relevant to ITS. We also discuss other

important work that is necessary for an understanding of ITS.

ITS must be able to look up information on a digital map in order to automate
navigation. A digital map not only stores location information about objects, but also
shows the relative locations of objects. These objects are of differing dimensions such
as points, line segments, and regions [106]. Multidimensional objects require more
complex modeling and processing than those required in traditional applications. A
database design for ITS must also capture the spatial relationships between these
objects. One important spatial relationship is that of neighbor. The clustering of
neighbor objects in the same area of the disk is an important database research
issue [93]. A competing topic is the dispersion of stored neighbor objects to different

disks to enable multiprocessing [55].

The database structures for these very large databases must be effective in quickly
accessing multidimensional objects. For example, the map database stored for a
system developed locally for the Twin Cities in Minnesota is approximately 10
Mbytes [92]. A system which uses country or continent wide data will access a much
larger database. Efﬁcient organization of large multidimensional databases for fast
access is a paramount database research issue in this area. Accessing large databases

of multidimensional objects requires efﬁcient spatial indexing schemes. In section 0.2

J
Ilj‘l

133

we discuss such indexing schemes. The currently available commercial navigation
systems use one or more of these index methods. However the actual details of the
indexing scheme used by particular current commercial systems is not known because

they are proprietary.

The currently available position systems, such as GPS and dead reckoning, play
an important role in the design of these database systems. If the GPS system is
used it automatically determines latitude, longitude, and altitude [1]. However, the
returned position is not accurate. If dead reckoning is used, the position of a vehicle
is known precisely at certain locations. As the vehicle travels, the new location is
calculated. Dead reckoning systems often use a beacon to ﬁnd the initial known
position. The position of the vehicle is then calculated as a displacement from the
known position as the vehicle moves. This calculation of displacement is also prone to
errors. An important issue in designing ITS is to minimize these errors. The correct
determination of a vehicle’s current location is essential for the successful operation
of an automated navigation system. Positioning technologies such as GPS and dead

reckoning are often combined to deliver the desired accuracy needed by ITS [61, 67].

In ITS, quick real time response to large database queries is essential. The simplest
query in ITS is “where am I?” This query is answered partially by the positioning
technology which determines the coordinates of the vehicle. These coordinates are
usually mapped to an object in the database such as a landmark, a road, or a region.
An important database design issue is the representation of the inherent hierarchy of

the data such as roads and landmarks within cities, and cities within counties. An

H-i

 

 

134
example query using this hierarchy is ”List all counties in Michigan which intersect

Interstate Highway 194.” A number of typical road map queries have been discussed

in [92].

Another issue of importance is the placement of the data; on board the vehicle or
at a central site. If the data is 011 board the vehicle, then each vehicle must have a
copy of the necessary data in order to answer the queries. Real time updates through
a central system is expensive. These updates must be communicated in real time for
up—to—date queries at the vehicle. If the data is maintained at some central site, the
vehicle must access this remote database in order to receive the data or answers to
its queries. 111 this model the maintenance of the database is simpler but at the cost

of increased communication overhead.

The remainder of this paper is organized as follows. In section C2 of this paper ‘

we describe the data model for ITS, and discuss classes of road map queries for ITS
in section C.3. In section C.4 we brieﬂy review the existing positioning technologies.
Section C.5 describes a number of commercially available intelligent transportation

systems.

C.2 Data Models for Vehicle Navigation Systems

ITS databases contain information about objects such as landmarks, roads, and ge-
ographic regions We can classify these objects as O-Dimensional, 1-Dimensional, 2-

Dimensional, and 3-Dimensional objects [106]. 0-Dimensional objects are points and

 

 

135

represent landmarks such as building sites or intersections of roads. A typical 1-
Dimensional object is a road segment, and it is a primary object for road map queries.
Roads are formed from straight line segments which in turn form the network of roads.
Examples of 2-Dimensional objects are geographic regions such as cities, counties, and
states. 3 dimensional objects are necessary for recognizing certain types of landmarks.
A11 object such as a building must be described by latitude, longitude, and altitude.
Efﬁcient spatial indexing schemes for these multidimensional objects are critical to

the performance of ITS queries.

C.2.1 Indexing Schemes for accessing multidimensional road

objects

Indexing schemes for ITS databases can be classiﬁed into 2 groups based on the type
of object indexed. They are point indexes for point objects, and indexes for higher
dimensional objects. We ﬁrst discuss point indexes. In order to index objects such
as line segments or regions, each vertex is a point to be indexed. Alternatively, the
object may be considered as a point in a higher dimensional space. Point indexing

schemes include quad-trees, oct-trees, point quad-trees, k-d trees, k-d-B-trees, and

the grid ﬁle.

A quad-tree is an indexing scheme for objects in a planar region. A planar object
consists of a collection of points in the plane. The quad-tree indexes the planar object

by dividing the planar region into regular quadrants. This is a recursive process, and

 

136

halts when all points in a quadrant are of the same type, i.e., part of the object or not
part of the object. The oct-tree is a generalization of the quad-tree to 3 dimensional
space. The quad-tree can be further generalized to any k-dimensional space. Quad-
trees are efficient mechanisms for computer vision, but not for ITS. The quad—tree (or
oct—tree) is not height balanced. The point quad-tree (or point oct-tree) picks points
of interest at which the plane (or 3—dimensional space) is subdivided into unequal size

quadrants. This gives an index tree which is closer to being height balanced [88].

The k-d tree is a generalization of the point quad tree. A k-d tree is a binary tree
in which k levels of the tree are necessary to partition the k-dimensional region into
2" regions. One signiﬁcant difference between k-D trees and point quad-trees is that

in k-d trees these 2“ regions do not meet in a single point [88].

The grid ﬁle approach partitions a k-dimensional space into non-uniformly sized
k—dimensional grid blocks in order to index the data points located within the
blocks [76]. This grid ﬁle approach is not suited to indexing spatial objects of higher
dimension than points. The primary reason is that k-dimensional objects are mapped
into points in 2k-dimensional space in the grid ﬁle approach. As a result, most grid
blocks will not be used. Various versions of the grid ﬁle have been proposed in order
to improve their performance [28, 63, 96, 12]. Freeston [28] has developed a nested
and balanced grid (BANG) ﬁle which has a tree structured directory. This tree struc-
tured directory has the self height balancing B-tree property. Kriegel and Seeger [63]
have proposed a method called PLOP-Hashing to eliminate the grid directory. This

is a multidimensional extendible hashing scheme which expands the pages of the ﬁle

 

 

137
corresponding to grid blocks in a piecewise linear fashion. However, the problem
with PLOP-Hashing is that it can not fully capture natural ordering of higher di-
mension spatial objects. Six and Widmayer [96] present a multilayer paradigm for
transforming point index structures (such as the grid ﬁle) into index structures for
k-dimensional intervals. This paradigm allows the preservation of the natural order-
ing of interval data. Blanken et. al. [12] propose a generalized grid ﬁle which more
fully preserves the natural ordering between geometric objects. The generalized grid

ﬁle integrates the B+-tree, the grid ﬁle, and the k-d-B-tree.

The second group of indexing schemes are better suited to indexing k—dimensional
objects in k-dimensional space. The most important indexing scheme is the R-
Tree [38]. The R—tree is a generalization of the B—tree to higher dimension objects.
All objects are in the leaves of the tree, and the leaves are all at the same level. The
R-tree functions by enclosing objects in rectangles. Nodes of the R-tree are formed
by enclosing groups of rectangles in a larger rectangle which becomes an R-tree node.
A problem with the R-tree structure is that these rectangle nodes in the tree overlap,
which causes the lookup procedure to follow multiple tree paths. Several modiﬁca-
tions to the R—tree have been developed to alleviate this problem, such as the R+-tree,
the R*-tree and the packed R-tree [10, 35, 79, 87]. These alterations to the R—tree
prevent overlap (R+-trees) or minimize the overlap by using a complicated algorithm
(R*-trees). A weakness of the R—trees is that rectangles are the stored object. An-
other weakness is that R-trees have poor storage utilization. Objects such as diagonal

line segments and convex objects are not well represented by bounding rectangles.

 

 

138

The cell tree is a modiﬁcation of the R—tree in which the bounding regions do not
need to be rectangular [36]. Jagadish uses bounding polyhedra in the P-tree [54] to
eliminate the space wasted by bounding rectangles. Jagadish has also proposed a
line segment index scheme based on the Hough transform. This index scheme always
outperforms minimum bounding rectangle index schemes for retrieving line segments

which go through a speciﬁc point or intersect a speciﬁc line segment.

Several structures improve the worst case storage utilization found in R-trees. The
MD—tree [75], like the R-tree, is a generalization of the B-tree to higher dimensions
utilizing bounding rectangles. The storage utilization of the MD-tree has a worst
case of 66.7% due to the dynamic readjustment of split regions. The GBD-tree [78]
is similar to the R-tree, but uses extra data to increase the storage utilization. GBD-
trees also speed the processes of insertion and deletion over R-trees, but at a higher
memory cost and at a higher CPU. cost for large extent data. Freeston has pro—
posed a generalization of n-dimensional B-trees in which the worst case performance
is controlled [29]. This solution recursively partitions the data space, guarantees loga-
rithmic access and update, and has a worst case 33% storage utilization. The hB-tree
proposed by Lomet and Salzberg [69] has competitive storage utilization. The hB-tree

differs from B+-trees in that the index terms are organized into k-d trees.

Kolovson and Stonebraker propose segment indexes [62] as an extension to
database indexing structures such as R-trees. These segment intervals improve the

search performance on interval data with non—uniform length distributions.

 

 

 

139

C.2.2 Large Database Issues

The amount of stored data is massive. If the data is kept entirely at the vehicle,
then the vehicle system is responsible for all queries and updates. This requires a
powerful system in the vehicle, both in terms of storage space and processing power
in order to access the information. External communications will be minimum for
this conﬁguration. If the data is stored entirely at some site outside of the vehicle,
then the vehicle system will be primarily a communications system. The external
processor(s) must be extremely powerful since all the vehicles in the processing area
will be sending their information requests to that external site. A third alternative
is to distribute the data and processing between the external site and the vehicle. In
this case, the vehicle can download the needed data from an external system, then
answer queries on that downloaded data. Troy Michigan has a system developed by

Siemans which uses this distributed approach.

There are currently a number of systems under development and testing. These
systems are restricted to a small geographic area. In order to develop a system
which is viable in a large scale system, such as the continental United States, large
database issues such as the clustering of data must be explored. Shekhar et. al., [94]
discuss a partitioning technique based on max out partitioning of a similarity graph
whose nodes are data items. The similarity is the probability that the data items will
be accessed together. This technique assigns the data items to distinct disks for a
speed up using parallel processing. The size of the database is indicated by Shekhar,

et.al [92], in that the storage size of the data for the Twin Cities metropolitan area

11"

 

 

 

140

in Minnesota is 10 Mbytes. Jagadish discusses clustering of data so that objects
which are close in multidimensional space are close together after mapping into a one

dimensional [55] in order to minimize disk accesses.

C.2.3 Update

There are two different types of algorithms which are useful in automated navigation.
A real-time system must be continuously updated according to the changes in road
conditions. This also implies that the shortest route is dynamic; that is, the route
planned at the beginning of the trip may not be the route ﬁnally taken. Methods to
alter an already computed path using the computed path information could be faster

than completely determining the path from the new current location.

For long trips, a pre—computed path may be used as part of the overall trip. If
it can be determined that the route from source to destination must pass through
certain locations, then shortest path information which is already computed between

those locations can be used in order to improve the speed of the route calculation.

Updating the multidimensional objects in the database can be costly. The cost
depends on the storage mechanism used for the objects. There is also the problem of
changing the stored information about the relative locations of objects. For example,

if a road is closed, all the information about the intersecting roads must be updated.

El—"_

 

a."

I" ..-4
I
0'

 

141
C.3 Queries

011 a 0-D object, a typical query is to ﬁnd the landmark (or nearest landmark) for
a particular location. Locations are to be described by latitude and longitude. The
queries on l-D objects are more common, since they include all the ﬁnd the path types
of queries. Some of these queries are: ﬁnd the shortest path between two locations;
ﬁnd the shortest path between two locations going through some list of other locations;
ﬁnd the shortest path between two locations not going through any locations in
some list. There is also the query to determine which road segment goes through a
particular location. The database must include the connectivity information about
road segments in order to answer these queries. One practical problem that arises is
that there is as yet no accurate large area map. The most common starting point for
automated navigation is the Tiger ﬁle produced by the US. Census Bureau. These

ﬁles have only road map accuracy.

For 2-D objects, typical queries are: which region includes a particular landmark,

which region includes a particular road segment. Another query type is to determine '

the neighboring regions of a particular region.

Of all these types of queries, the predominating ones are the ﬁnd the path queries.
These types of queries are usually recursive, and can be answered by many types of
algorithms, such as transitive closure, recursive query processing, partial transitive
closure, depth ﬁrst search, and A* [2, 3, 4, 7, 8, 21, 22, 26, 30, 40, 42, 46, 48, 49,

50, 52, 53, 57, 59, 70, 99, 101, 103]. The interface between the system and the driver

gulf

.- -.‘;§i; Z.-

 

142

must be as unobtrusive as possible. A distracting system could become the cause of
accidents. Some possibilities include voice recognition or point to menu choices for
input. For output, voice is the least distracting; reading a map with a blip to indicate

a location on a map is more distracting.

C.3.1 Optimization

Storing the data in an hierarchic fashion can help optimize the query processing. One
method of storing road map data which has been explored is storing the road infor-
mation in a hierarchy. That is, interstate expressways are at level 1, state highways
are at level 2, and county roads are at level 3. An example of the use of this in ﬁnding
the path algorithms is that a path from a source to a destination uses level 2 and level
3 roads in the immediate area of the source and destination; level 1 roads are used to
ﬁnd a path between these areas. This type of method does not guarantee an optimal

path, however, the amount of processing to ﬁnd a good path is greatly reduced.

In general, the queries must access objects of various types. Indexing schemes such
as k-d-trees and point quad trees are efﬁcient at ﬁnding point data (regions can be
transformed into higher dimension points). These methods suffer from an unbalance
in the tree heights. This leads to multiple disk accesses for a single object. The
methods based on B-trees, such as the R-tree, are better in terms of ﬁnding a leaf
node in the tree which contains the desired search object, however, multiple leaves
may be discovered in the search due to overlap. The MD-tree and GDB-tree were

developed to decrease the number of disk accesses in a search for an object. Shekhar

 

143

and Liu have presented a clustering method for storing objects such that connected
objects are likely to be stored in the same disk sector [93] Shekhar and Yang have
described MoBiLe Files in which the population density of objects determines the

sector location for storage [90].

The number of processors available to answer queries is a vital factor in optimiza-
tion. For a single processor system, storing neighboring objects in the same bucket
is efﬁcient, since all probable desired objects can be retrieved with a single disk ac-
cess of that bucket. In a multiprocessor environment, it is more efﬁcient to store the
neighboring objects in different buckets on different disks. All probable desired search

objects can be accessed in parallel.

C.3.2 Query Types

We divide the types of queries into two categories, shortest path, which has been
extensively covered in the literature, and all other types of queries which we discuss
in this section. One query is to retrieve certain information about an object in
the database. For example, return the name of a road at a particular latitude and
longitude; list the nearest landmark(s) (building, road intersection, etc.) to a certain
location. Geographical data is inherently hierarchical. Queries are often asked in this
context: list all the restaurants in a particular city; list all the roads in a particular
county which are currently undergoing repair. Another classiﬁcation of queries is
by neighborhood. List all the national parks near 196; list all public colleges within

200 miles of home. A query such as list all the roads which intersect Maple street

[RAJ—A- " _

weave :
.

 

144

within a particular county will use hierarchic information as well as the connectivity

information which is necessary to answer shortest path queries.

C.4 Positioning Technology

One of the main components of a navigation system is the current position determi-
nation system. There are a number of hardware systems available for determining
the current position. Typical examples of them include global positioning system
(GPS), differential global positioning system (DGPS), Loran-C, dead reckoning, and
map matching. The above systems can be used either independently or combined
in the navigation system to enhance the accuracy of the vehicle’s position. In this

section, we discuss the above systems.

The U .S. government has placed satellites in orbit around the earth. GPS func-
tions by sampling signals from several satellites. The hardware systems are able to
use this sampled information to determine the absolute current position of the ve-
hicle in latitude and longitude. GPS is inaccurate and the error can be as great as
100m [61]. Many papers have been published which show methods of reducing this

error [1, 27, 100].

DGPS is a combination of on-board GPS hardware and GPS hardware at a base
station. Here, the precise location of the base station is known. It is assumed that the
positional error, determined by on-board GPS hardware, is identical to that deter-

mined at the base station. By adjusting the reported position of the vehicle according

 

 

145

to the error at the base station, the error factor can be reduced to as little as 5m.
Problems with DGPS are that the base station must communicate with the vehicle,

requiring additional capabilities on the receivers [27].

Loran-C is a system for sending location information by radio signals from land
based sites. Loran-C signals are propagated along the earth’s surface. The vehicle
hardware samples signals from multiple transmitters, and determines a location. The
positional error of Loran-C is 500m, but can be reduced to 100m. This compares

favorably with that of GPS [66].

Dead Reckoning is a method of locating one’s current position on a map, given
an accurate starting position and the vectors representing headings and velocities of
vehicles. The instruments that are used in dead reckoning to measure changes in a

vehicles direction and speed includes:

0 Wheel odometers for measuring distance, these are subject to changes in tire
inﬂation, tire tread wear and slippage.

o Compasses for determining direction, these are subject to errors due to the
magnetic ﬁeld of the vehicle and changes in the earth magnetic ﬁeld in different
locations.

0 Gyroscopes for measuring angular velocities of the vehicle, they suffer from
changing temperatures which cause drifting problems.

0 Accelerometers for measuring changes in acceleration, both Gyroscopes and
accelerometers cannot accurately measure small changes in movement [67].

It is possible to precisely determine the current position if a complete and accu-

rate information about a vehicles movements and its starting point is given. However,

in reality, the information gathered from the instruments for a vehicle’s movements

 

 

 

146

is likely to be inaccurate. As a result, the dead reckoning method accumulates er-
rors as the vehicle continues to move [67]. This error accumulation is the primary

disadvantage of the dead reckoning method.

Map matching is a method of correcting errors in vehicle’s position caused by
either GPS or dead reckoning systems. It tries to locate a logical path on the map
that closely resembles the physical path traveled. For example, if after traveling 5
miles from the starting point the vehicle turns 90 degrees to the right, the map—

matching algorithm will search for a road to ﬁt the new direction.

C.5 Current Systems

In this section, we describe currently operational vehicle navigation systems in Eu-
rope, U.S.A. and Japan. The current vehicle navigation systems can be classiﬁed
based on 3 different positioning technologies. They are systems based on dead reck-
oning with map matching, GPS, and the combined use of GPS and dead reckoning

with map matching.

The systems based on the dead reckoning with map matching include Ali Scout,
Euro-Scout, Autoguide, Carin, and ETAK Travelpilot [14, 15, 18, 85, 97, 98]. Among
these system, ETAK Travelpilot is a stand-alone system which does not require any
infrastructure [14, 18]. Its map databases are stored in one CD-ROM disk. Trav-
elpilot provides two major functions to a driver. One is to display to the driver a

road map of the area surrounding the car. The other is to locate destinations by the

 

147

street address, and to indicate those destination on the map. The main disadvantage
of this system is that it does not provide a route guidance to the driver. The rest
of the systems [15, 85, 97, 98] are based on the infrastructure called Beacon. Bea-
cons are usually installed at selected trafﬁc signal lights and other strategic locations.
They broadcast trafﬁc information to the vehicles, such as the current location and
the description of the surrounding road networks. As a result, these beacon-based
navigation systems provide most recent real time route guidance to the drivers. In-
teresting thing to note is that Ali-Scout and Euro-Scout do not require the car be
equipped with CD—ROMs maintaining road maps. In other words, vehicles receive all

necessary map data from the Beacons.

The GPS-based system are GuideStar, Milemaster, Satellite Cruising System,
and N V-l [41, 95]. Milemaster [41] is based on the text—based routes databases while
others are based on the digitized road maps. Unlike beacon-based systems, GPS-
based systems do not provide the most up—to—date real time routing guidance to the

driver.

Advance, Multi Vision, and Socrates are the typical examples of the navigation
systems using GPS as well as dead reckoning with map matching [13, 108]. Compared
with the other systems based on either GPS or dead reckoning with map matching

alone, these system provide very accurate current position of the vehicles.

Two of the most popular providers of digital maps and software for map access are
ETAK and NavTech[][]. ETAK is a provider of a map database and software for ac-

cessing the database. ETAK has been instrumental in a number of ITS projects such

a."

4".

 

148

as Pathﬁnder ATIS in the Los Angeles area; TravTek in greater Orlando, Florida;
Minnesota Guidestar; TRiMS (Trip Reduction Information Management System);
and'others. ETAK actually uses two separate databases: one database for drawing
maps and another network database with connectivity information for path calcu—
lations. The network database is organized in connectivity—based clusters to reduce

access time. ETAK uses a modiﬁed Moore’s algorithm for path ﬁnding.

NavTech developed their own data model which has some of the features of the
relational data model to support a navigation system. The physical organization of
database is based 011 the k-d trees which uses hierarchical notions of parcels, zones,
and regions. Parcels are lowest logical entity and it deﬁnes a geographical area with a
certain number of road segments (always less than some maximum) [39]. The stored

regions represent physically overlapping geographic areas.

The NavTech database is compiled from a set of standard ﬁles on IBM main-
frame computer, for a particular region. It contains sufﬁcient detail, accuracy, and
breadth to support the advanced transportation and navigation applications. The
database contains navigation and digital cartography information partitioned into
regions. The data includes detailed city and inter-town databases including traf-
ﬁc control data such as time dependent restriction on road accesses. The NavTech
database consists of 7 database layers which are customer speciﬁc, points of interest,
cartography, geo-political, navigation, path, and geometry layers [19]. NavTech allows
the database to be compiled into different standards such as the European standard

called GDF(Geographic Data File) [17].

 

 

 

Bibliography

[1]

[‘21

[3]

[41

151

[6]

[7]

181

19]

M. Abousalem and E. Krakiwsky, ”A Quality Control Approach for GPS-Based
Automatic Vehicle Location and Navigation Systems” IEEE Int ’1 Conf. on Ve-
hicle Navigation and Information Systems(VNIS I VHS) pp. 466-470, 1993.

R. Agrawal, “Alpha: An Extension of Relational Algebra to Express a Class of
Recursive Queries”, IEEE Transactions on Software Engineering, Vol. 14, No. 7,
1988.

R. Agrawal, A. Borgida, and H. Jagadish, “Efﬁcient Management of Transitive
Relationships in Large Data and Knowledge Bases”, In Proc. ACM-SICMOD
1.98.9 Intl. Conf. on Management of Data, pp. 253-262, 1989.

R. Agrawal, S. Dar, and H. Jagadish, “Direct Transitive Closure Algorithms: De-
sign and Performance Evaluation”, In ACM Transactions on Database Systems,
Vol. 15, No.3, pp, 427-458, September 1990.

R. Agrawal, and H. Jagadish, “Algorithms for Searching Massive Graphs”, In
IEEE Transactions on Knowledge and Data Engineering, Vol. 6, No.2, pp. 225-
238, April 1994.

S. Azuma, K. Nishida, and S. Hori, “The Future of In-Vehicle Navigation Sys-
tems”, IEEE Int ’1 Conf. on Vehicle Navigation and Information Systems{VNIS
I VHS), pp. 537-542, 1994.

F. Bancilhon, “Naive Evaluation of Recursively Deﬁned Relations”, 0n Knowl-
edge Base Management Systems — Integrating Database and AI systems, Spring
Verlag, 1985.

F. Bancilhon and R. Ramakrishnan, “An Amateur’s Introduction to Recursive

Query Processing Strategies”, In Proc. ACM-SICMOD 1986 Intl. Conf. on Man-
agement of Data, pp. 16—52, 1986.

J. Bander and C. White, “A New Route Optimization Algorithm for Rapid

Decision Support”, IEEE Int ’1 Conf. on Vehicle Navigation and Information
Systems(VNIS I VHS), pp. 709-727, 1991

149

 

150

[10] N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger, “The R* —tree: An
Efficient and Robust Access Method for Points and Rectangles”, Proc. ACM-
SICMOD 1990 Intl. Conf. on Management of Data, pp. 322-331, 1990.

[11] B. Becker, H. Six, and P. Widmayer, “Spatial Priority Search: An Access Tech—
nique for Scaless Maps”, Proc. ACM-SICMOD 1991 Intl. Conf. on Management
of Data, pp. 128—137, 1991.

[12] H. Blanken, A. Ubema, P. Meek, and B. van den Akker, ”The Generalized Grid
File: Description and Performance Aspects”, Proc. IEEE 6th Int’l Conf. on Data
Engineering, pp. 380-388, 1990.

[13] D. Boyce, A. Kirson, and J. Schofer, “Design and Implementation ofADVANCE:
The Illinois Dynamic Navigation and Route Guidance Demonstration Program”,
IEEE Int ’1 Conf. on Vehicle Navigation and Information Systems(VNIS IVHS),
pp. 415-426, 1991

[14] .1. Buxton, S. Honey, W. Suchowerskyj, and A. Tempelhof, “The Travelpilot:
A Second-Generation Automotive Navigation System”, IEEE Transactions on
Vehicular Technology, Vol. 40, No. 1, pp. 41-44, February 1991.

[15] I. Catling and B. McQueen, “Road Transport Informatics in Europe — Major
Programs and Demonstrations”, IEEE Transactions on Vehicular Technology,
Vol. 40, No. 1, pp. 132-140, February 1991.

[16] S. Ceri, M. Negri, and G. Pelagatti, “Horizontal Partitioning in Database De-
sign”, In Proc. ACM SICMOD 1982 Intl. Conf. on Management of Data, 1982.

[17] H. Claussen, W. Lichtner, L. Heres, P. Lahaije, and .1. Siebold, “GDF, A Pro-
posed Standard for Digital Road Maps to be Used in Car Navigation Systems”,
IEEE Int ’1 Conf. on Vehicle Navigation and Information Systems(VNIS IVHS),
pp. 324-330, 1993.

[18] H. Claussen and R. GmbH, “Status and Directions of Digital Map Databases
in Europe”, IEEE Int 7 Conf. on Vehicle Navigation and Information Sys-
tems(VNIS IVHS), pp. 25-28, 1993.

[19] Navtech Company Document, Navigation Technologies, 740 East Arques Avenue,
Sunnyvale, CA USA 94086-3833 '

[20] T. Cormen, C. Leiserson, and R. Rivest, “The Set-Covering Problem”, In In-
troduction to Algorithms, MIT Press, Cambridge, Mass. and McGraw-Hill, New
York, 1990, Chapter 37.3, pp. 974-978.

[21] I. Cruz and T. Novell, “Aggregate Closure: An Extension of Transitive Closure”,
Proc. IEEE 5th Int’l Conf. on Data Engineering, 1989.

[22] S. Dar and H. Jagadish, “A Spanning Tree Transitive Closure Algorithm”, Proc.
IEEE 8th Int ’1 Conf. on Data Engineering, 1992.

‘4: 1 1'
._

 

151

[23] N. Dec and C. Pang, “Shortest-Path Algorithms: Taxonomy and Annotation”,
Networks, Vol. 14, pp. 275-323, 1984.

[24] W. Dijkstra, “A Note on Two Problems in Connection with Graphs”, In Numer.
Math., Vol. 1, pp. 269-271, 1959

[25] S. Dreyfus, “An Appraisal of Some Shortest-Path Algorithms”, Operation Re-
search, Vol. 17, pp. 395-412, 1969

[26] J. Eder, “Extending SQL with General Transitive Closure and Extreme Value
Selections”, IEEE Transactions on Knowledge and Data Engineering, Vol. 2, No.
4, 1990.

[27] C. Feijoo, J. Ramos, and F. Perez, ”A System for Fleet Management using Dif-
ferential GPS and VHF Data Transmission Mobile Networks” IEEE Int ’l Conf.
on Vehicle Navigation and Information Systems(VNIS I VHS), pp. 445-448, 1993.

[28] M. Freeston, “The BANG ﬁle: A New Kind of Grid File”, Proc. ACM-SICMOD
1987 Int ’l. Conf. on Management of Data, pp. 260-269, 1987.

[29] M. Freeston, ”A General Solution of the n-dimensional B—tree Problem”, Proc.

ACM-SICMOD 1995 Int ’1. Conf. on Management of Data, pp. 80-91, 1995.

[30] D. Galperin, “On the optimality of A*”, In Artiﬁcail Intelligence, Vol. 8, No. 1,
pp. 69-76, 1977.

[31] G. Gallo and S. Pallottino, “Shortest Path Methods: A Unifying Approach”,
Math. Programming Study, Vol. 26, pp. 25-36, 1984.

[32] S. Ganguly, A. Silberschatz, and A. Tsur, “A Framework for the Parallel Pro-
cessing of Datalog Queries”, In Proc. ACM-SICMOD 1990 Intl. Conf. on Man-
agement of Data, pp. 143-152, 1990.

[33] M. Garey and D. Johnson, “Computers and Intractability: A Guide to the The-
ory of NP-Completeness”, W.H. Freeman and Company, New York, 1979, pp.
222.

[34] B. Golden and T. Magnanti, “Deterministic Network Optimization - A Bibliog-
raphy”, Networks, Vol. 7, pp. 149-183, 1977.

[35] D. Greene, “An Implementation and Performance Analysis of Spatial Data Ac-
cess Methods”, Proc. IEEE 5th Int ’1 Conf. Data engineering, pp. 606-615, 1989.

[36] O. Giinther, “The Design of the Cell Tree: An Ob ject-Oriented Index Structures
for Geometric Databases”, Proc. IEEE 5th Int’l Conf. Data engineering, pp.
598-605, 1989.

[37] O. Giinther and H. Noltmeier, “Spatial Database Indices for Large Extended
Objects”, Proc. IEEE 7th Int ’1 Conf. Data engineering, pp. 520-526, 1991.

152

[38] A. Guttman, “R-trees: A Dynamic Index Structure for Spatial Searching”, Proc.
ACM-SICMOD 1987f Intl. Conf. on Management of Data, pp. 47-57, 1984.

[39] J. Guzolekm and E. Koch, “Real-Time Route Planning in Road Networks”,
IEEE Int ’1 Conf. on Vehicle Navigation and Information Systems(VNIS I VHS),
pp. 165-169, 1989.

f [40] .1. Han, G. Quadah, and C. Chaou, “Processing and Evaluation of Transitive
Closure Queries”, Proc. Int ’1 Conf. on Extending Database Technology, 1988.

[41] S. Hoffman and C. Stewart, “Text-based Routing: An Affordable Way Ahead ?”,
IEEE Int ’l Conf. on Vehicle Navigation and Information Systems(VNIS I VHS),
pp. 45-48, 1993.

[42] M. Houtsma, P. Apers, and S. Ceri, “Distributed Transitive Closure Computa-
tions: The Disconnection Set Approach”, In Proc. of the 16th VLDB Conference,
pp. 335-346, 1990.

[43] M. Houtsma, F. Cacace, and S. Ceri, “Parallel Hierarchical Evaluation of Tran-
sitive Closure Queries”, In Proc. Ist Int. Conf. on Parallel and Distributed In-
formation Systems, pp. 130-137, Dec. 1991.

[44] M. Houtsma, P. Apers, and G. Schipper, “Data Fragmentation for Parallel Tran-
sitive Closure Strategies”, In Proc. IEEE 9th Int ’1 Conf. Data Engineering, pp.
447-456, 1993.

[45] Y. Huang and J. Cheiney, “Parallel Computation of Direct Transitive Closures”,
In Proc. IEEE 7th Int ’l Conf. Data Engineering, pp. 192-199, 1991.

[46] K. Hua, J. Su, and C- Hua, “Efﬁcient Evaluation of Traversal Recursive Queries
Using Connectivity Index”, In Proc. IEEE 9th Int ’1 Conf. Data Engineering, pp.
549-558, 1993.

[47] G. Hulin, “Parallel Processing of Recursive Queries in Distributed Architec-
tures”, In Proc. of the 5th Intl Conf. on VLDB, pp. 87—96, 1989.

[48] Y. Ioannidis, “On the Computation of the Transitive Closure of Relational Op-
erators”, Proc. of the 13rd VLDB Conference, 1987.

[49] Y. Ioannidis and R. Ramakirishnan, “Efﬁcient Transitive Closure Algorithms”,
In Proc. of the 14th VLDB Conference, pp. 382-394, 1988.

[50] Y. Ioannidis, R. Ramakirishnan, and L. Winger “Transitive Closure Algorithms
Based on Graph Traversal”, In ACM Transactions onDatabase Systems, Vol.
18, No.3, pp. 513-576, September 1993. I

[51] K. Ishikawa, M. Ogawa, S. Azume, and T. Ito, “Map Navigation Software of the
Electro Multivision of the ’91 Toyota Soarer”, Int. Conf. on Vehicle Navigation
and Information Systems(VNIS I VHS), IEEE, (1991) 463-473.

 

153

[52] H. Jagadish, R. Agrawal, and L. Ness, “A Study of Transitive Closure As a Re-
cursion Mechanism”, In Proc. ACM-SICMOD 1987 Intl. Conf. on Management
of Data, pp. 331-344, 1987.

[53] H. Jagadish, “A Compression Technique to Materialize Transitive Closure”, In
ACM Transactions on Database Systems, Vol. 15, No.4, pp, 559-598, December
1990.

[54] H. Jagadish, “Spatial Search with Polyhedra”, Proc. IEEE 6th Int ’1 Conf. Data
engineering, pp. 311-319, 1990.

[55] H. Jagadish, “Linear Clustering of Objects with Multiple Attributes”, Proc.
ACM-SICMOD 1990 Intl. Conf. on Management of Data, pp. 332-342, 1990.

[56] H. Jagadish, “On Indexing Line Segments”, Proc. ACM- VLDB 1990 Intl. Conf.
on Very Large Data Bases, pp. 614-625, 1990.

[57] B. Jiang, “A Suitable Algorithm for Computing Partial Transitive Closures in
Databases”, In Proc. IEEE 6th Int ’1 Conf. Data engineering, pp. 264-271, 1990.

[58] S. Jung and S. Pramanik, “An Efﬁcient Representation of Distributed Fragments
of Recursive Relations” In Proceedings of the Fourth International Conference on
Database Systems for Advanced Applications, pp. 394-407, 1995.

[59] S. Jung and S. Pramanik, “HiTi Graph Model of Topographical Road Maps in
Navigation Systems”, In Proc. IEEE 12th Int ’1 Conf. on Data Engineering, 1996.

[60] S. Jung and S. Pramanik, “Empirical Analysis of Computing Shortest Path for
Road Map Queries”, submitted to Information Processing Letters.

[61] W. Kao, ”Integration of GPS and Dead Reckoning Navigation Systems” IEEE
Int ’l Conf. on Vehicle Navigation and Information Systems(VNIS I VHS), 1991,
pp. 635-643 '

[62] C. Kolovson and M. Stonebraker, “Segment Indexes: Dynamic Indexing Tech-
niques for Multi-Dimensional Interval Data”, Proc. ACM-SICMOD 1991 Int ’l.
Conf. on Management of Data, pp. 138-147 1991.

[63] H. Kriegel and B. Seeger, “PLOP-Hashing: A Grid File without Directory”,
Proc. IEEE 4th Int ’1 Conf. Data engineering, pp. 369-375, 1988.

[64] R. Kung, E. Hanson, Y. Ioannnidis, T. Sellis, L. Shapiro, and M. Stonebraker,
“Heuristic Search in Data Base System”, In Proc. Ist Int. Workshop Expert
Database Systems, pp. 96—107, Oct. 1984.

[65] Y. Kusumi, S. Nishio, and T. Hasegawa, “File Access Level Optimization Us-
ing Page Access Graph 011 Recursive Query Evaluation”, Proc. Int ’1 Conf. on
Extending Database Technology, 1988.

' 1661

[57]

[68]

[59]

[701

[71]

[72]
[731

[74]

[751

[76]

[771

[78]

154

G. Lachapelle, B. Townsend, H. Gehue, and M. Cannon, ”GPS versus Loran-
C for Vehicular Navigation in Urban and Mountainous Areas” IEEE VNIS pp.
456-459, 1993.

T. Lezniak, R. Lewis, and R. McMillen. “A Dead Reckoning/ Map Correlation
System for Automatic Vehicle Tracking”, IEEE Transactions on Vehicular Tech-
nology, Vol. 26, No. 1, pp. 47-60, February 1977.

B. Liu, S. Choo, S. Lok, S. Leong, S. Lee, F. Poon, and H. Tan, “Intergrating
Case-Based Reasoning, Knowledge—Based Approach and Dijkstra Algorithm for
Route Finding”, Proc. Tenth Conf. Artiﬁcial Intelligence for Applications (CAIA
’94), (1994) 149-155. E

D. Lomet and B. Salzberg, ”A Robust Multi-Attribute Search Structure” Proc.
IEEE 5th lnt’l Conf. Data engineering, pp. 296-304, 1989.

H. Lu, K. Mikikilineni, and J. Richardson, “Design and Evaluation of Algorithms
to Compute the Transitive Closure of a Database Relation”, Proc. IEEE 3rd Int ’1
Conf. on Data Engineering, 1987.

M. Mannino and L. Shapiro, “Extensions to Query Languages for Graph Traver-
sal Problems”, IEEE Transactions on Knowledge and Data Engineering, Vol. 2,
No. 3, pp. 353-363, 1990.

T. Mohr and C. Pasche, “A Parallel Shortest Path Algorithm”, Computing, Vol
40, pp. 281-292, 1988.

S. Moran and Y. Perl, “The Complexity of Identifying Redundant and Essential
Elements”, In Journal of Algorithm, Vol 2, pp. 22—30, 1981.

W. Nejdl, S. Ceri, and G. Wiederhold, “Evaluating Recursive Queries in Dis-
tributed Databases”, In IEEE Transactions on Knowledge and Data Engineer-
ing, Vol 5, pp. 104-121, February 1993.

Y. Nakamura, S. Abe, Y. Ohsawa, and M. Sakauchi, ”A Balanced Hierarchical
Data Structure for Multidimensional Data with Highly Efﬁcient Dynamic Char-

acteristics”, IEEE Trans. on Knowledge and Data Engineering, Vol 5, No. 4, pp.
682-693, August 1993.

J. Nievergelt and H. Hinterberger, “The Grid File: An Adaptable, Symmetric
Multikey File Structure”, ACM Transactions on Database Systems, Vol 9, No.
1,pp. 38-71, March, 1984.

T. Nicholson, “Finding the shortest route between two points in a network”,
Computer Journal, Vol. 9, (1966) 275-280.

Y. Ohsawa and M. Sakauchi, “A New Tree Type Data Structure with Homoge-
neous Nodes Suitable for a Very Large Spatial Database”, Proc. IEEE 6th Int ’l
Conf. Data engineering, pp. 296-303, 1990.

If...»
A

“‘1

 

 

155

[79] D. Papadias, Y. Theodoridis, T. Sellis, and M. Egenhofer, ”Topological Relations
in the World of Minimum Bounding Rectangles: A Study with R-trees”, Proc.
ACM-SICMOD 1995 Int ’1. Conf. on Management of Data, pp. 92-103, 1995.

[80] J. Pearl, In Heuristics: Intelligent Search Strategies for Computer Problem Solv-
ing, Addison Wesley, Reading, Mass., 1984.

[81] A. Pier, “Bibliography on Algorithms for Shortest Path, Shortest Spanning Tree,
and Related Circuit Routing Problems (1956-1974)”, Networks, Vol. 5, pp. 129-
149, 1975.

[82] S. Pramanik, and S. Jung, “Description and Identiﬁcation of Distributed Frag-
ments of Recursive Relations”, accepted for IEEE Transactions on Knowledge
and Data Engineering.

[83] G. Qadah, L. Henschen, and J. Kim, “Efﬁcient Algorithms for the Instantiated
Transitive Closure Queries“ In IEEE Transactions on Software Engineering, Vol

17, No. 3, pp. 296-309, March 1991.

[84] Ries and Epstein, “Evaluation of Distribution Criteria for Distributed Database
Systems“ In U. C. Berkley, Tech. Report UCB/ERL M78/22, May 1978.

[85] D. ROper and G. Endo, “Advanced Trafﬁc Management in California”, IEEE
Transactions on Vehicular Technology, Vol. 40, No. 1, pp. 152-158, February
1991.

[86] A. Rosenthal, S. Heiler, U. Dayal, and F. Manola, “Traversal Recursion: A
practical approach to supporting recursive applications” In Proc. IEEE 3rd Int ’l
Conf. Data Engineering, pp. 580-590, 1987.

[87] N. Roussopoulos and D. Leifker, ”Direct Spatial Search on Pictorial Databases
Using Packed R-trees”, Proc. ACM-SICMOD 1985 Int ’1. Conf. on Management
of Data, pp. 17-31, May 1985.

[88] H. Samet, Applications of spatial data structures : computer graphics, image
processing, and CIS, Addison-Wesley series in computer science, 1990

[89] J. Shapiro, J. Waxman, and D. Nir, “Level Graphs and Approximate Shortest
Path Algorithms”, In Networks, Vol. 22, pp. 691-717, 1992.

[90] S. Shekhar and T. Yang, “Motion in a Geographical Database System”, Proc.
9nd Symp. Design and Implementation of Large Spatial Databases, 1991.

[91] S. Shekhar, A. Kohli, and M. Coyle, “Path Computation Algorithms for Ad-
vanced Traveller Information System (ATIS)”, In Proc. IEEE 9th Int ’1 Conf.
Data engineering, pp. 31-39, 1993.

[92] S. Shekhar and D. Liu, “Genesis and Advanced Traveler Information System
(ATIS): Killer Application for Mobile Computing?”, NSF MOBIDATA Work-

shop on Mobile and Wireless Information System, Nov. 1994.

156

[93] S. Shekhar and D. Liu, “CCAM: A Connectivity-Clustered Access Method for
Aggregate Queries on Transportation Networks: A Summary of Results”, IEEE
11th Int ’1 Conf. Data engineering, pp. 410—419, 1995.

[94] S. Shekhar and D. Liu, “A Similarity Graph Based Approach to Declustering
Problems and its Application towards Parallelizing Grid Files”, In IEEE 11th
Int ’1 Conf. on Data engineering, 1995.

[95] M. Shibata and Y. Fujita, “Current Status and Future Plans for Digital Map
Databases in Japan”, IEEE Int’l Conf. on Vehicle Navigation and Information
Systems(VNIS 1 VHS), pp. 29—33, 1993.

[96] H. Six and P. Wildmayer, “Spatial Searching in Geometric Databases”, Proc.
IEEE 4th Int ’1 Conf. on Data Engineering, pp. 496-503, 1988.

[97] H. Sodeikat, “EURO-SCOUT is Facing the German 1994 Market”, IEEE Int’l
Conf. on Vehicle Navigation and Information Systems(VNIS I VHS), pp. 551-556,
1994.

[98] R. Tomkewitsch, “Dynamic Route Guidance and Interactive Transport Manage-
ment with ALI-SCOUT” IEEE Transactions on Vehicular Technology, Vol. 40,
No. 1, pp. 45—50, February 1991.

[99] 1. Toroslu and G. Qadah, “The Efﬁcient Computation of Strong Partial
Transitive-Closures”, In Proc. IEEE 9th Int ’l Conf. Data Engineering, pp. 530-
537, 1993.

[100] H. Tsuji, H. Maeda, A. Shibata, and F. Morisue, ”Evaluation of Location Sys-
tem Combining a GPS Receiver with Inertial Sensor”, IEEE Int ’1 Conf. on Ve-
hicle Navigation and Information Systems{VNIS I VHS), 1991, pp. 645-649

[101] P. Valduriez and H. Boral, “Evaluation of Recursive Queries Using Join In-
dices”, Int ’1 Conf. on Expert Database Systems, 1986.

[102] P. Valduriez and S. Khoshaﬁan, “Parallel Evaluation of the Transitive Closure
of a Database Relation”, In Intl. Journal of Parallel Programming, pp 19-42,
1988.

[103] P. Valduriez and S. Khoshaﬁan, “Transitive Closure of Transitively Closed Re-
lations”, Int ’l Con. on Expert Database Systems, Benjamin Cumming, 1989.

[104] D. Vineyard, S. Jung, and S. Pramanik, “A Survey of Database Problems in
IVHS”, submitted to IEEE Transactions on Knowledge and Data Engineering.

[105] H. Wang and B. Zhang, “Route Planning and Navigation System for an Au-
tonomous Land Vehicle”, IEEE Int ’1 Conf. on Vehicle Navigation and Informa-
tion Systems(VNIS IVHS), pp. 135-141, 1992.

157

[106] R. Weiland, ”Standards for Navigable Databases: A Progress Report” IEEE
Int ’1 Conf. on Vehicle Navigation and Information Systems(VNIS I VHS), pp.

185-192, 1991.

[107] T. Yang, S. Shekhar, B. Hamidzadeh, and P. Hancock, “Path Planning and
Evaluation in IVHS Databases”, IEEE Int’l Conf. on Vehicle Navigation and
Information Systems(VNlS I VHS), pp. 283-290, 1991

[108] F. Zijderhand and J. Biesterbos, “Functions and Applications of SOCRATES:
A Dynamic In-Car Navigation System with Cellular-Radio Based Bi-directional
Communication Facility”, IEEE Int ’l Conf. on Vehicle Navigation and Informa-
tion Systems(VNIS I VHS), pp. 543-546, 1994.

HICHIGQN STATE UNIV. LIBRARIES
IIIII[IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
31293014107365