ECONOMIC GAIN-AWARE ROUTING PROTOCOLS FOR DEVICE-TO-DEVICE CONTENT DISSEMINATION

By

Faezeh Hajiaghajani Memar

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Electrical Engineering-Doctor of Philosophy

2017

ABSTRACT

ECONOMIC GAIN-AWARE ROUTING PROTOCOLS FOR DEVICE-TO-DEVICE CONTENT DISSEMINATION

By

Faezeh Hajiaghajani Memar

The objective of this dissertation is to investigate Device-to-Device content dissemination protocols for maximizing the economic gain of dissemination for given combinations of commercial and network parameters. We pose this as a gain-aware multicast Delay Tolerant Network (DTN) routing problem in a Social Wireless Network (SWN) setting. Commercial parameters in this framework include the revenue out of selling a product, discount, rebate, content expiry time, and a content-specific redemption function. These parameters are inputs provided by a content generator. Network parameters are decided based on a Social Wireless Network's characteristics, such as consumers' interest profiles and their social bindings summarized in their mobility profiles.

We explore two solution approaches. First, we investigate stochastic design methods for gain-aware DTN protocol design, including prediction-based dissemination mechanisms. We develop a predictive gain utility which is used for deciding the relative importance of an immediate peer in comparison with potential future peers. We propose two versions of the utility; the first, Predictive Gain Utility Routing-Individual (PGUR-I), uses node-specific information, and the second, Predictive Gain Utility Routing-Aggregated (PGUR-A), implements a more scalable version by probabilistically aggregating node interaction information across consumers with similar consumption interests.

In the second approach, we explore learning mechanisms, including evolutionary learning and reinforcement learning, for synthesizing communication protocols. We explore evolutionary learning, specifically, evolving state machines for protocol synthesis. A specific network protocol is coded as a genotype and its resulting state machine behavior manifests in the form of the corresponding phenotype, leading to specific performance. Using the developed framework, we explore the evolutionary design of Medium Access Control (MAC) protocols including ALOHA, S-ALOHA and Carrier Sensing Multiple Access (CSMA). Finally, we leverage Reinforcement Learning (RL), which enables nodes to gather local information and make efficient routing decisions at runtime.

ACKNOWLEDGEMENTS

My deepest gratitude is to my advisor, Dr. Subir Biswas, for his endless support, encouragement and guidance throughout my work. It was through his mentorship that I learned how to be curious, optimistic and enthusiastic, how to think independently, own a research problem and find hope in failure. I would also like to thank my committee members, Dr. Charles Ofria, Dr. Jian Ren and Dr. Kalyanmoy Deb, for their time and guidance. My thanks also go to past and current members of our lab: Debasmit Banerjee, William Tomlinson, Stephan Lorenz, Bo Dong, Qiong Huo, Saptarshi Das, Ian Shi, Feng Dezhi, Rui Wang, Salim Muhamed, Henry Griffith and Brandon Harrington, for their collaboration, support and for many memorable moments. I am deeply grateful to my East Lansing friends who have been my family throughout these years. I would like to thank my brother, Foad, as his presence in my life has been a motivation for me in ways he may not realize.
Last but not least, I would like to give thanks to my parents for making sure that I feel their love and support for the last 1819 days, despite being 6342 miles away.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ALGORITHMS

1 Introduction
  1.1 Social Wireless Networks
  1.2 Device-to-Device (D2D) Content Dissemination in SWN
  1.3 Problem Statement
    1.3.1 Consumption Interest and Coupon Redemption
    1.3.2 Consumer Privacy
    1.3.3 Role of a Content Server
    1.3.4 Incentive Mechanisms
  1.4 Current Solutions for DTN Routing
  1.5 Dissertation Objectives
  1.6 Scope of Dissertation

2 Related Work
  2.1 Unicast
    2.1.1 Community-based Routing
    2.1.2 Community-independent Routing
  2.2 Multicast
  2.3 Interest-aware and Incentive-based Mechanisms
  2.4 Communication Protocol Evolution
  2.5 Reinforcement Learning based Approaches
  2.6 Summary

3 Gain-aware Unicast Dissemination Protocol (GDP)
  3.1 Introduction
  3.2 Network Model
  3.3 Problem Formulation
  3.4 Gain Upperbound
  3.5 Gain-aware Dissemination Protocol
    3.5.1 State Variables
    3.5.2 Updating State Variables
    3.5.3 Forwarding Logic
  3.6 Performance Characterization
    3.6.1 Experiments with Clique-forming Synthetic Mobility
      3.6.1.1 Protocol State Warm up
      3.6.1.2 Steady State Contact Statistics
      3.6.1.3 Protocol Self-tunability
      3.6.1.4 Evolution of Gain and Latency
      3.6.1.5 Delay, Forwarding Cost, and Gain
      3.6.1.6 Impacts of Forwarding Cost
      3.6.1.7 Impacts of Value Duration
      3.6.1.8 Joint Impacts of Forwarding Cost and Value Duration
      3.6.1.9 Effects of Varying Content Generation Time
    3.6.2 Experiments with Trace Based Mobility
      3.6.2.1 Effects of Different Content Sources
      3.6.2.2 Effects of Varying Content Generation Time
    3.6.3 Gain-positive Delivery Ratio
  3.7 Summary and Conclusion

4 Economy Driven Multicast Content Dissemination
  4.1 Introduction
  4.2 Economic Gain of Dissemination
    4.2.1 Cost, Revenue, and Gain
    4.2.2 Consumer Interest and Coupon Redemption
    4.2.3 Consumer Privacy and the Role of Centralized Server
  4.3 Problem Definition
  4.4 Economy Driven Content Dissemination
  4.5 Compared Protocols
    4.5.1 Interest-Aware Dissemination (IAD)
    4.5.2 Social-aware Opportunistic Routing Protocol (SCORP)
    4.5.3 Random Forwarding
  4.6 Experimental Details
    4.6.1 Interest and Redemption Function
    4.6.2 Target Exposure Index (TEI)
    4.6.3 Degree of Penetration (DoP)
  4.7 Dissemination Performance
    4.7.1 Social Wireless Network with Clique Formation
    4.7.2 Second Mobility Model
  4.8 Summary and Conclusion

5 Multicast Routing using Predictive Gain Utility
  5.1 Introduction
  5.2 System Model
  5.3 Problem Formulation
  5.4 Proposed Protocols
    5.4.1 Predictive Gain Utility Routing - Individual (PGUR-I)
    5.4.2 Predictive Gain Utility Routing - Aggregated (PGUR-A)
  5.5 Compared Algorithms
  5.6 Dissemination Performance
    5.6.1 Synthetic Mobility
      5.6.1.1 Monolithic Mobility
      5.6.1.2 Community-based Mobility
    5.6.2 Trace-based Mobility
  5.7 Summary and Conclusion

6 MAC Protocol Design Using Evolvable State Machines
  6.1 Introduction
  6.2 Protocol State Machine
  6.3 Evolution Platform
  6.4 Results
    6.4.1 Partial EDBF (EDBF: 1, 2, 3)
      6.4.1.1 Evolvable Probability: P_WW (P = 0 → P = 1) (EDBF: 1)
      6.4.1.2 Evolvable Probabilities: P_TW (T = 1 → T = 0) and P_LW (EDBF: 2)
      6.4.1.3 Evolvable Probabilities: P_WL and P_LW (EDBF: 2)
      6.4.1.4 Evolvable Probabilities: P_TL (T = 1 → T = 1), P_TW (T = 1 → T = 1), and P_LW (EDBF: 3)
    6.4.2 Full EDBF (EDBF: 6)
    6.4.3 Protocol Adaptation
      6.4.3.1 Adaptation with Changing Traffic Load
      6.4.3.2 Adaptation with Changing Network Size
  6.5 Summary and Conclusion

7 MAC Protocol Evolution with Time Sensing and Channel Sensing Modalities
  7.1 Introduction
  7.2 Evolving Protocol State Machines
    7.2.1 Machine with Time Sensing Modality
    7.2.2 Machine with Channel Sensing Modality
  7.3 Evolution Architecture
    7.3.1 Gene Architecture
    7.3.2 Fitness Formulation
    7.3.3 Reproduction and Evolution
  7.4 Results
    7.4.1 Evolution with Global Time Sensing Enabled
      7.4.1.1 Limited Evolutionary Degree of Behavioral Freedom (EDBF: 1, 2)
      7.4.1.2 Full Evolutionary Degree of Behavioral Freedom (EDBF: 8)
      7.4.1.3 Protocol Adaptation
    7.4.2 Evolution with Channel Sensing Enabled
      7.4.2.1 Limited Evolutionary Degree of Behavioral Freedom (EDBF: 1, 2)
      7.4.2.2 Protocol Adaptation with Maximum EDBF (EDBF: 8)
  7.5 Summary and Conclusion

8 Q-learning based Gain-aware Routing
  8.1 Introduction
  8.2 System Model
  8.3 Problem Formulation
  8.4 Q-learning Model
  8.5 Q-learning based Gain-aware Routing (QGR)
  8.6 Compared Protocols
  8.7 Results
    8.7.1 Experimental Details
    8.7.2 Community-based Synthetic Mobility
      8.7.2.1 Impact of Rebate Cost (cf)
      8.7.2.2 Impacts of Content Expiry Duration (τ) and the Number of Content Copies (T)
    8.7.3 Trace-based Mobility
  8.8 Summary and Conclusion

9 Scalable Q-learning based Gain-aware Routing
  9.1 Introduction
  9.2 Scalable Q-learning based Routing (SQGR)
    9.2.1 Gain Quanta
    9.2.2 Q-learning Model
  9.3 Scalable Q-learning based Gain-aware Routing
  9.4 Results
    9.4.1 Experimental Details
    9.4.2 Community-based Synthetic Mobility
    9.4.3 Trace-based Mobility
    9.4.4 Working Day Mobility
  9.5 Summary and Conclusion

10 Future Work
  10.1 Introduction
  10.2 Evolution of DTN Routing Protocols
    10.2.1 Parameter Evolution
    10.2.2 Protocol Evolution
  10.3 Neural Network Driven Q-learning
  10.4 Gain Estimation using Time Series Prediction
  10.5 Redemption Functions
  10.6 Managing Selfishness

BIBLIOGRAPHY

LIST OF TABLES

Table 3-1: Notations and corresponding definitions used in GDP
Table 4-1: Notations and corresponding definitions in EDCD
Table 6-1: Independent transition probabilities
Table 7-1: Independent transition probabilities for both state machines
Table 7-2: Possible transitions from W to T in the machine in Figure 7-2

LIST OF FIGURES

Figure 1-1: Example redemption functions
Figure 1-2: Scope of the dissertation
Figure 3-1: Example user interaction and DTN routing scenario
Figure 3-2: RGAP for finding performance upper bound
Figure 3-3: Experimental network topology with six social cliques
Figure 3-4: Warm up time based on inter-contact time evolution
Figure 3-5: Example contact statistics of a C4 node
Figure 3-6: Gain comparison between GDP, PROPHET, and RGAP
Figure 3-7: Comparative delivery delays and gain evolution
Figure 3-8: Delay vs. forward count with cost: 0.01 and VD: 20K
Figure 3-9: Delay vs. gain with cost: 0.01 and VD: 20K
Figure 3-10: Forward count vs. gain: cost 0.01 and VD 20K
Figure 3-11: Delay vs. number of forwards for GDP
Figure 3-12: Delay vs. gain for GDP
Figure 3-13: Number of forwards vs. gain for GDP
Figure 3-14: Delay vs. number of forwards with ...
Figure 3-15: Delay vs. gain with forwarding cost: 0.01
Figure 3-16: Number of forwards vs. gain with forwarding cost: 0.01
Figure 3-17: Number of forwards vs. gain with forwarding cost: 0.01
Figure 3-18: Performance for content generation at 16000s
Figure 3-19: Performance for content generation at 17000s
Figure 3-20: Warm up time for the INFOCOM '06 mobility trace
Figure 3-21: VD vs. gain for INFOCOM '06 trace; source node N5; (a): cost 0.05, (b): cost 0.1 and (c): cost 0.14
Figure 3-22: VD vs. gain for INFOCOM '06; source node N1; (a): cost 0.05, (b): cost 0.1 and (c): cost 0.14
Figure 3-23: Performance for content generation at 82000s; INFOCOM '06 traces; source node N5; cost 0.05
Figure 3-24: Gain-positive Delivery Ratio (GPDR) for a) GDP, b) Epidemic, c) SNW, and d) PROPHET
Figure 4-1: Example redemption function
Figure 4-2: Example user interaction and routing
Figure 4-3: Example Gaussian redemption function
Figure 4-4: Network with three social cliques (TEI: 21.5)
Figure 4-5: Multicast dissemination with different Depth of Penetration (DoP)
Figure 4-6: Mobility for higher target consumer exposure (TEI: 37.5)
Figure 4-7: Gain in scenarios of high TEI
Figure 4-8: Total rebate (forwarding cost) per destination
Figure 4-9: Expected gain from one-hop deliveries
Figure 4-10: Five clique network: (a) TEI: 27.8, (b) TEI: 32.1
Figure 4-11: Expected gain distribution from one-hop deliveries
Figure 4-12: Maximum gain and the time to reach maximum
Figure 5-1: Example user interaction and routing
Figure 5-2: Example Gaussian redemption function
Figure 5-3: Evolution of gain for different expiry time; T=10
Figure 5-4: Dissemination behavior of PGUR protocols
Figure 5-5: Final gain for all combinations of T and τ for CMCI mobility scenario
Figure 5-6: Gain dynamics for different expiry time; T=15
Figure 6-1: (a) Two-state machine for Pure-ALOHA MAC; (b) Network-wide load G vs. throughput S (in Erlang) for pure-ALOHA phenotype
Figure 6-2: An evolvable generalized FSM with embedded pure-ALOHA logic
Figure 6-3: System structure for the proposed evolutionary framework
Figure 6-4: (a) Evolution of P_WW (P = 0 → P = 1) and (b) the corresponding throughput
Figure 6-5: Trajectory of effective load G for different applied load G
Figure 6-6: Evolution of pair P_TW (T = 1 → T = 0) and P_WL, and the corresponding throughput
Figure 6-7: Evolution of pair P_WL and P_LW for different network-wide loads
Figure 6-8: Evolution of P_TL (T = 1 → T = 1), P_TW (T = 1 → T = 1) and P_LW for different load conditions
Figure 6-9: Evolution of all six independent transition probabilities for full EDBF
Figure 6-10: Adaptation of probabilities with runtime changes of network load
Figure 6-11: Adaptation of probabilities with runtime changes of network size
Figure 7-1: An evolvable generalized FSM with time sensing capability
Figure 7-2: An evolvable generalized FSM with channel sensing capability
Figure 7-3: Genotype formation and the system components
Figure 7-4: Fitness dynamics before and after time sensing is added
Figure 7-5: Genotype dynamics before and after time sensing is added
Figure 7-6: Evolution of all probabilities in different loading conditions
Figure 7-7: Throughput with runtime parameter changes
Figure 7-8: Adaptation of probabilities with runtime parameter changes
Figure 7-9: Genotype dynamics before/after channel sensing is added
Figure 7-10: Throughput with runtime parameter changes
Figure 7-11: Adaptation of probabilities with runtime changes
Figure 8-1: Abstraction of states and actions in the Q-learning based routing
Figure 8-2: Community-based synthetic mobility
Figure 8-3: Evolution of gain for forwarding cost cf: (a) 0.002, (b) 0.02, (c) 0.1 and (d) 0.2
Figure 8-4: Evolution of Q-values
Figure 8-5: Impacts of content expiry duration (τ) and number of copies (T) on gain
Figure 8-6: Dynamics of a) gain and b) time to reach maximum gain
Figure 9-1: Probabilistic state machine modeling for Q-Learning
Figure 9-2: Community-based mobility
Figure 9-3: Evolution of gain over time for expiry duration τ: (a) half a day, (b) 1.5, (c) 3 and (d) 5.5 days
Figure 9-4: Final gain values for (a) T=15 and (b) T=25
Figure 9-5: Q-tables' content at each node for (a) Q_ii → 0 and (b) Q_ii → 1.2
Figure 9-6: Maximum gain and time to reach that for expiry duration: (a) half a day and (b) 2.5 days
Figure 9-7: Q-table size per node
Figure 9-8: Gain evolution for working day mobility with τ = 4 days
Figure 10-1: Example general state machine consisting of a content dissemination protocol
Figure 10-2: Time series of gain observed by a node-i

LIST OF ALGORITHMS

Algorithm 3-1: The key components of the GDP protocol
Algorithm 4-1: The key components of the EDCD algorithm
Algorithm 5-1: The key components of the PGUR-I algorithm
Algorithm 5-2: The key components of the PGUR-A algorithm
Algorithm 8-1: Key components of the QGR algorithm
Algorithm 9-1: Summary of the SQGR algorithm

1 Introduction

1.1 Social Wireless Networks

Similarity breeds connection [1]. This principle (i.e., homophily) structures social ties of every type, including friendship, collegiality, co-membership, and other types of relationships. Such pairwise similarities are the building blocks of the concept of Social Wireless Networks (SWNs). SWNs are formed by ad hoc wireless links among data-enabled wireless devices when people carrying such devices regularly gather in certain places. SWNs are classified under Delay Tolerant Networks (DTNs) in the sense that they lack continuous network connectivity. According to the homophily principle, human mobility is partially shaped by their social relationships [2]. Reciprocally, it is shown [3] that the similarity between movement trajectories of two persons can be a strong indication of their tie in a physical social network. This binding between social interactions and mobility similarities is reflected in SWNs. Examples include the SWN formed by students in an academic department within a university campus. In a smaller scale, students residing in a specific department and having a common interest in a specific type of food, form an SWN. Other SWN examples include networks formed in work places, malls, airports, train stations and other public places. Interaction patterns in an SWN can be indexed with social metrics such as pairwise contact duration, pairwise contact frequency, betweenness centrality [4], and degree centrality [4]. These metrics summarize the mobility profile of a person in relation to other entities in the network. This additional information layer about mobility and social interactions of nodes in a network leads to the emergence of a new field of Socially Aware Networking (SAN) [5].
By exploiting social properties of nodes, the abstraction of SAN can be used for better networking support to innovative applications and services, such as Device-to-Device content dissemination in Social Wireless Networks.

1.2 Device-to-Device (D2D) Content Dissemination in SWN

Due to human mobility and the lack of complete coverage by WiFi access points, an SWN can be susceptible to intermittent disconnections from the Internet. This can result in partitions of devices that can communicate with each other using ad hoc routing protocols, but do not have Internet connectivity. This lack of connectivity may affect server-based content dissemination inside an SWN. Intermittent Internet connectivity is one of the factors which necessitates device-to-device content dissemination approaches in such SWNs. Device-to-Device (D2D) content dissemination can be implemented through recent technologies including Bluetooth, Bluetooth Low Energy (BLE), and the Bluetooth Service Discovery Protocol (SDP). While there are many applications of multicast D2D dissemination in healthcare [6], emergency handling [7], and vehicular networks [8], in this dissertation we specifically focus on a commercial content dissemination application: delivering commercial content, such as coupons, to a target population of customers. Consider an example scenario in which an ethnic restaurant in a mall sends out an electronic coupon (through Bluetooth) to the mobile device of a customer who is known to be interested in that ethnic food. The customer, who happens to be a university student, forwards the coupon to other students in her social DTN communities on the university campus via mobile-to-mobile links. It is desired that the coupon gets disseminated to students who are most likely to redeem it, so that the resulting commercial gain from the dissemination is maximized.

Even in the presence of Internet connectivity, device-to-device content dissemination can be of interest for the following reasons:

1) Scalability and privacy: Reliance on a priori knowledge about specific destination node identifiers is not feasible in practical coupon dissemination applications, in which the coupons need to be delivered to nodes based on their dynamic interest levels and redemption chances instead of their static identities. Therefore, in a server-based approach, a server needs to keep track of every node's interest level toward every product/subject. This not only exposes users' preferences to an external server, but also imposes a heavy control load on the server. In a D2D approach, however, nodes share their preferences (interest profile) only with certain peers upon an interaction, and not with a centralized server, thus partially protecting privacy and achieving scalability.

2) Targeted dissemination: D2D dissemination can promote coupon distribution within the geographical proximity of the coupon generator. Consider the earlier mentioned example scenario in which an ethnic restaurant in a mall sends out a coupon to the mobile device of a customer who is known to be interested in that ethnic food. The customer, who happens to be a student at a university in proximity of the mall, forwards the coupon to other students in her social communities on the university campus via mobile-to-mobile links. Since the coupon holder's peers are in the same general geographical area, they can similarly access the restaurant (coupon generator).
That is, the D2D dissemination takes care of delivering to users who are in spatial proximity of the mall and thus may be more likely to redeem coupons. Also, this dissemination happens without any exposure of nodes' geographical locations to a centralized content server. There are a few other common characteristics that the coupon holder may share with her/his student peers, other than the characteristic of commuting in each other's proximity. For example, most of such peers are likely to belong to the same social and financial class. As a result, it is also more likely that the coupon holder's peers share similar interest levels. In a centralized approach, the server cannot easily identify such detailed preferences for a large consumer population, thus leading to scalability concerns.

3) Viral marketing: D2D content dissemination implements viral marketing by leveraging inter-personal interaction information to direct the coupons to the most fit consumers. Viral marketing is essentially a consumer-to-consumer process and cannot be implemented in a centralized manner. It leverages the power of individuals to influence peers through various online means including social networks, emails, instant messaging, and online chat. Electronic viral marketing facilitates asynchronous round-the-clock connectivity with customers, and also considerably expands the scope and scale of influence compared to face-to-face contacts [9]. Depending on an application's requirements regarding privacy preservation, a coupon forwarder may or may not expose his/her identity. In either case, D2D dissemination is still more effective [10] than server-based advertising methods because 1) people trust their acquaintances more than ads coming from a coupon provider, and 2) a user receives targeted coupons based on his/her interests, which are most probably similar to his/her acquaintances' interests [1] and choices.

The main design goal for a D2D commercial content dissemination algorithm is to maximize a predefined commercial gain for a Content Provider (CP). In other words, a carrier of content/coupons identifies the most fit individual Content Consumers (CCs) based on its pattern of social interaction (e.g., contact duration, centrality, etc.) with target consumers. In the following section, the requirements and design challenges of D2D commercial content dissemination approaches are discussed in detail.

1.3 Problem Statement

In traditional server-based approaches [11], coupons are sent to consumers directly by Content Providers (CPs). The economic value of a disseminated coupon is determined based on statistically estimated redemption rates and the known costs of coupon generation and distribution. The overall goal is to maximize the content provider's commercial gain, which is defined as the generated revenue out of potential redemptions minus all involved costs of preparation and distribution. In D2D distribution, the primary cost comes from incentivizing mobile nodes to take part in the D2D dissemination process. D2D coupon forwarding costs bandwidth, storage, CPU and, more importantly, battery power, which can deter people from taking part in dissemination. Thus, the objective here is to develop coupon dissemination mechanisms that can maximize a coupon's economic gain in the presence of costs/rebates for D2D forwarding incentives. This can be posed as a Delay Tolerant Network (DTN) multicast routing problem.
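To make the notion of economic gain concrete, the following minimal sketch shows one way a coupon campaign's gain could be tallied under a simple additive model; the function and variable names (campaign_gain, revenue, discount, rebate_per_forward) are illustrative assumptions, not the exact formulation, which is developed in Chapters 3 and 4.

    # Illustrative sketch only: a simple additive gain model for a coupon campaign.
    # All names and the exact form are assumptions made here for illustration; the
    # precise gain formulation used in this work is defined in Chapters 3 and 4.
    def campaign_gain(redemption_probs, revenue, discount, rebate_per_forward, num_forwards):
        """redemption_probs: redemption probabilities of the consumers reached."""
        expected_revenue = sum(p * (revenue - discount) for p in redemption_probs)
        forwarding_cost = rebate_per_forward * num_forwards
        return expected_revenue - forwarding_cost

    # Example: three consumers reached via five D2D forwards.
    print(campaign_gain([0.6, 0.3, 0.1], revenue=10.0, discount=2.0,
                        rebate_per_forward=0.05, num_forwards=5))

Under this toy model, gain grows with the redemption likelihood of the reached consumers and shrinks with every incentivized forward, which is the trade-off the gain-aware protocols in later chapters are designed to navigate.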
Most DTN multicast routing approaches in the literature attempt to individually optimize performance indices such as message delay, forwarding cost, and storage. None define composite cost factors by combining multiple such indices into a single gain index. Due to the lack of gain-awareness, these protocols cannot control the economic gain of dissemination, as defined in this context.

1.3.1 Consumption Interest and Coupon Redemption

Consumer Interest: Consumers' interest level towards an item (e.g., a product or brand) changes across time and across individuals. Such interest level is mostly dynamic. For example, a consumer may develop interest for a brand after multiple satisfactory purchases from that brand. In another example, a consumer may lose his/her interest in a product after reading negative reviews from other purchasers who used the product (word-of-mouth effect [12]). Modeling and embedding such interest level into the D2D dissemination is necessary in order to have an elaborate system model. Interest level for a consumer can be set through many possible means including: 1) an explicit input by the consumer herself, 2) a computation based on the number of purchases made for a product through the consumer's mobile application, or 3) local extraction from the metadata collected from searching and browsing activities on the consumer's device.

Redemption Probability: Redemption probability for a consumer for a coupon represents her/his likelihood of redeeming the coupon within a defined coupon expiry time. The determination of redemption probabilities can be helpful when planning and monitoring coupon campaigns. Redemption probabilities for a specific type of coupon (i.e., the corresponding product or service) are usually predicted based on past redemption data and actual data from an ongoing coupon campaign [13]. Modeling redemption probabilities in a D2D dissemination framework is another step toward simulating a real coupon campaign. For that, a redemption function can be defined which maps a user's interest level for a specific coupon to a redemption probability. While interest level is a characteristic of a consumer, redemption probability is a function that depends on how consumers react to a coupon based on their interests in the product or service that the coupon represents. If a coupon is originally designed with the goal of motivating high-interest/loyal consumers, then the redemption function is likely to map higher redemption probabilities to consumers with higher interest levels. This can be modeled as an exponential redemption function with a maximum redemption probability (e.g., the exponential curve shown in Figure 1-1). Interest level in Figure 1-1 is shown as ρ_i, and redemption probability is indicated with P_R(ρ_i), which is a function of the interest level.

[Figure 1-1: Example redemption functions (exponential and Gaussian curves mapping interest level to redemption probability)]

As another example, a Gaussian redemption function with μ = 0.75 is shown in Figure 1-1. This function corresponds to a coupon campaign designed to motivate customers with a medium interest level. Such a model can be justified based on findings [14] which imply that coupon delivery to highly loyal (interested) customers may kill customer loyalty. That is because sending coupons to customers who already buy specific products (possibly at regular price) communicates that the product they are buying has a lower value than what they pay normally. In that case, the sentiment of exclusivity and lowering the bridge are lost, because they are already customers [14]. Thus, light to moderately loyal customers, as opposed to the most loyal ones with higher interest levels, may maintain higher redemption rates.
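As a concrete illustration of the two redemption-function shapes plotted in Figure 1-1, the sketch below maps an interest level ρ in [0, 1] to a redemption probability; the saturation rate, the Gaussian spread, and the 0.6 ceiling are assumed values chosen only to resemble the plotted curves, with μ = 0.75 taken from the Gaussian example in the text.

    import math

    # Illustrative sketch of the two redemption-function shapes in Figure 1-1.
    # P_MAX and the defaults for k and sigma are assumed values chosen only to
    # resemble the plotted curves.
    P_MAX = 0.6   # assumed maximum redemption probability

    def exponential_redemption(rho, k=5.0):
        """Saturating curve: favors highly interested (loyal) consumers."""
        return P_MAX * (1.0 - math.exp(-k * rho))

    def gaussian_redemption(rho, mu=0.75, sigma=0.15):
        """Bell-shaped curve: favors consumers with medium interest levels."""
        return P_MAX * math.exp(-((rho - mu) ** 2) / (2.0 * sigma ** 2))

    for rho in (0.2, 0.5, 0.75, 1.0):
        print(rho, round(exponential_redemption(rho), 3), round(gaussian_redemption(rho), 3))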
To mention another example, customers can be categorized, based on the stage of their buying cycle, into different classes, namely active, at risk, and lost [15]. These customers' interest levels fall into different ranges of high, moderate, and low. The definitions and boundaries of such ranges of interest levels can be maintained by the CP's server. From a marketing point of view, each class of customers needs to be treated with different strategies, and thus different coupon campaign models are designed. For example, a win-back coupon [15] is designed to target at-risk customers, who maintain moderate interest levels rather than being highly or marginally interested.

The interest level for a coupon type for a user is reflected in his/her interest profile. Redemption probability is then determined based on the interest level and the specific coupon type. The combination of the mobility (Section 1.1) and interest profiles of users in an SWN, when shared among peers, provides information based on which D2D coupon dissemination protocols can be designed. This information is not made accessible to a server (in a server-based dissemination approach) due to privacy concerns explained in the next section.

1.3.2 Consumer Privacy

From a privacy standpoint, both interest (Section 1.3.1) and mobility (Section 1.1) profiles of a consumer are preferred to be stored locally within the mobile device, and should not be exposed to a remote server. Under this privacy model, consumer devices are allowed to share their interest profiles with other devices selectively upon meeting, but are not allowed to share them with a centralized entity. The latter provides partial user privacy. Towards the end of this dissertation (in Chapter 9), we propose a dissemination mechanism which does not require nodes to store peer-specific interest and mobility profiles on their devices. This information is still shared upon pairwise meetings for one-time use in the required state variable computations on the peer's device; however, it is deleted afterwards and is not stored on the peer's device. This is the highest level of privacy considered across the methods proposed throughout this dissertation.

1.3.3 Role of a Content Server

While the dissemination process can be done in a fully Device-to-Device manner in order to preserve consumers' privacy (Section 1.3.2), a centralized server may be present with the following roles: 1) consumer authentication, 2) maximum dissemination enforcement, and 3) other long-term control and management tasks. This is of course feasible if user devices can reach the server through vertical links (i.e., 3G/4G) for such control and management operations. Note that even when a vertical link to a server is available, D2D dissemination is desirable for the reasons outlined in Section 1.2. For horizontal (i.e., Device-to-Device) links between devices, different communication technologies such as Bluetooth, WiFi-direct, and Bluetooth SDP can be adopted.

1.3.4 Incentive Mechanisms

As mentioned earlier, D2D coupon forwarding costs bandwidth, storage, CPU and, more importantly, battery power, which can deter people from taking part in dissemination. Incentive mechanisms are needed to encourage users to participate in sharing resources. Here, a few basic incentive mechanisms are discussed; a small sketch of the three dispensing rules follows the list.

1) Uniform dispensing [16]: Considering a total reward of R_T, every node in the forwarding chain (with size l) receives an equal amount of reward, R_T/l. The forwarding chain refers to the intermediate nodes who participate in forwarding a copy of a content from a source to a destination. This approach leads to an undesirable situation in which users may refuse to forward a coupon further in order to keep the length of the forwarding chain smaller, thus maximizing their own reward.

2) Fixed and equal dispensing [16]: In this method, every node on the forwarding chain receives the same amount of reward irrespective of the chain's length.

3) Geometric dispensing: Intuitively, users' contribution along a forwarding chain is uneven. Geometric dispensing is based on the assumption that the closer a user is to the initial coupon holder in the chain, the greater her contribution is. With geometric dispensing, the reward distribution is a geometric series with a constant ratio p. Uniform dispensing is a special case of geometric dispensing, when p=1. Also, p=0 results in a single-level dispensing mechanism in which the initial coupon holder receives a reward for every forwarding, but the rest of the nodes in the chain do not receive any reward if they forward further.
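The short sketch below illustrates how the three dispensing rules above could split a total reward R_T across a forwarding chain of length l; it only mirrors the stated definitions, and normalizing the geometric series so that the shares sum to R_T is an assumption made here for illustration.

    # Illustrative sketch of the three reward-dispensing rules described above.
    # r_total is the total reward for one delivered coupon; the chain is ordered
    # from the initial coupon holder to the last relay.
    def uniform_dispensing(r_total, chain_len):
        return [r_total / chain_len] * chain_len

    def fixed_dispensing(fixed_reward, chain_len):
        # every relay receives the same preset amount, irrespective of chain length
        return [fixed_reward] * chain_len

    def geometric_dispensing(r_total, chain_len, p):
        # shares decay with distance from the initial holder; normalizing them to
        # sum to r_total is an assumption for this sketch
        weights = [p ** i for i in range(chain_len)]   # note: 0 ** 0 == 1 in Python
        total = sum(weights)
        return [r_total * w / total for w in weights]

    print(geometric_dispensing(1.0, 4, p=1.0))   # reduces to uniform dispensing
    print(geometric_dispensing(1.0, 4, p=0.0))   # single-level: only the initial holder is rewarded
    print(geometric_dispensing(1.0, 4, p=0.5))   # relays closer to the holder earn more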
A few other incentive mechanisms proposed in the literature will be discussed in the next chapter (Related Work). Along with every incentive mechanism, security concerns need to be addressed. One of the possible threats is when a node, either from inside or outside of a forwarding chain, hacks into the incentive mechanism and manipulates the rewards for itself and other nodes in order to collect a higher reward. To avoid such issues, different encryption methods are applied [16, 17]. Furthermore, an incentive mechanism may require the trajectory of a coupon to be stored and exposed to the coupon provider for eventually redeeming rewards. That itself discloses users' social interactions and trajectories, as it shows the social interactions which took place in the process of delivering a coupon to one or multiple receivers.

1.4 Current Solutions for DTN Routing

The problem of D2D coupon dissemination in SWNs can be posed as a multicast Delay Tolerant Network routing problem. There exists a rich body of literature on both Unicast and Multicast DTN routing solutions. Among these methods, only those which use a specific type of multicast, namely dissemination, are fit for the targeted coupon distribution problem in this dissertation. Dissemination does not rely on a priori knowledge about specific destination node identifiers, which is not always available in practical coupon dissemination applications in which the coupons are delivered to nodes based on their dynamic interest levels and redemption chances instead of static identities.

Under the broad class of D2D dissemination algorithms, there are examples which apply social-aware dissemination [18], embed incentive mechanisms in their proposed framework (e.g., [19]), and/or rely on utility functions [20] designed towards performance indices such as delay and forwarding cost. Social-aware dissemination approaches leverage the information regarding existing social ties among users in an SWN to identify and disseminate a content to receivers interested in the content. Incentive-based methods are suited to the D2D dissemination framework in that they provide incentives to encourage nodes' participation in the dissemination process.
Utility-aware approaches apply a defined utility as the routing criterion. This is similar to the general approach in our research here, where we use a defined routing utility in order to maximize the eventual economic gain. However, most existing multicast approaches are not applicable to marketing applications, since they do not: 1) explicitly define and employ a composite economic gain for the content generator, and 2) incorporate a model considering the interest profile of nodes toward a content/product. The approaches in this dissertation attempt to address both problems.

1.5 Dissertation Objectives

The objective is to design a D2D dissemination algorithm which, for a given combination of commercial and network parameters, maximizes the total economic gain within the lowest possible dissemination delay. Commercial parameters include the revenue out of selling a product, discount cost, rebate cost, number of available coupons, coupon expiry time, and a redemption function. These parameters are inputs provided by a coupon generator. Network parameters are given based on the SWN's characteristics, such as consumers' interest profiles and their social bindings demonstrated in their mobility profiles. This work formulates the gain of dissemination from the perspective of a commercial coupon generator and proposes several Unicast and Multicast gain-aware routing algorithms which maximize a defined economic gain.

We explore two solution approaches. First, we use traditional protocol design methods for gain-aware DTN protocol design. In the second approach, we explore learning mechanisms for communication protocol design with a focus on routing layer protocol design, specifically, gain-aware DTN protocols for coupon dissemination.

1.6 Scope of Dissertation

Chapter 2 is a survey of Unicast and Multicast DTN routing strategies. We explain why such approaches are not practical in target marketing applications. Furthermore, we highlight the differences and advantages of our proposed gain-aware dissemination framework compared to the most recent routing schemes in the literature.

In Chapter 3, we propose a Gain-aware Unicast Dissemination Protocol (GDP) which routes a content to a receiver with the goal of maximizing the defined economic gain for the content provider. Here, gain of dissemination is defined as a combination of a time-based value function and cost of delivery.

In Chapter 4, we formulate the gain in a more practical and elaborate manner. We define the value of a coupon delivery as a function of the redemption probability of the coupon receiver, rather than the time of the delivery. Then, we define gain as a combination of the defined value and the cost of dissemination. We then propose an Economy Driven Content Dissemination (EDCD) routing protocol that controls D2D content dissemination with an end goal of maximizing the defined economic gain for a coupon generator.

In Chapter 5, using the same gain formulation as in Chapter 4, we introduce a Predictive Gain Utility Routing framework consisting of two routing protocols. Using gain prediction, this framework provides more structured and scalable routing approaches compared to EDCD (Chapter 4).

In Chapters 6 and 7, we present the proposed framework of evolutionary learning for protocol synthesis using probabilistic evolving state machines. We apply this framework to MAC layer protocols.

In Chapters 8 and 9, we propose reinforcement learning based, specifically Q-learning based, gain-aware dissemination mechanisms for D2D coupon dissemination.
The proposed online learning framework is expected to be robust in diverse and dynamic mobility environments.

In Chapter 10, we present the summary and future work, with a discussion on how the mechanisms presented in this dissertation may open new directions in the area of commercial content dissemination in SWNs. Figure 1-2 provides a pictorial representation of the presented work.

[Figure 1-2: Scope of the dissertation. The figure shows a taxonomy of the presented work: gain-aware D2D routing in DTNs splits into a time-based gain formulation (the unicast GDP protocol, Chapter 3), a redemption-probability-based gain formulation (multicast gain-aware dissemination without gain prediction, EDCD, Chapter 4, and with gain prediction, PGUR, Chapter 5), and learning-based protocol design (evolutionary learning for MAC protocols: ALOHA and S-ALOHA in Chapter 6, CSMA in Chapter 7; Q-learning based routing: QGR in Chapter 8 and SQGR in Chapter 9).]

2 Related Work

This chapter reviews existing DTN routing protocols that use social interaction information for route computation in Social Wireless Networks. Several Unicast and Multicast approaches are first reviewed. Then, a few approaches which are specifically suited to Device-to-Device (D2D) coupon dissemination applications are presented. These approaches embed concepts such as consumer interest, redemption probability, and forwarding incentive mechanisms. Finally, a class of existing work which uses learning-based techniques (evolutionary and reinforcement-based) to develop network routing protocols is reviewed.

2.1 Unicast

Unicast social-aware routing approaches are more extensively investigated in the literature compared to Multicast approaches. A few Unicast mechanisms can be extended to multicast by running the unicast logic for multiple destinations simultaneously. Those are referred to as unicast-based multicast routing [21].

2.1.1 Community-based Routing

Community-based routing algorithms rely on extracting information regarding nodes' affiliation to communities. These methods can be classified [22] into two categories, namely, self-reported and detected. Self-reported social bindings are not always practical, or even accurate, due to the dynamic nature of users' interests and affiliations. In detected routing, social ties are inferred at runtime based on nodes' interaction statistics. Many of these methods attempt to extract community structures prior to routing. LABEL [23] uses nodes' pairwise inter-contact time values to detect co-memberships to a community. Then, each node is labeled with its affiliation to a community.
[26] demonstrates that delivery decisions based on such communities can improve both delivery ratio and delivery cost. Diverse Routing (DR) [27] is a tunable protocol designed to cope with nodes deviating from their habitual activities. Habitual activities, reflected in community movement, help reduce the average communication delay and improve routing efficiency. When nodes deviate from their habitual activities, affecting routing performance, DR ensures that at least one of the deviant nodes can receive a copy of the packet. This is done through DR's runtime social clustering and by scattering copies of the packet into these clusters. DR supports Multicast as well as Unicast.

All of the mentioned community-based routing methods rely on the homophily principle [1], which implies that nodes with similar interests, belonging to the same community, have higher chances of meeting each other. Therefore, the main effort in this class of methods is to detect communities and route the packet to the destination through nodes which belong to the same community as the destination. The dynamic topology of SWNs and frequent changes in nodes' affiliations may affect the performance of community detection and the subsequent routing process.

2.1.2 Community-independent Routing

Community-independent routing approaches build routing algorithms around utilities without relying on information about nodes' affiliation to any community. In social-aware routing mechanisms, utility is computed based on nodes' inter-personal relationships; content carriers with higher utility are more fit to perform the content dissemination task. SimBetTS [28], for example, defines utility as a weighted combination of three social utilities: similarity, betweenness, and tie-strength. Two nodes' similarity level is a function of the number of common peers they are expected to meet in the future. Betweenness for a node is calculated based on the number of nodes in the network that are indirectly connected through that node. Tie-strength between two nodes is calculated based on their meeting frequency, closeness, and freshness of meetings. The PROPHET family of protocols [29] uses the history of inter-node encounters and transitivity to compute a delivery predictability metric based on which forwarding decisions are made. The delivery predictability of a node captures the frequency with which the node meets the destination, directly or transitively through other nodes. When a node carrying a content meets another node, the content is forwarded to the latter only if it has a higher delivery predictability (to the destination) than the node currently holding the content. PROPHET implicitly attempts to balance delay and forward-count in a best-effort manner. However, according to [30], protocols such as PROPHET often face a fundamental challenge in choosing a time scale for delivery predictability estimation that is consistent with the underlying mobility, which causes a certain performance loss. The authors in [30] derive mechanisms to dynamically determine the time scale of PROPHET's delivery predictability based on mobility, and show that when the time scale matches the mobility, the routing performance indeed improves.
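For concreteness, the delivery predictability bookkeeping used by PROPHET-style protocols can be sketched as follows. This is only an illustrative sketch of the update rules summarized above; the constants P_INIT, BETA, and GAMMA and the class layout are placeholders rather than values mandated by [29] or [30].

```python
# Illustrative sketch of PROPHET-style delivery predictability maintenance.
# P_INIT, BETA, GAMMA are tuning constants; the values below are placeholders.
P_INIT, BETA, GAMMA = 0.75, 0.25, 0.98

class ProphetNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.pred = {}  # delivery predictability toward every known destination

    def age(self, elapsed_units):
        """Decay all predictabilities when time passes without encounters."""
        factor = GAMMA ** elapsed_units
        for dest in self.pred:
            self.pred[dest] *= factor

    def on_encounter(self, peer):
        """Update predictabilities when this node meets 'peer'."""
        # Direct update for the encountered node.
        p_old = self.pred.get(peer.node_id, 0.0)
        self.pred[peer.node_id] = p_old + (1.0 - p_old) * P_INIT
        # Transitive update: meeting 'peer' also raises the predictability
        # toward the destinations that 'peer' meets often.
        for dest, p_peer_dest in peer.pred.items():
            if dest == self.node_id:
                continue
            p_old = self.pred.get(dest, 0.0)
            self.pred[dest] = p_old + (1.0 - p_old) * self.pred[peer.node_id] * p_peer_dest * BETA

    def should_forward(self, peer, dest):
        """Forward a message addressed to 'dest' only if the peer is a better carrier."""
        return peer.pred.get(dest, 0.0) > self.pred.get(dest, 0.0)
```

The comparison in should_forward mirrors the forwarding condition described above; the time-scale issue raised in [30] enters through how often the aging step is invoked and how the decay constant is chosen.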
MaxProp [31] uses prioritized packet transmissions and packet drops (when buffers are low on storage) based on information collected via system-wide acknowledgements about path failure likelihood and hop-lists, which record previous intermediate recipients. Fresh packets, which are packets with a hop count lower than a threshold value, are given transmission priority. Extra copies of delivered packets are removed from the involved nodes' buffers as an acknowledgment is broadcast in the network. Both MaxProp and PROPHET use human mobility patterns to predict the best forwarders for achieving a higher delivery ratio and lower forwarding cost.

Encounter Based Routing (EBR) [32] aims to maximize delivery ratio while minimizing the number of replicas and the delivery delay. The basic idea is that nodes that frequently meet other nodes are better carriers than those that meet other nodes infrequently. Each node maintains an Encounter Value (EV), which represents the node's past rate of encounters with other nodes and is formulated as an exponentially weighted moving average. When two nodes meet, the number of replicas of the content that are exchanged is proportional to the EVs of the nodes. The authors in [32] show that this mechanism amounts to a quota-based system in which the total number of copies of a content does not depend on the number of network nodes; instead, it depends on a preset quota. They also show that encounter-based forwarding implicitly minimizes both the delivery delay and the number of content forwards, although the protocol's performance is subject to change based on the initial number of copies of a message.

The protocol Selectively Making pRogress Towards delivery (SMART) [33] defines companions of the destination, which are nodes that frequently meet the destination. In the spray phase of SMART, a node forwards the packet only to companions of the destination. In its second phase, companions of the destination forward the packet to other companions until it is delivered to the destination. SMART exploits nodes' interaction information when forwarding messages; thus it does not assume that a node knows other nodes' mobility patterns. The protocol attempts to minimize both delay and forwarding cost by forwarding packets to companions of the destination. Neither EBR nor SMART employs any explicit mechanism for gain maximization, as this thesis sets out to accomplish. Another category of routing protocols assumes the presence of regularity in human mobility. For example, Nazir et al. [34] assume that people follow similar mobility patterns daily (i.e., Monday to Friday). Their scheme is designed specifically to reduce end-to-end delay and increase delivery performance for time-critical applications. Community-independent methods are lightweight to implement compared to community-based methods, since they do not require a community detection scheme to run in parallel with the routing framework. However, a warm-up period is required for the nodes to build their mobility profiles, including the information regarding their social ties.

2.2 Multicast

There is less coverage of social-aware multicast DTN routing in the literature [22]. In [21], relay selection for multicast is formulated as a unified knapsack problem by extracting node centrality and community features, based on which a social weight metric is built. Each node maintains a path to each multicast destination with the maximum social weight. This approach reduces the number of used relays and thus addresses the forwarding cost issue. Ning et al. [35] propose an incentive-based forwarding scheme which rewards the last-hop relay node.
Every node computes its Effective Interest Contact Probability (ECIP) for each 18 data category, which represents the probability that this node contacted a node interested in the corresponding data category either directly or indirectly. When two nodes meet, they exchange data messages based on computed ECIPs such that they maximize their own expected credit. This approach is restricted in the sense that a node without any messages is unable to receive any data item from other nodes it encountered, unless the data item is of its interest. The publish/subscribe Multi-Receiver Incentive Based Dissemination (MuRIS) [36] develops a hop-count based multicast utility to achieve minimum-hop multicast dissemination. During the warmup period or after each idle period, a publisher broadcasts probe messages labeled as a particular data category that this publisher will publish. Every subscriber, upon receiving this probe message, sends a receipt message containing the path information carried by the corresponding probe message to all relay nodes on the recorded path. Many existing multicast schemes (e.g., MuRIS [36]) face scalability issues as packets need to contain the address of every node on the path from a publisher to a subscriber. A category of flooding-based methods are designed based on Epidemic routing [37]. One such example, [38], analytically determines how a service provider can allocate its bandwidth optimally to make the content at users as “fresh” as possible. It defines an optimization problem around a global fairness objective (namely, maximizing the aggregate utility over all users) and solves it applying a gradient descent. [38] also specifies a condition under which the system is highly scalable. Other than the scalability issue mentioned for Multicast approaches, most Unicast and Multicast approaches are not fit to coupon dissemination applications because they: a) lack the abstraction of economic gain for content generator; b) they do not incorporate a model of interest profile, and c) they rely on a-priori knowledge about specific destination node identifiers. The 19 latter is not feasible in practical coupon dissemination applications in which the coupons are delivered to nodes based on their dynamic interest levels and redemption chances instead of the static identities. In the following section, we introduce existing frameworks which are more fit to coupon dissemination applications in the sense that they focus on embedding consumers’ interest profile and incentive mechanisms as parts of their system model and solution framework. 2.3 Interest-aware and Incentive-based Mechanisms For a content dissemination method to be applicable for commercial applications, consumers’ interest profiles (e.g., interest type, interest level) should be incorporated into the content dissemination framework. In other words, nodes are identified with interest types/levels. Also, an effective incentive mechanism should compel nodes to contribute to message forwarding in such a way that the overall cost (e.g., delay, number of hops) is minimized. Interest-Based (IB) routing [39] uses an m-dimensional interest profile. Similarities between interests are measured using the cosine similarity metric. Forwarding is done to the relay nodes, which have similar interest profile(s) to that of the destination. Here, a destination must be known a-priori. In the social-aware interest-cast mechanism SCORP [20], nodes are assigned binary interest levels (i.e., 0 or 1) toward different items. 
A utility of social weight is used for capturing a node’s contact durations with the nodes interested in a type of content. Nodes with higher social weight are considered better forwarders. SCORP improves delivery rate and delivery cost. The binary interest level used in SCORP’s model does not match practical business models with non-binary consumer’s interests for a coupon type. 20 𝑃H -coupon [17] proposes a probabilistic coupon dissemination method focusing on promptness and privacy. Users are not identified with interest profiles. Thus, this approach does not attempt to route a coupon to destination(s) with specific interest. Instead, it assumes all users are potentially interested in owning a coupon. A coupon receiver becomes an owner of the coupon based on a defined probability. Each intermediate node on the forwarding chain receives some portion of the total associated reward (designated by the coupon generator) based on its rank in the forwarding chain. 𝑃H -coupon uses Bluetooth Service Discovery Protocol (SDP) toolkit for Unicast coupon exchange which does not require pairing, thus, provides prompt coupon exchange. MobiCent [40] is an incentive compatible scheme proposed for replication based DTN routing protocols. It works on top of existing routing protocols to ensure that selfish actions do not result in larger rewards. Using Multiplicative Decreasing Award (MDR), MobiCent provides an incentive compatible payment mechanism to cater to clients that want to minimize either payment or data delivery delay. The protocol is also capable of identifying and reacting to DTN related attacks such as edge insertion attack and edge hiding attack. MobiCent treats the routing protocol as a black-box and is independent of the specific algorithm used. Authors choose Epidemic [37] as the underlying routing algorithm to evaluate MobiCent’s performance. MobiCent requires a trusted third party to verify and manage payment transactions. Another method RELICS (innetwork ReaLization of Incentives to Combat Selfishness) [41] introduces in-network-realization of incentives, in which it does not pay nodes virtually, but rewards them in a physically realizable way. RELICS assigns a rank to each node based on node’s reciprocity of service. That is, a node obtains a higher rank if it forwards messages originated from others more frequently. Message forwarding is then prioritized based on the message generator’s rank. RELICS, not a routing 21 protocol itself, can be mounted on top of any routing protocol to facilitate energy- and incentiveaware message forwarding. COUPONS [19] introduces a multilevel incentive mechanism for coupon dissemination in mobile networks. The incentive mechanism rewards intermediate nodes in a forwarding chain equally or based on a geometric series (See Section 1.3.4). The applied dissemination mechanism is not social-aware. However, based on feedbacks from the network, including acknowledgements and node density, COUPONS controls the rate of coupon duplication and forwarding. COUPONS improves delivery ratio similar to 𝑃H -coupon. These methods are fit to coupon dissemination applications in the sense that they address privacy issues regarding D2D dissemination, as well as encouraging nodes’ cooperation (with embedded incentive mechanisms). But they do not formulate nodes’ interest levels and redemption probability. SocialCast [42] defines a system model embedding nodes’ interest profiles. 
It then defines a combined node utility based on 1) a node’s colocation with another node sharing the same interest type, and 2) change of degree for a node. SocialCast uses Kalman Filter [42] to predict nodes’ utility. The higher a node’s utility is, a better content carrier it is. SocialCast is shown to improve delivery ratio and latency compared to a non-prediction based method. The system model introduced in [16] is closer to coupon dissemination business models as it considers consumers’ redemption probabilities. Its forwarding logic assumes that each node knows its neighbors’ redemption probabilities and topological degrees. It defines a social-aware incentive mechanism such that a node’s gain (acquired incentive) is a function of the redemption probability of the nodes to which it forwards the coupon. Therefore, users implicitly maximize advertisers’ interest by maximizing their own interest. This method is tested on a synthesized contact network 22 with a known redemption probability distribution, rather than a wireless social network with node mobility. Social-Aware Networking (SANE) [43] explicitly considers user interest information. It uses an interest profile, and similarities between interests are measured using the cosine similarity metric. Forwarding is done to the relay nodes, which cosine similarity of their interest profile(s) to that of the destination’s is larger than a relevance threshold. The parameter relevance threshold needs to be adjusted experimentally. SANE attempts to take efficient forwarding decisions using utility prediction. However, it does not use a structured learning framework to perform this task. Next section addresses the application of learning in protocol evolution on different network layers. 2.4 Communication Protocol Evolution Communication network protocols are designed based primarily on heuristics and past experience of the designer. While this works well for simple networks with homogeneous and prior known operating conditions, often this approach does not scale well for heterogeneous and dynamically changing networks. This motivates a field of network protocol design using evolutionary mechanisms that can adapt with heterogeneity and changing operating conditions, thus avoiding redesigns with changing conditions. Communication protocol evolution is a tool that can be used by a collection of network entities to synthesize a set of rules as protocol components for a given network and loading conditions. Parameter evolution, as a rudimentary form of protocol evolution, allows the evolution of one or multiple parameters of an already-designed protocol logic. An example of parameter evolution can be found in [3] where Epidemic Routing [3] is used as the target protocol for which forwarding probability is selected as the evolvable parameter. The 23 authors use Genetic Algorithms (GA) to evolve the parameter forwarding probability for maximizing fitness. While being useful for logic optimization, parameter evolution itself does not serve the purpose of logic/protocol synthesis. The anycast evolutionary-based routing proposed in [44] defines a chromosome as a set of randomly selected possible paths. Using Genetic Algorithms (GA), it evolves more fit set of routes (e.g., with lower delivery delay). Simulation results show that the proposed GA-based method can reduce the average delay compared to the shortest path algorithm. 
This approach, however, faces challenges when used in highly dynamic networks (such as SWNs) where paths between sources and destination(s) change frequently. [45] finds a routing solution for flying Unmanned Aerial Vehicles (UAVs) with mutual sensing and communication capability. The proposed fuzzy-based ant colony routing strategy computes local heuristic information based on the crowd density of peers around a source node and the relative velocity between a source and its peers. Combining the locally acquired information and ant pheromone, a routing strategy is derived which improves delivery probability and average latency compared to a few existing methods, including Epidemic [37] and Direct Delivery [46].

Genetic Programming (GP) is widely applied for program evolution [4]. GP contributes to the evolution of a set of instructions (i.e., implementing algorithms) as genes, rather than a set of parameters as done in [3]. [6] is an example of using GP for protocol evolution, in which the evolution of the IEEE 802.11 contention window behavior is explored. It was reported that the types of evolved MAC contention window behaviors are less complex than the well-known exponential back-off. The methodology of [6] does not apply to the whole MAC functionality; instead, it aims to optimize MAC protocol performance by evolving the contention window behavior. Very few studies investigate the use of evolutionary techniques in developing routing protocols for DTNs, and specifically for SWNs. One reason may be that SWNs usually involve extremely dynamic human mobility patterns, while evolutionary methods generally require a convergence period to adapt to environmental changes. The feasibility of evolving content dissemination protocols in SWNs in an online manner can be considered an open issue.

2.5 Reinforcement Learning based Approaches

Reinforcement learning (RL) has been shown to address routing challenges caused by the dynamicity of wireless networks [47]. Using RL, nodes attempt to learn a defined quality metric (e.g., buffer occupancy, distance to destination, etc.) using information they collect locally from their encounters. The goal is to maximize a reward associated with the defined quality metric. A Q-learning based routing scheme, namely Q-routing, was first introduced in [48]. Simple experiments on a 36-node, irregularly connected network showed Q-routing to be superior to a non-adaptive shortest-path routing algorithm in terms of adaptation and maintaining efficient routing in the presence of network dynamicity, such as varying network load. Delay Tolerant Reinforcement-Based (DTRB) [49] is a flooding-based routing solution for 802.11 wireless networks. It utilizes Multi-Agent Reinforcement Learning techniques to learn about routes in the network and forward/replicate messages that produce the best reward. Reward in DTRB's framework is associated with nodes' distance to the destination(s). It is shown that DTRB improves delivery ratio, especially in densely populated areas. Adaptive Reinforcement-Based Routing (ARBR) for DTNs [50] uses a Collaborative Reinforcement Learning (CRL) technique to enable a group of nodes to make cooperative forwarding decisions that can increase the delivery ratio and reduce the number of transmissions and the message delivery time. The quality metric in this learning framework is associated with nodes' mobility statistics, congestion, and buffer occupancy.
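All of these RL-based schemes build on the standard tabular Q-learning update. The sketch below is a generic illustration of that update; the state and action abstractions, the learning rate ALPHA, the discount GAMMA_RL, and the exploration rate EPSILON are placeholders and are not parameters taken from [48]-[50].

```python
import random
from collections import defaultdict

# Generic tabular Q-learning of the kind underlying Q-routing style protocols.
ALPHA, GAMMA_RL, EPSILON = 0.1, 0.9, 0.1   # illustrative placeholder values

q_table = defaultdict(float)               # (state, action) -> estimated long-term reward

def choose_action(state, actions):
    """Epsilon-greedy selection over the available actions (e.g., candidate next hops)."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: q_table[(state, a)])

def update(state, action, reward, next_state, next_actions):
    """One-step Q-learning update after observing 'reward' and the next state."""
    best_next = max((q_table[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + GAMMA_RL * best_next
    q_table[(state, action)] += ALPHA * (td_target - q_table[(state, action)])
```

In a routing context, the reward would encode the protocol's quality metric (e.g., remaining distance, congestion, or, in the later chapters of this dissertation, the defined economic gain).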
Neither DTRB nor ARBR [15] is interest-aware, and their quality metrics differ from the defined gain used in the approaches proposed in this dissertation.

2.6 Summary

Most DTN Multicast and Unicast routing approaches in the literature attempt to optimize individual performance indices such as message delay [37], forwarding cost, and storage. None define composite cost factors by combining multiple such indices into a single gain index, and none of the existing methods are applicable to commercial dissemination applications, since they do not formulate the notion of an economic gain of dissemination for a coupon provider. Throughout this dissertation, we introduce a Gain Utility to control D2D coupon dissemination with the end goal of maximizing the economic gain for a coupon generator.

Most existing approaches are not applicable to marketing purposes since they do not model consumers' interest and redemption chance for specific types of content. Among those which do define consumers' interest (e.g., SCORP [20]), binary interest levels (i.e., 0 or 1) are used. According to existing business models [11], a consumer's interest level toward an item is neither binary nor static. Most routing protocols rely on a-priori knowledge about specific destination node identifiers (e.g., [39]). This is not feasible in practical coupon dissemination applications, in which coupons are delivered to nodes based on their dynamic interest levels and redemption chances rather than their static identities. Methods which require packets to record the whole forwarding chain between a source and a destination (e.g., [17]) are less scalable. Furthermore, from a privacy preservation point of view, storing the forwarding chain in a coupon exposes information about consumers' social interactions to a centralized server. To preserve a consumer's privacy, the consumer's interest profile is preferably stored locally within his/her mobile device. In the proposed model, a consumer device is allowed to share the consumer's interest profile with other devices selectively upon meeting, but is not allowed to share such information with a centralized entity.

Among learning-based approaches, D2D content dissemination protocol design using evolutionary frameworks is considered an open issue. Communication protocol design using evolutionary mechanisms provides adaptability to heterogeneity and changing operating conditions. Throughout this work, we present first steps toward evolving network communication protocols (e.g., for the Medium Access Control layer). Another group of learning-based approaches, namely reinforcement learning based ones, are not applicable to marketing applications, since they do not: 1) explicitly define and employ a composite economic gain for the content generator, and/or 2) incorporate a model considering the interest profile of nodes toward a content/product. Part of this dissertation attempts to address both limitations by embedding a learning mechanism with the goal of maximizing the defined economic gain.

3 Gain-aware Unicast Dissemination Protocol (GDP)

3.1 Introduction

Routing in Delay Tolerant Networks [51] addresses the technical issues in delivering single or multiple content items in heterogeneous networks that may lack continuous network connectivity. When no end-to-end path is present, intermediate nodes take custody of the data being transferred whenever the opportunity arises.
To choose the most appropriate content forwarders, majority of routing protocols attempt to explicitly minimize one of the DTN routing indices such as message delay [37], forwarding cost [46], or storage [31]. Those protocols attempt to explicitly minimize any one of the above three indices at a time. From an operational standpoint, it is often necessary to strike a balance between multiple of those indices instead of targeting the best performance for a single index while neglecting the others. For instance, a routing protocol that offers excellent delay performance may require an unacceptably large number of forwarding (e.g. flooding), thus causing impractical amount of energy burden on the intermediate forwarding nodes. Similar practical problems also exist for the protocols that attempt to minimize forwarding-count or storage without considering the primary application requirement, namely, delay. Therefore, a desired DTN routing protocol should be able to strike an operational balance between the different objectives. As a first step towards this goal, we develop a composite delay-forwarding cost index, and design a DTN routing protocol around that index. Many practical mobile applications would benefit from DTN routing based on such a composite cost index that combines the effects of delay, forwarding cost, and sometimes storage. There are many applications of DTNs ranging across education, healthcare, government, interplanetary networks [52, 53], vehicular networks [54] and marketing services [19]. As a 28 marketing strategy, consider a coupon distributing application which distributes time-constrained coupons through the wireless network using users’ mobile devices. Let the value of a coupon be the amount of discount its recipient gets when she or he redeems the coupon at the store. And the economic gain of the store from disseminating the coupon is defined as the generated value upon delivery minus the forwarding costs. Therefore, for a value function that linearly decreases with time, it is desirable to deliver the coupon as early as possible with the minimum possible forwarding cost, i.e., through the fewest number of intermediate hops. In other words, the goal would be to find the right combination of delivery delay and number of forwards so that the economic gain of dissemination is maximized. In self-organizing unmanaged networks nodes cannot be expected to forward messages without considering their own energy-cost of forwarding. This may require the content distributor to pay a rebate (i.e., contributing to the forwarding cost) to each user whenever he/she forwards the content. The concept of rebate is particularly important in self-organizing social settings in which there is no central authority that can enforce nodes to store and forward content without incentives. In addition to the coupon example above, the concepts of value, rebate, and economic gain apply to a number of other contents such as advertisements, event notifications, etc. In this chapter, we propose an economic gain-aware DTN routing protocol that automatically adapts itself to different combinations of value functions and forwarding costs without requiring any explicit parameter tuning. Specific contributions are as follows. First, we design and parameterize a composite gain function that combines the effects of delay (via value) and forwarding-cost (via rebate) during unicast DTN routing. 
Second, we develop a unicast routing mechanism that yields the best possible gain for a given network and mobility pattern. Third, we develop a unicast DTN routing protocol, the Gain-aware Dissemination Protocol (GDP), that utilizes the time-dependent gain function as a forwarding decision metric in both direct and transitive manners to route content from a mobile source to a mobile destination. Finally, using the DTN simulator ONE [55], the proposed protocol is extensively characterized along with a few other existing unicast DTN routing protocols. The proposed mechanism assumes full node cooperation. In other words, it does not address issues raised in the presence of non-cooperative and colluding selfish nodes, or attacks such as Sybil [56] and whitewashing [57] attacks.

3.2 Network Model

DTN networks can be formed in public places such as a university, where students carrying mobile devices can form an ad-hoc system. Due to the periodic schedules of classes, lunch hours, and other social events, students at a university often form communities, which are groups of students with similar mobility patterns. Students taking the same set of courses or belonging to the same dormitory are examples of community formation. Consider a student X whose mobile device is in possession of an electronic coupon from a fast food restaurant on campus. The coupon was delivered to the device when student X visited the restaurant. Student X can now distribute the coupon (single-hop or multi-hop) to students of dormitory Dy through X's classmates who reside in dormitory Dy. In this section, we design a unicast DTN routing protocol, GDP, which can maximize the gain of the coupon delivery from source student X to a pre-determined destination student Y.

Consider the DTN in Figure 3-1, in which the vertices represent mobile nodes, the edges represent the occurrence of contacts between node pairs, and the weights on the edges indicate the average inter-contact time between the nodes. There are three unicast paths from source S to destination D, namely S-M-N-O-P-D, S-A-B-D, and S-X-D. The delivery latencies from S to D across these paths follow the pattern S-M-N-O-P-D << S-A-B-D << S-X-D. This is because the nodes on the path S-M-N-O-P-D meet each other frequently, whereas the nodes on the path S-X-D meet each other very infrequently. When the value obtained by delivering the content to D does not depend on the delivery time and the forwarding cost (i.e., the rebate given to the forwarding nodes) is high, it is preferable for S to hold on to the content and deliver it to D along the path S-X-D so that the number of forwards is minimized. On the other hand, if the value of the content falls rapidly with time and the forwarding cost is low, it is preferable for S to forward along the path S-M-N-O-P-D so that the delivery delay is minimized.

Figure 3-1: Example user interaction and DTN routing scenario

3.3 Problem Formulation

Value: The value of content is a monotonically non-increasing function of time. For example, the value of an electronic discount coupon delivered over a phone-to-phone DTN network is expected to be at its maximum in the beginning and zero after its expiry time, which is termed the value duration. The shape of the function (e.g., linear, exponential, step, etc.) is set by the content originator.
Value can be abstracted for other time-critical electronic content including short-cycle advertisements, warning/alert messages, and peer-to-peer auction services. A very similar value abstraction can be applied to a short-cycle advertisement being distributed by a merchant such as a department store. The objective is to deliver the advertisement to a group of interested destinations, such as students on a campus. A price-aware algorithm should ideally manage to deliver the content to an interested person within the defined expiration period; earlier delivery induces a higher value. The value function in this example would be a step function with an initial amplitude that becomes zero at the time of expiry. Another embodiment of value would be the distribution of time-bound content within a peer-to-peer social network. Consider an example in which a person posts a content item which should be peer-to-peer distributed among a group of interested receivers within a certain period of time determined by the lifetime of the content (e.g., the lifetime of a photo in Snapchat; http://www.snapchat.com/). The value function of such a content in this application is determined by the lifetime set by its originator. A simple linearly declining value function can be written as:

$$V(t) = \begin{cases} \text{undefined}, & t < t_g \\ V_0 - \alpha\,(t - t_g), & t \ge t_g \end{cases} \qquad (3\text{-}1)$$

where $t_g$ is the time of content injection, $V_0$ is the initial value, $\alpha$ is the value gradient, and $V(t) \ge 0$ for all $t$.

Cost: The forwarding cost at time $t$ can be expressed as:

$$C_F(t) = C_R \cdot N_F(t) \qquad (3\text{-}2)$$

where $N_F(t)$ is the number of forwards up to time $t$, and $C_R$ is a fixed cost per forward, which is the rebate or reward given to the forwarding users. The practical formulation of $C_R$ depends on the specific target application. For a coupon application, $C_R$ can represent a credit, possibly a fraction of the coupon value [19], which is given to each forwarding node as an incentive for energy-expensive content forwarding. Other formulations of $C_R$ are possible for specific applications.

Gain: Combining value and cost, the gain can be written as:

$$G(t) = V(t) - C_F(t) \qquad (3\text{-}3)$$

Gain can be negative, meaning that the overall benefit of a content delivery can be negative when an expired content is delivered or when the overall forwarding cost is higher than the value obtained upon delivery. Protocols which are unaware of the notion of gain risk delivering the packet with a negative gain, i.e., a worthless delivery.

Routing Problem: For a given value function with specified $V_0$ and $\alpha$, cost factor $C_R$, and network mobility, the objective is to design a unicast DTN routing algorithm that maximizes the gain $G(t)$ at content delivery.
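The formulation above maps directly into a few lines of code. The sketch below evaluates the linear value function and the resulting gain of Equations (3-1)-(3-3); the numbers in the example call are illustrative only and are not taken from the experiments reported later in this chapter.

```python
def value(t, t_g, v0, alpha):
    """Linearly declining value function of Eq. (3-1); undefined before injection."""
    if t < t_g:
        raise ValueError("value is undefined before the content injection time")
    return max(v0 - alpha * (t - t_g), 0.0)   # V(t) never goes below zero

def forwarding_cost(n_forwards, c_r):
    """Cumulative forwarding cost of Eq. (3-2): a fixed rebate C_R per forward."""
    return c_r * n_forwards

def gain(t, t_g, v0, alpha, n_forwards, c_r):
    """Gain of Eq. (3-3); it can be negative for late or forwarding-heavy deliveries."""
    return value(t, t_g, v0, alpha) - forwarding_cost(n_forwards, c_r)

# Illustrative numbers only: V0 = 1, value duration 20,000 s (alpha = 1/20,000),
# per-forward rebate 0.05, delivery after 4,000 s over 3 forwards.
print(gain(t=4000, t_g=0, v0=1.0, alpha=1.0 / 20000, n_forwards=3, c_r=0.05))  # ~0.65
```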
3.4 Gain Upperbound

To investigate the gain performance of the experimented DTN routing protocols (including that of GDP), it is essential to know the gain upper bound for a given network mobility and content value profile. To this end, we developed a strategy to find the gain upper bound when all system parameters, including node mobility, are known a priori. In other words, we attempt to find the maximum gain a packet can achieve in a given scenario, so that this upper bound can be compared with the gain that other protocols, including GDP, can obtain. A full flooding approach can be used for: a) finding all possible routes, b) computing the gain via such routes, and c) choosing the one with maximum gain. This cannot be implemented (even in simulation) because of the non-polynomial storage complexity of content copies stamped with all possible routes during flooding. A scalable Routing with Gain Aware Pruning (RGAP) for finding the performance upper bound is devised as follows. Note that the role of RGAP is much like the Oracle-based DTN routing protocols [46] that are able to provide the best-performing forwarding when the node mobility is known a priori.

In RGAP, an intermediate node $i$ maintains a variable $l_i$ which represents the Minimum Loop-free Forwarding Count (LFC) among the paths taken by all copies of the content received by node $i$. The concept is illustrated in Figure 3-2. On the left, since there is no forwarding loop, $l_i$ in this case is 3. In the middle, the number of forwards is larger than the loop-free forwarding count, which is 2. The right-hand case shows that additional looping does not change $l_i$. Up to a given time, if node $i$ received three copies via those forwarding scenarios, then $l_i$ at that time will be 2, which is the minimum of the three. Before receiving the first copy, $l_i$ is set to a large value at each node $i$. Each copy maintains a field $n_{LFC}$, representing the LFC encountered by that copy so far. Before forwarding its copy, node $i$ updates the field as $n_{LFC} = l_i + 1$. When a node $j$ receives a copy, if for that copy $n_{LFC} < l_j$, then $l_j$ is set to $n_{LFC}$ and the copy is subsequently flooded. If $n_{LFC} \ge l_j$, the received copy is discarded. The destination maintains a variable $G_{max}$ which at a given time represents the maximum of the gains obtained from the individual copies received so far. Upon receiving a copy, the destination computes (using $V(t)$ and $C_R$) a temporary gain $G_{tmp}$. If $G_{tmp} > G_{max}$, then $G_{max}$ is replaced by $G_{tmp}$.

Proof of Gain Bound: The objective of RGAP is to prune out routes (from all possible end-to-end routes resulting from flooding) which are guaranteed not to yield a higher gain than the current maximum gain at the destination. The pruned routes are the ones that have both a higher delay and a higher number of forwards than the route corresponding to the current maximum gain at the destination. When an intermediate node $j$ receives a copy with $n_{LFC} \ge l_j$, the path followed by that copy can be pruned because it results in both higher delay and higher forwarding count, and hence lower potential gain at the destination. This ensures that, for given mobility and value/cost profiles, within the time until the value becomes zero (i.e., the value duration), $G_{max}$ of RGAP monotonically reaches the maximum possible gain at the destination.

Figure 3-2: RGAP for finding the performance upper bound
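The pruning bookkeeping described above can be summarized in a short sketch. The class below is a simplified placeholder (the actual evaluation runs inside the ONE simulator); it only illustrates the LFC-based prune decision and the running gain maximum at the destination.

```python
import math

class RgapNode:
    """Minimal illustration of RGAP's loop-free forwarding count (LFC) pruning."""
    def __init__(self, node_id, is_destination=False):
        self.node_id = node_id
        self.is_destination = is_destination
        self.l_i = math.inf        # minimum LFC over all copies received so far
        self.g_max = -math.inf     # best gain observed so far (used at the destination)

    def receive_copy(self, n_lfc, gain_of_copy):
        """Return True if the copy is kept and re-flooded, False if it is pruned."""
        if self.is_destination and gain_of_copy > self.g_max:
            self.g_max = gain_of_copy          # track the running maximum gain
        if n_lfc < self.l_i:
            self.l_i = n_lfc                   # tighter loop-free forwarding count
            return True                        # keep flooding this copy
        return False                           # prune: both delay and forwards are worse

    def stamp_before_forward(self):
        """Value written into the copy's LFC field when this node forwards it onward."""
        return self.l_i + 1
```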
3.5 Gain-aware Dissemination Protocol

The Gain-aware Dissemination Protocol (GDP) is a heuristic-based approach driven by the key idea of forwarding content through a path with low inter-node contact intervals, so that the expected gain at the destination is maximized. GDP is a history-based protocol that observes locality in node interaction patterns and develops utilities based on it for gain-aware forwarding.

3.5.1 State Variables

Each node $i$ maintains two variables: 1) $n_i$, the expected number of hops to the destination on an estimated route which yields the maximum gain, and 2) $\Phi_i$, the expected delivery latency for the same route. Nodes exchange $n_i$ and $\Phi_i$ via periodic Hello messages. When node $i$ meets a destination node $j$, it initializes $\Phi_i$ to $\Delta_{i,j}$, which denotes the mean inter-contact time between nodes $i$ and $j$, and it initializes $n_i$ to 1. When node $i$ meets any node $j$ (i.e., node $j$ is a destination or a non-destination), both nodes update the quantities $t_{i,j}$ and $\Delta_{i,j}$, which are the meeting time and the mean inter-contact time between nodes $i$ and $j$ respectively. The latter adaptively captures any locality in the inter-contact pattern. Table 3-1 summarizes the notations used in this section and their definitions.

Table 3-1: Notations and corresponding definitions used in GDP
- $n_i$: expected number of hops to the destination on an estimated route which yields the maximum gain
- $\Phi_i$: expected delivery latency to the destination on an estimated route which yields the maximum gain
- $t_{i,j}$: meeting time of nodes $i$ and $j$
- $\Delta_{i,j}$: mean inter-contact time between nodes $i$ and $j$
- $U_i$: expected gain by delivering a content generated by node $i$
- $V(\Phi_i)$: generated value after a delivery latency $\Phi_i$
- $C_R$: compensation given for each forward along the route to the destination
- $t_g$: time of content generation and injection into the network
- $g_j$: expected gain by forwarding the content to node $j$, when node $i$ meets an intermediate node $j$ at time $t_c$ ($t_g < t_c$)

3.5.2 Updating State Variables

The expected gain by delivering a content generated by node $i$ can be written as

$$U_i = V(\Phi_i) - n_i \cdot C_R \qquad (3\text{-}4)$$

where $V(\Phi_i)$ is the generated value after a delivery latency $\Phi_i$, and $C_R$ is the compensation given for each forward along the route to the destination. Let us now consider a node $i$ that possesses a content and meets node $j$. If the latter is the destination, then $n_i$ and $\Phi_i$ must be updated only if $U_i < V(\Delta_{i,j}) - C_R$. If that is true, node $i$ updates $n_i$ to 1 and $\Phi_i$ to $\Delta_{i,j}$. If node $j$ is not the destination, node $i$ checks whether it is profitable to use node $j$ as an intermediate carrier to the destination. If so, $n_i$ is updated such that it accounts for the one additional intermediate hop between node $i$ and the destination through node $j$, and $\Phi_i$ is updated to take into account both the average inter-contact time between nodes $i$ and $j$ and the expected delivery latency from node $j$ to the destination. Thus, if $U_i < V(\Delta_{i,j} + \Phi_j) - (1 + n_j) \cdot C_R$, node $i$ updates $n_i$ to $1 + n_j$ and $\Phi_i$ to $\Delta_{i,j} + \Phi_j$.
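The update rule of Section 3.5.2 can be captured in a compact sketch. The value function and the fields mirror the notation above; the class layout itself is only an illustration and not part of the GDP specification.

```python
class GdpState:
    """Per-node GDP state: expected hop count n_i and expected latency Phi_i."""
    def __init__(self, value_fn, c_r):
        self.value_fn = value_fn    # V(.): value remaining after a given latency
        self.c_r = c_r              # per-forward compensation C_R
        self.n_i = None             # expected hops on the best estimated route
        self.phi_i = None           # expected delivery latency on that route

    def expected_gain(self):
        """U_i = V(Phi_i) - n_i * C_R, or -inf while no route estimate exists (Eq. 3-4)."""
        if self.n_i is None:
            return float("-inf")
        return self.value_fn(self.phi_i) - self.n_i * self.c_r

    def update_on_meeting_destination(self, delta_ij):
        """Direct-delivery estimate: one hop, latency equal to the mean inter-contact time."""
        if self.expected_gain() < self.value_fn(delta_ij) - self.c_r:
            self.n_i, self.phi_i = 1, delta_ij

    def update_on_meeting_relay(self, delta_ij, n_j, phi_j):
        """Transitive estimate through node j: one extra hop, latency Delta_ij + Phi_j."""
        candidate = self.value_fn(delta_ij + phi_j) - (1 + n_j) * self.c_r
        if self.expected_gain() < candidate:
            self.n_i, self.phi_i = 1 + n_j, delta_ij + phi_j
```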
3.5.3 Forwarding Logic

When a node in possession of the content meets the destination, it makes a delivery only if the remaining value of the content is larger than the forwarding compensation $C_R$. Using the constructs $n_i$ and $\Phi_i$, a node chooses the best forwarding entity from among its social peers, who may or may not currently be in direct contact. Consider a node $i$ which is in possession of a content that was generated and injected into the network at time $t_g$. When node $i$ meets an intermediate node $j$ at time $t_c$ ($t_g < t_c$), the expected gain from forwarding the content to node $j$ can be written as:

$$g_j = V\big((t_c + \Phi_j) - t_g\big) - n_j \cdot C_R \qquad (3\text{-}5)$$

The expected gain from node $i$ holding on to the content is $g_i$. Every time node $i$ meets an intermediate node $j$, in addition to computing $g_i$ and $g_j$, it also computes the expected gain from forwarding the content through nodes that it met in the recent past but that are not currently (at time $t_c$) in direct contact. For such a node $k$, which was last met by node $i$ at time $t_{i,k}$, if $t_{i,k} + \Delta_{i,k} > t_c$, meaning it is not yet time for the next expected meeting, the expected gain from forwarding the content to node $k$ can be written as:

$$g_k = V\big(t_{i,k} + \Delta_{i,k} + \Phi_k - t_g\big) - n_k \cdot C_R \qquad (3\text{-}6)$$

However, if the time for the next expected meeting has passed, that is, if $t_{i,k} + \Delta_{i,k} < t_c$, the expected gain can be computed as:

$$g_k = V\big(t_c + \Phi_k - t_g\big) - n_k \cdot C_R \qquad (3\text{-}7)$$

After computing $g_i$, $g_j$ ($j \in$ all direct neighbors), and $g_k$ ($k \in$ all recent past neighbors), node $i$ first chooses the node with the highest $g$ value over all $j$ and $k$. If the node with that highest $g$ (i.e., $g_{max}$) corresponds to a past neighbor, then no forwarding is done. If it is a current neighbor, a forward to the corresponding neighbor is performed only if $g_{max} > g_i + C_R$. All the key concepts of the GDP protocol are summarized in Algorithm 3-1.

Note that a flooding/spray-based version of GDP was not considered for the following reason. The basic premise of GDP is a utility-based forwarding in which the utility is explicitly designed around the core objective of gain maximization. All the system parameters and their algebraic handling in Section 3.5 are based on this premise. Introducing spraying, which is a form of naive flooding, goes against this core effort towards gain-awareness and introduces a sense of blindness into the forwarding effort. GDP is intended to avoid this by making the protocol as utility-driven as possible.
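The complete forwarding decision can be expressed as a short routine that reuses the notation above. The neighbor dictionaries are placeholders for whatever neighbor tables an implementation maintains; this is a sketch of the decision rule, not the Algorithm 3-1 listing itself.

```python
def gdp_forward_decision(t_c, t_g, value_fn, c_r, g_i,
                         current_neighbors, past_neighbors):
    """
    Decide whether (and to whom) node i should forward, following Eqs. (3-5)-(3-7).
    current_neighbors: {j: (n_j, phi_j)} for nodes in contact at time t_c.
    past_neighbors:    {k: (n_k, phi_k, t_ik, delta_ik)} for recently met nodes.
    Returns the chosen neighbor id, or None to keep holding the content.
    """
    candidates = {}
    for j, (n_j, phi_j) in current_neighbors.items():
        candidates[j] = value_fn((t_c + phi_j) - t_g) - n_j * c_r          # Eq. (3-5)
    for k, (n_k, phi_k, t_ik, delta_ik) in past_neighbors.items():
        if t_ik + delta_ik > t_c:                                          # next meeting still ahead
            g_k = value_fn(t_ik + delta_ik + phi_k - t_g) - n_k * c_r      # Eq. (3-6)
        else:                                                              # expected meeting overdue
            g_k = value_fn(t_c + phi_k - t_g) - n_k * c_r                  # Eq. (3-7)
        candidates[k] = g_k

    if not candidates:
        return None
    best = max(candidates, key=candidates.get)
    # Forward only to a *current* neighbor, and only if it beats holding by at least C_R.
    if best in current_neighbors and candidates[best] > g_i + c_r:
        return best
    return None
```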
3.6 Performance Characterization

Using the ONE simulator [55], we compare GDP with Direct Delivery [46], PROPHET with estimation [30], Epidemic [37], Spray and Wait [58], and RGAP (i.e., the performance upper bound). Protocol performance was evaluated using a community-based synthetic mobility model and a trace-based model [59] in which mobility traces were collected from participants of the INFOCOM '06 conference.

Figure 3-3: Experimental network topology with six social cliques

3.6.1 Experiments with Clique-forming Synthetic Mobility

A 101-node network was structured into five 20-node social cliques (communities) and one single-node clique. The cliques are formed due to node mobility around six pre-positioned Points of Interest (POIs). The movement of each node is stochastically programmed by an inter-waypoint probability transition matrix. As shown in Figure 3-3, the cliques C1, C2, and C3, with 20 nodes each, are designed to be isolated in that the members of an isolated clique interact among each other but do not move out to meet nodes from other cliques. The members of the other two 20-node cliques, C4 and C5, are migratory between C1-C2 and C2-C3 respectively. Finally, the single node in C6 migrates between cliques C1 and C3. An example realization of this model would be student interactions on a campus, where the POIs represent classrooms, the food court, athletic centers, etc., and the cliques represent students from specific academic units, dormitories, etc. All results presented for this model correspond to content generated at a node in C1 and destined to a node in C3. With the above clique configuration, content needs to be routed either through the migratory cliques C4 and C5, or through the only node in the migratory clique C6. We used a constant forwarding cost $C_R$ and a linearly decreasing value function with the initial value set to 1. The maximum possible value of $C_R$ is also set to 1. The main experimental variables are $C_R$ and the Value Duration (VD), which is the time taken by the value function to reach zero. The chosen ranges for $C_R$ and VD are 0.02 through 0.18 and 1,000 s through 20,000 s respectively. Low VDs indicate a faster reduction of the content's value with time. The reported gain is averaged over ten different source-destination pairs with the source in clique C1 and the destination in clique C3.

3.6.1.1 Protocol State Warm-up

Before content is injected into the network, the simulation is run for a warm-up period during which the protocol state variables introduced in Section 3.5.1 are allowed to stabilize. Figure 3-4 shows the evolution of the protocol state variable $\Delta_{i,j}$ (i.e., the expected inter-contact time) for a number of arbitrarily chosen node pairs in a representative simulation run.

Figure 3-4: Warm-up time based on inter-contact time evolution

In this particular case, the warm-up period is chosen to be 15,000 s, by which time the state variable $\Delta_{i,j}$ for all node pairs was observed to have stabilized. Content injection and routing operation were initiated at this point. The warm-up period of 15,000 s was experimentally justified after observing the stabilization of $\Delta_{i,j}$ for a large number of node pairs (in this case 11). In our experiments, the inter-contact time was found to be in the range of 1 s to 7,000 s, representing the most and least frequently interacting node pairs respectively.

Figure 3-5: Example contact statistics of a C4 node

3.6.1.2 Steady State Contact Statistics

Figure 3-5 depicts example node interaction statistics as CDFs of the inter-contact time (ICT) between a node from the migratory clique C4 and all other network nodes. In Figure 3-5:a, the clusters of CDF lines indicate the formation of ICT localities due to interaction affinities that GDP can leverage for gain-maximizing dissemination. Figure 3-5:b and Figure 3-5:c show the PDFs of ICTs between a node in C4 and nodes in C1 and C6 respectively. The results in Figure 3-5 clearly indicate that the C4 node's social interactions with nodes from C1 and C2 are stronger than with those from C5 and C6. This implies that, to route content to a node in social clique C4, nodes from C1 and C2 would serve as better next hops compared to those from C5 and C6.
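GDP's warm-up behavior depends on each node maintaining the mean inter-contact time $\Delta_{i,j}$ from its encounter history. The protocol description does not prescribe a particular estimator; the sketch below uses a simple incremental mean as one possible realization (an exponentially weighted average would be an equally plausible assumption).

```python
class InterContactEstimator:
    """Maintains per-peer mean inter-contact times Delta_{i,j} from observed meetings."""
    def __init__(self):
        self.last_meeting = {}   # peer id -> time of the last encounter
        self.mean_ict = {}       # peer id -> running mean inter-contact time
        self.samples = {}        # peer id -> number of observed inter-contact gaps

    def on_encounter(self, peer, now):
        """Record an encounter with 'peer' at simulation time 'now'."""
        if peer in self.last_meeting:
            gap = now - self.last_meeting[peer]
            n = self.samples.get(peer, 0) + 1
            prev = self.mean_ict.get(peer, 0.0)
            self.mean_ict[peer] = prev + (gap - prev) / n   # incremental mean update
            self.samples[peer] = n
        self.last_meeting[peer] = now

    def delta(self, peer):
        """Current estimate of Delta_{i,peer}, or None before two encounters have occurred."""
        return self.mean_ict.get(peer)
```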
3.6.1.3 Protocol Self-tunability

The results in this subsection aim to establish the ability of the proposed GDP to achieve gain closer to the RGAP upper bound (see Section 3.4) than PROPHET. They also establish that, unlike PROPHET, GDP can self-tune its operation for varying value and cost parameters. Figure 3-6:a shows the gain upper bound computed using the RGAP algorithm. The theoretical gain upper bound is $1 - C_R$, which occurs when content is delivered directly and immediately upon generation. Higher gains are achievable for lower $C_R$ and higher VD. Figure 3-6:b reports the performance of PROPHET when its parameters (i.e., the initialization, transitivity, and aging constants) are tuned for each {$C_R$, VD} combination to ensure the best achievable gain. It performs well only for low $C_R$ and high VD, since for high $C_R$ and low VD the content often remains undelivered, indicated by zero gain. Figure 3-6:c shows how PROPHET performs when it is tuned for one specific scenario ($C_R$ = 0.04, VD = 4,000 s). While performing well in the vicinity of the tuned-for {$C_R$, VD} combination, it shows poor, often negative, gain in other regions of the {$C_R$, VD} space. Figures 3-6:b,c indicate the need for parameter tunability depending on the mobility, value, and cost profiles. Since mobility information is not always known a priori, parameter tuning may not be feasible for PROPHET and other parameter-configurable DTN protocols. Note that when comparing gain across different protocols, more emphasis should be placed on relative gains than on absolute gains.

Figure 3-6: Gain comparison between GDP, PROPHET, and RGAP (a: RGAP; b: fully tuned PROPHET; c: PROPHET tuned for $C_R$ = 0.04, VD = 4,000 s; d: self-tuned GDP)

The absolute gains depend directly on the ratio of the initial content value $V(0)$ to the forwarding cost $C_R$; that ratio also determines the magnitude of the negative gains. Figure 3-6:d reports the performance of the proposed GDP protocol. Observe that the protocol is able to obtain gains close to the RGAP-indicated upper bound, except in a few scenarios where the forwarding cost is too high and the content value reaches zero too quickly. It is worth mentioning that the high-gain performance of GDP was obtained without parameter tuning, both across different {$C_R$, VD} combinations and across different mobility scenarios. This is mainly because the protocol adapts itself to mobility by observing the inter-contact dynamics $\Delta_{i,j}$, and to {$C_R$, VD} combinations by making forwarding decisions based on the gain-aware utility function proposed earlier.

3.6.1.4 Evolution of Gain and Latency

Figures 3-7:a,b show comparative gain and latency for the proposed GDP and the scenario-tuned ($C_R$ = 0.04, VD = 4,000 s) PROPHET for a representative run. As shown in Figure 3-7:b, since the route taken by the gain-agnostic PROPHET remains unchanged, its delivery delay remains constant for all $C_R$. GDP, on the other hand, changes route based on the forwarding cost. Initially, for low $C_R$, it forwards content through a low-delay (and high forward-count) path, but as $C_R$ increases, the route is shifted towards a higher-delay (and lower forward-count) path. Figure 3-7:a shows that, irrespective of the path shifting, GDP always maintains a high gain, which is what it is designed for. Figure 3-7:a also shows the corresponding performance of RGAP, which is very close to that of GDP. Figure 3-7:c shows the gain evolution over time for PROPHET and GDP. Generally, the gain reduces by $C_R$ upon each forward and finally rises upon delivery. PROPHET in this case delivered earlier than GDP, but due to its more aggressive forwarding, its eventual gain is smaller.
To summarize, the results in the above two subsections indicate GDP's self-tuning ability to perform gain-aware forwarding and to obtain gain values that are close to the practical upper bound observed from the RGAP mechanism.

3.6.1.5 Delay, Forwarding Cost, and Gain

Having observed the overall gain trend of GDP in the previous two subsections, in this and the following few subsections we characterize the protocol in further detail. In addition to PROPHET and RGAP, we analyze Epidemic Routing [37] and Spray and Wait (SNW) [58] in this characterization phase. For each routing protocol under consideration, we ran 100 experimental scenarios, each with a different seed for enabling different mobility and node interaction scenarios.

Figure 3-7: Comparative delivery delays and gain evolution (a: gain vs. forwarding cost; b: delivery delay vs. forwarding cost; c: gain evolution over time)

Figure 3-8 shows a scatter plot of delay versus forwarding count for the different protocols when the forwarding cost is set to 0.01 and the value duration (VD) is set to 20,000 s. Each point in the graph for a given protocol corresponds to a simulation run with a specific seed (i.e., mobility/node interaction scenario). The numbers in parentheses indicate the average Delay (D), Number of forwards (F), and Gain (G) across all 100 scenarios. Note that the cluster for RGAP in Figure 3-8 occupies the region corresponding to the lowest delay and fewest forwards, thus ensuring the best gain performance. The cluster for Epidemic Routing [37], however, indicates that while the protocol is consistently able to deliver content very early, it incurs high forwarding expenses. The cluster for Spray and Wait (SNW) [58] incurs a relatively lower forwarding cost, while its delivery delay is higher and spread over a wider range. The protocol PROPHET [30] has a similar spread in delay, but its forwarding cost is lower than SNW's, mainly due to its usage of an adaptive delivery predictability utility. The cluster for GDP also shows a large delay spread, but its forwarding cost is strictly bounded, mainly because GDP delivers a packet only if the combination of delay and forwarding cost generates a positive gain. Thus, even when the delay is large, content is delivered only if the forwarding cost is limited enough to maintain a positive gain. In this manner, GDP is able to maintain a forwarding cost bound (i.e., 4) which is similar to that of RGAP. The intersection of the GDP and RGAP clusters indicates how GDP can achieve the same performance as RGAP in a number of node interaction scenarios. Note that the delay-forwarding behavior of the protocols presented in Figure 3-8 was observed to be valid for a wider range of costs and value durations (e.g., cost: 0.02, 0.1, etc., and VD: 5,000 s, 40,000 s, etc.).

Figure 3-8: Delay vs. forward count with cost 0.01 and VD 20,000 s (averages: Epidemic D 156.5 s, F 95.4; SNW D 642 s, F 65.3; PROPHET D 1,420.9 s, F 53.8; RGAP D 229.9 s, F 4; GDP D 3,058.9 s, F 4.5)

Note that the initial number of copies for the SNW protocol was set to 723, which turned out to be the minimum number of initial copies needed for content delivery.
The reasons for the initial number of copies being that large are as follows. The adopted community-based model contains special node dynamics involving bridge nodes. For example, nodes in community C4 (refer to Figure 3-3) are bridges which receive the packet from the source in community C1 and then relay it to any of the nodes in community C2. We observed that in such scenarios it is essential for the bridge nodes to maintain enough copies of the packet, so that they can relay it to the next bridge node after sharing it with their own neighbors in a binary spraying manner. Note that a binary version of Spray and Wait is used with the community-based model. Let us now analyze an example which occurs in our community-based scenario. The static source in community C1 starts forwarding with 723 copies and shares its copies with the other 19 members of community C1 using binary spraying [58]. Afterwards, most community C1 nodes hold fewer than 90 copies of the packet. A member of community C4 visits a node in community C1 (with 90 copies) and takes half of its copies, i.e., 45. This node reaches the wait phase after forwarding the packet to 6 community C4 members. Note that a node in its wait phase switches to direct-delivery behavior. Thus, it can be seen how quickly a bridge node from community C4 enters the wait phase and loses its ability to relay the packet to community C2, which contains the bridge nodes connecting community C2 to community C5. Considering that very few contacts take place between community C1 nodes and community C4 members in the mentioned scenario, community C4 nodes have a small chance of holding enough copies to share with community C2 members. Note that community C2 members should later share their copies with community C5 members, since community C5 members are the only ones (except the single node in community C6) which meet the destination directly and can deliver the packet. The higher forwarding cost of SNW compared to PROPHET in Figure 3-8 is explained by the large number of forwards that the spraying phase in SNW incurs due to the large required number of initial copies (i.e., 723).

Figure 3-9 shows delivery delay vs. gain when the forwarding cost and value duration are set to 0.01 and 20,000 s respectively. Observe how Epidemic Routing produces a point cluster with very low delay but sometimes with negative gain. This is because Epidemic delivers content as early as possible in a forwarding cost-insensitive manner, thus maintaining a bound on the delay, but often with very low gain due to the high forwarding cost. For all other protocols, the gain generally decreases with increasing delay, which reduces the value of the content at delivery time. The clusters of PROPHET and SNW are more spread out compared to GDP and RGAP. That is because GDP bounds the number of forwards (Figure 3-8) while attempting to reach the maximum gain. Figure 3-10 shows the general trend of decreasing gain with increasing forwarding costs. It depicts high forwarding costs for both SNW and Epidemic even when their gains are low. PROPHET, with its delivery predictability abstraction, provides slightly better gain, even though its forwarding cost is quite significant. GDP, however, with its gain-aware utility formulation, is able to provide a much better gain cluster with an extremely low forwarding cost. Observe the proximity and overlap of the GDP cluster with that of RGAP, which provides the gain upper bound.
From the results in this subsection it is apparent that GDP is able to provide the best gain performance when compared to the other protocols, except the upper-bound provided by RGAP. 49 1.2 RGAP(D:229;G:0.94) 1 0.8 GDP (D:3058;G:0.8) PROPHET(D:1420;G:0.39) Gain 0.6 0.4 0.2 SNW (D:642;G: 0.31) 0 -0.2 100 1000 Epidemic (D:156;G:0.03) -0.4 10000 Delay (Seconds) Figure 3-9: Delay vs. Gain with cost: 0.01 and VD: 20K 1.2 RGAP(F:4;G:0.94) 1 GDP(F:4.5;G:0.8) Gain 0.8 PROPHET(F:53.8;G:0.39) 0.6 SNW(F:65.3;G:0.31) 0.4 0.2 Epidemic(F:95.4;G:0.03) 0 3 30 300 -0.2 Number of forwards Figure 3-10: Forward count vs. Gain: cost 0.01 and VD 20K 3.6.1.6 Impacts of Forwarding Cost Figure 3-11 reports the (delay, forwards) scatterplots for GDP with two different forwarding costs 0.01 and 0.02, when the value duration is set to 20,000s. A notable observation in Figure 3-11 is that the point clusters for the two cost scenarios overlap significantly, thus pointing to the fact that the protocol is able to adjust its routing behavior in order to maintain an optimal number of gain-maximizing forwards based on the underlying forwarding expenditures. The non-overlapping points in the figure belong to different routes taken toward the destination. 50 8 Number of forwards 7 GDP-Cost 0.01(D:2727.2;F:4.59) 6 5 4 3 GDP-Cost 0.02(D:3058.9, F:4.58) 2 1 0 100 1000 10000 Delay (Seconds) 100000 Figure 3-11: Delay vs. number of forwards for GDP Figure 3-12 and Figure 3-13 corroborate the above observation via the finding that different forwarding costs minimally affect the obtained economic gain. This is accomplished through selftuning of the forwarding aggressiveness. In both figures, the gain and its spread for GDP with cost 0.01 is slightly larger than that with cost 0.02. 3.6.1.7 Impacts of Value Duration Figure 3-14 depicts delay vs. number of forwards for three different value durations (VD) of 5000s, 20,000s and 40,000s (i.e., 1.3hr, 5.5hr and 11hr) respectively. Observe that with the lowest VD of 5000s, GDP offers earliest delivery, but with very high forwarding cost. This is because the protocol attempts quicker delivery via aggressive forwarding before the value of the content reduces too much. As the VD increases (i.e., 20000s and 40000s), the protocol becomes less aggressive, thus allowing relatively higher delivery latency and lesser forwarding costs. For the delay-gain scatter plots in Figure 3-15, observe that the gain decreases with a steeper slope as the value duration decreases. This is consistent with the observation from Figure 3-14, validating that 51 for content with short-lasting value, quick deliveries are essential in order to preserve the potential gain. Gain 1 GDP-Cost 0.01(D:2727.2;G:0.8) 0.9 0.8 0.7 0.6 0.5 0.4 GDP-Cost 0.02(D:3058.9;G:0.77) 0.3 0.2 0.1 0 100 1000 10000 100000 Delay (Seconds) Figure 3-12: Delay vs. gain for GDP 1 0.9 0.8 0.7 GDP-Cost 0.01(F:4.59;G:0.8) Gain 0.6 0.5 0.4 0.3 GDP-Cost 0.02(F:4.58;G:0.77) 0.2 0.1 0 0 2 4 Number of forwards 6 8 Figure 3-13: Number of forwards vs. gain for GDP From the forward-gain scatterplots in Figure 3-16, it can be observed that the points for majority of the scenarios with VD = 40000s are scattered around the minimum number of forwards, i.e., 4. The reason is that with very large value duration, in an ideal gain-aware routing 52 protocol, nodes would hold and carry the content for direct delivery. Therefore, higher value duration justifies a lower forwarding cost. 
9 8 1 GDP-VD 20000s(D:3058.9;F:4.5) 7 GDP-VD 40000s (D:3059.2;G:0.87) GDP-VD 20000s (D:3058.9;G:0.8) 0.8 6 5 Gain Number of forwards 1.2 GDP-VD 5000s(D:1839.6;F:4.6) 0.6 4 3 0.4 GDP-VD 40000s (D:3059.2;F:4.5) 2 0.2 1 0 10000 100 0 100 GDP-VD 5000s (D:1839;G:0.58) 1000 Delay (Seconds) Figure 3-14: Delay vs. number of forwards with forwarding cost: 0.01 1000 Delay (Seconds) 10000 Figure 3-15: Delay vs. gain with forwarding cost: 0.01 1.2 GDP-VD 40000s (F:4.5;G:0.87) 1 Gain 0.8 0.6 0.4 GDP-VD 5000s (F:4.6;G:0.58) 0.2 GDP-VD 20000s (F:4.5;G:0.8) 0 0 2 4 6 Number of forwards 8 10 Figure 3-16: Number of forwards vs. gain with forwarding cost: 0.01 3.6.1.8 Joint Impacts of Forwarding Cost and Value Duration Figure 3-17 reports value duration vs. gain for a range of forwarding costs for all the evaluated protocols. Generally, with extended value duration since the value of a content delivery is higher, the gain is also higher for all the protocols. A comparison between the graphs in Figure 3-17:a through Figure 3-17:c reveals that the performance gap between Epidemic and GDP increases as the cost increases from 0.01 through 0.1. Same can be also observed while comparing the gains 53 achieved by GDP and PROPHET. The reason is that a more disciplined decision should be taken while forwarding when the cost of forwarding is higher. In other words, it is more reasonable to restrict liberal forwarding (as done in Epidemic) when the cost is higher. GDP, on the other hand, chooses a more conservative forwarding approach in such situation, which leads to a higher gain. We have also simulated a Direct Delivery [46] approach which was not able to deliver content due to very low probability of direct contact between the source and destination in this and many other practical mobility scenarios. Both Epidemic and Direct Delivery are naïve in that they are value un-aware and do not consider any routing utilities. They are used only for the purpose of illustrating the typical patterns in the gain dynamics. Overall, Figure 3-17 confirms that GDP is able to provide better economic gain compared to all other simulated protocols in the presence of this model based mobility scenarios. 3.6.1.9 Effects of Varying Content Generation Time For all the presented results so far, the content was generated at a specific time 15000s, which is after the network warm up period. In order to validate the performance consistency over different generation times, which can potentially change content trajectory, results are presented for a different content generation time in this section. Figure 3-18 and Figure 3-19 depict value duration vs. gain when content is generated at 16000s and 17000s respectively with the clique-forming synthetic mobility. Similar to all previous results, in both scenarios, the same increasing trend of gain for increasing value duration can be n observed. As depicted, for low VDs GDP choses no content delivery over a delivery with negative gain. But for higher VDs, GDP’s gain is closest to that of RGAP compared to the other protocols. 54 1.2 1.5 RGAP 1 GDP 0.6 0.5 SNW Gain Gain 0.4 0.2 0 1000 -0.2 (a) 20000 GDP SNW 0 1000 5000 10000 -0.5 Epidemic -1 PROPHET Value duration (Seconds) PROPHET (b) -1.5 Value Duration (Second) 1.2 GDP-VD 40000s (F:4.5;G:0.87) 1 0.8 Gain -0.8 10000 0.6 0.4 GDP-VD 5000s (F:4.6;G:0.58) 0.2 GDP-VD 20000s (F:4.5;G:0.8) 0 0 2 4 6 Number of forwards 8 10 Figure 3-17: Number of forwards vs. 
gain with forwarding cost: 0.01 2 1 RGAP GDP 0 -1 Gain -0.6 5000 Epidemic -0.4 RGAP 1 0.8 1000 5000 10000 20000 -2 -3 SNW -4 -5 -6 -7 Epidemic PROPHET Value duration (Seconds) Figure 3-18: Performance for content generation at 16000s 55 20000 2 RGAP GDP 1 0 Gain -1 1000 5000 10000 20000 Epidemic -2 -3 -4 SNW -5 -6 PROPHET Value duration (Seconds) Figure 3-19: Performance for content generation at 17000s 3.6.2 Experiments with Trace Based Mobility As the second mobility model, we have used human interaction traces collected [59] from INFOCOM ’06 conference. The trace contains data from 97 devices. The first 20 devices (i.e., iMotes) are static and the rest were carried around by a group of users for all four conference days. As done with the model based synthetic mobility in Section 3.6.1.1, the system waits for a warm-up period before content is injected. According to the evolution of Δi,j shown in Figure 3-20, 75000s is used as the content injection point for the following experiments. Note that unlike in the synthetic mobility scenario in Section 3.6.1.1, the source and destinations nodes in this case do directly meet in some instances, thus making Direct Delivery feasible. 56 Pairwise Inter-contact Time (Seconds) N8-N48 Warm up time N84-N12 N1-N55 8000 N22-N23 N35-N36 N41-N92 800 0 100000 200000 Time (Seconds) 300000 400000 Figure 3-20: Warm up time for the INFOCOM ’06 mobility trace Figure 3-21:a through Figure 3-21:c show GDP’s gain with varying value durations when the forwarding costs are set at 0.05, 0.1 and 0.14 respectively. As expected, in all scenarios the gain increases with increasing value duration. The gain difference between GDP and Direct Delivery decreases slightly with higher cost and higher value durations. This is because it is desirable for a protocol to withhold the content and perform a direct delivery when both the value duration and forwarding cost are high. GDP adapts itself to such behavior, which results in a similar gain for Direct Delivery and GDP under the mentioned situation. It should also be observed that with higher forwarding costs, the gain differences between GDP and all other protocols (i.e., Epidemic, SNW, and PROPHET) increase. In fact, GDP is the only protocol which retains a positive gain for almost all value durations with costs as large as 0.1 and 0.14. This is in contrast to all non-GDP protocols that can maintain positive gains only with forwarding costs smaller than 0.05. That is why the minimum cost value is chosen to be 0.05 for these experiments. For the clique-forming synthetic mobility results in Section 3.6, this threshold value was 0.01 (see Figure 3-17). It is noteworthy that for very small forwarding costs and value 57 durations (e.g., cost: 0.05 and VD: 1000 in Figure 3-21:a) GDP can be outperformed by the less conservative protocols such as Epidemic and PROPHET. 2 PROPHET Gain 0 1000 GDP 0 1000 -0.5 10000 Epidemic RGAP PROPHET 10000 -1 SNW -1 Epidemic -1.5 -2 -2 DD -2.5 -3 -4 GDP 0.5 Gain 1 1 RGAP SNW -3 -3.5 (a) Value duration (Seconds) 1 -4 DD (b) Value duration (Seconds) RGAP 0.5 GDP 0 1000 -0.5 PROPHET 10000 Gain -1 -1.5 Epidemic -2 -2.5 DD -3 SNW -3.5 -4 (c) Value duration (Seconds) Figure 3-21: VD vs. Gain for INFOCOM ’06 trace; Source node N5;(a): Cost 0.05, (b): Cost 0.1 and (c): Cost 0.14 3.6.2.1 Effects of Different Content Sources All our previous results were shown for the distribution of a content that was generated at a specific node. 
Figure 3-22 shows the comparative protocol performance when content is generated at a different node chosen from the INFOCOM ’06 mobility trace. Note that there are no reported results for Direct Delivery in Figure 3-22 since the protocol was not able to deliver content for any of the value duration and cost combinations that we experimented with. This indicates that the source and destination never met. 58 There are no reported results for the protocol spray and wait (SNW) either. This is because there were not enough contacts for a successful delivery during both the spray and wait phases. These results, along with similar experiments we ran for a broader range of costs (e.g., 0.3, 0.4) demonstrate the consistent performance advantage of GDP (i.e., close to the RGAP performance) with higher value durations for arbitrary content generators. 3.6.2.2 Effects of Varying Content Generation Time To evaluate the effects of varying content generation time on GDP performance, we generated the packet at a later point than 75000s. Figure 3-23 depicts value duration vs. gain when content is generated at 82000s with INFOCOM ’06 mobility. Even with change of content trajectory, GDP still holds the maximum gain. Similar to all previous results, increasing trend of gain is visible due to longer value durations. Overall, from all experimental results (i.e., with different mobility models, forwarding costs, value durations, source nodes, and generation times) it was found that approximately for 87.1% of the experiments, GDP achieved a gain that is closest to the upper bound found by the RGAP protocol. Direct Delivery and PROPHET win in 10% and 2.5% of the experiments respectively. These happen mainly when both the value duration and the forwarding costs are extremely low. These clearly show the dominance of GDP from the standpoint of gain as formulated in this work. 59 1 1 0.8 0.5 0.6 0 1000 GDP 0 1000 -0.2 10000 -0.5 PROPHET 0.2 PROPHET Gain Gain 0.4 -0.4 GDP RGAP RGAP -1 Epidemic Epidemic 10000 -1.5 (a) -2 Value duration (Seconds) 1 Gain 0 1000 -0.5 Value duration (Seconds) GDP RGAP 0.5 (b) PROPHET 10000 -1 -1.5 Epidemic -2 -2.5 -3 (c) Value duration (Seconds) Figure 3-22: VD vs. Gain for INFOCOM ’06; Source node N1; (a): Cost 0.05, (b): Cost 0.1 and (c): Cost 0.14 3.6.3 Gain-positive Delivery Ratio In the present context of gain, successful deliveries are defined as those only with positive gains. The index Gain-Positive Delivery Ratio (GPDR) is defined as the fraction of all content that are delivered with positive gains. Note that a protocol may have high delivery ratio in the conventional sense, but it may suffer from poor GPDR if its routing strategies are not gain-aware. Epidemic and Direct Delivery, for example, can maintain a 100% delivery ratio if the notion of gain is not involved. Yet, both of them can perform poorly from a GPDR standpoint. Epidemic in one extreme may use frequent number of forwards, until the forwarding cost exceeds its value. Direct Delivery in the other extreme, may postpone content delivery until its value reaches zero. 60 1.5 1 RGAP GDP PROPHET Gain 0.5 0 1000 5000 -0.5 10000 20000 Epidemic -1 -1.5 -2 DD SNW Value Duration (Seconds) Figure 3-23: Performance for content generation at 82000s; Infocom ’06 traces; Source node N5; Cost 0.05 Figure 3-24 shows the GPDR computed over 100 experiments with clique-forming synthetic mobility for the protocols GDP, Epidemic, SNW, and PROPHET. 
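Before turning to the per-protocol results in Figure 3-24, the following minimal Python sketch shows one way the GPDR index defined above could be computed from a set of per-content gain outcomes; the example data and the convention of marking undelivered items with None are illustrative assumptions and are not part of the simulation framework.

def gpdr(per_item_gains):
    """Gain-Positive Delivery Ratio: percentage of content items delivered with a
    positive gain. Items that were never delivered (None) count against the ratio."""
    positive = sum(1 for g in per_item_gains if g is not None and g > 0)
    return 100.0 * positive / len(per_item_gains)

# Example: five disseminated items; None marks an item that was never delivered
print(gpdr([0.8, -0.1, None, 0.4, 0.0]))  # 40.0 (%)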
Combinations of three CR (forwarding cost) values: 0.01, 0.02 and 0.1, and three value durations: 5,000s, 20,000s and 40,000s are used. Figure 3-24 clearly shows the dominance of GDP in achieving the highest gain-positive delivery ratio, especially when the forwarding cost is high. The other three protocols maintain a gain-positive delivery ratio that is lower than GDP's, but still reasonable, when the cost is 0.01. However, their GPDR degrades when forwarding becomes more expensive. These results justify GDP's forwarding logic from the delivery ratio perspective.

Figure 3-24: Gain-positive Delivery Ratio (GPDR) for a) GDP, b) Epidemic, c) SNW, and d) PROPHET (GPDR in % plotted against forwarding costs 0.01, 0.02 and 0.1 for value durations 5,000s, 20,000s and 40,000s)

3.7 Summary and Conclusion

The design details and performance of a new economic gain based DTN routing protocol were presented in this chapter. Economic gain from disseminating content is defined as the value generated upon delivery minus the forwarding costs. It was shown that by adaptively balancing content dissemination latency against the number of forwards, the proposed protocol GDP is able to achieve higher gain compared to a number of existing DTN protocols with partially similar objectives. Using experimental results from the simulator ONE [55], it was shown that for certain community based mobility scenarios, GDP can provide near-benchmark gain (i.e., close to the gain upper bound achieved by RGAP), while protocols such as PROPHET suffer from negative gains when their protocol parameters are not explicitly tuned for the target mobility scenarios. It was also shown that in certain scenarios GDP can suffer from worse delay or forwarding cost, but manages to maintain better gain, which is a composite index obtained by combining the former two. As shown in Section 3.6.1 for the community based model, in all the performance measures, including delay-vs-forward-count, delay-vs-gain, and forward-count-vs-gain, GDP performs closest to the benchmark RGAP compared to all other experimented protocols. Similar trends were also observed for the trace based mobility model obtained from the INFOCOM '06 traces [59]. It is noteworthy that these performance advantages of GDP were maintained with varying forwarding cost and value dynamics over time. One future work direction on this topic is securing GDP against non-cooperative users and under various threats such as Sybil [56] and whitewashing [57] attacks. In the following chapter, we extend the concept of gain-aware forwarding to multicast dissemination.

4 Economy Driven Multicast Content Dissemination

4.1 Introduction

In this chapter, we present a DTN multicast based content dissemination framework with economic underpinnings. Commercial gain from disseminating content is defined as the revenue generated upon delivery minus the forwarding costs incurred for stimulating cooperation among the forwarding nodes by compensating for their resource/energy usage.
We introduce a novel concept of Economy Driven Content Dissemination (EDCD), with which a Content Generator (CG) is able to maximize its economic gain by disseminating content among interested Content Consumers (CCs), after compensating the Content Forwarders (CFs). The general approach in the existing DTN protocols [18, 21] is to use routing utilities that minimize latency, forwarding count, storage, and, in some instances, combinations of the three. Due to their lack of gain-awareness, these protocols cannot control the economic gain of dissemination. EDCD, on the other hand, controls content dissemination based on a composite index, termed economic gain, which combines the revenue from delivery and the forwarding cost of disseminating commercial content such as coupons.

The specific contributions of this chapter are as follows. First, we define a consumer interest based dissemination gain abstraction within a practical commercial setting. Second, we propose a new gain-aware multicast Device-to-Device dissemination protocol (i.e., EDCD) and its usage within the developed network and commercial model. Finally, we evaluate EDCD and a few other competing protocols using simulation in order to compare their dissemination performance.

4.2 Economic Gain of Dissemination

As mentioned in Sections 1.1 and 1.2, interaction patterns in an SWN can be indexed with social metrics such as contact duration and contact frequency. This information about the underlying social interactions can be leveraged to deliver content such as a coupon. The objective is to deliver coupons through multicast D2D routes so that the economic gain of dissemination is maximized. Note that the terms content and coupon are used interchangeably throughout this dissertation.

4.2.1 Cost, Revenue, and Gain

Cost: The cost to a coupon-distributing merchant includes: 1) the product discount cost $c_d$, which is the discount given to a customer upon redeeming a coupon, and 2) the rebate $c_r$, which is given to nodes that relay one or multiple coupon(s). The rebate can also be perceived as the cost of forwarding.

Revenue: For a coupon-distributing merchant, this quantity represents the net revenue from the redemption of copies of a coupon by one or multiple recipients. Net revenue is computed as the earned revenue $c_e$ minus the discount $c_d$. The net revenue of a coupon redemption by consumer-i is then defined as:

$V_i = P_R(\rho_i)(c_e - c_d)$   (4-1)

$P_R(\rho_i)$ represents the probability of redeeming a coupon by consumer-i. This quantity depends on the consumer's interest level $\rho_i$ in the coupon, and is elaborated in the next subsection.

Gain: Combining revenue and cost, the expected gain achieved through a coupon redemption by consumer-i can be written as:

$G_i = V_i - c_r$   (4-2)

The total network-wide gain $G^T$ can be expressed as:

$G^T = \sum_{i=1}^{n_d} V_i - n \times c_r$   (4-3)

where coupons are delivered to $n_d$ consumers through $n$ forwards. The maximum number of deliveries of a coupon is generally restricted by its generator. That is, $n_d \leq T$, where $T$ is the distribution upper bound set by the coupon generator.

4.2.2 Consumer Interest and Coupon Redemption

Each consumer-i maintains an interest level $\rho_i$ ($0 \leq \rho_i \leq 1$) towards a specific content type. Such an interest level can be set through many possible means, as described in Section 1.3.1. In this model, consumer devices are allowed to share their interest profiles with other consumer devices upon meeting; as discussed in Section 4.2.3, this information is never exported to a centralized server, which preserves consumer privacy.
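Before turning to coupon redemption, the following minimal Python sketch illustrates the cost, revenue, and gain bookkeeping of Equations (4-1) through (4-3); the redemption probabilities in the example are illustrative, while the cost parameters match the values later used in Section 4.6.1.

# Minimal sketch of Eqs. (4-1)-(4-3); the redemption probabilities are illustrative.

def net_revenue(p_redeem, c_e, c_d):
    """Eq. (4-1): expected net revenue of a redemption by one consumer."""
    return p_redeem * (c_e - c_d)

def delivery_gain(p_redeem, c_e, c_d, c_r):
    """Eq. (4-2): expected gain of a single delivery (net revenue minus rebate)."""
    return net_revenue(p_redeem, c_e, c_d) - c_r

def total_gain(p_redeems, n_forwards, c_e, c_d, c_r):
    """Eq. (4-3): network-wide gain for n_d deliveries made through n forwards."""
    return sum(net_revenue(p, c_e, c_d) for p in p_redeems) - n_forwards * c_r

# Example: three deliveries through five forwards, with c_e = 5, c_d = 2, c_r = 0.2
print(total_gain([0.5, 0.3, 0.4], n_forwards=5, c_e=5.0, c_d=2.0, c_r=0.2))  # 2.6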
Coupon Redemption: The redemption probability for a consumer represents his or her likelihood of redeeming a coupon within a defined coupon expiry time (see Section 1.3.1). One such example, with a maximum redemption probability of 0.5, is shown in Figure 4-1. The figure shows a redemption probability curve that decreases exponentially with increasing interest level. A corresponding gain function is also shown in the figure. As can be seen, for a given redemption function, earned revenue $c_e$, and discount $c_d$, the gain $G_i$ can be positive only for a specific range of interest levels. A desirable target consumer is defined as a consumer with a high enough redemption probability that the induced net revenue from a coupon-redeemed purchase is higher than the forwarding cost (rebate) for the coupon. Such a delivery induces a positive economic gain $G_i$. Figure 4-1 shows the range of interest levels for an example definition of desirable target consumers.

Figure 4-1: Example redemption function (redemption probability and the corresponding gain plotted against consumer interest level; the interest range with positive gain marks the desirable target consumers)

The presented redemption model is general in that it can accommodate other redemption functions, including ones in which the redemption probability increases with higher interest level. It can also accommodate functions in which the redemption probability is maximized for interest levels that are neither too low nor too high. As shown in [14], coupon delivery to highly loyal (interested) customers may kill customer loyalty. Thus, lightly to moderately loyal customers, as opposed to the most loyal ones with higher interest levels, may maintain higher redemption rates. Another example of an appropriate target population is new customers with smaller interest levels. Reports show that new customers spend 67% less on repeat purchases compared to loyal (highly interested) customers [60]. In both examples, the objective is to deliver coupons to the target user class with such optimal interest levels that the cumulative network-wide redemption rate is maximized. The abstraction used for representing the redemption probability $P_R(\rho_i)$ can capture both of the scenarios outlined above.

4.2.3 Consumer Privacy and the Role of Centralized Server

Even though the dissemination process presented in this chapter is fully Device-to-Device, we assume the presence of a centralized server with the primary roles explained in Section 1.3.3.

Privacy preservation: In the presented model it is ensured that the data related to consumer interests and consumer-to-consumer interaction is always stored at the consumers' devices. Such privacy-sensitive data is never exported to the centralized server. This opens up the possibility of economic gain-aware coupon dissemination without a centralized server having to know about consumer mobility and their interests in different content types. This is an attractive aspect of the architecture.

Dissemination Termination: It is possible for the server to take an active role in terminating dissemination in order to control the network-wide economic gain. As shown in Section 4.7, the network-wide economic gain generated by coupon dissemination can start declining after it reaches a maximum value. In such situations, the server can terminate dissemination at an appropriate time to ensure a near-maximum network-wide gain. Before every coupon delivery, the delivering node checks with the server about the current network-wide cumulative gain. When the server observes a declining trend, it stops the dissemination of that specific coupon. This can result in a near-maximum final gain from the dissemination process. Note that such server-assisted termination does not violate the privacy feature outlined above; that is, the centralized termination can be performed without the server having any access to the consumer interest and interaction/mobility data.
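The following minimal Python sketch illustrates one way such a server-side termination check could be organized; it applies the moving-average decline rule that is described later in Section 4.7.1 (parameter x), while the half-day window, the time units, and the exact scaling of the "x/t %" threshold are assumptions made here for illustration only.

# Minimal sketch of server-assisted dissemination termination. The server only sees the
# cumulative gain values reported before each delivery; it never sees interest or
# mobility data. Window size, time units, and threshold scaling are assumptions.

class DisseminationMonitor:
    WINDOW = 12 * 3600  # half-day moving-average window, in seconds (assumed units)

    def __init__(self, x=0.24):
        self.x = x              # empirically chosen termination parameter (Section 4.7.1)
        self.start = None
        self.max_gain = 0.0
        self.samples = []       # (time, cumulative_gain) reports

    def should_terminate(self, t, cumulative_gain):
        """Called before a delivery; True means stop disseminating this coupon."""
        if self.start is None:
            self.start = t
        self.samples.append((t, cumulative_gain))
        self.max_gain = max(self.max_gain, cumulative_gain)
        recent = [g for ts, g in self.samples if ts >= t - self.WINDOW]
        moving_avg = sum(recent) / len(recent)
        elapsed_days = (t - self.start) / 86400.0
        if elapsed_days <= 0 or self.max_gain <= 0:
            return False
        # stop once the moving average has fallen by at least (x / t) of the attained
        # maximum, with t expressed in days (an assumed reading of the x/t % rule)
        return (self.max_gain - moving_avg) >= (self.x / elapsed_days) * self.max_gain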
4.3 Problem Definition

For a given earned revenue $c_e$, discount $c_d$, rebate $c_r$, distribution upper bound $T$, redemption function $P_R(\rho_i)$, and a social wireless network of consumers with interest profiles $\rho_i$, the objective is to design a multicast coupon dissemination mechanism that maximizes the total gain $G^T$ within the lowest possible dissemination delay.

Consider the example delay tolerant network in Figure 4-2. Each consumer is represented as N-i ($P_R(\rho_i)$), where i is the node-ID and $P_R(\rho_i)$ indicates its redemption probability, computed by applying a pre-assigned redemption function to its interest level $\rho_i$. For example, the consumer N-0 (0.8) represents node-0, which is likely to redeem a coupon (of the type in question) with a probability of 80%.

Figure 4-2: Example user interaction and routing (nodes N-0 (0.8), N-1 (0.1), N-2 (0.8), N-3 (0.6), N-4 (0.9), and N-5 (0.7))

Consider a scenario in which N-0 (0.8) possesses at least five copies of a coupon that needs to be disseminated such that the overall gain from its dissemination is maximized. There are two possible paths: through N-1 (0.1) and through N-2 (0.8). Given the social interaction pattern, it is obvious that dissemination through N-1 (0.1) can provide more revenue, even though N-1 itself does not contribute much to the gain because of its relatively low redemption probability. Node N-2 (0.8), on the other hand, while able to contribute significantly to the revenue, has relatively worse dissemination potential compared to N-1 (0.1). The gain performance also depends on the forwarding cost $c_r$. The objective of the protocol is to choose the right forwarding path so that the overall gain is maximized.

4.4 Economy Driven Content Dissemination

Economy Driven Content Dissemination (EDCD) relies on the knowledge of: 1) nodes' interest profiles (i.e., $\rho_i$), and 2) their underlying social interaction patterns. The key idea is that a node-i, carrying T copies of a content, first evaluates any of its peers j's quality as a consumer and, second, as a potential forwarder. Based on that evaluation it may transfer: 1) a copy of the coupon for j's consumption, and 2) a certain number of copies to j for further forwarding.

State Variables: Any node-i maintains the following variables:

$f_{i,j}$: i's frequency of meeting node-j over a pre-defined time window $\tau$ (e.g., a week).

$\delta_i$: the expected cumulative gain for node-i from delivering the content to the one-hop neighbors encountered during the time window $\tau$. It can be expressed as:

$\delta_i = \sum_{k \in N_i} \left[ (c_e - c_d)\, P_R(\rho_k)\, f_{i,k} - c_r \right]$   (4-4)

where $N_i$ is the set of nodes that node-i interacted with during the time window $\tau$. This includes only the interacted nodes that have not already received the content in question. A larger $\delta_i$ implies that the nodes that i directly interacts with have higher redemption probabilities, thus yielding a higher expected gain. When i meets j, it updates $f_{i,j}$ and $\delta_j$. It also receives j's expected gain of delivery to all of its non-i one-hop peers in the time window $\tau$, which j computes based on Equation (4-4).
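A minimal Python sketch of how a node could maintain this expected one-hop gain is shown below; the peer-record structure is an assumption made for illustration, the redemption function is assumed known to all nodes, and the cost parameters match those of Section 4.6.1.

# Minimal sketch of the expected one-hop gain delta_i in Eq. (4-4). Each peer record
# holds the meeting frequency f_{i,k} over the window tau, the shared interest level
# rho_k, and whether k already holds the coupon. Record field names are assumptions.

def expected_one_hop_gain(peer_records, redemption_fn, c_e, c_d, c_r):
    """Eq. (4-4): sum over peers met in the window that have not yet received the coupon."""
    delta_i = 0.0
    for peer in peer_records:
        if peer["has_coupon"]:
            continue  # only peers that have not already received the content count
        delta_i += (c_e - c_d) * redemption_fn(peer["rho"]) * peer["freq"] - c_r
    return delta_i

# Example with two recorded peers and an illustrative redemption function
peers = [{"rho": 0.7, "freq": 3, "has_coupon": False},
         {"rho": 0.2, "freq": 5, "has_coupon": True}]
print(expected_one_hop_gain(peers, lambda rho: 0.5 * rho, c_e=5.0, c_d=2.0, c_r=0.2))  # 2.95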
Forwarding Logic: When a node-i with T copies of a coupon meets node-j, it first evaluates the possibility of j being a potential consumer of the coupon based on j's redemption probability. That is, with probability $P_R(\rho_j)/P_R^{max}$, i forwards a consumption-copy to j and sets $T^c_{i,j} = 1$. With probability $1 - P_R(\rho_j)/P_R^{max}$, $T^c_{i,j}$ is set to zero and no consumption-copy is forwarded to j. The normalization by $P_R^{max}$ linearly maps the delivery probability $P_R(\rho_j)/P_R^{max}$ into the range [0,1], since the redemption probability varies within the range $[0, P_R^{max}]$. Table 4-1 summarizes all the state variables used in this section.

Table 4-1: Notations and corresponding definitions in EDCD
$f_{i,j}$ : frequency of meetings between nodes i and j in a preset time window $\tau$
$\delta_i$ : expected cumulative gain from all of i's one-hop deliveries in time $\tau$
$c_e$ : net price of the product (earned revenue)
$c_d$ : product discount cost
$c_r$ : rebate given to nodes which relay the coupon (forwarding cost)
$T^c_{i,j}$ : number of copies (out of T) forwarded from i to j for j's consumption ($T^c_{i,j} \in \{0,1\}$)
$T^f_{i,j}$ : number of copies (out of $T - T^c_{i,j}$) forwarded from i to j for dissemination through j
$P_R(\rho_i)$ : redemption probability of node i with interest level $\rho_i$
$V_i$ : net revenue of a coupon redemption by consumer-i
$G_i$ : merchant's expected gain through a coupon redemption by node i
$G^T$ : network-wide total expected gain

After the decision about delivering a consumption-copy to node-j is made, if additional copies are left, i evaluates j's suitability as a forwarder. This forwarding evaluation is done based on $\delta_j$. First, node-i computes the quantity $\delta_j^{norm} = \delta_j / \delta_i^{avg}$, where $\delta_i^{avg}$ is the average of all $\delta_k$ ($k \in N_i$, the set of all nodes that i directly interacted with during the predefined time window $\tau$). A larger $\delta_j^{norm}$ shows that j is a better forwarder compared to the other nodes encountered by node-i. Therefore, a larger fraction of the remaining $T - T^c_{i,j}$ copies of the coupon should be forwarded to node-j for dissemination. A positive $\delta_j^{norm}$ is then mapped to an integer $T^f_{i,j}$, which indicates the number of copies to be forwarded by i to j for dissemination. The value of $\delta_j^{norm}$ lies in the range $[\delta_i^{min}/\delta_i^{avg},\ \delta_i^{max}/\delta_i^{avg}]$, where $\delta_i^{min}$ and $\delta_i^{max}$ are the minimum and maximum of $\delta_k$ respectively ($k \in N_i$). To map $\delta_j^{norm}$ to $T^f_{i,j}$, which is an integer in the range $[0, T - T^c_{i,j}]$, the min-max normalization from [61] is applied. Such a linear transformation preserves the relationship between data values when mapping from the old range (i.e., $[\delta_i^{min}/\delta_i^{avg},\ \delta_i^{max}/\delta_i^{avg}]$) to the new range $[0, T - T^c_{i,j}]$. If $\delta_j^{norm} \leq 0$, or if node-j already possesses copies of the coupon, then node-i does not transfer any copies to node-j for forwarding purposes; that is, the quantity $T^f_{i,j}$ is set to zero. The key components of the described EDCD algorithm are summarized in Algorithm 4-1.

In the above logic it is assumed that a node-j always reports correct values of $P_R(\rho_j)$ and $\delta_j$ to its peers. This may not be true when consumers act selfishly. Such selfishness is considered outside the scope of this research.

1: When node-i meets node-j:
2:   i updates $f_{i,j}$ and $\delta_j$ for node-j in its records;
Forwarding Logic:
3: For node-i carrying T copies of a coupon;
4:   $T^f_{i,j} = 0$;
5:   $T^c_{i,j} = 0$;
6:   // set the number of copies for j's consumption
7:   $T^c_{i,j} = 1$ with probability $P_R(\rho_j)/P_R^{max}$;
8:   if ($T - T^c_{i,j} > 0$) then
9:     if (j has no copies for dissemination) then
10:      // set the number of copies to forward to j for dissemination
11:      $T^f_{i,j} = \mathrm{round}\!\left( \dfrac{\delta_j^{norm} - \delta_i^{min}/\delta_i^{avg}}{\delta_i^{max}/\delta_i^{avg} - \delta_i^{min}/\delta_i^{avg}} \times (T - T^c_{i,j}) \right)$;
12:    end if
13:  end if
14: Forward $T^c_{i,j}$ and $T^f_{i,j}$ copies to j, for consumption and dissemination respectively;
Algorithm 4-1: The key components of the EDCD algorithm
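The following minimal Python sketch makes the per-contact forwarding decision of Algorithm 4-1 concrete; the data structures, the particular redemption function, and the example numbers are illustrative assumptions rather than part of the EDCD specification.

import random

def edcd_forward(T, rho_j, delta_j, deltas_seen, j_has_copies, redemption_fn, p_r_max):
    """Return (consumption_copies, forwarding_copies) that node i hands to peer j."""
    # Consumption copy: drawn with probability P_R(rho_j) / P_R^max (Algorithm 4-1, line 7)
    t_c = 1 if random.random() < redemption_fn(rho_j) / p_r_max else 0
    t_f = 0
    remaining = T - t_c
    if remaining > 0 and not j_has_copies and deltas_seen:
        d_avg = sum(deltas_seen) / len(deltas_seen)
        d_norm = delta_j / d_avg                 # delta_j^norm = delta_j / delta_i^avg
        d_min = min(deltas_seen) / d_avg
        d_max = max(deltas_seen) / d_avg
        if d_norm > 0 and d_max > d_min:
            # min-max mapping of delta^norm onto the integer range [0, T - T^c] (line 11)
            t_f = round((d_norm - d_min) / (d_max - d_min) * remaining)
    return t_c, t_f

# Example: 5 copies, peer interest 0.4 under an assumed redemption function with
# P_R^max = 0.5, and delta_j = 3.1 against previously recorded deltas [0.5, 1.0, 3.5]
print(edcd_forward(5, 0.4, 3.1, [0.5, 1.0, 3.5], j_has_copies=False,
                   redemption_fn=lambda rho: 0.5 * rho, p_r_max=0.5))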
4.5 Compared Protocols

4.5.1 Interest-Aware Dissemination (IAD)

This is a degenerate version of EDCD in which no node-interaction information is used. Content such as a coupon is delivered based only on the redemption probability. Similar to EDCD, when a node-i carrying T copies of a coupon meets node-j, $T^c_{i,j}$ is set to 1 with probability $P_R(\rho_j)/P_R^{max}$. However, the number of forwarding copies transferred from node-i to node-j is determined randomly, without using the interaction information $f_{i,j}$ that is utilized in the social-aware EDCD. The number of forwarding copies $T^f_{i,j}$ is chosen randomly from the range $[0, T - T^c_{i,j}]$. To summarize, IAD is expected to show the dissemination performance obtained when nodes do not register any social interaction information, but are willing to share their own interest level with the nodes they meet.

4.5.2 Social-aware Opportunistic Routing Protocol (SCORP)

This interest-aware protocol [20] works with binary interest levels (i.e., 0 or 1). It uses a social weight, computed from a node's interest profile and its social interaction patterns. Node-a computes the quantity $TCTI(a,x)_i$, which is its Total Connected Time to Interest (TCTI) with nodes having interest-x, using a daily sample-i. The average $ATCTI(a,x)_{j,i}$ is computed for day-j as:

$ATCTI(a,x)_{j,i} = \dfrac{TCTI(a,x)_{j,i} + (j-1) \times ATCTI(a,x)_{j-1,i}}{j}$   (4-5)

Content is transferred from node-i to node-j when node-j has consumption interest, or when its social weight (computed based on its ATCTI) is higher than that of i. We have incorporated the following changes in order to adapt the baseline SCORP [20], which uses binary interests, to the present framework of multi-level interests. The key idea is to let the node interest level $\rho_i$ vary in the range 0 to 1, but to restrict the redemption probability $P_R(\rho_j)$ to binary values of either 0 or 1. $P_R(\rho_j)$ is set to 1 for $\rho_1 < \rho_j < \rho_2$, and to zero for other values. The thresholds $\rho_1$ and $\rho_2$ are chosen such that the resulting gain from a single delivery is positive, that is, it can offset the forwarding cost $c_r$. We use a Gaussian redemption function and a threshold pair ($\rho_1 = 0.63$, $\rho_2 = 0.88$) chosen based on the above strategy.
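A minimal Python sketch of the incremental social-weight update in Equation (4-5) is given below; the daily connected-time samples are illustrative, and the function name is an assumption rather than part of SCORP's specification.

# Minimal sketch of the ATCTI update in Eq. (4-5): the average total connected time
# to interest x on day j is an incremental mean of the daily TCTI samples.

def update_atcti(prev_atcti, tcti_today, day_index):
    """Eq. (4-5): ATCTI for day j from day j's TCTI sample and day (j-1)'s average."""
    j = day_index
    return (tcti_today + (j - 1) * prev_atcti) / j

# Example: three consecutive days of connected time (seconds) to nodes with interest x
atcti = 0.0
for day, tcti in enumerate([1200.0, 900.0, 1500.0], start=1):
    atcti = update_atcti(atcti, tcti, day)
print(atcti)  # 1200.0, the running average over the three daily samples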
4.5.3 Random Forwarding

This resembles Epidemic [37] DTN routing, which is oblivious to both interest and interaction profiles. When a node-i with T copies of a coupon meets node-j, it forwards $T^f_{i,j}$ copies to node-j with a predefined forwarding probability $p_f$. The number of forwarded copies $T^f_{i,j}$ is chosen randomly between 1 and T. The forwarding probability affects the dissemination gain dynamics in terms of the maximum gain and the time to reach that maximum value. The scenario with $p_f = 1$ represents Epidemic DTN forwarding. In the following set of results we report the $p_f = 0.5$ scenario.

4.6 Experimental Details

4.6.1 Interest and Redemption Function

We use the DTN simulator ONE [55] for evaluating EDCD and the other protocols with two community based mobility models. The models differ in terms of: 1) the number of nodes, 2) the community structure, and 3) the interest level distribution. For all the experiments we use a redemption function in which the redemption probability is maximized for interest levels that are neither too low nor too high. This is motivated by the observation [14] that while coupon delivery improves the purchase probability of moderately loyal (interested) customers, it may often kill product loyalty for highly loyal customers. To capture this effect, a Gaussian model for the redemption function is used. Figure 4-3 shows the applied Gaussian redemption function with mean $\mu = 0.75$ and maximum redemption probability $P_R^{max} = 0.5$. The corresponding gain curve in Figure 4-3 is obtained when the forwarding cost $c_r$, earned revenue $c_e$, and discount $c_d$ are set to 0.2, 5 and 2 respectively. These values are also used for all subsequent experiments.

Figure 4-3: Example Gaussian redemption function

4.6.2 Target Exposure Index (TEI)

TEI indicates how visible the desirable target consumers, as defined in Section 4.2.2, are to the rest of the nodes in a network. TEI is computed as the expected maximum number of contacts with target consumers and is expressed as a per-node metric. It is a network property determined by the mobility and social interaction patterns. A higher TEI is expected to yield a better gain of dissemination. It should be noted that TEI is computed only to characterize the mobility model dynamics and their impact on the dissemination performance of different protocols. No part of the protocol logic relies on this parameter. More about the TEI computation is presented in the performance section.

4.6.3 Degree of Penetration (DoP)

This is a protocol parameter, defined as the maximum depth of a multicast dissemination tree. The root of the tree is the node which initially receives all T copies of a content from its generator, such as a coupon merchant. When the DoP is set to one, the source is not allowed to forward any forwarding copies to any of its directly interacted peers. It is allowed to distribute only consumption copies to T-1 peers. In other words, those peers cannot forward the copies, thus maintaining a tree depth of 1. When the DoP is set to 2, only one level of re-forwarding is allowed. Larger DoPs allow deeper trees to form, so that a more thorough search can take place in order to find nodes with high redemption probabilities.

4.7 Dissemination Performance

4.7.1 Social Wireless Network with Clique Formation

As shown in Figure 4-4, a 100-node network is structured in three clique-based communities C1, C2 and C3.

Figure 4-4: Network with three social cliques C1 (10), C2 (50) and C3 (40) (TEI: 21.5)

The numbers in parentheses show the population of each community. The inter-clique arrows indicate the flow of inter-community mobility. For example, 10 members of C2 migrate between C2 and C1, and another 10 members of C2 migrate between C2 and C3. Nodes inside each community meet more frequently than nodes from different communities. Based on the strategy described in Section 4.2.2, desirable target consumers are designated as those with interest levels in an approximate range of 0.63 to 0.88 (see Figure 4-3). While such desirable target consumers are deployed within the communities C1 and C3, nodes with lower interest levels (i.e., < 0.6) are placed in community C2. An example real-world representation of this model would be the daily student interactions on a university campus. Students who are more interested in a specific product interact more often compared to those who do not share similar interests. For example, a community of desirable target consumers, as shown in Figure 4-4, can consist of students who share a common interest in a type of food.
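A minimal Python sketch of this setup is given below: it evaluates a Gaussian redemption function with $\mu = 0.75$ and $P_R^{max} = 0.5$ together with the gain of Equation (4-2) under the cost parameters of Section 4.6.1, and scans for the interest band in which the gain is positive. The standard deviation is not stated in this section, so the value used here is an assumption chosen to roughly reproduce the 0.63 to 0.88 band quoted above.

import math

MU, SIGMA, P_R_MAX = 0.75, 0.065, 0.5   # sigma is an assumed value (not given in the text)
C_E, C_D, C_R = 5.0, 2.0, 0.2           # earned revenue, discount, rebate (Section 4.6.1)

def redemption(rho):
    """Gaussian redemption probability, peaking at P_R_MAX when rho equals MU."""
    return P_R_MAX * math.exp(-((rho - MU) ** 2) / (2 * SIGMA ** 2))

def gain(rho):
    """Eq. (4-2): expected gain of delivering to a consumer with interest level rho."""
    return redemption(rho) * (C_E - C_D) - C_R

# scan interest levels on a 0.01 grid to find where the delivery gain turns positive
positive = [r / 100 for r in range(101) if gain(r / 100) > 0]
print(positive[0], positive[-1])  # approximately the 0.62-0.88 desirable-consumer band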
SCORP EDCD 2.5 10 IAD (a):DoP=1 2 1.5 RND 1 Average Gain Average Gain 3 EDCD (b):DoP=3 6 4 SCORP 2 RND 0 0.5 Time (days) 10 (c):DoP=5 8 SCORP 6 EDCD 4 2 RND 0 5.5 7.5 IAD 11.5 15.5 Time (Sec) 10 25.5 (d): DoP=10 8 SCORP 6 EDCD 4 2 IAD RND 0 9.5 Time (days) IAD 5.5 10.5 Average Gain 5.5 Average Gain 8 5.5 6.5 7.5 Time (days) 8.5 Fig. 4-5: Multicast dissemination with different Depth of Penetration (DoP) Figure 4-5 shows evolution of the average cumulative gain from dissemination for all four evaluated protocols under different 𝐷𝑜𝑃 scenarios. For all cases, the mobility shown in Figure 44 is used which results in a Target Exposure Index (TEI) of approximately 21.5. 77 TEI indicates how visible the desirable target consumers are to all other consumers in the network. The cumulative gains for all the protocols in Figure 4-5 under all DoP scenarios have a generally increasing trend before they start falling after reaching a maximum. In line with the method outlined in Section 4.2.3, dissemination is terminated when the moving average (i.e., computed over a half a day window) of the cumulative gain falls by at least 𝑥/𝑡 % of the attained maximum. The quantity 𝑡 is the elapsed time since the coupon generation, and 𝑥 is an empirically chosen parameter that controls termination. For all presented results, the value of 𝑥 is chosen to be 0.24, which provided consistent performance across all the evaluated protocols. The results show that among all the evaluated protocols, the proposed EDCD offers the largest maximum gain across all the 𝐷𝑜𝑃 scenarios. Also, the maximum attainable gain for EDCD first increases with increasing DoPs (i.e., from 1 to 3), but then it falls for larger DoPs (i.e., larger than 3). With 𝐷𝑜𝑃 < 3, the depth of the multicast dissemination tree is not sufficient to reach many desirable target consumers residing in C1 and C3. This explains the need for a sufficiently large DoP setting for maximizing the gain. Too large 𝐷𝑜𝑃s, on the other hand, incur the cost of many unnecessary forwards, which bring the gain down. This is evident from the graphs in Figure 4-5. The gain of the protocol IAD is worse than that of EDCD because its forwarding component is not social interaction-aware. Also, IAD’s performance does not change much with larger penetration levels. The reason is that IAD forwards one consumption-copy along with random number of forwarding-copies to nodes based on their interest level and the resulting redemption probability. For any preset DoP level, IAD does not allow the non-consumer nodes (i.e., the migrating nodes between C1 and C2 in this case), to receive any copies for consumption/forwarding. This restricts its ability to attain high economic gain. 78 The protocol SCORP, however, achieves a higher gain as 𝐷𝑜𝑃 increases. This is because it leverages more intermediate nodes to relay the content to farther destinations. For smaller 𝐷𝑜𝑃s, the extra cost of forwarding is a hindrance to the growth of network-wide gain. Although for large 𝐷𝑜𝑃𝑠, the large cumulative revenue achieved across a deeper multicast tree offsets the forwarding cost. Observe that EDCD takes longer than all other protocols to reach its maximum gain because all other protocols rely on some form of randomization for the forwarding decisions. EDCD, on the other hand, relies on a specific criterion, which often cause longer forwarding delays, but results in better quality forwards. 
Note in Figure 4-5, that the random dissemination reaches its maximum gain earlier than other protocols because of its random forwarding criterion in choosing coupon consumers, which are also the forwarders. Random choices are inherently faster than using a criterion, as done by the other protocols. This fast dissemination, however, comes at a steep price of lower maximum gain compared to the non-random protocols. Increased Exposure to the Desirable Consumers: In this scenario we increase the network wide visibility of the nodes with higher redemption rates. This is done by imposing more frequent contacts between the members of community C2 and the target consumers of community C3. C1 (10) C2 (50) C3 (40) Figure 4-6: Mobility for higher target consumer exposure (𝑇𝐸𝐼: 37.5) As shown in Figure 4-6, in addition to C2’s migrating nodes to C3 and C1, now the members of C3 also occasionally migrate to community C2. These modifications caused an increase in TEI to a value of approximately 37.5. 79 Figure 4-7 shows the evolution of gain for different approaches with 𝐷𝑜𝑃 = 2 and 𝐷𝑜𝑃 = 5. Observe that EDCD achieves the highest gain compared to the other approaches for both the 𝐷𝑜𝑃𝑠. The maximum gain, however, is achieved at a lower 𝐷𝑜𝑃 (i.e., 2), compared to the cases in the low-TEI scenario (𝐷𝑜𝑃: 3) in Figure 4-5:b. The reason is that more exposure of target consumers in this high-TEI case allows the nodes with high coupon redemption probability to be reached 35 EDCD (a):DoP=2 30 25 20 15 10 SCORP IAD RND 5 0 5.5 7.5 9.5 11.5 Time (days) Average Gain Average Gain through a shallower multicast tree. 35 30 25 20 15 10 5 0 (b):DoP=5 EDCD SCORP IAD 5.5 RND 10.5 Time (days) 15.5 Fig. 4-7: Gain in scenarios of high TEI Also, with this higher target consumer exposure, generally it takes shorter to reach the maximum gain. For example, it takes around 11.6 days for EDCD to reach its maximum gain (i.e., 33.7) across both the DoPs (i.e., Figure 4-7:a), while, it takes around 16.6 days to reach its maximum gain of 9.63 when there is less exposure to target consumers (Figure 4-5:b). Higher exposure to target consumers also implies that there are more relay nodes between content carrier(s) in C1 and the part of target consumer population, which reside in C3. According to the low-TEI scenario in Figure 4-4, the 10 migrating nodes moving between C2 and C3 are mainly responsible for relaying the content to C3. In the high-TEI topology in Figure 4-6, however, all low-interest members in C2 can act as relay nodes in addition to those 10 migrating nodes. The maximum attainable gain of the social interaction unaware protocol IAD does not significantly depend on the target 80 consumer exposure level. This is because IAD does not forward coupons to non-consumer nodes ignoring any effects of enhanced target consumer exposure in high-TEI scenarios. With large DoPs (i.e., 5), the Random dissemination achieves a higher gain (i.e., 6.4) in the current high-TEI scenario when compared to its maximum gain of around 2.6 in the low-TEI scenario as shown in Figure 4-5:c. This is because more nodes with high redemption probability are in reach of the carrier nodes in the high-TEI scenario. SCORP’s gain increases with an increase in 𝐷𝑜𝑃 as well, because the increase of reachability to the desired targets provided by a large 𝐷𝑜𝑃 (i.e., 5) compensates for the additional rebate cost incurred by random deliveries. 
Figure 4-8 shows the forwarding costs in terms of the average cumulative rebate given to intermediate nodes as forwarding incentive. IAD and the Random dissemination incur very similar rebate/forwarding costs in both the TEI scenarios. The reason is that the Random method blindly forwards a random number of copies irrespective of the quality 30 (a)TEI:21.5 Rebate (forwarding cost) Rebate (forwarding cost) of the receivers and level of exposure to target consumers. 25 20 15 EDCD 10 SCORP 5 Random IAD 0 1 2 3 DoP 4 5 (b)TEI: 37.5 30 25 EDCD 20 SCORP 15 Random 10 IAD 5 0 1 2 3 DoP 4 5 Figure 4-8: Total rebate (forwarding cost) per destination For IAD, even with increased contact frequencies between the members of C2 and C3, content cannot be effectively relayed further to where C2 and C3 meet. This limitation stems from IAD’s inability to use the social interaction information for forwarding decisions. 81 The protocol SCORP incurs slightly higher rebate cost in the high-TEI scenario especially for the larger 𝐷𝑜𝑃 (i.e., 5). This is mainly due to more exposure of target consumers to potential relay nodes from community C2. The maximum amount of rebate-increase with higher TEI happens for the protocol EDCD. It almost doubled across the shown scenarios. The reason is very similar to that for SCORP. Meaning, more members from community C2 maintain high 𝛿? (𝑖 ∈ 𝐶2) values (refer to Table 41) due to their higher exposure to the members of C3 in the high-TEI scenario. To obtain further insight, we plot the value of the expected gain 𝛿? in Figure 4-9. It is plotted with time for the average values computed from: 1) nodes in communities C1, C2, and C3 (i.e., denoted as 𝛿m , 𝑘 = 1,2,3), 2) the migrating nodes between C1 and C2 (𝛿^v:^™ ), and 3) the migrating nodes between C2 and C3 (𝛿^™:^H ). Figure 4-9:a shows various 𝛿m values for the lower TEI scenario. Note that 𝛿m can be negative when the expected generated revenue is smaller than the cost of forwarding. Observe that the 340 (a)TEI:21.5 !"$ 290 !"':"# !& !"#:"$ 190 90 !& 140 90 !"#:"$ !"# !"':"# 40 (b)TEI:37.5 !"$ 140 240 -10 !"# 190 40 !"' !"' 2.8 12.8 22.8 32.8 Time (days) 42.8 -10 2.8 7.8 12.8 Time (days) 17.8 22.8 Figure 4-9: Expected gain from one-hop deliveries quantity 𝛿^v:^™ initially rises due to the meetings between the migrating nodes and the members of C1 with high redemption probabilities. As soon as the majority of the desirable target consumers in C1 receive the coupon, 𝛿^v:^™ falls and reaches negative values. The more frequently a node meets peers with high redemption probabilities, a higher 𝛿m is expected. Observe that 𝛿^H is the 82 largest among all 𝛿m due to more frequent intra-community contacts in C3 among the nodes holding high redemption probabilities. Figure 4-9:b (the high-TEI scenario) shows that 𝛿^™ maintains the largest 𝛿m values since the members of C2 frequently meet nodes with high redemption from communities C1 and C3. A similar declining trend is visible for all 𝛿m . This is due to the decrease in number of potential coupon receivers as the dissemination progresses. In other words, as the number of potential coupon redeemers goes down, 𝛿` reduces. The reason that 𝛿^H and 𝛿^™:^H do not fall in the lowTEI scenario in Figure 4-9:a is that a portion of the population in C3 with high redemption probability do not receive the coupon within the shown time duration. The fall for 𝛿^H and 𝛿^™:^H happens much earlier in the high-TEI scenario in Figure 4-9:b. 
This is due to faster dissemination in the presence of greater exposure to nodes with higher redemption chances. Different 𝛿m dynamics shown in Figure 4-9 across the two TEI scenarios explain the performance gap discussed for the two scenarios in Section 4.7.1. The main reason for higher gain achievement in the high-TEI scenario is the relatively higher 𝛿^™ , which implies that the densely populated community C2 provides high-quality relay nodes, even though the nodes in C2 themselves have low redemption probability. Higher 𝛿^v:^™ values also show that more coupons can be relayed from C1 to C2 through these intermediate nodes. To summarize, the presented results demonstrate how EDCD outperforms other evaluated approaches across most of the depth of penetration (DoP) scenarios. To verify the generality of this claim, the protocols are evaluated for another mobility scenario as presented in the following subsection 4.7.2. 4.7.2 Second Mobility Model In this model 150 nodes are deployed across five clusters. 83 (a) TEI:27.8 C2 (40): [0,0.6] C1 (14): [0.6,1] C3 (5): [0,0.6] C2 (40): [0,0.6] C5 (47): [0.6,1] C1 (14): [0.6,1] C4 (44): [0.6,1] (b) TEI:32.1 C3 (5): [0,0.6] C5 (47): [0.6,1] C4 (44): [0.6,1] Figure 4-10: Five clique network: (a): TEI:27.8, (b) TEI:32.1 Figure 4-10:a and Figure 4-10:b show the network topologies with target exposure indices (TEIs) of 27.8 and 32.1 respectively. The numbers in parentheses show the community population and the arrows show the flow of migrating nodes between communities. Higher cross-community migrations in Figure 4-10:b result in higher TEI compared to the scenario in Figure 4-10:a. Before looking into the overall dissemination performance, in Figure 4-11 we plot the average 𝛿` for nodes from each community for the low-TEI scenario. 395 345 295 245 195 145 95 45 -5 𝛿_𝐶4 0 0.5 1 Figure 4-11: Expected gain distribution from one-hop deliveries Observe how the 𝛿` values for nodes from the same community form distinct clusters, indicating the similarities in their exposure to desirable target consumers. The 𝛿` values also show 84 wide variations across different communities. For example, community C2 has the smallest 𝛿` since its nodes are generally isolated with less contact with target consumers. The situation is similar for the nodes in C1. The communities C3, C4, and C5 show higher 𝛿` because either their nodes themselves are highly populated with nodes of with high redemption rates, or there are enough nodes that migrate across communities containing desirable target consumers. Figure 4-12 shows the dissemination performance for both TEI scenarios for all the evaluated protocols, namely, EDCD, SCORP, IAD, and Random Dissemination with forwarding probability (𝑝p ) of 0.5. In Figure 4-11:a and Figure 4-11:b, for each protocol there are multiple points, each corresponding to a specific Depth of Penetration (DoP). Observe that for most DoPs in Figure 4-12:a, EDCD generally achieves the maximum gain across all the 𝐷𝑜𝑃s. The protocol SCORP achieves the second-highest gain across DoPs in the low-TEI scenario. Both these protocols rely on some form of social weight based utility for dissemination, which explains their performance advantages compared to IAD and Random dissemination. Protocols that rely on some form of randomness in choosing a relay or number of copies to relay (e.g., Random, SCORP, and IAD) generally reach their maximum gain sooner compared to EDCD. 
Since IAD only allows forwards to nodes with high-redemption probability, it can take longer to meet such nodes, especially with low exposure to desirable target consumers. That is why in few scenarios IAD takes longer than other protocols to reach its maximum gain. Carriers hold on to the coupons until consumers are met, and in this case only consumers from C1 and C4 are accessible. In the high-TEI scenario (Figure 4-12:b), EDCD maintains its general performance advantage over the other protocols. IAD here does better than SCORP and EDCD in few DoP scenarios. 85 SID EDCD SCORP IAD R0.5 Max gain 25 20 15 10 5 (a)TEI:27.8 0 30 SID EDCD SCORP IAD R0.5 25 Max gain 30 20 15 10 5 (b)TEI:32.1 0 5.5 10.5 15.5 20.5 25.5 Time to reach max gain (days) 5.5 10.5 15.5 20.5 25.5 Time to reach max gain (days) Figure 4-12: Maximum gain and the time to reach maximum This is because the high exposure of target consumers compensates for IAD’s limitation of not having a social-aware forwarding policy. Better exposure to the desirable nodes also explains IAD’s performance improvement compared to the low-TEI scenarios. More specifically, the coupon carriers in C1 in this case meet consumers in C5 directly or through other members of C5 with high 𝛿` (𝑗 ∈ 𝐶5) values. This enables IAD to perform good quality forwarding without having to use a social interaction-aware utility. Like in the low-TEI case, the random method in high-TEI case also reaches its maximum gain in short time duration due to the ready access to higher exposure target consumers. Generally speaking, all the protocols attain their maximum gain sooner when the TEI is high (i.e., in Figure 4-12:b) due to better exposure to the desirable target consumers. The results demonstrate that the Economy Driven Content Dissemination protocol (EDCD) is able to achieve higher economic gain compared to few other competing protocols under different mobility and DoP scenarios. Note that the presented version of EDCD does not self-select the optimal DoP for maximizing gain. The value of DoP needs to be a priori dimensioned through simulation for given network parameters. Designing self-adjusting DoP within EDCD is currently being investigated. 86 4.8 Summary and Conclusion An economy-driven framework for D2D content dissemination was proposed and evaluated in this chapter. The primary new concept is an economic gain of dissemination, which is defined as a composite routing metric. It is constructed by combining the revenue from delivering a coupon and the forwarding cost of dissemination. Using experimental results from the simulator ONE [55], it was shown that the proposed EDCD can achieve better economic gain compared to a number of competing DTN routing protocols under different mobility scenarios and Depths of Penetration (DoP). Experiments also demonstrate that the maximum achievable gain is generally higher in scenarios in which nodes with high redemption rates have higher network wide exposure. Future work on this topic can be: a) methods for self-adjusting DoPs, b) learning based framework that can auto-learn mobility patterns of the desirable target consumers for high-gain dissemination, and c) mechanisms for handling consumer selfishness. EDCD suffers from certain amount of performance loss due to the fact that the disseminating node does not use prediction about the consumption interests of its potential future contacts. 
In the next chapter, we address that limitation by developing a predictive gain utility which is used for deciding the relative importance of an immediate peer in comparison with potential future peers.

5 Multicast Routing using Predictive Gain Utility

5.1 Introduction

In the previous chapter, we developed the notion of rebate and the economic gain of dissemination. The proposed multicast dissemination protocol EDCD (Chapter 4) relies on the consumption interests of the nodes that a disseminating node is in direct contact with. While it was shown to perform better than protocols such as Direct Delivery and Epidemic [37], it suffers from a certain amount of performance loss because the disseminating node does not use prediction about the consumption interests of its potential future contacts. The work presented in this chapter addresses that limitation by developing a predictive gain utility which is used for deciding the relative importance of an immediate peer in comparison with potential future peers. We propose two versions of the utility; the first one, Predictive Gain Utility Routing-Individual (PGUR-I), uses node-specific information, and the second one, Predictive Gain Utility Routing-Aggregated (PGUR-A), implements a more scalable version by aggregating node interaction information across consumers with similar consumption interests.

The specific contributions in this chapter are as follows. First, we introduce a gain-aware DTN multicast routing utility (similar to Chapter 4) that is designed around the notions of consumption interest and coupon redemption probability. We propose two versions of a utility based protocol; the first one uses node-specific information, and the second one implements a more scalable version by aggregating node interaction information across consumers with similar consumption interests. Simulation experiments for functional validation and for performance evaluation with respect to a benchmark and a comparative protocol are then performed. Finally, the sensitivity of the proposed protocol is evaluated for a number of socially and commercially meaningful parameters, namely network mobility, consumption interest patterns, a coupon's expiry time, and the number of distributed copies.

5.2 System Model

Using the same cost and revenue formulation as defined in Chapter 4 (Section 4.2.1), we compute the expected gain achieved through a coupon redemption by consumer-i as:

$G_i = V_i - c_r$   (5-1)

The total network-wide gain $G^T$ can be expressed as:

$G^T = \sum_{i=1}^{n_d} V_i - n \times c_r$   (5-2)

where coupons are delivered to $n_d$ consumers through $n$ forwards within a pre-specified coupon expiry time ($\tau$). The maximum number of deliveries of a coupon and the coupon expiry time are generally restricted by its generator. Also, $n_d \leq T$, where $T$ is the distribution upper bound of a coupon, set by the coupon's generator. In the methods proposed in this chapter, only one-hop deliveries are allowed, thus $n$ always equals $n_d$.

Similar to the consumer interest and redemption probability model of Section 4.2.2, each consumer-i maintains an interest level $\rho_i$ ($0 \leq \rho_i \leq 1$) towards a specific coupon type. The redemption function $P_R(\rho_i)$ represents the probability of redeeming a coupon by consumer-i. Note again that in the presented model, to preserve consumers' privacy, it is assumed that the data related to consumer interests and consumer-to-consumer interaction is always stored at the consumers' devices. It is never exported to a centralized server.
5.3 Problem Formulation

For a given combination of earned revenue $c_e$, discount $c_d$, rebate $c_r$, distribution upper bound $T$, expiry time $\tau$, redemption function $P_R(\rho_i)$, and a social wireless network of consumers with interest profiles $\rho_i$, the objective is to design a multicast coupon dissemination mechanism that maximizes the total gain $G_T$ within the lowest possible dissemination delay.

5.4 Proposed Protocols

The proposed Predictive Gain Utility Routing (PGUR) accomplishes the above objective by adjusting its delivery strategy based on 1) the available number of copies of a coupon with the coupon carrier, and 2) the time till the coupon's expiry. The concept is explained using the following example of a four-node network shown in Figure 5-1.

Figure 5-1: Example user interaction and routing; in scenario (a) the source $N_0(0.8)$ meets $N_1(0.6)$, $N_2(0.9)$ and $N_3(0.8)$, while in scenario (b) it meets $N_1(0.6)$, $N_2(0.7)$ and $N_3(0.6)$

Each consumer in the figure is represented as $N_i(P_R(\rho_i))$, where $i$ is the node ID and $P_R(\rho_i)$ indicates its redemption probability computed based on a pre-assigned redemption function applied to its interest level $\rho_i$. For example, the consumer $N_0(0.8)$ represents node-0, which is likely to redeem a coupon (of the type in question) with a probability of 80%. The time by an arrow indicates the time of contact between two nodes. For example, the quantity $t_1$ corresponds to $N_0$'s contact time with $N_1$. Across the scenarios, source node $N_0$ with two copies of the coupon (i.e., $T = 2$) meets the other three nodes such that $t_1 < t_2 < t_3 < \tau$, where $\tau$ is the expiry time of the coupon in question.

In scenario-a, since both $N_2$ and $N_3$ have higher redemption probabilities than $N_1$, it makes more sense for $N_0$ to wait and deliver the two copies of the coupon to those two later nodes. In scenario-b, based on the redemption probabilities, $N_0$ should give one copy to $N_1$ and the other copy to $N_2$. This strategy, however, can only be implemented when the entire interaction dynamics is known a priori, as in oracle-based routing. In the absence of knowledge about future interactions, upon meeting $N_1$, $N_0$ can compute the probability of meeting nodes with higher redemption probabilities before the coupon's expiry time. A delivery decision to $N_1$ can be made based on such expected overall gain. More specifically, if $N_0$ is expected to meet $K$ nodes with redemption probabilities higher than that of $N_1$, and if $K$ is larger than the current number of copies $T$, then no delivery to $N_1$ is made. Otherwise, one out of $T$ copies is given to $N_1$. The proposed Predictive Gain Utility Routing (PGUR) accomplishes such logic by computing expected gains based on past interaction locality information. It does so in the presence of the constraints on the number of available coupon copies and the remaining time to coupon expiry.

5.4.1 Predictive Gain Utility Routing-Individual (PGUR-I)

Predictive Gain Utility Routing-Individual (PGUR-I) relies on the knowledge of: 1) nodes' interest profiles (i.e., $\rho_i$), and 2) their underlying social interaction patterns. A coupon-carrying node uses this information to evaluate how a delivery to a peer contributes to the overall gain within the coupon's expiry time.

State Variables: Any node-i maintains the following variables with respect to another node-j.
$\lambda_{i,j}$: i's rate of meeting node-j over a pre-defined time window (e.g., a week). This parameter is updated every time the two nodes meet.
$\rho_j$: node-j's interest level toward the merchandise corresponding to the coupon.
Node-i computes node-j's redemption probability $P_R(\rho_j)$ based on the known redemption function corresponding to the coupon.

Forwarding Logic: Consider a situation in which a node-i, carrying $T$ copies of a coupon, meets another node-j that does not already have a copy of the coupon. Node-i first uses Eqn. 5-1 to compute the gain $G_j$, which node-j can achieve through redeeming the coupon. It also computes the expected gain of delivery to nodes it may visit during the remaining time ($\tau - t_c$) to coupon expiry, where $t_c$ denotes the current time. Using the recorded frequency of meeting with each node-k that it has met so far, node-i computes the probability of meeting k at least once during $\tau - t_c$ as:

$p_k^i(\tau - t_c) = 1 - e^{-\lambda_{i,k}(\tau - t_c)}$  (5-3)

This assumes that the inter-contact durations between node pairs follow an exponential distribution. The closer the current time gets to the coupon expiry time, the lower the chance of meeting node-k. The higher the contact frequency between node-i and node-k, the more probable it is for i to meet k during $\tau - t_c$. The expected gain from a possible delivery from node-i to node-k can be computed by node-i as:

$E_k^i(\tau - t_c) = p_k^i(\tau - t_c) \times G_k$  (5-4)

$G_k$ is the expected gain achieved by a coupon redemption by consumer-k as defined in Eqn. 5-1. It is computed based on $\rho_k$ and the corresponding redemption probability $P_R(\rho_k)$. After node-i computes $E_k^i(\tau - t_c)$ for each node-k from its past interaction list, it identifies the number of such nodes for which the expected gain of delivery is larger than the gain $G_j$, which can be achieved by delivering a copy of the coupon to the immediate contact j. Those nodes form a set $S = \{\,k \mid E_k^i(\tau - t_c) > G_j\,\}$, which is the set of target receivers (formed from the past interaction list) whose members individually maintain a higher expected gain compared to the gain generated by delivering to the immediate contact j.

If $|S| \ge T$, it means that node-i is expected to meet more nodes (than it has coupon copies) with expected gain higher than that of the immediate contact node-j. In this situation, node-i holds onto all its copies and does not make any delivery to node-j. If $|S| < T$, a copy of the coupon is delivered to the immediate contact node-j, since the expected number of future contacts with higher gain in this case is less than the available number of copies $T$. Until $T$ becomes zero, this process is repeated every time node-i meets another node which does not already have a copy of the coupon. The key components of the described PGUR-I algorithm are summarized in Algorithm 5-1. This protocol is termed PGUR-I (or Individual) since a node-i is required to maintain the state variables about other nodes on a per-node or individual basis. This may give rise to scalability problems, which are addressed in PGUR-A (or Aggregated) as presented next.

1: When node-i meets node-j:
2:     i updates $\lambda_{i,j}$ and $\rho_j$ for node-j in its records;
Forwarding Logic:
3: $S = \emptyset$;  // set of nodes k with expected gain larger than $G_j$
4: For node-i carrying $T$ copies of a coupon:
5:     For every node-k met by i before:
6:         // compute the expected meeting probability
7:         $p_k^i(\tau - t_c) = 1 - e^{-\lambda_{i,k}(\tau - t_c)}$
8:         // compute the expected gain
9:         $E_k^i(\tau - t_c) = p_k^i(\tau - t_c) \times G_k$
10:        if ($E_k^i(\tau - t_c) > G_j$) then
11:            add node-k to the set S;
12:     if ($|S| < T$) then
13:         Deliver one copy of the coupon to node-j;
Algorithm 5-1: The key components of the PGUR-I algorithm
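In code form, the same per-contact decision might look like the following sketch (in Java, with illustrative names; it is not the ONE simulator implementation and assumes the meeting rates and gains are already maintained as state variables):

import java.util.Map;

/** Sketch of the PGUR-I forwarding decision (Eqns. 5-3 and 5-4); names are illustrative. */
public final class PgurIForwarder {
    /**
     * Decides whether node-i should hand one of its T coupon copies to the immediate contact node-j.
     *
     * @param meetingRates   lambda_{i,k}: node-i's recorded meeting rate with each past contact k
     * @param gains          G_k: expected redemption gain of each past contact k (Eqn. 5-1)
     * @param gainOfContact  G_j: gain of delivering to the immediate contact j
     * @param remainingTime  tau - t_c: time left until the coupon expires
     * @param copies         T: number of coupon copies node-i still holds
     */
    public static boolean shouldDeliver(Map<String, Double> meetingRates,
                                        Map<String, Double> gains,
                                        double gainOfContact,
                                        double remainingTime,
                                        int copies) {
        int betterFutureContacts = 0;  // |S|
        for (Map.Entry<String, Double> entry : meetingRates.entrySet()) {
            double lambda = entry.getValue();
            // Eqn. 5-3: probability of meeting node-k at least once before expiry
            double pMeet = 1.0 - Math.exp(-lambda * remainingTime);
            // Eqn. 5-4: expected gain of a future delivery to node-k
            double expectedGain = pMeet * gains.getOrDefault(entry.getKey(), 0.0);
            if (expectedGain > gainOfContact) betterFutureContacts++;
        }
        // Deliver only if fewer better prospects are expected than available copies (|S| < T)
        return betterFutureContacts < copies;
    }
}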
5.4.2 Predictive Gain Utility Routing-Aggregated (PGUR-A)

Information aggregation in PGUR-A is done by maintaining state variables for node groups in specific gain quanta, as opposed to for individual nodes.

Gain Quantum: The entire gain range is divided into $L$ equal gain quanta. The gain of a delivery, as defined in Eqn. 5-1, varies in the range $[G_{min}, G_{max}]$, where $G_{min} = V_{min} - c_r$ and $G_{max} = V_{max} - c_r$. $V_{min}$ and $V_{max}$ are the minimum and maximum possible net revenues for a delivery. The minimum and maximum achievable revenues can be computed using Eqn. 4-1 as follows:

$V_{min} = P_R^{min}(c_e - c_d)$  (5-5)

$V_{max} = P_R^{max}(c_e - c_d)$  (5-6)

where $P_R^{min}$ and $P_R^{max}$ are the minimum and maximum possible redemption rates for a coupon. Each gain quantum $l$ corresponds to gain values in the range $\left[\,G_{min} + \frac{G_{max}-G_{min}}{L}(l-1),\; G_{min} + \frac{G_{max}-G_{min}}{L}\,l\,\right]$. The quantum $l$ is represented by the gain ceiling in that quantum, $G_l = G_{min} + \frac{G_{max}-G_{min}}{L}\,l$. For example, if the gain varies in the range [0,1] and $L = 10$, $G_l = 0.5$ corresponds to a set of nodes with gain range [0.4, 0.5]. Depending on its interest level in a coupon category and the corresponding redemption probability, each node belongs to a specific gain quantum. Nodes are logically grouped based on the quantum they belong to. The number of quanta ($L$) should be large enough to provide sufficient granularity, so that $G_l$ is a fair estimate of the individual gains of the members in a quantum.

State Variables: Each node-i maintains the following variables:
$n_l$: number of nodes in gain quantum $l$ that have been met by node-i so far.
$\lambda_l^i$: i's rate of meeting nodes in gain quantum $l$ over a pre-defined time window (e.g., a week).

Forwarding Logic: When node-i, carrying $T$ copies of a coupon, meets node-j, which does not possess a copy, the former computes $G_j$ based on Eqn. 5-1 and the interest value shared by node-j. Based on j's gain quantum, i also updates the quantities $\lambda_l^i$ and $n_l$ at this stage. For each gain quantum $l$ with $G_l \ge G_j$, node-i computes the following probability of meeting at least one node in quantum $l$ during the remaining time to expiry for the coupon of interest:

$p_l^i(\tau - t_c) = 1 - e^{-\lambda_l^i(\tau - t_c)}$  (5-7)

Node-i then computes the aggregated expected gain of delivery to nodes in gain quantum $l$, where $G_l \ge G_j$:

$E_l^i(\tau - t_c) = p_l^i(\tau - t_c) \times G_l$  (5-8)

A cumulative prospective recipient count $k_j^i(\tau - t_c)$ is then computed by adding the quantum member count $n_l$ over all the quanta for which the quantity $E_l^i(\tau - t_c)$ is larger than $G_j$:

$k_j^i(\tau - t_c) = \sum_{l=l_1}^{l_2} n_l$  (5-9)

The quantities $l_1$ and $l_2$ correspond to the smallest and largest gain quanta which have higher aggregated expected gain compared to $G_j$. The quantity $k_j^i(\tau - t_c)$ corresponds to the number of nodes (that node-i is expected to meet during the remaining time to expiry) that can individually produce larger expected gain compared to $G_j$. Delivery to this set of nodes is therefore expected to lead to a higher overall gain.
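The resulting decision rule can be summarized in code form as follows (a sketch with illustrative names, assuming the per-quantum rates and counts are maintained as described above); Algorithm 5-2 below gives the same logic in pseudocode.

/** Sketch of the PGUR-A forwarding decision over gain quanta (Eqns. 5-7 to 5-9); illustrative only. */
public final class PgurAForwarder {
    /**
     * @param quantumCeilings  G_l: ceiling gain of each quantum l
     * @param quantumRates     lambda_l^i: node-i's meeting rate with members of quantum l
     * @param quantumCounts    n_l: number of distinct quantum-l members met so far
     * @param gainOfContact    G_j: gain of delivering to the immediate contact j
     * @param remainingTime    tau - t_c: time left until the coupon expires
     * @param copies           T: coupon copies still held
     */
    public static boolean shouldDeliver(double[] quantumCeilings,
                                        double[] quantumRates,
                                        int[] quantumCounts,
                                        double gainOfContact,
                                        double remainingTime,
                                        int copies) {
        int prospectiveRecipients = 0;  // k_j^i(tau - t_c), Eqn. 5-9
        for (int l = 0; l < quantumCeilings.length; l++) {
            if (quantumCeilings[l] < gainOfContact) continue;  // only quanta with G_l >= G_j
            // Eqn. 5-7: probability of meeting at least one member of quantum l before expiry
            double pMeet = 1.0 - Math.exp(-quantumRates[l] * remainingTime);
            // Eqn. 5-8: aggregated expected gain of delivering into quantum l
            double expectedGain = pMeet * quantumCeilings[l];
            if (expectedGain > gainOfContact) prospectiveRecipients += quantumCounts[l];
        }
        // Deliver only if the expected number of better future recipients is below T
        return prospectiveRecipients < copies;
    }
}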
1: When node-i meets node-j:
2:     i updates $\lambda_l^i$ and $n_l$ for node-j in its records;
Forwarding Logic:
3: For node-i carrying $T$ copies of a coupon:
4:     For each gain quantum $l$ with $G_l \ge G_j$:
5:         // compute the expected meeting probability
6:         $p_l^i(\tau - t_c) = 1 - e^{-\lambda_l^i(\tau - t_c)}$;
7:         // compute the aggregated expected gain
8:         $E_l^i(\tau - t_c) = p_l^i(\tau - t_c) \times G_l$
9:         if ($E_l^i(\tau - t_c) > G_j$) then
10:            // add the number of nodes in gain quantum l
11:            $k_j^i(\tau - t_c)$ += $n_l$;
12: if ($k_j^i(\tau - t_c) < T$) then
13:     Deliver one copy of the coupon to node-j;
Algorithm 5-2: The key components of the PGUR-A algorithm

If $k_j^i(\tau - t_c) \ge T$, it means that node-i is expected to meet more nodes (than it has coupon copies) with expected gain higher than that of the immediate contact node-j. In this situation, node-i holds on to all its copies and does not make any delivery to node-j. Otherwise, a copy of the coupon is delivered to the immediate contact node-j, since the expected number of future contacts with higher gain in this case is less than the available number of copies $T$. Until $T$ becomes zero, this process is repeated every time node-i meets another node which does not already have a copy of the coupon. The key components of the described PGUR-A algorithm are summarized in Algorithm 5-2.

5.5 Compared Algorithms

Benchmark: This approach is used for finding the gain upper bound when all system parameters, including node mobility/interactions, are known a priori. The first-time interactions between a coupon carrier i and every node-j that it meets until the coupon's expiry time are tracked. The top $T$ interactions in terms of gain from the entire interaction set are chosen for coupon dissemination from node-i to each such node-j. The cumulative gain is then computed by adding up the individual gains obtained across all $T$ disseminations. This cumulative value indicates the upper bound of achievable gain for the given mobility/interaction and interest profiles.

Social-aware Opportunistic Routing Protocol (SCORP): In order to make SCORP fit the existing framework, we modify this protocol slightly differently compared to the modified version used in the previous chapter. The interest-aware protocol SCORP [20] works with binary interest levels (i.e., 0 or 1). It uses a social weight, computed using a node's interest profile and social interaction patterns. Node-a computes the quantity $TCTI(a,x)_i$, which is its Total Connected Time to Interest (TCTI) with nodes with interest-x using a daily sample-i. The average $ATCTI(a,x)_{j,i}$ is computed for day-j as:

$ATCTI(a,x)_{j,i} = \frac{TCTI(a,x)_{j,i} + (j-1) \times ATCTI(a,x)_{j-1,i}}{j}$  (5-10)

Content is transferred from node-i to node-j when node-j has consumption interest, or its social weight (computed based on its ATCTI) is higher than that of i. We have incorporated the following changes in order to adapt the baseline SCORP with binary interests to the present framework of multi-level interests. The idea is to let the node interest level $\rho_i$ vary in the range 0 to 1, but to restrict the redemption probability $P_R(\rho_j)$ to binary values of 0 or 1. In that case, $P_R(\rho_j)$ is set to 1 for all nodes that induce a positive gain, and it is set to zero otherwise. This approach does not differentiate between interest levels associated with higher redemption probabilities and those associated with lower redemption probabilities. Therefore, it underestimates the achievable gain. To prevent that, we modify SCORP's logic such that the delivery to an interested node-j takes place based on the peer's redemption probability normalized between 0 and 1.

5.6 Dissemination Performance

We use the DTN simulator ONE [55] for evaluating PGUR-A, PGUR-I and other target protocols under three different community-based synthetic mobility models and one real mobility trace (i.e., from INFOCOM '06 [59]).
The main differences across the models are in their community structure and interest level distributions. For all the experiments we use the Gaussian redemption function as explained in Section 4.6.1. Figure 5-2 shows the applied function with mean 𝜇 = 0.75 and maximum redemption probability 𝑃@abc = 0.5. The corresponding gain curve in Figure 5-2 is when the forwarding cost 𝑐p , earned revenue 𝑐q , and discount 𝑐o are set to 0.2, 5 and 2 respectively. These values are also used for all subsequent experiments. 1 0.8 ) 0.6 0.4 0.2 ) 0 -0.2 0 0.2 0.4 0.6 0.8 1.4 1.2 1 0.8 0.6 0.4 0.2 0 1 -0.2 Figure 5-2: Example Gaussian redemption function 5.6.1 Synthetic Mobility Synthetic mobility follows map-based movements in a network of 10 waypoints. Nodes move between waypoints with assigned transition probabilities. For all experiments, 98 network size is set to 100 nodes. Each experiment is repeated for around 32 source nodes with highest redemption probability. 5.6.1.1 Monolithic Mobility Each of the 100 nodes travels from any waypoint to any randomly chosen waypoint with equal probability. Interest levels for nodes are randomly assigned within the range 𝜌? ∈ [0.62, 0.89] following a uniform distribution. 4.5 BM PGUR-I PGUR-A 3.5 2.5 1.5 16.66 SCORP (a): !=12 ℎ-./) Average gain Average gain 5.5 17.16 17.66 Time (days) PGUR-I PGUR-A 7.5 5.5 11.5 BM PGUR-A SCORP Average gain Average gain BM PGUR-I SCORP (b): ! = 1 %&' 17.16 17.66 Time (days) 13.5 11.5 9.5 8.5 7.5 6.5 5.5 4.5 3.5 2.5 1.5 16.66 3.5 1.5 16.66 9.5 PGUR-I BM PGUR-A 7.5 5.5 SCORP 3.5 (c): ! = 5 %&') 1.5 16.66 18.66 20.66 Time (days) (d): ! = 10 %&') 21.66 Time (days) 26.66 Figure 5-3: Evolution of gain for different expiry time; T=10 Figure 5-3 shows evolution of the network-wide cumulative gain over time, beginning from the content generation time on day 16, which is when the network simulation warms up in terms 99 of stabilizing the interaction profile. Experiments with ten initial coupon copies (i.e., 𝑇 = 10) were carried out for expiry time 𝜏 ranging from 12 hours all the way up to 10 days. The benchmark mechanism (i.e., BM), as explained in Section 5.5, shows the maximum achievable gain for all scenarios in Figure 5-3. All the protocols, including benchmark, show an increasing trend for gain over time, as more coupons are delivered. Figure 5-3 shows that even though the proposed PGUR protocols have different timedynamics, in most scenarios they deliver near-BM final cumulative gain performance. PGUR-A achieves a higher gain compared to PGUR-I in all cases except for the very short expiry time. The reason is as following. Imagine a node-I carrying a copy of a coupon meets node-j. In case of an immediate delivery, the induced gain is 𝐺` . The protocol PGUR-A first computes the expected gain of delivery to nodes belonging to gain quantum (see Eqn. 5-8) higher than 𝐺` . This expected gain is a function of the probability of meeting any of the qualified nodes in the mentioned quantum. PGUR-I, on the other hand, computes the expected gain of delivery to future contacts based on the probability of meeting each individual node with gain larger than 𝐺` (see Eqn. 5-4). Since, the probability of meeting any node belonging to a gain quantum is greater than the probability of meeting each specific node in that quantum, the expected gain of delivery to future contacts is usually higher for PGUR-A compared to that for PGUR-I. Therefore, PGUR-A is more conservative regarding immediate deliveries. 
This allows PGUR-A to achieve a higher gain by waiting for more qualified consumers with higher gain and not dispensing resources (i.e., coupon copies) rapidly. That is why PGUR-A achieves a higher gain when the expiry time is longer than half-a-day. For short expiry times (12 hours), however, PGUR-I achieves a higher gain, since generous dissemination to immediate contacts is more efficient under tight time constraints. 100 Figure 5-3 shows that the other evaluated protocol SCORP performs worse than both PGURA and PGUR-I in all the experimented scenarios. The main reason for such performance deficiency is that while dispensing coupons, SCORP does not predict the chances of meeting nodes with higher gain before the coupon expiry time. In contrast, with both PGUR-I and PGUR-A, a carrier often avoids immediate delivery according to the prediction it does about possibility of meeting more qualified receiver nodes. This lack of visibility particularly affects SCORP’s gain when the expiry time is long (e.g., 5 or 10 days). It is also the reason why SCORP finishes dissemination in a shorter time compared to PGUR in all scenarios. Generally, SCORP achieves a higher gain with increase in expiry time initially, because in longer time duration, a larger fraction of the sample space is explored and nodes with higher expected gain are found. Though, for 𝜏 = 10 days, a slight decrease in gain is observed. That is because with SCORP a coupon carrier may deliver one or multiple copies to another node, which already has the content. Such deliveries do not generate revenue since no consumptions happen. Those deliveries, however, incur the forwarding cost in the form of rebates. With longer expiry time, more of such deliveries happen, thus resulting in loss of cumulative gain 𝐺𝑇 . Figure 5-4 depicts the behavior of PGUR protocols in terms of how they perform immediate deliveries and how they hold on to coupon copies with the anticipation of meeting good prospective recipients. These results correspond to the dissemination pattern for one of the coupon carriers shown in Figure 5-3:d, where T=10 copies and 𝜏 = 10 days. The red lines show the change in available number of copies at the coupon carrier over time. T starts at 10 and it reaches 0 showing that all copies are disseminated eventually. Upon interacting with node-j, node-I computes the number of nodes (with individual gains higher than 𝐺` ) that node-I is expected to meet before the 101 coupon’s expiry time. This quantity, which corresponds to the size of set S in PGUR-I and refers Time (days) !"#$ % − '( 20 (a) PGUR-I 18 16 14 12 10 8 6 Last delivery 4 2 0 18.66 !"#$ % − '( 20 Available 18 copies 16 14 12 10 8 6 4 2 0 16.66 18.66 (b) PGUR-A Last delivery 20.66 20 18 16 14 12 10 8 6 4 2 0 Available number of copies 20 18 16 |S| 14 Available copies 12 10 8 6 4 2 0 16.66 17.66 Available number of copies |S| to the value 𝑘`? 𝜏 − 𝑡E in PGUR-A’s logic, is shown as the blue circles in Figure 5-4. 22.66 Time (days) Figure 5-4: Dissemination behavior of PGUR protocols The points above the red line show the instances of meetings for which the number of better future consumers (compared to j) exceeds the number of available copies. Therefore, no delivery to j happens. All 10 points below the red line correspond to instances, where, the number of available copies are more than the number of nodes with potentially higher gain compared to 𝐺` . That is when a delivery takes place. After the last delivery, the number of copies reaches 0. 
With a fixed expiry time in both cases, PGUR-A finishes disseminating all copies on day 25 (Figure 5-4:b), while for PGUR-I dissemination is done by day 19 (Figure 5-4:a). This indicates that PGUR-A is less likely to deliver to immediate contacts. It rather holds on to the content for a longer duration. This phenomenon, usually leads to a higher eventual gain. 5.6.1.2 Community-based Mobility We experimented with a Community-based Mobility with Community-based Interest (CMCI) model. Nodes are divided into 4 communities, each with 25 nodes. Members within a community are made to follow the same stochastic mobility by enforcing the same inter-waypoint 102 transition probabilities as discussed in Section 5.6.1. This ensures higher intra-community meeting frequency compared to inter-community on a node-pair basis. Consumption interest is assigned as follows. The entire redemption probability spectrum is divided into 4 equal segments, each specific to a community. Then the interest levels for nodes in a community are uniformly randomly assigned such that the corresponding redemption probabilities fall into the community-specific segment. This way, the members of one community are similarly interested in receiving coupons for a product. 20 Final Gain 15 SCORP PGUR-I PGUR-A BM 10 5 0 10 15 5 10 0 5 Figure 5-5: Final gain for all combinations of T and 𝜏 for CMCI mobility scenario Experiments with CMCI are performed for a large combination of T and 𝜏. The observed trends of gain evolution over time for all experimented protocols are very similar to those in the Monolithic scenario (i.e., Figure 5-3). The final gain values for all protocols are summarized in 103 Figure 5-5. Each point corresponds to the final gain achieved by a protocol for a combination of initial number of coupon copies T, and an expiry time 𝜏. Observe that under CMCI, the final gain provided by the benchmark (BM) protocol slightly increases compared to that under the monolithic model, as reported in Figure 5-3. This is because a coupon carrier’s most frequently contacted peers are nodes from the same community with similarly high interest levels and the resulting redemption probabilities. The same observation and explanation also apply to all non-BM protocols. For all protocols, the final gain increases with higher number of coupon copies. Also, as the expiry time extends, the protocols are able to achieve larger gains, since the chances of meeting nodes with higher redemption probabilities grow. Generally, with longer expiry time, the performance gap between SCORP and the PGUR protocols increases for all T’s. The reason is that in a shorter expiry time it is more urgent to deliver coupons to immediate contacts rather than holding on to the coupon to meet better consumers in future. Hence, SCORP’s rapid delivery logic is closer to the Benchmark’s strategy. For longer contact duration, such strategy hurts SCORP’s performance compared to those of benchmark and the other protocols that waits for perspective better receivers. 5.6.2 Trace-based Mobility All the dissemination protocols were also tested for a human interaction trace collected [59] from INFOCOM ’06 conference. The trace contains data from 97 devices including few static ones and those, which were carried around by a group of users for all four conference days. All experiments in this section are run for at least 15 source nodes. Figure 5-6 shows the maximum achievable gain (Figure 5-6:a) and the time to reach that (Figure 5-6:b) as a function of the coupon expiry time. 
Observe that the overall comparative 104 performance across the protocols, as observed for the two previous synthetic mobility scenarios, also applies for this trace based case. As before, the performance of the proposed PGUR protocols 19 18 17 16 15 14 13 12 11 10 (a) BM PGUR-A PGUR-I SCORP 0.5 1 1.5 Expiry time (days) Time to reach max. gain Max. gain lay in between those for benchmark and the SCORP mechanism. 2 2 1.8 1.6 1.4 1.2 1 0.8 0.6 0.4 (b) BM PGUR-I PGUR-A SCORP 0.5 1 1.5 Expiry time (days) 2 Figure 5-6: Gain dynamics for different expiry time; T=15 Overall, results from synthetic mobility and the trace-based mobility verify the usefulness of PGUR-A in achieving near-benchmark gain various combinations of number of coupon copies and their expiry times. Only for short expiry time, PGUR-A’s performance is slightly worse than PGUR-I’s. However, considering the scalability advantages of PGUR-A, this small gain compromise can be acceptable. The larger the expiry time is and the more the number of initially available coupon copies, the higher economic gain is provided by the PGUR protocols. 5.7 Summary and Conclusion We developed two economic gain-aware DTN multicast routing protocols that maximize the economic gain of disseminating coupons. Economic gain was defined as a composite index, which combines revenue from delivery and forwarding cost from dissemination. Using experimental results from the simulator ONE [55], it was shown that the proposed protocols achieve nearBenchmark economic gain, which is better compared to a competing DTN routing protocol. 105 Sensitivity of the protocols was evaluated for different network mobility, consumption interest patterns, a coupon’s expiry time, and the number of distributed copies. Future work includes developing: a) learning-based framework for high-gain dissemination, b) a routing framework which uses information regarding nodes’ transitive interactions, and c) mechanisms for handling consumer selfishness. 106 6 6.1 MAC Protocol Design Using Evolvable State Machines Introduction Communication protocols can be modeled [62] as collaborative state machines that interact through inter-node message passing. Designing a protocol requires constructing an optimized state machine that usually maintains a balance between multiple performance indices such as communication delay and throughput. Protocol design is considered to be an optimization problem for which rarely there is a closed form solution. Currently, designing network protocols are mostly based on heuristics, which relies on past experience of the designer in known network conditions. While heuristics can work for relatively static networks, a protocol’s performance can and do degrade when the network conditions change over time. To cope with such changes, network designers often need to redesign protocols periodically with changing network conditions. One alternative approach is to leverage learning mechanisms which allow network protocols adapt to the dynamicity of the environment with minimum intervention from the designer. This section explores evolutionary learning, specifically, evolving state machines for protocol synthesis. A specific network protocol is coded as a genotype and its resulting state machine behavior manifests in the form of the corresponding phenotype, leading to specific performance. The genotype or state machine is allowed to evolve under the pressure of desired performance fitness. 
In this chapter, we explore the evolutionary design of ALOHA [63], a simple yet widely used family of MAC protocols for which the theoretical upperbounds are well known. Figure 6-1 shows a two-state machine for the Pure-ALOHA logic along with the protocol’s attainable throughput for different network-wide load G (Figure 6-1:b). The state machine shows when a node has a packet to send, it simply sends the packet, and subsequently gets back to the wait-for-packet state. Part of the investigation in this chapter will be to explore when the basic 107 building blocks for the ALOHA state machine are provided to an evolutionary substrate, if and P=0/1: No packet to send/packet to send T=0/1: Transmission done/ in progress T=1 P=0 P=1 (a) Wait for packet Transmit packet T=0 Normalized Medium Usage how the desired state machine emerges as a result of an evolution process. 0.2 ~18% Pure ALOHA 0.18 0.16 0.14 0.12 0.1 0.08 0.06 0.04 (b) 0.02 0 0 2 4 6 8 Normalized Information Load Figure 6-1: (a) Two-state machine for Pure-ALOHA MAC; (b) Network-wide load G vs. Throughput S (in Erlang) for pure-ALOHA phenotype In the next chapter, it will be also explored if progressively more complex protocols such as slotted-ALOHA and Carrier Sense Multiple Access (CSMA) can emerge out of an evolution process when time slotting and channel sensing capabilities are added. Specific contributions of the chapter are as follows. First, a probabilistic state transition model is developed for genotype coding of protocol state machines. Second, a genotype to behavioral phenotype modeling mechanism is developed for evaluating the fitness of a protocol genotype for a given network and traffic load. Third, a Java based evolution substrate is developed in which state machines can evolve with user-specified evolution operators such as mutation. Finally, results are presented in the form of evolution and performance trajectories in the presence of varying network and traffic conditions. 6.2 Protocol State Machine We define a generalized state machine (see Figure 6-2), which is modeled after a typical learning process that a baby goes through for making a meaningful conversation. Such learning involves: 108 a) identifying appropriate time slots to start talking in a competitive environment with multiple speakers, b) ending a sentence in a graceful manner rather than interrupting herself or getting distracted in the middle of a sentence, and c) waiting or distracting herself for subconsciously adjusting the speaking rate in order to minimize interruptions to others. The state machine in Figure 6-2 provides a general framework that embeds those learning components and more. The machine is intentionally kept general with high degrees of behavioral freedom. The hope is that in the presence of such behavioral freedom, unpredictable speaking protocols will emerge under right environmental conditions. In general, speaking a sentence is mapped as sending a fixed size packet. Three states, namely, Wait (W), Transmit (T) and Loitering (L) are defined. Wait is a state in which a node remains silent while waiting to receive the next packet to be transmitted from the upper network layer. T=1; Set Timer; State = T )( Drop packet T=0; State = W W } ) T 1- ) State = W T=0; State = L State=L L 1- Figure 6-2: An evolvable generalized FSM with embedded pure-ALOHA logic A transmission takes place in the state Transmit. The transition from W to T is the only legitimate transition that can initiate a packet transmission. 
Loitering models the distraction phase as outlined in the baby’s learning scenario. This state does not actually induce any explicit fitness reward, 109 though it ensures additional behavioral freedom. Lingering in any of the states T or L can delay possible new transmissions. The labels on the arrows in Figure 6-2 indicate the transition probabilities, and the text within the circle by a transition describes the actions associated with it. Additionally, the machine uses two flags P (Packet flag) and T (Transmission flag). When set, they indicate generation of a packet and an ongoing transmission respectively. It should be noted that while a specific set of transition probabilities in this machine implements the known pure ALOHA protocol, it is general in that it contains many possible execution sequences implementing other working and non-working protocols. ÁuR→Áuv There are six independent transition probabilities as follows. The probability 1 − 𝑃ÀÀ refers to the probability of a transition from W to T when a packet is generated. In pure-ALOHA, This probability equals to 1, representing a node greedily attempting to access the channel as soon as a packet is generated. 𝑃À] is the probability of moving to the Loitering state, while 𝑃]À models the return path from L to W. Arbitrary movements between the states W and L based on the probabilities 𝑃À] and 𝑃]À control random loitering that will prove to be useful for self-adjusting the effective load from a node. In pure-ALOHA logic, a node returns to state W as soon as a previous transmission is over. However, in the generalized model, a node can decide between returning to state W with ruv→ruR ruv→ruR probability 𝑃rÀ and moving to state L instead, with a probability 1-𝑃r] . Note that a redundant transition to the state L delays possible packet transmissions. Furthermore, a node is ruv→ruv allowed to abandon its transmission and move to state W or state L with probabilities 𝑃rÀ ruv→ruv and 𝑃r] respectively. Table 6-1 summarizes the independent transition probabilities and values which would reduce the state machine to that of pure-ALOHA logic in Figure 6-1:a. 110 𝑷𝑷u𝟎→𝑷u𝟏 : 𝑾𝑾 W to W, when packet is generated, 0 𝑷𝑾𝑳 : W to L without any external event, 0 𝑷𝑳𝑾 : L to W without any external event, 0 𝑷𝑻u𝟏→𝑻u𝟎 : 𝑻𝑾 T to W when an ongoing transmission is over, 1 𝑷𝑻u𝟏→𝑻u𝟏 : 𝑻𝑾 T to W while a packet transmission is in progress,0 𝑻u𝟏→𝑻u𝟏 𝑷𝑻𝑳 : T to L while a packet transmission is in progress, 0 Table 6-1: Independent transition probabilities 6.3 Evolution Platform The evolution platform (see Figure 6-3) consists of two main components, namely, a Java based evolution fabric and a C based protocol evaluation module. A gene embodies the collection of all the independent transition probabilities. The evolution fabric is responsible for the evolutionary tasks including generation of the initial gene population, sorting fitness values, a tournament selection, mutation, and reproduction [64]. A mapping function is used for mapping each gene to a protocol behavior phenotype code, which is evaluated within the C based evaluation module. The protocol embedded in network nodes runs and induces a network-wide fitness reward. The protocol performance is then fed back into the evolution fabric in the form of gene-fitness. C code gi ‘s Throug hput Gene map Protocol Code Target Network Platform Evolution Fabric Java code Fitness Figure 6-3: System structure for the proposed evolutionary framework 111 Fitness: Network-wide throughput is used as the fitness indicator. 
The hypothesis here is that, by way of effective genotype evolution of the state machine in Figure 6-2, nodes will be able to adjust to the optimum loading conditions in order to achieve the best possible throughput (i.e., that of ALOHA) as shown in Figure 6-1:b.

Evolution: Evolution starts with an initial population (of size p) that includes genes consisting of the independent transition probabilities from Table 6-1 with randomly assigned values. Note that the extreme probability values (i.e., 0 and 1) can lead to state pruning and inclusion, which can induce drastic changes in node behavior. For example, a gene which has both probabilities $P_{WL}$ and $P_{TL}^{T=1 \to T=1}$ set to zero prunes the loitering state L from the corresponding phenotype protocol. Reproduction is implemented using tournament selection [64], which is used in Genetic Algorithms for providing selection pressure (here, network throughput) by holding a tournament among a set of selected members, which is 2m% of p in this case. After the tournament selection, m% of p members are selected. Subsequently, the selected m% of the p genes go through a mutation process to produce a new generation. It was observed that by choosing m around 20%, sufficient genetic diversity can be maintained across generations. The mutation operator of the Breeder Genetic Algorithm proposed in [65] is used. A quantity $Value_i$ is mutated as:

$Value_i^{mutated} = Value_i + s_i \cdot r_i \cdot a_i, \quad i \in \{1, 2, \ldots, n\}$  (6-1)

where $s_i$ is a random variable $\in \{-1, 1\}$, and $r_i$ is defined as $r \times domain_i$. The quantity $r$ is the mutation range, with typical values within $[0.1, 10^{-6}]$, and $domain_i$ is the domain of the variables. The quantity $a_i$ is defined as $2^{-u \cdot k}$, where $u \in [0,1]$ and $k$ is the mutation precision, with typical values in the range $\{4, 5, \ldots, 20\}$. Parameter $k$ is associated with the mutation step-size, which is the proximity of a mutated value to its parent individual. Mutation steps are limited to the range $[r \cdot 2^{-k}, r]$. Modifying the parameters in Eqn. (6-1) provides different search strategies with different step-sizes. Mutating the top (most-fit) members of a population is meant to maintain diversity among members of the new generation and to avoid settling in local optima. Each member of the new generation models a specific protocol behavior, which gets evaluated within the C based evaluation module in Figure 6-3. The evolution process, including sorting fitness values, tournament selection, mutation-based reproduction, and population replacement, is then performed on the members of the new generation, which consists of m% of p children and the remaining p − (m% of p) members from the previous generation.

6.4 Results

We define an Evolutionary Degree of Behavioral Freedom (EDBF) as the number of transition probabilities (in Figure 6-2) that are allowed to evolve in an experiment. In scenarios with smaller EDBFs, fewer evolvable probabilities limit the phenotype search space. The gene structure consists of the six independent probabilities mentioned in Section 6.2. The population size in each generation is set to 100 in all experiments. Evolution is run for 300 generations with m (see Section 6.3) set to 20%.

6.4.1 Partial EDBF (EDBF: 1, 2, 3)

6.4.1.1 Evolvable Probability: $P_{WW}^{P=0 \to P=1}$ (EDBF: 1)

$P_{WW}^{P=0 \to P=1}$, the probability of remaining in state W when a new packet is generated, is the only evolvable probability in these experiments. Figure 6-4:a shows the evolution of the average $P_{WW}^{P=0 \to P=1}$ over the 100 members of each generation for scenarios with different network-wide applied load (G).
Only 50 generations out of 300 are shown since probabilities stabilize by that point. 113 G-0.8 0.18 0.5 G-0.8 0.4 0.2 (a) G-1.0 Throughput (Erlang) 0.6 G-0.5 (b) G-1.0 0.16 G-0.3 0.14 0.3 0.12 0.2 G-0.5 G-0.3 0.1 0.1 0.08 0 0 20 Generation number 0 40 20 Generation number 40 ÁuR→Áuv Figure 6-4: (a) Evolution of 𝑃ÀÀ and (b) the corresponding throughput ÁuR→Áuv For a given node-specific load g (i.e., G/N), any non-zero 𝑃ÀÀ reduces the nodes’ ÁuR→Áuv ÁuR→Áuv effective individual load 𝑔 following the expression 𝑔=𝑔(1 − 𝑃ÀÀ ) . In general, 𝑃ÀÀ reduces a node’s transmission persistence. The network-wide effective load can be expressed as ÁuR→Áuv 𝐺 = 𝐺 1 − 𝑃ÀÀ . For any applied load G, by the way of evolving the probability ÁuR→Áuv 𝑃ÀÀ , the nodes are able to adjust the network-wide effective load 𝐺 to its optimal value 0.5 at which the network-wide throughput is maximized to pure-ALOHA’s throughput of 18.4%. This ÁuR→Áuv load-reduction effect can be evidenced in the evolution trajectories of 𝑃ÀÀ and throughput (when G=0.8 and G=1) in Figure 6-4:a and Figure 6-4:b. The expected steady state value of ÁuR→Áuv 𝑃ÀÀ is: ÁuR→Áuv 𝑃ÀÀ = 1- 𝐺 𝐺 (6-2) ÁuR→Áuv For example, when G is 0.8, 𝑃ÀÀ is expected to evolve to 0.37 (i.e., 1- 0.5 𝐺 ), which is evidenced in Figure 6-4:a. When G is set to less than the optimal value 0.5, since no reduction ÁuR→Áuv in load is needed, 𝑃ÀÀ evolves to a near-zero value. This helps maintaining the effective 114 load 𝐺 at the same level of the applied load G. Figure 6-4:b shows that throughput trajectory with ÁuR→Áuv evolving 𝑃ÀÀ for the 𝐺 = 0.3 case settles at the theoretical throughout of 0.16 (𝑆–]ÎÏ– = 𝐺𝑒 •™± ). When the applied load G is equal or higher than the optimal value 0.5, the throughput is able to reach the theoretical maximum of 18.4%. Figure 6-5 depicts the trajectory of the effective load 𝐺 averaged over each generation. For all 𝐺 ≥ 0.5 scenarios, 𝐺 is adjusted to 0.5 by the way of evolving the transition probability ÁuR→Áuv 𝑃ÀÀ . However, 𝐺 does not need to be adjusted when the applied load G is less than the ÁuR→Áuv optimal value 0.5. In such cases (e.g., 𝐺 = 0.3), 𝑃ÀÀ evolves to near-zero values such that 𝐺 converges to G. To summarize, the above results indicate that the generalized state machine in Figure 6-2 ÁuR→Áuv can evolve towards the ALOHA state machine in Figure 6-1:a, when genes containing 𝑃ÀÀ are allowed to evolve under network-wide throughput as the fitness pressure. Effective Load 𝐺 ̂ (Erlang) 0.8 0.7 G-1.0 0.6 G-0.8 0.5 0.4 G-0.5 G-0.3 0.3 0.2 0.1 0 0 10 20 30 Generation number 40 50 Figure 6-5: Trajectory of effective load 𝐺 for different applied load G 6.4.1.2 Evolvable Probabilities: 𝑷𝑻=𝟏→𝑻=𝟎 and 𝑷𝑳𝑾 (EDBF: 2) 𝑻𝑾 ruv→ruR Following a packet transmission, a node returns to state W or L with probabilities 𝑃rÀ ruv→ruR or 1-𝑃rÀ respectively. A node moves out of state L with probability 𝑃]À . In this set of 115 experiments, we allow both these probabilities to evolve. Figure 6-6:a through Figure 6-6:d depict ruv→ruR the joint evolution trajectory of the pair 𝑃rÀ and 𝑃]À for applied loads G = 0.3, 0.5, 0.8 and ruv→ruR 1.0. Each point in these graphs represents the pair {𝑃rÀ , 𝑃]À } and the corresponding throughput. In the beginning, the probabilities change at a fast rate, causing the inter-generation probabilities to be well separated. With advancing generations, the probabilities reach nearconvergence values, forming distinct clusters. For all 𝐺 ≥ 0.5, such clusters are formed around the throughput 0.18, indicating evolved ALOHA behavior. 
For the scenarios with applied load ruv→ruR 𝐺 ≤ 0.5, 𝑃rÀ evolves towards 1, which is the corresponding transition probability for pureruv→ruR ALOHA. This can be observed in Figure 6-6:a and Figure 6-6:b. Any non-unity 𝑃rÀ causes the effective load 𝐺 to be smaller than load G. For G <= 0.5, since such load reduction can drag ruv→ruR down the throughput, the nodes evolve the 𝑃rÀ part of their genes towards 1 in order to maintain the throughput at its maximum possible values. 𝑃]À increases initially slightly to allow the needed exits from state L. It then evolves to values between 0.6 and 0.8, which can maintain the maximum possible throughput. Figure 6-6:c and Figure 6-6:d depict scenarios with G > 0.5. ruv→ruR The initial fall in 𝑃rÀ helps reducing 𝐺 towards the optimal value 0.5. The fall for the 𝐺 = 1.0 case in Figure 6-6:d is sharper than that for 𝐺 = 0.8 in Figure 6-6:c. This is because with ruv→ruR higher load, a larger reduction is needed and a smaller 𝑃rÀ provides that reduction. 116 End point of evolution 0.18 (a) G-0.3 0.17 0.16 Starting point of evolution 0.15 1 Throughput (Erlang) Throughput (Erlang) 0.19 0.19 0.18 0.17 (b) G-0.5 End point of evolution 0.16 Starting point of evolution 0.15 1 1 0.5 0.5 0.5 0.19 0 End point of evolution 0.18 (c) G-0.8 0.17 0.16 0.15 1 Starting point of evolution Throughput (Erlang) Throughput (Erlang) 0 0.5 0.19 0.18 End point of evolution 0.17 (d) G-1.0 Starting point of evolution 0.16 0.15 1 1 1 0.5 0.5 0.5 0.5 0 0 ruv→ruR Figure 6-6: Evolution of pair 𝑃rÀ and 𝑃À] , and the corresponding throughput ruv→ruR After the initial fall, 𝑃rÀ increases and stabilizes at values in the range between 0.7 and 0.8 which, along with the corresponding 𝑃]À values, maximizes the throughput around 18%. The effective load 𝐺 in this EDBF= 2 scenario follows a pattern that is very similar to what is shown in Figure 6-5. Overall, these results indicate that even when the gene structure constitutes two probabilities ruv→ruR 𝑃rÀ and 𝑃]À (i.e., EDBF = 2), the generalized state machine in Figure 6-2 can evolve towards the pure-ALOHA behavior as depicted in Figure 6-1:a. 117 6.4.1.3 Evolvable Probabilities: 𝑷𝑾𝑳 and 𝑷𝑳𝑾 (EDBF: 2) Figure 6-7:a through Figure 6-7:d demonstrate the trajectory of the probabilities 𝑃À] and 𝑃]À over 300 generations. For small G values 0.3 and 0.5, 𝑃À] approaches zero, indicating that the 1 1 0.8 0.8 Probabilities Probabilities nodes rarely digress from the desirable packet transmission flow. 0.6 0.4 0.2 (a) G-0.3 0 0.6 0.4 0.2 (b) G-0.5 0 0 100 200 300 0 1 0.8 0.8 Probabilities Probabilities 1 0.6 0.4 0.2 (c) G-0.8 0 0 100 200 Generation number 100 200 300 Generation number Generation number 0.6 0.4 0.2 (d) G-1.0 0 0 300 100 200 300 Generation number Figure 6-7: Evolution of pair PÕÖ and PÖÕ for different network-wide loads In other words, due to almost zero entries into the L state, the system becomes insensitive to the probability 𝑃]À . To maintain a balanced effective load 𝐺, 𝑃]À slightly increases (Figure 6-7:b) as a response to a slight decrease in 𝑃À] . Due to traffic overload (i.e., higher G), 𝑃À] increases in Figure 6-7:c as well as in Figure 6-7:d. Higher probability of entering to state L imposes longer intervals between packet transmissions; thus a decrease in 𝐺. 118 6.4.1.4 Evolvable Probabilities: 𝑷𝑻u𝟏→𝑻u𝟏 , 𝑷𝑻u𝟏→𝑻u𝟏 , and 𝑷𝑳𝑾 (EDBF: 3) 𝑻𝑳 𝑻𝑾 ruv→ruv ruv→ruv Figure 6-8:a through Figure 6-8:d show the trajectory of the triple 𝑃r] , 𝑃rÀ and 𝑃]À for 300 generations. The sparse points show the start of the evolution. 
As probabilities evolve and converge to optimum values, dense clusters form. 0.6 Starting point of evolution 0.6 (a) G-0.3 0.4 0.2 Starting point of evolution (b) G-0.5 0.4 End point of evolution 0 1 0 1 1 0.5 End point of evolution 0.2 1 0.5 0.5 0 0 0 0 Starting point of evolution (c) G-0.8 0.6 0.5 0.6 (d) G-1.0 0.4 0.4 0.2 0.2 End point of evolution 0 1 0 1 Starting point of evolution End point of evolution 1 1 0.5 0.5 0.5 0.5 0 0 0 0 ruv→ruv ruv→ruv Figure 6-8: Evolution of 𝑃r] , 𝑃rÀ and 𝑃]À for different load conditions ruv→ruv ruv→ruv The transmission abandoning probabilities (i.e., 𝑃r] and 𝑃rÀ ) approach zero when G is small, i.e., 0.3, 0.5. In fact, smaller abandoning probabilities enforce full use of sending potential, which is required for small G values. However, for larger G, 0.8 in Figure 6-8:c and 1.0 ruv→ruv ruv→ruv in Figure 6-8:d, the system learns to leverage larger 𝑃r] and 𝑃rÀ to abandon redundant transmissions to adjust the effective load 𝐺 to 0.5. 𝑃]À , however, declines for larger Gs. 119 The reason is that 𝑃]À facilitates returns from L to W and aids subsequent packet transmissions. Thus, lower values of 𝑃]À , acting as a 𝐺-reducer, emerge as network traffic increases. 6.4.2 Full EDBF (EDBF: 6) Full EDBF is accomplished by letting all six transition probabilities to evolve. Figure 6-9 depicts the evolution of all six probabilities for loads G set to 0.3, 0.5, 0.8 and 1.0. Observe that 𝑃À] increases for larger G (i.e., 0.8 and 1.0). The reason is that lager 𝑃À] values cause more frequent loitering (L), which delays packet transmissions, thus causing reduction in ÁuR→Áuv effective load 𝐺. Large 𝑃ÀÀ can also be seen in large G scenarios. For all the full-EDBF scenarios a network is able to adjust its effective load 𝐺 to 0.5, leading to the target network throughput of 0.18. These full-EDBF results demonstrate that the generalized state machine in Figure 6-2 is evolvable towards pure-ALOHA even when all its independent transition probabilities are allowed to evolve. 6.4.3 Protocol Adaptation 6.4.3.1 Adaptation with Changing Traffic Load In a 100-node network, load G is runtime altered in four instances. Starting at 0.3, G is changed every 50 generations to values 0.8, 1.0, 0.5 and 0.8. Each stage is run for 50 generations. The results in Figure 6-10 correspond to an EDBF-2 case in which the independent probabilities ruv→ruR 𝑃rÀ and 𝑃]À are set to evolve. When 𝐺 = 0.3, both probabilities converge to large values (i.e., 0.8 through 1) in order to use the full transmission capacity via expedited packet transmissions. As a result, after approximately 13 generations, the throughput reaches the maximum value for pure-ALOHA, ruv→ruR which is 0.16 (i.e., for 𝐺 = 0.3). As G is changed from 0.3 to 0.8 after 50 generations, 𝑃rÀ 120 drops in order to decrease the effective load 𝐺 (i.e., towards 0.5) by the way of limiting the number of transitions to state W following a transmission. 𝑃]À reduces as well to enforce the same effect. Observe how by adapting those probabilities, the system is able to reach back to the pure-ALOHA (a) G-0.3 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 Probabilities Probabilities throughput after it had dropped immediately due to the load transition from 0.3 to 0.8. 
100 200 Generation number 0 300 (c) G-0.8 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 Probabilities Probabilities 0 0 100 200 Generation number (b) G-0.5 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 300 100 200 300 Generation number (d) G-1.0 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 100 200 Generation number 300 Figure 6-9: Evolution of all six independent transition probabilities for full EDBF The throughout stabilizes back to the desired value roughly after 20 generations since the load transition. When G changes from 0.8 to 1.0, a similar drop in probabilities can be observed. When G reduces to 0.5 from 1.0, the probabilities increase again to facilitate more packet transmissions and finally, a fall is observed as load increases in the final transition from 0.5 to 0.8. 121 As marked in the figure, the throughput changes and settles near the pure-ALOHA throughput of 0.18 after each of the transitions. Overall, along all the load change points, the system is able to adapt transition probabilities in order to reach and maintain the desired pure-ALOHA throughput. 1 Z:200;G:0.5 -> G:0.8;S:0.18 Z:50;G:0.3 -> G:0.8;S:0.16 Z:250;G:0.8;S:0.18 0.5 Z:150;G:1.0 -> G:0.5;S:0.17 0 1 Z:100;G:0.8 -> G:1.0;S:0.18 Start: Z:1;G:0.3;S:0.15 200 0.5 G: Network load S: Throughput (Erlang) Z: Generation number 100 0 0 Figure 6-10: Adaptation of probabilities with runtime changes of network load 6.4.3.2 Adaptation with Changing Network Size The probability 𝑃]À (in Figure 6-11) evolves commensurately to adapt the state machine towards the maximum throughput of 18%. As N transitions to 200, the overall load is increased. As an attempt to bring 𝐺 down, 𝑃À] and 𝑃]À further evolve for maintaining the maximum throughput. When N reduces to 100, there is a need for increasing 𝑃]À . It is needed to achieve a higher packet transmission rate which the less-crowded network can accommodate while maintaining the effective load 𝐺 at 0.5. Maximum 𝑃]À and minimum 𝑃À] values are observed after the next network size transitions to 25. Finally, when the number of nodes is increased back to 200, the probabilities evolve and converge to similar values as in the second stage in order to reduce the effective load to the optimal level. To summarize, like in the variable loading case, the system is able to adapt transition probabilities in order to reach the desired ALOHA throughput. 122 These adaptation runs were carried out for all combinations of partial EDBF (Section 6.4.1) as well as full EDBF (Section 6.4.2) scenarios, in which similar adaptabilities were observed. Z:50;N:125-> N:200;S:0.18 1 Z:150;N:100 -> N:25;S:0.18 Z:250;N:200;S:0.18 Z:100;N:200 -> N:100;S:0.18 0.5 Z:200;N:25 -> N:200;S:0.18 0 1 0.8 Start: Z:1;N:125; S:0.15 200 0.6 N: Network size S: Throughput (Erlang) Z: Generation number 100 0.4 0 Figure 6-11: Adaptation of probabilities with runtime changes of network size 6.5 Summary and Conclusion An evolutionary based learning framework was proposed in this chapter for MAC layer protocol synthesis. A generalized probabilistic state machine, a subset of which can mimic the behavior of pure-ALOHA MAC, was defined. Through simulations it was verified that a desired state machine with near-benchmark (pure-ALOHA) performance can evolve from that generalized state machine. This was investigated with static and adaptive implementations when both limited and full behavioral freedom were granted. 
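To make the genotype-to-phenotype mapping concrete, the following minimal sketch (in Java, with illustrative names; it is not the actual evolution fabric) shows one way a gene of the six independent probabilities in Table 6-1 could be interpreted as the per-node state machine of Figure 6-2:

import java.util.Random;

/** Sketch of a six-probability gene (Table 6-1) driving the generalized state machine
 *  of Figure 6-2; names and the step structure are illustrative assumptions. */
public final class AlohaGeneNode {
    enum State { WAIT, TRANSMIT, LOITER }

    // Gene: the six independent transition probabilities of Table 6-1.
    double pWW;       // P_WW^{P=0->P=1}: stay in W when a packet is generated (packet dropped)
    double pWL;       // P_WL: W -> L without any external event
    double pLW;       // P_LW: L -> W without any external event
    double pTW;       // P_TW^{T=1->T=0}: T -> W when the transmission completes (else T -> L)
    double pTWabort;  // P_TW^{T=1->T=1}: abandon an ongoing transmission and go to W
    double pTLabort;  // P_TL^{T=1->T=1}: abandon an ongoing transmission and go to L

    State state = State.WAIT;
    private final Random rng = new Random();

    /** One decision step; the flags correspond to the P (packet) and T (transmission) flags. */
    void step(boolean packetGenerated, boolean transmissionDone) {
        switch (state) {
            case WAIT:
                if (packetGenerated && rng.nextDouble() > pWW) state = State.TRANSMIT;
                else if (rng.nextDouble() < pWL) state = State.LOITER;
                break;
            case TRANSMIT:
                if (transmissionDone) state = rng.nextDouble() < pTW ? State.WAIT : State.LOITER;
                else if (rng.nextDouble() < pTWabort) state = State.WAIT;    // drop packet
                else if (rng.nextDouble() < pTLabort) state = State.LOITER;  // drop packet
                break;
            case LOITER:
                if (rng.nextDouble() < pLW) state = State.WAIT;
                break;
        }
    }
}

Setting $P_{TW}^{T=1 \to T=0} = 1$ and the remaining five probabilities to 0 reduces this machine to the pure-ALOHA behavior of Figure 6-1:a, consistent with the values listed in Table 6-1.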
In the next chapter, we add sensory capabilities such as time slotting and carrier sensing, and investigate if more complex protocols such as slotted-ALOHA and Carrier Sense Multiple Access (CSMA) can be evolved progressively using the same framework. We then propose a different learning approach, namely, reinforcement based learning to the problem of communication protocol design, specifically, the multicast content dissemination in Social Wireless Networks. 123 7 MAC Protocol Evolution with Time Sensing and Channel Sensing Modalities 7.1 Introduction Building on the research in previous chapter on ALOHA protocol evolution, we extend the core concepts for the next natural steps towards Slotted ALOHA and CSMA logic. Pure-ALOHA logic can be formulated with a two-state machine including a wait-for-packet and a transmit state. If global time sensing ability with fixed packet sized slot granularity is added to Pure-ALOHA’s logic, the behavior of Slotted-ALOHA (i.e., S-ALOHA) emerges which doubles the protocol’s attainable throughput (i.e., from 0.18 to 0.36). To model S-ALOHA’s behavior, ALOHA’s state machine can be equipped with an extra state indicating a node buffering a packet until the beginning of the next designated slot. Adding the channel sensing modality to ALOHA’s state machine results in a three-state CSMA machine in which a node keeps listening to the channel in “wait for free medium” state, and transmits the packet as soon as it finds the channel to be idle. Increasing throughput under various loading conditions demonstrates how adding different node capabilities can transform the Pure-ALOHA logic to more efficient and complex protocols such as S-ALOHA or CSMA. In previous chapter (Chapter 6), it was verified that protocol with nearbenchmark pure-ALOHA performance can emerge from that generalized state machine as a result of the evolution process. In this chapter, we explore if and how more complex protocols such as S-ALOHA and CSMA can emerge when time slotting and channel sensing capabilities are added to the network nodes. The contributions of this chapter are as follows. First, building on the preliminary concepts developed in previous chapter, the proposed evolutionary protocol synthesis concepts are applied for networks of nodes with global time or carrier sensing capabilities. Second, the mechanism’s feasibility is evaluated with varying degrees of evolutionary behavioral freedom. Finally, protocol 124 adaptations are experimented with when node capabilities, network size, and loading conditions are changed runtime, leading to dynamic protocol synthesis. The objective of this specific chapter is to study the feasibility of Genetic Algorithm (GA) based design approach for simple protocols with known performance bounds. 7.2 Evolving Protocol State Machines Two state machines, one with global time access (i.e., slotting) and the other with carrier sensing modality are defined. ) P=1; State = WWP WWP T=1; Set Timer; State = T (1Drop packet W T=0; State = W } T State=W T=0; State = L Drop packet State=L L (1- (1- Figure 7-1: An evolvable generalized FSM with time sensing capability 7.2.1 Machine with Time Sensing Modality A generalized state machine that can produce arbitrary protocol behavior is shown in Figure 7-1. 
The machine is modeled after a possible process that a baby goes through to learn to talk at designated intervals rather than talking in between intervals: a) identifying designated intervals and talking in the beginning of such intervals in a competitive environment with multiple speakers, 125 b) ending a sentence in a graceful manner rather than interrupting herself or getting distracted, and c) intentionally waiting or unintentionally distracting herself for adjusting the talking rate in order to minimize interruptions to others. The state machine in Figure 7-1 is intentionally kept general with high degrees of behavioral freedom. The hypothesis here is that when the state machine is evolved under the right fitness pressure (e.g., network throughput), optimal communication protocols will emerge for given network and loading conditions. Since the nodes are allowed to sense global timing (i.e., time slotting) and the S-ALOHA logic is a subset of this machine, the evolved solutions are expected to have performance close to S-ALOHA. The state machine consists of four states, namely, Wait (W), Wait-With-Packet (WWP), Transmit (T), and Loitering (L). In Wait, a node silently waits to receive the next packet to be transmitted. A transmission takes place in the state Transmit. A node can initiate a packet transmission by transitioning from W to T, or by going through W-WWP-T which models deferring transmission to the beginning of the coming slot. In the latter, the node transitions to the WWP state while buffering the packet and waiting for the appropriate time slot for transmission. Loitering models the distraction phase as outlined in the baby’s learning scenario. This state is different from intentional Wait in that transition from any other state to Loitering is arbitrary and without any explicit strategy. As a result, it does not induce any explicit fitness reward. Lingering in any of the states T or L can delay possible new transmissions. The labels on the arrows indicate the transition probabilities. The notation PXY indicates the transition probability from State-X to State-Y. The machine uses two flags P (Packet flag) and T (Transmission flag). When set, they indicate generation of a packet and an ongoing transmission respectively. Texts within rectangles describe the action(s) associated with each state transition. While a specific set of transition probabilities in 126 this machine implements the known S-ALOHA protocol, it is general in that it contains many possible execution sequences implementing other working and non-working access protocols. The state machine in Figure 7-1 contains 8 independent transition probabilities listed in Table 7-1 (i.e., parts a and b). 𝑃À] is the probability of moving from W to L; 𝑃]À models the probability of the return transition. Transitioning between the states W and L based on those two probabilities control random loitering which proved to be useful towards self-adjusting traffic load for network-wide throughput maximization. According to the state machine, after a transmission is over, a node can ruv→ruR decide between returning to state W with probability 𝑃rÀ or moving to state L with ruv→ruR probability 1-𝑃r] . Note that a redundant transition to the state L delays possible future packet transmissions. Furthermore, a node is allowed to abandon its transmission and move to state ruv→ruv ruv→ruv W or state L with probabilities 𝑃rÀ and 𝑃r] respectively. To accommodate global time sensing, 3 additional probabilities are included as follows. 
$P_{SA}$, namely the probability of agreeing with the slot status, refers to the probability of deferring a packet transmission to the beginning of the next slot. Note that, even after a deferral, packet transmission is not guaranteed; it depends on the probability $1-P_{WW}$, which is the probability of a transition to state T when a packet is available. This transition takes place either through W or through WWP, depending on the $P_{SA}$ value. When $P_{SA}=0$, the state machine becomes insensitive to global time sensing (i.e., time slotting); in that case, the best evolvable solutions for the generalized state machine are expected to exhibit the Pure-ALOHA behavior [63]. $P_{WWPL}$ is the probability of an arbitrary transition from WWP to L, through which the node loses the chance to transmit the packet in the scheduled time slot and drops the packet upon entering the L state. Note that at the end of each row in Table 7-1, the probability value for which the state machine reduces to that of the most likely solution, S-ALOHA, is listed.

7.2.2 Machine with Channel Sensing Modality

The next functional augmentation is to develop a state machine that incorporates channel sensing abilities. When mapped to the model of a baby's learning process to talk (Section 7.2.1), this machine adds the option of not starting to speak while someone else is speaking. The new hypothesis here is that when such a state machine (i.e., the one in Figure 7-2) is allowed to evolve under the right fitness pressure, the evolved solutions are expected to have performance close to that of CSMA. Two new transition probabilities, namely $P_{CS}$ and $P_{ACS}$, are added. While $P_{CS}$ refers to the probability of sensing the channel, $P_{ACS}$ indicates the probability of a node respecting the status of the channel when it is actually sensed. In other words, CSMA logic is produced when both $P_{CS}$ and $P_{ACS}$ are set to 1 in a reduced state machine mimicking CSMA logic.

(a) Common transition probabilities:
$P_{WL}$: from W to L without any external event, 0
$P_{LW}$: from L to W without any external event, 0
$P_{TW}^{T=1\rightarrow T=0}$: from T to W after a transmission is over, 1
$P_{TW}^{T=1\rightarrow T=1}$: from T to W during a transmission, 0
$P_{TL}^{T=1\rightarrow T=1}$: from T to L during a transmission, 0

(b) Probabilities to accommodate global time sensing:
$P_{WW}$: from W to W/WWP, when a packet is available, 0
$P_{SA}$: agreement with slot status; postponing the packet transmission procedure to the beginning of the next slot, 1
$P_{WWPL}$: from WWP to L, without any external event, 0

(c) Probabilities to accommodate channel sensing:
$P_{WW}^{P=0\rightarrow P=1}$: from W to W, when a packet is generated, 0
$P_{CS}$: sensing the channel before deciding on transmission, 1
$P_{ACS}$: agreement with channel status; postpone transmission when the channel is occupied, 1

Table 7-1: Independent transition probabilities for both state machines

If such CSMA logic is to emerge through evolution, it is expected that a node would first learn to sense the channel by way of evolving $P_{CS}$, and then learn to respect the channel status by evolving $P_{ACS}$. The quantity $1-P_{WW}^{P=0\rightarrow P=1}$ refers to the probability of a transition from W to T when a packet is generated. The rest of the transition probabilities have functions similar to those described for the generalized time-sensing state machine in Figure 7-1. These three channel-sensing probabilities are listed in part c of Table 7-1.
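For concreteness, each genotype can be pictured simply as a dictionary holding one value per independent transition probability of Table 7-1, together with the fixed settings at which the generalized machines reduce to the S-ALOHA and CSMA benchmarks. The sketch below is only an illustration of this representation (the identifier names are ours); the actual evolution framework, described in the next section, is implemented with a Java evolution fabric and a C evaluation module.

```python
# Independent transition probabilities of Table 7-1, part (a): common to both machines.
COMMON = ["P_WL", "P_LW", "P_TW_after_tx", "P_TW_abort", "P_TL_abort"]

# Part (b): global time sensing extension.  Part (c): channel sensing extension.
TIME_SENSING = ["P_WW", "P_SA", "P_WWPL"]
CHANNEL_SENSING = ["P_WW_on_packet", "P_CS", "P_ACS"]

def genotype(modality):
    # A gene holds one value in [0, 1] per independent probability; a machine uses
    # either the time-sensing or the channel-sensing extension, never both together.
    extra = TIME_SENSING if modality == "time" else CHANNEL_SENSING
    return {name: 0.0 for name in COMMON + extra}

# Settings at which the generalized machines reduce to the known benchmarks
# (the value listed at the end of each row of Table 7-1).
S_ALOHA = {"P_WL": 0, "P_LW": 0, "P_TW_after_tx": 1, "P_TW_abort": 0,
           "P_TL_abort": 0, "P_WW": 0, "P_SA": 1, "P_WWPL": 0}
CSMA = {"P_WL": 0, "P_LW": 0, "P_TW_after_tx": 1, "P_TW_abort": 0,
        "P_TL_abort": 0, "P_WW_on_packet": 0, "P_CS": 1, "P_ACS": 1}
```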
Figure 7-2: An evolvable generalized FSM with channel sensing capability

All eight independent probabilities for the state machine in Figure 7-2 are included in parts a and c of Table 7-1. At the end of each row, the probability value for which the state machine reduces to that of the most likely solution, CSMA, is listed. Table 7-2 describes how each transition from W to T can possibly happen in the state machine shown in Figure 7-2.

$\left(1-P_{WW}^{P=0\rightarrow P=1}\right)(P_{CS})(P_{CF})(P_{ACS})$: W to T; the packet is transmitted while the channel is sensed as free
$\left(1-P_{WW}^{P=0\rightarrow P=1}\right)(P_{CS})(1-P_{CF})(1-P_{ACS})$: W to T; the packet is transmitted although the channel is sensed as occupied
$\left(1-P_{WW}^{P=0\rightarrow P=1}\right)(1-P_{CS})$: W to T; the packet is transmitted without checking the channel status

Table 7-2: Possible transitions from W to T in the machine in Figure 7-2

7.3 Evolution Architecture

7.3.1 Gene Architecture

As shown in Figure 7-3, the system consists of a Java based evolution fabric and a C based protocol evaluation module. A gene embodies the collection of all the independent transition probabilities described in Table 7-1. The upper part of the gene contains the common probabilities from Table 7-1, part a, and the lower part contains either the probabilities for incorporating global time sensing (Table 7-1, part b) or carrier sensing (Table 7-1, part c). Note that a gene under this formulation is used for protocol state machine evolution either with time sensing or with carrier sensing, but not with both together. The evolution fabric is responsible for the evolutionary tasks. A mapping function maps each gene to a protocol behavior phenotype code, which is evaluated within the C based evaluation module for given target networks. The protocol performance (i.e., throughput) is then fed back into the evolution fabric in the form of gene-fitness values.

7.3.2 Fitness Formulation

Network throughput is chosen as the fitness indicator for the evolution experiments, since the evolved solutions for the state machines in Figure 7-1 and Figure 7-2 are expected to be close to S-ALOHA and CSMA, and the maximum possible throughputs of these protocols are well known.

Figure 7-3: Genotype formation and the system components

7.3.3 Reproduction and Evolution

The reproduction and evolution processes follow the ones explained in Chapter 6. Evolution starts with an initial population (of size p) with randomly assigned probability values in the gene structure shown in Figure 7-3. First, the tournament selection process is applied on the 2m% most-fit individuals within a generation. Through experiments and our prior experience (Chapter 6), it was observed that choosing m around 20% maintains sufficient genetic diversity across generations. The mutation operator of the Breeder Genetic Algorithm proposed in [65] is used. Finally, the m% least-fit members of the population are replaced; that is, the children (i.e., the mutated members from the previous step) substitute the m% least-fit members. Each new member models a specific protocol behavior, which gets evaluated within the C based evaluation module in Figure 7-3.
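As an illustration only, one generation of this loop can be sketched as follows. This is not the actual Java/C implementation: the evaluate_throughput callback stands in for the C based evaluation module, and the bounded-perturbation mutation below is only a crude stand-in for the Breeder Genetic Algorithm operator of [65]; the population size p and replacement fraction m follow the text.

```python
import random

def mutate(gene, step=0.1):
    # Simplified stand-in for the Breeder GA mutation [65]: perturb each
    # transition probability by a small bounded amount and clip to [0, 1].
    return {k: min(1.0, max(0.0, v + random.uniform(-step, step)))
            for k, v in gene.items()}

def one_generation(population, evaluate_throughput, m=0.2, tournament_size=2):
    # evaluate_throughput(gene) -> fitness; stands in for the C evaluation module.
    scored = sorted(population, key=evaluate_throughput, reverse=True)
    p = len(scored)
    n_replace = int(m * p)
    parents_pool = scored[: int(2 * m * p)]      # tournament runs over the top 2m%
    children = []
    for _ in range(n_replace):
        contestants = random.sample(parents_pool, tournament_size)
        winner = max(contestants, key=evaluate_throughput)
        children.append(mutate(winner))
    # The m% least-fit members are replaced by the mutated children.
    return scored[: p - n_replace] + children
```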
The evolution process, including sorting fitness values, tournament selection, mutation based reproduction, and population replacement, is then repeated on the members of the new generation, which consists of the m% of p children and the remaining p − m% of p members carried over from the previous generation.

7.4 Results

Similar to Section 6.4, the Evolutionary Degree of Behavioral Freedom (EDBF) is defined as the number of transition probabilities that are allowed to evolve in a given experiment. In scenarios with smaller EDBFs, fewer evolvable probabilities limit the phenotype search space. The population size and network size are both set to 100 unless stated otherwise.

7.4.1 Evolution with Global Time Sensing Enabled

7.4.1.1 Limited Evolutionary Degree of Behavioral Freedom (i.e., 1, 2)

Evolution of the state machine from Figure 7-1 in these experiments starts with $P_{WW}$ as the only evolvable probability (i.e., EDBF is set to 1). After 50 generations, the EDBF is raised to 2 by including $P_{SA}$ (i.e., enabling slotted time sensing) in the evolvable set. Figure 7-4 shows the throughput evolution for different network-wide loading conditions. For the first 50 generations, when time slot sensing is not enabled, the converged throughput indicates that the generalized state machine from Figure 7-1 evolves to yield the same throughput as the Pure-ALOHA protocol. Pure-ALOHA's throughput is expressed as $S_{ALOHA} = G\,e^{-2G}$, which is maximized at 0.18 when $G = 0.5$. As can be seen in Figure 7-4, for $G = 0.5$ the fitness stabilizes at around 0.18, while for $G = 0.1$ it stabilizes at 0.08, which is correct for Pure-ALOHA. An interesting observation is that for the $G > 0.5$ cases, the throughput is still maintained at the maximum value of 0.18. It turns out that the system evolves to adjust the $P_{WW}$ value such that the nodes drop packets, and the resulting effective network-wide load is adjusted down to 0.5 for maintaining the maximum throughput.

Figure 7-4: Fitness dynamics before and after time sensing is added

After 50 generations, when time slotting is enabled (i.e., by making the probability $P_{SA}$ evolvable), the fitness of the system stabilizes at larger values for all G values. As expected, the fitness in this case indicates the emergence of S-ALOHA, with throughput $S_{S\text{-}ALOHA} = G\,e^{-G}$, which has its maximum value of 0.36 at $G = 1$. The graph in Figure 7-4 shows how S-ALOHA performance is obtained for all $G < 1$, and how it is maintained around the maximum value of 0.36 for $G \geq 1$ by way of adjusting the effective load. It is observed that for all $G \geq 0.5$, the effective load converges to 0.5 before time sensing is enabled at generation 50. This is because the state machine evolves to Pure-ALOHA, for which the fitness/throughput is maximized when the applied load is 0.5. For $G < 0.5$, however, the effective load remains the same as G. After time sensing is enabled, the state machine evolves to S-ALOHA, for which the maximum fitness occurs at an effective load of 1. This explains why for all $G \geq 1$ the effective load converges around 1, while for $G < 1$ it remains the same as G. Figure 7-5 reports the genotype dynamics in terms of the evolvable probabilities $P_{WW}$ and $P_{SA}$. The evolved state machine in Figure 7-1 indicates how any non-zero $P_{WW}$ causes packet drops and a resulting load reduction.
For the $G = 0.1$ and $G = 0.5$ cases, before time sensing is enabled (i.e., before $P_{SA}$ is made evolvable), $P_{WW}$ evolves to 0.

Figure 7-5: Genotype dynamics before and after time sensing is added; panels (a) through (d) correspond to G = 0.1, 0.5, 1.0 and 2.0

This is because the load is already less than or equal to the optimal load 0.5, so no load reduction through packet dropping is needed. After time sensing is added, $P_{SA}$ evolves to 1, which represents transmission compliance with time slotting upon the emergence of the S-ALOHA logic. The value of $P_{WW}$ remains zero, since load values of 0.1 and 0.5 are still less than the optimal load of 1 for S-ALOHA. For $G = 1$, however, $P_{WW}$ initially evolves toward 0.5, which reduces the effective load toward the optimal Pure-ALOHA load of 0.5. After time slotting is enabled, much like in the previous two loading scenarios, $P_{SA}$ evolves to 1 to ensure full transmission compliance with time slotting. After time sensing is enabled, $P_{WW}$ evolves toward zero, since the load $G = 1$ now equals the S-ALOHA optimal load of 1, eliminating the need for any load reduction through packet drops. The dynamics are different for the higher load of $G = 2$. In this case $P_{WW}$ first evolves to a larger value of 0.75 due to the need for a greater load reduction. After time sensing is enabled, the need for load reduction decreases because of the higher optimal load of S-ALOHA; as a result, $P_{WW}$ decreases and finally converges toward a value of around 0.27. The $P_{SA}$ dynamics remain similar to the lower loading scenarios.

7.4.1.2 Full Evolutionary Degree of Behavioral Freedom (EDBF: 8)

Results in this section correspond to the evolution of the state machine with time sensing enabled (i.e., the one in Figure 7-1) when all its eight independent probabilities are allowed to evolve.

Figure 7-6: Evolution of all probabilities in different loading conditions; panels (a) and (b) correspond to G = 1.0 and G = 3.0

Experiments show that even when the evolution is completely unguided, meaning with the maximum degree of freedom, the system is still able to evolve toward the emergence of the Slotted-ALOHA logic with its maximum throughput of around 36%. This further validates the ability of the framework to synthesize optimal MAC protocols when time sensing is enabled. Figure 7-6 shows the evolution of all eight probabilities for $G = 1.0$ (Figure 7-6:a) and $G = 3.0$ (Figure 7-6:b). Observe that under the higher load of 3.0, the probabilities are set to regulate the effective load toward the optimal load of 1.0. For example, $P_{TW}^{T=1\rightarrow T=0}$ and $P_{WWPL}$ decrease, interrupting the packet transmission process by not returning to the W state immediately after a successful transmission in the first case, and by discarding the packet before the scheduled transmission slot is reached in the second case. $P_{WW}$ and $P_{WL}$ grow larger when $G = 3.0$ toward the same effect, that is, to regulate the effective load.

7.4.1.3 Protocol Adaptation

One of the primary motivations for evolution based protocol synthesis is to incorporate the ability for self-adaptation when node capabilities and the network environment change. After establishing the baseline ability to synthesize protocols using evolution, we investigate their adaptability in this section.
Adaptation is investigated with: a) varying traffic load, b) turning global time sensing on and off, and c) various degrees of guidance in terms of EDBF. Figure 7-7 depicts the adapted throughput values for a 100-node network for which the MAC protocol is evolved over 300 generations. Global time sensing is enabled only for the second half of the experiment, i.e., the last 150 generations. Six distinct phases, each lasting 50 generations, are executed; the parameters that change across consecutive phases are underlined in the figure. The corresponding genotype dynamics, in terms of up to three independently evolving probabilities (i.e., EDBF = 3), is shown in Figure 7-8.

Starting with EDBF = 1 in the first 50 generations, $P_{WW}$ evolves to around 0.75 to adjust the load $G = 2.0$ down to the optimal load 0.5, resulting in a near-ALOHA throughput of 0.18. During generations 50-100, the EDBF changes to 3 as two additional probabilities ($P_{WL}$ and $P_{LW}$) are allowed to evolve. In addition, G is reduced to 0.5. Since there is no need for further reducing the effective load, $P_{WW}$ converges to near-zero values. The three evolving parameters maintain a balance such that the maximum throughput of 0.18 is achieved in this second phase. In the third phase, as G increases to 1.0, $P_{WL}$ increases to help reduce the effective load.

Figure 7-7: Throughput with runtime parameter changes

Figure 7-8: Adaptation of probabilities with runtime parameter changes

Starting from generation 150, global time sensing is enabled and EDBF = 1, allowing $P_{WW}$ to regulate the effective load by converging to values around 0.27. The corresponding throughput in Figure 7-7 converges to 0.36, which shows that the evolved solutions mimic the S-ALOHA benchmark's behavior. For the next 50 generations, G is reduced to 0.5 and EDBF = 3. The probabilities evolve such that the effective load is maintained near 0.5, at which the state machine yields a throughput of around 0.30, in agreement with the S-ALOHA throughput at that load. In the final 50 generations, with the change of G to 1.0, $P_{LW}$ further increases to help the network expedite packet transmissions. The resulting throughput is near 36%, which is in line with the known maximum throughput for a random access protocol with time sensing enabled (i.e., S-ALOHA). Overall, these adaptation experiments indicate the system's ability to adjust its genetic material (i.e., the probabilities) such that evolved near-benchmark solutions are maintained under varying loading, time sensing availability, and behavioral degrees of freedom. The above results demonstrate the feasibility of an evolution based approach for designing time-slotted MAC layer protocols under varying network and traffic conditions.

7.4.2 Evolution with Channel Sensing Enabled

Evolution of the state machine from Figure 7-2 is presented in this section. Note that time sensing is disabled for these experiments.
7.4.2.1 Limited Evolutionary Degree of Behavioral Freedom (i.e., 1, 2)

In the following set of experiments, we observe the evolution of throughput as well as the genotype dynamics in the absence and presence of carrier sensing. Using the general state machine in Figure 7-2, we first allow $P_{WW}^{P=0\rightarrow P=1}$ as the only probability to evolve (i.e., EDBF = 1) for the initial 50 generations; afterwards, $P_{CS}$ is also allowed to evolve, thus enabling carrier sensing. The propagation delay (normalized by the packet duration) a is set to 0.1. In the first 50 generations, throughput evolves to around 0.18 for $G \geq 0.5$, since the generalized machine reduces to Pure-ALOHA when channel sensing is disabled. For $G = 0.1$, the theoretical ALOHA benchmark throughput is around 0.08, which is indeed the throughput of the evolved solutions. After generation 50, it is observed that solutions with higher throughput are evolved, as the nodes take advantage of the auxiliary carrier sensing modality. It can be shown that the throughput evolves to a near-CSMA (1-persistent) throughput, which follows $S_{CSMA} = \dfrac{G\,e^{-aG}}{G(1+2a)+e^{-aG}}$ [66]. Figure 7-9 reports the corresponding evolutionary trends of $P_{WW}^{P=0\rightarrow P=1}$ and $P_{CS}$ over 100 generations for $G = 0.5$ and $G = 3.0$.

Figure 7-9: Genotype dynamics before/after channel sensing is added; panels (a) and (b) correspond to G = 0.5 and G = 3.0

In the first 50 generations, $P_{WW}^{P=0\rightarrow P=1}$ converges to near-zero values for $G = 0.5$, since the load is not greater than the optimal load 0.5 for Pure-ALOHA. In the case of an over-optimal load ($G = 3.0$), $P_{WW}^{P=0\rightarrow P=1}$ converges to around 0.83 to decrease the effective load by blocking a fraction of the possible transmissions. After 50 generations, when channel sensing is enabled, $P_{CS}$ evolves to 1 for both loading scenarios, indicating that nodes learn to sense the channel when making transmission decisions. Upon the emergence of CSMA behavior in the second half of the experiment, $P_{WW}^{P=0\rightarrow P=1}$ acts to boost transmissions when the load is less than the optimal CSMA load of 2.6 (Figure 7-9:a). For $G = 3.0$, $P_{WW}^{P=0\rightarrow P=1}$ acts in the opposite direction to lessen the excessive input load. Similar experiments performed for varying a and N demonstrated the emergence of 1-persistent CSMA throughput solutions as the system learns to use channel sensing.

7.4.2.2 Protocol Adaptation with Maximum EDBF (i.e., 8)

A thorough test of adaptability was performed by changing all system parameters, including a) the normalized propagation delay a, b) the network size N, c) the network load G, and d) the availability of channel sensing. All eight transition probabilities in Table 7-1 (i.e., parts a and c) were allowed to evolve during these experiments. Figure 7-10 depicts the adapted throughput values in eight distinct phases, each of which was run for 50 generations. Channel sensing is enabled only for the second half of the experiment, after 200 generations. Parameters that change across consecutive phases are underlined in the figure. Figure 7-11 shows the corresponding genotype dynamics. Observe in Figure 7-10 that, when there is no channel sensing, the throughput converges near the ALOHA benchmark of 0.18. In Figure 7-11, the parameter $P_{WL}$ increases because the input load is excessive compared to the optimal load of 0.5 for Pure-ALOHA. With the change of a to 0.1 in phase 2, the throughput is kept at 0.18 and the probabilities follow a similar trajectory as in phase 1; this is because Pure-ALOHA performance is insensitive to the propagation delay ($S_{ALOHA} = G\,e^{-2G}$).
In the first two phases, both $P_{WL}$ and $P_{TW}^{T=1\rightarrow T=0}$ increase such that the effective load is adjusted to 0.5. Next, the network size increases to 200, while the individual node load (g) remains at 0.02 (the same node load as in the second phase). Such an increase in network size results in a network-wide load of 4.0, which forces the probabilities responsible for abandoning a transmission ($P_{TW}^{T=1\rightarrow T=1}$ and $P_{TL}^{T=1\rightarrow T=1}$) to increase slightly. $P_{WW}^{P=0\rightarrow P=1}$ also increases sharply to impose less frequent packet transmissions. The final phase of the experiments with no channel sensing is performed by fixing N and a and reducing the network load to 1.4. Diverging in opposite directions, a lower $P_{WW}^{P=0\rightarrow P=1}$ and a higher $P_{LW}$ facilitate more packet transmissions.

Figure 7-10: Throughput with runtime parameter changes

Figure 7-11: Adaptation of probabilities with runtime changes

Despite the initial decrease of throughput in the third and fourth phases, the system adapts to the new environment such that the throughput climbs back to 0.18 in both cases (Figure 7-10). Starting from phase five, channel sensing is enabled, and a similar set of network parameter changes is repeated over the next four phases. $P_{CS}$ and $P_{ACS}$, which were not subject to evolution in the first four phases, move toward 1 in the fifth phase. They remain at near-1 values, indicating that nodes learn to apply the channel sensing modality and to comply with the channel status when sending packets. In this manner, a more complex protocol, i.e., CSMA, emerges. This is confirmed in Figure 7-10, where a near-benchmark CSMA throughput of 0.23 is achieved in the fifth phase. With the change of a to 0.1 in the sixth phase, $P_{WL}$ decreases sharply. The reason is that, under the corresponding network parameters, CSMA attains its maximum throughput of 0.51 at $G = 2.6$; the system therefore attempts to adjust the probabilities such that the effective load equals the input load of 2.0. Observe how the throughput rises to around 0.51 in the sixth phase of Figure 7-10. With the change of network size to 200 in the next phase, the network-wide load approaches 4.0, which is larger than the optimal load of 2.6. $P_{WL}$ increases while $P_{TW}^{T=1\rightarrow T=0}$ decreases, both aiming at reducing the effective load. As observed in Figure 7-10, the throughput falls initially, but it is restored to around the maximum achievable throughput (0.51) by adjusting the appropriate probabilities. In the last phase, when the network-wide load is reduced to 1.4, the system attempts to adjust the effective load to the input load G again. The increase of $P_{TW}^{T=1\rightarrow T=0}$ enforces legitimate returns to state W following successful transmissions. In addition, the reduction in $P_{WL}$ and in the transmission abandoning probabilities helps a) restore the effective load toward the input load G, and b) maintain the throughput around the maximum value of 0.47.
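The benchmark numbers quoted in this section and the previous one (0.18 for Pure-ALOHA, 0.36 for S-ALOHA, and about 0.51 for the CSMA optimum at a = 0.1) follow directly from the closed-form throughput expressions used in this chapter. The short sketch below simply evaluates those expressions; it does not involve the evolved state machines.

```python
import math

def s_aloha_pure(G):
    # Pure-ALOHA: S = G * e^(-2G), maximized near 0.18 at G = 0.5
    return G * math.exp(-2 * G)

def s_aloha_slotted(G):
    # Slotted-ALOHA: S = G * e^(-G), maximized near 0.36 at G = 1.0
    return G * math.exp(-G)

def s_csma(G, a):
    # CSMA throughput expression used in Section 7.4.2.1, with a the
    # propagation delay normalized by the packet duration.
    return (G * math.exp(-a * G)) / (G * (1 + 2 * a) + math.exp(-a * G))

if __name__ == "__main__":
    print(round(s_aloha_pure(0.5), 3))     # 0.184, the 0.18 ALOHA benchmark
    print(round(s_aloha_slotted(1.0), 3))  # 0.368 (1/e), the 0.36 S-ALOHA benchmark
    print(round(s_csma(2.6, 0.1), 3))      # 0.515, the ~0.51 CSMA optimum near G = 2.6
```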
Overall, this set of adaptability experiments shows: 1) how a more complex protocol (CSMA) can emerge when the system is equipped with channel sensing, and 2) how the evolutionary fabric is able to adapt to optimal solutions under dynamic network conditions. To summarize, the results in this section demonstrate the feasibility of using an evolution based approach for designing carrier-sensing MAC protocols.

7.5 Summary and Conclusion

Building on the preliminary concepts developed in the previous chapter (Chapter 6), in this chapter we explored the feasibility of evolving multi-access MAC protocols in the presence of global time and channel sensing capabilities. Starting from a generalized state machine, it was shown that under an appropriate genotype definition and fitness pressure it is indeed feasible to evolve the ALOHA, Slotted-ALOHA and CSMA protocols under a wide variety of network and loading conditions. Future work on this topic includes evolving protocol state machines with: 1) both carrier and time sensing simultaneously enabled, 2) full network connectivity replaced by arbitrary mesh connections, and 3) heterogeneous loading conditions in different parts of a network. Designs with delay and other non-throughput fitness factors can also be explored. The ultimate goal of this research is to build on the presented feasibility study and enable evolutionary protocol synthesis for more complex network protocols and traffic conditions, including variable size packets.

In the next chapter, we present a different form of learning based protocol design, namely reinforcement learning, to tackle the problem of communication protocol design. Reinforcement learning is concerned with decision making toward maximizing some notion of cumulative reward; in that sense, it fits closely the problem of commercial content dissemination in SWNs.

8 Q-learning based Gain-aware Routing

8.1 Introduction

This chapter explores a reinforcement learning approach for synthesizing routing protocols in DTNs. As mentioned in Section 1.1, SWNs, as a special type of Delay Tolerant Networks (DTNs), often reflect inter-personal social ties, interactions, and similarities in content and commodity consumption patterns. We leverage this information for D2D commercial content (e.g., coupon) dissemination with the goal of maximizing a predefined commercial gain for a Content Provider (CP). The history-based multicast dissemination [67] presented in Chapter 4 suffered a certain amount of performance loss because the disseminating node does not use any prediction about the consumption interests of its potential future contacts. Predictive Gain Utility Routing-Individual (PGUR-I) [68], introduced in Chapter 5, addressed that limitation by developing a predictive gain utility; however, that method does not leverage transitive deliveries, which, as opposed to direct delivery, refer to multi-hop content delivery through intermediate (transitive) node encounters. PGUR-I therefore does not scale to mobility scenarios in which content can only be delivered to the target destinations through transitive interactions. To address such routing challenges caused by the high dynamicity of SWNs, in this chapter we leverage the concept of Reinforcement Learning (RL) [47], which enables nodes to gather local information and make efficient routing decisions online.
We design a Q-learning [47] based Gain-aware Routing (QGR) protocol, which aims at maximizing the economic gain for a content provider. To address the stochastic nature of the presented routing problem, including the nodes' stochastic meeting patterns, we apply the standard Q-learning framework [47] in a stochastic form. In this framework, nodes learn the estimated gain associated with each probabilistic forwarding action and select the actions that lead to a higher expected reward, i.e., a higher economic gain of dissemination. This online learning component of QGR is expected to be robust in diverse and dynamic mobility environments.

The specific contributions of this chapter are as follows. First, we briefly review the gain utility [68] that is designed around the notions of consumption interest and coupon redemption probability (Section 4.2.1). Then, we map this DTN routing utility to a stochastic value function within the Q-learning framework in order to develop a Q-learning based Gain-aware Routing (QGR) protocol, which attempts to maximize a defined expected economic gain of dissemination. Finally, we perform extensive simulation based experiments for functional validation and performance evaluation of the presented QGR protocol in comparison with two existing protocols. The developed protocol is evaluated for a number of socially and commercially meaningful parameters, namely, network mobility, consumption interest patterns, content expiry time, forwarding (rebate) cost, and the number of content copies.

8.2 System Model

Using the same cost and revenue formulation as defined in Chapter 4 (Section 4.2.1), we compute the expected gain achieved through a coupon redemption by consumer-i as:

$G_i = V_i - c_f$   (8-1)

The total network-wide gain $G^T$ can be expressed as:

$G^T = \sum_{i=1}^{n_d} V_i - n \times c_f$   (8-2)

where coupons are delivered to $n_d$ consumers through $n$ forwards within a pre-specified coupon expiry time ($\tau$). The maximum number of deliveries of a coupon and the coupon expiry time are generally restricted by its generator. Also, $n_d \leq T$, where $T$ is the distribution upper bound of a coupon, set by the coupon's generator. In the methods presented in this chapter, only one-hop deliveries are allowed, thus $n$ always equals $n_d$.

Similar to the consumer interest and redemption probability model presented in Section 4.2.2, each consumer-i maintains an interest level $\rho_i$ ($0 \leq \rho_i \leq 1$) toward a specific coupon type. The redemption function $P_R(\rho_i)$ represents the probability of consumer-i redeeming a coupon. Note again that in the presented model, to partially preserve consumers' privacy, it is assumed that the data related to consumer interests and consumer-to-consumer interactions is always stored on the consumers' devices; it is never exported to a centralized server.

8.3 Problem Formulation

For a given combination of earned revenue $c_r$, discount $c_d$, rebate $c_f$, distribution upper bound $T$, expiry time $\tau$, redemption function $P_R(\rho_i)$, and a social wireless network of consumers with interest profiles $\rho_i$, the objective is to design a learning based multicast coupon dissemination mechanism that maximizes the total gain $G^T$ within the lowest possible dissemination delay.

8.4 Q-learning Model

Reinforcement Learning (RL) has been shown to address the routing challenges caused by the dynamicity of wireless networks [47]. Q-learning [47] is a popular RL approach in which an agent (i.e., a node) is modeled as a three-tuple entity consisting of {States (S), Actions (A), Rewards (R)} [47].
To address the stochastic nature of the routing problem, we propose a stochastic Q-learning based routing framework in which the set of states and actions (i.e., state transitions) is represented in the form of a probabilistic state machine. This state machine is formed when an agent meets a peer node at any time instant t.

a) State, Action and Reward

States (S): An agent observes the decision making factors in its local environment in the form of states [47]. An agent observes a state $s_t^i \in S$ at a time instant t. Here, the state space S consists of the set of all nodes in the network.

Figure 8-1: Abstraction of states and actions in the Q-learning based routing

Figure 8-1 shows an example of the abstraction of the mapped probabilistic state machine, including the states observed by the agents i and j at time t. The figure shows an interaction between a content source node-i and a node-j at time t. This interaction is shown with a solid arrow from node-i to node-j. The agent node-i is in the state of meeting node-j. Such a state is deterministic in the sense that the interaction actually happens at time t. If node-j's interest level ($\rho_j$) indicates that it is a target consumer, then the external state (j) observed by node-i is "a target consumer". We also model a set of probabilistic states observed by an agent, which refer to the node's prospective interactions in the future. In Figure 8-1, a number of probabilistic states observed by node-i are shown as nodes $M_i$, $i \in [1, l]$. Similarly, the probabilistic states observed by node-j are shown as j's prospective future encounters with nodes $N_i$, $i \in [1, k]$. These probabilistic interactions (transitions) are shown with dotted lines. Note that node-i indirectly observes the states $N_i$, $i \in [1, k]$ through node-j; in other words, node-j's encounters are observed by node-i through transitivity. The deterministic and probabilistic transitions from the perspective of node-i are stated on the right side of the figure. In the following sections, we explain how the probability associated with each probabilistic transition is computed. Node-i maintains the local information about its own observed states. To take an efficient routing action, node-i also collects (from node-j) the information regarding the probabilistic states which it can reach via two-hop transitivity through node-j.

Actions (A): An agent chooses an action $a_t^i$ (at time t) among its available set of actions (A). For the agent node-i carrying T copies in Figure 8-1, an example action is the transmission of a copy to a node $N_1$ through node-j. Another possible action for node-i is to transmit a copy for node-j's own consumption. Alternatively, node-i can decide to keep a copy to deliver to a prospective peer $M_1$.

Reward (R): Each action $a_t^i$ is associated with a reward $r_{t+1}^i$. The reward can represent any performance metric. To design a gain-aware routing framework, we define the reward associated with the action of forwarding a copy to a peer node-j as the expected gain out of that delivery ($G_j$). A reward may refer to an immediate reward, such as the gain yielded after delivering a consumption copy to an immediate contact node-j (at time t+1). It can also refer to the discounted reward [47], which is the expected accumulated reward to be received at a later time $t' > t$. Here, an example of such a discounted reward is the expected gain obtained through the transitive deliveries that happen through node-j in the future.
The goal of an agent is to maximize its total reward, which is the gain through an immediate delivery plus the gain obtained through transitive deliveries. Assuming that all nodes contribute to this goal in an unselfish manner, maximizing the individual gain results in maximizing the total (network-wide) gain.

b) State-Action Functions (Q-values)

A state-action function $Q_t^i(s_t^i, a_t^i)$ estimates the long term reward that can be obtained by the agent node-i if it performs an action $a_t^i \in A$ while in state $s_t^i \in S$. An agent maintains a table containing Q-values for all possible state-action pairs. The size of the table depends on the number of probabilistic states a node observes, that is, the number of unique peers the node has met so far and thus may meet again in the future. From this point onwards, we express the iterative updates of the Q-values without explicitly indicating the iteration time step t; this notation has already been used in Figure 8-1. For example, $Q_i^{i\rightarrow M_1}$ refers to the Q-value for the possible action of node-i transmitting a copy to the prospective encounter $M_1$. Every time two nodes ($M_1$ and i) meet, they update their Q-values.

8.5 Q-learning based Gain-aware Routing (QGR)

The Q-learning based Gain-aware Routing (QGR) protocol relies on the knowledge of: 1) the nodes' interest profiles (i.e., $\rho_i$), and 2) their underlying social interaction patterns. A coupon-carrying node uses this information to compute and evaluate Q-values, which reflect how a delivery to a peer contributes to the overall gain within the coupon's expiry time.

State Variables: Any node-i maintains the following variables with respect to another node-j.

$\lambda_{i,j}$: i's rate of meeting node-j over a pre-defined time window (e.g., a week). This parameter is updated every time the two nodes meet.

$\rho_j$: node-j's interest level toward the merchandise corresponding to the coupon. Node-i computes node-j's redemption probability $P_R(\rho_j)$ based on the known redemption function corresponding to the coupon.

Q-values: The Q-values are initialized at 0. When a node-i meets a node-j, the Q-value for the possible action of transmitting a copy to j is computed using the following equation. This value is updated even if node-i does not possess any copies of the coupon at the time of the interaction.

$Q_i^{i\rightarrow j} = (1-\alpha)\,Q_i^{i\rightarrow j} + \alpha \max\{G_j,\ \gamma Q_j^{max}\}$   (8-3)

$\alpha$ ($0 \leq \alpha \leq 1$) is the learning rate. A very high $\alpha$ value may cause random fluctuations in the Q-values; with $\alpha$ close to 0, learning may not happen and the agent relies only on its old Q-value. The right-hand side of the equation takes the maximum of the two terms $G_j$ and $\gamma Q_j^{max}$. $G_j$ is the immediate reward corresponding to the action of forwarding a consumption copy to node-j; it equals the gain of a delivery to node-j and is computed based on Eqn. 8-1. Note that $G_j$ is an indication of how good a consumer node-j is.

$Q_j^{max}$ is the discounted (i.e., non-immediate) reward and refers to the maximum attainable expected reward from a delivery through node-j. This quantity reflects how suitable node-j is as a forwarder; in other words, it estimates the expected gain out of a transitive delivery through node-j. $Q_j^{max}$ is computed based on the following equation:

$Q_j^{max} = \max_{\forall k \in K}\left\{ p_k^j(\tau - t_c) \times Q_j^{j\rightarrow k} \right\}$   (8-4)

where K is the set of nodes that node-j has seen so far and has the possibility to meet again. $p_k^j(\tau - t_c)$ is defined as the probability of j meeting k at least once during the remaining time to coupon expiry ($\tau - t_c$), and $t_c$ denotes the current time.
Assuming that the inter-contact duration between node pairs follows an exponential distribution, $p_k^j(\tau - t_c)$ is computed as follows:

$p_k^j(\tau - t_c) = 1 - e^{-\lambda_{j,k}(\tau - t_c)}$   (8-5)

The closer the current time gets to the coupon expiry time $\tau$, the lower the chance of meeting node-k; and the higher the contact frequency $\lambda_{j,k}$ between node-j and node-k, the more probable it is for j to meet k during $\tau - t_c$. The quantity $p_k^j(\tau - t_c)$ indicates the probability of obtaining the reward $Q_j^{j\rightarrow k}$, which is the expected reward associated with the action of content delivery from node-j to its prospective future peer node-k. Computing the maximum over the rewards expected from transitive deliveries to all nodes $k \in K$ reflects the highest achievable expected gain out of a transitive delivery through node-j. Finally, the parameter $\gamma$ ($0 \leq \gamma \leq 1$) in Eqn. 8-3 is the discount factor, which determines agent node-i's level of preference for relying on the long-term expected reward $Q_j^{max}$. If $\gamma = 0$, the immediate reward, which is the gain out of a delivery to node-j, is the only factor affecting $Q_i^{i\rightarrow j}$; in that case, the expected gain out of transitive deliveries through node-j is ignored.

Using the expression $\max\{G_j, \gamma Q_j^{max}\}$ in Eqn. 8-3, node-i evaluates the quality of node-j as a consumer or as a prospective forwarder, and $Q_i^{i\rightarrow j}$, the total expected reward out of content delivery to node-j, is then updated. Note that a proof of convergence for the Q-values is not provided here and remains a major item of future work. According to [69], for a stochastic framework such as the presented one, the learning algorithm converges under certain conditions on the learning rate.

Forwarding Logic: When a node-i carrying T copies meets node-j, if node-j has not yet consumed a copy of the content, i computes the gain $G_j$ that node-j can achieve through redeeming the coupon (Eqn. 8-1). If node-j has already consumed a copy, then $G_j = 0$, since j is no longer a potential consumer.

Node-i then receives the products $p_k^j(\tau - t_c)\,Q_j^{j\rightarrow k}$, for all $k \in K$, stored at node-j, where K is the set of peers that j has seen so far and thus may encounter again in the future. The Q-value $Q_j^{j\rightarrow k}$ indicates the reward (i.e., gain) from a transitive delivery to node-k via node-j; multiplying it by the pairwise meeting probability $p_k^j(\tau - t_c)$ yields the expected reward (i.e., gain) of a delivery through j. Node-i also maintains $Q_i^{i\rightarrow l}$ values, where $l \in L$ and L is the set of peers that node-i has encountered so far. At this point, node-i sorts the following values in descending order into a list of expected Q-values, namely SQ: 1) $G_j$; 2) $p_l^i(\tau - t_c)\,Q_i^{i\rightarrow l} - c_f$, $\forall l \in L$; and 3) $p_k^j(\tau - t_c)\,Q_j^{j\rightarrow k} - 2 c_f$, $\forall k \in K$. The quantity $p_l^i(\tau - t_c)$, the probability of node-i meeting its peer node-l, when multiplied by $Q_i^{i\rightarrow l}$, indicates the expected reward of a delivery to node-l by i. Such a delivery may be a direct delivery of a copy to node-l for its own consumption, or it may be a copy handed over for a transitive delivery through node-l; which one it is depends on which of the two elements, $G_l$ or $\gamma Q_l^{max}$, was larger when $Q_i^{i\rightarrow l}$ was computed using Eqn. 8-3. In the sorted set SQ, one unit of forwarding cost $c_f$ is subtracted from each $Q_i^{i\rightarrow l}$ value to account for the rebate given to node-i upon a delivery to node-l. Two units of forwarding cost are subtracted from the $Q_j^{j\rightarrow k}$ values to compensate for the cost of delivery from node-i to node-j and from node-j to node-k.
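As an illustration only (this is not the dissertation's Algorithm 8-1, which is referenced below), the Q-value update of Eqns. 8-3 to 8-5 and the construction of the candidate list SQ can be sketched as follows. The data structures and function names are hypothetical, and the default α and γ correspond to the values used later in Section 8.7.1.

```python
import math

def meeting_prob(rate, time_left):
    # Eqn. 8-5: probability of at least one contact within the remaining
    # coupon lifetime, assuming exponential inter-contact times.
    return 1.0 - math.exp(-rate * time_left)

def update_q(q_old, gain_j, q_j_max, alpha=0.5, gamma=0.9):
    # Eqn. 8-3: blend the old estimate with the better of the immediate
    # gain G_j and the discounted transitive reward gamma * Q_j^max.
    return (1 - alpha) * q_old + alpha * max(gain_j, gamma * q_j_max)

def candidate_list(gain_j, own_entries, peer_entries, time_left, c_f):
    # own_entries:  {l: (Q_i^{i->l}, lambda_{i,l})} kept by node i
    # peer_entries: {k: (Q_j^{j->k}, lambda_{j,k})} reported by node j
    sq = [("deliver_to_j", gain_j)]
    for l, (q, rate) in own_entries.items():
        sq.append((("keep_for", l), meeting_prob(rate, time_left) * q - c_f))
    for k, (q, rate) in peer_entries.items():
        sq.append((("via_j_to", k), meeting_prob(rate, time_left) * q - 2 * c_f))
    # Sorted in descending order of expected gain; the top T entries decide
    # how many copies stay with node i and how many are handed over to j.
    return sorted(sq, key=lambda item: item[1], reverse=True)
```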
If there is a common peer that both node-j and node-i may encounter, that is, if for an entry $Q_j^{j\rightarrow k}$ there exists a $Q_i^{i\rightarrow l}$ (in SQ) such that $l = k$, then node-i evaluates the expected reward of each action. If delivery through node-j results in a higher expected reward than a delivery to node-k by i itself, the entry corresponding to $Q_i^{i\rightarrow l}$ is replaced with $p_k^j(\tau - t_c)\,Q_i^{i\rightarrow k} - 2 c_f$. Finally, node-i assigns copies to node-j based on the top T entries of the sorted list of expected Q-values. If $G_j$ is among the top T entries, a copy is transmitted to j for its own consumption. For each Q-value entry that belongs to a potential transitive delivery to a node-k via node-j ($Q_j^{j\rightarrow k}$), one of the T copies is assigned to node-j. For each Q-value entry associated with node-i's own prospective encounters ($Q_i^{i\rightarrow l}$), a copy remains with node-i. Algorithm 8-1 depicts the key components of the QGR algorithm.

8.6 Compared Protocols

Social-Aware Networking (SANE): SANE [43] is designed around the observation that individuals with similar interests tend to meet more often. This characteristic is preserved in the example SWNs (mobility scenarios) used for evaluation in this chapter. SANE models each individual's interest profile as an m-dimensional interest vector. We implement the interest-cast SANE-SW (Spray and Wait), in which a content carrier i forwards half of its T copies to a peer j if the cosine similarity metric [43] between the relevance of the message M, denoted R(M), and the interest profile of j ($\rho_j$) is higher than a given relaying threshold $\rho$. The cosine similarity between two m-dimensional vectors A and B is defined as $\theta(A,B) = A \cdot B / (\|A\|\,\|B\|)$. We have incorporated the following changes in order to adapt the baseline SANE-SW, which assumes m-dimensional (m > 1) interest profiles, to the present framework in which nodes have a one-dimensional interest profile (m = 1). We change the one-dimensional interest profile $\rho_i$ of each node-i to a two-dimensional interest profile vector $\langle \rho_i, z \rangle$, in which z is a fixed interest level toward another type of content. This enables SANE to work under the experiment settings of this chapter and does not affect the performance of any of the protocols. This way, the cosine similarity between two interest profiles reflects the similarity of interest levels ($\rho_i$), because the second element z is the same for all profiles. The relevance of the message, R(M), has the same form as the nodes' interest profiles and is set to $(\rho^{max}, z)$, where $\rho^{max}$ is the interest level with the highest redemption probability $P_R(\rho)$ according to the redemption function; R(M) thus describes the target's desirable interest level. To set the relaying threshold $\rho$, the possible range of cosine similarities for different pairs of nodes with various interest levels is first computed using z = 0.5. $\rho$ should be a value inside this range; we test different $\rho$ values and choose the one that yields the best results (highest gain) for SANE under the tested mobility scenarios.

PGUR-I: Predictive Gain Utility Routing-Individual (PGUR-I) [68], as described in Chapter 5, uses node-specific interest and interaction information to evaluate how a delivery to a peer contributes to the overall gain within the coupon's expiry duration. This protocol does not consider transitive delivery through multiple hops. However, using gain prediction, it provides a strategy for choosing a forwarding action with the goal of maximizing the defined gain.
When a content carrier node-i meets a node-j, it computes the expected gain of delivery (Eqn. 8-1) to the nodes it may visit during the remaining time to coupon expiry ($\tau - t_c$). If node-i expects to meet more nodes (than it has coupon copies) with an expected gain higher than that of the immediate contact node-j, it holds on to all its copies; otherwise, a copy of the coupon is delivered to the immediate contact node-j.

8.7 Results

8.7.1 Experimental Details

We use the DTN simulator ONE [55] for evaluating QGR, SANE and PGUR-I under a community based synthetic mobility model and a real mobility trace (i.e., from INFOCOM '06). The models differ in their community structure and interest level distributions. For all the experiments we use a redemption function in which the redemption probability is maximized for interest levels that are neither too low nor too high. This is motivated by the observation [14] that while coupon delivery improves the purchase probability for moderately loyal (interested) customers, it may often kill product loyalty for highly loyal customers (Section 1.3.1). To capture this effect, a Gaussian model for the redemption function (Fig. 4-3) is used with mean $\mu = 0.75$ and $P_R^{max} = 0.5$. The relevance of the message R(M) for SANE is set to <0.75, z = 0.5>, because 0.75 is the interest level of the target consumer with the maximum redemption probability. The Q-learning parameters, learning rate ($\alpha$) and discount factor ($\gamma$), are set at 0.5 and 0.9 respectively; after experimenting with a range of values, these settings were found to provide fast learning convergence.

8.7.2 Community-based Synthetic Mobility

The synthetic mobility follows map-based movements in a network of 10 waypoints [55]. Nodes move between waypoints with assigned transition probabilities. 90 nodes are divided into three communities as shown in Figure 8-2; the numbers in parentheses show each community's population, and the arrows show the flow of migrant nodes between communities. Members within a community hold a higher intra-community meeting frequency compared to the inter-community meeting frequencies.

Figure 8-2: Community-based synthetic mobility (three communities of 30 nodes each)

Members of each community also tend to have more similar interest levels toward a specific content than members of two different communities do. We assign random interest levels to the members of each community such that $P_R(\rho_i), i \in C_1 < P_R(\rho_i), i \in C_2 < P_R(\rho_i), i \in C_3$. Each experiment is repeated for around 30 source nodes and the average performance is reported.

8.7.2.1 Impact of Rebate Cost ($c_f$)

Figure 8-3 shows the evolution of gain over time for different forwarding (rebate) costs $c_f$. In all experiments, the content is generated on day 27, which is long after the network simulation has warmed up in terms of stabilizing the node-interaction profiles. Experiments with 15 initial coupon copies (i.e., T = 15) were carried out for an expiry time of $\tau$ = 8 days.

All the protocols show an increasing trend of gain over time, as more coupons are delivered. QGR shows the maximum achievable gain for all the scenarios in Figure 8-3. SANE has the second best gain, mostly because it leverages multi-hop transitive deliveries, unlike PGUR-I, which only performs direct delivery. PGUR-I has the lowest performance since the source nodes, especially those located in community $C_1$, cannot directly reach the target nodes with maximum redemption probabilities in community $C_3$.
QGR achieves the maximum gain since it manages to distribute copies among nodes according to their consumption and forwarding capabilities. SANE attempts to achieve a similar goal by finding consumers that are most similar to the target consumer (i.e., the node with the highest redemption probability). However, SANE uses a binary distribution of copies, which involves a certain amount of randomness that causes the gain to suffer. As the rebate cost increases from 0.002 (Figure 8-3:a) to 0.2 (Figure 8-3:d), the gain of all protocols decreases. The performance gap between SANE and QGR shrinks as the cost increases. The reason is that QGR's more node-specific dissemination strategy distributes the copies in breadth rather than forwarding bundles of copies to nodes in a binary spray-and-wait [43] manner. As the cost increases, QGR's more spread-out distribution of copies results in a larger total rebate cost and therefore a lower gain. QGR also starts and finishes content delivery with some delay compared to SANE and PGUR-I. That is because the protocol predicts meetings with nodes with a higher redemption probability; source nodes reserve copies for such high-gain deliveries, deferring dissemination. To better understand the working dynamics of the QGR protocol, the evolution of the Q-values $Q_i^{i\rightarrow j}$ for $\tau$ = 8 days (Figure 8-3:d) is shown in Figure 8-4. Two pairs of nodes are randomly chosen from each community (see Figure 8-2). For each node-j, the moving average over every 50 samples of the $Q_i^{i\rightarrow j}$'s seen by any arbitrary node-i is measured. The average Q-values are shown from day 2, after an initial warm-up period. Initially, the only non-zero element in Eqn. 8-3 is $G_j$; thus, the Q-values reflect $\alpha G_j$. As the Q-values build up, they show the quality of node-j as a consumer and as a forwarder, depending on the result of $\max\{G_j, \gamma Q_j^{max}\}$ in Eqn. 8-3. The highest Q-values belong to nodes from community $C_3$, because the nodes in $C_3$ maintain the highest redemption probabilities. These nodes can also be high-quality forwarders due to their intra-community encounters with similarly interested peers.

Figure 8-3: Evolution of gain for forwarding cost $c_f$: (a) 0.002, (b) 0.02, (c) 0.1 and (d) 0.2

While nodes from community $C_2$ hold medium redemption probabilities, those from $C_1$ maintain the lowest redemption probabilities (and gain) and hence have the lowest Q-values. The monotonically increasing Q-values for each node-j from community $C_3$ stabilize around $G_j$. This shows that these nodes are high-quality consumers and that $G_j$ is large enough that the result of $\max\{G_j, \gamma Q_j^{max}\}$ in Eqn. 8-3 is mostly $G_j$. The Q-values for nodes j from $C_1$ and $C_2$ initially increase to values larger than their own gain $G_j$, which shows that these nodes can reach peers that are better consumers than themselves. Since nodes from $C_2$ visit the target consumers of $C_3$ with high redemption probabilities, their maximum Q-values are higher than those of nodes from $C_1$. A fall in $Q_i^{i\rightarrow j}$ (i.e., for $j \in C_2$ and $C_1$) is then observed, for the following reason. Closer to the content expiry time, the probability of node-j meeting any node-k, $p_k^j(\tau - t_c)$, becomes very small. Thus, $Q_j^{max}$ (Eqn. 8-4), which reflects the capability of j for transitive delivery to the target consumers, decreases.
That leads to the domination of $G_j$ over $\gamma Q_j^{max}$ in the Q-value computation of Eqn. 8-3.

Figure 8-4: Evolution of Q-values

The Q-values eventually converge to $G_j$. The rising and falling trend of the Q-values for the members of $C_1$ and $C_2$ shows that these nodes are better qualified as forwarders than as consumers.

8.7.2.2 Impacts of Content Expiry Duration ($\tau$) and the Number of Content Copies (T)

Figure 8-5 shows the impacts of the content expiry duration ($\tau$) and the number of copies (T) on gain when $c_f$ = 0.02. The total gain of delivery for all protocols increases as more content copies become available; stretching the $\tau$ duration has a similar effect. For a small number of copies (e.g., T = 5) and a short expiry duration (e.g., $\tau$ = 0.5 day), the difference in gain between the protocols is relatively small. That is because the protocols have fewer choices to make and less opportunity to use their prediction/learning ability to make efficient routing decisions. For example, for a small number of copies (T), allowing fewer intermediate forwards and eliminating the rebate cost is the more economical way to maintain a higher gain, which is essentially how the non-transitive protocol PGUR-I works. The closer performance of QGR to PGUR-I for limited T and $\tau$ indicates QGR's ability to adjust its forwarding strategy to such application level parameters.

Figure 8-5: Impacts of content expiry duration (τ) and number of copies (T) on gain

Note that a separate analysis of the metric "number of forwards" involved in a content delivery is not performed, because the composite metric "gain" already contains this cost in the form of the rebate cost. This is the main cost, since the other cost, due to transferring pairwise interest and interaction profiles upon node meetings, is imposed similarly on all protocols.

8.7.3 Trace-based Mobility

The dissemination protocols were also tested on a human interaction trace collected [59] at the INFOCOM '06 conference. The trace contains data from devices that were carried around by users for all four conference days. All experiments in this section are run for 30 random source nodes out of the 97 nodes. Figure 8-6:a and Figure 8-6:b show the dynamics of gain and the time to reach the maximum gain, respectively. The forwarding (rebate) cost $c_f$ is fixed at 0.002. Each cluster in the figures corresponds to a scenario with a specific number of content copies T, and each of the 12 points in a cluster corresponds to a combination of protocol and expiry duration: 0.5, 1, 1.5 and 2.5 days. Figure 8-6:a shows that QGR achieves the maximum gain compared to the other protocols for most combinations of T and $\tau$. As expected, a larger T results in a higher gain. Also, in each cluster in Figure 8-6, the points toward the right side generally have higher gain, as they correspond to larger $\tau$. Similar to the synthetic mobility results (Figure 8-3), the gain difference between the protocols is smaller for a smaller number of copies such as T = 5. The protocol SANE has the second highest gain in most cases, because it utilizes transitive deliveries, with which it can better explore the environment than the direct-delivery based PGUR-I. As shown in Figure 8-6:b, the observed higher gain of QGR comes at the price of a longer time to reach the maximum gain. In most cases, PGUR-I reaches its maximum gain at a later point in time compared to SANE.
The reason lies in PGUR-I's direct delivery nature, in which the source node holds on to its copies in order to deliver them to the best consumers using predictions. SANE, on the other hand, reaches target consumers with a shorter delay by leveraging its transitive deliveries. As shown in all the clusters of Figure 8-6, it naturally takes longer for all protocols to deliver all copies when the number of copies is larger. Experiments with other combinations of T, $\tau$ and rebate cost ($c_f$) for this trace-based mobility have shown gain dynamics similar to those seen for the synthetic mobility.

Figure 8-6: Dynamics of a) gain and b) time to reach maximum gain

Overall, the results from both synthetic and trace-based mobility verify that the designed QGR protocol is able to achieve a higher gain by striking a better balance between the forwarding cost and the revenue for most combinations of commercial and network parameters.

8.8 Summary and Conclusion

In this chapter, we developed a Q-learning based Gain-aware DTN multicast Routing (QGR) protocol which is specifically designed for commercial content types such as coupons. In comparison with the history-based and prediction-based dissemination methods of Chapters 4 and 5, the online learning component in QGR is expected to be robust in diverse and dynamic mobility environments. The protocol's objective is to maximize the economic gain of dissemination, defined as a composite index that combines the revenue from delivery and the forwarding cost of dissemination. In this framework, a node learns the suitability of its encountered peers both as consumers and as intermediate forwarders by estimating Q-learning's state-action functions. Experimental results from the simulator ONE [55] showed that the presented protocol achieves a higher economic gain compared to two existing dissemination methods for most combinations of network (mobility) and commercial/environment parameters. Future work on this topic can include: a) studying the effects of the learning framework parameters (e.g., the learning rate $\alpha$), b) providing a proof of convergence for the state-action functions, and c) developing mechanisms for handling consumer selfishness. In the next chapter, we extend this protocol to a Scalable Q-learning based Gain-aware Routing protocol (SQGR), which provides higher scalability and maintains better user privacy.

9 Scalable Q-learning based Gain-aware Routing

9.1 Introduction

In the previous chapter [70], we showed that Q-learning is able to address the dynamicity and stochasticity associated with routing in SWNs. This chapter presents a Scalable Q-learning based Gain-aware Routing (SQGR) protocol which extends QGR (Chapter 8) [70], focusing on system scalability and privacy preservation. Using a scalable Q-learning model, unlike QGR [70], we avoid storing consumer specific interaction and interest profiles on devices, thus providing better privacy. In this chapter, we formulate the gain of dissemination as a long-term reward in a Q-learning based gain-aware routing approach; the goal is to take a series of forwarding actions that result in delivery to the highest gain-yielding target customers. The specific contributions of this chapter are as follows. We first introduce the definition of the economic gain of dissemination as used in our prior work (Chapter 4).
That gain is a function of the forwarding cost (rebate) and the revenue. We then present the Scalable Q-learning based Gain-aware Routing (SQGR) algorithm. Finally, we present simulation results which verify SQGR's functionality and performance in comparison to a few existing protocols. The system model, including the gain formulation used in this chapter, follows the model presented in Chapter 8. We also tackle the same problem of maximizing the total economic gain of dissemination for the content generator, except that we focus on embedding an additional level of scalability and privacy.

Figure 9-1: Probabilistic state machine modeling for Q-Learning (Q-tables of nodes N_i and N_j, with deterministic and probabilistic states and transitions)

9.2 Scalable Q-learning based Routing (SQGR)

9.2.1 Gain Quanta

Scalability and Privacy: We divide the possible gain range [G_min, G_max] into b gain quanta. The maximum and minimum possible gain can be computed by Eqn. 8-2. The width of each quantum, s_b, is (G_max − G_min)/b. We refer to each gain quantum by its lower bound. n_q is the number of nodes falling into gain quantum q.

9.2.2 Q-learning Model

Following the Q-learning model [47], each agent (i.e., a node) is modeled as a three-tuple entity consisting of {States (S), Actions (A), Rewards (R)} [47]. A probabilistic state machine, as shown in Figure 9-1, formulates states along with stochastic transitions (i.e., actions) when two nodes (N_i and N_j) meet at time t.

States (S): At each point of time t, an agent may observe a set of deterministic and probabilistic states. The state space S consists of the set containing all nodes in the network. In Figure 9-1, agent N_i observes the deterministic state N_j as long as the connection between the two nodes is held. Agent N_i also observes a set of probabilistic states, shown with dashed circles. Each circle consists of a group of nodes whose gain falls in a specific gain quantum. Probabilistic states may or may not be observed at a future time t' > t. Agent N_i observes N_j's prospective states transitively (e.g., nodes R_1 .. R_{n_q} with gains in [q, q + s_b)).

Actions (A): At each point of time t, an agent chooses an action from the set of actions (A). Considering the stochastic nature of the routing problem, A consists of both deterministic and probabilistic actions, shown as the transitions among states in Figure 9-1. For example, agent N_i, assuming that it holds T copies of the content, can choose the action of forwarding a copy to node N_j. This is a deterministic action. Agent N_i can also choose to hold a certain number of copies to transmit to a node belonging to the group of nodes with gain in quantum q, or to transmit (through transitivity) to a prospective peer of node N_j (e.g., R_2). The latter actions are probabilistic and are shown as probabilistic transitions. Each such transition is associated with a transition probability. For example, the transition probability to the state of node N_i meeting any node from quantum q through node N_j is a function of λ_{j,q}, the meeting frequency between N_j and the members of quantum q.

Rewards (R): When two nodes meet, an action a ∈ A is associated with a reward r ∈ R. The set of rewards R consists of immediate and discounted rewards [47].
In the presented model, the reward associated with each action is defined as the resulting expected economic gain of delivery from the perspective of the content provider. Deterministic actions, such as a delivery from node N_i to N_j (Figure 9-1), are associated with an immediate reward, which is G_j. Transitive deliveries, which take place at a later time t' > t, are associated with discounted rewards.

Q-values: At each point of time t, a state-action function, namely a Q-value, estimates the immediate/discounted reward associated with an action a ∈ A given a current state s ∈ S. For example, the Q-value Q_i^{i→q} corresponds to the action of forwarding one or multiple copies of a coupon to any node-o with gain within the gain quantum q (i.e., G_o ∈ [q, q + s_b)). The subscript of the term shows the owner of the Q-value, while the superscript indicates the action associated with that Q-value. For an immediate contact such as node N_j in Figure 9-1, G_j denotes the value associated with a direct delivery to node-j. All Q-values are stored in each node's Q-table (Figure 9-1).

9.3 Scalable Q-learning based Gain-aware Routing

The proposed SQGR protocol relies on the knowledge of: 1) nodes' interest profiles (i.e., ρ_i), and 2) their underlying social interaction patterns. Unlike in Q-learning based Gain-aware Routing (QGR) [70], these state variables are not stored per node. They are instead aggregated over a gain quantum. A coupon-carrying node uses this information to compute and evaluate Q-values which reflect how a delivery to a peer with a certain interest level contributes to the overall gain within the coupon's expiry time.

State Variables: Any node-i maintains and updates the following variables every time it meets a node-j. λ_{i,q}: node-i's rate of meeting any node j in the gain quantum q, where G_j ∈ [q, q + s_b), over a predefined time window (e.g., a week). n_q: the number of nodes in gain quantum q visited by node-i so far. This is increased by one at every meeting between node-i and a node with gain in quantum q.

Q-values: Q-values are initialized to 0. When a node-i meets a node-j, the Q-value for the possible action of transmitting a copy to a node belonging to the gain quantum q (G_j ∈ [q, q + s_b)) is computed using the following equation:

$Q_i^{i \to q} = (1 - \alpha) Q_i^{i \to q} + \alpha \max\{G_j, \gamma Q_j^{max}\}$   (9-1)

Here α (0 ≤ α ≤ 1) is a standard Q-learning parameter, namely the learning rate. A very high α value may cause random fluctuations in the Q-values, while very small α values indicate more reliance on old Q-values. A higher G_j indicates a higher redemption probability for node-j and thus reflects the quality of node-j as a consumer. Node-j can also act as a forwarder. That quality of node-j is reflected in Q_j^{max}, the discounted (i.e., non-immediate) reward. Q_j^{max} is the maximum attainable expected reward from a delivery through node-j:

$Q_j^{max} = \max_{q \in Q} \{ p_q^j(\tau - t_c) \times Q_j^{j \to q} \}$   (9-2)

where Q is the set of gain quanta that node-j's previously visited peers belonged to. p_q^j(τ − t_c) is defined as the probability of node-j meeting any node with gain within the gain quantum q at least once during the remaining time to coupon expiry (τ − t_c), where t_c denotes the current time. Assuming that the inter-contact duration between node pairs follows an exponential distribution, p_q^j(τ − t_c) is computed as follows:

$p_q^j(\tau - t_c) = 1 - e^{-\lambda_{j,q}(\tau - t_c)}$   (9-3)

The quantity p_q^j(τ − t_c) therefore indicates the probability of obtaining the reward Q_j^{j→q} associated with a coupon delivery from node-j to a prospective future peer with gain within quantum q.
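To make the update concrete, the following minimal Python sketch shows how a node might evaluate Eqns. 9-1 through 9-3 on contact; it is our own illustration rather than code from the dissertation, and the function and container names (q_table_i, rates_j, etc.) are hypothetical. Here q_table_j maps each gain quantum (identified by its lower bound) to Q_j^{j→q}, and rates_j holds the meeting rates λ_{j,q}.

```python
import math

def prob_meet(rate, time_left):
    # Eqn. 9-3: probability of at least one contact with a quantum-q node
    # within the remaining coupon lifetime (tau - t_c), assuming
    # exponentially distributed inter-contact durations.
    return 1.0 - math.exp(-rate * time_left)

def q_max_j(q_table_j, rates_j, time_left):
    # Eqn. 9-2: best expected reward attainable through node-j, taken over
    # the gain quanta that node-j's previously visited peers belong to.
    if not q_table_j:
        return 0.0
    return max(prob_meet(rates_j[q], time_left) * q_table_j[q]
               for q in q_table_j)

def update_q(q_table_i, q, g_j, q_table_j, rates_j, time_left,
             alpha=0.5, gamma=0.2):
    # Eqn. 9-1: blend the previous Q-value with the larger of the immediate
    # reward G_j and the discounted transitive reward gamma * Q_j^max.
    old = q_table_i.get(q, 0.0)          # Q-values are initialized to 0
    best = max(g_j, gamma * q_max_j(q_table_j, rates_j, time_left))
    q_table_i[q] = (1.0 - alpha) * old + alpha * best
    return q_table_i[q]
```

The defaults α = 0.5 and γ = 0.2 match the values used in the experiments of Section 9.4.1.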
The Q-learning parameter γ (0 ≤ γ ≤ 1) in Eqn. 9-1 is the discount factor, which determines agent node-i's level of preference for relying on the long-term expected reward Q_j^max compared to the immediate reward G_j. For γ = 1, immediate delivery is discouraged. Using the expression max{G_j, γQ_j^max} in Eqn. 9-1, node-i evaluates the quality of node-j as a consumer, or as a prospective forwarder toward the group of nodes whose gains fall into the same gain quantum. Note that, for SQGR, the maximum size of a Q-table on a consumer's device reduces to b, the total number of gain quanta, which is smaller than N−1 (one less than the network size), the maximum Q-table size in QGR [70]. A smaller Q-table translates to less storage usage on mobile devices.

Forwarding Logic: When a node-i carrying T copies meets a node-j, if node-j has not yet consumed a copy of the content, node-i computes the gain G_j which node-j can achieve by redeeming the coupon (Eqn. 8-1). Node-j then shares the products p_q^j(τ − t_c)·Q_j^{j→q}, for all gain quanta q ∈ Q, with node-i. Here Q is the set of gain quanta to which node-j's previously encountered peers belonged. Node-j may encounter one of these nodes at least once in the time remaining to coupon expiry with probability p_q^j(τ − t_c). The Q-value Q_j^{j→q} indicates the reward (i.e., gain) from a transitive delivery via node-j to any node with gain in the range [q, q + s_b). Therefore, p_q^j(τ − t_c)·Q_j^{j→q} indicates the expected reward (i.e., gain) of a delivery through node-j. Node-i also maintains Q_i^{i→q} values in its Q-table, where q ∈ Q and Q here is the set of gain quanta to which node-i's previously encountered peers belong. Node-i then sorts the following values in descending order into a list of expected Q-values, namely Q_E: 1) G_j, 2) p_q^i(τ − t_c)·Q_i^{i→q} − c_f, for all q ∈ Q of node-i, and 3) p_q^j(τ − t_c)·Q_j^{j→q} − 2c_f, for all q ∈ Q of node-j. One unit of the forwarding cost c_f is subtracted from each Q_i^{i→q} value to account for the rebate given to node-i as it cooperates in a transitive delivery. Two units of forwarding cost are subtracted from the Q_j^{j→q} values to compensate for the cost of delivery from node-i to node-j and from node-j to another node with gain in quantum q. For gain quanta that node-i and node-j both hold in their Q-tables, only the higher expected Q-value is kept. If G_j is the highest value in the sorted list, a copy is forwarded to node-j. Otherwise, for each expected Q-value Q_i^{i→q} or Q_j^{j→q} that is higher than G_j, the corresponding number of copies n_q (out of the T copies available at node-i) is held at node-i for transitive deliveries. This means that there are transitive deliveries which are expected to yield a higher gain than a delivery to the immediate contact node-j. Note that n_q here counts the peers encountered by node-i or node-j, respectively, that belong to the gain quantum associated with the corresponding Q-value. Finally, if any copies are left at node-i, a copy is forwarded to node-j for consumption. This forwarding algorithm is depicted in Algorithm 9-1; a simplified sketch of the decision step is also shown after the protocol descriptions below.

9.4 Results

We have compared the designed SQGR algorithm with the Social-Aware Networking (SANE) [7] and Direct Delivery (DD) mechanisms. In SANE, a content carrier i forwards half of its T copies to a peer j if the cosine similarity between the relevance of the message M, denoted R(M), and the interest profile of j (ρ_j) is higher than a given relaying threshold ρ. In DD, a node forwards a copy of a coupon to any peer that does not yet possess the coupon at their meeting time.
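For illustration, the following Python sketch captures the forwarding decision of Section 9.3; it is a simplified stand-in for Algorithm 9-1, not the dissertation's own code, and the data-structure and function names are our assumptions. exp_i and exp_j hold the expected values p_q^i(τ − t_c)·Q_i^{i→q} and p_q^j(τ − t_c)·Q_j^{j→q}, while n_i and n_j count the peers each node has met per gain quantum.

```python
def sqgr_forward(T, g_j, exp_i, exp_j, n_i, n_j, c_f):
    """Simplified sketch of the SQGR forwarding decision (Section 9.3).

    Returns (copies handed to node-j now, copies held for transitive delivery).
    """
    # Expected rewards net of rebates: one unit of c_f for a delivery node-i
    # completes later itself, two units for a delivery routed via node-j.
    # For quanta present in both Q-tables, only the higher value is kept.
    best = {}
    for q, v in exp_i.items():
        best[q] = (v - c_f, n_i.get(q, 0))
    for q, v in exp_j.items():
        cand = (v - 2 * c_f, n_j.get(q, 0))
        if q not in best or cand[0] > best[q][0]:
            best[q] = cand

    # If the immediate gain G_j tops the sorted list Q_E, deliver right away.
    if not best or g_j >= max(v for v, _ in best.values()):
        return 1, T - 1

    # Otherwise reserve copies for every expected value that beats G_j,
    # one copy per peer previously observed in that quantum, up to T.
    reserved = 0
    for value, count in sorted(best.values(), reverse=True):
        if value <= g_j or reserved >= T:
            break
        reserved += count
    reserved = min(reserved, T)

    give = 1 if reserved < T else 0      # a leftover copy goes to node-j
    return give, T - give
```

In a full implementation the reserved copies would remain in node-i's buffer, and the same procedure would be repeated at subsequent contacts.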
9.4.1 Experimental Details

We use the DTN simulator ONE [55] to evaluate the performance of SQGR, SANE, DD and QGR [70]. We run experiments for three mobility scenarios. We use a Gaussian redemption function (Figure 5-2) with μ = 0.75 and a maximum redemption probability of 0.5. The relevance of the message for SANE is set such that it targets consumers whose interest is closest to the maximum target interest level, 0.75. The learning rate (α) and discount factor (γ) are set to 0.5 and 0.2, respectively. These values were chosen experimentally, as they were found to provide fast learning convergence; the impact of different learning parameters is left for future work. The earned revenue c_r and discount cost c_d are set to 5 and 2, respectively.

9.4.2 Community-based Synthetic Mobility

The synthetic mobility scenario consists of 90 nodes forming 3 distinct communities (Figure 9-2). The arrows show the flow of nodes migrating between communities, and the number inside each community shows its size. Interest levels within a community are more similar than those across communities. Also, G_i (i ∈ C_1) < G_i (i ∈ C_2) < G_i (i ∈ C_3).

Figure 9-2: Community-based mobility

Evolution of Gain: Figure 9-3 shows the evolution of gain over time during the lifetime of a coupon. The coupon is generated after a warm-up period, when Q-values have settled network-wide. The gain is computed as the average over 30 experiments, each using a random source node. The forwarding cost is set to 0.002. Observe that the gain has an increasing trend for all protocols as more coupons are distributed over time. SQGR achieves a higher gain than the other protocols for all expiry durations. With an increase in expiry duration from half a day (Figure 9-3:a) to 5.5 days (Figure 9-3:d), the gain for all protocols except DD increases due to the longer exploration duration. For DD, the source and the receivers of the coupon deliver all T = 15 coupons to the first 15 nodes they encounter. A longer expiry duration in the DD case does not necessarily change the set of encountered nodes, so the gain does not increase. SQGR performs better than QGR in this case because SQGR takes its forwarding decisions by comparing the immediate gain of delivery G_j with the expected Q-values of transitive deliveries to any node from a gain quantum higher than the one G_j belongs to. The probability of meeting such a node is higher than the probability of meeting any one specific node with gain higher than G_j, which is the forwarding criterion used in QGR. For this reason, SQGR finds immediate delivery less urgent and instead waits to meet better consumers in the future.

Figure 9-3: Evolution of gain over time for expiry durations τ of (a) half a day, (b) 1.5 days, (c) 3 days and (d) 5.5 days

The final gain values when the forwarding cost is set higher, at c_f = 0.2, are shown in Figure 9-4. Figure 9-4:a and Figure 9-4:b show gain results for T = 15 and T = 25 coupons, respectively. Similar to the observation in Figure 9-3, all protocols show an increasing trend in gain with longer coupon expiry duration.
For T = 15 (Figure 9-4:a), the final gain values for all protocols are lower than in the equivalent scenarios with the smaller forwarding cost of 0.002 in Figure 9-3:a to Figure 9-3:d. Higher gain values are generally observed for T = 25. SQGR mostly achieves a higher gain than QGR for the reasons mentioned earlier, that is, maintaining copies for a longer duration and eventually delivering them to better consumers.

Figure 9-4: Final gain values for (a) T=15 and (b) T=25, for expiry durations τ of 0.5, 1.5, 3 and 5.5 days

To gain more insight into SQGR's performance, Figure 9-5 shows the Q-values Q_i^{i→0} (Figure 9-5:a) and Q_i^{i→1.2} (Figure 9-5:b) at every update for every node in the network. Figure 9-5:a shows the quality of the nodes for delivering to any consumer whose gain belongs to gain quantum 0, which corresponds to the smallest possible gain values in the current experiment. The high concentration of points in the cluster of Q-values observed by members of C_1 in Figure 9-5:a indicates that these nodes meet a larger population of nodes with gain within quantum 0. This is due to the high contact frequencies of the members of this community with one another. The higher Q-values in this cluster indicate that nodes with IDs in the range [0,29] are good candidates for delivering coupons to nodes with gain within quantum 0. The lower Q_i^{i→0} values recorded for nodes belonging to C_3 show that these nodes are less capable of delivering copies to nodes with small gain values. This is because members of C_3 only encounter nodes with relatively small gain values when they migrate to C_2. The absence of Q_i^{i→1.2} values for members of C_1 in Figure 9-5:b shows that these nodes do not directly encounter nodes holding gain in the range [1.2, 1.3], that is, nodes belonging to C_3. Note that this is the highest possible gain range for nodes under the tested mobility scenario. Unlike members of C_1, nodes in C_2 migrate to community C_3 and update the Q-values Q_i^{i→1.2} whenever they visit a peer residing in C_3. That is why relatively high Q_i^{i→1.2} values can be seen for the members of C_2. A similar trend of Q-values is observed for the members of C_3 due to their frequent intra-community encounters. In summary, Figure 9-5 shows how nodes differ in their quality of delivery (direct or transitive) to low-gain (Figure 9-5:a) and high-gain (Figure 9-5:b) consumers. The SQGR dissemination mechanism works by identifying such qualities and choosing the most suitable nodes for delivering content to high-quality consumers.

Figure 9-5: Q-tables' content at each node for (a) Q_i^{i→0} and (b) Q_i^{i→1.2} (x-axis: Node ID (i); points are grouped by the community of the observing nodes)

9.4.3 Trace-based Mobility

The dissemination protocols were also tested on the INFOCOM mobility trace [59], which consists of human social interaction data for 97 devices collected during the 4 days of the INFOCOM '06 conference. Each node is assigned a random interest level.

Gain vs. Time to Reach Maximum Gain: Figure 9-6 shows the maximum gain and the time to reach the maximum gain for all protocols, for a short expiry duration of half a day (Figure 9-6:a) and a longer expiry duration of 2.5 days (Figure 9-6:b). The forwarding cost is fixed at 0.2.
Similar to the results for the community-based mobility (Section 9.4.2), SQGR mostly provides a higher gain in both expiry duration scenarios. This is because SQGR relies on the probability of meeting any higher-quality consumer in the future, which also results in a longer time for SQGR to reach its maximum gain. DD has the fastest delivery, at the cost of a lower gain. SANE and QGR perform similarly in terms of gain and the time to reach it. As expected, both the total gain of delivery and the time to reach it increase for all protocols when a longer expiry duration is allowed (Figure 9-6:b).

Scalability: One of the primary value propositions of SQGR, in comparison with QGR, is higher scalability and better privacy preservation, achieved by not storing consumer-specific interaction and interest profiles on peers' devices. As mentioned in Section 9.2, the higher scalability of SQGR is a result of the smaller Q-tables stored at consumers' mobile devices. To demonstrate this, Figure 9-7 shows the average Q-table size for 10 randomly chosen nodes, recorded at the time of coupon expiry. The figure shows that QGR requires nodes to store more Q-values for all expiry durations between half a day and 2.5 days. For QGR, the maximum number of Q-values stored in a node's Q-table is N−1, where N is the network size. That is equal to the maximum number of peers a node may visit before a coupon expires; in this case, N = 97. For SQGR, the maximum number of Q-values equals the number of gain quanta (b), since Q-values are computed and stored per gain quantum. Here, the gain varies in [0, 1.3], which results in 13 gain quanta, each of size 0.1.

Figure 9-6: Maximum gain versus time to reach maximum gain for expiry durations of (a) half a day and (b) 2.5 days

9.4.4 Working Day Mobility

Here, we combine the community-based movement with realistic human daily routines [55]. 90 nodes are divided into three communities residing in three distinct home areas on the map of the city of Helsinki [55].

Figure 9-7: Q-table size per node

Members of each community have similar interest levels in the coupon content. Nodes wake up in the morning, go to work, go shopping or perform similar activities in the evening, and eventually return home and sleep. Experiments were run for two forwarding cost values, 0.002 and 0.2, and for τ between half a day and 4 days. Results similar to those observed for the community-based and trace-based mobility scenarios were seen; they are not shown here to avoid redundancy. One interesting example of gain evolution, for τ = 4 days, is depicted in Figure 9-8. The dashed lines mark the start of a new working day. It can be seen that for all protocols the gain starts to increase in the initial hours of a working day, when nodes start visiting each other. The gain stabilizes as soon as nodes return home and sleep.

Figure 9-8: Gain evolution for working day mobility with τ = 4 days

Overall, results from the three tested mobility scenarios demonstrate that SQGR is mostly able to achieve the highest gain among the tested protocols under various combinations of forwarding cost c_f, number of coupons T, and expiry duration τ. We observed that SQGR does not obtain a noticeably higher gain than QGR and SANE for some combinations of forwarding cost c_f and very short expiry durations.
However, due to its higher scalability and better consumer privacy, SQGR appears to be the best suited among the tested protocols for content dissemination with the goal of maximizing gain.

9.5 Summary and Conclusion

We presented a Scalable Q-learning based Gain-aware Routing (SQGR) protocol for disseminating commercial content in Social Wireless Networks. We defined an economic gain, which is a composite metric that combines the revenue from a delivery and the costs associated with that delivery for a vendor. Content forwarding and the associated gain are then mapped to the state-action pairs and rewards of the Q-learning framework. The key benefit of SQGR is that it does not store any per-node consumption interest or interaction information across nodes, thus providing both scalability and privacy. The analysis and performance evaluation performed using the ONE simulator demonstrate the superiority of SQGR in terms of gain performance and scalability in comparison to three existing protocols. Future work on this topic includes: 1) studying the effects of consumer (node) selfishness, and 2) automatically fine-tuning the Q-learning parameters (e.g., the discount factor and the learning rate).

10 Future Work

10.1 Introduction

There are many applications of multicast Device-to-Device (D2D) content dissemination in healthcare [6], emergency handling [7], and vehicular networks [8]. In this dissertation we specifically focused on a commercial content dissemination application, which is about delivering commercial content such as coupons to a target population of customers. The main design goal for a D2D commercial content dissemination algorithm is to maximize a predefined commercial gain for a Content Provider (CP). In other words, a carrier of content/coupons identifies the most fit individual Content Consumers (CCs) based on its pattern of social interaction (e.g., contact duration, centrality, etc.) with target consumers. The objective of this dissertation has been to develop coupon dissemination mechanisms that can maximize a coupon's economic gain in the presence of costs/rebates for D2D forwarding incentives. This can be posed as a Delay Tolerant Network (DTN) multicast routing problem. In Chapter 3, we developed a Unicast Gain-aware Dissemination Protocol (GDP) with the goal of maximizing the defined economic gain for the content provider. The gain of dissemination was defined as a combination of a time-based value function and the cost of delivery. In Chapter 4, we extended this work to multicast content dissemination, introducing an Economy Driven Content Dissemination (EDCD) protocol. We applied a more elaborate gain formulation which combines a redemption-probability based value function and the cost of delivery. In Chapter 5, using the same gain formulation, we presented a Predictive Gain Utility based Routing framework consisting of two routing protocols. With gain prediction, this framework provides multicast gain-aware routing in more structured and scalable routing approaches compared to EDCD. Up to that point, the routing mechanisms were designed primarily based on the designer's past experience, with no embedded learning mechanism. This approach may not scale well for heterogeneous and dynamically changing networks, or for newly emerging gain formulations. This motivates network protocol design using learning mechanisms that can adapt to heterogeneity and changing operating conditions, thus avoiding redesigns whenever conditions change.
In this direction, we presented a first step toward evolving network communication protocols, specifically Medium Access Control layer protocols (Chapters 6 and 7). A generalized probabilistic state machine was defined initially. It was shown that, under an appropriate genotype definition and fitness pressure, it is indeed feasible to evolve the ALOHA, Slotted ALOHA and CSMA protocols under a wide variety of network and loading conditions. In Chapter 8, we leveraged reinforcement learning to tackle the problem of communication protocol design. Reinforcement learning closely fits the problem of commercial content dissemination in SWNs, as it is concerned with decision making aimed at maximizing some notion of cumulative reward. We developed the Q-learning based Gain-aware DTN multicast Routing (QGR) protocol, which is specifically designed for commercial content types such as coupons. In comparison with the history-based and prediction-based dissemination methods in Chapters 4 and 5, the online learning component in QGR is expected to be robust in diverse and dynamic mobility environments. In Chapter 9, extending QGR, we developed the Scalable Q-learning based Gain-aware Routing (SQGR) protocol. The key benefit of SQGR in comparison to QGR is that it does not store any per-node consumption interest or interaction information across nodes, thus providing both scalability and privacy. Building on the research developed in this dissertation, this chapter outlines different directions of future work and the challenges involved.

10.2 Evolution of DTN Routing Protocols

10.2.1 Parameter Evolution

1) Evolving Depth of Penetration (DoP): It was shown in Chapter 4 that the presented EDCD can achieve better economic gain than a number of competing DTN routing protocols under different mobility scenarios and Depths of Penetration (DoPs). The balance between the obtained value and cost usually leads to gain maximization at one specific optimal DoP for any given mobility scenario. A possible future work direction on this topic is to use Genetic Algorithms (GA) for self-adjusting the DoP parameter so that its optimal value can be set online. The optimal DoP may change according to the mobility scenario, the nodes' interest profiles and the type of coupon (with different expiry times). An evolutionary approach may manage to adjust the DoP value at runtime according to such changes.

2) Delivery pattern: Another direction is that, without sharing any interest levels among consumers, a device learns the average interest level of the peers in its surroundings and adjusts its delivery rate accordingly. Let us assume that the probability of sending a coupon for consumption, p_send, increases exponentially as the time remaining to coupon expiry decreases; for example, p_send = 1/e^{x(τ − t_c)}, where τ − t_c is the time remaining to coupon expiry. A smaller slope x causes more deliveries in the early stages after coupon generation, while larger x values push more deliveries toward the expiry time. A possible future work in this direction is to use a GA to evolve this parameter in order to fine-tune the delivery pattern according to network conditions. The fitness function in both examples of parameter evolution can be the "effective" gain resulting from the deliveries a node makes; a minimal sketch of such a parameter-evolution loop is given below. A challenge to be addressed here is notifying a node of its fitness, i.e., its contribution to the total gain, without a server's involvement.
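The sketch below is our own illustration of this idea, not a protocol from the dissertation: a toy generational loop that evolves the delivery slope x of p_send = 1/e^{x(τ − t_c)}, using a placeholder fitness in place of a node's measured effective gain. For intuition, with x = 0.5 and two days remaining to expiry, p_send = e^{-1} ≈ 0.37, while one hour before expiry it is close to 1.

```python
import math
import random

def p_send(x, time_left):
    # Probability of handing a copy over for consumption; approaches 1
    # as the remaining coupon lifetime (tau - t_c) shrinks.
    return math.exp(-x * time_left)

def effective_gain(x):
    # Placeholder fitness: in a real deployment this would be the gain
    # actually induced by the deliveries made while using slope x.
    return 1.0 - (x - 0.8) ** 2          # toy landscape, optimum at x = 0.8

def evolve_slope(pop_size=20, generations=50, sigma=0.05):
    # Simple truncation-selection loop: keep the better half, mutate it.
    population = [random.uniform(0.1, 2.0) for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(population, key=effective_gain,
                         reverse=True)[:pop_size // 2]
        children = [max(0.01, random.choice(parents) + random.gauss(0.0, sigma))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=effective_gain)

# Example: evolve_slope() should settle near the optimum of the toy fitness.
```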
10.2.2 Protocol Evolution

Based on the evolutionary framework presented in Chapters 6 and 7, one can design and evolve probabilistic state machines which implement D2D content dissemination protocols. One such example is shown in Figure 10-1. The states of the machine are: 1) Wait (W), in which a node holds at least one copy of a coupon and is waiting to meet a peer; 2) Send (S), which represents sending one copy of a coupon to a peer for consumption; and 3) Terminate (T), which represents termination of the content transmission procedure between two nodes. A binary input parameter indicates whether the coupon carrier has more than enough copies to deliver, before the coupon expiry time, to prospective nodes with expected gain larger than that of the immediate contact. The product of this input with the Wait-to-Send transition probability p_WS represents a valid transmission, while its complement times p_WS indicates that a copy is delivered to an immediate neighbor even when there are not enough copies for more qualified future contacts. A Send-to-Wait transition probability p_SW = 1 indicates a rogue behavior which allows a source node to return to the Wait state and attempt to send more consumption copies to the same peer instead of terminating the transmission process. This state machine is general in that it contains both valid and rogue scenarios for disseminating a content. The objective is to investigate if and how the transition probabilities evolve toward generating different forwarding protocols.

Figure 10-1: Example general state machine consisting of a content dissemination protocol (states W, S and T)

The fitness criterion in this case would be the gain induced by disseminating coupons using the routing protocol that each set of transition probabilities (i.e., each state machine) represents. It is challenging to design a more general state machine by granting it a higher degree of behavioral freedom. Consider the case where the inputs to the state machine are 1) the expected gain induced by an immediate contact, and 2) the expected gain of delivery through prospective contacts. In that case, the state machine should manage to evolve the right set of calculations and comparisons on the inputs in order to make a valid delivery decision. With more complexity, this approach searches through a wider sample space, which may lead to finding better solutions than a state machine with a lower degree of behavioral freedom, as in Figure 10-1. Thus, one future work item on this topic is investigating the feasibility of evolving D2D content dissemination protocols using probabilistic state machines with different evolutionary degrees of behavioral freedom.

10.3 Neural Network Driven Q-learning

Neural Networks (NNs) can be used to approximate Q-functions, i.e., the Q-values [48, 71] of a learning formulation. A possible future work direction for the designed reinforcement learning based methods, QGR (Chapter 8) and SQGR (Chapter 9), is to estimate the Q-values (Section 8.1.2) using a neural network instead of the tabular state-action functions. In this approach, the input of the NN would be the states and actions, and the output would be the estimated Q-value for the corresponding state-action pair. In this process, the need for a Q-table lookup is eliminated. Based on the specific Q-learning formulation presented in this dissertation and the analysis presented in Section 9.4, for QGR the maximum number of Q-values stored in a node's Q-table is N−1, where N is the network size.
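As a concrete illustration of this direction (our own sketch, not part of the dissertation; the class and function names are hypothetical), a small PyTorch network could map a state-action encoding to a Q-value estimate and be trained toward the same target used in Eqn. 9-1, max{G_j, γQ_j^max}, supplied by the caller:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    # Maps an encoded (state, action) pair to a scalar Q-value estimate,
    # replacing the per-peer or per-quantum Q-table lookup.
    def __init__(self, state_dim, action_dim, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def train_step(qnet, optimizer, state, action, target):
    # One supervised update of the approximator toward the Q-learning target.
    pred = qnet(state, action)
    loss = nn.functional.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage sketch: optimizer = torch.optim.Adam(qnet.parameters(), lr=1e-3)
```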
That is equal to the maximum number of peers a node may visit before a coupon expires. For SQGR, this number reduces to the number of pre-defined gain quanta. Training an NN to estimate Q-values, instead of looking up Q-tables, is faster and more scalable, especially for large Q-tables, and may also yield a more generalized state-action function estimator. In the tabular alternative, every time a state-action pair is observed, the corresponding Q-value is computed as a function of the previous Q-value using a pre-defined state-action function.

10.4 Gain Estimation using Time Series Prediction

In the prediction-based method presented in Chapter 5, we assumed that the inter-contact durations between node pairs follow an exponential distribution. Based on this assumption, we developed two Predictive Gain Utility Routing (PGUR) methods in which the forwarding logic is built upon predicting expected gains from nodes' past interaction locality information. A future work direction on this topic is to relax the assumption that nodes' inter-contact durations follow a specific distribution and instead use prediction tools such as Recurrent Neural Networks, specifically Long Short Term Memory (LSTM) networks, to predict [72] the gain values observed by each node. Based on such predicted values, a node can leverage a wider perspective in deciding whether to hold on to a content copy or forward it when a peer is met. Note that the gain values observed by a node form a time series; thus, the problem can be posed as time series prediction. For this to work, node interactions should exhibit some form of temporal locality, which is the case in SWNs. One example of such a gain time series is shown in Figure 10-2. The synthetic series shows the gain values corresponding to the peers met by an arbitrary node-i over 10 days. It corresponds to a simulation with 100 nodes following a Working Day Movement (WDM) mobility model [55].

Figure 10-2: Time series of gain observed by a node-i over 10 days

Consider a benchmark method in which an oracle is aware of the future gain observations of every node in the network. In that case, the problem becomes one of the oracle taking the best forwarding decision for each node such that the overall gain is maximized.

10.5 Redemption Functions

The redemption function framework introduced in Section 1.3.1 is general enough to capture varying mappings from interest level to redemption probability. A coupon generator can manually design such a mapping as a redemption function. The goal of a D2D dissemination protocol is to maximize the economic gain for a given redemption function. A future work direction on this topic is to perform experiments with a wide range of sample redemption functions that accommodate different marketing strategies, and to analyze the impact of varying marketing psychology on the economic gain performance.

10.6 Handling Selfishness

A selfish user is one who deviates from the dictated content dissemination strategy with the goal of earning more rebate. Such deviation results in forwarding decisions that lower the network-wide gain for the content provider. This happens, for instance, if a node arbitrarily forwards content as an intermediate node and is rewarded with rebates, thereby increasing the forwarding cost for the content provider.
The other way a node can deviate from the designed dissemination strategy is to avoid performing some of the forwarding actions selected by the embedded dissemination algorithm. In that case, it prevents both itself and the peers that were supposed to receive the content copy from contributing to the gain maximization process. A future direction of this work is to analyze the impact of selfishness on gain performance. To accomplish that, parameters such as the number of selfish nodes and the degree of node selfishness can be defined. One possible way to define node selfishness is to have a selfish node perform x arbitrary forwards. Under such a definition, a fully selfish node forwards a single copy to every peer it meets. Such forwarding, without considering a peer's potential as 1) a consumer and 2) a forwarder, increases the forwarding cost and may result in a negative gain. The next step would be to provide solutions, including 1) detecting selfish nodes and 2) combating selfishness by embedding appropriate strategies in the dissemination protocol. One approach for the dissemination algorithm on a node's application to infer selfishness is to detect a decreasing trend of individual gain as a warning, and then either report this to the content provider or run an embedded strategy to counter the node's selfish behavior. One simple approach to combat a node's selfishness is to de-authorize the node's mobile device from participating in the dissemination process. Combating selfishness in incentive-based content dissemination in SWNs and DTNs is extensively discussed in the literature [5, 73]. Similar methods can be adapted to the commercial content dissemination framework presented in this dissertation.

BIBLIOGRAPHY

[1] M. McPherson, L. Smith-Lovin, and J. M. Cook, "Birds of a feather: Homophily in social networks," in Annual Review of Sociology, 2001. [2] J. Goldenberg and M. Levy, "Distance is not dead: Social interaction and geographical distance in the internet era," arXiv, abs/0906, 2009. [3] E. Cho, S. Myers, J. Leskovec, "Friendship and mobility: user movement in location-based social networks," in KDD '11, NY, 2011. [4] D. Koschützki, et al., "Centrality Indices," Network Analysis: Methodological Foundations, pp. 16-61. [5] F. Xia, L. Liu, J. Li, J. Ma and A. V. Vasilakos, "Socially Aware Networking: A Survey," IEEE Systems Journal, vol. 9, pp. 904-921, 2015. [6] S. Olariu, K. Maly, E. C. Foudriat, S. M. Yamany and T. Luckenbach, "A Dependable Architecture for Telemedicine in Support of Disaster Relief," in Dependable Systems, John Wiley and Sons, 2004. [7] M. Asplund, S. Nadjm-Tehrani, and J. Sigholm, "Emerging Information Infrastructures: Cooperation in Disasters," in Lecture Notes in Computer Science, Critical Information Infrastructure Security, Springer, 2009, pp. 258-270. [8] P. R. Pereira, A. Casaca, J. J. P. C. Rodrigues, V. N. G. J. Soares, J. Triay, and C. Cervello-Pastor, "From delay-tolerant networks to vehicular delay-tolerant networks," IEEE Commun. Surv. Tutorials, 2012. [9] M. Subramani and B. Rajagopalan, "Knowledge-sharing and influence in online social networks via viral marketing," Commun. ACM, vol. 46, no. 12, pp. 300-307, 2003. [10] J. Berger, "Is Word of Mouth Better Than Advertising?," 2015. [Online]. Available: http://jonahberger.com/is-word-of-mouth-better-than-advertising/. [11] K. P. S. T. D. Goel, "Determining a value for a coupon". Google Patents 2013. [12] M. Trusov, R. Bucklin, K.
Pauwels, "Effects of Word-of-Mouth Versus Traditional Marketing: Findings from an Internet Social Networking," Journal of Marketing, vol. 73, pp. 90-102, 2009. 190 [13] A. Udd, "Predicting Redemption Probability of Gift Cards," Master of Science Thesis,Royal Institute of Technology, KTH, 2013. [14] T. Lehman, "Using coupons to kill customer loyalty," [Online]. Available: https://www.newnorth.com/using-coupons-to-kill-customer-loyalty/. [15] M. Boll, "Effective Win-Back Email Campaigns For 3 Stages Of Customers," 2016. [Online]. Available: https://glew.io/effective-win-back-email-campaigns-3-stagescustomers/. [16] L. Guo, C. Zhang, H. Yue and Y. Fang, "A privacy-preserving social-assisted mobile content dissemination scheme in DTNs," in INFOCOM, Turin, 2013. [17] B. Zhang, J. Teng, X. Bai, Z. Yang and D. Xuan, "P3-coupon: A probabilistic system for Prompt and Privacy-preserving electronic coupon distribution," in Pervasive Computing and Communications (PerCom), Seattle, 2011. [18] K. Wei; X. Liang; K. Xu, "A Survey of Social-Aware Routing Protocols in Delay Tolerant Networks: Applications, Taxonomy and Design-Related Issues," Communications Surveys & Tutorials, IEEE , vol. 16, pp. 556-578, 2014. [19] A. Garyfalos and K.C. Almeroth, "Coupons: A Multilevel Incentive Scheme for Information Dissemination in Mobile Networks," IEEE Transactions on Mobile Computing, vol. 7, no. 6, pp. 792-804, 2008. [20] W. Moreira, P. Mendes, and S. Sargento, "Social-aware Opportunistic Routing Protocol Based on User’s Interactions and Interests," in Proceedings of AdHocNets, Spain, 2013. [21] W. Gao, Q. Li, B. Zhao, G. Cao,, "Multicasting in Delay Tolerant Networks: a social network perspective," in MobiHoc '09, 2009. [22] Y. Zhu; B. Xu; X. Shi; Y. Wang., "A Survey of Social-Based Routing in Delay Tolerant Networks: Positive and Negative Social Effects," Communications Surveys & Tutorials, IEEE, vol. 15, pp. 387-401, 2013. [23] P. Hui and J. Crowcroft, "How Small Labels Create Big Improvements," in PerCom Workshops, 2007. [24] P. Hui, J. Crowcroft, and E. Yoneki, "Bubble rap: social-based forwarding in delay tolerant networks," in ACM MobiHoc, China, 2008. [25] Y. Cao, Z. Sun, "Routing in Delay/Disruption Tolerant Networks: A Taxonomy, Survey and Challenges," Communications Surveys & Tutorials, IEEE, vol. 15, pp. 654-677, 2013. 191 [26] E. Bulut and B. K. Szymanski, "Friendship based routing in delay tolerant mobile social networks," in GLOBCOM, 2010. [27] T. Zhou, R. R. Choudhury, and K. Chakrabarty, "Diverse routing:Exploiting social behavior for routing in delay-tolerant networks," in Proc. CSE, 2009. [28] E. M. Daly and M. Haahr, "Social network analysis for information flow in disconnected delay-tolerant MANETs," IEEE Trans. Mobile Comput., vol. 8, p. 606–621, 2009. [29] A. Lindgren, A. Doria, O. Scheln, "Probabilistic routing in intermittently connected networks," in Proceedings of the Workshop on Service Assurance with Partial and Intermittent Resources, August, 2004. [30] J. Karvo, and J. Ott, "Time scales and delay-tolerant routing protocols," in In Proc.c of the ACM MobiCom Workshop on Challenged Networks” 2008, (CHANTS). [31] J. Burgess, B. Gallagher, D. Jensen, and B. N. Levine, "Maxprop:Routing for vehicle-based disruption-tolerant networks," in IEEE INFOCOM ’06, Barcelona, Catalunya, Spain, 2006. [32] S. Nelson, M. Bakht, and R. Kravets, "Encounter-based routing in DTNs," in INFOCOM, 2009. [33] L.Tang, Q. W. Zheng, J. Liu, X. 
Y.Hong, "SMART:A Selective Controlled-Flooding Routing for Delay Tolerant Networks," in Broadband Communications, Networks and Systems, BROADNETS, 10-14 Sept. 2007. [34] F. Nazir, J. Ma, and A. Seneviratne, "Time critical content delivery using predictable patterns in mobile social networks," in Proc. CSE, 2009. [35] T. Ning, Z. Yang, X. Xie and H. Wu, "Incentive-aware data dissemination in delay-tolerant mobile networks," in Sensor, Mesh and Ad Hoc Communications and Networks (SECON), 2011. [36] Y. Wang; M. Chuah; Y. Chen, "Incentive driven information sharing in delay tolerant mobile networks," in GLOBECOM, 2012. [37] A. Vahdat, D. Becker, "Epidemic Routing for partially-connected ad hoc networks," Duke Technical Report CS-2000-06, 2000. [38] S. Ioannidis, A. Chaintreau and L. Massoulie, "Optimal and scalable distribution of content updates over a mobile social network," in INFOCOM, 2009. 192 [39] J. Díaz, A. Marchetti, D. Mitsche, P. Santi, J. Stefa, "Social-Aware Forwarding Improves Routing Performance in Pocket Switched Networks," in Lecture Notes in Computer Science, 2011. [40] B. B. Chen and M. C. Chan, "MobiCent: A credit-based incentive system for disruption tolerant network," in INFOCOM, 2010. [41] M. Uddin, B. Godfrey, and T. Abdelzaher, "RELICS: In-network realization of incentives to combat selfishness in DTNs," in Network Protocols (ICNP), 18th IEEE International Conference on, 2010. [42] P. Costa, C. Mascolo, M. Musolesi, and G.-P. Picco, "Socially-aware routing for publishsubscribe in delay-tolerant mobile ad hoc networks," IEEE J. Sel. Areas Commun, vol. 26, no. 5, p. 748 –760, 2008. [43] A. Mei, G. Morabito, P. Santi and J. Stefa, "Social-Aware Stateless Routingin Pocket Switched Networks," IEEE Transactions on Parallel and Distributed Systems, vol. 26, no. 1, pp. 252-261, 2015. [44] C. Zhu and M. Jin, "An anycast routing algorithm based on genetic algorithm," Wireless Trans. on Comp., pp. 113-122, 2009. [45] X. Li, L. Ci, B. Cheng, C. Tian, M. Yang, "Ant Colony Based Routing Strategy in UAV Delay Tolerant Networks," 2012. [46] S. Jain, K. Fall, and R. Patra, "Routing in a delay tolerant network," in SIGCOMM '04, ACM, 2004. [47] H. Al-Rawi, M. Ng, K. Yau, "Application of reinforcement learning to routing in distributed wireless networks: a review," Artif. Intell. Review, vol. 43, no. 3, pp. 15737462, 2015. [48] J. Boyan and M. Littman, "Packet Routing in Dynamically Changing Networks: A Reinforcement Learning Approach," in Advances in Neural Info. Proc. Systems (NIPS), 1994. [49] V. Rolla, M. Curado, "A Reinforcement Learning-based Routing for Delay Tolerant Networks," Eng. Appl. Artif. Intell, vol. 26, no. 10, pp. 2243--2250, 2013. [50] A. Elwhishi, et. al, "ARBR: Adaptive reinforcement-based routing for DTN," in IEEE 6th International Conf. on Wireless and Mobile Computing, Networking and Comms., Niagara Falls, 2010. 193 [51] V. Cerf, S. Burleigh, A. Hooke, L. Torgerson, R. Durst, K. Scott, K., & H. Weiss, "Delaytolerant networking architecture," RFC4838, 2007. [52] K. Fall, "A delay-tolerant network architecture for challenged internets," in In Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communications,ACM, 2003. [53] S. Burleigh, A. Hooke, L. Torgerson, K. Fall, K., V. Cerf, B. Durst, & H. Weiss, "Delaytolerant networking: an approach to interplanetary internet," Communications Magazine, IEEE, vol. 41(6), pp. 128-136, 2003. [54] N. Kumar, & J. 
Kim, "Probabilistic trust aware data replica placement strategy for online video streaming applications in vehicular delay tolerant networks," in Mathematical and Computer Modeling, 2012. [55] "The ONE; The Opportunistic Network Environment simulator," [Online]. Available: http://www.netlab.tkk.fi/tutkimus/dtn/theone. [56] J. Douceur, "The sybil attack," in IPTPS, 2002. [57] E. Friedman and P. Resnick, "The social cost of cheap pseudonyms," Journal of Economics and Management Strategy, vol. 10, no. 2, p. 173–199, 1998. [58] T. Spyropoulos, K. Psounis, and C. S. Raghavendra, "Spray and wait: an efficient routing scheme for intermittently connected mobile networks," in ACM SIGCOMM workshop on Delay-tolerant networking (WDTN '05), ACM, 2005. [59] J.Scott, R. Gass, J.Crowcroft, P.Hui, C. Diot, A. Chaintreau, "Cambridge-haggle-imoteinfocom2006-2009-05-29," May, 2009. [Online]. Available: Downloaded from http://crawdad.cs.dartmouth.edu/cambridge/haggle/imote/infocom2006. [60] D.Fenn, "10 ways to get more sales from existing customers," [Online]. Available: http://www.inc.com/guides/2010/08/get-more-sales-from-existing-customers.html . [61] J. Han, M. Kamber, J. Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2011. [62] S.G. Ara´ujo, A.C.P. Pedroza, A.C. Mesquita, "Evolutionary Synthesis of Communication Protocols," in 10th International Conference on Telecommunications (ICT), 2003. [63] N. Abramson, "The ALOHA System - Another Alternative for Computer Communications," in in Proc. Fall Joint Computer Conference, 1970. 194 [64] L. D. Davis, Handbook Of Genetic Algorithms, Van Nostrand Reinhold, 1991. [65] H. Mühlenbein, "The Breeder Genetic Algorithm - a provable optimal search algorithm and its application.," in Colloquium on Applications of Genetic Algorithms, IEE 94/067, London, 1994. [66] L. Kleinrock, F.A. Tobagi, "Packet Switching in Radio Channels: Part I--Carrier Sense Multiple-Access Modes and Their Throughput-Delay Characteristics," Communications, IEEE Transactions on, vol. 23, no. 12, pp. 1400-1416, 1975. [67] F. Hajiaghajani, S. Biswas, "Device-to-Device Commercial Content Dissemination in Social Wireless Networks," in The 14th Annual IEEE Consumer Comms. and Networking Conf. (CCNC '17), Las Vegas, 2017. [68] F. Hajiaghajani, S. Biswas, "Device-to-Device Coupon Distribution using Economic Routing Utilities," in 12th International Conf. on Wireless and Mobile Computing, Networking and Comms. (WiMob '16), New York, 2016. [69] R. Sutton, A. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998. [70] F. Hajiaghajani and S. Biswas, "Learning based Gain-aware Content Dissemination in Delay Tolerant Networks," in 9th International Conference on COMmunication Systems and NETworkS (COMSNETS), India, 2017. [71] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, M. Riedmiller, "Playing Atari with Deep Reinforcement Learning," in NIPS Deep Learning Workshop, 2013. [72] J. Brownlee, "Time Series Prediction with LSTM Recurrent Neural Networks in Python with Keras," 2016. [Online]. Available: http://machinelearningmastery.com/time-seriesprediction-lstm-recurrent-neural-networks-python-keras/. [73] H. Zhou, J. Wu, H. Zhao, S. Tang, C. Chen, and J. Chen, "ncentive driven and freshnessaware content dissemination in selfish opportunistic mobile networks," in Proc. IEEE 10th Int. Conf. Mobile Ad-Hoc Sensor Syst. (MASS), 2013. [74] S.Y. Yang, J.T. Jiang, & P.Z. 
Chen, "OOPProPHET: A New Routing Method to Integrate the Delivery Predictability of ProPHET-Routing with OOP-Routing in Delay Tolerant Networks," Journal of Computers, vol. 8(7), pp. 1656-1663, 2013. [75] F. Hajiaghajani, Y. Piolet, M. Taghizadeh, S. Biswas, "Economy Driven Content Dissemination in," Elsevier Journal of Ad hoc Networks, vol. 20, pp. 132-149, 2014. 195 [76] A. K ERÄNEN , and J. OTT., "Increasing reality for DTN protocol simulations," Technical Report, Helsinki University of Technology, Networking Laboratory, July 2007. 196