MITIGATING UNCERTAINTY AT DESIGN TIME AND RUN TIME TO ADDRESS ASSURANCE FOR DYNAMICALLY ADAPTIVE SYSTEMS

By Erik M. Fredericks

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Computer Science - Doctor of Philosophy

2015

ABSTRACT

MITIGATING UNCERTAINTY AT DESIGN TIME AND RUN TIME TO ADDRESS ASSURANCE FOR DYNAMICALLY ADAPTIVE SYSTEMS

By Erik M. Fredericks

A dynamically adaptive system (DAS) is a software system that monitors itself and its environment at run time to identify conditions that require self-reconfiguration to ensure that the DAS continually satisfies its requirements. Self-reconfiguration enables a DAS to change its configuration while executing to mitigate unexpected changes. While it is infeasible for an engineer to enumerate all possible conditions that a DAS may experience, the DAS must still deliver acceptable behavior in all situations. This dissertation introduces a suite of techniques that addresses assurance for a DAS in the face of both system and environmental uncertainty at different levels of abstraction. We first present a technique for automatically incorporating flexibility into system requirements for different configurations of environmental conditions. Second, we describe a technique for exploring the code-level impact of uncertainty on a DAS. Third, we discuss a run-time testing feedback loop to continually assess DAS behavior. Lastly, we present two techniques for introducing adaptation into run-time testing activities. We demonstrate these techniques with applications from two different domains: an intelligent robotic vacuuming system that must clean a room safely and efficiently and a remote data mirroring network that must efficiently and effectively disseminate data throughout the network. We also provide an end-to-end example demonstrating the effectiveness of each assurance technique as applied to the remote data mirroring application.

Copyright by ERIK M. FREDERICKS 2015

To Natalie and Zoe, thank you for everything. I couldn’t have done this without you.

ACKNOWLEDGEMENTS

I first want to thank Dr. Betty H. C. Cheng for taking me under her wing and guiding me from the bright-eyed graduate student that I was to the sleepy-eyed, yet still optimistic, person I am today. Your willingness to provide feedback and many, many revisions on papers and projects is greatly appreciated. I want to also thank you for taking me along on the ICSE ride. Because of that invaluable experience, I met many wonderful people and helped to make something great, and I cannot thank you enough for that. Special thanks also go to Dr. McKinley for trading me to the highest bidder early on in my academic career. I am fairly certain that I would not have ended up in software engineering otherwise. I also would like to thank my committee members: Dr. Philip McKinley, Dr. Erik Goodman, Dr. William Punch, and Dr. Xiaobo Tan. Their valuable feedback and willingness to attend overly long presentations throughout this entire process have been greatly appreciated. Additionally, there are many other people that I would like to thank who have helped me along the way. First of all, I’m sure that, at some point, all members of the SENS lab were pulled into some sort of discussion on software engineering (either willingly or unwillingly). Many thanks to Jared Moore, Tony Clark, Chad Byers, Daniel Couvertier, Andres Ramirez, and Byron DeVries.
Outside of academia, I would like to thank Matthew Jermov, Robb Melenyk, and Greg Robertson. Their oft-uncredited discussions and support definitely helped me get where I am today. I would also like to thank my family and friends for both believing in me and suffering with me through this long and somewhat arduous process. Any social life that I previously had disappeared the moment I entered graduate school, and I appreciate all of the support and understanding that you all have given to me. Lastly, I especially would like to thank my parents. Thank you for giving me the drive and desire to accomplish whatever I set my mind to.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

Chapter 1 Introduction
  1.1 Problem Description
  1.2 Thesis Statement
  1.3 Research Contributions
  1.4 Organization of Dissertation

Chapter 2 Background and Application
  2.1 Dynamically Adaptive Systems
  2.2 Smart Vacuum System
    2.2.1 Overview of Smart Vacuum System
    2.2.2 Smart Vacuum System Implementation
  2.3 Remote Data Mirroring
    2.3.1 Overview of Remote Data Mirroring Application
    2.3.2 Remote Data Mirroring Implementation
  2.4 Requirements Engineering
    2.4.1 Goal-Oriented Requirements Engineering
    2.4.2 Goal-Oriented Requirements Modeling
    2.4.3 RELAX Specification Language
  2.5 Evolutionary Computation
    2.5.1 Genetic Algorithms
      2.5.1.1 Population Generation
      2.5.1.2 Crossover
      2.5.1.3 Mutation
      2.5.1.4 Fitness Evaluation
    2.5.2 Stepwise Adaptation of Weights
    2.5.3 Novelty Search
    2.5.4 (1+1)-ONLINE Evolutionary Algorithm
  2.6 Software Testing
    2.6.1 Structural Testing
    2.6.2 Functional Testing
    2.6.3 Unit Testing
    2.6.4 Regression Testing

Chapter 3 Addressing Requirements-Based Uncertainty
  3.1 Motivation
  3.2 Introduction to AutoRELAX
    3.2.1 Assumptions, Inputs, and Outputs
      3.2.1.1 Assumptions
      3.2.1.2 Inputs and Outputs
    3.2.2 AutoRELAX Approach
    3.2.3 Optimizing Fitness Sub-Function Weights with SAW
  3.3 Case Studies
    3.3.1 RDM Case Study
      3.3.1.1 RDM Uncertainty
      3.3.1.2 Dynamic Weight Adjustment
    3.3.2 SVS Case Study
      3.3.2.1 SVS Uncertainty
      3.3.2.2 Dynamic Weight Adjustment
  3.4 Related Work
    3.4.1 Expressing Uncertainty in Requirements
    3.4.2 Requirements Monitoring and Reflection
    3.4.3 Obstacle Mitigation
  3.5 Conclusion

Chapter 4 Exploring Code-Level Effects of Uncertainty
  4.1 Motivation
  4.2 Introduction to Fenrir
    4.2.1 Assumptions, Inputs, and Outputs
      4.2.1.1 Assumptions
      4.2.1.2 Inputs and Outputs
    4.2.2 Fenrir Approach
  4.3 RDM Case Study
    4.3.1 DAS Execution in an Uncertain Environment
      4.3.1.1 Threats to Validity
  4.4 Related Work
    4.4.1 Code Coverage
    4.4.2 Automated Testing of Distributed Systems
    4.4.3 Automatically Exploring Uncertainty in Requirements
  4.5 Conclusion

Chapter 5 Run-Time Testing of Dynamically Adaptive Systems
  5.1 Motivation
    5.1.1 Test Case Generation
    5.1.2 When to Test
    5.1.3 Testing Methodology Selection
    5.1.4 Impact and Mitigation of Test Results
  5.2 Introduction to the MAPE-T Feedback Loop
    5.2.1 MAPE-T Feedback Loop
    5.2.2 Monitoring
      5.2.2.1 Key Challenges
      5.2.2.2 Enabling Technologies
    5.2.3 Motivating Example
    5.2.4 Analyzing
      5.2.4.1 Key Challenges
      5.2.4.2 Enabling Technologies
      5.2.4.3 Motivating Example
    5.2.5 Planning
      5.2.5.1 Key Challenges
      5.2.5.2 Enabling Technologies
      5.2.5.3 Motivating Example
    5.2.6 Executing
      5.2.6.1 Key Challenges
      5.2.6.2 Enabling Technologies
      5.2.6.3 Motivating Example
  5.3 Related Work
    5.3.1 Exploration of System Behavior
    5.3.2 Multi-Agent Systems
      5.3.2.1 Autonomous Tester Agent
      5.3.2.2 Monitoring Agent
  5.4 Discussion

Chapter 6 Run-Time Test Adaptation
  6.1 Motivation
  6.2 Terminology
  6.3 Introduction to Proteus
    6.3.1 Proteus Approach
    6.3.2 Test Suite
    6.3.3 Adaptive Test Plan
      6.3.3.1 Regression Testing
  6.4 Introduction to Veritas
    6.4.1 Assumptions, Inputs, and Outputs
      6.4.1.1 Assumptions
      6.4.1.2 Inputs and Outputs
    6.4.2 Veritas Fitness Functions
      6.4.2.1 Test Case Validation
      6.4.2.2 Online Evolutionary Algorithm
  6.5 Run-Time Testing Framework
    6.5.1 Test Case Adaptation
  6.6 Case Study
    6.6.1 Simulation Parameters
    6.6.2 Proteus Experimental Results
      6.6.2.1 Threats to Validity
    6.6.3 Veritas Experimental Results
      6.6.3.1 Threats to Validity
    6.6.4 Combined Experimental Results
  6.7 RELAXation of Test Cases
    6.7.1 Discussion
    6.7.2 Threats to Validity
  6.8 Related Work
    6.8.1 Search-Based Software Testing
    6.8.2 Run-Time Testing
    6.8.3 Test Plan Generation
    6.8.4 Test Case Selection
  6.9 Conclusion

Chapter 7 Impact of Run-Time Testing
  7.1 Motivation
  7.2 Analysis Metrics
    7.2.1 DAS Performance
      7.2.1.1 Total execution time
      7.2.1.2 Memory footprint
    7.2.2 DAS Behavior
      7.2.2.1 Requirements satisficement
      7.2.2.2 Behavioral function calls
  7.3 Baseline Results
  7.4 Optimization Approach
  7.5 Related Work
    7.5.1 Processor Cycles
    7.5.2 Agent-Based Testing
  7.6 Conclusion

Chapter 8 End-to-End RDM Example
  8.1 RDM Configuration
  8.2 Requirements-Based Assurance
    8.2.1 AutoRELAX Case Study
      8.2.1.1 RDM Goal RELAXation
    8.2.2 Scalability of RDM Application
    8.2.3 Scaled RDM Configuration
    8.2.4 Approach
    8.2.5 Experimental Results
  8.3 Code-Based Assurance
    8.3.1 Fenrir Analysis
  8.4 Run-Time Testing-Based Assurance
    8.4.1 Derivation of Test Specification
    8.4.2 Proteus Analysis
    8.4.3 Veritas Analysis
  8.5 Conclusion

Chapter 9 Conclusions and Future Investigations
  9.1 Summary of Contributions
  9.2 Future Investigations
    9.2.1 Exploration of Different Evolutionary Computation Techniques
    9.2.2 Interfacing with the DAS MAPE-K Loop
    9.2.3 Hardware Realization of the MAPE-T Loop
    9.2.4 Incorporation of MAS Architecture

APPENDIX

BIBLIOGRAPHY
LIST OF TABLES

Table 2.1: RELAX operators [113].
Table 2.2: Sample genetic algorithm encoding.
Table 3.1: Summary of manually RELAXed goals for the RDM application.
Table 3.2: Summary of manually RELAXed goals for the SVS application.
Table 4.1: Novelty search configuration.
Table 5.1: Example of test case as defined according to IEEE standard [54].
Table 6.1: Examples of SVS test cases.
Table 6.2: Individual test case fitness sub-functions.
Table 8.1: Subset of RDM network parameters.
Table 8.2: RDM sources of uncertainty.
Table 8.3: End-to-end AutoRELAX-SAW configuration.
Table 8.4: Comparison of RDM Configuration Parameters.
Table 8.5: End-to-end Fenrir configuration.
Table A.1: Smart vacuum system test specification.
Table A.2: Traceability links for smart vacuum system application.
Table A.3: Remote data mirroring test specification.
Table A.4: Traceability links for remote data mirroring application.

LIST OF FIGURES

Figure 1.1: High-level depiction of how our suite of techniques impacts a DAS at varying levels of abstraction.
Figure 2.1: Structure of dynamically adaptive system.
Figure 2.2: KAOS goal model of the smart vacuum system application.
Figure 2.3: Screenshot of SVS simulation environment.
Figure 2.4: KAOS goal model of the remote data mirroring application.
Figure 2.5: Partial KAOS goal model of smart vacuum system application.
Figure 2.6: Fuzzy logic membership functions.
Figure 2.7: Examples of one-point and two-point crossover.
Figure 2.8: Example of mutation.
Figure 3.1: Data flow diagram of AutoRELAX process.
Figure 3.2: Encoding a candidate solution in AutoRELAX.
Figure 3.3: Examples of crossover and mutation operators in AutoRELAX.
Figure 3.4: Fitness values comparison between RELAXed and unRELAXed goal models for the RDM.
Figure 3.5: Mean number of RELAXed goals for varying degrees of system and environmental uncertainty for the RDM.
Figure 3.6: Fitness values comparison between AutoRELAXed and SAW-optimized AutoRELAXed goal models for the RDM.
Figure 3.7: Fitness values of SAW-optimized AutoRELAXed goal models in a single environment.
Figure 3.8: Comparison of SAW-optimized fitness values from different trials in a single environment.
Figure 3.9: Fitness values comparison between RELAXed and unRELAXed goal models for the SVS.
Figure 3.10: Mean number of RELAXed goals for varying degrees of system and environmental uncertainty for the SVS.
Figure 3.11: Fitness values comparison between AutoRELAXed and AutoRELAX-SAW goal models for the SVS.
Figure 3.12: Comparison of SAW-optimized fitness values from different trials in a single environment.
Figure 4.1: Example code to update data mirror capacity.
Figure 4.2: Data flow diagram of Fenrir approach.
Figure 4.3: Sample Fenrir genome representation for the RDM application.
Figure 4.4: Example of weighted call graph.
Figure 4.5: Novelty value comparison between Fenrir and random search.
Figure 4.6: Unique execution paths.
Figure 5.1: Comparison between standard test case and adaptive test case.
Figure 5.2: MAPE-T feedback loop.
Figure 6.1: DAS configurations and associated adaptive test plans.
Figure 6.2: Example of test case configuration for TS1.0 and TS1.1.
Figure 6.3: Veritas example output tuple.
Figure 6.4: Proteus workflow diagram.
Figure 6.5: Veritas workflow diagram.
Figure 6.6: Examples of test case adaptation.
Figure 6.7: Average number of irrelevant test cases executed for each experiment.
Figure 6.8: Average number of false positive test cases for each experiment.
Figure 6.9: Average number of false negative test cases for each experiment.
Figure 6.10: Cumulative number of executed test cases for each experiment.
Figure 6.11: Comparison of fitness between Veritas and Control.
Figure 6.12: Comparison of false negatives between Veritas and Control.
Figure 6.13: Average number of irrelevant test cases for combined experiments.
Figure 6.14: Average number of false positive test cases for combined experiments.
Figure 6.15: Average number of false negative test cases for combined experiments.
Figure 6.16: Average test case fitness for combined experiments.
Figure 6.17: Comparison of average test case fitness values for RELAXed test cases.
Figure 6.18: Comparison of average test case failures for RELAXed test cases.
Figure 7.1: Amount of time (in seconds) to execute RDM application in different testing configurations.
Figure 7.2: Amount of memory (in kilobytes) consumed by the RDM application in different testing configurations.
Figure 7.3: Average of calculated utility values throughout RDM execution in different testing configurations.
Figure 7.4: Average amount of utility violations throughout RDM execution in different testing configurations.
Figure 7.5: Number of adaptations performed throughout RDM execution in different testing configurations.
Figure 7.6: Amount of time (in seconds) to execute optimized RDM application in different parallel and non-parallel testing configurations.
Figure 7.7: Amount of time (in seconds) to execute optimized network controller function in different parallel and non-parallel testing configurations.
Figure 7.8: Average of calculated utility values throughout RDM execution in different parallel and non-parallel testing configurations.
Figure 7.9: Average number of utility violations encountered throughout RDM execution in different parallel and non-parallel testing configurations.
Figure 7.10: Number of adaptations performed throughout RDM execution in different parallel and non-parallel testing configurations.
Figure 8.1: KAOS goal model of the remote data mirroring application.
Figure 8.2: Fitness values comparison between RELAXed and unRELAXed goal models for the RDM.
Figure 8.3: Fitness values comparison between AutoRELAXed and SAW-optimized AutoRELAXed goal models for the RDM.
Figure 8.4: RELAXed KAOS goal model of the remote data mirroring application.
Figure 8.5: Fitness values comparison between RELAXed and unRELAXed goal models for a scaled RDM.
Figure 8.6: Comparison of average number of RDM errors between original and updated RDM application across 50 trials.
Figure 8.7: Cumulative number of irrelevant test cases executed for each experiment.
Figure 8.8: Cumulative number of false positive test cases for each experiment.
Figure 8.9: Cumulative number of false negative test cases for each experiment.
Figure 8.10: Cumulative number of executed test cases for each experiment.
Figure 8.11: Average test case fitness values calculated for each experiment.
Figure A.1: KAOS goal model of the smart vacuum system application.
Figure A.2: KAOS goal model of the remote data mirroring application.

Chapter 1 Introduction

Cyber-physical systems are being increasingly implemented as safety-critical systems [74]. Moreover, these software systems can be exposed to conditions for which they were not explicitly designed, including unexpected environmental conditions or unforeseen system configurations. For domains that depend on these safety-critical systems, such as patient health monitoring or power grid management systems, unexpected failures can have costly results. In response to these issues, dynamically adaptive systems (DAS) have been developed to constantly monitor themselves and their environments, and, if necessary, change their structure and behavior to handle unexpected situations.

Providing run-time assurance that a DAS is consistently satisfying its high-level objectives and requirements remains a challenge, as unexpected environmental configurations can place a DAS into a state in which it was not designed to execute. For this dissertation, environmental uncertainty refers to unanticipated combinations of environmental conditions. In contrast, system uncertainty encompasses imprecise or occluded sensor measurements, sensor failures, unintended system interactions, and unexpected modes of operation.
As such, we developed new techniques to address assurance in the face of both types of uncertainty at different levels of abstraction, including requirements, code, and testing.

While software engineering techniques have historically focused on design-time approaches for developing a DAS [47, 59, 114] and providing design-time assurance [48, 50, 78, 85], efforts are increasingly targeting assurance and system optimization at run time [15, 36, 39, 78]. As cyber-physical systems continue to proliferate, techniques are needed to ensure that their continued execution satisfies their key objectives. For example, agent-based approaches have been developed to provide continual run-time testing [78, 79] in order to verify that the system is behaving as expected. However, these approaches tend to run in a parallel, sandboxed environment. Conversely, we focus on addressing assurance concerns within the production environment.

This dissertation presents a suite of techniques at different levels of abstraction that collectively addresses assurance concerns for a DAS that is experiencing both system and environmental uncertainty. Specifically, our techniques provide automated assurance that a DAS continually satisfies its requirements and key objectives at the requirements, implementation, and testing levels of abstraction. Where applicable, we leverage search-based software engineering heuristics to augment our techniques.

1.1 Problem Description

The field of software engineering strives to design systems that continuously satisfy requirements even as environmental conditions change throughout execution [21, 94, 113]. To this end, a DAS provides an approach to software design that can effectively respond to unexpected conditions that may arise at run time by performing a self-reconfiguration. Design and requirements elicitation for a DAS can begin with a high-level, graphical description of system objectives, constraints, and assumptions in the form of a goal model that can be used as a basis for the derivation of requirements [21]. Adaptation strategies for performing system reconfiguration can then be defined to mitigate previously identified system and environmental conditions in order to effectively respond to known or expected situations [22, 47, 81]. The DAS then monitors itself and its environment at run time to determine if the identified conditions are negatively impacting the DAS, and if so, performs a self-reconfiguration to mitigate those conditions.

In addition to design-time assurance, verification and validation activities are required to provide a measure of assurance that the DAS succeeds in satisfying key objectives and requirements. However, standard testing techniques must be augmented when testing a DAS as it may exhibit unintended behaviors or follow unexpected paths of execution in the face of uncertainty [45]. Moreover, due to the complex and adaptive nature of a DAS, run-time techniques are necessary to continually validate that the DAS satisfies its requirements even as the environment changes [19, 41, 43, 44, 48]. These concerns imply that assurance techniques for a DAS must not only be applied at design time, but also at run time. As a result, we have identified the following challenges that currently face the DAS research community:

• Provide effective assurance that the DAS will satisfy its objectives in different operating conditions at run time.

• Anticipate and mitigate unexpected sources of uncertainty in both the system and environment.
• Leverage and extend traditional testing techniques to handle the needs of a complex adaptive system.

1.2 Thesis Statement

This research is intended to explore methods for addressing uncertainty with a focus on providing assurance at different levels of abstraction in a DAS. In particular, techniques for facilitating requirements adaptation, exploring DAS behaviors expressed as a result of exposure to different combinations of system and environmental uncertainty, and performing run-time testing are examined to provide assurance for the DAS.

Thesis Statement: Evolutionary techniques can be leveraged to address assurance concerns for adaptive software systems at different levels of abstraction, including at the system’s requirements, implementation, and run-time execution levels, respectively.

1.3 Research Contributions

The overarching objective of this dissertation is to provide assurance for a DAS at different levels of abstraction. We address this goal with three key research objectives. First, we provide a set of techniques that mitigate uncertainty by providing assurance for a DAS’s requirements, implementation, and run-time behavior. Second, automation is a key driving force in designing, performing, and analyzing the particular issues presented by each assurance technique. Third, we provide a feedback loop to enable testing of a DAS at run time. To accomplish these key objectives, we developed the following set of techniques:

1. At the requirements level, we provide an automated technique for mitigating uncertainty by exploring how goals and requirements can be made more flexible through the automatic introduction of RELAX operators [21, 113], each of which maps to a fuzzy-logic membership function. We also introduce a method for optimizing the exploration process of goal RELAXation for different environmental contexts. To this end, we introduce AutoRELAX [42, 88] and AutoRELAX-SAW [42], respectively. AutoRELAX provides requirements-based assurance by exploring how RELAX operators can introduce flexibility to a system goal model. AutoRELAX-SAW, in turn, extends AutoRELAX by balancing the competing concerns that manifest when the DAS experiences different combinations of environmental conditions.

2. At the implementation level, we develop an automated technique to explore and optimize run-time execution behavior of a DAS as it experiences diverse combinations of operating conditions. In particular, we introduce Fenrir [45], a technique that leverages novelty search [64] to determine the specific path of execution that a DAS follows under different sets of operating conditions, thereby potentially uncovering unanticipated or anomalous behaviors.

3. At the run-time execution level, we define and implement a feedback loop that provides run-time assurance for a DAS through online monitoring and adaptive testing. Specifically, we introduce the MAPE-T feedback loop [44], a technique that defines how run-time testing is enabled via Monitoring, Analyzing, Planning, and Executing, linked together by Testing knowledge, thereby taking a proactive approach in ensuring that a DAS continually exhibits correct behaviors. Furthermore, we introduce techniques to facilitate run-time adaptation of test suites [41] and test cases [43] to ensure that assurance is continually provided even as environmental conditions change over time.
Figure 1.1 presents a high-level description of how each technique facilitates assurance at the requirements, implementation, and testing levels. First, AutoRELAX (1) and AutoRELAX-SAW (2) assess and mitigate both system and environmental uncertainty at the requirements level. Next, Fenrir (3) explores how program behavior, as defined by DAS code, is affected by system and environmental uncertainty. Techniques (1), (2), and (3) are intended to be performed at design time. Next, the MAPE-T feedback loop (4) provides run-time assurance for both the DAS’s requirements and implementation by performing run-time testing in the face of uncertainty. The MAPE-T loop is supported by Proteus (5) and Veritas (6), where Proteus and Veritas each address assurance by ensuring that test suites and test cases, respectively, remain relevant as environmental conditions change and the DAS transitions to new system configurations. As the operating context evolves, Proteus and Veritas ensure that test suites and test cases correctly evolve in turn. Techniques (4), (5), and (6) are intended to be performed at run time.

Figure 1.1: High-level depiction of how our suite of techniques impacts a DAS at varying levels of abstraction.

1.4 Organization of Dissertation

The remainder of this dissertation is organized as follows. Chapter 2 provides background information on enabling technologies used for our research, including DASs, goal-oriented requirements modeling, evolutionary computation, the RELAX requirements specification language, and software testing. Chapter 2 also presents background information on our case studies used to validate our techniques: intelligent vacuum systems and remote data mirroring. Next, Chapter 3 presents AutoRELAX and AutoRELAX-SAW, techniques for mitigating the effects of environmental uncertainty at the system requirements level. Then, Chapter 4 describes Fenrir, our technique for exploring the impact of environmental uncertainty on DAS behavior at the code level. Next, Chapter 5 introduces the MAPE-T feedback loop and describes the key elements of the feedback loop. Chapter 6 then describes a realization of MAPE-T with two techniques, Proteus and Veritas, that perform online adaptation of test suites and test cases, respectively, to ensure testing activities remain relevant to the operating context even as the DAS reconfigures and the environment changes over time. Following, Chapter 7 discusses the impact that a run-time testing framework can have on a DAS and also presents techniques for optimizing run-time testing. Chapter 8 then presents an end-to-end example where each of our techniques was applied to the RDM case study in a stepwise process. Finally, Chapter 9 presents our conclusions, summarizes the research contributions presented in this dissertation, and then discusses future directions for this line of work.
Chapter 2 Background and Application

This chapter provides relevant background information on the topics discussed within this dissertation: dynamically adaptive systems (DAS), smart vacuum systems (SVS), remote data mirroring (RDM) networks, requirements engineering, and evolutionary computation (EC). First, we overview the components of a DAS and its self-reconfiguration capabilities. Next, we describe the SVS and RDM applications, including the implementation details of each as they are used throughout this dissertation as motivating examples. Following, we describe requirements engineering from the perspective of system goal models and also overview the RELAX specification language. We then present EC and how it can be used to explore complex search spaces. Finally, we describe software testing and highlight common approaches for performing testing.

2.1 Dynamically Adaptive Systems

The state-space explosion of possible combinations of environmental conditions that a system may experience during execution precludes their total enumeration [21, 113]. Moreover, system requirements may change after initial release, thus potentially requiring a new software release or patch. A DAS provides an approach to continuously satisfy requirements by changing its configuration and behavior at run time to mitigate changes in its requirements or operating environment [68, 81]. As such, we consider a DAS to comprise a set of non-adaptive configurations connected by adaptive logic [114].

Figure 2.1 illustrates our approach for implementing a DAS. Specifically, the example DAS comprises n configurations (C1..n), each of which is connected by adaptive logic (A). Each configuration Ci satisfies requirements for a given operating context, and each path of adaptation logic defines the steps and conditions necessary to move from a source DAS configuration to a target DAS configuration (e.g., from C1 to C2).

Figure 2.1: Structure of dynamically adaptive system.

Some DASs are embedded systems and achieve dynamic adaptations via mode changes [77]. Mode changes can enable run-time adaptation in situations in which it is either not safe or practical to upload or offload software on executing systems, and thus, mode changes are required to capture the effects of dynamic reconfiguration. In particular, mode changes enable a DAS to self-reconfigure by selecting discrete modes of operation, where a mode is characterized by a particular configuration of system resources and parameters. For example, an autonomous wheeled robot may be characterized by different pathfinding modes and can transition from an exploration mode to a wall-following mode based on input from monitored sensors and a central controller.
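To make this structure concrete, the following minimal Python sketch (not taken from the dissertation's implementation; all configuration names, guards, and thresholds are hypothetical) represents a DAS as a set of named configurations connected by adaptive-logic edges, each carrying a guard over monitoring data and the reconfiguration steps needed to reach the target configuration.

    # Sketch of the structure in Figure 2.1: non-adaptive configurations
    # (C1..Cn) connected by adaptive logic. Names, guards, and the battery
    # threshold are illustrative placeholders, not the dissertation's code.

    class AdaptiveLogic:
        def __init__(self, source, target, guard, steps):
            self.source = source    # source configuration name (e.g., "C1")
            self.target = target    # target configuration name (e.g., "C2")
            self.guard = guard      # predicate over monitoring data
            self.steps = steps      # ordered reconfiguration actions

    class DAS:
        def __init__(self, configurations, adaptive_logic, initial):
            self.configurations = configurations  # {name: parameter settings}
            self.adaptive_logic = adaptive_logic  # list of AdaptiveLogic edges
            self.current = initial

        def monitor_and_adapt(self, monitoring_data):
            """Fire the first adaptive-logic edge whose guard holds."""
            for edge in self.adaptive_logic:
                if edge.source == self.current and edge.guard(monitoring_data):
                    for step in edge.steps:
                        step()                    # perform the transition steps
                    self.current = edge.target
                    break
            return self.current

    # Two configurations with a single adaptation triggered by low battery.
    das = DAS(
        configurations={"C1": {"power_mode": "normal"},
                        "C2": {"power_mode": "reduced"}},
        adaptive_logic=[AdaptiveLogic("C1", "C2",
                                      guard=lambda m: m["battery"] < 0.5,
                                      steps=[lambda: None])],
        initial="C1")
    print(das.monitor_and_adapt({"battery": 0.4}))   # -> C2

In this sketch, a mode change corresponds to following one adaptive-logic edge: the guard plays the role of the monitored condition and the steps capture the safe transition to the target configuration.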
2.2 Smart Vacuum System

This section describes the smart vacuum system (SVS) application that is used as a case study throughout this dissertation. First, we overview SVSs in general, present our derived goal model of the SVS, and then discuss our implementation of the SVS.

2.2.1 Overview of Smart Vacuum System

SVSs are currently available in the consumer market, with a notable example being iRobot’s Roomba (see http://www.irobot.com/). An SVS must clean a desired space by using sensor inputs to balance path planning, power conservation, and safety concerns. Common sensors available to an SVS include bumper sensors, motor sensors, object sensors, and internal sensors. Bumper sensors provide feedback when the robot collides with an object, such as a wall or a table. Motor sensors provide information regarding wheel velocities, suction speed, and power modes. Object sensors, for example infrared or camera sensors, can be used to detect and identify different types of entities near the SVS. Internal sensors provide feedback regarding sensor health or the overall state of the SVS. A robot controller processes data from each sensor to determine an optimal path plan, conserve battery power as necessary, avoid collisions with objects that may damage the SVS or the object itself (e.g., a pet or a child), and avoid objects which, if vacuumed, could damage the internal components of the SVS (e.g., liquid or large dirt particles).

Due to its relative level of sophistication, an SVS can also be modeled as an adaptive system [7, 8]. Specifically, the SVS can perform mode changes at run time [77] as a means to emulate the self-reconfiguration capabilities of a DAS. Each mode provides the SVS with the capability to select an optimal configuration of system parameters to properly mitigate uncertainties within the system and environment. An example of system uncertainty is noisy or untrustworthy sensor data, and an example of environmental uncertainty is the possibility that a liquid has been spilled in the room in which the SVS is operating. Possible SVS modes include different pathfinding algorithms, reduced power consumption modes, and obstacle avoidance measures. Each mode of operation can be configured in a different manner, leading to an explosion of possible configuration states.

Figure 2.2 presents a KAOS goal model for the SVS application. The SVS must successfully clean at least 50% of the small dirt particles within the room (A). To do so, the SVS must operate efficiently (B) to conserve battery power (F) while still providing both movement (E) and suction (G) capabilities. The SVS can operate in a normal power mode for speed (L) and suction (N), or lower its power consumption (F) by operating in a reduced power mode for speed (K) and/or suction (M). The SVS must also clean the room effectively (C) by selecting an appropriate path plan. The SVS can both clean and explore the room by selecting either a random (O) or straight (P) path for 10 seconds (H), or focus on a smaller area by selecting the 20 second (I) spiral (Q) path plan. Moreover, the SVS must also satisfy safety objectives (D). If a safety violation occurs, then the SVS must activate a failsafe mode (J). Safety violations include collisions with specific obstacles (e.g., pets or children), falling down stairs (R), or collisions with objects that can damage the SVS (S).

Figure 2.2: KAOS goal model of the smart vacuum system application.

2.2.2 Smart Vacuum System Implementation

This section describes the implementation of the SVS that is used throughout this dissertation. The SVS was configured as an autonomous robot that must efficiently, effectively, and safely vacuum dirt particles within a room while avoiding obstacles and objects that may cause harm to the SVS. Figure 2.3 provides a screenshot of the SVS simulation as implemented within the Open Dynamics Engine physics platform (see http://www.ode.org). The screenshot shows the SVS within a simulated room, containing small dirt particles (i.e., small dark cubes), large dirt particles (i.e., large dark cubes), a liquid spill (i.e., small yellow disc), and two obstacles (i.e., thin red pillars) that the SVS must avoid.
Figure 2.3: Screenshot of SVS simulation environment.

The SVS (i.e., large yellow disc) comprises a controller and a set of available sensors, including an array of bumper sensors (i.e., small gray spheres attached to the SVS), an array of cliff sensors (i.e., gray rectangles attached to the SVS), an object sensor, an internal sensor, and wheel and suction sensors. The bumper sensor array provides feedback upon contact with a wall or object. The cliff sensor array detects downward steps and is intended to prevent damage from falling. The object sensor detects the distance between the SVS and nearby objects and also provides information regarding the type of object that was detected. The internal sensor monitors the SVS to determine if any damage has occurred, such as vacuuming liquid or large dirt particles. Wheel and suction sensors provide feedback regarding the velocity and suction power of the wheel and suction motors, respectively, and also provide information regarding the health of each. Lastly, a controller aggregates the information from all sensors to determine an appropriate path plan for the robot to follow, as well as a power consumption plan to determine if the SVS must take measures to conserve battery power.

To illustrate our implementation, we now describe an example of the SVS performing mode changes. At the start of the simulation, the SVS selects a SPIRAL path plan that is to be executed for 20 seconds. Within 10 seconds, the cliff sensor detects that the SVS is near a downward step and therefore changes its mode to a cliff avoidance plan, pausing the SPIRAL path plan while the SVS begins to move in reverse, away from the step. The cliff avoidance mode runs for 5 seconds to ensure that the SVS has avoided the step, and then resumes the SPIRAL path plan mode for the remaining 5 seconds. At 20 seconds, the SVS selects a new path plan mode (e.g., STRAIGHT, 10 seconds) and begins to move in a straight line, as opposed to the spiraling path it followed previously. After executing for several minutes, the SVS’s internal sensor detects that the amount of available battery power is falling below a predefined threshold (e.g., 50% remaining). The SVS then selects a power conservation mode and reduces power to its wheels, slowing the overall velocity of the SVS while effectively extending its battery life. This process of mode changes reflects how an onboard system-based DAS adapts and selects new configurations at run time.
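The following simplified sketch illustrates the mode-change sequencing just described. The mode names, durations, and the 50% battery threshold follow the narrative above; the controller logic, the sensor dictionary, and the random selection of the next path plan are illustrative assumptions rather than the SVS's actual implementation.

    # Simplified sketch of SVS mode changes: path-plan rotation, cliff
    # avoidance, and battery conservation. Everything beyond the narrated
    # mode names, durations, and 50% threshold is an assumption.
    import random

    PATH_MODES = {"SPIRAL": 20, "STRAIGHT": 10, "RANDOM": 10}   # mode -> seconds

    class SVSController:
        def __init__(self):
            self.mode = "SPIRAL"
            self.remaining = PATH_MODES[self.mode]
            self.power_conservation = False

        def step(self, sensors, dt=1):
            # Safety first: a detected cliff pre-empts the current path plan.
            if sensors["cliff_detected"]:
                return "CLIFF_AVOIDANCE"        # reverse away from the step
            # Conserve power once the battery falls below the 50% threshold.
            if sensors["battery"] < 0.5:
                self.power_conservation = True  # reduce wheel/suction power
            # Select a new path-planning mode when the current one expires.
            self.remaining -= dt
            if self.remaining <= 0:
                self.mode = random.choice(list(PATH_MODES))
                self.remaining = PATH_MODES[self.mode]
            return self.mode

    controller = SVSController()
    print(controller.step({"cliff_detected": False, "battery": 0.8}))   # SPIRAL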
2.3 Remote Data Mirroring

This section describes the remote data mirroring (RDM) application that is used as a case study throughout this dissertation. First, we overview RDMs in general, present our derived goal model of the RDM, and then discuss our implementation of the RDM.

2.3.1 Overview of Remote Data Mirroring Application

RDM is a data protection technique for maintaining data availability and preventing data loss by storing copies (i.e., replicates) on servers (i.e., data mirrors) in physically remote locations [56, 60]. By replicating data on remote data mirrors, an RDM can provide continuous access to data and moreover ensure that data is not lost or damaged. In the event of an error or failure, data recovery can be facilitated by either requesting or reconstructing the lost or damaged data from another active data mirror. Additionally, the RDM network must replicate and distribute data in an efficient manner by minimizing consumed bandwidth and providing assurance that distributed data is not lost or corrupted.

The RDM can reconfigure at run time in response to uncertainty, including dropped or delayed messages and network link failures. Furthermore, each network link incurs an operational cost that directly impacts a controlling budget and also has a measurable throughput, latency, and loss rate. Collectively, these metrics determine the overall performance and reliability of the RDM. To mitigate unforeseen issues, the RDM can reconfigure in terms of its network topology and data mirroring protocols. Specifically, the RDM can selectively activate and deactivate network links to change its overall topology. Furthermore, each data mirror can select a remote data mirroring protocol, defined as either synchronous or asynchronous propagation. Synchronous propagation ensures that the receiving or secondary data mirror both receives and writes incoming data before completion at the primary or sending site. Batched asynchronous propagation collects updates at the primary site that are periodically transmitted to the secondary site. Given its complex and adaptive nature, the RDM application can be modeled and implemented as a DAS [89].

Figure 2.4 provides a KAOS goal model of the RDM application. Specifically, the RDM must maintain remotely stored copies of data (A). To satisfy this goal, the RDM must maintain operational costs within a fixed budget (B) while ensuring that the number of disseminated data copies matches the number of available servers (C). To satisfy Goal (B), the RDM must be able to measure all network properties (D) while ensuring that both the minimum number of network links are active (E) and that the network is unpartitioned (F). To satisfy Goal (C), the RDM must ensure that risk (G) and time for data diffusion (H) each remain within pre-defined constraints, and moreover, the cost of network adaptation must be minimized (I). To satisfy Goals (D) – (I), RDM agents, such as sensors and actuators, must be able to measure and effect all available network properties, respectively.
Figure 2.4: KAOS goal model of the remote data mirroring application.

2.3.2 Remote Data Mirroring Implementation

This section describes the implementation of the RDM application used for case studies within this dissertation. Specifically, we modeled the RDM network as a completely connected graph, where each node represents an RDM and each edge represents a network link. In total, the RDM network comprises 25 RDMs with 300 network links. Each link can be activated or deactivated, and while active, can be used to transfer data between RDMs. An operational model previously introduced by Keeton et al. [60] was used to determine performance attributes for each RDM and network link. The RDM application was simulated for 150 time steps. During each simulation, 20 data items were inserted into randomly selected RDMs at different times in the simulation. The selected RDMs were then responsible for distributing the data items to all other RDMs within the network.

The RDM network is subject to uncertainty throughout execution. For example, unpredictable network link failures and dropped or delayed messages can affect the RDM at any point during the simulation. In response, the network can self-reconfigure to move to a state that can properly mitigate these setbacks. To this end, each RDM implements the dynamic change management (DCM) protocol [63], as well as a rule-based adaptation engine, to monitor goal satisfaction and determine if a reconfiguration of topology or propagation method is required. Upon determining that self-reconfiguration is necessary, a target network configuration and set of reconfiguration steps are generated to ensure a safe transition to the new configuration.
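A minimal sketch of this network configuration is shown below, assuming placeholder attribute ranges for each link; it is not the operational model of Keeton et al. [60] nor the DCM-based adaptation engine, but it illustrates the fully connected 25-mirror topology, the 300 activatable links, and the per-mirror choice of propagation protocol.

    # Sketch of the RDM network configuration: 25 data mirrors, fully
    # connected by 300 links (25 * 24 / 2), each link either active or
    # inactive, and each mirror using synchronous or asynchronous
    # propagation. Attribute ranges are placeholders.
    import itertools
    import random

    NUM_MIRRORS = 25

    class Link:
        def __init__(self, a, b):
            self.endpoints = (a, b)
            self.active = True
            # Each link has an operational cost, throughput, latency, and loss rate.
            self.cost = random.uniform(1.0, 10.0)
            self.throughput = random.uniform(10.0, 100.0)   # e.g., Mb/s
            self.latency = random.uniform(0.01, 0.5)        # e.g., seconds
            self.loss_rate = random.uniform(0.0, 0.05)

    mirrors = {i: {"protocol": "asynchronous"} for i in range(NUM_MIRRORS)}
    links = [Link(a, b) for a, b in itertools.combinations(range(NUM_MIRRORS), 2)]
    assert len(links) == 300    # completely connected graph

    # A reconfiguration changes the topology (activate/deactivate links) or
    # the propagation protocol of individual mirrors.
    links[0].active = False
    mirrors[3]["protocol"] = "synchronous"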
2.4 Requirements Engineering

This section presents background information on goal-oriented requirements engineering, goal-oriented requirements modeling, and the RELAX specification language.

2.4.1 Goal-Oriented Requirements Engineering

An integral part of software engineering is eliciting, analyzing, and documenting the objectives, constraints, and assumptions required for a system-to-be to solve a specific problem [105]. In the 4-variable model proposed by Jackson and Zave [55], the problem to be solved by the system-to-be exists within some organizational, technical, or physical context. As a result, the system-to-be shares a boundary with the area surrounding the problem, interacting with that world and its stakeholders. Consequently, the system-to-be must monitor and control parts of this shared boundary to solve the problem.

Goal-oriented requirements engineering (GORE) extends the 4-variable model with the concept of a goal. Specifically, a goal guides the elicitation and analysis of system requirements based upon key objectives of the system-to-be. Furthermore, the goal must capture stakeholder intentions, assumptions, and expectations [105]. Several types of goals exist: a functional goal declares a service that a system-to-be must provide to its stakeholders; a non-functional goal imposes a quality constraint upon delivery of those services; a safety goal is concerned with critical safety properties of a system-to-be; and a failsafe goal ensures that the system-to-be has a fallback state in case of critical error. Additionally, a functional, safety, or failsafe goal may be declared invariant (i.e., must always be satisfied; denoted by the keyword “Maintain” or “Avoid”) or non-invariant (i.e., can be temporarily unsatisfied; denoted by the keyword “Achieve”). Goals may also be satisficed, or satisfied to a certain degree, throughout execution [24].

2.4.2 Goal-Oriented Requirements Modeling

The GORE process gradually decomposes high-level goals into finer-grained subgoals [105], where the semantics of goal decomposition are captured graphically by a directed acyclic graph. Each node within the graph represents a goal and each edge represents a goal refinement. Figure 2.5 presents a KAOS goal model [25, 105] based on the SVS that is used as a case study throughout this dissertation. KAOS depicts goals and refinements as parallelograms with directed arrows that point towards the higher-level (i.e., parent) goals. KAOS also supports AND/OR refinements, where an AND-decomposition is satisfied only if all its subgoals are also satisfied, and an OR-decomposition is satisfied if at least one of its subgoals is satisfied. Generally, AND-refinements capture objectives that must be performed in order to satisfy the parent goal, and OR-refinements provide alternative paths for satisfying a particular goal. Goal decomposition continues until each goal has been assigned to an agent capable of achieving that goal. An agent represents an active system component that restricts its behavior to fulfill leaf-level goals (i.e., requirements/expectation goals) [105]. There are two types of agents: system and environmental. A system agent is an automated component controlled by the system-to-be, and an environmental agent is often a human or some component that cannot be controlled by the system-to-be.

For example, Figure 2.5 presents a partial KAOS goal model that defines a subset of the high-level goals for the SVS. In particular, Goal (B) is decomposed into Goals (E), (F), and (G) via an AND-decomposition; Goal (B) is satisfied if and only if Goals (E), (F), and (G) are also satisfied. Furthermore, Goal (G) is decomposed via an OR-decomposition into Goals (M) and (N), implying that Goal (G) is satisfied if at least one subgoal is also satisfied. Leaf-level goals (K), (L), (M), and (N) represent low-level requirements or expectations. Lastly, the hexagonal objects (e.g., Motors, Battery Sensor, Vacuum) represent agents that must achieve the leaf-level goals.

Figure 2.5: Partial KAOS goal model of smart vacuum system application.
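The following sketch illustrates how satisfaction propagates through such AND/OR refinements, using only the decompositions stated above for Goals (B) and (G); for simplicity, Goals (E) and (F) are treated as directly evaluable leaves and satisfaction is boolean, which abstracts away the satisficement degrees discussed earlier.

    # Sketch of AND/OR goal satisfaction for the decompositions stated above:
    # Goal (B) is AND-decomposed into (E), (F), (G); Goal (G) is OR-decomposed
    # into (M) and (N). Leaf satisfaction values are illustrative only.

    GOALS = {
        "B": ("AND", ["E", "F", "G"]),   # Achieve [Cleaning Efficiency]
        "G": ("OR",  ["M", "N"]),        # Maintain [Suction]
        "E": ("LEAF", None),             # Achieve [Movement] (simplified to a leaf)
        "F": ("LEAF", None),             # Achieve [BatteryPower > 5%]
        "M": ("LEAF", None),             # Achieve [Reduced Suction]
        "N": ("LEAF", None),             # Achieve [Normal Suction]
    }

    def satisfied(goal, leaf_status):
        kind, subgoals = GOALS[goal]
        if kind == "LEAF":
            return leaf_status[goal]
        children = [satisfied(g, leaf_status) for g in subgoals]
        return all(children) if kind == "AND" else any(children)

    # Reduced suction holds but normal suction does not: (G) is still satisfied,
    # so (B) holds as long as (E) and (F) hold.
    print(satisfied("B", {"E": True, "F": True, "M": True, "N": False}))   # True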
2.4.3 RELAX Specification Language
RELAX [21, 113] is a requirements specification language used to identify and assess sources of uncertainty. RELAX declaratively specifies the sources and impacts of uncertainty at the shared boundary between the execution environment and the system-to-be [55]. A requirements engineer organizes this information into three distinct elements: ENV, MON, and REL. ENV defines environmental properties that can be observed by the DAS's monitoring infrastructure. MON specifies the elements that are available within the monitoring infrastructure. REL defines the method to compute a quantifiable value of ENV properties from their corresponding MON elements.

Figure 2.5: Partial KAOS goal model of smart vacuum system application (goals (A) through (N) with the Motors, Battery Sensor, and Vacuum agents).

The semantics of RELAX operators are defined in terms of fuzzy logic and specify the extent to which a non-invariant goal can be temporarily unsatisfied at run time [113]. Table 2.1 describes the intent of each RELAX operator. For instance, in Figure 2.5, the RELAX operator AS CLOSE AS POSSIBLE TO 0.05 can be applied to Goal (F) to allow flexibility in the remaining amount of battery power for the system. Figure 2.6 gives the fuzzy logic membership functions that have been implemented for RELAX and used within this dissertation. Specifically, the AS EARLY AS POSSIBLE and AS FEW AS POSSIBLE operators use the left shoulder function from Figure 2.6(A). The AS LATE AS POSSIBLE and AS MANY AS POSSIBLE operators use the right shoulder function from Figure 2.6(B). Lastly, the AS CLOSE AS POSSIBLE TO operators use the triangle function from Figure 2.6(C).

Table 2.1: RELAX operators [113].
RELAX Operator | Informal Description | Fuzzy-logic Membership Function
AS EARLY AS POSSIBLE φ | φ becomes true as close to the current time as possible. | Left shoulder
AS LATE AS POSSIBLE φ | φ becomes true as close to time t = ∞ as possible. | Right shoulder
AS CLOSE AS POSSIBLE TO [frequency φ] | φ is true at periodic intervals as close to frequency as possible. | Triangle
AS FEW AS POSSIBLE φ | The value of a quantifiable property φ is as close as possible to 0. | Left shoulder
AS MANY AS POSSIBLE φ | The value of a quantifiable property φ is maximized. | Right shoulder
AS CLOSE AS POSSIBLE TO [quantity φ] | The value of a quantifiable property φ approximates a desired target value quantity. | Triangle

Figure 2.6: Fuzzy logic membership functions: (A) left shoulder function, (B) right shoulder function, and (C) triangle function, each defined over a measured property with a desired value and minimum/maximum allowed values.

Cheng et al. [21] previously proposed a manual approach for applying RELAX operators to KAOS non-invariant goals. A requirements engineer must first specify the ENV, MON, and REL elements necessary to define the sources of uncertainty within the operational context. Next, each goal must be designated as either invariant or non-invariant, where invariant goals are precluded from RELAXation. For each non-invariant goal, the engineer must determine whether any of the defined sources of uncertainty can cause the goal to become unsatisfied. The engineer then applies an appropriate RELAX operator to restrict how the particular goal may be temporarily violated.
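As an illustration of how these membership functions could be realized, the following Python sketch (illustrative only; the function names and boundary parameters are assumptions, not the dissertation's implementation) evaluates a measured property against piecewise-linear left shoulder, right shoulder, and triangle functions in the spirit of Figure 2.6.

```python
# Minimal sketch of the RELAX fuzzy membership functions (Figure 2.6),
# assuming simple piecewise-linear shapes; parameter names are illustrative.

def left_shoulder(x, desired, max_allowed):
    """1.0 at or below the desired value, falling linearly to 0.0 at max_allowed."""
    if x <= desired:
        return 1.0
    if x >= max_allowed:
        return 0.0
    return (max_allowed - x) / (max_allowed - desired)

def right_shoulder(x, min_allowed, desired):
    """0.0 at or below min_allowed, rising linearly to 1.0 at the desired value."""
    if x >= desired:
        return 1.0
    if x <= min_allowed:
        return 0.0
    return (x - min_allowed) / (desired - min_allowed)

def triangle(x, min_allowed, desired, max_allowed):
    """Peaks at the desired value and falls off linearly toward both endpoints."""
    if x <= desired:
        return right_shoulder(x, min_allowed, desired)
    return left_shoulder(x, desired, max_allowed)

# e.g., AS FEW AS POSSIBLE over network partitions, tolerating up to 3:
print(left_shoulder(1, desired=0, max_allowed=3))   # ~0.67
```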
Even for a modest-sized goal model with minimal sources of uncertainty, many combinations of RELAXed goals are possible, thereby necessitating an automated approach for applying RELAX operators [42, 88].

2.5 Evolutionary Computation
Evolutionary computation (EC) comprises a family of stochastic, search-based techniques that are considered to be a sub-field of artificial and computational intelligence [46, 57]. Genetic algorithms [53], genetic programming [62], evolutionary strategies [96], digital evolution [80], and novelty search [64] are different approaches to EC. These techniques are generally used to solve problems that have a large solution space, such as complicated optimization or search problems within software engineering [20, 42, 49, 50, 51, 52], robotics [62], and land use management [23]. EC techniques typically implement evolution by natural selection as a means of guiding the search process towards optimal areas within the solution space.
To implement an EC approach, the parameters that comprise a candidate solution must be fully defined. These parameters must be encoded into a data structure that enables the operations specified by the corresponding algorithm [46] to facilitate evolutionary search. As EC is grounded in Darwinian evolution, the use of biologically-inspired terms is relevant. In particular, a gene represents an element or parameter directly manipulable by the evolutionary algorithm. A set of genes is known as a genome and represents a candidate solution or individual. Lastly, the set of genomes represents a population. Evolutionary operations, such as crossover, mutation, and selection, can be applied to a genome over a number of generations, or iterations, to simulate evolution.

2.5.1 Genetic Algorithms
A genetic algorithm [53] is a stochastic, search-based heuristic grounded in EC that can be used to explore the space of possible solutions for complex optimization problems. A genetic algorithm typically represents each possible solution (i.e., individual) in an encoded fashion that is amenable to manipulation and evaluation. For example, Table 2.2 illustrates a typical genetic algorithm encoding, wherein a vector of numbers comprises the parameters necessary to represent a particular solution. The data within this table represent k candidate solutions with an encoding of length n. Each parameter value is represented as a floating point number and directly corresponds to a feature within the candidate solution. Candidate solutions can be encoded in many other formats as well, including bit strings and variable-length vectors.

Table 2.2: Sample genetic algorithm encoding.
Solution | Parameter A | Parameter B | ... | Parameter n
Individual_1 | 0.2 | 1.5 | ... | p_n1
Individual_2 | 0.4 | 1.2 | ... | p_n2
... | ... | ... | ... | ...
Individual_k | p_Ak | p_Bk | ... | p_nk
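A vector encoding of this form can be represented directly as a list of floating-point values; the following Python sketch (illustrative only; the parameter bounds are assumptions) generates a random population matching the shape of Table 2.2.

```python
import random

# Minimal sketch of the encoding in Table 2.2: each individual is a vector of
# n floating-point parameters, and a population holds k such vectors.

def random_individual(n, low=0.0, high=5.0):
    """Create one candidate solution with n randomized parameter values."""
    return [random.uniform(low, high) for _ in range(n)]

def initial_population(k, n):
    """Create the initial population of k randomized individuals."""
    return [random_individual(n) for _ in range(k)]

population = initial_population(k=20, n=5)
print(population[0])   # e.g., a [0.2, 1.5, 0.8, 0.2, 3.2]-style vector
```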
A set of candidate solutions must undergo an evolutionary process to guide the search towards an optimal solution. To do so, a set of evolutionary operations is executed upon the population until a termination criterion, typically a specific number of generations, is met. These operations include population generation, crossover, mutation, and fitness evaluation with respect to predefined fitness criteria. Each of these evolutionary operations is next described in turn.

2.5.1.1 Population Generation
At the beginning of the genetic algorithm, a population of individuals with completely randomized parameter values is generated. Throughout each successive generation, new individuals are created via the crossover and mutation operations. At the end of each generation, the highest-performing, or elite, individuals may be retained to protect the "successful" individuals that represent a particular area of the solution space.

2.5.1.2 Crossover
The crossover operation creates new individuals by combining genetic information from two existing individuals. Crossover is commonly applied via one-point or two-point crossover. As demonstrated by Figure 2.7(A), one-point crossover selects a random point within the genome and exchanges the genes that follow that point between the two parents, thereby creating two new children. Two-point crossover, shown in Figure 2.7(B), selects two random points and exchanges the genes that lie between those points with another individual, thereby also creating two new children.

Figure 2.7: Examples of (A) one-point and (B) two-point crossover.

2.5.1.3 Mutation
The mutation operation creates a new individual by selecting a random gene from a random individual and mutating the selected gene. Mutation provides an approach for exploring areas of the solution space that may not otherwise have been explored by the genetic algorithm. Figure 2.8 provides an example of mutation, where a gene from the parent individual is randomly mutated to create a new child individual.

Figure 2.8: Example of mutation.

2.5.1.4 Fitness Evaluation
Each candidate solution's fitness must be calculated to determine which areas of the solution space should be explored by retaining the highest-performing individuals. Specifically, predefined fitness criteria are applied to each candidate solution within the population, thereby quantifying each individual's performance. If multiple fitness functions are necessary due to domain constraints, then the fitness sub-functions can be combined to form a single aggregate fitness function. One approach for combining fitness sub-functions uses a linear weighted sum, where each weight provides a metric for the relative impact of each sub-function.
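The following Python sketch (illustrative only; the example vectors mirror Figures 2.7 and 2.8, and the mutation range is an assumption) shows how one-point crossover, two-point crossover, and single-gene mutation could operate on the vector encoding described above.

```python
import random

# Minimal sketches of the GA variation operators over float-vector genomes.

def one_point_crossover(p1, p2):
    """Exchange the genes following a random point between two parents."""
    point = random.randrange(1, len(p1))
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def two_point_crossover(p1, p2):
    """Exchange the genes lying between two random points."""
    a, b = sorted(random.sample(range(1, len(p1)), 2))
    return (p1[:a] + p2[a:b] + p1[b:],
            p2[:a] + p1[a:b] + p2[b:])

def mutate(parent, low=0.0, high=5.0):
    """Replace one randomly chosen gene with a new random value."""
    child = list(parent)
    child[random.randrange(len(child))] = random.uniform(low, high)
    return child

child1, child2 = one_point_crossover([1.0, 1.5, 0.8, 0.2, 3.2],
                                     [0.9, 1.4, 0.9, 0.3, 3.4])
```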
2.5.2 Stepwise Adaptation of Weights
Fitness sub-functions calculate a quantifiable performance metric for specific concerns, the aggregation of which forms an overall fitness value. Adding a weighting coefficient to each sub-function in turn determines the relative importance, or impact, of that particular sub-function on the aggregate fitness value (i.e., a higher weight gives preference to a particular fitness sub-function in the overall fitness) [31]. However, the definition of weight values often depends upon the domain knowledge of the DAS engineer or is based on observed or calculated metrics. As a result, an automated approach to calculating fitness sub-function weights is necessary.
The stepwise adaptation of weights (SAW) [31, 104] is a hyper-heuristic [13] for optimizing the weighting coefficients of fitness sub-function equations. Specifically, the weights are adjusted over time and the aggregate fitness value is monitored to determine if a particular combination of weights improves overall fitness. This approach can be implemented either offline or online. An offline SAW implementation adjusts sub-function weights following execution of an evolutionary algorithm (EA), whereas an online SAW implementation adjusts weights dynamically throughout execution of the EA. For the purposes of this dissertation, we have used the online SAW approach [104], as it provides a fast and efficient method for optimizing fitness sub-function weights. Within online SAW, weights are seeded with values identified as optimal by a requirements engineer. Throughout execution, feedback from the controlling EA is analyzed periodically to determine which weight requires adjustment. The fitness sub-function that performs the worst (i.e., yields the lowest fitness value) has its associated weight increased to give preference to that particular fitness sub-function. The remaining weights are then normalized to ensure that all weights sum to 1.0.

2.5.3 Novelty Search
Novelty search [64] is a branch of genetic algorithms that explicitly searches for unique solutions, as opposed to optimal solutions. The intention of this approach is to avoid converging to a locally optimal solution and instead explore the entire solution space to find a set of solutions that may contain a globally optimal solution. Novelty search often replaces or augments the traditional fitness function with a novelty function that implements a distance metric, such as Euclidean distance [10], to determine the distance between a candidate solution and its nearest neighbors in the solution space. A novelty archive tracks the candidate solutions considered to be the most novel throughout the evolutionary process and is used to guide the search towards unexplored areas within the solution space.

2.5.4 (1+1)-ONLINE Evolutionary Algorithm
Generally, EC is a computationally intensive technique that is typically performed offline. However, complex optimization problems with an enormous solution space can also arise at run time, thereby necessitating an online technique to solve them. The (1+1)-ONLINE EA [12] is a lightweight approach for providing run-time evolution on systems with limited processing capabilities. The (1+1)-ONLINE EA sacrifices searching power for performance by maintaining a population of only two individuals: a parent and a child. In this approach, the parent individual is evaluated for a set amount of time. At the end of its allotted time, the child individual asserts control of the system and is then evaluated for the same amount of time. Upon completion of its execution cycle, the performance of the parent and child are compared, with the higher-performing individual retained for the following generation. A new child is then created by mutating the retained individual, and the process is repeated until either a satisfactory individual is found or a specified number of generations has occurred.
The (1+1)-ONLINE EA can also search locally or globally within the solution space by specifying a mutation parameter σ. Upon evaluation of each individual, the (1+1)-ONLINE EA determines if fitness has stagnated (i.e., shows no significant increase in value), where stagnation implies that the search procedure is trapped within a local optimum. The mutation parameter σ determines the severity of mutation, where higher severities push the search into other regions of the solution space and lower severities explore nearby regions. By varying σ as necessary, the search process has a better chance of avoiding convergence to a non-optimal solution.
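The control loop just described can be summarized with the following Python sketch (a simplification under assumed names such as evaluate and mutate, with an illustrative stagnation rule for adjusting σ; it is not the algorithm's reference implementation).

```python
import random

# Minimal sketch of a (1+1)-ONLINE EA loop. `evaluate` stands in for running an
# individual on the live system for its allotted time span.

def mutate(genome, sigma):
    """Gaussian mutation whose severity is controlled by sigma."""
    return [g + random.gauss(0.0, sigma) for g in genome]

def one_plus_one_online(evaluate, genome, generations=100,
                        sigma=0.1, stagnation_eps=1e-3):
    parent, parent_fit = genome, evaluate(genome)
    for _ in range(generations):
        child = mutate(parent, sigma)
        child_fit = evaluate(child)            # the child controls the system next
        if child_fit >= parent_fit:
            if child_fit - parent_fit < stagnation_eps:
                sigma *= 2.0                   # stagnating: search farther away
            else:
                sigma = max(sigma / 2.0, 0.01) # improving: refine nearby regions
            parent, parent_fit = child, child_fit
        else:
            sigma *= 2.0                       # child worse: try to escape local optimum
    return parent, parent_fit
```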
2.6 Software Testing
Software testing is an assurance technique that determines whether a software system is operating according to its requirements or design specification [72]. Many techniques exist for performing software testing at design time [9], as well as approaches for extending common testing techniques to the search-based and evolutionary domain [52]. A subset of the most common techniques [14, 82] includes structural testing, functional testing, unit testing, integration testing, and regression testing. Each of these techniques is next described in turn.

2.6.1 Structural Testing
Structural testing, typically known as white-box testing, is often concerned with validating coverage metrics such as branch or data flow coverage. Test engineers generally have access to the source code or other abstractions of the system implementation to target implementation-specific aspects for verification and validation. For example, a test engineer can write a set of test cases that triggers different branches of an if-else construct to ensure that each branch is reachable within the program.

2.6.2 Functional Testing
Functional testing, otherwise known as black-box testing, generally validates a system based solely on a test specification. In this approach, a test engineer does not rely on knowledge of the inner workings of the system. At a high level, functional testing is concerned with verifying that defined features function correctly within a software implementation. For example, a functional test case could be defined to validate that an autonomous robot successfully actuates a motor to a particular velocity.

2.6.3 Unit Testing
Unit testing is concerned with testing individual source code modules and is often considered the "smallest" test that can be performed when verifying a system. Moreover, unit tests are generally performed by the software developer to ensure that each code module performs according to key objectives or requirements, thereby catching bugs early in the software development cycle. For example, a unit test verifies that the output of a particular function matches a pre-defined expected value.

2.6.4 Regression Testing
Regression testing is an assurance technique for verifying a system following a change in software or operating context. Specifically, regression testing validates that the system continues to satisfy previously validated requirements. For example, a regression test ensures that a new software release does not introduce any new issues or side effects against a previously executed test case.
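As a small illustration of the unit and regression testing styles described above, the following Python sketch (the module under test and its expected values are hypothetical) checks a function's output against pre-defined expectations so that the same suite can be re-run after every software change.

```python
import unittest

# Hypothetical unit under test: a helper that clamps a utility value into [0.0, 1.0].
def clamp_utility(value):
    return max(0.0, min(1.0, value))

class ClampUtilityTests(unittest.TestCase):
    def test_within_range_is_unchanged(self):
        self.assertEqual(clamp_utility(0.42), 0.42)

    def test_out_of_range_is_clamped(self):
        self.assertEqual(clamp_utility(1.7), 1.0)
        self.assertEqual(clamp_utility(-0.3), 0.0)

if __name__ == "__main__":
    # Re-running this suite after each change serves as a simple regression check.
    unittest.main()
```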
Chapter 3
Addressing Requirements-Based Uncertainty
This chapter introduces our techniques for automatically providing requirements-based assurance for a DAS. Specifically, we explore how RELAX operators can be automatically applied to a goal model using AutoRELAX and how the fitness sub-function weights that guide the creation of automatically RELAXed goal models can be optimized with AutoRELAX-SAW [42, 88]. First, we motivate the need for requirements-based assurance. Next, we present AutoRELAX and AutoRELAX-SAW. We then present two empirical studies across different application domains for each technique. Lastly, we describe related work, summarize our findings, and propose future directions for this research.

3.1 Motivation
Uncertainty experienced by the DAS at run time can adversely affect its ability to satisfy requirements. While uncertainty can be mitigated by performing a self-reconfiguration, introducing adaptations can incur operational costs in both preparing for and executing the adaptation. To this end, RELAX [21, 113] has been introduced to provide flexibility in requirements satisfaction as a means to reduce the number of adaptive reconfigurations. However, the number of possible combinations for defining and applying RELAX operators to a goal model increases exponentially with the number of non-invariant goals and sources of uncertainty, thereby necessitating an automated approach. For example, the RDM goal model (cf. Figure 2.4) comprises 21 non-invariant goals, each of which may have one of the 6 presented RELAX operators applied (cf. Table 2.1). Moreover, each RELAX operator can be designated as active or inactive when applied to the goal model. As such, there exist $12^{21}$ (approximately $4.6 \times 10^{22}$) possible goal models comprising combinations of applied RELAX operators that must be examined. To enable exploration of this enormous search space, we introduce AutoRELAX and AutoRELAX-SAW.
AutoRELAX and AutoRELAX-SAW support our overarching objective of providing assurance by exploring how RELAX operators can provide flexibility at the requirements level. Specifically, AutoRELAX automatically generates RELAXed goal models while minimizing both the number of DAS adaptations and the number of RELAXations needed to enable the DAS to temporarily tolerate adverse uncertainty. AutoRELAX-SAW automatically balances the competing concerns of minimizing adaptations and minimizing applied RELAX operators by tailoring fitness sub-function weights to the operating context being experienced by the DAS.

3.2 Introduction to AutoRELAX
In this section, we present AutoRELAX, an evolutionary approach for automatically applying RELAX operators to a goal model to mitigate system and environmental uncertainty at the requirements level. First, we introduce the assumptions that are needed to implement and apply AutoRELAX to a given problem. Next, we describe the inputs and the expected outputs of AutoRELAX. Following, we describe each step of the AutoRELAX approach in detail. Lastly, we describe how SAW, a hyper-heuristic for optimizing the weights of a set of fitness sub-functions, can be applied to AutoRELAX to tailor fitness sub-functions to a particular environment, thereby yielding a higher overall fitness value.
Thesis statement: It is possible to automatically introduce flexibility into system requirements to mitigate system and environmental uncertainty.

3.2.1 Assumptions, Inputs, and Outputs
In this section, we present the required assumptions for implementing AutoRELAX, as well as the necessary inputs and expected outputs.

3.2.1.1 Assumptions
Three assumptions must hold true for AutoRELAX to successfully generate an optimized set of RELAXed goal models that mitigate different combinations of system and environmental configurations. In particular, we assume that:
• System and environmental data monitored by the DAS is accessible to AutoRELAX's genetic algorithm.
• The requirements engineer has identified a representative set of system and environmental uncertainties.
• The requirements engineer has constructed a KAOS goal model that is representative of the DAS requirements specification.

3.2.1.2 Inputs and Outputs
AutoRELAX requires three elements as input. First, a goal model of the DAS must be provided to capture high-level requirements.
Next, a set of utility functions must be derived to provide a quantifiable measure for requirements monitoring [28, 87, 109]. Lastly, an executable specification or prototype of the DAS is required to simulate different combinations of system and environmental conditions. A set of RELAXed goal models is provided as output. Each input and output is next described in detail.

3.2.1.2.1 Goal Model
AutoRELAX requires as input a KAOS goal model [25, 105] of the DAS to capture the functional requirements of the system and resolve obstacles to goal satisfaction. Goals within a KAOS goal model can be specified as either invariant (i.e., cannot be violated) or non-invariant (i.e., can be temporarily unsatisfied without adverse consequences). AutoRELAX can then introduce RELAX operators to non-invariant goals, as invariant goals must always be satisfied and are therefore precluded from RELAXation.

3.2.1.2.2 Utility Functions
Utility functions provide an approach for quantifying and measuring the run-time satisfaction of software requirements [28, 87, 109]. Specifically, utility functions yield a quantifiable metric that a DAS can use to determine if its goals are being satisfied. Each utility function comprises a mathematical relationship that maps data monitored by the DAS to a scalar value between 0.0 and 1.0. Utility functions can be derived manually [109], automatically by using a goal model [87], or statistically inferred based upon observed DAS behaviors [28]. For example, the satisfaction of Goal (B) in the RDM goal model presented in Figure 2.4 is evaluated with a utility function that returns 1.0 if accrued operational costs remain less than or equal to a pre-determined budget, and 0.0 otherwise. AutoRELAX leverages utility functions to evaluate how applied RELAXations either positively or negatively impact DAS behavior at run time.
For this research, we manually derived the utility functions based upon the RDM goal model (see Figure 2.4). Specifically, the utility functions were derived according to the Athena approach previously developed by Ramirez and Cheng [87]. Athena uses a goal model and a mapping of environmental conditions to the elements that monitor those conditions to derive either state-, metric-, or fuzzy logic-based utility functions. State-based utility functions monitor goals that are Boolean in nature, while metric- and fuzzy logic-based utility functions monitor the satisficement of goals. A RELAX operator is then used to specify an appropriate membership function for fuzzy logic goals to derive a utility function. In the context of the RDM (cf. Figure 2.4), a state-based utility function may determine whether invariant Goal (A) is satisfied by monitoring whether data is always accessible within the network; a metric-based utility function may measure the satisficement of non-invariant Goal (F) to determine how many network partitions have been introduced; and a fuzzy logic-based utility function can determine the satisficement of a goal to which a RELAX operator has been applied. For instance, Goal (C) can be RELAXed with the AS CLOSE AS POSSIBLE TO operator to provide flexibility in the number of data copies with respect to the number of servers.
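The following Python sketch (illustrative only; the goal labels, thresholds, and monitored quantities are assumptions rather than the dissertation's derived functions) shows the three utility function styles just described for the RDM.

```python
# Minimal sketches of state-, metric-, and fuzzy logic-based utility functions.
# Thresholds and monitored values are illustrative assumptions.

def state_utility_data_accessible(data_accessible: bool) -> float:
    """State-based: an invariant goal is either satisfied (1.0) or not (0.0)."""
    return 1.0 if data_accessible else 0.0

def metric_utility_partitions(num_partitions: int, worst_case: int = 10) -> float:
    """Metric-based: satisficement degrades as more partitions appear."""
    return max(0.0, 1.0 - num_partitions / worst_case)

def fuzzy_utility_data_copies(num_copies: int, num_servers: int,
                              tolerance: int = 5) -> float:
    """Fuzzy logic-based (AS CLOSE AS POSSIBLE TO): triangle function centered
    on the desired value, here the number of servers."""
    deviation = abs(num_copies - num_servers)
    return max(0.0, 1.0 - deviation / tolerance)

print(fuzzy_utility_data_copies(num_copies=23, num_servers=25))  # 0.6
```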
3.2.1.2.3 Executable Specification
An executable specification or prototype of the DAS is used by AutoRELAX to evaluate the effects of different combinations of system and environmental conditions on DAS behavior. First, a requirements engineer identifies sources of uncertainty that can affect the DAS at run time, and the executable specification is then initialized with those uncertainties. For instance, the RDM can be initialized according to the probability that certain network links will fail at any given point during the simulation. Varying the sources of uncertainty, including their likelihood of occurrence and impact, will ideally trigger the DAS to self-reconfigure, thereby leading to different types of RELAXed goal models. The executable specification then uses the input utility functions to evaluate the satisfaction of requirements at run time.

3.2.1.2.4 RELAXed Goal Models
Given this information, AutoRELAX generates a set of RELAXed goal models that capture optimal configurations of applied RELAXations. Each goal model in the output set represents a combination of goal RELAXations that maximizes run-time goal satisfaction for a particular set of system and environmental conditions.

3.2.2 AutoRELAX Approach
This section describes the AutoRELAX approach. Figure 3.1 presents a data flow diagram (DFD) that overviews AutoRELAX. We now present each step in detail.

Figure 3.1: Data flow diagram of AutoRELAX process.

(1) Define Solution Structure: First, an encoding must be specified for candidate solutions. Each solution comprises a vector of n elements, or genes, where n is equal to the total number of non-invariant goals specified by the input goal model. Figure 3.2(A) presents the gene structure. Specifically, each gene comprises a Boolean variable that specifies whether the corresponding non-invariant goal is to be RELAXed, a RELAX operator (see Table 2.1), and two floating-point values that specify the left and right boundaries of the fuzzy logic membership function, respectively. Figure 3.2(B) illustrates how each gene is then mapped to its corresponding fuzzy logic function to evaluate the satisfaction of that particular goal. This example maps Goal (F) from Figure 2.4 to a left shoulder fuzzy logic membership function, corresponding to the AS FEW AS POSSIBLE RELAX operator. This function returns 1.0 if the network is connected (i.e., the number of network partitions equals zero) and 0.0 otherwise. However, as long as a network partition is transient (i.e., minor and temporary), data can still be diffused among connected data mirrors. When the RDM contains between 0 and 3 transient network partitions, the downward slope from the apex to the right endpoint reflects values that may be temporarily tolerated at run time, yielding a quantifiable satisfaction value between 1.0 and 0.0.

Figure 3.2: Encoding a candidate solution in AutoRELAX: (A) the AutoRELAX solution encoding (e.g., RELAXed = True, operator = AS FEW AS POSSIBLE, left endpoint = 0.0, right endpoint = 3.0) and (B) mapping a gene to a RELAXed goal over the number of network partitions.
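A minimal Python sketch of this encoding (field and operator names follow Figure 3.2 and Table 2.1, while the boundary ranges and data structure are assumptions rather than the dissertation's implementation) could look as follows.

```python
import random
from dataclasses import dataclass

# Minimal sketch of the AutoRELAX genome: one gene per non-invariant goal.

RELAX_OPERATORS = [
    "AS EARLY AS POSSIBLE", "AS LATE AS POSSIBLE",
    "AS CLOSE AS POSSIBLE TO [frequency]",
    "AS FEW AS POSSIBLE", "AS MANY AS POSSIBLE",
    "AS CLOSE AS POSSIBLE TO [quantity]",
]

@dataclass
class Gene:
    relaxed: bool          # is the non-invariant goal RELAXed?
    operator: str          # which RELAX operator applies
    left_endpoint: float   # left boundary of the membership function
    right_endpoint: float  # right boundary of the membership function

def random_genome(num_noninvariant_goals):
    """One candidate RELAXed goal model: a gene per non-invariant goal."""
    genome = []
    for _ in range(num_noninvariant_goals):
        left = random.uniform(0.0, 2.0)
        genome.append(Gene(relaxed=random.random() < 0.5,
                           operator=random.choice(RELAX_OPERATORS),
                           left_endpoint=left,
                           right_endpoint=left + random.uniform(0.5, 3.0)))
    return genome

genome = random_genome(21)   # e.g., the 21 non-invariant goals of the RDM model
```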
(2) Configure Search Process: The genetic algorithm must then be configured by a requirements engineer. Specifically, a population size, crossover and mutation rates, and a termination criterion must be defined. The population size determines how many candidate RELAXed goal models are available in each generation; the crossover and mutation rates specify how AutoRELAX generates new goal models; and the termination criterion defines a stopping point for AutoRELAX, typically represented as a generational limit.

(3) Evaluate RELAXed Models: Each RELAXed goal model is evaluated based on the performance of the DAS throughout execution, where performance is measured by the satisfaction of invariant goals and the fitness sub-functions defined in Equations (3.1) and (3.2). Minimizing the number of DAS adaptations and the number of RELAXed goals is also emphasized within the fitness calculation. To evaluate a RELAXed goal model, AutoRELAX first maps the RELAX operators encoded in an individual goal model to their corresponding utility functions based on Step (1). AutoRELAX then executes the prototype and records the satisfaction of each goal and the number of adaptations performed by the DAS throughout the simulation. Two fitness sub-functions consume this information to emphasize individuals that minimize the number of introduced RELAXations and the number of run-time DAS reconfigurations.
The first fitness sub-function, FF_nrg, rewards candidate individuals that minimize the number of RELAXed goals. The purpose of this fitness sub-function is to limit an excess of unnecessary flexibility provided by the controlling goal model. FF_nrg is defined as follows:

$$FF_{nrg} = 1.0 - \frac{|relaxed|}{|Goals_{\text{non-invariant}}|} \qquad (3.1)$$

where |relaxed| and |Goals_non-invariant| are the number of RELAXed and non-invariant goals in the goal model, respectively.
The second fitness sub-function, FF_na, rewards candidate individuals that minimize the number of reconfigurations performed by the DAS throughout execution, in order to reduce the overhead incurred on performance and cost. FF_na is defined as follows:

$$FF_{na} = 1.0 - \frac{|adaptations|}{|faults|} \qquad (3.2)$$

where |adaptations| measures the total number of reconfigurations performed by the DAS and |faults| measures the total number of adverse environmental conditions introduced throughout the simulation. Minimizing the number of reconfigurations also reduces the number of passive and quiescent components [63], where a component refers to an executable element, such as a software component or network node. A passive component can service transactions from other components, but it cannot initiate new transactions. Quiescent components can neither initiate new transactions nor service incoming transactions. For the RDM, the operational cost of maintaining the network varies with the number of nodes placed into a passive or quiescent state to ensure system consistency. Reducing the number of passive and quiescent components minimizes the impact of adaptation on operational costs. RELAX operators can increase a system's flexibility and ability to tolerate uncertainty, resulting in fewer nodes placed into a passive or quiescent state.
The fitness sub-functions are then combined into a single fitness calculation with a linear weighted sum in Equation (3.3):

$$\mathit{Fitness\ Value} = \begin{cases} \alpha_{nrg} \times FF_{nrg} + \alpha_{na} \times FF_{na} & \text{if all invariant goals are satisfied} \\ 0.0 & \text{otherwise} \end{cases} \qquad (3.3)$$

where α_nrg and α_na are weights that reflect the priority of each fitness sub-function with respect to the calculated fitness value. The sum of the two weights must equal 1.0. The fitness of a RELAXed goal model also depends on the satisfaction of all goals designated as invariant. Specifically, any violation of an invariant goal results in a fitness value of 0.0, as AutoRELAX must only produce viable RELAXed goal models.
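A minimal Python sketch of this fitness calculation is shown below (the counts are assumed to be gathered from one completed simulation run; the default weights mirror the 0.3/0.7 values used later in the RDM case study, and this is not the dissertation's implementation).

```python
# Minimal sketch of Equations (3.1)-(3.3).

def autorelax_fitness(num_relaxed, num_noninvariant, num_adaptations,
                      num_faults, invariants_satisfied,
                      alpha_nrg=0.3, alpha_na=0.7):
    if not invariants_satisfied:                       # violating any invariant zeroes fitness
        return 0.0
    ff_nrg = 1.0 - num_relaxed / num_noninvariant      # Equation (3.1)
    ff_na = 1.0 - num_adaptations / num_faults         # Equation (3.2)
    return alpha_nrg * ff_nrg + alpha_na * ff_na       # Equation (3.3)

print(autorelax_fitness(num_relaxed=4, num_noninvariant=21,
                        num_adaptations=3, num_faults=12,
                        invariants_satisfied=True))    # ~0.77
```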
(4) Select RELAXed Models: Next, AutoRELAX selects the highest-performing individuals from the population, based on their calculated fitness values, to guide the search process towards that particular area of the solution space. AutoRELAX applies tournament selection [53], a technique that randomly selects k individuals from the population and competes them against one another. For each group of competing individuals, AutoRELAX selects the RELAXed goal model with the highest fitness to survive into the next generation. The remaining individuals do not survive and are effectively removed from consideration.

(5) Generate RELAXed Models (crossover, mutation, selection): AutoRELAX uses two-point crossover and single-point mutation to create new RELAXed goal models. The rates for crossover and mutation were specified to be 50% and 40%, respectively; these rates were determined based upon the rate of convergence observed in empirical results. Figure 3.3(A) demonstrates two-point crossover. Specifically, two individuals are selected at random from the population and designated as parents. Two new individuals, designated as children, are produced by exchanging the genes that lie between two randomly selected points along the genome. Conversely, Figure 3.3(B) demonstrates single-point mutation. In this example, a single individual is selected at random, and a single point along its genome is randomly selected and then mutated. Specifically, the mutation operator in this example changes the RELAX operator from AS FEW AS POSSIBLE to AS MANY AS POSSIBLE and, moreover, sets the gene to enable RELAXation. Mutation is precluded from affecting the fuzzy logic function boundaries, as that would alter the meaning of a fully satisficed goal. In summary, the crossover operation attempts to generate new individuals that combine good elements from their parents, and mutation introduces a random element to facilitate exploration of the solution space.

Figure 3.3: Examples of crossover and mutation operators in AutoRELAX: (A) two-point crossover and (B) single-point mutation.

Output RELAXed Models: AutoRELAX iteratively applies Steps (3) through (5) until the termination criterion (i.e., generational limit) is reached and then outputs the set of RELAXed goal models with the highest fitness values.

3.2.3 Optimizing Fitness Sub-Function Weights with SAW
While AutoRELAX can automatically provide requirements-based assurance for a DAS, the generated combination of RELAX operators applied to the goal model may not be optimal across different operating contexts. SAW is a hyper-heuristic for optimizing a given weighting scheme by iteratively exploring how different combinations of weights can improve overall fitness within a given environment [104]. Fitness sub-functions often use weighted coefficients to define the relative importance of each sub-function and can therefore be optimized with SAW.
Specifically, SAW explores the space of weight combinations within a particular operating context to determine which weighting scheme is appropriate (i.e., yields a higher fitness value) for a specific environmental configuration. Weights are typically assigned to fitness sub-functions based mainly on the domain knowledge of a requirements engineer, coupled with trial and error and limited empirical evidence. As a result, the combination of weighting coefficients may not be optimal for the specific operating context in which the DAS is executing, and an algorithmic approach is therefore necessary to determine the optimal weighting scheme. SAW iteratively updates each weight over the course of an EA to explore how different combinations of weights affect the fitness of an individual, resulting in a combination of weights optimized for a specific environment.
AutoRELAX-SAW extends AutoRELAX with the SAW approach to optimize the weights in the fitness function. Specifically, we have added the online method of SAW optimization [104] to AutoRELAX; this online approach can be performed in parallel with DAS execution. Weights are initially seeded with values specified by a requirements engineer. Throughout DAS execution, feedback is provided to AutoRELAX-SAW to determine which fitness sub-function yields the smallest fitness value compared to the other sub-functions. The poorest-performing sub-function has its associated weight increased to emphasize its importance in the aggregate fitness, and the remaining weights are then normalized to ensure that all weights sum to 1.0.
To accommodate SAW within AutoRELAX, we augment Step (3) of Figure 3.1 to incorporate the online SAW implementation. Fitness sub-function calculations are examined periodically (for this study, every 5th generation, a value determined from empirical evidence to give SAW enough time to explore different weighting schemes). To guide the AutoRELAX process towards combinations of RELAX operators that mitigate the uncertainties exhibited by the current operating context, the weight associated with the poorest-performing fitness sub-function is incremented. Increasing this particular weight pressures the AutoRELAX process to select appropriate RELAX operators for the current environment based on the shortcomings identified by AutoRELAX-SAW.
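The weight update at the heart of this step can be sketched as follows in Python (illustrative only; the increment size and the normalization step are assumptions consistent with the description above, not the dissertation's exact procedure).

```python
# Minimal sketch of one online SAW weight adjustment: boost the weight of the
# poorest-performing fitness sub-function, then renormalize so weights sum to 1.0.

def saw_update(weights, subfunction_scores, increment=0.1):
    """weights and subfunction_scores are dicts keyed by sub-function name."""
    worst = min(subfunction_scores, key=subfunction_scores.get)
    updated = dict(weights)
    updated[worst] += increment                    # emphasize the weakest concern
    total = sum(updated.values())
    return {name: w / total for name, w in updated.items()}

weights = {"nrg": 0.3, "na": 0.7}
scores = {"nrg": 0.81, "na": 0.55}                 # FF_na is performing worst
print(saw_update(weights, scores))                 # {'nrg': ~0.27, 'na': ~0.73}
```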
3.3 Case Studies
This section describes experimental results from two case studies in which AutoRELAX and AutoRELAX-SAW were applied to the RDM and SVS applications to demonstrate the effectiveness of these techniques in different application domains. First, we introduce the configuration of each simulation environment, including sources of uncertainty and possible reconfiguration states. Next, we present experimental results that compare and evaluate unRELAXed, manually RELAXed, and AutoRELAXed goal models. We then describe how SAW can augment AutoRELAX to determine an optimal weighting scheme that further improves AutoRELAX results. For each experiment, we state and evaluate our experimental hypotheses and resulting goal models.

3.3.1 RDM Case Study
For this experiment, AutoRELAX and AutoRELAX-SAW were applied to the RDM application. The RDM was configured as described in Section 2.3.2, and the fitness functions described in Section 3.2.2 were used to guide the generation of RELAXed goal models.

3.3.1.1 RDM Uncertainty
We now compare and evaluate automatically RELAXed goal models against two baseline goal models: one in which we manually applied RELAX operators and one that was not RELAXed. The goal model used as a base for each experiment was previously presented in Figure 2.4. A small group of people with domain knowledge relevant to the RDM independently created first drafts of a manually RELAXed goal model and then collaborated to form a single, common goal model to be used in this experiment. The following RELAX operators were applied to the goal model: Goal (C) was RELAXed to tolerate dropped messages throughout the simulation; Goal (F) was RELAXed to allow up to three simultaneous transient network partitions; Goal (G) was RELAXed to allow larger exposure to data loss; and Goal (H) was RELAXed to allow extra time when diffusing data. These operators are summarized in Table 3.1.

Table 3.1: Summary of manually RELAXed goals for the RDM application.
Goal | Description | RELAX Operator | Boundaries
C | Achieve [DataSent == NumberOfDataCopies] | AS CLOSE AS POSSIBLE TO [quantity] | ±5.0 copies
F | Achieve [NetworkPartitions == 0] | AS CLOSE AS POSSIBLE TO [quantity] | ±1.0 partitions
G | Achieve [DataAtRisk <= RiskThreshold] | AS FEW AS POSSIBLE | ±5.0% risk
H | Achieve [DiffusionTime <= MaxTime] | AS FEW AS POSSIBLE | ±5.0 s

We define two null hypotheses for this experiment. The first, H1_0, states that "there is no difference in fitness achieved by a RELAXed and an unRELAXed goal model." The second, H2_0, states that "there is no difference in fitness achieved between RELAXed goal models generated by AutoRELAX and those manually created by a requirements engineer." We specified the fitness sub-function weights, α_nrg and α_na, to be 0.3 and 0.7, respectively. These values were chosen based on empirical evidence that it is more than twice as important to minimize the number of adaptations as it is to minimize the number of RELAXed goals.
Due to the variety of configurations available to the RDM at run time, not all adaptations result in the same fitness. Figure 3.4 presents boxplots that capture the fitness values achieved by AutoRELAXed goal models, a manually RELAXed goal model, and an unRELAXed goal model (the small circle on the plot indicates an outlier in the unRELAXed fitness values), where each experimental treatment was performed 50 times to achieve statistical significance. Despite the fitness boost that unRELAXed goal models receive (i.e., 0 applied RELAXations for FF_nrg), goal models with applied RELAX operators achieve significantly higher fitness than unRELAXed goal models (p < 0.001, Welch Two-Sample t-test). These results enable us to reject H1_0 and conclude that RELAX reduces the number of necessary adaptations in the face of uncertainty.
Furthermore, Figure 3.4 also presents the difference in fitness between goal models that were automatically generated and those manually created. Specifically, AutoRELAX-generated goal models achieve significantly higher fitness than those that are manually RELAXed (p < 0.001, Welch Two-Sample t-test), indicating that an automated approach for assigning RELAX operators to a system goal model provides an overall higher fitness value. This result indicates that an AutoRELAXed goal model can better manage uncertainty than those that are manually RELAXed or unRELAXed.
Figure 3.4: Fitness values comparison between RELAXed and unRELAXed goal models for the RDM (boxplots of the AutoRELAXed, manually RELAXed, and unRELAXed experimental treatments).

Lastly, Figure 3.5 presents information regarding the types of goal model configurations generated by AutoRELAX for varying degrees of uncertainty. Two specific types of uncertainty are represented: low uncertainty and high uncertainty. The low-uncertainty environment specifies a small probability that failures, such as failed network links and dropped or delayed messages, can occur; the high-uncertainty environment specifies the opposite. Figure 3.5 plots the sorted number of RELAXed goals per trial, where the first environment specifies a low degree of uncertainty and the second environment specifies a high degree of uncertainty, and each point in the plot represents the average number of RELAXations applied throughout the RDM simulation. In all but one trial, AutoRELAX introduced at least as many RELAX operators into goal models executed under the high-uncertainty environment as into those executed under the low-uncertainty environment. Moreover, the positive correlation between the two curves indicates that AutoRELAX introduced RELAXations as necessary to mitigate increasing amounts of uncertainty, suggesting that unnecessary goal model flexibility is avoided.

Figure 3.5: Mean number of RELAXed goals for varying degrees of system and environmental uncertainty for the RDM.

3.3.1.2 Dynamic Weight Adjustment
Next, we evaluate the effectiveness of optimizing the fitness sub-function weights. Specifically, we compare the fitness values achieved by AutoRELAXed goal models with those generated by AutoRELAX-SAW. AutoRELAX and AutoRELAX-SAW were each executed 50 times to achieve statistical significance. The fitness sub-function weights for AutoRELAX were set as defined in Section 3.3.1.1, and the weights for AutoRELAX-SAW were seeded with the same values.
We define two hypotheses for this experiment. First, H1_0 states that "there is no difference between an automatically RELAXed goal model with static weights and an automatically RELAXed goal model that uses a SAW-optimized weighting scheme across different environments." The second null hypothesis, H2_0, states that "there is no difference between RELAXed goal models with optimized and unoptimized weights in a static environment." Figure 3.6 presents two boxplots with fitness values generated by AutoRELAXed goal models and AutoRELAX-SAW-generated goal models. As demonstrated by these plots, goal models that use an optimized weighting scheme achieve significantly higher fitness values than those that use a static set of weights (p < 0.001, Welch Two-Sample t-test). These results enable us to reject H1_0 and conclude that optimizing the fitness sub-function weighting scheme directly impacts AutoRELAX fitness. For a small subset of environments, AutoRELAX-SAW converged to weights of α_nrg = 1.0 and α_na = 0.0, ensuring that the number of RELAXed goals is minimized. This result implies that RELAX operators are not always necessary if the goal model can sufficiently handle uncertainty without extra flexibility. Comparing the two populations, we can also reject H2_0 (p < 0.001, Welch Two-Sample t-test) and state that SAW-optimized goal models generally perform better than goal models without optimized weights.
Furthermore, AutoRELAX-SAW was compared to AutoRELAX in a static environment. Figure 3.7 presents boxplots of the fitness values found by AutoRELAX-SAW over 50 separate trials, where the horizontal line represents the best AutoRELAX fitness value for that particular environment.

Figure 3.6: Fitness values comparison between AutoRELAXed and SAW-optimized AutoRELAXed goal models for the RDM.

The results show that while AutoRELAX found a higher fitness value than the median fitness found by AutoRELAX-SAW, optimizing with SAW enabled the achievement of 100% fitness in this particular environment. However, AutoRELAX-SAW also found fitness values lower than those found by AutoRELAX, indicating that not all weighting schemes are appropriate for a given environment. Figure 3.8 presents boxplots for the first 25 trials of the single-environment experiment, where each boxplot captures the fitness over 50 replicates.

Figure 3.7: Fitness values of SAW-optimized AutoRELAXed goal models in a single environment (the horizontal line marks the best AutoRELAX fitness value).

3.3.2 SVS Case Study
We now describe a case study in which AutoRELAX and AutoRELAX-SAW use the SVS as the application domain. The SVS was implemented as a fully autonomous robot tasked with cleaning a room, as described in Section 2.2.2. For this particular study, the SVS monitored utility values for each goal (cf. Figure 2.2) at each time step to determine if a self-reconfiguration was necessary. The following section describes our experimental results from the SVS case study.

Figure 3.8: Comparison of SAW-optimized fitness values from different trials in a single environment.

3.3.2.1 SVS Uncertainty
The SVS case study also evaluates and compares unRELAXed, manually RELAXed, and AutoRELAXed goal models as a basis for experimentation. The manually RELAXed goal model had RELAX operators applied to: Goal (A), to allow the amount of remaining dirt to be larger than specified; Goal (B), to allow flexibility in the amount of remaining battery power; and Goals (F) and (G), to provide flexibility in the timing of each path plan. Table 3.2 describes the RELAX operators and fuzzy logic boundaries for the manually RELAXed goals.

Table 3.2: Summary of manually RELAXed goals for the SVS application.
Goal | Description | RELAX Operator | Boundaries
A | Achieve [90% Clean] | AS MANY AS POSSIBLE | ±5.0%
B | Achieve [> 5% Power] | AS MANY AS POSSIBLE | ±5.0%
F | Achieve [Path Plan for 30 seconds] | AS CLOSE AS POSSIBLE TO [quantity] | ±15.0 seconds
G | Achieve [Clean Area for 1 minute] | AS CLOSE AS POSSIBLE TO [quantity] | ±15.0 seconds

The fitness function required an update for the SVS application. Specifically, two additional fitness sub-functions were added to maximize the cleanliness of the room (FF_clean) and minimize the number of faults encountered (FF_faults). The first additional sub-function, presented in Equation (3.4), maximizes the amount of dirt cleaned by the SVS:

$$FF_{clean} = \frac{|dirt\ removed|}{|amount\ of\ dirt|} \qquad (3.4)$$

where |dirt removed| is the number of dirt particles vacuumed by the SVS and |amount of dirt| represents the total number of initialized dirt particles. The second additional sub-function, Equation (3.5), rewards candidates that minimize the number of faults, such as running out of battery power before the simulation completes:

$$FF_{faults} = \frac{1.0}{|faults|} \qquad (3.5)$$

where |faults| represents the total number of faults encountered by the SVS during simulation.
All fitness sub-functions can be combined into a linear weighted sum, as shown in Equation (3.6):

$$\mathit{Fitness\ Value} = \alpha_{nrg} \times FF_{nrg} + \alpha_{na} \times FF_{na} + \alpha_{clean} \times FF_{clean} + \alpha_{faults} \times FF_{faults} \qquad (3.6)$$

where the four terms respectively account for the number of RELAXed goals, the number of adaptations, the percentage of dirt removed, and the number of faults, and where α_clean and α_faults are the weights associated with the additional sub-functions. Again, all weights must sum to 1.0. For this particular experiment, we set α_nrg = 0.1, α_na = 0.4, α_clean = 0.2, and α_faults = 0.3.
For comparison purposes, we reuse the null hypotheses proposed in the RDM case study. Specifically, we define H1_0 to state that "there is no difference in fitness achieved by a RELAXed and an unRELAXed goal model," and H2_0 to state that "there is no difference in fitness values between RELAXed goal models generated by AutoRELAX and those manually created by a requirements engineer." Figure 3.9 presents three boxplots that capture the overall fitness values calculated for AutoRELAX-generated goal models, a manually RELAXed goal model, and an unRELAXed goal model, respectively. Each experiment was performed 50 times to achieve statistical significance. As demonstrated by the results, RELAXed goal models achieve significantly higher fitness values than unRELAXed goal models (p < 0.001, Welch Two-Sample t-test), enabling us to reject H1_0 and conclude that applying RELAX operators to non-invariant goals can significantly reduce the number of encountered faults while mitigating uncertainty. Figure 3.9 also demonstrates that AutoRELAXed goal models achieve significantly higher fitness than manually RELAXed goal models (p < 0.001, Welch Two-Sample t-test), allowing us to reject H2_0 and conclude that AutoRELAX can generate goal models that better mitigate system and environmental uncertainty than those created manually.

Figure 3.9: Fitness values comparison between RELAXed and unRELAXed goal models for the SVS (boxplots of the AutoRELAX, manually RELAXed, and unRELAXed treatments).

Figure 3.10 illustrates the impact that environmental uncertainty has on the application of RELAX operators. Specifically, this figure shows the sorted number of RELAXed goals for each trial in two separate environments. Each data point represents the mean number of RELAXed goals generated throughout the evolutionary process for each goal model. The first environment contained a low degree of uncertainty and the second contained a high degree of uncertainty, where low uncertainty implies a low probability of failures occurring and high uncertainty implies the opposite. In all trials, AutoRELAX generated goal models with more RELAX operators in the environment with higher uncertainty. The gradual increase in the slope of the curve again suggests that the application of goal RELAXations increases with the amount of uncertainty within the DAS and its environment.

Figure 3.10: Mean number of RELAXed goals for varying degrees of system and environmental uncertainty for the SVS.

3.3.2.2 Dynamic Weight Adjustment
This next experiment explores how the weighting of fitness sub-functions impacts fitness as a result of environmental uncertainty. We reuse the experimental setup described for the SVS in the previous section. Furthermore, we define our first null hypothesis
H1_0 to state that "there is no difference between an automatically RELAXed goal model with static weights and one that uses a SAW-optimized weighting scheme across different environments." We define a second null hypothesis, H2_0, to state that "there is no difference between RELAXed goal models with optimized and unoptimized weights in a static environment." The SVS fitness sub-function weights were defined to emphasize the importance of reducing the number of encountered faults (see Equation (3.5)) while minimizing the number of performed reconfigurations (see Equation (3.2)). Prior to dynamic adjustment, the weights were initialized as follows: α_nrg = 0.1, α_na = 0.4, α_faults = 0.3, and α_clean = 0.2.
Figure 3.11 presents boxplots of the fitness values calculated from AutoRELAXed goal models and from goal models generated with AutoRELAX-SAW, respectively. AutoRELAXed goal models with optimized weights have significantly higher fitness (p < 0.001, Welch Two-Sample t-test) than those without weight optimization. We can therefore reject H1_0 and state that the weighting scheme has a direct impact on the aggregate fitness of a goal model.

Figure 3.11: Fitness values comparison between AutoRELAXed and AutoRELAX-SAW goal models for the SVS.

Lastly, Figure 3.12 presents a range of fitness values calculated from goal models with weights optimized for a static environment using AutoRELAX-SAW over 50 separate trials, where only the first 25 trials are depicted for presentation purposes. The solid horizontal line (at approximately 0.62) represents the best fitness generated by AutoRELAX over 50 separate trials. These results demonstrate that the environment directly impacts the weighting scheme of the fitness sub-functions, particularly in that a single weighting scheme may not be optimal for all environments. Moreover, Figure 3.12 demonstrates that different combinations of fitness sub-function weights yield different fitness values, indicating that AutoRELAX-SAW can be used to search for an optimal combination of weights, thereby finding an optimal RELAXed goal model for the particular environment.

Figure 3.12: Comparison of SAW-optimized fitness values from different trials in a single environment.

3.4 Related Work
This section presents related work in specifying, detecting, and mitigating obstacles to the satisfaction of DAS requirements. In particular, we discuss related research on expressing uncertainty in requirements, performing requirements monitoring and reflection, and providing obstacle mitigation strategies.

3.4.1 Expressing Uncertainty in Requirements
Research into determining how uncertainty impacts the design, implementation, and execution of a DAS, including its assurance, has recently been a focus of the software engineering community [5, 8, 21, 32, 34, 35, 66, 86, 94, 111, 112], specifically in identifying and documenting sources of uncertainty. For instance, a Claim [111, 112] is a marker of uncertainty in requirements and design based upon possibly incorrect assumptions. Likewise, an analytical framework [34, 35] has been proposed that leverages mathematical approaches for modeling and assessing uncertainty from a risk-management perspective. Uncertainty in model checking [110] has also been studied, specifically in the realm of model variations, and probabilistic model checking [38] has been explored to quantify unpredictable changes in models to mitigate unsatisfied requirements.
While these approaches often require a requirements engineer to evaluate the sources and results of uncertainty, AutoRELAX automates the identification and application of RELAX operators to mitigate those sources of uncertainty.

3.4.2 Requirements Monitoring and Reflection
Requirements monitoring frameworks can detect the occurrence of obstacles to satisfying requirements [37]. In certain circumstances, such frameworks can also suggest mitigation strategies as necessary. Requirements have also recently been promoted to first-class, run-time entities whose satisfaction directly impacts the execution of adaptive systems [8, 94]. Furthermore, an awareness requirements-based feedback loop has been introduced, where awareness requirements perform meta-level monitoring of system requirements [100]. In contrast to our work, these approaches do not directly support the management and run-time monitoring of RELAXed requirements.

3.4.3 Obstacle Mitigation
Existing strategies have been proposed to identify, analyze, and resolve obstacles that prevent requirements from being satisfied [105, 106]; however, the majority of these strategies focus on revising goals to either prevent or negate the effects of obstacles. For example, Letier et al. [65] have recently proposed an approach for designing software requirements that explicitly considers obstacles posed by uncertainty using multi-objective optimization and decision analysis. AutoRELAX can complement the proposed strategies by automatically determining the extent to which a non-invariant goal can be unsatisfied at run time, thereby enabling the design of a system that can continue to function while exposed to obstacles that occur as a result of uncertainty.

3.5 Conclusion
This chapter described AutoRELAX, a technique for automatically generating RELAXed goal models, and AutoRELAX-SAW, a technique for automatically balancing fitness sub-function weights in response to environmental uncertainty while executing AutoRELAX. AutoRELAX leverages a genetic algorithm to explore how RELAX operators can be automatically applied to a goal model to provide an optimal combination of RELAXations that effectively mitigates uncertainty in both the system and its environment. Similarly, AutoRELAX-SAW also uses a genetic algorithm for optimization; however, it uses AutoRELAX as its basis for evaluation. Both AutoRELAX and AutoRELAX-SAW were applied to the RDM and SVS applications for validation.
AutoRELAX experimental results indicate that RELAXations tend to correspond with specific types of uncertainty. For the RDM, Goals (F) and (T) were RELAXed most often, depending on whether network links had a higher probability of failing than of dropping messages, and vice versa. RELAX operators applied to Goal (J) tended to add extra flexibility with respect to the available time for goal satisfaction. For the SVS, Goals (D) and (E) tended to be RELAXed more often, as RELAXing those goals effectively extends the available operating time for the SVS. Adding the SAW optimization in both case studies tended to improve overall goal model fitness for a given environment. Specifically, we determined that the weighting scheme is directly impacted by the environmental configuration, in that fitness sub-function weights can pressure the search towards an optimal RELAXed goal model for each given environment.
Potential future directions for AutoRELAX and AutoRELAX-SAW include exploring different approaches to handling multiple concerns when calculating fitness, such as multi-objective optimization [27], as well as exploring how different genetic operators can impact the search procedure. Lastly, exploring different types of fuzzy logic membership functions for RELAX operators is another future direction.

Chapter 4

Exploring Code-Level Effects of Uncertainty

This chapter presents Fenrir, a technique for exploring how uncertainty impacts the execution behavior of a DAS by examining the paths of execution taken throughout system execution, thereby enabling the identification of unexpected or anomalous behaviors [45]. Anomalous behaviors can then be corrected by applying bug fixes, augmenting target configurations, or updating system requirements. First, we motivate the need to explore DAS run-time behavior. Next, we introduce Fenrir, an evolutionary computation-based approach for exercising a DAS implementation by generating a novel set of operational contexts. We then describe the assumptions, required inputs, and expected outputs of Fenrir. We then present a case study in which Fenrir was applied to the RDM application. Lastly, we present related work and summarize our conclusions.

4.1 Motivation

A DAS can self-reconfigure as it executes to mitigate unexpected changes in the environment by triggering adaptive logic to switch between configurations. By doing so, the DAS can continually satisfy its requirements and high-level objectives even as its operational context changes. However, if the DAS reconfigures in response to an unexpected situation, its resulting behavior can also be unexpected and may no longer satisfy its requirements. To this end, we describe an approach for exploring how a DAS reacts to both system and environmental uncertainty at the code level.

Fenrir supports our approach for mitigating uncertainty by providing code-level assurance. Specifically, Fenrir examines the resulting behaviors of a DAS experiencing system and environmental uncertainty to determine the paths followed by the DAS throughout execution. The number of possible execution paths is very large due to the adaptive nature of the DAS, as an adaptation can yield a completely unexpected DAS configuration. To explore the space of possible execution paths, Fenrir leverages novelty search to discover a representative set of operational contexts within the solution space. Fenrir then executes the DAS within each discovered operating context to determine its resulting behavior. The generated execution traces can then be analyzed by a DAS engineer to determine the effect that different types of uncertainty can have on the DAS, thereby enabling the identification of unanticipated or anomalous behaviors.

4.2 Introduction to Fenrir

In this section, we present and describe Fenrir, our approach to exploring how uncertainty affects a DAS at the implementation level. First, we introduce the assumptions required to use Fenrir, as well as its required inputs and expected outputs. Next, we describe each step of the Fenrir approach in detail, illustrated by a DFD of the process. We then describe the results of a case study in which Fenrir was applied to the RDM application and compared with random search. Finally, we present related work in this domain, summarize our findings, and present future directions for research into Fenrir.

Thesis statement.
Complex software applications can follow unexpected paths of code execution when faced with run-time uncertainty.

4.2.1 Assumptions, Inputs, and Outputs

In this section, we present the assumptions needed for implementing Fenrir. We then describe the required inputs to Fenrir and its expected outputs in detail.

4.2.1.1 Assumptions

We assume the following when instrumenting a DAS with Fenrir to explore the different paths of execution taken at run time:

• The DAS source code is available and can be instrumented with logging statements.
• Instrumenting the DAS does not significantly affect the run-time behavior or performance of the DAS.
• The logging statements should provide a measure of code coverage and should monitor possible exceptions or error conditions.

4.2.1.2 Inputs and Outputs

Fenrir requires two elements as input. First, a prototype or executable specification of a DAS must be provided and instrumented with logging statements. Second, the domain engineer must specify all possible sources of uncertainty exposed to the DAS at run time. Fenrir then provides an archive containing a set of pairs as output, where each pair comprises a unique operating context and an operational trace of DAS execution. Each input and output is next described in detail.

4.2.1.2.1 Instrumented Executable Specification

Fenrir requires an instrumented DAS to execute in order to trace its run-time behavior. Specifically, logging statements must be introduced into the DAS code to ensure that all possible adaptation paths and representative behavioral paths are represented (i.e., to provide a base measure of DAS behavioral coverage). Furthermore, variables identified as relevant to the DAS operating context should also be monitored by the logging statements to capture the state of the operating context that led to each expressed behavior.

4.2.1.2.2 Sources of Uncertainty

The DAS engineer must specify the expected sources of uncertainty to simulate while executing Fenrir (e.g., severed network links or sensor failures). Specifically, the probability of occurrence and the severity of each uncertainty must be defined (e.g., [5%, 8%], [10%, 15%], ..., [probability_n, severity_n]), as they are required when configuring the novelty search algorithm. The probability of occurrence defines the chance that the source of uncertainty will manifest, and the severity defines how much the source of uncertainty will impact the DAS. For example, a source of uncertainty may impact the ability of an RDM sensor to monitor a data mirror. In this case, we can state that this source of uncertainty defines the number of RDM sensors that may fail during execution. As such, we define the probability of occurrence to be 3%. This probability states that, for each timestep of execution, there is a 3% random chance that this source of uncertainty occurs. Next, we define the severity of occurrence to be 25%. Here, the severity defines an upper bound on the number of RDM sensors that can fail: up to 25% of all RDM sensors may fail when this source of uncertainty is activated.
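To illustrate how such probability/severity pairs might drive a simulation, the minimal sketch below samples each uncertainty source once per timestep. The class, field names, and uniform severity draw are illustrative assumptions rather than Fenrir's actual implementation; only the probability/severity semantics and the 150-timestep RDM simulation length are taken from this chapter.

    import random
    from dataclasses import dataclass

    @dataclass
    class UncertaintySource:
        """One source of uncertainty, specified as a [probability, severity] pair."""
        name: str
        probability: float  # chance of manifesting at each timestep
        severity: float     # upper bound on the fraction of components affected

        def sample(self, rng=random):
            """Return the fraction of components affected at this timestep
            (0.0 if the source does not manifest)."""
            if rng.random() < self.probability:
                return rng.uniform(0.0, self.severity)
            return 0.0

    # Example: up to 25% of RDM sensors may fail, with a 3% chance per timestep.
    sensor_failure = UncertaintySource("rdm_sensor_failure", 0.03, 0.25)

    for timestep in range(150):          # RDM simulations run for 150 timesteps
        affected = sensor_failure.sample()
        if affected > 0.0:
            print(f"t={timestep}: {affected:.0%} of sensors fail")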
4.2.1.2.3 Fenrir Archive

Fenrir generates a collection of operational contexts, each with a corresponding execution trace that details the path of execution followed by the DAS. Each operational context specifies the information necessary to recreate the context in simulation, comprising the sources of uncertainty with likelihood of occurrence (e.g., a 10% chance that network links will be severed) and severity (e.g., up to 15% of all network links affected) that are specific to that particular context. Furthermore, each context may trigger adaptations within the DAS, and as such the trace provides information specific to each explored path. Path information comprises the invoking module, line number, description of intended behavior, and a flag that indicates the presence of an error or exception. For instance, a module within the RDM application that is concerned with updating the remaining capacity for a data mirror can be instrumented as shown in Figure 4.1:

    (1)  data.size = size;
    (2)  mirror.remaining_capacity = capacity;
    (3)
    (4)  // log the variable state for this data mirror
    (5)  logger.print('data mirror:[%d], data size:[%f],
    (6)                remaining capacity:[%f]', data.id,
    (7)                data.size, remaining_capacity);
    (8)
    (9)  // send data from data mirror
    (10) if (data.size <= remaining_capacity)
    (11) {
    (12)     remaining_capacity -= data.size;
    (13)     return true;
    (14) }
    (15) else
    (16)     return false;

Figure 4.1: Example code to update data mirror capacity.

Executing the RDM under a particular operating context would yield the following order of statement execution: [(1), (2), (12), (13)]. In this case, the size of the particular data message is smaller than the remaining capacity of the data mirror, thus enabling a successful transmission to another node. However, to reduce the size of the resulting trace (full coverage of all possible branches resulted in file sizes greater than 10 GB and required optimization to ensure that comparisons between logs would remain manageable), only a single logging statement is necessary to capture the intended behavior of this sequence of operations. As such, the sample logging statement would appear as follows (for presentation purposes, only the relevant variable information is displayed; a full logging statement comprises a unique identifier, the current module name, line number, and description including any relevant variable values):

    data mirror:[2], data size:[2.5], remaining capacity:[4.0]

This particular trace statement enables a DAS developer to determine the behavior of a particular data mirror at a particular point during execution. Neither branch of the if-else block required its own instrumentation in this case, as DAS behavior can be inferred from the monitored variables.

4.2.2 Fenrir Approach

This section details the Fenrir approach. Figure 4.2 shows the DFD that illustrates Fenrir. Each step is next described in detail.

Figure 4.2: Data flow diagram of Fenrir approach.

(1) Generate Operational Contexts: Fenrir accepts the sources of uncertainty, as specified by a DAS engineer, and in turn uses novelty search [64] to generate a set of operational contexts, where each operational context specifies the configuration of the sources of uncertainty, the configuration of the RDM, and the configuration of the environment. These contexts are then applied to a DAS that has been instrumented with logging statements to determine how different configurations of uncertainty affect the DAS throughout execution, specifically which paths of execution the DAS follows as it executes and self-reconfigures in response to those uncertainties. Each operational context is represented as a genome, where a set of genomes in turn represents a population.
Each genome comprises a vector of length n, where n defines the number of uncertainty sources from the system and the environment. Each gene defines the likelihood of occurrence and the severity of the occurrence, representing how often the particular uncertainty is likely to occur and how much of an impact it will have on the DAS. Figure 4.3 presents a sample Fenrir genome containing three sources of uncertainty specific to the RDM application: dropped message, delayed message, and link failure. In this case, there is a 15% chance that a link failure will occur and, upon occurrence, that it can affect up to 10% of all network links.

Figure 4.3: Sample Fenrir genome representation for the RDM application.

Following the generation of operational contexts in each generation, Fenrir then applies each operational context to the instrumented DAS, resulting in an execution trace for each context. The set of pairs (operational context, execution trace) is then provided as input to the novelty value calculation, as described in the following section.

(2) Compute Novelty and Archive Solutions: Following execution of all operational contexts within a population, Fenrir then calculates a novelty value for each solution. First, Fenrir constructs a weighted call graph (WCG) [1] by parsing each execution trace, where a WCG is an extension of a program call graph [90]. Each WCG is represented as a directed graph, where each node has a unique identifier that corresponds to a particular logging statement, the directed edges between nodes symbolize the order in which logging statements were executed, and a weight on each edge represents execution frequency. Figures 4.4(A) and 4.4(B) illustrate a sample WCG with corresponding code statements. In this example, Statement (a) was executed first, and only a single time. Based on the if statement within the wrapper function, either Statement (b) or (c) can then be executed next. Here, Statement (b) was executed three times, and Statement (c) was executed twice. Lastly, Statement (d) is executed a single time following Statement (c).

The novelty value for an individual is calculated based upon the difference in nodes and edges between two WCGs, as shown in Equation (4.1), and then by applying the Manhattan distance metric [10] to measure the distance to every other WCG in both the current population and the novelty archive, as shown in Equation (4.2):

    dist(μ_i, μ_j) = len({v ∈ g_i} ⊕ {v ∈ g_j}) + len({e ∈ g_i} ⊕ {e ∈ g_j})        (4.1)

    p(μ, k) = (1/k) Σ_{i=1}^{k} dist(μ, μ_i)                                         (4.2)

In Equation (4.1), v represents a vertex or node, e represents an edge, μ_i and μ_j represent the WCGs being compared, and g_i and g_j represent the graphs that define each WCG. In Equation (4.2), k represents the number of WCGs to compare.
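A minimal sketch of Equations (4.1) and (4.2) is shown below, assuming each WCG is represented simply as a pair of a node set and a directed-edge set; the data structures and function names are illustrative rather than Fenrir's actual implementation.

    # Minimal sketch of Equations (4.1) and (4.2): each WCG is assumed to be a
    # (nodes, edges) pair, where nodes is a set of logging-statement identifiers
    # and edges is a set of (source, target) tuples.

    def dist(wcg_i, wcg_j):
        """Equation (4.1): size of the symmetric difference of nodes plus edges."""
        nodes_i, edges_i = wcg_i
        nodes_j, edges_j = wcg_j
        return len(nodes_i ^ nodes_j) + len(edges_i ^ edges_j)

    def sparseness(wcg, others):
        """Equation (4.2): average distance to the k other WCGs drawn from the
        current population and the novelty archive (k = len(others))."""
        k = len(others)
        return sum(dist(wcg, other) for other in others) / k

    # Example: two small WCGs that share node 'a' but diverge afterwards.
    wcg_1 = ({"a", "b"}, {("a", "b")})
    wcg_2 = ({"a", "c", "d"}, {("a", "c"), ("c", "d")})
    print(dist(wcg_1, wcg_2))   # 3 differing nodes + 3 differing edges = 6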
Any individual whose novelty value either exceeds the novelty threshold or falls within the top 10% of all novelty values is then added to the novelty archive at the end of each generation. The novelty archive, in turn, stores the most diverse individuals found throughout the evolutionary process.

(A) Weighted call graph.

(B) Corresponding code:

    void main() {
    a:  wrapper(TRUE);
    }

    void wrapper(bool flag) {
        int i, k = 0;
        if (flag)
    b:      while (i < 3) ++i;
        else {
    c:      while (i < 2) ++i;
    d:      callFunction(i, k);
        }
    }

Figure 4.4: Example of weighted call graph.

Following completion of novelty search, Fenrir returns the novelty archive containing a set of pairs (operational context, execution trace) that, based on their novelty scores, comprise the most diverse individuals found by Fenrir. Each pair provides insight into the behavior of the DAS throughout its execution, such as adaptation triggers, parameters for each instantiated DAS configuration, and any raised exceptions, unnecessary adaptations, or unacceptable behaviors that occurred, where an unnecessary adaptation refers to thrashing between DAS configurations before finally settling on a target configuration. This type of detailed run-time behavior can only be found by reviewing execution traces. Unacceptable behavior can then be corrected by providing bug fixes, augmenting target configurations, updating system requirements, or introducing methods for adding flexibility to system requirements, such as RELAX [21, 113].

4.3 RDM Case Study

This section presents a case study in which Fenrir is applied to the RDM application. First, we introduce the configuration of the simulation environment. Next, we present experimental results in which DAS execution behavior is examined within uncertain environments.

For this experiment, Fenrir was applied to the RDM application, which was configured as described in Section 2.3.2. Equations (4.1) and (4.2) were used to guide the search procedure towards novel areas of the solution space. The RDM network contained 25 RDMs with 300 network links. Each logging statement added to the DAS comprises a unique identifier and module information such as function name, line number, and description. The RDM was executed for 150 time steps, and 20 data items were randomly inserted into the network throughout the simulation. The novelty search algorithm was configured as shown in Table 4.1.

Table 4.1: Novelty search configuration.

    Parameter                 Value
    Number of generations     15
    Population size           20
    Crossover rate            25%
    Mutation rate             50%
    Novelty threshold         10%

To validate our approach, we compared the results of novelty search with a random search algorithm. We chose random search because we do not know a priori which operational contexts will adversely affect the system. As such, we generated 300 random operational contexts to compare against Fenrir. For statistical purposes, we conducted 50 trials of each experiment and have plotted mean values and error bars. The following section describes the experiment in detail, including our hypotheses and experimental results.
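The following sketch illustrates how the configuration in Table 4.1 might drive the overall novelty search loop described in Section 4.2.2. The helper callables (random_context, crossover, mutate, and sparseness, which is assumed to wrap DAS execution and WCG comparison) are hypothetical placeholders rather than Fenrir's actual API.

    import random

    CONFIG = {
        "generations": 15,
        "population_size": 20,
        "crossover_rate": 0.25,
        "mutation_rate": 0.50,
        "archive_fraction": 0.10,   # novelty threshold: top 10% enter the archive
    }

    def novelty_search(random_context, crossover, mutate, sparseness, config=CONFIG):
        """Sketch of a novelty search driver over operational contexts."""
        population = [random_context() for _ in range(config["population_size"])]
        archive = []
        for _ in range(config["generations"]):
            # Score each operational context against the population and archive.
            scored = [(sparseness(ctx, population + archive), ctx) for ctx in population]
            scored.sort(reverse=True, key=lambda pair: pair[0])

            # Archive the most novel individuals of this generation.
            cutoff = max(1, int(len(scored) * config["archive_fraction"]))
            archive.extend(ctx for _, ctx in scored[:cutoff])

            # Produce the next generation via crossover and mutation.
            next_population = []
            while len(next_population) < config["population_size"]:
                parent_a, parent_b = random.sample(population, 2)
                child = (crossover(parent_a, parent_b)
                         if random.random() < config["crossover_rate"] else parent_a)
                if random.random() < config["mutation_rate"]:
                    child = mutate(child)
                next_population.append(child)
            population = next_population
        return archive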
4.3.1 DAS Execution in an Uncertain Environment

To validate our approach, we compared Fenrir with random search to explore the space of possible operational contexts. Specifically, we defined a null hypothesis H0 to state that “there is no difference in execution traces generated by configurations produced by novelty search and those created by random search.” Furthermore, we defined the alternate hypothesis H1 to state that “there is a difference in execution traces generated by configurations produced by novelty search and those created by random search.”

Figure 4.5 presents the results of our experiment. Specifically, it presents boxplots of the novelty distances for each member of the novelty archive generated by Fenrir and of the novelty distances calculated by a randomized search algorithm. This plot demonstrates that Fenrir-generated operational contexts yield execution traces that achieve significantly higher novelty values than those generated by random search (p < 0.001, Welch Two-Sample t-test). Furthermore, Fenrir-generated execution traces exhibited negative kurtosis, suggesting that the distribution of operational contexts was skewed towards larger novelty values. We can then reject H0 based on these results. We can also accept H1, as novelty search discovered a larger number of unique DAS execution paths in comparison to random search.

Moreover, Figure 4.5 also demonstrates that Fenrir can generate a better representation of the search space with fewer operational contexts. This conclusion follows from the fact that the Fenrir boxplot only contains novelty values from 30 individuals, whereas the random search boxplot contains novelty values from 300 individuals. As a result, a DAS developer can use Fenrir at design time to assess DAS behavior in the face of uncertainty.

Figure 4.5: Novelty value comparison between Fenrir and random search.

Figure 4.6 presents two sample RDM execution paths produced by execution traces generated by Fenrir-created operational contexts. Each node represents a unique logging statement, each edge represents a sequential execution from one node to the next, and each weight represents the frequency with which the statement was executed. For example, Figure 4.6(A) demonstrates that Node (g) was executed 28 times and then Node (h) was executed once, based on the transition between Nodes (g) and (h). Further analysis of Figure 4.6(A) indicates that the RDM network comprised 28 instantiated data mirrors (i.e., Node (g) logs a function that instantiates data mirrors). Visual inspection of Figures 4.6(A) and 4.6(B) indicates that Fenrir can generate operational contexts that cause a DAS to exhibit wildly different execution paths in the face of uncertainty. The diversity of these traces enables the DAS developer to determine focal points in the DAS functional and adaptation logic to perform optimizations such as reducing complexity, optimizing target configurations, and repairing erroneous behaviors.

(A) Execution Path 1. (B) Execution Path 2.
Figure 4.6: Unique execution paths.
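As an illustration of how such weighted execution paths can be derived from raw trace data, the sketch below builds an edge-frequency map from a sequence of logging-statement identifiers; the list-of-identifiers trace format is a simplifying assumption rather than Fenrir's actual log layout.

    from collections import Counter

    def build_wcg(trace):
        """Build a weighted call graph from an execution trace, assumed to be an
        ordered list of logging-statement identifiers. Nodes are the identifiers,
        and each directed edge carries the frequency with which one statement
        immediately followed another."""
        nodes = set(trace)
        edge_weights = Counter(zip(trace, trace[1:]))
        # For the distance of Equation (4.1), only the edge set is needed:
        # set(edge_weights.keys()).
        return nodes, edge_weights

    # Example: statement (g) repeats 28 times before (h), (d), and (i) execute.
    trace = ["a"] + ["g"] * 28 + ["h", "d", "i"]
    nodes, edges = build_wcg(trace)
    print(edges[("g", "g")])   # 27 repeated g->g transitions
    print(edges[("g", "h")])   # 1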
4.3.1.1 Threats to Validity

This research was intended to be a proof of concept to determine the feasibility of using DAS execution trace data to examine how the DAS reacts to different sources of uncertainty. Threats to validity include whether this technique will achieve similar results in other application domains, such as embedded systems. To limit the resulting size of generated trace data, we focused on coverage points in the DAS, rather than providing full branch coverage. As a result, additional logging statements may be required to provide a full representation of DAS run-time behavior.

4.4 Related Work

This section presents related work on code coverage, methods for automatically testing distributed systems, and automatically exploring how uncertainty impacts software requirements.

4.4.1 Code Coverage

Code coverage testing provides a measure of assurance by defining a metric by which the different branches of code are verified and validated. Moreover, code coverage can be used to determine how uncertainty impacts a DAS codebase in terms of unexpected adaptations or unwarranted behaviors. Chen et al. [18] previously proposed code coverage as a method for enhancing the reliability of a software system while testing is being performed, as a system may contain latent errors in areas of code that have not been exercised, even following a successful testing phase. However, instrumenting a software system to provide code coverage analysis is non-trivial, as each separate branch of execution must be examined for faults. Previously, Tikir and Hollingsworth [102] introduced a technique for dynamically adding and removing logging calls in a codebase to ease the impact of instrumenting code on the developer. Furthermore, logging optimizations were introduced to reduce the total number of instrumentation points as well as the expected overhead incurred by logging. While each of these approaches facilitates code coverage testing, Fenrir instead targets code segments that specifically cause the DAS to follow different paths of run-time execution. These segments include branching paths and self-reconfiguration code statements, and can provide a finer-grained representation of DAS behavior throughout execution.

4.4.2 Automated Testing of Distributed Systems

Distributed systems are generally composed of processes or applications that send and receive data asynchronously. Distributed systems are similar to DASs in that they are complex software systems that require constant communication to function properly. As a result, distributed systems can exhibit a staggeringly large number of possible execution paths. Sen and Agha [98] previously introduced the notion of concolic execution, or simultaneous concrete and symbolic execution, to determine a partial order of events that occur throughout system execution. Concolic execution has been proven to exhaustively find all possible execution paths of a distributed system. In the DAS domain, concolic execution could be used to examine the space of possible DAS adaptations and adaptation triggers, enabling an in-depth analysis of DAS behavior in uncertain environments. Conversely, Fenrir does not exhaustively examine all possible execution paths, instead opting to examine a representative set of behaviors exhibited by executing a DAS under disparate environments. While concolic execution can find all possible paths of execution, Fenrir can be used to determine a DAS's reaction to different operating contexts and sources of uncertainty.

Automated fault injection is an approach to facilitate fault tolerance by subjecting a distributed system to problems at run time, thereby enabling a system to provide mitigation behaviors to continually satisfy its requirements [26] and ensure dependability [97].
Fenrir instead explores how a DAS handles faults by examining its reaction to various types of operational contexts, rather than directly injecting faults for the system to mitigate.

4.4.3 Automatically Exploring Uncertainty in Requirements

Uncertainty impacts software at all levels of abstraction, including the requirements level. Ramirez et al. [86] proposed Loki as an approach to create a set of unique operating conditions that impact a DAS, thereby attempting to uncover latent or unexpected errors in the controlling requirements specification. This task is accomplished by using novelty search to generate conditions that cause the DAS to exhibit different behaviors throughout execution. Fenrir extends Loki by exploring uncertainty at the code level, specifically examining the path of execution followed by the DAS in response to different types of uncertainty.

4.5 Conclusion

This chapter described Fenrir, a design-time approach for providing code-level assurance for a DAS in the face of uncertainty. Fenrir leverages novelty search [64] to automatically generate a diverse set of operational contexts that can affect how a DAS executes. Specifically, Fenrir introduces logging statements into the DAS codebase to determine when and how often sections of code are executed. Fenrir calculates the distance between execution paths by translating program logs into WCGs and then measuring the difference in nodes and edges to determine a resulting novelty value. Using novelty search, Fenrir searches for a set of operational contexts that provide a broad range of execution traces, effectively exercising the DAS to determine if it exhibits undesirable behaviors or contains inconsistencies between its requirements and implementation. Fenrir was applied to the RDM application and subjected to uncertainties in the form of dropped or delayed messages and network link failures. Experimental results determined that Fenrir can generate operational contexts that are more novel than those found by random search, and moreover can generate the set of contexts in an efficient manner.

Future directions for this work include application to additional domains, investigating the use of other types of distance metrics for the novelty value calculation, and exploring how other search-based or evolutionary algorithms can be leveraged to explore the operational context solution space.

Chapter 5

Run-Time Testing of Dynamically Adaptive Systems

This chapter proposes a run-time testing feedback loop that provides online testing assurance for continual satisfaction of DAS requirements [44]. First, we motivate the need for run-time testing in DASs. Next, we present MAPE-T, a feedback loop to enable run-time testing. For each element of the MAPE-T loop, we describe in detail the key objectives, expected functionalities, and key challenges that we anticipate will manifest. Moreover, we describe enabling technologies that could be used or extended to realize each element. We also highlight an example of each component within the context of the SVS application. Next, we overview related work. Finally, we summarize our proposed framework.

5.1 Motivation

While testing a DAS during the design and implementation phases of the software development cycle addresses verification and validation for a DAS, the derived test cases are often static in nature and may not adequately assess unexpected system or environmental conditions [20, 61, 68, 93, 94, 112].
Furthermore, test cases that have been derived for anticipated environments may quickly lose relevance as the DAS self-reconfigures to address fluctuations within the environment. For instance, test cases that have been derived to test an SVS under conditions in which large objects must be avoided are only applicable if those large objects exist within the current operational context. Otherwise, executing this particular test case is superfluous and unnecessarily expends system resources.

Traditional testing techniques provide assurance that the DAS is satisfying its requirements by identifying and testing the DAS against anticipated operating conditions. However, it is often infeasible for an engineer to identify all possible conditions that the DAS may face over its lifetime, leading to possible requirements violations and system failures as a result of untested DAS execution states. Extending traditional testing techniques to the run-time domain enables a DAS to determine if the operating context will negatively affect DAS operation and, if possible, reconfigure the DAS to mitigate the adverse conditions. Moreover, the results of run-time testing could be used to trigger a DAS adaptation, thereby strengthening the DAS against adverse conditions.

Several challenges must be overcome to test a live system as it executes. First, understanding how and when to generate test cases is a concern, as providing a representative set of test cases is necessary when validating a system. Second, determining when to test is a concern that impacts DAS performance, as testing activities may consume system resources required for normal DAS operation. Third, it is necessary to select an appropriate testing method, as the type of testing can focus on a particular aspect of system validation (e.g., regression analysis or structural testing). Finally, determining the impact of test results, and how to mitigate that impact, is crucial to ensuring the continual satisfaction of DAS requirements, as the execution and results of testing activities may unexpectedly affect DAS operation. Each of these challenges is next described in turn.

5.1.1 Test Case Generation

A test case is intended to determine if a system requirement, objective, or expectation is being satisfied or violated. Specifically, a test case explores how different ranges, boundaries, and combinations of input values can affect the behavior of software modules. IEEE provides a standard definition of a test case [54] that comprises three parameters: an input value, an expected output value, and a set of conditions necessary for the test case to be properly executed.
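A minimal sketch of how such a three-part test case might be represented in code is shown below; the class and field names are illustrative assumptions rather than part of the IEEE standard or the approach proposed in this chapter.

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class TestCase:
        """A test case following the three-part IEEE structure: an input value,
        an expected output value, and the operating conditions required for the
        test case to be executed meaningfully."""
        description: str
        input_value: Callable[[], float]          # how to obtain the measured value
        expected_output: float
        operating_conditions: Callable[[], bool]  # preconditions for a valid run

        def run(self):
            if not self.operating_conditions():
                return "not applicable"           # conditions not met; skip
            return "pass" if self.input_value() == self.expected_output else "fail"

    # Hypothetical SVS example: the camera must sense objects at 1.0 m.
    camera_range_test = TestCase(
        description="Camera has an effective sensing distance of 1.0 m",
        input_value=lambda: 1.0,                  # stand-in for a real measurement
        expected_output=1.0,
        operating_conditions=lambda: True,        # object placed 1.0 m from the SVS
    )
    print(camera_range_test.run())                # pass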
Table 5.1 presents an example of an SVS test case as defined according to the IEEE standard. In this example, a test case has been defined to monitor the effective sensing radius of a camera sensor. According to the requirements specification, the camera must be able to detect objects within 1.0 m. To this end, the input value for this test case is a measurement of the camera's sensing distance. As the resulting value must be 1.0 m for this test case to pass, the expected output value is also defined to be 1.0 m. Lastly, to successfully execute this test case, a measurable object must be within sensing range of the SVS camera sensor (i.e., at most 1.0 m away).

Table 5.1: Example of test case as defined according to IEEE standard [54].

    Parameter               Value
    Test Case               Camera has an effective sensing distance of 1.0 m
    Input Value             Measured sensing distance
    Expected Output Value   1.0 m
    Operating Conditions    A measurable object is placed 1.0 m away from the SVS, and the camera sensor is enabled and able to sense external objects.

Test cases such as these are often defined at design time, when environmental conditions are static in nature, and do not take into account the impact that a changing operating context can have on a DAS. To facilitate run-time testing, traditional test cases must be extended to provide adequate flexibility in the input conditions, expected output values, and environmental conditions that define the test case. Figure 5.1 presents a block diagram comparing the standard test case definition and our proposed extensions necessary to facilitate run-time DAS testing. Specifically, Figure 5.1(A) demonstrates a standard test case, comprising an input value, a corresponding output value, and the environmental conditions necessary to execute the test case. Conversely, Figure 5.1(B) presents an adaptive test case. An adaptive test case introduces flexibility in terms of the input value (i.e., α) and expected output value (i.e., β). As environmental conditions change, the input value and expected output value may require an update to properly represent the new operating context.

For example, consider the lens of a camera on an SVS. During normal operation, the camera can sense objects within a range of 1.0 m; however, during its latest cleaning cycle, the SVS collided with an object that scratched the camera lens, thus reducing its range of visibility to 0.75 m. A predefined safety tolerance within the requirements specifies that the SVS can safely operate as long as the camera can sense objects within a radius of 0.5 m, thereby allowing continued operation of the SVS with its reduced capability. An adaptive test case that tests the sensing range of the SVS could therefore update its expected sensing range (i.e., expected output value) to 0.75 m, as opposed to its initial definition of 1.0 m. Given that the SVS is still satisfying its requirements, this test case should continue to pass. The impact of making such an adaptation is minor, as the safety tolerance ensures that the SVS still continues to satisfy its safety requirements. However, the SVS can experience requirements violations, as its utility functions may be rigidly defined in expecting the initial sensing range. These violations may be reduced by applying RELAX as specified in Chapter 3.

(A) Standard test case: an input value, expected output value, and environmental conditions feed test case execution. (B) Adaptive test case: the input value and expected output value are extended with tolerances α and β, respectively.
Figure 5.1: Comparison between standard test case and adaptive test case.
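Extending the earlier sketch, the fragment below adds the kind of run-time adjustment suggested by Figure 5.1(B), where the expected output may be adapted but never beyond a safety bound. The 0.5 m safety tolerance mirrors the SVS example above, while the class, field names, and tolerance check remain illustrative assumptions.

    from dataclasses import dataclass

    @dataclass
    class AdaptiveTestCase:
        """An adaptive test case in the spirit of Figure 5.1(B): the expected
        output may be adjusted at run time (a beta-style adjustment), but never
        beyond a predefined safety bound."""
        description: str
        expected_output: float
        safety_bound: float      # hard limit that adaptation must never cross
        tolerance: float = 0.0   # alpha-style allowed deviation of the input

        def adapt_expected_output(self, new_value):
            """Apply an adjustment, rejecting adaptations that violate safety."""
            if new_value < self.safety_bound:
                raise ValueError("adaptation would violate the safety tolerance")
            self.expected_output = new_value

        def run(self, measured_value):
            return abs(measured_value - self.expected_output) <= self.tolerance

    # SVS example: sensing range degrades from 1.0 m to 0.75 m after lens damage,
    # while the safety requirement demands at least 0.5 m of visibility.
    test = AdaptiveTestCase("Camera sensing range", expected_output=1.0,
                            safety_bound=0.5, tolerance=0.05)
    print(test.run(0.75))            # False: fails against the original expectation
    test.adapt_expected_output(0.75)
    print(test.run(0.75))            # True: still within the safety tolerance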
5.1.2 When to Test

Testing non-adaptive systems occurs throughout the software development cycle, including the design, requirements, integration, and maintenance phases. However, testing efforts may decrease once the software has been released and may be restricted to performing regression testing on bug fixes or code updates to ensure that new or modified functionality does not have an adverse effect on the system. Moreover, post-deployment testing tends to occur on isolated or cloned systems to ensure that there is not a negative impact on the production environment.

Conversely, testing a DAS following deployment must be conducted at run time on a live system, and often in response to changes in the operating context. Given the need to perform run-time testing on DASs, traditional testing techniques must be augmented to consider an appropriate time to execute the test cases to prevent adverse performance effects on the system under test. Furthermore, assurance that run-time testing activities do not negatively impact DAS performance must also be provided. For example, continually testing an SVS can impact its ability to perform its intended functionality and process environmental data gathered from its sensors. Specifically, images gathered by the camera may require intensive processing to distinguish and categorize objects within its field of view. As such, executing a set of test cases concurrently with the image processing can overextend the on-board processor, thereby reducing performance for both operations. Determining an appropriate schedule for testing is therefore a necessity for run-time testing to be feasible.

5.1.3 Testing Methodology Selection

Many established methods exist for performing software testing at design time, with each verifying and validating a system in different ways. These include black-box, white-box, equivalence class, and boundary testing. Black-box testing [82] examines system functionality without having an intimate knowledge of the internal workings of the system, whereas white-box testing [82] examines functionality based upon prior knowledge of the system. Equivalence class testing [14] groups test cases that exhibit similar system responses and then executes a single item from each group to minimize the length of the testing procedure. Finally, boundary testing [14] defines test case input values at and beyond specified limits to determine the system's response. Each of these testing strategies examines different aspects of system behavior and performance, and moreover can be extended to provide run-time testing capabilities. However, assurance must be provided that run-time testing will not negatively impact the expected functionality and behavior of the system under test. To this end, monitoring run-time requirements satisfaction and only performing testing during periods in which the DAS is not actively satisfying safety or failsafe requirements can assist in this objective.

5.1.4 Impact and Mitigation of Test Results

A failed test case in traditional systems often indicates a requirements violation, typically necessitating an update to the software system and further testing. Common repair strategies require a detailed investigation into the cause of the failure and an accompanying resolution that may include code refactoring, patch generation, or bug fixing. Following the repair attempt, the system must then be re-validated to ensure that the resolution strategy fixed the problem and did not introduce any new problems.

Conversely, a DAS can leverage monitoring and testing information to determine if a self-reconfiguration is needed to prevent a requirements violation from occurring. Specifically, the DAS can correlate information gained from monitoring itself and its environment with the results of run-time testing to determine an appropriate mitigation strategy. For instance, the DAS can select a new configuration or require that a live patch be applied to introduce a new configuration to mitigate an unexpected environmental situation.
An example of this behavior is that an SVS may determine that a failing test case is due to a fault within an internal sensor tasked with monitoring sensor health. As a result, self-reconfiguration may not be possible; however, a software patch can be applied to the internal sensor to resolve the problem.

5.2 Introduction to the MAPE-T Feedback Loop

This chapter introduces MAPE-T, a feedback-based approach for implementing run-time testing in a DAS. MAPE-T is inspired by the MAPE-K autonomic feedback architecture that often guides DAS development [61]. Specifically, MAPE-T provides components for Monitoring, Analyzing, Planning, and Executing test cases at run time, with each component linked by Testing Knowledge.

Thesis statement. Automated run-time assurance can be facilitated by a testing feedback loop that monitors, analyzes, plans, and executes test cases during execution.

To facilitate run-time assurance, the MAPE-T loop has four main components. Monitoring observes and identifies changes within the DAS and in its environmental context. This information is then transferred to the Analyzing component, which in turn identifies individual test cases that require adaptation and creates a test plan specific to the current operating conditions. Next, the Planning component performs adaptation on test case input values and expected output values and then schedules the execution of the test plan. Then the Executing component performs the test plan and analyzes its results. Test case results can feed back into the DAS MAPE-K loop to be used in determining if the DAS requires reconfiguration. Lastly, Testing Knowledge ensures that each element has full awareness of all testing activities.

5.2.1 MAPE-T Feedback Loop

We now describe each component of the MAPE-T feedback loop in detail, identify challenges to realizing each component, discuss candidate enabling technologies that could be leveraged, and present an example of each component within the context of the SVS case study. Figure 5.2 provides a graphical representation of the MAPE-T feedback loop, comprising the Monitoring, Analyzing, Planning, and Executing components, each of which is linked together by Testing Knowledge.

Monitoring: observe contextual change. Analyzing: DAS component/test selection; identify test adaptation points. Planning: test scheduling; input/output adaptation. Executing: test evaluation; result analysis; outcome adjustment; adaptation triggers. All components are linked by Testing Knowledge.
Figure 5.2: MAPE-T feedback loop.
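One possible structural sketch of the loop in Figure 5.2 is shown below; the class names, method signatures, and the simple sequential control flow are illustrative assumptions rather than a prescribed implementation of MAPE-T.

    # Illustrative skeleton of a MAPE-T style testing loop: monitor the DAS and
    # its environment, analyze which test cases remain relevant, plan when and
    # how to run them, execute the plan, and share results via testing knowledge.

    class TestingKnowledge:
        """Shared state linking the four components (test cases, context, results)."""
        def __init__(self, test_cases):
            self.test_cases = test_cases
            self.current_context = {}
            self.results = []

    class MapeT:
        def __init__(self, knowledge, monitor, analyze, plan, execute):
            self.knowledge = knowledge
            self.monitor = monitor      # callable: () -> operating context
            self.analyze = analyze      # callable: (context, test_cases) -> relevant tests
            self.plan = plan            # callable: (relevant tests) -> ordered test plan
            self.execute = execute      # callable: (test plan) -> list of results

        def iterate(self):
            """Run one pass of the MAPE-T loop and record the outcomes."""
            context = self.monitor()
            self.knowledge.current_context = context
            relevant = self.analyze(context, self.knowledge.test_cases)
            test_plan = self.plan(relevant)
            results = self.execute(test_plan)
            self.knowledge.results.extend(results)
            return results  # may also be fed back to the DAS MAPE-K loop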
5.2.2 Monitoring

The MAPE-T monitoring component provides run-time information regarding both the DAS and its environment. Monitoring is both an internal and an external process, as it introspects upon DAS components to provide data on system execution and measures properties of the execution environment to determine external factors impacting the DAS.

5.2.2.1 Key Challenges

The difficulties in monitoring the operating context of the system executing MAPE-T are similar to those encountered within the monitoring component of the MAPE-K feedback loop [61]. The following monitoring challenges are applicable to both feedback loops:

• Which properties of the DAS and its operating context must be observed.
• Which sensors or aggregations of sensor values can measure the observed properties.
• What frequency for sampling sensors or sensor values is appropriate.
• How uncertainty in the system (i.e., imperfect or unreliable sensors) or environment (i.e., unexpected operating conditions) affects sampled sensor data.

The MAPE-T monitoring component must address each of these concerns to ensure that all processes within the MAPE-T feedback loop are provided with accurate and appropriate information. For example, an SVS is required to sample an array of sensors to execute a path planning algorithm appropriate to its environment. As such, an appropriate sampling frequency must be determined for each sensor, and the sampled values must be verified to ensure that faulty data resulting from system and/or environmental uncertainties is not introduced into the loop.

5.2.2.2 Enabling Technologies

As MAPE-T is based upon MAPE-K, we envision that the monitoring component will reuse the infrastructure already in place for monitoring a DAS in the MAPE-K architecture. To this end, MAPE-T and MAPE-K will depend upon the same monitoring infrastructure that provides information regarding DAS components and processes, as well as monitoring sensors that provide feedback on environmental conditions. Additionally, the MAPE-T architecture must also observe the success and failure rates of test cases at run time. Moreover, the relevance of individual test cases to different operating conditions must also be observed to ensure that only necessary test cases are executed. As a result, enabling technologies must support the traditional MAPE-K monitoring requirements as well as the following:

• Test case instrumentation such that the DAS is continuously aware of each individual outcome.
• Correlation between test cases and particular operating contexts to determine test case relevance.
• Traceability between requirements and test cases to ensure that changing requirements also facilitate updates to test cases.

5.2.3 Motivating Example

We now provide a motivating example of the monitoring component of the MAPE-T feedback loop. For this example and all following motivating examples, the SVS is performing a mode change in which it previously executed the RANDOM path plan for 10 seconds and must now transition to a 20 second SPIRAL path plan.

Given that the SVS has been modeled as a DAS, a monitoring infrastructure already exists for monitoring system and environmental conditions. This feature of the DAS MAPE-K loop can be reused and extended to provide the necessary infrastructure required by MAPE-T. Upon notification from internal monitoring that the SVS has satisfied a 10 second path execution timer, the SVS begins its transition to a new path plan (i.e., the SPIRAL path plan). Before doing so, a run-time testing cycle must be performed due to the change in operating context.

5.2.4 Analyzing

The analyzing component focuses on generating an appropriate software test plan comprised of test cases that can be adapted at run time. As the DAS self-reconfigures or environmental conditions change, the test plan can contain test cases that are irrelevant to the current operating context and therefore cannot be properly executed. By analyzing current conditions, MAPE-T can determine which test cases are relevant (i.e., valid for execution) and thereby generate an appropriate software test plan conducive to the operating context.
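To make the notion of test case relevance concrete, the sketch below filters a test suite against the current operating context using per-test applicability predicates, while always retaining safety and failsafe tests. The predicate-based representation and the SVS test names are illustrative assumptions, not the analysis mechanism prescribed by MAPE-T.

    # Illustrative analysis step: select the test cases that are relevant to the
    # current operating context, always retaining safety and failsafe tests.

    def build_test_plan(test_cases, context):
        """Return the subset of test cases valid for execution in this context."""
        plan = []
        for test in test_cases:
            if test.get("safety", False) or test["applies_to"](context):
                plan.append(test)
        return plan

    # Hypothetical SVS test cases keyed by the active path plan.
    test_cases = [
        {"name": "random_path_duration", "safety": False,
         "applies_to": lambda ctx: ctx["path_plan"] == "RANDOM"},
        {"name": "spiral_path_duration", "safety": False,
         "applies_to": lambda ctx: ctx["path_plan"] == "SPIRAL"},
        {"name": "obstacle_failsafe", "safety": True,
         "applies_to": lambda ctx: True},
    ]

    context = {"path_plan": "RANDOM"}
    print([t["name"] for t in build_test_plan(test_cases, context)])
    # ['random_path_duration', 'obstacle_failsafe']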
5.2.4.1 Key Challenges

Three key challenges must be addressed in order to implement this component. First, important changes within the DAS and its environment must be properly identified to determine if a contextual change has occurred. Second, the previously identified changes must then be analyzed to determine their impact on the current test plan, including the relevance and precision of test cases and their parameters, respectively. Finally, the analyzing component must continually update traceability links between individual requirements, test cases, and the DAS implementation.

5.2.4.2 Enabling Technologies

The analyzing component will leverage and possibly extend existing techniques from formal methods [36], run-time requirements adaptation [8, 85, 94], and non-functional requirements satisfaction [39, 108]. For example, an approach for verifying component-based systems by instrumentation via a set of Labeled Transition System (LTS) properties has been proposed [36], where LTS is a formal method for specifying a set of states and transitions. Each transition is then labeled with a particular action. By formalizing a DAS's state space, it can be possible to monitor relevant artifacts within a system in an efficient manner.

Requirements are typically elicited at design time and are developed for a set of target operating contexts that may become violated over time as the system or environment changes. However, requirements can be treated as objects to be satisfied at run time to counter environmental uncertainty that affects a DAS [94]. One approach to elevating requirements to run-time entities leverages goal- and architecture-based reflection [8]. If requirements are considered first-class artifacts in a system, then run-time reasoning [85] can be applied to consider trade-offs that may occur if the satisfaction of one requirement impedes the satisfaction of another. Similarly, we propose that test cases should also be converted to first-class entities in a system to enable run-time reasoning within testing activities.

Lastly, non-functional requirements must also be considered when designing a requirements-aware DAS [94], given that non-functional requirements impose important constraints on how a system delivers acceptable behavior. These constraints typically involve performance, efficiency, or usability concerns. Filieri [39] and Villegas [108] have both proposed methods for quantifying non-functional requirements. This quantification can then enable non-functional requirements to also be treated as first-class entities, enabling system functionality to evolve alongside system behavior during execution.

5.2.4.3 Motivating Example

We now provide a motivating example of the analyzing component of the MAPE-T feedback loop, following the example presented in Chapter 5.2.3. Upon receiving control from the monitoring component, the analyzing component must determine which test cases are relevant to the current operating context. As the most recently completed path plan was RANDOM, test cases that validate the length of time for which it was executed can be selected, as well as any other test cases that relate to the RANDOM path plan. This adaptation can be represented in LTS as a transition between DAS modes, where test cases associated with each state can be selected for execution. Moreover, all safety and failsafe test cases are also selected, as they must continually be re-validated.

5.2.5 Planning

The planning component accepts the set of test cases identified by the analyzing component and determines an appropriate schedule for executing the test plan.
Specifically, planning must identify a period of time that is conducive to testing and ensure that DAS performance and behavior are not adversely impacted by the execution of the test plan. Furthermore, planning is also concerned with facilitating the adaptation of test case inputs and expected outputs, as adaptation points can be defined in the planning component to be used as a basis for later adaptation.

5.2.5.1 Key Challenges

We foresee four key challenges inherent in planning run-time test execution. First, the performance cost of test case adaptation and its side effects must be rigorously analyzed, as the overall impact of running a test plan must be known. Specifically, executing and adapting test cases must not cause interference or delays in the overall satisfaction of DAS requirements. Second, automatically detecting periods of low activity within the DAS can provide information regarding the proper timing for test plan execution. Third, defining the conditions that trigger input and output value adaptation must also be considered a key challenge. Finally, extending standard testing approaches to consume run-time monitoring information for adaptation purposes is another challenge to be overcome.

5.2.5.2 Enabling Technologies

The planning component can benefit from techniques for continuous testing, search-based software engineering, using models as run-time entities, and test scheduling and planning.

Continuous testing is an approach for performing testing activities on a system at run time. The setup and maintenance of test case parameters, test scheduling, and selection of test cases to be executed are concerns that must be addressed while performing continuous testing. Along these lines, Saff and Ernst have studied the effects of continuous testing by performing online regression testing to aid software developers as they write code [91, 92]. In this case, spare processor cycles were used to execute regression test cases to verify that the code updates made by the software developers continually satisfied requirements while the system was executing. Similarly, Nguyen et al. [78] implemented run-time testing via distributed agents that execute tests at run time. Specifically, a dedicated, automated testing agent coordinates the testing of other agents within the distributed system. In this case, testing is performed by two types of agents: monitoring and testing. The monitoring agent detects and reports faults found within the system. The testing agent generates test suites to be executed by other system agents to provide continuous testing throughout execution.

Test cases derived for a system generally provide a suitable starting point for continuous testing; however, the test cases are often static in nature and can therefore lose relevance as the system experiences unexpected operating conditions. As such, test cases must adapt along with the DAS. To address this concern, the planning component can supplement the initial suite of test case parameters with run-time monitored values [67] to accurately test the conditions that the DAS experiences throughout its lifetime. To this end, search-based techniques, such as online evolutionary computation [43], can be leveraged to explore optimal test suite parameter configurations. Furthermore, search-based techniques can also facilitate adaptive test case generation.
In particular, machine learning techniques that apply probabilistic methods [4, 15, 38, 84], explore points of failure [86], or extend current methods with evolutionary computation [69] can be considered for assisting in planning run-time testing.

Leveraging models at run time provides another dimension for enabling run-time requirements satisfaction. Baresi et al. [4] have proposed methods for validating software compositions at run time with an assertion language that describes both functional and non-functional system properties. Morin et al. [76] have proposed an approach that considers a DAS to be composed of software product lines (SPLs), where adaptations are represented as different configurations of SPLs. To this end, the goal of SPL design is to share a common codebase between different projects to enable code reuse. Trade-offs in balancing objective satisfaction must also be considered at run time by reasoning over the models as they experience both system and environmental uncertainty [33].

Scheduling test case execution is another concern for the planning component of MAPE-T. Test case scheduling requires a careful balance between two competing concerns: maximizing the utilization of system resources while minimizing adverse side effects experienced by the DAS. As described in the continuous testing approaches [91, 92], the planning component of the run-time testing framework can execute test cases as spare processor cycles become available. Likewise, the DAS can leverage selective testing strategies to filter extraneous tests that would otherwise consume needed resources. This concern is exacerbated within onboard embedded systems, where resources tend to be constrained to limited processing or memory capabilities.

Lastly, selecting an appropriate subset of test cases to be executed can be considered a multidimensional optimization problem, where competing concerns can be represented by system performance, test case execution scheduling, and adaptation penalties that may occur. Furthermore, there is still a large challenge in quantifying each of these concerns, as many comprise non-functional characteristics. As such, it may be possible to use multidimensional optimization algorithms, such as NSGA-II [27], to balance these competing concerns when selecting test cases. Moreover, extending NSGA-II to run either online or in a separate evolutionary agent can further enable run-time test case selection.

5.2.5.3 Motivating Example

We now provide a motivating example of the planning component of the MAPE-T feedback loop, following the example presented in Chapter 5.2.3. After the analyzing component has completed its tasks, the planning component of MAPE-T analyzes the current state of the SVS to determine if run-time testing can be physically performed, considering performance and memory restrictions that may be imposed by the limited processing capabilities of the on-board controller. Once the planning component determines that testing can be safely performed, a run-time test plan is generated comprising all test cases relevant to the current operating context.
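The sketch below illustrates one simple scheduling policy consistent with the planning concerns above: run queued test cases only while the DAS reports spare capacity and no active safety-critical task. The threshold value, function names, and polling scheme are illustrative assumptions rather than part of the MAPE-T proposal.

    # Illustrative planning policy: drain the test plan only during low-activity
    # periods, deferring execution whenever the DAS is busy or handling a
    # safety-critical task.

    def schedule_test_plan(test_plan, get_cpu_load, safety_active, run_test,
                           cpu_threshold=0.6):
        """Execute queued tests opportunistically; return (executed, deferred)."""
        executed, deferred = [], []
        for test in test_plan:
            if safety_active() or get_cpu_load() > cpu_threshold:
                deferred.append(test)          # try again on a later loop iteration
            else:
                executed.append((test, run_test(test)))
        return executed, deferred

    # Hypothetical usage with stubbed-out system probes.
    loads = iter([0.4, 0.8, 0.3])
    executed, deferred = schedule_test_plan(
        test_plan=["spiral_path_duration", "camera_range", "battery_level"],
        get_cpu_load=lambda: next(loads),
        safety_active=lambda: False,
        run_test=lambda name: "pass",
    )
    print([name for name, _ in executed])  # tests run while load stayed below 0.6
    print(deferred)                        # tests deferred during the busy period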
5.2.6 Executing

The executing component performs the test plan according to the previously determined schedule, analyzes the results of each test case, and triggers any necessary adaptations in the testing framework or DAS. Following execution of the test plan, each individual test result is analyzed to determine if adaptation is required. A test case adaptation is necessary if the test case is no longer relevant to its environment (i.e., the input value and expected output value are no longer correct for the current operating conditions). Note that we are not suggesting that test cases be adapted to ensure that they always pass; instead, they are adapted to ensure that they remain applicable targets for execution. Finally, the test results may also be delivered to the DAS MAPE-K loop to provide extra information regarding DAS behavior and performance, which may result in reconfiguration of the DAS.

5.2.6.1 Key Challenges

Three key challenges exist for the executing component. The first is promoting both requirements and test cases to first-class entities in the DAS. Specifically, distilling requirements, especially non-functional requirements, and test cases into quantifiable entities is a non-trivial task. Moreover, providing adaptation capabilities for test cases is also non-trivial, as any flexibility must be guaranteed not to exceed any safety or failsafe tolerance. The second challenge is determining an appropriate amount of adaptability for each test case, as excess plasticity can cause adverse effects on the DAS. The third challenge is determining and adapting the acceptance rate for test cases, as not all test cases can necessarily be executed at the same time.

5.2.6.2 Enabling Technologies

A DAS must be able to continually deliver acceptable behavior and satisfy requirements even as its operating context changes or requirements are updated or added. To this end, Souza et al. [99] have proposed eliciting awareness requirements that monitor requirements violations over time and are treated as run-time entities. Once a predetermined number of violations occurs, the failing requirement can be adapted at run time with previously identified parameter changes to improve its chances of future satisfaction. This approach can be modeled as a set of operations over a goal model, further enabling the use of models at run time. Other approaches also use requirements as run-time entities [8, 94] that provide capabilities for reasoning about trade-offs in requirements satisfaction at run time. These requirements-aware systems provide a step towards handling uncertainty in a systematic manner, using system requirements as a basis for quantifying run-time behavior. Additionally, specification languages such as RELAX [21] or FLAGS [5] can reduce the rigidity of requirements, thereby improving the chances of run-time requirements satisfaction.

Run-time testing has yet to gain major traction in production software systems; however, it has been successfully applied within hardware-based adaptive systems [75]. To this end, field-programmable gate arrays are used to provide adaptive capabilities. Run-time testing methods in hardware systems are typically concerned with component and/or system execution failures, and therefore must reconfigure the system as needed [29, 95]. Additionally, these systems are further constrained by embedded CPUs and must consider power optimization and minimization of memory usage in their design. A previous approach used monitoring, redundancy, and a well-defined recovery procedure to facilitate run-time testing of an adaptive hardware system [75].
This approach enables a system to continuously function under adverse conditions without the need for external intervention; however, intervention can be required in the presence of severe or recurring faults. As such, hardware-domain techniques can be reused in software, such as using redundant controllers for monitoring and correcting faults at run time [78].

5.2.6.3 Motivating Example
We now provide a motivating example of the executing component of the MAPE-T feedback loop, following the example presented in Chapter 5.2.3. Upon scheduling the test plan, the executing component runs the previously selected test cases and analyzes their results. Test case results are correlated with the controlling SVS goal model (see Figure 2.2) to determine the validity of those results. Failed test cases are analyzed to determine if a run-time adaptation is necessary, as it is possible that the current operating context no longer reflects the initial intent of test cases derived at design time [43]. Control is then passed back to the monitoring component, and the MAPE-T loop repeats.

5.3 Related Work
While testing is widely used throughout the software industry for the purposes of verification and validation, extending testing activities to the run-time domain has only recently been introduced as a viable means of providing assurance. Testing initiatives are often relegated to design-time assurance techniques [9, 52], with common approaches [14, 82] being structural, functional, and regression testing. However, test cases derived at design time are generally static in nature and cannot adequately provide assurance for a system as it experiences unexpected or unanticipated environmental conditions. As software grows in complexity, the need for run-time assurance becomes apparent, as software is increasingly required to remain online while continually behaving as expected.

This section presents related work in providing run-time testing for complex, distributed systems. In particular, we overview how run-time monitoring of system behavior can be used to test systems at run time, as well as how multi-agent systems, a branch of distributed computing, can also be used as a basis for run-time testing.

5.3.1 Exploration of System Behavior
Distributed systems tend to comprise large networks of heterogeneous nodes that must communicate with each other to attain a common goal. As such, validating the behavior of each node and of the system as a whole is a non-trivial task, given the size and complexity of the system, as well as the fact that source code and/or private data may not be available on nodes within the system. To this end, monitoring and exploring system behavior provides a black-box approach to testing a distributed system at run time. Researchers have explored how system behavior can be analyzed to determine if potential faults exist within distributed systems [16, 17] and have introduced DiCE as a framework for providing run-time testing. DiCE automatically explores how a distributed system behaves at run time and checks for deviations from desired behavior. To accomplish this task, system nodes are subjected to a variety of inputs that systematically exercise their behaviors and actions. Furthermore, DiCE operates in parallel to a deployed system so as not to adversely impact the production system.
To systematically explore node behavior, DiCE begins execution from the current state of the distributed system, essentially creating a system checkpoint for its beginning state. DiCE clones this checkpoint to be used as a basis for testing, and then employs concolic execution to explore how different inputs affect node behavior. Concolic execution, a combination of concrete and symbolic execution, uses concrete inputs while symbolically executing a program. In concolic execution, constraints are collected by following code paths and are then fed into a solver that attempts to find an input that negates each constraint. Negating constraints provides DiCE with a set of concrete inputs that, in aggregate, cover the conditions that the distributed system faces at run time.

DiCE was applied to a case study that explored how the border gateway protocol (BGP), the basis for the Internet's inter-domain routing, can experience routing leaks (i.e., hijacking of domain prefixes, such as a widely publicized incident in which traffic routed to YouTube.com was hijacked [11]). DiCE exercised each node's message processing code, resulting in a set of prefix ranges that can be leaked, and therefore filtered by Internet providers, to prevent such an attack.

While DiCE exhaustively explores different paths of execution in parallel to live systems, our run-time approaches to providing assurance focus on exploring the reactions of live systems in a production environment. This method provides the benefit that faults can be identified live within the production system, rather than relying on a secondary, parallel system. However, to enable this method, we stress that caution must be exercised when implementing run-time assurance methods to ensure that, by providing assurance, we do not adversely impact the normal execution behaviors of the running system.

5.3.2 Multi-Agent Systems
Multi-agent systems (MASs) are typically implemented as distributed systems that contain multiple agents that must interact to achieve key objectives for both the agents and the system as a whole. In practice, agents are separate software systems that communicate with each other by passing messages. Furthermore, agents can be located within different host systems, where each host represents a different environment. Agents are reactive in that they can monitor and react to changes in operational context, and are also considered proactive in that they are autonomous and can decide which actions must be performed to achieve their goals. Due to these particular attributes of a MAS, testing is a non-trivial task requiring careful thought and extension of existing techniques.

Run-time testing of MASs requires a framework that supports the autonomous, proactive, and reactive properties of agents within a system. In particular, Nguyen et al. [78] have proposed an approach that uses a set of manually derived test cases complemented with automatically generated test cases. These test cases are then executed continuously at run time by dedicated agents tasked with monitoring and performing testing activities. This framework, called eCAT, uses two types of agents to enable testing: an autonomous tester agent and a monitoring agent. Each of these agents is described in detail next.

5.3.2.1 Autonomous Tester Agent
The autonomous tester agent is tasked with automatically generating and executing test cases.
In particular, this agent can use either random testing or evolutionary mutation (hereafter "evol-mutation") testing to generate test cases. Tester agents use methods for random test data generation [73, 101] to randomly populate test cases. These test cases, in turn, are sent to the software systems under test (i.e., other agents in the MAS) to be executed. It is then the task of the monitoring agent to observe the results of test case execution for each system under test. Random testing may be a cheap and efficient technique for generating test data; however, the resulting test cases often lack meaning and, as such, may not be able to uncover all faults within an agent.

Evol-mutation testing combines mutation testing with evolutionary testing. In particular, mutation testing [30] mutates existing programs (i.e., agents under test) in an attempt to artificially introduce known defects. Moreover, mutation testing has been shown to be a valid replacement for fault injection, as well as a better indicator of fault detection than code coverage [58]. For example, a mutant may be a copy of an existing agent in which an if-statement is slightly modified to check a different value, thereby intentionally introducing a fault. Evolutionary testing [83] uses evolutionary methods to evolve test case parameters in order to achieve a better overall test suite using derived fitness metrics. Combining these two approaches yields evol-mutation testing, which provides the advantage that test cases can be evolved to catch the faults introduced into mutants, rather than relying on static testing methods. The steps required to perform evol-mutation are as follows (a sketch of this cycle is provided after the monitoring agent description below):

1. Preparation: Initially, mutation testing is executed on the agent(s) under test to create a set of mutants. These mutant programs are provided as input to evolutionary testing.
2. Test execution and adequacy measurement: The test cases provided at design time are then executed on all mutants. A fitness value is calculated for each test case based on the ratio of mutants killed by that test case, where a killed mutant is an agent whose output deviates from what is expected, signifying a discovered fault.
3. Test case evolution: An evolutionary process is then applied to generate new test cases by applying crossover and mutation operations to existing test cases.
4. Decision: Lastly, this step determines whether the evolutionary process should end (i.e., the maximum number of generations has been reached) or whether a new set of mutant agents should be created and tested.

5.3.2.2 Monitoring Agent
The monitoring agent is tasked with observing the results of test cases applied to agents under test. In particular, if monitored output deviates from an expected value, then the monitoring agent reports to a controlling entity that the particular agent exhibits a fault. If this agent is a mutant, then it is considered to have been killed. Furthermore, monitoring agents are also tasked with verifying that pre- and post-conditions are met by agents under test to protect the MAS during execution. Lastly, monitoring agents observe agent behavior and provide an execution trace to the autonomous tester agents, including information regarding agent birth (i.e., a new mutant created), agent death (i.e., a mutant killed), and agent interactions.
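The following minimal Python sketch (hypothetical helper names such as make_mutants, run_test, crossover, and mutate; an illustration of the evol-mutation cycle outlined in the numbered steps above, not the eCAT implementation) scores test cases by the fraction of mutants they kill and evolves the population accordingly.

    import random

    def kill_ratio(test_case, mutants, run_test):
        """Adequacy measure: fraction of mutants killed by this test case (Step 2)."""
        kills = sum(1 for mutant in mutants if run_test(test_case, mutant))
        return kills / len(mutants)

    def evolve(test_cases, fitnesses, crossover, mutate):
        """Keep the fitter half of the test cases and refill the population with offspring (Step 3)."""
        ranked = [tc for _, tc in sorted(zip(fitnesses, test_cases),
                                         key=lambda pair: pair[0], reverse=True)]
        survivors = ranked[: max(2, len(ranked) // 2)]
        next_generation = list(survivors)
        while len(next_generation) < len(test_cases):
            parent_a, parent_b = random.sample(survivors, 2)
            next_generation.append(mutate(crossover(parent_a, parent_b)))
        return next_generation

    def evol_mutation(agent, test_cases, make_mutants, run_test, crossover, mutate, generations=10):
        """Evol-mutation: prepare mutants, measure adequacy, evolve test cases, repeat (Steps 1-4)."""
        for _ in range(generations):
            mutants = make_mutants(agent)                                         # Step 1
            fitnesses = [kill_ratio(tc, mutants, run_test) for tc in test_cases]  # Step 2
            test_cases = evolve(test_cases, fitnesses, crossover, mutate)         # Step 3
        return test_cases                                                         # Step 4: generation budget reached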
eCAT has been successfully applied to a case study in which a MAS, BibFinder, is tasked with finding and retrieving bibliographic information from a computer. Compared to random testing and goal-based testing, the continual testing approach of eCAT was able to find more bugs in the system, particularly those that are either very hard to trigger or take a large amount of time to manifest.

Continuous testing of MASs is similar to the MAPE-T feedback loop proposed within this dissertation. However, our approach differs in that MAPE-T defines a feedback-based approach for continuously executing and adapting testing activities. Moreover, MAPE-T is intended to test a production system that does not contain artificial defects or generated mutants, thereby providing assurance for the live system. Finally, the information provided by run-time testing can be used to actively reconfigure the DAS, thereby hardening it against uncertainty.

5.4 Discussion
This chapter has introduced the MAPE-T feedback loop, an approach for automatically providing run-time assurance for a DAS. MAPE-T comprises a set of components that facilitate run-time testing: a monitoring component measures system and environmental conditions, an analyzing component generates a test plan specific to current operating conditions, a planning component performs run-time test case adaptation and schedules the execution of the test plan, and an executing component runs the generated test plan and analyzes its results. Each of these components is linked to the others by testing knowledge, a controlling aspect of MAPE-T that enables the sharing of test-related information across components. Furthermore, this chapter described key challenges to overcome in implementing each component within the MAPE-T loop, and proposed methods for leveraging existing design-time techniques within the run-time domain. The following chapters detail investigations into realizing the components of MAPE-T on a simulated SVS.

Chapter 6
Run-Time Test Adaptation

This chapter introduces our techniques for performing test adaptation at run time for a DAS. Specifically, we explore how both test suites and test cases require adaptation during DAS execution to remain relevant even as system parameters and environmental conditions change. We first motivate the need to perform both types of run-time test adaptation in the face of uncertainty. Next, we describe our techniques for supporting run-time test suite adaptation and test case adaptation, respectively. Proteus (in Greek mythology, a deity that could change his form at will) is a technique for adapting test suites at run time to ensure that the appropriate test cases are selected for online execution. Veritas (in Roman mythology, the goddess of truth) is a technique for performing fine-grained adaptation of test case parameter values to ensure that individual test cases remain relevant to their current operating context. In addition, Proteus acts as a managing infrastructure that facilitates run-time test adaptation by performing coarse-grained test suite adaptation and invoking Veritas as necessary for finer-grained parameter value adaptations. Next, we present our approach for using both Proteus and Veritas at run time to enable adaptive testing. Then, we present results from two separate experiments. The first experiment details how Proteus was used to automatically adapt test suites at run time, and the second experiment details how Veritas was used to automatically adapt test cases at run time.
Next, we explore whether RELAX operators can be applied to run-time test cases to further enhance adaptive testing. Then, we compare and contrast both Proteus and Veritas with other testing initiatives. Finally, we summarize our findings and present future directions for our research into run-time test adaptation.

6.1 Motivation
DAS applications often must continually deliver critical functionality with assurance even as the environment changes, requiring the DAS to self-adapt to mitigate unexpected uncertainties. Moreover, the resulting state of the DAS following a reconfiguration may no longer satisfy its requirements. Performing run-time testing, as described in the previous chapter, adds a layer of assurance that the key objectives and requirements of the DAS are still being satisfied even as the system and environment change. However, the test specification derived for the DAS at design time may lose relevance as the operating context changes. For instance, a test case that tests the sensing range of an on-board camera may become compromised if the SVS collides with a wall, scratching the lens and thereby effectively reducing its sensing range. As a result, the test case may require coarse-grained adaptation by Proteus (e.g., being disabled because the camera is no longer functional) or fine-grained adaptation by Veritas (e.g., the expected sensing range is reduced, yet still valid, and can therefore be adapted to the new sensing range).

6.2 Terminology
This section introduces and defines key terminology relevant to testing that is used throughout this chapter. These terms are defined as follows:
• Test case: A single test to assess all or a portion of a requirement. An IEEE-standard test case comprises an input value, an expected output value, and the conditions necessary to perform the test case [54].
• Test plan: Describes the scope and schedule of testing activities.
• Test suite: A subset of test cases from the test specification. A test suite is typically derived to be executed under a particular operating context.
• Test specification: The set of all possible test cases derived for a system.

6.3 Introduction to Proteus
Proteus is a requirements-driven approach for run-time adaptation of test suites. Specifically, Proteus acts as a managing infrastructure for test adaptation activities performed at run time. To this end, Proteus provides two levels of adaptation capabilities. First, Proteus creates an adaptive test plan for each DAS configuration, as well as a default test suite for each adaptive test plan. An adaptive test plan describes how a set of test suites is used to test a particular operating context, where a test suite comprises a collection of test cases to be executed. From each default test suite, Proteus can selectively enable or disable test cases as necessary to maintain the relevance of each individual test case as well as to optimize run-time testing performance. Second, Proteus determines when fine-grained test case adaptation is required by monitoring testing results. If necessary, Proteus invokes Veritas to search for an optimal configuration of test case parameter values.

Thesis statement. Performing online adaptation of both test suites and test cases enhances the overall relevance of testing activities for a DAS at run time.

6.3.1 Proteus Approach
Proteus is an approach for enabling test suite adaptation at run time.
First, Proteus defines an adaptive test plan for each DAS configuration at design time. Each adaptive test plan contains a default test suite that is executed upon invocation of the DAS configuration at run time. The default test suite, in turn, contains all possible test cases that are relevant to that particular DAS configuration. Run-time adaptation occurs by selectively activating and deactivating test cases to generate new test suites based upon test results, where test cases considered failing are re-executed and test cases considered passing are not. This behavior enables test cases determined to be irrelevant to be adapted at run time (i.e., by Veritas) and, furthermore, conserves system resources by not re-executing test cases that have already passed for the current operational context. For example, suppose the DAS selects a new configuration Ci at run time. Proteus, in turn, activates the associated adaptive test plan ATPi and then executes the default test suite TSi.0. Following execution of TSi.0, Proteus generates a new test suite TSi.1 that reflects the configuration of passed and failed test cases. The following sections describe how Proteus defines test suites and adaptive test plans.

Figure 6.1 presents a graphical depiction of a DAS that has been instrumented with Proteus. Specifically, this figure presents the logical connections between DAS configurations, operating contexts, adaptive test plans, and test suites. For example, DAS configuration C1 is triggered by operational context OC1. When testing begins, Proteus selects the associated adaptive test plan ATP1. ATP1 comprises a dynamically-generated collection of test suites TS1.j, each of which defines a particular configuration of test cases to be executed. Moreover, TS1.0 is considered to be the default test suite for ATP1 and, as such, is executed initially each time ATP1 is selected for testing. The test case parameter values for all test cases within ATP1 are stored in its associated data store Params1. Finally, each test suite TSi.j comprises a collection of test cases, each of which specifies an activation state. Specifically, Proteus selectively activates or deactivates test cases based on test results, and therefore the configuration of test case states defines a unique test suite. The specific definition of a test suite with respect to Proteus is provided in Chapter 6.3.2.

Figure 6.1 also depicts how different testing cycles can generate different test suites within the existing collection of test suites for a particular adaptive test plan. A testing cycle is performed following a DAS reconfiguration, and the testing cycle continues until either all test cases have passed or a new DAS configuration is selected. Specifically, the results from two testing cycles are pictured in Figure 6.1 by different patterns in each TS1.j: the first testing cycle is represented by the darker, right-slanted shading, and the second cycle by the lighter, left-slanted shading. TS1.0, as the default test suite, was executed at the start of both testing cycles for regression purposes. Following execution of the default test suite, both testing cycles generated TS1.1 based on the configuration of test cases. Following execution of TS1.1, the first testing cycle then generated TS1.2, TS1.3, and TS1.4, whereas the second testing cycle generated TS1.5, TS1.6, and TS1.7.

Figure 6.1: DAS configurations and associated adaptive test plans. (Legend: Ci denotes a DAS configuration triggered by operational context OCi; ATPi denotes its adaptive test plan; TSi.0 denotes the default test suite; TSi.j denotes a dynamically-generated test suite; Paramsi denotes the test case parameter data store.)
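To illustrate the organization just described, the following Python sketch (hypothetical names; a simplified model of the Proteus structures rather than their implementation) captures DAS configurations, adaptive test plans, parameter stores, and test suites with per-test-case activation states.

    from dataclasses import dataclass, field
    from enum import Enum

    class Status(Enum):
        ACTIVE = "ACTIVE"        # executed when the test suite runs
        INACTIVE = "INACTIVE"    # previously passed; skipped until the context changes
        NA = "N/A"               # not relevant to the current operating context

    @dataclass
    class TestCase:
        tc_id: str
        expected: float
        safety_tolerance: tuple          # (low, high) bounds that adaptation may not exceed
        invariant: bool = False          # safety/failsafe tests are precluded from adaptation
        status: Status = Status.ACTIVE

    @dataclass
    class TestSuite:
        suite_id: str                    # e.g., "TS1.0"
        test_cases: list                 # the configuration of statuses makes each suite unique

    @dataclass
    class AdaptiveTestPlan:
        atp_id: str                      # e.g., "ATP1"
        default_suite: TestSuite         # TSi.0, executed whenever the plan is (re)selected
        generated_suites: list = field(default_factory=list)
        params: dict = field(default_factory=dict)   # Paramsi: adapted parameter values

    @dataclass
    class DASConfiguration:
        config_id: str                   # Ci, triggered by operational context OCi
        plan: AdaptiveTestPlan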
6.3.2 Test Suite
Test suites used by Proteus contain a set of test cases to be executed at run time, where each test case has an additional status attribute that can be ACTIVE, INACTIVE, or N/A. Specifically, ACTIVE test cases are executed when the test suite is executed, INACTIVE test cases are not executed due to prior testing results, and N/A test cases are not executed because they are not relevant to the current operating context. Test case status is determined by two conditions. First, the operational context as specified by the DAS configuration determines which test cases are relevant to the adaptive test plan. Test cases that are not relevant are designated as N/A and cannot be executed while the current adaptive test plan is selected. For instance, the SVS may be following a SPIRAL path plan while attempting to navigate around obstacles in a room. In this case, test cases related to the RANDOM path plan would be designated as N/A, as they are not relevant to the current situation. Collectively, the set of test cases with their associated statuses forms a unique test suite TSi.j within a particular adaptive test plan ATPi.

The second condition for determining test case status is based on testing results. As run-time performance could be negatively impacted by performing testing activities, Proteus ensures that only necessary test cases are executed at run time. As such, only test cases designated as ACTIVE are executed. A test case is designated as ACTIVE if it has not yet been executed for a given DAS configuration Ci (assuming it is relevant) or if it has previously failed. Test cases that fail are subjected to further adaptation by Veritas in order to realign the test case with current operating conditions. Test cases that monitor safety or failsafe concerns are always designated as ACTIVE and, moreover, are precluded from adaptation. This approach enables Proteus to provide continuing assurance that the DAS is satisfying its safety objectives at run time.

Figure 6.2 provides a detailed view of two sample test suites: TS1.0 and TS1.1. Specifically, the default test suite TS1.0 considers all test cases to be relevant except for TC3, where TC3 will be defined as N/A for all other test suites within ATP1 as it is not relevant to the current operating context. TS1.1 defines TC4 to be INACTIVE, as TC4 has previously passed for this particular test suite and does not require further execution. As such, TS1.1 is generated by Proteus at run time for this particular combination of test case statuses following execution of the default test suite TS1.0.

Figure 6.2: Example of test case configuration for TS1.0 and TS1.1. (TS1.0: TC1 ACTIVE, TC2 ACTIVE, TC3 N/A, TC4 ACTIVE, TC5 ACTIVE; TS1.1: TC1 ACTIVE, TC2 ACTIVE, TC3 N/A, TC4 INACTIVE, TC5 ACTIVE.)
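As a concrete rendering of these relabeling rules, the following Python sketch (hypothetical names, building on the structures sketched in Chapter 6.3.1; not the Proteus implementation) regenerates a test suite from the latest results while keeping safety-related invariant tests ACTIVE.

    def next_test_suite(suite, results, relevant_ids, new_suite_id):
        """Generate the next TSi.j from test results.

        results maps a test case id to True for a valid pass and False for a failure;
        relevant_ids contains the ids relevant to the current operating context.
        """
        new_cases = []
        for tc in suite.test_cases:
            if tc.tc_id not in relevant_ids:
                status = Status.NA            # irrelevant to this operating context
            elif tc.invariant:
                status = Status.ACTIVE        # safety/failsafe tests are always re-executed
            elif results.get(tc.tc_id, False):
                status = Status.INACTIVE      # valid pass: skip until the context changes
            else:
                status = Status.ACTIVE        # failed or not yet executed: run (and adapt) again
            new_cases.append(TestCase(tc.tc_id, tc.expected, tc.safety_tolerance,
                                      tc.invariant, status))
        return TestSuite(new_suite_id, new_cases)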
6.3.3 Adaptive Test Plan
An adaptive test plan comprises a default test suite and a set of dynamically-generated test suites that are relevant to a particular DAS configuration and operating context. At design time, an adaptive test plan contains only a default test suite TSi.0 that specifies each relevant test case to be ACTIVE and each irrelevant test case to be N/A. Following execution of TSi.0, Proteus analyzes the test results to generate a new test suite to be executed next. Test cases that are considered to be invalid (e.g., false negatives or false positives) are labeled as ACTIVE, and test cases that have passed and are valid are labeled as INACTIVE. Invalid test cases are then provided to Veritas for further adaptation.

Testing within a particular adaptive test plan continues until either all test cases have passed (i.e., are labeled as either INACTIVE or N/A) or the DAS has selected a new configuration. When the DAS selects a new configuration, its associated adaptive test plan is activated and Proteus then executes the default test suite for that particular adaptive test plan.

To illustrate Proteus adaptive test plans, consider the execution of the SVS following startup. At the outset, the SVS randomly selects a STRAIGHT path plan to follow while operating under a normal power consumption mode. This configuration of SVS features, defined for a particular set of system and environmental conditions OC1, is considered to represent DAS configuration C1. As such, ATP1 is selected by Proteus for run-time testing and the default test suite TS1.0 is executed. Proteus analyzes the results to determine which test cases should remain active. In Figure 6.2, TC1 and TC2 monitor safety requirements and are therefore required to remain ACTIVE, regardless of the result of test case execution. TC3 was not executed, as it is labeled N/A and therefore not relevant to the current operating context (i.e., OC1). TC4 passed and was therefore designated as INACTIVE until the operating context changes. TC5 failed and therefore remains ACTIVE. Proteus analyzes these results and generates a new test suite TS1.1 that specifies the current status of each test case. For each test case failure that disagrees with its correlated utility function(s) (i.e., the test case is determined to be invalid), Proteus invokes Veritas to further adapt the test case parameter values. Following parameter value adaptation, Proteus stores the parameter values of whichever individual test case (i.e., parent or child) exhibited the larger fitness value to Params1 for use in the next testing cycle.

6.3.3.1 Regression Testing
To continually ensure that the DAS is satisfying its requirements even as the environment changes, Proteus performs three different levels of regression testing. First, test cases that monitor safety and failsafe conditions (i.e., Figure 2.2, Goals (D), (J), (R), and (S)) must always remain active to ensure that they are re-validated when each test suite is executed. Second, test cases that monitor system invariants (i.e., Figure 2.2, Goal (G)) are always considered relevant to all operating contexts and are therefore re-validated within each test suite. Third, test cases that monitor non-invariant goals must be re-validated in the event of a contextual change. Specifically, following reconfiguration of the DAS from C1 to C2, the default test suite of the newly selected configuration (i.e., TS2.0) is executed with all relevant test cases set to ACTIVE. Assuming that a particular test case has been previously validated in a prior DAS configuration, re-executing that test case in the current DAS configuration provides functional regression assurance.
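The three regression levels above reduce to a small re-validation predicate, shown in the following Python sketch (the safety_related, failsafe_related, and system_invariant flags are hypothetical annotations used for illustration, not part of the Proteus implementation).

    def must_revalidate(test_case, context_changed):
        """Decide whether a test case is re-validated in the current testing cycle."""
        if test_case.safety_related or test_case.failsafe_related:
            return True                  # level 1: safety/failsafe tests are always re-validated
        if test_case.system_invariant:
            return True                  # level 2: invariant goals are relevant to every context
        return context_changed           # level 3: non-invariant goals only after a contextual change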
6.4 Introduction to Veritas
Veritas is an evolutionary computation-based approach for fine-tuning test case parameter values to ensure that each test case remains relevant to its operational context. Specifically, Veritas uses an online EA (cf. Chapter 2.5.4) to search for an optimal combination of parameter values for each test case within a pre-defined safety tolerance specified in the test specification. For example, a test case that tests the distance between the SVS and a liquid spill must also provide a safety tolerance. Specifically, the test case may specify that the SVS must detect a spill within 0.5 m; however, any value larger than 0.3 m is still considered safe, and any value below 0.3 m causes a safety violation. Therefore, Veritas cannot adapt the expected value of this test case to be any smaller than 0.3 m.

Veritas uses DAS utility functions to assess test case relevance. While utility functions typically quantify high-level system intent (e.g., maintaining a safe distance between vehicles for an intelligent vehicle system), test cases provide a finer-grained assessment of the behavior of individual features and functions. By correlating utility functions with particular test cases, Veritas can determine whether test cases are valid or invalid at run time and, if invalid, provide an approach for realigning test case parameters with the operational context.

Thesis statement. Adapting test case parameter values aids in providing run-time assurance.

6.4.1 Assumptions, Inputs, and Outputs
In this section, we describe the assumptions that must hold when using Veritas. We then state the required inputs as well as the expected outputs for Veritas when adapting test cases.

6.4.1.1 Assumptions
We assume that a system goal model provides an accurate representation of the key objectives and requirements for a system under test. Moreover, the goal model and its derived utility functions are assumed to have been derived correctly to provide a representative quantification of system behavior. Lastly, we assume that a test specification derived from a goal model provides full coverage of all key system objectives and requirements.

6.4.1.2 Inputs and Outputs
Veritas requires five inputs to properly adapt test case parameters at run time: a goal model of the DAS, a set of utility functions to be used for high-level requirements monitoring [109], an executable specification or prototype of the DAS, a set of monitoring elements, and a test specification that comprises all possible test cases.

First, Veritas requires a goal model of the DAS under test to capture the requirements and key objectives of the DAS. For this chapter, we use the SVS goal model presented in Figure 2.2.

Second, Veritas needs a set of utility functions for high-level requirements monitoring. The utility functions must quantify each goal's performance during DAS execution. An example utility function that measures the satisfaction of power conservation measures for SVS movement (Figure 2.2, Goal (F)) is presented in Equations (6.1) and (6.2):

\[ utility_{Goal\,F} = BatteryDecay \tag{6.1} \]

where

\[ BatteryDecay = \begin{cases} 1.00 & \text{if } BatteryCharge \geq 75\% \\ 0.75 & \text{if } BatteryCharge \geq 50\% \\ 0.50 & \text{if } BatteryCharge \geq 25\% \\ 0.25 & \text{otherwise} \end{cases} \tag{6.2} \]

The returned utility values are then used to determine if a DAS reconfiguration is necessary. For example, Equations (6.1) and (6.2) quantify the current state of the battery.
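A minimal executable rendering of this utility function follows (a Python sketch; the function name is illustrative and not part of the Veritas implementation).

    def battery_decay_utility(battery_charge_pct):
        """Utility for Goal (F): step function over remaining battery charge (Eqs. 6.1-6.2)."""
        if battery_charge_pct >= 75.0:
            return 1.00
        if battery_charge_pct >= 50.0:
            return 0.75
        if battery_charge_pct >= 25.0:
            return 0.50
        return 0.25

    # Example: a reading of 42% battery charge yields a utility value of 0.50.
    assert battery_decay_utility(42.0) == 0.50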
If the remaining charge falls below a predefined threshold, for instance 50%, then the SVS can self-reconfigure to provide reduced power to its wheel motors, thereby conserving battery power while still achieving motion and satisfying Goals (A) and (B). Veritas, in turn, uses utility values to validate test results. Following the prior example, a test case that monitors the SVS's power mode is considered to be valid if the utility value (utility_{Goal F}) is violated and the test case determines that the SVS has not successfully entered a reduced power mode, implying that there is an issue in the DAS reconfiguration mechanism.

Third, an executable specification of the DAS that captures the system and environmental contexts is required. The executable specification is responsible for executing DAS behavior in a simulation environment that can generate and apply all possible operating contexts in which the DAS may operate.

Fourth, a set of monitoring elements is required to monitor and quantify DAS operating conditions, particularly environmental elements (ENV), variables that monitor those ENV elements (MON), and a relationship between each ENV and MON element (REL). For example, while in a power conservation mode, the SVS can reduce power to its wheel motors (ENV). To do so, the SVS queries the wheel motor sensors (MON) to determine how much torque is currently being applied to the wheels. Upon entering the power conservation mode, the SVS commands the wheel motors to reduce the applied torque (REL) to effectively reduce the amount of battery power consumed, thereby extending the amount of time that the SVS can continue to operate.

Fifth, a test specification that provides full coverage of system requirements must be provided to Veritas. Each test case must also be specified as invariant or non-invariant. Invariant test cases are precluded from adaptation and are typically associated with safety or failsafe concerns (e.g., the SVS shutting down power in case of a safety violation). Non-invariant test cases can be adapted at run time and are typically associated with functional or behavioral concerns. For run-time validation, each test case must also be associated with one or more utility functions that provide oversight on test case validity. Table 6.1 provides a subset of example SVS test cases. For instance, Test Case 3 ensures that the SVS object sensor (MON) can detect large dirt particles in the environment (ENV) within a radius of 0.5 m (REL). As such, the expected value of an SVS internal variable (i.e., ObjectSensor.DetectRadius) must equal 0.5 m. If this test case fails, then Veritas checks to determine if an adaptation is necessary, as operating conditions may have changed such that the detection radius of the object sensor is reduced.

Veritas outputs a set of tuples comprising the environmental configuration, system configuration, and test specification configuration, as well as an adaptation trace. The tuple is used to determine the state of the DAS at each point of test execution and adaptation and can be used offline to determine the types of system and environmental configurations that were experienced by the DAS during execution. Moreover, the adaptation trace provides information regarding the contextual changes that triggered testing activities, the results of testing, and the adaptation path of each test case. An example of an output tuple is provided in Figure 6.3.
This figure provides a snapshot of the data found within each element of the tuple ([Environmental Configuration], [System Configuration], [Test Specification], [Adaptation Trace]).

    [Environmental Configuration],
        % Environment configuration information
        Number of liquid spills: 3
        Room width: 10m
        Room depth: 12m
        Downward step: True
        ...
    [System Configuration],
        % SVS configuration information
        Starting battery life: 98.0%
        Starting capacity: 0.0%
        Probability of wheel sensor failure: 5.0%
        ...
    [Test Specification],
        ...
        % TC5 - Network actuator has never partitioned network.
        TC5 expected value: NetworkActuator.NumPartitions = 0
        TC5 safety tolerance: NetworkActuator.NumPartitions ∈ [0, 2]
        TC5 type: non-invariant
        ...
    [Adaptation Trace]
        ...
        % TC5 expected value has been adapted to 1
        Timestep 5: TC5 expected value: 0 -> 1
        ...

Figure 6.3: Veritas example output tuple.

Table 6.1: Examples of SVS test cases.
• Test Case 1: Test case (ENV): test suction tank for large objects. Agent (MON): InternalSensor (IS). Expected value (REL): IS.NoLargeObjects == TRUE. Type: invariant. Accepted value: TRUE. Goal constraint: utility_{Goal D}, utility_{Goal J}.
• Test Case 2: Test case (ENV): if the SVS falls off a cliff, ensure that failsafe mode was enabled and all power disabled. Agent (MON): InternalSensor (IS), Controller (C). Expected value (REL): C.FailSafeActive == TRUE. Type: invariant. Accepted value: TRUE. Goal constraint: utility_{Goal D}, utility_{Goal J}.
• Test Case 3: Test case (ENV): verify large dirt particle detection within 0.5 m. Agent (MON): ObjectSensor (OS). Expected value (REL): OS.DetectRadius == 0.5 m. Type: non-invariant. Accepted value: [0.25 m, 0.75 m]. Goal constraint: utility_{Goal D}, utility_{Goal R}.

6.4.2 Veritas Fitness Functions
This section describes the fitness functions and validation metrics used by Veritas. While test cases typically return a Boolean pass or fail, Veritas returns a measured fitness sub-function value and a test result. A fitness value provides an extra metric for comparing test case relevance, and a particular fitness sub-function is defined for each test case based on its type (i.e., either an exact value or a range of values) as well as its designation as invariant or non-invariant. Table 6.2 provides the fitness sub-functions for each combination of test type and designation.

Table 6.2: Individual test case fitness sub-functions.
• Invariant, exact value: if (value_measured == value_expected) then fitness_measured = 1.0, else fitness_measured = 0.0.
• Invariant, range of values: if (value_measured ∈ [value_low_boundary, value_high_boundary]) then fitness_measured = 1.0, else fitness_measured = 0.0.
• Non-invariant, exact value: fitness_measured = 1.0 − |value_measured − value_expected| / |value_expected|.
• Non-invariant, range of values: if (value_measured ∈ [value_low_boundary, value_high_boundary]) then fitness_measured = 1.0, else fitness_measured = 1.0 − |value_measured − value_optimal| / value_optimal.

The different types of fitness sub-functions are derived based upon the type of variable measured by the particular test case. For those test cases that monitor an exact value, the test's measured value (value_measured) is compared to the test's expected value (value_expected). Variables defined as a range expect that the measured value falls within pre-determined boundary values (i.e., [value_low_boundary, value_high_boundary]).
Moreover, if a range variable falls outside of those boundaries, then the optimal value (i.e., value_optimal) is defined as the boundary nearest to the measured value. Equations (6.3) and (6.4) respectively depict the distance calculation to each boundary and the resulting optimal value:

\[ d_{low} = |value_{measured} - value_{low\,boundary}|, \qquad d_{high} = |value_{high\,boundary} - value_{measured}| \tag{6.3} \]

\[ value_{optimal} = \begin{cases} value_{low\,boundary} & \text{if } d_{low} < d_{high} \\ value_{high\,boundary} & \text{otherwise} \end{cases} \tag{6.4} \]

As such, overall fitness for each test case treats inclusion within the low and high boundaries as an impetus for providing a fitness boost during the evolutionary process. In particular, a measured value that is not equal to the expected value, yet is within the specified range of acceptable values, is rewarded for being a valid result. To this end, aggregate fitness for each test case is calculated as a linear weighted sum comprising the measured fitness sub-function and a fitness boost for validity, where the weights α_measured and α_valid define the relative importance of each sub-function. Aggregate test case fitness is presented in Equations (6.5) and (6.6):

\[ fitness_{test\,case} = \alpha_{measured} \cdot fitness_{measured} + \alpha_{valid} \cdot ValidResult \tag{6.5} \]

where

\[ ValidResult = \begin{cases} 1.0 & \text{if } value_{measured} \text{ is valid} \\ 0.0 & \text{otherwise} \end{cases} \tag{6.6} \]

Test cases are considered passing or failing based upon a fitness threshold value: this dissertation considers a test case with a fitness value of 0.75 or higher to have passed and otherwise to have failed. This threshold value was selected to provide a higher probability that the adaptation process would be triggered, as test case fitness must be relatively high to be considered passing. Given that invariant test cases cannot be adapted, they can only pass or fail, and therefore use only the threshold value to determine the test result. Non-invariant test cases that are considered failed are then adapted by an online evolutionary algorithm (EA).

6.4.2.1 Test Case Validation
Veritas analyzes the results of non-invariant test case execution to determine if adaptation is warranted based on the validity of the test cases. Test case validity is determined by the relationship between the test case and its corresponding utility function, with validity comprising four possible cases:
• True positive: Test case fitness is within [Threshold, 1.0] and the correlated utility value is within (0.0, 1.0], indicating that the test is valid and has passed. No extra action is required by the DAS reconfiguration engine or the Proteus test adaptation framework.
• True negative: Test case fitness is within [0.0, Threshold) and the correlated utility value equals 0.0, indicating that an error has occurred and the test has failed. The presence of a true negative implies that the DAS requires reconfiguration.
• False positive: Test case fitness is within [Threshold, 1.0] and the correlated utility value equals 0.0, requiring both DAS reconfiguration and test adaptation. This result indicates that the test case has passed even though its correlated goal is not satisfied. In this case, the test case and utility value are in disagreement, indicating that the test case and/or the utility function are incorrect.
A DAS test engineer can analyze the false positive and its associated execution trace to determine the source of the error, and in particular whether the test case or the utility function is at fault. (For this work, we assume that all utility functions have been derived correctly; therefore, a false positive indicates an error in a test case parameter value.)
• False negative: Test case fitness is within [0.0, Threshold) and the correlated utility value is within (0.0, 1.0], indicating that the test case requires adaptation. In this case, the test case and utility value are again in disagreement. Given that the utility function is satisfied, the test case requires adaptation to become relevant again. Moreover, a DAS test engineer can analyze the state of the DAS and its environment at this point in time to determine the reason(s) that the test case became irrelevant.

For example, a test case that monitors the health of the SVS's cliff sensors is considered to be valid if its corresponding utility value(s) indicate that the SVS has not enabled a failsafe mode (Goal (J)) and has not fallen down a set of stairs (Goal (D)). However, if this test case is passing (i.e., implying that the cliff sensors are fully functional) and the SVS has fallen down a set of stairs (violating Goals (J) and (D)), then the test case disagrees with its utility functions and is therefore invalid. In this case, either a DAS reconfiguration or a sensor replacement is necessary to rectify the problem. Test cases that are invalid are marked for adaptation.

6.4.2.2 Online Evolutionary Algorithm
Veritas runs the (1+1)-ONLINE EA (see Chapter 2.5.4) on each individual test case selected for adaptation, where the (1+1)-ONLINE EA comprises a population of two individuals: a parent and a child, where the child is created by mutating the parent. Mutation of test case parameters is enabled by updating either the test case's expected value or its range of acceptable values, depending on the type of test case (i.e., exact or range). Following execution and analysis of each individual test case, the individual with the higher fitness value is retained for the next iteration. Veritas considers test cases to be individuals, and an evaluation is considered to be a test case execution. Compared to offline evolutionary approaches, the (1+1)-ONLINE EA gains a significant advantage in its ability to provide evolutionary capabilities at run time by limiting the number of individuals in the population and only performing mutation. However, an exhaustive search of the solution space is precluded by the limited population size of the (1+1)-ONLINE EA. As a result, the (1+1)-ONLINE EA may not be able to converge to a globally-optimal solution.
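As an illustration of this process, the following Python sketch (hypothetical names, reusing the TestCase structure sketched in Chapter 6.3.1; a simplification for a non-invariant, exact-value test case rather than the Veritas implementation) mutates the expected value within the test case's safety tolerance and retains whichever of parent or child achieves the higher aggregate fitness, for example Equation (6.5).

    import random

    def clamp(value, low, high):
        return max(low, min(high, value))

    def mutate_expected(parent, sigma=0.05):
        """Child = parent with a Gaussian-perturbed expected value, clamped to the safety tolerance."""
        low, high = parent.safety_tolerance
        return TestCase(parent.tc_id,
                        clamp(parent.expected + random.gauss(0.0, sigma), low, high),
                        parent.safety_tolerance, parent.invariant, parent.status)

    def one_plus_one_online_ea(test_case, evaluate, iterations=5):
        """(1+1)-ONLINE EA: population of parent and mutated child; keep the fitter individual.

        evaluate(tc) returns an aggregate fitness value; each call corresponds to one
        test case execution. Invariant test cases are never passed to this routine.
        """
        parent, parent_fitness = test_case, evaluate(test_case)
        for _ in range(iterations):
            child = mutate_expected(parent)
            child_fitness = evaluate(child)
            if child_fitness >= parent_fitness:
                parent, parent_fitness = child, child_fitness
        return parent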
Furthermore, we also describe how Veritas calculates a fitness value for individual test cases, how test cases are validated, and, lastly, the process by which Veritas uses the (1+1)-ONLINE EA. To this end, Figures 6.4 and 6.5 present a combined workflow diagram that illustrates how Proteus and Veritas, respectively, are used to enable run-time test adaptation. In particular, Figure 6.4 presents the Proteus technique for coarse-grained test suite adaptation, and Figure 6.5 presents the Veritas technique for fine-grained test case adaptation. As is illustrated by the shading in Figure 6.4 (Steps (9) and (16)), Veritas is a component within the Proteus framework. We now describe each step of Figures 6.4 and 6.5 in detail (a consolidated sketch of this loop follows the list), where bolded items indicate Boolean conditions checked before the step is executed.

Figure 6.4: Proteus workflow diagram.
Figure 6.5: Veritas workflow diagram.

(1) The DAS identifies a change within its operating context and selects a new DAS configuration. Proteus, in turn, consumes this information and activates the associated adaptive test plan for that particular DAS configuration.
(2) Upon activation of run-time testing, Proteus first executes the default test suite, effectively executing all relevant test cases for the particular operational context.
(3) Proteus analyzes the test results to determine if any true negative results exist.
(4) If a true negative exists, then the DAS is notified to perform a self-reconfiguration, as an issue requiring reconfiguration has been identified.
(5) If a true negative does not exist, then Proteus checks whether execution of all test cases resulted in true positives.
(6) If all test results are true positives, then testing is considered complete for the current DAS configuration and testing is halted until a change in operating context occurs.
(7) If at least one test result is not a true positive, then Proteus updates the current test suite. All true positive test cases are labeled INACTIVE, as they do not need to be re-executed until a change in operating context occurs. All remaining test cases are labeled ACTIVE, or valid for execution.
(8) Proteus dynamically generates a new test suite, based upon the default test suite TSi.0 defined for ATPi, to reflect the configuration of ACTIVE and INACTIVE test cases with respect to the current operating context. (Test cases considered irrelevant to the current operational context are labeled N/A.)
(9) Veritas is invoked to determine if any ACTIVE test cases require fine-grained adaptation, the details of which are presented in the following section.
(10) Veritas checks whether test cases in the current test suite TSi.j were previously adapted.
(11) If a test case in the current test suite TSi.j was previously adapted, then Veritas determines which test cases perform better by comparing the calculated fitness values of the current test suite TSi.j and the backup test suite TSi.j′ (see Step 15).
(12) Veritas then stores the better-performing test cases within the current test suite TSi.j.
(13) If no test case in the current test suite TSi.j was previously adapted, then Veritas first determines which test cases require adaptation (i.e., test cases that have failed).
(14) Veritas creates a backup TSi.j′ of the current test suite to provide a basis for comparison between adapted and non-adapted test cases in the following testing cycle, where the backup test suite is an exact copy of TSi.j prior to adaptation.
(15) Veritas executes the (1+1)-ONLINE EA on the test cases selected for adaptation within the current test suite TSi.j.
(16) Proteus executes the current test suite, and the procedure then iterates back to Step (5).
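The following Python sketch (hypothetical names, building on the earlier structure sketches; a consolidation of Figures 6.4 and 6.5 for illustration, not the Proteus/Veritas implementation) expresses the numbered steps above as a single testing-cycle loop. The regenerate_suite callback corresponds to the relabeling rules of Chapter 6.3.2, and the result objects are assumed to expose true_positive and true_negative flags.

    def testing_cycle(das, atp, veritas, regenerate_suite, max_iterations=10):
        """One Proteus testing cycle for the currently selected DAS configuration."""
        suite = atp.default_suite                                   # Steps 1-2: execute default TSi.0
        for _ in range(max_iterations):
            results = {tc.tc_id: das.execute(tc) for tc in suite.test_cases
                       if tc.status is Status.ACTIVE}
            if any(r.true_negative for r in results.values()):      # Steps 3-4: notify MAPE-K
                das.request_reconfiguration()
                return
            if all(r.true_positive for r in results.values()):      # Steps 5-6: done until context changes
                return
            suite = regenerate_suite(suite, results)                # Steps 7-8: generate new TSi.j
            atp.generated_suites.append(suite)
            candidates = [tc for tc in suite.test_cases
                          if tc.status is Status.ACTIVE and not tc.invariant]
            veritas.adapt(candidates, atp.params)                   # Steps 9-15: fine-grained adaptation
            # Step 16: the loop re-executes the adapted suite and returns to Step 5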
6.5.1 Test Case Adaptation
We now discuss how test cases can be adapted at run time in the face of uncertainty. In particular, we present two test cases that were derived from the SVS goal model (Chapter 2, Figure 2.2) and demonstrate how each can be adapted at run time. To this end, Figure 6.6 presents two representative test case adaptations. For each example, we present the test case identifier (TCi), a description of the test case, the conditions necessary for the test case to be executed, and the expected value of the parameter being tested. (The full definition of each presented test case is specified in the Appendix.) We then present an unexpected occurrence that would otherwise cause the test case to fail, as well as an adaptation provided by Veritas that ensures the particular test case remains relevant. For each of the presented test cases, the adaptation is assumed to be within a tolerated safety value (i.e., adaptation of the test case does not violate requirements or safety constraints).

First, Figure 6.6(A) presents an example test case that is impacted by a sensor occlusion. TC12 tests that the SVS can successfully follow a backup procedure in the event that a non-collideable object (e.g., a baby or pet) is detected within 0.5 m of the SVS. However, due to an unforeseen sensor occlusion, the effective sensing range of the SVS is lowered. As such, Veritas can adapt the expected value that TC12 measures to 0.35 m. The new value ensures that TC12 will pass under the new system condition being experienced. However, this adaptation is only valid if it falls within a pre-defined safety tolerance and is subject to oversight from a corresponding DAS utility function.

Figure 6.6(B) presents an example test case that measures the remaining battery life of the SVS and, moreover, ensures that power does not fall below 10%. During the simulation, the SVS focused its efforts on cleaning the room and may not have conserved enough power to ensure that it can return to a docking station while still maintaining more than 10% remaining battery power. As such, Veritas adapts the test case parameter for TC22 to allow extra flexibility in the amount of power remaining while the SVS attempts to reach its docking station. For reference, if power fell below 5%, then a safety violation would occur and this test case would irrevocably fail. Depending on the correlated utility functions (i.e., utility_{Goal K}, utility_{Goal M}; cf. the Appendix), this test case failure will be either a true negative (i.e., the utility functions are also violated) or a false negative (i.e., the utility functions are satisficed to a degree).

Figure 6.6: Examples of test case adaptation. (A) Sensor occlusion example: TC12 (SVS enables backup procedure upon detection of a non-collideable object; condition: non-collideable object in room) is adapted from an expected value of 0.5 m to 0.35 m after the sensor is occluded. (B) Power conservation example: TC22 (SVS must ensure battery life remains above 10%; condition: SVS active) is adapted to require battery life above 5% while the SVS attempts to return to its dock.

6.6 Case Study
This section presents three case studies that illustrate how Proteus and Veritas can be used to provide adaptive testing at run time. The first case study presents experimental results in which Proteus was enabled to adapt test suites at run time, the second presents experimental results in which Veritas was used to adapt test case parameter values at run time, and the third presents experimental results in which we enabled both Proteus test suite adaptation and Veritas test case adaptation together. The following sections also present the simulation parameters controlling the SVS application and the experimental results for both Proteus and Veritas.

6.6.1 Simulation Parameters
Each experiment was conducted using the SVS application. In particular, the SVS was configured as an autonomous robot that must behave in accordance with a controlling goal model (cf. Chapter 2, Figure 2.2), with specific focus on effectiveness, efficiency, and safety. The physical simulation was created within the Open Dynamics Engine (see http://www.ode.org), and the SVS and its environment were configured as described in Section 2.2.2. We subjected the SVS to uncertainty by using Loki [86], an approach that automatically generates unique combinations of system and environmental conditions. System uncertainty manifested as random occlusions and/or failures of randomly selected sensors. Environmental uncertainty comprised the amount, location, and distribution of dirt particles; the instantiation of objects that can damage the SVS (e.g., liquid spills); the instantiation of objects that the SVS must navigate around (e.g., stationary pets); floor friction (e.g., carpeting); downward steps; and room dimensions.

The input test specification comprised 72 test cases, where 17 test cases covered safety concerns and the remaining 55 test cases covered system functionality. The test cases that monitored safety and failsafe concerns were defined to be invariant, and therefore precluded from adaptation. The test cases that validated system functionality were defined to be non-invariant, thereby allowing run-time adaptation. The fitness function weights α_measured and α_valid (cf. Equation 6.5) were defined as 0.4 and 0.6, respectively, to maximize the number of test results considered valid.
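For context on how such varied operating conditions can be produced, the following Python sketch shows a simple random sampler over the kinds of environmental and system parameters listed above and in Figure 6.3. It is only an illustration and is not the Loki technique used in our experiments; all ranges shown are placeholder assumptions.

    import random

    def sample_environment(rng=random):
        """Sample one environmental configuration (placeholder ranges)."""
        return {
            "num_liquid_spills": rng.randint(0, 5),
            "room_width_m": rng.uniform(5.0, 15.0),
            "room_depth_m": rng.uniform(5.0, 15.0),
            "downward_step": rng.random() < 0.5,
            "floor_friction": rng.uniform(0.3, 0.9),
        }

    def sample_system(rng=random):
        """Sample one system configuration (placeholder ranges)."""
        return {
            "starting_battery_pct": rng.uniform(90.0, 100.0),
            "wheel_sensor_failure_prob": rng.uniform(0.0, 0.1),
            "occluded_sensors": rng.sample(["camera", "cliff", "object", "internal"],
                                           rng.randint(0, 2)),
        }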
The SVS simulation was configured to run for 120 timesteps and was required to vacuum at least 50% of the small dirt particles within the room. Conversely, the SVS was required to avoid large dirt particles, liquid spills, and downward steps to avoid both internal and external damage. The SVS simulation was also instrumented to enable run-time monitoring of system and environmental conditions that are not typically available to SVS sensors. Examples include monitored variables that store data regarding the SVS decision-making logic and data structures that maintain the state of all objects within the simulation environment over time.

For the first experiment, we compared adaptive test plans generated by Proteus for each DAS configuration against a manually-derived test plan (hereafter the "Control") that did not provide run-time adaptation capabilities. The Control test plan comprises all test cases from the input test specification, and it only executes the test cases that satisfy their execution requirements (i.e., the conditions necessary for execution were met). According to the IEEE standard [54], a test case must define the criteria necessary for the test to be successfully executed. The intent of the Control test plan is to provide coverage of all possible mode changes that may be experienced at run time by the SVS. For the second experiment, we compared test cases adapted by Veritas to test cases that were not adapted (i.e., the Control); as such, Proteus was disabled to focus on the impact of Veritas. Lastly, we combined our experiments to determine how enabling both Proteus and Veritas can enhance testing assurance. For statistical purposes, we conducted 50 trials of this experiment and, where applicable, plotted mean values with corresponding error bars or deviations.

6.6.2 Proteus Experimental Results
For this experiment, we define the null hypothesis H0 to state that "there is no difference between Proteus adaptive test plans and a manually-derived test plan." Moreover, we define the alternate hypothesis H1 to state that "there is a difference between a Proteus adaptive test plan and a manually-derived test plan." Figure 6.7 presents boxplots of the average number of test cases that should not have been executed (hereafter termed "irrelevant") under a Proteus adaptive test plan and under the manually-derived test plan. A test case is considered to be irrelevant if its fitness value (cf. Equation 6.5) equals 0.0. For those test cases designated as irrelevant, the difference between the measured and expected values is large, indicating that the test case is no longer relevant to its operating context. As Figure 6.7 demonstrates, Proteus significantly reduces the number of irrelevant test cases executed in comparison to those executed under the manually-derived test plan (Wilcoxon-Mann-Whitney U-test, p < 0.05). Testing activities were further analyzed to monitor the number of false positive test cases, or instances where test case fitness falls within [Threshold, 1.0] while the correlated utility value equals 0.0 (cf. Chapter 6.4.2). As Figure 6.8 demonstrates, Proteus significantly reduces the number of false positive test results as compared to testing with the manually-derived test plan (Wilcoxon-Mann-Whitney U-test, p < 0.05).
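For reference, the significance tests reported in this section can be computed as in the following sketch (a Python example using scipy.stats.mannwhitneyu; the per-trial counts shown are placeholders, not the experimental data, which comprised 50 trials per treatment).

    from scipy.stats import mannwhitneyu

    control_counts = [41, 38, 40, 37, 42]   # placeholder: irrelevant test cases per Control trial
    proteus_counts = [12, 15, 11, 14, 13]   # placeholder: irrelevant test cases per Proteus trial

    statistic, p_value = mannwhitneyu(control_counts, proteus_counts, alternative="two-sided")
    significant = p_value < 0.05            # significance threshold used throughout this chapter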
These results indicate that testing with adaptive test plans can reduce the amount of false positive test results, thus reducing the need for spurious test adaptation and, more importantly, reducing the burden of unnecessary analysis by the DAS test engineer.

As Figure 6.9 demonstrates, Proteus also significantly reduces the amount of false negative results, or instances when test case fitness is calculated to be within [0.0, Threshold) while its correlated utility value is greater than 0.0 (c.f., Chapter 6.4.2), that were encountered during testing (Wilcoxon-Mann-Whitney U-test, p < 0.05). This result indicates that Proteus adaptive test plans assist in reducing the amount of run-time testing adaptations required for the testing framework, thereby reducing the overall cost of testing the DAS and the amount of analysis effort required by a DAS test engineer.

Figure 6.7: Average number of irrelevant test cases executed for each experiment.

Lastly, the total number of executed test cases was recorded to provide a measure of the overall impact of run-time testing on a DAS. Particularly, we demonstrate that Proteus significantly reduces the amount of executed test cases per testing cycle by ensuring that only relevant test cases are executed, as is shown in Figure 6.10. This figure illustrates that Proteus can reduce the amount of effort required by a testing framework at run time (Wilcoxon-Mann-Whitney U-test, p < 0.05). The performance and behavioral impacts of run-time testing on a DAS are explored in Chapter 7.

Figure 6.8: Average number of false positive test cases for each experiment.

The combined results presented in Figures 6.7 – 6.10 enable us to reject the null hypothesis H0 and determine that a clear difference exists between Proteus adaptive test plans and a manually-derived test plan. Moreover, these results enable us to accept the alternate hypothesis H1, based on the assumption that the manually-derived test plan was defined in relation to normal operating conditions.

Figure 6.9: Average number of false negative test cases for each experiment.

6.6.2.1 Threats to Validity.

The research presented within this section is intended as a proof of concept to determine the feasibility of using Proteus for managing and adapting run-time testing activities. One threat to validity is whether Proteus will achieve similar results within adaptive systems that do not implement mode changes. Another threat to validity concerns the validity of the input test specification, as it must be fully comprehensive to enable complete coverage of the system requirements for testing assurance to be provided.

Figure 6.10: Cumulative number of executed test cases for each experiment.

6.6.3 Veritas Experimental Results

This section presents the experimental results from enabling Veritas test case adaptation on the SVS.
For this experiment, we define the null hypothesis H0 to state that “there is no difference between Veritas test cases and unoptimized test cases.” We also define the alternate hypothesis H1 to state that “there is a difference between Veritas test cases and unoptimized test cases.”

Figure 6.11 presents boxplots of the mean fitness values of all executed test cases across different environments for both Veritas and the Control. As the figure demonstrates, Veritas test cases achieve significantly higher fitness values than those that were not adapted at run time (Wilcoxon-Mann-Whitney U-test, p < 0.05). Moreover, these results demonstrate that test cases adapted by Veritas consistently remain relevant to their changing environment, whereas the non-adapted test cases quickly lose relevance. These results enable us to reject H0 that no difference exists between adapted and non-adapted test cases.

Figure 6.11: Comparison of fitness between Veritas and Control.

Next, Figure 6.12 presents the mean number of test case failures experienced throughout the simulation for both Veritas and the Control while the corresponding utility values for each test case were satisfied. This type of failure is considered to be a false negative. As is demonstrated by the figure, adapting test case parameter values with Veritas can minimize the amount of false negatives experienced at run time in the face of changing environmental conditions, whereas the non-adapted test cases could not handle the unanticipated environmental conditions and therefore became irrelevant.

Figure 6.12: Comparison of false negatives between Veritas and Control.

As Veritas can successfully maximize fitness (Figure 6.11) and minimize the amount of false negatives incurred during testing (Figure 6.12), we can safely reject H0 and accept H1. Moreover, we can state that Veritas can optimize run-time test cases to fail significantly less often under changing environmental conditions while still maintaining a valid representation of the operational context.
In comparison to the Control in which no test adaptation was performed, both Proteus and Veritas significantly reduce the number of irrelevant test cases (Wilcoxon-Mann-Whitney U-test, p < 0.05). As Figure 6.13 shows, Proteus incurs far fewer irrelevant test cases than Veritas incurs alone. Moreover, combining Proteus and Veritas does not significantly reduce irrelevant test cases further than Proteus alone (Wilcoxon-Mann-Whitney U-test, p > 0.05). This result indicates that, while Veritas does reduce irrelevant test cases, Proteus provides a much greater overall impact.

Figure 6.13: Average number of irrelevant test cases for combined experiments.

Next, Figure 6.14 presents the amount of false positive test cases executed for each experiment. In this case, only Proteus reduces the amount of false positives significantly in comparison to the Control experiment (Wilcoxon-Mann-Whitney U-test, p < 0.05), indicating that test suite adaptation provides a larger benefit than test case adaptation in reducing false positives.

Figure 6.14: Average number of false positive test cases for combined experiments.

Following, Figure 6.15 presents the amount of false negative test cases executed for each experiment. Here, both Proteus and Veritas significantly reduce the amount of false negatives (Wilcoxon-Mann-Whitney U-test, p < 0.05); however, Proteus once more provides a far greater overall impact than does Veritas.

Figure 6.15: Average number of false negative test cases for combined experiments.

Lastly, Figure 6.16 presents the average test case fitness of SVS execution over several consecutive environments. While Veritas does provide a fitness boost for executed test cases, employing Proteus adaptive test plans introduces a further overall increase in fitness, as only relevant test cases are executed. This result occurs as Veritas may not have converged to a fully-optimal solution, while Proteus gains an advantage by executing test cases specified by adaptive test plans as relevant to the current operating context, and therefore better-performing in comparison to Veritas test cases. Together, Proteus and Veritas perform significantly better than the Control that performs no test adaptation (Wilcoxon-Mann-Whitney U-test, p < 0.05), indicating that the combination of Proteus and Veritas provides a clear advantage for run-time testing.

Figure 6.16: Average test case fitness for combined experiments.

In summary, Proteus and Veritas enable test adaptation to improve the relevance of run-time test cases. Proteus ensures that test suites remain relevant to their operational context by requiring that only relevant test cases are executed, thereby reducing the amount of executed irrelevant test cases. Furthermore, Veritas ensures that test case parameter values remain relevant to their operational context by ensuring that parameter values adapt alongside the environment. Together, Proteus and Veritas can increase overall test case relevance in the face of uncertainty.
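For concreteness, the irrelevant, false-positive, and false-negative categories compared in Figures 6.13 – 6.15 could be derived from logged test results with classification logic along the following lines; the variable names and the specific threshold value are assumptions, while the category definitions follow Chapter 6.4.2.

# Sketch of classifying a single test execution result from its normalized fitness
# and the utility value of its correlated goal. THRESHOLD is an assumed value.
THRESHOLD = 0.5  # hypothetical pass/fail threshold on normalized test-case fitness

def classify_test_result(fitness, utility):
    if fitness == 0.0:
        return "irrelevant"                      # test no longer matches its operating context
    if fitness >= THRESHOLD and utility == 0.0:
        return "false positive"                  # test passed while the goal was violated
    if fitness < THRESHOLD and utility > 0.0:
        return "false negative"                  # test failed while the goal was satisficed
    return "consistent"                          # test result and utility value agree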
6.7 RELAXation of Test Cases This section presents our investigation into the feasibility of using RELAX operators to facilitate adaptive testing. Particularly, we examined how RELAX, a fuzzy logic-based 137 language for specifying flexibility in requirements, can be extended to the testing domain to provide flexibility in executing test cases under uncertain conditions. In this section, we introduce our technique for applying RELAX operators to test cases, describe and present our experiment in which RELAX operators were both manually and automatically applied to a test specification, and discuss the implications of this study. RELAX [21, 113] was initially developed for the requirements domain to introduce flexibility to requirements and provide a means for tolerating uncertainty. As such, we extended the RELAX approach to the testing domain to investigate if RELAX operators can provide flexibility to test cases that may be no longer be indicative of their operating context as a result of uncertainty. Similar to Veritas, RELAX operators can only be applied to non-invariant test cases to ensure that both safety and failsafe concerns are continually satisfied. Moreover, our approach follows the existing RELAX approach [113] in that uncertainty factors, monitoring agents, and mathematical relationships between each must be defined for each use of a RELAX operator. Our approach extends the existing RELAX approach by requiring that each test case be linked to at least one requirement or goal to provide both traceability and coverage. In the case of requirements or goals, a fuzzy logic function is applied to each associated utility function to provide flexibility in the level of satisfaction, based upon the calculated utility value. Ideally, a test case can be RELAXed in a similar fashion. Specifically, test cases derived for Proteus and Veritas each specify a fitness function (c.f., Table 6.2) that captures the difference between the measured and expected value for each test case. Given that the fitness function is normalized to a value between [0.0, 1.0], a fuzzy logic function can be applied to the fitness function to provide extra flexibility in satisfaction with the resulting distance between the measured and expected value. For the purposes of this dissertation, we reused the set of RELAX operators presented in Table 2.1. To investigate test case RELAXation, we both manually and automatically introduced RELAX operators to non-invariant test cases. For the manual approach, a DAS test engineer 138 identified a set of target test cases most conducive to RELAXation. The automatic approach applied a genetic algorithm in a similar fashion to the AutoRELAX technique introduced in Chapter 3.2.2, where test cases replaced goals as targets for automated RELAXation. We compared both automated and manually RELAXed test specifications to a Control that did not provide any RELAXation or test adaptation over 50 trials to establish statistical significance. Figure 6.17 presents boxplots comparing the average test case fitness between experimental treatments where test cases were automatically RELAXed, manually RELAXed, and not RELAXed (i.e., the Control). Furthermore, we compared these results to the average test case fitnesses generated by run-time test adaptation provided by both Proteus and Veritas. 
These results indicate that introducing RELAX operators, either manually or automatically, does not significantly improve test case fitness (Wilcoxon-Mann-Whitney U-test, p > 0.05), especially in comparison to run-time test case adaptation. Next, Figure 6.18 presents boxplots of the average number of test case failures recorded throughout execution, where a test case failure occurs when the measured test case fitness value falls below the test case failure threshold (c.f., Chapter 6.4.2). These results indicate that RELAXation does not improve run-time testing by reducing the amount of test case failures (Wilcoxon-Mann-Whitney U-test, p > 0.05). Using Proteus and Veritas test adaptation does significantly reduce the amount of experienced failures, indicating again that Proteus and Veritas test adaptation continually improves the relevance of test cases to their environment.

Figure 6.17: Comparison of average test case fitness values for RELAXed test cases.

Figure 6.18: Comparison of average test case failures for RELAXed test cases.

6.7.1 Discussion

This section has presented our investigation into the feasibility of using RELAX operators to provide extra flexibility for run-time test cases within a test specification. Specifically, the fuzzy logic functions derived for each RELAX operator were intended to enable RELAXed test cases to accept a wider range of possible measured values as valid, thereby extending the relevance of RELAXed test cases to different operational contexts. Experimental results indicate that a test specification with RELAXed test cases does not perform significantly better than a non-RELAXed test specification. Moreover, run-time test adaptation provided by Proteus and Veritas performs significantly better than a RELAXed test specification. As such, Proteus and Veritas adapt test suites and test case parameters at run time to optimize testing activities with respect to changing environmental conditions, whereas RELAX operators introduce flexibility into test cases by using fuzzy logic functions to enable a test case to tolerate a wider range of values than it could without RELAX. The proactive adaptation provided by Proteus and Veritas to ensure that test cases are relevant to their operating context demonstrates a significant advantage in the average test case fitness and number of failures encountered, whereas the RELAXed test specification may still have been too rigid to provide any significant benefit. These results indicate that RELAXation of test cases may not be a feasible strategy for extending the relevance of test cases to changing environmental conditions.

6.7.2 Threats to Validity

The research presented within this section is intended to be a proof of concept to determine the feasibility of applying RELAX operators to test cases to tolerate uncertainty at run time. As such, one threat to validity is the validity of the derived test specification, as the safety ranges must be correctly defined to ensure that safety or failsafe concerns are not violated. Another threat to validity is whether similar results can be achieved in a DAS application that does not implement mode changes.

6.8 Related Work

This section presents existing research that is related to run-time testing.
Particularly, we overview prior work in search-based software testing, run-time testing, test plan generation, and test case selection. 6.8.1 Search-Based Software Testing Search-based software testing leverages search-based techniques, such as evolutionary computation, simulated annealing, and hill climbing, to the field of software testing. Searchbased techniques have been successfully applied to automated test case generation, including applications in model-based testing, regression testing, and structural testing [51, 52, 69]. EvoSuite [40] and Nighthawk [3] are testing frameworks that use evolutionary computation to generate test suites and instantiate unit tests, respectively. Proteus does not directly leverage search-based techniques, however Veritas applies search-based techniques at run-time. In contrast, other search-based applications tend to focus at design-time optimization. Using evolutionary techniques at run time enables a DAS to react to and adapt towards live conditions, rather than requiring the system be taken offline for bug fixes or code optimizations. 142 6.8.2 Run-Time Testing Testing software at run time provides insights into the behavior and correctness of a system while executing in different operating conditions. Previously, run-time testing has been implemented using models [50] and as a variation of reinforcement learning [107]. By extending an assertion language, unit testing has been incorporated into a run-time verification framework [71]. Run-time software testing has previously been accomplished by recording execution traces and then replaying them on a secondary computer in parallel while examining those traces for faults [103]. Markov models have been used to mitigate uncertainty in a run-time monitoring framework [38], and agent-based approaches have also been used to facilitate run-time testing in distributed systems [78]. While each of these approaches facilitate run-time testing, Veritas combines evolutionary search with goal-based validation. Furthermore, Proteus focuses on maintaining test plan and test case relevance as system and environmental conditions change. 6.8.3 Test Plan Generation Automatic generation of software test plans is a field that has been extensively studied. One approach to automated test plan generation uses a requirements specification and formal grammars to automatically define a test plan [6]. In this approach, requirements are converted to a finite state automata (FSA) and a grammar is then derived from that FSA. Test plans are then generated from the grammar and can be used during testing activities. Conversely, Proteus generates new test plans based upon a predefined default test plan and then executes test plans at run time based upon monitored system and environmental conditions. Automated planning approaches have also been applied to testing graphical user interfaces (GUIs) [70]. Here, artificial intelligence techniques were used to anticipate actions taken by users who are actively using the GUI, resulting in a set of testing goals. The algo- 143 rithm then generates a partially-ordered plan that realizes the testing goals and generates a set of related test cases. Conversely, Proteus automatically generates a test plan by analyzing run-time monitoring information to determine which test cases are appropriate based upon the operational context. 
6.8.4 Test Case Selection Automatic selection and prioritization of test cases has also been extensively studied, with a particularly excellent survey covering the search-based software engineering domain [51]. In this survey, test case selection is concerned with selecting a representative set of test cases, and test case prioritization is concerned with optimizing the order of test case execution. However, the surveyed techniques tend to focus on design-time approaches, whereas Proteus selects test cases at run time. Tropos [78] is an agent-based implementation that uses a testing agent to continuously validate the behavior of other agents in the system by performing randomized testing. Conversely, Proteus generates test plans that are targeted towards each DAS configuration based on each set of operating conditions. 6.9 Conclusion This chapter has introduced our approach for adaptive run-time testing. We first introduced Proteus, a technique for automatically generating adaptive test plans to ensure that only relevant test cases are executed at run time. Next, we introduced Veritas, a technique for adapting test case parameter values at run time to ensure that test cases remain relevant to changing environmental conditions. Together, these approaches support the MAPE-T feedback loop by enabling the creation and execution of test suites and test cases at run time. We have demonstrated both Proteus and Veritas on the SVS application that was required to clean a room effectively, efficiently, and safely while mitigating uncertainty in the system and environment. These uncertainties included random sensor occlusions and/or 144 failures, randomly placed dirt particles, downward steps to be avoided, and randomly placed obstacles that may cause damage to the SVS or to the object itself. Experimental results indicate that providing run-time adaptation capabilities for testing activities significantly increases the relevance of executed test cases. Possible future directions for this work include investigating other search-based techniques, such as simulated annealing or multi-objective optimization, for run-time use to automatically derive a representative set of test suites and optimize test case parameters. Lastly, another direction includes providing feedback to the MAPE-K loop to assist in performing self-reconfigurations. 145 Chapter 7 Impact of Run-Time Testing This chapter provides an analysis of the impact that run-time testing has on a DAS from both a performance and a behavioral standpoint. We first motivate the need to analyze the impact of testing on a DAS. Next, we describe the metrics we use to quantify each type of impact on the DAS. Following, we discuss optimizations to the run-time testing framework to reduce the overall impact of run-time testing on the DAS. Lastly, we summarize our findings and propose future optimizations to our work. 7.1 Motivation Run-time testing provides a valuable layer of assurance for a DAS. However, the relative impact that a testing framework imposes upon the DAS must also be considered when performing run-time testing, as the addition of testing activities can require additional processing time, extra memory overhead, and/or unexpected changes to DAS behavior. To this end, we analyze how our run-time testing framework impacts a DAS at run time in terms of extra performance overhead or unexpected behavioral changes. We define these metrics in the following section. 
146 7.2 Analysis Metrics This section presents the metrics used for analyzing the performance and behavioral impact that run-time testing imposes on a DAS. To study the effect of these metrics on a DAS, we analyze how run-time testing affects the RDM application at run time. 7.2.1 DAS Performance We quantify DAS performance based upon two key metrics: total execution time and memory footprint. The method for which we measure each metric is defined as follows. 7.2.1.1 Total execution time In simulation, the DAS executes for a set number of timesteps and therefore the total execution time can be measured. To this end, we measure the total execution time of the function that is responsible for executing the complete DAS simulation. In particular, we use the cProfile Python package1 to measure the cumulative time that the simulation execution function requires. Measuring the execution times of a deterministic DAS instrumented with run-time testing and with testing disabled can then provide a point of comparison for any extra time used to perform run-time testing. 7.2.1.2 Memory footprint Extra memory may be consumed when instrumenting a DAS with a run-time testing framework. Depending on the hardware used to support the DAS, the extra memory cost may be prohibitive, particularly in embedded systems where memory is limited. Given that the RDM application has been implemented in Python, we use the resource package to examine the total amount of memory consumed throughout execution. 1 See https://docs.python.org/2/library/profile.html. 147 7.2.2 DAS Behavior While execution time and incurred memory costs are relatively straightforward to quantify, examining differences in DAS behavior is less obvious. To this end, we define two key metrics for quantifying behavior: requirements satisficement and behavioral function calls. Each of these metrics are next described in turn. 7.2.2.1 Requirements satisficement We first examine the extent to which software requirements are satisficed during execution to examine the impact that testing may impose upon a DAS. In particular, we monitor utility values calculated based upon a provided DAS goal model to determine if any difference occurs when run-time testing is performed on a DAS. Given that a utility value is required to be within a range of [0.0, 1.0], any difference in behavior can be identified based on a comparison of the utility values that quantify goal satisficement. 7.2.2.2 Behavioral function calls We also quantify behavioral performance based upon the number of behavioral function calls invoked. We define a behavioral function call as a function identified by the DAS engineer as having an integral impact on DAS behavior. For the purposes of this chapter, we monitor the number of self-reconfigurations performed by the DAS, as a self-reconfiguration can create a major divergence in system behavior. 7.3 Baseline Results This section presents the results found by executing the RDM in a baseline setting to determine the impact that run-time testing has on a DAS, where the baseline introduces no 148 optimizations.2 In particular, we instrument the RDM application to provide the metrics that we have previously defined in this chapter. 
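A minimal sketch of this kind of instrumentation is shown below; it assumes a hypothetical run_simulation entry point and uses cProfile for cumulative execution time and the resource module for the peak memory footprint, as described above.

# Sketch of measuring execution time (cProfile) and peak memory footprint (resource)
# around a hypothetical simulation entry point.
import cProfile
import pstats
import resource

def run_simulation():
    # Placeholder for the RDM's top-level simulation function (hypothetical name).
    pass

profiler = cProfile.Profile()
profiler.enable()
run_simulation()
profiler.disable()

stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(10)   # cumulative time per function

peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss  # kilobytes on Linux
print("Peak memory footprint: %d KB" % peak_kb)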
Moreover, we configure run-time testing activities on the RDM as follows:

• (S1): All run-time testing activities enabled (i.e., Proteus and Veritas enabled)
• (S2): Run-time testing disabled (i.e., Proteus and Veritas disabled)
• (S3): Run-time testing removed (i.e., Proteus and Veritas data structures and functions removed from DAS codebase)

(S1) executes the RDM with all run-time testing activities enabled, including test case execution, Proteus test plan adaptation, and Veritas test case parameter value adaptation. (S2) does not perform run-time testing; however, the data structures and functions required by the framework are still instantiated by the DAS. Lastly, (S3) completely removes the run-time testing framework from the DAS.

2 For the purposes of this experiment, Proteus is not considered to be an optimization, but a managing infrastructure and an approach for continuously ensuring test case relevance.

Figure 7.1 presents boxplots of the number of seconds required to execute the RDM simulation for each testing state across 50 trials. As these results demonstrate, performing run-time testing requires significantly more time for the simulation to complete than either disabling or removing run-time testing from the RDM application (Wilcoxon-Mann-Whitney U-test, p < 0.05).

Figure 7.1: Amount of time (in seconds) to execute RDM application in different testing configurations.

Next, Figure 7.2 presents boxplots of the total amount of memory consumed by the RDM application for each testing state. In particular, we examine the total number of kilobytes required by the RDM simulation across 50 trials. As these boxplots demonstrate, no significant difference exists in memory overhead incurred by each testing state (Wilcoxon-Mann-Whitney U-test, p > 0.05).

Figure 7.2: Amount of memory (in kilobytes) consumed by the RDM application in different testing configurations.

Figure 7.3 presents boxplots that display an average of all utility value calculations across 50 trials of RDM execution. Here, we average the calculated utility values across all timesteps of RDM execution into a single value to represent overall goal satisficement. These results, while exhibiting a difference in their respective means, are not statistically significantly different from each other overall (Wilcoxon-Mann-Whitney U-test, p > 0.05), resulting in the conclusion that our testing framework does not significantly impact DAS behavior.

Figure 7.3: Average of calculated utility values throughout RDM execution in different testing configurations.

Next, Figure 7.4 presents boxplots that display the average number of utility violations throughout RDM execution. Statistically, performing run-time testing does not incur significantly more utility violations as compared to disabling or removing run-time testing, as no significant difference exists between the sets of data (Wilcoxon-Mann-Whitney U-test, p > 0.05).
Figure 7.4: Average amount of utility violations throughout RDM execution in different testing configurations.

While the results presented in Figures 7.3 and 7.4 are not statistically significant, a clear difference exists in the presented mean values. As such, we examine RDM execution traces to determine why this difference occurs. To this end, we found that, while no difference exists in the operating context between all three testing states, a minor difference in execution behavior does occur. Specifically, utility functions sample data provided by the RDM each timestep. When testing is enabled or available, the real time at which utility values are calculated will be slightly delayed by the extra time required to perform testing activities, resulting in a sampling of different RDM environmental states. For example, at a certain point during RDM execution, the number of passive data mirrors increased when testing was enabled, decreasing the number of active data mirrors and resulting in a partitioned network. As such, this behavior induces a utility violation in Goal (F) (c.f., Figure 2.4).

We next compare the total number of self-reconfigurations performed by the RDM throughout its simulation and present these results in Figure 7.5. This figure corroborates the information presented in Figures 7.3 and 7.4 in that run-time testing did not incur significantly more adaptations than were found by disabling or removing testing (Wilcoxon-Mann-Whitney U-test, p > 0.05). These results, coupled with the results presented in Figures 7.3 and 7.4, indicate that our testing framework does not introduce significant behavioral change to the RDM application.

Figure 7.5: Number of adaptations performed throughout RDM execution in different testing configurations.

The results presented in Figures 7.1 – 7.5 indicate that run-time testing only significantly impacts a DAS in the amount of execution time required, whereas memory overhead and behavior are not significantly impacted by run-time testing. The next section details our approaches for optimizing run-time testing from a performance perspective.

7.4 Optimization Approach

We now detail our optimization strategy to lessen the overall impact of run-time testing on a DAS. As such, this section details our implemented optimizations for run-time testing in terms of execution time. Specifically, we discuss how we introduce parallelization into the RDM framework. In particular, we create a separate worker subprocess using Python’s multiprocessing package3 to perform testing in parallel to RDM execution. The worker subprocess comprises instantiation of testing-related data structures, sequential execution of test cases, and sequential adaptation of test suites and test cases. Information particular to RDM execution, such as the number of active network links and number of quiescent data mirrors, is passed to the worker subprocess via a pipe. We now examine the impact that parallel processing has on the performance of our run-time testing approach.

3 See https://docs.python.org/2/library/multiprocessing.html.
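Before turning to that comparison, the following minimal sketch illustrates the worker-subprocess arrangement just described; the snapshot contents and the test-execution helper are assumptions, and only the use of the multiprocessing package and a pipe follows from the description above.

# Minimal sketch of a testing worker subprocess fed RDM state snapshots over a pipe.
# Snapshot keys and the run_test_cases helper are hypothetical.
from multiprocessing import Process, Pipe

def run_test_cases(snapshot):
    # Placeholder for sequential test-case execution against one snapshot.
    return {"executed": 0, "snapshot": snapshot}

def testing_worker(conn):
    # Receives snapshots until a None sentinel arrives, then shuts down.
    while True:
        snapshot = conn.recv()
        if snapshot is None:
            break
        conn.send(run_test_cases(snapshot))

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    worker = Process(target=testing_worker, args=(child_conn,))
    worker.start()
    parent_conn.send({"active_links": 42, "quiescent_mirrors": 3})  # example snapshot
    print(parent_conn.recv())
    parent_conn.send(None)   # signal the worker to terminate
    worker.join()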
To this end, we extend the execution states previously defined in Chapter 7.3 (i.e., (S1), (S2), and (S3)) to understand the impact that each technique has on RDM performance:

• (S1): Proteus test plan adaptation enabled, Veritas test case adaptation enabled
• (S1.1): Proteus test plan adaptation disabled, Veritas test case adaptation enabled
• (S1.2): Proteus test plan adaptation enabled, Veritas test case adaptation disabled
• (S2): Proteus test plan adaptation disabled, Veritas test case adaptation disabled
• (S3): Testing framework removed (i.e., Control)

We compare the overall execution time for States (S1), (S1.1), and (S1.2) with their parallelized counterparts (i.e., (S1’), (S1.1’), and (S1.2’)) in Figure 7.6. No comparison to a parallelized implementation is made for States (S2) and (S3), as no test execution is performed while these particular states are active. Figure 7.6 presents boxplots of the number of seconds required to execute the RDM simulation for each testing state across 50 trials.

The results presented in Figure 7.6 demonstrate the differences in execution time between each State and its parallelized counterpart. As such, parallelizing States (S1), (S1.1), and (S1.2) significantly decreases the amount of execution time required (Wilcoxon-Mann-Whitney U-test, p < 0.05). Moreover, States (S1’), (S1.1’), and (S1.2’) do not demonstrate a significant difference from States (S2) and (S3), indicating that parallelizing testing activities does not significantly impact run-time performance of the RDM (Wilcoxon-Mann-Whitney U-test, p > 0.05).

Figure 7.6: Amount of time (in seconds) to execute optimized RDM application in different parallel and non-parallel testing configurations.

To confirm our results, we now examine the RDM framework in greater detail. In particular, we examine the amount of time required to execute a controlling function that executes both network and testing activities, whereas we previously examined the execution time of the RDM simulation as a whole, including instantiation of all requisite data structures as well as execution of the controlling function. Figure 7.7 next presents boxplots of the execution time required for each testing state in terms of the controlling function. Again, parallelizing testing activities significantly reduces the execution time required.

Next, we analyze the speedup that our optimization approach provided using Amdahl’s Law [2]. In particular, we use Equation 7.1 to calculate the speedup ratio:

S(N) = T(1) / T(N)    (7.1)

where S(N) is the speedup ratio, T(1) is the time required for the program to execute sequentially, and T(N) is the time required for the program to execute with N worker processes. For our experiment, we used 1 worker process. Over 50 experimental treatments, we calculate an average S(N) of 2.742, i.e., a 2.742× speedup in processing time when parallelization is introduced. Following, we re-examine the behavioral impact of testing activities on the RDM in our parallelized and non-parallelized testing states.
Specifically, Figure 7.8 presents the average utility value calculated during the RDM simulation, Figure 7.9 presents the average number of utility violations encountered during the simulation, and Figure 7.10 presents the number of triggered adaptations. For each figure, there again is no significant behavioral difference, suggesting that our parallelization approach did not induce significant changes to RDM behavior (Wilcoxon-Mann-Whitney U-test, p > 0.05).

Figure 7.7: Amount of time (in seconds) to execute optimized network controller function in different parallel and non-parallel testing configurations.

Figure 7.8: Average of calculated utility values throughout RDM execution in different parallel and non-parallel testing configurations.

Figure 7.9: Average number of utility violations encountered throughout RDM execution in different parallel and non-parallel testing configurations.

Figure 7.10: Number of adaptations performed throughout RDM execution in different parallel and non-parallel testing configurations.

Lastly, we examine two other approaches to reduce the performance impact of testing. For the first approach, we create two subprocesses to execute in parallel to RDM execution, the first of which performs testing adaptation and the second of which executes the test cases sequentially. For the second approach, each test case is executed in parallel in separate subprocesses. In particular, no dependencies exist between our test cases and, for this case study, test case execution can be considered an embarrassingly-parallel problem. Performance slightly improves for the first approach; however, the improvement is not significant enough to be considered successful. Interestingly, performance degrades for the second approach. Specifically, executing test cases in parallel significantly increases the execution time required for the RDM simulation. After further study, we determine that the increase in execution time is a result of using Python for the RDM application, as the overhead necessary to continually instantiate subprocesses far outweighs the performance benefit from executing test cases in parallel.

7.5 Related Work

This section overviews related work in optimizing software testing.
In particular, we examine how physical processors can be monitored to determine when testing should be performed and how testing can be divested onto agents to remove the burden on the main processing unit. 7.5.1 Processor Cycles Previously, Saff and Ernst [91, 92] introduced a continuous testing approach for assisting a software developer in catching bugs while the developer writes code. The continuous testing framework was integrated into a development environment to run regression tests in parallel to development, thereby increasing both development and testing efficiency. In particular, their approach monitored intervals between developer activity (i.e., writing code, compiling, etc.) and executed a set of prioritized test cases while the developer was inactive. As such, continuous testing was shown to dramatically improve both developer and testing efficiency, specifically by localizing software bugs in advance of a formal testing cycle. In comparison, our run-time testing approach parallelizes testing activities at run time, reducing 163 the performance impact to the DAS. However, our approach does not fully minimize the impact that run-time testing incurs on DAS performance. Therefore, regularly monitoring system activity to determine appropriate intervals for executing test cases and performing test adaptations may be an optimization to complement our current testing strategy. 7.5.2 Agent-Based Testing We have previously discussed agent-based testing in detail in Chapter 5.3, however we now examine the physical use of agents to perform testing. Nguyen et al. [78] examined the use of multi-agent systems (MAS) to provide a continuous and evolutionary testing approach. In particular, the agent-based approach requires that a particular agent, or separate software system, be used solely to perform testing. As such, the performance impact incurred when performing run-time testing does not affect agents tasked with critical system functionalities, as they are not required to perform testing. Instead, the testing agent is required to perform testing and is fully dedicated to that task. Our approach to provide run-time testing uses subprocesses that execute in parallel to the DAS. While we do offload testing activities to subprocesses, the DAS and portions of the testing framework share a central processor, thereby introducing the possibility for contention of processor usage between testing activities and DAS execution. Introduction of a separate testing agent into the DAS environment, effectively creating a union of a DAS and a MAS, may be an effective approach to reducing the performance impact of run-time testing. 7.6 Conclusion In this chapter, we have described the impact that run-time testing can have on a DAS. In particular, we examined impact based on two metrics: performance and behavior. Performance was measured based on execution time and memory overhead. Behavior was measured based on overall requirements satisfaction and the number of performed self-reconfigurations. 164 Our experiments indicate that our run-time testing framework has an impact on performance in terms of required execution time, however, it does not significantly impact the DAS in terms of memory overhead, requirements satisfaction, or DAS reconfigurations. We next introduced an optimization strategy to reduce the performance impact in terms of execution time. 
To this end, we introduced parallelization into the RDM simulation, moving instantiation and execution of testing activities into a worker subprocess that executes in parallel to the RDM. As such, we found that parallelization can significantly reduce overall execution time of the simulation. In terms of the RDM application, introducing parallelization effectively reduced overall execution time to be comparable with disabling or removing run-time testing activities. Future investigations include studying how memory overhead can be optimized in onboard systems. Specifically, examining how test trace data can be scaled back while still providing useful information to the test engineer, offloading data storage to separate devices, and introducing extra processors or modules for additional processing capabilities are examples of future optimizations. 165 Chapter 8 End-to-End RDM Example This dissertation has presented techniques for providing assurance for a DAS at varying levels of abstraction and it has demonstrated those techniques in different domains. This chapter presents an end-to-end example of how these techniques can provide assurance for the RDM application at its different levels of abstraction. In particular, we present how each technique can be used sequentially to increase overall assurance. First, we present the configuration of the RDM application for each portion of the end-to-end investigation. Next, we discuss how a goal model describing RDM requirements can be automatically refined with RELAX operators to provide requirements-based assurance, as well as the overall scalability of this technique. Following, we present our technique for analyzing the RDM codebase to discover unexpected paths of execution. Lastly, we instrument the RDM application with an adaptive, run-time testing framework to provide run-time testing-based assurance. Where applicable, this chapter reuses previously presented experimental results. 8.1 RDM Configuration This section describes the configuration of the RDM application. In particular, we define the configuration of the RDM network and sources of uncertainty. 166 The RDM configuration comprises parameters that define the RDM network as well as parameters that define the sources of uncertainty that can affect the RDM during execution. First, Table 8.1 presents a subset of the parameters that configure the RDM network. Specifically, we provide the number of timesteps the simulation is specified to run for all experiments, the possible network topologies that may be selected, the number of data mirrors available to the network, the number of messages that must be distributed by the end of the simulation, and the available budget and the cost of each network link in terms of U.S. dollars. Table 8.1: Subset of RDM network parameters. Configuration Parameter Value Timesteps 300 Network topology {Complete, Grid, T ree, Random, Social} Number of data mirrors [15, 30] Number of messages [100, 200] Budget [$350, 000.0, $550, 000.0] Network link cost [$10, 000.0, $15, 000.0] ... ... Next, Table 8.2 describes a subset of the parameters that define the sources of uncertainty. Here, the function that generates random values for the RDM application is provided with a different seed and distribution for each replicate. Furthermore, the probabilities describing the failure rates, fuzz rates, and rates of possible message errors that can occur during simulation are also presented. 167 Table 8.2: RDM sources of uncertainty. 
Configuration Parameter                      Value
Random seed                                  [1, 50]
Random number distribution                   {Binomial, Exponential, Normal, Poisson, Uniform}
Probability of data mirror failure           [0.0%, 5.0%]
Probability of data mirror sensor failure    [0.0%, 5.0%]
Probability of data mirror sensor fuzz       [0.0%, 20.0%]
Probability of network link failure          [0.0%, 10.0%]
Probability of dropped message               [0.0%, 20.0%]
Probability of delayed message               [0.0%, 10.0%]
Probability of corrupted message             [0.0%, 10.0%]
...                                          ...

8.2 Requirements-Based Assurance

This section first presents a case study examining how AutoRELAX enables requirements-based assurance and then investigates the scalability of AutoRELAX with respect to the RDM application.

8.2.1 AutoRELAX Case Study

We now discuss how assurance can be provided for a DAS at its requirements level. First, we examine how RELAX can be used to augment the RDM goal model to provide requirements-based assurance. In particular, we study the impact of AutoRELAX as applied to the RDM application. Next, we examine how the SAW can be used to enhance requirements-based assurance by tailoring fitness sub-function weights to different operational contexts. Lastly, we provide an investigation into the scalability of AutoRELAX.

8.2.1.1 RDM Goal RELAXation

We applied AutoRELAX to the RDM application to explore how AutoRELAX enables requirements-based assurance. To facilitate readability, we now recreate the RDM goal model in Figure 8.1 and AutoRELAX fitness functions in Equations (8.1) – (8.3). As such, we examined the resulting fitness values generated by AutoRELAXed goal models, manually RELAXed goal models, and unRELAXed goal models. For the manually RELAXed goal model, we introduced RELAX operators to Goals (C), (F), (G), and (H) (c.f., Figure 8.1).

FF_nrg = 1.0 − |relaxed| / |Goals_non-invariant|    (8.1)

FF_na = 1.0 − |adaptations| / |faults|    (8.2)

Fitness Value = α_nrg ∗ FF_nrg + α_na ∗ FF_na if invariants are true, and 0.0 otherwise    (8.3)

Table 8.3 presents the configuration of both the genetic algorithm and hyper-heuristic used for AutoRELAX and AutoRELAX-SAW, respectively.

Table 8.3: End-to-end AutoRELAX-SAW configuration.
Configuration Parameter    Value
Number of generations      50
Population size            20
Crossover rate             50.0%
Mutation rate              40.0%
SAW Sampling Rate          Every 5th generation

Figure 8.1: KAOS goal model of the remote data mirroring application. (A) Left half of remote data mirroring goal model; (B) right half of remote data mirroring goal model.
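In code form, the fitness computation of Equations (8.1) – (8.3) can be sketched as follows; the bookkeeping of RELAXed goals, faults, and adaptations is assumed to be supplied by the simulation, and the weights are left as parameters since AutoRELAX-SAW tailors them.

# Sketch of the fitness computation in Equations (8.1)-(8.3). Counts of RELAXed
# goals, adaptations, and faults are assumed to be gathered by the simulation.
def autorelax_fitness(num_relaxed, num_noninvariant_goals,
                      num_adaptations, num_faults,
                      invariants_satisfied, alpha_nrg, alpha_na):
    if not invariants_satisfied:
        return 0.0                                                   # Equation (8.3), otherwise case
    ff_nrg = 1.0 - float(num_relaxed) / num_noninvariant_goals      # Equation (8.1)
    if num_faults == 0:
        ff_na = 1.0       # assumed: no faults encountered implies no adaptation penalty
    else:
        ff_na = 1.0 - float(num_adaptations) / num_faults           # Equation (8.2)
    return alpha_nrg * ff_nrg + alpha_na * ff_na                    # Equation (8.3)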
Figure 8.2 (previously presented in Chapter 3.3.1.1) presents boxplots of the average fitness values calculated for an RDM application using AutoRELAX to configure its goal models, a manually-RELAXed goal model configured by a requirements engineer, and a Control in which no goal RELAXations were applied. As is demonstrated by these results, AutoRELAX can attain significantly higher fitness values than can be found using either the manually-RELAXed goal model or the unRELAXed goal model.

Figure 8.2: Fitness values comparison between RELAXed and unRELAXed goal models for the RDM.

Furthermore, we examined how SAW can optimize fitness by evolving the fitness sub-function weights. Figure 8.3 (previously presented in Chapter 3.3.1.2) presents boxplots of the average fitness values calculated for AutoRELAXed goal models and goal models evolved with AutoRELAX-SAW. As these results demonstrate, SAW can significantly increase fitness by evolving fitness sub-function weights.

Figure 8.3: Fitness values comparison between AutoRELAXed and SAW-optimized AutoRELAXed goal models for the RDM.

Based on these results, we conclude that AutoRELAX can generate better configurations of RELAX operators than can be found by a requirements engineer. We analyzed the resulting set of data and determined that Goals (C), (F), and (T) provided the largest impact on RDM fitness. To this end, we manually introduce RELAX operators to Goals (C), (F), and (T) in all instances of the RDM application for the remainder of this end-to-end study, as RELAXation of these goals provides extra flexibility necessary for the RDM to satisfy its key objectives. Figure 8.4 presents the RDM goal model updated to use the RELAX operators found by AutoRELAX.

8.2.2 Scalability of RDM Application

In this section, we explore the scalability of the RDM application. First, we introduce the parameters used to configure the RDM in both the normal and scaled experiments. Next, we describe our approach for comparing the normal and scaled versions of the RDM. Finally, we explore the effectiveness of AutoRELAX in the scaled RDM application.

8.2.3 Scaled RDM Configuration

The RDM application accepts a configuration file that defines parameters such as the number of data mirrors, number of messages to disperse, and an operating budget. Thus far, the RDM has been configured on a relatively small scale for simulation purposes. Table 8.4 presents the relevant configuration parameters that have been increased to effectively scale the RDM application.

Table 8.4: Comparison of RDM Configuration Parameters.
Configuration Parameter    Normal RDM Value    Scaled RDM Value
Number of Messages         [100, 200]          [800, 1000]
Timesteps                  300                 2000

As such, only two parameter updates were necessary to scale the RDM application, as the existing number of data mirrors and operating budget can successfully handle the larger number of messages inserted into the network. In particular, we increased the number of messages to be distributed to effectively increase the amount of traffic handled by the RDM application. To this end, we also increased the number of timesteps available for the RDM to disseminate those messages, as extra time was required to fully replicate each message throughout the network.
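In configuration-file terms, the scaled experiments therefore amount to overriding two values relative to the normal setup; the key names in the sketch below are assumptions, while the value ranges are those listed in Table 8.4.

# Hypothetical configuration overrides for the scaled RDM experiments.
# Key names are assumptions; the values come from Table 8.4.
NORMAL_RDM_CONFIG = {"num_messages_range": (100, 200), "timesteps": 300}
SCALED_RDM_CONFIG = dict(NORMAL_RDM_CONFIG,
                         num_messages_range=(800, 1000), timesteps=2000)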
Figure 8.4: RELAXed KAOS goal model of the remote data mirroring application, in which RELAX operators (e.g., AS CLOSE AS POSSIBLE TO) have been applied to Goals (C), (F), and (T). (A) Left half of RELAXed remote data mirroring goal model; (B) right half of RELAXed remote data mirroring goal model.

8.2.4 Approach

For this study, we analyzed the scalability of the RDM application as applied to AutoRELAX. In particular, we were interested in the viability of evolving a RELAXed goal model for a network that has been scaled upward. To this end, we reuse the AutoRELAX approach introduced in Chapter 3.2.2; however, the RDM was configured to use the scaled configuration parameters defined in Table 8.4. Moreover, we also reuse the RDM goal model presented in Figure 8.4. In the interest of both execution time and available storage space, AutoRELAX-SAW was not applied in this study.

8.2.5 Experimental Results

We now present experimental results from applying AutoRELAX to a scaled RDM application. We performed three experimental treatments to determine the scalability of both the RDM and AutoRELAX. For the first treatment, we applied AutoRELAX to the RDM goal model. For the second treatment, we manually applied RELAX operators to the RDM goal model. In particular, RELAX operators were manually applied to Goals (F), (G), and (H) (c.f., Figure 8.1). Lastly, the third treatment was a Control in which no goal RELAXation was performed. For each treatment, we performed 50 trials for statistical significance.

Figure 8.5 presents boxplots of the average fitness values calculated for the RDM application over each experimental treatment. These results indicate that automatically RELAXing goals can yield a higher overall fitness as compared to performing no goal RELAXation (Wilcoxon-Mann-Whitney U-test, p < 0.05). However, manually RELAXing the goal model did not yield significantly higher fitness values than the Control, indicating that a RELAXed goal model will not always yield higher fitness values when scaling the RDM application. Furthermore, we manually RELAXed other goals to determine if different configurations of RELAX operators would perform better. In each case, overall fitness did not significantly improve compared to the Control experiment.
While scaling the RDM application is feasible, the amount of data generated was considerably larger than the dataset generated by the normally-configured application. In particular, each experimental replicate generated a dataset several gigabytes in size and required at least three days to run on a high-performance computing cluster. In total, the full set of replicate trials required roughly three weeks of computing time. To summarize, this experiment has demonstrated the feasibility of scaling both the RDM application and AutoRELAX while presenting the overhead encountered when performing AutoRELAX in an upward-scaled environment.

8.3 Code-Based Assurance

This section presents how we used Fenrir to provide assurance for the RDM's implementation in the face of code-based uncertainty. In particular, we examine how an RDM engineer can analyze output provided by Fenrir to identify sections of code that may require an update. The RDM engineer can then update the associated system goal model or requirements specification as necessary to reflect the required changes to the DAS implementation.

8.3.1 Fenrir Analysis

We applied RELAX operators to the goals previously identified by AutoRELAX (cf. Figure 8.4) to determine the RDM's response to uncertainty when augmented with RELAX. In particular, we applied RELAX operators to Goals (C), (F), and (T). We then executed the instrumented, RELAXed RDM application over 50 trials to provide a statistically significant suite of possible execution behaviors. Table 8.5 presents the configuration of the novelty search algorithm implemented by Fenrir.

Table 8.5: End-to-end Fenrir configuration.

    Configuration Parameter    Value
    Number of generations      15
    Population size            20
    Crossover rate             25.0%
    Mutation rate              50.0%
    Novelty threshold          10.0%

Following execution, we selected a subset of the resulting RDM execution traces that exhibited the largest number of errors for manual analysis. We then identified a section of code that continually triggered an error condition in the data distribution portion of the RDM codebase. In particular, the RDM was attempting to send messages via faulty data mirrors, thereby triggering an assertion error. We rectified the problem by ensuring that a faulty data mirror is never selected for message distribution.

Figure 8.6 presents boxplots that compare the average number of errors exhibited by the original RDM codebase to the average number of errors exhibited by the updated RDM codebase. These results demonstrate that the bugfix significantly reduces the number of errors encountered throughout the simulation (Wilcoxon-Mann-Whitney U-test, p < 0.05). Moreover, the introduced bugfix reduces the maximum number of errors to 1, indicating that the updated section of code was responsible for the majority of errors logged within the RDM application.

Figure 8.6: Comparison of average number of RDM errors between original and updated RDM application across 50 trials.

The data presented within this section, particularly in Figure 8.6, demonstrate that Fenrir enables a DAS engineer to effectively identify and rectify problems within a DAS codebase. Specifically, the novel set of system and environmental conditions generated by Fenrir provides a repository of operational contexts with which to exercise DAS behavior in uncertain situations.
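As a minimal sketch of the kind of guard introduced by the bugfix described above, the following code filters faulty data mirrors out of the candidate set before a distribution target is selected. The class, field, and function names are hypothetical and do not correspond to the actual RDM implementation.

    # Illustrative sketch of the guard described above: faulty data mirrors are
    # filtered out before a message distribution target is selected.
    # All names here are hypothetical, not the RDM codebase.
    import random
    from dataclasses import dataclass

    @dataclass
    class DataMirror:
        mirror_id: int
        faulty: bool = False

    def select_target_mirror(mirrors):
        """Select a mirror for message distribution, skipping any faulty mirrors."""
        healthy = [m for m in mirrors if not m.faulty]
        if not healthy:
            return None  # no viable target; caller must defer or reroute the message
        return random.choice(healthy)

    # Example usage:
    network = [DataMirror(0), DataMirror(1, faulty=True), DataMirror(2)]
    target = select_target_mirror(network)
    assert target is None or not target.faulty  # the assertion that previously failed now holds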
8.4 Run-Time Testing-Based Assurance

This section discusses the application of Proteus and Veritas to the RDM application to provide run-time testing assurance. We first discuss the derivation of the test specification and then analyze the results from applying Proteus and Veritas to the RDM, respectively.

8.4.1 Derivation of Test Specification

To perform run-time testing on the RDM, a test specification was derived based upon its requirements. As such, we derived a representative set of test cases based on the goal model presented in Figure 8.4. Furthermore, we analyzed the RDM codebase to create additional test cases that validate system functionality. In total, 34 test cases were derived for run-time execution on the RDM application. Each test case was correlated to at least one goal in the goal model to measure test case relevance, as described in Chapter 6.4.2. Of those 34 test cases, 7 were considered invariant and therefore precluded from adaptation. Moreover, invariant test cases were re-executed each testing cycle to ensure that system constraints are continually satisfied. The remaining 27 test cases were labeled non-invariant and were therefore targets for adaptation.

To provide assurance at the run-time testing level, the RDM application was outfitted with the Proteus run-time testing framework, including management of run-time testing activities and coarse-grained test plan adaptation. Furthermore, Veritas was also implemented to provide fine-grained test case adaptation capabilities. The following sections present results in which run-time testing activities, including both types of test adaptation, were performed on the RDM application.

8.4.2 Proteus Analysis

To evaluate the impact of Proteus test plan adaptation on the RDM application, we compared Proteus adaptive test plans to a manually-derived test plan. Moreover, Veritas was disabled to focus solely on the effects of Proteus test plan adaptation. The manually-derived, or Control, test plan comprised all test cases from the test specification. The test cases in the Control test plan were only executed when the conditions for test case execution, as specified by each individual test case, were met. Lastly, to provide statistical significance, we performed 50 trials of each experiment.

First, we examined the number of irrelevant test cases executed to demonstrate how Proteus can reduce the effort expended by the testing framework. Figure 8.7 presents the cumulative number of irrelevant test cases executed using Proteus adaptive test plans and the Control test plan, respectively. As the boxplots in Figure 8.7 demonstrate, Proteus adaptive test plans significantly reduce the number of irrelevant test cases executed (Wilcoxon-Mann-Whitney U-test, p < 0.05). This result enables us to conclude that using adaptive test plans can reduce the amount of unnecessary effort expended by the testing framework.

Figure 8.7: Cumulative number of irrelevant test cases executed for each experiment.
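To illustrate how test case relevance might be used to avoid executing irrelevant test cases, the sketch below pairs each test case with its invariant flag and correlated goals and selects only the test cases that apply to the current operating context. The representation and selection rule are simplified assumptions for illustration; they are not the Proteus implementation.

    # Illustrative sketch of a test specification entry and a relevance filter.
    # Field names, goal labels, and the selection rule are hypothetical simplifications.
    from dataclasses import dataclass, field

    @dataclass
    class TestCase:
        tc_id: str
        invariant: bool                      # invariant test cases are never adapted
        correlated_goals: set = field(default_factory=set)

    def select_for_cycle(test_cases, active_goals):
        """Always execute invariant test cases; execute non-invariant ones only if at
        least one of their correlated goals is active in the current operating context."""
        return [tc for tc in test_cases
                if tc.invariant or (tc.correlated_goals & active_goals)]

    # Example usage with hypothetical identifiers:
    spec = [TestCase("TC1", invariant=True, correlated_goals={"A", "C"}),
            TestCase("TC4", invariant=False, correlated_goals={"E"}),
            TestCase("TC8", invariant=False, correlated_goals={"I"})]
    print([tc.tc_id for tc in select_for_cycle(spec, active_goals={"A", "I"})])  # ['TC1', 'TC8']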
Figure 8.8 next presents the number of false positive test results (i.e., a test case passed but the correlated utility function was violated) found when performing run-time testing. As the results demonstrate, Proteus adaptive test plans can significantly reduce the number of false positive test results (Wilcoxon-Mann-Whitney U-test, p < 0.05), indicating that Proteus adaptive test plans can reduce the number of executed test cases that are not relevant to current operating conditions.

Figure 8.8: Cumulative number of false positive test cases for each experiment.

Next, Figure 8.9 depicts the number of false negative test results (i.e., test cases that fail but the correlated utility function is satisfied). Here, Proteus significantly reduces the number of false negative test cases (Wilcoxon-Mann-Whitney U-test, p < 0.05), demonstrating that adaptive test plans can reduce the overall number of adaptations required by the testing framework, thereby reducing the overall cost of testing on the DAS.

Figure 8.9: Cumulative number of false negative test cases for each experiment.

Lastly, Figure 8.10 presents the total number of executed test cases for both Proteus and the Control. This figure demonstrates that adaptive test plans significantly reduce the number of test case executions at run time (Wilcoxon-Mann-Whitney U-test, p < 0.05). This result enables the conclusion that using Proteus adaptive test plans can again reduce the effort required by the testing framework, as the total number of test cases executed at run time has been minimized.

Figure 8.10: Cumulative number of executed test cases for each experiment.

8.4.3 Veritas Analysis

To evaluate the impact of fine-grained test case adaptation on the run-time testing framework, we enabled Veritas and augmented the original test specification with acceptable adaptation ranges for each test case. We then compared the results to a Control in which test case adaptation was not performed. For this experiment, we disabled Proteus test plan adaptation to focus solely on the results provided by Veritas.

In particular, we examine the average test case fitness value throughout execution of the RDM. Test case fitness is considered to be the difference between the test case expected value and the value measured by the testing framework (test case fitness is formally defined in Chapter 6.4.2). Figure 8.11 presents the average test case fitness values calculated throughout execution of the RDM application. These results indicate that Veritas can adapt test case parameter values to be more relevant to their environment, as Veritas returned significantly higher fitness values during the course of execution (Wilcoxon-Mann-Whitney U-test, p < 0.05).

Figure 8.11: Average test case fitness values calculated for each experiment.
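As a rough sketch of the test case fitness idea described above, the following code scores a measured value against a test case's expected value and safety tolerance, with fitness decaying as the difference grows. The normalization scheme and the example values are illustrative assumptions; the formal definition appears in Chapter 6.4.2.

    # Illustrative sketch only: fitness reflects how close a measured value is to the
    # test case's expected value. This linear normalization is a simplification and is
    # not the formal definition from Chapter 6.4.2.

    def test_case_fitness(expected, measured, tolerance):
        """Return a fitness value in [0.0, 1.0]; 1.0 means the measured value matches the
        expected value exactly, decaying linearly to 0.0 at the tolerance bound."""
        difference = abs(expected - measured)
        if difference >= tolerance:
            return 0.0
        return 1.0 - (difference / tolerance)

    # Example usage with hypothetical values: a test case expects an object detected at
    # 0.5 m with a safety tolerance of +/- 0.2 m, and the framework measures 0.58 m.
    print(round(test_case_fitness(expected=0.5, measured=0.58, tolerance=0.2), 2))  # 0.6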
8.5 Conclusion

This chapter has discussed our investigation into applying each assurance technique presented within this dissertation sequentially to the RDM application to demonstrate the effectiveness of our techniques. First, we explored how AutoRELAX provides requirements-based assurance by automatically RELAXing a system goal model. To this end, we discovered that introducing RELAX operators into the RDM goal model can increase the overall fitness of the application, and moreover that Goals (C), (F), and (T) are most amenable to RELAXation.

We also explored the scalability of the RDM application. We performed the AutoRELAX experiment on an RDM application that was required to distribute a substantially larger number of messages than was required for the normal RDM. The scaled version of the RDM successfully satisfied its requirements; however, it incurred significant overhead in both execution time and memory.

Next, we applied Fenrir to the RELAXed RDM application to enable implementation-based assurance. In particular, analyzing Fenrir output enabled us to discover a major bug within the RDM application that accounted for the majority of run-time errors.

Lastly, we performed run-time testing on the RDM application to enable run-time assurance that the RDM is behaving in accordance with its requirements. As such, we enabled Proteus test plan adaptation and Veritas test case parameter value adaptation to provide flexibility when executing tests in uncertain environments, ensuring that test cases remained relevant throughout changing conditions.

Chapter 9 Conclusions and Future Investigations

This chapter summarizes our contributions to the field of software engineering in providing DAS run-time assurance at different levels of abstraction. We also propose future investigations to complement this body of work.

Software systems are becoming increasingly complex as they are expected to execute in uncertain environments. In particular, software systems can adapt at run time in the face of uncertainty to continue execution while still satisfying requirements. DASs provide self-reconfiguration capabilities to change configuration states when exposed to different types of environmental conditions. As such, providing continuous assurance that a DAS is executing correctly and safely is a concern, especially as DASs take part in increasingly critical systems, such as health care infrastructure and power grid management. To this end, researchers have been exploring methods of providing run-time assurance through system monitoring and run-time testing, particularly when faced with uncertain environmental conditions.

This dissertation presented techniques for providing assurance for a DAS at different levels of abstraction, both at design time and at run time. First, we described how to provide assurance at the requirements level with AutoRELAX [42, 88] and AutoRELAX-SAW [42], two techniques that add flexibility to a system goal model and tailor fitness sub-function weights to particular environmental contexts, respectively. Next, we presented our work in exploring how assurance can be provided at design time at the implementation level with Fenrir [45], our technique for exploring how a DAS executes in different operational contexts. Following, we outlined a guiding approach to enabling run-time testing with the MAPE-T feedback loop [44]. MAPE-T is derived from the DAS MAPE-K feedback loop and, as such, can be used in tandem with MAPE-K to provide run-time assurance. We also outlined the key challenges and enabling technologies needed to realize MAPE-T. Then, we introduced two techniques for providing run-time assurance with adaptive testing. Specifically, we introduced Proteus [41], a technique for test suite adaptation, and Veritas [43], a technique for adapting test case parameter values. We also investigated the feasibility of using RELAX operators to provide flexibility for run-time test case results.
To demonstrate the effectiveness of our techniques, we applied Fenrir, AutoRELAX, and AutoRELAX-SAW to an application that must dynamically reconfigure an RDM network. Furthermore, we also applied AutoRELAX, AutoRELAX-SAW, Proteus, and Veritas to a cyber-physical system that must efficiently, effectively, and safely vacuum a room.

We then studied the impact that run-time testing imposes on a DAS. In particular, we examined DAS performance and behavior while run-time testing was enabled. We discovered that a DAS outfitted with a run-time testing framework requires significantly more time to execute than a DAS that does not perform run-time testing. However, our testing framework did not consume significantly more memory, and moreover did not significantly impact the DAS from a behavioral standpoint. Therefore, we conclude that, as long as the DAS can tolerate the extra time required, run-time testing provides a considerable benefit for run-time assurance. Lastly, we performed an end-to-end investigation of each of our techniques as applied to the RDM application to demonstrate how each technique enhances assurance at different levels of abstraction.

9.1 Summary of Contributions

There were three overarching objectives for this body of research:

1. Provide a set of techniques that effectively and efficiently mitigate uncertainty by providing assurance at different levels of DAS abstraction.
2. Maximize the use of automation in designing, performing, and analyzing each technique.
3. Provide a well-defined feedback loop to enable run-time testing of a DAS.

As such, this dissertation has presented the following research contributions:

• Defined a suite of techniques that provide assurance for a DAS at its requirements level [42, 88], implementation level [45], and run-time testing level [41, 43, 44].

• Explored how search-based software engineering techniques, particularly evolutionary computation, can be used to enable assurance for a DAS at different levels of abstraction. Specifically, we investigated how a genetic algorithm and a hyper-heuristic can mitigate requirements-based uncertainty [42, 88], how novelty search can mitigate implementation-based uncertainty [45], and how an online evolutionary algorithm can mitigate run-time testing-based uncertainty [43]. Moreover, we explored how automated techniques can be used to create, execute, and maintain run-time software test suites [41].

• Proposed and instantiated a well-defined DAS testing feedback loop for execution at run time [44], and moreover provided an implementation of the different aspects of the feedback loop [41, 43].

9.2 Future Investigations

The investigations described within this dissertation demonstrate approaches for providing software assurance at different levels of abstraction to address uncertainty. Based on the presented results, we now describe complementary investigations that can be pursued to extend our line of work in automatically enabling DAS assurance. We describe each investigation in turn.

9.2.1 Exploration of Different Evolutionary Computation Techniques

Evolutionary computation is used as our main approach for automatically searching a solution space for an optimal solution. AutoRELAX uses a genetic algorithm to search for optimal configurations of RELAXed goal models, and moreover uses a hyper-heuristic, SAW, to search for an optimal combination of fitness sub-function weights.
Fenrir leverages novelty search to find a solution set comprising the most different solutions within the search space. Lastly, Veritas implements an online evolutionary algorithm, the (1+1)-ONLINE EA, to find optimal combinations of test case parameter values. We envision that exploration of other search-based techniques can provide an interesting comparison point to this body of work and determine the effect that other approaches may have on providing assurance, as other evolutionary approaches may uncover areas of the solution space that our techniques have not explored. In particular, approaches such as multi-objective optimization, simulated annealing, or particle swarm optimization could be compared against the results found by the evolutionary approaches we have used. Moreover, extending these techniques to the run-time domain would also be of great benefit to the field of software engineering, as few run-time evolutionary algorithms currently exist, and many advanced applications (e.g., power grid systems and autonomous vehicles) may benefit from an available suite of run-time, search-based heuristics.

Furthermore, different types of distance metrics can be explored for use in Fenrir to determine if a different approach can uncover solutions that provide a richer set of expressed DAS behaviors. For instance, we use the Manhattan distance metric, whereas others, such as Euclidean distance, could be explored in comparison. Lastly, probabilistic methods [4, 15, 38, 84] may also be used to explore the search space.

9.2.2 Interfacing with the DAS MAPE-K Loop

We have provided sample instantiations of the MAPE-T feedback loop for run-time testing. We envision that future investigations can more tightly link the DAS MAPE-K loop with the MAPE-T loop to enable fine-grained control over DAS reconfiguration capabilities. In particular, test cases that have been correlated with a utility function could be used to validate different aspects of that utility function, enabling a fine-grained analysis of each utility function. As such, the DAS reconfiguration engine could consume this information and perform a targeted reconfiguration, thereby reducing the overall impact of reconfiguration on the DAS and potentially reducing unexpected side effects of a total reconfiguration.

9.2.3 Hardware Realization of the MAPE-T Loop

To validate the techniques presented in this dissertation, we have simulated both a high-level networking application and a low-level onboard controller for an embedded system. However, given the reality gap that naturally exists between simulation and implementation, we envision that a hardware-based realization of a DAS with which to implement the MAPE-T loop can provide an even richer testbed for experimentation, both in validating our simulated results and in enabling further research into the MAPE-T loop.

9.2.4 Incorporation of MAS Architecture

Lastly, unifying the DAS and multi-agent system (MAS) architectures to create a new type of adaptive system could be a particularly interesting path of research. Specifically, the use of agents in the DAS architecture could provide a significant advantage in that processor- or memory-intensive tasks could be offloaded to a separate agent, thereby freeing the DAS to fully focus on monitoring its environment and performing self-reconfigurations.
Given the prior work we have previously described in agent-based testing [78], we envision that using agents to perform software testing can significantly reduce the impact of run-time testing on the DAS architecture. 192 APPENDIX 193 Appendix Test Specifications This appendix presents the test specifications used for both the smart vacuum system (SVS) and remote data mirroring (RDM) case studies within this dissertation. For each case study, we first present the full test specification, including a description of each test case, expected value(s), and the safety tolerances used to enable safe adaptation of test case parameter values. We then provide the traceability links between tests and requirements, particularly, how each test case is correlated to a particular goal within the corresponding goal model. Smart Vacuum System Test Specification This section overviews the test specification used for the SVS case study within this dissertation. Particularly, Table A.1 presents the test specification that is used as a basis for run-time testing and adaptation. For each test case, we present the test case identifier (ID), the test case type (i.e., invariant or non-invariant), a description, the expected value(s), and the safety tolerance used for adaptation. Furthermore, if the safety tolerance is listed as N/A, then the test case is considered non-invariant, however cannot be physically adapted as no defined flexibility was provided. Lastly, while some test cases appear to be duplicates, 194 they may measure different variables, execute under different circumstances, or provide a slightly different safety tolerance to explore different test case adaptations. Table A.1: Smart vacuum system test specification. TC1 Test case type Non-invariant Description Room sensor can detect the amount of dirt cleaned by the SVS. Expected value RoomSensorActive = T rue Safety tolerance N/A TC2 Test case type Non-invariant Description SVS executes a power conservation plan to maximize the amount of time it spends vacuuming the room. Expected value P owerM odif ier ∈ [0.0, 1.0) Safety tolerance N/A TC3 Test case type Non-invariant Description SVS executes a path plan to maximize the amount of coverage while vacuuming the room. Expected value P athP lan ∈ {RAN DOM, SP IRAL, ST RAIGHT } Safety tolerance N/A TC4 Test case type Non-invariant Description SVS internal sensor can detect battery usage and percentage of room covered. Continued on next page 195 Table A.1 (cont’d) Expected value InternalSensorActive = T rue Safety tolerance N/A TC5 Test case type Non-invariant Description SVS must be in motion. Expected value Speed > 0.0 Safety tolerance Speed ≥ 0.0 TC6 Test case type Non-invariant Description SVS suction must be active. Expected value SuctionActive = T rue Safety tolerance N/A TC7 Test case type Invariant Description Bumper sensor tested by internal health monitor SHALL always be healthy. Expected value BumperSensor.Healthy = T rue Safety tolerance N/A TC8 Test case type Invariant Description No large dirt particles SHALL be inside vacuum at any point in time. Expected value N oLargeDirtDetected = T rue Continued on next page 196 Table A.1 (cont’d) Safety tolerance N/A TC9 Test case type Invariant Description SVS SHALL never fall off cliff. Expected value F allDetected = F alse Safety tolerance N/A TC10 Test case type Invariant Description SVS SHALL never collide with non-collideable objects. 
Expected value InvalidCollision = F alse Safety tolerance N/A TC11 Test case type Invariant Description SVS SHALL never collide with liquid objects. Expected value InvalidCollision = F alse Safety tolerance N/A TC12 Test case type Non-invariant Description If OBJECT detected within 0.5m of SVS, SVS SHALL execute BACKUP procedure for 2 seconds. Expected value DistanceT oObject = 0.5m Safety tolerance ±0.2m TC13 Continued on next page 197 Table A.1 (cont’d) Test case type Non-invariant Description If LIQUID detected within 0.5m of SVS, SVS SHALL execute BACKUP procedure for 2 seconds. Expected value DistanceT oLiquid = 0.5m Safety tolerance ±0.2m TC14 Test case type Non-invariant Description If CLIFF detected by SVS cliff sensor, SVS SHALL execute BACKUP procedure for 2 seconds. Expected value DistanceT oClif f = 0.5m Safety tolerance ±0.2m TC15 Test case type Non-invariant Description If LARGE DIRT PARTICLE detected within 0.5m of SVS, SVS SHALL execute BACKUP procedure for 2 seconds. Expected value DistanceT oLargeDirtP article = 0.5m Safety tolerance ±0.2m TC16 Test case type Invariant Description Wheel sensor tested by internal health monitor SHALL always be healthy. Expected value W heelSensor.Healthy = T rue Safety tolerance N/A Continued on next page 198 Table A.1 (cont’d) TC17 Test case type Non-invariant Description If LEFT BUMPER array triggered, SVS SHALL reorient by π/4.0 radians clockwise. Expected value CurrentP athAngle = CurrentP athAngle + (π/4.0)rad Safety tolerance ±1.0rad TC18 Test case type Non-invariant Description If RIGHT BUMPER array triggered, SVS SHALL reorient by π/4.0 radians counter-clockwise. Expected value CurrentP athAngle = CurrentP athAngle − (π/4.0)rad Safety tolerance ±1.0rad TC19 Test case type Non-invariant Description If CENTER BUMPER array triggered, SVS SHALL reorient by π/4.0 radians in the preferred (configured) direction. Expected value CurrentP athAngle = CurrentP athAngle ± (π/4.0)rad Safety tolerance ±1.0rad TC20 Test case type Non-invariant Description If BatteryPower falls below 50%, SVS reduces power to either SUCTION or MOTOR. Expected value BatteryP ower ≤ 50% Continued on next page 199 Table A.1 (cont’d) Expected value V elocity ≥ 0.0 Safety tolerance N/A TC21 Test case type Non-invariant Description If BatteryPower falls below 25%, SVS reduces power to either SUCTION or MOTOR, whichever is still operating at FULL POWER. Expected value BatteryP ower ≤ 25% Safety tolerance N/A TC22 Test case type Non-invariant Description If BatteryPower falls below 10%, SVS reduces power further to SUCTION and MOTOR. Expected value BatteryP ower ≤ 10% Safety tolerance N/A TC23 Test case type Invariant Description Suction sensor tested by internal health monitor SHALL always be healthy. Expected value SuctionSensor.Healthy = T rue Safety tolerance N/A TC24 Test case type Invariant Continued on next page 200 Table A.1 (cont’d) Description SVS suction must not be impeded by large objects or liquids. Expected value InvalidCollision = F alse Safety tolerance N/A TC25 Test case type Non-invariant Description If RANDOM selected, SVS SHALL execute RANDOM path plan for 10 seconds. Expected value P athP lanT imer = 10sec Safety tolerance ±5.0sec TC26 Test case type Non-invariant Description If STRAIGHT selected, SVS SHALL execute STRAIGHT path plan for 10 seconds. Expected value P athP lanT imer = 10sec Safety tolerance ±5.0sec TC27 Test case type Non-invariant Description SVS continually provides forward motion while providing suction. 
Expected value V elocity ≥ 0.0 Safety tolerance N/A TC28 Test case type Invariant Continued on next page 201 Table A.1 (cont’d) Description If internal sensor or object sensor has failed, then SVS must disable the wheel motors. Expected value W heelM otors = DISABLED Safety tolerance N/A TC29 Test case type Invariant Description If large dirt particle is detected internally, then SVS SHALL shut off all power. Expected value P ower = OF F Safety tolerance N/A TC30 Test case type Invariant Description If liquid detected internally, then SVS SHALL shut off all power. Expected value P ower = OF F Safety tolerance N/A TC31 Test case type Invariant Description If collision with non-collideable object, SVS SHALL shut off all power. Expected value P ower = OF F Safety tolerance N/A TC32 Test case type Invariant Continued on next page 202 Table A.1 (cont’d) Description If SVS fell off cliff, then SVS SHALL shut off all power. Expected value P ower = OF F Safety tolerance N/A TC33 Test case type Non-invariant Description If OBJECT detected within 0.5m of SVS, SVS SHALL execute BACKUP procedure for 2 seconds. Expected value DistanceT oObject = 1.0m Safety tolerance ±0.2m TC34 Test case type Non-invariant Description If LIQUID detected within 0.5m of SVS, SVS SHALL execute BACKUP procedure for 2 seconds. Expected value DistanceT oLiquid = 0.5m Safety tolerance ±0.2m TC35 Test case type Non-invariant Description If CLIFF detected by SVS cliff sensor, SVS SHALL execute BACKUP procedure for 2 seconds. Expected value DistancetoClif f = 0.5m Safety tolerance ±0.2m TC36 Test case type Non-invariant Continued on next page 203 Table A.1 (cont’d) Description If LARGE DIRT PARTICLE detected within 0.5m of SVS, SVS SHALL execute BACKUP procedure for 2 seconds. Expected value DistanceT oObject = 0.5m Safety tolerance ±0.2m TC37 Test case type Non-invariant Description If ROOM SENSOR cannot measure amount of dirt in room, SVS controller SHALL select a random path plan. Expected value RoomSensor.Healthy = T rue Safety tolerance N/A TC38 Test case type Non-invariant Description SVS is moving at the prescribed reduced speed. Expected value V elocityM odif ier ∈ (0.0, 1.0) Safety tolerance [0.0, 1.0] TC39 Test case type Non-invariant Description SVS has at least 25% battery power remaining. Expected value BatteryP ower ≥ 0.25 Safety tolerance [0.05, 1.0] TC40 Test case type Non-invariant Description SVS is moving at the initially prescribed speed. Continued on next page 204 Table A.1 (cont’d) Expected value V elocityM odif ier = 1.0 Safety tolerance [0.0, 1.0] TC41 Test case type Non-invariant Description SVS has at least 75% battery power remaining. Expected value BatteryP ower ≥ 0.75 Safety tolerance [0.05, 1.0] TC42 Test case type Non-invariant Description SVS has successfully entered a REDUCED SUCTION POWER mode. Expected value SuctionM odif ier ∈ (0.0, 1.0) Safety tolerance [0.0, 1.0] TC43 Test case type Non-invariant Description SVS provides suction power of at least the base measure and less than the power provided in NORMAL SUCTION POWER mode. Expected value SuctionP ower ≥ 0.25 AND SuctionP ower < 1.0 Safety tolerance N/A TC44 Test case type Non-invariant Continued on next page 205 Table A.1 (cont’d) Description SVS continually moves in a forward motion while providing reduced suction power. Expected value V elocityM odif ier ≥ 0.25 AND SuctionP ower ∈ (0.0, 1.0) Safety tolerance N/A TC45 Test case type Non-invariant Description SVS has not entered a REDUCED SUCTION POWER mode. 
Expected value SuctionP ower is always 1.0 Safety tolerance [0.0, 1.0] TC46 Test case type Non-invariant Description SVS provides suction power of the prescribed base measure. Expected value SuctionP ower = 0.25 Safety tolerance [0.0, 1.0] TC47 Test case type Non-invariant Description SVS continually provides forward motion while providing base suction power. Expected value V elocityM odif ier ≥ 0.25 AND SuctionP ower ≥ 0.25 Safety tolerance [0.0, 1.0] TC48 Test case type Non-invariant Continued on next page 206 Table A.1 (cont’d) Description If RANDOM selected, SVS SHALL execute RANDOM path plan for 10 seconds. Expected value P athT imer = 10.0sec Safety tolerance ±5.0sec TC49 Test case type Non-invariant Description If RANDOM selected, SVS SHALL select a random angle on [0.0, π]rad every 5 seconds. Expected value P athT imer = 10.0sec Safety tolerance ±5.0sec TC50 Test case type Non-invariant Description If STRAIGHT selected, SVS SHALL execute STRAIGHT path plan for 10 seconds. Expected value P athT imer = 10.0sec Safety tolerance ±5.0sec TC51 Test case type Non-invariant Description If STRAIGHT selected and collision occurs, SVS will select a new angle (π/4.0)rad from the prior angle. Expected value P athAngle ± (π/4.0)rad Safety tolerance N/A TC52 Continued on next page 207 Table A.1 (cont’d) Test case type Non-invariant Description Wheel motors can be actuated to specified velocity. Expected value V elocityLef t ≥ SP ECIF IED V ELOCIT Y AND V elocityRight ≥ SP ECIF IED V ELOCIT Y Safety tolerance N/A TC53 Test case type Non-invariant Description SVS moves in a forward motion while following SPIRAL path plan. Expected value V elocityLef t ≥ 0.0 AND V elocityRight ≥ 0.0 Safety tolerance N/A TC54 Test case type Non-invariant Description SVS executes the spiral path plan for 20 seconds. Expected value P athT imer = 20.0sec Safety tolerance ±5.0sec TC55 Test case type Non-invariant Description SVS continually provides at least the base measure of suction power while executing spiral path plan. Expected value SuctionP ower ≥ 0.25 Safety tolerance N/A TC56 Continued on next page 208 Table A.1 (cont’d) Test case type Non-invariant Description If SPIRAL selected, SVS SHALL execute SPIRAL path plan for 20 seconds. Expected value P athP lan = SP IRAL AND P athT ime = 20sec Safety tolerance ±5.0sec TC57 Test case type Non-invariant Description Object sensor tested by internal health monitor SHALL always be healthy. Expected value ObjectSensor.Healthy is never F alse Safety tolerance N/A TC58 Test case type Non-invariant Description Cliff sensor tested by internal health monitor SHALL always be healthy. Expected value Clif f Sensor.Healthy is never F alse Safety tolerance N/A TC59 Test case type Non-invariant Description Internal sensor tested by internal health monitor SHALL always be healthy. Expected value InternalSensor.Healthy is never F alse Safety tolerance N/A Continued on next page 209 Table A.1 (cont’d) TC60 Test case type Invariant Description No large dirt particles SHALL be inside vacuum at any point in time. Expected value N oLargeDirtP articles is never F alse Safety tolerance N/A TC61 Test case type Non-invariant Description SPIRAL path plan is executed for 20 seconds. Expected value P athT imer = 20.0sec Safety tolerance ±5.0sec TC62 Test case type Non-invariant Description RANDOM path plan is executed for 10 seconds. Expected value P athT imer = 10.0sec Safety tolerance ±5.0sec TC63 Test case type Non-invariant Description STRAIGHT path plan is executed for 10 seconds. 
Expected value P athT imer = 10.0sec Safety tolerance ±5.0sec TC64 Test case type Non-invariant Continued on next page 210 Table A.1 (cont’d) Description SVS selects a new angle on [0.0, π]rad every 5.0 seconds. Expected value P athAngle ∈ [0.0, π]rad Safety tolerance [−(π/2.0), π + (π ∗ 2.0)]rad TC65 Test case type Non-invariant Description Upon notification of a left bumper collision, the SVS reorients clockwise by (π/4.0)rad. Expected value P athAngle + (π/4.0)rad Safety tolerance [(π/4.0) − 1.0, (π/4.0) + 1.0]rad TC66 Test case type Non-invariant Description Upon notification of a right bumper collision, the SVS reorients counter-clockwise by (π/4.0)rad. Expected value P athAngle − (π/4.0)rad Safety tolerance [−(π/4.0) − 1.0, −(π/4.0) + 1.0]rad TC67 Test case type Non-invariant Description Upon notification of a center bumper collision, the SVS reorients by (π/4.0)rad towards the configured direction preference. Expected value P athAngle ± (π/4.0)rad Safety tolerance [−(π/4.0) − 1.0, (π/4.0) + 1.0]rad TC68 Test case type Non-invariant Continued on next page 211 Table A.1 (cont’d) Description The SVS executes a backup procedure for 2.0sec upon detection of an object within 0.5m. Expected value DistanceT oObject = 0.5m Safety tolerance [0.3, 0.7]m TC69 Test case type Non-invariant Description The SVS executes a backup procedure for 2.0sec upon detection of a liquid spill within 0.5m. Expected value DistanceT oLiquid = 0.5m Safety tolerance [0.3, 0.7]m TC70 Test case type Non-invariant Description The SVS executes a backup procedure for 2.0sec upon detection of a downward step within 0.5m. Expected value DistanceT oClif f = 0.5m Safety tolerance [0.3, 0.7]m TC71 Test case type Non-invariant Description The SVS executes a backup procedure for 2.0sec upon detection of a large object within 0.5m. Expected value DistanceT oObject = 0.5m Safety tolerance [0.3, 0.7]m TC72 Continued on next page 212 Table A.1 (cont’d) Test case type Non-invariant Description Room sensor can measure the amount of dirt within the room. Expected value RoomSensor.Active = T rue Safety tolerance N/A Smart Vacuum System Goal Correlation This section provides an overview of how each test case is correlated to a particular goal, thereby providing a method for validating test results at run time. We now reproduce the SVS goal model previously presented in Figure 2.2 to simplify readability. 213 (A) Left half of SVS goal model (A) (B) (E) (K) Achieve [Movement] Achieve [Reduced Speed] (L) Achieve [Cleaning Efficiency] (F) Achieve [BatteryPower > 5%] Achieve [Normal Speed] (M) … (G) Maintain [Suction] Achieve [Reduced (N) Suction] Battery Sensor Motors Achieve [50% Clean] Achieve [Normal Suction] Vacuum (B) Right half of SVS goal model … (C) Achieve [Cleaning Effectiveness] (D) Maintain [Safety] (J) FailSafeEnabled If SafetyCheckFailure (R) Avoid [Obstacles] Object Sensor (H) Achieve [Path Plan for 10 Seconds] (O) Achieve [Random Path] (P) (S) Suction Sensor Avoid [Self Damage] Cliff Sensor Internal Sensor Achieve [Clean Area for 20 seconds] (I) Achieve [Straight Path] (Q) Achieve [Spiral Path] Legend Bumper Sensors Controller Goal Agent Requirement / Expectation Refinement Figure A.1: KAOS goal model of the smart vacuum system application. Table A.2 next presents the traceability links for each test case. Particularly, the first column presents the test case identifier and the second column lists the goals to which the test case is correlated. 
The traceability defined here enables run-time validation of test cases, 214 as is described in Chapter 6.4.2. Furthermore, we only linked test cases to leaf-level goals for the SVS, as they represent system requirements. Table A.2: Traceability links for smart vacuum system application. Test Case ID Correlated Goal(s) TC1 (K), (L), (M), (N), (O), (P), (Q), (R), (S) TC2 (K), (L), (M), (N), (O), (P), (Q), (R), (S) TC3 (K), (L), (M), (N), (O), (P), (Q), (R), (S) TC4 (K), (L), (M), (N), (O), (P), (Q) TC5 (K), (L), (M), (N), (O), (P), (Q) TC6 (K), (L), (M), (N), (O), (P), (Q) TC7 (R) TC8 (R) TC9 (S) TC10 (R), (S) TC11 (R), (S) TC12 (R) TC13 (R) TC14 (S) TC15 (R), (S) TC16 (K), (L) TC17 (K), (L) TC18 (K), (L) TC19 (K), (L) TC20 (K), (M) TC21 (K), (M) Continued on next page 215 Table A.2 (cont’d) Test Case ID) Correlated Goal(s) TC22 (K), (M) TC23 (M), (N) TC24 (M), (N) TC25 (O) TC26 (P) TC27 (K), (L), (O), (P), (Q) TC28 (R), (S) TC29 (R), (S) TC30 (R), (S) TC31 (R), (S) TC32 (S) TC33 (R), (S) TC34 (R), (S) TC35 (S) TC36 (R), (S) TC37 (S) TC38 (K) TC39 (K) TC40 (L) TC41 (L) TC42 (M) TC43 (M) TC44 (M) Continued on next page 216 Table A.2 (cont’d) Test Case ID) Correlated Goal(s) TC45 (N) TC46 (N) TC47 (N), (O), (P) TC48 (O) TC49 (O) TC50 (P) TC51 (P) TC52 (Q) TC53 (Q) TC54 (Q) TC55 (Q) TC56 (Q) TC57 (R) TC58 (S) TC59 (S) TC60 (S) TC61 (G), (N) TC62 (F), (L), (M) TC63 (L) TC64 (D) TC65 (D) TC66 (D) TC67 (O), (P), (Q) Continued on next page 217 Table A.2 (cont’d) Test Case ID) Correlated Goal(s) TC68 (O) TC69 (P), (Q), (R) TC70 (O), (P), (R) TC71 (O), (P), (Q), (R) TC72 (P) Remote Data Mirroring Test Specification This section overviews the test specification used for the RDM case study within this dissertation. Particularly, Table A.3 presents the test specification that is used as a basis for run-time testing and adaptation. For each test case, we present the test case identifier (ID), the test case type (i.e., invariant or non-invariant), a description, the expected value(s), and the safety tolerance used for adaptation. Furthermore, if the safety tolerance is listed as N/A, then the test case is considered non-invariant, however cannot be physically adapted. Table A.3: Remote data mirroring test specification. TC1 Test case type Invariant Description Data has been fully replicated throughout the network at the end of execution. Expected value RDM.DataCopies = RDM.N umServers Safety tolerance N/A Continued on next page 218 Table A.3 (cont’d) TC2 Test case type Invariant Description Operational cost has never exceeded the budget. Expected value RDM.Cost < RDM.Budget Safety tolerance N/A TC3 Test case type Non-invariant Description Link sensor and RDM sensor can collectively measure network properties. Expected value LinkSensor.Active = T rue AND RDM Sensor.Active = T rue Safety tolerance N/A TC4 Test case type Non-invariant Description Network actuator only enables the minimum number of links to replicate data. Expected value N etworkActuator.ActiveN etworkLinks = (RDM.N umServers − 1) Safety tolerance [RDM.N umServers − 1, RDM.N umServers + 1] TC5 Test case type Non-invariant Description Network actuator has never partitioned network. Expected value N etworkActuator.N umP artitions = 0 Continued on next page 219 Table A.3 (cont’d) Safety tolerance [0, 2] TC6 Test case type Non-invariant Description Risk threshold is never exceeded while sending data synchronously. 
Expected value N etworkController.SynchronousSend = T rue AND N etworkController.Risk ≤ RiskT hreshold Safety tolerance [RiskT hreshold, RiskT hreshold + 20.0] TC7 Test case type Non-invariant Description Diffusion time never exceeds maximum time while sending data asynchronously. Expected value N etworkController.AsynchronousSend = T rue AND Dif f usionT ime ≤ M axT ime Safety tolerance [Dif f usionT ime − 5.0, Dif f usionT ime + 5.0] TC8 Test case type Non-invariant Description Network adaptation costs must remain 0. Expected value AdaptationController.AdaptationCosts = 0.0 Safety tolerance [0.0, 200.0] TC9 Test case type Non-invariant Description Link sensor can physically measure cost. Continued on next page 220 Table A.3 (cont’d) Expected value LinkSensor.Active = T rue Safety tolerance N/A TC10 Test case type Invariant Description Link sensor is active. Expected value LinkSensor.Active = T rue Safety tolerance N/A TC11 Test case type Non-invariant Description Link sensor can physically measure activity. Expected value LinkSensor.Active = T rue Safety tolerance N/A TC12 Test case type Non-invariant Description Link sensor can physically measure loss rate. Expected value LinkSensor.Active = T rue Safety tolerance N/A TC13 Test case type Non-invariant Description RDM sensor can physically measure workload. Expected value RDM Sensor.Active = T rue Safety tolerance N/A TC14 Continued on next page 221 Table A.3 (cont’d) Test case type Invariant Description RDM sensor is active. Expected value RDM Sensor.Active = T rue Safety tolerance N/A TC15 Test case type Non-invariant Description RDM sensor can physically measure capacity. Expected value RDM Sensor.T otalCapacity > 0.0 Safety tolerance N/A TC16 Test case type Non-invariant Description Network actuator can measure link state. Expected value N etworkActuator.N umActiveLinks ≥ 0.0 AND N etworkController.N umSynchronousLinks ≥ 0.0 AND N etworkController.N umAsynchronousLinks ≥ 0.0 Safety tolerance N/A TC17 Test case type Non-invariant Description Network actuator can physically deactivate a link. Expected value N etworkActuator.N umActiveLinks ≥ 0.0 Safety tolerance N/A TC18 Test case type Invariant Continued on next page 222 Table A.3 (cont’d) Description Network actuator is active. Expected value N etworkActuator.Active = T rue Safety tolerance N/A TC19 Test case type Non-invariant Description Network actuator can physically activate a link. Expected value N etworkActuator.N umActiveLinks ≥ 0.0 Safety tolerance N/A TC20 Test case type Non-invariant Description Network controller can synchronously send data. Expected value N etworkController.N umSynchronousLinks ≥ 0.0 Safety tolerance N/A TC21 Test case type Non-invariant Description Network controller can send data over the network. Expected value N etworkController.N umInsertedM essages > 0 AND N etworkActuator.N umActiveLinks > 0 Safety tolerance N/A TC22 Test case type Invariant Description Network controller is active. Expected value N etworkController.Active = T rue Continued on next page 223 Table A.3 (cont’d) Safety tolerance N/A TC23 Test case type Non-invariant Description Network controller can physically measure data sent. Expected value N etworkController.N umSentM essages ≥ 0 Safety tolerance N/A TC24 Test case type Non-invariant Description Network controller can physically measure data received. 
Expected value N etworkController.N umReceivedM essages ≥ 0 Safety tolerance N/A TC25 Test case type Non-invariant Description Network controller records that the data received equals the data sent. Expected value N etworkController.N umReceivedM essages = N etworkController.N umSentM essages Safety tolerance [0.0, 10.0] TC26 Test case type Non-invariant Description Network controller can asynchronously send data. Expected value N etworkController.N umAsynchronousLinks ≥ 0.0 Safety tolerance N/A Continued on next page 224 Table A.3 (cont’d) TC27 Test case type Non-invariant Description Adaptation controller can physically measure number of active data mirrors. Expected value RDM.N umActiveServers ≥ 0 Safety tolerance N/A TC28 Test case type Non-invariant Description Adaptation controller can physically measure number of data mirrors. Expected value RDM.N umServers ≥ 0 Safety tolerance N/A TC29 Test case type Non-invariant Description Adaptation controller has ensured that all data mirrors are active. Expected value RDM.N umServers = RDM.N umActiveServers Safety tolerance [RDM.N umServers ∗ 0.4, RDM.N umServers] TC30 Test case type Invariant Description Adaptation controller is active. Expected value AdaptationController.Active = T rue Safety tolerance N/A Continued on next page 225 Table A.3 (cont’d) TC31 Test case type Non-invariant Description Adaptation controller can physically measure number of passive data mirrors. Expected value RDM.N umP assiveServers ≥ 0 Safety tolerance N/A TC32 Test case type Non-invariant Description Adaptation controller has not put a data mirror into passive state. Expected value RDM.N umP assiveServers = 0 Safety tolerance [0, RDM.N umServers ∗ 0.4] TC33 Test case type Non-invariant Description Adaptation controller can physically measure number of quiescent data mirrors Expected value RDM.N umQuiescentServers ≥ 0 Safety tolerance N/A TC34 Test case type Non-invariant Description Adaptation controller has not put a data mirror into quiescent state. Expected value RDM.N umQuiescentServers = 0 Continued on next page 226 Table A.3 (cont’d) Safety tolerance [0, RDM.N umServers ∗ 0.4] Remote Data Mirroring Goal Correlation This section provides an overview of how each test case is correlated to a particular goal, thereby providing a method for validating test results at run time. We now reproduce the RDM goal model previously presented in Figure 2.4 to simplify this section. Table A.4 next presents the traceability links for each test case. Particularly, the first column presents the test case identifier and the second column lists the goals to which the test case is correlated. Again, the traceability defined in this section enables run-time validation of test cases, as was previously described in Chapter 6.4.2. For the RDM, we correlated test cases to all goals within the goal model to examine a different approach for test case derivation and correlation. Table A.4: Traceability links for remote data mirroring application. 
Test Case ID Correlated Goal(s) TC1 (A), (C) TC2 (A), (B) TC3 (D) TC4 (E) TC5 (F) TC6 (G) TC7 (H) Continued on next page 227 Table A.4 (cont’d) Test Case ID) Correlated Goal(s) TC8 (I) TC9 (J) TC10 (J), (K), (L) TC11 (K) TC12 (L) TC13 (M) TC14 (M), (N) TC15 (N) TC16 (O), (P) TC17 (O) TC18 (O), (P) TC19 (P) TC20 (Q) TC21 (Q), (S) TC22 (Q), (R), (S), (T) TC23 (R), (T) TC24 (R), (T) TC25 (R), (T) TC26 (S) TC27 (U) TC28 (U) TC29 (U) TC30 (U), (V), (W) Continued on next page 228 Table A.4 (cont’d) Test Case ID) Correlated Goal(s) TC31 (V) TC32 (V) TC33 (W) TC34 (W) 229 (A) Left half of RDM goal model (A) (B) (D) Maintain [Operational Costs  ≤    Budget] Achieve [Measure Network Properties] (J) Achieve [Cost Measured] … (E) Achieve [Minimum Num Links Active] (M) (L) (K) Achieve [Activity Measured] Achieve [LossRate Measured] Achieve [Workload Measured] Link Sensor Maintain [DataAvailable] (N) (F) Achieve [Network Partitions == 0] (P) (O) Achieve [Capacity Measured] Achieve [Link Deactivated] Achieve [Link Activated] Network Actuator RDM Sensor (B) Right half of RDM goal model … Achieve [NumDataCopies == NumServers] (C) (G) Achieve [DataAtRisk  ≤   RiskThreshold] (Q) Achieve [Send Data Synchronously] (H) Achieve [DiffusionTime  ≤   MaxTime] (R) Achieve [Data Sent == Data Received] (S) Achieve [Send Data Asynchronously] (I) (T) Achieve [Data Received == Data Sent] Network Controller Achieve [Adaptation Costs == 0] (U) (V) (W) Achieve Achieve Achieve [Num Active [Num Passive Data [Num Quiescent Data Mirrors == Mirrors == 0] Data Mirrors == 0] Num Mirrors] Adaptation Controller Legend Goal Agent Requirement / Expectation Refinement Figure A.2: KAOS goal model of the remote data mirroring application. 230 BIBLIOGRAPHY 231 BIBLIOGRAPHY [1] Shin-Young Ahn, Sungwon Kang, Jongmoon Baik, and Ho-Jin Choi. A weighted call graph approach for finding relevant components in source code. In 10th ACIS International Conference on Software Engineering, Artificial Intelligences, Networking and Parallel/Distributed Computing, pages 539–544. IEEE, 2009. [2] Gene M Amdahl. Validity of the single processor approach to achieving large scale computing capabilities. In Proceedings of the April 18-20, 1967, spring joint computer conference, pages 483–485. ACM, 1967. [3] James H. Andrews, Tim Menzies, and Felix C.H. Li. Genetic algorithms for randomized unit testing. IEEE Trans. on Software Engineering, 37(1):80–94, January 2011. [4] L. Baresi, D. Bianculli, C. Ghezzi, S. Guinea, and P. Spoletini. Validation of web service compositions. Software, IET, 1(6):219–232, 2007. [5] L. Baresi, L. Pasquale, and P. Spoletini. Fuzzy goals for requirements-driven adaptation. In Requirements Engineering Conference (RE), 2010 18th IEEE International, pages 125 –134, 27 2010-oct. 1 2010. [6] Jonathon A Bauer and Alan B Finger. Test plan generation using formal grammars. Proceedings of the 4th International Conference on Software Engineering, pages 425– 432, 1979. [7] Nelly Bencomo and Amel Belaggoun. Supporting decision-making for self-adaptive systems: from goal models to dynamic decision networks. In Requirements Engineering: Foundation for Software Quality, pages 221–236. Springer, 2013. [8] Nelly Bencomo, Jon Whittle, Peter Sawyer, Anthony Finkelstein, and Emmanuel Letier. Requirements reflection: Requirements as runtime entities. In Proc. of the 32nd ACM/IEEE International Conference on Software Engineering, pages 199–202, Cape Town, South Africa, May 2010. ACM. [9] Antonia Bertolino. 
Software testing research: Achievements, challenges, dreams. In Future of Software Engineering, 2007. FOSE ’07, pages 85–103, 2007. [10] Paul E. Black. Dictionary of algorithms and data structures. U.S. National Institute of Standards and Technology, May 2006. [11] Renesys Blog. Pakistan hijacks youtube. pakistan-hijacks-youtube-1/, 2008. http://www.renesys.com/2008/02/ [12] N. Bredeche, E. Haasdijk, and A.E. Eiben. On-line, on-board evolution of robot controllers. In Pierre Collet, Nicolas Monmarch´e, Pierrick Legrand, Marc Schoenauer, and Evelyne Lutton, editors, Artifical Evolution, volume 5975 of Lecture Notes in Computer Science, pages 110–121. Springer Berlin Heidelberg, 2010. 232 [13] Edmund Burke, Graham Kendall, Jim Newall, Emma Hart, Peter Ross, and Sonia Schulenburg. Hyper-heuristics: An emerging direction in modern search technology. In Fred Glover and Gary A. Kochenberger, editors, Handbook of Metaheuristics, volume 57, pages 457–474. Springer US, 2003. [14] Ilene Burnstein. Practical software testing: a process-oriented approach. Springer, 2003. [15] J. Camara and R. de Lemos. Evaluation of resilience in self-adaptive systems using probabilistic model-checking. In Software Engineering for Adaptive and Self-Managing Systems., pages 53 –62, june 2012. [16] Marco Canini, Vojin Jovanovic, Daniele Venzano, Boris Spasojevic, Olivier Crameri, and Dejan Kostic. Toward online testing of federated and heterogeneous distributed systems. In USENIX Annual Technical Conference, 2011. [17] Marco Canini, Dejan Novakovi´c, Vojin Jovanovi´c, and Dejan Kosti´c. Fault prediction in distributed systems gone wild. In Proceedings of the 4th International Workshop on Large Scale Distributed Systems and Middleware, pages 7–11, 2010. [18] Mei-Hwa Chen, Michael R Lyu, and W Eric Wong. Effect of code coverage on software reliability measurement. IEEE Transactions on Reliability, 50(2):165–170, 2001. [19] Betty H. C. Cheng, Kerstin I. Eder, Martin Gogolla, Lars Grunske, Marin Litoiu, Hausi A. M¨ uller, Patrizio Pelliccione, Anna Perini, Nauman A. Qureshi, Bernhard Rumpe, Daniel Schneider, Frank Trollmann, and Norha M. Villegas. Using models at runtime to address assurance for self-adaptive systems. In Models@run.time, volume 8378 of Lecture Notes in Computer Science, pages 101–136. Springer International Publishing, 2014. [20] Betty H. C. Cheng, Rog´erio Lemos, Holger Giese, Paola Inverardi, Jeff Magee, and et al. Software engineering for self-adaptive systems: A research roadmap. In Software engineering for self-adaptive systems, chapter Software Engineering for Self-Adaptive Systems: A Research Roadmap, pages 1–26. Springer-Verlag, Berlin, Heidelberg, 2009. [21] Betty H. C. Cheng, Pete Sawyer, Nelly Bencomo, and Jon Whittle. A goal-based modeling approach to develop requirements of an adaptive system with environmental uncertainty. In Proc. of the 12th International Conference on Model Driven Engineering Languages and Systems, pages 468–483, Berlin, Heidelberg, 2009. Springer-Verlag. [22] Shang-Wen Cheng, David Garlan, and Bradley Schmerl. Architecture-based selfadaptation in the presence of multiple objectives. In Proceedings of the 2006 international workshop on Self-adaptation and self-managing systems, pages 2–8. ACM, 2006. [23] O. Chikumbo, E. Goodman, and K. Deb. Approximating a multi-dimensional pareto front for a land use management problem: A modified moea with an epigenetic silencing metaphor. In 2012 IEEE Congress on Evolutionary Computation (CEC), 2012. 
233 [24] Lawrence Chung, B Nixon, E Yu, and J Mylopoulos. Non-functional requirements. Software Engineering, 2000. [25] Anne Dardenne, Axel Van Lamsweerde, and Stephen Fickas. Goal-directed requirements acquisition. Science of computer programming, 20(1):3–50, 1993. [26] Scott Dawson, Farnam Jahanian, Todd Mitton, and Teck-Lee Tung. Testing of faulttolerant and real-time distributed systems via protocol fault injection. In Proceedings of Annual Symposium on Fault Tolerant Computing, pages 404–414. IEEE, 1996. [27] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: Nsga-ii. Evolutionary Computation, IEEE Transactions on, 6(2):182 –197, apr 2002. [28] Paul deGrandis and Giuseppe Valetto. Elicitation and utilization of application-level utility functions. In Proc. of the 6th International Conference on Autonomic Computing, ICAC ’09, pages 107–116. ACM, 2009. [29] R.F. DeMara and Kening Zhang. Autonomous fpga fault handling through competitive runtime reconfiguration. In NASA/DoD Conference on Evolvable Hardware, pages 109 – 116, 2005. [30] Richard A DeMillo, Richard J Lipton, and Frederick G Sayward. Hints on test data selection: Help for the practicing programmer. Computer, 11(4):34–41, 1978. [31] A.E. Eiben and J. K. van der Hauw. Adaptive penalties for evolutionary graph coloring. In Artifical Evolution. Springer, 1998. [32] Sebastian Elbaum and David S. Rosenblum. Known unknowns: Testing in the presence of uncertainty. In Proceedings of the 22Nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 833–836, 2014. [33] Ilenia Epifani, Carlo Ghezzi, Raffaela Mirandola, and Giordano Tamburrelli. Model evolution by run-time parameter adaptation. In Proc. of the 31st International Conference on Software Engineering, pages 111–121, Washington, DC, USA, 2009. IEEE Computer Society. [34] N. Esfahani. A framework for managing uncertainty in self-adaptive software systems. In Automated Software Engineering (ASE), 2011 26th IEEE/ACM International Conference on, pages 646–650, 2011. [35] Naeem Esfahani, Ehsan Kouroshfar, and Sam Malek. Taming uncertainty in selfadaptive software. In Proc. of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering, pages 234–244. ACM, 2011. [36] Yli`es Falcone, Mohamad Jaber, Thanh-Hung Nguyen, Marius Bozga, and Saddek Bensalem. Runtime verification of component-based systems. In Proc. of the 9th international conference on Software engineering and formal methods, pages 204–220, Berlin, Heidelberg, 2011. Springer-Verlag. 234 [37] S. Fickas and M.S. Feather. Requirements monitoring in dynamic environments. In Requirements Engineering, 1995., Proc. of the Second IEEE International Symposium on, pages 140 – 147, mar 1995. [38] Antonio Filieri, Carlo Ghezzi, and Giordano Tamburrelli. Run-time efficient probabilistic model checking. In Proc. of the 33rd International Conference on Software Engineering, pages 341–350, Waikiki, Honolulu, Hawaii, USA, 2011. ACM. [39] Antonio Filieri, Carlo Ghezzi, and Giordano Tamburrelli. A formal approach to adaptive software: continuous assurance of non-functional requirements. Formal Aspects of Computing, 24:163–186, 2012. [40] Gordon Fraser and Andrea Arcuri. Evosuite: automatic test suite generation for object-oriented software. In Proc. of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering, ESEC/FSE ’11, pages 416–419, Szeged, Hungary, 2011. ACM. [41] Erik M. 
[41] Erik M. Fredericks and Betty H. C. Cheng. Automated generation of adaptive test plans for self-adaptive systems. In Proceedings of the 10th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, SEAMS '15, 2015. Accepted to appear.
[42] Erik M. Fredericks, Byron DeVries, and Betty H. C. Cheng. AutoRELAX: Automatically relaxing a goal model to address uncertainty. Empirical Software Engineering, 19(5):1466–1501, 2014.
[43] Erik M. Fredericks, Byron DeVries, and Betty H. C. Cheng. Towards run-time adaptation of test cases for self-adaptive systems in the face of uncertainty. In Proceedings of the 9th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, SEAMS '14, 2014.
[44] Erik M. Fredericks, Andres J. Ramirez, and Betty H. C. Cheng. Towards run-time testing of dynamic adaptive systems. In Proceedings of the 8th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, SEAMS '13, pages 169–174. IEEE Press, 2013.
[45] Erik M. Fredericks, Andres J. Ramirez, and Betty H. C. Cheng. Validating code-level behavior of dynamic adaptive systems in the face of uncertainty. In Search Based Software Engineering, volume 8084 of Lecture Notes in Computer Science, pages 81–95. Springer Berlin Heidelberg, 2013.
[46] Christian Gagné and Marc Parizeau. Genericity in evolutionary computation software tools: Principles and case-study. International Journal on Artificial Intelligence Tools, 15(02):173–194, 2006.
[47] David Garlan, Shang-Wen Cheng, An-Cheng Huang, Bradley Schmerl, and Peter Steenkiste. Rainbow: Architecture-based self-adaptation with reusable infrastructure. Computer, 37(10):46–54, 2004.
[48] C. Ghezzi. Adaptive software needs continuous verification. In Software Engineering and Formal Methods (SEFM), 2010 8th IEEE International Conference on, pages 3–4, September 2010.
[49] Heather J. Goldsby and Betty H. C. Cheng. Automatically generating behavioral models of adaptive systems to address uncertainty. In Model Driven Engineering Languages and Systems, pages 568–583. Springer, 2008.
[50] Heather J. Goldsby, Betty H. C. Cheng, and Ji Zhang. AMOEBA-RT: Run-time verification of adaptive software. In Holger Giese, editor, Models in Software Engineering, pages 212–224. Springer-Verlag, Berlin, Heidelberg, 2008.
[51] Mark Harman, S. Afshin Mansouri, and Yuanyuan Zhang. Search based software engineering: A comprehensive analysis and review of trends, techniques and applications. Technical Report TR-09-03, Department of Computer Science, King's College London, 2009.
[52] Mark Harman, Phil McMinn, Jerffeson Teixeira de Souza, and Shin Yoo. Search based software engineering: Techniques, taxonomy, tutorial. In Empirical Software Engineering and Verification, volume 7007 of Lecture Notes in Computer Science, pages 1–59. Springer Berlin Heidelberg, 2012.
[53] John H. Holland. Adaptation in Natural and Artificial Systems. MIT Press, Cambridge, MA, USA, 1992.
[54] IEEE. Systems and software engineering – Vocabulary. ISO/IEC/IEEE 24765:2010(E), pages 1–418, December 2010.
[55] Michael Jackson and Pamela Zave. Deriving specifications from requirements: An example. In Proceedings of the 17th International Conference on Software Engineering, pages 15–24. ACM, 1995.
[56] Minwen Ji, Alistair Veitch, and John Wilkes. Seneca: Remote mirroring done write. In USENIX 2003 Annual Technical Conference, pages 253–268, Berkeley, CA, USA, June 2003. USENIX Association.
[57] Kenneth A. De Jong. Evolutionary Computation: A Unified Approach. The MIT Press, March 2002.
[58] René Just, Darioush Jalali, Laura Inozemtseva, Michael D. Ernst, Reid Holmes, and Gordon Fraser. Are mutants a valid substitute for real faults in software testing? Preprint, 12 pages, available online at http://homes.cs.washington.edu/~rjust/publ/mutants_real_faults_fse_2014.pdf, 2014.
[59] Gail Kaiser, Phil Gross, Gaurav Kc, Janak Parekh, and Giuseppe Valetto. An approach to autonomizing legacy systems. Technical report, DTIC Document, 2005.
[60] Kimberly Keeton, Cipriano Santos, Dirk Beyer, Jeffrey Chase, and John Wilkes. Designing for disasters. In Proc. of the 3rd USENIX Conference on File and Storage Technologies, pages 59–62, Berkeley, CA, USA, 2004. USENIX Association.
[61] J. O. Kephart and D. M. Chess. The vision of autonomic computing. Computer, 36(1):41–50, January 2003.
[62] John R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection (Complex Adaptive Systems). The MIT Press, December 1992.
[63] Jeff Kramer and Jeff Magee. The evolving philosophers problem: Dynamic change management. IEEE Transactions on Software Engineering, 16(11):1293–1306, 1990.
[64] Joel Lehman and Kenneth O. Stanley. Exploiting open-endedness to solve problems through the search for novelty. In Proceedings of the Eleventh International Conference on Artificial Life (ALIFE XI). MIT Press, 2004.
[65] Emmanuel Letier, David Stefan, and Earl T. Barr. Uncertainty, risk, and information value in software requirements and architecture. In Proceedings of the 36th International Conference on Software Engineering, pages 883–894, 2014.
[66] Emmanuel Letier and Axel van Lamsweerde. Reasoning about partial goal satisfaction for requirements and design engineering. In Proceedings of the 12th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 53–62, 2004.
[67] Khaled Mahbub and George Spanoudakis. A framework for requirements monitoring of service based systems. In Proc. of the 2nd International Conference on Service Oriented Computing, pages 84–93. ACM, 2004.
[68] P. K. McKinley, S. M. Sadjadi, E. P. Kasten, and B. H. C. Cheng. Composing adaptive software. Computer, 37(7):56–64, July 2004.
[69] P. McMinn. Search-based software testing: Past, present and future. In Software Testing, Verification and Validation Workshops (ICSTW), 2011 IEEE Fourth International Conference on, pages 153–163, 2011.
[70] Atif M. Memon, Martha E. Pollack, and Mary Lou Soffa. Hierarchical GUI test case generation using automated planning. IEEE Transactions on Software Engineering, 27(2):144–155, 2001.
[71] Edison Mera, Pedro Lopez-García, and Manuel Hermenegildo. Integrating software testing and run-time checking in an assertion verification framework. In Logic Programming, pages 281–295. Springer, 2009.
[72] Bertrand Meyer. Seven principles of software testing. Computer, 41(8):99–101, 2008.
[73] Harlan D. Mills, Michael Dyer, and Richard C. Linger. Cleanroom software engineering. IEEE Software, 1987.
[74] S. Mitra, T. Wongpiromsarn, and R. M. Murray. Verifying cyber-physical interactions in safety-critical systems. IEEE Security & Privacy, 11(4):28–37, 2013.
[75] Subhasish Mitra, W.-J. Huang, N. R. Saxena, S.-Y. Yu, and E. J. McCluskey. Reconfigurable architecture for autonomous self-repair. IEEE Design & Test of Computers, 21(3):228–240, May–June 2004.
[76] B. Morin, O. Barais, J.-M. Jezequel, F. Fleurey, and A. Solberg. Models@run.time to support dynamic adaptation. Computer, 42(10):44–51, October 2009.
[77] Sandeep Neema, Ted Bapty, and Jason Scott. Development environment for dynamically reconfigurable embedded systems. In Proc. of the International Conference on Signal Processing Applications and Technology, Orlando, FL, 1999.
[78] Cu D. Nguyen, Anna Perini, Paolo Tonella, and Fondazione Bruno Kessler. Automated continuous testing of multiagent systems. In The Fifth European Workshop on Multi-Agent Systems (EUMAS), 2007.
[79] Duy Cu Nguyen, Anna Perini, and Paolo Tonella. A goal-oriented software testing methodology. In Proc. of the 8th International Conference on Agent-Oriented Software Engineering, pages 58–72, Berlin, Heidelberg, 2008. Springer-Verlag.
[80] Charles Ofria and Claus O. Wilke. Avida: A software platform for research in computational evolutionary biology. Artificial Life, 10(2):191–229, 2004.
[81] P. Oreizy, M. M. Gorlick, R. N. Taylor, D. Heimbigner, G. Johnson, N. Medvidovic, A. Quilici, D. S. Rosenblum, and A. L. Wolf. An architecture-based approach to self-adaptive software. IEEE Intelligent Systems and their Applications, 14(3):54–62, 1999.
[82] Thomas Ostrand. Encyclopedia of Software Engineering. Wiley, 2002.
[83] Roy P. Pargas, Mary Jean Harrold, and Robert R. Peck. Test-data generation using genetic algorithms. Software Testing, Verification and Reliability, 9(4):263–282, 1999.
[84] Esteban Pavese, Víctor Braberman, and Sebastian Uchitel. Automated reliability estimation over partial systematic explorations. In Software Engineering (ICSE), 2013 35th International Conference on, 2013.
[85] N. A. Qureshi, S. Liaskos, and A. Perini. Reasoning about adaptive requirements for self-adaptive systems at runtime. In Proc. of the 2011 International Workshop on Requirements at Run Time, pages 16–22, August 2011.
[86] A. J. Ramirez, A. C. Jensen, B. H. C. Cheng, and D. B. Knoester. Automatically exploring how uncertainty impacts behavior of dynamically adaptive systems. In Automated Software Engineering (ASE), 2011 26th IEEE/ACM International Conference on, pages 568–571, November 2011.
[87] Andres J. Ramirez and Betty H. C. Cheng. Automatically deriving utility functions for monitoring software requirements. In Proc. of the 2011 International Conference on Model Driven Engineering Languages and Systems, pages 501–516, Wellington, New Zealand, 2011.
[88] Andres J. Ramirez, Erik M. Fredericks, Adam C. Jensen, and Betty H. C. Cheng. Automatically relaxing a goal model to cope with uncertainty. In Gordon Fraser and Jerffeson Teixeira de Souza, editors, Search Based Software Engineering, volume 7515, pages 198–212. Springer Berlin Heidelberg, 2012.
[89] Andres J. Ramirez, David B. Knoester, Betty H. C. Cheng, and Philip K. McKinley. Applying genetic algorithms to decision making in autonomic computing systems. In Proceedings of the 6th International Conference on Autonomic Computing, pages 97–106, 2009.
[90] Barbara G. Ryder. Constructing the call graph of a program. IEEE Transactions on Software Engineering, pages 216–226, 1979.
[91] David Saff and Michael D. Ernst. Reducing wasted development time via continuous testing. In Proc. of the 14th International Symposium on Software Reliability Engineering, pages 281–292. IEEE Computer Society, 2003.
[92] David Saff and Michael D. Ernst. An experimental evaluation of continuous testing during development. In Proc. of the 2004 ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 76–85, Boston, Massachusetts, USA, 2004. ACM.
[93] Mazeiar Salehie and Ladan Tahvildari. Self-adaptive software: Landscape and research challenges. ACM Transactions on Autonomous and Adaptive Systems (TAAS), 4(2):14:1–14:42, May 2009.
[94] P. Sawyer, N. Bencomo, J. Whittle, E. Letier, and A. Finkelstein. Requirements-aware systems: A research agenda for RE for self-adaptive systems. In Requirements Engineering Conference (RE), 2010 18th IEEE International, pages 95–103, 2010.
[95] Nirmal R. Saxena, Santiago Fernandez-Gomez, Wei-Je Huang, Subhasish Mitra, Shu-Yi Yu, and Edward J. McCluskey. Dependable computing and online testing in adaptive and configurable systems. IEEE Design & Test of Computers, 17(1):29–41, January 2000.
[96] Hans-Paul Schwefel. Numerical Optimization of Computer Models. John Wiley & Sons, Inc., 1981.
[97] Z. Segall, D. Vrsalovic, D. Siewiorek, D. Yaskin, J. Kownacki, J. Barton, R. Dancey, A. Robinson, and T. Lin. FIAT – Fault injection based automated testing environment. In Eighteenth International Symposium on Fault-Tolerant Computing (FTCS-18), Digest of Papers. IEEE, 1988.
[98] Koushik Sen and Gul Agha. Automated systematic testing of open distributed programs. In Fundamental Approaches to Software Engineering, pages 339–356. Springer, 2006.
[99] V. E. S. Souza, A. Lapouchnian, and J. Mylopoulos. (Requirement) evolution requirements for adaptive systems. In Software Engineering for Adaptive and Self-Managing Systems, 2012 ICSE Workshop on, pages 155–164, 2012.
[100] V. E. S. Souza and J. Mylopoulos. From awareness requirements to adaptive systems: A control-theoretic approach. In Requirements@Run.Time (RE@RunTime), 2011 2nd International Workshop on, pages 9–15. IEEE Computer Society, 2011.
[101] Pascale Thevenod-Fosse and Helene Waeselynck. STATEMATE applied to statistical software testing. ACM SIGSOFT Software Engineering Notes, 18(3):99–109, 1993.
[102] Mustafa M. Tikir and Jeffrey K. Hollingsworth. Efficient instrumentation for code coverage testing. In ACM SIGSOFT Software Engineering Notes, volume 27, pages 86–96. ACM, 2002.
[103] J. J.-P. Tsai, K.-Y. Fang, Horng-Yuan Chen, and Yao-Dong Bi. A noninterference monitoring and replay mechanism for real-time software testing and debugging. IEEE Transactions on Software Engineering, 16(8):897–916, 1990.
[104] K. van der Hauw. Evaluating and improving steady state evolutionary algorithms on constraint satisfaction problems. Master's thesis, Leiden University, 1996.
[105] Axel van Lamsweerde. Requirements Engineering: From System Goals to UML Models to Software Specifications. Wiley, 2009.
[106] Axel van Lamsweerde and Emmanuel Letier. Handling obstacles in goal-oriented requirements engineering. IEEE Transactions on Software Engineering, 26(10):978–1005, 2000.
[107] Margus Veanes, Pritam Roy, and Colin Campbell. Online testing with reinforcement learning. In Formal Approaches to Software Testing and Runtime Verification, pages 240–253. Springer, 2006.
[108] Norha M. Villegas, Hausi A. Müller, Gabriel Tamura, Laurence Duchien, and Rubby Casallas. A framework for evaluating quality-driven self-adaptive software systems. In Proc. of the 6th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, pages 80–89, Waikiki, Honolulu, Hawaii, USA, 2011. ACM.
[109] William E. Walsh, Gerald Tesauro, Jeffrey O. Kephart, and Rajarshi Das. Utility functions in autonomic systems. In Proc. of the First IEEE International Conference on Autonomic Computing, pages 70–77. IEEE Computer Society, 2004.
[110] O. Wei, A. Gurfinkel, and M. Chechik. On the consistency, expressiveness, and precision of partial modeling formalisms. Information and Computation, 209(1):20–47, 2011.
[111] K. Welsh, P. Sawyer, and N. Bencomo. Towards requirements aware systems: Run-time resolution of design-time assumptions. In Automated Software Engineering (ASE), 2011 26th IEEE/ACM International Conference on, pages 560–563, 2011.
[112] Kristopher Welsh and Peter Sawyer. Understanding the scope of uncertainty in dynamically adaptive systems. In Proc. of the Sixteenth International Working Conference on Requirements Engineering: Foundation for Software Quality, volume 6182, pages 2–16. Springer, 2010.
[113] J. Whittle, P. Sawyer, N. Bencomo, B. H. C. Cheng, and J.-M. Bruel. RELAX: Incorporating uncertainty into the specification of self-adaptive systems. In Requirements Engineering Conference (RE '09), 17th IEEE International, pages 79–88, August–September 2009.
[114] Ji Zhang and Betty H. C. Cheng. Model-based development of dynamically adaptive software. In Proceedings of the 28th International Conference on Software Engineering, pages 371–380. ACM, 2006.