POMDP-BASED BEHAVIOR PLANNING FOR ANTICIPATING AND RESPONDING TO
LANE EXCURSIONS IN LARGE AUTONOMOUS VEHICLES

By

Batuhan Yücer

A THESIS

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

Computer Science—Master of Science

2024

ABSTRACT

Large autonomous vehicles are rising in prevalence, and the lane excursion problem is a common

occurrence in the navigation of large vehicles through urban roads. To the best of my ability,

background research has not revealed any studies on solving this problem under the assumption

that the excursion is unavoidable with a large vehicle body. This thesis introduces the lane

excursion problem to the research space with a primary focus on dynamic vehicle interaction, and

possible solutions are considered. The main solution provided is a POMDP (Partially Observable

Markov Decision Process) method utilizing predictive planning given another vehicle with unknown

intentions. A simulation environment is created for evaluation, and the POMDP method is evaluated

using two measures introduced along with the problem. The POMDP method demonstrates good

approximation ability of the hidden system variables and performs better than an uninformed non-

predictive approach. With an additional section on the real-life implications of the problem, this

study is expected to bring to light the importance of the lane excursion problem and prompt deeper

discussion on the domain of large autonomous vehicles as a whole.

Copyright by
BATUHAN YÜCER
2024

ACKNOWLEDGEMENTS

I would like to express my deepest thanks to my supervisor, Prof. Joshua Siegel, for being constantly

supportive and constructive towards my work on this thesis. He has been a true inspiration and a

great teacher to me during my time in the graduate program.

Special thanks to my committee members, Prof. Philip McKinley and Prof. Daniel Morris, for

their enthusiasm towards my project and their invaluable expertise and time in evaluating this work.

In addition, I would like to acknowledge Dr. Ali Ufuk Peker and ADASTEC Corp.

for

encouraging me to pursue graduate studies and for their financial support throughout this endeavor.

I sincerely thank my friends and colleagues, who have been by my side whenever I needed them

and helped me through times of stress.

Finally, I am incredibly grateful for my loving family. My parents are why I am where I am

today, and I feel the deepest appreciation for them as I close another chapter of my life.

iv

TABLE OF CONTENTS

LIST OF ABBREVIATIONS .

.

.

.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

vi

CHAPTER 1

INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . .

.

1

CHAPTER 2

BACKGROUND . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

.
2.1 Formal Definition of the Lane Excursion Problem . . . . . . . . . . . . . . . .
2.2 Previous Research .

5
5
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

.

CHAPTER 3

METHODOLOGY . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. 16
3.1 Modeling the Scenario . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.2 ABT Solver
. 24
3.3 TAPIR and ROS Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

. . . . . . . . . . . . . . . . . . . . . . . . . . . . .

.

.

.

.

.

.

CHAPTER 4

EVALUATION AND RESULTS . . . . . . . . . . . . . . . . . . . . . . 28
4.1 Simulation of the Lane Excursion Scenario . . . . . . . . . . . . . . . . . . . . 28
4.2 Rule-Based Baseline Method . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.3 Performance Measures
4.4 Results and Analysis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.5 Effects of Observation Uncertainty . . . . . . . . . . . . . . . . . . . . . . . . 36
4.6 Discussion on Real-life Application . . . . . . . . . . . . . . . . . . . . . . . 38

.

CHAPTER 5

CONCLUSION .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. 41

BIBLIOGRAPHY . .

. .

.

.

.

.

.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

v

LIST OF ABBREVIATIONS

POMDP Partially Observable Markov Decision Process

NACTO National Association of City Transportation Officials

MDP

Markov Decision Process

PBVI

Point-Based Value Iteration

POMCP Partially Observable Monte-Carlo Planning

ABT

Adaptive Belief Tree

TAPIR

Toolkit for Approximating and Adapting POMDP solutions In Real-time

ROS

Robot Operating System

vi

CHAPTER 1

INTRODUCTION

As autonomous driving technology continues to rise in prevalence, the idea of the autonomous

vehicle also evolves alongside it. As with most emerging technologies, entry to earlier research

on autonomous transportation was a higher hurdle to clear than it is today and a risky endeavor

for the automotive industry at the time. Thus, the profit-driven, privately owned passenger vehicle

has been the main product of the autonomous transportation industry for the last several decades.

However, as the industry matures, a wider array of profitable business models emerge along with the

decrease in upfront R&D costs. This is the reason that, recently, there has been a rise in the number

of companies attempting to tackle newer challenges, such as autonomous public transportation.

Viewing the autonomous driving challenge through a public transportation perspective naturally

expands the design of an autonomous vehicle to cover many more use cases than before, each with its

own challenges. One such case is an autonomous bus transporting passengers throughout an urban

route multiple times daily. Navigating urban areas with a large bus is, of course, a much different

and arguably more difficult exercise in practice than it is with a standard passenger vehicle. This

type of problem is also not explored in literature as extensively as it is for passenger cars because

of the sparsity of large autonomous vehicles in the industry.

As a software engineer in the industry, my background includes several years of employment

at ADASTEC Corp., a company developing a Level 4 autonomous driving software platform for

public transportation vehicles. At ADASTEC, most of our research and development has been

performed on typical public transit buses. A major part of the product is a full-stack autonomous

driving software and sensor kit, which includes all the standard Level 4 automation capabilities

from perception to localization and control. I personally work on the path planning and control of

the vehicle, which involves route navigation, short and long-term decision-making, lane following,

and comfortable driving. The outputs of these tasks depend heavily upon the kinematics and the

geometry of the driven vehicle, which means that we regularly face challenges that arise due to

the vehicle specifically being a bus. Previous research often makes the understandable assumption

1

Figure 1.1 A part of the daily route taken by the bus running ADASTEC’s software at the city of
Stavanger, Norway. On the left is a camera snapshot of the general area, and on the right is the
lane excursion scenario replicated in a simulation environment using collected sensor data.

that the vehicle in question is a passenger car that has standard dimensions and uses a well-known

driving model, making it difficult to adapt to public transport where large vehicle dimensions and

unusual vehicle geometries are commonly encountered, such as trucks and articulated buses. In

this thesis, I address one such challenge where the long wheelbase and the wide frame of a bus

cause lane excursions during sharp turns, a regular occurrence on urban roads.

Today, many other autonomous driving companies are also focusing on larger types of au-

tonomous vehicles. Aurora’s [1] hardware system product is designed to be fitted to primarily

semi-trucks moving long distances. EasyMile [2] has a fleet of hundreds of autonomous passenger

shuttles currently in operation, as well as truck and tow tractor projects. With the rising devel-

opment efforts towards large autonomous vehicles, it becomes increasingly important to gain the

unique perspectives required to address domain-specific challenges associated with these vehicle

types. The lane excursion scenario to be defined is one such challenge.

Figure 1.1 depicts an example of a lane excursion from data collected by ADASTEC. When the

autonomous bus attempts to clear the sharp turn, the front of the bus crosses the lane separation line

and into the opposite lane. Even though this maneuver is unavoidable due to the shape of the road

and the long wheelbase, it presents several possible risks: a collision with an oncoming vehicle

in the opposite lane, a standoff where both the bus and the opposing vehicle are unable to move

2

forward and one must reverse to clear the way, or at the very least, discomfort to the inhabitants of

one (or both) of the vehicles involved due to emergency braking.

To prevent trajectories from conflicting with other vehicles in other scenarios, such as navigating

intersections, many autonomous driving pipelines employ predictive methods so that said situations

can be foreseen and appropriate actions can be performed by the autonomous vehicle before the

system reaches that state. However, to the best of my knowledge, these methods have not yet been

implemented in the context of the category of lane excursions where an area of collision risk exists

directly as a consequence of the large size of the ego vehicle.

In this thesis, I introduce a formal definition of "the lane excursion problem" to the literature.

Although this concept of lane excursion is described using physically large autonomous vehicle

types as references, such as trucks and buses, the problem is generalizable to any vehicle. Lane

excursions happen and are unavoidable whenever the curvature of the road is too high for the

whole body of the vehicle to stay in lane boundaries. With this perspective, the "largeness" of

the navigating vehicle is directly dependent on road shapes and sizes. I formally define the term

"largeness" in the context of the lane excursion problem as well, for future reference in literature.

I propose two types of longitudinal controllers to navigate through the scenario under defined

problem constraints. For the best solution, I model the lane excursion scenario as a POMDP,

an optimization-based system model that enables accurately approximating the system state in

future time steps and selecting the best action to take in an online fashion. The other solution

is a configurable, rule-based controller which acts as a baseline. For evaluation, I introduce two

performance measures and create a simulation environment. The POMDP method demonstrates

high performance through quantitative and qualitative analysis.

The lane excursion problem is significant due to its high probability of occurrence during urban

navigation, especially with commercial buses, as we at ADASTEC often experience. A meaningful

example is the typical four-way intersection structure in the United States, where city buses need to

encroach into the opposite lane for some time while turning right because it is impossible to clear

the 90-degree turn without this encroachment. Figure 1.2 shows an example of this.

3

Figure 1.2 An example from a design guide by NACTO (National Association of City
Transportation Officials) [3], displaying the extra lane space an articulated bus requires while
making a sharp right turn in an urban four-way intersection. The front end of the bus draws a
larger arc than the kinematic base of the leading bus segment.

A past article from the Transportation Research Record journal includes statistics from com-

mercial bus accidents [4]. They report that rear-end crashes where one of the vehicles is stopped

are among the most common types of accidents (24% of all recorded crashes), especially at inter-

sections. They add that the bus is often the stopped vehicle in these incidents. They argue that

the reason for this is the bus unexpectedly coming to a stop during right turns because of vehicles

in the opposite lane. Their primary proposal to address accidents is to widen the turn radius of

intersections so that buses can clear the turn without encroaching upon the opposite lane. Another

study shows that narrower lane widths directly correlate to higher bus accident rates [5].

With the significance of the problem in mind, my goals in this thesis are twofold. First, I aim

to present a successful real-time algorithm to address one category of lane excursions. Second,

I aim to provide perspective on the real-life implications of the defined lane excursion problem

and to prompt future research toward solving the additional challenges brought on by the much

more complex real-life environments. To this end, I also include a discussion section detailing the

considerations one must make when implementing the lane excursion planner in a real autonomous

system. I expect that the proposed lane excursion planner will play a pivotal role in path planning

pipelines in the future, especially for projects focused on large autonomous vehicle types in the

growing industry.

4

CHAPTER 2

BACKGROUND

2.1 Formal Definition of the Lane Excursion Problem

2.1.1 Lane and Vehicle Geometry

We start by considering a two-lane road (right-hand traffic), with the lanes L = {L𝑒, L𝑜} in

opposite directions. The ego vehicle moves along L𝑒, the ego lane, and an independent actor

vehicle moves along L𝑜, the opposite lane. For this problem, both vehicles are assumed to follow

their lane centers perfectly, and lateral motion will be ignored. The reason for this is explained

more in detail in the section about problem boundaries.

In our scenario, the ego lane curves to the right, and the opposite lane naturally curves left

along with it in a consistent circular motion until both lanes have turned 90 degrees, resulting in a

quarter of a circle with radii R𝑒 for L𝑒, and R𝑜 > R𝑒 for L𝑜. As traffic rules dictate, both lanes

have associated speed limits; for this study, the limit is set to 5 m/s everywhere. The speed limits

also decrease on the curved parts of the lanes to limit the lateral acceleration experienced while

taking the turn.

We define an ego vehicle navigating the turn on L𝑒 with dimensions D𝑒 = { 𝑓𝑒, 𝑏𝑒, 𝑠𝑒}, where 𝑓𝑒

is the distance between the rear axle and the front end of the vehicle, 𝑏𝑒 is the distance between the

rear axle and the rear end of the vehicle, and 𝑠𝑒 is the distance from the driveshaft to the right or the

left side of the vehicle. It is clear then that the vehicle’s total length is equal to 𝑓𝑒 + 𝑏𝑒, and its total

width is equal to 𝑠𝑒 + 𝑠𝑒. We similarly define an actor vehicle with dimensions D𝑎 = { 𝑓𝑎, 𝑏𝑎, 𝑠𝑎},

navigating through L𝑜 in the opposite direction at the same time.

To represent a real-life situation, we refer to values from ADASTEC’s collected data as well as

the actual bus dimensions used on-site. Therefore, for the rest of this thesis, the reference vehicle

dimensions are 𝑓𝑒 = 6.62 , 𝑏𝑒 = 1.70 , 𝑠𝑒 = 1.22 , 𝑓𝑎 = 3.72 , 𝑏𝑎 = 1.11 , 𝑠𝑎 = 1.05 (all in meters).

For the solutions proposed in this study, these values are arbitrary as long as they are geometrically

feasible and the vehicle dimensions are large enough to cause the lane excursion problem, according

to the equations provided throughout the rest of this definition. This maintains the transferability

5

Figure 2.1 An instance of lane excursion due to the curvature of the road. All relevant vehicle and
road dimensions are labeled.

of the problem when the vehicle sizes vary. However, it is assumed that the vehicle has a single

rigid body, so vehicles with multiple segments, such as articulated buses and trucks with trailers,

would need to be modeled differently, and they do not apply to this problem definition.

The standard bicycle model commonly used for most autonomous control frameworks [6]

dictates that the center of the rear axle of the vehicle is the reference point that traces the lane to be

followed, while the front axle is steered to keep the reference point on the lane center at all times.

Given the circular shape of the curve in the scenario, at any point inside the curve, the ego vehicle’s

driveshaft will be tangential to the lane, which leads to the front of the vehicle being farther to

the lane center compared to its rear. Figure 2.1 visualizes this scenario along with the relevant

variables.

6

Figure 2.2 The vehicle in the scenario moved to the far right of its lane, to calculate if it is "large"
in the context of a lane excursion. The image shows that it still slightly overextends at the front left
point, which means that it is indeed a large vehicle.

While the ego vehicle is in this state, the center of the rear axle is exactly on the lane center,

and the furthermost point to the lane center is the front-left corner, with a lateral distance of
𝑑𝑙𝑎𝑡𝑒𝑟𝑎𝑙 = √︁(R𝑒 + 𝑠𝑒)2 + 𝑓 2

𝑒 − R𝑒. Given a turn radius of R𝑒 = 15 meters, which is far from an

extreme case, this distance calculates to 𝑑𝑙𝑎𝑡𝑒𝑟𝑎𝑙 ≈ 2.52 meters.

Many standards regarding urban road design recommend lane widths of around 3.5 meters when

accommodating large vehicles such as city buses [7, 8], meaning a half-width of 1.75 meters. The

previously calculated 𝑑𝑙𝑎𝑡𝑒𝑟𝑎𝑙 value exceeds this half-width and invades the opposite lane by 0.77

meters, comfortably blocking the path of an opposing vehicle and causing risks. This becomes

even more severe when the curve of the road is tighter or a lane width as high as 3.5 meters is not

available for the lane in question, situations not at all uncommon on a highly variable urban route.

We take this opportunity to coin a new term, "largeness," to analyze lane excursions. We say

that a vehicle "has largeness" or "is large" in the context of a possible lane excursion scenario if

the excursion is impossible to avoid through trajectory shape optimization. This condition can be

checked by assuming the vehicle is as far to the right of its lane as possible. This ensures that the

vehicle can neither be translated further to the right to avoid the excursion nor steer to the right,

7

because the right side of the vehicle body is currently at a tangent to the right lane boundary, and

any steering would cross the lane on the right side. This can be seen in Figure 2.2. If, in this state,

the vehicle still overextends to the opposite lane, then it is, by definition, a large vehicle for the

purposes of the lane excursion problem. This condition is formalized as an inequality check below:

Vehicle is large if 𝑑𝑙𝑎𝑡𝑒𝑟𝑎𝑙 =

√︃

(R𝑒 + 2𝑠𝑒 − 𝑊)2 + 𝑓 2

𝑒 − R𝑒 > 𝑊

(2.1)

where 𝑊 is equal to the half-width of the ego lane.

It is useful to emphasize that the defined "largeness" term for a vehicle is relative to the lane

it navigates. With this definition, even a regular passenger vehicle can be large in the context of a

lane excursion scenario if the lane width is too small or the curvature is too sharp. Given a half lane

width of 𝑊 = 1.5 meters and the vehicle dimensions previously provided, we calculate that the ego

vehicle is large if the lane curve radius is R < 37.9 meters, In contrast, if the actor vehicle were to

navigate the same lane, it would be defined as large if the lane curve radius was R < 6.6 meters.

2.1.2 Vehicle Interaction

To reiterate, the main risk while navigating this curve stems from a possible collision with the

incoming actor vehicle, and the main challenge is that the intent of the other driver at any point is

not known to the ego vehicle. For this scenario, the actor vehicle is assigned one of three driving

styles: "standard," "aggressive," or "cautious." In simulating the scenario, their driving style affects

their decision-making towards passing or yielding. By default, all drivers attempt to follow their

reference lane velocity as closely as possible. However, when both the ego vehicle and the actor

vehicle approach the conflict area at the same time, their interaction in a short span of time decides

which driver takes priority in passing the conflict area and which driver waits to yield the way.

An aggressive driver, as defined here, speeds up at this moment to attempt to pass before the ego

vehicle. A cautious vehicle slows down and gives an opportunity to the ego vehicle to pass. A

standard vehicle maintains its speed and reacts to the actions of the ego vehicle instead.

The existence of a set of different driving styles is a cornerstone of the definition of a lane

excursion problem. Without varying styles, the actor vehicle becomes completely reactionary, at

which point the ego vehicle assumes complete control of the whole interaction, and a major source

8

of uncertainty is cleared. However, with varying driving styles, the ego vehicle needs to estimate a

course of action by considering significantly different future states while the style is hidden from the

autonomous system. The set of styles can be more diverse and complex depending on how much

accuracy is needed in representing a real-life scenario. For this study, with random acceleration

noise added in simulation, the given set of three driving styles is able to represent a sufficiently wide

variety of driving behaviors such that a controller with an incorrect assumption fails to navigate

through the scenario without conflicts.

An autonomous system may employ different types of automation frameworks to navigate the

scenario. One such framework involves a prediction module to approximate the state of the actor

vehicle on a finite time horizon. This allows the planning module to react accordingly and avoid

maneuvering itself into a state that conflicts with the actor state in the future. A problem with

this approach is that it ignores that the actor’s actions heavily depend on the ego vehicle’s actions

because the human driver on the other side will make their own assumptions and predictions on

how the ego vehicle will move in the foreseeable future. If all possible actions of the actor are

considered with the same weight and many predictions are made at the same time, this leads instead

to a very conservative speed planning by the autonomous system and possibly even to the "freezing

robot" problem [9]. This is because whatever action is taken by the ego vehicle, there will always

be a non-negligible chance that the actor vehicle will act in such an undesirable manner that the

system reaches a conflicting state. Therefore, the best action to take becomes not to move at all.

Another possible framework involves performing prediction and planning in parallel. Here, a

search space is established for the possible future states, with well-defined state transitions. This

space can be searched up to a finite horizon, and optimal actions can be chosen by assigning values

to states similar to Q-learning. This approach enables the system to discard state sequences with

negligible probabilities, leading to a better approximation of what will be encountered in the future.

A POMDP, as will be detailed in subsequent chapters, can be used to model the scenario in a way

that is solvable by this second framework.

9

2.1.3 Problem Boundaries and Assumptions

Here, we list the assumptions made in formally defining the lane excursion problem:

• The road geometry is constrained to a two-lane road with lanes in opposite directions. There

may be other lanes to the right of each lane, but the model would not include them, and the

proposed solvers in this study would not handle lane changes. Different curvatures and lane

widths are acceptable, and the lanes can also be curving in the other direction.

• There is only one actor vehicle, and it is visible to the autonomous system sensors. It is hy-

pothetically possible to handle multiple actor vehicles using the solvers provided in this work

by running the optimization multiple times. However, doing this instead of implementing

both vehicles in a single optimization process is computationally infeasible.

• Vehicle shapes can be arbitrarily chosen as long as they form a geometrically plausible,

single rigid body. The actor vehicle can also be large, which the solvers are expected to

handle. Vehicles with segmented bodies (e.g., articulated buses) are excluded while defining

the problem.

• For lane excursions to occur, there should be a notable curvature on the road. So, low urban

speeds are assumed for both vehicles instead of highway speeds. The vehicles are assumed

to be reasonable in staying under the speed limits provided by the traffic rules on the map.

• As an extension to the low-speed assumption, vehicle dynamics characteristics of the bus that

affect high-speed driving, such as tire slip, are ignored for movement calculations. Instead,

the bicycle kinematic model is assumed for both vehicles moving at low speeds.

• For the purposes of this study, lateral lane motion will be ignored since the problem assumes

in the first place that a lane excursion is unavoidable due to the shape of the road. As some

examples show in the introduction section, there are many cases in urban road navigation

where the large vehicle is unable to progress without encroaching into adjacent lanes. Here,

we assume that only a trajectory with such a shape is available, and the task becomes a

10

type of velocity optimization over the provided trajectory. This is a common path planning

framework known as the "path-velocity decomposition," where the ego path’s shape and

velocity profile are calculated consecutively instead of being optimized together [10].

In this work, the shape of the trajectory is provided as an input to the POMDP solver, which

then acts as the velocity optimizer. The POMDP, as will be detailed further, models the

system state using only the longitudinal distance traveled on the reference lane and not the

lateral distance to the lane. The solver is capable of handling noise over the observed system.

Since any lateral motion will only be observed by the solver as some noise on the longitudinal

motion, the resulting change in behavior is expected to be insignificant. Thus, ignoring the

lateral lane motion is, in fact, not a major issue in this study.

2.2 Previous Research

2.2.1 POMDP

The conceptualization of the Partially Observable Markov Decision Process starts with the

assumption of incomplete state information being incorporated into the MDP [11] (Markov Decision

Process). Later works reveal that solving for an optimal policy given a non-trivial POMDP model

is a computationally intractable task [12, 13], due to what is now commonly referred to as "curses

of dimensionality and history." The severity of these computational drawbacks is aggravated by

modeling the POMDP with continuous state, observation, or action spaces, often the best options

for representing real-life problems. Therefore, while the earlier POMDP solvers focus on the

discretization of these spaces as well as the implicit belief state, the preferred approach later

becomes approximation by belief state sampling, employed in popular methods such as PBVI

(Point-Based Value Iteration) [14], POMCP (Partially Observable Monte-Carlo Planning) [15] and

their many variants. Specifically in online policy iteration, the ABT (Adaptive Belief Tree) speeds

up computation drastically by utilizing sampled belief states from the previous time step instead

of resampling the entirety of the state pool each time. In addition, it is proven that an ABT will

converge to an optimal solution with sufficient optimization time, meaning that the action sequence

11

with the highest total expected reward will be approximated correctly [16].

With these latest developments in solving POMDPs, and the ability of the POMDP to represent

the data gathering process under constraints of noise and incompleteness, they have become useful

tools to address real-life problems in many different fields. Recent works utilize POMDPs to tackle

problems in ecology [17, 18], networks [19, 20], infrastructural safety [21] and many aspects of

robotics [22, 23, 24].

Due to the difficulty of coding complex POMDP problems in a reliable and computationally

efficient manner, open-source frameworks for solving various problems using POMDP models have

also been developed in software languages such as Python [25], Julia [26], and C++ [27].

2.2.2 Path Planning for Large Autonomous Vehicles

Some existing work on the consequences of large vehicle dimensions focuses on trajectory

optimization and vehicle control. Yu et al.

[28] design a comprehensive planning and control

framework for autonomous buses of 12 meters in length. They focus on the uncommon kinematic

properties of the autonomous vehicle in question as the primary constraint on trajectory generation

and rule-based decision-making algorithms for various scenarios such as lane changing and over-

taking. More recently, Oliviera et al. [29] address the navigation of narrow and constrained roads

via autonomous buses. They model the lane following scenario as an optimization problem where

the kinematics and dynamics of the bus are used as constraints, as well as the existence of the front

and rear overhangs on a bus, which are free to move outside the drivable area during lane following

as long as the wheels stay inside. The long wheelbase is also considered as one of the issues since

the front of the bus creates a larger arc than its rear while turning, a crucial characteristic in the

scenario this thesis work is built upon.

2.2.3 Path Planning as a POMDP Problem

In the context of autonomous driving, some major sources of uncertainty, such as sensor noise,

occlusions, and the lack of knowledge about the intents of other actors in the environment, can all

be taken into consideration when modeling the system as a POMDP. This allows solvers to perform

high-level decision-making tasks such as navigating an intersection or roundabout [30, 31, 32],

12

changing lanes [33, 34], or stopping to pedestrians at crosswalks [35]. These methods usually

differ most in their state and observation representations, as well as the solvers used to calculate the

target policy.

As mentioned in the problem definition, when the trajectory of the autonomous vehicle does

contain an unavoidable lane excursion, some part of the drivable area is established as the boundaries

in which a collision may occur. Then this problem can also be evaluated as a collision avoidance

problem, where two incoming vehicles attempt to predict each other’s behavior and safely navigate

through the area with the collision risk. Re-framing the lane excursion problem in this manner is

important since it reveals many similarities with the intersection navigation problem. Therefore,

this thesis work takes inspiration from notable prior research on intersection navigation.

Brechtel et al. [36] formulate the POMDP under the assumption that uncertainties arise from

noise and occlusions. The sensor noise is implicitly defined in the observation model, and to handle

occlusions, a special type of observation is defined for vehicles outside the line of sight of sensors.

The authors implement their own solver to calculate the policy offline. The belief space has to be

discretized to make computation feasible for the offline approach. This discretization is performed a

priori by a learned neural network. Unfortunately, this constrains the problem to previously known

environments, and a dynamic solution becomes unavailable.

Liu et al.

[37] model T-junction and roundabout scenarios using a POMDP with multiple

obstacle vehicles. The states of the vehicles are their positions and orientations, as well as a

hidden intention from a set of 4 different intentions: Stopping, hesitating, normal, and aggressive.

The hidden intent directly affects how the driver’s behavior differs from the expected motion, and

the expected motion is encoded into the local map through a previously learned model utilizing

geometric context from the drivable areas. The concept of a set of hidden intents leading to varying

driving models is used as inspiration when defining the lane excursion problem for this thesis.

Hubmann et al. [38] focus on a four-way intersection scenario where the sources of uncertainties

are sensor noise and driver intent. Other actors can choose one of three lanes to follow; this intent

variable is hidden from the autonomous vehicle, leading to partial observability. Ego actions

13

are selected depending on the hidden variables’ beliefs, and the utilized ABT solver allows more

accurate preventative decision-making at the current time step. The online policy computation and

the modeling of the hidden variable make this work an appropriate analog to the lane excursion

problem. Thus, it will be referenced frequently throughout this writing.

Zhang et al.

[39] tackle the same problem in an intersection with unknown driver intents.

However, they take a different approach under the assumption that the set of possible predicted

paths that the actor vehicles may follow are already available through the prediction module of the

autonomous stack. With this, state representation becomes trivial, with only the static intents of the

vehicle being (hidden) variables. At each time step, a time series of possible future vehicle poses

is used as input for policy optimization. This time series is discretized into a multi-step occupancy

grid comprised of layered masks of the occupied cells over time. While representing the predictions

as grid maps makes online computation feasible, accurate predictions are not always available in

practice. Another major issue is that separating the prediction module prevents the ego vehicle

from considering the future interaction between the two vehicles when deciding on a behavior.

2.2.4 Addressing Lane Excursions

Lane excursions are also referred to as lane departures in literature, although departures often

describe completely exiting one lane to move into another in scenarios such as lane changing.

Preventing lane excursions before they occur is a vital constraint in most trajectory optimization

problems, and there are many studies on this subject. However, to the best of my knowledge, no

studies exist on specifically velocity optimization algorithms that act under the assumption that

a lane excursion is unavoidable. To investigate prior art, I used the online tools Google Scholar,

Science Direct, and SciSpace to conduct my searches. The most extensive search was done on

Google Scholar. Many search terms were used to examine the available studies: To describe the

vehicle type, size, and automation, I used the terms "Autonomous," "Automated," "Self driving,"

"Truck," "Bus," "Shuttle," "Large vehicle," and "Long vehicle." To describe the domain of the

desired algorithm, I used the terms "Velocity planning," "Path planning," "Trajectory planning,"

"Motion planning," and "Decision making." To describe the road characteristics that lead to lane

14

excursions, I used the terms "Turning," "Curvature," "Horizontal curve," "Urban roads," "Lane

width," and "Turn radius." Finally, to describe the problem itself, I used the search terms "Lane

departure," "Lane overflow," "Lane excursion," and "Encroachment." Among these terms, many

combinations were also tried to filter through promising results, such as "Path planning for large

autonomous vehicles" and "Lane departures of buses in urban roads." However, I found no results

describing the lane excursion problem as defined in this study. Hence, in this thesis, I take the

opportunity to define and address said problem, as well as to express the need for it in the industry

and to coin necessary terms for future references in this branch of research.

15

CHAPTER 3

METHODOLOGY

3.1 Modeling the Scenario

3.1.1 Map Representation

As mentioned, the lane excursion scenario in question is simulated on a two-way, double-lane

road with a 90-degree turn. The subsequent sections will define a solver to optimize an action

policy for navigating this scenario as the ego vehicle. Due to the optimization process being a

Monte-Carlo process, future states need to be sampled at a high frequency to solve the scenario as

optimally as possible in real time. In addition, the primary goal is to avoid a conflict between two

vehicles, and a conflict in this problem occurs when the two vehicles are at a state where neither can

pass the other without one of them reversing to yield the way, or even worse, a state of collision.

The calculation required to decide if there is a conflict requires many collision checks on simulated

future vehicle positions using polygon intersection algorithms and is a computationally heavy

process. Eliminating this burden during real-time optimization significantly boosts the processing

ability of the method. To this end, the lane map is pre-processed and discretized. The distance

ranges on the lanes where a conflict occurs are extracted, and whether a future state is a conflict

state can be checked using a simple distance comparison. Pre-processing the map is a viable option

in real-life applications since prior map data is usually available to autonomous systems.

First, 𝑁 points are sampled on both the ego and the opposite lanes, using a sufficiently dense

sampling interval to represent the curvature on both lanes. These points represent the possible

vehicle positions on the lane, with the center of its rear axle aligning with the position and the

orientation of a given point. Then, each point is paired with the opposing point on the other lane,

creating 𝑁 lines connecting the lane centers.

The body of each vehicle is simulated at each lane point. At each point, the vehicle frame

intersects a given number of lines, and these intersection points correspond to a ratio of how much

the vehicle covers the area between its own lane center and the opposing lane center. A ratio of 0.5

means that the intersection point is exactly in the middle of the two lane centers, 1.0 means that

16

Figure 3.1 A visual representation of the map discretization and conflict checks. The red arrows
depict how much each vehicle occupies the space between the two lane centers in different
directions, creating a separate histogram at each point.

the vehicle reaches or exceeds the opposing lane center and 0.0 shows that the vehicle is currently

not present at that location. This is visually depicted in Figure 3.1. These ratios create a histogram

of lane coverage ratios at a given vehicle position. In the end, 𝑁 histograms are created for each

vehicle, where each histogram corresponds to a single lane point.

When the ego vehicle’s histogram and the actor vehicle’s histogram are summed up, if any value

exceeds 1.0, the vehicles are in a physical collision state. This creates the basis of the conflict areas

on each lane: If, at any state, the vehicles are propagated forward and there is no possibility that

they pass each other without their histograms summing up to a collision, then the two vehicles are

in conflict, and the solver must avoid this state. To avoid states that are uncomfortably close to a

physical collision, this threshold is reduced to 0.8 instead when producing the conflict areas.

The conflict areas for each lane are extracted only once as a pre-processing step, so this

computation does not affect the performance of the algorithm during runtime. Given a future state,

the solver now only needs to check if the arc distances of the vehicles are inside the threshold of

the conflict area, which is much easier to compute.

17

3.1.2 Definition of POMDP

A POMDP model is defined by the tuple of expressions ⟨𝑆, 𝐴, 𝑂, 𝑇, 𝑍, 𝑅, 𝑏0, 𝛾⟩, wherein 𝑆, 𝐴

and 𝑂 are respectively the sets of possible states, actions, and observations for the given problem.

At any time step, the problem can be in a state 𝑠 ∈ 𝑆, where taking an action 𝑎 ∈ 𝐴 transitions the

system into a new state 𝑠′ ∈ 𝑆. This transition is probabilistic and is characterized by the transition

function 𝑇 (𝑠, 𝑎, 𝑠′) = 𝑝(𝑠𝑡+1 = 𝑠′|𝑠𝑡 = 𝑠, 𝑎𝑡 = 𝑎), which calculates the probability of reaching the

state 𝑠′ at the next time step 𝑡 + 1 when action 𝑎 is taken inside state 𝑠 at time step 𝑡. Of course,

this function and all of the other functions present in the POMDP model adhere to the Markovian

property by being dependent on only the latest time step’s state, action, and observation.

The property of the POMDP that extends the original MDP model is the partial observability

of the system. While in an MDP, the whole system state 𝑠 is observable from the perspective of

a solver, in a POMDP, a part of this information is unavailable. Instead, an observation 𝑜 ∈ 𝑂 is

received at each time step after an action is taken and the corresponding state transition occurs.

The observation function 𝑍 (𝑠′, 𝑎, 𝑜) = 𝑝(𝑜𝑡+1 = 𝑜|𝑎𝑡 = 𝑎, 𝑠𝑡+1 = 𝑠′) denotes the probability of

receiving the observation 𝑜 when state 𝑠′ is reached by taking the action 𝑎.

Since the state of the POMDP is unknown to a solver, a probability distribution over all possible

states is maintained instead. This distribution is called a belief 𝑏, and 𝑏(𝑠) denotes the probability

of being at state 𝑠 under the current belief 𝑏. An observation received from the system changes the

belief since more information has become available relating to the partially observable state. The

initial belief 𝑏0 is part of the model, and the belief update function, 𝜏(𝑏, 𝑎, 𝑜), calculates the next

belief 𝑏′ given an action and an observation:

𝑏′(𝑠′) = 𝜂𝑍 (𝑠′, 𝑎, 𝑜)

∑︁

𝑇 (𝑠, 𝑎, 𝑠′)𝑏(𝑠)

(3.1)

𝑠∈𝑆
Every action in a POMDP is rewarded or penalized. The reward function 𝑅(𝑠, 𝑎) returns a

scalar reward value for taking action 𝑎 inside state 𝑠. The goal of a POMDP solver is to maximize

the accumulative future rewards up to a finite or infinite time horizon, given a discount factor

𝛾 ∈ [0, 1). Since the system state, transitions, and observations are probabilistic, the value to

maximize becomes the expected total future rewards instead. To this end, the solver calculates a

18

policy 𝜋 : 𝐵 → 𝐴, which maps each possible belief to an action. The optimal policy 𝜋∗ maximizes

the expected total rewards given the initial belief:

𝜋∗ = argmax

𝜋

𝑉 (𝑏0, 𝜋) = argmax

𝜋

𝐸

(cid:34) 𝐻
∑︁

𝑡=0

𝛾𝑡 𝑅 (𝑠𝑡, 𝜋(𝑏𝑡))

(cid:35)

𝑏0, 𝜋

(cid:12)
(cid:12)
(cid:12)
(cid:12)

(3.2)

where 𝑉 (𝑏0, 𝜋) is the value function and 𝐻 is the time horizon of the solution.

The belief space is a probability space and is always continuous, which makes optimizing a

POMDP policy much more complicated than its MDP counterpart. In addition, the state, action,

and observation spaces may also be continuous. Even though this is another source of high problem

complexity, it is also a common choice for modeling real-life systems since the alternative is to

discretize these spaces but lose crucial information in the process.

3.1.3 State Space

The specific lane excursion problem defined in this work depicts the ego vehicle and the actor

vehicle moving in opposite directions in their respective lanes, L𝑒 and L𝑜. The state space for this

problem is represented as 𝑠 = (𝑠𝑒, 𝑠𝑎)𝑇 ∈ 𝑆, where 𝑠𝑒 is the ego state, and 𝑠𝑎 is the actor state. The

ego state contains the following values:

𝑠𝑒 = (𝑑𝑒, 𝑣𝑒, 𝑎𝑒)𝑇

(3.3)

where 𝑎𝑒 and 𝑣𝑒 are the magnitudes of the longitudinal acceleration and velocity of the ego vehicle.

The acceleration value is included in the ego state because changes in acceleration will be penalized

by the reward function, and tracking the current ego acceleration is required. 𝑑𝑒 is the longitudinal

distance traveled on L𝑒 in the Frenet coordinate system. The vehicle’s rear axle, being its kinematic

base, is assumed to be the reference point for its position. Lateral motion is ignored for the purposes

of this scenario, so the lateral distance to the lane is not included in the state.

Similarly, the actor state is as follows:

𝑠𝑎 = (𝑑𝑎, 𝑣𝑎, I𝑎)𝑇

(3.4)

where 𝑑𝑎 and 𝑣𝑎 represent longitudinal distance and velocity in the opposite lane, L𝑜. I𝑎 is the

enumerated driving style (intent) of the actor vehicle; it is a constant hidden variable and cannot be

19

Figure 3.2 Distance and velocity state variables of the lane excursion POMDP model. In addition
to these variables, ego acceleration, and actor driving style are also included in the system state.

observed. The driving styles defined for this problem are "cautious," "standard," and "aggressive."

It characterizes the behavior of the actor vehicle when both vehicles are approaching the conflict

area. In this situation, a cautious driver is more inclined to yield to the ego vehicle, whereas an

aggressive driver is more inclined to attempt passing first.

Longitudinal movement categorization is more commonly employed in pedestrian-centric pre-

dictive models, with the hidden state being, for instance, the pedestrian’s intent towards crossing a

road or waiting [35]. Intersection navigation methods, even though they address a similar problem,

often ignore this factor in modeling the intersection scenario [38, 39] and categorize driver intent

by the turn direction they are heading towards. The longitudinal intent is also pivotal in the context

20

of the lane excursion scenario because the driving style of the actor vehicle is the primary factor in

how an interaction will occur between the ego and actor vehicles in future time steps.

The three driving styles defined in this scenario do not necessarily cover every possible individ-

ual behavior that one may encounter in real-life situations. However, the simulation of the scenario

(detailed in further sections) is constructed in such a way that the actor vehicle can choose to move

with acceleration values in the range of -4.0 m/s2 to 2.0 m/s2 depending on its driving style, which

leads to high interaction variety with the aim to cover the most cases.

This POMDP model employs a continuous state space. Only the driving style is enumerated,

whereas the rest of the values are continuous.

3.1.4 Action Space

Actions and observations are the direct sources of the branching factor when considering future

belief states for the given problem. Keeping the action set discrete significantly improves the

computational ability, as well as leading to closer-to-optimal solutions. For this problem, at each

time step, the ego vehicle can take one of three discrete actions: accelerate (1.5 m/s2), decelerate

(-1.5 m/s2), or maintain its speed (0 m/s2). The chosen acceleration value is constantly applied

until the next time step.

3.1.5 Observation Space and Observation Function

It is assumed for the base model that there is no noise introduced by perception, and the ego

vehicle can completely observe its own state, as well as the position and velocity of the actor

vehicle. The only hidden variable is the driving style of the actor vehicle. Thus, the observation

space is represented as 𝑜 = (𝑜𝑒, 𝑜𝑎)𝑇 ∈ 𝑂, with 𝑜𝑒 being the ego observation and 𝑜𝑎 being the actor

observation. The specific values included are similar to the state space:

𝑜𝑒 = (𝑑𝑒, 𝑣𝑒, 𝑎𝑒)𝑇

𝑜𝑎 = (𝑑𝑎, 𝑣𝑎)𝑇

(3.5)

(3.6)

The observation function then simply maps each value to itself. Of course, the defined obser-

vation space is continuous in the same way as the state space.

21

3.1.6 Transition Function

For both the ego vehicle and actor vehicles, a constant acceleration motion model is employed

at each time step. The ego vehicle state is updated with the following equation:

𝑑′
𝑒

1 Δ𝑡 0

𝑑𝑒

(cid:169)
(cid:170)
(cid:173)
(cid:174)
(cid:173)
(cid:174)
(cid:173)
(cid:174)
(cid:173)
(cid:174)
(cid:173)
(cid:174)
(cid:171)
(cid:172)
where Δ𝑡 is the duration of the time step and 𝑎𝑐𝑚𝑑 is the chosen acceleration action under the

(cid:170)
(cid:174)
(cid:174)
(cid:174)
(cid:174)
(cid:174)
(cid:172)

(cid:169)
(cid:173)
(cid:173)
(cid:173)
(cid:173)
(cid:173)
(cid:171)

(cid:170)
(cid:174)
(cid:174)
(cid:174)
(cid:174)
(cid:174)
(cid:172)

(cid:169)
(cid:173)
(cid:173)
(cid:173)
(cid:173)
(cid:173)
(cid:171)

(cid:169)
(cid:173)
(cid:173)
(cid:173)
(cid:173)
(cid:173)
(cid:171)

(cid:170)
(cid:174)
(cid:174)
(cid:174)
(cid:174)
(cid:174)
(cid:172)

𝑎𝑐𝑚𝑑

(3.7)

𝑎′
𝑒

𝑣′
𝑒

𝑎𝑒

𝑣𝑒

=

+

0

0

0

1

0

0

1

(Δ𝑡)2
2
Δ𝑡

current belief, 𝑎 = 𝜋(𝑏). After the state is updated, the distance and velocity values are limited to

valid thresholds: 𝑑′

𝑒 ∈ [0, length(L𝑒)] and 𝑣′

𝑒 ∈ [0, ∞). This is to prevent invalid states, such as

vehicles moving outside map limits and moving in reverse.

For the actor vehicle, a similar state update is performed. The only difference is that the driving

style stays unchanged:

𝑑′
𝑎

1 Δ𝑡 0

𝑑𝑎

(cid:170)
(cid:174)
(cid:174)
(cid:174)
(cid:174)
(cid:174)
(cid:172)
where 𝑎𝑔𝑒𝑛 is a rough approximation of the actor state behavior. It is the sum of three modifiers:

(cid:169)
(cid:173)
(cid:173)
(cid:173)
(cid:173)
(cid:173)
(cid:171)

(cid:170)
(cid:174)
(cid:174)
(cid:174)
(cid:174)
(cid:174)
(cid:172)

(cid:169)
(cid:173)
(cid:173)
(cid:173)
(cid:173)
(cid:173)
(cid:171)

(cid:169)
(cid:173)
(cid:173)
(cid:173)
(cid:173)
(cid:173)
(cid:171)

(cid:170)
(cid:174)
(cid:174)
(cid:174)
(cid:174)
(cid:174)
(cid:172)

(cid:169)
(cid:173)
(cid:173)
(cid:173)
(cid:173)
(cid:173)
(cid:171)

(cid:170)
(cid:174)
(cid:174)
(cid:174)
(cid:174)
(cid:174)
(cid:172)

𝑎𝑔𝑒𝑛

I′
𝑎

𝑣′
𝑎

𝑣𝑎

I𝑎

=

+

0

0

0

0

1

0

1

(3.8)

(Δ𝑡)2
2
Δ𝑡

𝑎𝑔𝑒𝑛 = 𝑎𝑟𝑒 𝑓 + 𝑎𝑠𝑡𝑦𝑙𝑒 + 𝑎𝑢𝑛𝑐

(3.9)

𝑎𝑟𝑒 𝑓 is the required acceleration for the actor to reach the reference lane velocity at its current

position.

Its value is limited to the [−3, 1] range (m/s2). 𝑎𝑠𝑡𝑦𝑙𝑒 approximates the difference in

behavior given the driving style of the actor. It is −1.5 m/s2 for "cautious", 0 m/s2 for "standard",

and 1.5 m/s2 for "aggressive" actors. Lastly, 𝑎𝑢𝑛𝑐 is a random value sampled from a truncated

normal distribution to cover a range of individual behaviors of actors. The distribution is N (0, 2)

with a range limited to [−2, 2]. After the sum, the total 𝑎𝑔𝑒𝑛 is limited one last time to the [−4, 2]

m/s2 range. The lower ranges for acceleration values are kept larger in magnitude due to the

possibility of emergency brakes. It is assumed that the actor will attempt an emergency brake to

avoid a collision if it is the only option available to not enter into a conflict state. This does not

22

correlate to the driving style of the actor, as it is expected from any driver in real life to apply hard

brakes when there is a hazard nearby, such as a fast oncoming vehicle without any space to swerve.

Several assumptions are made to approximate the driving behaviors. This approximation does

not exactly match the simulated driving model, which will be detailed in the chapter on method

evaluation. This is a deliberate choice to show that a rough approximation with general but

reasonable assumptions about driver behavior is sufficient to distinguish drivers from each other.

3.1.7 Reward Function

Building upon Hubmann et al.’s approach [38], the reward function consists of several types of

penalties (negative rewards):

𝑅(𝑠, 𝑎) = 𝑅𝑐𝑜𝑛 𝑓 𝑙𝑖𝑐𝑡 (𝑠) + 𝑅𝑣𝑒𝑙 (𝑠) + 𝑅𝑎𝑐𝑐 (𝑠, 𝑎) + 𝑅𝑑𝑖𝑠𝑡 (𝑠)

(3.10)

𝑅𝑐𝑜𝑛 𝑓 𝑙𝑖𝑐𝑡 is a high penalty equal to the coefficient −𝐾𝑐𝑜𝑛 𝑓 𝑙𝑖𝑐𝑡 when the two vehicles are in

a conflicting state according to their distances and the pre-processed conflict area thresholds

previously described. When no conflict is detected, this penalty is 0.

𝑅𝑣𝑒𝑙 is a penalty caused by undershooting or overshooting the reference speed limit 𝑣𝑟𝑒 𝑓 at the

position of the ego vehicle. Two different formulas are used for lower and higher velocities:

𝑅𝑣𝑒𝑙 =

−𝐾𝑣+(𝑣𝑟𝑒 𝑓 − 𝑣𝑒)2

if 𝑣𝑒 > 𝑣𝑟𝑒 𝑓

−𝐾𝑣−(𝑣𝑟𝑒 𝑓 − 𝑣𝑒)

if 𝑣𝑒 < 𝑣𝑟𝑒 𝑓

(3.11)





with the coefficients 𝐾𝑣+ and 𝐾𝑣− for the two different cases. The higher velocity case is punished

with quadratic severity instead of linear severity, since exceeding the speed limit is much more

dangerous than being too slow.

𝑅𝑎𝑐𝑐 is equal to a comparably low negative coefficient −𝐾𝑎𝑐𝑐 when the action changes the

acceleration of the ego vehicle, to prevent erratic behavior and increase comfort. When the

acceleration stays the same, the value is 0.

Finally, the distance penalty is calculated as:

𝑅𝑑𝑖𝑠𝑡 =





−𝐾𝑑𝑖𝑠𝑡 (𝑑𝑐𝑜𝑛 𝑓 𝑙𝑖𝑐𝑡 − 𝑑𝑒)

if 𝑑𝑒 < 𝑑𝑐𝑜𝑛 𝑓 𝑙𝑖𝑐𝑡

0

otherwise

23

(3.12)

where 𝐾𝑑𝑖𝑠𝑡 is the penalty coefficient and 𝑑𝑐𝑜𝑛 𝑓 𝑙𝑖𝑐𝑡 is the distance at which the collision area begins.

This penalty induces the ego vehicle to move closer to the collision area before stopping, leading

to a more realistic yielding behavior.

3.2 ABT Solver

Given the continuous state and observation spaces, solving the lane excursion problem online

becomes a computationally demanding task. Exploring every possible belief state to generate

a complete and optimal policy is infeasible. Thus, this type of problem is usually approximated

using Monte-Carlo belief sampling. Among the existing sampling-based POMDP solvers, the ABT

(Adaptive Belief Tree) is an appropriate choice to solve a discrete-action, continuous-observation

POMDP problem [16]. It is also used in Hubmann et al.’s intersection problem, which is very

similar in terms of POMDP characteristics [38].

The ABT functions by sampling many particles (states) from the initial belief, then propagating

the particles through a "belief tree" comprised of different beliefs represented as nodes. To propagate

a particle, a random action is sampled, a state transition is performed, and then an observation is

sampled using the observation model. This process produces a new state, which is added to the tree

by branching out from the previous belief to a new belief node. This propagation continues for each

particle until the planning horizon is reached from the root. Particles travel through the same node

if the same action is taken from the previous node and the received observation is similar enough

according to a pre-defined similarity metric. In the end, all the particles in a given node form an

approximation of the belief at that node. Using the taken action and the received observation, the

corresponding node is used as the initial belief for the next time step in an online fashion. This

way, the belief update function 𝜏(𝑏, 𝑎, 𝑜) need not be explicitly defined.

The ABT has computational advantages in that it only searches the reachable belief space, and

particles sampled from previous time steps remain in the tree, so it is not necessary to sample all

states from scratch at every time step. A visualization of this process can be seen in Figure 3.3.

24

Figure 3.3 The process of sampling future states for the line excursion problem using the ABT
method. The hidden driving style of the actor is sampled randomly at the initial belief, and then
sampled particles are propagated through the belief tree. The belief node corresponding to the
taken action and the received observation is used as the initial belief for the next step. The visual
was adapted from Hubmann et al. [38] and redrawn for this problem.

3.3 TAPIR and ROS Integration

TAPIR (Toolkit for Approximating and Adapting POMDP solutions In Real-time) is an open-

source software toolkit for solving continuous POMDP problems [27]. It implements the ABT

method as a C++ library and provides inheritable classes for defining detailed POMDP models.

ROS (Robot Operating System) [40] is a software and communications framework for designing

real-life and simulated robotics applications. It is currently used as a development platform for

many autonomous vehicle projects. TAPIR provides the option to build its ABT library as a

ROS package and implement the method as a ROS process. For this thesis, the ROS approach

is employed, and the lane excursion solver is also implemented in ROS. It requires installing

TAPIR separately and building packages together in a ROS workspace. The code can be found in

https://github.com/adastec-batuhan/lane_overflow_pomdp_planner [41].

The ROS package is named lopp_tapir, which stands for "Lane Overflow POMDP Planner"

25

Figure 3.4 A snapshot of a POMDP simulation, visualized in RViz. Various visual markers are
used to debug and track system states. On the left is the menu for selecting which of the published
markers are to be currently viewed. The red squares correspond to the beginnings and ends of the
conflict area on each lane.

and TAPIR. The term "lane overflow" was changed to "lane excursion" in the thesis later in time,

but the package uses the abbreviation "Lopp" for most of the classes. We will refer to the package

as "lane excursion planner" during this section.

Using an object-oriented approach, the TAPIR library provides everything related to POMDP

as classes. The lane excursion planner inherits these classes and provides the necessary methods

and variables to exhaustively model the lane excursion problem as a POMDP model. Some of

the classes created are "LoppState," "LoppAction," "LoppObservation," and "LoppModel," which

together define all of the POMDP characteristics in code. In addition, some classes unique to the

lane excursion problem were needed to describe the scenario. "EgoState" and "ActorState" were

created to represent the parts of the POMDP state. "Scenario" was created to design the lanes and

extract information on the conflict area.

The "LoppNode" and "BaselineNode" classes inherit from the "TapirNode" class in the TAPIR

library, which provides the basic ROS integration and functionalities. The inherited classes build

26

upon this by implementing their own simulation algorithm.

The ROS environment is accompanied by several useful functionalities for the software devel-

opment process. RViz, a built-in visualization tool of ROS, is used for debugging and visualizing

the lane excursion simulation in real-time. The LoppNode and BaselineNode classes both publish

a series of "markers" after running each time step of the simulation. A marker is the ROS message

type that RViz recognizes and displays; various shapes (or text) in 2D or 3D can be drawn on the

RViz screen. The lane excursion planner package publishes the map, vehicles, and states as marker

messages to watch the ongoing simulation. The resulting view can be seen in Figure 3.4.

ROS also provides a data format called Rosbag, which records a series of ROS messages with

timestamps so that the recordings can be played back and reviewed later. To extract performance

measures for the POMDP and baseline controllers, each run is saved into a Rosbag containing the

necessary simulation state data for each time step. The Rosbags are then processed using a simple

analysis script in Python to produce the measures and plots that are shown in subsequent chapters.

27

CHAPTER 4

EVALUATION AND RESULTS

4.1 Simulation of the Lane Excursion Scenario

To test the performance of the POMDP, we utilize the built-in simulator class of the TAPIR

library. The POMDP solver and the simulator are called in an alternating sequence within a single

time step. After the POMDP solver decides on an action, it is passed to the simulation, which

updates ego and actor states accordingly.

We implement driving models for both vehicles. The ego and actor vehicles’ states are updated

using the same kinematic formulas as the equations 3.7 and 3.8. The ego acceleration used is the

corresponding value to the taken action. The actor acceleration is chosen through a more complex

algorithm, depicted as a flowchart in Figure 4.1.

The actor, by default, aims to reach the reference speed limit at its current position, which

produces the reference acceleration 𝑎𝑟𝑒 𝑓 . If both vehicles are predicted to enter the conflict area

under a time threshold (5 seconds), they are considered "close" in the flowchart. Then, the actor

aims for a different target speed depending on its driving style. The new target speed is the lane

speed limit multiplied by a coefficient of 0.5 for cautious behavior, 1.0 for standard behavior, and

1.25 for aggressive behavior, producing the acceleration 𝑎𝑏𝑒ℎ. If the ego vehicle is already inside

the conflict area, and the actor is close or inside, then the actor attempts to stop just before the

conflict area with a maximum deceleration threshold and let the ego vehicle pass first, producing the

acceleration 𝑎𝑠𝑡𝑜 𝑝. 𝑎𝑟𝑒 𝑓 and 𝑎𝑏𝑒ℎ are also perturbed with transition uncertainty, using a truncated

normal distribution N (0, 𝜎2) with 𝜎 = 2 and a valid range of [−2, 2] which is able to simulate

behavioral variance under a given driving style.

The parameters used for the POMDP model and the ABT solver are given in Table 4.1. As the

reward system is similar to Hubmann et al.’s intersection navigation POMDP [38], the values are

similar in magnitude; although the penalty coefficients are specifically tuned to the defined lane

excursion problem.

28

No

Start

Actor inside
or close?

Yes

Ego inside?

No

Yes

𝑎 = 𝑎𝑟𝑒 𝑓

No

Ego close?

𝑎 = 𝑎𝑠𝑡𝑜 𝑝

Yes

𝑎 = 𝑎𝑏𝑒ℎ

𝑎 = 𝑎 + N (0, 𝜎2)

Stop

Figure 4.1 Algorithm for deciding the acceleration value chosen by the actor vehicle at each time
step of the simulation.

29

Parameter Value Definition

𝛾
𝐾𝑐𝑜𝑛 𝑓 𝑙𝑖𝑐𝑡
𝐾𝑣+
𝐾𝑣−
𝐾𝑎𝑐𝑐
𝐾𝑑𝑖𝑠𝑡
Δ𝑡
𝐻

0.95 Discount factor
10000 Penalty coefficient for reaching a conflict state
1000
100
200
50
1
8

Penalty coefficient for being above the reference velocity
Penalty coefficient for being under the reference velocity
Penalty coefficient for changing acceleration value
Penalty coefficient for staying too far away from the conflict area
Time step (seconds)
Planning horizon in terms of time steps

Table 4.1 POMDP and ABT parameters.

4.2 Rule-Based Baseline Method

A rule-based driving controller is also created to provide a baseline for comparison to the

POMDP solver. The rule-based controller does not simulate further in time to predict future states

and interactions, unlike the POMDP. Instead, it assumes optimal ego and actor movements until the

conflict area and makes a decision using only the current state of both vehicles. Optimal movement

in this context is defined as the series of accelerations that follow the reference lane velocity as

closely as possible without considering vehicle interaction.

In other words, optimal movement

minimizes the sum of the differences between the vehicle velocity and the lane reference velocity

at every time step.

The rule-based controller calculates the times in which the ego and actor vehicles will reach the

conflict area. Using this convention, if the ego is expected to enter first and after that the actor will

have sufficient braking distance to stop without entering, then the ego takes the already calculated

optimal actions. Otherwise, the ego vehicle will follow optimal actions until the last possible point,

then brake to stop until the actor passes the conflict area.

The time in which the actor will enter the conflict area is calculated using knowledge of the

actor’s driving style. An aggressive actor is expected to enter earlier than a cautious actor, which

ultimately affects the acceleration decision. The controller has three configurations in how it

processes driving style information:

1. Omniscient: This configuration allows the controller to always know the driving style of the

30

actor vehicle. Using this knowledge, the omniscient controller is able to calculate the timing

precisely and take the fastest path across the conflict area without causing conflicts.

2. Uniform assumption: This configuration makes the controller consider each of the three

driving styles as valid possibilities. Even when the actor is cautious, the uniform assumption

controller will act conservatively and move slower in case the style is, in fact, aggressive.

3. False assumption: This configuration causes the controller to misinterpret the actual driving

style and randomly pick one of the two remaining styles as a certainty. When the actor is

aggressive, the false assumption controller may presume that it is cautious and attempt to

enter the conflict area, which is likely a mistake. This type of controller drives fast and tries

to pass first, but sometimes leads to undesirable conflicts.

These configurations all have different advantages over each other, but their lack of predictive

optimization prevents them from performing well in all possible cases.

4.3 Performance Measures

We define two different measures to evaluate the performance of the rule-based and POMDP

controllers. The measures are defined over 𝑁 simulation runs.

The first measure is the number of times in 𝑁 runs where the simulation terminated with a conflict

state instead of a goal state. The goal state means that the ego vehicle has successfully reached the

end of the lane. Of course, fewer terminations with conflict states show better performance.

The second measure is the average velocity error during each run, where the velocity error at

each time step is the absolute difference between the ego velocity and the reference lane velocity

at the position of the ego. This value is also averaged over 𝑁 runs. A lower average is better

since it means that the ego vehicle was successfully able to follow the reference velocity. For the

proposed methods, higher values are most often caused by the ego vehicle stopping to yield the way

to the actor vehicle because exceeding the reference velocity is punished severely for the POMDP

method, and it is outright impossible for the rule-based method, given its algorithm. This means

that higher velocity errors directly correlate to longer clearing times of the lane excursion scenario.

31

Method

Rule-Based, False Assumption
Rule-Based, Uniform Assumption
Rule-Based, Omniscient
POMDP

Cautious Standard Aggressive All
41
0
0
0

19
0
0
0

22
0
0
0

0
0
0
0

Table 4.2 Total numbers of rounds ending in conflicts. The data is over 100 rounds for each
driving style, for 300 rounds in total.

Method
Rule-Based, False Assumption
Rule-Based, Uniform Assumption
Rule-Based, Omniscient
POMDP

Cautious Standard Aggressive
1.246
1.913
1.587
1.711

1.394
1.768
1.768
1.561

1.281
1.341
0.843
1.028

All
1.306
1.674
1.399
1.433

Table 4.3 Average absolute velocity errors of the ego vehicle (m/s). The data is over the successful
rounds among 100 each driving style, for a maximum of 300 rounds in total.

Even when yielding the way, the ego vehicle can be controlled in a way that minimizes the wait

time for the actor vehicle, which leads to the differences between the performances of the POMDP

and rule-based methods, even in scenarios where yielding is the only feasible option.

The second measure, the average velocity error, is reported only for the successful runs, meaning

the runs where the ego vehicle was able to avoid a conflict and reach the goal state. This allows

for the complete distinction of the two measures regarding their purposes. The first measure shows

if the methods can safely navigate the lane excursion scenario. The second measure shows how

quickly it is navigated, as opposed to having an undesirable and conservative approach where it

is unnecessary. Then, balancing these two measurements (keeping both values low) becomes the

goal of a good-performing solution.

4.4 Results and Analysis

We test four methods: POMDP and the rule-based controller with its three different configura-

tions. With each method, 100 simulation runs are executed for each driving style, for a total of 300

runs. The ego and actor vehicle states are randomly initialized within boundary conditions (valid

ranges of arc distance and velocity) using a global seed. The list of seeds is reused for each method

to ensure the same initial states when testing a new method.

32

Figure 4.2 The average velocity error over each successful run, reported for each of the four
proposed methods. Runs with cautious, standard, and aggressive actor styles are separated into
different colored groups. The maximum and minimum values for each set of runs are highlighted
at the top and bottom of the point groups with horizontal bars. The overall averages are
highlighted with the wider black bars.

The measurements are reported in Tables 4.2 and 4.3. For each measure, a lower value shows

higher performance.

The uniform assumption controller does not cause conflicts because it always acts conservatively,

considering that there is always a possibility that the driving style is aggressive. The omniscient

controller also does not cause conflicts because it knows the driving style with certainty and can act

accordingly. The POMDP controller is also able to avoid any conflicts by accurately approximating

the driving style of the actor in a short amount of time in all runs.

In the second measurement, the false assumption controller has the lowest velocity error overall.

This is due to reckless driving in cases with standard or aggressive styles. In said cases, there is

a chance that the controller assumes cautious behavior and takes the opportunity to pass. This

leads to less stopping and yielding for the ego vehicle and lower velocity errors with respect to the

reference on successful runs. However, there are many runs where this behavior for the ego leads to

conflicts, as can be seen among the results for the first measurement. This disadvantage significantly

33

overshadows its performance in the second measurement. The false assumption controller is a fine

example where the balancing of the capabilities between navigating quickly and avoiding conflict

fails in one direction.

The omniscient method performs second best since it possesses perfect knowledge about the

actor and can select actions accordingly. Thus, this method’s performance acts as a target for the

POMDP method, which has to approximate the driving style dynamically.

The uniform assumption controller can be considered a realistic baseline for a lane excursion

solution. It possesses no information about the actor’s driving style and needs to act conservatively

in every situation to avoid a conflict, given the chance that the driving style is more aggressive than

cautious. This leads to more stopping and yielding behavior, which in turn increases the average

velocity error in all types of runs. It can be seen that the POMDP controller performs better than

this baseline in every case. This is because the POMDP can gather information while navigating

the scenario and make better-informed decisions.

A deeper analysis of the average velocity error values can be made through Figure 4.2, where the

average error is shown as a point for every successful run. Runs with different driving styles for the

actor are grouped into different colored points, with their maximum, minimum, and average values

highlighted as horizontal bars. It can be seen that there is a high variance in the error values; this is

caused by the various randomized starting state velocities for the ego vehicle and the difference in

yielding and passing behaviors. Yielding to the actor vehicle leads to higher velocity errors since

the reference velocity guides the ego vehicle to move while the ego needs to stop until the actor has

passed the conflict area. Passing first means that a lower average velocity error will be produced.

It can be observed in Figure 4.2 that most of the runs in the uniform assumption method end

with the ego vehicle yielding and the error being higher as a result. As mentioned, the ego yields

because, from the controller’s point of view, there is always a chance that the actor will be aggressive

and attempt to pass as well, which may lead to a conflict. In comparison, the POMDP controller

produces more cases with correct yielding or passing decisions and, as a result, lowers the velocity

errors for each driving style on average.

34

Figure 4.3 Data from a single run comparing the rule-based and POMDP methods, where the
actor vehicle’s driving style is "cautious." On the top, the velocity profiles over time are shown in
comparison to the reference lane velocities. On the bottom, the POMDP belief (probability
distribution) over the actor vehicle’s driving style is shown.

For a detailed look into which actions the methods take during runs, the velocity profiles created

during a single run are provided in Figure 4.3. The false assumption controller is left out for easier

visualization. On the top, the ego velocities over time for each of the remaining methods are shown.

The actor velocity for the POMDP method is also included. On the bottom, the belief state of the

POMDP over the probabilities of the driving style of the actor is shown for the same run.

In this run, the actor vehicle is cautious, and the optimal approach is to pass the conflict area

first with the highest velocity value allowed, which is seen as the velocity profile of the omniscient

method. The POMDP controller starts to slow down one time step before the uniform controller

(𝑡 = 3) since cruising at a lower speed allows it to gather information for a longer time, as opposed

35

Method

# Conflicts Average velocity error

POMDP
POMDP + perception noise

0
1

1.433
1.418

Table 4.4 The POMDP evaluated in the default configuration, and with perception noise added in
simulation. Conflicts are summed and velocity errors are averaged over 100 runs for each driving
style, for 300 runs in total.

to reaching the conflict area as fast as possible and making a full stop after that. At 𝑡 = 4, the actor

vehicle starts to slow down due to its cautious behavior, and the belief of the POMDP solver shifts

towards cautious as well. The future time steps confirm the POMDP solver’s belief, and the ego

takes the opportunity to pass the conflict area, while the uniform assumption controller has to slow

down more to ensure that the actor vehicle cannot pass first, even if it is aggressive.

4.5 Effects of Observation Uncertainty

To test the capabilities of the POMDP model under added sources of uncertainty, another series

of runs is performed where perception noise is introduced to the system. The noise is introduced

during the observation step, making this a type of observation uncertainty. The observed arc

distance is perturbed with a value taken from the normal distribution N (0, 2.5), and the observed

velocity is perturbed using the distribution N (0, 0.5), which are plausible magnitudes of noise that

could exist in a real-life system.

A basic delay-based error calculation was used to pick the variance for the velocity error: Given

a very common autonomous system frequency of 10Hz, each observation will be 0.1 seconds later

than its previous. In this duration, a vehicle may change acceleration values; then, the following

observation becomes different from what is expected under a constant acceleration assumption. We

consider the highest change in acceleration allowed by the simulation, ignoring emergency braking

scenarios. This value is 3 m/s2, leading to a maximum velocity change of 0.3 m/s in the 0.1s time

frame. The distance error was modeled based on the size of the actor vehicle. In an extreme case

where only one side of the vehicle was seen at a time step, the center of the bounding box could shift

from its true position as much as half of the vehicle length, which is 1.86 meters in the simulation.

These values both represent very improbable situations, but a higher variance than each of them

36

was chosen as the error parameter to challenge the POMDP solver.

The measurements of the POMDP solver with and without perception noise are reported in

Table 4.4. A consequence of integrating perception noise is that one of the 300 runs in total ends in

a conflict, as opposed to none of them when the noise is not present. In this run, the actor vehicle

is aggressive, and the ego vehicle estimates this correctly once again, then stops before the conflict

area to yield the way. The actor vehicle passes through, and right before it completely clears, the

ego vehicle starts moving and enters the conflict area as well, causing a conflict at the edge of the

area. This is caused by the incorrect information provided to the ego vehicle, leading the solver to

think that the actor vehicle is, in fact, completely past the conflict area. The incorrect information

here causes the mistake in the first two steps of the time horizon and not the time steps farther

into the future. Therefore, this failure does not indicate any weaknesses in the POMDP solver’s

prediction ability. Instead, it is an instant consequence of erroneous data provided.

Regarding the ability to follow the reference velocity profile of the lane, the POMDP seems to

perform at the same level regardless of whether or not there is external observation uncertainty. This

can be explained through the two following remarks: First, in the solution space that the POMDP

solver searches, internal noise is introduced in the transition model, which leads to simulated

variance in future time steps. When states similar to each other are considered the root belief, the

beliefs in future time steps still cover very similar possibilities due to the high branching factor. This

means that the future optimal actions and rewards reached from the root belief will likely be similar

sequences when noise is introduced at the root step, which is what the observation uncertainty

entails. Secondly, given a state in the lane excursion scenario, perturbing that state is not very

likely to change the optimal action at closer time steps. More specifically, given an actor’s velocity

and distance, moving the actor or changing the actor’s velocity without a major difference does not

change the best action to take for the ego vehicle. For example, if the actor is aggressive, whether

it is 10 or 12 meters away from the conflict area does not affect that the best action to take is to stop

and yield. This is also true for velocities. When these two remarks are combined, it is observed

that neither the closer time steps nor the time steps farther into the future are affected substantially,

37

given a realistic amount of perception noise.

In conclusion, the predictive capabilities and performance of the POMDP controller are main-

tained even when some perception noise is introduced as the observation uncertainty. However,

immediate effects, such as a conflict at the next time step, can be seen due to having inaccurate

information. Though this problem is not specific to a POMDP solver and can happen with any

other, future work can still be considered to address the higher possibility of conflicts.

4.6 Discussion on Real-life Application

As mentioned, research in autonomous vehicle literature about addressing problems specific

to large vehicle types is scarce. Moreover, to the best of my knowledge, no effort has been made

to address unavoidable lane excursions via longitudinal motion planning. Yet, with autonomous

buses, trucks, and shuttles being more regularly deployed on roads, this scenario is likely to become

a common occurrence during autonomous driving. Thus, the real-life implications of modeling

this problem are discussed in this section.

The performance of the POMDP solver on navigating through a lane excursion scenario has

been demonstrated throughout this paper. However, the evaluations were only done in a simulation

environment. In real life, every step of communication and actuation introduces varying uncer-

tainties to the whole problem. For instance, in a ROS environment, different packages are run

under different processes, and there is always a communication delay between sending an action

request and that action reaching the package that commands the vehicle. From the commandeering

package to the vehicle, there are multiple physical layer steps that the message has to move through,

which adds to the delay. Meanwhile, the simulation assumes that the action is instantly taken after

being issued. To address this, in a real-life system the POMDP may need to estimate the delay to

the actuator and simulate forward to issue the action at the delayed time earlier.

Another example is the vehicle dynamics being much more complicated than a bicycle model

in real life. When doing particle propagation, the POMDP solver may need to consider the detailed

vehicle dynamics equation. Especially for large autonomous vehicles, the vehicle’s movement

capabilities differ significantly from the common assumptions made in prior research and this thesis.

38

Even among a specific type of vehicle, such as a commercial bus, the dynamics characteristics differ

according to factors such as tire placement [42].

Lateral motion, although ignored here, is also a source of uncertainty. As mentioned in the

problem definition, the POMDP model is expected to be able to incorporate this range of motion

without major changes since the conflict area is already pre-processed in this method, and the

motion can be considered in this step. During the navigation of the scenario, the longitudinal arc

distance will again be the important state variable, and the lateral motion will only add noise to the

longitudinal distance traveled at each time step. The amount of uncertainty introduced this way is

already handled by the variance in the transition model of the POMDP.

A major assumption made in defining the lane excursion problem is that the vehicles involved

will adhere to traffic rules. In cases where these rules are broken by illegal maneuvers, the POMDP

would not be able to forward simulate this into possible predictions since the state transitions are

constrained by the traffic rules. This is an extreme case of uncertainty, but it is worth considering

nonetheless when designing a generic approach. As a solution, the ABT solver may have a tiny

portion of its belief particles propagated without the traffic rule constraints, allowing it to consider

the possibility of reaching illegal states in the future. However, this may cause an unbalanced value

estimation of each action, so a difficult fine-tuning process for reward parameters may be required.

Another solution may be a higher-level decision tree reacting to illegal states in a rule-based manner

and transferring control to the POMDP solver when the states are deemed legal, which may be the

safer approach.

Moving on from uncertainties, there is also an issue of generalization. Multiple actor vehicles

may need to be considered in a single environment simultaneously, in which case the current

method would have to be run twice, whereas the better option is to extend the POMDP model to

include multiple actors. In other cases, a single actor near a conflict area may not be visible to

the ego vehicle due to occlusion spots. These cases have been addressed in previous works with

intersection navigation problems [36, 39], which can also be adapted to this problem.

Generalization also requires the method to be activated dynamically whenever a possible conflict

39

area is encountered. To this end, the conflict area can be extracted using prior map data as a pre-

processing step or approximating the local map on demand through sensing. The POMDP method

provided is able to use lane maps with different shapes as inputs. However, the type of scenario is

limited to two-way, double-lane roads, so another challenge is to extend its capabilities to handle

varying road types. In multi-lane roads, vehicles can change lanes which is a reliable way to avoid

the conflict area, whereas this study was built on the assumption that passing through the conflict

area is the only option for both vehicles involved. Multi-lane roads would require redefining the

problem to accommodate vehicles moving in and out of the conflicting lanes.

As another generalization challenge, the driving styles of actor vehicles, although they cover a

high variance of behaviors in this simulation, may not be able to completely represent the variance

of driver behaviors encountered in real life. To address this, driving styles and their corresponding

driving models can be extracted from real-life data, such as autonomous driving datasets, either

with a rule-based data analysis method or a learning algorithm. After this, it must also be ensured

that the POMDP solver is able to properly divide the belief space to distinguish different styles

among a more extensive set.

One more interesting branch of autonomous driving research to consider is multiple autonomous

systems interacting with each other. There may be a benefit to redesigning the lane excursion

problem using two autonomous systems instead of one system interacting with an independent,

manually driven vehicle. There have been many studies on managing intersections with conflict

areas given multiple autonomous vehicles, with cases where they are able to communicate with

each other or where a single infrastructure element makes the higher-level decision-making on pass

priority, which can all be applied to the conflict area management in this problem as well [43].

In summary, many research opportunities exist in implementing the lane excursion problem in

real-life autonomous vehicles. The set of assumptions made by the original problem definition is

strict, but it can be relaxed by testing the removal of each assumption in isolated experiments. This

way, the original problem definition can also be extended significantly to accommodate various

road and vehicle geometries, vehicle interaction scenarios, and types of uncertainties.

40

CHAPTER 5

CONCLUSION

In this thesis, the lane excursion problem was introduced in the context of large autonomous

vehicles. The main goal was to emphasize that as the research space on the autonomous driving

industry expands, new perspectives on some long-standing problems need to be brought forward.

One example is that large autonomous vehicles do not always have the capability of avoiding a lane

excursion through trajectory shape optimization; thus, sometimes, it is required to assume that the

excursion will inevitably take place and to plan longitudinal velocities under vehicle interaction

constraints instead.

The lane excursion problem was formally defined, including a conflict area with undetermined

passing priority and an opposing vehicle with a driving style that is non-observable to the au-

tonomous system. Along with it, a "largeness" term is coined to be able to focus on vehicles with

dimensions in specific ranges when addressing the lane excursion problem. Several methods to ad-

dress the problem have been proposed. The main solution employs the capabilities of the POMDP

framework to estimate the hidden driving style of the incoming vehicle and take close-to-optimal

actions to navigate the conflict area. Other rule-based methods have been proposed as baselines

for the POMDP method. Two measures to compare all methods have been proposed to evaluate

them regarding safety and speed. Comparisons demonstrated the ability of the POMDP solver

to generalize to different driving styles and randomized scenario states. Lastly, the behavior of

the POMDP was analyzed in more detail with the aid of simulation data from specific runs and

measures gathered under an additional observation uncertainty constraint.

The lane excursion problem is a frequent occurrence when the vehicle size is significant in the

given road geometry, and it has the possibility to become even more pervasive as a higher number

of large autonomous vehicles are deployed in the industry. My aim is to provide insight into how

to solve this problem in a way that is generalizable to any type of autonomous vehicle deployment

today. The proposed POMDP model successfully represents the fundamental challenges brought

on by the problem definition: the non-observability of the whole system state and the complexity

41

of dynamic vehicle interactions near conflict zones. However, there are still two major obstacles in

transferring and generalizing the problem into real-life contexts: First, the POMDP model makes

many assumptions to construct the problem, some of which need to be eliminated when building a

generic solution. Second, real-life vehicle deployments are accompanied by a significantly larger

number of characteristics and uncertainties that can affect the performance of a given solution,

as opposed to the simulation environment presented in this study. To direct research efforts into

clearing these two obstacles for the lane excursion problem, I included a discussion section detailing

the major considerations associated with transferring the problem into real life.

My personal takeaways from this study were to widen my vision of domain-specific challenges

for larger autonomous vehicles, to gain a deeper understanding of modeling and predicting vehicle

interaction, and to achieve a level of expertise on MDP and POMDP solutions for the task of

autonomous path planning. I first started my research on lane excursion due to it being a problem

we frequently encounter during our autonomous deployments at ADASTEC. I wanted to improve

my knowledge of a branch of planning algorithms that is applicable to real-life products in my

career. As I worked on the thesis, I grew more aware of the potential of the proposed model to

become a prevalent and useful tool to improve the safety of many autonomous deployments in

operation today, ranging from regular passenger car types to buses and trucks. My next major goal

is to focus on the aforementioned real-life implications and approach the POMDP method from an

engineering perspective. I believe that this study in lane excursion modeling will be a consequential

step toward safer and more efficient autonomous driving solutions in the future.

42

BIBLIOGRAPHY

[1] Aurora Innovation Inc., “The aurora driver.” [Online]. Available:

https://aurora.tech/

aurora-driver

[2] N. Verin and B. Sperling, “Ez10 passenger shuttle.” [Online]. Available:

https:

//easymile.com/vehicle-solutions/ez10-passenger-shuttle

[3] National Association of City Transportation Officials, “Transit street design guide,” 2016.
[Online]. Available: https://nacto.org/wp-content/uploads/2016/04/Turn-Radius_60ft-bus.
jpg

[4] C. V. Zegeer, H. F. Huang, J. C. Stutts, E. Rodgman, and J. E. Hummer, “Commercial bus
accident characteristics and roadway treatments,” Transportation Research Record, vol. 1467,
pp. 14–22, 1994.

[5] D. Chimba, T. Sando, and V. Kwigizile, “Effect of bus size and operation to crash occurrences,”

Accident Analysis & Prevention, vol. 42, no. 6, pp. 2063–2067, 2010.

[6] P. Polack, F. Altché, B. d’Andréa Novel, and A. de La Fortelle, “The kinematic bicycle model:
A consistent model for planning feasible trajectories for autonomous vehicles?” in 2017 IEEE
intelligent vehicles symposium (IV).

IEEE, 2017, pp. 812–818.

[7] R. F. Smith, “State of the practice and recommendations on traffic control strategies at toll

plazas,” p. 24, 2006.

[8] National Association of City Transportation Officials, “Urban street design guide,” pp. 33–36,

2013.

[9] P. Trautman and A. Krause, “Unfreezing the robot: Navigation in dense, interacting crowds,”
IEEE, 2010,

in 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems.
pp. 797–803.

[10] K. Kant and S. W. Zucker, “Toward efficient trajectory planning: The path-velocity decom-
position,” The international journal of robotics research, vol. 5, no. 3, pp. 72–89, 1986.

[11] K. J. Åström, “Optimal control of markov processes with incomplete state information i,”

Journal of mathematical analysis and applications, vol. 10, pp. 174–205, 1965.

[12] C. H. Papadimitriou and J. N. Tsitsiklis, “The complexity of markov decision processes,”

Mathematics of operations research, vol. 12, no. 3, pp. 441–450, 1987.

[13] O. Madani, S. Hanks, and A. Condon, “On the undecidability of probabilistic planning
and infinite-horizon partially observable markov decision problems,” in Aaai/iaai, 1999, pp.
541–548.

[14] J. Pineau, G. Gordon, S. Thrun et al., “Point-based value iteration: An anytime algorithm for

pomdps,” in Ijcai, vol. 3, 2003, pp. 1025–1032.

43

[15] D. Silver and J. Veness, “Monte-carlo planning in large pomdps,” Advances in neural infor-

mation processing systems, vol. 23, 2010.

[16] H. Kurniawati and V. Yadav, “An online pomdp solver for uncertainty planning in dynamic
Springer,

environment,” in Robotics Research: The 16th International Symposium ISRR.
2016, pp. 611–629.

[17] M. Memarzadeh, G. L. Britten, B. Worm, and C. Boettiger, “Rebuilding global fisheries
under uncertainty,” Proceedings of the National Academy of Sciences, vol. 116, no. 32, pp.
15 985–15 990, 2019.

[18] L. Pascal, M. Memarzadeh, C. Boettiger, H. Lloyd, and I. Chadès, “A shiny r app to solve the
problem of when to stop managing or surveying species under imperfect detection,” Methods
in Ecology and Evolution, vol. 11, no. 12, pp. 1707–1715, 2020.

[19] R. Xie, Q. Tang, C. Liang, F. R. Yu, and T. Huang, “Dynamic computation offloading in iot
fog systems with imperfect channel-state information: A pomdp approach,” IEEE Internet of
Things Journal, vol. 8, no. 1, pp. 345–356, 2020.

[20] Y. Zhu, S. Liang, M. Gong, and J. Yan, “Decomposed pomdp optimization-based sensor
management for multi-target tracking in passive multi-sensor systems,” IEEE Sensors Journal,
vol. 22, no. 4, pp. 3565–3578, 2022.

[21] G. Arcieri, C. Hoelzl, O. Schwery, D. Straub, K. G. Papakonstantinou, and E. Chatzi, “Bridg-
ing pomdps and bayesian decision making for robust maintenance planning under model
uncertainty: An application to railway systems,” Reliability Engineering & System Safety,
vol. 239, p. 109496, 2023.

[22] M. Chen, S. Nikolaidis, H. Soh, D. Hsu, and S. Srinivasa, “Trust-aware decision making for
human-robot collaboration: Model learning and planning,” ACM Transactions on Human-
Robot Interaction (THRI), vol. 9, no. 2, pp. 1–23, 2020.

[23] L. Holzherr, J. Förster, M. Breyer, J. Nieto, R. Siegwart, and J. J. Chung, “Efficient multi-scale
pomdps for robotic object search and delivery,” in 2021 IEEE International Conference on
Robotics and Automation (ICRA).

IEEE, 2021, pp. 6585–6591.

[24] J. Pajarinen, J. Lundell, and V. Kyrki, “Pomdp planning under object composition uncertainty:
Application to robotic manipulation,” IEEE Transactions on Robotics, vol. 39, no. 1, pp. 41–
56, 2022.

[25] K. Zheng and S. Tellex, “pomdp_py: A framework to build and solve pomdp problems,” arXiv

preprint arXiv:2004.10099, 2020.

[26] M. Egorov, Z. N. Sunberg, E. Balaban, T. A. Wheeler, J. K. Gupta, and M. J. Kochenderfer,
“Pomdps. jl: A framework for sequential decision making under uncertainty,” Journal of
Machine Learning Research, vol. 18, no. 26, pp. 1–5, 2017.

[27] D. Klimenko, J. Song, and H. Kurniawati, “Tapir: A software toolkit for approximating and
adapting pomdp solutions online,” in Proceedings of the Australasian Conference on Robotics
and Automation, Melbourne, Australia, vol. 24, 2014.

44

[28] L. Yu, D. Kong, and X. Yan, “A driving behavior planning and trajectory generation method

for autonomous electric bus,” Future Internet, vol. 10, no. 6, p. 51, 2018.

[29] R. Oliveira, P. F. Lima, G. C. Pereira, J. Mårtensson, and B. Wahlberg, “Path planning
for autonomous bus driving in highly constrained environments,” in 2019 IEEE intelligent
transportation systems conference (ITSC).

IEEE, 2019, pp. 2743–2749.

[30] W. Song, G. Xiong, and H. Chen, “Intention-aware autonomous driving decision-making in

an uncontrolled intersection.” Mathematical Problems in Engineering, 2016.

[31] K. H. Wray, B. Lange, A. Jamgochian, S. J. Witwicki, A. Kobashi, S. Hagaribommanahalli,
and D. Ilstrup, “Pomdps for safe visibility reasoning in autonomous vehicles,” in 2021 IEEE
International Conference on Intelligence and Safety for Robotics (ISR).
IEEE, 2021, pp.
191–195.

[32] H. Bey, M. Sackmann, A. Lange, and J. Thielecke, “Pomdp planning at roundabouts,” in 2021
IEEE, 2021, pp. 264–271.

IEEE Intelligent Vehicles Symposium Workshops (IV Workshops).

[33] A. G. Cunningham, E. Galceran, R. M. Eustice, and E. Olson, “Mpdm: Multipolicy decision-
making in dynamic, uncertain environments for autonomous driving,” in 2015 IEEE Interna-
tional Conference on Robotics and Automation (ICRA).

IEEE, 2015, pp. 1670–1677.

[34] Z. Sunberg and M. J. Kochenderfer, “Improving automated driving through pomdp planning
with human internal states,” IEEE Transactions on Intelligent Transportation Systems, vol. 23,
no. 11, pp. 20 073–20 083, 2022.

[35] Y.-C. Hsu, S. Gopalswamy, S. Saripalli, and D. A. Shell, “A pomdp treatment of vehicle-
Implicit coordination via uncertainty-aware planning,” in 2020
IEEE, 2020,

pedestrian interaction:
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
pp. 1984–1991.

[36] S. Brechtel, T. Gindele, and R. Dillmann, “Probabilistic decision-making under uncertainty
for autonomous driving using continuous pomdps,” in 17th international IEEE conference on
intelligent transportation systems (ITSC).

IEEE, 2014, pp. 392–399.

[37] W. Liu, S.-W. Kim, S. Pendleton, and M. H. Ang, “Situation-aware decision making for
autonomous driving on urban road using online pomdp,” in 2015 IEEE Intelligent Vehicles
Symposium (IV).

IEEE, 2015, pp. 1126–1133.

[38] C. Hubmann, J. Schulz, M. Becker, D. Althoff, and C. Stiller, “Automated driving in un-
certain environments: Planning with interaction and uncertain maneuver prediction,” IEEE
transactions on intelligent vehicles, vol. 3, no. 1, pp. 5–17, 2018.

[39] C. Zhang, S. Ma, M. Wang, G. Hinz, and A. Knoll, “Efficient pomdp behavior planning for
autonomous driving in dense urban environments using multi-step occupancy grid maps,”
in 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC).
IEEE, 2022, pp. 2722–2729.

45

[40] M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, A. Y. Ng et al.,
“Ros: an open-source robot operating system,” in ICRA workshop on open source software,
vol. 3, no. 3.2. Kobe, Japan, 2009, p. 5.

[41] B. Yücer, “Lane overflow pomdp planner,” https://github.com/adastec-batuhan/lane_

overflow_pomdp_planner, 2024.

[42] Y. Zhang, A. Khajepour, and Y. Huang, “Multi-axle/articulated bus dynamics modeling: a
reconfigurable approach,” Vehicle System Dynamics, vol. 56, no. 9, pp. 1315–1343, 2018.

[43] J. E. Siegel, D. C. Erb, and S. E. Sarma, “A survey of the connected vehicle land-
scape—architectures, enabling technologies, applications, and development areas,” IEEE
Transactions on Intelligent Transportation Systems, vol. 19, no. 8, pp. 2391–2406, 2017.

46