EXPLORING AND ADDRESSING THE VULNERABILITIES OF MULTIMEDIA SERVICES
OVER MOBILE NETWORKS:
FROM DEVICES TO INFRASTRUCTURE

By

Jingwen Shi

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

Computer Science—Doctor of Philosophy

2025

ABSTRACT

As mobile systems evolved from traditional telephony network architectures (e.g., 3G) to all-IP-based

network architectures (4G, 5G, and beyond), the IP Multimedia Subsystem (IMS) was introduced

to provide users with a variety of multimedia services—such as voice calls, video calls, SMS,

and emergency communications. However, while it enriches daily communication over cellular

networks, it also introduces new security threats to the mobile communication ecosystem.

In this dissertation, we systematically investigate the vulnerabilities introduced by architectural

shifts in mobile networks, spanning from user devices to network infrastructure: (1) on the device

side, we analyze the negative impact of transitioning IMS client implementations from traditional

hardware-based solutions (in cellular modems) to software-based applications on mobile phones.

Our study reveals that this shift significantly expands the attack surface, enabling adversaries to

hijack, spoof, or manipulate signaling and media data across various multimedia services; and (2)

on the infrastructure side, we examine privacy leakage issues in voice calls over IMS. Although all

voice packets and signaling messages are encrypted, the underlying transmission patterns remain

observable, thereby leaking user privacy.

There are three key lessons learned from our study. First, current IMS standards lack robust

security protections for IMS signaling routing on phones. Consequently, ordinary socket communication

allows other processes within the same mobile system to communicate with the IMS client. This

architectural gap enables malware to easily intercept or forge IMS signaling between the IMS client

and the IMS server. It enables attacks that can prevent mobile users from accessing multimedia

services across all available radio access networks, including 4G, 5G, and Wi-Fi. It also allows

adversaries to spoof SMS messages with arbitrary display names. Second, IMS video sessions lack

encryption and integrity protection beyond the IP layer. As a result, even with radio- and IP-layer

protection in place, IMS video data on a compromised mobile device cannot be safeguarded before

it is sent over the air. This opens the door for adversaries to hijack legitimate video streams. We

demonstrate that the attacker can hijack video sessions as covert channels, completely bypassing

operator-level monitoring and charging policy. Third, although 5G/4G voice calls are encrypted for

security and privacy, we unveil that side-channel vulnerabilities persist. In particular, transmission

patterns and signaling metadata can still leak sensitive information about 5G/4G call states. We

demonstrate a Cross-domain Identity Linkage (CrossIL) attack that can link user identities to their

cellular identities with a success rate of 89% to 98%, highlighting the need for deeper privacy-aware

design in encrypted mobile voice services. Building on our findings and lessons learned, we propose

innovative countermeasures that not only address the identified security vulnerabilities but also pave

the way for enabling more reliable and resilient multimedia services over mobile networks.

Copyright by
JINGWEN SHI
2025

"Live yourself as a light,
Because you don’t know,
Who by thy light,
Out of the darkness."
— Rabindranath Tagore

ACKNOWLEDGEMENTS

Pursuing a Ph.D. has been a long, challenging, and unforgettable journey—marked by moments

of happiness, excitement, hope, and, at times, discouragement and daze. I am deeply grateful to

the many individuals whose support, guidance, and companionship have shaped and sustained me

throughout this endeavor.

First and foremost, I would like to express my sincere appreciation to my advisor, Dr. Guan-Hua

Tu, for his unwavering support and mentorship. His encouragement led me to explore an entirely

new domain in cellular networks. Under his guidance, I developed a deeper understanding of

high-quality research practices, logical reasoning, and attention to detail. I am also grateful to Dr.

Chi-Yu Li and Dr. Chunyi Peng for their invaluable feedback and guidance, particularly on academic

writing. Collaborating with them has been a truly rewarding experience, and I am profoundly

thankful for their mentorship. I would like to express my heartfelt gratitude to Yaron Koral, my

mentor at AT&T Labs, for offering me a fresh perspective on the industry, as well as for his trust

and encouragement. I would also like to thank the members of my dissertation committee—Dr.

Guan-Hua Tu, Dr. Zhichao Cao, Dr. Tianxing Li, and Dr. Yuying Xie—for their insightful feedback,

constructive suggestions, and generous support throughout my research.

My heartfelt thanks go to Xitong Zhang, who has been a steadfast companion throughout the

past twelve years of my academic journey. I am also thankful to Changhan Ge, Shufan Wang, and

Yanbin Liu for their encouragement during my transition into a new research path, and for making

my internship at AT&T Labs a memorable and enriching experience. I extend my gratitude to my

labmates—Xinyu Lei, Tian Xie, Sihan Wang, Yiwen Hu, Minyue Chen, Height Yan, Yu-An Chen,

Moyan Lyu, and Jared Singh Sekhon—for the camaraderie, discussions, and shared moments. I am

equally appreciative of the friends I was fortunate to meet at MSU during my Ph.D. years, including

(but not limited to) Guangliang Liu, Haitao Mao, Bocheng Chen, Zhiyu Xue, Lan Wang, Wei Ao,

Wei Wang, Boyang Liu, Yunshi Liang, Mengying Sun, Deliang Yang, Haoyu Zheng, Tooba Nasir

and Catherine Mfinanga. What I have learned from all of you has shaped me into a more inclusive,

patient, and determined person. To those I may have unintentionally omitted, please accept my

sincere apologies—you are no less appreciated.

Above all, I extend my deepest gratitude to my family, whose love, patience, and support have

been the cornerstone of my journey. Their belief in me has carried me through every challenge, and

for that, I am forever thankful.

TABLE OF CONTENTS

CHAPTER 1    INTRODUCTION

CHAPTER 2    BACKGROUND AND STATE-OF-ART

CHAPTER 3    SECURING IP MULTIMEDIA SUBSYSTEM (IMS) ON MOBILE DEVICES

CHAPTER 4    ENHANCING THE PRIVACY OF VOICE SERVICES OVER IMS INFRASTRUCTURE

CHAPTER 5    CONCLUSION

BIBLIOGRAPHY

CHAPTER 1

INTRODUCTION

Multimedia services such as voice calls, video calls, messaging, and emergency communication

are vital to modern life. They play a crucial role in both everyday personal communication and

life-saving public safety services, such as 911 calls. As technology continues to advance, the scope

and impact of these services are expanding. Emerging application scenarios such as autonomous

vehicles and the Internet of Things (IoT) increasingly depend on robust and reliable multimedia

communication over mobile networks. Parallel to this growth, mobile networks have evolved

significantly from the early days of 3G to today’s 5G and the upcoming 6G. A central milestone in this

evolution has been the architectural shift from circuit-switched networks to IP-based packet-switched

infrastructures.

In traditional circuit-switched networks, communication requires establishing a dedicated

physical link between two endpoints before any data exchange can occur. This model provides

reliability but lacks scalability and efficiency. By contrast, modern packet-switched networks break

data into discrete IP packets that are independently routed across the IP-based networks. This

paradigm shift enables more flexible and efficient communication and has given rise to an advanced multimedia

service platform for mobile networks: the IP Multimedia Subsystem (IMS) [66]. IMS supports a wide

range of media services by integrating key functionalities, including user authentication, session

control, media processing, charging, and Quality of Service (QoS) enforcement. Though widely deployed

starting in the 4G era, IMS is not confined to 4G networks. It is designed to support multimedia

services across various access technologies, including 2G, 3G, Wi-Fi, Internet, and landline phones,

providing a unified service backbone regardless of the underlying connection. Over the last two

decades, IMS has gained widespread adoption across diverse access networks, including 4G LTE,

5G New Radio (NR), and Voice over Wi-Fi (VoWi-Fi). As of April 2023, over 290 service providers

across 235 networks had deployed IMS-based voice services. By the end of 2022, 4.7 billion

subscribers relied on IMS for voice communication, and this figure is projected to rise to 7.5 billion

by 2028—representing approximately 90% of all combined 4G and 5G subscriptions [60].

While IMS greatly enhances flexibility, scalability, and service reliability, it also introduces a

range of complex security challenges. In response to these architectural shifts and emerging threats,

this dissertation presents a comprehensive study of IMS-based mobile multimedia services.

1.1 Current Research Contributions

The current research contributions can be categorized into two primary domains: one focused

on mobile devices and the other on network infrastructure. Specifically, my work spans two main

research directions:

Figure 1.1 An overview of current research contributions.

• Securing IP Multimedia Subsystem on Mobile Devices — focusing on vulnerabilities

introduced at the mobile device.

• Enhancing the Privacy of Voice Services over IMS Infrastructure — addressing user

privacy leakage and side-channel threats within the network infrastructure.

Securing IP Multimedia Subsystem on Mobile Devices. The IP Multimedia Subsystem (IMS) is a

foundational framework for delivering multimedia services—such as voice and video calling, SMS,

and emergency communication—across cellular networks. While its security mechanisms have been

substantially strengthened over the past two decades, most of these enhancements have focused on

the network infrastructure layer. Techniques such as mutual authentication (AKA), IPsec encryption,

and STIR/SHAKEN caller ID verification provide strong protection within the operator’s domain.

However, a critical blind spot remains: the Mobile Equipment (ME) itself. As smartphone architecture

has evolved—shifting IMS functionality from secure hardware modems to more flexible but exposed

application processors—device-side security standards have failed to keep pace. This discrepancy

opens up new and under-explored attack surfaces.

This research direction presents the first comprehensive security analysis of IMS client behavior

on modern smartphones. We identify four key vulnerabilities in current IMS implementations:

unprotected signaling routing, unrestricted signaling sources, insecure video data delivery, and

unauthorized use of ViIMS channels. Based on these weaknesses, we design and demonstrate three

proof-of-concept attacks, including (1) the DoS-ALL attack that denies IMS access across Wi-Fi

and cellular; (2) the NameSpoofing attack that fabricates sender names, bypassing carrier-level

validation mechanisms; and (3) the ViIMS-Any attack that exploits high-priority ViIMS channels.

These attacks are experimentally validated on commercial smartphones using leading carriers in the

U.S. and Taiwan. Our findings reveal a critical need to re-examine and strengthen mobile-side IMS

security in light of architectural transitions. We propose a set of countermeasures to mitigate these

emerging threats.

Enhancing the Privacy of Voice Services over IMS Infrastructure. Mobile voice communication

remains a fundamental aspect of daily life, even as rich communication services proliferate over

mobile broadband. With the transition to all-IP in 4G and 5G networks, voice services have evolved

into Voice over IP Multimedia Subsystem (VoIMS), encompassing VoLTE and VoNR technologies.

These services are deployed by hundreds of operators worldwide and are expected to support

billions of devices in the near future. Security mechanisms for VoIMS are robust in design, leveraging layered

encryption, mutual authentication via SIM-based keys, and standardized IPsec protection. However,

while these protocols ensure encryption and integrity of signaling and media packets, they do

not fully address emerging privacy risks arising from how voice traffic behaves under real-world

conditions.

This research direction examines user privacy threats arising from network-level optimizations

designed to enhance voice call performance. Techniques such as guaranteed-bit-rate bearers, ROHC,

AMR codecs, and comfort noise generation, though effective individually and collectively, create

distinct and predictable traffic patterns. We demonstrate that these patterns can be exploited to

passively infer confidential information about encrypted calls—including call activity, call state, and

even caller or callee identity—without decrypting any voice content. Through a series of proof-of-

concept attacks, we show that adversaries can link users to specific cellular identities and mute call

participants, posing serious privacy and integrity threats. Our contributions include the empirical

analysis of new side-channel risks in VoIMS traffic and the proposal of a standards-compliant

mitigation strategy that addresses these weaknesses without sacrificing service quality.
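
To make the intuition behind such timing patterns concrete, the following minimal sketch labels the gaps between encrypted voice packets as talk or silence using arrival times alone. It assumes the commonly documented AMR behavior of roughly one speech frame every 20 ms and one comfort-noise (SID) update every 160 ms during silence. The 60 ms threshold, the function name `label_gaps`, and the sample timestamps are illustrative assumptions, not values or methods taken from this dissertation.

```python
from typing import List

# Assumed threshold (ms) separating ~20 ms speech-frame gaps from ~160 ms SID gaps.
GAP_THRESHOLD_MS = 60.0


def label_gaps(arrival_ms: List[float], threshold_ms: float = GAP_THRESHOLD_MS) -> List[str]:
    """Label each inter-arrival gap as 'talk' (short gap) or 'silence' (long gap)."""
    labels = []
    for prev, cur in zip(arrival_ms, arrival_ms[1:]):
        gap = cur - prev
        labels.append("talk" if gap < threshold_ms else "silence")
    return labels


# Hypothetical arrival times: four speech frames followed by two comfort-noise updates.
arrivals = [0.0, 20.0, 40.0, 60.0, 220.0, 380.0]
print(label_gaps(arrivals))
```

No payload is inspected in this sketch; the labels follow from packet timing only, which is why encryption by itself does not hide this kind of information.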

1.2 Dissertation Structure

The structure of this dissertation is outlined as follows. Note that this dissertation does not

cover in detail the projects on safeguarding cellular emergency service security and improving user

authentication for Internet applications, because those works are not the author’s main

contributions.

Chapter 2 provides the necessary background and reviews the state of the art. Section 2.1

introduces the evolution of multimedia services in mobile networks, with a focus on the architectural

shift from circuit-switched to IP-based infrastructure. Section 2.2 first provides a technical primer

on IMS in 4G and 5G networks, covering key components, security mechanisms, and signaling

flows. It then introduces the system architecture and performance optimizations for IMS-based

voice services in 4G and 5G. Section 2.3 surveys existing research on IMS security, identifying

gaps in both device- and infrastructure-level protections.

Chapter 3 investigates the security vulnerabilities of IMS implementations on mobile devices.

Section 3.1 presents the threat model and experimental setup. Sections 3.2 and 3.3 reveal two

fundamental issues: unprotected routing of SIP signaling messages and insecure access to IMS

media sessions. Section 3.4 discusses the security of modem-based IMS clients and iPhones.

Section 3.5 proposes a lightweight and standards-compliant solution that secures both signaling and

media paths without modifying mobile infrastructure.

Chapter 4 focuses on privacy vulnerabilities in IMS-based voice services. Section 4.1 outlines

the threat model and methodology. Section 4.2 introduces a traffic analysis-based side-channel

attack that enables call inference, and Section 4.3 demonstrates proof-of-concept attacks capable of

inferring call states and speaker identity. Section 4.4 discusses practical deployment considerations,

and Section 4.5 presents our defense mechanism.

Finally, Chapter 5 summarizes the key contributions of the dissertation and outlines

directions for future research in securing next-generation IMS and emergency services. Section 5.1

consolidates the research findings across the dissertation, emphasizing their significance for both

academic and practical stakeholders. Section 5.2 summarizes the insights and lessons learned from

our study. Section 5.3 introduces two key areas for future exploration: (1) investigating the security

of Next-Generation 911 (NG911) services on mobile devices and (2) detecting and analyzing privacy

leakage in IMS-based robocalls and IVR systems. These topics reflect the ongoing evolution

of mobile services and the need for forward-looking security research. The overarching goal of

our research is to enhance the security of mobile multimedia technology, safeguarding network

infrastructure, mobile equipment, and, ultimately, mobile users.

CHAPTER 2

BACKGROUND AND STATE-OF-ART

In this chapter, we introduce the evolution of mobile multimedia services, network infrastructure,

protocols, the IMS service signaling flow, and the IMS voice call primer relevant to this dissertation. We

further present the related state-of-the-art studies.

2.1 The Evolution of Multimedia Services over Mobile Networks

We now present the evolution of mobile multimedia services, highlighting the historical timeline,

associated standards, and key architectural shifts that have shaped the development of voice, video,

and messaging services across generations of mobile networks.

2.1.1 An Overview of Evolution of Multimedia Services

Figure 2.1 An overview of the standard release timeline and the development of multimedia services.

The evolution of cellular technology [48, 130] has been marked by significant milestones,

each generation (G) taking approximately a decade to develop, as shown in Figure 2.1. These

advancements have not only transformed the way we communicate but have also played a pivotal role

in the development of multimedia services. In this section, we will explore the key developments and

standards that have shaped the cellular landscape, highlighting their impact on multimedia services.

Before 3G, cellular networks evolved from fragmented analog systems to standardized digital

infrastructures, primarily supporting voice through circuit-switched technology. These early

generations lacked efficient support for data services, limiting multimedia capabilities.

The Evolution of 3G (1998): The Split Architecture of Circuit-Switching and Packet-Switching.

In December 1998, the 3rd Generation Partnership Project (3GPP) was established to develop a

specification for a 3G mobile phone system building upon the 2G GSM system [131]. The 3GPP is

responsible for designing and developing standards not only for 3G cellular technology but

also for all subsequent generations, including 4G and 5G cellular networks. The introduction of 3G

marked a pivotal shift from circuit-switched technology to packet-switching. Additionally, Release 5

of the 3GPP standards formalized the concept of the IP Multimedia Subsystem (IMS) [17], which laid

the foundation for multimedia services integration within cellular networks. As a result, data services

such as web browsing and email became possible over mobile networks for the first time. However,

multimedia services like voice and SMS continued to rely on legacy circuit-switched methods.

This led to a split architecture: IP-based transport for data and circuit-switched infrastructure for

multimedia services.

The Emergence of 4G (2008): Transition to a Fully Packet-Switched Network. With 4G, the

mobile network fully transitioned to packet switching. More importantly, it adopted the IP

Multimedia Subsystem (IMS)—a standardized architectural framework specifically designed to

deliver IP-based multimedia services over mobile networks. IMS represents a pivotal shift: voice

and messaging are no longer confined to circuit-switched infrastructure. Instead, they are integrated

into the packet-switched domain, enabling richer communication services and significantly greater

scalability.

The Advent of 5G (2018): Enabling Cloud-Native Architectures. With the advent of 5G, IMS

has evolved into a cloud-native architecture, delivering greater flexibility and enhanced performance.

This transformation paves the way for real-time multimedia applications, including mission-critical

use cases such as autonomous driving.

As we progress from 3G to the current 5G, the evolution of multimedia services within

cellular networks is notable. With the transition to 3G and beyond, multimedia services have

become increasingly integrated into the cellular landscape. We will delve into the details of these

advancements and compare the differences in multimedia services from the 3G era to the present

state of 5G. This comparison will shed light on the profound impact that each cellular generation

has made on the development and delivery of multimedia services.

2.1.2 Multimedia Services through Circuit Switched Core Network in 3G

Figure 2.2 Architecture of 3G Mobile Networks.

3G introduced significant optimizations for data transmission, boosting the bit rate from kilobits

per second (kbps) to megabits per second (Mbps). Whereas pre-3G mobile networks

primarily focused on providing voice and text services, 3G extended its capabilities to include

multimedia services such as image and video transmission. We next introduce the 3G networks as

shown in Figure 2.2, which consist of four parts: User Equipment (UE), radio access network, core

network, and data network.

User Equipment (UE). The User Equipment (UE) serves as the entry point for mobile users into

the cellular network. It encompasses two domains: the Mobile Equipment (ME) and the Universal

Subscriber Identity Module (USIM). The ME is responsible for radio transmission and executing

various applications on the mobile device. The USIM, on the other hand, consists of a standalone

smart card that stores user information used for authentication purposes [18]. Together, the ME and

USIM form the user’s interface with the network.

Radio Access Network (RAN). The Radio Access Network (RAN) is responsible for establishing

and maintaining the wireless communication link between the User Equipment (UE) and the core

network. In 3G, the base station within the RAN is known as the NodeB. These base stations

facilitate wireless communication through various air interfaces and are crucial for network coverage

and capacity.

Circuit-Switched Core Network. For voice communication in 3G networks, the Circuit-Switched

Core Network is employed. This network includes components such as the Media Gateway (MGW),

Mobile Switching Center (MSC) server, and Gateway Mobile Switching Center (GMSC) server.

The MSC and GMSC servers primarily handle call control and mobility control functions. Under

their control, the MGW establishes the bearer connection required for every voice session [38].

Packet-Switched Core Network. In contrast to circuit-switched voice communication, data

transmission in 3G networks is achieved through the Packet-Switched Core Network, often referred

to as the General Packet Radio Service (GPRS). Key elements in the GPRS include the Serving

GPRS Support Node (SGSN) and Gateway GPRS Support Node (GGSN). These components are

responsible for handling data transmission to external data networks, thus enabling mobile data

services and internet connectivity [45].

Data Network. 3G networks support various types of data networks to accommodate different

applications. For voice, two notable data networks are the Public Switched Telephone Network

(PSTN) and the Integrated Services Digital Network (ISDN). The PSTN is a traditional analog

circuit-switched telephone network, whereas ISDN is a digital communication technology that

offers faster data transfer rates, higher call quality, and the ability to handle multiple simultaneous

connections. For data services, the Public Data Network (PDN) is utilized to provide data

connectivity, including internet access, to mobile users. These data networks play a crucial role in

supporting both voice and data services within the 3G cellular networks.

Understanding the core components of 3G networks is essential for comprehending the functioning

of these mobile communication systems. From the UE to the various core network elements, each

component plays a unique role in enabling voice and data services. In the subsequent sections, we

will delve deeper into their evolution in successive generations of mobile networks.

2.1.3 Multimedia Services through Packet Switched Core Network in 4G and 5G

Figure 2.3 Architecture of 4G and 5G Mobile Networks.

In Figure 2.3, the architecture and operations of 5G/4G networks are illustrated for both the

control plane and user plane. From left to right, user traffic passes through several

key components: the UE (User Equipment), the RAN (Radio Access Network), and the 5G/4G core network.

Depending on the service, it then proceeds to the Internet for mobile broadband access or to the IMS

network for multimedia services (voice, video, 911 calls, SMS, and more).

In the RAN, 5G employs gNodeB, and 4G uses eNodeB as the Base Station (BS) to provide

radio access to the UE. For the control plane, the Mobility Management Function (MMF) handles

tasks like registration, authentication, IP connectivity management, and mobility management. The

Home Environment (HE) serves as the repository for user data. In the user plane, Gateways (GWs)

play a critical role in forwarding traffic and managing IP connectivity.

The IMS network consists of the Call Session Control Function (CSCF), Application Server (AS),

and Home Subscriber Server (HSS). The three CSCFs are the Proxy-CSCF (P-CSCF),

Interrogating-CSCF (I-CSCF), and Serving-CSCF (S-CSCF), which collectively manage the SIP signaling for

initiating, maintaining, modifying, and terminating IMS services (e.g., IMS calls). The CSCF

routes the SIP signaling and media data to the assigned Application Server (AS), which executes

the application. HSS is a database that contains subscription-related information to support user

authentication and authorization. It also stores the subscriber’s location and IP information [23].
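
As a rough illustration of how the three CSCF roles divide the work, the sketch below models the path of a registration request. The role descriptions follow the general 3GPP IMS design: the P-CSCF is the UE's first contact point, the I-CSCF consults the HSS to locate a serving node, and the S-CSCF performs session control. The function name `route_registration`, the dictionary standing in for the HSS, and the example identity are hypothetical simplifications rather than any operator's implementation.

```python
from typing import Dict, List

# Hypothetical HSS table: subscriber identity -> name of the assigned S-CSCF.
HSS: Dict[str, str] = {"sip:alice@ims.example.net": "scscf-1"}


def route_registration(user: str, hss: Dict[str, str]) -> List[str]:
    """Return the ordered IMS nodes that a registration request from `user` visits."""
    path = ["P-CSCF"]              # first contact point for the UE
    path.append("I-CSCF")          # queries the HSS to find the serving node
    serving = hss.get(user)
    if serving is None:
        raise KeyError(f"unknown subscriber: {user}")
    path.append(f"S-CSCF({serving})")  # performs registration and session control
    return path


print(route_registration("sip:alice@ims.example.net", HSS))
```

The sketch omits authentication challenges and the actual SIP messages; it only shows the order in which the nodes are involved.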

2.1.4 New Challenges in Mobile Network Evolution

While IMS significantly enhances flexibility, scalability, and service reliability, it also introduces

a new set of complex security challenges. These challenges stem from four fundamental shifts in the

design and operation of IMS-based networks: (1) the transition from isolated, hardware-based solutions

to more accessible, software-based clients (e.g., IMS operating as an Android application); (2) the

adoption of more flexible and extensible signaling protocols; (3) the migration of signaling from the

secure control plane to the less protected data plane; and (4) the elevated prioritization of network

resources for IMS traffic (i.e., IMS services versus general data services).

(1) Transition from Hardware-Based to Software-Based Clients. In legacy mobile systems, call

and SMS functionalities were tightly integrated into the hardware modem. This hardware-based

design offered strong isolation and minimal exposure to software-level threats. With the introduction

of IMS, this model has shifted. Most IMS clients now operate as software applications within

the mobile operating system—similar to conventional Android apps. Services such as SMS and

video calls have also migrated to the software layer, while only voice communication remains in

the modem for reasons of backward compatibility. This software-centric architecture significantly

expands the attack surface, making client-side exploitation more feasible and increasing the potential

for privilege escalation, spoofing, or data tampering.

(2) Flexible and Extensible Protocol Standards. Alongside the shift to software-based clients comes

a second challenge: flexible signaling protocols. In 2G and 3G networks, signaling relied on

rigid, binary-encoded formats strictly enforced by hardware, which made spoofing and message

fabrication exceedingly difficult. In contrast, IMS in 4G and 5G employs the text-based Session

Initiation Protocol (SIP), which is highly flexible and extensible. However, this flexibility also lowers the barrier

for attackers. A malicious party can now craft, intercept, or modify SIP messages with relatively

little effort, increasing the risk of fraud and impersonation attacks. Additionally, IMS must support

multiple standards for SMS, such as 3GPP2 (used by Verizon) and 3GPP (used by T-Mobile and

AT&T), to ensure compatibility across networks. If even one standard contains a vulnerability, the

requirement for cross-support can allow that flaw to impact users on otherwise unaffected networks,

thereby amplifying the scope and complexity of potential attacks.
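
The practical consequence of a text-based format can be seen in a few lines of code. The minimal sketch below parses a SIP request using only string operations. The message follows the general RFC 3261 layout but omits mandatory headers such as Via for brevity; the addresses, the Call-ID, and the helper name `parse_sip_request` are made-up placeholders for illustration.

```python
from typing import Dict, Tuple

# Hypothetical SIP MESSAGE request; every identity below is a placeholder.
RAW = (
    "MESSAGE sip:bob@ims.example.net SIP/2.0\r\n"
    "From: <sip:alice@ims.example.net>;tag=1234\r\n"
    "To: <sip:bob@ims.example.net>\r\n"
    "Call-ID: abc123@192.0.2.1\r\n"
    "CSeq: 1 MESSAGE\r\n"
    "Content-Type: text/plain\r\n"
    "Content-Length: 5\r\n"
    "\r\n"
    "hello"
)


def parse_sip_request(raw: str) -> Tuple[str, str, Dict[str, str], str]:
    """Split a SIP request into (method, request URI, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, uri, _version = lines[0].split(" ", 2)
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        # Split on the first colon only, since header values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, uri, headers, body


method, uri, headers, body = parse_sip_request(RAW)
print(method, uri, headers["From"], body)
```

Because every field is readable text, inspecting a header needs no special tooling, in contrast to the binary-encoded, hardware-enforced signaling formats of 2G and 3G.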

(3) Migration from Control Plane to Data Plane. A third challenge arises from the relocation of

IMS signaling from the control plane to the data plane. In 2G and 3G, signaling took place over

the control plane, which was safeguarded by well-tested security mechanisms like authentication,

encryption, and integrity protections. But in 4G and 5G, IMS signaling moves to the data plane,

which is optimized for performance but lacks equivalent security protections. As a result, these

messages may be exposed to weaker or inconsistently configured defenses, increasing the likelihood

of spoofing, interception, or unauthorized access. This architectural change introduces a trade-off

between performance and security, where the latter is often compromised.

(4) Elevated Resource Priority for IMS Traffic. Finally, IMS services are assigned high-priority

bearers with elevated QoS guarantees across the device, radio access network, and core infrastructure.

These guarantees ensure low latency and high reliability for critical services like voice and emergency

calls. However, they also introduce new security vulnerabilities and incentives for attackers to abuse

those resources. Adversaries who manage to impersonate an IMS client may gain preferential

access to network bandwidth, effectively hijacking system resources. Such abuse can lead to service

degradation for legitimate users and may even allow attackers to bypass traffic monitoring and policy

enforcement mechanisms.

In light of these architectural changes and emerging security concerns, this dissertation undertakes

a comprehensive study of IMS-based mobile multimedia services. The research spans mobile

devices and network infrastructure, as well as protocol specifications and implementations. Our objective

is to uncover previously unexamined vulnerabilities, demonstrate real-world attack scenarios, and

propose practical, scalable defenses to enhance the overall security of IMS-enabled multimedia

services.

2.2 5G/4G IMS Primer

In this section, we introduce the foundational concepts of 5G/4G network architecture, along

with their associated security mechanisms and performance optimizations. For simplicity, we use a

unified terminology to represent functionally equivalent components across 4G and 5G networks.

2.2.1 5G/4G IMS Architecture and Service Flows

In this subsection, we first present the necessary background on the 5G/4G network architecture and its

security measures. We then introduce the architecture and network stack on Mobile Equipment

(ME), and finally present the IMS service flow.

Figure 2.4 5G/4G mobile network architecture and its security; the architecture and potential security
vulnerabilities of ME.

5G/4G mobile network architecture. Figure 2.4(a) shows the 5G/4G network architecture and its

operations in both the control plane and the user plane. From right to left, user traffic traverses the UE (User

Equipment), RAN (Radio Access Network), 5G/4G core network, and Internet (mobile broadband)

or IMS (voice/text). UE is the ME equipped with a valid USIM (Universal Subscriber Identity Module);

RAN uses 5G gNodeB or 4G eNodeB as the BS (Base Station) to provide radio access to the UE. In

the control plane, MMF (Mobility Management Function) administers registration, authentication,

IP connectivity, and mobility, whereas HE (Home Environment) stores user data. In the user plane,

GWs (Gateways) are used to forward traffic and manage IP connectivity.

To offer guaranteed network performance for each UE, multiple IP flows are created and assigned

distinct QoS levels. Specifically, one flow is established for mobile broadband service to

the Internet, whereas two flows are created to support multimedia services (e.g., voice and video

calls) offered by the IMS: one for signaling and the other for media traffic; they are managed

by IMS signaling servers and media gateways, respectively. The IMS signaling uses Session

Initiation Protocol (SIP) [50] and the media traffic is transported over Real-Time Transport Protocol

(RTP) [23, 39].

[Figure 2.4 panels: (a) 5G/4G mobile network architecture; (b) 5G/4G security architecture; (c) ME architecture.]

5G/4G security architecture. Figure 2.4(b) shows that 5G/4G uses a multi-layer security architecture

with three strata: application, serving, and transport. The security functions are divided into four

domains [43, 44]: (I) network access domain, which ensures mutual authentication between the core

network and the ME, as well as secure service access; (II) network domain, which guarantees secure

communication among network entities; (III) user domain, which secures communication between

the ME and the USIM; (IV) application domain, which protects message exchanges between ME

applications and network servers (e.g., IMS). Such a security architecture shows that the access

between the applications and the ME is not explicitly protected.

ME architecture. Figure 2.4(c) illustrates the ME architecture, which includes both software and

hardware components, using Android phones as an example. The ME software includes the OS,

applications, and the user interface. The applications can be classified into IMS and non-IMS types

with different protocol stacks on top of the Linux kernel, and specifically, each IMS application

serving as an IMS client runs on the Telephony Framework and the RIL (Radio Interface Layer)

for IMS functionalities. The ME hardware contains two major components. One is an application

processor supporting the ME software, whereas the other is the cellular modem offering cellular

connectivity and cellular-related services. The modem mainly contains cellular L1/L2 protocols,

including PHY, MAC, RLC (Radio Link Control), and PDCP (Packet Data Convergence Protocol)

for both 5G and 4G networks, as well as SDAP (Service Data Adaptation Protocol) for the 5G

network only. Moreover, it contains a function, TFT (Traffic Flow Template), for associating packets

with each specified IP flow based on the 5-tuple (source/destination IP addresses, source/destination

port numbers, protocol ID) information so that the corresponding routing and QoS policy can be

applied [19].
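The 5-tuple matching performed by the TFT can be illustrated with a minimal sketch. It is not the modem's actual implementation: the class names, the flow labels, the example addresses (taken from the 2001:db8::/32 documentation prefix), and the use of port 5060 for SIP are all assumptions made for illustration. A filter field left as None acts as a wildcard, and a packet that matches no filter falls back to the default Internet flow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


@dataclass(frozen=True)
class PacketFilter:
    """One TFT-style filter; a field set to None matches any value."""
    flow: str
    src_ip: Optional[str] = None
    dst_ip: Optional[str] = None
    src_port: Optional[int] = None
    dst_port: Optional[int] = None
    protocol: Optional[str] = None

    def matches(self, pkt: Packet) -> bool:
        fields = ("src_ip", "dst_ip", "src_port", "dst_port", "protocol")
        return all(
            getattr(self, f) is None or getattr(self, f) == getattr(pkt, f)
            for f in fields
        )


def classify(pkt, filters, default_flow="internet"):
    """Return the flow of the first matching filter, else the default flow."""
    for flt in filters:
        if flt.matches(pkt):
            return flt.flow
    return default_flow


# Hypothetical filters: addresses and ports are placeholders, not operator values.
FILTERS = [
    PacketFilter(flow="ims-signaling", dst_ip="2001:db8::10", dst_port=5060),
    PacketFilter(flow="ims-media", dst_ip="2001:db8::20", protocol="udp"),
]

sip_pkt = Packet("2001:db8::1", "2001:db8::10", 40000, 5060, "udp")
rtp_pkt = Packet("2001:db8::1", "2001:db8::20", 50000, 50002, "udp")
web_pkt = Packet("2001:db8::1", "2001:db8::99", 51000, 443, "tcp")
print(classify(sip_pkt, FILTERS), classify(rtp_pkt, FILTERS), classify(web_pkt, FILTERS))
```

Note that the classification depends only on header fields of the packet, not on which application produced it.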

IMS service flows. Figure 2.5 depicts IMS service flows for text, voice, and video services. To access

an IMS service, the UE needs to perform three actions. First, IP Connectivity Establishment [23]

is performed to obtain IP connectivity for communicating with the IMS server. Second, IMS

Service Registration [23] is performed not only to register the UE with the IMS server, but

also to mutually authenticate them. It uses the SIP Registration procedure with the

Figure 2.5 IMS service flows.

IMS-AKA (Authentication and Key Agreement) [41]. When the IMS signaling security is enabled,

IPsec SAs (Security Associations) between the IMS server and the UE are established during the

registration. Third, the UE carries out IMS Service Session Establishment to establish an IMS

service session with another UE [23, 9] using SIP. The IMS text and call services have different

establishment procedures, which are initiated by the SIP MESSAGE and SIP

INVITE messages, respectively. In particular, to ensure carrier-grade IMS service quality, the IMS signaling

with SIP messages and the IMS media traffic with RTP/RTCP packets are both prioritized over the

traffic of mobile data services. Specifically, mobile data services are assigned best-effort

transmission with priority indexes ranging from 8 to 9 [34]. In contrast, IMS signaling is assigned

best-effort transmission with a priority of 1 (smaller values indicate higher priority), and IMS media

traffic is assigned guaranteed-bit-rate transmission.
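The relative ordering of these QoS levels can be made concrete with a small sketch. The priority of 1 for IMS signaling and of 9 for mobile data follow the text above; the priority of 2 given to the media flow, the flow names, and the idea of ordering flows purely by priority index are assumptions made for this illustration, not a description of an actual scheduler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QosFlow:
    name: str
    guaranteed_bit_rate: bool  # True for GBR flows, False for best-effort flows
    priority: int              # smaller value means higher priority


FLOWS = [
    QosFlow("internet-data", guaranteed_bit_rate=False, priority=9),
    QosFlow("ims-signaling-sip", guaranteed_bit_rate=False, priority=1),
    # Assumed priority value for the media flow; the text only states that it is GBR.
    QosFlow("ims-media-rtp", guaranteed_bit_rate=True, priority=2),
]


def service_order(flows):
    """Order flows from highest to lowest priority (smallest index first)."""
    return sorted(flows, key=lambda f: f.priority)


for flow in service_order(FLOWS):
    kind = "GBR" if flow.guaranteed_bit_rate else "best effort"
    print(f"{flow.priority}: {flow.name} ({kind})")
```

Under these assumptions, both IMS flows are served ahead of ordinary mobile data, which is what makes them attractive to abuse.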

[Figure 2.5 message sequence: IP Connectivity Establishment; IMS Service Registration (SIP Register, 401 Unauthorized, SIP Register, 200 OK); IMS Service Session Establishment, Case 1: Text over IMS (SIP MESSAGE, 202 Accepted, SIP MESSAGE, 200 OK); Case 2: Voice/Video Call over IMS (SIP INVITE, 100 Trying, Session Progress, 180 Ringing, 200 OK, then voice/video conversation over RTP/RTCP packets).]

2.2.2 IMS Voice Service Primer

In this section, we introduce the architecture and optimizations of the 5G/4G voice service (i.e.,

VoIMS). VoIMS is an essential VoIP-based voice solution for 5G/4G networks [69]. Figure 2.6

depicts its network architecture, main protocols, and a basic work flow.

Figure 2.6 Network architecture, main protocols, and an operation flow for 5G/4G voice over IMS (VoIMS).

5G/4G network architecture supporting VoIMS. It comprises two parts: the 5G/4G network

infrastructure and the IMS domain. The former provides User Equipment (UE, e.g., mobile phones)

with active mobile connections (user-plane data pipes) to deliver user traffic over IP within 5G/4G

networks. User traffic packets in turn traverse the UE, the base station and the gateways in the core

cellular network to reach the external Internet or the IMS domain (for 5G/4G voice), or vice versa.

The IMS domain comprises two key components: media gateway and signaling server. The former

delivers IP multimedia data (e.g., voice packets) to IMS clients (e.g., UEs); the latter processes all

signaling messages, which are used to establish and manage voice call sessions above IP.

Main protocols for VoIMS. The main protocols above IP are Session Initiation Protocol (SIP) and

Real-time Transport Protocol (RTP). SIP is used for voice signaling to initiate, maintain, modify,

and terminate voice calls over IP. RTP transmits a live multimedia stream over IP. VoIMS adopts

the same protocols as VoIP. Below IP, the main protocol is PDCP [46]. It performs three main

functions within 5G/4G networks. First, it compresses the IP headers of user-plane data packets

to improve transmission efficiency over the air. Second, it encrypts and decrypts user-plane data, namely, IP packets. The session keys are

generated through 5G/4G security functions in the control plane [44, 43]. Third, it dispatches the

upper-layer data to their corresponding radio bearers: Dedicated Radio Bearer (DRB) and Signaling

Radio Bearer (SRB). DRB is used to carry traffic in the user plane, and SRB is for the 5G/4G

signaling in the control plane. PDCP is the only Layer-2 protocol studied in this work because PDCP

wraps other lower L2/L1 protocols to offer a user-plane pipe for IP packet delivery. Conceptually,

there is no difference between 5G and 4G except that 5G supports varying QoS settings for distinct

IP data flows [33].

VoIMS call flow. As illustrated in Figure 2.6, a VoIMS call typically takes three steps: establishment

( 1 ), call conversation ( 2 ), and termination ( 3 ), if a 5G/4G user-plane pipe below IP is available.

Otherwise, it first establishes this pipe ( 0 ). This pipe is encrypted using the keys derived

from the mutual authentication between the UE and the network. A VoIMS call session is established

by SIP signaling; it starts when someone dials a phone number to generate a call request and ends

when the call request is accepted by the other call party ( 1 ). A call conversation is then carried over

this established call session ( 2 ). The voice call application uses a speech audio codec to convert

voice traffic into a digital format, which is later delivered by RTP. To end the call, SIP is used again

to terminate the VoIMS call session ( 3 ). Both RTP and SIP packets are further encapsulated into

IP packets for delivery. Specifically, they are forwarded to the IMS through the user-plane (PDCP)

pipe provided and encrypted by 5G/4G networks.

Voice enhancement techniques. As illustrated in Figure 2.6, four techniques have been introduced

by 3GPP to enhance the quality and efficiency of VoIMS services. From

bottom up, they include special radio bearers (E1), ROHC (E2), comfort noise (CN) (E3), and AMR

speech codecs (E4). VoIMS uses a special DRB with a guaranteed-bit-rate to ensure sufficient radio

bandwidth for voice [69]; ROHC compresses the headers of VoIMS packets to reduce transmission

overhead [46]; CN injects some background noise to prevent an unexpected call termination caused

by a period of total silence [25]; AMR speech codecs (e.g., AMR [25], EVS [28]) offer adaptive rates

for voice speech, and VoIMS uses a lower coding rate for unvoiced packets that carry background

noise. These techniques indeed enhance 5G/4G voice quality and efficiency. However, we find that,

taken together, they bring unanticipated and previously unreported side effects

that leak confidential call information despite encryption.
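The intuition behind such a leak can be sketched as follows. Because unvoiced packets use a lower coding rate, they tend to be smaller than speech packets, and encryption does not hide packet sizes. The sketch below is only an illustration of that idea and not the analysis developed in this dissertation: the size threshold and the observed sizes are made-up values, and real sizes depend on the codec mode and header compression.

```python
# Hypothetical threshold (in bytes) separating small comfort-noise packets
# from larger speech packets; not a value measured in this work.
SILENCE_MAX_BYTES = 20


def label_packets(sizes, threshold=SILENCE_MAX_BYTES):
    """Label each observed (encrypted) packet size as speech or silence."""
    return ["silence" if size <= threshold else "speech" for size in sizes]


def talk_spurts(labels):
    """Collapse consecutive identical labels into (label, run_length) pairs."""
    runs = []
    for label in labels:
        if runs and runs[-1][0] == label:
            runs[-1] = (label, runs[-1][1] + 1)
        else:
            runs.append((label, 1))
    return runs


# Made-up sizes of encrypted packets seen by a passive observer.
observed_sizes = [45, 44, 46, 12, 11, 12, 45, 47]
print(talk_spurts(label_packets(observed_sizes)))
```

Even this crude labeling recovers when a party is talking and for how long, without decrypting any packet.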

2.3 State of the Art on IMS over Mobile Networks

Many studies have explored the security issues of IMS services from the perspectives of mobile equipment and

mobile network infrastructure.

Mobile Equipment. The IMS security of the ME has attracted much attention recently. The related

studies can be classified into two directions, namely, IMS service abuse and DoS attacks. In the first

direction, [123] studies the insecurity of the IMS-based SMS and then uncovers the corresponding

SMS abuse and spoofing attacks. [84] compromises the phone modem to abuse the IMS voice

session to transmit malicious data. [59] defends against the caller-ID spoofing by verifying the

caller’s call state based on a callback.

The other direction focuses on DoS attacks against IMS services. Specifically, [99] hijacks the

VoWiFi signaling session to launch stealthy IMS call DoS attacks based on an insecure design of the

call state machine. [92] spams the voice bearer to launch a DoS attack by muting an ongoing VoLTE

call. [76] presents several vulnerabilities, including an improper cross-layer security binding, for the

IMS service, thereby causing DoS attacks on the cellular emergency service against anonymous

UEs. [140] introduces side-channel inference techniques to identify specific IMS call signaling

messages and launch DoS on the IMS service over Wi-Fi.

Network infrastructure. Several works focus on the insecurity of the IMS server deployed

in the cellular network infrastructure. They can be classified into two categories. First, two

studies [101, 120] investigate potential flooding and DoS attacks against the IMS server. Specifically,

one [101] shows that an adversary can flood the IMS server with SIP registration messages,

consuming extra CPU processing power on the server. The other [120] shows that abrupt changes in

the content of SIP session requests, as well as in the SIP message sequence, can be used as detection

features for IMS flooding. Second, three research works [127, 114, 53] target the authentication and

privacy of IMS sessions at the IMS server. They observe that differentiated call response

times can be used to identify cellular IoT devices, introduce an attack that eavesdrops on the victim’s

VoLTE call based on an implementation flaw of reusing the network key stream, and uncover that the

weak requirement of network certification in the standard may cause the leakage of the IMSI/APN

information for a UE involved in a VoWiFi call, respectively.

Over recent years, the realm of 4G/5G voice security on the network infrastructure side has

garnered increasing attention from the research community [81, 91, 98, 105, 57, 58, 75, 113].

Previous investigations have predominantly delved into various security challenges, including

aspects such as VoLTE call reliability [81], unauthorized data access via VoLTE signaling [91],

caller spoofing attacks [85], Denial of Service (DoS) attacks [98, 58], 911 call security [75], and

overall security analyses [57]. Intriguingly, a study [58] enabled voice monitoring and harnessed

vulnerabilities in 5G standalone networks (specifically, EPS fallback) while also exposing encryption

algorithm insecurities in 2G GSM networks. A separate recent investigation suggested that encrypted

packets of VoLTE calls might be susceptible to decryption [113]. This claim hinged on the reuse of

encryption keys across different VoIMS calls by the same mobile user, despite such key reuse being

explicitly prohibited by 3GPP standards [32].

CHAPTER 3

SECURING IP MULTIMEDIA SUBSYSTEM (IMS) ON MOBILE DEVICES

The IP Multimedia Subsystem (IMS) delivers IP multimedia services, such as voice/video calling

and texting, to mobile users over cellular networks. In the past two decades, IMS services have

been augmented to support various access networks, incorporating VoLTE (Voice over LTE), VoNR

(Voice over New Radio), and VoWi-Fi (Voice over Wi-Fi). IMS security is also enhanced with a

suite of well-examined mechanisms, including 5G/4G AKA (Authentication and Key Agreement),

cellular-specific multi-layer security, and IMS media security. Specifically, secret keys required for

IMS sessions [40] are derived from the AKA mutual authentication, wireless transmission over the

air (Layer 2) is encrypted using the derived keys, and the IP session (Layer 3) is secured by Internet

Protocol Security (IPsec) [42]. Moreover, network operators enforce additional measures such as

STIR/SHAKEN [129] required by the FCC for caller ID authentication, protecting IMS services

from malicious attacks.

However, these security enhancements are primarily centered on cellular network infrastructure.

Our security analysis reveals that security measures on the mobile equipment (ME) side have remained

relatively unchanged over the years. Meanwhile, MEs have advanced considerably; for example, smartphone

vendors have migrated the IMS client from 5G/4G modem chips to application processors and

segregated IMS voice and video media processing within modem chips and application processors.

Unfortunately, we find that 3GPP-mandated IMS security measures on the ME side fail to keep pace

with device-side technological advances, resulting in new security vulnerabilities and unprecedented

attacks.

Our security analysis on ME shows that neither IMS media sessions nor their control signaling is

well protected. Specifically, we discover four new vulnerabilities: (V1) unprotected IMS signaling

routing, (V2) unrestricted IMS signaling source, (V3) unprotected video data delivery, and (V4)

unrestricted source for IMS video delivery. Details are elaborated in §3.2 and §3.3. By exploiting

these vulnerabilities on ME, we further develop three proof-of-concept attacks against IMS services:

(A1) Denial of Service over All Networks (DoS-ALL), (A2) Named SMS Source Spoofing

Category: Unprotected ME Routing for IMS Client Signaling (§3.2)

  Vulnerabilities:
    V1. Unprotected IMS Signaling Routing. ME does not ensure that all outgoing IMS signaling messages are sent to the IMS servers deployed by network operators; routing to malicious programs at the ME is allowed. (§3.2.1)
    V2. Unrestricted IMS Signaling Source. ME does not protect IMS client software from receiving IMS signaling messages originated from non-IMS servers (say, local apps). (§3.2.2)

  Attacks:
    [A1] DoS-ALL, a novel DoS attack that prevents IMS clients from using all access networks over Wi-Fi, 4G LTE, 5G NR. (§3.2.3.1)
    [A2] NameSpoofing, an SMS spoofing attack that fabricates the sender’s name, which is prohibited by the network. (§3.2.3.2)

  Empirical Validation:
    Carrier: US-I, US-II, US-III, TW-I, TW-II
    Device: LG (G3, G7), TCL (40 XL), Samsung (S8, S10, S21)
    OS: Android 4.4.2, 7, 8, 9, 11, 13

Category: Insecure ME Access for IMS Media Sessions (§3.3)

  Vulnerabilities:
    V3. Unprotected Video Data Delivery. The IMS media transmission between IMS client and cellular network modem is not provided with confidentiality and integrity protection. (§3.3.1)
    V4. Unrestricted Source for IMS Video Delivery. Cellular network modem cannot verify whether IMS video data is transmitted by IMS clients or other non-IMS applications. (§5.2)

  Attacks:
    [A3] ViIMS-ANY, an attack that abuses ViIMS as a covert communication channel between two malicious MEs, bypassing operator policies. (§3.3.3)

  Empirical Validation:
    Carrier: US-I†, US-II†
    Device: LG (G3), Samsung (S8, S10), Google (Pixel 1/3/5/7)
    OS: Android 4.4.2, 7, 8.1, 10, 13

Table 3.1 Summary of four vulnerabilities and three proof-of-concept attacks in this work. Note: † ViIMS experiments were conducted in US-I and US-II because US-III supports ViIMS only with very limited phone models; TW-I and TW-II do not support ViIMS yet.

(NameSpoofing), and (A3) Covert Communications over Video-over-IMS (ViIMS-ANY).

The first attack, DoS-ALL, prevents the victim phones from accessing IMS services over all

access networks, including Wi-Fi, 4G LTE, and 5G NR. It is more threatening than any previously

reported DoS attack; it not only denies IMS service access over Wi-Fi networks but also prevents

access over all alternative cellular networks. The second attack, NameSpoofing, creates a fake short

message with a fabricated sender name, which the cellular infrastructure prohibits mobile users from setting.

Figure 3.1 gives an illustrative example where NameSpoofing is successfully launched on our lab

smartphone and the victim receives a message from “Mark Zuckerberg verified by Verizon”. Unlike

Figure 3.1 A successful NameSpoofing attack.

the existing SMS spoofing attacks, NameSpoofing is much more threatening because it fabricates

the sender name instead of the phone number. Note that network operators do not allow SMS users

to fabricate the sender’s name (here, Mark Zuckerberg) even though the phone number is spoofed;

more importantly, “verified by Verizon” cannot be added to the sender’s name unless Verizon

authenticates that the sender number is not spoofed and is truly used by Mark Zuckerberg. It is thus much

harder for victims to tell whether they are being targeted by smishing/phishing attacks, particularly

when the names are “verified” by network operators. In comparison to fake Amber/Wireless alert

attacks [90, 54] that primarily target emergency attack scenarios, this attack is applicable to a broader

range of attack scenarios.

The third attack, ViIMS-ANY, abuses ViIMS, which is designated for delivering video calls

over IMS. Two adversary MEs use ViIMS to carry arbitrary data, thereby obtaining

guaranteed bit rates and a higher service priority that normal data services should not have. As such,

ViIMS-ANY bypasses data service policies enforced by operators.

Table 3.1 summarizes new vulnerabilities and attacks, which are experimentally validated using

commodity phones with three top-tier U.S. carriers and two major operators in Taiwan. We further

propose countermeasures to address identified vulnerabilities and evaluate their effectiveness (§3.5).

3.1 Threat Model and Methodology

To support multimedia services, the IMS client is designed differently from the traditional client,

which offers circuit-switched call and text services. It involves both control-plane and data-plane operations.

For the control plane, the IMS client is expected to support various multimedia services and to be

updated dynamically, so most phone models implement it in software

as a mobile application. Compared with the traditional client implemented in the phone

modem, it has a larger attack surface and may thus be more vulnerable. It may suffer from the

hijacking of the IMS signaling session by an attacker who acquires root privilege [99, 84] or from the delivery

of spoofed signaling messages (e.g., SMS) over unprotected packet routing. In §3.2, we focus

on the latter security threat, which has not been explored before.

For the data plane, there are currently two major multimedia services: Voice over IMS (VoIMS)

and ViIMS. These two services are supported in different ways according to their different processing

resource requirements. The voice data of VoIMS is processed by the phone modem; thus, the

corresponding voice packets cannot be captured in the mobile OS. They are inherently protected

by the hardware security of the modem. Although the modem can still be compromised by some

specialized tools (e.g., QXDM [108]), thereby causing its voice session to be hijacked [84], the

security threat is limited since such an attack assumes attacker capabilities that are too strong to be

practical. In contrast, because video processing demands substantial resources, a multi-core application processor

is used to process the video data of ViIMS. This component might lack the conventional

hardware security protection from the modem, thus broadening the attack surface. This motivates

us to investigate the security of the IMS data-plane framework in §3.3.

We next present the threat model and experimental methodology, ethical considerations, as well

as responsible disclosure.

Threat model. For the control-plane security threats of IMS services in §3.2, victims are mobile

users with a subscription to operational IMS services, whereas the adversary develops a malware

application and installs it on the victims’ MEs; notably, malware can propagate in many known

ways [106], and propagation is not our focus. The malware application does not require any root

privileges. Specifically, the DoS-ALL attack (A1) encompasses two attack scenarios with distinct

requirements. First, the adversary compromises the Wi-Fi router that the victim ME connects to and

assigns the ME a Wi-Fi configuration during the initial Wi-Fi association. Such a malicious router

can be deployed in some public areas (e.g., cafes, airports, and restaurants). In this case, the malware

requires at most the INTERNET permission. Worse, if the victim’s ME supports

IPv6, the malware is not needed at all. Second, if the victim ME is not connected to the malicious

Wi-Fi network, the malware is required and needs not only the INTERNET permission but also the

BIND_VPN_SERVICE permission. As for the NameSpoofing attack (A2), the malware requires the

INTERNET permission, and the BIND_VPN_SERVICE permission is also needed for victim MEs

running Android 9 or higher.

For the data-plane security threats in §3.3, victims are mobile operators, and adversaries are

mobile users abusing IMS video channels. For the ViIMS-ANY attack (A3), it is assumed that the

adversary can install a malware application with root privileges on their own ViIMS-supported MEs.

This assumption is practical since the compromised MEs are attack devices held by the adversary

for attacking the infrastructure. Notably, only the ME software is compromised, but the others,

including the ME hardware, are not.

Methodology. To validate the presented security threats, we conduct experiments in the networks

of three top-tier U.S. carriers and two Taiwan carriers, which are denoted as US-I, US-II, US-III,

TW-I, and TW-II for privacy reasons. We mainly focus on 4G networks and 5G NSA

(Non-Standalone) networks, since the 5G SA (Standalone) network has not been widely deployed

yet.

In total, we test 10 carrier-certified COTS (Commercial Off-The-Shelf) phone models, namely

LG G3/G7, Samsung S8/S10/S21, Google Pixel 1/3/5/7, and TCL 40XL, from four major brands.

Their Android versions range from 4.4.2 to 13. We choose Android phones

because Android holds the largest share, 71.8% [1], of the worldwide

mobile OS market.

Ethical consideration. We bear in mind that some feasibility tests and attack evaluations may

harm mobile users or carriers, so we conduct all the experimental studies in two responsible ways.

First, we use only our own phones as the victim UEs. Second, we purchase unlimited plans for

the text, call, and data services on all the tested phones. Notably, we do not seek to cause any

unnecessary damage but rather to make a disclosure about potential security threats in operational

mobile networks.

Responsible disclosure. We have reported all the identified vulnerabilities to the parties involved,

including mobile OS vendors, phone manufacturers, and carriers. The proposed remedies have also

been provided to them.

3.2 Unprotected ME Routing for IMS Client Signaling

The ME routing requirement for the IMS client signaling seems to be simple and easily fulfilled,

but the routing may not be properly restricted or protected from a security perspective. Specifically, the requirement needs to

cover the delivery of both incoming and outgoing IMS signaling messages; the incoming ones shall

originate from the IMS server and be delivered to the IMS client, whereas the outgoing ones shall be

sent in the opposite direction. The other routing rules shall be prohibited; otherwise, some security

threats may occur, e.g., the IMS server or the IMS client is spammed/spoofed by a third-party entity,

and the IMS signaling session is hijacked by a man-in-the-middle (MiTM) attack.

However, we discover that the routing rules for the IMS client signaling on the ME are not

restricted exclusively to this routing requirement; that is, the IMS signaling messages

may be received from non-IMS parties or be maliciously routed to them. In the following, we

present the corresponding two security vulnerabilities, namely (1) unprotected IMS signaling routing

and (2) unrestricted IMS signaling source, and introduce two proof-of-concept attacks to show the

real-world impact.

3.2.1 V1. Unprotected IMS Signaling Routing

In the mobile OS, a network interface is created for the exclusive use of the IMS service (e.g.,

“rmnet_data0”), designated as an IMS interface, and is associated with a set of routing rules to

route the IMS signaling. According to our investigation on Android OS with versions from 4 to 13,

the IMS signaling routing is implemented by two components: RPDB (Routing Policy Database)

and iptables. The RPDB defines the priority of routing policies, as shown in Figure 3.2, and each

routing policy specifies a rule matching a routing table managed by the iptables. For example, the

highest priority is the rule at the topmost line, “from all lookup local”, which means looking

Figure 3.2 Routing Policy Database (RPDB).

up the “local” routing table for the packets from “all” sources. For each packet, the first matched

rule in the priority list is applied.

To support the IMS signaling over the IMS interface (e.g., “rmnet_data0”), we observe two

approaches. First, with the older Android versions, the IMS server address is explicitly

specified in a routing policy of the RPDB and the policy is to look up the routing table of the IMS

interface. Second, with the newer Android versions, the packets generated by the IMS client are

identified by a framework mark [4] that is assigned to the client, and the mark is associated with the

routing table in a routing policy. For example, the policy set to route IMS signaling packets in the

RPDB shown in Figure 3.2 is the bottommost one. It means that all the packets with the framework

mark (i.e., “fwmark”), “0x10fa4/0x1ffff”, are routed by looking up the “rmnet_data0” routing

table.
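The priority-ordered, mark-based rule selection described above can be sketched as follows. This is a simplified model rather than the kernel's implementation: it covers only the "from all" and fwmark selectors, and it reports which routing tables would be consulted, in order, for a packet carrying a given mark. The rule values are taken from Figure 3.2; the class and function names are hypothetical. A rule written as fwmark VALUE/MASK matches when the packet mark and VALUE agree on every bit set in MASK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    priority: int                 # smaller value is evaluated first
    table: str                    # routing table looked up when the rule matches
    fwmark: Optional[int] = None  # None models a plain "from all" rule
    mask: int = 0

    def matches(self, packet_mark: int) -> bool:
        if self.fwmark is None:
            return True
        # Compare only the bits selected by the mask.
        return ((packet_mark ^ self.fwmark) & self.mask) == 0


# A subset of the rules shown in Figure 3.2.
RPDB = [
    Rule(13000, "local_network", fwmark=0x10063, mask=0x1FFFF),
    Rule(13000, "rmnet_data0", fwmark=0x10FA4, mask=0x1FFFF),
    Rule(10000, "legacy_system", fwmark=0xC0000, mask=0xD0000),
    Rule(0, "local"),
]


def tables_consulted(packet_mark, rules=RPDB):
    """List the tables of all matching rules, highest priority first."""
    ordered = sorted(rules, key=lambda rule: rule.priority)
    return [rule.table for rule in ordered if rule.matches(packet_mark)]


# A packet carrying the IMS client's mark reaches "local" before "rmnet_data0".
print(tables_consulted(0x10FA4))
```

The point to notice is that the "local" table is always consulted before the IMS-specific rule, regardless of the packet's mark.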

Seemingly, the routing of the IMS signaling is secure, since the non-IMS applications without

root privilege are not allowed to modify the RPDB and routing tables. However, we discover that

a routing rule that matches and routes the IMS signaling packets before they reach

the IMS routing policy can still be added through operations available to normal

applications without root privilege. For example, activating the VPN service on an Android phone

creates a virtual interface (e.g., “tun0”) and assigns the interface an address; connecting

the phone to a Wi-Fi network assigns an IP address to the Wi-Fi interface, which is

usually given by the DHCP service of the Wi-Fi network. Once the adversary compromises

[Figure 3.2 content: RPDB routing policies, highest priority first]
0: from all lookup local
10000: from all fwmark 0xc0000/0xd0000 lookup legacy_system
10500: from all iif lo oif dummy0 uidrange 0-0 lookup dummy0
10500: from all iif lo oif rmnet_data0 uidrange 0-0 lookup rmnet_data0
10500: from all iif lo oif rmnet_data1 uidrange 0-0 lookup rmnet_data1
10500: from all iif lo oif swlan0 uidrange 0-0 lookup local_network
13000: from all fwmark 0x10063/0x1ffff iif lo lookup local_network
13000: from all fwmark 0x10fa4/0x1ffff iif lo lookup rmnet_data0
13000: from all fwmark 0xd0254/0xdffff iif lo lookup rmnet_data1

(a) Before activating the malicious VPN service.

(b) After activating the malicious VPN service.

Figure 3.3 The routing rules of the ’local’ routing table.

Figure 3.4 Failing to send SMS messages, there are no responses received from the IMS server.

a VPN application or a Wi-Fi network, the assigned IP address can be set to the IMS server’s IP

address and the IMS signaling can be thus routed to a compromised network interface, instead of

the IMS interface.
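The effect of assigning the IMS server's address to a local interface can be illustrated with a toy lookup model. It is a deliberate simplification: routing tables are plain dictionaries with exact-match entries and an optional default route, the fwmark match of the IMS rule is assumed to succeed, and no longest-prefix matching is performed. The addresses and interface names are those shown in Figure 3.3; the function and variable names are hypothetical.

```python
# Address that Figure 3.3(b) shows on tun0, i.e., the IMS server's IP address.
IMS_SERVER = "2001:4888:2:fe40:a0:104:0:232"


def lookup(dst, tables, rules):
    """Return (table, device) from the first rule whose table can route dst."""
    for _priority, table in sorted(rules):
        routes = tables[table]
        if dst in routes:
            return table, routes[dst]
        if "default" in routes:
            return table, routes["default"]
    return None


# (priority, table) pairs; the IMS rule's fwmark match is assumed to succeed.
rules = [(13000, "rmnet_data0"), (0, "local")]
tables = {
    "local": {
        "::1": "lo",
        "2600:1007:110b:9b03:c923:69ce:83f:d04d": "rmnet_data0",
    },
    "rmnet_data0": {"default": "rmnet_data0"},
}

before = lookup(IMS_SERVER, tables, rules)

# Effect of the VPN application assigning the IMS server's address to tun0:
# a new entry appears in the "local" table, which is consulted first.
tables["local"][IMS_SERVER] = "tun0"
after = lookup(IMS_SERVER, tables, rules)

print("before:", before)
print("after: ", after)
```

In this model, the packet destined for the IMS server leaves through the IMS interface before the change and is captured by the highest-priority "local" table afterward, without any modification to the RPDB itself.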

Experimental validation. We validate this vulnerability by developing a VPN application that

assigns the IMS server’s IP address to the virtual interface. The experiment is conducted across

three U.S. carriers and two Taiwan carriers. For each tested phone, we activate the VPN service

using the developed application, and then send one SMS message to another phone number using

the GUI of the SMS service.

For all the tested phones, we observe that the VPN application can successfully assign the

IMS server’s IP address to its established virtual network interface, and its routing information is

updated in the “local” routing table, as shown in Figure 3.3. Moreover, the outgoing IMS signaling

messages that carry SMS messages are routed to the VPN interface (i.e., “tun0”) instead of the IMS

interface. As a result, the delivery of the SMS messages fails and no responses from the IMS server

are received, as shown in Figure 3.4, for all the phones except Google Pixel ones.

(Figure 3.3 content)
Before Opening the VPN interface:
local ::1 dev lo proto kernel metric 0 pref medium
local 2600:1007:110b:9b03:c923:69ce:83f:d04d dev rmnet_data0 proto kernel metric 0 pref medium

VPN is using the IMS server IP:
local ::1 dev lo proto kernel metric 0 pref medium
local 2001:4888:2:fe40:a0:104:0:232 dev tun0 proto kernel metric 0 pref medium
local 2600:1007:110b:9b03:c923:69ce:83f:d04d dev rmnet_data0 proto kernel metric 0 pref medium

(Figure 3.4 annotation: Retransmission until timeout since the IMS client receives no response.)

Figure 3.5 Pixel phone modem’s extended debug messages collected via the QXDM [108].

The reason why this vulnerability does not work for the Google Pixel phones is that they

use the IMS client built into the phone modem to access IMS services, instead of the

software-based IMS client in the Android OS, which is employed by the other tested phones. The

modem-based IMS client can be accessed by Android applications via QMI (Qualcomm MSM

Interface) [134], which is a proprietary interface for interacting with Qualcomm baseband processors.

For example, to initiate a VoIMS call, the call application of a Pixel phone sends its modem a

QMI_VOICE_DIAL_CALL_REQ message with a specified calling number and the call type (e.g.,

emergency and auto-selected). The modem then starts the call setup procedure by transmitting a

SIP INVITE message to the cellular infrastructure without involving the Android OS, as shown in

Figure 3.5. Thus, this modem-based operation makes the Pixel phones immune to V1.

Notably, common VPN applications are not allowed to intercept the IMS signaling without V1.

Although they handle most data transmissions on the phones and malicious ones may cause

severe attacks, VPN data transmissions do not cover the IMS signaling transmitted

over 4G/5G networks. The importance of V1 lies in its ability to allow malware to intercept IMS

signaling messages across all radio access networks, creating a new attack surface.

Root cause and lesson learned. This vulnerability arises from a conventional function (i.e., packet

routing) on phones, but the root cause is still a design issue from the IMS standard; that is, there

is a lack of security protection over the IMS signaling routing on the phones. The mobile OS has

fulfilled the requirement of routing all the packets generated from the IMS client to the IMS server,

and this routing policy cannot be modified without root privilege. Since the IMS standard specifies no

explicit security requirement for IMS signaling routing, the mobile OS should not take the

blame. To address this vulnerability, a new security mechanism is needed to prevent any potential

Security Association   Protocol   IMS Client (IP, Port)   Direction   IMS Server (IP, Port)
1                      TCP        IP_A, Server Port A     ↔           IP_B, Client Port B
2                      TCP        IP_A, Client Port A     ↔           IP_B, Server Port B
3                      UDP        IP_A, Server Port A     ↔           IP_B, Client Port B
4                      UDP        IP_A, Client Port A     ↔           IP_B, Server Port B

Table 3.2 Four IPsec SAs needed for IMS services.

IMS-related policies or rules from nullifying the actual IMS routing policy.
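
One possible form of such a check is sketched below, purely as an illustration: before a non-IMS interface is configured, its addresses are compared against the IMS server addresses in use. The function name `conflicting_interfaces`, the interface names, and the addresses (documentation prefixes) are hypothetical and do not correspond to any mechanism defined by the IMS standard or by Android.

```python
import ipaddress

def conflicting_interfaces(interface_addrs, ims_servers):
    """Return (interface, address) pairs whose address equals an IMS server address."""
    # Parsing normalizes textual variants (case, zero groups) before comparison.
    servers = {ipaddress.ip_address(a) for a in ims_servers}
    return [
        (ifname, addr)
        for ifname, addrs in interface_addrs.items()
        for addr in addrs
        if ipaddress.ip_address(addr) in servers
    ]

# Hypothetical snapshot of non-IMS interface addresses on a UE.
interfaces = {
    "rmnet_data0": ["2001:db8:aaaa::10"],
    "wlan0": ["192.0.2.25"],
    "tun0": ["2001:db8:1::232"],
}
# Hypothetical IMS server addresses, one written in a different textual form.
ims_servers = ["2001:DB8:1:0::232", "198.51.100.7"]

conflicts = conflicting_interfaces(interfaces, ims_servers)
print(conflicts)  # [('tun0', '2001:db8:1::232')]
```

A real mechanism would also need to decide what to do on a conflict (e.g., reject the address assignment); that policy is outside the scope of this sketch.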

3.2.2 V2. Unrestricted IMS Signaling Source

When IMS signaling security is enabled, the IPsec SAs between the IMS client and the IMS

server will be established during the IMS registration procedure [50]. The number of the established

IPsec SAs can be up to four, as shown in Table 3.2, since the SIP messages are transmitted in two

directions, i.e., outgoing and incoming, and can be sent over UDP and TCP. According to the IMS

standard [50], all the packets belonging to these four IPsec SAs shall be offered encryption and

integrity protection.

Nevertheless, the IMS standard [50] does not expressly specify that the packets which are sent to

the IMS client or the server but do not belong to those four IPsec SAs shall be discarded, so the ME

may still route them based on its own policy, e.g., a pass-through policy. Especially when the source

of IMS signaling packets is not restricted, a malware application on the victim UE may be allowed

to send fabricated IMS signaling packets to the IMS client locally on the same UE. Given that SMS

messages are delivered based on the IMS signaling, this vulnerability can be exploited to launch

SMS spoofing attacks when the IMS client accepts and processes the fabricated packets.
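
The gap can be made concrete with a minimal sketch of the stricter policy that the standard does not mandate: a packet addressed to the IMS client is accepted only if it matches one of the four SAs in Table 3.2. This is a simplified model that matches on protocol, address, and port only; real IPsec processing also involves SPIs and cryptographic verification. All addresses and port numbers are hypothetical.

```python
from typing import NamedTuple

class Endpoint(NamedTuple):
    ip: str
    port: int

class SA(NamedTuple):
    protocol: str
    client: Endpoint  # IMS client side
    server: Endpoint  # IMS server side

def build_sas(ip_a, client_port_a, server_port_a, ip_b, client_port_b, server_port_b):
    """Build the four SAs of Table 3.2 (two port pairs, each over TCP and UDP)."""
    sas = []
    for proto in ("TCP", "UDP"):
        sas.append(SA(proto, Endpoint(ip_a, server_port_a), Endpoint(ip_b, client_port_b)))
        sas.append(SA(proto, Endpoint(ip_a, client_port_a), Endpoint(ip_b, server_port_b)))
    return sas

def accept_inbound(sas, protocol, src, dst):
    """Strict policy: accept a packet sent to the IMS client only if it belongs to an SA."""
    return any(
        sa.protocol == protocol and sa.server == src and sa.client == dst
        for sa in sas
    )

# Hypothetical addresses and ports.
sas = build_sas("192.0.2.2", 50000, 50001, "192.0.2.1", 60000, 60001)

legit = accept_inbound(sas, "UDP", Endpoint("192.0.2.1", 60000), Endpoint("192.0.2.2", 50001))
forged = accept_inbound(sas, "UDP", Endpoint("192.0.2.2", 40000), Endpoint("192.0.2.2", 5060))
print(legit, forged)  # True False
```

Under a pass-through policy, the second packet would instead be delivered to the IMS client, which is the behavior that V2 relies on.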

Experimental validation. We validate this vulnerability by developing an Android application,

designated as FakeIMSSingaling, with only the INTERNET permission. It can fabricate one type of

IMS signaling message, SIP MESSAGE [35], which is designed to carry SMS messages. Given

the IMS client’s IP and port, FakeIMSSingaling sends fabricated signaling messages to the IMS

client using a local UE IP address as the source IP address, which is different from the IMS server’s.

The experiment is conducted in the networks of three U.S. carriers and two Taiwan carriers.

Figure 3.6 A forged SIP message containing an SMS message.

Figure 3.7 Packet traces collected from US-I (top, w/o IPsec), US-II (middle, with IPsec), and
US-III (bottom, with IPsec).

Figure 3.8 The number of the observed new IMS server IP addresses over time for three U.S. carriers
and two Taiwan carriers.

For each tested phone, the application sends a plain-text IMS signaling message, in the format

of SIP MESSAGE, as illustrated in Figure 3.6, to the IMS client, while the tcpdump program

captures routed packets. The message is assigned a UDP port number that is not used by the

four established IPsec SAs if IPsec is adopted for the IMS signaling security (US-II, US-III, and

TW-I); otherwise, the UDP port number is randomly selected (US-I and TW-II).

Our experimental results show that the fabricated IMS signaling message can be locally routed to

the IMS client for all the tested carriers no matter whether the IPsec is used, as shown in Figure 3.7.

However, this vulnerability does not apply to every tested phone: the Google Pixel phones are an

exception. As explained in §3.2.1, they employ the modem-based IMS client, instead of the IMS

client in the Android OS, so no IMS signaling messages can be observed in the OS or sent to

the IMS client.

Root cause and lesson learned. At first glance, phone vendors should take most of the blame. On

second thought, this may not be the case. The reason is twofold. First, the common socket

communication allows interprocess communication within a system so that the malware can use

it to easily send fabricated SIP packets to the IMS client within the same system. However, the

IMS standard does not explicitly prohibit it. Second, the phone vendors indeed fulfill the IPsec

requirement for the IMS signaling, but the IMS standard does not stipulate how to deal with the

packets which do not belong to the IPsec SAs but are sent to the IMS client. In view of these root

causes, the vulnerability arises from a design issue of the IMS standard that the IMS client is not

protected from receiving messages originating from non-IMS servers. It thus calls for a new security

mechanism to ensure the source of the IMS signaling for the IMS client.

3.2.3 Proof-of-concept Attacks

We devise two novel attacks using the vulnerabilities V1 and V2, respectively: (1) Denial of

Service over All Networks, designated as DoS-All; and (2) Named SMS Source Spoofing,

designated as NameSpoofing. DoS-All launches denial of IMS services by exploiting the Wi-Fi

association of the victim UE or installing a malware application without root privilege on the UE.

It causes the IMS services not only to suffer over Wi-Fi [139] but also to be blocked over cellular

radios (e.g., 5G NR). NameSpoofing allows the malware to send spoofed SMS messages with the

sender nicknames arbitrarily assigned (e.g., “Daddy”), to the IMS client locally on the victim UE;

they can be successfully received by the SMS application and shown to the victim. Notably, these

two attacks do not require root privilege and have been successfully validated with Android versions

ranging from 4.4.2 to 13.

(a) Before the attack, IMS signaling is untouchable.

(b) After the attack, IMS signaling is redirected.

Figure 3.9 Overview of DoS-All attack on IMS signaling routing.

3.2.3.1 DoS-All Attack

This attack can cause denial of IMS services for the victim UE by exploiting V1, even though the

IMS service continuity between different access networks is supported (e.g., when an IMS service

is not available over Wi-Fi, its offering can be handed over to another access network, 4G LTE or

5G NR [139, 138]). Such an attack is challenging, since IMS services must be blocked

on all the access networks, particularly the cellular ones. Another reason IMS

traffic is difficult to block lies in its transmission through a dedicated network interface that operates

independently of the standard data path. This isolation makes it inaccessible to conventional VPN

applications, which cannot capture or interfere with the traffic—as illustrated in Figure 3.9a.

By exploiting vulnerability V1, the adversary can assign the IP address of the IMS server serving

the victim UE to one of the UE’s network interfaces (i.e., Wi-Fi or VPN),

causing all IMS signaling messages to be delivered to that local network interface. This prevents the

IMS client from communicating with the IMS server, no matter which access network is used—as

illustrated in Figure 3.9b. There are two attack cases. First, the victim UE associates with a

compromised Wi-Fi network, and the Wi-Fi interface is maliciously assigned the IMS IP address.

In this case, no malware application is needed on the victim UE. Second, the victim UE installs a

compromised VPN application, so the VPN interface is abused to be assigned the IMS IP address.

However, it is not trivial to get the IMS server’s IP address assigned to the victim UE in practice.

With Android version 10 or lower, this information can be easily obtained when the

“READ_PHONE_STATE” permission is granted. Later Android versions restrict access

to this IMS information to holders of a privileged permission, “READ_PRIVILEGED_PHONE_STATE”.

To avoid the requirement of privileged permissions, we propose that the adversary collects a list

of IMS server IP addresses in advance within the proximity of each victim UE and assigns them to

it during the attack. This mechanism is motivated by the observation that multiple UEs at nearby

locations are likely assigned the IMS servers from the same pool; it is also reasonable that serving

the UEs in a given range requires only a few IMS servers to be deployed. We conduct an experiment

to validate the effectiveness of this mechanism for all the tested carrier networks. In this experiment,

we disable and enable the airplane mode periodically on tested phones to trigger a new assignment

of the IMS server IP while moving to different locations over time. The locations span

up to 400 km and 181 km for the experiments in the U.S. and Taiwan, respectively.
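
The measurement can be summarized with a short sketch that counts how many previously unseen IMS server addresses appear in each hour of a trace. It only illustrates the bookkeeping and is not the tooling used in the experiment; the function name `new_addresses_per_hour`, the observation format, and the sample data are assumptions.

```python
from collections import Counter

def new_addresses_per_hour(observations):
    """Count first-seen addresses per hour.

    observations: iterable of (elapsed_hours, address) pairs in time order.
    Returns a dict mapping the integer hour to the number of new addresses.
    """
    seen = set()
    counts = Counter()
    for hours, addr in observations:
        if addr not in seen:
            seen.add(addr)
            counts[int(hours)] += 1
    return dict(counts)

# Hypothetical trace: one observation per airplane-mode toggle.
trace = [
    (0.1, "2001:db8::1"),
    (0.5, "2001:db8::2"),
    (0.9, "2001:db8::1"),  # already seen
    (1.2, "2001:db8::3"),
    (3.0, "2001:db8::2"),  # already seen
]

print(new_addresses_per_hour(trace))  # {0: 2, 1: 1}
```

When no new entries appear after some hour, the collected set can be treated as the candidate pool for that area.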

As shown in Figure 3.8, where the number of the observed new IP addresses over time varies at

different locations, there are two main findings. First, the number of the IMS IP addresses assigned

to the UEs within nearby areas is limited; specifically, there are 16, 7, 40, 2, and 2 different IP

addresses for carriers, US-I, US-II, US-III, TW-I, and TW-II, respectively. Moreover, all the IP

addresses can be collected within a short time, since no more new IP addresses appear after the

first two hours in each experiment. Second, the collected IP addresses from two different areas for

each carrier have a large overlap in percentages: 57.1% (US-II), 92.5% (US-III), 100% (TW-I), and

100% (TW-II), except for US-I (0%). Moreover, in all tested carriers, the overlap percentage can

always reach 100% for any two locations no more than 5 km apart, which is much larger

than the Wi-Fi network range. This experimental result shows that the proposed mechanism can

allow the adversary to collect a set of potential IMS IP addresses for each target victim UE.
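
As a worked example of the overlap figures, the sketch below computes the overlap between two address sets. The text does not state how the percentage is defined; the sketch assumes it is the number of shared addresses divided by the number of distinct addresses across both areas. The addresses are hypothetical; under this assumption, 4 shared addresses out of 7 distinct ones give 57.1%, which would be consistent with the US-II numbers.

```python
def overlap_percentage(area_a, area_b):
    """Shared addresses as a percentage of all distinct addresses (assumed definition)."""
    a, b = set(area_a), set(area_b)
    union = a | b
    if not union:
        return 0.0
    return 100.0 * len(a & b) / len(union)

# Hypothetical address sets: 7 distinct addresses in total, 4 seen in both areas.
area_a = {f"2001:db8::{i}" for i in range(1, 7)}  # ::1 .. ::6
area_b = {f"2001:db8::{i}" for i in range(3, 8)}  # ::3 .. ::7

print(round(overlap_percentage(area_a, area_b), 1))  # 57.1
```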

Note that although the IMS IP address assigned to a victim UE cannot be accurately identified,

the Wi-Fi and VPN interfaces are both allowed to be assigned multiple IP addresses so that the set of

potential IMS IP addresses can be directly used for the attack. An experiment confirms that

up to 50 IP addresses can be successfully assigned to either the Wi-Fi or the VPN

interface; this number of assigned IP addresses is greater than that of the IMS IP addresses observed

for each carrier in the experiment.

Attack implementation and evaluation. We implement the DoS-All attack using two

approaches, namely a compromised Wi-Fi network and VPN malware. A successful attack

depends on whether the IMS server IP address assigned to the victim UE can be correctly assigned

to its Wi-Fi or VPN interface.

⋄ Compromised Wi-Fi network: We develop a customized DHCP server on a widely-used Wi-Fi

router, GL.iNet GL-AX1800, with OpenWrt 21.02. It assigns a set of prepared IMS IP addresses

to selected Wi-Fi clients (e.g., only smartphones) based on the device model name specified in

each DHCP request message. The DHCP server supports the assignment of both IPv4 and IPv6

addresses, where the tested three U.S. carriers all use IPv6, whereas the tested two Taiwan carriers

use IPv4. The IPv6 interface can be assigned multiple IP addresses with a mandatory multi-address

feature [112], but only a single IP address is accepted by the IPv4 one. To launch the attack against

the carriers using the IPv4 address type, a malware program with the INTERNET permission needs

to be deployed on the victim UE; notably, the malware is not needed when IPv6 networks are

supported by devices and carriers. The malware detects whether an assigned IP address is correct by listening

to the port numbers used by the IMS session (e.g., 5060); when it is correct, the malware can receive

IMS signaling messages. If the assigned IP address is incorrect, the malware disconnects the

UE so that it is assigned another IP address via the DHCP server. Notably, each of the

two Taiwan carriers using IPv4 has only two IMS IP addresses, so the IP address can be

correctly assigned with at most two assignments.

⋄ VPN malware: We develop a VPN malware application on Android phones and deploy a VPN

server on the Internet. The VPN malware creates and manages the VPN interface based on the

VpnService class. When connecting to the VPN server, it gets a set of potential IMS IP addresses

and assigns them to the VPN interface using the function of VpnService.Builder.addAddress.

We further launch the two attack approaches against all the tested phone models (except for Pixels)

and all the tested cellular network operators. The results show that victim UEs always suffer from denial of IMS

services and are prevented from using IMS-based call or text services, even though the cellular

signal quality or the Wi-Fi one is good in all the experiments. Figure 3.10 shows examples of

Figure 3.10 Successful denial of IMS services over Wi-Fi (Left), 4G (Middle), and 5G (Right)
networks.

Figure 3.11 USSD service over IMS signaling (Left) and successful DoS attack on USSD service
(Right).

successful attacks in three different access networks. Notably, we notice that some carriers deployed

additional security mechanisms. Specifically, the phones tested in the networks of US-II and TW-II

downgrade their access networks to the legacy ones (e.g., 2G and 3G networks) so that they can still

have the legacy call and text services, but those tested for the other carriers suffer from denial of all

the call and text services.

Notably, DoS-All can intercept and drop IMS signaling, disrupting not only IMS-based calls

and texts but also other services reliant on IMS signaling, such as USSD and RCS¹. For example,

USSD is another IMS-based service (Figure 3.11), enabling users to dial quick codes for tasks

like checking balances, paying fees, or changing passwords. When a DoS-All attack is launched,

the IMS client loses connection to the IMS server, resulting in user-facing errors like “SYSTEM

UNAVAILABLE” or “MMI Complete.” These observations reveal the broader impact of DoS-All

on IMS-dependent services without additional attack techniques.

¹ Rich Communication Services (RCS) relies on IMS signaling to complete the configuration and registration

procedures [68].

Figure 3.12 Intercepting a SIP message carrying an SMS message in an MITM attack.

Attack variance. The DoS-All attack can be extended to launch various MITM attacks. The

above malware, lacking root privileges, can intercept all outgoing IMS client messages, as shown

in Figure 3.12, and then forward messages to a remote server or interact directly with the client

through fabricated replies when IPsec is absent.

3.2.3.2 NameSpoofing Attack

This attack exploits V2 to send victims spoofed SMS messages in which the sender names can be

arbitrarily specified. It differs from conventional SMS spoofing attacks, which deliver spoofed SMS

messages through the core network based on spoofed phone numbers, with two advantages. First,

the attack does not require SMS messages to be sent through the network, so it cannot be impeded

by any security mechanisms deployed in the network. In contrast, conventional attacks

have become much more challenging, since the FCC in the U.S. has required carriers to deploy

STIR/SHAKEN [129] in the core network to defend against the spoofing attacks; specifically, STIR

incorporates digital certificates into the IMS signaling to validate the identity of the SMS sender or

the caller. Second, the attack can arbitrarily show the spoofed sender’s name without investigating

any phone numbers trusted by the victim or stored in the contact list, whereas conventional attacks can

show only unconvincing spoofed numbers on the victim phone if they are not in the contact list.

To show spoofed names, we fabricate SMS messages in the format stipulated by the 3GPP2

standard [51], as shown in Figure 3.13, instead of the 3GPP standard format [22]. The former offers

the capability of presenting the message’s originating address using ASCII characters (up to 128

characters), whereas the latter allows it to be only in the E.164 format (i.e., the format of phone

numbers). Since the spoofed SMS messages do not need to pass through the core network, they can

be successfully delivered to the IMS client based on V2 whenever the UE supports the 3GPP2 format.

Surprisingly, most phone vendors, such as Samsung, LG, and TCL, still support both 3GPP and

Figure 3.13 A fabricated SMS message with a spoofed sender name in the 3GPP2 format.

3GPP2 standards for international roaming services. As a result, the present attack can successfully

show spoofed names on victim UEs by specifying them in the field of the originating address.
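
The difference between the two originating-address formats suggests a simple receiver-side check, sketched below under stated assumptions: an address is treated as a phone number only if it looks like an E.164 number (an optional "+" followed by at most 15 digits, not starting with 0). The function name `is_e164` and the idea of flagging other senders are hypothetical illustrations, not behavior of any tested phone.

```python
import re

# E.164 numbers have at most 15 digits and do not start with 0.
_E164 = re.compile(r"\+?[1-9]\d{1,14}")

def is_e164(originating_address: str) -> bool:
    """Return True if the originating address looks like an E.164 phone number."""
    return _E164.fullmatch(originating_address) is not None

for sender in ("+15175550123", "Daddy", "Mark Zuckerberg verified by Verizon"):
    label = "phone number" if is_e164(sender) else "free-text sender (flag for review)"
    print(f"{sender!r}: {label}")
```

Such a check would not stop the local delivery enabled by V2; it only shows how a client could distinguish numeric senders from free-text ones before displaying them.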

Attack implementation and evaluation. We develop the malware based on the FakeIMSSingaling

application, which has only the INTERNET permission, by adding two new features for the attack:

(1) identifying the IMS client’s IP address and UDP port number, which were manually configured

during the validation of vulnerability V2; and (2) fabricating SMS messages in the 3GPP2 format.

We conduct an experiment that uses the malware to send a spoofed SMS message with the sender

name, “Mark Zuckerberg verified by Verizon” to the IMS client locally, for all the tested phones

except Google Pixel phones, and all the tested carriers; notably, the carrier name “Verizon” is used

solely for the testing purpose. The result shows that all the tested phones can successfully display

the spoofed SMS message no matter which carrier network is connected. Figure 3.1 illustrates

the spoofed SMS message displayed on a phone using US-I. Although carriers US-II and US-III

conform to the 3GPP standard, the attack can still succeed since the fabricated SMS messages do

not pass through the core network. This also validates that many mobile phones support both 3GPP

and 3GPP2 standards.

Figure 3.14 Overview of ghost conversation attack (Left) and a successful fabricated conversation
(Right).

Moreover, note that the malware needs to use the IMS server IP address as the source address of

the forged SIP packet carrying the spoofed SMS message if the victim phone is running Android

9 or higher; otherwise, the source address can be assigned any IP address. The IMS clients with

Android 9 or higher validate the source address and discard SIP packets not from IMS servers.

For malware with only the INTERNET permission, setting the IMS server IP to be the source IP

address of fabricated SIP packets can be achieved by binding a UDP socket to the local network

interface which is assigned the IMS server IP. For assigning the IP to a network interface, there

are two approaches as presented in § 3.2.3.1: Wi-Fi-based and VPN-based. When the VPN-based

approach is adopted, an additional permission, BIND_VPN_SERVICE, is required for the malware.

Attack variance. The NameSpoofing attack enhances traditional phishing techniques, such as

SMS messages with malicious URLs [121]. Unlike conventional phishing, which typically originates

from unknown numbers, NameSpoofing allows attackers to impersonate trusted entities—e.g.,

friends or official institutions—by spoofing the sender name, making messages appear more credible.

Additionally, it can fabricate ghost SMS conversations that do not exist. As shown in Figure

3.14, the adversary can use malware to interact with the IMS client in SMS conversations, creating

fabricated evidence that could be presented in court or instantly damage the victim’s reputation. As a result,

NameSpoofing significantly increases the effectiveness of existing phishing attacks.

3.3 Insecure ME Access for IMS Media Sessions

We further explore the insecurity of the IMS media session built on the ME. According to the

IMS standard [20], the IMS media session should be encrypted and integrity-protected based on

SRTP (Secure RTP); the SRTP keys are derived from the IMS call setup procedure. Since this

security mechanism offers end-to-end protection between the IMS client and the IMS server, it is

almost impossible to forge valid media packets or hijack the media session on the ME without

compromising the IMS client, provided that the security protection exists.

However, SRTP is not a mandatory feature, so it may be absent on the ME, which would allow

the adversary to fabricate valid media packets, since they are simply plaintext in the RTP format.

Moreover, when the phone modem does not verify the originator of the received IMS media packets,

they may be allowed to be dispatched to the radio bearer dedicated to the IMS media session. Thus,

the IMS media bearer may be abused. Notably, the IMS voice packets are generated by the phone

modem itself, so they do not have this security issue. In the following, we focus on the IMS video

session.

Unfortunately, the above potential security threat is discovered on COTS MEs. We identify

two vulnerabilities that facilitate the potential abuse of IMS video sessions. The first vulnerability

(V3) reveals that video data delivery is not protected by SRTP. All ViIMS packets are transmitted

without confidentiality and integrity protection. Consequently, the adversary can easily use ViIMS

packets to carry non-video data. The second vulnerability (V4) confirms that the phone modem

does not impose any restrictions on the source of ViIMS packets, allowing the adversary to bypass

the authentic IMS client and transmit non-video data to the cellular infrastructure over the IMS

media bearer.

We next elaborate on these two vulnerabilities and devise a proof-of-concept attack. Note that

experiments are mainly conducted in US-I and US-II networks, as ViIMS is not yet supported by

TW-I and TW-II, and US-III supports it on only a few phone models.

3.3.1 V3. Unprotected Video Data Delivery

It has been reported that no SRTP protection is provided over the IMS voice session [92, 84],

where voice packets originate from the phone modem. For the video session, though the video data

are processed by a different component, the application processor, there exists a high probability

that the SRTP is still missing due to the common practice. This practice can leave the video data for

delivery unprotected on the UE. Once the UE is compromised with root access, the video packets

can be captured and learned for the preparation of forging valid IMS video packets.

Figure 3.15 Unprotected IMS video packets in plaintext.

Experimental validation. We validate this vulnerability on LG G3, Google Pixel 1/3/5/7, and

Samsung S8/S10 with Android versions ranging from 4.4.2 to 13. On each tested phone connecting

to a carrier network, we use Wireshark to capture packets while dialing a video call to another phone.

It is observed that all the IMS video packets are in plaintext without any security protection on all

the tested phones. Figure 3.15 shows one test result as an example.

Root cause and lesson learned. The absence of the SRTP protection is not without

reason. The IMS video data delivery is already protected by the user-plane security established between the

UE and the base station; it is performed at the PDCP layer with ciphering and integrity protection.

Therefore, phone vendors and carriers may consider that such a security mechanism has defended the

video session against all the potential threats. However, it cannot safeguard the IMS video data on a

compromised UE before they are sent to the air. This vulnerability is rooted in that the end-to-end

security between the IMS client and the IMS server is not fulfilled; especially, the ME security is

not considered. Note that the SRTP protection can be applied, along with SELinux, to safeguard

IMS video sessions. This prevents video calls from being tampered with or extracted, even by

adversaries with root privileges. The reason is that SELinux, integrated into Android, employs

MAC (Mandatory Access Control) [64] to restrict user access, including root users.

3.3.2 V4. Unrestricted Source for IMS Video Delivery

Figure 3.16 A collected trace including forged RTP packets at the ViIMS callee for US-I.

The phone modem employs the TFT filter to identify IMS video packets and then dispatches

them to the IMS video bearer [49], which offers guaranteed performance for the IMS video session.

The TFT filter rule set for each bearer is based on the 5-tuple information (i.e., source/destination

IP addresses, source/destination port numbers, and protocol ID); it can be easily obtained by the

adversary from normal video packets or control-plane SIP messages. Once the forged video packets

are given the correct 5-tuple information and the phone modem does not deploy any security

mechanism to verify their delivery source, they could be forwarded to the IMS video bearer by the

modem. Moreover, unlike the IMS voice data processed by the modem directly, the IMS video data

are sent from the Android OS to the modem, so the forged video packets can be possibly delivered

by a malware application in the same way.
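
The dispatching logic can be modeled with a minimal sketch in which the bearer is chosen from the 5-tuple alone. It is a conceptual model, not modem firmware: the `sender` field stands for the originating process, which the filter in this model never inspects. The class names, bearer labels, addresses, and ports are hypothetical.

```python
from typing import NamedTuple

class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str

class Packet(NamedTuple):
    five_tuple: FiveTuple
    sender: str  # originating process; not visible to the filter in this model

def select_bearer(packet, filters, default="default bearer"):
    """Pick the bearer whose filter rule equals the packet's 5-tuple."""
    for bearer, rule in filters.items():
        if packet.five_tuple == rule:
            return bearer
    return default

# Hypothetical filter for the video flow.
video_flow = FiveTuple("2001:db8::10", "2001:db8::20", 50010, 50020, "UDP")
filters = {"video bearer": video_flow}

from_client = Packet(video_flow, sender="ims_client")
from_other = Packet(video_flow, sender="other_app")
unrelated = Packet(video_flow._replace(dst_port=443), sender="other_app")

print(select_bearer(from_client, filters))  # video bearer
print(select_bearer(from_other, filters))   # video bearer
print(select_bearer(unrelated, filters))    # default bearer
```

Because the sender is not part of the match, both packets with the video 5-tuple receive the same treatment, which is the condition that V4 describes.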

Experimental validation. We develop a malware application with root privilege to validate this

vulnerability with US-I and US-II. Given a ViIMS call, the malware at the caller generates RTP

packets with various payload sizes and sends them to the callee; notably, these packets are assigned

a unique RTP SSRC (Synchronization Source) ID, 1234567890 (0x499602D2), and contain random

data in the payload (as shown in Figure 3.16). Based on the collected trace at the callee, it is

observed that all the RTP packets with payload sizes ranging from 100 to 1346 bytes can be successfully delivered from the

caller to the callee in US-I, whereas US-II only allows 10 particular sizes: 37, 169, 393, 489, 537,

585, 729, 1129, 1237, and 1294.
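
Locating the marked packets in a trace only requires reading the SSRC field of the fixed 12-byte RTP header defined in RFC 3550. The sketch below shows one way to do this; it is not the analysis tooling used in the experiment. The helper `build_rtp_header` exists only to produce sample input, and the payload type 96 and the sequence and timestamp values are hypothetical.

```python
import struct

MARKER_SSRC = 1234567890  # 0x499602D2, the SSRC used to tag the test packets

def build_rtp_header(seq, timestamp, ssrc, payload_type=96):
    """Pack a fixed RTP header: version 2, no padding, no extension, no CSRCs."""
    byte0 = 2 << 6                 # V=2, P=0, X=0, CC=0
    byte1 = payload_type & 0x7F    # M=0, 7-bit payload type
    return struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)

def parse_rtp_header(data):
    """Parse the fixed 12-byte RTP header from the start of a UDP payload."""
    if len(data) < 12:
        raise ValueError("RTP header requires at least 12 bytes")
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "payload_type": byte1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }

# Sample packet: header followed by a 100-byte payload.
packet = build_rtp_header(seq=1, timestamp=0, ssrc=MARKER_SSRC) + bytes(100)
header = parse_rtp_header(packet)

print(hex(header["ssrc"]))            # 0x499602d2
print(header["ssrc"] == MARKER_SSRC)  # True
```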

Root cause and lesson learned. The root cause is that the phone modem does not verify the source

of the IMS video data delivery, but depends on only the default TFT filter for the dispatching of

video packets. Although the IMS server could inspect the payload content of video

packets to identify the forged ones, doing so is not permitted without approval from at least one party

involved in the video call or from a court, due to legal provisions for carriers [2]. Thus, this

vulnerability has to be addressed at the ME.

Figure 3.17 Illustration of ViIMS-ANY attack.

3.3.3 ViIMS-ANY: Covert Communications over Video-over-IMS

We next present a proof-of-concept attack in which two UEs communicate covertly with each

other over the ViIMS data-plane channel by exploiting vulnerabilities V3 and V4. Different from

previous attacks, the victims in this attack are carriers (e.g., US-I), not individual users. Specifically,

adversaries are individuals seeking to exploit the carriers’ high-priority resources reserved for the

ViIMS service to establish their covert communication channels, with full control over their own

phones.

The impact of this attack is expected to grow rapidly as the ViIMS service becomes more popular,

even though ViIMS is still in the early stages of deployment. The three major U.S. operators —

AT&T, Verizon, and T-Mobile — have introduced the ViIMS service, and some of them support

inter-operator ViIMS calls. According to a report [110] by Juniper Research, the number of

subscribed users is projected to reach 4.5 billion by 2025, representing 50% of global mobile

subscribers. Superior to other video call services such as Skype, ViIMS guarantees performance

with minimal overhead, relying on the IMS application for widely deployed VoIMS service; no

additional applications are needed.

Attack implementation and evaluation. We develop an attack library called ViIMSSocket using

C and the raw socket APIs provided by the Linux kernel. It is given root privilege and provides

upper-layer applications with a UDP-like packet transmission method for executing covert ViIMS

communication, as shown in Figure 3.17. For adversaries with an engineering background,

obtaining root privileges is technically feasible using tools like Magisk [136] and One-Click

Root [15] on Android.

It contains three major APIs: (1) ViIMSSocket(Callee’s Number),

which establishes a covert communication channel with the callee over ViIMS and returns a

socket ID; (2) ViIMSSocketWriteData(socketID, data), which transmits data to the callee; and

(3) ViIMSSocketReadData(socketID, buffer), which receives data from the callee. Notably,

ViIMSSocket prevents the actual IMS video packets from being transmitted to the IMS server, to

maximize the communication capacity.

We evaluate the throughput performance of the covert communication in the networks of US-I

and US-II by sending a 10 MB file from one UE to another UE. The experiment runs ten times for

each carrier. It is observed that the file is always delivered successfully. The average throughput

measured on US-I and US-II is 545.7 Kbps and 581.4 Kbps, respectively. The achieved throughput

values are much greater than the one (e.g., up to 38 Kbps) measured from the data transmission over

the IMS voice data-plane channel [84]. By abusing IMS video sessions, the covert communication

channel is given the guaranteed bit rate resource so that the throughput is guaranteed even in

congested scenarios. Furthermore, it is observed that the covert communication can be sustained for

at least 100 minutes during a ViIMS call.

Attack variants. With the developed ViIMSSocket, potential attacks extend beyond covert

communication. Adversaries can hijack video calls to launch video spoofing attacks, allowing

for mobile deepfake video calls, encrypted stealthy communication channels, and video frame

steganography attacks [135], evading carrier detection of non-video data transmission. Importantly,

the video spoofing attack does not require a malware application on the victim UE receiving the

spoofed video call.

3.4 Discussion

Now we analyze the security implications of modem-based IMS implementations and conclude

with a brief discussion of the iPhone’s architecture and built-in protections.

Is modem-based IMS client better? Google Pixel phones employ the modem-based IMS client,

ensuring that no IMS signaling packets are routed in the Android OS, thus making them immune to

vulnerabilities V1 and V2. This hardware-based approach appears more secure than software-based


methods on other phones but has its limitations. First, the Pixel phones remain vulnerable to the

DoS-ALL attack, when extended to prevent IMS media from being sent to the IMS media server.

This extension can be achieved by assigning the IMS media server address to a local network

interface of the victim UE. Consequently, IMS media data generated by the application processor’s

domain (e.g., video), rather than the phone modem’s, can be sent to the local interface instead of the

IMS media server. Second, the modem-based IMS client lacks flexibility in updating IMS-related

services, e.g., enabling any of the rich communication services (RCS) [67], since updating the phone

modem requires collaboration from modem vendors like Qualcomm. It is less convenient and more

time-consuming compared to a software update.

Are iPhones secure? We conduct experiments on iPhones with four iOS versions (15/15.5/16.5/17)

to validate the four discovered vulnerabilities. iPhones are immune to vulnerabilities V1 and V2

due to different network policies applied in iOS, which is built on a Unix-like OS (Darwin) [132].

Specifically, iOS employs an interface-oriented approach, which restricts the routing of the IMS

signaling to only the cellular interface, so V1 does not exist. It drops the IMS packets that do not

belong to the established IPsec SAs, thereby avoiding V2. Since most recent iPhones do not support

ViIMS [133], the validation of V3 and V4 is left for future investigation.

3.5 Solution

In this section, we propose two remedies to address these four vulnerabilities and evaluate their

effectiveness.

3.5.1 Restricted IMS Routing

We propose a restricted IMS routing mechanism that contains two methods to address vul-

nerabilities V1 (unprotected IMS signaling routing) and V2 (unrestricted IMS signaling source),

respectively. First, the mobile OS shall prohibit any local network interface from being assigned

the IMS server’s IP address, so that IMS signaling packets are always routed to the IMS server rather

than delivered locally. Second, the mobile OS shall not deliver to the IMS client any packets

originating from local applications, so all the routing policies and tables shall prohibit the local

IMS traffic routing. Take the routing table in Figure 3.3a as an example; the routing rule, “local


2600:...:83f:d04d dev rmnet_data0”, with the IMS client’s IP address shall be removed.
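The second method can be illustrated with a small check over routing-table entries. The following is a minimal sketch, assuming the entries are available as text lines in the format of the rule quoted above; the function name `find_local_ims_rules` and the example addresses (taken from the IPv6 documentation prefix) are hypothetical and not part of the original work.

```python
import ipaddress


def find_local_ims_rules(route_lines, ims_client_addrs):
    """Return the routing entries that deliver traffic locally to an IMS client address."""
    protected = {ipaddress.ip_address(a) for a in ims_client_addrs}
    flagged = []
    for line in route_lines:
        fields = line.split()
        # A local-delivery rule has the form: "local <address> dev <interface> ..."
        if len(fields) >= 2 and fields[0] == "local":
            try:
                dest = ipaddress.ip_address(fields[1].split("/")[0])
            except ValueError:
                continue  # destination is not a plain address; ignore it
            if dest in protected:
                flagged.append(line)
    return flagged


# Hypothetical routing-table snapshot; 2001:db8::1234 stands in for the IMS client address.
routes = [
    "local 2001:db8::1234 dev rmnet_data0 proto kernel metric 0 pref medium",
    "local ::1 dev lo proto kernel metric 0 pref medium",
    "2001:db8::/64 dev rmnet_data0 proto kernel metric 256",
]
print(find_local_ims_rules(routes, ["2001:db8::1234"]))
```

Only the first entry is flagged, since it is the only local-delivery rule whose destination is the IMS client's address; a protection service would then remove such entries.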

3.5.2 Protected IMS Media Sessions

Applying SRTP protection to safeguard IMS media sessions can prevent video call tampering

(vulnerability V3), but it does not forbid transmitting non-video data. The SRTP protection is built

between the IMS client and the IMS server, so the modem cannot use it to verify the authenticity of

the IMS video source (vulnerability V4). To this end, a secure communication channel between the

IMS client and the modem has to be built. We adopt DHKE (Diffie-Hellman Key Exchange), which

is effective in deriving shared secret keys, to establish a secure communication channel. This solution

leverages the cellular infrastructure as a trusted intermediary in the DHKE procedure, preventing its

common threat, MiTM attacks. It avoids the use of asymmetric cryptography, which is commonly

used to address MiTM attacks but may not be supported on all MEs. The proposed DHKE

procedure exchanges DHKE parameters between the IMS client and the modem during the SIP

registration, while performing mutual authentication based on the 3GPP symmetric cryptography.

The established communication channel remains secure unless the 3GPP symmetric cryptography

is compromised. Notably, the proposed solution requires neither modifications to cellular network

protocols nor new signaling messages.

Figure 3.18 The DHKE procedure integrated into the 3GPP cross-layer communication framework.

Figure 3.18 illustrates the proposed DHKE procedure with seven steps: (1) in the initiation of the

IMS registration procedure [50], the IMS client selects a large prime number, $q$, a primitive root of

$q$, $\alpha$, and a private key, $X_a$; (2) the IMS client calculates its public key $Y_a$ as

$Y_a = \alpha^{X_a} \bmod q$; (3) the IMS client transmits the SIP REGISTER message carrying $q$, $\alpha$,

and $Y_a$ to the IMS server; (4) the IMS server coordinates PCF, AMF, and the serving base station to

transmit $q$, $\alpha$, and $Y_a$ to the RRC (Radio Resource Control) layer on the phone modem using

the RRC Reconfiguration message; (5) the RRC layer on the phone modem selects a private key, $X_b$,

calculates the corresponding public key, $Y_b = \alpha^{X_b} \bmod q$, calculates the shared secret key,

$K = Y_a^{X_b} \bmod q$, provides $K$ for the PDCP layer, and then transmits $Y_b$ to the base station

using the RRC Reconfiguration Complete message; (6) the phone modem’s $Y_b$ is delivered to the

IMS client through the SIP OK in response to the SIP REGISTER message; and (7) with the received

public key, $Y_b$, the IMS client calculates the shared secret key and finally shares it with the

phone modem.
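The key agreement in the steps above can be checked numerically. The following is a minimal sketch with toy parameters ($q = 23$, $\alpha = 5$), chosen only for illustration; the helper names `dh_keypair` and `dh_shared_key` are hypothetical, and the 3GPP message transport and mutual authentication are not modeled.

```python
import secrets


def dh_keypair(q, alpha):
    """Pick a private key X in [2, q - 2] and return (X, Y) with Y = alpha^X mod q."""
    x = 2 + secrets.randbelow(q - 3)
    return x, pow(alpha, x, q)


def dh_shared_key(peer_public, own_private, q):
    """Compute K = Y_peer^X_own mod q."""
    return pow(peer_public, own_private, q)


# Toy parameters for illustration only; a real deployment needs a large prime q.
q, alpha = 23, 5  # 5 is a primitive root of 23

x_a, y_a = dh_keypair(q, alpha)        # steps 1-2: IMS client selects X_a, computes Y_a
x_b, y_b = dh_keypair(q, alpha)        # step 5: modem selects X_b, computes Y_b
k_modem = dh_shared_key(y_a, x_b, q)   # step 5: K = Y_a^X_b mod q
k_client = dh_shared_key(y_b, x_a, q)  # step 7: K = Y_b^X_a mod q

print(k_modem == k_client)
```

Both sides arrive at the same $K$ because $Y_a^{X_b} \equiv \alpha^{X_a X_b} \equiv Y_b^{X_a} \pmod q$, while only $q$, $\alpha$, $Y_a$, and $Y_b$ are ever transmitted.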

Note that the DHKE has ensured that two communicating parties can derive a shared secret key

over an insecure channel. Even with an eavesdropper inside or outside the ME (e.g., eavesdropping

on RRC messages) during the DHKE procedure, the shared secret key cannot be inferred or leaked.

After the secret key is derived, the mobile OS must ensure that no applications, even those with root

privileges, can access the IMS client’s memory where the key is stored.

Against legacy and compromised UEs. Adversaries may use legacy UEs or compromised UEs

built based on SDR (Software-Defined Radio) platforms (e.g., srsUE [7]) to launch the ViIMS-ANY

attack, since they do not allow the proposed solution to be deployed. To address this issue, carriers

can reduce attack incentives by preventing them from making high-bandwidth video calls, thereby

limiting the bandwidth of their video sessions, whenever the deployment of the proposed solution

is not detected. Moreover, the infrastructure can also detect them by monitoring their IMS media

usage [55].

3.5.3 Prototype and Evaluation

We next prototype and evaluate the proposed remedies.

Restricted IMS routing. We develop an Android system application, designated as IMSProtector,

with root privilege. It mainly monitors three pieces of information: (1) the RPDB, (2) routing tables,

and (3) network interfaces. It not only removes any routing rule allowing local IMS traffic routing

but also deactivates any detected interface that is assigned the IMS server’s IP address.

To assess the effectiveness of IMSProtector, we launch the attacks of the ineluctable denial of

IMS services and the named SMS source spoofing. As shown in Figure 3.19, IMSProtector can

successfully defend against these two attacks. Specifically, it deactivates the local Wi-Fi network

interface (i.e., tun0) assigned the IMS server’s IP address.

It is also observed that the attack

application, SMSNameSpoofer, is not allowed to transmit any SMS messages with named sources

to the IMS client due to a lack of local IMS routing.

Protected IMS media sessions. We implement and evaluate this solution on an SDR platform,

using srsUE (v23.04) for emulating a 5G UE, srsRAN (v23.04) for emulating a 5G gNB, and

open5GS (v2.4.11) for emulating a 5G core network; ZeroMQ [3] is used to implement the radio

link between the gNB and the UE. Moreover, we develop an IMS client and an IMS server in Python,

and deploy them on the srsUE and the open5GS, respectively. The PCF and the AMF in the core

network, as well as the gNB, are modified to support the proposed DHKE procedure. This platform

is built on a Dell XPS 13 laptop running Ubuntu 22.04, equipped with an i7-1185G7 CPU and

16GB of RAM. In the prototype, we use the shared secret key, K, to provide integrity and data origin

authentication for IP packets exchanged between the IMS client and the phone modem. In particular,

we add an option using an unassigned option type of 150 [5] to IP headers for the MAC (Message

Authentication Code) verification.
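The MAC check described above can be sketched as follows. This is a minimal illustration, not the prototype's code: the source does not name the MAC algorithm, so HMAC-SHA256 truncated to 16 bytes, the SHA-256 key derivation from $K$, and all function names are assumptions. Encoding the tag into the IP option is omitted.

```python
import hashlib
import hmac

TAG_LEN = 16  # bytes; assumed truncation length, not specified in the text


def derive_mac_key(shared_secret: int) -> bytes:
    """Turn the DHKE shared secret K (an integer) into a fixed-length MAC key (assumed KDF)."""
    raw = shared_secret.to_bytes((shared_secret.bit_length() + 7) // 8 or 1, "big")
    return hashlib.sha256(raw).digest()


def compute_tag(key: bytes, packet: bytes) -> bytes:
    """MAC over the packet bytes; the sender would carry this tag in the IP option."""
    return hmac.new(key, packet, hashlib.sha256).digest()[:TAG_LEN]


def verify_packet(key: bytes, packet: bytes, tag: bytes) -> bool:
    """Receiver side: accept the packet only if the tag matches (constant-time compare)."""
    return hmac.compare_digest(compute_tag(key, packet), tag)


key = derive_mac_key(2)                  # toy shared secret
genuine = b"video packet from the IMS client"
tag = compute_tag(key, genuine)

print(verify_packet(key, genuine, tag))                             # genuine packet
print(verify_packet(key, b"fabricated packet from malware", tag))   # forged packet
```

A packet fabricated by a process that does not hold $K$ cannot carry a valid tag, so the receiver can drop it.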

To examine the effectiveness of this solution, we launch the ViIMS-ANY attack after a secure

communication channel between the IMS client and the phone modem is established. As shown in Figure 3.20,

it is observed that the fabricated IMS video packets are detected and then dropped.

Figure 3.19 IMSProtector: (Left) disabling an interface assigned the IMS IP address; (Right) local
IMS routing is forbidden.

(a) Traces on SDR UE (An adversary view).

(b) Errors shown on the terminal of the SDR modem.

Figure 3.20 Evaluation of enabling secure communications between the IMS client and the phone
modem.

Carriers have deployed the IMS system since launching VoLTE. Although 3GPP kept improving

its security designs over the last two decades, most enhancements have been focused on the cellular

infrastructure. This caused the ME security in the IMS standard to lag behind the infrastructure

security, posing security risks to cellular users and carriers. We conducted a comprehensive security

study regarding the IMS signaling and media delivery on the ME; four vulnerabilities were identified,

and the corresponding three attacks were exposed. These security threats have been validated using

ten phone models and five carriers across two countries. Although we have proposed remedies to

address them, completely solving them requires collaboration among carriers, phone vendors, and

the cellular standard community.

CHAPTER 4

ENHANCING THE PRIVACY OF VOICE SERVICES OVER IMS INFRASTRUCTURE

Mobile voice communication has been a long-standing and widely utilized service. Despite the

growing popularity of third-party communication services over mobile broadband, voice calls

remain prevalent, with a substantial user base [109, 118, 111]. In the transition to 5G/4G networks

fully reliant on IP technology, voice communication has evolved into Voice over IP Multimedia

Subsystem (VoIMS), known as Voice-over-New-Radio (VoNR) for 5G and Voice-over-LTE (VoLTE)

for 4G [9]. Presently, more than 235 operators in 105 countries offer VoIMS services, with a

projection to serve five billion devices by 2025 [65].

The security of 5G/4G voice calls is a primary concern, with encryption measures in place to

safeguard confidentiality, privacy, and security. These security protocols incorporate well-established

mechanisms such as 5G/4G Authentication and Key Agreement (AKA) and multi-layer security at

Network Layer 3 and Layer 2 [44, 43]. In Layer 3, the protection of voice call is achieved through

IPsec (Internet Protocol Security) for confidentiality and integrity [42]. To protect transmissions over

the air, Layer 2, utilizing Packet Data Convergence Protocol (PDCP), provides encryption [46, 47].

Unfortunately, the pursuit of optimizing VoIMS quality and efficiency through standardized

techniques, adopted by commercial 5G/4G networks, introduces unanticipated security implications.

These optimization techniques include the use of guaranteed-bit-rate radio bearers, RObust Header

Compression (ROHC) to compress packet headers [78], the implementation of Adaptive Multi-Rate

(AMR) audio codecs [24] for varying radio conditions, and the incorporation of comfort noise for

handling silence during calls [25]. Each technique individually aims to enhance call quality and

efficiency.

However, good turns into evil when these techniques are combined. Together, they present unexpected

vulnerabilities, potentially turning VoIMS calls into security threats. For instance, the combination of ROHC

and comfort noise generates very small packets (less than 16 bytes) that can distinguish VoIMS

traffic from other types. By this means, a VoIMS call can be detected by checking the presence

of tiny packets. Moreover, examining voice packet patterns with and without comfort noises can


infer voice call states. More inference details are elaborated in §4.2. We want to highlight that such

inference is not easy because of the sheer volume of encrypted packets over the air and the rich

real-world complexity of human speech activities.

These security concerns led to the development of proof-of-concept passive (§4.3.1) and active

attacks (§4.3.2). The passive attacks leverage precise call state information to uncover the identity

of the caller/callee (i.e., linking user identities to cellular identities such as phone numbers). In

contrast, the active attack focuses on selectively muting one of the call participants, rendering Call

Denial of Service (DoS) attacks more inconspicuous and efficient. These attacks have undergone

rigorous validation and assessment through experiments conducted with all three major US operators,

utilizing commercial mobile phones. A standard-compliant fix to these vulnerabilities has been

proposed and assessed for its effectiveness in §4.5.

In summary, our work pioneers the inference of confidential 5G/4G call information without

decrypting voice packets, showing how beneficial call enhancement techniques can be turned into

security concerns and emerging threats against 5G/4G call security and privacy.

4.1 Threat Model and Methodology

We next present the threat model, followed by our responsible methodology and ethical

considerations.

Threat Model. In our threat model, adversaries are defined as individuals or entities that seek to

monitor or launch attacks against mobile users by exploiting vulnerabilities in 5G/4G radio channels.

These adversaries have the capability to eavesdrop on all communications occurring over public

channels, such as 5G/4G radio channels, but they lack the ability to decrypt encrypted messages

without access to the requisite decryption keys. In practice, adversaries can deploy their own

equipment, like a 5G/4G sniffer, in close proximity to victim User Equipment (UEs) to intercept

all packets transmitted over the air. However, they do not possess the means to compromise the

security of any victim’s smartphone or the 5G/4G networks themselves. To simplify the scenario,

we consider an adversary, referred to as "Evil," eavesdropping on one of the call participants (e.g.,

Alice) during a voice call, given that both call parties (e.g., Alice and Bob) are typically not in close


proximity and do not utilize the same radio channel that the adversary is intercepting.

Responsible methodology and ethics. We adhere to a responsible and ethical research approach.

Real experiments were performed in collaboration with all three prominent U.S. operators, which

we refer to as OP-I, OP-II, and OP-III. The primary objective was to validate the identified

vulnerabilities and evaluate the potential impact of attacks. Recognizing that certain feasibility

tests and attack assessments could pose risks to network operators and their mobile users, we

took precautions. Unless explicitly stated otherwise, experiments were carried out within a fully

controlled environment.

In this controlled setting, we utilized a 5G/4G sniffer implemented via software-defined radio

(SDR) to gather information about voice calls from smartphones. Importantly, all smartphones

involved in these experiments were the property of our research laboratory. To ensure that inadvertent

attacks on smartphones not participating in the study were avoided, we implemented two critical

measures:

1. All experiments were conducted within a private laboratory during off-peak times, and strict

measures were taken to ensure the absence of any unauthorized individuals nearby. In this setup,

one smartphone served as the victim, while several other smartphones acted as simulated users in

proximity to replicate normal 5G/4G traffic.

2. In situations where potential passersby were present, we utilized phone-side cellular trace

collectors, such as MobileInsight [102], to exclusively collect cellular radio traffic data from the

victim’s smartphone. This approach guaranteed that no cellular radio traffic from non-participating

smartphones was inadvertently collected.

In some cases, specific attack experiments were conducted in semi-controlled environments or

public places. Further details regarding the experimental configurations for each attack are provided

in §4.3.1 and §4.3.2 of the study. This comprehensive approach underscores our commitment to

responsible research practices and ethical considerations in the study of mobile network security.


Figure 4.1 Overview of side-channel call inference.

4.2 Side-Channel Call Inference

In this section, we present call inference techniques to obtain confidential call information over

encrypted packets (without knowing the decryption keys). Figure 4.1 gives an overview of side-channel

inference with three tasks. First, it detects the presence of a VoIMS call out of all the packets

received over the air (§4.2.1). Note that most packets are not for voice (e.g., mobile data and 5G/4G

signaling). Second, it infers call states for the detected VoIMS call, particularly who is talking

(§4.2.2). This means that the adversary Evil can learn how the voice call is proceeding

by dividing a call conversation into fine-grained segments (e.g., Alice talks most of the time or

rarely talks), thereby launching attacks based on precise call states. Last, it further infers the start

and end time of each conversation segment (marked as “+” and “×” in Figure 4.1; §4.2.3). Such precise

call state information makes it possible to launch attacks that infer more confidential information

(e.g., user identities in §4.3.1) or selectively manipulate the target victim call at specific times (§4.3.2).

4.2.1 Detecting VoIMS Calls

At first glance, detecting the presence of an ongoing VoIMS call is not challenging, even though

all the packets are encrypted. This is because the radio bearers used for VoIMS (voice traffic and

signaling) differ from those used for mobile Internet data. A previous study [113] has observed the

use of distinct DRBs (for example, DRB1: mobile Internet data, DRB2: VoIMS signaling, DRB3:


VoIMS voice packets). Even though traffic is ciphered at PDCP, it is not hard to detect VoIMS calls

by analyzing the use of all DRBs.

However, the reality is more complex and challenging. First, the mapping between a DRB

number and its supported traffic type (e.g., DRB3 for VoIMS voice traffic) is never fixed or explicitly

defined by any VoIMS standard. As a result, it varies with network operators and changes over time.

Notably, each UE can have up to 8 DRBs. All PDCP packets transmitted over DRBs are encrypted,

making discovering the DRB transmitting VoIMS packets challenging. Second, there are many

packets over the air; it is not scalable to inspect all the packets in a large number of concurrent DRBs

for an extended period. Here, we need a reliable, scalable, and lightweight solution to effectively

detect the presence of VoIMS calls by concurrently screening all DRBs used by nearby mobile users.

Our call detection approach exploits two voice quality optimization techniques: ROHC [46,

31, 37] and CN [25, 27, 21]. It is based on two key facts: (1) both ROHC and CN are mandatory

features for VoIMS services regulated by 3GPP standards [25, 27, 21]; and (2) they produce special

voice packets whose sizes are significantly smaller than non-VoIMS packets. The use of both

techniques has been observed in all VoIMS call experiments conducted with three U.S. operators.

A CN voice packet contains 35∼48 bits (4.375∼6 bytes) for noise information [24, 26, 28]; it is

then encapsulated into an RTP packet using the payload type of 13 [79]. With ROHC, it is further

compressed into a PDCP packet with a length of 8∼13 bytes. These tiny PDCP packets only appear

during VoIMS calls, making the detection of their presence an indicator of VoIMS calls. Once a

tiny packet is detected, we can determine its DRB number, revealing which DRB is used for VoIMS.

Notably, the vulnerability lies in the fact that both ROHC and CN are exclusive to VoIMS and not

used for other types of traffic.
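The detection rule above can be sketched in a few lines. The following is a minimal illustration, assuming the sniffer already yields one `(ue, drb, pdcp_len)` tuple per captured PDCP packet; that tuple layout, the identifiers in the example, and the `min_tiny` parameter are hypothetical.

```python
from collections import Counter

# Length range (bytes) of ROHC-compressed comfort noise PDCP packets, as stated in the text.
TINY_MIN, TINY_MAX = 8, 13


def detect_voims_drbs(packets, min_tiny=1):
    """Return the (ue, drb) pairs on which at least `min_tiny` tiny PDCP packets were seen."""
    tiny = Counter()
    for ue, drb, pdcp_len in packets:
        if TINY_MIN <= pdcp_len <= TINY_MAX:
            tiny[(ue, drb)] += 1
    return {key for key, count in tiny.items() if count >= min_tiny}


# Hypothetical capture: UE "A" carries a VoIMS call on DRB 3; UE "B" only browses.
capture = [
    ("A", 1, 1400), ("A", 2, 620), ("A", 3, 45), ("A", 3, 12), ("A", 3, 11),
    ("B", 1, 42), ("B", 1, 1400), ("B", 1, 54),
]
print(detect_voims_drbs(capture))
```

Because only packet lengths and DRB identifiers are needed, the check works on ciphered traffic and stays cheap when many DRBs are screened concurrently.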

Empirical Validation. We conducted experiments with three U.S. operators to validate that only

VoIMS calls generate tiny PDCP packets, while non-VoIMS applications do not. Our tests covered

four phone models, including Google Pixel 5, Pixel 3, Samsung S5, and LG G3, all supporting

VoLTE, and Pixel 5 supporting VoNR as a 5G phone. We ran three VoIMS-based applications

(VoLTE, VoNR, and Google Voice) and 57 non-VoIMS applications selected from the top-100


mobile Internet applications. These test applications were roughly categorized into three other

groups: (1) Non-VoIMS VoIP (e.g., Skype), (2) Non-VoIP streaming (e.g., Netflix, YouTube), and

(3) Non-streaming (e.g., Amazon, Twitter, Reddit). For all VoIP applications (including VoIMS and

non-VoIMS VoIP), each test consisted of a 30-second voice call with 10 seconds for ringing and

20 seconds for call conversation. For non-VoIP streaming applications, each test involved video

streaming for approximately 5 minutes. For non-streaming applications, we continuously accessed

their Internet services (e.g., sending messages, refreshing online content, and searching for products)

for 1 minute.

Table 4.1 shows the PDCP packet lengths for three VoIMS applications and the minimal length

observed for each non-VoIMS application in our study. We made four key findings from this data:

First, we observed that all three U.S. operators have implemented both ROHC and CN techniques

for VoIMS calls. Second, none of the non-VoIMS applications produced tiny PDCP packets, while

many of these tiny packets were consistently present during VoIMS calls. This finding remained

consistent across all four phone models and three operators. Third, our approach was effective in

detecting the presence of multiple concurrent VoIMS calls, with the capability to identify up to four

concurrent VoIMS calls tested in our study. Finally, the number of tiny packets detected during a

VoIMS call varied across network operators and mobile device models. We later demonstrate that

this variation is primarily due to differences in speech coding rates (AMR), which directly impact

call state inference (as discussed in §4.2.2).

4.2.2 Inferring Call States

In this section, we describe the method for inferring finer-grained call states once a VoIMS call is

detected. When the adversary, denoted as Evil, deploys a sniffer near Alice, the observed call states

from Alice’s perspective include two primary conditions: "talking" and "not-talking (listening)"

while the call conversation is active. In this context, "talking" indicates that Alice is speaking, while

"listening" means that Alice is not speaking but rather listening to Bob.

Table 4.1 Tiny packets are only observed in VoIMS.

Rows 1–3 are VoIMS applications; the remaining rows are non-VoIMS applications (Non-IMS VoIP,
Non-VoIP Streaming, and Non-Streaming).

No.  App Name                  Len (bytes)
1    VoLTE (4G)                11–13
2    VoNR (5G)                 12–13
3    Google Voice              13
4    WhatsApp                  32
5    WhatsApp Business         32
6    Skype                     37
7    Telegram                  42
8    Discord                   42
9    TextNow                   42
10   Google Hangouts           50
11   Snapchat                  54
12   Twitch                    42
13   Spotify                   42
14   Netflix                   42
15   (see note)                42
16   Disney+                   42
17   Amazon Prime Video        42
18   Xbox                      42
19   NewsBreak                 42
20   Bigo Live                 42
21   Shazam!                   54
22   YouTube Kids              59
23   (see note)                42
24   Microsoft Edge            42
25   Twitter                   42
26   Photomath                 42
27   Microsoft Teams           54
28   (see note)                42
29   Booking                   42
30   Duolingo                  42
31   Amazon                    42
32   Reddit                    42
33   McDonald’s                42
34   DuckDuckGo                42
35   DoorDash                  42
36   WeatherPort               42
37   Waze                      42
38   (see note)                42
39   (see note)                42
40   Walmart                   42
41   Airbnb                    42
42   AAA                       42
43   (see note)                42
44   Apex News                 42
45   Expedia                   42
46   Chrome                    42
47   Brave Browser             54
48   Google Earth              61
49   (see note)                30
50   (see note)                37
51   Google Translate          42
52   Microsoft Authenticator   42
53   Acrobat Reader            42
54   Google Docs               42
55   DNS Speed Test            42
56   FTP Server                42
57   Outlook                   42
58   Canva                     54
59   Google Authenticator      54
60   (see note)                54

Note: rows 15, 23, and 28 are The Weather Channel, SoundCloud, and Zillow; rows 38, 39, 43, 49, 50,
and 60 are Snapfish, Instagram, TikTok, SpeedVPN, Thunder VPN, and Pinterest. The order within
each set is not recoverable.

A straightforward approach for inferring call states is to examine the presence of Comfort Noise

(CN) and non-Comfort-Noise (nonCN) voice packets in both the downlink (DL) and uplink (UL)

directions to determine whether Alice is talking. Specifically, UL-nonCN packets are transmitted

when Alice is speaking, while UL-CN packets are transmitted when Alice is not speaking and is in

a period of silence. Similar observations are made for DL packets, depending on whether Bob is

talking.

However, several practical issues need to be addressed, as shown in Figure 4.2.

Figure 4.2 Call state inference for a single detected call (§4.2.2).

First, there may be situations where the call conversation has not yet begun, even when packets

are transmitted over the target DRB. This scenario occurs in cases of premium voice services. For

instance, with features like Early Media [80], the callee can play alerting media (e.g., a song) to

the caller before the conversation starts. Consequently, we introduce an additional call state, "no

conversation," which indicates that the DRB is active, but the call has not been fully established.

This state is identified when both #UL-CN and #DL-CN are equal to zero. The conversation begins

as soon as the first CN packet is observed. Notably, the first CN packet can appear in either the DL

or UL direction, as Alice can be either the caller or the callee.

Second, distinguishing CN and nonCN packets solely based on their packet lengths can be

challenging. However, our observation reveals that nonCN packets are consistently larger than CN

packets. NonCN voice packets have a minimum PDCP payload length of 15.08 bytes (rounded up to

16 bytes), which is observed when the lowest VoIMS codec bit rate for nonCN voice packets is used

(4.75 Kbps via AMR), and the inter-arrival time is an average of 20 ms (resulting in 50 packets per

second). ROHC further reduces the size of RTP headers to 3.2–6.5 bytes. In contrast, CN packets

have a maximum length of 13 bytes. In our approach, we employ a threshold of 𝜃 = 16, where

packets with a payload length of less than 16 bytes are considered CN packets.
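The 15.08-byte figure appears consistent with adding the smallest ROHC-compressed header (3.2 bytes) to one 20 ms AMR frame at 4.75 Kbps. The following sketch reproduces that arithmetic under this assumed decomposition and applies the threshold 𝜃 = 16; the constant and function names are hypothetical.

```python
import math

AMR_LOWEST_BPS = 4750        # lowest AMR codec bit rate (4.75 Kbps)
FRAME_INTERVAL_MS = 20       # average inter-arrival time of voice packets
ROHC_MIN_HEADER_BYTES = 3.2  # smallest compressed header size given in the text
THETA = 16                   # CN/nonCN length threshold (bytes)

bits_per_frame = AMR_LOWEST_BPS * FRAME_INTERVAL_MS // 1000  # 95 bits per voice frame
payload_bytes = bits_per_frame / 8                           # 11.875 bytes
min_noncn_len = payload_bytes + ROHC_MIN_HEADER_BYTES        # about 15.08 bytes


def classify(pdcp_len, theta=THETA):
    """Label a PDCP packet as CN or nonCN by its length alone."""
    return "CN" if pdcp_len < theta else "nonCN"


print(bits_per_frame, payload_bytes, math.ceil(min_noncn_len))
print(classify(13), classify(16))
```

Since CN packets are at most 13 bytes and nonCN packets are at least 16 bytes after rounding up, the two length ranges do not overlap and a single threshold separates them.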

To infer call states more accurately, we collect packet statistics every second, including the

counts of CN and nonCN packets in the UL and DL, labeled as #UL-CN, #UL-nonCN, #DL-CN,

and #DL-nonCN. Table 4.2 lists the criteria or intermediate results used to infer the three call states

based on the presence of these four packet types: "No conversation" for situations before the call

begins when both #UL-CN and #DL-CN are equal to zero; "Talking" when #UL-nonCN is greater

than zero and #DL-CN is greater than zero, indicating that the user is sending voice packets to the

remote party while receiving comfort noise packets; and "Listening" when #DL-nonCN is greater

than zero and #UL-CN is greater than zero, indicating that the user is receiving voice packets from

the remote party while sending only comfort noise packets.

                  Intermediate Criteria
Call State        #UL-CN   #UL-nonCN   #DL-CN   #DL-nonCN
Talking           ∗        >0          >0       ∗
Listening         >0       ∗           ∗        >0
No conversation   =0       ∗           =0       >0

Table 4.2 Intermediate criteria used to infer call states (∗: wildcard).
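The criteria in Table 4.2 translate directly into a per-second decision rule. The following is a minimal sketch; the function name, the evaluation order of the three rows, and the "undetermined" fallback for counts that match no row are assumptions, not part of the original method.

```python
def infer_call_state(ul_cn, ul_noncn, dl_cn, dl_noncn):
    """Map one second of packet counts to a call state using the Table 4.2 criteria."""
    # No conversation: no comfort noise in either direction, DL media (e.g., Early Media) present.
    if ul_cn == 0 and dl_cn == 0 and dl_noncn > 0:
        return "no conversation"
    # Talking: Alice sends voice packets while receiving comfort noise.
    if ul_noncn > 0 and dl_cn > 0:
        return "talking"
    # Listening: Alice receives voice packets while sending comfort noise.
    if dl_noncn > 0 and ul_cn > 0:
        return "listening"
    return "undetermined"


# Hypothetical per-second counts: (#UL-CN, #UL-nonCN, #DL-CN, #DL-nonCN)
for counts in [(0, 0, 0, 50), (2, 48, 6, 0), (6, 0, 0, 50), (0, 0, 0, 0)]:
    print(counts, infer_call_state(*counts))
```

When all four counts are positive in the same second, both the talking and listening rows match and this sketch reports "talking" only because that row is checked first.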

In practice, short-term call state inference based on these criteria may not be sufficient to

accurately determine call time due to potential noise. We observe tiny CN packets in both directions

even when Alice is talking or listening, as explained in Section 4.2.3 and shown in Figure 4.3a. In the

following section, we present our final approach for call state inference and discuss inferring call

time.

4.2.3 Inferring Call Time

(a) An intuitive approach (§4.2.2).

(b) Our final approach with time inference.

Figure 4.3 Comparison in a real-world instance.

Figure 4.4 Side-channel inference uses DBSCAN and MAVG to prevent unnecessary talking-listening
state switches (§4.2.3).

The approach outlined in Section 4.2.2 involves inferring call states on a per-second basis and

accumulating the time periods associated with the talking and listening states. However, practical

implementation reveals that this method encounters challenges due to the presence of additional

noise packets. These noise packets fall into two distinct categories: "hidden" noise and "redundant"

comfort noise.

Hidden noise packets are unvoiced nonCN packets generated in response to unresolved en-

vironmental noises when one party involved in the call is in the listening state. On the other

hand, redundant comfort noise arises during brief speech pauses while the person is talking. The

consequence of these noise packets is frequent and undesired transitions between the talking and

listening states, as depicted in Figure 4.3a. Short speech pauses, often accompanied by redundant

comfort noises, lead to inaccurate inferences regarding the talking state. These inaccuracies involve

wrongly concluding that talking has ceased and transitioning to the listening state, only to revert

quickly to the talking state as the person continues talking. Similarly, hidden noise packets affect

the precision of inferences about when the listening state ends. Accurate inference of call times

is essential for various call-state-based applications and attacks. This is particularly important in

situations where multiple users engage in phone calls simultaneously. Inaccurate call times may not

furnish adversaries with sufficient information to distinguish between calls or associate them with

specific cellular/user identities.

To address these issues, we introduce two approaches: (1) density-based spatial clustering of


applications with noise (DBSCAN) and (2) the moving average of the voiced packet ratio (MAVG).

These strategies mitigate unnecessary transitions between the listening and talking states, resulting

in more precise inferences of the start and end times for each call state, as depicted in Figure 4.4.

◦ DBSCAN serves to manage hidden noises and prevent unwarranted transitions from the

listening state to the talking state. We analyze nonCN packets and categorize them into two groups:

(1) voiced nonCN and (2) unvoiced nonCN. Hidden noise packets belong to the unvoiced nonCN

category. Unvoiced nonCN packets do not contain user voice but carry uncanceled environmental

noise. By classifying these unvoiced nonCN packets as comfort noise packets, we mitigate the

issue of prematurely leaving the listening state. Differentiating between these categories remains

challenging due to the variable voice coding rates inherent in the AMR audio codec used by VoIMS.

To address this, we employ a classifier based on the well-established DBSCAN algorithm, classifying

nonCN packets into voiced and unvoiced nonCN packets with a user-defined number of clusters or

categories and a suitable 𝜖 value, representing the maximum distance range for data points in the

same cluster. In our prototype, 𝜖 is set to 10, delivering comparable performance across all audio

coding rates outlined in VoIMS standards [26, 24, 28].
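The clustering step can be illustrated with a small, dependency-free DBSCAN over one-dimensional data. This is only a sketch under explicit assumptions: the text does not state which packet feature is clustered, so observed packet size in bytes is assumed here; `min_samples=3` and the sample sizes are made up; only 𝜖 = 10 comes from the prototype description. Which cluster corresponds to voiced packets is also left open.

```python
def dbscan_1d(values, eps, min_samples):
    """Cluster 1-D values with DBSCAN; returns one label per value (-1 = noise)."""
    n = len(values)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j in range(n) if abs(values[i] - values[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1           # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbors(j)
            if len(reach) >= min_samples:
                queue.extend(reach)  # j is a core point, keep expanding
    return labels


def cluster_means(values, labels):
    """Mean value of each cluster, ignoring noise points."""
    groups = {}
    for value, label in zip(values, labels):
        if label != -1:
            groups.setdefault(label, []).append(value)
    return {label: sum(vs) / len(vs) for label, vs in groups.items()}


sizes = [30, 32, 31, 33, 29, 72, 74, 73, 75, 71, 150]  # made-up sizes in bytes
labels = dbscan_1d(sizes, eps=10, min_samples=3)
print(labels)                        # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, -1]
print(cluster_means(sizes, labels))  # {0: 31.0, 1: 73.0}
```

In the textbook formulation used here, DBSCAN is parameterized by a neighborhood radius and a minimum neighbor count, and the number of clusters is an output of the algorithm rather than an input.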

◦ MAVG addresses redundant comfort noises and prevents unwarranted transitions from the

talking state to the listening state. This is achieved by ensuring that the transition from talking to

listening is not solely based on minimal comfort noise. We develop a moving average algorithm that

considers both comfort noise packets and unvoiced nonCN packets, which may be generated during

short speaking pauses. The algorithm operates as follows: First, within a predefined time window

(e.g., 2 to 4 seconds), denoted as 𝑤𝑛𝑑, we collect statistics on the numbers of uplink comfort noise

packets, unvoiced nonCN packets, and voiced nonCN packets, represented as #CN, #Unvoiced-nonCN,

and #Voiced-nonCN, respectively. Subsequently, we compute the percentage of voiced packets,

labeled as 𝑉𝑃, within each 𝑤𝑛𝑑, using the formula

$$VP = \frac{\#\text{Voiced-nonCN}}{\#\text{CN} + \#\text{Unvoiced-nonCN} + \#\text{Voiced-nonCN}} \times 100\%.$$

If the observed 𝑉𝑃 exceeds 50%, the state is inferred as "talking"; otherwise, it is inferred as "listening."

Figure 4.3b illustrates the efficacy of our proposed solutions in


Metrics

C Inference accuracy

Time errors

T Inference accuracy

Time errors

L Inference accuracy

Time errors

S5

G3

Cross-Phone Exp.

Cross-Carrier Exp.
OP-I OP-II OP-III OP-III OP-III
S5
S5
100% 100% 100% 100% 100%
0.51s
0.57s
1.8s
100% 100% 100% 100% 100%
0.89s
0.99s
0.36s
100% 100% 100% 100% 100%
0.76s
1.22s
0.48s

OP-III
Pixel 3 Pixel 5†
100%
0.39s
100%
0.22s
100%
0.31s

0.45s

0.14s

0.53s

0.6s

0.3s

0.4s

Table 4.3 The accuracy of VoIMS call detection, state and time inference (†:VoNR). C: Conversation,
T: Talking, L: Listening.

preventing unnecessary state transitions, thereby resulting in more precise inferences of talking and

listening times.
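The MAVG rule described earlier in this section can be sketched as follows. This is an illustration under assumptions rather than the prototype's code: the window is taken to be a trailing window that advances one second at a time (the text only says the window spans roughly 2 to 4 seconds), the function names are hypothetical, and the trace is made up.

```python
def voiced_percentage(cn, unvoiced_noncn, voiced_noncn):
    """VP = #Voiced-nonCN / (#CN + #Unvoiced-nonCN + #Voiced-nonCN) * 100."""
    total = cn + unvoiced_noncn + voiced_noncn
    return 100.0 * voiced_noncn / total if total else 0.0


def mavg_states(per_second, wnd=3, threshold=50.0):
    """Infer one state per second from a trailing window of wnd seconds.

    per_second holds (cn, unvoiced_noncn, voiced_noncn) uplink counts.
    """
    states = []
    for i in range(len(per_second)):
        window = per_second[max(0, i - wnd + 1): i + 1]
        cn = sum(w[0] for w in window)
        unvoiced = sum(w[1] for w in window)
        voiced = sum(w[2] for w in window)
        vp = voiced_percentage(cn, unvoiced, voiced)
        states.append("talking" if vp > threshold else "listening")
    return states


# Made-up trace: 4 s of speech, a 1 s pause, 3 s of speech, then 5 s of silence.
trace = [(0, 0, 50)] * 4 + [(10, 2, 0)] + [(0, 0, 50)] * 3 + [(8, 0, 0)] * 5
print(mavg_states(trace))
```

In this sketch the one-second pause does not flip the state, because voiced packets still dominate the window. The price is a short lag: the switch to "listening" is reported two seconds after the speech actually stops.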

In summary, we determine the time span of a call conversation as follows. The conversation begins when the "talking" or "listening" state is identified for the first time; this is more robust than detecting the first PDCP packet over the target DRB. The conversation ends when the last PDCP packet is sent or received over this target DRB. The inferred times may still be slightly inaccurate, because the official establishment or termination of the call occurs via SIP, delivered over SRB, rather than DRB.
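As a minimal sketch of this rule, the helper below takes the first window labeled "talking" or "listening" as the start and the last packet timestamp on the target DRB as the end. The function name, the input layout, and the sample values are assumptions made for illustration.

```python
def conversation_span(window_states, packet_times):
    """Return (start, end) of the conversation; a bound is None if unknown.

    window_states: list of (window_start_time, state) in time order.
    packet_times:  timestamps of PDCP packets seen on the target DRB.
    """
    # Start: the first window whose inferred state is talking or listening.
    start = next(
        (t for t, state in window_states if state in ("talking", "listening")),
        None,
    )
    # End: the last PDCP packet sent or received on the target DRB.
    end = max(packet_times) if packet_times else None
    return start, end


states = [(0.0, "undetermined"), (1.0, "undetermined"),
          (2.0, "listening"), (3.0, "talking")]
packets = [0.2, 0.9, 2.1, 3.5, 22.4]
print(conversation_span(states, packets))  # (2.0, 22.4)
```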

4.2.4 Evaluation on 5G/4G Call Inference

We conducted extensive experiments involving three major US operators, namely OP-I, OP-II,

and OP-III, to evaluate the effectiveness (accuracy) of the proposed side-channel inference techniques

in terms of VoIMS call detection, and call state and time inference. Our tests involved four 4G/5G

smartphones: the Samsung Galaxy S5, LG G3, Google Pixel 3, and Google Pixel 5 (a 5G phone), and

two mainstream VoIMS services, VoLTE and VoNR (limited to Google Pixel 5). Each experiment

setting, which includes the operator, phone model, and VoIMS service, was executed 20 times. In

each run, a victim VoIMS call spanned 30 seconds: 10 seconds for alerting, 10 seconds for talking,

and 10 seconds for listening. The callee answered the incoming call 10 seconds after the ringtone started, after which the caller and callee engaged in a 20-second conversation (10 seconds of talking and 10 seconds of listening). Simultaneously, other

participating smartphones (not the victim) ran various accompanying traffic, including non-VoIMS


Internet applications and VoIMS calls.

We opted for fixed 10-second intervals in the evaluation experiments because they offered ample

time to explore different call state transitions. The evaluation of call inference with varying intervals is discussed in Sections 4.3.1 and 4.3.2 (proof-of-concept attacks).

Table 4.3 illustrates the success of side-channel call inference in both cross-carrier and cross-

phone scenarios. Due to space constraints, we present the results for all operators using the Samsung

S5 and the results for all phone models using OP-III. Our observations include the following:

1. All VoIMS calls and states are reliably detected, and non-VoIMS traffic is never mistakenly

recognized as VoIMS calls.

2. Average time errors remain consistently below 9%, except for the inference using the Samsung

S5 over OP-III. Specifically, the average errors range from 0.3 seconds to 0.57 seconds (1.5% to

2.85%) for conversation (20 seconds), from 0.14 seconds to 0.89 seconds (1.4% to 8.9%) for talking

(10 seconds), and from 0.31 seconds to 0.76 seconds (3.1% to 7.6%) for listening (10 seconds).

3. In experiments involving the Samsung S5 with OP-III, we noted that during a VoIMS call,

comfort noise packets were transmitted at a low rate (i.e., less than or equal to 7 packets per second)

to the cellular infrastructure. This rate is significantly lower than the rate stipulated by the VoIMS

standard (i.e., 50 packets per second) [24, 26, 28, 16]. Consequently, this resulted in longer call

state inference times and higher error rates. This anomaly could be attributed to an implementation

flaw specific to the Samsung S5 and OP-III, as similar issues were not observed with other tested

phone models and operators.
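The percentages in the second observation follow from dividing each average time error by the nominal duration of the corresponding phase (20 seconds of conversation, 10 seconds each of talking and listening). The short check below reproduces that arithmetic; the helper name and constants are introduced here only for illustration.

```python
def relative_error(error_s, nominal_s):
    """Time error as a percentage of the nominal duration."""
    return 100.0 * error_s / nominal_s


CONVERSATION_S = 20  # 10 s talking + 10 s listening
STATE_S = 10         # nominal length of each talking or listening phase

for label, err, nominal in [
    ("conversation, lowest", 0.30, CONVERSATION_S),
    ("conversation, highest", 0.57, CONVERSATION_S),
    ("talking, lowest", 0.14, STATE_S),
    ("talking, highest", 0.89, STATE_S),
    ("listening, lowest", 0.31, STATE_S),
    ("listening, highest", 0.76, STATE_S),
]:
    print(f"{label}: {relative_error(err, nominal):.2f}%")
```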

4.3 Proof-of-concept Attacks

In the following sections, we introduce several proof-of-concept attacks that leverage the inferred

VoIMS call state information. We start with passive attacks that reveal more sensitive information

beyond call states, such as the caller’s identity (§4.3.1). Subsequently, we present an active attack

that utilizes the inferred call information to overshadow a selected voice call and mute the victim at

specific times (§4.3.2).


4.3.1 Who is Calling?

The first passive attack aims to identify the caller and connect her two distinct identities: the

user identity (e.g., name and phone number) and the cellular identity (e.g., the International Mobile Subscriber Identity, IMSI; the Radio Network Temporary Identifier, RNTI; and the Globally Unique Temporary Identity, GUTI). If an attacker can successfully link a mobile user’s user and cellular

identities, it opens the door to powerful cellular-identity-based cyberattacks, such as IMSI-based

Denial-of-Service (DoS) attacks [141], which can be targeted at specific high-value victims instead

of random individuals.

Numerous studies, including [56, 97], have demonstrated how adversaries can easily acquire user identities (including names and phone numbers) of mobile users through paid online services (e.g., PeopleLooker [10]), social network platforms, and data breaches from online service providers [11]. Regarding cellular identities, several methods [86, 77, 88] have been proposed to infer them or link them to each other (e.g., by forcing a device to transmit its IMSI or linking RNTIs to an IMSI [88]). However, no existing study presents a stealthy method to link a mobile user’s user identity to their cellular identity. While some research suggests that such linkage is feasible, the proposed techniques are not covert. For instance, Hussain et al. [77] require making multiple calls to the victim while already knowing the victim’s phone number. Note that compromising the carrier’s infrastructure is outside the threat model outlined in Section 4.1.

We thus developed a novel attack called "Cross-domain Identity Linkage attack" or "CrossIL,"

which leverages precise call state inference and correlates inferred call states with related visual data

extracted from the visualization domain (such as video recordings). The core idea of this attack is

illustrated in Figure 4.5.

The motivation for the CrossIL attack arises from two key factors in the visualization domain:

Distinct User Postures: Users typically exhibit different postures when using VoIMS services,

such as holding a phone next to their ear. This behavior contrasts with how they interact with other

mobile services, like Internet surfing or texting. To validate this, we conducted an online survey

with college students. Among the 83 collected responses, 53 participants preferred to place their


Figure 4.5 Overview of a user identity linkage attack.

phones near their ears during phone calls in public, while 30 participants did not (e.g., they used

earphones).

Face Recognition Advances: Face recognition techniques have become increasingly sophisticated,

allowing adversaries to recognize people’s faces with high accuracy (greater than 90%) in video

frames, even when the faces are small or appear in tiny frames. Once a face is recognized, adversaries

can use reverse image search engines (e.g., PimEyes [13]) to find the owner’s name and then obtain

their phone number through paid online services (e.g., PeopleLooker and Spokeo).

It’s important to note that this attack involves deploying cameras and sniffers in a specific area

(e.g., an airport or subway) to identify potential victims making calls on the spot. The attack does

not target pre-selected individuals. Once eligible victims are identified, adversaries can launch

cellular-identity-based attacks (e.g., IMSI-based Denial-of-Service [141]) against high-value victims

only, either immediately or at a later time, instead of randomly selecting targets, which is an easier

but less effective approach.

Practicality of the Attack. Critics might raise concerns about the practicality of the CrossIL attack,

specifically regarding three issues: (1) Deployment of Devices: Adversaries need to install cellular

radio sniffers and surveillance cameras in public areas, which could potentially be discovered. (2)

Multiple Users Making Calls: Multiple users might have VoIMS calls at similar times, making it


challenging to distinguish between them. (3) Lack of VoIMS Users: There may be no VoIMS users

with ongoing calls during surveillance, making the attack less practical. However, we believe that

these issues can be addressed without significant technical challenges: Modern hidden cameras [12]

are discreet, have extended battery life, and offer ample storage capacity. Portable sniffers can cover

more than 1 km [86], enabling covert operations. Our precise call state inference mechanism can

differentiate between multiple VoIMS users with concurrent calls. The CrossIL attack specifically

targets VoIMS users, so adversaries can strategically deploy the sniffers and cameras in selected

public locations where phone calls are frequent, such as airports and hotel lobbies. With a high

volume of people passing through during the day, it is likely that individuals making calls will be

observed within the surveillance coverage.

Attack design. The high-level concept of this attack involves gathering cellular identities from radio

traces (via VoIMSAnalyzer) and user identities from the visualization domain (via VideoAnalyzer),

and then linking them by correlating the victim’s call states and related motions. The primary

challenge in launching this attack lies in accurately detecting when a voice conversation starts from

recorded videos. Specifically, the main difficulty arises from distinguishing between the following

two scenarios: (1) the user initiates an outgoing call and waits for the called party to answer; and (2) the user answers an incoming call and listens to the caller without speaking at all. In both of these scenarios,

the user exhibits similar behaviors: they move their phone to their ear and have no lip motions for a

period. This issue leads to notable inference errors in call state determination from the visualization

domain, significantly reducing the effectiveness of the attack. To address this challenge, we have

devised a novel approach called "cross-domain indeterministic call state correlation." This approach

introduces an indeterministic state L’ to account for these two scenarios. We will now elaborate on

its three key components.

◦ VoIMSAnalyzer. The new functionality introduced for this attack involves extracting the

cellular radio identity of each VoIMS call, such as C-RNTI, IMSIs, and TMSIs [30, 36]. In other

words, VoIMSAnalyzer is now capable of not only discovering cellular identities but also inferring

the corresponding VoIMS call states (e.g., talking and listening times) from encrypted radio traces.


Figure 4.6 Three steps for cross-domain identity linkage.

◦ VideoAnalyzer. The attack capitalizes on the increasingly mature face recognition techniques

that can successfully recognize individuals with a high accuracy (exceeding 90%) in video frames,

even when dealing with small or tiny faces [74, 95, 116, 73]. Furthermore, it takes advantage

of public image search engines designed to identify people using facial images. For instance,

PimEyes.com, as of August 2022, boasts a database containing more than 2.1 billion unique

faces [13]. VideoAnalyzer’s methodology involves the extraction of call-related motions specific to

each user’s identity. This can include actions like moving a phone closer to or further away from

an ear. These extracted motions are then utilized to generate estimated call statistics from video

recordings.

VideoAnalyzer comprises three distinct sub-modules: (1) Call Motion Detector: This component

is responsible for detecting two voice-call-specific human activities—moving a phone close to

and away from an ear. These actions are used to identify the start and end times of phone calls,

respectively. (2) Lip Motion Detector: The Lip Motion Detector serves the purpose of identifying

the start and end times of each talking or listening interval. It achieves this by analyzing human

lip motions, employing a recurrent neural network (RNN) model [115]. (3) Face Detector and

Recognizer: This module is in charge of locating each user’s face in video frames and identifying

their corresponding user identity, such as their name. It utilizes the Dual Shot Face Detector

(DSFD) [93] for detecting human faces and the ResNet50 [72] model for recognizing these faces. For

every identified user identity, VideoAnalyzer produces output detailing the start and end times of

each call conversation, along with the talking and listening time intervals, which may be interleaved.


[Figure 4.6 panels: Overlap Detection, Video Candidates, Score Table.]

◦ Cross-domain identity linkage. This component associates a cellular radio identity with a user

identity by correlating their corresponding call event sequences generated by VoIMSAnalyzer and

VideoAnalyzer. The process of cross-domain indeterministic call state correlation is illustrated in

three steps, as depicted in Figure 4.6.

Step 1. Given a video-induced call record produced by VideoAnalyzer, the correlator searches

through all radio-induced call records and checks whether any of them overlap with it based on their

call start and end times.

Step 2. Due to the indeterminacy of L’, which indicates listening to the other call party or waiting

for the called party to answer, the video-induced call event sequence, designated as 𝐶𝐸 𝑆𝑒𝑞𝑣𝑖𝑑𝑒𝑜,

is not in a deterministic form. We thus expand 𝐶𝐸 𝑆𝑒𝑞𝑣𝑖𝑑𝑒𝑜 to multiple deterministic call event

sequences by exploring all possible states of L’ in practice. For example, a video-induced call event

sequence, "S, L’, L’, T, T, L, E," can be expanded into three sequences: (1) "S, L, L, T, T, L, E"; (2)

"S, L, T, T, L, E"; and (3) "S, T, T, L, E," which outputs all possible call event sequences.

Step 3. We calculate matching scores between each of the radio-induced call event sequences and

all the sequences expanded from the given video-induced call event sequence; the correlation with

the highest matching score is chosen. We calculate the Edit distance (i.e., Levenshtein distance [87]), which quantifies the similarity between two strings, between two selected call event sequences and obtain their matching score as

$$1 - \frac{\text{Edit distance}}{|\text{Longest call event sequence}|}.$$

For example, the Edit distance between "S, T, T, L, E" and "S, L, T, T, L, E" is 1, and their matching score is 0.83 ($1 - \frac{1}{6}$).
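The three steps above can be sketched in Python as follows. This is a hedged illustration, not the Correlator's actual code: all function names are hypothetical, "S" and "E" are presumed to mark call start and end, and the expansion rule is inferred from the single worked example (a contiguous run of L' events in which "waiting" can only precede "listening").

```python
def overlaps(a, b):
    """Step 1: do two (start, end) call intervals overlap in time?"""
    return a[0] <= b[1] and b[0] <= a[1]


def expand(seq):
    """Step 2: expand one contiguous run of indeterministic L' events.

    Each L' is either "waiting for an answer" (dropped) or "listening" (L),
    so a run of n L' events yields n + 1 deterministic sequences.
    """
    seq = list(seq)
    if "L'" not in seq:
        return [seq]
    first = seq.index("L'")
    last = first
    while last + 1 < len(seq) and seq[last + 1] == "L'":
        last += 1
    run = last - first + 1
    return [seq[:first] + ["L"] * (run - waiting) + seq[last + 1:]
            for waiting in range(run + 1)]


def edit_distance(a, b):
    """Levenshtein distance between two event sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution or match
        prev = cur
    return prev[-1]


def matching_score(a, b):
    """Step 3: 1 - edit distance / length of the longer sequence."""
    return 1 - edit_distance(a, b) / max(len(a), len(b))


def best_match(video_seq, radio_seqs):
    """Return (radio_id, score) for the radio record with the highest score."""
    scored = {
        rid: max(matching_score(cand, rseq) for cand in expand(video_seq))
        for rid, rseq in radio_seqs.items()
    }
    best = max(scored, key=scored.get)
    return best, scored[best]


video = ["S", "L'", "L'", "T", "T", "L", "E"]
radio = {"R1": "S T T L E".split(), "R2": "S T L T L T E".split()}
print(expand(video))
print(best_match(video, radio))  # ('R1', 1.0)
```

In a fuller pipeline, `overlaps` would first narrow the radio records to those whose call interval intersects the video record, and `best_match` would then run only on the survivors.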

Attack implementation. In addition to VoIMSAnalyzer, we implement VideoAnalyzer using Python3 on HPCC servers with the following libraries: Keras (Mask R-CNN and RNN lip movement model), PyTorch (Dual Shot Face Detector), keras_vggface (ResNet50), scipy (cosine similarity), and cv2. Notably, we did not need to collect a large-scale dataset, since all models used were pre-trained [71]. For example, Mask R-CNN had been trained on the COCO dataset with 80K training images [71] and validated on 35K test images; ResNet50 had been trained with 1.28 million training images from ImageNet and evaluated on 50K test images. Correlator was implemented in Python3 using timestamps recorded in radio traces and videos.


Attack evaluation. The attack evaluation is performed in both controlled (without passersby) and

wild (with passersby) environments. The controlled experiment was conducted in a classroom,

where only experiment participants were on campus during holidays, whereas the wild experiment

was carried out in the lobby of a dormitory with passersby. There were 7 participants, and each of

them was required to freely dial/receive 15 VoIMS calls within two hours under the surveillance of

two cameras (iPhone 12). The participants could make phone calls simultaneously. In the experiment,

we gauged the inference accuracy in terms of call start time, call end time, talking/listening times,

and the association between the cellular and user identities. To obtain the ground truth, we not

only collected cellular radio traces and recorded videos but also logged VoIMS call events from the

participants’ smartphones using the Android logcat.

Table 4.4 summarizes the experimental results. We have four observations. First, the success

rates of linking cellular identities to user identities are 59/60 and 40/45 in the controlled and wild environments, respectively. Such high success rates are achieved even when VideoAnalyzer has up to 17.3% error

in estimating talking and listening times, since Correlator employs multiple call states, instead of

only talking and listening times. Second, most estimation errors from VideoAnalyzer in the wild

environment are noticeably larger than those in the controlled environment. There are two reasons:

(1) cameras were occasionally blocked by passersby (1/45 phone calls), and (2) the brightness of

natural light is not always stable; specifically, 14/45 phone calls experienced short-time (a few

seconds) underexposure/overexposure issues. To address this issue, adversaries may deploy multiple

hidden surveillance cameras to reduce potential interference and noise, such as when a victim’s face

is blocked by passersby. Third, VideoAnalyzer can precisely recognize the faces of participants for

all the VoIMS calls and then discover their names from our database. Fourth, VoIMSAnalyzer in

the controlled experiment has similar errors in estimating call start and end times as in the wild

experiment, whereas its estimation on talking and listening times in the wild setting has higher errors

(2.8%∼6.5%) than that in the controlled setting (2.8%∼4.9%). The reason is that the background

noise of the wild environment is larger than that of the controlled one.

Current prototype limitations. While our proposed attack demonstrates effectiveness in our


Module         Performance Metrics               Controlled Settings                                         Wild Settings
                                                 User1          User2          User3          User4          User5          User6          User7
RadioAnalyzer  Call Events Time Estimation
                 Call start error                0.92s          0.32s          0.85s          0.85s          1.20s          1.43s          1.60s
                 Call end error                  0.18s          0.27s          0.32s          0.37s          0.32s          0.15s          0.28s
                 Talking & listening time error  1.8s (4.6%)    1.4s (2.8%)    2.3s (4.9%)    1.6s (3.9%)    1.5s (2.8%)    2.88s (6.5%)   2.65s (6.3%)
VideoAnalyzer  Call Events Time Estimation
                 Call start error                1.87s          2.53s          1.99s          2.0s           6.42s          3.09s          2.85s
                 Call end error                  2.15s          3.74s          3.97s          2.14s          3.01s          4.95s          1.18s
                 Talking & listening time error  3.2s (8.2%)    4.12s (8.3%)   2.15s (4.6%)   2.23s (5.4%)   9.25s (17.3%)  6.67s (15.1%)  4.37s (10.4%)
               Face Recognition
                 Accuracy                        100%           100%           100%           100%           100%           100%           100%
Correlator     Cellular and User ID Linkage
                 Accuracy                        100% (15/15)   93.3% (14/15)  100% (15/15)   100% (15/15)   86.7% (13/15)  86.7% (13/15)  93.3% (14/15)

Table 4.4 Summary of cross-domain identity linkage attack performance.

experiments, the current prototype has several limitations: (1) it requires video recordings at a resolution of 1080p or higher; (2) it cannot reliably recognize skewed, crooked, or blurred faces; (3) it does not work against users who are on video calls or using earphones; and (4) it only considers locations with good cellular signals. Various techniques can be employed to address these shortcomings. For instance, employing tiny face recognition techniques

as presented in studies such as [82, 70, 74, 95] can facilitate the recognition of small faces within

low-resolution video recordings. To tackle skewed and crooked face recognition, methods outlined

in [128, 94, 96] can be implemented. Furthermore, techniques from studies like [107, 100] can

be applied to recognize blurred faces and address similar issues. We defer the exploration and

implementation of these potential improvements to our future work.

Limited to 4G? It might be argued that 5G users are immune to the proposed attacks since 5G

mobile devices do not transmit the permanent cellular identity (e.g., IMSI) in plaintext but rather in

ciphertext, known as Subscription Concealed Identifier (SUCI), making it impossible to learn the

cellular identity. However, this may not be the case, as some researchers have demonstrated network

downgrade attacks [83] capable of downgrading 5G mobile devices to legacy 4G networks.


Mapping RNTIs to IMSIs in a Short Time. While our sniffer is designed for long-term traffic

sniffing, prior studies have shown that adversaries can compel a mobile device to transmit its IMSI

to the cellular infrastructure using a fake EMM Service Request message [61]. Subsequently, they

can continuously correlate all RNTIs assigned to this device [89].

4.3.2 Selective Voice Muting Attack

We have developed an active attack aimed at selectively muting one of the call parties based

on inferred call states. This attack falls under the category of a Denial-of-Service (DoS) attack.

However, it differentiates itself from typical jamming attacks that disrupt calls by continuously

transmitting wireless noise indiscriminately. Such jamming attacks often result in degraded channel

quality, which can be easily detected through physical-layer performance metrics, including bit error

rate (BER) and signal-to-noise ratio (SNR) [52].

In contrast, our attack operates by strategically overpowering the victim’s voice packets with

stronger signals when necessary, particularly when the victim is actively engaged in conversation

(i.e., when the victim is talking). It accomplishes this by transmitting valid PDCP (Packet Data

Convergence Protocol) packets through the victim’s assigned uplink channels at specific times.

Consequently, the remote call party becomes unable to hear the victim’s voice, potentially leading

to call termination after a prolonged silent period.

In comparison to conventional jamming attacks, such as Jammer-V [113], which can be readily

detected through abnormal fluctuations in physical-layer performance metrics like BER and SNR,

our proposed attack offers a dual advantage. Not only does it manage to evade detection effectively,

but it also optimizes the attack cost by transmitting PDCP packets only during moments when the

victims are actively engaged in conversation.


Attack design. This attack leverages a cellular sniffer and a VoIMS analyzer previously developed

for other attacks. The key modification is the reduction of the inference window from several

seconds to 200 ms, enabling real-time attack execution. Correspondingly, the inference threshold

has been adjusted: if the percentage of voice packets surpasses 40%, it is inferred as the talking

state; otherwise, it is inferred as the listening state. When the victim transitions into the talking

state, this component signals the voice muter to initiate the overshadowing attack until the victim

returns to the listening state.

The attack aims to obscure the victim’s uplink voice packets through the use of the Uplink Voice

Muter. With access to the victim’s C-RNTI (Cell Radio Network Temporary Identifier) and uplink

control information, this component generates valid packets (e.g., random Internet Control Message

Protocol (ICMP) packets) and transmits them over the physical uplink channels allocated to the

victim, employing stronger signals (e.g., 3 dB higher [141]).

Attack implementation. The attack is implemented on a Software-Defined Radio (SDR) platform

using srsRAN (v20.10.1) [14], facilitating connectivity with the operational cellular network. It

monitors the Physical Downlink Control Channel (PDCCH) to gather uplink and downlink control


Figure 4.7 The selective muting attack only overshadows Alice’s uplink voice data when she is
talking.

information (DCI) [29] from nearby cellular devices. Upon detecting the talking state of the target

victim, both the C-RNTI and uplink control information are relayed to the uplink voice muter, which

proceeds to overshadow the victim’s uplink signals. This is achieved through the adjustment of

parameters such as transmission gain (tx_gain) and receiver gain (rx_gain) within the ue.conf

file, in a manner similar to what was demonstrated in [141].

Attack evaluation. We assess the efficacy of the proposed attack, referred to as SelectiveMuter, by

conducting a comparative analysis against Jammer-V, an attack method utilizing state-of-the-art

techniques [113] to disrupt VoIMS calls. Our experimental setup involves 30 participants, comprising

4 callers and 26 callees. Each callee is paired with a caller. The callees are categorized into two

groups: "friends" and "strangers." In the first group, the assigned caller’s phone number is known to

the callee, while in the second group, it remains unknown. The experiments span a duration of two

weeks.

For each caller, a minimum of three voice calls are placed to their assigned callees, under the

conditions of both the SelectiveMuter and Jammer-V attacks. The muting attacks are executed against approximately two-thirds of the outgoing calls, with SelectiveMuter and Jammer-V methods

being employed in equal measure.

Figure 4.7 illustrates the operation of SelectiveMuter in a specific instance. Meanwhile, Figure 4.8 compares the two attacks using the "attack time ratio" metric, defined as $\frac{\text{Overshadowing time}}{\text{Call time}}$. Our

Figure 4.8 Boxplot of attack time ratios (ATRs) in three groups: Stranger, Friend and All (Stranger
+ Friend).

analysis yields three key findings. First, all attack instances, whether SelectiveMuter or Jammer-V, achieve a 100% success rate. Across a total of 26 instances each, calls are terminated, on average, within a timeframe ranging from 6.2 seconds (strangers) to 16.8 seconds (friends). Second, SelectiveMuter exhibits higher efficiency, reducing the median attack time ratio by 59.9% to 61.8%. This improvement is attributed to Jammer-V overshadowing all uplink cellular signals of the victims, regardless of whether they are actively speaking or listening. In contrast, SelectiveMuter generates attack signals solely during the victim’s talking phase, as demonstrated in Figure 4.7.

Third, the attack time ratio is slightly higher in the friend group due to the callers engaging in more

conversation, and the callees not promptly terminating the call, even when voice communication is

disrupted. When combined with the previously mentioned passive attacks, this active attack can

compound the extent of damage.
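The attack time ratio and the relative reduction of its median can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the sample durations are made-up values rather than measurements from our experiments.

```python
from statistics import median


def attack_time_ratio(overshadow_s: float, call_s: float) -> float:
    """Fraction of the call during which the attacker transmits."""
    if call_s <= 0:
        raise ValueError("call duration must be positive")
    return overshadow_s / call_s


def median_reduction(baseline_atrs, improved_atrs) -> float:
    """Relative drop of the median ATR from the baseline attack to the improved one."""
    base = median(baseline_atrs)
    return (base - median(improved_atrs)) / base


# Made-up (overshadowing time, call time) pairs in seconds, NOT data from the study.
jammer_atrs = [attack_time_ratio(o, c) for o, c in [(10.0, 10.0), (8.0, 8.0), (12.0, 12.0)]]
muter_atrs = [attack_time_ratio(o, c) for o, c in [(4.0, 10.0), (3.0, 8.0), (6.0, 12.0)]]

print(f"median ATR reduction: {median_reduction(jammer_atrs, muter_atrs):.1%}")
```

A lower ATR means the attacker transmits for a smaller share of the call, which is why an attack that only overshadows the victim's talking phase scores better on this metric.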

4.4 Discussion

The real-world implications of side-channel VoIMS call inference extend beyond the proof-of-

concept attacks presented earlier.

Exploring Further Attacks. Precise call state inference can serve to augment existing research in

sociology and linguistics, where call state information is used to infer user profiles (e.g., residents,

commuters, visitors [62, 122]), personality traits (e.g., extroversion, agreeableness [117, 103]),

and dominant partners [63]. Additionally, it can be leveraged to deduce user behaviors (e.g.,

spam) [124, 125] and social interactions [137]. Furthermore, outgoing and incoming phone call

patterns can be instrumental in inferring individual wealth [119, 63]. Steele et al. [119] illustrated

that these call patterns provide valuable information for inferring the poverty and wealth of call

parties. The stealthy monitoring and inference of voice calls raise additional privacy concerns in

this context.

For instance, side-channel VoIMS inference can be exploited in the context of robocalls or

Interactive Voice Response (IVR) systems, such as those used by banks or customer service lines.

Since IVR systems employ pre-recorded or synthesized voices, they generate distinctive audio

fingerprints. By leveraging fine-grained call state inference, adversaries could differentiate robocalls

from human conversations and potentially determine whether the victim is interacting with automated

services. This includes inferring whether a user is contacting a bank or specific company, raising

significant concerns about the leakage of sensitive personal activities and affiliations. While these

attacks may be bypassed by non-VoIMS apps like Skype, they are effective in scenarios where

the phone number serves as the sole means of contact between the two call parties, such as in

business-related calls.

4.5 Solution

In this section, we present a standard-compliant solution to prevent VoIMS call states from

being inferred, and then assess its effectiveness. At first glance, the above vulnerabilities could

be solved by padding VoIMS packets to remove the unique features of the exploitable packets

described previously (i.e., the tiny compressed comfort noise packets and the distinguishable

voiced/unvoiced packets in Section 4.2). For example, compressed comfort noise packets could be

padded so that their sizes are at least 21 bytes (i.e., the smallest IP packet with a one-byte

payload).

However, the conventional padding-based solution has two negative real-world impacts: (1)

users may need to pay for the additional padding when operators charge the users by the volume of

data transmitted/received, and (2) the IMS media gateways (as shown in Figure 4.9) have to remove

the padding from VoIMS packets before forwarding them to the call recipients, thereby significantly

increasing the load on the IMS media gateways when considering a large number of voice calls¹.

¹According to [104], Verizon customers make 800 million wireless calls per day, which is more than double the
population of the United States.

Figure 4.9 Singular Rectifier (SR) at the PDCP layer.

Prototype. We thus propose to develop a singular rectifier that adds and removes necessary padding

at the PDCP layer of phones and base stations, as shown in Figure 4.9. Such a design can address

the aforementioned two issues. First, users do not need to pay for the additional padding, which

does not reach the core network and is not counted by the charging function. Second, the load of

handling the padding is distributed among front-end base stations, thereby preventing the overhead

from being imposed on the IMS media gateways.

We use srsUE [7], srsLTE [8], Open IMS Core [6], and UCT IMS Client 1.0.14 [126] to serve

as the 4G UE, the 4G infrastructure, the IMS core with a VoLTE server, and the VoLTE client app,

respectively. We implement SR@PDCP by modifying the PDCP layer at both the UE and the eNB

to handle the necessary padding of VoIMS packets. In particular, we pad every PDCP packet whose

payload is smaller than 20 bytes up to 42 bytes, a size that is commonly observed in non-VoIMS

applications. The inserted padding is removed at the PDCP layer of the receiving side (the UE or

the eNB) before the corresponding packet is forwarded to the upper layer or the next network

element (e.g., the 4G gateway or the 5G UPF).
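The add-and-remove logic of SR@PDCP can be sketched as follows. This is a minimal illustration rather than our srsUE/srsLTE modification: the 20-byte threshold and the 42-byte target come from the description above, whereas the one-byte marker that tells the receiver how much padding to strip, the NOT_PADDED value, and both function names are assumptions introduced only for this example and are not part of the 3GPP PDCP specification.

```python
PAD_THRESHOLD = 20   # payloads shorter than this are padded (from the text)
PADDED_SIZE = 42     # size of a padded frame in bytes (from the text)
NOT_PADDED = 0xFF    # hypothetical marker value: frame carries no padding


def sr_add_padding(payload: bytes) -> bytes:
    """Sender side: prepend a marker byte and pad short payloads.

    ASSUMPTION: the marker byte is a made-up framing scheme so that the
    receiver can recover the original payload length.
    """
    if len(payload) >= PAD_THRESHOLD:
        return bytes([NOT_PADDED]) + payload
    padding = bytes(PADDED_SIZE - 1 - len(payload))  # zero-filled padding
    return bytes([len(payload)]) + payload + padding


def sr_remove_padding(frame: bytes) -> bytes:
    """Receiver side: strip the marker and any padding before forwarding."""
    if not frame:
        raise ValueError("empty frame")
    marker = frame[0]
    if marker == NOT_PADDED:
        return frame[1:]
    if marker >= PAD_THRESHOLD or len(frame) != PADDED_SIZE:
        raise ValueError("malformed padded frame")
    return frame[1:1 + marker]


# An 11-byte comfort noise payload leaves the sender as a 42-byte frame
# and is restored to its original bytes on the receiving side.
comfort_noise = bytes(range(11))
frame = sr_add_padding(comfort_noise)
restored = sr_remove_padding(frame)
print(len(frame), restored == comfort_noise)
```

Because padding is added and stripped between the UE and the eNB only, the padded size is visible on the air interface but never reaches the core network.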

Evaluation. We evaluate its effectiveness and overhead. We re-run VoIMS call experiments where

a user dials ten VoIMS calls from the srsUE and the callee answers each incoming call immediately.

Each call lasts for 25 seconds with 10s for talking and 15s for listening. The result shows that

VoIMSAnalyzer fails to detect any of the calls. Notably, the 4G core network does not receive

any VoIMS packets with additional padding, so no additional charges are incurred. We evaluate the

solution overhead in terms of CPU usage, memory usage, and processing time. In practice, VoIMS

clients do not keep sending out compressed CN packets during voice calls. Here, we assess the

overhead in an extreme case where the VoIMS client keeps transmitting 11-byte comfort noise

VoIMS packets to the OpenIMS server. Figure 13 shows that the average CPU and RAM usages

slightly increase by 0.72% and 0.02% when the proposed solution is enabled. The average processing

time at the PDCP layer increases by 1.46 µs per packet (from 15.24 µs to 16.70 µs). Thus, the

solution is effective at an acceptable cost.

The domain of 5G/4G voice calls has seen significant

advancements with the introduction of various optimization techniques. These techniques have been

designed to enhance the quality and efficiency of voice calls. However, it is important to recognize

that these enhancements do not come without a cost.

We shed light on the phenomenon of side-channel call inference that arises from the interplay

of these 5G/4G call optimization techniques, which, in turn, gives rise to potential threats against

5G/4G voice calls. One of the key findings of our research is that adversaries have the ability

to accurately infer confidential call information, including the determination of whether a call is

in progress, identification of the individuals engaged in the conversation, and the timing of their

communication, all without the need to decrypt the encrypted packets in transit through the air.

This acquired call information, when exploited, can lead to the launch of both active and passive

attacks against 5G/4G call users. It is essential to emphasize that these optimization techniques

were not designed with the intention of compromising security for the sake of improving call quality.

Instead, the security implications of these techniques are often subtle and result in unanticipated

side effects.

Addressing the new security challenges posed by evolving technologies in mobile networks is a

formidable and ongoing task. This undertaking necessitates collaborative

efforts from all stakeholders, including standardization bodies, network operators, equipment

vendors, and mobile users.

CHAPTER 5

CONCLUSION

In this chapter, we summarize the key findings of our work, highlight the insights and lessons

learned, and outline two promising directions for future research.

5.1 Summary of Results

The rapid evolution of mobile networks—from the early days of circuit-switched systems to

today’s IP-based architectures—has transformed the way multimedia services are delivered. The

IP Multimedia Subsystem (IMS) has emerged as a central framework enabling advanced services

such as voice and video calling, SMS, and emergency communications over 4G and 5G networks.

While IMS has greatly improved flexibility, scalability, and service performance, this dissertation

demonstrates that it also introduces new and underexplored security challenges that threaten both

user privacy and service reliability.

This dissertation presents a comprehensive and systematic security analysis of IMS across two

major dimensions: mobile devices and network infrastructure.

Securing IP Multimedia Subsystem (IMS) On Mobile Devices. On the device side, we

identify four critical security issues arising from the transition of IMS clients from hardware-based

implementations to software-based applications. These weaknesses expose users to three novel

attacks that enable denial of service, spoofed messaging, and covert media hijacking. Our evaluation,

conducted across multiple devices and carriers in two countries, confirms that these threats are real

and impactful.

Enhancing the Privacy of Voice Services over IMS Infrastructure. On the infrastructure side,

we uncover a new class of side-channel vulnerabilities stemming from performance-optimizing

techniques used in 5G/4G voice services. Despite strong encryption protocols, our findings

show that adversaries can infer call states, speaker activity, and user identity simply by analyzing

encrypted traffic patterns. Through these investigations, this dissertation not only reveals previously

undocumented vulnerabilities in IMS-based communication systems but also demonstrates how

emerging technologies, though well-intentioned, can inadvertently compromise security. To that

end, we propose a set of mitigation strategies that offer immediate remedies while also highlighting

the necessity for longer-term collaboration among standards bodies, carriers, device manufacturers,

and users.

5.2 Insights and Lessons

As we move further into the 6G era and beyond, it is critical to proactively reassess the broader

security implications introduced by rapid technological evolution. Design changes—while necessary

for innovation—often introduce unforeseen vulnerabilities. Therefore, security must not be treated

as an afterthought but as a foundational element embedded throughout the entire innovation lifecycle.

As mobile communication systems become increasingly complex and interconnected, securing them

is no longer a one-time effort but a continuous, evolving process. This dissertation offers both

foundational insights and practical solutions to support that ongoing journey toward resilient and

secure network architectures. Our study yields two key insights and lessons.

Mobile device security lags behind the advancements in mobile network infrastructure. This

gap leaves end users more vulnerable to emerging threats. To achieve proper end-to-end protection,

future system designs must extend infrastructure-grade security to the mobile device side.

Privacy must be elevated as a design priority within network architectures such as IMS.

We found that traditional designs focus on performance and on standard security mechanisms such as

authentication and encryption, while overlooking the growing imperative of user

privacy.

5.3 Future Work

Building upon the findings of this dissertation, there are two promising directions for future

research.

Security of Next-Generation 911 (NG911) services on mobile devices. NG911 is an advanced,

IP-based emergency communication system designed to replace legacy analog 911 infrastructure. It

supports richer forms of communication—including voice, text, photos, and videos—enabling more

dynamic interaction with emergency services. Notably, NG911 leverages an IP-based network similar

to the IP Multimedia Subsystem (IMS) for routing traffic from mobile devices. This means that

NG911 may share many of the same vulnerabilities identified in this dissertation. As the emergency

communication ecosystem shifts toward NG911, it is critical to investigate how adversaries might

exploit mobile phones as entry points to interfere with or hijack these life-saving services. Future

work will focus on identifying mobile-device-level threats specific to NG911, understanding their

impact, and proposing defenses that maintain the accessibility and reliability expected of emergency

communication systems.

Tackling privacy leakage in IMS-based robocalls involving interactive voice response (IVR).

Robocalls—automated calls where users interact with computer-operated voice systems—are

increasingly common in sectors like mobile banking and retail customer service. Organizations such

as JPMorgan Chase, Wells Fargo, Walmart, and Best Buy widely employ IVR systems to streamline

user interaction and operational efficiency. In these calls, the IVR system issues pre-recorded or

computer-generated prompts, and users respond through voice commands or keypad inputs. Given

the sensitive nature of many robocall scenarios, especially those tied to financial services, any

potential compromise in call privacy can carry serious consequences. This future work will continue

to examine the security of IMS services within radio access networks, with a focus on detecting

robocalls and, when possible, identifying what private user information is exposed during the conversation.

BIBLIOGRAPHY

[1] 20 android statistics in 2023 (market share and users).
https://www.demandsage.com/android-statistics/, 2023.

[2] Federal Phone Call Recording Law.
https://www.justice.gov/archives/jm/criminal-resource-manual-1050-scope-18-usc-2511-prohibitions, 2020.

[3] srsran 4g with zmq virtual radios.
https://docs.srsran.com/projects/4g/en/latest/app_notes/source/zeromq/source/index.html#zeromq-appnote, 2023.

[4] tc-fw(8) — Linux manual page. https://man7.org/linux/man-pages/man8/tc-fw.8.html.

[5] Internet Protocol, 1981. https://www.rfc-editor.org/info/rfc791.

[6] Openimscore. http://openimscore.sourceforge.net/, 2008.

[7] srsue. https://github.com/domi007/srsUE, 2017.

[8] srsENB. https://github.com/topics/srsenb, 2019.

[9] IMS Profile for Voice and SMS.
https://www.gsma.com/newsroom/wp-content/uploads/IR.92-v15.0-4.pdf, 2020.

[10] Peoplelooker. https://www.peoplelooker.com, 2020.

[11] Yahoo! data breaches. https://en.wikipedia.org/wiki/Yahoo!_data_breaches, 2020.

[12] Mini spy camera 1080p.
https://www.amazon.com/Spy-Camera-Charger-Hidden-Surveillance/dp/B07GCKZKX8/, 2021.

[13] Pimeyes-ceo: The user is the stalker, not the search engine. https://netzpolitik.org/
2022/pimeyes-ceo-the-user-is-the-stalker-not-the-search-engine/, 2022.

[14] srsran 20.10.1. https://github.com/srsran/srsRAN/releases/tag/release_20_10_1, 2022.

[15] Kingroot. https://kingrootapp.net/, Jan 2024.

[16] 3GPP. GSM 06.81: Digital cellular telecommunications system (Phase 2+); Discontinuous

Transmission (DTX) for Enhanced Full Rate (EFR) speech traffic channels, 1999.

[17] 3GPP. TS 22.228: Service requirements for the Internet Protocol (IP) multimedia core

network subsystem (IMS); Stage 1, 2000.

[18] 3GPP. TS 23.101: General UMTS Architecture, 2000.

[19] 3GPP. TS 23.125: Overall high level functionality and architecture impacts of flow based
charging; Stage 2 (Release 7), Jun. 2007. https://portal.3gpp.org/desktopmodules/
Specifications/SpecificationDetails.aspx?specificationId=790.

[20] 3GPP. TS33.328: IP Multimedia Subsystem (IMS) media plane security, Nov. 2018.
https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=2295.

[21] 3GPP. TS 26.449: Codec for Enhanced Voice Services (EVS); Comfort Noise Generation

(CNG) aspects, March 2019. V15.1.0.

[22] 3GPP. TS24.011: Point-to-Point (PP) Short Message Service (SMS) support on mobile radio
interface, Nov. 2019.
https://www.etsi.org/deliver/etsi_ts/124000_124099/124011/15.03.00_60/ts_124011v150300p.pdf.

[23] 3GPP. TS 23.228: IP Multimedia Subsystem (IMS); Stage 2; V16.5, 2020.

[24] 3GPP. TS 26.071: Mandatory speech CODEC speech processing functions; AMR speech

Codec; General description, July 2020. V16.0.0.

[25] 3GPP. TS 26.092: Mandatory speech codec speech processing functions; Adaptive Multi-Rate
(AMR) speech codec; Comfort noise aspects, July 2020. V16.0.0.

[26] 3GPP. TS 26.171: Speech codec speech processing functions; Adaptive Multi-Rate -

Wideband (AMR-WB) speech codec; General description (Release 16), 2020.

[27] 3GPP. TS 26.192: Speech codec speech processing functions; Adaptive Multi-Rate -

Wideband (AMR-WB) speech codec; Comfort noise aspects, July 2020. V16.0.0.

[28] 3GPP. TS 26.441: Codec for Enhanced Voice Services (EVS); General Overview, 2020.

(V16.0.0).

[29] 3GPP. TS 36.213: Evolved Universal Terrestrial Radio Access (E-UTRA); Physical layer

procedures, 2020.

[30] 3GPP. TS 36.300: Technical Specification Group Radio Access Network; Evolved Universal
Terrestrial Radio Access (E-UTRA) and Evolved Universal Terrestrial Radio Access Network
(E-UTRAN); Overall description; Stage 2, 2020.

[31] 3GPP. TS 36.306: Evolved Universal Terrestrial Radio Access (E-UTRA); User Equipment

(UE) radio access capabilities(Release 13), 2020.

[32] 3GPP. TS 36.331: Technical Specification Group Radio Access Network; Evolved Universal
Terrestrial Radio Access (E-UTRA); Radio Resource Control (RRC); Protocol specification,
2020.

[33] 3GPP. TS 37.324: LTE; 5G; Evolved Universal Terrestrial Radio Access (E-UTRA) and NR;
Service Data Adaptation Protocol (SDAP) specification, 2020.

[34] 3GPP. TS 23.203: Policy and charging control architecture, Mar. 2021.
https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=810.

[35] 3GPP. TS 24.341: Support of SMS over IP networks; Stage 3, 2021.

[36] 3GPP. TS 38.300: Technical Specification Group Radio Access Network; NR; NR and

NG-RAN Overall Description; Stage 2 (Release 16) , 2021.

[37] 3GPP. TS 38.306: NR; User Equipment (UE) radio access capabilities (Release 16), 2021.

[38] 3GPP. TS 23.205: Bearer-independent circuit-switched core network, 2022.

[39] 3GPP. TS 26.139: Real-time Transport Protocol (RTP) / RTP Control Protocol (RTCP)
verification procedures (Release 17), Apr. 2022.
https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=3709.

[40] 3GPP. TS 33.102: 3G security; Security architecture, March 2022. V17.0.0.

[41] 3GPP. TS 33.203: 3G security; Access security for IP-based services (Release 17), Mar. 2022.
https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=1055.

[42] 3GPP. TS 33.210: Network Domain Security (NDS); IP network layer security, Sep. 2022.

V17.1.0.

[43] 3GPP. TS 33.401: 3GPP System Architecture Evolution (SAE); Security architecture
(Release 17), Sep. 2022.
https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=2296.

[44] 3GPP. TS 33.501: Security architecture and procedures for 5G System (Release 18), Mar. 2022.
https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=3169.

[45] 3GPP. TS 48.016: General Packet Radio Service (GPRS), 2022.

[46] 3GPP. TS 36.323: Evolved Universal Terrestrial Radio Access (E-UTRA); Packet Data
Convergence Protocol (PDCP) specification, Mar. 2023.
https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=2439.

[47] 3GPP. TS 38.323: NR; Packet Data Convergence Protocol (PDCP) specification, Mar. 2023.
https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=3196.

[48] 3GPP. 3GPP Portal. https://portal.3gpp.org/#/55934-releases, 2023.

[49] 3GPP. TS 24.008: Mobile radio interface Layer 3 specification; Core network protocols;
Stage 3 (Release 18), Apr. 2023.
https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=1015.

[50] 3GPP. TS 24.229: IP multimedia call control protocol based on Session Initiation
Protocol (SIP) and Session Description Protocol (SDP); Stage 3 (Release 18), Apr. 2023.
https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=1055.

[51] 3GPP2. 3GPP2 C.S0015-A: Short Message Service (SMS) for Wideband Spread Spectrum
Systems Release A, Sep. 2004.
https://www.3gpp2.org/Public_html/Specs/C.S0015-A_v2.0_051006.pdf.

[52] Youness Arjoune and Saleh Faruque. Smart jamming attacks in 5g new radio: A review.
In 2020 10th Annual Computing and Communication Workshop and Conference (CCWC),
pages 1010–1015, 2020.

[53] Jaejong Baek, Sukwha Kyung, Haehyun Cho, Ziming Zhao, Yan Shoshitaishvili, Adam
Doupé, and Gail-Joon Ahn. Wi not calling: Practical privacy and availability attacks in wi-fi
calling. In Proceedings of the 34th Annual Computer Security Applications Conference,
pages 278–288, 2018.

[54] Evangelos Bitsikas and Christina Pöpper. You have been warned: Abusing 5g’s warning and
emergency systems. In Proceedings of the 38th Annual Computer Security Applications
Conference, pages 561–575, 2022.

[55] Fabio Cecchinato, Lorenzo Vangelista, Giulio Biondo, and Mauro Franchin. Anomaly
detection using lstm neural networks: an application to voip traffic. In 2021 IEEE International
Conference on Recent Advances in Systems Science and Engineering (RASSE), pages 1–7,
2021.

[56] Xiaolin Chen, Xuemeng Song, Guozhen Peng, Shanshan Feng, and Liqiang Nie. Adversarial-

enhanced hybrid graph network for user identity linkage. In ACM SIGIR’21.

[57] Hyungjin Cho, Seongmin Park, Youngkwon Park, Bomin Choi, Dowon Kim, and Kangbin
Yim. Analysis against security issues of voice over 5g. IEICE TRANSACTIONS on
Information and Systems, 104(11):1850–1856, 2021.

[58] Zhiwei Cui, Baojiang Cui, Junsong Fu, and Renhai Dong. Security threats to voice services

in 5g standalone networks. Security and Communication Networks, 2022, 2022.

[59] Haotian Deng, Weicheng Wang, and Chunyi Peng. Ceive: Combating caller id spoofing on 4g
mobile phones via callee-only inference and verification. In Proceedings of the 24th Annual
International Conference on Mobile Computing and Networking, pages 369–384, 2018.

[60] Ericsson. Voice and communications services trends and outlook, 2023.

[61] Simon Erni, Martin Kotuliak, Patrick Leu, Marc Roeschlin, and Srdjan Capkun. Adaptover:
adaptive overshadowing attacks in cellular networks. In Proceedings of the 28th Annual
International Conference on Mobile Computing And Networking, pages 743–755, 2022.

[62] Barbara Furletti, Lorenzo Gabrielli, Chiara Renso, and Salvatore Rinzivillo. Identifying users

profiles from mobile calls habits. In ACM SIGKDD’12.

[63] Julia A Goldberg. Interrupting the discourse on interruptions: An analysis in terms of
relationally neutral, power-and rapport-oriented acts. Journal of Pragmatics, 1990.

[64] Google. Android security paper 2023.
https://blog.google/products/android-enterprise/android-security-paper-2023/, Jan 2023.

[65] GSMA. Volte market update.
https://www.gsma.com/services/wp-content/uploads/2022/04/GSMAi-VoLTE-Market-Update-final.pdf.

[66] GSMA. Ims profile for voice and sms. version 13.0. https://www.gsma.com/newsroom/

wp-content/uploads//IR.92-v13.0-2-1.pdf, 2019.

[67] GSMA. RCS Universal Profile Service Definition Document , Oct. 2019. https://www.
gsma.com/futurenetworks/wp-content/uploads/2019/10/RCC.71-v2.4.pdf.

[68] GSMA. Rich Communication Suite - Advanced Communications Services and Client Specifi-
cation, Oct. 2019. https://www.gsma.com/solutions-and-impact/technologies/
networks/wp-content/uploads/2019/10/RCC.07-v11.0.pdf.

[69] GSMA. Ims profile for voice and sms version 15.0. https://www.gsma.com/newsroom/

wp-content/uploads//IR.92-v15.0-4.pdf, 2020.

[70] Mohammad Haghighat and Mohamed Abdel-Mottaleb. Low resolution face recognition in
surveillance systems using discriminant correlation analysis. In 2017 12th IEEE International
Conference on Automatic Face & Gesture Recognition (FG 2017), pages 912–917. IEEE,
2017.

[71] Kaiming He, Georgia Gkioxari, Piotr Dollár, et al. Mask r-cnn. In IEEE ICCV’17.

[72] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image

recognition. In IEEE CVPR’16.

[73] Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias
Weyand, Marco Andreetto, and Hartwig Adam. Mobilenets: Efficient convolutional neural
networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.

[74] Peiyun Hu and Deva Ramanan. Finding tiny faces. In IEEE CVPR’17.

[75] Yiwen Hu, Min-Yue Chen, et al. Uncovering insecure designs of cellular emergency services

(911). In ACM Mobicom’22.

[76] Yiwen Hu, Min-Yue Chen, Guan-Hua Tu, Chi-Yu Li, Sihan Wang, Jingwen Shi, Tian Xie,
Li Xiao, Chunyi Peng, Zhaowei Tan, and Songwu Lu. Uncovering insecure designs of cellular
emergency services (911). In Proceedings of the 28th Annual International Conference on
Mobile Computing And Networking, MobiCom ’22, page 703–715, New York, NY, USA,
2022. Association for Computing Machinery.

[77] Syed Rafiul Hussain et al. Privacy attacks to the 4g and 5g cellular paging protocols using

side channel information. In NDSS’19.

[78] IETF. RObust Header Compression (ROHC): Framework and four profiles: RTP, UDP, ESP,
and uncompressed. https://datatracker.ietf.org/doc/html/rfc3095, 2001.

[79] IETF. Real-time Transport Protocol (RTP) Payload for Comfort Noise (CN).
https://datatracker.ietf.org/doc/html/rfc3389, 2002.

[80] IETF. Early Media and Ringing Tone Generation in the Session Initiation Protocol (SIP).
https://www.rfc-editor.org/rfc/rfc3960, 2005.

[81] Yunhan Jack Jia, Qi Alfred Chen, Zhuoqing Morley Mao, Jie Hui, Kranthi Sontinei, Alex
Yoon, Samson Kwong, and Kevin Lau. Performance characterization and call reliability
diagnosis support for voice over lte. In ACM MobiCom’15.

[82] Dmitri Kamenetsky, Sau Yee Yiu, and Martyn Hole. Image enhancement for face recognition

in adverse environments. In IEEE DICTA’16.

[83] Mohsin Khan, Philip Ginzboorg, Kimmo Järvinen, and Valtteri Niemi. Defeating the

downgrade attack on identity privacy in 5g. In SSR’18.

[84] Hongil Kim, Dongkwan Kim, Minhee Kwon, Hyungseok Han, Yeongjin Jang, Dongsu
Han, Taesoo Kim, and Yongdae Kim. Breaking and fixing volte: Exploiting hidden data
channels and mis-implementations. In Proceedings of the 22nd ACM SIGSAC Conference
on Computer and Communications Security, pages 328–339, 2015.

[85] Hongil Kim, Dongkwan Kim, Minhee Kwon, Hyungseok Han, Yeongjin Jang, Dongsu Han,
Taesoo Kim, and Yongdae Kim. Breaking and fixing volte: Exploiting hidden data channels
and mis-implementations. In Conference on Computer and Communications Security (CCS),
pages 328–339, 2015.

[86] Martin Kotuliak, Simon Erni, Patrick Leu, Marc Roeschlin, and Srdjan Capkun. LTrack:

Stealthy Tracking of Mobile Phones in LTE. In USENIX Security’22.

[87] Joseph B Kruskal. An overview of sequence comparison: Time warps, string edits, and
macromolecules. SIAM review, 1983.

[88] Swarun Kumar, Ezzeldin Hamed, Dina Katabi, and Li Erran Li. Lte radio analytics made

easy and accessible. In ACM SIGCOMM’14.

[89] Swarun Kumar, Ezzeldin Hamed, Dina Katabi, and Li Erran Li. Lte radio analytics made

easy and accessible. ACM SIGCOMM’14.

[90] Gyuhong Lee, Jihoon Lee, Jinsung Lee, Youngbin Im, Max Hollingsworth, Eric Wustrow,
Dirk Grunwald, and Sangtae Ha. This is your president speaking: Spoofing alerts in 4g
lte networks. In International Conference on Mobile Systems, Applications, and Services
(MobiSys), pages 404–416, 2019.

[91] Chi-Yu Li, Guan-Hua Tu, Chunyi Peng, Zengwen Yuan, Yuanjie Li, Songwu Lu, et al.

Insecurity of voice solution volte in lte mobile networks. In ACM CCS’15.

[92] Chi-Yu Li, Guan-Hua Tu, Chunyi Peng, Zengwen Yuan, Yuanjie Li, Songwu Lu, and Xinbing
Wang. Insecurity of voice solution volte in lte mobile networks. In Proceedings of the 22Nd
ACM SIGSAC Conference on Computer and Communications Security, CCS ’15, pages
316–327, New York, NY, USA, 2015. ACM.

[93] Jian Li, Yabiao Wang, Changan Wang, Ying Tai, Jianjun Qian, Jian Yang, Chengjie Wang,
Jilin Li, and Feiyue Huang. Dsfd: dual shot face detector. In IEEE CVPR’19.

[94] Pei Li, Loreto Prieto, Domingo Mery, and Patrick J Flynn. On low-resolution face recognition
in the wild: Comparisons and new techniques. IEEE Transactions on Information Forensics
and Security, 14(8):2000–2012, 2019.

[95] Zhihang Li, Xu Tang, Junyu Han, Jingtuo Liu, et al. Pyramidbox++: High performance

detector for finding tiny face. arXiv preprint arXiv:1904.00386, 2019.

[96] Shengcai Liao, Anil K Jain, and Stan Z Li. Partial face recognition: Alignment-free approach.
IEEE Transactions on pattern analysis and machine intelligence (TPAMI), 35(5):1193–1205,
2012.

[97] Siyuan Liu, Shuhui Wang, Feida Zhu, et al. Hydra: Large-scale social identity linkage via

heterogeneous behavior modeling. In ACM SIGMOD’14.

[98] Yu-Han Lu, Chi-Yu Li, Yao-Yu Li, Hsiao, et al. Ghost calls from operational 4g call systems:

Ims vulnerability, call dos attack, and countermeasure. In ACM MobiCom’20.

[99] Yu-Han Lu, Chi-Yu Li, Yao-Yu Li, Sandy Hsin-Yu Hsiao, Tian Xie, Guan-Hua Tu, and
Wei-Xun Chen. Ghost calls from operational 4g call systems: Ims vulnerability, call dos
attack, and countermeasure. In Proceedings of the 26th Annual International Conference on
Mobile Computing and Networking, MobiCom ’20, New York, NY, USA, 2020. Association
for Computing Machinery.

[100] Feifan Lv, Bo Liu, and Feng Lu. Fast enhancement for non-uniform illumination images

using light-weight cnns. In ACM Multimedia’20.

[101] Jamila Manan, Atiq Ahmed, Ihsan Ullah, Leïla Merghem-Boulahia, and Dominique Gaïti.
Distributed intrusion detection scheme for next generation networks. Journal of Network and
Computer Applications, 147:102422, 2019.

[102] MobileInsight. Mobileinsight. http://www.mobileinsight.net/, 2021.

[103] Bjarke Mønsted, Anders Mollgaard, et al. Phone-based metric as a predictor for basic

personality traits. Elsevier Journal of Research in Personality.

[104] NYtimes. The humble phone call has made a comeback.
https://www.nytimes.com/2020/04/09/technology/phone-calls-voice-virus.html, 2020.

[105] Seongmin Park, HyungJin Cho, Youngkwon Park, Bomin Choi, Dowon Kim, and Kangbin
Yim. Security problems of 5g voice communication. In Information Security Applications:
21st International Conference, WISA 2020, Jeju Island, South Korea, August 26–28, 2020,
Revised Selected Papers 21, pages 403–415. Springer, 2020.

[106] Sancheng Peng, Shui Yu, and Aimin Yang. Smartphone malware and its propagation modeling:

A survey. IEEE Communications Surveys & Tutorials, 16(2):925–941, 2013.

[107] Abhijith Punnappurath, Ambasamudram Narayanan Rajagopalan, Sima Taheri, Rama Chel-
lappa, and Guna Seetharaman. Face recognition across non-uniform motion blur, illumination,
and pose. IEEE Transactions on image processing, 24(7):2067–2082, 2015.

[108] Qualcomm. Qxdm professional tool.
https://www.qualcomm.com/media/documents/files/qxdm-professional-qualcomm-extensible-diagnostic-monitor.pdf,
2020.

[109] Grand View Research. Voice over lte market: Global industry trends, share, size, growth,
opportunity and forecast 2023-2028.
https://www.researchandmarkets.com/reports/5732780/voice-over-lte-market-global-industry-trends, Jan 2023.

[110] Juniper Research. Video calling demand booms during pandemic.
https://pipelinepub.com/news/12307, Jan 2024.

[111] KBV Research. Global mobile voice market size, share & industry trends analysis report
by transmission, by end user, by regional outlook and forecast, 2022 - 2028.
https://www.reportlinker.com/p06364618/, Oct 2022.

[112] RFC. Dynamic Host Configuration Protocol for IPv6 (DHCPv6), 2018.
https://datatracker.ietf.org/doc/html/rfc8415.

[113] David Rupprecht, Katharina Kohls, Thorsten Holz, and Christina Pöpper. Call me maybe:

Eavesdropping encrypted lte calls with revolte. In USENIX Security’20.

[114] David Rupprecht, Katharina Kohls, Thorsten Holz, and Christina Pöpper. Call me maybe:
Eavesdropping encrypted LTE calls with ReVoLTE. In 29th USENIX security symposium
(USENIX security 20), pages 73–88, 2020.

[115] Sachinsdate. Speaker detection by watching lip movements.
https://github.com/sachinsdate/lip-movement-net, 2016.

[116] Florian Schroff, Dmitry Kalenichenko, and James Philbin. Facenet: A unified embedding for
face recognition and clustering. In IEEE CVPR’15.

[117] Clemens Stachl, Quay Au, Ramona Schoedel, et al. Predicting personality from patterns of
behavior collected with smartphones. PNAS’20.

[118] Statista. Mobile voice - worldwide.
https://www.statista.com/outlook/tmo/communication-services/mobile-voice/worldwide, 2023.

[119] Jessica E Steele, Carla Pezzulo, Maximilian Albert, Christopher J Brooks, Elisabeth zu Erbach-
Schoenberg, Siobhán B O’Connor, Pål R Sundsøy, Kenth Engø-Monsen, Kristine Nilsen,
Bonita Graupe, et al. Mobility and phone call behavior explain patterns in poverty at
high-resolution across multiple settings. Humanities and Social Sciences Communications,
8(1):1–12, 2021.

[120] Qibo Sun, Shangguang Wang, Ning Lu, Kok-Seng Wong, and Myung Ho Kim. Sfads: A
sip flooding attack detection scheme with the internal and external detection features in ims
networks. Journal of Internet Technology, 17(7):1327–1338, 2016.

[121] Sarah Tabassum, Cori Faklaris, and Heather Richter Lipford. What drives SMiShing
susceptibility? a U.S. interview study of how and why mobile phone users judge text
messages to be real or fake. In Twentieth Symposium on Usable Privacy and Security
(SOUPS 2024), pages 393–411, 2024.

[122] Deborah Tannen. Turn-taking and intercultural discourse and communication. The handbook
of intercultural discourse and communication, pages 135–157, 2012.

[123] Guan-Hua Tu, Chi-Yu Li, Chunyi Peng, Yuanjie Li, and Songwu Lu. New security threats
caused by ims-based sms service in 4g lte networks. In Proceedings of the 2016 ACM
SIGSAC Conference on Computer and Communications Security, CCS ’16, pages 1118–1130,
New York, NY, USA, 2016. ACM.

[124] G Vennila, MSK Manikandan, and MN Suresh. Detection and prevention of spam over internet
telephony in voice over internet protocol networks using markov chain with incremental svm.
International Journal of Communication Systems, 30(11):e3255, 2017.

[125] Ganesan Vennila, MSK Manikandan, and MN Suresh. Dynamic voice spammers detection
using hidden markov model for voice over internet protocol network. Computers & Security,
2018.

[126] David Waiting et al. The UCT IMS client. In IEEE TRIDENTCOM.

[127] Sihan Wang, Guan-Hua Tu, Xinyu Lei, Tian Xie, Chi-Yu Li, Po-Yi Chou, Fucheng Hsieh,
Yiwen Hu, Li Xiao, and Chunyi Peng. Insecurity of Operational Cellular IoT Service: New
Vulnerabilities, Attacks, and Countermeasures, pages 437–450. Association for Computing
Machinery, New York, NY, USA, 2021.

[128] Xintao Wang, Kelvin CK Chan, Ke Yu, Chao Dong, et al. Edvr: Video restoration with
enhanced deformable convolutional networks. In IEEE CVPR’19.

[129] Wikipedia. STIR/SHAKEN. https://en.wikipedia.org/wiki/STIR/SHAKEN.

[130] Wikipedia. 3GPP. https://en.wikipedia.org/wiki/3GPP, 2023.

[131] Wikipedia. 3GPP. https://en.wikipedia.org/wiki/3GPP, 2023.

[132] Wikipedia. Darwin (operating system).
https://en.wikipedia.org/wiki/Darwin_(operating_system), Feb 2024.

[133] Wikipedia. ios. https://www.apple.com/iphone-15/specs/, Jan 2024.

[134] Wikipedia. Qualcomm msm interface.
https://en.wikipedia.org/wiki/Qualcomm_MSM_Interface, Jan 2024.

[135] Wikipedia. Steganography. https://en.wikipedia.org/wiki/Steganography, Jan 2024.

[136] John Wu. Magisk. https://github.com/topjohnwu/Magisk, Jan 2024.

[137] Danny Wyatt, Tanzeem Choudhury, Jeff Bilmes, and James A Kitts. Inferring colocation
and conversation networks from privacy-sensitive audio with implications for computational
social science. ACM TIST’11.

[138] T. Xie, G. Tu, C. Li, C. Peng, J. Li, and M. Zhang. The dark side of operational wi-fi calling
services. In 2018 IEEE Conference on Communications and Network Security (CNS), pages
1–1, May 2018.

[139] Tian Xie, Guan-Hua Tu, Bangjie Yin, et al. The untold secrets of wifi-calling services:
Vulnerabilities, attacks, and countermeasures. IEEE TMC’22.

[140] Tian Xie, Sihan Wang, Xinyu Lei, Jingwen Shi, Guan-Hua Tu, and Chi-Yu Li. Mpkix:
Towards more accountable and secure internet application services via mobile networked
systems. IEEE Transactions on Mobile Computing, pages 1–1, 2022.

[141] Hojoon Yang, Sangwook Bae, Mincheol Son, Hongil Kim, et al. Hiding in plain signal:
Physical signal overshadowing attack on LTE. In USENIX Security’19.
