ON EFFICIENCY AND INTELLIGENCE OF NEXT-GENERATION WIRELESS NETWORKS By Pedram Kheirkhah Sangdeh A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Computer Science – Doctor of Philosophy 2023 ABSTRACT The ever-increasing demand for data-hungry wireless services and rapid proliferation of wireless devices in sub-6 GHz band have pushed the current wireless technologies to a breaking point, necessitating efficient and intelligent strategies to utilize scarce communication resources. This thesis aims at leveraging novel communication frameworks, artificial intelligence techniques, and synergies between them in bringing efficiency and intelligence to the next generation of wireless networks. In the first chapter of this thesis, we propose a novel spectrum sharing scheme to address spectrum shortage, a fundamental issue in current and future wireless networks. Our proposed scheme enables transparent spectrum utilization for a small cognitive radio network by leverag- ing two interference management techniques that are not reliant on inter-network coordination, fine-grained synchronization, and knowledge about other occupants of the spectrum. We further extend this idea in the second chapter of this thesis and enable concurrent device-to-device and cellular communications in cellular networks where the base station and wireless devices exploit interference management techniques to avoid causing interference to each other, making concur- rent spectrum utilization possible for both cellular and device-to-device communications. In the third chapter, to enhance spectral efficiency, connectivity, and throughput of Wireless Local Area Networks (WLAN), we propose a non-orthogonal multiplexing scheme (NOMA). In our proposed scheme, the access point (AP) is equipped with a novel precoder design and user grouping which are tailored based on the requirements of power-domain NOMA. Also, a novel successive inter- ference cancellation technique is designed for users which does not require channel estimation to decode the signals and is more resilient to interference compared to the existing techniques. The second part of this thesis focuses on taking advantage of artificial intelligence for solv- ing communication and networking challenges and also taking advantage of novel communication frameworks to let future wireless networks indulge intelligence-oriented networking and resource management. In the fourth chapter, we propose a new solution to solve a long-standing issue ahead of multi-user multiple-input multiple-output (MU-MIMO) communications in WLANs, which is the large sounding overhead for acquiring the channel state information (CSI). Our learning-based solution includes an automated mechanism that enables access points to collect, clear, and bal- ance dataset, and also deep neural networks to compress CSI and reduce the airtime overhead for channel acquisition. However, with provisioning concurrent MU-MIMO and orthogonal fre- quency division multiple access (OFDMA) in the new generation of WLANs, not only the sound- ing overhead problem becomes more acute, but it also marries with a complex resource allocation problem which makes designing a practical enabler of MU-MIMO-OFDMA transmissions nec- essary for WLANs. In the fifth chapter of this thesis, we propose DeepMux, which comprises a deep-learning-based channel sounding and a deep-learning-based resource allocation both of which reside in access points and impose no computational/communication burden on users, en- abling efficient downlink MU-MIMO-OFDMA transmissions in WLANs. We finally design a communication framework for accelerating federated learning in future intelligent transportation systems, where heterogeneous capabilities and mobility of users along with limited available band- width for communications are huge obstacles toward making the network intelligent in a distributed manner. With the aid of a deadline-driven scheduler and asynchronous uplink multi-user MIMO, our proposed solution reduces data loss at vehicles in a dynamic vehicular environment, making a concrete step toward the practical adoption of federated learning in future transportation systems. To my parents, Jafar and Manijeh, and to my wife, Fariba iv ACKNOWLEDGMENTS This dissertation would not be possible without the contribution and support of many people. Their efforts cannot be completely credited for with one page of a document. First and foremost, my deepest gratitude goes out to my advisor, Dr. Huacheng Zeng. I am thankful for all the advice and encouragement he gave me, for the hours of discussion we shared, and for all that I learned while working under his guidance. I am very grateful for the additional time he took to prepare me for the key moments of the Ph.D., from day one to writing up this dissertation. It is indeed a great privilege and honor to study under his supervision. I would like to acknowledge Prof. Matt Mutka and Dr. Mi Zhang. Their guidance and support have transformed this research work into a foundation of career, for which I am more than grateful. I am also in debt to Dr. Qiben Yan. His valuable feedback and countless advice were the keystones of two interesting research efforts I had the pleasure to work on. I am grateful to my fellow colleagues Hossein Pirayesh, Adnan Quadri, and Shichen Zhang for their support of my lab work and countless brainstorming moments during the last five years. My special thanks go to my friends in Louisville, Mahyar, Mehdi, and Masoud, for their moral support, numerous nights out, flat parties, dinners, and holidays together. Thanks also to my life- long friend Shayan for the endless hours we spent around lakes and rivers without catching a single fish. I am glad our adventure continues in San Diego. Thanks to my friends from Tehran and Parehsar, Moein, Ashkan, Omid, Farnoush, Pegah, Behzad, Shahrokh, Saeid, Mojtaba, Kaveh, Mohammad, Majid, Ali, and Vahid, for keeping in touch all the years and growing even closer to my heart whilst I am miles away. My sincere gratitude goes to my wife, Fariba, who stood by me during the ups and downs of Ph.D. life. Not only for the moral support but also for all the fruitful discussions about my work. What an adventure, I could not love you more. Thanks to my dog, Kenji, for his constant v interruptions during important meetings and interviews. His persistence in chewing my papers and supplies has been always inspiring. He taught me the ability to focus during presentations and transformed my reading style from in-print to on-screen. Finally, my most special thanks to my parents, Jafar and Manijeh, for their love and support, for all the comments they gave me on my work, for all their advice and encouragement, and for absolutely everything that they did. vi TABLE OF CONTENTS LIST OF ABBREVIATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix Chapter 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Research Scope and Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 Chapter 2 Underlay Spectrum Sharing for CRNs . . . . . . . . . . . . . . . . . . . . 13 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.3 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.4 A Spectrum Sharing Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.5 Blind Beamforming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.6 Blind Interference Cancellation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 2.7 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 2.8 Limitations and Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 2.9 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 Chapter 3 D2D Communications in Cellular Networks . . . . . . . . . . . . . . . . . 54 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 3.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 3.3 Problem Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 3.4 DM-COM: An Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 3.5 MU-MIMO Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 3.6 D2D Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 3.7 Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 3.8 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 Chapter 4 Non-Orthogonal Multiple Access for WLANs . . . . . . . . . . . . . . . . 86 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 4.2 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 4.3 Problem Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 4.4 Precoder Design for Downlink NOMA . . . . . . . . . . . . . . . . . . . . . . . . 93 4.5 A Downlink NOMA Scheme for WLAN . . . . . . . . . . . . . . . . . . . . . . . 106 4.6 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 4.7 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 Chapter 5 Learning-Based Channel Feedback for MU-MIMO in WLANs . . . . . . . 126 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 5.2 Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129 5.3 Problem Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 5.4 LB-SciFi: A Learning-Based Feedback Framework . . . . . . . . . . . . . . . . . 135 5.5 Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148 vii 5.6 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158 Chapter 6 A Learning-Based Channel Sounding and Resource Allocation for IEEE 802.11ax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159 6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159 6.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162 6.3 Problem Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165 6.4 Overview of DeepMux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169 6.5 DLCS: A Low-Overhead Channel Sounding . . . . . . . . . . . . . . . . . . . . . 172 6.6 DLRA: A Lightweight Resource Allocation . . . . . . . . . . . . . . . . . . . . . 179 6.7 Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186 6.8 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197 Chapter 7 A Communication Framework for FL in Intelligent Transportation Systems . 199 7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199 7.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202 7.3 Federated Learning in Vehicular Networks . . . . . . . . . . . . . . . . . . . . . . 203 7.4 CF4FL: Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209 7.5 Deadline-Driven Vehicle Scheduler (DDVS) . . . . . . . . . . . . . . . . . . . . . 211 7.6 Concurrent Vehicle Polling Scheme (CVPS) . . . . . . . . . . . . . . . . . . . . . 224 7.7 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229 7.8 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238 Chapter 8 Summary and Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240 8.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240 8.2 Future Focus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241 BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244 viii LIST OF ABBREVIATIONS WLAN Wireless Local Area Network IoT Internet of Things AP Access Point BS Base Station CRN Cognitive Radio Network D2D Device-to-Device NOMA Non-Orthogonal Multiple Acess AI Artificial Intelligence FL Federated Learning BBF Blind BeamForming BIC Blind Interference Cancellation DoF Degrees of Freedom CSI Channel State Information SIC Successive Interference Cancellation OMA Orthogonal Multiple Access DL Deep Learning MU-MIMO Multi-User Multiple-Input Multiple-Output OFDM Orthogonal Frequency-Division Multiplexing OFDMA Orthogonal Frequency Division Multiple Access ITS Intelligent Transportation Systems DNN Deep Neural Network DLCS DL-based Channel Sounding DLRA DL-based Resource Allocation ML Machine Learning DDVS Deadline-Driven Vehicle Scheduler CVPS Concurrent Vehicle Polling Scheme ix IC Interference Cancellation TDD Time-Division Duplex CDMA Code-Division Multiple Access ZF Zero Forcing MMSE Minimum Mean Square Error SDR Software-Defined Radio SIR Signal to Interference Ratio EVM Error Vector Magnitude EBF Explicit Beamforming NDP Null Data Packet IBF Implicit Beamforming UEs User Equipment PRB Physical Resource Block NR New Radio TDMA Time Division Multiple Access SNR Signal to Noise Ratio MSE Mean Square Error SVD Singular Value Decomposition MCS Modulation and Coding Scheme FFT Fast Fourier Transform FDMA Frequency Division Multiple Access RAT Radio Access Technologies MISO Multi-Input Single-Output SISO Single-Input Single-Output MM Minorization-Majorization MIMO Multi-Input Multi-Output SINR Signal-to-Interference-and-Noise Ratio x ISNR Interference-to-Signal-and-Noise Ratio NDPA Null Data Packet Announcement L-STF Legacy Short Training Field L-LTF Legacy Long Training Field L-SIG Legacy Signal DNN-AE Deep Neural Network Auto-Encoder GR Givens Rotation PSE Power spectral Entropy ReLU Rectified Linear Unit AFC Adaptive Feedback Compression RU Resource Unit MINLP Mixed-Integer Non-Linear Programming LP Linear Programming BR Beamforming Report TBRP Trigger Beamforming Report Poll MILP Mixed-Integer Linear Programming NMSE Normalized Mean Squared Error EPS Extended Polynomial Scheduler FPD Fictitious Polynomial Deadline FSC Fictitious Scheduler Construction CNN Convolutional Neural Network RND Random Scheduler RR Round-Robin Scheduler EDF Earliest Deadline First Scheduler SP Sequential Polling xi Chapter 1 Introduction The burgeoning demand for data-hungry wireless services and the proliferation of mobile devices have pushed the current wireless technologies to a breaking point whereby the traditional ways of deploying, operating, managing, and troubleshooting wireless networks are not efficient anymore. Only Wireless Local Area Networks (WLANs) have connected more than 22.2 billion devices around the globe [155], let alone tens of billions Internet of Things (IoT) devices [123] and smart- phones [35]. The scarcity of spectrum and over-congestion of sub-6GHz band necessitate the advent of more efficient and intelligent wireless networking and communication paradigms for the next generations of wireless networks. Efficient utilization of limited available resources and preservation of optimal performance at scale are two indispensable requirements in next-generation wireless networks. However, these two requirements are not easy to fulfill. First, if set to employ shared communication resources, current wireless technologies compete for exclusive utilization and only consider their individual performance [2, 67, 69]. Second, the performance of existing networking solutions may not scale well or even may start setting back, as the number of users grows. In fact, excessive computational complexity, adverse effect of interference, large communication overhead, and limited power bud- get at Access Points (APs)/Base Stations (BSs) stations impede the scalability of networks. In this thesis, we study how efficient communication frameworks, intelligent networking so- lutions, and synergies between them help to fulfill the two aforementioned requirements. In what 1 follows, we discuss our research scope in bringing efficiency and intelligence to wireless networks and present the overview of this thesis. 1.1 Research Scope and Contribution Our research scope is broadly categorized into two parts, efficiency and intelligence, both explor- ing the opportunities for improving the performance of future wireless networks. As shown in Figure 1.1, the first part of the thesis particularly focuses on the efficient utilization of the scarce and over-crowded spectrum in next-generation wireless networks. It specifically aims at investi- gating three problems and offers corresponding solutions as follows. • Underlay Spectrum Sharing for Cognitive Radio Networks (CRNs) • Transparent Device-to-Device (D2D) communications in cellular networks • Non-Orthogonal Multiple Acess (NOMA) for WLANs The second part of the thesis focuses on the synergy between Artificial Intelligence (AI) and wireless communications. Our research efforts in this part follow two trajectories: i) leveraging AI techniques for improving the functionality of wireless networks, and ii) leveraging wireless communication and networking solutions to pave the way for making intelligence native to the next generations of wireless networks. The problems we study in this part are as follows. • Learning-based channel sounding for IEEE 802.11ac • Learning-based channel sounding and resource allocation for IEEE 802.11ax • A communication framework for Federated Learning (FL) in transportation systems 2 Figure 1.1: Overview of this thesis. In the rest of this chapter, we explain our problems of interest and the shortcoming of existing solutions to each in more detail. We also briefly describe our proposed schemes and their novelty compared to state-of-the-art schemes. 1.1.1 Efficiency The first part of the thesis intends to answer one important question; how to re-utilize the precious spectrum for establishing new communications without adversely affecting the existing ones. This question is explored in three different networking scenarios. In Chapter 2, we considered a very heterogeneous environment where the spectrum is potentially being used by a variety of tech- nologies that are likely unknown to the new spectrum occupant we intend to introduce. While Chapter 2 focuses on a very generic scenario, Chapter 3 targets cellular networks and tailors a specific solution enabling spectrum re-utilization in the context of D2D communications. It par- ticularly leverages the capabilities of the BS as the central coordinator of the network and lays the foundation of an interference management framework enabling D2D communications in the 3 presence of regular cellular devices. Finally, Chapter 4 focuses on WLANs and investigates the possibility of re-utilizing spectrum for a massive number of WLANs transmissions from a single AP to many devices without causing co-channel interference. In what follows, we further elaborate on the problems we intend to solve and the contributions of our proposed schemes. 1.1.1.1 Underlay Spectrum Sharing for Cognitive Radio Networks Problem Statement and Existing Solutions. Sub-6 GHz frequency spectrum is very crowded while being the main carrier for the data traffic in commercial wireless systems. The scarcity of resources and rapid proliferation of devices are pushing the spectrum shortage issue to a breaking point, necessitating the enhancement in the utilization efficiency of sub-6 GHz spectrum. One promising solution is spectrum sharing in the context of CRNs. This cost-effective and immediate solution to solve the spectrum shortage issue has been a long-standing subject of study and several enablers have been proposed. The common concept among all the proposed enablers is co-channel interference management among spectrum utilizers through various signal processing techniques, such as spread spectrum [59], power control [91, 107, 175], and beamforming [49, 133]. Spread spectrum handles interference in the code domain, and power control tames interference in the power domain. Beamforming exploits the spatial Degrees of Freedom (DoF) provided by multiple antennas to steer the secondary signals to some particular directions, thereby avoiding interfer- ence for primary users. Compared to the other two techniques, beamforming is more appealing in practice as it is effective in interference management. However, the existing beamforming solu- tions are reliant on global network information and cross-network channel knowledge or reliant on cooperation with other occupants of the spectrum. Proposed Scheme. We propose a practical scheme to enable transparent spectrum sharing [144] through two complementary modules, Blind BeamForming (BBF) and Blind Interference 4 Cancellation (BIC). These two techniques enable a new occupant of the spectrum to mitigate cross-network interference in the absence of inter-network coordination, fine-grained synchroniza- tion, and mutual knowledge from other occupants. Unlike the existing solutions, our scheme is not reliant on the momentary Channel State Information (CSI) nor seeking any coordination and cooperation from the other occupants of the spectrum for interference management and adjust- ing its transmission policy. We have built a prototype of our scheme on a wireless testbed and demonstrated its compatibility with commercial Wi-Fi devices. Experimental results show that our scheme is able to improve the spectral efficiency of CRNs by 1.1 bit/s/Hz in real-world wire- less environments. 1.1.1.2 Transparent Device-to-Device Communications in Cellular Networks Problem Statement and Existing Solutions. D2D communication is a promising technology for cellular networks [204] to enhance their spectral efficiency. It allows direct communication between two proximity-based mobile users without traversing the BS or core network. The ad- vantages of D2D communications go beyond spectral efficiency. Saving the airtime at the core network, D2D offers more airtime to the BS that can be leveraged to serve a massive number of low-rate devices such as IoT sensors. It also can potentially reduce packet transmission delay, en- hance user fairness, offload traffic for BSs, and alleviate congestion for core networks, especially in networks congested by IoT devices [177]. Despite its potential benefits, a D2D system needs to control co-channel interference and manage resources for competing users. In order for accom- plishing these tasks, the enablers of D2D communications include beamforming [115, 165, 176], spectral resource management [30, 84, 130, 160], power control [9, 10, 61, 62, 98, 160, 174], and mode selection [15, 27, 60]. Most of existing works consider spectrum re-utilization in either up- link (see, e.g., [10, 61, 98]) or downlink (see, e.g., [115, 165, 176]) of cellular networks, but not 5 both. Moreover, most of the existing works require perfect global channel knowledge as well as network-wide synchronization. Proposed Scheme. We propose DM-COM [80], a practical scheme for enabling D2D com- munications in cellular networks. The enabler of DM-COM is a new approach for managing the mutual interference between the D2D and cellular devices, which does not require CSI and is, therefore, amenable to practical implementation. It is also not restricted to only uplink or down- link and is compatible with both modes of transmission in cellular networks. We have built a prototype of DM-COM on a wireless testbed. Our experimental results show that using DM-COM in a small cellular network, D2D users achieve 1.9 bit/s/Hz spectral efficiency, while MU-MIMO users have less than 8% throughput degradation compared to the case without D2D users. Overall, DM-COM improves the throughput of the network by 82% compared to the case whole traffic is traversed to the BS. 1.1.1.3 Non-Orthogonal Multiple Access for WLANs Problem Statement and Existing Solutions. NOMA allows multiple users to utilize the same spectrum band for signal transmissions at the same time and, therefore, offers many advantages such as improving spectral efficiency, enhancing resource allocation flexibility, reducing schedul- ing latency, increasing cell-edge throughput, and enabling massive connectivity. Although a con- siderable amount of research efforts have been made on the study of NOMA, most of them are limited to theoretical exploration and performance analysis in cellular networks. Very limited progress has been made so far in the development of practical NOMA schemes and experimental validation of NOMA in WLANs. This stagnation reflects the challenges in the design of practi- cal NOMA schemes and the engineering issues related to their implementations, such as channel acquisition and precoding on the transmitter side and Successive Interference Cancellation (SIC) 6 realization on the receiver side. Specifically, there is no research effort focusing on the pre-coding design, user grouping, and SIC for WLANs. Moreover, there is no experimental validation of NOMA for these networks. Proposed Scheme. We propose a practical downlink NOMA scheme for WLANs [81] and evaluate its performance in real-world wireless environments. Our NOMA scheme has three key components: precoder design, user grouping, and a new SIC scheme. On the transmitter side, we first formulate the precoding design problem as an optimization problem and then devise an effi- cient algorithm to construct precoders for downlink NOMA transmissions. We further propose a lightweight user grouping algorithm to ensure the success of SIC at the receivers. On the receiver side, we propose a new SIC method to decode the desired signal in the presence of strong interfer- ence. In contrast to existing SIC methods, our SIC method does not require channel estimation to decode the signals, thereby improving its resilience to interference. We have also built a prototype of the proposed NOMA scheme on a wireless testbed. This is the first experimental validation of NOMA on a real WLAN testbed. Experimental results show that, compared to Orthogonal Multi- ple Access (OMA), the proposed NOMA scheme considerably improves WLAN’s weighted sum rate (36% on average). 1.1.2 Intelligence The second part of the thesis focuses on both leveraging and domesticating intelligence in the next generations of wireless networks. It first turns its focus on using AI techniques for solving intricate networking and communication problems in WLANs. Second, as intelligent applications are in- dispensable parts of future wireless networks, it endeavors to provide a communication framework to make the intelligence native to these networks considering the application requirements and net- working challenges. Particularly, Chapter 5 exploits the recent advances in Deep Learning (DL) 7 to relax the excessive overhead of channel sounding in the current IEEE 802.11ac [67] WLANs. If overhead is reduced, more airtime can be assigned for data transmissions, enhancing the overall throughput of WLANs in Multi-User Multiple-Input Multiple-Output (MU-MIMO) mode. While our proposed solution, LB-SciFi [79], effectively works for IEEE 802.11ac [67], it falls short in IEEE 802.11ax [70] where MU-MIMO can be mixed with Orthogonal Frequency Division Multi- ple Access (OFDMA). This marriage brings two challenges. It significantly scales up the channel sounding. Also, to fully exploit the gain of MU-MIMO-OFDMA mixed mode, a complicated re- source allocation problem needs to be solved in real-time. To address these challenges, a novel technique beyond LB-SciFI is needed. We introduce DeepMux [82] in Chapter 6 to tackle these issues. Finally, Chapter 7 lies within the latter trajectory of the second part of the thesis, which is the domestication of intelligence in wireless networks. We focus on Intelligent Transportation Systems (ITS) and design an elaborate communication framework to facilitate the establishment of Federated Learning (FL) in such a dynamic and heterogeneous wireless environment [78]. 1.1.2.1 Learning-based Channel Feedback for MU-MIMO in WLANs Problem Statement and Existing Solutions. To support downlink MU-MIMO communications in WLANs, an AP needs to access short-term CSI for the construction of beamforming filters. To acquire CSI, IEEE 802.11 standards [67, 70] adopted explicit CSI feedback. However, due to its reliance on over-the-air CSI feedback, it suffers from large airtime overhead. The large overhead of this method can be attributed to a large number of subcarriers in WLANs’ Orthogonal Frequency- Division Multiplexing (OFDM) modulation, each of which has a channel matrix to be reported. To reduce the overhead, existing 802.11 protocols may group subcarriers and provide one CSI report per group. Apparently, such a naive scheme will lead to an inferior beamforming performance 8 and drastically compromises the throughput gain of MU-MIMO. Given the severity of this issue, research efforts have been devoted to studying the effect of channel acquisition parameters on network throughput or completely altering the channel acquisition paradigm to enhance network throughput [17, 33, 53, 87, 110, 113, 117, 128, 134, 139, 197, 200]. None of these works focused on reducing the overhead of explicit channel sounding. Although there are several learning-based solutions for cellular networks [54, 97, 111, 172, 179, 193], none of them can be directly applied to WLANs. CSI in WLANs are essentially real-valued spatial angles which are different from CSI in cellular networks which are complex-valued channel gains. Proposed Scheme. We present LB-SciFi [79], a learning-based channel feedback framework for MU-MIMO in WLANs. LB-SciFi takes advantage of deep neural network autoencoder (DNN- AE) to compress CSI in 802.11 protocols, thereby conserving airtime and improving spectral effi- ciency. The key component of LB-SciFi is an online DNN-AE training scheme, which allows an AP to train DNN-AEs by leveraging the side information of existing 802.11 protocols. With this training scheme, DNN-AEs are capable of significantly lowering the airtime overhead for MU- MIMO while preserving its backward compatibility with incumbent Wi-Fi client devices. To the best of our knowledge, LB-SciFi is the first learning-based channel feedback framework designed for WLANs. We have implemented LB-SciFi on a wireless testbed and evaluated its performance in indoor wireless environments. Experimental results show that LB-SciFi offers an average of 73% airtime overhead reduction and increases network throughput by 69% on average compared to 802.11 feedback protocols. 1.1.2.2 Learning-based Channel Sounding and Resource Allocation for IEEE 802.11ax Problem Statement and Existing Solutions. Wi-Fi networks are evolving from 802.11n/ac to 802.11ax so that a Wi-Fi AP is capable of utilizing the spectrum more efficiently. To do so, 9 802.11ax features a new transmission mode in the downlink, i.e., MU-MIMO-OFDMA mixed mode. However, this new mode comes with two major challenges. The channel sounding overhead drastically scales up as the number of potential users grows. Second, the marriage of MU-MIMO and OFDMA largely expands the optimization space of resource allocation at an 802.11ax AP, making it infeasible to pursue an optimal resource allocation solution in real time due to the lim- ited computational power of APs. Therefore, a low-complexity, yet efficient, algorithm is needed for an AP to solve the resource allocation problem. The existing efforts related to channel sounding overhead in WLANs either focus on re-using outdated CSI or bypassing the problem through lever- aging implicit channel sounding. The most related prior works are [187] and [79] which reduce the channel sounding overhead by compressing CSI in the frequency domain. However, these two efforts require coordination from Wi-Fi clients to fully or partially compress CSI. Such a luxury is not readily available in practical WLANs. On the other hand, the study on resource allocation for MU-MIMO-OFDMA in WLANs is scarce. [169] and [170] are the only works considering downlink MU-MIMO-OFDMA in WLANs. However, these two works employ greedy iterative algorithms to compute a feasible solution; making them not appealing for real-time resource allo- cation. Proposed Scheme. We present DeepMux [82], a deep-learning-based MU-MIMO-OFDMA transmission scheme for 802.11ax networks. DeepMux mainly comprises two components: DL- based Channel Sounding (DLCS) and DL-based Resource Allocation (DLRA), both of which re- side in APs and impose no computational/communication burden on Wi-Fi clients (unlike LB- SciFi). DLCS reduces the airtime overhead of 802.11 protocols by leveraging DNNs. It uses uplink channels to train the DNNs for downlink channels, making the training process much faster than LB-SciFi. DLRA employs a DNN to solve the mixed-integer resource allocation problem, enabling an AP to obtain a near-optimal solution in polynomial time. We have built a wireless 10 testbed to measure the performance of DeepMux in real-world environments. Results show that DeepMux reduces the sounding overhead by 62.0% ∼ 90.5% and increases the network through- put by 26.3% ∼ 43.6% . 1.1.2.3 A Communication Framework for Federated Learning in Transportation Systems Problem Statement and Existing Solutions. Machine Learning (ML) techniques have been ex- tensively studied to extract useful knowledge from massive data collected by vehicles so as to enhance the safety and efficiency of ITS. However, the sheer amount of data collected by vehi- cles and privacy concerns around the collected data make it impractical to transfer raw data to a server and use of conventional centralized training scheme. While FL can be regarded as a privacy- preserving and communication-efficient alternative training paradigm for vehicular networks, the limited communication capacity of these networks along with the heterogeneous sensing, storage, and processing capabilities of individual vehicles, bring severe challenges ahead of practical im- plementation of FL in ITS. Recently, pioneering works [28, 29, 40, 152, 171, 189, 195] have been conducted to incorporate FL into ITS. To the best of our knowledge, existing works mainly employ cross-layer optimization techniques to enhance learning efficiency. They assume that global CSI is available on the server. They also assume that CSI remains valid for the time period of an FL iteration. Given the small channel coherence time caused by the high mobility of vehicles, these two assumptions may not be valid in practical vehicular networks. Also, the existing works either rely on inter-vehicle synchronization or separate transmissions across frequency resources. Proposed Scheme. We present a communication framework for FL (CF4FL) in transportation systems [78]. CF4FL aims to accelerate the convergence of FL training process through the innova- tion of two complementary networking components, Deadline-Driven Vehicle Scheduler (DDVS) and Concurrent Vehicle Polling Scheme (CVPS). DDVS identifies a subset of vehicles for local 11 model training in each iteration of FL, with the aim of minimizing data loss while respecting the deadline constraints derived from vehicles’ storage, computation, and energy budgets. CVPS takes advantage of multiple antennas on an edge server to enable concurrent local model transmissions in dynamic vehicular networks, thereby reducing the airtime overhead of each FL iteration. CF4FL needs neither inter-vehicle synchronization nor instantaneous CSI for asynchronous concurrent ve- hicle transmissions. Our simulation results show that CF4FL reduces the convergence time of FL training by more than 39% compared to the existing solution. 1.2 Organization The rest of the thesis is organized as follows: Chapter 2 presents a blind spectrum sharing scheme for CRNs. Chapter 3 presents a communication framework enabling concurrent MU-MIMO and D2D communications in cellular networks, yielding high spectral efficiency and throughput. Chap- ter 4 presents a downlink power-domain NOMA scheme for WLANs in detail. Chapter 5 de- scribes LB-SciFi, a learning-based compression approach for reducing excessive airtime overhead in downlink MU-MIMO of WLANs. Chapter 6 presents DeepMux and its underlying modules for low-overhead channel sounding and fast resource allocation in downlink MU-MIMO-OFDMA mode of WLANs. Chapter 7 deals with designing a communication framework to accelerate FL in ITS and challenges ahead of its successful deployment. Finally, Chapter 8 enumerates possible research directions for our future research endeavors. 12 Chapter 2 Underlay Spectrum Sharing for CRNs 2.1 Introduction The rapid proliferation of wireless devices and the burgeoning demands for wireless services have pushed the spectrum shortage issue to a breaking point. Although it is expected that much spec- trum in the millimeter band (30 GHz to 300 GHz) will be allocated for communication purposes, most of this spectrum might be limited to short-range communications due to its severe path loss. Moreover, millimeter band is highly vulnerable to blockage and thus mainly considered for com- plementary use in next-generation wireless systems. As envisioned, sub-6 GHz frequency spec- trum, which is already very crowded, will still be the main carrier for the data traffic in commercial wireless systems. Therefore, it is very necessary to maximize the utilization efficiency of sub-6 GHz spectrum. To improve spectrum utilization efficiency, spectrum sharing in the context of CRNs has been widely regarded as a promising and cost-effective solution. In the past two decades, CRNs have received a large amount of research efforts and produced many cognitive radio schemes. Depend- ing on the spectrum access strategy at secondary users, the existing cognitive radio schemes can be classified to three paradigms: interweave, overlay, and underlay [51]. In the interweave paradigm, secondary users exploit spectrum white holes and intend to access the spectrum opportunistically when primary users are idle. In the overlay paradigm, secondary users are allowed to access spec- 13 trum simultaneously with primary users, provided that the primary users share the knowledge of their signal codebooks and messages with the secondary users. Compared to these two paradigms, the underlay paradigm is more appealing as it allows secondary users to concurrently utilize the spectrum with primary users while requiring neither coordination nor knowledge from the primary users. Although there is a large body of work on underlay CRNs in the literature, most of existing work is either focused on theoretical exploration or reliant on unrealistic assumptions such as cross- network channel knowledge and inter-network coordination (see, e.g., [34,89,91,107,122,138,146, 166, 175]). Thus far, very limited progress has been made in the design of practical underlay spec- trum sharing schemes. To the best of our knowledge, there is no underlay spectrum sharing scheme that has been implemented in real-world wireless environments. This stagnation underscores the challenge in such a design, which is reflected in the following tasks: i) at a secondary transmitter, how to pre-cancel its generated interference for the primary receivers in its close proximity; and ii) at a secondary receiver, how to decode its desired signals in the presence of unknown interference from primary transmitters. These two tasks become even more challenging when secondary users have no knowledge (e.g., signal waveform and frame structure) about primary users. In this chapter, we consider an underlay CRN that comprises a pair of primary users and a pair of secondary users. We assume that the secondary users are equipped with more antennas than the primary users. By leveraging their multiple antennas, the secondary users take the full responsibility for cross-network Interference Cancellation (IC). For such a CRN, we propose a practical spectrum sharing scheme that allows the secondary users to access the spectrum while re- maining transparent to the primary users. The key components of our scheme are two interference management techniques: BBF and BIC. The proposed BBF technique is used at the secondary transmitter to pre-cancel its generated 14 interference for the primary receiver. In contrast to existing beamforming techniques, which re- quire channel knowledge for the construction of beamforming filters, our BBF technique does not require channel knowledge. Instead, it constructs the beamforming filters by exploiting the sta- tistical characteristics of the overheard interfering signals from the primary users. The proposed BIC technique is used at the secondary receiver to decode its desired signals in the presence of unknown interference from the primary transmitter. Unlike existing IC techniques, which require CSI and inter-network synchronization, our BIC technique requires neither cross-network channel knowledge nor inter-network synchronization for signal detection. Rather, it leverages the refer- ence symbols (preamble) embedded in the data frame of secondary users to construct the decoding filters for signal detection in the face of unknown interference. With these two IC techniques, the secondary users can effectively mitigate the cross-network interference in the absence of coordi- nation from the primary users. We have built a prototype of our scheme on a wireless testbed to evaluate its performance in real-world wireless environments. Particularly, we have demonstrated that our prototyped sec- ondary devices share 2.4 GHz spectrum with commercial Wi-Fi devices (primary users) while not affecting Wi-Fi devices’ throughput. A demo video of our scheme is presented in [131]. We fur- ther conduct experiments to evaluate the performance of our secondary network in coexistence with LTE-like and CDMA-like primary networks in the following two cases: i) the primary users are equipped with one antenna and the secondary users equipped with two antennas; and ii) the primary users are equipped with two antennas and the secondary users equipped with three anten- nas. Experimental results measured in an office environment show that the secondary network can achieve an average of 1.1 bits/s/Hz spectrum utilization without visibly degrading primary network throughput. Moreover, the proposed BBF and BIC techniques achieve an average of 25 dB and 33 dB IC capabilities, respectively. 15 The contributions of this work are summarized as follows: • We have designed a new BIC technique for a wireless receiver, which is capable of decoding its data packets in the presence of unknown interference. Our prototype of such a wireless receiver can achieve 33 dB IC capability for unknown interference in real-world tests. • We have designed a new BBF technique for a wireless transmitter, which is capable of pre- canceling its generated interference for an unintended receiver without the need of channel knowledge. Our prototype of such a wireless transmitter can achieve 25 dB IC capability for the unintended receiver. • To the best of our knowledge, our work is the first one that demonstrates real-time concurrent spectrum utilization of two wireless systems in the absence of inter-network coordination and fine-grained synchronization. 2.2 Related Work We focus our literature survey on spectrum sharing in underlay CRNs and the related interference management techniques. Spectrum Sharing in Underlay CRNs. Underlay CRNs allow concurrent spectrum utiliza- tion for primary and secondary networks as long as the interference at primary users remains at an acceptable level. Different signal processing techniques have been studied for interference management in underlay CRNs, such as spread spectrum [59], power control [91, 107, 175], and beamforming [3, 4, 6, 8, 24, 31, 46, 49, 56, 57, 64, 118, 119, 125, 133, 190, 191, 202, 203]. Spread spectrum handles interference in the code domain, and power control tames interference in the power domain. Beamforming exploits the spatial DoF provided by multiple antennas to steer the 16 secondary signals to some particular directions, thereby avoiding interference for primary users. Compared to the other two techniques, beamforming is more appealing in practice as it is effective in interference management. Given its potential, beamforming has been studied in underlay CRNs to pursue various ob- jectives, such as improving energy efficiency of secondary transmissions [49, 64, 133, 191], max- imizing data rate of secondary users [3, 4], maximizing sum rate of both primary and secondary users [8, 46, 56, 57], and enhancing the security against eavesdroppers [118, 119, 202]. However, most of these beamforming solutions are reliant on global network information and cross-network channel knowledge. Our work differs from these efforts as it requires neither cross-network chan- nel knowledge nor inter-network cooperation. BBF in Underlay CRNs. There are some pioneering works that studied BBF to eliminate the requirement of cross-network channel knowledge for the design of beamforming filters [6, 24, 31, 125, 190, 203]. In [203] and [31], an eigen-value-decomposition-based approach was proposed to construct beamforming filters at a secondary transmitter using its received interfering signals from a primary device. When the secondary device transmitting, the constructed beamforming filters would steer its radio signals to the null subspace of the cross-network channel, thereby avoiding interference for the primary device. Our BBF technique follows similar idea, but differs in the net- work setting and design objective. Specifically, [203] and [31] were focused on theoretical analysis to optimize the data rate of secondary users under certain interference temperature, while the BBF technique in our work is designed to guarantee its practicality and optimize its IC capability in real-world OFDM-based networks. In [6] and [24], the beamforming design is formulated as a part of a network optimization problem, and some constraints are developed based on statistical channel knowledge to relax the requirement of cross-network channel knowledge. This approach is of high complexity, and it 17 3ULPDU\XVHU   3ULPDU\XVHU 38 38 FURVVQHWZRUN LQWHUIHUHQFH 6HFRQGDU\XVHU   6HFRQGDU\XVHU 68 68 Figure 2.1: A CRN consisting of two active primary users and two active secondary users. seems not amenable to practical implementation. on the In [125] and [190], spatial learning meth- ods were proposed to iteratively adjust beamforming filters at the secondary devices based on the power level of primary transmission, with the objective of reducing cross-network interference for primary users. However, these learning-based methods are cumbersome and not amenable to practical use. MIMO-based BIC. While there are many results on interference cancellation in cooperative wireless networks, the results of MIMO-based BIC in non-cooperative networks remain limited. In [142], Rousseaux et al. proposed a MIMO-based BIC technique to handle interference from one source. In [182], Winters proposed a spatial filter design for signal detection at multi-antenna wire- less receivers to combat unknown interference. In [52], Gollakota et al. proposed a MIMO-based solution to mitigate narrow-band interference from home devices such as microwave. BIC was further studied in the context of radio jamming in wireless communications (see, e.g., [149, 192]). Compared to the existing BIC techniques, our BIC technique has a lower complexity and offers much better performance (33 dB IC capability in our experiments). 2.3 Problem Statement We consider an underlay CRN as shown in Fig. 2.1, which consists of two active primary users and two secondary users. The primary users establish bidirectional communications in Time-Division Duplex (TDD) mode. The traffic flow in the primary network is persistent and consistent in both 18 WLPH )RUZDUGSDFNHW SDFNHWIURP38WR38 %DFNZDUGSDFNHW SDFNHWIURP38WR38 Figure 2.2: Consistent and persistent traffic in the primary network. directions, as shown in Fig. 2.2. The secondary users want to utilize the same spectrum for their own communications. To do so, the secondary transmitter employs beamforming to pre-cancel its generated interference for the primary receiver; and the secondary receiver performs IC for its signal detection. Simply put, the secondary users take full burden of cross-network interference cancellation, and their data transmissions are transparent to the primary users. In this CRN, there is no coordination between the primary and secondary users. The secondary users have no knowledge about cross-network interference characteristics. The primary users have one or multiple antennas, and the number of their antennas is denoted by Mp . The secondary users have multiple antennas, and the number of their antennas is denoted by Ms . We assume that the number of antennas on a secondary user is greater than that on a primary user, i.e., Ms > Mp . This assumption ensures that each secondary user has sufficient spatial DoF to tame cross-network interference. Our Objective. In such a CRN, our objective is four-fold: i) design a BBF technique for the secondary transmitter to pre-cancel its generated interference for the primary receiver; ii) de- sign a BIC technique for the secondary receiver to decode its desired signals in the presence of interference from the primary transmitter; iii) design a spectrum sharing scheme by integrating these two IC techniques; and iv) evaluate the IC techniques and the spectrum sharing scheme via experimentation in real wireless environments. Two Justifications: First, in this work, we study a CRN that comprises one pair of primary users and one pair of secondary users. Although it has a small network size, such a CRN serves as a 19 3DFNHWWUDQVPLVVLRQVLQSULPDU\QHWZRUN  WLPH 3DFNHWWUDQVPLVVLRQVLQVHFRQGDU\QHWZRUN  WLPH 68RU68RYHUKHDUVLQWHUIHULQJVLJQDOVIURPSULPDU\WUDQVPLWWHU )RUZDUGSDFNHW SDFNHWIURP68WR68 %DFNZDUGSDFNHW SDFNHWIURP68WR68 Figure 2.3: A MAC protocol for spectrum sharing in a CRN that has two primary users and two secondary users. fundamental building block for a large-scale CRN that have many primary and secondary users. Therefore, understanding this small CRN is of theoretical and practical importance. Second, in our study, we assume that the secondary users have no knowledge about cross-network interference characteristics. Such a conservative assumption leads to a more robust spectrum sharing solution, which is suited for many application scenarios. 2.4 A Spectrum Sharing Scheme In this section, we present a spectrum sharing scheme for the secondary network so that it can use the same spectrum for its communications while almost not affecting performance of the primary network. Our scheme consists of a lightweight MAC protocol and a new PHY design for the secondary users. In what follows, we first present the MAC protocol and then describe the new PHY design. 20 2.4.1 MAC Protocol for Secondary Network Fig. 2.3 shows our MAC protocol in the time domain. It includes both forward communications (from SU 1 to SU 2) and backward communications (from SU 2 to SU 1) between the two sec- ondary users. Since the two communications are symmetric, our presentation in the following will focus on the forward communications. The backward communications can be done in the same way. The forward communications in the proposed MAC protocol comprise two phases: overhear- ing (Phase I) and packet transmission (Phase II). In the time domain, Phase I aligns with the back- ward packet transmissions of the primary network, and Phase II aligns with the forward packet transmissions of the primary network, as illustrated in Fig. 2.3. We elaborate the operations in the two phases as follows: • Phase I: SU 1 overhears the interfering signals from PU 2, and SU 2 remains idle, as shown in Fig. 2.4(a). • Phase II: SU 1 first constructs beamforming filters using the overheard interfering signals in Phase I and then transmits signals to SU 2 using the constructed beamforming filters. Meanwhile, SU 2 decodes the signals from SU 1 in the presence of interference from PU 1. Fig. 2.4(b) shows the packet transmission in this phase. When the primary network has consistent and persistent bidirectional traffic, it is easy for sec- ondary devices to learn primary transmission direction and duration by leveraging wireless signals’ spatial signature (e.g., signal angle-of-arrival). Based on the learned information, the secondary network can align its transmissions with the transmissions in the primary network, as illustrated in Fig. 2.3. It is noteworthy that the time alignment requirement of primary and secondary trans- 21 5[ 7[ 7[ 5[ 38   38 38   38 68   68 68   68 2YHUKHDULQJ ,GOH 7[ 5[ (a) Phase I: SU 1 overhears the interfering signals (b) Phase II: SU 1 sends data to SU 2 using IC tech- from PU 2. niques. Figure 2.4: Illustration of our proposed spectrum sharing scheme. missions is loose, thanks to the capability of BBF and BIC at the PHY layer. To ensure that the secondary transmissions will not disrupt the primary transmissions, SU 1 sends its signals only after it detects the interfering signals from PU 2. 2.4.2 PHY Design for Secondary Users: An Overview To support the proposed MAC protocol, we use IEEE 802.11 legacy PHY for the secondary net- work, including frame structure, OFDM modulation, and channel coding schemes. However, IEEE 802.11 legacy PHY is vulnerable to cross-network interference. Therefore, we need to modify the legacy PHY for the secondary users. The modified PHY should be resilient to cross-network inter- ference on both transmitter and receiver sides. The design of such a PHY faces the following two challenges. Challenge 1. Referring to Fig. 2.4(b), the main task of the secondary transmitter (SU 1) is to pre-cancel its generated interference for the primary receiver (PU 2). Note that we assume the sec- ondary transmitter has no knowledge about the primary network, including the signal waveform, bandwidth, and frame structure. The primary network may use OFDM, Code-Division Multiple Access (CDMA) or other types of modulation for packet transmission. The lack of knowledge about the interference makes it challenging for SU 1 to cancel the interference. 22 To address this challenge, we design a BBF technique for the secondary transmitter (SU 1) to pre-cancel its interference at the primary receiver. Our beamforming technique takes advantage of the overheard interfering signals in Phase I to construct precoding vectors for beamforming. Our BBF technique can completely pre-cancel the interference at the primary receiver if noise is zero and the reciprocity of forward/backward channels is maintained. Details of this beamforming technique are presented in Section 2.5. Challenge 2. Referring to Fig. 2.4(b) again, the main task of the secondary receiver (SU 2) is to decode its desired signals in the presence of unknown cross-network interference. Note that the secondary receiver has no knowledge about the interference characteristics, and the primary and secondary networks may use different waveforms and frame formats for their transmissions. The lack of inter-network coordination, cross-network knowledge and fine-grained synchroniza- tion makes it challenging to tame interference for signal detection. To address this challenge, we design a MIMO-based BIC technique for the secondary receiver. The core component of our BIC technique is a spatial filter, which mitigates unknown cross- network interference from the primary transmitter and recovers the desired signals. Details of this BIC technique are presented in Section 2.6. 2.5 Blind Beamforming In this section, we study the beamforming technique at SU 1 in Fig. 2.4. In Phase I, SU 1 first overhears the interfering signals from the primary transmitter and then uses the overheard interfer- ing signals to construct spatial filters. Based on channel reciprocity, the constructed spatial filters are used as beamforming filters in Phase II to avoid interference at the primary receiver. These op- erations are performed on each subcarrier in the OFDM modulation. In what follows, we first 23 present the derivation of beamforming filters and then offer performance analysis of the proposed beamforming technique. Mathematical Formulation. Consider SU 1 in Fig. 2.4(a). It overhears interfering signals from PU 2. The overheard interfering signals are converted to the frequency domain through FFT operation.1 We assume that the channel from PU 2 to SU 1 is a block-fading channel in the time domain. That is, all the OFDM symbols in the backward transmissions experience the same channel. Denote Y(l, k) as the lth sample of the overheard interfering signal on subcarrier k in Phase I. Then, we have2 [1] [1] Y(l, k) = Hsp (k)Xp (l, k) + W(l, k), (2.1) [1] where Hsp (k) ∈ CMs ×Mp is the matrix representation of the block-fading channel from PU 2 [1] to SU 1 on subcarrier k, Xp (l, k) ∈ CMp ×1 is the interfering signal transmitted by PU 2 on subcarrier k, and W(l, k) ∈ CMs ×1 is the noise vector at SU 1. It is noteworthy that SU 1 knows [1] [1] Y(l, k) but does not know Hsp (k), Xp (l, k), and W(l, k). At SU 1, we seek a spatial filter that can combine the overheard interfering signals in a de- structive manner. Denote P(k) as the spatial filter on subcarrier k. Then, the problem of designing P(k) can be expressed as: min E[P(k)∗ Y(l, k)Y(l, k)∗ P(k)], s.t. P(k)∗ P(k) = 1, (2.2) where (·)∗ represents conjugate transpose operator. Construction of Spatial Filters. To solve the optimization problem in (2.2), we use Lagrange 1 The interfering signals are not necessarily OFDM signals. 2 For the notation in this chapter, superscripts “[1]” and “[2]” mean Phase I and Phase II, respectively. Subscripts “s” and “p” mean the secondary and primary users, respectively. 24 multipliers method. We define the Lagrange function as: L(P(k), λ) = E P(k)∗ Y(l, k)Y(l, k)∗ P(k) −λ P(k)∗ P(k) −1 ,     (2.3) where λ is Lagrange multiplier. By setting the partial derivatives of L(P(k), λ) to zero, we have ∂L(P(k), λ)   = P(k)∗ E[Y(l, k)Y(l, k)∗ ] − λI = 0, (2.4) ∂P(k) ∂L(P(k), λ) = P(k)∗ P(k) − 1 = 0. (2.5) ∂λ Based on the definition of eigendecomposition, it is easy to see that the solutions to equations (2.4) and (2.5) are the eigenvectors of E[Y(l, k)Y(l, k)∗ ] and the corresponding values of λ are the eigenvalues of E[Y(l, k)Y(l, k)∗ ]. Note that E[Y(l, k)Y(l, k)∗ ] has Ms eigenvectors, each of which corresponds to a stationary point of the Lagrange function (extrema, local optima, and global optima). As λ is the penalty multiplier for the Lagrange function, the optimal spatial filter P(k) lies within the subspace spanned by the eigenvectors of E[Y(l, k)Y(l, k)∗ ] that correspond to the minimum eigenvalue. For Hermitian matrix E[Y(l, k)Y(l, k)∗ ], it may have multiple eigenvectors that correspond to the minimum eigenvalue. Denote Me as the number of eigenvectors that correspond to the minimum eigenvalue. Then, we can write them as:   [U1 , U2 , · · · , UMe ] = mineigvectors E[Y(l, k)Y(l, k)∗ ] , (2.6) where mineigvectors(·) represents the eigenvectors that correspond to the minimum eigenvalue. To estimate E[Y(l, k)Y(l, k)∗ ] in (2.6), we average the received interfering signal samples over 25 time. Denote Y(l, k) as the lth sample of the interfering signals on subcarrier k. Then, we have Lp X  [U1 , U2 , · · · , UMe ] = mineigvectors Y(l, k)Y(l, k)∗ , (2.7) l=1 where Lp is the number of overheard interfering signal samples (e.g., Lp = 20). Also, the neigh- boring subcarriers can be bonded to improve accuracy. Based on (2.7), the optimal filter P(k) can be written as: Me X P(k) = αm Um , (2.8) m=1 PMe 2 where αm is a weight coefficient with m=1 αm = 1. Now, we summarize the BBF technique as follows. In Phase I, SU 1 overhears the interfering signal Y(l, k) from PU 2. Based on the overheard interfering signals, it constructs a spatial filter P(k) for subcarrier k using (2.7) and (2.8). In Phase II, we use P(k) as the precoding vector for beamforming on subcarrier k, where (·) is the element-wise conjugate operator. For this beamforming technique, we have the following remarks: i) This beamforming tech- nique does not require CSI. Rather, it uses the overheard interfering signals to construct the pre- coding vectors for beamforming. ii) This beamforming technique requires only one-time eigende- composition on every subcarrier. It has a computational complexity similar to Zero Forcing (ZF) and Minimum Mean Square Error (MMSE) precoding techniques. Therefore, it is amenable to practical implementation. IC Capability of BBF. For the performance of the proposed beamforming technique, we have the following lemma: Lemma 1. The proposed beamforming technique completely pre-cancels interference at the pri- mary receiver if (i) forward and backward channels are reciprocal; and (ii) noise is zero. 26 Proof. We first consider the signal transmission in Phase I and then consider that in the Phase II. [1] [1] In Phase I, if the noise is zero, we have Y(l, k) = Hsp (k)Xp (l, k). Then, we have Lp (a) (b) [1] [1] Y(l, k)Y(l, k)∗ = Lp E[Y(l, k)Y(l, k)∗ ] = Lp Hsp (k)Rx (k)Hsp (k)∗ , X (2.9) l=1 where (a) follows from that Y(l, k) is a stationary random process, which is true in practice; and [1] [1] (b) follows from the definition of Rx (k) = E[Xp (l, k)Xp (l, k)∗ ]. Based on (3.12), we have Lp X      [1] [1] Rank Y(l, k)Y(l, k)∗ ∗ =Rank Lp Hsp (k)Rx (k)Hsp (k) ≤ Rank Rx (k) ≤ Mp . (2.10) l=1 PLp ∗ Inequation (2.10) indicates that l=1 Y(l, k)Y(l, k) has at least Ms − Mp eigenvectors that correspond to zero eigenvalues. This further indicates that [U1 , U2 , · · · , UMe ] in (2.7) are corre- sponding to zero eigenvalues. Therefore, we have X Lp  Y(l, k)Y(l, k)∗ Um = 0, for 1 ≤ m ≤ Me . (2.11) l=1 Based on (3.12) and (2.11), we have   [1] [1] Lp Hsp (k)Rx (k)Hsp (k)∗ Um = 0, for 1 ≤ m ≤ Me . (2.12) [1]   In real wireless environments, we have Rank Hsp (k) = Mp and Rank Rx (k) = Mp . Therefore, the following equation can be deducted from (2.12). [1] Hsp (k)∗ Um = 0, for 1 ≤ m ≤ Me . (2.13) 27 Based on (2.8) and (2.13), we have Me [1] [1] Hsp (k)∗ P(k) αm Hsp (k)∗ Um = 0. X = (2.14) m=1 [2] We now consider signal transmission in Phase II (see Fig. 2.4(b)). Denote Hps as the matrix representation of the channel from SU 1 to PU 2 on subcarrier k in Phase II. Given that the forward [2] [1] T and backward channels in the two phases are reciprocal, we have Hps = Hsp . Then, we have [2] [1] T [1] Hps (k)P(k) = Hsp P(k) = Hsp (k)∗ P(k) = 0. (2.15) [2] It means that the precoding vector P(k) is orthogonal to the interference channel Hps (k). Therefore, we conclude that the proposed beamforming scheme can completely pre-cancel the interference from the secondary transmitter at the primary receiver in Phase II. The proof of Lemma is based on reciprocity of channels in backward and forward transmis- sions. To maintain the reciprocity of forward and backward channels in practical wireless systems, we can employ the relative calibration method in [150]. This relative calibration method is an inter- nal and standalone method that can be done with assistance from one device. In our experiments, we have implemented this calibration method to preserve the channels reciprocity. 2.6 Blind Interference Cancellation In this section, we focus on SU 2 in Phase II as shown in Fig. 2.4(b). We design a BIC technique for the secondary receiver (SU 2) to decode its desired signals in the presence of interference from the primary transmitter (PU 1). 28 Mathematical Formulation. Recall that we use IEEE 802.11 legacy PHY for data transmis- sions in the secondary network. Specifically, SU 1 sends packet-based signals to SU 2, which comprise a bulk of OFDM symbols. In each packet, the first four OFDM symbols carry preambles (pre-defined reference signals) and the remaining OFDM symbols carry payloads. [2] Consider the signal transmission in Fig. 2.4(b). Denote Xs (l, k) as the signal that SU 1 [2] transmits on subcarrier k in OFDM symbol l. Denote Xp (l, k) as the signal that PU 1 transmits on subcarrier k in OFDM symbol l.3 Denote Y(l, k) as the received signal vector at SU 2 on subcarrier k in OFDM symbol l. Then, we have [2] [2] [2] [2] Y(l, k) = Hss (k)P(k)Xs (l, k) + Hsp (k)Xp (l, k) + W(l, k), (2.16) [2] [2] where Hss (k) is the block-fading channel between SU 2 and SU 1 on subcarrier k, Hsp (k) is the block-fading channel between SU 2 and PU 1 on subcarrier k, and W(l, k) is noise on subcarrier k in OFDM symbol l. At SU 2, in order to decode the intended signal in the presence of cross-network interference, we use a linear spatial filter G(k) for all OFDM symbols on subcarrier k. Then, the decoded signal can be written as: [2] X̂s (l, k) = G(k)∗ Y(l, k). (2.17) While there exist many criteria for the design of G(k), our objective is to minimize the mean square error (MSE) between the decoded and original signals. Thus, the signal detection problem can be formulated as: h [2] [2] 2i min E X̂s (l, k) − Xs (l, k) . (2.18) 3 PU 1 does not necessarily send OFDM signals. But at SU 2, the interfering signals from PU 1 can always be converted to the frequency domain using FFT operation. 29 Construction of Spatial Filters. To solve the optimization problem in (2.18), we use Lagrange multipliers method again. We define the Lagrange function as: h [2] [2] 2i L(G(k)) = E X̂s (l, k) − Xs (l, k) . (2.19) Based on (2.17), (2.19) can be rewritten as: h [2] 2i L(G(k)) = E G(k)∗ Y(l, k) − Xs (l, k) . (2.20) Equation (2.20) is a quadratic function of G(k). To minimize MSE, we can take the gradient with respect to G(k). The optimal filter G(k) can be obtained by setting the gradient to zero, which we show as follows: [2] E Y(l, k)Y(l, k)∗ G(k) − E Y(l, k)Xs (l, k)∗ = 0.     (2.21) Based on (2.21), we obtain the optimal filter +  [2] G(k) = E Y(l, k)Y(l, k)∗ E Y(l, k)Xs (l, k)∗ ,   (2.22) where (·)+ denotes pseudo inverse operation. Equation (2.22) is the optimal design of G(k) in the [2] sense of minimizing MSE. To calculate E Y(l, k)Y(l, k)∗ and E Y(l, k)Xp (l, k)∗ in (2.22),     we can take advantage of the pilot (reference) symbols in wireless systems (e.g., the preamble in IEEE 802.11 legacy frame). Denote Qk as the set of pilot symbols in a frame that can be used for the design of interference mitigation filter G(k). Then, we can approach the statistical expectations 30 /67) //7) /6,* 'DWD   6XEFDUULHUk              7KHVHWRISLORW  VLJQDOVXVHGIRU   VXEFDUULHUk¶VVSDWLDO  ILOWHUGHVLJQ  Figure 2.5: An example of Q(k) in IEEE 802.11 legacy frame. in (2.22) using the averaging operations as follows: 1 E Y(l, k)Y(l, k)∗ ≈ Y(l, k ′ )Y(l, k ′ )∗ ,   X (2.23) |Qk | (l,k ′ )∈Qk [2] 1 X [2] E Y(l, k)Xp (l, k)∗ ≈ Y(l, k ′ )Xp (l, k ′ )∗ ,   (2.24) |Qk | ′ (l,k )∈Qk where an example of Qk is illustrated in Fig. 2.5. Note that, with a bit abuse of notation, we replace the approximation sign in (2.23) and (2.24) with an equation sign for simplicity. Then, the spatial filter G(k) can be written as: h X i+ h X [2] ∗i G(k) = Y(l, k ′ )Y(l, k ′ )∗ Y(l, k ′ )Xp (l, k ′ ) . (2.25) (l,k ′ )∈Qk (l,k ′ )∈Qk We now summarize our BIC technique as follows. In Phase II, SU 2 needs to decode its desired signal in the presence of interference from PU 1. To do so, SU 2 first constructs a spatial filter for each of its subcarriers using (2.25), and then decodes its desired signal using (2.17). For this BIC technique, several remarks are in order: i) The spatial filter in (2.25) not only can- cels the interference but also equalizes the channel distortion for signal detection. ii) As shown in (2.17) and (2.25), our BIC technique does not require knowledge about the interference character- 31 istics, including waveform and bandwidth. iii) our BIC technique does not require CSI. Rather, it only requires pilot signals at the secondary transmitter. In contrast to conventional signal detection techniques (e.g., ZF and MMSE detectors), our BIC technique technique does not require channel estimation. iv) As shown in (2.17) and (2.25), the computational complexity of our BIC technique is similar to that of the ZF detector, which is widely being used in real-world wireless systems. IC Capability of BIC. For the performance of the proposed BIC technique, we have the fol- lowing lemma: Lemma 2. If the pilot signals are sufficient and noise is zero, the BIC technique can perfectly re- [2] [2] cover the desired signals in the presence of cross-network interference (i.e., X̂s (k, l) = Xs (k, l), ∀k, l). Proof. For notational simplicity, we denote H(k) as the compound channel between the SU 2 h i [2] [2] and the two transmitters (SU 1 and PU 1), i.e., H(k) = Hss (k)P(k) Hsp (k) ; we also denote h iT [2] [2] X(l, k) as the compound transmit signals at the two transmitters, i.e., X(l, k) = Xs (l, k) Xp (l, k) . Then, in noise-negligible scenarios, (2.16) can be rewritten as Y(l, k) = H(k)X(l, k). By defining RX as the autocorrelation matrix of the compound transmit signals, we have     (a)  Rxs 0   1 0  RX = E(XXH ) =  = , (2.26) 0 Rxp 0 Rxp where Rxs is the autocorrelation of SU 1’s transmit signal and Rxp is the autocorrelation matrix of PU 1’s transmit signals. (a) follows from our assumption that the transmit signal from SU 1 is independent of the transmit signals from PU 1. Note that Rxp is not necessarily an identity matrix since the signals from PU 1’s different antennas might be correlated. 32 Based on (2.25), (3.4), and (3.5), we have h X i+ h X [2] ∗i G(k) = Y(l, k ′ )Y(l, k ′ )H Y(l, k ′ )Xs (l, k ′ ) (l,k ′ )∈Qk (l,k ′ )∈Qk (a) +  [2] ∗ = E Y(l, k)Y(l, k)∗ E Y(l, k)Xs (l, k)  (b) +  = H(k)RX H(k)∗   H(k)I1 , (2.27) where (a) follows from our assumption that the amount of reference signals is sufficient to achieve convergence of G(k); (b) follows from the definition that I1 is a vector where its first entry is 1 and all other entries are 0. Based on (2.17) and (3.6), we have [2] X̂s (l, k) = G(k)∗ Y(l, k) n +  o∗ = H(k)RX H(k)∗ H(k)I1 H(k)X(l, k) [2] = Xs (l, k), ∀l, k. (2.28) Pilot Signals for Spatial Filter Construction. Lemma 2 shows the superior performance of our BIC technique when the pilot signals are sufficient. A natural question to ask is how many pilot signals are considered to be sufficient. To answer this question, we first present our simulation results to study the convergence speed of the spatial filter over the number of pilot signals, and then propose a method to increase the number of pilot signals for the spatial filter construction. As an instance, we simulated the convergence speed of the spatial filter over the number of pilot symbols for SU 2 in Fig. 2.4. Fig. 2.6 and Fig. 2.7 present our simulation results in two network settings: (Mp = 1, Ms = 2) and (Mp = 2, Ms = 3). From the simulation results, we can see 33    5HDODQGLPDJSDUWVRIHQWULHVLQ* k 5HDODQGLPDJSDUWVRIHQWULHVLQ* k 5HDODQGLPDJSDUWVRIHQWULHVLQ* k                      1XPEHURISLORWV\PEROV 1XPEHURISLORWV\PEROV 1XPEHURISLORWV\PEROV (a) SNR=5dB case. (b) SNR=15dB case. (c) SNR=25dB case. Figure 2.6: Convergence speed of spatial filter over the number of pilot symbols in (Mp = 1, Ms = 2) network.    5HDODQGLPDJSDUWVRIHQWULHVLQ* k 5HDODQGLPDJSDUWVRIHQWULHVLQ* k 5HDODQGLPDJSDUWVRIHQWULHVLQ* k                      1XPEHURISLORWV\PEROV 1XPEHURISLORWV\PEROV 1XPEHURISLORWV\PEROV (a) SNR=5dB case. (b) SNR=15dB case. (c) SNR=25dB case. Figure 2.7: Convergence speed of spatial filter over the number of pilot symbols in (Mp = 2, Ms = 3) network. that the spatial filter converges at a pretty fast speed in these two network settings. Specifically, the spatial filter can achieve a good convergence within about 10 pilot symbols. Recall that the secondary network uses IEEE 802.11 legacy frame for transmissions from SU 1 to SU 2, which only has four pilot symbols on each subcarrier (i.e., two L-STF OFDM symbols and two L-LTF OFDM symbols). So, the construction of spatial filter is in shortage of pilot sym- bols. To address this issue, for each subcarrier, we not only use the pilot symbols on that subcarrier but also the pilot symbols on its neighboring subcarriers, as illustrated in Fig. 2.5. The rationale behind this operation lies in the fact that channel coefficients on neighboring subcarriers are highly 34 PU 1 PU 2 PU 1 PU 2 SU 1 SU 2 SU 1 SU 2 (a) Transmission in Phase I. (b) Transmission in Phase II. Figure 2.8: Experimental setup for an underlay CRN with two network settings: (Mp =1, Ms =2) and (Mp=2, Ms=3). correlated in real-world wireless environments. By leveraging the pilot symbols on two neighbor- ing subcarriers, we have 12 pilot symbols for the construction of the spatial filter, which appears to be sufficient based on our simulation results in Fig. 2.6 and Fig. 2.7. We note that analytically studying the performance of BIC with respect to the number and format of pilot signals is beyond the scope of this work. Instead, we resort to experiments to study its performance in real-world network settings. 2.7 Performance Evaluation In this section, we consider an underlay CRN in two time slots as shown in Fig. 2.8. We have built a prototype of the proposed underlay spectrum sharing scheme in this network on a Software- Defined Radio (SDR) testbed and evaluated its performance in real-world wireless environments. 2.7.1 Implementation PHY Implementation. We consider three different primary networks: a commercial Wi-Fi pri- mary network, a LTE-like primary network, and a CDMA-like primary network. The commercial Wi-Fi network comprises Alfa AWUS036NHA 802.11n Adapters, each of which has one antenna for radio signal transmissions and receptions. The LTE-like and CDMA-like primary networks as well as the secondary network are built using USRP N210 devices and general-purpose com- 35 Table 2.1: The implementation parameters of primary and secondary networks. Primary Primary Primary Secondary network 1 network 2 network 3 network System type Commercial Custom-built Custom-built Custom-built Standard Wi-Fi LTE-like CDMA-like Wi-Fi-like Waveform OFDM OFDM CDMA OFDM FFT-Point 64 1024 - 64 Valid subcarriers 52 600 - 52 Sample rate 20 MSps 10 MSps 5 MSps 5, 25 MSps Signal bandwidth ∼16 MHz ∼5.8 MHz ∼5 MHz ∼4.06, 20.31 MHz Carrier frequency 2.48 GHz 2.48 GHz 2.48 GHz 2.48 GHz Max tx power ∼20 dBm ∼15 dBm ∼15 dBm ∼15 dBm Antenna number 1 1, 2 1 2, 3 puters. The USRP devices are used for radio signal transmission/reception while the computers are used for baseband signal processing and MAC protocol implementation. The implementation parameters are listed in Table 2.1. MAC Implementation. We implement the MAC protocol in Fig. 2.3 for the primary and secondary networks. The packet transmissions in the two networks are loosely aligned in time, as shown in Fig. 2.3. Since the bidirectional communications in the secondary network are symmetric, we only consider the forward communications (from SU 1 to SU 2). We implement BBF on SU 1 to pre-cancel interference for the primary receiver. We also implement BIC on SU 2 to decode its desired signals in the presence of interference from PU 1. Moreover, we implement the RF chain calibration method [150] on SU 1 in Fig. 2.8 to maintain relative channel reciprocity. Note that the calibration needs to be done at a low frequency (0.1 Hz in our experiments) and therefore would not consume much airtime resource. 2.7.2 Experimental Setup and Performance Metrics Experimental Setup. Consider the primary and secondary networks in Fig. 2.8. We place the devices on a floor plan as shown in Fig. 2.9(a). The two primary users are always placed at the 36 32 m Loc 11 Loc 10 (a) (b) SU 1 SU 2 SU 1 SU 2 Loc 8 SU 1 SU 2 Loc 12 Loc 9 SU 1 SU 2 SU 1 SU 2 Loc 1 Loc 7 14 m (c) SU 1 SU 2 PU 1 PU 2 SU 1 SU 2 Loc 2 Loc 3 Loc 4 Loc 5 Loc 6 SU 1 SU 2 SU 1 SU 2 SU 1 SU 2 SU 1 SU 2 SU 1 SU 2 Figure 2.9: Experimental setting: (a) floor plan of primary and secondary users’ locations; (b) a primary transceiver; and (c) a secondary transceiver. spots marked “PU 1” and “PU 2.” The two secondary users are placed at one of the 12 different locations. The distance between PU 1 and PU 2 is 10 m and the distance between SU 1 and SU 2 is 6 m. Fig. 2.9(b-c) show the prototyped secondary and primary transceivers on our wireless testbed. The transmit power of primary users is fixed to the maximum level specified in Table 2.1, while the transmit power of secondary users is properly adjusted to ensure that its generated interference to the primary receiver (after BBF) is below noise level. Performance Metrics. We evaluate the performance of the proposed spectrum sharing scheme using the following four metrics: i) Tx-side IC capability at SU 1: This IC capability is from SU 1’s BBF. It is defined as βtx = 10 log10 (P1 /P2 ), where P1 is the received interference power at PU 2 when SU 1 uses [ √1 √1 ] or [ √1 √1 √1 ] as the precoder, and P2 is the received interference 2 2 3 3 3 power at PU 2 when SU 1 uses our BBF precoder. ii) Rx-side IC capability at SU 2: This IC capability is from SU 2’s BIC. It is defined as βrx = |EVM| − max{SIR m }, where SIR m is Signal to Interference Ratio (SIR) on SU 2’s mth antenna and Error Vector Magnitude (EVM) will 37 Table 2.2: EVM specification in IEEE 802.11ac [67]. EVM (dB) (inf -5) [-5 -10) [-10 -13) [-13 -16) [-16 -19) [-19 -22) [-22 -25) [-25 -27) [-27 -30) [-30 -32) [-32 -inf) Modulation N/A BPSK QPSK QPSK 16QAM 16QAM 64QAM 64QAM 64QAM 256QAM 256QAM Coding rate N/A 1/2 1/2 3/4 1/2 3/4 2/3 3/4 5/6 3/4 5/6 γ(EVM) 0 0.5 1 1.5 2 3 4 4.5 5 6 20/3 Table 2.3: EVM specification for LTE-like PHY [43, 77]. EVM (dB) [-6.3 -9.1) [-9.1 -11.8) [-11.8 -14.2) [-14.2 -16.8) [-16.8 -19.1) CQI 6 7 8 9 10 Modulation QPSK 16QAM 16QAM 16QAM 64QAM Coding rate ×1024 602 378 490 616 466 γ(EVM) 1.1758 1.4766 1.9141 2.4063 2.7305 EVM (dB) [-19.1 -21.0) [-21.0 -23.3) [-23.3 -25.7) [-25.7 -28.2) [-28.2 -∞) CQI 11 12 13 14 15 Modulation 64QAM 64QAM 64QAM 64QAM 64QAM Coding rate ×1024 567 666 772 873 948 γ(EVM) 3.3223 3.9023 4.5234 5.1152 5.5547 be defined in the following. iii) EVM of the decoded signals at SU 2: It is defined as follows:    [2] [2] 2 E X̂s (l, k) − Xs (l, k) EVM = 10 log10   [2] . (2.29) 2 E Xs (l, k) iv) Throughput of secondary and primary networks: The throughput of the primary and secondary networks are extrapolated based on the measured EVM at SU 2 and PU 2, respectively. To calculate throughput, we use Nsc r= · b · ηt · γ (EVM) , (2.30) Nfft + Ncp where Nsc , Nfft , and Ncp denote number of used subcarriers, FFT points, and the length of cyclic prefix, respectively. b is the sampling rate in MSps. ηt is the portion of available airtime being used for signal transmissions. γ(EVM) is the average number of bits carried by one subcarrier. This parameter is specified in Table 2.2 and Table 2.3 for WiFi-like PHY and LTE-like PHY, respectively. 38 Packet per second 3000 Packet per second 3000 1500 1500 0 0 0 10 20 30 0 10 20 30 Time (s) Time (s) (a) Interference-free scenario. (b) Spectrum sharing scenario. Figure 2.10: Packet delivery rate of the primary network in interference-free and spectrum sharing scenarios. 2.7.3 Coexistence with Commercial Wi-Fi Devices We first consider primary network 1 in Table 2.1. The two Wi-Fi devices (Alfa 802.11n don- gles with Atheros Chipset) in this primary network are connected in the ad-hoc mode, and they send data packets to each other as shown in Fig. 2.3. These two devices are placed at the spots marked by blue squares in Fig. 2.9. The secondary network is also specified in Table 2.1. Each secondary device is equipped with two antennas. We place the two secondary devices at location 1 in Fig. 2.9(a). Primary Network. We first study the performance of the primary devices with and without spectrum sharing. Fig. 2.10(a) shows the measured packet delivery rate between the two primary devices in the absence of secondary devices (i.e., the secondary devices are turned off). Fig. 2.10(b) shows the measured result when the secondary devices conduct their transmissions in Phase II (see Fig. 2.8(b)). It can be seen that, in both cases, the primary network achieves almost the same packet delivery rate. This indicates that the primary network is almost not affected by the secondary network. How is the interference from the secondary transmitter handled? Is it because of the BBF on the secondary transmitter? To answer these questions, we conduct another experiment. When both 39 3000 Packet per second 1500 After moving SU 1's one antenna by 10 cm 0 10 20 30 40 50 60 Time (s) Figure 2.11: Packet delivery rate of the primary network before and after moving SU 1’s one antenna by 10 cm. Relative power (dB) -60 Relative power (dB) -60 -80 -80 -100 -100 -120 -120 -10 -5 0 5 10 -10 -5 0 5 10 Frequency (MHz) Frequency (MHz) (a) Received interference from PU 1. (b) Received signal from SU 1. Figure 2.12: Relative power spectral density of the received signal and interference at the sec- ondary receiver’s first antenna. primary and secondary networks are transmitting, we move one of the secondary transmitter’s antennas about 10 cm. Fig. 2.11 shows the packet delivery rate of the primary network before and after the antenna movement. We can see that the movement of SU 1’s one antenna results in a steep drop of primary network’s packet delivery rate. This indicates that it is SU 1’s BBF that mitigates the interference for PU 2. Secondary Network. We now shift our focus to the secondary network. We first check the strength of signal and interference at the secondary receiver. Fig. 2.12 shows the measured results on one of SU 2’s antennas. We can see that the signal and interference at the secondary receiver are at the similar level. This observation also holds for the another antenna. We then check the performance of the secondary receiver in the presence of interference from the primary transmitter. To do so, we conduct three experiments: i) interference-free transmission of the secondary network (secondary devices only, no primary devices); ii) spectrum-sharing transmission with SU 2 using 40 1 1 1 0 EVM= -27.0 dB 0 EVM= -23.2 dB 0 -1 -1 -1 -1 0 1 -1 0 1 -1 0 1 (a) Interference-free. (b) BIC-spectrum sharing. (c) ZF- spectrum sharing. Figure 2.13: Constellation diagram of the decoded signals at the secondary receiver (SU 2) in three different experiments. our proposed BIC; and iii) spectrum-sharing transmission with SU 2 using ZF signal detection. The measured results are presented in Fig. 2.13. It is clear to see that, with the aid of BIC, the secondary receiver can successfully decode its desired signals. Compared to the interference-free scenario, the EVM degradation is about 3.8 dB. The conventional ZF signal detection method cannot decode the signal in the presence of interference. This shows the effectiveness of our proposed BIC technique. A demo video of our real-time spectrum sharing scheme can be found in [131]. 2.7.4 Network Setting: (Mp = 1, Ms = 2) We now consider the CRN in Fig. 2.8 when the primary devices have one antenna (Mp = 1) and the secondary devices have two antennas (Ms = 2). Primary networks 2 and 3 specified in Table 2.1 are used in our experiments. 2.7.4.1 A Case Study As a case study, we use primary network 3 (CDMA-like) in Table 2.1 and place the secondary devices at location 1 to examine the proposed spectrum sharing scheme. Tx-Side IC Capability. We first want to quantify the tx-side IC capability at the secondary 41 Relative power (dB) -60 Relative power (dB) -60 -80 -80 -100 -100 -120 -120 -2 -1 0 1 2 -2 -1 0 1 2 Frequency (MHz) Frequency (MHz) (a) SU 1 uses [ √12 √1 ] 2 as precoder. (b) SU 1 uses our BBF technique. Figure 2.14: Relative power spectral density of PU 2’s received interference from two-antenna SU 1 in two cases. transmitter (SU 1) from its BBF. To do so, we conduct the following experiments. We turn off the primary transmitter (PU 1) and measure the received interference at the primary receiver (PU 2) in two cases: (i) using [ √1 √1 ] as the precoder; and (ii) using our proposed beamforming precoder 2 2 in (2.7) and (2.8) with α1 = 1. Fig. 2.14 presents our experimental results. We can see that, in the first case, the relative power spectral density of PU 2’s received interference is about −87 dB. In the second case, the relative power spectral density of PU 2’s received interference is about −113 dB. Comparing these two cases, we can see that the tx-side IC capability from BBF is about 113 − 87 = 26 dB. We note that, based on our observations, the relative power spectral density of the noise at PU 2 is in the range of −120 dB to −110 dB. Therefore, thanks to BBF, the interference from the secondary transmitter to the primary receiver is at the noise level. Rx-Side IC Capability, EVM, and Data Rate. We now study the performance of the sec- ondary receiver (SU 2). First, we measure SIR at SU 2. Fig. 2.15 shows our measured results on SU 2’s first antenna. We can see that the relative power spectral density of its received signal and interference is −83 dB and −73 dB, respectively. This indicates that the SIR on SU 2’s first antenna is −10 dB (assuming that noise is negligible). Using the same method, we measured that the SIR on SU 2’s second antenna is −12 dB. We measure the EVM of SU 2’s decoded signals in the presence of interference. Fig. 2.16(a–b) present the constellation of the decoded signals at SU 2. It is evident that SU 2 can decode both 42 Relative power (dB) Relative power (dB) -60 -60 -80 -80 -100 -100 -120 -120 -2 -1 0 1 2 -2 -1 0 1 2 Frequency (MHz) Frequency (MHz) (a) SU 2’s received signal on its first antenna. (b) SU 2’s received interference on its first an- tenna. Figure 2.15: Relative power spectral density of SU 2’s received signal and interference on its first antenna. 1 1 1 1 0 EVM= -21.9 dB 0 EVM= -22.0 dB 0 EVM= -25.1 dB 0 EVM= -25.3 dB -1 -1 -1 -1 -1 0 1 -1 0 1 -1 0 1 -1 0 1 (a) QPSK signals- (b) 16QAM signal- (c) QPSK signal- (d) 16QAM signal- spectrum sharing. spectrum sharing. interference-free. interference-free. Figure 2.16: Constellation diagram of decoded signals at SU 2: our spectrum sharing scheme versus interference-free scenario. QPSK and 16QAM signals from SU 1 in the presence of interference from PU 1. The EVM is −21.9 dB when QPSK is used for the secondary network and −22 dB when 16QAM is used for the secondary network. As a benchmark, Fig. 2.16(c–d) present the experimental results when there is no interference from PU 1. Comparing Fig. 2.16(a–b) with Fig. 2.16(c–d), we can see that SU 2 can effectively cancel the interference from PU 1. Finally, we calculate SU 2’s IC capability and throughput. Based on the SIR on SU 2’s an- tennas and the EVM of its decoded signals, SU 2’s IC capability is 10 + 21.9 = 31.9 dB in this case. Based on (2.30) and the measured EVM, the throughput (data rate) of secondary network is extrapolated to be 4.5 Mbps. 43 (dB) Tx-side IC capability (dB) CDMA-like PHY CDMA-like PHY 40 LTE-like PHY LTE-like PHY 30 Rx-side IC capability 30 20 20 10 10 0 0 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 Test location index Test location index (a) Tx-side IC capability from the secondary (b) Rx-side IC capability from the secondary re- transmitter’s BBF. ceiver’s BIC.  Figure 2.17: Tx-side and rx-side IC capabilities of the secondary network for Mp = 1, Ms = 2 setting. 2.7.4.2 Experimental Results at all Locations We now extend our experiments from one location to all the 12 locations and present the measured results as follows. Tx-Side IC Capability. Fig. 2.17(a) presents the tx-side IC capability of the two-antenna secondary transmitter (SU 1). We can see that the secondary transmitter achieves a minimum of 20.0 dB and an average of 25.3 dB IC capability across all the 12 locations. Rx-Side IC Capability. Fig. 2.17(b) presents the rx-side IC capability of the two-antenna secondary receiver. We can see that the secondary receiver achieves a minimum of 25.0 dB, a maximum of 38.0 dB, and an average of 32.8 dB IC capability across all the 12 locations, regardless of the PHY used for the primary network. Rx-Side EVM. Fig. 2.18(a) presents the EVM of the decoded signals at the two-antenna sec- ondary receiver in the presence of interference from the primary transmitter. We can see that in all the locations, although the EVM varies, the EVM achieves an average of −21.8 dB, regardless of the PHY used for the primary network. Throughput of Secondary Network. Based on the measured EVM at the secondary receiver, we extrapolate the achievable data rate in the secondary network using (2.30). Fig. 2.18(b) presents 44 8 *") 89:;*04<,/=>? CDMA-like PHY Throughput (Mbps) @+A*04<,/=>? LTE-like PHY 6 !"#$%&' *!) 4 * ) 2 ) 0 ! " # $ % & ' ( ) ! 1 2 3 4 5 6 7 8 9 10 11 12 +,-./0123.415/456,7 Test location index (a) EVM of the decoded signals at the secondary (b) Throughput of the secondary network. receiver. Figure 2.18: Performance  of the secondary network in the proposed spectrum sharing scheme for Mp = 1, Ms = 2 setting. the results. As we can see, the secondary network achieves a minimum of 3.0 Mbps data rate, a maximum of 6.7 Mbps, and an average of 5.1 Mbps across all the 12 locations. Note that this data rate is achieved by the secondary network in 5 MHz bandwidth, and the secondary transmitter’s power is controlled so that its interference at the primary receiver (after BBF) remains at the noise level. 2.7.4.3 BBF versus Other Beamforming Techniques As BBF is the core component of our spectrum sharing scheme, we would like to further examine its performance by comparing it against the following two beamforming techniques. • Explicit Beamforming (EBF): In this technique, the secondary transmitter (SU 1) has knowl- [1] edge of forward channel between itself and the primary receiver (PU 2), i.e., Hsp (k). The forward channel knowledge is obtained through explicit channel feedback. Specifically, SU 1 sends a Null Data Packet (NDP) to PU 2, which estimates the channel and feed the estimated [1] channel information back to SU 1. After obtaining the forward channel Hsp (k), SU 1 con- [1] structs the precoder by P(k) = mineigvectors(Hsp (k)), where k is subcarrier index. • Implicit Beamforming (IBF): In this technique, the secondary transmitter (SU 1) has knowl- 45 Tx-side IC capability (dB) 30 20 10 Explicit beamforming (EBF) Implicit beamforming (IBF) Blind beamforming (BBF) 0 1 2 3 4 5 6 7 8 9 10 11 12 Test location index Figure 2.19: Tx-side IC capability of the three beamforming techniques when the secondary device has three antennas. [1] edge of backward channel from the primary receiver (PU 2) to itself, i.e., Hps (k). The backward channel knowledge is obtained through implicit channel feedback. Specifically, [1] PU 2 sends an NDP to SU 1. SU 1 first estimates the backward channel Hps (k). It then [1] constructs the precoder by P(k) = mineigvectors(Hps (k)), where k is subcarrier index. Channel calibration has been performed at SU 1 before signal transmission. We conduct experiments to measure the tx-side IC capability of these three beamforming tech- niques. Fig. 2.19 depicts our results. We can see that, compared to EBF, our proposed BBF has a maximum of 4.5 dB and an average of 2.1 dB degradation. Compared to IBF, our proposed BBF has a maximum of 2.5 dB and an average of 1.0 dB degradation. The results show that the proposed BBF has competitive performance compared to EBF and IBF. We note that, although offering bet- ter performance, EBF and IBF cannot be used in underlay CRNs as they require knowledge and cooperation from the primary devices. 2.7.5 Network Setting: (Mp = 2, Ms = 3) We now study the CRN in Fig. 2.8 when the primary devices have two antennas and the secondary devices have three antennas (i.e., Mp = 2 and Ms = 3). The primary devices use their two 46 Rx-side IC capability (dB) 30 40 Tx-side IC capability (dB) 30 20 20 10 10 Antenna 1 Antenna 2 0 0 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 Test location index Test location index (a) Tx-side IC capability from the secondary (b) Rx-side IC capability from the secondary re- transmitter’s BBF. ceiver’s BIC.  Figure 2.20: Tx-side and rx-side IC capabilities of the secondary network for Mp = 2, Ms = 3 setting. antennas for spatial multiplexing. That is, two independent data streams are transfered in the primary network. The secondary devices use their spatial DoF provided by their three antennas for both interference management and signal transmission. Indeed, one data stream is transfered in the secondary network. The primary network uses LTE-like PHY (see primary network 2 in Table 2.1) for data transmission. We study our spectrum sharing scheme in this CRN and report the measured results below. Tx-Side IC Capability. In this CRN, since the primary receiver has two antennas, the sec- ondary transmitter needs to cancel its generated interference for both antennas on the primary re- ceiver. We measure the IC capability of our proposed BBF for the primary receiver’s both antennas. Fig 2.20(a) exhibits our measured results. We can see that a three-antenna secondary transmitter can effectively cancel the interference on the primary receiver’s both antennas. Specifically, the BBF on the secondary transmitter achieves a minimum of 21.7 dB, a maximum of 28.7 dB, and an average of 25.1 dB IC capability for the primary receiver’s two antennas. Rx-Side IC Capability. In this CRN, since the primary transmitter sends two independent data streams, the secondary receiver needs to decode its desired signals in the presence of two in- terference sources. We measure the rx-side IC capability of our proposed BIC at the three-antenna 47 -30 -30 EVM (dB) EVM (dB) -20 -20 -10 -10 Interference-free Interference-free Spectrum sharing Spectrum sharing 0 0 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 Test location index Test location index (a) EVM of the decoded data stream 1 at the pri- (b) EVM of the decoded data stream 2 at the pri- mary receiver. mary receiver. Figure 2.21: EVM of the twodata streams in the primary network with and without the secondary network for Mp = 2, Ms = 3 setting. secondary receiver. Fig 2.20(b) exhibits our measured results. We can see that the proposed BIC on the secondary receiver achieves a minimum of 26.5 dB, a maximum of 38.1 dB, and an average of 33.0 dB IC capability over the 12 locations. This shows the effectiveness of the proposed BIC in handling unknown interference. EVM at Primary Receiver. We now study the performance of the two data streams in the primary network. We want to see if the presence of secondary network harmfully affects the traffic in the primary network. To do so, we measure the EVM of the decoded two data streams at the primary receiver in two cases: i) in the presence of the secondary network, and ii) in the absence of the secondary network. Fig. 2.21 presents our measured results. It can be seen that the presence of the secondary network does not visibly affect the EVM performance of the primary network. This indicates that the BBF at the secondary network successfully mitigates the interference from the secondary transmitter to the primary receiver. Throughput of Primary Network. Based on the measured EVM at the primary receiver, we extrapolate the achievable data rate on each data stream of the primary network using (2.30). The extrapolated throughput is presented in Fig. 2.22. Referring to Fig. 2.22(a), the primary net- work achieves an average of 32.1 Mbps throughput for its stream 1 in interference-free case and an average of 31.9 Mbps throughput in coexistence with the secondary network. As shown in 48 40 40 Throughput (Mbps) Throughput (Mbps) 20 20 Interference-free Interference-free Spectrum sharing Spectrum sharing 0 0 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 Test location index Test location index (a) Extrapolated throughput of decoded primary (b) Extrapolated throughput of decoded primary data stream 1. data stream 2. Figure 2.22: Throughput of the two data  streams in the primary network with and without the secondary network for Mp = 2, Ms = 3 setting. 8 (Mbps) -30 6 (dB) -20 4 Throughput EVM -10 2 0 0 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 Test location index Test location index (a) EVM of decoded signals at the secondary re- (b) Throughput of the secondary network. ceiver. Figure 2.23: Performance  of the secondary network in the proposed spectrum sharing scheme for Mp = 2, Ms = 3 setting. Fig. 2.22(b), for its data stream 2, the primary network achieves 32.5 Mbps and 32.3 Mbps through- put on average in the interference-free and spectrum sharing scenarios, respectively. For both data streams, only 0.2 Mbps degradation is observed in the throughput of the primary network. EVM at Secondary Receiver. Having confirmed that the spectrum utilization of secondary network does not degrade the performance of primary network, we now study the achievable per- formance of the secondary network. Recall that we transfer one data stream in the secondary network. We measure EVM of the decoded signal at the secondary receiver. Fig. 2.23(a) depicts the measured results. We can see that the EVM at the secondary receiver achieves a minimum of −27.7 dB, a maximum of −18.2 dB, and an average of −22.5 dB over the 12 locations. Throughput of Secondary Network. Based on the measured EVM at the secondary receiver, 49 we extrapolate the achievable data rate of the secondary network using (2.30). The extrapolated data rate is presented in Fig. 2.23(b). We can see that the proposed spectrum sharing scheme achieves a minimum of 3.0 Mbps, a maximum of 7.5 Mbps, and an average of 5.5 Mbps over the 12 locations. Note that this data rate is achieved by the secondary network in 5 MHz and without harmfully affecting the primary network. 2.7.6 Summary of Observations We now summarize the observations from our experimental results as follows: • BBF: BBF demonstrates its capability of handling cross-network interference in CRNs where  the secondary network has no knowledge about the primary network. In Mp = 1, Ms = 2  network setting, BBF achieves an average of 25.3 dB IC capability. In Mp = 2, Ms = 3 network setting, BBF achieves an average of 25.1 dB IC capability. • BIC: BIC also demonstrates its capability of decoding the desired signals in the presence  of unknown interference. In Mp = 1, Ms = 2 network setting, it achieves an average of  32.8 dB IC capability. In Mp = 2, Ms = 3 network setting, it achieves an average of 33.0 dB IC capability. • Primary Network: The primary network has very small performance degradation when the secondary network shares the spectrum (compared to the case without secondary network). As shown in Fig. 2.24(a), the average EVM degradation at the primary receiver is 1.6% over the 12 locations. Also, as shown in Fig. 2.24(b), the average throughput degradation at the primary receiver is 0.7% over the 12 locations. • Secondary Network: Using BBF at its transmitter and BIC at its receiver, the secondary net- 50 % of interference-free case % of interference-free case 100 98.3 97.7 100 98.7 99.5 100 100 100 100 100 100 100 100 100 100 100 100 100 96.3 92.1 99 99 100 91.1 100 75 75 50 50 25 25 0 0 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 Test location index Test location index (a) Measured EVM of primary data streams. (b) Extrapolated throughput of primary data streams. Figure 2.24: Performance  of the proposed spectrum sharing scheme w.r.t. interference-free case for Mp = 2, Ms = 3 setting. work intends to establish communications by sharing the spectrum with the primary network.  The secondary network achieves 1.0 bits/s/Hz in the CRN with Mp = 1, Ms = 2 network  setting and 1.1 bits/s/Hz in the CRN with Mp = 2, Ms = 3 network setting. 2.8 Limitations and Discussions While the proposed scheme demonstrates its potential in real-world networks, there are still some issues that remain open and need to be addressed prior to its real applications. Primary Traffic Directions. In our spectrum sharing scheme, we assume that the primary communications are bidirectional and that the pattern of primary traffic is consistent. Under such assumptions, duration and direction of primary traffic are easy to learn for beamforming filter design. In real systems, the pattern of primary traffic might not be consistent. In such a case, a sophisticated learning algorithm is needed for the secondary devices to differentiate the forward and backward transmissions of the primary network. Channel Coherence Time. In static networks (e.g., indoor Wi-Fi), the devices are stationary or moving at a low speed. Then, the channel coherence time is large enough to cover the entire 51 period of primary forward transmission. But in the dynamic networks with highly mobile devices, the channel coherence time may be smaller than the duration of primary forward transmission. In such a case, the secondary network cannot use entire airtime of primary forward transmission. Instead, it can only access the spectrum when its beamforming filters remain valid (i.e., within the channel coherence time). Extension to Large-Scale Networks. In this work, we presented a spectrum sharing scheme for a small-size CRN consisting of one PU pair and one SU pair. This spectrum sharing scheme can be extended to a large-scale CRN that comprises multiple PU pairs and multiple SU pairs. This is because in most real-world wireless networks (e.g., Wi-Fi and cellular), only one user pair is active on a frequency band at a time. Therefore, our current design is a fundamental building block for spectrum sharing in a large-scale CRN. Nevertheless, extending our design to a large-scale CRN still faces several challenges. First, a secondary device should be capable of learning the active PU devices over time as well as their transmission direction and duration. For a secondary device, how to accurately obtain this information through a learning procedure is a challenging task. Second, primary devices may not be stationary (e.g., vehicular and unmanned aerial networks). How to design an adaptive and intelligent spectrum sharing MAC protocol for the secondary network is another challenging task. 2.9 Chapter Summary In this chapter, we proposed a spectrum sharing scheme for an underlay CRN that comprises two primary users and two secondary users. The proposed scheme allows the secondary users to use the spectrum without affecting the throughput of the primary users. The key components of our scheme are two MIMO-based IC techniques: BBF and BIC. BBF enables the secondary transmitter to pre- 52 cancel its generated interference for the primary receiver. BIC enables the secondary receiver to decode its desired signals in the presence of unknown cross-network interference. These two IC techniques make it possible for the secondary users to access the spectrum while remaining transparent to the primary users. We have built a prototype of our spectrum sharing scheme on a wireless testbed. We demonstrated that our prototyped secondary devices can coexist with com- mercial Wi-Fi devices. Extensive experimental results show that, for a secondary user with two or three antennas, BBF and BIC achieve about 25 dB and 33 dB IC capabilities in real wireless environments, respectively. 53 Chapter 3 D2D Communications in Cellular Networks 3.1 Introduction Cellular networks are key components of the telecommunications infrastructure in our society. Their roles of providing ubiquitous wireless Internet services become increasingly important with the proliferation of Internet-based applications such as smart cities, IoT, and autonomous driving. To increase the network capacity, provide massive connectivity, and meet the growing demands for wireless services, many advanced wireless technologies have been proposed for next-generation cellular networks. MU-MIMO, which allows a multi-antenna BS to simultaneously serve multiple User Equipment (UEs) on the same spectrum band, is one of the pivotal technologies for cellu- lar networks [102]. As its benefits are well recognized, MU-MIMO has already been deployed in real cellular networks to harness its throughput gain in the presence of antenna configuration asymmetricity. D2D communication is another promising technology for cellular networks [204]. Its basic idea is to allow direct communication between two proximity-based mobile users without travers- ing the BS or core network. As mobile users in today’s cellular networks require high data rate services (e.g., video sharing, online gaming, proximity-aware networking) in which they could potentially be in a short range for direct communication, D2D communication can greatly increase the spectral efficiency of the network. Moreover, the advantages of D2D communication go be- 54 yond spectral efficiency. Saving the airtime at the core network, D2D offers more airtime to the BS that can be leveraged to serve massive number of low-rate devices such as IoT sensors. It also can potentially reduce packet transmission delay, enhance user fairness, offload traffic for BSs, and alleviate congestion for core networks, especially in networks congested by IoT devices [177]. Although there are many results of MU-MIMO and D2D communications, most of them are limited to their respective domains and there is a lack of practical design to harvest the benefits of both technologies in cellular networks. Such a stagnation underscores the critical need for bridg- ing this gap. The main challenge in such a joint design is the interference management between MU-MIMO devices (BS and UEs) and D2D devices. As existing MU-MIMO schemes are vul- nerable to interference (e.g., pilot contamination), the performance of MU-MIMO communication will be dramatically degraded by the interference from active D2D devices if the interference is not properly handled. At the same time, the interference from MU-MIMO devices will also disrupt the D2D communications. Therefore, the coexistence of D2D and MU-MIMO communications necessitates a systematic scheme to tame the mutual interference between the two subsystems. In this chapter, we present DM-COM, a practical scheme for enabling the coexistence of D2D and MU-MIMO communications for cellular networks. We consider a single cell that comprises a BS, a set of cellular UEs (C-UEs), and a pair of D2D UEs (D-UEs) on each Physical Resource Block (PRB). The BS is equipped with several antennas; the C-UEs are equipped with one antenna; and the D-UEs are equipped with one or multiple antennas. MU-MIMO is used for communication between the BS and set of C-UEs. D2D technology is used for communication between the pair of D-UEs. We assume that MU-MIMO communication follows the principles of 5G New Radio (NR) standard (e.g., waveform and frame structure). We also assume that D-UEs know the network protocol and transmission pattern used by MU-MIMO as such information will be broadcast by BS over control channel. We further assume that the D2D applications are sensitive to communication 55 latency (e.g., virtual reality, online gaming, and health monitoring) and thus require low-delay bidirectional transmissions. In such a network, our objective is to enable the concurrent spectrum utilization of MU-MIMO and D2D communications. Towards this objective, we employ a blend of two interference management techniques: in- terference cancellation and beamforming, which are used in the following way. In the uplink MU-MIMO, the BS receives both desired signals from C-UEs and interfering signals from D-UEs. To decode its desired signals, the BS leverages the spatial DoF provided by its multiple antennas and constructs a decoding matrix to cancel the interference and equalize the channel distortion. In the downlink MU-MIMO, the BS constructs a beamforming (precoding) matrix to send its in- tended signals to C-UEs while pre-cancelling the interference for the receiving D-UEs. The C-UEs do not participate in the interference management and, instead, they rely on other devices to handle their interference. A similar approach is adopted to manage the interference in the D2D subsystem. While the idea of our interference management scheme is clear, many technical issues remain challenging. For uplink MU-MIMO transmission, how can the BS decode the signals from C-UEs in the presence of interference from D-UEs? For downlink MU-MIMO transmission, how can the BS perform beamforming in the downlink so it can mute its interference for the D-UEs? For these two questions, one possible solution is to design a dedicated channel acquisition protocol for the BS to obtain CSI for signal detection and beamforming. However, such a solution not only entails a large airtime overhead but also complicates the system operation. In light of this, we propose a new MU-MIMO scheme that is resilient to the interference from/to D-UEs. The key idea of our new scheme is that, instead of relying on CSI for signal detection and beamforming, we blindly use the received signals to extract spatial information required to train decoding and beamforming matrices. Surprisingly, such a scheme leads to a very good performance for signal detection in the face of interference, provided that the BS has sufficient antennas. 56 For D2D communication, we apply the same approach to managing interference. For a trans- mitting D-UE, it leverages the overheard interference from C-UEs to construct the precoding ma- trix for beamforming. For a receiving D-UE, it leverages the reference signals to construct the decoding matrix for signal detection in the presence of interference. By doing so, the D-UEs do not require CSI for signal detection and beamforming. Therefore, the need for notorious channel feedback is eliminated. Based on the above interference management scheme, we have developed DM-COM to enable the coexistence of D2D and MU-MIMO communications in cellular networks. In a nutshell, DM- COM advances the state-of-the-art in the following aspects: • At the cellular BS, we have designed an interference management technique that cancels interference from/to D2D users at uplink/downlink. This scheme does not need CSI nor synchronization with D2D users. • At the D2D users, we have designed an interference management technique that cancels interference from/to cellular nodes. This scheme does not need CSI nor synchronization with cellular subsystem. • We have proposed DM-COM, a holistic scheme to enable coexistence of D2D and MU-MIMO technologies without adversely affecting each other. • We have built a prototype of DM-COM on a wireless testbed consisting of USRP N210 devices and shown DM-COM’s efficacy in handling cross-subsystem interference in real- world wireless environment. We evaluated the performance of DM-COM in a pico-cell network where a four-antenna BS serves two single-antenna C-UEs in accordance with 5G NR standard. In the network, there co- exists a pair of D-UEs for direct communication. One D-UE has one antenna and the other has 57 three antennas. Our experimental results show that DM-COM reaches 1.9 bit/s/Hz spectral effi- ciency for D2D users. This is achieved at the cost of 8.0% throughput degradation for MU-MIMO users (compared to the case without D2D users). Moreover, compared to the conventional case where all the users (C-UEs and D-UEs) are served by the BS, DM-COM improves the average network throughput from 21.9 Mbps to 35.1 Mbps in 5 MHz bandwidth, i.e., 60.3% throughput gain for DM-COM is observed. Our experimental results show that DM-COM successfully re-uses the spectrum that is pre-occupied by C-UEs. DM-COM maintains the performance of incumbent C-UEs and increases the overall network throughput through establishing D2D communications. 3.2 Related Work We briefly review D2D and MU-MIMO solutions in cellular networks. D2D. To accommodate ever-increasing users in cellular networks and enhance the spectrum re-utilization, D2D users are allowed to communicate directly without involvement of the BS. Despite its potential benefits, a D2D sub-system needs to control co-channel interference, manage resources for competing users, and mitigate security threats [177]. In order for accomplishing these tasks, the enablers of D2D communications include beamforming [115,165,176], spectral resource management [30, 84, 130, 160], power control [9, 10, 61, 62, 98, 160, 174], and mode selection [15, 27, 60]. The existing research follows different objectives, such as achievable data rate [9, 10, 15, 27, 160], fairness [98], interference minimization [61], energy efficiency [62, 84, 130, 165, 174], and security of D2D systems [112, 148, 158]. From another perspective, most of existing works consider spectrum re-utilization in either uplink (see, e.g., [10, 61, 98]) or downlink (see, e.g., [115, 165, 176]) of cellular networks, but not both. Moreover, most of the existing works require perfect global channel knowledge as well as 58  %6 ''   ,QWHUIHUHQFH '8( '8(  &8( &8( &8( Figure 3.1: Coexisting MU-MIMO and D2D communications over one PRB in a cellular network. network-wide synchronization. In contrast, DM-COM enables spectrum re-utilization in both up- link and downlink. It does not require channel feedback between the network devices, nor network- wide fine-grained synchronization. MU-MIMO. MU-MIMO has widely been employed in current wireless systems. The main components of MU-MIMO are beamforming in the downlink and multi-user detection in the up- link. Most of beamforming methods are reliant on perfect CSI [34, 122, 163]. In the uplink, blind beamforming methods offer a solution to this challenge, but suffer from high computa- tional complexity and long processing delays since they need to solve a complex optimization problem [6, 24, 173] or follow sophisticated procedures to learn spatial information [12, 125, 126]. In the downlink, existing signal detection methods consider benign environments where the net- work nodes are perfectly synchronized [22, 42, 183, 201]. DM-COM differs from existing methods as it eliminates the need for channel feedback and network-wide synchronization in both downlink and uplink. 3.3 Problem Description Network Setting. We consider a cellular network as shown in Fig. 3.1. It comprises a BS, a set of C-UEs, and a pair of D-UEs on one PRB. The BS has multiple antennas, and each C-UE has 59 a single antenna.1 Let Mbs denote the number of the BS’s antennas. Let N denote the number of C-UEs. To fully utilize the BS’s antennas and maximize the spectral efficiency, MU-MIMO is used for the communication between the BS and N C-UEs. The BS coordinates uplink and downlink transmissions with Time Division Multiple Access (TDMA) to serve cellular users. Within the cellular network, there coexists a pair of D-UEs intending to conduct bi-directional communication over a PRB without traversing the BS. without any loss of generality, in the remainder of this chapter, we focus on one pair of D-UEs over a PRB. All the arguments hold for multiple pair of D-UEs, each of which exclusively work over one or multiple PRBs. In the D2D pair under consideration, the two D-UEs may have different numbers of antennas. Let Md1 and Md2 denote the number of D-UE 1’s and D-UE 2’s antennas, respectively. Without loss of generality, we also assume that the number of D-UE 1’s antennas is less than or equal to the number of D-UE 2’s antennas, i.e., Md1 ≤ Md2 . For such a network, we have the following assumptions and justifications: • We assume that the user selection for MU-MIMO and D2D has taken place. User selection is not within the scope of this work. In real networks, there may exist multiple pairs of D-UEs. In that case, different pairs of D-UEs can be assigned to different PRBs based on some criteria. So, focusing on one pair is sufficient to study the coexistence problem which is indeed the main objective of this chapter. • We assume that the BS has more antennas than C-UEs, i.e., Mbs > N . This assumption can be fulfilled through user selection algorithms. Under this assumption, in addition to decoding the N desired data streams from C-UEs, the BS has remaining spatial DoF provided by its antennas to cancel interference from/to D-UEs. 1 DM-COM can support the case where C-UE has multiple antennas by simply treating a multi-antenna C-UE as multiple single-antenna C-UEs. 60  %6 6LJQDOLQJ 'RZQOLQNSDFNHW 'RZQOLQNSDFNHW D 080,02 WLPH 8SOLQNSDFNHW 8SOLQNSDFNHW  &8( WLPH  8SOLQNSDFNHW 8SOLQNSDFNHW  &8(N WLPH )RUZDUGSDFNHW )RUZDUGSDFNHW  '8( E '' WLPH %DFNZDUGSDFNHW %DFNZDUGSDFNHW  '8( WLPH /LVWHQLQJDW'8( /LVWHQLQJDW'8( Figure 3.2: The proposed network protocol for coexisting MU-MIMO and D2D communications. • We assume that the D-UEs know the data flow pattern of MU-MIMO communication indi- cated by slot format in NR. We also assume that C-UEs are oblivious to D-UEs. C-UEs will not contribute to the interference management. • We assume that the channel coherence time is sufficient (e.g., 1 ms). The same assumption has been made by other beamforming-based MIMO systems [34, 122, 163]. Our Objective. We aim to develop DM-COM, a practical scheme to enable the coexistence (concurrent spectrum utilization) of D2D and MU-MIMO communications by taming their mutual interference. More specifically, we aim to maximize the throughput of the D2D communication while maintaining the performance of MU-MIMO subsystem. 3.4 DM-COM: An Overview In this section, we first present a network protocol for the concurrent spectrum utilization of coex- isting MU-MIMO and D2D subsystems, and then analyze the achievable data streams on the D2D link. Finally, we point out the underlying challenges in interference management at the physical layer and outline our solutions. 61 PRB #24 Subframe 0 Subframe 1 Subframe 2 Subframe 9 #10 5 MHz #9 #2 #1 #0 PSS SSS PBCH PBCH DM-RS PDSCH PDSCH DM-RS PDSCH PT-RS PUSCH PUSCH DM-RS PUSCH PT-RS Flexible Figure 3.3: Frame structure for MU-MIMO communication. 3.4.1 Network Protocols MU-MIMO Communication. In the context of cellular networks, Fig. 3.2(a) presents our proposed protocol for uplink and downlink MU-MIMO transmissions. The protocol works as follows. The BS first broadcasts an announcement about MU-MIMO transmission to the selected C-UEs. Then, the selected C-UEs send their packets to the BS in the uplink, which is followed by spatial multiplexing in downlink transmissions. The uplink and downlink transmissions repeat until the session of MU-MIMO communication terminates. To support MU-MIMO communication, we consider NR-like frame format. Fig. 3.3 depicts the frame structure in one PRB within a frame. To be specific, this frame structure is adopted based on N38 frequency band and slot format 45 setting over 5 MHz [1]. As shown in the figure, the frame is composed of 10 subframes, each of which comprises 14 OFDM symbols according to numerology µ = 0 in NR. Based on the bandwidth configuration, an OFDM symbol has 300 occupied subcarriers grouped into 25 PRBs. Reference signals are embedded into frames for synchronization, signal demodulation, phase tracking, etc. Among the reference signals shown in Fig. 3.3, We will leverage PDSCH DM-RS of downlink packets and PUSCH DM-RS of uplink packets in our design. As shown in the figure, not every subcarrier has these reference signals. This is because, the subcarrier spacing is small 62 (15 kHz), and the channels of adjacent subcarriers are highly correlated. Therefore, if a subcarrier does not have reference signal, the reference signals on its adjacent subcarriers can be used for signal demodulation (detection). This feature will also be leveraged in the design of our signal detection method. TDD is considered for MU-MIMO to support its uplink and downlink transmis- sions. The ratio of uplink and downlink duration can be configured as desired based on the slot format. For ease of demonstration, we have considered slot format 45 with equal downlink/uplink duration, and we equally assigned flexible OFDM symbols to uplink and downlink transmissions. D2D Communication. Fig. 3.2(b) shows the proposed transmission protocol for the D2D communication, with respect to the timeline of uplink/downlink transmission in MU-MIMO sub- system. In the uplink MU-MIMO, D2D conducts forward transmissions (from D-UE 1 to D-UE 2). In the downlink MU-MIMO, D2D conducts backward transmissions (from D-UE 2 to D-UE 1). To establish such a timing alignment, D2D subsystem needs neither fine-grained synchronization with MU-MIMO subsystem nor coordination from the cellular BS. The D2D subsystem can learn cellular traffic pattern by either listening the information over the control channel or tracking the spatial signatures of signals on multiple antennas on D-UEs. It then adjusts its transmission ac- tivities based on learned pattern. As illustrated in the figure, the time duration of D2D forward transmissions is slightly shorter than that of uplink MU-MIMO. In this time period, D-UE 2 over- hears the interfering signals from C-UEs, which will be used for the calculation of its beamforming matrix. For the D2D communication, two remarks are in order. First, the mutual interference between D2D and MU-MIMO communications will be properly handled at physical layer. Therefore, the D2D and MU-MIMO subsystems remain oblivious to each other from the viewpoint of MAC or upper layers. Second, as we shall see later, the interference management at the physical layer does not require PHY-layer cooperation between the D2D and MU-MIMO subsystems. Hence, the D2D 63 and MU-MIMO subsystems do not need to use the same frame structure and modulation. As we will show via experiments, D2D can employ IEEE 802.11 PHY for its transmissions. 3.4.2 Achievable Data Streams (DoF) on the D2D Link For the protocol in Fig. 3.2, a natural question to ask is how many data streams can be transported on the D2D link. Apparently, it depends on the number of D-UE 2’s antennas. If D-UE 2 has a large number of antennas, then many data streams can be transported on the D2D link, provided that D-UE 1 has enough DoF to support all incoming streams from D-UE 2. If D-UE 2 does not have sufficient antennas, then no data stream can be transported on the D2D link. We note that the number of data streams on an MIMO link, which is also known as DoF, is the first- order approximation of its Shannon capacity with respect to Signal to Noise Ratio (SNR). It also represents the multiplexing gain of the MIMO link in high-SNR regime. Therefore, studying the number of data streams is of great theoretical importance to analyze the achievable data rate of the D2D link (given that analyzing its Shannon capacity is out of our capability). In what follows, we derive the achievable data streams on the D2D link by analyzing the spatial DoF consumption in the uplink and downlink MU-MIMO using an existing DoF model [153]. Assume that the bi-directional transmissions on the D2D link are symmetric, i.e., the number of data streams from D-UE 1 to D-UE 2 is the same as that from D-UE 2 to D-UE 1. We let d ∈ N0 denote the number of data streams on the D2D link. To determine the maximum value of d, we first consider the uplink MU-MIMO as shown in Fig. 3.4(a). At the BS, it needs to decode N data streams from C-UEs and cancel d interfering streams from D-UE 1. Based on the DoF model in [153], we have N + d ≤ Mbs . At D-UE 1, it needs to transmit d data streams. We therefore have d ≤ Md1 . At D-UE 2, it needs to decode d data streams and cancel N data streams from C-UEs. We have N + d ≤ Md2 . Based on the above three constraints, the maximum number of achievable 64 &DQFHOOHGE\%6  '8( '8( %6   D &DQFHOOHGE\'8(  &8( &8( &8(1 3UHFDQFHOOHGE\%6 YLDEHDPIRUPLQJ  '8( '8( %6   E 3UHFDQFHOOHGE\'8(  YLDEHDPIRUPLQJ &8( &8( &8(1 Figure 3.4: Illustrating the mutual interference between MU-MIMO and D2D subsystems: (a) uplink; (b) downlink. data streams on the D2D link can be expressed by d = min(Md1 , Md2 −N, Mbs −N )+ , (3.1) where (·)+ returns a nonnegative number, i.e., max(·, 0). By the same token, it is easy to verify that d in (3.1) is also the maximum number of achievable data streams on the D2D link in downlink MU-MIMO (see Fig. 3.4(b)). 3.4.3 Interference Management and Its Challenges Now the question is how to handle the interference at the physical layer so that the D2D link can achieve d data streams while the MU-MIMO subsystem can maintain its N data streams between the BS and the N C-UEs. To answer this question, we consider uplink and downlink MU-MIMO 65 separately. For the uplink as shown in Fig. 3.4(a), we need to design a signal detection method for both D-UE 2 and the BS so that they can decode their respective signals in the presence of interference from unintended transmitters. For the downlink as shown in Fig. 3.4(b), we need to design a beamforming method for both D-UE 2 and the BS so that they can pre-cancel their generated interference for their unintended receivers. While there are many results of signal detection and beamforming in the context of MIMO, most of them require global CSI and perfect synchronization. Such requirements entail a large amount of airtime overhead, thereby degrading the spectral efficiency and complicating system operation. In light of this challenge, we propose a new signal detection method and show its resilience to interference. In contrast to existing detection methods (e.g., ZF and MMSE detectors), our signal detection method does not require CSI but is capable of decoding signals in the face of interference. In the downlink, we propose two new beamforming methods for BS and D-UE 2, respectively. Again, the proposed beamforming methods do not require CSI for the design of precoding matrix, differentiating themselves from existing beamforming methods. 3.5 MU-MIMO Communication In this section, we present new signal detection and beamforming methods for MU-MIMO to handle the interference between the BS and D-UE 1 (see Fig. 3.4). The interference between C- UEs and D-UE 2 will be handled by the D2D communication method presented in the next section. 3.5.1 Basic Idea In the MU-MIMO subsystem, the BS handles its interference in both uplink and downlink trans- missions by leveraging its spatial DoF offered by multiple antennas. Specifically, in the uplink 66 MU-MIMO as shown in Fig. 3.4(a), the BS performs interference cancellation and signal detec- tion to recover its desired signals from the N C-UEs in the presence of interference from D-UE 1. At the BS, interference cancellation and signal detection will be done using spatial matrices to combine the received signals from its multiple antennas. In the downlink MU-MIMO as shown in Fig. 3.4(b), the BS applies beamforming to pre-cancel its generated interference at D-UE 1. Recall that we assume the BS has more antennas than C-UEs (Mbs > N ). This assumption ensures that the BS has sufficient spatial DoF to send N data streams towards the N C-UEs and, at the same time, it is able to nullify its generated interference at D-UE 1. In contrast to the BS, the C-UEs do not participate in the interference management since they have a single antenna. They will rely on D-UE 2 to handle the interference in both uplink and downlink. As such, DM-COM preserves backward compatibility with incumbent C-UEs. In what follows, we focus on the baseband signal processing at the BS. We first present the signal detection method for the uplink and then present the beamforming method for the downlink. 3.5.2 Uplink Signal Detection at the BS Mathematical Formulation. We consider the uplink MU-MIMO transmissions in the presence of interference from D-UE 1 as shown in Fig. 3.4. Let sc ∈ CN ×1 denote the vector of signals that are transmitted by the N C-UEs. Let sd ∈ Cd×1 denote the vector of signals that are transmitted by D-UE 1. Let Pd ∈ CMd1 ×d denote its precoding vector. Also, Hc ∈ CMbs ×N denotes the compound channel between the BS and the N C-UEs, and Hd ∈ CMbs ×Md1 stands for the MIMO channel between the BS and D-UE 1. We further let w ∈ CMbs ×1 denote noise at the receiving BS. Then, the vector of received signals at the BS, which we denote as y ∈ CMbs ×1 , can be written as: y = Hc sc + Hd Pd sd + w. (3.2) 67 At the BS, to recover its desired signal sc in the presence of interference sd and noise w, one approach is using conventional detectors, such as ZF and MMSE detectors. These approaches, however, require channel knowledge about Hc and Hd . While Hc is easy to obtain, Hd is not. If the BS intends to obtain Hd , it requires to cooperatively work with D-UE 1, and a dedicated protocol is needed for channel sounding as well. This increases the airtime overhead and compli- cates network operation remarkably. In light of this challenge, we propose an approximate-MMSE MIMO detector for the BS, which does not require channel knowledge about Hc and Hd for signal detection. Detection Matrix Design. We consider linear detection at the BS. By letting G ∈ CN ×Mbs denote the detection matrix, the estimated signal at the BS can be written as ŝc = Gy, where ŝc is the estimated version of signal sc . Then, the Mean Square Error (MSE) between the original signal sc and estimated signal ŝc can be written as: MSE = E |Gy − sc |2 , where | · |2 is ℓ2 -norm   of a complex vector. By letting ∂MSE H H+ ∂G = 0, we can obtain G = E[sc y ]E[yy ] , where [·] is + Moore-Penrose inverse. This is actually another form of MMSE MIMO detector.2 To calculate G in real systems, we need to compute E[sc yH ] and E[yyH ]. To do so, we take advantage of the demodulation reference signals for uplink (PUSCH DM-RS) in the frame struc- ture, as shown in Fig. 3.3. In the uplink frame, one OFDM symbol is used for PUSCH DM-RS within a PRB. We can use these reference signals to estimate E[yyH ] and E[ysH c ]. Let us define that a PRB has 12 subcarriers and 14 OFDM symbols. Let R denote the set of PUSCH DM-RS elements in an uplink PRB as shown in Fig. 3.3. Let k and l denote the index of subcarriers and OFDM symbols, respectively. Then, we have E[yyH ] ≈ |R| 1 P H (l,k)∈R y(l, k)y(l, k) and 2 By letting H denote the compound channel and assuming that the distribution of transmit signal is i.i.d., G can be transformed to its classical form: G = E[sc yH ]E[yyH ]+ = HH (HHH + σ 2 I)−1 , where σ 2 is the normalized noise power. 68 E[sc yH ] ≈ |R| 1 P H (l,k)∈R sc (l, k)y(l, k) . Consequently, G can be approximately expressed as: hX ih X i+ G= sc (l, k)y(l, k)H y(l, k)y(l, k)H , (3.3) (l,k)∈R (l,k)∈R where sc (l, k), (l, k) ∈ R, is a PUSCH DM-RS element at the N C-UEs; and y(l, k), (l, k) ∈ R, is the corresponding received signal at the BS, which includes both PUSCH DM-RS element from the C-UEs and interfering signals from D-UE 1. We note that in (3.3), we replace the approx- imation sign (≈) with equation sign (=) for simplicity. We also note that, since G in (3.3) is an approximation of MMSE MIMO detector, we therefore term it approximate-MMSE MIMO detector. Performance Analysis. The approximate-MMSE MIMO detector does not require CSI for the signal detection. Instead, it uses the transmitted and received reference signals to compute the detection matrix. For this reason, the approximate-MMSE MIMO detector can decode desired signals in the presence of unknown interference. It is interesting to explore the performance of this approximate-MMSE MIMO detector in the cellular network. Let us assume that the signals in a PRB experience the same channel, i.e., channel coherence frequency is greater than 12 subcarriers (180 kHz) and channel coherence time is greater than 14 OFDM symbols (1 ms). Let us further assume that the noise is negligible (i.e., zero-noise). We have the following lemma: Lemma 3. If th BS is equipped with sufficient number of antennas then the approximate-MMSE MIMO detector at BS can perfectly decode the signals from the N C-UEs, i.e., ŝc (l, k) = sc (l, k), ∀l, k. Proof. We denote H(k) as the compound channel between the BS and all the transmitting UEs h i over subcarrier k, i.e., H(k)= Hc (k) Hd1 (k) ; we also denote S(l, k) as the compound transmit 69 h iT signals at all C-UEs and D-UE 1, i.e., S(l, k) = sc (l, k) sd1 (l, k) . Then, we can re-write (3.2) over subcarrier k and OFDM symbol l as: Y(l, k) = H(k)S(l, k). (3.4) As the auto-correlation matrix of the compound transmit signals, we have     (a)  Rc 0  (b)  I 0 RS = E(SSH ) =  = (3.5)   0 Rd 0 Rd where RS , Rc , and Rd are the auto-correlation matrix of the compound transmit signals, auto- correlation matrix of C-UEs’ transmit signals, and auto-correlation matrix of D-UE 1 transmit signals, respectively. Equality (a) follows from the fact that the transmit signal from C-UEs are independent of the transmit signals from D-UE 1. Also, (b) follows from the fact that transmit signals from C-UEs are independent too. Based on (3.3), (3.4), and (3.5), we obtain the approximate-MMSE MIMO detector G(k) over subcarrier k as follows: hX i+ h X i ′ ′ H ′ ′ H G(k) = Y(l, k )Y(l, k ) Y(l, k )sc (l, k ) (l,k ′ )∈R (l,k ′ )∈R +  = E Y(l, k)Y(l, k)H E Y(l, k)sc (l, k)H   +  = H(k)RS H(k)H H(k)I′ ,   (3.6) where I′ is a matrix which its entries on the diameter are one and other entries are zero. Then, we 70 have ŝc (l, k) = G(k)∗ Y(l, k) n H +  ′ oH = H(k)RS H(k) H(k)I H(k)S(l, k) = sc (l, k), ∀l, k. (3.7) This means that the approximate-MMSE MIMO detector G(k) is capable of perfectly recovering the original signal over subcarrier k and OFDM symbol l in a noise-free environment. This lemma shows the superior performance of approximate-MMSE MIMO detector in ideal scenarios (frequency-flat channel, sufficiently large channel coherence time, and zero-noise regime). For its performance in non-ideal scenarios, we resort to experimentation. Our experimental results will show that the approximate-MMSE MIMO detector yields a good performance in real network scenarios. 3.5.3 Downlink Beamforming at BS Beamforming Matrix Design. We now consider the beamforming for downlink MU-MIMO as shown in Fig. 3.4(b). Based on the network information theory, if a network can send N data streams in the uplink, it can also send N data streams in the downlink. This principle inspires us in the design of beamforming matrix. Our beamforming method is simple – we use the detection matrix derived in the uplink as the beamforming matrix in the downlink. Let z(l, k) ∈ CN ×1 denote the vector of signals in OFDM symbol l on subcarrier k that the BS wants to send towards N C-UEs. Let x(l, k) ∈ CMbs ×1 denote the vector of precoded signals in OFDM symbol l on subcarrier k that the BS sends to its Mbs antenna ports. Then, the beamforming operation can be expressed as: x(l, k) = αGT z(l, k), ∀l, k, where G is obtained in the uplink and α ∈ R is a 71 scaling factor to meet the requirement of the BS’s transmit power. In Lemma 3, we showed that the G matrix can perfectly recover the desired signals at the BS in the uplink. If the uplink and downlink channels reciprocity is maintained, it is evident that the C-UEs can also perfectly recover their respective signals in the downlink. Moreover, the BS can perfectly pre-cancel the interference for D-UE 1, which is a receiver in this time period (see Fig. 3.4(b)). For the beamforming method in non-ideal scenarios, we leave its performance evaluation to our experimental results in Section 3.7. Channel Calibration. The proposed beamforming method relies on the channel reciprocity. For its deployment in real systems, relative channel calibration at the BS can be implemented to maintain the channel reciprocity. In our experiments, the relative calibration method in [150] was implemented at the BS as a part of beamforming implementation. 3.5.4 Discussions on Its Limitations Two remarks on this MU-MIMO method are in order. First, channel coherence time plays a critical role in the proposed MU-MIMO method. Suppose that both uplink and downlink occupy one sub- frame (1 ms). Then, the required channel coherence time should be longer than 1 ms. This is a mild requirement in real wireless environments. Second, the performance of the proposed MU-MIMO method is dependent on the number of reference signals in an uplink PRB. Per our experiments, when a device has Nant antennas, R needs to be selected such that |R| ≥ 2Nant . In this case, the average EVM gap between approximate-MMSE and ideal MMSE detectors is less than 3 dB. As such, D2D and MU-MIMO subsystems individually set an appropriate R and PUSCH DM-RS pattern to meet their own needs. For instance, R may embrace more than one PRB or PUSCH DM- RS pattern may entail dense distribution of reference signals to meet the requirements of D-UEs and the BS. 72 3.6 D2D Communication In this section, we focus on the D2D communication. As the interference related to D-UE 1 has been tamed by the BS, we now focus on the interference related to D-UE 2. Specifically, we design a D2D communication scheme such that D-UE 2 can properly handle its related interference in both uplink and downlink. A proper D2D scheme has to address the following two questions: For the uplink shown in Figure 3.4(a), how can D-UE 2 decode its intended signals in the presence of interference from C-UEs? For the downlink in Figure 3.4(b), how can D-UE 2 send its signal to D-UE 1 while pre-canceling its generated interference for C-UEs? In what follows, we present our solutions to these questions. 3.6.1 Signal Detection at D-UE 2 Referring to D2D forward transmissions in Figure 3.4(a), we follow the same approach presented in Section 3.5.2 for D-UE 2 to decode its signals in the presence of interference from C-UEs. Specifically, D-UE 2 first calculates a detection matrix using (3.3) and then uses the calculated detection matrix to filter out the interference from C-UEs and equalize the channel distortion for signal recovery. The remaining question is what frame structure should be used for the D2D transmission. Actually, the frame structure for D2D transmission is flexible. As we will show in our experiments, the frame structure of D2D communication can be the same as the MU-MIMO frame structure as shown in Figure 3.3; it also can be IEEE 802.11 frame structure (consisting of preamble and data parts [67]). 73 3.6.2 Beamforming at D-UE 2 We now consider the D2D backward transmissions in Figure 3.4(b). In this time period, D-UE 2 needs to perform beamforming to pre-cancel its interference for C-UEs. Our beamforming method takes advantage of the overheard interfering signals in the previous time period, as illustrated in Figure 3.2. By leveraging the overheard signals, D-UE 2 constructs a beamforming matrix for signal transmission. In what follows, we detail the construction of beamforming matrix at D-UE 2. Beamforming Matrix Design. Referring to Figure 3.2, in a short time period at the end of uplink MU-MIMO, D-UE 1 does not transmit signal and thus D-UE 2 receives only interfering signals from C-UEs. Let Yd ∈ CMd2 ×1 denote the received signals at D-UE 2 in this time period. Then, we have yd = Hdc sc + Wd , (3.8) where Hdc ∈ CMd2 ×N is the channel between C-UEs and D-UE 2; sc ∈ CN ×1 is the vector of transmit signals at the N C-UEs; and wd ∈ CMd2 ×1 is the noise vector at D-UE 2. Let Pd ∈ CMd2 ×d denote the precoding matrix at D-UE 2. Then, based on the received signal Yd , we construct Pd as: Pd = U(:, Md2 − d + 1 : Md2 ), (3.9) where U(:, n : m) is a submatrix of U, which is from U’s nth column to mth column. U is computed by [U D V] = svd(yd ydH ), (3.10) where D and V are redundant outputs, and svd (·) denotes singular value decomposition. Using (3.9) and (3.10), we compute a beamforming matrix Pd for each subcarrier in the OFDM symbols. 74 Then, the Pd is applied to the corresponding subcarrier for beamforming during D-UE 2’s signal transmission. Since the matrix Pd is computed using the uplink interfering signal, it necessitates channel reciprocity when using Pd as the beamforming matrix in the downlink. Therefore, channel calibration has to been done at D-UE 2 in order to pre-cancel its interference for C-UEs. Again, RF calibration can be used at D-UE 2 in the baseband signal processing domain to preserve channel reciprocity. Performance Analysis. We first study the performance of proposed beamforming scheme in an ideal network scenario. Let us assume that all the MIMO channels have full rank. Let us assume that the channel coherence time is sufficiently large (larger than the duration of downlink). Let us assume that the channel is perfectly calibrated at D-UE 2, i.e., the downlink and uplink channels are reciprocal. Let us further assume that the noise is negligible, and D-UE 2 has sufficient number of antennas, i.e., d + N ≤ Md2 . Then, we have the following lemma: Lemma 4. The constructed beamforming matrix Pd can completely pre-cancel the interference for the N C-UEs on every OFDM subcarrier. Proof. Referring to Fig. 3.2, D-UE 1 first remains silent for a while, and D-UE 2 merely receives interfering signals from N C-UEs. Then, D-UE 2 uses the overheard interference to design precod- ing filter for pre-cancelling its generated interference at C-UEs in backward transmission. The re- ceived interference can be written as: yd (k) = Hdc (k)sc (k), (3.11) where Hdc (k) denotes the compound channel between D-UE 2 and all the C-UEs over subcarrier 75 k. Then, we have (a) [1] [1] E[yd (k)yd (k)H ] = Hdc (k)Hdc (k)H , (3.12) where (a) follows from the fact that E[sc (k)sc (k)H ] = I as the N C-UEs send N independent data streams. Recall that N + d ≤ Md2 and consequently d ≤ Md2 − N . Based on the right hand side of (3.12), rank of E[yd (l, k)yd (l, k)H ] is at most N . The rank reduces when channel is correlated and rank deficient. Therefore, svd(yd (k)yd (k)H ) has at least d zero singular vectors. If ui denotes the ith left singular vector, based on (3.12), we have   Hdc (k)Hdc (k)H ui = 0, Md2 − d + 1 ≤ i ≤ Md2 . (3.13) If channel reciprocity is maintained with the aid of a channel calibration method, Hcd (k) = T Hdc (k) . Then, it is easy to show that Hcd (k)P = 0. Lemma 4 shows the superior performance of the proposed beamforming method. It is worth noting that, although the beamforming technique presented in Section 3.5 works for D-UE 2, we observed in experiments that the proposed Singular Value Decomposition (SVD)-based technique has superior performance in terms of interference leakage. In light ot this, the proposed technique is applied on D-UE 2 to preserve the performance of MU-MIMO subsystem. 3.7 Experimental Evaluation In this section, we build a prototype of DM-COM and evaluate its performance in a small network. 76 %6 %6 %6 ,QWHUIHUHQFH ,QDFWLYH &8( &8( '8( '8( &8( &8( '8( '8( &8( &8( '8( '8( D '0&20 E &HOOXODU080,02 & )XOO080,02 Figure 3.5: Experimental setup and comparison baselines. 3.7.1 Implementation and Experimental Setup Implementation. We have built a wireless network testbed that consists of a BS, two C-UEs, and two D-UEs as shown in Fig. 3.5(a). The BS has four antennas. The C-UEs has one antenna. D-UE 1 has one antenna. D-UE 2 has three antennas. The BS, C-UEs, and D-UEs are built using USRP N210 devices as the radio transceivers and general-purpose computers as baseband signal processors. We implement DM-COM on this testbed. The MU-MIMO subsystem is implemented using a custom-built 5G NR PHY, while the D2D subsystem is implemented using both NR-like and WiFi-like PHYs. The PHY parameters of DM-COM implementation are listed in Table 3.1. Based on these PHYs, we implement the MAC protocols for both MU-MIMO and D2D subsystems as shown in Fig. 3.2. For the MU-MIMO protocol, both uplink and downlink transmissions have the same duration. For the D2D protocol, the time duration of “listening at D-UE 2” is about 71.35 µs. Experimental Setup. Fig. 3.6 depicts the floor plan of our experimentations. The BS and C-UEs are always placed on the spots marked by blue and red colors, respectively. The distance between BS and cellular users is about 7 m. D-UE 1 and D-UE 2 are deployed over 50 random locations in Fig. 3.6. In each location, the distance between D-UEs is about 3 m. We use the indoor environments for ease of experimentation. Moreover, many small cells will be deployed in the buildings as mobile hotspot in the near future. 77 Table 3.1: The parameters of experimental network. MU-MIMO D2D D2D subsystem subsystem 1 subsystem 2 Standard NR-like NR-like WiFi-like Waveform OFDM OFDM OFDM FFT point 512 512 64 Valid subcarrier 300 300 52 Sample rate 5 Msps 5 Msps 5 Msps Symbol duration 71.35 µs 71.35 µs 16 µs Signal bandwidth 2.9 MHz 2.9 MHz 4 MHz Carrier frequency 2.48 GHz 2.48 GHz 2.48 GHz Transmit power ∼ 18 dBm ∼ 18 dBm ∼ 18 dBm P &8( %6 aP &8( P Figure 3.6: The floor plan of our experimentation. 3.7.2 Performance Metrics and Comparison Baselines Performance Metrics. We use two metrics to evaluate DM-COM. The first one is EVM, E[|Ŝ(l,k)−S(l,k)|2 ] which is defined as follows: EVM = 10 log10 ( ), where Ŝ(l, k) and S(l, k) are E[|S(l,k)|2 ] the estimated and original signals, respectively. EVM is widely used in both IEEE 802.11 standards [67] and 3GPP standards [1] to measure quality of decoded signals, define modulation Modulation and Coding Scheme (MCS), and estimate the achievable data rate as we see shortly. The second metric is the achievable data rate. Based on the measured EVM, we extrapolate the achievable data rate using the MCS defined in the 3GPP standard and IEEE 802.11ac standard as follows: r = 12 · N N+Nsc · b · γ (EVM), where coefficient 1/2 stems from halftime uplink fft cp 78 1 1 1 0 EVM= -22.3 dB 0 EVM= -25.0 dB 0 EVM= -19.7 dB -1 -1 -1 -1 0 1 -1 0 1 -1 0 1 (a) Decoded signal from C-UE 1 (b) Decoded signal from C-UE 2 (c) Decoded signal fromD-UE 1 at BS. at BS. at D-UE 2. Figure 3.7: Decoded (demodulated) signals in the MU-MIMO and D2D subsystems in the up- link/forward transmission.     (90 G%  (90 G%  (90 G%             (a) Decoded signal from BS at (b) Decoded signal from BS at (c) Decoded signal fromD-UE 2 C-UE 1. C-UE 2. at D-UE 1. Figure 3.8: Decoded (demodulated) signals in the MU-MIMO and D2D subsystems in the down- link/backward transmission. and halftime downlink transmissions in MU-MIMO. Nsc , Nfft , and Ncp denote number of used subcarriers, Fast Fourier Transform (FFT) points, and the length of cyclic prefix, respectively. b is the sampling rate in Msps. γ(EVM) is the spectral efficiency of transmission based on MCS selection defined in standards. Table 2.3 and Table 2.2 present γ (EVM) for NR-like and WiFi-like PHYs, respectively. Comparison Baselines. As shown in Fig. 3.5, we compare DM-COM with two existing schemes: Cellular-MU-MIMO and Full-MU-MIMO. In the Cellular-MU-MIMO, the BS serves the two C-UEs only, and the two D-UEs are deactivated. In the Full-MU-MIMO, the BS serves the two C-UEs while the two D-UEs communicate with each other with the aid of BS. Technically, 79 the BS simultaneously serves the four UEs in both uplink and downlink. 3.7.3 A Case Study of DM-COM We first use a case study to scrutinize DM-COM and its interference cancellation capability. In this case study, we place the two D-UEs at two spots marked by stars in the upper-left room in Fig. 3.6, and the D2D subsystem uses NR-like PHY for communications. Recall that DM-COM comprises two phases: uplink and downlink, as shown in Fig. 3.4. In what follows, we first examine the decoded signals in the two phases and then study the interference cancellation capability. Constellation, EVM, and Data Rate. Referring to Fig. 3.4(a), in the uplink, the BS demod- ulates the signals from the two C-UEs; at the same time, D-UE 2 demodulates the signal from D-UE 1. Fig. 3.7 exhibits the constellation of the demodulated signals at the BS and D-UE 2, as well as their EVMs. Based on the measured EVM, the uplink data rates of C-UE 1 and C-UE 2 are extrapolated to 6.2 Mbps and 7.6 Mbps, respectively. Meanwhile, the data rate of D-UE 2 is extrapolated to 4.5 Mbps. Referring to Fig. 3.4(b), in the downlink, the BS sends the data to the two C-UEs; at the same time, D-UE 2 sends data to D-UE 1. Fig. 3.8 presents the constellation of the demodulated signals at the two C-UEs and D-UE 1, as well as their EVMs. Based on the measured EVM, the down- link data rates of C-UE 1 and C-UE 2 are extrapolated to 5.3 Mbps and 6.2 Mbps, respectively. Meanwhile, the data rate of D-UE 1 is extrapolated to 4.6 Mbps. Beamforming Capability. Referring to Fig. 3.4(b), in the downlink, we examine the effec- tiveness of beamforming at the two transmitters (BS and D-UE 2) . To do so, we measure the interference at the receiving nodes (C-UE 1, C-UE 2, and D-UE 1) in two cases: with and without beamforming. For the case without beamforming, we use precoder [1/2, 1/2, 1/2, 1/2] at the BS √ √ √ and [1/ 3, 1/ 3, 1/ 3] at D-UE 2. Fig. 3.9 presents the measured the interference at the receiv- 80 Relative power (dB) Relative power (dB) Relative power (dB) -80 -80 -80 -100 w/ precoding -100 w/ precoding -100 w/ precoding w/o precoding w/o precoding w/o precoding -120 -120 -120 -2 0 2 -2 0 2 -2 0 2 Frequency (MHz) Frequency (MHz) Frequency (MHz) (a) Received interference at C- (b) Received interference at C- (c) Received interference at D- UE 1. UE 2. UE 1. Figure 3.9: Received interference at the receiving nodes in the downlink with and without beam- forming at the transmitters. ing nodes. It is evident that the proposed beamforming methods can very effectively pre-cancel the interference. Specifically, both beamforming methods achieve at least 28.5 dB interference cancellation capability. Thanks to the effective beamforming methods, both MU-MIMO and D2D subsystems can achieve superior performance in the downlink, as shown in Fig. 3.8. 3.7.4 DM-COM vs. Cellular-MU-MIMO and Full-MU-MIMO By the same token in the case study, we now study the performance of DM-COM by placing the two D-UEs at 50 different locations as shown in Fig. 3.6. In this study, we use Cellular-MU-MIMO and Full-MU-MIMO as the comparison baselines (see Fig. 3.5). EVM Distribution. Fig. 3.10 presents the distribution of measured EVM when the three schemes are used. Specifically, Fig. 3.10(a) presents the measured EVM of demodulated signals at the BS in the uplink MU-MIMO when DM-COM, Cellular-MU-MIMO, and Full-MU-MIMO are respectively used. Particularly, we considered two cases for DM-COM: (i) D2D subsystem uses NR-like PHY and (ii) D2D subsystem uses WiFi-like PHY. From the figure, we can see that DM-COM achieves −26.1 dB EVM on average, no matter which PHY (5G NR or WiFi) is used for D2D communications. In contrast, Cellular-MU-MIMO achieves about −27.6 dB EVM on aver- age, and Full-MU-MIMO achieves −20.1 dB EVM on average. The EVM gap between DM-COM 81       &') &')   &HOOXODU080,02 &HOOXODU080,02  '0&20 15''  '0&20 15'' '0&20 :L)L'' '0&20 :L)L'' )XOO080,02 )XOO080,02           (90 G% (90 G% (a) EVM in uplink MU-MIMO. (b) EVM in downlink MU-MIMO.       &') &')   '0&20 15''  '0&20 :L)L''  '0&20 15''  '0&20 :L)L'' )XOO080,02 )XOO080,02           (90 G% (90 G% (c) EVM in forward D2D. (d) EVM in backward D2D. Figure 3.10: EVM distribution of demodulated signals when DM-COM, Cellular-MU-MIMO, and Full-MU-MIMO are used. and Cellular-MU-MIMO is only 1.5 dB. This means that, in DM-COM, the EVM degradation at the BS caused by the interference from D2D subsystem is only 1.5 dB. Fig. 3.10(b) presents the measured EVM of the demodulated signals at the two C-UEs in the downlink MU-MIMO. It shows that DM-COM achieves an average of −24.3 dB EVM in the downlink MU-MIMO. The EVM gap between DM-COM and Cellular-MU-MIMO is about 1.9 dB. This means that, in DM-COM, the EVM degradation at C-UEs caused by the interference from D2D subsystem is only 1.9 dB. Fig. 3.10(c) and (d) present the measured EVM in forward and backward D2D transmissions when DM-COM and Full-MU-MIMO are used. Note that Cellular-MU-MIMO does not sup- port D2D communication, and thus these two figures do not include the results from Cellular- MU-MIMO. On average, DM-COM achieves −22.1 dB EVM for forward D2D transmission and 82   &HOOXODU080,02 '0&20 15''  '0&20 :L)L''  )XOO080,02 &HOOXODU080,02 '0&20 15''   &') &') '0&20 :L)L'' )XOO080,02                   7KURXJKSXW 0ESV 7KURXJKSXW 0ESV (a) Per-UE throughput in uplink MU-MIMO. (b) Per-UE throughput in downlink MU-MIMO.     '0&20 15'' '0&20 :L)L''   &') &') )XOO080,02   '0&20 15''    '0&20 :L)L'' )XOO080,02               7KURXJKSXW 0ESV 7KURXJKSXW 0ESV (c) D2D forward throughput. (d) D2D backward throughput. Figure 3.11: Distribution of extrapolated throughput when DM-COM, Cellular-MU-MIMO, and Full-MU-MIMO are used. −19.2 dB EVM for backward D2D transmission, no matter which PHY (NR or WiFi) is used for D2D subsystem. In contrast, Full-MU-MIMO achieves −15.6 dB EVM for forward D2D trans- mission and −13.2 dB EVM for backward D2D transmission. This means that DM-COM outper- forms Full-MU-MIMO by 6.5 dB in forward D2D communication and 6.0 dB in backward D2D communication. Per-UE Throughput Distribution. We extrapolate per-UE throughput (dat rate) based on the measured EVM. Fig. 3.11 presents the results. The staircase shape of the curves stems from the MCS selection, which yields discrete data rate region in nature. On average, DM-COM achieves 6.7 Mbps per-UE throughput in uplink MU-MIMO and 6.1 Mbps per-UE throughput in downlink MU-MIMO. At the same time, it achieves 5.4 Mbps for forward D2D transmission and 4.2 Mbps for backward D2D transmission. 83 40 15 Throughput (Mbps) Throughput (Mbps) 30 10 20 5 10 0 0 Cellular- DM-COM DM-COM Full- Cellular- DM-COM DM-COM Full- MU-MIMO (NR D2D) (WiFi D2D) MU-MIMO MU-MIMO (NR D2D) (WiFi D2D) MU-MIMO (a) Total throughput of MU-MIMO. (b) Total throughput of D2D. Figure 3.12: Total throughput of the MU-MIMO and D2D subsystems when DM-COM, Cellular- MU-MIMO and Full-MU-MIMO are respectively used. 3.7.5 Summary of Observations Fig. 3.12 presents the total throughput of MU-MIMO and D2D subsystems when DM-COM, Cellular-MU-MIMO, and Full-MU-MIMO are used. The total throughput of MU-MIMO is the summation of its uplink and downlink data rates. The total throughput of D2D is the summation of its backward and forward data rates. The total throughput are averaged over the 50 different locations in Fig. 3.6. MU-MIMO subsystem. Fig. 3.12(a) shows that DM-COM achieves 25.3 Mbps throughput for MU-MIMO subsystem when using 5G NR PHY for D2D and 25.7 Mbps throughput when using WiFi PHY for D2D. In contrast, Cellular-MU-MIMO achieves 27.8 Mbps throughput for MU-MIMO. This means that, in DM-COM, the throughput degradation of C-UEs caused by the interference from D-UEs is only 8%. Full-MU-MIMO achieves 16.6 Mbps, which is much less than DM-COM. D2D subsystem. Fig. 3.12(b) shows that DM-COM achieves 9.2 Mbps throughput for D2D subsystem when using 5G NR PHY and 10.1 Mbps throughput when using WiFi PHY. Recall that the system bandwidth is 5 MHz. This means that DM-COM achieves more than 1.9 bit/s/Hz spec- tral efficiency for D2D communication. In contrast, Full-MU-MIMO achieves 5.3 Mbps through- 84 put for D2D. This means that DM-COM outperforms Full-MU-MIMO by 82%. This can be partially attributed to the two-hop D2D communication in Full-MU-MIMO. 3.8 Chapter Summary In this chapter, we presented DM-COM, a practical scheme that combines D2D and MU-MIMO technologies to advance cellular networks. The main challenge in DM-COM is managing the interference between D2D and MU-MIMO subsystems. DM-COM takes the advantage of mul- tiple antennas on the network devices to cancel the interference and recover the desired signals, without requiring channel state information and fine-grained inter-system synchronization. This was achieved through the design of practical yet effective multi-user detection and beamforming methods. We have built a prototype of DM-COM on a custom-built wireless testbed and com- pared its performance with two existing schemes. Our experimental results show that DM-COM achieves 1.9 bit/s/Hz spectral efficiency for D2D users. Moreover, the throughput degradation of MU-MIMO users due to the spectrum utilization of D2D users is less than 8%. 85 Chapter 4 Non-Orthogonal Multiple Access for WLANs 4.1 Introduction Multiple access is a crucial mechanism for wireless network infrastructure to serve multiple users. OMA techniques (e.g., TDMA and Frequency Division Multiple Access (FDMA)), albeit easy to implement, are incapable of approaching network capacity limit due to their exclusivity in resource allocation. This issue becomes particularly acute for networks with strict user fairness require- ments. NOMA has recently emerged as a new multiple access paradigm for infrastructure-based wireless networks. Since its inception, NOMA has attracted a large amount of research attention and has been widely regarded as a promising candidate for Radio Access Technologies (RAT) for 5G networks and beyond. In contrast to OMA, NOMA allows multiple users to utilize the same spectrum band for signal transmissions at the same time and, therefore, offers many advantages such as improving spectral efficiency, enhancing resource allocation flexibility, reducing schedul- ing latency, increasing cell-edge throughput, and enabling massive connectivity. Recognizing its great potentials, power-domain NOMA has been studied in a variety of network settings in an increasingly sophisticated form, such as power allocation in Single-Input Single- Output (SISO) networks [44, 48, 207], precoder design in Multi-Input Single-Output (MISO) net- works [7, 32, 55, 209], and privacy protection [25, 108, 208]. Although a considerable amount of research efforts have been made on the study of NOMA, most of them are limited to theoretical 86 exploration and performance analysis in cellular networks. Very limited progress has been made so far in the development of practical NOMA schemes and experimental validation of NOMA in real wireless network settings. This stagnation reflects the challenges in the design of practical NOMA schemes and the engineering issues related to their implementations, such as channel acquisition and precoding on the transmitter side and SIC realization on the receiver side. In this chapter, we aim to make a concrete step forward to bridge this gap by proposing a prac- tical downlink NOMA scheme for WLANs and evaluating its performance on a wireless testbed. We consider an AP that has one or multiple antennas and a set of widely distributed users that have one antenna each. In such a network setting, we first examine the precoder design problem at the AP for downlink NOMA transmissions. We formulate the precoder design problem as an optimiza- tion problem, which inevitably includes non-convex constraints due to the intrinsic complication of the problem. To solve this problem, we employ a Minorization-Majorization (MM) approach for convexification of constraints. Based on the convexification results, we further develop an iterative algorithm to solve the precoder design problem. Based on the solution to the precoder design problem, we develop a downlink NOMA scheme to enable concurrent data transmissions from an AP to multiple users. Our NOMA scheme features a lightweight user grouping strategy and a new SIC method. Specifically, on the transmitter (AP) side, we develop a heuristic algorithm to group the users for downlink NOMA transmission; on the receiver (user) side, we propose a robust SIC algorithm for interference subtraction and signal detection. In contrast to existing SIC methods [167], which first estimate the channels and then use the estimated channels to decode the signal/interference sequentially, our proposed SIC method does not require channel knowledge for interference subtraction and signal detection. Instead, it directly uses the reference signals (the precoded preamble in a frame) to compute the detection filters, which are used for interference subtraction and signal detection. As channel estimation is 87 vulnerable to interference, the removal of channel estimation in the SIC procedure improves the performance and reliability of signal detection in our NOMA scheme. We have built a prototype of the proposed NOMA scheme on a GNURadio-USRP2 wireless testbed using IEEE 802.11 legacy parameters and conducted extensive experiments in indoor office wireless environments to evaluate its performance in comparison with a conventional TMDA-based OMA scheme. We consider the following three network settings: (i) the AP has one antenna and it serves two users; (ii) the AP has two antennas and it serves two users; and (iii) the AP has two antennas and it serves three users. Our experimental results show that, compared to OMA, the proposed NOMA scheme can significantly improve the data rate of the weak user and considerably improve the weighted sum rate of all users. Specifically, for the cases that we have examined, the average improvement of data rate of the weak users is about 93.1%, and the average improvement of weighted sum rate of all users is about 36.1%. Moreover, our experimental results show that, on average over all the cases that we have considered, our proposed SIC method outperforms the conventional SIC method (least-squares channel estimation and zero-forcing signal detection) by 13.4% for the AP’s weighted sum rate and 39.6% for the data rate of users performing SIC. 4.2 Related work Since its inception, NOMA has been studied in an increasingly sophisticated form for cellular net- works. Given that this work studies power-domain NOMA for the downlink of wireless networks, we focus our literature review on this specific area. Power Allocation for NOMA. Power allocation for NOMA has been well studied in cellular networks where each node has a single antenna. These research efforts mainly focus on the power allocation strategies for NOMA under different performance considerations, such as user fairness 88 [162, 188], outage probability [39, 196], and achievable throughput [129, 199]. This research line was then expanded to joint optimization of power allocation and subcarrier assignment for NOMA in OFDMA networks. These research efforts have produced many results, such as maximizing sum rate subject to the power constraints [44, 48, 207], minimizing the power consumption subject to SIC and rate requirements [99], and developing tractable algorithms [106]. Precoder Design for NOMA. When the base station (BS) has multiple antennas, the power allocation problem in downlink NOMA is escalated to precoder design problem as the power allo- cation and beam steering operations are tightly coupled. The precoder design at the BS needs to jointly optimize NOMA’s power allocation and Multi-Input Multi-Output (MIMO)’s beam steer- ing. In the literature, precoder design has been studied toward different objectives, such as maxi- mizing network throughput [55, 209], maximizing transmitter’s energy efficiency [7, 32], and pre- serving users’ signal privacy [25, 108, 208]. In what follows, we discuss the papers that are mostly relevant to our work. In [55], the precoder design problem has been studied for NOMA to maximize sum rate, sub- ject to the SINR constraints in the SIC process at all the users. A non-convex optimization problem was formulated and an iterative algorithm was developed to pursue a feasible solution. In [209], the precoder design problem has been studied to maximize the sum rate of a sophisticated hybrid network where an unmanned aerial vehicle and a BS serve a set of ground users. Precoder de- sign aimed to nullify the cross-network interference or maintain the interference below a certain threshold. The precoder design problem has also been studied for security enhancement and privacy preservation for power-domain NOMA. In [210], NOMA was studied under eavesdropping at- tacks, and artificial jamming approach was studied to combat the attacks. A non-convex optimiza- tion problem was formulated to maximize the artificial jamming power and, similar to [55], an 89 iterative algorithm was developed to solve the problem. In [25], precoders were designed to ensure the privacy of a particular user. Specifically, precoders were designed to ensure that the private user’s signal is of the weakest strength at all the users except itself. By doing so, none of the users are capable of decoding the private user’s message. As we shall see, our mathematical formulation of the precoder design problem is different from those existing ones. It features practical considerations in the design of precoders for downlink NOMA. User Grouping for NOMA. User grouping is another key component of NOMA. In [37], the impact of user grouping on the performance of NOMA was studied. It shows that the throughput gain of NOMA (over OMA) becomes more significant when the channel strengths of the users in a group increases. However, reaching the optimal user grouping solution demands an exhaustive search. In [104] and [205], it was shown that the computational complexity of exhaustive-search- based user grouping algorithm can be relaxed by pruning the search space. Greedy grouping algorithms (e.g., [36]) and matching-based grouping algorithms (e.g., [100]) were proposed to reach a near-optimal solution. To further reduce the computational complexity, [38] proposed a random grouping algorithm, which needs a very low computation. However, this random algorithm cannot fully exploit the throughput gain of NOMA. The user grouping algorithm in our work is a lightweight heuristic algorithm, and it is amenable to practical implementation. Experimental Validation of NOMA. While there is a large body of theoretical work on NOMA, experimental validation of NOMA in real wireless environments remains limited. Some pioneering work can be found in [18–21]. Our work differs from these research efforts in the following two aspects. First, these research efforts study NOMA in cellular networks, while our work focuses on NOMA for WLANs. Cellular networks and WLANs have significant differences in many aspects, including frame format, transmission pattern, transmit power, and receiver sensi- 90 « $3  67$N 67$ 67$ Figure 4.1: Downlink data transmission in a WLAN. tivity. The results of NOMA in cellular networks cannot be directly applied to WLANs. Second, existing experimental efforts primarily investigated the gain of NOMA over OMA with respect to different system parameters and did not take into account precoder design for the performance op- timization of NOMA. Our work considers both precoder optimization and NOMA implementation in WLANs. 4.3 Problem Description We consider a WLAN as shown in Fig. 4.1, which comprises an AP and a set of user devices (a.k.a. stations, STAs, or users for simplicity). The AP has one or more antennas, and each station has a single antenna. Denote M as the number of antennas on the AP. Denote N as the set of stations, with N being its cardinality (N = |N |). In this network, we assume that the signal from the AP to the stations experiences significantly different path losses. That is, the signal received by STA i is much stronger than the signal received by STA i − 1, for 2 ≤ i ≤ N . This assumption can be fulfilled through a user selection/scheduling algorithm at the upper layer. A Premier of NOMA. In power domain, NOMA takes advantage of the power difference between the interference and desired signals to mitigate interference and decode the desired signal at a receiver. SIC is typically used at the receivers for interference mitigation and signal decod- ing [167]. To illustrate the original concept behind power-domain NOMA, let us consider the 91 67$ VGHVLUHGVLJQDO S 67$ VGHVLUHGVLJQDO S 6LJQDOVWUHQJWK 67$ VGHVLUHGVLJQDO S 1RLVH 67$ 67$ 67$ Figure 4.2: Illustration of NOMA in downlink data transmission in a WLAN for N = 3. network in Fig. 4.2 as an example. In this network, a single-antenna AP serves three stations standing far from each other. The AP sends superimposition of three signals to all stations: signal s1 for STA 1, signal s2 for STA 2, and signal s3 for STA 3, with a proper power allocation for these signals. At the stations, the received signals have significantly different strengths as illustrated in Fig. 4.2. The difference in signal strengths makes it possible for the stations to perform SIC. At STA 1, since the undesired signals s2 and s3 are relatively weak due to the large path loss, the desired signal s1 can be easily decoded by treating interference (s2 and s3 ) as noise. At STA 2, the strongest undesired signal s1 can be first decoded and subtracted from what is received. For the resulting signals, the desired signal s2 can be easily decoded by treating s3 as noise. At STA 3, the strong undesired signal s1 and s2 can be first decoded and removed successively. After that, the desired signal s3 can be decoded in a conventional way. As shown in this example, NOMA can enable concurrent data transmissions for STA 2 and STA 3 without causing much performance degradation for STA 1. Such a multi-user communica- tion approach provides the AP with a new level of flexibility for its resource allocation and user scheduling, which can be leveraged for network throughput maximization. 92 Design Objectives. We consider the WLAN as shown in Fig. 4.1. We aim at developing a practical NOMA scheme to maximize the weighted sum rate of the users. For fairness, weak users have larger weights while strong users have relatively small weights. To do so, several questions remain open and needed to be addressed: (i) How to design the precoder for each data stream at the AP is a non-trivial task. When the AP has one antenna, the precoders degrade to complex co- efficients, which represent the power allocation at the AP. When the AP has multiple antennas, the precoders determine not only the power allocation but also beam steering at the AP. The design of the optimal precoders at the AP involves both Signal-to-Interference-and-Noise Ratio (SINR) and Interference-to-Signal-and-Noise Ratio (ISNR) constraints at each station, making it challenging to reach the optimality. It is noteworthy that we design a unique precoder for each individual data stream in order to pursue performance optimality. Compared to the approach that separates the power allocation and beam-steering design, our joint design promises better possible performance, especially for the networks with a small number of antennas and a small number of users. (ii) How to design a practical scheme to enable downlink NOMA in WLANs is another challenging prob- lem. To support downlink NOMA, the AP needs the knowledge of CSI to compute the precoders; each station needs to perform signal detection in the face of inter-user interference. All these tasks require a sophisticated design of protocols and algorithms that are amenable to practical imple- mentation. 4.4 Precoder Design for Downlink NOMA In this section, we first formulate the precoder design problem in downlink NOMA transmission of WLANs. Then, we convexify the non-convex constraints and propose an iterative algorithm to pursue a feasible solution. Finally, we offer discussions on the proposed algorithm. 93 4.4.1 Mathematical Formulation Consider the downlink data transmission in the WLAN shown in Fig. 4.1. Denote hi ∈ C1×M as the channel from theAP to STA i, which includes the effects of path loss, shadow fading, and fast fading. Owing to the large difference in path losses, we assume that ∥h1 ∥ ≤ ∥h2 ∥ ≤ . . . ≤ ∥hN ∥. At theAP, denote si as the signal intended for STA i, with E(|si |2 ) = 1; denote vi ∈ CM ×1 as the precoding vector of this signal. The transmit signals at theAP, which is denoted by x, can be P written as x = j∈N vj sj . Then, the received signal at STA i ∈ N can be written as: XN yi = hi v j s j + ni , i∈N . (4.1) j=1 where ni ∼ CN (0, σi2 ) is additive white Gaussian noise. Transmit Power Constraint. In practice, the transmit power of theAP is bounded by its maximum power budget, which we denote as Pap . This constraint can be written as: N ∥vi ∥2 ≤ Pap . X (4.2) i=1 SIC and SINR Constraints. At STA i ∈ {2, 3, · · · , N }, we employ SIC to mitigate the strong interference [s1 , s2 , · · · , si−1 ] and decode the desired signal si by treating interference [si+1 , si+2 , · · · , sN ] as noise. Specifically, we first decode the undesired signal s1 by treating signals [s2 , s3 , · · · , sN ] as noise. Based on the estimated signal ŝ1 , we can remove the effect of [1] [1] undesired signal s1 and the resulting signal can be written as yi = hi N P j=2 vj sj +ni , where yi denotes the remaining signal after the first iteration of SIC. By the same token, we can continue to remove undesired signals [s2 , s3 , · · · , si−1 ] sequentially. After removing the undesired signals, 94 Table 4.1: MCS specification in IEEE 802.11ac [67]. SINR (dB) (inf -5) [-5 -10) [-10 -13) [-13 -16) [-16 -19) [-19 -22) [-22 -25) [-25 -27) [-27 -30) [-30 -32) [-32 -inf) Modulation N/A BPSK QPSK QPSK 16QAM 16QAM 64QAM 64QAM 64QAM 256QAM 256QAM Coding rate N/A 1/2 1/2 3/4 1/2 3/4 2/3 3/4 5/6 3/4 5/6 η) 0 0.5 1 1.5 2 3 4 4.5 5 6 20/3 ai 0.079 0.073 0.050 0.025 0.018 0.012 0.004 0.002 0.001 0.001 0 bi 0 0.018 0.247 0.747 0.996 1.495 2.746 3.395 3.996 4.075 6.666 we can decode the intended signal si by treating [si+1 , si+2 , · · · , sN ] as noise. Suppose that the SIC procedure is ideal. By denoting SINRi,j as the SINR in the jth iteration of SIC at STA i, we have 2 hi vj SINRi,j = PN 2 , i ∈ N , 1 ≤ j ≤ i. (4.3) k=j+1 |hi vk | + σi2 By defining γi,j as a non-negative variable less than or equal to SINRi,j , we have 2 hi vj γi,j ≤ PN 2 , i ∈ N , 1 ≤ j ≤ i. (4.4) 2 k=j+1 |hi vk | + σi Data Rate Constraints. In the SIC procedure, STA i needs to decode signals [s1 , s2 , · · · , si ] sequentially. When decoding signal sj (1 ≤ j ≤ i), we known that its SINR is greater than or equal to γi,j . To ensure that STA i can successfully decode sj , the data rate of signal sj is determined by this SINR value. Theoretically, the relationship between the maximum achievable data rate and the given SINR is governed by Shannon capacity. Denote rj as the data rate from the AP to STA j in 1 Hz. Then, the achievable data rate constraints can be expressed as: rj ≤ log2 (1 + γi,j ), i ∈ N , 1 ≤ j ≤ i. (4.5) However, Shannon capacity is far from being reached by current WLANs’ technologies. It is highly inaccurate to characterize the relationship between the achievable data rate and the SINR in WLANs. In real wireless systems, adaptive MCS is typically used to adjust the data rate based on the SINR value. Table 4.1 lists the MCS selection criteria that are specified in IEEE 802.11ac 95 standard [67]. Fig. 4.3 shows the gap between Shannon capacity and the data rate achieved by the adaptive MCS approach. It is evident that the gap is large. This indicates that Shannon ca- pacity is not a good formula to compute the achievable data rate in real WLANs. To enhance the practicality of our results, we employ the adaptive MCS approach to calculate the achievable data rate for a given SINR value. However, their relation is expressed as a staircase function, which is non-convex. To ease our optimization problem, we approximate this non-convex region by the following linear constraints: rj ≤ ak γi,j + bk , i ∈ N , 1 ≤ j ≤ i, 1 ≤ k ≤ 11, (4.6) where ak and bk are constants given in Table 4.1. We can see from Fig. 4.3 that, compared to (4.5), (4.6) is much more accurate to compute MCS-based achievable data rate in real WLANs. It is worth pointing out that the values of ak and bk in Table 4.1 were derived from the MCS specified in IEEE 802.11ac. If we want to apply this method to other networks such as IEEE 802.11ax, the values of ak and bk should be updated according to the MCS specified in corresponding standards. Optimization Formulation. Based on the above constraints, we can formulate the NOMA problem as an optimization problem. Here, we consider the weighted sum rate as the objective function. Other objective functions (e.g., maximizing the minimum data rate) can be formulated in the same way. Denote wj as the given weight for STA j ∈ N . These weights are used to prioritize the service for the STAs and maintain the fairness among the STAs. Generally speaking, a STA with strong channel should be given a small weight, while a STA with weak channel should be given a large weight. Suppose that the STAs’ weights are pre-defined. Then, the objective function P can be written as: j∈N wj rj . Then, the optimization problem, which we denote as OPT-NOMA, can be written as: 96 12 Shannon capacity MCS-based approach 10 8  =  , +   Data rate (Mb/s/Hz) 6 4 2 Approximated Region 0 0 500 1000 1500 2000 SINR Figure 4.3: The gap between Shannon capacity and the data rate achieved by MCS-based approach. X max w j rj (4.7a) j∈N s.t. rj ≤ ak γi,j + bk , i ∈ N , 1 ≤ j ≤ i, 1 ≤ k ≤ 11; (4.7b) 2 hi vj γi,j ≤ , i ∈ N , 1 ≤ j ≤ i; (4.7c) N 2 + σi2 P |hi vk | k=j+1 ∥vi ∥2 ≤ Pap ; X (4.7d) i∈N where ri , γi,j , and vi are optimization variables; wj , ak , bk , hi , and σi , and Pap are given param- eters. Note that we changed the equality in (4.4) to the inequality in (4.7c). It can be verified that this operation does not alter the optimal value. Compared to many existing optimization formulations of NOMA (see, e.g., [55, 210]), one may notice that our formulation does not have explicit constraints to represent the decoding order 97 requirements of SIC. Actually, constraints (4.7b) and (4.7c) in our formulation can ensure the success of SIC at every station. Consider STA i for example. Signal si is its desired signal and sj (1 ≤ j < i) is interference that should be removed. The combination of (4.7b) and (4.7c) ensures that interference sj (1 ≤ j < i) can be decoded and subtracted. It also ensures that the desired signal si can be decoded after subtracting interference sj (1 ≤ j < i). As such, our formulation can ensure the success of SIC, albeit without explicit constraints to enforce the decoding order requirements of SIC. Moreover, our formulation is in a simpler format and relatively easier to solve compared to the existing ones. OPT-NOMA is a non-convex problem, which is NP-hard in general. There is no efficient algorithm that can find its optimal solution in polynomial time. In the rest of this section, we delve into the development of a tractable approach to pursue a suboptimal solution to OPT-NOMA via disciplinary convexification. 4.4.2 Constraint Relaxation via Disciplinary Convexification In OPT-NOMA, (4.7a) and (4.7b) are linear and easy to handle by optimization solvers; however, (4.7c) and (4.7d) are not. In what follows, we focus on these two nonlinear constraints. Constraint (4.7d). This constraint is convex, but in an indisciplined form. To transform it to a disciplined convex constraint, we rewrite it as a Lorentz cone [23, Ch. 2]: h i v1T , v2T , . . . , vN T p ≤ Pap . (4.8) N (N +1) Constraint (4.7c). This constraint generates a set of 2 non-convex inequations and needs to be convexified into a disciplined form. To convexify (4.7c), we introduce an auxiliary PN 2 2 variable zi,j and define zi,j ≥ k=j+1 |hi vk | + σi . Then, (4.7c) can be equivalently broken into 98 the following two sets of constraints: 2 γi,j zi,j ≤ hi vj , i ∈ N , 1 ≤ j ≤ i; (4.9a) N |hi vk |2 ≤ zi,j − σi2 , X i ∈ N , 1 ≤ j ≤ i. (4.9b) k=j+1 To convexify (4.9), we first focus on (4.9a) and then on (4.9b). For (4.9a), it is a non-convex constraint because it has a quadratic term on its right-hand side (RHS). To untangle this problem, we employ tangent point and Taylor expansion to approximate the quadratic term with an appro- priate affine [66, 210]. To illustrate this idea, let us consider a differentiable convex function f (v) for example. At any feasible point, say ṽ, a tangent function g(v, ṽ) can be defined such that f (v) ≥ g(v, ṽ), and the equality holds at v = ṽ. The tangent function g(v, ṽ) is a minorant of f (v), and the solution to the approximated problem using tangent point ṽ will majorize the mi- norant [66]. To further make this constraint disciplinary, the first-order Taylor expansion of f (v) can be used as the tangent function since it removes the high-order nondisciplinary components of f (v). Using the first-order Taylor expansion, the tangent function can be written as: g(v, ṽ) = f (ṽ) + ∇f (ṽ)H (v − ṽ). (4.10) 2 We apply this idea to the RHS of (4.9a). If fi (vj ) = hi vj is defined, then we have ∇fi (vj ) = 2hi hH i vj . The tangent function at ṽj can be written as: gi (vj , ṽj ) = fi (ṽj ) + ∇fi (ṽj )H (vj − ṽj ) = hi ṽj ṽjH hH H i + 2hi hi ṽj (vj − ṽj ) 99 = 2hi ṽj vjH hH H H j − hi ṽj ṽj hi . (4.11)  Given that both sides of original constraint (4.9a) are real values, we use Re gi (vj , ṽj ) as the tangent function for fi (vj ). Then, the RHS of (4.9a) can be approximated by 2 H H ≈ Re gi (vj , ṽj ) = 2Re(hi ṽj vjH hH  hi vj j ) − hi ṽj ṽj hi , (4.12) We apply the same method to convexify the left-hand side (LHS) of (4.9a). To convexify the product of two variables, we define a bivariate function f (γ, z) = γz. It is neither convex nor concave since its Hessian matrix is neither positive semidefinite nor negative semidefinite, and it also has a saddle point at γ = z = 0. However, this function can be expressed as a summation of a (γ+z)2 convex function and a concave one, i.e., f (γ, z) = f1 (γ, z) + f2 (γ, z), where f1 (γ, z) = 4 (γ−z)2 and f2 (γ, z) = − 4 . To convexify f (γ, z), it suffices to pursue the idea of using a tangent function for its concave component. Since f2 (γ, z) is a differentiable concave function, tangent function g (γ, z, γ̃, z̃) is a majorant of f2 (γ, z). Indeed, f2 (γ, z) ≤ g (γ, z, γ̃, z̃). This majorant can be expressed as a tangent function at point (γ̃, z̃) as: 1 1 g (γ, z, γ̃, z̃) = (γ̃ − z̃) (γ − γ̃ + z̃ − z) − (γ̃ − z̃)2 . (4.13) 2 4 Based on the tangent function in (4.13), we can approximate f (γ, z) = γz using f1 (γ, z) + g (γ, z, γ̃, z̃). Then, the LHS of (4.9a) can be approximated by   γi,j zi,j ≈f1 γi,j , zi,j + g γi,j , zi,j , γ̃i,j , z̃i,j 1 2 1 2 = γi,j + zi,j − γ̃i,j − z̃i,j 4 4 100 1   − γ̃i,j − z̃i,j γi,j − γ̃i,j + z̃i,j − zi,j . (4.14) 2 Based on the relaxations in (4.12) and (4.14), the non-convex constraint (4.9a) can be approxi- mated by the following convex constraint: 1 2 1 2 γi,j + zi,j − γ̃i,j − z̃i,j 4 4 1   − γ̃i,j − z̃i,j γi,j − γ̃i,j + z̃i,j − zi,j 2 ≤2Re(hi ṽj vjH hH H H j )−hi ṽj ṽj hi , i ∈ N , 1 ≤ j ≤ i. (4.15) So far, we have convexified constraint (4.9a). Now, we focus on (4.9b), which is a restricted hyperbolic constraint. This constraint is convex but indisciplined. To make it disciplined, we first introduce an existing technique and then apply it to transform (4.9b). Consider an indisciplined convex constraint θ2 ≤ αβ, α, β ∈ R+ and θ ∈ R. Based on [11], we have:   (α − β) (α + β) θ2 ≤ αβ ⇐⇒ θ, ≤ , (4.16) 2 2 where ⇐⇒ means that the two sides are equivalent, and the RHS is a disciplined convex constraint. By taking advantage of this result, indisciplined convex constraint (4.9b) can be equivalently trans- formed to a disciplined convex constraint as follows: " # zi,j − σi2 − 1 zi,j −σi2 +1  hi vj+1 , · · · , |hi vN |, ≤ , i ∈ N , 1 ≤ j ≤ i, (4.17) 2 2 The relaxed problem using the convexified constraints, which we denote as OPT-NOMA- RELAX, can be written as: 101 X max w i ri i∈N s.t. (4.7b), (4.8), (4.15), and (4.17). OPT-NOMA-RELAX is a second-order cone programming (SOCP) problem, which can be solved in polynomial time by off-the-shelf optimization solvers such as CVX and CVXOPT [13]. 4.4.3 Our Proposed Algorithm Based on OPT-NOMA-RELAX, we propose an algorithm to solve the original problem OPT- NOMA. The proposed algorithm is an iterative algorithm. In each iteration, we solve OPT- NOMA-RELAX by taking the output results from the previous iteration as the input parameters (tangent points for convexification). The iterative algorithm terminates if the increase of the objec- tive value is less than a pre-defined threshold (ϵ) or the number of iterations reaches a pre-defined bound (Niter ). For notational simplicity, when solving OPT-NOMA-RELAX in iteration l, we h i denote ṽ[l−1] , γ̃ [l−1] , z̃[l−1] as the input parameters (the tangent points for convexification) and h i v[l] , γ [l] , z[l] , r[l] as the output results (the optimal solution to OPT-NOMA-RELAX). Alg. 4.1 presents our proposed algorithm. For such an iterative algorithm, an important question is how to construct an appropriate initial tangential set for the OPT-NOMA-RELAX problem in the first iteration. It is well known that the performance of many optimization problems is heavily reliant on their initial search points. A good initial point significantly accelerates the search process and therefore remarkably reduces the computational time of the algorithm. In light of this, we develop an algorithm to construct 102 Algorithm 4.1 Solving OPT-NOMA. Inputs: Network parameters hi , σi , N , wj , Pap , ak, bk , and threshold ϵ; Outputs: A solution to OPT-NOMA v∗ , γ ∗ , z∗ , r∗ ; h i 1: Compute initial tangent points ṽ[0] , γ̃ [0] , z̃[0] using Alg. 4.2; 2: Specify the max number of iterations (e.g., Niter = 100); (l = 1; l ≤ Niter ;il++) do 3: for h h i 4: [l] [l] [l] [l] v , γ , z , r ←solving OPT-NOMA-RELAX using ṽ [l−1] , γ̃ [l−1] , z̃[l−1] ; h i h i 5: ṽ[l] , γ̃ [l] , z̃[l] ← v[l] , γ [l] , z[l] 6: if r[l] − r[l−1] < ϵ then 7: Break;  ∗ ∗ ∗ ∗  h [l] [l] [l] [l] i 8: v , γ , z , r ← v , γ , z , r ; a good initial search point for the OPT-NOMA-RELAX problem in the first iteration. Alg. 4.2 shows our proposed algorithm. In this algorithm, we first randomly generate a set of vectors for ṽ[0] and then normalize its amplitude to meet the power constraint. Upon initializing ṽ[0] , we then calculate γ̃ [0] and z̃[0] based on their respective constraints. In this process, a small number ε is used to ensure the strict feasibility of the tangential set and maximize its corresponding objective value. 4.4.4 Discussions on the Proposed Algorithm Alg. 4.1 and Alg. 4.2 constitute our proposed algorithm to solve the optimization problem OPT- NOMA. We have the following remarks for the proposed algorithm: Remark 1 (Feasibility). Our proposed algorithm yields a feasible solution to the original op- timization problem OPT-NOMA. We pinpoint this by arguing that any feasible solution to OPT- NOMA-RELAX is also feasible to OPT-NOMA. Comparing the two optimization problems, we can see that the different constraints are (4.7c) in OPT-NOMA and (4.15) and (4.17) in OPT- NOMA-RELAX; other constraints are the same. Now, let us focus on these two constraints. 103 Algorithm 4.2 Constructing the first tangential set for OPT-NOMA-RELAX. Inputs: Network parameters hi , σi , N , wj , Pap , and safety gap h ε; i Outputs: An initial tangential set for OPT-NOMA-RELAX ṽ[0] , γ̃ [0] , z̃[0] ; h i 1: Generate random values for v̂1T , v̂2T , · · · , v̂N T 2: for (i = 1; i ≤ N ; i + +) do s (1−ε)Pap 3: vi = P N 2 v̂i j=1 v̂j 4: for (i ∈ N , 1 ≤ j ≤ i) do  N  2 σi2 P 5: Calculate zi,j = (1 + ε) + |hi vk | k=j+1 2 (1−ε) hi vj 6: Calculate γi,j = zi,j h i [0] [0] [0]   7: ṽ , γ̃ , z̃ ← v, γ, z The relaxation from (4.7c) to (4.15) and (4.17) is actually a minorization-majorization process [66]. That is, we replaced a concave function on the LHS with a tangent affine function and replaced a convex function on the RHS with a tangent affine function (see (4.12) and (4.14) respectively). Based on the properties of concave and convex functions, we know that (4.15) and (4.17) are more restrictive than (4.7c). In other words, a solution satisfying (4.15) and (4.17) certainly satisfies (4.7c). Therefore, we conclude that a solution feasible to OPT-NOMA-RELAX is also feasible to OPT-NOMA. According to Alg. 4.1 and Alg. 4.2, it is easy to see that the generated solution is feasible to OPT-NOMA-RELAX, which is also feasible to the original optimization problem OPT-NOMA. Remark 2 (Convergence). Our proposed algorithm converges to a stationary point. Since the feasible region of OPT-NOMA-RELAX is expanding over the iterations in Alg. 4.1 [66], the value returned by the objective function is non-decreasing. Moreover, the solution yielded by each itera- tion (from solving OPT-NOMA-RELAX) is in the feasible region of OPT-NOMA. Because of the same objective function on both problems, Alg. 1 converges to a stationary point, which could be 104                67$   $3 Figure 4.4: An example of WLAN that has a set of widely distributed stations. either a global or local optimal point. Remark 3 (Computational Complexity). Our proposed algorithm (Alg. 4.1) has polynomial- time computational complexity. Alg. 4.1 is an iterative algorithm. In each iteration, its main work is solving the OPT-NOMA-RELAX problem (an SOCP problem). Given M ≤ N in NOMA, the complexity of each iteration is O(N 6 ) [120]. Since the number of iterations in Alg. 1 is bounded by Niter , the overall computational complexity of Alg. 4.1 is O(Niter · N 6 ). Remark 4 (Imperfect CSI). In the formulation of OPT-NOMA, we assumed perfect CSI for the design of precoders. However, in real systems, perfect CSI may not be available. In that case, we can use the measured (imperfect) CSI as the input to compute the precoders. Apparently, the imperfection of CSI may lead to a performance degradation. 105 4.5 A Downlink NOMA Scheme for WLAN In this section, we propose a practical scheme based on the precoder design in the previous section to enable downlink NOMA transmissions in WLANs. We consider a WLAN as shown Fig. 4.4, which comprises an AP and a set of widely distributed stations (STAs). Denote S as the set of STAs in the network, with S = |S|. The STAs are sorted in non-decreasing order based on their channel quality (i.e., ∥hi ∥, i ∈ S). STA 1 is the weakest station and STA S is the strongest one. For such a network, we propose a downlink NOMA framework to support multi-user communications by leveraging the precoder optimization approach in Section 4.4. 4.5.1 User Grouping at AP We assume that the AP is responsible for user scheduling and grouping for the downlink transmis- sions. To perform user grouping, the AP needs to determine the number of stations in one group. Theoretical exploration of this problem requires an exhaustive search to identify the best grouping strategy that leads to the maximum network throughput. However, such an approach is overly com- plicated and not amenable to practical implementation. Therefore, we resort to a heuristic design for user grouping. In what follows, we first study the user grouping in a simple WLAN and then propose a heuristic algorithm for user grouping in a generic WLAN. User Pairing in SISO Network. We consider a WLAN as shown in Fig. 4.4 and assume that each node (AP or STA/user) has a single antenna. We also assume that each group has two users in NOMA transmission for simplicity. Denote hw and hs as the channel coefficients of the weak and strong users in a group, respectively. Denote p(hw , hs ) as the normalized portion of AP’s power allocated to the strong user’s message. Based on the notion of NOMA, the AP’s 106 :HDNHVWXVHU 6WURQJHVWXVHU   S S S S S S   %HVWSDLULQJVWUDWHJ\   S S S 6 S S   :RUVWSDLULQJVWUDWHJ\ Figure 4.5: Two pairing strategies in NOMA communications. power allocation for NOMA transmission should have the following property: p(hw , hs ) is a non- increasing function with respect to |hs | / |hw |. Based on this property, we have the following proposition: Proposition 1. Suppose that the objective is to maximize the weighted sum rate of all users and that round-robin scheduler is used for the paired users. Then, the best pairing strategy is (i, S/2+i), and the worst pairing strategy is (i, S + 1 − i), for 1 ≤ i ≤ S/2, as illustrated in Fig. 4.5. Proof. Consider the pairing strategies in Fig. 4.5. Suppose that the paired users are scheduled in the round-robin way over a set of time slots and that rbps denotes the weighted sum rate yielded by the (i, S/2 + i) pairing strategy. Then, we have 2    S/2 !  αi hS/2+i 2 X (1 − αi ) |hi | rbps = w1 log2 1+ +w log 2 1+ , (4.18)  2 2 α |h i i | +σ 2 σ2 i=1 where αi = p(hi , hS/2+i ), and (1 − αi ) is then the normalized portion of AP’s transmit power 107 for the weak user, σ 2 is the normalized power of noise (w.r.t. the signal power), and w1 and w2 denote the weight assigned to the weak and strong users in a group, respectively. To show that the (i, S/2 + i) pairing strategy yields the highest weighted sum rate among all possible pairing strategies, we argue that any permutation over this pairing strategy would lead to a decrease in the weighted sum rate. Without loss of generality, we assume that the permutation occurs for user pairs (1, S/2 + 1) and (2, S/2 + 2). After permutation, the resulting pairs are (1, S/2 + 2) and (2, S/2 + 1). For the permuted user pairs, we let α1′ = p(h1 , hS/2+2 ) and α2′ = p(h2 , hS/2+1 ). Then, the change of weighted sum rate from the permutation can be written as follows: ! (1 − α1 ) |h1 |2 α 2 ∆r = rbps − rperm = w1 log2 1+ 2 + w2 log2 (1 + 21 hS/2+1 ) α1 |h1 | + σ 2 σ ! (1 − α2 ) |h2 |2 α 2 + w1 log2 1 + 2 + w2 log2 (1 + 22 hS/2+2 ) α2 |h2 | + σ 2 σ ! (1 − α1′ ) |h1 |2 α′ 2 − w1 log2 1 + − w2 log2 (1 + 21 hS/2+2 ) α1′ |h1 |2 + σ 2 σ ! (1 − α2′ ) |h2 |2 α′ 2 − w1 log2 1 + − w2 log2 (1 + 22 hS/2+1 ), (4.19) α2′ |h2 |2 + σ 2 σ where rperm denotes the weighted sum rate after permutation. Through algebraic operations, (4.19) can be rewritten as: ! ! α2′ |h2 |2 + σ 2 α1′ |h1 |2 + σ 2 ∆r = w1 log2 −w1 log2 + α2 |h2 |2 + σ 2 α1 |h1 |2 + σ 2 2 2     2 2 α2 hS/2+2 + σ  α1 hS/2+1 + σ  w2 log2  2  −w 2 log 2 2 . (4.20) ′ α1 hS/2+2 + σ 2 ′ α2 hS/2+1 + σ 2 Recall that the users are sorted in increasing order by their channel strength, i.e., |h1 | ≤ |h2 | ≤ hS/2+1 ≤ hS/2+2 . Since p(hw , hs ) is a non-increasing function with respect to |hs | / |hw |, 108 Algorithm 4.3 An algorithm for user grouping. Inputs: The array of sorted STAs (S = {1, 2, · · · , S}) and each STA’s channel (hi , i ∈ S); Outputs: The total number of groups (K) and the generated user groups G1 , G2 , · · · , GK ; 1: ∆q ← 10; 2: k ← 0; 3: while (S is not empty) do 4: k++; 5: Gk = [S(1)]; // S(1) is the 1st element of S 6: q_value ← q(S(1)); 7: for (l = 2; l ≤ size(S); l++) do 8: if q(S(l)) ≥ q_value + ∆q then 9: Gk ← [Gk S(l)]; 10: q_value ← q(S(l)); 11: Remove all elements in Gk from S; 12: K ← k; we have α1 ≥ α1′ , α2′ ≥ α2 , α2′ ≥ α1 , and α2 ≥ α1′ . Then, the following two inequalities are imminent. α2′ |h2 |2 + σ 2 α1′ |h1 |2 + σ 2 ≥ , (4.21) α2 |h2 |2 + σ 2 α1 |h1 |2 + σ 2 2 2 α2 hS/2+2 + σ2 α1 hS/2+1 + σ2 2 ≥ 2 . (4.22) α1′ hS/2+2 + σ2 α2′ hS/2+1 + σ2 Based on (4.20), (4.21), and (4.22), it is evident that ∆r ≥ 0. This shows that any permutation on user pairing (i, S/2 + i) decreases the weighted sum rate. We therefore conclude that the (i, S/2 + i) pairing strategy yields the highest weighted sum rate. By the same token, we can prove that the (i, S + 1 − i) pairing strategy yields the lowest weighted sum rate. We omit this part to converse space. From Proposition 1, we have the following observations on user pairing: (i) it should try to avoid pairing two users with similar channel quality; and (ii) it should try to maintain a similar channel difference for user pairs. User Grouping in MISO Network. Based on the above two observations, we propose a 109 heuristic user grouping algorithm for a generic WLAN. For STA i ∈ S, we define its channel quality indicator as q(i) = 20 log10 (∥hi ∥), where hi is STA i’s channel that includes path loss, shadow fading, and fast fading. Based on the channel quality indicator, we use the following rules to devise a user grouping algorithm: (i) The STAs in the same group should have at least ∆q chan- nel quality difference, where ∆q represents the channel quality difference in decibel and should be adaptively set based on the network environment. In our experiments, extensive measurements of wireless channels in an office building show that the average channel quality difference of two users is about 9.3 dB. Based on this observation, we set ∆q = 10 dB for the user grouping algo- rithm. (ii) one STA is associated with only one group. Per these two rules, we propose a greedy algorithm as shown in Alg. 4.3 for user grouping. Essentially, Alg. 4.3 is heuristic. We have the following remarks on it. Remark 5 (Single STA in a Group): Based on our algorithm, it is apparent that there is no guarantee each group has more than one STA. If a group has only one STA, this means that NOMA is not needed, and OMA can be used for its transmission. Essentially, such a grouping algorithm requires a combination of NOMA and OMA at the PHY layer for data transmission. 4.5.2 A MAC-Layer Protocol for NOMA If a group includes multiple users (STAs), then NOMA is used to enable concurrent data transmis- sion for the STAs. With a bit abuse of notation, we denote {1, 2, · · · , N } as the STAs in the group under consideration. Fig. 4.6 shows our proposed protocol for NOMA transmission. At high level, it comprises three steps: Channel sounding, NOMA transmission, and acknowledgment. Since the acknowledgment step is straightforward, we focus our discussions on channel sounding and NOMA transmission. 110 AP NDPA Poll Pol l Superposition of signal frames NDP ACK STA 1 NDP ACK STA 2 ... ... ... NDP ACK STA N Implicit channel feedback NOMA transmission ACK Figure 4.6: A protocol for NOMA transmission in WLANs. Channel Sounding. To reduce the airtime overhead, we employ an implicit channel feedback mechanism in our protocol by leveraging channel reciprocity. Specifically, the AP first broad- casts a Null Data Packet Announcement (NDPA) to inform the stations of channel sounding and NOMA transmission. Upon reception of the NDPA packet, the stations sequentially respond with a NDP following the poll packets from the AP. The NDP includes the preamble (reference sig- nals) enabling the AP to estimate the uplink channel. At the end of this step, the AP obtains the uplink channels between itself and all the intended stations. The obtained uplink channels will be converted to downlink channels through channel calibration. In such an implicit channel feedback mechanism, three important problems need to be taken into consideration: (i) For the protocol in Fig. 4.6, the stations should use the same transmit power when transmitting the NDP (e.g., the maximum transmit power specified in the standards). Use of different transmit powers will confuse the AP about the channel quality between itself and the stations, thereby leading to a failure in the downlink NOMA transmission. (ii) Typically, the stations in a WLAN have the same noise power. In some extreme cases where the stations have different noise power, the stations need to feed their noise power back to the AP. This can be 111 easily done by embedding the noise power information (only a real number) into the NDP when performing uplink channel sounding. (iii) To perform downlink NOMA transmission, the AP actually needs to know the downlink channels information. It is therefore imperative to infer the downlink channels based on the measured uplink channels in our protocol. When the AP has a single antenna, the difference between an uplink channel and its corresponding downlink channel can be represented by a complex number in the mathematical channel model. Such a complex scalar does not affect the NOMA scheduling and transmission results. Therefore, the measured uplink channels can be equivalently treated as downlink channels. When the AP has multiple antennas, the difference between uplink and downlink channels is an array of complex numbers. To compensate the mismatch, a channel calibration procedure is needed at the AP. While there are many calibration methods, we employ the relative calibration method in [150]. This relative calibration method is an internal and standalone calibration method that can be done at the AP without any aid from the stations. In our experiment, we implicitly implement this calibration method to maintain channel reciprocity. NOMA Transmission and Frame Structure. After obtaining the downlink channel, the AP computes precoders using the proposed method in Section 4.4 and selects an MCS for each STA in the scheduled group. Then, the AP performs downlink NOMA transmission as illustrated in Fig. 4.6. To perform downlink NOMA transmission, we propose a MU-MIMO-like frame structure as shown in Fig. 4.7. The proposed frame structure has three parts: (i) The legacy preamble part comprises a Legacy Short Training Field (L-STF), a Legacy Long Training Field (L-LTF), and a Legacy Signal (L-SIG) field. This part is designed for frame detection and time/frequency synchronization on the STA side. (ii) The reference signal part comprises N precoded L-LTFs, each of which has two identical OFDM symbols. This part is devised for signal detection on the STA side. (iii) The data (payload) part comprises a sequence of OFDM symbols, each of which is 112 2 OFDM symbols ... ... Subcarrier k ࣬ʹ ሺ݇ሻ ࣞ L- STF L- LTF L- SIG LTF (V1) LTF (V2) ... LTF (VN) DATA Legacy Precoded reference signal Superposition of precoded signals for N users preamble Figure 4.7: Proposed frame structure for NOMA transmission. a superposition of precoded signals for N stations. In what follows, we propose a new SIC method to decode the desired signal at each STA. 4.5.3 PHY-Layer Signal Processing At the PHY layer, multiple adjacent subcarriers are bonded together for data transmission in order to reduce computational complexity. The rationale behind this operation is that the channels of adjacent subcarriers are typically similar. Hence, the bonding strategy does not cause much per- formance degradation but reduces the complexity significantly. Fig. 4.7 illustrates an example of bonding over five adjacent subcarriers. For the bonded subcarriers, we use the same precoder for power allocation and beam steering on the AP side and the same detection filter for signal recovery on the user side. AP-Side Precoding. After computing the precoders for each user, we assemble a downlink NOMA transmission frame as shown in Fig. 4.7. The first part of the frame is fixed. The second 113 part of the frame is computed based on the precoders. Specifically, for the jth LTF in this part, its frequency-domain data is generated by x(l, k) = vj (k)s̄j (l, k), where s̄j (l, k) is a pre-defined reference signal. The third part is superposition of precoded data in the frequency domain. It is PN generated by x(l, k) = j=1 vj (k)sj (l, k). The generated signal vector x(l, k) is converted into time domain using IFFT operation. The resulting time-domain signal vector will be sent to RF chains for transmission. User-Side SIC-based Signal Detection. Since each frame has the IEEE 802.11 legacy pream- ble, the users can perform frame detection, time synchronization, and frequency offset correction in the same way as conventional Wi-Fi devices do. Afterward, each user performs SIC to decode its desired signal. For ease of exposition, we denote l as the index of OFDM symbol in a frame and denote k as the index of subcarrier in the OFDM modulation. Then, the received signal at STA i can be written as: XN yi (l, k) = hi (k)vj (k)sj (l, k) + ni (l, k). (4.23) j=1 To decode the desired signal at STA i, one approach is using ZF SIC (ZF-SIC). This ap- proach decodes and subtracts the strongest signal sequentially until its desired signal is obtained. When decoding the strongest signal, it simply treats other (non-strongest) signals as interference. When decoding sj , it first estimates the compound channel by ĥj (k) = ȳi (l, k)/s̄j (l, k), where ȳi (l, k) and s̄j (l, k) are the received and transmitted reference signals, respectively. Then, it uses the estimated channel to decode the strongest signal by letting ŝj (l, k) = yi (l, k)/ĥj (k), where ŝj (l, k) is the estimated version of the strongest signal sj (l, k). Although ZF-SIC is amenable to implementation, its performance is highly suboptimal. This is because it does not take into account the effect of noise and interference (non-strongest signals) in the course of its signal de- tection. To improve its performance, we may consider MMSE SIC (MMSE-SIC), which takes into 114 account noise and interference. In contrast to ZF-SIC, MMSE-SIC estimates the strongest signal as follows: ŝj (l, k) = ĥj (k)∗ [ĥj (k)ĥj (k)∗ + ρ1 ]−1 yi (l, k), where ρ is the SINR. MMSE-SIC requires the knowledge of SINR, making it difficult to implement in practice. To circumvent the above problems, we propose a new SIC scheme, which uses the refer- ence signals to construct a detection filter directly. Specifically, at STA i, we decode signals {s1 (l, k), s2 (l, k), · · · , si (l, k)} in sequence. When decoding the strongest signal sj (l, k), we construct the detection filter as follows: P ∗ (l,k)∈Rj (k) yi (l, k)sj (l, k) gj (k) = P ∗, 1 ≤ j ≤ i, (4.24) (l,k)∈Rj (k) yi (l, k)yi (l, k) where (·)∗ is the conjugate operator, and Rj (k) is the set of reference signals in the jth LTF on subcarrier k. Fig. 4.7 illustrates an example of Rj (k) when j = 2. It can been seen that we use not only the reference signals on subcarrier k but also reference signals on its two neighboring sub- carriers to construct detection filter gj (k). The rationale behind this design is that the summation over multiple subcarriers can reduce the effect of noise and interference (non-strongest signals). After calculating the detection filter, we estimate signal sj (l, k) in the data part of the frame as follows: ŝj (l, k) = gj (k)∗ yi (l, k), 1 ≤ j ≤ i, (4.25) where ŝj (l, k) is the estimated version of sj (l, k). Based on (4.24) and (7.12), we present the proposed SIC algorithm in Alg. 4.4. Apparently, this algorithm does not require the estimated SINR. But, it can partially reduce the influence of noise and interference. This is important in SIC detection because all of non-strongest signals are considered as interference. Meanwhile, this SIC algorithm has a low complexity, and it is amenable to practical implementation. For its performance, we will show via experimental results 115 Algorithm 4.4 The proposed SIC at STA i. Inputs: Received signal yi (l, k), reference signals in the frame; Outputs: Estimated signals in the data part of the frame, i.e., ŝi (l, k) for (l, k) ∈ D; 1: for (j = 1; j ≤ i; j++) do 2: Compute decoding filter gj (k) using (4.24); 3: Estimate current signal ŝj (l, k) using (7.12); 4: s̃j (l, k) ← QAM-based demodulation of ŝj (l, k); 5: yi (l, k) ← yi (l, k) − s̃j (l, k)/gj (k)∗ ; that it considerably outperforms ZF-SIC. 4.6 Performance Evaluation In this section, we conduct experiments to evaluate the performance of the proposed NOMA scheme in real-world wireless environments. 4.6.1 Prototyping and Experimental Setup Experimental Testbed. We have prototyped an AP and three users. The AP has been imple- mented using two USRP N210 devices and one laptop. Each user has been implemented using an USRP N210 device and one laptop. The USRP devices are used for radio signal transmission and reception, and the laptop is used for baseband signal processing. All the baseband signal processing is carried out by the laptop using Python and C++ in GNU Radio software package. The prototyped AP supports up to two antennas for signal transmission and reception, while users support one antenna. On the AP, the relative calibration method in [150] was implemented to preserve the up- link/downlink channel reciprocity. This relative calibration method is a standalone calibration procedure that can be done by the AP without requiring the involvement of the users. Prototyping NOMA. We have implemented the NOMA protocol in Fig. 4.6 on the testbed. 116 (a) STA (b) AP 110 ft L31 L41 L32 L42 L33 L43 AP L21 L22 L23 L13 L12 L11 47 ft L63 L53 L62 L52 L51 L61 (c) Floor plan Figure 4.8: Our NOMA testbed and floor plan. As shown in Fig. 4.6, the protocol first performs uplink channel sounding to obtain the uplink channels. Based on the channel knowledge, Alg. 4.1 is used to compute the precoders (vi , i ∈ N ) using a convex optimization solver such as CVX and CVXOPT [13] . In OPT-NOMA, we set w1 = 3 for weak user, w2 = 2 for middle user, and w3 = 1 for strong user. These weights are just an example and other weights would also work. After computing the precoders, the AP sends a superimposition of the signals toward users using the frame structure depicted in Fig. 4.7. The users perform SIC to decode their desired signals. In our implementation, we use Schmidl- Cox algorithm for the timing and frequency synchronization at the receivers in both uplink and downlink transmissions. We use least-squares channel estimation at the AP in the uplink channel sounding. For the downlink transmissions, we use the precoded reference signals in the frame (see Fig. 4.7) to construct the channel equalization coefficient gj (k) and then use this coefficient for signal detection, as detailed in (4.24) and (7.12). During this protocol, IEEE 802.11 legacy frame parameters are used for both uplink and down- link transmissions. That is, each OFDM symbol has 64 subcarriers in total; 48 subcarriers are used 117 for data transmission; 4 subcarriers are used for pilot; and 12 subcarriers are null. The 52 valid subcarriers are bonded into two groups for the precoder design in NOMA. The length of cyclic prefix is 16. Due to the hardware limitation, we set the sampling rate to 5 Msps (to avoid the unflat circuit response from the CIC) and set the short interframe space (SIFS) to 2 seconds. Given the 5 MSps sampling rate, the time duration of each OFDM symbol is 16 µs. The data part of each frame consists of 20 OFDM symbols. Experimental Setup. In our tests, the maximum transmit power for each node (AP or user) is set to 17 dBm. Fig. 4.8 shows the floor plan of our test scenarios and the prototyped AP and STA. Regarding the floor plan, the AP is placed at a fixed location marked “AP”. The three users are placed at one of the six different locations. Specifically, STA 1 is placed Lk1 , STA 2 is placed Lk2 , and STA 3 is placed Lk3 , for k = 1, 2, · · · , 6. It is noteworthy that the linear deployment of the three users in a group is for ease of explanation. In a rich scattering environment like an office building, such a deployment does not impose a significant correlation among users’ channels. Moreover, it is worth pointing out that our experimentation does not include the user grouping algorithm. 4.6.2 Performance Metrics and Benchmark Performance Benchmark. We use OMA as the performance benchmark to evaluate the throughput gain of NOMA. In OMA, the round-robin scheduler is used at the AP. Specifically, the AP serves only one user in one time slot, and different users are scheduled in different time slots. When the AP has multiple antennas, the best antenna is selected for spatial diversity. Performance Metrics. We evaluate the performance of the proposed NOMA scheme using the two metrics: EVM and data rate. (i) EVM is a metric widely used in WLANs. At STA j,  2  2  its EVM is defined as EVM = 10 log10 El,k ŝj (l, k) − sj (l, k) / El,k sj (l, k) . (ii) The 118 data rate is extrapolated based on the measured EVM at each user using the MCS specified in IEEE 802.11 [67]. Specifically, the data rate at STA j is calculated by 48 NOMA: rj = · b · η(EVM), (4.26a) 80 1 48 OMA: rj = · · b · η(EVM), (4.26b) N 80 where N is the number of users served by the AP, 48 is the number of payload subcarriers, 80 is the number of samples in an OFDM symbol, b is the bandwidth (5 MHz), EVM is measured at the STA j when NOMA or OMA is used, and η(EVM) is the average number of bits carried by one subcarrier in an OFDM symbol and its value is given in Table 4.1. 4.6.3 Experimental Results of (1 × 2)-NOMA We first consider the case where the AP has one antenna and it serves two users (one weak user and one strong user). The weak user is placed at Lk1 and the strong user is placed at Lk3 , k = 1, 2, · · · , 6. Case Study. We use location 4 (k = 4) as an example to examine NOMA. Fig. 4.9 presents the constellation of the decoded signals at the two users when NOMA and OMA are used, re- spectively. For the weak user, Fig. 4.9(a) and Fig. 4.9(c) show that NOMA has a small (2.3 dB) EVM degradation compared to OMA. Using (4.26), the data rate at the weak user is extrapo- lated to 3.0 Mbps in NOMA and 1.5 Mbps in OMA. This indicates that NOMA has a significant throughput gain for the weak user. For the strong user, Fig. 4.9(b) show its decoded signals af- ter SIC in NOMA, respectively. We can see that the strong user can achieve −17.3 dB EVM after SIC. Compared Fig. 4.9(b) with Fig. 4.9(d), we can see that the strong user has a consider- 119 1 1 1 1 0.5 0.5 0.5 0.5 0 EVM=-10.5 dB 0 EVM=-17.3 dB 0 EVM=-12.8 dB 0 EVM=-26.0 dB -0.5 -0.5 -0.5 -0.5 -1 -1 -1 -1 -1 -0.5 0 0.5 1 -1 -0.5 0 0.5 1 -1 -0.5 0 0.5 1 -1 -0.5 0 0.5 1 (a) NOMA: signal at (b) NOMA: signal at (c) OMA: signal at (d) OMA: signal at weak user. strong user. weak user. strong user. Figure 4.9: Constellations of NOMA and OMA in downlink. 4 8 25 Weak user's data rate (Mbps) Strong user's data rate (Mbps) Weighted sum rate (Mbps) 20 3 6 15 2 4 10 OMA OMA 1 2 NOMA with ZF-SIC NOMA with ZF-SIC OMA 5 NOMA with proposed SIC NOMA with proposed SIC NOMA 0 0 0 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 Location index Location index Location index (a) Weak user’s data rate (b) Strong user’s data rate (c) AP’s weighted sum rate Figure 4.10: Performance comparison of NOMA and OMA in downlink transmission of a WLAN where a single-antenna AP serves two single-antenna users. able EVM degradation (about 9.4 dB) when NOMA is used. Using (4.26), the data rate achieved by the strong user is extrapolated to 6.0 Mbps in NOMA and 6.7 Mbps in OMA. Compared to OMA, our NOMA scheme slightly decreases the strong user’s data rate. The reasons are two- fold. First, NOMA serves two users while OMA serves one user. Moreover, a higher weight is assigned for the weak user to maintain the fairness in NOMA transmission when we conduct the optimization (OPT-NOMA). Second, SIC in NOMA is not perfect due to the limited ADC resolu- tion, circuit nonlinearity and distortion. The imperfection of SIC degrades the performance of the strong user. From the AP’s perspective, the weighted sum rate of the two users is 22.5 Mbps in our NOMA scheme and 16.9 Mbps in OMA. This means that our NOMA scheme has about 33.5% improve- ment over OMA in terms of weighted sum rate. Results from All Locations. Fig. 4.10 presents the extrapolated data rate at each user when NOMA and OMA are used, respectively. Two SIC techniques are implemented for the strong 120 1 1 1 1 0.5 0.5 0.5 0.5 0 EVM=-12.6 dB 0 EVM=-20.0 dB 0 EVM=-12.8 dB 0 EVM=-24.9 dB -0.5 -0.5 -0.5 -0.5 -1 -1 -1 -1 -1 -0.5 0 0.5 1 -1 -0.5 0 0.5 1 -1 -0.5 0 0.5 1 -1 -0.5 0 0.5 1 (a) NOMA: signal at (b) NOMA: signal at (c) OMA: signal at (d) OMA: signal at weak user. strong user. weak user. strong user. Figure 4.11: Constellations of NOMA and OMA in the downlink of WLAN where a two-antenna AP serves two single-antenna users. user: ZF-SIC and our proposed SIC. Based on the results, we have the following observations: (i) For the weak user, our NOMA scheme has a 60.0% data rate gain compared to OMA, on average over all the locations that we tested. (ii) For the strong user, NOMA yields a 17.9% data rate degradation on average compared to OMA. This degradation can be attributed to the inter- user interference and the imperfections of SIC as explained above. (iii) For the AP, the proposed NOMA scheme outperforms OMA by 18.0% in terms of weighted sum rate. (iv) The proposed SIC scheme outperforms ZF-SIC by 27.8% for the stronger user’s data rate. This throughput gain is from the summation operation in (4.24). Mathematically, this summation operation is equivalent to a low pass filter, which reduces the effects of noise and interference (weak signals) in each iteration of SIC. 4.6.4 Experimental Results of (2 × 2)-NOMA We now consider the case where the AP has two antennas and it serves two users. The weak user is placed at Lk1 and the strong user is placed at Lk3 , k = 1, 2, · · · , 6. Case Study. Again, we use location 4 (k = 4) as an example to examine the proposed NOMA scheme. Fig. 4.11 presents the constellation of the decoded signals at the two users when NOMA and OMA are used, respectively. We have the following observations from the experimental data. (i) For the weak user, it has similar EVM in NOMA and OMA. Its extrapolated data rate is 121 5 35 Weak user's data rate (Mbps) Strong user's data rate (Mbps) OMA Weighted sum rate (Mbps) 12 30 4 NOMA 25 9 3 20 2 6 15 OMA 10 OMA 1 3 NOMA with ZF-SIC NOMA with ZF-SIC 5 NOMA with proposed SIC NOMA with proposed SIC 0 0 0 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 Location index Location index Location index (a) Weak user’s data rate (b) Strong user’s data rate (c) AP’s weighted sum rate Figure 4.12: Performance comparison of NOMA and OMA in downlink transmission of a WLAN where a two-antenna AP serves two single-antenna users. 3.0 Mbps in NOMA and 1.5 Mbps in OMA. This shows that NOMA has a significant throughput gain for the weak user. (ii) For the strong user, it achieves −20.0 dB EVM in NOMA and −24.9 dB EVM in OMA. Correspondingly, the extrapolated data rate for this user is 9.0 Mbps in NOMA and 6.0 Mbps in OMA. This shows that NOMA has a considerable throughput gain (50.0%) for the strong user as well. (iii) For the AP, the weighted sum rate is 27.0 Mbps in NOMA and 15.7 Mbps in OMA. This shows that our NOMA scheme has a 72.0% gain over OMA. Results from All Locations. Fig. 4.12 presents the extrapolated data rate at each user when NOMA and OMA are used, respectively. The experimental results from the six locations corrob- orate our observations in the case study. On average, NOMA improves the data rate by 55.5% for the weak user, 40.7% for the strong user, 49.8% for the AP’s weighted sum rate. Moreover, our proposed SIC outperforms ZF-SIC, yielding 35.7% data rate gain for the strong user. 4.6.5 Experimental Results of (2 × 3)-NOMA Finally, we consider the case where the AP has two antennas and it serves three users. The weak user is placed at Lk1 , the middle user is placed at Lk2 , and the strong user is placed at Lk3 , k = 1, 2, · · · , 6. Case Study. Similar to the previous case studies, we place the users at location 4. Fig. 4.13 presents the constellation of the decoded signals at the users when NOMA is used. When OMA 122 1 1 1 0.5 0.5 0.5 0 EVM=-13.1 dB 0 EVM=-13.1 dB 0 EVM=-12.1 dB -0.5 -0.5 -0.5 -1 -1 -1 -1 -0.5 0 0.5 1 -1 -0.5 0 0.5 1 -1 -0.5 0 0.5 1 (a) 1st-decoding at weak user. (b) 2nd-decoding at middle user. (c) 3rd-decoding at stronger user. Figure 4.13: Constellation of the decoded signals in downlink NOMA transmission when the two- antenna AP serves three single-antenna users. is used, the three stations achieve −14.5 dB, −21.4 dB, and −26.8 dB EVM, respectively. Based on the experimental results, we have the following observations: (i) For the weak user, NOMA has a small EVM degradation (1.4 dB) compared to OMA. Its extrapolated data rate is 4.5 Mbps in NOMA and 1.5 Mbps in OMA. (ii) For the middle user, NOMA has a considerable EVM degradation (8.3 dB) compared to OMA. Its extrapolated data rate is 4.5 Mbps in NOMA and 3.0 Mbps in OMA. (iii) For the strong user, NOMA has a significant EVM degradation (13.7 dB). Its extrapolated data rate is 3.0 Mbps in NOMA and 4.5 Mbps in OMA. (iv) For the AP, the weighted sum rate is 25.5 Mbps in NOMA and 15.0 Mbps in OMA. This means that NOMA has a 70.0% gain over OMA in terms of weighted sum rate. Results from All Locations. Fig. 4.14 presents the extrapolated data rate at each user when NOMA and OMA are used. From the experimental results, we can see that NOMA significantly increases the weak user’s data rate, slightly increases the middle user’s data rate, and considerably decreases the strong user’s data rate. On average over the six locations, NOMA increases the data rate by 147.1% for the weak user and by 18.4% for the middle user. However, it decreases the data rate of the strong user by 26.3%. For the AP, NOMA achieves a weighted sum rate of 21.5 Mbps and OMA achieves 15.3 Mbps, indicating a 40.5% improvement. Meanwhile, it is evident that our proposed SIC considerably outperforms ZF-SIC. It improves the strong user’s data rate by 16.7% and the middle user’s data rate by 50.0%. 123 Weak user's data rate (Mbps) Middle user's data rate (Mbps) 5 7 OMA OMA 6 4 NOMA with ZF-SIC NOMA 5 NOMA with proposed SIC 3 4 2 3 2 1 1 0 0 1 2 3 4 5 6 1 2 3 4 5 6 Location index Location index (a) Weak user’s data rate. (b) Middle user’s data rate. Strong user's data rate (Mbps) Weighted sum rate (Mbps) 6 30 5 25 4 20 3 15 2 10 OMA OMA NOMA with ZF-SIC NOMA with ZF-SIC 1 5 NOMA with proposed SIC NOMA with proposed SIC 0 0 1 2 3 4 5 6 1 2 3 4 5 6 Location index Location index (c) Strong user’s data rate. (d) AP’s weighted sum rate. Figure 4.14: Performance comparison of NOMA and OMA in downlink transmission when the two-antenna AP serves three single-antenna users. 4.6.6 Summary of Observations Based on our experimental results, we have the following observations on NOMA: (i) NOMA can significantly increase the weak user’s data rate when compared to OMA. This phenomenon has been observed in all the cases that we tested in our experiments. (ii) As expected, the use of NOMA will lead to a degradation for the strong user’s data rate. But in overall, NOMA can greatly improve the weighted sum rate for the AP. (iii) Our proposed SIC method works in practice and it offers considerably better performance than ZF-SIC. 4.7 Chapter Summary In this chapter, we proposed a NOMA scheme for WLANs and evaluated its performance in real- world wireless environments. Our NOMA scheme has three key components: precoder design, 124 user grouping, and a new SIC method. We formulated the precoder design problem as an op- timization problem and developed a minorization-majorization algorithm to pursue an efficient solution to it. Moreover, a robust SIC method has been proposed to decode the desired signal in the presence of strong interference. Our SIC method does not require channel estimation and is amenable to practical implementation. We have implemented the proposed NOMA scheme on a GNURadio-USRP2 testbed. Experimental results show that, compared with OMA, the proposed NOMA scheme can significantly improve the weak user’s data rate and considerably improve the AP’s weighted sum rate. 125 Chapter 5 Learning-Based Channel Feedback for MU-MIMO in WLANs 5.1 Introduction The proliferation of wireless devices, combined with the growth of Internet-based wireless appli- cations such as online streaming and video chatting, has led to continuously increasing demands for wireless services in indoor environments such as smart homes, university campuses, football stadiums, and airports. As one of the largest wireless networks in real world, WLANs carry the most wireless data traffic (even more than cellular networks) and play a pivotal role in our society. To meet the increasing demands for data services in WLANs, MU-MIMO is a key technology. It allows an AP to serve multiple users simultaneously and therefore can significantly improve the spectral efficiency. Given its potential, MU-MIMO has been specified in the IEEE 802.11 stan- dards [67,70] and widely been deployed on commercial Wi-Fi devices, e.g., Wi-Fi routers, laptops, and phones. In real-world WLANs, the downlink typically has higher demands for data services compared to the uplink. To support downlink MU-MIMO communications in WLANs, an AP needs to access short-term CSI for the construction of beamforming filters. The filters will then be used to project the precoded signals onto the AP’s multiple antennas so that each user can decode its 126 data packets. Thus, CSI at the AP is essential to enabling downlink MU-MIMO transmissions. There are two channel acquisition methods for an AP to obtain CSI: i) implicit channel acquisition, and ii) explicit channel acquisition. The implicit method is based on channel reciprocity. The AP infers the downlink CSI through the estimation of uplink CSI and periodic channel calibrations [124]. This method, however, requires an extra RF chain on hardware or a sophisticated algorithm for channel calibration, and may not be suited for implementation on low-cost Wi-Fi devices [74, 75, 206]. The explicit method is based on channel feedback over uplink over-the-air channel. Each user first estimates the downlink CSI and then reports the estimated CSI to the AP. Given its amenabil- ity to implementation, this method has been adopted by the IEEE 802.11 standards [67, 70] and been implemented on commercial Wi-Fi systems. However, due to its reliance on over-the-air CSI feedback, it suffers from large airtime overhead. The large overhead of this method can be attributed to the large number of subcarriers in WLANs’ OFDM modulation, each of which has a channel matrix to be reported. Existing 802.11 protocols may group subcarriers for CSI feedback to reduce the overhead. Apparently, such a naive scheme will lead to an inferior beamforming per- formance and drastically compromises the throughput gain of MU-MIMO. While there are many results of MU-MIMO in the literature, the CSI compression for 802.11 MU-MIMO protocols is highly overlooked and its progress remains limited. In this chapter, we study explicit channel acquisition in 802.11 MU-MIMO protocols with the objective of minimizing CSI feedback airtime overhead while preserving CSI feedback ac- curacy. Toward this objective, we propose a learning-based channel feedback framework (called LB-SciFi1 ) for 802.11 protocols to reduce their airtime overhead by taking advantage of recent 1 LB-SciFi stands for Learning-Based compression for Ψ (Sci) and Φ (Fi), which are the CSI for feedback in 802.11 MU-MIMO protocols [67, 70]. 127 Encoder Encoder Encoder Autoencoder Decoder Encoder (a) Online training. (b) Real-time exploitation. Figure 5.1: An overview of DNN-AEs for channel feedback compression in 802.11 MU-MIMO protocols. advances in Deep Neural Network Auto-Encoder (DNN-AE). Fig. 5.1 shows the basic idea of LB- SciFi, which is composed of two phases: online training and real-time exploitation. In the training phase, LB-SciFi trains DNN-AEs at the AP by leveraging side information from existing 802.11 MU-MIMO protocols, and thus require no extra effort from user devices. In the exploitation phase, LB-SciFi uses the trained DNN-AEs to compress CSI for efficient feedback. Given the redundancy of CSI and the effectiveness of DNN-AEs, LB-SciFi can reduce the airtime overhead significantly without sacrificing CSI feedback accuracy. The main challenge in the design of LB-SciFi is the online training of DNN-AEs, which should be capable of capturing the kernel space of all possible channels in a given wireless environment through the learning of collected CSI at the AP. To address this challenge, we design an efficient training scheme for the DNN-AEs, which jointly optimize the structure of DNN-AEs, the collec- tion of training data, and the preprocessing of collected data by leveraging the existing feedback data in existing 802.11 MU-MIMO protocols. Specifically, the proposed training scheme metic- ulously chooses the ψ and ϕ angles from Givens Rotation (GR) as the DNN-AEs input based on a defined Power spectral Entropy (PSE). Moreover, several important engineering problems have 128 been addressed to make DNN-AEs work in real-world wireless environments. This work advances the state-of-the-art in the following respects. • We propose to employ DNN-AEs for CSI compression in 802.11 MU-MIMO protocols, and have designed an online training scheme for DNN-AEs while imposing no computation burden on user devices. • Based on the DNN-AEs, we have designed a learning-based channel feedback framework (LB-SciFi) for downlink MU-MIMO. This framework can dramatically reduce the CSI feed- back airtime overhead for 802.11 MU-MIMO protocols without sacrificing CSI feedback accuracy. • We have built a prototype of LB-SciFi and evaluated its performance in real-world indoor en- vironments. Our experimental results show that LB-SciFi reduces the CSI feedback airtime overhead by 73% and improves the throughput of MU-MIMO by 69% on average. 5.2 Related Works We focus our literature review on research efforts studying low-overhead channel acquisition meth- ods for MU-MIMO transmissions in WLANs and cellular networks. Channel Acquisition in WLANs. As the core technology of existing WLANs, MU-MIMO markedly improves users experience with high throughout and low latency. However, airtime over- head from channel acquisition is a real barrier toward fully exploiting the potential of MU-MIMO. Given the severity of this issue, research efforts have been devoted to studying the effect of chan- nel acquisition parameters on network throughput or completely altering the channel acquisition paradigm to enhance network throughput [17, 33, 53, 87, 110, 113, 117, 128, 134, 139, 197, 200]. 129 Pioneering work [128, 134, 139, 197] studied the underlying relationship between network throughput and channel acquisition parameters. The outcome was not surprising; full exploita- tion of MU-MIMO requires a timely CSI through a frequent channel acquisition. The large airtime overhead, however, drastically compromises the throughput gain of MU-MIMO. [17, 33, 87, 113, 117] aimed at lowering the frequency of channel acquisitions to reduce channel feedback overhead for MU-MIMO protocols. However, the airtime overhead was still too large. [53,110,200] revisited existing channel acquisition paradigm and explored new methods for efficient channel acquisition. Thus far, there is no efficient method for CSI compression to reduce feedback overhead. Our work fills this gap by leveraging recent advances in artificial neural networks to compress CSI. The resultant CSI feedback will entail much less overhead compared to existing 802.11 protocols. Channel Acquisition in Cellular Networks. Compared to WLANs, the need for low-overhead channel acquisition methods in cellular networks is appreciated earlier as the emergence of massive- MIMO revealed the drawbacks of traditional techniques. Toward this objective, the underlying correlation of CSI reports has been used for compression by removing the redundant correlated information [54, 97, 111, 172, 179, 193]. In particular, temporal correlation [97, 111, 172], spectral correlation [54, 179, 193], and spatial correlation [97, 101] have been explored to minimize the representation of CSI. Channel reciprocity [109, 143, 194] and outdated CSI [73] have also been studied to enhance the efficiency of channel acquisition. Our work is orthogonal to these research efforts in the following two aspects: i) Our work focuses on indoor WLANs, which differ from cellular networks in terms of CSI format, network architecture, data collection, data processing, and system implementation. ii) While the above efforts focused on theoretical exploration, our work focuses on practical design based on real- world 802.11 protocols. 130 5.3 Problem Description In this section, we first offer a primer of existing 802.11 MU-MIMO protocols and underscore their airtime overhead issue. Then, we will state our design objective and challenges. 5.3.1 Existing 802.11 MU-MIMO Protocols Consider a WLAN as shown in Fig. 5.1(a), where a multi-antenna AP is serving a set of user devices (a.k.a. stations or STAs for brevity). The AP is equipped with Nap antennas, and an STA is equipped with Nsta antennas. Due to the physical size and power limits, an STA typically has less antennas than an AP, i.e., Nsta < Nap . In such a WLAN, MU-MIMO is widely used to exploit the spatial DoF of asymmetric antenna configuration by enabling the AP to serve multiple STAs simultaneously. The application of MU-MIMO not only improves spectral efficiency and user scheduling flexibility but it also reduces packet delay at the MAC layer and enhances fairness in resource allocation. To enable MU-MIMO transmissions in real-world WLANs, protocols with explicit channel acquisition have been specified in the IEEE 802.11 standards [67, 70]. Fig. 5.2 shows an existing 802.11 MU-MIMO protocol, which is composed of the following four phases: • MU-MIMO Announcement: The AP selects a subset of STAs for the downlink MU-MIMO transmission based on some pre-defined criteria.2 After user selection, the AP broadcasts a NDPA to inform the STAs of MU-MIMO transmission, followed by an NDP for those STAs to estimate downlink CSI. • Channel Feedback: After estimating CSI, the selected STAs feed back their CSI to the AP 2 Note that user selection is not in the scope of our work, and there are many prior results on user selection for MU-MIMO. 131 Sounding Overhead AP N1 N2 PL PL Data STA 1 CBR A STA 2 CBR A ... ... ... STA N CBR A N1 NDPA frame PL Poll frame A ACK N2 NDP frame CBR Compressed beamforming report Figure 5.2: An MU-MIMO protocol in IEEE 802.11ac [67]. sequentially following the poll frames from the AP, as shown in Fig. 5.2. The CSI feedback procedure will be detailed shortly. • Data Transmission: Upon obtaining CSI from all STAs, the AP uses CSI to construct beam- forming filters and performs downlink data transmission. • Acknowledgment: After decoding the packets, all STAs sends an ACK/NACK to the AP to indicate the success/failure of their packet detection. In the channel feedback phase, if an STA sends raw CSI to the AP, it entails a huge amount of airtime overhead and thus negates the throughput gain of MU-MIMO. To reduce the airtime overhead, 802.11 protocols have employed angle-based CSI feedback instead of raw CSI feedback in the spatial domain and specified subcarrier grouping in the spectral domain. We detail them below. Angle Feedback in Spatial Domain. Referring to the protocol in Fig. 5.2, once an STA has received the NDP from the AP, it estimates the downlink CSI, i.e., H(k) ∈ CNsta×Nap , 1 ≤ k ≤ Nsc , where Nsc is the number of valid subcarriers. Instead of reporting the complex entries of H(k), the STA reports two sets of angles (Ψ and Φ) to the AP to reduce the feedback overhead. A 132 Algorithm 5.1 A high-level description of computing Ψ and Φ at an STA specified in the IEEE 802.11ac/ax [67, 70]. Inputs: Estimated channel at an STA, i.e., H(k) ∈ CNsta×Nap , 1 ≤ k ≤ Nsc Outputs: Computed angles, i.e., Ψ and Φ 1: Set Ψ = { } and Φ = { } 2: for (k = 1; k ≤ Nsc ; k++) do 3: [U, Σ, V] = svd (H(k)) 4: V′ = V (: , 1 : Nsta ) 5: for (l = 1; l ≤ Nsta ; l++) do 6: ψk := phase_extraction(V′ (:, l)) ϕk := givens_rotations(V ′ 7:  (:, l)) 8: Ψ := Ψ ψk and Φ := Φ ϕk 9: Quantizing every angle in Ψ using p bits, p ∈ {5, 7} 10: Quantizing every angle in Φ using q bits, q = p + 2. high-level description of computing Ψ and Φ is given in Alg. 5.1. This conversion is also known as Givens Rotations. Details of computing the angles can be found in [156]. With these two sets of angles, the AP can reconstruct the essential spatial information of H(k), which suffices for beamforming operations at the AP. In this method, the number of generated angles in Φ is Nϕ = (Nap Nsta −Nsta2 /2−N /2)N , sta sc so is the number of angles in Ψ. These angles need to be reported to the AP via the uplink over-the- air channels. In 802.11 standards [67], two types of quantization are specified for CSI feedback: • Type 0: 5 bits for angles in Ψ and 7 bits for angles in Φ, • Type 1: 7 bits for angles in Ψ and 9 bits for angles in Φ. Subcarrier Grouping in Spectral Domain. In a typical environment of WLANs, adjacent subcarriers experience highly correlated channel responses from the medium. Therefore, instead of reporting CSI for every individual subcarrier, an STA may group multiple neighboring subcarriers together for CSI feedback. Per IEEE 802.11ac [67], the number of subcarriers in a group, denoted by Ng , can be 1, 2, or 4, depending on network configuration. In IEEE 802.11ax [70], Ng can also 133 be 8 or 16 due to its small subcarrier spacing. Large Airtime Overhead. Even with the spatial- and spectral-domain compression, 802.11 MU-MIMO protocols still come with a large amount of airtime overhead, which significantly compromises the throughput gain of MU-MIMO [110, 187]. For example, for an STA with 4 antennas and an AP with 8 antennas, the CSI feedback could be as large as 19.7 kbit for 20 MHz bandwidth and 170.4 kbit for 160 MHz bandwidth. The problem of CSI feedback airtime overhead becomes increasingly acute as the evolution of WLANs is accommodating more subcarriers in a certain frequency band. For example, IEEE 802.11ax employs 256 subcarriers over 20 MHz for packet transmissions, which is four times greater than that of IEEE 802.11ac. 5.3.2 Our Objective and Challenges Objective. We aim to reduce the CSI feedback airtime overhead by taking advantage of recent advances in DNN-AE, which has been successfully used for data compression and feature extrac- tion in other fields such as image and video compression. Toward this aim, we will compress the angles in Ψ and Φ in the spectral domain by removing their information redundancy caused by channel correlation. Challenges. While the idea is straightforward, there are challenges in the design of practical DNN-AEs that are amenable to real-world applications. The challenges lie in the following re- spects: i) The configuration of DNN-AEs should be meticulously selected, including the number of layers in autoencoder, the number of neurons on each layer, and the preprocessing of input data. The configuration of DNN-AEs is of great importance as it dictates the compression rate, the infor- mation loss, and the required data amount and computational power for training. ii) The training of DNN-AEs should be online and transparent to user devices. User devices are typically constrained by their computational capability and battery power. It is desirable that the training of DNN-AEs, 134 which is computational demanding, will not put a burden on user devices. In what follows, we propose LB-SciFi to address these two challenges. 5.4 LB-SciFi: A Learning-Based Feedback Framework To reduce the CSI feedback airtime overhead, we propose LB-SciFi for CSI compression. The core components of LB-SciFi are two DNN-AEs, which compress CSI at each STA and decompress CSI at the AP. Fig. 5.1 shows the basic idea of LB-SciFi, which is composed of two phases: online training and real-time exploitation. As shown in Fig. 5.1(a), the online training is done at the AP by taking advantage of the side information (Ψ and Φ) from existing 802.11 protocols. Once the training of DNN-AEs is completed, the AP broadcasts the weights of the DNN-AEs to all STAs and enters into the exploitation phase as shown in Fig. 5.1(b). In the exploitation phase, each STA uses DNN-AEs to compress its CSI and reports the compressed CSI to the AP. The AP uses DNN-AEs to decompress the received CSI for the construction of beamforming filters. 5.4.1 DNN-AEs Autoencoder is a type of artificial neural network used to learn efficient data coding in a self- supervised manner. One of its applications is to learn a representation for a set of data for dimen- sionality reduction. Autoencoders are effectively used for solving many applied problems, ranging from face recognition to acquiring the semantic meaning of words. In this work, we take advantage of recent advances in DNN-AEs to compress CSI for 802.11 MU-MIMO protocols. We consider a DNN-AE as shown in Fig. 5.1, which is composed of two parts: encoder and decoder. The encoder will be used on each STA to compress its estimated CSI for feedback, and the decoder will be used at the AP to recover CSI for construction of beamforming filters. 135 π /2 2π Phase (rad) Phase (rad) π /4 π 0 0 0 25 50 0 25 50 Subcarrier index Subcarrier index (a) An angle in Ψ on 52 subcarriers (measured (b) An angle in Φ on 52 subcarriers (measured PSE is 0.11). PSE is 0.25). Figure 5.3: Angle instances in Ψ and Φ as well as their PSE. 0.02 0.02 PDF 0.01 PDF 0.01 0 0 0 π /4 π /2 0 π 2π Phase(rad) Phase(rad) (a) PDF of the angles in Ψ. (b) PDF of the angles in Φ. Figure 5.4: Distribution of the measured angles over all subcarriers and at many locations in a real-world office environment. Compressibility of Ψ and Φ. Before delving into the details of DNN-AEs, we introduce a metric to quantify compressibility of angles on an observation basis. The compressibility metric will lay the foundation for our design of DNN-AEs. Consider an angle sequence θ = [θ1 , θ2 , · · · , θK ]. Denote its FFT output as ϑ = [ϑ1 , ϑ2 , · · · , ϑK ]. Then, we define PSE of θ as follows: K 1 X PSE(θ) = − p(ϑk ) log2 p(ϑk ), (5.1) log2 K k=1 |ϑ |2 where p(ϑk ) = PK k 2 [65]. Apparently, the PSE of an angle sequence is bounded in [0 1]. i=1 |ϑi | In our case, its value reflects the uncertainty of a random angle or fluctuations of a measured angle over subcarriers. Intuitively, a low value of PSE indicates high compressibility, while a high value of PSE indicates low compressibility. In WLANs, STAs are semi-stationary and work on a limited bandwidth. In such an envi- 136 8.1 8 2 1.79 1.69 1.51 1.35 Error (%) Error (%) 1.06 4 1 0.92 0.92 0.57 0.51 0.68 0.38 0 0 7 6 5 4 3 2 10 9 8 7 6 5 Code dimension Code dimension (a) Error for ψ angles. (b) Error for ϕ angles. Figure 5.5: Compression error for different code dimensions. ronment, the channels between an AP and STAs are prone to be frequency-flat, and the channel responses on adjacent subcarriers are highly correlated. Fig. 5.3 exhibits an angle in Ψ and an an- gle in Φ over 52 valid subcarriers in 20 MHz bandwidth at 2.484 GHz as well as their PSE values. It is evident that both PSE values are much less than 1, indicating the compressibility of the angles. Separate DNN-AEs for Ψ and Φ. For an STA, it needs to first compress Ψ and Φ, and then report compressed Ψ and Φ to the AP. A natural question to ask is whether an STA should use the same DNN-AE for both Ψ and Φ. To explore an answer to this question, we empirically study the compressibility of the angles in Ψ and Φ. Specifically, we collected the CSI angles for Ψ and Φ at the STAs that were widely distributed in a real-world office environment, and plotted the PDF of the collected angles. Fig. 5.4 shows our measured results. We can see that the angles in Ψ is non- uniformly distributed, while the angles in Φ are almost uniformly distributed. Based on collected CSI angles, the measured PSE of Ψ is 0.09, and the measured PSE of Φ is 0.23. The measurement results indicate that the angles in Ψ and Φ have different levels of compressibility. Given that the compression ratio is determined by the DNN-AE structure (the ratio of dimension of the input layer to that of the latent layer), we employ two different DNN-AEs for the compression of Ψ and Φ. DNN-AE Settings. Another question to ask is about the parameter selection of the two DNN- 137 f(•) 52 x 8 x 3 x 8 x 52 f -1(•) g(•) 52 x 16 x 8 x 16 x 52 g-1(•) f(•) f -1(•) g(•) g-1(•) f(•) f -1(•) g(•) g-1(•) f(•) f -1(•) g(•) g-1(•) f(•) f -1(•) g(•) g-1(•) f(•) f -1(•) g(•) g-1(•) Encoder Decoder Encoder Decoder (a) DNN-AE for Ψ. (b) DNN-AE for Φ. Figure 5.6: Illustration of two different DNN-AEs for Ψ and Φ. AEs, including the number of layers, the number of neurons on each layer, quantization bits, and dimension of the latent layer. Unfortunately, there is no systematic approach that we can utilize to determine the optimal values for these parameters. Therefore, we focus only on the dimension of the latent layer (a.k.a. code dimension) as it is the most important parameter for a DNN-AE. Fig. 5.5 presents the compression error of DNN-AEs for different code dimensions. Using 1.5% error as reference, we select the code dimension that offers the best compression rate. As such, our design choices are 52 × 8 × 3 × 8 × 52 for Ψ’s DNN-AE and 52 × 16 × 8 × 16 × 52 for Φ’s DNN-AE, as shown in Fig. 5.6. 5.4.2 Online Training: Data Collection As illustrated in Fig. 5.1, the AP takes advantage of existing 802.11 protocols to train the DNN-AEs. That is, AP and STAs perform downlink MU-MIMO transmissions using the 802.11 protocol as shown in Fig. 5.2. In the meantime, the AP trains the DNN-AEs using reported CSI (uncompressed Ψ and Φ) from the STAs. By doing so, the AP can train the DNN-AEs by collecting side infor- mation from the existing MU-MIMO protocol, and the training remains transparent to the STAs. In the course of data collection, care should be taken for the following two tasks. Avoiding Garbage-In/Garbage-Out. To collect a meaningful dataset for training DNN-AEs, the AP needs to block out garbage CSI reports from STAs. In real WLANs, an STA may fail in es- 138 timating accurate CSI due to various sources of errors such as time and frequency synchronization errors. As a garbage report has intrinsically a noise-like behavior, several dominant components exist in its spectral representation. Therefore, the PSE of such a report is high likely to be overly high. The AP leverages PSE metric in (5.1), and blocks out the sequences with abnormal PSE. The abnormality is detected by adjusting appropriate thresholds. In our experiment, we assumed that an abnormal angle in Ψ has PSE ≥ 0.25 and that an abnormal angle in Φ has PSE ≥ 0.5. Avoiding Overrepresentation. Another important task of the AP is to prepare a balanced data set. In a typical WLAN, a static STA like smart TV remains at a fixed location without quitting the WLAN, while a mobile STA wanders through coverage range and may quit the WLAN for a while. A static STA may temporally experience correlated large-scale fading, making its historical CSI reports highly correlated. In light of this, the CSI reports from static STAs might be over- represented, making the DNN-AEs biased in favor of themselves. To avoid overrepresentation, the AP divides the PSE range into 100 uniform bins. If the AP receives 20 consecutive CSI samples of the same PSE value from the same STA, it will ignore the subsequent CSI samples from this STA, until the PSE value of its CSI samples changes. Here, PSE values within a PSE bin are considered the same. 5.4.3 Online Training: Data Preprocessing After clearing and balancing the collected datasets, the AP preprocesses the datasets before feeding them into DNN-AEs for training. In what follows, we first describe the purpose of data prepro- cessing and then present the preprocessing procedure for the two sets of angles. Purpose of Data Preprocessing. To avoid biased training and boost the convergence for DNN- AEs, we wish to obtain the training datasets with a normalized zero-mean probability density function and uniform subcarrier-wise variance in the feasible space [83]. Such datasets are likely 139 0.02 Variance 0.2 PDF 0.01 0.1 0 0 0 π /4 π /2 0 25 50 Phase(rad) Subcarrier index (a) Probability density function and variance before preprocessing. 0.02 0.4 Variance PDF 0.01 0.2 0 0 -0.8 0 1 0 25 50 Normalized phase Subcarrier index (b) Probability density function and variance after preprocessing. Figure 5.7: The probability and variance of the angles in Ψ before and after rectification. to render an unbiased training for DNN-AEs and yield a high compression ratio. Unfortunately, the collected angles in Ψ and Φ do not meet these two conditions (normalized zero-mean distribution and flat subcarrier-wise variance). Therefore, we preprocess the collected datasets with the aim of rectifying their distributions to accelerate the training. Preprocessing of Angles in Ψ. Fig. 5.7(a) shows the probability and variance of the angles in Ψ before the preprocessing. As it can be seen, the angles in Ψ are non-uniformly distributed within their range. To alleviate this issue, we apply a rectification function f (·) at the encoder and de-rectification function f −1 (·) at the decoder, as shown in Fig. 5.6(a). Here, we employ  f (ψk ) = α ψk − ψ̄ as the rectification function, where ψ̄ is the average of the angles in Ψ and α is a normalization constant. In our experiments, we use ψ̄ = 0.68 rad and α = 1.12. After the rectification, the angles will have zero mean and uniform variance over different subcarriers, thereby improving the convergence of the DNN-AEs [92] and avoiding zigzag behavior in gradient descent algorithms [93]. Fig. 5.7(b) shows the probability and variance of the angles in Ψ after the preprocessing. As it 140 0.08 30 Variance PDF 0.04 15 0 0 -6π -3π 0 3π 6π 0 25 50 Phase(rad) Subcarrier index (a) Probability density function and variance before preprocessing. 0.03 Variance 0.2 PDF 0.1 0 0 -1 -0.5 0 0.5 1 0 25 50 Normalized phase Subcarrier index (b) Probability density function and variance after preprocessing. Figure 5.8: The probability and variance of the angles in Φ before and after rectification. can be seen, the probability density function has a zero mean after the preprocessing, which leads to a disciplined training for the corresponding DNN-AE. Preprocessing of Angles in Φ. Compared to Ψ, the preprocessing of Φ is a bit more tricky. Fig. 5.8 shows the probability density function and subcarrier-wise variance of the angles in Φ measured in real WLANs. The non-uniform probability distribution, non-uniform variance, high variance on each subcarrier, and the large range (even beyond [−4π, 4π]) make the angles in Φ unsuited for training. Preprocessing is needed to rectify the dataset to improve the convergence of the DNN-AE and avoid biased training. One approach that one may think of to rectify the angles is to wrap the angles into [0, 2π) using a simple function g(ϕ) = mod(ϕ, 2π). This approach, however, is not an effective one. Fig. 5.9 shows an example of this rectification function. It can be seen that the rectified angle curve appears to be discontinuous. However, the discontinuity of the rectified data cannot be captured by the DNN-AE, as illustrated in the figure. Therefore, a continuous rectification function is needed for the preprocessing of Φ. 141 2 Phase (rad) Original Reconstructed 0 0 10 20 30 40 50 Subcarrier index Figure 5.9: Illustrating the underlying problem of the rectification function g(ϕ) = mod(ϕ, 2π) for the angles in Φ. In light of this requirement, we propose a piece-wise function to rectify the angles in Φ before feeding them into the DNN-AE:    1 2π (ϕk − 0.07) if mink (ϕk ) < 0,        g(ϕk ) = 1 (5.2)   2π (ϕk − 6.16) if maxk (ϕk ) > 2π,     π1 (ϕk − 3.13) otherwise,    for k = 1, 2, · · · , 52. In this equation, the values of 0.07, 6.16, and 3.13 are the mean of the angles in their respective category and obtained from our experimental measurements. It is noteworthy that coefficient 1/π in the third equation in (5.2) differs from the other two. This is because the angles in this category have a small range and thus a small normalization coefficient is used for scaling. Fig. 5.8(b) shows the probability density function and subcarrier-wise variance of all the an- gles in Φ after preprocessing. Compared to the distribution and variance before preprocessing as shown in Fig. 5.8(a), it is evident that this preprocessing can flatten both probability and variance distributions, making the DNN-AE easy to converge. Given that g(ϕk ) is used for data preprocessing on the encoder side, an inverse function is needed on the decoder side to recover the original angles. However, g(ϕk ) is a piece-wise function and it is not invertible. To address this challenge, we use two bits to indicate the sub-function used 142 for rectification, i.e., “00” means g(ϕk ) = 2π 1 (ϕ − 0.07), “01” means g(ϕ ) = 1 (ϕ − 6.16), k k 2π k and “10” means g(ϕk ) = π1 (ϕk − 3.13). With these two bits, the decoder is capable of constructing g −1 (ϕk ) and inversing the preprocessing at the encoder. In the exploitation phase, each STA should send these indication bits to the AP via the over-the-air uplink channel. It is worth noting that these indication bits are of very small size compared to conventional CSI feedback. 5.4.4 Online Training: Settings and Procedure Training Procedure and Hyper-Parameter Tuning. We train the DNN-AEs shown in Fig. 5.6 using the preprocessed datasets. For the two DNN-AEs, each hidden layer is composed of a fully- connected layer followed by a batch-normalization layer to speed up the training convergence [72]. Also, Rectified Linear Unit (ReLU) activation function is used. The DNN-AEs are trained to minimize loss function, which is defined as the relative error: ∥x̂ − x∥ L(x, x̂) = , (5.3) ∥x∥ where x and x̂ represent the input sample and the corresponding reconstructed sample, respec- tively. The networks are trained using Adam optimizer [85]. We started the training with an initial learning rate of 0.001 and reduced it with a decay rate of 0.98 following a step-wise ap- proach. All parameters were initialized using Xavier initialization [50]. Dropout [154] is applied to all hidden layers to prevent over-fitting and improve the generalization of the model. The final architectures are the result of random search over hyper-parameters. All DNN-AEs are trained end-to-end using Pytorch v1.4 library [132]. Readiness of DNN-AEs for Exploitation. While the AP trains the DNN-AEs whenever it receives a batch of CSI reports from the STAs, a natural question to ask is about the criteria for the 143 completion of its training phase. In our experiments, we check the loss function of validation data to determine the readiness of the DNN-AEs. If the loss function of validation data is consistently less than 1.5%, we consider the completion of the training phase and the DNN-AEs are ready to use. The AP then broadcast the weights and bias values of the encoder parts of the two DNN-AEs as well as the preprocessing parameters to the STAs, so that the STAs can reconstruct the encoder part to compress the angles in Ψ and Φ, as shown in Fig. 5.1(b). Using 32 bits to represent each parameter (real number), the total overhead of transmitting the parameters of the trained DNN-AEs is 5.74 kB, where 1.80 kB is for the parameters of Ψ’s DNN-AE, and 3.94 kB is for the parameters of Φ’s DNN-AE. This airtime overhead of DNN-AEs broadcast is not an issue for two reasons. First, the broadcast takes place once for a very long period of time. Second, the broadcast is not time-sensitive and the AP can broadcast whenever it gets the resource. Keep Training DNN-AEs. While the AP has broadcast the DNN-AEs to the STAs, there might be some STAs incapable of utilizing the DNN-AEs for CSI compression. For example, some incumbent STAs may support MU-MIMO but do not support autoencoder-based CSI compression. In such a case, the AP can instruct these STAs to report CSI without compression and use the uncompressed CSI reports for the construction of beamforming filters as that in exiting 802.11 protocols. In the meantime, the AP can use the uncompressed CSI reports from those STAs to keep training the DNN-AEs. Updating DNN-AEs. During the training in exploitation phase, the AP will periodically use validation data to check the loss function. It rebroadcasts the DNN-AEs to STAs whenever it detects a stable improvement in trained DNN-AEs. Furthermore, the AP rebroadcasts the updated DNN-AEs to the STAs whenever it observes an increase (e.g., 5%) in downlink packet error rate. Such an event simply means that the DNN-AEs in use are outdated. It is noteworthy that we did not observe a failure of the DNN-AEs in our experiments even though we moved the testbed 144 Beamforming Angles Alg. 1 (Givens Rotation) ^ Ѱ ^ Փ Փ Ѱ g -1(•) g(•) -1 f (•) f(•) Frame recovery Frame assembly -1 Qϕ(•) Qϕ(•) -1 Qψ(•) Qψ(•) AP STAs Figure 5.10: CSI compression at STA and decompression at AP. significantly. We enforce this mechanism just to improve the robustness of our design. 5.4.5 Real-Time DNN-AEs Exploitation: CSI Compression After the AP completes the training phase, the WLAN enters into the exploitation phase. In this phase, the AP and STAs still use the existing MU-MIMO protocols shown in Fig. 5.2 for downlink MU-MIMO transmissions, except that DNN-AEs are used for CSI compression of the channel feedback. In what follows, we describe CSI compression at an STA and CSI decompression at the AP, respectively. STA-Side Operations. Fig. 5.10 shows the CSI compression operations at a STA. The STA first estimates the CSI and then converts the estimated CSI to two sets of angles. Then, the two sets of angles are preprocessed and fed into the encoders of DNN-AEs for compression. After that, quantization is performed on the output, followed by frame assembly for uplink CSI report. A question to ask is how many bits should be used for the quantization of the output of DNN-AEs’ encoders. While there is no analytical guidance to answer this question, we resort to experimental tests. We found that the angles in Φ are more sensitive to quantization errors than the angles in 145 Ψ. We also observed that the setting of 5 bits for each output of Ψ’s DNN-AE and 8 bits for each output of Φ’s DNN-AE is a good trade-off between performance and airtime overhead. In our experiments, we will stick to this quantization setting. AP-Side Operations. Fig. 5.10 shows the CSI decompression operations at the AP, which try to recover the original CSI based on the compressed angles from an STA. The decompressed CSI will be used to construct the beamforming filters (e.g., using SVD-based precoding methods) for downlink MU-MIMO transmissions. 5.4.6 Compression Ratio and Airtime Overhead As presented in Section 5.3.1, the existing MU-MIMO protocols employ two types of CSI feedback quantization options and can group different numbers of subcarriers for CSI feedback. Then, the number of bits required for CSI feedback can be expressed as Nsc Na (p + q)/Ng , where Nsc is the number of valid subcarriers, Na is the number of angle-sequences in Ψ or Φ, p and q are the number of quantization bits as shown in Alg. 5.1, and Ng is the number of subcarriers in a group. Per the IEEE 802.11ac, we have (p, q) ∈ {(5, 7), (7, 9)}, Ng ∈ {1, 2, 4}. LB-SciFi uses two DNN-AEs to compress the angle sequences in Ψ or Φ. Based on the DNN-AE settings and quantization bits as shown in Fig. 5.10, the number of feedback bits is Na (5 × 3 + 8 × 8 + 2) = 81Na . Therefore, the compression ratio of LB-SciFi can be written as: 81Ng compression_ratio = 1 − , (5.4) 52(p + q) where (p, q) ∈ {(5, 7), (7, 9)} and Ng ∈ {1, 2, 4} as specified in the IEEE 802.11ac [67]. Based on (5.4), it is easy to check that LB-SciFi can achieve significant compression compared to the existing protocols. The compression ratio ranges from 48.1% to 90.3%, depending on the 146 setting of the existing channel feedback protocol. While LB-SciFi can significantly reduce the quantity of CSI feedback, a question to ask is about the quality of its compressed feedback, in- cluding the feedback error and the impact on downlink MU-MIMO. We will provide experimental results to answer this question in the next section. 5.4.7 Limitations Some limitations of LB-SciFi are discussed as follows. Compression Settings. LB-SciFi involves many parameters such as the number of layers in DNN-AEs, the number of neurons on each layer, the number of bits for quantization, and the pre- processing function parameters. These parameters are empirically chosen in our design, and there is no systematic approach to determine the optimal values of those parameters. So, essentially, LB-SciFi is heuristic and cannot offer any guarantee on its compression loss performance. Dataset Size. The key phase of LB-SciFi is training the two DNN-AEs. However, there is no guideline on how many data samples suffice for the two DNN-AEs’ training. Our experiments show that 13, 100 data samples can achieve at least 98.5% compression accuracy. However, this number is not generic and may change in other network environments. In general, the required dataset size for the DNN-AEs’ training remains unknown. Variability of Physical Environment. When there is a significant change in the surround- ings of the AP (e.g., a metal desk placed in front of the AP or the AP is moved into a distinct environment), re-training will be triggered to update the DNN-AEs. LB-SciFi cannot offer a time guarantee on the re-training as it depends on the speed of data collection. 147 AP AP AP STA 1 STA 2 STA 1 STA 2 STA 3 STA 1 STA 2 STA 3 STA 4 (a) Two-user case. (b) Three-user case. (c) Four-user case. Figure 5.11: Experimental setup for downlink MU-MIMO. 5.5 Experimental Evaluation In this section, we evaluate the performance LB-SciFi in comparison with existing 802.11 protocols in an indoor wireless environment. For ease of exposition, we use 802.11-TiGj to denote the IEEE 802.11 MU-MIMO protocol with Type i feedback and j subcarriers in a group, where i ∈ {0, 1} and j ∈ {1, 2, 4} (see Section 5.3.1 and [67]). Since T1G1 represents the finest feedback and T0G4 represents the coarsest feedback, we will use these two protocols as our performance comparison baseline. 5.5.1 Experimental Setup and Implementation Downlink MU-MIMO. We consider a WLAN as shown in Fig. 5.11, where the AP can serve two, three, or four STAs simultaneously in downlink. While there are many different beamform- ing methods in the literature, we used ZF beamforming method in our experiments owing to its popularity and ease of implementation. Implementation of AP and STAs. Fig. 5.12(a–b) shows our wireless testbed. The AP and STAs are built using USRP N210 devices and general-purpose computers. Each USRP N210 de- vice is equipped with VERT2450 Antenna for radio signal transmissions at 2.484 GHz. The com- puters are used for baseband signal processing and MAC protocol implementation. More specifi- cally, the AP is implemented using a Dell Inspiron 3671 Desktop, which serves eight USRP N210 148 AP (c) STA- random location 6m 2 STA- case study 1 3 32 m Figure 5.12: Illustrating our wireless testbed and test environment. (a) Prototyped STA. (b) Proto- typed AP. (c) Floor plan of tests. devices through a 10Gb fiber optic cable and a DGS-1210-20/ME Ethernet switch. Each STA is prototyped with a Lenovo ThinkPad T480 and one USRP N210 device. Implementation of 802.11 Protocols. IEEE 802.11 protocols are implemented with the legacy PHY and MAC layers specifications. We use IEEE 802.11 frame format with 64 subcarriers for OFDM modulation. Out of these 64 subcarriers, 48 subcarriers carry payload and 4 subcarriers contain pilots. The sampling rate and carrier frequency are set to 20 MSps and 2.484 GHz, respec- tively. Also, the maximum transmission power is set to 15 dBm. All the necessary 802.11 baseband signal processing modules are realized with C++ in GNU Radio. For ease of implementation, our 802.11 protocols do not include user scheduling. Implementation of LB-SciFi. LB-SciFi is implemented on top of 802.11 protocols. It mainly deals with collecting datasets and training DNN-AEs. On our testbed, the training datasets are automatically generated in the 802.11 protocols. With the collected datasets, DNN-AEs are trained end-to-end using Pytorch v1.4 library [132] and Adam optimizer [85]. 149 Experimental setting. Fig. 5.12(c) shows an office scenario where we conducted the experi- ments. The AP is placed at the spot marked as a blue square in the figure, while each STA is placed at a random location marked as a red circle. 5.5.2 DNN-AEs Training and Feedback Data Collection Campaign. We ran the MU-MIMO communications shown in Fig. 5.11(c) to collect data for DNN-AEs training at the AP. Specifically, the AP performs downlink MU-MIMO communications using the 802.11 protocols and, at the same time, it takes advantage of the CSI reports from the STAs for DNN-AEs training. The data collection campaign was conducted during two business days from 10am to 8pm. The human activity level in the environment was high between 11am to 2pm and low to moderate in other periods of time. To cover all areas, we were moving the STAs around all locations. This can be achieved in real systems thanks to the mobility of some Wi-Fi devices such as phones and laptops. In our experiments, the AP eventually collected 60, 000 samples from the STAs for training the DNN-AEs. Sufficiency of Collected Data. A question to ask is whether 60, 000 samples suffice for DNN- AEs’ training. To answer this question, we conduct convergence test under two criteria: i) the test loss of DNN-AE should be less than 1.5%, and ii) the loss difference for two validations should be less than 0.1%. With such two criteria, Ψ’s DNN-AE converges with 7, 300 samples, and Φ’s DNN-AE converges with 13, 100 samples. This indicates that 60, 000 samples suffice for training. Computational Complexity of Training. In our experiments, the training process takes less than 5 seconds on a Desktop PC with i5 CPU and 16 GB memory. A question is how much time is needed for training DNN-AEs on a commodity AP (Wi-Fi router). Since most commodity APs are equipped with an ARM processor, we expect that a commodity AP may take minutes to complete the training. In addition, we note that the training process is not time-sensitive, and an AP can take 150 10 1.0 Error (%) Overhead 5 0.5 0 0.0 L T T T T T T L T T T T T T B 0 0 0 1 1 1 B 0 0 0 1 1 1 -S G G G G G G -S G G G G G G c 1 2 4 1 2 4 c 1 2 4 1 2 4 iF iF i i (a) Normalized feedback error. (b) Normalized feedback overhead. Figure 5.13: Feedback comparison between LB-SciFi and 802.11 protocols (T0G1, T0G2, T0G4, T1G1, T1G2, and T1G4). its spare time to complete the training. If an AP is not capable of doing the training by itself, it can take advantage of its wired Internet connection and a cloud server to run the training. Feedback Error. With the completion of the first training, we examine the performance of the DNN-AEs. LB-SciFi introduces CSI error during the feedback. The feedback error can be attributed to two sources: compression and quantization. The compression error comes from the imperfection of the DNN-AEs, and the quantization error comes from the limited quantization bits. The normalized feedback error can be quantified by the loss function in (5.3). As a comparison baseline, we also measure the normalized feedback error in 802.11 protocols, where the error comes from the quantization of Ψ and Φ as well as the subcarrier grouping. Fig. 5.13(a) shows our measured normalized feedback errors. It can be seen that LB-SciFi has a larger feedback error than 802.11-T0G1/T1G1 protocols, and it has a smaller feedback error compared to 802.11-T0G2/T0G4/T1G2/T1G4 protocols. This is because 802.11-T0G1/T1G1 pro- tocols do not compress the CSI in the spectral domain while other protocols naively compress CSI in the spectral domain. Feedback Overhead. While LB-SciFi introduces larger error than 802.11-T0G1 and 802.11- T1G1, it uses much smaller uplink airtime resource for CSI feedback and therefore entails much smaller overhead. Fig. 5.13(b) compares the normalized feedback overhead of LB-SciFi with the 151 existing 802.11 protocols. It can be seen that LB-SciFi entails much less overhead compared to 802.11 protocols. LB-SciFi’s overhead is 0.1 while the lowest normalized overhead among IEEE 802.11 protocols is 0.2. Also, LB-SciFi’s compression ratio ranges from 48.1% to 90.3%, thereby conserving much airtime resource for data transmissions. 5.5.3 LB-SciFi: Performance Metrics We now focus on the overall performance of downlink MU-MIMO. We will consider the following performance metrics. EVM. EVM is widely used to assess the quality of received signals at a receiver device. It is   E[|X̂−X|2 ] defined as follows: EVM = 10 log10 2 , where X and X̂ are the original and estimated E[|X| ] signals on a subcarrier of an OFDM symbol, respectively. Gross Throughput. Gross throughput refers to the data rate achieved by a device (AP or STA) without taking into account the CSI overhead. For STA i, based on the EVM of its decoded signal, N sp its gross throughput can be extrapolated as follows: ri = N +N · b · γ (EVMi ), where Nsp = 48 fft cp is the number of subcarriers carrying payload, Nfft = 64 is FFT points, Ncp = 16 is the length of cyclic prefix, b = 20 is the sampling rate, and EVMi is EVM of the STA i’s decoded signal, and γ(EVMi ) is the average number of bits carried by one subcarrier. This parameter is given in P Table 2.2. As such, the gross throughput at the AP can be computed by r = i ri . Net Throughput. The net throughput refers to the data rate achieved by a device after sub- tracting the overhead mainly caused by CSI feedback in the MU-MIMO protocols. Denote r̄ as P the net throughput achieved by the AP. Then, it can be expressed by: r̄ = i ti ri , maxi {ti }+toverhead where toverhead is the time duration of overhead (NDPA, NDP, Poll, CBR, and ACK) and ti is the time duration required by STA i for its downlink data transmission (see Fig. 5.2). While the value of toverhead is fixed, the value of ti is not. ti is determined by the downlink data packet 152 size and selected modulation and coding scheme. In real WLANs, a data packet should not exceed 2304 bytes [135]. In our experiments, we consider the maximum packet size to measure the lowest throughput gain that can be achieved by LB-SciFi. 5.5.4 Micro Performance of LB-SciFi: A Case Study We use a case study to examine the micro performance of LB-SciFi. We consider the network shown in Fig 5.11(b) and place the three STAs at the spots marked with golden stars in Fig. 5.12(c). We compare the performance of LB-SciFi with 802.11-T1G1/T0G4 protocols. EVM.: We conduct downlink MU-MIMO transmissions using LB-SciFi, 802.11-T0G4, and 802.11-T1G1. Fig. 5.14 exhibits the constellation of the decoded data packet at each STA with the three protocols. As shown in Fig. 5.14(a), LB-SciFi achieved −16.5 dB EVM at STA 1, −19.0 dB EVM at STA 2, and −19.6 dB EVM at STA 3. In contrast, Fig. 5.14(b) shows the achieved EVM at the three STAs when 802.11-T0G4 is used; and Fig. 5.14(c) shows the achieved EVM at the three STAs when 802.11-T1G1 is used. It can be seen that LB-SciFi achieves an EVM performance sim- ilar to 802.11-T1G1 and outperforms 802.11-T0G4. We note that the constellations in Fig. 5.14 can be successfully decoded thanks to the powerful LDPC channel code. It is also worth point- ing out that LB-SciFi can support any modulation and coding scheme as long as channel quality permits. Feedback Overhead. In the MU-MIMO transmissions, the CSI reports are transmitted from STAs to the AP using BPSK rate to ensure the feedback reliability [110]. Table 5.1 lists the feed- back overhead using different protocols. As we can see from the table, LB-SciFi entails 0.6 kbit feedback overhead per STA. In contrast, 802.11-T0G4 entails 1.1 kbit feedback overhead per STA, and 802.11-T1G1 entails 5.8 kbit feedback overhead per STA. Gross and Net Throughput. Table 5.1 lists each STA’s and the AP’s gross/net throughput. 153 1 1 1 0 EVM= -16.5 dB 0 EVM= -19.0 dB 0 EVM= -19.6 dB -1 -1 -1 -1 0 1 -1 0 1 -1 0 1 (a) LB-SciFi: Constellation measured at three STAs. 1 1 1 0 EVM= -16.2 dB 0 EVM= -18.3 dB 0 EVM= -18.8 dB -1 -1 -1 -1 0 1 -1 0 1 -1 0 1 (b) 802.11-T0G4: Constellation measured at three STAs. 1 1 1 0 EVM= -16.4 dB 0 EVM= -19.3 dB 0 EVM= -20.0 dB -1 -1 -1 -1 0 1 -1 0 1 -1 0 1 (c) 802.11-T1G1: Constellation measured at three STAs. Figure 5.14: Constellations of decoded signals at STAs when using LB-SciFi, 802.11-T0G4, and 802.11-T1G1. Table 5.1: Experimental results of the case study for LB-SciFi and 802.11 protocols. STA 1 STA 2 STA 3 AP EVM (dB) -16.5 -19.0 -19.6 – LB-SciFi Feedback overhead (kbit) 0.6 0.6 0.6 – Gross throughput (Mbps) 24.0 36.0 36.0 96.0 Net throughput (Mbps) 15.9 23.9 23.9 63.7 EVM (dB) -16.2 -18.3 -18.8 – T0G4 Feedback overhead (kbit) 1.1 1.1 1.1 – Gross throughput (Mbps) 24.0 24.0 24.0 72.0 Net throughput (Mbps) 15.0 15.0 15.0 45.0 EVM (dB) -16.4 -19.3 -20.0 – T1G1 Feedback overhead (kbit) 5.8 5.8 5.8 – Gross throughput (Mbps) 24.0 36.0 36.0 96.0 Net throughput (Mbps) 9.4 14.1 14.1 37.6 154 1 1 1 LB-SciFi LB-SciFi LB-SciFi T0G4 T0G4 T0G4 CDF T1G1 CDF T1G1 CDF 0.5 0.5 0.5 T1G1 0 0 0 -30 -20 -10 0 30 60 0 10 20 EVM (dB) Throughput (Mbps) Throughput(Mbps) (a) EVM (b) Gross throughput (c) Net throughput Figure 5.15: Comparison of LB-SciFi and 802.11 protocols in the two-user MU-MIMO network. We can see that LB-SciFi’s gross throughput is larger than 802.11-T0G4 but less than 802.11- T1G1. However, LB-SciFi’s net throughput is larger than both of them. The overall net throughput gain of LB-SciFi is 41.7% over 802.11-T0G4 and 68.8% over 802.11-T1G1. 5.5.5 Macro Performance of LB-SciFi: Extensive Results We now extend our case study to a more generic scenario. We consider the three networks in Fig. 5.11 and measure their performance at many different locations as shown in Fig. 5.12. Our evaluation methodology follows the previous case study. Two-User MIMO. Fig. 5.15 presents the CDF of our measured EVM, gross throughput, and net throughput over all locations when the AP serves two STAs. Per Fig. 5.15(a), the average EVM of decoded signals at the two STAs is −20.7 dB for LB-SciFi, −19.1 dB for 802.11- T0G4, and −21.2 dB for 802.11-T1G1. Compared to T0G4, LB-SciF has 1.6 dB EVM improve- ment. Compared to T1G1, LB-SciF has 0.5 dB EVM degradation. Per Fig 5.15(b), LB-SciFi achieves an average of 35.8 Mbps per-STA gross throughput, while 802.11-T0G4 and 802.11- T1G1 achieve 30.2 Mbps and 38.7 Mbps, respectively. Per Fig 5.15(c), LB-SciFi achieves an average of 17.6 Mbps per-STA net throughput, while 802.11-T0G4 and 802.11-T1G1 achieve 14.1 Mbps and 8.8 Mbps, respectively. The results indicate that LB-SciFi offers 25.0% net through- put gain over 802.11-T0G4 and 99.8% gain over 802.11-T1G1. [187] proposed a 3-dimensional (time, frequency, and quantization) Adaptive Feedback Com- 155 1 1 1 LB-SciFi T0G4 CDF CDF CDF T1G1 0.5 0.5 0.5 LB-SciFi LB-SciFi T0G4 T0G4 T1G1 T1G1 0 0 0 -30 -20 -10 0 30 60 0 10 20 EVM (dB) Throughput (Mbps) Throughput(Mbps) (a) EVM. (b) Gross throughput. (c) Net throughput. Figure 5.16: Comparison of LB-SciFi and 802.11 protocols in the three-user MU-MIMO network. pression (AFC) scheme for WLANs. While LB-SciFi is orthogonal to the time-domain AFC, we compare LB-SciFi with the frequency-domain AFC. Experimental results in [187] show the frequency-domain AFC achieves 12.7% throughput gain when compared to 802.11-T1G1 (“Size1” in its Fig. 13b). LB-SciFi achieves an average of 99.8% throughput gain over 802.11-T1G1. The comparison result is not surprising because LB-SciFi exploits DNN-AEs to reduce channel’s inter-subcarrier correlation for feedback compression, rather than simply grouping a subset of sub- carriers for feedback compression. Three-User MIMO. Fig. 5.16 presents the CDF of our measured EVM, gross throughput, and net throughput over all locations when the AP serves three STAs. Per Fig 5.16(a), the average EVM of decoded signals at the three STAs is −16.5 dB for LB-SciFi, −15.3 dB for 802.11-T0G4, and −16.8 dB for 802.11-T1G1. Per Fig 5.16(b), LB-SciFi achieves an average of 23.3 Mbps per- STA gross throughput, while 802.11-T0G4 and 802.11-T1G1 achieve 20.0 Mbps and 24.0 Mbps, respectively. Per Fig 5.16(c), LB-SciFi achieves an average of 10.5 Mbps per-STA net throughput, while 802.11-T0G4 and 802.11-T1G1 achieve 8.4 Mbps and 4.9 Mbps. Therefore, LB-SciFi offers 25.7% net throughput gain over 802.11-T0G4 and 116.8% net throughput gain over 802.11-T1G1. Four-User MIMO. Fig. 5.17 presents the CDF of our measured EVM, gross throughput, and net throughput over all the locations when the AP serves two STAs. Per Fig 5.17(a), the average EVM of decoded signals at the four STAs is −14.5 dB for LB-SciFi, −13.4 dB for 802.11-T0G4, and −14.9 dB for 802.11-T1G1. Per Fig 5.17(b), LB-SciFi achieves an average of 18.3 Mbps per- 156 1 1 1 LB-SciFi T0G4 CDF CDF CDF T1G1 0.5 0.5 0.5 LB-SciFi LB-SciFi T0G4 T0G4 T1G1 T1G1 0 0 0 -30 -20 -10 0 30 60 0 10 20 EVM (dB) Throughput (Mbps) Throughput(Mbps) (a) EVM. (b) Gross throughput. (c) Net throughput. Figure 5.17: Comparison of LB-SciFi and 802.11 protocols in the four-user MU-MIMO network. Net throughput (Mbps) 40 20 LB-SciFi 802.11-T0G4 802.11-T1G1 0 Two-user Three-user Four-user Test Scenario Figure 5.18: Net throughput of LB-SciFi and 802.11 protocols. STA gross throughput, while 802.11-T0G4 and 802.11-T1G1 achieve 15.6 Mbps and 19.0 Mbps. Per Fig 5.17(c), LB-SciFi achieves an average of 8.3 Mbps per-STA net throughput, while 802.11- T0G4 and 802.11-T1G1 achieve 6.4 Mbps and 3.8 Mbps, respectively. LB-SciFi offers 28.9% net throughput gain over 802.11-T0G4 and 117.3% net throughput gain over 802.11-T1G1. Summary of Observations. We now focus on the net throughput achieved by the AP. Fig. 5.18 depicts the total net throughput achieved by the AP when it employs these three protocols. As it can be seen, the three protocols yield similar throughput in two-user, three-user, and four-user MIMO cases. On average, LB-SciFi achieves 26.5% net throughput gain compared to 802.11-T0G4 and 111.3% throughput gain over 802.11-T1G1. 157 5.6 Chapter Summary In this chapter, we presented LB-SciFi, an online learning-based channel feedback framework for existing IEEE 802.11 MU-MIMO protocols. LB-SciFi reduces the CSI feedback overhead for 802.11 protocols by leveraging recent advances in deep neural networks to compress CSI in the spectral domain without compromising the CSI feedback accuracy. The key component of LB-SciFi is an online training scheme, which requires no dedicated training datasets but takes advantage of available side information from existing 802.11 protocols to train the autoencoders. As such, LB-SciFi can be easily plugged into existing 802.11 protocols and thus amenable to prac- tical implementation. We have built a prototype of LB-SciFi on a wireless testbed and evaluated its performance in indoor wireless environments. Experimental results show that LB-SciFi can reduce the feedback overhead by 73% and increases the network throughput by 69% on average. 158 Chapter 6 A Learning-Based Channel Sounding and Resource Allocation for IEEE 802.11ax 6.1 Introduction After two decades of evolution from its genesis, Wi-Fi technology has become the dominant car- rier of the Internet traffic [181] and penetrated every aspect of our lives. With the continuous proliferation of the Internet-based applications, Wi-Fi market is growing at an unprecedented rate, and more than four billion Wi-Fi devices have shipped in 2019 alone [181]. To serve the large number of Wi-Fi devices and meet their high data rate demands, Wi-Fi networks are evolving from 802.11n/ac to 802.11ax so that a Wi-Fi AP is capable of utilizing the spectrum more efficiently and accommodating more Wi-Fi clients at the same time. Compared to the carrier-sense-based 802.11n/ac, 802.11ax features centralized resource allocation and fine-grained inter-device syn- chronization. With these two features, it introduces OFDMA and uplink MU-MIMO techniques for the first time. Although OFDMA and MU-MIMO has been well studied in cellular networks (see Table 6.1), their joint optimization in Wi-Fi networks remains scarce because OFDMA is introduced to Wi- Fi networks in 802.11ax for the first time. Given that cellular and Wi-Fi networks have different PHY and MAC layers, and that BSs and APs have very different computational power, the MU- 159 MIMO-OFDMA transmission schemes designed for cellular networks may not be suited for Wi-Fi networks, necessitating research efforts to innovate the MU-MIMO-OFDMA design for 802.11ax networks. Particularly, the MU-MIMO-OFDMA transmission in 802.11ax faces two challenges. First, to perform downlink MU-MIMO transmissions, an AP needs to have CSI for the construction of beamforming filters so that it can concurrently send independent data streams to multiple Wi-Fi clients on the same Resource Unit (RU). However, existing 802.11 channel sounding protocols are notorious for their large airtime overhead, which significantly compromises the throughput gain of MU-MIMO. Therefore, a low-overhead channel sounding protocol is needed. Second, the marriage of MU-MIMO and OFDMA largely expands the optimization space of resource allocation at an 802.11ax AP, making it infeasible to pursue an optimal resource allocation solution in real time due to the limited computational power of APs. Therefore, a low-complexity, yet efficient, algorithm is needed for an AP to solve the resource allocation problem. In this chapter, we study the channel sounding and resource allocation problems for down- link transmissions in an 802.11ax Wi-Fi network, where an AP serves many STAs on a set of pre-defined RUs jointly using MU-MIMO and OFDMA techniques. We assume that the AP is equipped with multiple antennas, while each STA is equipped with one antenna. In such an 802.11ax network, we propose a practical scheme, called DeepMux, to enhance the efficiency of downlink MU-MIMO-OFDMA transmissions by leveraging recent advances in DL. Deep- Mux addresses the above two challenges using DNNs, and it mainly comprises the following two key components: DLCS and DLRA. Both of them reside in APs and impose no computa- tional/communication burden to the STAs. To reduce the channel sounding overhead, DLCS in DeepMux compresses the frequency- domain CSI during the feedback procedure by leveraging the compression capability of DNNs. Specifically, instead of reporting CSI on all the grouped tones, each STA only reports the quan- 160 tized CSI on a small number of tones to the AP. Based on the limited CSI, the AP infers CSI over all tones using well-trained DNNs. Particularly, the AP takes advantage of channel reciprocity and uses uplink CSI, which is easy to obtain, to train the DNNs for downlink CSI, making the training process easy to conduct. To obtain a near-optimal resource allocation solution in real time at the AP, DLRA in DeepMux employs a DNN to solve a Mixed-Integer Non-Linear Programming (MINLP) optimization prob- lem. Specifically, DLRA decouples the complex resource allocation optimization problem into two sub-problems: RU assignment and power allocation. A DNN is then employed to compute a sub-optimal solution to the RU assignment sub-problem. Once RU assignment is determined, the original MINLP problem degrades to a Linear Programming (LP) problem, which is easy to solve. The contributions of this work are summarized as follows. • We have designed DLCS, a DL-based channel sounding protocol for 802.11ax networks. DLCS employs an online training process and requires no efforts from STAs. Numerical re- sults show that DLCS is capable of reducing the channel sounding overhead by 62.0%∼90.5% without sacrificing CSI feedback accuracy. • We have designed DLRA, a DL-based resource allocation algorithm for 802.11ax APs to perform efficient downlink transmissions. Numerical studies show that DLRA is capable of yielding a sub-optimal solution to MINLP resource allocation problems in polynomial time. • By combining DLCS and DLRA, we have designed DeepMux to enable efficient downlink MU-MIMO-OFDMA transmissions in 802.11ax networks. We have evaluated DeepMux on a wireless testbed. Experimental results show that DeepMux improves network throughput by 26.3%∼43.6% compared the greedy utilization of DoF by strongest STAs on each RU. 161 6.2 Related Work We focus our literature review on channel sounding and resource allocation in both Wi-Fi and cellular networks. 6.2.1 Channel Sounding Channel Sounding for Wi-Fi. The sounding overhead issue in Wi-Fi networks has been in focal point of view since accommodation of MU-MIMO in IEEE 802.11 standards. Existing re- search efforts have been invested to tackle this issue by optimizing channel sounding parame- ters [17, 33, 113], seeking new channel sounding paradigms [53, 110], or compressing CSI frames [79, 187]. As the pioneering trials of reducing sounding overhead, research efforts in [17, 33, 113] have exploited the semi-static nature of Wi-Fi networks to adaptively reduce the frequency of chan- nel sounding and avoid unnecessary sounding overhead. Implicit channel sounding has also been studied for rectifying sounding overhead [53, 110]. Although implicit channel sounding can sig- nificantly lower the overhead, it requires extra hardware for channel calibration and thus may not be amenable to low-cost Wi-Fi networks. DeepMux is orthogonal to these works as DLCS neither manipulates the channel sounding frequency nor employs implicit channel sounding. [187] and [79] are two prior efforts that reduce the channel sounding overhead by compressing CSI in the frequency domain. However, these two efforts require coordination from Wi-Fi clients to fully or partially compress CSI. In contrast, DLCS runs solely on Wi-Fi routers and requires no coordination from Wi-Fi clients. Simply put, DLCS is transparent to Wi-Fi clients. DLCS also differs from these two works in terms of computational complexity. Specifically, [187] and [79] require Wi-Fi clients to estimate CSI for all frequency tones while DLCS requires Wi-Fi clients to 162 Table 6.1: A summary of resource allocation schemes in Wi-Fi and cellular networks. Objective Network Mode Polynomial MU-MIMO sum-rate Fairness Latency Energy Wi-Fi Cellular Uplink Downlink complexity O n2.5  DeepMux ✓ ✓ ✓ ✓ [41] ✓ ✓ ✓ ✓ O n3 [88] ✓ ✓ ✓ ✓ [169, 170] ✓ ✓ ✓ ✓ [71] ✓ ✓ ✓ O n3  [16] ✓ ✓ ✓ [184] ✓ ✓ ✓ [121] ✓ ✓ ✓ ✓ ✓ O n3  [186] ✓ ✓ ✓ [58] ✓ ✓ ✓ ✓ O n2.5  [116] ✓ ✓ ✓ ✓ [45, 76, 141] ✓ ✓ ✓ ✓ estimate CSI only for a small number of tones. Learning-Based Channel Sounding in Cellular Networks. Sounding overhead is also a critical problem in cellular networks. Temporal correlation [97, 111, 172] and spatial correlation [97] have been harvested to remove the redundancy of CSI and reduce the airtime overhead of CSI acquisition. DeepMux differs from these works as it focuses on the frequency domain. Frequency- domain correlation of CSI has been studied in [179] and [54] to reduce the channel sounding overhead in cellular networks. DeepMux differs from these works because DLCS is transparent to users (i.e., imposing no computation on users). In addition, CSI in cellular networks is very different from that in Wi-Fi networks. DeepMux is meticulously tailored for Wi-Fi networks. Finally, most prior works are limited to theoretical investigations and numerical evaluations while DeepMux takes into account incumbent Wi-Fi protocols and has been validated in practical indoor wireless environments. 6.2.2 Resource Allocation Table 6.1 summarizes existing resource allocation schemes in cellular and Wi-Fi networks, where n denotes the number of active users served by an AP or a BS. Clearly, DeepMux differs from 163 existing works in terms of objective, network scenario, transmission mode, or computational com- plexity. In what follows, we elaborate the existing studies and point out the differences between DeepMux and these works. Resource Allocation for Wi-Fi Networks. Recently, [178] has studied downlink OFDMA in wireless local area networks (WLANs) and showed that its performance is highly dependent on the resource assignment strategies at APs. This problem has been followed in [41], with the objec- tive of improving the fairness among users. DLRA differs from the proposed resource allocation scheme in [41] as it focuses on pursuing a sub-optimal resource allocation scheme with a low com- putational complexity. [88] has considered the throughput maximization under the assumption that a user can be assigned to at most one RU and offered a solution for both uplink and downlink trans- missions. Compared to [88], DLRA expands the problem scope by allowing multiple RUs to serve a user and also by allowing an RU to serve multiple users concurrently. [169] and [170] are the only works considering downlink MU-MIMO-OFDMA in WLANs. However, these two works employ greedy iterative algorithms to compute a feasible solution. In contrast, DLRA employs learning-based approach and offers a solution in polynomial time. [16, 71, 184] studied resource allocation in uplink OFDMA WLANs, which is not the scope of our work. Resource Allocation in Cellular Networks. Since there are many research results of resource allocation in cellular networks, we focus our review on MIMO-OFDMA techniques. [121] has studied the resource allocation problem under latency constraint. However, the complexity of the proposed solution is prohibitively large. [186] has studied the resource allocation problem with the objective of enhancing energy efficiency. The authors has proposed an algorithm with polynomial-time complexity. However, it only works for single-user MIMO-OFDMA networks. [58] and [116] have investigated the resource allocation problem for MU-MIMO-OFDMA cellular networks and proposed low-complexity algorithms to compute the solutions. However, these two 164 works focus on maximizing energy efficiency. In contrast, DeepMux aims to maximize network throughput. [45, 76, 141] have explored downlink MU-MIMO-OFDMA transmissions in different network scenarios. These research efforts have proposed greedy algorithms to pursue optimal solutions for maximizing network throughput. DeepMux is very different from these works in terms of network settings and computational complexity. 6.3 Problem Description Consider an 802.11ax network comprising a multi-antenna AP and many single-antenna STAs. Denote Nap as the number of antennas on the AP. Denote Nsta as the number of STAs in the network. We consider a dense network where Nsta > Nap . In 802.11ax standard, OFDMA and MU-MIMO techniques have been included for efficient communications between the AP and its serving STAs. Fig. 6.1 shows the four possible RU configurations when the network works on 20 MHz bandwidth. As the figure shows, the total number of valid tones is 242, and an RU could consists of 26, 52, 106, or 242 tones. When MU-MIMO is enabled, an RU can serve multiple STAs, depending on the channel condition, data traffic, and network setting. In the downlink transmissions, in order for an AP to serve multiple STAs per RU, it needs to first perform channel sounding to obtain the CSI and then construct the spatial filters for beamforming. By doing so, independent data streams can be delivered to different STAs simultaneously. In this process, CSI is crucial. In what follows, we first present the existing channel sounding protocol and then state our design objectives. 165                 Figure 6.1: Four different RU configurations over 20 MHz as specified in IEEE 802.11ax [70]. 6.3.1 802.11 Channel Sounding Protocol in Nutshell Fig. 6.2 shows the channel sounding protocol specified in 802.11ax, and we elaborate on it in the following. Announcement. The AP initiates the channel sounding procedure by broadcasting an NDPA frame, which contains the addresses of intended STAs. Then, the AP sends out an NDP frame for STAs to estimate the downlink channels between themselves and the AP. Channel Estimation. Each STA leverages the preamble in the NDP frame to estimate the complex-valued channel vectors between the AP and itself. Reporting the raw channel vectors to the AP, however, entails too much airtime overhead. To reduce the airtime overhead, each STA employs GR and tone grouping to pre-process its estimated channel vectors. The pre-processing leads to a CSI compression in both spatial and spectral domains. Spatial compression. In its general form, the spatial compression includes a series of GR, pre- multiplications, and post-multiplications applied to the right singular vectors of a channel matrix to extract its spatial information [67, 68, 70]. Each rotation or pre-multiplication is realized by an angle, which stores a part of spatial information [156]. On each tone, two sets of angles will be generated: Nψ ψ-type angles from GR and Nϕ ϕ-type angles from pre-multiplications, where Nψ = Nϕ = 2Nap Nr − Nr2 − Nr /2 and Nr is the number of the STA’s antennas in general case  (we assumed Nr = 1 throughout this chapter). For notional simplicity, we denote these two sets 166 1'3$ 1'3 7%53 7%53 'DWD $3 )LUVWJURXS %5 %5 RI67$V %5 5HSRUWVIURPILUVWJURXS %5 /DVWJURXS %5 RI67$V %5 5HSRUWVIURPODVWJURXS Figure 6.2: Existing channel sounding protocol in IEEE 802.11ax.   over all tones as Ψ = ψi,k ∀i,k and Φ = ϕi,k ∀i,k , where i is the angle index (1 ≤ i ≤ Nψ ), k is the tone index (1 ≤ k ≤ Ntone ), and Ntone is the number of tones. Generally speaking, ψi,k ∈ [0, π/2) and ϕi,k ∈ [0, 2π). The angles will be quantized before being sent to the AP. In 802.11 standards, two types of quantization are specified for feedback: • Feedback type 0 uses 5 bits for each angle in Ψ and 7 bits for each angle in Φ. • Feedback type 1 uses 7 bits for each angle in Ψ and 9 bits for each angle in Φ. Tone Grouping. As Wi-Fi networks typically work in indoor scenarios for short-range commu- nications, their coherence bandwidth tends to be large. Hence, tone grouping has been employed to bond Ng tones. In 802.11ax standard [70], Ng = {1, 4, 16}. Particularly, Ng = 1 means that no grouping is employed. Also, Ng = 16 is only allowed with feedback type 1. Beamforming Report (BR). The BR frames carry the quantized angles (Ψ and Φ) from each STA to the AP. These frames are also used to carry the channel strength information (average SNR and SNR deviation for each group of tones) from each STA to the AP. Based on the reported SNR 167 information, the AP manages available spectral and power resources to serve STAs. Polling: Polling is a mechanism to coordinate the report process among STAs. Once all STAs have prepared their BR frames, the AP sends Trigger Beamforming Report Poll (TBRP) frames sequentially. Each TBRP frame coordinates a group of STAs to send their BR frames through uplink MU-MIMO as illustrated in Fig. 6.2. The AP decodes the BR frames and identifies the sender of each report using the MAC address in the corresponding frame. After polling all the groups, the AP obtains information required for downlink MU-MIMO transmission. 6.3.2 Design Objectives and Challenges The objectives of this work are to design and evaluate a practical, yet efficient, downlink MU- MIMO-OFDMA transmission scheme for 802.11ax networks. Towards these objectives, we face the following two challenges. Challenge 1 – Channel Sounding Overhead. Channel sounding is crucial for beamforming in downlink MU-MIMO transmissions. However, the existing channel sounding protocol in Fig. 6.2 entails a large airtime overhead and significantly compromises the throughput gain of MU-MIMO. For instance, consider an AP with 8 antennas and a single-antenna STA working on 160 MHz bandwidth. Even with the tone grouping, the angles information in a single report could be as large as 7.0 kB1 , which is far beyond a maximum transmission unit (2.3 kB) in WLANs [135]. This means that a BR frame in Fig. 6.2 can take more than 3 packets for CSI feedback. Such a large airtime overhead not only consumes network bandwidth but it also ruins the freshness of CSI for beamforming. Challenge 2 – Joint Resource Allocation. The marriage of MU-MIMO and OFDMA cre- 1 In this case, N = N = 7 and feedback type 1 is used over 498 groups of tones. Representation of angles ψ ϕ requires 55, 776 bits ≈ 7.0 kB. 168 ates a joint resource allocation problem for the AP, which involves RU assignment for users and power allocation for MIMO streams. This problem is complicated as it crosses spectral and power domains. Solving the resource allocation problem is time-constrained as the coherence of wire- less channels degrades over time. It is therefore important for an AP to have a low-complexity algorithm that can find an efficient resource allocation solution in real time. A classical approach for solving this problem is to first formulate the problem as an optimization problem and then employ existing optimization solvers to compute the optimal solution. This approach, however, is infeasible in practice due to the high computational complexity from an exhaustive search over RU assignment instances. For example, consider a small 802.11ax network where a 4-antenna AP serves 6 single-antenna STAs over four 52-tone RUs on 20 MHz bandwidth. We formulate the resource allocation problem as an MINLP optimization problem and employ CVX package to solve it for a given RU assignment. Our observation is that it takes up to 342 minutes to find an optimal solution with search over 223.3 RU assignment instances. Such a large delay makes resource allocation infeasible for practical use and urges us to devise a low-complexity resource allocation mechanism. 6.4 Overview of DeepMux In this section, we present an overview of DeepMux, which leverages recent advances in DNNs to address the challenges for downlink MU-MIMO-OFDMA transmissions in 802.11ax networks. Fig. 6.3 shows a high-level structure of DeepMux. It mainly comprises two components: DLCS and DLRA. In what follows, we present the basic idea of these two components. 169 '/&6 $QJOHV 0XOWLSOH[HG %HDPIRUPLQJ 6SDUVLILHG DQJOHV 3UH VLJQDOV SURFHVVLQJ 58 6%5IUDPHV DVVLJQPHQWV 3RZHU 3UH 3RVW 3RZHU FRHIILFLHQWV &KDQQHO VWUHQJKWV SURFHVVLQJ SURFHVVLQJ DOORFDWLRQ '/5$ Figure 6.3: The overview of DeepMux. 6.4.1 Basic Idea of DLCS DLCS is an enhanced 802.11 channel sounding protocol aiming to reduce the sounding overhead. Its design is based on the following two observations: i) wireless channels in local area networks are highly correlated in the frequency domain; and ii) tone grouping in the current 802.11 sounding protocol is not an efficient approach for feedback compression. Motivated by the success of DNNs for image compression, we propose to use DNNs to reduce the channel sounding overhead in the CSI feedback process. Specifically, instead of reporting CSI over a large number of tones, each STA only reports CSI over a small number of tones. Based on the reported CSI over sparse tones, the AP attempts to infer the CSI over all tones using DNNs. While the idea is straightforward, an important question is how to train the DNNs so that they can infer the full CSI based on the limited feedback. For this question, one solution is that the AP asks every STA to report a large amount of CSI over all tones at the beginning and uses the large amount of CSI to train the DNNs. This solution, however, imposes heavy computational 170 and communication burdens on STAs, and thus is not amenable to implementation. To circum- vent this issue, we use uplink CSI, instead of downlink CSI, for the training of DNNs. This is because uplink and downlink channels have the same profile in the frequency domain, thanks to the channel reciprocity [109]. In other words, uplink and downlink channels bear the same shape over frequency domain even without channel calibration, making it possible for DNNs to learn the downlink frequency-domain CSI correlation using uplink CSI samples in the absence of channel calibration. Additionally, an AP can easily obtain uplink CSI over all tones. Obtaining uplink CSI re- quires no effort from STAs, making the training process transparent to the STAs. Whenever an AP receives packets from STAs, it can measure the uplink channel based on the packets’ pream- ble. We note that, different from prior channel reciprocity applications, channel calibration is not needed for our application. Details of DLCS are presented in Section 6.5. 6.4.2 Basic Idea of DLRA The marriage of MU-MIMO and OFDMA creates a challenge for an 802.11ax-enabled AP to optimally allocate the available spectral and power resources in a reasonable amount of time. To address this challenge, DeepMux formulates the resource allocation problem as an optimization problem. In its original form, the optimization problem is an MINLP problem, where its binary variables correspond to RU assignment sub-problem and its continuous variables correspond to power allocation sub-problem. DeepMux approaches the MINLP problem by reformulating it into a Mixed-Integer Linear Programming (MILP) problem. Unlike an MINLP problem, an MILP problem can be systematically solved in two steps: i) an organized search mechanism over discrete instances of the feasible region (RU assignment instances), and ii) an interior-point algorithm that solves the convex sub-problem (power allocation) for a given RU assignment. 171 1 D=16, CD=0.77 1 Correlation Correlation D=16, CD=0.75 0.5 0.5 0 0 1 30 60 1 30 60 Depth (tone) Depth (tone) (a) Correlation of angles in Ψ. (b) Correlation of angles in Φ. Figure 6.4: Spectral correlation of angles in Ψ and Φ. Given that MILP is NP-hard in general, we take advantage of recent advances in DNNs to determine the optimal RU-assignment in the first step. Specifically, DeepMux employs a DNN to compute the values for the binary optimization variables in the MILP problem. Such a DNN is trained offline, in a supervised manner, using the SNR reports from STAs, as shown in Fig. 6.3. After the binary variables (corresponding to the RU assignment sub-problem) are determined, the MILP problem degrades to a linear programming problem, which is easy to solve. Details of DLRA are presented in Section 6.6. 6.5 DLCS: A Low-Overhead Channel Sounding DLCS enhances the 802.11 channel sounding protocol in Fig. 6.2 by reducing the airtime con- sumed by BR frames. This is done through sparsification of Ψ and Φ angles in the frequency domain. That is, each STA reports CSI angles over a few tones, and the AP infers the CSI angles for all tones based on the sparsified feedback using DNNs. Before diving into DLCS, we first take a look at the frequency-domain correlation of CSI angles. We collected 50, 000 Ψ and Φ samples in an office environment to measure the frequency- domain correlation. For a sequence x ∈ R1×L , we define CD as its correlation at depth D by 172 letting:   x(m+1:m+D) xT(m+D+1:m+2D) CD = Em  , (6.1) x(m+1:m+D) x(m+D+1:m+2D) where x(i:j) ≜ xi , xi+1 , · · · , xj with xi being the ith element in x, and (·)T is transpose oper-   ator. Fig. 6.4 shows the correlation of the collected CSI angles at different tone depths. It can be seen that, when the tone depth is greater than 16 (i.e., D > 16), the correlation is still considerable for both Ψ and Φ angles. This means that, grouping the angles over Ng tones (simply by averag- ing operation) cannot fully harvest such a significant correlation for compression purpose. On the other hand, tone grouping may lead to an inaccurate feedback when Ng > 16. DLCS is a more sophisticated compression approach to reduce the sounding overhead by exploiting inter-tone CSI correlation. In what follows, we first present the settings of DNNs and then elaborate on their training (exploration) and sparsification (exploitation) phases separately. 6.5.1 DNNs Settings As shown in Fig. 6.5, DLCS employs DNNs at the AP to infer full CSI angles based on a sparsified feedback. One DNN is used for the angles in Ψ and another DNN is used for the angles in Φ. The dimension of input layer is S, corresponding to the quantized CSI angles over S tones. The value of S is selected through experimental studies, which will be shown shortly. The DNNs have Ntone neurons on the output layer, corresponding to the inferred CSI angles over all tones (e.g., Ntone = 234 for all the nine 26-tone RUs over 20 MHz bandwidth). DNNs have multiple hidden layers, say L hidden layers. The dimension of the ith hidden layer is di . Each hidden layer is fully- connected, followed by a batch-normalization layer to speed up the training convergence [72]. 173 Ntone dL Թ dL-1 Թ d2 Թ d1 Թ S Թ Թ Figure 6.5: DNNs’ structure at the AP for inferring Ψ and Φ based on limited feedback. ReLU activation function is used for each layer. Since the DNNs are designed for interpolation purpose, they are in an enlarging trapezoid shape. 6.5.2 Training Phase As we explained before, the AP does not require STAs to report a large amount of downlink CSI angles for training DNNs because doing so imposes heavy computational and communication burdens on STAs. Instead, the AP uses its estimated uplink channels to calculate CSI angles and train the DNNs by taking advantage of wireless channel reciprocity. Since the DNNs focus only on learning the frequency-domain properties of CSI, channel calibration is not necessary to compensate the response difference between Tx and Rx RF chains. Using the uplink CSI to train the DNNs have two benefits. First, it is easy for an AP to collect a large amount of samples for training purpose. As long as an STA sends a packet, the AP can estimate the uplink channel and use it for generating angles and training DNNs. Simply put, the 174 AP requires zero effort to obtain dataset for training DNNs. Second, it tends to offer better training results as uplink CSI does not suffer from tone grouping and quantization errors. If the AP wants to use downlink CSI for training DNNs, quantization of the estimated downlink CSI at STAs is needed to facilitate the feedback. This introduces quantization error and degrades the training performance. In contrast, using uplink CSI for training purpose does not suffer from this issue. In what follows, we describe the operations of DNNs training at the AP. No extra effort is needed at the STAs. Data Collection. AP and STAs work in their ordinary mode. Whenever the AP receives a packet, it decodes the packet and records its estimated uplink channel on all tones. Then, the AP performs spatial compression on the estimated uplink channel over every tone, as specified in 802.11 standards [70] to collect CSI angles (i.e., Ψ and Φ). The generated CSI angles are organized in batches and used for training DNNs. Data Preprocessing. As shown in Fig. 6.3, each batch of CSI angles are pre-processed before being used for training the DNNs. The pre-process is to make the angles zero-mean and unite- variance over all tones [83]. Albeit simple, this pre-process significantly improves the convergence of DNNs [83], especially when gradient descent algorithms are used for weight adaptation [92]. The AP also quantizes these pre-processed angles with different numbers of bits and keeps all versions to examine their performance. Training Parameters and Provisions. Normalized Mean Squared Error (NMSE) loss function is employed to measure the sparsification error. The DNNs are trained using Adam optimizer [85]. The training is performed with an initial learning rate of 0.001 and decaying rate of 0.98 following a step-wise approach. The batch size is set to 128. All parameters are initialized using Xavier initialization [50]. Dropout [154] is applied to all hidden layers to prevent over-fitting and improve the generalization of the model. All DNNs are trained end-to-end using Pytorch v1.4 library [132]. 175 AP STA Phase DNN Phase Quantized samples Angles over on few tones all tones BR Tone index frame Tone index Figure 6.6: DLCS workflow in sparsification (exploitation) phase. 6.5.3 Sparsification Phase After completing the training phase, the AP initiates the sparsification phase. That is, the network begins to use the trained DNNs to reduce the channel sounding overhead when applicable. To do so, the AP informs all STAs of Sψ , Sϕ , qψ , and qϕ , where Sψ and Sϕ are the number of tones for which STAs report angles of Ψ and Φ, respectively. qψ and qϕ are the number of bits for quantizing each angle in Ψ and Φ, respectively. Fig. 6.6 illustrates the CSI reporting process when the AP is equipped with the trained DNNs. In what follows, we elaborate the operations at an STA and the AP, respectively. Operations at an STA. Referring to Fig. 6.2, when MU-MIMO transmission is triggered by an NDPA frame, each STA estimates the downlink channel vector H(k) based on the re- ceived NDP frame, where k = {kψ , kϕ } is the selected tone indices, kψ ∈ {⌊0.5Ntone /Sψ ⌋, ⌊1.5Ntone /Sψ ⌋, · · · , ⌊(Sψ − 0.5)Ntone /Sψ ⌋} is the set of tone indices for which STAs report Ψ and kϕ ∈ {⌊0.5Ntone /Sϕ ⌋, ⌊1.5Ntone /Sϕ ⌋, · · · , ⌊(Sϕ − 0.5)Ntone /Sϕ ⌋} is the set of tone indices for which STAs report Φ. Spatial compression is performed on H(k) to obtain the angles in Ψ and Φ, which are then quantized using qψ and qϕ bits (using the quantization method in [67]), respec- tively. In the BR frame shown in Fig. 6.2, instead of reporting CSI angles on all groups of tones, the STAs report ψ and ϕ angles only on those Sψ tones and Sϕ tones, respectively. In addition, 176 Table 6.2: End-to-end error of DNNs in inferring the angles in Ψ. Sψ =5 Sψ =6 Sψ =7 Sψ =8 Sψ =9 qψ =3 bits 10.55% 10.63% 12.00% 8.99% 9.88% qψ =4 bits 5.85% 4.95% 5.03% 3.86% 3.29% qψ =5 bits 3.97% 2.77% 2.52% 1.93% 1.32% qψ =6 bits 3.52% 2.16% 1.53% 1.35% 1.14% qψ =7 bits 3.19% 2.08% 1.16% 1.14% 0.80% Table 6.3: End-to-end error of DNNs in inferring the angles in Φ. Sϕ =5 Sϕ =6 Sϕ =7 Sϕ =8 Sϕ =9 qϕ =3 bits 26.51% 22.70% 27.39% 29.83% 21.57% qϕ =4 bits 8.30% 6.63% 6.33% 6.09% 5.73% qϕ =5 bits 3.01% 2.40% 2.19% 2.14% 1.85% qϕ =6 bits 2.67% 2.06% 1.10% 1.01% 0.76% qϕ =7 bits 2.30% 1.07% 0.82% 0.77% 0.57% each STA also reports the measured SNR values to the AP in the BR frame, following the existing 802.11 protocol [67]. Operations at the AP. Upon receiving the reports from an STA, the AP extracts the quantized angles and SNR reports. As illustrated in Fig. 6.6, the received angles are then fed into the DNNs to infer the angles over all tones. The output of the DNNs are then used to construct the beamforming vectors for downlink MU-MIMO transmissions. 6.5.4 Parameter Selection and Numerical Results A question to ask is how to choose the values for sparsification parameters Sψ , Sϕ , qψ , and qϕ . In our design, the parameter values are selected to ensure the end-to-end errors below a pre-defined threshold, which is empirically set. Specifically, after the AP collects the sufficient channel data, it first trains DNNs under different values of sparsification parameters and then records the end-to- end error in the test phase. The AP selects the values for sparsification parameters that yield the 177 Overhead(bits/angle/tone) DeepMux IEEE 802.11ax DeepMux IEEE 802.11ax 2 6 Error (%) 3 1 0 0 DeepMux T0G4 T1G4 T1G16 DeepMux T0G4 T1G4 T1G16 (a) Error. (b) Overhead. Figure 6.7: Error and overhead comparison between DLCS and existing 802.11 protocols [70]. lowest sounding overhead while meeting the end-to-end error requirement (below a pre-defined threshold). To illustrate this selection approach, we resort to experiments. We implemented DLCS in an indoor environment and collected about 50, 000 angle samples in the uplink over 20 MHz band- width. We tuned those parameters and examined the performance of well-trained DNNs. As a possible end-to-end error threshold in inferring the angles, we use error from the tone grouping mechanism. Table 6.2 and Table 6.3 present our results. In each table, the DNN settings which meet the end-to-end error requirement are highlighted in green color. Based on the results, we choose (Sψ = 9, qψ = 5) which leads to 0.19 bits/angle/tone overhead and 1.32% error for the angles in Ψ. We choose (Sϕ = 6, qϕ = 7) for the angles in Φ which leads to 0.18 bits/angle/tone overhead and 1.07% error. Finally, the DNNs we choose are a 9 × 16 × 32 × 64 × 128 × 234 DNN for sparsification of Ψ and a 6 × 16 × 32 × 64 × 128 × 234 DNN for sparsification of Φ. We note that the resultant parameter values are scenario-specific. When an AP is moved to a new scenario, it needs to re-tune the parameters to obtain the “best” values for those parameters. Fortunately, the parameter re-tuning process can be done by the AP automatically without human intervention. We now compare DLCS with existing 802.11 protocols in terms of error and sounding over- 178 head. Fig. 6.7 presents our results. Particularly, TiGj in the figure means feedback type i is employed and Ng = j tones are grouped for feedback. Fig. 6.7(a) shows the superior performance of DLCS in terms of error. DLCS reaches 1.19% error, while T0G4, T1G4, and T1G16 reach 2.48%, 1.64%, and 7.05% error, respectively. Fig. 6.7(b) shows that DLCS entails significantly lower overhead compared to existing 802.11 protocols. DLCS reaches a sounding overhead as low as 0.19 bits/angles/tone while T0G4, T1G4, and T1G16 reach 1.50, 2.00, and 0.50 bits/angles/tone overhead, respectively. This means DLCS reduces sounding overhead by 62.0%∼90.5%. 6.6 DLRA: A Lightweight Resource Allocation In this section, we employ DNNs to facilitate the resource allocation problem at the AP, which includes two sub-problems: RU assignment and power allocation. Recall that the AP recovers angles in Ψ and Φ using DNNs, and it also collects SNR values over all tones. The angles in Ψ and Φ can be used to partially reconstruct the right singular vectors of channel matrices, which can be leveraged to mitigate inter-user interference in the downlink transmissions. The SNR values provide the information of channel quality, which can be used to optimize the resource allocation. In what follows, we first formulate the resource allocation problem as an optimization problem, and then develop a learning-based algorithm to solve it. Finally, we offer numerical results to show the effectiveness of the proposed learning-based algorithm. 6.6.1 Problem Formulation and Reformulation Problem Formulation. At an AP, denote N as the set of STAs that it serves in the downlink MU-MIMO-OFDMA transmission. Denote R as the set of RUs, which are the granularity for assignment. Let |N | = Nsta and |R| = Nru . We define a binary variable zi,j to indicate the RU 179 assignment. Specifically, zi,j = 1 if RU j is assigned to STA i; and zi,j = 0 otherwise. Denote pi,j as the portion of the AP’s power allocated to STA i on RU j. Denote Wj as the bandwidth of RU j. Denote γi,j as reported SNR at STA i on RU j. Denote ri,j as the data rate achieved by STA i on RU j. Denote ri as the achievable data rate for STA i. Denote Ω(·) as the mapping function from SNR to data rate. Then, the resource allocation problem with the objective of maximizing total STAs’ data rate can be expressed as: X maximize ri (6.2a) p,z i∈N X s.t. ri ≤ ri,j , i ∈ N ; (6.2b) j∈R  ri,j ≤ Wj zi,j Ω pi,j γi,j , i ∈ N , j ∈ R; (6.2c) X zi,j ≤ Nap , j ∈ R; (6.2d) i∈N X pi,j ≤ 1 . (6.2e) i∈N ,j∈R In this formulation, z = {zi,j }i∈N ,j∈R and p = {pi,j }i∈N ,j∈R are optimization variables. {γi,j }i∈N ,j∈R , {Wj }j∈R , and Nap are given parameters. Constraint (7.4b) calculates the achieved data rate by an STA. Constraint (7.4c) defines the achievable rate region. Constraint (6.2d) is spatial DoF constraints on the maximum number of STAs that can be allocated to an RU. Constraint (6.2e) characterizes the power budget at the AP. Achievable Rate Region. A classical way to map SNR to data rate is Shannon capacity, which is a theoretical bound and hard to reach in practice. In 802.11 networks, adaptive MCS is used 180 12 Shannon capacity MCS-based data rate (w/o overhead) Data rate (bit/s/Hz) MCS-based data rate (w/ overhead) 9 Approximated boundary Gap from Gap by MCS overhead 6 3 Approximated region 0 0 1000 2000 3000 4000 SNR Figure 6.8: Illustration of Shannon capacity, MCS-based data rate, and achievable data rate. to adjust the data rate based on SNR. As shown in Fig. 6.8, there is a significant gap between Shannon capacity and MCS-based data rate. Therefore, Shannon capacity is not an ideal function for our purpose. Moreover, when taking into account the overhead from OFDM cyclic prefix and pilot tones in 802.11ax2 , the achievable data rate becomes even lower, as shown in Fig. 6.8. The achievable data rate region (MCS-based rate with overhead) is characterized by a staircase curve, which is non-convex function. To simplify the optimization, we approximate the achievable rate region using a series of linear constraints as illustrated by Fig. 6.8. Mathematically, by defining γ as a measured SNR value, we approximate the achievable rate region as follows: Ω (γ) ≤ ak γ + bk ; k ∈ K, (6.3) where ak and bk are given in Table 6.4 as per IEEE 802.11ax; and K ≜ {1, 2, · · · , 13}. We note that the EVM in Table 6.4 is equivalent to the inverse of post-SNR of a decoded data stream 2 For 802.11ax with 20 MHz bandwidth, every 26-tone RU has 2 tones for pilot. 181 Table 6.4: EVM specified in IEEE 802.11ax [70]. EVM (dB) [+∞,−5) [−5,−8) [−8,−10) [−10,−13) [−13,−16) [−16,−19) [−19,−22) Modulation N/A BPSK BPSK QPSK QPSK 16QAM 16QAM Coding rate N/A 1/2 3/4 1/2 3/4 1/2 3/4 Γ(EVM) N/A 1/2 3/4 1 3/2 2 3 ai 0.1067 0.0536 0.0457 0.0339 0.0170 0.0170 0.0085 bi 0 0.1679 0.2177 0.3359 0.6734 0.6718 1.3468 EVM (dB) [−22,−25) [−25,−27) [−27,−30) [−30,−32) [−32,−35) [−35,−∞) [−35,−∞) Modulation 64QAM 64QAM 64QAM 256QAM 256QAM 1024QAM 1024QAM Coding rate 2/3 3/4 5/6 3/4 5/6 3/4 5/6 Γ(EVM) 4 9/2 5 6 20/3 15/2 25/3 ai 0.0021 0.0018 0.0013 0.0008 0.0007 N/A 0 bi 2.3609 2.4605 2.6968 3.2806 3.3696 N/A 5.6250 at a receiver. The relation of γ in (6.3) and the EVM value in Table 6.4 can be expressed as γ = 10−EVM/10 . Based on the EVM regions specified in Table 6.4, the approximated achievable rate region with its boundaries is shown in Fig. 6.8. Then, constraints in (7.4c) can be expressed as: ri,j ≤ Wj zi,j (ak pi,j γi,j + bk ), i ∈ N , j ∈ R, k ∈ K. (6.4) Using (6.4), the resource allocation problem in (7.4) can be re-defined as: X maximize ri (6.5) p,z i∈N s.t. (7.4b), (6.2d), (6.2e), and (6.4). The optimization problem in (6.5) is an MINLP problem. The non-linear term is from (6.4), where binary and continuous optimization variables are multiplied. Problem Reformulation. To reduce the processing time, we reformulate the MINLP problem (6.5) to an MILP problem by leveraging a classic linearization technique [151]. To do so, we as- sume that the SNR value is bounded. This is a valid assumption in practice. Denote γmax as the 182 maximum value of SNR (e.g., 45 dB in our design) and define a constant A = maxj,k {Wj (ak γmax + bk )}. Then, (6.4) can be equivalently expressed as: ri,j ≤ Wj (ak pi,j γi,j + bk ), i ∈ N , j ∈ R, k ∈ K. (6.6a) 0 ≤ ri,j ≤ zi,j A, i ∈ N , j ∈ R. (6.6b) Therefore, the MINLP problem in (6.5) can be reformulated to the following MILP problem: X maximize ri (6.7) p,z i∈N s.t. (7.4b), (6.2d), (6.2e), and (6.6). We note that the MINLP problem in (6.5) and the MILP problem in (6.6) have identical fea- sible region. The reformulation does not alter the solution space. The new optimization problem involves 2Nsta Nru +Nsta continuous variables, Nsta Nru binary variables, and 14Nsta Nru +Nsta + Nru + 1 constraints. Recall the example in Section 6.3.2, where a 4-antenna AP serves six STAs on four 52-tone RUs. By formulating the resource allocation problem in the form of (6.7), off- the-shelf optimization solver MOSEK [14] can find an optimal solution within 5 seconds for most cases. In general, MILP is NP-hard. Its computational complexity is still beyond the acceptable range of a wireless AP device. 6.6.2 DLRA: A Deep-Learning-Based Resource Allocation Solving an MILP problem is still beyond the computational capacity of an 802.11ax-enabled AP to allocate its resources for downlink transmissions. To reduce the computational complexity, we take advantage of recent advances in DNNs. Specifically, we first reformulate the resource 183 LP Power MILP (8) LP Solver coefficients MILP RU Solver assignment Training SNR set Pre- Post- SNR reports processing processing Training flow Exploitation flow Figure 6.9: DLRA workflow in training and exploitation phases. allocation problem as an MILP problem as shown in (6.7), and then employ a DNN to compute the binary variables. Once the binary variables are determined, the MILP problem degrades to a linear programming problem, which is easy to solve. In what follows, we focus on the design of a DNN to determine the binary variables in (6.7). DNN Settings. Fig. 6.9 shows the DNN-based approach in training and sparsification (ex- ploitation) phases. The input of the DNN is the SNR values reported by the STAs. The dimension of input layer is Nsta Nru . The DNN consists of multiple hidden layers. Each hidden layer is fully- connected, followed by a batch-normalization layer to speed up the training convergence [72]. Sigmoid activation function is used for each layer. The output layer has Nsta Nru neurons, each of which corresponds to a binary variable in RU assignment sub-problem. In our experiments, we consider the case where an 8-antenna AP serves 20 STAs on 9 RUs. For this case, the input and output layers both have 180 neurons, and the overall DNN’s structure we trained for RU assignment is 180 × 128 × 128 × 180. Data Collection and Pre-processing. We collect 60, 000 SNR reports from an office envi- 184 ronment. Each report consists SNR values over all the nine 26-tone RUs on 20 MHz bandwidth. Every set of SNR values (20 SNR reports) will be flattened, normalized, and then used for training the DNN as an instance of its input. At the same time, the set of unprocessed SNR values will be fed into (6.7). The output of (6.7) includes RU assignment and power allocation coefficients. The resultant RU assignment will be used as the reference output of the DNN in its supervised training procedure. We use MOSEk v.9 [14] to solve (6.7) for a given set of SNR values. Since data generation process is pretty slow, we augment the training data set by adding negligible noise to the original input samples. Moreover, we set aside one third of input-output sample pairs for test purpose. We augment the remaining samples 4.5 times. Training Process. To train the DNN, we use NMSE loss function. The outputs of (6.7) for given sets of SNR values are used as reference outputs of the DNN in training loss calculation. For training the DNN, we use Adam optimizer [85] and PyTorch v1.4 library [132]. We also apply batch normalization [72] and Xavier initialization [50] approaches to accelerate the training process. Post-Processing. The output of DNN will be post-processed in two steps: binarization and correction. The output of DNN is a vector comprising real values bounded between 0 and 1. We apply a threshold-based binarization on outputs of the DNN to transform them into binary entities. Once the binary vector is obtained, we can use our domain knowledge to further polish this vector. Two rules are followed in the correction step: i) If the DoF constraint is violated on an RU, the STA with the lowest SNR will be removed until the DoF constraint is met. ii) When the DoFs on an RU are under-utilized, the STA with the highest SNR will be activated if there is an assigned STA with a lower SNR. Computational Complexity. Referring to Fig. 6.9, the computational complexity of pre- processing and post-processing operations is O (Nsta ), provided that Nru