ADDRESSING SECURITY, SCALABILITY, AND USABILITY CHALLENGES OF BLOCKCHAIN INTEGRATION WITH THE SMART WORLD

By

Nikolay Ivanov

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Computer Science – Doctor of Philosophy

2023

ABSTRACT

In recent decades, we have witnessed a convergence of multiple technologies into the integrated, ever-evolving Smart World ecosystem. The ongoing evolution of the Smart World is shaped by cross-technological integration, as well as the adoption of new technologies into the ecosystem. Particularly, academia and industry envision blockchain technology as one of the major new additions to the Smart World. However, the adoption of blockchain technology is impeded by three major practical challenges: security, scalability, and usability. This dissertation aims at addressing these three challenges by focusing on revealing new blockchain attacks, facilitating threat mitigation in smart contracts, and introducing new trust-free applications of blockchain technology.

First, this dissertation addresses some security challenges of blockchain largely overlooked in existing research. We discover six zero-day social engineering attacks in Ethereum smart contracts and propose measures to address them. Furthermore, we introduce a new attack against hardware crypto wallets, confirmed by the manufacturers of the wallets, which evades security verification by the user.

Second, the dissertation elaborates on defending smart contracts against attacks. We design a comprehensive five-dimensional classification taxonomy of smart contract defense tools and classify 133 existing threat mitigation solutions using our taxonomy. Next, we introduce a new smart contract security testing approach called transaction encapsulation, and implement a transaction testing tool, which reveals the actual outcomes (either benign or malicious) of Ethereum transactions.
Third, the dissertation introduces novel practical blockchain applications that exhibit increased security, privacy, and user control compared to other distributed solutions. We propose a framework that uses a single Ethereum smart contract for enabling high-performance scalable smart contracts on the cloud. Finally, the dissertation introduces a solution that uses Ethereum smart contracts for leveraging decentralized networks of WiFi hotspots with cross-domain authentication and automated QoS enforcement. We implemented and thoroughly evaluated all the proposed attacks, defenses, and frameworks, thereby confirming the real-world applicability of our work. The dissertation concludes with an outlook on our ongoing and future efforts to further address the practical challenges associated with the integration of blockchain into the Smart World ecosystem.

Copyright by
NIKOLAY IVANOV
2023

To Anya, Mark, and Erik for their love and support.

ACKNOWLEDGMENTS

I would like to express my heartfelt gratitude towards my advisor, Dr. Qiben Yan, for being my mentor and research partner since the first day I joined his lab; his patience, wisdom, and devotion to students' success are truly inspiring. Thank you, Dr. Yan, for believing in me and empowering me to believe in myself!

I am thankful to Dr. Li Xiao for helping me navigate the intricacies of my Ph.D. program, serving on my Qualifying and Guidance committees, offering multiple opportunities for academic growth, and helping me with fellowship and faculty job applications. I am thankful to Dr. Matt Mutka for his service on my Guidance committee, thoughtful feedback on my research, and help with my faculty job applications. I am thankful to Dr. Jian Ren for his service on my Guidance committee and important feedback on my research from the engineering perspective.

I am thankful to my past and present colleagues from THINK Lab and SEIT Lab: Dr.
Mohannad Alhanahnah, Boyan Hu, Qi Xia, Qicheng Lin, Jianzhi Lou, Hanqing Guo, Guangjing Wang, Bocheng Chen, Yuanda Wang, Ce Zhou, Anurag Kompalli, Jon Gorman, Max Danley, Eric Ranes, and Simon Harmata.

I am thankful to the external research collaborators and mentors whom I had the honor to work with: Dr. Yunhao Liu (Tsinghua University), Chenning Li (Massachusetts Institute of Technology), Dr. Qingyang Wang (Louisiana State University), Dr. Ting Chen (University of Electronic Science and Technology of China), Dr. Xiapu Luo (The Hong Kong Polytechnic University), and Zhiyuan Sun (King's College London).

I am grateful to many faculty members from Southwest Minnesota State University, University of Nebraska–Lincoln, and Michigan State University for helping me navigate academia: Dr. Shushuang Man, Dr. Daniel Kaiser, Prof. Kourosh Mortezapour, Dr. Don Robertson, Dr. Teresa Henning, Dr. Lori Baker, Dr. Tom Williford, Dr. Thomas Dilley, Dr. Vaughn Gehle, Dr. Huang Mu-wan, Dr. Wije Wijesiri, Dr. Massimiliano Pierobon, Dr. Zhichao Cao, Dr. Charles Ofria, Dr. Arun Ross, Dr. Abdol-Hossein Esfahanian, and Dr. Charles Owen.

Most importantly, I am thankful to my wife Anya, my sons, Mark and Erik, and my mother Julia, for their love, support, patience, and belief in me.

TABLE OF CONTENTS

CHAPTER 1: INTRODUCTION
  1.1: Related Work
  1.2: Research Scope
  1.3: Organization
CHAPTER 2: SOCIAL ENGINEERING IN ETHEREUM SMART CONTRACTS
  2.1: Introduction
  2.2: Background
  2.3: Threat Model
  2.4: Social Engineering Attacks
  2.5: Case Study of Real-world Smart Contracts
  2.6: Evaluation and Analysis
  2.7: Security Recommendations
  2.8: Chapter Summary
CHAPTER 3: ATTACKING HARDWARE WALLETS
  3.1: Introduction
  3.2: EthClipper Attack Design and Analysis
  3.3: Implementation and Evaluation
  3.4: Security Recommendations and Defense
  3.5: Related Work
  3.6: Chapter Summary
CHAPTER 4: TAXONOMY OF DEFENSE SOLUTIONS FOR SMART CONTRACTS
  4.1: Introduction
  4.2: Prior Surveys
  4.3: Methodology
  4.4: Threat Mitigation Classification
  4.5: Design Workflows of Threat Mitigation Methods
  4.6: Vulnerability Coverage
  4.7: Trends and Perspectives
  4.8: Chapter Summary
CHAPTER 5: CONTEXT-AWARE USER-CENTERED TRANSACTION TESTING
  5.1: Introduction
  5.2: Background
  5.3: Motivating Example
  5.4: Preliminaries
  5.5: TxT: Transaction Testing Framework
  5.6: Implementation and Evaluation
  5.7: Limitations and Discussion
  5.8: Chapter Summary
CHAPTER 6: SMART CONTRACTS ON THE CLOUD
  6.1: Introduction
  6.2: Comparison with SOTA
  6.3: System Design
  6.4: Scalability Analysis
  6.5: Security Analysis
  6.6: Implementation and Evaluation
  6.7: Chapter Summary
CHAPTER 7: DECENTRALIZED NETWORK OF WI-FI HOTSPOTS
  7.1: Introduction
  7.2: Background and Key Insights
  7.3: The SmartWiFi System
  7.4: Security Analysis
  7.5: Implementation
  7.6: Evaluation
  7.7: Chapter Summary
CHAPTER 8: CONCLUSION
  8.1: Summary of Contributions
  8.2: Limitations and Discussion
  8.3: Lessons Learned
  8.4: Future Work
BIBLIOGRAPHY
APPENDIX

CHAPTER 1: INTRODUCTION

The world is experiencing the emergence and rapid growth of various information and engineering technologies, including Internet of Things (IoT), distributed systems, mobile computing, artificial intelligence, autonomous and semi-autonomous vehicles, space technology, digital finance, smart health care, big data, wireless communication, smart agriculture, Industry 4.0 [181], and smart cities, to name a few. Evidently, these technologies not only emerge and evolve individually, but also converge into an integrated ecosystem: the ever-evolving Smart World [143]. The ongoing evolution of the Smart World is manifested by an increasing cross-technological integration as well as the adoption of new technologies into the ecosystem [195].
Particularly, academia and industry envision blockchain technology as one of the major new additions to the Smart World [81,82,164,190,238,251,259,300,306]. Blockchain technology makes it possible to implement and automatically enforce fine-tuned communication and authorization protocols encoded in smart contracts, which eliminate by design such events as data falsification, lost records, repudiation, and untraceable sources. Multiple studies [128,253,254,257] demonstrate that the complexity and size of the trusted code adversely affect the security of a system. To avoid the negative consequences caused by the growing complexity of scaling computer systems, security design solutions have been proposed based on blockchain technology. While public blockchains are predominantly used for cryptocurrencies and decentralized tokens, permissioned blockchains are used by businesses and governments, as they allow multiple parties, who do not necessarily trust one another, to establish a shared digital ecosystem with a set of predetermined, automatically enforced policies. According to Deloitte's 2019 Global Blockchain Survey [129, 155] encompassing 1,386 senior executives, 53% of companies worth $500 million or more place blockchain integration within their top five most critical priorities. Adopted by multiple businesses and governments, blockchain technology makes it possible to contain the entropy of complex systems, especially those involving data flows and transactions between multiple independent organizations. To facilitate the integration of blockchain technology into existing IT infrastructure, a number of ready-to-use platforms, such as Hyperledger [53], have been developed.

Table 1.1: Overview of the scope of this dissertation.

Columns: Part; Chapter; Related Publication†; Challenges Addressed (SECURITY, SCALABILITY, USABILITY), each rated as primary focus, partial focus, or not addressed.

PART I: ATTACKS
  Chapter 2: N. Ivanov, J. Lou, T. Chen, J. Li, Q. Yan. Targeting the Weakest Link: Social Engineering Attacks in Ethereum Smart Contracts. ACM ASIA CCS 2021.
  Chapter 3: N. Ivanov, Q. Yan. EthClipper: A Clipboard Meddling Attack on Hardware Wallets with Address Verif. Evasion. IEEE CNS 2021.

PART II: DEFENSE
  Chapter 4: N. Ivanov, C. Li, Z. Sun, Z. Cao, X. Luo, Q. Yan. Security Threat Mitigation For Smart Contracts: A Survey. To appear at ACM Computing Surveys (CSUR).
  Chapter 5: N. Ivanov, A. Kompalli, Q. Yan. TxT: Real-time Transaction Encapsulation for Ethereum Smart Contracts. IEEE TIFS 2023.

PART III: APPLICATIONS
  Chapter 6: N. Ivanov, Q. Yan, Q. Wang. Blockumulus: A Scalable Framework for Smart Contracts on the Cloud. IEEE ICDCS 2021.
  Chapter 7: N. Ivanov, J. Lou, Q. Yan. SmartWiFi: Universal and Secure Smart Contract-Enabled WiFi Hotspot. EAI SecureComm 2020.

† The author of this dissertation is the main contributor to all these papers, and the primary research advisor of all these publications is Dr. Qiben Yan.

Unfortunately, the integration of blockchain technology into the Smart World ecosystem faces three major practical challenges: security [185, 309], scalability [179, 308], and usability [121, 213]. In this dissertation, we aim at addressing multiple aspects of the above triad of challenges largely overlooked by current studies. Since the majority of our solutions simultaneously target some subsets of the three challenges, this dissertation organizes the solutions based on their core approach rather than the challenges they intend to address, as summarized in Table 1.1. Specifically, we identify three major approaches described in this dissertation: revealing attacks, facilitating defense, and designing novel blockchain applications.
We further outline the plan for future work, in which we continue addressing the security, scalability, and usability issues that impede the practical adoption of blockchain.

1.1: Related Work

This section summarizes the most influential pieces of existing literature related to the research in this dissertation, subdivided into five major categories: social engineering and phishing, security of hardware wallets, smart contract defense, blockchain scalability improvements, and wireless hotspot networks. Each category summarizes state-of-the-art research followed by a brief discussion of the contribution of this dissertation.

1.1.1: Social Engineering and Phishing

The study of social engineering attacks in Ethereum is limited to honeypots: deceptive smart contracts targeting users who attempt to exploit known vulnerabilities of smart contracts. Torres et al. [270] present a taxonomy of honeypots, while Zhou et al. [309] later discover 51 previously undetected honeypots. Although Ethereum honeypots are a subclass of social engineering attacks, these contracts are harmless for ordinary users, as their potential victims are opportunistic malicious players.

Social engineering attacks are known outside of the blockchain domain. Fu et al. [124] present a methodology for defending against such attacks, and develop a Unicode character similarity list and an attack detection tool, IDN-SecuChecker. Holgers et al. [150] conduct a measurement study of IDN homograph attacks, which shows their real-world impact. Email/URL phishing and Ethereum social engineering attacks both target human cognitive biases. Phishing attacks have been thoroughly studied in recent years [102, 149, 151, 218, 219, 266, 273]. However, the unique characteristics of smart contracts, such as open execution, fee-charging transactions, and non-interactive properties, make the design of their social engineering attacks significantly different from traditional phishing attacks.
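To make the homograph idea mentioned above more concrete, the following minimal sketch flags strings whose letters come from more than one script. It is not IDN-SecuChecker and does not reproduce any tool from the cited works; the function names and the heuristic of using the first word of a Unicode character name as a script label are assumptions made purely for illustration.

```python
import unicodedata


def scripts_used(text):
    """Return a set of rough script labels for the letters in `text`.

    Heuristic (an assumption for this sketch): the first word of the
    Unicode character name, e.g. "LATIN" or "CYRILLIC", stands in for
    the script of the character.
    """
    scripts = set()
    for ch in text:
        if ch.isalpha():
            # unicodedata.name returns the default ("") for unnamed code points
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def looks_like_homograph(text):
    """Flag text that mixes letters from more than one script."""
    return len(scripts_used(text)) > 1


if __name__ == "__main__":
    genuine = "paypal"
    spoofed = "p\u0430ypal"  # U+0430 is CYRILLIC SMALL LETTER A
    print(genuine, looks_like_homograph(genuine))
    print(spoofed, looks_like_homograph(spoofed))
```

A check of this kind only catches mixed-script strings; a practical detector would need a character similarity list of the sort described above, since visually confusable characters can also occur within a single script.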
Contribution of this Dissertation: The research described in this dissertation is the first to systematically study social engineering techniques in Ethereum smart contracts. Specifically, this dissertation highlights a largely overlooked class of social engineering attacks in Ethereum smart contracts. These attacks exploit human cognitive biases as new attack vectors. We identified these biases and developed six zero-day social engineering attacks. By embedding most of these attacks into existing popular tokens, we demonstrated that the attacks have the potential to victimize a large group of normal users. Moreover, the attacks remain dormant during testing and only activate after a production deployment.

1.1.2: Security of Hardware Wallets

Guri et al. [138] demonstrate a technique that allows an attacker to exfiltrate private keys from a hardware wallet by installing malware directly in the wallet's firmware. Gutoski et al. [139] show that the hierarchical deterministic (HD) wallet design, used in all popular hardware wallets, makes it possible to reveal all the private keys in the hierarchy if even one of the private keys is leaked; this research further proposes a new design of an HD wallet that avoids such key co-dependency. Several works in wireless sensing [184] demonstrate the ability to steal passcodes from personal devices, possibly including hardware wallets. The above adversarial scenarios, however, assume that either the attacker has physical access to the hardware wallet, or there is a partial leak of wallet credentials. Datko et al. [104] demonstrate how the firmware of some hardware wallets can be attacked to steal the user PIN code. San Pedro et al. [246] explore side-channel attacks that extract PIN codes and private keys from the Trezor One hardware wallet; although the vulnerability was promptly patched by the manufacturer, it demonstrates that the hardware and firmware components of hardware wallets can also be attacked.
Gkaniatsou et al. [131] show how the low-level local communication protocol between the client software and the hardware wallet can be used for side-channel attacks.

Contribution of this Dissertation: This dissertation proposes a new hybrid EthClipper attack, which zeroes in on the adversarial actions that real-world attackers have been using successfully for decades, i.e., malware infestation of user computers and social engineering. We demonstrated that it is possible to compromise the air-gapped security of a hardware wallet and fool its owner into confirming a malicious transaction, even without jeopardizing the integrity of the wallet itself. Our EthClipper attack, which is confirmed to be potentially dangerous by three leading hardware wallet manufacturers, not only falsifies the input to the hardware wallet, but also crafts the address in a way that circumvents the transaction verification procedure. Our evaluation confirms that the attack can be carried out with a limited budget on retail equipment.

1.1.3: Smart Contract Defense

Code-based defense tools use source code, bytecode, and/or ABI maps for finding bugs and vulnerabilities in smart contracts. One of the most popular code-based approaches is symbolic execution, represented by Mythril [215], Oyente [200], and Maian [223]. SmarTest [260] uses a language-based model for guiding symbolic execution and generating malicious transaction sequences. Static analyzers and formal verifiers, such as Securify [271], EthBMC [122], VerX [230], and Vandal [76], attempt to extract semantics and other facts from the code for finding violations of safety patterns. Many static analysis tools zero in on specific security issues. For example, ZEUS [168], Osiris [269], and VeriSmart [261] focus on arithmetic bugs; ECFChecker [137], Sereum [240], and SeRIF [83] address reentrancy; TokenScope [96] targets security issues of ERC-20 tokens. However, the major drawback of code-based defense approaches is the probabilistic nature of the result, which incurs non-negligible false positives/negatives.

Smart contract testers make it possible to generate and execute transactions to unveil vulnerabilities or semantic violations. Manual testing methods include tools like Waffle [7] and Solidity Coverage [6]. In order to enhance the ability of the test tools to reveal vulnerabilities, a number of smart contract fuzzing methods have been proposed, including Harvey [291], Confuzzius [119], ContractFuzzer [163], and sFuzz [221]. These testing methods try to find transaction parameters that would confirm the safety of a smart contract or reveal a vulnerability. However, the search space for the candidate parameters is usually too large to exhaust all the possible values (the path explosion problem); as a result, the testing methods only use some sample sets of parameters or heuristically determined combinations of parameters, resulting in overlooked vulnerabilities.

Unlike code-based defense tools, which statically scrutinize source code or bytecode of smart contracts, the transaction-based defense tools analyze historical transactions stored in the blockchain, or intercept the incoming transactions in real time. TxSpector [299] and EthScope [289] deliver frameworks for retrospective vulnerability search using Ethereum transactions. SODA [94] and Ægis [117] are tools for online interception of malicious transactions. Qin et al. [235] describe a transaction replay scheme, but it is only used to demonstrate a front-running attack, not for defense. Evidently, none of the existing transaction-based methods provide a definitive result to ensure the transaction safety.

Contribution of this Dissertation: This dissertation surveyed the full spectrum of smart contract threat mitigation solutions.
We presented a general taxonomy for the classification of such solutions, which applies to today's methods and is suitable for future methods, even if new paradigms, blockchain platforms, or vulnerabilities appear. Using this taxonomy, we classified 133 existing smart contract threat mitigation solutions. We identified eight distinct core defense methods employed by the existing solutions and developed synthesized workflows of these core methods. We studied the ability of the existing smart contract threat mitigation solutions to address the known vulnerabilities. We conducted an evidence-based evolutionary study of smart contract threat mitigation solutions to outline trends and perspectives. To further benefit the community of smart contract security researchers, users, and developers, we deployed an open-source, regularly updated online registry for smart contract threat mitigation.

Furthermore, in this dissertation, we proposed a transaction-based dynamic interceptor, called TxT, that deterministically verifies the safety of transactions, or refuses to give an answer in case of uncertainty. In contrast with previous defense solutions, TxT provides the user with an actual outcome of a transaction applied to the current state of the blockchain. Instead of predicting future transaction parameters, TxT tests the exact transactions the user is going to submit. Moreover, in addition to replaying the transaction, TxT also addresses the notorious time-of-check to time-of-use (TOCTOU) challenge by assessing the expiration and replicability of the test transaction.

1.1.4: Blockchain Scalability Improvements

Various solutions have been proposed for improving the performance and scalability of blockchain, which we subdivide into five major groups: 1) off-chain execution, 2) side- and cross-chaining, 3) sharding and alternative consensus, 4) network optimizations and payment channels, and 5) alternative blockchain architectures.
Off-Chain Execution: Off-chain execution is an arrangement in which some portions of smart contracts are computed outside of the blockchain to improve performance and reduce costs. Ekiden [98] addresses the lack of confidentiality and poor performance of blockchain by securing an off-chain computation via trusted execution environment (TEE) technology. Despite significant performance improvement, the operation of Ekiden relies on the availability of crowdsourced consensus and compute nodes. The security of the system is founded on the assumption that the participants have Sybil-resistant identities (i.e., they cannot create multiple fake accounts). The requirement for a participation deposit to prevent Sybil attacks may not only be ineffective against wealthy attackers, but may also reduce the incentive for community participation. Another off-chain execution solution, ZEXE, is proposed for abundant private off-chain computation [71]. Unlike Ekiden, ZEXE does not require hardware TEE enclaves, and therefore can be used in a wider scope of platforms. However, this system focuses on improving the computation scalability and reducing communication overhead, whereas the scalability issue in storage and transaction throughput remains unaddressed.

Side-Chaining and Cross-Chaining: Side-chaining is an arrangement in which some smart contract execution is outsourced to a different blockchain, while cross-chaining is a way for independent blockchains to share resources and use common assets. Plasma [231] attempts to reduce fees and improve the performance of the Ethereum blockchain by linking a smart contract to a tree of child blockchains. Although Plasma distributes the computation load of the master smart contract among multiple chains, the transaction throughput remains a likely bottleneck, and there is no solid evidence of significant improvement of storage capacity.
A popular cross-chaining solution called Polkadot [287] improves transaction throughput by creating a network of interoperable blockchains. However, the solution does not directly address the storage and compute capacity for smart contracts.

Sharding and Alternative Consensus: The concept of sharding involves selecting a subset of nodes to serve as temporary representatives in a decentralized consensus, which curbs the performance degradation associated with gossip broadcasts in large blockchain networks. Algorand [130] proposes a blockchain with improved performance using a sharding scheme based on a verifiable random function. Algorand delivers a significant increase in transaction throughput compared to classic public blockchains, but its operation relies on a set of assumptions that can be refuted by massive denial-of-service or Sybil attacks. Specifically, Algorand assumes that at least 95% of all honest users must be able to send messages to other honest users, and the overall share of honest participants must be greater than 2/3. Another solution with sharding-based consensus is Rapidchain [298], which delivers high transaction throughput. However, Rapidchain is not scalable in terms of data storage and compute capacity. Some alternative consensus models attempt to replace a compute-heavy PoW algorithm with lightweight alternatives, such as proof-of-stake (PoS), in which the voting power is determined by the amount of funds in possession of a node. Ouroboros [173] is a provably secure blockchain with PoS consensus. Unfortunately, existing alternative consensus models fail to address the full spectrum of scalability problems, and they introduce significant fairness challenges, such as "monetary hegemony". Yu et al. [297] propose a lightweight consensus protocol, OHIE, to improve blockchain scalability by leveraging a parallel execution of the Nakamoto consensus.
Despite the improvement in transaction throughput and available bandwidth, the scalability of storage and computation is not considered in OHIE.

Network Optimizations and Payment Channels: Off-chain payment channels have been proposed to improve performance and reduce fees associated with financial transactions. The Lightning Network protocol [232] enables the creation of off-chain micropayment channels. Perun [110] is another proposal of a payment channel that improves routing of transactions. Although off-chain payment channels have been adopted by real-world applications, they cannot serve as alternatives to public blockchains because of their specific focus (only for payment) and the necessity to orchestrate a network of crowdsourced participants.

Alternative Architectures: Researchers have been re-thinking the architecture of blockchain in order to improve performance and scalability. SPECTRE [262] proposes a reorganization of a traditional Nakamoto blockchain into a directed acyclic graph (DAG). Although it improves the speed of transactions, it cannot be used for general-purpose smart contracts that may require abundant data storage and heavy computation.

Contribution of this Dissertation: In this dissertation, we proposed the first scalable framework, called Blockumulus, for deploying decentralized smart contracts on the cloud, addressing the blockchain scalability limitations in three dimensions: transaction throughput, data storage, and computation. Blockumulus employs a novel overlay consensus which delivers decentralization to smart contracts in a centralized cloud instead of random P2P network nodes. Concretely, a consortium of centralized cloud computing nodes can host a permissionless smart contract environment where clients can control the execution of their customized contracts and manage the data stored by these contracts.
Our evaluation on Microsoft Azure shows that Blockumulus can execute tens of thousands of transactions within a minute, which is on par with the average throughput of worldwide credit card transactions. By integrating the decentralization of smart contracts and the scalability feature of the cloud, Blockumulus takes the first step towards high-performance data-rich smart contracts with high transaction throughput.

1.1.5: Wireless Hotspot Networks

WiFi hotspots often operate as networks to improve coverage, mobility, authentication, and payment. Industry and academia have proposed a number of approaches for WiFi hotspot networks, which we subdivide into two general categories: traditional solutions and blockchain solutions.

Traditional WiFi Hotspot Solutions: Current non-blockchain WiFi hotspot solutions are represented either by manual setups, or cloud-managed subscription-based proprietary products, such as Cisco Meraki [8], Aruba [9], Ruckus [14], and similar solutions. However, none of these approaches simultaneously addresses all of the following objectives: a) enhancing hotspot security against malicious routers and clients; b) providing universal authentication and billing; and c) making payment based on service quality.

Blockchain Solutions: OPPay [252] is a peer-to-peer opportunistic data service system. However, the OPPay-based solution is impractical for a WiFi hotspot, as it incurs high fees and does not offer QoS measurement for sustaining a reliable service. A commercial project, WinQ [15], has been in development since 2016. Advertised as a blockchain-enabled mobile WiFi hotspot, the solution was intended to operate on its own blockchain called QLC Chain [13]. We installed both the Android and iOS apps and found that the system was activated only on a testnet blockchain, which was practically unavailable.
Dynamic Speed Evaluation: QDASH was proposed for dynamic speed measurement [210], which is based on the assumption that the user traffic is available to the client connection handler. This requirement makes QDASH and its derivatives unsuitable for use by SmartWiFi clients. Xylophone [292] observes the behavior of TCP ACK and RST packets for speed measurement. Although the technique accurately estimates the bandwidth, it requires extended permissions for the client to capture TCP packets, which are usually not available on Android and iOS without rooting/jailbreaking.

Contribution of this Dissertation: In this dissertation, we proposed SmartWiFi, a smart contract-enabled WiFi hotspot system, which provides universal accessibility, cross-domain authentication, association of QoS and payment, and security enhancement. SmartWiFi utilizes a novel cryptographic mechanism, Hansa, to establish connection. Hansa provides low-cost off-chain execution by restricting otherwise unacceptable smart contract fees, and significantly reduces delays associated with smart contract interaction. To validate the feasibility of the SmartWiFi system, we designed and implemented a SmartWiFi prototype using an Ethereum smart contract. The experimental results show that SmartWiFi exhibits low operational delays, minimum communication overhead, and small blockchain fees. We demonstrated that SmartWiFi is a scalable, secure, and efficient WiFi hotspot solution, which can be easily deployed in a variety of systems with minimal intervention.

1.2: Research Scope

1.2.1: Attacks on Smart Contracts

Ethereum holds multiple billions of U.S. dollars in the form of Ether cryptocurrency and ERC-20 tokens, with millions of deployed smart contracts algorithmically operating these funds. Unsurprisingly, the security of Ethereum smart contracts has been under rigorous scrutiny.
In recent years, numerous defense tools have been developed to detect different types of smart contract code vulnerabilities. When opportunities for exploiting code vulnerabilities diminish, attackers resort to social engineering attacks, which aim to influence humans, often the weakest link in the system. The only known class of social engineering attacks in Ethereum is honeypots, which plant hidden traps for attackers attempting to exploit existing vulnerabilities, thereby targeting only a small population of potential victims. In this dissertation, we systematically explore the social engineering attacks in Ethereum smart contracts largely overlooked by existing research on smart contract security.

Another important aspect of blockchain security is identity management, which is often entrusted to hardware crypto wallets. Hardware wallets are designed to withstand malware attacks by isolating their private keys from cyberspace, but they are vulnerable to attacks that fake an address stored in a clipboard. To prevent such attacks, a hardware wallet asks the user to verify the recipient address shown on the wallet's display. Since crypto addresses are long sequences of random symbols, their manual verification becomes a difficult task. Consequently, many users of hardware wallets elect to verify only a few symbols in the address, and this can be exploited by an attacker. With this insight, we develop a new attack on hardware wallets and report it to the major manufacturers of the wallets.

1.2.2: Smart Contract Defense

The blockchain technology, initially created for cryptocurrency, has been re-purposed for recording state transitions of smart contracts, which are decentralized applications that can be invoked through external transactions. Smart contracts gained popularity and accrued hundreds of billions of dollars in market capitalization in recent years.
Unfortunately, like all other programs, smart contracts are prone to security vulnerabilities that have incurred multimillion-dollar damages over the past decade. As a result, many automated threat mitigation solutions have been proposed to counter the security issues of smart contracts. These threat mitigation solutions include various tools and methods that are challenging to compare. In this dissertation, we develop a comprehensive five-dimensional classification taxonomy of smart contract threat mitigation solutions and classify 133 existing threat mitigation solutions using our taxonomy.

Among other discoveries, our classification reveals a low coverage of known vulnerabilities by existing threat mitigation approaches. To increase the vulnerability coverage, we propose a new smart contract security testing approach called transaction encapsulation. The core idea lies in the local execution of transactions on a fully-synchronized yet isolated Ethereum node, which creates a preview of outcomes of transaction sequences on the current state of blockchain. However, this approach poses a critical technical challenge: the well-known time-of-check/time-of-use (TOCTOU) problem, i.e., the assurance that the final transactions will exhibit the same execution paths as the encapsulated test transactions. To overcome this challenge, we determine the exact conditions for guaranteed execution path replicability of the tested transactions. To demonstrate the transaction encapsulation, we implement a transaction testing tool, TxT, which reveals the actual outcomes (either benign or malicious) of Ethereum transactions. To ensure the correctness of testing, TxT deterministically verifies whether a given sequence of transactions follows an identical execution path on the current state of blockchain. We analyze over 1.3 billion Ethereum transactions and determine that 96.5% of them can be verified by TxT.
We further show that TxT successfully reveals the suspicious behaviors associated with 31 out of 37 vulnerabilities (83.8% coverage) in the smart contract weakness classification (SWC) registry. In comparison, the vulnerability coverage of all the existing defense approaches combined only reaches 40.5%.

1.2.3: Blockchain Efficiency and Applications

One of the most critical fundamental efficiency challenges of blockchain is the scalability of public blockchains. Public blockchains have spurred the growing popularity of decentralized transactions and smart contracts, especially on the financial market. However, public blockchains are limited in transaction throughput, storage availability, and compute capacity. To avoid transaction gridlock, public blockchains impose large fees and per-block resource limits, making it difficult to accommodate the ever-growing transaction demand. Previous research endeavors to improve the scalability and performance of blockchain through various technologies, such as side-chaining, sharding, secured off-chain computation, communication network optimizations, and efficient consensus protocols. However, these approaches have not attained widespread adoption due to their inability to deliver cloud-like performance in terms of the scalability of transaction throughput, storage, and compute capacity. In this dissertation, we address the scalability challenge of decentralized computation by using the Ethereum blockchain to secure execution of off-chain smart contracts on the cloud, thereby eliminating the data, computation, and transaction throughput limitations.

Another important application of the blockchain technology proposed in this dissertation is orchestration of decentralized WiFi hotspots. WiFi hotspots often suffer from mediocre security, unreliable performance, limited access, and cumbersome authentication procedures.
Specifically, public WiFi hotspots can rarely guarantee satisfactory speed and uptime, and their configuration often requires a complicated setup with subscription to a payment aggregator. Moreover, paid hotspots can neither protect clients against low quality or non-service after prepayment, nor do they provide an adequate defense against misuse by the clients. In this dissertation, we introduce a blockchain-assisted network of WiFi hotspots, which is not only decentralized but also maintains scalability and high performance.

1.3: Organization

The rest of this document is organized as follows. Part I presents the work in which we reveal overlooked attacks against blockchain and smart contracts. Chapter 2 introduces six zero-day social engineering attacks in Ethereum smart contracts. Chapter 3 elaborates on our new attack against hardware crypto wallets. Part II elaborates on the defense against security threats in smart contracts. Chapter 4 surveys existing state-of-the-art threat mitigation solutions for smart contracts. Chapter 5 introduces the new paradigm of context-aware user-based transaction testing. Part III presents novel applications and efficiency enhancements of blockchain. Chapter 6 addresses the blockchain scalability problem by enabling smart contracts on the cloud. Chapter 7 introduces a blockchain-based solution for a decentralized network of WiFi hotspots with cross-domain authentication and QoS enforcement. Chapter 9 summarizes this dissertation and outlines future directions.

CHAPTER 2: SOCIAL ENGINEERING IN ETHEREUM SMART CONTRACTS¹

2.1: Introduction

In one decade, the blockchain technology has emerged from a ledger of a barely known cryptocurrency to an entire industry with hundreds of billions of dollars in market capitalization. A major reason for its vast expansion is the ability to support smart contracts, decentralized programs that can enforce execution of protocols without any third party or mutual trust.
Moreover, smart contracts are used to store and transfer financial assets. For example, as of December 2020, the Tether USD smart contract had more than 2.1 million users with about $36 billion in daily transaction volume [28]. Like any other software, smart contracts have security vulnerabilities, manifested by recent hacks with multimillion-dollar damages [207, 226]. Moreover, a recent analysis of 420 million Ethereum transactions by Zhou et al. reveals an ongoing evolution of vulnerabilities and attacks in smart contracts [309].

To avoid devastating consequences of smart contract hacks, a number of security auditing tools have been developed to detect smart contract vulnerabilities [76, 96, 200, 271], such as reentrancy, integer overflow, etc., most of which are smart contract code vulnerabilities. However, smart contracts are designed and implemented by human developers to interact with human users, which makes the human the central component of a smart contract ecosystem. Yet, the existing smart contract security studies do not take the human factor into account. In this work, we aim to deliver the first human-centered study of smart contract security.

¹ This chapter is based on previously published work by Nikolay Ivanov, Jianzhi Lou, Ting Chen, Jin Li, and Qiben Yan titled “Targeting the Weakest Link: Social Engineering Attacks in Ethereum Smart Contracts”, published in the Proceedings of the 2021 ACM Asia Conference on Computer and Communications Security. DOI: 10.1145/3433210 [158].

Instead of targeting known code vulnerabilities, social engineering attacks exploit the cognitive bias of the human mind. Cognitive bias is an optimization function of the human brain that draws conclusions based on probability, expectation, previous experience, belief, or emotional response, especially when the input data is incomplete and/or decision time is limited [145].
One common technique exploiting cognitive bias is visual deception, which has been widely used in email phishing, e.g., via mimicking the appearance of a popular website [285] or International Domain Name (IDN) homograph attacks [150]. Another aspect of cognitive bias is confirmation bias, characterized by the rejection of evidence dissenting from the initially established belief or narrative [169]. A smart contract honeypot is one example of confirmation bias exploitation, in which the established narrative that the smart contract is vulnerable makes even experienced hackers overlook hidden traps.

The honeypot is the only known and documented social engineering attack type in Ethereum [270]. A honeypot is a smart contract that lures a hacker into exploiting a known vulnerability, but an insidious trap in this contract turns the hacker into a victim instead. Despite being a very effective attack class, the scope of potential victims of honeypots is narrow, i.e., skillful hackers who try to steal unprotected funds. In this work, we demonstrate that the Ethereum platform and the most popular smart contract programming language, Solidity, create a potential for evasive social engineering attacks.

Social engineering attacks have been carried out across a wide spectrum of technologies, from landline phones to corporate networks. When existing software and hardware defense reduces the attack surface, the adversaries resort to exploiting human cognitive bias, the weakest link in many security systems. To the best of our knowledge, this work presents the first investigation of the possibility, vectors, and impact of social engineering attacks in smart contracts, as well as defense against these attacks. Specifically, we attempt to answer the following three research questions.

RQ1: What are the Ethereum social engineering attack vectors? We analyze the exact aspects of human cognitive bias that can be exploited to carry out social engineering attacks in smart contracts.
Specifically, we discover several common misconceptions and undocumented behaviors of the Ethereum platform that create opportunities for a set of zero-day social engineering attacks.

RQ2: Are social engineering attacks in smart contracts feasible? Through our analysis, we identify two classes of social engineering deception: Address Manipulation and Homograph. Across these two categories, we develop six social engineering attacks. By integrating the patterns of these attacks into the source code of existing contracts with a large number of users and billions of dollars in market capitalization, we further show that these attacks could potentially target a large number of victims.

RQ3: What are the defenses against social engineering attacks in Ethereum? The human is not only the main target of social engineering attacks, but also an irreplaceable element of defense against these attacks. This prompts us to develop specific security recommendations for identification and prevention of social engineering attacks by users and auditors.

In summary, this dissertation delivers the following contributions:

• We identify two classes of social engineering attacks in Ethereum smart contracts, Address Manipulation and Homograph, and develop six zero-day attacks.
• We demonstrate the attacks by embedding them in the source code of five popular smart contracts with a combined market capitalization of over $29 billion, and show that the attacks have the ability to remain dormant during the testing phase and activate only after production deployment.
• We analyze 85,656 open source smart contracts and find 1,027 contracts that can be directly used for performing social engineering attacks.
• For responsible disclosure, we contact seven smart contract security firms. The survey of experts from these firms confirms that the proposed attacks are highly likely to be dangerous.
• In the spirit of open research, we make the source code of the attack benchmark, tools, and datasets available to the public².

² https://nick-ivanov.github.io/se-info/

2.2: Background

Smart Contracts and EVM: A smart contract is a program deployed on a blockchain that provides a set of functions to be called via transactions and executed by the blockchain's virtual machine (VM). Most smart contracts are written in a high-level special-purpose programming language, such as Solidity or Vyper, and compiled into the blockchain VM bytecode. The Ethereum Virtual Machine (EVM) is the blockchain VM for executing Ethereum smart contracts.

Externally Owned Account: The Ethereum blockchain has two types of accounts: smart contract accounts and externally owned accounts (EOAs). Both EOAs and smart contract accounts can be referenced by their 160-bit public addresses. EOAs can be used to call the functions of smart contracts via signed transactions.

ERC-20 Tokens: ERC-20 is the most popular standard for implementing fungible tokens³ in Ethereum smart contracts. Some of the most traded alternative cryptocurrencies (altcoins) are ERC-20-compatible smart contracts deployed on Ethereum Mainnet, such as ChainLink and Binance Coin. The ERC-20 standard defines an interface that a smart contract should implement in order to become an ERC-20 token and interact with ERC-20-compliant clients⁴.

OpenZeppelin Contracts: OpenZeppelin Contracts is a library of smart contracts that have been extensively tested for adherence to the best security practices. These smart contracts are considered to be the de-facto standardized implementations of popular smart contract code patterns. The OpenZeppelin project provides a rich codebase for ERC-20 token developers⁵.

EIP-55 Checksums: Developers of blockchain clients use checksums for validating public addresses. A checksum is a digital fingerprint of an address to ensure its validity and correctness.
In Ethereum, the checksum is embedded in the address by capitalizing certain hexadecimal letters, as described in the EIP-55 standard⁶. Specifically, if the i-th hexadecimal digit of the Keccak256 hash digest of the EIP-55 address string is ≥ 8, the i-th hexadecimal digit of the address is capitalized. The accuracy of EIP-55 error checking is nearly 99.986% [55].

³ Each fungible token has the same value and does not possess any special characteristics compared with other tokens of the same type.
⁴ https://eips.ethereum.org/EIPS/eip-20
⁵ https://openzeppelin.com/contracts/
⁶ https://eips.ethereum.org/EIPS/eip-55

Smart Contract Addresses: A smart contract address in Ethereum is generated using the deterministic function⁷ χ(A_d, η), where A_d is the public address of the account deploying the contract, and η is the nonce of the deploying transaction. η is always equal to the number of transactions sent from the deploying EOA. As a result, we can deterministically calculate the address of a future smart contract that will be deployed by a certain user.

EVM Function Selector: In EVM, when a smart contract function is called by an EOA or another smart contract, the called function is identified by its selector S_f as follows:

$$S_f = P_{32}\Bigl(H_k\bigl(\overbrace{\text{``}f(\alpha_1, \ldots, \alpha_n)\text{''}}^{\text{function header string}}\bigr)\Bigr),$$

where P_32 is a 32-bit prefix, H_k is the Keccak256 hash function, f is the function name, and α_1, ..., α_n is the list of argument types (0 ≤ n ≤ 16). For example, the selector of the function foo with a single 256-bit unsigned integer argument is P_32(H_k(“foo(uint256)”)) = 0x2fbebd38.

2.3: Threat Model

In this section, we give a general overview of social engineering attacks in Ethereum smart contracts by identifying their participants, vectors, goals, and outcomes.

2.3.1: Actors

Most known attacks in Ethereum smart contracts involve a hacker exploiting a smart contract vulnerability [55, 309].
In social engineering attacks, however, a reverse configuration takes place: the owner of the malicious smart contract is the attacker, and the victim of the smart contract is a person or organization who engages with this smart contract.

⁷ An implementation of this function can be found at https://github.com/ethereumjs/ethereumjs-util.

2.3.2: Social Engineering Attack Vectors

Here, we expose a number of social engineering attack vectors that are likely to be exploited. Essentially, all these vectors are misconceptions (false assumptions) about properties or behaviors of the Ethereum platform. We subdivide these misconceptions into two major categories: 1) misconceptions about Ethereum addresses, and 2) misconceptions related to strings and characters in EVM and Solidity.

Misconceptions About Addresses: An Ethereum public address is a 160-bit number using a 40-digit hexadecimal representation. Our analysis reveals that the following four false assumptions about Ethereum public addresses can be exploited in social engineering attacks.

• M1: Slight modification of an address (e.g., substitution of a single digit) is useless for an attacker because no one knows the private key associated with the modified address. In this dissertation, we demonstrate that the knowledge of the private key for an address is not always required for a successful social engineering attack.
• M2: EIP-55 checksums deliver a reliable protection against address falsification. In this work, we show that EIP-55 falsification is possible using a brute-force attack on a retail laptop or desktop computer.
• M3: An Ethereum address is associated either with an EOA, or a smart contract, and does not change its status. In this dissertation, we demonstrate that an EOA can mutate into a smart contract and vice versa.
• M4: All Ethereum accounts are equally secure as long as their private keys are random and secret.
In this dissertation, we show that a small portion of Ethereum accounts have a special property, making them more vulnerable to a specific social engineering attack.

Homograph Backdoors in Solidity: Falsification of typographic symbols, known as homograph or Unicode attacks, has been used in phishing scams [124, 150, 192]. These attacks mostly falsify domain names, and to the best of our knowledge, there are no recorded homograph attacks carried out inside the source code of a program. Surprisingly, our analysis of Solidity reveals the following three misconceptions that open dangerous backdoors to homograph attacks in Ethereum smart contracts.

• M5: Since the string returned by the ERC-20 symbol() function is optional and informational by design, it does not pose any danger. In this dissertation, we show that by falsifying the symbol of an ERC-20 token, an attacker can perform a social engineering attack.
• M6: Two identical arguments of call() or delegatecall() always result in the same 32-bit function selector. In this dissertation, we demonstrate that two visually identical arguments are capable of producing different function selectors, which leads to the execution of an unexpected function or transaction reversion due to the absence of a referenced function.
• M7: Function selector collision prevention by the Solidity compiler eliminates falsification of smart contract functions. In smart contracts, two functions with colliding selectors cannot coexist in one contract. In this dissertation, we show that it is possible to mine names of two functions with visually identical arguments of call() or delegatecall() routines that generate different selectors, thereby allowing these two functions to coexist in the contract. Consequently, unbeknownst to the transaction sender, a non-existent function might be called, resulting in transaction reversal; or a wrong function might be called, leading to unexpected code execution.
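The address and selector mechanics behind M2, M3, and M6 can be reproduced off-chain. The following is a minimal Python sketch, not the tooling used in this work. It rests on several assumptions: keccak256 is a from-scratch implementation of the original Keccak padding (Python's hashlib.sha3_256 uses a different padding byte and gives different digests); the helper names selector, eip55, and contract_address are illustrative; and contract_address assumes that χ(A_d, η) is the standard CREATE rule, i.e., the last 20 bytes of the Keccak256 hash of the RLP-encoded pair (deployer, nonce), which the text does not spell out. The sample address is the first test vector of the EIP-55 specification.

```python
MASK64 = (1 << 64) - 1


def _rol64(value, shift):
    # Rotate a 64-bit lane left by `shift` bits.
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def _keccak_f1600(lanes):
    # 24 rounds of the Keccak-f[1600] permutation on a 5x5 matrix of lanes.
    lfsr = 1
    for _ in range(24):
        # theta: mix column parities into every lane
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane and move it to a new position
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], _rol64(current, (t + 1) * (t + 2) // 2)
        # chi: non-linear step along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: round constant produced by an 8-bit LFSR
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def keccak256(data):
    # Keccak-256 as used by Ethereum: rate 136 bytes, padding byte 0x01.
    rate = 136
    padded = bytearray(data) + bytes(rate - len(data) % rate)
    padded[len(data)] ^= 0x01
    padded[-1] ^= 0x80
    lanes = [[0] * 5 for _ in range(5)]
    for offset in range(0, len(padded), rate):
        for i in range(rate // 8):
            chunk = padded[offset + 8 * i: offset + 8 * i + 8]
            lanes[i % 5][i // 5] ^= int.from_bytes(chunk, "little")
        lanes = _keccak_f1600(lanes)
    return b"".join(lanes[i][0].to_bytes(8, "little") for i in range(4))


def selector(signature):
    # 32-bit prefix of the hash of the function header string.
    return "0x" + keccak256(signature.encode("utf-8"))[:4].hex()


def eip55(address):
    # Capitalize the i-th hex letter when the i-th hash digit is >= 8.
    lower = address.lower()
    if lower.startswith("0x"):
        lower = lower[2:]
    digest = keccak256(lower.encode("ascii")).hex()
    return "0x" + "".join(ch.upper() if int(digest[i], 16) >= 8 else ch
                          for i, ch in enumerate(lower))


def contract_address(deployer, nonce):
    # Assumed CREATE rule: keccak256(rlp([deployer, nonce]))[12:].
    if nonce == 0:
        enc_nonce = b"\x80"
    elif nonce < 0x80:
        enc_nonce = bytes([nonce])
    else:
        raw = nonce.to_bytes((nonce.bit_length() + 7) // 8, "big")
        enc_nonce = bytes([0x80 + len(raw)]) + raw
    payload = b"\x94" + bytes.fromhex(deployer[2:]) + enc_nonce
    return "0x" + keccak256(bytes([0xC0 + len(payload)]) + payload)[12:].hex()


# The text reports 0x2fbebd38 for this header string.
print(selector("foo(uint256)"))
# Same-looking header with a Cyrillic "o" (U+043E) in the function name.
print(selector("f\u043eo(uint256)"))
print(eip55("0x5aaeb6053f3e94c9b9a09f33669435e7ef1beaed"))
print(contract_address("0x5aaeb6053f3e94c9b9a09f33669435e7ef1beaed", 0))
```

The two header strings printed above render identically in many fonts yet hash to unrelated selectors, which is the effect that M6 and M7 rely on. Likewise, contract_address needs only public information (deployer address and nonce), so the address of a not-yet-deployed contract can be computed in advance, as M3 requires.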
2.3.3: Attack Goals and Outcomes

Although some Ethereum attackers may pursue vandalism as the primary goal (e.g., via “funds freeze”), in this work, we assume that the ultimate objective of the attacker is to steal funds from victims. All social engineering attacks covered in this study are based on the premise that the attacker is the owner or privileged user of the smart contract⁸, which creates a broad range of possibilities for stealing funds. For example, many contracts implement the selfdestruct procedure, which allows the owner to appropriate the entire balance of the contract by submitting a single transaction.

⁸ In Ethereum, the implementation of smart contract ownership is the developer's responsibility. Zhou et al. [309] report more than 2 million contracts with ownership implemented using the OpenZeppelin Ownable abstract class and onlyOwner modifier.

Table 2.1: Social engineering attacks in Ethereum smart contracts.

Attack Class         | Social Engineering Attack and Brief Description                                                            | Misconceptions
---------------------|------------------------------------------------------------------------------------------------------------|---------------
Address Manipulation | A1: Replace EOA with a non-payable contract address to incur transfer failure and revert transaction       | M1, M2
Address Manipulation | A2: Pre-calculate a future contract address and replace EOA with a non-payable contract at this address    | M3
Address Manipulation | A3: Exploit EVM's EIP-55 checksum insensitivity in address comparison                                      | M4
Homograph            | A4: Use dynamically-injected homograph string in a branching condition                                     | M5
Homograph            | A5: Replace inter-contract call (ICC) header with identically looking one to call a non-existing function  | M6, M7
Homograph            | A6: Suppress EVM exception by mining a function that matches a tampered ICC header                         | M6, M7

Moreover, as of early December 2020, Etherscan reports more than 342,000 ERC-20 smart contracts, which have a variety of operations with tokenized funds, such as minting, burning, approved transfer, etc.
For example, in the Tether USD stablecoin token, which is worth over $19 billion, the owner can call the deprecate function of the contract, effectively replacing the functionality of the smart contract with arbitrary code. Subsequently, it would take only a few minutes for the contract owner to steal all the tokens and exchange them for Ether, at which point no existing defense can revert the theft of funds. Essentially, when the attacker is the owner of the smart contract, it is unnecessary to implement the malicious transfer of funds within the call stack of the transaction submitted by the victim. Instead, the attacker may prefer to accrue a sufficient sum by blocking fund withdrawals, and acquire the entire balance afterwards. Such an approach makes the malicious patterns more stealthy than an immediate transfer of stolen funds.

2.4: Social Engineering Attacks

In this section, we introduce six Ethereum social engineering attacks grouped into two classes, as shown in Table 2.1. The Address Manipulation class allows attackers to strategically exploit Ethereum public addresses, which empowers attacks A1, A2, and A3. The Homograph class, which takes advantage of the fact that many fonts have identically looking symbols with different codes, includes attacks A4, A5, and A6. The implementations of all the six attacks are available at https://nick-ivanov.github.io/se-info/.

    contract BaseToken is Context, ERC20, ERC20Detailed {
        uint256 tokenPrice = 100 wei;

        // Mint tokens for any Ether sent at deployment.
        constructor() public payable ERC20Detailed("BaseToken", "BT", 18) {
            _mint(_msgSender(), SafeMath.div(msg.value, tokenPrice));
        }

        // Deposit Ether and mint the corresponding number of tokens.
        function buyTokens() public payable {
            _mint(_msgSender(), SafeMath.div(msg.value, tokenPrice));
        }

        // Burn tokens and return the corresponding amount of Ether.
        function sellTokens(uint256 amount) public {
            _burn(_msgSender(), amount);
            address(msg.sender).transfer(SafeMath.mul(amount, tokenPrice));
        }
    }

Figure 2.1: Implementation of the Base Token, which is used to demonstrate the six social engineering attacks.

Base Token: We demonstrate all the six attacks by altering the implementation of the smart contract called Base Token (see Fig. 2.1). This contract is an Ether-collateralized ERC-20 token, which means that the supply of tokens in the contract is backed by its Ether balance, allowing users to swap (i.e., buy and sell) the tokens using Ether. We implement Base Token using the OpenZeppelin ERC-20 prototype with two additional methods:

• the buyTokens method deposits Ether in the smart contract and mints (issues) tokens corresponding to the deposited amount;
• the sellTokens method burns (destroys) user tokens and transfers the corresponding amount of Ether to the caller.

2.4.1: Address Manipulation

Address Manipulation attacks exploit cognitive biases and misconceptions about equality, format, referenced objects, derivation methods, and other properties of Ethereum public addresses. In this section, we propose three social engineering attacks: A1, A2, and A3.

Figure 2.2: Attack A1 workflow.

Attack A1: This attack covertly substitutes an EOA address with a similar smart contract address, which allows the attacker to block funds withdrawal and subsequently acquire the funds. In the A1 attack, the attacker deploys a smart contract with two sequential Ether transfers within the call stack of one transaction. The first transfer looks like a fee collection, while the second transfer is a fund transfer to the user. The attacker deceives a victim into believing that the first transfer goes to an EOA, whereas the real destination is a smart contract without a payable fallback function.
Therefore, the transfer fails, and the funds (deposited by the users earlier) remain in the malicious contract, where they are available to the attacker for subsequent withdrawal through contract self-destruction, deprecation, or a similar mechanism.

Essentially, the attacker exploits the fact that almost any unused sequence of 40 hexadecimal digits is a valid EOA address, even if its corresponding private key is unknown. Particularly, if a few symbols in an address are replaced or swapped, the resulting address will still be a valid Ethereum EOA, which can accept incoming Ether transfers. In the A1 attack, as shown in Fig. 2.2, the adversary deploys a malicious smart contract C_A. The variable feeAddress in this contract is initialized with an EOA address A1. Also, each fund transfer to the user is preceded by another transfer of a small fee to the address stored in feeAddress. This creates a perfect illusion that the smart contract was deployed to profit from service fees. However, the real purpose of the contract is to lure the user to make a deposit and block any attempt to withdraw the funds. To achieve that, we introduce another public address A2, derived from address A1 by either changing one symbol or swapping neighboring symbols to make the two addresses visually similar. The manipulated address must maintain a valid checksum that collides with the checksum of the original address, reassuring the user that the address is the one seen in the constructor. We find that mining such an address pair takes only a few seconds⁹, and thus demonstrate the incorrectness of M2.

Address A2 belongs to a pre-existing smart contract C_aux, which does not have a payable fallback function. The attacker sets the value of feeAddress to A2. Due to the addresses' visual similarity, the user deposits funds with the assumption that the fees go to A1. However, the withdrawal fails due to an attempt to send fees to an unpayable smart contract.
For further deception, the attacker can generate a history of successful fee transfers from the smart contract to address A1, deceiving the users into believing that the smart contract is actively receiving successful fee payments. This deepens the users' confirmation bias that complies with the attacker's deceptive narrative.

The attack workflow in Fig. 2.2 includes four layers of deception that give the victim several clues aligned with the same narrative (i.e., the contract is a fair for-profit scheme), thereby exploiting the victim's confirmation bias. The first layer of deception is that the smart contract does not reveal its deceptive nature during a test deployment: if a user compiles and deploys this smart contract for testing, the scheme will support the deceptive narrative because the test deployment cannot predict that the owner would change the value of feeAddress to the address of a non-payable smart contract. The second layer of deception comes from the deployment-time initialization of the feeAddress variable: by examining this address, the victim finds a history of fair transactions. The third layer of deception is delivered through keeping the feeAddress variable private, which prevents the victim from easy retrieval of its current value, as it requires a laborious effort of parsing binary transaction data. The fourth layer of deception targets a user who manages to retrieve the current value of feeAddress. Since this value is visually similar to the initialization address, the victim is likely to conclude that the original address is in use.

⁹ Our address miner is available at https://github.com/nick-ivanov/se-tools

Attack A2: This attack intercepts a client deposit event and immediately deploys an auxiliary malicious smart contract at an EOA address for stealing funds accrued via blocked withdrawals. The key idea is to mislead the user by runtime replacement of what an address points to.
The attack utilizes a more sophisticated method that dynamically changes the object referenced by an address. Here, we discover a peculiar combination of two facts about Ethereum that leads to the incorrectness of M3: a) the address of a future, not yet deployed, smart contract is predictable; b) prior to deployment, the address of the future smart contract has the status of a legitimate EOA. Recall from Section 2.2 that a smart contract address is generated from the address of the deploying EOA and the transaction count (nonce) of this EOA.

Fig. 2.3 illustrates the workflow of attack A2. Smart contract A is disguised as a fair for-profit scheme, in which the owner charges fees per fund withdrawal. The fee recipient address is hard-coded in the smart contract and set as a constant, which fuels the confirmation bias supporting the notion of permanence of this address. For normal operation, this address should accept incoming funds, which means that it should either be an EOA or a smart contract with a payable fallback function. When the user makes a deposit, an event is emitted, which is intercepted by a server belonging to the attacker (the owner of smart contract A). Upon detection of the event, the attacker deploys smart contract B at the address Af. The fee collector address Af is crafted so that the attacker knows the private key of the account Ad from which contract B is deployed, i.e., Af = χ(Ad, η) (see Section 2.2). The fee transfer to address Af now fails because smart contract B has no payable fallback function. As a result, the previously deposited funds remain in the contract for subsequent acquisition by the attacker.

Figure 2.3: Attack A2 workflow.

Attack A3: The attack leverages the overlap between lower-case and mixed-case EIP-55 addresses to misguide users into locking their funds in the smart contract for subsequent acquisition by the attacker.
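Returning to attack A2, the predictability of a future contract address can be sketched in a few lines. The sketch assumes the standard derivation for contracts created by an EOA, namely the last 20 bytes of the Keccak-256 digest of the RLP-encoded pair (deployer address, nonce), which is what χ(Ad, η) denotes above. The function names and the sample deployer address are illustrative and are not taken from our tooling. Keccak-256 is implemented inline because Python's hashlib.sha3_256 uses a different padding byte.

```python
MASK64 = (1 << 64) - 1

def _rol64(value, shift):
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64

def _keccak_f1600(lanes):
    lfsr = 1
    for _ in range(24):
        # theta: mix column parities into every lane
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate lanes and move them to new positions
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], _rol64(current, (t + 1) * (t + 2) // 2)
        # chi: non-linear step along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: round constant generated by an 8-bit LFSR
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes

def keccak256(data: bytes) -> bytes:
    """Original Keccak-256 (padding byte 0x01), as used by Ethereum."""
    rate = 136  # bytes absorbed per permutation call
    pad_len = rate - len(data) % rate
    padded = bytearray(data) + b"\x01" + bytes(pad_len - 1)
    padded[-1] |= 0x80
    lanes = [[0] * 5 for _ in range(5)]
    for off in range(0, len(padded), rate):
        block = padded[off:off + rate]
        for i in range(rate // 8):
            lanes[i % 5][i // 5] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        lanes = _keccak_f1600(lanes)
    return b"".join(lanes[i % 5][i // 5].to_bytes(8, "little") for i in range(4))

def rlp_encode_bytes(item: bytes) -> bytes:
    """RLP encoding of a short byte string (under 56 bytes is enough here)."""
    if len(item) == 1 and item[0] < 0x80:
        return item
    assert len(item) < 56
    return bytes([0x80 + len(item)]) + item

def predict_contract_address(deployer_hex: str, nonce: int) -> str:
    """Address of the contract that `deployer_hex` creates at transaction count `nonce`."""
    raw = deployer_hex[2:] if deployer_hex.startswith("0x") else deployer_hex
    deployer = bytes.fromhex(raw)
    nonce_bytes = nonce.to_bytes((nonce.bit_length() + 7) // 8, "big")  # 0 -> b""
    payload = rlp_encode_bytes(deployer) + rlp_encode_bytes(nonce_bytes)
    encoded = bytes([0xC0 + len(payload)]) + payload  # short-list RLP prefix
    return "0x" + keccak256(encoded)[12:].hex()

# Hypothetical deployer account: the attacker can list future contract
# addresses before any deployment takes place.
deployer = "0x" + "11" * 20
for n in range(3):
    print(n, predict_contract_address(deployer, n))
```

Because the derivation depends only on public values, the attacker can hard-code the fee address long before contract B exists, and until the deployment that address behaves like an ordinary EOA.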
In attack A3, the attacker provides the user with a personal smart contract and a set of seemingly random test Ethereum accounts. When a smart contract has hard-coded addresses or other account-specific values, it is a common practice to provide users with test accounts to demonstrate the functionality of a smart contract [55]. Since all accounts are assumed to have the same set of properties, the user believes that any account will behave in the same way as the test accounts, which we found is not always true. Essentially, attack A3 exploits M4, i.e., the belief that the secrecy of the private key solely determines the security of an Ethereum account. The key to this attack is the generation of accounts with all-lowercase EIP-55 checksums. Using a random guessing approach, we verify that the probability of generating an EIP-55 address with an all-lowercase checksum is about 0.0246%.

One-time-password validation is a common supplemental authorization technique in smart contracts (footnote 10). The smart contract owner can generate an authentication hash of the user address and the corresponding user password, and store this hash in the smart contract. In this attack, the adversary creates such a password validation routine in the smart contract, and offers the user several test accounts for verification of functionality. However, the test set consists only of deliberately mined accounts with all-lowercase EIP-55 checksums. In this smart contract, the fund transfer function is preceded by a password validation, which invokes an address conversion function that translates the address of the transaction sender into an all-lowercase string (e.g., strAddrHash in Fig. 2.4).

Footnote 10: Sample password-based authorization can be found in these contracts: 0x0f82C7EAb8F7efB577A2DE9d2B7e1Da1d0b6870e and 0x13407d93F343148bf03eaCf482441dD526cD7EbD.
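The reported probability can be cross-checked with a back-of-the-envelope calculation. The sketch assumes the EIP-55 rule (a hexadecimal letter a-f is capitalized when the corresponding nibble of the address hash is 8 or greater) and treats address digits and hash nibbles as independent and uniformly distributed. Both are modelling assumptions, not a description of our address miner.

```python
from fractions import Fraction

HEX_DIGITS = 40  # an Ethereum address has 40 hexadecimal digits

# A digit is a letter (a-f) with probability 6/16, and EIP-55 capitalizes a
# letter when the matching hash nibble is >= 8, i.e. with probability 1/2.
p_upper = Fraction(6, 16) * Fraction(1, 2)  # 3/16 per digit

# The checksummed form is all lowercase only if no digit is capitalized.
p_all_lower = (1 - p_upper) ** HEX_DIGITS

print(f"P(all-lowercase checksum) = {float(p_all_lower):.4%}")
print(f"expected random attempts  = {float(1 / p_all_lower):.0f}")
```

Under these assumptions the estimate is roughly 0.025%, which is in line with the empirically measured value of about 0.0246% and explains why a handful of such test accounts can be mined quickly.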
    bytes32 constant authHash =
        0x8e69860da968defb8d06a7e565e5d76e3e878a01473a0cb191a0eda120323ca5;

    // Hashes the sender address (converted to an all-lowercase string) with the password.
    function strAddrHash(address _addr, string memory _pass)
        private pure returns (bytes32) {
        return keccak256(abi.encodePacked(addr2Str(_addr), _pass));
    }

    function sellTokens(uint256 amount, string memory password) public {
        if (strAddrHash(msg.sender, password) == authHash) {
            _burn(_msgSender(), amount);
            address(msg.sender).transfer(SafeMath.mul(amount, tokenPrice));
        }
    }

Figure 2.4: Code snippet from function sellTokens in A3 attack.

Using the test accounts, the smart contract works as expected. After the testing, the user creates a production authentication hash by concatenating their public address (copied from the wallet) and a secret password. This production account cannot be tested without revealing the password through the open network of the public blockchain. Unexpectedly, an attempt to withdraw the funds fails because the password validation fails, which is caused by the disparity in the address capitalization.

Fig. 2.4 demonstrates an example of attack A3. The authHash constant stores the Keccak256 digest of the user address 0xe6c700856796524501438d7197497c14bceac297 concatenated with the password ASIACCS2021. The attacker offers the user the private keys of several test accounts, whose public addresses’ EIP-55 checksums are all lowercase. These test accounts work as expected. But when the users initiate transactions with their real addresses, the password validation fails, since authHash incorporates the address with the checksum in mixed-case letters, while strAddrHash generates the hash using the same address in all-lowercase letters. This failed validation prevents the user from selling tokens. This attack demonstrates that some accounts can be more vulnerable than others, effectively defying misconception M4.
2.4.2: Homograph Visual Cognitive Deception

The homograph attacks in smart contracts are enabled by the existence of symbols that look identical or very similar, whereas most text editors (except hex viewers) are unable to reveal the difference. We surveyed security experts from seven smart contract auditing firms (listed in Section 2.6.5) about the usage frequency of hex viewers in their auditing process. The survey results show that only 1 out of 7 companies usually uses hex viewers, 2 of them use hex viewers sometimes, while the rest never or rarely use them. Here, we define two words or letters that contain identical-looking symbols with different codes as a pair of homograph twins.

The Homograph class of social engineering attacks leverages the fact that, although Solidity prohibits Unicode symbols in the names of functions and variables, it allows these symbols to appear in string literals that determine branching and inter-contract calls. In this section, we introduce three Homograph attacks: A4, A5, and A6.

Attack A4: The attack leverages homograph twins in a string matching pattern to craft a malicious smart contract. Specifically, the attacker crafts a smart contract in which a homograph string is used in a branching condition, which leads to unexpected code execution.

    if (stringsEqual(symbol(), "BT")) {
        _burn(_msgSender(), amount);
        address(msg.sender).transfer(SafeMath.mul(amount, tokenPrice));
    }

Figure 2.5: A snippet of the sellTokens function in A4 attack.

Fig. 2.5 demonstrates attack A4, with the attack code embedded in the sellTokens() function. The stringsEqual() function performs string matching by comparing the hashes of two strings (footnote 11). The literal BT is made of two ASCII characters, but the symbol() return value, although visually identical to the literal BT, has the symbol T substituted with its homograph twin from the Cyrillic symbol set.
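Because homograph twins are invisible in most editors, a reviewer can instead flag string literals that contain characters outside the printable ASCII range. The following is a minimal sketch of such a check. The regular expression, the function name find_suspicious_literals, and the sample source are illustrative assumptions, and the check is deliberately simplified (it only looks at double-quoted literals on a single line).

```python
import re
import unicodedata

# Double-quoted literal on one line, allowing backslash escapes.
STRING_LITERAL = re.compile(r'"((?:[^"\\\n]|\\.)*)"')

def find_suspicious_literals(source: str):
    """Return (literal, [(char, code point, Unicode name), ...]) for every
    literal that contains a character outside printable ASCII."""
    findings = []
    for match in STRING_LITERAL.finditer(source):
        literal = match.group(1)
        odd = [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
               for ch in literal if not (0x20 <= ord(ch) <= 0x7E)]
        if odd:
            findings.append((literal, odd))
    return findings

# The second literal renders like "BT" but ends with a Cyrillic letter;
# the third one looks empty but holds a zero-width space.
sample = (
    'if (stringsEqual(symbol(), "BT")) { ... }\n'
    'string name = "B\u0422";\n'
    'string empty = "\u200b";\n'
)
for literal, odd in find_suspicious_literals(sample):
    print(repr(literal), [(code, name) for _, code, name in odd])
```

A check of this kind does not decide whether a literal is malicious; it only narrows down the places where a hex-level inspection is worthwhile.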
Since the value of symbol() is mutable, the smart contract does not contain any explicitly malicious code; however, it turns malicious when the token symbol value is changed. As a result, the branching condition turns false, and the sale of tokens never occurs, which demonstrates the importance of the token symbol and thus refutes misconception M5.

Footnote 11: Solidity does not have any embedded or library string matching function. As the Keccak256 digest is an EVM opcode function with relatively low gas cost, comparing string hashes is the de facto standard string comparison approach.

Attack A5: This attack replaces the header of a function with its homograph twin to cause unexpected inter-contract call failures. Code reuse has been one of the best practices of smart contract development, as it reduces implementation time and the frequency of programming errors. Code reuse can be either static or dynamic. A typical example of static code reuse is inheriting classes from the OpenZeppelin Contracts library. EVM also supports dynamic code reuse, in which one smart contract calls functions of another contract deployed on the same blockchain. Dynamic code reuse reduces the utilization of blockchain storage and achieves native inter-contract communication (ICC). It is known that if a function is specified incorrectly in an ICC call, the fallback function (footnote 12) of the smart contract will be invoked instead [57]. However, if the fallback function is absent, the call to a non-existent function triggers an EVM exception with subsequent transaction reversal, which attack A5 exploits by falsifying the function selector of an ICC call.

Figure 2.6: Attack A5 workflow.

Fig. 2.6 demonstrates the general idea of attack A5. During an ICC call, when an expected function in the destination smart contract is not found, and with no fallback routine implemented, the call will unexpectedly fail, and the transfer of funds to the client will not be executed.
The proposed A5 attack substitutes one or several letters in the function header string with homograph twins, and as a result, the generated function selector will not match any existing function, leading to the ICC call failure. Fig. 2.7 shows the sellTokens function of A5 attack. We create and deploy an additional smart contract called Helper (see Fig. 2.8), whose address is hard-coded in the BaseToken contract. The Helper smart contract has a log function for event logging. However, the string “log(address)” contains letters substituted with their homograph twins, and therefore the ICC call fails. Thus, the subsequent fund transfer to the caller never happens. This example demonstrates that visually identical arguments of call() and delegatecall() routines can indeed produce different selectors, proving the incorrectness of M6.

Footnote 12: In Ethereum smart contracts, the fallback function is an optional nameless function designed to be a default interface of a smart contract.

    // The signature literal contains homograph twins (visually identical to ASCII).
    bytes memory payload = abi.encodeWithSignature(
        "log(address)", msg.sender);
    bool success = address(helperAddress).call(payload);
    if (success) {
        _burn(_msgSender(), amount);
        address(msg.sender).transfer(SafeMath.mul(amount, tokenPrice));
    }

Figure 2.7: Code snippet from function sellTokens in A5.

    mapping(address => uint256) private lastSell;

    function log(address a) public {
        require(msg.sender ==
            0x0EFb5DE6AddAdDE835CEaadaAB1992590d7588F5);
        lastSell[a] = block.number;
    }

Figure 2.8: A code snippet of the Helper contract used in A5.

Attack A6: The previous attack has one major weakness: although nothing in the code looks suspicious, the status check of the ICC call may prompt a cautious user to set up a test deployment to check whether the call succeeds or not. Our next attack provides a deceptive technique to pass such a test.
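Before turning to A6, the selector mismatch exploited by attack A5 can be reproduced offline. The sketch assumes the standard ABI rule that a function selector is the first four bytes of the Keccak-256 digest of the UTF-8 encoded signature string. The helper names and the choice of the Cyrillic letter U+043E as the twin of the Latin letter "o" are illustrative. Keccak-256 is implemented inline because Python's hashlib.sha3_256 uses a different padding byte.

```python
MASK64 = (1 << 64) - 1

def _rol64(value, shift):
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64

def _keccak_f1600(lanes):
    lfsr = 1
    for _ in range(24):
        # theta: mix column parities into every lane
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate lanes and move them to new positions
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], _rol64(current, (t + 1) * (t + 2) // 2)
        # chi: non-linear step along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: round constant generated by an 8-bit LFSR
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes

def keccak256(data: bytes) -> bytes:
    """Original Keccak-256 (padding byte 0x01), as used by Ethereum."""
    rate = 136  # bytes absorbed per permutation call
    pad_len = rate - len(data) % rate
    padded = bytearray(data) + b"\x01" + bytes(pad_len - 1)
    padded[-1] |= 0x80
    lanes = [[0] * 5 for _ in range(5)]
    for off in range(0, len(padded), rate):
        block = padded[off:off + rate]
        for i in range(rate // 8):
            lanes[i % 5][i // 5] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        lanes = _keccak_f1600(lanes)
    return b"".join(lanes[i % 5][i // 5].to_bytes(8, "little") for i in range(4))

def selector(signature: str) -> str:
    """First four bytes of the signature hash, as a hex string."""
    return keccak256(signature.encode("utf-8"))[:4].hex()

ascii_sig = "foo()"
twin_sig = "f\u043e\u043e()"  # both letters "o" replaced by Cyrillic look-alikes

print(ascii_sig, "->", "0x" + selector(ascii_sig))
print(twin_sig, "->", "0x" + selector(twin_sig))

# A selector has only 32 bits, so a random signature hits a given selector
# with probability 2**-32; this is what makes collision mining affordable.
print("P(hit a given selector) =", 1 / 2**32)
```

The two signatures look the same on screen but hash to different selectors, so a Helper contract that only defines the ASCII function will not be matched by a call built from the twin string.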
Attack A6 leverages potential collisions of Ethereum function selectors, whose length is only 32 bits, to ensure a successful status from a deceptive ICC call. Assuming a uniform distribution of function selectors, the probability of collision with another function (i.e., two functions have the same selector) is approximately 2.33 × 10^-10. We run an experiment to show that it only takes a few hours on average for an office computer to find a collision (footnote 13). In attack A6, the attacker crafts a function whose selector collides with the selector of the homograph twin of the expected function. Since the called function actually exists, the transaction succeeds, which further fuels the confirmation bias of the victim supporting the deceptive narrative crafted by the attacker.

Footnote 13: Generally, the larger the number of symbols available for homograph substitution in the function header, the less time it takes to mine a collision.

Figure 2.9: Workflow in the A6 attack.

The Solidity compiler terminates with an error if it encounters two functions with the same selector in one smart contract. Attack A6 avoids this issue by replacing a function header with its homograph twin. In the workflow of the attack, presented in Fig. 2.9, smart contract A implements a call to a function in smart contract B. When B is compiled, the string header of the function foo is translated into the 32-bit selector 0xc2985578. However, if we substitute both letters “o” in the string “foo()” with their homograph twins, the compiler translates the modified header into the selector 0x3293f02a. Now, the attacker uses a collision search algorithm to mine the function name bar821770037, whose selector is also 0x3293f02a. As a result, foo and bar821770037 can coexist in contract B, even though both correspond to a visually identical argument of delegatecall, i.e., "foo()" (see Fig. 2.9), effectively refuting M7.
After the homograph substitution, unbeknownst to the user, bar821770037 will be called instead of foo, which will return a successful status but break the anticipated code logic in contract A.

Figs. 2.10 and 2.11 demonstrate an example of the A6 attack.

    // The signature literal contains Cyrillic homograph twins.
    bytes memory payload = abi.encodeWithSignature(
        "accountRegistered(address)", msg.sender);
    (bool success, bytes memory result) =
        address(helperAddress).delegatecall(payload);
    require(success);
    if (abi.decode(result, (bool)) == true) {
        _burn(_msgSender(), amount);
        address(msg.sender).transfer(SafeMath.mul(amount, tokenPrice));
    }

Figure 2.10: Code snippet from function sellTokens in A6.

    function afterBlock29410106(bool deadlineCheck)
        public view returns (bool) {
        if (block.number > 29410106 && deadlineCheck) {
            return true;
        }
        return false;
    }

    function accountRegistered(address a) public pure returns (bool) {
        return a == mainAccount || a == backupAccount;
    }

Figure 2.11: A snippet of the Helper contract used in A6 attack.

The Helper smart contract includes two functions, accountRegistered and afterBlock29410106. Since block number checks are common in Ethereum smart contracts (footnote 14), the presence of an auxiliary function with this name is unlikely to raise any suspicion. The string “accountRegistered(address)” (Fig. 2.10) contains Cyrillic letters (letters 1, 2, 3, and 16 are replaced). We use a brute-force algorithm to mine the name afterBlock29410106, whose function selector collides with a homograph twin of “accountRegistered(address)”. Surprisingly, we discover that the functions afterBlock29410106 and accountRegistered can accept arguments of different types: the call will still succeed regardless of the argument types, as long as the number of arguments in the two functions is consistent. This undocumented behavior of EVM adds an additional layer of disguise to the attack.
In the end, afterBlock29410106 is called instead of the expected function accountRegistered. Unlike in attack A5, the success variable is now true. However, the user’s fund transfer does not happen despite the successful return status, because the function’s return value is not the expected one.

Footnote 14: For example, contract 0xb68c88283b558cdc38c75c07bbc0d6921ef40fc7 uses a block number check to determine the contract initialization deadline.

Table 2.2: Five popular tokens into which we successfully integrate social engineering attack patterns.

    Smart Contract        Market Capitalization†   Integrated
                          (×$1 billion)            Attack Pattern
    --------------------  -----------------------  --------------
    Tether USD (USDT)     19.76                    A4
    Binance (BNB)         4.6                      A5
    ChainLink (LINK)      3.94                     A1
    Bitfinex (LEO)        1.32                     A6
    CryptoKitties (CK)    n/a                      A1 + A2

    † Approximate rounded averages as of early December 2020.

2.5: Case Study of Real-world Smart Contracts

One of the most important questions of this work is whether the six social engineering attacks can be used in real-world smart contracts. To answer this question, we choose the source code of five smart contracts that meet the following criteria: a) they represent a popular use case of a smart contract; b) they have thousands of active users; c) they have high market capitalization (i.e., the users entrust them with their funds); d) the contracts implement one of the standard use cases from the OpenZeppelin Contracts library. Then, we slightly modify the source code of these contracts to integrate the social engineering attacks into them without altering any functionality or incorporating any unsafe practices or known vulnerabilities. This way, we demonstrate that popular trusted smart contracts are capable of delivering the social engineering attacks. After integrating the attack patterns into the source code of the five contracts, we deploy the contracts on the Ropsten testnet and validate their expected functionality.
Then, we simulate the production deployment of the contracts and demonstrate that some transactions that worked during testing fail once the attack functionality is activated (e.g., deployment of a contract at an EOA address in attack A2). For each case, we make sure that: a) the attacks remain dormant during the test stage and activate only in a production deployment; b) the attacks visually conceal themselves from the auditor; and c) each attack has a rational disguise (e.g., pretending to profit from charging service fees). Table 2.2 summarizes the five smart contracts and the attack patterns integrated into them. Video demonstrations of all five cases are available at https://nick-ivanov.github.io/se-info/. The source code files of the entire smart contract set are available at https://github.com/nick-ivanov/social-engineering-big5.

Production Deployment Simulation: Our manual analysis of the source code of popular contracts reveals that most of them use the OpenZeppelin Contracts templates with some custom additions. In our case study, we demonstrate the feasibility of integrating attack code into an existing token without breaking the security patterns and functionality delivered by the OpenZeppelin Contracts library. The manipulated token can be advertised as a new cryptocurrency with additional features, such as special VIP privileges for early adopters. For ethical reasons, we perform both testing and production deployment simulation using the Ropsten testnet, whose smart contract execution is identical to that of the Mainnet but does not involve real funds. To simulate a production deployment of a malicious contract by an adversary, we deliberately configure the same contracts with different constructor arguments (e.g., replace a letter of the token symbol with its homograph twin), or submit additional transactions (e.g., deploy a smart contract at a hard-coded EOA address).
This effectively simulates the activation of previously dormant malicious functionality in a production deployment. Here, we provide a high-level overview of the integration of the five attack patterns.

Integration of A4 pattern in Tether USD Stablecoin: A stablecoin is a fungible token pegged to the market price of a fiat currency (e.g., U.S. dollar). Adopted mainly by crypto exchanges, mainstream stablecoins have very high market capitalizations and daily transaction volumes. Tether USD (USDT), the most popular stablecoin, is an ERC-20 smart contract deployed on Ethereum (footnote 15). We integrate the pattern of attack A4 into the source code of USDT by adding a seemingly harmless check of the token symbol before each transfer. We test the code by confirming that the transfer routine’s functionality remains unchanged. After that, we simulate a production deployment of the code with an invisible modification of the token symbol, which is passed through the constructor. As a result, the smart contract traps user tokens due to the tampered token symbol.

Integration of A5 pattern in Binance Token: The Binance Token (BNB) (footnote 16) is a popular ERC-20 altcoin with a high market capitalization and daily transaction volume, collateralized by the financial assets of Binance, a large crypto exchange. We integrate the pattern of attack A5 into the source code of the BNB token by adding an innocent-looking logging routine, which saves the transfer record in another smart contract. In the test, the code performs logging as expected. However, in the final deployment, the owner replaces one letter in the ICC header of the logging function with a homograph twin. The log call then throws an exception, causing the fund transfer to users to fail.

Footnote 15: Deployed at 0xdAC17F958D2ee523a2206206994597C13D831ec7.
Footnote 16: Deployed at 0xB8c77482e45F1F44dE1745F52C74426C631bDD52.
Integration of A1 pattern in ChainLink Token: A blockchain oracle is a service that delivers reliable outside information into the context of a smart contract. Collateralized by its business assets, ChainLink issues an ERC-20 token with the symbol LINK (footnote 17), into the source code of which we integrate the pattern of attack A1. In this token, we use a special user role, the VIP user, who can transfer funds at any time, while the remaining users can only transfer funds after a pre-determined deadline. The test run does not reveal any issues, but in the production deployment, the malicious smart contract owner mines a similar public address with the same EIP-55 checksum as the legitimate VIP user address, and saves this address in the smart contract. As a result, the VIP user, who does not recognize the address falsification, will fail to transfer funds from the smart contract.

Integration of A6 pattern in Bitfinex Token: The Bitfinex LEO token, also known as UNUS SED LEO (footnote 18), is backed by the assets of the Bitfinex crypto exchange. In this token, an auxiliary helper smart contract is used by the attacker for purported protection against transfer flood (i.e., one user performing too many small transfers). This smart contract uses a homograph substitution of the ICC header of the expected flood-checking function. Because of the homograph substitution, a wrong function in the auxiliary smart contract is called, which causes an unexpected failure of the fund transfer.

Footnote 17: Deployed at 0x514910771af9ca656af840dff83e8264ecf986ca.
Footnote 18: Deployed at 0x2af5d2ad76741191d15dfe7bf6ac92d4bd912ca3.

Hybrid Social Engineering Attack Pattern Integration in CryptoKitties: The ERC-721 standard is used for non-fungible (i.e., unique) Ethereum tokens, such as collectibles, games, deeds, etc. The CryptoKitties collectible game is one of the most popular ERC-721 tokens (footnote 19). For this contract, we use a combination of techniques from attacks A1 and A2.
Specifically, the A1 component involves a manual change of the fee collector by the attacker. The A2 component deploys a non-payable smart contract at an EOA address, resulting in transaction reversal. Akin to the four previous attacks on ERC-20 tokens, this social engineering exploitation also does not reveal itself during testing: the malicious logic is activated only in the production environment, when the owner deploys the non-payable contract.

Footnote 19: Deployed at 0x06012c8cf97bead5deae237070f9587f8e7a266d.

2.6: Evaluation and Analysis

In this section, we attempt to project the social engineering attacks onto all deployed open-source smart contracts and estimate the overall danger of the attacks.

2.6.1: Methodology

As demonstrated in Sections 2.3 and 2.4, the detection of social engineering attacks is impossible in a fully automated manner because human assessment is necessary for understanding the semantics of smart contracts. However, manual detection of social engineering attacks requires laborious effort, such as inspecting the source code with a hex viewer, generating ICC selectors, etc. To address this dichotomy, we develop an automated tool that selects a subset of potential candidates from a given set of smart contracts for further manual analysis. Using this hybrid approach, we manage to filter out over 95.4% of all the candidates. Then, we manually inspect each of the suspected smart contracts and classify them into three categories: non-exploitable, syntactically matching, and semantically exploitable. Finally, we share our findings with security experts from seven leading smart contract security firms and ask them to share their opinions about the attacks in the form of an online survey.

2.6.2: Automated Detection

A specific feature of all social engineering attacks is that their deception mechanisms are located only in the source code and are therefore undetectable in the bytecode. As a result, we consider the source code of a smart contract as the input.
Fig. 2.12 illustrates the operation of our automated filter, which uses a double-layer detection, i.e., a search for atomic signatures (attack markers) followed by logic processing of these signatures to match specific attacks. First, we preprocess the source code by parsing multi-file contracts embedded in JSON objects, removing all non-Solidity smart contracts, erasing all the comments, and discarding smart contracts that are duplicates of previously processed ones. Then, we feed the source code into a set of signature detectors. Each signature detector utilizes text search and regular expression matching to identify specific markers in the source code. For example, a fund transfer routine can be represented in the source code by any of three markers: a) the transfer routine; b) the send routine; or c) the call with value procedure. These markers are then combined into a signature for detecting a fund transfer. Based on the signatures, we generate social engineering attack detection rules in conjunctive normal form (CNF) by concatenating a sequence of signatures. We implement the smart contract scanner using Python, ethereum.utils, and Web3.py, and we publish the source code of the tool at https://github.com/nick-ivanov/esead.

It is worth noting that we do not attempt to detect the proposed social engineering attacks using traditional smart contract vulnerability scanners (e.g., Securify, Sereum, etc.), because these tools by design assume a threat model in which a smart contract is the attack target. The only publicly available tool that fits the threat model of the proposed attacks is HoneyBadger (footnote 20). However, HoneyBadger is designed to detect Ethereum honeypots, a type of attack excluded from our study due to its limited audience of targeted victims. Therefore, none of the existing tools is capable of identifying the proposed social engineering attacks.
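The double-layer detection described above can be sketched as follows. The marker patterns, the signature names, and the two example rules are illustrative assumptions chosen for brevity; they are not the rule set of the published scanner, and the comment stripping is naive (it would also cut a "//" that occurs inside a string literal).

```python
import re

# Layer 1: atomic markers grouped into signatures.
# A signature fires if ANY of its markers matches the source code.
SIGNATURES = {
    "fund_transfer": [r"\.transfer\s*\(", r"\.send\s*\(",
                      r"\.call\.value\s*\(", r"\.call\s*\{\s*value\s*:"],
    "string_compare": [r"keccak256\s*\(\s*(abi\.encodePacked|bytes)\s*\("],
    "string_literal": [r'"[^"\n]*"'],
    "low_level_call": [r"\.(call|delegatecall)\s*\("],
    "abi_signature": [r"abi\.encodeWithSignature\s*\("],
}

# Layer 2: each rule is a CNF formula, i.e. a conjunction of clauses,
# where every clause is a disjunction of signature names.
RULES = {
    "A4-like": [["string_compare"], ["string_literal"], ["fund_transfer"]],
    "A5/A6-like": [["abi_signature"], ["low_level_call"], ["fund_transfer"]],
}

def strip_comments(source: str) -> str:
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.S)
    return re.sub(r"//[^\n]*", "", source)

def detect(source: str):
    """Return the names of all rules whose CNF formula is satisfied."""
    source = strip_comments(source)
    fired = {name for name, markers in SIGNATURES.items()
             if any(re.search(marker, source) for marker in markers)}
    return [rule for rule, cnf in RULES.items()
            if all(any(sig in fired for sig in clause) for clause in cnf)]

sample = """
contract Token {
    function sell(uint256 amount) public {
        bytes memory payload = abi.encodeWithSignature("log(address)", msg.sender);
        (bool ok, ) = helper.call(payload); // status check
        if (ok) { msg.sender.transfer(amount); }
    }
}
"""
print(detect(sample))
```

A match only marks the contract as a candidate; as discussed in the methodology, deciding whether the candidate is actually deceptive remains a manual step.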
Footnote 20: https://github.com/christoftorres/HoneyBadger

Figure 2.12: Automated detection of potential social engineering attacks, in which atomic signatures are combined to match an attack profile for each attack in the form of CNF.

2.6.3: Potentially Exploitable Smart Contracts

Attacks exploiting smart contract code vulnerabilities (e.g., reentrancy or integer overflow) can be detected via automated analysis of the bytecode, source code, or transaction history of a smart contract. However, this information is insufficient to identify social engineering attacks with satisfactory certainty. For example, consider transaction 0xc215b9356db58ce05412439f49a842f8a3abe6c1792ff8f2c3ee425c3501023c, through which the sender paid around $5 million in gas fees: the context of this transaction cannot be known without a testimony from the sender. Our exhaustive effort to find any existing reports of social engineering attacks in the wild has not yielded any results beyond the cases of honeypot exploitations. Therefore, until reports from victims emerge, we can only discuss the potential of the social engineering attacks in real-world smart contracts.

To shed light on the potential existence of social engineering attacks in Ethereum, we collect all available open-source smart contracts from Etherscan (footnote 21), 85,656 unique smart contracts in total, including 73,933 in Mainnet, 8,297 in Ropsten testnet, and 3,426 in Kovan testnet. Table 2.3 shows the breakdown of the 3,855 detected candidates, which can potentially deliver social engineering attacks.

Then, we perform a manual analysis of all the 3,855 suspicious cases to remove 2,375 non-exploitable smart contracts, and subdivide the remaining 1,480 contracts into 453 syntactically matching (but not exploitable) and 1,027 semantically exploitable contracts. An example of a non-exploitable contract (footnote 22) would be one with a suspicious transfer isolated from critical instructions by a mutually exclusive if-else branching.
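The counts quoted above can be re-derived from the per-attack rows of Table 2.3. The short script below only performs that arithmetic; the variable names are illustrative.

```python
# Per-attack counts from Table 2.3:
# (non-exploitable, syntactically matching, semantically exploitable)
table = {
    "A1": (561, 230, 636),
    "A2": (213, 100, 341),
    "A3": (1515, 0, 0),
    "A4": (86, 123, 50),
    "A5": (0, 0, 0),
    "A6": (0, 0, 0),
}

totals = tuple(sum(column) for column in zip(*table.values()))
candidates = sum(totals)                # contracts selected by the automated filter
remaining = candidates - totals[0]      # left after removing non-exploitable ones
scanned = 85_656                        # unique contracts collected from Etherscan

print("column totals:", totals)         # (2375, 453, 1027)
print("candidates:", candidates)        # 3855
print("remaining:", remaining)          # 1480
print(f"filtered out automatically: {1 - candidates / scanned:.1%}")
```

The column totals reproduce the 2,375, 453, and 1,027 contracts reported in the text, and the share of contracts removed by the automated filter is consistent with the figure of over 95.4% given in the methodology.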
Next, we elaborate on how we identify syntactically matching and semantically exploitable contracts, as well as their implications.

Footnote 21: https://etherscan.io/
Footnote 22: For example, 0xa62bf7c97c4270882a9278c6f9d684d30e242e03.

Syntactically Matching Contracts: A syntactically matching smart contract fits the profile of one of the social engineering attacks (A1 ... A6), but does not exhibit the deception capability necessary for fooling the victim. For example, smart contract 0xe5b288da8fb70cd58ab240f71610576657308762 fits the A2 case because it has a hard-coded fee-collecting EOA address. However, manual examination of the smart contract reveals that this address is 0xfeefeefeefeefeefeefeefeefeefeefeefeefeef. Obviously, it is extremely unlikely that someone owns an account that can deploy a smart contract at this address.

Another example of a syntactically matching smart contract is the smart contract called MyMillions (footnote 23), in which a fee transfer shares the call stack of the same transaction with another transfer, while the fee address is both pre-initialized and changeable, which matches both A1 and A2 attacks. However, manual analysis of this contract reveals that the double transfer occurs in the function buyFactory, which is an engagement function (i.e., the function that the client calls to participate in the scheme of the smart contract). If this function fails due to the attack, the client deposit will never happen, and therefore this attack will not bring any gain to the attacker. Since the semantics of smart contracts vary, only a human can definitively identify engagement and resolution functions.

Semantically Exploitable Contracts: A semantically exploitable smart contract not only matches the profile of one of the social engineering attacks, but also has the deception capability. This indicates that contracts of this type are actually exploitable.
A deception capability is an introspective measure characterized by a substantial chance for a contract user to misconstrue the logic of the smart contract, leading to a potential execution of one of the social engineering attacks. The introspective nature of deception capability requires a human to reason about deceptiveness, leading us to manually analyze the source code of all the 3,855 automatically selected suspected contracts, which takes around 140 person-hours in total.

Footnote 23: Deployed at 0xbBbeCd6ee8D2972B4905634177C56ad73F226276.

As an example of semantic exploitability, our analysis reveals 34 smart contracts where a comparison with an empty string literal precedes a critical operation, such as the one shown in Fig. 2.13. One way such a contract can be used as a carrier of attack A4 is through the use of a zero-width space (Unicode U+200B), which appears as an empty string in many popular text editors (e.g., VS Code). Although none of the suspected 34 contracts has an actual zero-width space, a redeployment of the same contract can be used to launch the social engineering attack A4.

Another interesting exploitable example of attack A4 can be found at 0xf5615138A7f2605e382375fa33Ab368661e017ff. This smart contract implements a personal smart contract scheme, which implies that each user of the scheme has an individual deployment of the same smart contract, sometimes referred to as a “wallet”. The contract uses a homograph symbol in a hashmap key, which leads to the inability to withdraw previously deposited funds. Although the contract has an obvious deception capability, neither the code nor the transaction log can definitively determine the contract’s maliciousness. In other words, the homograph substitution of the map key may indicate malice or a mere typo.

Another peculiar example of a semantically exploitable Address Manipulation attack is the game called JigsawGames2 (footnote 24).
In this contract, the resolution function sellEggs contains a fee transfer alongside the user reward transfer, which allows the attacker to block the user from getting the prize by making the fee address non-payable via the A1 or A2 techniques. The contract does not implement any self-destruction or deprecation functionality, posing a challenge for the attacker who needs to acquire the funds trapped in the contract. Coincidentally, this smart contract also charges a developer fee in the engagement function buyEggs. In this case, the attacker can create a fake player, make the fee address payable, and call the buyEggs function repeatedly with the fake player until the contract balance is drained through multiple fee transfers. This example shows that smart contract owners often have multiple indirect ways of stealing funds from smart contracts.

Footnote 24: Deployed at 0x2C7Bc39B1B0C9Fdf200fd30C74C0a9a41C2C7047.

2.6.4: Observations

While performing a manual analysis of the 3,855 suspected smart contracts, we gathered some interesting observations, which are relevant within a broader discussion about social engineering attacks in Ethereum.

    if (!compareStr(userGlobal.referrer, "")) {
        ...
        userRoundMapping[rid][referrerAddr].inviteAmount++;
    }

Figure 2.13: Empty string comparison in a smart contract. The contract is deployed at 0x61394198ee6cbe2d6ad603d52c10fba3237202ef.

Observation 1 [Multiple versions of the same code]: It is well known that a vast majority of smart contracts reuse secure patterns, modifiers, and abstract classes from the OpenZeppelin Contracts library. However, even though we remove all duplicate smart contracts during the preprocessing stage, our manual analysis of the suspected smart contracts reveals a significant number of large contract clusters in which custom code is reused with slight modifications.
Such clusters of reused custom code patterns are also widely present in the semantically exploitable set, which demonstrates that code reuse is prevalent in smart contracts, leading to the dissemination of insecure patterns.

Observation 2 [No evidence of testnet experimentation with social engineering attacks]: In pursuit of early signs of experimentation with social engineering attack patterns, we supplement our dataset with open-source contracts from two testnets, Ropsten and Kovan. Our initial hypothesis was that the first experimental exploitations of social engineering attacks would appear on testnets first. However, compared to Mainnet, in which 937 out of 3,165 suspected contracts are semantically exploitable (29.6%), the share is 11.9% in Ropsten and 16.0% in Kovan. Thus, the testnets exhibit a reduced probability of encountering semantically exploitable social engineering contracts.

Table 2.3: Analysis results of 85,656 smart contracts.

Attack    Non-exploitable    Syntactically matching    Semantically exploitable
A1        561                230                       636
A2        213                100                       341
A3        1,515              0                         0
A4        86                 123                       50
A5        0                  0                         0
A6        0                  0                         0
Total:    2,375              453                       1,027

2.6.5: Survey of Auditing Firms

Figure 2.14: Average survey results from seven smart contract auditing firms. The red vertical line represents the average value of the six attacks. (a) Could this attack be dangerous to your customers? (b) Do you think the attack can be discovered by human users?

To further evaluate the proposed attacks, we sent a survey consisting of the two questions shown in Fig. 2.14 to the following seven smart contract auditing firms (listed alphabetically): Audithor, CertiK, CoinFabrik, ConsenSys, Dedaub, Trail of Bits, and one company that elected to remain anonymous. The responses were provided by actual smart contract developers and security auditors from each of the firms (one anonymous participant from each company; see footnote 25). Fig. 2.14 represents the answers from the experts regarding the six social engineering attacks.
The vertical red lines represent the averages of the responses across all six attacks. The results of the survey demonstrate that the experts agree that the social engineering attacks can cause damage to their customers. The experts also believe that the social engineering attacks are unlikely to be discovered by a human user.

Footnote 25: No identifiable private information was collected from the anonymous participants; therefore, the study did not require an IRB review.

2.7: Security Recommendations

In Section 2.6, we demonstrate that even if all the syntactic patterns in a smart contract correctly match one of the social engineering attacks, only 1,027 contracts out of the total 3,855 are actually exploitable, which is less than 27%. Corroborating our finding, Zhou et al. [309] demonstrate that the attempt by Torres et al. [270] to detect Ethereum honeypots in a fully automated manner produces a large number of false negative and false positive results. Therefore, the defense against social engineering attacks should involve human auditing. To account for this characteristic of social engineering attacks, we develop a list of recommendations for people considering engagement with a smart contract, including security auditors verifying the safety of smart contracts on behalf of their clients. These recommendations aim for effective identification and prevention of social engineering attacks with minimal effort.

Recommendation 1 [Beware of address change]: To prevent A1, smart contract users should not engage with a contract that allows changing an address that is a transfer recipient within the call stack of a critical operation. Our analysis finds many smart contracts with such patterns in the wild, but none of them exhibits malicious intent or has a suspicious history. Nevertheless, such a pattern gives the owner a potential backdoor for blocking critical operations, e.g., fund withdrawals.
Recommendation 2 [Check EOAs for outgoing transactions]: To prevent A2, smart contract users should verify that all hard-coded EOAs have at least one outgoing transaction. If the EOA has outgoing transactions (marked as "OUT" by Etherscan), it indicates that the smart contract owner knows the private key of the EOA, which entails that the owner does not know the private key of an account that could deploy a smart contract at this address. In fact, the probability that someone knows both the private key of an EOA and the private key of an account that can deploy a contract at the same address equals the probability of a 160-bit hash collision, because each public address is a Keccak256 hash of a public key trimmed to 160 bits.

Recommendation 3 [Avoid visual cognitive bias]: To prevent A1, smart contract users should never compare addresses visually; the search function of a text editor should be used instead. In this work we show that EIP-55 collision brute-force attacks are easy to carry out. As a result, even slightly modified addresses with unknown associated private keys can be dangerous. Therefore, users should treat all public addresses with suspicion.

Recommendation 4 [Avoid confirmation bias]: To prevent A3, smart contract users should never use accounts with all-lowercase EIP-55 checksums for smart contract testing. Most Ethereum clients, such as MetaMask, enforce EIP-55 checksums, so public addresses are always shown in a mixed-capitalization form. Another way to verify an address is to paste it into the search field of Etherscan, which also enforces EIP-55. If the address is all-lowercase, it might be part of a social engineering scheme, and thus the contract should undergo additional scrutiny.

Recommendation 5 [Do not trust string comparison]: To prevent A4, smart contract users should not engage with a smart contract that uses string comparison to determine a transfer or another critical operation.
If a text comparison involves two immutable values, e.g., a constant and a string literal, it is essentially a tautology and is indicative of a derelict smart contract. However, one way to carry out attack A4 is to mimic a tautology, as shown in Fig. 2.5. Either way, a critical operation determined by a string comparison should be treated with caution.

Recommendation 6 [Verify ICC selectors]: To prevent A5 and A6, smart contract users should verify the arguments of call() and delegatecall() with a hex viewer. Smart contract users and auditors cannot see the selectors associated with the functions and arguments of call()/delegatecall() while examining the Solidity code, since these selectors are computed at compile time. If the parameters of call() or delegatecall() include a string literal, we recommend compiling both the calling and the callable contracts with the --asm or --ir options to verify that the function selectors match. If the parameters are mutable variables, the contract cannot be treated as safe.

2.8: Chapter Summary

This work zeroes in on a largely overlooked class of social engineering attacks in Ethereum smart contracts. These attacks exploit human cognitive biases as new attack vectors. We identified these biases and developed six zero-day social engineering attacks. By embedding most of these attacks into existing popular tokens, we demonstrated that the attacks have the potential to victimize a large group of normal users. Moreover, the attacks remain dormant during testing and activate only after a production deployment. We worked with seven smart contract security firms and confirmed that the attacks are indeed dangerous and evasive. Our analysis reveals 1,027 existing smart contracts that can potentially carry out social engineering attacks. By open-sourcing our analysis tools and benchmark datasets, we invite further research exploration of this emerging topic.
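
Recommendation 5 and the zero-width space example discussed earlier in this chapter lend themselves to a small automated pre-check before manual review. The following Python sketch is an illustration only and is not part of the analysis tools described in this chapter. The function name find_suspicious_chars and the rule it applies (flag Unicode format characters of category Cf, which includes U+200B, and any non-ASCII character) are assumptions made for this example.

```python
import unicodedata


def find_suspicious_chars(source: str):
    """Return (line, column, code point, name) for every invisible or non-ASCII character."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            # Category "Cf" covers format characters such as the zero-width space U+200B;
            # any other non-ASCII character may be a homograph of an ASCII symbol.
            if unicodedata.category(ch) == "Cf" or ord(ch) > 127:
                name = unicodedata.name(ch, "UNNAMED")
                findings.append((line_no, col, f"U+{ord(ch):04X}", name))
    return findings


# Hypothetical one-line snippet: the literal looks empty but holds a zero-width space.
snippet = 'if (!compareStr(userGlobal.referrer, "\u200b")) {'
for finding in find_suspicious_chars(snippet):
    print(finding)
```

A finding is only a prompt for closer manual inspection: comments and user-facing strings may legitimately contain non-ASCII text, so such a check can support, but not replace, the human auditing recommended above.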
CHAPTER 3: ATTACKING HARDWARE WALLETS (footnote 26)

3.1: Introduction

Hardware crypto wallets, also known as cold wallets, are air-gapped devices that produce public-key signatures (footnote 27) for transactions with cryptocurrencies and smart contracts. These devices have some computing power, but they have no networking interfaces, so that they stay outside of cyberspace. Instead, they communicate with the client computer through a secure device-to-device (D2D) channel (e.g., the FIDO protocol over a USB serial bus). Hardware wallets are considered to be the most secure solution for protecting crypto funds from theft, even when the client computer is infected with malware. Fig. 3.1 shows four popular hardware wallets from three leading brands available on the market: Trezor by SatoshiLabs s.r.o. [18], Ledger Nano X and Ledger Nano S by Ledger SAS [17], and KeepKey by ShapeShift [19].

Fig. 3.2 shows a transaction workflow with a hardware wallet. First, the client software prepares a transaction message and sends it to the hardware wallet via a non-networking channel. Then, the user confirms the parameters of the transaction (such as transaction amount, recipient address, and blockchain fee) shown on the display of the wallet. After that, the wallet signs the transaction with a non-extractable private key and sends the signature back to the client software. Finally, the client software sends the signed transaction message to the blockchain, where the transaction is executed. Unfortunately, the described chain of actions has a weak link: the attacker does not need to compromise the wallet to steal funds; it is sufficient to tamper with the transaction data sent for signing by falsifying the address of the recipient of funds. A recent security analysis by Khan et al. [172] formally proves that, under normal cryptographic assumptions, the user of a hardware wallet plays a crucial role in its security.

Footnote 26: This chapter is based on previously published work by Nikolay Ivanov and Qiben Yan titled "EthClipper: A Clipboard Meddling Attack on Hardware Wallets with Address Verification Evasion", published in the Proceedings of the 2021 IEEE Conference on Communications and Network Security (CNS). DOI: 10.1109/CNS53000.2021.9705033 [160]. © 2021 IEEE. Reprinted, with permission, from Nikolay Ivanov and Qiben Yan, "EthClipper: A Clipboard Meddling Attack on Hardware Wallets with Address Verification Evasion" (paper and IEEE titles are the same), October 2021.

Footnote 27: All popular hardware crypto wallets are hierarchical deterministic (HD) wallets [16], which are capable of generating a nearly infinite number of private keys (i.e., accounts) from a single secret seed.

Figure 3.1: Hardware wallets used in this research. (1): Ledger Nano X; (2): Trezor One; (3): KeepKey; (4): Ledger Nano S.

One way to target the user of a hardware crypto wallet is to covertly replace the transaction recipient address in the clipboard of the operating system. The clipboard substitution attack, or clipboard hijacking, has been known for years [45, 67]. This attack exploits the fact that wallet users often use the clipboard for copying a recipient address to the wallet's client app. For example, the malware called Clipsa stole at least $36,000 worth of Bitcoin in 2018 and 2019 [62]. By examining the client software provided by the three vendors of the wallets shown in Fig. 3.1, we determined that they do not discourage the use of the clipboard (e.g., by disabling the keyboard operations). In a clipboard substitution attack, the malware running on the user's computer detects the presence of a crypto address in the clipboard and immediately substitutes it with another address. This attack, however, has one major weakness: the user is likely to notice the falsification of the address on the screen of the client software or on the mini-screen of the hardware wallet.
Hence, our research question: is it possible to devise a clipboard substitution attack that dodges the revelation of the address substitution during the transaction confirmation phase?

Figure 3.2: General transaction workflow using a hardware wallet. (1): The client software sends the transaction data to the hardware wallet; (2): the user verifies the data and confirms the transaction with the wallet; (3): the wallet sends the transaction signature back to the client software; (4): the client software sends the signed transaction to the blockchain network.

The major insight of this work is to incorporate a social engineering component into a clipboard substitution attack. Social engineering attack techniques exploit human cognitive bias, an optimization mechanism of the human mind that draws conclusions based on expectation, prior experience, probability assessment, pre-existing belief, or emotions [145]. One way of exploiting a cognitive bias is through visual deception, which is actively used by attackers in email phishing that mimics a popular website [285]. Another facet of cognitive bias is confirmation bias, defined as the rejection of evidence contradicting the originally established belief [169]. We discover that both visual deception and confirmation bias can be exploited by an attacker who tries to steal funds from a hardware wallet. Specifically, this work is inspired by our observation that hardware wallet users exhibit a strong confirmation bias about the correctness of the recipient address, resulting in a tendency to verify only several leading (or trailing) digits of a transaction address, or even to skip the verification altogether. The validity of this observation is confirmed by previous research [50] and by the manufacturers of the hardware wallets used in this research.

In this work, we propose a new attack called EthClipper, which adds a social engineering component to the existing clipboard substitution technique.
In EthClipper, the attacker deploys a distributed system, called ClipperCloud, which is used to mine and store billions of Ethereum accounts. When the malware detects an Ethereum address in the clipboard, it asks ClipperCloud to find, among the mined accounts, the one that exhibits maximum visual similarity to the address in the clipboard. As a result, the visual similarity between the address on the screen and the expected address is likely to trigger the victim's confirmation bias, leading to the approval of the malicious transaction. Although the Clipsa malware [242] also attempts to match some symbols in the substituted address, it uses a small address database and only targets two leading and two trailing symbols, which makes the substitution very easy to spot visually. A small human-based study conducted by Almutairi and Al-Megren [50], involving substitution of symbols in Bitcoin addresses using a KeepKey hardware wallet, confirms that the rate of false approval of a modified address strongly correlates with the number of matching symbols. Unsurprisingly, the attacks by Clipsa have been prevented at least 360,000 times [62]. Unlike Clipsa, EthClipper is highly optimized to enable practical substitution of up to 25% of the symbols in the address, achieving a maximum level of deception within a limited attacker budget while maintaining low latency and maximizing the likelihood of having a replacement address readily available.

In summary, we deliver the following contributions:

• We discover a new attack, called EthClipper, against hardware crypto wallets, which combines the features of clipboard substitution, cryptographic pre-computation, and social engineering to lure the victim into confirming a transaction with a tampered recipient address.
• We introduce a low-latency application-specific distributed system, called ClipperCloud, that performs the computation and storage needed for the EthClipper attack outside of the victim's computer.
This makes the attack realistic and easy to carry out.
• We implement the EthClipper malware and test it using four popular hardware wallets from three manufacturers.
• We implement the ClipperCloud system and test it on four server deployments. Our evaluation shows that ClipperCloud exhibits a low query latency, and that the EthClipper attack adapts to a flexible range of setups and budgets.
• For responsible disclosure, we have communicated the details of the attack to the manufacturers of the wallets used in this research and received confirmation from all of them that the attack is potentially dangerous.

Figure 3.3: Replacing 80% of address symbols with an ellipsis in MetaMask, one of the most popular Ethereum wallets.

3.2: EthClipper Attack Design and Analysis

In this section, we elaborate on the technical details of the EthClipper attack, and then describe the workings of the EthClipper malware, followed by an elaboration on the ClipperCloud system needed for the attack.

3.2.1: Attack Overview

The manufacturers of popular hardware crypto wallets openly state that hardware wallets are the best devices for storing crypto accounts. Yet the EthClipper attack bypasses the air-gapped protection of hardware wallets in order to steal money from the user's Ethereum account by replacing the transaction recipient address with an address belonging to the attacker. The EthClipper attack uses the clipboard hijacking technique to falsify the data sent to a hardware wallet without compromising the wallet itself. However, unlike previous attacks of this kind, EthClipper makes it harder for the user to recognize the falsification. Fig. 3.4 shows the general workflow of the attack. The attacker infects the victim's computer with malware using one of a plethora of available techniques [88, 243]. The malware monitors the clipboard of the user account for the appearance of a valid public crypto address.
Once the address is discovered in the clipboard, the malware immediately contacts the pre-deployed distributed system, called ClipperCloud, which stores a database of similar addresses that have been mined in advance (see Section 3.2.3 for details). After receiving the visually similar address from ClipperCloud, the malware replaces the original address in the clipboard with the forged one.

Our observation suggests that users of hardware wallets tend to verify only a few leading and trailing symbols in the address. Moreover, many popular Ethereum wallets, such as MetaMask, indirectly suggest that skipping the internal symbols of the address is normal by incorporating this feature in the user interface (see Fig. 3.3). This observation is independently confirmed by three manufacturers of hardware wallets, and leads us to the design of an attack that substitutes the address with one matching ⌈N/2⌉ symbols in the prefix and ⌊N/2⌋ symbols in the suffix (see Fig. 3.6). Moreover, our observation of Ethereum address checking by hardware wallet users reveals the habit of not verifying more than 4 symbols in the prefix and 4 symbols in the suffix, suggesting that N = 8 is likely to be a sufficient number of matching symbols in many cases. Since the attack is opportunistic, there is no need for the attacker to succeed every time. However, the probability of success grows with larger values of N.

Furthermore, a large amount of funds involved in a cryptocurrency transaction does not necessarily entail increased vigilance by the user. For example, in 2016, an attack on an Ethereum smart contract, known as the DAO attack, incurred damage worth approximately $50 million due to a simple reentrancy vulnerability, which had been known and well-researched for years prior to the attack [85]. Despite the large amount of money at stake, none of the investors in the infamous smart contract noticed the bug that was exploited in the attack.
Therefore, we do not exclude users making large transactions from the scope of potential victims of EthClipper.

When the user pastes the address into the hardware wallet's client application, they must confirm the parameters of the transaction, including the recipient address, on the screen of the wallet, as shown in Fig. 3.5. Since the address on the screen is visually similar to the expected one (i.e., the two addresses have matching prefixes and suffixes), the victim might fail to notice the substitution. Our informal communications with several users of hardware wallets confirm that most of them, when verifying the recipient address, examine only the first and last several digits, or none at all. Finally, the user pushes the confirmation button on the hardware wallet and sends the funds to the address whose corresponding private key is stored on ClipperCloud, and is therefore known to the attacker.

EthClipper is optimized for the specifics of Ethereum, which allows the attacker to maximize the social engineering effect of the attack, something that existing malware, such as Clipsa, fails to achieve. However, it is possible to independently develop similar malware and an associated distributed service optimized for other address formats, such as Bitcoin.

3.2.2: EthClipper Malware

In this research, we design malware that makes it possible to bypass the air-gapped protection of a hardware wallet through the EthClipper attack, which uses clipboard substitution as a carrier. The EthClipper malware is a program that persistently runs in the background, monitoring the clipboard of the current user. An important feature of the EthClipper malware is that it does not require any special user privileges or hardware access. Moreover, it can be implemented as a cross-platform Python or Node.js script. Once the malware detects an Ethereum address in the clipboard, it immediately submits a UDP request to the ClipperCloud system, which replies with a substitute address, if one is found.
As soon as the substitute address is received, the malware injects it into the clipboard. Intuitively, it is very important for the malware to substitute the address quickly, before the user pastes the address into the wallet client application. The manufacturers of the hardware wallets used in this study confirmed to us that there is currently no defense against the EthClipper attack. Thus, given the decentralized nature of the Ethereum blockchain, if the attack is deployed and subsequently revealed by one or multiple users, it would require extensive publicity and a substantial amount of time to alert all potential victims of the attack. Next, we elaborate on the architecture of ClipperCloud, which provides the storage model that achieves a low response latency to ensure the success of the attack.

Figure 3.4: Workflow of the EthClipper attack. (1): The owner of the wallet copies a recipient address to the clipboard from the source (e.g., a website); (2): the EthClipper malware detects the address in the clipboard; (3): the malware connects to ClipperCloud to request an address that is similar to the one in the clipboard; (4): ClipperCloud replies with a similar address; (5): the EthClipper malware places the substitute address from ClipperCloud into the clipboard; (6): the user of the wallet pastes the address from the clipboard into the hardware wallet's client software; (7): the client software sends the transaction data, which includes the replaced (fake) recipient address, to the hardware wallet for signing; (8): the hardware wallet asks the user to confirm the parameters of the transaction (by pushing a button on the wallet); (9): the user of the hardware wallet, who is prone to confirmation bias, confirms the transaction without verifying all of the symbols of the recipient address; (10): the wallet signs the transaction using the air-gapped private key and sends the signature to the wallet's client software; (11): finally, the wallet client software sends the signed transaction to the Ethereum blockchain, where the transaction is executed.

3.2.3: ClipperCloud

In order to make EthClipper practical for a real-world attacker, the abundant storage and heavy computation needed for the attack must be outsourced to a distributed service. ClipperCloud is a distributed system that has two main purposes: it mines malicious addresses for the attacker, and it stores these addresses in a way that allows querying them very quickly. Next, we elaborate on the architecture, computation, and storage model of ClipperCloud.

Figure 3.5: Ethereum cryptocurrency transaction confirmation in popular hardware wallets. (a) KeepKey; (b) Ledger Nano S; (c) Ledger Nano X; (d) Trezor One.

Figure 3.6: Address substitution pattern. The substituted address has the same number of matching prefix and suffix symbols (or one more in the prefix, when the number of symbols is odd), i.e., ⌈N/2⌉ in the prefix A_p, ⌊N/2⌋ in the suffix A_s, N in total. When verifying the address, many users check only a few symbols in the prefix, and sometimes a few symbols in the suffix.

ClipperCloud Architecture: The EthClipper attack requires heavy computation (for mining similar addresses), as well as large storage (for keeping the pre-mined addresses ready for the malware and storing their corresponding private keys for the attacker to withdraw stolen money). Moreover, to make ClipperCloud suitable for the EthClipper attack, the system must meet the following four major requirements. First, regardless of the size of the database, the system must respond to malware requests very quickly, in order to replace the recipient address in the clipboard before the user pastes it. Second, the computation and storage may need to be split between multiple servers because a single server might not have the resources required for the attack. Third, the EthClipper malware is likely to have multiple instances, so the ClipperCloud system
Fourth, the system must be flexible enough to support adding ad- ditional computation and storage, as well being capable for easy reconfiguration after the address database is fully mined. To satisfy these desiderata, we design ClipperCloud in a way that it can split resources across multiple servers. To achieve that goal, each server performs communication with malware, computation and storage in three parallel processes. Fig. 3.7 shows the basic architecture of ClipperCloud. The distributed system can have one or several servers. Each server has a compute module, which performs address mining, and it also has a storage module, which saves mined addresses, along with their corresponding private keys. Each server is responsible for storing addresses corresponding to a certain range of matching symbols, while the compute module can produce addresses for any of the servers (because the result of the random guessing is unpredictable). If the compute module on one server finds a matching address for another server, it stores the result in a temporary buffer. When the buffer is full, the server transfers these addresses over to the corresponding server — this procedure is called the cooperative transfer. We conducted a preliminary testing using a high-performance Microsoft Azure H-Series server with 60 CPUs, which revealed that the cooperative transfer overhead was between 200 and 300 megabytes per minute, while the available bandwidth is normally 1 Gbps in the uplink direction and 9 Gbps in the downlink direction, which confirms that there is no risk of a traffic bottleneck incurred by the cooperative transfer. 
Also, we experimentally confirm that, despite the increased network traffic, the cooperative transfer delivers at least 50% faster database population compared to discarding out-of-range addresses. We attribute this phenomenon to the servers' use of direct memory access (DMA) or similar hardware extensions, which allow disk operations to be performed with minimal CPU involvement.

Compute Module and Address Mining: In order to conduct the EthClipper attack, the attacker needs to have a large database of Ethereum accounts readily available for address substitution. In this work, we call the process of populating such a database address mining, which is performed by the ClipperCloud module called the address miner. The ClipperCloud address miner is a multi-threaded program that generates random Ethereum accounts. For example, consider the address substitution depicted in Fig. 3.6. The bottom address in this figure will be stored in the slot $\mathrm{48B77696}_{16}$ (or $1219982998_{10}$) within the ClipperCloud address space. Each address in the ClipperCloud address space translates into an absolute address within one of the ClipperCloud servers. When a new account (footnote 28) is created, the address miner either forwards the account to the storage module or eventually sends it to the ClipperCloud server where it belongs (as part of the cooperative transfer). If there is already an account stored in that slot, the new account is ignored.

Figure 3.7: ClipperCloud workflow.

Fixed-Field Storage: The account records produced by the address miner should be stored in ClipperCloud in such a way that any requested address substitute can be found very quickly; otherwise, the malware will not be able to substitute the address in the victim's clipboard within the short period of time between the copying and pasting of the address.
To guarantee an instant response to a record search, ClipperCloud stores records as hexadecimal strings in a fixed-field database, so its total storage requirement $S_{tot}$ can be calculated as $S_{tot} = (S_{prk} + S_{pa}) \times 16^N$, where $S_{prk}$ is the size of a private key, $S_{pa}$ is the size of a public address, and $N$ is the number of matching symbols (both prefix and suffix). Since EthClipper targets only Ethereum users, $S_{prk}$ and $S_{pa}$ can be replaced with their respective numerical values of 64 and 40 bytes, i.e., $S_{tot} = 104 \times 16^N$. As shown in Fig. 3.8, the records in ClipperCloud are stored sequentially in fixed-size fields. The length of one field is 104 bytes (a 40-byte address concatenated with a 64-byte private key). This allows accessing the records with O(1) time complexity. To access a record within the file storage, the server needs to perform a single lseek (footnote 29) operation within the data file with the offset set to $104 \times ([A_p A_s] - a_0)$, where $[A_p A_s]$ is the number resulting from the concatenation of the prefix $A_p$ and the suffix $A_s$, and $a_0$ is the first value in the range of record numbers assigned to the current ClipperCloud server. Additionally, ClipperCloud allocates storage for cooperative transfer buffers, as well as a little space for logging successful requests (in order to inform the attacker which accounts hold stolen funds). Both of these additional storage components remain constant and are much smaller than the storage of records for substantially large $N$, so we exclude them from the storage analysis.

Footnote 28: By account we mean a pair consisting of a private key and the corresponding public address. In Ethereum, the address of an account consists of the last 160 bits of the Keccak256 hash of the account's public key.

Figure 3.8: Overview of the ClipperCloud storage format.

3.2.4: Address Mining Analysis

Each newly generated random address might match a previously stored ClipperCloud record.
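
Such a match follows directly from the slot arithmetic of Section 3.2.3: two different addresses that share the same prefix and suffix symbols map to the same fixed-size field. The Python sketch below illustrates only that arithmetic and is not the ClipperCloud implementation. All function names, the in-memory io.BytesIO stand-in for the data file, the placeholder record bytes, and the example address (constructed so that its prefix and suffix reproduce the slot number quoted in the text) are assumptions made for this example.

```python
import io

RECORD_SIZE = 104  # bytes per record: 40 (address) + 64 (private key), as stated in the text


def slot_index(address_hex: str, n: int) -> int:
    """Read ceil(n/2) prefix symbols and floor(n/2) suffix symbols as one hexadecimal number."""
    body = address_hex[2:] if address_hex.lower().startswith("0x") else address_hex
    prefix_len, suffix_len = (n + 1) // 2, n // 2
    suffix = body[len(body) - suffix_len:] if suffix_len else ""
    return int(body[:prefix_len] + suffix, 16)


def record_offset(slot: int, first_slot: int) -> int:
    """Byte offset 104 * ([Ap As] - a0) of a slot within one server's data file."""
    return RECORD_SIZE * (slot - first_slot)


def total_storage_bytes(n: int) -> int:
    """Total record storage S_tot = 104 * 16^N."""
    return RECORD_SIZE * 16 ** n


def store_if_empty(storage, slot: int, first_slot: int, record: bytes) -> bool:
    """Write a record into its slot unless the slot is already occupied."""
    if len(record) != RECORD_SIZE:
        raise ValueError("record must be exactly RECORD_SIZE bytes")
    offset = record_offset(slot, first_slot)
    storage.seek(offset)
    # An unwritten slot reads back as nothing or as zero padding.
    if storage.read(RECORD_SIZE).strip(b"\x00"):
        return False  # collision: an earlier record already holds this slot
    storage.seek(offset)
    storage.write(record)
    return True


# Hypothetical address whose prefix and suffix reproduce the slot from the text's example.
address = "0x48b7" + "0" * 32 + "7696"
slot = slot_index(address, 8)
print(hex(slot), slot)                  # slot in base 16 and base 10
print(total_storage_bytes(8))           # bytes needed for N = 8

storage = io.BytesIO()                  # in-memory stand-in for the data file
placeholder = b"x" * RECORD_SIZE        # dummy bytes, not a real key or address
print(store_if_empty(storage, 5, 0, placeholder))  # first write to slot 5
print(store_if_empty(storage, 5, 0, placeholder))  # second write hits the occupied slot
```

Because every record has the same length, locating a slot needs one multiplication and one seek regardless of how many records are stored, which is the constant-time property the fixed-field layout relies on.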
Moreover, the more addresses ClipperCloud generates, the higher the probability of a collision with an already stored record, which slows down the rate at which new records are added to the database of similar addresses. Since the EthClipper attack is opportunistic by nature, let us assume that 95% coverage of the available similar addresses is satisfactory for an attacker. In other words, we consider the ClipperCloud database fully mined if any given address request has a 95% probability of success. Here, we give a formal argument regarding the computational effort of brute-force similar-address mining that is probabilistically necessary to achieve the 95% target coverage.

Footnote 29: lseek is a system call in POSIX-compatible operating systems (e.g., Linux) that moves the read/write position (called the offset) within a file. This operation is intended to have constant time complexity.

Claim 1: A random sample of $3 \cdot M$ integers from the interval $[0, M - 1]$ is expected to have at least $0.95 \cdot M$ distinct values.

Proof: Let us consider a sample $S$ of uniformly random numbers $S_i \in \mathbb{Z}$, $0 \le S_i \le M - 1$, with $|S| = 3M$. Each number in the sample has a certain probability of colliding with at least one other number in the sample, i.e.:

$p = \Pr(\exists i \in [1, 3M],\ \exists j \in [1, 3M] : i \neq j \wedge S_i = S_j)$    (3.1)

The expected value of $p$ is consistent with the well-known birthday paradox [171], in which the expected proportion $C$ of distinct values among $m$ possible values in a random sample of size $n$ is exactly $1 - (1 - 1/m)^n$ and can be determined by the Taylor approximation shown below, which has a provably narrow margin of error [247]:

$C = 1 - e^{-n/m}$    (3.2)

Next, let us apply Eq. 3.2 to the constraints described in Eq. 3.1:

$C = 1 - e^{-3M/M} = 1 - \frac{1}{e^3} \approx 0.95021$    (3.3)

Therefore, the expected number of distinct values in a sample of $3M$ random integers between 0 and $M - 1$ is at least 95% of the total number of possible distinct values, i.e., $C \cdot M \ge 0.95 \cdot M$.
■

Corollary of Claim 1: In order to attain coverage of at least 95% of similar addresses, the attacker is expected to generate $3 \cdot 16^N$ random Ethereum accounts.

Proof: An Ethereum address consists of the rightmost 160 bits of the Keccak-256 digest of the account's public key, which is derived from the account's 256-bit random private key via the secp256k1 elliptic curve [288]. Assuming that every hexadecimal digit position of an Ethereum address takes each of its 16 possible values (i.e., 0 through F) with equal probability, any subset of $N$ digits in a random address is essentially an integer from the interval $[0, 16^N - 1]$. Therefore, Claim 1 can be applied to the address mining by ClipperCloud, with $M = 16^N$. Consequently, in order for the attacker to achieve a minimum of 95% coverage of similar addresses with $N$ matching digits, $3 \cdot 16^N$ random Ethereum accounts must be generated. ■

For the purpose of generality, let us denote the multiplier 3 in Claim 1 as $\tau$ (i.e., $\tau = 3$). Following the same logic, we can obtain different target coverage values by changing $\tau$. For example, when $\tau = 1$, we may expect at least 63% database coverage, i.e.:

$C = 1 - e^{-1M/M} = 1 - \frac{1}{e} \approx 0.6321$    (3.4)

Similarly, when $\tau = 0.7$, the approximate coverage is 50%, which means that the attacker needs to generate $0.7 \cdot 16^N$ accounts to achieve a 50% probability of a successful $N$-digit match for a given random address. To confirm the correctness of the above argument, we conducted a small experiment for the case of 95% coverage: we generated 3 million random numbers between 0 and 999,999 in Python and added them to a set, which discards duplicates. The resulting size of the set was 950,188, which is consistent with Eq. 3.3.

3.3: Implementation and Evaluation

3.3.1: Implementation

In order to demonstrate that EthClipper is feasible, we implement it and perform thorough testing of its parameters using four different hardware wallets from three manufacturers.
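The coverage argument of Section 3.2.4 can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, and the simulation uses a seeded generator and a smaller range than the 3-million-number experiment reported in the text, so the exact set size it produces is not expected to equal 950,188.

```python
import math
import random


def expected_coverage(tau: float) -> float:
    """Expected fraction of distinct values after tau * M draws (Eq. 3.2 with n/m = tau)."""
    return 1.0 - math.exp(-tau)


def accounts_needed(n_digits: int, tau: float) -> float:
    """Number of random accounts to generate for N matching digits (M = 16^N)."""
    return tau * 16 ** n_digits


def simulated_coverage(m: int, tau: float, seed: int = 0) -> float:
    """Draw tau * m uniform integers from [0, m - 1] and measure the distinct fraction."""
    rng = random.Random(seed)
    seen = {rng.randrange(m) for _ in range(int(tau * m))}
    return len(seen) / m


if __name__ == "__main__":
    for tau in (0.7, 1.0, 3.0):
        print(tau, round(expected_coverage(tau), 4), round(simulated_coverage(100_000, tau), 4))
```

For the three multipliers discussed in the text, the closed form gives roughly 50%, 63%, and 95% coverage, and the simulated fractions should fall close to those values.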
We implement our EthClipper malware prototype using Python 3.7.5 with socket and clipboard libraries. The ClipperCloud prototype is implemented using Node.js 10.15.2 with the dgram, buffer, fs, and Web3.js libraries. After the manufacturers of the hardware wallets deploy the defense, we intend to publish the source code of our implementation under an open-source license for testing, reproduction, independent evaluation, and follow-up research. We test our implementation using four hardware wallets: Ledger Nano X, Trezor One, KeepKey, and Ledger Nano S. Ledger Nano X supports both Bluetooth and USB connections, but we use only USB for a fair comparison. For Trezor One and KeepKey, we use the vendors' bridge software installed on Ubuntu 20.04 and the vendors' web apps (Trezor Ethereum Wallet and ShapeShift) in the Google Chrome web browser. For the Ledger wallets, we use the vendor's bridge software and the vendor-provided cross-platform desktop GUI application. We then execute the workflow of the attack three times with each of the four wallets, confirming that the attack executes as expected and that the similarity of the addresses shown for confirmation on the wallet screens is indeed deceptive to a human observer.

3.3.2: Storage Requirement

EthClipper can be used with a wide spectrum of ClipperCloud configurations, which lets the attacker balance the number of matching symbols, the address mining time, the address database coverage, and the available budget. Table 3.1 shows some possible ClipperCloud storage configurations that the attacker may use. As the table shows, if the attacker wants to match only 4 symbols in the address, ClipperCloud needs to store about 6.5 megabytes of information. However, in order to match 11 symbols, the storage requirement increases to over 1.6 petabytes, which would require about 820 2-terabyte hard drives and is unrealistic for most attackers.
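The figures quoted above follow directly from $S_{tot} = 104 \times 16^N$. As a minimal sketch, the snippet below evaluates the formula and splits the result evenly across servers; the helper names are hypothetical, and the use of binary multiples (1 KB = 1024 bytes) is an assumption chosen because it reproduces the quoted 6.5-megabyte and 1.6-petabyte values.

```python
RECORD_LEN = 104  # bytes per record: 40-byte address + 64-byte private key
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def storage_bytes(n_symbols: int, servers: int = 1) -> float:
    """Per-server storage for N matching symbols, split evenly across servers."""
    return RECORD_LEN * 16 ** n_symbols / servers


def human(size_bytes: float) -> str:
    """Format a byte count using binary multiples (assumption: 1 KB = 1024 B)."""
    value = float(size_bytes)
    for unit in UNITS:
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1024
    return f"{value:g} {UNITS[-1]}"


if __name__ == "__main__":
    for n in range(4, 12):
        print(n, [human(storage_bytes(n, servers)) for servers in (1, 5, 10)])
```

Each additional matching symbol multiplies the requirement by 16, which is why the jump from 10 to 11 symbols moves the database from the terabyte into the petabyte range.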
The storage configurations for up to 9 matching symbols are easily attainable with retail storage devices or affordable cloud solutions. The database matching 10 symbols, requiring 104 TB, is also achievable with relatively affordable retail options. For example, as of mid-April 2021, two WD EX4100 56TB off-the-shelf network-attached storage (NAS) units can provide the attacker with sufficient storage for addresses with 10 matching symbols at a total cost of under $4,200. Thus, we assume that the attacker's ClipperCloud database has the maximum capability of replacing 10 symbols, which is 25% of the total Ethereum address length. Unlike Clipsa, EthClipper can match a larger number of symbols in the address, thereby substantially increasing the odds of success.

Table 3.1: Cumulative address storage requirement.

Matching   EthClipper   Clipsa   Storage requirement per server
symbols                          1 server    5 servers   10 servers
4          ✓            ✓        6.5 MB      1.3 MB      665.6 KB
5          ✓            ✗        104 MB      20.8 MB     10.4 MB
6          ✓            ✗        1.625 GB    332.8 MB    166.4 MB
7          ✓            ✗        26 GB       5.2 GB      2.6 GB
8          ✓            ✗        416 GB      83.2 GB     41.6 GB
9          ✓            ✗        6.5 TB      1.3 TB      665.6 GB
10         ✓            ✗        104 TB      20.8 TB     10.4 TB
11         ✓            ✗        1.625 PB    332.8 TB    166.4 TB

3.3.3: Query Latency Evaluation

The delay between the address request submitted by the EthClipper malware and the response from ClipperCloud, which we call the query latency, is crucial for the success of the attack because the address has to be replaced with a similar one before the user pastes the address into the wallet client. In order to evaluate the delay of similar address requests, we conduct 5 experiments, each including 20 measurements (100 measurements total). From one experiment to the next, we exponentially increase the number of addresses stored in one ClipperCloud server, from $10^4$ to $10^8$. Then, for each of the 5 experiments, we measure the delay of requesting a similar Ethereum address in milliseconds.
We use two ClipperCloud servers (one in San Francisco, the other in New York) hosted on the DigitalOcean Droplet service, both with the following configuration: CPU-Optimized, 32 CPUs, 400 GB SSD, 64 GB RAM, running Ubuntu 20.04 LTS x64. We test three different attacker Internet connection types: a 100 Mbps home cable modem, 60 Mbps home Wi-Fi, and a 20 Mbps 4G LTE connection (AT&T in the United States). Fig. 3.9a presents the results of the experiments. As the evaluation shows, the similar address request time is consistent under different circumstances and is around 2 seconds. Most importantly, as the number of addresses grows exponentially, we observe only a slight increase in delay. Specifically, while the number of addresses increased by 1,000,000%, the delay increased by only 19.7%, which suggests that the latency will remain low with larger database sizes.

Figure 3.9: Average transmission delay. (a) Address request. (b) Mined account transfer.

Although it may be common to copy and paste text in under 2 seconds within a single window of a frequently used application, in the case of a cryptocurrency transfer the address is typically copied from one application (e.g., a web browser) and pasted into the hardware wallet client, which may also involve an application switching step to bring the wallet app to the foreground. Moreover, it is reasonable to assume that most users do not use hardware wallet apps frequently, because every cryptocurrency transfer incurs blockchain fees. Therefore, the clipboard copy-paste cycle is likely to take more than 2 seconds on average. In our experiment, in which we repeated the workflow of the attack 12 times on a laptop (3 times for each wallet), the copy-paste delay exceeded 2 seconds each time (based on stopwatch measurements made by an observing assistant).
Cooperative Account Transfer: We evaluated the delay of the cooperative address transfer by conducting 5 experiments, each including 20 measurements (100 measurements total). From one experiment to the next, we exponentially increased the number of addresses stored in each of the ClipperCloud servers, from $10^4$ to $10^8$. Then, for each of the 5 experiments, we measured the delay of transferring an Ethereum account from miner to cooperator in milliseconds, using two different ClipperCloud servers (one in San Francisco, the other in New York). After that, we calculated the mean and standard deviation of the 20 measurements for each experiment and plotted the results in Fig. 3.9b. As the evaluation shows, the cooperative transfer time is consistent under different circumstances and is on the order of 4 seconds. Most importantly, as the number of addresses grows exponentially, we observe only a slight increase in delay. Specifically, while the number of addresses increased by 1,000,000%, the cooperative transfer delay increased by only 11.6%, which suggests that the complexity of the account transfer is close to O(1).

3.3.4: Address Mining Performance

In order to successfully conduct the EthClipper attack, the attacker needs a large database of Ethereum accounts for address substitution, mined using substantial compute power applied over a lengthy period of time. Therefore, it is crucial to evaluate the ability of an attacker to mine a ClipperCloud database with the desired parameters within a reasonable time and budget. In order to evaluate the performance of address mining by ClipperCloud, we deploy four different server configurations. For ease of reference, we give these configurations short code names: Azure, DO-, DO+, and PC. Below are the details of these configurations:

• Azure: Microsoft Azure H-Series HB60rs high-performance virtual machine with 60 CPUs, 223.52 GB RAM, and 700 GB of storage, running Ubuntu Server 20.04 LTS.
The cost at the time of deployment (January 2021) was $1,664.40/mo.
• DO-: DigitalOcean Basic Droplet with 1 vCPU, 1 GB RAM, and 25 GB of storage, running Ubuntu 20.04 (LTS) x64. The cost at the time of deployment was $5/mo.
• DO+: DigitalOcean CPU-Optimized Droplet with 32 CPUs, 64 GB RAM, and 400 GB of storage, running Ubuntu 20.04 (LTS) x64. The cost of the instance is $640/mo.
• PC: Office PC with an AMD Ryzen Threadripper 2950X CPU (16 cores, 32 threads), 70.6 GB RAM, and 1 TB of SSD storage, running Kubuntu 20.04 LTS.

On each of the four configurations, we perform tests involving different numbers of simultaneous address mining processes: 1, 2, 4, 8, 16, 32, 64, 128, and 256. Each process mines and saves 100,000 random Ethereum accounts. For each test, we measure the time needed for all the processes to finish. Then, for each test, we calculate the mining performance, measured in accounts per second, for each of the server configurations. Fig. 3.10 shows the results of the experiments. The DO- server hung each time we attempted to run 32 simultaneous mining processes, so we were only able to gather partial data for it. All the servers except DO- exhibit similar performance for 1, 2, 4, 8, and 16 simultaneous processes; however, the Azure server shows a significant performance advantage with 32, 64, 128, and 256 simultaneous processes.

Additionally, we evaluate how much time it would take for Azure, DO+, and PC to mine 50% and 95% of the address database for 7, 8, 9, and 10 matching digits, with the results shown in Fig. 3.11. Recall Claim 1 for the details of the calculation of 95% coverage. For the 50% coverage, we use the same Taylor approximation with $\tau = 0.7$. First, we can see that, unsurprisingly, the Azure deployment exhibits a significant performance advantage compared to DO+ and PC. However, Azure is also the most expensive deployment of the three.
Second, the performance difference between the DO+ and PC deployments is insignificant, and given the sizable rental cost of DO+, the use of a retail PC may be the most economical option for an attacker, depending on the available budget and other circumstances. Nevertheless, ClipperCloud is suitable for a flexible variety of possible deployment scenarios. For example, a realistic and affordable scenario would be to use 5 office computers to mine a 50% address coverage. The number of days needed to mine the required coverage on one PC is 467.84, and if the work is split between 5 computers, it would take 467.84/5 = 93.6 days ≈ 3 months. Essentially, this means that the attacker will be able to run ClipperCloud from a home or office and will be statistically capable of replacing 50% of incoming addresses with a 10-digit match, which is 25% of all digits in an Ethereum address. Meanwhile, a small user study by Almutairi and Al-Megren [50] demonstrated that 30% of users of the KeepKey wallet failed to recognize the substitution of a Bitcoin address with 20% of matching symbols. Since EthClipper is an opportunistic attack, a success rate of around 30% can yield a substantial gain for the attacker. Therefore, the EthClipper attack is a realistic attack that can be launched by an attacker with relatively limited resources.

Figure 3.10: ClipperCloud address mining performance.

Figure 3.11: Time needed to achieve the target address mining coverage for 7, 8, 9, and 10 matching digits. (a) 7 digits. (b) 8 digits. (c) 9 digits. (d) 10 digits.

3.3.5: Opinions from Manufacturers of Hardware Wallets

For responsible disclosure, we contacted all the manufacturers of the wallets used in this research, and all of them confirmed the potential danger of the attack. Specifically, the security representative from ShapeShift stated, “[...]
it would likely impact KeepKey users since in my experience, you are right: most users either verify the first/last characters or none at all.” The security representative from SatoshiLabs s.r.o. stated, “It’s quite obvious from the description how the attack works [...]” The head of security research at Ledger SAS said, “The attack you described is a problem we already discussed, and we did not find a satisfactory solution to tackle it. We would be happy to collaborate with you in order to develop defenses against it.” Following the responses, we are in the process of discussing a collaborative defense solution against the attack.

3.4: Security Recommendations and Defense

In this section, we discuss two categories of measures that can be used to prevent the EthClipper attack: adherence to security recommendations and automated defense against the attack.

3.4.1: Security Recommendations

Recommendation 1. Resist confirmation bias: EthClipper is a hybrid attack with a substantial social engineering component, which means that its success largely depends on the ability of the attacker to exploit the cognitive bias of the hardware wallet user. Specifically, the attack relies on the confirmation bias that leads the user to conclude, based on a partial reading, that the actual recipient address matches the intended one. However, a proper verification of the entire address by the user, prior to sending funds to it, is sufficient to reveal the address substitution. Therefore, a disciplined verification of the entire address by the user would deliver a reliable defense against the EthClipper attack.

Recommendation 2. Pay attention to EIP-55 checksums: Ethereum clients often use address checksums, also known as EIP-55 checksums, which are encoded in the addresses via selective capitalization of certain hexadecimal letters.
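To make the capitalization rule concrete, the following sketch encodes an address according to EIP-55: the lowercase hexadecimal address is hashed with Keccak-256, and each letter is capitalized when the hash digit at the same position is 8 or greater. The code is an illustrative assumption-laden sketch, not part of the EthClipper stack; the function names are hypothetical, and a compact pure-Python Keccak-256 is included because Python's `hashlib.sha3_256` implements the NIST SHA-3 variant, which uses different padding and produces different digests.

```python
MASK64 = (1 << 64) - 1


def _rol64(value: int, shift: int) -> int:
    """Rotate a 64-bit lane left by the given number of bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def _keccak_f1600(lanes: list) -> list:
    """Keccak-f[1600] permutation on a 5x5 matrix of 64-bit lanes, lanes[x][y]."""
    lfsr = 1
    for _ in range(24):
        # theta
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4] for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], _rol64(current, (t + 1) * (t + 2) // 2)
        # chi
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota (round constants generated by an 8-bit LFSR)
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def keccak256(data: bytes) -> bytes:
    """Keccak-256 as used by Ethereum (padding byte 0x01, not SHA-3's 0x06)."""
    rate = 136  # bytes absorbed per block; the remaining 64 bytes are capacity
    padded = bytearray(data) + bytearray(rate - len(data) % rate)
    padded[len(data)] ^= 0x01
    padded[-1] ^= 0x80
    lanes = [[0] * 5 for _ in range(5)]
    for start in range(0, len(padded), rate):
        block = padded[start:start + rate]
        for i in range(rate // 8):
            lanes[i % 5][i // 5] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        lanes = _keccak_f1600(lanes)
    return b"".join(lanes[i % 5][i // 5].to_bytes(8, "little") for i in range(4))


def to_checksum_address(address: str) -> str:
    """Apply EIP-55 capitalization to a 40-digit hexadecimal address."""
    digits = address.lower()
    if digits.startswith("0x"):
        digits = digits[2:]
    digest = keccak256(digits.encode("ascii")).hex()
    return "0x" + "".join(
        ch.upper() if ch in "abcdef" and int(digest[i], 16) >= 8 else ch
        for i, ch in enumerate(digits)
    )


def has_valid_checksum(address: str) -> bool:
    """True if a mixed-case address matches its own EIP-55 encoding."""
    return address == to_checksum_address(address)
```

Because the capitalization depends on a hash of the entire address, two addresses that share only a prefix and a suffix will generally differ in which letters are capitalized, which is the visual cue this recommendation relies on.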
These checksums are primarily designed for software clients to detect typos in hand-typed addresses; however, they can also be useful for uncovering an EthClipper attack. Although an EIP-55 capitalization can be falsified [158], doing so would incur a significant computational overhead for the ClipperCloud address miner (footnote 30), rendering the creation of the address database infeasible within a reasonable time frame. Consequently, the address substituted by EthClipper would likely have a different capitalization than the original one. Therefore, when verifying the correctness of Ethereum addresses, we recommend paying attention to the capitalization of their hexadecimal letters.

Footnote 30: The probability of EIP-55 checksum collision is ≈0.0139% [55].

3.4.2: Automated Defense

In the spirit of reproducible and open research, we intend to publish the source code of the EthClipper stack after the defense is developed and incorporated by the wallet manufacturers. The malware component of our stack can be reused to implement a resident program that issues notifications or sound alarms each time a new Ethereum address is added to the operating system clipboard. Moreover, we are in the process of submitting recommendations to all the vendors of hardware wallets to incorporate clipboard monitoring components into their desktop client software. Specifically, a system alert issued by such a component upon detection of an Ethereum address in the clipboard would effectively prevent an address substitution from going unnoticed, because the user would see a system notification each time an address appears or is replaced in the clipboard.

3.5: Related Work

The research related to hardware wallets mostly focuses on hardware vulnerabilities and feature enhancements. Guri et al. [138] demonstrate a technique that allows an attacker to exfiltrate private keys from a hardware wallet by installing malware directly on the wallet's firmware. Gutoski et al.
[139] show that the hierarchical deterministic (HD) wallet design, used in all popular hardware wallets, allows all the private keys in the hierarchy to be revealed if only one of the private keys is leaked; this research further proposes a new HD wallet design that avoids such key co-dependency. Several works in wireless sensing [184] demonstrate the ability to steal passcodes from personal devices, possibly including hardware wallets. The above adversarial scenarios, however, assume that either the attacker has physical access to the hardware wallet or there is a partial leak of wallet credentials. Intuitively, both scenarios are highly unlikely within the context of the EthClipper attack, which zeroes in on adversarial actions that real-world attackers have been using successfully for decades, i.e., malware infestation of user computers and social engineering. Datko et al. [104] demonstrate how the firmware of some hardware wallets can be attacked to steal the user's PIN code. San Pedro et al. [246] explore side-channel attacks that allow the extraction of PIN codes and private keys from the Trezor One hardware wallet. Although the vulnerability has been promptly patched by the manufacturer, it demonstrates that the hardware and firmware components of hardware wallets can also be attacked. Gkaniatsou et al. [131] show how the low-level local communication protocol between the client software and the hardware wallet can be used for side-channel attacks. Nevertheless, while breaking the air-gap protection of hardware crypto wallets is unrelated to EthClipper, fixing the hardware vulnerabilities does not make hardware wallets less susceptible to the EthClipper attack.

3.6: Chapter Summary

Hardware crypto wallets are relatively expensive and popular among users who own large amounts of cryptocurrency.
These devices promise to protect the stored funds even when the attacker gains full control over the victim's computer, including in the malware invasion scenario. However, in this work we demonstrated that it is possible to compromise the air-gapped security of a hardware wallet and fool its owner into confirming a malicious transaction, even without jeopardizing the integrity of the wallet itself. Our EthClipper attack, which three leading hardware wallet manufacturers confirmed to be potentially dangerous, not only falsifies the input to the hardware wallet but also crafts the address in a way that circumvents the transaction verification procedure. Our evaluation confirms that the attack can be carried out with a limited budget on retail equipment. As hardware wallets continue to populate the market, we anticipate a growing number of opportunistic social engineering attack attempts on these wallets, and we believe that our work will raise vigilance about such attacks. At the time of writing, there is no affiliation or sponsorship, current or arranged, between the authors of this work and the manufacturers of the hardware wallets used in this research.

CHAPTER 4: TAXONOMY OF DEFENSE SOLUTIONS FOR SMART CONTRACTS (footnote 31)

4.1: Introduction

Blockchain is a decentralized network that sustains distributed records stored in immutable blocks to form an ever-growing chain. In one decade, blockchain technology has evolved from a cryptocurrency ledger (e.g., Bitcoin, Monero) into a decentralized computing platform (e.g., Ethereum, EOS) that allows the deployment and execution of smart contracts. A smart contract is a decentralized program deployed on a blockchain that enforces the execution of protocols and agreements without involving any third party or establishing mutual trust [265].
A smart contract provides a set of functions to be called via transactions and executed by the blockchain's virtual machine (VM). Most smart contracts are written in high-level special-purpose programming languages, such as Solidity, JavaScript, or Vyper, and compiled into the blockchain VM bytecode. For example, the Ethereum Virtual Machine (EVM) is the blockchain VM for executing smart contracts on the Ethereum platform (footnote 32). An important feature of smart contracts is their ability to perform financial operations with cryptocurrency and valuable custom tokens (e.g., ERC20, ERC721). As of March 2022, the total market capitalization of smart contracts exceeds 300 billion USD [28].

The large amounts of valued assets stored and transacted by smart contracts have made them lucrative targets for attackers. Numerous security vulnerabilities and attacks on Ethereum smart contracts have been hampering their widespread adoption [132, 255]. In the past few years, exploitations of these vulnerabilities caused hundreds of millions of dollars in damages. For example, in June 2016, about $150 million were stolen from the popular DAO contract [123]. In July 2017, about $30 million were stolen from the Parity multi-signature wallet [73]. Not long after that, a bug in the same multi-signature wallet caused the freeze of about $280 million [77].

Footnote 31: This chapter is based on accepted work by Nikolay Ivanov, Chenning Li, Qiben Yan, Zhiyuan Sun, Zhichao Cao and Xiapu Luo titled “Security Threat Mitigation For Smart Contracts: A Comprehensive Survey” to be published at ACM Computing Surveys (CSUR) [157].

Footnote 32: Although it is primarily associated with Ethereum, EVM has also been adopted by some other blockchain platforms, such as Polygon [34] and RSK [35].

A large number of approaches and tools have been developed to address different types of smart contract security issues.
In this work, we use the term threat mitigation solutions to describe the full spectrum of active defense and passive preventative solutions that aim to reduce or eliminate the threat associated with the exploitation of security vulnerabilities in smart contracts. These solutions include academic research efforts as well as commercial and open-source software products.

Some surveys have been published that summarize vulnerabilities and attacks in smart contracts [57, 185]. Furthermore, the Smart Contract Weakness Classification and Test Cases database, also known as the SWC Registry [42], identifies and describes 37 classes of known smart contract vulnerabilities (as of March 2022). However, all the existing ways of systematizing smart contract security knowledge focus primarily on vulnerabilities and attacks, paying little or no attention to the broad swath of defense and prevention mechanisms developed in the past decade. In this work, we bridge the gap in the systematization of threat mitigation solutions via the following four steps: developing a classification taxonomy, synthesizing design workflows of the core methods of threat mitigation, creating a map of vulnerability coverage, and conducting an evolutionary analysis.

Step I. Taxonomy: Smart contract threat mitigation comprises a diverse set of efforts, so finding a uniform organizational methodology for all these solutions poses a major challenge. These solutions employ a variety of techniques, such as symbolic execution [214, 223], formal verification [83], and static analysis [76, 310], to name a few. Some of these solutions target specific vulnerabilities, such as reentrancy [240] or integer overflow [261], while others are general-purpose [271]. Some threat mitigation solutions aim at detecting vulnerabilities [200], while others focus on verifying the safety properties of a smart contract [230]. In other words, all these solutions vary along multiple dimensions.
In this survey, we formalize these dimensions and create a comprehensive taxonomy of smart contract threat mitigation based on five dimensions: defense modality, core method, targeted contracts, data mapping, and threat model.

Step II. Design Workflows: In addition to learning what the smart contract threat mitigation solutions do, we also explore how they achieve their goals, which is challenging due to the wide variety of innovations and novel techniques employed by the existing solutions. In this work, we study the design workflows of all the 133 smart contract threat mitigation solutions under our investigation, and we subdivide them into eight core methods: static analysis, symbolic execution, fuzzing, formal analysis, machine learning, execution tracing, code synthesis, and transaction interception. Then, we synthesize the actual designs of the threat mitigation solutions corresponding to each of the eight core methods and build eight uniform workflows that summarize the whole variety of threat mitigation solutions for smart contracts.

Step III. Vulnerability Coverage: Next, we raise another important question: which known vulnerabilities are covered (i.e., prevented, detected, or unmasked) by the existing smart contract threat mitigation solutions? Answering this question requires overcoming two significant challenges: i) the lack of explicit and implicit declaration of addressed vulnerabilities by many threat mitigation solutions, and ii) the lack of uniform definitions of smart contract vulnerabilities. To overcome these challenges, we meticulously translate, group, or un-group the vulnerabilities referred to by the authors of the threat mitigation solutions to match the vulnerability classification proposed by the popular SWC Registry. Thus, we develop a unified vulnerability coverage map for these solutions based on the SWC Registry.

Step IV.
Evolutionary Analysis: We perform an evidence-based evolutionary analysis of existing smart contract threat mitigation solutions to identify trends and potential future research directions. Specifically, we identify the three most promising vectors of development of smart contract threat mitigation solutions: dynamic transaction interception, AI-driven security, and the study of human-machine interaction in smart contracts. In addition, we identify two major deficiencies of the existing body of threat mitigation solutions: the under-representation of non-Ethereum smart contracts as targets and the lack of security-related large-scale measurements, especially those related to off-chain data.

In summary, in this work, we make the following contributions:
• We develop a five-dimensional threat mitigation taxonomy tailored for smart contracts, and we use this taxonomy to classify 133 existing smart contract threat mitigation solutions.
• We pinpoint eight core methods adopted by the existing smart contract threat mitigation solutions, and we develop synthesized workflows of these methods to demonstrate the internal workings of smart contract threat mitigation.
• We identify the threat mitigation solutions that explicitly declare protection against specific vulnerabilities, and we create a smart contract vulnerability coverage map for these solutions.
• We identify trends and deficiencies of the existing smart contract mitigation solutions based on the findings of this survey and other solid evidence.
• Finally, in the spirit of open research, we develop and publish a constantly updated online registry of threat mitigation solutions, called the STM Registry (footnote 33).

Organization: The rest of this work is organized as follows. First, we compare our work with previous surveys related to smart contract security (Section 4.2). Then, we describe the methodology employed in this survey (Section 4.3).
After that, we classify 133 threat mitigation solutions based on the developed five-dimensional taxonomy (Section 4.4), followed by a detailed comparative description of the designs of the eight core methods of threat mitigation (Section 4.5). Next, we compare the threat mitigation methods by their ability to address specific known smart contract vulnerabilities (Section 4.6). Then, we discuss trends and future perspectives of threat mitigation in smart contracts (Section 4.7), and finally, we conclude our work (Section 5.8).

Footnote 33: https://nick-ivanov.github.io/stmregistry/

4.2: Prior Surveys

A number of previous surveys on smart contract security have been published; however, their perspectives differ from that of this survey. Atzei et al. [57] propose the first systematic exposition of Ethereum security vulnerabilities by organizing the vulnerabilities into three levels: Solidity (footnote 34), EVM (footnote 35) bytecode, and blockchain. They also illustrate six influential attacks in different application scenarios. In contrast, we primarily target vulnerability mitigation methods rather than the classification of programming pitfalls. Jiachi et al. [92] propose an empirical survey that provides a systematic study of smart contract defects on the Ethereum platform from five aspects: security, availability, performance, maintainability, and re-usability. They collect and analyze smart contract-related posts on Ethereum.StackExchange (footnote 36) as well as real-world smart contracts to define 20 kinds of contract flaws and 5 relevant impacts. Zou et al. [311] perform exploratory research to illustrate the current state and potential challenges of smart contract development. Specifically, they conduct semi-structured interviews with 20 developers and professionals, followed by a survey of 232 practitioners to confirm the 5 conclusions from the interviews, which focus primarily on smart contract development. In addition, Zhang et al.
[302] present a new classification framework for smart contract bugs and construct a dataset of 176 buggy smart contracts. Wang et al. [284] con- duct an analysis of the security of Ethereum smart contracts and categorize these security challenges into abnormal contracts, program vulnerabilities, and unsafe external data. Vacca et al. [272] pro- vide a systematic review of techniques and tools used to address the software engineering-specific challenges of blockchain-based applications by analyzing 96 papers. The above surveys summa- rize smart contract security and development issues, while we focus on vulnerability mitigation solutions. There are also a number of surveys that take the vulnerability mitigation solutions into consid- eration. Huashan et al. [91] present a comprehensive and systematic survey on Ethereum systems security which includes vulnerabilities, attacks, and defenses. The authors discuss 44 kinds of vul- nerabilities based on the layers of the Ethereum architecture and describe the history, cause, tactic, and direct impact of 26 attacks. As for defenses, the authors enumerate 47 defense mechanisms and provide the best practices to guide contract development. Although they divide the defenses into proactive and reactive, they are lacking an explanation of how the different tools are designed. 34 Solidity is an object-oriented programming language used mostly for writing Ethereum smart contracts. 35 The Ethereum Virtual Machine (EVM) is a software platform for executing Ethereum smart contracts. All smart contracts are compiled into bytecode and run on the EVM of all Ethereum nodes. 36 https://ethereum.stackexchange.com/ 74 Another survey by Wang and He et al. [282] reviews 6 kinds of vulnerability detection methods and privacy protection techniques in 3 platforms (i.e., Ethereum, Hyperledger fabric and Corda), and summarizes several commonly used tools for each method. Di Angelo et al. 
[107] investigate 27 analysis tools of Ethereum smart contracts regarding availability, maturity level, methods employed, and detection of security issues. They examine the availability and functionality of the tools and compare their characteristics in a structured manner. In comparison, we carry out a multi-dimensional classification of 133 solutions and take into account different aspects of threat mitigation. We also analyze different defense mechanisms through their architecture. Furthermore, Samreen et al. [245] review some detection tools and discuss eight vulnerabilities by analyzing past exploitation cases. Ni et al. [222] propose a three-layered threat model for smart contract security and introduce 15 major vulnerabilities of Ethereum at three levels: programming language, virtual machine, and blockchain. They also summarize and compare the three most commonly used vulnerability mitigation techniques, viz., fuzzing, symbolic execution, and formal verification. Li et al. [185] survey the security threats of blockchain and enumerate 6 real attack cases. They also review the security enhancement solutions for blockchain by introducing 5 commonly used defense tools. In contrast, we categorize defenses in 5 orthogonal dimensions and compare 133 commonly used solutions. Praitheeshan et al. [233] review the security of Ethereum smart contracts through 16 types of security vulnerabilities, 19 software security issues, and 3 defense methods. For each defense method, they list several common tools but do not compare the different methods and tools. In contrast, we cover 5 more core methods of vulnerability mitigation and compare them along 5 dimensions. Moreover, we construct a compact vulnerability map that contains 37 known vulnerabilities to summarize the vulnerability-addressing ability of 38 classes of threat mitigation solutions.

There are several studies that delve into a specific defense method (e.g., formal verification). Tolmach et al.
[268] scrutinize formal models and specifications of smart contracts. They categorize the specifications of smart contracts in various application domains and propose a four-layered framework to classify smart contract analysis methods. After that, they summarize the tools for formal verification and group them based on the utilized techniques. In addition, the authors discuss the difficulties in smart contract verification and development. Similarly, Singh et al. [258] conduct a systematic survey of current formalization research on all smart contract-enabled blockchain platforms by summarizing 35 studies between 2015 and 2019. However, these studies focus purely on formal verification without examining other types of threat mitigation. In contrast, we provide eight commonly used core methods of vulnerability mitigation and identify future research trends and directions in smart contract threat mitigation.

Unlike the above surveys, which have insufficient technical depth or only focus on a specific method, our survey comprehensively reviews eight commonly used core methods. Overall, we undertake four major steps to shed light on the ever-evolving threat mitigation landscape of smart contracts: 1) comprehensive 5-dimensional classification taxonomy; 2) synthesis of design workflows corresponding to the eight core methods; 3) vulnerability coverage map; and 4) evolutionary analysis with trends and perspectives. The combination of these four steps applied to 133 solutions makes our work the most comprehensive systematization of smart contract threat mitigation to date.

4.3: Methodology

In this section, we describe the details of the 4-step methodology that we use in this survey. Fig.
4.1 depicts these steps, which include: Step I: developing the classification taxonomy of smart contract threat mitigation solutions; Step II: synthesizing the workflows of the core methods of threat mitigation solutions; Step III: developing the vulnerability coverage map by threat mitigation solutions; and Step IV: investigating the evolutionary trends and deficiencies of threat mitigation in smart contracts. Next, we describe the approaches employed by these four steps in detail.

Figure 4.1: Four-step methodology of this survey.

4.3.1: Classification Taxonomy

To classify the smart contract threat mitigation solutions, we build a comprehensive taxonomy of threat mitigation, which includes the following five orthogonal dimensions (see Table 4.1): 1) defense modality, 2) core method, 3) targeted contracts, 4) data mapping, and 5) threat model. We empirically verify that our taxonomy is not only concise but also describes a threat mitigation solution with high accuracy. For example, using our taxonomy, the popular threat mitigation tool Oyente [200] can be accurately described via the following single sentence:

“Oyente is a security tool based on symbolic execution that detects and reports vulnerabilities in the bytecode of malicious or buggy Ethereum smart contracts.”

Moreover, our taxonomy is cross-platform and general enough to be applied to the future developments of threat mitigation for smart contracts, even when new methods or platforms emerge. Next, we describe the five dimensions of the threat mitigation taxonomy in detail.

Defense Modality: The defense modality is the essential philosophy used by a threat mitigation solution to achieve its goals, which is either prevention, detection, or exploration. The prevention methods aim at verifying or enforcing certain security properties of a smart contract.
For example, the requirement that if a smart contract accepts cryptocurrency deposits, it must also provide the functionality for cryptocurrency withdrawal, can be used by a solution with the prevention modality as a property to enforce or verify security. The detection methods look for known vulnerabilities in smart contracts. For instance, defense tools that search for reentrancy vulnerabilities in smart contracts pertain to the detection defense modality. The exploration approaches enhance the transparency of a smart contract or associated transactions in order to facilitate security audits. For example, an auditing tool that allows demystifying the call stack of a complicated smart contract, thereby exposing the potential security problems, would belong to the exploration defense modality.

Table 4.1: Smart contract threat mitigation taxonomy

Classification Dimension | Possible Values | Short Notation
Defense Modality | prevention | PR
Defense Modality | detection | DET
Defense Modality | exploration | EXP
Core Method | static analysis | SA
Core Method | symbolic execution | SE
Core Method | fuzzing | F
Core Method | formal analysis | FA
Core Method | machine learning | ML
Core Method | execution tracing | ET
Core Method | code synthesis | CS
Core Method | transaction interception | TI
Targeted Contracts | Ethereum | ETH
Targeted Contracts | EVM-compatible | EVMc
Targeted Contracts | any contract | aC
Targeted Contracts | non-Ethereum | nETH
Data Mapping (input) | source code | S
Data Mapping (input) | bytecode | B
Data Mapping (input) | ABI | A
Data Mapping (input) | specifications | Sp
Data Mapping (input) | chain data | C
Data Mapping (input) | assembly code | As
Data Mapping (output) | report | R
Data Mapping (output) | source code | S
Data Mapping (output) | bytecode | B
Data Mapping (output) | action | Ac
Data Mapping (output) | exploits | E
Data Mapping (output) | metadata | M
Threat Model | vulnerable contract only | VC
Threat Model | malicious contract only | MC
Threat Model | malicious or vulnerable contract | MVC

Core Method: The core method is the technical approach describing the implementation principles of a given threat mitigation solution. Unlike defense modality, which describes the general philosophy of a solution, the core method describes the implementation methodology utilized by the solution; in other words, the same defense philosophy can be implemented in a number of different core methods.
Threat mitigation solutions belonging to the same core method, despite the diversity of implementations, share the same major workflow with possible minor additions. For example, all symbolic execution methods take a smart contract and a set of specifications as an input, utilize an SMT solver, and produce a human-readable report as an output; however, many symbolic execution solutions add further modules and data units to the standard workflow items. In this work, we build workflows that demonstrate which items are essential and which of them provide an incremental augmentation.

Figure 4.2: Venn diagram of relationships between different scopes of smart contracts.

Targeted Contracts: The dimension of targeted contracts describes the class of smart contracts that a threat mitigation solution applies to. This dimension is largely shaped by the practical circumstance that the vast majority of smart contract threat mitigation solutions target the popular Ethereum platform. Moreover, we notice that within the Ethereum platform, there is very little variety in terms of what kind of Ethereum smart contracts the threat mitigation solutions target. In other words, most solutions target Ethereum, and these Ethereum-based solutions are suitable for any Ethereum contract. Thus, to accurately represent the practical reality of the distribution of smart contract threat mitigation solutions in the dimension of targeted contracts, we subdivide this dimension into four classes: Ethereum smart contracts, EVM-compatible smart contracts, non-Ethereum smart contracts, and any smart contract (i.e., platform-agnostic). Fig. 4.2 shows the Venn diagram of the relationships between these classes. Specifically, all Ethereum contracts are EVM-compatible, but there are non-Ethereum platforms that may or may not be EVM-compatible.
At the same time, the “any contract” scope embraces all the types of smart contracts mentioned above, without prioritizing any of them.

Data Mapping: The data mapping dimension describes what the input and output of a given threat mitigation solution are. As shown in Table 4.1, the input of a threat mitigation solution may be a combination of 1) source code; 2) bytecode; 3) application binary interface (ABI); 4) security specifications; 5) chain data; or 6) assembly code. The output can be represented by any combination of the following six entities: 1) security report; 2) source code; 3) bytecode; 4) defense action; 5) set of exploits; or 6) metadata. In this work, we use the symbol ↦ as a convention for data mapping. For example, if the input of a threat mitigation solution is a set of specifications with the source code of the smart contract, and the output is a human-readable report, then we denote such a mapping as Sp,S ↦ R. As we can see, the data mapping dimension concisely and informatively describes the requirements for the input and the expectations for the output of a smart contract threat mitigation solution.

Threat Model: The dimension of threat model describes the vector(s) of potential attacks that the threat mitigation solution aims to prevent, detect, or explore. We empirically observe that all the smart contract threat mitigation solutions belong to one of three general threat models: 1) the one in which the smart contract is malicious; 2) the one in which the smart contract is the victim; and 3) the agnostic model, in which the contract may be either malicious or a victim. For example, the threat mitigation solutions capable of preventing exploitations of the reentrancy vulnerability, responsible for the infamous DAO hack [123], belong to the VC (vulnerable contract) model.
Conversely, a tool defending against honeypot smart contracts, which set unexpected traps for hackers attempting to exploit known smart contract vulnerabilities, is a typical example of a threat mitigation tool assuming the malicious contract (MC) threat model. However, some solutions defend against vulnerabilities that can appear in either a malicious or a victim smart contract; in this case, we assign to the solution the malicious-or-vulnerable contract (MVC) model. For example, the SWC-123 vulnerability [40], called Requirement Violation, can be either a bug in a vulnerable smart contract or an intentional malicious action of the smart contract developer.

4.3.2: Workflows of Core Methods

In this survey, not only do we explore what the smart contract threat mitigation solutions do, but we also explore, for the first time, how these solutions accomplish their goals. In order to do that, we adopt the following approach: for each of the eight core methods, we synthesize the workflows of all the existing solutions implementing these methods to showcase the mandatory (common for all solutions) and augmented (observed in some solutions) elements. Sections 4.5.1 through 4.5.8 describe the synthesized workflows of all eight core methods of smart contract threat mitigation. In order to embrace the diverse variety of implementations, we use a uniform set of conventions in the eight workflows. Specifically, we use three types of elements connected with flows (arrows): modules (data processors), data entities, and environments (groups).

4.3.3: Vulnerability Coverage

The third step of our survey is scrutinizing the vulnerability coverage, i.e., determining which known vulnerabilities are detectable and/or preventable by the existing threat mitigation solutions. To accomplish that, we create a uniform vulnerability coverage map using the popular SWC Registry.
This task poses two major challenges: i) many threat mitigation solutions do not explicitly or even implicitly declare the set of addressed vulnerabilities; ii) the majority of threat mitigation solutions refer to the existing vulnerabilities using custom names and/or groupings, which often do not correspond to the SWC taxonomy. Here, we select the 38 threat mitigation solutions that explicitly specify the list of targeted vulnerabilities, and then we meticulously translate the declared vulnerability coverage provided by the selected 38 solutions into the SWC conventions.

4.3.4: Threat Mitigation Evolution

Our final step explores the evolution of the smart contract threat mitigation solutions, as well as the trends and obstacles observed in this area of computer security. Specifically, we explore the adoption and augmentation of new core methods over time. For each threat mitigation solution, we keep track of the publication date as well as the initial release or announcement date, whenever available. Additionally, we analyze the “blind spots” of the existing body of smart contract mitigation solutions, i.e., the potentially feasible yet unexplored combinations of approaches that can bring more benefits, especially if a similar combination of approaches has been successful in other more mature areas of computer security. As a result, we make five observations supported by data and evidence. First, we identify that dynamic transaction interception methods of smart contract threat mitigation are gaining momentum in the research community. Second, we show that the smart contract threat mitigation solutions utilizing AI and machine learning have started playing an important role in smart contract defense. Third, we identify the emerging trend of studying human-machine interaction in the domain of smart contracts.
Fourth, we confirm that Ethereum smart contracts are over-represented by the threat mitigation solutions, and we discuss likely reasons explaining this phenomenon. Finally, we discuss the necessity for more exploration tools and large-scale measurements for gathering important data about smart contract security, such as the real market value of smart contracts and the traces of choices made by miners and crypto exchanges.

4.4: Threat Mitigation Classification

In this section, we apply the taxonomy developed earlier (Section 4.3.1) to describe each of the threat mitigation solutions via the five orthogonal dimensions: threat mitigation modality (Section 4.4.1), core method (Section 4.4.2), the scope of targeted contracts (Section 4.4.3), the input-output data mapping of the solution (Section 4.4.4), and the assumed threat model (Section 4.4.5). The results of our classification are given in Table 4.2. Furthermore, we perform a frequency analysis of the results along the five dimensions, and create a visual representation of the distributions of defense modalities, core methods, targeted contracts, and threat models in Fig. 4.3.

In the first column of the table, we assign to each of the threat mitigation solutions a permanent Security Threat Mitigation (STM) registry identifier in the STM-XXX format. The second column provides the name of the tool implementing the solution along with its reference; if a solution does not have a common name, we refer to the solution by its authors (e.g., Ivanov et al.). In columns 3–7, we provide the values along the five classification dimensions for each of the 133 threat mitigation solutions.

Table 4.2: Classification of threat mitigation solutions based on the proposed taxonomy. Columns 3–7 are the classification criteria (dimensions)†.

STM registry code | Threat mitigation solution | Defense Modality | Core Method | Targeted Contracts | Data Mapping | Threat Model
STM-001 | Oyente [200] | DET | SE | ETH | B ↦ R | MVC
STM-002 | Mythril [215] | DET | SE | ETH | B ↦ R | MVC
STM-003 | Securify [271] | DET+PR | SA | ETH | B,S ↦ R,M | MVC
STM-004 | Maian [223] | DET | SE | ETH | B ↦ R | MVC
STM-005 | Manticore [214] | DET | SE | ETH | B ↦ R | MVC
STM-006 | KEVM [148] | EXP | FA | ETH | B ↦ R | MVC
STM-007 | ZEUS [168] | PR | SE | ETH | B ↦ R | MVC
STM-008 | Sereum [240] | DET | ET,TI | ETH | C ↦ R | VC
STM-009 | ECFChecker [137] | PR | ET,TI | ETH | C ↦ R | VC
STM-010 | teEther [178] | DET | SE | ETH | B ↦ E | VC
STM-011 | Hydra [74] | PR | CS | ETH | S ↦ B | VC
STM-012 | Erays [310] | EXP | SA | ETH | B ↦ M | MVC
STM-013 | TokenScope [96] | DET | ET | ETH | C ↦ R | MVC
STM-014 | Osiris [269] | DET | SE | ETH | B ↦ R | VC
STM-015 | Vandal [76] | DET | SA | ETH | B ↦ R | MVC
STM-016 | FSolidM [205] | PR | CS | ETH | S,Sp ↦ S | VC
STM-017 | ContractFuzzer [163] | DET | F | ETH | A,B ↦ R | VC
STM-018 | S-GRAM/Ether* [194] | DET | SA | ETH | S ↦ R | MVC
STM-019 | MadMax [133] | DET | SA | ETH | B ↦ R | MVC
STM-020 | SmartCheck [267] | DET | SA | ETH | S ↦ R | MVC
STM-021 | ReGuard [193] | DET | F | ETH | S ↦ R,E | VC
STM-022 | GASPER [95] | DET | SA | ETH | B ↦ R | MC
STM-023 | Grishchenko et al. [136] | EXP | FA | ETH | B ↦ M | MVC
STM-024 | Lolisa [294] | PR | FA | ETH | S ↦ R | MVC
STM-025 | SASC [307] | EXP | SA | ETH | S ↦ R | MVC
STM-026 | Chen et al. [97] | DET | ET,SA | ETH | C,S ↦ R | MC
STM-027 | Solidity*/EVM* [64] | PR | SA,FA | ETH | S ↦ R | MVC
STM-028 | Amani et al. [52] | PR | SA,FA | ETH | B,Sp ↦ R | MVC
STM-029 | Model-Checking [217] | PR | FA | ETH | Sp,S ↦ R | MVC
STM-030 | EtherTrust [135] | DET | SA | ETH | B ↦ R | MVC
STM-031 | Flint [249] | PR | CS | ETH | Sp ↦ S | VC
STM-032 | HoneyBadger [270] | DET | SA,SE | ETH | B ↦ R | MC
STM-033 | ILF [146] | DET | F,ML | ETH | B,S ↦ R | MVC
STM-034 | VeriSolid [206] | PR | FA,CS | ETH | S,Sp ↦ S | VC
STM-035 | solc-verify [141] | PR | SA | ETH | S,Sp ↦ R | MVC
STM-036 | Slither [114] | DET | SA | ETH | S ↦ R | MVC
STM-037 | sCompile [89] | DET | SE | ETH | B ↦ R | MVC
STM-038 | NPChecker [280] | DET | SA | ETH | B ↦ R | MVC
STM-039 | BitML [58] | PR | SA | nETH | C,Sp ↦ R,E | VC
STM-040 | CESC [187] | DET | SA | ETH | B ↦ R | VC
STM-041 | EasyFlow [127] | DET | SA | ETH | C ↦ R | VC
STM-042 | Vultron [278] | DET | CS | ETH | S ↦ R | VC
STM-043 | SAFEVM [48] | PR | SA | ETH | S,B ↦ R | MVC
STM-044 | EthRacer [176] | DET | F | ETH | B,C ↦ R | MVC
STM-045 | SolidityCheck [301] | DET | SA | ETH | S ↦ R | MVC
STM-046 | EVMFuzz [125] | DET | F | ETH | S ↦ R | MVC
STM-047 | EVulHunter [236] | DET | SA | ETH | As ↦ R | VC
STM-048 | GasFuzz [202] | DET | F | ETH | B ↦ R | MVC
STM-049 | NeuCheck [198] | DET | SA | ETH | S ↦ R | MVC
STM-050 | SolAnalyser [47] | DET | SA | ETH | S ↦ R | MVC
STM-051 | SoliAudit [189] | DET | F | ETH | S ↦ R | MVC
STM-052 | MPro [304] | DET | SA,SE | ETH | S ↦ R | MVC
STM-053 | Li et al. [186] | PR | FA | ETH | S ↦ R | VC
STM-054 | Gastap [49] | PR | SA | ETH | S,B,As ↦ R | MVC
STM-055 | Momeni et al. [212] | DET | ML | ETH | S ↦ M | MVC
STM-056 | KSolidity [165] | EXP | FA | ETH | S ↦ M | MVC
STM-057 | VerX [230] | PR | SE,CS | ETH | S ↦ R | MVC
STM-058 | VeriSmart [261] | DET | SA | ETH | S ↦ R | MVC
STM-059 | TxSpector [299] | EXP | SA,ET | ETH | C ↦ R | MVC
STM-060 | Zhou et al. [309] | EXP | SA | ETH | C ↦ R | MVC
STM-061 | ETHBMC [122] | DET | SE | ETH | B ↦ R | MVC
STM-062 | SODA [94] | DET | TI | EVMc | C ↦ R,Ac | VC
STM-063 | Ethor [248] | PR | SA,FA | ETH | B ↦ R | MVC
STM-064 | ÆGIS [117] | DET | TI | ETH | C ↦ R,Ac | VC
STM-065 | SafePay [188] | DET | SE | ETH | S,B ↦ R | VC
STM-066 | Solar [116] | DET | CS,SE | ETH | Sp ↦ R | VC
STM-067 | EVMFuzzer [126] | DET | F | EVMc | Sp ↦ R | MVC
STM-068 | ModCon [196] | DET+PR | F | aC | S ↦ R | MVC
STM-069 | Harvey [291] | DET | F | ETH | S ↦ R | MVC
STM-070 | Solythesis [183] | PR | CS | ETH | S ↦ S | VC
STM-071 | Ethainter [75] | DET | SA | ETH | S ↦ R | MVC
STM-072 | sFuzz [221] | DET | F | ETH | B,A ↦ R | MVC
STM-073 | Seraph [295] | DET | SE | aC | S ↦ R | MVC
STM-074 | Clairvoyance [296] | DET | SA | ETH | S ↦ R | VC
STM-075 | Artemis [274] | DET | SE | ETH | B ↦ R | MVC
STM-076 | Echidna [134] | DET | F | ETH | B,Sc ↦ R,M | MVC
STM-077 | EShield [293] | PR | CS | ETH | B ↦ B | VC
STM-078 | SMARTSHIELD [305] | DET | CS | ETH | B ↦ B,R | VC
STM-079 | ETHPLOIT [303] | DET | F | ETH | S ↦ E | MVC
STM-080 | Cecchetti et al. [84] | PR | CS | aC | S ↦ R | VC
STM-081 | EthScope [289] | DET | ET | ETH | C ↦ R | MC
STM-082 | ContractWard [281] | DET | ML | ETH | S ↦ R,M | MVC
STM-083 | RA [100] | DET | SA,SE | ETH | B ↦ R | VC
STM-084 | Camino et al. [80] | DET | SA | ETH | B ↦ R | MC
STM-085 | OpenBalthazar [56] | DET | SA | ETH | S ↦ R | MVC
STM-086 | sGUARD [220] | PR | SA,FA | ETH | B ↦ R | VC
STM-087 | SmartPulse [263] | PR | SA | ETH | S,Sp ↦ R | MVC
STM-088 | SeRIF [83] | DET | FA | aC | S ↦ R | VC
STM-089 | EVMPatch [241] | DET | CS | ETH | B,C ↦ B | VC
STM-090 | Perez et al. [228] | EXP | ET | ETH | C ↦ R | VC
STM-091 | DEFIER [264] | DET | ET | ETH | C ↦ R | MVC
STM-092 | SmarTest [260] | DET | SE,SA | ETH | B ↦ R,E | MVC
STM-093 | EOSAFE [147] | DET | SA | nETH | B ↦ R | MVC
STM-094 | Ivanov et al. [158] | PR+DET | SA | ETH | S ↦ R | MC
STM-095 | ConFuzzius [119] | DET | F | ETH | B,A ↦ R | MVC
STM-096 | Huang et al. [154] | DET | SA | ETH | B ↦ R | VC
STM-097 | STC/STV [153] | PR | SA | ETH | S ↦ R | VC
STM-098 | Horus [120] | DET | ET | ETH | C ↦ R | MVC
STM-099 | BlockEye [276] | DET | ET,TI | ETH | C ↦ R | MVC
STM-100 | Sailfish [70] | DET | SE,SA | ETH | B ↦ R | MVC
STM-101 | DeFiRanger [290] | DET | SA | ETH | C ↦ R | VC
STM-102 | ESCORT [199] | DET | ML | ETH | B,Sp ↦ R | MVC
STM-103 | DefectChecker [93] | DET | SE | ETH | B ↦ R | MVC
STM-104 | Hu et al. [152] | DET | SA,ML | ETH | B ↦ R | MVC
STM-105 | HFContractFuzzer [108] | DET | F | nETH | S ↦ R | VC
STM-106 | Solidifier [54] | PR | FA | ETH | S ↦ R | VC
STM-107 | SafelyAdministrated [156] | PR | CS,ML | ETH | S ↦ S | MC
STM-108 | EXGEN [166] | DET | SE | aC | S ↦ R | VC
STM-109 | EtherProv [191] | DET | SA | ETH | S ↦ B,M | MVC
STM-110 | Abdellatif et al. [44] | PR | FA | aC | C ↦ R | MVC
STM-111 | Bai et al. [59] | PR | FA | aC | Sp ↦ R | MVC
STM-112 | Bigi et al. [65] | PR | FA | aC | Sp ↦ R | MVC
STM-113 | Findel [66] | PR | CS | aC | Sp ↦ S | VC
STM-114 | ContractLarva [112] | PR | CS | ETH | S,Sp ↦ S | MVC
STM-115 | Le et al. [182] | PR | FA | aC | S ↦ R | MVC
STM-116 | Solicitous [204] | PR | SA | ETH | S ↦ B | MVC
STM-117 | VeriSol [283] | PR | FA | EVMc | S ↦ R | VC
STM-118 | SmartCopy [115] | DET | F | ETH | B,A ↦ R | VC
STM-119 | WANA [277] | DET | SE | aC | B ↦ R | MVC
STM-120 | E-EVM [224] | EXP | ET | ETH | C ↦ R | MVC
STM-121 | AMEVulDetector [197] | DET | ML | aC | S ↦ R | MVC
STM-122 | Javadity [46] | PR | CS | EVMc | S ↦ S | MVC
STM-123 | Alqahtani et al. [51] | PR | SA | aC | S ↦ B | MVC
STM-124 | Bartoletti et al. [60] | PR | FA | nETH | S ↦ R | MVC
STM-125 | Beckert et al. [61] | PR | FA | nETH | S ↦ R | VC
STM-126 | SmartInspect [72] | EXP | SA | ETH | S,C ↦ R | MVC
STM-127 | CPN [109] | EXP | SA | ETH | S,B ↦ R | MVC
STM-128 | Hajdu et al. [142] | PR | FA,SA | ETH | S ↦ R | MVC
STM-129 | Kongmanee et al. [177] | PR | FA | ETH | Sp ↦ M | MVC
STM-130 | EVM* [203] | PR | TI | ETH | B,C ↦ R,Ac | MVC
STM-131 | OpenZeppelin Contracts [33] | PR | CS | ETH | S ↦ S | VC
STM-132 | MythX [31] | DET | many | ETH | B ↦ R | MVC
STM-133 | Contract Library [26] | DET+PR | CS | ETH | B ↦ R | MVC

† DET = detect; PR = prevent; EXP = exploration; SA = static analysis; SE = symb. execution; F = fuzzing; FA = form. analysis; A = ABI; ML = mach. learning; ET = ex. tracing; CS = code synthesis; TI = transaction interc.; S = source code; B = bytecode; Sp = specifications; C = chain data; As = assemb. code; R = report; Ac = action; E = exploits; M = metadata; ETH = Ethereum; nETH = non-Ethereum; EVMc = EVM-comp.; aC = any contr.; VC = vuln. contr.; MC = mal. contr.; MVC = mal. or vuln. contr.
Furthermore, to keep the data in this table up to date and handy, we deploy the Smart Contract Threat Mitigation Registry (STM Registry).

Selection Method of the Threat Mitigation Solutions: For this survey, we select 133 threat mitigation solutions, encompassing both academic research projects (e.g., Securify [271], Oyente [200]) and commercial non-academic efforts (e.g., OpenZeppelin Contracts [33], MythX [31]). To assure the quality of our study, we use the following four criteria for selecting threat mitigation solutions:
1. Implementation. We select only solutions that are implemented and evaluated, either as a proof-of-concept (PoC) prototype or in the form of a final product.
2. Publication. For academic research projects, we search for the papers published or accepted at a reputable peer-reviewed venue.
3. Impact. We select solutions that deliver specific improvements or other unique qualities compared to the state-of-the-art solutions.
4. Novelty. Not only do we consider the fact of improvement or impact, but we also consider the presence of technical novelty, i.e., a specific innovation that leads to the improvement.

In some cases, we include threat mitigation solutions that do not meet all four of the above criteria, such as the academic project Vandal [76], which has never been published at a peer-reviewed venue. We nevertheless include this work in our survey because it is widely adopted and cited.

Lessons Learned: There are more than 200 claims of smart contract threat mitigation solutions. Yet, our thorough manual examination reveals various problems associated with some of them. For example, we observed that sometimes two research papers refer to the same implementation (e.g., poster or journal extension articles). Therefore, manual scrutiny of each work is required. In the end, 133 instances have been selected to represent the body of smart contract threat mitigation solutions.
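To make the five-dimensional records of Table 4.2 concrete, the following minimal Python sketch encodes a classified solution and validates its short notations against Table 4.1. The record layout, the names `Classification` and `parse_row`, and the use of the ASCII arrow `->` as a stand-in for the data mapping symbol are illustrative assumptions, not part of the STM Registry.

```python
from collections import Counter
from dataclasses import dataclass

# Short notations from Table 4.1.
MODALITIES = {"PR", "DET", "EXP"}
CORE_METHODS = {"SA", "SE", "F", "FA", "ML", "ET", "CS", "TI"}
TARGETS = {"ETH", "EVMc", "aC", "nETH"}
INPUTS = {"S", "B", "A", "Sp", "C", "As"}
OUTPUTS = {"R", "S", "B", "Ac", "E", "M"}
THREAT_MODELS = {"VC", "MC", "MVC"}


@dataclass(frozen=True)
class Classification:
    """One classified solution (hypothetical record layout)."""
    stm_id: str
    name: str
    modalities: tuple
    methods: tuple
    target: str
    inputs: tuple
    outputs: tuple
    threat_model: str


def _check(values, allowed, dimension):
    """Reject any notation that Table 4.1 does not define."""
    unknown = set(values) - allowed
    if unknown:
        raise ValueError(f"unknown {dimension} notation: {sorted(unknown)}")
    return tuple(values)


def parse_row(stm_id, name, modality, method, target, mapping, threat_model):
    """Validate one table row; '->' stands in for the mapping symbol."""
    left, right = mapping.split("->")
    return Classification(
        stm_id=stm_id,
        name=name,
        modalities=_check(modality.split("+"), MODALITIES, "modality"),
        methods=_check(method.split(","), CORE_METHODS, "core method"),
        target=_check([target], TARGETS, "target")[0],
        inputs=_check([x.strip() for x in left.split(",")], INPUTS, "input"),
        outputs=_check([x.strip() for x in right.split(",")], OUTPUTS, "output"),
        threat_model=_check([threat_model], THREAT_MODELS, "threat model")[0],
    )


# Three rows transcribed from Table 4.2.
rows = [
    parse_row("STM-001", "Oyente", "DET", "SE", "ETH", "B -> R", "MVC"),
    parse_row("STM-003", "Securify", "DET+PR", "SA", "ETH", "B,S -> R,M", "MVC"),
    parse_row("STM-008", "Sereum", "DET", "ET,TI", "ETH", "C -> R", "VC"),
]

# Frequency analysis along one dimension; each part of a hybrid modality counts.
modality_counts = Counter(m for row in rows for m in row.modalities)
print(modality_counts)
```

A strict validator of this kind rejects entries such as the "many" core method listed for MythX, so a real registry would need an explicit escape hatch for such rows.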
Figure 4.3: Distribution of threat mitigation methods by four criteria: (a) defense modality, (b) core method, (c) targeted contracts, and (d) threat model.

4.4.1: Threat Mitigation Modalities

A threat mitigation modality is a philosophy that a smart contract threat mitigation method employs to address security issues of a smart contract. The threat mitigation solutions that employ the detection modality are designed to identify vulnerabilities in smart contracts. Some of them (e.g., Oyente [200], Securify [271], Vandal [76], and Mythril [215]) target several groups of vulnerabilities. Other detection-based threat mitigation solutions focus on specific classes of vulnerabilities, such as Sereum [240], which detects only reentrancy vulnerabilities (SWC-107 [37]). Another narrow-focused detection tool is VeriSmart [261], which detects arithmetic bugs only. Overall, we note that the detection solutions that focus on specific vulnerabilities tend to deliver improved detection rates compared to the solutions targeting multiple vulnerabilities.

The solutions belonging to the prevention modality validate some safety properties or rules. ZEUS [168] provides eight semantic rules that are used as part of an abstract assertion language for specifying safety properties for ensuring that a smart contract is free of certain vulnerabilities (e.g., reentrancy, unchecked send, integer overflow, etc.). Another salient example of a prevention solution is SmartPulse [263], which creates a linear temporal logic (LTL) language, called SmartLTL, for expressing temporal safety properties in smart contracts and enforcing them with the SmartPulse verifier.

The exploration modality solutions do not detect vulnerabilities or enforce safety properties; instead, they reveal previously concealed data that facilitates human-based or automated auditing of a smart contract.
Erays [310] is a tool for reverse-engineering smart contracts that converts the bytecode of a smart contract into pseudocode-like metadata. TxSpector [299] is another exploration solution, which is a transaction processing framework that identifies the executed attacks in smart contract execution traces.

Some threat mitigation solutions adhere to a hybrid detection+prevention modality, which means that they can detect existing vulnerabilities, as well as enforce security properties. Securify [271] not only checks the compliance with security patterns but also detects violations of patterns associated with specific vulnerabilities, such as reentrancy and restricted transfer. Another threat mitigation solution with a hybrid detection+prevention modality is ModCon [196], which is a smart contract testing tool that generates a list of states and transitions between these states, thereby enabling further identification of vulnerabilities and confirmation of security properties.

Fig. 4.3a shows the breakdown of the three defense modalities among the 133 threat mitigation solutions. As we can see, 81 (59.6%) of all the threat mitigation solutions employ the detection modality, 44 (32.4%) use the prevention modality, and the remaining 11 (8.1%) belong to the exploration modality. Some threat mitigation solutions exhibit a hybrid modality (e.g., DET+PR, detection combined with prevention), in which case we identify and assume the predominant modality for the statistical analysis, or we count both modalities in cases when it is impossible to determine the predominant one. This explains the 136 total modalities considered, even though they correspond to 133 threat mitigation solutions.

4.4.2: Core Methods

The core method describes how a threat mitigation solution addresses the security issues of a smart contract.
In other words, the core method defines the implementation approach, choice of algorithms, and internal data processing model of a threat mitigation solution. By scrutinizing all the 133 smart contract threat mitigation solutions, we identify eight distinct core methods: 1) static analysis; 2) symbolic execution; 3) fuzzing; 4) formal analysis; 5) machine learning; 6) execution tracing; 7) code synthesis; and 8) transaction interception.

Static analysis solutions extract data from smart contracts in order to detect vulnerabilities or confirm safety properties. Most static analysis solutions adhere to the detection modality (e.g., Securify [271], S-GRAM [194], MadMax [133], SmartCheck [267]). However, some static analysis solutions enforce policies instead of detecting vulnerabilities (e.g., solc-verify [141], BitML [58], GasTap [49], Solicitous [204]). Moreover, we notice that the static analysis core method is often coupled with some other methods. Solidity* [64], Amani et al. [52], Ethor [248], and sGUARD [220] use static analysis together with formal analysis. Also, static analysis is often used together with the symbolic execution core method, as we can see in HoneyBadger [270], MPro [304], SmarTest [260], and Sailfish [70].

Symbolic execution methods execute a smart contract with symbolic parameters instead of real ones in order to draw conclusions regarding some security properties of smart contracts (e.g., the range of values that make a certain condition true). Oyente [200], Mythril [215], Maian [223], Manticore [214], ZEUS [168], Osiris [269], and teEther [178] are popular solutions employing the symbolic execution core method. Similar to static analysis, symbolic execution is also often coupled with other core methods. VerX [230] and Solar [116] use symbolic execution to guide code synthesis. The solution by Hu et al. [152] takes advantage of both symbolic execution and machine learning for detecting smart contract vulnerabilities.
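The idea of executing a contract with symbolic parameters can be illustrated with a toy Python sketch. It is a simplification under stated assumptions and does not reproduce the design of any tool named above: the "contract" is a hypothetical two-branch deposit routine over 8-bit integers, expressions are nested tuples, and an exhaustive search over the 8-bit domain stands in for the SMT solver that real symbolic execution tools use.

```python
from itertools import product

UINT8 = 256  # toy word size; the EVM uses 256-bit words

# Expressions are nested tuples: ("var", name), ("const", n), or (op, lhs, rhs).
OPS = {
    "+": lambda a, b: (a + b) % UINT8,  # wraps around like unchecked arithmetic
    "<": lambda a, b: int(a < b),
    "==": lambda a, b: int(a == b),
}


def evaluate(expr, env):
    """Evaluate an expression tree under a concrete assignment."""
    kind = expr[0]
    if kind == "var":
        return env[expr[1]]
    if kind == "const":
        return expr[1]
    return OPS[kind](evaluate(expr[1], env), evaluate(expr[2], env))


def explore(node, path=()):
    """Fork at every branch and yield (leaf label, path constraints)."""
    if node[0] == "leaf":
        yield node[1], path
        return
    _, cond, then_node, else_node = node
    yield from explore(then_node, path + ((cond, True),))
    yield from explore(else_node, path + ((cond, False),))


def solve(constraints, variables):
    """Brute-force stand-in for an SMT solver over the uint8 domain."""
    for values in product(range(UINT8), repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(bool(evaluate(c, env)) == want for c, want in constraints):
            return env
    return None  # the path is infeasible


# Hypothetical contract logic: reject zero deposits, then add to the balance.
balance, deposit = ("var", "balance"), ("var", "deposit")
program = (
    "if", ("==", deposit, ("const", 0)),
    ("leaf", "revert: zero deposit"),
    ("if", ("<", ("+", balance, deposit), balance),
     ("leaf", "BUG: balance overflow"),
     ("leaf", "ok")),
)

# For each path, ask the solver for concrete inputs that reach its leaf.
report = {label: solve(path, ["balance", "deposit"])
          for label, path in explore(program)}
for label, witness in report.items():
    print(label, "->", witness)
```

Each leaf is reported together with a concrete witness that reaches it, which mirrors the human-readable report produced by the symbolic execution workflow; the brute-force solver is only viable here because the toy domain is tiny.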
Fuzzing methods perform smart contract testing by iteratively generating test cases that are likely to reveal vulnerabilities. ContractFuzzer [163] uses the application binary interface (ABI) of the smart contract to facilitate the generation of fuzzing inputs. Harvey [291] is a smart contract tester based on greybox fuzzing, which is a middle-ground solution between the absence of code analysis (blackbox fuzzing) and full code execution (whitebox fuzzing); specifically, greybox fuzzing assumes a lightweight (compared to symbolic execution) analysis of the code execution paths. Confuzzius [119] is a smart contract fuzzer that uses a combination of genetic algorithms and constraint solving. Overall, fuzzing threat mitigation solutions utilize a wide variety of predictive methods for balancing accuracy and performance.

Formal analysis methods convert a smart contract into a formal representation and run a solver over this representation to prove or disprove some security properties. Most solutions employing the formal analysis core method belong to either the prevention defense modality (e.g., Lolisa [294], Model-Checking [217], Li et al. [186], Solidifier [54], VeriSol [283]) or the exploration modality (e.g., KEVM [148], Grishchenko et al. [136]). However, SeRIF [83], whose primary purpose is defense against reentrancy, demonstrates that formal analysis can also be used for targeting vulnerabilities.

Machine learning methods extract features from smart contracts and train models for detecting vulnerabilities. The smart contract threat mitigation solutions utilizing the machine learning core method are ContractWard [281], ESCORT [199], AMEVulDetector [197], and the solution by Momeni et al. [212]. In Section 4.7.2, we conduct an in-depth discussion about the evolutionary perspective of machine learning in smart contract security.
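Returning to the greybox fuzzing idea described above, the following Python sketch shows the coverage feedback loop in its simplest form: inputs that reach a new branch are kept as seeds for further mutation. Everything here is a hypothetical stand-in (the `target` function plays the role of an instrumented contract function, and the mutation is a deterministic byte overwrite rather than the randomized or genetic strategies used by real fuzzers).

```python
from typing import Iterator, List, Optional, Set, Tuple


class Bug(Exception):
    """Signals that the toy target reached its vulnerable path."""


def target(x: int) -> Set[str]:
    """Hypothetical instrumented function: returns the branch IDs it executed."""
    cov = {"entry"}
    if x & 0xFF == 0x42:
        cov.add("low_byte_ok")
        if (x >> 8) & 0xFF == 0x13:
            raise Bug(f"vulnerable path reached with input {x:#06x}")
    return cov


def mutants(seed: int) -> Iterator[int]:
    """Deterministic mutation: overwrite each of the two input bytes in turn."""
    for pos in (0, 8):
        for val in range(256):
            yield (seed & ~(0xFF << pos)) | (val << pos)


def fuzz(initial: int, budget: int = 5000) -> Tuple[Optional[int], int]:
    """Return (crashing input or None, number of executions used)."""
    corpus: List[int] = [initial]
    seen: Set[str] = set()
    executions = 0
    i = 0
    while i < len(corpus) and executions < budget:
        for candidate in mutants(corpus[i]):
            executions += 1
            try:
                cov = target(candidate)
            except Bug:
                return candidate, executions
            if not cov <= seen:  # feedback: this input covered a new branch
                seen |= cov
                if candidate not in corpus:
                    corpus.append(candidate)  # keep it as a seed
            if executions >= budget:
                break
        i += 1
    return None, executions


crash, executions = fuzz(0)
print(hex(crash) if crash is not None else None, executions)
```

In this toy setting the feedback loop needs well under a thousand executions, whereas blind enumeration of the two-byte input space could need up to 65,536; the gap is what the coverage signal buys.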
Execution tracing and transaction interception core methods constitute the transaction-based methods of smart contract threat mitigation. The execution tracing methods examine the runtime traces of the actual transactions submitted to a smart contract in order to detect vulnerabilities, verify safety properties, or facilitate manual auditing. TokenScope [96], EthScope [289], DEFIER [264], Horus [120], BlockEye [276], and E-EVM [224] are instances of “pure” execution tracing methods (i.e., not combined with other methods). However, we observe that execution tracing is often combined with other core methods. Sereum [240] and ECFChecker [137] combine execution tracing with transaction interception, while TxSpector [299] and the Ponzi scheme detection solution by Chen et al. [97] utilize execution tracing combined with static analysis.

Code synthesis threat mitigation solutions aim at generating vulnerability-free smart contract code resistant to attacks. Hydra [74] is a framework that generates bug bounties for smart contracts using the N-of-N version programming (NNVP) principle. FSolidM [205] is a framework for designing secure smart contracts as finite state machines (FSMs) and converting them into Solidity code. Solythesis [183] is a source-to-source Solidity compiler that instruments the input source code with additional instructions for validation of security-sensitive invariants.

Transaction interception solutions dynamically observe the transaction pool of a blockchain node in order to prevent the execution of malicious or unsafe transactions. These solutions are represented by SODA [94] and EVM* [203].

Fig. 4.3b shows the distribution of the eight core methods among the 133 threat mitigation solutions.
Specifically, we found 49 (35.3%) static analysis tools, 21 (15.1%) symbolic execution methods, 15 (10.8%) fuzzing tools, 22 (15.8%) formal analysis tools, 5 (3.6%) machine learning solutions, 11 (7.9%) execution tracing tools, 13 (9.4%) code synthesis tools, and 3 (2.2%) transaction interceptors. Notably, some threat mitigation solutions employ a combination of the aforementioned core methods; in this case, we recognize all the methods involved in Table 4.2, yet for the purpose of counting and frequency analysis, we reduce the combination of core methods to the predominant core method, if there is one. If it is impossible to identify the predominant core method, we count all of them, which explains why the total count of core method instances (139) slightly exceeds the number of threat mitigation solutions surveyed in this work (133).

4.4.3: Targeted Contracts

Each of the threat mitigation solutions assumes a type of targeted smart contract. Some solutions target general groups of smart contracts, such as Ethereum or even all possible contracts, while some other solutions may target a single specific smart contract instance. Oyente [200], Mythril [215], Securify [271], Sereum [240], Vandal [76], OpenZeppelin Contracts [33], MythX [31], Contract Library [26], and many other popular threat mitigation solutions are strictly Ethereum-based. Some solutions are EVM-compatible, which means that they are compatible with, but not limited to, Ethereum smart contracts. SODA [94], VeriSol [283], and Javadity [46] are EVM-compatible solutions. Some solutions are universal in terms of the scope of targeted contracts; although they might not support every type of smart contract (e.g., the ones that are not Turing-complete), they do not limit their scope to a specific group either. Such solutions are ModCon [196], Seraph [295], SeRIF [83], EXGEN [166], and the information flow control solution by Cecchetti et al. [84].
Some threat mitigation solutions target a specific non-Ethereum platform. BitML [58] targets Bitcoin smart contract overlays, EOSAFE [147] targets the smart contracts on the EOS blockchain [27], and HFContractFuzzer [108] targets the Hyperledger Fabric platform [53].

To make sense of this diverse spectrum, we group the targeted smart contracts into four types. Fig. 4.3c shows the distribution of different groups of targeted contracts among the threat mitigation methods. Specifically, we discover that as many as 111 (83.5%) solutions target Ethereum contracts, 13 (9.8%) are suitable for any contract (including Ethereum, but not specifying it), 5 (3.8%) aim for some non-Ethereum contracts (e.g., Hyperledger Fabric), and 4 (3.0%) target EVM-compatible contracts (e.g., Polygon [34], RSK [35]).

4.4.4: Data Mapping

Next, we explore the design-specified inputs and outputs of each of the threat mitigation solutions. Most smart contract threat mitigation solutions assume a smart contract as an input, either as bytecode, source code, or as part of the chain data. Oyente [200], Mythril [215], Vandal [76], ZEUS [168], teEther [178], and Osiris [269] are solutions that take bytecode as a smart contract input. Hydra [74], S-GRAM [194], SmartCheck [267], VerX [230], VeriSmart [261], and SeRIF [83] are solutions that assume source code as the input. Sereum [240], ECFChecker [137], TokenScope [96], EasyFlow [127], TxSpector [299], and EthScope [289] are the threat mitigation solutions that read smart contract information from the chain data, i.e., a stored copy of the blockchain.

Some threat mitigation solutions use a combination of bytecode and source code as an input, e.g., Securify [271] (where source code is optional), SAFEVM [48], GasTap [49], SafePay [188], and CPN [109]. Other solutions, in addition to a smart contract, also take a set of manual specifications as an input, as we see in FSolidM [205], Model-Checking [217], VeriSolid [206], solc-verify [141], BitML [58], SmartPulse [263], ESCORT [199], and ContractLarva [112]. Moreover, a smart contract is not always used as an input of a threat mitigation solution. For instance, Flint [249], Solar [116], EVMFuzzer [126], Findel [66], and the solution by Kongmanee et al. [177] assume a set of specifications as the only input.

Most threat mitigation solutions produce a human-readable report as an output, e.g., Oyente [200], Mythril [215], Maian [223], Manticore [214], ZEUS [168], and Sereum [240]. However, some solutions produce machine-readable metadata (e.g., a formal model) in lieu of a human-readable report, which can be observed in Erays [310], the solution by Grishchenko et al. [136], the solution by Momeni et al. [212], KSolidity [165], and the solution by Kongmanee et al. [177]. Table 4.2 shows that the majority of the threat mitigation solutions (82.7%) produce a human-readable report as an output, and for 78.2% of the solutions, the security report is the only output. Notably, only 4 (3.0%) of all the threat mitigation solutions result in an action (e.g., stopping a malicious transaction), which is indicative of the predominance of the static methodology in smart contract defense, which is further discussed in Section 4.7.1.

One important property of data mapping is that it often provides fine-grained information that cannot be inferred from the workflow of the corresponding core method. For example, the workflows of smart contract threat mitigation solutions often specify “smart contract” as one of the inputs. However, a smart contract can have several representations: source code, bytecode, deployed address, etc. In this work, we extract the specific meaning of the “smart contract” and represent it accordingly in the data mapping.
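One way to picture a data mapping entry is as a small record that names the concrete input representation and output type of each solution. The Python sketch below is only an illustration: the type names (`Input`, `Output`, `DataMapping`) are hypothetical, and the five entries record just what this section states about those tools, so they may be incomplete relative to Table 4.2.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List


class Input(Enum):
    BYTECODE = "bytecode"
    SOURCE_CODE = "source code"
    CHAIN_DATA = "chain data"
    SPECIFICATION = "specification"


class Output(Enum):
    REPORT = "human-readable report"
    METADATA = "machine-readable metadata"
    ACTION = "action"


@dataclass(frozen=True)
class DataMapping:
    solution: str
    inputs: FrozenSet[Input]
    outputs: FrozenSet[Output]


def entry(solution: str, inp: Input, out: Output) -> DataMapping:
    """Helper for single-input, single-output entries."""
    return DataMapping(solution, frozenset({inp}), frozenset({out}))


# Entries limited to what the surrounding text states explicitly.
MAPPINGS: List[DataMapping] = [
    entry("Oyente", Input.BYTECODE, Output.REPORT),
    entry("Mythril", Input.BYTECODE, Output.REPORT),
    entry("ZEUS", Input.BYTECODE, Output.REPORT),
    entry("Sereum", Input.CHAIN_DATA, Output.REPORT),
    entry("Erays", Input.BYTECODE, Output.METADATA),
]


def with_input(kind: Input) -> List[str]:
    """Names of solutions that accept the given contract representation."""
    return [m.solution for m in MAPPINGS if kind in m.inputs]


print(with_input(Input.BYTECODE))
```

Making the representation explicit in this way is what lets "smart contract" in a workflow diagram be resolved to bytecode, source code, or chain data for a particular tool.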
4.4.5: Threat Model

Finally, we describe all the threat mitigation solutions through the general description of their assumed threat models. In other words, the threat model specifies the source of the threat, identifies the victim(s), and defines the intent. We generalize all the threat models by subdividing them into three major groups: victim contract, malicious contract, and hybrid malicious or victim contract. Sereum [240], teEther [178], Hydra [74], Osiris [269], SODA [94], ÆGIS [117], EVMPatch [241], SeRIF [83], and OpenZeppelin Contracts [33] are threat mitigation solutions with the vulnerable contract threat model. Solutions with malicious contract threat models are the Ethereum honeypot detector HoneyBadger [270], GASPER [95], and the social engineering attack detector by Ivanov et al. [158]. Most threat mitigation solutions, however, are threat vector agnostic, i.e., they are capable of defending against malicious smart contracts, as well as protecting vulnerable contracts. Securify [271], Oyente [200], ZEUS [168], SmartCheck [267], SmartPulse [263], SmarTest [260], and MythX [31] are solutions with a bidirectional vector (malicious or victim contract) threat model.

Fig. 4.3d shows the breakdown of different threat models among the threat mitigation methods. We find that 41 (30.8%) methods assume vulnerable contracts, 7 (5.3%) imply the malicious contract model, and 85 (63.9%) assume both these vectors. As we can see, the pure malicious smart contract threat model is underrepresented among the threat mitigation solutions, which suggests that attacks on smart contracts are generally perceived as more important than the cases of malicious contracts attacking users. This finding is corroborated by the study by Zhou et al.
[309], which confirms that the honeypot vulnerability, associated with the malicious smart contract modality, ranks fourth in popularity after call injection, call-after-destruct, and airdrop-hunting vulnerabilities, which all assume the victim smart contract threat model.

4.5: Design Workflows of Threat Mitigation Methods

In this section, we scrutinize the designs of the threat mitigation solutions by synthesizing the uniform workflows for all the eight core methods, i.e., static analysis (Section 4.5.1), symbolic execution (Section 4.5.2), fuzzing (Section 4.5.3), formal analysis (Section 4.5.4), machine learning (Section 4.5.5), execution tracing (Section 4.5.6), code synthesis (Section 4.5.7), and transaction interception (Section 4.5.8).

Figs. 4.4-4.11 depict the workflows of the eight core methods. Each of these eight workflows utilizes a set of uniform elements: modules, data entities, flows (arrows), and environments. This set of elements allows us to concisely summarize and demystify the wide variety of implementations of smart contract threat mitigation solutions. The modules (green rectangles) represent items that do something, i.e., algorithms, data filters, etc. Modules can be mandatory, i.e., pertaining to any solution with the given core method (solid borders) or optional/augmenting, i.e., implemented by some solutions employing the given core method (dashed borders). The data entities (blue rectangles) represent pieces of data or abstract data structures. The flows, depicted as arrows, show data or execution transitions. Environments (red rectangles) allow grouping of certain elements into single logical modules.

Lessons Learned: By manually examining the workflows of all the 133 threat mitigation solutions, we learned that every component exhibits a certain degree of generalization.
For example, an element called “smart contract” is a more general form of what could also be denoted as “source code” or “bytecode”. Thus, one of the challenges we face when synthesizing the workflows is to equate the generalizations of similar workflow elements.

4.5.1: Static Analysis Workflow

The static analysis methods apply automated data filtering and syntax analysis techniques to the input. Static analysis methods detect vulnerabilities by extracting information (facts) from the source code or bytecode of a smart contract. Fig. 4.4 shows the general workflow of static analysis methods. The static analysis methods take bytecode (e.g., Erays [310], Vandal [76], MadMax [133]) or source code (e.g., S-GRAM [194], SmartCheck [267], Slither [114]) of a smart contract as an input, while some solutions also analyze previously executed transactions gathered from the chain data (e.g., EasyFlow [127], Zhou et al. [309]). A large part of the static analysis process is devoted to constructing a model in the form of one or a set of abstract data structures (ADS) that constitute a suitable (and efficient) input for the static analyzer. The control flow graph (CFG) is a popular type of such an ADS, which is utilized by Securify [271], Erays [310], and Vandal [76], to name a few. The built model, data (in the form of some intermediate representation, e.g., a graph), and a set of pre-defined or user-specified specifications are then directed to the static analyzer, which produces a human-readable security assessment report.

Figure 4.4: Workflow of the static analysis core method.

4.5.2: Symbolic Execution Workflow

Symbolic execution methods [174] simulate the execution of a smart contract in a way that the actual inputs are replaced with special traceable symbolic parameters. Fig. 4.5 depicts the general workflow of symbolic execution methodology. These methods use smart contract bytecode and a set of specifications as an input.
In some cases, the specifications are part of the tool (e.g., Oyente [200], Mythril [215], teEther [178], Osiris [269]); in other cases, the specifications are expected to be provided by the user (e.g., Maian [223]). Symbolic execution methods execute smart contracts with traceable (symbolic) parameters in lieu of actual inputs, which makes it possible to prove or disprove some presumptions about smart contracts. Specifically, symbolic execution can answer questions about the possibility of execution of a certain block of code (reachability), the ability to invoke a certain execution path, or the ability to satisfy certain constraints. Similar to static analysis, symbolic execution often involves building a search-efficient data structure, such as a CFG, as well as extracting facts and features from the input. However, unlike static analysis, the symbolic execution methods run the code instead of analyzing its syntax. All the existing symbolic execution solutions surveyed in this work employ the Z3 [43] SMT solver.

Some symbolic execution solutions augment the basic design with additional features. Oyente [200], teEther [178], SafePay [188], and Artemis [274] process the smart contract to build a CFG. Another augmentation observed in symbolic execution solutions is the production of exploits (sample inputs revealing vulnerabilities), as can be seen in teEther [178] and EthBMC [122]. Moreover, some symbolic execution methods perform a preliminary analysis (preprocessing) for generating guidance data facilitating the symbolic execution. SmarTest [260] guides symbolic execution with a language-based model in order to achieve higher accuracy and reduce the rate of timeouts.

Figure 4.5: Workflow of the symbolic execution core method.

4.5.3: Fuzzing Workflow

Fuzzing methods use various techniques for generating subsets of test inputs that could reveal vulnerable execution paths in smart contracts. Fig.
4.6 shows how the fuzzing core method works in smart contracts. Fuzzing tools perform iterative testing of a smart contract by generating test cases and adjusting these cases via a feedback loop. The execution of smart contracts is performed by the fuzzing engine, which is either a stand-alone code interpreter or an instrumented (i.e., modified with custom code) blockchain virtual machine.

Fuzzing techniques make it possible to address two notorious problems associated with software testing: input ranges and path explosion. Even a single parameter of a smart contract function might exhibit a virtually endless range of actual values, e.g., the 256-bit integer in Ethereum, so the goal of a fuzzing method is to pick input samples that are likely to reveal vulnerabilities. The path explosion problem occurs when the user needs to call a sequence of transactions. Even if the exact arguments are known in advance (which is not always the case), the number of possible orders of transactions and other variable scenarios “explodes” as the number of transactions in the sequence increases, which necessitates the use of special techniques, such as pruning, by the fuzzing threat mitigation methods.

Similar to symbolic execution, some fuzzing methods also utilize guidance data for facilitating test case generation. Confuzzius [119] performs preprocessing in the form of taint analysis in order to guide the fuzzing engine. Also, in addition to identifying a problem in a smart contract, it is common for a fuzzing solution to deliver proof of a vulnerability in the form of a sample malicious transaction or a series thereof, as we see in ReGuard [193], SoliAudit [189], and EthPloit [303].

Figure 4.6: Workflow of the fuzzing core method.

4.5.4: Formal Analysis Workflow

Formal analysis methods convert smart contracts into formal representations and use automated provers for deriving deterministic conclusions about the security properties of these smart contracts. Fig.
4.7 depicts the workflow of the smart contract formal analysis core method. One important component of a formal analysis solution is the fact extractor, which converts a smart contract into a formal representation, usually in the form of a domain-specific language (DSL). The formal representation is then delivered to an automated prover, such as Tamarin [208], along with some specifications representing vulnerabilities or security properties. The prover then juxtaposes the extracted facts with the provided properties to deliver a set of conclusions, which include compliance and violation statements. The output of a formal analysis solution may be supplemented with additional outputs. Specifically, some formal analysis solutions include the intermediate results in the report, e.g., extracted semantics, as seen in KEVM [148]. Also, some solutions not only prove existing theorems, but also produce theorems based on certain specifications, as we can see in Lolisa [294].

Figure 4.7: Workflow of the formal analysis core method.

4.5.5: Machine Learning Workflow

Machine learning methods extract features from smart contracts or smart contract transactions and train models for classifying smart contracts based on the types of vulnerabilities discovered in them. Fig. 4.8 shows the general workflow of smart contract machine learning-based threat mitigation solutions. We discover that all the existing machine learning methods of smart contract threat mitigation use supervised models, requiring a subset of labeled smart contract samples. The workflow of a machine learning approach requires a data preprocessing (preparation) step, which includes building a “clean” (uniform) dataset, creating training and testing samples, and performing manual labeling (or reusing existing labels). The primary goal of the training step is to determine the parameters of a chosen model. The goal of the testing step is to verify the robustness of the model candidate.
Once the model is trained and properly tested (e.g., using K-fold cross-validation, as observed in the evaluation part of SafelyAdministrated [156]), the model can detect vulnerabilities or confirm the safety of the unlabeled contracts or smart contract transactions.

Feature extraction and model building are two major characteristics that describe machine learning threat mitigation solutions. Momeni et al. [212] deliver an ML model for detecting vulnerability patterns in smart contracts, using an abstract syntax tree (AST) and control flow graph (CFG) for feature extraction. ContractWard [281] approaches ML-based detection of vulnerabilities in smart contracts based on bigram features. ESCORT [199] is a machine learning smart contract threat mitigation solution based on a deep neural network (DNN) with a semantic-based feature extractor. AMEVulDetector [197] builds a semantic graph from the source code and applies deep learning to building the vulnerability detection model.

Figure 4.8: Workflow of the machine learning core method.

4.5.6: Execution Tracing Workflow

Execution tracing methods assess the security properties of smart contracts by exploring the execution of transactions sent to a given smart contract or an externally owned account (in cases when the Ethereum platform is targeted; see footnote 38). Fig. 4.9 depicts the workflow of execution tracing methods. These solutions use transactions as their input. The transactions are first filtered to keep only the ones associated with a specific account, a specific smart contract, or a concrete action (e.g., an attack). Next, the filtered transactions are executed by the instrumented blockchain virtual machine (e.g., EVM). The instrumented code passively observes the execution of the given transactions and produces a special data structure called execution traces.
Formally, an execution trace is a path in a control flow graph (CFG) of a smart contract that describes the execution of a specific transaction (or a sequence of transactions). The execution traces are then analyzed to produce a human-readable report.

EthScope [289] is a security analysis framework that detects suspicious smart contracts in three steps: collecting related blockchain states, replaying transactions, and reporting data for manual introspective analysis. Perez et al. [228] propose an automated execution tracing framework for Ethereum for detecting both vulnerabilities and actual attacks exploiting these vulnerabilities. DEFIER [264] is a tool for the investigation of attack instances associated with Ethereum decentralized applications (DApps), which use Ethereum transaction tracing. Horus [120] is an execution tracing framework for the detection and investigation of attacks on smart contracts that use logic-based and graph-based analyses of Ethereum transactions. Another execution tracing solution is E-EVM [224] that performs emulation and visualization of smart contracts.

Footnote 38: Ethereum has two types of accounts: smart contract account and externally owned account (EOA). Both EOAs and smart contract accounts can be referenced by their 160-bit public addresses.

Figure 4.9: Workflow of the execution tracing core method.

4.5.7: Code Synthesis Workflow

The code synthesis methods produce the source code or bytecode of a smart contract with or without a template. The objective of code synthesis methods is to produce a smart contract resistant to specific attacks or vulnerabilities. Fig. 4.10 shows the workflow of the code synthesis core method. We observe that some code synthesis solutions produce code from specifications only; others require a template to apply specifications to (e.g., ContractLarva [112]). Custom source code annotations are an example of specifications, as we can see in Cecchetti et al. [84].
Some code synthesis solutions utilize language BNF grammars or custom code libraries (e.g., SafelyAdministrated [156] and OpenZeppelin Contracts [33]) to aid the process. The result of code synthesis is the source code or bytecode of a smart contract with specific security properties. In addition, some threat mitigation solutions utilize the code synthesis core method to patch vulnerable smart contracts on the bytecode level (e.g., SmartShield [305]).

Figure 4.10: Workflow of the code synthesis core method.

4.5.8: Transaction Interception Workflow

A blockchain network is a set of peer-to-peer (P2P) nodes. In this type of workflow, we assume that each node sustains the entire copy of the blockchain, i.e., we assume that the blockchain node is a full node. Furthermore, each node has a transaction pool, which is a queue of transaction candidates for addition to the blockchain. Transaction interception methods are dynamic approaches that read submitted transactions from the transaction pool of the blockchain node and prevent the node from including unsafe transactions in the blockchain. Fig. 4.11 shows the general workflow of the transaction interception core method. Transaction interception methods employ blockchain P2P node instrumentation, which means that custom code is injected into the routines responsible for transaction ordering or smart contract execution. All the transaction interception solutions surveyed in this work also produce a human-readable report of their operation, which is reasonable: deleting transactions from the pool is a deep intervention into the blockchain network protocol, so it must leave a log of the action.

Transaction interception solutions, although not numerous, exhibit a diverse spectrum of approaches. SODA [94] is a transaction-interception framework for EVM-compatible platforms that allows users to develop custom apps for dynamic defense against attacks.
ÆGIS [117] is another transaction interception solution that uses a committee of voting security experts to create and approve attack patterns that steer transaction interception by instrumented nodes. Another transaction interception solution is EVM* [203], which monitors overflows and timestamp bugs.

Figure 4.11: Workflow of the transaction interception core method.

4.6: Vulnerability Coverage

In this section, we compare threat mitigation solutions from the perspective of their ability to address the known smart contract vulnerabilities. First, we select all the solutions that explicitly declare the list of vulnerabilities they cover, 38 total, and translate the information about these vulnerabilities into the model adopted by the popular SWC Registry [42]. Then we build the vulnerability map, presented in Table 4.3, which juxtaposes the threat mitigation methods by their ability to address the 37 known smart contract vulnerabilities. The first column of the table has the names of the threat mitigation solutions and corresponding references; if the names are not available, we use the authors instead. The next 37 columns each correspond to the numbered SWC Registry vulnerabilities. Thus, the table constitutes a compact map showing which vulnerabilities are supported (i.e., defended against), which ones are partially supported, and which ones are not supported at all for each of the 38 threat mitigation methods.

The challenge of this approach lies in the fact that different threat mitigation solutions refer to the same vulnerabilities using different names. Moreover, some solutions refer to a group of SWC vulnerabilities as a single weakness. For instance, the SWC-100 [36] and SWC-108 [38] vulnerabilities are often treated as a single vulnerability called the “private modifier”, as we can see in SmartCheck [267] and in SolidityCheck [301]. Some other solutions do the opposite: they break down a single SWC vulnerability into several fine-grained subgroups. Rodler et al. [240] declare the coverage of three vulnerabilities, which correspond to the single reentrancy vulnerability in the SWC Registry, viz., SWC-107 [37].

Table 4.3 unambiguously demonstrates that different vulnerabilities receive unequal attention from different threat mitigation solutions. For example, 24 solutions declare defense against reentrancy (SWC-107 [37]), whereas none of the solutions declare defense against shadowing the state variables (SWC-119 [39]) and RTL-override control character (SWC-130 [41]). Remarkably, we observe that both the vulnerabilities receiving close attention from the existing threat mitigation solutions and the ones overlooked by these solutions are often particularly challenging to pinpoint.

Lessons Learned: By studying the vulnerability coverage by smart contract threat mitigation solutions, we discovered that some vulnerabilities are covered by multiple threat mitigation solutions. In contrast, many vulnerabilities are not covered by any solutions.

4.7: Trends and Perspectives

In this section, we discuss the emerging trends in smart contract threat mitigation (Section 4.7.1, Section 4.7.2, Section 4.7.3), the overlooked types of smart contracts (Section 4.7.4), and the necessity for data-driven studies in smart contract security (Section 4.7.5). To avoid speculations and opinion-based statements, we only make inferences based on our survey data and other strong evidence.

Lessons Learned: By exploring trends and perspectives associated with smart contract threat mitigation solutions, we discovered that there is substantial room for future work despite the abundance of existing studies.

4.7.1: Dynamic Transaction Interception

Most smart contract threat mitigation solutions use predominantly static code-based detection approaches. However, we note that the focus of the research community is shifting in three major directions: 1.
static approaches are shifting into the dynamic paradigm; 2. the code-based methods are shifting into the transaction-based ones; and 3. the detection methods are shifting towards verification. Following these observations, it would be reasonable to suppose that the next generation of smart contract threat mitigation solutions will likely continue exploring the primarily overlooked area of vulnerability-agnostic dynamic transaction interception. We believe that there are two significant reasons these methods are particularly promising: they are blockchain state-aware and can address zero-day attacks.

Table 4.3: Summary of the defense tools against smart contract vulnerabilities. [Per-cell support marks not recoverable from the extracted text. Columns: vulnerabilities by SWC Registry number, 100 through 136 (registry available at https://swcregistry.io/ and https://github.com/SmartContractSecurity/SWC-registry). Rows, 38 threat mitigation solutions: Oyente [200], Securify [271], Mythril [215], Sereum [240], Vandal [76], sGuard [220], ZEUS [168], ConFuzzius [119], VeriSmart [261], SmarTest [260], Maian [223], ECFChecker [137], Osiris [269], FSolidM [205], ContractFuzzer [163], MadMax [133], SmartCheck [267], ReGuard [193], ILF [146], NPChecker [280], EasyFlow [127], Vultron [278], SolidityCheck [301], GasFuzz [202], SolAnalyzer [47], GasTap [49], Momeni et al. [212], Harvey [291], sFuzz [221], Artemis [274], EthPloit [303], EthScope [289], RA [100], SeRIF [83], Huang et al. [154], DefectChecker [93], ExGen [166], MythX [31]. Legend: full support; partial support; no support.]

To demonstrate the blockchain state awareness, consider the Ethereum smart contract Foo in Fig. 4.12a, which transfers cryptocurrency funds to a smart contract Bar (Fig. 4.12b). Bar is deployed on Ethereum Mainnet (footnote 39), but not on Ropsten testnet (footnote 40). Moreover, Bar does not have any payable functions (footnote 41), and therefore it cannot accept incoming Ether. As a result, the transfer in line 6 (Fig. 4.12a) will fail, reverting the entire transaction, but only on Mainnet, not on Ropsten. Even if the states of all the variables of contract Foo on Ropsten are identical to their counterparts on Mainnet, the behavior of the withdraw() function will be different. This example demonstrates that the state of blockchain is an important factor that determines the outcome of smart contract execution.
Unlike the static ones, dynamic transaction interception methods consider the current state of the blockchain, thereby preventing situations such as those illustrated in this example. A recent study by Zhou et al. [309] reveals that novel (zero-day) smart contract attacks con- stantly appear on Ethereum. This trend creates a major challenge: how to defend against attacks we do not yet know about? One way to address this problem is to utilize the prevention methods that enforce security properties instead of searching for flaws, attacks, and vulnerabilities. Unfor- tunately, the security properties in static prevention solutions are tightly associated with known attacks and vulnerabilities. ECFChecker (STM-009) [137] is a prevention method that verifies the “callback-free” property that ensures the safety of a smart contract from the family of reentrancy vulnerabilities. These properties, however, might not be universal enough to protect the smart con- tract from new vulnerabilities. One possible way to fill this gap is to verify the properties associated with expected outcomes of smart contract functions instead of vulnerability-related properties. 4.7.2: AI-driven Security We identify another recent salient trend in smart contract threat mitigation solutions — AI-driven approaches involving machine learning. There are two major reasons why these approaches are capable of making a significant contribution: they allow to embrace the expressiveness of modern 39 Ethereum Mainnet is the major production Ethereum network supporting the Ether cryptocurrency. 40 Testnets are alternative blockchain networks utilized for development and experiments. Testnets normally execute the same protocols as production networks, but the test cryptocurrency on the testnet does not have any market value. 41 A payable function allows to transfer (deposit) cryptocurrency to the smart contract. 
106 1 contract Foo { 1 contract Bar { 2 function deposit () public payable {} 2 constructor () public { } 3 function withdraw () public { 3 } 4 address admin = 5 0 xEc125A03C6F9E75BEB1A420e94d655B2f1352584 ; 6 payable (admin ). transfer (1000000000 wei) ; 7 payable (msg. sender ). transfer ( address ( this). balance ); 8 } 9 } (a) smart contract Foo (b) smart contract Bar Figure 4.12: A pair of smart contracts demonstrating the importance of the block state. smart contracts, and also these approaches have been proven successful in securing other domains of computing [25, 63]. The expressiveness of smart contracts limits the capacity of static and formal analytical methods. Most modern smart contracts are Turing-complete, which allows them to implement sophisticated algorithms using high-level programming languages, such as Solidity and Rust. However, the smart contract expressiveness is a double-edged sword, as it creates a virtually infinite number of coding possibilities, which are very hard to embrace by static methods that predominantly rely upon patterns. Although machine learning methods also rely upon some patterns, recent machine learning models (e.g., deep neural network based) could explore much higher-dimensional feature spaces than static approaches. In the past few years, we have been observing a growing trend of using AI and machine learning for security purposes, such as malware detection [244]. Although the machine learning methods for smart contract threat mitigation have not yet gained considerable popularity, the flexibility and universality of these methods will likely play an important role in smart contract defense. 4.7.3: Human-machine Interaction in Smart Contracts Smart contracts are often opposed to traditional user software based on the idea of replacing human- based decisions with a deterministic algorithm. However, such a vision is overly idealistic because 107 a human is an integral part of a smart contract lifecycle. 
Specifically, humans write the source code of smart contracts. Even in the case of automatically synthesized smart contracts, we still require sufficient human intervention for developing templates and specifications. Testing a smart contract also requires a human, even for unit tests, which are developed by a human developer too. The security audit of a smart contract is also impossible without human judgment despite a wide variety of auditing tools available. Finally, interaction with smart contracts is always initiated by a user, regardless of the degree of automation. However, the impact of a human on the security of smart contracts is not sufficiently studied.

The study of human-machine interaction in smart contracts has so far been limited to exploring honeypots and revealing a potential for some social engineering attacks. Honeypots are malicious smart contracts that entrap naive attackers who try to exploit a known vulnerability in a smart contract, making honeypots a class of social engineering attacks, i.e., attacks targeting humans as the major attack vector. HoneyBadger [270] is an automated tool that identifies such honeypots. Ivanov et al. [158] expand the scope of social engineering attacks with two more categories: address manipulation and homograph. However, the two efforts mentioned above do not embrace the entire complexity of human-smart contract interaction.

One unexplored area of human-smart contract interaction is the security implication of the growing population of smart contract users who do not have a deep knowledge of the working mechanics of the blockchain and smart contracts. Another security-sensitive aspect of human-smart contract interaction is the assumption that the decentralization of blockchain implies decentralized applications (i.e., smart contracts) enabled by that blockchain.
Specifically, many smart contracts implement routines (e.g., the Ownable parent class in OpenZeppelin Contracts [33]) that grant excessive power to specified accounts. This excessive power may be abused by the owner or stolen by an attacker [156] with potentially detrimental consequences. These two examples show the importance of studying human-smart contract interaction from the security perspective, and we envision many future studies in this area.

4.7.4: Non-Ethereum Contracts

As revealed in Section 4.4, the vast majority of the existing smart contract threat mitigation methods target the smart contracts on the Ethereum platform. However, in recent years, the world has been experiencing major growth in the popularity of non-Ethereum smart contract platforms, such as NEO [32], Hyperledger Fabric [53], EOS [27], and others. Our analysis of the evolution of smart contract threat mitigation solutions clearly shows the growing attention by the research community to the security of non-Ethereum smart contracts.

One reason for such disproportionate attention to Ethereum, compared to other platforms, is that Ethereum is an open-data environment with the second-largest market capitalization after Bitcoin, so it is both convenient and important to study [229]. However, these choices come at the expense of overlooking other major smart contract platforms. At the same time, our analysis shows that it is often impossible to extrapolate the lessons learned in Ethereum to the other platforms. Many of the existing vulnerabilities and other security issues are directly related to the design of the Ethereum platform or the syntax of Solidity, the most popular programming language for Ethereum smart contracts. Therefore, we expect increased attention to non-Ethereum platforms in the future development of smart contract threat mitigation research.
4.7.5: Large-scale Measurements

Although blockchain is an open-data environment, there are multiple facts and statistics that we are unaware of. One reason is that a large amount of blockchain-related data, such as failed transactions and ERC20 token prices, is stored outside of the blockchain. Moreover, the growing popularity of Decentralized Finance (DeFi) further intensified the exchange of off-chain data [29, 30]. As a result, we have seen growing amounts of on-chain and off-chain data that have not been analyzed from a security perspective. Yet, the existing security-related measurement studies [117, 228, 270, 309] of smart contracts do not give answers to all the important questions. Specifically, we identify two areas important for the security of smart contracts in which there is no systematic data:
1. the measurement and flow of the market value of non-cryptocurrency blockchain assets (e.g., ERC20 tokens);
2. study of the purchases and sales of cryptocurrency and tokens by the crypto exchanges, mining rewards, and crypto money laundering.

Such data would be very helpful for applying weights to attacks and vulnerabilities based on the actual value flow of the smart contract assets.

4.8: Chapter Summary

We surveyed the full spectrum of smart contract threat mitigation solutions in this work. We presented a general taxonomy for the classification of such solutions, which applies to today’s methods and is suitable for future methods, even if new paradigms, blockchain platforms, or vulnerabilities appear. Using this taxonomy, we classified 133 existing smart contract threat mitigation solutions. We identified eight distinct core defense methods employed by the existing solutions and developed synthesized workflows of these core methods. We studied the ability of the existing smart contract threat mitigation solutions to address the known vulnerabilities.
We conducted an evidence-based evolutionary study of smart contract threat mitigation solutions to outline trends and perspectives. To further benefit the community of smart contract security researchers, users, and developers, we deployed an open-source, regularly updated online registry for smart contract threat mitigation at https://nick-ivanov.github.io/stmregistry/.

CHAPTER 5: CONTEXT-AWARE USER-CENTERED TRANSACTION TESTING⁴²

5.1: Introduction

Ethereum smart contracts have been used for a wide variety of decentralized applications, such as decentralized finance (DeFi), non-fungible tokens (NFT), alternative currencies (based on ERC-20 tokens), and data attestation. However, numerous vulnerabilities and attacks on Ethereum smart contracts have been hampering their widespread adoption [132, 255].

Following the common vulnerabilities and exposures (CVE) database, the smart contract weakness classification and test cases (SWC) registry [42] identifies 37 classes of known smart contract vulnerabilities (as of January 2022). To counter the security threats, different types of defense tools have been developed, including syntactic analyzers [206, 248], security scanners based on symbolic execution [168, 200], fuzzing tools [119, 163], transaction analyzers [94, 240], security libraries [33, 156], formal defense methods [83, 227], and various hybrid analysis approaches [117, 309]. In this work, we scrutinize 106 existing smart contract security defense solutions, and find that each of them only addresses very few classes of known vulnerabilities. We further discover that certain vulnerability types have never been effectively addressed by any of the proposed defenses.

Generally, all the existing smart contract defense methods face two design choices: 1) heuristic versus deterministic; and 2) detection versus verification (see Table 5.1).
Heuristic approaches use the best-effort judgement applied to all cases (e.g., ConFuzzius [119], sFuzz [221], Harvey [291]), while deterministic designs guarantee the correctness at the expense of rejecting a small number of cases (such as KEVM [148], SeRIF [83], and eThor [248]). Detection tools identify known vulnerabilities (e.g., Oyente [200], Securify [271]), while verification tools aim at confirming various safety properties (examples are VerX [230] and ZEUS [168]). The only known deterministic verification approach is formal verification, which proves the correctness of smart contracts by developing formal specifications for an automated prover [91]. Unfortunately, these specifications cover only particular cases (e.g., reentrancy [83]). Consequently, these formal verification approaches, despite guaranteed correctness, have very limited vulnerability coverage. To increase vulnerability coverage, we propose a new approach for real-time deterministic verification of Ethereum transactions.

Table 5.1: Different design choices of smart contract defense.

Property                  | Heuristic | Deterministic† | Detection | Verification†
Reject option             | ✗         | ✓              | -         | -
Guaranteed correctness    | ✗         | ✓              | -         | -
Confirm safety            | -         | -              | ✗         | ✓
Identify vulnerabilities  | -         | -              | ✓         | ✗

† choices made in this work (TxT)

⁴² This chapter is based on previously published work by Nikolay Ivanov, Qiben Yan, and Anurag Kompalli titled “TxT: Real-Time Transaction Encapsulation for Ethereum Smart Contracts” published in the IEEE Transactions on Information Forensics and Security (Volume: 18). DOI: 10.1109/TIFS.2023.3234895 [161]. © 2023 IEEE. Reprinted, with permission, from Nikolay Ivanov, Qiben Yan and Anurag Kompalli “TxT: Real-Time Transaction Encapsulation for Ethereum Smart Contracts” (paper and IEEE titles are the same), January 2023.

In this work, for the first time, we propose the deterministic verification of Ethereum transactions using a fully-synchronized instrumented Ethereum Virtual Machine (EVM).
Our verification system relies on the user confirmation of a test transaction, as smart contract users generally have reasonable expectations of the transaction outcomes. For example, if the users purchase some tokens, they would expect a balance increase of the respective token in the wallet. Unlike traditional defense methods, our approach could cover a large scope of suspicious transactions, thereby revealing the behaviors associated with a majority of known and unknown vulnerabilities.

TxT Transaction Testing: To make it possible to preview the result of one or several transactions, we develop a smart contract testing framework called transaction encapsulation, which uses a fully-synchronized Ethereum node to execute transactions, while preventing the propagation of these transactions across the network. Transaction encapsulation classifies the transactions into two categories: σ-deterministic (with guaranteed test result), and σ-nondeterministic (with non-guaranteed test result). To demonstrate the transaction encapsulation, we implement a distributed real-time transaction tester called TxT, which successfully reveals the unexpected outcomes associated with the majority of known smart contract vulnerabilities, significantly outperforming all existing defense methods. Our evaluation shows that TxT exhibits a low rate of σ-nondeterministic transactions. To further reduce the rate of σ-nondeterministic transactions, we enhance TxT functionality to enable explicit detection of specific vulnerabilities in 75% of σ-nondeterministic transactions.

To interact with the transaction testing framework, the user first connects their crypto wallet to a TxT network and submits a transaction (or a sequence thereof) to the smart contract. Then, the user observes in the wallet or dApp interface (if used) the exact outcome of the transaction(s), called a posteriori state, manifested in cryptocurrency balances, token balances, error messages, etc.
If the result of the test execution matches the expectations, the user switches their wallet back to the Ethereum Mainnet and submits the transaction as usual. While the user is testing and submitting transactions, TxT continuously checks in the background whether the condition for the replicability of the test transaction execution path is still satisfied. Without the necessity to install new software or learn contract programming, TxT allows everyday users to identify unexpected outcomes of transaction sequences associated with the majority of known vulnerabilities, and it achieves a high vulnerability coverage which more than doubles the coverage of all the state-of-the-art defense tools combined.

In summary, we deliver the following contributions:
• We propose a new deterministic approach for smart contract verification, transaction encapsulation, and design a distributed real-time dynamic transaction tester, TxT, to verify the security of transactions at runtime.
• To address the time-of-check/time-of-use (TOCTOU) problem, we formally determine the exact set of conditions for the execution path replicability of a test transaction and implement TxT using a fully-synchronized Ethereum node to perform the transaction encapsulation.
• We reproduce 37 known smart contract vulnerabilities and confirm that TxT can intercept 83.8% of them, compared to only 40.5% by all the existing methods combined. We further evaluate 1.3 billion Ethereum transactions and confirm that 96.5% of them are suitable for security evaluation by TxT.

5.2: Background

Ethereum, dApps, and Wallets: Ethereum is a decentralized blockchain ecosystem that supports the execution of smart contracts. Ethereum popularized the notion of decentralized application (dApp), a full-stack software product with a web or mobile interface as a frontend and smart contract as a backend.
In order for a dApp to interface with a smart contract and the Ethereum network at large, it must use a wallet as an intermediary. The wallets securely store private key(s) for signing and submitting transactions on the user’s behalf.

Smart Contracts and Transactions: Ethereum Virtual Machine (EVM) is a part of Ethereum that executes smart contracts. As each transaction is executed by the EVM, the state of the blockchain changes to reflect the executed transaction. However, if a given transaction is invalid, the EVM reverts the blockchain to the state preceding this transaction. Essentially, an Ethereum transaction is a state-changing instruction signed by the sender using their private key.

London Hard Fork and EIP-1559: There have been instances where Ethereum transactions were included in the blocks while paying very little or nothing at all for gas. As of block 12,965,000, a hard fork implementing several new Ethereum features was activated on the network. Dubbed “London”, this hard fork changed how fees are collected by the Ethereum network. Ethereum Improvement Proposal 1559 (EIP-1559), enforced in the London fork, changes the fee model in a way that practically prevents zero-priced transactions.

1 contract Foo {
2   function deposit() public payable {}
3   function withdraw() public {
4     address admin =
5       0xEc125A03C6F9E75BEB1A420e94d655B2f1352584;
6     payable(admin).transfer(1000000000 wei);
7     payable(msg.sender).transfer(address(this).balance);
8   }
9 }

Figure 5.1: A smart contract that fails only on Mainnet.

1 contract Bar {
2   constructor() public { }
3 }

Figure 5.2: A non-payable smart contract deployed on Mainnet at 0xEc125A03C6F9E75BEB1A420e94d655B2f1352584. The same address on Ropsten testnet is an externally owned account (EOA).

5.3: Motivating Example

Smart contracts do not operate in isolation; instead, they share with other smart contracts a dynamic blockchain network environment.
Moreover, the same blockchain platform can be represented by several public blockchain networks, and the choice of network sometimes affects the execution of the same smart contract. Consider smart contract Foo in Fig. 5.1, which transfers funds to smart contract Bar (Fig. 5.2). Bar is deployed on Mainnet, but not on Ropsten testnet. Moreover, Bar does not have any payable functions, and therefore it cannot accept incoming Ether. As a result, the transfer in line 6 (Fig. 5.1) will fail, reverting the entire transaction, but only on Mainnet, not on Ropsten. Even if the states of all the variables of contract Foo on Ropsten are identical to their counterparts on Mainnet, the behavior of the withdraw() function will be different. This example demonstrates that the state of blockchain (denoted σ) is an important factor that determines the outcome of smart contract execution.

Next, we run a set of experiments to determine whether the existing smart contract defenses can reveal the failed transfer issue. We confirm that Securify [271], Oyente [200], Mythril [215], Vandal [76], and Manticore [214] all fail to detect the issue, although some of them produce unrelated warnings. This example shows that some vulnerabilities might not be detected by the existing defense methods. Moreover, the security evaluation on a testnet does not offer sufficient reassurance of contract safety.

To address these issues, we propose a new defense approach for smart contracts based on transaction testing. Our approach tests a transaction (or a series of transactions) on an isolated fully-synchronized node, and then checks in real time whether the test transaction can replicate exactly the same execution path on Mainnet. Unfortunately, most existing smart contract threat mitigation solutions do not take the state of the current environment into account.
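The state dependence in the Foo/Bar example can be illustrated with a toy model. The following is only a minimal sketch under simplifying assumptions: the `NetworkState` class, its `contracts` map, and the `withdraw_succeeds` helper are hypothetical names introduced here for illustration, and the model ignores balances, gas, and every other part of the EVM. It captures a single fact from the example: the same address holds a non-payable contract on one network and an externally owned account on the other.

```python
from dataclasses import dataclass, field

# Address hard-coded in Foo.withdraw() (Fig. 5.1).
ADMIN = "0xEc125A03C6F9E75BEB1A420e94d655B2f1352584"


@dataclass
class NetworkState:
    """Toy stand-in (hypothetical) for the blockchain state sigma of one network."""
    name: str
    # Maps a contract address to True if that contract can accept plain
    # Ether transfers. Addresses absent from the map are treated as EOAs.
    contracts: dict = field(default_factory=dict)

    def accepts_ether(self, address: str) -> bool:
        # An EOA always accepts Ether; a contract does so only if it
        # exposes a payable entry point.
        return self.contracts.get(address, True)


def withdraw_succeeds(state: NetworkState) -> bool:
    """Mimic line 6 of Foo.withdraw(): a failed transfer reverts everything."""
    return state.accepts_ether(ADMIN)


# Bar (no payable functions) occupies the address on Mainnet only.
mainnet = NetworkState("mainnet", {ADMIN: False})
ropsten = NetworkState("ropsten")

print(withdraw_succeeds(mainnet))  # False
print(withdraw_succeeds(ropsten))  # True
```

The code of Foo is identical in both cases; only the surrounding state differs, which is why an analysis that inspects Foo in isolation cannot distinguish the two outcomes.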
The solution proposed in this work tests the current state of smart contracts in the blockchain, thereby providing a more accurate representation of contract behaviors.

5.4: Preliminaries

In this section, we introduce the transaction encapsulation approach, and then give an overview of the TxT tester, followed by formal conventions, assumptions, and threat model.

5.4.1: System Overview

Transaction Encapsulation: In this work, we propose a new transaction encapsulation framework which offers a preview of the result of a transaction against the current state of Mainnet, but without the transaction being propagated and mined across the network. The transaction encapsulation executes one or a series of transactions on an instrumented node fully-synchronized with the Mainnet network. Unlike testnet simulations and symbolic executions, the transaction encapsulation enables the execution of the transaction on the current state of Mainnet. The transaction encapsulation is designed not only to execute the transaction but also to deterministically reason whether the transaction can be replicated on Mainnet with a completely identical execution path.

Overview of Transaction Testing Workflow: Fig. 5.3 shows the workflow of TxT’s transaction testing. To test a transaction with TxT, the user first switches the Ethereum network in their wallet and specifies a custom transaction gas price. Then, the user submits a sequence of transactions using their favorite wallet and dApp (if applicable); no other special-purpose software is needed. When the transaction sequence is executed, the a posteriori state will be observable in the wallet and/or in the dApp, as if the transaction was executed by the Mainnet. Next, the user observes the status of the tested transaction (e.g., on a web page) to determine if the transaction is testable and reproducible at any given moment.
In some rare cases, TxT will not be able to guarantee the result of the transaction, in which case the transaction will be labelled as σ-nondeterministic. Most σ-nondeterministic transactions contain binary opcodes that are potentially associated with some known vulnerabilities; in this case, TxT issues a warning about such a vulnerability. Otherwise, when the transaction is classified as σ-nondeterministic and there is no vulnerability marker present among the binary opcodes, the transaction is deemed untestable. On the other hand, if the transaction is labeled σ-deterministic, it is testable and the test result is guaranteed to be correct. In this case, the user observes the result of the transaction (e.g., balances in the wallet) to determine if the result of the transaction matches the expectation. If the result is unexpected, the transaction should be abandoned by the user. If the result matches the expectation, the user needs to verify whether or not the transaction has expired, i.e., whether there are other incoming transactions that change the state of the contract(s) during the transaction testing. In some rare cases, TxT could determine that the test transaction has expired by the time the user is ready to resubmit it to the Mainnet. Even in such a situation, the user could retest the transaction. Conversely, if the TxT status shows that the transaction is still valid, the user submits the transaction to the Mainnet knowing that the outcome will be identical to the one observed during the corresponding transaction test.

5.4.2: Notation

Previous studies demonstrate that reproducing a smart contract vulnerability often requires a sequence of two or more transactions [122, 178, 214, 223, 260]. In this work, we use a notation similar to the one in [260] to denote the sequence of $N$ transactions as $T^*$:

\[ T^* = (T_1, \cdots, T_N), \quad N \geq 1. \]

Figure 5.3: Flow chart of transaction testing. A dedicated marker in the chart denotes the steps that require manual user interaction.
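The decision flow described in Section 5.4.1 and depicted in Fig. 5.3 can be condensed into a small function. This is a hedged sketch rather than TxT's implementation: the `Verdict` enumeration, the `next_step` function, and its four boolean inputs are hypothetical names chosen for illustration, and the sketch assumes that the classification, the user's judgement of the outcome, and the expiration status are already known.

```python
from enum import Enum


class Verdict(Enum):
    """Possible next steps for the user (names are illustrative)."""
    WARN = "vulnerability warning issued"
    UNTESTABLE = "transaction is untestable"
    ABANDON = "abandon the transaction"
    RETEST = "retest the transaction"
    SUBMIT = "submit the transaction to Mainnet"


def next_step(sigma_deterministic: bool,
              vulnerability_marker: bool,
              outcome_expected: bool,
              expired: bool) -> Verdict:
    # Sigma-nondeterministic: the test result cannot be guaranteed.
    if not sigma_deterministic:
        return Verdict.WARN if vulnerability_marker else Verdict.UNTESTABLE
    # Sigma-deterministic: the user judges the a posteriori state.
    if not outcome_expected:
        return Verdict.ABANDON
    # The outcome is acceptable, but it must still be replicable.
    if expired:
        return Verdict.RETEST
    return Verdict.SUBMIT


print(next_step(True, False, True, False).value)
```

Only the last branch leads to a Mainnet submission, which mirrors the text: a transaction is submitted only when it is σ-deterministic, its outcome matches the user's expectation, and it has not expired.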
Furthermore, without loss of generality, we use a simplified⁴³ notation of transaction adapted from [113, 288]:

\[ T_i = \{T_{n,i}, T_{p,i}, T_{g,i}, T_{o,i}, T_{t,i}, T_{v,i}, T_{f,i}, T_{a,i}, T_{b,i}, T_{h,i}, T_{c,i}\}, \]

where $T_{n,i}$ is the transaction nonce, $T_{p,i}$ is gas price, $T_{g,i}$ is gas offer, $T_{o,i}$ is the transaction sender address, $T_{t,i}$ is transaction recipient (destination address), $T_{v,i}$ is the transaction value (the amount of Wei sent along with the transaction), $T_{f,i}$ is the invoked function of the smart contract, $T_{a,i}$ is the set of arguments with which $T_{f,i}$ is invoked, $T_{b,i}$ is the block the transaction is mined into, $T_{h,i}$ is the transaction hash, and $T_{c,i}$ is the sequence of EVM opcodes in the execution stack of $T_i$, which recursively includes the opcode sequences of all the inter-contract calls (ICCs) executed by the transaction. We assume that $T_i$ is properly signed.

⁴³ We simplify the definition by removing fields irrelevant to this study, such as (v, r, s) components of the transaction signature.

5.4.3: Assumptions

A Posteriori State Assessment: Unlike traditional defense methods, TxT does not detect vulnerable or malicious code patterns; instead, TxT reveals suspicious behavior associated with these vulnerabilities. Specifically, we make a reasonable assumption that the user can assess whether the outcome of a series of transactions is satisfactory or not. TxT will then give the user an accurate preview of what will happen if the given transaction sequence is executed, and the user can use the interface of the wallet and/or the dApp to assess the a posteriori state in the form of Ether balances, token balances, dApp interface elements, transaction error messages, etc.

Transaction Sequences: We assume that all transactions in the sequence represent a single complete logical workflow, such that the user can unambiguously assess its success or failure.
For example, a typical token exchange workflow can be logically represented as the following sequence: (1) sell token A for stablecoin⁴⁴ S; (2) buy token B using stablecoin S. In this example, the user expects to observe a specific amount of B tokens in their wallet. Also, we assume that all transactions in the sequence are distinct and sent from the same account to the same contract, i.e.,

\[ \forall T_i, T_j \in T^*,\ i \neq j : T_{o,i} = T_{o,j} \wedge T_{t,i} = T_{t,j}, \tag{5.1} \]

where $T_i$, $T_j$ are two transactions in the same sequence. We assume that the transactions in the sequence are chronologically ordered. Since Ethereum uses incremental per-account nonces by design [288], a testable transaction sequence must have nonces appearing in a strictly ascending order, i.e.,

\[ \forall T_i, T_j \in T^* : j = i + 1 \implies T_{n,j} = T_{n,i} + 1. \]

Finally, we define the requirement for Ethereum state transitions within the testable transaction sequence:

\[ \forall T_i, T_j \in T^* : j = i + 1 \wedge T_i \mapsto T_j \implies \nexists T_k : T_k \notin T^* \wedge T_{o,k} = T_{o,i} \wedge T_{n,k} \in [T_{n,1}, T_{n,N}], \tag{5.2} \]

where $T_i \mapsto T_j$ denotes an EVM state transition when transaction $T_j$ is executed after $T_i$ within the sequence, and $\nexists T_k$ indicates the non-existence of any transaction $T_k$ that satisfies the criteria listed after it.

⁴⁴ A token with market price pegged to a fiat currency (e.g., USD).

On-Chain Transactions: We assume that all the transactions tested by TxT are traditional on-chain transactions, i.e., the transactions propagated, pooled, and mined by unmodified Ethereum nodes, such as Go-Ethereum. The Decentralized Finance (DeFi) ecosystem, which has gained significant traction in recent years, is particularly sensitive to transaction ordering manipulation via a widespread opportunistic exploration of Miner/Maximum Extractable Value (MEV) [30]. This creates a pretext for transaction ordering attacks, such as the sandwich front-running attack [118].
To alleviate the negative consequences (e.g., gas fee inflation and increased network overhead) of MEV transactions, the Flashbots project delivers a patch (MEV-Geth [29]) for the Go-Ethereum node that allows DeFi participants to submit transactions directly to the patched nodes, which essentially creates an off-chain overlay network for transaction propagation. In this work, we consider orthodox Ethereum transactions, and leave the MEV-related transactions for future work.

5.4.4: Threat Model

In this work, we assume that Ethereum is secure and correct on the blockchain and consensus layers, and the honest nodes correctly implement the protocol. The threat rests on the smart contract layer, coming either from an attacker or from a non-adversarial bug. The attacker (if present) may either be the one who introduces a security vulnerability in the smart contract, or they may be the one who exploits a pre-existing program bug. The attacker aims at earning financial gains or causing disruptions to the dApps. In all the cases, the attack vector is a stand-alone Ethereum node or an Ethereum API (such as Infura or Pocket Network).

5.5: TxT: Transaction Testing Framework

In this section, we describe the challenges and details of the TxT design, and illustrate the transaction testing procedure.

5.5.1: Design Challenges

Ethereum is a dynamic ecosystem where anyone in the world can deploy smart contracts or submit transactions that compete for being included into constantly appended blocks. This compositional nature of Ethereum creates a number of practical challenges described below.

Challenge #1. TOCTOU Problem: The time-of-check/time-of-use (TOCTOU) problem is manifested in TxT as the combination of the transaction expiration problem and the execution path guarantee. Our analysis of Ethereum confirms the intuitive proposition that the execution path of a transaction does not necessarily repeat that of an identical previously-submitted test transaction.
Every test transaction may sooner or later experience an “expiration” (i.e., the outcome of the test transaction no longer matches that of the real transaction), after which it no longer demonstrates a valid outcome of an identical transaction. In this work, we determine the exact set of conditions affecting the expiration of a test transaction, and we further design the TxSEA (Transaction State Expiration Analyzer) algorithm, which deterministically decides whether a test transaction has expired (see Section 5.5.4 for more details).

Our analysis of EVM execution reveals that Ethereum smart contracts sometimes include data sources unrelated to transaction-based state transition. For example, the Solidity property block.difficulty, represented by the DIFFICULTY EVM opcode, is determined by mining instead of previous transactions. We call the presence of such data sources σ-nondeterminism. If a transaction exhibits σ-nondeterminism in its execution stack, the transaction is σ-nondeterministic. In this work, we determine the exact conditions for σ-nondeterminism, and we design TxT in a way that it unambiguously detects σ-nondeterministic transactions. Moreover, TxT can scrutinize σ-nondeterministic transactions to provide a warning regarding specific vulnerabilities associated with the σ-nondeterministic instructions in the contract.

Challenge #2. Execution Without Propagation: Transaction encapsulation requires that the test transaction be executed only on the instrumented TxT node, while being ignored by all other nodes within the blockchain network. We show that the straw-man solutions, such as network packet filtering or propagation suppression of the transaction, disrupt the synchronization and lead to a stall of the node.
To overcome this challenge, we propose transaction underpricing, a gas price manipulation scheme that effectively prevents the execution of the transaction by the blockchain network at large, without creating conditions in which the TxT node cannot re-synchronize with the Mainnet after the test.

Challenge #3. Transaction Sequences: As demonstrated by previous studies [122, 178, 214, 223, 260], many vulnerabilities require executing a series of transactions for reproduction. To address this challenge, we design TxT to retain the state of a soft fork for a set period of time after each test transaction, in order to enable the execution of a sequence of transactions of arbitrary length. We enhance the TxSEA algorithm to determine the expiration of the entire sequence of transactions.

5.5.2: Transaction Expiration
Determining a transaction expiration event is essential for the success of the proposed TxT tool; otherwise, TxT cannot guarantee that the final real transaction(s) will produce the same result as the test transaction(s). Here, we formally define the expiration conditions, starting from transaction expiration.

Definition 1: A transaction $T_i$ is expired at block $B$ if:

\[ \exists T_j : T_{t,i} = T_{t,j} \wedge T_{o,j} \neq T_{o,i} \wedge T_{b,j} > T_{b,i} \wedge T_{b,j} \leq B. \tag{5.3} \]

Essentially, the transaction expiration stipulates the presence of at least one transaction $T_j$ submitted to the same smart contract as $T_i$ ($T_{t,i} = T_{t,j}$) from a different account than $T_i$ ($T_{o,j} \neq T_{o,i}$) at any block time after $T_i$ ($T_{b,j} > T_{b,i}$) but before or at block $B$ ($T_{b,j} \leq B$). The following definition asserts that for each block, the sets of expired and unexpired transactions are disjoint and form a partition.

Definition 2: A transaction $T_i$ is unexpired at block $B$ if and only if it is not expired at block $B$.

Following the definitions of transaction expiration, we define the expiration of a transaction sequence as follows.
Definition 3: A sequence $T^*$ is expired at block $B$ if:

\[ \exists T_i \in T^*\ \exists T_j \notin T^*,\ T_{o,j} \neq T_{o,i} : T_{b,i} < T_{b,j} \leq B \wedge T_{t,i} = T_{t,j}. \tag{5.4} \]

Finally, we formally define the condition for an unexpired sequence of transactions.

Definition 4: A sequence $T^*$ is unexpired at block $B$ if:

\[ \forall T_i \in T^*\ \nexists T_j : T_{o,j} \neq T_{o,i} \wedge T_{b,j} > T_{b,i} \wedge T_{b,j} \leq B \wedge T_{t,i} = T_{t,j}. \tag{5.5} \]

Intuitively, a transaction expiration event is characterized by the presence of another transaction calling a function of the same smart contract after the test transaction. We assess the probability of such an event in Section 5.6.3.

5.5.3: Sources of σ-nondeterminism
In order to determine all the sources of σ-nondeterminism on the Ethereum platform, we conduct an exhaustive manual analysis of the current 145 EVM opcodes. In the end, we identify the following set of opcodes incurring σ-nondeterminism: $\mathcal{T} = \{\mathrm{BLOCKHASH}, \mathrm{NUMBER}, \mathrm{COINBASE}, \mathrm{GASLIMIT}, \mathrm{DIFFICULTY}, \mathrm{TIMESTAMP}, \mathrm{GASPRICE}, \mathrm{BALANCE}\}$. Next, we elaborate on how these opcodes make the associated transaction σ-nondeterministic.

Block Hash: The BLOCKHASH opcode retrieves the block hash for a specified block number. Its presence in the execution stack of a transaction is a sign that this transaction is σ-nondeterministic. For example, if $B$ is the most recently mined block, the BLOCKHASH opcode will return 0x0 for $B + 1$ (i.e., the next block). However, once block $B + 1$ has been mined, the same code will return a non-zero hash (for as long as $B + 1$ remains among the 256 most recent blocks). Note that the BLOCKHASH opcode constitutes a signature of the “Weak Sources of Randomness from Chain Attributes” (SWC-120) vulnerability.

Block Number: The NUMBER opcode retrieves the current block number. This variable constantly increments, rendering any transaction that has this opcode in its bytecode σ-nondeterministic. Also, this opcode is a marker for the “Block Values as a Time Proxy” (SWC-116) vulnerability.

Block Beneficiary Address: The block beneficiary address is the address specified by the winning miner for receiving the reward.
The COINBASE opcode retrieves the current block’s beneficiary address. Since this value may differ between blocks, any transaction that uses this opcode in its execution stack is σ-nondeterministic. Furthermore, this opcode is also a signature of the SWC-120 vulnerability.

Block Gas Limit: Each Ethereum block has a limit on the cumulative gas consumption by all its transactions. The GASLIMIT opcode returns the gas limit value. This value may vary from block to block, and therefore the presence of the GASLIMIT opcode within the execution stack of a transaction renders this transaction σ-nondeterministic. Additionally, this opcode constitutes a signature of the SWC-120 vulnerability.

Block Difficulty: Each block has its own mining difficulty, which is calculated from the difficulty of the previous block and the timestamp set by the miner, and therefore its specific value is volatile. The DIFFICULTY opcode retrieves the current block’s difficulty. The variability of block difficulty is a clear sign that a transaction with the DIFFICULTY opcode in its execution stack is σ-nondeterministic. This opcode is yet another signature of SWC-120.

Block Timestamp: The block timestamp is a value put in the block by the miner, and it may not necessarily represent the exact time the block was mined. A contract can retrieve the block timestamp value using the TIMESTAMP opcode. Intuitively, the value of the block timestamp is not expected to stay the same. Therefore, the presence of the TIMESTAMP opcode in the execution stack of a transaction is not only indicative of the SWC-116 vulnerability potential, but it is also an indicator that the transaction is σ-nondeterministic.

Third-party Account Balance: The BALANCE opcode retrieves the balance of an account. If an account is not in the set $\{T_{o,i}, T_{t,i}\}$, we call it a third-party account. In this work, we analytically determine that a third-party account balance incurs σ-nondeterminism in smart contracts.
If some account’s balance is updated by a transaction submitted to an account other than $T_{t,i}$, it does not render $T_i$ expired; however, if $T_i$ contains a BALANCE opcode in its execution stack, $T_i$ is marked as σ-nondeterministic.

Transaction Gas Price: The transaction gas price can be obtained via the GASPRICE opcode. Since TxT uses transaction underpricing, the value retrieved by the GASPRICE opcode will differ between the test transaction and the final one. Therefore, the presence of this opcode in the execution stack of a transaction implies that this transaction is σ-nondeterministic. This opcode is another signature of the SWC-120 vulnerability.

Finally, by combining the above observations, we can establish the following definitions, starting with the definition of a σ-deterministic transaction.

Definition 5: A transaction $T_i$ is σ-deterministic if and only if $T_{c,i} \cap \mathcal{T} = \emptyset$.

Since σ-deterministic and σ-nondeterministic transactions form a partition, the following definition ensues.

Definition 6: A transaction $T_i$ is σ-nondeterministic if and only if it is not σ-deterministic.

Similarly, we can further expand the definitions to include testing sequences.

Definition 7: A transaction sequence $T^*$ is σ-deterministic if and only if all transactions in $T^*$ are σ-deterministic.

Definition 8: A transaction sequence $T^*$ is σ-nondeterministic if at least one transaction in $T^*$ is σ-nondeterministic.

5.5.4: TxSEA Algorithm
Through transaction testing, TxT allows the user to peek into the a posteriori state of a transaction. Unfortunately, the a posteriori state is transient and can expire at any moment.
Due to other interfering transactions, the execution path of the final transaction might not match that of the testing transaction. To address this issue, we develop the TxSEA algorithm for confirming the identical execution path when the test transaction is submitted to the current block.

Algorithm 1: Dynamic TxSEA with Caching
Data: The transaction expiration map $E$: ContractAddress $\mapsto$ LastTxBlock

Procedure CacheTransaction($T_j$)
    Result: Cache the transaction currently processed by EVM and append it to the permanent storage
    Input: $T_j$, the currently executed transaction
    $E[T_{t,j}] \leftarrow T_{b,j}$

Function ExpirationTest($T_i$)
    Result: Test transaction expiration status
    Input: $T_i$, the tested transaction
    Output: {Expired, Unexpired}
    if $T_{t,i} \notin E.\mathrm{Keys}$ then
        return Unexpired
    else if $E[T_{t,i}] \geq T_{b,i}$ then
        return Expired
    else
        return Unexpired

Algorithm 1 shows an efficient implementation of TxSEA using caching and dynamic programming. This algorithm introduces a constant-time procedure CacheTransaction, which is embedded into the instrumented Ethereum node and invoked for each executed transaction. This procedure uses the map $E$ to store the block number of the last transaction for each smart contract. The transaction data gathered from the node is stored in an outside storage (e.g., a database), and this data is used by the ExpirationTest() function to determine if the transaction has expired. This function uses the transaction expiration map to search for a transaction that might have been recorded after $T_i$. The condition $T_{t,i} \notin E.\mathrm{Keys}$ checks whether the smart contract $T_{t,i}$ has any recorded transactions; if not, the transaction is obviously unexpired. Otherwise, we check if the block associated with the last recorded transaction was mined simultaneously with or after $T_{b,i}$ (i.e., $E[T_{t,i}] \geq T_{b,i}$), which indicates expiration.
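The two routines of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the TxT implementation: the `Tx` record, the `TxSEACache` class, and the in-memory dictionary standing in for the external storage are names introduced here for the example, and the sketch assumes that transactions are cached in chronological order.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Tx:
    """Hypothetical minimal transaction record (only the fields Algorithm 1 reads)."""
    target: str  # T_t: address of the contract the transaction calls
    block: int   # T_b: number of the block in which the transaction was executed


class TxSEACache:
    """Expiration map E: contract address -> block of the last cached transaction."""

    def __init__(self) -> None:
        # An in-memory dict stands in for the external storage (an assumption).
        self._last_block: Dict[str, int] = {}

    def cache_transaction(self, tx: Tx) -> None:
        # CacheTransaction: one map write per executed transaction.
        self._last_block[tx.target] = tx.block

    def expiration_test(self, tx: Tx) -> str:
        # ExpirationTest: one map lookup, following the three branches of Algorithm 1.
        last = self._last_block.get(tx.target)
        if last is None:
            return "Unexpired"  # no transaction recorded for this contract
        if last >= tx.block:
            return "Expired"    # last recorded transaction is at or after T_b
        return "Unexpired"      # last recorded transaction precedes T_b


cache = TxSEACache()
test_tx = Tx(target="0xC0FFEE", block=100)   # hypothetical address and block
status_empty = cache.expiration_test(test_tx)
cache.cache_transaction(Tx(target="0xC0FFEE", block=99))
status_older = cache.expiration_test(test_tx)
cache.cache_transaction(Tx(target="0xC0FFEE", block=101))
status_newer = cache.expiration_test(test_tx)
```

Both routines touch a single map entry, which is consistent with the constant-time behavior described for CacheTransaction.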
Finally, if the last transaction is recorded for the contract, but it happened before $T_i$, the transaction is unexpired. Our experiments show that this algorithm incurs only negligible latency (see Section 5.6.6). The requirement for the additional storage does not need an experimental evaluation because it always occupies a fixed 52 bytes of storage per transaction. As of November 2022, the size of the TxSEA cache is slightly over 86 gigabytes, which is a small fraction of the size of the full node, which requires hundreds of gigabytes.

5.5.5: How Does TxT Guarantee the Transaction Execution Path?
In the previous sections, we demonstrated the cases in which TxT cannot guarantee that the execution path of the final Mainnet transaction remains the same as that of the test transaction. Here, we confirm that, with all the uncertain cases eliminated, the identical path execution can be guaranteed. So far, we have been using a loose notion of transaction execution, which does not take into account the state of the blockchain the transaction applies to. Following the Ethereum state transition model from [288], we can formally define the state-conditional execution path as follows.

Definition 9: A state-conditional execution path, denoted $T_i|\sigma_i$, is the state transition $\sigma_i \to \sigma_i'$, such that $\sigma_i' = \Upsilon(\sigma_i, T_i)$, where $\Upsilon$ is the deterministic state transition function in EVM.

Definition 10: A state of contract $T_{t,i}$, denoted $\sigma_{t,i}$, is a subset of state values in $\sigma_i$ (i.e., $\sigma_{t,i} \subseteq \sigma_i$) that encompasses only storage and balances associated with all contracts in the call stack of transaction $T_i$.

Definition 11: A contract-state-conditional execution path with respect to contract $T_{t,i}$, denoted $T_i|\sigma_i|T_{t,i}$, is the state transition $\sigma_{t,i} \to \sigma_{t,i}'$, such that $\sigma_{t,i}' = \Upsilon(\sigma_{t,i}, T_i)$.
Now that we have formal definitions of state-conditional execution path, contract state, and contract-state-conditional execution path, consider the following theorem, which formalizes the exact condition of replicability of a transaction execution path.

Theorem 1: Given two transactions $T_i$ and $T_j$, if $T_{n,i} = T_{n,j}$, $T_{g,i} = T_{g,j}$, $T_{o,i} = T_{o,j}$, $T_{t,i} = T_{t,j}$, $T_{v,i} = T_{v,j}$, $T_{f,i} = T_{f,j}$, $T_{a,i} = T_{a,j}$, $T_{c,i} = T_{c,j}$, $T_{c,i} \notin \mathcal{T}$, and $T_i$ is not expired at block $T_{b,j}$, then $T_i|\sigma_i|T_{t,i} = T_j|\sigma_j|T_{t,j}$, i.e., $T_j$ exhibits an identical execution path as $T_i$ within the call stack of $T_{t,i}$ and conditional to states $\sigma_j$ and $\sigma_i$, respectively, for all $j > i$.

Proof: By definition, $T_i|\sigma_i|T_{t,i} = T_j|\sigma_j|T_{t,j} \implies \sigma_{t,i}' = \sigma_{t,j}' \implies \Upsilon(\sigma_{t,i}, T_i) = \Upsilon(\sigma_{t,j}, T_j)$. Since $\Upsilon$ is deterministic, $\sigma_{t,i}'$ depends solely upon $\sigma_{t,i}$ and $T_i$, while $\sigma_{t,j}'$ depends solely upon $\sigma_{t,j}$ and $T_j$. As per Ethereum and EVM specifications [20, 21, 288], the execution of a transaction calling a function of a smart contract is determined only by the following four components: 1) the code of the smart contract, as well as the code of the other contracts invoked within the call stack of the transaction; 2) the storage of the target smart contract, as well as the storage of all contracts within the call stack of the current transaction; 3) balances of smart contracts and EOAs; 4) block-related values. Next, we prove that none of these components could prevent $T_j$, applied to $\sigma_{t,j}$, from executing exactly the same path as $T_i$, when applied to $\sigma_{t,i}$, while satisfying the Theorem’s constraints.

Since $T_{c,i} = T_{c,j}$, the code of all contracts in the call stack is identical, and therefore this component is incapable of creating a discrepancy between $T_i|\sigma_i|T_{t,i}$ and $T_j|\sigma_j|T_{t,j}$.
As per Definitions 1 and 2, the pre-condition that $T_i$ is not expired at block $T_{b,j}$ implies that $T_{t,i}$ has no incoming transactions from other accounts between the timestamps of blocks $T_{b,i}$ and $T_{b,j}$, i.e.:

\[ \nexists T_k : T_{o,k} \neq T_{o,i} \wedge T_{b,k} > T_{b,i} \wedge T_{b,k} \leq T_{b,j} \wedge T_{t,i} = T_{t,k}. \tag{5.6} \]

Since $T_{c,i} = T_{c,j}$ and the contract storage can only be altered through an incoming transaction, Eq. (5.6) effectively eliminates contract storage discrepancy between states $\sigma_{t,i}$ and $\sigma_{t,j}$. Therefore, contract storage could not create a discrepancy between $T_i|\sigma_i|T_{t,i}$ and $T_j|\sigma_j|T_{t,j}$.

Similarly, altering a contract’s balance is only possible through transactions, mining, or self-destruction. Specifically, a balance increase requires a transaction calling a payable function with a non-zero value. A balance decrease requires a transfer of Ether performed by the smart contract code. The mining reward involves updating the coinbase parameter of the block, which makes it a block-related parameter as discussed later. Self-destruction is only possible by executing the SELFDESTRUCT opcode in the smart contract code initiated via a transaction. Therefore, Eq. (5.6) also excludes any balance transfer. Moreover, since $T_{c,i} \notin \mathcal{T}$, the balance checks for other accounts are also excluded. Therefore, balances also could not create a discrepancy between $T_i|\sigma_i|T_{t,i}$ and $T_j|\sigma_j|T_{t,j}$.

Finally, all block-related values are included in $\mathcal{T}$. As established earlier, $T_{c,i} = T_{c,j}$, and thus $T_{c,i} \notin \mathcal{T} \implies T_{c,j} \notin \mathcal{T}$. Therefore, block values cannot create a discrepancy between $T_i|\sigma_i|T_{t,i}$ and $T_j|\sigma_j|T_{t,j}$ under the set constraints.

In summary, we see that the code of the transaction call stack, the storage of the target smart contract and all contracts within the call stack of the current transaction, balances of smart contracts and EOAs, and block-related values are unable to create a discrepancy between $T_i|\sigma_i|T_{t,i}$ and $T_j|\sigma_j|T_{t,j}$. Therefore, $T_i|\sigma_i|T_{t,i} \equiv T_j|\sigma_j|T_{t,j}$.
■

The set of constraints in Theorem 1 is a sufficient condition for guaranteed replicability of a test transaction, which is used by TxT and TxSEA. Specifically, we prove that σ-deterministic unexpired transactions guarantee the replicability of a testing transaction execution path.

5.5.6: Temporal Separation of Transactions
Some smart contracts force transaction separation by a time gap. For example, an investment scheme might require a delayed withdrawal of dividends. In this work, we analytically determine that it is impossible to enforce time separation without incurring σ-nondeterminism or transaction sequence expiration, which we can summarize in the following theorem.

Theorem 2: Inter-transaction time separation stipulation in a sequence $T^*$ implies that $T^*$ is σ-nondeterministic or it is bound to expire before block $T_{b,N}$.

Proof: The time separation stipulation means that it is impossible to complete the transaction sequence without awaiting a certain event or condition between a pair of subsequent transactions. Without loss of rigor, we assume that the minimum inter-transaction separation time quantum is equal to one block⁴⁵. This reduction allows us to define the transaction time separation stipulation as follows:

\[ \exists T_i, T_j \in T^* : T_{b,i} = \alpha \wedge T_{b,j} = \beta \wedge T_{n,j} = T_{n,i} + 1 \implies \beta > \alpha. \]

This condition is indicative of one of the following three circumstances regarding $T_i$ and $T_j$: 1) the cumulative gas consumption of $T_i$ and $T_j$ exceeds the block’s gas limit; 2) there is at least one other transaction $T_k$ expected before the block $\alpha$; 3) the state of blockchain $\sigma$ must meet a certain condition before $\alpha$. Indeed, outside of these three conditions, there is no other circumstance preventing $T_i$ and $T_j$ with different nonces from sharing a block.

⁴⁵ On average, it takes 10 to 20 seconds in Ethereum to mine a new block.
The first condition is automatically handled by Ethereum, which mines one of the transactions in one of the following blocks, but this adaptive behavior is not stipulated because the block size is variable and may or may not exceed the cumulative gas consumption of $T_i$ and $T_j$. The second case precisely matches Definition 3, and therefore incurs the expiration of sequence $T^*$. The third case satisfies Definition 6 (and subsequently Definition 5), which means that in this case $T_i$ is σ-nondeterministic, and by Definition 8 it means that $T^*$ is σ-nondeterministic. Therefore, the inter-transaction time separation implies either σ-nondeterminism or expiration of $T^*$. ■

Corollary of Theorem 2: If a transaction sequence requires a time separation between transactions, this sequence is untestable, potentially vulnerable, or it will expire before the execution of its last transaction. Therefore, transaction sequences that require time separation between transactions cannot be tested by TxT.

5.5.7: Transaction Execution on an Instrumented Node
Here, we outline some straw-man approaches that might be considered as alternative design choices for TxT. However, all these approaches suffer from some limitations as illustrated below.

Gossip Delivery: TxT requires the user to switch to the TxT network for transaction testing. It would be reasonable to consider delivering the test transaction through a normal Ethereum node, based on the assumption that any transaction, even one destined to fail, must arrive at every node in the network, so that all the nodes could make their own independent rejection judgement. However, our experiments show that this assumption is not always correct. Our extensive experiments with transaction underpricing show that the nodes often refuse to forward transactions that do not pass certain “smoke tests” (e.g., minimum gas price), so we cannot rely on the Ethereum gossip protocol for delivering the test transaction to the TxT node.
Network-layer Propagation Inhibition: We assume that the user has a subscription with a TxT provider. This allows the provider to compare the from field of the transaction message with the user database to filter outgoing network packets containing test transactions. However, our experiments show that attempts to tamper with Ethereum network traffic cause unpredictable behavior, such as node stalling and various synchronization errors. Even if we could overcome these errors by reverse-engineering the software (Go Ethereum, in our case), relying on the eccentricities of a specific implementation of an Ethereum node is not only extremely complicated, but could also introduce unforeseeable errors. Thus, we choose to prevent the transaction propagation through the less intrusive method of transaction underpricing. Moreover, since the wallets ask users to select the transaction gas price anyway, the requirement to specify a low gas price does not create a noticeable inconvenience.

Submit Final Transaction via TxT: If a TxT test confirms the safety of a transaction, the user is required to reconnect to the Mainnet network for submitting the final transaction. This step raises a question: would it be easier to submit the final transaction through TxT instead, as the TxT node is essentially a Mainnet node? Unfortunately, this approach might be less convenient for the user than the one proposed in our design. Submitting the final transaction through TxT would require pruning and re-synchronizing the TxT node to remove the test transaction from it, which takes some time; since tested transactions, as we know, are prone to expiration, any unreasonable delay should be eliminated.

Our TxT Design with Transaction Underpricing: TxT requires the isolated execution of transactions on a fully-synchronized node.
We run multiple experiments to determine that the “forced” solutions, such as gossip firewalling (suppression of transaction propagation), incur unrecoverable node stalls. Moreover, any extensive modification of a TxT node creates sustainability issues: the same modifications have to be applied to future releases of the node, resulting in an increased maintenance overhead. To overcome this challenge, we propose transaction underpricing, a gas price manipulation scheme that effectively prevents the execution of the transaction by the blockchain network at large, without creating conditions in which the TxT node cannot re-synchronize with the Mainnet after the test. To override the rejection of the transaction, we enable the --miner.gasprice 1 CLI option in Go-Ethereum, which effectively overrides the underpriced transaction checks. Since the London Fork, Ethereum enforces the EIP-1559 proposal [79], which effectively prevents mining transactions with very low gas price, making the concern about accidental mining of underpriced transactions unsubstantiated.

Figure 5.4: The workflow of TxT testing.

5.5.8: Putting It All Together
Fig. 5.4 shows a successful testing of a single transaction. Without loss of generality, the same workflow can be applied to a series of two or more transactions. We assume that the user has an Ethereum wallet with an account and some positive Ether balance. With the minimal positive gas price of 1 wei/gas, the user needs only $1.35 \cdot 10^{-7}$ USD worth of Ether (as of November 2021) for a worst-case transaction consuming the entire block gas limit. We also assume that the user submits a transaction either directly to a smart contract, or uses a dApp as a front-end for a smart contract (with the wallet connected to this dApp). Next, we describe each of the nine steps of a successful transaction security testing using TxT.
❶ Unplugging from Distributed Node: Virtually all Ethereum wallets are connected to Mainnet using a distributed Ethereum Access Service, such as Infura [22] or POKT [24]. In order to test transactions with TxT, the user should connect to the TxT server.

❷ Connecting to TxT Node: Popular advanced Ethereum wallets, such as MetaMask, allow connecting to a custom Ethereum network by providing its address and port. In most wallets, this is a one-time setup, after which the user can use a drop-down menu to switch between the TxT service and Mainnet.

❸ Sending Test Transaction: Once the user switches to the TxT network, which is essentially the Mainnet network accessed through the TxT Ethereum node, the user submits a transaction as if it were a usual transaction. This prompts the wallet to show the confirmation dialog, asking the user to select or manually enter the fee parameters. The user specifies a very low gas price (e.g., 1 wei)⁴⁶. Once the transaction is submitted, TxT immediately begins processing it.

❹ Transaction Forwarding: Next, the instrumented node forwards the transaction to the Ethereum Mainnet network using the gossip P2P protocol. Since we aim at preventing the execution of the test transaction by the Mainnet network at large, we expect the transaction to be rejected by all other nodes except the instrumented TxT node due to a very low gas price (i.e., $T_p \ll \mu(T_p)$) specified by the user in the wallet.

❺ Rejection by Mainnet at Large: Ethereum nodes place transactions into transaction pools, in which transactions are awaiting execution. Our experiments confirm that severely underpriced transactions are rejected by most Ethereum nodes early on, without reaching the transaction pools.

❻ A Posteriori State: Since the user’s wallet is connected to the TxT network, the state of the TxT node becomes the ground truth for the wallet or a dApp connected to that wallet.
Therefore, the test transaction rejected by the Mainnet network outside of the TxT node will be seen as executed by the wallet. We call this situation the a posteriori state, i.e., the state of the blockchain caused by the execution of the test transaction.

⁴⁶ Some wallets prohibit tiny gas prices for Mainnet transactions. However, they do not impose gas price limits on TxT because it is a custom network.

❼ Test Transaction Status: TxT provides the status information for each transaction (delivered via a web page, API, or other methods). After submitting the transaction, the user will observe one of the following four test transaction statuses:
S1: Transaction is unconditionally testable (σ-deterministic) and valid (unexpired);
S2: Transaction is σ-deterministic, but it is expired;
S3: Transaction is σ-nondeterministic, but TxT found a potential vulnerability; and
S4: Transaction is untestable (σ-nondeterministic and no vulnerabilities found).
If the transaction is successfully executed and unexpired (S1), the user may submit the transaction to Mainnet, provided the a posteriori state is satisfactory. If the transaction is successfully executed but expired (S2), then the user should repeat the test. If the transaction is σ-nondeterministic with a potential vulnerability warning (S3), the user cannot rely on TxT for testing the transaction, but TxT provides a warning facilitating the assessment of risks via traditional methods. Finally, if TxT determines that the transaction is untestable (S4), the transaction cannot be evaluated by TxT.

❽ Reconnecting to Distributed Node: If the transaction is testable, unexpired, and the a posteriori state matches the user’s expectation, then this transaction can be safely submitted to Mainnet for final execution. In this case, the user switches the wallet back to the Mainnet network node for submitting the transaction as usual.
❾ Submitting Mainnet Transaction: An unexpired σ-deterministic transaction is guaranteed to have the same outcome as the test transaction. Even if the transaction expires right at the moment it is submitted, the user might initiate an emergency cancellation of the transaction before it is mined, following the Ethereum transaction replacement procedure supported by most crypto wallets [86].

The above procedure corroborates that a TxT user does not need advanced technical skills (e.g., understanding the contract code), nor do they need to meticulously investigate the safety of a planned transaction (or transaction sequence). Moreover, the user assesses the outcome of the test transaction(s) using familiar interfaces, such as a crypto wallet and/or dApp.

Table 5.2: Summary of vulnerability coverage by state-of-the-art defense tools.
Columns: vulnerabilities by SWC Registry number†, SWC-100 through SWC-136.
Rows (defense tools): Oyente [200], Securify [271], Mythril [215], Sereum [240], Vandal [76], sGuard [220], ZEUS [168], ConFuzzius [119], VeriSmart [261], SmarTest [260], Osiris [269], ECFChecker [137], Maian [223], TxT (our work).
Cell marks: full support; partial support; no support; explicit detection of vulnerability.
† https://swcregistry.io/.
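The four test transaction statuses (S1 to S4) described in Section 5.5.8 can be summarized as a small decision function. The sketch below is illustrative only: it assumes that the opcodes observed in the execution stack and the expiration verdict are already available, the names `classify`, `Status`, `SIGMA_OPCODES`, and `SWC_MARKERS` are hypothetical, and the rule that any opcode named in Section 5.5.3 as an SWC-116 or SWC-120 signature yields S3 is an assumption about how the warning is raised.

```python
from enum import Enum
from typing import Iterable, Set, Tuple

# Opcode set from Section 5.5.3 that incurs sigma-nondeterminism.
SIGMA_OPCODES = frozenset({
    "BLOCKHASH", "NUMBER", "COINBASE", "GASLIMIT",
    "DIFFICULTY", "TIMESTAMP", "GASPRICE", "BALANCE",
})

# Vulnerability signatures named in Section 5.5.3 (BALANCE has none).
SWC_MARKERS = {
    "BLOCKHASH": "SWC-120", "COINBASE": "SWC-120", "GASLIMIT": "SWC-120",
    "DIFFICULTY": "SWC-120", "GASPRICE": "SWC-120",
    "NUMBER": "SWC-116", "TIMESTAMP": "SWC-116",
}


class Status(Enum):
    S1 = "sigma-deterministic and unexpired"
    S2 = "sigma-deterministic but expired"
    S3 = "sigma-nondeterministic with a potential vulnerability warning"
    S4 = "untestable"


def classify(executed_opcodes: Iterable[str], expired: bool) -> Tuple[Status, Set[str]]:
    """Return the test status and the set of SWC warnings (possibly empty)."""
    hits = SIGMA_OPCODES & set(executed_opcodes)
    if not hits:
        # Definition 5: no overlap with the opcode set, so sigma-deterministic.
        return (Status.S2 if expired else Status.S1), set()
    warnings = {SWC_MARKERS[op] for op in hits if op in SWC_MARKERS}
    return (Status.S3 if warnings else Status.S4), warnings


example = classify(["SLOAD", "TIMESTAMP", "SSTORE"], expired=False)
```

In this sketch a σ-nondeterministic transaction never reaches S1 or S2, regardless of its expiration verdict, which mirrors the ordering of the four statuses in the text.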
5.6: Implementation and Evaluation
In this section, we evaluate our implementation of TxT to confirm the feasibility of its real-world deployment.

5.6.1: Implementation and Deployment
We implement TxT by instrumenting Go Ethereum 1.10.10 and adding data-processing modules using Node.js 12.22.5 with Web3.js 1.2.6 and Python 3.9.7. In order to prevent accidental disruption of the normal Ethereum execution, our instrumentation of Go Ethereum includes only the minimal necessary modifications, i.e., gathering and saving chain data, and overriding the gas price bottom limitations only for specified accounts representing the customers of a TxT server. The gathered data is then processed independently of the node by external Node.js and Python modules.

We deploy TxT on a Dell PowerEdge T640 server with two Intel Xeon Gold 5218 CPUs, 250 GB RAM, and a SATA SSD (6 Gbps throughput), connected to a 1 Gbps wired Internet link. The instrumented TxT node uses the full synchronization mode with one CPU mining thread (for enabling opcode execution), and 8,192 MB of cache. In the current implementation, we use SSH and a text interface for test transaction status retrieval.

5.6.2: Vulnerability Coverage by TxT
We implement 37 cases reproducing all the cataloged smart contract vulnerabilities in the SWC Registry [42]. After that, we test all the transaction sequences reproducing these vulnerabilities on a TxT deployment to assess which vulnerabilities are detectable by TxT. One important aspect of this assessment is that we judge the ability of TxT to reveal a vulnerability not only based on our sample implementation, but also based on the ability to address all possible vulnerable implementations. We compare TxT with 13 state-of-the-art defenses based on their self-reported coverage disclosure. The result, shown in Table 5.2, demonstrates that TxT significantly outperforms all the state-of-the-art tools in terms of the number of vulnerabilities it is able to defend against.
Specifically, all the state-of-the-art tools combined only detect and/or prevent 15 out of 37 vulnerabilities (40.5% coverage), while TxT deterministically prevents 31 out of 37 (83.8% coverage). Furthermore, if we add the warnings of potential insecurity to our assessment, the vulnerability coverage by TxT reaches 89.2%.

Some vulnerabilities, such as SWC-105, SWC-115 and SWC-134, are semantic-dependent, i.e., they rely upon understanding of the intent of the developer and/or user, and therefore they are only supported by the heuristic tools. For example, a pattern corresponding to SWC-105 (“Unprotected Ether Withdrawal”) is perceived as a dangerous omission in most contracts, but the same behavior could be correct if the contract is designed to be an Ether faucet. Moreover, SWC-136 (“Unencrypted Private Data On-Chain”) still remains unsupported by all existing tools, including TxT. Addressing this vulnerability would require the identification of a leaked secret, which is an insurmountable task.

Table 5.3: Number (×10⁶) and percentage of accounts exhibiting state retention within set time threshold.
Counting condition                          State retention threshold (θ_exp)
                                            60 sec.            600 sec.           3600 sec.
All txns testable (min(Δt) > θ_exp)         122.38 (93.19%)    115.11 (87.65%)    109.68 (83.52%)
Avg. txns testable (µ(Δt) > θ_exp)          124.80 (95.04%)    121.50 (92.52%)    119.08 (90.67%)
90% testable (P_90%(Δt) > θ_exp)            124.86 (95.08%)    121.59 (92.59%)    119.19 (90.76%)

5.6.3: Transaction Expiration Rate
TxT tests are prone to expiration due to the constantly changing state of the blockchain. In this evaluation, we gather over 1.3 billion transactions (from the Genesis block until November 5, 2021) submitted to over 131 million Ethereum accounts (smart contracts and EOAs) to find the percentage of accounts resilient to transaction expiration.
To assess the transaction expiration resiliency, we pick three time thresholds (1 minute, 10 minutes, and 1 hour), and we group all accounts into three categories: 1) those that have never experienced transaction expiration within the set threshold; 2) those whose transactions on average (mean) do not expire before the threshold; and 3) those with 90% or more of their transactions not expiring before the set threshold, as shown in Table 5.3. The experimental result demonstrates that, statistically, the vast majority of test transactions will not expire within a reasonable time, sufficient for submitting the final transaction to Mainnet. However, if the test expires before the final Mainnet transaction is submitted, the user can repeat the test, and the probability of success after multiple tests will be $P_{succ} = 1 - (1 - P_{single})^{k}$, where $P_{single}$ is the probability of success for a single test within the given time threshold, and $k$ is the number of attempts. Thus, even if a transaction expires before the user submits the final one, a repeated test will address the problem, as shown in Fig. 5.5.

Figure 5.5: Probability of avoiding expiration via repeated testing.

5.6.4: σ-nondeterministic Transactions

In this work, we propose a paradigm that allows deterministically predicting the result of a transaction at the expense of rejecting a small portion of transactions that we call σ-nondeterministic. TxT is unable to guarantee the outcome of a σ-nondeterministic transaction; however, we are able to partition these transactions into potentially unsafe (prone to SWC-120 and SWC-116 vulnerabilities) and untestable (not necessarily vulnerable, but the result is unpredictable).

Fig. 5.6 shows the result of processing over 1.3 billion Ethereum transactions with opcode analysis of their execution stacks.
The result shows the counts of opcode presence events (i.e., several identical opcodes within one call stack count as one event), divided into three groups: untestable (no vulnerability markers), SWC-120 markers, and SWC-116 markers. The latter two groups produce respective warnings regarding possible vulnerabilities, while the untestable transactions are to be rejected by TxT.

The evaluation shows that approximately 86.27% of all transactions are σ-deterministic and 13.73% are σ-nondeterministic. Out of almost 185 million σ-nondeterministic transactions, only 25.5% are purely untestable, which means that TxT completely rejects about 3.5% of transactions (13.73% × 25.5%) and gives at least partial results for 96.5% of transactions. We believe that through a deep opcode and EVM stack analysis it is possible to further reduce the rate of σ-nondeterministic and untestable transactions.

Figure 5.6: Occurrence of σ-nondeterministic opcodes in the execution stack of 1.3 billion Ethereum transactions.

5.6.5: Underpriced Transactions in the Wild

The transaction underpricing approach utilized by TxT raises a concern: if a block does not have enough properly priced competing transactions, the underpriced test transaction might be included in this block [288]. Our evaluation shows that Ethereum Mainnet has 2,506,498 zero-priced transactions (as of November 2021; see footnote 47). These transactions have been a known nuisance in the Ethereum community [87]. Although the rate of zero-priced transactions on Ethereum is only 0.186%, the very fact of their presence poses a threat to the feasibility of TxT. Fortunately, the EIP-1559 [79] proposal, which has been enforced since the London hard fork, solves the problem. Although the protocol adjustment does not explicitly target zero-priced transactions, it effectively makes these transactions impossible.
To verify this, we process over 111 million transactions soon after the London fork to confirm that none of them has a gas price lower than 1,423,420,054 wei (see Fig. 5.7). Thus, after the London fork, it is no longer possible to accidentally mine an underpriced transaction.

Footnote 47: For example, 0xc3fa8399ef7922aef0ec7278f7b4b5e28e7191eba3027ca1143af2cf17acae86

Figure 5.7: Minimal accepted gas price of 111,226,625 post-London transactions on Mainnet. Without loss of correctness, we apply the moving minimum function to the data.

5.6.6: TxT Delays and Transaction Efficiency

TxT is implemented as an instrumented Go Ethereum node, which incurs some additional transaction execution delays. Moreover, as TxT continues processing transactions, the transaction execution delay may increase due to the growing cache size of the TxSEA algorithm. In this part of the evaluation, we first measure the added per-opcode delay of TxT instrumentation over a large time period. Then, we make a projection of the added delay of transaction execution by a TxT node.

For our evaluation experiment, we compare the opcode execution delays between the instrumented TxT node and a pure Go Ethereum node. The experiment was conducted on the same Dell PowerEdge T640 server as the rest of the experiments. The time-critical core module of TxT is repeatedly invoked in the opcode processing loop of the Run function in core/vm/interpreter.go. We activate TxT for historical Mainnet transactions and collect timestamps at every iteration of the loop, which gives us the delay of execution of a single opcode. After that, we remove all the TxT code from the node, leaving only the timestamp collection instruction, and execute the same transactions again, this time without TxT. To better visualize the data without loss of generality, we collect the execution delays of 500 million executed opcodes, both with and without TxT, and split them into 500 frames, each containing 1 million opcodes.
Then, for each of the frames, we plot the difference between the average instrumented and non-instrumented delays, which we call the added delay (i.e., the difference in opcode processing delay between TxT and the baseline approach, see Fig. 5.8). The result shows that despite the growing TxSEA cache, TxT does not exhibit any noticeable growth in added opcode processing delay. Moreover, the average delay for each frame stays between 2,300 and 3,000 nanoseconds per opcode.

Figure 5.8: Additional time (in nanoseconds) that the instrumented TxT node spends on average for executing one opcode compared to the baseline non-instrumented node. Despite the growing cache size, the execution delay is not visibly increasing even after 500 million processed opcodes. The measurement is an average of 500 frames, each with 1 million opcodes.

In our next experiment, we count the number of opcodes executed by a sample of 100 million transactions in Ethereum Mainnet. Then, we create a distribution of the transaction opcode counts, shown in Fig. 5.9. As we can see, the vast majority of transactions execute fewer than 5,000 opcodes. The results of the evaluation show that most added opcode execution delays are under 3,000 nanoseconds, while the vast majority of transactions execute under 5,000 opcodes. Therefore, for the vast majority of transactions, the added delay caused by the TxT implementation does not exceed $3{,}000 \cdot 5{,}000 \cdot 10^{-6} = 15$ ms. Assuming that the sequence of transactions in a tested workflow does not exceed 10, the TxT delay per test will not be larger than 150 ms, which is negligible. The state-of-the-art smart contract tester Confuzzius [119], which claims enhanced time performance compared to previous testers, requires 500-1,000 seconds to achieve 75% instruction coverage. Compared to Confuzzius, TxT delivers an almost instant result because it dynamically tests transactions against the current state of the blockchain.
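The delay bound derived in this subsection can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not part of TxT: the function name `added_delay_ms` and its parameters are hypothetical, and the inputs are the upper bounds quoted above (3,000 ns of added delay per opcode, 5,000 opcodes per transaction, and at most 10 transactions per tested workflow).

```python
NS_PER_MS = 1_000_000  # nanoseconds in one millisecond


def added_delay_ms(opcodes_per_tx: int,
                   added_ns_per_opcode: int,
                   txs_per_workflow: int = 1) -> float:
    """Upper bound on the added delay, in milliseconds (hypothetical helper)."""
    total_ns = opcodes_per_tx * added_ns_per_opcode * txs_per_workflow
    return total_ns / NS_PER_MS


# Bounds quoted in the text: 3,000 ns per opcode and 5,000 opcodes per transaction.
per_tx_ms = added_delay_ms(5_000, 3_000)
# A tested workflow is assumed to contain at most 10 transactions.
per_test_ms = added_delay_ms(5_000, 3_000, txs_per_workflow=10)

print(per_tx_ms, per_test_ms)  # prints: 15.0 150.0
```

Keeping the three bounds as separate parameters makes it easy to see how the estimate would change for workflows with heavier transactions or longer sequences.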
Figure 5.9: Number of executed EVM opcodes per transaction, based on the sample of 100 million transactions.

Last, but not least, we evaluated the performance and feasibility of TxT over the Proof-of-Stake consensus that was recently adopted by the Ethereum network. Except for the necessity to update several command-line parameters, we did not notice any performance or other difference between TxT operating on Proof-of-Work consensus versus Proof-of-Stake.

5.7: Limitations and Discussion

This is the first work on a deterministic approach to smart contract testing using transaction encapsulation. We believe that our work opens up a new era of non-heuristic audit of smart contracts. However, the paradigm shift comes with some remaining challenges and open questions.

Testing Eligibility: The current implementation of TxT provides a practical proof of concept of the transaction encapsulation framework. However, the rate of σ-nondeterministic transactions is still high. Our evaluation shows that the major culprits are the NUMBER and BALANCE opcodes in the call stacks of the tested transactions. However, we observed that the conditional statements involving the NUMBER opcode often turn into tautologies or contradictions when $T_{b,i}$ is larger than a certain value. In other words, we have sufficient reason to believe that the rate of σ-nondeterministic transactions can be drastically reduced by designing more fine-tuned procedures for identifying σ-nondeterministic transactions.

State Expiration: Blockchain is a dynamic multi-user environment in which executed transactions constantly interfere with one another. This interference is the cause of TxT test expiration. In this work, we use a coarse assumption $T_{b,j} > T_{b,i} \wedge T_{t,j} = T_{t,i}$ for determining the transaction expiration.
However, we believe we can significantly reduce the rate of expiration by exploring the execution stacks of purportedly interfering transactions and determining which of them effectively interfere with the testing transaction.

Custom RPC Support by Crypto Wallets: The TxT design assumes that the user wallet, which is the proxy to the Ethereum ecosystem, supports adding a custom RPC network. While most popular wallets support this feature, some other wallets (e.g., MyEtherWallet [23]) do not. However, in the spirit of decentralization and trust elimination, most Ethereum wallets are open-source, which makes it easy to add a modification or plugin for supporting a custom RPC.

Transactions from Multiple Accounts: The current design of TxT assumes that the entire transaction sequence originates from the same account, which is by far the most obvious scenario. However, we also admit that some transaction sequences might require testing involving several accounts, such as in the case of multi-signature distributed token wallets. Although multi-account support is deliberately removed from the current design to avoid unnecessary complication, our analysis indicates that implementing this functionality amounts to improving some bookkeeping routines in the current design.

Deployment Scalability: The current implementation of TxT does not allow running multiple testing sessions on one instrumented node, which means that the TxT security provider must maintain a sufficient number of separate instrumented Ethereum nodes to accommodate all simultaneous testing requests. On the one hand, the computation cost is not a big concern because TxT instrumented nodes do not require competitive mining. On the other hand, each node must be fully synchronized, with approximately 500 GB of per-node storage. Yet, we believe it is possible to orchestrate testing to allow execution of non-conflicting transactions on the same node.
We leave this functionality for future research.

5.8: Chapter Summary

Traditional software often requires user confirmation of critical operations, such as deleting records or submitting web-based applications. Implementing the same mechanism in smart contracts is notoriously hard due to the TOCTOU issue caused by the ever-changing state of the blockchain. In this work, we provided the first solution to address this problem by allowing a user to preview and confirm transactions. To make it feasible, we formally determined the exact set of conditions for transaction replicability and introduced transaction encapsulation, a new framework for deterministic real-time transaction testing, which uncovers the outcome of the intended transactions or transaction sequences. Transaction encapsulation could effectively capture the unpredictable behaviors associated with known and zero-day vulnerabilities. We developed and implemented the transaction tester TxT. Through extensive experiments, we demonstrated that TxT prevents the exploitation of more than twice as many vulnerabilities as covered by the existing defense tools combined. In the spirit of open research, we will make TxT and all the evaluation artifacts open source.

CHAPTER 6: SMART CONTRACTS ON THE CLOUD (see footnote 48)

6.1: Introduction

Bitcoin is the first decentralized digital currency powered by blockchain with proof-of-work (PoW) consensus, which effectively prevents data tampering by anyone with less than half of the total computational power of the network [216]. Recently, Dembo et al. delivered a formal proof of the correctness of the above statement with respect to the original PoW Nakamoto consensus [106]. Although Bitcoin’s original purpose was to serve as a cryptocurrency transaction ledger, the unique properties of blockchain soon attracted researchers and engineers to re-purpose the technology for a plethora of decentralized applications, commencing the era of smart contracts.
Smart contracts are decentralized immutable programs that allow parties that do not trust one another to establish custom mediator-free protocols. For example, a smart contract can be used to help conduct an election in a decentralized manner [144, 275]. Another popular use case is fungible tokens, which can represent corporate shares, gift card balances, and even custom currencies. Recently, researchers and businesses proposed a wide variety of smart contract applications [159, 237, 239], some of which have already been adopted by nations’ governments and large industries [286]. However, the unique features of blockchain and smart contracts come at a high price of mediocre performance and bounded scalability.

One way to address the performance and scalability issues of blockchain is to use a private permissioned blockchain framework, such as Hyperledger Fabric [53], which only uses pre-installed smart contracts (called chaincode) and splits the voting power between a small number of fixed participants. Although such blockchains deliver performance improvement over public blockchains, the requirement to establish a trustworthy consortium of organizations running these blockchains prevents their wide adoption in many applications, such as cryptocurrencies and decentralized voting. Thus, public blockchains cannot be replaced by permissioned ones.

Footnote 48: This chapter is based on previously published work by Nikolay Ivanov, Qiben Yan and Qingyang Wang titled “Blockumulus: A Scalable Framework for Smart Contracts on the Cloud” published at the Proceedings of the 2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS). DOI: 10.1109/ICDCS51616.2021.00064 [162]. © 2021 IEEE. Reprinted, with permission, from Nikolay Ivanov, Qiben Yan and Qingyang Wang “Blockumulus: A Scalable Framework for Smart Contracts on the Cloud” (paper and IEEE titles are the same), July 2021.
Recently, a number of solutions have been developed to address the inherent performance and scalability issues of public blockchain, including partial off-chain computation [98], side-chaining [231], cross-chaining [287], sharding [130, 298], payment channels [110, 232], efficient consensus protocols [68], new blockchain architecture [262], and network optimizations [90]. However, all these solutions suffer from at least one of the following limitations. First, they could not deliver scalability in transaction throughput, data storage, and computation capacity at the same time. Second, the performance improvement is often incremental, but could be insufficient for many applications, such as retail payments. Third, they either do not support smart contracts, or their smart contracts are not Turing-complete [209], making it impossible to realize certain programming patterns. A recent blockchain scalability survey by Zhou et al. [308] concludes that a desired solution still has not been found.

In this work, we propose a conceptually new approach to the blockchain scalability problem: we use an existing blockchain as-is to enable smart contracts on an already scalable system: the cloud.

Observation 1. Centralized Service for Scalable Decentralized Contracts: The operation of decentralized systems is often supported by underlying centralized and/or permissioned services. For example, the decentralization of the Domain Name System (DNS) is based on the assumption that the Internet Corporation for Assigned Names and Numbers (ICANN), which oversees the system, is functional and trustworthy [170]. Such a pattern is also observed in public blockchains. Kwon et al. [180] formally demonstrate that classic public blockchains exhibit partial centralization incurred by concentration of compute power around a few mining pools.
Moreover, the decentralized nodes of blockchains use the Internet as a communication medium, which is subsequently enabled by a network of centralized routers and Internet service providers (ISPs), whose owners must comply with the regulations of local and federal jurisdictions. In this work, we extrapolate the above principle (i.e., the reconciliation of centralization and decentralization) to show that it is feasible and beneficial to build an environment that uses a centralized cloud as an underlying communication, storage and compute service for decentralized smart contracts. Particularly, this work demonstrates that cloud resources, such as storage and computation, can be treated as a utility (offered by a third party), which can support the operation of a decentralized network.

Observation 2. High Cost of Permissionless Network: Public blockchains are supported by unstructured permissionless P2P networks, where nodes can freely join and leave. To support such a flexibility, the blockchains use a gossip protocol for peer communication. In this protocol, the peers are unaware of the current configuration of the network, so they achieve the network-wide propagation of broadcast messages by forwarding them through a subset of known peers. This incurs a significant message propagation latency and strict limits on the amount of data that can be transferred [105, 298]. Moreover, to prevent Sybil attacks, in which an adversary creates a large number of fake identities for gaining greater voting power, the PoW consensus algorithm has been used by Bitcoin, Ethereum, and many other popular blockchains. The PoW consensus involves a heavy computation, resulting in enormous electricity consumption. As such, public blockchains pay a very high price for the flexibility of the underlying P2P network.
In this work, we show that a smart contract can be used to facilitate a decentralized consensus in an overlay smart contract environment built upon a centralized network of cloud providers, which drastically reduces communication and computational overhead.

Putting together the above observations, we develop the concept of overlay consensus, which aims to deliver decentralization to smart contracts in a centralized cloud instead of random P2P network nodes. As a result, a consortium of clouds can host a permissionless smart contract environment and sell the access to it, but it cannot control the execution of these contracts or interfere with the data stored by these contracts. To achieve this, we use a smart contract deployed on a public blockchain to accrue periodic proofs of decentralization reported by the cloud consortium. The smart contract is designed in a way that any attempt at foul play would inevitably generate a publicly-verifiable proof for the action of breaking the consensus protocol.

Table 6.1: Comparison of Blockumulus with state-of-the-art solutions.

Solution            General-purpose smart    Scalability improvement
                    contract support         TPS (a)   Storage   Compute
Algorand [130]      ✗                        ✓         ✗         ✗
RapidChain [298]    ✗                        ✓         ✗         ✗
Lightning [232]     ✗                        ✓         ✗         ✗
Ekiden [98]         ✓                        ✓         ✗         ✓
Arbitrum [167]      ✗                        ✗         ✗         ✓
Jidar [103]         ✗                        ✗         ✓         ✗
Monoxide [279]      ✗                        ✓         ✓         ✗
Plasma [231]        ✓                        ✗         ?         ✓
OmniLedger [175]    ✗                        ✓         ✗         ✗
Blockumulus         ✓                        ✓         ✓         ✓

(a) Transaction throughput (transactions per second).

In summary, we make the following contributions:

• We introduce Blockumulus (see footnote 49), a distributed framework for cloud contracts (bContracts; see footnote 50) based on the novel concept of overlay consensus.

• We implement the full Blockumulus stack along with a sample bContract, called FastMoney, for payment processing.

• We evaluate our Blockumulus implementation and the FastMoney bContract to show that the framework delivers low transaction latency, high transaction throughput, and affordable operation cost.
6.2: Comparison with SOTA

Table 6.1 compares the state-of-the-art (SOTA) solutions aiming to address the blockchain scalability and performance limitations. Although these studies improve the blockchain scalability, they could not simultaneously accommodate the growing demand for transaction throughput, data storage, and heavy computation in applications such as cryptography, AI, and big data analytics. In contrast, Blockumulus brings general-purpose smart contracts (i.e., smart contracts suitable for a variety of applications beyond cryptocurrency transactions) to the cloud, which improves blockchain scalability in terms of transaction throughput, data storage, and computation simultaneously.

Footnote 49: The name Blockumulus is the portmanteau of the words “blockchain” and “cumulus”, a type of cloud with the traditional puffy texture.

Footnote 50: bContract stands for “Blockumulus contract”.

6.3: System Design

In this section, we introduce the Blockumulus framework and its operation protocol.

6.3.1: Blockumulus Overview

Blockumulus is a framework that builds a decentralized environment for executing smart contracts upon a cloud consortium, i.e., a fixed set of M cloud nodes called cells, synchronized by the overlay consensus. The overlay consensus is empowered by a smart contract deployed on a third-party public blockchain, with independent auditors running software for an automated verification of the Blockumulus workflow (see Fig. 6.1). Next, we introduce the major concepts of Blockumulus.

Figure 6.1: Blockumulus overview.

Figure 6.2: Blockumulus state transition and data model.

Blockumulus Code Execution Model: The code execution in Blockumulus is performed in decentralized Blockumulus smart contracts called bContracts, as shown in Fig. 6.2. The code of bContracts is openly accessible, so that the execution of transactions could be verified by anyone.
The functions of bContracts are invoked through signed transactions arriving at the network, and the code in bContracts can be executed by appropriate interpreters. bContracts can be written in different programming languages, such as Python or JavaScript.

Blockumulus Data Model: All data in Blockumulus is openly accessible and managed via custom models implemented in the deployed bContracts. In order to store data as part of Blockumulus, each bContract must implement two interfaces: data fingerprinting and data cloning. Data fingerprinting is a function that produces a fingerprint of the bContract’s current state or previously saved state. The data cloning function asks the contract to temporarily save its current state of data for subsequent fingerprinting. Blockumulus then combines all the fingerprints reported by the bContracts into a single hash called the data snapshot fingerprint.

Overlay Consensus: The core idea of Blockumulus overlay consensus is to periodically report the hashes of data snapshots to a dedicated smart contract deployed on a public blockchain, as shown in Fig. 6.3. Once the report is submitted, it cannot be altered. Subsequently, if the report does not match the publicly available and independently verifiable snapshot, the cell cannot be trusted. In essence, the Ethereum smart contract serves as an online barometer of liveness and integrity of the Blockumulus deployment.

Blockumulus overlay consensus differs from the traditional Nakamoto consensus observed in popular public blockchains in two major ways. First, Blockumulus consensus uses a correctness check instead of voting: all incoming transactions are recorded, and there is only one correct way to execute them, such that the existence of two conflicting transactions in different cells is ruled out.
Second, all transactions are executed immediately, during the open session with the client, with a pre-defined decision deadline; as a result, a consensus partitioning (called a fork) is impossible in Blockumulus. Unlike in a distributed database, which stipulates identical query execution in all tables, Blockumulus provides autonomous but distinct execution environments for each individual bContract. The contracts with mismatching fingerprints can be excluded from the consensus, and timely fingerprint reports can be guaranteed even if some contracts are unable to establish consensus within their respective contexts.

The goal of each bContract is to assure that a transaction is executed identically across all the cells. To enforce this, after each transaction, the called bContract produces a fingerprint of its current data. If the fingerprints do not match, the bContract is temporarily excluded from the snapshot. As a result, each transaction entails an identical state transition of each cell in Blockumulus. If a cell becomes unresponsive or fails the verification, it is excluded from the consensus until the next report cycle.

Report Timing: Prior to deployment, the cloud consortium determines the system invariants that cannot be changed during the lifetime of the system. One of these invariants is the snapshot report period, denoted λ, which is measured in seconds. In Blockumulus, the report deadlines are all timestamps divisible by λ. Therefore, the last report deadline can be calculated as $t_d = t_c - (t_c \bmod \lambda)$, where $t_c$ is the current timestamp. Thus, the upcoming report deadline is calculated as $t_{next} = t_d + \lambda = \lambda + t_c - (t_c \bmod \lambda)$. Every data snapshot, denoted $S_i$, has a serial number $i$, which is called the report cycle, represented as $i = \frac{t_d - t_0}{\lambda}$, where $t_0$ is the deadline of the very first snapshot in the Blockumulus deployment.
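The report timing rules can be illustrated with a short sketch. It assumes, as stated above, that report deadlines are exactly the timestamps divisible by λ, so the last deadline is the largest multiple of λ that does not exceed the current timestamp. The function names (`last_deadline`, `next_deadline`, `report_cycle`) and the sample values are hypothetical and not taken from the Blockumulus implementation.

```python
def last_deadline(t_c: int, lam: int) -> int:
    """Largest timestamp divisible by lam that does not exceed t_c."""
    return t_c - (t_c % lam)


def next_deadline(t_c: int, lam: int) -> int:
    """Upcoming report deadline: one report period after the last deadline."""
    return last_deadline(t_c, lam) + lam


def report_cycle(t_c: int, t_0: int, lam: int) -> int:
    """Serial number i of the current snapshot, counted from the first deadline t_0."""
    # t_0 is assumed to be a report deadline itself, i.e., divisible by lam.
    return (last_deadline(t_c, lam) - t_0) // lam


# Illustrative values only: a 60-second report period, first deadline at t_0 = 1200.
lam, t_0, t_c = 60, 1_200, 1_325
print(last_deadline(t_c, lam))      # prints: 1320
print(next_deadline(t_c, lam))      # prints: 1380
print(report_cycle(t_c, t_0, lam))  # prints: 2
```

Because every deadline is a multiple of λ, each cell and each auditor can derive the same deadlines and cycle numbers from its own clock without exchanging any messages.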
Subsequently, the Blockumulus protocol requires that each cell reports the snapshot $S_i$ by the end of cycle $i + 1$ in order to be treated as valid during the cycle $i + 2$.

Figure 6.3: Reporting of current cell state to the smart contract.

6.3.2: Blockumulus Components

Next, we introduce the major components of Blockumulus: consortium of cloud cells, decentralized Blockumulus smart contracts (bContracts), clients, Ethereum smart contract, and independent auditors.

Cloud Consortium: The cloud consortium is a pre-defined set of Blockumulus cells. The number of cells should be sufficient to guarantee the availability of the system, but it should not be too large (i.e., it should be 10 or fewer) to avoid performance degradation. Unlike peers in blockchain, multiple cells in Blockumulus are used to achieve accessibility and fault-tolerance, rather than consensus, which will be detailed in Section 6.4. Moreover, since clouds allow vertical scalability (i.e., adding resources to existing entities), a large number of cells (horizontal scalability) is not needed for performance advancement either. The size of the consortium and the set of identities of the participating cells are the invariants that must be decided at the time of deployment.

Blockumulus Cell: A Blockumulus cell is a network node on the cloud, which is sufficient for participating in Blockumulus consensus. A cell can be represented by a virtual machine, a physical dedicated server, or a compute cluster, whichever meets the demands of the system.

bContracts: Blockumulus smart contracts (bContracts) are decentralized programs deployed on Blockumulus, whose functionality is similar to smart contracts in Ethereum or chaincode in Hyperledger Fabric. There are two types of bContracts: system bContracts and community bContracts. The system bContracts are pre-deployed in Blockumulus, and they cannot be removed. The community bContracts are developed and deployed by clients.
Blockumulus Clients: A Blockumulus client is a person or software that interacts with a deployed bContract. Blockumulus is a permissionless environment for clients, which means that clients do not have to register a Blockumulus account. However, akin to the ISP model for Internet access, a client should have a subscription to Blockumulus through one of the cells. The subscription, however, does not entail any control over the use of Blockumulus. The purpose of the subscription is to charge for the data transferred or for the time period during which the subscription is active. This contrasts with the transaction fee collection observed in public blockchains. As a result, Blockumulus offers flexibility that allows cells to establish their own pricing policies to compete for customers.

Ethereum Smart Contract: Each Blockumulus deployment has a smart contract on the Ethereum blockchain, which stores hashes of the reported snapshots. To avoid retrospective modification, repeated reporting for the same timestamp is prohibited by the logic of the smart contract.

Blockumulus Auditors: Akin to public blockchain, Blockumulus is an open-data system with transparent execution, i.e., Blockumulus data is available to everyone, and everyone can independently trace the state transition between a given pair of subsequent data snapshots. Auditors are voluntary permissionless participants that run software to oversee the integrity of the Blockumulus deployment. The community auditing model, which demonstrated its efficiency in public blockchains, is also employed in Blockumulus. Auditors can be paid participants, community enthusiasts, security bounty hunters, or academic researchers. Moreover, cells in the consortium can perform cross-audit. The process of auditing requires only a server and the auditing software that is running on this server to monitor the integrity of Blockumulus. Fig. 6.4 shows the procedure of the Blockumulus audit.

Figure 6.4: Blockumulus audit procedure.
The auditing software performs two major tasks: snapshot succession audit and data integrity audit. The snapshot succession audit is the verification that all the transactions processed by all bContracts between two reports indeed entail a state transition from one data snapshot into another. The data integrity audit verifies that: a) the snapshot fingerprints have been reported to the smart contract on time; and b) the fingerprints in reports match the actual data in the cells.

6.3.3: Blockumulus Cell Architecture

In this section, we take a closer look at the architecture of a cell, which is shown in Fig. 6.5.

Figure 6.5: Blockumulus components and bContracts.

Blockumulus Core: The Blockumulus cell Core is responsible for networking, cryptography, synchronization, protocol, process and thread management, signature and authenticity verification, transaction parsing, data encoding and decoding, and communication with the smart contract.

Uniform RESTful Interface: Blockumulus assumes six vectors of communication: client-cell, cell-cell, auditor-cell, cell-blockchain, auditor-blockchain, and client-auditor. The client-cell, cell-cell, and auditor-cell communications have a uniform RESTful interface. Specifically, each request is either a GET or a POST HTTP request with the body formally represented as the following set: $M = \{P = \langle A_s, A_r, O, \eta, \tau, t, D \rangle, \mathrm{Sig}_s(P)\}$, where $P$ is the payload of the message, and $\mathrm{Sig}_s$ is the ECDSA signature calculated via the private key of the sender. The tuple $P$ has the following components: $A_s$ is the public address of the sender, $A_r$ is the public address of the intended recipient, $O$ is the operation code, $\eta$ is a random nonce used as a message ID, $\tau$ is the ID of the message that $M$ is replying to (if applicable), $t$ is the current timestamp, and $D$ is the data, whose format is determined by $O$.

Keys: Each cell uses an Ethereum account to represent itself within Blockumulus.
The set of public addresses51 of Blockumulus cells is fixed for each deployment and is hard-coded in the Ethereum smart contract.

51 In Ethereum, a public address of an account consists of the rightmost 160 bits of the Keccak-256 hash of the account's public key.

System Invariants: The parameters of a Blockumulus deployment that remain constant for its lifetime are called the system invariants. Examples of system invariants are: the unique deployment ID, the identities of the cells, the reporting period λ, the initial timestamp t0, etc. However, the IP addresses of cells are not among the invariants, which allows cells to change their location or network configuration; we assume that these settings are exchanged between cells.

System bContracts: The system bContracts are pre-implemented as part of Blockumulus, and they cannot be removed. These bContracts deliver essential functionality to the system, and their number can grow as the Blockumulus framework evolves. The current version of Blockumulus includes two system bContracts: the community bContract deployer and content-addressable storage (CAS). The community bContract deployer serves as an interface for developers to add their community bContracts to Blockumulus. The CAS contract has two major functions: a) it allows large files to be stored outside of the data models of community bContracts, thereby significantly improving the performance of fingerprinting and cloning; and b) it establishes a secure communication channel between bContracts, which are otherwise autonomous and isolated.

Community bContracts: Community bContracts are deployed by users of Blockumulus. The cells have no power to modify, censor, or control these contracts. The deployer of a community bContract can specify the ownership and other parameters of the contract, including the ability to destroy it.
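The uniform message format M = {P, Sig_s(P)} used by the RESTful interface can be illustrated with a minimal Python sketch. It is only an assumption-laden illustration, not the Blockumulus implementation: the field names, the JSON canonicalization, and the `TRANSFER` operation code are hypothetical, and the signature is replaced by a keyed-hash placeholder so that the example needs only the standard library (a real deployment would use ECDSA over the sender's Ethereum key).

```python
import hashlib
import json
import secrets
import time
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class Payload:
    """Payload tuple P = <A_s, A_r, O, eta, tau, t, D> (field names are hypothetical)."""
    sender: str              # A_s: public address of the sender
    recipient: str           # A_r: public address of the intended recipient
    opcode: str              # O: operation code
    data: dict               # D: data whose format is determined by O
    reply_to: str = ""       # tau: ID of the message being replied to, if any
    nonce: str = field(default_factory=lambda: secrets.token_hex(16))  # eta
    timestamp: int = field(default_factory=lambda: int(time.time()))   # t


def canonical_bytes(payload_dict: dict) -> bytes:
    """Serialize P deterministically so signer and verifier hash identical bytes."""
    return json.dumps(payload_dict, sort_keys=True, separators=(",", ":")).encode()


def placeholder_sign(key: bytes, message: bytes) -> str:
    """Stand-in for Sig_s(P). This is NOT ECDSA; it only keeps the sketch stdlib-only."""
    return hashlib.sha256(key + message).hexdigest()


def build_message(payload: Payload, key: bytes) -> dict:
    """Assemble M = {P, Sig_s(P)} as a JSON-serializable request body."""
    p = asdict(payload)
    return {"P": p, "Sig": placeholder_sign(key, canonical_bytes(p))}


def verify_message(message: dict, key: bytes) -> bool:
    """Recompute the placeholder signature over P and compare with the attached one."""
    return placeholder_sign(key, canonical_bytes(message["P"])) == message["Sig"]


if __name__ == "__main__":
    payload = Payload(sender="0xA", recipient="0xB", opcode="TRANSFER", data={"amount": 10})
    msg = build_message(payload, b"demo-key")
    print(verify_message(msg, b"demo-key"))   # the untouched message verifies
    msg["P"]["data"]["amount"] = 11           # tampering with D invalidates the signature
    print(verify_message(msg, b"demo-key"))
```

Serializing the payload canonically before signing matters because the recipient must reproduce exactly the bytes the sender signed; any reordering of keys would otherwise break verification.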
bContract Interface: In order to create a bContract, the developer should implement a standard bContract interface, which includes the smart contract data model, data fingerprinting, and snapshot cloning. Then, the developer writes the bContract code for the interpreter specified in the configuration.

6.3.4: Blockumulus Protocol

Data Snapshots and Fingerprinting: Blockumulus data is stored in bContracts according to their respective data models. For example, one bContract can store data in binary files, while others may use SQLite. To avoid operations on large data instances, bContracts can upload data blobs to Blockumulus CAS and refer to these blobs via their hashes. Blockumulus performs CAS reference counting, purging CAS entries only when their reference counters reach zero.

Figure 6.6: Blockumulus lifecycle.

Operation Lifecycle: Fig. 6.6 shows the lifecycle of Blockumulus, which alternates between two stages: the main stage and the report stage. In the main stage, which is longer than the report stage, Blockumulus actively accepts and processes incoming transactions that shape the current data snapshot. During the main stage, auditors download the previous data snapshot for review and storage. In the report stage, Blockumulus accepts transactions, but instead of executing them, it queues them in a buffer. Once the current snapshot is fingerprinted, Blockumulus resumes executing incoming and queued transactions. As soon as the fingerprint is ready, the cell also saves it in the smart contract; execution does not wait for this submission, because the execution inhibition is needed only for calculating the fingerprint.

Transactions: Fig. 6.7 shows a general overview of a Blockumulus transaction.
The transaction begins with a client creating a transaction message M, which is signed and sent to the Blockumulus cell with which the client has an access subscription, called the service cell. The service cell first authenticates the transaction by confirming that the transaction message is signed by the user with the same identity (public address) as the one found in the transaction message. Then, the service cell forwards the transaction to all the cells in the consortium. After that, the cells of the consortium verify and execute the transaction and send a signed confirmation back to the service cell within a pre-determined short time frame.

Figure 6.7: Blockumulus transaction workflow. (1): the client creates a transaction and commits it to the Blockumulus cell with which they have a Blockumulus access subscription; (2): the service cell verifies the authenticity of the transaction and forwards it to all the other cells in the consortium; (3): the cells of the consortium process the transaction and send a signed confirmation back to the service cell within a strict deadline; (4): the service cell executes the transaction, serializes the confirmations into an aggregated receipt, and sends it to the client as a reply to the initial commit request.

If the forwarded transaction is not processed by all cells by the established deadline, the transaction reverts. If a cell misses the deadline more often than a pre-determined threshold, it is temporarily excluded from the consensus upon mutual agreement with the other cells. Finally, the service cell verifies the fingerprints of the resulting data snapshots reported by the other cells, and executes the transaction by itself.
If the result of the execution matches the fingerprints reported by the other cells, the service cell serializes the confirmations into an aggregated receipt and sends it to the client as a reply to the initial commit request, which constitutes the transaction confirmation event with a multi-signature cryptographic proof.

Incentive for Cooperation: Here, the incentive for cooperation is discussed from the P2P network perspective. Unlike public blockchain consensus (e.g., Nakamoto consensus), Blockumulus is designed to encourage cooperation and make cheating unprofitable. The combination of synchronous execution, fixed cell topology, open data, transparent execution, and a payment model separated from consensus creates an arrangement in which cells have no incentive to cheat. Moreover, each cell benefits from fast and successful execution of transactions by all other cells in the system. The following theorem confirms that competition for voting power, typical for blockchains, is not pertinent to Blockumulus.

Theorem 1: The minimum required number of valid cells in Blockumulus overlay consensus is the same for all $M \geq 2$.

Proof: As per the design of Blockumulus, the auditor software verifies that the deployment has at least one cell $i$ that maintains the succession of reported snapshots $S_{i,j}$ and the correctness of the corresponding smart contract reports $R_{i,j}$, i.e.:

$$\exists\, 1 \leq i \leq M \;\; \forall\, 1 \leq j \leq \frac{t_c \;\mathrm{MOD}\; \lambda - t_0}{\lambda} : \; S_{i,j} \xrightarrow{\text{succession}} S_{i,j+1} \,\wedge\, H(S_{i,j}) = R_{i,j}, \tag{6.1}$$

where $H$ is the hash function used for fingerprinting in Blockumulus. Suppose that $M = 2$, and one cell is valid, while all other cells may or may not be compromised or cheating. In this case, formula (6.1) evaluates to “true”, because either Cell 1 is valid, or Cell 2 is valid, or both of them are valid. Now, suppose that $M = Q$ ($Q > 2$), and one cell is valid, while all other cells may or may not be compromised or cheating.
In this case, formula (6.1) again evaluates to “true”, because there is a cell with an index in the range $[1, Q]$ that maintains the succession of snapshots and the correctness of the fingerprint reports. Therefore, the minimal number of cells required for the overlay consensus is always 1. ■

6.4: Scalability Analysis

In this section, we formally explore the scalability of Blockumulus through an asymptotic complexity analysis. All the assumptions in this section follow the real implementation of the system described later in Section 6.6. Here, we assume that $K$ clients submit $N$ successful transactions to a Blockumulus deployment with $M$ cells. We use the symbol $c$ to denote a constant value that does not grow as the system scales.

Number of Cells: Unlike blockchain, in which an increase of the number of nodes benefits decentralization, the Blockumulus overlay consensus requires only one valid cell to sustain normal operation, including prevention of conflicting transactions, such as double spending. As per Theorem 1, proven in Section 6.3, adding more cells does not enhance the decentralization of a Blockumulus deployment. Thus, we neither require the number of cells $M$ to be scalable, nor do we assume its scalability. The two reasons for using multiple cells in Blockumulus are to enhance the availability of the system through replication and to increase the diversity of Blockumulus access providers.

6.4.1: Transaction Latency

Transaction latency is the total delay experienced by the client between the initiation of a transaction and the confirmation of its completion. The cumulative transaction delay in the system, denoted $L_{delay}$, can be expressed as $L_{delay} = N \cdot (D_1 + \max_{i=2}^{M}(D_i + D_i^*) + D_c)$, where $D_1$ is the delay of sending a transaction to the service cell, $D_i$ is the delay in forwarding the transaction to cell $i$, $D_i^*$ is the delay of response from cell $i$ to the service cell, and $D_c$ is the delay of sending the response to the client.
We also assume that $D_i + D_i^* < \delta$ for all $i > 1$, where $\delta$ is the maximum transaction forwarding delay. Each of the $N$ transactions begins with the client sending it to the service cell, which simultaneously forwards the transaction to all the other cells, followed by an immediate parallel response from these cells to the service cell. Then, it finishes with the service cell sending the aggregate response to the client. Now, since $D_1$, $D_c$, $\delta$, and $M$ do not grow with an increased number of transactions, the transaction latency complexity can be presented as $L_{delay} = N \cdot (c + c \cdot c + c) = O(N)$. Therefore, the transaction latency in Blockumulus grows linearly with the number of transactions. Section 6.6.3 further shows that the transaction latency remains low even when the cells are deployed on low-tier cloud servers with an extreme transaction load.

6.4.2: Communication Overhead

Transaction communication overhead is the total amount of data transferred within Blockumulus in the course of $N$ transactions. The communication overhead $L_{data}$ of $N$ Blockumulus transactions can be expressed as follows:

$$L_{data} = N \cdot \Big[ H_c + P_c + (M - 1) \cdot (H_1 + H_c + P_c) + \sum_{i=2}^{M} (H_i + P_i) + \sum_{i=1}^{M} (H_i + P_i) \Big], \tag{6.2}$$

where $H_c$ is the header sent by a client, $P_c$ is the payload sent by a client, $H_i$ is the header sent by cell $i$, and $P_i$ is the payload sent by cell $i$. Since headers and payloads of messages do not become bigger with more transactions, and the number of cells remains constant, Eq. (6.2) can be reduced as $L_{data} = N \cdot [2c + c \cdot 3c + c \cdot 2c + c \cdot 2c] = O(N)$. Therefore, as the number of transactions grows, the communication overhead also experiences a linear increase. In Section 6.6.4 we show that this complexity is practically amenable and does not lead to bottlenecks even under an extreme transaction load.
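To make the two linear bounds above concrete, the following Python sketch evaluates the latency expression and Eq. (6.2) directly. All numeric inputs are hypothetical placeholders chosen for illustration, not measurements from this chapter, and the function and parameter names are assumptions of the sketch.

```python
from typing import Sequence


def cumulative_latency(n: int, d_1: float, d_fwd: Sequence[float],
                       d_resp: Sequence[float], d_c: float) -> float:
    """L_delay = N * (D_1 + max_{i=2..M}(D_i + D_i^*) + D_c).

    d_fwd[k] and d_resp[k] hold D_i and D_i^* for the M-1 non-service cells.
    """
    slowest_round_trip = max(f + r for f, r in zip(d_fwd, d_resp))
    return n * (d_1 + slowest_round_trip + d_c)


def communication_overhead(n: int, h_c: int, p_c: int,
                           h: Sequence[int], p: Sequence[int]) -> int:
    """L_data from Eq. (6.2); h[0] and p[0] belong to cell 1 (the service cell)."""
    m = len(h)
    forwarded = (m - 1) * (h[0] + h_c + p_c)               # (M-1) * (H_1 + H_c + P_c)
    sum_2_to_m = sum(h[i] + p[i] for i in range(1, m))     # sum over i = 2..M
    sum_1_to_m = sum(h[i] + p[i] for i in range(m))        # sum over i = 1..M
    return n * (h_c + p_c + forwarded + sum_2_to_m + sum_1_to_m)


if __name__ == "__main__":
    # Hypothetical delays (seconds) and message sizes (bytes) for a 4-cell consortium.
    fwd, resp = [0.10, 0.20, 0.15], [0.10, 0.10, 0.30]
    headers, payloads = [100, 100, 100, 100], [300, 300, 300, 300]
    for n in (1_000, 2_000, 4_000):
        print(n,
              cumulative_latency(n, 0.05, fwd, resp, 0.05),
              communication_overhead(n, 100, 400, headers, payloads))
```

With the per-transaction terms held constant, doubling N doubles both quantities, which is the O(N) behavior derived in the text.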
6.4.3: Data Storage

We assume that each transaction in Blockumulus leaves a data footprint $U_i$, which is replicated across participating cells, and also appears in three snapshots52: the snapshot currently being built, and also two previous snapshots left for auditing. The data storage can be written as $L_{storage} = 3 \cdot M \cdot \sum_{i=1}^{N} U_i$. Since the number of cells $M$ and the size of each stored data item $U_i$ do not grow with the increasing number of transactions and users, the following reduction takes place: $L_{storage} = 3 \cdot c \cdot N \cdot c = O(N)$. Therefore, the complexity of the stored data is linear with respect to the number of transactions.

52 Blockumulus uses the CAS subsystem to prevent unnecessary replication of the same data across several snapshots. However, since our analysis pursues the upper bound complexity, we assume 100% replication of the data.

6.4.4: Computation

In our Blockumulus compute analysis we take into consideration the processing performed both by cells and by auditors. We further assume that the number of auditors is linearly proportional to the number of users $K$, i.e., a certain percentage of users serve as auditors. Then, the cumulative computation overhead can be represented as $L_{compute} = K \cdot \sum_{i=1}^{N} C_i + M \cdot \sum_{i=1}^{N} C_i$, where $C_i$ is the amount of computation required for processing a single transaction $i$ on a single computer. Since each computational load and the number of cells remain the same with a growing number of transactions and users, we perform the following reduction: $L_{compute} = K \cdot N \cdot c + c \cdot N \cdot c = O(KN)$. Therefore, the compute overhead of Blockumulus has a linear dependency upon both the number of users and the number of transactions, which suggests that the cells may need to proportionally increase their compute power as the number of users grows. Since users are expected to pay for Blockumulus access, the above requirement is unlikely to form a scalability bottleneck.
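The storage and compute expressions above can be checked numerically with a short Python sketch. The inputs are hypothetical unit costs, and the function names are assumptions of the sketch; it is meant only to show how the O(N) and O(KN) behaviors differ when users and transactions grow together.

```python
from typing import Sequence


def storage_overhead(m: int, footprints: Sequence[int]) -> int:
    """L_storage = 3 * M * sum_i U_i (three snapshots, M replicas)."""
    return 3 * m * sum(footprints)


def compute_overhead(k: int, m: int, costs: Sequence[float]) -> float:
    """L_compute = K * sum_i C_i + M * sum_i C_i (auditor term plus cell term)."""
    total = sum(costs)
    return k * total + m * total


if __name__ == "__main__":
    m = 4  # fixed consortium size
    for k, n in ((10, 100), (20, 200), (40, 400)):
        # Hypothetical unit footprint (10 bytes) and unit cost (1.0) per transaction.
        print(k, n, storage_overhead(m, [10] * n), compute_overhead(k, m, [1.0] * n))
```

When K and N double together, the storage total doubles while the auditor share of the compute total quadruples, which is why the compute bound carries the product KN.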
6.4.5: Snapshot Reporting

Each Blockumulus cell reports fingerprints to the smart contract with a constant frequency $F = \frac{1}{\lambda}$. By representing the report timeline through $R$, the blockchain fee overhead is as follows: $L_{fee} = M \cdot R \cdot F$. Since the number of cells $M$ is fixed, the fee does not change over time, and the report frequency is also fixed, the overhead reduces to $L_{fee} = c \cdot c \cdot c = O(1)$. Therefore, as a Blockumulus deployment grows, the fee overhead remains in the same order.

6.5: Security Analysis

Blockchain is a target of a wide range of security threats, from consensus-based attacks [216] to social engineering attacks [158], and Blockumulus is not an exception. In this section, we scrutinize critical scenarios that pose security threats to a Blockumulus deployment, and we show how Blockumulus addresses these challenges.

6.5.1: Double Spending

Double spending is a situation in which two mutually exclusive transactions are executed by a distributed system, such as a repeated transfer of the same cryptocurrency balance. Consider a situation in which Alice, who has 10 crypto coins, creates a transaction that sends 10 coins to Bob, and another transaction with an identical timestamp that sends 10 coins to Charlie. After that, Alice simultaneously submits one of these transactions to Blockumulus through Cell 1, and the other one through Cell 2. Assume that the transaction storage of Blockumulus is properly implemented with a mutex-based storage (i.e., one that does not permit simultaneous write operations), which can be achieved through file locks or ACID databases. The two transactions will be saved in the ledger in the order of their arrival. Subsequently, the transaction that is executed second will be rejected, effectively preventing the double spending. Furthermore, Blockumulus transactions are executed synchronously by all cells.
Unlike blockchain, which allows a temporary partition into peers that have already processed a transaction and peers that have not, Blockumulus prohibits temporary asynchrony using the synchronous execution with a mutex-based storage. Therefore, the situation where Bob received 10 coins from Alice according to one cell and Charlie received 10 coins according to another cell is impossible.

6.5.2: Transactions Filtering Attack

Blockumulus cells might prevent routing of a certain transaction to a bContract via a transactions filtering attack. For example, consider a bContract that re-invests dividends if an investor fails to withdraw them by a certain deadline. The invested business might bribe the cloud consortium to filter out the withdrawal transaction, in which case the auditors will not be able to detect any anomaly. In Blockumulus, we address this issue by enforcing the execution of a transaction via the Ethereum smart contract. If a transaction is censored, it can be submitted directly to the smart contract, and the system protocol stipulates the necessity to execute all transactions submitted in this way. Since the smart contract is not under any party’s control, users have the ability to enforce a transaction even when Blockumulus has only one operational cell.

6.5.3: Consortium Conspiracy

The cells might conspire to tamper with the snapshots in three possible ways: 1) by modifying an existing transaction, 2) by removing an existing transaction, or 3) by injecting a new transaction. If an existing transaction is modified, it will immediately break the verification of the transaction signature generated by the sender. If an existing transaction is removed before the report is submitted to the smart contract, the receipt of this transaction signed by the cell becomes the proof of malice by the cell. Finally, if a new transaction is added before the report, it is a legitimate way to change data in the snapshot and does not need to be defended against.
Another type of consortium conspiracy is a system-wide subscription ban of a user by all Blockumulus providers. Fortunately, this type of conspiracy can be prevented in the same way as the transactions filtering attack (see Section 6.5.2), i.e., by letting users submit contingency transactions to the Ethereum smart contract.

6.5.4: Compromised Cells

An attacker might compromise one or several Blockumulus cells to skew the overlay consensus. Let us consider the worst-case scenario, in which the attacker gained full access to the majority of cells in a Blockumulus deployment to cause the Byzantine Fault event. In this case, a consensus node cannot verify the true state of the system based on the testimonies from the other nodes. However, Blockumulus is not prone to the Byzantine Fault scenario, because the Ethereum smart contract, deployed on a Byzantine Fault Tolerant (BFT) blockchain, prevents the cells from delivering inconsistent testimonies to different parties.

6.6: Implementation and Evaluation

We implement the Blockumulus framework and evaluate its transaction latency, communication overhead, transaction throughput, and operational cost. To account for different configurations, we test the system performance with three different sizes of the cloud consortia: 2, 4, and 8 cells.

Figure 6.8: Transaction latency for FastMoney funds transfer with different sizes of cloud consortia based on 500 requests. Panels: (a) 2 cells, (b) 4 cells, (c) 8 cells.

6.6.1: Implementation

We implement the full stack of Blockumulus for evaluation and proof of concept. The Blockumulus API is implemented using Web3.js 1.3.0 and node-rest-client 1.3.1. The Blockumulus core framework is implemented using Node.js 10, Express 4.17, and Web3.js 1.3. We deploy 8 Blockumulus cells on individual Ubuntu 20.04 servers on the Microsoft Azure cloud.
Then, we implement the Ethereum smart contract using Solidity 0.8.0, with the test deployment available on the Ropsten network at 0x2F2980067A524a9A12C46354D62B8D769Ee119AB. The implementation includes 2553 lines of code. To demonstrate the performance of Blockumulus, we implement a sample bContract called FastMoney using Python 3.6 and Web3.py 5.13 (for fingerprinting), which delivers a decentralized digital currency. Then, we implement the user clients for FastMoney and CAS in JavaScript and Web3.js, which are used for automated evaluation, as described below.

6.6.2: Test Setup

System Under Test: Our system under test (SUT) includes a set of cell deployments and an Ethereum smart contract on the Ropsten testnet. For latency evaluation, we use Blockumulus cell deployments with three different sizes of cloud consortia: 2, 4, and 8 cells. For each cell, we deploy an Azure B1ms instance with Ubuntu 20.04 LTS.

Test Harness: We use the Blockumulus API to create custom test clients with the additional functionality of generating a random account for each request to simulate different clients and avoid potential caching of data related to a single account. Then, we deploy 8 client pools, each of which is an Azure Virtual Machine running Ubuntu 20.04 LTS, scattered across different geographic regions for better simulation of a real-world distribution of clients.

Evaluation Metrics: In this work, we measure transaction latency, communication overhead, transaction throughput, and operational cost of our prototype.

6.6.3: Transaction Latency

We evaluate the transaction latency of our Blockumulus deployment by measuring the time between the submission of a transaction and the acquisition of the receipt. We conduct two latency evaluation tests: distribution of delays of standalone transactions under normal load, and transaction latency under the load of a large number of simultaneous transactions. The results of the first experiment are shown in Fig. 6.8.
In this experiment, we measure transaction latency for the funds transfer in the FastMoney bContract with cloud consortia of 2, 4, and 8 cells. For each consortium size, we execute 500 consecutive transactions and measure their confirmation delays. When the size of the consortium is 2, 90% of transactions execute in under 2 seconds. When we double the size of the consortium, the upper boundary of the confirmation delay of 90% of transactions increases by around 50%, which is slower than the increase of the number of cells. By doubling the number of cells again to 8, we observe that 90% of transactions finish in under 5 seconds, which is around 66% greater than in the case of 4 cells. Thus, the result indicates that the transaction latency grows slower than the number of cells.

In the second transaction latency measurement, we conduct a stress test with multiple transactions issued at the same time. For this experiment, we use the CAS system bContract, and run 9 experiments: with 5,000, 10,000, and 20,000 transactions, for each of the consortia sizes seen in the previous experiment, i.e., 2 cells, 4 cells, and 8 cells. Similar to the previous experiment, we can observe that in each configuration, as the number of transactions doubles, the transaction confirmation time increases by a lesser factor.

Figure 6.9: Transaction latency for simultaneous CAS upload requests with different sizes of cloud consortia. Panels: (a) 2 cells, (b) 4 cells, (c) 8 cells.

6.6.4: Communication Overhead

Table 6.2 shows the TCP overhead observed in Blockumulus while processing a transfer transaction with the FastMoney bContract. In order to observe the communication between cells, we create a 2-cell Blockumulus deployment on a local machine and run WireShark, in which we use the Follow TCP Stream function to observe the cumulative traffic of each communication for each direction.
The results show that, in the worst case, the largest communication is around 4 Kbytes per transaction in the downlink direction. A speed test using the Ookla software on several Azure servers revealed an available bandwidth of around 8.5 Gbps in the downlink direction and around 1 Gbps in the uplink direction. Since the overhead of a FastMoney transaction does not exceed 4 Kbytes, the 1 Gbps server bandwidth is capable of transferring the data of more than 30,000 transactions per second, which exceeds the average throughput of all credit card transactions in the world [1].

Table 6.2: Communication overhead in FastMoney (bytes).

Communication (a)         2 cells         4 cells         8 cells
                          in     out      in     out      in     out
CL ↔ C: fingerprint    1,200     516   2,179     516   4,135     520
CL ↔ C: payment        1,140     559   2,059     559   3,895     563
CL ↔ C: forward          667     947     667     946     667     947

(a) CL ↔ C: between client and cell; C ↔ C: between two cells.

6.6.5: Transaction Throughput

For this evaluation, we transfer a small amount of funds from one FastMoney account into another, measuring the full delay between the submission of the transaction and the receipt of the confirmation. We do not generate any failing transactions, nor do we observe any failures during the stress test. We run 9 experiments, matching three deployment configurations (2, 4, and 8 cells) with three sizes of transaction load (5,000, 10,000, and 20,000 simultaneous transactions), with the result shown in Fig. 6.10. The result demonstrates that, while an increased number of cells reduces the transaction throughput, a growing number of transactions makes the throughput larger. This is expected, because the latency grows slower than the number of simultaneous transactions, as was shown earlier. We attribute this “bulk discount” effect to the benefits of parallel execution, caching, and a significant reserve of available bandwidth due to the low communication overhead of Blockumulus.
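The bandwidth estimate from Section 6.6.4 can be reproduced with a few lines of Python. The sketch assumes that 1 Gbps means 10^9 bits per second and that 4 Kbytes means 4,096 bytes; it also repeats the calculation with the largest value in Table 6.2 (4,135 bytes). The function name is an assumption of the sketch.

```python
def max_transactions_per_second(bandwidth_bps: float, bytes_per_tx: int) -> float:
    """Upper bound on transactions per second that a link can carry."""
    return bandwidth_bps / 8 / bytes_per_tx


UPLINK_BPS = 1e9                 # assumed: 1 Gbps = 10^9 bits per second
FOUR_KBYTES = 4 * 1024           # assumed: 1 Kbyte = 1,024 bytes
LARGEST_TABLE_ENTRY = 4135       # largest per-transaction value in Table 6.2

if __name__ == "__main__":
    print(max_transactions_per_second(UPLINK_BPS, FOUR_KBYTES))
    print(max_transactions_per_second(UPLINK_BPS, LARGEST_TABLE_ENTRY))
```

Under both assumptions the bound stays above 30,000 transactions per second, consistent with the estimate in the text.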
6.6.6: Operational Cost

Blockumulus delivers transaction performance similar to credit card providers, along with the decentralization properties seen in cryptocurrencies such as Bitcoin. This reconciliation of performance and decentralization comes at the price of delayed final settlement of transactions. Specifically, any confirmed transaction hinges upon trust towards the cells until the corresponding snapshot is submitted to the Ethereum smart contract. Therefore, the frequency of snapshot reports defines the speed of final irreversible settlement of recent transaction sets in Blockumulus. Table 6.3 shows how much each of the participating clouds will pay in Ethereum fees in 24 hours for data validation based on the frequency of the reports. Depending on the projected user participation and other goals of the Blockumulus deployment, the consortium can balance the cost and frequency of reports.

Figure 6.10: Transaction throughput in Blockumulus.

For comparison, the average price per Ethereum transaction on January 13, 2021 was $5.72 [4], with approximately 1,000 daily transactions [5]. With the same number of daily transactions, the Blockumulus fee overhead per transaction would be 218.08/1000 = $0.218 with the 10-minute report frequency, which is about 26 times less than that in Ethereum. Moreover, the more subscribers a Blockumulus cell has, the smaller the amount of money required per user to cover the reporting fee. For example, if a Blockumulus cell has 10,000 active subscribers, the monthly reporting fee overhead per user would be only $0.65. We do not add the cost of auditing to the overall cost because cross-auditing is already a part of the normal cell operation, and third-party auditing does not incur any expense for Blockumulus cell operators.
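The cost figures quoted above follow from simple arithmetic, sketched below in Python. The sketch assumes that every report consumes the same amount of gas (taken from the 24-hour row of Table 6.3, which corresponds to one report per day) and that a month has 30 days; the constant and function names are hypothetical.

```python
GAS_PER_REPORT = 49_193        # 24-hour row of Table 6.3: one report per day
DAILY_USD_10_MIN = 218.08      # 10-minute row of Table 6.3
ETHEREUM_TX_FEE_USD = 5.72     # average Ethereum transaction price cited in the text


def daily_gas(report_period_minutes: int) -> int:
    """Gas spent per cell in 24 hours, assuming a constant gas cost per report."""
    reports_per_day = (24 * 60) // report_period_minutes
    return reports_per_day * GAS_PER_REPORT


def fee_per_transaction(daily_fee_usd: float, daily_transactions: int) -> float:
    """Reporting fee overhead attributed to each transaction."""
    return daily_fee_usd / daily_transactions


def monthly_fee_per_subscriber(daily_fee_usd: float, subscribers: int, days: int = 30) -> float:
    """Reporting fee overhead per subscriber over an assumed 30-day month."""
    return daily_fee_usd * days / subscribers


if __name__ == "__main__":
    for minutes in (10, 30, 60, 480, 1440):
        print(minutes, daily_gas(minutes))
    per_tx = fee_per_transaction(DAILY_USD_10_MIN, 1000)
    print(round(per_tx, 3), round(ETHEREUM_TX_FEE_USD / per_tx, 1))
    print(round(monthly_fee_per_subscriber(DAILY_USD_10_MIN, 10_000), 2))
```

The gas totals reproduce the Gas column of Table 6.3, and the last two lines reproduce the per-transaction overhead, the roughly 26-fold difference, and the monthly per-subscriber overhead stated in the text.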
6.7: Chapter Summary

We propose Blockumulus, the first scalable framework for deploying decentralized smart contracts on the cloud, to address the blockchain scalability limitations on three dimensions: transaction throughput, data storage, and computation. The core idea of Blockumulus is to exploit a novel overlay consensus which delivers decentralization to smart contracts in a centralized cloud instead of random P2P network nodes. Concretely, a consortium of centralized cloud computing nodes can host a permissionless smart contract environment where clients can control the execution of their customized contracts and manage the data stored by these contracts. Our evaluation on Microsoft Azure shows that Blockumulus can execute tens of thousands of transactions within a minute, which is on par with the average throughput of worldwide credit card transactions. By integrating the decentralization of smart contracts and the scalability of the cloud, Blockumulus takes the first step towards high-performance data-rich smart contracts with high transaction throughput.

Table 6.3: Cost of Blockumulus smart contract fees for participating cloud services based on the report period.

                  Cost per 24 hours per cloud provider
Report Period     Gas           Approx. USD (a)
10 min            7,083,792     218.08
30 min            2,361,264      72.69
1 hour            1,180,632      36.35
8 hours             147,579       4.54
24 hours             49,193       1.51

(a) With the market price of Ether $733 and gas price 22 GWei.

CHAPTER 7: DECENTRALIZED NETWORK OF WI-FI HOTSPOTS53

7.1: Introduction

The number of mobile Internet users has been steadily increasing, underscoring a pressing need for reliable wireless connectivity to be available everywhere, all the time. As opposed to cellular communications, WiFi provides a low-cost solution for wireless Internet access with a miniature infrastructure [99]. During the past two decades, WiFi has become the de facto standard for wireless local area networks (WLAN) and Internet-of-Things (IoT) [211].
The WiFi technology has been used to create hotspots to offer Internet access to users in their proximity. WiFi hotspots are typically seen in such venues as airports, cafes, hotels, etc. Private hotspots are often configured in enterprise, personal, and household networks to serve a limited number of WiFi-enabled devices. Both public and private hotspots often require authentication and/or payment. Two or more hotspots belong to the same authentication domain if they share the same authentication server and, if applicable, share a payment server. Although a number of technologies have been introduced for cross-domain authentication, such as Passpoint [12] and eduroam [2], WiFi hotspots are still partitioned into a multitude of incompatible domains, which makes seamless WiFi roaming infeasible. In this work, we introduce a practical solution for a universal (i.e., cross-domain) and decentralized hotspot network, which addresses the domain partitioning problem. Ultimately, we envision a fully automated cross-domain authentication between wireless APs provided by different businesses and private owners, forming a global permissionless decentralized network of free and paid hotspots. However, in order to achieve this goal, a number of existing hotspots’ shortcomings must be addressed.

53 This chapter is based on previously published work by Nikolay Ivanov, Jianzhi Lou and Qiben Yan titled “SmartWiFi: Universal and Secure Smart Contract-Enabled WiFi Hotspot” published at the Security and Privacy in Communication Networks. SecureComm 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 335. Springer, Cham. DOI: 10.1007/978-3-030-63086-7_23 [159]. Reproduced with permission from Springer Nature.
Motivation: Despite its obvious benefits and popularity, the current WiFi hotspot technology experiences significant shortcomings:

M1: Security: Public WiFi often eliminates password protection or conveys the passwords insecurely.

M2: Unreliable performance: The speed of a WiFi hotspot largely depends on several unpredictable factors, such as the number of connected users or the bandwidth consumption of each individual user. Moreover, the hotspot owners generally have no incentive for upgrading hardware and service.

M3: Limited access: Traditional WiFi hotspots do not offer a universal service for everyone. To be associated with a hotspot service, a user should be ascribed to a certain role or affiliation. The users’ access to the service hinges upon their particular subscriptions.

M4: Cumbersome procedure or high infrastructure cost: Connecting to a WiFi hotspot often requires extensive manual effort, such as searching for the SSID, entering payment details, specifying authentication settings, etc. Although WLAN direct IP access or 3GPP IP access enable easy configurations, they both rely on a heavy-cost cellular authentication infrastructure.

In this research, we envision that transferring the point of centralized trust from hotspot and/or client to a decentralized independent party, i.e., blockchain, enhances the security of the connection and payment while simplifying the configuration procedure (to address M1). The SmartWiFi hotspot establishes the dependency between the Quality of Service (QoS) and payment, which creates an incentive for hotspot owners to deliver a high QoS (to address M2). The proposed hotspot technology is universal and accessible, i.e., it serves all clients who have means to pay, while also supporting unrestricted free WiFi hotspots (to address M3).
The simplified configuration procedures offer a full automation of handshake, connection control, and checkout using the enforced execution of smart contract protocols without relying on complex server-based or cloud authentication infrastructure (to address M4).

Key Challenges: Designing a universal smart contract-enabled WiFi hotspot involves three major challenges. First, blockchain execution incurs significant processing delays, rendering the execution of many operations impossible within reasonable time limits. Second, blockchain offers very limited data storage. Third, blockchain networks charge considerable fees for executing block-modifying operations, e.g., payment transactions, smart contract deployments, smart contract state transitions, etc.

In this work, we present SmartWiFi, the first operational smart contract-enabled WiFi hotspot with automated cross-domain authentication. SmartWiFi leverages a novel off-chain protocol called Hash Chain-based Network Connectivity Satisfaction Acknowledgement (Hansa54) to manage a secure and reliable connection. An off-chain protocol establishes communication between two entities using blockchain, but executes without any interaction with blockchain, which allows Hansa to enable a fast, low-cost, and low-overhead provider-client interaction with a significant reduction of blockchain delays and fees. In addition, we present DupSet, a Dynamic User-Perceived Speed Estimation Technique, which reliably estimates the speed of the Internet connection for client-side QoS control. Leveraging these novel techniques, we design and implement SmartWiFi desktop and mobile apps using a smart contract executed by an Ethereum Virtual Machine (EVM). A video demonstration of the SmartWiFi app is available at https://youtu.be/jrDl204fGso.

This work makes the following main contributions.
• Protocol Design: To build SmartWiFi, we propose Hansa, a novel cryptographic scheme that provides cross-domain authentication and establishes a smart contract-enabled off-chain session arrangement between a hotspot and a client. It provides fast and low-cost smart contract execution by limiting blockchain transaction delays and fees. We also design DupSet to quantify the QoS of Internet access provided by SmartWiFi hotspots to clients. DupSet allows SmartWiFi clients to perform low-overhead bandwidth estimation to measure the quality of the Internet connection.

• System Implementation: We implement operational prototypes of the SmartWiFi router and client that use the Ethereum blockchain as a smart contract platform. Both components are cross-platform, hardware-agnostic, and can be easily deployed into existing infrastructure. In addition, we implement a fully-functional SmartWiFi Android app, demonstrating the feasibility of deploying SmartWiFi on non-rooted mobile platforms.

• Experimental Evaluation: We rigorously evaluate the delays, blockchain fees, and communication overhead of SmartWiFi on the Ropsten and Mainnet Ethereum networks. We also scrutinize the DupSet technique by juxtaposing its measurements with the results from nine popular bandwidth measurement services. Furthermore, we evaluate the scalability of SmartWiFi by demonstrating the stability of the system under the load of more than 100 simultaneous client processes connected to a single SmartWiFi router.

⁵⁴ The name is inspired by the Hansa Trade League, which successfully operated under the power of mutual trust for over a century in the turbulent political and economic environment of Medieval Europe.
7.2: Background and Key Insights

7.2.1: Blockchain and Smart Contracts

Formally, a blockchain is a distributed abstract data structure (ADS) represented by a list of objects (blocks), which are cryptographically linked in such a way that a modification of any block requires a chain recalculation (validation) of all subsequent blocks in the list. Consequently, any block-modifying operation, except append, incurs considerable execution time. The block validation speed is deliberately throttled in the proof-of-work (PoW) consensus protocol employed by Ethereum, Bitcoin, and some other blockchains, making retrospective modifications of these blockchains nearly impossible. Practically, the term blockchain is used to refer to one of many peer-to-peer (P2P) networks that store, synchronize, and cross-validate their respective blockchain data structures. A smart contract is a distributed deterministic application, deployed on blockchain and individually executed by the blockchain participants, with any associated data and results being part of the consensus. Therefore, smart contracts can establish, execute, and unequivocally enforce protocols and agreements between parties.

7.2.2: Threat Model

We consider a threat model with both malicious clients and malicious hotspots. Malicious clients would attempt to obtain Internet access without payment, which is known as a free-rider attack. They could also try to cause significant performance degradation or a complete shutdown of the hotspot. Malicious hotspots, on the other hand, aim to get payment from the clients without providing sufficient QoS. We assume that hotspots and clients have no knowledge of each other’s identities, and that they have no pre-established trust. Moreover, the blockchain, the smart contract, and its underlying cryptography are considered secure and trusted by hotspots and clients, i.e., we do not consider the wide range of attacks against blockchain [69] and smart contracts [271].
7.2.3: Overview of Key Insights

Recognizing the shortcomings of existing WiFi hotspots, we bring forth a set of key insights that lead to the design of SmartWiFi.

Off-Chain Interaction: High delays and fees in blockchain networks make it impractical to query the blockchain frequently for trust renewal. The idea of off-chain interaction, in which a smart contract is used by two or more parties as a guarantor of a protocol, but not as an executor of this protocol, has been proposed for fast and cheap payments [111, 232]. We extend this idea to WiFi hotspots by limiting the blockchain interaction to only the handshake (session initiation) and the payment resolution (session conclusion).

Cryptographic Satisfaction Acknowledgement: One of the key design goals of SmartWiFi is to develop a protocol that delivers a tamper-proof testimony of Internet usage time to the smart contract. The traditional approach is based on connection time and data size measurements performed by the provider itself, which relies on the assumption that the provider is trustworthy. However, more comprehensive Internet traffic accounting is needed to ensure proper mutual agreement and non-repudiation. We design such a scheme using periodic cryptographically verifiable acknowledgements sent by the client to the hotspot. Each subsequent acknowledgement attests to the client’s satisfaction with the quality of the Internet connection during the short period of time since the previous acknowledgement, which we call a session unit. Each acknowledgement can be cryptographically verified by the smart contract and exchanged for funds reserved in the contract.

Hash Chain Data Compression: The cryptographically verifiable acknowledgements need to be stored in the smart contract, resulting in fees and consumption of computational time.
Our key insight is to represent the set of acknowledgements by a hash chain, which can be generated from one random seed, so that the verification of each acknowledgement requires only the head of the hash chain. Therefore, the smart contract only needs to store one hash value, i.e., the hash chain head, to verify a series of acknowledgements. We use a hash chain-based arrangement instead of signatures to eliminate the need for the client to constantly use its private key, which makes SmartWiFi a safer option for unattended IoT devices.

Dynamic Speed Measurement: The satisfaction acknowledgement-based protocol stipulates that the client evaluates whether the Internet connection is satisfactory prior to sending an acknowledgement. Aiming for a fully-automated solution, we quantify the quality of the Internet connection using a dynamic speed measurement technique. Existing bandwidth estimation approaches require the transfer of a large amount of data, while we aim for frequent, fast, and low-overhead speed probes. Here, we simulate Internet activities using a set of HTTP servers deployed globally. The concept of measuring the speed of delivering an average web page, rather than consuming the available bandwidth, creates the possibility for frequent and low-overhead speed probes emulating actual user experience.

7.3: The SmartWiFi System

In this section, we present the design of the SmartWiFi system. Unlike traditional WiFi hotspots, SmartWiFi is a universal infrastructure that supports cross-domain authentication, i.e., anyone can use SmartWiFi as a client or as a hotspot, while the smart contracts authenticate the users by their Ethereum account (generated offline and stored by the user). In this work, we use Ethereum as the target platform due to its relative maturity and wide popularity. Fig.
7.1 depicts the basic building blocks of the SmartWiFi system, which consists of six major components: the SmartWiFi router, the router’s Ethereum wallet, the hotspot managed by the router, the client, the client’s Ethereum wallet, and the smart contract.

SmartWiFi is enabled by three main ingredients: the Hansa protocol, the DupSet speed measurement, and the smart contract. The Hansa protocol establishes and maintains an Internet connection, and it includes two major sessions: handshake and service. Payment and refund are processed after the client-router connection terminates. While handshake, payment, and refund require interaction with blockchain, the service session is executed off-chain. DupSet is a speed measurement technique that allows the clients to quantify their QoS satisfaction and continuously monitor the Internet access quality of SmartWiFi. The smart contract is designed to process the payment and refund.

Figure 7.1: SmartWiFi workflow. (1) an Internet-connected device (router) provides SmartWiFi hotspot service; (2) the client connects to the hotspot and sends it a hash chain head and its public address; (3) router provides a grace-period Internet access to the client and stores the public address in the smart contract; (4) client funds the smart contract; (5) router activates unrestricted Internet access for the client; (6) client periodically sends the router satisfaction acknowledgements (links of hash chain); (7) router claims payment from the smart contract using the last acknowledgement; (8) client is refunded by smart contract. The dashed lines represent the Hansa protocol communications.

7.3.1: SmartWiFi Setup

SmartWiFi uses a smart contract to serve as an intermediate trust layer to hold/release the payment and enforce fair behavior between the router and the client.
SmartWiFi also uses the router’s firewall policy to control the clients’ access privilege. The hotspot initiates SmartWiFi service after the router performs the following steps: (1) the SmartWiFi router deploys several reusable smart contracts, the number of which equals the maximum number of concurrently-served clients; (2) the router establishes a two-way communication channel with every user; (3) the router activates a default firewall policy that allows every client to have restricted access to required services, such as the blockchain API. We define the service period as the period during which the client is connected to the Internet via a SmartWiFi router, and the service unit as the minimum service period that the client will be charged for.

7.3.2: Hansa Handshake Session

In the Hansa handshake session, the router and the client establish a relationship regulated and protected by the smart contract. The Hansa protocol begins when the client connects to the hotspot and establishes a TCP connection with the SmartWiFi router. The router replies with a greeting message, and the client generates a hash chain Υ using a random secret seed Υ0. The length of the hash chain, denoted as |Υ|, is calculated as |Υ| = T/η, where T is the length of the service period, and η is the length of the service unit. For instance, in our prototype, the length of the service period is 3,600 seconds, and the length of the service unit is 60 seconds, i.e., one Hansa session serves a connection of up to 1 hour with per-minute acknowledgements. The client keeps the seed of the hash chain secret and sends the head of the hash chain Υhead and the public address Apub to the router. The router then prepares the smart contract by storing the client’s public address and the head of the hash chain in the smart contract. After that, the router replies to the client with the address of the smart contract.
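The hash chain arrangement of the handshake can be illustrated with a minimal sketch. It assumes SHA-256 as a stand-in hash function (the hash used by the prototype is not stated here), and the helper names `hash_step`, `build_hash_chain`, and `depth_to_head` are hypothetical. The sketch builds a chain of length T/η from a random seed and shows how a party that knows only the head can check whether a released value lies on the chain.

```python
import hashlib
import secrets
from typing import List, Optional


def hash_step(value: bytes) -> bytes:
    # SHA-256 is a stand-in; the hash function used by Hansa is not assumed here.
    return hashlib.sha256(value).digest()


def build_hash_chain(seed: bytes, length: int) -> List[bytes]:
    # chain[0] is the secret seed, chain[k] = H(chain[k-1]), chain[length] is the head.
    chain = [seed]
    for _ in range(length):
        chain.append(hash_step(chain[-1]))
    return chain


def depth_to_head(ack: bytes, head: bytes, max_depth: int) -> Optional[int]:
    # Hash the candidate forward until it meets the head; None means "not on the chain".
    current = ack
    for depth in range(max_depth + 1):
        if current == head:
            return depth
        current = hash_step(current)
    return None


SERVICE_PERIOD = 3600  # T, in seconds (prototype value from the text)
SERVICE_UNIT = 60      # eta, in seconds (prototype value from the text)
CHAIN_LENGTH = SERVICE_PERIOD // SERVICE_UNIT

chain = build_hash_chain(secrets.token_bytes(32), CHAIN_LENGTH)
head = chain[-1]                     # the only chain value shared at handshake
first_ack = chain[CHAIN_LENGTH - 1]  # the value one hash step below the head
print(CHAIN_LENGTH, depth_to_head(first_ack, head, CHAIN_LENGTH))  # prints "60 1"
```

In this sketch a verifier needs only the 32-byte head, which is the property the design relies on; a router that remembers the last accepted value could also verify each new one with a single hash step.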
Before executing prepayment, the client verifies the bytecode of the smart contract and the price per service unit ξ, which is hard-coded in the smart contract. If the price is unacceptable, the client terminates the connection. Otherwise, the client prepays the smart contract with the amount of cryptocurrency Ξ that corresponds to the cost of the entire service period, i.e., Ξ = ξ × T/η. Once the prepayment is processed by the blockchain, the client and router enter the Hansa service session.

7.3.3: Hansa Service Session

The Hansa service session begins after the router verifies that the client has funded the smart contract. Then follows the grace period, which seamlessly switches into unrestricted Internet access. Meanwhile, the router sends the client a short message signifying the beginning of a service session, and both the client and router start service session timers.

Figure 7.2: Hansa timeline with respect to the hash chain Υ, where Υhead = Υn = H(Υn−1), ..., Υi = H(Υi−1), ..., Υ1 = H(Υ0). In this scenario, the client disconnects after releasing acknowledgement Υi, and the acknowledgement Υi−2 was not released. When the service session timer expires, the router uses the last available acknowledgement Υi to request payment from the smart contract.

Satisfaction Acknowledgement: Traditional paid WiFi hotspots charge users ahead of the service, and if the QoS is unacceptable, requesting a refund is often challenging. We use cryptographic satisfaction acknowledgements to allow the client to control its service session and payment.
The first service unit of a service session is regarded as a free trial, during which the client confirms that the Internet connection is active and starts measuring speed (described in Section 7.3.4). Before the end of each service unit, the client confirms a satisfactory QoS by sending the router a satisfaction acknowledgement (the next hash in the hash chain), as shown in Fig. 7.2. The router verifies that the acknowledgement is a valid hash on the hash chain, replies with an acknowledgement response, and extends the connection for another service unit. Hansa allows the client to pause the connection, which can happen automatically as a result of a speed probe, or can be triggered manually by the user. If the router does not receive the next acknowledgement on time, it deactivates the Internet access for the client. During the service period, the client can resume acknowledging the service, which reactivates the Internet access with a maximum reactivation delay of η. The service session concludes when either the timer reaches the value T or the client-router connection breaks.

Table 7.1: Summary of smart contract features required for executing Hansa.

Feature                    Type        Access Control  Security Measure
Υhead (hash chain head)    variable    rw-r--r--       timer
Apub (public address)      variable    rw-r--r--       timer
ξ (price)                  constant    rw-r--r--       read only
T (session length)         constant    r--r--r--       read only
τ (refund delay)           constant    r--r--r--       read only
η (session unit length)    constant    r--r--r--       read only
balance check              function    r-xr-xr-x       read only
Υhead-accessor             function    r-xr-xr-x       read only
Υhead-mutator              function    r-xr--r--       timer
Apub-accessor              function    r-xr-xr-x       read only
Apub-mutator               function    r-xr--r--       timer
prepay (fund)              p-function  r-xr-xr-x       none
checkout                   function    r-xr--r--       Υ-check
refund                     function    r--r-xr--       delay (τ)

7.3.4: DupSet Speed Measurement

We present DupSet, a bandwidth estimation solution that allows SmartWiFi clients to quantify hotspots’ QoS. SmartWiFi is designed to operate in a flexible range of speeds and with different numbers of mobile or stationary users, so the bandwidth estimation should be frequent and low-overhead. Traditional speed evaluation methods include four metrics: capacity, available bandwidth, TCP throughput, and bulk transfer capacity (BTC) [234]. Although these techniques can provide very accurate results, they are not suitable for SmartWiFi since they require lengthy probes and the transfer of large amounts of data.

The core of DupSet is a metric called user-perceived speed, represented by the transmission component of the throughput when loading an average web page. Measuring the transmission component, instead of the entire end-to-end communication, makes the measurement transparent with respect to different bandwidth uses, such as video streaming services or VPN traffic. DupSet draws probes from pre-selected servers. Unlike many traditional bandwidth services, such as M-Lab [201] and Ookla [225], the DupSet servers do not need to deliver high computational and throughput performance. Each probe calculates a statistical summary⁵⁵ of readings from all the reachable servers from the list. Then, the current DupSet reading is calculated using a simple moving average⁵⁶.
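The probe pipeline of this subsection (per-server speed readings, a per-probe statistical summary, then a moving average) can be sketched as follows. This is an illustration under stated assumptions, not the prototype's code: the names `effective_payload`, `server_speed`, `probe_summary`, and `DupSetEstimator` are hypothetical, zero readings are assumed to be dropped before the summary, a non-positive delay difference is assumed to count as unreliable, and the average is taken over however many summaries are available until the window fills. The third quartile and the 6-period window follow the footnotes to this subsection.

```python
import statistics
from collections import deque
from typing import Deque, Optional, Sequence


def effective_payload(x: float, request_failed: bool) -> float:
    # EPF: zero on request failure or non-positive payload difference, else x.
    if request_failed or x <= 0:
        return 0.0
    return x


def server_speed(p1: int, p2: int, delta_d: float, request_failed: bool = False) -> float:
    # Speed_i = EPF(P1 - P2) / delta_D_i, in bytes/second.
    # Assumption: a non-positive delay difference is treated as unreliable (0.0).
    if delta_d <= 0:
        return 0.0
    return effective_payload(p1 - p2, request_failed) / delta_d


def probe_summary(speeds: Sequence[float]) -> Optional[float]:
    # Third quartile of the usable readings of one probe.
    # Assumption: zero readings (failed or unreliable servers) are dropped first.
    usable = sorted(s for s in speeds if s > 0)
    if not usable:
        return None
    if len(usable) == 1:
        return usable[0]
    return statistics.quantiles(usable, n=4, method="inclusive")[2]


class DupSetEstimator:
    # Simple moving average over the last `window` probe summaries.

    def __init__(self, window: int = 6) -> None:
        self.history: Deque[float] = deque(maxlen=window)

    def update(self, speeds: Sequence[float]) -> Optional[float]:
        summary = probe_summary(speeds)
        if summary is not None:
            self.history.append(summary)
        if not self.history:
            return None
        return sum(self.history) / len(self.history)


# Hypothetical file sizes and delay: 10,240 and 1,024 bytes, 0.5 s difference.
print(server_speed(10240, 1024, 0.5))  # prints 18432.0
```

A client could compare the value returned by `update` against its minimum expected speed before releasing the next acknowledgement.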
Each DupSet server is an HTTP server with two payload files with random information available for download. The size of the first file (P1 bytes) is much greater than the size of the second file (P2 bytes). The client loads both files and calculates the difference between the delays of downloading the first and the second file, which extracts the transmission delay from the total end-to-end delay. Then, the user-perceived speed reading (in bytes/second) for the i-th server is determined as

\[ \mathrm{Speed}_i = \frac{\mathrm{EPF}(P_1 - P_2)}{\Delta D_i}, \]

where EPF is the Effective Payload Function defined as follows:

\[ \mathrm{EPF}(x) = \begin{cases} 0, & \text{if } x \le 0; \\ 0, & \text{if request failure}; \\ x, & \text{otherwise.} \end{cases} \]

ΔD_i is the difference, in seconds, between the delays of downloading the first and the second file from server i. The EPF function filters out unreliable results and ignores results from inaccessible DupSet servers, so when one or several DupSet servers are unavailable or provide unreliable readings, the accuracy of the DupSet result is not affected.

⁵⁵ We experimentally found that the third quartile statistic achieves better measurement accuracy than the mean, median, and maximum.
⁵⁶ We empirically determined that a simple moving average over 6 periods (SMA-6) delivers stable and reliable results.

7.3.5: SmartWiFi Smart Contract

The SmartWiFi smart contract provides an overarching trust layer between the router and the client to exchange data and payments. The SmartWiFi smart contract has the following components: a) state variables; b) state-changing functions; c) cryptocurrency balance; and d) a payable function (p-function) for incoming payments. The functions that do not submit transactions (pure and view functions) are called anonymously, whereas the calls to state-changing functions are signed by a specific user (using the account’s private key). The minimal set of the SmartWiFi smart contract features is summarized in Table 7.1, which includes constants, variables, functions, and one payable function.
The access control to each feature is represented in the Unix-style symbolic access mode format, where the first triple refers to the router’s privilege, the second triple to the user’s privilege, and the third to others. The security column describes the protective measures employed for each feature. The price ξ, session length T, refund delay τ, and service unit length η are set as constants to reduce execution delays and fees. The smart contract has two variables, for the hash chain head Υhead and the client public address Apub; they can only be set by the router using their mutators. The accessor and balance check functions are called without fees since they do not modify the blockchain. Both mutators use timers to prevent the modification of the values they set. The values are protected using a timer for at least the duration of a Hansa session, including handshake, service session, and checkout. The timer plays two important roles: first, it prevents a malicious modification of Υhead and Apub by the router; second, it facilitates the reuse of the smart contract, thereby reducing blockchain fees and delays. The prepay function funds the smart contract. The checkout and refund functions include additional security checks as depicted in Algorithms 2 and 3, which will be described next.

7.3.6: Payment and Refund

The fair payment and refund procedures are automatically enforced by the SmartWiFi smart contract. The smart contract holds the amount of cryptocurrency Ξ, sufficient for funding one Hansa session. The router is prohibited from claiming its payment until the blockchain timestamp reaches the value t + T, where t is the timestamp saved at the beginning of the service session.

Algorithm 2: Smart contract payment routine
INPUT: Υhead, Υi, ξ, t, T, η, Apub
OUTPUT: none
1: if Υi ∈ Υ and caller = Router and Timestamp ≥ t + T then
2:     RouterBalance ← i × ξ
3:     RefundAmount ← (T/η − i) × ξ
4:     TransferFunds(Apub, RefundAmount)
5: end if
6: return
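The guards and the payment split of the payment and refund routines (Algorithms 2 and 3) can be mirrored in a short off-chain sketch. It is not the Solidity contract: the function names `checkout` and `refund` are borrowed from Table 7.1, the parameter names are hypothetical, amounts are plain integers in an unspecified currency unit, and `acked_units` is assumed to correspond to the index i of Algorithm 2, i.e., the number of service units the client acknowledged.

```python
from typing import Optional, Tuple


def checkout(acked_units: Optional[int], price: int, period: int, unit: int,
             start: int, now: int, caller_is_router: bool) -> Optional[Tuple[int, int]]:
    # Mirror of Algorithm 2: return (router_payment, client_refund), or None if refused.
    total_units = period // unit
    # The acknowledgement must lie on the chain (None models a failed chain check).
    if acked_units is None or not 0 <= acked_units <= total_units:
        return None
    # Only the router may call, and only once the timestamp reaches t + T.
    if not caller_is_router or now < start + period:
        return None
    router_payment = acked_units * price
    client_refund = (total_units - acked_units) * price
    return router_payment, client_refund


def refund(price: int, period: int, unit: int, start: int, delay: int,
           now: int, caller_is_client: bool) -> Optional[int]:
    # Mirror of Algorithm 3: full refund, available only to the client after t + T + tau.
    if not caller_is_client or now < start + period + delay:
        return None
    return (period // unit) * price


# Hypothetical session: 45 of 60 units acknowledged at 1,000 currency units each.
print(checkout(45, 1000, 3600, 60, 0, 3600, True))  # prints (45000, 15000)
```

The two returned amounts always sum to the prepaid total Ξ = ξ × T/η, which is why a single prepayment can cover both the router's claim and the client's refund.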
Algorithm 2 shows how the payment is executed. The inputs include: the hash chain head Υhead, the last retrieved acknowledgement Υi, the price ξ (stored as a constant in the smart contract), the timestamp t (saved during handshake), the service session length T (constant), the session unit length η, and the user’s public address Apub. The router obtains the payment based on the depth of Υi, and the remaining funds are transferred back to the client as a refund.

The execution of lines 2-4 in Algorithm 2 can only be triggered by the router. If the router never requests payment, the client would never receive a refund; for this case, we design an additional refund routine, described in Algorithm 3. The actual refund (lines 2-3) may be triggered only by the client, and only after the predetermined refund delay τ, which prevents the refund-before-payment attack described in Section 7.4.

Algorithm 3: Smart contract refund routine
INPUT: Υhead, ξ, t, T, η, τ, Apub
OUTPUT: none
1: if caller.address = Apub and Timestamp ≥ t + T + τ then
2:     RefundAmount ← T/η × ξ
3:     TransferFunds(Apub, RefundAmount)
4: end if
5: return

7.4: Security Analysis

The security threats to SmartWiFi come either from malicious clients or from malicious hotspots/routers. In this section, we analyze the security of SmartWiFi.

Non-Service by Malicious Hotspot: The goal of the client is to have a satisfying Internet connection for the money paid. If high-quality service is not provided, a full or partial refund should be guaranteed. A malicious router might withhold service, i.e., attempt to receive payment without providing a quality connection. To counteract such behavior, the SmartWiFi client uses DupSet to assess the quality of the Internet connection before sending each subsequent acknowledgement, while the SmartWiFi smart contract guarantees a full or partial refund.

Refund before Payment: The router expects to be fairly paid after the connection period is over.
The goal of the client, who prepaid the smart contract with one whole period’s worth of money, is to receive a refund for all service units that do not result in a satisfaction acknowledgement. Refund before payment refers to the case in which a malicious client claims that no service was received and asks for a full refund. In SmartWiFi, this threat is prevented by the refund delay τ, a window reserved for the router to claim payment, during which a refund is impossible.

Handshake Flooding: The handshake of SmartWiFi is prone to denial-of-service attacks. The goal of the adversary in the handshake flooding attack is to render the router unavailable or degrade its performance. This can be achieved by initiating multiple incomplete handshakes, in which the attacker, pretending to be a valid client, forces the router to submit values to the smart contract, for which the blockchain charges fees. In SmartWiFi, this attack is prevented by checking the balance of the client before preparing the smart contract for that client. The SmartWiFi router also caps the number of clients it serves: once the number of requests exceeds the maximum, SmartWiFi starts dropping requests.

Free-Rider Attack (Non-Payment): The existence of the free trial in SmartWiFi allows any user who funds the contract to gain one service unit of Internet connectivity without providing an acknowledgement. A dedicated attacker may use multiple client devices to alternately connect to the router, use the free trial period (1 minute in this work), disconnect without providing any acknowledgements, tunnel the traffic to the same outlet, and then request a full refund. We define such connection misuse as traffic hopping, which is a special case of the free-rider attack. In SmartWiFi, we prevent such threats by relying on the accruing blockchain fees.
As the creation of new malicious nodes (i.e., Sybil nodes) requires the attacker to transfer funds into multiple accounts and pay fees for each funding transaction, such fees, summed over multiple accounts, nullify the benefits of free riding. After the free trial, the router expects to receive regular satisfaction acknowledgements. Each acknowledgement from a client is expected to arrive before a strict deadline; otherwise, the Internet connection is terminated by the router.

7.5: Implementation

We implement a fully-functional SmartWiFi prototype on a Netgear router and Raspberry Pi clients for testing the general functionality and performance of the system. In addition, we implement an Android SmartWiFi client app, as shown in Figure 7.3a, for testing the performance of SmartWiFi on mobile devices. The client app can be easily ported to iOS. We use Java 11 and Web3j for implementing the software of the router, the desktop/IoT client, and the Android client. We use the Infura API [11] to interact with the Ethereum blockchain.

Figure 7.3: SmartWiFi prototype: (a) The connection page of the SmartWiFi Android app; (b) SmartWiFi configuration with a wired Internet connection, Raspberry Pi as a SmartWiFi router, retail WiFi router with factory software, and Android smartphone as a SmartWiFi client.

Figure 7.3b shows one possible configuration of SmartWiFi, in which the SmartWiFi router software is installed on a Raspberry Pi with two Ethernet interfaces: one for the Internet connection, another for delivering the Internet to the WiFi router. The retail WiFi router runs its original software; the configuration of this router includes TCP port 5566 forwarding in order to allow connected devices to access the SmartWiFi router. The client in this configuration is an ordinary Android smartphone without rooting.
This configuration ensures SmartWiFi’s compatibility with legacy systems, i.e., we can easily deploy SmartWiFi by plugging in a device running SmartWiFi router software.

We implement a prototype SmartWiFi Ethereum smart contract using the Solidity programming language. In our prototype and evaluation, we use both Mainnet and the Ropsten testnet for executing the smart contracts. Furthermore, we build an IoT testbed with five Raspberry Pi clients simultaneously connected to a single-antenna all-in-one SmartWiFi router (AMD A4 Micro-6400T, 4GB RAM, Xubuntu 18.04). This setup demonstrates that SmartWiFi can be easily adapted to support a diverse variety of IoT configurations.

7.6: Evaluation

We thoroughly evaluate the performance of the SmartWiFi prototype by scrutinizing the following system parameters under different circumstances: blockchain-related delays, Ethereum gas fees, smart contract storage, the accuracy of DupSet speed probes, the scalability of the system, and the communication overhead. In Ethereum, all blockchain-modifying transactions require the caller to pay fees measured in the unit named gas, which is convertible into Ether using a dynamic variable called gas price. In our evaluation, the service session lasts for one hour (T = 3,600 seconds), and the service unit is one minute (η = 60 seconds).

7.6.1: Delays

In this section, we evaluate the blockchain-related delays of SmartWiFi sessions in both Mainnet and Ropsten Ethereum networks. We add the Ropsten testnet for comparison to demonstrate the performance stability of SmartWiFi under Ethereum networks with different amounts of mining hash power. Thus, we show that if the parameters of the blockchain change in the future, it will not significantly affect the performance of SmartWiFi. For each type of blockchain-related delays, ten measurements have been taken.
The average delays (with standard deviations) for Ropsten and Mainnet are presented in Table 7.2, from which we observe similar delays in both networks. The connection initiation phase, in which no blockchain interaction occurs, takes a few seconds on average; after this phase the user can start accessing the Internet. The handshake phase, whose average delay is below one minute for both Ropsten and Mainnet, initiates the payment arrangement. The code check (bytecode verification) phase, which requires only a non-modifying blockchain operation, also takes a few seconds. The smart contract funding phase is essentially a cryptocurrency transaction, which requires more time than a read-only blockchain request. Similarly, the payment and refund routines, although demanding additional calculations and checks, demonstrate delays just a little longer than a simple Ether transfer. In summary, to connect to a SmartWiFi router and start Internet access, the client experiences only a few seconds of connection initiation delay, which is completely acceptable.

Table 7.2: Comparison of blockchain-related average delays (in seconds) with relatively high gas price (100 GWei for Ropsten and 5 GWei for Mainnet).

                         Ropsten Testnet     Ethereum Mainnet
Delay Type               davg      σ         davg      σ
Connection initiation    3.965     0.177     4.161     0.202
Handshake                39.093    18.504    53.161    16.432
Bytecode verification    4.268     0.376     4.291     0.360
Funding                  23.629    17.711    25.449    14.519
Payment                  30.729    23.208    31.512    17.304
Refund                   33.194    23.640    37.521    23.006

The delay of blockchain execution in the Ethereum network can be further reduced by increasing the gas price offered for a transaction [78]. However, such a performance optimization is not guaranteed [256]. First, the Ethereum blockchain protocol does not enforce the prioritization of incoming transactions, leaving this decision to the discretion of miners. Second, since Ethereum is a decentralized network, any increase in transaction execution speed is best-effort.
Here, we conduct empirical testing to evaluate the delays with respect to different gas prices. The result, presented in Figure 7.4, demonstrates a slight but consistent reduction of total SmartWiFi session delays as the gas price increases, which shows the possibility of reducing delays by offering a higher gas price. However, given the increasing cost, the delay reduction may not be worthwhile.

Figure 7.4: Full session delays with different gas prices in the Ropsten network. The graph has a logarithmic gas price axis, and it shows that while it is empirically true that offering a higher gas price increases the chance of a faster transaction, the speed improvement is insignificant. (Axes: gas price in GWei; session delay in seconds.)

7.6.2: Fees

In this section, we measure the gas fees per transaction when a public function of the SmartWiFi smart contract is called. In order to exclude the possibility of variable fees, we take every measurement twice, and confirm that the cost remains the same for both measurements. The summary of gas fees is presented in Table 7.3. The address accessor, hash chain head accessor, balance check, and bytecode download are read-only blockchain operations, which do not incur any fees. However, the mutators and payable functions require the caller to pay fees. The fees in Table 7.3 are calculated for one 60-minute service session with 1-minute service units.

Ethereum allows the issuer of a transaction to offer an arbitrary gas price to prioritize the transaction. Similar to the delay-measuring experiment in Figure 7.4, we record fees over 10 measurements on the Ropsten network, using a more realistic set of 10 gas prices equally spaced between 0.5 and 5.0 GWei. The cumulative fee (for both router and client) is less than $0.4, even with the highest gas price and ETH market price.
Since the highest gas price is used rarely in production systems, the fee overhead is expected to be significantly lower then the maximum. It is important to note that the cryptocurrency market price variations have little effect on SmartWiFi fee overhead. Ethereum is a dynamic self-regulating system, so when the market price of Ether goes up, the users can afford less, and they offer smaller fees for transactions, which results in lower average gas price, and vice versa [78]. The curve of resulting fee in USD is thus smoothed and flattened. Therefore, regardless of any cryptocurrency price fluctuations, SmartWiFi blockchain fees paid in USD will remain approximately the same. 7.6.3: Smart Contract Storage Table 7.4 shows a comparison between data stored in the SmartWiFi smart contract with and without hash chain compression, from which we can see that the hash chain in Hansa stores about 17 times 188 Table 7.3: Gas fees for different functions of the SmartWiFi smart contract. Since SmartWiFi uses an off-chain execution protocol with infrequent smart contract transactions, the resulting fee overhead drops significantly. Transaction fee with recommended gas price [3] Function Gas Approx. USD Apub -accessor 0 0 Apub -mutator 28,366 0.11 Υhead -accessor 0 0 Υhead -mutator 33,684 0.13 Balance check 0 0 Payment 50,076 0.20 Refund 42,266 0.17 Fund contract 21,040 0.08 Download contract bytecode 0 0 less data in the smart contract, effectively reducing per-session delays. Moreover, it also reduces per-session fees from $8 to about 40¢ in USD equivalent, which corroborates the feasibility of SmartWiFi in terms of low cost. Table 7.4: Data stored in the smart contract per session (T = 3, 600, η = 60). 
Stored data per session:

Data unit               With hash chain    Without hash chain
Acknowledgement data    32 bytes           1,920 bytes
Client identity         20 bytes           20 bytes
Auxiliary data          64 bytes           64 bytes
Total                   116 bytes          2,004 bytes

7.6.4: DupSet Measurement and Overhead

In this section, we evaluate the feasibility of DupSet by comparing our estimations with the average readings obtained from nine popular Internet speed measurement services, specifically: Bandwidth Place, DSLReports, Fast.com, Google Fiber, Internet Health Test, M-Lab, Ookla, Speed-Of.Me, and Xfinity. We test ten different SmartWiFi router Internet connections belonging to different speed tiers, and evaluate average speeds of each of these connections by taking six speed test probes at each of the nine services listed above. The six speed test probes consist of three probes per service before running the DupSet simulation, and three probes per service right after the DupSet simulation.

In our prototype setup, we deploy ten DupSet servers in different geographic locations. In order to achieve further diversity in measurements, we use servers provided by two different cloud services, DigitalOcean [10] and Vultr [101]. For each Internet connection, we run 60 probes of DupSet for measuring the transmission speed component from each of the servers based on the payload of 10 kilobytes. The fastest reading from the ten servers represents the speed result of a probe.

The experiment confirms that the low-overhead DupSet estimations correlate with the high-overhead traditional Internet speed readings. Figure 7.5 shows that DupSet speed measurement results accurately reflect the Internet connection speed tier, which quantifies the QoS of user service.
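The probe rule described above (one reading per server, with the fastest reading taken as the probe result) can be sketched as follows. This is a minimal illustration under stated assumptions, not the DupSet implementation: it assumes that a reading is simply the payload size divided by the transfer time, that one kilobyte is 1,000 bytes, and that each server is contacted at most once per probe. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable

# 10-kilobyte payload, as in the prototype; 1 kB = 1,000 bytes is an assumption.
PAYLOAD_BYTES = 10 * 1000


@dataclass
class ServerReading:
    server: str       # hypothetical server label
    elapsed_s: float  # time taken to fetch the payload from this server, in seconds


def reading_mbps(reading: ServerReading, payload_bytes: int = PAYLOAD_BYTES) -> float:
    """Speed of one reading in Mbps, assuming speed = payload bits / elapsed time."""
    return payload_bytes * 8 / reading.elapsed_s / 1e6


def probe_result_mbps(readings: Iterable[ServerReading]) -> float:
    """A probe's result is the fastest reading among all servers."""
    return max(reading_mbps(r) for r in readings)


def max_probe_overhead_bytes(n_servers: int, payload_bytes: int = PAYLOAD_BYTES) -> int:
    """Upper bound on traffic per probe: one payload download from every server."""
    return n_servers * payload_bytes


# Hypothetical readings from three servers.
example = [
    ServerReading("s1", 0.040),
    ServerReading("s2", 0.020),
    ServerReading("s3", 0.080),
]
result = probe_result_mbps(example)
print(f"probe result: {result:.2f} Mbps")
print(f"max overhead with 10 servers: {max_probe_overhead_bytes(10)} bytes")
```

Taking the maximum rather than the mean keeps a single slow or distant server from dragging the estimate down, which matches the intent of measuring the speed a user could actually perceive.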
The sharp increase in the gap between the two readings at high speeds demonstrates the core difference between traditional bandwidth measurements and user-perceived speed estimation: a drastic increase in available bandwidth after a certain threshold does not trigger a proportional boost in loading web pages. On a high-speed Internet connection, the performance bottleneck moves from the client to the server.

The overall maximum communication overhead of DupSet probes depends on the number of DupSet servers and the size of the payload on any of these servers. In our prototype, we empirically select a 10-kilobyte DupSet payload and 10 DupSet servers, resulting in a 100-kilobyte maximum overhead per probe, or approximately 6 Mb of overhead per one-hour session. Through this experiment, we demonstrate that DupSet probes reflect accurate user-perceived speed with low overhead.

A SmartWiFi client uses DupSet to control the minimum expected speed. Since different users may have different minimum speed requirements at different times (e.g., watching streaming video needs a higher speed than reading e-mail), users are required to explicitly specify their expectations in the client settings. In the SmartWiFi Android app, for example, we let the user choose between 5 discrete options.

Figure 7.5: Correlation between traditional Internet speed measurement (average result from nine websites) and DupSet probes over 10 different Internet connection profiles. [Plot omitted: measured downlink speed (Mbps) per Internet connection profile, for DupSet and for the speed measurement websites.]

7.6.5: Scalability

SmartWiFi is designed to scale to multiple clients connecting to a single router. We evaluate the performance of the system under the load of different numbers of users.
For each client, we perform a background web surfing simulation that picks and loads a random website from the Alexa Top 10K list [250] every 10 seconds. Figure 7.6 shows the number of clients one router could serve without disconnection. As we can see, this capacity depends on the bandwidth of the Internet connection of the router and the maximum expected Internet speed set by the client. The experiment shows that when the router has a high-bandwidth Internet connection, and clients do not request high speed, SmartWiFi is capable of serving hundreds of clients simultaneously. (Footnote 57: The growing number of users incurs a higher rate of physical layer packet collisions. One way to mitigate this is to use MIMO WiFi access point hardware.)

Figure 7.7 shows average DupSet readings for different Internet connections with different numbers of simultaneously served clients under a background web surfing simulation. The graph shows the number of clients one SmartWiFi router can serve based on its Internet connection bandwidth and average speed expectations. For example, it would be overly ambitious for a SmartWiFi router with a 100 Mbps connection to serve 40 users whose average speed expectation is 2 Mbps. However, if the expectation is reduced to 1 Mbps, serving 40 users simultaneously will likely be a realistic projection.

Figure 7.6: Maximum number of clients simultaneously served by the router for 15 min. under different connectivities, with random web surfing simulation in the background. [Plot omitted: maximum number of simultaneous clients versus user-perceived speed limit based on DupSet (Mbps), for Ethernet (100 Mbps), LTE (52 Mbps), VPN (28 Mbps), and Tor (6 Mbps).]

7.6.6: SmartWiFi Communication Overhead

The communication overhead includes the client-router TCP traffic and the Infura blockchain API communication.
We measure overhead by capturing network traffic and calculating a cumulative one-hour session TCP payload using Wireshark. Each session's average result is based on 10 measurements. The results in Table 7.5 demonstrate that the overhead of off-chain communication is low compared to the results for blockchain-related calls.

Figure 7.7: Average DupSet readings for different types of Internet connection profiles with different numbers of clients simultaneously served by a SmartWiFi router. [Plot omitted: average DupSet speed (Mbps) versus number of simultaneously connected clients, for Gigabit Ethernet, Cable Modem (100 Mbps), and LTE Modem (~50 Mbps).]

Table 7.5: Session communication overhead for different SmartWiFi calls over 10 measurements. The local calls represent off-chain communication between the hotspot and the client, including handshake E_h, connection initiation E_c, connection status check E_s, and acknowledgement E_a. Blockchain (B/C) calls use Infura API [11].

Procedure call              Avg TCP payload (bytes)    σ
Local: E_h                  580                        20
Local: E_c                  412                        0
Local: E_s                  274                        9
Local: E_a                  334                        8
B/C: download bytecode      69,797                     32,633
B/C: A_pub-accessor         51,597                     38,224
B/C: A_pub-mutator          51,279                     34,389
B/C: Υ_head-accessor        50,472                     33,205
B/C: Υ_head-mutator         60,606                     36,489
B/C: balance check          64,487                     46,150
B/C: payment                45,326                     28,016
B/C: refund                 59,500                     41,856
B/C: fund contract          56,834                     39,542

7.7: Chapter Summary

In this chapter, we proposed SmartWiFi, a smart contract-enabled WiFi hotspot system, which provides universal accessibility, cross-domain authentication, association of QoS and payment, and security enhancement. SmartWiFi utilizes a novel cryptographic mechanism, Hansa, to establish connection. Hansa provides low-cost off-chain execution by restricting otherwise unacceptable smart contract fees, and significantly reduces delays associated with smart contract interaction.
To validate the feasibility of the SmartWiFi system, we designed and implemented a SmartWiFi prototype using an Ethereum smart contract. The experimental results show that SmartWiFi exhibits low operational delays, minimum communication overhead, and small blockchain fees. We demonstrated that SmartWiFi is a scalable, secure, and efficient WiFi hotspot solution, which can be easily deployed in a variety of systems with minimal intervention. The limited adoption of cryptocurrencies and the volatility of their market prices can be further addressed through the use of stablecoin tokens, which we leave for future work.

CHAPTER 8: CONCLUSION

Decentralized distributed systems, such as blockchain and DAG networks, are recent candidates for integration into the Smart World. These systems can significantly increase digital equity, freedom, and privacy, but their integration is hindered by several fundamental technical challenges that this dissertation aims to address. Our research and literature review suggest that the challenges facing the integration of decentralized distributed systems into the Smart World can be subdivided into three major categories: security, scalability, and usability. The core methodology of our present and future research is therefore to meticulously address these three groups of challenges, thereby fostering the adoption of blockchain and other decentralized distributed systems by the modern world.

8.1: Summary of Contributions

In this dissertation, we addressed three major challenges facing the integration of blockchain technology into the Smart World ecosystem: security, scalability, and usability. We unraveled new blockchain attacks, classified existing threat mitigation solutions, proposed a new concept of defense, and discovered new trust-free applications of blockchain technology. Specifically, this dissertation makes the following contributions.
Social Engineering Attacks in Smart Contracts: We explore the possibility and existence of new social engineering attacks beyond smart contract honeypots. We present two novel classes of Ethereum social engineering attacks, Address Manipulation and Homograph, and develop six zero-day social engineering attacks. To show how the attacks can be used in popular programming patterns, we conduct a case study of five popular smart contracts with combined market capitalization exceeding $29 billion, and integrate our attack patterns in their source code without altering their existing functionality. Moreover, we show that these attacks remain dormant during the test phase but activate their malicious logic only at the final production deployment. We further analyze 85,656 open-source smart contracts, and discover that 1,027 of them can be used for the proposed social engineering attacks. We conduct a professional opinion survey with experts from seven smart contract auditing firms, corroborating that the exposed social engineering attacks bring a major threat to smart contract systems.

Attacking Hardware Wallets: We introduce EthClipper, an attack that targets owners of hardware wallets on the Ethereum platform. EthClipper malware queries a distributed database of pre-mined accounts in order to select the address with maximum visual similarity to the original one. We design and implement the EthClipper malware, which we test on Trezor, Ledger, and KeepKey wallets. To deliver computation and storage for the attack, we implement a distributed service, ClipperCloud, and test it on four deployment environments. Our evaluation shows that with off-the-shelf PCs and NAS storage, an attacker would be able to mine a database capable of matching 25% of the digits in an address to achieve a 50% chance of finding a fitting fake address.
For responsible disclosure, we have contacted the manufacturers of the hardware wallets used in the attack evaluation, and they all confirmed the danger of EthClipper.

Taxonomy and Classification of Threat Mitigation Solutions in Smart Contracts: We develop a comprehensive classification taxonomy of smart contract threat mitigation solutions within five orthogonal dimensions: defense modality, core method, targeted contracts, input-output data mapping, and threat model. We classify 133 existing threat mitigation solutions using our taxonomy and confirm that the proposed five dimensions allow us to concisely and accurately describe any smart contract threat mitigation solution. In addition to learning what the threat mitigation solutions do, we also show how these solutions work by synthesizing their actual designs into a set of uniform workflows corresponding to the eight existing defense core methods. We further create an integrated coverage map for the known smart contract vulnerabilities by the existing threat mitigation solutions. Finally, we perform an evidence-based evolutionary analysis, in which we identify trends and future perspectives of threat mitigation in smart contracts and pinpoint major weaknesses of the existing methodologies. For the convenience of smart contract security developers, auditors, users, and researchers, we deploy a regularly updated comprehensive open-source online registry of threat mitigation solutions.

Context-Aware User-Centered Transaction Testing: We propose a new smart contract security testing approach called transaction encapsulation. The core idea lies in the local execution of transactions on a fully-synchronized yet isolated Ethereum node, which creates a preview of outcomes of transaction sequences on the current state of blockchain.
To overcome the well-known time-of-check/time-of-use (TOCTOU) problem, i.e., the lack of assurance that the final transactions will exhibit the same execution paths as the encapsulated test transactions, we determine the exact conditions for guaranteed execution path replicability of the tested transactions. To demonstrate transaction encapsulation, we implement a transaction testing tool, TxT, which reveals the actual outcomes (either benign or malicious) of Ethereum transactions. To ensure the correctness of testing, TxT deterministically verifies whether a given sequence of transactions yields an identical execution path on the current state of blockchain. We analyze over 1.3 billion Ethereum transactions and determine that 96.5% of them can be verified by TxT. We further show that TxT successfully reveals the suspicious behaviors associated with 31 out of 37 vulnerabilities (83.8% coverage) in the Smart Contract Weakness Classification (SWC) registry. In comparison, the vulnerability coverage of all the existing defense approaches combined only reaches 40.5%.

Smart Contract on the Cloud: We determine that the major obstacle to public blockchain scalability is the underlying unstructured P2P network. We further show that a centralized network can support the deployment of decentralized smart contracts. We propose a novel approach for achieving scalable decentralization: instead of trying to make blockchain scalable, we deliver decentralization to the already scalable cloud by using an Ethereum smart contract. We introduce Blockumulus, a framework that can deploy decentralized cloud smart contract environments using a novel technique called overlay consensus. Through experiments, we demonstrate that Blockumulus is scalable in all three dimensions: computation, data storage, and transaction throughput. Besides eliminating the current code execution and storage restrictions, Blockumulus delivers a transaction latency between 2 and 5 seconds under normal load.
Moreover, the stress test of our prototype reveals the ability to execute 20,000 simultaneous transactions in under 26 seconds, which is on par with the average throughput of worldwide credit card transactions.

Blockchain-Assisted Wireless Cross-Domain Authentication: We propose SmartWiFi, a universal, secure, and decentralized WiFi hotspot that can be deployed in any public or private environment. SmartWiFi provides cross-domain authentication, fully automated accounting and payments, and security assurance for both hotspots and clients. SmartWiFi utilizes a novel off-chain transaction scheme called Hash Chain-based Network Connectivity Satisfaction Acknowledgement (Hansa), which enables a fast and low-cost provider-client protocol by restricting otherwise unacceptable delays and fees associated with blockchain interaction. In addition, we present DupSet, a dynamic user-perceived speed estimation technique, which can reliably evaluate the quality of Internet connection from the users' perspective. We design and implement SmartWiFi desktop and mobile apps using an Ethereum smart contract. With extensive experimental evaluation, we demonstrate that SmartWiFi exhibits rapid execution with low communication overhead and reduced fees.

8.2: Limitations and Discussion

Although the research described in this dissertation makes a significant contribution to the field of blockchain integration, our work has limitations and room for further improvement. Below are...

Social Engineering Attacks in Smart Contracts: In this dissertation, we have highlighted crucial security vulnerabilities in Ethereum smart contracts caused by social engineering attacks. While this work contributes significantly to understanding these threats, there are certain limitations and future work requirements that need to be addressed. First, the work focuses on Ethereum smart contracts, which, although the most widely used, are not the only existing smart contract platform.
Extending the analysis to other platforms such as Binance Smart Chain, Cardano, and Polkadot can provide a comprehensive understanding of social engineering vulnerabilities across the blockchain ecosystem. Second, social engineering attacks continue to evolve, and as new techniques emerge, our study may require updates to stay relevant. Regularly updating the research with the latest attack vectors can help practitioners better protect against evolving threats. Third, while the work addresses technical aspects of social engineering attacks, it is essential to focus on human factors contributing to the success of these attacks, such as cognitive biases and susceptibility to manipulation. Future work can explore psychological and behavioral aspects to design more effective countermeasures. Fourth, the work identifies vulnerabilities and possible attack vectors but could benefit from a deeper exploration of mitigation strategies. Expanding the research to propose and evaluate concrete solutions, such as secure coding practices, enhanced auditing, and education initiatives, can help developers and users fortify their defenses against social engineering attacks. Fifth, the work can be strengthened by analyzing real-world instances of social engineering attacks on Ethereum smart contracts. These case studies will provide valuable insights into how attackers exploit vulnerabilities and the consequences of these breaches. Last but not least, our future work can benefit from collaboration between cybersecurity, social science, and legal experts to develop a holistic approach to understanding and addressing social engineering threats in smart contracts. Addressing these limitations and focusing on future work will not only improve the quality of our future research but also contribute to advancing the understanding and mitigation of social engineering attacks in smart contracts and the broader blockchain ecosystem.
Attacking Hardware Wallets: In this dissertation, we introduce an innovative attack vector that exploits clipboard manipulation to target hardware wallets. This work, however, has some limitations that need to be addressed and areas where future work is required. First, the work primarily focuses on EthClipper's impact on Ethereum-based hardware wallets. Investigating the potential of similar attacks on other cryptocurrencies (e.g., Bitcoin, Litecoin) and wallet types (e.g., software wallets, mobile wallets) will help provide a broader perspective on the risks associated with clipboard meddling attacks. Second, the work only presents one method for evading address verification mechanisms. Future work can explore other potential evasion techniques, as well as assess the effectiveness of existing countermeasures in protecting against these attacks. Third, although our dissertation identifies the vulnerabilities and attack vectors, it could benefit from a more in-depth analysis of potential countermeasures and mitigation strategies. Proposing and evaluating solutions to prevent or detect clipboard meddling attacks, such as secure clipboard APIs, behavioral analytics, and user education, will contribute to strengthening the security of hardware wallets and the wider cryptocurrency ecosystem. Fourth, our research would benefit from analyzing real-world instances of clipboard meddling attacks on hardware wallets. Examining such cases can provide valuable insights into the tactics used by attackers, the effectiveness of existing defenses, and the consequences of these security breaches. Fifth, our future work can explore the balance between user experience and security in the context of hardware wallets. Investigating usability aspects that may inadvertently contribute to the success of clipboard meddling attacks can inform the design of more secure and user-friendly wallet solutions.
Finally, we should consider analyzing clipboard meddling attacks across various operating systems and platforms, such as Windows, macOS, Linux, Android, and iOS, to gain a comprehensive understanding of the risks and potential mitigation strategies.

Taxonomy of Threat Mitigation in Smart Contracts: In this dissertation, we provide a comprehensive overview of security threats and mitigation techniques for smart contracts. However, there are certain limitations that need to be addressed, and areas where future work is required. First, security threats and mitigation techniques evolve over time. The survey may become outdated as new threats and solutions emerge. Periodically updating the survey with the latest developments will help maintain its relevance and usefulness to practitioners and researchers. Second, the survey primarily focuses on Ethereum-based smart contracts. However, various other smart contract platforms exist, such as Binance Smart Chain, Cardano, and Polkadot. Extending the survey to cover security threats and mitigation techniques for these platforms will provide a more comprehensive understanding of the smart contract security landscape. Third, the work covers technical aspects of smart contract security but could benefit from incorporating insights from other disciplines, such as legal, economic, and social sciences. Integrating these perspectives can lead to a more holistic understanding of security challenges and potential solutions in the smart contract ecosystem. Fourth, as the field of smart contracts and blockchain technology continues to advance, new tools and techniques are being developed. Future work can explore the impact of emerging technologies, such as zero-knowledge proofs, formal verification, and secure multi-party computation, on smart contract security and threat mitigation.
Fifth, the survey can benefit from a discussion of the trade-offs between usability and security in the design and implementation of smart contracts. Understanding these trade-offs can help guide the development of more secure and user-friendly smart contract solutions. Finally, including real-world case studies of smart contract security breaches, their consequences, and the efficacy of various mitigation techniques will provide valuable context and practical insights for the survey's audience.

Real-Time Transaction Testing: In this dissertation, we present a novel approach to enhance the security and efficiency of Ethereum smart contracts by encapsulating transactions in real time. While this work contributes significantly to the field, there are certain limitations that need to be addressed and areas where future work is required. First, our approach can benefit from a more extensive performance evaluation of transaction encapsulation. This may include analyzing the overhead introduced by transaction encapsulation, the impact on transaction throughput, and the scalability of the approach in different scenarios and network conditions. Second, a detailed security analysis of the transaction encapsulation approach is essential to understand its effectiveness in mitigating potential attacks and vulnerabilities. Future work can explore possible attack vectors that may target the encapsulation mechanism and evaluate the efficacy of the TxT approach in addressing these threats. Third, investigating the compatibility and integration challenges of the TxT tester with existing smart contract development and deployment tools, such as Truffle and Remix, will help identify potential barriers to adoption and guide the development of more seamless integration strategies. Fourth, as the blockchain ecosystem continues to evolve, interoperability between different platforms is becoming increasingly important.
Future work can explore the potential of the transaction encapsulation approach in facilitating cross-chain communication and bridging different blockchain networks. Finally, it is important to consider the impact of the transaction encapsulation testing approach on user experience and adoption, as these factors are crucial for its success. Investigating the ease of use, potential learning curve, and changes to existing workflows for developers and end-users can help inform the design and development of more user-friendly transaction encapsulation solutions.

Smart Contracts on the Cloud: In this dissertation, we present an innovative framework that leverages cloud computing to enhance the scalability of smart contract execution. While this work contributes significantly to the field, there are some limitations to address and areas where future work is required. First, cloud-based infrastructure introduces new security and privacy challenges. A detailed analysis of the potential security risks and privacy implications of the Blockumulus framework is crucial to ensure its robustness and reliability. Future work can explore the integration of advanced security techniques, such as zero-knowledge proofs and secure multi-party computation, to address these concerns. Second, our work can benefit from a comprehensive cost analysis of the Blockumulus framework, comparing the costs of on-chain and off-chain computation and storage, as well as the trade-offs between scalability, performance, and costs associated with using cloud-based infrastructure. Third, further performance evaluation of the Blockumulus framework is essential to understand its scalability and efficiency. This may include analyzing various performance metrics such as transaction throughput, latency, and resource utilization under different workloads and network conditions.
Finally, investigating the challenges and barriers to deploying and adopting the Blockumulus framework, including compatibility with existing tools, legal and regulatory considerations, and user acceptance, can help guide the development of more practical and widely adopted solutions.

Decentralized Wi-Fi Hotspots: We propose in this dissertation a new solution that leverages smart contracts to provide secure and universal access to WiFi hotspots. Yet, the work has some limitations and room for future improvement. First, a comprehensive cost analysis of the SmartWiFi system is necessary to understand the trade-offs between access fees, infrastructure costs, and transaction fees associated with smart contract execution. This would help users and hotspot providers make informed decisions regarding the adoption and deployment of the SmartWiFi system. Second, future work can investigate dynamic pricing models and incentive mechanisms to optimize the allocation of WiFi resources and encourage more hotspot providers to participate in the SmartWiFi ecosystem. This could involve the use of game theory, auction mechanisms, or reputation systems to create a fair and economically efficient marketplace. Third, it is important to conduct pilot studies or real-world deployments of the SmartWiFi system in various environments, such as urban centers, airports, or university campuses, to validate its performance, usability, and security under different conditions. This would help identify potential challenges, gather user feedback, and refine the SmartWiFi solution based on real-world experiences.

8.3: Lessons Learned

Blockchain is still an emergent technology undergoing a steady integration into the Smart World ecosystem. Unsurprisingly, the new technology is strewn with common misconceptions and surprising discoveries. Our understanding of blockchain changed as we researched it further.
Below are some of the most important lessons that we have learned in the course of the research delivered in this dissertation.

Human-in-the-Loop: Smart contracts are designed and implemented by human developers to interact with human users, so the human is the central component of a smart contract ecosystem. Yet, most existing smart contract security studies do not take the human factor seriously. We are the first to study social engineering attacks in smart contracts. We developed six zero-day attacks and demonstrated that most of them remain dormant on the testnet and activate their malicious capacity only when deployed on a production network.

Blockchain Evolution, not Revolution: A significant motivational factor for blockchain integration is that the technology practically enables smart contracts, which existed only as a concept before the era of blockchain. Together, blockchain and smart contracts deliver some unique important properties, such as full or partial decentralization, non-repudiation, permanent recording, and trustless computation. These key properties enable a broad spectrum of important decentralized applications, such as decentralized finance, various data certifications and proofs, unaffiliated identity and key management, different voting and election schemes, legal contracts, and many more, spurring the notion of blockchain as a revolutionary technology. However, the reality suggests that despite its enormous potential, there is nothing revolutionary about blockchain. Our research and careful observation suggest that instead of a blockchain revolution, we are facing a steady blockchain evolution. Specifically, like any other technology, blockchain began with the idea (e.g., fair machine), followed by specific concepts (e.g., reasonably hard computation), then turning into first prototypes (e.g., Ethereum).
The next step, integration, is a painstaking process of making the technology more secure, scalable, and usable, which is the main goal of the research in this dissertation.

Distributed Service versus Utility: Our research shows that distributed computation is transitioning from the concept of service towards the concept of next-generation utility, which is to be provided in a generic form, separated from specific apps. For example, using cloud as a utility allows a smartphone user to “plug” their apps into the cloud(s) of their choice, instead of using the cloud predetermined by developers. Furthermore, we discovered sufficient evidence that cloud providers' existing user data collection practices often breach user privacy, which necessitates the use of zero-knowledge protocols. As the first step in this evolutionary transformation, we created the first distributed framework for smart contracts on the cloud, called Blockumulus. Instead of completely replacing blockchain with a cloud, Blockumulus uses the Ethereum blockchain as a guarantor of the permissionless properties of cloud contracts. Our future vision of privacy-preserving cloud with sovereign identities further elaborates on the idea of distributed computation-as-a-utility.

Scalability is Multi-Dimensional: Blockchain is notorious for trading performance for decentralization, known as the blockchain scalability problem that manifests in limited computation, bounded data storage, and insufficient transaction throughput. In addition to the performance-decentralization trade-off, sometimes referred to as the blockchain scalability trilemma [140], we also discovered an inter-performance trade-off in decentralized distributed systems, which includes the balance between transaction throughput, computation, and data storage. To tackle this problem, we proposed a shift of approach.
Instead of delivering scalability to blockchain, as in previous solutions, we port blockchain properties into a distributed system that is already scalable, i.e., the cloud.

8.4: Future Work

In the future, we will continue advancing the frontier of adoption of decentralized distributed systems by addressing their security, scalability, and usability challenges, as outlined below.

Security Data Flow Analysis via Parsing: Our preliminary work demonstrates the ability to automatically detect some covert security threats, such as overflows and backdoors, across a broad spectrum of software. Our novel approach proceeds in two steps: 1) parse the source code (using RPLY or a similar framework) with a special augmented grammar to extract important facts; 2) use the extracted facts for security data flow analysis (DFA) based on the Datalog declarative logic language. We recently applied this methodology to find security issues related to unsafe dependencies between Ethereum accounts in smart contracts. Our preliminary evaluation corroborates the accuracy and efficiency of the approach. Thus, this approach has the potential to be developed into a new general security method suitable for addressing a wide variety of security issues in multiple domains.

Privacy-Preserving Cloud with Self-Sovereign Identities: Previous attempts to make the cloud more privacy-preserving, user-centered, and versatile are scarce and address only a small subset of existing problems. To address the shortage of such systems, we envision a new paradigm called Cloud 2.0, which introduces the concept of Data-Execution Models (DEMs) for safeguarding and managing the communication among client apps, distributed services, and external actors (e.g., other apps).
As a result, our new cloud will enable the following three properties: 1) guaranteed user-controlled redundancy: if one service is down, the remaining ones continue working; 2) separation of apps from services: the user can switch, connect, or disconnect services according to their needs without switching to different apps; 3) enforced privacy: the user has full control of their data by design, not by promise. Achieving this ambitious vision requires extensive research to address the significant technical challenges that Cloud 2.0 faces, including, but not limited to, DEM security, backward compatibility, DEM upgrade, scalability, and performance.

Friction-Free Public-Key Authentication: Traditional password-based authentication is associated with many security, privacy, and usability issues. In contrast, public-key authentication enhances privacy, security, and flexibility. Some popular services, such as GitHub and Ethereum, successfully use public-key authentication. Yet, the adoption of this approach by many services is impeded by the need for users to learn new concepts and perform additional steps (e.g., key generation), which we call technological friction. Overcoming the friction associated with public-key authentication requires creating novel applied cryptography protocols for seamless generation of private keys, account storage, and public-key signatures. Moreover, these protocols must be compatible with legacy systems. As a result, usable public-key authentication can address many existing security and privacy problems in the modern world.

BIBLIOGRAPHY

[1] The average number of credit card transactions per day & year. https://www.cardrates.com/advice/number-of-credit-card-transactions-per-day-year/. Accessed: 2021-01-12.
[2] eduroam - World Wide Education Roaming for Research & Education. https://www.eduroam.org/. Accessed: 2020-05-10.
[3] ETH Gas Station. https://ethgasstation.info/. Accessed: 2020-05-17.
[4] Ethereum average transaction fee. https://ycharts.com/indicators/ethereum_average_transaction_fee. Accessed: 2021-01-13.
[5] Ethereum daily transactions chart. https://etherscan.io/chart/tx. Accessed: 2021-01-13.
[6] Solidity coverage. https://github.com/sc-forks/solidity-coverage. Accessed: 2021-11-12.
[7] Waffle. https://getwaffle.io/. Accessed: 2021-11-12.
[8] Cisco meraki for sp public wifi. http://marketo.meraki.com/rs/010-KNZ-501/images/Meraki_for_SP_Public_WiFi.pdf, 2019. Accessed: 2020-04-03.
[9] Cloud managed networking. https://www.arubanetworks.com/solutions/cloud-managed/, 2019. Accessed: 2020-04-10.
[10] Digitalocean. https://www.digitalocean.com, 2019. Accessed: 2020-04-03.
[11] Infura: Scalable blockchain infrastructure. https://github.com/INFURA, 2019. Accessed: 2020-04-03.
[12] Passpoint. https://www.wi-fi.org/discover-wi-fi/passpoint, 2019. Accessed: 2020-04-03.
[13] Qlc chain. https://medium.com/qlc-chain/chain/home, 2019. Accessed: 2020-04-03.
[14] Ruckus cloud wi-fi. https://www.ruckuswireless.com/products/system-management-control/cloud-wifi, 2019. Accessed: 2020-04-03.
[15] Winq. https://winq.net/, 2019. Accessed: 2020-04-03.
[16] BIP-32 Protocol. https://github.com/bitcoin/bips/blob/master/bip-0032.mediawiki, 2020. Accessed: 2020-02-27.
[17] Ledger SAS. https://www.ledger.com, 2020. Accessed: 2020-02-27.
[18] Satoshi Labs. https://satoshilabs.com/, 2020. Accessed: 2020-02-27.
[19] ShapeShift. https://shapeshift.io, 2020. Accessed: 2020-02-27.
[20] Ethereum development documentation. https://ethereum.org/en/developers/docs/, 2021.
[21] Ethereum virtual machine opcodes. https://www.ethervm.io/, 2021.
[22] Infura. https://infura.io/, 2021.
[23] Myetherwallet. https://www.myetherwallet.com/, 2021.
[24] Pokt. https://pokt.network/, 2021.
[25] Artificial intelligence (AI) for cybersecurity. https://www.ibm.com/security/artificial-intelligence, 2022. Accessed: 2022-03-07.
[26] Dedaub Contract Library.
https://dedaub.com/contract-library, 2022. Accessed: 2022-02-28.
[27] EOS.IO Technical White Paper v2. https://github.com/EOSIO/Documentation/blob/master/TechnicalWhitePaper.md, 2022. Accessed: 2022-03-07.
[28] Etherscan Token Tracker. https://etherscan.io/tokens, 2022. Accessed: 2022-03-05.
[29] Go implementation of mev-auction for ethereum. https://github.com/flashbots/mev-geth, 2022.
[30] Miner extractable value (mev). https://ethereum.org/en/developers/docs/mev/, 2022.
[31] MythX. https://mythx.io/, 2022. Accessed: 2022-02-26.
[32] Neo White Paper. https://docs.neo.org/v2/docs/en-us/basic/whitepaper.html, 2022. Accessed: 2022-03-07.
[33] OpenZeppelin Contracts. https://openzeppelin.com/contracts/, 2022. Accessed: 2022-02-28.
[34] Polygon. https://polygon.technology/, 2022.
[35] Rsk whitepaper. https://www.rsk.co/Whitepapers/RSK_White_Paper-ORIGINAL.pdf, 2022.
[36] Swc-100: Function default visibility. https://swcregistry.io/docs/SWC-100, 2022. Accessed: 2022-03-21.
[37] Swc-107: Reentrancy. https://swcregistry.io/docs/SWC-107, 2022. Accessed: 2022-03-21.
[38] Swc-108: State variable default visibility. https://swcregistry.io/docs/SWC-108, 2022. Accessed: 2022-03-21.
[39] Swc-119: Shadowing state variables. https://swcregistry.io/docs/SWC-119, 2022. Accessed: 2022-04-17.
[40] Swc-123: Requirement violation. https://swcregistry.io/docs/SWC-123, 2022. Accessed: 2022-03-10.
[41] Swc-130: Right-to-left-override control character (u+202e). https://swcregistry.io/docs/SWC-130, 2022. Accessed: 2022-04-17.
[42] Swc registry. https://swcregistry.io/, 2022. Accessed: 2022-03-15.
[43] Z3prover/z3. https://github.com/Z3Prover/z3, 2022.
[44] Tesnim Abdellatif and Kei-Léo Brousmiche. Formal verification of smart contracts based on users and blockchain behaviors models. In 2018 9th IFIP International Conference on New Technologies, Mobility and Security (NTMS), pages 1–5. IEEE, 2018.
[45] Lawrence Abrams.
Clipboard hijacker malware monitors 2.3 million bitcoin addresses. https://www.bleepingcomputer.com/news/security/clipboard-hijacker-malware-monitors-23-million-bitcoin-addresses/, 2018.
[46] Wolfgang Ahrendt, Richard Bubel, Joshua Ellul, Gordon J Pace, Raúl Pardo, Vincent Rebiscoul, and Gerardo Schneider. Verification of smart contract business logic. In International Conference on Fundamentals of Software Engineering, pages 228–243. Springer, 2019.
[47] Sefa Akca, Ajitha Rajan, and Chao Peng. Solanalyser: A framework for analysing and testing smart contracts. In 2019 26th Asia-Pacific Software Engineering Conference (APSEC), pages 482–489. IEEE, 2019.
[48] Elvira Albert, Jesús Correas, Pablo Gordillo, Guillermo Román-Díez, and Albert Rubio. Safevm: a safety verifier for ethereum smart contracts. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 386–389, 2019.
[49] Elvira Albert, Pablo Gordillo, Albert Rubio, and Ilya Sergey. Running on fumes. In International Conference on Verification and Evaluation of Computer and Communication Systems, pages 63–78. Springer, 2019.
[50] Emad Almutairi and Shiroq Al-Megren. Usability and security analysis of the keepkey wallet. In 2019 IEEE International Conference on Blockchain and Cryptocurrency (ICBC), pages 149–153. IEEE, 2019.
[51] Sarra Alqahtani, Xinchi He, Rose Gamble, and Papa Mauricio. Formal verification of functional requirements for smart contract compositions in supply chain management systems. In Proceedings of the 53rd Hawaii International Conference on System Sciences, 2020.
[52] Sidney Amani, Myriam Bégel, Maksym Bortin, and Mark Staples. Towards verifying ethereum smart contract bytecode in isabelle/hol. In Proceedings of the 7th ACM SIGPLAN International Conference on Certified Programs and Proofs, pages 66–77, 2018.
[53] Elli Androulaki, Artem Barger, Vita Bortnikov, Christian Cachin, Konstantinos Christidis, Angelo De Caro, David Enyeart, Christopher Ferris, Gennady Laventman, Yacov Manevich, et al. Hyperledger fabric: a distributed operating system for permissioned blockchains. In Proceedings of the thirteenth EuroSys conference, pages 1–15, 2018.
[54] Pedro Antonino and AW Roscoe. Solidifier: bounded model checking solidity using lazy contract deployment and precise memory modelling. In ACM Symposium on Applied Computing, pages 1788–1797, 2021.
[55] Andreas M Antonopoulos and Gavin Wood. Mastering Ethereum: Building Smart Contracts and Dapps. O’Reilly Media, 2018.
[56] Mauro Argañaraz, Mario Berón, Maria João Pereira, and Pedro Henriques. Detection of vulnerabilities in smart contracts specifications in ethereum platforms. In 9th Symposium on Languages, Applications and Technologies (SLATE 2020), volume 83, pages 1–16. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2020.
[57] Nicola Atzei, Massimo Bartoletti, and Tiziana Cimoli. A survey of attacks on ethereum smart contracts (sok). In International conference on principles of security and trust, pages 164–186. Springer, 2017.
[58] Nicola Atzei, Massimo Bartoletti, Stefano Lande, Nobuko Yoshida, and Roberto Zunino. Developing secure bitcoin contracts with bitml. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 1124–1128, 2019.
[59] Xiaomin Bai, Zijing Cheng, Zhangbo Duan, and Kai Hu. Formal modeling and verification of smart contracts. In Proceedings of the 2018 7th international conference on software and computer applications, pages 322–326, 2018.
[60] Massimo Bartoletti and Roberto Zunino. Verifying liquidity of bitcoin contracts.
In 8th International Conference on Principles of Security and Trust, POST 2019 Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2019, volume 11426, pages 222–247. Springer, 2019.
[61] Bernhard Beckert, Mihai Herda, Michael Kirsten, and Jonas Schiffl. Formal specification and verification of hyperledger fabric chaincode. In Proc. Int. Conf. Formal Eng. Methods, pages 44–48, 2018.
[62] Matthew Beedham. This cryptocurrency stealing malware was blocked more than 360,000 times over the past year. https://thenextweb.com/news/cryptocurrency-malware-blocked-360000-times, 2019. Accessed: 2021-04-12.
[63] Elisa Bertino, Murat Kantarcioglu, Cuneyt Gurcan Akcora, Sagar Samtani, Sudip Mittal, and Maanak Gupta. Ai for security and security for ai. In Proceedings of the Eleventh ACM Conference on Data and Application Security and Privacy, pages 333–334, 2021.
[64] Karthikeyan Bhargavan, Antoine Delignat-Lavaud, Cédric Fournet, Anitha Gollamudi, Georges Gonthier, Nadim Kobeissi, Natalia Kulatova, Aseem Rastogi, Thomas Sibut-Pinote, Nikhil Swamy, et al. Formal verification of smart contracts: Short paper. In Proceedings of the 2016 ACM workshop on programming languages and analysis for security, pages 91–96, 2016.
[65] Giancarlo Bigi, Andrea Bracciali, Giovanni Meacci, and Emilio Tuosto. Validation of decentralised smart contracts through game theory and formal methods. In Programming Languages with Applications to Biology and Security, pages 142–161. Springer, 2015.
[66] Alex Biryukov, Dmitry Khovratovich, and Sergei Tikhomirov. Findel: Secure derivative contracts for ethereum. In International Conference on Financial Cryptography and Data Security, pages 453–467. Springer, 2017.
[67] David Bisson. Bitcoin stealer malware takes 60k using clipboard modification method. https://securityintelligence.com/news/bitcoin-stealer-malware-takes-60k-using-clipboard-modification-method/, 2018.
[68] Sujit Biswas, Kashif Sharif, Fan Li, Sabita Maharjan, Saraju P Mohanty, and Yu Wang. Pobt: A lightweight consensus algorithm for scalable iot business blockchain. IEEE Internet of Things Journal, 7(3):2343–2355, 2019.
[69] Joseph Bonneau, Andrew Miller, Jeremy Clark, Arvind Narayanan, Joshua A Kroll, and Edward W Felten. Sok: Research perspectives and challenges for bitcoin and cryptocurrencies. In 2015 IEEE Symposium on Security and Privacy, pages 104–121, 2015.
[70] Priyanka Bose, Dipanjan Das, Yanju Chen, Yu Feng, Christopher Kruegel, and Giovanni Vigna. Sailfish: Vetting smart contract state-inconsistency bugs in seconds. arXiv preprint arXiv:2104.08638, 2021.
[71] Sean Bowe, Alessandro Chiesa, Matthew Green, Ian Miers, Pratyush Mishra, and Howard Wu. Zexe: Enabling decentralized private computation. In 2020 IEEE Symposium on Security and Privacy (SP), pages 947–964. IEEE, 2020.
[72] Santiago Bragagnolo, Henrique Rocha, Marcus Denker, and Stéphane Ducasse. Smartinspect: solidity smart contract inspector. In 2018 International workshop on blockchain oriented software engineering (IWBOSE), pages 9–18. IEEE, 2018.
[73] Lorenz Breidenbach, Phil Daian, Ari Juels, and Emin Gün Sirer. An in-depth look at the parity multisig bug. Hacking, Distributed, July, 2017.
[74] Lorenz Breidenbach, Phil Daian, Florian Tramèr, and Ari Juels. Enter the hydra: Towards principled bug bounties and exploit-resistant smart contracts. In USENIX Security 18, pages 1335–1352, 2018.
[75] Lexi Brent, Neville Grech, Sifis Lagouvardos, Bernhard Scholz, and Yannis Smaragdakis. Ethainter: A smart contract security analyzer for composite vulnerabilities. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 454–469, 2020.
[76] Lexi Brent, Anton Jurisevic, Michael Kong, Eric Liu, Francois Gauthier, Vincent Gramoli, Ralph Holz, and Bernhard Scholz. Vandal: A scalable security analysis framework for smart contracts.
arXiv preprint arXiv:1809.03981, 2018.
[77] R Browne. Accidental bug may have frozen 280 million worth of digital coin ether in a cryptocurrency wallet. https://www.cnbc.com/2017/11/08/accidental-bug-may-have-frozen-280-worth-of-ether-on-parity-wallet.html, 2017.
[78] Vitalik Buterin. A next-generation smart contract and decentralized application platform. white paper, 2014.
[79] Vitalik Buterin, Eric Conner, Rick Dudley, Matthew Slipper, Ian Norden, and Abdelhamid Bakhta. Eip-1559: Fee market change for eth 1.0 chain. https://eips.ethereum.org/EIPS/eip-1559, 2019.
[80] Ramiro Camino, Christof Ferreira Torres, Mathis Baden, and Radu State. A data science approach for detecting honeypots in ethereum. In 2020 IEEE International Conference on Blockchain and Cryptocurrency (ICBC), pages 1–9. IEEE, 2020.
[81] Roberto Casado-Vara and Juan Corchado. Distributed e-health wide-world accounting ledger via blockchain. Journal of Intelligent & Fuzzy Systems, 36(3):2381–2386, 2019.
[82] Christian Catalini. How blockchain technology will impact the digital economy. Blockchains Smart Contracts Internet Things, 4:2292–2303, 2017.
[83] Ethan Cecchetti, Siqiu Yao, Haobin Ni, and Andrew C Myers. Compositional security for reentrant applications. contract, 12(13):14.
[84] Ethan Cecchetti, Siqiu Yao, Haobin Ni, and Andrew C Myers. Securing smart contracts with information flow. In International Symposium on Foundations and Applications of Blockchain, 2020.
[85] Ethan Cecchetti, Siqiu Yao, Haobin Ni, and Andrew C Myers. Compositional security for reentrant applications. arXiv preprint arXiv:2103.08577, 2021.
[86] Etherscan Information Center. How to “cancel” ethereum pending transactions? https://info.etherscan.com/how-to-cancel-ethereum-pending-transactions/, 2021.
[87] ChainSecurity. Zero gas price transactions - what they do, who creates them, and why they might impact scalability. https://tinyurl.com/4rfpafp4, 2019.
[88] S Sibi Chakkaravarthy, D Sangeetha, and V Vaidehi.
A survey on malware analysis and mitigation techniques. Computer Science Review, 32:1–23, 2019.
[89] Jialiang Chang, Bo Gao, Hao Xiao, Jun Sun, Yan Cai, and Zijiang Yang. scompile: Critical path identification and analysis for smart contracts. In International Conference on Formal Engineering Methods, pages 286–304. Springer, 2019.
[90] Nakul Chawla, Hans Walter Behrens, Darren Tapp, Dragan Boscovic, and K Selçuk Candan. Velocity: Scalability improvements in block propagation through rateless erasure coding. In 2019 IEEE International Conference on Blockchain and Cryptocurrency (ICBC), pages 447–454. IEEE, 2019.
[91] Huashan Chen, Marcus Pendleton, Laurent Njilla, and Shouhuai Xu. A survey on ethereum systems security: Vulnerabilities, attacks, and defenses. ACM Computing Surveys (CSUR), 53(3):1–43, 2020.
[92] Jiachi Chen, Xin Xia, David Lo, John Grundy, Xiapu Luo, and Ting Chen. Defining smart contract defects on ethereum. IEEE Transactions on Software Engineering, 2020.
[93] Jiachi Chen, Xin Xia, David Lo, John Grundy, Xiapu Luo, and Ting Chen. Defectchecker: Automated smart contract defect detection by analyzing evm bytecode. IEEE Transactions on Software Engineering, 2021.
[94] Ting Chen, Rong Cao, Ting Li, Xiapu Luo, Guofei Gu, Yufei Zhang, Zhou Liao, Hang Zhu, Gang Chen, Zheyuan He, et al. Soda: A generic online detection framework for smart contracts. In NDSS. The Internet Society, 2020.
[95] Ting Chen, Xiaoqi Li, Xiapu Luo, and Xiaosong Zhang. Under-optimized smart contracts devour your money. In 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER), pages 442–446. IEEE, 2017.
[96] Ting Chen, Yufei Zhang, Zihao Li, Xiapu Luo, Ting Wang, Rong Cao, Xiuzhuo Xiao, and Xiaosong Zhang. Tokenscope: Automatically detecting inconsistent behaviors of cryptocurrency tokens in ethereum. In Proc. of CCS, pages 1503–1520, 2019.
[97] Weili Chen, Zibin Zheng, Jiahui Cui, Edith Ngai, Peilin Zheng, and Yuren Zhou.
Detecting ponzi schemes on ethereum: Towards healthier blockchain technology. In Proceedings of the 2018 World Wide Web Conference, pages 1409–1418, 2018.
[98] Raymond Cheng, Fan Zhang, Jernej Kos, Warren He, Nicholas Hynes, Noah Johnson, Ari Juels, Andrew Miller, and Dawn Song. Ekiden: A platform for confidentiality-preserving, trustworthy, and performant smart contracts. In 2019 IEEE European Symposium on Security and Privacy (EuroS&P), pages 185–200. IEEE, 2019.
[99] Mung Chiang. Networked Life: 20 Questions and Answers, chapter How WiFi is different from cellular, pages 406–409. Cambridge University Press, 2012.
[100] Yuchiro Chinen, Naoto Yanai, Jason Paul Cruz, and Shingo Okamura. Ra: Hunting for re-entrancy attacks in ethereum smart contracts via static analysis. In 2020 IEEE International Conference on Blockchain (Blockchain), pages 327–336. IEEE, 2020.
[101] Vultr Holdings Corporation. Vultr. https://www.vultr.com, 2019. Accessed: 2020-04-03.
[102] Lorrie Faith Cranor, Serge Egelman, Jason I Hong, and Yue Zhang. Phinding phish: An evaluation of anti-phishing toolbars. In NDSS, pages 1–19, 2007.
[103] Xiaohai Dai, Jiang Xiao, Wenhui Yang, Chaofan Wang, and Hai Jin. Jidar: A jigsaw-like data reduction approach without trust assumptions for bitcoin system. In 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS), pages 1317–1326. IEEE, 2019.
[104] Josh Datko, Chris Quartier, and Kirill Belyayev. Breaking bitcoin hardware wallets. DEF CON 2017, 2017.
[105] Christian Decker and Roger Wattenhofer. Information propagation in the bitcoin network. In IEEE P2P 2013 Proceedings, pages 1–10. IEEE, 2013.
[106] Amir Dembo, Sreeram Kannan, Ertem Nusret Tas, David Tse, Pramod Viswanath, Xuechao Wang, and Ofer Zeitouni. Everything is a race and nakamoto always wins. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, pages 859–878, 2020.
[107] Monika Di Angelo and Gernot Salzer.
A survey of tools for analyzing ethereum smart contracts. In 2019 IEEE International Conference on Decentralized Applications and Infrastructures (DAPPCON), pages 69–78. IEEE, 2019.
[108] Mengjie Ding, Peiru Li, Shanshan Li, and He Zhang. Hfcontractfuzzer: Fuzzing hyperledger fabric smart contracts for vulnerability detection. In Evaluation and Assessment in Software Engineering, pages 321–328. 2021.
[109] Wang Duo, Huang Xin, and Ma Xiaofeng. Formal analysis of smart contract based on colored petri nets. IEEE Intelligent Systems, 35(3):19–30, 2020.
[110] Stefan Dziembowski, Lisa Eckey, Sebastian Faust, and Daniel Malinowski. Perun: Virtual payment channels over cryptographic currencies. IACR Cryptol. ePrint Arch., 2017:635, 2017.
[111] Jacob Eberhardt and Stefan Tai. On or off the blockchain? insights on off-chaining computation and data. In European Conference on Service-Oriented and Cloud Computing, pages 3–15. Springer, 2017.
[112] Joshua Ellul and Gordon J Pace. Runtime verification of ethereum smart contracts. In 2018 14th European Dependable Computing Conference (EDCC), pages 158–163. IEEE, 2018.
[113] Ethereum. Web3.js. https://web3js.readthedocs.io/en/v1.2.11/web3-eth.html, 2021.
[114] Josselin Feist, Gustavo Grieco, and Alex Groce. Slither: a static analysis framework for smart contracts. In 2019 IEEE/ACM 2nd International Workshop on Emerging Trends in Software Engineering for Blockchain (WETSEB), pages 8–15. IEEE, 2019.
[115] Yu Feng, Emina Torlak, and Rastislav Bodik. Precise attack synthesis for smart contracts. arXiv preprint arXiv:1902.06067, 2019.
[116] Yu Feng, Emina Torlak, and Rastislav Bodik. Summary-based symbolic evaluation for smart contracts. In 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 1141–1152. IEEE, 2020.
[117] Christof Ferreira Torres, Mathis Baden, Robert Norvill, Beltran Borja Fiz Pontiveros, Hugo Jonker, and Sjouke Mauw.
Ægis: Shielding vulnerable smart contracts against attacks. In Proceedings of the 15th ACM Asia Conference on Computer and Communications Security, pages 584–597, 2020.
[118] Christof Ferreira Torres, Ramiro Camino, et al. Frontrunner jones and the raiders of the dark forest: An empirical study of frontrunning on the ethereum blockchain. In USENIX Security Symposium, Virtual 11-13 August 2021, 2021.
[119] Christof Ferreira Torres, Antonio Ken Iannillo, Arthur Gervais, et al. Confuzzius: A data dependency-aware hybrid fuzzer for smart contracts. 2021.
[120] Christof Ferreira Torres, Antonio Ken Iannillo, Arthur Gervais, et al. The eye of horus: Spotting and analyzing attacks on ethereum smart contracts. In International Conference on Financial Cryptography and Data Security, Grenada 1-5 March 2021, 2021.
[121] Karoline Figueiredo, Ahmed WA Hammad, Assed Haddad, and Vivian WY Tam. Assessing the usability of blockchain for sustainability: Extending key themes to the construction industry. Journal of Cleaner Production, page 131047, 2022.
[122] Joel Frank, Cornelius Aschermann, and Thorsten Holz. ETHBMC: A bounded model checker for smart contracts. In 29th USENIX Security Symposium (USENIX Security 20), pages 2757–2774, 2020.
[123] Ernesto Frontera. A History of The DAO Hack. https://coinmarketcap.com/alexandria/article/a-history-of-the-dao-hack. Accessed: 2022-02-15.
[124] Anthony Y Fu, Xiaotie Deng, Liu Wenyin, and Greg Little. The methodology and an application to fight against unicode attacks. In Proceedings of the second symposium on Usable privacy and security, pages 91–101, 2006.
[125] Ying Fu, Meng Ren, Fuchen Ma, Yu Jiang, Heyuan Shi, and Jiaguang Sun. Evmfuzz: Differential fuzz testing of ethereum virtual machine. arXiv preprint arXiv:1903.08483, 2019.
[126] Ying Fu, Meng Ren, Fuchen Ma, Heyuan Shi, Xin Yang, Yu Jiang, Huizhong Li, and Xiang Shi. Evmfuzzer: detect evm vulnerabilities via fuzz testing.
In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 1110–1114, 2019.
[127] Jianbo Gao, Han Liu, Chao Liu, Qingshan Li, Zhi Guan, and Zhong Chen. Easyflow: Keep ethereum away from overflow. In 2019 IEEE/ACM 41st International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), pages 23–26. IEEE, 2019.
[128] Daniel E Geer Jr. Complexity is the enemy. IEEE Security & Privacy, 6(6):88–88, 2008.
[129] Serenity Gibbons. 3 practical ways to use blockchain in your business in 2020. Forbes.
[130] Yossi Gilad, Rotem Hemo, Silvio Micali, Georgios Vlachos, and Nickolai Zeldovich. Algorand: Scaling byzantine agreements for cryptocurrencies. In Proceedings of the 26th Symposium on Operating Systems Principles, pages 51–68, 2017.
[131] Andriana Gkaniatsou, Myrto Arapinis, and Aggelos Kiayias. Low-level attacks in bitcoin wallets. In International Conference on Information Security, pages 233–253. Springer, 2017.
[132] Dan Goodin. Really stupid “smart contract” bug let hackers steal $31 million in digital coin. https://arstechnica.com/information-technology/2021/12/hackers-drain-31-million-from-cryptocurrency-service-monox-finance/, 2021.
[133] Neville Grech, Michael Kong, Anton Jurisevic, Lexi Brent, Bernhard Scholz, and Yannis Smaragdakis. Madmax: Surviving out-of-gas conditions in ethereum smart contracts. Proceedings of the ACM on Programming Languages, 2(OOPSLA):1–27, 2018.
[134] Gustavo Grieco, Will Song, Artur Cygan, Josselin Feist, and Alex Groce. Echidna: effective, usable, and fast fuzzing for smart contracts. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 557–560, 2020.
[135] Ilya Grishchenko, Matteo Maffei, and Clara Schneidewind. Ethertrust: Sound static analysis of ethereum bytecode. Technische Universität Wien, Tech. Rep, 2018.
[136] Ilya Grishchenko, Matteo Maffei, and Clara Schneidewind. A semantic framework for the security analysis of ethereum smart contracts. In International Conference on Principles of Security and Trust, pages 243–269. Springer, 2018.
[137] Shelly Grossman, Ittai Abraham, Guy Golan-Gueta, Yan Michalevsky, Noam Rinetzky, Mooly Sagiv, and Yoni Zohar. Online detection of effectively callback free objects with applications to smart contracts. Proceedings of the ACM on Programming Languages, 2(POPL):1–28, 2017.
[138] Mordechai Guri. Beatcoin: Leaking private keys from air-gapped cryptocurrency wallets. In 2018 IEEE International Conference on Internet of Things (iThings). IEEE, 2018.
[139] Gus Gutoski and Douglas Stebila. Hierarchical deterministic bitcoin wallets that tolerate key leakage. In International Conference on Financial Cryptography and Data Security, pages 497–504. Springer, 2015.
[140] Abdelatif Hafid, Abdelhakim Senhaji Hafid, and Mustapha Samih. Scaling blockchains: A comprehensive survey. IEEE Access, 8:125244–125262, 2020.
[141] Ákos Hajdu and Dejan Jovanović. solc-verify: A modular verifier for solidity smart contracts. arXiv preprint arXiv:1907.04262, 2019.
[142] Ákos Hajdu, Dejan Jovanović, and Gabriela Ciocarlie. Formal specification and verification of solidity contracts with events (short paper). In 2nd Workshop on Formal Methods for Blockchains (FMBC 2020). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2020.
[143] Qilong Han, Shuang Liang, and Hongli Zhang. Mobile cloud sensing, big data, and 5g networks make an intelligent and smart world. IEEE network, 29(2):40–45, 2015.
[144] Freya Sheer Hardwick, Apostolos Gioulis, Raja Naeem Akram, and Konstantinos Markantonakis. E-voting with blockchain: An e-voting protocol with decentralisation and voter privacy. In 2018 IEEE International Conference on Internet of Things (iThings). IEEE, 2018.
[145] Martie G Haselton, Daniel Nettle, and Damian R Murray. The evolution of cognitive bias.
The handbook of evolutionary psychology, pages 1–20, 2015.
[146] Jingxuan He, Mislav Balunović, Nodar Ambroladze, Petar Tsankov, and Martin Vechev. Learning to fuzz from symbolic execution with application to smart contracts. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pages 531–548, 2019.
[147] Ningyu He, Ruiyi Zhang, Haoyu Wang, Lei Wu, Xiapu Luo, Yao Guo, Ting Yu, and Xuxian Jiang. EOSAFE: Security analysis of EOSIO smart contracts. In 30th USENIX Security Symposium (USENIX Security 21), pages 1271–1288, 2021.
[148] Everett Hildenbrandt, Manasvi Saxena, Nishant Rodrigues, Xiaoran Zhu, Philip Daian, Dwight Guth, Brandon Moore, Daejun Park, Yi Zhang, Andrei Stefanescu, et al. Kevm: A complete formal semantics of the ethereum virtual machine. In 2018 IEEE 31st Computer Security Foundations Symposium (CSF), pages 204–217. IEEE, 2018.
[149] Grant Ho, Asaf Cidon, Lior Gavish, Marco Schweighauser, Vern Paxson, Stefan Savage, Geoffrey M Voelker, and David Wagner. Detecting and characterizing lateral phishing at scale. In 28th USENIX Security Symposium (USENIX Security 19), pages 1273–1290, 2019.
[150] Tobias Holgers, David E Watson, and Steven D Gribble. Cutting through the confusion: A measurement study of homograph attacks. In USENIX Annual Technical Conference, pages 261–266, 2006.
[151] Hang Hu and Gang Wang. End-to-end measurements of email spoofing attacks. In 27th USENIX Security Symposium (USENIX Security 18), pages 1095–1112, 2018.
[152] Teng Hu, Xiaolei Liu, Ting Chen, Xiaosong Zhang, Xiaoming Huang, Weina Niu, Jiazhong Lu, Kun Zhou, and Yuan Liu. Transaction-based classification and detection approach for ethereum smart contract. Information Processing & Management, 58(2):102462, 2021.
[153] Xinwen Hu, Yi Zhuang, Shang-Wei Lin, Fuyuan Zhang, Shuanglong Kan, and Zining Cao. A security type verifier for smart contracts. Computers & Security, page 102343, 2021.
[154] Jianjun Huang, Songming Han, Wei You, Wenchang Shi, Bin Liang, Jingzheng Wu, and Yanjun Wu. Hunting vulnerable smart contracts via graph embedding based bytecode matching. IEEE Transactions on Information Forensics and Security, 16:2144–2156, 2021.
[155] Deloitte Insights. Deloitte’s 2019 global blockchain survey. Blockchain Gets Down to Business. Deloitte, 2019.
[156] Nikolay Ivanov, Hanqing Guo, and Qiben Yan. Rectifying administrated erc20 tokens. In International Conference on Information and Communications Security, pages 22–37. Springer, 2021.
[157] Nikolay Ivanov, Chenning Li, Qiben Yan, Zhiyuan Sun, Zhichao Cao, and Xiapu Luo. Security threat mitigation for smart contracts: A survey, 2023.
[158] Nikolay Ivanov, Jianzhi Lou, Ting Chen, Jin Li, and Qiben Yan. Targeting the weakest link: Social engineering attacks in ethereum smart contracts. In Proceedings of the 2021 ACM Asia Conference on Computer and Communications Security, pages 787–801, 2021.
[159] Nikolay Ivanov, Jianzhi Lou, and Qiben Yan. Smart wifi: Universal and secure smart contract-enabled wifi hotspot. In International Conference on Security and Privacy in Communication Systems, pages 425–445. Springer, 2020.
[160] Nikolay Ivanov and Qiben Yan. Ethclipper: A clipboard meddling attack on hardware wallets with address verification evasion. In 2021 IEEE Conference on Communications and Network Security (CNS), pages 191–199, 2021.
[161] Nikolay Ivanov, Qiben Yan, and Anurag Kompalli. Txt: Real-time transaction encapsulation for ethereum smart contracts. IEEE Transactions on Information Forensics and Security, 18:1141–1155, 2023.
[162] Nikolay Ivanov, Qiben Yan, and Qingyang Wang. Blockumulus: a scalable framework for smart contracts on the cloud. In 2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS), pages 607–617. IEEE, 2021.
[163] Bo Jiang, Ye Liu, and WK Chan. Contractfuzzer: Fuzzing smart contracts for vulnerability detection. In ASE. IEEE, 2018.
[164] Tigang Jiang, Hua Fang, and Honggang Wang. Blockchain-based internet of vehicles: Dis- tributed network architecture and performance analysis. IEEE Internet of Things Journal, 6(3):4640–4649, 2018. [165] Jiao Jiao, Shuanglong Kan, Shang-Wei Lin, David Sanan, Yang Liu, and Jun Sun. Semantic understanding of smart contracts: executable operational semantics of solidity. In 2020 IEEE Symposium on Security and Privacy (SP), pages 1695–1712. IEEE, 2020. [166] Ling Jin, Yinzhi Cao, Yan Chen, Di Zhang, and Simone Campanoni. Exgen: Cross-platform, automated exploit generation for smart contract vulnerabilities. IEEE Transactions on De- pendable and Secure Computing, 2022. [167] Harry Kalodner, Steven Goldfeder, Xiaoqi Chen, S Matthew Weinberg, and Edward W Fel- ten. Arbitrum: Scalable, private smart contracts. In 27th {USENIX} Security Symposium ({USENIX} Security 18), pages 1353–1370, 2018. [168] Sukrit Kalra, Seep Goel, Mohan Dhawan, and Subodh Sharma. Zeus: Analyzing safety of smart contracts. In Ndss, pages 1–12, 2018. [169] Andreas Kappes, Ann H Harvey, Terry Lohrenz, P Read Montague, and Tali Sharot. Confir- mation bias in the utilization of others’ opinion strength. Nature neuroscience, 23(1):130– 137, 2020. [170] Daniel Karrenberg, Mark A. Kosters, Raymond Plzak, and Randy Bush. Root Name Server Operational Requirements. RFC 2870, June 2000. 218 [171] Jonathan Katz and Yehuda Lindell. Introduction to modern cryptography. CRC press, 2020. [172] Abdul Ghaffar Khan, Amjad Hussain Zahid, Muzammil Hussain, and Usama Riaz. Security of cryptocurrency using hardware wallet and qr code. In 2019 International Conference on Innovative Computing (ICIC), pages 1–10. IEEE, 2019. [173] Aggelos Kiayias, Alexander Russell, Bernardo David, and Roman Oliynykov. Ouroboros: A provably secure proof-of-stake blockchain protocol. In Annual International Cryptology Conference, pages 357–388. Springer, 2017. [174] James C King. Symbolic execution and program testing. 
Communications of the ACM, 19(7):385–394, 1976. [175] Eleftherios Kokoris-Kogias, Philipp Jovanovic, Linus Gasser, Nicolas Gailly, Ewa Syta, and Bryan Ford. Omniledger: A secure, scale-out, decentralized ledger via sharding. In 2018 IEEE Symposium on Security and Privacy (SP), pages 583–598. IEEE, 2018. [176] Aashish Kolluri, Ivica Nikolic, Ilya Sergey, Aquinas Hobor, and Prateek Saxena. Exploiting the laws of order in smart contracts. In Proceedings of the 28th ACM SIGSOFT international symposium on software testing and analysis, pages 363–373, 2019. [177] Jaturong Kongmanee, Phongphun Kijsanayothin, and Rattikorn Hewett. Securing smart contracts in blockchain. In 2019 34th IEEE/ACM International Conference on Automated Software Engineering Workshop (ASEW), pages 69–76. IEEE, 2019. [178] Johannes Krupp and Christian Rossow. teether: Gnawing at ethereum to automatically ex- ploit smart contracts. In 27th USENIX Security Symposium (USENIX Security 18), pages 1317–1333, 2018. [179] Murat Kuzlu, Manisa Pipattanasomporn, Levent Gurses, and Saifur Rahman. Performance analysis of a hyperledger fabric blockchain framework: throughput, latency and scalability. In 2019 IEEE international conference on blockchain (Blockchain), pages 536–540. IEEE, 2019. [180] Yujin Kwon, Jian Liu, Minjeong Kim, Dawn Song, and Yongdae Kim. Impossibility of full decentralization in permissionless blockchains. In Proceedings of the 1st ACM Conference on Advances in Financial Technologies, 2019. [181] Heiner Lasi, Peter Fettke, Hans-Georg Kemper, Thomas Feld, and Michael Hoffmann. In- dustry 4.0. Business & information systems engineering, 6(4):239–242, 2014. [182] Ton Chanh Le, Lei Xu, Lin Chen, and Weidong Shi. Proving conditional termination for smart contracts. In Proceedings of the 2nd ACM Workshop on Blockchains, Cryptocurren- cies, and Contracts, pages 57–59, 2018. [183] Ao Li, Jemin Andrew Choi, and Fan Long. Securing smart contract with runtime validation. 
In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 438–453, 2020. 219 [184] Chenning Li, Zhichao Cao, and Yunhao Liu. Deep ai enabled ubiquitous wireless sensing: A survey. ACM Computing Surveys (CSUR), 54(2):1–35, 2021. [185] Xiaoqi Li, Peng Jiang, Ting Chen, Xiapu Luo, and Qiaoyan Wen. A survey on the security of blockchain systems. Future Generation Computer Systems, 107:841–853, 2020. [186] Xiaoyu Li, Cheng Su, Yan Xiong, Wenchao Huang, and Wansen Wang. Formal verification of bnb smart contract. In 2019 5th International Conference on Big Data Computing and Communications (BIGCOM), pages 74–78. IEEE, 2019. [187] Yue Li. Finding concurrency exploits on smart contracts. In 2019 IEEE/ACM 41st Interna- tional Conference on Software Engineering: Companion Proceedings (ICSE-Companion), pages 144–146. IEEE, 2019. [188] Yue Li, Han Liu, Zhiqiang Yang, Qian Ren, Lei Wang, and Bangdao Chen. Safepay on ethereum: A framework for detecting unfair payments in smart contracts. In 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS), pages 1219– 1222. IEEE, 2020. [189] Jian-Wei Liao, Tsung-Ta Tsai, Chia-Kang He, and Chin-Wei Tien. Soliaudit: Smart contract vulnerability assessment based on machine learning and fuzz testing. In 2019 Sixth Inter- national Conference on Internet of Things: Systems, Management and Security (IOTSMS), pages 458–465. IEEE, 2019. [190] Jun Lin, Zhiqi Shen, Anting Zhang, and Yueting Chai. Blockchain and iot based food trace- ability for smart agriculture. In Proceedings of the 3rd International Conference on Crowd Science and Engineering, pages 1–6, 2018. [191] Shlomi Linoy, Suprio Ray, and Natalia Stakhanova. Etherprov: provenance-aware detection, analysis, and mitigation of ethereum smart contract security issues. [192] Changwei Liu and Sid Stamm. Fighting unicode-obfuscated spam. 
In Proceedings of the anti-phishing working groups 2nd annual eCrime researchers summit, pages 45–59, 2007. [193] Chao Liu, Han Liu, Zhao Cao, Zhong Chen, Bangdao Chen, and Bill Roscoe. Reguard: find- ing reentrancy bugs in smart contracts. In 2018 IEEE/ACM 40th International Conference on Software Engineering: Companion (ICSE-Companion), pages 65–68. IEEE, 2018. [194] Han Liu, Chao Liu, Wenqi Zhao, Yu Jiang, and Jiaguang Sun. S-gram: towards semantic- aware security auditing for ethereum smart contracts. In 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 814–819. IEEE, 2018. [195] Hong Liu, Huansheng Ning, Qitao Mu, Yumei Zheng, Jing Zeng, Laurence T Yang, Runhe Huang, and Jianhua Ma. A review of the smart world. Future generation computer systems, 96:678–691, 2019. [196] Ye Liu, Yi Li, Shang-Wei Lin, and Qiang Yan. Modcon: A model-based testing platform for smart contracts. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 1601–1605, 2020. 220 [197] Zhenguang Liu, Peng Qian, Xiang Wang, Lei Zhu, Qinming He, and Shouling Ji. Smart contract vulnerability detection: From pure neural network to interpretable graph feature and expert pattern fusion. arXiv preprint arXiv:2106.09282, 2021. [198] Ning Lu, Bin Wang, Yongxin Zhang, Wenbo Shi, and Christian Esposito. Neucheck: A more practical ethereum smart contract security analysis tool. Software: Practice and Experience, 2019. [199] Oliver Lutz, Huili Chen, Hossein Fereidooni, Christoph Sendner, Alexandra Dmitrienko, Ahmad Reza Sadeghi, and Farinaz Koushanfar. Escort: Ethereum smart contracts vul- nerability detection using deep neural network and transfer learning. arXiv preprint arXiv:2103.12607, 2021. [200] Loi Luu, Duc-Hiep Chu, Hrishi Olickel, Prateek Saxena, and Aquinas Hobor. Making smart contracts smarter. 
In Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, pages 254–269, 2016. [201] M-Lab. Measurement lab speed test. https://speed.measurementlab.net, 2019. Accessed: 2020-04-03. [202] Fuchen Ma, Ying Fu, Meng Ren, Wanting Sun, Zhe Liu, Yu Jiang, Jun Sun, and Jiaguang Sun. Gasfuzz: Generating high gas consumption inputs to avoid out-of-gas vulnerability. arXiv preprint arXiv:1910.02945, 2019. [203] Fuchen Ma, Ying Fu, Meng Ren, Mingzhe Wang, Yu Jiang, Kaixiang Zhang, Huizhong Li, and Xiang Shi. Evm*: from offline detection to online reinforcement for ethereum virtual machine. In 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER), pages 554–558. IEEE, 2019. [204] Matteo Marescotti, Rodrigo Otoni, Leonardo Alt, Patrick Eugster, Antti EJ Hyvärinen, and Natasha Sharygina. Accurate smart contract verification through direct modelling. In In- ternational Symposium on Leveraging Applications of Formal Methods, pages 178–194. Springer, 2020. [205] Anastasia Mavridou and Aron Laszka. Designing secure ethereum smart contracts: A finite state machine based approach. In International Conference on Financial Cryptography and Data Security, pages 523–540. Springer, 2018. [206] Anastasia Mavridou, Aron Laszka, Emmanouela Stachtiari, and Abhishek Dubey. Verisolid: Correct-by-design smart contracts for ethereum. In International Conference on Financial Cryptography and Data Security, pages 446–465. Springer, 2019. [207] Muhammad Izhar Mehar, Charles Louis Shier, Alana Giambattista, Elgar Gong, Gabrielle Fletcher, Ryan Sanayhie, Henry M Kim, and Marek Laskowski. Understanding a revolu- tionary and flawed grand experiment in blockchain: the dao attack. Journal of Cases on Information Technology (JCIT), 21(1):19–32, 2019. 221 [208] Simon Meier, Benedikt Schmidt, Cas Cremers, and David Basin. The tamarin prover for the symbolic analysis of security protocols. 
In International Conference on Computer Aided Verification, pages 696–701. Springer, 2013. [209] Marvin Lee Minsky. Computation. Prentice-Hall Englewood Cliffs, 1967. [210] Ricky KP Mok, Xiapu Luo, Edmond WW Chan, and Rocky KC Chang. Qdash: a qoe-aware dash system. In Proceedings of the 3rd Multimedia Systems Conference, pages 11–22, 2012. [211] Andreas F. Molisch. Wireless Communications. Second Edition, chapter 1, page 14. John Wiley & Sons, 2011. [212] Pouyan Momeni, Yu Wang, and Reza Samavi. Machine learning model for smart contracts security analysis. In 2019 17th International Conference on Privacy, Security and Trust (PST), pages 1–6. IEEE, 2019. [213] Md Moniruzzaman, Farida Chowdhury, and Md Sadek Ferdous. Examining usability is- sues in blockchain-based cryptocurrency wallets. In Cyber Security and Computer Science: Second EAI International Conference, ICONCS 2020, Dhaka, Bangladesh, February 15-16, 2020, Proceedings 2, pages 631–643. Springer, 2020. [214] Mark Mossberg, Felipe Manzano, Eric Hennenfent, Alex Groce, Gustavo Grieco, Josselin Feist, Trent Brunson, and Artem Dinaburg. Manticore: A user-friendly symbolic execution framework for binaries and smart contracts. In 2019 34th IEEE/ACM International Confer- ence on Automated Software Engineering (ASE), pages 1186–1189. IEEE, 2019. [215] Bernhard Mueller. Smashing ethereum smart contracts for fun and real profit. In 9th Annual HITB Security Conference (HITBSecConf), volume 54, 2018. [216] Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system. Technical report, 2019. [217] Zeinab Nehai, Pierre-Yves Piriou, and Frederic Daumas. Model-checking of smart con- tracts. In 2018 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), pages 980–987. IEEE, 2018. [218] Ajaya Neupane, Md Lutfor Rahman, Nitesh Saxena, and Leanne Hirshfield. 
A multi-modal neuro-physiological study of phishing detection and malware warnings. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 479– 491, 2015. [219] Ajaya Neupane, Nitesh Saxena, Keya Kuruvilla, Michael Georgescu, and Rajesh K Kana. Neural signatures of user-centered security: An fmri study of phishing, and malware warn- ings. In NDSS, 2014. [220] Tai D Nguyen, Long H Pham, and Jun Sun. sguard: Towards fixing vulnerable smart con- tracts automatically. arXiv preprint arXiv:2101.01917. 222 [221] Tai D Nguyen, Long H Pham, Jun Sun, Yun Lin, and Quang Tran Minh. sfuzz: An effi- cient adaptive fuzzer for solidity smart contracts. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, pages 778–788, 2020. [222] Yuandong NI, Chao ZHANG, and Tingting YIN. A survey of smart contract vulnerability research. Journal of Cyber Security, 5(3):78–99, 2020. [223] Ivica Nikolić, Aashish Kolluri, Ilya Sergey, Prateek Saxena, and Aquinas Hobor. Finding the greedy, prodigal, and suicidal contracts at scale. In Proceedings of the 34th Annual Computer Security Applications Conference, pages 653–663, 2018. [224] Robert Norvill, Beltran Borja Fiz Pontiveros, Radu State, and Andrea Cullen. Visual emula- tion for ethereum’s virtual machine. In NOMS 2018-2018 IEEE/IFIP Network Operations and Management Symposium, pages 1–4. IEEE, 2018. [225] LLC. Ookla. Ookla lab speed test. https://www.speedtest.net, 2019. Accessed: 2020-04-03. [226] Santiago Palladino. The parity wallet hack explained. https://blog.zeppelin.solutions/ on-the-parity-wallet-multisig-hack-405a8c12e8f7, 2017. [227] Daejun Park, Yi Zhang, Manasvi Saxena, Philip Daian, and Grigore Roşu. A formal verifi- cation tool for ethereum vm bytecode. In ACM ESEC/FSE, pages 912–915, 2018. [228] Daniel Perez and Ben Livshits. Smart contract vulnerabilities: Vulnerable does not imply exploited. In 30th USENIX Security Symposium (USENIX Security 21), 2021. 
[229] Daniel Perez and Benjamin Livshits. Smart contract vulnerabilities: Does anyone care? arXiv preprint arXiv:1902.06710, 2019. [230] Anton Permenev, Dimitar Dimitrov, Petar Tsankov, Dana Drachsler-Cohen, and Martin Vechev. Verx: Safety verification of smart contracts. In 2020 IEEE Symposium on Secu- rity and Privacy (SP), pages 1661–1677. IEEE, 2020. [231] Joseph Poon and Vitalik Buterin. Plasma: Scalable autonomous smart contracts. White paper, pages 1–47, 2017. [232] Joseph Poon and Thaddeus Dryja. The bitcoin lightning network: Scalable off-chain instant payments, 2016. [233] Purathani Praitheeshan, Lei Pan, Jiangshan Yu, Joseph Liu, and Robin Doss. Security analysis methods on ethereum smart contract vulnerabilities: a survey. arXiv preprint arXiv:1908.08605, 2019. [234] Ravi Prasad, Constantinos Dovrolis, Margaret Murray, and KC Claffy. Bandwidth estima- tion: metrics, measurement techniques, and tools. IEEE network, 17(6):27–35, 2003. [235] Kaihua Qin, Liyi Zhou, and Arthur Gervais. Quantifying blockchain extractable value: How dark is the forest? arXiv preprint arXiv:2101.05511, 2021. 223 [236] Lijin Quan, Lei Wu, and Haoyu Wang. Evulhunter: detecting fake transfer vulnerabilities for eosio’s smart contracts at webassembly-level. arXiv preprint arXiv:1906.10362, 2019. [237] Aravind Ramachandran and Murat Kantarcioglu. Smartprovenance: a distributed, blockchain based dataprovenance system. In Proceedings of the Eighth ACM Conference on Data and Application Security and Privacy, pages 35–42, 2018. [238] Ana Reyna, Cristian Martín, Jaime Chen, Enrique Soler, and Manuel Díaz. On blockchain and its integration with iot. challenges and opportunities. Future generation computer sys- tems, 88:173–190, 2018. [239] Hubert Ritzdorf, Karl Wüst, Arthur Gervais, Guillaume Felley, and Srdjan Capkun. Tls-n: Non-repudiation over tls enablign ubiquitous content signing. In NDSS, 2018. [240] Michael Rodler, Wenting Li, Ghassan O Karame, and Lucas Davi. 
Sereum: Protecting existing smart contracts against re-entrancy attacks. arXiv preprint arXiv:1812.05934, 2018. [241] Michael Rodler, Wenting Li, Ghassan O Karame, and Lucas Davi. Evmpatch: timely and automated patching of ethereum smart contracts. In 30th USENIX Security Symposium (USENIX Security 21), 2021. [242] Jan Rubin. Clipsa - Multipurpose password stealer. https://decoded.avast.io/janrubin/ clipsa-multipurpose-password-stealer/, 2019. Accessed: 2021-04-12. [243] Sanjay K Sahay, Ashu Sharma, and Hemant Rathore. Evolution of malware and its detection techniques. In Information and Communication Technology for Sustainable Development, pages 139–150. Springer, 2020. [244] Justin Sahs and Latifur Khan. A machine learning approach to android malware detection. In 2012 European Intelligence and Security Informatics Conference, pages 141–147. IEEE, 2012. [245] Noama Fatima Samreen and Manar H. Alalfi. A survey of security vulnerabilities in ethereum smart contracts. CoRR, abs/2105.06974, 2021. [246] Manuel San Pedro, Victor Servant, and Charles Guillemet. Side-channel assessment of open source hardware wallets. IACR Cryptol. ePrint Arch., 2019. [247] Mahmoud Sayrafiezadeh. The birthday problem revisited. Mathematics Magazine, 67(3):220–223, 1994. [248] Clara Schneidewind, Ilya Grishchenko, Markus Scherer, and Matteo Maffei. ethor: Practical and provably sound static analysis of ethereum smart contracts. In ACM CCS, 2020. [249] Franklin Schrans, Susan Eisenbach, and Sophia Drossopoulou. Writing safe smart contracts in flint. In Conference companion of the 2nd international conference on art, science, and engineering of programming, pages 218–219, 2018. 224 [250] Amazon Web Services. Alexa top sites. https://docs.aws.amazon.com/AlexaTopSites/latest/MakingRequestsChapter.html. Ac- cessed: 2020-04-03. [251] Pradip Kumar Sharma and Jong Hyuk Park. Blockchain based hybrid network architecture for the smart city. Future Generation Computer Systems, 86:650–655, 2018. 
[252] Fengrui Shi, Zhijin Qin, and Julie A McCann. Oppay: Design and implementation of a payment system for opportunistic data services. In 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), pages 1618–1628. IEEE, 2017. [253] Yonghee Shin and Laurie Williams. An empirical model to predict security vulnerabilities using code complexity metrics. In Proceedings of the Second ACM-IEEE international sym- posium on Empirical software engineering and measurement, pages 315–317, 2008. [254] Yonghee Shin and Laurie Williams. Is complexity really the enemy of software security? In Proceedings of the 4th ACM workshop on Quality of protection, pages 47–50, 2008. [255] MacKenzie Sigalos. Bug puts $162 million up for grabs, says founder of defi platform compound. https://www.cnbc.com/2021/10/03/162-million-up-for-grabs-after-bug-in-defi- protocol-compound-.html, 2021. [256] Christopher Signer. Gas cost analysis for ethereum smart contracts. Master’s thesis, ETH Zurich, Department of Computer Science, 2018. [257] Lenin Singaravelu, Calton Pu, Hermann Härtig, and Christian Helmuth. Reducing tcb com- plexity for security-sensitive applications: Three case studies. In Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006, pages 161–174, 2006. [258] Amritraj Singh, Reza M Parizi, Qi Zhang, Kim-Kwang Raymond Choo, and Ali Dehghan- tanha. Blockchain smart contracts formalization: Approaches and challenges to address vulnerabilities. Computers & Security, 88:101654, 2020. [259] Saurabh Singh, Pradip Kumar Sharma, Byungun Yoon, Mohammad Shojafar, Gi Hwan Cho, and In-Ho Ra. Convergence of blockchain and artificial intelligence in iot network for the sustainable smart city. Sustainable Cities and Society, 63:102364, 2020. [260] Sunbeom So, Seongjoon Hong, and Hakjoo Oh. Smartest: Effectively hunting vulnerable transaction sequences in smart contracts through language model-guided symbolic execution. 
In 30th USENIX Security Symposium (USENIX Security 21), 2021. [261] Sunbeom So, Myungho Lee, Jisu Park, Heejo Lee, and Hakjoo Oh. Verismart: A highly precise safety verifier for ethereum smart contracts. In 2020 IEEE Symposium on Security and Privacy (SP), pages 1678–1694. IEEE, 2020. [262] Yonatan Sompolinsky, Yoad Lewenberg, and Aviv Zohar. Spectre: A fast and scalable cryp- tocurrency protocol. IACR Cryptol. ePrint Arch., 2016:1159, 2016. 225 [263] Jon Stephens, Kostas Ferles, Benjamin Mariano, Shuvendu Lahiri, and Isil Dillig. Smart- pulse: Automated checking of temporal properties in smart contracts. In IEEE S&P, 2021. [264] Liya Su, Xinyue Shen, Xiangyu Du, Xiaojing Liao, XiaoFeng Wang, Luyi Xing, and Baoxu Liu. Evil under the sun: Understanding and discovering attacks on ethereum decentralized applications. In 30th USENIX Security Symposium (USENIX Security 21), 2021. [265] Nick Szabo. Smart contracts: building blocks for digital markets. EXTROPY: The Journal of Transhumanist Thought,(16), 18:2, 1996. [266] Janos Szurdi, Balazs Kocso, Gabor Cseh, Jonathan Spring, Mark Felegyhazi, and Chris Kanich. The long “taile” of typosquatting domain names. In 23rd {USENIX} Security Symposium ({USENIX} Security 14), pages 191–206, 2014. [267] Sergei Tikhomirov, Ekaterina Voskresenskaya, Ivan Ivanitskiy, Ramil Takhaviev, Evgeny Marchenko, and Yaroslav Alexandrov. Smartcheck: Static analysis of ethereum smart con- tracts. In Proceedings of the 1st International Workshop on Emerging Trends in Software Engineering for Blockchain, pages 9–16, 2018. [268] Palina Tolmach, Yi Li, Shang-Wei Lin, Yang Liu, and Zengxiang Li. A survey of smart contract formal specification and verification. ACM Computing Surveys (CSUR), 54(7):1– 38, 2021. [269] Christof Ferreira Torres, Julian Schütte, and Radu State. Osiris: Hunting for integer bugs in ethereum smart contracts. In Proceedings of the 34th Annual Computer Security Applications Conference, pages 664–676, 2018. 
[270] Christof Ferreira Torres, Mathis Steichen, et al. The art of the scam: Demystifying honeypots in ethereum smart contracts. In 28th USENIX Security Symposium (USENIX Security 19), pages 1591–1607, 2019. [271] Petar Tsankov, Andrei Dan, Dana Drachsler-Cohen, Arthur Gervais, Florian Buenzli, and Martin Vechev. Securify: Practical security analysis of smart contracts. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pages 67–82, 2018. [272] Anna Vacca, Andrea Di Sorbo, Corrado A. Visaggio, and Gerardo Canfora. A systematic literature review of blockchain and smart contract development: Techniques, tools, and open challenges. Journal of Systems and Software, 174:110891, 2021. [273] Amber Van Der Heijden and Luca Allodi. Cognitive triaging of phishing attacks. In 28th {USENIX} Security Symposium ({USENIX} Security 19), pages 1309–1326, 2019. [274] Anqi Wang, Hao Wang, Bo Jiang, and Wing Kwong Chan. Artemis: An improved smart contract verification tool for vulnerability detection. In 2020 7th International Conference on Dependable Systems and Their Applications (DSA), pages 173–181. IEEE, 2020. [275] Baocheng Wang, Jiawei Sun, Yunhua He, Dandan Pang, and Ningxiao Lu. Large-scale election based on blockchain. Procedia Computer Science, 129:234–237, 2018. 226 [276] Bin Wang, Han Liu, Chao Liu, Zhiqiang Yang, Qian Ren, Huixuan Zheng, and Hong Lei. Blockeye: Hunting for defi attacks on blockchain. In 2021 IEEE/ACM 43rd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), pages 17–20. IEEE, 2021. [277] Dong Wang, Bo Jiang, and WK Chan. Wana: Symbolic execution of wasm bytecode for cross-platform smart contract vulnerability detection. arXiv preprint arXiv:2007.15510, 2020. [278] Haijun Wang, Yi Li, Shang-Wei Lin, Lei Ma, and Yang Liu. Vultron: catching vulnera- ble smart contracts once and for all. 
In 2019 IEEE/ACM 41st International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER), pages 1–4. IEEE, 2019. [279] Jiaping Wang and Hao Wang. Monoxide: Scale out blockchains with asynchronous consen- sus zones. In 16th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 19), pages 95–112, 2019. [280] Shuai Wang, Chengyu Zhang, and Zhendong Su. Detecting nondeterministic payment bugs in ethereum smart contracts. Proceedings of the ACM on Programming Languages, 3(OOPSLA):1–29, 2019. [281] Wei Wang, Jingjing Song, Guangquan Xu, Yidong Li, Hao Wang, and Chunhua Su. Con- tractward: Automated vulnerability detection models for ethereum smart contracts. IEEE Transactions on Network Science and Engineering, 2020. [282] Yajing Wang, Jingsha He, Nafei Zhu, Yuzi Yi, Qingqing Zhang, Hongyu Song, and Ruixin Xue. Security enhancement technologies for smart contracts in the blockchain: A survey. Transactions on Emerging Telecommunications Technologies, 32(12):e4341, 2021. [283] Yuepeng Wang, Shuvendu K Lahiri, Shuo Chen, Rong Pan, Isil Dillig, Cody Born, Immad Naseer, and Kostas Ferles. Formal verification of workflow policies for smart contracts in azure blockchain. In Working Conference on Verified Software: Theories, Tools, and Experiments, pages 87–106. Springer, 2019. [284] Zeli Wang, Hai Jin, Weiqi Dai, Kim-Kwang Raymond Choo, and Deqing Zou. Ethereum smart contract security research: survey and future research opportunities. Frontiers of Computer Science, 15(2):1–18, 2021. [285] Colin Whittaker, Brian Ryner, and Marria Nazif. Large-scale automatic classification of phishing pages. In Proc. of NDSS, 2010. [286] O Williams-Grut. Estonia is using the technology behind bitcoin to secure 1 million health records. Bus Insid, 2016. [287] Gavin Wood. Polkadot: Vision for a heterogeneous multi-chain framework. White Paper, 2016. 227 [288] Gavin Wood et al. Ethereum: A secure decentralised generalised transaction ledger. 
Ethereum project yellow paper, 151(2014):1–32, 2014. [289] Lei Wu, Siwei Wu, Yajin Zhou, Runhuai Li, Zhi Wang, Xiapu Luo, Cong Wang, and Kui Ren. Ethscope: A transaction-centric security analytics framework to detect malicious smart contracts on ethereum. [290] Siwei Wu, Dabao Wang, Jianting He, Yajin Zhou, Lei Wu, Xingliang Yuan, Qinming He, and Kui Ren. Defiranger: Detecting price manipulation attacks on defi applications. arXiv preprint arXiv:2104.15068, 2021. [291] Valentin Wüstholz and Maria Christakis. Harvey: A greybox fuzzer for smart contracts. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 1398–1409, 2020. [292] Xinyu Xing, Jianxun Dang, Shivakant Mishra, and Xue Liu. A highly scalable bandwidth estimation of commercial hotspot access points. In 2011 Proceedings IEEE INFOCOM, pages 1143–1151, 2011. [293] Wentian Yan, Jianbo Gao, Zhenhao Wu, Yue Li, Zhi Guan, Qingshan Li, and Zhong Chen. Eshield: protect smart contracts against reverse engineering. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 553–556, 2020. [294] Zheng Yang and Hang Lei. Lolisa: Formal syntax and semantics for a subset of the solidity programming language. arXiv e-prints, pages arXiv–1803, 2018. [295] Zhiqiang Yang, Han Liu, Yue Li, Huixuan Zheng, Lei Wang, and Bangdao Chen. Ser- aph: enabling cross-platform security analysis for evm and wasm smart contracts. In 2020 IEEE/ACM 42nd International Conference on Software Engineering: Companion Proceed- ings (ICSE-Companion), pages 21–24. IEEE, 2020. [296] Jiaming Ye, Mingliang Ma, Yun Lin, Yulei Sui, and Yinxing Xue. Clairvoyance: Cross- contract static analysis for detecting practical reentrancy vulnerabilities in smart contracts. In 2020 IEEE/ACM 42nd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), pages 274–275. IEEE, 2020. 
[297] Haifeng Yu, Ivica Nikolić, Ruomu Hou, and Prateek Saxena. Ohie: Blockchain scaling made simple. In 2020 IEEE Symposium on Security and Privacy (SP), pages 90–105. IEEE, 2020. [298] Mahdi Zamani, Mahnush Movahedi, and Mariana Raykova. Rapidchain: Scaling blockchain via full sharding. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pages 931–948, 2018. [299] Mengya Zhang, Xiaokuan Zhang, Yinqian Zhang, and Zhiqiang Lin. TXSPECTOR: Uncov- ering attacks in ethereum from transactions. In USENIX Security, 2020. 228 [300] Mian Zhang and Yuhong Ji. Blockchain for healthcare records: A data perspective. PeerJ Preprints, 6:e26942v1, 2018. [301] Pengcheng Zhang, Feng Xiao, and Xiapu Luo. Soliditycheck: Quickly detecting smart contract problems through regular expressions. arXiv preprint arXiv:1911.09425, 2019. [302] Pengcheng Zhang, Feng Xiao, and Xiapu Luo. A framework and dataset for bugs in ethereum smart contracts. In 2020 IEEE International Conference on Software Maintenance and Evo- lution (ICSME), pages 139–150. IEEE, 2020. [303] Qingzhao Zhang, Yizhuo Wang, Juanru Li, and Siqi Ma. Ethploit: From fuzzing to efficient exploit generation against smart contracts. In 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER), pages 116–126. IEEE, 2020. [304] William Zhang, Sebastian Banescu, Leonardo Pasos, Steven Stewart, and Vijay Ganesh. Mpro: Combining static and symbolic analysis for scalable testing of smart contract. In 2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE), pages 456–462. IEEE, 2019. [305] Yuyao Zhang, Siqi Ma, Juanru Li, Kailai Li, Surya Nepal, and Dawu Gu. Smartshield: Automatic smart contract protection made easy. In 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER), pages 23–34. IEEE, 2020. [306] Ma Zhaofeng, Wang Lingyun, Wang Xiaochang, Wang Zhen, and Zhao Weizhe. 
Blockchain- enabled decentralized trust management and secure usage control of iot big data. IEEE Internet of Things Journal, 7(5):4000–4015, 2019. [307] Ence Zhou, Song Hua, Bingfeng Pi, Jun Sun, Yashihide Nomura, Kazuhiro Yamashita, and Hidetoshi Kurihara. Security assurance for smart contract. In 2018 9th IFIP International Conference on New Technologies, Mobility and Security (NTMS), pages 1–5. IEEE, 2018. [308] Qiheng Zhou, Huawei Huang, Zibin Zheng, and Jing Bian. Solutions to scalability of blockchain: A survey. IEEE Access, 8:16440–16455, 2020. [309] Shunfan Zhou, Malte Möser, Zhemin Yang, Ben Adida, Thorsten Holz, Jie Xiang, Steven Goldfeder, Yinzhi Cao, Martin Plattner, Xiaojun Qin, et al. An ever-evolving game: Evalua- tion of real-world attacks and defenses in ethereum ecosystem. In 29th {USENIX} Security Symposium ({USENIX} Security 20), pages 2793–2810, 2020. [310] Yi Zhou, Deepak Kumar, Surya Bakshi, Joshua Mason, Andrew Miller, and Michael Bailey. Erays: reverse engineering ethereum’s opaque smart contracts. In 27th USENIX Security Symposium (USENIX Security 18), pages 1371–1385, 2018. [311] Weiqin Zou, David Lo, Pavneet Singh Kochhar, Xuan-Bach D Le, Xin Xia, Yang Feng, Zhenyu Chen, and Baowen Xu. Smart contract development: Challenges and opportunities. IEEE Transactions on Software Engineering, 2019. 
APPENDIX

A: Attack Signatures

Table 8.1 lists all signatures that we use to detect potential social engineering attacks. From these signatures we generate one detection rule in conjunctive normal form (CNF) for each of the six social engineering attacks. The rules are defined as follows:

\begin{align*}
\mathrm{CNF}(A_1) &= S_1 \wedge (S_2 \vee S_3 \vee S_4) \wedge S_5 \\
\mathrm{CNF}(A_2) &= (S_2 \vee S_3 \vee S_4) \wedge S_5 \wedge S_6 \wedge (S_7 \vee S_8) \wedge S_9 \\
\mathrm{CNF}(A_3) &= S_5 \wedge S_{10} \wedge (S_{11} \vee S_{12} \vee S_{13} \vee S_{14}) \wedge S_{15} \\
\mathrm{CNF}(A_4) &= S_5 \wedge (S_{11} \vee S_{12} \vee S_{13} \vee S_{14}) \wedge S_{16} \wedge (S_{17} \vee S_{18}) \\
\mathrm{CNF}(A_5) &= S_5 \wedge (S_{11} \vee S_{12} \vee S_{13} \vee S_{14}) \wedge (S_{19} \vee S_{20}) \wedge S_{21} \\
\mathrm{CNF}(A_6) &= S_5 \wedge (S_{11} \vee S_{12} \vee S_{13} \vee S_{14}) \wedge (S_{19} \vee S_{20}) \wedge S_{21} \wedge S_{22}
\end{align*}

Table 8.1: The full list of signatures used for automated detection of the six social engineering attacks.

| Symbol | Social Engineering Signature | Matching Attacks |
|---|---|---|
| S1 | Non-constructor public or external function that alters an address variable | A1 |
| S2 | Ether transfer with another Ether transfer in the call stack of the same transaction | A1, A2 |
| S3 | Ether transfer with call-with-value statement in the call stack of the same transaction | A1, A2 |
| S4 | Ether transfer with a token transfer in the call stack of the same transaction | A1, A2 |
| S5 | Smart contract has a payable function | A1, A2, A3, A4, A5, A6 |
| S6 | emit instruction inside a call stack of a payable function | A2 |
| S7 | Constant variable with address type and a hard-coded value | A2 |
| S8 | Non-constant variable with address type and a hard-coded value | A2 |
| S9 | Ether transfer to an address variable initialized with a hard-coded value | A2 |
| S10 | Hard-coded bytes32 value | A3 |
| S11 | Ether transfer inside a branching arm | A3, A4, A5, A6 |
| S12 | Token transfer inside a branching arm | A3, A4, A5, A6 |
| S13 | Ether transfer with a require statement in the call stack of the same transaction | A3, A4, A5, A6 |
| S14 | Token transfer with a require statement in the call stack of the same transaction | A3, A4, A5, A6 |
| S15 | bytes32 value inside a branching condition | A3 |
| S16 | Comparison of Keccak256 hash values | A4 |
| S17 | String literal as part of a branching condition | A4 |
| S18 | String literal as part of a require statement | A4 |
| S19 | Ether transfer with call or delegatecall statement in the call stack of the same transaction | A5, A6 |
| S20 | Token transfer with call or delegatecall statement in the call stack of the same transaction | A5, A6 |
| S21 | String literal with a non-ASCII symbol somewhere in the contract | A5, A6 |
| S22 | ICC status is used in a require statement | A6 |

B: Address Miner

We develop an address miner to mine Ethereum addresses whose EIP-55 checksums are all lowercase. Table 8.2 shows six sample addresses. Such addresses can be used in the A3 attack.

Table 8.2: Sample lowercase EIP-55-compliant addresses.

| EIP-55-compliant Lowercase Address | Private Key of the Account | Mining Time (ms) |
|---|---|---|
| 0x47aa51fd5a98e155623202944c44f414a7205a46 | bed6ad86fa57efe205abdcda885b30107b1a75d6196b271d4785cd3ed66c8d5d | 6,822 |
| 0x8310561552fa9569337d53493c6a5a8991894072 | 4856d3e9c032724eca42a5fd48e99dc5b77cb5be96ca68eb9e03511257999e61 | 3,137 |
| 0x2797a2c394686d33da258c7de6206617c398605e | 1321d554cddf1b756e8d15cba0a33fb4e84b95119acf8e267f7505f29f652020 | 460 |
| 0x596443674c431e7da447803ef94a7e52cfd71169 | 1265ca0334308e3dfb2ddd9a7eb466aa488a863671e6ad6290d93383489159d1 | 1,954 |
| 0x52206f3a3b80212898760a6ae124474183b30612 | a532795660fbb9ccb5f3862e102f19680a5def583aea24a2875de7f1dd6c8298 | 266 |
| 0xc71c3eec3aa44e7746725fc771b8b821419e4360 | 3b1b3a32d73bd32f837440cd0469a8010fa6f3e02358ffeb76c95454ee2a0e36 | 4,896 |

C: Integrating Social Engineering Attack Patterns in Existing Tokens

A4 Attack Pattern Integration in USDT: In Fig. 8.1, we show that the A4 social engineering attack pattern can be integrated into the Tether stablecoin source code without changing the logic of the smart contract.

    if (keccak256(abi.encode(symbol)) == keccak256(abi.encode("USDT"))) {
        return super.transfer(_to, _value);
    }

Figure 8.1: Integration of the A4 attack pattern into the transfer ERC-20 method of the Tether stablecoin source code.
Specifically, in the Tether USD token, we add a seemingly harmless check of the token symbol within the ERC-20 transfer method. The evasive test deployment uses all-Latin characters for the token symbol, whereas the malicious smart contract is deployed by passing to the constructor a token symbol with an unnoticeable substitution of one character, which causes the fund transfer to fail.

A5 Attack Pattern Integration in BNB: Fig. 8.2 shows an integration of the A5 attack pattern into the Binance exchange token source code. Fig. 8.3 shows the helper contract for the A5 attack in the Binance Token. In the transfer method (Fig. 8.2), we insert a logging routine, which saves the transfer record in a consolidated database in another smart contract (Fig. 8.3). In a test deployment, the code performs logging as expected. However, in the final deployment, the owner replaces one letter in the logging function header with a homograph twin, e.g., the second letter “o” with the identical-looking Cyrillic letter. The log call (Fig. 8.2, line 3) throws an exception and the transfer fails.

    1  address consolidatedDBAddress =
    2      0x51Db8896d6bD64385C5785Df0685cc4C24F01F0f;
    3  bytes memory payload = abi.encodeWithSignature("logVolume(address,uint256)", _to, _value);
    4  bool success = address(consolidatedDBAddress).call(payload);
    5  if (success) {
    6      balanceOf[msg.sender] = SafeMath.safeSub(balanceOf[msg.sender], _value);
    7      balanceOf[_to] = SafeMath.safeAdd(balanceOf[_to], _value);
    8      Transfer(msg.sender, _to, _value);
    9  }

Figure 8.2: Integration of the A5 attack pattern into the transfer method of the Binance exchange token source code.

    function logVolume(address client, uint256 amount) public {
        // Only the authorized token contract may record volumes.
        require(msg.sender == authorizedCallerSmartContract);
        clientVolumes[client] += amount;
    }

Figure 8.3: Function logVolume in the helper contract used for the A5 attack in the Binance exchange token.
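The homograph substitution behind the A4 and A5 patterns, and the non-ASCII check of signature S21 in Table 8.1, can be illustrated with a minimal Python sketch. It is not the detection tool described in this dissertation: the function name non_ascii_symbols is hypothetical, and the sketch assumes that string literals have already been extracted from the contract source. It compares the signature string of Fig. 8.2 with a variant whose second letter is the Cyrillic letter U+043E.

```python
import unicodedata
from typing import List, Tuple


def non_ascii_symbols(literal: str) -> List[Tuple[int, str, str]]:
    """Return (index, character, Unicode name) for each non-ASCII symbol.

    Hypothetical helper illustrating signature S21; not the author's tool.
    """
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(literal)
        if ord(ch) > 127
    ]


# Signature string as printed in Fig. 8.2 (all Latin characters).
benign = "logVolume(address,uint256)"
# Visually identical string: the second letter is Cyrillic U+043E.
malicious = "l\u043egVolume(address,uint256)"

print(benign == malicious)           # False
print(non_ascii_symbols(benign))     # []
print(non_ascii_symbols(malicious))  # [(1, 'о', 'CYRILLIC SMALL LETTER O')]
```

Because the two strings encode to different bytes, the function selector that abi.encodeWithSignature derives by hashing the signature would almost certainly differ as well, so the helper contract would not dispatch the call to logVolume.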
A1 Attack Pattern Integration in LINK: In this token, the malicious smart contract owner mines a similar public address with the same EIP-55 checksum as the original address, and initializes vipClient via the constructor (Fig. 8.4, line 5). As a result, the VIP user, who does not recognize the address falsification, will fail to transfer funds.

    1  function LinkToken(address vc) public
    2  {
    3      balances[msg.sender] = totalSupply;
    4      transferAllowedAfterBlock = block.number + (2 * 365 * 24 * 60 * 6);
    5      vipClient = vc;
    6      owner = msg.sender;
    7  }
    8  ...
    9  function transfer(address _to, uint _value) public
    10     validRecipient(_to) returns (bool success) {
    11     if (block.number > transferAllowedAfterBlock || msg.sender == vipClient || msg.sender == owner) {
    12         return super.transfer(_to, _value);
    13     }
    14 }

Figure 8.4: Integration of the A1 attack pattern into the transfer method of the ChainLink oracle token source code.

A6 Attack Pattern Integration in LEO: Fig. 8.5 shows an integration of the A6 attack pattern into the token’s source code. Fig. 8.6 shows the helper contract for the A6 attack in the Bitfinex token. In this token, the attacker uses a helper smart contract for purported protection against transfer flood, i.e., one user performing too many small transfers. The smart contract (see Fig. 8.6) has two functions: logAndCheck and the seemingly unrelated and benign onCurve34906537. However, the latter function is the one called by the token smart contract due to homograph substitution of several symbols in the call argument. Unlike the A5 attack against the BNB token, the A6 attack does not require changing the original ICC header before the production deployment. Instead, the contract owner simply changes the value of the extraFeaturesEnabled flag to activate the attack.

    address floodProtectionSC =
        0x5B38C7add838EfFF53412C71E9efF5c182c6b407;
    bytes memory payload = abi.encodeWithSignature("logAndCheck(address)", msg.sender);
    (bool succ, bytes memory result) = address(floodProtectionSC).call(payload);
    require(succ);
    if (abi.decode(result, (bool)) == true) {
        doTransfer(msg.sender, _to, _amount);
        return true;
    }

Figure 8.5: Integration of the A6 attack pattern into the transfer ERC-20 call of the Bitfinex LEO source code.

    function onCurve34906537(address) public view returns (bool) {
        if (extraFeaturesEnabled) {
            return true;
        }
        return false;
    }
    function logAndCheck(address client) public returns (bool) {
        // Only the authorized token contract may update the call counter.
        require(msg.sender == authorizedCallerSmartContract);
        calls[client] += 1;
        return true;
    }

Figure 8.6: Function onCurve34906537 is called instead of logAndCheck in the helper contract, which is used for the A6 attack in the Bitfinex LEO token.

Hybrid Attack Pattern Integration in CK: Fig. 8.7 shows an integration of the hybrid A1/A2 attack pattern into the CryptoKitties ERC-721 collectible source code. The CryptoKitties smart contract can accept and withdraw Ether. In the function withdrawBalance (see Fig. 8.7), send is preceded by a seemingly safe and reasonable fee collection. This arrangement works impeccably during testing. However, after the production deployment, the owner of the contract deploys a non-payable smart contract at the address stored in fee_collector; such a substitution is possible because the address has been pre-calculated as described in Section 2.4.

    address public fee_collector =
        0xce02be9dfc4c68bae86a0bdf1bab68de77bb0d8d;
    function withdrawBalance() external onlyCEO {
        uint256 balance = this.balance;
        uint256 subtractFees = (pregnantKitties + 1) * autoBirthFee;
        if (balance > subtractFees) {
            fee_collector.transfer(subtractFees);
            cfoAddress.send(balance - subtractFees);
        }
    }

Figure 8.7: A hybrid A1 + A2 attack pattern integrated into the withdrawBalance function of the CryptoKitties ERC-721 collectible source code.
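As a closing illustration for Appendix A, the CNF detection rules lend themselves to a direct mechanical evaluation. The following Python sketch encodes each rule as a list of clauses over signature indices and reports which attacks a given set of matched signatures satisfies. The names CNF_RULES and matching_attacks, and the sample sets of matched signatures, are assumptions made for illustration; the sketch does not reproduce the detection tool itself.

```python
from typing import Dict, List, Set, Tuple

# Each rule is a conjunction (list) of disjunctive clauses (tuples) over the
# signature indices S1..S22 of Table 8.1.
CNF_RULES: Dict[str, List[Tuple[int, ...]]] = {
    "A1": [(1,), (2, 3, 4), (5,)],
    "A2": [(2, 3, 4), (5,), (6,), (7, 8), (9,)],
    "A3": [(5,), (10,), (11, 12, 13, 14), (15,)],
    "A4": [(5,), (11, 12, 13, 14), (16,), (17, 18)],
    "A5": [(5,), (11, 12, 13, 14), (19, 20), (21,)],
    "A6": [(5,), (11, 12, 13, 14), (19, 20), (21,), (22,)],
}


def matching_attacks(matched: Set[int]) -> List[str]:
    """Return the attacks whose CNF rule holds for the matched signatures."""
    return [
        attack
        for attack, clauses in CNF_RULES.items()
        # A rule holds when every clause has at least one matched signature.
        if all(any(s in matched for s in clause) for clause in clauses)
    ]


# Hypothetical contract matching S5, S11, S16, and S17.
print(matching_attacks({5, 11, 16, 17}))      # ['A4']
# Hypothetical contract matching S5, S12, S19, S21, and S22.
print(matching_attacks({5, 12, 19, 21, 22}))  # ['A5', 'A6']
```

Since the A6 rule extends the A5 rule by the single clause S22, any contract flagged for A6 is also flagged for A5 under this encoding.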