WATER QUALITY MONITORING AND CONTAMINANTS ANALYSIS WITH COFFEE-RING EFFECT BY MACHINE LEARNING

By

Xiaoyan Li

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

Environmental Engineering – Doctor of Philosophy
Computational Mathematics, Science and Engineering – Dual Major

2023

ABSTRACT

In the first stage, a low-cost tap water fingerprinting technique was explored using the coffee-ring effect, which produces distinguishable residue patterns after tap water evaporates. This technique was evaluated by photographing tap water droplets from different communities in southern Michigan with a cell phone camera and a 30x loupe. A convolutional neural network (CNN) model was then trained using the images to group the tap waters with similar water chemistry, achieving 80% accuracy. Further experiments were conducted to determine the influence of lower-concentration species in the tap water "fingerprint". By analyzing the residue patterns from salt mixtures with varying concentrations of sodium, calcium, magnesium, chloride, bicarbonate, and sulfate, it was found that the residue patterns are unique and reproducible, and are associated with the water chemistry of the sample. Principal component analysis (PCA) was also applied to the image files and particle measurements, further highlighting differences in the residue patterns. The results suggest that the residue patterns of tap water, imaged with a cell phone camera and loupe, contain valuable information about the composition of tap water, and the coffee-ring effect should be further studied for potential use in low-cost tap water fingerprinting.

The second stage examined the coffee-ring effect for tap water component analysis using synthetic samples with varying concentrations of ions. A custom four-axis autosampler was built from a Raspberry Pi and a 3D printer stage and programmed in Python 3.7 on Ubuntu.
The experiment was conducted in a temperature- and humidity-controlled chamber. SEM images, EDS mapping, and particle features extracted from photographs were analyzed using statistical methods. Optimal conditions were identified as 23-26°C with 45%-50% humidity, 20-23°C with 45%-50% humidity, and 26-29°C with 40%-45% humidity, showcasing the coffee-ring effect as a low-cost, effective technique for tap water analysis.

In the third stage, three models were evaluated: the One-stage point estimation model (OnePeM), the Two-stage vision-transformer point estimation model (TwoVtPeM), and the Two-stage vision-transformer multiple output estimation model (TwoVtMoM). TwoVtPeM achieved the best performance of the three, with OnePeM also performing well and TwoVtMoM falling short. The TwoVtPeM relative percentage errors were ±17.1% for oxygen, ±4.5% for sulfur, ±19.9% for sodium, ±5.7% for chlorine, ±19.8% for calcium, ±25.8% for magnesium, and ±20.1% for carbon. The R² of TwoVtPeM was 0.95, higher than that of OnePeM (0.90) and TwoVtMoM (0.54). TwoVtPeM had a higher mean error than OnePeM, but it exhibited lower relative standard deviations of estimation; the TwoVtPeM relative standard deviation values were 3.9% for oxygen, 3.0% for sulfur, 5.3% for sodium, 3.9% for magnesium, 5.3% for chlorine, 10.0% for calcium, and 5.9% for carbon. Moreover, 79.2% of water samples were correctly classified for hardness based on the element concentrations estimated by TwoVtPeM. Compared to strip test kits, this technology offers advantages such as speed, low cost, and the ability to simultaneously estimate multiple contaminants. However, addressing certain limitations, such as the quality of the substrate used and the size and complexity of the dataset and models, is essential. The TwoVtMoM is underfitting and requires additional training epochs and fine-tuning.
Overall, this research demonstrates a promising technique for water quality analysis, providing a low-cost, fast, and relatively accurate method for estimating water contaminant concentrations.

Copyright by
XIAOYAN LI
2023

To Life.

ACKNOWLEDGMENTS

I would like to express my most profound gratitude to Dr. Lahr and Dr. Xie for graciously providing me with the laboratory facilities within the Department of Civil and Environmental Engineering and Computational Mathematics, Science, and Engineering at Michigan State University. Their invaluable guidance, unwavering support, and enthusiasm have greatly contributed to my research experience. I am deeply thankful to all the committee members for sharing their expertise and for fostering a sense of camaraderie throughout my academic journey.

My heartfelt appreciation goes to Dr. Tan for his generosity in sharing his vast knowledge and for extending me the opportunity to attend his group meetings, where I could learn about cutting-edge machine learning models and techniques. I am also grateful to his current and former group members for their willingness to share their valuable insights and advice on my research project.

I would like to express my gratitude to Dr. Tarabara and his research group for providing me with the essential training and access to the contact angle goniometer, which played a crucial role in my research. I am deeply indebted to Dr. Xie for serving as my advisor in the CMSE department, offering guidance in my research, and granting me access to the lab computer and GPU. I also extend my thanks to Dr. Phanikumar for his insightful advice and expertise on my research problem and techniques. His guidance has been invaluable in helping me navigate the complexities of my work and ensuring that my research is both rigorous and impactful.

Furthermore, I would like to thank Dr. Yunhao Liu for his guidance of my career and for introducing me to the world of volleyball.
His enthusiasm for the sport is contagious, and playing volleyball has become a source of joy and stress relief for me. I appreciate the time he has spent teaching me the skills and techniques, as well as the camaraderie we have shared on the court. Dr. Liu’s support has been instrumental in both my professional and personal development, and I am truly grateful for his mentorship.

Each of these individuals has played a significant role in my academic journey, and I am deeply grateful for their contributions to my growth and success. Their guidance, expertise, and support have been invaluable, and I am honored to have had the opportunity to learn from and work with such exceptional mentors and colleagues. Thank you all for your unwavering commitment to my development and for helping me become the researcher and individual I am today. Your contributions to my success will never be forgotten, and I look forward to carrying the lessons you have taught me into the future.

Furthermore, I would like to extend my gratitude to my friends and all the individuals I had the pleasure of meeting at Michigan State University, Futurewei Technologies, and Lucy-labs Inc. Their support, advice, and friendship have enriched both my research and personal life.

Lastly, I want to express my utmost appreciation to my family for their unwavering love, encouragement, and support throughout my academic journey. Their belief in me has been a constant source of strength and motivation, and I could not have achieved this without them.

PREFACE

This dissertation represents the culmination of my years of dedicated study and research in the interdisciplinary fields of Civil and Environmental Engineering and Computational Mathematics, Science, and Engineering. Since embarking on my PhD journey in 2016, I have been confronted with an array of challenges that society and the built environment face on a daily basis.
My research endeavors to contribute to the advancement of these fields by proposing innovative and effective solutions to real-world problems, paving the way for sustainable development.

The primary focus of my dissertation revolves around the coffee ring effect and tap water quality monitoring, which has broad implications for public health and environmental safety. Throughout my PhD program, I have extensively explored the intricacies of machine learning, statistics, computer vision, analytical chemistry, and microfluidics, and have drawn upon these disciplines to enhance my research. By conducting a comprehensive analysis of the current state of the field, I identified key areas for improvement that helped shape my research agenda. My findings were ultimately derived from a synergistic blend of theoretical analysis, numerical modeling, and hands-on laboratory experimentation.

I am deeply grateful to everyone who supported me throughout this journey, particularly my advisor, Dr. Lahr, who has been a constant source of guidance, inspiration, and encouragement. I also wish to extend my appreciation to my colleagues and fellow students, whose camaraderie and intellectual stimulation have enriched my overall experience and contributed to my personal growth.

It is my fervent hope that my research efforts will make a significant and lasting impact on the field of Civil and Environmental Engineering, ultimately contributing to the betterment of our world through innovative approaches to sustainable development and environmental stewardship. As I look back on my PhD journey, I am filled with a sense of accomplishment and a renewed commitment to using my knowledge and skills to create a healthier, more sustainable future for all.

TABLE OF CONTENTS

CHAPTER 1 Introduction
1.1 Need for innovation in drinking water monitoring
1.2 Coffee-ring effect introduction
1.2.1 What is coffee-ring effect?
1.2.2 Several factors in pattern formation of crystals in the coffee-ring effect process
1.2.3 Understanding the mechanism of coffee-ring effect
1.2.4 Crystal structure prediction with energy minimization
1.2.5 Coffee-ring effect applications
1.3 Machine-Learning Models in water treatment and modeling
1.3.1 Image analysis via convolutional neural network (CNN)
1.3.2 Vision Transformer in computer vision
1.3.3 Machine-Learning Models and Artificial-Intelligence Methods in Water Treatment
1.3.4 Applications of AI and ML methods in Water Treatment

CHAPTER 2 Tap water fingerprinting using a convolutional neural network built from images of the coffee-ring effect
2.1 Abstract
2.2 Introduction
2.3 Experimental
2.4 Results and discussion
2.5 Conclusions and future outlook

CHAPTER 3 Optimal environmental condition for contaminants separation by coffee-ring Effect
3.1 Abstract
3.2 Introduction
3.3 Experimental Methods
3.3.1 Materials and instruments
3.3.2 Four-axis-autosampler
3.3.3 Auto temperature humidity control chamber
3.3.4 Water samples
3.3.5 Coffee-ring effect pattern statistical analysis methods
3.3.6 Experiment procedure
3.4 Results and Discussion
3.4.1 Under what environmental conditions are coffee-ring effect fingerprints are consistent
3.4.2 What are the optimal environmental conditions that different water samples exhibit mostly different coffee-ring effect residue patterns
3.4.3 Under each environmental condition, are the elements deposition locations significantly different from each other
3.4.4 Do the water sample coffee-ring effect patterns have significant statistical correlation with element composition
3.5 Conclusion

CHAPTER 4 CNN-Vision-transformer model for elements concentration estimation by coffee-ring effect residue patterns
4.1 Abstract
4.2 Introduction
4.2.1 Coffee-ring effect residue provides particles structure information
4.2.2 Applications of AI and ML methods in Water Treatment
4.2.3 Model for elements recognition and concentration estimation
4.3 Experimental Methods
4.3.1 Develop a deep learning model to identify corrosion indicators and quantify their concentrations in tap water
4.4 Results and Discussion
4.4.1 Elements correlations between coffee-ring effect subrings
4.4.2 Elements mapping estimation model analysis
4.4.3 Two-stage model produces better results than one-stage model
4.5 Conclusion

CHAPTER 5 Implications

BIBLIOGRAPHY

APPENDIX
CHAPTER 1 Introduction

1.1 Need for innovation in drinking water monitoring

The need for innovation in drinking water monitoring is growing due to increased awareness of the impact of contaminated water on human health and the environment. Current monitoring methods are often expensive, time-consuming, and reliant on manual analysis. As a result, there is a pressing need for more efficient, cost-effective, and reliable methods to monitor drinking water quality. Innovations in technologies such as sensors and machine learning have the potential to revolutionize drinking water monitoring by providing real-time data and reducing the need for manual analysis. In addition, incorporating these technologies into drinking water monitoring systems can help to address the current challenges of limited resources and expertise in many communities, leading to better access to safe and clean drinking water for all.

1.2 Coffee-ring effect introduction

1.2.1 What is coffee-ring effect?

The coffee-ring effect offers a low-cost method for separating particles in aqueous samples. As a water droplet dries on a hydrophobic substrate, it shrinks in height and its particles are pushed into concentric rings according to size Wong et al. [2011]. This phenomenon is known as "nanochromatography" and has been used to separate particles with resolutions of 100 nm at low particle volume fractions Wong et al. [2011]. The separation is possible due to the differential effects of adhesion and surface tension forces, which move larger particles towards the center of the drop and hold smaller particles in place at the drop edge.

1.2.2 Several factors in pattern formation of crystals in the coffee-ring effect process

Takhistov and Chang and other researchers found that the coffee-ring effect (CRE) depends on temperature, particle concentration, and substrate hydrophobicity.
Film and solutal flux dynamics of small drops at their contact lines can induce macroscopic concentration segregation and produce distinct large-scale stain patterns such as concentric rings on hydrophilic surfaces and latticed crystals on hydrophobic ones. Coupling between these bulk segregation instabilities and the classical Mullins-Sekerka crystallization instability results in a large variety of crystal patterns with interwoven complex structures of two length scales. Furthermore, low density crystals can occupy a larger area than the initial drop, and gravitational drainage on inclined substrates can change the larger length scale Takhistov and Chang [2002], Shahidzadeh-Bonn et al. [2008], Zhong et al. [2017]. Researchers also found that polyelectrolyte concentration and humidity affect pattern formation Kaya et al. [2010]. Shin et al. [2014] demonstrated that the solubility, the evaporation rate, and the mobility of the contact line determine the pattern of crystals formed in the coffee-ring effect. Lee et al. [2016] showed that the degree of supersaturation affects the nucleation pathways of potassium dihydrogen phosphate solution droplets. During the evaporation of NaCl solutions, the hydrophobicity (wettability) of the substrate also affects the crystal pattern: on a hydrophilic surface, a ring-like crystalline deposit surrounded by a small spreading film forms, whereas on a hydrophobic surface, a cauliflower-like pattern forms on the residue border. The degree of saturation likewise affects the crystal pattern of Na2SO4 Shahidzadeh-Bonn et al. [2008]. Salt concentration and wettability were also found to affect crystal pattern formation Zhong et al. [2017].

1.2.3 Understanding the mechanism of coffee-ring effect

In terms of numerical approaches, a variety of studies have been conducted on the pattern formation of evaporating suspensions containing dissolved nanoparticles, employing Monte Carlo models Kim et al.
[2011], Stannard [2011], Robbins et al. [2011], Brownian dynamics Gupta and Peters [1985], Chen and Kim [2004], and physical microfluidic mechanism modeling Kang et al. [2016], Fischer [2002], Shmuylovich et al. [2002], Pauchard and Allain [2003], Popov [2005], Heim et al. [2005]. A previous study investigated a computational Monte Carlo approach for estimating the ring-like deposition of nanoparticles contained in a drying liquid droplet Kim et al. [2011]. An investigation of non-equilibrium dewetting processes in nanoparticle-containing solutions revealed various patterns, such as ring-like structures, and their underlying mechanisms Stannard [2011]. A dynamic density functional theory was developed to replicate branched 'flower-like', labyrinthine, and network structures, and this model was used to examine the effects of solvent evaporation, as well as the diffusion of colloidal particles and liquid across the surface Robbins et al. [2011]. One study demonstrated that the formation of coffee stains requires specific boundary conditions, such as pinning boundaries Yunker et al. [2011]. Kang et al. [2016] investigated a model in which the bulk flow within the drop transports particles to the interface, where they are captured by the receding free surface and subsequently transported along the interface until they are deposited near the contact line. A review of recent studies can be found in Larson [2014].

1.2.4 Crystal structure prediction with energy minimization

Materials synthesis is an active area in both research and industry. Once a material is finally synthesized and characterized, its properties can be evaluated in the engineering design process. However, to synthesize the desired material, most applications require an optimization of multiple properties which may be interrelated. In the field of thermoelectrics, materials are compared to one another using a figure of merit, zT.
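For reference, the dimensionless thermoelectric figure of merit is conventionally written as below. This is the standard textbook form, not an expression recovered from this dissertation; thermal conductivity is written as χ to match the symbols used in the surrounding text, although many references use κ.

```latex
zT = \frac{S^{2}\,\sigma\,T}{\chi}
```

Because S enters as a square in the numerator while χ sits in the denominator, a gain in one property can be cancelled by a coupled change in another, which is the trade-off the surrounding discussion describes.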
In this equation, S is the Seebeck coefficient, σ is the electrical conductivity, χ is the thermal conductivity, and T is temperature. However, the material properties σ, χ, and S are all interrelated. For example, electrical conductivity is positively related to carrier concentration, whereas the Seebeck coefficient is negatively related to it, so the two cannot both be raised at once to increase zT. In addition, thermal conductivity also increases with carrier concentration, which in turn decreases zT. Therefore, optimization of thermoelectric materials requires a compromise between these properties. The most significant advances in this field have come from identifying new compounds that exhibit a better intrinsic balance of these properties Graser et al. [2018].

1.2.5 Coffee-ring effect applications

Understanding and controlling the process of solute deposition in the presence of the coffee-ring effect is important in manufacturing processes involving evaporation on surfaces, including printing Park and Moon [2006], Friederich et al. [2013], Kuang et al. [2014], Sun et al. [2015], Huang and Zhu [2019] and the fabrication of ordered structures Han and Lin [2012], functional nanomaterials Shao et al. [2014], Zou and Kim [2014], and colloidal crystals Park et al. [2006], Cui et al. [2009]. The coffee-ring effect also improves the performance of commercial applications including fluorescent microarrays Blossey and Bosio [2002], Dugas et al. [2005], matrix assisted laser desorption ionization (MALDI) spectrometry Hu et al. [2013], Mampallil et al. [2012], Kudina et al. [2016], Lai et al. [2016], and surface enhanced Raman spectroscopy (SERS) Zhou et al. [2014a], Wang et al. [2014], Garcia-Cordero and Fan [2017]. The coffee-ring effect also has implications in plasmonics Li et al. [2016a], solute separation Wong et al. [2011], diagnostics Brutin et al. [2011], Wen et al. [2013], Gulka et al. [2014], and electronics applications de Gans and Schubert [2004].
Suppression of coffee-ring effect

The coffee-ring effect can be suppressed through one of three physical strategies: (i) preventing the pinning of the contact line; (ii) disturbing the capillary flow towards the contact line; and (iii) preventing the particles from being transported to the droplet edge by the capillary flows.

The coffee-ring effect can be suppressed by preventing contact line pinning using hydrophobic surfaces. Increasing the hydrophobicity of surfaces is often accompanied by decreasing contact angle hysteresis (CAH) Eral et al. [2013]. Lower CAH in essence means reduced contact line pinning, which leads to suppression of the coffee-ring effect. Lower CAH can be achieved by patterning controllable surface wettability, as reviewed previously by Tian et al. [2013]. These methods include chemical modification Ko et al. [2004], Tian et al. [2013], Li et al. [2018] and physical modification Yunker et al. [2011].

On hydrophobic and partially hydrophobic surfaces, pinning can still occur when the CAH or solute concentration is high. If CAH is high, solutes can accumulate at the contact line while the contact angle decreases to the receding angle, which typically takes a few seconds depending on the rate of evaporation. Such accumulation produces ring-like deposits only if the duration of pinning is above a critical value for a given substrate-solute system Moraila-Martinez et al. [2013]. However, if the pinning time is short, even with a high initial solute concentration, the coffee-ring effect will just produce smaller inner rings Nguyen et al. [2013].

Nanoparticles are more prone to forming ring-like patterns than larger particles because they can flow into the microscopic regions of the droplet edge faster. In the presence of solute particles in the droplet, electrowetting (EW) can reduce contact line pinning on (partially) hydrophobic surfaces Mugele and Baret [2005], Li and Mugele [2008].
A droplet is deposited on a dielectric layer covering an electrode. When a voltage is applied between the droplet and the electrode, an electric force pulls the contact line outward, overcoming the pinning forces so that contact line pinning is reduced. The coffee-ring effect can also be suppressed by vibration and acoustics, Marangoni flow, and other factors Mampallil and Eral [2018]. Researchers have also proposed a method that relies on the covalent cross-linking of monodisperse materials, which allows for the formation of thin films with uniform thicknesses and macroscale cohesion. This approach prevents the coffee-ring effect by inducing gelation of the coating materials through a thioacetate-disulfide transition, counterbalancing the capillary forces generated by evaporation Li et al. [2018].

Enhancing coffee-ring effect

Evaporation of a droplet can be used to concentrate the solutes in it. Evaporation of the solvent increases the analyte concentration, making reactions more probable Hernandez-Perez et al. [2016], De Angelis et al. [2011]. Concentrating solutes at the rim of the droplet by the coffee-ring effect is called the self-ordered ring (SOR) method. It acts as a pre-concentration procedure before other analyses, exploiting the deposition of solutes and particles (Figure 1.1). To enhance the coffee-ring effect, a hydrophobic surface is usually used as the substrate. Drying on hydrophobic surfaces forms smaller rings with higher solute density because the contact line is pinned only in the later stages of evaporation. Liu et al. [2002] demonstrated that the SOR method enhanced the fluorescence detection of orally administered berberine in human urine. Similarly, fluorescent detection of trace levels of tetracycline Huang et al. [2004a], quinidine sulfate in serum samples Yang and Huang [2006], and fluorescein Liu et al. [2006] was demonstrated based on the SOR method.
The coffee-ring effect has been found to have several practical applications in various fields. In particular, it has been utilized to enhance the deposition of gold nanoparticles (AuNPs) on cellulose nanofibers (CNFs) for the purpose of improving surface-enhanced Raman scattering (SERS), as reported in several studies Chen et al. [2017], Wang et al. [2014], Hussain et al. [2019], Juneja and Bhattacharya [2019], Zhou et al. [2014b]. The coffee-ring effect has also been used as a low-cost approach for malaria diagnosis Gulka et al. [2014]. Additionally, the coffee-ring effect has shown potential for monitoring tap water quality with the help of deep neural networks Li et al. [2020]. Furthermore, the coffee-ring effect has the potential to aid in identifying pathogens associated with various diseases by isolating disease markers from body fluids Wong et al. [2011], Chen and Evans [2010]. These findings demonstrate the versatile and practical applications of the coffee-ring effect in various fields.

Figure 1.1: Suppression and Enhancement of coffee-ring effect. Comparison of different methods. The working principle, advantages and limitations are illustrated.

1.3 Machine-Learning Models in water treatment and modeling

Table 1.1 provides a summary of AI and ML models and methods used in water treatment and modeling applications. It highlights their general and specific uses, as well as the advantages and disadvantages of each method. The final column includes references to peer-reviewed textbook sources that offer comprehensive and in-depth explanations of these models and methods. Although the table may not cover every aspect of water treatment and modeling, the applications selected are based on a well-defined methodology.
It is worth noting that the majority of the ML methods listed in the table fall under the "black-box" category, which is generally considered a drawback for most models. The exceptions are Genetic Algorithms (GA) and Gaussian Processes (GPs).

1.3.1 Image analysis via convolutional neural network (CNN)

The basic ideas underlying the use of convolutional neural networks (CNNs, also known as ConvNets) for inverse problems are not new. For more historical perspective, see Schmidhuber [2015], Li et al. [2016b]; for an accessible introduction to deep neural networks and a summary of their recent research, see LeCun et al. [2015], Schwendicke et al. [2019], Brinker et al. [2018]. The CNN architecture was proposed in 1986 in RUMBERT [1986], and CNNs were developed for solving inverse imaging problems as early as 1988 Zhou et al. [1988]. These approaches, which used networks with few parameters and did not always include learning, were largely superseded by compressed sensing (or, broadly, convex optimization with regularization) approaches in the 2000s. As computer hardware improved, it became feasible to train larger and larger neural networks, until, in 2012, Krizhevsky et al. [2017] achieved a significant improvement over the state of the art on the ImageNet classification challenge by using a GPU to train a CNN with 5 convolutional layers and 60 million parameters on a set of 1.3 million images. This work spurred a resurgence of interest in neural networks, and specifically CNNs, not only for computer vision tasks but also for inverse problems and more. With the development of CNN models, both accuracy and operation have increased dramatically.

Basic CNN components

There are numerous variants of CNN architectures in the literature. However, their basic components are the same. They all consist of three main types of layers, namely convolutional, pooling, and fully-connected layers.
The convolutional layer aims to learn feature representations of the inputs, for example features of human eyes, noses, or objects. As shown in Figure 1.2, a convolutional layer is composed of several convolution kernels which are used to compute different feature maps. Specifically, each neuron of a feature map is connected to a region of neighbouring neurons in the previous layer. This neighbourhood is referred to as the neuron’s receptive field in the previous layer. A new feature map is obtained by first convolving the input with a learnable kernel and then applying an element-wise nonlinear activation function to the convolved results. After the activation function, a pooling layer is normally applied to the feature map to filter out high-frequency noise. The complete feature maps are obtained by using several different kernels with the same or different activation and pooling functions Gu et al. [2018]. Mathematically, the feature value at location (i, j) in the kth feature map of the lth layer, $z^{l}_{i,j,k}$, is calculated by the equation:

\[ z^{l}_{i,j,k} = w^{l}_{k}\, x^{l}_{i,j} + b^{l}_{k} \tag{1.2} \]

where $w^{l}_{k}$ and $b^{l}_{k}$ are the weight vector and bias term of the kth filter of the lth layer, respectively, and $x^{l}_{i,j}$ is the input patch centered at location (i, j) of the lth layer. It is worth noting that the kernel $w^{l}_{k}$ that generates the feature map $z^{l}_{:,:,k}$ is shared across all locations, but several different kernels are generated and learned in the model building process.

Figure 1.2: The architecture of the LeNet-5 network, which works well on the digit classification task.

Such a weight sharing mechanism has several advantages: it reduces the model complexity and makes the network easier to train. At the same time, to avoid losing generality and information, several kernels are trained and implemented in the model structure. The activation function introduces nonlinearities to the CNN, which are desirable for multi-layer networks to detect nonlinear features Gu et al. [2018].
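The feature-value equation and the weight sharing described above can be illustrated with a minimal pure-Python sketch. It is not the implementation used in this work. The function name `conv_feature_map`, the toy image, and the hand-set kernel are all hypothetical, and for simplicity the patch is indexed by its top-left corner (stride 1, no padding) rather than by its center.

```python
from typing import List

Matrix = List[List[float]]


def conv_feature_map(x: Matrix, w: Matrix, b: float) -> Matrix:
    """Compute z[i][j] = <w, patch(i, j)> + b for every valid patch of x.

    The same kernel w and bias b are reused at every location, which is
    the weight sharing described in the text. Stride 1 and no padding are
    assumed, and patches are indexed by their top-left corner.
    """
    kh, kw = len(w), len(w[0])
    out_h = len(x) - kh + 1
    out_w = len(x[0]) - kw + 1
    z = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Inner product of the kernel with the input patch at (i, j).
            s = sum(w[m][n] * x[i + m][j + n]
                    for m in range(kh) for n in range(kw))
            row.append(s + b)
        z.append(row)
    return z


# Toy 4x4 input with a vertical edge between columns 1 and 2.
image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
# Hypothetical hand-set 2x2 kernel that responds to left-to-right increases.
kernel = [[-1.0, 1.0], [-1.0, 1.0]]

feature_map = conv_feature_map(image, kernel, b=0.0)
for row in feature_map:
    print(row)
```

In this toy case the feature map is nonzero only in the column where the patch straddles the edge, which shows how a single shared kernel detects the same low-level feature wherever it occurs. In a trained CNN the kernel values are learned rather than set by hand.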
The activation function is typically the sigmoid, ReLU, or tanh function, or one of their variants LeCun et al. [2012], Hinton [2010]. Let a(·) denote the nonlinear activation function. The activation value $a^{l}_{i,j,k}$ of the convolutional feature $z^{l}_{i,j,k}$ can be computed as

\[ a^{l}_{i,j,k} = a\left(z^{l}_{i,j,k}\right) \tag{1.2} \]

The pooling layer aims to achieve shift-invariance and information aggregation by reducing the dimension of the feature maps in the previous layer. It is usually placed between two convolutional layers. Each feature map of a pooling layer is connected to its corresponding feature map of the preceding convolutional layer. Denoting the pooling function as pool(·), for each feature map $a^{l}_{:,:,k}$ the pooled output can be written as:

\[ Y^{l}_{i,j,k} = \mathrm{pool}\left(a^{l}_{m,n,k}\right), \quad \forall (m, n) \in \mathcal{R}_{ij} \tag{1.3} \]

where $\mathcal{R}_{ij}$ is a local neighbourhood around location (i, j). The typical pooling operations are average pooling Wang et al. [2012] and max pooling Boureau et al. [2010], Murray and Perronnin [2014]. The kernels in the lower convolutional layers are designed to detect low-level features such as edges and curves, while the kernels in higher layers learn to detect more abstract features. By stacking several convolutional, activation and pooling layers, the model can gradually extract higher-level feature representations.

After the convolutional and pooling layers, there may be one or more fully-connected layers, which aim to perform high-level reasoning Simonyan and Zisserman [2014], Zeiler and Fergus [2014], Hinton et al. [2012]. They connect all neurons in the previous layer to every neuron of the current layer to generate global semantic information. Note that a fully-connected layer is not always necessary, as it can be replaced by a 1 x 1 convolution layer Lin et al. [2013], Saxena and Verbeek [2016].

The last layer of a CNN is an output layer. The softmax operator is commonly used for classification tasks Russakovsky et al. [2015].
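The activation, pooling and softmax steps described above can be sketched in a few lines of pure Python. This is only an illustrative example: ReLU is assumed as the activation, the pooling windows are assumed to be non-overlapping 2 x 2 regions, and the function names and input values are hypothetical.

```python
import math


def relu(feature_map):
    """Element-wise ReLU activation: a = max(0, z)."""
    return [[max(0.0, z) for z in row] for row in feature_map]


def max_pool2d(feature_map, size=2):
    """Max pooling over non-overlapping size x size neighbourhoods."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            # Keep the largest activation in the local neighbourhood.
            row.append(max(feature_map[i + m][j + n]
                           for m in range(size) for n in range(size)))
        pooled.append(row)
    return pooled


def softmax(scores):
    """Convert output-layer scores into class probabilities."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 4x4 convolutional feature map z.
z = [
    [1.0, -2.0, 0.5, 3.0],
    [-1.0, 4.0, -0.5, 2.0],
    [0.0, -3.0, -1.0, -2.0],
    [2.0, 1.0, -4.0, 0.5],
]
a = relu(z)                 # activation step
y = max_pool2d(a, size=2)   # pooling step: 4x4 map reduced to 2x2

# Hypothetical scores from a three-class output layer.
probs = softmax([2.0, 1.0, 0.1])
```

Pooling halves each spatial dimension here, which shows how the layer reduces the size of the feature maps while keeping the strongest responses.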
Another commonly used method is the SVM, which can be combined with CNN features to solve different classification tasks Tang [2013], Madjarov et al. [2012]. Let θ denote all the parameters of a CNN (e.g., the weight vectors and bias terms). The optimum parameters for a specific task can be obtained by minimizing an appropriate loss function defined on that task. Suppose we have N desired input-output relations $(x_n, y_n)$, n ∈ [1, ..., N], where $x_n$ is the n-th input, $y_n$ is its corresponding target label, and $o_n$ is the output of the CNN. Training a CNN is, in principle, a global optimization problem; in practice, minimizing the loss function usually yields only a local minimum. Stochastic gradient descent is a common method for finding a well-fitting set of parameters. The loss of the CNN can be calculated as follows:

\[ L = \frac{1}{N} \sum_{n=1}^{N} \ell\left(\theta; y_n, o_n\right) \tag{1.4} \]

Recent advances in convolutional neural networks

Since 2006, many methods have been developed to overcome the difficulties encountered in training deep CNNs Niu and Suen [2012], Russakovsky et al. [2015], Simonyan and Zisserman [2014], Szegedy et al. [2015]. For example, the CNN model proposed by Krizhevsky et al. showed significant improvements over previous methods on the image classification task. The overall architecture of their method, i.e., AlexNet Russakovsky et al. [2015], is similar to LeNet-5 but with a deeper structure. Following the success of Krizhevsky’s work, many works have been proposed to improve its performance. Among them, four models are the most representative: ZFNet Zeiler and Fergus [2014], VGGNet Simonyan and Zisserman [2014], GoogLeNet Szegedy et al. [2015] and ResNet He et al. [2016]. The evolution of these architectures shows a clear trend towards deeper networks; e.g., ResNet, which won ILSVRC 2015, is about 20 times deeper than AlexNet.
In theory, increasing depth allows the network to achieve better feature extraction and representation, and thus to approximate the target function more closely. However, a deeper architecture also increases the complexity of the network, which makes it more difficult to optimize, more prone to overfitting, and more exposed to the curse of dimensionality. Various methods have been proposed to deal with these problems from different angles.

1.3.2 Vision Transformer in computer vision

Deep neural networks (DNNs) form the core of AI systems. Different types of networks are designed for different tasks. The multi-layer perceptron (MLP) or fully connected (FC) network, made up of multiple linear layers and nonlinear activations, is a classic neural network Rosenblatt [1957]. Convolutional neural networks (CNNs), consisting of convolutional and pooling layers, are used to process images and other shift-invariant data LeCun et al. [1998], Krizhevsky et al. [2017]. Recurrent neural networks (RNNs) use recurrent cells to process sequential or time series data Hochreiter and Schmidhuber [1997]. The transformer is a more recent type of neural network that uses self-attention mechanisms Bahdanau et al. [2014], Parikh et al. [2016] to extract intrinsic features Vaswani et al. [2017]. It has shown potential for a wide range of AI applications, especially in NLP. For example, Vaswani et al. [2017] proposed the transformer for machine translation and English constituency parsing tasks, and Devlin et al. [2018] introduced BERT (Bidirectional Encoder Representations from Transformers), a language representation model that pre-trains the transformer on unlabeled text, considering the context of each word in a bidirectional manner. BERT achieved state-of-the-art results on 11 NLP tasks. Brown et al. [2020] pre-trained the massive transformer-based model GPT-3 (Generative Pre-trained Transformer 3), with 175 billion parameters, on 45 TB of compressed plaintext data, and it performed well on various downstream NLP tasks without fine-tuning. These transformer-based models have brought significant advances to NLP.

Inspired by the success of transformer architectures in NLP, researchers have recently applied them to computer vision (CV) tasks. Although CNNs have traditionally been considered the foundation of CV He et al. [2016], Ren et al. [2015], the transformer is emerging as a potential alternative. Chen et al. [2020] trained a sequence transformer to auto-regressively predict pixels, achieving results comparable to CNNs in image classification tasks. Dosovitskiy et al. [2020] proposed the vision transformer model, ViT, which directly applies a pure transformer to sequences of image patches to classify the full image, and it has achieved state-of-the-art performance on multiple image recognition benchmarks. Transformers have also been used to solve various other CV problems, such as object detection Carion et al. [2020], Zhu et al. [2020], semantic segmentation Zheng et al. [2021], image processing Chen et al. [2021], and video understanding Zhou et al. [2018]. Their exceptional performance has led more researchers to propose transformer-based models for a wide range of visual tasks.

1.3.3 Machine-Learning Models and Artificial-Intelligence Methods in Water Treatment

Table 1.1 summarizes AI and ML models and methods, highlighting their general and specific usages in water treatment and modeling applications, as well as their advantages and disadvantages. The second column includes peer-reviewed textbook sources that provide foundational and in-depth explanations of these models and methods. While not all-encompassing, the selected water treatment and monitoring applications are based on a specified methodology.
The majority of the included ML methods fall under the "black-box" archetype, which is generally considered a disadvantage for most models, with the exception of GA/GPs.

Table 1.1: A summary of AI methods and ML models used in water treatment and monitoring.

| Modeling Technique | Learning and General Applications | Reviewed Water Treatment and Monitoring Applications | Advantages | Disadvantages |
| --- | --- | --- | --- | --- |
| Support Vector Machines, Regressions (SVM/SVR) | Regression, Classification, and Pattern Analysis. Cortes and Vapnik [1995], Chua [2003], Noble [2006], Caie et al. [2021], Goodfellow et al. [2016] | Models for disinfection by-product (DBP) modeling; Models for membrane process parameter modeling; Models for biological oxygen demand (BOD) and chemical oxygen demand (COD) modeling; Models for dissolved oxygen modeling of rivers; Models for aquaponics growth rate modeling; Models for aquaponics growth stage classification | Developing models capable of handling high dimensional datasets (i.e., datasets with a high number of inputs vs. a lower number of outputs); Developing models that can handle small changes in the dataset; Developing models that are functional with both linear and nonlinear data | Kernel selection is initially difficult and time consuming when using SVM/SVR modeling; SVM/SVR modeling requires high computational power, making it mostly unsuitable for larger datasets; SVM/SVR modeling is susceptible to noise in datasets; SVM/SVR modeling has relatively long training times |
| Random Forest (RF) | Supervised machine learning. Regression, Classification. Maimon and Rokach [2005], Ceri et al. [2003], Singh et al. [2016], Liu et al. [2012], Hastie et al. [2009] | Modeling adsorption process parameters and percent removal using ML; Developing simple and hybrid models for dissolved oxygen prediction and modeling | Intuitive model architecture for efficient and effective ML modeling; Models capable of handling continuous and categorical inputs, even with missing values or data; Models that are relatively stable and have less impact due to noise and outliers; Bagging algorithms to reduce overfitting and variance in the model | Accuracy and robustness of the model are determined by the density of decision trees; Increasing the density of decision trees results in significant increases in model complexity, training period, and required computational power |
| k-Nearest Neighbor (k-NN) | Supervised machine learning. Classification. Gaya et al. [2017], Zhu [2002], Abba et al. [2020], Wills et al. [2013], Allafi et al. [2017] | Classification of aquaponics growth stage | Requires minimal training and can be easily implemented; Capable of handling new data additions without requiring significant modifications to the model | Poor performance with large datasets or those with high dimensionality; Susceptible to noise and missing data, which can result in decreased accuracy |
| Fuzzy Inference System (FIS) | Decision making, system control. Moraga et al. [2003], Afroozeh et al. [2018], Moon et al. [2011], Kaynak et al. [1998], Zadeh [1998] | Models for chlorine dosage set-point control; Developing models for hydroponics system and environmental control | Utilizing fuzzy logic rather than binary logic to better model the human experience of decision making; Developing models with easily interpretable outputs and decisions with a well defined system | The applicability of models developed with fuzzy logic is dependent on operator defined parameters and experience, which makes them prone to human error |
| Artificial Neural Network (ANN) | Supervised machine learning. Regression, Classification. Goodfellow et al. [2016], Shahmansouri et al. [2021] | DBP (disinfection byproduct) formation modeling; Adsorption process parameter modeling; Membrane process parameter modeling; Chlorine dosage/set-point; Dissolved oxygen concentration modeling | Capable of handling high dimensional datasets; Modeling/prediction results obtained in a reasonable amount of time; Forward propagation capable of cheap and fast computation | High computational power associated with backward propagation stage; Some models and architecture themselves are difficult to interpret; See below for specific ANN model disadvantages |
| Convolutional Neural Network (CNN) | Regression, Classification, Segmentation. LeCun et al. [2015], Kim and Kim [2017], Acharya et al. [2017], Gu et al. [2018] | Disinfection by-product formation modeling | CNNs have been shown to produce highly accurate results on a wide range of image and video recognition tasks; Operations run in parallel and results are obtained quickly | Data must be in fixed dimensions; Requires high computational power: training and processing CNNs can be computationally intensive, requiring significant computational power and resources |
| Recurrent Neural Network (RNN)/Long Short Term Memory (LSTM) | Regression, Classification. LeCun et al. [2015], Zhou et al. [2019], Zhang et al. [2020], Hochreiter and Schmidhuber [1997], Smagulova and James [2020] | Parameter modeling of membrane process; Modeling of dissolved oxygen concentration | Suitable for sequential datasets, especially time series datasets and modeling; Suitable for varying lengths of sequence datasets | Training and processing RNNs requires high computational power; Prone to gradient exploding and vanishing |
| Hammerstein Wiener (HW) | Regression | Dissolved oxygen concentration modeling | Capturing nonlinear effects and simultaneously being computationally less complex than fully nonlinear dynamic models | Limited model structure |
| Genetic Algorithm | Evolutionary, stochastic algorithm. Regression, Classification. Agrawal and Mathew [2004], Yang [2020], Katoch et al. [2021] | DBP formation modeling | Parallelism: Genetic algorithms can explore multiple solutions simultaneously, allowing for faster convergence to an optimal solution; Applicability: GAs are applicable to a wide range of problems, including those with discrete, continuous, or mixed variable types, and those with multiple objectives or constraints | Slow convergence: GAs can sometimes take a long time to converge to the optimal solution, especially for large or complex problems; Premature convergence: GAs can converge prematurely to suboptimal solutions if the population diversity is lost; Computational cost: Genetic algorithms can be computationally expensive, particularly for large scale or high dimensional problems |
| Radial Basis Function (RBF) Kernel | Regression, Classification. LeCun et al. [2015], Karimi et al. [2020], Powell et al. [1981], Baddari et al. [2009] | Modeling of DBP formation; Prediction of adsorption process removal efficiency; Modeling of membrane process parameters | RBF networks are capable of approximating any continuous function, given a sufficient number of hidden neurons and appropriate basis functions; RBF networks can be trained more quickly than other types of neural networks; RBF networks are generally more robust to noise than other types of neural networks | RBF networks are hard to scale to large datasets and high dimensional datasets; The model may become overly complex or overfit the data if the basis functions are not chosen correctly; Susceptible to local minima; The choice of radial basis functions is fixed, which limits its flexibility |
| Adaptive Neuro-Fuzzy Inference Systems (ANFIS) | Regression, Classification. Farhoudi et al. [2010], Karaboga and Kaya [2019], Adedeji et al. [2019] | DBP formation modeling; Adsorption process removal efficiency modeling; Membrane process parameters modeling; Dissolved oxygen concentration modeling; BOD/COD modeling | Fuzzy logic components of ANFIS allow for greater interpretability of the model; ANFIS is capable of modeling complex nonlinear relationships between inputs and outputs, making it suitable for a wide range of applications; ANFIS models are generally robust to noise and uncertainties in the data | ANFIS models can be complex, with many parameters to tune; The training process of ANFIS can be computationally intensive and time consuming; ANFIS model is prone to overfitting the data; ANFIS may not scale well to large or high dimensional datasets; The performance of ANFIS can be sensitive to the initial settings of the membership functions and rule base |
| Extreme Learning Machine (ELM) | Regression, Classification. Zhu et al. [2005], Huang et al. [2004b] | Dissolved oxygen concentration modeling | Relatively short training times; Suitable for pattern classifications | Often faces overfitting or underfitting if too many/few hidden nodes are utilized |
| Boltzmann Machines | Unsupervised learning. Optimization, system control. Demertzis et al. [2022], Harrou et al. [2018] | Wastewater treatment process modeling; Water treatment automated anomaly detection | Capable of capturing complex dependencies between variables; Provide a measure of uncertainty for the learned representations; Flexible architecture: Boltzmann machines can be adapted and extended to various architectures, such as Restricted Boltzmann Machines (RBMs) and Deep Belief Networks (DBNs) | Learning is slow and computationally intensive; Challenging to scale to large datasets and high dimensional problems; Learning algorithm can get stuck in local optima; Difficult to interpret; Outperformed by modern techniques, such as deep learning models |

1.3.4 Applications of AI and ML methods in Water Treatment

Chlorination control has been effectively managed using AI methods, while ML models have shown efficacy in modeling DBP concentrations and significant parameters for adsorption and membrane-filtration processes.
Commonly used statistical measures for evaluating results include the coefficient of correlation (R), coefficient of determination (R²), mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), and relative error (RE). The following sections provide a brief overview of the applications of AI and ML methods in water treatment.

Chlorination and Disinfection By-Product Estimation

In water and wastewater treatment plants, disinfection is crucial for killing or inactivating microorganisms and viruses, often with chlorine-based disinfectants Li et al. [2017], Xu et al. [2015, 2013]. However, chlorine poses human health hazards and can react with bromide and organic matter to create disinfection by-products (DBPs), which are suspected carcinogens and reproductive disruptors Sedlak and von Gunten [2011], Bull et al. [1995]. DBPs are divided into two subcategories, trihalomethanes (THMs) and haloacetic acids (HAAs), with THMs being the most common form. ML technologies are well-suited for predicting and mitigating DBP formation, and AI methods can be used for controlling chlorination. The reviewed studies often tested models on surface waters treated with chlorine as the primary disinfectant and noted success in modeling DBP concentrations in treated water distribution networks and at consumer taps Librantz et al. [2018], Godo-Pla et al. [2021], Singh and Gupta [2012], Mahato and Gupta [2022], Park et al. [2018], Lin et al. [2020], Xu et al. [2022], Peleato [2022], Okoji et al. [2022], Cordero et al. [2021]. Common model inputs include water temperature, pH, chlorine concentration, contact time, and TOC/DOC concentrations, as well as other markers such as bromine concentration, UV254, algae and chlorophyll-a concentrations, and DBP-precursor chemicals.
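The statistical measures listed at the start of this subsection can be computed as in the following minimal sketch. The definitions used are the common textbook ones and are an assumption here, since individual studies may define them differently (for example, some report R² as the square of the correlation coefficient R, and relative error is taken here as the absolute error divided by the observed value). The function name and the observed/predicted values are hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return R, R^2, MAE, MSE, RMSE and mean relative error (RE)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    errors = [p - o for o, p in zip(observed, predicted)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    # Coefficient of determination: 1 - residual sum of squares / total sum of squares.
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot

    # Pearson correlation coefficient between observations and predictions.
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    cov = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted))
    r = cov / math.sqrt(ss_tot * ss_pred)

    # Mean relative error; assumes no observed value is zero.
    re = sum(abs(e) / abs(o) for e, o in zip(errors, observed)) / n

    return {"R": r, "R2": r2, "MAE": mae, "MSE": mse, "RMSE": rmse, "RE": re}


# Hypothetical observed and model-predicted concentrations (arbitrary units).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
metrics = regression_metrics(observed, predicted)
```

MAE and RMSE are expressed in the units of the modeled quantity, whereas R, R² and RE are dimensionless, which is one reason comparisons across studies with different outputs usually rely on the latter.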
The most commonly tested ML model for chlorination and DBP prediction is the Artificial Neural Network (ANN), although other models such as support vector machines, fuzzy inference systems, and genetic algorithms have also been used. In comparative studies, ANNs generally outperform GAs and SVMs, although in some cases SVMs have provided a slight advantage when R² is used as the comparison metric Wortmann and Flüchter [2015], Imo et al. [2007]. Researchers have modeled and predicted common DBPs, such as total trihalomethanes (TTHM) and total haloacetic acids (THAA), as well as specific DBP compounds including dichloroacetic acid (DCAA), trichloroacetic acid (TCAA), bromochloroacetic acid (BCAA), HAA5, HAA9, trichloromethane (TCM), bromodichloromethane (BDCM), and dibromochloromethane (DBCM). Model validation statistics did not show significant differences in predictions for TTHMs or THAAs versus their individual compounds.

Table 1.2: Disinfection by-products (DBP) formation prediction by ML models.

| Target Compounds | Water Source | Disinfectants | AI/ML Technique Used | Input Variables | Output | Year |
| --- | --- | --- | --- | --- | --- | --- |
| Total trihalomethanes (TTHMs) | Surface water | Chlorine | Artificial neural network (ANN), support vector machine (SVM), and gene expression programming (GEP) modeling | Dissolved organic carbon normalized chlorine dose, water pH, temperature, bromide concentration, and contact time | TTHM effluent concentration Singh and Gupta [2012] | 2012 |
| TTHM | Tap water | Chlorine | Artificial neural network and support vector machine | Temperature, pH, residual chlorine, TOC, UV254 | TTHM effluent concentration Mahato and Gupta [2022] | 2022 |
| Haloacetic acids (HAAs) | Tap water | Chlorine | RBF-ANN, linear/log linear regression (MLR) models | Dissolved organic carbon (DOC), UVA254, bromine concentration, temperature, pH, Cl2 concentration, NO2−-N concentration, NH4+-N concentration | DBP tap concentration Lin et al. [2020] | 2020 |
| Trichloromethane (TCM), bromodichloromethane (BDCM) and total-THMs (T-THMs) | Tap water | Chlorine | Radial basis function artificial neural network (RBF ANN), hybrid method of RBF ANN and grey relational analysis (GRA) | Temperature, pH, UV absorbance at 254 (UVA254), dissolved organic carbon, bromide, residual free chlorine, nitrite and ammonia | TTHM Hong et al. [2020] | 2020 |
| TTHMs, sum of trichloromethane (TCM), BDCM | Tap water | Chlorine | Linear/log linear regression models (LRM) and radial basis function artificial neural network (RBF ANN) | pH, temperature, UVA254, Cl2 concentration | DBP tap concentration Xu et al. [2022] | 2020 |
| TTHMs | Tap water | Chlorine | Classification trees | Fluorescence spectra | Dichloroacetonitrile (DCAN), trichloropropanone (TCP), trichloronitromethane (TCNM) Bergman et al. [2016] | 2016 |
| TTHMs, HAAs | Tap water | Peroxide (Ozone), Chlorine | CNN | Fluorescence spectra | DBP effluent concentration Peleato [2022] | 2022 |
| TTHMs, TCM, BDCM, DBCM | Tap water | Chlorine | Adaptive neuro-fuzzy inference system (ANFIS) | Temperature, pH, UVA254, residual chlorine concentration, dissolved organic carbon | DBP effluent concentration Okoji et al. [2022] | |
| Trihalomethanes (THMs) | Tap water | Chlorine | Least-square Boost (LSBoost), XGBoost, and Random forest | Chlorine dose/DOC, reaction time, pH, bromide concentration, and temperature | THM concentration Sikder et al. [2023] | 2023 |
| DCAN, TCP, TCNM | Tap water | Chlorine | Generalized regression neural network (GRNN) | Temperature, total residual chlorine, dissolved organic chlorine, turbidity, pH, conductivity, absorbance, TCM, BDCM, DBCM, DCAA, TCAA | DCAN, TCP, TCNM Mian et al. [2021] | 2021 |
| Chlorine dose and free residual chlorine (FRC) set point | Surface water | Chlorine | ANN | Reservoir set-point output, FRC of treated water tank, FRC output of WTP (mg/L), WTP production flow rate, compensating system flow rate, dosage error | Chlorine dosage, WTP FRC set point Librantz et al. [2018] | 2018 |
| DCAN, chloropicrin, and TCP | Small water distribution networks (SWDNs) | Chlorine | Multivariate linear regression-based model, regression tree-based model, neural networks-based model and advanced non-parametric regression model | Water quality parameters measured in the samples, including water temperature, total residual chlorine, dissolved organic carbon, turbidity, pH, conductivity, and ultraviolet absorbance at 254 nm (UV254) | DCAN, chloropicrin (CPK) and TCP Hu et al. [2023] | 2023 |
| Chlorine | Surface water | Chlorine | FIS | Inflow rate, raw water total organic carbon (TOC), raw turbidity, conductivity, temperature, raw water UV absorbance | Free chlorine and chlorine dioxide dose Godo-Pla et al. [2021] | 2021 |
| Haloacetic acids (HAAs), trichloroacetic acid (TCAA), dichloroacetic acid (DCAA) | Lab synthesized | Chlorine | Support vector regressor, random forest regressor, and multilayer perceptron regressor | Number of aromatic bonds, hydrophilicity, electrotopological descriptors related to electrostatic interactions, and atomic distribution of electronegativity, geometry, ionization potential, steric effects, and acid-base interactions | DBP effluent concentration Cordero et al. [2021] | 2021 |

Adsorption Processes

Adsorption processes are a crucial physical and chemical treatment option for removing various contaminants in the water and wastewater treatment industries.
These processes transfer target molecules from fluids to solid surfaces, known as adsorbents or sorptive media. Due to the complex interactions involved, it can be challenging to determine the adsorption parameters and ultimate removals accurately Karri et al. [2020], Vinayagam et al. [2022]. Predictive ML models can optimize the adsorption process and extend the media’s life, increasing the plant’s effectiveness and confidence in meeting applicable regulations. Studies have modeled adsorption processes with water streams contaminated with metals, industrial dyes, and organic compounds using various adsorbent media, including carbonaceous materials and metal-based nanocomposites Bhagat et al. [2021], Mazloom et al. [2020], Mesellem et al. [2021a], Al-Yaari et al. [2022], Mazaheri et al. [2017], Ahmad et al. [2020], Fawzy et al. [2016], Ullah et al. [2020], Mahmoud et al. [2019], Mesellem et al. [2021b]. Common inputs for modeling adsorption processes include pH, water temperature, adsorbent dose, contact time, and initial adsorbate concentration. Other models have used parameters such as adsorbent particle size, system flow rate, agitation speed, bed height, and BET surface area, among others. The published studies mostly focused on adsorbate percentage removal, while some models predicted adsorption capacity, non-dimensional effluent concentrations, and the relative importance of input water-quality parameters. These models have the potential to support operator decisions and improve the efficiency of the adsorption process. ANN was the most commonly used ML model in studies involving metal, organic, and industrial-dye contaminants, while ANFIS, SVM, and RF were also studied with notable success. These models generally achieved R² values greater than 0.9 and sometimes greater than 0.99 Bhagat et al. [2021], Mazloom et al. [2020], Mohammadi et al. [2019].
SVM models performed slightly better than ANN models in most cases, producing better R² and RMSE values. However, in one case, the optimized ANFIS model performed poorly compared to other successful models, with R = 0.813, and was noted as the worst performing model in a comparison between ANN, ANFIS, and SVM models Mesellem et al. [2021a]. In another case, the ANFIS model achieved adequate performance with an R² of 0.9333 Al-Yaari et al. [2022].

Table 1.3: Adsorption processes and removal rates prediction by ML models.

| Adsorbate | Adsorbent | ML Technique Used | Input Variables | Output | Year |
| --- | --- | --- | --- | --- | --- |
| As (III) | Nanosized iron-oxide-immobilized graphene oxide gadolinium oxide (Fe-GO-Gd) | Artificial neural network (ANN) | Initial concentration, adsorbent dosage, pH, and residence time | As percent removal Maurya et al. [2022] | 2022 |
| As (III) | A variety of adsorbents or biosorbents | Adaptive network-based fuzzy inference system (ANFIS) | pH, As initial concentration, contact time, adsorbent dosage, inoculum size, temperature, agitation speed, flow rate | Adsorbate percent removal Al-Yaari et al. [2022] | 2022 |
| Copper ions | Attapulgite clay | Grid optimization-based random forest (Grid-RF), artificial neural network (ANN) and support vector machine (SVM) | Initial concentration of Cu (IC), the dosage of Attapulgite clay (Dose), contact time (CT), pH, and addition of NaNO3 | Adsorbate percent removal Bhagat et al. [2021] | 2021 |
| As (III, V) | Biochar | Random forest algorithm | Contents of ash, carbon, hydrogen, oxygen, nitrogen, sulfur, and iron, H/C atomic ratio, O/C atomic ratio, (O + N)/C atomic ratio, and specific surface area (SBET), As species (arsenite or arsenate), initial concentration (CAs), adsorption conditions, reaction temperature, solution pH, adsorbent dosage | As adsorption capacity Liu et al. [2023] | 2021 |
| Asphaltenes | Nickel(II) oxide nanocomposites | Group Method of Data Handling (GMDH), ANN, Least Squares Support Vector Machine (LSSVM) | BET surface area and volume of micropores of nanocomposite, pH, amount of nanocomposites over asphaltenes initial concentration (D/C0), temperature | Adsorbate percent removal Mazloom et al. [2020] | 2020 |
| Various organic pollutants | Activated carbon | Artificial Neural Networks (ANNs), Support Vector Machines (SVMs) and Adaptive Neuro-Fuzzy Inference System (ANFIS) | Molar mass, initial concentration, flow rate, bed height, BET surface area, time and concentration of non-dimensional effluents | Non-dimensional effluent concentration Mesellem et al. [2021a] | 2021 |
| Methylene blue (MB), Cd(II) | Natural walnut activated carbon | Boosted regression trees (BRTs), artificial neural network (ANN) and response surface methodology (RSM) | Stirring time, pH, adsorbent mass, MB concentration, Cd(II) concentration | Adsorbate percent removal Mazaheri et al. [2017] | 2017 |
| Methylene blue (MB) | Graphite oxide (GO) nano | ANN | Solution pH, initial dye concentration, contact time and adsorbent dosage | Methylene blue removal efficiency Ghaedi et al. [2014] | 2014 |
| Sunset yellow (SY) | Neodymium(III) chloride modified ordered mesoporous carbon (OMC) | ANN | Initial concentration, reaction time, and adsorbent dosage | SY removal efficiency Ahmad et al. [2020] | 2020 |
| Ni(II), Cd(II) | Typha domingensis (Cattail) biomass | Adaptive neuro-fuzzy inference system (ANFIS) | Initial pH, bioadsorbent dosage, initial metal-ions concentration, contact time, biosorbent particle size | Metal-ions removal efficiency Fawzy et al. [2016] | 2016 |
| Zn(II) | Low-cost adsorbents produced from rice husks | ANN | Contact time, initial concentration and the applied temperature | Adsorption capacity Ullah et al. [2020] | 2020 |
| Phosphate | Encapsulated nanoscale zero-valent iron | ANN | Initial pH, initial PO4^3− concentration, adsorbent dose, contact time, stirring rate | Adsorbate percent removal Mahmoud et al. [2019] | 2018 |
| Systems organic pollutants | Activated carbon | ANN | Molar mass of target contaminant, initial concentration, flow rate, bed height, particle diameter, BET surface area, average pore diameter, time, concentration of dimensionless effluents | Non-dimensional effluent concentration Mesellem et al. [2021b] | 2021 |
| Pb (II) | Magnetic ash/graphene oxide (GO) nanocomposites | ANN | Initial Pb ion concentration, temperature | Adsorption capacity Zeng et al. [2022] | 2021 |
| Pb (II), Cd (II) | Composite of metal organic framework and layered double hydroxide | ANN | Type of ions (Pb, Cd) and time | Adsorption capacity Wei et al. [2021] | 2021 |
| As (III), Cr(VI) | Fibrous zirconium oxide ethylenediamine adipate (ZEDA) hybrid material | Adaptive neuro-fuzzy inference system (ANFIS) | Dose, pH, time, temperature and initial concentration, bed height and flow rate | Removal efficiency Mandal et al. [2015a] | 2021 |
| As (III) | Cerium hydroxylamine hydrochloride (Ce-HAHCl) hybrid material | ANN | Adsorbent dose, pH, contact time, initial concentration and contact temperature | Removal efficiency Mandal et al. [2015b] | 2015 |
| Cr (IV) | Cerium oxide polyaniline (CeO2/PANI) composite | ANN | Adsorbent dose, time, pH, temperature and initial concentration | Removal efficiency Mandal et al. [2015c] | 2015 |

Membrane-Filtration Processes

Membrane processes separate contaminants in water and wastewater treatment by passing the water through a barrier or filter using high-pressure differentials.
These processes are typically used for contaminants that are difficult or costly to remove by chemical or physical means, or that require a level of removal that cannot be achieved by other means. Microfiltration, ultrafiltration, nanofiltration, and reverse osmosis are the most commonly used membrane processes Hube et al. [2020], Pronk et al. [2019]. ML models have been used with microfiltration, ultrafiltration, nanofiltration, reverse osmosis, and submerged membrane bioreactors to treat various water sources contaminated with pollutants and natural compounds such as petroleum, natural organic matter, industrial and pharmaceutical wastes, and saltwater Zoubeik et al. [2019], Fetanat et al. [2021], Khan et al. [2022], Yusof et al. [2020], Nazif et al. [2020], Shim et al. [2021], Ammi et al. [2021a]. ANN is the dominant model used, although ANFIS, SVM, and specific forms of ANNs, including RNNs that utilize LSTM, have also been used for membrane-filtration-process modeling. ML techniques for modeling membrane-filtration processes aim to output several variables, such as transmembrane pressure, permeate flux, and solute rejection. Inputs in published studies include pH, temperature, contact/filtration time, transmembrane pressure, and flux rate, among others. Because the published models predict a wide range of different parameters, a full statistical comparison of the values obtained in these studies is difficult. However, ANN, RNN, and SVM models consistently performed well, achieving R2 values greater than 0.9 and often greater than 0.99 Zoubeik et al. [2019], Khan et al. [2022], Yangali-Quintanilla et al. [2009] (Table 1.4).

Table 1.4: Membrane-filtration parameters prediction by ML models.

| Membrane Type | Water Source | ML Technique Used | Input Variables | Output | Year | Reference |
|---|---|---|---|---|---|---|
| Titanium-based ceramic ultrafiltration | Petroleum production wastewater | ANN, ANFIS, RBF-ANN | Transmembrane pressure (TMP), crossflow velocity (CFV), temperature, pH and time | Permeate flux | 2019 | Zoubeik et al. [2019] |
| Aluminum oxide microfiltration (MF) membrane | Various water types | Hermia model, ANN | Temperature, pH, crossflow velocity (CFV), and transmembrane pressure (TMP) | Permeate flux | 2022 | Zoubeik et al. [2022] |
| Nanolayered double hydroxide decorated thin-film nanocomposite membrane | Various water types | ANN-GA | Nanolayered double hydroxide (NLDH), polyvinylpyrrolidone (PVP, MW = 29 000 g/mol) and polymer concentrations | Pure water flux, protein flux and flux recovery ratio | 2017 | Arefi-Oskoui et al. [2017] |
| Nanocomposite membranes | Various | ANN | Polymer concentration, polymer type, filler concentration, average filler size, solvent concentration (in the dope solution), solvent type, and contact angle | Solute rejection, flux recovery, and pure water flux | 2021 | Fetanat et al. [2021] |
| Oscillating slotted membrane | Dilute suspension mixture of crude oil, dilute suspension mixture of Tween-20 | ANN | Permeate flux, shear rate, filtration time | Transmembrane pressure (TMP) | 2022 | Khan et al. [2022] |
| Submerged membrane bioreactor | Palm oil mill effluent | RNN, nonlinear auto-regressive model | Pump voltage, airflow, transmembrane pressure OR flux | Permeate flux, transmembrane pressure (TMP) | 2019 | Yusof et al. [2020] |
| Submerged membrane bioreactor (MBR) filtration system | Waste water | Feedforward neural network (FFNN), radial basis function neural network (RBFNN) and nonlinear autoregressive exogenous neural network (NARXNN) | Permeate pump voltage | Permeate flux and transmembrane pressure | 2020 | Mahmod et al. [2020] |
| Reverse osmosis membrane (BW30-400) | Ground water and surface water | General regression neural network (GRNN) | Membrane operating period, time interval between consequent cleanings, water temperature, input concentration, inflow, inlet pressure of the compartments, recovery | Pressure drop (PD), salt passage (SP) | 2020 | Nazif et al. [2020] |
| Reverse osmosis | Municipal wastewater | ANNs, random forest, multiple linear regression models | Pressure, flow rate, temperature, conductivity, ORP, turbidity, dissolved organic carbon (COD), TDS | Salt passage, permeate flow rate | 2022 | Odabaşı et al. [2022] |
| Nanofiltration system | Surface water with natural organic matter | Long short-term memory (LSTM) model | Operation time, pressure, initial permeate flux, dissolved organic carbon (DOC), modified FRI, optical coherence tomography (OCT) images | Permeate flux (PF), fouling layer thickness (FLT) | 2021 | Shim et al. [2021] |
| Organic solvent nanofiltration (OSN) | Various water types | Support vector machine (SVM), boosted tree (BT), and artificial neural network (ANN) | Substrate type, nanoparticle type, nanoparticle size, nanoparticle loading, amine monomer type, amine concentration, chloride monomer type, chloride concentration, water contact angle, surface roughness, organic solvent type, solvent properties (molecular weight, viscosity, density and molar volume), solute type, solute concentration, solute charge and solute molecular weight | Relative permeability (RP) and relative selectivity (RS) | 2023 | Wang et al. [2023] |
| Nanofiltration, reverse osmosis membranes | Pharmaceutical wastewater | ANN, SVM | Anti-inflammatory drug properties (logD, dipole moment, the effective diameter of the organic compound in water "dc", molecular length, and molecular equivalent width "eqwidth"); membrane characteristics (molecular weight cutoff "MWCO", sodium chloride salt rejection "SR (NaCl)", zeta potential, and contact angle); and filtration conditions (pH, pressure, temperature, and recovery) | Rejection percentage of the target compound | 2021 | Ammi et al. [2021a] |
| Polyamide-based thin film composite (TFC) FO membrane | Effluent from primary treatment plant | ANN, SVM | Organic matters, sodium ion, and calcium ion concentrations | Water flux, membrane fouling, and removal efficiencies | 2022 | Im et al. [2022] |
| Polyamide nanofiltration (NF) and reverse osmosis (RO) membrane | Various water types | ANN | Molecular weight (MW), log Kow, dipole moment, molar volume, molecular length, molecular width, molecular depth, equivalent width; membrane characteristics: molecular weight cut-off (MWCO), pure water permeability, magnesium sulphate salt rejection (SR), surface membrane charge (as zeta potential), and hydrophobicity (as contact angle); operating conditions: operating pressure and permeate flux | Rejection of neutral organic compounds | 2009 | Yangali-Quintanilla et al. [2009] |
| Nanofiltration (NF) and reverse osmosis (RO) membrane | Domestic wastewater | Quantitative structure-activity relationship single neural networks "QSAR-SNN" and bootstrap aggregated neural networks "QSAR-BANN" | Pharmaceutical active compound properties (hydrophobicity "logD", dipole moment, the effective diameter of organic compound in water "dc", molecular length, and molecular equivalent width "eqwidth"); membrane characteristics (molecular weight cut-off "MWCO", sodium chloride salt rejection "SR (NaCl)", zeta potential, and contact angle); and filtration conditions (pH, pressure, temperature, and recovery) | Removal efficiency | 2021 | Ammi et al. [2021b] |
| Nanofiltration and reverse osmosis membranes | Various water types | Bootstrap aggregated neural networks (BANN) | Molecular weight, ratio of the equilibrium concentration (logD), dipole moment, length, eqwidth, SR (NaCl), zeta potential, contact angle, pH, pressure, recovery, temperature | Uncharged organic compounds rejection | 2017 | Khaouane et al. [2017] |
| Nanofiltration and reverse osmosis membranes | Various water types | ANN | Molecular weight, ratio of the equilibrium concentration (logD), dipole moment, length, eqwidth, membrane molecular weight cutoff (MWCO)/pore size MWCO, SR (NaCl), zeta potential, contact angle, pH, pressure, recovery, temperature | Uncharged organic compounds rejection | 2015 | Ammi et al. [2015] |
| Nanofiltration and reverse osmosis membranes | Various water types | Single neural networks (SNN) and bootstrap aggregated neural networks (BANN) | Molecular weight, molecular effective diameter "dc", ratio of the equilibrium concentration (logD), dipole moment, length, eqwidth, membrane molecular weight cutoff (MWCO)/pore size MWCO, SR (NaCl), SR (MgSO4), zeta potential, contact angle, pH, pressure, recovery, temperature | Removal efficiency | 2018 | Ammi et al. [2018] |
| Nanofiltration and reverse osmosis membranes | Various water types | Random forest, neural network models | Molecular class, molecular weight, the octanol/water partition coefficient (log Kow), partition coefficient (logD), dipole moment, length, eqwidth, depth, equivalent length, membrane type, molecular weight cutoff (MWCO)/pore size MWCO, zeta potential, contact angle, pH, operating pressure, recovery, salt rejection SR (MgSO4) | Membrane rejection | 2020 | Lee and Kim [2020] |

CHAPTER 2
Tap water fingerprinting using a convolutional neural network built from images of the coffee-ring effect

2.1 Abstract

A low-cost tap water fingerprinting technique was evaluated using the coffee-ring effect, a phenomenon by which tap water droplets leave distinguishable “fingerprint” residue patterns after the water evaporates. Tap waters from communities across southern Michigan, dried on aluminum and photographed with a cell phone camera and 30× loupe, produced unique and reproducible images. A convolutional neural network (CNN) model was trained using the images from the Michigan tap waters, and despite the small size of the image dataset, the model assigned images into groups with similar water chemistry with 80% accuracy. Synthetic solutions containing only the majority species measured in Detroit, Lansing, and Michigan State University tap waters did not display the same residue patterns as collected waters; thus, the lower concentration species also influence the tap water “fingerprint”.
Residue pattern images from salt mixtures with an array of sodium, calcium, magnesium, chloride, bicarbonate, and sulfate concentrations were analyzed by measuring features observed in the photographs and by applying principal component analysis (PCA) to the image files and particle measurements. Together, these analyses highlighted differences in the residue patterns associated with the water chemistry of the sample. The results of these experiments suggest that the unique and reproducible residue patterns of tap water samples, which can be imaged with a cell phone camera and a loupe, contain a wealth of information about the overall composition of the tap water, and thus the phenomenon should be further explored for potential use in low-cost tap water fingerprinting.

2.2 Introduction

Need for innovation in drinking water monitoring

With tap water crisis events continuing to occur in both developed and developing nations, demand is high for low-cost tap water testing that is practical for citizens to apply. When a teacher, student, household, or community member would like to test their tap water, they are faced with single-use paper test strips, probes, standard analytical methods for measuring water quality, or water testing fees for hundreds or even thousands of different water quality parameters. Choosing which water constituents to test and which methods to apply is challenging, particularly because there is little to no tap water education in typical K-12 and university systems. In this work, experiments were conducted to determine if the coffee-ring effect, precipitation reactions, and convolutional neural networks (CNN) could be harnessed for low-cost “fingerprinting” of tap water samples as a whole, rather than measuring one contaminant at a time.

How does the coffee-ring effect work

The coffee-ring effect offers low-cost separation of particles in aqueous samples due to the physics of water droplet drying on hydrophobic substrates. This phenomenon occurs when water evaporates evenly from a water droplet surface with a pinned diameter, such that the droplet shrinks in height while the diameter remains constant Wong et al. [2011], Deegan et al. [1997]. The shrinking height of the droplet corresponds to a decrease in contact angle at the pinned surface as the droplet dries, squishing particles into concentric circles by size Wong et al. [2011]. The phenomenon was termed nanochromatography after separation resolutions on the order of 100 nm were demonstrated for mixtures of fluorescently labeled antibodies, B-lymphoma cells, and E. coli at particle volume fractions on the order of <0.04% Wong et al. [2011]. Force balance analysis suggests nanoscale separation is possible for low particle volume fractions due to the difference in the magnitude of adhesion versus surface tension forces for large (1 μm) and small (40 nm) particles at the drop edge, where surface tension forces move particles towards the center of the drop and substrate-particle adhesion forces hold particles in place. Most existing studies on the coffee-ring effect have been conducted on particles or biological molecules, sometimes in buffer solutions or biofluids, where particle-like species deposit on the outer edge, forming concentric rings of particles separated by size, and soluble salts deposit throughout the center of the drop (Figure 2.1). Particles within a drop are known to deposit on the outer edge when the fluid flow that delivers particles to the drop edge is faster than the surface capture effect, the latter of which occurs if the concentration of particles at the surface of the droplet is high or if water evaporation is accelerated Li et al. [2016c].
Tap water solutions, however, are composed largely of dissolved ions rather than particles. Within dissolved salt solutions, the majority of the particles observed in the residue patterns must form as water evaporates and ion concentrations rise above the solubility limits of their respective salts; however, very little work has been conducted to document the coffee-ring patterns for complex mixtures of salts Shahidzadeh et al. [2015]. It is expected that in mixed salt solutions both the coffee-ring effect and the fundamental characteristics of the salts that form will control the location, sizes, and shapes of each salt in the resulting residue pattern, with the least soluble salts, which form particles quickly, separated by size at the drop edge. Thus, features such as the sizes, shapes, colors, quantity, and location of particles within the coffee-ring residue of a water sample are expected to correlate with water chemistry. The coffee-ring effect has previously been partnered with Raman spectroscopy to quantify cyanotoxins in environmental water, signs of ocular damage in human tear fluid, and osteoarthritis determinants in knee fluid; however, the patterns produced by the coffee-ring effect have not been harnessed without expensive chemical analysis instrumentation to record the composition of the deposited residues.

Image analysis via convolutional neural network (CNN)

Machine learning methods, especially deep learning artificial neural networks (ANNs), are increasing in popularity in research and engineering for problems that are challenging to solve with traditional analysis techniques. Convolutional neural networks (CNNs) have been widely tested and successfully used for image analysis, especially in segmentation problems, such as differentiating between an object and the background.
With the development of more advanced CNN architectures (e.g., CNN models involving more layers, new activation functions, more options for objective functions to calculate error, more sophisticated model structures) and the use of graphics processing units with higher computational speeds, CNNs are being developed to analyze a growing variety of data types, including medical images, electron microscopy images, and chemical structures. For example, CNN models have proven able to identify brain tumors in magnetic resonance images (MRI) faster and more accurately than state-of-the-art tools and can identify the pancreas in computerized tomography (CT) images, both of which are challenging analysis problems because of anatomical variability. In chemistry, CNN models are being trained using 2D and 3D images of molecular structure for quantitative structure-activity relationship (QSAR) modeling to predict toxicity Matsuzaka and Uesawa [2019] and to predict therapeutic use classes of drugs Meyer et al. [2019]. CNN models have also been trained to assign surface-enhanced Raman spectroscopy (SERS) spectra to classes of metabolites and to assign bundles of SERS spectra (8 x 8 pixel hyperspectral images) to the concentration of rhodamine 800 dye at femtomolar concentrations for single molecule detection Lussier et al. [2019], Thrift and Ragan [2019]. Additional applications include identifying the types and positions of defect structures in silicon-doped graphene from unprocessed scanning transmission electron microscopy images, predicting chemical reactivity, and diagnosing faults in the chemical process industry. Limitations of CNNs include the computational cost of model training, the sensitivity of classification to unbalanced datasets (unequal numbers of samples in different classes can result in poor model performance), and the need for experienced users to modify the model structure and tune parameters for every individual CNN application.
However, the classification accuracy observed and the wide variety of cases in which CNNs can be applied suggest that their use will continue to grow.

The goal of this research was to determine if the residue patterns of tap water samples imaged with a cell phone camera and loupe were sufficiently reproducible, sensitive, and correlated to water chemistry to be valuable for low-cost analyses. Specific objectives were to create a library of images of residue patterns for real and synthetic tap waters, determine if the residue patterns were reproducible for a given water chemistry, document the response of the fingerprint to changes in composition of majority species (sodium, calcium, magnesium, chloride, bicarbonate, sulfate), and apply machine learning image analysis techniques to differentiate between residue patterns. These objectives were met by photographing residue patterns for a variety of collected tap water solutions and increasingly complex synthetic water solutions with a cell phone camera through a jeweler’s loupe, measuring features observed in residue patterns, correlating residue features to water chemistry, and creating a CNN to classify residue pattern images into groups with similar water chemistry.

Figure 2.1: Nanoscale separation of particles within a drying droplet is provided by the phenomenon known as the coffee-ring effect.

2.3 Experimental

Water samples

Thirty tap water samples were collected from communities across southern Michigan that utilize a variety of water treatment systems (Table 2.1, Table 6.1). One liter of each water sample was collected in a hydrochloric acid washed polypropylene bottle from the water supply at a public park, community center, or city building water fountain or restroom tap. Samples were stored at 4 °C until analysis using the coffee-ring effect and standard methods. Samples were not filtered before measurement.
Conductivity was measured with a Hach HQ40D portable conductivity meter and intelliCAL™ CDC401 standard conductivity probe, and pH was measured with an Orion Star A211 pH meter and Orion 8135BNUWP Ross Ultra Fast pH probe (Thermo Scientific). Chloride, sulfate, phosphate, fluoride, bromide, and nitrate concentrations were measured by ion chromatography with a Dionex series 2000i/sp instrument. Bicarbonate was measured by titration to a pH of 4.5 using standard method 2320. Metals were measured by Varian 710-ES Axial ICP-OES, and samples were digested with nitric acid using standard method 3030 E. One replicate sample was measured for every ten samples, and measurements that deviated from expected values (from annual municipal water quality reports or previous measurements) were repeated.

Table 2.1: Measured water quality data for tap water samples collected across Michigan and treatment information from annual municipal water quality reports and system operators. Averages and standard deviations are listed for values measured in replicate.

| City | Water treatment | pH | Cond (µS/cm) | Na⁺ (mM) | Ca²⁺ (mM) | Mg²⁺ (mM) | K⁺ (mM) | Cl⁻ (mM) | SO₄²⁻ (mM) | HCO₃⁻ (mM) | PO₄³⁻ (mM) | Cu (mM) | Fe (mM) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MSU, academic hall | Chlorine, fluoride, phosphate, sodium hydroxide | 6.96 | 823 | 1.08 | 2.24 | 1.54 | 0.041 | 0.91 | 0.92 | 6.94 | 0.01 | 6.1×10⁻³ | 2.2×10⁻² |
| Durand | Iron removal filters, chlorine | 6.72 | 388 | 0.31 | 0.16 | 0.11 | 0.075 | 1.10 | 0.47 | 4.88 | 0.02 | 1.6×10⁻³ | 2.4×10⁻³ |
| Kalamazoo | Chlorine, fluoride, and phosphate | 8.52 | 976 | 3.17 | 1.06 | 1.29 | 0.06 | 3.11 | 0.39 | 6.23 | 0.01 | 1.2×10⁻³ | 4.1×10⁻³ |
| Portland | Chlorine, phosphate | 6.94 | 909 | 0.76 | 0.53 | 2.86 | 0.109 | 0.05 | 0.12 | 7.51 | BD | 1.1×10⁻³ | 1.1×10⁻³ |
| Battle Creek Site A | Chlorine, fluoride, and phosphate | 7.22 | 673 | 1.60 | 1.77 | 1.04 | 0.035 | 1.16 | 0.50 | 5.47 | 0.02 | 4.0×10⁻³ | 7.9×10⁻⁴ |
| Battle Creek Site B | Chlorine, fluoride and phosphate | 7.22 | 673 | 1.60 | 1.77 | 1.04 | 0.035 | 1.16 | 0.50 | 5.47 | 0.02 | 8.9×10⁻³ | 2.0×10⁻² |
| Charlotte | Chlorine, phosphate | 7.01 ± 0.29 | 1215 ± 23 | 3.79 | 2.53 | 3.32 | 0.252 | 4.10 | 0.54 | 6.89 | 0.02 | 3.9×10⁻⁴ | 4.4×10⁻³ |
| Fowlerville | Chlorine, phosphate | 7.14 | 978 | 4.63 | 1.10 | 0.91 | 0.158 | 3.53 | 0.24 | 6.07 | 0.01 | 7.8×10⁻⁴ | 9.4×10⁻³ |
| Lansing site A | Lime softening | 8.70 | 609 | 4.29 | 0.55 | 0.56 | 0.082 | 2.33 | 1.34 | 0.99 | 0.01 | 3.1×10⁻⁴ | 2.5×10⁻³ |
| Lansing site B | Lime softening | 7.04 | 535 | 3.79 | 0.63 | 0.49 | 0.079 | 1.91 | 1.16 | 0.83 | 0.01 | 1.4×10⁻³ | 5.9×10⁻⁴ |
| East Lansing | Lime, ferric fluoride, filtration, chloramine, fluoride, phosphate | 6.61 | 361 | 1.43 | 0.58 | 0.56 | 0.063 | 1.10 | 0.50 | 1.39 | 0.01 | 1.8×10⁻³ | 5.3×10⁻³ |
| Howell | Lime softening | 8.15 | 453 | 2.76 | 0.55 | 0.54 | 0.092 | 1.83 | 0.62 | 1.29 | 0.01 | 6.9×10⁻⁴ | 6.6×10⁻³ |
| MSU residence hall | Ion exchange, chlorine, fluoride, phosphate, sodium hydroxide | 7.34 | 880 | 19.57 | 0.07 | 0.04 | 0.025 | 1.16 | 0.84 | 7.09 | 0.01 | 1.3×10⁻³ | 2.3×10⁻² |
| Williamston | Iron removal, softening, chlorine, phosphate | 7.51 | 710 | 6.02 | 0.99 | 0.53 | 0.075 | 0.93 | 0.43 | 6.83 | 0.02 | 1.0×10⁻² | 6.4×10⁻⁴ |
| Genoa Twp Soft | Household water softener, private well | 7.04 ± 0.23 | 1920 ± 30 | 18.65 ± 0.47 | 0.20 ± 0.015 | 0.20 ± 0.035 | 0.03 ± 0.025 | 9.7 ± 0.3 | 0.61 | 8.55 | BD | 8.1×10⁻⁴ | BD |
| Genoa Twp, Untreated | Private well, untreated | 7.24 | 1940 | 6.69 | 3.81 | 1.98 | 0.12 | 11.16 | 0.60 | 8.26 | BD | 4.5×10⁻⁴ | 4.7×10⁻² |
| Rest stop, Okemos | Chlorine if bacteria found | 7.36 | 516 | 3.08 | 1.41 | 0.46 | 0.141 | 0.09 | 0.15 | 6.19 | BD | 3.4×10⁻⁴ | 1.7×10⁻³ |
| Rest stop, Zeeland | Chlorine if bacteria found | 7.05 | 560 | 3.35 | 1.04 | 0.82 | 0.085 | 0.79 | 0.21 | 5.38 | BD | 2.7×10⁻⁴ | 9.3×10⁻³ |
| Rest stop, I96/M66 | Chlorine if bacteria found | 7.07 | 546 | 1.22 | 1.76 | 1.19 | 0.029 | 0.05 | 0.12 | 6.86 | BD | 2.5×10⁻⁴ | 4.0×10⁻² |
| Rest stop, Fenton | Chlorine if bacteria found | 6.96 | 606 | 2.71 | 1.10 | 1.21 | 0.090 | 1.20 | 0.14 | 5.64 | BD | 4.1×10⁻³ | 1.3×10⁻² |
| Allegan | Reverse osmosis | 6.53 | 295 | 1.41 | 0.73 | 0.52 | 0.019 | 0.63 | 0.17 | 2.51 | 0.02 | 1.8×10⁻⁴ | 6.0×10⁻⁴ |
| Genoa Twp RO | Reverse osmosis of private well after household water softener | 6.64 | 264 | 3.23 | 0.08 | 0.02 | 0.006 | 1.27 | 0.11 | 1.37 | BD | 4.8×10⁻⁴ | 4.8×10⁻⁴ |
| Detroit | Great Lakes Water Authority (GLWA), Water Works Park plant | 6.21 | 226 | 0.43 | 0.59 | 0.34 | 0.023 | 0.51 | 0.26 | 1.55 | 0.02 | 1.8×10⁻³ | 5.4×10⁻³ |
| Flint | GLWA, Lake Huron plant | 6.86 | 219 | 0.32 | 0.07 | 0.02 | 0.022 | 0.52 | 0.23 | 1.64 | 0.04 | 4.4×10⁻³ | 5.6×10⁻³ |
| Swartz Creek | GLWA, Lake Huron plant | 5.87 | 209 | 0.41 | 0.08 | 0.03 | 0.024 | 0.51 | 0.23 | 1.61 | 0.02 | 6.9×10⁻⁴ | 4.9×10⁻³ |
| Grand Rapids | Lake Michigan Filtration plant | 7.17 | 304 | 0.44 | 0.89 | 0.26 | 0.030 | 0.63 | 0.33 | 2.2 ± 0.04 | 0.02 | 4.9×10⁻³ | 1.9×10⁻² |
| Holland | Holland Board of Public Works Water Filtration Plant | 6.76 | 302 | 0.74 | 0.85 | 0.51 | 0.034 | 0.60 | 0.29 | 2.45 | BD | 3.7×10⁻³ | 5.7×10⁻³ |
| Wyoming | Donald K. Shrine Water Treatment Plant | 7.16 ± 0.03 | 302 ± 8 | 1.30 ± 0.005 | 0.905 ± 0.005 | 0.5 ± 0.001 | 0.036 ± 0.002 | 0.61 ± 0.01 | 0.34 | 2.17 | BD | BD | BD |

In order to determine the effects of specific ions on residue patterns, synthetic water samples containing various concentrations of the main components in tap water were prepared, including synthetic hard freshwater (192 mg/L NaHCO3, 120 mg/L MgSO4, 120 mg/L CaSO4·2H2O, and 8 mg/L KCl) and mixtures of NaCl, NaHCO3, CaCl2, MgCl2, CaSO4, MgSO4, and Na2SO4. Salt mixtures were designed to examine ranges that may be observed in real tap waters; thus, the low and high concentrations tested are not the same for every salt. Simplified synthetic tap waters were created to mimic concentrations of calcium, magnesium, sodium, chloride, sulfate, and total carbonate species observed in tap water. Complex synthetic tap waters also contained phosphate, nitrate, fluoride, copper, and iron. Natural organic matter was not added because larger organic molecules typically deposit on the outer edge of the drop, where the organics cannot be identified from images alone.

Collection of coffee-ring residue patterns

Two-microliter droplets of each water were gently pipetted onto aluminum substrates (6061 with mirror-like finish, McMaster-Carr 1655T1). Substrates from the manufacturer were used directly after peeling off the plastic film that protects the mirror-like finish. Samples were left uncovered for 20-30 minutes or until dry without being moved, touched, or disturbed from the moment of deposition on the slide (Figure 6.1). Relative humidity in the lab ranged from 47-52% and temperature from 23-25 °C over the course of the coffee-ring effect experiments. Samples were imaged with a Samsung S6 cell phone through a Fancii 30× triplet loupe (Amazon.com) with the LED light on (Figure 2.2). At least five drops were imaged for each sample, and residues that were not round due to lack of pinning to the surface were repeated.
Relative humidity and temperature were recorded for each experiment with a Fisher Scientific Traceable Relative Humidity/Temperature Meter (11-661-13). Reproducibility of water residue patterns was examined by three researchers testing a subset of water samples on several substrates.

Figure 2.2: Tap water fingerprints were captured by drying droplets on aluminum and photographing with a cell phone camera through a loupe.

Image processing, principal component analysis (PCA), and cluster analysis

Residue pattern photographs were cropped manually with ImageJ to dimensions of 700 by 600 pixels. Scale bars of 0.5 mm were added in ImageJ using ruler tape captured in photographs as a reference, dimensions of features in residues were measured, and processed images were saved in JPEG format. Images were converted to black and white, noise was removed, and particles were measured in MATLAB version R2017b (im2bw, medfilt2, and regionprops functions). Principal component analysis (PCA) was conducted on both the particle measurements and the image files themselves using Python version 3.6.4 (matplotlib, numpy, and sklearn packages; Figure 6.2). Measured water chemistry for each tap water sample was plotted on a trilinear classification diagram using GW_Chart (Version 1.29.0.0, USGS) with samples sorted according to treatment. The cluster analysis algorithm CLARA was used to group samples into six groups using all thirteen of the measured parameters after normalization, in which the mean was subtracted from each measured value and the result divided by the standard deviation Liu and Özsu [2009]. The cluster analysis result was visualized in a two-dimensional map using the two main components identified by principal component analysis with the R factoextra package.

Convolutional neural network

A convolutional neural network (CNN) model was created to classify images.
Ten residue images from each water sample were used for model training and testing, five of which were from fresh samples and five collected after storage at 4 °C. The first three replicates of each water sample for each condition (fresh and stored) were used for training the model (180 images), and the last two replicates were used for testing the model (120 images). Image pre-processing involved resizing each image from 470 by 470 pixels to 300 by 300 pixels and converting from color to gray-scale (Table 6.3). The brightness was normalized for each image by dividing the brightness value of each pixel in an RGB channel by the sum of the brightness values of all pixels in that RGB channel. A CNN model was built with two convolutional layers and three fully connected layers in Python (Figure 6.3). In the first convolutional layer, eight filters were used to extract pattern features, and sixteen filters were used in the second layer to extract deeper pattern features. After the convolutional layers, three fully connected layers were used to fit the data. The fitting method was stochastic gradient descent (SGD) with probability calculations through the SoftMax function. The batch size was five for each optimization step. Samples were randomly selected according to their weights, which were set equal at the beginning but updated after each optimization step based on their classification result. The learning rate was 10⁻⁴ in the model training process. In each iteration, five samples were randomly selected with replacement from the 180 training samples according to their weights, and 36 iterations constituted one epoch. After each epoch, training accuracy, testing accuracy, training loss, and testing loss were calculated. Two hundred epochs were processed for each model and ten independent models were trained. The test dataset accuracies of the last one hundred epochs and of the last epoch model were recorded for analysis.
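
The bookkeeping in the training procedure above can be illustrated with a minimal, standard-library-only sketch. It is not the model used in this work: the function names are hypothetical, the convolutional and fully connected layers are left out, and the rule for updating sample weights is omitted because it is not specified in the text. The sketch only shows the per-channel brightness normalization, the SoftMax calculation, weighted sampling of a batch with replacement, and the arithmetic that gives 36 iterations per epoch.

```python
import math
import random

# Values taken from the text: 180 training images
# (30 waters x 2 storage conditions x 3 replicates), batches of five.
N_TRAIN = 180
BATCH_SIZE = 5
ITERS_PER_EPOCH = N_TRAIN // BATCH_SIZE  # 36 iterations make up one epoch


def normalize_brightness(channel):
    """Divide every pixel of one channel (a list of rows) by the channel's total brightness."""
    total = sum(sum(row) for row in channel)
    if total == 0:
        # Assumption: an all-black channel is treated as an error rather than silently passed through.
        raise ValueError("channel is entirely black; cannot normalize")
    return [[pixel / total for pixel in row] for row in channel]


def softmax(logits):
    """Convert raw class scores into probabilities (shifted by the max for numerical stability)."""
    shift = max(logits)
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_batch(weights, batch_size, rng):
    """Draw sample indices with replacement, with probability proportional to each weight."""
    return rng.choices(range(len(weights)), weights=weights, k=batch_size)


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [1.0] * N_TRAIN  # weights start out equal, as described in the text
    for _ in range(ITERS_PER_EPOCH):
        batch = sample_batch(weights, BATCH_SIZE, rng)
        # A real training step (forward pass, SoftMax loss, SGD update, weight update) would go here.
    print("iterations per epoch:", ITERS_PER_EPOCH)
    print("example class probabilities:", softmax([1.0, 2.0, 3.0]))
```

Because sampling is done with replacement, an "epoch" here is defined by the number of draws (36 batches of five) rather than by a full pass in which every training image is seen exactly once.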

2.4 Results and discussion

Coffee-ring residue patterns for each Michigan tap water are unique

Michigan State University and the surrounding communities frequently rely on groundwater sources with minimal treatment (chlorine and phosphate, sometimes with fluoride) or hardness removal by lime softening or ion exchange. Rural communities also frequently rely on point-of-use or point-of-entry treatment such as home water softeners or reverse osmosis systems. Many communities near Great Lakes coastlines utilize surface water sources and conventional treatment. The Great Lakes Water Authority (GLWA) treats and distributes water from Lake Huron or the Detroit River to a substantial fraction of Michigan’s population in the east, and many communities in the west utilize Lake Michigan. Tap water collected from the sampled Michigan communities displayed a wide range of chemical compositions (Table 2.1). The coffee-ring residue patterns for each type of tap water were unique, and waters with similar chemistry displayed similar residue features (Figure 2.3). Reproducibility was evaluated initially by imaging five droplets of each sample on the same slide, and most residue patterns displayed nearly identical features across replicates (Figure 6.4). Lime softened water showed variability across replicates, with some samples displaying a thin film of particles across the entire drop and others producing a clearing in the center. A subset of samples was analyzed by three analysts with varying levels of experience. Mirrored aluminum 6061 substrates were chosen due to low cost, availability, compatibility with the loupe and cell phone camera for imaging, and ease of use for inexperienced users; substrates were inspected before use for scratches or defects, and only smooth areas without blemishes were used for the coffee-ring effect experiments. Nanopure water and synthetic hard freshwater were applied as controls.
The substrates contained residue remaining from the manufacturer that was captured in images of nanopure water controls (Table. 6.5). A trend was not observed between residue patterns for samples and the residue pattern or lack of residue pattern in the nanopure water controls (Table. 6.5). Tap water samples were tested on multiple substrates to ensure that variation observed in the patterns was not due to the substrate (Tables. 6.6). All analysts produced more consistent data across a single slide than across different slides. Despite variability between substrates, MSU water from academic buildings (hard water) displayed similar patterns on substrates tested across all researchers. Untreated groundwater from the rest stop was characteristically more variable, displaying one of two patterns with a thin film of small particles and either a white ring at the outer edge or a circular segment to one side. Residue patterns for lime softened water from East Lansing were typically consistent across a single slide, but showed two types of patterns with several concentric rings at the drop edge and either a clear center or a thin film of feathery particles across the center surface. Neither the nanopure blank nor synthetic hard freshwater were sufficient to predict which samples would produce thin films of particles for the lime softened water. A similar result was observed for softened Lansing water (Table. 6.5). Synthetic lime softened water may function as a more sensitive positive control for future experiments. Only analyst 1 observed the residue pattern for Detroit with the center scattering of particles concentrated on one side of the drop; this result was attributed to a lab bench at an angle of approximately 1° (Table. 6.5). Residue patterns that displayed variability across substrates were still sufficiently unique from samples with different chemistry to identify what type of drinking water treatment was applied. 
The results of these experiments suggest that a more uniform substrate and level surface may be required to reduce variability for applications beyond identifying the tap water source from a library of residue fingerprints. It is well established that the hydrophobicity of the substrate influences the coffee-ring effect Shahidzadeh et al. [2015], Zhang et al. [2003], Ortiz et al. [2004], Zhong et al. [2017]; thus, the substrate used for training datasets must be consistent with that of unknown samples. Additional variables that must be controlled during coffee-ring effect experiments include temperature Li et al. [2016c], Takhistov and Chang [2002], humidity Li et al. [2016c], Chhasatia et al. [2010], Kaya et al. [2010], and the volume of the droplet Ortiz et al. [2006] (further evaluation of the durability of the protocol is included in the ESI and Table. 6.6).
Figure 2.3: Coffee-ring residue patterns of freshly collected Michigan tap waters. The lab temperature was 24-25 °C and relative humidity 52% for this experiment. Replicates are included in Table. 6.4.
Synthetic tap water solutions containing six main constituents do not fully explain the environmental samples
Synthetic tap water solutions were created to reflect components measured in Lansing (lime softened groundwater), MSU (minimally treated hard water), and Detroit water (surface water with conventional treatment). A synthetic mixture of simplified Lansing water containing only the six major components (calcium, magnesium, sodium, chloride, sulfate, and total carbonate species) displayed many features observed in Lansing waters on various slides, but the simplified synthetic Detroit and MSU waters were different from the collected tap water samples (Table. 2.2). The simplified synthetic Detroit water had particles deposited at the drop edge like the environmental sample, but the rings, color, and center were different.
Adding iron, copper, nitrate, fluoride, and phosphate caused the synthetic residue pattern for Detroit water to become closer to the environmental sample, but still did not capture all the features. Additional studies must be conducted to determine the influence of pH and organic matter on the residue patterns as well. The complex synthetic Detroit water sample captured the yellow and blue coloring observed in the concentric ring at the inner drop edge, possibly due to the presence of phosphate and iron forming insoluble salts. The MSU tap water still did not resemble the collected water after addition of the lower concentration components. This finding provides further evidence that lower concentration species, pH, or particulates likely play a role in defining residue patterns.
Table 2.2: Residue patterns of simplified and complex synthetic tap waters compared to real tap water, with the measured pH of each solution listed below the image (24 °C, 47% relative humidity). Replicate images are shown in Table. 6.1. Only the pH values are reproduced here.
Sample     Collected tap water   Simplified synthetic   Complex synthetic
Lansing    7.0-8.7               8.08                   8.02
MSU        7.34                  7.85                   8.01
Detroit    6.21                  7.39                   7.35
Residue patterns document water chemistry
Simple synthetic mixtures demonstrate trends between water chemistry and particle shape, size, and location of deposition.
To confirm that trends in particle shapes and sizes in coffee-ring patterns are influenced by the identities and concentrations of solutes, three-salt synthetic mixtures were created of NaCl with CaCl2 and MgCl2, NaHCO3 with CaCl2 and MgCl2, Na2SO4 with CaSO4 and MgSO4, and NaHCO3 with CaSO4 and MgSO4 at concentrations relevant to tap waters. In the presence of calcium and magnesium chlorides, NaCl caused large uniform particles to be distributed across the drop, while NaHCO3 caused smaller and more densely packed flakes and feathering patterns at the higher concentrations (Table. 2.3).
These features could be quantified by measuring the average area of particles and the number of particles for each set of images. For example, the average area of particles decreased with decreasing NaCl concentration in the presence of 3.0 mM CaCl2 and 1.5 mM MgCl2, and the average number of particles decreased with decreasing NaHCO3 concentration in the presence of 0.5 mM CaCl2 and 0.25 mM MgCl2 (Figure. 2.4). It was hypothesized that because NaCl and NaHCO3 are highly soluble, both produced thin films of particles that were likely deposited through surface capture or settling rather than the coffee-ring effect, as the ions remain dissolved through most of the droplet evaporation process. Crystal formation was sensitive to differences in slides; a similar result was found on additional slides, though the large, distinct, uniformly sized NaCl particles did not form at the lower concentrations of calcium and magnesium chlorides (Table. 6.9). Intricate particle shapes were observed for mixtures of sodium bicarbonate with calcium and magnesium chlorides, but the shapes of the particles were not identical across all batches of slides. Additional experiments are required with higher quality substrates to determine how the shape of the bicarbonate particles correlates to the matrix water chemistry and surrounding conditions. Simple synthetic mixtures containing sulfate salts of sodium, magnesium, and calcium had multiple concentric rings at the drop edge, likely due to differences in solubility between calcium sulfate, magnesium sulfate, and sodium sulfate. Again, the number of particles decreased with decreasing sodium sulfate concentration in the presence of 0.5 mM CaSO4 and 0.25 mM MgSO4 (Figure. 2.4).
Adding bicarbonate to the mixture at the same concentration of calcium and magnesium sulfate caused the concentric rings at the drop edge to be eliminated to create a thin film of densely packed very small uniform particles, except for the lowest sulfate and bicarbonate concentrations (Table. 2.3), though the number of particles still decreased with sodium bicarbonate concentration (Figure. 2.4). PCA conducted on the image files themselves (five replicates of each image) was compared to PCA on the measurements of particle sizes and numbers within the images. In both cases, three principal components were useful in clustering the images into groups with similar ions, but not sufficient to group samples by concentrations of components (Figure. 2.5). Three principal components explained around 50% of the variability of the data set for PCA conducted on the image files (Figure. 6.4). PCA is valuable for highlighting variability in a dataset, but it does not take into account subimages or sub- patterns (such as rings at the drop edge versus the center of the residue pattern) Kadappa and Negi [2016]; thus, it is not surprising that PCA on the image files was not sufficient to differentiate between images with different concentrations of ions despite clear qualitative differences in residue patterns. Specific measurements of features within the images or a convolutional neural network designed from a larger dataset may be more valuable in determining concentrations of species (Figure. 2.4). 
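The PCA on image files described above can be sketched as follows. This is an illustrative example in Python with NumPy rather than the analysis code used in this study; it assumes each image is flattened into one row of a data matrix, the function name is hypothetical, and the random array is only a stand-in for real residue images.

```python
import numpy as np


def pca_scores(images, n_components=3):
    """Return PCA scores and explained-variance ratios for a stack of images."""
    data = np.asarray(images, dtype=np.float64)
    data = data.reshape(data.shape[0], -1)   # one flattened image per row
    centered = data - data.mean(axis=0)      # subtract the mean image
    # SVD of the centered data; squared singular values are proportional
    # to the variance captured by each principal component.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    variance = s ** 2
    total = variance.sum()
    if total > 0:
        ratios = variance[:n_components] / total
    else:
        ratios = np.zeros(n_components)
    scores = u[:, :n_components] * s[:n_components]
    return scores, ratios


# Placeholder data: 20 random 30 x 30 "images" (not real measurements).
rng = np.random.default_rng(0)
placeholder_images = rng.random((20, 30, 30))
scores, ratios = pca_scores(placeholder_images)
```

Summing `ratios` gives the fraction of variability explained by the retained components, the quantity reported as around 50% for three components above. Because every pixel is treated as an independent variable, this form of PCA has no notion of subregions such as the outer ring.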
Table 2.3: Simple synthetic mixtures analyzed at 24 °C and 48% relative humidity. [Image table. Upper panel columns: NaCl 10 mM, NaCl 5.0 mM, NaCl 2.5 mM, NaHCO3 10 mM, NaHCO3 5.0 mM, NaHCO3 2.5 mM, quality check. Lower panel columns: Na2SO4 5.0 mM, Na2SO4 2.5 mM, Na2SO4 1.25 mM, NaHCO3 10 mM, NaHCO3 5.0 mM, NaHCO3 2.5 mM, quality check. Rows in both panels as labeled: 3 mM CaCl2 with 1.5 mM MgCl2; 1 mM CaCl2 with 0.5 mM MgCl2; 0.5 mM CaCl2 with 0.25 mM MgCl2.]
Similar residue patterns were observed for collected tap water samples with similar water chemistry.
Cluster analysis and trilinear classification diagrams were used to group samples with similar water chemistry, with cluster analysis taking all the collected water chemistry data into account and the trilinear diagram only using data for the species with the highest concentrations typical of fresh waters (calcium, magnesium, sodium, potassium, chloride, sulfate, carbonate, and bicarbonate). In general, the cluster analysis and the trilinear diagrams grouped samples with those from the same treatments together (Figure. 2.6, Figure. 6.5). Cluster analysis, however, did not group ion exchange samples together, more effectively separated minimally treated groundwaters, and lumped reverse osmosis samples with surface waters. The trilinear plot showed the ion exchange samples clearly distinct from the rest, plotted the reverse osmosis samples closer to the minimally treated groundwaters, and separated the lime softened waters clearly from the surface waters. These findings highlight that the water chemistry of the ion exchanged samples is related in terms of the higher concentration components, but the overall water chemistry more closely matches samples from other groups. Inspection of the coffee-ring residue photographs according to the groupings visualized by cluster analysis and trilinear diagrams uncovers patterns in the crystals that may associate with a given water chemistry (Figure. 2.6).
For example, each ion exchange sample that clustered together on the trilinear diagram had a thin film of particles with larger crystals scattered across the drop, but each image also displayed attributes of the group assigned through cluster analysis when the lower concentration species were accounted for. Trends in the dataset can also be determined from comparing residue patterns from synthetic mixtures, samples with similar composition of the six main water components, and samples with similar overall water chemistry.
Figure 2.4: Particle areas and particle counts for simplified synthetic mixtures of three salts.
Figure 2.5: Principal component analysis (PCA) on particle measurement data (left) and PCA conducted on image files (right).
Figure 2.6: Cluster analysis of water chemistry data.
Figure 2.7: Testing dataset accuracies of ten CNN models (left) and the confusion matrix of the first trained model (right).
The residue patterns for tap waters treated by similar methods displayed characteristic features representative of that treatment, such as several concentric rings with a strong secondary ring near the outer edge for surface water, colorful concentric rings with smaller particles scattered throughout for hard groundwaters with minimal treatment, a thin film of fine particles for reverse osmosis treated groundwater, a strong outer ring of white with small particles densely spread across the drop for untreated groundwater, large crystals scattered across the drop for ion exchange, and a white/gray thin film of small particles or dense concentric rings of small particles with feathering patterns for lime softened water (Figure. 2.3). Tap water samples contain high concentrations of dissolved ions when droplets are placed on the substrate, so particles form and grow as water evaporates from the drop, as observed previously for solutions of NaCl or CaSO4 Shahidzadeh et al. [2015].
Therefore, particles of the least soluble salts that grow quickly upon their concentrations exceeding solubility limits are expected to form particles early enough during drying to be transported by the coffee-ring effect to the drop edge, unless they grow large enough to settle first. Particles that do not form until the drop is nearly dry are expected to be deposited through the surface capture effect or settling and be found across the center of the drop. Calcium and magnesium carbonates and sulfates are less soluble than sodium- and chloride-containing salts Benjamin [2014], Haynes et al. [2016]; therefore, it is logical that hard waters would display an outer ring at the drop edge and waters softened by ion exchange (containing more sodium than calcium or magnesium) would display thin films of particles. Additional mixtures must be analyzed to verify the qualitative patterns described here.
Convolutional neural network (CNN) model assigned images to groups with similar water chemistry.
CNN models have previously been proven effective in object detection and image classification Krizhevsky et al. [2017], Russakovsky et al. [2015], Szegedy et al. [2015]. Herein a CNN model was developed and tested to assign residue images into classes with similar water chemistry data as determined by cluster analysis. Overall, after building the model from a library of similar training images, the CNN model was effective, with 80% accuracy in assigning residue images from the test set into groups with similar water chemistry. To achieve higher accuracy, a larger dataset would be needed to train the model. Specifically, in the CNN model developed here, the average and standard deviation of the accuracy for the last 100 epochs for ten independent CNN models were 76.7 ± 3.0% (Figure. 2.7).
Only six of the test images were misclassified in the class one group of images that contained a total of 48 images (largely from surface waters with RO samples and a few others mixed in), but two of the test images were misclassified from class two that contained a total of four images, all from the high TDS Genoa Township untreated well water (Figure. 2.7). All of the misclassified images from class two were instead placed into class four, which contained minimally treated groundwaters and one ion exchanged sample. Two out of twenty-four images from class four and two out of twenty-four images in class five (minimally treated and untreated groundwaters) were misclassified into class one. A few additional images were also misclassified between class four and five; in a qualitative comparison of residue images, images of class four and class five are more similar than images in other classes, which is logical considering that both of these classes largely contain minimally treated and untreated groundwaters. Confusion matrices of the ten models are provided in Figure. 2.8. A few of the test images were misclassified more often than others (Table. 6.10). Five of the test images with a misclassification percentage over 70% had a coffee-ring residue pattern that was notably different from replicates of the same sample. For example, two MSU residence hall samples had a clearing in the center of the residue pattern while the rest had a complete thin film across the entire drop; the two samples with clearings were misclassified in over 70% of the models (Table. 6.10, Table. 6.3). Two of the images with a misclassification percentage over 70% were from class two, which had the lowest number of replicates. The low number of images causes the model to be less sensitive to this class despite the distinct large crystal pattern Junqué de Fortuny et al. [2013], Martens et al. [2016]. Three images were often misclassified without a clear reason (Table. 6.10).
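The per-class comparison above relies on tabulating true against predicted class labels. The following is a minimal sketch in Python with NumPy of how a confusion matrix and per-class accuracies can be computed; it is not the code used in this study, the function names are hypothetical, and the toy labels are invented solely for illustration.

```python
import numpy as np


def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Count images by (true class, predicted class); rows are true classes."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[true, predicted] += 1
    return matrix


def per_class_accuracy(matrix):
    """Fraction of each true class that was assigned to the correct class."""
    totals = matrix.sum(axis=1)
    correct = np.diag(matrix)
    # Classes with no test images get NaN rather than a division error.
    return np.divide(correct, totals,
                     out=np.full(len(totals), np.nan), where=totals > 0)


# Toy labels for three classes (hypothetical, zero-indexed).
true_labels = [0, 0, 0, 1, 1, 2]
predicted_labels = [0, 0, 1, 1, 1, 0]
matrix = confusion_matrix(true_labels, predicted_labels, n_classes=3)
class_accuracy = per_class_accuracy(matrix)
overall_accuracy = np.trace(matrix) / matrix.sum()
```

Reporting accuracy per class alongside the overall value makes the effect of class imbalance visible: a class with few test images can have low accuracy while barely changing the overall figure.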
Figure 2.8: Confusion matrix of ten CNN models.
The percentage of images that were properly classified into class one was much higher than most of the other classes. Class one had the most images, so in the model training process the model is skewed to more accurately predict the class one images Japkowicz and Stephen [2002], Krawczyk [2016]. Generally with CNN models, the accuracy is improved by using a larger dataset of images during model training to allow the model to capture more information and detail Junqué de Fortuny et al. [2013], Martens et al. [2016]. Overall, classes one, three, four, and five had similar accuracy around 80%, but due to the low number of samples the accuracies of classes two and six were around 40-50% (Figure. 6.6). About half of the images in class one had a misclassification percentage below 1%, and most images in classes two and six had high misclassification percentages.
2.5 Conclusions and future outlook
Both the coffee-ring effect and convolutional neural networks (CNNs) remain underutilized techniques to be harnessed for tap water analysis. Herein we show proof of concept experiments that document the unique fingerprints provided by the coffee-ring effect for tap water solutions from various cities across Michigan and the reproducibility of the phenomenon, demonstrate that low concentration species as well as major ions influence the residue patterns, provide evidence that the patterns indeed document water chemistry within the sample, and demonstrate the ability of a CNN to assign images to water chemistry. The low-cost substrate employed in this work caused variability between experiments, especially for batches of substrates purchased at different times; however, the variability was included in the training dataset, so the CNN was still able to classify the images with 80% accuracy. Additional work is required to identify an appropriate substrate that is widely available for a low-cost test.
Quality control metrics are critical for identifying variation in experiments, and lime softened water was much more sensitive to experimental variation than the hard synthetic water used as a control for this study. Traditional PCA on image files is insufficient for differentiating between images of water samples with different concentrations of components, likely due to lack of consideration of subregions such as the outer coffee-ring; however, with a larger dataset a CNN model will be especially valuable for differentiating between water chemistries and assigning unknown images to groups from a library of images. A larger library of residue patterns and a corresponding CNN model must be trained to move this technology from qualitative tap water quality analysis to a quantitative technique and to further identify features of the residue patterns. Despite the use of a low-cost and variable aluminum slide, a pipette, an $18 jeweler’s loupe, and a cell phone camera, each type of tap water tested displayed unique characteristics, water samples with similar water chemistry produced residue patterns with similar features, waters from two locations in a city were more similar than samples from different cities, and the CNN model was able to assign samples to groups with similar water chemistry. This evidence suggests that this method should be further considered for low-cost water quality fingerprinting.
CHAPTER 3
Optimal environmental condition for contaminants separation by coffee-ring effect
3.1 Abstract
This study investigates the potential of the coffee-ring effect as a tool for tap water analysis, demonstrating its ability to produce unique fingerprints for water samples with varying compositions and environmental conditions. However, the coffee-ring effect’s stability is found to be influenced by environmental conditions, presenting a challenge for its practical application.
Additionally, identifying the optimal environmental conditions for separating contaminant particles is essential to enhance the technique’s efficacy. Establishing the correlation between water sample coffee-ring effect patterns and element deposition compositions is also crucial for utilizing the technique to identify particle compositions. The study confirms the reproducibility of the coffee-ring effect and highlights the impact of both environmental conditions and water compositions on the residue patterns produced. Various statistical methods, such as ANOVA, MANOVA, and PERMANOVA, can differentiate coffee-ring effect residue patterns with respect to environmental conditions and water sample compositions. However, determining the most effective method for differentiating these patterns requires further research, as the results from different analyses can be inconsistent. The study’s statistical analyses indicate that environmental conditions and water chemistry significantly influence residue patterns and element distributions. Optimal environmental conditions, including 23-26°C with 45-50% relative humidity, 20-23°C with 45-50% relative humidity, and 26-29°C with 40-45% relative humidity, are identified for differentiating water samples with varying component concentrations. Among these, the best overall condition is a temperature range of 23-26°C and a relative humidity of 45-50%, as it yielded the highest number of optimal results in 12 separate analyses. These findings have implications for further research on residue patterns and improving the understanding of the underlying mechanisms of the coffee-ring effect.
3.2 Introduction
Centralized drinking water supply and distribution systems in the U.S. were developed in 1854 to reduce the reliance of fast growing cities on contaminated wells and decrease incidence of cholera and typhoid diseases Burian et al. [2000].
Today, water distribution systems are reaching their end of life and failing faster than they can be replaced, requiring funding at a rate that strains many communities Coghill et al. [2014], Folkman [2018]. According to a 2018 report of 197,866 miles of pipes across the United States, over 16% of installed water mains are beyond their useful life, 28% of pipes of all material types are older than 50 years, and 71% of all the pipes are older than 20 years Folkman [2018]. Since 2012, the overall break rates increased 27%, primarily due to failures in asbestos cement (AC) and cast iron (CI) pipes Folkman [2018]. The most common method for prioritizing pipe replacement is based on failure data. Large, critical mains have essentially been ignored in many communities until they failed Darlene Garcia and Susan Funchion [2015]. This method ignores water quality issues related to aging pipes. Prior knowledge of pipe material or age can also be used to prioritize pipe replacement, but knowledge of where lead service lines or older pipes exist is not always available Cornwell et al. [2016]. Researchers have also developed models to prioritize pipe replacement based on pipe failure data, including multiobjective genetic algorithms, failure assessment models, rank aggregation models, etc. Giustolisi and Berardi [2009], Rogers and Grigg [2008], Tlili and Nafi [2012], Choi et al. [2017], Marzouk et al. [2015], Ho et al. [2009]. Water quality data can also be used to directly identify sections of the distribution system that negatively impact water chemistry Kirmeyer [2002]; however, collecting sufficient water quality data across a distribution system to determine which pipes are hazardous to public health is often challenging due to the time and costs required to collect and analyze enough water samples.
Herein we propose to develop a fast, low-cost method that can drastically increase the number of water samples that can be collected and analyzed to aid in identification of waters across a distribution system that have been impacted by corrosion. This method will harness tap water fingerprints created by the coffee-ring effect.
Tap water fingerprints provided through the coffee-ring effect are unique to water chemistry
When the coffee-ring effect is harnessed, tap waters leave unique residue patterns, or fingerprints, that correlate to tap water chemistry (Table. 3.1) Li et al. [2020], Shahidzadeh-Bonn et al. [2008], Kaya et al. [2010], Shin et al. [2014], Shahidzadeh et al. [2015]. Residue pattern formation is a crystallization process of water contaminants, and the crystallization of salts or other materials in supersaturated solutions has been intensively investigated due to its practical significance in pharmaceutical purification, salt manufacturing, seawater purification, cosmetic production, deicing, and so on Li et al. [2020], Qazi et al. [2017], Wei et al. [2012], Sammalkorpi et al. [2009], Desarnaud et al. [2014], Meldrum and O’Shaughnessy [2020]. Previous researchers mainly studied the mechanisms of crystallization in electrolyte solutions without evaporation. However, studies focused on precipitation and crystallization from evaporating sessile droplets are far fewer, especially when compared with the active domain of colloidal sessile droplets Zhong et al. [2015], Feng et al. [2017], Zhong and Duan [2016], Anyfantakis et al. [2015], Zhang et al. [2016], Xu et al. [2016], Bahmani et al. [2017], Lee et al. [2017], Saxena et al. [2017], Li et al. [2016d], Chen et al. [2012], Malvadkar et al. [2010]. According to previous studies, the more complex profile of a sessile droplet, characterized by the three-phase contact line and the curved liquid-vapor interface, complicates the precipitation process compared with a simple solution configuration.
The higher evaporation flux in the vicinity of the contact line can induce outward flows that result in a heterogeneous distribution of ions and of the associated degree of supersaturation. At the same time, microflows formed inside the sessile droplet carry particles to the droplet-substrate contact line. The curved liquid-vapor interface could limit the growth and alter the motion of precipitates. The complexity caused by the multiple factors of evaporation, bulk flow, temperature, humidity, and wettability is therefore expected to significantly vary crystallization in sessile droplets. So far, crystallization of salts from drying saline droplets has been investigated in a number of studies mainly focused on nucleation mechanisms and the dependence of the precipitation profile on solid surface properties, salt concentration, and so forth Takhistov and Chang [2002], Townsend et al. [2017], Kaya et al. [2010], Shahidzadeh et al. [2015], Shin et al. [2014], Suresh [2006], Shahidzadeh-Bonn et al. [2008]. Kaya et al. studied the effects of the polyelectrolyte concentration of drops and the surrounding humidity on the final salt crystallization, which exhibited profiles of concentric rings and needle-like and chainlike structures Kaya et al. [2010]. Takhistov and Chang investigated the crystal formation process from microliter droplets on both hydrophilic and hydrophobic substrates. Based on their results, concentric rings of salts were formed on hydrophilic surfaces while crystalline deposits were produced on hydrophobic surfaces Takhistov and Chang [2002]. Shahidzadeh et al. also investigated the evaporation and stain structures on various substrates with two types of salts, sodium chloride (NaCl) and calcium sulfate (CaSO4), with different crystalline structures and precipitation pathways.
They concluded that the variety of crystalline patterns was controlled by the interfacial properties of the emerging crystals and the number of crystals generated Shahidzadeh et al. [2015]. Shin et al. studied crystallization from saline droplets; they obtained three-dimensional salt structures from droplets with high aspect ratio and observed a rich variety of three-dimensional crystalline deposits Shin et al. [2014]. The coffee-ring effect process involves solvent evaporation at the droplet surface and the resulting ring-like residue patterns. The formation of the coffee-ring effect pattern is complex. The contact line pinning on the substrate and the contact angle determine the pattern formation Wong et al. [2011], Larson [2014], Deegan et al. [1997], Chen and Evans [2010], Eral et al. [2013]. Wong et al. described the physics of particle separation during coffee-ring formation, which is based on a particle-size selection mechanism near the contact line of an evaporating droplet. On the basis of this mechanism, they demonstrated nanochromatography of three relevant biological entities (proteins, micro-organisms, and mammalian cells) in a liquid droplet, with a separation resolution on the order of 100 nm and a dynamic range from 10 nm to a few tens of micrometers Wong et al. [2011].
Coffee-ring effect applications
Understanding and controlling the process of solute deposition in the presence of the coffee-ring effect is important in manufacturing processes involving evaporation on surfaces, including printing Park and Moon [2006], Friederich et al. [2013], Kuang et al. [2014], Sun et al. [2015], Huang and Zhu [2019] and fabrication of ordered structures Han and Lin [2012], functional nanomaterials Shao et al. [2014], Zou and Kim [2014] and colloidal crystals Park et al. [2006], Cui et al. [2009].
The coffee-ring effect also improves the performance of commercial applications including fluorescent microarrays Blossey and Bosio [2002], Dugas et al. [2005], matrix assisted laser desorption ionization (MALDI) spectrometry Hu et al. [2013], Mampallil et al. [2012], Kudina et al. [2016], Lai et al. [2016], and surface enhanced Raman spectroscopy (SERS) Zhou et al. [2014a], Wang et al. [2014], Garcia-Cordero and Fan [2017]. The coffee-ring effect also has implications in plasmonics Li et al. [2016a], solute separation Wong et al. [2011], diagnostics Brutin et al. [2011], Wen et al. [2013], Gulka et al. [2014] and electronics applications de Gans and Schubert [2004].
Suppression of coffee-ring effect
The coffee-ring effect can be suppressed through one of three physical strategies: (i) preventing the pinning of the contact line; (ii) disturbing the capillary flow towards the contact line; and (iii) preventing the particles from being transported to the droplet edge by the capillary flows. The coffee-ring effect could be suppressed by preventing contact line pinning using hydrophobic surfaces. Increasing the hydrophobicity of surfaces is often accompanied by decreasing contact angle hysteresis (CAH) Eral et al. [2013]. Lower CAH in essence means reduced contact line pinning, which leads to suppression of the coffee-ring effect. Lower CAH could be achieved by patterning of controllable surface wettability, as reviewed previously by Tian et al. [2013]. These methods include chemical modification Ko et al. [2004], Tian et al. [2013] and physical modification. On hydrophobic and partially hydrophobic surfaces, pinning can still occur when the CAH or solute concentration is high. If CAH is high, solutes can accumulate at the contact line while the contact angle decreases to the receding angle, which typically takes a few seconds depending upon the rate of evaporation.
Such accumulation produces ring-like deposits only if the duration of pinning is above a critical value for a given substrate-solute system Moraila-Martinez et al. [2013]. However, if the pinning time is short, even with high initial solute concentration, the coffee-ring effect will just produce smaller inner rings Nguyen et al. [2013]. Nanoparticles are more prone to form ring-like patterns than larger particles, as they can flow into the microscopic regions of the droplet edge faster. In the presence of solute particles in the droplet, electrowetting (EW) can reduce contact line pinning on (partially) hydrophobic surfaces Mugele and Baret [2005], Li and Mugele [2008]. A droplet is deposited on a dielectric layer covering an electrode. When a voltage is applied between the droplet and the electrode, an electric force pulls the contact line outward, overcoming the pinning forces so that the contact line pinning is reduced. The coffee-ring effect can also be suppressed by vibration and acoustics, Marangoni flow, and other factors Mampallil and Eral [2018].
Enhancement of coffee-ring effect
Evaporation of a droplet can be utilized as a method to concentrate its solutes. Evaporation of the solvent can increase the analyte concentration, making reactions more probable Hernandez-Perez et al. [2016], De Angelis et al. [2011]. Through the coffee-ring effect, the solutes are deposited at the contact line, increasing their concentration there, and are separated by their size, charge, and solute-substrate interactions. This deposition of solutes and particles is exploited as a pre-concentration method (Figure. 1.1). Concentrating solutes at the rim of the droplet by the coffee-ring effect is called the self-ordered ring (SOR) method. It acts as a pre-concentration procedure before other analyses. To enhance the coffee-ring effect, a hydrophobic surface is usually used as the substrate.
Drying on hydrophobic surfaces forms smaller rings with higher solute density because the contact line is pinned only in the later stages of evaporation. Liu et al. demonstrated that the SOR method enhanced the fluorescence detection of orally administered berberine in human urine Liu et al. [2002]. Similarly, fluorescent detection of trace levels of tetracycline Huang et al. [2004a], quinidine sulfate in serum samples Yang and Huang [2006], and fluorescein Liu et al. [2006] was demonstrated based on the SOR method. The coffee-ring effect can facilitate identifying disease-associated pathogens by isolating disease markers from body fluids Wong et al. [2011], Chen and Evans [2010]. It has also been used to enhance the deposition of gold nanoparticles (AuNPs) on cellulose nanofibers (CNFs) for surface-enhanced Raman scattering (SERS) Chen et al. [2017], Wang et al. [2014], Hussain et al. [2019], Juneja and Bhattacharya [2019], Zhou et al. [2014b]. The coffee-ring effect has also been utilized for a low-resource malaria diagnostic platform Gulka et al. [2014] and has shown great potential for monitoring tap water quality with deep neural networks Li et al. [2020].

Table 3.1: Coffee-ring residue patterns of Michigan tap waters Li et al. [2020].
[Image table. Treatment groups: minimally treated groundwater; lime softened; surface water (Lake Michigan); ion exchange; untreated groundwater. Sample labels: MSU tap water, Durand tap, Battle Creek tap, Kalamazoo, East Lansing, Howell, Lansing, Holland MI, Grand Rapids, Holmes Hall, Wyoming, Williamston, Okemos, Zeeland.]

Tap water fingerprinting is fast, low-cost, and has potential to be automated, allowing greater numbers of samples to be analyzed across a distribution system

Compared with other methods, the coffee-ring effect method for measuring pipe corrosion indicators is low-cost and fast, does not require specialized technicians, and can detect multiple analytes at once.
The equipment required for the coffee-ring effect method includes a small aluminum substrate and one pipette, which costs about 10 dollars. To collect images, a cell phone camera and an $18, 30x jeweler's loupe can be used. Considering the wide availability of cell phone cameras in households, the total cost of new, reusable equipment for this method is less than forty dollars Li et al. [2020]. Common methods for measuring contaminant elements are ICP-MS (about $25,000-$40,000 refurbished), atomic absorption (about $13,000-$20,000), and spectroscopic methods such as the phenanthroline, neocuproine, and bathocuproine methods Walter [1961]. The coffee-ring effect method is not only low-cost but also fast (approximately 25 minutes in total, including 5 minutes to deposit the water and 20 minutes to dry), does not use hazardous reagents, does not require specialized technicians, and has potential to be automated for evaluating large numbers of samples across a distribution system.

Optimization of tap water fingerprinting for tap water contaminants

As demonstrated in previous research, tap water fingerprinting (coffee-ring effect), a technique for identifying and characterizing water samples, distinguishes between different tap water compositions and differentiates mixtures of salts based on their consistent and reproducible water fingerprints Li et al. [2020], Shahidzadeh-Bonn et al. [2008]. This approach shows potential for a range of applications in environmental monitoring and water quality management.

The tap water fingerprinting method produces consistent and reproducible residue patterns under constant environmental conditions (Table 3.2), but data are not yet available to show how much the residue patterns of dried water droplets change for small changes in environmental conditions.
Table 3.2: Nine environmental conditions

RH \ Temperature   20-23 °C   23-26 °C   26-29 °C
35%-40%            A          D          G
40%-45%            B          E          H
45%-50%            C          F          I

Under low evaporation rate conditions, particles have time to arrange by Brownian motion Mampallil and Eral [2018], Rodriguez-Navarro and Doehne [1999], Marin et al. [2011]. In contrast, when the evaporation rate is high, fast-moving particles deposit into a disordered phase. Consequently, under high relative humidity and low temperature, coffee-ring fingerprints are more consistent Mampallil and Eral [2018], Rodriguez-Navarro and Doehne [1999], Marin et al. [2011]. However, no research has quantified how evaporation rate (temperature and relative humidity) influences residue patterns for mixed salt solutions at concentrations relevant to tap water.

In this study, we further optimized the tap water fingerprinting methodology to improve its ability to identify contaminant particles in water samples. The optimization considered the factors that most influence the accuracy and reliability of the fingerprinting results: temperature and humidity conditions, and solute properties. Experiments were conducted to determine how much temperature and humidity control is required to minimize changes in particle positions, sizes, shapes, elemental composition, and crystal structures while maximizing the separation of contaminant particles within the coffee-ring pattern. This work answers the question of which temperature and relative humidity ranges (within 20-29 °C and 35-50% relative humidity) provide reproducible fingerprints and sufficient separation of contaminant particles from other salts to facilitate detection within a photograph.

First, we examined the effects of temperature and humidity on the fingerprinting process.
By conducting a series of controlled experiments, we determined the optimal temperature and humidity conditions that yield the most accurate and consistent water fingerprints. These findings are crucial in ensuring that the fingerprinting method can be effectively applied under varying environmental conditions and across diverse geographical regions.

Next, we investigated the role of solute properties in the fingerprinting process. Given that the presence of various solutes can alter the characteristics of water fingerprints, understanding their effects is essential for accurately identifying contaminants in water samples. Through rigorous testing, we determined the key solute properties that influence the fingerprinting results. Furthermore, we identified the optimal conditions to concentrate similar contaminants and effectively separate different contaminants, thereby enhancing the precision and reliability of the tap water fingerprinting method.

In conclusion, our optimization of the tap water fingerprinting method has resulted in significant improvements in its ability to identify contaminant particles in water samples. By carefully considering and addressing the effects of temperature and humidity conditions and solute properties, we have established a more reliable and accurate technique for analyzing water quality and detecting potential contaminants. This optimized fingerprinting method holds great promise for enhancing water safety and protecting public health on a global scale.

3.3 Experimental Methods

3.3.1 Materials and instruments

The following substances were purchased from Fisher Scientific: sodium bicarbonate, calcium chloride, magnesium chloride, sodium sulfate, sodium phosphate monobasic, potassium fluoride, sodium hydroxide, iron nitrate nonahydrate, and copper sulfate.
The surface-polished aluminum slides were obtained from McMaster-Carr (1655T1), with a yield strength of 35,000 psi, a Brinell hardness of 95 (soft), cold-rolled fabrication, T651 temper, and a thickness of 3/8". The slides met the ASTM B209 specification and were polished to a #8 reflective finish without any visible grain lines. One side of these sheets and bars is polished to either a brushed finish or a mirror-like finish and protected with a peel-off film. 6061 aluminum, the most commonly used type, is used to make a wide range of products, from pipe fittings and containers to automotive and aerospace parts.

The Scanning Electron Microscopy (SEM) and Energy-Dispersive X-ray Spectroscopy (EDS) images were acquired using a JEOL 6610LV SEM system operated at an accelerating voltage of 20 kV. This microscope is designed for the characterization and imaging of delicate structures, with magnifications ranging from 5X to 50,000X and an accelerating voltage adjustable from 300 V to 30 kV. Energy-dispersive X-ray data were collected with the Oxford EDS system attached to the SEM. SEM/EDS provides chemical analysis of the field of view or spot analyses of minute particles, and covers tasks from collecting a single spectrum to complex phase analysis. EDS analysis is best suited for metals and metal alloys, ceramics, minerals, and certain types of polymeric materials. The operating software was the Scandium image processing software by Olympus Soft Imaging Solutions. Coffee-ring effect patterns were also photographed with a Samsung S6 cell phone or a Celestron 5 MP Digital Microscope Pro camera (20x-200x magnification).
Data analysis and statistical analysis were performed in MATLAB R2021a, R 4.1.1, and Python 3.7.

3.3.2 Four-axis-autosampler

The four-axis autosampler automates the collection and injection of samples. It is composed of several components. The 3D printer stage, a CNC 3018-PRO Router Kit, provides the foundation on which the other components are mounted, as well as the movement and precision the device needs to operate accurately. The injector, a Thermo Scientific 365CL221, injects the samples with minimal error or variation. A Raspberry Pi 4 Model B (2019, quad core, 64 bit, WiFi, Bluetooth, 4 GB) serves as the controller for the stepper motors, the injector, and the sample collection system, and runs the Python code that controls the device's operations. Three Nema 17 bipolar 2 A stepper motors by OSM Technology Co (17HS19-2004S1) move the injector so that samples are injected at the correct location. One stepper motor driver (TB6600, 4 A, 9-42 V, Nema 17) operates the sample collection and injection action; the stepper motor it drives moves the sample collection system, which collects the samples, and the injector, which injects them. The device is operated by Python code running under Linux, specifically the Ubuntu operating system. The code controls the various components of the device, including the stepper motors, the injector, and the sample collection system.
This code ensures that the device operates accurately and efficiently and collects and injects samples with minimal error or variation.

The four-axis autosampler automatically prepares water samples based on a predefined set of water samples. Its sample holder holds up to 32 water samples at a time, making it suitable for large-scale sample preparation. The device operates in the following steps.

In the first step, the autosampler resets its syringe to the initial position so that the syringe is correctly positioned and oriented before it collects and injects samples. The syringe is then washed with nanopure water to remove any contaminants.

In the second step, the syringe collects a 2 µL water sample at a predefined sample location. The stage then moves the syringe to the desired location above the substrate and lowers it until the syringe tip is 0.5 mm above the substrate, so that the sample is delivered to the correct location.

In the third step, the fourth motor slowly pushes the syringe piston to dispense the 2 µL water sample in a controlled and precise manner, and the sample drops onto the substrate surface.

In the last step, after the water sample is dropped, the syringe is rinsed with nanopure water again and reset to its original location, so that it is clean before it collects the next water sample.
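The four steps above can be summarized as a short control loop. The following is only a minimal sketch and not the dissertation's control code: the class `AutosamplerSketch`, its method names, the well labels, and the spot coordinates are hypothetical, and each method records an action instead of driving a motor. Only the 2 µL volume and the 0.5 mm tip clearance are taken from the text.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

DROP_VOLUME_UL = 2.0     # droplet volume stated in the text
TIP_CLEARANCE_MM = 0.5   # syringe tip height above the substrate stated in the text


@dataclass
class AutosamplerSketch:
    """Hypothetical stand-in that logs actions instead of moving hardware."""
    log: List[Tuple[str, object]] = field(default_factory=list)

    def home(self):
        self.log.append(("home", None))

    def rinse(self):
        self.log.append(("rinse", "nanopure"))

    def aspirate(self, well, volume_ul):
        self.log.append(("aspirate", (well, volume_ul)))

    def move_above(self, spot, clearance_mm):
        self.log.append(("move_above", (spot, clearance_mm)))

    def dispense(self, volume_ul):
        self.log.append(("dispense", volume_ul))


def run_cycle(sampler, wells, spots):
    """Run one deposit cycle per (well, spot) pair and return the action log."""
    for well, spot in zip(wells, spots):
        sampler.home()                               # step 1: reset syringe position
        sampler.rinse()                              # step 1: wash with nanopure water
        sampler.aspirate(well, DROP_VOLUME_UL)       # step 2: collect the sample
        sampler.move_above(spot, TIP_CLEARANCE_MM)   # step 2: position over substrate
        sampler.dispense(DROP_VOLUME_UL)             # step 3: push the piston
        sampler.rinse()                              # step 4: rinse again
        sampler.home()                               # step 4: return to start
    return sampler.log


if __name__ == "__main__":
    # Hypothetical well names and substrate coordinates (mm).
    for action in run_cycle(AutosamplerSketch(), ["A1", "A2"], [(0, 0), (10, 0)]):
        print(action)
```

Keeping the sequence in one function makes the rinse-before and rinse-after steps explicit for every sample, which is the property the procedure relies on to avoid carry-over between samples.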
The whole process is then repeated for each water sample in the sample holder, which allows efficient and accurate preparation of a large number of samples in a short period of time. The process flow is illustrated in Figure 6.7.

Furthermore, because the system is built on open source software and hardware, it can be easily modified and expanded according to the user's needs. The control system is based on a Raspberry Pi, a versatile platform that can be easily programmed and customized, which makes the device adaptable to a wide range of applications.

The autosampler collects water coffee-ring samples at a rate of 45 seconds per sample, which is comparable to a human sample collector, who typically takes around 30 seconds per sample. However, the autosampler has several advantages over human sample collectors. One of the main advantages is its stability: unlike a human sample collector, the device does not tire and can work continuously without interruption, which is important for large-scale sample preparation tasks that require a high degree of accuracy and consistency. Another advantage is that the autosampler can be placed in a small chamber with controlled temperature and humidity, which allows precise control over the sample preparation environment and helps maintain the integrity and quality of the samples. Performing the same experiments manually under these conditions is tedious and time-consuming.

In addition to collecting water coffee-ring samples, the autosampler can be easily modified to work for other tasks.
For example, it can be used for solution preparation, blood tests, and so on, which makes it a versatile tool for a wide range of applications. Overall, the four-axis autosampler can significantly improve the speed and accuracy of sample preparation tasks. Its compact size, precise control over the sample preparation environment, and ability to work continuously make it well suited to large-scale sample preparation tasks.

3.3.3 Auto temperature humidity control chamber

An auto-temperature-humidity control chamber was constructed using a chamber, two Diymore XH-M452 temperature and humidity controllers, a Space SFH-181 TP heater from Ningbo Electrical Appliance Company, a Frigidaire FFRA051WAE 5000 BTU air conditioner, and an AO-101 AquaOasis humidifier. Sodium hydroxide was used as a dehumidifier. Typically, an environmental control chamber costs on the order of $5000; to reduce the overall cost of implementing the tap water fingerprinting method, we built a lower-cost setup on the order of $1000, demonstrate its use, and will publish its design. The chamber control system consists of two automatic temperature and relative humidity controllers: one is programmed to increase temperature and relative humidity, and the other is programmed to decrease them. The chamber contains a 12 V, 200 W heater, an ultrasonic humidifier, a 500 mL plastic bottle with dry NaOH and desiccant, and a 5,000 BTU, 115 V mini air conditioner.

Figure 3.1: Temperature humidity control chamber.

This auto temperature humidity control chamber adjusts and maintains the temperature and humidity automatically based on the temperature and humidity values pre-set in the two Diymore controllers. The sensitivity is 0.5 °C for temperature and 1% for relative humidity.
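The pairing of one controller that raises and one that lowers temperature and relative humidity amounts to simple band (on/off) control. The sketch below illustrates that logic only; it is not the firmware of the Diymore controllers, and the function names and the dictionary of actuator states are hypothetical. The example band is condition E of Table 3.2 (23-26 °C, 40%-45% RH).

```python
def band_control(value, low, high):
    """Return 'raise', 'lower', or 'hold' for one measured quantity."""
    if value < low:
        return "raise"
    if value > high:
        return "lower"
    return "hold"


def chamber_actions(temp_c, rh_pct, temp_band, rh_band):
    """Map chamber readings to on/off states of the four actuators (hypothetical names)."""
    temp_action = band_control(temp_c, *temp_band)
    rh_action = band_control(rh_pct, *rh_band)
    return {
        "heater": temp_action == "raise",
        "air_conditioner": temp_action == "lower",
        "humidifier": rh_action == "raise",
        "dehumidifier": rh_action == "lower",
    }


if __name__ == "__main__":
    # Too warm and too dry for condition E: cool down and humidify.
    print(chamber_actions(27.0, 38.0, (23.0, 26.0), (40.0, 45.0)))
```

Inside the band every actuator stays off, so the chamber is only driven when a reading leaves the target range.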
Based on testing, the system can adjust temperature at a rate of 3 degrees Celsius per minute and relative humidity at a rate of 2% per minute. After adjusting the temperature and humidity to the desired range, the chamber switched to its main operating mode. If the temperature rose above the upper limit, the air conditioner was switched on to bring the temperature back into the desired range. Conversely, if the temperature fell below the lower limit, the heater was switched on until the temperature returned to the desired range. The humidifier and dehumidifier worked in the same way: when the chamber humidity was below the pre-set lower limit, the humidifier was turned on until the humidity reached the desired range, and when the humidity was above the pre-set upper limit, the dehumidifier was turned on to lower the humidity to the desired range.

3.3.4 Water samples

To determine the effects of temperature and humidity on the residue patterns of various water compositions, synthetic tap water samples containing various concentrations of the main components of tap water were prepared based on the concentration ranges in the Detroit water quality reports for 2017, 2018, and 2019. Detroit water is supplied by the Great Lakes Water Authority to about 3.5 million people, roughly 40% of Michigan residents (Detroit Water and Sewerage Department 2015). Sources of Detroit tap water include the Detroit River and Lake Huron, and thus the composition of Detroit tap water varies over time. The water recipe is based on the average of the Detroit Water Quality Reports from 2017 to 2019, and three recipes were designed to mimic the variability of the water chemistry (Table 3.3).
The water recipes were spiked into a base water sample prepared with 0.7 ppm fluoride, 0.4 ppm nitrate, 0.062 ppm aluminum, 1.1 ppm potassium, 25 ppm sulfate, and 0.36 ppm phosphorus in nanopure water (Table 3.4).

Table 3.3: Detroit tap water components data sheet (Source: Detroit water quality reports 2017-2019)

Component    Average (ppm)   Average (mM)
Nitrate      0.790           0.013
Lead         0.000           0.000
Iron         0.277           0.005
Copper       0.015           0.000
Magnesium    10.800          0.444
Calcium      37.833          0.946
Sodium       9.817           0.427
Potassium    1.533           0.039
Manganese    0.004           0.000
Zinc         0.000           0.000
Sulfate      33.267          0.346
Phosphorus   1.040           0.034
Chloride     18.033          0.509
Fluoride     0.853           0.045

Table 3.4: Recipe for synthetic water samples (component concentrations in mM)

Sample ID   NaHCO3   CaCl2   MgCl2   Na2SO4   NaH2PO4   KF    Fe(NO3)3   CuSO4
Sample A    0.1      1.5     0.5     0.35     0.033     0.4   0.005      0.00024
Sample B    0.2      1       0.35    0.35     0.033     0.4   0.005      0.00024
Sample C    0.1      0.5     0.2     0.35     0.033     0.4   0.005      0.00024
Sample D    0        1       1       1.35     0.033     0.4   0.005      0.00024
Sample E    0        1       0.5     2.35     0.033     0.4   0.005      0.00024

3.3.5 Coffee-ring effect pattern statistical analysis methods

After preprocessing the images, particles were recognized in MATLAB and used to calculate particle shape, color, location from the drop edge, and size. These properties were extracted from each residue image for each water recipe, and analysis of variance (ANOVA) was conducted across the nine environmental condition groups and for constant evaporation rates (five replicate samples in each group). Residue patterns for two environmental conditions were considered different from one another when a statistical difference was observed for any of the particle measurements (shape, color, location from the drop edge, and size). Residue patterns were labeled as consistent across two environmental conditions when no statistical difference was observed for any of the particle measurements.
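The decision rule above (patterns differ if any particle measurement differs) can be sketched in a few lines. This is an illustration under stated assumptions, not the analysis code used in this work: the one-way ANOVA F statistic is computed directly, and the p-value is approximated by a permutation test so that the sketch needs only the Python standard library, whereas the text refers to ANOVA as run in MATLAB or R. The function names, the measurement names, the values in the example, and the 0.05 threshold are hypothetical, and no multiple-comparison correction is applied.

```python
import random


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups of replicate values."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def permutation_p_value(groups, n_perm=2000, seed=0):
    """Approximate the ANOVA p-value by shuffling values across groups."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def patterns_differ(cond_a, cond_b, alpha=0.05):
    """cond_a, cond_b: dicts mapping a measurement name to its replicate values."""
    return any(
        permutation_p_value([cond_a[name], cond_b[name]]) < alpha
        for name in cond_a
    )


if __name__ == "__main__":
    # Hypothetical replicate values for two environmental conditions.
    a = {"size": [10.0, 10.5, 9.8, 10.2, 10.1], "eccentricity": [0.50, 0.52, 0.48, 0.51, 0.49]}
    b = {"size": [20.0, 20.5, 19.8, 20.2, 20.1], "eccentricity": [0.50, 0.51, 0.49, 0.52, 0.48]}
    print(patterns_differ(a, b))
```

With five replicates per condition, as in this study, the smallest p-value a two-group permutation test can reach is about 2/252, which is still below 0.05.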
Analysis of variance (ANOVA) is a statistical technique for analyzing variation in a response variable (a continuous random variable) measured under conditions defined by discrete factors (classification variables, often with nominal levels). To determine whether residue patterns are consistent across two environmental conditions, statistical analysis was conducted on the particle measurements: shape, color, location from the drop edge, and size. If a statistical difference was observed for any of these measurements between the two conditions, the residue patterns were considered different; that is, at least one particle measurement varied significantly between the conditions. If no statistical difference was observed for any of the measurements, the residue patterns were labeled as consistent across the two conditions. This approach compares residue patterns between two environmental conditions only; further analysis may be required to compare residue patterns across multiple conditions or other factors.

One-Way ANOVA

The one-way analysis of variance (one-way ANOVA) is also known as single-factor ANOVA or simple ANOVA. As the name suggests, the one-way ANOVA is suitable for experiments with only one independent variable (factor) with two or more levels.

Full Factorial ANOVA (Two-Way ANOVA)

Full factorial ANOVA, also known as two-way ANOVA, is a statistical method used to determine the effect of two or more independent variables on a dependent variable.
It involves using every possible combination of levels of the independent variables in an experiment and analyzing the data to see whether the dependent variable differs significantly across the levels of the independent variables. Two-way ANOVA can also be used to determine whether there is an interaction between the independent variables, meaning that the effect of one variable on the dependent variable depends on the level of the other variable. This method is useful for experiments in which multiple factors could affect the outcome, and it gives a more comprehensive understanding of the relationship between the variables.

PERMANOVA

PERMANOVA is an acronym for “permutational multivariate analysis of variance”. It is best described as a geometric partitioning of multivariate variation in the space of a chosen dissimilarity measure according to a given ANOVA design, with p-values obtained using appropriate distribution-free permutation techniques. The method is semiparametric, motivated by the desire to perform a classical partitioning, as in ANOVA (hence allowing tests and estimation of sizes of main effects, interaction terms, hierarchical structures, random components in mixed models, etc.), while simultaneously retaining important robust statistical properties of rank-based nonparametric multivariate methods, such as the analysis of similarities (ANOSIM), namely, (1) the flexibility to base the analysis on a dissimilarity measure of choice (such as Bray-Curtis, Jaccard, etc.) and (2) distribution-free inferences achieved by permutations, with no assumption of multivariate normality.
Thus, PERMANOVA opens the door for formal partitioning of multivariate data in response to complex experimental designs in a wide variety of contexts: there may be more response variables than sampling units, and data may be severely non-normal, zero-inflated, ordinal, or qualitative (e.g., responses to questionnaires, DNA/RNA sequences, allele frequencies, amino acids, or protein data). Although originally motivated by ecological studies, where variables usually consist of counts of abundances (or percentage cover, frequencies, or biomass) for a large number of species, PERMANOVA is now used across many fields, including chemistry, social sciences, agriculture, medicine, genetics, psychology, economics, and more Anderson [2014]. The required assumptions are exchangeability, the linear model, and homogeneity of multivariate dispersions.

MANOVA

The multivariate analysis of variance (MANOVA) procedure provides regression analysis and analysis of variance for multiple dependent variables by one or more factor variables or covariates. The factor variables divide the population into groups. Using this general linear model procedure, null hypotheses can be tested about the effects of factor variables on the means of various groupings of a joint distribution of dependent variables. MANOVA can be used to investigate interactions between factors as well as the effects of individual factors. In addition, the effects of covariates and covariate interactions with factors can be included. For regression analysis, the independent (predictor) variables are specified as covariates. Both balanced and unbalanced models can be tested. A design is balanced if each cell in the model contains the same number of cases. In a multivariate model, the sums of squares due to the effects in the model and the error sums of squares are in matrix form rather than the scalar form found in univariate analysis. These matrices are called SSCP (sums-of-squares and cross-products) matrices.
If more than one dependent variable is specified, the multivariate analysis of variance using Pillai's trace, Wilks' lambda, Hotelling's trace, and Roy's largest root criterion with approximate F statistic are provided as well as the univariate analysis of variance for each dependent variable. In addition to testing hypotheses, multivariate analysis of variance (MANOVA) produces estimates of parameters O'Brien and Kaiser [1985].

ANOSIM

Classical one-way ANOSIM operates on an appropriate resemblance matrix calculated among samples, with a factor describing their a priori group structure (e.g. of different sites, times, treatments, etc.) underlying the null hypothesis to be tested, namely H0: 'no differences among groups of samples'. If the null hypothesis is true, then the average rank resemblance among samples within groups is expected to be the same as the average rank resemblance among samples from different groups. The ANOSIM statistic R is defined as the scaled difference between the average between-group ($\bar{r}_B$) and within-group ($\bar{r}_W$) ranks:

\[ R = \frac{\bar{r}_B - \bar{r}_W}{M/2} \tag{3.1} \]

where M = n(n − 1)/2 and n is the total number of samples being considered. Clearly, under the null hypothesis, R would be expected to take values (positive or negative) 'close' to zero, and increasing departure from H0 would result in increasingly larger positive values for R. The scaling in equation 3.1 ensures that R falls within the range -1 to 1, and takes the value R = 1 only under maximal separation of the groups, that is if all samples within groups (replicates) are less dissimilar to each other than any pair of samples from different groups. Values of R substantially less than 0 are not usually to be expected as this implies that samples within groups are generally less similar to each other than samples in different groups, a possibility only for a mislabeled or seriously inappropriate design.
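Equation 3.1 can be checked numerically with a short sketch. The code below is an illustration rather than the implementation used in this work: it ranks the pairwise dissimilarities (rank 1 for the most similar pair, ties given their average rank), averages the ranks of between-group and within-group pairs, and scales the difference by M/2. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving ties their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1   # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def anosim_r(dissim, labels):
    """ANOSIM R from a square dissimilarity matrix and one group label per sample."""
    n = len(labels)
    pairs = list(combinations(range(n), 2))
    ranks = average_ranks([dissim[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if labels[i] == labels[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if labels[i] != labels[j]]
    m = n * (n - 1) / 2
    r_b = sum(between) / len(between)
    r_w = sum(within) / len(within)
    return (r_b - r_w) / (m / 2)


if __name__ == "__main__":
    # Toy example: four samples on a line, two well-separated groups.
    positions = [0.0, 1.0, 10.0, 11.0]
    dissim = [[abs(a - b) for b in positions] for a in positions]
    print(anosim_r(dissim, ["a", "a", "b", "b"]))
```

In the toy example every within-group pair is more similar than every between-group pair, so the sketch returns the maximal value R = 1 described in the text.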
Note that the usual mathematical terminology for ranks assigns to the highest observation a rank value of 1 (the lowest number). If H0 is true, then all samples effectively belong to a single group. The spread of possible values of R under the null hypothesis can be determined by randomly permuting the sample labels and recalculating R for each random reallocation, or for a random subset if there is a large number of possible permutations Hope [1968]. The significance level of the observed value of R is then determined by comparing it to the range of values obtained under permutation, with rejection of the null hypothesis when the observed R is sufficiently large (positive) to have rarely or never occurred under permutation.

Jensen-Shannon divergence

The Jensen-Shannon divergence is a measure of similarity between two probability distributions. It is a symmetric and finite variant of the Kullback-Leibler divergence and is also known as the information radius Nielsen [2021], Manning and Schutze [1999] or the total divergence to the average Dagan et al. [1997]. The square root of the Jensen-Shannon divergence is known as the Jensen-Shannon distance Endres and Schindelin [2003], Osterreicher and Vajda [2003], Fuglede and Topsoe [2004], which is a metric for comparing two probability distributions. It is commonly used in information theory, machine learning, and natural language processing, among other fields.

Multidimensional scaling (MDS)

Multidimensional scaling is a visual representation of distances or dissimilarities between sets of objects. “Objects” can be colors, faces, map coordinates, political persuasion, or any kind of real or conceptual stimuli Kruskal and Wish [1978]. Objects that are more similar (or have shorter distances) are closer together on the graph than objects that are less similar (or have longer distances).
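MDS takes a matrix of pairwise dissimilarities as its input. As a minimal sketch of how such a matrix could be built from the Jensen-Shannon distance introduced above, the code below assumes that each residue pattern has been summarized as a normalized histogram; that summary, the example histograms, and the function names are illustrative assumptions and are not taken from the dissertation. Logarithms are taken to base 2, so the divergence and the distance both lie between 0 and 1.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute zero."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL divergence to the mixture m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the divergence (a metric)."""
    return math.sqrt(max(js_divergence(p, q), 0.0))  # guard against tiny negative rounding


def distance_matrix(histograms):
    """Pairwise Jensen-Shannon distances, usable as input to an MDS routine."""
    return [[js_distance(a, b) for b in histograms] for a in histograms]


if __name__ == "__main__":
    # Hypothetical normalized histograms for three residue patterns.
    patterns = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]
    for row in distance_matrix(patterns):
        print([round(d, 3) for d in row])
```

The mixture m is positive wherever p or q is positive, so the divergence stays finite even when one histogram has empty bins, which is the property that distinguishes it from the plain Kullback-Leibler divergence.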
As well as interpreting dissimilarities as distances on a graph, MDS can also serve as a dimension reduction technique for high-dimensional data Buja et al. [2008].

Noise removal with singular value decomposition (SVD)

Singular value decomposition (SVD) is a mathematical technique by which a matrix is decomposed into a product of three matrices, which can also be written as a sum of rank-one matrices. SVD can be regarded as a generalization of eigendecomposition, with which it coincides for a positive semidefinite normal matrix, to arbitrary rectangular matrices. This relationship connects SVD to principal component analysis (PCA), a technique commonly used for data analysis and representation. One application of SVD is in image processing. A digital image can be represented by a matrix in which the value of each element encodes information about a specific pixel. By decomposing this matrix with SVD, the image can be simplified and useful information can be extracted. Another application of SVD is in signal processing, where it is used to remove noise from biomedical signals and to construct signal and noise subspaces for analysis and approximation.

3.3.6 Experiment procedure

This research comprises three stages. In the first stage, data collection, SEM, EDS, and camera photographs related to the coffee-ring effect were gathered and the images were preprocessed. The second stage focused on method optimization, during which the extent of temperature and humidity control required to maintain consistent residue patterns was examined through the coffee-ring effect. The final stage involved identifying the optimal environmental conditions for separating contaminant particles from one another (such as calcium, sodium, magnesium, etc.) using the statistical analysis introduced earlier.

Stage 1: Collection of coffee-ring effect residue pattern

Stage 1 was divided into two subtasks.
Task 1a) involved collecting the coffee-ring effect SEM, EDS, and camera photographs. Task 1b) focused on preprocessing the images gathered in Task 1a by implementing noise removal, color normalization, and other techniques.

Data collection: To investigate the effect of environmental conditions on coffee-ring effect patterns, nine temperature and relative humidity (RH) combinations were maintained by the automatic temperature and humidity control chamber, and a four-axis autosampler was placed inside the chamber (Table 3.1). During the droplet deposition process, each water sample was stored in a 2 µL micro-centrifuge tube and placed in the sample holder. In each experiment, sixteen samples were positioned at once in one sample holder. The autosampler was programmed to collect 2 µL water samples and inject them onto the aluminum substrate (6061 with mirror-like finish, McMaster-Carr 1655T1) as described in previous research Li et al. [2020]. After each water sample injection, the injector tip was rinsed in nanopure water through a programmed procedure. To prevent influence from the drying of neighboring droplets, droplets were placed 1 cm apart, and ten droplets were dried at once on one aluminum substrate (1 inch wide and 3 inches long). To avoid vibrations from the autosampler motors, the aluminum slides were positioned on an independent sample stage detached from the autosampler. The temperature and humidity control chamber not only maintained the desired temperature and humidity but also prevented air flow in the environment. Two-microliter droplets of each of the five water samples were deposited on a mirrored aluminum slide and allowed to dry, separating the particles that form through the coffee-ring effect Li et al. [2020]. Five water droplet replicates were collected under each environmental condition.
Low-cost camera photographs were taken of all replicates at 100X magnification with the Celestron camera, and a color bar was included in every image to normalize brightness, contrast, and color. The total number of collected photographs is 225 (9 environmental conditions, 5 water recipes with 5 replicates). Residues were saved for further analysis.

Image preprocessing: In image preprocessing, images were color-normalized based on RGB distribution. Images were loaded using the imread function and converted to binary with the im2bw function (threshold set to 0.2). Noise was removed from each binary image using the medfilt2 function with an [8, 8] neighborhood. Particle edges were captured using the edge function with the canny method applied to the smoothed binary image. Particle properties were extracted from the smoothed binary image using the regionprops function with the ’Area’, ’Perimeter’, ’Eccentricity’, ’Orientation’, and ’Centroid’ properties. In each SEM-EDS map, a 2-D coordinate system was established in Matlab with its origin at the center of the droplet pattern. The deposition position of each particle for each element was recorded as an x-y value in this coordinate system. Because particles were deposited in a circle around the residue center, particle locations were expressed as the distance between the particle’s location and the coordinate center. The adjusted centroid was calculated as the square root of the sum of the squared differences between the centroid x-coordinate and the image center x-coordinate, and between the centroid y-coordinate and the image center y-coordinate.

Stage 2: Optimization of tap water fingerprints

Stage 2 was divided into three subtasks.
Task 1a) Determine the ranges of temperature and relative humidity that produce consistent coffee-ring fingerprints.
Task 1b) Find the optimal ranges of temperature and relative humidity to separate contaminant particles from each other.
Task 1c) Investigate the element deposition separation effect under each environmental condition.

\[
\text{Adjusted coordinate} = \sqrt{(X_{\text{centroid}} - X_{\text{center}})^{2} + (Y_{\text{centroid}} - Y_{\text{center}})^{2}} \qquad (3.2)
\]

Task 1a: Determine the ranges of temperature and relative humidity over which coffee-ring fingerprints are constant.

In order to implement this method broadly for analyzing samples across a distribution system, it is essential to accommodate analysis in various laboratories and field settings. This task assessed the extent of temperature and humidity control needed to produce consistent tap water fingerprints. The proposed nine temperature and humidity conditions (Table 3.2) were evaluated using PERMANOVA on coffee-ring effect residue pattern features.

Task 1b: Find the optimal ranges of temperature and relative humidity under which different water samples exhibit different coffee-ring effect residue patterns.

This task aimed to identify the temperature and relative humidity conditions under which different water samples exhibit different coffee-ring effect residue patterns. The previous task determined the conditions under which residue patterns are consistent. However, consistent residue patterns alone are not enough to distinguish different water samples. This task utilized PERMANOVA, MANOVA, and ANOVA techniques to investigate the coffee-ring effect residue pattern feature statistics under different environmental conditions. Jensen-Shannon divergence was used to measure the similarity between different water samples, and classical multidimensional scaling (CMDS) was used to visualize the differences in the coffee-ring effect residue pattern features between different water samples.

Task 1c: Investigate the optimal ranges of temperature and relative humidity to separate contaminant particles from each other.
This task investigated whether specific elements are associated with residue particles. EDS mapping images were used to identify particle compositions in coffee-ring effect residue patterns. The location of each element was determined as the square root of the sum of the squared x and y offsets from the center of each image. Analysis of variance (ANOVA) was conducted on the element locations to examine whether there were any significant differences in the spatial distribution of elements within the residue patterns.

Stage 3: Identify the correlation between water sample coffee-ring effect patterns and element deposition compositions

The EDS images were preprocessed using Singular Value Decomposition (SVD), and noise was filtered using the medfilt2 function with filter size [3, 3]. After preprocessing, the element compositions were extracted from the EDS mappings. To determine the composition ratio of each element in the corresponding particle, the particles extracted from the water sample coffee-ring effect patterns were compared with the pixel signals extracted from the EDS data. The composition ratio of each element in each particle was then calculated. To investigate whether there is a significant difference in the element composition ratios between particles, the correlations of these ratios were calculated, and ANOVA was conducted on these ratios.

3.4 Results and Discussion

3.4.1 Under what environmental conditions are coffee-ring effect fingerprints consistent

The PERMANOVA results for coffee-ring effect residue pattern features (particle shape, color, location from the drop edge, and size) are shown in Table 3.5. Under all nine temperature and relative humidity combinations, the p-values are smaller than 0.001. Because every test has the same 4 degrees of freedom, these p-values indicate that the coffee-ring effect residue patterns are consistent between sample replicates and different between different samples.
However, the R2 values of the nine conditions range from 0.618 to 0.958. The PERMANOVA visualization results for coffee-ring effect residue pattern features are shown in Table 6.12. According to the visualization results (Manhattan distance applied), under condition A (20-23 °C, 35%-40%), most samples are separated except samples A and B. However, sample A and sample B have similar recipes according to Table 3.4. Likewise, sample D and sample E have similar water components, and their positions in the PERMANOVA visualization are near each other (Figure 3.2). Based on the visualization results, the sample residue pattern features are most differentiable under condition C (20-23 °C, 45%-50%) (Figure 3.3) and condition H (26-29 °C, 40%-45%) (Figure 3.4). Across the nine conditions, sample C (0.1 mM NaHCO3, 0.5 mM CaCl2, 0.2 mM MgCl2, 0.35 mM Na2SO4, 0.033 mM NaH2PO4, 0.4 mM KF, 0.005 mM Fe(NO3)3, 0.00024 mM CuSO4) is the most stable: all of its replicates cluster in a small range and do not overlap with other samples. Sample E (0 mM NaHCO3, 1 mM CaCl2, 0.5 mM MgCl2, 2.35 mM Na2SO4, 0.033 mM NaH2PO4, 0.4 mM KF, 0.05 mM Fe(NO3)3, 0.00024 mM CuSO4) is the most unstable, with the widest spread among the five water samples. A possible explanation is that at higher humidity there is more vapor-liquid exchange of water molecules. Particles therefore have more time to crystallize, and the water density gradient at the droplet-air interface is smaller than under low-humidity conditions. This smaller gradient causes particles to form more slowly and gradually, so crystals form in different phases of the droplet drying process and produce unique residue patterns. Under high-temperature conditions, the residue pattern features of different samples are spread out while replicates of the same sample cluster more closely.
The reason is that, at higher temperatures, crystals form at a relative speed at the moment of crystallization, so the pattern features are more consistent between replicates, for example under condition H (26-29 °C, 40%-45%) and condition I (26-29 °C, 45%-50%). According to the overall analysis, the conditions suitable for producing consistent residue patterns are those with high temperature and relative humidity, such as conditions C, F, H, and I. The results for all conditions are shown in Table 6.11.

Table 3.5: PERMANOVA analysis for particle features

Condition  Df  Sum of Sqs    Mean Sqs      F.Model  R2       Pr(>F)  sig.
A          4   1.09 × 10^11  2.71 × 10^10  113.28   0.95773  0.001   ***
B          4   2.87 × 10^8   7.18 × 10^9   64.175   0.92772  0.001   ***
C          4   1.91 × 10^11  4.78 × 10^10  15.567   0.75689  0.001   ***
D          4   1.67 × 10^11  4.18 × 10^10  71.904   0.93498  0.001   ***
E          4   1.49 × 10^11  3.73 × 10^10  12.651   0.71673  0.001   ***
F          4   4.72 × 10^11  1.18 × 10^11  15.542   0.7566   0.001   ***
G          4   5.98 × 10^11  1.49 × 10^11  8.1009   0.61835  0.001   ***
H          4   2.08 × 10^11  5.20 × 10^10  27.386   0.84561  0.001   ***
I          4   7.43 × 10^11  1.86 × 10^11  24.709   0.8317   0.001   ***

Figure 3.2: PERMANOVA of condition A

Figure 3.3: PERMANOVA of condition C

Figure 3.4: PERMANOVA of condition H

3.4.2 What are the optimal environmental conditions under which different water samples exhibit the most different coffee-ring effect residue patterns

To investigate the optimal environmental conditions for separating particles in the coffee-ring effect residue pattern, the residue particle feature statistics of the water samples were analyzed under varying conditions. The most statistically significant particle features were identified through multivariate analysis of variance (MANOVA) on water samples and environmental conditions.
The study found that mean area, mean perimeter, mean eccentricity, standard deviation of area, standard deviation of centroid, and standard deviation of orientation influenced the coffee-ring effect residue pattern features. These results, shown in Table 3.6, provide insight into the conditions that promote a more visible and distinct coffee-ring effect residue pattern. According to the findings, particle features such as area, perimeter, eccentricity, and centroid are sensitive to environmental conditions. In Table 3.6, ’class’ represents water samples and ’condition’ represents environmental conditions.

Table 3.6: MANOVA analysis for image properties

Response: area mean
            Df   Sum Sq     Mean Sq     F value   Pr(>F)            sig.
class       4    38382      9595.6      44.1302   < 2.2 × 10^−16    ***
condition   8    7450       931.3       4.2831    8.562 × 10^−05    ***
Residuals   212  46097      217.4

Response: area std
            Df   Sum Sq     Mean Sq     F value   Pr(>F)            sig.
class       4    1930667    482667      46.1313   < 2.2 × 10^−16    ***
condition   8    274996     34375       3.2854    0.001482          ***
Residuals   212  2218133    10463

Response: eccentricity mean
            Df   Sum Sq     Mean Sq     F value   Pr(>F)            sig.
class       4    0.121804   0.0304510   42.7562   < 2.2 × 10^−16    ***
condition   8    0.019479   0.0024348   3.4187    0.001016          **
Residuals   212  0.150987   0.0007122

Response: eccentricity std
            Df   Sum Sq     Mean Sq     F value   Pr(>F)            sig.
class       4    0.0109677  0.00274192  21.4681   6.974 × 10^−15    ***
condition   8    0.0017548  0.00021935  1.7174    0.09579           .
Residuals   212  0.0270767  0.00012772

Response: orientation mean
            Df   Sum Sq     Mean Sq     F value   Pr(>F)            sig.
class       4    555.90     138.975     11.9927   8.368 × 10^−09    ***
condition   8    189.02     23.628      2.0389    0.04333           *
Residuals   212  2456.72    11.588

Response: orientation std
            Df   Sum Sq     Mean Sq     F value   Pr(>F)            sig.
class       4    56.00      14.000      2.3224    0.05782           .
condition   8    279.66     34.957      5.7991    1.074 × 10^−06    ***
Residuals   212  1277.94    6.028

Response: perimeter mean
            Df   Sum Sq     Mean Sq     F value   Pr(>F)            sig.
class       4    73599      18399.8     53.9503   < 2.2 × 10^−16    ***
condition   8    10876      1359.5      3.9863    0.0002011         ***
Residuals   212  72303      341.1

Response: perimeter std
            Df   Sum Sq     Mean Sq     F value   Pr(>F)            sig.
class       4    4692837    1173209     42.6128   < 2 × 10^−16      ***
condition   8    500897     62612       2.2742    0.0236            *
Residuals   212  5836749    27532

Response: centroid mean
            Df   Sum Sq     Mean Sq     F value   Pr(>F)            sig.
class       4    401278     100319      12.530    3.614 × 10^−09    ***
condition   8    878527     109816      13.717    5.513 × 10^−16    ***
Residuals   212  1697283    8006

Response: centroid std
            Df   Sum Sq     Mean Sq     F value   Pr(>F)            sig.
class       4    89628      22406.9     17.906    1.108 × 10^−12    ***
condition   8    200351     25043.8     20.013    < 2.2 × 10^−16    ***
Residuals   212  265294     1251.4

Table 3.6 shows how the coffee-ring effect residue pattern features vary when environmental conditions and water samples are considered together. However, it does not make clear how the residue patterns vary between water samples once the environmental condition is fixed. In the ANOVA of coffee-ring effect residue pattern features (Table 3.7), area mean, area standard deviation, perimeter mean, perimeter standard deviation, centroid mean, centroid standard deviation, and eccentricity mean are statistically significant across the nine experiment conditions. The area mean p-values are mostly at the 10^−6 level or lower (conditions A, C, E, F, G, H, I), with only two conditions (B, D) having larger p-values of 10^−2 to 10^−3. This result suggests that the area mean differs significantly between water samples under most tested environmental conditions. It aligns with the results in Table 6.12, where particle positions in the PERMANOVA visualization image are mixed under conditions B and D. This confirms that particles formed by different water samples exhibit distinct coffee-ring effect residue patterns.
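The per-condition ANOVA described above compares one feature (for example, area mean) across the five water samples. A minimal sketch of the underlying one-way F statistic follows. The helper name `one_way_anova_f` and the replicate values are hypothetical illustrations, not code or data from this study, and the p-value would still have to be read from the F distribution with a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of feature values per water sample.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the replicates around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical area-mean values: five water samples, five replicates each.
area_mean = [
    [41.0, 43.5, 40.2, 42.1, 41.8],
    [44.9, 46.0, 45.2, 47.1, 44.4],
    [60.3, 58.8, 61.0, 59.5, 60.1],
    [52.2, 50.9, 53.4, 51.7, 52.8],
    [70.4, 66.9, 73.1, 68.0, 71.5],
]
f_value, df_b, df_w = one_way_anova_f(area_mean)
print(f"F = {f_value:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

With five samples and five replicates, the test has 4 and 20 degrees of freedom, which matches the 4 degrees of freedom reported for the sample factor in the tables.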
Table 3.7: P-value of ANOVA of coffee-ring effect residue pattern features under each experiment condition

Condition  Area mean    Perimeter mean  Centroid mean  Eccentricity mean  Orientation mean  Area std     Perimeter std  Centroid std  Eccentricity std  Orientation std
A          5.62×10^−6   3.11×10^−7      1.57×10^−3     2.7×10^−6          5.07×10^−2        3.58×10^−6   1.20×10^−6     7.5×10^−6     2.64×10^−3        8.07×10^−2
B          9.51×10^−2   1.71×10^−2      4.04×10^−8     5.71×10^−3         7.16×10^−3        8.74×10^−7   2.55×10^−6     1.07×10^−10   3.83×10^−2        1.67×10^−1
C          2.46×10^−10  1.26×10^−9      2.66×10^−2     5.13×10^−9         4.32×10^−3        1.26×10^−7   3.78×10^−7     1.10×10^1     4.60×10^−2        3.12×10^−1
D          7.70×10^−3   4.56×10^−5      3.23×10^−6     1.36×10^−5         1.75×10^−1        2.73×10^−4   2.39×10^−5     8.48×10^−4    3.12×10^−3        1.22×10^−1
E          9.50×10^−12  4.19×10^−12     4.86×10^−6     1.09×10^−4         4.62×10^−3        3.23×10^−15  1.09×10^−13    6.57×10^−5    1.31×10^−3        1.12×10^−2
F          8.41×10^−7   6.43×10^−9      2.99×10^−2     5.61×10^−5         1.55×10^−1        2.50×10^−6   1.20×10^−6     7.38×10^−5    4.50×10^−4        6.48×10^−1
G          1.23×10^−6   1.62×10^−10     1.45×10^−6     4.66×10^−7         4.52×10^−2        9.30×10^−6   4.92×10^−8     8.60×10^−5    3.21×10^−7        5.14×10^−5
H          1.43×10^−9   3.65×10^−11     1.47×10^−2     9.72×10^−6         1.68×10^−1        1.24×10^−6   7.94×10^−7     2.96×10^−2    2.46×10^−6        2.80×10^−1
I          6.74×10^−6   2.41×10^−6      3.02×10^−1     5.00×10^−3         3.73×10^−1        3.16×10^−5   2.51×10^−5     9.13×10^−7    3.69×10^−2        4.76×10^−3

For the perimeter mean variable, the nine conditions show similar statistical results: the perimeter mean p-values are mostly at the 10^−5 level or lower (conditions A, C, D, E, F, G, H, I), except for one condition (B) with a larger p-value of 1.71 × 10^−2. The larger p-value also contributes to point mixing in the PERMANOVA under condition B (Table 6.11).
Figure 3.5: PERMANOVA of condition B

Although centroid mean is statistically significant in water sample coffee-ring effect residue pattern features, its significance is weaker than that of the area mean and perimeter mean features, with five of the nine conditions having p-values greater than 10^−3. This occurs because the shapes of the formed particles are similar, leading to similar centroid calculations among particles. Interestingly, although condition B has larger p-values for area mean and perimeter mean, its centroid p-value is smaller than under the other conditions.

The eccentricity results are similar to those for centroid, with larger p-values than area mean and perimeter mean but smaller p-values than centroid. In contrast, orientation mean shows much larger p-values because particles form during the droplet drying process from the droplet edge toward the droplet center, resulting in the same orientations. The standard deviations of coffee-ring effect residue particle area, perimeter, centroid, eccentricity, and orientation display similar results to the feature means: area and perimeter standard deviations have the highest levels of statistical significance, centroid and eccentricity standard deviations have lower levels of statistical significance, and orientation standard deviation has the lowest significance levels. However, unlike the residue particle feature mean values, the feature standard deviation values do not correlate with the PERMANOVA of residue particle features.

The ANOVA on coffee-ring effect residue pattern features of each water sample follows the same trend: particle area and perimeter features have the highest statistical significance levels, centroid and eccentricity have lower statistical significance, and orientation has the lowest significance levels (Table 3.8).
Table 3.8: P-value of ANOVA of coffee-ring effect residue pattern features of water samples

Samples   Area mean    Perimeter mean  Centroid mean  Eccentricity mean  Orientation mean  Area std    Perimeter std  Centroid std  Eccentricity std  Orientation std
Sample 1  5.16×10^−4   4.2×10^−3       4.97×10^−9     3.92×10^−5         4.1×10^−1         2.02×10^−1  6.26×10^−1     1.87×10^−4    3.24×10^−3        4.66×10^−1
Sample 2  8.89×10^−3   7.48×10^−3      2.63×10^−9     1.37×10^−5         4.92×10^−3        9.81×10^−2  3.99×10^−2     1.05×10^−8    6.9×10^−2         7.38×10^−4
Sample 3  4.72×10^−11  1.2×10^−12      1.18×10^−3     5.23×10^−5         6.77×10^−1        3.2×10^−8   2.87×10^−8     5.64×10^−9    5.75×10^−6        1.28×10^−5
Sample 4  2.05×10^−6   1.79×10^−6      1.23×10^−4     1.14×10^−3         8.78×10^−2        4.51×10^−6  5.55×10^−7     6.68×10^−10   4.08×10^−2        2.22×10^−3
Sample 5  2.60×10^−3   1.65×10^−4      2.90×10^−12    3.95×10^−5         4.21×10^−1        1.35×10^−3  1.67×10^−4     7.45×10^−12   6.61×10^−2        1.24×10^−3

In the preceding analysis of variance (ANOVA), the significance of each coffee-ring effect residue pattern feature was evaluated independently, considering water sample class or environmental condition separately. To confirm the statistical significance of these pattern features, a multivariate analysis of variance (MANOVA) was conducted for each environmental condition individually, as shown in Table 3.9. Based on the MANOVA results, four conditions (A, B, D, and F) exhibited a statistically significant residue area feature. Five conditions (A, B, C, E, and H) showed a statistically significant residue eccentricity feature. Seven conditions (A, B, D, E, F, H, and I) demonstrated a statistically significant residue centroid feature. Unlike the ANOVA results, the orientation feature was not found to be statistically significant under any condition, and only one condition (B) had a statistically significant residue perimeter feature. This discrepancy is likely due to the MANOVA algorithm accounting for the correlations between the residue features.
Table 3.9: MANOVA of coffee ring effect residue pattern features of water samples under each condition

Condition  Area mean  Perimeter mean  Centroid mean  Eccentricity mean  Orientation mean  Area std  Perimeter std  Centroid std  Eccentricity std  Orientation std
A          0.6364     0.456           4.8×10^−5      0.0001             0.016             0.0003    0.003          1.15×10^−6    0.012             0.032
B          0.408      0.059           1.12×10^−8     0.035              0.80              0.0001    8.4×10^−5      1.99×10^−8    0.0025            0.09
C          0.18       0.38            0.29           4.7×10^−5          0.062             0.01      0.2            0.5           0.254             0.22
D          0.23       0.51            0.001          0.038              0.366             0.0005    0.1            0.001         0.268             0.79
E          0.179      0.117           3.9×10^−7      0.006              0.05              0.01      0.014          0.007         0.004             0.19
F          0.40       0.44            0.38           0.03               0.06              0.0016    0.037          0.0001        0.8               0.31
G          0.95       0.91            0.059          0.5                0.81              0.037     0.21           0.185         0.72              0.15
H          0.52       0.54            0.006          0.0006             0.099             0.076     0.182          0.0012        0.63              0.39
I          0.83       0.756           0.249          0.6                0.59              0.042     0.173          1.75×10^−6    0.88              0.001

ANOSIM for coffee-ring effect residue pattern features

To account for the correlation between the experiment environmental conditions and water sample classes, ANOSIM (with the Canberra dissimilarity index) was conducted on coffee-ring effect residue pattern features. The results are shown in Table 6.12. According to the ANOSIM results, conditions C, E, G, and H are the conditions under which residues of different water recipes are most distinguishable relative to the variation among replicates of the same recipe. Conditions A and F display comparable differences between water samples with the same components and those with different components. Conditions B and I show the weakest separation: replicates with the same water components are the least similar to each other relative to residues from different water components. The ranking of conditions, from the strongest to the weakest separation between water samples relative to replicate similarity, is C, H, E, G, F, A, D, B, I. Statistically, the null hypothesis is that there is no difference between the means of two or more groups of (ranked) dissimilarities.
The ANOSIM statistic R (Table 3.10) and significance values can be compared to test this hypothesis.

Table 3.10: R-value ANOSIM result of water samples coffee-ring effect residue patterns

Relative humidity  20-23 °C  23-26 °C  26-29 °C
35%-40%            0.6344    0.5459    0.7600
40%-45%            0.5366    0.7706    0.7922
45%-50%            0.8643    0.7202    0.5366

ANOSIM was then conducted on each particle feature of the coffee-ring effect residue pattern to investigate the variability of particle area, perimeter, eccentricity, and centroid in relation to water samples and environmental conditions. Under each environmental condition, the Jensen-Shannon divergence was calculated based on particle area, perimeter, and eccentricity. Multidimensional scaling and classical multidimensional scaling coordinates were then derived from the Jensen-Shannon distance matrix.

ANOSIM for coffee-ring effect residue pattern area feature

The ANOSIM result for the coffee-ring effect residue pattern area feature is shown in Table 6.13. In this result, the upper right and lower left triangles are identical because the distance between two replicate residues is symmetric. In the condition A results, for images 11 to 15, the distances between the replicates are smaller than the distances between these replicates and other samples, demonstrating the consistency of coffee-ring effect residue patterns. Conditions C, F, and H all display smaller distances within water samples than between samples. Under the high relative humidity conditions (conditions C, F, and I), sample E (1 mM CaCl2, 0.5 mM MgCl2, 2.35 mM Na2SO4, 0.033 mM NaH2PO4, 0.4 mM KF, 0.005 mM Fe(NO3)3, and 0.00024 mM CuSO4) exhibits more distinct residue patterns than the other water samples. The CMDS coordinates of the ANOSIM results are shown in Table 6.14.
In this table, it is clear that the coffee-ring effect residue patterns of replicates for each water sample are clustered near each other under conditions C, F, and H. However, the projected points under conditions A, B, and D are mixed together. Therefore, based on the residue pattern area feature, conditions C, F, and H are suitable for separating water contaminant particles from each other.

ANOSIM for coffee-ring effect residue pattern perimeter feature

The ANOSIM result for the coffee-ring effect residue pattern perimeter is shown in Table 6.15. Based on the results, under conditions D, G, and H, the similarities between replicates of water sample C (0.1 mM NaHCO3, 0.5 mM CaCl2, 0.2 mM MgCl2, 0.35 mM Na2SO4, 0.033 mM NaH2PO4, 0.4 mM KF, 0.005 mM Fe(NO3)3, and 0.00024 mM CuSO4) differ from those of water samples D (1 mM CaCl2, 1 mM MgCl2, 1.35 mM Na2SO4, 0.033 mM NaH2PO4, 0.4 mM KF, 0.005 mM Fe(NO3)3, and 0.00024 mM CuSO4) and E (1 mM CaCl2, 0.5 mM MgCl2, 2.35 mM Na2SO4, 0.033 mM NaH2PO4, 0.4 mM KF, 0.005 mM Fe(NO3)3, and 0.00024 mM CuSO4). The reason is that samples D and E do not contain NaHCO3. Additionally, only under condition A do the replicates of water samples C, D, and E produce consistent residue patterns; under other temperature and relative humidity conditions, water samples A, B, and C produce more consistent residue patterns. Furthermore, under conditions B, C, D, F, and I, sample E produces different residue patterns than samples A, B, C, and D. In the nanochromatography (Table 6.24), sample E is prone to forming an olive-shaped residue with a strong edge. Especially under conditions D and G, sample E has difficulty maintaining a convex residue shape, which results from the shrinkage of the residue during the droplet drying process. The CMDS coordinates of the ANOSIM results are shown in Table 6.16. The sample separation and replicate clustering results are not as strong as those for the residue pattern area feature.
This is because non-convex residues can produce a residue pattern of the same size but with a much larger perimeter. Only under conditions C and F are the water samples with different components separated and the replicates with the same recipe clustered together.

ANOSIM for coffee-ring effect residue pattern centroid feature

The ANOSIM result for the coffee-ring effect residue pattern centroid is shown in Table 6.17. Based on the results, only under condition C do the replicates of water samples produce similar residue pattern centroid features while water samples with different components produce different residue patterns. Under conditions A and B, water samples C and D produce similar residue pattern centroid features. The centroid feature is not a suitable metric for distinguishing water samples with different components because the particles formed in the residue have similar centroids, which originates from how the particles form. During the droplet drying process, particles form from the droplet edge toward the droplet center, and they form in the same direction, resulting in particles with similar centroids (see Table 6.27). The CMDS coordinates of the ANOSIM results are shown in Table 6.18. As shown in the centroid ANOSIM results, only under condition C do the residue patterns of water samples with different components produce different centroid features and occupy different coordinates in the CMDS plot, while replicates of water samples with the same components produce similar centroid features and have similar coordinates in the CMDS plot. However, under conditions A, D, and G, the water sample C points are separable from the other water samples (see Table 6.18). This is consistent with the ANOSIM results, where under condition G, the residue patterns of water sample C (replicates 11 to 15) have more similar centroid features than the other water replicates.
This phenomenon occurs under conditions with lower relative humidity, where the component concentrations of water sample C (0.1 mM NaHCO3, 0.5 mM CaCl2, and 0.2 mM MgCl2) are lower. These low component concentrations result in slower particle formation, so particles form only once the droplet has shrunk to a smaller size, and the particles formed are larger than those formed under other conditions (see Table 6.24, Table 6.25, and Table 6.26).

ANOSIM for coffee-ring effect residue pattern eccentricity feature

The ANOSIM results for coffee-ring effect residue pattern eccentricity are shown in Table 6.19. Based on the results, under condition A, the replicates of water samples A and B have similar eccentricity features, and water samples C, D, and E have similar eccentricity features. However, water samples A, B, and C form one group, and water samples D and E form another group. Under conditions C, D, G, and H, water sample A exhibits its own eccentricity feature. Under condition H, all five water samples exhibit distinct eccentricity features. The CMDS coordinates of the ANOSIM results are shown in Table 6.20. Under conditions B, C, and D, all replicate points are mixed together in a small region and cannot be separated effectively. Under condition G, replicate points are separated by their components; however, these points are too close together, making it difficult to find a clear rule for separating them and using them for further prediction. The water samples are separated maximally under condition H; however, this condition has two drawbacks. First, the replicates of water sample A are not clustered in a small region, indicating that replicate consistency is not optimal, as shown in Table 6.19. Second, samples B and C are too close to each other in the CMDS plot.
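The per-feature analysis above rests on two computations: a Jensen-Shannon distance between the feature distributions of two residues, and the ANOSIM R statistic with a label-permutation test on the resulting distance matrix. The following is a minimal sketch under stated assumptions: the function names, the histogram values, and the choice of base-2 logarithms are illustrative and are not taken from the analysis code of this study.

```python
import math
import random
from itertools import combinations
from statistics import mean


def js_distance(p, q):
    """Jensen-Shannon distance (square root of the base-2 JS divergence)."""
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 whenever a > 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return math.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))


def anosim_r(dist, labels):
    """ANOSIM R = (mean between-group rank - mean within-group rank) / (M / 2)."""
    pairs = list(combinations(range(len(labels)), 2))
    order = sorted(pairs, key=lambda ij: dist[ij[0]][ij[1]])
    ranks = {}
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied dissimilarities and assign the average rank.
        while (j + 1 < len(order)
               and dist[order[j + 1][0]][order[j + 1][1]] == dist[order[i][0]][order[i][1]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    within = [ranks[p] for p in pairs if labels[p[0]] == labels[p[1]]]
    between = [ranks[p] for p in pairs if labels[p[0]] != labels[p[1]]]
    return (mean(between) - mean(within)) / (len(pairs) / 2)


def anosim_permutation_test(dist, labels, n_perm=999, seed=0):
    """Return (observed R, permutation p-value) by shuffling the sample labels."""
    rng = random.Random(seed)
    observed = anosim_r(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if anosim_r(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical particle-area histograms: two water samples, three replicates each.
hists = [[10, 1, 1], [9, 2, 1], [10, 2, 1], [1, 1, 10], [1, 2, 9], [2, 1, 10]]
labels = ["A", "A", "A", "E", "E", "E"]
dist = [[js_distance(a, b) for b in hists] for a in hists]
r_value, p_value = anosim_permutation_test(dist, labels)
print(f"R = {r_value:.3f}, p = {p_value:.3f}")
```

An R close to 1 means that replicates of the same sample are more similar to each other than to replicates of other samples, which is the reading applied to the R-values reported per condition. With only six residues there are few distinct label arrangements, so the permutation p-value cannot become very small regardless of how well the groups separate.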
3.4.3 Under each environmental condition, are the element deposition locations significantly different from each other

Previous analyses have shown that both environmental conditions and water chemistry have statistically significant effects on coffee-ring effect patterns. However, these analyses did not provide information on whether the elements were separated in each residue pattern. To investigate this, EDS mapping images were used to label particle compositions in coffee-ring effect residue patterns. The location of each element was calculated as the square root of the sum of the squared x and y offsets from the center of each image. The p-value of the analysis of variance (ANOVA) was found to be smaller than 2 × 10^−16, indicating that environmental conditions and water sample class have statistically significant effects on element distributions. This suggests that different elements are separated by the coffee-ring effect.

The two-way ANOVA results for carbon, chlorine, and sulfur are shown in Table 6.21. All nine tests have 4 degrees of freedom for the class variable, 2 for the elements variable, and 8 for class:elements (where ’class’ represents water samples and ’elements’ represents the elements carbon, chlorine, and sulfur in this case). Based on these tests, all nine conditions showed statistical significance, with p-values smaller than 2 × 10^−16. Comparing the F-values with respect to the elements variable under these nine conditions, conditions A and C have values around 5600, and condition D has a value around 8600, the highest of the nine conditions. This result indicates that carbon, chlorine, and sulfur are more strongly separated under condition D than under conditions A and C and the other conditions.
Comparing the F-values with respect to the class variable, conditions C, D, and G show the largest F-values (in the range of 400 to 470), which means that carbon, chlorine, and sulfur are most strongly separated in the coffee-ring effect residue pattern under these environmental conditions with respect to the water components recipe. Comparing the class to elements correlation, carbon, chlorine, and sulfur are most strongly separated under conditions C, D, F, and G (F-values in the range of 400 to 600), which is consistent with the ANOSIM results for the residue pattern features.

The two-way ANOVA results for calcium, magnesium, and sodium are presented in Table 6.22. All nine tests have 4 degrees of freedom for the class variable, 2 for the elements variable, and 8 for class:elements (where ’class’ represents water samples and ’elements’ represents the elements calcium, magnesium, and sodium in this case). Based on these tests, all nine conditions showed statistical significance, with p-values smaller than 2 × 10^−16.

When comparing the F-values with respect to the elements under these nine conditions, conditions A, B, C, and I had values around 3000, while condition D had the highest value at around 3800. This indicates that calcium, magnesium, and sodium are more effectively separated under condition D compared to A, B, C, and the other conditions. When comparing the F-values with respect to the class variable, conditions B, C, D, and E showed the largest F-values (in the range of 50 to 150), suggesting that calcium, magnesium, and sodium are more effectively separated in the coffee-ring effect residue pattern under these environmental conditions with respect to the water components recipe.
Furthermore, when comparing the class:elements interaction, calcium, magnesium, and sodium were more effectively separated under conditions B, C, D, and E (with F-values in the range of 100 to 180), which is consistent with the ANOSIM analysis of residue pattern features for carbon, chlorine, and sulfur. Overall, these results suggest that conditions B, C, D, and E are the most effective for separating calcium, magnesium, and sodium in the coffee-ring effect residue pattern.

Previous analyses have demonstrated that environmental conditions and water chemistry have statistically significant effects on the coffee-ring effect pattern and on the distribution of element components in water samples. However, it remains unclear whether the coffee-ring effect patterns are correlated with the element compositions of water samples, which is crucial for building models to recognize and quantify contaminants. The previous analyses identified several optimal conditions that produced consistent replicates of water sample residue patterns and distinct residue patterns for different water components. The following analysis investigates under which environmental conditions the coffee-ring effect patterns of water samples are correlated with element compositions. This analysis provides insight into the relationship between the residue patterns and the underlying elemental components, which can be used to develop more accurate models for detecting and quantifying contaminants.

3.4.4 Do the water sample coffee-ring effect patterns have a statistically significant correlation with element composition?

The heat-map correlations between the area and eccentricity of the coffee-ring effect residue particles and the percentages of sulfur, chlorine, carbon, sodium, magnesium, and calcium are shown in Table 6.23. The strongest correlations between residue particle features and element percentages are observed under conditions A, G, and H.
Under condition G, the correlation between calcium and magnesium is -0.0093, indicating that these two elements are well separated in the residue pattern. Conversely, under condition B, the correlation between calcium and magnesium is 0.0087, suggesting that these two elements occupy similar positions in the residue and are not well separated. Another important observation under conditions A, G, and H is that the correlation between the particle area feature and the elements is higher than under other conditions. For instance, the correlation between particle area and sulfur percentage is 0.01, which is higher than under condition B (0.0045) and condition D (0.0057). Additionally, the correlation between area and chlorine is 0.027, which is the highest correlation among the nine conditions. Overall, these results suggest that conditions A, G, and H are more effective in separating the elemental components in the coffee-ring effect residue pattern and produce a higher correlation between the particle features and element compositions.

Table 3.11: Optimal condition analysis for consistent replicate residue patterns and distinct water sample particle features.
Conditions vs. analysis: A, B, C, D, E, F, G, H, I
Temperature (°C): 20-23, 20-23, 20-23, 23-26, 23-26, 23-26, 26-29, 26-29, 26-29
Relative humidity (%): 35-40, 40-45, 45-50, 35-40, 40-45, 45-50, 35-40, 40-45, 45-50
PERMANOVA on CRE pattern features: ✓ ✓ ✓ ✓
MANOVA on CRE pattern area: ✓ ✓ ✓ ✓
MANOVA on CRE pattern perimeter: ✓
MANOVA on CRE pattern eccentricity: ✓ ✓ ✓ ✓ ✓
MANOVA on CRE pattern centroid: ✓ ✓ ✓ ✓ ✓ ✓ ✓
CRE pattern features ANOSIM: ✓ ✓ ✓ ✓ ✓
CRE area ANOSIM: ✓ ✓ ✓
CRE perimeter ANOSIM: ✓ ✓ ✓ ✓
CRE centroid ANOSIM: ✓ ✓ ✓
CRE eccentricity ANOSIM: ✓ ✓ ✓ ✓
EDS elements ANOVA: ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Particles EDS ANOVA: ✓ ✓ ✓
Summary, number of optimal results out of 12 analyses: 6, 6, 7, 6, 4, 8, 5, 7, 2

Based on the analyses summarized in Table 3.11, the optimal environmental conditions for separating the elemental components in the coffee-ring effect residue pattern are 23-26°C with 45-50% relative humidity (condition F), 20-23°C with 45-50% relative humidity (condition C), and 26-29°C with 40-45% relative humidity (condition H). Among these, the best single condition is 23-26°C with 45-50% relative humidity, as it yielded the highest number of optimal results (8 of the 12 analyses). These conditions produced the highest correlation between particle features and element compositions, indicating that the particles and elements were well separated in the residue pattern. These optimal environmental conditions can be useful for developing models to detect and quantify contaminants in water samples using coffee-ring effect residue pattern analysis.

3.5 Conclusion

The study demonstrates the potential of the coffee-ring effect as a tool for tap water analysis. It shows that the coffee-ring effect can produce unique fingerprints for water samples with different recipes and environmental conditions. The results also confirm the reproducibility of the coffee-ring effect, which is essential for establishing it as a reliable analytical technique.
Additionally, the study highlights that both environmental conditions and water compositions affect the residue patterns produced by the coffee-ring effect, and that these patterns reflect the water chemistry of the sample. The study also demonstrated the effectiveness of the automatic temperature and humidity control chamber in maintaining temperature and relative humidity, as well as of the four-axis autosampler for conducting nanochromatography experiments.

The study concludes that statistical methods such as ANOVA, MANOVA, and PERMANOVA can differentiate coffee-ring effect residue patterns with respect to environmental conditions and water sample compositions. However, the results from the different analysis methods are inconsistent, so further research is needed to determine the best method for differentiating these patterns. The analyses included ANOVA and MANOVA tests on residue pattern features (area, perimeter, centroid, eccentricity, and orientation), ANOSIM tests on residue pattern features and element distributions, and two-way ANOVA tests on element distributions. The results indicate that both environmental conditions and water chemistry significantly influence residue patterns and element distributions. In particular, certain conditions, namely 23-26°C with 45-50% relative humidity, 20-23°C with 45-50% relative humidity, and 26-29°C with 40-45% relative humidity, are well suited for differentiating between water samples with varying concentrations of different components. Among these, the best single condition is 23-26°C with 45-50% relative humidity, as it yielded the highest number of optimal results across the 12 separate analyses.
These findings have implications for the study of residue patterns and the understanding of the coffee-ring effect. Specifically, they suggest that further research is needed to better understand how environmental factors and water chemistry work together to shape residue patterns.

CHAPTER 4
CNN-Vision-transformer model for elements concentration estimation by coffee-ring effect residue patterns

4.1 Abstract

This study investigates the effectiveness of machine learning techniques in detecting multiple contaminants, with high accuracy, from the coffee-ring effect "fingerprint" of a tap water's dried residue. Drying water droplets on low-cost aluminum substrates uses the coffee-ring effect to separate the solutes within water samples at low cost, forming a unique "fingerprint" for each tap water that can be photographed and analyzed using machine learning. Three models were evaluated in this research: the One-stage point estimation model (OnePeM), the Two-stage vision-transformer point estimation model (TwoVtPeM), and the Two-stage vision-transformer multiple output estimation model (TwoVtMoM). TwoVtPeM achieved the best performance of the three models, with OnePeM also performing well and TwoVtMoM falling short. The TwoVtPeM relative percentage errors were ±17.1% for oxygen, ±4.5% for sulfur, ±19.9% for sodium, ±5.7% for chlorine, ±19.8% for calcium, ±25.8% for magnesium, and ±20.1% for carbon. Its R² was 0.95, which is higher than that of OnePeM (0.90) and TwoVtMoM (0.54). TwoVtPeM had a higher mean error than OnePeM, but it exhibited lower relative standard deviations of estimation; the TwoVtPeM relative standard deviations were 3.9% for oxygen, 3.0% for sulfur, 5.3% for sodium, 3.9% for magnesium, 5.3% for chlorine, 10.0% for calcium, and 5.9% for carbon.
Moreover, 79.2% of water samples were correctly classified for hardness based on the element concentrations estimated by TwoVtPeM. The OnePeM model correctly classified 67.2% of water samples, whereas the TwoVtMoM model achieved an accuracy of only 60.2% in classifying water samples for hardness. The study's findings reveal the potential advantages of the deep learning technique (TwoVtPeM) for water analysis over other screening methods such as strip test kits, owing to its speed, its low cost, and its ability to estimate multiple contaminants simultaneously. Further improvements can be made by addressing certain limitations, such as the quality of the substrate and the size and complexity of the dataset and models. Advances in camera technology and deep learning techniques have the potential to improve the method's ability to detect low concentrations of elements. In conclusion, this study highlights the potential of machine learning to transform water quality monitoring, leading to better health outcomes for individuals and communities.

4.2 Introduction

Ensuring sustainable and clean access to water is crucial for water and wastewater treatment plants as well as for other natural and industrial systems that depend on this vital resource. These plants not only have to meet the needs of consumers and upgrade infrastructure to improve quality of life, but they also face increasingly stringent regulatory measures to meet rising quality standards Faherty [2021]. Unfortunately, heavily polluted waterways are becoming more common in many countries, posing a threat to human, aquatic, and terrestrial life Ebenstein [2012]. To address these challenges, researchers worldwide are exploring methods to optimize, remediate, and enhance water usage Lages Barbosa et al. [2015], Yang et al. [2020], Vu and Wu [2022], Podder et al. [2021]. Many are focusing on creating and simulating optimized, cost-effective, and intelligent models to tackle these issues.
Artificial intelligence (AI) has become an important tool in this effort, enabling the analysis and interpretation of vast amounts of data to facilitate better decision-making and more effective management of water resources. The water industry is increasingly turning to emerging AI and machine learning (ML) technologies, as well as smart systems, to address challenges that have traditionally been underserved by conventional methods and approaches. These technologies are anticipated to offer cost savings and process optimization through their resilience, generalization, and ease of design, helping to model and overcome complex water-related issues Alam et al. [2022], Taoufik et al. [2022], Gordanshekan et al. [2023], Xie et al. [2022]. Applications that have already benefited from ML include water and wastewater treatment, natural-systems monitoring, and precision agriculture. The most commonly used ML techniques in these studies include artificial neural networks (ANNs), recurrent neural networks (RNNs), random forest (RF), support vector machine (SVM), and adaptive neuro-fuzzy inference systems (ANFISs), with occasional use of AI techniques such as fuzzy inference systems (FISs). Some studies have also explored hybrid approaches, such as ANN-RF and SVM-RF, with positive outcomes in water-related modeling processes.

4.2.1 Coffee-ring effect residue provides particles structure information

When harnessed, the coffee-ring effect creates unique residue patterns, or fingerprints, that correlate with tap water chemistry Li et al. [2020], Shahidzadeh-Bonn et al. [2008], Kaya et al. [2010], Shin et al. [2014], Shahidzadeh et al. [2015]. These patterns result from the crystallization of water contaminants and are influenced by various factors, such as evaporation, bulk flow, temperature, humidity, and wettability Li et al. [2020], Qazi et al. [2017], Wei et al. [2012], Sammalkorpi et al. [2009], Desarnaud et al. [2014], Meldrum and O’Shaughnessy [2020].
Crystallization of salts from drying saline droplets has been investigated in several studies, which analyzed nucleation mechanisms and the dependence of the precipitation profile on factors such as surface properties and salt concentration. The complexity of coffee-ring effect pattern formation is influenced by contact line pinning on the substrate and by the contact angle. Previous research has found that particle separation during coffee-ring formation is based on a particle-size selection mechanism near the contact line of an evaporating droplet, leading to nanochromatography of various biological entities with high separation resolution and dynamic range Wong et al. [2011], Larson [2014], Deegan et al. [1997], Chen and Evans [2010], Eral et al. [2013]. This mechanism has the potential to be used to estimate crystal structures and even particle concentrations.

4.2.2 Applications of AI and ML methods in Water Treatment

ML techniques for modeling membrane-filtration processes aim to output several variables, such as transmembrane pressure, permeate flux, and solute rejection. Inputs in published studies include pH, temperature, contact/filtration time, transmembrane pressure, and flux rate, among others. ANN, RNN, and SVM models consistently performed well, achieving R² values greater than 0.9 and often greater than 0.99. AI and ML methods have also been used to control chlorination, to estimate disinfection by-product (DBP) concentrations, and to model significant parameters for adsorption and membrane-filtration processes. Statistical measures used to evaluate results include the coefficient of correlation (R), coefficient of determination (R²), mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), and relative error (RE).

Chlorination and Disinfection By-Product Estimation

Disinfecting water is crucial for killing or inactivating microorganisms and viruses. Chlorine-based disinfectants Li et al. [2017], Xu et al.
[2015, 2013] are often used, but they pose health hazards and can create DBPs Sedlak and von Gunten [2011], Bull et al. [1995]. AI methods can be used to control chlorination, while ML technologies can predict and mitigate DBP formation. Studies have tested models on surface waters treated with chlorine and reported success in modeling DBP concentrations in treated water distribution networks and at consumer taps Librantz et al. [2018], Godo-Pla et al. [2021], Singh and Gupta [2012], Mahato and Gupta [2022], Park et al. [2018], Lin et al. [2020], Xu et al. [2022], Peleato [2022], Okoji et al. [2022], Cordero et al. [2021].

Adsorption Processes

Adsorption processes remove various contaminants in the water and wastewater treatment industries. Predictive ML models can optimize the adsorption process and extend the life of the media, increasing the plant's effectiveness and its confidence in meeting applicable regulations. Studies have modeled adsorption processes for water streams contaminated with metals, industrial dyes, and organic compounds using various adsorbent media, including carbonaceous materials and metal-based nanocomposites Bhagat et al. [2021], Mazloom et al. [2020], Mesellem et al. [2021a], Al-Yaari et al. [2022], Mazaheri et al. [2017], Ahmad et al. [2020], Fawzy et al. [2016], Ullah et al. [2020], Mahmoud et al. [2019], Mesellem et al. [2021b].

Membrane-Filtration Processes

Membrane processes separate contaminants in water and wastewater treatment by passing the water through a barrier or filter using high-pressure differentials. These processes are typically used for contaminants that are difficult or costly to remove by chemical or physical means Hube et al. [2020], Pronk et al. [2019]. AI and ML models have been used to model the treatment of various water sources contaminated with pollutants and natural compounds Zoubeik et al. [2019], Fetanat et al. [2021], Khan et al. [2022], Yusof et al. [2020], Nazif et al. [2020], Shim et al. [2021], Ammi et al. [2021a].
ANN is the most commonly used model, although ANFIS, SVM, and specific forms of ANNs have also been used for membrane-filtration-process modeling. ANN, RNN, and SVM models consistently performed well, achieving R² values greater than 0.9 and often greater than 0.99 Zoubeik et al. [2019], Khan et al. [2022], Yangali-Quintanilla et al. [2009].

Vision Transformer in computer vision

Deep neural networks (DNNs) are the backbone of AI systems today, and different types of networks are suited to different tasks. For instance, the multi-layer perceptron (MLP) or fully connected (FC) network, made up of multiple linear layers and nonlinear activations, is a classical type of neural network Rosenblatt [1957]. Convolutional neural networks (CNNs) use convolutional and pooling layers to process shift-invariant data such as images LeCun et al. [1998], Krizhevsky et al. [2017]. Recurrent neural networks (RNNs) apply recurrent cells to handle sequential or time series data Hochreiter and Schmidhuber [1997]. The transformer is a novel neural network that uses self-attention mechanisms Bahdanau et al. [2014], Parikh et al. [2016] to extract intrinsic features Vaswani et al. [2017] and demonstrates great potential for broad AI applications. It was first used in natural language processing (NLP) tasks, where it showed significant improvement Vaswani et al. [2017], Devlin et al. [2018], Brown et al. [2020]. For instance, Vaswani et al. [2017] first proposed the transformer, which is based on the attention mechanism, for machine translation and English constituency parsing tasks. Devlin et al. [2018] introduced BERT (Bidirectional Encoder Representations from Transformers), a language representation model that pre-trains the transformer on unlabeled text, considering the context of each word in a bidirectional manner. BERT obtained state-of-the-art results on 11 NLP tasks upon publication. Brown et al.
[2020] pre-trained a massive transformer-based model with 175 billion parameters, GPT-3 (Generative Pre-trained Transformer 3), using 45 TB of compressed plaintext data. It performed well on various downstream NLP tasks without the need for fine-tuning. These transformer-based models, with their robust representation capacity, have brought about significant advances in NLP.

Recently, the success of transformer architectures in NLP has inspired researchers to apply them to computer vision tasks. Although CNNs have traditionally been considered the foundation of vision applications He et al. [2016], Ren et al. [2015], the transformer is proving to be a potential alternative. Chen et al. [2020] trained a sequence transformer to predict pixels through auto-regression, achieving results comparable to CNNs in image classification tasks. The vision transformer model, ViT, proposed by Dosovitskiy et al. [2020], directly applies a pure transformer to sequences of image patches for full image classification and has achieved state-of-the-art results on multiple image recognition benchmarks. Transformers have also been used to solve various other vision problems, such as object detection Carion et al. [2020], Zhu et al. [2020], semantic segmentation Zheng et al. [2021], image processing Chen et al. [2021], and video understanding Zhou et al. [2018]. Their exceptional performance is attracting more researchers to propose transformer-based models for a wide range of visual tasks.

However, no research has yet used the coffee-ring effect in conjunction with machine learning and deep learning models, particularly the vision transformer model, to estimate the concentration of elements in water samples.
The vision transformer model has the potential not only to use the particle morphology and location information of one element to make estimations, but also to incorporate the physical chemistry interactions between elements to correct noise and increase accuracy. This approach could offer a novel method for screening water quality and even for understanding the underlying interactions between the elements in a sample. Another contribution of the study is the use of SEM-EDS images as training data to build the model. This approach allows much more detailed information about crystal structure to be extracted. Additionally, EDS images guide the model in estimating the locations of deposited elements, which helps reduce estimation errors and increases the coefficient of determination from 0.90 to 0.95. This method provides improved accuracy and insights into the complex relationships between elements within the samples.

4.2.3 Model for elements recognition and concentration estimation

The proposed components estimation model is a two-stage deep learning approach for determining the element concentrations in water samples using the coffee-ring effect. The coffee-ring effect is a phenomenon in which a ring-shaped deposit of coffee particles forms around the perimeter of a droplet of coffee on a substrate. The effect arises as evaporation at the pinned droplet edge drives an outward flow that, together with the particles' Brownian motion, transports the particles to the edge of the droplet. The coffee-ring effect is of interest in fields such as materials science, physics, and biology, as it can be used to pattern surfaces and deposit particles in a controlled manner.
The first stage of the model uses a deep learning model to estimate the locations and abundances of seven elements (calcium, magnesium, sodium, sulfur, carbon, oxygen, and chlorine) in the sample, based on the crystal structure and location information present in images of the coffee-ring effect. The inputs to the model are the SEM images and EDS images of the coffee-ring effect, which are pre-processed to ensure that they are of good quality and that the features of interest are clearly visible. The model uses a convolutional neural network (CNN) architecture to extract features from the images, because the information extracted for one element can be useful for understanding the presence and behavior of other elements, and the crystal deposition location plays a critical role in determining the crystal composition. The output of the first stage is seven binary images, each indicating the estimated location and abundance of a specific element. These binary images are obtained by thresholding: pixels with signal correspond to the estimated location of the element, and the number of signal pixels indicates the abundance of the element in that area.

The second stage of the model uses a vision-transformer deep learning model to estimate the concentrations of the elements in the sample, based on the locations and abundances estimated in the first stage. The model takes the outputs of the first stage as input and considers the relationships between elements, such as the low solubility of calcium sulfate, to improve the accuracy of the concentration estimates. For example, the estimated concentration of sulfur can be used to refine the concentration estimate of calcium, and vice versa. This stage also uses a CNN architecture to extract features from the inputs and a regression model to estimate the concentrations.
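The thresholding step that turns a first-stage output into a binary element map, and the pixel count used as an abundance measure, can be sketched as follows. This is a minimal illustration only: the 0.5 threshold, the function names, and the toy 3 × 3 probability map are assumptions, not values taken from this study.

```python
def to_binary_mask(prob_map, threshold=0.5):
    """Threshold a per-pixel probability map into a 0/1 element map.

    The 0.5 cut-off is an illustrative assumption.
    """
    return [[1 if value >= threshold else 0 for value in row] for row in prob_map]


def abundance(mask):
    """Fraction of pixels in the map that carry signal for the element."""
    total_pixels = sum(len(row) for row in mask)
    signal_pixels = sum(sum(row) for row in mask)
    return signal_pixels / total_pixels


# Hypothetical sigmoid output for one element channel (e.g., calcium).
calcium_prob = [
    [0.9, 0.2, 0.1],
    [0.8, 0.7, 0.3],
    [0.1, 0.4, 0.6],
]

calcium_mask = to_binary_mask(calcium_prob)
calcium_abundance = abundance(calcium_mask)
```

Seven such masks, one per element, would form the input handed to the second stage.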
Overall, the proposed model uses recent machine learning techniques to study the coffee-ring effect and estimate the composition of elements in water samples. The two-stage approach, with co-learning and attention, allows more accurate estimation of the locations, abundances, and concentrations of the elements. It can also provide new insights into the dynamics of the coffee-ring effect and aid in the development of new techniques for controlling the deposition of particles.

In this study, three models were built and evaluated on water samples imaged using scanning electron microscopy (SEM). The end-to-end model (the One-stage point estimation model) is a single-stage model designed to estimate the concentrations of elements in the water samples. The input to the model is a three-layer SEM image and the output is the estimated concentration of elements. The model consists of a Unet module with a ResNet50 encoder, ImageNet encoder weights, and a sigmoid activation function. This is followed by three convolutional layers, max pooling layers with ReLU activation, and a final linear layer that outputs the estimated element concentrations (refer to Figure 4.1). The Two-stage vision-transformer point estimation model is made up of two modules (stages). The first module is identical to the Unet structure in the end-to-end model and produces seven binary 2D outputs used to estimate the elements' concentrations. The second module is a vision-transformer module that extracts the elements' location information to estimate their corresponding concentrations (refer to Figure 4.2). The third model, the Two-stage vision-transformer multiple output model, is similar in structure to the Two-stage vision-transformer point estimation model, but it produces a range of element concentrations (refer to Figure 4.3).

Figure 4.1: One-stage point estimation model structure.

Figure 4.2: Two-stage vision-transformer point estimation model structure.
Figure 4.3: Two-stage vision-transformer multiple output model structure.

4.3 Experimental Methods

4.3.1 Develop a deep learning model to identify corrosion indicators and quantify their concentrations in tap water

A CNN model has been developed to identify corrosion indicators in tap water, using methods similar to those previously employed for assigning tap water SEM fingerprints to groups with similar water chemistry with an accuracy of 76.7 ± 3.0% Li et al. [2020]. Features of the previous model that are applicable to the new work include the convolutional layers, fully connected layers, and the ReLU activation function Li et al. [2020]. Parameters of the model have been adjusted to fit this research, including the number of convolutional layers, the output layer, and the loss function (three-channel RGB images are analyzed instead of black and white images). The output of this study consists of maps depicting the expected elemental deposition and the concentration of each contaminant, in contrast to the previously published work, where the output was a classification of the image into a group with similar water chemistry. Loss is calculated using mean square error instead of the cross-entropy method used previously Li et al. [2020].

The experiment has been divided into three steps. In the first step, additional tap water SEM fingerprints were collected and evaluated for synthetic Detroit water samples under the optimal environmental condition (23-26°C, 45-50% relative humidity) identified in Chapter 3. In the second step, a deep learning model was developed using tap water SEM fingerprints (SEM images) and SEM-EDS map images to assign elements to the crystals that formed. In the last step, three vision-transformer models were constructed to use the predicted element depositions to estimate the concentration of each element.
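Because the model output here is a set of continuous concentrations rather than a class label, a regression loss such as mean square error is the natural choice. The short sketch below shows the calculation on a hypothetical three-element example; the numbers and the function name are illustrative only.

```python
def mse_loss(predicted, target):
    """Mean square error between predicted and reference concentrations."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical concentrations for three elements (arbitrary units).
predicted_conc = [1.0, 2.0, 4.0]
reference_conc = [1.0, 3.0, 2.0]

loss = mse_loss(predicted_conc, reference_conc)  # (0 + 1 + 4) / 3
```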
Tap water fingerprints (SEM and photographs) collection for an array of synthetic waters

Water sample recipes were developed based on Detroit water reports from 2017 to 2019. Components were prepared over a broader range to represent the variability of water constituents accurately. The recipe details can be found in the supplementary file. The SEM residue patterns and EDS maps of contaminant particles in tap water samples were collected from droplets of each sample, with five replicates, dried under the optimal temperature and relative humidity condition (23-26°C, 45-50% relative humidity). Photographs of each residue were captured with the Celestron camera, SEM images of whole droplets were taken, and EDS maps were obtained for sodium, calcium, magnesium, chlorine, carbon, sulfur, and oxygen using the same method as in previous research Li et al. [2020] (Section 3.3). Water sample recipes were designed to mimic the range of tap water components (Tables 6.28, 6.29, 6.30, 6.31, and 6.32). The SEM image and EDS mapping of the same area are shown in Figure 4.4.

Elements mapping estimation model for recognition of contaminants particles

The elements mapping estimation model was built and trained on the SEM and EDS mapping data collected in the previous step. The model takes water SEM fingerprints as input and outputs EDS image maps of the contaminant elements, rather than a classification of the image. To evaluate model performance, the output images were overlaid with the EDS maps, the pixel positions in the two maps were compared, and the accuracy was calculated. The model was built with the segmentation_models_pytorch package, using a resnet34 encoder, seven output classes, sigmoid activation, and model weights initialized from ImageNet. Multilabel Dice loss was applied in the training process. All three models were trained for 1000 epochs with a learning rate of 0.1.
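As a concrete illustration of the multilabel Dice loss named above, the sketch below computes the Dice coefficient for one element map and averages one minus that score over several element maps. It is a plain-Python illustration rather than the library code used in training: the squared-term denominator, the small `eps` smoothing constant, and the toy maps are assumptions, and library implementations may differ in these details.

```python
def dice_coefficient(pred, truth, eps=1e-7):
    """Dice score 2*sum(p*g) / (sum(p^2) + sum(g^2)) over all pixels.

    `eps` is an assumed smoothing term that avoids division by zero
    when both maps are empty.
    """
    p = [value for row in pred for value in row]
    g = [value for row in truth for value in row]
    intersection = sum(a * b for a, b in zip(p, g))
    denominator = sum(a * a for a in p) + sum(b * b for b in g)
    return (2.0 * intersection + eps) / (denominator + eps)


def multilabel_dice_loss(pred_maps, truth_maps):
    """Average of (1 - Dice) over the per-element maps."""
    scores = [dice_coefficient(p, t) for p, t in zip(pred_maps, truth_maps)]
    return 1.0 - sum(scores) / len(scores)


# Hypothetical 2 x 2 maps for a single element.
predicted_map = [[1, 1], [0, 0]]
eds_map = [[1, 0], [0, 0]]

score = dice_coefficient(predicted_map, eds_map)   # 2*1 / (2 + 1)
perfect_loss = multilabel_dice_loss([eds_map], [eds_map])
```

A perfect prediction gives a loss of zero, and the loss grows as the predicted and reference maps overlap less.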
Figure 4.4: 3D stacking, from bottom up, of the residue surface scan, SEM image, oxygen EDS map, and chlorine EDS map.

Dice loss

In cross-entropy loss, the overall loss is calculated as the average of the per-pixel losses. However, each per-pixel loss is calculated in isolation, without considering whether the neighboring pixels are boundaries. As a result, cross-entropy loss accounts for the loss only in a local sense rather than globally, which limits image-level prediction. Dice loss (Eq. 4.1) originates from the Sørensen-Dice coefficient, a statistic developed in the 1940s to gauge the similarity between two samples. It was brought to the computer vision community by Milletari et al. [2016] for 3D medical image segmentation and is now widely used for image segmentation and boundary detection.

\[ D = \frac{2 \sum_{i}^{N} p_i g_i}{\sum_{i}^{N} p_i^2 + \sum_{i}^{N} g_i^2} \tag{4.1} \]

The Dice coefficient in Eq. 4.1 measures the similarity between the prediction and the ground truth in boundary detection. The variables $p_i$ and $g_i$ represent corresponding pixel values, with a value of 1 indicating the presence of a boundary and 0 indicating its absence. The denominator is the total number of boundary pixels in both the prediction and the ground truth, while the numerator is twice the number of correctly predicted boundary pixels (i.e., those where $p_i$ and $g_i$ both have a value of 1).

Figure 4.5: Dice coefficient (set view)

Persistent Homology of Point Clouds

In practice, the sliding window embedding of a video $X$ is a finite set $SW_{d,\tau}X = \{SW_{d,\tau}X(t) : t \in T\}$, determined by a finite choice of $T \subset \mathbb{R}$. As $SW_{d,\tau}X \subset \mathbb{R}^{WH(d+1)}$, the ambient Euclidean distance equips $SW_{d,\tau}X$ with the structure of a finite metric space. Such discrete metric spaces, or point clouds, are topologically trivial, with N points having N connected components and no higher-dimensional features like holes.
However, when a point cloud is sampled from or around a continuous space with non-trivial topology (e.g., a circle or torus), one would expect simplicial complexes built on the point cloud vertices to reflect the topology of the underlying continuous space. Persistent homology is applied to discrete collections of points such as sliding window embeddings Zomorodian and Carlsson [2004]. For a point cloud $(X, d_X)$, where $X$ is a finite set and $d_X : X \times X \to [0, \infty)$ is a distance function, the Vietoris-Rips complex (also known as the Rips complex) at scale $\epsilon \ge 0$ consists of the non-empty subsets of $X$ with diameter less than or equal to $\epsilon$:

\[ R_\epsilon(X) := \{ \sigma \subset X : d_X(x_i, x_j) \le \epsilon, \ \forall x_i, x_j \in \sigma \} \tag{4.2} \]

$R_\epsilon(X)$ is a simplicial complex whose vertex set is $X$. It is formed by adding an edge between any pair of vertices at a distance of at most $\epsilon$, including all 2-dimensional triangular faces (i.e., 2-simplices) whose bounding edges are present, and, more generally, including all k-simplices whose (k-1)-dimensional bounding facets are present. Figure 4.6 illustrates the evolution of the Rips complex for a set of points sampled around the unit circle.

Figure 4.6: The Rips complex, at four different scales (ϵ = 0, 0.30, 0.40, 0.48), on a point cloud with 40 points sampled around $S^1 \subset \mathbb{R}^2$.

For an open cover given by $\{B_\alpha(l_j)\}_{l_j \in L}$, where $L$ is the landmark set and $\alpha$ is the radius of the balls, there is an associated partition of unity defined as

\[ \phi_j(b) = \frac{|\alpha - d(b, l_j)|_+}{\sum_k |\alpha - d(b, l_k)|_+} \tag{4.3} \]

where $|r|_+ = \max\{r, 0\}$ denotes the positive part.

Persistent Homology of Point Cloud

In topological analysis, the nerve complex, or the nerve of a family of sets, is a concept used to represent the intersection patterns of these sets. Given a collection of sets, the nerve complex is an abstract simplicial complex in which each set corresponds to a vertex, and a collection of vertices forms a simplex if and only if the intersection of the corresponding sets is non-empty.
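The nerve construction just defined can be illustrated with a small sketch that enumerates every index subset whose sets share at least one element. The cover `toy_cover`, its index names, and the `max_dim` cut-off are hypothetical choices made for the example.

```python
from itertools import combinations


def nerve(cover, max_dim=2):
    """Return the simplices of the nerve of a finite cover.

    `cover` maps an index to a set; a tuple of indices is a simplex
    when the corresponding sets have a non-empty common intersection.
    `max_dim` caps the simplex dimension to keep the enumeration small.
    """
    indices = sorted(cover)
    simplices = []
    for size in range(1, max_dim + 2):
        for subset in combinations(indices, size):
            common = set.intersection(*(cover[i] for i in subset))
            if common:
                simplices.append(subset)
    return simplices


# Hypothetical cover: three sets that overlap pairwise but not all at once.
toy_cover = {
    "U1": {1, 2, 3},
    "U2": {3, 4},
    "U3": {4, 5, 1},
}

toy_nerve = nerve(toy_cover)
```

In this toy cover the three pairwise overlaps give three edges, but the empty triple intersection leaves out the filled triangle, so the nerve is a hollow triangle that records a loop.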
In other words, the nerve complex encodes how the sets in a family overlap with each other. This concept is particularly useful in various applications, including topological data analysis, where it can help analyze the structure of complex data sets Dey et al. [2017], Carlsson [2020]. Let I be a set of indices and C be a family of sets (U_i)_{i∈I}. The nerve of C is a set of finite subsets of the index set I Geoghegan [2007]. It contains all finite subsets J ⊆ I such that the intersection of the U_i whose subindices are in J is non-empty (Eq. 4.4).

$$N(C) = \{J \subseteq I : \bigcap_{j \in J} U_j \neq \emptyset,\ J \text{ finite}\} \tag{4.4}$$

Building a vision-transformer model to estimate contaminant element concentrations from element locations

The element mapping estimation model from the previous step was trained to recognize element particles and output element mappings. The estimated EDS mapping images of the contaminant particles were then used to quantify the particle concentrations. In this study, a vision-transformer model was built and trained on the EDS mapping predictions to estimate contaminant concentrations. The vision transformer is composed of two multi-head attention modules together with a feed-forward module, a norm module, a positional layer, an encoder module, a decoder module, and a feature extraction module.

One-stage point estimation model, two-stage vision-transformer point estimation model and two-stage vision-transformer multiple output model comparison

To benchmark model performance, a one-stage point estimation model (OnePeM) was built to estimate the concentrations of elements in the water samples. This model consists of two modules. The first module is identical to the Unet structure of the element mapping estimation model, and the second module uses 2D layers to estimate the element concentrations.
Unlike the two-stage vision-transformer point estimation model (TwoVtPeM), the element EDS mappings were not used to train the first module of this model; instead, the model was trained end-to-end to estimate the element concentrations. To produce robust concentration estimations, a two-stage vision-transformer multiple output model (TwoVtMoM) was built to produce multiple element estimations. These two models share the backbone, activation function, and weight initializer of the two-stage vision-transformer point estimation model. The models were trained with stochastic gradient descent (SGD) at a learning rate of 0.001 and an MSE loss for 500 epochs. To accelerate training, the models were trained with the distributed data parallel (DDP) module on eight A100 (80GB SXM4) GPUs. Training took about 10 hours.

Model training

Three of the five replicates of each image collected in this task were randomly assigned to the training dataset, and the remaining two replicates were assigned to the testing dataset. All models were trained on the training set, and model performance was tested on the testing set. The accuracy of the particle recognition was computed by comparing two features of the element SEM-EDS mapping image and the CNN model output: 1) whether a pixel occurs in the same location, and 2) the size of pixel clusters. Specifically, pixel occurrence was evaluated by first overlaying the CNN output map onto the EDS map for contaminant particles. Both the EDS images and the CNN output are maps in which each pixel is assigned a value of either 0 or 1. In the evaluation stage, the CNN model output was analyzed to determine whether a pixel value of 1 exists in the same position, or within a circle with a radius of 3 pixels drawn around the corresponding location, on the EDS map.
A pixel is labeled as correctly identified if at least one pixel indicating contaminant particles exists within that region of the EDS map, and as incorrectly identified otherwise. The model accuracy, i.e., the percentage of pixels that matched the EDS output, was calculated for each image. Stochastic gradient descent (SGD) with a learning rate of 0.001 and an MSE loss was used to optimize the model parameters, and training was conducted for 500 epochs. To accelerate training, the models were trained with the distributed data parallel (DDP) module on eight A100 (80GB SXM4) GPUs.

4.4 Results and Discussion

4.4.1 Elements correlations between coffee-ring effect subrings

To investigate the correlations between elements in each coffee-ring effect residue subring, the droplet residue was separated into fifteen evenly divided subrings (Figure 4.9). The element correlations between the coffee-ring effect residue subrings were analyzed with the Pearson correlation coefficient, a measure of the linear correlation between two variables. It is a dimensionless number between -1 and 1, where 1 is total positive linear correlation, 0 is no linear correlation, and -1 is total negative linear correlation. The Pearson correlation coefficient is calculated by Eq. 4.5.

$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{4.5}$$

The strongest correlation was observed between sodium and chlorine, particularly within the second subring of both elements. This suggests that sodium and chloride ions tend to form crystals in the second subring area. The highest correlations among oxygen, calcium, and sulfur were found in the outermost subring, indicating the formation of calcium sulfate (CaSO4) in this region (Figure 4.7). Meanwhile, the highest correlations between chlorine and calcium occurred in the middle subring areas, signifying the formation of calcium chloride (CaCl2) in those regions (Figure 4.8).
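The subring correlation analysis can be sketched in a few lines. This is a minimal illustration under assumptions, not the analysis code used here: the function names are hypothetical, the subrings are assumed to be equal-width annuli around a known droplet center, and the per-subring sodium and chlorine signals below are made-up placeholder values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences (Eq. 4.5)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one sequence is constant
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))


def ring_index(row, col, center, outer_radius, n_rings=15):
    """Index (0 = innermost) of the equal-width subring holding a pixel.

    Returns None for pixels outside the residue (assumed circular).
    """
    r = math.hypot(row - center[0], col - center[1])
    if r >= outer_radius:
        return None
    return int(r * n_rings / outer_radius)


# Placeholder signals: pixel counts of two elements in one subring
# across four hypothetical samples.
sodium_counts = [10, 12, 9, 15]
chlorine_counts = [20, 24, 18, 30]
r_na_cl = pearson_r(sodium_counts, chlorine_counts)
print(f"r(Na, Cl) = {r_na_cl:.3f}")
```

A value of r close to 1 for a given pair of subrings, as for the placeholder data above, is the kind of signal interpreted in this section as co-deposition of the two elements.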
4.4.2 Elements mapping estimation model analysis

The estimated calcium-carbon and oxygen-sulfur EDS mappings are displayed in separate 2D histograms in Figure 4.10. As observed, oxygen and sulfur are more prominently present in the droplet residue pattern area, while calcium is distributed throughout the entire image, although it is primarily located in the residue pattern. This is likely due to calcium introduced into the substrate during the manufacturing process. To overcome this issue, a higher-quality substrate with a lower calcium content could be used. From the histogram results, the correlation between calcium and carbon is difficult to discern. However, the relationship between calcium and sulfur is more apparent. An example SEM image is shown in Figure 4.7.

Figure 4.7: SEM image of water sample coffee-ring effect residue pattern (with detailed subregion presentation). Water sample with MgCl2 0.45 mM, NaHCO3 1.0 mM, CaCl2 1.5 mM, MgSO4 0.5 mM.

Figure 4.8: Pearson correlation of water contaminants in coffee-ring effect residue subrings. Water sample with MgCl2 0.45 mM, NaHCO3 1.0 mM, CaCl2 1.5 mM, MgSO4 0.5 mM.

Table 4.1: Coffee-ring effect element deposition prediction by the Unet model. Water sample with MgCl2 0.45 mM, NaHCO3 1.0 mM, CaCl2 1.5 mM, MgSO4 0.5 mM.
[Table columns: Elements, Predicted EDS mapping, Target EDS mapping. Rows: Calcium, Sodium, Carbon, Magnesium, Oxygen, Sulfur, Chlorine.]

Figure 4.9: The coffee-ring effect residue pattern was separated into fifteen evenly divided subrings. Water sample with MgCl2 0.45 mM, NaHCO3 1.0 mM, CaCl2 1.5 mM, MgSO4 0.5 mM.

Figure 4.10: Topological nerve complex of estimated Calcium-Carbon EDS and Oxygen-Sulfur EDS. The left is the Calcium-Carbon 2D histogram and the right is the Oxygen-Sulfur 2D histogram. The x-axis and y-axis are scaled. Water sample with MgCl2 0.45 mM, NaHCO3 1.0 mM, CaCl2 1.5 mM, MgSO4 0.5 mM.
The calculation of the nerve complex was based on the combination of calcium and carbon EDS mappings and the combination of calcium and sulfur EDS mappings. In both combinations, the two EDS mappings were compared, with one serving as the reference and the other as the target. If a predicted element pixel was found in the reference, the location of the corresponding pixel in the target was recorded as a positive signal if it was present within a 3x3 area. To minimize noise, 1000 randomly selected points from the resulting pixels were used. This method produced the calcium-carbon and calcium-sulfur combination mappings with radius 0.008, which were then used to calculate the nerve complex. The nerve complexes of the calcium-carbon and calcium-sulfur combination mappings are shown in Figure 4.11. The calcium-sulfur nerve complex formed at different locations than the calcium-carbon nerve complex, which is consistent with the claim that particles of different composition form at different locations in the droplet residue pattern.

The results of the Unet element deposition estimation are presented in Figure 4.12. The three tables, from left to right, represent accuracy, false positive, and false negative (sensitivity). The y-axis of each table represents 625 water samples, while the x-axis lists the elements in the order of Oxygen, Sulfur, Sodium, Magnesium, Chlorine, Calcium, and Carbon. The two-stage vision-transformer point estimation model, the one-stage point estimation model, and the two-stage vision-transformer multiple output model all include this module and were trained independently. As shown in the accuracy results, sulfur and magnesium have the highest overall accuracy, while calcium and carbon have the lowest accuracy. This is also evident in Figure 4.17 where the predicted calcium values are mostly lower than the true values.
The high accuracy of sulfur and magnesium can be attributed to the more accurate collection of sulfur and magnesium EDS mappings, compared to the high noise present in the calcium EDS mapping (as seen in Figure 4.13), as the EDS instrument is more sensitive to these two elements. Additionally, the substrate contains fewer sulfur and magnesium impurities, and these elements are more separated from other elements such as oxygen and are prone to form crystals, such as SO4^{2−} ions. The false positive and false negative values for magnesium are also lower than for other elements. However, the EDS detector is not as sensitive to carbon, and the substrate contains a high concentration of calcium, leading to an inaccurate collection of EDS mappings for carbon. As a result, the model has difficulty learning the relationship between crystal structure and element composition for carbon.

Figure 4.11: Topological nerve complex of estimated Calcium-Carbon EDS and Calcium-Sulfur EDS. The diagram on the left represents the Calcium-Carbon EDS nerve complex, while the one on the right shows the Oxygen-Sulfur nerve complex. A radius of 0.008 was used in the calculations. The coffee-ring effect residue pattern resulted in the formation of calcium carbonate crystals (CaCO3) and calcium sulfate crystals (CaSO4) at different locations.

Figure 4.12: Accuracy, False Positive, False Negative (Sensitivity) tables from left to right; O, S, Na, Mg, Cl, Ca, C elements in each table from left to right. Results are averaged across five replicates.

Figure 4.13: Magnesium Sodium EDS mapping comparison. The water sample was prepared with the following components: 0.45 mM Magnesium Chloride (MgCl2), 0.25 mM Sodium Bicarbonate (NaHCO3), 2.0 mM Magnesium Sulfate (MgSO4), and 0.75 mM Calcium Chloride (CaCl2).

Figure 4.14: Trilinear plot of water recipes.

The trilinear plot of water sample recipes, as depicted in Figure
4.14, effectively demonstrates the wide range of element concentrations found in various tap water samples. These samples are distributed across the plot to account for the inherent variability of tap water components that may be encountered in different geographical regions and under diverse environmental conditions. This comprehensive representation of tap water compositions enables a more thorough analysis and understanding of the various factors influencing water quality, ultimately supporting the development and evaluation of the vision-transformer model in this study.

4.4.3 Two-stage model produces better results than one-stage model

The element concentrations of water contaminants were predicted by the two-stage vision-transformer point estimation model, the one-stage point estimation model, and the two-stage vision-transformer multiple output model. Results were plotted independently as target concentrations (x-axis) versus predicted concentrations (y-axis), with each element shown in its own color.

Two-stage vision-transformer point estimation model (TwoVtPeM)

Figure 4.15 displays the predicted and true (target) chlorine to sulfur mass ratios. The predicted chlorine to sulfur mass ratio is found to be higher than the true values, particularly when the true values are larger. This is consistent with the overestimation of concentration seen in the results of the TwoVtPeM (Fig. 4.17). The reason for this overestimation will be discussed in the following sections.

Figure 4.15: TwoVtPeM chlorine to sulfur mass ratio. Target chlorine to sulfur mass ratio vs. predicted chlorine to sulfur mass ratio. Marker colors indicate the target chlorine to sulfur ratio value.

The predicted water hardness values tend to be higher than the true hardness values of the water samples, as shown in Fig. 4.16. For instance, twenty hard water samples were predicted as very hard, and five moderately hard water samples were predicted as hard.
Nineteen hard water samples and eighty very hard water samples were correctly predicted. Only one sample had a predicted hardness lower than its true hardness. This is due to the overestimation of calcium concentrations, as seen in Fig. 4.17. The reason for this overestimation will be discussed in the following section.

The concentrations of contaminants estimated by the TwoVtPeM model are displayed in Figure 4.17. The target concentrations (element concentrations in the water preparation recipe) are plotted on the x-axis, while the predicted concentrations are plotted on the y-axis. The results indicate that the predicted chlorine concentrations are generally higher than the true chlorine concentrations. This is consistent with the EDS mapping results in Figure 4.12, which show that the false negative value is lower than the false positive value. This suggests that some of the estimated chlorine crystals are not actually chlorine, leading to an overestimation of the chlorine concentration. Additionally, the estimation of chlorine has a larger standard deviation, which is likely due to the relatively high concentrations of chlorine compared to other elements in the water samples. As shown in Table 4.1, the predicted chlorine crystals are larger than the true chlorine crystals.

Figure 4.16: TwoVtPeM hardness category classification results for the water samples.

The trilinear plot of the element concentrations estimated by the TwoVtPeM is presented in Figure 4.18. When comparing this result with the trilinear plot of the true element concentrations in Figure 4.14, it is apparent that the water samples in the same table of water recipes are situated in similar locations. This observation indicates that the TwoVtPeM has successfully estimated element concentrations, demonstrating the effectiveness and accuracy of the model in analyzing and characterizing various tap water compositions.

Figure 4.17: TwoVtPeM results. Targets (x-axis) vs predictions (y-axis).
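The hardness categories discussed above can be derived from estimated calcium and magnesium concentrations. The following is a minimal sketch rather than the procedure used in this work: it assumes hardness is expressed as mg/L of CaCO3 equivalent and applies the commonly cited U.S. Geological Survey class boundaries (60, 120, and 180 mg/L), neither of which is stated in this chapter. All function names are hypothetical, and the categories assigned to the single underestimated sample are placeholders.

```python
CACO3_MG_PER_MMOL = 100.09  # molar mass of CaCO3 (g/mol), i.e. mg per mmol


def hardness_mg_per_l(ca_mM, mg_mM):
    """Total hardness as mg/L CaCO3 from Ca and Mg concentrations in mM."""
    return (ca_mM + mg_mM) * CACO3_MG_PER_MMOL


def hardness_category(hardness):
    """Assumed USGS-style classes; thresholds are not taken from the source."""
    if hardness <= 60:
        return "soft"
    if hardness <= 120:
        return "moderately hard"
    if hardness <= 180:
        return "hard"
    return "very hard"


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted category equals the true one."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


# Recipe of Figure 4.7: CaCl2 1.5 mM gives Ca; MgCl2 0.45 mM + MgSO4 0.5 mM give Mg.
recipe_hardness = hardness_mg_per_l(1.5, 0.45 + 0.5)
print(round(recipe_hardness, 1), hardness_category(recipe_hardness))

# Overestimating calcium can push a sample into a harder class (made-up values).
true_class = hardness_category(hardness_mg_per_l(1.0, 0.5))
pred_class = hardness_category(hardness_mg_per_l(1.4, 0.5))

# TwoVtPeM counts quoted in the text; the last sample's classes are placeholders.
true_labels = (["hard"] * 20 + ["moderately hard"] * 5 + ["hard"] * 19
               + ["very hard"] * 80 + ["very hard"])
pred_labels = (["very hard"] * 20 + ["hard"] * 5 + ["hard"] * 19
               + ["very hard"] * 80 + ["hard"])
twovtpem_accuracy = accuracy(true_labels, pred_labels)
print(f"{twovtpem_accuracy:.1%}")
```

The counts quoted in the text sum to 125 samples, of which 99 are classified correctly, which matches the 79.2% hardness classification accuracy reported for the TwoVtPeM.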
One-stage point estimation model (OnePeM)

Figure 4.19 displays the predicted and true (target) chlorine to sulfur mass ratios. Unlike the TwoVtPeM, which overestimates the chlorine to sulfur mass ratio, the OnePeM overestimates the ratio when the true chlorine to sulfur mass ratio is low and underestimates it when the true ratio is high. This is consistent with the element concentration estimations in Figure 4.21, where the chlorine concentration is overestimated at low concentrations but underestimated at high concentrations.

Figure 4.18: TwoVtPeM trilinear plot of the water samples.

Figure 4.19: OnePeM chlorine to sulfur mass ratio. Target chlorine to sulfur mass ratio vs. predicted chlorine to sulfur mass ratio. Marker colors indicate the target chlorine to sulfur ratio value.

The predicted water hardness values also tend to be higher than the true hardness values of the water samples, as shown in Fig. 4.20. For instance, thirty-four hard water samples were predicted as very hard, three moderately hard water samples were predicted as hard, and two moderately hard water samples were predicted as very hard. Six hard water samples and eighty very hard water samples were correctly predicted. Only two samples had a predicted hardness lower than their true hardness. This is due to the overestimation of calcium and magnesium concentrations under low concentration conditions, as seen in Fig. 4.21. The reason for this overestimation will be discussed in the following section.

Figure 4.20: OnePeM hardness category classification results for the water samples.

Figure 4.21 displays the concentrations of contaminants estimated by the one-stage point estimation model. In comparison to the TwoVtPeM, the OnePeM results in a greater standard deviation in the predicted concentrations.
Additionally, the model tends to overestimate low concentrations and underestimate high concentrations of each element. For example, the predicted calcium concentration is higher than its true concentration when it is around 2 mM, but lower than its true concentration when it is around 3.5 mM. This is because the one-stage model is trained end-to-end, lacking the correction step present in the TwoVtPeM that adjusts for the EDS mapping estimation. As a result, the model requires more training epochs and fine-tuning to effectively learn the features.

The trilinear plot of the element concentrations estimated by the OnePeM is presented in Figure 4.22. When comparing this result with the trilinear plot of the true element concentrations in Figure 4.14, it is apparent that the water samples in the same table of water recipes are situated in similar locations, but not as accurately as with the TwoVtPeM. This observation indicates that while the OnePeM is capable of estimating element concentrations, its performance is not as precise as that of the TwoVtPeM.

Figure 4.21: OnePeM results. Targets (x-axis) vs predictions (y-axis).

Two-stage vision-transformer multiple output estimation model (TwoVtMoM)

While the TwoVtMoM was expected to produce more accurate results than the TwoVtPeM, its element concentration estimations are actually worse. The model tends to overestimate low true element concentrations and underestimate high true element concentrations. This is due to the larger number of parameters in the TwoVtMoM model, which requires more training epochs and fine-tuning to effectively learn the features.

Figure 4.22: OnePeM trilinear plot of the water samples.

The results of each element estimation for the different models are summarized in Figure 4.24. As illustrated in the figure, the TwoVtPeM (row 1) exhibits lower variance compared to the OnePeM (row 2). The one-stage point estimation model tends to predict lower element concentrations than the actual values.
This is due to the fact that the TwoVtPeM more accurately maps the elements' locations compared to the OnePeM. Although crystals form in a 3D structure, the EDS mapping can only represent the elements' 2D deposition. The TwoVtPeM can utilize relative location information from other elements to construct the crystal deposition structure and infer the corresponding concentrations. The error mean (calculated as the percentage difference between the mean of the estimated element concentrations and their true concentrations) and standard deviation of concentration estimations (calculated as the standard deviation of estimated element concentrations) are presented in Table 4.2. The OnePeM has the lowest error mean for five of the seven elements (oxygen, sodium, chlorine, calcium, and carbon), while the TwoVtPeM has the lowest error mean for the remaining two elements (sulfur and magnesium). Although the OnePeM has the lowest error mean, the TwoVtPeM has the lowest standard deviation for all seven element concentration estimations. This demonstrates that the TwoVtPeM is more stable than the OnePeM, which is due to the element EDS mapping estimation module in its first stage, resulting in an R² of 0.95, which is higher than the 0.90 of the OnePeM.

Figure 4.23: TwoVtMoM results. Targets (x-axis) vs predictions (y-axis).

Model comparison

In Section 4.4, the individual results of the three models regarding their element concentration estimations are presented. To compare the three models, the element concentration estimations and relative standard deviations are illustrated in Figure 4.24 and Table 4.2. From this figure, it is evident that the TwoVtPeM outperforms the other models with lower variance and higher R². The OnePeM concentration estimations are accurate for the nonmetals oxygen, chlorine, and sulfur; however, its estimations are not precise for the metals sodium, calcium, and magnesium, or for carbon. The TwoVtPeM is more accurate for all elements.
The TwoVtMoM is the least effective model, with the highest variance and lowest R². According to the model performance analysis, the TwoVtPeM technique achieved the best performance of the models tested (OnePeM, TwoVtPeM and TwoVtMoM), with OnePeM also performing well and TwoVtMoM falling short. The TwoVtPeM relative percentage errors were ±17.1% for oxygen, ±4.5% for sulfur, ±19.9% for sodium, ±5.7% for chlorine, ±19.8% for calcium, ±25.8% for magnesium, and ±20.1% for carbon. The R² was 0.95, which is higher than the 0.90 of the OnePeM and the 0.54 of the TwoVtMoM. The TwoVtPeM had a higher error mean than the OnePeM, but it exhibited lower relative standard deviations of estimation; the TwoVtPeM relative standard deviation values were 3.9% for oxygen, 3.0% for sulfur, 5.3% for sodium, 3.9% for magnesium, 5.3% for chlorine, 10.0% for calcium, and 5.9% for carbon. Moreover, 79.2% of water samples were correctly classified for hardness based on the element concentrations estimated by the TwoVtPeM. The OnePeM model correctly classified 67.2% of water samples, whereas the TwoVtMoM model achieved only a 60.2% accuracy rate in classifying water samples for hardness (Table 4.2). Although the OnePeM has the lowest relative error for oxygen, sodium, chlorine, calcium, and carbon, it exhibits larger relative standard deviations than the estimations of the TwoVtPeM, indicating that the OnePeM is less stable. The TwoVtPeM has the lowest standard deviation for all seven element concentration estimations, demonstrating greater stability than the OnePeM. This is attributed to the element EDS mapping estimation module in its first stage. The TwoVtPeM can utilize relative location information from other elements to construct the crystal deposition structure and infer the corresponding concentrations. The TwoVtMoM was expected to have the lowest relative error and highest R², but this was not the case.
This is due to the larger number of parameters in the TwoVtMoM model, which necessitates more training epochs and fine-tuning to effectively learn the features. To apply this method in water quality monitoring, further research is required to investigate the reasons for the TwoVtMoM's poor performance and explore methods to enhance it. Another necessary step is to develop a model that transfers from the element concentration estimation model based on water SEM fingerprints to one based on water photograph fingerprints. The rationale is that SEM images are more accurate than photographs, but SEM images are not available in households or in the field. The model built from water SEM fingerprints is only used for learning crystal features from water residue patterns, and this information is solely for constructing the element concentration estimation model from water photograph fingerprints. Thus, in the future, when the element concentration estimation model from water photograph fingerprints is developed, only water photograph fingerprints will be needed for element concentration estimation.

Figure 4.24: TwoVtPeM (row 1) produces lower variance than the one-stage point estimation model (row 2). OnePeM predicts lower element concentrations than their real values.

Table 4.2: Comparing Estimation Results of Model Element Concentrations.
Model      Oxygen     Sulfur     Sodium     Magnesium  Chlorine   Calcium    Carbon     R²
Relative Error (%)
OnePeM     ±5.2%      ±16.4%     ±5.2%      ±20.0%     ±10.7%     ±17.9%     ±3.2%      0.90
TwoVtPeM   ±17.1%     ±4.5%      ±19.8%     ±5.7%      ±19.7%     ±25.8%     ±20.1%     0.95
TwoVtMoM   ±35.5%     ±19.3%     ±30.2%     ±21.9%     ±11.8%     ±20.7%     ±33.3%     0.54
Relative Standard Deviation Error (%)
OnePeM     6.9%       19.7%      8.0%       27.9%      12.2%      24.6%      6.2%
TwoVtPeM   3.9%       3.0%       5.3%       3.9%       5.3%       10.0%      5.9%
TwoVtMoM   59.0%      31.0%      46.8%      42.3%      20.3%      39.9%      53.1%
Coefficient of Variation (%)
OnePeM     33.5%      20.7%      36.4%      34.7%      22.0%      30.2%      36.5%
TwoVtPeM   13.0%      22.4%      14.1%      25.0%      14.1%      20.5%      17.6%
TwoVtMoM   19.4%      18.1%      25.6%      26.4%      15.1%      20.7%      22.7%
Mean Absolute Percentage Error (%)
OnePeM     ±18.1%     ±33.2%     ±20.3%     ±37.3%     ±38.9%     ±27.1%     ±17.4%
TwoVtPeM   ±17.1%     ±13.3%     ±20.7%     ±25.9%     ±15.7%     ±19.8%     ±20.3%
TwoVtMoM   ±55.2%     ±42.2%     ±49.6%     ±49.2%     ±47.6%     ±36.8%     ±52.8%
Root Mean Square Error
OnePeM     0.52       0.44       0.18       0.39       0.45       0.79       0.17
TwoVtPeM   0.45       0.18       0.18       0.27       0.18       0.54       0.18
TwoVtMoM   1.57       0.60       0.51       0.54       0.60       0.40       1.09
Mean Square Error
OnePeM     0.27       0.19       0.04       0.16       0.20       0.62       0.03
TwoVtPeM   0.20       0.03       0.03       0.07       0.03       0.29       0.03
TwoVtMoM   2.47       0.36       0.26       0.29       0.37       1.19       0.27

4.5 Conclusion

Machine learning is transforming the way we approach water quality and public health. This study shows the potential of machine learning to revolutionize water quality monitoring. With the use of low-cost aluminum substrates, the overall cost of the experiment is significantly lower than traditional analytical methods, making this technique a cost-effective solution for water quality monitoring. The method is especially useful in rural areas and in the event of potential pollution incidents, where early detection is crucial. The findings of this study reveal that the TwoVtPeM technique achieved the best performance of the models tested (OnePeM, TwoVtPeM and TwoVtMoM), with OnePeM also performing well and TwoVtMoM falling short.
The TwoVtPeM relative percentage errors were ±17.1% for oxygen, ±4.5% for sulfur, ±19.9% for sodium, ±5.7% for chlorine, ±19.8% for calcium, ±25.8% for magnesium, and ±20.1% for carbon. The R² was 0.95, which is higher than the 0.90 of the OnePeM and the 0.54 of the TwoVtMoM. The TwoVtPeM had a higher error mean than the OnePeM, but it exhibited lower relative standard deviations of estimation; the TwoVtPeM relative standard deviation values were 3.9% for oxygen, 3.0% for sulfur, 5.3% for sodium, 3.9% for magnesium, 5.3% for chlorine, 10.0% for calcium, and 5.9% for carbon. Moreover, 79.2% of water samples were correctly classified for hardness based on the element concentrations estimated by the TwoVtPeM. The OnePeM model correctly classified 67.2% of water samples, whereas the TwoVtMoM model achieved only a 60.2% accuracy rate in classifying water samples for hardness.

Advances in camera technology and deep learning techniques hold great potential for improving the method's ability to detect low concentrations of elements. By using substrates with varying surface properties, such as roughness, wettability, and charge, different crystal formations can be produced and designed to monitor specific contaminants. The two-stage vision-transformer multiple output model produces a smaller variance, but the concentration estimation is not always accurate, requiring more fine-tuning and training epochs. To detect low concentrations of elements, water samples with lower concentrations need to be prepared and their coffee-ring effect residue patterns collected. Confirmation of the crystal structure can be obtained through Raman spectroscopy on the water sample residue. To analyze the performance of the one-stage point estimation model, the intermediate output of the seven element mappings can be compared with the predicted EDS mapping of the two-stage vision-transformer point estimation model.
This will provide insights into the strengths and weaknesses of each model, allowing for further improvements to be made. An additional avenue for improvement is the creation of a loss function that takes into account not only the pixel classes but also their structure. Contaminants often have distinct 3D lattice structures, and this information could be leveraged in the loss function. Additionally, incorporating domain knowledge from physical chemistry could also be beneficial. For instance, magnesium and calcium are unlikely to form crystals at the same location, but calcium and sulfur are likely to form calcium sulfate first because of its relatively low Ksp value compared to other crystals such as sodium chloride and calcium chloride.

In conclusion, this study highlights the potential of machine learning to revolutionize water quality monitoring. By improving the efficiency and effectiveness of water quality management systems, machine learning has the potential to lead to better health outcomes for individuals and communities. With continued advancements in technology and machine learning techniques, we can expect to see even more exciting developments in this field in the future.

CHAPTER 5

Implications

Machine learning is revolutionizing both water quality and public health. In the realm of water quality, machine learning is employed to create predictive models that shed light on the relationships between various water quality parameters and the impact of different factors. This results in the creation of early warning systems that can identify potential water quality problems, enabling proactive solutions. Machine learning also enables the analysis of large amounts of data and extraction of previously hidden insights, leading to a deeper understanding of water quality and new methods for managing this critical resource.
By automating certain tasks and simplifying data analysis processes, machine learning has the potential to enhance the efficiency and effectiveness of water quality management systems. In public health, machine learning algorithms are trained on medical images and patient records to diagnose diseases and predict future health outcomes. They are also utilized to analyze and forecast the spread of infectious diseases, providing crucial support to public health officials. Machine learning is integrated into environmental monitoring systems, providing real-time data analysis for environmental facilities and resulting in more informed management decisions. Additionally, machine learning algorithms can predict the risk of specific environmental issues, such as pollution events or habitat degradation, allowing for early interventions and preventive measures. Machine learning has the potential to significantly improve the efficiency and effectiveness of environmental initiatives, leading to better environmental outcomes for ecosystems and communities. The impact of machine learning on water quality and public health is substantial and has the potential to fundamentally change the way we approach and manage these critical resources. Through the use of advanced machine learning techniques, we can gain a deeper understanding of water quality, create new and innovative solutions for preserving this essential resource, and protect public health for future generations.

This study underscores the potential of machine learning to transform water quality monitoring. By enhancing the efficiency and effectiveness of water quality management systems, machine learning can be utilized for various image formats, including SEM, EDS, X-ray Powder Diffraction (XRD), Raman spectroscopy, images collected in rural areas, and even satellite data covering larger areas. Consequently, machine learning could potentially result in better health outcomes for individuals and communities.
As technology and machine learning techniques continue to advance, we can anticipate further groundbreaking developments in this field that will contribute to ensuring cleaner water and healthier environments for all. As a screening method, this research demonstrates the effectiveness of machine learning techniques in water quality monitoring. With improvements in camera technology, material science, and model design, such as the development of multimodal techniques incorporating local weather, groundwater information, pipe information, and environmental incidents, this approach shows great promise as a fast, low-cost, and accurate water quality monitoring technique.
Asphaltene adsorption using green nanocomposites: Experimental study and adaptive neuro-fuzzy interference system modeling. Journal of Petroleum Science and Engineering, 177:1103–1113, 2019. AK Maurya, M Nagamani, Seung Won Kang, Jong-Taek Yeom, Jae-Keun Hong, Hyokyung Sung, CH Park, Paturi Uma Maheshwera Reddy, and NS Reddy. Development of artificial neural networks software for arsenic adsorption from an aqueous environment. Environmental Research, 203:111846, 2022. Jingxin Liu, Zelin Xu, and Wenjuan Zhang. Unraveling the role of fe in as (iii & v) removal by biochar via machine learning exploration. Separation and Purification Technology, page 123245, 2023. M Ghaedi, N Zeinali, AM Ghaedi, M Teimuori, and J Tashkhourian. Artificial neural network-genetic algorithm based optimization for the adsorption of methylene blue and brilliant green from aqueous solution by graphite oxide nanoparticle. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 125:264–277, 2014. Kunrong Zeng, Kadda Hachem, Mariya Kuznetsova, Supat Chupradit, Chia-Hung Su, Hoang Chinh Nguyen, and AS El-Shafay. Molecular dynamic simulation and artificial intelligence of lead ions removal from aqueous solution using magnetic-ash-graphene oxide nanocomposite. Journal of Molecular Liquids, 347:118290, 2022. Yajun Wei, Jing Yu, Yonglin Du, Hongxu Li, and Chia-Hung Su. Artificial intelligence simulation of pb (ii) and cd (ii) adsorption using a novel metal organic framework-based nanocomposite adsorbent. Journal of Molecular Liquids, 343:117681, 2021. S Mandal, SS Mahapatra, and RK Patel. Neuro fuzzy approach for arsenic (iii) and 193 chromium (vi) removal from water. Journal of Water Process Engineering, 5:58–75, 2015a. S Mandal, SS Mahapatra, MK Sahu, and RK Patel. Artificial neural network modelling of as (iii) removal from water by novel hybrid material. Process Safety and Environmental Protection, 93:249–264, 2015b. S Mandal, SS Mahapatra, and RK Patel. 
Enhanced removal of cr (vi) by cerium oxide polyaniline composite: optimization and modeling approach using response surface methodology and artificial neural networks. Journal of Environmental Chemical Engineering, 3(2):870–885, 2015c. Selina Hube, Majid Eskafi, Kolbrún Fríða Hrafnkelsdóttir, Björg Bjarnadóttir, Margrét Ásta Bjarnadóttir, Snærós Axelsdóttir, and Bing Wu. Direct membrane filtration for wastewater treatment and resource recovery: A review. Science of the total environment, 710:136375, 2020. Wouter Pronk, An Ding, Eberhard Morgenroth, Nicolas Derlon, Peter Desmond, Michael Burkhardt, Bing Wu, and Anthony G Fane. Gravity-driven membrane filtration for water and wastewater treatment: a review. Water research, 149:553–565, 2019. Mohamed Zoubeik, Amgad Salama, and Amr Henni. A comprehensive experimental and artificial network investigation of the performance of an ultrafiltration titanium dioxide ceramic membrane: application in produced water treatment. Water and Environment Journal, 33(3):459–475, 2019. Masoud Fetanat, Mohammadali Keshtiara, Ze-Xian Low, Ramazan Keyikoglu, Alireza Khataee, Yasin Orooji, Vicki Chen, Gregory Leslie, and Amir Razmjou. Machine learning for advanced design of nanocomposite ultrafiltration membranes. Industrial & Engineering Chemistry Research, 60(14):5236–5250, 2021. Hammad Khan, Saad Ullah Khan, Sajjad Hussain, and Asmat Ullah. Modelling of transmembrane pressure using slot/pore blocking model, response surface and artificial intelligence approach. Chemosphere, 290:133313, 2022. Zakariah Yusof, Norhaliza Abdul Wahab, Syahira Ibrahim, Shafishuhaza Sahlan, and Mashitah Che Razali. Modeling of submerged membrane filtration processes using recurrent artificial neural networks. IAES International Journal of Artificial Intelligence, 9(1):155, 2020. Sara Nazif, Emad Mirashrafi, Bardia Roghani, and Gholamreza Nabi Bidhendi. Artificial intelligence–based optimization of reverse osmosis systems operation performance. 
Journal of Environmental Engineering, 146(2):04019106, 2020. Jaegyu Shim, Sanghun Park, and Kyung Hwa Cho. Deep learning model for simulating influence of natural organic matter in nanofiltration. Water Research, 197:117070, 2021. Yamina Ammi, Salah Hanini, and Latifa Khaouane. An artificial intelligence approach for modeling the rejection of anti-inflammatory drugs by nanofiltration and reverse osmosis membranes using kernel support vector machine and neural networks. Comptes Rendus. 194 Chimie, 24(2):243–254, 2021a. V Yangali-Quintanilla, A Verliefde, T-U Kim, A Sadmani, M Kennedy, and G Amy. Artificial neural network models based on qsar for predicting rejection of neutral organic compounds by polyamide nanofiltration and reverse osmosis membranes. Journal of membrane science, 342(1-2):251–262, 2009. Mohamed Zoubeik, Mohamed Echakouri, Amr Henni, and Amgad Salama. Taguchi optimization of operating conditions of a microfiltration alumina ceramic membrane and artificial neural-network modeling. Journal of Environmental Engineering, 148(4): 04022001, 2022. Samira Arefi-Oskoui, Alireza Khataee, and Vahid Vatanpour. Modeling and optimization of nldh/pvdf ultrafiltration nanocomposite membrane using artificial neural network- genetic algorithm hybrid. ACS Combinatorial Science, 19(7):464–477, 2017. Nurazizah Mahmod, Norhaliza Abdul Wahab, and Muhammad Sani Gaya. Modelling and control of fouling in submerged membrane bioreactor using neural network internal model control. IAES International Journal of Artificial Intelligence, 9(1):100, 2020. Çağla Odabaşı, Pelin Dologlu, Fatih Gülmez, Gizem Kuşoğlu, and Ömer Çağlar. Investigation of the factors affecting reverse osmosis membrane performance using machine-learning techniques. Computers & Chemical Engineering, 159:107669, 2022. Chen Wang, Li Wang, Allan Soo, Nirenkumar Bansidhar Pathak, and Ho Kyong Shon. 
Machine learning based prediction and optimization of thin film nanocomposite membranes for organic solvent nanofiltration. Separation and Purification Technology, 304:122328, 2023. Sung Ju Im, Viet Duc Nguyen, and Am Jang. Prediction of forward osmosis membrane engineering factors using artificial intelligence approach. Journal of Environmental Management, 318:115544, 2022. Yamina Ammi, Latifa Khaouane, and Salah Hanini. Stacked neural networks for predicting the membranes performance by treating the pharmaceutical active compounds. Neural Computing and Applications, pages 1–16, 2021b. Latifa Khaouane, Yamina Ammi, and Salah Hanini. Modeling the retention of organic compounds by nanofiltration and reverse osmosis membranes using bootstrap aggregated neural networks. Arabian Journal for Science and Engineering, 42:1443–1453, 2017. Yamina Ammi, Latifa Khaouane, and Salah Hanini. Prediction of the rejection of organic compounds (neutral and ionic) by nanofiltration and reverse osmosis membranes using neural networks. Korean Journal of Chemical Engineering, 32:2300–2310, 2015. Yamina Ammi, Latifa Khaouane, and Salah Hanini. A model based on bootstrapped neural networks for modeling the removal of organic compounds by nanofiltration and reverse osmosis membranes. Arabian Journal for Science and Engineering, 43:6271–6284, 2018. 195 Sangsuk Lee and Jooho Kim. Prediction of nanofiltration and reverse-osmosis-membrane rejection of organic compounds using random forest model. Journal of Environmental Engineering, 146(11):04020127, 2020. Robert D Deegan, Olgica Bakajin, Todd F Dupont, Greb Huber, Sidney R Nagel, and Thomas A Witten. Capillary flow as the cause of ring stains from dried liquid drops. Nature, 389(6653):827–829, 1997. Yanan Li, Qiang Yang, Mingzhu Li, and Yanlin Song. Rate-dependent interface capture beyond the coffee-ring effect. Scientific reports, 6(1):1–8, 2016c. Noushine Shahidzadeh, Marthe FL Schut, Julie Desarnaud, Marc Prat, and Daniel Bonn. 
Salt stains from evaporating droplets. Scientific reports, 5(1):1–9, 2015. Yasunari Matsuzaka and Yoshihiro Uesawa. Optimization of a deep-learning method based on the classification of images generated by parameterized deep snap a novel molecular-image-input technique for quantitative structure–activity relationship (qsar) analysis. Frontiers in bioengineering and biotechnology, page 65, 2019. Jesse G Meyer, Shengchao Liu, Ian J Miller, Joshua J Coon, and Anthony Gitter. Learning drug functions from chemical structures with convolutional neural networks and random forests. Journal of chemical information and modeling, 59(10):4438–4449, 2019. Félix Lussier, Dimitris Missirlis, Joachim P Spatz, and Jean-François Masson. Machine-learning-driven surface-enhanced raman scattering optophysiology reveals multiplexed metabolite gradients near cells. ACS nano, 13(2):1403–1411, 2019. William John Thrift and Regina Ragan. Quantification of analyte concentration in the single molecule regime using convolutional neural networks. Analytical chemistry, 91(21): 13337–13342, 2019. Ling Liu and M Tamer Özsu. Encyclopedia of database systems, volume 6. Springer, 2009. Dongmao Zhang, Yong Xie, Melissa F Mrozek, Corasi Ortiz, V Jo Davisson, and Dor Ben-Amotz. Raman detection of proteomic analytes. Analytical chemistry, 75(21): 5703– 5709, 2003. Corasi Ortiz, Dongmao Zhang, Yong Xie, V Jo Davisson, and Dor Ben-Amotz. Identification of insulin variants using raman spectroscopy. Analytical Biochemistry, 332(2):245–252, 2004. Viral H Chhasatia, Abhijit S Joshi, and Ying Sun. Effect of relative humidity on contact angle and particle deposition morphology of an evaporating colloidal drop. Applied Physics Letters, 97(23):231909, 2010. Corasi Ortiz, Dongmao Zhang, Yong Xie, Alexander E Ribbe, and Dor Ben-Amotz. Validation of the drop coating deposition raman method for protein analysis. Analytical biochemistry, 353(2):157–166, 2006. 196 Vijayakumar Kadappa and Atul Negi. 
A theoretical investigation of feature partitioning principal component analysis methods. Pattern Analysis and Applications, 19(1):79–91, 2016. Mark M Benjamin. Water chemistry. Waveland Press, 2014. William M Haynes, David R Lide, and Thomas J Bruno. CRC handbook of chemistry and physics. CRC press, 2016. Enric Junqué de Fortuny, David Martens, and Foster Provost. Predictive modeling with big data: is bigger really better? Big data, 1(4):215–226, 2013. David Martens, Foster Provost, Jessica Clark, and Enric Junqué de Fortuny. Mining massive fine-grained behavior data to improve predictive analytics. MIS quarterly, 40(4):869–888, 2016. Nathalie Japkowicz and Shaju Stephen. The class imbalance problem: A systematic study. Intelligent data analysis, 6(5):429–449, 2002. Bartosz Krawczyk. Learning from imbalanced data: open challenges and future directions. Progress in Artificial Intelligence, 5(4):221–232, 2016. Steven J Burian, Stephan J Nix, Robert E Pitt, and S Rocky Durrans. Urban wastewater management in the united states: Past, present, and future. Journal of Urban Technology, 7(3):33–62, 2000. Martin R Coghill, Gary A Eaton, and Nathan D Faber. A long-term commitment to pipeline infrastructure: Implementing, funding, and delivering the san diego county water authority’s asset management program. In Pipelines 2014: From Underground to the Forefront of Innovation and Sustainability, pages 1187–1197. 2014. Steven Folkman. Water main break rates in the usa and canada: A comprehensive study. 2018. PE Darlene Garcia and PE Susan Funchion. How to select and prioritize water main replacement. Opflow, 41(10):10–14, 2015. David A Cornwell, Richard A Brown, and Steve H Via. National survey of lead service line occurrence. Journal-American Water Works Association, 108(4):E182–E191, 2016. Orazio Giustolisi and Luigi Berardi. Prioritizing pipe replacement: From multiobjective genetic algorithms to operational decision support. 
Journal of Water Resources Planning and Management, 135(6):484–492, 2009. Peter D Rogers and Neil S Grigg. Failure assessment model to prioritize pipe replacement in water utility asset management. In Water Distribution Systems Analysis Symposium 2006, pages 1–17, 2008. 197 Y Tlili and A Nafi. A practical decision scheme for the prioritization of water pipe replacement. Water Science and Technology: Water Supply, 12(6):895–917, 2012. Go Bong Choi, Jong Woo Kim, Jung Chul Suh, Kwang Ho Jang, and Jong Min Lee. A prioritization method for replacement of water mains using rank aggregation. Korean Journal of Chemical Engineering, 34(10):2584–2590, 2017. Mohamed Marzouk, Said Abdel Hamid, and Moheeb El-Said. A methodology for prioritizing water mains rehabilitation in egypt. HBRC Journal, 11(1):114–128, 2015. Cheng-I Ho, Min-Der Lin, and Shang-Lien Lo. Prioritizing pipe replacement in a water distribution system using a seismic-based artificial neural network model. Environmental engineering science, 26(4):745–752, 2009. Gregory J Kirmeyer. Guidance manual for monitoring distribution system water quality. American Water Works Association, 2002. Mohsin J Qazi, Rinse W Liefferink, Simon J Schlegel, Ellen HG Backus, Daniel Bonn, and Noushine Shahidzadeh. Influence of surfactants on sodium chloride crystallization in confinement. Langmuir, 33(17):4260–4268, 2017. Xiaoxiao Wei, Jian Yang, Zhiyong Li, Yunlan Su, and Dujin Wang. Comparison investigation of the effects of ionic surfactants on the crystallization behavior of calcium oxalate: From cationic to anionic surfactant. Colloids and Surfaces A: Physicochemical and Engineering Aspects, 401:107–115, 2012. Maria Sammalkorpi, Mikko Karttunen, and Mikko Haataja. Ionic surfactant aggregates in saline solutions: sodium dodecyl sulfate (sds) in the presence of excess sodium chloride (nacl) or calcium chloride (cacl2). The Journal of Physical Chemistry B, 113(17): 5863– 5870, 2009. 
Julie Desarnaud, Hannelore Derluyn, Jan Carmeliet, Daniel Bonn, and Noushine Shahidzadeh. Metastability limit for the nucleation of nacl crystals in confinement. The journal of physical chemistry letters, 5(5):890–895, 2014. Fiona C Meldrum and Cedrick O’Shaughnessy. Crystallization in confinement. Advanced Materials, 32(31):2001068, 2020. Xin Zhong, Alexandru Crivoi, and Fei Duan. Sessile nanofluid droplet drying. Advances in colloid and interface science, 217:13–30, 2015. Huicheng Feng, Karen Siew-Ling Chong, Kian-Soo Ong, and Fei Duan. Octagon to square wetting area transition of water–ethanol droplets on a micropyramid substrate by increasing ethanol concentration. Langmuir, 33(5):1147–1154, 2017. Xin Zhong and Fei Duan. Flow regime and deposition pattern of evaporating binary mixture droplet suspended with particles. The European Physical Journal E, 39(2):1–6, 2016. 198 Manos Anyfantakis, Zheng Geng, Mathieu Morel, Sergii Rudiuk, and Damien Baigl. Modulation of the coffee-ring effect in particle/surfactant mixtures: the importance of particle–interface interactions. Langmuir, 31(14):4113–4120, 2015. Bo Zhang, Xuemei Chen, Jure Dobnikar, Zuankai Wang, and Xianren Zhang. Spontaneous wenzel to cassie dewetting transition on structured surfaces. Physical review fluids, 1(7): 073904, 2016. Chenglong Xu, Shuhua Peng, Greg Qiao, and Xuehua Zhang. Effects of the molecular structure of a self-assembled monolayer on the formation and morphology of surface nanodroplets. Langmuir, 32(43):11197–11202, 2016. Leila Bahmani, Mahdi Neysari, and Maniya Maleki. The study of drying and pattern formation of whole human blood drops and the effect of thalassaemia and neonatal jaundice on the patterns. Colloids and Surfaces A: Physicochemical and Engineering Aspects, 513: 66–75, 2017. Hau Him Lee, Sau Chung Fu, Chi Yan Tso, and Christopher YH Chao. Study of residue patterns of aqueous nanofluid droplets with different particle sizes and concentrations on different substrates. 
International Journal of Heat and Mass Transfer, 105:230–236, 2017. Nainsi Saxena, Tapaswinee Naik, and Santanu Paria. Organization of sio2 and tio2 nanoparticles into fractal patterns on glass surface for the generation of superhydrophilicity. The Journal of Physical Chemistry C, 121(4):2428–2436, 2017. Hui Li, Hao Luo, Zhen Zhang, Yongjun Li, Bin Xiong, Chunyan Qiao, Xuan Cao, Tie Wang, Yan He, and Guangyin Jing. Direct observation of nanoparticle multiple-ring pattern formation during droplet evaporation with dark-field microscopy. Physical Chemistry Chemical Physics, 18(18):13018–13025, 2016d. Xuemei Chen, Ruiyuan Ma, Jintao Li, Chonglei Hao, Wei Guo, Bing Lam Luk, Shuai Cheng Li, Shuhuai Yao, and Zuankai Wang. Evaporation of droplets on superhydrophobic surfaces: Surface roughness and small droplet size effects. Physical review letters, 109 (11):116101, 2012. Niranjan A Malvadkar, Matthew J Hancock, Koray Sekeroglu, Walter J Dressick, and Melik C Demirel. An engineered anisotropic nanofilm with unidirectional wetting properties. Nature materials, 9(12):1023–1028, 2010. Eleanor R Townsend, Willem JP van Enckevort, Jan AM Meijer, and Elias Vlieg. Additive enhanced creeping of sodium chloride crystals. Crystal Growth & Design, 17(6):3107–3115, 2017. Subra Suresh. Colloid model for atoms. Nature materials, 5(4):253–254, 2006. William G Walter. Standard methods for the examination of water and wastewater, 1961. 199 Carlos Rodriguez-Navarro and Eric Doehne. Salt weathering: influence of evaporation rate, supersaturation and crystallization pattern. Earth Surface Processes and Landforms: The Journal of the British Geomorphological Research Group, 24(3):191–209, 1999. Alvaro G Marin, Hanneke Gelderblom, Detlef Lohse, and Jacco H Snoeijer. Order-to-disorder transition in ring-shaped colloidal stains. Physical review letters, 107(8):085502, 2011. Marti J Anderson. Permutational multivariate analysis of variance (permanova). 
Wiley statsref: statistics reference online, pages 1–15, 2014. Ralph G O’Brien and Mary K Kaiser. Manova method for analyzing repeated measures designs: an extensive primer. Psychological bulletin, 97(2):316, 1985. Adery CA Hope. A simplified monte carlo significance test procedure. Journal of the Royal Statistical Society: Series B (Methodological), 30(3):582–598, 1968. Frank Nielsen. On a variational definition for the jensen-shannon symmetrization of distances based on the information radius. Entropy, 23(4):464, 2021. Christopher Manning and Hinrich Schutze. Foundations of statistical natural language processing. MIT press, 1999. Ido Dagan, Lillian Lee, and Fernando Pereira. Similarity-based methods for word sense disambiguation. arXiv preprint cmp-lg/9708010, 1997. Dominik Maria Endres and Johannes E Schindelin. A new metric for probability distributions. IEEE Transactions on Information theory, 49(7):1858–1860, 2003. Ferdinand Osterreicher and Igor Vajda. A new class of metric divergences on probability spaces and its applicability in statistics. Annals of the Institute of Statistical Mathematics, 55(3):639–653, 2003. Bent Fuglede and Flemming Topsoe. Jensen-shannon divergence and hilbert space embedding. In International Symposium onInformation Theory, 2004. ISIT 2004. Proceedings., page 31. IEEE, 2004. Joseph B Kruskal and Myron Wish. Multidimensional scaling. Sage, 1978. Andreas Buja, Deborah F Swayne, Michael L Littman, Nathaniel Dean, Heike Hofmann, and Lisha Chen. Data visualization with multidimensional scaling. Journal of computational and graphical statistics, 17(2):444–472, 2008. Alisha Faherty. Tapped out: How newark, new jersey’s lead drinking water crisis illuminates the inadequacy of the federal drinking water regulatory scheme and fuels environmental injustice throughout the nation. Environmental Claims Journal, 33(4):304– 327, 2021. Avraham Ebenstein. 
The consequences of industrialization: evidence from water pollution and digestive cancers in china. Review of Economics and Statistics, 94(1):186–201, 2012. 200 Guilherme Lages Barbosa, Francisca Daiane Almeida Gadelha, Natalya Kublik, Alan Proctor, Lucas Reichelm, Emily Weissinger, Gregory M Wohlleb, and Rolf U Halden. Comparison of land, water, and energy requirements of lettuce grown using hydroponic vs. conventional agricultural methods. International journal of environmental research and public health, 12(6):6879–6891, 2015. Lin Yang, Yuantao Yang, Haodong Lv, Dong Wang, Yiming Li, and Weijun He. Water usage for energy production and supply in china: Decoupled from industrial growth? Science of the Total Environment, 719:137278, 2020. Chi Thanh Vu and Tingting Wu. Recent progress in adsorptive removal of per-and poly-fluoroalkyl substances (pfas) from water/wastewater. Critical Reviews in Environmental Science and Technology, 52(1):90–129, 2022. Aditi Podder, AHM Anwar Sadmani, Debra Reinhart, Ni-Bin Chang, and Ramesh Goel. Per and poly-fluoroalkyl substances (pfas) as a contaminant of emerging concern in surface water: a transboundary review of their occurrences and toxicity effects. Journal of hazardous materials, 419:126361, 2021. Gulzar Alam, Ihsanullah Ihsanullah, Mu Naushad, and Mika Sillanpää. Applications of artificial intelligence in water treatment for optimization and automation of adsorption processes: Recent advances and prospects. Chemical Engineering Journal, 427:130011, 2022. Nawal Taoufik, Wafaa Boumya, Mounia Achak, Hamid Chennouk, Raf Dewil, and Noureddine Barka. The state of art on the prediction of efficiency and modeling of the processes of pollutants removal based on machine learning. Science of The Total Environment, 807:150554, 2022. Ariya Gordanshekan, Shakiba Arabian, Ali Reza Solaimany Nazar, Mehrdad Farhadian, and Shahram Tangestaninejad. 
A comprehensive comparison of green bi2wo6/g-c3n4 and bi2wo6/tio2 s-scheme heterojunctions for photocatalytic adsorption/degradation of cefixime: Artificial neural network, degradation pathway, and toxicity estimation. Chemical Engineering Journal, 451:139067, 2023. Yifan Xie, Yongqi Chen, Qing Lian, Hailong Yin, Jian Peng, Meng Sheng, and Yimeng Wang. Enhancing real-time prediction of effluent water quality of wastewater treatment plant based on improved feedforward neural network coupled with optimization algorithm. Water, 14(7):1053, 2022. Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In 2016 fourth international conference on 3D vision (3DV), pages 565–571. Ieee, 2016. Afra Zomorodian and Gunnar Carlsson. Computing persistent homology. In Proceedings of the twentieth annual symposium on Computational geometry, pages 347–356, 2004. Tamal K Dey, Facundo Mémoli, and Yusu Wang. Topological analysis of nerves, reeb 201 spaces, mappers, and multiscale mappers. arXiv preprint arXiv:1703.07387, 2017. Gunnar Carlsson. Topological methods for data modelling. Nature Reviews Physics, 2(12): 697–708, 2020. Ross Geoghegan. Topological methods in group theory, volume 243. Springer Science & Business Media, 2007. 202 APPENDIX Table 6.1: Measured water chemistry data from tap water samples collected across Michigan and treatment information from annual municipal water quality reports and system operators. Averages and standard deviations are listed for values conducted in replicate. 
City                   F− (mM)   NO3− (mM)   Zn (mM)      TOC (ppm)
MSU - academic hall    0.04      BD          2.3×10^-3    3.1
Durand                 0.03      0.02        BD           1.3
Kalamazoo              0.04      0.03        BD           BD
Portland               0.03      BD          BD           2.1
Battle Creek site A    0.05      BD          3.9×10^-3    0.79
Battle Creek site B    0.05      0.02        BD           1.2
Charlotte              0.02      0.01        1.1×10^-4    1.4
Fowlerville            0.04      BD          BD           1.4
Lansing site A         0.01      0.01        1.2×10^-4    1.5
Lansing site B         0.03      0.01        9.1×10^-4    1.5
East Lansing           0.02      0.03        6.3×10^-4    1.3
Howell                 0.03      BD          1.5×10^-4    BD
MSU - residence hall   0.05      BD          4.2×10^-4    3.2
Williamston            0.03      BD          3.5×10^-4    2.2
Genoa Twp soft         BD        BD          1.7×10^-4    2.2
Genoa Twp              BD        BD          1.1×10^-4    2.0
Rest stop Okemos       0.03      BD          BD           1.1
Rest stop Zeeland      0.04      BD          2.8×10^-3    1.0
Rest stop I96/M66      0.03      BD          4.9×10^-4    3.3
Rest stop Fenton       0.06      0.02        1.0×10^-3    1.0
Allegan                0.03      BD          5.5×10^-4    BD
Genoa Twp              BD        BD          1.2×10^-4    BD
Detroit                0.03      0.07        3.2×10^-3    1.6
Flint                  0.04      0.03        7.5×10^-4    BD
Swartz Creek           0.03      0.03        1.2×10^-4    BD
Grand Rapids           0.03      0.03        2.3×10^-4    BD
Holland                0.04      0.03        8.9×10^-4    BD
Wyoming                0.03      0.03        BD           BD

Table 6.2: Composition of synthetic tap water solutions.

Chemicals (mM)   Detroit   Lansing   MSU hard water
NaHCO3           0.23      0.50      0.55
Na2SO4           -         1.20      -
MgCl2(H2O)6      0.25      0.53      0.40
MgSO4(H2O)7      0.10      -         0.80
MgCO3            -         -         0.50
CaCl2            -         0.56      -
CaSO4            0.16      -         -
CaCO3            0.50      -         2.60
KCl              -         0.100     0.027
KH2PO4           0.0152    0.0100    0.0113
NaNO3            0.0725    0.0140    -
KF(H2O)2         0.0325    0.0270    0.0430
FeCl3            0.0016    -         0.0190
CuCl2(H2O)2      0.0006    0.0005    0.0020

Table 6.3: Examples of raw and pre-processed images used for the convolutional neural network (CNN) model.
[Image table. Columns (water sample): Genoa Township private well untreated; Detroit; Genoa Township well RO; Howell; Williamston. Rows: raw image; pre-processed image.]

Table 6.4: Five replicates of each freshly collected water sample (stored less than one week). The lab temperature was 24-25 °C and relative humidity 52% for this experiment.
[Image table. Samples by treatment group:]
MINIMALLY TREATED GROUNDWATER: MSU academic hall; Durand; Kalamazoo; Portland; Battle Creek Site A; Battle Creek Site B; Charlotte; Fowlerville
LIME SOFTENED: Lansing Site A; Lansing Site B; East Lansing; Howell
ION EXCHANGE: MSU residence hall; Williamston; Genoa Township private well softened
UNTREATED GROUNDWATER: Genoa Township private well untreated; Rest stop A - Okemos; Rest stop C - Zeeland; Rest stop D - M66/I96 East
REVERSE OSMOSIS: Allegan; Genoa Township private well RO
SURFACE WATER: Detroit; Flint; Swartz Creek; Grand Rapids; Holland; Wyoming

Table 6.5: Consistency of tap water residue patterns on different mirrored aluminum slides prepared by different researchers, with nanopure water and synthetic hard freshwater controls. The lab temperature was 24 °C and relative humidity 47%.

Analysts: Analyst 1, least experienced (1 week); Analyst 2, moderate experience (0.5 month); Analyst 3, experienced (1 year).
[Image table. For each sample and slide, rows are: Replicate 1; Replicate 2; Replicate 3; Blank; Synthetic.]

Sample              Analyst 1 slides   Analyst 2 slides   Analyst 3 slides
MSU academic hall   7, 8, 9            7, 8, 9            1, 2, 3
East Lansing        1, 2, 3            1, 2, NA           1, 2, 3
Rest Stop (M66)     4, 5, 6            4, 5, 6            1, 2, 3
Detroit             1, 2, 3            1, 2, NA           1, 2, 3
Grand Rapids        4, 5, 6            4, 5, 6            1, 2, 3

Table 6.6: Nanochromatography patterns of Michigan tap waters (stored for two months at 4 °C) dried on slides cut from the same sheet of aluminum. Nanopure water and synthetic hard water served as controls. The lab temperature was 24 °C and relative humidity was 47-48% for this experiment.
[Image table. Samples by treatment group:]
Minimally treated groundwater: MSU academic building; nanopure (control); synthetic (control); Durand; Kalamazoo; Portland; Battle Creek Site A; Battle Creek Site B; Fowlerville; Charlotte
Lime softened: Lansing Site A; Lansing Site B; Howell; East Lansing
Ion exchange: MSU residence hall; Williamston; Genoa Township private well softened
Untreated groundwater: Genoa Township well untreated; Rest stop A Okemos; Rest stop D M66/I96 East; Rest stop C Zeeland; Rest stop B Fenton
Reverse osmosis: Allegan; Genoa Township well RO
Surface waters: Detroit; Flint; Grand Rapids; Wyoming; Swartz Creek; Holland

Table 6.7: Temperature and humidity effect on residue pattern for four salt mixtures.
[Image table. Columns (salt mixtures):
1. 0.5 mM CaSO4, 0.25 mM MgSO4, 10 mM NaCl
2. 0.5 mM CaSO4, 0.25 mM MgSO4, 5.0 mM Na2SO4
3. 0.5 mM CaCl2, 0.25 mM MgCl2, 10 mM NaHCO3
4. 3.0 mM CaCl2, 1.5 mM MgCl2, 10 mM NaHCO3]

Temperature and relative humidity   Drying time (min)
24 °C, <20% RH                      20
24 °C, 46-48% RH                    25

Table 6.8: Residue patterns of synthetic tap water solutions compared to real tap water at 24 °C and relative humidity of 47%.
[Image table. Columns: collected tap water sample; simplified synthetic (calcium, magnesium, sodium, chloride, sulfate, bicarbonate); complex synthetic (simplified synthetic water plus iron, copper, nitrate, fluoride, phosphate). Rows: MSU; Detroit; Lansing.]

Table 6.9: Simple synthetic mixtures on separate slides analyzed at 24 °C and 48% relative humidity. The low concentration mixtures that are not the same as the previous table are indicated by bold font.
[Image table. Columns: NaCl 10 mM; NaHCO3 5.0 mM; NaHCO3 10 mM; NaCl 5.0 mM. Rows: 3 mM CaCl2 + 1.5 mM MgCl2; 1 mM CaCl2 + 0.5 mM MgCl2; 0.1 mM CaCl2 + 0.05 mM MgCl2.]

Table 6.10: Images with mis-classification percentage over 70%.
Image is different from other replicates: MSU residence hall; MSU residence hall; Portland; Portland; Lansing site B
Reason not clear / Image in class two: Genoa Township private well untreated; Genoa Township private well untreated; Battle Creek site B; Genoa Township private well softened; Genoa Township private well softened

Figure 6.1: The experimental procedure includes depositing two microliter droplets of an aqueous solution onto an aluminum substrate and allowing it to dry without movement.

Figure 6.2: Image analysis pipeline in MATLAB and Python.

Figure 6.3: A schematic of the convolutional neural network (CNN) model.

Figure 6.4: PCA on the nanochromatography image files for simplified synthetic waters (five replicates of twelve mixtures of salts).

Figure 6.5: Trilinear classification of tap water samples organized by treatment technology.

Figure 6.6: Test dataset accuracies by class.

Figure 6.7: Autosampler for coffee-ring effect nanochromatography experiment.

Figure 6.8: Autosampler for coffee-ring effect nanochromatography experiment.

Figure 6.9: Autosampler for coffee-ring effect nanochromatography experiment.

Figure 6.10: Autosampler for coffee-ring effect nanochromatography experiment.

Figure 6.11: Autosampler for coffee-ring effect nanochromatography experiment.

Figure 6.12: Autosampler for coffee-ring effect nanochromatography experiment.

Figure 6.13: Temperature humidity control chamber.

Figure 6.14: Trilinear plot for water samples.
Table 6.11: PERMANOVA clustering result.

Experiment condition (temperature, relative humidity)
Temperature (columns): 20-23 °C; 23-26 °C; 26-29 °C
Relative humidity (rows): 35%-40%; 40%-45%; 45%-50%

Table 6.12: ANOSIM of particle CRE residue features.

Temperature (°C), relative humidity    Bar plots (p-value)
20-23 °C, 35%-40%
20-23 °C, 40%-45%
20-23 °C, 45%-50%
23-26 °C, 35%-40%
23-26 °C, 40%-45%
23-26 °C, 45%-50%
26-29 °C, 35%-40%
26-29 °C, 40%-45%
26-29 °C, 45%-50%

Table 6.13: ANOSIM of CRE residue pattern area. Images are arranged in two orientations: from left to right across the top row, numbered 1 to 25, and from top to bottom along the left column, also numbered 1 to 25.

Temperature & RH
Temperature (columns): 20-23 °C; 23-26 °C; 26-29 °C
Relative humidity (rows): 35%-40%; 40%-45%; 45%-50%
Color bar

Table 6.14: CMDS of CRE residue pattern area. Red circle represents water sample A; green circle represents water sample B; blue circle represents water sample C; yellow circle represents water sample D; purple circle represents water sample E. The three axes are labeled as Dimension 1, Dimension 2, and Dimension 3.

Temperature (°C), relative humidity (p-value)
Temperature (columns): 20-23 °C; 23-26 °C; 26-29 °C
Relative humidity (rows): 35%-40%; 40%-45%; 45%-50%
Color bar

Table 6.15: ANOSIM of CRE residue pattern perimeter. Images are arranged in two orientations: from left to right across the top row, numbered 1 to 25, and from top to bottom along the left column, also numbered 1 to 25.

Temperature & RH
Temperature (columns): 20-23 °C; 23-26 °C; 26-29 °C
Relative humidity (rows): 35%-40%; 40%-45%; 45%-50%
Color bar

Table 6.16: CMDS of CRE residue pattern perimeter. Red circle represents water sample A; green circle represents water sample B; blue circle represents water sample C; yellow circle represents water sample D; purple circle represents water sample E. The three axes are labeled as Dimension 1, Dimension 2, and Dimension 3.
Temperature (°C), relative humidity (p-value)
Temperature (columns): 20-23 °C; 23-26 °C; 26-29 °C
Relative humidity (rows): 35%-40%; 40%-45%; 45%-50%
Color bar

Table 6.17: ANOSIM of CRE residue pattern centroid. Images are arranged in two orientations: from left to right across the top row, numbered 1 to 25, and from top to bottom along the left column, also numbered 1 to 25.

Temperature & RH
Temperature (columns): 20-23 °C; 23-26 °C; 26-29 °C
Relative humidity (rows): 35%-40%; 40%-45%; 45%-50%
Color bar

Table 6.18: CMDS of CRE residue pattern centroid. Red circle represents water sample A; green circle represents water sample B; blue circle represents water sample C; yellow circle represents water sample D; purple circle represents water sample E. The three axes are labeled as Dimension 1, Dimension 2, and Dimension 3.

Temperature (°C), relative humidity (p-value)
Temperature (columns): 20-23 °C; 23-26 °C; 26-29 °C
Relative humidity (rows): 35%-40%; 40%-45%; 45%-50%
Color bar

Table 6.19: ANOSIM of CRE residue pattern eccentricity. Images are arranged in two orientations: from left to right across the top row, numbered 1 to 25, and from top to bottom along the left column, also numbered 1 to 25.

Temperature & RH
Temperature (columns): 20-23 °C; 23-26 °C; 26-29 °C
Relative humidity (rows): 35%-40%; 40%-45%; 45%-50%
Color bar

Table 6.20: CMDS of CRE residue pattern eccentricity. Red circle represents water sample A; green circle represents water sample B; blue circle represents water sample C; yellow circle represents water sample D; purple circle represents water sample E. The three axes are labeled as Dimension 1, Dimension 2, and Dimension 3.

Temperature (°C), relative humidity (p-value)
Temperature (columns): 20-23 °C; 23-26 °C; 26-29 °C
Relative humidity (rows): 35%-40%; 40%-45%; 45%-50%
Color bar

Table 6.21: Two-way ANOVA for Carbon, Chlorine and Sulfur elements

Condition A      Df             Sum Sq          Mean Sq        F value   Pr(>F)          sig.
Class            4              4.05 × 10^6     1.01 × 10^6    151       < 2 × 10^-16    ***
Element          2              7.72 × 10^7     3.86 × 10^7    5751      < 2 × 10^-16    ***
Class:Element    8              1.42 × 10^7     1.78 × 10^6    256.8     < 2 × 10^-16
Residuals        1.18 × 10^7    7.93 × 10^10    6713

Condition B      Df             Sum Sq          Mean Sq        F value   Pr(>F)          sig.
Class            4              6.42 × 10^6     1.6 × 10^6     245.8     < 2 × 10^-16    ***
Element          2              4.64 × 10^7     2.32 × 10^7    3546      < 2 × 10^-16    ***
Class:Element    8              1.25 × 10^7     1.56 × 10^6    239.7     < 2 × 10^-16
Residuals        1.24 × 10^7    8.12 × 10^10    6537

Condition C      Df             Sum Sq          Mean Sq        F value   Pr(>F)          sig.
Class            4              1.25 × 10^7     3.13 × 10^6    467.1     < 2 × 10^-16    ***
Element          2              7.58 × 10^7     3.79 × 10^7    5645      < 2 × 10^-16    ***
Class:Element    8              2.34 × 10^7     2.92 × 10^6    434.8     < 2 × 10^-16
Residuals        1.17 × 10^7    7.83 × 10^10    6714

Condition D      Df             Sum Sq          Mean Sq        F value   Pr(>F)          sig.
Class            4              1.18 × 10^7     2.96 × 10^6    442.8     < 2 × 10^-16    ***
Element          2              1.16 × 10^8     5.80 × 10^7    8677      < 2 × 10^-16    ***
Class:Element    8              3.17 × 10^7     3.96 × 10^6    592.2     < 2 × 10^-16
Residuals        1.19 × 10^7    7.92 × 10^10    6686

Condition E      Df             Sum Sq          Mean Sq        F value   Pr(>F)          sig.
Class            4              3.87 × 10^6     9.68 × 10^5    148.3     < 2 × 10^-16    ***
Element          2              3.36 × 10^7     1.68 × 10^7    2568      < 2 × 10^-16    ***
Class:Element    8              8.06 × 10^6     1.00 × 10^6    154.3     < 2 × 10^-16
Residuals        1.26 × 10^7    8.26 × 10^10    6532

Condition F      Df             Sum Sq          Mean Sq        F value   Pr(>F)          sig.
Class            4              1.04 × 10^7     2.61 × 10^6    387.3     < 2 × 10^-16    ***
Element          2              5.92 × 10^7     2.96 × 10^7    4398      < 2 × 10^-16    ***
Class:Element    8              2.13 × 10^7     2.67 × 10^6    396.9     < 2 × 10^-16
Residuals        1.20 × 10^7    8.13 × 10^10    6733

Condition G      Df             Sum Sq          Mean Sq        F value   Pr(>F)          sig.
Class            4              1.07 × 10^7     2.67 × 10^6    400.1     < 2 × 10^-16    ***
Element          2              6.12 × 10^7     3.06 × 10^7    4575      < 2 × 10^-16    ***
Class:Element    8              2.54 × 10^7     3.18 × 10^6    475.4     < 2 × 10^-16
Residuals        1.22 × 10^7    8.19 × 10^10    6688

Condition H      Df             Sum Sq          Mean Sq        F value   Pr(>F)          sig.
Class            4              6.66 × 10^6     1.67 × 10^6    245.8     < 2 × 10^-16    ***
Element          2              6.25 × 10^7     3.12 × 10^7    4609      < 2 × 10^-16    ***
Class:Element    8              1.32 × 10^7     1.65 × 10^6    244.1     < 2 × 10^-16
Residuals        1.19 × 10^7    8.09 × 10^10    6778

Condition I      Df             Sum Sq          Mean Sq        F value   Pr(>F)          sig.
Class            4              5.07 × 10^6     1.27 × 10^6    187.7     < 2 × 10^-16    ***
Element          2              4.87 × 10^7     2.44 × 10^7    3605      < 2 × 10^-16    ***
Class:Element    8              1.81 × 10^7     2.26 × 10^6    334.9     < 2 × 10^-16
Residuals        1.19 × 10^7    8.01 × 10^10    6757

Table 6.22: Two-way ANOVA for Calcium, Magnesium and Sodium elements

Condition A      Df             Sum Sq          Mean Sq        F value   Pr(>F)          sig.
Class            4              8.47 × 10^4     2.11 × 10^4    3.16      0.0132          ***
Element          2              3.70 × 10^7     1.85 × 10^7    2760      < 2 × 10^-16    ***
Class:Element    8              1.35 × 10^6     1.69 × 10^5    25.21     < 2 × 10^-16
Residuals        1.35 × 10^7    9.07 × 10^10    6701

Condition B      Df             Sum Sq          Mean Sq        F value   Pr(>F)          sig.
Class            4              3.85 × 10^6     9.63 × 10^5    146.1     < 2 × 10^-16    ***
Element          2              3.80 × 10^7     1.90 × 10^7    2887      < 2 × 10^-16    ***
Class:Element    8              5.97 × 10^6     7.46 × 10^5    113.2     < 2 × 10^-16
Residuals        1.3 × 10^7     9.07 × 10^10    6593

Condition C      Df             Sum Sq          Mean Sq        F value   Pr(>F)          sig.
Class            4              1.38 × 10^6     3.45 × 10^5    51.45     < 2 × 10^-16    ***
Element          2              3.87 × 10^7     1.93 × 10^7    2882      < 2 × 10^-16    ***
Class:Element    8              6.36 × 10^6     7.95 × 10^5    118.52    < 2 × 10^-16
Residuals        1.34 × 10^7    9.00 × 10^10    6708

Condition D      Df             Sum Sq          Mean Sq        F value   Pr(>F)          sig.
Class            4              2.17 × 10^6     5.42 × 10^5    81.05     < 2 × 10^-16    ***
Element          2              5.11 × 10^7     2.56 × 10^7    3817      < 2 × 10^-16    ***
Class:Element    8              5.95 × 10^6     7.43 × 10^5    111       < 2 × 10^-16
Residuals        1.36 × 10^7    9.12 × 10^10    6699

Condition E      Df             Sum Sq          Mean Sq        F value   Pr(>F)          sig.
Class            4              1.42 × 10^6     3.56 × 10^5    54.01     < 2 × 10^-16    ***
Element          2              2.36 × 10^7     1.18 × 10^7    1791      < 2 × 10^-16    ***
Class:Element    8              9.27 × 10^6     1.15 × 10^6    175.8     < 2 × 10^-16
Residuals        1.41 × 10^7    9.30 × 10^10    6587

Condition F      Df             Sum Sq          Mean Sq        F value   Pr(>F)          sig.
Class            4              6.67 × 10^5     1.67 × 10^5    24.82     < 2 × 10^-16    ***
Element          2              3.16 × 10^7     1.58 × 10^7    2354      < 2 × 10^-16    ***
Class:Element    8              2.74 × 10^6     3.42 × 10^5    51.05     < 2 × 10^-16
Residuals        1.38 × 10^7    9.25 × 10^10    6714

Condition G      Df             Sum Sq          Mean Sq        F value   Pr(>F)          sig.
Class            4              6.40 × 10^5     1.60 × 10^5    23.87     < 2 × 10^-16    ***
Element          2              2.72 × 10^7     1.35 × 10^7    2017      < 2 × 10^-16    ***
Class:Element    8              2.98 × 10^6     3.72 × 10^5    55.66     < 2 × 10^-16
Residuals        1.39 × 10^7    9.31 × 10^10    6698

Condition H      Df             Sum Sq          Mean Sq        F value   Pr(>F)          sig.
Class            4              2.74 × 10^5     6.85 × 10^4    10.18     < 2 × 10^-16    ***
Element          2              3.11 × 10^7     1.55 × 10^7    2311      < 2 × 10^-16    ***
Class:Element    8              2.02 × 10^7     2.52 × 10^5    37.47     < 2 × 10^-16
Residuals        1.37 × 10^7    9.28 × 10^10    6732

Condition I      Df             Sum Sq          Mean Sq        F value   Pr(>F)            sig.
Class            4              5.05 × 10^5     1.26 × 10^5    18.78     < 1.88 × 10^-15   ***
Element          2              4.21 × 10^7     2.10 × 10^7    3132      < 2 × 10^-16      ***
Class:Element    8              2.26 × 10^6     2.83 × 10^5    42.07     < 2 × 10^-16
Residuals        1.36 × 10^7    9.19 × 10^10    6728

Table 6.23: Heat map of particle area, eccentricity and element compositions

Temperature (°C), relative humidity (p-value)
Temperature (columns): 20-23 °C; 23-26 °C; 26-29 °C
Relative humidity (rows): 35%-40%; 40%-45%; 45%-50%
Color bar

Table 6.24: Nanochromatography images under condition A, 20-23 °C, 35%-40%
Image columns: Replicate 1; Replicate 2; Replicate 3; Replicate 4; Replicate 5

Table 6.25: Nanochromatography images under condition B, 20-23 °C, 40%-45%
Image columns: Replicate 1; Replicate 2; Replicate 3; Replicate 4; Replicate 5

Table 6.26: Nanochromatography images under condition C, 20-23 °C, 45%-50%
Image columns: Replicate 1; Replicate 2; Replicate 3; Replicate 4; Replicate 5

Table 6.27: Nanochromatography images under condition F, 23-26 °C, 45%-50%
Image columns: Replicate 1; Replicate 2; Replicate 3; Replicate 4; Replicate 5

Table 6.28: Water samples recipe of table A for stage 2
Table 1: MgCl2 0.45 mM, NaHCO3 0.25 mM

MgSO4 (mM)    CaCl2 (mM):  0.5    0.75   1.0    1.5    2
0.25                       1      2      3      4      5
0.5                        6      7      8      9      10
0.75                       11     12     13     14     15
1.0                        16     17     18     19     20
2.0                        21     22     23     24     25

Table 6.29: Water samples recipe of table B for stage 2
Table 2: MgCl2 0.45 mM, NaHCO3 0.5 mM

MgSO4 (mM)    CaCl2 (mM):  0.5    0.75   1.0    1.5    2
0.25                       1      2      3      4      5
0.5                        6      7      8      9      10
0.75                       11     12     13     14     15
1.0                        16     17     18     19     20
2.0                        21     22     23     24     25

Table 6.30: Water samples recipe of table C for stage 2
Table 3: MgCl2 0.45 mM, NaHCO3 0.75 mM

MgSO4 (mM)    CaCl2 (mM):  0.5    0.75   1.0    1.5    2
0.25                       1      2      3      4      5
0.5                        6      7      8      9      10
0.75                       11     12     13     14     15
1.0                        16     17     18     19     20
2.0                        21     22     23     24     25

Table 6.31: Water samples recipe of table D for stage 2
Table 4: MgCl2 0.45 mM, NaHCO3 1.0 mM

MgSO4 (mM)    CaCl2 (mM):  0.5    0.75   1.0    1.5    2
0.25                       1      2      3      4      5
0.5                        6      7      8      9      10
0.75                       11     12     13     14     15
1.0                        16     17     18     19     20
2.0                        21     22     23     24     25

Table 6.32: Water samples recipe of table E for stage 2
Table 5: MgCl2 0.45 mM, NaHCO3 2.0 mM

MgSO4 (mM)    CaCl2 (mM):  0.5    0.75   1.0    1.5    2
0.25                       1      2      3      4      5
0.5                        6      7      8      9      10
0.75                       11     12     13     14     15
1.0                        16     17     18     19     20
2.0                        21     22     23     24     25

Figure 6.15: TwoVtMoM chlorine-sulfur mass ratio. Target chlorine-sulfur mass ratio vs. predicted chlorine-sulfur mass ratio. Marker colors indicate the target chlorine-sulfur ratio value.

Figure 6.16: TwoVtMoM hardness category classification results for water samples.

Figure 6.17: TwoVtMoM trilinear plot of water samples.
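The sample numbering in Tables 6.28 to 6.32 follows a regular 5 × 5 grid, so a recipe can be looked up from a table letter and a sample number. The sketch below is illustrative only: the function and constant names (`recipe`, `all_recipes`, `CACL2_MM`, and so on) are hypothetical and are not taken from the dissertation's code, and the concentrations are simply those listed in the five recipe tables.

```python
# Concentrations in mM, transcribed from Tables 6.28 to 6.32.
CACL2_MM = [0.5, 0.75, 1.0, 1.5, 2.0]    # table columns
MGSO4_MM = [0.25, 0.5, 0.75, 1.0, 2.0]   # table rows
NAHCO3_MM = {"A": 0.25, "B": 0.5, "C": 0.75, "D": 1.0, "E": 2.0}  # one value per table
MGCL2_MM = 0.45                          # constant in all five tables


def recipe(table, sample):
    """Return salt concentrations (mM) for sample 1..25 of recipe table A..E."""
    if table not in NAHCO3_MM:
        raise ValueError(f"unknown recipe table: {table!r}")
    if not 1 <= sample <= 25:
        raise ValueError(f"sample number must be 1..25, got {sample}")
    # Samples are numbered row by row: the row selects MgSO4, the column CaCl2.
    row, col = divmod(sample - 1, 5)
    return {
        "CaCl2": CACL2_MM[col],
        "MgSO4": MGSO4_MM[row],
        "MgCl2": MGCL2_MM,
        "NaHCO3": NAHCO3_MM[table],
    }


def all_recipes():
    """Map every (table, sample) pair to its recipe: 5 tables x 25 samples."""
    return {
        (table, sample): recipe(table, sample)
        for table in NAHCO3_MM
        for sample in range(1, 26)
    }


if __name__ == "__main__":
    print(recipe("C", 13))
    print(len(all_recipes()), "synthetic samples in total")
```

For example, `recipe("A", 10)` corresponds to the last cell of the second row of Table 6.28: 2.0 mM CaCl2, 0.5 mM MgSO4, 0.45 mM MgCl2, and 0.25 mM NaHCO3.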