ON THE PHYSIOLOGICAL AND IMMUNOLOGICAL EFFECTS OF PROTEIN GLYCATION AND ITS PREDICTION USING A NOVEL 3D CONVOLUTIONAL NEURAL NETWORK

By

Thomas Turkette

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Physiology – Doctor of Philosophy
Quantitative Biology – Dual Major

2024

ABSTRACT

As a non-enzymatic reaction, protein glycation is governed by a myriad of factors that determine whether a specific amino group can react and, if so, the rate of reaction. The first chapter of this dissertation is an overview of the reaction and these factors. Additionally, while glycation is associated with pathophysiology in the modern, industrialized world, one possible theoretical explanation for its physiological role in evolutionary history is put forth. One consequence of glycation being a non-enzymatic reaction is that no specific protein is required to catalyze it, and it therefore occurs ubiquitously within the body. This is reflected in the diverse pathophysiological effects elicited by elevations in protein glycation. Chapter 2 explores this with regard to the role protein glycation may play in the COVID-19 complications experienced by diabetic patients. This chapter also examines inconsistencies among the predictions made by existing protein glycation prediction models. In Chapter 3, a novel protein glycation model, SweetSMILE, is proposed. SweetSMILE is a 3D Convolutional Neural Network classifier that predicts the likelihood of lysine residues within a protein undergoing glycation. As input, it uses simplified molecular-input line-entry system (SMILES) representations of amino acid sequences converted into graphical representations of their chemical structures.
The model demonstrates higher predictive quality than existing models, particularly for smaller datasets, and the novel input architecture permits it to be extended to predict glycation of amino acids other than lysine, as well as of other types of biomolecules, once sufficiently large datasets of those instances of glycation become available. A brief summary of the key findings of the dissertation is provided in Chapter 4. Additionally, this chapter discusses some limitations of the current research and proposes how to address and follow up on them in the future.

ACKNOWLEDGEMENTS

Research is inherently a collaborative endeavor, and this is particularly true in the realm of computational biology. As such, I would like to thank the researchers behind the CPLM database, SPOT-1D, CD-HIT, LogoMaker, and RDKit. I would also like to thank my friends and family for their support and my guidance committee (Drs. Dillon, Wehrwein, Leishman, and Chan) for their invaluable insight and advice. Finally, but most certainly not least, I would like to thank my advisor, Dr. Root-Bernstein. A graduate student could ask for no better mentor, nor I a better friend.

TABLE OF CONTENTS

Chapter 1: Hyperglycemia and The Glycation of Proteins
BIBLIOGRAPHY
Chapter 2: Hyperglycemia-Induced Non-Enzymatic Glycation of Immune System Proteins May Explain Increased COVID-19 Severity and Complications and Overall Infectious Disease Risk Among Diabetics
BIBLIOGRAPHY
Chapter 3: SweetSMILE: A Novel Image Based 3D Convolutional Neural Network For Predicting Protein Glycation
BIBLIOGRAPHY
Chapter 4: Conclusions and Future Directions

Chapter 1: Hyperglycemia and The Glycation of Proteins

1. Introduction

Type 2 diabetes mellitus (T2D) is best defined as chronic hyperglycemia brought about through the development of insurmountable insulin resistance. It currently affects 9.3% of the American populace and costs the U.S. hundreds of billions of dollars annually [1]. These figures are expected to continue their meteoric rise, presenting grave concerns for the health of American citizens and for the economy [2]. Progress in both preventative care and diabetic reversal is therefore imperative. Among the greatest impediments to preventative care is the current inability to identify reliable, early biomarkers of diabetic onset [3]. The origins of this challenge are multifaceted, but can largely be distilled down to two prime contributors. First, most diabetics do not seek medical attention until well after onset, which complicates the acquisition of relevant data prior to and during onset. As such, most of our knowledge of this phenomenon stems from in vitro or in vivo animal studies, which presents the second difficulty: there is presently no model that truly recapitulates the etiology of T2D. Thus, despite the value of animal models, the initial stages of the human disease remain unclear. This ambiguity confounds the creation of an accurate roadmap of diabetic progression and therefore hinders the reliable identification of both biomarkers and novel targets for therapeutic intervention. Given the inherent link between blood sugar and T2D, understanding the pathophysiological effects of chronic hyperglycemia on the body is paramount in addressing these challenges.

2. Glucotoxicity, Glycation, and Post-Prandial Glucose

Cellular toxicity resulting from hyperglycemic conditions is known as glucotoxicity [4].
Ordinarily, fasting blood glucose levels are below 100 mg/dL, but in diabetic patients they are 126 mg/dL or greater, with the effects of glucotoxicity becoming more apparent as blood sugar levels rise. Mechanisms of glucotoxicity include increased oxidative stress through excessive reactive oxygen species production and antioxidant depletion, as well as pathological post-translational modification of proteins, such as glycation, the spontaneous, covalent linkage of a molecule of sugar to a free amino group of a protein [4-6]. Glucotoxicity has traditionally been thought to occur only during chronic hyperglycemia; however, several studies have shown that acute bouts of hyperglycemia can also lead to glucotoxicity [7-9]. Post-prandial glucose levels frequently reach hyperglycemic levels, and these levels can be maintained if sugar, from snacks or sugared drinks, is consumed between meals. These findings suggest that meal-associated rises in blood sugar may contribute to the development of glucotoxicity and subsequently to diabetic onset and progression [10]. If a meal is particularly rich in simple sugars, it can take a healthy patient approximately 3 hours to return to normoglycemia. If further sugar is consumed within this window, the return to normoglycemia is delayed. Semi-continuous sugar ingestion, as might occur when snacking or consuming sugared drinks between meals, could result in almost continuous hyperglycemia [9-11].

Post-prandial glucose has several prominent fates, and any change in the availability of these fates will necessarily shift the post-prandial glucose profile. A physiologically beneficial example of this is exercise, which increases glucose oxidation, decreasing the quantity of glucose available for lipogenesis and glycogenesis [12].
However, a sedentary individual consuming a large amount of glucose would have decreased glucose oxidation and would eventually saturate their glycogen stores, which would necessitate increased lipogenesis, flux through the polyol pathway, and post-translational modification of proteins in the form of glycation and glycosylation [6,13]. Indeed, saturation of a sedentary individual’s glycogen stores can occur within days when consuming a sugar-rich diet and can lead to prolonged elevations of blood glucose and hyperlipidemia [14,15]. Acute bouts of hyperglycemia can result in glucotoxicity even in a healthy patient population [6-9]. Thus, repeated transient or prolonged blood sugar spikes may play a significant role in diabetic progression by inducing glucotoxicity and by initiating the lipidomic shifts that underlie the lipotoxicity that contributes to pancreatic β-cell failure and diabetic complications [16,17].

Of the various fates of ingested glucose, protein glycation presents a unique opportunity both to further the understanding of the mechanisms underlying the pathophysiological diabetic phenotype and to develop clinically relevant biomarkers for assessing its onset and progression [18]. The importance of this latter point has already been highlighted through the use of hemoglobin A1C, a glycated form of hemoglobin, as a measure of the long-term control of blood sugar levels in diabetic patients. Yet the potential usefulness of glycation as a biomarker extends far beyond this, as the ubiquitous, non-enzymatic, and highly nuanced nature of the reaction can yield a tremendous amount of information about the local conditions in which a specific glycated protein exists [19,20].

3. An Overview of the Glycation Reaction

In protein glycation, the electrophilic carbonyl group of a reducing sugar reacts non-enzymatically with a nucleophilic amino group to form a covalent adduct.
Under what are considered to be standard physiological conditions, this corresponds to reducing sugars reacting with the N-terminus of free amino acids, peptides, and proteins, as well as with the epsilon-amino moiety at the end of the side chain in lysine residues [21,22]. The initial product formed is an unstable Schiff base, and at this stage the reaction is readily reversible; however, as the Schiff base undergoes a series of intramolecular rearrangements to form a more stable Amadori product, reversibility is severely reduced [23]. Once early glycation products have formed, they can react further to form Advanced Glycation Endproducts (AGEs), in which electrophilic carbonyl and dicarbonyl groups on glycated proteins react with a nucleophilic group of another protein, forming a cross-link between the two molecules [24].

While the formal definition of protein glycation is generally agreed to include all reducing sugars, confusion arises in the literature because glucose has been the most commonly used reducing sugar in experiments. This has given rise to the inconsistent use of alternative names for the reaction when a different reducing sugar is used, such as fructation for glycation in which fructose is the reducing sugar reactant [25,26]. While this naming scheme is neither formally nor uniformly applied across the literature, it will be adopted for the purposes of this document. The reasons for this are twofold. Firstly, as will be discussed in more detail in subsequent sections, the choice of reducing sugar can influence the reaction in terms of both site specificity and reaction rate. Secondly, and building upon the first, when constructing a model to predict protein glycation, which is the ultimate aim of this thesis, it is paramount to distinguish which sugar was used in the reaction rather than simply relying on the existing catch-all definition.

4. Factors Known to Influence Protein Glycation

Biochemically speaking, glycation is generally thought to be a relatively slow process, often taking days for an appreciable amount of a protein to glycate under physiological conditions, which is one of the reasons that hemoglobin A1C is used to assess long-term blood sugar control in diabetic patients. However, the rate of glycation can vary greatly depending on the protein, peptide, or amino acid being glycated (or even the particular site within it), the reducing sugar used, and the reaction conditions. For example, insulin has been shown to glycate much more rapidly than hemoglobin A1C [27], and significant amounts of hemoglobin A1A can be detected within an hour of exposure to hyperglycemic conditions [28].

4.1. Reactant Concentration and Temperature

The reaction behaves in agreement with the collision theory of chemical reactions; thus elevations in temperature and increased concentrations of reactants both lead to an increased rate of glycation [29]. Interestingly, there have been studies in which incubating proteins in reducing sugar concentrations far above physiological levels resulted in glycation of certain amino groups that was not observed at lower sugar concentrations, even when the incubation time was extended [30,31]. The reason for this is unclear. It is possible that the extended incubation times were simply insufficient to allow appreciable amounts of reaction at these more slowly glycating sites, particularly if one allows for the possibility that the effect is a consequence of carbonyl tautomerization of the reducing sugar, or that the glycation events occur sequentially, with glycation of the non-novel sites serving a permissive role that allows the novel sites to become glycated by altering the local reaction environment, either by changing the protein structure or by affecting the physicochemical properties of amino acids around the reaction site.
Alternatively, given the extremely high concentrations of glucose used in these studies, it could be a consequence of the reaction mixture approaching saturation. Regardless of the etiology, the existence of sites that glycate only under extreme hyperglycemic conditions underscores the importance of ensuring that they are excluded from any model used to predict possible protein glycation sites under physiological conditions.

4.2. pH

The pH of the reaction solution is an important factor to consider because it alters the protonation state of the amino acid residues. Free amino acids have two to three pKas to consider: that of the carboxyl group, that of the amino group, and, for those that are ionizable, that of the side chain. Of the 21 proteinogenic amino acids, the latter case applies to cysteine, aspartate, glutamate, lysine, arginine, and histidine, but of these the protonation of aspartate and glutamate is not relevant to the glycation reaction, as it merely completes the hydroxyl of their side-chain carboxylic acid groups [32]. At a standard physiological pH of 7.4, all 21 amino acids possess a protonated amino group, meaning that all free amino acids possess at least one possible glycation site. Interestingly, although the textbook definition of glycation specifies the nucleophilic reactant as an amino moiety, and predictive glycation research focuses on glycated lysine residues because the lysine side chain possesses the epsilon-amino group, at physiological pH that group is present as a positively charged ammonium group rather than as an amino group [33]. At this pH, arginine’s guanidino group is protonated, which conceivably does allow it to function as a nucleophile capable of reacting non-enzymatically with a reducing sugar, albeit with less reactivity than lysine owing to the resonance stabilization present in a guanidino group, and instances of this have been observed in the literature [34-36].
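The protonation argument above can be made quantitative with the Henderson-Hasselbalch relation. The following is a minimal sketch for illustration only and is not part of the dissertation's methods. The function name is hypothetical, and the side-chain pKa values (lysine about 10.5, cysteine about 8.3, histidine about 6.0) are approximate free-amino-acid textbook figures assumed here; pKa values of residues within a folded protein can differ substantially from them.

```python
def fraction_deprotonated(pka: float, ph: float) -> float:
    """Fraction of an ionizable group in its deprotonated form at a given pH.

    Henderson-Hasselbalch: pH = pKa + log10([deprotonated] / [protonated]).
    """
    ratio = 10.0 ** (ph - pka)  # [deprotonated] / [protonated]
    return ratio / (1.0 + ratio)


# Approximate free-amino-acid side-chain pKa values (assumed, illustrative only).
SIDE_CHAIN_PKA = {"lysine": 10.5, "cysteine": 8.3, "histidine": 6.0}

if __name__ == "__main__":
    # Compare blood-like pH (7.4) with a lysosome-like pH (about 5).
    for ph in (7.4, 5.0):
        for residue, pka in SIDE_CHAIN_PKA.items():
            frac = fraction_deprotonated(pka, ph)
            print(f"pH {ph}: {residue} side chain {100 * frac:.3f}% deprotonated")
```

Under these assumed values, fewer than 0.1% of free lysine side chains are deprotonated at pH 7.4, which is consistent with the observation above that the epsilon-amino group is present predominantly as an ammonium group at physiological pH.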
Non-enzymatic sugar adducts have likewise been reported for cysteine, whose nucleophilic thiol group is also protonated at physiological pH [36,37]. However, not all of the body is at a pH of 7.4. For example, the pH within lysosomes is approximately 5, and in the lumen of the stomach it generally falls between 1 and 3 [38-40]. There are unequivocally conditions in the body in which all side chains become fully protonated, but with regard to side chains capable of non-enzymatically forming a covalent bond with a reducing sugar, the lowest relevant pKa belongs to the imidazole group of histidine, at approximately 6. Unsurprisingly, there have been reports of reducing sugar adducts being added to histidine residues non-enzymatically [36,41-43]. Therefore, pH must be considered a significant variable in modeling and predicting glycation.

When considering amino acids in peptides and proteins, the same general principles regarding the effects of pH on the protonation of free amino acids hold, with the carboxyl and amino groups present on the C-terminal and N-terminal residues, respectively. However, in peptides and proteins, especially larger ones containing higher-order structures, the surrounding amino acids can significantly alter the pKa of any given ionizable moiety [34,44]. As such, for a given protein in solution at physiological pH, there could exist both protonated and unprotonated side chains for both histidine and lysine residues, depending on their locations within the primary, secondary, or tertiary structure, in a manner that is not predictable from their respective free amino acid pKas.

4.3. Phosphate

One additional factor known to influence the rate of glycation is the presence of phosphate in solution, which has been shown to have a catalytic effect [45]. In solution, reducing sugars such as the hexoses glucose and fructose can exist in a straight-chain or cyclic hemiacetal form [46].
It is the former conformation which participates in the glycation reaction [47]. The ability of phosphate to catalyze the glycation reaction is thought to result from its ability to stabilize the straight-chain form of a reducing sugar, thereby effectively increasing the concentration of the reactive form at a given concentration of reducing sugar.

4.4. Cellular Compartmentalization

Once one begins to consider the reaction in a biological environment, such as within a cell in vitro or within an in vivo setting, a number of additional factors come into play. Local environments will each have their own distinct reaction conditions, because different cell types differ in their expression of sugar transporters and metabolic enzymes, in their glycogen storage capacity, and in their organelles, such as lysosomes and acidocalcisomes, which possess a pH significantly lower than 7.4 [38,48-50]. Storage of proteins and peptides inside vesicles prior to exocytosis therefore provides unique environments within which glycation may be promoted [41,51,52]. These vesicles cannot contain the enzymes that break down and recycle their contents, so anything contained within them will have a significantly longer half-life than is usually observed for that protein in circulation, on the cell surface, or in the cytoplasm. For example, insulin has a circulating half-life of approximately 6 minutes, which is far too short for any appreciable amount of glycation to take place. However, glycation of insulin is observed in vivo, and this has been traced to the reaction occurring within the storage vesicles prior to release [53-55].

4.5. Membrane Protein Considerations

The ability of membrane proteins to undergo glycation is affected by two additional principal factors.
In the case of transmembrane proteins, the transmembrane region is unable to glycate when inserted into the membrane because any potential glycation sites are shielded by the phospholipid bilayer. The second is that the electric field radiating outward from the cell membrane promotes the dissociation of molecules; thus any site that might undergo glycation on a strict biochemical basis must also lie sufficiently far from the membrane [56]. However, it has also been observed that even under in vitro conditions in which membrane proteins are not inserted into a membrane, they are less likely to glycate than other proteins [58,59]. This has led to the idea that membrane proteins may have evolved sequences that are unfavorable to glycation, though as yet no one has proffered an explanation for what the evolutionary pressures behind such an adaptation may have been. It seems likely that, to some extent, the resistance evolved to thwart glycation of the transmembrane domains, which could interfere with insertion of the proteins into the membrane. It is also the case that transmembrane regions are highly hydrophobic and contain very few readily glycatable residues [59,60]. However, the resistance to glycation does not appear to be confined to these transmembrane regions; selection pressures for accurately conveying signals across cell membranes may have led to the evolution of glycation-resistant sequences in the other domains as well [58,61]. The strength of such a selection pressure would, however, vary across receptors depending upon the number of typically expressed spare receptors (sometimes referred to as the receptor reserve), which, in brief, are those receptors expressed in excess of the number required to stimulate a maximal response [62,63].
Examining the uses that humans have devised for glycation and other types of post-translational modifications of proteins may offer some insight. In both food science and the pharmaceutical industry, proteins are often stored in solutions containing high concentrations of reducing sugars, such as glucose [64,65]. Additionally, glycation is sometimes used to create post-translational modifications of pharmaceutical proteins in order to prolong the circulating half-life of the therapeutic compound. For example, in long-acting forms of insulin, such as the insulin detemir and insulin degludec formulations, modification of the lysine residue at position B29 is quite common [66]. In insulin detemir, myristic acid is conjugated to this site, and in insulin degludec hexadecanedioic acid is used instead [67,68]. Monoclonal antibodies intended for human therapeutic use are often glycated to increase their stability [69-72]. It is conceivable, then, that in ancient cells glycation of soluble proteins served an important protective function. Considering the circumstances under which hyperglycemic conditions would have been present, glycation may also have served as part of the metabolic control scheme by 1) acting as a glucose sink that increased the concentration gradient for absorption of glucose, similar to the role that myoglobin serves for oxygen in muscle [73]; 2) supplying the cell with a relatively consistent low concentration of glucose during periods of fasting as the sugar dissociated from the proteins, with each glycation site possessing its own unique storage half-life; and 3) extending the half-life of the metabolic proteins themselves during periods of large glucose influx and allowing them to be catabolized only during periods of fasting, because glycation sites are frequently also sites of ubiquitination or of other post-translational modifications used to target proteins for destruction [74].

5. Consequences of Protein Glycation

There are two important consequences of the factors listed above when considering the development of models for predicting the development of diabetes mellitus. The first concerns the nature of the data available for making predictions about which proteins are most likely to glycate at rates that are meaningful for understanding diabetic pathologies. Available studies of protein glycation, such as those aggregated in the Compendium of Protein Lysine Modifications (CPLM), universally fail to categorize the data in terms of the type of sugar, the concentration of the sugar, the temperature at which the glycation was carried out, the length of time the proteins were exposed to the sugar, whether the data were derived from human or animal subjects, or whether the data were derived from pharmaceutical or food preparations [75]. As we will demonstrate in the next chapter, conflating all of these factors leads to models that make very poor predictions about protein glycation.

The second set of consequences of the aggregate of factors influencing glycation involves their cumulative physiological and pathological effects in different tissues and organs. While glycation may have served a beneficial physiological role in evolutionary history, in the context of the diet and lifestyle of the modern industrialized world it is most commonly associated with pathophysiological outcomes [76]. Broadly speaking, the consequences of protein glycation can be divided into two types: those associated with the direct modification of protein function by early or advanced glycation adducts, and those associated with the activation of the Receptors for Advanced Glycation End-products (RAGEs), which contribute to a pro-inflammatory state through activation of a number of pathways [77-81]. When considering the effects of glycation on any given physiological system, it is most often the case that both mechanisms are at play.
Glycation and AGEs have been implicated in nearly every morbidity and co-morbidity associated with diabetes mellitus [82]. This includes directly contributing to insulin resistance [83], structural alterations in elements from the lens of the eye to collagen throughout the body [84,85], and contributing to the onset of dementia, to name a few [86]. A full accounting of every system and every protein affected is beyond the scope of this thesis; however, the subsequent chapter will illustrate the general principle by providing a detailed look at how protein glycation can impact the immune system of diabetic patients exposed to COVID-19. The following chapter will also make a clear case that better models of glycation are needed if such models are to be of value in predicting and understanding the pathological consequences of hyperglycemia in diabetes.

BIBLIOGRAPHY

1. American Diabetes Association. Economic costs of diabetes in the U.S. in 2012. Diabetes Care. 2013 Apr;36(4):1033-46. doi: 10.2337/dc12-2625.
2. Mathers CD, Loncar D. Projections of global mortality and burden of disease from 2002 to 2030. PLoS Med. 2006 Nov;3(11):e442. doi: 10.1371/journal.pmed.0030442.
3. Wright LA-C, Hirsch IB. The challenge of the use of glycemic biomarkers in diabetes: Reflecting on hemoglobin A1C, 1,5-anhydroglucitol, and the glycated proteins fructosamine and glycated albumin. Diabetes Spectrum. 2012;25(3):141–8. doi: 10.2337/diaspect.25.3.141.
4. Luo X, Wu J, Jing S, Yan LJ. Hyperglycemic Stress and Carbon Stress in Diabetic Glucotoxicity. Aging Dis. 2016 Jan 2;7(1):90-110. doi: 10.14336/AD.2015.0702.
5. Babizhayev MA, Strokov IA, Nosikov VV, Savel'yeva EL, Sitnikov VF, Yegorov YE, Lankin VZ. The Role of Oxidative Stress in Diabetic Neuropathy: Generation of Free Radical Species in the Glycation Reaction and Gene Polymorphisms Encoding Antioxidant Enzymes to Genetic Susceptibility to Diabetic Neuropathy in Population of Type I Diabetic Patients. Cell Biochem Biophys.
2015 Apr;71(3):1425-43. doi: 10.1007/s12013-014-0365-y.
6. Brownlee M. The pathobiology of diabetic complications: a unifying mechanism. Diabetes. 2005 Jun;54(6):1615-25. doi: 10.2337/diabetes.54.6.1615.
7. Schiekofer S, Andrassy M, Chen J, Rudofsky G, Schneider J, Wendt T, Stefan N, Humpert P, Fritsche A, Stumvoll M, Schleicher E, Häring HU, Nawroth PP, Bierhaus A. Acute hyperglycemia causes intracellular formation of CML and activation of ras, p42/44 MAPK, and nuclear factor kappaB in PBMCs. Diabetes. 2003 Mar;52(3):621-33. doi: 10.2337/diabetes.52.3.621.
8. Ceriello A. Acute hyperglycaemia and oxidative stress generation. Diabet Med. 1997 Aug;14 Suppl 3:S45-9. doi: 10.1002/(sici)1096-9136(199708)14:3+3.3.co;2-i.
9. Tessier D, Khalil A, Fülöp T. Effects of an oral glucose challenge on free radicals/antioxidants balance in an older population with type II diabetes. J Gerontol A Biol Sci Med Sci. 1999 Nov;54(11):M541-5. doi: 10.1093/gerona/54.11.m541.
10. Feldbauer R, Heinzl MW, Klammer C, Resl M, Pohlhammer J, Rosenberger K, Almesberger V, Obendorf F, Schinagl L, Wagner T, Egger M, Dieplinger B, Clodi M. Effect of repeated bolus and continuous glucose infusion on a panel of circulating biomarkers in healthy volunteers. PLoS One. 2022 Dec 27;17(12):e0279308. doi: 10.1371/journal.pone.0279308.
11. Jagannathan R, Neves JS, Dorcely B, Chung ST, Tamura K, Rhee M, Bergman M. The Oral Glucose Tolerance Test: 100 Years Later. Diabetes Metab Syndr Obes. 2020 Oct 19;13:3787-3805. doi: 10.2147/DMSO.S246062.
12. Adams OP. The impact of brief high-intensity exercise on blood glucose levels. Diabetes Metab Syndr Obes. 2013;6:113-22. doi: 10.2147/DMSO.S29222.
13. Brownlee M. Biochemistry and molecular cell biology of diabetic complications. Nature. 2001 Dec 13;414(6865):813-20. doi: 10.1038/414813a.
14. Acheson KJ, Schutz Y, Bessard T, Anantharaman K, Flatt JP, Jéquier E. Glycogen storage capacity and de novo lipogenesis during massive carbohydrate overfeeding in man.
Am J Clin Nutr. 1988 Aug;48(2):240-7. doi: 10.1093/ajcn/48.2.240.
15. Butsch WL. Glucose tolerance and the glycogen storage capacity of the dog. American Journal of Physiology-Legacy Content. 1934;108(3):639–42. doi: 10.1152/ajplegacy.1934.108.3.639.
16. Prentki M, Nolan CJ. Islet beta cell failure in type 2 diabetes. J Clin Invest. 2006 Jul;116(7):1802-12. doi: 10.1172/JCI29103.
17. Samuel VT, Shulman GI. Mechanisms for insulin resistance: common threads and missing links. Cell. 2012 Mar 2;148(5):852-71. doi: 10.1016/j.cell.2012.02.017.
18. Rabbani N, Thornalley PJ. Protein glycation - biomarkers of metabolic dysfunction and early-stage decline in health in the era of precision medicine. Redox Biol. 2021 Jun;42:101920. doi: 10.1016/j.redox.2021.101920.
19. Gallagher EJ, Le Roith D, Bloomgarden Z. Review of hemoglobin A(1c) in the management of diabetes. J Diabetes. 2009 Mar;1(1):9-17. doi: 10.1111/j.1753-0407.2009.00009.x.
20. Planas A, Simó-Servat O, Hernández C, Simó R. Advanced Glycations End Products in the Skin as Biomarkers of Cardiovascular Risk in Type 2 Diabetes. Int J Mol Sci. 2022 Jun 2;23(11):6234. doi: 10.3390/ijms23116234.
21. Zhao HR, Smith JB, Jiang XY, Abraham EC. Sites of glycation of beta B2-crystallin by glucose and fructose. Biochem Biophys Res Commun. 1996 Dec 4;229(1):128-33. doi: 10.1006/bbrc.1996.1768.
22. Bunn HF, Shapiro R, McManus M, Garrick L, McDonald MJ, Gallop PM, Gabbay KH. Structural heterogeneity of human hemoglobin A due to nonenzymatic glycosylation. J Biol Chem. 1979 May 25;254(10):3892-8.
23. Miller AK, Hambly DM, Kerwin BA, Treuheit MJ, Gadgil HS. Characterization of site-specific glycation during process development of a human therapeutic monoclonal antibody. J Pharm Sci. 2011 Jul;100(7):2543-50. doi: 10.1002/jps.22504.
24. Twarda-Clapa A, Olczak A, Białkowska AM, Koziołkiewicz M. Advanced Glycation End-Products (AGEs): Formation, Chemistry, Classification, Receptors, and Diseases Related to AGEs. Cells.
2022 Apr 12;11(8):1312. doi: 10.3390/cells11081312.
25. Dills WL Jr. Protein fructosylation: fructose and the Maillard reaction. Am J Clin Nutr. 1993 Nov;58(5 Suppl):779S-787S. doi: 10.1093/ajcn/58.5.779S.
26. Frost L, Chaudhry M, Bell T, Cohenford M. In vitro galactation of human serum albumin: analysis of the protein's galactation sites by mass spectrometry. Anal Biochem. 2011 Mar 15;410(2):248-56. doi: 10.1016/j.ab.2010.11.034.
27. Rhinesmith T, Turkette T, Root-Bernstein R. Rapid Non-Enzymatic Glycation of the Insulin Receptor under Hyperglycemic Conditions Inhibits Insulin Binding In Vitro: Implications for Insulin Resistance. Int J Mol Sci. 2017 Dec 2;18(12):2602. doi: 10.3390/ijms18122602.
28. Mortensen HB. Glycated hemoglobin. Reaction and biokinetic studies. Clinical application of hemoglobin A1c in the assessment of metabolic control in children with diabetes mellitus. Dan Med Bull. 1985 Dec;32(6):309-28.
29. Frolov A, Schmidt R, Spiller S, Greifenhagen U, Hoffmann R. Arginine-derived advanced glycation end products generated in peptide-glucose mixtures during boiling. J Agric Food Chem. 2014 Apr 23;62(16):3626-35. doi: 10.1021/jf4050183.
30. Reiser KM, Amigable MA, Last JA. Nonenzymatic glycation of type I collagen. The effects of aging on preferential glycation sites. J Biol Chem. 1992 Dec 5;267(34):24207-16.
31. Hudson DM, Archer M, King KB, Eyre DR. Glycation of type I collagen selectively targets the same helical domain lysine sites as lysyl oxidase-mediated cross-linking. J Biol Chem. 2018 Oct 5;293(40):15620-15627. doi: 10.1074/jbc.RA118.004829.
32. Kawashima S, Pokarowski P, Pokarowska M, Kolinski A, Katayama T, Kanehisa M. AAindex: amino acid index database, progress report 2008. Nucleic Acids Res. 2008 Jan;36(Database issue):D202-5. doi: 10.1093/nar/gkm998.
33. Al Temimi AH, Amatdjais-Groenen HI, Reddy YV, Blaauw RH, Guo H, Qian P, et al. The nucleophilic amino group of lysine is central for histone lysine methyltransferase catalysis.
Communications Chemistry. 2019;2(1). doi:10.1038/s42004-019-0210-8. 34. Fitch CA, Platzer G, Okon M, Garcia-Moreno BE, McIntosh LP. Arginine: Its pKa vaue revisited. Protein Sci. 2015 May;24(5):752-61. doi: 10.1002/pro.2647. 15 35. Bidasee KR, Zhang Y, Shao CH, Wang M, Patel KP, Dincer UD, Besch HR Jr. Diabetes increases formation of advanced glycation end products on Sarco(endo)plasmic reticulum Ca2+-ATPase. Diabetes. 2004 Feb;53(2):463-73. doi: 10.2337/diabetes.53.2.463. 36. Münch G, Schicktanz D, Behme A, Gerlach M, Riederer P, Palm D, Schinzel R. Amino acid specificity of glycation and protein-AGE crosslinking reactivities determined with a dipeptide SPOT library. Nat Biotechnol. 1999 Oct;17(10):1006-10. doi: 10.1038/13704. 37. Katsuta N, Takahashi H, Nagai M, Sugawa H, Nagai R. Changes in S-(2- succinyl)cysteine and advanced glycation end-products levels in mouse tissues associated with aging. Amino Acids. 2022 Apr;54(4):653-661. doi: 10.1007/s00726-022- 03130-y. 38. Bouhamdani N, Comeau D, Turcotte S. A Compendium of Information on the Lysosome. Front Cell Dev Biol. 2021 Dec 15;9:798262. doi: 10.3389/fcell.2021.798262. 39. Fallingborg J. Intraluminal pH of the human gastrointestinal tract. Dan Med Bull. 1999 Jun;46(3):183-96. 40. Evans DF, Pye G, Bramley R, Clark AG, Dyson TJ, Hardcastle JD. Measurement of gastrointestinal pH profiles in normal ambulant human subjects. Gut. 1988 Aug;29(8):1035-41. doi: 10.1136/gut.29.8.1035. 41. Shilton BH, Walton DJ. Sites of glycation of human and horse liver alcohol dehydrogenase in vivo. J Biol Chem. 1991 Mar 25;266(9):5587-92. 42. Iberg N, Flückiger R. Nonenzymatic glycosylation of albumin in vivo. Identification of multiple glycosylated sites. J Biol Chem. 1986 Oct 15;261(29):13542-5. 43. Shapiro R, McManus MJ, Zalut C, Bunn HF. Sites of nonenzymatic glycosylation of human hemoglobin A. J Biol Chem. 1980 Apr 10;255(7):3120-7. 44. Tynan-Connolly BM, Nielsen JE. Redesigning protein pKa values. Protein Sci. 
2007 Feb;16(2):239-49. doi: 10.1110/ps.062538707. 45. Watkins NG, Neglia-Fisher CI, Dyer DG, Thorpe SR, Baynes JW. Effect of phosphate on the kinetics and specificity of glycation of protein. J Biol Chem. 1987 May 25;262(15):7207-12. 46. Seeberger PH. Monosaccharide Diversity. In: Varki A, Cummings RD, Esko JD, et al., editors. Essentials of Glycobiology [Internet]. 4th edition. Cold Spring Harbor (NY): Cold Spring Harbor Laboratory Press; 2022. Chapter 2. 16 47. Sztanke K, Pasternak K. The Maillard reaction and its consequences for a living body. Ann Univ Mariae Curie Sklodowska Med. 2003;58(2):159-62. 48. Kanungo S, Wells K, Tribett T, El-Gharbawy A. Glycogen metabolism and glycogen storage disorders. Ann Transl Med. 2018 Dec;6(24):474. doi: 10.21037/atm.2018.10.59. 49. Goodenough U, Heiss AA, Roth R, Rusch J, Lee JH. Acidocalcisomes: Ultrastructure, Biogenesis, and Distribution in Microbial Eukaryotes. Protist. 2019 Jul;170(3):287-313. doi: 10.1016/j.protis.2019.05.001. 50. Navale AM, Paranjape AN. Glucose transporters: physiological and pathological roles. Biophys Rev. 2016 Mar;8(1):5-9. doi: 10.1007/s12551-015-0186-2. 51. Ling X, Sakashita N, Takeya M, Nagai R, Horiuchi S, Takahashi K. Immunohistochemical distribution and subcellular localization of three distinct specific molecular structures of advanced glycation end products in human tissues. Lab Invest. 1998 Dec;78(12):1591-606. 52. Garlick RL, Mazer JS. The principal site of nonenzymatic glycosylation of human serum albumin in vivo. J Biol Chem. 1983 May 25;258(10):6142-6. 53. Duckworth WC. Insulin degradation: mechanisms, products, and significance. Endocr Rev. 1988 Aug;9(3):319-45. doi: 10.1210/edrv-9-3-319. 54. Abdel-Wahab YH, O'Harte FP, Ratcliff H, McClenaghan NH, Barnett CR, Flatt PR. Glycation of insulin in the islets of Langerhans of normal and diabetic animals. Diabetes. 1996 Nov;45(11):1489-96. doi: 10.2337/diab.45.11.1489. 55. Abdel-Wahab YH, O'Harte FP, Barnett CR, Flatt PR. 
Characterization of insulin glycation in insulin-secreting cells maintained in tissue culture. J Endocrinol. 1997 Jan;152(1):59-67. doi: 10.1677/joe.0.1520059. 56. Dillon PF, Root-Bernstein RS, Sears PR, Olson LK. Natural electrophoresis of norepinephrine and ascorbic acid. Biophys J. 2000 Jul;79(1):370-6. doi: 10.1016/S0006- 3495(00)76298-9. 57. Verbeke P, Clark BF, Rattan SI. Modulating cellular aging in vitro: hormetic effects of repeated mild heat stress on protein oxidation and glycation. Exp Gerontol. 2000 Sep;35(6-7):787-94. doi: 10.1016/s0531-5565(00)00143-1. 58. Johansen MB, Kiemer L, Brunak S. Analysis and prediction of mammalian protein glycation. Glycobiology. 2006 Sep;16(9):844-53. doi: 10.1093/glycob/cwl009. 17 59. von Heijne G. Membrane protein structure prediction. Hydrophobicity analysis and the positive-inside rule. J Mol Biol. 1992 May 20;225(2):487-94. doi: 10.1016/0022- 2836(92)90934-c. 60. van Geest M, Lolkema JS. Membrane topology and insertion of membrane proteins: search for topogenic signals. Microbiol Mol Biol Rev. 2000 Mar;64(1):13-33. doi: 10.1128/MMBR.64.1.13-33.2000. 61. Tantipanjaporn A, Wong MK. Development and Recent Advances in Lysine and N- Terminal Bioconjugation for Peptides and Proteins. Molecules. 2023 Jan 21;28(3):1083. doi: 10.3390/molecules28031083. 62. Marunaka Y, Niisato N, Miyazaki H. New concept of spare receptors and effectors. J Membr Biol. 2005 Jan;203(1):31-9. doi: 10.1007/s00232-004-0729-0. 63. Guieu R, Brignole M, Deharo JC, Deharo P, Mottola G, Groppelli A, Paganelli F, Ruf J. Adenosine Receptor Reserve and Long-Term Potentiation: Unconventional Adaptive Mechanisms in Cardiovascular Diseases? Int J Mol Sci. 2021 Jul 15;22(14):7584. doi: 10.3390/ijms22147584. 64. Mizzi L, Maniscalco D, Gaspari S, Chatzitzika C, Gatt R, Valdramidis VP. Assessing the individual microbial inhibitory capacity of different sugars against pathogens commonly found in food systems. Lett Appl Microbiol. 2020 Sep;71(3):251-258. 
doi: 10.1111/lam. 65. Rowe, Raymond C., Paul Sheskey, and Marian Quinn. Handbook of pharmaceutical excipients. Libros Digitales-Pharmaceutical Press, 2009. 66. Bolli GB, Cheng AYY, Owens DR. Insulin: evolution of insulin formulations and their application in clinical practice over 100 years. Acta Diabetol. 2022 Sep;59(9):1129- 1144. doi: 10.1007/s00592-022-01938-4. 67. Nordisk N. Levemir SMPC (2004) https://www.ema.europa.eu/en/documents/product-information/levemir-epar-product- information_en.pdf 68. Heise T, Mathieu C. Impact of the mode of protraction of basal insulin therapies on their pharmacokinetic and pharmacodynamic properties and resulting clinical outcomes. Diabetes Obes Metab. 2017 Jan;19(1):3-12. doi: 10.1111/dom.12782. 69. Quan C, Alcala E, Petkovska I, Matthews D, Canova-Davis E, Taticek R, Ma S. A study in glycation of a therapeutic recombinant humanized monoclonal antibody: where it is, how it got there, and how it affects charge-based behavior. Anal Biochem. 2008 Feb 15;373(2):179-91. doi: 10.1016/j.ab.2007.09.027. 18 70. Zheng X, Wu SL, Hancock WS. Glycation of interferon-beta-1b and human serum albumin in a lyophilized glucose formulation. Part III: application of proteomic analysis to the manufacture of biological drugs. Int J Pharm. 2006 Sep 28;322(1-2):136-45. doi: 10.1016/j.ijpharm.2006.06.038. 71. Zhang B, Yang Y, Yuk I, Pai R, McKay P, Eigenbrot C, Dennis M, Katta V, Francissen KC. Unveiling a glycation hot spot in a recombinant humanized monoclonal antibody. Anal Chem. 2008 Apr 1;80(7):2379-90. doi: 10.1021/ac701810q. 72. Fischer S, Hoernschemeyer J, Mahler HC. Glycation during storage and administration of monoclonal antibody formulations. Eur J Pharm Biopharm. 2008 Sep;70(1):42-50. doi: 10.1016/j.ejpb.2008.04.021. 73. Merx MW, Flögel U, Stumpe T, Gödecke A, Decking UK, Schrader J. Myoglobin facilitates oxygen diffusion. FASEB J. 2001 Apr;15(6):1077-9. doi: 10.1096/fj.00- 0497fje. 74. Sun F, Suttapitugsakul S, Xiao H, Wu R. 
Comprehensive Analysis of Protein Glycation Reveals Its Potential Impacts on Protein Degradation and Gene Expression in Human Cells. J Am Soc Mass Spectrom. 2019 Dec;30(12):2480-2490. doi: 10.1007/s13361-019-02197-4. 75. Liu Z, Wang Y, Gao T, Pan Z, Cheng H, Yang Q, Cheng Z, Guo A, Ren J, Xue Y. CPLM: a database of protein lysine modifications. Nucleic Acids Res. 2014 Jan;42(Database issue):D531-6. doi: 10.1093/nar/gkt1093. 76. Fournet M, Bonté F, Desmoulière A. Glycation Damage: A Possible Hub for Major Pathophysiological Disorders and Aging. Aging Dis. 2018 Oct 1;9(5):880-900. doi: 10.14336/AD.2017.1121. 77. Blakytny R, Harding JJ. Glycation (non-enzymic glycosylation) inactivates glutathione reductase. Biochem J. 1992 Nov 15;288 ( Pt 1)(Pt 1):303-7. doi: 10.1042/bj2880303. 78. Ganea E, Harding JJ. Inactivation of glucose-6-phosphate dehydrogenase by glycation. Biochem Soc Trans. 1994 Nov;22(4):445S. doi: 10.1042/bst022445s. 79. Oda A, Bannai C, Yamaoka T, Katori T, Matsushima T, Yamashita K. Inactivation of Cu,Zn-superoxide dismutase by in vitro glycosylation and in erythrocytes of diabetic patients. Horm Metab Res. 1994 Jan;26(1):1-4. doi: 10.1055/s-2007-1000762. 80. Takahashi M, Lu YB, Myint T, Fujii J, Wada Y, Taniguchi N. In vivo glycation of aldehyde reductase, a major 3-deoxyglucosone reducing enzyme: identification of glycation sites. Biochemistry. 1995 Jan 31;34(4):1433-8. doi: 10.1021/bi00004a038. 19 81. Kuzan A. Toxicity of advanced glycation end products (Review). Biomed Rep. 2021 May;14(5):46. doi: 10.3892/br.2021.1422. Epub 2021 Mar 18. 82. Ahmed N. Advanced glycation endproducts--role in pathology of diabetic complications. Diabetes Res Clin Pract. 2005 Jan;67(1):3-21. doi: 10.1016/j.diabres.2004.09.004. 83. Song F, Schmidt AM. Glycation and insulin resistance: novel mechanisms and unique targets? Arterioscler Thromb Vasc Biol. 2012 Aug;32(8):1760-5. doi: 10.1161/ATVBAHA.111.241877. 84. 
Bansode S, Bashtanova U, Li R, Clark J, Müller KH, Puszkarska A, Goldberga I, Chetwood HH, Reid DG, Colwell LJ, Skepper JN, Shanahan CM, Schitter G, Mesquida P, Duer MJ. Glycation changes molecular organization and charge distribution in type I collagen fibrils. Sci Rep. 2020 Feb 25;10(1):3397. doi: 10.1038/s41598-020-60250-9. 85. Ramalho JS, Marques C, Pereira PC, Mota MC. Role of glycation in human lens protein structure change. Eur J Ophthalmol. 1996 Apr-Jun;6(2):155-61. doi: 10.1177/112067219600600211. 86. Chen J, Mooldijk SS, Licher S, Waqas K, Ikram MK, Uitterlinden AG, Zillikens MC, Ikram MA. Assessment of Advanced Glycation End Products and Receptors and the Risk of Dementia. JAMA Netw Open. 2021 Jan 4;4(1):e2033012. doi: 10.1001/jamanetworkopen.2020.33 20 Chapter 2: Hyperglycemia-Induced Non-Enzymatic Glycation of Immune System Proteins May Explain Increased COVID-19 Severity and Complications and Overall Infectious Disease Risk Among Diabetics 1. Introduction COVID-19 is a new and evolving disease caused by SARS-CoV-2 virus variants [1- 3]. As with most infectious diseases, some groups of people are at higher risk of infection, complications and death from COVID-19 than others [4-6]. Diabetics are one group with significantly increased risk. While diabetics do not contract SARS-CoV-2 infections at a higher rate than any other group [6,7], both type 1 (T1D) and type 2 diabetics (T2D) have significantly increased risks of developing severe COVID-19 and its associated complications, and of dying from their infection (reviewed in [7]). For example, a United Kingdom study of over 61 million people found that while only 0·4% had a recorded diagnosis of T1D and 4·7% had a diagnosis of T2D, among 23,698 in-hospital COVID- 19-related deaths 1·5% (almost 4X the expected figure) occurred in those with T1D and 31·4% (almost 7X) in people with T2D [8]. 
These results are consistent with evidence that diabetics mount poorer responses to COVID-19 vaccines [9,10] and make up an unusually large percentage (48%) of people hospitalized with COVID-19 vaccination breakthrough infections [10, 11].

However, risk of severe COVID-19 or death among diabetics may be limited to those with loss of glycemic control rather than being equally distributed among all diabetics. Well-controlled diabetic patients have much lower risks of severe COVID-19, complications and death than poorly controlled patients [12-17]. A study of 64,892 U.S. veterans with both diabetes and COVID-19 diagnoses found that those with an HbA1c ≥9.0% or using insulin were significantly more likely to be hospitalized, to be admitted to intensive care, to die or to develop long-term complications than those treated with sodium-glucose cotransporter 2 inhibitors, glucagon-like peptide-1 receptor agonists, angiotensin receptor blockers or metformin and an HbA1c <7.0%. The latter group had significantly lower risks of hospitalization, intensive care, death and long-term complications [18]. In another study of more than 200,000 patients, T1D was associated with a 21% higher absolute risk of intensive care unit admission or invasive mechanical ventilation compared with non-diabetics, but this increased risk was confined exclusively to T1D patients exhibiting diabetic ketoacidosis [19]. Indeed, high fasting blood glucose levels at hospital admission are a recognized risk for COVID-19 severity and death independent of prior diabetes diagnosis [20-22].

These observations are consistent with other studies correlating degree of diabetic control and hyperglycemia with increased risk of wound [23] and post-operative infections [24-26], pneumococcal disease [27,28], and severe influenza and other respiratory diseases [28-30]. In a study of more than 660,000 patients, a direct correlation was found between HbA1c levels and overall risk of death from infectious disease [31].
Finally, gestational diabetes (GD) also appears to pose a risk of increased bacterial and fungal infections to mothers and their newborns compared with pregnant women without GD [32-36].

Hyperinflammation is a common factor in both diabetes and risk for complications from the diseases that accompany it, such as COVID-19 and influenza-associated pneumonias. At the same time that diabetes impairs immune function, it is also associated with increased release of proinflammatory cytokines [37-40]. Hyperinflammation activated via synergistic pathways within the innate immune system is, in turn, highly associated with susceptibility to severe or fatal COVID-19, sepsis and acute lung injury or acute respiratory distress syndrome (reviewed in [41]). Thus, in clinical studies, the hyperinflammatory status of diabetic patients has been correlated both with their glycemic index and with their susceptibility to COVID-19 [42-46]. This risk of infectious complications associated with diabetic hyperinflammation also extends to many other diseases including diabetes-associated wounds [47-51], tuberculosis [52-54], and periodontal infections [55, 56].

Unfortunately, the mechanisms underlying immunological impairment in T1D, T2D and GD, and whether or how they are related to hyperinflammation, are unknown [23,28,57]. As Naruse [12] has commented with regard to COVID-19, “Although it is well known that better glycemic control decreases diabetic complications in type 2 diabetes patients, the mechanism by which poor glycemic control is associated with the increased mortality risk of diabetes patients with COVID-19 is still unclear” [12].
This paper proposes that a major factor impairing immune function among hyperglycemic individuals is the non-enzymatic glycation of host defense proteins and peptides involved in immune system function such as serotransferrin, lactotransferrin and its active peptides, lysozyme, defensins, histatins, cathelicidin and hepcidin [58], as well as complement proteins, interferons, interleukins, tumor necrosis factors, immunoglobulins, Toll-like receptors and NOD-like receptors, human leukocyte antigens, T cell receptors and caseins (TABLE 1) [59-72]. All of these proteins and peptides have potential glycation sites.

Glycation is a Maillard reaction canonically involving the carbonyl group of a simple reducing sugar being non-enzymatically added to an amino group of a protein or peptide, forming a Schiff base. Subsequently, the adduct is converted to a glucosylamine which then undergoes intramolecular rearrangement resulting in the generation of a more stable Amadori compound [73,74]. However, this process may differ depending upon which sugar is involved. For example, in the case of fructose the step after Schiff base formation is a fructosylamine which undergoes Heyns rearrangement to form a Heyns product [75]. Under some conditions and for some proteins fructation occurs significantly faster than glucation [76]. Some metabolites derived from the glycolytic process, such as glucose-6-phosphate, can also serve as glycation reactants and seem to be even more reactive than glucose itself [77]. These reactions can occur in both extracellular and intracellular compartments [78-80], and the increased glycation of serum albumins and of hemoglobin to produce hemoglobin A1c (HbA1c) are well-established clinical diagnostic measures of the degree of chronic hyperglycemia often used to monitor diabetic control [74, 81, 82].
Simple glycation products can react further to form Advanced Glycation End-products (AGEs), which have been found to contribute to numerous diabetic pathologies through hyperinflammation associated with activation of their receptors (RAGEs) [83, 84]. Insulin [85-94] and the insulin receptor [95] also become glycated under hyperglycemic conditions, resulting in poorer insulin binding to its receptor and thereby helping to explain the peripheral insulin resistance that often accompanies diabetes.

Oddly, despite the long history of glycation studies involving insulin, HbA1c and serum albumins in diabetes, and the fact that thousands of proteins and peptides are known to glycate in hyperglycemic diabetics [96, 97], the glycation of very few proteins and peptides has been investigated to determine which residues glycate or the physiological effects of such glycation. The emphasis thus far has been on characterizing whether glycation occurs among moderate-to-high abundance proteins such as alpha-1-antitrypsin, alpha-2-macroglobulin, apolipoproteins, fibrinogen, alpha-1-acid glycoprotein, ceruloplasmin, hemopexin, heparin cofactor 2 precursor, kininogen-1 precursor, vitamin D-binding protein precursor, glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and spectrin [96-98]. Despite the well-established deficits of immune function that often accompany diabetes, and evidence that methylglyoxal (the glucose metabolite that has the highest reactivity in the glycation process) has a wide range of detrimental effects on immune function [99,100], components of the immune system, which are generally present in relatively low abundance, have been particularly neglected. The two major exceptions are complement proteins [96, 97] and serotransferrin [96,97], both of which are present in high abundance in blood serum.
The primary function of the complement system is to recruit innate immune cells through opsonization of target cells to initiate phagocytosis or lysis [101]. Many components of the complement system become glycated under hyperglycemic conditions [96-98] but the effects of glycation on complement activation are complicated. For example, glycated proteins in general tend to activate the complement cascade [102], with C1q specifically recognizing AGEs [103], so that any decrease in complement activation that may occur due to glycation of complement proteins themselves [96-98] may be offset by hyperactivation of the system due to the presence of other glycated host proteins. Moreover, glycation of CD59, a negative regulator of complement activation that correlates with HbA1c glycation [104,105], has been demonstrated to release control of the complement cascade, leading to increased tissue damage associated with diabetic complications [106,107].

Studies of serotransferrin glycation by glucose and fructose in diabetic patients have also correlated the concentrations of glycated serotransferrin with both degree of hyperglycemia and HbA1c levels and have documented decreased ability to bind and deliver iron and chromium (the latter essential to insulin function) to cells [108-113]. Another key function of serotransferrin is to act as an antibiotic by sequestering iron needed by many bacteria for optimal growth and pathogenicity (reviewed in [114]). Glycation of serotransferrin may therefore decrease the ability of hyperglycemic diabetics to respond to bacterial infections.

Additional studies have demonstrated that other proteins associated with immune function can undergo glycation, but most of these studies employed non-physiological concentrations of sugars and/or temperatures (often involving food or pharmaceutical preparation and storage), did not identify where in the molecules the glycation occurs and did not demonstrate that glycation affects function (TABLE 1).
Examples include lysozyme [115-118]; immunoglobulins [118-126]; interferon 1b [127]; and caseins [128-129]. Maillard reaction glycation of lactoferrin (LF) has also been characterized, specifically with regard to whether it alters gastrointestinal digestion of the protein and its uptake following ingestion. Several studies demonstrated that glycation altered the rate, number and kinds of peptides produced compared to unglycated LF, but no attempts were made in these studies to evaluate whether the antimicrobial activity of the protein or its peptide products was altered [130-132]. Indirect evidence also exists for glycation of LF in the tears of T1D patients [118] and in milk of mothers with GD [133], again without any tests to establish alterations in functionality. No glycation studies appear to have been done for the remaining peptides and proteins listed in TABLE 1. Thus, the question of whether most immune system proteins and peptides undergo glycation under hyperglycemic conditions and whether such glycation might affect the immune functions of diabetic patients remains very much open.

The purpose of this study is, first of all, to predict glycation sites on immune system proteins and peptides using three well-established glycation prediction programs, PredGly, NetGlycate and Gly-PseAAC (see Methods) and, secondly, to interpret the likely physiological effects of these glycations using previous studies of homologous proteins and/or the effects of point mutations at these glycation sites. We have particularly focused our analysis on LF and its proteolytically-produced antimicrobial peptides because LF shares a high degree of homology with TF, the glycation of which has previously been well-studied both in terms of the location of its glycation sites and the effects of glycation on its functions.
We have also used experimental data from studies of the glycation of hen egg-white lysozyme to infer probable effects of glycation on human lysozyme, the glycation of which has not been studied. Using LF and lysozyme to validate our approach and to indicate its limitations and possibilities, we then make predictions concerning the likelihood of glycation of the rest of the proteins and peptides listed in TABLE 1 and how such glycations might alter their immune functions, based on genetic studies of the effects of amino acid substitutions of relevant lysines.

2. Results

The three glycation prediction programs utilized here each predict glycation of some lysines for all immune system peptides and proteins examined (TABLE 1), with the exception of hepcidin. However, the three programs agree on which specific lysines will glycate less than a third of the time, and any pair of the programs agrees only about half of the time. These inconsistencies result from the different algorithms incorporated into each program and from differences in the training sets used to optimize the prediction algorithms (see Methods). Additionally, none of the programs addresses possible glycation at the free N-termini of peptides, which have been demonstrated experimentally to be likely to glycate under physiologic hyperglycemic conditions. Thus, hepcidin, the LF peptides, ubiquicidin, dermicidin, etc., are all likely to experience N-terminal glycation in addition to glycation of some of their lysines. Our overall finding is therefore that hyperglycemia will likely induce some degree of increased glycation in essentially all innate and adaptive immune system proteins and peptides associated with antimicrobial immune functions. The basis of this conclusion is provided below in four sub-sections.
The first addresses predicted glycation of antimicrobial immune peptides and proteins and the functional consequences of glycation that can be extrapolated from the known effects of modifications of amino acids at the glycation sites. The second addresses the same data concerning glycation of cytokines and lymphokines. The third addresses predicted glycations and their effects on Toll-like receptors (TLR). The fourth examines the same data relating to glycations of immunoglobulins. One unfortunate limitation of the data presented below is that one of the glycation prediction programs, Gly-PseAAC, became unavailable very shortly after we began this project, so that we can present predictions from it only for serotransferrin, lactotransferrin and lysozyme.

TABLE 1. Some key human antimicrobial proteins and peptides, their main immune functions and the primary tissues or organs from which they are released. Sequences are provided in Tables 2-5 below. AntiBact = antibacterial activity; Gram +/- = affects both Gram positive and negative bacteria; AntiVir = antiviral activity; AntiFung = antifungal activity; AntiProt = anti-protozoal activity; Chemotact = stimulates chemotaxis; TLR = Toll-like receptors; NOD = Nucleotide oligomerization domain-like receptors.

PROTEIN/PEPTIDE | IMMUNE FUNCTIONS | TISSUE LOCATION | REFS
Serotransferrin | Sequesters iron required for bacterial growth | Blood plasma, CSF, semen | [58]
Lactotransferrin (Lactoferrin) | Sequesters iron required for bacterial growth; prevents biofilm growth; AntiBact, AntiVir, AntiFung, AntiProt (higher activity in its peptides); Antiinflammatory; AGE recognition | Milk, blood plasma, neutrophils, all exocrine secretions (saliva, mucus, bile, tears, pancreatic) | [59]
Lf(1-11) | AntiBact (Gram +/-); AntiFung (Candida spp.) | (See Lactoferrin) | [60-62]
Lfpep (18-40) | Bacteriostatic (Gram +/-); anti-yeast; AntiProt | (See Lactoferrin) | [63]
Lactoferricin (17-41) | AntiVir; Bacteriostatic (Gram +/-); AntiFung; AntiProt | (See Lactoferrin) | [60, 64]
Kaliocin-1 (153-183) | Bactericidal; AntiFung | (See Lactoferrin) | [62, 65]
Lactoferrampin (268-284) | AntiVir; AntiBact (Gram +/-); AntiFung; AntiProt | (See Lactoferrin) | [62, 64]
Caseins | Peptide digests AntiBact; Expressed on macrophages stimulating GM-CSF release | Milk, macrophages | [66-68]
Lysozyme | AntiBact; Antiinflammatory | All tissues; all secretions | [69]
Alpha-Defensins | AntiBact (Gram +/-); AntiVir; AntiFung; Chemotact | Neutrophils, bone marrow | [58]
Beta-Defensins | AntiBact (Gram +/-); AntiVir; AntiFung; Chemotact | Skin, saliva, solid organs | [58]
Histatins | AntiBact (Gram +/-); AntiVir; AntiFung | Saliva | [58]
Cathelicidin | AntiVir; AntiBact (Gram +/-); AntiFung; AntiProt; Chemotact | Neutrophils, skin | [58]
Hepcidin | AntiBact (Gram +/-); AntiFung | Blood plasma | [58]
Granulysin | AntiBact (Gram +/-); AntiFung; AntiProt | Cytolytic T cells, NK cells | [58]
Thrombocidin-1 | AntiBact (Gram +/-); AntiFung | Platelets | [58]
Ubiquicidin | AntiBact (Gram +/-) | Macrophages | [58]
Dermicidin | AntiBact (Gram +/-); AntiFung | Eccrine sweat glands | [58]
Interferons | AntiVir | Mainly dendritic and NK cells; any virus-infected cell | [70]
Interleukins | Regulates lymphocyte differentiation, activation, inflammation, chemotaxis for all pathogens | Lymphocytes, keratinocytes, bone marrow, epithelial cells, varied organs | [70]
Tumor necrosis factors | Regulates inflammatory processes and T cell activation | Epithelial cells, fibroblasts, and immune cells | [70]
Toll-like and NOD-like receptors | Mediate activation of immune responses to all pathogens (TLR1,2,4,5,6 and NOD1,2 mainly bacteria/fungi/protozoa; TLR3,7,8,9 mainly viruses) | Macrophages, monocytes, dendritic cells | [71]
Immunoglobulins | Specific antibody-mediated immune responses | Plasma cells | [72]

2.1.
Predicted Glycation Sites on Antimicrobial Peptides and Proteins

In all of the Results described below and in the Discussion, we have used the UniProt numbering system for each protein. This is extremely important to bear in mind because many of the proteins have signal sequences at their N-termini that are removed during activation of the protein. Some publications number these processed proteins, or the peptides derived from them, from the N-terminus that remains after the signal sequence has been removed, so their residue numbers differ from the UniProt numbers used here.

2.1.1. Serotransferrin (TF)

TF glycation is significantly increased in T1D vs T2D and both are significantly increased compared with healthy controls. Iron-binding capacity is proportionally decreased as a function of the percent of TF glycated and inversely proportional to oxidative stress [134-137]. Zhang, et al. [96, 97] have documented that TF glycates under normoglycemic conditions at K47, K60, K61, K122, K215, K258, K278, K295, K297, K299, K331, K399, K464, K515, K553, K646, and K683 and displays increased glycation at these sites during T1D and T2D over glycation rates among non-diabetics. Significantly, increased glycation of K683 of TF, in particular, correlates so well with development of T2D that it has been suggested as a possible biomarker for the disease [137].
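The comparison between experimentally documented and predicted sites can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis pipeline used in this study: the documented sites are those attributed to Zhang, et al. above, the predicted sites are copied from the pooled serotransferrin row of FIGURE 1, and the helper name compare_sites is hypothetical.

```python
# Illustrative sketch only; not the analysis pipeline used in this study.

# Documented TF glycation sites as listed in the text (UniProt numbering).
DOCUMENTED_TF = {47, 60, 61, 122, 215, 258, 278, 295, 297, 299,
                 331, 399, 464, 515, 553, 646, 683}

# Pooled predicted TF sites, copied from the serotransferrin row of FIGURE 1.
PREDICTED_TF = {23, 37, 46, 60, 61, 107, 121, 122, 135, 163, 212, 215, 225,
                236, 258, 278, 295, 297, 299, 310, 315, 331, 359, 399, 433,
                453, 464, 467, 489, 509, 515, 546, 553, 564, 571, 588, 610,
                618, 646, 659, 668, 676, 683}


def compare_sites(documented, predicted):
    """Hypothetical helper: summarize the overlap of two sets of lysine positions."""
    hits = documented & predicted
    return {
        "hits": sorted(hits),                            # documented and predicted
        "missed": sorted(documented - predicted),        # documented, not predicted
        "unconfirmed": sorted(predicted - documented),   # predicted, not documented
        "recall": len(hits) / len(documented),
        "precision": len(hits) / len(predicted),
    }


if __name__ == "__main__":
    summary = compare_sites(DOCUMENTED_TF, PREDICTED_TF)
    print("documented sites also predicted:", summary["hits"])
    print("documented sites not predicted: ", summary["missed"])
    print(f"recall = {summary['recall']:.2f}, precision = {summary['precision']:.2f}")
```

With the lists as printed, 16 of the 17 documented sites fall within the pooled predictions; K47 does not (the neighboring position 46 is predicted instead), and 27 predicted positions have no counterpart in the documented list. Because the pooled row does not separate the programs, this comparison says nothing about the accuracy of any single program.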
ANTIMICROBIAL PEPTIDES AND PROTEINS | PREDICTED GLYCATION SITES
Serotransferrin | 23, 37, 46, 60, 61, 107, 121, 122, 135, 163, 212, 215, 225, 236, 258, 278, 295, 297, 299, 310, 315, 331, 359, 399, 433, 453, 464, 467, 489, 509, 515, 546, 553, 564, 571, 588, 610, 618, 646, 659, 668, 676, 683
Lactotransferrin (LTF) | 2, 37, 47, 57, 92, 119, 132, 182, 216, 256, 262, 282, 299, 301, 304, 315, 320, 352, 376, 405, 435, 475, 476, 503, 535, 565, 583, 600, 624, 638, 658, 675, 680, 694, 695, 710
Lactoferricin (LTF 35-60) | N-terminus, 39, 41 (LF 57)
Lfpepc (LTF 36-59) | N-terminus, 2, 12, 22 (LF 37, 47, 57)
Kaliocin-1 (153-183) | N-terminus, 12 (LF 182)
Lf pep (LTF 19-30) | N-terminus
Lactoferrampin (268-84) | N-terminus, 15 (LF 282)
Casein (alpha) | 18, 59, 65, 82, 98
Casein (beta) | 2, 35, 38, 40, 103, 105, 111, 119
Casein (kappa) | 2, 26, 123, 124, 128
Cathelicidin LL-37 | 2, 36, 82, 100, 101, 106, 127, 141, 143, 148, 151
Defensin, α- | 50
Defensin, β- | 25, 61, 67
Defensin, neutrophil | None, 56
Dermicidin | N-terminus, 6, 20, 23, 34
Granulysin | 41, 58, 82
Hepcidin | None
Histatin | N-terminus, 2, 24, 36
α-Lactalbumin | 20, 24, 77, 81, 113, 118, 127, 133
Lysozyme C | 2, 19, 31, 51, 115
Thrombocidin-1 | 17, 56, 61, 62
Ubiquicidin | 1, 11, 27, 28, 33, 52, 53

FIGURE 1. Summary of results of glycation predictions for antimicrobial immune proteins and peptides made by two or more programs. Protein sequences and UniProt accession numbers can be found in the supplementary table. Those numbers with a light blue background are unique predictions made by Gly-PseAAC; those numbers on a magenta background are unique predictions made by NetGlycate; those numbers on a light gray background are unique predictions made by PredGly; those numbers in white on a dark blue background are predictions agreed upon by Gly-PseAAC and PredGly; those numbers in white on a maroon background are predictions agreed upon by NetGlycate and PredGly; those numbers on a white background are predictions agreed upon by Gly-PseAAC and NetGlycate; those numbers in white against a black background are predictions agreed upon by all three programs. Bolded, underlined, italicized numbers are lysines that have been reported to be glycated under experimental and/or clinical conditions in the Compendium of Protein Lysine Modifications (https://cplm.biocuckoo.cn/View.php?id=CPLM004870, accessed 5 May 2023). Lactotransferrin peptide glycation sites are predicted from lactotransferrin itself. Gly-PseAAC became unavailable after we ran serotransferrin, lactotransferrin and lysozyme and before we were able to run the rest of the proteins and peptides listed in this figure.

The increased glycation associated with diabetes results in low serum transferrin saturation and increased TF receptor expression. Increased glycation and TF receptor expression, in turn, predict a corresponding exponential increase in all-cause mortality among type 2 diabetics with coronary heart disease [138, 139].

Overall, all three glycation programs utilized here predicted some of the lysines that glycated, but each missed a significant number, and even where all three or any two were in consensus, their predictions were only correct about two-thirds of the time. Thus, a combination of prediction algorithms provides a greater probability of capturing the range of clinically-observed glycation sites than do individual glycation prediction programs.

2.1.2. Lactotransferrin (LTF)

The glycation of lactoferrin has not been studied experimentally or analytically, but since TF demonstrates a high degree of homology with LTF (FIGURES 2 and 3) and TF has been shown to readily glycate under physiological conditions (see above), it is reasonable to predict that LTF will undergo extensive glycation as well.
In particular, many of the known glycation sites on TF correspond to regions of high homology with LTF (FIGURE 2) and indeed SPOT-1D, a deep learning model for predicting protein secondary structures, suggests that both proteins share similar structures at these points (https://sparks-lab.org/server/spot-1d/, accessed 24 October 2022). The SPOT-1D results were confirmed using 2StrucCompare (which was used to generate FIGURE 3), a webserver for visualizing small but noteworthy differences between protein tertiary structures through interrogation of the secondary structure content (https://2struccompare.cryst.bbk.ac.uk/, accessed 11 May 2023) [140]. Both programs demonstrate that many of the TF glycation sites are just as readily accessible in the LTF structure while also identifying some key differences in lysine accessibility (FIGURE 3). Several of the LTF lysine residues most likely to glycate according to the glycation prediction programs (FIGURE 2) and homologies to TF are either located in close proximity to or are part of regions associated with antimicrobial activity. Substitution of K47 to R47 has been shown to increase the plasma concentration of LTF while simultaneously decreasing its Gram-positive antibacterial activities [141, 142]. Notably, K47 is under positive evolutionary selection and while the dominant form of LTF in North America and Europe is K47, it is R47 in most of Africa [143]. Thus, glycation of K47 may convert LTF functionally into an equivalent of its less active naturally occurring R47 variant. Indeed, the number of K-to-R and R-to-K substitutions in the comparative TF-LTF sequences is quite large and may have driven much of their functional divergence (FIGURE 3). One interesting implication of these substitutions is the possibility that different ethnic and geographical groups have slightly different risks of infection and of developing diabetic complications.
We can infer the importance of potential glycation of K2, K37, K47 and K57 from the fact that these appear repeatedly in the antimicrobial peptides derived from the amino-terminal portion of LTF such as Lf(1-11), Lf pep (LTF 19-30), Lactoferricin (LTF 35-60), and Lfpepc (LTF 36-59) [60-64, 144, 145]. Mutation of LTF K37 is thought to decrease function (TABLE 2) and it is reasonable to assume that glycation of this lysine would have similar detrimental effects. LTF K47 and K57 have homologies to TF K46 and K60, which are known to glycate (see Section 2.1.1) and are implicated in the decrease of TF function. Similar reasoning suggests that K282, which is homologous to TF K278, will also glycate (see Section 2.1.1), which may impair the antimicrobial activity of both LTF and its proteolytic product, lactoferrampin. LTF K182, which has no homologous K on TF, is located near the N-terminal cleavage site of the kaliocin-1 peptide produced by proteolytic cleavage of LTF; interference in this cleavage would decrease production of kaliocin-1, which has antimicrobial activity mediated by its ability to alter liposomal membrane permeability to ions [146]. Additional functional effects may result from other predicted LTF lysine glycations. The serine protease functionality of LTF is nearly abolished in response to mutations of K92 [147], which has no homology in TF and may therefore represent a novel glycation site with important functional implications. LTF K695, however, is homologous to TF K683, which is known to glycate (see Section 2.1.1) and is implicated in decreased transferrin activity in diabetics [148]. Glycation of LTF K695 can be presumed similarly to decrease its function. Additionally, LTF shares homologous glycation sites identified on TF that include LTF K119-TF K122, LTF K262-TF K258, LTF K282-TF K278, LTF K299-TF K295, LTF K301-TF K297, LTF K473-TF K464, LTF K565-TF K553, LTF K658-TF K648, and LTF K695-TF K683 (FIGURE 2).
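The homologous lysine pairs above are read off matching columns of a pairwise alignment. As a minimal sketch of that bookkeeping (not the procedure used to produce FIGURE 2), the Python function below walks two gapped, equal-length alignment strings and maps residue numbers in one sequence to residue numbers in the other. The function name `map_aligned_positions` and the variable names are illustrative assumptions; the two strings are the first alignment block of FIGURE 2 as printed there.

```python
def map_aligned_positions(seq_a, seq_b, gap="-"):
    """Map 1-based residue numbers of seq_a to residues of seq_b, column by column.

    Both inputs are rows of one pairwise alignment, so they must have equal
    length. Columns where either row holds a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    pos_a = pos_b = 0
    for res_a, res_b in zip(seq_a, seq_b):
        if res_a != gap:
            pos_a += 1
        if res_b != gap:
            pos_b += 1
        if res_a != gap and res_b != gap:
            mapping[pos_a] = (res_a, pos_b, res_b)
    return mapping


# First alignment block of FIGURE 2 (serotransferrin over lactotransferrin).
TF_BLOCK = "MRLAVGALLVCAVLGLCLAVPDKTVRWCAVSEHEATKC-QSFRDHMKSVIPSDGPSVACVK"
LTF_BLOCK = "MKLVFLVLLFLGALGLCLAGRRRSVQWCAVSQPEATKCFQWQRNMRKV----RGPPVSCIK"

tf_to_ltf = map_aligned_positions(TF_BLOCK, LTF_BLOCK)
for tf_pos in (46, 60):
    tf_res, ltf_pos, ltf_res = tf_to_ltf[tf_pos]
    print(f"TF {tf_res}{tf_pos} aligns with LTF {ltf_res}{ltf_pos}")
```

Applied to this block, the mapping reproduces the TF K46 to LTF K47 and TF K60 to LTF K57 correspondences cited in the text. Positions in later blocks would need the full alignment, because gap counts accumulate along the sequence.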
The detailed functional effects of each of these glycations in TF are not known, but some may be inferred from genetic variant and mutation studies (TABLE 2). For example, two mutations of LTF K118 are associated with decreased function so that glycation of neighboring K119 (predicted by all three glycation programs) may have similar effects. Similarly, three mutations of LTF K301, which is predicted to glycate, have been documented, each associated with decreased function (TABLE 2). Table 2 lists additional lysine mutations or variations thought to modify LTF function that are not predicted to glycate but which, given the questionable accuracy of the prediction programs, should be investigated experimentally. Finally, since glycation of TF results generally in loss of metal binding capacity, the same can be predicted of LTF; loss of iron binding capacity may, in turn, impair LTF antibacterial activity as it does TF activity [114, 134-137]. In fact, three lysines in TF mediate iron binding via protonation/deprotonation, of which two are substituted with arginines in LTF and one is homologous: TF K224 (LTF R229); TF K323 (LTF R328); and TF K553 (LTF K565) [129]. The substitutions of K for R in LTF are thought to account for its affinity for iron differing from that of TF [149]. Notably, however, the homologous lysines TF K553 and LTF K565 are both thought to glycate, which would interfere with the protonation/deprotonation necessary for iron binding.

FIGURE 2. LALIGN (https://www.ebi.ac.uK/Tools/psa/lalign/ PAM250, open gap -10, extended gap -2, accessed 7 May 2023) homology comparison for human serotransferrin (TF) (P02787) versus human lactotransferrin (LTF) (P02788-1).
10 20 30 40 * 50 60
SEROTR MRLAVGALLVCAVLGLCLAVPDKTVRWCAVSEHEATKC-QSFRDHMKSVIPSDGPSVACVK
:.:. .:: ..:::::: . ..:.:::::..::::: : :.
: ::.:.:.: LACTOT MKLVFLVLLFLGALGLCLAGRRRSVQWCAVSQPEATKCFQWQRNMRKV----RGPPVSCIK 10 20 30 40 50 * 70 80 90 100 110 120 SEROTR KASYLDCIRAIAANEADAVTLDAGLVYDAYLAPNNLKPVVAEFYGSKEDPQTFYYAVAVV ..: ..::.:::.: :::::::.:..:.: ::: .:.::.:: ::.. .:.: ::::::: LACTOT RDSPIQCIQAIAENRADAVTLDGGFIYEAGLAPYKLRPVAAEVYGTERQPRTHYYAVAVV 60 70 80 90 100 110 * 130 140 150 160 170 SEROTR KKDSGFQMNQLRGKKSCHTGLGRSAGWNIPIGLL--YCDLPEPRKPLEKAVANFFSGSCA ::...::.:.:.: ::::::: :.::::.::: : . . ..:..:.: :::.:::.::. LACTOT KKGGSFQLNELQGLKSCHTGLRRTAGWNVPIGTLRPFLNWTGPPEPIEAAVARFFSASCV 120 130 140 150 160 170 180 190 200 210 * 220 230 SEROTR PCADGTDFPQLCQLCPG-----CGCSTLNQYFGYSGAFKCLKDGAGDVAFVKHSTIFENL : :: ..::.::.::.: :. :. ..::.::::::::.::::::::...::.::.: LACTOT PGADKGQFPNLCRLCAGTGENKCAFSSQEPYFSYSGAFKCLRDGAGDVAFIRESTVFEDL 180 190 200 210 220 230 240 250 *60 270 *80 290 SEROTR ANKADRDQYELLCLDNTRKPVDEYKDCHLAQVPSHTVVARSMGGKEDLIWELLNQAQEHF ...:.::.::::: ::::::::..::::::.::::.:::::..:::: ::.::.::::.: LACTOT SDEAERDEYELLCPDNTRKPVDKFKDCHLARVPSHAVVARSVNGKEDAIWNLLRQAQEKF 240 250 260 270 280 290 * x3*0 310 320 330* 340 350 SEROTR GKDKSKEFQLFSSPHG-KDLLFKDSAHGFLKVPPRMDAKMYLGYEYVTAIRNLREGTCPE ::::: .::::.:: : ::::::::: :: .::::.:. .::: .: :::.:::... : LACTOT GKDKSPKFQLFGSPSGQKDLLFKDSAIGFSRVPPRIDSGLYLGSGYFTAIQNLRKSE--E 300 310 320 330 340 350 360 370 380 390 4*0 410 SEROTR APTDECKPVKWCALSHHERLKCDEWSVNSVGKIECVSAETTEDCIAKIMNGEADAMSLDG . .. .: :::....: ::..:: : :...: ::.::::::: ...:::::::::: LACTOT EVAARRARVVWCAVGEQELRKCNQWSGLSEGSVTCSSASTTEDCIALVLKGEADAMSLDG 360 370 380 390 400 410 420 430 440 450 460 * SEROTR GFVYIAGKCGLVPVLAENYNKSD------NCEDTPEAGYFAVAVVKKSASDLTWDNLKGK :.::.::::::::::::::.. . :: : : .::.:::::..:...:::...::: LACTOT GYVYTAGKCGLVPVLAENYKSQQSSDPDPNCVDRPVEGYLAVAVVRRSDTSLTWNSVKGK 420 430 440 450 460 470 470 480 490 500 510 * 520 SEROTR KSCHTAVGRTAGWNIPMGLLYNKINHCRFDEFFSEGCAPGSKKDSSLCKLCMGS--GLNL :::::::.::::::::::::.:... :.:::.::..:::::. :.:: ::.:. 
: : LACTOT KSCHTAVDRTAGWNIPMGLLFNQTGSCKFDEYFSQSCAPGSDPRSNLCALCIGDEQGENK 480 490 500 510 520 530 530 540 550 * 560 570 580 SEROTR CEPNNKEGYYGYTGAFRCLVEK-GDVAFVKHQTVPQNTGGKNPDPWAKNLNEKDYELLCL : ::..: :::::::::::.:. :::::::. :: :::.:.: ..:::.:. :..:::: LACTOT CVPNSNERYYGYTGAFRCLAENAGDVAFVKDVTVLQNTDGNNNEAWAKDLKLADFALLCL 540 550 560 570 580 590 590 600 610 620 630 640 SEROTR DGTRKPVEEYANCHLARAPNHAVVTRKDKEACVHKILRQQQHLFGSNVTDCSGNFCLFRS ::.::::.: .::::.:::::::.:.:: . ....: .:: ::.: .::...::::.: LACTOT DGKRKPVTEARSCHLAMAPNHAVVSRMDKVERLKQVLLHQQAKFGRNGSDCPDKFCLFQS 600 610 620 630 640 650 x 650 660 670 680 * 690 SEROTR ETKDLLFRDDTVCLAKLHDRNTYEKYLGEEYVKAVGNLRKCSTSSLLEACTFRR :::.:::.:.: :::.::...::::::: .:: ...::.:::::.:::::.: : LACTOT ETKNLLFNDNTECLARLHGKTTYEKYLGPQYVAGITNLKKCSTSPLLEACEFLR 660 670 680 690 700 Figure 2 (cont’d) The Waterman-Eggert Score was 2347; 459.5 bits; E(1) < 2.4e-133. 60.2% identity (86.1% similar) in 714 aa overlap (1-697:1-709). * = experimental evidence of glycation at this site not shared by LTF. Black rectangle = evidence of glycation at this site on TF which is homologous with LTF. Gray-highlighted and underlined sequences represent known proteolytic peptide fragments of LTF that have anti-microbial activity. TF lysine (K) glycation sites identified from Compendium of Protein Lysine Modifications, Serotransferrin, https://cplm.biocuckoo.cn/View.php?id=CPLM004870 (accessed 5 May 2023). FIGURE 3. Structural comparison of human serotransferrin crystal structure (PDB 1FQE at 1.80 angstroms) with that of human lactotransferrin (PDB 1H45 at 1.95 angstroms) using 2StrucCompare (https://2struccompare.cryst.bbk.ac.uk/ accessed 11 May 2023) [140]. Red/pink areas indicate regions of significant side-chain-related structural differences, which include the two lysines visible at the left-hand side of the figure and the one at the lower right. 37 TABLE 2. 
List of the lysine residues (K) of human lactotransferrin by their position in the sequence (numbers under “Position”); whether they are predicted to glycate (YES) or not (blank); and the predicted or known effects of mutations or substitutions of these lysines aggregated from the UniProt database disease and variant function (www.uniprot.org accessed 14 May-19 May 2023). See the Methods section for a source with links to the Provenance websites. E = glutamic acid; I = isoleucine; M = methionine; N = asparagine; Q = glutamine; R = arginine; T = threonine; * = deletion.

Predicted Glycation | Position & Modification | Predicted/Known Effect | Variant Identification(s) | Provenance(s)
YES x3 | 2 K | ? | |
YES x3 | 37 K>E | Decreased function | CA352483278, rs1289635444 | ClinGen, TOPMed
YES | 47 K>T, K>I | Impaired antibacterial action | CA352482984, rs1126478, CA352482977 | ClinGen, 1000Genomes, ESP, ExAC, TOPMed, gnomAD
YES | 57 K | ? | |
YES | 92 K>N | Decreased function | CA73714273, rs557956250 | ClinGen, NCI-TCGA, gnomAD
 | 118 K>M, R | Decreased function | CA2355697, CA73714060, rs143898414 | ClinGen, ESP, ExAC, TOPMed, gnomAD
YES x3 | 119 K | Interferes with 118 function? | |
YES x2 | 132 K | ? | |
YES x2 | 182 K | ? | |
YES x2 | 216 K | ? | |
 | 250 K>T | Decreased function | CA2355548, rs764927306 | ClinGen, ExAC, TOPMed, gnomAD
YES x2 | 262 K | ? | |
YES x2 | 282 K | ? | |
 | 296 K>R | Decreased function | CA352476918, rs1299916491 | ClinGen, gnomAD
YES x2 | 299 K | ? | |
YES x2 | 301 K>E | Decreased function | CA352476862, rs1450585310 | ClinGen, gnomAD
YES x2 | 301 K>M | Decreased function | CA73711646, rs912732938 | ClinGen, TOPMed
YES x2 | 301 K>R | Decreased function | CA352476859, rs912732938 | ClinGen, TOPMed
YES | 320 K>N | Decreased function | CA73711603, rs572780496 | ClinGen, 1000Genomes
YES | 320 K>Q | Decreased function | CA2355493, rs750647853 | ClinGen, ExAC, gnomAD
YES x2 | 352 K | ? | |
YES | 435 K>E | Decreased function | CA73706972, rs752743478 | ClinGen, gnomAD
 | 473 K | | |

Table 2 (cont’d)
YES | 475 K | ? | |
YES x3 | 476 K | ? | |
YES x3 | 503 K | ? | |
YES | 565 K | Participates in iron binding! | [129] |
YES x3 | 583 K | ? | |
 | 586 K>* | Benign: somatic variant | TCGA novel | NCI-TCGA
YES | 600 K>E | Decreased function | CA352472483, rs1416140040 | ClinGen, gnomAD
YES | 600 K>N | Decreased function | CA2355150, rs777999029 | ClinGen, ExAC, gnomAD
YES x2 | 624 K | ? | |
 | 629 K | | |
YES | 638 K | ? | |
 | 649 K>E, Q | Decreased function | CA2355103, rs780557539 | ClinGen, ExAC, gnomAD
 | 649 K>R | Decreased function | CA2355102, rs754674696 | ClinGen, ExAC, gnomAD
YES | 658 K | ? | |
YES x3 | 675 K | ? | |
YES x2 | 680 K | ? | |
YES x2 | 694 K | ? | |
YES x2 | 695 K>N | Decreased function | CA2355087, rs747980963 | ClinGen, ExAC, gnomAD
YES x2 | 710 K>R | Decreased function | CA73701843, rs770751980 | ClinGen, Ensembl

2.1.3. Lysozyme
Like glycation of human lactoferrin, almost nothing is known about glycation of human lysozyme C and its functional effects. However, in this case, a substantial number of studies of chicken egg white lysozyme glycation are available from which to infer glycation patterns and effects on the human version of the protein. There are six lysines on human lysozyme C as well as an N-terminus that can potentially glycate. The glycation prediction programs predict three sites in common and up to five total, not including the N-terminus (FIGURE 1). Only one of the lysines readily glycates in human lysozyme (but which one is unknown) [150] and the maximal number of glycations even at 500 mM glucose for 30 days at 37 °C is four [149]. Analogy to the highly homologous chicken egg white lysozyme (FIGURES 4 and 5) may provide additional information concerning glycation. It is important to realize that most of the chicken glycation data utilize a numbering system that is based on the sequence after removal of the signal sequence (18 amino acids), so that care must be taken in comparing the experimental data with the UniProt numbering of human lysozyme C that is employed here. Using the UniProt numbering, chicken lysozyme double-glycates at K19 (processed lysozyme K1) with 30 mM glucose for 30 days at 25 °C [151].
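The numbering caveat above amounts to adding or subtracting the length of the signal sequence. The short Python sketch below makes that conversion explicit, assuming the 18-residue signal sequence stated in the text; the function and constant names are hypothetical and chosen only for illustration.

```python
SIGNAL_SEQUENCE_LENGTH = 18  # signal sequence length stated in the text for lysozyme C


def processed_to_uniprot(position, signal_length=SIGNAL_SEQUENCE_LENGTH):
    """Convert a residue number in the processed protein to UniProt numbering."""
    if position < 1:
        raise ValueError("residue numbers start at 1")
    return position + signal_length


def uniprot_to_processed(position, signal_length=SIGNAL_SEQUENCE_LENGTH):
    """Convert a UniProt residue number to numbering of the processed protein."""
    if position <= signal_length:
        raise ValueError("position lies within the signal sequence")
    return position - signal_length


# Chicken lysozyme lysines discussed in this section, in UniProt numbering.
for uniprot_position in (19, 31, 51, 114):
    processed = uniprot_to_processed(uniprot_position)
    print(f"UniProt K{uniprot_position} = processed K{processed}")
```

With this offset, UniProt K19 corresponds to K1 of the processed protein, as noted in the text, so a site reported in the experimental literature must be shifted by 18 before it is compared with the positions used here.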
The primary glycation site on human lysozyme (predicted by homology) is therefore likely to be the homologous K19 (FIGURE 4). Glycation of K19 is likely to have two adverse effects on lysozyme activity (TABLE 3). One is to inhibit the proteolytic removal of the signal sequence that occurs at this site, thus limiting the concentration of available functional lysozyme. The possible glycation of K2 in human lysozyme at the N terminus of the signal sequence, which is an arginine in the chicken version, may also inhibit lysozyme processing (as with TF-LTF, several K-R substitutions have modified human and chicken lysozymes: see Figure 4 and Table 3). Indeed, mutations of K2 and K19 in human lysozyme C are associated with decreased activity (TABLE 3). Secondly, glycation of K19 may inhibit lysozyme activity following signal removal since it is likely that all of the positively charged residues participate in binding to and enzymatically cleaving lysozyme substrates. This prediction is also consistent with mutation studies (TABLE 3). Additional glycation of K31, K51 and K114 on chicken lysozyme results from higher concentrations of glucose and/or higher temperatures (50 °C) [115, 151-153]. Thus, the homologous lysines (K31, K51, and K115) on human lysozyme may also glycate to some extent under prolonged hyperglycemic conditions. Human lysozyme K51 may be particularly important since mutation studies demonstrate loss of activity when K51 is varied and familial visceral amyloidosis if K51 is deleted. These observations are consistent with the significantly decreased enzymatic activity seen following glycation of chicken egg white lysozyme (50 mM glucose at 37 °C), which was observable within days of exposure to glucose and increased linearly over a 16-week (112-day) period; loss of activity correlated with alterations in secondary and tertiary structure [152, 153].
We predict that glycation of human lysozyme will similarly alter its secondary and tertiary structure resulting in correlative loss of function over time and in relation to the degree of hyperglycemia.

FIGURE 4. Comparison of the sequences of human lysozyme (HULYS) with chicken lysozyme (CHLYS). LALIGN (https://www.ebi.ac.uK/Tools/psa/lalign/ PAM250, open gap -10, extended gap -2, accessed 7 May 2023) homology comparison for P61626 · LYSC_HUMAN versus P00698 · LYSC_CHICK including the signal sequence (1-18). Waterman-Eggert score: 504; 124.0 bits; E(1) < 1e-33, 56.8% identity (90.4% similar) in 146 aa overlap (1-146:1-145). Magenta highlights conserved lysines (K). Green highlights conserved arginines (R). Black rectangle = evidence of glycation at this site on chicken lysozyme C at sites homologous with human lysozyme C. See text for chicken lysozyme glycation data.
10 20 30 40 50 60
HULYS MKALIVLGLVLLSVTVQGKVFERCELARTLKRLGMDGYRGISLANWMCLAKWESGYNTRA
:..:..: : .:.... ::::.::::: ..:: :.:.::: ::.::.: ::.::..::.:
CHLYS MRSLLILVLCFLPLAALGKVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQA
10 20 30x 40 50x 60
70 80 90 100 110 120
HULYS TNYNAGDRSTDYGIFQINSRYWCNDGKTPGAVNACHLSCSALLQDNIADAVACAKRVVRD
:: ...: ::::::.:::::.:::::.:::. : :...::::: ..:...:.:::..:.:
CHLYS TN-RNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSD
70 80 90 100 110 x
130 140
HULYS PQGIRAWVAWRNRCQNRDVRQYVQGC
.:..:::::::::.. ::.....::
CHLYS GNGMNAWVAWRNRCKGTDVQAWIRGC
120 130 140

FIGURE 5. Structural comparison of human lysozyme C crystal structure P61626 · LYSC_HUMAN (PDB 133L, 1.77 angstroms) with that of chicken lysozyme C P00698 · LYSC_CHICK (PDB 193L, 1.33 angstroms) using 2StrucCompare (https://2struccompare.cryst.bbk.ac.uk/ accessed 11 May 2023) [140]. Red/pink areas indicate regions of significant side-chain-related structural differences. Note the human lysozyme lysine (K134) in red (indicating that it is a variant from chicken lysozyme) sticking out at the left of the figure.

TABLE 3.
List of the lysine residues (K) of human lysozyme C by their position in the sequence (numbers under “Position”); whether they are predicted to glycate (YES) or not (blank); and the predicted or known effects of mutations or substitutions of these lysines aggregated from the UniProt database disease and variant function (www.uniprot.org accessed 14 May-19 May 2023). See the Methods section for a source with links to the Provenance websites. E = glutamic acid; R = arginine; * = deletion.

Predicted Glycation | Position & Modification | Predicted/Known Effect | Variant Identification(s) | Provenance(s)
YES x2 | 2 K>R | Decreased function | CA385723189, rs1324049222 | ClinGen, gnomAD
YES x2 | 19 K>E | Decreased function | CA385723407, rs1473154577 | ClinGen, gnomAD
YES x3 | 31 K | ? | |
YES x2 | 51 K>* | Familial visceral amyloidosis | CA6679728, RCV001112970, rs367682582 | ClinGen, ClinVar, ESP, ExAC, TOPMed, dbSNP, gnomAD
YES x2 | 51 K>R | Decreased function | CA6679729, rs760760746 | ClinGen, ExAC, gnomAD
 | 87 K | | |
YES x3 | 115 K | ? | |

2.1.4. Caseins
The glycation-prediction models suggest that caseins are likely to glycate at multiple lysines. Bovine caseins certainly glycate under milk processing conditions that include pasteurization (and thus supra-physiological, high heat) resulting in non-enzymatic covalent linkage of lactose to five lysines [154]. Four of these lactose sites have weak homologies with K59 of alpha casein (a predicted glycation site), K54 of beta casein (which is not a predicted glycation site), K175 of beta casein (which is also not a predicted glycation site), and K26 of kappa casein (a predicted glycation site). The fifth bovine lactose glycation site appears to have no homology in human caseins. No studies of glucose glycation of caseins were found for any species. In this case, mutation and variant data were available that might shed additional light on the effects of possible glycation on beta casein (TABLE 4). Such data are not available for alpha casein or kappa casein.
Beta casein is predicted to glycate at K2, K38, K40, K103 and K111. Of these, variant studies suggest that mutations in K2 and K103 each decrease casein functionality while the mutation of K38 to N (asparagine) is benign (having no effect). These observations suggest that glycation of K2 and K103 is likely to decrease beta casein activity while glycation of K38 may not. The effects of glycation at K40 and K111 cannot be predicted from existing variation data. Some data do exist to confirm that glycation inhibits key casein functions. One of the normal functions of caseins is anti-inflammatory [155]. Lactose glycation of caseins attenuated the ability of caseins to moderate release of IL-6, IL-1β and tumor necrosis factor-α (TNF-α) as well as the secretion of two anti-inflammatory mediators, IL-10 and transforming growth factor-β (TGF-β), in the presence of bacterial lipopolysaccharide (LPS) [156]. Caseins are also directly antimicrobial [66-68] and glycation may interfere with this function although there appear to be no formal studies to corroborate this possibility. Glucose-mediated glycation studies under physiological conditions representing diabetic hyperglycemia as well as clinical studies comparing casein glycation and activity in diabetic patients with those of healthy controls are needed.

2.1.5. Cathelicidin, Defensins, Dermicidin, Hepcidin, Granulysin, Thrombocidin-1, and Ubiquicidin
Glycation prediction programs predict that cathelicidin, defensins, dermicidin, granulysin, thrombocidin-1, and ubiquicidin will all glycate at one or more lysines (FIGURE 1) and all of these, as well as hepcidin, can be predicted to glycate at their free N-terminus, since this is a common occurrence with peptides [75]. All of these peptides are released in various tissues in response to tissue injuries to promote wound healing and prevent or address microbial infections [58, 157-159].
As far as we can determine, no studies of their possible glycation have been carried out. However, there is extensive evidence that under hyperglycemic conditions, diabetic patients produce or release less of these peptides than do normal individuals and that the wound-healing and anti-microbial functions of these peptides are significantly impaired [160-166]. Such evidence suggests that glycation may be occurring and interfering with processing, release and/or activity of these peptides. Additional evidence from mutation and variant studies also points to potential loss of function as a result of glycation of these proteins and peptides. For example, a variety of mutations of K151 of cathelicidin, a predicted glycation residue (FIGURE 1), decrease or obliterate the antimicrobial efficacy of the polypeptide (https://www.uniprot.org/uniprotkb/P49913/entry#disease_variants, accessed 19 May 2023). Similarly, when two of the three lysines predicted to glycate on beta defensin (K25 and K61) mutate, the result is reduced activity (https://www.uniprot.org/uniprotkb/Q8NES8/variant-viewer, accessed 19 May 2023). Thus, the effects of glycation on the function of these and the rest of the peptides and proteins listed in Figure 1 are clearly worth investigating.

TABLE 4. List of the lysine residues (K) of human beta casein (β casein) by their position in the sequence (numbers under “Position”); whether they are predicted to glycate (YES) or not (blank); and the predicted or known effects of mutations or substitutions of these lysines aggregated from the UniProt database disease and variant function (www.uniprot.org accessed 14 May-19 May 2023). See the Methods section for a source with links to the Provenance websites. N = asparagine; R = arginine; T = threonine.
Predicted Glycation | Position & Modification | Predicted/Known Effect | Variant Identification(s) | Provenance(s)
YES | 2 K>R, N | Decreased function | CA2948509, rs775364490, CA98924195, rs953590187 | ClinGen, ExAC, TOPMed, gnomAD, Ensembl
 | 33 K | | |
YES | 35 K>T | Decreased function | CA357116498, rs1281418473 | ClinGen, gnomAD
YES | 38 K>N | Benign | CA2948425, rs776355136 | ClinGen, ExAC, NCI-TCGA, gnomAD
YES | 40 K | ? | |
 | 54 K | | |
YES x2 | 103 K>T | Decreased function | CA2948372, rs752874310 | ClinGen, ExAC, gnomAD
YES | 105 K | ? | |
YES x2 | 111 K | ? | |
YES | 119 K | ? | |
 | 132 K | | |
 | 175 K>R | Decreased function | CA357115237, rs1200339380 | ClinGen, TOPMed, gnomAD

2.1.6. Cytokines and Lymphokines
Interferons, interleukins and tumor necrosis factors play major roles in activating both innate and adaptive immunity and regulating inflammatory processes during infection [70]. Predictions for a selection of these proteins (FIGURE 6) suggest that all have multiple glycation sites, but no studies of their glycation under physiological or hyperglycemic conditions appear to have been performed. However, a pharmaceutical preparation of human interferon beta (IFN-β) (300 µg lyophilized with 15 mg of human serum albumin and 15 mg of glucose at 25 °C for 35 days) glycated at two predicted sites: K40 (K18 minus the signal sequence) and K66 (K44 minus the signal sequence) [167]. Additionally, human interferon gamma (IFN-γ) has been found to glycate spontaneously when produced by recombinant gene techniques from Escherichia coli within peptides inclusive of K131 and K148 [168] (corrected for the signal sequence) and proteolytic cleavage occurs at K111 and K131, both of which are predicted glycation sites [168]. Glycation of IFN-γ results in three types of modifications: one involves the early Amadori glycation products, which do not alter the protein’s antiviral activity; the second alters proteolysis, which significantly decreases IFN-γ activity; and the third results in spontaneous dimerization of interferon chains, which completely abolishes activity [168].
If these pharmaceutically-relevant glycations are representative of the glycations that are likely under physiologically-relevant hyperglycemic conditions, then some moderation of interleukin activity can be expected. Many of the predicted glycation sites on interleukins and interferons correspond to lysines that are essential to protein function, as illustrated in Tables 5 and 6. Additionally, general information confirms that interleukin and interferon activity is abrogated in hyperglycemic conditions. IFN-γ activity is significantly decreased among those with hyperglycemia [169-175]. However, acute (24 hour) hyperglycemia had no effect on IL-6 production by macrophages while simultaneously significantly increasing TNF-α release [174] in response to LPS stimulation. On the other hand, IL-1β, TNF-α, and IL-6 release by macrophages were all significantly increased by chronic, intermittent hyperglycemic conditions [176]. Similarly, TNF-α, IL-13 and IL-1β were all significantly elevated in patients with T2D retinopathy compared with healthy controls [177] and IL-17A, IL-1β, and TNF-α were all increased in the tears of diabetic patients compared with healthy individuals [178]. Thus, hyperglycemia-driven glycation may decrease interferon production and activity while simultaneously stimulating chronic inflammation; this combination of decreased immunity with increased inflammation may make diabetics more prone to infection. As in the previous Section, significant research is needed in this area.
CYTOKINES & LYMPHOKINES | UNIPROT ID | PREDICTED GLYCATION SITES
Interferon alpha-1/13 | P01562 | 17, 73, 94, 107, 144, 145, 158
Interferon beta | P01574 | 4, 40, 54, 66, 73, 120, 126, 129, 136, 155
Interferon gamma | P01579 | 2, 29, 35, 36, 57, 66, 78, 81, 84, 91, 97, 103, 109, 110, 111, 131, 148, 151, 153
Interleukin 1 alpha | P01583 | 3, 12, 61, 64, 82, 83, 110, 131, 165, 175, 179, 188, 212, 231
Interleukin 6 | P05231 | 55, 69, 74, 82, 98, 114, 157, 159, 178, 199
Interleukin 10 | P22301 | 52, 58, 67, 75, 106, 117, 143, 152
Interleukin 17 | Q96PD4 | 4, 13, 32, 35, 56, 113, 133, 145
Tumor necrosis factor alpha | P01375 | 19, 20, 87, 174, 188

FIGURE 6. Summary of results of glycation predictions for selected cytokines and lymphokines made by two programs. Numbers refer to the amino acid sequence in the UniProt database. Those numbers on a magenta background are unique predictions made by NetGlycate; those numbers on a light gray background are unique predictions made by PredGly; those numbers in white on a maroon background are predictions agreed upon by NetGlycate and PredGly. Gly-PseAAC became unavailable before we were able to run these protein sequences through its glycation prediction program.

TABLE 5. List of the lysine residues (K) of human interferon gamma (IFN-γ) by their position in the sequence (numbers under “Position”); whether they are predicted to glycate (YES) or not (blank); and the predicted or known effects of mutations or substitutions of these lysines aggregated from the UniProt database disease and variant function (www.uniprot.org accessed 14 May-19 May 2023). See the Methods section for a source with links to the Provenance websites. E = glutamic acid; I = isoleucine; M = methionine; Q = glutamine; R = arginine; T = threonine; * = deletion.
Predicted Glycation | Position & Modification | Predicted/Known Effect | Variant Identification(s) | Provenance(s)
YES | 2 K>T | Decreased function | CA6675947, rs768834812 | ClinGen, ExAC, gnomAD
YES x2 | 29 K | ? | |
YES | 35 K | ? | |
YES x2 | 36 K | ? | |
YES x2 | 57 K | ? | |
YES x2 | 66 K>E | Decreased function | CA238116582, rs868730796 | ClinGen, Ensembl
YES | 78 K>I, T | Decreased function | CA385690379, rs761801101 | ClinGen, ExAC, gnomAD
YES | 81 K | ? | |
YES | 84 K | ? | |
YES | 91 K | ? | |
YES x2 | 97 K>R | Decreased function | CA385690014, rs1386140712 | ClinGen, gnomAD
YES x2 | 97 K>M | Benign | TCGA novel | NCI-TCGA
YES | 103 K>E | Increased colon cancer risk | CA6675875, COSM1210289, rs746486099 | ClinGen, cosmic curated, ExAC, gnomAD
YES x2 | 109 K>E | Decreased function | CA6675873, rs771694484 | ClinGen, ExAC, TOPMed, gnomAD
YES x2 | 109 K>R | Decreased function | CA385689765, rs1318962050 | ClinGen, gnomAD
YES x2 | 110 K | ? | |
YES | 111 K | ? | |
 | 117 K>Q | Benign | TCGA novel | NCI-TCGA
 | 117 K>R | Decreased function | CA238116578, rs568892292 | ClinGen, Ensembl
YES x2 | 131 K | ? | |
YES x2 | 148 K>E | Decreased function | CA6675838, rs752565874 | ClinGen, ExAC, TOPMed, gnomAD
YES | 151 K | ? | |
YES | 153 K | ? | |

TABLE 6. List of the lysine residues (K) of human interleukin 10 (IL10) by their position in the sequence (numbers under “Position”); whether they are predicted to glycate (YES) or not (blank); and the predicted or known effects of mutations or substitutions of these lysines. See the Methods section for a source with links to the Provenance websites. E = glutamic acid; N = asparagine; Q = glutamine.

Predicted Glycation | Position & Modification | Predicted/Known Effect | Variant Identification(s) | Provenance(s)
YES | 52 K | ? | |
YES x2 | 58 K | ? | |
YES | 67 K | ? | |
YES | 75 K | ? | |
YES x2 | 106 K>Q | Decreased function | CA36549305, rs149143243 | ClinGen, ESP
YES x2 | 117 K | ? | |
 | 135 K>N | Decreased function | CA344481848, rs1252096113 | ClinGen, TOPMed
 | 137 K | | |
YES | 143 K>N | Benign | TCGA novel | NCI-TCGA
 | 148 K | | |
YES x2 | 152 K | ? | |
 | 156 K | | |
 | 175 K>E | Decreased function | CA1363724, rs775882367 | ClinGen, ExAC, gnomAD

2.1.7.
Toll-like Receptors (TLR)
TLR mediate the processing of essentially all antigens, interacting with human leukocyte antigens (HLA) to either tolerize or activate adaptive immune responses, and TLR form a complex network of both synergistic and antagonistic interactions to regulate these responses [71]. Glycation of TLR could therefore have significant detrimental effects on immune function and regulation. FIGURE 7 demonstrates that all TLR have multiple lysines that are likely to glycate. Because the vast majority of the literature concerning TLR glycation focuses on TLR4, we will limit our detailed discussion of the effects of glycation on TLR to TLR4, assuming that glycation has similar effects on other TLR. Evidence of TLR2 glycation at Lysine 404 has been provided by Zhang [97] and of TLR4 glycation at unknown sites by Bezold [179]. TLR glycation results in decreased receptor activity [179]. TLR4 is predicted to glycate at as many as 25 lysines. Six of these lysines (of which three are predicted by both programs) have naturally-occurring genetic variants that are thought to decrease function (TABLE 7), so that glycation of these lysines can be predicted to decrease function as well. Clinical and experimental data regarding glycation under hyperglycemic conditions are difficult to interpret, however, because TLR4 (and TLR2) are upregulated by RAGE (at least under moderate hyperglycemic conditions up to 15 mM glucose) [180-183]. Thus, any decrease of TLR function due to glycation of the receptors themselves may be overwhelmed by the increase in their expression by RAGE activation. A further complication is that the classical TLR4 ligand, endotoxin, has been found to be significantly increased in diabetic patients [184], also promoting increased TLR4 expression. Studies separating out these varied factors are rare. Son, et al.
[185] found that RAGE-independent treatment of TLR4 inhibited its function and the TLR-dependent production of pro-inflammatory cytokines. Jialal, et al. [186] also found that if the glucose concentration is raised from 15 mM to 30 mM, “there was a significant decrease in both TLR2 and TLR4 expression (60% and 80% decrease respectively, p<0.001 by ANOVA) [and] phagocytosis of S. aureus was decreased in PMN [polymorphonuclear leukocytes] incubated with 30mM glucose (50% decrease, p<0.05) and killing of S. aureus was impaired in PMN incubated with the bacteria at 30mM glucose concentrations compared to 5.5 mM glucose”. In sum, it is likely that hyperglycemia induces glycation of key lysines on TLR that results in innately decreased receptor function, while other factors such as RAGE activation and increased levels of TLR ligands simultaneously increase TLR expression. Clearly, further research both on the specific sites of glycation and on their functional effects in response to infections under hyperglycemic conditions is needed.
PROTEIN | UNIPROT ID | PREDICTED GLYCATION SITES
Toll-like receptor 10 (EC 3.2.2.6) (CD antigen CD290) | Q9BXR5 | 73, 90, 109, 153, 192, 212, 243, 325, 349, 366, 375, 383, 412, 432, 443, 460, 515, 565, 622, 629, 676, 744, 746, 752, 769, 763
Toll-like receptor 1 (EC 3.2.2.6) (Toll/interleukin-1 receptor-like protein) (TIL) (CD antigen CD281) | Q15399 | 33, 46, 70, 90, 104, 107, 117, 133, 153, 166, 178, 200, 222, 239, 245, 282, 329, 346, 389, 415, 416, 448, 458, 462, 466, 520, 537, 652, 661, 677, 689, 745, 747, 762, 779
Toll-like receptor 2 (EC 3.2.2.6) (Toll/interleukin-1 receptor-like protein 4) (CD antigen CD282) | O60603 | 19, 37, 137, 150, 164, 192, 195, 208, 252, 260, 271, 308, 347, 360, 404, 413, 437, 439, 488, 505, 526, 619, 628, 633, 671, 676, 683, 695, 698, 743, 759, 783
Toll-like receptor 3 (CD antigen CD283) | O15455 | 27, 41, 89, 97, 102, 137, 139, 145, 147, 187, 200, 201, 210, 240, 272, 330, 335, 345, 371, 382, 418, 421, 493, 559, 589, 613, 619, 697, 745, 765, 767, 779, 785, 808, 859, 872, 883, 892, 900
Toll-like receptor 4 (EC 3.2.2.6) (hToll) (CD antigen CD284) | O00206 | 47, 57, 153, 166, 186, 224, 274, 230, 324, 341, 349, 351, 354, 362, 388, 435, 477, 560, 561, 615, 653, 666, 694, 773, 819
Toll-like receptor 5 (Toll/interleukin-1 receptor-like protein 3) | O60602 | 88, 106, 137, 145, 168, 196, 218, 275, 315, 323, 326, 361, 369, 385, 414, 420, 639, 670, 680, 692, 753, 812, 844, 845
FIGURE 7. Summary of results of glycation predictions for TLR sequences made by two programs. Numbers refer to the amino acid sequence in the UniProt database. Those numbers on a magenta background are unique predictions made by NetGlycate; those numbers on a light gray background are unique predictions made by PredGly; those numbers in white on a maroon background are predictions agreed upon by NetGlycate and PredGly. Gly-PseAAC became unavailable before we were able to run these protein sequences through its glycation prediction program.
The K404 in TLR2 that is italicized, bolded, underlined and in orange font is known to glycate [97] (see also: https://cplm.biocuckoo.cn/View.php?id=CPLM002619, accessed 15 June 2023).
Figure 7 (cont’d)
Toll-like receptor 6 (EC 3.2.2.6) (CD antigen CD286) | Q9Y2C9 | 3, 5, 10, 38, 40, 48, 53, 97, 114, 134, 140, 157, 160, 184, 197, 229, 240, 276, 301, 356, 386, 390, 394, 400, 421, 453, 463, 467, 471, 525, 542, 560, 573, 682, 697, 750, 752
Toll-like receptor 7 | Q9NYK1 | 9, 23, 32, 44, 54, 108, 114, 119, 130, 165, 197, 205, 207, 212, 239, 274, 293, 311, 315, 334, 373, 410, 421, 424, 432, 478, 480, 493, 502, 509, 567, 593, 596, 600, 651, 659, 684, 688, 693, 694, 699, 726, 731, 740, 758, 764, 776, 821, 875, 877, 900, 922, 951, 952, 960, 961, 968, 982, 997, 999
Toll-like receptor 8 (CD antigen CD288) | Q9NR97 | 63, 125, 163, 189, 219, 226, 235, 301, 308, 314, 350, 465, 412, 435, 453, 476, 484, 602, 608, 632, 639, 652, 677, 753, 759, 763, 794, 812, 866, 868, 889, 911, 940, 941, 948, 949, 952, 957, 998, 1039
Toll-like receptor 9 (CD antigen CD289) | Q9NR96 | 51, 95, 143, 207, 234, 292, 326, 328, 347, 445, 532, 669, 678, 690, 723, 725, 878, 932
TABLE 7. List of the lysine residues (K) of human Toll-like receptor 4 (TLR4) by their position in the sequence (numbers under “Position”); whether they are predicted to glycate (YES) or not (blank); and the predicted or known effects of mutations or substitutions of these lysines aggregated from the UniProt database disease and variant function (www.uniprot.org accessed 14 May-19 May 2023). See the Methods section for a source with links to the Provenance websites. E = glutamic acid; I = isoleucine; M = methionine; N = asparagine; Q = glutamine; R = arginine; T = threonine; * = deletion.
Variant Identification(s) Provenance(s) Predicted Glycation YES x2 YES YES YES YES YES x2 YES Predicted /Known Effect ? Position & Modification 47 K 57 K 130 K 150 K 153 K 166 K 186 K 224 K 230 K>E Decreased ? ?
function 244 K CA5212262, rs200759610 ClinGen, ExAC, gnomAD 53 Table 7 (cont’d) YES YES x2 271 K>E Decreased function ? 274 K 324 K>N Decreased function YES x2 341 K>E Decreased function YES x2 349 K>E Decreased function YES 351 K>R Decreased function YES 354 K>N Benign: Natural variant YES 354 K>Q Decreased function YES 362 K>N Decreased YES x2 YES YES YES YES YES YES x2 YES x2 YES YES YES function ? 388 K 402 K>E Decreased function 435 K 477 K>E Decreased function ? ? 541 K 560 K 561 K 595 K 615 K 631 K ? 653 K 666 K ? 694 K>N Benign: Natural variant 729 K 732 K>N Decreased function ? 773 K 776 K 813 K 819 K>N Decreased function CA5212276, rs775417409 ClinGen, ExAC, gnomAD CA374655869, rs1408648883 CA374655980, rs1202183903 CA5212324, rs753477338 CA5212328, rs752106276 CA5212330, rs56070048 CA374656062, rs1304901531 CA5212336, rs778840011 CA5212352, rs539153708 CA374656879, rs1304711191 ClinGen, TOPMed ClinGen, TOPMed ClinGen, ExAC, TOPMed, gnomAD ClinGen, ExAC, gnomAD ClinGen, 1000Genomes, ESP, ExAC, TOPMed, gnomAD ClinGen, TOPMed ClinGen, ExAC, gnomAD ClinGen, 1000Genomes, ExAC, TOPMed ClinGen, gnomAD CA5212469, rs747209292 ClinGen, ExAC. NCI-TCGA, gnomAD CA374658590, rs769601560 ClinGen, ExAC, TOPMed, gnomAD CA374659159, rs201274947 ClinGen, ESP, ExAC, TOPMed, gnomAD 54 2.1.8. Immunoglobulins Immunoglobulins make possible specific antibody-mediated immune responses to antigens and are one of the main triggers of the complement cascade [72]. A selection of immunoglobulin chains and their predicted glycation sites is provided in FIGURE 8. Genetic variants are not available on UniProt for most of the immunoglobulin chains listed in FIGURE 8 with the exception of the immunoglobulin J chain (P01591) (TABLE 8). Two of its four lysines are predicted to glycate at sites where studies have reported that lysines mutated to other amino acids resulted in decreased function (TABLE 8). 
No mutations of the other two lysines have been documented and thus the effects of modifying these are unknown. More generally, there is no doubt that immunoglobulins (Ig) do glycate under the hyperglycemic conditions typifying diabetes. The lysine at position 50 in the immunoglobulin lambda constant 1 (IGLC, UniProt ID P01842) was found to be glycated in diabetic patients [186], as were the K67 of immunoglobulin kappa variable 1-5 (UniProt ID P01602) and the K74 of immunoglobulin lambda variable 2-23 (UniProt ID P01705) [77], as well as the K190 and K208 of immunoglobulin 1 constant 1 (IGCL1, UniProt P0CG04) [96]. Additional glycation sites (as many as 17) have been identified on monoclonal and recombinant humanized immunoglobulins produced for pharmaceutical purposes and stored in 50-100 mM glucose solutions [188-190], though how many of these sites become glycated under physiologically-relevant hyperglycemic conditions is unknown. Most studies of diabetic IgG report two to four glycation sites per light and heavy chain [191-194]. Hyperglycemia leads to an increase in the percentage of IgG glycated from 6% among healthy controls to 11% among diabetics and correlates with increases in hemoglobin A1c glycation [191]. This IgG glycation has little effect on binding to antigen [189] but leads to significant decreases in complement fixation [191-195] and thus immunity to bacterial infections [195].
PROTEIN | UNIPROT ID | PREDICTED GLYCATION SITES
Immunoglobulin alpha-2 heavy chain BUT | P0DOX2 | 43, 64, 107, 122, 196, 203, 314, 323, 360, 377, 408, 409
Immunoglobulin delta heavy chain WAH | P0DOX3 | 13, 133, 212, 221, 223, 224, 234, 271, 272, 273, 275, 314, 327, 338, 403
Immunoglobulin epsilon heavy chain ND | P0DOX4 | 13, 23, 98, 220, 302, 327, 352, 367, 388, 391, 435, 459, 499
Immunoglobulin gamma-1 heavy chain NIE | P0DOX5 | 58, 76, 123, 149, 207, 215, 220, 224, 248, 250, 276, 290, 292, 319, 322, 324, 328, 340, 342, 362, 372, 394, 411, 441
Immunoglobulin J chain (Joining chain of multimeric IgA and IgM) | P01591 | 2, 21, 34, 93, 130, 145
Immunoglobulin kappa light chain EU | P0DOX7 | 42, 50, 96, 103, 107, 126, 145, 149, 183, 188, 190, 207
Immunoglobulin lambda-1 light chain MCG | P0DOX8 | 44, 47, 55, 68, 106, 114, 153, 167, 190, 208
Immunoglobulin mu heavy chain OU | P0DOX6 | 13, 45, 58, 73, 77, 174, 198, 201, 213, 222, 224, 237, 254, 256, 276, 292, 308, 361, 364, 391, 438, 499, 554
FIGURE 8. Summary of results of glycation predictions for immunoglobulin sequences made by two programs. Numbers refer to the amino acid sequence in the UniProt database. Those numbers on a magenta background are unique predictions made by NetGlycate; those numbers on a light gray background are unique predictions made by PredGly; those numbers in white on a maroon background are predictions agreed upon by NetGlycate and PredGly. Gly-PseAAC became unavailable before we were able to run these protein sequences through its glycation prediction program.
TABLE 8. List of the lysine residues (K) of human Joining immunoglobulin chain (J-region) by their position in the sequence (numbers under “Position”); whether they are predicted to glycate (YES) or not (blank); and the predicted or known effects of mutations or substitutions of these lysines aggregated from the UniProt database disease and variant function (www.uniprot.org accessed 14 May-19 May 2023).
See the Methods section for a source with links to the Provenance websites. N = asparagine; Q = glutamine; R = arginine.
Predicted/Known Effect ? Variant Identification(s) Provenance(s) Ig Joining Predicted Glycation YES YES YES Position & Modification 2 K 16 K 21 K>N 34 K 36 K>N 92 K>R YES x2 93 K>I YES 130 K>N Decreased function Decreased function Decreased function Decreased function Decreased function CA99049618, rs182852639 CA2952562, rs202199302 CA2952513, rs759488146 CA2952512, rs776788987 CA357147929, rs554652827 YES 130 K>Q Decreased function CA2952494, rs766553791 YES x2 145 K ? ClinGen, 1000Genomes ClinGen, ExAC, TOPMed, gnomAD ClinGen, ExAC, gnomAD ClinGen, ExAC, gnomAD ClinGen, 1000Genomes, ExAC, gnomAD ClinGen, ExAC, gnomAD
2.2. Effects of Glycation on LF, LFpep, Kaliocin and Lysozyme Activity Against Candida albicans
Lactoferrin, its antimicrobial peptides and lysozyme are known to act as antimicrobial agents against Candida albicans [251]. To test whether glycation affected their anti-Candida activity, LF, LFpep, kaliocin and human lysozyme at various concentrations were incubated in deionized water or 6.25 mM, 12.5 mM, or 25 mM glucose solutions for seven to ten days at 37 °C. These incubated preparations were then mixed 1:1 by volume with Candida that had been grown to maximal density in Sabouraud’s medium over seven to ten days at 37 °C and the mixtures incubated for an additional two days. The incubated mixtures were then plated on Petrifilm Y&M Plates, incubated for another two days, and the resulting colonies counted. A range of protein and peptide concentrations and Candida dilutions had to be explored to find appropriate antimicrobial-yeast mixtures that neither killed all the yeast nor resulted in a lack of observable antimicrobial activity.
The results demonstrated that the glucose concentrations utilized here had no significant effect on Candida colony counts in the absence of the antimicrobial proteins or peptides (TABLE 9 and FIGURES 9 and 10), which is not surprising given that a primary component of Sabouraud’s medium is glucose.
FIGURE 9. Colony growth of 1/100,000 dilution of Candida albicans treated with 0, 6.25, 12.5, or 25 mM of glucose (left to right).
FIGURE 10. Colony growth of 1/10,000 dilution of Candida albicans treated with 0, 6.25, 12.5, or 25 mM of glucose (left to right).
TABLE 9. Effects of glucose concentration on LF, LFpep, Kaliocin and Lysozyme Activity Against Candida albicans.
Candida albicans Protein/Peptide 0 Glucose 1/100 in H2O 1608 colonies Lactoferrin 250 µM 0 colonies 6.25 mM Glucose 0 colonies 12.5 mM Glucose 6 colonies 25 mM Glucose 18 colonies 58 1/1,000 in Sabourauds 2952 colonies 1/10,000 in H2O >3000 colonies 1/1000 in H2O >3000 colonies 1/10,000 in H2O 1/100,000 in H2O 1/100,000 In H2O 1/100,000 In H2O 1/100,000 In H2O AVERAGE NONE NONE NONE NONE NONE Lactoferrin 250 µM 0 colonies 0 colonies 5 colonies Kaliocin 500 µM Lfpep 50 µM 142 colonies 41 colonies 187 343 colonies colonies 62 colonies 156 colonies 852 colonies 145 colonies 234 colonies 1048 colonies 1544 colonies 303 colonies 120 colonies 11 colonies 27 colonies 32 colonies 11 colonies 32 colonies 48 colonies 27 colonies 15 colonies 18 colonies 20 colonies 25 colonies 15 20.3 31.7 28.0 colonies 13.7
Incubating LF, LFpep and kaliocin with increasing concentrations of glucose led to decreasing anti-Candida activity (TABLE 9 and FIGURES 11-13). We interpret this decrease in activity to be due to glycation of the protein and peptides, since the glucose itself had no significant effect on the Candida growth.
While lysozyme was able to kill off all Candida colonies at sufficient concentrations, we were unable to identify any appropriate mixture of Candida dilutions and lysozyme concentrations at which a difference in effect correlated with glucose concentration. On the contrary, all lysozyme experiments demonstrated the same effect (or lack thereof) on Candida at every glucose concentration (FIGURE 14). There are three possible explanations for this result. One is that lysozyme does not glycate. Given that hen egg-white lysozyme does glycate extensively and is highly homologous to the human lysozyme employed here, this explanation seems unlikely. A second and more likely explanation is that the glycation sites on the human lysozyme do not interfere with its enzymatic activity. This possibility will require additional experimentation to test. A third possibility is that the antimicrobial mechanism by which lysozyme interferes with Candida growth is not enzymatic but rather its recently discovered ability to intercalate into cell membranes, disrupting membrane stability [252]. In this latter case, glycation might actually stabilize the lysozyme and make it more lipophilic than the non-glycated form, thus, if anything, improving its overall activity [253, 254]. Again, this possibility will require further experimentation. An additional confounding factor is the observation that very large changes in lysozyme concentration are required to observe significant effects on Candida growth [255], so that the protocol utilized here may simply be too insensitive to yield observable results, and a microbe that is more sensitive to lysozyme (such as Micrococcus lysodeikticus) might be more appropriate.
FIGURE 11. Candida albicans colony growth after treatment with LF or LF and 6.25, 12.5, or 25 mM glucose (left to right).
FIGURE 12. Candida albicans colony growth after treatment with LFpep or LFpep and 6.25, 12.5, or 25 mM glucose (left to right).
FIGURE 13.
Candida albicans colony growth after treatment with Kaliocin or Kaliocin and 6.25, 12.5, or 25 mM glucose (left to right).
3. Discussion
Our results can be summarized in four basic conclusions: 1) while glycation prediction models are inconsistent in the specific predictions they make about which lysine residues will glycate on any given peptide or protein, they are consistent in predicting that all key proteins and peptides involved in both innate and adaptive antimicrobial functions are likely to glycate; 2) most of these peptides and proteins are predicted to glycate at multiple sites, the modification of which is known, either from experimental studies or from genetic variant studies, to interfere with the protein’s or peptide’s activity; 3) experimental and clinical evidence is consistent that glycation of immune system proteins and peptides follows hyperglycemia and results in impaired function; and 4) all of the preceding factors add to the inflammatory burden of diabetics through increases in advanced glycation end product production, impairment of molecular mechanisms for neutralizing AGEs (e.g., LTF, lysozyme and TLR), and increased cytokine production associated with both AGEs and infections (summarized in FIGURE 14). These results are consistent with our hypothesis that the unusual susceptibility of diabetics to infection is a result of hyperglycemia-associated glycation of key immune system molecules. As noted in the Introduction, hyperglycemia-driven glycation of immune system proteins and peptides may therefore significantly decrease the immune system’s ability to respond to infections for type 1, type 2 and gestational diabetics.
The resulting immunological deficits may make diabetics particularly susceptible to antibiotic-resistant strains of bacteria and fungi because there are, at present, no reported pathogens, including antibiotic-resistant strains, that are resistant to Lf and Lfcins [44], while kaliocin-1 has been found to be effective against Candida spp., including fluconazole- and amphotericin B-resistant clinical isolates [43]. In addition, Lf and Lfcins have been shown to be synergistic with antimicrobial and antiviral drugs [44], so that glycation-impaired endogenous function may lessen the effectiveness of antibiotic therapies as well. For these reasons, we urge further investigation of the hypothesis that hyperglycemia-associated glycation of immune system proteins and peptides causes impaired immune function leading to increased susceptibility to infection. We must, however, emphasize that the present paper presents a testable hypothesis, not a firm conclusion, that hyperglycemia does induce glycation of immune system molecules leading to impairment of their function. The plausibility of the hypothesis is based on several interrelated types of evidence. We used state-of-the-art AI programs to predict, for the first time, the glycation sites of immune system proteins and peptides; then we demonstrated, using existing clinical studies, that many immune system proteins and peptides are known to glycate as a result of the hyperglycemia experienced by diabetics; next, we cited two types of evidence that such glycation is likely to result in functional impairment: 1) genetic studies demonstrating that substitution of lysines predicted to glycate impairs function; and 2) a review of in vitro studies of some of the proteins and peptides that have already demonstrated impaired function.
More specifically, since TF glycation sites and functional impairment have previously been demonstrated, we use the homology between it and LTF to draw strong inferences about where glycation is likely to occur on LTF and alter its activity; and similarly, since the glycation sites and functional effects of glycation have been well-studied in chicken lysozyme, we use the homology of human lysozyme to chicken lysozyme to draw strong inferences about the glycation sites on human lysozyme and its likely functional effects. In other words, we have drawn together, for the first time, all of the relevant data concerning known and predicted glycation sites for immune system proteins and peptides; compared and contrasted these predictions with each other and with experimental and clinically-derived glycation data; and integrated this glycation information with all of the known or predicted effects of glycation related to this class of proteins and peptides. This highly synthetic endeavor integrates theoretical, experimental and clinical data. Impairment of immune function that follows from such glycation would logically lead to the observed increase in risk for COVID-19 and its complications and, similarly, to impaired immune responses to other infectious diseases as well. However, none of these types of evidence, nor even their accumulation, represents direct validation of the hypothesis, the limitations of which are outlined next.
FIGURE 14. Summary of the types of mechanisms involved in hyperglycemia-induced impairment of immune function and enhancement of inflammation in diabetic patients. Antimicrobial proteins and peptides such as lysozyme, lactoferrin, lactoferricin and kaliocin, Toll-like receptors such as TLR2 and TLR4 (and all the other peptides and proteins discussed in this paper) become glycated under hyperglycemic conditions.
Their ability to neutralize microbial infections is thus impaired because their ability to bind to their targets is sterically blocked. The build up of glycated proteins leads to production of advanced glycation end products (AGEs) that are hyperinflammatory. Notably, lactoferrin, lysozyme and TLR4 are all involved in neutralizing AGEs so that their own glycation interferes with their AGE-neutralization functions. In consequence, increased inflammation results from concurrent infection and AGE build up. Additional inflammatory processes that follow from the ones illustrated here are discussed in Section 3.4 below.
3.1. Limitations of Current Glycation Prediction Models
One serious limitation of this study is that the glycation prediction models utilized are not consistent in their predictions. The results for serotransferrin (Section 2.1.1) and lactotransferrin (Section 2.1.2) are illustrative. In both cases, each glycation prediction program predicted many more glycated lysines than were found experimentally or clinically. Moreover, the predictions shared by more than one glycation prediction program were no more likely to be correct than the unique predictions made by individual programs. The problems underlying the inaccuracy of the programs are too complex to attempt to identify and resolve here and will be taken up in a separate publication. Suffice it to say that the experimental data utilized for training these programs may need to be better curated to differentiate between food preparation, pharmaceutical storage and diabetic hyperglycemia conditions; features other than amino acid sequences and secondary structure may need to be taken into account; some lysines may be inaccessible due to interactions with other proteins; the rate of protein or peptide turnover may also be a critical factor; and models that are better adapted to address the glycation of peptides rather than proteins are clearly needed.
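The comparison between predictions shared by several programs and predictions made by at least one program can be expressed as a simple set computation. The following is only a minimal sketch of that bookkeeping, not the analysis performed in this study: the lysine positions, the two program outputs, and the helper name `score` are hypothetical placeholders chosen for illustration.

```python
from typing import NamedTuple


class Score(NamedTuple):
    true_pos: int   # predicted and experimentally known
    false_pos: int  # predicted but not (yet) observed experimentally
    false_neg: int  # experimentally known but not predicted


def score(predicted: set[int], known: set[int]) -> Score:
    """Compare predicted lysine positions with experimentally known ones."""
    return Score(
        true_pos=len(predicted & known),
        false_pos=len(predicted - known),
        false_neg=len(known - predicted),
    )


# Hypothetical lysine positions, for illustration only (not data from this study).
program_a = {12, 40, 77, 103}
program_b = {40, 58, 103, 150}
known_sites = {40, 58}

shared = program_a & program_b      # sites both programs agree on
aggregated = program_a | program_b  # sites predicted by at least one program

shared_score = score(shared, known_sites)
aggregated_score = score(aggregated, known_sites)
print("shared:", shared_score)
print("aggregated:", aggregated_score)
```

In this toy case the shared predictions miss one known site, whereas the aggregated predictions miss none at the cost of more unconfirmed sites. A site counted here as a "false positive" may simply not have been tested experimentally yet.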
That said, it is interesting to note that in no instance did we find an experimental or clinical report of glycation at a site on any of the proteins or peptides that we studied here that was not predicted by at least one of the programs employed. Thus, while none of the programs were particularly accurate, the use of several programs ensured that all of the lysines known to be glycated under diabetes-appropriate conditions were actually predicted to glycate. In other words, we observed many “false positives” but no “false negatives” in the aggregated predictions. More and better experimental and clinical protein and peptide glycation data directly relevant to understanding diabetic hyperglycemia would be very useful in this regard and will be, in any case, necessary to test the hypothesis presented here.
3.2. Limitations of Available Data
The other major limitation of the study is the lack of available data concerning glycation sites on immune system proteins and peptides and, in particular, the effects of glycation on their functions. As noted repeatedly in the Results, while most of the proteins and peptides investigated here are known to glycate under some set of conditions, the degree to which they glycate under normoglycemic, physiological conditions is not known for almost any of them, nor is the degree to which clinically-relevant hyperglycemic conditions may increase glycation. What little data is available suggests that rates and extent of glycation of most of the proteins and peptides will imitate hemoglobin A1c, which may result in significant functional impairment for many or all. However, until this prediction is tested experimentally and clinically, the full extent of immune system impairment due to glycation must remain hypothetical. Several types of experiments are needed in particular.
The first are in vitro experiments exposing each immune system protein or peptide to normoglycemic and physiologically-relevant hyperglycemic concentrations of sugars (e.g., glucose, fructose) for reasonable spans of time (e.g., several days to weeks). The resulting materials can then be used for two distinct purposes. One would be for quantitation and identification of the glycation sites on each protein or peptide, which would yield data validating or invalidating the various predictions made by the AI glycation prediction models employed here. The second purpose would be to test the relative activity of each protein or peptide under normoglycemic and hyperglycemic conditions using some functional assay such as their antimicrobial activity. For example, LTF and its peptides and lysozyme all kill various Candida strains (Table 1 and [43, 44]), so that dose-response curves of the antimicrobial activity of the proteins and peptides glycated under normoglycemic and various hyperglycemic conditions could be compared. Other types of functional assays are also possible since many immune system proteins activate down-stream immunological processes. For example, LTF, lysozyme, C1q and TLR are involved in the processing of AGEs and their glycation may interfere with this function. Antibodies are necessary to activate one of the complement pathways and complement proteins are necessary for complement fixation of immune system targets, so that antibody and/or complement glycation may impair the complement pathway. Cytokines have a variety of immune system stimulatory and inhibitory effects that may be impaired by their glycation. The results of such functional assays can then be correlated with the degree of glycation of the relevant proteins or peptides observed using mass spectrometry.
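As a rough illustration of the correlation step described above, the sketch below computes a Pearson correlation coefficient between the degree of glycation and the remaining activity of a protein. It is a hedged example only: the values are hypothetical placeholders rather than measurements, the variable and function names are ours, and a real analysis would need replicate measurements and an appropriate statistical test.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical values, for illustration only (not measurements from this study):
# percent of protein glycated (e.g., from mass spectrometry) and the
# antimicrobial activity relative to the non-glycated control.
glycation_percent = [5.0, 10.0, 15.0, 20.0]
relative_activity = [1.0, 0.8, 0.6, 0.4]

r = pearson_r(glycation_percent, relative_activity)
print(f"Pearson r = {r:.2f}")
```

A strongly negative coefficient in such an analysis would be consistent with glycation-dependent loss of function, although correlation alone would not establish the mechanism.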
Finally, by far the most difficult types of experiments to perform from a technical perspective, but also the most important, will be the characterization and comparison of glycation of immune system molecules from clinical specimens. Just as all individuals have some degree of HbA1c and serum albumin glycation, one can expect that some degree of glycation will be found for many, or all, immune system molecules isolated from normoglycemic individuals. Such studies can be carried out using mass spectrometry techniques in conjunction with various protein purification methods such as high pressure liquid chromatography. The essential measure will be the degree to which glycation increases on these molecules among hyperglycemic individuals as compared with normoglycemic individuals and how well such (at this point, hypothetical) glycation correlates with (again hypothetical) impairment of immune function.
3.3. Systemic Effects of Immune System Protein and Peptide Glycation
The results summarized here suggest that the extent of impairment caused by glycation of immune system peptides and proteins could be quite significant, affecting both innate and adaptive immunity. Predicted effects of glycation at specific lysine residues within each immune system peptide or protein have been outlined in Section 2 above and will not be summarized here other than to point out that these effects may be cumulative and synergistic. For example, if TLR do not function properly, then activation of monocytes, T cells and B cells will all be adversely affected, compounding the impairment caused by immunoglobulin glycation and complement glycation. It is worth reiterating that many complement proteins have already been demonstrated to become heavily glycated under diabetic hyperglycemic conditions [96, 97], so that the complement system itself will be adversely impacted under hyperglycemic conditions.
Moreover, some components of the complement system, such as C1q, are involved in recognition and removal of AGEs [195] so that their glycation may result in enhanced AGE-associated inflammation and complications. The overall result is very likely to be a significant diminution in antimicrobial function as well as an overall increase in inflammatory damage. Additional systemic effects resulting from the summation of these individual glycations can also be predicted. For example, one of the key functions of lactoferrin, lysozyme and complement factor C1q is to bind to advanced glycation end products (AGEs) [103, 196, 197], moderating the pathogenic effects of these AGEs. However, binding to AGEs blocks the bacterial agglutination and bactericidal properties of both proteins [198, 199] so that the buildup of AGEs due to unregulated hyperglycemia in diabetics can be predicted to decrease the antimicrobial functions of lactoferrin and lysozyme. In fact, the regions responsible for lactoferrin’s antimicrobial activity (residues 20-37 and 613-630) [201] are identical to the regions responsible for lactoferrin’s binding to AGEs [198]. Thus, it appears likely that these protein responses to AGEs are actually a form of innate autoimmunity in which “self” proteins modified with AGEs are recognized as foreign microbial antigens. This conclusion is further strengthened by the observation that antibodies from diabetic patients are highly cross-reactive with glycated polylysine [202, 203], suggesting that the innate autoimmunity triggered by AGEs translates into adaptive autoimmunity as well. Lactoferrin and lysozyme further share AGE binding sites with defensins, the antimicrobial activity of which may be similarly affected [198].
Glycation of LTF, lysozyme and defensins themselves adds another layer of complications for understanding how hyperglycemia may affect immune function: decreased binding to AGEs due to glycation can be predicted to increase the availability of each protein for antimicrobial activity but the antimicrobial activity of each is likely to be directly inhibited by glycation; meanwhile, the decreased neutralization of AGEs by these proteins is likely to increase diabetic complications systemically and also increase autoantibody production against AGE-modified “self” proteins.
3.4. Increased Inflammation
One thing is almost certain: glycation of immune system proteins and peptides will lead to their participation in advanced glycation end-product (AGE) production, and AGEs are well-known drivers of hyperinflammation-associated complications in diabetics. AGE effects are largely mediated by their receptors (RAGE), which are non-specific multi-ligand pattern recognition receptors that mediate initiation of distinct signaling cascades and activation of numerous transcription factors involved in inflammation generally and diabetes in particular [204-209]. Because key proteins and peptides such as lactotransferrin and lysozyme that normally neutralize AGEs are likely to be glycated (and inactivated) themselves, RAGE activation is likely to increase inflammatory processes generally for diabetics (see FIGURE 14 above). Another way in which glycation may increase inflammatory processes is by interfering with other protein and peptide modification processes such as glycosylation and ubiquitination that are essential for regulation of function and for cellular housekeeping. Glycosylation is the enzymatic addition of sugar residues to proteins and peptides and can control essential functional features of proteins and peptides such as folding and binding.
Ubiquitination is another enzyme-mediated reaction that regulates protein turnover and marks damaged proteins for degradation. Interfering with these and other enzyme-mediated modifications of proteins and peptides can result in the buildup of non-functional or deleterious protein products that can drive inflammatory processes [210-214]. Evidence for disruptions of protein function related to interference with glycosylation, ubiquitination and other modification processes has been observed in diabetic patients. For example, LF in women with gestational diabetes (GD) has altered glycosylation patterns, possibly due to glycation, and the amounts of LF and immunoglobulins are both significantly decreased in the breast milk of GD patients [133, 215], perhaps because glycation interferes with signal peptide functions on these molecules. Indeed, we noted with regard to lysozyme C (Section 2.1.3) that glycosylation of the lysines at position 2 and/or 20 could interfere with proteolysis of the signal sequence, thereby interfering with lysozyme function. Reference to the Compendium of Protein Lysine Modifications (https://cplm.biocuckoo.cn/View.php?id=CPLM004870, accessed 18 May 2023) illustrates the fact that the lysine residues of most proteins are not only susceptible to glycation, but also to malonylation, crotonylation, carboxymethylation, ubiquitination, acetylation, and succinylation, each of which has its own effects on the processing and stability of the protein. Thus, studies of the effects of glycation on, for example, subsequent ubiquitination-mediated removal of these proteins may be enlightening since it is probable that glycated proteins, while functionally damaged, are also more difficult to ubiquitinate and remove. Such an effect has been demonstrated in mice fed high-glycemic-index diets and is compounded by glycation of ubiquitin itself [216].
Overall, several reviews of the effects of diabetes on immune function have documented through a combination of human clinical, ex vivo and animal studies that hyperglycemia results in a very broad array of immune system dysfunctions that include: suppression of cytokine production and activity; defects in leukocyte recruitment; impaired pathogen recognition; neutrophil, macrophage, monocyte and natural killer cell dysfunction; and decreased complement activation by antibody [217, 218]. All of these effects can be explained, at least in part, by glycation of the relevant immune system proteins and peptides regulating these immunological functions.

3.5. Anti-Glycation Interventions

If the hypothesis proposed here is correct, and it is hyperglycemia rather than diabetes per se that is responsible for the increased risk of infection associated with diabetes (see Introduction), then one obvious way to reduce that risk among diabetics is better glucose regulation [219]. Given that such control is not always possible, a number of reasonably well-validated nutraceutical and pharmaceutical interventions significantly reduce protein and peptide glycation. These include a variety of amino acids, particularly those with free amino groups in their side chains and most prominently lysine itself [116] but also including glycine, arginine and peptides rich in the foregoing amino acids [220]; taurine [221]; carnosine [222]; as well as vitamin C [223, 224]. These compounds are likely to glycate more readily during digestion or after absorption into the blood stream than more complex and sterically or conformationally protected peptides and proteins. L-lysine supplementation, for example, has been shown to protect lysozyme activity and prevent its glycation in type 2 diabetics [116] and can therefore be presumed to have similar protective effects on other immune system peptides and proteins.
However, there appear to be no long-term studies of the consequences of producing and clearing the resulting glycation products either for digestion or renal clearance, so further research is needed. Bariatric surgery may also provide a means to lower average glycemic index and consequent infection risk among patients for whom the surgery is appropriate. Studies [225, 226] have demonstrated that such surgeries significantly decreased the risk of severe COVID-19 and subsequent death among obese type 2 diabetic patients compared with matched controls, which is consistent with the normalization of their glycemic index that followed surgery.

3.6. Therapeutic Use of Immune System Peptides and Proteins

Finally, for diabetic patients with pre-existing and difficult-to-control hyperglycemia-associated glycation, therapeutic use of immune system peptides and proteins may be beneficial. For example, ingestion of lactoferrin at doses ranging from 300 mg to 1 g per day has proven to have clinically demonstrable benefits in well-controlled studies of various viral and bacterial diseases including influenza, SARS and MERS [227-229]. This positive track record led many groups to suggest using lactoferrin supplementation to prevent SARS-CoV-2 infection or to treat mild cases, and many studies found it effective [230-234] (though not against already established hospitalized cases [235]). In this regard, it is important to note that there are no reported pathogens, including antibiotic-resistant strains, that are resistant to LTF or lactoferricin [64] while kaliocin-1 has been demonstrated to be effective against many Candida spp., including fluconazole- and amphotericin B-resistant clinical isolates [63]. Therefore, LTF supplementation, and/or use of LTF anti-microbial peptides, may be particularly important for supporting improved infection resistance among hyperglycemic patients.
Similarly, interferon treatment appears to be beneficial for some COVID-19 patients and in treating other viral infections among people with hyperglycemia. SARS-CoV-2 itself deploys an array of proteins that directly interfere with interferon function [236] so that many studies have found that supplementary interferon, particularly in the early stages of COVID-19, is effective at decreasing the severity of disease, risk of death, and complications [237-239]. Additionally, for some COVID-19 patients, immunoglobulins from recovered patients have proven to be beneficial, particularly during the first 72 hours following appearance of symptoms and among milder cases [239-241]. One difficulty with all of the studies cited above is that none have broken down data on efficacy or inefficacy as a function of diabetes/hyperglycemia as a risk factor. This is an unfortunate consequence of employing truly randomized controlled studies. Given the data integrated in this paper, it is likely that diabetics with high glycation loads and other people with uncontrolled hyperglycemia represent an immunologically distinct group with unusual infection risks and mechanistic complications. Among the latter must be included the possibility that, for example, interferons are less effective among hyperglycemics than among other people because of glycation of interferon receptors, an issue not addressed above. Similarly, immunoglobulins (as well as monoclonal antibodies) may prove to be less effective among hyperglycemics than among other groups because of the well-established problem of glycation of complement proteins. Thus, complement supplementation may also be needed. Additional problems may arise in delivering LTF or LTF peptide supplementation if patients are ingesting high sugar (glucose, fructose, etc.) diets since such diets may glycate LTF during digestion, inactivating it before it can be absorbed.
On the other hand, proteins high in lysine residues, such as LTF and caseins, may blunt some of the potential hyperglycemia that such high-sugar diets would otherwise produce by acting like lysine supplementation (see Section 3.5 above).

3.7. Implications Beyond Diabetes: Stress-Induced Hyperglycemia and Obesity

Our hypothesis may have implications beyond diabetes because hyperglycemia can occur transiently or even chronically among non-diabetics as well. One well-established cause of hyperglycemia is stress. Stress-induced hyperglycemia (SIH) is defined as a blood glucose level above 180 mg/dl in patients without pre-existing diabetes that results in impaired insulin function and may require exogenous insulin treatment [242, 243]. It is not uncommon among hospitalized patients. The extent to which SIH may result in significant glycation of immune system proteins and peptides is completely unknown and is likely to depend on how quickly any given protein or peptide glycates, the length of time the stress-induced hyperglycemia lasts, and the degree of hyperglycemia. Should significant glycation accompany SIH, as might occur if it has been established for some time prior to hospitalization, then all of the types of immune system impairments described here would be likely to occur. On the other hand, SIH is easily treatable with insulin [242, 243] and its prompt recognition and treatment should predictably preempt most or all adverse immunological effects. However, given the total lack of data concerning the effects of SIH both on immune function and on protein glycation, the preceding is completely conjectural. Another possible cause of immune impairment predicted by our hypothesis might result from repeated, transient hyperglycemic episodes resulting from, for example, ingestion of sugared beverages and snacks such as donuts or candy bars, many times daily.
A meta-analysis of the literature found that ingestion of more than 95 grams of glucose per 2000 kcal diet daily was very significantly correlated with development of metabolic disease and type 2 diabetes [244]. Such habits are also associated with fasting blood glucose levels of 126 mg/dl or more and 2-hour post-prandial blood sugar levels above 160 mg/dl [245-247]. Since some proteins and peptides begin to glycate significantly within 30 minutes at these glucose concentrations [95], it is possible that impairment of immune system molecules accompanies metabolic syndrome and pre-diabetes. Again, in the absence of any relevant studies, this possibility is purely conjectural.

3.8. Summary

For all of the reasons summarized in this Discussion section, we argue that people experiencing chronic or repetitive hyperglycemia represent a unique group at risk for immune dysfunction resulting from glycation of immune system proteins and peptides. The consequences affect key components of both the innate and adaptive immune systems. A much more complete understanding of the complex effects that hyperglycemia can induce in these peptides and proteins, and on their individual and integrated functions, is necessary in order to devise appropriate clinical trials, test interventions, and evaluate their results. Current protein glycation prediction models are of dubious benefit because they produce numerous contradictory predictions. Only through new and different kinds of research can the unusual infectious disease risks of hyperglycemic patients be understood and treated or prevented.

4. Materials and Methods

4.1. Glycation Site Prediction

The consensus primary sequence, as well as structural, sequence mutagenesis and variants, and molecular interaction data for each protein was obtained from the UniProt database (https://www.uniprot.org, accessed 1 June 2022-3 March 2023).
Lysines likely to undergo glycation were identified using three publicly available glycation-prediction programs:

1) PredGly: predicting lysine glycation sites for Homo sapiens based on XGboost feature optimization [248]: https://github.com/yujialinncu/PredGly (accessed 12-17 June 2023). The PredGly machine learning system for predicting glycation of proteins is based on five elements: (i) a validated training set and an independent test set of H. sapiens glycation sites were derived from the database at http://plmd.biocuckoo.org/, which included 2473 non-redundant proteins; up to 15 residues upstream and downstream of each glycation and non-glycation site were then selected for analysis, so that the protein sequences were truncated to 31-residue symmetrical windows; (ii) three types of feature information were extracted: sequence-based features, physicochemical properties, and evolutionary relationships; (iii) a novel method named XGboost, which optimized feature selection, was used with support vector machine learning methods for classification, regression and outlier detection; (iv) six common classifiers were tested using cross-validation to select the optimal classifier with the highest “Area Under the Curve” of the “Receiver Operating Characteristic” curve (AUC-ROC), a standard means of measuring the performance of machine learning models; (v) the model was evaluated and optimized against other lysine glycation machine-learning prediction algorithms.

2) The NetGlycate predictive neural network [249]: https://services.healthtech.dtu.dk/services/NetGlycate-1.0/ (accessed 1 June 2022-14 June 2023). Manual investigation of published studies of glycation of mammalian proteins yielded 20 protein sequences from UniProt with 89 glycated lysines and 126 nonglycated lysines, which were used as the training set. Evidence of glycation was either from in vivo studies or in vitro studies mimicking physiological conditions.
Amino acid sequences up to 24 positions on either side of the glycated or unglycated lysine were investigated for their influence on glycation probability. A standard feed-forward architecture for the neural networks was employed with a back-propagation algorithm for training. A cross-validation procedure was employed in which the data set was divided into n parts, of which one part was used for testing network accuracy and the other n - 1 parts for training the network. This procedure was repeated n times, thus using each of the n parts as a test set. Networks were trained with sequence input alone or together with the relative sequence position. Networks were combined using a balloting procedure such that those nets that best differentiated the glycated from the non-glycated inputs were deemed best.

3) Gly-PseAAC: prediction of lysine glycation in proteins incorporating with position-specific propensity [250]: http://app.aporc.org/Gly-PseAAC/index.html (accessed 30 March 2023; this site became unavailable the next day). A Support Vector Machine (SVM) algorithm was trained on a curated, non-redundant set of 323 glycation sites from 72 proteins from the same database (CPLM http://cplmbiocuckoo.org/ ) used by PredGly above. Sequence order information and position specific amino acid propensity (PSAAP) were combined in the glycation residue prediction, and the amino acids seven residues on either side of each lysine were used to analyze the probability of glycation. Thus, the proteins were divided into “peptides”, each of total length 15 with a lysine at the center, yielding 1165 peptide samples. Within the training set, 223 “peptides” were glycated and 942 were non-glycated. 446 unglycated peptides were then randomly selected from the negative subset. Sequence order information and position specific amino acid propensity were utilized to convert peptide fragments into mathematical expressions.
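The three programs described above share a common preprocessing pattern: each lysine is represented by a fixed-length, lysine-centered sequence window, negative examples may be randomly subsampled, and performance is estimated by n-fold cross-validation. The following is a minimal illustrative sketch of that pattern only; it is not the code of PredGly, NetGlycate or Gly-PseAAC. The function names, the toy sequence, and the use of "X" to pad windows that extend past a terminus are all assumptions introduced here for illustration.

```python
import random


def lysine_windows(sequence, flank, pad="X"):
    """Return (position, window) pairs for every lysine (K) in `sequence`.

    Each window has length 2 * flank + 1 with the lysine at its center.
    Positions are 1-based. Padding with `pad` at the termini is an
    assumption; the published tools may treat termini differently.
    """
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # In the padded string, residue i sits at index i + flank,
            # so the window starts at index i.
            windows.append((i + 1, padded[i:i + 2 * flank + 1]))
    return windows


def sample_negatives(negatives, n, seed=0):
    """Randomly select n non-glycated windows without replacement."""
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    return rng.sample(negatives, n)


def n_fold_splits(items, n):
    """Yield (train, test) lists so that each of the n parts is tested once."""
    folds = [items[i::n] for i in range(n)]
    for k in range(n):
        test = folds[k]
        train = [x for j, fold in enumerate(folds) if j != k for x in fold]
        yield train, test


# Hypothetical toy sequence, not one of the proteins analyzed here.
toy = "MKALK"
peptides_15 = lysine_windows(toy, flank=7)   # 15-residue windows
peptides_31 = lysine_windows(toy, flank=15)  # 31-residue windows
```

With flank=7 the windows correspond to the 15-residue "peptides" described for Gly-PseAAC, and with flank=15 to the 31-residue windows described for PredGly. Because the three programs differ in window length, training data and classifier, the same lysine can receive different predictions from each.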
Glycation sites identified using the three programs were analyzed in terms of the known consequences of site modification at the glycation sites listed in the UniProt database (https://www.uniprot.org, accessed 1 June 2022-20 June 2023). Where available, additional information concerning possible glycation effects was extrapolated from published studies of homologous proteins by searching on PubMed or Google Scholar using all reasonable combinations of the words “glycation”, “glycate”, “glycated” and the name of each protein.

4.2. Peptides and Proteins

Two proteins and two peptides were tested in the C. albicans assay described above. The proteins were human recombinant lysozyme (Sigma-Aldrich, St. Louis, MO, USA; >100,000 units/mg, item # L1667) and lactoferrin purified from human milk (Sigma-Aldrich, >90% pure by SDS-PAGE, item # SRP6519). The two peptides were proteolytic cleavage products of lactoferrin: Kaliocin (NH2-FFSASCVPGADKGQFPNLCRLCAGTGENKCA-OH, >95% purity by mass spectroscopy) and Lfpepc (NH2-TKCFQWQRNMRKVRGPPVSCIKR-OH, >95% by mass spectroscopy), both synthesized by RS Synthesis (P.O. Box 70301, Louisville, KY 40270 USA).

4.3. Candida albicans assays

Candida albicans was obtained from Carolina Biological Supply Company (2700 York Road, Burlington, NC 27215-3398, item #155965) and cultured for seven days in Sabouraud’s medium (Carolina Biological Supply Co., item #786781), prepared as 65 g of Sabouraud Dextrose Agar per liter of deionized water and boiled for five minutes to ensure sterility. The yeast was then exposed to varying concentrations of unglycated and glycated antimicrobial peptides and proteins. Because the peptides/proteins exhibited different amounts of antimicrobial activity, it was necessary to explore a range of peptide/protein concentrations and C.
albicans concentrations in order to obtain results in which the peptide/protein in water was able to kill all, or nearly all, of the yeast while the peptides/proteins in glucose solutions displayed varying amounts of anti-yeast activity. Dilutions of the standard culture were therefore prepared 1/100, 1/1000 and 1/10,000 in deionized water, to which various concentrations of glucose, or peptide or protein solutions, or peptide/proteins in glucose, were added. D-Glucose (Sigma-Aldrich, St. Louis, MO, USA) solutions were prepared in deionized water at 6.35 mM glucose, 12.5 mM glucose and 25 mM glucose and incubated at 37 °C for seven days. These glucose solutions were added 1:1 by volume to each of the C. albicans culture dilutions and incubated at 37 °C for two more days, after which 200 µL of each combination was plated on Petrifilm Y&M Plates (Carolina Biological Supply, Co., item #824010) and incubated for two more days. Peptide and protein solutions were previously prepared at concentrations ranging from 1 mM to 10 µM in deionized water, 6.35 mM glucose (in deionized water), 12.5 mM glucose (in deionized water), and 25 mM glucose (in deionized water) and incubated at 37 °C for seven days. These peptide and protein solutions were combined 1:1 by volume with the various C. albicans dilutions (1/100, 1/1000, 1/10,000). These combinations were then incubated at 37 °C for two more days, after which 200 µL of each combination was plated on Petrifilm Y&M Plates. The plates, which contain an incorporated yeast/fungal dye, were incubated at 37 °C for a further three days and then scanned using an Epson scanner. In some instances, the images were color and/or contrast enhanced in order to optimize colony counting. Colonies were counted using ImageJ Version 1.54f, 29 June 2023 (https://imagej.net/ij/download.html, accessed 5 July 2023).

BIBLIOGRAPHY

1.
Gattinoni L, Gattarello S, Steinberg I, Busana M, Palermo P, Lazzari S, Romitti F, Quintel M, Meissner K, Marini JJ, Chiumello D, Camporota L. COVID-19 pneumonia: pathophysiology and management. Eur Respir Rev. 2021 Oct 20;30(162):210138. doi: 10.1183/16000617.0138-2021.
2. Hu B, Guo H, Zhou P, Shi ZL. Characteristics of SARS-CoV-2 and COVID-19. Nat Rev Microbiol. 2021 Mar;19(3):141-154. doi: 10.1038/s41579-020-00459-7. Epub 2020 Oct 6. Erratum in: Nat Rev Microbiol. 2022 May;20(5):315.
3. Zabidi NZ, Liew HL, Farouk IA, Puniyamurti A, Yip AJW, Wijesinghe VN, Low ZY, Tang JW, Chow VTK, Lal SK. Evolution of SARS-CoV-2 Variants: Implications on Immune Escape, Vaccination, Therapeutic and Diagnostic Strategies. Viruses. 2023 Apr 10;15(4):944. doi: 10.3390/v15040944.
4. Lee GC, Restrepo MI, Harper N, Manoharan MS, Smith AM, Meunier JA, Sanchez-Reilly S, Ehsan A, Branum AP, Winter C, Winter L, Jimenez F, Pandranki L, Carrillo A, Perez GL, Anzueto A, Trinh H, Lee M, Hecht JM, Martinez C, Sehgal RT, Cadena J, Walter EA, Oakman K, Benavides R, Pugh JA; South Texas Veterans Health Care System COVID-19 team, Letendre S, Steri M, Orrù V, Fiorillo E, Cucca F, Moreira AG, Zhang N, Leadbetter E, Agan BK, Richman DD, He W, Clark RA, Okulicz JF, Ahuja SK. Immunologic resilience and COVID-19 survival advantage. J Allergy Clin Immunol. 2021 Sep 8:S0091-6749(21)01363-4. doi: 10.1016/j.jaci.2021.08.021.
5. Williamson EJ, Walker AJ, Bhaskaran K, Bacon S, Bates C, Morton CE, Curtis HJ, Mehrkar A, Evans D, Inglesby P, Cockburn J, McDonald HI, MacKenna B, Tomlinson L, Douglas IJ, Rentsch CT, Mathur R, Wong AYS, Grieve R, Harrison D, Forbes H, Schultze A, Croker R, Parry J, Hester F, Harper S, Perera R, Evans SJW, Smeeth L, Goldacre B. Factors associated with COVID-19-related death using OpenSAFELY. Nature. 2020 Aug;584(7821):430-436. doi: 10.1038/s41586-020-2521-4.
6. Chadeau-Hyam M, Bodinier B, Elliott J, Whitaker MD, Tzoulaki I, Vermeulen R, Kelly-Irving M, Delpierre C, Elliott P.
Risk factors for positive and negative COVID-19 tests: a cautious and in-depth analysis of UK biobank data. Int J Epidemiol. 2020 Oct 1;49(5):1454-1467. doi: 10.1093/ije/dyaa134.
7. Orioli L, Hermans MP, Thissen JP, Maiter D, Vandeleene B, Yombi JC. COVID-19 in diabetic patients: Related risks and specifics of management. Ann Endocrinol (Paris). 2020 Jun;81(2-3):101-109. doi: 10.1016/j.ando.2020.05.001.
8. Barron E, Bakhai C, Kar P, Weaver A, Bradley D, Ismail H, Knighton P, Holman N, Khunti K, Sattar N, Wareham NJ, Young B, Valabhji J. Associations of type 1 and type 2 diabetes with COVID-19-related mortality in England: a whole-population study. Lancet Diabetes Endocrinol. 2020 Oct;8(10):813-822. doi: 10.1016/S2213-8587(20)30272-2.
9. Lustig Y, Sapir E, Regev-Yochay G, Cohen C, Fluss R, Olmer L, Indenbaum V, Mandelboim M, Doolman R, Amit S, Mendelson E, Ziv A, Huppert A, Rubin C, Freedman L, Kreiss Y. BNT162b2 COVID-19 vaccine and correlates of humoral immune responses and dynamics: a prospective, single-centre, longitudinal cohort study in health-care workers. Lancet Respir Med. 2021 Sep;9(9):999-1009. doi: 10.1016/S2213-2600(21)00220-4.
10. Pilishvili T, Gierke R, Fleming-Dutra KE, et al. Effectiveness of mRNA Covid-19 vaccine among U.S. health care personnel. N Engl J Med. 2021;NEJMoa2106599. doi: 10.1056/NEJMoa2106599.
11. Brosh-Nissimov T, Orenbuch-Harroch E, Chowers M, Elbaz M, Nesher L, Stein M, Maor Y, Cohen R, Hussein K, Weinberger M, Zimhony O, Chazan B, Najjar R, Zayyad H, Rahav G, Wiener-Well Y. BNT162b2 vaccine breakthrough: clinical characteristics of 152 fully vaccinated hospitalized COVID-19 patients in Israel. Clin Microbiol Infect. 2021 Jul 7:S1198-743X(21)00367-0. doi: 10.1016/j.cmi.2021.06.036.
12. Naruse K. Does glycemic control rescue type 2 diabetes patients from COVID-19-related deaths? J Diabetes Investig. 2020 Jul;11(4):792-794. doi: 10.1111/jdi.13320.
13. Zhu L, She ZG, Cheng X, Qin JJ, Zhang XJ, Cai J, Lei F, Wang H, Xie J, Wang
W, Li H, Zhang P, Song X, Chen X, Xiang M, Zhang C, Bai L, Xiang D, Chen MM, Liu Y, Yan Y, Liu M, Mao W, Zou J, Liu L, Chen G, Luo P, Xiao B, Zhang C, Zhang Z, Lu Z, Wang J, Lu H, Xia X, Wang D, Liao X, Peng G, Ye P, Yang J, Yuan Y, Huang X, Guo J, Zhang BH, Li H. Association of Blood Glucose Control and Outcomes in Patients with COVID-19 and Pre-existing Type 2 Diabetes. Cell Metab. 2020 Jun 2;31(6):1068-1077.e3. doi: 10.1016/j.cmet.2020.04.021.
14. Bode B, Garrett V, Messler J, McFarland R, Crowe J, Booth R, Klonoff DC. Glycemic Characteristics and Clinical Outcomes of COVID-19 Patients Hospitalized in the United States. J Diabetes Sci Technol. 2020 Jul;14(4):813-821. doi: 10.1177/1932296820924469. Epub 2020 May 9. Erratum in: J Diabetes Sci Technol. 2020 Jun 10;:1932296820932678.
15. Fadini GP, Morieri ML, Boscari F, Fioretto P, Maran A, Busetto L, Bonora BM, Selmin E, Arcidiacono G, Pinelli S, Farnia F, Falaguasta D, Russo L, Voltan G, Mazzocut S, Costantini G, Ghirardini F, Tresso S, Cattelan AM, Vianello A, Avogaro A, Vettor R. Newly-diagnosed diabetes and admission hyperglycemia predict COVID-19 severity by aggravating respiratory deterioration. Diabetes Res Clin Pract. 2020 Oct;168:108374. doi: 10.1016/j.diabres.2020.108374.
16. Zhang N, Yun R, Liu L, Yang L. Association of glycosylated hemoglobin and outcomes in patients with COVID-19 and pre-existing type 2 diabetes: A protocol for systematic review and meta-analysis. Medicine (Baltimore). 2020 Nov 20;99(47):e23392. doi: 10.1097/MD.0000000000023392.
17. Chang MC, Hwang JM, Jeon JH, Kwak SG, Park D, Moon JS. Fasting Plasma Glucose Level Independently Predicts the Mortality of Patients with Coronavirus Disease 2019 Infection: A Multicenter, Retrospective Cohort Study. Endocrinol Metab (Seoul). 2020 Sep;35(3):595-601. doi: 10.3803/EnM.2020.719.
18. Wander PL, Lowy E, Beste LA, et al. Prior glucose-lowering medication use and 30-day outcomes among 64,892 veterans with diabetes and COVID-19. Diabetes Care.
Published online October 6, 2021. doi: 10.2337/dc21-1351.
19. Barrett CE, Park J, Kompaniyets L, Baggs J, Cheng YJ, Zhang P, Imperatore G, Pavkov ME. Intensive Care Unit Admission, Mechanical Ventilation, and Mortality Among Patients With Type 1 Diabetes Hospitalized for COVID-19 in the U.S. Diabetes Care. 2021 Aug;44(8):1788-1796. doi: 10.2337/dc21-0604.
20. Réa RR, Bernardelli RS, Kozesinski-Nakatani AC, et al. Dysglycemias in patients admitted to ICUs with severe acute respiratory syndrome due to COVID-19 versus other causes - a cohort study. BMC Pulm Med. 2023;23(1):173. Published 2023 May 16. doi: 10.1186/s12890-023-02439-y.
21. Wang S, Ma P, Zhang S, Song S, Wang Z, Ma Y, Xu J, Wu F, Duan L, Yin Z, Luo H, Xiong N, Xu M, Zeng T, Jin Y. Fasting blood glucose at admission is an independent predictor for 28-day mortality in patients with COVID-19 without previous diagnosis of diabetes: a multi-centre retrospective study. Diabetologia. 2020 Oct;63(10):2102-2111. doi: 10.1007/s00125-020-05209-1.
22. Zhang J, Kong W, Xia P, Xu Y, Li L, Li Q, Yang L, Wei Q, Wang H, Li H, Zheng J, Sun H, Xia W, Liu G, Zhong X, Qiu K, Li Y, Wang H, Wang Y, Song X, Liu H, Xiong S, Liu Y, Cui Z, Hu Y, Chen L, Pan A, Zeng T. Impaired Fasting Glucose and Diabetes Are Related to Higher Risks of Complications and Mortality Among Patients With Coronavirus Disease 2019. Front Endocrinol (Lausanne). 2020 Jul 10;11:525. doi: 10.3389/fendo.2020.00525.
23. Rodríguez-Rodríguez N, Martínez-Jiménez I, García-Ojalvo A, Mendoza-Mari Y, Guillén-Nieto G, Armstrong DG, Berlanga-Acosta J. Wound Chronicity, Impaired Immunity and Infection in Diabetic Patients. MEDICC Rev. 2021 Sep 17. doi: 10.37757/MR2021.V23.N3.8.
24. Reich MS, Fernandez I, Mishra A, Kafchinski L, Adler A, Nguyen MP. Diabetic Control Predicts Surgical Site Infection Risk in Orthopaedic Trauma Patients. J Orthop Trauma. 2019 Oct;33(10):514-517. doi: 10.1097/BOT.0000000000001512.
25.
Martin ET, Kaye KS, Knott C, Nguyen H, Santarossa M, Evans R, Bertran E, Jaber L. Diabetes and Risk of Surgical Site Infection: A Systematic Review and Meta-analysis. Infect Control Hosp Epidemiol. 2016 Jan;37(1):88-99. doi: 10.1017/ice.2015.249.
26. Richards JE, Kauffmann RM, Zuckerman SL, Obremskey WT, May AK. Relationship of hyperglycemia and surgical-site infection in orthopaedic surgery. J Bone Joint Surg Am. 2012 Jul 3;94(13):1181-6. doi: 10.2106/JBJS.K.00193.
27. Torres A, Blasi F, Dartois N, Akova M. Which individuals are at increased risk of pneumococcal disease and why? Impact of COPD, asthma, smoking, diabetes, and/or chronic heart disease on community-acquired pneumonia and invasive pneumococcal disease. Thorax. 2015 Oct;70(10):984-9. doi: 10.1136/thoraxjnl-2015-206780.
28. Casqueiro J, Casqueiro J, Alves C. Infections in patients with diabetes mellitus: A review of pathogenesis. Indian J Endocrinol Metab. 2012 Mar;16 Suppl 1(Suppl1):S27-36. doi: 10.4103/2230-8210.94253.
29. Chávez-Reyes J, Escárcega-González CE, Chavira-Suárez E, León-Buitimea A, Vázquez-León P, Morones-Ramírez JR, Villalón CM, Quintanar-Stephano A, Marichal-Cancino BA. Susceptibility for Some Infectious Diseases in Patients With Diabetes: The Key Role of Glycemia. Front Public Health. 2021 Feb 16;9:559595. doi: 10.3389/fpubh.2021.559595.
30. Rao Kondapally Seshasai S, Kaptoge S, Thompson A, Di Angelantonio E, Gao P, Sarwar N, Whincup PH, Mukamal KJ, Gillum RF, Holme I, Njølstad I, Fletcher A, Nilsson P, Lewington S, Collins R, Gudnason V, Thompson SG, Sattar N, Selvin E, Hu FB, Danesh J; Emerging Risk Factors Collaboration. Diabetes mellitus, fasting glucose, and risk of cause-specific death. N Engl J Med. 2011 Mar 3;364(9):829-841. doi: 10.1056/NEJMoa1008862. Erratum in: N Engl J Med. 2011 Mar 31;364(13):1281.
31. Cheong HS, Chang Y, Kim Y, Joo EJ, Kwon MJ, Wild SH, Byrne CD, Ryu S. Glycaemic status, insulin resistance, and risk of infection-related mortality: a cohort study.
Eur J Endocrinol. 2023 Feb 14;188(2):lvad011. doi: 10.1093/ejendo/lvad011.
32. Edwards JM, Watson N, Focht C, Wynn C, Todd CA, Walter EB, Heine RP, Swamy GK. Group B Streptococcus (GBS) Colonization and Disease among Pregnant Women: A Historical Cohort Study. Infect Dis Obstet Gynecol. 2019 Feb 3;2019:5430493. doi: 10.1155/2019/5430493.
33. Li YX, Long DL, Liu J, Qiu D, Wang J, Cheng X, Yang X, Li RM, Wang G. Gestational diabetes mellitus in women increased the risk of neonatal infection via inflammation and autophagy in the placenta. Medicine (Baltimore). 2020 Oct 2;99(40):e22152. doi: 10.1097/MD.0000000000022152.
34. Lukic A, Napoli A, Santino I, Bianchi P, Nobili F, Ciampittiello G, Nardone MR, Santomauro M, Di Properzio M, Caserta D. Cervicovaginal bacteria and fungi in pregnant diabetic and non-diabetic women: a multicenter observational cohort study. Eur Rev Med Pharmacol Sci. 2017 May;21(10):2303-2315. PMID: 28617561.
35. Nguyen LM, Omage JI, Noble K, McNew KL, Moore DJ, Aronoff DM, Doster RS. Group B streptococcal infection of the genitourinary tract in pregnant and non-pregnant patients with diabetes mellitus: An immunocompromised host or something more? Am J Reprod Immunol. 2021 Dec;86(6):e13501. doi: 10.1111/aji.13501.
36. Birch, M. N., Frank, Z., & Caughey, A. B. (2019). Rates of neonatal sepsis by maternal diabetes and chronic hypertension [12D]. Obstetrics & Gynecology, 133, 45S-44S.
37. Nielsen TB, Pantapalangkoor P, Yan J, Luna BM, Dekitani K, Bruhn K, Tan B, Junus J, Bonomo RA, Schmidt AM, Everson M, Duncanson F, Doherty TM, Lin L, Spellberg B. Diabetes Exacerbates Infection via Hyperinflammation by Signaling through TLR4 and RAGE. mBio. 2017 Aug 22;8(4):e00818-17. doi: 10.1128/mBio.00818-17.
38. Yan L, Li Y, Tan T, Qi J, Fang J, Guo H, Ren Z, Gou L, Geng Y, Cui H, Shen L, Yu S, Wang Z, Zuo Z.
RAGE-TLR4 Crosstalk Is the Key Mechanism by Which High Glucose Enhances the Lipopolysaccharide-Induced Inflammatory Response in Primary Bovine Alveolar Macrophages. Int J Mol Sci. 2023 Apr 10;24(8):7007. doi: 10.3390/ijms24087007.
39. Suren Garg S, Kushwaha K, Dubey R, Gupta J. Association between obesity, inflammation and insulin resistance: Insights into signaling pathways and therapeutic interventions. Diabetes Res Clin Pract. 2023 Jun;200:110691. doi: 10.1016/j.diabres.2023.110691.
40. Root-Bernstein R. Synergistic Activation of Toll-Like and NOD Receptors by Complementary Antigens as Facilitators of Autoimmune Disease: Review, Model and Novel Predictions. Int J Mol Sci. 2020 Jun 30;21(13):4645. doi: 10.3390/ijms21134645.
41. Root-Bernstein R. Innate Receptor Activation Patterns Involving TLR and NLR Synergisms in COVID-19, ALI/ARDS and Sepsis Cytokine Storms: A Review and Model Making Novel Predictions and Therapeutic Suggestions. Int J Mol Sci. 2021 Feb 20;22(4):2108. doi: 10.3390/ijms22042108.
42. Bansal R, Gubbi S, Muniyappa R. Metabolic Syndrome and COVID 19: Endocrine-Immune-Vascular Interactions Shapes Clinical Course. Endocrinology. 2020 Oct 1;161(10):bqaa112. doi: 10.1210/endocr/bqaa112.
43. Alzaid F, Julla JB, Diedisheim M, Potier C, Potier L, Velho G, Gaborit B, Manivet P, Germain S, Vidal-Trecan T, Roussel R, Riveline JP, Dalmas E, Venteclef N, Gautier JF. Monocytopenia, monocyte morphological anomalies and hyperinflammation characterise severe COVID-19 in type 2 diabetes. EMBO Mol Med. 2020 Oct 7;12(10):e13038. doi: 10.15252/emmm.202013038.
44. Ferguson M, Vel J, Phan V, Ali R, Mabe L, Cherner A, Doan T, Manakatt B, Jose M, Powell AR, McKinney K, Serag H, Sallam HS. Coronavirus Disease 2019, Diabetes, and Inflammation: A Systemic Review. Metab Syndr Relat Disord. 2023 May;21(4):177-187. doi: 10.1089/met.2022.0090.
45.
Vasbinder A, Anderson E, Shadid H, Berlin H, Pan M, Azam TU, Khaleel I, Padalia K, Meloche C, O'Hayer P, Michaud E, Catalan T, Feroze R, Blakely P, Launius C, Huang Y, Zhao L, Ang L, Mikhael M, Mizokami-Stout K, Pennathur S, Kretzler M, Loosen SH, Chalkias A, Tacke F, Giamarellos-Bourboulis EJ, Reiser J, Eugen-Olsen J, Feldman EL, Pop-Busui R, Hayek SS; ISIC Study Group. Inflammation, Hyperglycemia, and Adverse Outcomes in Individuals With Diabetes Mellitus Hospitalized for COVID- 19. Diabetes Care. 2022 Mar 1;45(3):692-700. doi: 10.2337/dc21-2102 46. Madaschi S, Resmini E, Bonfadini S, Massari G, Gamba P, Sandri M, Calza S, Cimino E, Zarra E, Dotti S, Mascadri C, Agosti B, Garrafa E, Girelli A. Predictive 85 markers for clini-cal outcomes in a cohort of diabetic patients hospitalized for COVID- 19. Diabetol Metab Syndr. 2022 Nov 12;14(1):168. doi: 10.1186/s13098-022-00941-7. 47. Zhang X, Dai J, Li L, Chen H, Chai Y. NLRP3 Inflammasome Expression and Sig-naling in Human Diabetic Wounds and in High Glucose Induced Macrophages. J Diabe-tes Res. 2017;2017:5281358. doi: 10.1155/2017/5281358. 48. Dai J, Zhang X, Li L, Chen H, Chai Y. Autophagy Inhibition Contributes to ROS- Producing NLRP3-Dependent Inflammasome Activation and Cytokine Secretion in High Glucose-Induced Macrophages. Cell Physiol Biochem. 2017;43(1):247-256. doi: 10.1159/000480367. 49. Ding Y, Ding X, Zhang H, Li S, Yang P, Tan Q. Relevance of NLRP3 Inflam- masome-Related Pathways in the Pathology of Diabetic Wound Healing and Possible Therapeutic Targets. Oxid Med Cell Longev. 2022 Jun 30;2022:9687925. doi: 10.1155/2022/9687925. Liu D, Yang P, Gao M, Yu T, Shi Y, Zhang M, Yao M, Liu Y, Zhang X. NLRP3 50. activa-tion induced by neutrophil extracellular traps sustains inflammatory response in the dia-betic wound. Clin Sci (Lond). 2019 Feb 18;133(4):565-582. doi: 10.1042/CS20180600. 51. Acosta JB, del Barco DG, Vera DC, Savigne W, Lopez-Saura P, Guillen Nieto G, Schultz GS. 
The pro-inflammatory environment in recalcitrant diabetic foot wounds. Int Wound J. 2008 Oct;5(4):530-9. doi: 10.1111/j.1742-481X.2008.00457.x. 52. Kumar NP, Fukutani KF, Shruthi BS, Alves T, Silveira-Mattos PS, Rocha MS, West K, Natarajan M, Viswanathan V, Babu S, Andrade BB, Kornfeld H. Persistent inflammation during anti-tuberculosis treatment with diabetes comorbidity. Elife. 2019 Jul 4;8:e46477. doi: 10.7554/eLife.46477. Tarachandani D, Singhal K, Goyal A, Joshi A, Joshi R. Quantum of Stress 53. Hypergly-cemia at the Time of Initial Diagnosis of Tuberculosis. Cureus. 2023 Mar 20;15(3):e36382. doi: 10.7759/cureus.36382. 54. Bezerra AL, Moreira ADSR, Isidoro-Gonçalves L, Lara CFDS, Amorim G, Silva EC, Kritski AL, Carvalho ACC. Clinical, laboratory, and radiographic aspects of patients with pulmonary tuberculosis and dysglycemia and tuberculosis treatment outcomes. J Bras Pneumol. 2022 Nov 28;48(6):e20210505. doi: 10.36416/1806-3756/e20210505. Johnson RB. Periodontitis as a component of hyperinflammation: treating 55. periodon-titis in obese diabetic patients. Compend Contin Educ Dent. 2007 Sep;28(9):500-4; quiz 506, 528. 56. Nishimura F, Kono T, Fujimoto C, Iwamoto Y, Murayama Y. Negative effects of chronic inflammatory periodontal disease on diabetes mellitus. J Int Acad Periodontol. 2000 Apr;2(2):49-55. 86 57. Berbudi A, Rahmadika N, Tjahjadi AI, Ruslami R. Type 2 Diabetes and its Impact on the Immune System. Curr Diabetes Rev. 2020;16(5):442-449. doi: 10.2174/1573399815666191024085838. 58. Wang G. Human antimicrobial peptides and proteins. Pharmaceuticals (Basel). 2014 May 13;7(5):545-94. doi: 10.3390/ph7050545 59. Kowalczyk P, Kaczyńska K, Kleczkowska P, Bukowska-Ośko I, Kramkowski K, Sulejczak D. The Lactoferrin Phenomenon-A Miracle Molecule. Molecules. 2022 May 4;27(9):2941. doi: 10.3390/molecules27092941 60. Brouwer CP, Rahman M, Welling MM. Discovery and development of a synthetic peptide derived from lactoferrin for clinical use. Peptides. 
2011 Sep;32(9):1953-63. doi: 10.1016/j.peptides.2011.07.017. 61. Bruni N, Capucchio MT, Biasibetti E, Pessione E, Cirrincione S, Giraudo L, Corona A, Dosio F. Antimicrobial Activity of Lactoferrin-Related Peptides and Applications in Human and Veterinary Medicine. Molecules. 2016 Jun 11;21(6):752. doi: 10.3390/molecules21060752 62. Drago-Serrano ME, Campos-Rodriguez R, Carrero JC, de la Garza M. Lactoferrin and Peptide-derivatives: Antimicrobial Agents with Potential Use in Nonspecific Immunity Modulation. Curr Pharm Des. 2018;24(10):1067-1078. doi: 10.2174/1381612824666180327155929. 63. Viejo-Díaz M, Andrés MT, Fierro JF. Different anti-Candida activities of two human lactoferrin-derived peptides, Lfpep and kaliocin-1. Antimicrob Agents Chemother. 2005 Jul;49(7):2583-8. doi: 10.1128/AAC.49.7.2583-2588.2005. Zarzosa-Moreno D, Avalos-Gómez C, Ramírez-Texcalco LS, Torres-López E, 64. Ramírez-Mondragón R, Hernández-Ramírez JO, Serrano-Luna J, de la Garza M. Lactoferrin and Its Derived Peptides: An Alternative for Combating Virulence Mechanisms Developed by Pathogens. Molecules. 2020 Dec 8;25(24):5763. doi: 10.3390/molecules25245763 Jrad Z, El-Hatmi H, Adt I, Gouin S, Jardin J, Oussaief O, Dbara M, Arroum S, 65. Khorchani T, Degraeve P, Oulahal N. Antilisterial activity of dromedary lactoferrin peptic hydrolysates. J Dairy Sci. 2019 Jun;102(6):4844-4856. doi: 10.3168/jds.2018-15548. 66. Hayes M, Ross RP, Fitzgerald GF, Hill C, Stanton C. Casein-derived antimicrobial peptides generated by Lactobacillus acidophilus DPC6026. Appl Environ Microbiol. 2006 Mar;72(3):2260-4. doi: 10.1128/AEM.72.3.2260-2264.2006 67. Ouertani A, Chaabouni I, Mosbah A, Long J, Barakat M, Mansuelle P, Mghirbi O, Najjari A, Ouzari HI, Masmoudi AS, Maresca M, Ortet P, Gigmes D, Mabrouk K, Cherif A. Two New Secreted Proteases Generate a Casein-Derived Antimicrobial Peptide in Bacillus cereus Food Born Isolate Leading to Bacterial Competition in Milk. Front Microbiol. 2018 Jun 4;9:1148. 
doi: 10.3389/fmicb.2018.01148. 87 68. Vordenbäumen S, Braukmann A, Petermann K, Scharf A, Bleck E, von Mikecz A, Jose J, Schneider M. Casein α s1 is expressed by human monocytes and upregulates the production of GM-CSF via p38 MAPK. J Immunol. 2011 Jan 1;186(1):592-601. doi: 10.4049/jimmunol.1001461. 69. Ragland SA, Criss AK. From bacterial killing to immune modulation: Recent insights into the functions of lysozyme. PLoS Pathog. 2017 Sep 21;13(9):e1006512. doi: 10.1371/journal.ppat.1006512. 70. Akdis M, Aab A, Altunbulakli C, Azkur K, Costa RA, Crameri R, Duan S, Eiwegger T, Eljaszewicz A, Ferstl R, Frei R, Garbani M, Globinska A, Hess L, Huitema C, Kubo T, Komlosi Z, Konieczna P, Kovacs N, Kucuksezer UC, Meyer N, Morita H, Olzhausen J, O'Mahony L, Pezer M, Prati M, Rebane A, Rhyner C, Rinaldi A, Sokolowska M, Stanic B, Sugita K, Treis A, van de Veen W, Wanke K, Wawrzyniak M, Wawrzyniak P, Wirz OF, Zakzuk JS, Akdis CA. Interleukins (from IL-1 to IL-38), interferons, transforming growth factor β, and TNF-α: Receptors, functions, and roles in diseases. J Allergy Clin Immunol. 2016 Oct;138(4):984-1010. doi: 10.1016/j.jaci.2016.06.033. 71. Kumar V. Toll-like receptors in sepsis-associated cytokines storm and their endogenous negative regulators as future immunomodulatory targets. Int. Immunopharmacol. 2020;89:107087. doi: 10.1016/j.intimp.2020.107087. 72. Kaur H. Characterization of glycosylation in monoclonal antibodies and its importance in therapeutic antibody development. Crit Rev Biotechnol. 2021 Mar;41(2):300-315. doi: 10.1080/07388551.2020.1869684. Johansen, M.B., Kiemer, L., and Søren Brunak (2006) Analysis and prediction of 73. mammalian protein glycation Glycobiology (September 2006) 16 (9): 844-853. 74. Shin A, Connolly S, Kabytaev K. Protein glycation in diabetes mellitus. Adv Clin Chem. 2023;113:101-156. doi: 10.1016/bs.acc.2022.11.003. 75. Dills, William L. Jr. (1993) Protein fructosylation: Fructose and the Maillard reaction. 
Am J Clin Nutr 1993; 58 (suppl):779S-787S. 76. Suárez G Nonenzymatic glycation of bovine serum albumin by fructose (fructation). Comparison with the Maillard reaction initiated by glucose.J Biol Chem. 1989 Mar 5;264(7):3674-9. 77. O’Harte FPM, Penney AC, Flatt PR: Glycation of insulin by phosphorylated and non-phosphorylated reducing sugars. Biochem Soc Trans25 :150S ,1997 78. Ling, X., Sakashita, N., Takeya, M., Nagai, R., Horiuchi, S., and Takahashi, K. (1998) Immunohistochemical distribution and subcellular localization of three distinct specific molecular structures of advanced glycation end products in human tissues. Lab. Invest., 78, 1591–1606. 88 79. Garlick, R.L. and Mazer, J.S. (1983) The principal site of nonenzymatic glycosylation of human serum albumin in vivo. J. Biol. Chem., 258, 6142–6146. 80. Shilton, B.H. and Walton, D.J. (1991) Sites of glycation of human and horse liver alcohol dehydrogenase in vivo. J. Biol. Chem., 266, 5587–5592. 81. Yazdanpanah S, Rabiee M, Tahriri M, Abdolrahim M, Rajab A, Jazayeri HE, Tayebi L. Evaluation of glycated albumin (GA) and GA/HbA1c ratio for diagnosis of diabetes and glycemic control: A comprehensive review. Crit Rev Clin Lab Sci. 2017 Jun;54(4):219-232. doi: 10.1080/10408363.2017.1299684. Takahashi S, Uchino H, Shimizu T, Kanazawa A, Tamura Y, Sakai K, Watada H, 82. Hirose T, Kawamori R, Tanaka Y. Comparison of glycated albumin (GA) and glycated hemoglobin (HbA1c) in type 2 diabetic patients: usefulness of GA for evaluation of short-term changes in glycemic control. Endocr J. 2007 Feb;54(1):139-44. doi: 10.1507/endocrj.k06-103. 83. Hu H, Jiang H, Ren H, Hu X, Wang X, Han C. AGEs and chronic subclinical inflammation in diabetes: disorders of immune system. Diabetes Metab Res Rev. 2015 Feb;31(2):127-37. doi: 10.1002/dmrr.2560. 84. Mengstie MA, Chekol Abebe E, Behaile Teklemariam A, Tilahun Mulu A, Agidew MM, Teshome Azezew M, Zewde EA, Agegnehu Teshome A. 
Endogenous advanced glycation end products in the pathogenesis of chronic diabetic complications. Front Mol Biosci. 2022 Sep 15;9:1002710. doi: 10.3389/fmolb.2022.1002710 85. Song, Fei and Schmidt, Ann Marie. (2012). Glycation & Insulin Resistance: Novel Mechanisms and Unique Targets?Arterioscler Thromb Vasc Biol. 2012 Aug; 32(8): 1760–1765. 86. Abdel-Wahab YHA, O’Harte FPM, Ratcliff H, McClenaghan NH, Barnett CR, Flatt PR: Glycation of insulin in the islets of Langerhans of normal and diabetic animals. Diabetes45 :1489 –1496,1996 87. Abdel-Wahab YHA, O’Harte FPM, Barnett CR, Flatt PR: Characterization of insulin glycation in insulin-secreting cells maintained in tissue culture. J Endocrinol152 :59 –67,1997 88. Abdel-Wahab, YHA, et al. Glycation of insulin results in reduced biological activity in mice. Acta Diabetologica. December 1997, Volume 34, Issue 4, pp 265-270 89. Hunter. SJ, et al. Demonstration of Glycated Insulin in Human Diabetic Plasma and Decreased Biological Activity Assessed by Euglycemic-Hyperinsulinemic Clamp Technique in Humans. Diabetes February 2003 vol. 52 no. 2 492-498. 90. McKillop, AM, et al. Evaluation of the site(s) of glycation in human proinsulin by ion-trap LCQ electrospray ionization mass spectrometry. Regul Pept. 2003 May 15;113(1-3):1-8. 89 91. Guedes, S., et al. (2009). Mass Spectrometry Characterization of the Glycation Sites of Bovine Insulin by Tandem Mass Spectrometry Journal of the American Society for Mass Spectrometry Volume 20, Issue 7, July 2009, Pages 1319–1326. 92. Ma L, Yang C, Huang L, Chen Y, Li Y, Cheng C, Cheng B, Zheng L, Huang K. Glycated Insulin Exacerbates the Cytotoxicity of Human Islet Amyloid Polypeptides: a Vicious Cycle in Type 2 Diabetes. ACS Chem Biol. 2019 Mar 15;14(3):486-496. doi: 10.1021/acschembio.8b01128 93. Walke PB, Bansode SB, More NP, Chaurasiya AH, Joshi RS, Kulkarni MJ. Molecular investigation of glycated insulin-induced insulin resistance via insulin signaling and AGE-RAGE axis. 
Biochim Biophys Acta Mol Basis Dis. 2021 Feb 1;1867(2):166029. doi: 10.1016/j.bbadis.2020.166029. Jeevanandam J, Paramasivam E, Saraswathi NT. Glycation restrains open- 94. closed conformation of Insulin. Comput Biol Chem. 2023 Feb;102:107803. doi: 10.1016/j.compbiolchem.2022.107803. 95. Rhinesmith T, Turkette T, Root-Bernstein R. Rapid Non-Enzymatic Glycation of the Insulin Receptor under Hyperglycemic Conditions Inhibits Insulin Binding In Vitro: Implications for Insulin Resistance. Int J Mol Sci. 2017 Dec 2;18(12):2602. doi: 10.3390/ijms18122602. Zhang Q, Tang N, Schepmoes AA, Phillips LS, Smith RD, Metz TO. Proteomic 96. profiling of nonenzymatically glycated proteins in human plasma and erythrocyte membranes. J Proteome Res. 2008 May;7(5):2025-32. doi: 10.1021/pr700763r. Zhang Q, Monroe ME, Schepmoes AA, Clauss TR, Gritsenko MA, Meng D, 97. Petyuk VA, Smith RD, Metz TO. Comprehensive identification of glycated peptides and their glycation motifs in plasma and erythrocytes of control and diabetic subjects. J Proteome Res. 2011 Jul 1;10(7):3076-88. doi: 10.1021/pr200040j. 98. Rabbani N, Ashour A, Thornalley PJ. Mass spectrometric determination of early and advanced glycation in biology. Glycoconj J. 2016 Aug;33(4):553-68. doi: 10.1007/s10719-016-9709-8. 99. Price CL, Hassi HO, English NR, Blakemore AI, Stagg AJ, Knight SC. Methylglyoxal modulates immune responses: relevance to diabetes. J Cell Mol Med. 2010 Jun;14(6B):1806-15. doi: 10.1111/j.1582-4934.2009.00803.x. 100. Lai SWT, Lopez Gonzalez EJ, Zoukari T, Ki P, Shuck SC. Methylglyoxal and Its Adducts: Induction, Repair, and Association with Disease. Chem Res Toxicol. 2022 Oct 17;35(10):1720-1746. doi: 10.1021/acs.chemrestox.2c00160. 101. Nesargikar PN, Spiller B, Chavez R. The complement system: history, pathways, cascade and inhibitors. Eur J Microbiol Immunol (Bp). 2012 Jun;2(2):103-11. doi: 10.1556/EuJMI.2.2012.2.2. 90 102. Fortpied J, Vertommen D, Van Schaftingen E. 
Binding of mannose-binding lectin to fructosamines: a potential link between hyperglycaemia and complement activation in diabetes. Diabetes Metab Res Rev. 2010 May;26(4):254-60. doi: 10.1002/dmrr.1079. 103. Chikazawa M, Shibata T, Hatasa Y, Hirose S, Otaki N, Nakashima F, Ito M, Machida S, Maruyama S, Uchida K. Identification of C1q as a Binding Protein for Advanced Glycation End Products. Biochemistry. 2016 Jan 26;55(3):435-46. doi: 10.1021/acs.biochem.5b00777. 104. Ghosh P, Vaidya A, Sahoo R, Goldfine A, Herring N, Bry L, Chorev M, Halperin JA. Glycation of the complement regulatory protein CD59 is a novel biomarker for glucose handling in humans. J Clin Endocrinol Metab. 2014 Jun;99(6):E999-E1006. doi: 10.1210/jc.2013-4232. 105. Bogdanet D, Toth Castillo M, Doheny H, Dervan L, Luque-Fernandez MA, Halperin JA, O'Shea PM, Dunne FP. The Diagnostic Accuracy of Second Trimester Plasma Glycated CD59 (pGCD59) to Identify Women with Gestational Diabetes Mellitus Based on the 75 g OGTT Using the WHO Criteria: A Prospective Study of Non-Diabetic Pregnant Women in Ireland. J Clin Med. 2022 Jul 4;11(13):3895. doi: 10.3390/jcm11133895 106. Qin X, Goldfine A, Krumrei N, Grubissich L, Acosta J, Chorev M, Hays AP, Halperin JA. Glycation inactivation of the complement regulatory protein CD59: a possible role in the pathogenesis of the vascular complications of human diabetes. Diabetes. 2004 Oct;53(10):2653-61. doi: 10.2337/diabetes.53.10.2653. 107. Acosta J, Hettinga J, Flückiger R, Krumrei N, Goldfine A, Angarita L, Halperin J. Molecular basis for a link between complement and the vascular complications of diabetes. Proc Natl Acad Sci U S A. 2000 May 9;97(10):5450-5. doi: 10.1073/pnas.97.10.5450. 108. Sabitha, D., Dawson, E.M., Tiwari, S., Swetha, P., Naushad, S.M., Baba, K.S., Mohan, I.K., & Rani, G.U. (2020). Study of Glycation of Transferrin and its Effect on Biomarkers of Iron Status in Uncontrolled Diabetes Mellitus Patients. Journal of Clinical and Diagnostic Research. 
109. Fujimoto S, Kawakami N, Ohara A. Nonenzymatic glycation of transferrin: decrease of iron-binding capacity and increase of oxygen radical production. Biol Pharm Bull. 1995 Mar;18(3):396-400. doi: 10.1248/bpb.18.396 110. Silva AM, Sousa PR, Coimbra JT, Brás NF, Vitorino R, Fernandes PA, Ramos MJ, Rangel M, Domingues P. The glycation site specificity of human serum transferrin is a determinant for transferrin's functional impairment under elevated glycaemic conditions. Biochem J. 2014 Jul 1;461(1):33-42. doi: 10.1042/BJ20140133. 111. Silva AMN, Coimbra JTS, Castro MM, Oliveira Â, Brás NF, Fernandes PA, Ramos MJ, Rangel M. Determining the glycation site specificity of human holo- transferrin. J Inorg Biochem. 2018 Sep;186:95-102. doi: 10.1016/j.jinorgbio.2018.05.016 91 112. Ma Y, Cai J, Wang Y, Liu J, Fu S. Non-Enzymatic Glycation of Transferrin and Diabetes Mellitus. Diabetes Metab Syndr Obes. 2021 Jun 8;14:2539-2548. doi: 10.2147/DMSO.S304796. 113. Ma Y, Zhou Q, Zhao P, Lv X, Gong C, Gao J, Liu J. Effect of transferrin glycation induced by high glucose on HK-2 cells in vitro. Front Endocrinol (Lausanne). 2023 Jan 26;13:1009507. doi: 10.3389/fendo.2022.1009507. 114. Bezkorovainy A. Antimicrobial properties of iron-binding proteins. Adv Exp Med Biol. 1981;135:139-54. doi: 10.1007/978-1-4615-9200-6_8. 115. Muraoka MY, Justino AB, Caixeta DC, Queiroz JS, Sabino-Silva R, Salmen Espindola F. Fructose and methylglyoxal-induced glycation alters structural and functional properties of salivary proteins, albumin and lysozyme. PLoS One. 2022 Jan 21;17(1):e0262369. doi: 10.1371/journal.pone.0262369. 116. Mirmiranpour H, Khaghani S, Bathaie SZ, Nakhjavani M, Kebriaeezadeh A, Ebadi M, Gerayesh-Nejad S, Zangooei M. The Preventive Effect of L-Lysine on Lysozyme Glycation in Type 2 Diabetes. Acta Med Iran. 2016 Jan;54(1):24-31. 117. Ruan ED, Wang H, Ruan Y, Juáreza M. 
Characteristics of glycation and glycation sites of lysozyme by matrix-assisted laser desorption/ionization time of flight/time-of-flight mass spectrometry and Liquid chromatography-electrospray ionization tandem mass spectrometry. Eur J Mass Spectrom (Chichester). 2014;20(4):327-36. 118. Zhao Z, Liu J, Shi B, He S, Yao X, Willcox MD. Advanced glycation end product (AGE) modified proteins in tears of diabetic patients. Mol Vis. 2010;16:1576–1584. 119. Kislinger T, Humeny A, Pischetsrieder M. Analysis of protein glycation products by matrix-assisted laser desorption ionization time-of-flight mass spectrometry. Curr Med Chem. 2004 Aug;11(16):2185-93. doi: 10.2174/0929867043364649. 120. Pampati PK, Suravajjala S, Dain JA. Monitoring nonenzymatic glycation of human immunoglobulin G by methylglyoxal and glyoxal: A spectroscopic study. Anal Biochem. 2011 Jan 1;408(1):59-63. doi: 10.1016/j.ab.2010.08.038. 121. Saleem RA, Affholter BR, Deng S, Campbell PC, Matthies K, Eakin CM, Wallace A. A chemical and computational approach to comprehensive glycation characterization on antibodies. MAbs. 2015;7(4):719-31. doi: 10.1080/19420862.2015.1046663 122. Khan MA, Arif Z, Khan MA, Moinuddin, Alam K. Methylglyoxal produces more changes in biochemical and biophysical properties of human IgG under high glucose compared to normal glucose level. PLoS One. 2018 Jan 19;13(1):e0191014. doi: 10.1371/journal.pone.0191014 123. Wei B, Berning K, Quan C, Zhang YT. Glycation of antibodies: Modification, methods and potential effects on biological functions. MAbs. 2017 May/Jun;9(4):586- 594. doi: 10.1080/19420862.2017.1300214. 92 124. Faisal M, Alatar AA, Ahmad S. Immunoglobulin-G Glycation by Fructose Leads to Structural Perturbations and Drop Off in Free Lysine and Arginine Residues. Protein Pept Lett. 2017;24(3):241-244. doi: 10.2174/0929866524666170117142723. 125. Rehman S, Faisal M, Alatar AA, Ahmad S. 
Physico-chemical Changes Induced in the Serum Proteins Immunoglobulin G and Fibrinogen Mediated by Methylglyoxal. Curr Protein Pept Sci. 2020;21(9):916-923. doi: 10.2174/1389203720666190618095719. 126. Alouffi S, Shahab U, Khan S, Khan M, Khanam A, Akasha R, Shahanawaz SD, Arif H, Tahir IK, Rehman S, Ahmad S. Glyoxal induced glycative insult suffered by immunoglobulin G and fibrinogen proteins: A comparative physicochemical characterization to reveal structural perturbations. Int J Biol Macromol. 2022 Apr 30;205:283-296. doi: 10.1016/j.ijbiomac.2022.02.093. 127. Gadgil HS, Bondarenko PV, Treuheit MJ, Ren D. Screening and Sequencing of Glycated Proteins by Neutral Loss Scan LC/MS/MS Method. Anal. Chem. 2007, 79, 5991-5999 128. Park HY, Oh MJ, Park Y, Kim Y. Nε-(carboxymethyl)lysine formation from the Maillard reaction of casein and different reducing sugars. Food Sci Biotechnol. 2019 Oct 15;29(4):487-491. doi: 10.1007/s10068-019-00689-3. 129. Nguyen HT, van der Fels-Klerx HJ, van Boekel MA. Kinetics of N(ε)- (carboxymethyl)lysine formation in aqueous model systems of sugars and casein. Food Chem. 2016 Feb 1;192:125-33. doi: 10.1016/j.foodchem.2015.06.110. 130. Dalsgaard, T.K., Nielsen, J.H., and Larsen, L.B. (2007). Proteolysis of milk proteins lactosylated in model systems. Mol. Nutr. Food Res. 51: 404– 414. 131. Wang W, Chen C, Zhou C, Tang Z, Luo D, Fu X, Zhu S, Yang X. Effects of glycation with chitooligosaccharide on digestion and fermentation processes of lactoferrin in vitro. Int J Biol Macromol. 2023 Apr 15;234:123762. doi: 10.1016/j.ijbiomac.2023.123762. 132. Moscovici AM, Joubran Y, Briard-Bion V, Mackie A, Dupont D, Lesmes U. The impact of the Maillard reaction on the in vitro proteolytic breakdown of bovine lactoferrin in adults and infants. Food Funct. 2014 Aug;5(8):1898-908. doi: 10.1039/c4fo00248b. 133. Smilowitz JT, Totten SM, Huang J, Grapov D, Durham HA, Lammi-Keefe CJ, Lebrilla C, German JB. 
Human milk secretory immunoglobulin a and lactoferrin N- glycans are altered in women with gestational diabetes mellitus. J Nutr. 2013 Dec;143(12):1906-12. doi: 10.3945/jn.113.180695 134. Van Campenhout A, Van Campenhout C, Olyslager YS, Van Damme O, Lagrou AR, Manuel-y-Keenoy B. A novel method to quantify in vivo transferrin glycation: applications in diabetes mellitus. Clin Chim Acta. 2006 Aug;370(1-2):115-23. doi: 10.1016/j.cca.2006.01.028. 93 135. Fuentes-Lemus E, Reyes JS, López-Alarcón C, Davies MJ. Crowding modulates the glycation of plasma proteins: In vitro analysis of structural modifications to albumin and transferrin and identification of sites of modification. Free Radic Biol Med. 2022;193(Pt 2):551-566. doi:10.1016/j.freeradbiomed.2022.10.319. 136. Deng G, Dyroff SL, Lockart M, Bowman MK, Vincent JB. The effects of the glycation of transferrin on chromium binding and the transport and distribution of chromium in vivo. J Inorg Biochem. 2016 Nov;164:26-33. doi: 10.1016/j.jinorgbio.2016.08.008. 137. Spiller S, Li Y, Blüher M, Welch L, Hoffmann R. Glycated lysine-141 in haptoglobin im-proves the diagnostic accuracy for type 2 diabetes mellitus in combination with glycated hemoglobin HbA1c and fasting plasma glucose. Clin Proteomics. 2017 Mar 28;14:10. doi: 10.1186/s12014-017-9145-1.: 138. Ponikowska B, Suchocki T, Paleczny B, Olesinska M, Powierza S, Borodulin- Nadzieja L, Reczuch K, von Haehling S, Doehner W, Anker SD, Cleland JG, Jankowska EA. Iron status and survival in diabetic patients with coronary artery disease. Diabetes Care. 2013 Dec;36(12):4147-56. doi: 10.2337/dc13-0528 139. Huth C, Beuerle S, Zierer A, Heier M, Herder C, Kaiser T, Koenig W, Kronenberg F, Oexle K, Rathmann W, Roden M, Schwab S, Seissler J, Stöckl D, Meisinger C, Peters A, Thorand B. Biomarkers of iron metabolism are independently associated with impaired glucose metabolism and type 2 diabetes: the KORA F4 study. Eur J Endocrinol. 2015 Nov;173(5):643-53. 
doi: 10.1530/EJE-15-0631. 140. Elliot D Drew, Robert W Janes, 2StrucCompare: a webserver for visualizing small but noteworthy differences between protein tertiary structures through interrogation of the secondary structure content, Nucleic Acids Research, Volume 47, Issue W1, 02 July 2019, Pages W477–W481, https://doi.org/10.1093/nar/gkz456 141. Velliyagounder K, Kaplan JB, Furgang D, Legarda D, Diamond G, Parkin RE, Fine DH. One of two human lactoferrin variants exhibits increased antibacterial and transcriptional activation activities and is associated with localized juve-nile periodontitis. Infect Immun. 2003 Nov;71(11):6141-7. doi: 10.1128/IAI.71.11.6141- 6147.2003. 142. Wu YM, Juo SH, Ho YP, Ho KY, Yang YH, Tsai CC. Association between lactoferrin gene polymorphisms and aggressive periodontitis among Taiwanese patients. J Periodontal Res. 2009 Jun;44(3):418-24. doi: 10.1111/j.1600- 0765.2008.01120.x. 143. Barber MF, Kronenberg Z, Yandell M, Elde NC (2016) Antimicrobial Functions of Lactofer-rin Promote Genetic Conflicts in Ancient Primates and Modern Humans. PLoS Genet 12(5): e1006063. https://doi.org/10.1371/journal.pgen.1006063 144. van Berkel PH, Geerts ME, van Veen HA, Mericskay M, de Boer HA, Nuijens JH. N-terminal stretch Arg2, Arg3, Arg4 and Arg5 of human lactoferrin is essential for 94 binding to heparin, bacterial lipopolysaccharide, human lysozyme and DNA. Biochem J. 1997 Nov 15;328 ( Pt 1)(Pt 1):145-51. doi: 10.1042/bj3280145 145. Nibbering PH, Ravensbergen E, Welling MM, van Berkel LA, van Berkel PH, Pauwels EK, Nuijens JH. Human lactoferrin and peptides derived from its N terminus are highly effective against infections with antibiotic-resistant bacteria. Infect Immun. 2001 Mar;69(3):1469-76. doi: 10.1128/IAI.69.3.1469-1476.2001. 146. Viejo-Díaz M, Andrés MT, Pérez-Gil J, Sánchez M, Fierro JF. Potassium efflux induced by a new lactoferrin-derived pep-tide mimicking the effect of native human lactoferrin on the bacterial cytoplasmic membrane. 
Biochemistry (Mosc). 2003 Feb;68(2):217-27. doi: 10.1023/a:1022657630698. PMID: 12693969. 147. Hendrixson DR, Qiu J, Shewry SC, Fink DL, Petty S, Baker EN, Plaut AG, St Geme JW 3rd. Human milk lactoferrin is a serine protease that cleaves Haemophilus surface proteins at arginine-rich sites. Mol Microbiol. 2003 Feb;47(3):607-17. doi: 10.1046/j.1365-2958.2003.03327.x. 148. Soboleva A, Mavropulo-Stolyarenko G, Karonova T, et al. Multiple Glycation Sites in Blood Plasma Proteins as an Integrated Biomarker of Type 2 Diabetes Mellitus. Int J Mol Sci. 2019;20(9):2329. 149. Wally J, Buchanan SK. A structural comparison of human serum transferrin and human lactoferrin. Biometals. 2007 Jun;20(3-4):249-62. doi: 10.1007/s10534-006-9062- 7. 150. McAvan BS , France AP , Bellina B , Barran PE , Goodacre R , Doig AJ . Quantification of protein glycation using vibrational spectroscopy. Analyst. 2020 May 21;145(10):3686-3696. doi: 10.1039/c9an02318f. 151. Wu H, Hartman TG, Govindarajan S, Kahn PC, Ho CT, Rosen JD. Glycation of lysozyme in a restricted water environment. Proc Natl Sci Counc Repub China B. 1991 Jul;15(3):140-6. 152. Bathaie, S.Z., Nobakht, B.B.F., Mirmiranpour, H. et al. Effect of Chemical Chaperones on Glucose-Induced Lysozyme Modifications. Protein J 30, 480–489 (2011). https://doi.org/10.1007/s10930-011-9353-x 153. Olayanju OA, Mba IN, Akinmola OO, et al. Relationship between Glycaemic Control and Oral Immunologic Proteins. West Afr J Med. 2022;39(10):1062-1067. 154. Shi J, Fu Y, Zhao XH, Lametsch R. Glycation sites and bioactivity of lactose- glycated caseinate hydrolysate in lipopolysaccharide-injured IEC-6 cells. J Dairy Sci. 2021 Feb;104(2):1351-1363. doi: 10.3168/jds.2020-19018. 155. Chatterton DE, Nguyen DN, Bering SB, Sangild PT. Anti-inflammatory mechanisms of bioactive milk proteins in the intestine of newborns. Int J Biochem Cell Biol. 2013 Aug;45(8):1730-47. doi: 10.1016/j.biocel.2013.04.028. 95 156. Kong Y, Dong Q, Yu Z, Yan H, Liu L, Shen Y. 
The effect of lactose and its isomerization product lactulose on functional and structural properties of glycated casein. Food Res Int. 2023;168:112683. doi:10.1016/j.foodres.2023.112683 157. Park K, Elias PM, Oda Y, Mackenzie D, Mauro T, Holleran WM, Uchida Y. Regulation of cathelicidin antimicrobial peptide expression by an endoplasmic reticulum (ER) stress signaling, vitamin D receptor-independent pathway. J Biol Chem. 2011 Sep 30;286(39):34121-30. doi: 10.1074/jbc.M111.250431. 158. Li G, Wang Q, Feng J, Wang J, Wang Y, Huang X, Shao T, Deng X, Cao Y, Zhou M, Zhao C. Recent insights into the role of defensins in diabetic wound healing. Biomed Pharmacother. 2022 Nov;155:113694. doi: 10.1016/j.biopha.2022.113694. 159. Knight LC. Non-oncologic applications of radiolabeled peptides in nuclear medicine. Q J Nucl Med. 2003 Dec;47(4):279-91. 160. Lan CC, Wu CS, Huang SM, Kuo HY, Wu IH, Liang CW, Chen GS. High-glucose environment reduces human β-defensin-2 expression in human keratinocytes: implications for poor diabetic wound healing. Br J Dermatol. 2012 Jun;166(6):1221-9. doi: 10.1111/j.1365-2133.2012.10847.x. 161. Galkowska H, Olszewski WL, Wojewodzka U. Expression of natural antimicrobial peptide beta-defensin-2 and Langerhans cell accumulation in epidermis from human non-healing leg ulcers. Folia Histochem Cytobiol. 2005;43(3):133-6 162. Najeeb T, Soomro Late MS, Fawwad A, Waris N, Nangrejo R, Siddiqui IA, Aziz Q, Basit A. Association of Hepcidin levels in Type 2 Diabetes Mellitus treated with metformin or combined anti-diabetic agents in Pakistani population. J Pak Med Assoc. 2023 Feb;73(2):313-318. doi: 10.47391/JPMA.6154. 163. Jiménez-Escutia R, Vargas-Alcantar D, Flores-Espinosa P, Helguera-Repetto AC, Villavicencio-Carrisoza O, Mancilla-Herrera I, Irles C, Torres-Ramos YD, Valdespino-Vazquez MY, Velázquez-Sánchez P, Zamora-Escudero R, Islas-López M, Carranco-Salinas C, Díaz L, Zaga-Clavellina V, Olmos-Ortiz A. 
Chapter 3: SweetSMILE: A Novel Image Based 3D Convolutional Neural Network For Predicting Protein Glycation

1. Background

There are over 20,000 proteins in the human body [1].
Experimentally checking the glycation potential of each would be a daunting task in and of itself, and it becomes experimentally intractable once one considers that proteins exist in various states of post-translational modification, cleavage, and metabolism, and under reaction conditions that vary significantly with respect to the many factors that influence the protein glycation reaction. It is therefore necessary to narrow the search to the proteins that are most likely or least likely to undergo glycation using an in silico approach. However, to obtain an output that is physiologically relevant, care must be taken with both the model and the data used to generate predictions. This chapter summarizes the general approaches that have been used previously to predict protein glycation, the limitations of these methods, and a novel approach pioneered in this dissertation.

Machine learning techniques readily lend themselves to very complex, data-intensive problems such as predicting protein glycation because of their ability to mine complex patterns from extremely large datasets [2]. Of these techniques, convolutional neural networks are a promising choice because convolutional layers are particularly adept at learning to contextualize input features as they create a nonlinear map between input data and output objectives [3,4]. The effectiveness of such an approach has been demonstrated in other instances of post-translational modification (PTM) prediction, such as ubiquitination, as well as for the prediction of glycation itself, both of which occur primarily on select lysine residues [5]. In these cases, the problem has universally been approached as some form of a natural language processing (NLP) problem [6].
Because glycation prediction has been treated as an NLP problem, input data consist of sequence windows of varying lengths surrounding glycated and unglycated lysine residues within a protein. These windows have been represented as strings of the single-letter codes of their amino acids, with a placeholder residue 'X' denoting any position where a window extends beyond the bounds of the protein sequence. While this NLP approach has had success, reducing amino acids to their single-letter codes results in significant information loss: to such a model, a lysine residue is as different from an arginine as it is from a proline or a glycine. Consequently, the model is expected to learn the similarities and differences between all of the amino acids in addition to learning which of them are important for protein glycation. Given a sufficiently large dataset this is perhaps not an issue, but glycation datasets are modest in size compared to most machine learning datasets, and they are universally imbalanced because the number of unglycated lysine residues greatly exceeds the number of glycated ones. Left unaddressed, such an imbalance in training data invariably leads the model to drastically overpredict, and sometimes exclusively predict, the majority class [7].

To address the challenge of requiring the NLP approach to learn how to differentiate amino acid properties, a number of techniques have been employed, such as providing the model with additional data in the input layer in the form of physicochemical properties of the amino acids and secondary structure information [8-11]. Another approach has been to use NLP training techniques aimed at improving syntactic and semantic recognition, such as using a continuous distribution representation (CDR).
Briefly summarized, a CDR forgoes input arrays or matrices composed exclusively of binary values and instead uses continuous values ranging between 0 and 1, usually as a vector denoting an individual sample's relative position in sample space [12].

To address the problems caused by class imbalance, both data-level and algorithm-level approaches have been employed by previous models, with far greater emphasis on data-level modifications. These take the form of majority class undersampling or minority class oversampling. In undersampling, data points belonging to the majority class are removed from the dataset, either through random selection or through directed means such as clustering, until relative balance is achieved [13,14]. Conversely, in minority class oversampling the number of minority class data points is increased, either by creating exact copies of existing data points or by creating novel data using the Synthetic Minority Over-sampling Technique (SMOTE) [15,16]. One particularly noteworthy example of SMOTE being used in predicting protein glycation was the use of a Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) to generate additional synthetic glycated data points for the training of the DeepGly CNN [17,18].

These approaches have yielded significant improvements in Matthews Correlation Coefficient (MCC) scores in the most recent glycation prediction programs as compared with the earliest models, such as NetGlycate. MCC is a statistical measure used to evaluate the quality of two-class predictive models because, unlike simpler metrics such as accuracy, it is not biased by a model overfitting to the majority class as discussed above. In such a case, a model could achieve an extremely favorable accuracy score by predicting every input as the majority class; however, the MCC would be quite low.
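The contrast between accuracy and MCC can be illustrated with a minimal Python sketch. It assumes the standard definition of MCC, uses the class counts reported for the CPLM glycation dataset (2046 non-glycated and 323 glycated lysines), and adopts the common convention of reporting an MCC of zero when the denominator vanishes. The function names and the second, partially correct confusion matrix are hypothetical and serve only as an illustration; they are not results from any model discussed in this chapter.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumption: follow the common convention of returning 0 when a
        # row or column of the confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Class counts reported for the CPLM glycation dataset.
n_non_glycated, n_glycated = 2046, 323

# A degenerate classifier that labels every lysine as non-glycated.
majority_only = dict(tp=0, tn=n_non_glycated, fp=0, fn=n_glycated)
majority_accuracy = accuracy(**majority_only)
majority_mcc = mcc(**majority_only)

# Hypothetical confusion matrix for a classifier that finds most glycated
# sites at the cost of some false positives (illustrative numbers only).
hypothetical = dict(tp=250, tn=1846, fp=200, fn=73)
hypothetical_accuracy = accuracy(**hypothetical)
hypothetical_mcc = mcc(**hypothetical)

print(f"majority-only: accuracy={majority_accuracy:.3f}, MCC={majority_mcc:.3f}")
print(f"hypothetical:  accuracy={hypothetical_accuracy:.3f}, MCC={hypothetical_mcc:.3f}")
```

The majority-only classifier scores well on accuracy solely because non-glycated lysines dominate the dataset, while its MCC shows that it carries no information about glycation status.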
NetGlycate was created in 2006 using 60 different artificial neural networks trained on non-CDR representations of protein sequences; a final prediction was selected through a weighted balloting procedure over the 60 outputs, and the model achieved an MCC of 0.58 [19]. By comparison, DeepGly, published in 2019, is a CNN that employed both CDR and SMOTE as described above and obtained an MCC of 0.766 [18].

However, as demonstrated in the previous chapter, the results from the respective models often conflict with one another, their predictive power is still insufficient to truly guide experimental biomarker research, and no model to date lends itself readily to predicting glycation in a more physiologically relevant manner. This is due in part to their being tethered to the single-letter code representation of the amino acids and in part to inadequately curated glycation data. Among the problems that the NLP approach faces, even when supplemented by physicochemical properties, secondary structure, continuous representation, and oversampling, are the following: how does one adjust the letter codes or physicochemical properties of an amino acid to account for it being post-translationally modified? How does one address the varying reaction environments in which proteins can exist, such as compartments whose pH is not 7.4 and in which the protonation state of the amino acids may therefore differ? The purpose of this dissertation is to provide novel approaches to these problems that will improve glycation prediction.

One possible replacement for the single-letter code would be the simplified molecular-input line-entry system (SMILES), as this would represent the amino acids as a version of their chemical formulas, thereby resolving or minimizing many of the aforementioned issues [20].
Unfortunately, the SMILES format itself is degenerate in that multiple strings can represent the same molecule, which would reduce the informational value provided to the model for a given sample. That said, the degeneracy of the SMILES code can be resolved by converting it into a visual representation of the chemical structure. While previous glycation prediction CNNs have been structured similarly to NLP networks, higher-dimensional CNNs have been used extensively for image recognition [21]. Thus, the solution proposed here is to move beyond strings of single-letter amino acid codes and instead use an image of the two-dimensional representation of their chemical structures. This presents the model with significantly more data to learn from: it sees that a tryptophan residue occupies more space in the image than a glycine, that leucine and isoleucine look remarkably similar, and so on. A picture is worth a thousand words, as the saying goes.

To this end, a 3D CNN was constructed and trained on visual representations of the chemical structure of the sequence window surrounding lysine residues. The end result is a model that exceeds the MCC values obtained by previous models and can do so using smaller datasets. Moreover, training the model on chemical structures rather than on abstract single-letter codes creates a framework that permits the expansion of glycation prediction beyond many of the constraints placed upon previous models, because the chemical structures serve as a common basis regardless of amino acid state, or even when looking at other biomolecules entirely. This positions the model to tackle glycation prediction in a much more comprehensive and physiologically relevant manner as more detailed datasets become available.

2. Methods

2.1.
Base Datasets

The SweetSMILE model was trained on three different datasets, detailed in the following subsections. Datasets A and B have been used by numerous glycation prediction programs and thus allow a degree of comparability between models, while Dataset C was constructed for the purpose of training the model on the most physiologically relevant data.

2.1.1. Dataset A

Dataset A is the glycation dataset from the Compendium of Protein Lysine Modifications (CPLM) [22]. The full dataset consists of 72 unique proteins and contains 2046 non-glycated lysines and 323 glycated lysines. The full dataset can be obtained at: http://cplm.biocuckoo.org/

2.1.2. Dataset B

This dataset was generated by Johansen et al. and is the result of an extensive review of the experimental literature. Their goal was to create a dataset consistent with glycation in vivo, so any lysine residue in a pro-peptide or signal peptide was masked, and in vitro data were included only if the experimental conditions reflected physiological conditions. It contains 20 proteins with 89 glycated lysines and 126 non-glycated lysines [19]. The dataset can be obtained at: https://services.healthtech.dtu.dk/datasets/GlycateBase-1.0/

2.1.3. Dataset C

This dataset results from the union of Datasets A and B. However, proteins glycated in vitro were screened according to the experimental conditions under which the data were collected. To remain in the final dataset, these conditions needed to include a reaction solution of phosphate buffered saline or similar, incubation at 37 degrees Celsius, and a glucose concentration no greater than 10 mmol/L. This subset contains 279 glycated lysines and 1446 non-glycated lysines from 61 unique proteins.

2.2.
Primary Sequence and Secondary Structure Window Generation and Analysis Dataset Cleaning

For every protein in Dataset C, predictions of the secondary structure were generated using the SPOT-1D algorithm [23]. Sequence and secondary structure windows were generated for all lysines within a protein by taking a 31-residue cross-section of the sequence, with 15 residues upstream and 15 residues downstream of the lysine (Eq. 1, with n = 15). Where the window extended beyond the bounds of the protein, a dummy residue, 'X', was used. After all windows were generated, any duplicates of a particular primary sequence were excluded from the dataset, and where the glycation statuses of duplicate windows conflicted, the glycated state was assigned.

$$\text{Window} = aa_{-n}\, aa_{-n+1} \cdots K_{0} \cdots aa_{n-1}\, aa_{n} \qquad \text{(Eq. 1)}$$

For the primary sequences, a sequence logo was generated for the non-glycated and glycated sequences using LogoMaker [24]. Additionally, for each glycation status, the amino acid and secondary structure frequencies were calculated for each window position and compared against the frequency of that amino acid or secondary structural element occurring at that position across all windows.

2.3. Correcting Dataset Class Imbalance

The next step was to address the minority class problem presented by the glycation datasets. Data-level approaches often use random oversampling of the minority class or undersampling of the majority class. While such approaches can be valid, they implicitly rely upon the a priori assumption that the dataset is sufficiently large and representative of the population about which one is trying to make predictions. Given the small size of the glycation datasets, as well as the fact that the data points originated from studies that did not randomly select proteins from the body to test, such approaches could inadvertently overweight certain regions of the feature space.
The use of SMOTE to bolster the number of minority samples is undoubtedly an improvement upon naive random oversampling in such a case; however, it still carries the same implicit a priori assumption and could lead to overfitting of the minority class region of feature space, reducing the ability of the model to generalize its predictions. Thus it seemed prudent to use a non-naive undersampling approach, such as clustering.

To this end, recursive clustering undersampling was performed upon the non-glycated sequences according to their homology using CD-HIT [25]. Briefly summarized, CD-HIT generates clusters of sequences based upon a selected homology threshold and selects the sequence that is most representative of the sequences within each cluster. First, the windows were aggressively clustered using a threshold of 10% homology to break the non-glycated sample space into very broad clusters. The representative sequences were retained, and clustering was then performed upon the remaining members of each individual cluster at 20% homology. This process was iterated with the homology threshold increasing by 10% each iteration, up to a maximum of 50%, until an iteration was reached at which removing all non-representative sequences in the clusters brought the dataset to a ratio below 55:45 non-glycated:glycated sequences.

2.4. Creating Chemical Structure Images for Model Input

The windows, except for any X residues, were then converted into SMILES format and then into two-dimensional graphical representations of their chemical structures using RDKit [26]. The images were generated with all default parameters except for orientation, which was changed to set a standardized orientation and alignment. The images were then converted into their RGB matrix format. At this point, a zero matrix corresponding to white space was appended to the base RGB matrix for each X residue present in a sequence.
Finally, the largest matrix was identified and the remaining matrices were padded to create a uniform input size.

2.5. Convolutional Neural Network Architecture, Compiling, and Training

2D and 3D CNNs have been employed in numerous applications of image recognition and processing of grayscale and colored images, respectively [27-30]. However, this technique has never been utilized for the purpose of predicting protein glycation sites. As detailed above, the primary sequence of a protein can be converted into a colored 2D image of its chemical structure before being transformed into a pixel matrix that can then be input into the CNN. Due to the small sample size and the relatively simple and standardized nature of the images, the general architecture of the model is kept compact. A single 3D convolutional layer utilizing a leaky rectified linear unit (ReLU) activation function receives the input layer before passing it on to a max pooling layer [31]. This arrangement is part of the backbone of countless image classification algorithms due to its ability to extract key patterns from convolutional feature maps efficiently. Broadly summarized, the leaky ReLU is a piece-wise linear activation function that returns a linearly increasing output for positive inputs and a small-magnitude, near-zero output for negative inputs. The max pooling layer then retains only the maximum value within each pooling region; by examining only the maximum value, key features are more readily highlighted and computational load and complexity are simultaneously reduced. The extracted patterns are then passed to the classification head of the network, where the data is first flattened before being fed into a single dense layer utilizing softmax activation to predict the probability that the central lysine residue is glycated [32].
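The three operations named above (leaky ReLU, max pooling, and softmax) are simple enough to write out directly. The sketch below is purely illustrative and is not the network itself: it works on plain Python lists, pools in two dimensions for brevity, and the negative-slope value `alpha` and the pooling window `size` are assumed defaults rather than the tuned hyperparameters of the model.

```python
import math
from typing import List


def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """Identity for positive inputs; negative inputs are scaled by `alpha`."""
    return x if x > 0 else alpha * x


def max_pool_2d(feature_map: List[List[float]], size: int = 2) -> List[List[float]]:
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def softmax(logits: List[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


activated = [[leaky_relu(v) for v in row] for row in [[1.0, -2.0, 3.0, 0.0],
                                                      [4.0, 5.0, -6.0, 7.0]]]
pooled = max_pool_2d(activated)   # one value per 2 x 2 region
probabilities = softmax([0.0, 0.0])
```

Note that the leaky ReLU output for a negative input is small and negative, not zero, which is what distinguishes it from the standard ReLU.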
Even compact neural networks such as this one have a large number of hyperparameters that need to be tuned to maximize the model's performance. These values were optimized using the Hyperband algorithm, which allows highly optimized values to be determined without iterating through every possible combination of hyperparameters [33]. This is accomplished by generating random hyperparameter value combinations, which are then evaluated on their performance. Hyperparameter value combinations and individual hyperparameter values that perform poorly are pruned, new combinations of the surviving hyperparameter values are generated, and the process is repeated.

2.6. Model Evaluation

Due to the limited size of the datasets, and to make the evaluation as comparable as possible to previous models, k-fold cross-validation was employed to assess the generalizability and skill of the network [34]. Briefly, in k-fold cross-validation a dataset is first broken into k groups of approximately equal size. The model is trained upon k-1 of these groups and then evaluated on the withheld group. This is repeated until each group has served as the withheld group, whereafter the evaluation metric is averaged across all runs. Given the imbalanced nature of the datasets, the Matthews correlation coefficient (MCC; Eq. 2) was the metric selected to evaluate the model's ability to correctly predict the glycation status of a lysine. The use of MCC also permitted direct comparison of results to previous models [35].

(Eq. 2) $\mathrm{MCC} = \dfrac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$

where TP, TN, FP, and FN are true positive, true negative, false positive, and false negative predictions, respectively.

3. Results and Discussion

3.1.
Analysis of Primary and Secondary Structure of Glycated and Non-glycated Segments of Proteins

The ability of any given lysine residue in a protein to undergo glycation is dictated by the complex interplay of the physicochemical properties of the surrounding amino acids, as well as the higher order structural characteristics of the protein [36-40]. It was therefore prudent to run a comparative analysis to examine whether any patterns in these variables existed between regions of proteins containing lysines that glycate and those containing lysines for which glycation has not been observed. Given the pathophysiological nature of protein glycation in the modern world, and given that most post-translational modifications within the body occur enzymatically, the search for a glycation motif has often been framed within the literature as a search for an arrangement of amino acids that causes glycation [19,41].

Figure 15 and Table 10 detail the findings of the primary sequence analysis for both glycated and non-glycated protein segments. The results demonstrate that very few amino acids are overrepresented surrounding glycated lysines. Notably, the most commonly overrepresented residues are methionine upstream and the dummy X residue both upstream and downstream, both of which are indicative of glycation events occurring towards the terminal ends of proteins and in short peptides. Previous studies have suggested that positively charged amino acids in close proximity to the lysine residue may facilitate glycation [42,43], yet no positively charged residue was overrepresented around glycated lysines in the present analysis. This discrepancy likely stems in part from the current study examining a much larger dataset than the previous studies. It is also possible that positively charged amino acids contribute to the local environment in certain tertiary and quaternary structural arrangements, but the current study strongly suggests that such a contribution is by no means essential when looking at the whole of protein glycation.
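The position-wise comparison underlying this analysis (the frequency of a residue at a window position within one glycation class versus its frequency at that position across all windows) can be sketched as follows. This is an illustrative outline only: the criterion used in this work to call a residue overrepresented is not restated here, so the `min_ratio` threshold below is a hypothetical stand-in, and the three-residue toy windows are invented for the example.

```python
from collections import Counter
from typing import Dict, List


def position_frequencies(windows: List[str]) -> List[Dict[str, float]]:
    """Per-position residue frequencies for equal-length windows."""
    length = len(windows[0])
    freqs = []
    for pos in range(length):
        counts = Counter(w[pos] for w in windows)
        freqs.append({res: n / len(windows) for res, n in counts.items()})
    return freqs


def overrepresented(subset: List[str], all_windows: List[str],
                    min_ratio: float = 1.5) -> List[List[str]]:
    """Residues whose frequency in `subset` is at least `min_ratio` times their
    frequency at the same position across `all_windows`.

    `subset` must be drawn from `all_windows`; `min_ratio` is an assumed cutoff.
    """
    sub = position_frequencies(subset)
    background = position_frequencies(all_windows)
    return [
        sorted(res for res, f in freq.items() if f >= min_ratio * background[pos][res])
        for pos, freq in enumerate(sub)
    ]


# Toy three-residue windows centred on K (real windows are 31 residues long).
glycated = ["AKE", "AKD"]
non_glycated = ["CKE", "CKE"]
hits = overrepresented(glycated, glycated + non_glycated)
```

The same function applies unchanged to strings of secondary-structure labels in place of residue letters.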
TABLE 10. Overrepresented Amino Acid Residues in Glycated and Non-glycated Sequence Windows. For every position within the 31 amino acid length window, overrepresented amino acids, denoted by their single letter codes, are listed for both glycated and non-glycated sequences. X represents where a window extends beyond the length of the natural protein sequence.

Position   Glycated    Non-Glycated
-15        X           E,M
-14        X           I,V
-13        X,L         E,Q
-12        X           -
-11        X,I,V       C,E,G
-10        X           -
-9         X,M,P,S     Q
-8         X,M         F,P
-7         X,M         Q
-6         X,M         L
-5         L,T,V       -
-4         E,Q         E,F,Q
-3         -           C,E,F,S
-2         -           N
-1         S,W         C,F,Q,T
0          -           -
1          G           C,L,M,V
2          D,Q         F,L,Q,S
3          E           H
4          V           D,E,G,I,S
5          E           C
6          E           E,Q,W
7          -           E,N
8          D,Q         -
9          -           S
10         X,F         C,L
11         X,A         P,Q,U
12         X           -
13         X           F,T
14         X           -
15         X           Q

FIGURE 15. Sequence Logos of Glycated (top) and Non-Glycated (bottom) Sequence Windows. A sequence logo depicts the individual and relational probabilities of an amino acid occurring at a given position.

FIGURE 16. Probability of Secondary Structural Elements in Glycated and Non-Glycated Sequence Windows. Graphs display the probability of a given position being an alpha helix (left), other coil (middle), or beta-pleated sheet (right) for glycated (green) and non-glycated (red) sequences.

In contrast to the glycated regions, a great many amino acids occur more frequently than expected in the non-glycated regions. The negatively charged amino acids aspartate and glutamate are both highly represented throughout these windows, likely because their negative charges, lying in such close proximity to the positive charge of lysine's ammonium group, mitigate or neutralize the lysine's charge. Uncharged polar residues such as serine, threonine, asparagine and glutamine are all prevalent immediately adjacent to the central lysine residue. Larger hydrophobic amino acids are also present in non-glycated lysine windows, likely due to the steric hindrance they introduce at the potential glycation site.
Additionally, cysteine, glycine, and proline are all heavily represented and are strongly predictive of ordered secondary and tertiary structural elements that may also interfere with glycation [44-49]. This was supported by the SPOT-1D secondary structure results, which illustrated a stark contrast between glycated and non-glycated segments, with the former occurring largely in alpha helical regions and the latter in beta-pleated sheets and other coils (Figure 16).

Taken in aggregate, these findings call into question the existence of a traditional motif that promotes glycation. Instead, the results suggest that lysine residues are naturally susceptible to glycation and that, rather than neighboring amino acids promoting glycation, glycation may be diminished when local amino acids alter the effective charge of the ammonium group of lysine and/or when adjacent amino acids or higher order structuring of the amino acid chain introduce sufficiently high steric hindrance. Indeed, the observation that most lysine residues can be glycated in the absence of neighboring amino acids that specifically interfere with glycation is in line with experimental observations showing that free amino acids and short peptides readily glycate [50]. This phenomenon is further evidenced by the ability of lysine, other amino acids, and some short peptides to serve as anti-glycation treatments in diabetics by scavenging glucose effectively, as demonstrated by their ability to reduce hemoglobin A1C levels [51-54]. The results also suggest that, since lysines in helical structures are more likely to glycate than those in beta-pleated sheets, proteins rich in helices, such as membrane proteins, would have needed to evolve primary sequence arrangements that promote resistance to glycation while remaining permissive for the formation of helices.
However, while some clear correlations emerge from this analysis, no simple rules emerged that would directly allow one to predict with high accuracy whether or not a lysine would glycate, because counterexamples exist in the data for all of the aforementioned trends. This underscores the highly nuanced nature of the reaction and demonstrates precisely why attempts to predict protein glycation sites would greatly benefit from the use of machine learning techniques.

3.2. Model Performance and Comparison

Across all three datasets tested, SweetSMILE achieved a higher MCC than any existing model on the respective dataset (Table 11). Of the three datasets, the largest increase in predictive performance relative to existing models was seen when training on Dataset B. This is particularly noteworthy, especially in conjunction with the lower variability in MCC values between the datasets, because it offers rather clear support for the central hypothesis underpinning the use of a pictorial representation of the chemical structure of the amino acids rather than their letter codes. That is, the pictorial representation provides the model with more explicit relational information between the amino acids (e.g. a tryptophan looks drastically different from a glycine), so that the model does not need to implicitly learn such similarities and differences in addition to learning which patterns are relevant to predicting glycation. This results in the model being able to achieve a greater level of predictive performance on a much smaller sample size than would otherwise be possible. The theory that providing more explicit relational information permits higher predictive quality is supported by the similar best result obtained by iProtGly-SS, which incorporated physicochemical characteristics and secondary structure information into the model input. However, that model experienced a marked drop-off when tested on Dataset B.
In designing iProtGly-SS, a feature selection algorithm was utilized prior to the final training of the model, but this feature selection was conducted on Dataset A. Based upon the marked difference in predictive quality of the model between the two datasets, it seems likely that the selected features optimize performance only on that particular dataset, resulting in an overfitting bias that drastically reduces the generalizability of the model to unseen proteins.

TABLE 11. Comparison of Matthews Correlation Coefficients achieved by various glycation prediction models. Dataset A corresponds to the CPLM dataset while Dataset B corresponds to that generated in the NetGlycate publication. The Any Dataset column displays the best MCC a model was able to achieve on any dataset, including a uniquely modified one, such as Dataset C put forth in this chapter.

Model              Dataset A   Dataset B   Any Dataset
NetGlycate [19]    -           0.58        0.58
Gly-PseAAC [9]     0.38        0.32        0.38
iProtGly-SS [11]   0.878       0.562       0.878
DeepGly [18]       0.838       0.766       0.838
SweetSMILE         0.896       0.838       0.904

As demonstrated in Chapter 2, existing models performed poorly in predicting the glycation statuses of lysine residues of immunological proteins and yielded results that were often contradictory to one another. To assess whether or not SweetSMILE offered an improvement in this area, predictions were generated for serotransferrin and compared against the observed glycation sites within the CPLM database, as well as against predictions made by the NetGlycate, PredGly, and Gly-PseAAC models (Table 12). SweetSMILE outperformed existing models in terms of predictive quality for serotransferrin by a significant margin (Table 13); however, the MCC associated with its predictions on serotransferrin was notably lower than its performance on the glycation datasets used for training. When the sequence windows of the missed predictions were examined, no clear pattern emerged in terms of the primary amino acid sequences.
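As a check on the serotransferrin comparison, Eq. 2 can be applied directly to the confusion-matrix counts reported in Table 13. The sketch below is a straightforward transcription of the formula; the function name and the zero-denominator convention are assumptions added for the example, while the counts are those listed in Table 13.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts (Eq. 2)."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


# Counts (TP, TN, FP, FN) as reported for serotransferrin in Table 13.
table_13 = {
    "NetGlycate": (12, 19, 6, 21),
    "PredGly": (23, 15, 10, 10),
    "Gly-PseAAC": (15, 19, 6, 18),
    "SweetSMILE": (26, 22, 3, 7),
}
scores = {name: round(mcc(*counts), 2) for name, counts in table_13.items()}
# scores -> NetGlycate 0.13, PredGly 0.30, Gly-PseAAC 0.22, SweetSMILE 0.66
```

Each set of counts sums to the 58 lysines of serotransferrin, and the rounded values agree with the MCC row of Table 13.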
Given the absence of a clear sequence pattern among the missed predictions, there are several probable causes for the errors. The first is tied to the relatively small size of the glycation datasets. The incorrectly predicted sequences likely lie along a boundary between glycated and non-glycated prediction space, meaning that a small difference in sequence composition that would not significantly impact glycation status in most sequences is not being appreciated by the model. More datapoints would enable the neural network to delineate these boundaries more precisely. This situation does present a potentially interesting use case for SMOTE, in that sequences that lie along these boundaries functionally represent minority instances of their respective classes. Whereas SMOTE has traditionally been used to generate synthetic samples of the broad minority class, it could theoretically be repurposed to generate synthetic samples of these minority subclasses, which would then enable SweetSMILE to better learn the boundary conditions.

The remaining probable sources of error relate to factors beyond the primary sequence information itself. It is possible that in some cases higher order structuring of the protein brings distant amino acids into the local reaction environment, which is a challenging problem to address. Expanding the number of residues constituting a window around a lysine residue could potentially capture some of these distant residues, but doing so would inherently introduce a significant amount of noise into the model, as the intervening residues would not be contributing to the local reaction environment. Realistically, the only approach to catch these particular edge cases would be to incorporate 3D structural data into the model input.
However, such data is not readily available for many proteins, and consequently it may be that, for the time being, neural networks are incapable of accurately predicting the glycation statuses of sites affected by higher order structural contributions.

Another possible source of error arises from the post-translational modifications of proteins. Some of the data comprising the glycation datasets were obtained from in vivo studies, meaning that some of these proteins may have had additional post-translational modifications present. These modifications could confound the obtained glycation data either by directly occupying a lysine residue that otherwise would have glycated or by indirectly modifying the local reaction environment in a way that altered the glycation status of lysine residues in either direction. The only solution to this particular issue would be for in vivo glycation studies to also report on the presence of other post-translational modifications of the proteins, which could then be incorporated into the inputs provided to SweetSMILE during training.

TABLE 12. Experimentally observed and algorithmically predicted glycation statuses of lysine residues of serotransferrin. Experimentally observed values were taken from the CPLM database. N=Non-glycated residue; G=Glycated residue.
Lysine Position   Observed   NetGlycate   PredGly   Gly-PseAAC   SweetSMILE
23                N          N            G         N            N
37                N          G            N         N            N
46                G          G            G         G            G
60                G          G            N         N            G
61                G          N            G         G            G
97                N          N            G         N            N
107               N          G            G         G            G
121               G          N            G         N            G
122               G          G            G         N            G
134               N          N            N         N            N
135               N          G            G         G            G
163               N          N            N         G            N
167               N          N            N         G            N
212               N          G            G         N            N
215               G          N            G         N            N
225               G          N            G         N            G
236               G          N            G         N            N
252               N          N            N         N            N
258               G          G            N         G            G
278               G          G            G         N            G
295               G          G            G         N            G
297               G          G            N         G            G
299               G          G            N         G            G
310               G          N            G         N            G
315               G          N            N         G            N
323               N          N            N         N            N
331               G          N            G         G            G
359               G          N            N         G            N
362               N          N            N         N            N
373               N          N            G         N            N
384               N          N            G         N            N
399               G          N            G         G            G
420               N          N            N         N            N
433               N          G            N         N            N
452               N          N            N         N            N
453               G          G            G         N            G
464               G          N            N         G            G
466               N          N            N         N            N
467               N          G            G         G            G
489               G          G            N         N            G
508               N          N            N         N            N
509               G          N            G         N            G
515               G          N            G         N            G
530               N          N            N         N            N
546               G          N            G         N            G
553               G          N            G         N            G
564               G          N            G         N            N
571               N          N            G         G            N
576               N          N            N         N            N
588               N          N            G         N            N
610               G          N            N         G            G
612               N          N            N         N            N
618               G          G            G         G            G
646               G          G            N         G            G
659               G          N            G         G            G
668               G          N            G         N            N
676               G          N            G         N            N
683               G          N            G         G            G

TABLE 13. Confusion Matrix Values and Matthews Correlation Coefficients from algorithmic predictions made on serotransferrin.

Metric           NetGlycate   PredGly   Gly-PseAAC   SweetSMILE
True Positive    12           23        15           26
True Negative    19           15        19           22
False Positive   6            10        6            3
False Negative   21           10        18           7
MCC              0.13         0.30      0.22         0.66

While the traditional definition of glycation has limited predictive approaches to the modification of lysine residues, the physiological reality is far more complex, even beyond the potential challenges illustrated by the serotransferrin predictions. Due to variations in local reaction conditions, such as changes in pH that alter the protonation status of the amino acids, there are instances of arginine, cysteine, and histidine non-enzymatically reacting with reducing sugars to form glycation adducts [50,55-59]. Beyond the proteinogenic amino acids, such adducts have also been observed on nucleic acids, lipids, and other biomolecules [60-62]. This is where the true significance of moving away from the single-letter amino acid codes becomes apparent.
The use of the letter codes is fundamentally prohibitive when attempting to retrain any of the previously existing models on the glycation of other types of biomolecules or of modified amino acids, whether the modification arises from pH-dependent protonation differences or from post-translational modifications, because the corresponding letter codes do not exist. One could theoretically try to adapt the letter system to accommodate these scenarios; however, doing so results in a combinatorial explosion of the possible inputs for the model to learn. This reduces the learning value conveyed by any single datapoint and would consequently require a truly massive amount of data, meticulously curated to ensure that it was representatively balanced between modification types, molecule types, etc. This is essentially replacing the intractable problem of determining glycation sites experimentally with a slightly less intractable problem. The use of the graphical representation of the chemical structures of the amino acids avoids this problem because the model is not strictly learning the relationship between abstract letters, but rather is learning the principal relationships between components of the chemical structures, and those components are applicable to all biomolecules, not just amino acids. Thus, incorporating a biomolecule other than an amino acid, or a post-translational modification to an amino acid, is as simple as inputting the desired SMILE code so that RDKit can create the visual representation of it. The model should then be able to begin making accurate predictions about it with a relatively small number of examples because it need not learn an entirely new set of relationships.

One additional benefit of switching to a visual representation of the amino acids is that it functionally does away with the dummy residue, X, generated when dealing with sequence windows that extend beyond the length of a protein.
Instead, such occurrences are represented here by empty space, which is also inherently present between amino acid side chains to varying degrees. Substituting space for an X is a subtle nuance, but it is important in its ability to contribute information regarding steric hindrance to the model, information that is largely lost in the letter code representation. One further benefit of this space, and the reason the model was designed to handle colored images instead of grayscale, is that as increasingly detailed datasets become available one could encode certain regions of this space to denote various aspects of reaction conditions, or even information about the tertiary and quaternary structure, to further increase the predictive power of the model.

4. Conclusion

For the first time, a primary sequence and secondary structure analysis was conducted upon regions surrounding non-glycated lysines in addition to the regions surrounding glycated lysines. The findings suggest that sequences evolve particular arrangements to resist glycation rather than there being a motif that promotes glycation. The model presented herein outperforms the existing models for the task of predicting the glycation of lysine residues in proteins. More importantly, it demonstrates the value of utilizing chemical structure rather than amino acid letter code notation, both by reducing the number of datapoints required to achieve adequate predictive power and by enabling a model to make predictions regarding non-lysine amino acids as well as non-amino acid biomolecules.

BIBLIOGRAPHY

1. Suhre K, McCarthy MI, Schwenk JM. Genetics meets proteomics: perspectives for large population-based studies. Nat Rev Genet. 2021 Jan;22(1):19-37. doi: 10.1038/s41576-020-0268-2.
2. Greener JG, Kandathil SM, Moffat L, Jones DT. A guide to machine learning for biologists. Nat Rev Mol Cell Biol. 2022 Jan;23(1):40-55. doi: 10.1038/s41580-021-00407-0.
3.
Alzubaidi L, Zhang J, Humaidi AJ, Al-Dujaili A, Duan Y, Al-Shamma O, et al. Review of Deep Learning: Concepts, CNN Architectures, challenges, applications, Future Directions. Journal of Big Data. 2021;8(1). doi:10.1186/s40537-021-00444-8.
4. Aloysius N, Geetha M. A review on deep convolutional Neural Networks. 2017 International Conference on Communication and Signal Processing (ICCSP). 2017; doi:10.1109/iccsp.2017.8286426.
5. Fu H, Yang Y, Wang X, Wang H, Xu Y. DeepUbi: a deep learning framework for prediction of ubiquitination sites in proteins. BMC Bioinformatics. 2019 Feb 18;20(1):86. doi: 10.1186/s12859-019-2677-9.
6. Kim Y. Convolutional neural networks for sentence classification. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014; doi:10.3115/v1/d14-1181.
7. Haibo He, Garcia EA. Learning from Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering. 2009;21(9):1263–84. doi:10.1109/tkde.2008.239.
8. Liu Y, Gu W, Zhang W, Wang J. Predict and Analyze Protein Glycation Sites with the mRMR and IFS Methods. Biomed Res Int. 2015;2015:561547. doi: 10.1155/2015/561547.
9. Xu Y, Li L, Ding J, Wu LY, Mai G, Zhou F. Gly-PseAAC: Identifying protein lysine glycation through sequences. Gene. 2017 Feb 20;602:1-7. doi: 10.1016/j.gene.2016.11.021.
10. Zhao X, Zhao X, Bao L, Zhang Y, Dai J, Yin M. Glypre: In Silico Prediction of Protein Glycation Sites by Fusing Multiple Features and Support Vector Machine. Molecules. 2017 Nov 3;22(11):1891. doi: 10.3390/molecules22111891.
11. Islam MM, Saha S, Rahman MM, Shatabda S, Farid DM, Dehzangi A. iProtGly-SS: Identifying protein glycation sites using sequence and structure based features. Proteins. 2018 Jul;86(7):777-789. doi: 10.1002/prot.25511.
12. Asgari E, Mofrad MR. Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics. PLoS One. 2015 Nov 10;10(11):e0141287. doi: 10.1371/journal.pone.0141287.
13. Zhang J, Chen L. Clustering-based undersampling with random over sampling examples and support vector machine for imbalanced classification of breast cancer diagnosis. Comput Assist Surg (Abingdon). 2019 Oct;24(sup2):62-72. doi: 10.1080/24699322.2019.1649074.
14. Wibowo P, Fatichah C. Pruning-based oversampling technique with smoothed bootstrap resampling for imbalanced clinical dataset of covid-19. Journal of King Saud University - Computer and Information Sciences. 2022;34(9):7830–9. doi:10.1016/j.jksuci.2021.09.021.
15. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. Smote: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research. 2002;16:321–57. doi:10.1613/jair.953.
Blagus R, Lusa L. SMOTE for high-dimensional class-imbalanced data. BMC Bioinformatics. 2013 Mar 22;14:106. doi: 10.1186/1471-2105-14-106.
16. Mikolov T, Karafiát M, Burget L, Černocký J, Khudanpur S. Recurrent neural network based language model. Interspeech 2010. 2010; doi:10.21437/interspeech.2010-343.
17. Chen J, Yang R, Zhang C, Zhang L, Zhang Q. DeepGly: A deep learning framework with recurrent and convolutional neural networks to identify protein glycation sites from imbalanced data. IEEE Access. 2019;7:142368–78. doi:10.1109/access.2019.2944411.
18. Johansen MB, Kiemer L, Brunak S. Analysis and prediction of mammalian protein glycation. Glycobiology. 2006 Sep;16(9):844-53. doi: 10.1093/glycob/cwl009.
19. Weininger D. Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of Chemical Information and Computer Sciences. 1988;28(1):31–6. doi:10.1021/ci00057a005.
20. Amjoud AB, Amrouch M. Object detection using deep learning, CNNS and Vision Transformers: A Review. IEEE Access. 2023;11:35479–516. doi:10.1109/access.2023.3266093.
21. Liu Z, Wang Y, Gao T, Pan Z, Cheng H, Yang Q, Cheng Z, Guo A, Ren J, Xue Y. CPLM: a database of protein lysine modifications. Nucleic Acids Res. 2014 Jan;42(Database issue):D531-6. doi: 10.1093/nar/gkt1093.
22. Singh J, Litfin T, Paliwal K, Singh J, Hanumanthappa AK, Zhou Y. SPOT-1D-Single: improving the single-sequence-based prediction of protein secondary structure, backbone angles, solvent accessibility and half-sphere exposures using a large training set and ensembled deep learning. Bioinformatics. 2021 Oct 25;37(20):3464-3472. doi: 10.1093/bioinformatics/btab316.
23. Tareen A, Kinney JB. Logomaker: beautiful sequence logos in Python. Bioinformatics. 2020 Apr 1;36(7):2272-2274. doi: 10.1093/bioinformatics/btz921.
24. Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012 Dec 1;28(23):3150-2. doi: 10.1093/bioinformatics/bts565.
25. RDKit: Open-source cheminformatics. https://www.rdkit.org
26. Yu J, Qin M, Zhou S. Dynamic gesture recognition based on 2D convolutional neural network and feature fusion. Sci Rep. 2022 Mar 14;12(1):4345. doi: 10.1038/s41598-022-08133-z.
27. Starke S, Leger S, Zwanenburg A, Leger K, Lohaus F, Linge A, Schreiber A, Kalinauskaite G, Tinhofer I, Guberina N, Guberina M, Balermpas P, von der Grün J, Ganswindt U, Belka C, Peeken JC, Combs SE, Boeke S, Zips D, Richter C, Troost EGC, Krause M, Baumann M, Löck S. 2D and 3D convolutional neural networks for outcome modelling of locally advanced head and neck squamous cell carcinoma. Sci Rep. 2020 Sep 24;10(1):15625. doi: 10.1038/s41598-020-70542-9.
28. Oh S, Kang SR, Oh IJ, Kim MS. Deep learning model integrating positron emission tomography and clinical data for prognosis prediction in non-small cell lung cancer patients. BMC Bioinformatics. 2023 Feb 6;24(1):39. doi: 10.1186/s12859-023-05160-z.
29. Vargas VM, Guijo-Rubio D, Gutiérrez PA, Hervás-Martínez C. Relu-based activations: Analysis and experimental study for Deep Learning. Advances in Artificial Intelligence. 2021;33–43. doi:10.1007/978-3-030-85713-4_4.
30. Gao F, Li B, Chen L, Shang Z, Wei X, He C. A softmax classifier for high-precision classification of ultrasonic similar signals. Ultrasonics. 2021 Apr;112:106344. doi: 10.1016/j.ultras.2020.106344.
31. Li L, Jamieson K, DeSalvo G, Rostamizadeh A, Talwalkar A. Hyperband: Bandit-Based Configuration Evaluation for Hyperparameter Optimization. Journal of Machine Learning Research. 2018;18:1-52.
32. Anguita D, Ghelardoni L, Ghio A, Oneto L, Ridella S. The 'K' in K-fold Cross Validation. The European Symposium on Artificial Neural Networks. 2012.
33. Matthews BW. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta. 1975 Oct 20;405(2):442-51. doi: 10.1016/0005-2795(75)90109-9.
34. Iberg N, Flückiger R. Nonenzymatic glycosylation of albumin in vivo. Identification of multiple glycosylated sites. J Biol Chem. 1986 Oct 15;261(29):13542-5.
35. Al Temimi AH, Amatdjais-Groenen HI, Reddy YV, Blaauw RH, Guo H, Qian P, et al. The nucleophilic amino group of lysine is central for histone lysine methyltransferase catalysis. Communications Chemistry. 2019;2(1). doi:10.1038/s42004-019-0210-8.
36. Fitch CA, Platzer G, Okon M, Garcia-Moreno BE, McIntosh LP. Arginine: Its pKa value revisited. Protein Sci. 2015 May;24(5):752-61. doi: 10.1002/pro.2647.
37. Gallegos M, Costales A, Martín Pendás Á. Does Steric Hindrance Actually Govern the Competition between Bimolecular Substitution and Elimination Reactions? J Phys Chem A. 2022 Mar 24;126(11):1871-1880. doi: 10.1021/acs.jpca.2c00415.
38. Ahmad S, Khan MS, Akhter F, Khan MS, Khan A, Ashraf JM, Pandey RP, Shahab U. Glycoxidation of biological macromolecules: a critical approach to halt the menace of glycation. Glycobiology. 2014 Nov;24(11):979-90. doi: 10.1093/glycob/cwu057.
39. Zhang Q, Monroe ME, Schepmoes AA, Clauss TR, Gritsenko MA, Meng D, Petyuk VA, Smith RD, Metz TO. Comprehensive identification of glycated peptides and their glycation motifs in plasma and erythrocytes of control and diabetic subjects. J Proteome Res. 2011 Jul 1;10(7):3076-88. doi: 10.1021/pr200040j.
40. Baynes JW, Watkins NG, Fisher CI, Hull CJ, Patrick JS, Ahmed MU, Dunn JA, Thorpe SR. The Amadori product on protein: structure and reactions. Prog Clin Biol Res. 1989;304:43-67.
41. Shilton BH, Campbell RL, Walton DJ. Site specificity of glycation of horse liver alcohol dehydrogenase in vitro. Eur J Biochem. 1993 Aug 1;215(3):567-72. doi: 10.1111/j.1432-1033.1993.tb18067.x.
42. Malkov SN, Zivković MV, Beljanski MV, Stojanović SD, Zarić SD. A reexamination of correlations of amino acids with particular secondary structures. Protein J. 2009 Feb;28(2):74-86. doi: 10.1007/s10930-009-9166-3.
43. Bosnjak I, Bojovic V, Segvic-Bubic T, Bielen A. Occurrence of protein disulfide bonds in different domains of life: a comparison of proteins from the Protein Data Bank. Protein Eng Des Sel. 2014;27:65–72. doi: 10.1093/protein/gzt063.
44. Carugo O, Cemazar M, Zahariev S, Hudáky I, Gáspári Z, Perczel A, Pongor S. Vicinal disulfide turns. Protein Eng. 2003 Sep;16(9):637-9. doi: 10.1093/protein/gzg088.
45. Dong H, Sharma M, Zhou HX, Cross TA. Glycines: role in α-helical membrane protein structures and a potential indicator of native conformation. Biochemistry. 2012 Jun 19;51(24):4779-89. doi: 10.1021/bi300090x.
46. Morgan AA, Rubenstein E. Proline: the distribution, frequency, positioning, and common functional roles of proline and polyproline sequences in the human proteome. PLoS One. 2013;8(1):e53785. doi: 10.1371/journal.pone.0053785.
47. MacArthur MW, Thornton JM. Influence of proline residues on protein conformation. J Mol Biol. 1991 Mar 20;218(2):397-412. doi: 10.1016/0022-2836(91)90721-h.
48. Münch G, Schicktanz D, Behme A, Gerlach M, Riederer P, Palm D, Schinzel R. Amino acid specificity of glycation and protein-AGE crosslinking reactivities determined with a dipeptide SPOT library. Nat Biotechnol. 1999 Oct;17(10):1006-10. doi: 10.1038/13704.
49. Chilukuri H, Kulkarni MJ, Fernandes M. Revisiting amino acids and peptides as anti-glycation agents. Medchemcomm. 2018 Feb 12;9(4):614-624. doi: 10.1039/c7md00514h.
50. Mirmiranpour H, Khaghani S, Bathaie SZ, Nakhjavani M, Kebriaeezadeh A, Ebadi M, Gerayesh-Nejad S, Zangooei M. The Preventive Effect of L-Lysine on Lysozyme Glycation in Type 2 Diabetes. Acta Med Iran. 2016 Jan;54(1):24-31.
51. Esmaeili F, Maleki V, Kheirouri S, Alizadeh M. The Effects of Taurine Supplementation on Metabolic Profiles, Pentosidine, Soluble Receptor of Advanced Glycation End Products and Methylglyoxal in Adults With Type 2 Diabetes: A Randomized, Double-Blind, Placebo-Controlled Trial. Can J Diabetes. 2021 Feb;45(1):39-46. doi: 10.1016/j.jcjd.2020.05.004.
52. Houjeghani S, Kheirouri S, Faraji E, Jafarabadi MA. l-Carnosine supplementation attenuated fasting glucose, triglycerides, advanced glycation end products, and tumor necrosis factor-α levels in patients with type 2 diabetes: a double-blind placebo-controlled randomized clinical trial. Nutr Res. 2018 Jan;49:96-106. doi: 10.1016/j.nutres.2017.11.003.
53. Bidasee KR, Zhang Y, Shao CH, Wang M, Patel KP, Dincer UD, Besch HR Jr. Diabetes increases formation of advanced glycation end products on Sarco(endo)plasmic reticulum Ca2+-ATPase. Diabetes. 2004 Feb;53(2):463-73. doi: 10.2337/diabetes.53.2.463.
54. Katsuta N, Takahashi H, Nagai M, Sugawa H, Nagai R. Changes in S-(2-succinyl)cysteine and advanced glycation end-products levels in mouse tissues associated with aging. Amino Acids. 2022 Apr;54(4):653-661. doi: 10.1007/s00726-022-03130-y.
55. Shilton BH, Walton DJ. Sites of glycation of human and horse liver alcohol dehydrogenase in vivo. J Biol Chem. 1991 Mar 25;266(9):5587-92.
56. Iberg N, Flückiger R. Nonenzymatic glycosylation of albumin in vivo. Identification of multiple glycosylated sites. J Biol Chem. 1986 Oct 15;261(29):13542-5.
57. Shapiro R, McManus MJ, Zalut C, Bunn HF. Sites of nonenzymatic glycosylation of human hemoglobin A. J Biol Chem.
1980 Apr 10;255(7):3120-7. 58. Suzuki K, Nakagawa K, Miyazawa T. Augmentation of blood lipid glycation and lipid oxidation in diabetic patients. Clin Chem Lab Med. 2014 Jan 1;52(1):47-52. doi: 10.1515/cclm-2012-0886. 59. Shuck SC, Wuenschell GE, Termini JS. Product Studies and Mechanistic Analysis of the Reaction of Methylglyoxal with Deoxyguanosine. Chem Res Toxicol. 2018 Feb 19;31(2):105-115. doi: 10.1021/acs.chemrestox.7b00274. 60. Allaman I, Bélanger M, Magistretti PJ. Methylglyoxal, the dark side of glycolysis. Front Neurosci. 2015 Feb 9;9:23. doi: 10.3389/fnins.2015.00023. 134 Chapter 4: Conclusions and Future Directions In a world where the incidence of Type 2 Diabetes, and therefore instances of chronic hyperglycemia, is high and is expected to continue to rise, the need to understand and predict protein glycation increases as well. Both through directly altering biomolecular function and through the activation of the pro-inflammatory RAGE pathways, glycation adversely affects many physiological and immunological processes. As illustrated in Chapter 2, existing protein prediction models that have achieved high prediction accuracy scores are not performing well when applied to immune system proteins and peptides calling into question their applicability to predicting protein glycation in a physiologically relevant manner. In an attempt to rectify this, a new protein glycation prediction algorithm utilizing a 3D Convolutional Neural Network architecture, SweetSMILE, was created as discussed in Chapter 3. SweetSMILE outperformed all existing models in terms of predictive quality and also introduced an entirely novel input structure that allows for glycation prediction to be extended to amino acids besides lysine, modified proteins, as well as entirely different classes of biomolecules. 
While SweetSMILE is an improvement over existing models, it still suffers from the greatest limitation facing advances in predicting glycation: the need for larger and more detailed datasets. As discussed in Chapter 1, owing to its non-enzymatic nature, the glycation reaction is highly susceptible to local environmental conditions, and this should be reflected in the curation of glycation databases by including values for those conditions, such as the reaction temperature, pH, and particular reducing sugar used. Additionally, glycation databases need to be expanded to include instances of N-terminal glycation; glycation of arginine, histidine, and cysteine residues; and the glycation of other biomolecules.
Future directions for SweetSMILE should include a proteome-wide screening for proteins predicted to glycate and those unlikely to glycate. These results can then be cross-compared with mutagenesis and similar data to identify glycation events occurring in close proximity to functional sites within proteins. To better understand diabetic pathophysiology, these results could then be further screened based upon Gene Ontology processes and functions that correspond to known diabetic phenotypes. It would seem prudent to initially focus investigative efforts on biomolecules that exhibit one of two characteristics: 1) numerous moonlighting functions, as a loss of function in these molecules would theoretically have immediate ramifications across multiple systems (one example is lactotransferrin, which plays a role in iron homeostasis, acts as an antimicrobial agent in the innate immune system, displays opioid receptor binding capacity, and has several other functions); or 2) entirely unique functionality, as loss of function of these molecules would then directly correspond to a loss of that unique functionality.
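The cross-comparison step described above (predicted glycation sites checked against annotated functional sites) might be sketched as follows. This is only an illustrative outline under stated assumptions, not the SweetSMILE pipeline itself: the `PredictedSite` record, the function name, the probability threshold, the sequence-distance window, and all identifiers and values are hypothetical, and distance along the primary sequence is used as a crude stand-in for the spatial proximity that a real analysis would take from structural data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictedSite:
    """Hypothetical record for one predicted glycation site."""
    protein_id: str     # placeholder identifier, not a real accession
    position: int       # 1-based residue index of the lysine
    probability: float  # assumed classifier output in [0, 1]


def sites_near_functional_residues(predictions, functional_sites,
                                   threshold=0.5, window=5):
    """Return (site, distance) pairs for confident predictions that lie
    within `window` residues of an annotated functional residue."""
    hits = []
    for site in predictions:
        if site.probability < threshold:
            continue  # skip residues predicted unlikely to glycate
        annotated = functional_sites.get(site.protein_id, ())
        nearest = min((abs(site.position - pos) for pos in annotated),
                      default=None)
        if nearest is not None and nearest <= window:
            hits.append((site, nearest))
    # Most confident predictions first; ties broken by proximity.
    hits.sort(key=lambda item: (-item[0].probability, item[1]))
    return hits


# Made-up inputs for illustration only.
predictions = [
    PredictedSite("PROT_A", 12, 0.91),
    PredictedSite("PROT_A", 80, 0.95),
    PredictedSite("PROT_B", 33, 0.40),
    PredictedSite("PROT_B", 47, 0.72),
]
functional_sites = {"PROT_A": [10, 15], "PROT_B": [50]}

for site, distance in sites_near_functional_residues(predictions, functional_sites):
    print(site.protein_id, site.position, site.probability, distance)
```

The resulting shortlist could then be filtered further, for example by Gene Ontology terms matching known diabetic phenotypes, before any experimental validation.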
Once the proteins, peptides, and/or other biomolecules of interest have been identified, they can be experimentally tested using a combination of mass spectrometry and functional assays similar to what was begun in Chapter 2. Glycation data generated from these studies can then be used to further train the SweetSMILE model and continue improving its predictive quality.
Once promising candidate biomolecules that are both likely to glycate and tied to a known complication have been identified, and the aforementioned initial functional screening and glycation validation studies have been conducted, the next step would be collecting clinical data from pre-diabetic and diabetic patients. Particular emphasis should be placed upon collecting data from patients who exhibit differing progressions, not only in degree of insulin resistance but also in the co-morbidities developed and their order. Such collection would require samples far beyond simple blood draws, including the sampling of a variety of cell types chosen according to the expression profile of the biomolecules of interest, because this is where the richest data will most assuredly lie. As discussed in the introductory chapter of this dissertation, glycation is a highly nuanced reaction influenced by numerous conditions within the local environment, all of which will vary from cell to cell and from organelle to organelle owing to differential expression of glucose transporters, metabolic enzymes, and a myriad of other factors underpinning the differences arising from cellular differentiation. Consequently, the fraction of any given biomolecule that is glycated should be expected to vary depending on the cell or organelle from which the sample is derived.
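As a rough illustration of the quantity discussed above, the fraction of a biomolecule glycated in each compartment could be computed from paired glycated and unmodified signals. The sketch below assumes, hypothetically, that each measurement is a tuple of compartment name, glycated signal, and unmodified signal (for instance, summed peptide intensities from mass spectrometry); the function name and all numbers are invented for the example and do not come from the dissertation's data.

```python
from collections import defaultdict


def glycated_fraction_by_compartment(measurements):
    """Pool (compartment, glycated_signal, unmodified_signal) tuples and
    return the fraction glycated for each compartment."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for compartment, glycated, unmodified in measurements:
        if glycated < 0 or unmodified < 0:
            raise ValueError("signals must be non-negative")
        totals[compartment][0] += glycated
        totals[compartment][1] += unmodified

    fractions = {}
    for compartment, (glycated, unmodified) in totals.items():
        total = glycated + unmodified
        # A compartment with no signal at all is reported as None, not 0,
        # so that "not detected" is not confused with "not glycated".
        fractions[compartment] = glycated / total if total > 0 else None
    return fractions


# Made-up values for illustration only.
measurements = [
    ("cytosol", 10.0, 90.0),
    ("cytosol", 20.0, 80.0),
    ("mitochondrion", 30.0, 30.0),
    ("nucleus", 0.0, 0.0),
]
print(glycated_fraction_by_compartment(measurements))
```

Comparing such per-compartment fractions across patients is one simple way the cell-to-cell and organelle-to-organelle differences could be tabulated before looking for correlations with clinical progression.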
This variation could provide invaluable insight into the metabolic state of specific cells, organelles, and even specific metabolic pathways contained within either, all of which can then be investigated for correlation with specific progression patterns and development of co-morbidities within the diabetic population. Indeed, it may one day be that these glycation constellations can be used to resolve type 2 diabetes into a number of sub-types, each with its own optimized treatments and timelines, not unlike what has been done in the field of cancer research and medicine.
Beyond the topic of predicting protein glycation, this dissertation raised several novel possibilities regarding the reaction and its physiological role throughout evolutionary history. The first of these challenges the supposition of a glycation motif, positing instead that it is more accurate to frame it as a number of different factors conferring resistance to glycation, which would naturally occur in their absence. This could be tested either through mutagenic studies or through the creation of synthetic proteins and peptides in which the sequences are altered in ways predicted to result in the loss of this resistance. The second novel possibility is that protein glycation has some physiologically beneficial effects, including the stabilization of proteins and peptides, thereby extending their half-lives, as well as serving as a short-term storage source for reducing sugars. Both of these ideas could be tested through labeled glycation studies.