CREATING ACCESSIBLE UML CLASS DIAGRAMS

By

Ira Woodring

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Computer Science—Doctor of Philosophy

2025

ABSTRACT

Unified Modeling Language (UML) Class Diagramming is the commonly accepted mechanism used to describe relationships between software components. In addition, it is an essential educational tool used to convey the structure of software and the patterns of software design to students. Unfortunately, UML is a visual-only mechanism and is therefore not useful for developers and students who are blind or have visual impairments. This work describes a method for conveying class diagrams using audio, which addresses this lack of a tool to support these populations. The method works by dividing the views of a diagram into smaller spaces. Elements in these subspaces are conveyed through manipulation of audio properties. Multiple user studies were performed to demonstrate that the tool is viable for conveying the static structure of software elements and that the workload required to use the tool is reasonable. The results of the studies indicate that the tool is effective and requires only slightly higher mental workload than traditional class diagrams.

This thesis is dedicated to my wife, Sarah, who believed in me, to Sue and Eric Dravland who showed me I was worth loving, and to my kids who gave me the strength to keep going.

ACKNOWLEDGEMENTS

Thank you to Michael Hudson for your contributions and support. Your perspective has been eye-opening and invaluable. Thank you to my colleagues at Grand Valley State University. Without your support, friendly ears, and advice, I would not be where I am today. Most of all, thank you, Dr. Charles Owen, for your patience, wisdom, and understanding. Your guidance and support made this work possible. I have had few great teachers over the years, and even fewer who show the compassion, care, and dedication that you provide to your students. Thank you, Sir.

TABLE OF CONTENTS

CHAPTER 1 INTRODUCTION
CHAPTER 2 DETERMINING USER PERCEPTION OF AUDIO STIMULI AS GEOMETRIC SHAPES
CHAPTER 3 DETERMINING LISTENER PRECISION
CHAPTER 4 USING NONETS AND SOUND TO CONVEY STATIC DIAGRAM ELEMENTS
CHAPTER 5 REAL-WORLD DIAGRAM AND WORKLOAD TESTING
CHAPTER 6 CONCLUSIONS
BIBLIOGRAPHY

CHAPTER 1

INTRODUCTION

UML Class Diagrams are the most commonly accepted method of modeling software. However, there is presently a glaring inadequacy in that they are only usable by developers without visual impairments or blindness. Although experienced developers may overcome this inadequacy, computing students with disabilities are left to struggle while learning complex computing topics without these vital aids.

According to the US National Center for Education Statistics, 14% of American schoolchildren (7.3 million total children) qualify as having a disability under the Individuals with Disabilities Education Act (IDEA) [7]. As these students matriculate into the workforce and higher education, it is imperative that they receive as good an education as their peers.
For students who major in software engineering or related fields, the inability to use a tool as powerful and prevalent as class diagrams makes this impossible.

Furthermore, in the last decade, multiple countries have accepted that at least a cursory understanding of computer science concepts is essential for their citizens. The United Kingdom, for example, has added computing competencies to its national curriculum, and President Obama created AccessCSforall1 in the United States [5, 4]. Obama noted that computer science should become a “basic skill” for the citizenry. Implicit in this endeavor is the mandate to ensure equal access to computing education for students with disabilities.

1 https://www.washington.edu/accesscomputing/accesscsforall

At the same time, the complexity of many modern software solutions has become immense. For example, the F-35 Joint Strike Fighter jet in the United States has an estimated 24 million lines of code [65]. The code base for Google services is estimated to be more than 2 billion lines of code [54]. Even small desktop applications often have a large number of lines of code; the popular open-source VLC media player, for example, consists of over 32 million lines2. Table 1.1 presents the number of lines of code for several popular open-source projects.

2 As measured by the Github-Linguist tool available at https://github.com/github/linguist

Project             Lines of Code
Linux Kernel        1,010,549,062
Swift                  92,988,974
Rust                   53,603,913
VScode                 42,293,424
VLC Media Player       32,920,033
React                   6,143,496
Vuejs                   2,926,351

Table 1.1 Lines of code (as measured by Github-Linguist) of selected popular open-source projects. Represents the state of the projects as of March 2022.

Object-oriented design of software projects allows for dividing code into areas of concern and thus managing this complexity. Alan Kay, the creator of Smalltalk (the first object-oriented programming language), noted that “Object-oriented design is a successful attempt to qualitatively improve the efficiency of modeling the ever more complex dynamic systems and user relationships made possible by the silicon explosion” [39]. However, even when divided into areas of concern, there needs to be a way to easily represent the interrelationships between various object types. The computing field makes use of many different diagram types for various purposes, such as Class Responsibility Collaboration (CRC) cards, Entity Relationship (ER) diagrams, and Unified Modeling Language (UML) diagrams. Traditionally, static software structure has been conveyed through UML class diagrams. However, to include all individuals, a different method is necessary.

UML is a visual language for modeling systems [16]. It consists of multiple diagram types that represent systems and interactions with and within systems. Of the various types of UML diagrams, UML Class Diagrams are used for modeling relationships between code entities in object-oriented software projects. Many undergraduate computer science programs teach class diagrams, and it is generally considered the standard way to model software. Although some studies show that UML use is decreasing [12], it is still a valuable tool for representing the relationships between components in a software design and a powerful pedagogical tool.

Previous research demonstrates that properly designed sonification and auditory displays enhance data presentation and comprehension across various domains.
For example, studies have shown that sonifying image features helps listeners better understand and reproduce image details through drawing [78]. In medical settings, sonification has been shown to be as effective as visual presentation in determining respiratory information, even in busy environments [72]. Individuals monitoring computer network traffic experienced reduced workload and improved awareness of traffic types and attacks when using auditory displays [24]. Offering multiple modalities for data interaction improves user experience and understanding.

1.0.1 UML Class Diagrams

Unified Modeling Language (UML) is a visual language used to describe aspects of software systems. Class diagrams provide UML with the ability to model the static structure of a system [16]. They provide developers and maintainers of systems with an understanding of the overall architecture, as well as information about the relationships between a system’s elements. Class diagrams are a type of node-edge graph, where nodes represent software elements and edges represent the relationships between them.

However, UML is not a perfect solution. Students often struggle to learn and use UML diagrams [55, 22, 60]. The problems of learning UML arise from the complexity of the language, as well as from inadequate materials for teaching and learning UML [60]. The most notable inadequacy, however, is that UML is a visual language and therefore not useful to students who are blind or have visual impairments. Unfortunately, previous research has concluded that it is impossible to fully include these individuals without the assistance of an individual without visual impairment, due to the lack of available and appropriate tools [42].

Using supplemental or alternate representations of information can benefit educators and students in this regard. By providing supplemental data representations, students receive an additional mechanism for understanding and exploration. The provision of alternate representations ensures that more students have access and can overcome some of the barriers imposed by differing abilities and schemas. For example, in a 2021 study, students were taught the bubble, insert, selection, and merge sorting algorithms in short 15-minute lessons with the help of audio [7]. During the lessons, participants were instructed to run a program that used the algorithm they had just learned to sort 2500 random numbers. Participants were exposed to an output of the time taken, an audio sonification of the run as a sorting visualization, or a combined sonification and visualization. Two weeks after the lessons were completed, the students received a survey quiz to assess learning and retention. The results showed that the participants who received the sonification performed as well as the students who received graphical representations. Similarly, researchers have had success with multi-modal methods of teaching arrays and other data structures [50, 18]. Researchers have suggested that a computer science curriculum that includes multiple modalities results in a significant increase in reports of programming self-efficacy in blind students [61]. It has also been shown that users often prefer multi-modal methods of interfacing with data.
In their ChartMaster work, which seeks to present stock market data effectively with screen readers, Zou and Treviranus noted that users wanted options and that no single modality for presenting the information (among speech, sonification, haptic, and hybrid methods) was preferred by all users [80]. Doush et al. reinforce this in their work displaying charts from common office productivity software; their guidelines for graphical representations specifically call for presentation strategies that can be customized and affected by user control [10].

UML provides for a variety of diagram types, each with its own node and edge types. The nodes in class diagrams usually indicate a class structure, although they are generic enough to be used for interfaces, enumerations, and other types of entities. Among the most important element types of a class diagram are the edges between the nodes, which represent the relationship types (Figure 1.1).

Figure 1.1 Elements of a class diagram include classes (represented as blue rectangles) and a variety of arrows to represent different relationship types between them. (a) A generalization relationship. (b) An association relationship. (c) A realization relationship. (d) A dependency relationship.

Relationships between classes in object-oriented code bases are one of four types. We list them here in alphabetical order.

The first type, “associations”, indicates that one class holds a reference to one or more instances of another class as an instance variable. There are two additional special types of associations, “aggregations” and “compositions”. We do not go into those here, as a generic association type is adequate for our work.

A “dependency” exists where a class relies upon the existence of another but does not inherit from that class and does not hold an instance variable of the other class’s type. These may exist because the first class has a local variable of the second type, takes the second type as a parameter to one of its methods, or returns the second type from one of its methods.

“Generalizations” indicate that a class inherits information from another class. Inheritance is vital to object-oriented programming, allowing for simplicity in the creation of polymorphic elements.

“Realizations” indicate the use of an interface. Not all computer languages support interfaces, and it is possible to represent this type of relationship with generalizations.

1.0.1.1 Representations of Node-Edge Graphs

Node-edge graphs are found in many varying areas of science and mathematics. Class diagrams are a specialized form of a directed node-edge graph. Node-edge graphs may be represented in a variety of alternative formats. Consider the diagram in Figure 1.2:

Figure 1.2 A simple class diagram composed of six classes, and the generalization and association relationships between them.

Using set notation, we can mathematically represent the same diagram as a set of nodes and the edges between them. Here, we call the diagram G, the nodes V, and the edges E. We can then define the diagram as follows:

G = (V, E)                                                        (1.1)
V = {Character, Inventory, Mage, Paladin, Pet, Rogue}             (1.2)
E = {(Character, Inventory), (Character, Pet), (Mage, Character),
     (Rogue, Character), (Paladin, Character)}                    (1.3)

An adjacency list could also be used to represent the graph. An adjacency list is a list of lists, with each node in the graph containing a list of nodes to which the node connects.
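To make this concrete, the following sketch in Java (hypothetical, and not the tool developed in this thesis) stores the Figure 1.2 diagram as an adjacency list whose edges also record the UML relationship kind discussed above; the class names mirror the example diagram, while DiagramGraph, Edge, and Relationship are names invented for this illustration:

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DiagramGraph {
    // The four UML relationship kinds discussed above.
    enum Relationship { ASSOCIATION, DEPENDENCY, GENERALIZATION, REALIZATION }

    // One outgoing edge: the target class and the kind of relationship.
    record Edge(String target, Relationship type) { }

    public static void main(String[] args) {
        // Adjacency list: each class maps to the list of classes it connects to.
        Map<String, List<Edge>> adjacency = new LinkedHashMap<>();
        adjacency.put("Character", List.of(
                new Edge("Inventory", Relationship.ASSOCIATION),
                new Edge("Pet", Relationship.ASSOCIATION)));
        adjacency.put("Inventory", List.of());
        adjacency.put("Mage", List.of(new Edge("Character", Relationship.GENERALIZATION)));
        adjacency.put("Paladin", List.of(new Edge("Character", Relationship.GENERALIZATION)));
        adjacency.put("Pet", List.of());
        adjacency.put("Rogue", List.of(new Edge("Character", Relationship.GENERALIZATION)));

        // Print each class followed by its outgoing edges, e.g.
        // "Character -> Inventory (ASSOCIATION) Pet (ASSOCIATION)".
        adjacency.forEach((node, edges) -> {
            StringBuilder row = new StringBuilder(node + " ->");
            edges.forEach(e -> row.append(" ").append(e.target())
                    .append(" (").append(e.type()).append(")"));
            System.out.println(row);
        });
    }
}

The same structure transposes directly into the adjacency matrix of Table 1.2 below, where the letter stored at a row/column intersection encodes the relationship type.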
Adjacency-list representations work best for graphs that have a smaller number of connections between elements. Our diagram above would be represented as:

Character - Inventory, Pet
Mage      - Character
Rogue     - Character
Paladin   - Character                                             (1.4)

An adjacency matrix view provides a representation of the same graph in table form. For a graph with n nodes, an adjacency matrix will be an n × n table.

            Character   Inventory   Mage   Paladin   Pet   Rogue
Character                   A                          A
Inventory
Mage            G
Paladin         G
Pet
Rogue           G

Table 1.2 An adjacency matrix representation of the diagram from Figure 1.2. Here, we have used the letters “A” and “G” to indicate whether the connection is an association or a generalization.

1.0.2 Rights of Individuals with Disabilities

Judicial precedent (Case or Common Law) and statutes (Civil Law) govern the rights of individuals with disabilities in the workforce, education, and other areas of life. What follows is a summary of some of the more important rules and regulations.

Although it specifically addressed race, the landmark Brown v. Board of Education of Topeka [1] is regarded as an important milestone for disability advocates, as it was in many ways the beginning of judicial support for minority rights [45]. Furthermore, as Meyer and Boutcher argue, the strategies used by activists and legal teams for Brown were adopted by those working on disability rights initiatives; therefore, its importance for the cause of equity for people with disabilities cannot be overstated.

Pennsylvania Association for Retarded Children (PARC) v. Commonwealth of Pennsylvania (1971) was brought by parents of children with mental disabilities, who argued that their children were entitled to appropriate state-provided education and services. Furthermore, the parents wanted their children to be served in an age-appropriate school attended by non-handicapped children [3]. The court set an important precedent by finding in the parents’ favor, and later legislation mandated this educational model. Mills v. Board of Education (1972) reinforced the PARC result. Judge Joseph Cornelius Waddy, who presided over the case, wrote in his decision that public education is “a right which must be made available to all on equal terms.”

In 1973, the U.S. Congress passed the Rehabilitation Act to prevent discrimination against individuals with disabilities. Section 504 of this law mandates that any entity that receives federal funding must not discriminate solely on the basis of an individual having a disability. The section specifically mentions that it pertains to organizations “principally engaged in the business of providing education, health care, housing, social services, or parks and recreation”. Section 508 specifies that electronic and information technology systems should be accessible to persons with disabilities3.

3 Unless doing so would impose an undue burden on the agency.

The Education for All Handicapped Children Act of 1975 enshrined a federal right for students with disabilities to receive a free and appropriate public education (FAPE) in the least restrictive environment (LRE) possible depending on a child’s needs. In addition to education, the legislation entitled those students to related services to support their education. An important aspect of this law was the requirement that students with disabilities have an individualized education program (IEP) tailored to their specific needs. The law allowed parents to be involved in creating and reviewing the IEP and required states to have procedures on which parents could rely to challenge the IEP.
The law was reauthorized in 1990, at which time it was renamed the Individuals with Disabilities Education Act (IDEA), and reauthorized again in 2004 to align with No Child Left Behind (NCLB), a law that tied educational funding to student achievement on state-mandated performance metrics. As NCLB did not add new protections for students with disabilities (and has since been mostly replaced), it is not covered in further detail here.

The 1982 case Board of Education v. Rowley provided the Supreme Court with its first opportunity to rule on the Education of the Handicapped Act. This suit was brought by the parents of a child who was deaf. Although their daughter had been somewhat successful academically, they felt that she was not meeting her potential, as she was unable to understand all the words in the classroom. The court sided with the school board, finding that the 1975 act did not intend for students to meet their full potential, but rather to provide them access to a free, appropriate public education.

The Americans with Disabilities Act (ADA) of 1990 added additional protections for individuals with both physical and mental disabilities, such as the requirement that businesses provide “reasonable accommodations” to employees with disabilities, requirements for accessible public transportation, accessible design requirements for new and existing public buildings, and other miscellaneous provisions.

1.0.2.1 Implications of Disability Laws

The spirit of the law concerning Americans with disabilities is clear in that alternative representations of information should be made available to students and employees in cases where doing so does not place an undue burden on an organization. The effects of these laws exist all around us; for instance, in Braille signage in public buildings and closed captioning in television broadcasts. However, there exists an obvious lack of accessibility in diagramming. This is, without a doubt, due to the complexities of creating alternative representations of information that meet these specialized use cases. However, as the ability to work with data becomes ever more important across many occupations, the lack of viable alternative representations of data becomes not just inconvenient but economically limiting to the millions of individuals who experience it.

1.0.3 W3C Best Practices for Accessibility

With the ubiquitous presence of Web browsers on virtually all commodity computers, as well as the widespread adoption of the Web as the primary platform for sharing content, there has been an increasing focus on best practices for assistive computing in this domain. Although the W3C4 exists to develop accessibility standards specifically for the Web, its recommendations serve as one of the most important repositories of best practices to ensure that content is accessible for all people. What follows is a summary of the key points of their Content Accessibility Guidelines [2].

4 https://www.w3.org/about/

The W3C recommends that content adhere to four principles of accessibility. The first is that the content must be presented in a way that is perceivable by users. In practice, this may mean that there are text-based alternatives to audio-based information such as captions, that there are audio-based descriptions of visual media, or that some other alternative representation(s) of data exist.
Furthermore, the content should be adaptable, which means that the presentation of information should be changeable without losing access to the information. Orientation should not be fixed, and information should be easily distinguishable. Color schemes in visualizations and audio streams should provide easily differentiable stimuli.

Second, the user interface must be operable. The W3C recommends that all features of an interface be available from the keyboard and notes that careful design should ensure that keyboard focus is never “trapped” in a content area. At the same time, they recommend that alternative input modalities, such as touch screens and speech recognition, be added when possible. Input should not be time-limited, and the timing of any stimuli should be controllable by the user. Furthermore, there should never be more than three flashes in the time span of one second, so as not to trigger seizures.

Third, the information and operation of the user interface should be understandable. Idioms and jargon should be kept to a minimum. The operation of the interface should be predictable and the navigation consistent throughout the system. Error prevention mechanisms should be prevalent and work well. Changing the focus of a particular structural element should not modify contextual areas such as the viewport size or layout.

Lastly, the content must be robust. Here, the W3C means that content should be compatible with a large number of technologies, such as screen readers. Changes in user input elements should always provide the user with change notifications. Status messages must be available to assistive technologies.

1.0.4 Interfacing With Screen Readers

The most widely used digital tool for people with blindness or vision impairments is the screen reader. A 2021 survey by the Institute for Disability Research, Policy and Practice found that of 1,568 respondents, 98.7% reported using a screen reader. Of these survey respondents, 53.7% reported primarily using Job Access With Speech (JAWS)5 and 30.7% used NonVisual Desktop Access (NVDA)6.

5 https://www.freedomscientific.com/products/software/jaws/
6 https://www.nvaccess.org/download/

The Accessibility Principles of the World Wide Web Consortium (W3C) call for all non-textual content to provide a textual representation [6]. Non-textual content refers to images, buttons, and other controls, as well as multimedia and data represented with charts or other diagrams. Unfortunately, research has shown that these standards are often ignored. One study determined that only 39.6% of the images on the top 500 high-traffic websites had applied textual representations [15]. This is particularly problematic when an image contains a visualization of data or information. In one study of the frustrations encountered by more than 100 screen reader users, the researchers found that lack of appropriate alt-text, poorly labeled or unlabeled forms, poor navigation, and other problems caused considerable frustration and up to 30% lost time when using computers [40].

An analysis by Sharif et al. classified the problems that screen reader users faced while browsing websites randomly drawn from a curated set of 50 sites that included visualizations [58]. The problem categories they discovered were:

1. Invisible Visualizations - Screen readers did not detect a visualization on 33% of the pages that had them.
2. Incomprehensible Visualizations - Visualizations were detected by the screen reader but were labeled only as “blank”, “graphic”, etc.
3. Lack of Access to Any Data Points - Screen readers could detect a visualization but could not access the data within.
4. Lack of Holistic Exploration and Trend Assessment - Users were unable to gain a high-level understanding of the visualizations and the data within.
5. Lack of Ability to Investigate Specific Data Points - Some data points may be available, but not all.
6. Lack of Tabular Representation - Information was only provided via the graphical modality.
7. Lack of Textual Representation - Lack of alt-text and other descriptive text.
8. Lack of Description of Overall Trend in a Non-Visual Format - No summary information or description of the visualization was provided.

The study designers recommended the appropriate use of alternate text and ARIA attributes, as well as a design that allows for holistic and lower-level exploration, autogenerated alternate text, and multiple modes for exploring data such as sonification.

1.0.5 Contributions of this Thesis

This work, and future work based on it, is an effort to make UML class diagrams more accessible to students who are blind or visually impaired, while also determining best practices for tools that allow for multimodal exploration of UML class diagrams (and possibly other diagram types). Although UML encompasses more types of diagrams than just class diagrams, a survey by Muller found that class diagrams are the most widely used in presentations to computer science and software engineering students [46]. Specifically, the efforts presented here were to develop a mechanism for presentation of class diagrams that:

• Does not require manual re-creation of existing diagrams.
• Reproduces diagrams from an existing, widely-used, production-quality tool.
• Uses inexpensive, ubiquitous, or nearly ubiquitous technologies for rendering diagrams.
• May require some training, but must not require a great deal of training to use effectively.

The mechanism in this work, the use of sound for supplementing and replacing visualizations, gives presenters a way to overcome the limitations of visualizations. It does this by careful manipulation of psychoacoustic properties, as determined by the authors’ and other researchers’ work.

1.1 Human Perception

The processes of human perception are complicated and not well understood. They comprise much more than the simple reception of physical stimuli. The following sections detail the knowledge that has informed this work.

1.1.1 Human Visual Perception and Visualization Best Practices

The work in this thesis focuses on human audio perception, but to disregard the well-established knowledge of the human visual system would be folly. Furthermore, the science of human audio perception is not as mature as that of visual perception; therefore, we look to the more mature research for hints as to how this complicated phenomenon works. Undoubtedly, the physical processes of recognizing waves of light as a picture of a loved one differ from those that recognize waves of audio as a recording of their voice. However, there are important parallels between the psychological processes that perform both tasks. For example, Albert Bregman noted in his work on audio perception that sensory inputs, be they from drawings or recordings, are separated into groups by the nervous system as it makes sense of patterns in the world [14].
Although classical theories on the evolution of the physical aspects of human vision hypothesized that it developed largely to consume the maximum energy input from the Sun, recent work has indicated that it likely developed to maximize the input of information [25]. However, not all information is useful; thus, human perceptual systems make use of the mechanism of attention to determine which groups of stimuli require further processing [33]. But what criteria cause the visual perceptual system to form a stimulus group? For instance, what causes human brains to associate the varying waveforms of light detected by our physical processes into an entity we could psychologically recognize as an apple? Research shows that objects in a scene are identified by properties such as color, texture, orientation, spatial frequency, brightness, and movement direction, stimuli that are easy for our systems to “separate” from competing stimuli [67].

However, separable stimuli alone are not enough. For example, studies have shown that the use of disparate colors alone is not enough to create an effective visualization; instead, visualization designers should consider the meaning of colors in context [66]. This implies that the creation of an effective visualization requires a strong understanding not just of the data being presented, but of the viewers’ mental models and schemas about the data. This is probably why, even though there is a great body of visualization research, there is a lack of centralized, holistic guidelines for visualizations [26].

1.1.1.1 Visualization Best Practices

Ben Shneiderman introduces a taxonomy for organizing and understanding the various approaches to information visualization in his often-cited work, “The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations” [59]. The central framework of the paper is built around the Visual Information-Seeking Mantra: “Overview first, zoom and filter, then details on demand.” This mantra emphasizes the importance of first giving users an overview of large data sets, allowing them to zoom in and filter data based on their interests, and finally providing detailed information when requested. This process helps to manage the challenge of navigating large and complex data sets efficiently.

Shneiderman proposes a task-by-data-type taxonomy to classify visualization techniques. He identifies seven key data types: one-dimensional data (like text), two-dimensional data (such as maps), three-dimensional data (real-world objects), temporal data (time-based sequences), multidimensional data (attributes with multiple variables), tree structures (hierarchies), and network data (interconnected items). The paper also outlines seven core tasks that users perform when interacting with visualized data: overview, zoom, filter, details-on-demand, relate, history, and extract. These tasks help users interact meaningfully with the data, from gaining a broad understanding to finding specific details.

The paper further discusses the value of dynamic queries and filtering techniques that allow users to rapidly explore large data sets. By offering smooth user-controlled interactions, these systems facilitate easier navigation and understanding of complex information. Shneiderman also explores the application of various visualization tools, such as treemaps and fisheye views, and emphasizes the role of advanced interface designs in reducing information overload.
Ultimately, the paper argues that effective visualizations must make full use of human perceptual abilities and create intuitive interfaces for a broad range of users. Although this work focuses on visualizations, the process it identifies translates well to other sensory inputs, as it is really a guide for how to present information in any form.

A study by Doush et al. provided ten recommendations for the presentation of graphical information [11]. Although their work focuses on the multimodal presentation of more generic graphical information, some of their guidelines should be considered in the design of a multimodal UML framework. In particular, their work notes the need for:

1. Summary information
2. Customizable presentation strategies
3. User ability to change presentation modality
4. Structural hierarchy views of information for ease of navigation (i.e., chart titles and axis information)
5. Contextual information like pauses and other audio cues
6. Careful choice of scale

These best practices reinforce Shneiderman’s results and go further; providing summary information is analogous to providing an overview, for instance. Zooming and filtering allow users to change the presentation modality. Providing contextual information, such as pauses, becomes increasingly important when switching from visual to audio stimuli as the primary method of representing information.

1.1.2 Human Audio Perception

The phenomenon of human audio perception is complex and involves many stages of processing in multiple parts of the body. It has been noted that humans can perceive light with wavelengths between 380 and 750 nm, a range of roughly one octave, while humans perceive sound from 20 Hz to 20,000 Hz, or nearly 10 octaves [49]. The science of audio perception involves biological and psychological processes.

1.1.2.1 Biological Processes of Hearing

The US National Institutes of Health summarizes the basic biological processes of hearing [63]. Vibrations traveling across some medium (usually air, but not always) are first picked up by the outer ear. The outer ear is made up of the fleshy portion called the pinna and the ear canal. Interestingly, the physical properties of the pinnae, such as shape, affect the incoming vibrations, resulting in perceptual differences [35]. After the sound travels through the ear canal, frequencies between 3,000 and 4,000 Hz are amplified before the sound is received by the tympanic membrane (what many people refer to as the eardrum) [63]. This membrane connects to the ossicles, a group of three small bones in the middle ear. The sound then travels to the inner ear. For hearing, the most important part of the inner ear is the cochlea. Within the cochlea, vibrations are converted to electrical signals that the brain can consume by means of tiny hair cells called stereocilia. Figure 1.3 shows the three parts of the ear.

Figure 1.3 The three areas of the ear. Adapted from [63].

1.1.2.2 Psychological Processes of Hearing

Although the biological processes of hearing are complex, the basic physical mechanisms are fairly well understood. In contrast, scientists continue to unravel the mysteries of the psychological side of human hearing, a field dubbed psychoacoustics. Early work by Colin Cherry identified one of the first foundational problems of psychoacoustics, the famous “Cocktail Party” problem [21].
Cherry noted that under certain circumstances, humans can perceive more than one speaker at a time; for example, while we are listening to one person at a cocktail party, we are often able to recognize the speech of another person simultaneously. Cherry performed tests where subjects listened with one or both ears to auditory stimuli while he varied the properties of the stimuli, and found that changes in delay caused perceptual effects in the listeners. Given a delay between the two stimuli, participants perceived the sound source to move to their left or right side. More interestingly, with a delay of less than 20 ms between the same speech in both ears, participants perceived the two stimuli as coming from a single source. These results helped researchers realize that sound perception is more than just the sum of physical stimuli.

Later research by Albert Bregman continued to examine these perceptual oddities. Bregman created the term “Auditory Scene Analysis” to describe his theories of human audio perception [14]. Just as the visual system uses separable properties of visual stimuli, the brain relies on separable audio properties to determine which audio stimuli should be grouped. Bregman referred to the groupings as “streams”. He was very clear to stipulate that a stream differed from a sound; he noted that a stream is a single event and that streams could be composed of multiple sounds (for instance, he noted that a musical performance might consist of a singer with a piano backing).

Bregman’s work often found auditory parallels to existing Gestalt principles of visual perception. These principles - proximity, similarity, uniform density, common fate, closure, direction, and a few others that we will not name here - were identified by Gestaltists as deterministic factors in perceptual organization [68]. For example, consider the images in Figure 1.4; these images were used by Bregman to illustrate the Gestalt property of closure. In image “a”, there is not enough information for human brains to try to close and connect the (seemingly) random shapes. In image “b”, though, we cannot help but see the pieces as making up the five capital “B” shapes. Similarly, Bregman notes that when an audio stimulus is masked with a louder stimulus, the softer-sounding stimulus is still perceived to continue playing (even if it is explicitly removed).

Figure 1.4 An example of the Gestalt property of closure; in (a), the individual pieces seem unconnected, while in (b), they seem to be capital letter “B”s. From [14].

1.1.2.3 The Use of Complex Sounds

Schutz and Gillard found that an overwhelming amount of research in non-speech audio perception relied on flat, simplistic tones; specifically, tones with a flat amplitude envelope [56]. They conclude that researchers choose these tones to avoid introducing additional variables and complexity into their works and to aid the reproducibility of their work. However, Schutz argues that these synthesized tones do not elicit the same perceptual responses as more complex tones. He notes that amplitude envelope and timbre allow humans to deduce information such as the materials used to create a sound, and even to detect events (he uses the example of a bottle breaking). The authors’ early works relied on pure tones, usually simple sinusoids with a flat onset-to-offset response. That has since changed to synthesized instruments with a more complex amplitude envelope and additional timbre features.
Furthermore, studies have shown that the psychoacoustic properties of sounds need to adhere to listeners’ predefined mental schemas. For instance, Walker found that in magnitude estimation tasks (i.e., changes in stock prices, velocities, pressure, etc.), existing schemas about the polarity of values were an important design consideration [69]. In particular, he found that listeners sometimes related an increase in a modulated psychoacoustic property to an increase in magnitude, but other times related an increase in the property to a decrease in magnitude. He warned that sonification designers need to evaluate such mappings before designing a sonification. Ferguson and Brewster performed studies showing that modulation of psychoacoustic properties that generally have a negative connotation, such as noise or roughness, may be better at conveying properties in sonifications that are considered negative, such as stress or pressure [31].

1.1.3 Psychoacoustic Properties

The properties of sound and how they are interpreted by listeners are called psychoacoustics. There are many properties of sound that may be manipulated for the transmission of information. Dubus and Bresin reviewed a large body of sonification research and identified 30 properties and sub-properties that were being used (Table 1.3) [28].

Pitch, Pitch Range, Timbre, Instrumentation, Polyphonic Content, Voice Gender, Allophone, Spectral Power, Amplitude of Harmonic, Frequency of Harmonic, Roughness, Brightness, Center Frequency of Filter, Saliency, Dynamic Loudness, Spatialization, Doppler Effect, Tempo, Duration (Rhythmic duration, Event duration, Ambient duration, Non-specified duration scale), Sequential Position, Melody Lead, Articulation, Decay Time, Melody, Harmony, Chord Progression, Spectral Duration, Reverberation Time, Performance Activity Level

Table 1.3 Psychoacoustic properties used for information presentation in prior literature, as identified by Dubus and Bresin [28]. The authors of that paper discovered that pitch was used significantly more often than the other properties.

“[P]erceived relations” in the aforementioned definition of sonification implies that the application of a particular sonification may be interpreted differently by different users. Walker and Kramer found that when mappings from some data dimension to a sound property match the listeners’ expectations, performance increases [70, 32]. For example, multiple studies have discovered that listeners may perceive a rising pitch as indicative of either a positive or a negative change in a corresponding data element. In user studies on the sonification of magnitude estimation of temperature, pressure, size, and weight, researchers found that listeners preferred a negative scale when presenting weight increases by pitch [70]. In another study, researchers created mappings to convey the blurriness of astronomical images and found that some users perceived an increase in resolution as a positive mapping and others as negative [30]. It seems, then, that allowing users to invert polarity could render a sonification more useful to a wider audience.

1.2 Related Work

There is not a great deal of research on conveying class diagrams and related graph types in alternative forms. What follows are descriptions of the works that have influenced this work the most.

1.2.1 Student Struggles with UML

As we have already noted, students with visual impairments or blindness cannot use UML or are greatly hindered in their ability to use it.
However, the use of UML provides a much better understanding of the structure of systems than simply reading the code. Notably, even students who report no visual difficulties struggle with UML concepts. Even so, learning to code with readily available software structure visualizations has been shown to lead to better learning outcomes, as has the use of a model-driven paradigm to create software [77, 76]. More formal methods, such as Model-Driven Software Development (MDSD), encourage modeling throughout the development phase [23]. Although little pedagogical research has yet been done on MDSD and student learning outcomes, it is an area that holds promise.

A recent study examined the types of errors that students make when creating UML diagrams [22]. During the course of a semester, the researchers analyzed more than 2,700 UML diagrams submitted by over 120 students and classified the types of errors the students made. In class diagrams, the researchers found that students most commonly left out class operations and their arguments, forgot to include dependencies, and left out or incorrectly typed functions. A tool with multi-modal presentation capabilities for the relationships between objects may be able to help students better understand how to model (and therefore develop) software.

Reuter et al. studied student learning through a “think-aloud” method, having students describe their mental processes while working on UML modeling problems [55]. Their results indicate that UML complexity, combined with the lack of appropriate learning scaffolds such as additional examples of UML solutions, as well as the need for additional tools to create diagrams, are the primary causes of student struggles.

1.2.2 Adapting UML for Individuals with Visual Impairments

Existing work on adapting diagrams for people with visual disabilities and blindness is primarily based on text, tactile, audio, or mixed methods.

1.2.2.1 Tactile Methods

Several authors have detailed their efforts to create touch-based UML solutions. These efforts range from laborious, human-centered methods to expensive digital mechanisms. Although digitally created tactile methods are likely to be useful once the assistive technologies that could support them become cheaper and more prevalent, they currently suffer from the need for expensive, hard-to-find hardware.

A 2006 paper detailed the method of one professor in conveying class diagrams with note cards, cut plastic strips, and pushpins for a student with visual impairments [17]. The professor adapted the diagramming tasks by creating the same diagrams with the aforementioned supplies. Although a rigorous study was not performed, anecdotally, the professor noted that the errors the student made in an individual diagramming task were the same as those of peer students. In addition, the professor noted that the student found the system more useful than the audio-based solutions he had used before. However, the work did not detail how different types of relationships might be conveyed using this mechanism.

The 3D Systems Touch (formerly called the Phantom Omni)7 devices purport to recreate digital objects so that users can touch them as if they were real (Figure 1.5). Eid et al. studied the use of a software program for creating UML diagrams, called UML CASE, both with a mouse and keyboard and with a Phantom Omni device [29]. The system gave users the perception of holding a class; the more members a class contained, the heavier the class would feel.

7 https://www.3dsystems.com/haptics
Diagrams could be edited by grabbing classes and dragging them around the diagram. Force responses would indicate when lines crossed or classes collided with one another, and an elastic effect was portrayed when a relationship between classes existed.

Figure 1.5 3D Systems Touch devices create the impression that users can touch a digital object. Such a device was used in the UML CASE tool [29].

The researchers found that users took more time to build a diagram with the haptic device compared to the mouse and keyboard setup (5.2 minutes average time compared to 4.1 minutes). However, users overwhelmingly reported that the haptic mechanism improved the interaction. Unfortunately, the haptic devices used in this study are expensive and rare. A mechanism that uses a more ubiquitous technology is needed.

Refreshable Braille displays employ a grid of pins that may be raised or lowered to recreate Braille lettering and other shapes and symbols. Loitsch and Weber used this type of display to recreate UML diagrams created with the Microsoft Visio tool [41]. Unlike the previously cited works, Loitsch and Weber conducted a formal user study of their work. They recruited 7 computer scientists with visual impairments and asked them to use the tool for two different tests. In the first test, users were asked to examine the diagrams and then answer questions about the number and types of objects in the diagrams. In the second test, users were asked to examine the diagrams and answer questions about diagram structures. A NASA Task Load Index (TLX)8 was performed to measure the cognitive load of using the tool. Although results were generally good, the tool suffers from the need for expensive, non-ubiquitous Braille displays.

8 https://human-factors.arc.nasa.gov/groups/TLX/index.php

Figure 1.6 An example of tactile UML. The display medium is a refreshable Braille display comprised of 120 columns and 60 rows of pins. Image reproduced from [41].

Owen, Coburn, and Castor used similar mechanisms in their 2014 work [48]. In this experience report, the authors detail a method of representing class diagrams via Post-it notes representing classes, pinned to a corkboard. The notes were embossed with Braille to provide information about the classes and the class members. Wire or rubber bands were used to connect the classes. Foam shapes readily available in craft stores were used to represent different types of relationships between classes and multiplicity (Figure 1.7). For example, a foam triangle indicates that the relationship is a generalization.

Figure 1.7 Tactile UML of class diagrams by Owen, Coburn, and Castor [48]. The image on the left is the corkboard “scene”, populated with embossed Post-it notes, while the image on the right is an example of the foam pieces used to represent relationship type and multiplicity.

Although this work shows promise, in that it is both flexible and provides complete artifacts (i.e., unlike the Brookshire method, it can convey relationship types and member information), it requires too much time to create a diagram. In the classroom, it is not unreasonable to expect that multiple diagrams may be needed for a single lecture, which would add considerable preparation time for the lecturer.

Doherty and Cheng developed a proof-of-concept mechanism for 3D printing tactile representations of UML diagrams, called PRISCA [27]. This system parsed diagrams from the Visual Paradigm software package.
It used open-source software to create 3D CAD models that were then printed on a (relatively) inexpensive 3D printer (Figure 1.8). The major benefit of their mechanism is that diagrams do not need to be adapted manually for end users, as a diagram is printed exactly as drawn. This aids cooperation between coders who do not have visual impairments and those who do, as the same diagrams may be used for both populations. Furthermore, the system can produce many types of UML diagrams.

Figure 1.8 Illustration of the PRISCA system from [27]. The diagram is recreated with graph elements raised, providing a tactile interface for discovering elements and the relationships between them. Figure reproduced from the original work.

However, their mechanism has multiple downsides. First, even though the cost of 3D printers is decreasing, they are still somewhat expensive (Doherty and Cheng mentioned that their model cost $3,000), and they can be complex to use. Furthermore, the time to print a diagram was approximately 1.5 - 2 hours, making this method unsuitable for any rapid or agile design. Although this mechanism may be useful for conveying already designed systems, it precludes the inclusion of persons with visual disabilities in the design process.

1.2.2.2 Textual Methods

Some works have focused on creating text-based UML diagrams. The reasons for text-based UML are varied and often are not about increasing access to diagrams for persons with disabilities, but rather about facilitating communication on message boards, wikis, and through email [71, 51]. However, a text-based diagram may be more accessible, as it is more likely to include usable information for screen readers.

Washizaki et al. developed a text-based UML class diagram notation to convey these diagrams in domains that exclude graphical communications [71]. In their paper, the authors note that their system is beneficial for sharing models on mailing lists and over email. The character set was a subset of the ASCII standard9. Lines were drawn using hyphen, pipe, and colon characters, the selection of which conveyed the type of relationship. Unlike traditional class diagrams, the TCD method separates the class definitions (which include a list of the variables and methods of a class) from the relationship view (see Figure 1.9). This design choice limits the amount of information that a reader must consume at once, which could make this mechanism more useful with assistive technologies such as screen readers. However, this solution does not scale well and still uses visual mechanics such as lines to represent relationships. A large diagram, or one in which relationship lines cross, would be no more consumable to assistive technologies than a graphical diagram.

9 https://webstore.ansi.org/standards/incits/ansiincits1986r2002

Figure 1.9 A class diagram (left) represented using the Text-based Class Diagram (TCD) method (right). An important feature of TCD is the separation of a class definition from the relationship hierarchy between classes. Figure from the original work [71].

PlantUML10 is a robust, succinct, text-based mechanism for conveying a large variety of UML diagram types (we will focus here only on class diagrams). The designers of this system created it to be embeddable within source code, and a Java-based tool is provided to parse the UML definitions from Java source code. The diagram specifications must be enclosed within the tags @startuml and @enduml.

10 https://plantuml.com/
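By way of illustration, a minimal PlantUML specification using the syntax described next might look like the following (the Student and ComputingStudent classes echo the example discussed below; the member names are invented for this sketch):

@startuml
class Student {
  name
  enroll()
}
class ComputingStudent
Student <|-- ComputingStudent
@enduml

Compiling this text with the PlantUML tool yields a diagram in which ComputingStudent inherits from Student, with the <|-- notation rendered as a solid line ending in a closed, unfilled arrowhead.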
Classes are represented by simply typing the class keyword followed by the class name. Symbols may be added to a class in one of two ways: either by entering the class name followed by a colon and the symbol name, or by enclosing the symbol names within curly brackets following the class declaration (Figure 1.10). Methods are specified by the symbol name followed by a set of parentheses. A variety of arrow types can be produced through combinations of line types and arrowheads, all conveyed with text. For instance, <|-- between Student and ComputingStudent represents a solid line with a closed but unfilled arrowhead pointing from the second class to the first. The syntax of the mechanism provides facilities to represent access modifiers (public, private, protected, and default), stereotypes, notes, and other UML elements. Graphical diagrams can be automatically compiled from the textual representations using the Java-based tool provided by the PlantUML authors. PlantUML has been used successfully in an educational setting by a student with a vision impairment to create diagrams [19]. However, the tool does not provide any facilities for exploring existing diagrams.

Figure 1.10 Two examples of a PlantUML class specification; the first uses the colon syntax, while the second uses curly braces to denote ownership of symbols.

Earl Grey is another text-based UML language, similar to PlantUML [43]. Earl Grey differs from PlantUML in that it is designed to break completely from graphical methods of element and relationship conveyance. Although PlantUML is text-based, it relies on the use of symbols constructed from the ASCII character set to mirror visual UML elements such as solid and dashed lines and a variety of arrow types. The Earl Grey designers believe that such mirroring could lead to confusion for diagram consumers, and instead opted to use words to express these concepts. For example, they rely on the specific use of the terms “aggregation” and “association” to denote association types and use the term “isA” to convey generalizations. Classes are defined with their name and symbols within the class and end keywords (Figure 1.11). Although Earl Grey is designed for educational use, even a diagram with relatively few relationships could result in a great deal of text that must be consumed. As no tool for searching or exploring a diagram written in this language is provided, it is unlikely to be very useful in all but the simplest of scenarios.

Figure 1.11 The Earl Grey text-based class diagram notation. Generalizations are indicated by the “isA” keyword, while other relationship types are denoted with the relationship type, the starting and ending classes, and the “end” keyword. Multiplicity may be added to the starting and ending class entries.

1.2.2.3 Audio Methods

Metatla et al. noted that UML diagrams are hierarchical structures that present the same information in several different ways [44]. For example, they noted that for a diagram consisting of classes A and B with an association between them, a spoken representation of that information might be from the perspective of A or B (i.e., “A holds an instance of B”), or it could be from the perspective of the relationship (i.e., “an association exists between the two classes”). They designed an auditory display for class diagrams that emphasized the various perspectives. Their design had menus for objects, associations, and generalizations (they did not focus on other relationship types).
The objects menu contained a list of all classes in the diagram, while the associations and generalizations menus contained lists of those relationship types and the classes that comprised them. The designers presented information in two ways: a “verbose” mode that used text-to-speech descriptions, and a “terse” mode that used some text-to-speech combined with non-speech audio. Non-speech audio was used to represent a relationship, with different timbres used to represent different types of relationships. The stimulus consisted of one short tone and one long tone, or one long tone followed by one short tone. Comparing the tones with the visual element of a line ending with an arrowhead in classical UML, the short tone was representative of the arrowhead, while the long tone represented the line portion. If we had two classes, A and B, and the direction of the relationship was from B to A, the designers would reflect that with the class name “A”, a short tone (an arrowhead pointing at A), a long tone (representing the connecting line), and the class name “B”. If the direction were A to B, the short and long tones would be presented in reverse order. In addition, they used amplitude modulation to create what they called a “coming into” or “coming from” effect.

An experiment was performed to measure the effectiveness of both the verbose and terse strategies. Participants were asked to answer questions about several diagrams. Ten trials were conducted using the verbose mode, and an additional ten for the terse mode. The participants scored well, with a mean score of µ = 96 (σ = 6.08) out of 100. A statistically significant difference was not found between the completion times for the two presentation modes; this is an important result, showing that non-speech audio can effectively replace spoken descriptions of relationships in these diagrams. This is reinforced by the work we have since performed [75], as well as by the work presented later in this document.

Unfortunately, the work by Metatla did not include specific information on the audio used. They note the use of “different timbres”, as well as amplitude modulation, but do not provide any more information than that. Furthermore, the inability to present a holistic view of any part of the diagram at once is cumbersome. The appeal of visualizations such as UML is that the consumer can focus their attention on the macro view to notice patterns and trends, and then “drill down” to specific parts of the diagram for further analysis. By including only three hierarchies, each of which provides merely a focused view of a particular type of element or relationship, a consumer is forced to jump among the various menus to develop a complete schema of any single element. A better mechanism would incorporate the fine-grained view that Metatla’s work provides with an additional macro-level view or views, as well as a mechanism to search for elements and quickly navigate between them.

1.2.3 Mixed-mode Methods

A method for presenting node-link graphs, dubbed “TADA”, was published in 2024. This software relies on touch and speech for input and produces speech and musical notes to convey information about node-link diagrams (of which UML class diagrams can be considered a subset). The software was made to work with commodity tablets, without additional special hardware. The sounds are played as the user moves their hand across the tablet surface, and the system provides seven different interaction techniques.
A 150 Hz pure tone indicates to the user that their hand is near a diagram element (a node or a link). The software represents a node in the graph with a French horn sound. The number of connections the node has influences the pitch of the French horn: the more connections the node has, the higher the pitch. A plucked guitar sound represents links between nodes; the shorter the connection, the higher the pitch.

An experiment was carried out with 25 participants (14 female, 10 male, 1 transgender), all of whom self-reported as legally blind. Of this group, 20 were unable to visually perceive any part of the diagrams on the tablets, and five were able to visually perceive only varying amounts. Participants completed six blocks of activities. In each block, a training phase was completed before a testing phase. Participants were asked to answer questions about the diagrams presented to them. Participants were able to accurately answer questions about the diagrams, though nearly a fourth of them needed assistance with more than one of the questions. The study designers administered the NASA-TLX task load index to measure the perceived workload of participants using the system and found that for the majority of the participants, the perceived workload was low. However, for some of the participants, the physical and mental demands of producing some of the interaction techniques were high. Furthermore, the system suffers from the design choices of a fixed-size graph and the inability to pan or zoom, making it usable only for the simplest graphs.

1.2.4 Identifying Student Struggles with UML

In order to better understand the user interface requirements for conveying class diagrams, we first consulted the literature to determine the issues that cause students to struggle with class diagrams. We felt that issues that cause a student to struggle could potentially be exacerbated by presentation via audio. Although the literature was helpful (and is summarized in Section 1.2), we discovered a knowledge deficit. Studies often examined the results of student work, but we did not find publications asking students to relay their perceptions of why they struggle.

Though it is often maligned, the Object-Oriented (OO) programming paradigm remains in very high use in industry and is taught in many undergraduate computing courses. Although exact numbers describing use of the paradigm do not exist, we do know that languages that support OO are consistently ranked highly in lists of the most used programming languages. At the time of writing, both the IEEE Spectrum 2023 Top Programming Languages list and the PYPL Popularity of Programming Language list show that their six highest-ranked languages support OO, and one language (Java, the second highest rated in both) requires the OO paradigm [20, 8]. The TIOBE index provides similar results, with seven of the top eight languages supporting OO [9]. However, undergraduate struggles with learning OO concepts are long-standing and well documented [76].

Although there are other ways of visually modeling software, such as Entity-Relationship Diagrams (ERDs), Class-Responsibility-Collaboration (CRC) cards, and even informal drawings, class diagrams are the commonly accepted standard for modeling software across most undergraduate curricula. Unfortunately, undergraduate students continue to struggle with learning and using UML; a review of the literature by Muñoz et al.
identified four areas in which students struggle with learning UML [47]. In that work, they note that inexperience in cognitive processing, UML complexity, abstract conceptualization, and lack of feedback are the primary problems. In this work, we posit that students may additionally be experiencing a lack of "buy-in" due to inconsistent UML use across undergraduate courses, and a misconception by students that, because UML is not used much in industry, it holds little value. Previous work has shown that UML is not as widely accepted in industry as it is in academia, with multiple surveys of industry professionals reporting that more than 70% of respondents do not use UML in practice [38, 53]. In one work, frequently cited reasons for the lack of industry use included the complexity of the language and the perception that it must be used consistently in order to enjoy its benefits [52].

In this work, we survey students' experiences and perceptions of learning with UML class diagrams. The results do not indicate demographic differences in students' perceptions of their abilities, nor a significant correlation between the amount of UML instruction students reported receiving and their feelings of competency. Students ranked more consistent use of UML throughout the undergraduate curriculum as the primary strategy needed to become more comfortable with class diagrams.

1.2.5 Study Design

The research team on this project has been based at Michigan State University (MSU) and Grand Valley State University (GVSU). MSU is classified as R1, or a doctoral university with very high research activity. GVSU is classified as M1, or a large master's university. Both universities' programs in Computer Science (the authors' field of study) are accredited through the Accreditation Board for Engineering and Technology, Inc. (ABET). We created a set of survey questions common to both universities and then created specific versions for each university. This allowed the researchers to compare the two populations to see if specific trends existed across both groups, and to see which specific classes at each university were using UML class diagrams.

1.2.6 Research Questions

We identified the following research questions:

RQ1: Are there significant differences between the two universities that affect student comfort with UML?

RQ2: Are there demographic differences between students that affect student comfort with UML?

RQ3: Do students report struggling more with using UML to design a software solution, or with implementing a coding solution from UML?

RQ4: Which factors do students indicate would be most helpful in becoming proficient with UML?

1.2.6.1 Population

The surveys were sent by email to undergraduates from both universities. There were no benefits or consequences for students taking the surveys. Between the two locations, 105 students responded. Of those 105, n = 67 completed all survey questions (n = 38 for MSU, n = 29 for GVSU). Of the 67, 89.6% (n = 60) reported an age in the range of 16-24, 9% (n = 6) in the range of 24-34, and 1.4% (n = 1) in the range of 35-44. Class standing was also examined. The respondents were mostly upper-level students (over 85.0%), with seniors making up 55.2% (n = 37) of the respondents, followed by juniors, who made up 29.9% (n = 20). Freshman respondents comprised 6.0% (n = 4) and sophomores comprised 3.0% (n = 2).
The remaining 6.0% (n = 4) of responses indicated graduate standing; this is because GVSU offers an accelerated five-year program for a combined bachelor's and master's degree.

1.2.6.2 Data Collection

We used the Qualtrics Flexible Survey Tool (Qualtrics, Provo, UT) for data collection. Demographic data were collected under the human-subjects protocols of both universities with the approval of both Institutional Review Boards. Data were downloaded from Qualtrics as comma-separated value files and combined into a single file for analysis. Python was used to further clean the data: selecting only the completed surveys from the original 105, pre-processing and label-encoding the responses, and performing all subsequent data processing.

Students were asked to rate their level of comfort with both designing a software solution with UML and implementing a software solution from an existing UML diagram. A five-point Likert scale was used to collect responses, with possible values being Extremely Uncomfortable, Somewhat Uncomfortable, Neither Comfortable nor Uncomfortable, Somewhat Comfortable, and Extremely Comfortable. The students were then given a list of seven strategies that could improve their comfort level with UML and were asked to rank them in order of usefulness. Each strategy required a response, and no two strategies could be assigned the same value. The students were also given an open option to suggest their own strategies.

1.2.7 Results

1.2.7.1 RQ1: Are there significant differences between the two universities that affect student comfort with UML?

The goal of this work is to determine whether there are changes to how class diagrams are taught that undergraduates feel would benefit their understanding of, and comfort with, designing and implementing software with them. We first needed to ensure that differences in the way UML was taught at each university did not have a significant effect on the comfort levels reported by the students. We identified several areas in which GVSU and MSU may have taught UML differently and analyzed whether the differences led to statistically significant differences in the reported levels of comfort with these diagrams.

First, we asked students to report the approximate number of hours of instruction they received explicitly in learning UML Class Diagrams. The choices were <1, 2-5, 5-7, 7-10, or 10+ hours of instruction. The number of responses for each selection and each university was calculated, and a Chi-square test was performed to determine the likelihood of differences between the two groups. The resulting value, p = 0.06, is greater than the conventional α = 0.05 threshold, so the differences between the two groups cannot be considered significant, particularly with a single sample from each institution. Furthermore, a Cramér's V test found only small effect sizes between time of instruction and reported comfort levels (ϕc = 0.294 for comfort designing solutions with class diagrams and ϕc = 0.298 for comfort implementing solutions from them). In the best case, there seems to exist only a weak correlation between the reported amount of time students receive
instruction and student understanding of class diagrams and the concepts they represent.

Figure 1.12 Students were asked to approximate the range of the number of hours of class diagram instruction they had received. This chart shows the percentage of students that responded with each range.

Students were asked to specify in which courses they received explicit UML class diagram instruction. The courses were then categorized and grouped by topic. Differences in teaching styles and instructor topic preferences meant that many different courses were reported, and we did not find it meaningful to include outlier class types. As a comprehensive analysis broken down by instructor is far beyond the scope of this paper, we decided to accept a response if the same class type was indicated as a source of class diagram instruction by more than 3 of the respondents from a given university. The results indicate that class diagram instruction occurs primarily in four course types: Introductory Programming (CS1 and CS2), Systems Analysis and Design, Database, and Software Engineering courses (Table 1.4).

Course Topic                               Number of Responses (%)
Software Engineering                       53 (79.1%)
Introductory Programming (CS1 or CS2)      23 (34.3%)
Database                                   17 (25.4%)
Systems Analysis and Design                6 (9.0%)

Table 1.4 Course types identified by students as providing explicit UML class diagram instruction.

The students were then asked to report the percentage of classes in their major in which they used class diagrams. This was not to measure instruction time in class diagrams, but rather to determine how many courses reinforced their importance by having students use them to design solutions. Although class diagrams are likely not useful for all courses (we have a hard time thinking of a use for them in a low-level course on architecture and hardware, for instance), there are many software engineering and computer science classes that require programming assignments that could be modeled with class diagrams. However, the overwhelming majority of the two groups of students reported that < 25% of their major classes used class diagrams: 82.8% (n = 24) for GVSU and 94.7% (n = 36) for MSU (Figure 1.13).

Figure 1.13 The percentage of courses students reported that use class diagrams. Over 90% of respondents state that less than a quarter of their computing courses use class diagrams.

We then asked the students to identify the specific courses in which they used UML. Again, we collected and coded the courses by topic and calculated the aggregate results. Course topics that had fewer than 3 respondents were again dropped as outliers. The responses to this question were nearly identical to those identifying courses that explicitly taught UML, with one additional topic: Mobile Application Development (Table 1.5).

Course Topic                               Number of Responses    Percentage of Respondents
Software Engineering                       55                     79.1%
Introductory Programming (CS1 or CS2)      19                     34.3%
Database                                   13                     25.4%
Systems Analysis and Design                6                      9.0%
Mobile Application Development             6                      9.0%

Table 1.5 Course types identified by students as requiring the use of UML class diagrams.

1.2.7.2 RQ2: Are there demographic differences between students that affect student comfort with UML?

We created a contingency table using the Cramér's V test on the demographic fields of gender, age, ethnicity, primary language, and the comfort levels reported for both the design and implementation of UML solutions (Table 1.6).

Table 1.6 Contingency table of Cramér's V effect sizes between demographic variables. No two demographic variables were highly associated with student learning outcomes.
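Cramér's V (the statistic reported as ϕc above) is computed from the chi-square statistic of an r × c contingency table with n total observations: V = sqrt(χ² / (n · (min(r, c) − 1))). As a concrete reference, the following is a minimal sketch of that computation (the survey analysis itself was performed in Python; this Java version, with hypothetical counts, is purely illustrative):

    // Cramér's V for an r x c contingency table:
    // V = sqrt(chi2 / (n * (min(r, c) - 1))).
    public final class CramersV {
        public static double of(long[][] table) {
            int rows = table.length, cols = table[0].length;
            double n = 0;
            double[] rowSum = new double[rows], colSum = new double[cols];
            for (int i = 0; i < rows; i++)
                for (int j = 0; j < cols; j++) {
                    rowSum[i] += table[i][j];
                    colSum[j] += table[i][j];
                    n += table[i][j];
                }
            double chi2 = 0;
            for (int i = 0; i < rows; i++)
                for (int j = 0; j < cols; j++) {
                    double expected = rowSum[i] * colSum[j] / n; // expected count under independence
                    double diff = table[i][j] - expected;
                    chi2 += diff * diff / expected;
                }
            return Math.sqrt(chi2 / (n * (Math.min(rows, cols) - 1)));
        }

        public static void main(String[] args) {
            // Hypothetical 2 x 5 table: two universities by five Likert comfort levels.
            long[][] counts = { { 3, 5, 6, 15, 9 }, { 2, 6, 5, 10, 6 } };
            System.out.printf("Cramér's V = %.3f%n", CramersV.of(counts));
        }
    }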
For gender, 73.1% (n = 49) reported identifying as male, 20.9% (n = 14) as female, 4.5% (n = 3) preferred not to answer, and 1.5% (n = 1) reported identifying as nonbinary. Responses to questions about ethnicity found 73.1% (n = 49) identified as White/Caucasian, 10.4% (n = 7) as Asian - Eastern, 6% (n = 4) as Asian - Indian, 3% (n = 2) as African-American, 3% (n = 2) as Hispanic, 3% (n = 2) as Mixed race, and 1.5% (n = 1) as Middle Eastern.

Both universities attract a large number of international students. To ensure that language barriers were not the primary cause of student UML struggles, we asked students to select their primary language. Most of the students (95.5%, n = 64) listed English as their primary language. The other languages listed were Arabic, Mandarin, and Marathi, with one respondent (1.5%) each.

Students were also asked to self-report whether they considered themselves to have a disability and, if so, what type of disability, as the researchers wanted to ensure that visual or hearing impairments were not a major factor in student struggles. Of the 67 students, 70% (n = 47) reported no disability, 20.9% (n = 14) reported some type of disability, 7.5% (n = 5) responded with "Maybe", and 1.5% (n = 1) preferred not to say.

There were no demographic variables strongly associated with higher reported levels of comfort in using UML for software design or implementation.

1.2.7.3 RQ3: Do students report struggling more with the use of UML to design a software solution or with implementing a solution using UML?

When asked to choose their level of comfort in designing an acceptable solution to a software problem using UML, 37.3% (n = 25) of students noted that they were "Somewhat comfortable". An additional 16.4% (n = 11) replied that they were "Extremely comfortable". The remaining students reported "Neither comfortable nor uncomfortable" at a rate of 16.4% (n = 11), "Somewhat uncomfortable" at 16.4% (n = 11), and "Extremely uncomfortable" at 7.5% (n = 5) (Figure 1.14).

Similarly, when ranking their comfort with implementing a software solution from a UML diagram, 44.8% (n = 30) of the students were "Somewhat comfortable", and 10.4% (n = 7) were "Extremely comfortable". The remainder of the students reported more negative results, with 25.4% (n = 17) responding "Somewhat uncomfortable", 10.4% (n = 7) "Extremely uncomfortable", and 9.0% (n = 6) indicating that they were "Neither comfortable nor uncomfortable" with this topic (Figure 1.15).

Figure 1.14 The levels of comfort students reported with using class diagrams to design a software solution.

Figure 1.15 The levels of comfort students reported with using a given class diagram to implement a software solution.

A Cramér's V test reported a moderate association between comfort designing a solution and comfort implementing one. The numbers of responses were similar for the "Somewhat comfortable" and "Extremely comfortable" categories, with 53.7% reporting some level of comfort in designing a solution and 55.2% reporting at least some level of comfort in implementing one. Although it is not surprising that those two categories are correlated, these findings indicate that nearly half of the students do not feel comfortable using class diagrams and that many feel uncomfortable performing these tasks. Furthermore, 74.6% (n = 50) of the students responded that they do not use UML when it is not assigned.

1.2.7.4 RQ4: Which factors do students indicate would be most helpful in becoming proficient with UML?
We asked students to rank the following strategies to improve student comfort with UML class diagrams:

1. More time spent on the process
2. Better UML resources for reference
3. More/Better design examples
4. More/Better implementation examples
5. Multiple modalities for exploring diagrams, such as haptic or audio diagrams
6. Better creation/viewing software for diagrams
7. Better consistency in using UML in courses

Respondents were asked to rank each of these strategies according to their perceived usefulness in increasing comfort with class diagrams. The students labeled each strategy with a value between 1 (most helpful) and 7 (least helpful). A weighted average was calculated for each strategy from the number of responses and the values assigned by the students (Table 1.7).

Strategy                                                                 Weighted Average Rank
Better consistency in using class diagrams throughout the curriculum    11.250
More time spent learning class diagrams                                 10.179
Better design examples                                                  10.071
Better implementation examples                                          9.857
Better class diagram references                                         9.426
Better diagramming software                                             9.392
Multiple modalities for exploring diagrams                              8.250

Table 1.7 Weighted average values of the ranking of strategies to improve comfort with class diagrams.

The highest-ranked strategy was "Better consistency in using class diagrams across the curriculum". The students also indicated a preference for "More time spent learning class diagrams", "Better design examples", and "Better implementation examples". When given the option to provide an open response, only three students did so. Their suggestions were "Lots more practice such as a program that grades UML you create after giving you example questions", "[Course] is a lot of work and because of that I didn't do the best. With that said, I still feel as though more UML forced practice would be great," and "Using in more courses. I use UML for personal projects but never in class". These statements reinforce the finding that students feel more consistency and interaction with class diagrams would be helpful.

1.2.8 Discussion

The responses to the surveys at both universities indicate that students are receiving similar levels of instruction in both designing and implementing solutions. Both groups report similar numbers of hours of explicit instruction and similar percentages of classes in the major that use class diagrams. However, this percentage is small. We were surprised to find that the same courses that students identified as teaching UML class diagrams were the only ones that used them. Perhaps this is to be expected, but it almost certainly reinforces to our students that modeling software through the use of these diagrams lacks value. Most computer science and software engineering courses probably require students to write software, but according to these results, more than 75% of the courses do not ask students to model their thinking and design process with the very tool we teach them for that purpose.

Furthermore, both GVSU and MSU require a collaborative culminating experience course in which students are expected to develop a comprehensive solution to a software problem, yet not a single student in our sample identified that course as using class diagrams. While it may be true that professional software developers don't need to model software and can instead rely on more agile development processes, we must keep in mind that students are still developing their skills and mental models, and thus need this extra reinforcement.
In addition, student responses identify the desire for more consistent use of class diagrams throughout the curriculum as the most helpful strategy to improve their level of comfort with them.

As Large Language Models (LLMs) such as ChatGPT11 and GitHub Copilot12 become increasingly capable of writing efficient code, it becomes ever more important that next-generation engineers are able to deconstruct and understand larger problems. Modeling throughout the undergraduate curriculum - and throughout an engineer's career - holds the potential for helping students architect solutions to increasingly complicated problems. The ability to use those models, to update and modify them, and to then generate code from the formal specifications has the potential to create high-quality software with less human work.

11 https://chat.openai.com/
12 https://github.com/features/copilot

CHAPTER 2

DETERMINING USER PERCEPTION OF AUDIO STIMULI AS GEOMETRIC SHAPES

We began this work with the desire to convey different shape types with audio. We weren't specifically focused on class diagrams at this point; we had hoped to create a mechanism that would be usable for many different types of node-edge graphs, flowcharts, etc., reasoning that most visual graphs rely upon the mechanism of shape to present different types of entities. It was theorized that we could represent different shapes with spatial audio by sequentially playing the spatialized points that made up the shapes.

2.0.1 Methodology

We began our work by determining the efficacy of conveying simple (six sides or fewer), closed, convex shapes with sound. This work, summarized here, was published in [73] and expanded in [74]. We chose closed, convex shapes, as many graphical diagrams use such simple mechanisms. We developed software that maps audio to an 18.1 cm square area. The math for the mapping made use of stereo panning and pitch manipulation using these functions:

    Amplitude_left(x)  = 1.0 - x/660        (2.1)
    Amplitude_right(x) = x/660              (2.2)
    Pitch(y)           = y + 220            (2.3)

Here x and y are coordinates in the presentation plane; the two amplitude functions produce normalized volume values for the left and right channels, and the pitch is given in Hz. According to existing work, human hearing can discriminate between two frequencies below 4 kHz with precision down to 2 Hz [57]. However, above 4-5 kHz, our hearing is much less precise [13]. We selected our range based on these facts, combined with the desire to keep our participants as comfortable as possible during the testing; we felt that higher-frequency sounds could become irritating over long testing sessions.

We developed a Java application for presenting shapes through sound and for collecting data. The participants were shown 10 different shapes, each made up of six or fewer line segments representing convex polygons. A convex polygon is a shape in which any line drawn between two points within the shape remains entirely inside the polygon. A Planar PCT2485 touch screen display was used to present the two-dimensional shapes, with a display area of 18.1 cm by 18.1 cm. Participants could use a Korg nanoPAD2 MIDI device to control the playback of the shapes' sonifications, including play/pause, speed, and the direction of playback (either clockwise or counterclockwise). The audio output was generated by a computer and routed through a Behringer HA4700 multichannel headphone amplifier, allowing each participant to individually adjust the volume. The researchers monitored the audio to ensure proper system functionality, and the participants listened using Sony Professional MDR-7506 stereo headphones.
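To make Equations 2.1-2.3 concrete, the following minimal sketch (with hypothetical names; the original application's source is not reproduced here) maps an (x, y) position on the plane to normalized channel gains and a frequency:

    // Sketch of the position-to-audio mapping of Equations 2.1-2.3.
    // x and y are coordinates in [0, 660] on the presentation plane.
    public final class PositionSonifier {
        static final double PLANE = 660.0;

        // Equation 2.1: the left-channel gain falls as the point moves right.
        static double amplitudeLeft(double x)  { return 1.0 - x / PLANE; }

        // Equation 2.2: the right-channel gain grows as the point moves right.
        static double amplitudeRight(double x) { return x / PLANE; }

        // Equation 2.3: vertical position maps to frequency, offset by 220 Hz.
        static double pitchHz(double y)        { return y + 220.0; }

        public static void main(String[] args) {
            double x = 165.0, y = 330.0; // an example point in the plane
            System.out.printf("L = %.2f, R = %.2f, f = %.0f Hz%n",
                    amplitudeLeft(x), amplitudeRight(x), pitchHz(y));
            // Prints L = 0.75, R = 0.25, f = 550 Hz: the point sounds
            // left of center at a mid-range pitch.
        }
    }

Note that the resulting frequency range, 220 Hz at the bottom of the plane to 880 Hz at the top, sits comfortably below the 4 kHz discrimination limit cited above.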
Participants were allowed as much time as they needed to practice and familiarize themselves with the setup. On-screen buttons were provided to play the audio for the center coordinates, values along the edges of the plane, and random points. During practice, participants could play a random point, select the perceived location on the touchscreen, and then compare it with the actual position. They could also play sounds for the sample shapes and view the corresponding visual output. When participants felt ready, they informed the researcher, who would then begin the presentation of the study shapes.

Each participant was first presented with five test shapes, all composed of line segments. A small click was played at the end of each line segment to indicate a change in direction. Participants could replay the shape as many times as they wanted, choose to play it clockwise or counterclockwise, adjust the speed of playback, and control the volume. Once confident that they understood the shape, participants would attempt to draw it on the basis of their perception, after which they were shown the correct shape. After five trials, the researcher informed them that the correct shapes would no longer be shown for the remaining 10 tests. After these final attempts, the session ended without revealing the shapes, to prevent bias or prior knowledge in future participants.

The shapes included equilateral, isosceles, and scalene triangles, squares, rectangles, diamonds, and randomly generated polygons composed of four to six line segments.

2.0.2 Results of the Shape Drawing Study

We recruited 66 undergraduate students (16 female, 50 male) with an average age of µ = 21.05 years (σ = 2.8 years), all of whom reported normal hearing, and carried out our experiment. The results were poor. Five participants were unable to finish the task as they found it too complex. Although we explicitly told each participant that all shapes were closed and convex, showed them samples, and allowed them to practice before testing them, most of the responses did not meet the criteria. Instead, participants frequently created drawings with unconnected line segments (Figure 2.1).

Figure 2.1 Sample result from our study on conveying simple shapes with audio. Taken from [74].

Examining the pictures participants drew and conversing with the participants after testing led us to conclude that though the individual mappings were simple and easy to understand, the combination of the two led to imprecision. However, the researchers were able to use the system with decent accuracy, so we were not yet ready to abandon the idea.

Before concluding that the system could not work, we identified a possible lack of motivation as an influencing factor, as the task itself was not very interesting or exciting. We decided to repeat the study, but in a context that would be more fun and motivating for the participants. To do this, we developed a video game.

Figure 2.2 Sample scene from the video game created for our study. In this image, the user has completed a spatial audio positioning task that unlocked the door.

2.0.3 Trying Again, with Increased Motivation

We were not ready to give up on the use of spatial audio to convey points in space. We felt that we were able to perform the tasks relatively well ourselves and that perhaps lack of motivation or interest in the task was the problem. We hypothesized that a similar task in the context of a video game might yield better results.
2.0.4 Methodology

We decided on a game where the player docks with and is trapped on an alien spacecraft. The premise of the game was that the aliens who created the ship did not use vision; instead, all controls are audible. The player's task was to explore the ship to find pieces to fix a broken hangar door that they needed to open to escape. Most of the doors of the ship were locked, but could be opened by touching them in the locations communicated using spatial audio signals (Figure 2.3).

Figure 2.3 When players encountered a door, the game scene switched to a mini-game that conveyed points in a plane via spatialized sounds. Players clicked where they perceived the sounds to be on the plane. If they were precise enough, the door would unlock.

We acquired and created 3D assets for the spaceship and built a prototype using the Babylon.js1 web game engine. The Howler.js2 spatial audio library was used to spatialize our sounds. The stimulus used to convey the positions was a free 440 Hz sine waveform found on Freesound.org3. The waveform was pitch-modulated using Equation 2.3 and panned by Howler.js's equal-power panning function. We decided on this panning function, as opposed to our original functions (Equations 2.1 and 2.2), in the hopes that it might lead to better results.

1 https://www.babylonjs.com/
2 https://howlerjs.com/
3 https://www.freesound.org

The researchers were able to play the game with reasonable accuracy, so we began preliminary testing with students. Unfortunately, even with the addition of the game context, the results were poor. We began preliminary trials requiring users to be accurate to within 100 pixels of all conveyed points in order to unlock a door. This tolerance proved far too strict, so we relaxed the task to a 250 pixel tolerance. Again, this was not a large enough range.

After ten preliminary trials, the results were so disappointing that we ended the experiments. The players who tried the game reported that it was frustrating and confusing trying to unlock the doors. We considered modifying the game to automatically open the doors after three failed attempts but instead decided to rethink our premise. At this point, it was apparent that lack of motivation was not the primary issue preventing users from accurately selecting the points. We decided to change our research question; instead of "Can we convey points in space using spatialized sound?", we set out to determine "How precisely can the average person select a position conveyed with spatialized audio?"

CHAPTER 3

DETERMINING LISTENER PRECISION

Our previous work showed us the need to determine the precision with which the average listener can select a point conveyed using spatial audio. An experiment was devised to help make that determination.

3.0.1 Methodology

Though many visualizations rely upon the mechanism of shape, there are other mechanisms we could use with sound to convey the same information that shapes denote in visualizations. For instance, a particular shape type in a flowchart or UML diagram could be represented by a particular musical instrument in a sonification, as the shape itself is not important, but rather what it represents. Therefore, we reasoned that it would be more useful to determine whether any perceptual commonalities existed between listeners using a system that conveys elements' positions via the mechanism we used in our first study.
To that end, we revised our software from the shape presentation study to audibly present a point to a listener and then wait for the participant to touch the perceived position of that point on a touch screen monitor. The details of our setup can be found in [73, 74]. We conducted two studies with this setup; the first comprised single-point tests, while the second presented two points. Participants could replay the position as many times as needed and were allowed to replay audio stimuli representing the boundaries of the plane. They were then able to select the position that they believed corresponded to the audio stimuli. The participants were shown the correct position after each trial.

Previously, we had relied on pure tones as audio stimuli in our work. Pure tones were adequate for our previous work, yet we felt it would be too confusing to convey multiple relationship types in one audio scene. We settled on the use of different timbres to convey the various types of relationship. Studies have shown that timbre discrimination in humans is very robust, and sounds as short as a few milliseconds can be reliably identified [64]. After listening to a variety of audio samples, we settled on two that we felt were easily differentiable. The first was a synthesized piano tone at 440 Hz (Concert A, or MIDI A4), while the second was a synthesized glockenspiel tone at 440 Hz.

Figure 3.1 Sample point selection. The users' selected points are represented via concentric circle targets, and the software's selected points are represented via solid circles. Note that this is an example of the two-point test. The presentation plane measures 18.1 cm by 18.1 cm.

3.0.2 Results

We recruited undergraduate students, this time 29 (9 women, 20 men), with a mean age of µ = 20.3 years (σ = 1.5 years). All reported normal hearing. Participants were allowed to practice until they reported that they were comfortable with the system. We then tested their precision. An analysis was performed to determine the mean precision (Figure 3.2) and found that the users were accurate to within 3.57 cm (σ = 1.95 cm). A regression analysis was performed to see if there was a correlation between practice trials (Figure 3.3) and overall accuracy, but found no significant correlation (R² = 0.06).

Figure 3.2 Mean accuracy per trial for the single-point-only test. Accuracy did not increase with reinforcement. Originally published in [73].

Figure 3.3 Number of practice trials and mean user error. Originally published in [73].

We decided to repeat the test and add a test that presented two points sequentially with a small delay. We again recruited students, finding 25 undergraduates (4 female, 21 male) with a mean age of 21.8 years (σ = 4.0 years), all of whom reported normal hearing. For each participant, we first performed the same ten single-point tests and then ten trials with two points presented. The mean accuracy on the single-point task was 3.13 cm (σ = 2.24 cm), which closely mirrored the results of our first experiment. However, the two-point test showed poorer results, with a mean accuracy of within 4.33 cm for selecting the first point and a slightly lower accuracy of within 4.94 cm (σ = 3.23 cm) for selecting the second point. Analysis showed that the mean error in the angle between the first and second points was 0.71 radians (40.68 degrees).
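One plausible reading of this angle metric, the angle between the segment connecting the two presented points and the segment connecting the two points the participant selected, can be computed as in the following sketch (an illustration under that assumption, with hypothetical names, not the study's analysis code):

    import java.awt.geom.Point2D;

    // Angle between the actual inter-point segment and the selected inter-point
    // segment, one plausible reading of the angle error reported above.
    public final class AngleError {
        static double angleBetween(Point2D a1, Point2D a2, Point2D s1, Point2D s2) {
            double actual = Math.atan2(a2.getY() - a1.getY(), a2.getX() - a1.getX());
            double chosen = Math.atan2(s2.getY() - s1.getY(), s2.getX() - s1.getX());
            double diff = Math.abs(actual - chosen) % (2 * Math.PI);
            return diff > Math.PI ? 2 * Math.PI - diff : diff; // fold into [0, pi]
        }

        public static void main(String[] args) {
            Point2D a1 = new Point2D.Double(2, 2), a2 = new Point2D.Double(10, 2);
            Point2D s1 = new Point2D.Double(2, 3), s2 = new Point2D.Double(9, 6);
            System.out.printf("error = %.2f rad%n", angleBetween(a1, a2, s1, s2));
        }
    }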
Again, there was no correlation between practice time and overall accuracy, with a regression value of R² = 0.005 (Figure 3.4).

Figure 3.4 Number of practice trials and mean user error. Originally published in [73].

Five different intervals between the presented stimuli were tested in the two-point test to determine whether the delay between the stimuli affected the results. Figure 3.5 presents these results. An interval of one second yielded the best results, although the differences in accuracy across stimulus intervals were minor.

Figure 3.5 Boxplots of angle error compared to length of stimuli (n = 250). Originally published in [73].

3.0.3 Discussion

The results of this study indicate that with a reasonable division of the presentation space and the placement of graph elements in those divisions, users may be able to reliably perceive the position of an element relative to other elements. We decided to modify our presentation of audio stimuli in future studies to address this issue.

We found inspiration in a work from 2008 named iSonic that focused on the presentation of scatterplots to individuals with visual impairments [79]. iSonic presented a scatterplot via a nonet (a grid with nine squares). The scatterplot data were converted into a heatmap such that a higher data point density resulted in a higher value for a cell. Pitch-modulated audio tones represented the values of a particular grid cell, and users could interact with the system using a keyboard and a touchpad. Most importantly, the nonet structure was recursive; users could select a cell to "zoom" into it, resulting in the cell's expansion to the size of the entire grid. The presentation was then subdivided into nine cells, allowing users to explore the data in a very fine-grained manner (Figure 3.6).

Figure 3.6 A scatterplot (left), and the same data presented using the iSonic software (from [79]). The software converts data point density to heatmap values. Users can zoom into a cell to have it expand to the entire nonet, which is then subdivided into cells for data exploration.

The iSonic authors tested the efficacy of their mechanism with a study comprising seven legally blind individuals (none reported having any residual vision). Individuals were asked to answer questions from three datasets using either Excel or the iSonic system. The results showed that the participants were able to answer questions as accurately with the iSonic system as they were with Excel (86% precision in each); however, the participants reported that iSonic was easier to use than Excel (rating it 7.9/10 for ease of use compared to 7.0/10 for Excel).

The nonet structure is appealing for data exploration because of its simple design, its recursive structure allowing exploration at different levels of detail, and its fit with our findings on position and angle precision. We decided to adapt the nonet mechanism to the presentation of class diagrams. However, class diagrams differ greatly from scatterplots. Scatterplots represent a magnitude estimation task; for any given cell, stimuli must be provided that allow the consumer to compare the number of data points in that cell to other cells. Class diagrams, in contrast, must convey the different types of relationships between diagram elements.

CHAPTER 4

USING NONETS AND SOUND TO CONVEY STATIC DIAGRAM ELEMENTS

Building on our previous work, we created software for a proof-of-concept.
This time, we wanted to determine whether listeners could accurately recall static, structural information about classes, using the nonet mechanism to divide our space and sound to present the elements within it.

4.0.1 Method

Real-world class diagrams for large software projects can be very complex, while diagrams used to convey educational concepts tend to be much smaller and simpler. For this reason, we decided to focus on the subset of diagrams most likely to be found in widely used software engineering texts. These diagrams illustrate concepts important to software engineering students, such as recurring design patterns, and usually contain a small number of classes. For example, the classic software engineering text Design Patterns: Elements of Reusable Object-Oriented Software contains twenty-three commonly used design patterns [34]. The sample structures provided by the authors of that text contained on average 4-5 classes (n = 23, µ = 4.5, min = 1, max = 10). Figure 4.1 illustrates one of these educational examples, a very commonly used pattern called the Iterator pattern.

Figure 4.1 The Iterator design pattern is one of many commonly taught design patterns in introductory software engineering courses.

Zhao et al. showed that a nonet, or a grid with nine cells, can be used to convey graphical data via audio (Figure 3.6) [79]. Of particular importance is that each cell of their grid was portrayed in a pre-defined order and that an empty cell was represented by either a pause or a short stimulus. Our work borrows the nonet concept. Given a list of classes, a user of our system may select a class from a hierarchy to examine more closely. The chosen class is placed in the center cell of the nonet. Classes that directly relate to the chosen class are placed in the cells around it. When a user chooses to play a representation of the selected class and its relationships, the stimulus for each cell in the nonet (except the chosen class in the center) is played clockwise, starting at the upper left cell and ending at the middle left cell. Any cell that does not contain a class directly related to the currently chosen class is conveyed by a short clicking sound.

Four relationship types may need to be portrayed in UML Class Diagrams. We will describe these in terms of two arbitrary class types, A and B. An association occurs when class A holds a reference to one or more objects of type B. A generalization is a relationship whereby class A generalizes (as in, is more generic than) class type B; these are used to illustrate parent-child or superclass-subclass relationships. A realization occurs when class A implements an interface specified by class type B (i.e., it implements the methods type B specifies). Finally, a dependency exists when class A requires class B to exist so that it can complete some task. Of these four relationship types, realization does not exist in all languages and can be simulated by generalization and abstract classes; therefore, it was not included in this proof-of-concept. Dependency relationships serve as a sort of "catch-all" type that can be complex for introductory students to grasp and were therefore also omitted from this work.

Audio stimuli were assigned to the two remaining relationship types; General MIDI instruments were used for simplicity. An association was represented by a half-second C4 note (MIDI note 60) with a synthesized acoustic grand piano (MIDI instrument 0).
Generalizations were represented by a half-second C4 note with a synthesized glockenspiel (MIDI instrument 10). These were chosen by the researchers because of their disparate timbres. The Oracle Java JDK provided the synthesizer used for our macOS system; however, the sounds were replaced with the FluidSynth1 sound font, as the default sound font was deemed to be of poor quality.

1 https://www.fluidsynth.org/

The participants listened to the stimuli via Sony MDR-7506 Studio Monitor headphones connected to a four-channel Behringer Powerplay Pro-XL amplifier. As the amplifier was multichannel, the researchers were able to listen to the stimuli at the same time as the participants, and both the researcher and the participant had full control of the volume of their headphones.

Participants could press a button on a gamepad to start the playback of the diagram. Each cell of the grid was played sequentially in clockwise order around the chosen class, which occupied the center of the grid. A short click was played if there was no class in a particular cell; otherwise, the association or generalization sound was played to indicate that a class in the currently playing cell related to the chosen class via that type of relationship. Participants had a button to repeat the diagram if they wished, as well as buttons to control the length of the delay between stimuli. An illustration of our testbed layout is shown in Figure 4.2.

Figure 4.2 Sample layout of a diagram using our testbed. Client was the chosen class and therefore occupies the central cell. Classes with relationships to this class are placed in the cells around the chosen class.

An invitation was extended to undergraduate students in computing courses at Grand Valley State University. The study population ultimately consisted of n = 29 undergraduate students (24 male, 5 female). The results of this study were originally published in [75]. The following questions were asked of the participants before the study:

1. What is a class?
2. What is an instance variable?
3. What is inheritance?
4. What is polymorphism?
5. What is an association relationship in UML Class Diagrams?
6. What is a generalization relationship in UML Class Diagrams?
7. Which relationship type denotes that a class holds one or more instance variables of another class type?
8. Which relationship type denotes that a class is a parent or child of another class?
9. What is multiple inheritance?
10. What is a class hierarchy?

These questions were used to gauge whether participants understood the concepts of Object-Oriented Programming and UML class diagrams well enough to participate. Participants who missed three or more of the questions were not invited to participate. Those students who were unable to correctly answer all questions (but were still invited to participate) received a few minutes of review.

The participants were then shown the grid with numbers printed in the lower right corner of each cell. They were told how the cells in the diagram would be sequenced and asked to demonstrate the sequence back to the researcher. The researcher described how a class diagram would be conveyed, then played a repeating sound while the participants put on the headphones and adjusted the volume to a comfortable level. The researcher then played the stimulus used to represent an association, followed by the stimulus used to represent a generalization.
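For reference, stimuli of this kind can be produced with the stock javax.sound.midi synthesizer along the following lines (a minimal sketch, not the study's actual code; the study additionally replaced the default sound bank with the FluidSynth sound font):

    import javax.sound.midi.MidiChannel;
    import javax.sound.midi.MidiSystem;
    import javax.sound.midi.Synthesizer;

    // Minimal sketch: render the two half-second C4 stimuli described above.
    public final class RelationshipStimuli {
        public static void main(String[] args) throws Exception {
            Synthesizer synth = MidiSystem.getSynthesizer();
            synth.open();
            MidiChannel ch = synth.getChannels()[0];

            play(ch, 0, 60);   // association: GM program 0 (acoustic grand piano), C4
            Thread.sleep(300); // brief gap between the two stimuli
            play(ch, 9, 60);   // generalization: GM glockenspiel (program 9 zero-based,
                               // listed as instrument 10 in one-based GM numbering)
            synth.close();
        }

        static void play(MidiChannel ch, int program, int note) throws InterruptedException {
            ch.programChange(program);
            ch.noteOn(note, 96);  // arbitrary moderate velocity
            Thread.sleep(500);    // half-second stimulus
            ch.noteOff(note);
        }
    }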
When participants indicated that they were able to distinguish between the two, were comfortable with the overall concept, and had no questions, the researcher began to present sample diagrams. For this proof-of-concept, to ensure that a wide range of parameters was evaluated, the system randomly selected the number of associations and generalizations to present. Participants were able to start the playback of a diagram when they were ready and were able to replay the diagram if they desired. The researcher then asked the participant to relate the number of associations and the number of generalizations that the diagram conveyed. Once participants indicated that they were comfortable with the sample diagrams, the researcher began keeping track of participant responses. Each participant was tested on ten diagrams.

For each diagram, the users were instructed that the central cell of the grid represented a class for which we wanted to know the number and types of relationships. The system generated a random number of related classes (1 ≤ n ≤ 8). Of these, between 0 and 2 were selected and assigned a generalization relationship to the class of focus. A small number was chosen for the possible number of generalizations, as it is uncommon (and often discouraged) to have classes inherit from more than one class. The remaining classes were assigned an association relationship to the class of focus.

After the test of ten diagrams, the researcher asked the participants to listen to five additional diagrams. Participants were not asked to keep track of the number of associations and generalizations for these five tests, but rather to keep track of the locations (based on the grid numbers) where an association or generalization occurred. The purpose of the second test was to examine the efficacy of the nonet mechanism when also used in a top-down manner, as we believe that a complete system will require both methods. This project was approved by the university's Institutional Review Board.

4.0.2 Results

For the first test, all participants listened to ten diagrams and were asked to note the number of associations and generalizations for the focused class, generating n = 580 prompts. The participants erred on the number of associations in a diagram a total of 11 times (3.8% error rate) and erred on the number of generalizations 14 times (4.8% error rate).

The types of errors that the participants made varied. The participants miscounted the number of associations but counted the correct number of generalizations on n = 2 occasions (0.7% error rate). They miscounted the number of generalizations but still gave the correct number of associations n = 5 times (1.7% error rate). The most common error occurred when the participants miscounted both; this occurred in n = 9 of the trials (3.1% error rate). In all cases of a miscount in both categories, it appeared that the participants had mistaken one stimulus for the other, as in each instance a miscount in one category was accompanied by a miscount in the other category of the inverse amount. For example, a diagram may have conveyed 4 associations and 2 generalizations, but the participant perceived 5 associations and 1 generalization, or 3 associations and 3 generalizations. The first type of error (perceiving a generalization as an association) occurred in n = 7 of these instances, while the second type (perceiving an association as a generalization) occurred in n = 2 of the instances.
The second test was harder for the participants. Each of the n = 29 participants had n = 5 trials in which to identify which cells of the grid contained a related class. The number of errors across all trials was 21, for an overall error rate of 14.5%. Two types of errors were made by participants: errors of omission, when the participant did not perceive one of the stimuli (n = 12, or 57% of the errors), and errors occurring because the participant misidentified the cell (n = 9, or 43% of the errors).

4.0.3 Discussion

The results of the first test were promising. Participants were able to recall consistently, quickly, and accurately the number of associations and generalizations for a particular class. This is likely due to the low number of stimuli presented, combined with a consistent and well-defined presentation space, i.e., the nonet design with a single class of focus at its center.

The results for the second test were less promising but should not be ignored. Although participants erred at a rate of nearly 15% in this task, they received very little training time beforehand. It is very likely that familiarity with the system could improve these scores.

4.0.4 Future Work

The scope of this work was to design a proof-of-concept, and it was successful in that endeavor. Future work will be undertaken to create a UML browser that can load pre-created UML diagrams or create diagrams from scratch or from existing software source code. User studies will then be conducted to determine whether students who explore UML diagrams via this system show similar levels of understanding as students who examine the same diagrams in a visual format.

The stimuli used to convey the types of relationship were chosen by the researchers in this experiment. A better option may be for participants to choose stimuli with perceptual meanings that are closely related to their preexisting schemas of relationship types [31].

Although our work focused solely on the presentation of the relationships between classes in a UML class diagram, this mechanism may be useful for the audible display of more general mathematical graphs. As mentioned in an earlier section, class diagrams have similarities with mathematical graphs (node-edge graphs), which in turn have several similarities to other types of UML diagrams, such as sequence diagrams, and even non-UML diagrams such as flowcharts.

In this study, no formal evaluation was made of the time taken by participants to answer questions after a diagram was presented (although we tracked the number of times a participant repeated a diagram). Future work should compare the time required with this mechanism to that required with visual UML. Additionally, while users noted that the mechanism used for this study was intuitive, we may want to formally evaluate the mental workload using the NASA Task Load Index.2

2 https://humansystems.arc.nasa.gov/groups/tlx/

CHAPTER 5

REAL-WORLD DIAGRAM AND WORKLOAD TESTING

For this experiment we used the same hardware from the prior experiment, with the addition of a USB number pad. The system was prototyped in Python and then rewritten in Java. Java provided stability in production that the Python prototypes lacked, as Java's static type checking eliminated many run-time concerns. Visual Paradigm1 was chosen as the software package for diagram creation, as it is widely used and readily available for free use by students at our institution through an academic partnership program. Our first task was to understand the file format used by the Visual Paradigm software.
Inspection of the files revealed them to be SQLite databases. SQLite is a well-specified format, and most languages provide libraries for the manipulation of these files. We used the Xerial SQLite JDBC driver.2 Unfortunately, much of the data in these databases is unstructured; although there are a few tables, some are used for large text dumps. Thus, we had to use regular expression matching to extract the data we required. Visual Paradigm does provide the ability to export diagrams as XML files; however, we were unable to locate an XML schema describing the data that would help us parse the needed information from those files.

1 https://www.visual-paradigm.com
2 https://xerial.org/software/

Once we completed the code that enabled us to pull the information we needed from the files, we wrote an application to allow users to select and explore diagrams from them. The application is designed to provide the often-cited "Overview first, zoom and filter, then details-on-demand" model of information consumption [59]. To that end, we provide two views of the information. First, a diagram-level view is useful for discovering information such as the number of classes in the diagram, their locations, and the areas with the least or most information. Second is a detail-level view of a chosen class. The detail-level view provides information on the relationships between a selected class and the classes to which it is connected.

Users are first presented with a list of the class diagrams within the file (there may be more than one). Once they select the desired diagram, our software presents that diagram with the diagram-level view. Upon presentation of the diagram, the software uses the Mycroft AI Mimic 3 text-to-speech engine3 to vocalize the number of classes currently visible out of the total number of classes in the diagram. The diagram is positioned so that the upper left corner of the diagram is in the upper left corner of the presentation area. Entities in the diagram are presented in the same positions in which they were created in the Visual Paradigm software.

3 https://github.com/MycroftAI/mimic1

For each view, there are two presentation modes that the user can select: a verbose, text-to-speech-based mode, and an audio-only mode that relies upon the modulation of audio samples, with no speech, to convey diagram information. We designed the audio-based mode to allow users to interact with the system more quickly than is possible with speech. Central to the audible presentation is the use of a nonet (a grid of nine cells), mapped to digit keys 1 through 9 on a USB number pad. The combination of a nonet and a number pad provides a convenient and intuitive input mapping for exploration and navigation (Figure 5.1). This mechanism is used for both the diagram-level and detail-level views, though in subtly different ways, as described below.

Figure 5.1 The system maps the digits 1-9 of the number pad to the cells of the nonet. Pressing a digit causes the system to render audio or speech (depending on the current mode) to describe the entities - if any - in that cell.
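As an illustration of that correspondence, the digit-to-cell lookup might look like the following minimal sketch (hypothetical names; it assumes the physical number-pad layout, in which the 7, 8, and 9 keys form the top row and therefore map to the top row of the nonet, while the actual assignment used by the system is the one shown in Figure 5.1):

    // Sketch: translate a number-pad digit into a (row, col) nonet cell.
    // Assumes the physical key layout 7 8 9 / 4 5 6 / 1 2 3, so the
    // top-left cell of the nonet corresponds to the 7 key. Hypothetical names.
    public final class NonetKeys {
        /** Returns {row, col} with row 0 at the top, or null for non-digit keys. */
        static int[] cellFor(char key) {
            if (key < '1' || key > '9') return null;
            int d = key - '1';          // 0..8
            int row = 2 - d / 3;        // number-pad rows count upward from the 1 key
            int col = d % 3;
            return new int[] { row, col };
        }

        public static void main(String[] args) {
            int[] cell = cellFor('7');
            System.out.println("key 7 -> row " + cell[0] + ", col " + cell[1]); // row 0, col 0
        }
    }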
5.0.1 The Diagram-Level View

The diagram-level view provides a top-down approach to exploring class diagrams. When a diagram is chosen from the loading screen, it is drawn with classes positioned exactly as they were placed in the Visual Paradigm software. The nonet is superimposed on the portion of the diagram that fits within the visible region of the software. The system is initially in verbose mode, and a spoken message alerts users to the number of classes currently in view and the total number of classes in the diagram. By pressing one of the digit keys on the number pad, a verbal message is played telling the number of classes in the corresponding cell, followed by a list of those class names. Pressing the period key causes the system to render the number of entities in each cell sequentially, left to right, top to bottom.

Although we designed the system for fairly simple, educational-use class diagrams, it is still possible that the entire diagram may not fit on the screen at once. Therefore, we programmed the system to pan the diagram using the W, A, S, and D keys (up, left, down, and right). Pressing one of these keys moves the diagram in the corresponding direction by the current width or height of a cell. When the diagram moves, an audio stimulus best described as a short "whooshing" sound is played. The system will not allow panning past the boundaries of the diagram. Should the user attempt to move in a direction when they are already at the limit of that direction, a short "clicking" sound is played to alert them that the diagram has not moved.

The software allows users to zoom using the "+" key to zoom in and the "-" key to zoom out. The zoom level is not arbitrary; rather, the system requires users to select a particular cell into which to zoom. The chosen cell is then expanded to fill the entire nonet. This recursive zoom mechanism is based on the work by Zhao et al. described in the review of the literature in this paper [79]. When zooming into a cell, if there is only one class in that particular cell, the detail-level view is opened. Zooming out returns the user to the prior diagram-level view with the previous zoom level and panning applied.

Figure 5.2 Sample diagram-level view. This view provides a high-level view of the diagram, with panning and zooming features. It corresponds to the "Overview" phase of the "Overview, zoom and filter, then details" model of information consumption.

Alternatively, the user can trigger a search for a class by pressing the "F" key. Doing so brings up a list of all the classes in the diagram in alphabetical order. Selecting a class from this list automatically triggers the detail-level view for that class; cancelling out of this menu leaves the user in the current view. The list speaks the names of the classes as the user moves through them.

Users may opt to switch out of verbose mode and into speechless audio mode by pressing the "E" key. In this mode, instead of a voice stating the number of classes in a selected cell, the system plays a musical note with a tremolo effect applied. The amount of tremolo is governed by the number of entities in the selected cell; no tremolo is added if there is only one class in a cell. This provides users with a rapid way to discover which areas of the diagram contain information and which may be ignored.
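The recursive zoom described above can be captured in a few lines: zooming into a cell simply makes that cell's bounds the new viewport, which is then itself divided into nine cells. The following is a minimal sketch under that reading, with hypothetical names:

    import java.awt.geom.Rectangle2D;

    // Sketch of the recursive nonet zoom: the chosen cell's bounds become the
    // new viewport, which is then itself divided into nine cells.
    public final class NonetZoom {
        /** Bounds of cell (row, col) within the current viewport. */
        static Rectangle2D cellBounds(Rectangle2D view, int row, int col) {
            double w = view.getWidth() / 3.0, h = view.getHeight() / 3.0;
            return new Rectangle2D.Double(view.getX() + col * w, view.getY() + row * h, w, h);
        }

        public static void main(String[] args) {
            Rectangle2D view = new Rectangle2D.Double(0, 0, 900, 900);
            view = cellBounds(view, 0, 0);  // zoom into the upper-left cell
            view = cellBounds(view, 2, 2);  // then into its lower-right sub-cell
            System.out.println(view);       // a 100 x 100 region at (200, 200)
        }
    }

Zooming back out is then a matter of restoring the previous viewport (for example, by popping it off a stack), which matches the behavior described above of returning to the prior zoom level and panning.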
5.0.2 The Detail-level View

The detail-level view allows users to inspect a particular class and its relationships to other classes. This view is triggered whenever the user selects a class from the search menu, or when zooming into a cell in which that class is the only diagram entity. This view also uses the nonet presentation mechanism, but in a different way. In the detail view, the selected class is positioned in the center cell. Classes with which the selected class shares a relationship are positioned around it, one class per cell. Although this provides only eight cells for related classes, it is enough for the types of diagrams we wish to convey; previous work identified the need for a mechanism for presenting diagrams with (on average) four to five classes [75].

Class diagrams represent relationships between classes with various line types, and directionality with different line-ending arrows. The audio corollary in this system is the use of stimuli with varying timbres combined with pitch modulation. For relationships, we use a waveform of a piano playing a C4 note to convey an association, and a waveform of a glockenspiel playing a C4 note for a generalization. Our corollary for directionality is to modulate the waveform for the relationship such that a relationship directed away from the currently selected class rises in pitch, and a relationship directed toward the selected class sinks in pitch. The pitch change is applied over the last half second of the stimulus, with a change in pitch of a whole step. When in verbose mode, the name of the class is spoken after the audio stimulus is played.

Figure 5.3 A sample detail-level view. The center class (NPC in this case) is the class selected through zooming deeper into the diagram-level view. The different types of relationships are represented with different shapes for those with low vision, but with different audio stimuli for individuals with blindness.
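The directional pitch cue can be sketched as follows. This is a minimal sketch under simplifying assumptions: plain sine tones stand in for the piano and glockenspiel samples actually used, and the one-second stimulus duration is illustrative.

public class DirectionalTone {
    static final int RATE = 44100; // samples per second

    // Render one second of tone whose final half second glides up (away from
    // the selected class) or down (toward it) by a whole step (two semitones).
    static short[] render(double baseHz, boolean away) {
        double wholeStep = Math.pow(2.0, 2.0 / 12.0);
        short[] samples = new short[RATE];
        double phase = 0.0;
        for (int i = 0; i < samples.length; i++) {
            double t = (double) i / RATE;
            double f = baseHz;
            if (t > 0.5) {                          // bend in the last half second
                double u = (t - 0.5) / 0.5;         // 0..1 across the bend
                double factor = away ? wholeStep : 1.0 / wholeStep;
                f = baseHz * Math.pow(factor, u);   // smooth exponential glide
            }
            phase += 2.0 * Math.PI * f / RATE;      // integrate frequency into phase
            samples[i] = (short) (Math.sin(phase) * 0.4 * Short.MAX_VALUE);
        }
        return samples;
    }

    public static void main(String[] args) {
        short[] rising  = render(261.63, true);  // C4, relationship pointing away
        short[] falling = render(261.63, false); // C4, relationship pointing toward
    }
}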
5.0.3 Experiment

We asked for volunteers from among undergraduate computing students. All participants were required to have completed Computer Science One and Two at the university where we teach. We imposed this requirement because our university teaches class diagrams in Computer Science Two; therefore, all participants should have been exposed to them beforehand. We wished to compare the performance of three groups: a group using traditional visual UML class diagrams, one using the Visual Paradigm software with a screen reader, and one using our system. However, we tested the Visual Paradigm software with Apple's VoiceOver, Microsoft Windows' Narrator, and Freedom Scientific's JAWS screen reader, and found that none of them was able to render any diagram information. The menus and menu items were described only as "System dialog", or with a message alerting us that we were in a "JScrollPane"; it was impossible to use a screen reader with Visual Paradigm. We therefore recruited only two groups to test our software.

Both groups consisted of 30 undergraduate students. Each participant worked individually with a researcher. For both groups, participants were first asked if they remembered UML class diagrams from their Computer Science Two course. We then asked them the preliminary questions from Section 4, which cover basic Object-Oriented Programming concepts and definitions as well as questions specific to class diagrams. When students struggled to answer a question correctly, we reviewed the material with them. We were unable to find students with visual impairments or blindness. Although this limitation is unfortunate, it provided us with the opportunity to review the class diagrams visually with each participant before having them answer questions about the diagrams used in the study.

The participants in the first group were then blindfolded. We described our system to them using the script from Appendix B. Participants were allowed to ask questions and were then given an open-ended period to use the system and grow familiar with it, while still asking questions of the researchers. During this time, they were given a sample diagram to explore. Once a participant indicated they were ready to begin, we loaded a new diagram and began the experiment.

Each participant was asked to answer ten questions about the diagram presented to them. Once a question was asked, the researcher started a timer on a stopwatch. When the question was answered, or the participant declined to answer it, the timer was stopped and the time was recorded. The questions were asked one at a time, and participants were given as much time to answer as they wished. During this time, participants could ask for clarification about the stimuli or how the system worked, but were not allowed to ask specific questions about the diagrams or about UML. All of the participants' interactions with the system, as well as the timer information and answers, were recorded in log files for later data analysis.

The second group was given the same opportunity to review Object-Oriented Programming and UML class diagrams before being tested on a traditional, visual UML class diagram. Participants were asked questions one at a time and given as much time as needed to answer (or to decline to answer if they could not determine an answer). The researchers tracked the time used to reach an answer with a stopwatch and recorded the times and answers in log files.

After completing the ten basic structural questions, each participant was asked to complete a NASA Task Load Index (NASA TLX) to provide information on the workload of answering the questions with the given presentation mechanism. Finally, participants were asked to answer three higher-level thinking questions about their diagrams. During this time they were not allowed to refer back to the diagrams, but were asked to answer the questions from memory. This provided some measure of the participants' synthesis and understanding of the diagrams they had encountered.

5.0.4 Results

The first group consisted of 30 undergraduate computing students (19 male, 11 female). As noted above, participants were given as much time as they wanted to practice with the system prior to testing. On average, participants practiced for 930.79 seconds (σ = 232.62 seconds), or around 15 and a half minutes. The minimum practice time was 635.92 seconds (10 minutes, 35.92 seconds), and the longest was 1370.94 seconds (22 minutes, 50.94 seconds). A Pearson correlation coefficient was calculated between practice time and the number of errors made by the participants; practice time was not strongly correlated with success (r = 0.19).
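For reference, a correlation of this kind takes only a few lines of code to compute. The sketch below uses Apache Commons Math (an assumption; the dissertation does not state which library was used) and made-up data, not the study's actual measurements.

import org.apache.commons.math3.stat.correlation.PearsonsCorrelation;

// Sketch: Pearson correlation between practice time and error count.
// The arrays below are illustrative placeholders, not the study data.
public class PracticeCorrelation {
    public static void main(String[] args) {
        double[] practiceSeconds = { 635.9, 812.4, 930.8, 1044.2, 1370.9 };
        double[] errorCounts     = { 2,     1,     3,      1,      2     };
        double r = new PearsonsCorrelation().correlation(practiceSeconds, errorCounts);
        System.out.printf("r = %.2f%n", r); // a weak correlation lands near 0
    }
}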
Question              1     2     3     4     5     6     7     8     9     10
Audio presentation    1.0   0.93  0.87  0.97  0.93  0.97  1.0   0.97  0.97  0.63
Visual presentation   1.0   1.0   1.0   0.97  1.0   1.0   1.0   1.0   1.0   0.90

Table 5.1 Difficulty index values for each question, separated by presentation mode. Lower values indicate more difficult questions.

The second group comprised a different set of 30 undergraduate students (22 male, 8 female). This group served as the control group and was thus given the traditional visual UML task. Participants were asked the same questions and had the same review time as those in the audio group, but did not practice answering UML questions before beginning the experiment.

For each question, we calculated a difficulty index so that we could identify the questions, and the types of questions, with which students struggled. This also served to compare the results of the two presentation methods. The questions were written to be simple structural questions for students to answer, given their prior experience with class diagrams; we wanted to ensure that any errors made were most likely due to the presentation mechanism and not to the complexity of the question. We used the function

P = (number of correct responses) / (total number of responses)    (5.1)

to generate the index values. The index values, separated by presentation type, are given in Table 5.1. We considered a question to be difficult for index values < 0.80. Only one question, number 10 presented with audio, was flagged as a hard question. Of importance is that the same question presented visually was not flagged as hard.

The amount of time needed to answer a question varied greatly with the presentation format. The average time taken to answer a question using the audio-based presentation was 154.33 seconds (σ = 61.05 seconds), while the average time for the visual presentation was 21.22 seconds (σ = 10.11 seconds). The response time distributions are shown in Figure 5.4.

Figure 5.4 Distribution of response times. As expected, audio task response times were significantly greater than those of the visual task.

In addition to the difference in response times, it is notable that the audio task resulted in more outliers (of both shorter and longer response times). This could indicate that some of the participants were more comfortable with the task than others, which in turn could indicate that more training would be helpful. The minimum time to respond to a question was 12.82 seconds for the audio-based method and 4.24 seconds for the visual method. The maximum response time was 325.43 seconds for the audio-based method and 62.62 seconds for the visual method. The mean time-to-answer values are shown in Figure 5.5.

We conducted a two-sample t-test to determine if there were significant differences in response times. As the time variances differed between the two samples, we used the Satterthwaite approximation. This showed that the response times differ significantly (p < 0.0001). This is to be expected, as the audio task uses a sequential presentation method, whereas the visual task presents all information at once.

Figure 5.5 The mean time in seconds, from the time a question was asked until the participant answered, per question.

Figure 5.6 Time variance of responses to each question, under the audio-presentation approach.

Figure 5.7 Time variance of responses to each question, under the visual-presentation approach.
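The unequal-variance comparison above can be reproduced with standard library support. The sketch below is a minimal example using Apache Commons Math, whose two-sample tTest does not assume equal variances and applies the Welch-Satterthwaite degrees-of-freedom approximation; the response-time arrays are placeholders, not the study data.

import org.apache.commons.math3.stat.inference.TTest;

// Sketch: two-sample t-test without assuming equal variances.
// The sample arrays are illustrative placeholders.
public class ResponseTimeTest {
    public static void main(String[] args) {
        double[] audioTimes  = { 142.1, 188.4, 95.7, 210.3, 160.8, 128.9 };
        double[] visualTimes = { 18.2,  25.4,  12.9, 30.1,  22.6,  17.5  };
        double p = new TTest().tTest(audioTimes, visualTimes); // two-sided p-value
        System.out.printf("p = %.6f%n", p); // significant if p < 0.05
    }
}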
5.0.5 Workload Comparison

In order to determine the perceived workload of using the audio-based system, participants were asked to complete a NASA Task Load Index (TLX) after completing the test. This index is highly regarded and is used in many different fields and studies to help designers understand the subjective experience of various types of workload for users of systems [37]. The index compares scores in the six areas of mental demand, physical demand, temporal demand, performance, effort, and frustration (Table 5.2). Users select a value for their perceived workload of each type. They also go through a pairwise selection process, in which they rank each of the workload types against the others to determine which factors should be weighted the highest when calculating an overall index score. The final workload scores are between 0 and 100. We used the Apple iOS version of the NASA-TLX app on an Apple iPad Air to collect the data.

Mental Demand (low/high) - How much mental and perceptual activity was required?
Physical Demand (low/high) - How much physical activity was required?
Temporal Demand (low/high) - How much time pressure did you feel due to the rate or pace at which the tasks or task elements occurred?
Performance (good/poor) - How successful do you think you were in accomplishing the goals of the task set by the experimenter?
Effort (low/high) - How hard did you have to work (mentally and physically) to accomplish your level of performance?
Frustration Level (low/high) - How insecure, discouraged, irritated, stressed, and annoyed versus secure, gratified, content, relaxed, and complacent were you?

Table 5.2 Definitions of each workload type. From the NASA-TLX iOS app.

The task load scores for the audio presentation tasks had a mean of 22.76 (σ = 10.22). Although this represents a low-to-medium workload overall, we focused on subscale scores to get a better idea of the complexities users encountered when using the mechanism. The scores were scaled by how participants completed the pairwise comparison tasks, which allows users to assign importance to individual workload factors. Mental demand had the highest scaled score, with an average of 136.83 (σ = 82.31), followed by effort (µ = 109.17, σ = 57.42), performance (µ = 65.83, σ = 48.51), frustration (µ = 16.00, σ = 22.30), temporal demand (µ = 10.17, σ = 12.07), and physical demand (µ = 3.33, σ = 11.09). Workload rankings per trial are shown in Figure 5.8.

Figure 5.8 Weighted rankings of workload measures, per trial. Mental demand had the highest workload values, followed by effort.

The mean workload value for the visual task was µ = 14.38 (σ = 10.22). Again, we examined subscale scores to better understand user perception of the task. For the visual task, effort was the highest-rated workload with µ = 69.50 (σ = 72.22), followed by mental demand (µ = 62.00, σ = 41.68), performance (µ = 60.67, σ = 60.01), frustration (µ = 18.50, σ = 31.35), temporal demand (µ = 4.67, σ = 7.18), and physical demand (µ = 0.33, σ = 1.27).

Figure 5.9 Weighted rankings of workload measures, per trial for the visual task. Effort showed the highest workload value, followed by mental demand.

To determine whether a particular workload factor was significantly different for the audio task compared to the visual task, we were unable to use a standard t-test, as the underlying data was not continuous and the samples did not have similar variances or adhere to a normal distribution. Therefore, we used a Wilcoxon rank-sum test. These nonparametric tests use a data-ranking method that allows us to compare the median values and the distributions of the samples (Figure 5.10).
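The Wilcoxon rank-sum test is equivalent to the Mann-Whitney U test, which is how Apache Commons Math exposes it. The sketch below is again a minimal example under that assumption, with placeholder subscale scores rather than the study data.

import org.apache.commons.math3.stat.inference.MannWhitneyUTest;

// Sketch: Wilcoxon rank-sum (Mann-Whitney U) comparison of one workload
// subscale between the two presentation methods. Scores are placeholders.
public class WorkloadComparison {
    public static void main(String[] args) {
        double[] audioMental  = { 180, 95, 220, 140, 75, 160 };
        double[] visualMental = { 40,  85, 20,  110, 60, 55  };
        double p = new MannWhitneyUTest().mannWhitneyUTest(audioMental, visualMental);
        System.out.printf("p = %.4f%n", p); // small p suggests the distributions differ
    }
}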
The scores indicate that the workload values for mental workload and effort are significantly different (p < 0.0001 and p = 0.0015, respectively). The temporal workload values are close to statistical significance, but at p = 0.0705 they do not meet the threshold of p < 0.05. Examining the distributions of the rank scores via histograms reinforces these results: we see in Figure 5.11 that the distribution of the data for mental workload is higher for the audio task than for the visual task.

Figure 5.10 Wilcoxon rank-sum score summaries comparing the workload values of the different presentation methods. P-values indicate that mental workload and effort showed significant differences, with the audio method being higher.

Figure 5.11 The distribution of Wilcoxon rank-sum scores for mental workload. The two methods show clear differences in distribution, with the audio presentation mechanism resulting in higher values. This indicates participants felt the mental workload was higher for the audio-based task.

Physical, temporal, and performance workload scores (Figures 5.12, 5.13, and 5.14) did not differ significantly. Interestingly, though participants in both the audio and visual tasks were specifically told that they had as much time as needed to answer the questions, both groups reported feeling time pressure. We are unsure what to make of this result. Effort scores (Figure 5.15) differed significantly; again, this is to be expected. However, we were surprised to find that the frustration scores were remarkably similar. This was an unexpected finding; we had hypothesized that students would find the audio task more frustrating. It could be a reflection of students' general discomfort with UML diagramming.

Figure 5.12 The distribution of Wilcoxon rank-sum scores for physical workload. The different presentation mechanisms did not result in significant differences in these values, though the audio method yielded a greater number of outliers.

Figure 5.13 The distribution of rank-sum scores for temporal workload. Oddly, though both groups were given as much time as needed to answer questions, these scores indicate they felt time pressure.

Figure 5.14 Performance rank-sum scores were very similar between the two tasks. Even though students were not being graded on the task, they reported similar measures of feelings of success with the tasks.

Figure 5.15 Rank-sum scores for effort were significantly different between the tasks. This indicates that the perceived effort for successfully completing the audio task was higher than for the visual one.

Figure 5.16 Frustration scores between the two tasks were very similar. This is a surprising result, and likely indicates general student struggles with UML diagramming.

CHAPTER 6 CONCLUSIONS

This work demonstrates a method for conveying UML class diagrams using audio. Although this work began with a strong focus on using spatial audio to convey the relative positions of elements in class diagrams, the results of multiple experiments led us to the realization that the precision with which humans perceive the positions of elements through spatial audio alone is insufficient for this task.
Relationships between elements in class diagrams can be better conveyed through the use of a well-defined presentation space, the nonet (a grid of nine cells), combined with multiple views of the data (both using the nonet for layout) and modulated audio of differing timbres. Spatial audio is still utilized in the detail-level view; however, its use there is only supplemental, as a fixed sequence of cells in the nonet is the primary mechanism used to convey relative position.

It is unlikely that dividing the presentation space into more cells would lead to better results, as our previous work showed that listener perception lacks the necessary precision. Cells in a nonet structure are localized and few enough in number that listeners can readily keep track of relationships. The ability to pan and zoom in on a cell compensates for the small number of cells. Additionally, the ability to map the nonet directly to a common numeric keypad provides an inexpensive and readily available interface.

Although the mental workload of this method is slightly higher than that of traditional visual class diagram presentation, the difference is relatively small. Undoubtedly, the addition of the remaining element types, such as interfaces and dependency and realization relationships, may increase the workload; however, with careful choice of psychoacoustic properties combined with filtering and zooming capabilities, any increase in workload can likely be controlled. The same is likely true for the effort workload.

Of interest is that participants reported that answering the questions using traditional visual class diagrams was just as frustrating as using our audio-based method. This was a surprising result, believed to be due to general student discomfort with class diagrams. Combined with the results of the UML struggle survey, we believe that this is largely due to inconsistent use of UML throughout the curriculum. It is highly likely that more consistent use could reduce this source of frustration.

In particular, this work shows promise in effectively conveying the class diagram types commonly used in educational settings. Diagrams in the educational setting typically comprise a relatively small number of elements and element types, and exist primarily to teach a software design pattern or concept. The results show that this work is suitable for that purpose.

6.0.1 Future Work

Future work may focus on the addition of the less-used element types and relationships, such as interfaces, dependencies, and realizations. Furthermore, an examination of alternate stimuli should be performed, as it is likely that the workload can be further reduced by increasing the disparity between stimuli. Additionally, a more robust solution would likely allow for customization of the stimuli so that users can adapt the mechanism to better suit their mental schemas.

Our mechanism focuses on the presentation of diagrams with elements positioned exactly as in the original graphs. It is possible that modifying the layouts of class diagrams while still preserving their semantic meaning might lead to better results, though this could complicate communication about diagrams between individuals and teams. Indeed, some work has shown that the quality of the layout of a class diagram can affect the understanding and comprehension of diagrams [62].
This factor may be particularly important when portraying diagrams to individuals with disabilities and thus merits further study. This approach has not been tested with large and dense diagrams, such as those that represent larger software engineering projects. Ultimately, it would be most beneficial to the community for this work to be useful for diagrams of any size and for diagrams beyond UML.

BIBLIOGRAPHY

[1] Brown v. Board of Education of Topeka, 347 U.S. 483.

[2] Web Content Accessibility Guidelines (WCAG) 2.1. https://www.w3.org/TR/WCAG21/.

[3] Pennsylvania Association for Retarded Children v. Commonwealth of Pa., 1972.

[4] The National Curriculum in England - GOV.UK, Dec 2014.

[5] FACT SHEET: President Obama announces Computer Science For All initiative, Jan 2016.

[6] Accessibility principles. https://www.w3.org/WAI/fundamentals/accessibility-principles/, May 2019.

[7] Students with disabilities. https://nces.ed.gov/programs/coe/indicator/cgg, May 2021.

[8] PYPL popularity of programming language. https://pypl.github.io/PYPL.html, Sep 2023.

[9] TIOBE index for October 2023. https://www.tiobe.com/tiobe-index/, October 2023.

[10] Iyad Abu Doush, Enrico Pontelli, Dominic Simon, Tran Cao Son, and Ou Ma. Making Microsoft Excel™: multimodal presentation of charts. In Proceedings of the 11th International ACM SIGACCESS Conference on Computers and Accessibility, pages 147–154, 2009.

[11] Iyad Abu Doush, Enrico Pontelli, Tran Cao Son, Dominic Simon, and Ou Ma. Multimodal presentation of two-dimensional charts: an investigation using Open Office XML and Microsoft Excel. ACM Transactions on Accessible Computing (TACCESS), 3(2):1–50, 2010.

[12] Jürgen Anke, Stefan Bente, V. Thurner, O. Radfelder, and K. Vosseberg. UML in der Hochschullehre: Eine kritische Reflexion [UML in higher education: a critical reflection]. In CEUR Workshop Proceedings, pages 8–20, 2019.

[13] Fred Attneave and Richard K. Olson. Pitch as a medium: A new approach to psychophysical scaling. The American Journal of Psychology, pages 147–166, 1971.

[14] Albert Bregman. Auditory Scene Analysis: The Perceptual Organization of Sound. MIT Press, Cambridge, Mass., 1990.

[15] Jeffrey P. Bigham, Ryan S. Kaminsky, Richard E. Ladner, Oscar M. Danielsson, and Gordon L. Hempton. WebInSight: Making web images accessible. In Proceedings of the 8th International ACM SIGACCESS Conference on Computers and Accessibility, Assets '06, pages 181–188, New York, NY, USA, 2006. Association for Computing Machinery.

[16] Grady Booch, James Rumbaugh, and Ivar Jacobson. The Unified Modeling Language User Guide. Addison Wesley Longman, Inc., 1999.

[17] Robert G. Brookshire. Teaching UML database modeling to visually impaired students. Issues in Information Systems, 7(1):98–101, 2006.

[18] Matt Calder, Robert F. Cohen, Jessica Lanzoni, Neal Landry, and Joelle Skaff. Teaching data structures to students who are blind. SIGCSE Bull., 39(3):87–90, Jun 2007.

[19] Sarah Carruthers, Amber Thomas, Liam Kaufman-Willis, and Aaron Wang. Growing an accessible and inclusive systems design course with PlantUML. In Proceedings of the 54th ACM Technical Symposium on Computer Science Education V. 1, pages 249–255, 2023.

[20] Stephen Cass. The top programming languages 2023, Aug 2023.

[21] E. Colin Cherry. Some experiments on the recognition of speech, with one and with two ears. The Journal of the Acoustical Society of America, 25(5):975–979, 1953.

[22] Stanislav Chren, Barbora Buhnova, Martin Macak, Lukas Daubner, and Bruno Rossi.
Mistakes in UML diagrams: analysis of student projects in a software engineering course. In 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering Education and Training (ICSE-SEET), pages 100–109. IEEE, 2019.

[23] Florian Daniel and Maristella Matera. Model-Driven Software Development, pages 71–93. Springer Berlin Heidelberg, Berlin, Heidelberg, 2014.

[24] Mohamed Debashi and Paul Vickers. Sonification of network traffic flow for monitoring and situational awareness. PLoS ONE, 13(4):e0195948, 2018.

[25] Alfonso Delgado-Bonal and Javier Martín-Torres. Human vision is determined based on information theory. Scientific Reports, 6(1):36038, 2016.

[26] Alexandra Diehl, Alfie Abdul-Rahman, Mennatallah El-Assady, Benjamin Bach, Daniel A. Keim, and Min Chen. VisGuides: A forum for discussing visualization guidelines. EuroVis (Short Papers), 6(7), 2018.

[27] Brad Doherty and Betty H.C. Cheng. UML modeling for visually-impaired persons. In HuFaMo@MoDELS, pages 4–10, 2015.

[28] Gaël Dubus and Roberto Bresin. A systematic review of mapping strategies for the sonification of physical quantities. PLoS ONE, 8(12):e82491, 2013.

[29] Mohamad Eid, Atif Alamri, Jamil Melhem, and Abdulmotaleb El Saddik. Evaluation of UML CASE tool with haptics. In Proceedings of the 2008 Ambi-Sys Workshop on Haptic User Interfaces in Ambient Media Systems, pages 1–5, 2008.

[30] Jamie Ferguson and Stephen A. Brewster. Evaluation of psychoacoustic sound parameters for sonification. In Proceedings of the 19th ACM International Conference on Multimodal Interaction, ICMI '17, pages 120–127, New York, NY, USA, 2017. Association for Computing Machinery.

[31] Jamie Ferguson and Stephen A. Brewster. Investigating perceptual congruence between data and display dimensions in sonification. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, pages 1–9, 2018.

[32] Jamie Ferguson and Stephen A. Brewster. Investigating Perceptual Congruence between Data and Display Dimensions in Sonification, pages 1–9. Association for Computing Machinery, New York, NY, USA, 2018.

[33] John M. Findlay and Iain D. Gilchrist. Active Vision: The Psychology of Looking and Seeing [electronic resource], Jan 2003.

[34] Erich Gamma. Design Patterns: Elements of Reusable Object-Oriented Software, 1995.

[35] Corentin Guezenoc and Renaud Seguier. A wide dataset of ear shapes and pinna-related transfer functions generated by random ear drawings. The Journal of the Acoustical Society of America, 147(6):4087–4096, 2020.

[36] Sharon E. Guttman, Lee A. Gilroy, and Randolph Blake. Hearing what the eyes see: auditory encoding of visual temporal sequences. Psychological Science, 16(3):228–235, March 2005.

[37] Sandra G. Hart. NASA-Task Load Index (NASA-TLX); 20 years later. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 50(9):904–908, 2006.

[38] Ed Júnior, Kleinner Farias, and Bruno Silva. A survey on the use of UML in the Brazilian industry. In Proceedings of the XXXV Brazilian Symposium on Software Engineering, SBES '21, pages 275–284, New York, NY, USA, 2021. Association for Computing Machinery.

[39] Alan C. Kay. The early history of Smalltalk. In History of Programming Languages—II, pages 511–598. Association for Computing Machinery, New York, NY, USA, 1996.

[40] Jonathan Lazar, Aaron Allen, Jason Kleinman, and Chris Malarkey. What frustrates screen reader users on the web: A study of 100 blind users.
International Journal of Human-Computer Interaction, 22(3):247–269, 2007.

[41] Claudia Loitsch and Gerhard Weber. Viable haptic UML for blind people. In Klaus Miesenberger, Arthur Karshmer, Petr Penaz, and Wolfgang Zagler, editors, Computers Helping People with Special Needs, pages 509–516, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg.

[42] Leandro Luque, Leônidas de Oliveira Brandão, Romero Tori, and Anarosa Alves Franco Brandão. On the inclusion of blind people in UML e-learning activities. Revista Brasileira de Informática na Educação, 23(2), 2015.

[43] Martin Mazanec and Ondrej Macek. On general-purpose textual modeling languages. In Dateso, volume 12, pages 1–12. Citeseer, 2012.

[44] Oussama Metatla, Nick Bryan-Kinns, and Tony Stockman. Auditory external representations: Exploring and evaluating the design and learnability of an auditory UML diagram. In Proc. of the International Conference on Auditory Display, Montréal, Canada, pages 411–418, 2007.

[45] David S. Meyer and Steven A. Boutcher. Signals and spillover: Brown v. Board of Education and other social movements. Perspectives on Politics, 5(1):81–93, 2007.

[46] Karin Müller. How to make Unified Modeling Language diagrams accessible for blind students. In Klaus Miesenberger, Arthur Karshmer, Petr Penaz, and Wolfgang Zagler, editors, Computers Helping People with Special Needs, pages 186–190, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg.

[47] Juan Carlos Muñoz-Carpio, Michael Cowling, and James Birt. Framework to enhance teaching and learning in system analysis and Unified Modelling Language. In 2018 IEEE International Conference on Teaching, Assessment, and Learning for Engineering (TALE), pages 91–98. IEEE, 2018.

[48] Charles B. Owen, Sarah Coburn, and Jordyn Castor. Teaching modern object-oriented programming to the blind: an instructor and student experience. In 2014 ASEE Annual Conference & Exposition, pages 24–1167, 2014.

[49] Andrew J. Oxenham. How we hear: The perception and neural coding of sound. Annual Review of Psychology, 69:27–50, 2018.

[50] Nicola Papazafiropulos, Luca Fanucci, Barbara Leporini, Susanna Pelagatti, and Roberto Roncella. Haptic models of arrays through 3D printing for computer science education. In Klaus Miesenberger, Christian Bühler, and Petr Penaz, editors, Computers Helping People with Special Needs, pages 491–498, Cham, 2016. Springer International Publishing.

[51] Vanessa Petrausch, Stephan Seifermann, and Karin Müller. Guidelines for accessible textual UML modeling notations. In Computers Helping People with Special Needs: 15th International Conference, ICCHP 2016, Linz, Austria, July 13-15, 2016, Proceedings, Part I 15, pages 67–74. Springer, 2016.

[52] Marian Petre. UML in practice. In 2013 35th International Conference on Software Engineering (ICSE), pages 722–731. IEEE, 2013.

[53] Marian Petre. "No shit" or "Oh, shit!": responses to observations on the use of UML in professional practice. Software & Systems Modeling, 13:1225–1235, 2014.

[54] Rachel Potvin. The motivation for a monolithic codebase, 2015.

[55] Rebecca Reuter, Theresa Stark, Yvonne Sedelmaier, Dieter Landes, Jürgen Mottok, and Christian Wolff. Insights in students' problems during UML modeling. In 2020 IEEE Global Engineering Education Conference (EDUCON), pages 592–600, 2020.

[56] Michael Schutz and Jessica Gillard. On the generalization of tones: A detailed exploration of non-speech auditory perception stimuli. Scientific Reports, 10(1):9520, 2020.
[57] Aleksander Sek and Brian C.J. Moore. Frequency discrimination as a function of frequency, measured in several ways. The Journal of the Acoustical Society of America, 97(4):2479–2486, 1995.

[58] Ather Sharif, Sanjana Shivani Chintalapati, Jacob O. Wobbrock, and Katharina Reinecke. Understanding screen-reader users' experiences with online data visualizations. In The 23rd International ACM SIGACCESS Conference on Computers and Accessibility, ASSETS '21, New York, NY, USA, 2021. Association for Computing Machinery.

[59] Ben Shneiderman. The eyes have it: A task by data type taxonomy for information visualizations. In The Craft of Information Visualization, pages 364–371. Elsevier, 2003.

[60] Keng Siau and Poi-Peng Loo. Identifying difficulties in learning UML. Information Systems Management, 23(3):43–51, 2006.

[61] Andreas M. Stefik, Christopher Hundhausen, and Derrick Smith. On the design of an educational infrastructure for the blind and visually impaired in computer science. In Proceedings of the 42nd ACM Technical Symposium on Computer Science Education, SIGCSE '11, pages 571–576, New York, NY, USA, 2011. Association for Computing Machinery.

[62] Harald Störrle. On the impact of layout quality to understanding UML diagrams. In 2011 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), pages 135–142. IEEE, 2011.

[63] Biological Sciences Curriculum Study, National Institutes of Health, et al. Information about hearing, communication, and understanding. In NIH Curriculum Supplement Series [Internet]. National Institutes of Health (US), 2007.

[64] Clara Suied, Trevor R. Agus, Simon J. Thorpe, Nima Mesgarani, and Daniel Pressnitzer. Auditory gist: recognition of very short sounds from timbre cues. The Journal of the Acoustical Society of America, 135(3):1380–1391, 2014.

[65] Michael J. Sullivan, Bruce Fairbairn, Charlie Shivers, LeAnna Parkey, W.K. Roberts, Sean Merril, and Matt Lea. Joint Strike Fighter: Restructuring added resources and reduced risk, but concurrency is still a major concern. Technical report, Government Accountability Office, Washington, DC, 2012.

[66] Danielle Albers Szafir, Rita Borgo, Min Chen, Darren J. Edwards, Brian Fisher, and Lace Padilla. Visualization Psychology. Springer, 2023.

[67] Anne M. Treisman and Garry Gelade. A feature-integration theory of attention. Cognitive Psychology, 12(1):97–136, 1980.

[68] Johan Wagemans, James H. Elder, Michael Kubovy, Stephen E. Palmer, Mary A. Peterson, Manish Singh, and Rüdiger von der Heydt. A century of Gestalt psychology in visual perception: I. Perceptual grouping and figure-ground organization. Psychological Bulletin, 138(6):1172, 2012.

[69] Bruce N. Walker. Consistency of magnitude estimations with conceptual data dimensions used for sonification. Applied Cognitive Psychology: The Official Journal of the Society for Applied Research in Memory and Cognition, 21(5):579–599, 2007.

[70] Bruce N. Walker and Gregory Kramer. Mappings and metaphors in auditory displays: An experimental assessment. ACM Trans. Appl. Percept., 2(4):407–412, Oct 2005.

[71] Hironori Washizaki, Masayoshi Akimoto, Atsushi Hasebe, Atsuto Kubo, and Yoshiaki Fukazawa. TCD: A text-based UML class diagram notation and its model converters. In Advances in Software Engineering: International Conference, ASEA 2010, Held as Part of the Future Generation Information Technology Conference, FGIT 2010, Jeju Island, Korea, December 13-15, 2010, Proceedings, pages 296–302. Springer, 2010.

[72] Marcus Watson and Penelope Sanderson.
Sonification supports eyes-free respiratory monitoring and task time-sharing. Human Factors, 46(3):497–517, 2004.

[73] Ira Woodring and Charles Owen. An empirical study of user perception of audio stimuli in relation to a cartesian space. In Proceedings of the 14th PErvasive Technologies Related to Assistive Environments Conference, pages 8–15, 2021.

[74] Ira Woodring and Charles Owen. Results of preliminary studies on the perception of the relationships between objects presented in a cartesian space. Technologies, 10(1), 2022.

[75] Ira Woodring, Charles Owen, and Samia Islam. A method for presenting UML class diagrams with audio for blind and visually impaired students. In Proceedings of the 17th International Conference on PErvasive Technologies Related to Assistive Environments, pages 15–20, 2024.

[76] Stelios Xinogalos. Object-oriented design and programming: an investigation of novices' conceptions on objects and classes. ACM Transactions on Computing Education (TOCE), 15(3):1–21, 2015.

[77] Jeong Yang, Young Lee, and Kai H. Chang. Initial evaluation of JaguarCode: A web-based object-oriented programming environment with static and dynamic visualization. In 2017 IEEE 30th Conference on Software Engineering Education and Training (CSEE&T), pages 152–161. IEEE, 2017.

[78] Tsubasa Yoshida, Kris M. Kitani, Hideki Koike, Serge Belongie, and Kevin Schlei. EdgeSonic: image feature sonification for the visually impaired. In Proceedings of the 2nd Augmented Human International Conference, pages 1–4, 2011.

[79] Haixia Zhao, Catherine Plaisant, Ben Shneiderman, and Jonathan Lazar. Data sonification for users with visual impairment: A case study with georeferenced data. ACM Trans. Comput.-Hum. Interact., 15(1), May 2008.

[80] Hong Zou and Jutta Treviranus. ChartMaster: A tool for interacting with stock market charts using a screen reader. In Proceedings of the 17th International ACM SIGACCESS Conference on Computers & Accessibility, pages 107–116, 2015.