u. nth ; anew“ ‘lliil a! .n.. it. \ .v I. .i. )\ .I 3 1.....‘2 . , , m. g .. a...» r»... . -2 . x film uwdka 5.2!. , . , , fixWH .. ‘1311 :1“. . ‘.\Irfl1 »V ‘x Wt! LI 1!..." . i In... I. sill.- .. fawn-unh- . 3. I!!! I “If: .5 if. 1.: .73.. zaannxo..nn...lvwflok Vania“.- ”4:111?! z... 5.: l i ‘ ‘5 do... 1 A l..?.,ll.\nt?§|saii.l¢'t .6»). .3... 2?”! uotihtblhaf I; in” . . 1.3.1:.) xiiitht . '1. \. 1;!!!Jii‘ti 021.33]! 6.1mm. . i. Errultliuz. 1. i333, 1233 3.11! it: 2:... . $.9013i‘lixl. .0. .II vi... 11'... I: It .Iv‘lltcnileil! I’ll}. Michigan State University Q («,7 This is to certify that the thesis entitled USING EVOLUTIONARY COMPUTATION TO AUTOMATICALLY REFACTOR SOFTWARE DESIGNS TO INCLUDE DESIGN PATTERNS presented by ADAM C. JENSEN has been accepted towards fulfillment of the requirements for the MASTER OF degree in COMPUTER SCIENCE SCIENCE $51M ( C/UM Méjor Profesé’or’scsynature 471/15“ / I D Date MSU is an Affirmative Action/Equal Opportunity Employer PLACE IN RETURN Box to remove this checkout from your record. TO AVOID FINES return on or before date due. MAY BE RECALLED with earlier due date if requested. DATE DUE DATE DUE DATE DUE 5/08 K:lProj/Acc&Pres/ClRCIDateDue.indd Using Evolutionary Computation to Automatically Refactor Software Designs to Include Design Patterns By Adam C. Jensen A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE Computer Science 2010 ABSTRACT Using Evolutionary Computation to Automatically Refactor Software Designs to Include Design Patterns By Adam C. Jensen Building a coherent object—oriented software design is a difficult, time-consuming task. Designing maintainable and extensible software that can be revised to satisfy new requirements is even more difficult. Recent approaches view software design as an optimization problem, thus enabling techniques such as evolutionary com- putation for automating design i1111.)rovemeut. These approaches focus on making small, behavior-preserving changes to software designs, such as moving an operation from one class to another and updating refermces to that method. However, these approaches do not leverage larger design strategies, such as design patterns. This thesis introduces an approach for automated improvement of software designs that focuses on the introduction of design patterns through refactoring. The proposed approach applies genetic programn’iing, an evolutionary computation technique, to identify suitable refactorings that improve the quality of a software design with re- spect to a set of software engineering metrics. By automatically introducing design patterns and using object-oriented metrics to guide the evolutionary process, our approach is able to generate a broader set of refactoring solutions that incorporate design patterns than traditional refactoring techniques. We apply this approach to a large set of published software models, as well as a model derived from a Web—based application. ACKNOWLEDGEMENTS To acknowledge everyone who contributed to the work in this thesis, I would need to mention everyone who interacted with me — directly or indirectly, deliberately or accidentally — in the past fifteen months. I thank each and every one of you. In any creative endeavor, the novelty and randomness of our surrounding environment plays a crucial role in motivating 11s, drawing us out of mental ruts, and providing healthy dose of distraction. I thank the universe for all of those things. More specifically, I could not have completed this work without the love and support of my family. They taught me that life is something that can be enjoyed. They also taught me that one can be both skeptical and optimistic, and that a good education is one of the most reliable means of realizing those ideas. Later, it was their encouragement that led me to pursue an advanced degree and helped me to work through times of doubt. I thank them for everything. I would like to thank my advisor, Dr. Betty Cheug, for her thoughtful comments, constructive criticism, ninja-like editing skills, and patience. I have a wonderful group of friends. Their indifference toward the day-to—day difficulties of my research served as a reminder that there is life outside of graduate school. I thank all of them for being supportive and fun to be with. Two of the local public radio stations, WCMU and WKAR, kept the stream of jazz and classical music flowing when I needed it most, and they provided a valuable window into the goings on across the world. Their contribution to my cultural education and general well-being is deeply appreciated. Finally, I want to thank my wife, Jenny. She has helped me every step of the way: applying to graduate school, finding a research area, choosing to write a thesis, and staying on task until the work was finished. However, these things just scratch the surface of her contribution. She brings untold warmth and joy to my life, and for this I am truly blessed. iii TABLE OF CONTENTS List of Tables .................................. v1 List of Figures .................................. v11 1 Introduction ................................. 1 1.1 Thesis Statement ............................. 4 1.2 Thesis C(‘mtributions ........................... 4 1.3 Organization of Thesis .......................... 5 Background .................................. 6 2.1 Unified Mmlr-zliug Language (‘UL\.II..) ................... 6 2.2 Refactoring ................................ 7 2.3 Design Patterns .............................. 8 2.3.1 Definition and Motivation .................... 8 2.3.2 Terminology ............................ 9 2.3.3 Example .............................. 10 2.3.4 Pattern Descriptions ....................... 13 2.3.5 Design Pattern Challr-‘uges .................... 21 2.4 Evolutionary Computatirm ........................ 22 2.5 hfletrics ................................... 24 2.6 Summary ................................. 28 Related Work ................................ 29 3.1 Refactoring ................................ 29 3.2 Search—Based Refactoring ......................... 33 3.3 Design Pattern Detection ......................... 35 3.4 Automated Design Pattern Instantiation ................ 37 3.5 Summary ................................. 4O Approach ................................... 42 4.1 Overview .................................. 42 4.2 Individual RepresentatioiI ........................ 43 4.2.1 Design Graph ........................... 45 4.2.2 Transformation Tree ....................... 45 4.3 Design Change Mechanisms ....................... 46 4.3.1 Helper Functicms ......................... 47 4.3.2 Transformation Nodes ...................... 48 4.3.3 Information Nodes ........................ 57 4.3.4 Lazy Node Evaluation ...................... 58 4.4 Evaluating Refactored Designs ...................... 59 4.4.1 Design Quality .......................... 59 4.4.2 Design Pattern Detection ..................... 60 iv 4.4.3 Fitness Ifunct ion ......................... 61 4.5 Exanmle .................................. 63 4.6 Summary ................................. 65 5 Validation ................................... 66 5.1 Prototype Implcmentation ........................ 66 5.1.1 Importing UML Class Diagrams ................. 67 5.1.2 Representing the 'lf‘arget Software Design as a Graph ..... 67 5.1.3 h’l'iitaticm ()1iie1‘a.tors for Manipulating the Graph ........ 68 5.1.4 Implementing the QMOOD Metrics Suite ............ 72 5.1.5 Detecting Design Patterns with PROLOG and SQL ...... 72 5.1.6 Providing a Domain-Specific \foc-afiular‘y ............ 73 5.2 Run—time Envinmmcnt and Parameters ................. 74 5.3 Hypotheses ................................ 75 5.’ Experiments ................................ 75 5.4.1 Experiment I: No Rewards or Penalties (Baseline) ....... 76 5.4.2 Experiment II: Penalizing Large GP Trees ........... 78 5.4.3 Experiment III: Rmvarding Ext-'olutiou of Design Pattern In— stances ............................... 83 5.4.4 Experiment IV: Rewm'ding Specific Serpiences of he’linitrans- form atim is ............................. 89 5.4.5 Experiment V: REMODD Case Study ............. 94 5.5 Discussion ................................. 96 5.6 Sunnuary ................................. 97 6 Conclusions and Future Work ...................... 98 6.1 Sunnnary ................................. 98 6.2 Iriuiiac'2t ................................... 100 6.3 Future \"Vork ................................ 100 A Queries for Design Pattern Detection ................. 103 Al PROLOG Queries ............................. 103 ./\.2 SQL Queries ............................... 106 B Transcript of the ReIVIoDD GP Run .................. 111 C XMI Representation of REMODD Design ............... 119 REFERENCES ................................. 240 2.1 2.2 0'1 v C1 03 [\D i— 01 4;. CA 1 Qt i—A I—* l--‘ \1 CD 01 01 i-‘ (7. A.1 13.1 LIST OF TABLES QMOOD i\I-Ietric Descriptions QMOOD Quality Characteristics and Formulas Experiment. I: Evolved Design Pattern Instz‘mces Expcrinn‘nt II. First Treatment: Evolved Design Pattern Instances . Experiment II. Second 'I‘rcatnwntz Evolved Design Pattern Instances Experiment II, Third 'I‘reatment: Evolved Design Pattern Instances Experiment. II: Nnml'icr of 'I‘ransfi)rmation Nodes Experinu‘int III. First 'I‘rrn'ritinent: Evolved Design Pattern Instances Experinjient III. Second Treatment: Evolved Design Pattern Instances Experiment III. Third Treatment: Evolved Design Pattern Instances . Experiment III. Fourth Treat ment: Evolved Design Pattern Instances Experiment III. Fifth Treatment: Evolved Design Pattern Instances Experiment III: Models with Significant. Increase in DP Instanures Experiment IV, First 'I’reatim‘rntc Evolved Design Pattern Instances Experiment IV , Secoml ’I‘rcatment: Evolved Design Patti-érn Instances Experiment IV _. Third Treatmcnt: Evolved Desigi‘i Pattern Instances . Experiment IV. Fourth Treatment: Evolved Design Pattern Instances Experiment IV. Fifth Treatment: Evolved Design Pattern Instances Experiment IV: i\-*Io(_l(_'.ls with Significant Increase in DP Instances Experiment V: Evolvml Design Pattern IllSlti.Il(':(-:S SQL Queries for Design Pattern Detection Role—plaving Classes in Sample Abstract Factory Instance vi 26 27 77 79 80 81 82 84 86 87 87 88 90 91 92 93 93 94 96 110 117 LIST OF FIGURES 2.1 Sample UML Class Diagram ....................... 7 2.2 Class Diagram for AutoMaker Example ................. 11 2.3 AutoMaker Class Diagram with Abstract. Factorv ........... 12 2.4 ABSTRACT FACTORY (illass Diagrznn .................. 14 2.5 ADAPTER Class Diagram ......................... 15 2.6 BRIDGE Class Diagram .......................... 17 2.7 DECORA’I‘OR Class Diagram ....................... 18 2.8 PROTOTYPE Class Diagram ....................... 19 2.9 PROXY Class l.)iagram .......................... 20 2.10 Prmzess Diagram for Gerwtic Programming ............... 23 2.11 Example of Subtree Crossover ...................... 24 4.1 Data Flow Diagram of the Pi_'(.)p(.)sed Ap1_>i'oacli ............. 43 4.2 GP Compmwnts and Relationships ................... 44 4.3 Abstract Access Minitransfi)rniation ................. 49 4.4 Abstraction Minitransfornration .................... 50 4.5 Delegation Minitransformatit)n ..................... 52 4.6 EncapsulateConstruction Minitransfornration ............ 53 4.7 PartialAbstraction Minitransformation ............... 55 4.8 Wrapper Mi11itransformatim1 ....................... 56 4.9 PROLOG code for ABSTRACT FACTORY design 1')attern ........ 61 4.10 Example Il"ransformation 'ili‘ree. Execution ................ 64 B.1 Sample Abstract Factory Design Pattern Instance .......... 117 13.2 Transformation Tree for Sample Abstract Factory Instance . . . . . 118 vii Chapter 1 Introduction Building a coherent object-oriented design for a piece of software is a difficult, time- consuming task. Designing that software to be easily maintained and extended in order to satisfy new requirements is even more difficult, as it forces software en- gineers to consider not only the details of the solution space (e.g., how to build on-board software for automobiles) but also details of the problem space (e.g., how to manage a large and ever-changing number of AutoPart subclasses and their ob- ject instances). These difficulties led to the creation of two important concepts in software engineering. First, metrics enable a software engineer to evaluate various characteristics of design quality, such as flexibility or readability, in a mathematically rigorous fashion. Second, design patterns provide a context-driven solution template for solving design problems that occur frequently in large software projects. How- ever, metrics and design patterns often require significant intellectual investment to understand and implement, thus limiting their adoption. This thesis addresses these concerns by introducing an approach, based on evolutionary computation, that au— tomates both the use of software engineering metrics and the introduction of design patterns in order to automatically generate sets of refactoring changes to improve an existing software design. The approach is modular, thus enabling the reconfig— uration or substitution of metrics and design patterns as appropriate for different domains, design constraints, and performance needs. In addition, the output from the approach includes the specific set of refactoring steps to apply in order to im- prove the quality of a design as measured by metrics. This information facilitates both manual and automatic application of the design changes, and our approach defers this choice to the software engineer. Prior approaches to automated design improvement have focused primarily 011 automated refactoring, i.e. making small, behavior-preserving changes to software designs or source code, and have treated the problem as a pattern-matching ac- tivity. O Cinnéide and Nixon [28], as well as Sunye et al. [43], introduced sets of coarse-grained refactorings that, under “green-field conditions”, can be applied auto- matically n order to improve design quality. However, the rule-based nature of these approaches limits their general applicability. In contrast, evolutionary computation (EC) harnesses the exploration and exploitation capabilities of Darwinian evolution to find the best solutions to an optimization problem [10], including many software engineering problems [8]. O’Keeffe and O Cinnéide [30,31] introduced the idea that an EC—based approach could recommend a set of fine-grained refactorings to apply in a given situation. They demonstrated that this approach is generally applicable and more flexible than prior refactoring work, suggesting that automated refactoring techniques are realizable. Seng et al. [40] proposed a similar approach that used a genetic algorithm to optimize the class hierarchy of an object-oriented program by moving methods from one class to another and evaluating the modified designs using metrics from Briand’s catalogs [5]. While these approaches have demonstrated that EC-based techniques for refactoring are viable, they use simple refactoring steps as design mutation operators that result in small, incremental design changes. This limitation makes it difficult to compose multiple refactoring steps, a process that is necessary to facilitate complex design changes, such as the introduction of design patterns. In contrast, this thesis introduces an approach to make the introduction of de— sign patterns a first-class part of the refactoring process. This shift in focus carries three key benefits. First, we leverage the widely accepted notion that application of design patterns improves the reusability and maintainability of non—trivial soft- ware designs [38]. Second, evolutionary computation enables us to make use of transformations to generate more innovative refactoring solutions that incorporate design patterns. Third, the approach illustrates the use of a modular framework for search-based refactoring whose components (i.e., metrics, design patterns, and refac— toring steps) can easily be substituted as appropriate for a given software project. By automatically constructing solutions that are highly rated by software engineer- ing metrics and also contain design patterns, we reduce the manual, trial-and—error refactoring labor for software developers and also provide automatic design advice that is tailored to the software design in question. All things being equal, those refactoring solutions requiring the fewest transformations are favored. Furthermore, our approach generates the sequence of steps by which the input design can be trans- formed into an improved design, thus supporting the integration of this approach into automated design refactoring tools. We use genetic programming (GP) [20], an evolutionary computing technique, to identify the most suitable set of refactorings to improve software design. As with any evolutionary approach, GP requires three components: a solution representation, a mechanism for making changes to that solution, and finally a means of measuring the solution’s quality. Our solution representation takes the form of an abstract syntax tree whose nodes consist of so—called minitransformations introduced by O Cinnéide [27], as well as a graph structure that represents the software design that is to be improved. The minitransformations, when composed, can create instances of design patterns. The minitransformations make specific changes to the software design and are defined as functions that accept inputs and return outputs. Thus, they can be expressed naturally as nodes in an abstract syntax tree. In order to evaluate the quality of solutions that are produced by the GP, we leverage the QMOOD metric suite that combines a rich set of metrics for analyzing the complexity of object— oriented hierarchies, cohesion of classes, and so on. QMOOD was introduced by Bansiya and Davis [2] and has been used in prior search-based refactoring work [31]. We illustrate the efficacy of this approach in a series of experiments by applying it to a large set of published models [14]. First, we demonstrate the approach without the use of specific rewards or penalties, thus allowing metrics alone to guide evolution. Second, we examine the benefits of applying a penalty in proportion to the number of refactoring steps suggested in order to limit the number of such steps that must be performed by hand. Third, we explore the benefits of explicitly rewarding individuals that evolve at least one design pattern instance. Fourth, We explore the benefits of rewarding the use of sequences of minitransformations that are known to construct design pattern instances. Finally, we apply the approach to a real-world case study [11]. 1. 1 Thesis Statement It is possible to use techniques from evolutionary computation, with input and guid- ance from software engineering metrics, in order to improve the design of existing software through the introduction of design pattern instances. 1.2 Thesis Contributions This thesis makes three contributions. First, we developed a novel genetic program- ming encoding that facilitates the automated refactoring of UML class diagrams. This provides a valuable proof of concept step that further supports the use of evo— lutionary computing to automate common software engineering tasks. Second, we developed a process, guided by software engineering metrics, for introducing instan- tiated design patterns into existing software designs. Finally, the results from our approach provide traceability in the form of a step-by-step process for realizing the suggested refactoring sequence. 1.3 Organization of Thesis The remainder of the thesis is organized as follows: Chapter 2 describes the back- ground concepts of this work, including evolutionary computation, software metrics, and design patterns. Chapter 3 gives an overview of related work. Chapter 4 de— scribes our approach for automated design improvement through the introduction of design patterns. Chapter 5 presents five experiments that validate our approach and document our exploration of the available parameters in our approach, thus pro- viding a basis for choosing appropriate parameter values. The chapter also includes a case study that applies the proposed approach to a real-world web application. Finally, our conclusions and future work are presented in Chapter 6. Chapter 2 Background This chapter presents background concepts related to the approach that is proposed in this thesis. We begin with an overview of the Unified Modeling Language (UML), followed by a review of approaches to refactoring software designs. Next, we present the basic concepts of evolutionary computation. Finally, we discuss the measurement of quality in object-oriented software design using metrics. 2.1 Unified Modeling Language (UML) The Unified Modeling Language (UML) is a standardized, general-purpose modeling language that is designed to be used in the software engineering field. It defines a set of graphical notations that can be used to create visual models of software systems. In this thesis, we focus on UML class diagrams, which are box-and—line diagrams that depict the object-oriented structure of a software system, including the sys— tem’s classes, their attributes and operations, and the associations between classes. A sample class diagram is depicted in Figure 2.1. In the sample diagram, there are five classes represented: Drawing, Shape, Circle, Rectangle, and Triangle. The latter three classes inherit the functionality (i.e., attributes and operations) of the Shape class. The inheritance relationship is indicated by the arrows with un— shaded, triangular tips. The Drawing class aggregates many Shape objects, and this relationship is depicted by the arrow that has a diamond shape at its tail adjacent to the aggregating class. This aggregation relationship also specifies multiplicity constraints. The numeral 1 next to the Drawing class indicates that there is one Drawing instance in the relationship, and the asterisk (*) next to Shape indicates that there can be zero or more Shape instances in the aggregation. Drawing area : float pe width : floa : float circumference : float base : float Figure 2.1: Sample UML Class Diagram 2.2 Refactoring As software ages, its design will need to be refined as defects are discovered and requirements change. To address these changes, software developers make small, behavior-preserving changes called refactorings. These changes typically have no effect on a program’s external behavior (i.e., its output) but instead focus on im- proving the program’s design by making it more intuitive, more easily extended, or more maintainable [34]. Refactoring has been a target for automation, as evidenced by the large number of common refactorings that are included in popular software development tools such 7This page is dedicated to Jenny, for all of her love and support. as Eclipse [9]. For example, Eclipse has a “rename class” refactoring that renames a class and updates all references to that class to be consistent with the new class name. This tool support reduces the number of tedious but important changes that software developers must perform manually. While industrial tools have focused on refactoring software at the source code level, techniques have been proposed that facilitate refactoring at the design, or model, level. Mens [23] proposed to model a software design as a graph and defines a set of refactorings in terms of graph trans- formations. Similarly, Markovic’ [22] proposed a technique for refactoring Unified Modeling Language (UML) class diagrams annotated with constraints that specify pre— and post-conditions for each class. Focusing on software design in this way enables us to see patterns in its structure, and these patterns may reveal oppor- tunities for refactoring steps that improve the software’s modularity and long-term maintainability. 2.3 Design Patterns In this section, we provide an overview of design patterns, beginning with a definition and motivating context. Next, we present an example illustrating the use of the Abstract Factory design pattern in a simple software system. We conclude with a textual description and graphical illustration of each design pattern that is supported in the proposed approach. 2.3.1 Definition and Motivation In the field of software engineering, a design pattern is a general, reusable solution to a reoccuring problem in software design. It describes the core of a solution to the problem, deferring the specific implementation details to the developer [12]. This generic nature makes design patterns easier to understand and communicate than implementation- or domain-specific solutions. Much like a document template, a design pattern gives developers a validated, robust starting point from which they can build a solution that suits their application and goals. It is generally accepted by software engineers that software built using design patterns is more maintainable and extensible than software that is developed in an ad—hoe fashion without such established building blocks [38]. Since well—known design patterns are derived from developers’ experiences over many years, it follows that design patterns are the manifestation of both long—term testing and collective experience. Thus, there is a clear advantage in using design patterns when they are relevant to a problem that a developer is facing. More specifically, design patterns are intended to make existing software more flexible. In this thesis, we define software flexibility to be “the ease with which software can be modified or extended to include new functionality”. Clearly, software flexibility directly affects the costs of maintenance and the feasibility of code reuse. Software that is not designed with flexibility and extension in mind will likely require greater effort and training to maintain during its lifetime than software that has been carefully designed to be amenable to future changes. As a result, much effort has been spent both in academia and in the software industry to develop tools and techniques that encourage flexible design concepts. Design patterns are one such tool, and their use has been widely cited as a good software engineering practice. 2.3.2 Terminology In this thesis, we define the new term design pattern template to mean the generic description of a design pattern as given by Gamma et al. [12]. Each design pattern template contains a short name for the design pattern, a textual description of its purpose and the context in which it should be used, as well as an illustrative example. A design pattern template contains enough information that a software engineer will, after reading it, understand the problem that the design pattern intends to solve and how to implement it. In ob ject—oriented software design, the process of creating an object instance from a given class is known as instantiation. Similarly, in this thesis we will refer to the process of creating a specific design pattern instance from a design pattern template as design pattern instantiation. Just as an object instance is a more concrete version of a class, a design pattern instance is a more concrete, application—specific version of a design pattern template. 2.3.3 Example Design patterns can be difficult to present or understand from a simple text descrip- tion. In this section, we present an example that illustrates the application of the Abstract Factory design pattern to an existing design. First, we introduce a software design called AutoMaker that can benefit from the introduction of a design pattern. AutoMaker represents a software system for an automobile manufacturer. A partial UML class diagram for AutoMaker is shown in Figure 2.2. In AutoMaker, there are three classes that represent the style of auto- mobiles to be manufactured. These include Convertible, Coupe, and Sedan. Each of these classes has a corresponding factory class, namely, ConvertibleFactory, CoupeFactory, SedanFactory. These factories construct instances of the automo— biles, and the instances are then shipped by a freight company (represented by the FreightCompany class) to the customer. While this seems to be a reasonable software design, it has a flaw that becomes apparent when the automobile manufacturer adds another vehicle to its product line. In order to introduce a compact vehicle style, we must introduce a Compact class. Next, we must introduce a CompactFactory class in order to make instances of Compact. Finally, the FreightCompany class must be modified in order to ship 10 visits visits [ FreightCompany IL; ships ships ships [Convertible ] Sedan ] Coupe ] visits makes makes makes , r V ConvertibleFactory SedanFactory CoupeFactory capacity : int capacity : int capacity : int active : bool active : bool active : bool makeAuto() :Automobile makeAuto() :Automobile makeAutoO :Automobile Figure 2.2: Class Diagram for AutoMaker Example instances of Compact. In an ideal design, the FreightCompany does not need to know when the automobile manufacturer has introduced a new product. That is, it is designed according to a common interface that is shared by all automobiles. Similarly, we observe that the factory classes duplicate much of their functionality when new automobile styles are introduced. This duplication is another scalability problem that can be resolved using object—oriented techniques. We now consider how to modify the software design to address the two flaws mentioned above. First, to enable the freight company to take a more abstract view of the automobiles that they are shipping, we introduce an abstract class named Automobile. We then modify the Convertible, Coupe, and Sedan classes so that they inherit Automobile. N ow that we have a common interface for all automobiles, we modify the FreightCompany class so that it ships instances of the abstract class Automobile. These changes are illustrated in the top half of Figure 2.3. Next, we consider the duplication of functionality among the factory classes. Since the three factory classes in the original class diagram have common attributes (i.e., capacity and active) and a common operation makeAuto, it is clear that a sec- ond abstract class is needed. Therefore, we introduce an abstract Factory class that 11 comprises the same attributes and operation as the original factory classes. Next, we modify the factory classes to inherit Factory, thus eliminating the duplication. The resulting class diagram is given in Figure 2.3. I‘Frelghtcfonipahv l——— [Convertible ships I Sedan l :> ‘ Automobile." visits A I Coupe manufactures Factory capacity : int . active : bool __‘— makeAuto() :Automobile ‘ [ConvertibleFactory] [ SedanFactory j | CoupeFactory ] Figure 2.3: AutoMaker Class Diagram with Abstract Factory In the course of this example, we have constructed an instance of the Abstract Factory design pattern. The purpose of Abstract Factory is to create a layer of abstraction between the product classes and the factory classes that create instances of those products. This layer ensures that the product classes, and especially the client classes, can vary independently of the factory classes, thus reducing the acci- dental complexity [6] of the software design as it changes over time in response to new requirements. The pattern has three roles: Abstract Factory, Abstract Prod- uct, and Client. In this example, those roles are fulfilled by Factory, Automobile, and F reightCompany, respectively. The classes filling these roles are shaded in Fig- ure 2.3. 12 2.3.4 Pattern Descriptions We now present detailed descriptions for each of the design patterns that are used in this thesis. These descriptions are derived from Design Patterns: Elements of Reusable Object-Oriented Software [12], in which Gamma et al. define three cate- gories for their design patterns: behavioral patterns that focus on common commu- nication patterns between objects and class, creational patterns that solve common problems related to creating objects, and structural patterns that identify reusable ways to realize relationships between entities (e.g., objects and classes). There is a great deal of diversity among the Gamma design patterns in terms of their level of detail. Due to the nature of our approach, we restrict the usable set of patterns to a subset of the creational and structural pattern categories. These patterns are identifiable based on their footprint in UML class diagrams, thus enabling us to detect their instances when they happen to evolve in a GP. Abstract Factory A factory is a class that is responsible for creating instances of other classes. In a drawing program, for example, the PixelFactory class could create instances of the Pixel class that are ultimately rendered to the screen. The Abstract Factory pattern provides a way to encapsulate a group of individual factories that have common functionality. A graphical depiction of Abstract Factory is shown in Figure 2.4. The pattern calls for the creation of an abstract Factory class that defines the interface for all other factories. Classes that need to use a factory, then, can be written to the interface. At runtime, these classes can utilize concrete factory classes without any knowledge of their specific implementation. 13 Client AbstractFactow Creafisroducto :Abstractfioduct | Factory 1 produces * AbstractProduct Froduct |CreateProduct() : Product Create() : Product Figure 2.4: ABSTRACT FACTORY Class Diagram 14 Adapter Two or more classes that were designed independently can be difficult to combine. The classes might make incompatible assumptions about their inputs or outputs, and modifying the classes to be compatible with one another might be infeasible. The Adapter design pattern translates, or adapts, the interface for a class into an interface that is compatible with another class. A graphical depiction is shown in Figure 2.5. jTanet Client > Request() _ Adapter Ad:a_ptee Request() SpecificRequestO Figure 2.5: ADAPTER Class Diagram 15 Bridge In many software systems, multiple implementations exist for a given abstraction. Ob ject—oriented design supports this scenario with inheritance, which enables a class to take on the attributes and operations of a parent (inherited) class. However, situations arise where this tight binding between a concrete implementation and its abstract inherited class is too restrictive. One solution to this problem is the Bridge design pattern. This pattern is meant to decouple an abstraction from its implementation so that the two can vary independently [12]. Bridge is depicted graphically in Figure 2.6. To motivate the use of the Bridge design pattern, we consider the Windowing toolkit example presented by Gamma et al. [12]. A Window abstraction in a user interface toolkit may need to be deployed on multiple platforms. Specifically, it needs to be able to display Window objects on the X Window System on Linux [47] as well as the Aqua windowing system on Apple Mac OS X (TM). A first approximation of a solution to this problem is to use inheritance; that is, we should create a concrete subclass of Window for each of the windowing systems. However, this solution has two major drawbacks. First, if we wish to extend the functionality of Window to include support for other types of windows, then we must re—implement each of the concrete subclasses. Second, the solution requires that a client class (i.e., a class that creates a Window instance) commit itself to a specific Window implementation. To address these problems, the Bridge design pattern separates the Window abstraction and classes that implement it into separate class hierarchies. This solution enables us to extend the Window abstraction to include new window types while reusing the existing concrete implementation classes. In addition, the clean separation of the class hierarchy enables client classes to view the hierarchy more abstractly, and therefore resolves the drawback of clients committing to a specific concrete implementation. 16 Abstraction L > Implementor | Operation() lfi Operationlmp() | . I . Concretelmplementor RefinedAbstractIon Operationlmp() Figure 2.6: BRIDGE Class Diagram 17 Decorator The Decorator pattern is a design pattern that allows new or additional behavior to be added to an existing object dynamically. Decorator is depicted graphically in Figure 2.7. For example, a Windowing toolkit might wish to add borders to a specific object instance of the Window class. Decorator accomplishes this change in functionality by wrapping, or enclosing, an object with another (decorator) object. The enclosing object conforms to the interface of the enclosed object, thus permitting it to interact transparently with clients of the enclosed object. | Component IA | Decorator | | Operation() |\1 | Operation() | | ConcreteComponent | | ConcreteDecorator | | Operation() | [Operation() | Figure 2.7: DECORATOR Class Diagram 18 Prototype The Prototype design pattern is used when many objects need to be created by cloning a prototypical object instance. Prototype is depicted graphically in Fig- ure 2.8. This design pattern avoids the need for subclasses, thus making it different from the Abstract Factory pattern. It also offers a solution for scenarios in which it is prohibitively expensive to create new object instances using, for example, the new keyword. [ Client | >| Proto pe ’ |Operation() | | Clone() A | ConcretePrototype | |Clone() : Prototype | Figure 2.8: PROTOTYPE Class Diagram 19 Proxy The Proxy design pattern comprises a class functioning as an interface to something else. Proxy is depicted graphically in Figure 2.9. This proxy class can be an interface to any system element: a network connection, a large object in memory, a file, or some other resource that is expensive or impossible to duplicate. For example, loading a large file into memory is an expensive operation. Using the Proxy design pattern, one can load the file into memory once and [then provide a light-weight proxy object that has the same interface as the file to each class that wishes to use it. Once all of the proxy objects are destroyed (e.g., from passing out of scope), the large file can be closed and purged from memory. | Subject | | Request() | A | RealSubject l< [ Prox | Request() | | Request() Figure 2.9: PROXY Class Diagram 20 2.3.5 Design Pattern Challenges In this thesis, we leverage the widely accepted notion that application of design patterns improves the rcusability and n'iaintainability of non-trivial software designs [38]. However, despite widespread adoption and general recognition that their use encourages more reusable and maintainable software design, design patterns have received criticism. We discuss several such criticisms in this section. Documenting design pattern use is recognized as a difficult problem [46]. Apply- ing a design pattern that involves multiple classes causes changes to each of those classes, which makes their documentation difficult to maintain in a single place. A developer can place a note in the software’s high-level design comments, but includ- ing design pattern-related comments in code is also important for code readability. When multiple design patterns are applied to a single software system, the docu- mentation problem is sometimes called the design pattern composition problem or overlapping problem [17]. In response to the design pattern documentation problem, He et al. [17] developed a UML-based method of analyzing design pattern composition and tracing the effects of that composition. Their work enables developers to view a software model as a composition of design patterns and enable that model to be viewed at different levels of abstraction. Schulz et al. [39] identifies two major problems with design patterns: first, there is no formal classification for design patterns and their mutual relationships. Second, there has been limited effort toward a systematic way to integrate design patterns into existing systems [51]. \ch address the second problem in this thesis. 21 2.4 Evolutionary Computation Evolutionary computation (EC) is a branch of machine learning that harnesses the concepts of natural (Darwinian) evolution in order to evolve computer programs that solve optimization problems [10]. Figure 2.10 illustrates the problem-solving flow of an EC technique. EC techniques comprise a population of initially-random individuals. Each individual encodes a solution representation, usually in the form of a bit-string or an abstract syntax tree. These solutions are evaluated according to a fitness function, a mathematical formula that measures an individual’s ability to solve a given problem. The individuals are then selected for placement into a subsequent population (the next generation) based on their fitness. Selection is typically performed using an algorithm known as tournament selection. This algorithm draws a sample of n individuals from the population, compares their fitnesses, and selects the individual with the highest fitness. The value n is called the tournament size and is typically less than 10. After being selected, these individuals undergo a random mutation and, through a process known as crossover, portions of their solution representations are recombined with solutions from other individuals in order to explore a greater portion of the solution space. This process of fitness measurement, selection, and mutation is continued for a fixed number of generations or until the optimal solution to the problem being solved is discovered. One EC technique, known as genetic programming (GP), uses an abstract syntax tree representation for the individuals in, its population [20]. An abstract syntax tree (AST) can represent, for example, an algebraic equation. In this situation, the nodes of the AST represent arithmetic operations, at least one algebraic variable, and numerical values. A population of these individuals can be given the task of finding an algebraic function that most closely fits a finite set of two-dimensional data points. This is known as the Regression Problem. In the context of this problem, the fitness of an individual can be defined as the sum of the function’s error (i.e., 22 f l V Initialize Population [ Terminate H Evaluation Hew Generatifl Sample Best Mutation & Individuals Crossover Figure 2.10: Process Diagram for Genetic Programming the difference between the function value and the value of the corresponding data point) over the set of points. Intuitively, an individual is highly fit if its algebraic function very closely matches the set of points. In this section, we assume that the function f (:c) = 51:2 + :r is the optimal fit for the data points that have been given to the GP as input. Beginning with a population of random individuals, each representing an alge- braic function, the GP uses tournament selection to select a sample of individuals that are highly fit relative to others in the population. After being selected, these individuals will be randomly mutated. This mutation could involve, for example, changing a node that represents a numerical (e.g., 7) into a node that represents an arithmetic operator or an algebraic variable. If the new node is an arithmetic operator that requires child nodes in order to be meaningful, then these nodes are randomly grown. After being mutated, individuals also undergo subtree crossover. This process is illustrated in Figure 2.11. Subtree crossover swaps subtrees between the ASTs of two parent individuals, thus creating two child individuals that reuse portions of 23 their parents’ ASTs but are also structurally different from them. In Figure 2.11, the subtrees to be swapped are shaded in the AST of parent nodes, and the arrows identify the new location of those subtrees in the child nodes. At this point, both child individuals are inserted into a new population that will become the next gen- eration. We note that Child 1 in Figure 2.11 represents the function f (x) = 2:2 +33, and thus it is the optimal solution. When the GP evaluates the next generation, this individual will be marked as optimal, and the GP will terminate. Parent 1 Parent 2 n I! Child 1 Child 2 Optimal Solution Figure 2.11: Example of Subtree Crossover The combination of mutation and subtree crossover that we described in this section has been shown to explore efficiently, over the course of several generations, a large space of candidate solutions. In this paper, we use genetic programming as the basis for our approach to automated improvement of software design. 2.5 Metrics As the word suggests, a metric is a mechanism for measuring a specific attribute of an element, with a specific focus on how that attribute changes over time. Many 24 software engineering metrics have been proposed in the literature [2,5, 7, 18, 36]. These metrics measure aspects of software design from the number of classes, the average size of an interface, and so on. They provide both an instantaneous snapshot of these characteristics and, when applied over time, a profile of how a software design has changed through its lifetime. The Hierarchical Metrics for Object-Oriented Design Quality (QMOOD) suite introduced by Bansiya and Davis [2] comprises 11 individual metrics, each of which evaluates a distinct aspect of object-oriented design quality. Notably, these metrics are designed to be amenable to automated evaluation, thus making them ideal for use in software engineering tools. From these metrics, Bansiya and Davis derived a set of formulas that measure abstract quality characteristics such as extensibility, readability, and maintainability. These high-level quality characteristics can inform software developers of the ways in which their software is well-designed and, con- versely, where it can be improved. Additionally, the QMOOD authors recommend that a real-valued weight should be assigned to each characteristic in order to spec- ify its relative importance, thus providing automated feedback regarding how well a software design meets that goal. A complete listing of the metrics in our QMOOD implementation is given in Table 2.1, adapted from O’Keeffe and O Cinnéide [32]. We implemented each of these metrics as a graph-based algorithm that analyzes specific characteristics of a target software design, such as the number of disjoint inheritance hierarchies. Each metric returns a floating—point number, and those numbers are used as input for formulas that evaluate complex quality characteristics. These characteristics, and the formulas that are used to compute them, are shown in Table 2.2. 25 Metric Name Description Design size in classes (DSIC) The number of classes in the software design. Number of hierarchies (NOH) The number of class hierarchies in the software design. Average number of ances- tors (N GA) The average number of other classes that a class inherits. Number of polymorphic methods (NOPM) The number of methods in the software design that exhibit polymorphic behavior. Class interface size (018) The average number of public methods in a class. Direct class coupling (DCC) A count of the number of classes that a given class is directly related to by attribute declaration or method return type. Cohesion among methods of class (CAM) The relatedness among methods of a class, com— puted using the summation of the intersection of parameters of a method with the maximum inde- pendent set of all parameter types in the class. Number of methods (NOM) The average number of methods in a class. Data access metric (DAM) The ratio of non-public (i.e., private or protected) attributes to the total number of attributes de- clared in a class. This is interpreted as the average of the ratios for all classes in the software design. Measure of (MOA) aggregation The average number of data declarations (e.g., fields) in a class whose data types are user-defined classes. We exclude classes that are part of the Java standard library and language. Measure of functional ab- straction (MFA) The ratio of the total number of methods inher- ited by a class to the number of methods that are accessible by member methods of that class. Table 2.1: QMOOD Metric Descriptions 26 Characteristic Formula Reusability 0.5 * DSIC — 0.25 * DCC -l- 0.5 * CIS Flexibility 0.25 * DAM — 0.25 * D00 + 0.5 * ll/IOA + 0.5 * NOPM Understandability —0.33 * DSIC — 0.33 * NOA + 0.33 * DAM — 0.33 * DCC’ - 0.33 * NOPM — 0.33 * NOM Functionality 0.22 * NOPAI + 0.22 * 018 + 0.22 * DSIC + 0.22 * NOH Extendibility 0.5 * NOA — 0.5 * DCC -l- 0.5 * MOFA + 0.5 * NOPM Effectiveness 0.2 * NOA + 0.2 * DAM + 0.2 * MOA + 0.2 * MOFA + 0.2 * NOPM Table 2.2: QMOOD Quality Characteristics and Formulas 27 2.6 Summary In this chapter we presented background concepts related to the approach that is proposed in this thesis. First, we gave an overview of the Unified Modeling Language (UML), followed by a discussion of approaches for refactoring software designs. Next, we presented the basic concepts of evolutionary computation. Finally, we discussed the measurement of quality in object-oriented software design using metrics. 28 Chapter 3 Related Work In this chapter, we present a review of the literature related to the approach that is proposed in this thesis. We begin with a discussion of prior work related to refactoring. Next, we discuss work that has used evolutionary techniques to perform refactoring tasks. Third, we review techniques for detecting design pattern instances in software designs. Finally, we discuss methods for automatically instantiating design patterns in existing software designs. 3. 1 Refactoring Since the requirements of existing software systems tend to change over time, there is a strong need for tools and techniques that make software maintenance more efficient and, when possible, more automated. One promising concept is that of refactorings, which are organization plans that support change at an intermediate level and do not change the behavior of a program [34]. Change at an “intermediate level” means that refactorings focus on changes that are more abstract than rewriting lines of code but more concrete than adding features to a system. For example, a refactoring might involve moving an operation from one class to another. This change might make the program more streamlined or easier to understand, but it will not affect the 29 program’s behavior. This property is known as behavior preservation (also known as semantic equivalence), and informally means that one could pass the same input to a program before and after the transformation is performed and receive precisely the same output from each. Studying the theoretical foundations of refactoring and establishing a formal basis for it has been a common subject of research in the software engineering field. We present a summary of such research in this section. Behavior-Preserving Refactoring of Object-Oriented Programs. Opdyke developed a set of refactorings [34] that are intended to facilitate behavior-preserving modification of object—oriented code, with a specific focus on encouraging the de- velopment of automated refactoring tools. These transformations are specifically designed to make the program more maintainable, flexible, or extensible, while re- specting behavior preservation. Behavior preservation for each refactoring is en— forced by pre—conditions and post-conditions that must be met before and after the refactoring is performed, respectively. As a simple example, a post-condition might be the following: “After the refactoring is applied, each class must have a unique name” . OCL-Based Specifications. A similar set of refactorings were proposed in 2001 by Sunye et al. [43] that make use of the Object Constraint Language (OCL) [50] as a formal foundation. OCL is a declarative language that supports the specification of pre— and post-conditions on elements in the UML meta—model. For example, one could use OCL to specify that a method GetAutoID must exist in a class Automobile before a refactoring is performed (a pre—condition) and afterward (a post-condition). The authors provide sample OCL code for refactorings that deal with class diagrams and statecharts. More specifically, these OCL constraints are designed to supervise the addition, removal, relocation, and promotion (in the sense of generalization and Specialization) of attributes and methods in class diagrams to ensure that model 30 COllrlrl mush an em or en Filllll' exist i1 Usin malt ll [lilf sift-r and i. is ill} deslg flats liS ll Sl‘Sii (3011‘, Him 3 ii emi Re: lllr fjf‘], ‘ . l' '. d]’l‘]! consistency is preserved during the refactoring. For refactoring statecharts, the constraints support replacing a set of equivalent actions attached to transitions with an entry or exit action attached to the state, or, symmetrically, unfolding an entry or exit action into a set of actions attached to the transitions to and from a state. Furthermore, a state can be moved into a composite state in order to simplify an existing statechart. Using Object-Oriented Metrics. Tahvildari and Kontogiannis [44] propose a methodology in which a catalog of object-oriented metrics is used to decide when a particular transformation should be applied in order to improve the quality of a software design. The work analyzes the interaction between specific transformations and ob ject-oriented metrics. Metrics are also used to detect common design flaws and to decide when a corrective transformation should be performed. The paper classifies design flaws into three groups: structural flaws, behavioral flaws, and architectural flaws. These categories enable detection and correction of a given. flaw based on its level of abstraction. From these three categories, a finer-grained classification system is also presented. Additionally, the authors define three categories of metrics: complexity, coupling, and cohesion. These metrics are derived from the literature on measurement of software quality. The proposed methodology was implemented as a Java tool and validated on Java Expert System Shell, a rule engine and scripting environment. Refactorings as Graph Manipulations. Mens et al. proposed a formal model for refactoring based on graph transformations [24]. By representing the object- oriented class hierarchy of a program as a graph whose vertices are classes (or oper- ations) and whose edges are relationships among those elements, the authors were able to utilize available research and tools for analyzing and manipulating graphs. As with the approaches already mentioned, this graph-based technique used pre- and 31 post-conditions in order to ensure preservation of the intent and, ideally, the behav- ior of a given program after it is refactored. The authors developed a notation similar to regular expressions that facilitates searching the graphs for patterns (e.g., to de- termine whether the graph contains a method invocation from method X to method Y). Using this notation, a software designer can specify rules (pre-conditions) that must be satisfied before a given refactoring can be performed. Likewise, the rules can verify that aspects of a program remain valid (post-conditions) after the refac— toring is performed. Within this framework, the authors reduced the problem of refactoring to one of transforming an input graph into an output graph by way of a series of simple graph modifications whose validity can be verified. Fixing Design Flaws. rIrifu et al. developed a method for identifying and fixing design flaws using human-assisted tools based on mappings between common design flaws and solutions [48]. The authors focused on design flaws that affect the non- functional goals of the program (e.g., extensibility or flexibility). They define a correction strategy to be a mapping between a design flaw and the set of possible solutions (refactorings) that, if applied, would correct that flaw. Detection of design flaws is performed using graph-based pattern matching in a manner similar to that of Mens et al. discussed previously [24]. In order to determine the order of preference among the set of possible solutions, a formal grammar is defined that can be used to specify atomic code transformations along with their respective difliculty. The influence of each design flaw on non—functional goals (e.g., extensibility) is recorded as well. With this information in hand, the authors’ prototype tool (known as the Advanced Refactoring Wizard) is able to make refactoring decisions automatically and will prompt the user only when the tool cannot reach a conclusion. 32 bar I [)ltllllr mm [)ll‘l'di‘ malt gi'iriil 3.2 Search-Based Refactoring Harman and Jones introduced the term search-based software engineering (SBSE) [15], arguing that the benefits of evolutionary computation could be brought to bear on software engineering problems that can be reformulated as optimization problems. Harman and Clark [16] presented evidence that existing object-oriented metrics could be used to construct effective fitness functions for evolutionary ap- proaches to automated software maintenance and refactoring. Possible improvement mechanisms include popular evolutionary computing techniques, such as genetic al- gorithms, genetic programming, simulated annealing, and hill climbing. O’Keeffe and O Cinnéide [30] considered the problem of automatically improv- ing the design of an ob ject—oriented program using a search-based approach. Calling the technique search-based refactoring, they selected simulated annealing (SA) as the improvement mechanism and implemented two complementary mutation oper- ators that SA could use to modify a program. These operators were pullUpMethod, which moves a method to the superclass of the class in which it was defined; and pushDownMethod, which moves a method to the subclasses of the class in which it was defined. Next, the authors constructed a fitness function using two simple metrics: availableMethods, computed as the sum of the methods declared within a given class and the methods inherited by that class; and methodsInherited, computed as the number of methods that a given class inherits. By minimizing availableMethods and maximizing methodsInherited, a crude measure of quality could be determined for a given program. This quality was used as part of the simu- lated annealing process in order to determine how the program should be modified. The approach assumes that the source code of the program is available, and this code is parsed into an abstract syntax tree (AST) by a prototype tool. Refactoring steps are then performed on this syntax tree, and the resulting structure is evaluated according to the metrics described above. Using a simple case study that initially 33 contal; 311011112 optima flit: qll In (if em: a gem; 1131115 0m: l'lll]f*t'l‘ iii-hair as sf mm contains code duplication and unnecessary methods, the authors showed that their automated approach could refactor a program so that its methods were positioned optimally in the hierarchy with respect to the metrics that were used to evaluate the quality of the program. In subsequent studies, O’Keeffe and O Cinnéide considered a greater variety of evolutionary algorithm, including single— and multiple-ascent hill climbers and a genetic algorithm [33,35]. Furthermore, the authors evaluated the evolving pro- grams using a more sophisticated quality model known as Quality Model for Object— Oriented Development (QMOOD) [2]. QMOOD assesses the quality of a program’s object-oriented design by examining interface size, inter-class coupling, and use of inheritance, among others. After examining each evolutionary algorithm on four case studies, the authors show that multiple-ascent hill climbers are more effective than other mechanisms in terms of speed and consistency of quality gain, but the advantage is not statistically significant. However, the authors do not consider the use of genetic programming, which is a cornerstone concept in our proposed ap— proach and has the potential to generate richer, more complex refactoring strategies through composition of multiple design changes. Seng et al. [40] propose a search—based refactoring approach that uses a genetic algorithm and a set of metrics from Briand’s catalog [5] in order to optimize the class structure of an ob ject-oriented program. The selected metrics evaluate coupling, co- hesion, complexity, and stability. The approach considered only the movement of methods from one class to another, and it is therefore less general than the ap- proaches of O’Keeffe and O Cinnéide. However, the authors pointed out the need to consider the role that a given class plays in a design. For example, design patterns will often violate design guidelines (e.g. cohesion) in order to fulfill their responsibil- ities, and get / set methods should typically remain in the classes in which they are defined. Using this information in a search-based technique would reduce the size 34 of the the us Ff‘let'll exam} tilt-flu the a] each tool 5 pared 0f t‘lti txprtg Class of the search space by removing uninteresting solutions. However, the authors defer the use of such heuristics to future work. To evaluate their approach, the authors selected the JHotDraw open source drawing framework, which is known as a good example for the use of design patterns. The authors manually displaced several methods from their original class in order to provide additional refactoring tasks for the approach to perform. The prototype tool that implements this approach moved each displaced method back to its original class at least once. Furthermore, the tool suggested several refactoring sequences that the authors deemed helpful. Com— pared to our approach, the proposed technique focuses exclusively on the relocation of classes and does not consider composition of refactoring steps, thus limiting the expressiveness of its solutions. However, the relocation of methods to optimize the class structure of a software design is a useful refactoring strategy. 3.3 Design Pattern Detection In this section, we review approaches for automatically detecting instances of de- sign patterns in source code and UML models. Unless otherwise specified, these approaches only consider the detection of Gamma design patterns [12]. Kramer and Prechelt [21] propose a PROLOG-based method for detecting in- stances of design patterns in object-oriented source code. The approach extracts design information from C++ header files and translates them into PROLOG facts (e.g., inheritance between class A and class B might be represented as “inherits A B”). Each design pattern is represented as a PROLOG rule or query that identifies every subset of facts that satisfies the terms of the query. In the context of design pattern detection, each PROLOG query searches for all combinations of classes, in- terfaces, or operations that can properly fill the roles of a particular design pattern. The result of each query, then, is the set of candidate design pattern instances in the software design. The authors implemented their approach in a tool called Pat and 35 dam? to th- torre drift tram Stilt ah'z. antl min] or E l5 ‘dl Off . r. 5] m2 ' ‘.'\ l; is, “flat lb f]; demonstrated through an empirical study that while its recall (i.e., the tool’s ability to detect known design pattern instances) was 100%, its precision (the number of correctly-detected design pattern instances divided by the total number of instances detected) ranged from 14% to 50% depending on the program being analyzed. The authors note that while this precision is “acceptable”, it would be much higher (i.e., 53-100%) if the structural analysis phase were capable of detecting method call del- egations. For each questionable design pattern instance, the authors consulted the program source code and its documentation in order to decide whether it is valid. Birkner [4] proposes a similar method to that of Kramer and Prechelt that ex- tracts design information from Java bytecode. Using a toolchain derived from the SWAG Kit [42], his approach uses a set of Query Language (QL) queries that an- alyze bytecode facts to identify candidate instances of structural design patterns. The structural queries presented are similar, though more coarse-grained, than the PROLOG queries of Kramer and Prechelt. Birkner’s approach also supports the de- tection of behavioral design patterns. The approach instruments the Java bytecode and extracts a trace of the program’s execution, thus enabling the analysis of runtime information that is not available from static, structural artifacts such as source code or UML class diagrams. The author experimentally demonstrates that his approach is able to detect the Gamma design patterns with more precision than PIN OT [41] or FUJABA [25], which are design pattern detection tools. Tsantalis et al. [49] propose the use of graph similarity scoring for detecting de- sign pattern instances. The approach represents a software design as a graph (i.e., a set of vertices and a set of edges between those vertices), thus enabling the use of established graph (and matrix) analysis algorithms. Specifically, each type of associ— ation between classes is modeled as a distinct matrix. One matrix depicts inheritance relationships, another depicts method invocations, and so on. A set of these matrices is defined for each design pattern, as well as the target software design. From this 36 found: matrit This { [i.e. : pitta St'lirt‘ apprt foundation, the authors define a similarity matrix that results from comparing the matrices for the target software design against the matrices for each design pattern. This process returns a similarity score within the range [0, 1]. A high similarity score (i.e., a value near 1) for a given design pattern indicates that an instance of that pattern probably exists in the target software design. Conversely, a low similarity score indicates a low probability that such a design pattern instance exists. This approach, in contrast to PROLOG based techniques, enables the detection of design pattern instances that differ from their standard representation. Given the nuanced nature of software design, this flexibility is a powerful asset. However, due to the computational complexity of large matrix operations, the approach requires that the target design be partitioned into subsystems, thus limiting its general applicability. Our approach leverages the work of Kramer and Prechelt [21] to detect design pattern instances using PROLOG. We extend their work, as well as the work of Birkner [4], to detect design pattern instances more efficiently using the Structured Query Language (SQL). 3.4 Automated Design Pattern Instantiation In this section, we review work that is most similar to our proposed approach. These techniques focus on the automatic instantiation of design patterns, either in program source code or in UML models. Design patterns have historically been treated as building blocks that are most effective when a new software system is being constructed. However, Schulz et al. [39] recast design patterns as operators that transform an existing (input) software design into a target (output) design. The approach uses Opdyke’s refactorings [34] as a basis for the transformation. After a successful application of a given operator, the target design contains an instance of the design pattern that corresponds to the operator. The authors proposes a five-step process for using the operators. It is 37 assumt design mftmu that t' illt‘ (it: applies Sllllt'llI deign. bi? fillf'l (hinge dassos t‘ll‘dllgt- patten thatch, mhra 3:1 up], lht‘ (in, Opera] linden Oi it. 1 [1011" E ills: ll (ll-"fill . fall? a "h‘? lit, and; - ’ii 1‘; assumed that a software design exists and that a software engineer wishes to apply a design pattern to that design. First, a software engineer identifies the portion of the software design that should be reorganized and selects the design pattern operator that will perform the reorganization. Second, the preconditions associated with the design pattern operator are checked to ensure that the design pattern can be applied. Third, the process performs a parameterized transformation of the problem structure (i.e., the portion of the software design to be reorganized) into the target design. The parameters used in this step are the roles in a design pattern that must be filled by (e.g.) classes and operations. Fourth, the software engineer must address changes that the design pattern operator made to interfaces of existing classes. All classes that depend on these interfaces must be updated to be consistent with the changed design. Finally, the fifth step calls for the postconditions of the design pattern operator to be checked. The postconditions are difficult to formalize, and therefore they are described informally and may need further attention from the software engineer. The authors argue that recasting design patterns as operators, as opposed to building blocks, is favorable because the software engineer applying the design pattern only needs to know the parameters (i.e., the role-playing classes, operations, etc.) that will participate in the pattern. This removes the need to understand all of the details of implementing a design pattern from scratch. O Cinnéide and Nixon [28] introduced the notion of decomposing design patterns (e.g., those of Gamma et al) into minipatterns (e.g., “encapsulate object construc- tion” and “abstract access via an interface”) and minitransformations (e.g., “replace class with interface”). They argue that by understanding these smaller elements, a developer can reason about how an existing model must change in order to incorpo— rate a given design pattern while preserving the original behavior and semantics of the model. The authors later considered the automated application of minipatterns and minitransformations in order to produce better refactoring tools [27,29]. How- 38 pr I'll'l Whit ': [limit lien t intro. USP 0 ever, they acknowledge that the approach can be automatically applied only under “green-field conditions”, thus limiting its general applicability. Albin—Amiot et al. [1] discuss the available tools and techniques for automating the use of design patterns with respect to building a new program or identifying de- sign patterns in an existing program. They propose that the combination of a pattern description language, a set of pre-specified patterns, automated detection and iden- tification of patterns, and automated source code refactoring is the correct approach to solving the larger problem of automating the detection, analysis, and application of design patterns. The authors present two prototype tools: Patterns-Box, which provides assistance to developers who are designing a new program, and PTIDEJ, which helps to identify design patterns in existing code. When used together, these prototypes can detect defects in existing software designs and assist in the correc— tion of those defects by suggesting and automatically applying changes. Despite this increased focus on general automation, however, the approach does not consider the use of flexible search techniques such as evolutionary computation. Jeon et al. [19] propose an automated approach to refactoring based on design patterns in Java programs. Their approach defines a set of inference rules for iden- tifying portions of an existing design that could benefit from the introduction of a design pattern. Furthermore, the approach automatically identifies a refactor- ing strategy in order to introduce a given design pattern. The approach translates the target software design into set of PROLOG predicates, similar to the approach of Birkner [4] and the approach presented in this thesis. Each supported design pattern has a corresponding Prolog query that identifies classes that could fill the roles of the pattern. The authors present an example application of the Abstract Factory design pattern to a simple program. Gomes et al. [13,14] identified the need for automation of design pattern selec- tion. They argue that existing approaches depend too heavily on human input in 39 order to choose which design patterns to apply and when to apply them. Using Case-Based Reasoning (CBR), a technique developed by the Artificial Intelligence community, they present a method that seeks to automate the selection and appli- cation process. By encoding the situation in which a design pattern was used in the past, a CBR system can learn to select and apply design patterns based on the structure of an existing program. Representing a design pattern in the form of a case (i.e., a chunk of experience), then, enables this approach to automate the selection and application of design patterns. In order to validate the approach, the authors constructed a prototype tool that consisted of a UML front-end and a CBR reason- ing engine. Using a sample of 60 past design pattern applications and 25 sample class diagrams (containing three to five objects and no operations or attributes), the authors report that the prototype tool correctly identified 76% of the past design pattern applications as viable for the sample model being considered. Our approach uses the key insight that O Cinnéide’s minitransformations [27] can be understood as functions that transform one software design into another. 3.5 Summary In this chapter, we reviewed the literature related to search-based refactoring, de- sign pattern representation and detection, and automated instantiation of design patterns. Given the utility and ubiquity of refactorings, we recognize that auto— mated refactoring has the potential to reduce significantly the amount of manual labor that software engineers must perform in order to maintain and extend exist- ing software systems. However, existing automated refactoring approaches tend to couch the problem as one of pattern matching; i.e., first locate a sub-optimal portion of the design, and then make a predefined set of changes to that portion to improve it. This means that we must be able to specify all of the design flaws that we want to correct, which is difficult to achieve in general. By leveraging evolutionary compu- 40 tation and using software engineering metrics for guidance, search-based refactoring approaches have minimized the problem of identifying which portions of a design to refactor. However, these approaches have not addressed two key issues. First, we believe that the composition of multiple refactorings can generate richer, more complex sequences of refactoring steps. Following this line of thought, we believe that composition of multiple refactorings can result in the automated introduction of design patterns into existing software designs. This thesis addresses these issues through the use of genetic programming, the minitransformations of O Cinnéide, and software engineering metrics. 41 Chapter 4 Approach In this chapter, we present our approach for the automated improvement of object- oriented software design through the introduction of design patterns. 4. 1 Overview Our approach to improve object-oriented software design is to use a genetic pro- gramming environment (a GP) that is guided by software engineering metrics to determine the optimal set of refactorings to apply to a software design. This ap- proach has two objectives: first, to improve the quality of the design as measured by the QMOOD suite of object-oriented metrics; and second, to introduce design patterns when appropriate in order to improve the maintainability of the design. In this chapter, we describe the three major components of the GP: how individ- uals in the evolving population are represented, how those individuals are changed through mutation, and how newly-evolved individuals are evaluated. A data flow diagram that represents the approach is shown in Figure 4.1, and the remainder of this chapter describes the elements of the diagram. 42This page dedicated to the memory of Douglas Adams. 42 Convert XMI Software to Graph Design (XMI) Annotated Graph Create Population Individuals ‘—J___ Metrics Evaluate Population ‘1 . Design Fitness of Patterns Each Individual Select Individuals for Next P0pulation Current Population Individuals 7 fab—sewer and Mutation [n Generations Completed] Select Best lndividual(s) Refactoring Solution(s) Figure 4.1: Data Flow Diagram of the Proposed Approach 4.2 Individual Representation A GP contains a finite population of individuals that are given an optimization prob— lem to solve. In this approach, we represent these individuals as a pair that includes a design graph and a transformation tree. A visual depiction of the relationships between a population, an individual, and the components of an individual is given in Figure 4.2. 43 Individual Population Legend Transformation Node (3 Information Node 0 0 Individual mm”) »: . ,, . ,. Convertible .. ,, (Convertible>< Automobile D] Facto ]( Factory ] call‘ instantiate ‘ . onvertible ® Convertible call [m . - Convertible I drives produces Facto Target Software Design (Elided UML Class Diagram) Figure 4.2: GP Components and Relationships 44 4.2.1 Design Graph One of the inputs to our approach is the target software design, in the form of a UML class diagram, that we want to refactor to include design patterns. Our approach translates this class diagram into a graph (i.e., a set of vertices and a set of edges between those vertices) that is augmented with design meta-data, such as method invocations. We call this augmented graph structure a design graph. A simple UML class diagram is shown at the bottom of Figure 4.2, and the equivalent design graph is shown directly above it. Each vertex ill a design graph represents a class, interface, or operation. The edges between these vertices denote semantic relationships between the entities that they represent. An edge has one of the following types: aggregate, associate, compose, call, implement, inherit, instantiate, or own. Each type represents a relationship between elements in an object-oriented design. Every individual in the population is given a copy of the design graph to modify. Representing the target software design as a graph enables us to use common graph algorithms for analysis. Furthermore, a graph-based representation enables us to use metrics that are amenable to graph-based implementations, such as QMOOD [2]. To the best of our knowledge, this approach is the first to combine a graph-based representation of a software design with a GP to perform automated refactoring. 4.2.2 Transformation Tree Each individual is responsible for modifying its copy of the design graph to improve its quality with respect to a set of metrics. The type of modifications to perform, and the order of these modifications, is decided by the individual’s transformation tree. A sample transformation tree is shown in Figure 4.2. The transformation tree is a type of abstract syntax tree (AST). ASTs are typically used to represent the structure of a computer program, which comprises a series of code statements 45 that must be executed ill a specific order. In this approach, the code statements are replaced by minitransformations that make well-defined changes to the design graph. These minitransformations are represented by nodes. When an individual is executed by the GP, the individual’s transformation tree is executed in standard tree-traversal order. That is, the root node is executed first, and its children are executed recursively, beginning with the left-most node and proceeding from left to right. When a node is executed, it has the opportunity to analyze or modify the graph. A node can also take input from its child nodes, thus enabling multiple nodes to collaborate to make complex modifications to the graph. Next, we describe the types of nodes that appear ill a transformation tree. 4.3 Design Change Mechanisms We introduce two categories of transformation tree nodes ill our approach. The first category, transformation nodes, comprises nodes that make changes to the target software design. These nodes are depicted as shaded ovals in the transformation tree shown in Figure 4.2. Transformation nodes require information from the target design to do their work. For example, a transformation node that represents the PartialAbstraction minitransformation needs to know the class from which it will construct an abstraction. This information is provided by the second category of nodes, information nodes, which represent elements in the target design (e.g., classes). These nodes are depicted as white ovals in the transformation tree in Figure 4.2. Similar to the order of parameters ill a function call, the left—to-right order of the information nodes determines their role in the work of a transformation node. In the remainder of this section, we discuss each of these categories and the nodes that they comprise. We begin by describing a set of helper functions that perform simple tasks on behalf of the transformation nodes, such as creating an empty class in the target software design. Next, we discuss each of the transformation 46 nodes and each of the information nodes. We conclude with a discussion of the way in which the nodes are evaluated when a transformation tree is being executed. 4.3.1 Helper Functions In this section we discuss each of the helper functions that supports the operation of the transformation nodes. abstractClass The abstractAccess function is declared as follows: Interface abstractAccess(Class G, String name) This function constructs and returns a new interface named name that contains all of the public methods of class c. createEmptyClass The createEmptyClass function is declared as follows: Class createEmptyClass(String name) This function constructs and returns an empty class named name. createWrapperClass The createWrapperClass function is declared as follows: Class createWrapperClass(Interface iface, String wrapperName) This function constructs and returns a new class called wrapperName that provides the same interface as the interface iface and delegates each method in the interface to a wrapped object of type iface. makeAbstract The makeAbstract function is declared as follows: 47 Operation makeAbstract<0peration G, String newName) This function constructs and returns a method (operation) called newName that, when given the same arguments, will create the same object as the constructor operation c. 4.3.2 Transformation Nodes Transformation nodes make well-defined changes (transformations) to the target software design. These nodes are adapted from O Cinnéide’s minitransformations [27] and are designed to construct instances of the Gamma design patterns [12]. As demonstrated by O Cinnéide, the minitransformations are sufficiently powerful to construct instances of the Gamma design pattern subset that we support in this approach. The minitransformations were originally designed to introduce design patterns into program source code. In our approach, we adapted the minitransfor- mations to perform design-level transformations, thus enabling their use in UML class diagrams. Next, we present a list of the transformation nodes that includes their inputs (provided by child nodes in the transformation tree), outputs, a graphi- cal depiction of the transformation that is performed, and a pseudocode description of the transformation. AbstractAccess Transformation The inputs and output of the AbstractAccess transformation node are as follows: Inputs: class Context, class Concrete, interface Iface Output: interface If ace The AbstractAccess transformation node modifies a class Context that directly uses another class Concrete so that Context accesses Concrete through an interface IConcrete. This change is illustrated in Figure 4.3. It is generally accepted that 48 such decoupling, or loosening of the relationships between two classes by utilizing an interface, facilitates the development of new classes and other design changes in order to meet future requirements [12]. This minitransformation, when used in combination with the Abstraction minitransformation, utilizes a newly-created interface in order to decouple one class from another. Before After lConcreto Wencrete «interface» , u_§es _ «1mm doxo : int °°'“°’“ ' doxo : int doYO : int doYO : int 4 : implements E implements I Comérete Conbéote uses _ name: tring name: tri “m“ ' doXF int TJ—do :Int doYO : int doYO : int Figure 4.3: Abstract Access Minitransformation Implementation. First, AbstractAccess node modifies the Context class to ac- cess the concrete class through the Iface interface. To accomplish this, it ex- amines any outbound edges from Context that end at Concrete. For each such edge, the node creates a new edge from Context to Iface with the same type (e.g., associate). Finally, the AbstractAccess node returns Iface as its output. A pseudocode version of this algorithm is shown in Algorithm 1. Algorithm 1 Pseudocode for AbstractAccess Transformation Node AbstractAccess(graph C, class Ctx, class Cone, interface Iface) 1: edges = outEdges(Ctx) 2: for all e E edges do 3 if e.sink = Gone then 4 G.addEdge(e.source, Iface, e.type) 5: G.removeEdge(Ctx, e) 6 end if 7: end for 8: return Iface 49 Abstraction Transformation The inputs and output of the Abstraction transformation node are as follows: Inputs: class Product, string ifaceName Output: interface If ace The Abstraction transformation node constructs a new interface that contains all of the public methods in an existing class, thus enabling other classes to take a more abstract view of the original class and any future classes that implement the interface. Introducing an interface in this fashion is a common refactoring step that restructures a software design to support additional functionality. For exam- ple, software that tracks vehicle inventory for an automotive dealership needs to track multiple vehicle types. These differing vehicle types will have many common characteristics, and a corresponding software design will include a class inheritance hierarchy that reflects those characteristics. When given a representative prototype, such as a class representing a mid—sized car, the Abstraction minitransformation can construct an interface for all classes that represent vehicles. A new vehicle- related class, then, only needs to implement the interface in order to integrate with the existing design. Before After Concrete lConcrete name : Slrin <> doX() : int doXO : int (1on int doYO : int I ] implements 1 Conbmto name : String doX() : int doYQ : int Figure 4.4: Abstraction Minitransformation 5O Implementation. The Abstraction transformation node creates a new interface Iface, whose name is derived from the string if aceName, that has the same public methods and attributes as the Product class. Next, the node adds an implements edge between Product and the new interface. Finally, the node returns the new interface Iface as its output. A pseudocode version of this algorithm is shown in Algorithm 2. Algorithm 2 Pseudocode for Abstraction 'Il‘ansformation Node Abstraction(graph G, class Product, string IfaceName) : # Create an interface with the same public operations as Product interface Iface = Helpers.abstractClass(Product, “I” + IfaceName) : G.addVertex(Iface) : G.addEdge(Product, Iface, IMPLEMENT) return Iface lbwtxpl—i gt Delegation Transformation The input and output of the Delegation transformation node are as follows: Input: class Owner 0 utput: class Delegate The Delegation transformation node is used to move part of an existing class to a. component class, and to set up a delegation relationship from the existing Class to its component. This change is illustrated in Figure 4.5. A class that has a'CCZU.mulated too many methods (sometimes called a “god class”) may benefit from ITloving some of those methods to a separate component class. Since a method that is moved from one class to another will no longer have access to private members of the original class, those members must be made public. Thus, Delegation can I‘eqmre the public interface of a class to change. 51 Before After Toncrete name : String doX() : int doYO : int calls Figure 4.5: Delegation Minitransformation Implementation. The Delegation transformation node creates a new empty class whose name is the concatenation of the original class (called Owner here) and the string “Delegate”. For the purposes of this example, the class would be called ownerComponent. Next, two new edges are created. The edges are of type aggregate and call, respectively, starting at Owner and ending at the new OwnerComponent class. Now that the delegate class has been created, the transformation node dele- gates functionality by moving one or more of Owner’s operations to OwnerComponent. In our implementation, the transformation node randomly chooses two operations (if they are available) to delegate. Finally, the node returns OwnerComponent as its output. A pseudocode version of this algorithm is shown in Algorithm 3. Algorithm 3 Pseudocode for Delegation Transformation Node Delegation(graph G, class Owner) string compNamc = Ownername + “Component” class Component = Helpers.createEmptyClass(compName) G.addEdge(Owner, Component, AGGREGATE) G.addEdge(Owner, Component, CALL) # Choose two methods (if available) at random to delegate Helpers.moveMethods(Owner, Component, 2) return Component EncapsulateConstruction Transformation The inputs and output of the EncapsulateConstruction transformation node are as follows: 52 Inputs: class Creator, class Product Output: class Creator The EncapsulateConstruction transformation node weakens the binding be- tween a class that creates instances of another class by relocating the code state- ments that perform instantiation into a dedicated method. This change is illustrated in Figure 4.6. In general, a class that contains many code statements that create objects is difficult to maintain. For example, if there is a change in the param- eters of a class’s constructor, then all statements that instantiate that class are invalidated and must be updated to use the modified parameters. However, we can resolve the problem by moving all object-creating code statements for a class Shape to a dedicated method CreateShape that is responsible for constructing and returning instances of Shape. By creating this dedicated method, we guarantee that any change to the constructor of class Shape only affects the code statements in a single method. The EncapsulateConstruction minitransformation constructs the dedicated method and modifies the relevant code statements to use the new method. Before After ShapeManager ShggeManager numSflapes : in_t__ numShapes : int_ CopyShape() :Shape CopyShape() : Shape RandomizeShapeO : Shape RandomizeShapeO : Shape TranslormShapeQ : Shape TransformShapeO : Shape CreateShape() : Shape Figure 4.6: EncapsulateConstruction Minitransformation Implementation. The EncapsulateConstruction transformation node creates a dedicated createProduct operation inside of the Creator class in order to localize the creation of Product instances to a specific location in the software design. To accomplish this, the node creates the createProduct operation and adds an own edge between the Creator class and the operation. Next, the node considers all 53 outgoing instantiate edges from the Creator class. If such an edge leads to the Product class, then the instantiate edge is removed, and a new instantiate edge is created between createProduct and Product. In addition, a call edge is created from createProduct to Product. Finally, the Creator class is returned as output. We note that the output from this transformation node is not a new design element. However, in order to satisfy the strong typing rules of the transformation tree, each transformation node must return an output. A pseudocode version of this algorithm is shown in Algorithm 4. Algorithm 4 Pseudocode for EncapsulateConstruction Transformation Node EncapsulateConstruction(graph G, class Creator, class Prod- uct) 1: edges 2 outEdges(Creator) 2: for all e E edges do 3: if e.type = INSTANTIATE and e.sink 2 Product then 4: G.removeEdge(e) 5: string methodName 2 “create” + Product.name; 6: operation op = Helpers.makeAbstract(Creator, methodName) 7: G.addVertex(op) 8: G.addEdge(op, Product, INSTANTIATE) 9: G.addEdge(Creator, op, CALL) 10: end if 11: end for 12: return Creator PartialAbstraction Transformation The inputs and output of the PartialAbstraction transformation node are as follows: Inputs: class Concrete, string NewAbstractName Output: class Abstract 54 The PartialAbstraction transformation node constructs a new abstract class from an existing (concrete) class and adds an inherits relationship from the con- crete class to the abstract class. This change is illustrated in Figure 4.7. Growing the class inheritance hierarchy in a software design is a common task as the design matures and as requirements change over time. PartialAbstraction grows the inheritance hierarchy by creating a new abstract class that has the same methods as an existing class, thus enabling other classes to inherit the functionality of the existing class in a way that facilitates future maintenance. Before After Concrete Abstract name : Strin doX() : int doX() : int doYQ : int doYO : int Concrete name : String Figure 4.7: PartialAbstraction Minitransformation Implementation. The PartialAbstraction transformation node creates a new abstract class whose name is derived from the string NewAbstractName. The node then creates an inherit relationship between the Concrete class and the new ab- stract class. Finally, the node returns the new abstract class as output. A pseu— docode version of this algorithm is shown in Algorithm 5. Algorithm 5 Pseudocode for PartialAbstract ion Transformation Node PartialAbstraction(graph G, class Concrete, string New- Name) 1: class Abstract = Helpers.createEmptyClass(“PAbstract”+NewName) 2: G.addVertex(Abstract) 3: G.addEdge(Concrete, Abstract, INHERIT) 4: return Abstract 55 Wrapper Transformation The input and output of the Wrapper transformation node are as follows: Input: interface If ace Output: interface WrapperIface The Wrapper transformation node wraps the functionality of an existing class with another class. This change is illustrated in Figure 4.8. Requests (e.g., method calls) to a wrapper object are forwarded to the wrapped object, and similarly the responses to those requests are passed back to the wrapper class and returned to the original calling object. This enables the wrapped class to be replaced at run time and loosens the coupling between the wrapped class and classes that use it. Before After CllOflfl uses * IConcrete uses «Interface» doX() : int cuomz "393 : doYO : int Figure 4.8: Wrapper Minitransformation Implementation. The Wrapper transformation node wraps the functionality of a set of classes (i.e., classes that implement a given interface) inside of another class, thus enabling the wrapped class to be replaced at run-time. To accomplish this transformation, the node creates a new If aceWrapper class and adds an aggregate edge between If aceWrapper and the interface If ace. Semantically, this relationship can be understood as the aggregation of multiple implementations of the interface, and this facilitates run-time selection of the implementation that is most suitable. 56 Next, the node modifies the classes that use Iface so that they use IfaceWrapper instead. 0 Cinnéide’s original Wrapper description specifies an additional input, namely a set of client classes that use the If ace interface, and specifies that only these classes should be modified to use WrapperClass. However, our implementation assumes that all classes that uses Iface should be modified to use WrapperClass. We argue that, generally, a designer would wrap all classes that implement an inter- face so that the specific class being used can be changed at run-time, and thus all implementation classes should be modified. A pseudocode version of this algorithm is shown in Algorithm 6. Algorithm 6 Pseudocode for Wrapper Transformation Node Wrapper(graph G, interface Iface) 1: string wrapperName = Iface.name+” Wrapper” 2: interface IfaceWrapper = Helpers.createVVrapperClass(Iface, wrapperName, “wrapped” + Iface.name) edges = inEdges(Iface) for all e E edges do addEdge(e.source, IfaceWrapper, e.type) removeEdge(e) end for G.addEdge(IfaceWrapper, Iface, AGGREGATE) G.addEdge(IfaceWrapper, Iface, CALL) 10: G.addEdge(IfaceWrapper, Iface, INSTANTIATE) 11: return IfaceWrapper 4.3.3 Information Nodes Information nodes supply semantic information from the target software design to the transformation nodes. To use a natural language analogy, if transformation nodes are verbs, then information nodes are the nouns and adjectives that provide context for the verbs. For example, a transformation node that modifies a class will need to be informed of the specific class that it is to modify. We now discuss each of the information nodes in turn. 57 o ClassNode instances represent a class in the target design. 0 IfaceNode instances represent an interface in the target design. 0 OperNode instances represent an operation in the target design. 0 StringNode instances represent a string that is derived from a dictionary of words that are relevant in the target design. This dictionary is typically gen- erated from a requirements document. 0 RootNode is a place-holder that has no effect on the target design and does not inform any transformation nodes. It is responsible for controlling the arity of the transformation tree; i.e., it loosely controls the number of transformation subtrees that evolve as the GP runs. 4.3.4 Lazy Node Evaluation One of the key GP design challenges in this work was to ensure that new design elements (i.e., classes, interfaces, and operations) created during the execution of a transformation node are available for use by all other nodes in the same individual. For example, we assume that a Wrapper transformation node has created a new wrapper class named MyWrapper. We further assume that a PartialAbstraction transformation node is executed later and needs to use the MyWrapper class in order to build an abstract class with the same public interface. For this composition of transformation nodes to take place, the Wrapper transformation node must be a child of the PartialAbstraction node. However, this places a strong restriction on the types of composition among transformation nodes that can occur. Ideally, transformation nodes in separate subtrees can reuse one another’s newly-created design elements. Furthermore, information nodes should be able to refer to classes, interfaces, and operations that did not exist in the original target design but were instead created by transformation nodes. 58 In order to provide a richer environment for composition of transformation nodes, we use a technique known as lazy evaluation. When a ClassNode is created during tree growing (e.g., during population initialization), it simply acts as a placeholder and does not refer to any specific class. When the tree containing the ClassNode is executed, the ClassNode will be visited, and at this point it will randomly choose a class from the current target design, which includes any classes that have been created by transformation nodes during tree execution. Once the ClassNode has chosen a class to represent, the choice is final. If the node becomes part of an- other individual’s tree through subtree crossover, the chosen class is preserved. This technique ensures that new classes have a fair chance of being selected. 4.4 Evaluating Refactored Designs When the GP evaluates an individual in the population, it executes the transforma- tion tree associated with that individual. When this execution completes, the indi- vidual’s copy of the target software design is in a modified state, and thus its quality must be re-evaluated. The GP uses this updated quality to determine whether the individual should be carried into the next generation of individuals. In this section, we discuss the factors that contribute to the quality of a modified target software design. 4.4.1 Design Quality Once an individual has executed and modified its copy of the target software design, the quality of that modified design must be evaluated. In this approach, we evaluate its quality using the QMOOD metric suite. The modified software design is passed to QMOOD as input, and the metrics suite returns a real-valued number that represents the overall quality of the design, taking into account quality characteristics such 59 as readability and maintainability. This number is used as the base fitness for the design. Incentives and rewards (e.g., for individuals that evolve design pattern instances) can be added to the base fitness in order to encourage the evolution of solution characteristics that are appropriate for a given target software design. 4.4.2 Design Pattern Detection The design change mechanisms (1ninitransformations) that are used in this approach are designed to create instances of the Gamma design patterns when they are com- posed. One of our primary objectives in this approach is to modify target software designs so that they incorporate design patterns. However, in order to identify can- didate designs that include design patterns we must have a mechanism for detecting their presence. In this section, we describe the PROLOG—based design pattern de- tector that we developed for this detection task. PROLOG [26] is a declarative logic programming language that supports queries over relations. To facilitate design pattern detection, we translate the target software design into relations that specify the classes and operations in the design, as well as the relationships between them (e.g., class inheritance or class-operation owner- ship). The relations for classes, interfaces, and operations are cls, interface, and Operation, respectively. With this set of relations in hand, we construct one PRO- LOG query for each design pattern whose instances we wish to detect in a given soft- ware design. Our PROLOG queries are adapted from the work of Kramer et al. [21], which provided a query for each design pattern supported by our approach with the exception of Abstract Factory. A PROLOG query for the Abstract Factory design pattern is shown in Figure 4.9. When an individual in the population is being evaluated, the PROLOG design pattern detector analyzes the individual’s design graph for the presence of design 60 abstract_factory(AFact,CFact,AProd,CProd,Client) :- cls(AFact), cls(CFact), cls(AProd), cls(CProd), cls(Client), inherit(CFact,AFact), inherit(CProd,AProd), instantiate(CFact,CProd), fcall(Client,CProd), AFact \= CFact, AProd \= CProd, AFact \= Client, CFact \= Client, AProd \= Client, CProd \= Client, AFact \= AProd, AProd \= CFact. Figure 4.9: PROLOG code for ABSTRACT FACTORY design pattern pattern instances by running the query for each design pattern individually. When an instance is detected, Prolog returns the set of design elements that satisfy the query; i.e., the set of classes, interfaces, and operations that participate in the design pattern instance. In the case of the Abstract Factory design pattern, this set of elements must include classes that fill the following roles: Abstract Factory (AFact), Concrete Factory (CFact), Abstract Product (AProd), Concrete Product (CProd), and Client (Client), where ‘\=’ is the negation of the boolean equality operator (i.e., not equal to). The GP records this set of elements for later processing. 4.4.3 Fitness Function After the evaluation of an individual has completed, the GP assigns a fitness value, a real-valued number, to the individual. This value is the result of a mathematical function known as a fitness function that incorporates the QMOOD value for the target software design, the presence (or lack) of design pattern instances in that design, as well as factors related to the content of the individual’s transformation tree. The fitness function that we developed for this approach is as follows: Fitness = CQ- QMOOD + CR- patternReward — CNCP' nodeCountPenalty + CMSR- matchingSequencesReward, 61 where CQ, CR, C NCP, and CMSR are coefficients that determine the relative weight of each term. These weights are parameters of the approach that may be tuned as appropriate for a given problem or domain. Next, we describe each term in the fitness function in detail. 0 QMOOD provides a rough measure of overall design quality. 0 patternReward adds a bonus if the target software design contains at least one design pattern instance. As a baseline, we have set this bonus at twice the QMOOD value. If the target software design does not contain a design pattern instance, then this term evaluates to zero. 0 nodeCountPenalty provides a penalty that is proportionate to the number of transformation nodes in the transformation tree:1 the penalty is intended to drive the GP to select individuals that have fewer transformation nodes, thus reducing the number of refactoring steps that are required to realize the individual’s refactored software design. To compute this term, we multiply the QMOOD value by the number of transformation nodes in the transforma- tion tree and multiply the result by a constant coefficient that determines the strength of the penalty. 0 matchingSequencesReward assigns a bonus for specific sequences of trans- formation nodes. O Cinnéide specified sequences of minitransformations that, under green-field circumstances, would create instances of design patterns [27]. To drive individuals in the population to evolve design pattern instances, we assign a reward that is proportionate to the number of these sequences that are detected. (We note that other sequences of minitransformations are capa- ble of creating design pattern instances, and we have observed this capability 1Some transformation nodes require more child nodes than others, and therefore we penalize in proportion to the number of transformation nodes rather than the total number of nodes in the transformation tree. 62 empirically. However, the GP is able to evolve design pattern instances more consistently when this reward is applied.) The GP maintains a record of the order in which transformation nodes are executed, and this record is consulted to determine when a reward should be assigned. A note on minitransformation sequences As discussed above, the matchingSequencesReward term in the fitness function awards a fitness bonus for individuals that apply specific sequences of minitrans- formations. The following is a list of these sequences. The design patterns whose instances are likely to result from the application of each sequence are given in parentheses. e Wrapper, Abstraction, AbstractAccess (Adapter, Proxy) Wrapper (Bridge) 0 Abstraction, AbstractAccess (Composite, Prototype) Abstraction, AbstractAccess, EncapsulateConstruction (Abstract Factory) 0 Abstraction, AbstractAccess, Wrapper (Decorator) 4.5 Example We now present a concrete example that describes the execution of a simple transfor- mation tree and its corresponding effect on a target software design. Specifically, we demonstrate how an instance of the Abstract Factory design pattern is constructed by the GP. The execution is shown graphically in Figure 4.10. For this example, we assume that an individual is undergoing a fitness evaluation. A key component of that evaluation is executing the individual’s transformation tree so that the GP 63 can determine whether the individual is capable of introducing a design pattern in- stance. The individual’s transformation tree is shown at the top of Figure 4.10. The two shaded nodes in the transformation tree are transformation nodes, and therefore they are responsible for modifying the target software design. The nodes in the tree are executed in a post—traversal order. For example, RootNode executes its left child (i.e., PartialAbstractionl) first, then its right child (i.e., PartialAbstractiong), and finally itself. The child nodes perform the same execution recursively until the bottom of the tree is reached. RootNode CPartialAbstractiom) < PartialAbstraction2> . ,, . ,, Convertible .. Coma... Atoms... a: Transformation Tree original: m - ftgfig‘gfl ‘ Automobile After PA1: ‘ onvert le Facto b: Intermediate Results ‘ Figure 4.10: Example Transformation Tree Execution In the lower half of the Figure 4.10 are three successive versions of an elided target software design that is being modified. These versions show the state of the 64 target software design at the beginning of the tree execution (denoted by “Origi— nal”), after the first PartialAbstraction node (denoted by “After PAI”), and fi- nally after the second PartialAbstraction node (denoted by “After PA2”). When the first PartialAbstraction node is executed, a new superclass called Vehicle is introduced, and the existing Convertible class inherits from it. Finally, the remain- ing PartialAbstraction node executes, and another superclass called Factory is created, along with an inheritance relationship between ConvertibleFactory and Factory. This simple example illustrates the construction of an Abstract Factory design pattern instance. 4.6 Summary In this chapter, we presented a detailed discussion of our approach for the automated improvement of object—oriented software design through the introduction of design patterns. 65 Chapter 5 Validation In this chapter, we validate our approach to automated software design improvement through a series of experiments. These experiments are conducted using a proto— type implementation of the approach described in Chapter 4. First, we present an overview of the prototype and its run-time environment. Next, we discuss the ex- perimental parameters and terms that are used throughout this chapter. Finally, we present the experiments and experimental results, along with a detailed discussion of the results. 5.1 Prototype Implementation In this section, we describe the prototype implementation that we constructed to validate our approach, as well as the run-time environment for the experiments in the next sections. We implemented our prototype using ECJl (Evolutionary Computation for Java), a Java—based framework that enables rapid prototyping of projects that use evolutionary computation to solve optimization problems. The following paragraphs describe the ways in which we extended ECJ to implement our approach. 1ECJ is available at http: //cs.gmu. edu/“eclab/projects/ecj/docs/ 66 5.1.1 Importing UML Class Diagrams In order to improve the quality of a software design, we must first represent that design in a format that is easy to analyze and manipulate. During the modeling and design phase of software, its object-oriented structure is typically represented in a modeling language such as UML. In order to facilitate information exchange between modeling tools, the XML Interchange (XMI) format was developed. While there are several mutually—incompatible dialects of XMI implemented in tools such as Rational XDE and ArgoUML, the format provides a useful medium for importing model information from those tools. For the approach proposed in this thesis, we use the XMI dialect implemented by ArgoUML [45]. To facilitate the use of this format, we developed a tool that translates the XMI data produced by ArgoUML into an graph-based structure that can be analyzed and manipulated by the GP. 5.1.2 Representing the Target Software Design as a Graph Our approach represents the target software design (i.e., the design that we wish to refactor) as a graph. A graph is a set of vertices and a set of edges between those vertices. Our prototype uses the .lGraph2 library to construct graph instances. Since representing a software design as a graph requires us to annotate the vertices and edges with additional information, such as the type of relationship that exists between two classes, we extend the basic graph implementation in J Graph to include these details. We refer to this extended implementation as a design graph. Each vertex in a design graph has an associated type (e.g., class, interface, or operation). Similarly, each edge has an associated type (e.g., association or inheritance) that gives the relationship a semantic meaning. Each individual in the GP population is given a copy of the graph to modify. 2JGraph is available at http://jgraph. com/ 67 In order to provide more design information for the GP and metrics to use, we make several assumptions about the relationships among classes and interfaces in the input model. For each association between two classes, in addition to creating an associate edge, we add two call edges, one going in each direction between the classes. If an operation in class ClassA returns an instance of a user—defined class ClassB (i.e., a class that is defined in the target software design), then we add an instantiate edge between ClassA and ClassB since it is likely that ClassA creates the instance that its operation returns. Similarly, for each class attribute whose type is a user-defined class, we add instantiate and aggregate edges between the class and the attribute. 5.1.3 Mutation Operators for Manipulating the Graph The core of each individual in a GP population is its transformation tree. The transformation tree comprises transformation nodes and information nodes, and it is responsible for modifying the individual’s design graph (and, therefore, the target software design). The specific changes that are made to the design graph depend on the type of nodes in the transformation tree and their position in the tree. The trans- formation nodes in this approach are implementations of the minitransformations proposed by O Cinnéide [27]. Since the minitransformations are defined as func- tions, they can be represented naturally as nodes in the tree-based representation used in genetic programming. Each transformation node requires input to complete its work. This input is mostly provided by child information nodes, although other transformation nodes can provide input as well. Each information node is a reference to a specific element in the design graph. The information nodes represent classes, interfaces, and oper- ations The child nodes are ordered from left to right, the role that each child node plays depends on its position in that order. For example, the EncapsulateConstruction 68 transformation node uses its left child node as the Creator class and its right child node as the Product class. In the language of genetic programming, these infor- mation nodes are terminal nodes,3 since they do not have child nodes and thus represent the termination of a branch of the tree. 'Ifansformation nodes, conversely, are non-terminal nodes that have one or more child nodes. Transformation Tree Node Typing Rules The ECJ framework enforces strict typing rules on transformation tree nodes, thus guaranteeing that each transformation tree is valid by construction. This guarantee reduces the amount of processing overhead incurred by the GP, as it does not need to verify a transformation tree before executing it. We now present a description of the typing rules that were developed for this approach. Our approach defines the following primitive node types: ClassNode, IfaceNode, OperNode, StringNode, RootNode, and nil (a dummy type). Each transformation tree in our approach has a single RootNode at its root. Both ClassNode and IfaceNode nodes can be children of the RootNode, and thus our approach de— fines a C1ass0rIfaceNode set type that enables the RootNode to accept both types as children. The following list specifies the child types that are accepted by each primitive node, along with the node’s return type. The first six nodes correspond to the transformation nodes discussed in Chapter 4. AbstractAccess Child nodes: 1. ClassN ode 2. ClassNode 3There is one exception: the RootNode is a non-terminal information node. 69 3. IfaceNode Returns: IfaceNode Abstraction Child nodes: 1. Cl assNode 2. StringNode Returns: IfaceNode Delegation Child nodes: 1. ClassNode Returns: ClassNode EncapsulateConstruction Child nodes: 1. ClassNode 2. ClassNode Returns: ClassNode PartialAbstraction Child nodes: 1. ClassNode 70 2. StringNode Returns: ClassNode Wrapper Child nodes: 1. If aceNode Returns: ClassNode The remaining nodes correspond to information nodes. ClassNode Returns: C1 assNode IfaceNode Returns: If aceNode RootNode Child Nodes 1. ClassOrIfaceNode 2. ClassOrIfaceNode 3. ClassOrIfaceNode Returns: nil StringNode Returns: StringNode 71 5.1.4 Implementing the QMOOD Metrics Suite After an individual has modified the target software design, the quality of the mod- ified design needs to be evaluated. Our approach begins this evaluation by applying the QMOOD metrics suite to the design. We implemented the individual QMOOD metrics as simple graph algorithms that analyze the structure of an design graph to discover the number of distinct class hierarchies in the target design, the average number of public methods across all classes, and so on. The software designs that are used in our initial experiments do not contain class attributes, and therefore our QMOOD computation does not consider the Cohesion Among Methods of a Class (CAM) metric. However, this is a limitation of the software designs and not of the approach. The values of the remaining ten metrics are used to compute a real-valued base fitness that represents the design’s overall quality, taking into account quality characteristics that include readability, flexibility, and so on. In this implementa— tion, each of these characteristics is given equal weight. However, the specific vector of weights that are assigned can be tuned for the application or domain in question. 5.1.5 Detecting Design Patterns with PROLOG and SQL The second step of evaluating a modified target software design is to identify any new design pattern instances that have evolved in the design. To facilitate this detection step, we implemented two design pattern detectors. The first detector uses PROLOG, a logic programming language, and the second uses the Structured Query Language (SQL). The detector implementations are based on the open source Java-based packages JLogic4 and HSQLDB5, respectively. Both detectors require the target software design to be translated into a set of facts. In the context of our approach, facts are predicates that specify a property of the target software design, 4JLogic is available at http://jlogic.sourceforge .net/ 5HSQLDB is available at http : //hsqldb . org/ 72 such as “Automobile is-a class”. Once the target software design is represented as a collection of facts, the design pattern detectors can perform an exhaustive search for design pattern instances. The design pattern detectors use queries (sometimes called rules) to search for facts that meet certain criteria. As a simple example, a query might return all classes that have an inheritance relationship with another class, an association relationship with yet another class, and contain more than two operations. Using these criteria, we specify queries that accurately identify design pattern instances. The PROLOG rules (queries) used in this thesis are adapted from Kramer and Prechelt [21], and the SQL queries are adapted from the work of Birkner [4]. An exhaustive list of these queries is given in Appendix A. Through experimentation, we observed that our SQL detector can scan a software design for design pattern instances more than 100 times faster than the PROLOG detector. However, despite producing equivalent results, we note that PROLOG queries are easier to read and more intuitive than the corresponding SQL queries, and therefore we use the PROLOG queries in this discussion. 5.1.6 Providing a Domain-Specific Vocabulary During the course of their execution, several transformation nodes (e.g., Wrapper) introduce new classes, interfaces, or operations. These new design elements must have unique names, and therefore a mechanism is needed to provide names that are relevant to the design. Ensuring that a sensible name is chosen for a new design element is a nontrivial problem. For the approach proposed in this thesis, we assume that a requirements document is available for the target software design. We then generate a list of nouns from the requirements document and use it as an additional input for the GP. 73 There are two cases when a name is required for a new design element. First, when a new StringNode is created, it selects a random noun from the list to use as its name. This value is then passed to the StringNode’s parent node, which is typically a transformation node. Second, several of the transformation nodes create new design elements that are closely related to existing elements. For example, a Delegation node might create a DrawingComponent class as a container for dele- gating some functionality from an existing Drawing class. Since the new component class (DrawingComponent) is tightly coupled to the Drawing class, we designed the Delegation transformation node to reuse its name. In this scenario, the transfor— mation node does not need to select a new name from the list of nouns. If the chosen name already exists in the design, then a numerical suffix is appended to create a unique name (e.g., DrawingComponent) might become DrawingComponent2. While this solution ensures that unique string-based names can be assigned to new design elements, we note that the solution is ad-hoc in nature and requires developer review to ensure that the chosen noun is sensible. 5.2 Run-time Environment and Parameters The experiments presented in this chapter were run on a 128—node, 1024-core cluster of 2.3GHz machines with 8GB of memory running SuSE Linux Enterprise Server 10 (TM) with kernel version 2.6.16. For each model, we conducted five independent runs using random seed values in the range from 1 to 5. We refer to each of these runs as a trial. The random seeds are used by the Java random number generator and are instrumental in generating the initial population of random individuals, as well as determining the subset of those individuals that will be mutated. Performing multiple trials helps to control for edge cases that are not representative of an average run of the GP. Only the best individual in a given trial is considered in our results. 74 For all of the experiments in this paper, the mutation and recombination of individuals proceeds as follows. When individuals are selected to be placed into the next generation, 90% of those individuals will be mutated and then recombined with another individual. The remaining 10% of selected individuals will be placed into the next generation unchanged. These probabilities are commonly used in genetic programming-based approaches [20]. 5.3 Hypotheses For each of the experiments discussed in this chapter, we present two hypotheses: the null hypothesis takes a no—ehange perspective, stating that the proposed approach will have no discernible effect on the software design being refactored (e.g., in terms of the overall design quality or number of design pattern instances). The alternative hypothesis, conversely, states that the approach, when applied to a software design, has a discernible and significant effect on the design. The experiments that follow are designed to test the proposed approach in a way that will enable us to reject one hypothesis and accept the other. 5.4 Experiments In order to evaluate the proposed approach and explore the effect of various parame- ter values on its ability to evolve a diverse set of design pattern instances in software designs, we developed a series of five experiments. The first four experiments vali- date the approach on a set of small UML models, most of which contain 10 to 15 classes. These models were originally used for a study of using case-based reasoning to automatically apply design patterns [13]. For example, one model represents the high-level workings of an accounting office, while another represents the design of a computer game. The set comprises 58 models in total, and each model represents a 75 software system that solves a unique problem. We document our process of selecting appropriate GP parameters (e.g., rewards and penalties) by describing the experi- ments that we conducted to compare the effects of various parameter values. The fifth experiment validates the approach on a larger UML model of a web-based repos- itory of software models. The input models (software designs) for these experiments contain only UML class diagrams. In experiments I-IV, we ran the GP for 100 generations with a population size of 100 individuals. The tournament size was 7 individuals. These parameters are typical for a genetic programming experiment and tend to provide enough running time and population diversity for a GP to discover high-quality solutions [37]. 5.4.1 Experiment I: No Rewards or Penalties (Baseline) The objective of this first experiment set is to evaluate our approach on a diverse set of models that were constructed without the use of design patterns, thus assessing its ability to introduce design pattern instances in such a scenario. For each of the 58 models, we performed the following experiment. Setup and Parameters. In this experiment, the coefficients for the patternReward, nodeCountPenalty, and matchingSequencesReward terms in the fitness function were 0.0 in order to evaluate the efficacy of the implementation without any rewards or penalties. Null Hypothesis (Base Case): There is no change with respect to the number of design pattern instances in each trial when the GP terminates; i.e., none of the trials contains any new design pattern instances. Alternative Hypothesis: A majority of the models contain a newly-evolved de- sign pattern instance in at least one trial when the GP terminates. 76 Results. The number and type of design pattern instances that evolved during this experiment are given in Table 5.1.. The mean number of transformation nodes in the best individuals’ transformation trees at the end of the run was 5.94. There were 3103 new design pattern instances that evolved during the run, and 40 of the 58 (69%) models evolved at least one new design pattern instance. This is an average of 10.7 newly-evolved design pattern instances per trial. From these initial results, it is clear that the approach is capable of constructing new design pattern instances in existing software designs. Design Pattern Name Number of Instances Abstract Factory 1 Adapter 176 Bridge 7 Composite 24 Decorator 66 Prototype 2758 Proxy 71 Total 3103 Table 5.1: Experiment I: Evolved Design Pattern Instances The trials in this experiment set ran for five minutes on average. The mean number of classes in the models was 10.9 at the beginning of the run and 77.1 at the end, showing a strong increase due to the nature of the minitransformations and the introduction of design patterns. Our detector found an average of 9.8 design pattern instances at the start of the run. Of the 290 trials (58 models and five seeds per model), there were 166 trials that did not evolve a design pattern instance. However, 40 of the 58 models evolved at least one new design pattern instance. Therefore, we reject the null hypothesis and accept the alternative hypothesis. In the remaining 77 experiments, we explore various options for increasing the frequency of evolution of design pattern instances. 5.4.2 Experiment II: Penalizing Large GP Trees One of the design goals for the proposed approach is to generate sequences of refac- toring steps of a reasonable size for a software engineer to understand and, if au- tomation is not desired, to apply manually. Each individual in a GP population contains a transformation tree that, in turn, contains transformation nodes that modify the target software design. Roughly speaking, the number of refactoring steps represented by an individual’s transformation tree is equal to the number of transformation nodes (minitransformations) in the tree. Thus, in order to encourage a population of individuals to generate shorter sequences of refactoring steps, in this experiment set we assign each individual a penalty proportionate to the number of transformation nodes. For all treatments in this experiment, this penalty is the product of a coefficient (e.g., 0.0025), the number of transformation nodes in the individual’s transformation tree, and the QMOOD quality of the individual. The intent of this penalty is to apply a small, persistent pressure on individuals to evolve transformation tree that contain fewer transformation nodes. Null Hypothesis (Base Case): There is no decrease in the number of trans- formation nodes when individuals are given a fitness penalty proportionate to the number of transformation nodes, as compared with Experiment 1. Alternative Hypothesis: There is a decrease in the number of transformation nodes when individuals are given a fitness penalty proportionate to the number of transformation nodes, as compared with Erperiment I. 78 First Treatment: 0.0025 Coefficient Setup and Parameters. In this first treatment, we choose the value 0.0025 for the coefficient of the nodeCountPenalty term in the fitness function. While the resulting penalty is small, we believe that it will provide a persistent and effective pressure toward trees with fewer transformation nodes. The coefficients for the patternReward and matchingSequencesReward terms are both set to 0.0. Results. The number and type of design pattern instances that evolved during this experiment are given in Table 5.2. The mean number of transformation nodes in the best individuals’ transformation trees at the end of the run was 6.87, a slight increase in comparison to the control, Experiment I, and the average fitness of these best individuals was 3.40, a sharp decrease. Design Pattern Name Number of Instances Abstract Factory 1 Adapter 239 Bridge 9 Composite 45 Decorator 20 Prototype 2681 Proxy 133 Total 3128 Table 5.2: Experiment II, First Treatment: Evolved Design Pattern Instances 79 Second Treatment: 0.025 Coefficient Setup and Parameters. In this treatment, we choose the value 0.025 for the coefficient of the nodeCountPenalty term in the fitness function. The coefficients for the patternReward and matchingSequencesReward terms are both set to 0.0. Results. The number and type of design pattern instances that evolved during this experiment are given in Table 5.3. The mean number of transformation nodes in the best individuals’ transformation trees at the end of the run was 5.86, and the average fitness of these best individuals was 2.57. Each model contained at least one new design pattern instance at the end of the run. In addition, the mean number of transformation nodes decreased slightly with respect to the control (5.94 in Experiment I). Design Pattern Name Number of Instances Abstract Factory 4 Adapter 166 Bridge 5 Composite 10 Decorator 36 Prototype 1845 Proxy 73 Total 2139 Table 5.3: Experiment II, Second Treatment: Evolved Design Pattern Instances 80 Third Treatment: 0.25 Coefficient Setup and Parameters. In this treatment, we choose the value 0.25 for the coefficient of the nodeCountPenalty term in the fitness function. The coefficients for the patternReward and matchingSequencesReward terms are both set to 0.0. Results. The number and type of design pattern instances that evolved during this experiment are given in Table 5.4. The mean number of transformation nodes in the best individuals’ transformation trees at the end of the run was 0.67, and the average fitness of these best individuals was 1.05, both of which are a sharp decrease from the control (Experiment I). Design Pattern Name Number of Instances Abstract Factory 9 Adapter 38 Bridge 0 Composite 0 Decorator 36 Prototype 198 Proxy 134 Total 415 Table 5.4: Experiment II, Third Tmatment: Evolved Design Pattern Instances Discussion In this experiment, we demonstrated that applying a penalty in proportion to the number of transformation nodes in an individual’s transformation tree is an effective mechanism for limiting the growth of the tree. There was no observed difference in the running time of these trials compared with those in Experiment I. The average 81 number of transformation nodes across all of the models for each treatment is given in Table 5.5. For comparison, the average number of transformation nodes from Experiment I is given as a control and is shown in the table as treatment C. While the greatest decrease was observed in Tmatment 3, we believe that this penalty is too strong and will not provide enough flexibility for the GP to discover useful solutions. Therefore, we choose the smaller coefficient value of 0.025 for CNCP. By keeping the number of transformation nodes at a reasonable level, we ensure that the sequences of refactorings generated by our approach can reasonably be applied by hand. Treatment CNcp # of Trans. Nodes C 0.0 5.94 1 0.0025 6.87 2 0.025 5.86 3 0.25 0.67 Table 5.5: Experiment II: Number of Transformation Nodes N ow that we have examined the effect of a penalty for longer sequences of refac- toring steps, the remaining experiments in this chapter will use the value 0.025 for the nodeCountPenalty term in the GP fitness function. Using this coefficient in the fitness function resulted in transformation trees that had an average of 5.86 transfor- mation nodes. We believe that this number of design transformations is reasonable for a designer to understand and apply by hand, and it is large enough to facilitate the construction of design pattern instances. 82 5.4.3 Experiment III: Rewarding Evolution of Design Pat- tern Instances In this section, we turn to the problem of 110w to encourage individuals to evolve a greater number of design pattern instances. We examine this problem by introducing a new reward that is the product of a real-valued coefficient and the QMOOD quality of an individual. Effectively, this reward improves an individual’s fitness by a fixed percentage that is equal to the value of the coefficient. For example, a coefficient of 0.2 provides a fitness boost that is equal to 20% of the individual’s base fitness. We examine several values for the coefiicient in this experiment set that comprises four treatments. Null Hypothesis (Base Case): There is no increase in the number of design pat- tern instances when individuals are given a fitness bonus for design pattern presence, as compared with Treatment 2 from Experiment 11. Alternative Hypothesis: There is an increase in the number of design pattern instances when individuals are given a fitness bonus for design pattern presence, as compared with Treatment 2 from Experiment II. First Treatment: 0.125coeflicient Setup and Parameters. In this first treatment, we choose the value 0.125 for the coefficient of the patternReward term in the fitness function. The coefficient for the nodePenalty term is 0.025, and the coefficient for the matchingSequencesReward term is 0.0. Results. The number and type of design pattern instances that evolved during this experiment are given in Table 5.6. The mean number of transformation nodes in the best individuals’ transformation trees at the end of the run was 6.42. Of 83 the 58 models, 2 showed a statistically significant increase in the number of design patterns compared with Treatment 2 from Experiment II. Design Pattern Name Number of Instances Abstract Factory 5 Adapter 248 Bridge 14 Composite 44 Decorator 57 Prototype 3546 Proxy 109 Total 4023 Table 5.6: Experiment III, First Treatment: Evolved Design Pattern Instances Second Treatment: 0.2fi5 Coefficient Setup and Parameters. In this treatment, we choose the value 0.25 for the coefficient of the patternReward term in the fitness function. The coefficient for the nodePenalty term is 0.025, and the coefficient for the matchingSequencesReward term is 0.0. Results. The number and type of design pattern instances that evolved during this experiment are given in Table 5.7. The mean number of transformation nodes in the best individuals’ transformation trees at the end of the run was 6.66. Of the 58 models, 4 showed a statistically significant increase in the number of design patterns compared with Treatment 2 from Experiment II. 84 Design Pattern Name Number of Instances Abstract Factory 9 Adapter 289 Bridge 4 Composite 37 Decorator 86 Prototype 2979 Proxy 158 Total 3562 Table 5.7: Experiment III, Second Treatment: Evolved Design Pattern Instances Third Treatment: 0.5 Coefficient Setup and Parameters. In this treatment, we choose the value 0.5 for the co— efficient of the patternReward term in the fitness function. The coefficient for the nodePenalty term is 0.025, and the coefficient for the matchingSequencesReward term is 0.0. Results. The number and type of design pattern instances that evolved during this experiment are given in Table 5.8. The mean number of transformation nodes in the best individuals’ transformation trees at the end of the run was 6.68. Of the 58 models, 7 showed a statistically significant increase in the number of design patterns compared with Treatment 2 from Experiment II. Fourth Treatment: 1.0 Coefficient Setup and Parameters. In this treatment, we choose the value 1.0 for the co- efficient of the patternReward term in the fitness function. The coefficient for the 85 Design Pattern Name Number of Instances Abstract Factory 4 Adapter 365 Bridge 2 Composite 29 Decorator 31 Prototype 3080 Proxy 174 Total 3685 Table 5.8: Experiment III, Third Treatment: Evolved Design Pattern Instances nodePenalty term is 0.025, and the coefficient for the matchingSequencesReward term is 0.0. Results. The number and type of design pattern instances that evolved during this experiment are given in Table 5.9. The mean number of transformation nodes in the best individuals’ transformation trees at the end of the run was 6.79. Of the 58 models, 17 showed a statistically significant increase in the number of design patterns compared with Tmatmcnt 2 from Experiment II. Fifth Treatment: 2.0 Coefficient Setup and Parameters. In this treatment, we choose the value 2.0 for the co- efficient of the patternReward term in the fitness function. The coefficient for the nodePenalty term is 0.0025, and the coefficient for the matchingSequencesReward term is 0.0. Results. The number and type of design pattern instances that evolved during this experiment are given in Table 5.10. The mean number of transformation nodes 86 Design Pattern Name Number of Instances Abstract Factory 8 Adapter 360 Bridge 15 Composite 47 Decorator 89 Prototype 4018 Proxy 259 Total 4796 Table 5.9: Experiment III, Fourth Treatment: Evolved Design Pattern Instances in the best individuals’ transformation trees at the end of the run was 5.85. Of the 58 models, 9 showed a statistically significant increase in the number of design patterns compared with Treatment 2 from Experiment II. Design Pattern Name Number of Instances Abstract Factory 14 Adapter 400 Bridge 12 Composite 75 Decorator 63 Prototype 3454 Proxy 247 Total 4265 Table 5.10: Experiment III, Fifth Treatment: Evolved Design Pattern Instances 87 Discussion In this experiment, we considered several values for the coefficient of the patternReward term in the fitness function. There was no observed difference in the running time of these trials compared with those in Experiments I and II. While in- creasing the patternReward coefficient did not result in a consistently higher number of design pattern instances, it is clear from the results that there is a significant in- creasing trend in the number of instances, and more design pattern instances evolved in comparison to Experiment II. Therefore, we reject the null hypothesis and accept the alternative hypothesis. The number of models showing a statistically significant increase (p < 0.05) in the number of design pattern instances is shown for each treatment in Table 5.11. Based on the results, we observe that the patternReward coefficient value 1.0 produces the best combination of design pattern diversity and average number of new design patterns instances, and a greater number of models showed a. statistically significantly higher number of evolved design pattern instances when using the coefficient value 1.0. We use this coefficient value for the remaining experiments. Treatment C p R # of Models 1 0.125 2 2 0.25 4 3 0.5 7 4 1.0 17 5 2.0 9 Table 5.11: Experiment III: Models with Significant Increase in DP Instances 88 5.4.4 Experiment IV: Rewarding Specific Sequences of Mini- transformations Now that we have considered the coefficients for the nodeCountPenalty and patternReward terms in the fitness function, we turn our attention to the final term: matchingSequencesReward. This reward is given to individuals whose trans- formation trees contain transformation nodes in specific sequences that are known to produce design pattern instances, as discussed in Section 4.4.3. The objective of this reward is to encourage individuals to use these sequences. However, the reward must be small enough that individuals can explore other sequences. Null Hypothesis (Base Case): There is no increase in the number of design pat- tern instances when individuals are given a fitness bonus for using specific sequences of transformation nodes, as compared with Treatment 4 of Experiment III. Alternative Hypothesis: There is an increase in the number of design pattern instances when individuals are given a fitness bonus for using specific sequences of transformation nodes, as compared with Treatment 4 of Experiment III. First Treatment: 0.025 Coefficient Setup and Parameters. In this treatment, we choose the value 0.025 for the coefficient of the matchingSequencesReward term in the fitness function. The coef— ficient for the nodePenalty term is 0.025, and the coefficient for the patternReward term is 1.0. Results. The number and type of design pattern instances that evolved during this experiment are given in Table 5.12. The mean number of transformation nodes in the best individuals’ transformation trees at the end of the run was 6.87. Of 89 the 58 models, 1 showed a statistically significant increase in the number of design patterns compared with Treatment 4 from Experiment III. Design Pattern Name Number of Instances Abstract Factory 6 Adapter 315 Bridge 13 Composite 88 Decorator 105 Prototype 4026 Proxy 219 Total 4772 Table 5.12: Experiment IV, First Treatment: Evolved Design Pattern Instances Second Treatment: 0.25 Coefficient Setup and Parameters. In this treatment, we choose the value 0.25 for the coeffi- cient of the matchingSequencesReward term in the fitness function. The coefficient for the nodePenalty term is 0.025, and the coefficient for the patternReward term is 1.0. Results. The number and type of design pattern instances that evolved during this experiment are given in Table 5.13. The mean number of transformation nodes in the best individuals’ transformation trees at the end of the run was 6.86. Of the 58 models, 2 showed a statistically significant increase in the number of design patterns compared with Treatment 4 from Experiment III. 90 Design Pattern Name Number of Instances Abstract Factory 18 Adapter 423 Bridge 6 Composite 85 Decorator 74 Prototype 4291 Proxy 267 Total 5164 Table 5.13: Experiment IV, Second Treatment: Evolved Design Pattern Instances Third Treatment: 0.5 Coefficient Setup and Parameters. In this treatment, we choose the value 0.5 for the coeffi— cient of the matchingSequencesReward term in the fitness function. The coefficient for the nodePenalty term is 0.025, and the coefficient for the patternReward term is 1.0. Results. The number and type of design pattern instances that evolved during this experiment are given in Table 5.14. The mean number of transformation nodes in the best individuals’ transformation trees at the end of the run was 6.90. Of the 58 models, 3 showed a statistically significant increase in the number of design patterns compared with Treatment 4 from Experiment 111. Fourth Treatment: 1.0 Coefficient Setup and Parameters. In this treatment, we choose the value 1.0 for the coeffi- cient of the matchingSequencesReward term in the fitness function. The coefficient 91 Design Pattern Name Number of Instances Abstract Factory 5 Adapter 288 Bridge 1 Composite 55 Decorator 27 Prototype 4445 Proxy 251 Total 5072 Table 5.14: Experiment IV, Third Treatment: Evolved Design Pattern Instances for the nodePenalty term is 0.025, and the coefficient for the patternReward term is 1.0, Results. The number and type of design pattern instances that evolved during this experiment are given in Table 5.15. The mean number of transformation nodes in the best individuals’ transformation trees at the end of the run was 7.23. Of the 58 models, 4 showed a statistically significant increase in the number of design patterns compared with Treatment 4 from Experiment III. mh Treatment: 2.0 Coefficient Setup and Parameters. In this treatment, we choose the value 2.0 for the coeffi— cient of the mat chingSequencesReward term in the fitness function. The coefficient fOr the nodePenalty term is 0.0025, and the coefficient for the patternReward term is 0.5. Results. The number and type of design pattern instances that evolved during this experiment are given in Table 5.16. The mean number of transformation nodes 92 Design Pattern Name Number of Instances Abstract Factory 19 Adapter 447 Bridge 11 Composite 54 Decorator 92 Prototype 4127 Proxy 346 Total 5096 Table 5.15: Experiment IV, Fourth Treatment: Evolved Design Pattern Instances in the best individuals’ transformation trees at the end of the run was 7.14. Of the 58 models, 2 showed a statistically significant increase in the number of design patterns compared with Treatment 4 from Experiment III. Design Pattern Name Number of Instances Abstract Factory 21 Adapter 400 Bridge 25 Composite 99 Decorator 83 Prototype 3541 Proxy 343 Total 4512 Table 5.16: Experiment IV, Fifth Treatment: Evolved Design Pattern Instances 93 Discussion In this experiment, we considered several values for the coefficient of the matchingSequencesReward term in the fitness function. There was no observed difference in the running time of these trials compared with those in Experiments I-III. Based on the results, we observe a steady increase in the number of evolved design pattern instances when the reward is increased. Therefore, we reject the null hypothesis and accept the alternative hypothesis. The number of models showing a statistically significant (p < 0.05) increase in the number of design patterns is shown for each treatment in Table 5.17. The largest number of models showed a statistically significant increase in Treatment 4. We choose the value 1.0 as the matchingSequencesReward coefficient for the remaining experiment. Treatment CMSR # of Models 1 0.125 1 2 0.25 2 3 0.5 3 4 1.0 4 5 2.0 2 Table 5.17: Experiment IV: Models with Significant Increase in DP Instances 5.4.5 Experiment V: REMODD Case Study This experiment set applies the approach to a model of the Repository for Model- Driven Development, or REMODD [11]. REMODD is a web-based repository for model-driven engineering artifacts, including models, test cases, documentation, code, etc. The design for REMODD contains 23 classes. This experiment is de- signed to test the approach on a medium-sized software design that already contains design pattern instances. 94 Null Hypothesis (Base Case): There is no change with respect to the number of design pattern instances in this design; i.e., none of the trials contains a design pattern instance. Alternative Hypothesis: At least one trial contains a newly-evolved design pat- tern instance when the GP terminates. Setup and Parameters. In this experiment, we ran the GP for 25 generations with a population size of 100 individuals. The tournament size was 7 individuals. Drawing from the conclusions of the previous experiments, the coefficients for the patternReward, nodeCountPenalty, and matchingSequencesReward terms in the fitness function were 0.5, 0.025, and 0.5, respectively. Results. The trials for this experiment took an average of six days to complete. The longer running time is largely due to the increased complexity (i.e., a greater number of classes and associations between them) of the ReMoDD design compared with the designs in the first experimental set. This increased complexity requires the design pattern detector to search a much larger space for possible design patterns, and the running time scales accordingly. The number of classes in the model was 23 at the beginning of the run, and the mean number of classes at the end was 38.8. This is a more significant increase than we saw in the first experimental set. However, this software design contains twice as many classes as an average design in the first experimental set, and this creates a larger set of inputs for the minitransformations. As a result, we expect to see a larger number of design pattern instances evolving in each trial, and this tends to require a greater number of classes. Our detector found 136 candidate design pattern instances in the original design. Table 5.18 depicts the number and type of design pattern instances that evolved during the experiment. The mean number of transformation nodes in the best individuals was 4.50. There 95 were a total of 61 new design pattern instances that evolved during the run. This is an average of 12.20 design pattern instances per trial. Each of the five trials evolved at least one design pattern instance. Based on these results, at least one trial evolved a new design pattern instance. Therefore, we reject the null hypothesis and accept the alternative hypothesis. Design Pattern Name Number of Instances Abstract Factory 2 Adapter 13 Bridge 3 Composite 0 Decorator 2 Prototype 26 Proxy 15 Total 61 Table 5.18: Experiment V: Evolved Design Pattern Instances 5.5 Discussion The contrast among trials with respect to the number of evolved design pattern instances reflects the role of randomness in the initial population of individuals, as well as the multi-dimensional nature of the solution space. The trials that did not evolve new design pattern instances consistently produced solutions with a higher QMOOD value, and subsequently a higher fitness value, than other trials for the same model. This suggests that despite improvements in quality that often accom- pany the introduction of design patterns, their introduction is not always the best course of action when one is trying to improve design quality. However, it must be 96 noted that these results only use one quality model (i.e., QMOOD) and therefore may not be representative of all quality models. We observe that specific types of design patterns are either over— or under- represented in these results. For example, the Prototype design pattern appears very often in experiments I-IV, and it is also the most prolific design pattern type in the final experiment. This distribution is due to the differing complexity of the design patterns. For example, an instance the Bridge design pattern comprises more classes and associations than an instance of the Adapter design pattern, and this increases the likelihood that an instance of Adapter will evolve. The results of these experiments are encouraging. They show that the proposed approach scales to medium-sized software designs. Furthermore, the variety of de- sign pattern types that evolved provides the human developer multiple options for improving the target software design, and each evolved design pattern instance pro- vides a step-by-step procedure for implementing the instance, thus reducing the manual design (or redesign) labor that the developer must endure. 5.6 Summary In this chapter, we presented experimental validation for the approach that is pro— posed in this thesis. First, we discussed the prototype implementation that was de- veloped to validate the approach. Next, we discussed the experimental parameters, hypotheses, and terms that were used throughout the chapter. We then presented the series of experiments that were conducted to explore the parameter spaces and to identify the most suitable parameter values. Finally, we presented the results of a real-world case study and discussed the consequences of the experimental results. 97 Chapter 6 Conclusions and Future Work 6. 1 Summary In this thesis, we presented an approach for automated improvement of software de- sign that leverages evolutionary computation, software engineering metrics, and the introduction of design patterns. Specifically, we proposed the use of O Cinnéide’s minitransformations as mutation operators in a GP and the use of software engineer- ing metrics to guide the process of selecting individuals to survive into subsequent generations. The minitransformations were originally discovered as part of a search for mini-patterns, or design patterns within design patterns [27]. The mini-patterns, it was hoped, would enable us to develop better automated refactoring techniques that could, for example, automatically introduce design patterns into a design that was constructed without them. By recognizing that these minitransformations are function-like constructs that can be composed to generate novel refactorings, we were able to construct a genetic programming-based approach for deciding the best sequence of refactorings to apply to an existing design. Through a set of five experiments, we showed that our approach is capable of simultaneously improving the quality of a software design (as measured by software engineering metrics) and automatically introducing design pattern instances, a com- 98 bination that has not previously been considered. First, we demonstrated that the approach is feasible and can construct instances of design patterns in existing de- signs with guidance from the QMOOD metrics suite only. Next, we identified three optimizations that can be used to control the number and type of design patterns that evolve. One of our design goals was to ensure that a human software engineer could understand the entire set of refactoring steps that our approach recommended. Therefore, the first optimization penalizes individuals in proportion to the number of transformation nodes (minitransformations) in their transformation trees, thus re- ducing the number of suggested refactoring steps on average. We demonstrated the effectiveness of this optimization in the second experiment. The second optimization introduced a reward for individuals that evolved at least one design pattern instance. The amount of reward was a fraction of the QMOOD quality value for the design, and we demonstrated in the third experiment that this reward tends to increase the number of design pattern instances that evolve. The third optimization introduces a reward for sequences of minitransformations that were identified by O Cinnéide as being capable of creating design pattern instances. This reward is small and con- stant (i.e., not a percentage based on the individual’s QMOOD fitness), and thus provides a small pressure that allows individuals to explore the search space with- out being unduly penalized. In the fourth experiment, we demonstrated that this reward increases the number, as well as the type, of design pattern instances that evolve. The first four experiments helped us to explore and tune the parameters of our experimental setup, and we applied the chosen parameters to a case study in the fifth experiment. The case study applied our approach to a web-based repository for model-driven development artifacts known as REMODD. As the results of the fifth experiment demonstrate, the proposed approach scales to medium-sized designs with tens of classes. 99 6. 2 Impact We believe that this work has practical applications as a tool for software engineer- ing students and as a design assistance tool for software engineers. Our approach makes suggestions regarding the Gamma design patterns that can be applied to an existing design and provides a well-defined series of refactoring steps to construct those instances. A student who is learning about the concepts design patterns could download a tool based on this approach, and that tool would help to identify por- tions of the student’s software designs that can benefit from the application of design patterns. Similarly, one could implement this approach as a plugin for a software modeling tool (e.g., the Eclipse Modeling Framework). This tool could provide real—time design advice to a software engineer who is considering the use of design patterns to improve maintainability and reusability. 6. 3 Future Work Our ongoing research looks at several open questions related to this work, and we Consider these questions in this section. Our approach is designed to be modular and supports the replacement of its var- iOus components. However, identifying the optimal combination of design patterns, metrics, and design change mechanisms for a given problem is nontrivial. Therefore, We are considering other combinations of design patterns, software engineering met- rics, and design change mechanisms in order to find the combinations that are most Effective for specific problem domains or design needs. The design change mechanisms used in this work focus exclusively on the con— Struction of design pattern instances. We would like to explore the use of traditional refactorings (e.g., moving a method from one class to another) as design change mechanisms in addition to minitransformations of O Cinnéide. It is crucial that 100 these refactorings are behavior-preserving. This addition might enable the approach to make fine-grained changes to the target software design and, in turn, produce a design of higher quality than what is currently possible. Similarly, we selected the QMOOD metrics suite as the sole measure of quality for evolving software designs. Since these metrics are the measurement tool that enables the GP to select the best individuals to survive, it follows that our choice of metrics influences the type and number of design pattern instances that evolve. Thus, we are exploring other suites of metrics in order to perform a comparison of fitness for purpose from which we can recommend a metrics suite based on the problem domain or software design under consideration. Fourth, we plan to consider different sets of software engineering metrics, such as those available in DesignAdvisor [3]. Studying a multitude of metrics in this aPPI‘oach will improve our understanding of how sets of refactorings interact with Various metrics and design patterns, thus enhancing our ability to tailor the approach to the specific needs of a given problem domain or user. The experiments presented in this thesis considered only the individual with the greatest fitness in each trial. Since the proposed approach focuses on the introduction of design pattern instances, we plan to modify the GP to examine multiple highly-fit individuals (e.g., the individuals with the five highest unique fitness values in the trial). This will enable us to identify useful individuals that evolved design pattern iIlStances but did not have the highest fitness. During our experimentation for this t»hesis, we observed useful design pattern instances evolving in individuals that did not have the best fitness, and therefore these instances did not appear in the final results. By exploring a larger group of highly-fit individuals, we may discover richer Sets of useful refactorings than what the current implementation provides. 101 APPENDICES 102 Appendix A Queries for Design Pattern Detection This section contains the Prolog and SQL queries that were developed in order to facilitate detection of design pattern instances. Each query detects a different type Of design pattern instance. The queries, along with the type of pattern instance that they detect, are presented in Table A.1. A-1 PROLOG Queries Abstract Factory a:bs‘tractjactory (AFact , CFact , AProd , CProd , Client) :- cls(AFact), cls(CFact), cls(AProd), cls(CProd), cls(Client), inherit(CFact,AFact), inherit(CProd,AProd), instantiate(CFact,CProd), fcall(Client,CProd), AFact \= CFact, AProd \= CProd, AFact \= Client, CFact \= Client, AProd \= Client, CProd \= Client, AFact \= AProd, AProd \= CFact. 103 Adapter adapter(Client, Adapter, Adaptee, Target) :- cls(Client), cls(Adapter), cls(Adaptee), cls(Target), fcall(Client, Target), inherit(Adapter,Target), fcall(Adapter, Adaptee), Client \= Adapter, Client \= Adaptee, Client \= Target, Adapter \= Adaptee, Adapter \= Target, Adaptee \= Target. Bridge bridge(Abstraction, RefAbstraction, Implementor, ConcImplementor) :- inherit(RefAbstraction, Abstraction), inherit(ConcImplementor, Implementor), aggregate(Abstraction, Implementor), Abstraction \= RefAbstraction, Abstraction \= Implementor, Abstraction \= ConcImplementor, RefAbstraction \= Implementor, RefAbstraction \= ConcImplementor, Implementor \= ConcImplementor. Composite composite(Client, Component, Leaf, Composite) :- fcall(Client,Component), inherit(Leaf, Component), inherit(Composite, Component), aggregate(Composite, Component), Client \= Component, Client \= Leaf, Client \= Composite, Component \= Leaf, Component \= Composite, Leaf \= Composite. 104 Decorator decorator(Component, ConcComponent, Decorator, ConcDecorator) :- inherit(ConcComponent, Component), inherit(Decorator, Component), inherit(ConcDecorator, Decorator), Component \= ConcComponent, Component \= Decorator, Component \= ConcDecorator, ConcComponent \= Decorator, ConcComponent \= ConcDecorator, Decorator \= ConcDecorator. Prototype prototype(ProtAbs, ProtCon, InstClass, Oper) :- cls(ProtAbs), cls(ProtCon), cls(InstCIass), inherit(ProtCon, ProtAbs), Operation(Oper), own(ProtAbs,0per), instantiate(InstClass, ProtAbs), ProtAbs \= ProtCon, ProtAbs \= InstClass, ProtCon \= InstClass. Proxy Proxy(ReaISubject, Subject, Proxy) :— inherit(RealSubject, Subject), fcall(Proxy, RealSubject), RealSubject \= Subject, RealSubject \= Proxy, Subject \= Proxy. 105 A.2 SQL Queries Pattern Type SQL Query Abstract Factory SELECT DISTINCT * FROM tClass cAb- sProd, tClass cAbsFact, tClass cConProd, tClass cConFact JOIN tInherit tInheritFact ON tInherit— Fact.sinkchbsFact.name JOIN tlnstantiate ON tIn- stantiate.sourceztlnheritFact.source JOIN tInherit tInher- itProd ON tInheritProd.sourceztInstantiate.sink WHERE tInheritFact.sink l: tInheritProd.sink AND tInherit- Fact.source l: tInheritProd.source AND tInheritFact.source = cConFact.name AND tlnstantiatesink = cConProd.name AND tInheritProd.sink = cAbsProd.name Adapter SELECT DISTINCT * FROM tClass cClient, tClass cAdapter, tClass cAdaptee, tClass cTarget JOIN tCall callClientTarget ON (cClient.nan'iezcallClientTarget.source AND cTarget.name=callClientTarget.sink) JOIN tIn- herit inheritAdapterTarget ON (inheritAdapter- Target.sink=cTarget.name AND inheritAdapterTar- get.source=cAdapter.name) JOIN tCall callAdapter- Adaptee ON (callAdapterAdaptee.source=cAdapter.name AND callAdapterAdaptee.sink=cAdaptee.name) WHERE cAdapter.name !2 cClient.name AND cAdaptee.name l: cClient.name AND cAdaptee.name !2 cAdapter.name AND cTarget.name l: cClient.name AND cTarget.name != cAdapter.name AND cTarget.name != cAdaptee.name 106 Bridge SELECT DISTINCT * FROM tClass cAbstraction, tClass cRefinedAbstraction, tClass cImplementor, tClass cConcreteImplementor JOIN tInherit inheritRefinedAb— stractionAbstraction ON (inheritRefinedAbstraetionAb— straction.sourcechefinedAbstraction.name AND inheritRe- finedAbstractionAbstraction.sink=cAbstraction.name) JOIN tInherit inheritConcreteImplementor ON (inheritConcreteIm- plementor.source=cConcreteImplementor.name AND inherit- Concretelmplementor.sinkzclmplemcntor.name) JOIN tAg— gregate aggregateAbstractionImplementor ON (aggregateAb— stractionImplementor.sourcechbstraction.name AND ag- gregateAbstractionlmplementor.sink=clmplementor.name) WHERE cRefinedAbstraction.name l: cAbstraction.name AND cImplementor.name l: cRefinedAbstraction.name AND cConcreteImplementor.name !2 cAbstraction.name 107 Composite SELECT DISTINCT * FROM tClass cClient, tClass cCompenent, tClass cLeaf, tClass cComposite JOIN tCall callClientComponent ON (callClientCompo— nent.soureechlient.name AN D callClientCompo— nent.sinkchomponent.name) JOIN tInherit inheritLeaf- Component ON (inheritLeafComponent.source=cLeaf.name AND inheritLeafComponent.sink=cComponent.name) JOIN tInherit inheritCompositeComponent ON (inheritComposite- Component.sourcc=cComposite.name AND inheritCompos- iteComponent.sinkchomponent.narne) JOIN tAggregate aggregateCompositeComponent ON (aggregateCompos- iteComponent.source=cComposite.name AND aggregate- CompositeComponent.sink=cComponent.name) WHERE cCompenent.name !2 cClient.name AND cLeaf.name != cClient.name AND cLeaf.name !2 cCompenent.name AND cComposite.name l: cClient.name AND cComposite.name !2 cLeaf.name AND cComposite.name !2 cCompenent.name 108 Decorator SELECT DISTINCT * FROM tClass cCompenent, tClass cConcreteComponent, tClass cDecorator, tClass cConcreteDecorator JOIN tInherit inheritConcreteCompo— nentComponent ON (inheritConcreteComponentCompo- nent.source=cConcreteComponent.name AND inheritCon— creteComponentComponent.sinkchomponent.name) JOIN tInherit inheritDecoratorComponent ON (inheritDecora— torComponent.sourcechecorator.name AND inheritDec— oratorComponent.sink=cComponent.name) JOIN tInherit inheritConcreteDecoratorDecorator ON (inheritConcreteDec— oratorDecoratorsourcezcConcreteDecorator.name AND inheritConcreteDecoratorDecorator.sink=cDecorator.name) WHERE cConcreteComponent.name != cCompenent.name AND cDecorator.name !2 cCompenent.name AND cDeco- rator.name !2 cConcreteComponent.name AND cConcret- eDecorator.name !2 cCompenent.name AND cConcret- eDecorator.name !2 cConcreteComponent.name AND cConcreteDecorator.name l: cDecorator.name 109 Prototype SELECT DISTINCT * FROM tClass protAbs, tClass protCon, tClass cInstClass JOIN tInherit ON tInherit.source=protCon.name JOIN tOwn ON tOwn.source=tInherit.sink JOIN tOperation ON tOperation.name=tOwn.sink JOIN tlnstantiate ON (tInstantiate.sourcezclnstClass.name AND tlnstan- tiate.sink=protAbs.name) WHERE tInherit.sink = protAbs.name AND protAbs.name l: protCon.name AND protAbs.name l: cInstClass.name AND protCon.name !2 cInstClass.name Proxy SELECT DISTINCT * FROM tClass cRealSubject, tClass cSubject, tClass cProxy JOIN tInherit in- heritRealSubjectSubject ON (inheritRealSubjectSub— ject.sourcc=cRealSubject.name AND inheritRealSubject- Subject.sink=cSubject.name) JOIN tCall callProxyRealSub— ject ON (callProxyRealSubject.source=cProxy.name AND callProxyRealSubject.sink=cRealSubject.name) WHERE cSubject.name l: cRealSubject.name AND cProxy.name != cRealSubject.name AND cSubject.name !2 cRealSub— ject.name Table A.1: SQL Queries for Design Pattern Detection 110 Appendix B Transcript of the ReMoDD GP Run This chapter contains a transcript of the output from our prototype implementation from Experiment V (the REMODD case study) presented in Chapter 5. Following the transcript, we discuss an instance of the Abstract Factory design pattern that evolved during the run of Experiment V. Architecture: x86_64 | ECJ I An evolutionary computation system (version 18) I By Sean Luke I URL: http://cs.gmu.edu/"eclab/projects/ecj/ I Mail: ecj-hepocs.gmu.edu I (better: join ECJ-INTEREST at URL above) I Date: June 23, 2008 I Current Java: 1.6.0_14 / Java HotSpot(TM) 64-Bit Server VM-14.0-b16 I Required Minimum Java: 1.3 111 Threads: breed/1 eval/l Seed: 5 Job: 0 Setting up Processing GP Types Processing GP Node Constraints Processing GP Function Sets Processing GP Tree Constraints Determining Tree Sizes [...1 Compiling Distributions WARNING: In function set f0 for the GPTreeConstraints tcO, no *nonterminals* are given with the return type StringNode which is required by other functions in the function set or by the tree’s return type. This may or may not be a problem for you. PARAMETER: gp.tc.0 WARNING: In function set f0 for the GPTreeConstraints tcO, no terminals are given with the return type nil which is required by other functions in the function set or by the tree’s return type. Nearly all tree-builders in ECJ require the ability to add a terminal of any type for which there is a nonterminal, and at any time. Without terminals, your code may not work. One common indication that a tree-builder has failed due to this problem is if you get the MersenneTwister error ’n must be positive’. PARAMETER: gp.tc.0 112 Initializing Generation 0 Seed used for refactoring: 5 Input file: Models/ReMoDD.xmi Tree size penalty: 0.0025 DP Reward Coefficient: 0.5 MT Reward Coefficient: 1.0 Prototype individual set up [ ... ] --- Graph construction complete --- Pattern detector initialized QMOOD: 3.2699559 Patterns: 136 { Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Prototype Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Proxy Proxy Proxy Proxy Proxy Proxy Proxy Proxy Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Adapter Proxy Proxy Proxy Proxy Proxy Proxy Proxy Proxy Proxy Proxy Proxy Proxy Proxy Proxy Proxy Decorator Decorator Decorator 113 Decorator Decorator Decorator Decorator Decorator Decorator Decorator Decorator Decorator Decorator Decorator Decorator Decorator Decorator Decorator Decorator Decorator } (IVl,|E|): (167,373) lClasses at startl = 69 Subpop 0 best fitness of Generation 1 Subpop 0 best Generation 2 Subpop 0 best Generation 3 Subpop 0 best Generation 4 Subpop 0 best Generation 5 Subpop 0 best Generation 6 Subpop 0 best Generation 7 Subpop 0 best Generation 8 Subpop 0 best Generation 9 Subpop 0 best Generation 10 Subpop 0 best Generation 11 fitness fitness fitness fitness fitness fitness fitness fitness fitness fitness of of of of of of of of of of generation: generation: generation: generation: generation: generation: generation: generation: generation: generation: generation: 114 Fitness: Fitness: Fitness: Fitness: Fitness: Fitness: Fitness: Fitness: Fitness: Fitness: Fitness: .511621 .608894 .195335 .316859 .810745 .3174 .414059 .4795265 .538723 .022783 .522734 Subpop 0 best Generation 12 Subpop 0 best Generation 13 Subpop 0 best Generation 14 Subpop 0 best Generation 15 Subpop 0 best Generation 16 Subpop 0 best Generation 17 Subpop 0 best Generation 18 Subpop 0 best Generation 19 Subpop 0 best Generation 20 Subpop 0 best Generation 21 Subpop 0 best Generation 22 Subpop 0 best Generation 23 Subpop 0 best Generation 24 Subpop 0 best fitness fitness fitness fitness fitness fitness fitness fitness fitness fitness fitness fitness fitness fitness of of of of of of of of of of of of of of generation: generation: generation: generation: generation: generation: generation: generation: generation: generation: generation: generation: generation: generation: 115 Fitness: Fitness: Fitness: Fitness: Fitness: Fitness: Fitness: Fitness: Fitness: Fitness: Fitness: Fitness: Fitness: Fitness: .737719 .737986 .737588 .737986 .737986 .783194 .786055 .786582 .846434 .847186 .847186 .847324 .845243 .849785 Subpop 0 best fitness of run: Fitness: 9.849785 Subpop 0 best fitness of run: Fitness: 9.849785 Subpop 0 tree size of best individual: 19 Subp0p O MTs in best individual: 9 Subpop O graph (IVI,IEI) of best individual: (173,380) lClasses at endl = 75 New pattern instances in best individual: 24 Sample Design Pattern Instance Next, we consider an instance of the AB- STRACT FACTORY design pattern that evolved during the run whose transcript is shown above. A graphical depiction of this instance is shown in Figure B.1. Each rectangle represents a class in the ReMoDD design. For ease of understanding and post-processing of results, the string “” is appended by the GP to each class that plays a role in the design pattern instances, and only those role-playing classes are shown in the figure. The specific roles and the classes that play them are given in Table B1. The class ArrayList belongs to the Java standard library. We can understand the process that created the PAbstractRandomWord class by analyzing its name. First, the “RandomWord” suffix was produced by a StringNode that re- trieved a random noun from the GP’s domain-specific vocabulary that was discussed in Section 5.1.6. This noun was returned to the parent node of the StringNode. One of the StringNode’s ancestor nodes was a PartialAbstraction node that prefixed “RandomWord” with “PAbstract” (an abbreviation of the node type), thus result- ing in the final name: “PAbstractRandomWord”. This analysis is confirmed by the transformation tree depicted in Figure B2, which is taken from the best individual of the GP run described in this chapter. 116 /—\ Viewer User ArrayList BRIT PAbstractRandomWord l Figure B.1: Sample Abstract Factory Design Pattern Instance Role Class Filling the Role Abstract Factory User Concrete Factory Viewer Abstract Product ArrayList Concrete Product PAbstractRandomWord Table B.1: Role-playing Classes in Sample Abstract Factory Instance 117 Auuozsovcmmvouozmnwuum Asmeieflmveeozeefio Auuozaoucmmvouozmcfiuum Auoefiogfleaamgveeozm33 Auuoonvcmmvonozmcfiuum AHOHUO>V0©OmeMHU AUHDOwwVUUOZOOflwH Auuozeoucmmvovozmcwuum Aommnmumovovozmmeau Auuozsoocwmvouozmcflnum Ahnuu¢JUMHHuH¢vou02mmmao AmunoomvouozoomuH coauouuumncamfiuuem :oAuomuumnmameuuem AONDOOwVOUOZUUMNH :oauomuumnmamauumm AOHSUQmVQUOZOUcHH coauomuumndawwuumm coauomuumn (UML:StructuralFeature.type> (UML:StructuralFeature.type> . (UML:Generalization.parent> 129 P Y g P Y 136 137 139 (UML:StructuralFeature.type> (UML:StructuralFeature.multiplicity) 141 ’false’ isAbstract 8 ’false’ isActive 143 145 (UML:BehavioralFeature.parameter> 153 155 158 (UML:AssociationEnd.multiplicity> (UML:AssociationEnd.participant) (UML:AssociationEnd.multiplicity> (UML:AssociationEnd.participant> (UML:AssociationEnd.multiplicity> (UML:AssociationEnd.participant> (UML:AssociationEnd.multiplicity> 162 (UML:AssociationEnd.multiplicity) (UML:AssociationEnd.multiplicity> (UML:AssociationEnd.participant> 170 171 172 1h 174 -8000:OOOOOOOOOOOOOBEF’/> -8000:00000000000008F3’/> ~8000:00000000000008FC’/> -8000:0000000000000901’/> -8000:000000000000090C’/> -8000:0000000000000912’/> 175 (UML:AssociationEnd.participant> P Y 8 P Y 178 179 (UMLzAssociationEnd.participant> (UML:AssociationEnd.multiplicity> ’ I ; «a (UML:GeneralizableElement.generalization> (UML:AssociationEnd.multiplicity> 194 195 (UML:AssociationEnd.participant> (UML:AssociationEnd.multiplicity> 202 (UML:AssociationEnd.multiplicity> (UML:StructuralFeature.multiplicity> "ID-«J.- .IV“ '4 . .,: ' (UML:AssociationEnd.multiplicity> (UML:AssociationEnd.participant> (UML:AssociationEnd.participant) (UML:AssociationEnd.participant) 211 212 (UML:Generalization.parent> (UML:AssociationEnd.multiplicity> (UML:StructuralFeature.multiplicity> 215 (UML:GeneralizableElement.generalization> (UML:Generalization.parent> (UML:AssociationEnd.multiplicity> (UML:AssociationEnd.participant> (UML:AssociationEnd.multiplicity> i ... (UML:AssociationEnd.participant> (UML:AssociationEnd.multiplicity> 227 (UML:Generalization.parent> 228 229 230 (UML:AssociationEnd.participant) (UML:AssociationEnd.participant> 232 (UML:CompositeState.subvertex> 239 REFERENCES 240 REFERENCES [1] H. Albin-Amiot, P. Cointe, Y.G. Gueheneuc, and N. Jussien. Instantiating and Detecting Design Patterns: Putting Bits and Pieces Together. In Proceedings of ASE, 2001. [2] J. Bansiya and CG Davis. A hierarchical model for object-oriented design quality assessment. IEEE Transactions on Software Engineering, 28(1):4—17, 2002. [3] Berenbach, B., Hartman, J. DesignAdvisor, A UML-based Architectural Design Tool. Siemens Corporate Research, 2002. [4] Marcel Birkner. Object-Oriented Design Pattern Detection using Static and Dynamic Analysis in Java Software. Master’s thesis, University of Applied Sciences Bonn-Rhein—Sieg, Sankt Augustin, Germany, 2007. [5] LC. Briand, J .W. Daly, and J. Wiist. A unified framework for cohesion mea- surement in object-oriented systems. Empirical Software Engineering, 3(1):65- 117, 1998. [6] RP. Brooks. The mythical man—month: essays on software engineering. Addison-Wesley, 1995. - [7] SR. Chidamber and CF. Kemerer. A metrics suite for object oriented design. IEEE Transactions on software engineering, 20(6):476—493, 1994. [8] J. Clarke, JJ Dolado, M. Harman, R. Hierons, B. Jones, M. Lumkin, B. Mitchell, S. Mancoridis, K. Rees, M. Roper, et al. Reformulating software engineering as a search problem. IEEE Proceedings - Software, 150(3):161—175, 2003. [9] Eclipseorg. Eclipseorg java development environment. http : / / eclipse . org/ . [10] AE Eiben and M. Schoenauer. Evolutionary computing. Information Processing Letters, 82(1):186, 2002. [11] R. France, J. Bieman, and B.H.C. Cheng. Repository for model driven devel- opment (REMODD). Lecture Notes in Computer Science, 4364:1311, 2007. [12] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design patterns: elements of reusable object-oriented software. Addison-Wesley Reading, MA, 1995. 241 [13] [141 [15] [16] [18] [19] [20] [21] [22] [23] [24] [25} P. Comes, F.C. Pereira, P. Paiva, N. Seco, P. Carreiro, J. Ferreira, and C. Bento. Selection and Reuse of Software Design Patterns Using CBR and WordNet. In Proc. of the 15th Int ’1 Conf. on Software Engineering and Knowledge Engineer- ing (SEKE 2003). San Francisco, volume 296, 2003. P. Games, PC. Pereira, P. Paiva, N. Seco, P. Carreiro, J.L. Ferreira, and C. Bento. Using CBR for Automation of Software Design Patterns. Lecture Notes in Computer Science, pages 534—548, 2002. M. Harman and BF. Jones. Search-based software engineering. Information and Software Technology, 43(14):833~839, 2001. Mark Harman and John A. Clark. Metrics are fitness functions too. In Proceed- ings of the 10th IEEE International Symposium on Software Metrics (MET— RI CS ’04), pages 58—69. IEEE Computer Society, 2004. C. He, F. He, K. He, J. Liu, and W. Tu. RoleOf Relationship and Its Meta Model for Design Pattern Instantiation. Lecture Notes in Computer Science, 35842642, 2005. M. Hitz and B. Montazeri. Chidamber and Kemerer’s metrics suite: a mea- surement theory perspective. IEEE Transactions on Software Engineering, 22(4):267—271, 1996. S.U. Jeon, J.S. Lee, and DH. Bae. An automated refactoring approach to design pattern-based program transformations in java programs. In Proceedings of the Ninth Asia-Pacific Software Engineering Conference, pages 337—345. IEEE Computer Society Washington, DC, USA, 2002. J .R. Koza. Genetic programming: on the programming of computers by means of natural selection. The MIT press, 1992. C. Kramer, C. GmbH, and L. Prechelt. Design recovery by automated search for structural design patterns in object—oriented software. In Proceedings of WCRE, volume 96, page 208, 1996. S. Markovié and T. Bar. Refactoring OCL annotated UML class diagrams. Software and Systems Modeling, 7(1):25-47, 2008. T. Mens. On the use of graph transformations for model refactoring. Lecture Notes in Computer Science, 41432219, 2006. T. Mens, S. Demeyer, and D. Janssens. Object-oriented refactoring using graph rewriting. Technical report, Technical Report vub—prog-tr-02-01, Vrije Univer— siteit Brussel, 2001. 213, 2001. U. Nickel, J. Niere, and A. Zundorf. The F UJABA environment. In Interna- tional Conference on Software Engineering, volume 22, pages 7428745, 2000. 242 [26] U. N ilsson and J. Maluszynski. Logic, programming and Prolog. Wiley, 1990. [27] M. O Cinnéide. Automated Application of Design Patterns: A Refactoring Approach. PhD thesis, University of Dublin, Trinity College, 2001. [28] M. O Cinnéide and P. Nixon. A methodology for the automated introduction of design patterns. In Software Maintenance, 1.999.(ICSM’99) Proceedings. IEEE International Conference on, pages 463—472, 1999. [29] M. O Cinnéide and P. Nixon. Automated software evolution towards design patterns. In Proceedings of the 4th International Workshop on Principles of Software Evolution, pages 162—165. ACM New York, NY, USA, 2001. [30] M. O’Keeffe and MO. Cinnéide. A stochastic approach to automated design improvement. In Proceedings of the 2nd international conference on Principles and practice of programming in Java, pages 59—62. Computer Science Press, Inc. New York, NY, USA, 2003. [31] M. O’Keeffe and M. O Cinnéide. Search-based refactoring: an empirical study. Journal of Software Maintenance and Evolution: Research and Practice, 20(5), 2008. [32] M. O’Keeffe and M. O Cinnéide. Search-based refactoring for software mainte— nance. The Journal of Systems 63 Software, 81(4):502v 516, 2008. [33] M.K. O’Keeffe and M. O Cinnéide. Getting the most from search-based refac- toring. In Proceedings of the 9th annual conference on Genetic and evolutionary computation, pages 111481120. ACM Press New York, NY, USA, 2007. [34] W.F. Opdyke. Refactoring Object- Oriented Frameworks. PhD thesis, University of Illinois, 1992. [35] M. OKeeffe and M. O Cinnéide. Search-based software maintenance. In Confer- ence on Software Maintenance and Reengineering (CSMR06), pages 249—260, 2006. [36] J. Paakki, A. Karhinen, J. Gustafsson, L. Nenonen, and A.I. Verkamo. Soft- ware metrics by architectural pattern mining. In Proceedings of the Interna- tional Conference on Software: Theory and Practice (16th IFIP World Com- puter Congress), pages 325-332, 2000. [37] R. Poli, WB Langdon, and NP. McPhee. A field guide to genetic programming. Lulu Enterprises Uk Ltd, 2008. [38] L. Prechelt, B. Unger, W.F. Tichy, P. Brossler, and L.G. Votta. A controlled ex- periment in maintenance comparing design patterns to simpler solutions. IEEE Transactions on Software Engineering, pages 113481144, 2001. 243 [39] B. Schulz, T. Genssler, B. Mohr, W. Zimmer, and El. Karlsruhe. On the computer aided introduction of design patterns into object-oriented systems. In Technology of Object-Oriented Languages, 1908. TOOLS 27. Proceedings, pages 258—267, 1998. [40] O. Seng, J. Stammel, and D. Burkhart. Search-based determination of refactor- ings for improving the class structure of object-oriented systems. In Proceedings of the 8th annual conference on Genetic and evolutionary computation, pages 1909—1916. ACM New York, NY, USA, 2006. [41] N. Shi and RA. Olsson. Reverse engineering of design patterns from Java source code. Computer, 2007. [42] University of Waterloo Software Architecture Group. SWAG Toolkit. http://swag.uwaterloo.ca, 2008. [43] G. Sunye, D. Pellet, Y. Le 'I‘raon, and J .M. Jezequel. Refactoring UML Models. Lecture Notes in Computer Science, pages 134—148, 2001. [44] L. Tahvildari and K. Kontogiannis. A metric-based approach to enhance de- sign quality through meta-pattern transformations. In Proc. European Conf. Software Maintenance and Reeng, pages 183—192, 2003. [45] Tigrisorg. Argouml modeling tool. http://argouml.tigris.org/. [46] M. Torchiano. Documenting Pattern Use in Java Programs. In Proceedings of ICSM 2002 (International Conference on Software Maintenance), pages 230— 233, 2002. [47] L. Torvalds et al. The Linux operating system. http:/ / www.1inux.org/ . [48] A. Tiifu, O. Seng, and T. Genssler. Automated design flaw correction in object- oriented systems. In Software Maintenance and Reengineering, 2004. CSMR 2004. Proceedings. Eighth European Conference on, pages 174—183, 2004. [49] N. Tsantalis, A. Chatzigeorgiou, G. Stephanides, and ST Halkidis. Design pattern detection using similarity scoring. IEEE Transactions on Software En- gineering, 32(11):896—909, 2006. [50] JR Warmer and AC. Kleppe. The Object constraint language. Addison- Wesley, 1998. [51] W. Zimmer. Relationships between design patterns. Pattern languages of pro- gram design, 1:345-364, 1995. 244 Mlcllmlsllllllllll[[[Vl[l[|][|[[[|[|[|]|[ES 3 1293 03003 5506