138

 

1Hams
l

)“P'
511-"

57,:25493

This is to certify that the
thesis entitled

Specifying Compositional Semantic Functions For Non-
Hierarchical Languages Using Natural Deduction Systems

presented by
Heather J. Goldsby

has been accepted towards fulﬁllment
of the requirements for the

M. S. degree in Computer Science

Major Professor's Signa'ture
¥/29/a%

Date

 

MSU is an Afﬁrmative Action/Equal Opportunity Institution

flo44v__

 

 

 

LIBRARY

Michigan State

University

 

 

PLACE IN RETURN BOX to remove this checkout from your record.
To AVOID FINES return on or before date due.
MAY BE RECALLED with earlier due date if requested.

 

DATE DUE

DATE DUE

DATE DUE

 

' ‘

 

 

 

 

 

 

 

 

 

 

 

 

 

6/01 c:/ClRC/DateDue.p65-p.15

 

SPECIFYING COMPOSITIONAL SEMANTIC FUNCTIONS
FOR NON-HIERARCHICAL LANGUAGES
USING NATURAL DEDUCTION SYSTEMS

By
HEATHER J. GOLDSBY

A THESIS

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

MASTER OF SCIENCE
Department of Computer Science

2004

ABSTRACT

SPECIFYING COMPOSITIONAL SEMANTIC FUNCTIONS
FOR NON-HIERARCHICAL LANGUAGES
USING NATURAL DEDUCTION SYSTEMS

By
HEATHER J. GOLDSBY

UML-based integrated development environments (IDES) enable designers to cre-
ate and type-check UML designs. Some of these IDES can generate implementation
language code; whereas others translate diagrams into a formal notation that can
be analyzed by automatic veriﬁcation techniques. All of these approaches, however,
involve the translation of a UML diagram (or set of related diagrams) into some alter-
native representation. As UML matures and is used in a wider variety of application
domains, we expect the number of such translations to increase. This paper describes
a notation for specifying custom UML translations, from which lightweight transla-
tors can be generated. Moreover, we provide a set of algorithms organized into a tool

for ensuring these speciﬁed translations are well-formed.

To my Grammie, Jane Goldsby, who always said that education is something no one
can take away from you

iii

Acknowledgements

I wish to thank Dr. Stirewalt for teaching me nearly everything I know about
computer science and writing. As I complete this masters and look forward to pursing
my doctorate, I realize I have so much more to learn, but I am looking forward to
the process. Also, Min Deng and Dr. Betty Cheng whose research is correlated with
what is presented here and helped me so much along the way.

Finally, my parents, Douglas and Jill Goldsby, who were there when I needed
them them most and supported me through all of my difﬁcult times. I couldn’t have

made it through without them.

iv

Table of Contents

LIST OF FIGURES

1

2

Introduction

Background

2.1 UML and Metamodels ..........................
2.1.1 Class Diagram Notation .....................
2.1.2 Formally Representing a Metamodel ..............

2.2 Programming Language Theory .....................
2.2.1 Semantic Emotions ........................
2.2.2 Expressions ............................

2.3 Natural Deduction Systems .......................
2.3.1 Inference Rules ..........................
2.3.2 Derivations ............................

Compositional Semantics for Non-Hierarchical Languages

3.1 Problem: Assigning Compositional Semantics to Non-Hierarchical
Language Expressions ..........................

3.2 Solution: Hierarchical Abstract Syntax Based On Projections .....
3.2.1 Projection .............................
3.2.2 Hierarchical Abstract Syntax ..................

Specifying Semantic Functions Using NDS

4.1 Semantic Functions Operating Over Projection Type ASTs ......
4.2 Optimization ...............................
4.3 Procedure for generating an LN DS ...................
4.4 Beneﬁt: Symmetric Premise .......................
4.5 Formally Representing LN DS ......................

LN DS Example

5.1 Type layer .................................
5.2 Impl layer .................................
5.3 Example derivation ............................

Validation

6.1 Automatically checking if an LN DS is well-formed ...........
6.1.1 Adding Marked Associations ...................
6.1.2 Collapsing Generalizations ....................
6.1.3 Eliminating Reﬂexive Aggregations ...............
6.1.4 Checking if a Graph is Cyclic ..................

Related Work

Conclusion

22

22
23
23
27

31
31
34
36
37
39

44
44
46
48

52
52
54
56
57
58

6O

62

LIST OF REFERENCES

vi

64

2.1
2.2
2.3
2.4

3.1
3.2
3.3
3.4

4.1
4.2
4.3
4.4
4.5
4.6

5.1
5.2
5.3
5.4
5.5
5.6
5.7

6.1
6.2
6.3
6.4
6.5
6.6

List of Figures

UML metamodel ............................. 6
Expression Data Model Example .................... 17
Sample NDS ................................ 19
Sample Derivation 'Itee .......................... 20
Projection of an Expression Data Model Example ........... 25
Projection Types ............................. 27
small class diagram ............................ 29
ASG to AST conversion ......................... 29
Example NDS Built on HAS ....................... 32
Example Derivation applying N DS ................... 33
C++ AST ................................. 33
Example LN DS .............................. 35
Example Derivation applying LNDS ................... 37
Example of Symmetric Premises ..................... 38
LNDS Example (1) ............................ 45
LN DS Example (2) ............................ 46
LNDS Example (3) ............................ 47
LNDS Derivation Ttee Example ..................... 49
UML ASG ................................. 49
Constructed C++ AST .......................... 50
Code resulting from application of rule Classmp, ............ 51
HIJ language ............................... 53
HIJ LNDS ................................. 53
HIJ language after adding marked associations ............. 56
HIJ language after collapsing generalizations .............. 57
HIJ language after removing reﬂexive aggregations ........... 58
HIJ graph to be checked for cycles .................... 59

vii

Chapter 1: Introduction

UML-based integrated development environments (IDES) enable designers to cre-
ate and type-check UML designs. Some IDES can generate code from a UML model.
For example Rational Rose [31] and Poseidon [20] can generate stubbed class deﬁni-
tions from class diagrams, and some research tools can translate state diagrams into
formal speciﬁcations that can be analyzed using automatic veriﬁcation techniques
[27, 25, 21]. In addition to fully automated generation, many object-oriented design
methods include heuristics for reﬁning UML class diagrams into code or database
artifacts [16, 5]. All of these approaches involve the translation of a UML diagram
(or set of related diagrams) into some alternative representation. As UML matures
and is used in a wider variety of application domains, we expect the number of such
translations to increase. Each translation will need to be implemented by a translator,
which will need to be programmed and integrated into an IDE. This thesis describes
a notation for specifying custom UML translations, from which translators can be
generated.

This research builds upon the results of the Amalia project, which addresses how
to integrate formal analysis capability into IDES [36, 12, 37, 13]. Using the Amalia tool
suite, a designer declaratively Speciﬁes an analysis with axioms and inference rules
organized in a natural deduction system (N DS) [14]. Morevoer, the Amalia generator
produces analyzers, which compute behavioral analysis as a side effect of traversing
the internal representation of speciﬁcations in a formal language (e.g., LOTOS and
linear temporal logic), from a NDS. Unfortunately, it is not clear how to use N DS
to specify UML translations, thus we cannot directly use the Amalia tool suite to
generate UML translators.

Intuitively, a translation is a semantic function, i.e., a function that maps pro-

grams or models, such as UML diagrams, to target artifacts, such as C++ class stubs,

Promela speciﬁcations, or relational database schemas. Traditionally, semantic func-
tions are designed compositionally, which means that the meaning of a program is
deﬁned in terms of the meaning of its subprograms. Compositional semantic speciﬁ-
cations are designed to assume that the syntax of the source language is hierarchical,
by which we mean programs in the language can be represented as tree structures
[34]. UML’S syntax is not hierarchical [15]; hence its diagrams are represented not as
trees, but rather as labeled graphs. Thus, it is not obvious how to deﬁne a semantic
function whose domain consists of UML diagrams.

To deﬁne a semantic function that operates over UML diagrams, we transform
the labeled graphs to trees and then use traditional techniques to define a semantic
function that operates over these trees. Thus, we can think of a semantic function
whose domain consists of labeled graphs as the composition of two functions: 1) a
function that transforms the labeled graphs to trees; 2) a semantic function whose
domain consists of those trees. The primary contribution of this thesis is a method
for writing these semantic functions in such a way that the intermediary step of
constructing the trees is implicit.

To accomplish this, we introduce the notion of a projection, which is a collection
of types that provide a limited View of the objects in a labeled graph. Projections form
trees, which are elaborated in a demand driven fashion by incrementally traversing
the graph structure being projected. Thus, the implementation of a semantic function
over projections will, as a side effect of traversing (and therefore elaborating) a pro—
jection tree, traverse the nodes and edges of the labeled graph. In fact, we Show how
such a function can be implemented without explicitly instantiating the projection
trees.

This thesis contributes: (1) a method for using N DS to write semantic functions
that assign meaning to labeled graphs without explicitly converting the graphs to

trees, and (2) a tool that checks whether these speciﬁcations are well-formed. The re-

mainder of this paper is organized as follows. We ﬁrst introduce the key background
components. Speciﬁcally, we introduce the UML notation, programming language
theory, and N DS (Chapter 2). We then formally deﬁne a method for projecting la-
beled graphs. (Chapter 3). Next, we deﬁne a method for writing semantic functions
that operate over these projections. We then introduce a syntactic extension to N DS
that allows us to write semantic functions that operate over labeled graphs. (Chap-
ter 4). Speciﬁcations written using our syntactic extension of NDS have the potential
to be ill-formed, in which case the translator may traverse source graphs without ter-
minating. We provide algorithms for checking that semantic functions written using
our extensions are well-formed (Chapter 6) and conclude with a discussion of lessons

learned and future work (Chapter 8).

Chapter 2: Background

Our overall goal is to write semantic functions from which we can automatically
generate UML translators. This thesis contributes a method for compositionally as-
signing meaning to non-hierarchical language diagrams, represented as labeled graphs,
and a method for implementing such speciﬁcations using graph traversal algorithms.
We rely upon the contributions of [11] for extending the syntax of the target language
so as to simplify semantic speciﬁcation. In addition, our work builds upon the results
and terminology of three primary ﬁelds: UML modeling (Chapter 2.1), program-
ming language theory (Chapter 2.2), and NDS (Chapter 2.3). We build upon
the terms and techniques of programming language theory, which is concerned with
Specifying and implementing semantic functions that operate over trees, to specify
and implement semantic functions that operate over labeled graphs. In the sequel,
we present algorithms, which analyze if such a semantic Speciﬁcation is well formed,
by appealing to a formalization of the metamodel of the source language. We present

this formalization in Chapter 2.1.2.

2.1 UML and Metamodels

The United Modeling Language (UML) [7] is a collection of twelve diagram lan-
guages used for specifying and documenting the artifacts of an object-oriented soft-
ware system. We are interested in translating UML class diagrams, which illustrate
the static structure of a system in terms of classes and relationships between classes.
In addition to being the source language of our translations, we also use UML to
represent the syntax of the various languages involved in translation (including UML
itself) using metamodels, which are class diagrams that model the syntax of a lan-

guage. For example, Figure 2.1 depicts a metamodel of the UML class diagram

langauge, which we adapted from [5]. We now introduce the class diagram notation

and a formal representation of the metamodel in Z [35].

neonate 1:2: n3 8&3

82022

 

            
            

.......... ms cozuaougaasao O

F" anaemia:

=0: . £220 . cumuoucsom

 

 

4
F u o2? 338.630.

   

O"

             

a
mEszoEgo

E

  
 

       

 

 

  

£s>usm=csms 8.80253
...o «BEES . . no «Bans»
mEEanoawa %
,. 0:32.55.
232.5332»... SEE—.0 . o

  

 

4
oEmZoSntzm
m3ntt<

 

83529. al]|]4
Elms];
25.8.23 L.

oEuzcoﬁemEn.
REESE.

 

2.1.1 Class Diagram Notation

A UML class diagram depicts a collection of classes and relationships among these
classes. A class is used to denote a set of objects with similar properties. Graphically
a class is depicted as a rectangle. Classes may have zero or more attributes, which
are listed below the name of the class within the rectangle. For example, Figure 2.1
depicts a class named Parameter, which has an attribute called ParameterName.

Relationships in UML are depicted as lines between classes. The generalization
relationship is used to express that one class is a more highly specialized version
of another class. The most general class is called the superclass, whereas the more
specialized classes are called the subclasses. Subclasses inherit the attributes, and re-
lationships of their superclass. Generalization relationships are graphically depicted
by a triangle attached to the superclass with lines connecting the triangle to the
subclasses. An example generalization is the relationship between Type, Domain and
Class, in which Type is the superclass and Domain and Class are the subclasses (Fig-
ure 2.1). I

The association relationship is used to express an interaction between instances of
classes. An association is depicted as a line that connects two classesl, whose instances
are said to play a role in the association. Generally, associations are directed and
hence are depicted as arrows. An association depicted as a line without an arrow is a
bidirectional association. Associations are named, and each end of an association can
be given a name to further clarify the role that objects of the adjacent class play in
the association. Each role has a multiplicity that indicates the number of instances
of the adjacent class that will be linked with a single instance of the peer class under
the association. An example of a bidirectional association is roleClass, which relates

instances of class Class and class Role (Figure 2.1).

 

1It is possible to have an association that relates a class to itself. Such an association is called
reﬂexive.

An aggregation is a special form of association that denotes one class is a part of
another class. Aggregations are depicted as a line connecting two different classes with
a diamond on the line adjacent to the container class. For example, the composite
association depicted in Figure 2.1 is an aggregation that relates instances of class

Aggregation to instances of class Role (Figure 2.1).

2.1.2 Formally Representing a Metamodel

In the sequel, our algorithms determine if a semantic function Speciﬁed in a
variant of NDS is well-formed by appealing to the metamodel of the source language.
Hence, we need to represent metamodels as data typesz. We have chosen to write
the speciﬁcation of the metamodel data type in Z [35] to conform to related research
projects, which are also formalizing the UML metamodel using Z [11].

Fundamentally, a metamodel is a set of classes, a set of associations and a gen-

eralization relation. To model these concepts, we introduce the following given sets.

[CHAR]

STRING == seq CHAR

The CHAR set is used to form strings from which we can represent the names of

classes, attributes, and associations.

[CL/185-10, ATTRJD, ASSOCJD]

CLASSJD, AT TRJD, and ASSOCJD comprise the identiﬁers for classes, at-
tributes and associations respectively. Elements of these sets can only be compared

for equality; we use them to deﬁne accessor functions, which, for instance, retrieve a

 

2Our formalization models only the information needed by our algorithms, not all the information
contained within metamodels.

class’s attributes given a CLASSJD.

Each class in Figure 2.1 corresponds to some element in the set CLASSJD.
When referring to a speciﬁc CLASSJD, we use the name of the corresponding class
rendered in Helvetica font. For example, Parameter refers to the CLASSJD that
corresponds to the class named “Parameter” in Figure 2.1. We adopt a similar
convention for elements of sets ASS OCJD and ATTRJD. These conventions assume
that association and attribute names are unique in a model3.

Modeling the meaning of an attribute requires a space of legal data types, which

we model with the following free type:

DATATYPE 2:: boolean | integer I string

In this formalization, the type of an attribute can be either boolean (boolean), integer
(integer), or string (string).
Each class attribute comprises a name, and a data type. We formalize this notion

with a tuple:

ATTRIBUTE 2: STRING x DATATYPE

For instance, the attribute ParameterName (Figure 2.1) is represented as the tuple
("ParameterName", string).
The accessor function getAttribute retrieves the attribute tuple that corresponds

to an attribute identiﬁer. It is formally deﬁned as follows.

[ getAttribute : ATTRJD ——> ATTRIBUTE

 

Applying the getAttribute function to the attribute identiﬁer ParameterName yields

the tuple (“ParameterName", string).

 

3This assumption is not valid for general UML class models, but it greatly simpliﬁes our treat-
ment; thus we impose the restriction on Figure 2.1.

Formally, each class comprises a name and a set of attribute identiﬁers. We

represent this in Z as follows.

CLASS == STRING x PATTRJD

For instance, the class Parameter (Figure 2.1) is represented as the tuple
("Parameter”, {ParameterName}). The accessor function getClass returns the class
tuple that corresponds to a given CLASSJD. The accessor functions getClassName
and getClassA ttrs return a string that is the class’s name and the the set of attribute
identiﬁers that comprise a class respectively. These functions are formalized as fol-

lows.

getClass : CLASSJD —+ CLASS
getClassName : CLASS ——> STRING
getClassAttrs : CLASS —> PATTRJD

getClassName = ﬁrst[STRING, lP’ ATTRJD]
getClassAttrs = second [STRING , IPATTRJD]
\7’ c1, c2 : CLASSJD; classl, class2 : CLASS 0
getClassAttrs(classl) ﬂ getClassAttrs(class2) # Q (i) classl = class2

 

 

For example, applying the function getClass to Parameter yields the tuple
(“Parameter", { ParameterName }). Applying the function getClassName to class
represented by the tuple (“Parameter" {ParameterName}) yields “Parameter" and ap-
plying getClassAttrs to the same tuple yields {ParameterName}.

We also deﬁne relation genRelation to represent generalization relationships in
the metamodel. Speciﬁcally, the genRelation relates two class identiﬁers. The ﬁrst
class identiﬁer is that of the superclass and the second class identiﬁer is that of the

subclass. Formally, this relation is speciﬁed as follows.

I genRelation : CLASSJD +—> CLASSJD

\7’ c : CLASSJD o
(c, c) ¢ genRelation+

 

10

The invariant states that there are no cycles in the generalization hierarchy. If c is a
CLASSJD, then applying genRelationG {c} D returns the set of subclasses of c. For
instance, genRelationQ {Type} I) yields the set (Domain, Class}.

Classes relate to one another by associations. There are multiple types of asso-
ciations, which we deﬁne by the free type:

ASSOCIATION_TYPE 2:: aggregation(( CLASSJD x CLASSJD»
I directed((CLASS_ID x CLASSJD»

An association can be either an aggregation with a composite class and part class, or
a directed association with a source class and a target class. We model a bidirectional
association as two directed associations. The bidirectional association roleClass (Fig-
ure 2.1) is represented as directed (Class, Role) and directed (Role, Class). Likewise,
the aggregation part (Figure 2.1) is represented as aggregation (Aggregation, Role).
We deﬁne several helper functions to return pertinent information about

ASS 0 CIA TI 0N _ TYPES. They are as follows:

getAssocSrcClass : ASSOCIATION_TYPE —+ CLASSJD
getAssocTargClass : ASSOCIATION _TYPE —> CLASSJD

V c1, c2: CLASSJD o
getAssocSrcClass aggregation(c1, c2) = Cl

 

/\

getAssocSrcClass directed (c1, c2) = c1

/\

getAssocTargClass aggregation(c1, c2) = c2
/\

 

getAssocTargClass directed (c1, c2) = c2

The functions getAssocSrcClass and getAssocTargClass when applied to an
ASSOCIATION _TYPE element return the ﬁrst and second class identiﬁer respec-
tively. For example, applying the getAssocSrcClass function to the part aggregation

(Figure 2.1), represented by the element aggregation (Aggregation, Role), yields Ag-

11

gregation. Applying the getAssoc TargClass function to the same element yields Role.

An association comprises a name and a type as follows:

ASSOCIATION == STRING x ASSOCIATION_TYPE

For example, the part aggregation is represented as the tuple (”part", aggregation
(Aggregation, Role)).

We deﬁne a function that returns the tuple representing an association when
given an association identiﬁer. We also deﬁne accessor functions to return the ﬁrst
and second elements of a tuple representing an association. They are formalized as

follows.

getAssoc : ASSOCJD —+ ASSOCIATION
getAssocName : ASSOCIATION —-+ STRING
getAssocType : ASSOCIATION —) ASSOCIATION_TYPE

Va: STRING; at: ASSOCIATION_TYPE’ o
getAssocName = ﬁrst[STRING, ASSOCIATION_TYPE]
getAssocType = second[STRING, ASSOCIATION_TYPE]

 

 

For example, applying getAssoc to ASS OCJD (part) yields (”part", (aggregation,
Aggregation, Role)).
We model a metamodel as a set of classes and a set of associations. It is formalized

as follows.

__ Metamodel
classes : IP’ CLASSJD
assocs : PASSOCJD

classes = dom getClass
assocs = dom getAssoc

 

 

 

 

12

2.2 Programming Language Theory

Our work is concerned with deﬁning semantic functions, hence we build upon
the terminology of programming language theory. However, because this thesis is
concerned with specifying semantic functions that operate over labeled graphs, which

is not addressed by programming language theory, we also introduce terms we created.

2.2. 1 Semantic Functions

The programming—language semantics community uses NDS as a formal method
for Specifying semantics [34]. Moreover, the Amalia framework is capable of gener-
ating analyzers from semantic speciﬁcations written using NDS [36]. Thus, we use
NDS to formally specify semantic functions that translate UML to an implementation
language.

Formally, a semantic function maps elements of the syntactic domain to elements
of the semantic domain. A syntactic domain is the set of trees (or labeled graphs) that
represent valid programs (or diagrams) in the source language. A semantic domain
is the set of mathematical objects used to give meaning to elements of the syntactic
domain of a language [34]. In traditional denotational semantics, the syntactic do-
main is assumed to be an algebraic data type, which is a disjoint union of types that
are possibly recursive. Instances (or terms) of algebraic data types are constructed
from a ﬁnite set of operators, which are injective functions that map elements of the
constituent type into the union. At least one of the operators in an algebraic data
type must be nullary. For example, we can deﬁne an algebraic data type Nat whose
elements are constructed using two operators:

Zero :-—) Nat
Succ : Nat -—> Nat

where Zero is a nullary operator and Succ is a unary operator. Terms in this data type

13

include Zero, Succ(Zero), Succ(Succ(Zero)), etc. A language with hierarchical syntax
can trivially be represented as algebraic data type, where the operators correspond
to the language syntax constructs and syntactically correct programs in the language
can be represented as terms.

A semantic function is deﬁned compositionally by deﬁning cases based on the
major operator of the term to which the function is applied. If the major operator is
a nullary constructor, then the function is computed based on the basic attributes of
the term. By contrast, if the major operator is a constructor with arity > 0, then the
function is computed in terms of these basic attributes and the results attained by
applying the function recursively to the term’s operands. For example, we can deﬁne
a semantic function W : Nat —-> N as follows:

W|[Zero] = 0
W[Succ(x)]] = W][x]] + 1

Applying semantic function W to term Zero yields 0. Thus, the pair (Zero, 0) is said
to be an element of W. In the sequel, we refer to such pairs as mapping assertions [13]
and represent them Zero +—-> 0 to allow easy integration into the rule format. Com-
positional deﬁnitions are useful because they encapsulate (in the cases) the meaning
of each feature of a language in isolation. Notice, however, that this beneﬁt is achiev-
able because the syntactic domain is an algebraic data type (and thus its terms are
hierarchical).

By contrast, programs and diagrams in a non-hierarchical language (such as
UML) are represented as labeled graphs, rather than algebraic terms. Labeled graphs
lack the hierarchical operator—operand structure of algebraic terms. Consequently, it
is not obvious how to deﬁne a compositional semantics for such languages.

In the sequel, we Show how to transform labeled graphs into algebraic terms, for

which a compositional semantics can then be deﬁned. The functional composition of

14

the labeled graph to algebraic term translation with the semantic function yields a
compositional semantic function for translating UML diagrams. We ultimately want
to integrate a UML translator into an existing UML modeling tool. Thus, we deﬁne
the translation from labeled graphs into algebraic terms such that the translation
can be implemented to be performed incrementally, thereby allowing the generated
UML translator to generate target code as a side effect of traversing the in-memory
UML representations (which are labeled graphs). To make these ideas more precise,
we now describe how algebraic terms and labeled graphs are implemented in tools as

linked object structures.

2.2.2 Expressions

We refer to the implementations of both algebraic data types and labeled graphs
as expressions. A language’s abstract syntax refers to a class of expressions that record
the structure of programs or Speciﬁcations in that language. Expressions are generally
implementations of algebraic data types and hence are trees; thus we shall use the
conventional term abstract syntax tree (AST) [32] to refer to expressions representing
programs or speciﬁcations or to expressions representing (syntactically well-formed)
parts of a program or a Speciﬁcation. The implementation of labeled graphs e. g., UML
diagrams, are graphs; thus we introduce the term abstract syntax graph (ASG) to refer
to expressions representing models or to expressions representing (syntactically well-
formed) parts of models. All expressions are instantiations of expression types, which
we formally deﬁne as follows.

A data model D is a ﬁnite collection of classes, such that for each class c E D:
o The public interface of c contains no mutable operations, and

o For every class c’ that appears in the signature of an operation in the interface

of c: c’ E ’D.

15

We deﬁne the relation returnsD: D H D, such that V c, c’ E D o (c, c’) E returnsD if
and only if there exists an operation op in the interface of c such that c’ is referenced
in the return type of op.

An expression data model 8 is a data model with the additional constraint that
every parameterized operation4 of every class c E 8 takes parameters that are values
of primitive domains (i.e., integer, string, etc.) and the return type is either void,
a value of a primitive domain, or a reference to an instance of a class in 8. We refer
to the classes comprising an expression data model as expression types and we refer
to operations that return references to expression types as navigation operations.

Expression types are not necessarily algebraic data types because their instances
may be graphs rather than trees.

For example, Figure 2.2 depicts an expression data model with two expres-
sion types, Class and Role. In this example, Class contains one navigation op—
eration getRoleClass, and Role contains one navigation operation, also called
getRoleClass. Observe, instances of these expression types form graphs, rather
than trees. Thus, this expression data model is not an algebraic data type.

A metamodel is a graphical depiction of an expression data model. Each class
in the metamodel corresponds to an expression type. The outgoing associations of
a given class in the metamodel are represented as the pertinent expression type’s
navigation operations. Speciﬁcally, an association with name X is represented as
a navigation operation named getX (i.e., the association name prepended with the
keyword get). Moreover, getX returns a pointer to instances of the expression type,
which corresponds to the class at the target end of association X. The navigation
operation is a member of the expression type corresponding to the source class of
the association. Observe expression types Class and Role (Figure 2.2) correspond

to classes Class and Role (Figure 2.1). Navigation operations getRoleClass corre-

 

4excluding the constructor

16

 

class Class {
public:
C1ass( const string & name, const Role* role )
_name(name); _role(role) {}

const string & getNameC) const { return Ename; }
const Role* getRoleClass() const { return xrole; }

protected:
const string _name;
const Role* _role;

Expression Type Class

class Role {
public:
Role( const string & name, const C1ass* class )
_name(name); _class(class) {}

const string & getName() const { return _name; }

const C1ass* getRoleClassC) const { return _class; }

protected:
const string _name;
const Class* _class;

Expression Type Role

Figure 2.2: Expression Data Model Example

 

l7

Sponds to association roleClass.

2.3 Natural Deduction Systems

The programming-language semantics community use N DS to specify semantic
functions that assign meaning to algebraic terms [34]. NDS is able to specify only
those semantic functions whose syntactic domain is an algebraic data type. The
implementation of these semantic functions operate over ASTS, the in-memory rep-

resentation of algebraic terms.

2.3. 1 Inference Rules

An inference rule prescribes how to derive mapping assertions for an algebraic
term represented as a tree from the mapping assertion derivable from its subtrees.

An inference rule takes the following form:

 

Cl H d1
0n H dn
[rulename]
A(Cl,...,Cn) H B(d1,...,dn)
Informally, the rule states that if the subtrees c1, ..., cn translate into d1, ..., (1,, then the

tree A(c1, ..., cn) translates into B (d1, ..., dn). Mapping assertions in the numerator of
a rule (c1 i—> d1, ..., cn i—> dm) make up the rule’s premises. A rule without premises is
called an axiom. The denominator of a rule (A(c1, ..., C") H B (d1, ..., dn)) speciﬁes
a mapping assertion, called the conclusion, that is inferred using the rule.

Each side of the conclusion is called an AS T schema because traditionally the
implementation of each semantic the function will operate over ASTS and construct

ASTS. Each AST schema comprises an operator (e. g., A, B), which is an expression

18

 

[Zero]
Zero H 0

I H y

[Succ]
Succ(x) H PIUSOTWG/l

 

Figure 2.3: Sample NDS

type, and a series of parameters called free variables (e.g., (c1, ..., cn) and (d1, ..., d,,))
. These free variables will be bound to either strings or instantiated AST schemas
called operands.

Figure 2.3 depicts a sample NDS that Speciﬁes the semantic function W ref-
erenced earlier. Speciﬁcally, the AST schemas for axiom [Zero] comprise operators
Zero and 0; neither has free variables. The source AST schema for inference rule
[Succ] comprises an operator, Succ, and a free variable, x, which is a place holder for
an operand. Similarly, the target AST schema for inference rule [Succ] comprises an
operator, PlusOne, and a free variable, y, which is a place holder for an operand. The
single premise of this inference rule maps free variable x to free variable y.

We adopt several conventions regarding the naming of operators and free vari-
ables. Because the subject operator is an expression type of the language to be
translated, it will be depicted as a class in the metamodel depiction of the language’s
abstract syntax. We name free variables that parameterize the subject after the con-
structs in the metamodel. Free variables used as place holders for operands are named
after the association that relates the class representing the subject operator to the
class representing the expression type of the operand. For example, the metamodel
for the natural numbers language would contain classes named Succ, Zero, and Nat,
and a directed association or aggregation named :1: connecting Succ to Nat. Moreover,
Succ and Zero would inherit from the abstract class Nat. Non-operand free variable

placeholders are named after attributes of the classes depicted in the metamodel.

19

[Zero]

 

Zero H O
Succ(Zero) H Plus One(0)
Succ(Succ(Zero)) H PlusOne(PlusOne(O))

 

[Succ]

[Succ]

 

Figure 2.4: Sample Derivation Tree

2.3.2 Derivations

A derivation is the process of translating a tree in the conclusion of the rule by
inductively applying inference rules to translate the sub-operands of the tree until
only axioms can be applied. At this point, the original tree can be translated. If
NDS Speciﬁes a semantic function every pair in the function, has a corresponding
derivation tree whose conclusion is the mapping assertion representing the function.
This derivation tree proves the conclusion. 7

Derivation trees are able to prove translation by relying upon a technique called
structural induction, which is a form of mathematical deduction designed for proving
properties by decomposing tree structures. The leaves of a derivation tree correspond
to applications of axioms, or applications of the inductive base cases. Each non-leaf
node is an application of an inference rule and each edge connects an instantiated
premise with the derivation tree that proves its validity, or in our case proves it
translates correctly. These are applications of the inductive cases.

For example, Figure 2.4 depicts the derivation tree that proves
(Succ(Succ(Zero())), PlusOne(PlusOne(0()))) is an element of the semantic func-
tion W. The mapping assertion representing this pair Succ(Succ(Zero())) H
PlusOne(PlusOne(O()))) is the root of the derivation tree and is depicted at the
bottom of the diagram. This mapping assertion corresponds to the conclusion of
an application of rule [Succ]. The application of the premise of the rule [Succ] now
corresponds to the derivation tree that proves mapping assertion Succ(Zero()) H

PlusOne(0())). Thus, the premise has been replaced by another application of rule

20

[Succ], now with conclusion Succ(Zero()) H PlusOne(0())). The premise of this rule
has been replaced with an application of axiom [Zero], which proves the mapping

assertion Zero() H 0().

‘

21

Chapter 3: Compositional
Semantics for Non-Hierarchical

Languages

Because semantic functions are compositional, writing them for a hierarchical
language, whose instances are algebraic terms, is straightforward: The meaning of a
composite term is deﬁned in terms of the meanings of its parts. However, writing
semantic Speciﬁcations for a non-hierarchical language, e.g., UML, whose instances
are labeled graphs, is non-trivial because traversing a link from a labeled graph re-
turns another labeled graph, which may comprise the original labeled graph. Thus,
the meaning of a labeled graph cannot be speciﬁed compositionally by recursively

deconstructing a composite labeled graph into recursively smaller parts.

3.1 Problem: Assigning Compositional Semantics
to Non-Hierarchical Language Expressions

To write semantic speciﬁcations for a non-hierarchical language, we make the
language appear to be hierarchical by imposing a tree structure on its labeled graphs.
Conceptually, this is possible because a graph can be unwound into an tree without
any loss of information. The traversal begins at some designated node, from which
the root of a tree is created. As a graph is traversed by recursively visiting its graph
nodes and traversing their outgoing edges, a new tree node is created for every graph
node visited by the traversal, and a new tree edge is created for every graph edge
traversed by the traversal. If the graph contains a cycle, inﬁnitely many tree nodes

will be created to represent a single graph node, thus the tree is of inﬁnite depth.

22

Our key contribution is a technique for imposing an algebraic term structure onto
labeled graphs to make the latter appear to be ﬁnite trees. This technique enables
compositional speciﬁcation of functions over labeled graphs.

To avoid confusion, we have carefully distinguished a semantic function, which
maps labeled graphs to terms in an algebraic data type, from the implementation of
this function, which traverses an ASG to construct an AST. In the sequel, we use
the terms labeled graph and ASG interchangeably and the terms algebraic term and

AST interchangeably when the distinction is clear from context.

3.2 Solution: Hierarchical Abstract Syntax Based
On Projections

To make ASGS appear to be ﬁnite ASTS, we developed a new abstract syntax
representation called hierarchical abstract syntax (HAS). This representation is based
on the primitive notion of a projection, which implements the decorator pattern [19].

It is formalized as follows.

3.2.1 Projection

We say a data model 73 is a projection of an expression data model 8 if there

exists a surjective function hp: ”P —» 8 such that V c E ’P

1. The implementation of c declares a data member 3 whose type is a reference to

h'p(C). We shall henceforth refer to such an s as the source reference of c.

2. For each operation op in the interface of c; the return type of op is either void,

a value of a primitive domain, or an instance of some class c’ E P. Moreover:

o in the ﬁrst two cases, there must exist an operation op’ in the interface of

23

hp(c) such that op has exactly the same signature as op’ 1.

o in the last case, let

R opName ( argList )

be the signature of op. Then the interface of hp(c) must contain an oper-

ation op’ with signature

hp(R) * opName ( argList )

In other words, if op’s return type, R, is an instance of some class c’,
the interface of hp(c) must declare an operation op’ whose opName and

argList are identical to that of op, but whose return value is a reference to

hp(C,).

If such an hp exists, we refer to classes in dom(hp) as projection types and their

corresponding classes as source expression types.

For example, Figure 3.1 depicts a projection of the expression data model de-
picted in Figure 2.2. It has three projection types ClassO, Classl and RoleO. In
this example, Class is the source expression type of both ClassO and Classl; Role
is the source expression type of RoleO. The interface of ClassO declares operations
getName and getRoleClass, where getName has exactly the same signature as the
getName operation declared in the interface of Class. The signature of ClassO’s

operation getRoleClass is

RoleO getRoleClass( ) const

 

lAn operation’s signature consists of its name, return type and parameters.

24

 

class ClassO {
public:
ClassO( const Class* c ) : _c(c) {}

const string& getName() const { return _c->getName(); }
RoleO getRoleClass() const { return Role0(_c->getRoleClass(); }

protected:
const Class* _c;

Projection Type ClassO

class Classl {
public:
Classl( const Class* c ) : _c(c) {}

const string& getName() const { return _c->getName(); }

protected:
const Class* _c;

Projection Type Classl

class RoleO {
public:
RoleO( const Role* r ) : _r(r) {}

const string & getName() const { return _r->getName(); }
Classl getRoleClass() const { return C1assl( _r->getRoleClass()); }

protected:
const Role* _r;

Projection Type RoleO

Figure 3.1: Projection of an Expression Data Model Example

 

25

We thus expect the interface of h/p( ClassO) to declare an operation with the signature

hp(RoleO) * getRoleClass ( ) const

and indeed, the interface of Class (Figure 2.2) declares such an operation. Similarly,
the getName operations declared in the interface of RoleO and Classl have exactly
the same Signature as the getName operations declared in the interfaces of Role
and Class respectively (Figure 2.2). The signature of the operation getRoleClass

declared in the interface of RoleO is

ClassO getRoleClass( ) const

We thus expect the interface of hp(Role0) to declare an operation with the signature

hp(ClassO) * getRoleClass ( ) const

and indeed the interface of Role (Figure 2.2) declares such an operation.

We use projections to unwind labeled graphs into trees. Observe, the types
within an expression data model can be instantiated to form an ASG. The simplest
case of this occurring is when an expression data type contains a navigation operation
that returns a pointer to an instantiation of another expression data type, which
in turn contains a navigation operation that returns a pointer to an instantiation
of the original expression data type. Figure 2.2 depicts this scenario. Because
projection type operations return instances of projection types (Figure 3.1), rather
than pointers to instances of projection types, it is impossible for two projection type
instances to be mutually referencing. Thus, projection type instances form projection
ASTS, rather than ASGs. We further illustrate this point with an example in the

following section.

26

 

class ErrClassO {
public:
ErrClassO( const Class* c ) : _c(c) {}

const string& getName() const { return -c->getName(); }
ErrRoleO getRoleClass() const
{ return ErrRoleO(-c->getRoleC1ass(l; }

protected:
const C1ass* _c;

Projection Type ErrClassO

class ErrRoleO {
public:
ErrRoleO( const Role* r ) : _r(r) {}

const string & getName() const { return _r->getName(); }
ErrClassO getRoleClass() const
{ return ErrClass0( _r->getRoleClass()); }

protected:
const Role* _r;

};

Projection Type ErrRoleO

Figure 3.2: Projection Types

 

3.2.2 Hierarchical Abstract Syntax

Although instances of projection types are guaranteed to form ASTS, they are
not guaranteed to form ﬁnite ASTS. For example, the ErrRoleO and ErrClassO
depicted in Figure 3.2 are valid projection types, but traversals of their instances are
not ﬁnite. An instance of ErrRoleO contains operation getRoleClass whose return
type is an instance of ErrClassO. An instance of ErrClassO contains operation
getRoleClass whose return type is an instance of ErrRoleO. Traversals of instances

of these projection types continue to create new instances, which would be traversed

27

indeﬁnitely. To ensure the ASTS are ﬁnite, we deﬁne a language’s HAS as a collection
of projection types with some additional constraints. HAS is formalized as follows.

Let Layerp be a partition of the projection ’P, such that:

VS 6 Layerp o Vc,c’ E S o hp(c) = h/p(C,) 4:) c = c’
For each set S in the partition Layerp no two projection types in the set have

the same source expression type.

A trivial example of an HAS is where each of the partition’s sets contains one pro-
jection type. Obviously, it is impossible for a given set to contain multiple projection
types that have the same source expression type.

Layerp is a language’s HAS if Layerp can be indexed by a minimal, partially

ordered set I with a maximal index, such that: Vi E I:

1. flc E Layerp(i) o (c, c) E (Layerp(i) <1 return5p)+
The returnsp relation relates a projection type to the projection types that are
returned by any of its operations. Taking the transitive closure of the returns;:
relation with the range restricted to the projection types in Layerp(i) relates
a projection type to the projection types that are reachable by applying the
relation multiple times. Thus, the predicate states that a projection type in an

HAS is never reachable from itself.

2. VC 6 Layerp(i), Vc’ E P O (c, c’) E returnsp c» Elj oj Z i o c’ E Layerp(j)
If the returns? relation relates two projection types in different sets within the
partition, the projection type in the domain of the returnsp relation is in a set
whose index is less than or equal to the index of the set of the projection type

in the range of the relation.

We refer to Layerp(i) as a layer.
The projection types in Figure 3.1 form an HAS for an elided portion of UML’S

abstract syntax. The HAS is partitioned into two layers, 0 and 1. Layer 0 has

28

 

 

A

AtoBAgg

 

 

roleA

 

 

roIeB

 

 

 

Figure 3.3: small class diagram

 

A : Class
elementName = "A"

 

 

 

 

 

roleClass] [ roleClass

 

roleA : Role
roleName = "roleA"

 

 

 

A0 : ClassO

 

 

 

elementName = "A"

 

\
\
\

[ roleClass

 

roleAO : RoleO

 

 

roleName = "roleA"

 

 

 

 

 

 

 

\\ [ roleClass

A1 : Class1
elementName = "A"

 

 

 

 

 

 

b.
Figure 3.4: ASG to AST conversion

a.

two elements, RoleO and ClassO; whereas layer 1 has one element, Classl. To
demonstrate the correspondence between a projection type and both its layer and
source expression type, we adopt the convention of naming our projection types with
the name of their source expression type appended with the name of their layer.
ClassO, Classl and RoleO adhere to this convention.

The HAS we have speciﬁed allows us to construct ﬁnite ASTS of instances of
projection types by traversing labeled graphs. To concretely illustrate this solution,
we offer the following example. Figure 3.3 depicts a UML class diagram with two
classes, A and B. Class A is related to class B by aggregation AtoBAgg. Class A’s
role in AtoBAgg is roleA. Class B’s role in AtoBAgg is roleB. Figure 3.4 a depicts
a part of the ASG constructed using the UML metamodel depicted in Figure 2.1.

Using projection types, traversing the ASG depicted in Figure 3.4 a constructs

29

the projection AST depicted in b. A0 is an instance of ClassO; A1 is an instance
of Classl; and roleAO is an instance of RoleO. A0 contains a reference to A, the
instance of its expression type, and a link (called roleClass) to roleAO. roleAO contains
a reference to roleA, the instance of its expression type, and a link (called roleClass)
to A1. A1 contains a reference to A, the instance of its expression type. The semantic
meaning of A0 depends on the meaning of roleAO, which depends on the meaning of
A1. The meaning of A1 does not depend on the meanings of any projection type
instances. Thus, it is possible to deﬁne a compositional semantic function, whose
implementation would translate this AST to a target language AST.

Developing an HAS for a non-hierarchical language allows us to transform the
language’s expressions from ASGS to ﬁnite projection ASTS. Because ASTS can be
assigned meaning compositionally, we are then able to deﬁne a semantic function
whose implementation traverses these ASTS and as a by product produces a target

language AST, which can be pretty printed to produce implementation language code.

30

Chapter 4: Specifying Semantic
Functions Using NDS

The process we have speciﬁed for translating an ASG to a target language AST
involves ﬁrst projecting the ASG to a projection AST and then translating the projec-
tion AST to a target language AST by applying the NDS Speciﬁed semantic function.
In this section, we illustrate the second step of this process by Specifying a sample se-
mantic function, whose implementation translates projection ASTS to target-language
ASTS (Section 4.1). We then present a syntactic extension to N DS, which allows
us to optimize the process by eliminating the intermediary step of constructing pro—

jection ASTS (Section 4.2).

4.1 Semantic Functions Operating Over Projec-

tion Type ASTS

As an example, we Specify an NDS that translates projection ASTS to target
language ASTS. This example is selected for its brevity; the rules are not part of the
UML to C++ translation we specify later.

Figure 4.1 depicts an NDS that Speciﬁes a semantic function whose implemen-
tation maps instances of Classl, ClassO and RoleO (Figure 3.1) to C++ ASTS. Axiom
[Classl] translates instances of Classl to instances of the CppClassForwardDec, which
represents the forward declaration of a C++ class. Rule [RoleO] translates instances
of RoleO to instances of FunctionMember, which represents a C++ function member.
Rule [ClassO] translates instances of ClassO to instances of CppClass, which represents
a C++ class deﬁnition. The parameters of each subject operator may appear as the

subject of a premise or as a parameter of the target operator. If a given rule does not

31

 

role Class H functionM ember [ ClassO]

 

ClassO( elementName, role Class)
H

CppClass(elementName, functionMember)

 

[Classl]
Classl(elementName)
H
CppClassForward Dec( elementName)
role Class H cpp ClassForwardDec [RoleO]

 

Ro|e0(mleName, roleClass)
l'--)

FunctionMember(roleName, cppClassForwardDec)

Figure 4.1: Example NDS Built on HAS

 

reference a subject parameter in its premise or target, then that parameter may be
elided. Thus, the subject operator of projection rules with the same expression type,
such as [ClassO] and [Classl], have different parameters. Speciﬁcally, the subject
operator, 03550, of rule [ClassO], is parameterized by elementName and roleClass;
whereas the subject Operator of rule [Classl], which is Classl, is parameterized only
by elementName.

Applying the projection rules shown in Figure 4.1 to the projection AST de-
picted in Figure 3.4 b results in the derivation tree depicted in Figure 4.2. Initially,
the rule [ClassO] is applied to instance A0, which is a projection of instance A. For A0
to be translated roleAO, the instance representing AO’S single role, must be translated.
Thus, the premise of the application of [ClassO] translates roleAO by applying rule
[RoleO]. For roleAO to be translated, A1 must be translated. Thus, the premise of the
application of [RoleO] translates A1 by applying rule [Classl]. The resulting C++

AST generated by this derivation tree is depicted in Figure 4.31.

 

1The authors are aware that the code generated by this AST is nonsensical. This is because the
N DS example, which was selected to illustrate using NDS, is also nonsensical.

32

 

[CHassl]

 

Classl( “A”)

H

CppClassForward Dec( “A”) [[20160]

 

Ro|e0( “roleA”, Classl( “A"))

H

FunctionMember( “roleA”, CppClassForwardDec( “A”)) [ClassO]

 

ClassO(“A”, Role0( “roleA”, Classl( “A”))

H
CPPCIBSS( “A”, FunctionMember( “roleA”,
CppClassForwardDec( “A”))

Figure 4.2: Example Derivation applyinLNDS

 

 

A : CppClass

 

 

 

 

elementName = "A"

 

[ fuctionMember

 

roleA : FunctionMember
roleName = "roleA"

 

 

 

 

 

cppClassReturnType

 

A : CppClassForwardDec
elementName = "A"

 

 

 

 

 

Figure 4.3: C++ AST

33

Notice, the derivation is ﬁnite as are all derivations that use this set of rules.
Hence, this small example achieves our goal of specifying a compositional seman—
tic function that maps ASG in the syntactic domain of a non-hierarchical language

(UML) to elements of the semantic domain of another language (C++).

4.2 Optimization

We optimize the process of translating an ASG to a target-language AST by elim-I
inating the explicit construction of a projection AST, which currently is constructed
and then discarded. Recall, projection types are organized into layers, where different
projection types in the same layer do not have the same source expression type. This
has allowed us to informally deﬁne proj, an indexed family of functions (one function
per layer). Each function translates ASGS to projection ASTS (Section 3). More
formally, we deﬁne proj(z') (where z' E I) as a partial function that translates ASGS
to projection ASTS, where each projection AST is an instance of a class in Layerp(z').
Next, we used NDS to specify trans, a semantic function that translates projection
ASTS to target ASTS (Section 4.1). For each 2', the composition of trans and pr0j(z')
is a partial function that translates ASGS to target language ASTS. Our key insight
is that it is possible to deﬁne a family of functions, transproj, where each function
translates ASGS to target language ASTS. Speciﬁcally, there exists an indexed family
of functions transproj, such that for all z' E d0m(proj), transproj(z') = trans o proj(z').
In this section we construct transproj and in doing so optimize the translation process
by eliminating the explicit construction of projection ASTS.

Unfortunately, N DS is designed to specify only one function and hence cannot
specify a family of functions. Thus, to specify transproj, we extended NDS into a
new representation called, layered natural deduction systems (LNDS). For clarity, we

refer to the rules in an LNDS as expression inference rules.

34

 

role Class H0 functionM ember

 

 

[ClassAtLayerO]

Class(elementName, roleClass)

H0
CppClass( elementName, functionMember)

[ClassA tLayerl]

Class(elementName)

H1
CppClassForwardDec(elementName)

roleClass H1 cpp ClassForwardDec [ R0 le A t LayerO]

 

Ro|e(roleName, role Class)
H0

Function Member( roleN ame, CppClassForwardDec)

Figure 4.4: Example LNDS

 

An LN DS comprises one or more rule layers, where a rule layer is a set of expres-
sion inference rules that collectively specify a function in transproj. Syntactically, we
indicate, the rule layer to which a rule belongs by appending the name of the function
i to the maplet symbol in the conclusion. The symbol takes the form Hk, where k
is the function name. The premises of an expression inference rule can refer to a
different function than that speciﬁed by the conclusion. To clarify, we append the
name of the function used to apply the premise to the premise’s maplet symbol.

Figure 4.4 depicts an LN DS speciﬁcation of a semantic function that assigns
meaning to instances of Class and Role. The maplet symbol in each conclusion is
appended with the name of the function speciﬁed by the rule. Speciﬁcally, expression
rules [ClassAtLayerO] and [RoleAtLayerO] specify function 0; whereas expression rule
[ClassAtLayerl] speciﬁes function 1. The maplet symbols in the premises have been
labeled to indicate which function to apply to translate the premise’s subject. The
maplet symbol in the premise of [ClassAtLayerO] is appended with 0. Similarly, the

premise of [RoleAtLayerO] is appended with 1.

35

4.3 Procedure for generating an LNDS

We deﬁne the procedure for constructing an LN DS from an N DS that operates
over projection types and the code for these projection types. To generate an expres-
sion rule that operates over labeled graphs from a projection rule that operates over
terms, we let the subject operator of the generated expression rule be the source ex-
pression type of the projection rule’s subject operator. The names of the parameters
of the subject operator of an expression rule are the names of the parameters of the
subject operator in the projection rule.

Our NDS speciﬁes only one function; whereas our LNDS speciﬁes a family of
functions. We infer which function a generated expression rule speciﬁes using the
sufﬁx of the subject operator of the projection rule. Recall, this sufﬁx is the name of
a layer. We append this to the maplet symbol in the conclusion of the expression rule.
We infer which function to apply to translate the premise subject of an expression
rule using the signature of the operation in the projection type that bears its name.
Speciﬁcally, the signature of the operation will include a return type, which will be
the projection type of which the operation will return an instance. The name of this
projection type will contain a layer name that we append the maplet symbol the
corresponding expression rule premise.

Figure 4.4 depicts the LNDS equivalent to the NDS shown in Figure 4.1.
Speciﬁcally, [ClassAtLayerO], [ClassAtLayerl], and [RoleAtLayerO], (Figure 4.4)
correspond to [ClassO], [Classl] and [RoleO] (Figure 4.1) respectively. The sub-
ject operator of each expression rule’s conclusion is the source expression type of the
subject operator of the corresponding projection rule. Speciﬁcally, the subject oper-
ator of expression rules [ClassAtLayerO] and [ClassAtLayerl] is Class, which is the
source expression type of ClassO and Classl, the subject operators of their respec-
tive projection rules. The subject operator of expression rule [RoleAtLayerO] is Role,

which is the source expression type of RoleO, the subject operator of projection rule

36

 

[ClassA tLayerl]

 

Class( “A”)
H1
CppClassForward Dec( “A” )
Role( “roleA”, Class( “A”))

Ho
Function Member( “roleA” , CppClassFonNard Dec( “A”))
Class( “A”, Role( “roleA” , Class( “A”)))

H0

CppClass( “A”, Function Member( “roleA”, CppClassForwardDec( “A”)))

 

[Role/l tLayerO]

[ClassA tLayerO]

 

Figure 4.5: Example Derivation applying LNDS

 

[RoleO]. The names of the parameters of the subject operator of each expression rule
are those of the corresponding projection rule’s subject operator. Observe, we are
able to infer the layer name to append to each maplet symbol in the premises and
conclusions of the expression rules using the process described above.

The expression rules in Figure 4.4 are applied to ASGS, rather than ASTS.
Figure 4.5 depicts the derivation tree that results from applying the LNDS shown
in Figure 4.4 to the ASG depicted in Figure 3.4 a. First, the rule [ClassAtLayerO]
is applied to instance A. For A to be translated, roleA (the instance representing A’s
single role) must be translated under function 0. Thus, the premise of the application
of [ClassAtLayerO] is the result of applying rule [RoleAtLayerO] to roleA. For roleA
to be translated under function 0, A must be translated under function 1. This is
accomplished in the premise of the application of rule [Role/l tLayerO]. This derivation
produces the same C++ AST (depicted in Figure 4.3) as the derivation shown in

Figure 4.2.

4.4 Beneﬁt: Symmetric Premise

A benefit of using LNDS is that we are able to apply different functions to

the same ASG. An example where this ability is useful is translating a role in UML,

37

 

this H type "3mm Type [Rolempz]

 

Role(roleName, roleClass)
H imp!
AssocRo|e(roleName, return Type)

Figure 4.6: Example of Symmetric Premises

 

which represents the part an instance of a class plays in an association or aggregation.
Because we use roles as a way to reference a speciﬁc instance of a class that participates
in the relationship, we represent roles in C++ as a private data member and a public
function member. The data member stores a pointer to an instance of a C++ class
and the function member provides a method for retrieving the pointer to the instance
of a C++ class. An AssocRole is a function member and a data member that represent
a UML role. The name of the data member is that of the role and is transmitted
in string roleName. Similarly, the name of the function member is that of the role
prepended with the keyword “get”. The data type of the data member (which is the
same as the return type of the function member) is constructed by translating the
role using a different function.

The syntactic structure that facilitates this ability is a symmetric premise, i.e.,
an expression rule premise whose subject is the subject of the rule’s conclusion. Con-
tinuing with our example, the instance of role is the subject of the conclusion and the
subject of the premise, which constructs the return type. We indicate a premise is a
symmetric premise by using the keyword this as the subject of the premise.

Figure 4.6 depicts an inference rule that speciﬁes the translation of role de-
scribed earlier. Speciﬁcally, it translates an instance of Role to an instance of Assoc-
Role. The return type of the data member and function member, which constitute

AssocRole, is constructed by the symmetric premise.

38

4.5 Formally Representing LNDS

In the sequel, our algorithms determine if an LN DS is well-formed by appealing
to the metamodel of the source language. Hence we need to represent an LNDS as a
datatype. We have chosen to formally specify the LNDS data type in Z to correspond
with the rest of our work.

Fundamentally, an LNDS is a set of inference rules and axioms. To model an

LNDS, we introduce the following given sets.

[FV_ID, LJD]

These sets represent the identiﬁers for free variables and layers respectively. As
with the other identiﬁers we introduced, elements of these sets can only be compared

for equality. We declare the special free variable this to be an element of F VJD.

I this : FVJD

A free variable in any LNDS (for example Figure 4.6) corresponds an element
of F VJD. To refer to a speciﬁc F VJD, we use render the name of the corresponding
free variable in italics (as this is how the free variable names are represented in the
rules). For example, roleName refers to the free variable identiﬁer that corresponds
to the free variable of the same name in Figure 4.6. We adopt a similar convention
for elements of the set L_ID.

A free variable is associated with a string representing its name.

FREEVARIABLE == STRING

We also deﬁne accessor functions getF V and getFVName, where getF V returns the
free variable that corresponds to a given FVJD and getFVName returns the string

representing the free variables name when given a free variable.

39

getF V : F VJD —+ FREE VARIABLE
getFVName : FREE VARIABLE —+ STRING

We model a layer as a tuple consisting of a string, which is the name of a function,

and a natural number, which is the index of the layer.

LAYER == STRING x N

For example, (“impl", O) and (”type", 1) represent the layers used by the LNDS de-
picted in Figure 4.6. The accessor function getLayer returns the layer tuple that cor-
responds to a given LJD. The accessor functions getLayerName and getLayerIndex
return a string that is the layer’s name and a natural number, which is the layer’s
index, respectively. These functions are formalized as follows.

getLayer : L_ID -+ LA YER

getLayerName : LA YER —+ STRING
getLayerIndezr : LA YER —> N

getLayerName = ﬁrst[STRING, N]
getLayerIndezr = second[STRING, N]

 

 

For example, applying the function getLayer to impl yields the tuple ("impl",
0). Applying the function getLayerName to this tuple yields ”impl" and applying
getLayerIndezr to the same tuple yields 0.

We model a premise as a tuple consisting of two free variables (the source and

the target ) and a layer identiﬁer.

PREMISE == FVJD x LJD x FVJD

For example, the premise of rule [Roleimpll is formalized as the tuple (this, type,
returnType). We deﬁne three accessor functions to return the ﬁrst, second and third

elements of a tuple representing a premise.

4O

getPL : PREMISE _+ LJD
getPS : PREMISE —+ FVJD
getPT : PREMISE _+ FVJD

va1,fv2 : FVJD; layer: LJD o
getPS(fv1, layer,fv2) = fvl
/\
getPL(fu1, layer, fv2) = layer
/\
getPT(fv1, layer,fv2) = fv2

 

 

Applying function getPL to the tuple (this, type, returnType) yields type. Applying
functions getPS and getPT to tuple yields this and returnType respectively.

We model an AST schema as a class identiﬁer and a sequence of free variable
identiﬁers. Because we require all operators to name a class in the metamodel, each

operator is represented as a CLASSJDQ.

ASTSGHEMA == CLASSJD x seq FVJD

For example, the target of rule [Rolempjlis represented by the tuple (AssocRole,
(roleName, returnType)). The accessor functions getOperator and getFree Variables
returns the class identiﬁer and free variable identiﬁers of an AST schema tuple re-
spectively.

getOp : ASTSGHEMA —+ CLASSJD
getFVIDs : ASTSGHEMA —> seq FVJD

getOp = ﬁrst[CLASS_ID, seq FVJD]
getFVIDs = second [CLASS_ID, seq F VJD]

 

 

An inference rule’s conclusion is a tuple consisting of two AST schemas (the

subject and the target) and a layer identiﬁer.

CONCLUSION == ASTSGHEMA x LJD x ASTSGHEMA

 

2The target language operators will be depicted as classes on the target language metamodel.
Owing to space constraints, we do not depict the C++ metamodel.

41

For example, we represent the conclusion of rule [Roleimpll with the tuple ((Role,
(roleName, r0leClass)), impl, (AssocRole, (roleName, returnType))). We deﬁne three
accessor functions to access the ﬁrst, second and third elements of a tuple representing

a conclusion.

getCS : CONCLUSION —> ASTSGHEMA
getCT: CONCLUSION —» ASTSGHEMA
getCL: CONCLUSION _. LJD

V a1, a2 : ASTSGHEMA; layer : LJD o
getCS(a1, layer, a2) = a1
/\
getCL(a1, layer, a2) = layer
/\
getCT(a1, layer, a2) = a2

 

 

Applying the getCS function to the tuple ((Role, (roleName, roleClass)), impl, (As-
socRole, (roleName, returnType))) yields (Role, (roleName, roleClass)). Applying
getCT to the same tuple yields (AssocRole, (roleName, returnType)). Lastly, apply—
ing getCL yields impl.

We represent an inference rule as a schema consisting of a set of premises and a
conclusion. The four invariants constrain the legal schema bindings as follows. Each
parameter of the conclusion’s target is either a parameter of the conclusion’s subject
or is a premise’s target. Each premise’s target is a parameter of the target of the
conclusion. Each premise’s subject is a free variable that parameterizes the subject
of the conclusion or is a free variable whose name is the keyword this. The layers
used in all of the premises are constrained in that their index must be greater than

or equal to the index of the layer in the conclusion.

42

_ I nferenceRule

 

prems : llD PREMISE
conc : CONCLUSION

ran(getFVIDs(getCT(conc)))
C

Fan(getFVIDs(getCS(conc))) U getPT(] prems [)
getPT(] prems D Q ran(getFVIDs(getCT(conc)))
getPS(] prems D Q ran(getFVIDs(getCS(conc))) U {this}
V p : prems o
(getPLg getLayer g getLayerInde$)( p) S
(getCLg getLayer g getLayerIndez)(conc)

 

 

 

43

Chapter 5: LNDS Example

We present an LNDS and brieﬂy illustrate the process of generating C++ ASTS
from UML ASGS. Because generating code is not the thrust of this thesis, our example
is small and aims only to give a ﬂavor of how code generation is performed.

Figures 5.1 - 5.3 depict an LNDS speciﬁcation of a semantic function that
translates UML ASGS to C++ ASTS. This example has two rule layers named type
(Figure 5.1) and impl (Figure 5.2 and Figure 5.3), where impl < type. This
semantic function translates UML classes to C++ classes, UML attributes and asso-
ciations to C++ function members and data members, and UML generalizations to

C++ class inheritance. Brieﬂy, we explain each rule.

5. 1 Type layer

First, we describe the rules comprising the type rule layer, which is depicted in
Figure 5.1. Axiom [Classtype] is used to construct an instance of CppClass, which rep-
resents a C++ class declaration. When CppClass is instantiated with only a name, the
resulting instance of CppClass is a forward declaration. All applications of [Classtype]
generate forward declarations. Axiom [Domaintype] constructs a C++ domain from
a UML domain. We use domains to represent primitive types (i.e., non-class types
such as boolean and integer).

Rule [Roletype] is used to construct an AST representation of a pointer to an
instance of a C++ class, from the information contained within a UML Role. Its
premise computes the C++ class declaration (i.e., the instance of CppClass) associated
with the UML class referenced in the role.

Rule [Parametertype] is used to construct an AST representation of a parameter

of a function member. The UML parameter consists of a name and a type. The C++

44

 

Class ]
Class(elementName) [ type

thpe
CppClass(elementName)

 

. ‘ [Domaintype]
Domaln(domainName)

”type
CppDomain(domainName)

 

roleClass thpe TOZCCPPCIGSS [Rolet ]
C
Role(roleName, roleClass) yp
thpe
PointerType(roleName, roleCppClass)
param Type H type cpp Type [Para meter type]

 

Para meter( parameterN ame, param Type)

H type
CppPara meter (parameterName, cpp Type)

return Type H type data Type
opParams H type param Types

 

[Operation type]
Operation(0perati0nName, return Type,

opParams)

thpe
OperationType(operationName, data Type, paramTypes)

Figure 5.1: LNDS Example (1)

45

attributeDomain H type type

[Attributeimpz]
Attribute( attributeName, attributeDomain)

 

Himpl

DataMember(attributeName, type)

this v—+ returnT e
type yp [Roleimpz]

 

Role(roleName, role Class)

H impl

AssocRo|e(roleName, returnType)

Figure 5.2: LNDS Example (2)

parameter’s name will be that of the UML parameter (parameterName). The C++
parameter’s type will be constructed in the premise.

Rule [Operationtype] is used to construct an AST representation of the declaration
of a function member. The function member’s name will be that of the UML operation
(operationName). The function member’s return type will be constructed by the ﬁrst
premise. The function member’s parameter list will be constructed by the second

premise.

5.2 Impl layer

Next, we describe the rules comprising the impl rule layer, which is depicted in
Figure 5.2 and Figure 5.3. Rule [Attributeimpll constructs an AST representation
of a C++ class data member. The data member’s name is that of the UML attribute
(attributeName) and its type is constructed in the premise.

Rule [Roleimpl] is used to translate a UML role. Rule [Aggregationmpl] constructs
an AST representation of a C++ Aspect, which is a construct that we invented to

represent a set of sets of function members and data members that, when combined

46

part Him,” assocRole [Aggregation impl]

 

Aggregation(elementName, part)
Himpl

Aspect(elementName, assocRole)

associateRoles H -. assocRoleSet . . .
"”1 [OrdmaryAssocratIon

 

- - - . z]
OrdnnaryAssoc:atuon(elementName, ""p

associateRoles)
Himpl

Aspect( elementName, assocRoleSet)

‘SUPCTC’ZGSS H type Opp S up er [Generalizationimpz]

 

Generalization(superClass)
Himpl

CppGeneralization(cppSuper)

classA tts H imp; dataM ems
0p Class H type cpp Ops
/ assocs H imp, assocA spect
/ 0998 Himpl aggAspect
sup Class H impl cppSupers

 

[Class,-,,,p1]
Class(elementName,
sup Class, classA tts,
opClass, / assocs, / aggs)
Himpl
CppClass(elementName,
cppSupers, dataM ems,

cpp Ops, ﬁattten ( assocA spect) ,
flatten( aggAspect)))

Figure 5.3: LNDS Example (3)

47

and pretty printed, represent all of the code required to implement a UML ordinary
association or aggregation. We chose to represent a UML aggregation in C++ as a
one way relationship in which the instance of the class playing the role of the whole
maintains information about the instance of the class playing the role of the part].
The part role is translated by the premise.

Rule [OrdinaryAssociation,-mp,] constructs an AST representation of an Aspect for
each UML ordinary association. The single premise, which translates a Role to a set
of function and data members, is applied to each Role in an association. Thus, it
translates a set of roles to a set of sets of function and data members.

Rule [Generalizationimpz] is used to translate a UML generalization to an AST
representation of a C++ inheritance relation. The single premise translates the UML
super class in the generalization to a C++ class declaration.

Lastly, rule [Classmpl] translates a UML class to a C++ class declaration. No-
tice, rule [Classtype] also did this, but without any information about data members,
function members, or generalizations. If a C++ class declaration has already been
constructed with a given name, this rule will elaborate that declaration with new
information, rather than construct a new class. A C++ class consists of a sequence
of members (both function and data) and generalizations. The premises of this rule
translate a UML class’s attributes, operations, associations, aggregations and gen-
eralizations to C++ function member declarations, data members, and inheritance
relationships between C++ classes. The ﬂatten function is applied to assocAspect
(and aggAspect) to change it from a set of sets of function and data members to a set

of function and data members.

5.3 Example derivation

Figure 5.4 depicts the derivation tree that translates class A of the UML class

 

1Because the whole role is not used by this translation it is not represented in the rule.

48

 

Class 6
Class(“B”) [ typ ]

H type

CppClass( “B” )
Role( “roleB”, Class( “B"))

H type
PointerType( “roleB” ,

CppClass(“B”))
Role( “roleB”, Class( “B”))

[Roletype]

[Roleimpl]

l_"impl

AssocRole( “roleB” , PointerType( “roleB” ,
a II
CppClass( B ll) [Aggregation

i l
Aggregation( “AtoBAgg”, Role( “roleB”, Class( “BI/Ill) mp1

Himpl
Aspect( “AtoBAgg”, AssocRole( “roleB” ,
PointerType( “roleB” , CppClass( “B”)))) [Class 1
im 1
Class( “A”, E, Z, Q, Q, Aggregation(“AtoBAgg”, p
Role( “roleB”, Class( “B”))))
Himpl
CppClass( “A”, Q, G, e, Z, ﬂatten(Aspect( “AtoBAgg”, AssocRole( “roleB”,
PointerType( “roleB”, CppClass( “B”))))))

Figure 5.4: LNDS Derivation Tree Example

 

 

A : Class B : Class
elementName = "B"

 

 

 

 

 

 

 

 

elementName = "A"

ll \ag\gs

AtoBAgg : Aggregation
elementName = "AtoB"

,, Amsite part w

roleA : Role roleB : Role
roleName = "roleA" roleName = "roleB"

 

‘
7

 

 

 

sselgelm

roleClass
sse|oe|01
roleClass

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 5.5: UML ASG

49

 

 

A : Cpﬁlm

elementName = “A“

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

aegAsped
AtoBAgg :Asm
elementName = "AtoBAgg"
assocRoleSet roleB : FunctionMem retumTYDO poms
memberName = "getroleB' ’
w roleB : PolnterEmpTyE B : CppClass
roleName = "roleB' roleName = “roleB“ elementName = 'B"
roleB : DataMember r
memberName = “roleB“ dalaMemberTwe

 

 

 

Figure 5.6: Constructed C++ AST

diagram depicted in Figure 3.3. The ASG for this class diagram is depicted in
Figure 5.5. First, the rule [Classimpz] is applied to instance A. For A to be translated,
aggregation AtoBAgg must be translated. Thus, the single premise shown of the
application of [Classmpl] is the result of applying rule [Aggregationimplj to AtoBAgg2.
For AtoBAgg to be translated, the part role, roleB must be translated. Thus, the single
premise of the application of [Aggregationimpl] is the result of applying rule [Roleimpzl
to roleB. For roleB to be translated, a type for it must be constructed. Hence, the
single premise of the application of [Roleimpzl is the result of applying rule [Roletype]
to roleB. For the type of roleB to be constructed the instance of its class, B, must be
translated. Thus, the single premise of the application of [Roletype] translates B into
a forward declaration of the C++ class B.

The resulting C++ AST is depicted in Figure 5.6. Figure 5.7 depicts the
C++ code that results from pretty printing this AST . Speciﬁcally, both instance A
and instance B are represented as C++ classes. Aggregation AtoBAgg is represented
as a function member and data member in class A, where the data member stores a

pointer to an instance of class B and the function member accesses this pointer.

 

2The other premises of the [Classmpl] rule are unnecessary for this translation and hence have
been elided.

50

 

class A {
public:
const B * getAtoBAgg();

protected:
const B * AtoBAgg;
}:

class B {

};

Figure 5.7: Code resulting from application of rule Classimpz

 

51

Chapter 6: Validation

An LN DS is well—formed if it is impossible to construct an inﬁnite derivation
by applying its rules. To ensure an LN DS is well-formed, we developed a tool that
determines if an LNDS is well-formed when given an LNDS and a metamodel depicting

the (original) abstract syntax of the (non—hierarchical) language.

6.1 Automatically checking if an LNDS is well-
formed

Each derivation traverses an ASG and applies functions to translate its nodes
and produce target language ASTS. A derivation is guaranteed to be ﬁnite if the
traversal never applies the same function to an ASG node more than once. We verify
this property by marking the traversal path that corresponds to each function in the
LN DS (as speciﬁed by one layer) on the metamodel and checking that the path is
acyclic. This is possible because the links, which are taken to traverse an ASG, are
depicted as associations in the metamodel and the ASG node types are depicted as
classes in the metamodel. Although the algorithms are speciﬁed here in Z, we have
implemented them in Haskell [23, 4].

To illustrate our algorithms we introduce a trivial non-hierarchical language HIJ
whose abstract syntax is depicted in Figure 6.1. An LNDS speciﬁed semantic func-
tion, which assigns meaning to syntactically valid programs in the HIJ language, is
depicted in Figure 6.2. Observe, there are two functions, Zero and One each of
which is speciﬁed by a rule layer and where the index of rule layer Zero is less than
the index of rule layer One. Although we demonstrate our algorithms by applying
them to layer Zero of the HIJ language, we have used these algorithms to ensure the

rules presented in Chapter 5 are well-formed.

52

 

 

 

 

DI

ItoH

 

 

 

 

 

 

ItoJ
I

 

 

 

 

Jtol

 

 

 

Figure 6.1: HIJ language

 

ItoH HZem TransOfIH
ItoJ HOne TransOfIJ

 

[IZero]
I(It0H, ItoJ)

HZero

TranslationOfIZero( TransOfIH, TransOfIJ)

JtoI H28", TransOfJI [128m]

 

J(JtoI)

l_*Zero

Translation OfJZero( Trans OfJI )

[1 One]

 

10
HOME

TranslationOfIOne()
Figure 6.2: HIJ LNDS

53

6.1.1 Adding Marked Associations

The ﬁrst step in checking if an LNDS rule layer is well-formed is adding marked
associations to the metamodel to depict the traversal path. Speciﬁcally, we add a
marked association connecting the class depicting the subject operator with each
class depicting the type of a premise subject. We have designed our LNDS so that
the name of the free variable in the premise’s subject corresponds to the name of an
association connecting the class depicting the subject operator to the class depicting
the type of the premise’s subject in the metamodell. If the unmarked association is
an aggregation, we add a marked aggregation; otherwise we add a marked ordinary
association.

Below we depict the Z schema for representing a marked metamodel,
MarkedMetamodel, which we then initialize by adding marked associations using
operation InitializeMarkedMetamodel. A MarkedMetamodel extends a Metamodel
with a relation named markedAggs, which relates two class identiﬁers, and a relation
named markedAssocs, which also relates two class identiﬁers. We restrict the domain
and range of these relations to be subsets of the class identiﬁers that comprise the

M etamodel .

 

_ MarkedMetamodel
Metamodel
markedAggs : CLASSJD -—> CLASSJD
markedAssocs : CLASSJD —> CLASSJD

dom markedAggs Q classes
ran markedAggs Q classes
dom markedAssocs Q classes
ran markedAssocs Q classes

 

 

 

We initialize a MarkedMetamodel with operation InitializeMarkedMetamodel us-

ing information from the LNDS. The inputs are a set of inference rules and the

 

1We adopted this convention to simplify the implementation. It is not necessary in general.

54

identiﬁer of the layer we are interested in marking.

_._ I nitializeM arkedM etamodel
AMarkedMetamodel

rules? : lP’ InferenceRule
layer? : LJD

markedAggs’ = {c1,c2 : CLASSJD |
3 a : assocs; r : rules?; p : PREMISE | p E r.prems o
getFV(getPS(p)) = getAssocName(getAssoc(a))
/\
getAssocType(getAssoc(a)) = aggregation(c1, c2)
/\
getCL(r.conc) = getPL(p) 2 layer?}
markedAssocs’ = {c1,c2 : CLASSJD |
El a : assocs; r : rules?; p : PREMISE I p E r.prems o
getFV(getPS(p)) = getAssocName(getAssoc(a))
/\
getAssocType(getAssoc(a)) = directed(c1, c2)
/\
getCL(r.conc) = getPL(p) 2 layer?}
classes’ = classes ’
assocs’ = assocs

 

 

 

 

An element is added to markedAggs for every association where the name of the
free variable in the premise’s subject is the same as the name of the association, the
type of the association is aggregation, and the layer index of the conclusion is the
same as both the layer index we are marking and the layer index of the premise. The
element we add to markedAggs is a tuple consisting of the class identiﬁers of the whole
and part class in the aggregation. Similarly, an element is added to markedAssocs
for every association where the name of the free variable in the premise’s subject
is the same as the name of the association, the type of the association is directed,
and the layer index of the conclusion is the same as both the layer index we are
marking and the layer index of the premise. The element we add to markedAssocs is
a tuple consisting of the class identiﬁers of the source and target class in the ordinary

association.

 

 

 

 

 

 

 

 

 

l J

-- . .-.-—q-L—_—..——- -_-"_'.u....__m_c- 'r.:.._._..1.

 

 

 

 

 

 

Figure 6.3: HIJ language after adding marked associations

For example, we initialize a MarkedMetamodel for our HIJ example. The ini-
tialized marked metamodel is depicted in Figure 6.1 and inference rules [IZero] and
[JZero] are depicted in Figure 6.2. Rule [IZero] has two premises. The ﬁrst premise
ItoH H Zen, TransOfIH belongs to the same rule layer as the conclusion (0), hence
a marked association depicting the relationship between the source operator, I , and
the type of the premise, H, should be added to the metamodel. The association that
depicts this relationship is an aggregation; thus, we add a marked aggregation that
connects l to H. The next premise ItoJ Hone TransOfJ is of a different rule layer,
so we ignore it. This process is repeated for rule [JZero]. The marked metamodel
consisting of classes, generalizations and marked associations (which are depicted in

gray) resulting from the application of this algorithm is depicted in Figure 6.32.

6.1.2 Collapsing Generalizations

Generalizations obscure the cycles within the traversal path. Thus, we collapse
all generalizations by replacing each association that involves the superclass of the

generalization with associations involving each of the subclasses of the generalization.

 

2Because our algorithms concern themselves exclusively with marked aggregations and associa-
tions we have elided all unmarked associations and unmarked aggregations.

56

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A
a-

Figure 6.4: HIJ language after collapsing generalizations

The Z schema that collapses the generalizations of a MarkedMetamodel is formalized

as follows.

__ CollapseGens
AMarkedMetamodel

 

 

- classes’ = classes

assocs' = assocs

markedAggs’ = markedAggs g (genRelatiorﬁ)
markedAssocs’ = markedAssocs 3 (genRelation+)

 

 

Figure 6.4 depicts the HIJ metamodel after the generalization relationship re-
lating H (the superclass) to I and J (the subclasses) is collapsed. Originally, the
marked aggregation ItoH related class I to class H. This marked aggregation has now

been replaced with two marked aggregations that relate l to itself and | to J.

6.1.3 Eliminating Reﬂexive Aggregations

A reflexive aggregation is an association that aggregates a class to itself. Although
these aggregations appear to be cycles within the traversal path, the deﬁnition of

aggregation precludes an aggregation relating an instance to itself. Thus, they cannot

57

 

 

 

DI

 

 

 

 

 

l"
’-
, ~ «A»
~. r' F
lln'm‘ﬂ-I-dl

 

 

I J

u
1 "I..- -..u-‘w --——-—-r:-.-—_r.1o-4...~.-3

 

 

 

 

 

Figure 6.5: HIJ language after removing reﬂexive aggregations

specify cycles within a traversal and need to be eliminated prior to checking the
traversal path (as depicted by the marked associations in the metamodel) for cycles.
The Z schema that eliminates the reﬂexive aggregations in a marked metamodel is as

follows.

__ EliminateReﬂexiueAggs
AMarkedMetamodel

 

 

classes’ = classes
assocs’ = assocs

markedAggs’ = markedAggs \ {c : CLASSJD o (c, c)}
markedAssocs’ = markedAssocs

 

 

Figure 6.4 depicts a reﬂexive aggregation relating class I to itself. Applying
the eliminateRefleziveAgg procedure to this marked metamodel results in the marked

metamodel depicted in Figure 6.5 in which this aggregation has been removed.

6.1.4 Checking if a Graph is Cyclic

Our overall goal is to check if the traversal path speciﬁed by one rule layer is

cyclic. To accomplish this we have added marked associations depicting the traversal

58

 

 

 

 

 

 

l J

.—

 

 

 

 

 

 

Figure 6.6: HIJ graph to be checked for cycles

path to the metamodel, collapsed generalizations and eliminated reﬂexive aggrega-
tions. To check the metamodel for cycles we must view it as a graph by viewing
each class as a node and each marked association as a directed edge. Speciﬁcally, an
aggregation is an edge that points from the composite class to the part class. Viewing
a metamodel is this way reduces the problem to a cycle detection problem. If the
marked edges, which depict the traversal path, form a cycle, the LN DS is ill-formed.
This is formalized in Z as follows.

_ CyclicMarking
M arkedM etamodel

3 c : CLASSJD o (c, c) E (markedAggs U markedAssocs)+

 

 

 

 

Figure 6.6 depicts the graph representation of the HIJ metamodel. Observe that
for clarity we have transformed aggregations to arrows that point from the container
class to the part class". It is now a trivial process to deduce that graph contains a
cycle and thus the LNDS is not well-formed. If this graph did not contain a cycle, to
prove that the LNDS was well-formed each of the above algorithms would have to be

applied to the One rule layer.

 

3Ordinary associations have always been represented as arrows and hence are unmodiﬁed.

59

Chapter 7: Related Work

In this thesis, we have proposed a method for compositionally specifying UML’S
semantics using a syntactically extended version of NDS. By way of discussion, we
discuss two other approaches to formally specifying UML’s semantics.

First, we discuss correspondence-style rules (e.g., [33, 3, 8, 40, 41, 10, 39, 24, 26,
30, 9, 29]), which use natural language and code templates to specify the semantics
for a portion of the UML notation. For example, [33], [17] and [18] translate UML
diagrams to Z [35] and [3, 29] translate UML diagrams to PVS. [8, 40, 41] translate
OMT [5], a precursor of UML, diagrams to LOTOS [6]. The primary shortcoming of
correspondence—style rules is the imprecision of the prose speciﬁcation, which hides
ambiguities in the conditions for applying a rule. Thus, although this approach assigns
semantic meaning to UML diagrams by translating the diagrams to programs, in a
formal language, the ambiguity of the translation procedure makes it impossible to
check if the the semantic speciﬁcation is well-formed.

The second approach to assigning semantic meaning to UML diagrams is using
graph grammars, which transform UML models to models in other domain speciﬁc
languages, often times undergoing several intermediate transformations [28, 2, 1, 22].
The advantage to using intermediate transformations is that they are less complex
than the direct transformation and hence are easier to specify and maintain. [28] spec-
iﬁes each transformation using a domain mapping speciﬁcation, which is an extended
version of a UML object diagram that depicts the relationships between objects in
the source language and objects in the target language. [2, 1] specify each transfor-
mation with rules that depict mappings between metamodels and a sequencer, which
tells the order in which the rule should be executed. The primary shortcoming of
this approaches is that the transformations are speciﬁed with visual languages, whose

semantics are informal. Thus, there is ambiguity in the transformation speciﬁcation.

60

Again, it is impossible to check if each transformation is well-formed.

There are two distinct advantages to using LN DS to specify translations. First,
N DS, which LN DS extends, has formally deﬁned semantics, which eliminates all am—
biguity in translation speciﬁcation. This makes it possible to check if a translation
speciﬁcation is well-formed and to generate translators from translation speciﬁcations.
Second, NDS specify the meaning of language features compositionally, which means

the features of each language can be understood in isolation.

61

Chapter 8: Conclusion

In summary, we have presented an approach to compositionally specifying the
semantics of non-hierarchical languages. Speciﬁcally, we have proposed a method for
imposing hierarchical structure onto non-hierarchical syntax using projections. We
then were able to specify semantic functions, which operate over projection ASTS,
using NDS. As an optimization, we introduced LNDS, which speciﬁes semantic func-
tions that operate over ASGS without transforming them to projection ASTS. To
ensure LNDS speciﬁcations are well-formed, we have also developed and presented
a tool that checks for this property. We view this research into specifying semantic
functions as the ﬁrst step towards generating UML translators.

To that end, we have created J eannel, a prototype for such a translator generator,
which generates individually tailored translators for each LNDS speciﬁed UML trans-
lation. Each generated UML translator translates UML diagrams to target language
ASTS, which are pretty printed to produce implementation language code. State of
the art UML [7] translation tools, such as Rational Rose [31] and Poseidon [20], allow
the developer to choose from different UML to implementation language translations.
The different implementation language code generated by these tools are examples
of implementation variation. These tools artiﬁcially limit implementation variation
because the developer is unable to add new translations. For example, database de-
velopers who wish to implement their system in SQL are unable use non-extensible
UML generation tools that only have C++ speciﬁed translations. A UML transla-
tion tool is more useful if it provides the desired implementation variation, either by
having speciﬁed all the translations developers wish to use, which is very unlikely, or

by by allowing developers to specify translations. By allowing a developer to spec-

 

1Named after Jeanne Champollion, the mother of Jean FIanscois Champollion translator of the
Rosetta stone, an allusion to our tool being a generator of translators of the modern hieroglyphic,

UML.

62

ify UML’s semantics using LNDS and then use Jeanne to automatically generate a
translator that implements this translation, the implementation variation is limitless.

Possible future work includes reﬁning Jeanne. Although we have developed a
prototype, there are many nuances of the problem to be explored prior to publication.
Upon the completion of Jeanne, we will work to provide ﬁne grain code generation
options, granting the developer greater control over the generated code. We are also
interested in extending our work by investigating how the use of multiple types of
UML diagrams, such as state diagrams, could be used to generate more complete

code.

63

Bibliography
[1] Agrawal A., Levendovszky T., Sprinkle J., Shi F., and Karsai G. Generative
programming via graph transformations in the model-driven architecture. In
Proc. of OOPSLA, Workshop on Generative Techniques in the Context of Model
Driven Architecture, Seattle, WA, November 5, 2002, 2002.

[2] Shi F Agrawal A., Karsai G. A uml-based graph transformation approach for
implementing domain-speciﬁc model transformations. Technical Report ISIS-03-
403, Vanderbilt University, November 2003.

[3] Demissie B. Aredo, Issa Traoré, and Ketil Stolen. Towards formalization of UML
class structure in PVS. Technical report, University of Oslo, August 1999.

[4] Richard Bird. Introduction to Functional Programming Using Haskell second
edition. Prentice Hall Europe, 1998.

[5] Michael Blaha and William Premerlani. Object-Oriented Modeling and Design
for Database Applications. Prentice-Hall, Inc, Upper Saddle River, New Jersey,
1998.

[6] Tommaso Bolognesi and Ed Brinksma. Introduction to the ISO Speciﬁcation
Language LOTOS. Computer Networks and ISDN Systems, 14(1):25—59, 1987.

[7] Grady Booch, James Rumbaugh, and Ivar Jacobson. The Uniﬁed Modeling
Language User Guide. Addison—Wesley Publishing Company, Reading, Mas-
sachusetts, 1999.

[8] R. Bordeau and B. Cheng. A formal semantics for object model diagrams. In
IEEE Transactions on Software Engineering, pages 21(10):799—821, 1995.

[9] Jean-Michel Bruel. Transforming UML models to formal speciﬁcations. In Luis
Andrade, Ana Moreira, Akash Deshpande, and Stuart Kent, editors, Proceedings
of the OOPSLA ’98 Workshop on Forrnalizing UML. Why? How?, 1998.

[10] Scott A. DeLoach and Thomas C. Hartrum. A theory-based representation for
object-oriented domain models. IEEE Transactions on Software Engineering,
26(6), June 2000.

[11] Min Deng. Formal denotational semantics of UML using metamodels and NDS
rules for embedded systems domain. Technical report, Michigan State University,
2004.

[12] L. K. Dillon and R. E. K. Stirewalt. Lightweight analysis of operational speciﬁca-
tions using inference graphs. In Proceedings of the 23rd International Conference
on Software Engineering, pages 57—70. IEEE Computer Society Press, 2001.

[13] L. K. Dillon and R. E. K. Stirewalt. Inference graphs: A computational structure
supporting generation of customizable and correct analysis components. In IEEE
Transactions of Software Engineering, 2002.

64

[14] David Duffy. Principles of Automated Theorem Proving. John Wiley and Sons

[15]

[16]
[17]

[18]

[19]

[20]
[21]

[22]

[23]

[24]

[25]

[26]

Ltd., West Sussex P019 IUD, England, 1991.

Martin Erwig. Semantics of visual languages. pages 304—311. 13th IEEE Sym-
posium on Visual Languages, 1997.

J. Rumbaugh et al. Object-Oriented Modeling and Design. Prentice-Hall, 1991.

Andy Evans and Tony Clark. Foundations of the Uniﬁed Modeling Language.
In Proc. of the 2nd BCS-FA CS Northern Formal Methods Workshop, Ilkley, UK,
23-24 September 1997, 1997.

Robert France, Andy Evans, and Kevin Lano. The UML as a formal modeling
notation. In Haim Kilov, Bernhard Rumpe, and Ian Simmonds, editors, Pro-
ceedings OOPSLA ’97 Workshop on Object-oriented Behavioral Semantics, pages
75—81. Technische Universitat Miinchen, TUM-I9737, 1997.

E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements
of Reusable Object-Oriented Software. Addison-Wesley Publishing Company,
Reading, Massachusetts, 1995.

Gentleware. Poseidon. URL: http://www.gentleware.com/products.

S. Gnesi, D. Latella, and M. Massink. Model checking uml statechart diagrams
using jack. In Proc. of 4th IEEE International Symposium on High-Assurance
Systems Engineering, Washington DC, USA, 1999.

Wai Ming Ho, Jean-Marc Jquel, Alain Le Guennec, and Francois Pennaneac’h.
UMLAUT: An extendible UML transformation framework. In In Proceedings of
the IEEE International Conference on Automated Software Engineering, pages
275—278, 1999.

Paul Hudak, John Peterson, and Joseph Fasel. A Gentle Introduction To Haskell
98. 1999.

Soon-Kyeong Kim and David A. Carrington. A formal denotational semantics
of UML in Object-Z. L’OBJET: Software, Databases, Networks, 7(1), 2001.

Diego Latella, Istvan Majzik, and Mieke Massink. Towards a formal opera-
tional semantics of UML statechart diagrams. In Proc. F MOODS ’99, IFIP
T C6/ WG6.1 Third International Conference on Formal Methods for Open
Object-Based Distributed Systems, Florence, Italy, February 15-18, 1999. Kluwer,
1999.

Luigi Lavazza, Gabriele Quaroni, and Matteo Venturelli. Combining uml and
formal notations for modelling real-time systems. In Joint 8th European software
engineering Conference (ESEC) and 9th ACM SIGSOF T International Sympo-
sium on the Foundations of Software Engineering (FSE), 2001.

65

[27] W. E. McUmber and B. H. C. Cheng. A general framework for formalizing
uml with formal languages. In In Proc. of the 2001 International Conference on
Software Engineering (ICSE’2001), 2001.

[28] D. Milicev. Automatic model transformations using extended uml object dia-
grams in modeling environments. IEEE Trans. Softw. Eng, 28(4):413—431, 2002.

[29] D. Muthiayen and V. S. Alagar. Formalizing UML for rigorous software devel-
opment. In Luis Andrade, Ana Moreira, Akash Deshpande, and Stuart Kent,
editors, Proceedings of the OOPSLA ’98 Workshop on Formalizing UML. Why?
How?, 1998.

[30] G. Overgaard. A formal approach to relationships in the Uniﬁed Modeling Lan-
guage. In M. Broy, D. Coleman, T. S. E. Maibaum, and B. Rumpe, editors,
Proceedings PSM T ’98 Workshop on Precise Semantics for Software Modeling
Techniques, pages 91—108. Technische Universitat, Miinchen, Germany, TUM-
I9803, 1998.

[31] Rational. Rational rose. URL: http://www.rational.com.

[32] Ravi Sethi. Programming Languages : Concepts Ed Constructs. Addison-Wesley,
1996.

[33] M. Shroff and R. France. Towards a formalization of UML class structures in Z.
In In Proceedings of COMPSA C ’97, 1997.

[34] Kenneth Slonneger and Barry L. Kurtz. Formal Syntax and Semantics of Pro-
gramming Languages. Addison-Wesley, 1995.

[35] J. M. Spivey. The Z Notation: A Reference Manual. Prentice Hall, New York,
1992.

[36] Kurt Stirewalt and Laura K. Dillon. Generation of visitor components that
implement program transformations. In ACM SI GS OF T Symposium on Software
Reusability, pages 86—94, 2001.

[37] R. E. Kurt Stirewalt and Laura K. Dillon. A component-based approach to build-
ing formal analysis tools. In International Conference on Software Engineering,
pages 167—176, 2001.

[38] Bjarne Stroustrup. The C ++ Programming Language. Addison-Wesley Pub Co.,
2000.

[39] Meyer C. Tanuan. Automated analysis of uniﬁed modeling language UML spec-
iﬁcations. Master’s thesis, University of Waterloo, Waterloo, Canada, August
2001.

66

[40] Enoch Y. Wang and Betty H. C. Cheng. Formalizing and integrating the func-
tional model into object-oriented design. MSU Technical Report MSU-CPS-97—
34, Michigan State University, Department of Computer Science, East Lansing,
MI 48824, September 1997 . Submitted for publication.

[41] Enoch Y. Wang, Heather A. Richter, and Betty H. C. Cheng. Formalizing and
integrating the dynamic model within OMT. In IEEE Proceedings of the 19th

International Conference on Software Engineering, pages 45—55, Boston, MA,
May 1997. IEEE.

67

 

 

1.333." 'l.‘

‘l‘l - ‘I '5‘! ‘0‘ I) 14!.

V V

ll]lll[[l][ll[[l][[l[[7]][1

   

-;Al _._., _.