1 Introduction
-
We developed a conceptual framework along with a pattern for program reverse engineering from the viewpoint of metamodels.
-
Using a systematic literature review (SLR) to identify the necessary features, we created a comprehensive taxonomy called ProMeTA. ProMeTA characterizes program metamodels in reverse engineering based on our framework.
-
We classified existing popular program metamodels based on our taxonomy.
2 Background
2.1 Program Reverse Engineering
2.2 Program Metamodel
3 Terminology and Conceptual Framework
3.1 Terminology
-
A model is a simplification of a system with an intended goal (Favre and Nguyen 2005). For example, a diagram showing only the program modules and their dependencies is a model of a program created with the goal of understanding the basic structure.
-
A metamodel is a model of the language that captures the essential properties and features of a model (Clark et al. 2015). In this context, a model is an abstraction of an aspect of the real world, while a metamodel is a further abstraction that describes the model. Although metamodels have primarily been developed and promoted by the Object Management Group (OMG) with its MOF standard (Alanen and Porres 2003) in the context of modelware, metamodels are not limited to MOF-based models. Examples of metamodels include program metamodels in modelware, schemas (or exchange formats) in dataware, and grammars in grammarware (Favre and Nguyen 2005), which are models of program modeling languages, data languages, and programming languages, respectively, in different technological spaces (Kurtev et al. 2002; Wimmer and Kramler 2005). In the ISO/IEC 42010:2011 terminology (ISO/IEC/IEEE 2011), a model is a “view” conforming to a “viewpoint” (i.e., metamodel) (Brunelière et al. 2014).
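The conformance relation between a model and its metamodel can be illustrated with a minimal sketch. The classes `Metamodel` and `Model`, the function `conforms`, and the module-dependency example are hypothetical names introduced here for illustration only; they are not part of MOF or of any metamodel discussed in this paper.

```python
from dataclasses import dataclass


@dataclass
class Metamodel:
    """Hypothetical, minimal metamodel: allowed element and relation types."""
    element_types: set
    relation_types: dict  # relation name -> (source type, target type)


@dataclass
class Model:
    """A model: typed elements plus typed relations between them."""
    elements: dict   # element name -> element type
    relations: list  # (relation name, source element, target element)


def conforms(model: Model, metamodel: Metamodel) -> bool:
    """Return True if every element and relation is allowed by the metamodel."""
    if any(t not in metamodel.element_types for t in model.elements.values()):
        return False
    for name, src, dst in model.relations:
        if name not in metamodel.relation_types:
            return False
        src_type, dst_type = metamodel.relation_types[name]
        if model.elements.get(src) != src_type or model.elements.get(dst) != dst_type:
            return False
    return True


# Metamodel for the module-dependency diagram mentioned above (assumed shape).
module_metamodel = Metamodel(
    element_types={"Module"},
    relation_types={"dependsOn": ("Module", "Module")},
)

# A model of a program with two modules and one dependency.
module_model = Model(
    elements={"ui": "Module", "core": "Module"},
    relations=[("dependsOn", "ui", "core")],
)

# A model using an element type the metamodel does not define.
invalid_model = Model(elements={"Account": "Class"}, relations=[])

print(conforms(module_model, module_metamodel))
print(conforms(invalid_model, module_metamodel))
```

In this sketch the metamodel plays the role of the “viewpoint” and each model that passes `conforms` plays the role of a “view”.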
3.2 Conceptual Framework
-
A program metamodel is a model of a programming language grammar, which represents target programs according to a specific purpose. The elements of any program metamodel must be mapped to (a set of) elements of the corresponding grammar. As shown in Fig. 2, “Program metamodel” is mapped by “Grammar”. Examples of program metamodels include KDM, FAMIX, and UML. A program model must conform to its program metamodel. In Fig. 2, “Program model” conforms to “Program metamodel”.
-
A program metalanguage is a language to describe program metamodels. In Fig. 2, “Program metamodel” conforms to “Metalanguage”. Metalanguages can be classified as metasyntaxes of grammars, such as Extended BNF (EBNF) (ISO/IEC 1996) in a textual presentation, or meta-metamodels of metamodels at certain abstraction levels, such as MOF and the Eclipse Modeling Framework (EMF) metamodel Ecore (Steinberg et al. 2008), usually in a graphic presentation.
-
A context-free grammar (or simply grammar) is a formal device to specify which strings are part of the language, where the language is a set of strings over a finite set of symbols (Earley 1970).
-
A concrete syntax tree (CST) is a parse tree that pictorially shows how a string in a language is derived from the start symbol of the grammar (Aho et al. 2006).
-
An abstract syntax tree (AST) is a simplified syntactic representation of the source code, excluding superficial distinctions of form and constituents that are unimportant for translation from the tree (Aho et al. 2006). An AST follows an abstract grammar, which is a representation of the original concrete grammar at a higher level of abstraction.
-
An abstract syntax model is a graphical representation of an abstract syntax (tree). Abstract syntax models can be seen as low-level program metamodels. Examples include programming-language-independent AST models such as ASTM (OMG 2011a) and programming-language-specific AST models such as Java Metamodel (Kollmann and Gogolla 2001).
-
A standard exchange format (SEF) (or simply an exchange format) is a metamodel (i.e., schema) of model data used to store program models so that they are exchangeable among different tools. In Fig. 2, “Model data” conforms to “Exchange format”. Most of the elements in the exchange format can be mapped to (a set of) elements in the corresponding program metamodel. The exchange format may contain additional information (e.g., visual layout information) that is not included in the corresponding program model. Thus, “Exchange format” might be mapped by “Program metamodel” in Fig. 2. Examples include XML, XML Metadata Interchange format (XMI), Resource Description Framework (RDF), Rigi Standard Form (RSF), Tuple-Attribute Language (TA), GraX (Sim and Koschke 2001), CASE Data Interchange Format (CDIF) (Imber 1991), and MSE (Ducasse et al. 2011). Some of these (e.g., XMI and RDF) are general-purpose exchange formats that can be adapted to software, while others are specific to software (Sim and Koschke 2001).
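The statement that an AST excludes superficial distinctions of form can be checked with a small sketch. It uses Python's standard `ast` module purely as a convenient stand-in; the paper does not discuss this module, and the two source strings are made-up examples.

```python
import ast

# Two strings that differ only in layout, redundant parentheses, and a comment.
source_a = "total = (price + tax)"
source_b = "total   =   price+tax   # trailing comment"

tree_a = ast.parse(source_a)
tree_b = ast.parse(source_b)

# ast.dump omits line and column attributes by default, so only the
# abstract structure is compared.
same_ast = ast.dump(tree_a) == ast.dump(tree_b)
print(same_ast)

# The node types that remain are the constituents relevant for translation.
node_types = [type(node).__name__ for node in ast.walk(tree_a)]
print(node_types)
```

Parentheses, whitespace, and comments belong to the concrete syntax, so a CST would distinguish the two strings while the AST does not.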
3.3 Program Reverse Engineering as a Pattern
Section | Description |
---|---|
Name | Transformation to higher abstraction levels |
Context | You are analyzing software to comprehend or maintain it. |
Problem | The description of the software contains too much data to be comprehended or analyzed in a reasonable amount of time. You are interested in certain aspects of the software; however, its description is too complex to focus on those aspects. |
Solution | Transform the software (i.e., Lower-base in Fig. 3) as a source to another as a target at a higher or the same level of abstraction (Higher-base). This is usually done by defining rules mapping from a metamodel at a lower level (i.e., Lower-meta) as the domain to another metamodel at a higher or the same level (i.e., Higher-meta) as the range. Figure 3 shows the elements involved in the transformation. Concrete transformations can be classified into four types: Extraction, Abstraction, View, and Store. Extraction transforms code artifacts based on a certain grammar to a set of program facts based on a certain program metamodel; it is usually done by a parser that parses code artifacts. Abstraction transforms program models based on a certain lower metamodel to another model based on a certain higher metamodel; it is usually done by a filter component that queries, selects, and joins necessary data with respect to the higher metamodel, and target higher metamodels are sometimes implicitly declared for the purpose of interactive ad hoc abstraction. View transforms program models based on a metamodel to another model based on another visualization metamodel at a similar or almost the same abstraction level; the transformation results are then displayed, and typical examples are HTML tables, UML diagrams, and any general graph representation. Store transforms program models based on a metamodel to model data according to an exchange format at a similar or almost the same abstraction level; the results are then stored in a repository, and typical examples are XMI files, RDF files, and relational databases. |
Known implementation | Any reverse engineering tool. |
Related patterns | The following patterns are based on combinations of multiple concrete transformations. Integrated program reverse engineering performs Extraction, Abstraction, Store, and View in its solution. Fact extraction performs Extraction and Store in its solution. Architecture recovery performs Extraction, Abstraction, and View in its solution. |
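
The four concrete transformations of the pattern can be sketched as a small pipeline. This is only an illustration under stated assumptions: Python's standard `ast` module stands in for the parser, the fact records (`kind`, `name`, `container`) are a hypothetical program metamodel, and JSON stands in for an exchange format even though it is not one of the formats listed in this paper.

```python
import ast
import json

# Made-up code artifact to be reverse engineered.
SOURCE = '''
class Account:
    def deposit(self, amount):
        self.balance = self.balance + amount

    def withdraw(self, amount):
        self.deposit(-amount)

def audit(account):
    return account
'''


def extract(source: str) -> list:
    """Extraction: parse code artifacts into program facts."""
    facts = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef):
            facts.append({"kind": "Class", "name": node.name, "container": None})
            for member in node.body:
                if isinstance(member, ast.FunctionDef):
                    facts.append(
                        {"kind": "Method", "name": member.name, "container": node.name}
                    )
        elif isinstance(node, ast.FunctionDef):
            facts.append({"kind": "Function", "name": node.name, "container": None})
    return facts


def abstract(facts: list) -> dict:
    """Abstraction: select classes and join them with their method counts."""
    summary = {f["name"]: 0 for f in facts if f["kind"] == "Class"}
    for f in facts:
        if f["kind"] == "Method":
            summary[f["container"]] += 1
    return summary


def view(summary: dict) -> str:
    """View: render the higher-level model as a plain-text table."""
    lines = ["Class | Methods"]
    lines += [f"{name} | {count}" for name, count in sorted(summary.items())]
    return "\n".join(lines)


def store(facts: list) -> str:
    """Store: serialize the program facts as model data (JSON stand-in)."""
    return json.dumps({"facts": facts}, indent=2)


facts = extract(SOURCE)
summary = abstract(facts)
table = view(summary)
stored = store(facts)
print(table)
```

Calling only `extract` and `store` corresponds to the fact extraction pattern, while `extract`, `abstract`, and `view` together correspond to architecture recovery in a very reduced form.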
4 Taxonomy Construction
4.1 Construction Process
4.2 Systematic Literature Review
-
Subject: “meta model” OR “meta models” OR metamodel* . We use this category to find papers that define and/or use a metamodel.
-
Stimuli: “source code” OR “source codes” OR program* . We define this set to find studies based on the types of stimuli that are usually used in program metamodel studies.
-
Task: extract* OR transform* OR generat* . These terms are simple yet sufficient to identify relevant papers since any reverse engineering objective and application must employ some sort of transformation; for example, extraction and generation can be regarded as types of vertical transformations (Gray et al. 2004).
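The three term groups above can be applied to a title or abstract as in the following sketch. It assumes that the groups are combined with AND, which is common in SLR search strings but is not stated explicitly here, and that `*` means "any word ending". The names `SEARCH_GROUPS`, `term_to_regex`, and `matches` are hypothetical, and actual digital libraries implement their own query syntax.

```python
import re

# Term groups taken from the list above; quoted terms are exact phrases.
SEARCH_GROUPS = {
    "subject": ['"meta model"', '"meta models"', "metamodel*"],
    "stimuli": ['"source code"', '"source codes"', "program*"],
    "task": ["extract*", "transform*", "generat*"],
}


def term_to_regex(term: str) -> re.Pattern:
    """Translate a quoted phrase or a trailing-* wildcard term into a regex."""
    if term.startswith('"') and term.endswith('"'):
        return re.compile(rf"\b{re.escape(term[1:-1])}\b", re.IGNORECASE)
    if term.endswith("*"):
        return re.compile(rf"\b{re.escape(term[:-1])}\w*", re.IGNORECASE)
    return re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)


def matches(text: str) -> bool:
    """True if the text contains at least one term from every group (assumed AND)."""
    return all(
        any(term_to_regex(term).search(text) for term in terms)
        for terms in SEARCH_GROUPS.values()
    )


print(matches("A metamodel for extracting facts from source code"))
print(matches("A survey of UML diagram layout"))
```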
4.3 Inclusion and Exclusion Criteria
4.4 Paper Selection Process
5 Program Metamodel TAxonomy (ProMeTA)
5.1 Feature: Target Language
5.2 Feature: Abstraction Level
5.3 Feature: Metalanguage
5.4 Feature: Exchange Format
5.5 Feature: Processing Environment
5.6 Feature: Definition
5.7 Feature: Program Metadata and History Data
5.8 Feature: Quality
-
Functional suitability consists of three sub-characteristics: 1) functional appropriateness, which is mostly concerned with traceability (Kurtev et al. 2002; Czarnecki and Helsen 2003) from model elements to the corresponding portion of the source code, 2) functional correctness regarding how the program metamodel is verified (Wu 2010), and 3) functional completeness regarding the applicability of the metamodel (i.e., general purpose metamodels or task-specific ones) (Tilley et al. 1994). In general, low-level metamodels are good for executability since any GPL should provide executable semantics, whereas most mid- or high-level metamodels lack executable semantics.
-
Performance efficiency addresses the quantity of extracted data (Sim et al. 2000) and primarily depends on the granularity of the metamodel. A metamodel sacrifices such resource utilization if the ratio of the extracted information to code is very high.
-
Compatibility addresses the interoperability among different tools and environments, which is broken down into several concrete properties. The identity (i.e., the identity preservation during transformation), solution reuse, and neutrality are primarily determined by the exchange patterns (Jin et al. 2002). A metamodel satisfies integrity only if some special mechanism to ensure an errorless exchange has been provided with the metamodel (Jin et al. 2002). A metamodel satisfies the instance representation (Ferenc et al. 2002) if a model can be easily represented in any SEF. This property is almost identical to the content-presentation separation (Kurtev et al. 2002).
-
Usability addresses the learnability that is supported by the existence of documentation, samples, and user communities (Christopher 2006).
-
Reliability addresses the availability of the program metamodel in terms of licensing (Christopher 2006). Although metamodels should be fully available through websites or other means, sometimes only parts of a metamodel are provided.
-
Maintainability encompasses five sub-characteristics. Among them, simplicity and evolvability are primarily determined by the exchange patterns (Jin et al. 2002). Some metamodels have specific modularity mechanisms (such as packages) and/or reuse mechanisms (such as the inheritance and logical composition of metamodel elements) (Czarnecki and Helsen 2003) to improve maintainability. The formality is specified as partially formalized or completely formalized (Clark et al. 2015) according to the available metamodel definition.
-
Portability addresses adaptability and is composed of three concrete properties. Flexibility and scalability are primarily determined by the exchange patterns (Jin et al. 2002). A metamodel satisfies popularity if many different organizations besides the original developers have used it.
6 Validation of ProMeTA
6.1 Target Popular Metamodels
Metamodel | List of papers |
---|---|
Abstract Syntax Graph in TGraph | (Ebert 2008) |
Abstract Syntax Metamodel in ECORE/EMF | (Bergmayr and Wimmer 2013) |
Abstract Syntax Model in a graph grammar | (Naik and Bahulkar 2004) |
Abstract Syntax Tree in logic representation | (Chirila and Jebelean 2010) |
Abstract Syntax Tree Metamodel (ASTM) | |
Abstract Syntax Tree Model in MOF | (Soden and Eichler 2007) |
Abstract Syntax Tree Model in UML | (Antoniol et al. 2003) |
Architecture Model in TGraph | (Ebert 2008) |
Columbus Schema | (Vidács 2009) |
Common Meta-Model in common tree grammar | (Strein et al. 2006) |
Dagstuhl Middle Metamodel | (Lethbridge et al. 2004) |
Datrix schema | (Lin and Holt 2004) |
Delphi metamodel in UML | (Knodel and Calderon-Meza 2004) |
Generic AST model in MOF | (Reus et al. 2006) |
Grammar by EBNF | (Bergmayr and Wimmer 2013) |
GXL schema in UML | (Meng and Wong 2004) |
Hismo | (Gómez et al. 2009) |
Integrated Meta-model of Reengineering in UML | (Cho 2005) |
JaMoPP Java Model | (Heidenreich et al. 2010) |
Java Meta Model in UML | (Kollmann and Gogolla 2001) |
Java MetaModel in grUML | (Ebert 2008) |
Java Metamodel in MOF | (Favre 2008) |
KDM | |
FAMIX | |
MARPLE model in ECORE/EMF | (Arcelli et al. 2010) |
Meta-model for design patterns and source code | (Guéhéneuc and Albin-Amiot 2001) |
Program entities and relationships in RDB | (Harmer and Wilkie 2002) |
Program Metamodel in UML | (Wu 2010) |
Ring meta-model | (Gómez and Ducasse 2012) |
Source Code Meta-Model in UML | (Alikacem and Sahraoui 2009) |
SourcererDB Metamodel | (Ossher et al. 2009) |
SPOOL repository schema | |
System Engineering Technology Interface metamodel | (Krasovec and Howell 1995) |
UNIQ-ART Meta-model | |
6.2 Classification Results
-
Target language: Of the five metamodels, three are language independent, while two handle object-oriented source code. Regardless of the language independence, all support the Java language since it seems to be the most common, especially in the context of reverse engineering research and practice. The second most common language is C++. If the target language is a major one like Java or C++, existing program metamodels and their corresponding reverse engineering tools may be reused, but if the target language is a minor one, a specific metamodel must be selected or a new one must be created.
-
Abstraction level: All five metamodels can be used as mid-level metamodels, but only one metamodel (M2) can be used as a high-level one. According to the coverage of the low-level metamodel features, M1 and M2 are more useful even though they still miss some lexical structure features such as Token, Separator, and Layout. Support for language dialects is limited. Practitioners and researchers can choose an appropriate metamodel and its corresponding reverse engineering tool according to their abstraction level requirements. However, our classification results indicate that none of the existing metamodels supports all of the required features at certain abstraction levels; in this case, it may be necessary to extend existing metamodels or create a new one to cover the missing features.
-
Metalanguage: Four of the five metamodels adopt the standard meta-metamodel MOF or the unified language UML, which are explicitly and externally defined, while only M5 adopts a specific, implicitly-internally defined metalanguage. If practitioners and researchers adopt various tools for long-term usage, it may be better to choose or create program metamodels (like M1–M4) defined by widely accepted, explicitly-externally defined metalanguages (especially MOF and UML). In addition, the existence of user communities of metamodels could contribute to the ease of usage of their metalanguages; for example, since M3 has a large user community as identified regarding the feature Q9: Learnability, its metalanguage UML could be a good choice for creating (or selecting) program metamodels.
-
Exchange format: Corresponding to the metalanguage used, three of the five metamodels adopt standard SEFs such as XMI, which are explicitly-externally defined, while M5 supports a specific binary-based, implicitly-internally defined data exchange. If practitioners and researchers consider utilizing various tools for long-term usage, selecting or creating program metamodels with a good exchange format quality (like M1, M2, and M4), which support the widely accepted, explicitly-externally defined SEFs (especially XMI), may be a better choice; however, its impact on selection or creation could be less than those of other features (such as the abstraction level) since specific exchange formats can be additionally supported by preparing converters among exchange formats, unless the metamodel originally supports explicitly-externally defined SEFs.
-
Processing environment: Due to their popularity, all five metamodels have dedicated extractors and navigation support. Extractors and navigation support should be prepared to improve the ease of use of any program metamodel. There is dedicated transformation support, including refactoring facilities, for three of the five. Most of the metamodels (except for M5) are suitable for transformations and program analysis. Practitioners and researchers should check whether the processing environment and facilities are available to meet their reverse engineering objectives.
-
Definition: All five metamodels are manually defined. All except M5 are explicitly defined, leading to high quality metamodels with high compatibility, maintainability, and portability. Three of them are externally defined and fully formalized. The other two (M4 and M5) are internally defined. If practitioners and researchers utilize various tools for long-term usage, selecting or creating explicitly-externally defined metamodels (like M1–M3) is a better choice.
-
Program metadata and history data: There is little support for describing metadata and history data in the metamodels; only the programming language name and the file version are supported by M1 and M2, respectively. During the SLR, several history-aware metamodels were found to explicitly address the version history: Ring (Gómez and Ducasse 2012), Hismo (Gîrba and Ducasse 2006; Gómez et al. 2009), FAMIX-based RHDB code model (Antoniol et al. 2005), and FAMIX-based ArchEvoDB schema (Pinzger et al. 2005). If practitioners and researchers conduct reverse engineering in which history analysis is taken into account, selecting a history-aware metamodel, especially the RHDB code model and the ArchEvoDB schema, may be better since these are defined as extensions of FAMIX, which is a widely accepted popular metamodel.
-
Functionality: Two metamodels (M1 and M2) support most of the functional suitability features, including executability, traceability, and transformability, since these are low-level metamodels supporting the static and dynamic semantics shown in the abstraction level features. None explicitly state how these have been verified. Although most can be used for various purposes, M5 is intended only for several specific tasks such as dependency analysis. Practitioners and researchers should verify whether the potential program metamodels satisfy their reverse engineering functionality requirements. If a metamodel is used for various reverse engineering purposes, selecting a general one (like M1–M4) is better.
-
Non-functionality: Only M1 sacrifices performance efficiency since it contains all of the statement-level code descriptions. Three (M1–M3) have good usability since documents and samples with communities are well prepared. These three metamodels also have good compatibility, maintainability, and portability since they are explicitly-externally defined, fully formalized, and fully available. Unfortunately, the definitions of M4 and M5 seem to be unavailable elsewhere on the Internet or in the literature. Most of the metamodels (except M5) support inheritance and logical composition as reuse mechanisms. However, only M2 supports a dedicated modularity mechanism. Practitioners and researchers should check whether potential program metamodels satisfy their non-functionality requirements. If existing metamodels are to be reused, they must select fully available and formalized metamodels (like M1–M3).
6.3 Discussions
7 RQ1: Does ProMeTA cover all possible characteristics and limitations in existing works that evaluate and compare program metamodels?
8 RQ2: Does ProMeTA have orthogonality in its classification features?
9 RQ3: Is ProMeTA useful for guiding practitioners and researchers?
-
UC1. Developing new reverse engineering tools: When engineers want to build their own reverse engineering tools, they must define the requirements in program metamodels that enable and circumscribe the features of the tools. ProMeTA supports the requirements definition and guides reuse, extension, or creation of metamodels because engineers can recognize features included in ProMeTA as possible requirement items. Moreover, if a ProMeTA-based classification result of a potential metamodel for reuse or extension is available like M1–M5 in the above validation, engineers can easily determine whether the metamodel satisfies their requirements.
-
UC2. Choosing existing reverse engineering tools: When engineers want to reuse and eventually extend existing reverse engineering tools, they must compare and then select the appropriate one according to how the underlying program metamodels meet their objectives. ProMeTA can help by comparing criteria and the metamodels according to the characteristics defined in ProMeTA. Moreover, ProMeTA may help by comparing existing classification results of metamodels (if available).
-
UC3. Communicating or researching program metamodels and reverse engineering tools: ProMeTA can serve as a reference for the reverse engineering community, including practitioners and researchers. It can be extended by peers, providing the community with an important body of knowledge to guide future communications and research on program metamodels and the corresponding reverse engineering tools since it incorporates the characteristics of metamodels into a single orthogonal structure based on a conceptual framework that defines common terminology. For example, ProMeTA can serve as the basis for building an open repository of information of existing program metamodels (and corresponding tools) by accumulating classification results. The above-mentioned classification results of M1–M5 can be a starting point.
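As a rough illustration of UC2 and of the repository idea in UC3, classification results can be stored as feature records and filtered against requirements. The sketch below encodes only a few facts stated in Section 6.2 for M1–M5; the feature keys and the `select` function are hypothetical names, and the records are far coarser than the full ProMeTA classification.

```python
# Feature values as stated in the classification results of Section 6.2:
# all except M5 are explicitly defined; M1-M3 are externally defined;
# only M2 is usable as a high-level metamodel; M5 is task-specific;
# only M2 has a dedicated modularity mechanism; only M1 sacrifices
# performance efficiency.
CLASSIFICATION = {
    "M1": {"explicit": True, "external": True, "high_level": False,
           "general_purpose": True, "modularity": False, "sacrifices_performance": True},
    "M2": {"explicit": True, "external": True, "high_level": True,
           "general_purpose": True, "modularity": True, "sacrifices_performance": False},
    "M3": {"explicit": True, "external": True, "high_level": False,
           "general_purpose": True, "modularity": False, "sacrifices_performance": False},
    "M4": {"explicit": True, "external": False, "high_level": False,
           "general_purpose": True, "modularity": False, "sacrifices_performance": False},
    "M5": {"explicit": False, "external": False, "high_level": False,
           "general_purpose": False, "modularity": False, "sacrifices_performance": False},
}


def select(requirements: dict) -> list:
    """Return the metamodels whose recorded features match every requirement."""
    return sorted(
        name
        for name, features in CLASSIFICATION.items()
        if all(features[key] == value for key, value in requirements.items())
    )


# Requirement set for a long-lived, general-purpose tool chain (made-up example).
long_term = select(
    {"external": True, "general_purpose": True, "sacrifices_performance": False}
)
high_level = select({"high_level": True})
print(long_term)
print(high_level)
```

A real repository would need to cover every ProMeTA feature and record how each value was determined; the boolean flags here only show the kind of query that accumulated classification results make possible.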