Design Pattern Detection

\section{Background} \label{background}

\subsection{Language Independence} Fabry and Mens \cite{Fabry2004} proposed a language-independent approach to detecting object-oriented design patterns based on logic meta-programming. They separated the language-independent commonalities from the language-specific variabilities. This let them write logic rules that reason about the structure of object-oriented programs independently of the actual object-oriented programming language. As examples, they used Smalltalk and Java, two object-oriented languages with significant differences. They presented the SOUL logic meta-programming system, which they extended to support reasoning about Java (the extended version is called SOULJava). They developed a meta-level interface that provides the representational mapping between the logic meta-language and the object-oriented base language. They also built a logic repository of all the predicates available to the reasoning engine. The main idea is to provide a logical representation of object-oriented source code and to identify best practice patterns and design patterns using language-independent logic rules. The approach was tested on two Smalltalk and two Java applications, using only a very limited number of design patterns and best practice patterns. No false positives or false negatives were reported. This suggests the approach is reliable, although the small number of patterns tested limits how far the conclusion extends. Their work shows that reasoning about object-oriented systems in a language-independent way is feasible. However, further work is needed to apply the approach to other design patterns, which, as the authors note, would require only a small number of extensions.
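The separation between language-independent rules and language-specific mappings can be illustrated with a small Python sketch. It is not SOUL, which is a Prolog-like logic language. The \texttt{CodeModel} protocol, the two adapter classes, and the \texttt{overrides} rule are hypothetical names chosen for illustration. The rule is written once against the interface and gives the same answer for a Java-like and a Smalltalk-like model.

```python
from typing import Dict, List, Optional, Protocol, Set, Tuple


class CodeModel(Protocol):
    """Hypothetical meta-level interface: the only view of code the rules see."""

    def classes(self) -> Set[str]: ...
    def superclass(self, cls: str) -> Optional[str]: ...
    def methods(self, cls: str) -> Set[str]: ...


class JavaModel:
    """Maps a Java-like description {class: {"extends": ..., "methods": [...]}}."""

    def __init__(self, data: Dict[str, dict]) -> None:
        self.data = data

    def classes(self) -> Set[str]:
        return set(self.data)

    def superclass(self, cls: str) -> Optional[str]:
        return self.data[cls]["extends"]

    def methods(self, cls: str) -> Set[str]:
        # Crude mapping: keep only the name of a signature such as "area()".
        return {sig.split("(")[0] for sig in self.data[cls]["methods"]}


class SmalltalkModel:
    """Maps a Smalltalk-like description {class: (superclass, [selectors])}."""

    def __init__(self, data: Dict[str, Tuple[Optional[str], List[str]]]) -> None:
        self.data = data

    def classes(self) -> Set[str]:
        return set(self.data)

    def superclass(self, cls: str) -> Optional[str]:
        return self.data[cls][0]

    def methods(self, cls: str) -> Set[str]:
        return set(self.data[cls][1])


def overrides(model: CodeModel) -> Set[Tuple[str, str, str]]:
    """Language-independent rule: (subclass, superclass, method) override triples."""
    result = set()
    known = model.classes()
    for cls in known:
        sup = model.superclass(cls)
        if sup is None or sup not in known:
            continue
        for name in model.methods(cls) & model.methods(sup):
            result.add((cls, sup, name))
    return result


java = JavaModel({
    "Shape": {"extends": None, "methods": ["area()", "name()"]},
    "Circle": {"extends": "Shape", "methods": ["area()"]},
})
smalltalk = SmalltalkModel({
    "Shape": (None, ["area", "printOn:"]),
    "Circle": ("Shape", ["area"]),
})
print(overrides(java) == overrides(smalltalk))  # True
```

Only the adapter classes know about language syntax. Supporting another language would mean writing a new adapter, not new rules.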
\subsection{Results Visualization} Albin-Amiot et al. \cite{Amiot2001} presented a set of tools and techniques that automate the instantiation and the detection of design patterns. They introduced the \textit{Pattern Description Language} (PDL). PDL describes design patterns as first-class entities that can both produce their associated code and detect their occurrences in existing code. It defines an abstract model of a design pattern that captures the pattern's structural and behavioral aspects. This formalization of design pattern descriptions supports both the instantiation and the detection of design patterns. They also presented PTIDEJ (\textit{Pattern Trace Identification, Detection, and Enhancement for Java}), a tool that does three things. It uses PDL to manipulate design pattern models automatically. It identifies and visualizes complete or distorted versions of design patterns in source code. It refactors the distorted versions so that they better comply with the corresponding design pattern.
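A pattern model can support detection of both complete and distorted versions, as in PTIDEJ. The following sketch shows one way to do this, assuming a deliberately simplified representation. It does not reproduce PDL. The \texttt{PatternModel} class, the relation kinds, and the \texttt{classify} function are hypothetical. A binding of roles to classes is called complete if all required relations exist, distorted if only some exist, and absent otherwise.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple

Relation = Tuple[str, str, str]  # (kind, source, target)


@dataclass(frozen=True)
class PatternModel:
    """Hypothetical first-class pattern model: roles plus required relations."""
    name: str
    roles: Tuple[str, ...]
    relations: FrozenSet[Relation]  # expressed between role names


def classify(model: PatternModel, binding: Dict[str, str],
             program: Set[Relation]) -> Tuple[str, Set[Relation]]:
    """Compare a role -> class binding against the relations found in the program."""
    expected = {(kind, binding[src], binding[dst])
                for kind, src, dst in model.relations}
    missing = expected - program
    if not missing:
        return "complete", missing
    if len(missing) < len(expected):
        return "distorted", missing  # partial match: a candidate for refactoring
    return "absent", missing


composite = PatternModel(
    name="Composite",
    roles=("Component", "Composite", "Leaf"),
    relations=frozenset({
        ("inherits", "Composite", "Component"),
        ("inherits", "Leaf", "Component"),
        ("aggregates", "Composite", "Component"),
    }),
)
binding = {"Component": "Graphic", "Composite": "Picture", "Leaf": "Line"}
program = {("inherits", "Picture", "Graphic"), ("inherits", "Line", "Graphic")}
print(classify(composite, binding, program))
# ('distorted', {('aggregates', 'Picture', 'Graphic')})
```

The set of missing relations returned for a distorted instance is the information a refactoring step would need to bring the code back in line with the model.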
\subsection{Systems Parsing} Different approaches have been proposed to statically analyze software systems. A system can be parsed into another representation, such as a graph. This representation usually consists of entities, attributes, and relations, which detection tools then examine to find design patterns.
For instance, Czibula and Serban \cite{Czibula2008} proposed an original clustering-based approach for identifying instances of design patterns. Their approach consists of four steps. The first is \textit{data collection}: they analyze the system to extract all relevant entities, such as classes, methods, and attributes, and compute the distance between each pair of classes involved in a relation. The second is \textit{preprocessing}, which reduces the search space by eliminating classes that can never take part in the design pattern being detected. The third is \textit{grouping*, in which entities are grouped according to dissimilarity measures between classes. The last is \textit{design pattern recovery}, in which design pattern instances that meet the pattern's constraints are extracted from the generated clusters. They reported only one experiment, detecting the Proxy design pattern; two of the three detected instances were true positives, and no further analysis was provided.
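The four steps can be sketched in Python under explicit assumptions. The dissimilarity measure (Jaccard distance over hypothetical feature sets), the single-linkage grouping with a fixed threshold, and the Proxy constraints checked in the recovery step are stand-ins chosen for illustration, not the paper's own definitions. The step-1 data collection is represented simply by the input dictionaries. All function names are hypothetical.

```python
from itertools import combinations, permutations
from typing import Dict, List, Set, Tuple


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Dissimilarity in [0, 1]; an assumed stand-in for the paper's measures."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def group(features: Dict[str, Set[str]], threshold: float) -> List[Set[str]]:
    """Single-linkage grouping with union-find: close classes share a cluster."""
    parent = {c: c for c in features}

    def find(c: str) -> str:
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for a, b in combinations(features, 2):
        if jaccard_distance(features[a], features[b]) < threshold:
            parent[find(a)] = find(b)
    clusters: Dict[str, Set[str]] = {}
    for c in features:
        clusters.setdefault(find(c), set()).add(c)
    return list(clusters.values())


def detect_proxies(features: Dict[str, Set[str]], parents: Dict[str, str],
                   refs: Dict[str, Set[str]],
                   threshold: float = 0.5) -> List[Tuple[str, str, str]]:
    # Step 1 (data collection) is represented by the three input dictionaries.
    # Step 2 (preprocessing): a class without a superclass cannot play Proxy or
    # RealSubject in this sketch, so it is dropped from the search space.
    candidates = {c: f for c, f in features.items() if c in parents}
    found: List[Tuple[str, str, str]] = []
    # Step 3 (grouping).
    for cluster in group(candidates, threshold):
        # Step 4 (recovery): keep (subject, proxy, real) triples meeting the constraints.
        for proxy, real in permutations(sorted(cluster), 2):
            if parents[proxy] == parents[real] and real in refs.get(proxy, set()):
                found.append((parents[proxy], proxy, real))
    return found


features = {
    "RealImage": {"display", "load", "parent:Image"},
    "ImageProxy": {"display", "parent:Image"},
    "Logger": {"log"},
}
parents = {"RealImage": "Image", "ImageProxy": "Image"}
refs = {"ImageProxy": {"RealImage"}}
print(detect_proxies(features, parents, refs))  # [('Image', 'ImageProxy', 'RealImage')]
```

Preprocessing runs before grouping, so the pairwise comparisons in the clustering step only touch classes that could still play a role.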
The main idea of the study by Kaczor et al. \cite{Kaczor2006} was to convert the program into a string representation. They presented a bit-vector algorithm for design pattern detection. Such an algorithm finds a solution in a number of bit-vector operations that is bounded independently of the program length. They first convert the models of the design motifs and of the program into digraphs, considering only binary relationships. They then convert these digraphs into Eulerian digraphs by adding dummy edges between vertices whose in-degree and out-degree differ. Next, a minimum Eulerian circuit is computed to obtain a unique string representation of each design motif model and of the program model. Finally, an iterative bit-vector algorithm identifies design patterns by matching the string representations of the design motifs against the string representation of the program model.
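The following Python sketch illustrates the pipeline. The Eulerization step is implemented as described: dummy edges are added between vertices whose in-degree and out-degree differ. The remaining two steps use substitutes that should be named plainly.

\begin{itemize}
\item The circuit comes from Hierholzer's algorithm with sorted adjacency. This makes it deterministic but not necessarily the \emph{minimum} circuit used in the paper.
\item The matcher is the textbook Shift-And bit-parallel exact matcher. It compares relation labels only, ignores which classes play which role, and does not treat the circuit as cyclic.
\end{itemize}

All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, str]  # (source vertex, target vertex, relation label)


def eulerize(edges: List[Edge]) -> List[Edge]:
    """Add dummy edges until every vertex has in-degree == out-degree.

    Assumes the digraph is weakly connected, as an Eulerian circuit requires.
    """
    out_deg: Dict[str, int] = defaultdict(int)
    in_deg: Dict[str, int] = defaultdict(int)
    for src, dst, _ in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    vertices = sorted(set(out_deg) | set(in_deg))
    # One entry per missing outgoing / incoming edge; both lists have equal length
    # because total in-degree always equals total out-degree.
    need_out = [v for v in vertices for _ in range(in_deg[v] - out_deg[v])]
    need_in = [v for v in vertices for _ in range(out_deg[v] - in_deg[v])]
    return edges + [(u, v, "dummy") for u, v in zip(need_out, need_in)]


def eulerian_circuit(edges: List[Edge], start: str) -> List[Edge]:
    """Hierholzer's algorithm; sorted adjacency makes the result deterministic."""
    adj: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for src, dst, label in sorted(edges, reverse=True):
        adj[src].append((dst, label))  # pop() will then yield the smallest first
    stack: List[Tuple[str, Optional[Edge]]] = [(start, None)]
    circuit: List[Edge] = []
    while stack:
        vertex, via = stack[-1]
        if adj[vertex]:
            dst, label = adj[vertex].pop()
            stack.append((dst, (vertex, dst, label)))
        else:
            stack.pop()
            if via is not None:
                circuit.append(via)
    circuit.reverse()
    return circuit


def shift_and(text: List[str], pattern: List[str]) -> List[int]:
    """Shift-And bit-parallel exact matching; pattern must be non-empty."""
    masks: Dict[str, int] = {}
    for i, token in enumerate(pattern):
        masks[token] = masks.get(token, 0) | (1 << i)
    accept = 1 << (len(pattern) - 1)
    state, hits = 0, []
    for j, token in enumerate(text):
        # Bit i of state is set iff pattern[:i + 1] ends at text[j].
        state = ((state << 1) | 1) & masks.get(token, 0)
        if state & accept:
            hits.append(j - len(pattern) + 1)
    return hits


program = eulerize([("A", "B", "inherits"), ("B", "C", "assoc")])
circuit = eulerian_circuit(program, start="A")
tokens = [label for _, _, label in circuit]
print(tokens)  # ['inherits', 'assoc', 'dummy']
print(shift_and(tokens, ["inherits", "assoc"]))  # [0]
```

Each text token costs one shift, one OR, and one AND on a machine word, regardless of where in the string a match occurs. This is the property that makes bit-vector matching attractive for long program strings.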
\subsection{Tools Scalability} Many studies aim to solve the scalability problem faced by design pattern detection tools. Antoniol et al. \cite{Antoniol1998} presented a conservative approach based on a multi-stage reduction strategy. It uses OO software metrics and structural properties to extract structural design patterns from OO design or code. Their approach exploits software metrics to reduce the complexity of the search space. It then uses method delegation information to further reduce the cardinality of the set of retrieved pattern candidates. An Abstract Object Language (AOL), which is based on UML, represents the code or the design and keeps the approach independent of the programming language. This representation contains information about classes, methods, and attributes, as well as the relations between classes. The recovery process consists of five steps. First, the AOL representation is extracted from the code and design and then parsed into an abstract syntax tree (AST) used in the subsequent steps. Next, the AST is traversed and decorated with class-level metrics computed on the AOL AST. A multi-stage constraint evaluator then evaluates candidate design pattern instances; these constraints are essential to reduce the dimensions of the search space. Evaluation proceeds in three stages. It starts with metric constraints (for instance, the number of inheritance, association, and aggregation relationships). It continues with structural constraints and ends with delegation constraints. Evaluating the metric constraints first reduces the search space complexity and the cardinality of the candidate set before the structural and delegation checks are applied.
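The ordering of the multi-stage evaluation can be sketched in Python. The sketch assumes a hypothetical \texttt{ClassInfo} record in place of the AOL AST and hand-picked constraints for an object-Adapter-like structure (Target, Adapter, Adaptee). It is not the constraint set used by Antoniol et al. Per-class metric constraints prune each role's candidates first. Only the survivors enter the cross product checked by the structural and delegation stages.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List, Set, Tuple


@dataclass
class ClassInfo:
    """Hypothetical class summary standing in for the decorated AOL AST."""
    name: str
    parents: Set[str] = field(default_factory=set)
    associations: Set[str] = field(default_factory=set)
    delegates_to: Set[str] = field(default_factory=set)  # classes its methods forward calls to


def find_adapters(classes: List[ClassInfo]) -> List[Tuple[str, str, str]]:
    """Return (target, adapter, adaptee) triples for an Adapter-like pattern."""
    children = {c.name: 0 for c in classes}
    for c in classes:
        for p in c.parents:
            if p in children:
                children[p] += 1
    # Stage 1: cheap per-class metric constraints prune each role's candidates.
    targets = [c for c in classes if children[c.name] >= 1]
    adapters = [c for c in classes if c.parents and c.associations]
    adaptees = classes  # no metric constraint on this role in this sketch
    # Stage 2: structural constraints on the reduced cross product.
    structural = [
        (t, a, e) for t, a, e in product(targets, adapters, adaptees)
        if len({t.name, a.name, e.name}) == 3
        and t.name in a.parents and e.name in a.associations
    ]
    # Stage 3: delegation constraints.
    return [(t.name, a.name, e.name) for t, a, e in structural
            if e.name in a.delegates_to]


classes = [
    ClassInfo("Target"),
    ClassInfo("Adapter", {"Target"}, {"Legacy"}, {"Legacy"}),
    ClassInfo("Legacy"),
    ClassInfo("Other", {"Target"}, {"Legacy"}),  # structurally fits, never delegates
]
print(find_adapters(classes))  # [('Target', 'Adapter', 'Legacy')]
```

In this example \texttt{Other} passes the metric and structural stages. It is rejected only by the delegation stage, which mirrors how delegation information further reduces the candidate set.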
Another study focusing on scalability was conducted by Gueheneuc et al. \cite{Gueheneuc2004}. They aimed to reduce the search space for identifying micro-architectures similar to design motifs. They used a set of attributes (size, filiation, cohesion, and coupling) to identify the classes playing the same role in a design motif. These attributes are later used to eliminate classes from the search space. First, they manually built a repository of classes playing roles in design motifs. Then, for each attribute, they used metrics proposed in the literature to associate a set of values with each class. After that, a machine learning algorithm was used to infer a set of rules. These rules were then validated and interpreted to create fingerprints that identify roles in design motifs. The rules are not used to detect design patterns directly. Instead, they reduce the search space of participating classes by eliminating classes whose fingerprint does not match the candidate role. The approach has been implemented in the PTIDEJ tool, where in some cases it reduced the search space by approximately 89\%. Computing the class metrics is not time-consuming; most of the effort goes into building the repository and training on the data.
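The pruning step can be sketched in Python, assuming a fingerprint is a set of inclusive value ranges, one per metric. The metric names, the range values, and the \texttt{prune} and \texttt{reduction} functions are hypothetical. In the study, the rules are learned from the repository rather than written by hand.

```python
from typing import Dict, List, Tuple

Metrics = Dict[str, float]
Fingerprint = Dict[str, Tuple[float, float]]  # metric -> inclusive (low, high)


def prune(classes: Dict[str, Metrics], fingerprint: Fingerprint) -> List[str]:
    """Keep only classes whose metric values fall inside every range of the fingerprint."""
    return [
        name for name, metrics in classes.items()
        if all(metric in metrics and low <= metrics[metric] <= high
               for metric, (low, high) in fingerprint.items())
    ]


def reduction(total: int, kept: int) -> float:
    """Fraction of the search space eliminated for this role."""
    return 1 - kept / total


classes = {
    "A": {"size": 5, "coupling": 1},
    "B": {"size": 40, "coupling": 8},
    "C": {"size": 30, "coupling": 6},
    "D": {"size": 2, "coupling": 0},
}
# Hypothetical fingerprint for one role; real rules would be learned, not hand-written.
role_fingerprint = {"size": (20, 100), "coupling": (5, 10)}
kept = prune(classes, role_fingerprint)
print(kept, reduction(len(classes), len(kept)))  # ['B', 'C'] 0.5
```

Checking a class against a fingerprint only compares a few precomputed numbers. This matches the observation that computing and applying the metrics is cheap compared with building the repository.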
\subsection{Tools Evaluation} Fülöp et al. \cite{Fulop2008} created a benchmark for comparing and evaluating design pattern miner tools. This work in progress is independent of language, tool, pattern, and software. The benchmark includes three design pattern mining tools and over a thousand design pattern instances, and new tools can easily be added to its database. The benchmark tool can be used to browse the database and query information by language, software, tool, and/or pattern. Pattern instances are shown in detail, along with standard statistics based on user evaluation and on the completeness and correctness of the results. Precision and recall rates are also available for design pattern instances. They are based on the human evaluation of the instances and a preset threshold: a detected pattern instance is considered true if its average evaluation score exceeds the threshold.
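The threshold rule and the resulting precision and recall can be computed as in the following sketch. It assumes that recall is measured against all benchmark instances judged true, whichever tool reported them. Instance identifiers, scores, and function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Set, Tuple


def true_instances(evaluations: Dict[str, List[float]], threshold: float) -> Set[str]:
    """An instance counts as true if its average evaluation score exceeds the threshold."""
    return {iid for iid, scores in evaluations.items() if mean(scores) > threshold}


def precision_recall(reported: Set[str], evaluations: Dict[str, List[float]],
                     threshold: float) -> Tuple[float, float]:
    """Precision over the tool's reports; recall over all instances judged true."""
    truth = true_instances(evaluations, threshold)
    tp = len(reported & truth)
    precision = tp / len(reported) if reported else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


evaluations = {"i1": [80, 90], "i2": [20, 40], "i3": [70, 60], "i4": [90, 100]}
print(precision_recall({"i1", "i2", "i3"}, evaluations, threshold=50))
# (0.6666666666666666, 0.6666666666666666)
```

Raising the threshold makes fewer instances count as true. Both rates therefore depend on the chosen threshold as well as on the tool.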
An interesting study was conducted by Kniesel and Binun \cite{Kniesel2009}. It combined five design pattern detection tools and proposed an approach called \textit{Data Fusion}, which merges their results to identify patterns that individual tools miss. They evaluated the similarity scoring tool, DP-Miner, PINOT, PTIDEJ, and FUJABA and made the following observations. Because of property relaxation, the tools check only a weaker set of a design pattern's constraints. As a result, they identify more general patterns more often than the actual patterns. The authors proposed notions of subpattern and superpattern relationships that are purely technical and unrelated to a pattern's intent. Every instance of a subpattern is also an instance of its superpattern, which makes the superpattern the more general of the two. They also call a pattern smaller if it is defined by a weaker set of constraints. Smaller patterns are therefore identified more reliably, and with higher confidence, by more than one tool. Bigger patterns, by contrast, are rarely classified identically by different tools, but when they are, they are very likely to be true positives. They also noticed that when tools disagree on big patterns, they sometimes agree on their superpatterns. A superpattern is thus a witness for its subpattern, and elemental design patterns (EDPs) are also treated as witnesses. They presented a set of rules for combining the results of the five tools. Their approach increased recall and precision for some design patterns. For other patterns, no improvement was achieved because the tools reported many false positives.
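The following Python sketch shows voting combined with superpattern witnesses. It is a simplification and does not reproduce Kniesel and Binun's actual fusion rules. It makes two assumptions:

\begin{itemize}
\item A candidate is accepted if at least \texttt{min\_votes} tools report it.
\item A candidate is also accepted if some tool reports an instance of its superpattern over the same classes.
\end{itemize}

The pattern names \texttt{Sub}, \texttt{Super}, and \texttt{Other}, the tool names, and the \texttt{fuse} function are hypothetical.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Candidate = Tuple[str, Tuple[str, ...]]  # (pattern name, participating classes)


def fuse(reports: Dict[str, Set[Candidate]], superpattern_of: Dict[str, str],
         min_votes: int = 2) -> Set[Candidate]:
    """Accept a candidate if enough tools agree or a superpattern instance witnesses it."""
    # Each tool reports a set, so it votes at most once per candidate.
    votes = Counter(c for found in reports.values() for c in found)
    accepted = set()
    for (pattern, classes), n in votes.items():
        sup = superpattern_of.get(pattern)
        # Counter returns 0 for unseen keys without inserting them.
        witnessed = sup is not None and votes[(sup, classes)] > 0
        if n >= min_votes or witnessed:
            accepted.add((pattern, classes))
    return accepted


reports = {
    "T1": {("Sub", ("A", "B")), ("Other", ("C", "D"))},
    "T2": {("Super", ("A", "B")), ("Other", ("E", "F"))},
    "T3": {("Other", ("C", "D"))},
}
print(sorted(fuse(reports, {"Sub": "Super"})))
# [('Other', ('C', 'D')), ('Sub', ('A', 'B'))]
```

Here \texttt{Sub} is reported by a single tool but is kept because another tool reports its superpattern on the same classes. \texttt{Other} over \texttt{C} and \texttt{D} is kept by agreement of two tools.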