Supporting material for the paper

Multi-objective Module Clustering for Kate

By Matheus Paixao, Mark Harman and Yuanyuan Zhang
Centre for Research on Evolution, Search and Testing (CREST UCL)
University College London

A complementary technical report about this paper is available here

May 2015 (last update: May 2015)

Table of Contents
  1. Abstract
  2. Kate Modularization Datasets


Abstract

The paper applies multi-objective search-based software remodularization to the program Kate, showing how this can improve cohesion and coupling, and investigating the differences between weighted and unweighted approaches and between the equal-size and maximising cluster approaches. It also investigates the effects of considering omnipresent modules. Overall, it provides evidence that search-based modularization can benefit Kate developers.

Kate Modularization Datasets

Data Extraction

Kate’s source code is organized in only two folders, src and session, each of which contains some classes. First, the call graph of each function in Kate was extracted directly from the source code using Doxygen. Doxygen was then also used to extract the inheritance graph between classes. Finally, Kate’s unweighted and weighted Module Dependency Graphs (MDGs) were created from the call and inheritance graphs, where each class is considered a module, and a function call or inheritance from one class to another represents a dependency between the respective modules. The weight of an edge in the weighted MDG is the number of function calls from one class to another. The folders that contain the classes are taken as the clusters.
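The construction described above can be sketched in a few lines of Python. This is only an illustration, not the tooling used for the paper: the input format (one pair per function call, one pair per inheritance relation), the function name build_mdgs and the class names are all hypothetical. The source does not say what weight an inheritance-only dependency receives, so the sketch assumes a weight of 1 in that case, and it assumes that calls within a single class are ignored.

```python
from collections import Counter


def build_mdgs(calls, inheritance):
    """Build unweighted and weighted MDGs with classes as modules.

    calls: iterable of (caller_class, callee_class), one entry per function call.
    inheritance: iterable of (derived_class, base_class).
    Returns (unweighted_edges, weighted_edges).
    """
    # Edge weight = number of function calls from one class to another.
    # Assumption: calls inside the same class are not module dependencies.
    weighted = Counter((src, dst) for src, dst in calls if src != dst)

    # Inheritance also creates a dependency between two classes.
    # Assumption: an edge that exists only through inheritance gets weight 1.
    for derived, base in inheritance:
        if derived != base and (derived, base) not in weighted:
            weighted[(derived, base)] = 1

    # The unweighted MDG keeps the same edges and drops the weights.
    unweighted = set(weighted)
    return unweighted, dict(weighted)


# Hypothetical input data, not taken from Kate.
calls = [
    ("ClassA", "ClassB"),
    ("ClassA", "ClassB"),
    ("ClassB", "ClassC"),
    ("ClassA", "ClassA"),  # intra-class call, ignored
]
inheritance = [("ClassD", "ClassA")]

unweighted_mdg, weighted_mdg = build_mdgs(calls, inheritance)
print(sorted(unweighted_mdg))
print(weighted_mdg)
```

In this toy input, ClassA calls ClassB twice, so that edge has weight 2 in the weighted MDG and appears once in the unweighted MDG.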

There are usually some modules that have more dependencies than the average. Such modules are called omnipresent because they do not seem to belong to any particular cluster, but to the system as a whole. The omnipresent modules were identified using thresholds. By choosing an omnipresent threshold o_t = 3, for example, all modules that have 3 times more dependencies than the average are considered to be omnipresent. The smaller the threshold, the more modules are identified as omnipresent. Two different thresholds were used in this work, o_t = 3 and o_t = 2. A threshold o_t = 4 did not identify any omnipresent module.
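A minimal sketch of the threshold rule follows, under stated assumptions. The source does not define how dependencies are counted per module, so the sketch counts every edge that touches a module (incoming or outgoing), and it treats "3 times more than the average" as a strict comparison against o_t times the mean count. The function name omnipresent_modules and the module names are hypothetical.

```python
from collections import Counter


def omnipresent_modules(edges, threshold):
    """Return the modules whose dependency count exceeds threshold * average.

    edges: iterable of (source_module, target_module) pairs from the MDG.
    threshold: the omnipresent threshold o_t.
    """
    # Assumption: a module's dependency count is the number of edges
    # that touch it, whether incoming or outgoing.
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    if not degree:
        return set()

    average = sum(degree.values()) / len(degree)
    # Assumption: strict comparison against o_t times the average.
    return {module for module, count in degree.items()
            if count > threshold * average}


# Hypothetical MDG: "hub" depends on nine modules, "mid" on six of them.
leaves = [f"leaf{i}" for i in range(1, 10)]
edges = [("hub", leaf) for leaf in leaves] + [("mid", leaf) for leaf in leaves[:6]]

for o_t in (4, 3, 2):
    print(o_t, sorted(omnipresent_modules(edges, o_t)))
```

With this toy graph the average count is 30/11, so o_t = 4 selects no module, o_t = 3 selects only hub, and o_t = 2 selects hub and mid, which mirrors the observation that a smaller threshold identifies more omnipresent modules.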

Datasets Download
  Dataset Example
  All Datasets
  Unweighted
  Unweighted Threshold 3
  Unweighted Threshold 2
  Weighted
  Weighted Threshold 3
  Weighted Threshold 2