Maisqual is a recursive acronym standing for "Maisqual Automagically Improves Software QUALity". It is a research project about "Mining Software Engineering Data for Useful Knowledge". By applying data mining algorithms and methods to software engineering data, we aim to get a new level of understanding to help enhance the quality of software development projects.



The Maisqual project was funded by SQuORING Technologies from 2011 to 2014, in collaboration with Philippe Preux from the SequeL team. SequeL is an acronym for Sequential Learning; it is a joint research project of the LIFL (Laboratoire d’Informatique Fondamentale de Lille, Université de Lille 3), the CNRS (Centre National de la Recherche Scientifique) and the INRIA (Institut National de Recherche en Informatique et en Automatique), located in the Lille-Nord Europe research center.

This work is still available on the Maisqual wiki hosted by SQuORING Technologies. The company decided to publish this material as a contribution to the field of software engineering. Thanks to them!

The following word cloud gives an interesting perspective on the thesis. The size of each word reflects its frequency in the text.

Maisqual Wordle


The following outline describes the different parts of the thesis, to help you get straight to the information that matters to you:

  1. State of the Art

Software Engineering
Simply put, software engineering can be defined as the art of building great software. This chapter reviews the major software engineering concepts that we used in this project. We present the fundamentals of software development processes and practices, establish the concerns and rules of measurement as applied to software, give a few definitions and concepts about software quality, and finally list some major quality models that one needs to know when working with software quality.

Data Mining
Data mining is the art of examining large data sources and generating a new level of knowledge to better understand and predict a system’s behaviour. In this chapter we review some statistical and data mining techniques, looking more specifically at how they have been applied to software engineering challenges. As often as possible we give practical examples extracted from the experiments we conducted. We start with some basic exploratory analysis tools, and then review the main techniques we investigated for our purpose: clustering, outlier detection, regression analysis, time series analysis, and distribution functions.

  2. The Maisqual project

This chapter describes the data mining process and the data retrieval steps used for Maisqual. We used literate data analysis to run semantically safe investigations. Two main documents were written: a study of a single version of a project, and a study of a series of consecutive snapshots of a project.

First stones: building the project
This chapter describes how the Maisqual project was modified after these experiments and how the precepts uncovered in the previous chapter were implemented in Maisqual. We investigate the nature of software engineering data, establish a typical topology of artefacts, explain the methodological approach, and implement an automated process to generate the data sets and analyses.

Generating the data sets
This chapter introduces the data sets that were generated for the SQuORE Labs using the framework set up earlier. The data sets produced during this phase have different structures and characteristics, so three types of sets are defined: evolution data sets, release data sets, and version data sets.

  3. SQuORE Labs

Working with the Eclipse Foundation
This chapter describes how we worked with the Eclipse Foundation to develop the first prototype of a maturity assessment model for the PolarSys working group.

Outliers detection
Developing automatic Action Items for the SQuORE product.

Clustering
Developing automatic scales for quality attributes evaluation in SQuORE.

Correlating practices and attributes of software
Investigating attributes of software, practices of development, and correlations in software engineering data.

  4. Conclusion