Maisqual Datasets

This section presents several open data sets that capture the evolution of software projects. Open data sets matter because they make experimentation, learning and research easier. They also provide a good foundation for reproducible research.

These data sets have several distinguishing features. First, they are easy to use: flat CSV files that can be imported (e.g. in R) with a single line. Second, the metrics include uncommon measures such as rule-checking, mailing list and configuration management data, on top of the more classical source code measures. Finally, they provide three consistent layers of information for each version of the software: application, files, and functions.
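As a rough illustration of how little code a flat CSV file needs, here is a minimal Python sketch using only the standard library. The sample content and the column names (`version`, `sloc`, `scm_commits`) are hypothetical stand-ins, not the actual headers of the Maisqual files; with a real file, the in-memory sample would be replaced by an opened file handle.

```python
import csv
import io

# Hypothetical sample standing in for an application-level data set file.
# Column names and values are illustrative assumptions, not real Maisqual data.
SAMPLE = """version,sloc,scm_commits
1.0,1200,35
1.1,1350,42
"""


def load_rows(handle):
    """Read a flat CSV into a list of dicts, one dict per line of the data set."""
    return list(csv.DictReader(handle))


# With a real file: load_rows(open("some_file.csv", newline=""))
rows = load_rows(io.StringIO(SAMPLE))
print(len(rows), "versions loaded; first version:", rows[0]["version"])
```

Values come back as strings, so numeric columns need an explicit conversion (for example `int(row["sloc"])`) before any computation.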

In application data sets, each line describes a version of the full product. Included metrics are code, configuration management and mailing lists.
In file-level data sets, each line describes a file in the product. Included metrics are code and configuration management.
In function-level data sets, each line describes a function in the product. Included metrics come from code only.

Included Metrics

The data sets define 159 metrics: 20 from code, 16 from configuration management, 30 from change management, and 93 from rule-checking tools.

Included Rules

Rule-checking tools look for common anti-patterns in the code. They make it possible to follow the evolution of development practices over the years. The proposed data sets include rules from PMD (58 rules), CheckStyle (39 rules) and SQuORE (21 rules).
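Following a practice over time can be sketched as computing the change in a rule's violation count from one version to the next. The Python example below is only an illustration under stated assumptions: the column name `rule_a_violations` and all numbers are invented, and the real data sets may name and organise rule columns differently.

```python
import csv
import io

# Hypothetical excerpt of an application-level data set.
# The rule column name and the counts are made up for illustration.
SAMPLE = """version,rule_a_violations
1.0,12
1.1,9
1.2,15
"""


def violation_deltas(handle, column):
    """Return (version, change) pairs: the change in violations versus the previous row."""
    rows = list(csv.DictReader(handle))
    counts = [(row["version"], int(row[column])) for row in rows]
    # Pair each version with its predecessor and subtract the counts.
    return [(v2, c2 - c1) for (_, c1), (v2, c2) in zip(counts, counts[1:])]


deltas = violation_deltas(io.StringIO(SAMPLE), "rule_a_violations")
for version, change in deltas:
    print(version, change)
```

A negative change means fewer violations of that rule than in the previous version. The sketch assumes the rows are already ordered by version.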

Ant dataset

These data sets were produced during the Maisqual project and are used to demonstrate some of the algorithms developed there. They feature a hundred metrics on various versions of the Apache Ant project, in a flat CSV format for easy use.

The Ant data set was also featured in the MSR 2014 data track, held in Hyderabad, India. The article describing the data set and the poster presented at the conference are both available for download.

JMeter dataset

JMeter is another well-known project from the Apache Software Foundation, widely used for testing HTTP components. Both releases and weekly snapshots are provided.