This section presents several open, software-related evolutionary data sets. Open data sets matter because they make experimentation, learning and research easier. They also provide a sound foundation for reproducible research.
These data sets have several distinctive features. First, they are easy to use: flat CSV files that can be imported (e.g. in R) with a single line. Second, on top of the classical source code measures, the metrics provided include less common ones such as rule-checking, mailing list and configuration management data. Finally, they provide three consistent layers of information for each version of the software: application, files, and functions.
In application-level data sets, each line describes a version of the full product. The included metrics come from code, configuration management and mailing lists.
In file-level data sets, each line describes a file in the product. The included metrics come from code and configuration management.
In function-level data sets, each line describes a function in the product. Only code metrics are included.
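Because every layer is a flat CSV file with one row per item, loading a data set takes only a few lines in most languages. The following Python sketch illustrates the idea with an in-memory sample. The column names (`version`, `sloc`, `commits`), the values, and the `load_dataset` helper are made up for illustration; they are not the actual headers or contents of the published files.

```python
import csv
import io

# Hypothetical excerpt of an application-level file: one row per version.
# Column names and values are placeholders, not the real data.
SAMPLE_CSV = """version,sloc,commits
v1,1000,10
v2,1200,15
"""


def load_dataset(source):
    """Read a flat CSV file (a path or an open text stream) into a list of dicts."""
    if isinstance(source, str):
        with open(source, newline="") as handle:
            return list(csv.DictReader(handle))
    return list(csv.DictReader(source))


# Load the sample; with a real file, pass its path instead of the stream.
rows = load_dataset(io.StringIO(SAMPLE_CSV))

# csv yields strings, so numeric metrics need an explicit conversion.
sloc_by_version = {row["version"]: int(row["sloc"]) for row in rows}
print(sloc_by_version)
```

The same helper would apply to file-level and function-level data sets, since only the meaning of a row changes between layers.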
- See Included Metrics
The data sets define 159 metrics: 20 from code, 16 from configuration management, 30 from change management, and 93 from rule-checking tools.
- See Included Rules
Rule-checking tools look for common anti-patterns in the code. They make it possible to follow the evolution of development practices over the years. The proposed data sets include rules from PMD (58 rules), CheckStyle (39 rules) and SQuORE (21 rules).
- See Ant Dataset
These data sets were produced during the Maisqual project and are used to demonstrate some of the algorithms developed within it. They feature a hundred metrics on various versions of the Apache Ant project, in a clean CSV format for easy use.
- See JMeter
JMeter is another well-known project from the Apache Software Foundation, widely used for testing HTTP components. Both releases and weekly snapshots are provided.