# Diff datasets

A collection of diff datasets. It contains:

- [Defects4J](https://github.com/GumTreeDiff/datasets/tree/main/defects4j), a Java dataset used in the program repair community. The `buggy` folder contains the files with the defect, and the `fixed` folder contains the corresponding fixed files.
- [BugsInPy](https://github.com/GumTreeDiff/datasets/tree/main/bugsinpy), a Python dataset used in the program repair community. Its layout is similar to that of Defects4J.
- [unparsable](https://github.com/GumTreeDiff/datasets/tree/main/unparsable), which contains the diff cases whose files I could not parse.
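
The `buggy`/`fixed` layout described above lends itself to a simple pairing step before running a diff tool. The following is a minimal sketch, not one of the scripts shipped with the datasets. It assumes that each file under `buggy` has its counterpart at the same relative path under `fixed`, which should be checked against the actual folder contents. The function name `pair_files` is hypothetical.

```python
from pathlib import Path


def pair_files(dataset_root):
    """Return (buggy, fixed) path pairs that share a relative path.

    Assumption (not confirmed by the dataset description): a file under
    `buggy/` has its counterpart at the same relative path under `fixed/`.
    Files without a counterpart are skipped.
    """
    root = Path(dataset_root)
    buggy_dir = root / "buggy"
    fixed_dir = root / "fixed"
    if not buggy_dir.is_dir():
        return []

    pairs = []
    for buggy_file in sorted(buggy_dir.rglob("*")):
        if not buggy_file.is_file():
            continue
        # Mirror the relative path of the buggy file under fixed/.
        fixed_file = fixed_dir / buggy_file.relative_to(buggy_dir)
        if fixed_file.is_file():
            pairs.append((buggy_file, fixed_file))
    return pairs
```

Calling `pair_files("defects4j")` from the repository root would then yield the file pairs to feed to a diff tool, provided the layout assumption holds.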

For reproducibility's sake, the Python scripts used to produce the datasets are also provided.
