This repository contains the detokenized grammatical error correction (GEC) datasets described in the paper "Adapting LLMs for Minimal-Edit Grammatical Error Correction".
The detokenized datasets are distributed under the same license terms as the original datasets. We are not the authors of the original datasets, which originate from the following research works:
A New Dataset and Method for Automatically Grading ESOL Texts (Yannakoudakis et al., ACL 2011)
The CoNLL-2014 Shared Task on Grammatical Error Correction (Ng et al., CoNLL 2014)
JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction (Napoles et al., EACL 2017)
The BEA-2019 Shared Task on Grammatical Error Correction (Bryant et al., BEA 2019)