Detokenized-GEC-Datasets

This repository contains the detokenized grammatical error correction (GEC) datasets described in the paper "Adapting LLMs for Minimal-Edit Grammatical Error Correction".

The detokenized datasets are licensed under the terms of the original datasets. We are not the authors of these datasets. The datasets originate from the following research works:

A New Dataset and Method for Automatically Grading ESOL Texts (Yannakoudakis et al., ACL 2011)

The CoNLL-2014 Shared Task on Grammatical Error Correction (Ng et al., CoNLL 2014)

JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction (Napoles et al., EACL 2017)

The BEA-2019 Shared Task on Grammatical Error Correction (Bryant et al., BEA 2019)
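
To illustrate what "detokenized" means for GEC data, here is a minimal Python sketch that turns whitespace-tokenized text back into naturally spaced text. It is an assumption-laden illustration, not the procedure used to produce the files in this repository: the function name `detokenize` and the regex rules are hypothetical and cover only common English punctuation and contractions.

```python
import re


def detokenize(tokenized: str) -> str:
    """Join whitespace-separated tokens into naturally spaced text.

    Illustrative only: these rules are an assumption, not the method
    used to build the datasets in this repository.
    """
    text = tokenized
    # Remove the space before closing punctuation: "word ." -> "word."
    text = re.sub(r"\s+([.,!?;:%)\]}])", r"\1", text)
    # Remove the space after opening brackets: "( word" -> "(word"
    text = re.sub(r"([(\[{])\s+", r"\1", text)
    # Reattach split contractions: "do n't" -> "don't", "it 's" -> "it's"
    text = re.sub(r"\s+(n't|'s|'re|'ve|'ll|'d|'m)\b", r"\1", text)
    return text.strip()


if __name__ == "__main__":
    print(detokenize("I do n't like it , because it 's wrong ."))
```

A rule-based sketch like this does not handle quotation marks, hyphenated tokens, or non-English text; a production pipeline would need a more complete detokenizer.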

The repository contains four dataset folders (BEA, CoNLL, FCE, and JFLEG) and this README.md.

richardxoldman/detokenized-gec-datasets