---
language:
- en
tags:
- gec
library_name: opennmt
license: mit
metrics:
- bleu
inference: false
---

### Introduction

This repository describes how to use OpenNMT for the Grammar Error Correction (GEC) task. The idea is to approach GEC as a translation task: the model translates text that may contain errors into corrected text.

### Usage

Install the necessary dependencies:

```bash
pip3 install ctranslate2 pyonmttok huggingface_hub
```

Simple tokenization & translation using Python:

```python
import ctranslate2
import pyonmttok
from huggingface_hub import snapshot_download

# Download the model files from the Hugging Face Hub
model_dir = snapshot_download(repo_id="jordimas/gec-opennmt-english", revision="main")

# Tokenize the input with the SentencePiece model shipped in the repository
tokenizer = pyonmttok.Tokenizer(mode="none", sp_model_path=model_dir + "/sp_m.model")
tokens, _ = tokenizer.tokenize("The water are hot. My friends are going to be late. Today mine mother is in Barcelona.")

# Translate (correct) the tokens and print the detokenized best hypothesis
translator = ctranslate2.Translator(model_dir)
translated = translator.translate_batch([tokens])
print(tokenizer.detokenize(translated[0].hypotheses[0]))
```

# Model

The model has been trained on the [clang8](https://github.com/google-research-datasets/clang8) corpus for English.

Details:

* Model: TransformerBase
* Tokenizer: SentencePiece
* BLEU = 85.50

# Papers

Relevant papers:

* [Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task](https://aclanthology.org/N18-1055.pdf)
* [A Simple Recipe for Multilingual Grammatical Error Correction](https://arxiv.org/pdf/2106.03830.pdf)

# Contact

Email address: Jordi Mas: jmas@softcatala.org
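
# Inspecting corrections

Because the model returns a full corrected sentence rather than a list of edits, it can be useful to compare the input with the output to see what changed. The following is a minimal sketch using only Python's standard `difflib` module. The helper name `list_corrections` is hypothetical and not part of this repository, and the sentence pair is written by hand for illustration, not taken from actual model output.

```python
import difflib


def list_corrections(source: str, corrected: str):
    """Return (original, replacement) word-level edits between two sentences."""
    src_words = source.split()
    out_words = corrected.split()
    matcher = difflib.SequenceMatcher(a=src_words, b=out_words, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Keep only the spans that differ (replace, insert, delete)
        if tag != "equal":
            edits.append((" ".join(src_words[i1:i2]), " ".join(out_words[j1:j2])))
    return edits


# Hand-written example pair (an assumption, not real model output)
print(list_corrections("The water are hot.", "The water is hot."))
```

In practice, the second argument would be the detokenized string printed by the snippet in the Usage section. The comparison is a simple whitespace split, so punctuation attached to a word is treated as part of that word.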