Introduction

This repository describes how to use OpenNMT for the Grammar Error Correction (GEC) task. The idea is to approach GEC as a translation task: the ungrammatical sentence is the source and its corrected version is the target.

Usage

Install the necessary dependencies:

pip3 install ctranslate2 pyonmttok huggingface_hub

Simple tokenization & translation using Python:

import ctranslate2
import pyonmttok
from huggingface_hub import snapshot_download

# Download the model files from the Hugging Face Hub (cached after the first call).
model_dir = snapshot_download(repo_id="jordimas/gec-opennmt-english", revision="main")

# Tokenize the input with the SentencePiece model shipped with the repository.
tokenizer = pyonmttok.Tokenizer(mode="none", sp_model_path=model_dir + "/sp_m.model")
tokens, _ = tokenizer.tokenize("The water are hot. My friends are going to be late. Today mine mother is in Barcelona.")

# Translate the input into its corrected form and detokenize the best hypothesis.
translator = ctranslate2.Translator(model_dir)
results = translator.translate_batch([tokens])
print(tokenizer.detokenize(results[0].hypotheses[0]))
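To correct longer texts, one option is to split the input into sentences and translate them as a single batch. The sketch below is only an illustration under stated assumptions: the helper names split_sentences and correct_text are hypothetical, the regular-expression sentence splitter is deliberately naive, and nothing in this repository states that the model works better on single sentences. The tokenizer and translator arguments are expected to be the pyonmttok.Tokenizer and ctranslate2.Translator objects created above.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def correct_text(text, tokenizer, translator):
    """Correct each sentence of `text` in one batch and join the results.

    `tokenizer` must provide tokenize() and detokenize(), and `translator`
    must provide translate_batch(), like the objects in the example above.
    """
    sentences = split_sentences(text)
    if not sentences:
        return ""
    # tokenize() returns a (tokens, features) pair; only the tokens are needed.
    batch = [tokenizer.tokenize(sentence)[0] for sentence in sentences]
    results = translator.translate_batch(batch)
    # Keep the best hypothesis for each sentence.
    return " ".join(tokenizer.detokenize(r.hypotheses[0]) for r in results)
```

With the objects from the previous example, this could be called as correct_text("The water are hot. Today mine mother is in Barcelona.", tokenizer, translator).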

Model

The model was trained on the clang8 corpus for English.

Details:

  • Model: TransformerBase
  • Tokenizer: SentencePiece
  • BLEU: 85.50

Contact

Email: Jordi Mas (jmas@softcatala.org)
