	---
	language:
	- en

	tags:
	- gec

	library_name: opennmt
	license: mit
	metrics:
	- bleu

	inference: false
	---

	### Introduction

	This repository describes how to use OpenNMT for the Grammar Error Correction (GEC) task. The idea is to approach GEC as a translation task: the model translates text that contains errors into its corrected form.
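
To make the translation framing concrete, here is a minimal sketch of how GEC data can be laid out as parallel text, with the erroneous sentence as the source and the corrected sentence as the target. The sentence pairs and the `to_parallel` helper are hypothetical illustrations written by hand; they are not taken from the training corpus and are not output from this model.

```python
# Hypothetical, hand-written pairs: (sentence with errors, corrected sentence).
pairs = [
    ("The water are hot.", "The water is hot."),
    ("Today mine mother is in Barcelona.", "Today my mother is in Barcelona."),
]


def to_parallel(pairs):
    """Split (source, target) pairs into two aligned lists, one sentence per entry."""
    sources = [src for src, _ in pairs]
    targets = [tgt for _, tgt in pairs]
    return sources, targets


sources, targets = to_parallel(pairs)

# Entry i of `sources` is "translated" into entry i of `targets`.
for src, tgt in zip(sources, targets):
    print(f"{src} -> {tgt}")
```

A translation toolkit typically consumes such data as two aligned files, one sentence per line, which is why the pairs are split into two lists of equal length.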

	### Usage

	Install the necessary dependencies:


	```bash
	pip3 install ctranslate2 pyonmttok
	```


	Simple tokenization & translation using Python:


	```python
	import os

	import ctranslate2
	import pyonmttok
	from huggingface_hub import snapshot_download

	# Download the model files from the Hugging Face Hub (cached after the first call).
	model_dir = snapshot_download(repo_id="jordimas/gec-opennmt-english", revision="main")

	# Tokenize the input with the SentencePiece model shipped in the repository.
	tokenizer = pyonmttok.Tokenizer(mode="none", sp_model_path=os.path.join(model_dir, "sp_m.model"))
	tokens, _ = tokenizer.tokenize("The water are hot. My friends are going to be late. Today mine mother is in Barcelona.")

	# Translate the text into its corrected form and detokenize the best hypothesis.
	translator = ctranslate2.Translator(model_dir)
	results = translator.translate_batch([tokens])
	print(tokenizer.detokenize(results[0].hypotheses[0]))
	```
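
For longer inputs, one option is to split the text into sentences and correct them as a single batch. The sketch below is an illustration under stated assumptions, not part of this repository: `split_sentences` and `correct_text` are hypothetical helpers, the splitter is deliberately naive (it will mis-split abbreviations such as "e.g."), and whether sentence-level input improves the output of this model is untested. The tokenizer and translator are passed in as plain callables so the helper can be tried without downloading the model.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def correct_text(text, tokenize, translate_batch, detokenize):
    """Correct `text` sentence by sentence using caller-supplied callables.

    tokenize:        str -> list of tokens
    translate_batch: list of token lists -> list of token lists (best hypothesis each)
    detokenize:      list of tokens -> str
    """
    sentences = split_sentences(text)
    if not sentences:
        return ""
    batch = [tokenize(sentence) for sentence in sentences]
    corrected = translate_batch(batch)
    return " ".join(detokenize(tokens) for tokens in corrected)


# Dry run with identity stubs (no model involved): the text comes back unchanged.
print(correct_text("The water are hot. My friends are going to be late.",
                   tokenize=str.split,
                   translate_batch=lambda batch: batch,
                   detokenize=" ".join))
```

With the `tokenizer` and `translator` objects from the snippet above, the callables could be wired as `tokenize=lambda s: tokenizer.tokenize(s)[0]`, `translate_batch=lambda b: [r.hypotheses[0] for r in translator.translate_batch(b)]` and `detokenize=tokenizer.detokenize`. The `hypotheses` attribute assumes a CTranslate2 release that returns `TranslationResult` objects.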

	### Model

	The model was trained on the English portion of the [clang8](https://github.com/google-research-datasets/clang8) corpus.

	Details:
	* Model: TransformerBase
	* Tokenizer: SentencePiece
	* BLEU = 85.50

	### Papers

	Relevant papers:

	* [Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task](https://aclanthology.org/N18-1055.pdf)
	* [A Simple Recipe for Multilingual Grammatical Error Correction](https://arxiv.org/pdf/2106.03830.pdf)


	### Contact

	Email: Jordi Mas (jmas@softcatala.org)