---
language: en
license: apache-2.0
library_name: transformers
pipeline_tag: text2text-generation
tags:
- text-generation
- formal-language
- grammar-correction
- t5
- english
- text-formalization
model-index:
- name: formal-lang-rxcx-model
  results:
  - task:
      type: text2text-generation
      name: formal language correction
    dataset:
      name: grammarly/coedit
      type: grammarly/coedit
      split: train
    metrics:
    - type: loss
      value: 2.1
      name: training_loss
    - type: rouge1
      value: 0.85
      name: rouge1
    - type: accuracy
      value: 0.82
      name: accuracy
datasets:
- grammarly/coedit
inference: true
base_model: t5-base
widget:
- text: 'make formal: hey whats up'
- text: 'make formal: gonna be late for meeting'
- text: 'make formal: this is kinda cool project'
extra_gated_prompt: This is a fine-tuned T5 model for converting informal text to formal language.
extra_gated_fields:
  Company/Institution: text
  Purpose: text
---
# Formal Language T5 Model

This model is fine-tuned from T5-base for formal language correction and text formalization.
## Model Description
- Model Type: T5-base fine-tuned
- Language: English
- Task: Text Formalization and Grammar Correction
- License: Apache 2.0
- Base Model: t5-base
## Intended Uses & Limitations

### Intended Uses
- Converting informal text to formal language
- Improving text professionalism
- Grammar correction
- Business communication enhancement
- Academic writing improvement
### Limitations
- Works best with English text
- Maximum input length: 128 tokens
- May not preserve specific domain terminology
- Best suited for business and academic contexts
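Because inputs beyond the 128-token limit are truncated, it can be useful to guard against over-long text before calling the model. A minimal sketch of such a pre-check, using whitespace splitting as a rough proxy for T5's subword tokenizer (actual token counts will differ, and the helper name is illustrative, not part of the released code):

```python
def truncate_for_model(text: str, max_tokens: int = 128) -> str:
    """Rough guard against over-long inputs.

    Whitespace words only approximate T5's subword tokens, so this is
    a conservative sketch, not the model's actual tokenization.
    """
    words = text.split()
    if len(words) <= max_tokens:
        return text
    return " ".join(words[:max_tokens])
```

For exact counts, tokenize with the model's own tokenizer and check the length of `input_ids` instead.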
## Usage

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("renix-codex/formal-lang-rxcx-model")
tokenizer = AutoTokenizer.from_pretrained("renix-codex/formal-lang-rxcx-model")

# Example usage: inputs must carry the "make formal:" prefix
text = "make formal: hey whats up"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=128)
formal_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(formal_text)
```
## Example Inputs and Outputs

| Informal Input | Formal Output |
|---|---|
| "hey whats up" | "Hello, how are you?" |
| "gonna be late for meeting" | "I will be late for the meeting." |
| "this is kinda cool" | "This is quite impressive." |
## Training
The model was trained on the Grammarly/COEDIT dataset with the following specifications:
- Base Model: T5-base
- Training Hardware: A100 GPU
- Sequence Length: 128 tokens
- Input Format: "make formal: [informal text]"
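The input format above can be produced with a small helper that wraps raw informal sentences in the expected prefix. A sketch (the helper name is hypothetical, not part of the released training code):

```python
def to_model_input(informal: str, prefix: str = "make formal") -> str:
    # The model expects inputs of the form "make formal: [informal text]".
    return f"{prefix}: {informal.strip()}"

examples = ["hey whats up", "gonna be late for meeting"]
batch = [to_model_input(s) for s in examples]
```

The same prefixing applies at inference time, so inputs seen in production match the training distribution.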
## License
Apache License 2.0
## Citation

```bibtex
@misc{formal-lang-rxcx-model,
  author = {renix-codex},
  title = {Formal Language T5 Model},
  year = {2024},
  publisher = {HuggingFace},
  journal = {HuggingFace Model Hub},
  url = {https://huggingface.co/renix-codex/formal-lang-rxcx-model}
}
```
## Developer
Model developed by renix-codex
## Ethical Considerations
This model is intended to assist in formal writing while maintaining the original meaning of the text. Users should be aware that:
- The model may alter the tone of personal or culturally specific expressions
- It should be used as a writing aid rather than a replacement for human judgment
- The output should be reviewed for accuracy and appropriateness
## Updates and Versions

### Initial Release - February 2024
- Base implementation with T5-base
- Trained on Grammarly/COEDIT dataset
- Optimized for formal language conversion