---
language: en
license: apache-2.0
library_name: transformers
pipeline_tag: text2text-generation
tags:
- text-generation
- formal-language
- grammar-correction
- t5
- english
- text-formalization

model-index:
- name: formal-lang-rxcx-model
  results:
  - task:
      type: text2text-generation
      name: formal language correction
    metrics:
      - type: loss
        value: 2.1  # Replace with your actual training loss
        name: training_loss
      - type: rouge1
        value: 0.85  # Replace with your actual ROUGE score
        name: rouge1
      - type: accuracy
        value: 0.82  # Replace with your actual accuracy
        name: accuracy
    dataset:
      name: grammarly/coedit
      type: grammarly/coedit
      split: train
      
datasets:
- grammarly/coedit

model-type: t5-base
inference: true
base_model: t5-base

widget:
- text: "make formal: hey whats up"
- text: "make formal: gonna be late for meeting"
- text: "make formal: this is kinda cool project"

extra_gated_prompt: This is a fine-tuned T5 model for converting informal text to formal language.

extra_gated_fields:
  Company/Institution: text
  Purpose: text

---

# Formal Language T5 Model

This model is a fine-tuned version of T5-base that rewrites informal English text in a formal register and corrects its grammar.

## Model Description

- **Model Type:** T5-base fine-tuned
- **Language:** English
- **Task:** Text Formalization and Grammar Correction
- **License:** Apache 2.0
- **Base Model:** t5-base

## Intended Uses & Limitations

### Intended Uses
- Converting informal text to formal language
- Improving text professionalism
- Grammar correction
- Business communication enhancement
- Academic writing improvement

### Limitations
- Works best with English text
- Maximum input length: 128 tokens (the sequence length used during training)
- May not preserve specific domain terminology
- Best suited for business and academic contexts
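
Because of the 128-token limit, longer documents need to be split before they are passed to the model. The sketch below is one possible way to do that, not part of the model's own tooling: the helper name `split_into_chunks` is hypothetical, the sentence splitting is a simple regex, and whitespace-separated words are used as a rough stand-in for tokens (the real count depends on the T5 tokenizer, so the default budget is deliberately conservative).

```python
import re


def split_into_chunks(text, max_words=60):
    """Group sentences into chunks of at most `max_words` words.

    Assumption: word count is only a rough proxy for the tokenizer's
    token count, so `max_words` is kept well below 128.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks, current, count = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Start a new chunk if adding this sentence would exceed the budget.
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words

    if current:
        chunks.append(" ".join(current))
    return chunks
```

A single sentence longer than `max_words` is kept as its own chunk, so it may still be truncated by the tokenizer. Each chunk can then be prefixed with `make formal: ` and formalized separately.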

## Usage

```python
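from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "renix-codex/formal-lang-rxcx-model"
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Prefix the input with "make formal: ", the format used during training
text = "make formal: hey whats up"

# Truncate to the 128-token sequence length the model was trained with
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

# Allow enough new tokens that the output is not cut short
outputs = model.generate(**inputs, max_new_tokens=128)
formal_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(formal_text)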
```
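
To formalize several sentences in one call, the inputs can be batched. The following is a minimal sketch rather than an official API: `formalize_batch` is a hypothetical helper name, and the `max_length` default of 128 simply mirrors the sequence length stated in this card. It expects an already loaded model and tokenizer.

```python
def formalize_batch(texts, model, tokenizer, max_length=128):
    """Formalize a list of informal sentences in a single generate() call."""
    # Add the task prefix the model was trained with.
    prompts = ["make formal: " + text.strip() for text in texts]

    # Pad to the longest prompt and truncate anything over max_length tokens.
    inputs = tokenizer(
        prompts,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=max_length,
    )

    # max_new_tokens caps the length of each generated output.
    outputs = model.generate(**inputs, max_new_tokens=max_length)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)


# Example usage (downloads the model on first run):
# from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
# model_id = "renix-codex/formal-lang-rxcx-model"
# model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
# tokenizer = AutoTokenizer.from_pretrained(model_id)
# print(formalize_batch(["hey whats up", "gonna be late for meeting"], model, tokenizer))
```

Passing the model and tokenizer as arguments keeps the helper independent of how they were loaded, so the same function works on CPU or GPU setups.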

## Example Inputs and Outputs

| Informal Input | Formal Output |
|----------------|---------------|
| "hey whats up" | "Hello, how are you?" |
| "gonna be late for meeting" | "I will be late for the meeting." |
| "this is kinda cool" | "This is quite impressive." |

## Training

The model was trained on the Grammarly CoEdIT dataset (`grammarly/coedit`) with the following specifications:
- Base Model: T5-base
- Training Hardware: A100 GPU
- Sequence Length: 128 tokens
- Input Format: "make formal: [informal text]"
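
The input format above can be applied with a small helper. This is an illustrative sketch: `build_prompt` is a hypothetical name, and collapsing repeated whitespace is an assumption about reasonable preprocessing, not something this card states was done during training.

```python
PREFIX = "make formal: "


def build_prompt(informal_text: str) -> str:
    """Return the text in the "make formal: [informal text]" input format."""
    # Collapse runs of whitespace and trim the ends (assumed preprocessing).
    text = " ".join(informal_text.split())

    # Avoid adding the prefix twice if the caller already supplied it.
    if text.startswith(PREFIX.strip()):
        return text
    return PREFIX + text
```

For example, `build_prompt("  hey   whats up ")` returns `"make formal: hey whats up"`, which matches the widget examples at the top of this card.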

## License

Apache License 2.0

## Citation

```bibtex
@misc{formal-lang-rxcx-model,
    author = {renix-codex},
    title = {Formal Language T5 Model},
    year = {2024},
    publisher = {Hugging Face},
    journal = {Hugging Face Model Hub},
    url = {https://huggingface.co/renix-codex/formal-lang-rxcx-model}
}
```

## Developer

Model developed by renix-codex

## Ethical Considerations

This model is intended to assist in formal writing while maintaining the original meaning of the text. Users should be aware that:
- The model may alter the tone of personal or culturally specific expressions
- It should be used as a writing aid rather than a replacement for human judgment
- The output should be reviewed for accuracy and appropriateness

## Updates and Versions

Initial Release - February 2024
- Base implementation with T5-base
- Trained on the Grammarly CoEdIT dataset (`grammarly/coedit`)
- Optimized for formal language conversion