---
language: en
license: apache-2.0
library_name: transformers
pipeline_tag: text2text-generation
tags:
- text-generation
- formal-language
- grammar-correction
- t5
- english
- text-formalization
model-index:
- name: formal-lang-rxcx-model
  results:
  - task:
      type: text2text-generation
      name: formal language correction
    metrics:
    - type: loss
      value: 2.1 # Replace with your actual training loss
      name: training_loss
    - type: rouge1
      value: 0.85 # Replace with your actual ROUGE score
      name: rouge1
    - type: accuracy
      value: 0.82 # Replace with your actual accuracy
      name: accuracy
    dataset:
      name: grammarly/coedit
      type: grammarly/coedit
      split: train
datasets:
- grammarly/coedit
model-type: t5-base
inference: true
base_model: t5-base
widget:
- text: "make formal: hey whats up"
- text: "make formal: gonna be late for meeting"
- text: "make formal: this is kinda cool project"
extra_gated_prompt: This is a fine-tuned T5 model for converting informal text to formal language.
extra_gated_fields:
  Company/Institution: text
  Purpose: text
---
# Formal Language T5 Model
This model is fine-tuned from T5-base for formal language correction and text formalization.
## Model Description
- **Model Type:** T5-base fine-tuned
- **Language:** English
- **Task:** Text Formalization and Grammar Correction
- **License:** Apache 2.0
- **Base Model:** t5-base
## Intended Uses & Limitations
### Intended Uses
- Converting informal text to formal language
- Improving text professionalism
- Grammar correction
- Business communication enhancement
- Academic writing improvement
### Limitations
- Works best with English text
- Maximum input length: 128 tokens
- May not preserve specific domain terminology
- Best suited for business and academic contexts
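Because inputs beyond 128 tokens are truncated, longer passages should be split into chunks before formalization. A minimal sketch (the helper name `split_for_model` is hypothetical, not part of this model's API) that uses a whitespace word count as a conservative proxy for the tokenizer's subword count:

```python
def split_for_model(text, max_words=90):
    """Split text into chunks short enough for the 128-token limit.

    A whitespace word count is only a rough proxy for T5 subword
    tokens, so max_words is set conservatively below 128; the exact
    limit depends on the tokenizer.
    """
    words = text.split()
    chunks = []
    for start in range(0, len(words), max_words):
        chunks.append(" ".join(words[start:start + max_words]))
    return chunks
```

Each chunk can then be formalized independently and the results rejoined, at the cost of losing cross-chunk context.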
## Usage
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("renix-codex/formal-lang-rxcx-model")
tokenizer = AutoTokenizer.from_pretrained("renix-codex/formal-lang-rxcx-model")

# Example usage: inputs are truncated to the model's 128-token limit
text = "make formal: hey whats up"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
outputs = model.generate(**inputs, max_new_tokens=128)
formal_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(formal_text)
```
## Example Inputs and Outputs
| Informal Input | Formal Output |
|----------------|---------------|
| "hey whats up" | "Hello, how are you?" |
| "gonna be late for meeting" | "I will be late for the meeting." |
| "this is kinda cool" | "This is quite impressive." |
## Training
The model was fine-tuned on the grammarly/coedit (CoEdIT) dataset with the following specifications:
- Base Model: T5-base
- Training Hardware: A100 GPU
- Sequence Length: 128 tokens
- Input Format: "make formal: [informal text]"
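Since the model was trained with the `"make formal: "` task prefix, the same prefix must be prepended at inference time. A minimal sketch of that preprocessing step (the helper name `build_prompt` is hypothetical):

```python
PREFIX = "make formal: "

def build_prompt(informal_text):
    """Prepend the task prefix the model was fine-tuned with.

    Strips surrounding whitespace and avoids double-prefixing if
    the caller has already added the prefix.
    """
    text = informal_text.strip()
    if text.startswith(PREFIX):
        return text
    return PREFIX + text
```

The resulting string is what should be passed to the tokenizer before generation.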
## License
Apache License 2.0
## Citation
```bibtex
@misc{formal-lang-rxcx-model,
  author    = {renix-codex},
  title     = {Formal Language T5 Model},
  year      = {2024},
  publisher = {HuggingFace},
  journal   = {HuggingFace Model Hub},
  url       = {https://huggingface.co/renix-codex/formal-lang-rxcx-model}
}
```
## Developer
Model developed by renix-codex
## Ethical Considerations
This model is intended to assist in formal writing while maintaining the original meaning of the text. Users should be aware that:
- The model may alter the tone of personal or culturally specific expressions
- It should be used as a writing aid rather than a replacement for human judgment
- The output should be reviewed for accuracy and appropriateness
## Updates and Versions
Initial Release - February 2024
- Base implementation with T5-base
- Trained on the grammarly/coedit dataset
- Optimized for formal language conversion