---
language: en
license: apache-2.0
library_name: transformers
pipeline_tag: text2text-generation
tags:
- text-generation
- formal-language
- grammar-correction
- t5
- english
- text-formalization
model-index:
- name: formal-lang-rxcx-model
  results:
  - task:
      type: text2text-generation
      name: formal language correction
    metrics:
    - type: loss
      value: 2.1 # Replace with your actual training loss
      name: training_loss
    - type: rouge1
      value: 0.85 # Replace with your actual ROUGE score
      name: rouge1
    - type: accuracy
      value: 0.82 # Replace with your actual accuracy
      name: accuracy
    dataset:
      name: grammarly/coedit
      type: grammarly/coedit
      split: train
datasets:
- grammarly/coedit
model-type: t5-base
inference: true
base_model: t5-base
widget:
- text: "make formal: hey whats up"
- text: "make formal: gonna be late for meeting"
- text: "make formal: this is kinda cool project"
extra_gated_prompt: This is a fine-tuned T5 model for converting informal text to formal language.
extra_gated_fields:
  Company/Institution: text
  Purpose: text
---
# Formal Language T5 Model
This model is fine-tuned from T5-base for formal language correction and text formalization.
## Model Description
- **Model Type:** T5-base fine-tuned
- **Language:** English
- **Task:** Text Formalization and Grammar Correction
- **License:** Apache 2.0
- **Base Model:** t5-base
## Intended Uses & Limitations
### Intended Uses
- Converting informal text to formal language
- Improving text professionalism
- Grammar correction
- Business communication enhancement
- Academic writing improvement
### Limitations
- Works best with English text
- Maximum input length: 128 tokens
- May not preserve specific domain terminology
- Best suited for business and academic contexts
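Because inputs beyond 128 tokens are truncated, longer passages should be split into chunks before formalization. A minimal sketch (the helper name `split_for_model` is hypothetical, not part of this model's API) that uses a whitespace word count as a conservative proxy for the tokenizer's subword count:

```python
def split_for_model(text, max_words=90):
    """Split text into chunks short enough for the 128-token limit.

    A whitespace word count is only a rough proxy for T5 subword
    tokens, so max_words is set conservatively below 128; the exact
    limit depends on the tokenizer.
    """
    words = text.split()
    chunks = []
    for start in range(0, len(words), max_words):
        chunks.append(" ".join(words[start:start + max_words]))
    return chunks
```

Each chunk can then be formalized independently and the results rejoined, at the cost of losing cross-chunk context.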
## Usage
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("renix-codex/formal-lang-rxcx-model")
tokenizer = AutoTokenizer.from_pretrained("renix-codex/formal-lang-rxcx-model")

# Example usage: inputs are truncated to the model's 128-token limit
text = "make formal: hey whats up"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
outputs = model.generate(**inputs, max_new_tokens=128)
formal_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(formal_text)
```
## Example Inputs and Outputs
| Informal Input | Formal Output |
|----------------|---------------|
| "hey whats up" | "Hello, how are you?" |
| "gonna be late for meeting" | "I will be late for the meeting." |
| "this is kinda cool" | "This is quite impressive." |
## Training
The model was fine-tuned on the grammarly/coedit (CoEdIT) dataset with the following specifications:
- Base Model: T5-base
- Training Hardware: A100 GPU
- Sequence Length: 128 tokens
- Input Format: "make formal: [informal text]"
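Since the model was trained with the `"make formal: "` task prefix, the same prefix must be prepended at inference time. A minimal sketch of that preprocessing step (the helper name `build_prompt` is hypothetical):

```python
PREFIX = "make formal: "

def build_prompt(informal_text):
    """Prepend the task prefix the model was fine-tuned with.

    Strips surrounding whitespace and avoids double-prefixing if
    the caller has already added the prefix.
    """
    text = informal_text.strip()
    if text.startswith(PREFIX):
        return text
    return PREFIX + text
```

The resulting string is what should be passed to the tokenizer before generation.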
## License
Apache License 2.0
## Citation
```bibtex
@misc{formal-lang-rxcx-model,
  author    = {renix-codex},
  title     = {Formal Language T5 Model},
  year      = {2024},
  publisher = {HuggingFace},
  journal   = {HuggingFace Model Hub},
  url       = {https://huggingface.co/renix-codex/formal-lang-rxcx-model}
}
```
## Developer
Model developed by renix-codex
## Ethical Considerations
This model is intended to assist in formal writing while maintaining the original meaning of the text. Users should be aware that:
- The model may alter the tone of personal or culturally specific expressions
- It should be used as a writing aid rather than a replacement for human judgment
- The output should be reviewed for accuracy and appropriateness
## Updates and Versions
Initial Release - February 2024
- Base implementation with T5-base
- Trained on the grammarly/coedit dataset
- Optimized for formal language conversion