---
license: mit
datasets:
- fka/awesome-chatgpt-prompts
- gopipasala/fka-awesome-chatgpt-prompts
metrics:
- character
base_model:
- meta-llama/Llama-3.2-11B-Vision-Instruct
new_version: meta-llama/Llama-3.1-8B-Instruct
pipeline_tag: summarization
library_name: transformers
tags:
- legal
---
Model Card for Rithu Paran's Summarization Model

Model Details

Model Description

Purpose: This model is designed for text summarization, condensing long-form content into concise, meaningful summaries.

Developed by: Rithu Paran

Model type: Transformer-based language model for summarization

Base Model: meta-llama/Llama-3.2-11B-Vision-Instruct

Newer Version: meta-llama/Llama-3.1-8B-Instruct

Language(s): Primarily English, with limited support for other languages.

License: MIT License

Model Sources

Repository: Available on the Hugging Face Hub under Rithu Paran

Datasets Used: fka/awesome-chatgpt-prompts, gopipasala/fka-awesome-chatgpt-prompts

Uses

Direct Use

This model can be used directly to summarize various types of content, such as news articles, reports, and other informational documents.

Out-of-Scope Use

The model is not recommended for highly technical or specialized documents without additional fine-tuning or adaptation.

Bias, Risks, and Limitations

While this model was designed to be general-purpose, it may carry biases inherited from its training data. Users should be cautious when applying it to sensitive content or in applications where accuracy is critical.

How to Get Started with the Model

Here's a quick example of how to start using the model for summarization:

```python
from transformers import pipeline

# Load the summarization pipeline with this model
summarizer = pipeline("summarization", model="rithu-paran/your-summarization-model")

text = "Insert long-form text here."
summary = summarizer(text, max_length=100, min_length=30)

# The pipeline returns a list of dicts such as [{"summary_text": "..."}]
print(summary[0]["summary_text"])
```
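The pipeline returns a list of dictionaries, each containing a "summary_text" field. For documents longer than the model's maximum input length, one option is to summarize in chunks; the helper below is a minimal sketch of that idea (the summarize_long_text name, the word-based splitting, and the chunk_words value are illustrative assumptions, and it reuses the summarizer defined above):

```python
def summarize_long_text(text, chunk_words=500):
    # Split the document into word-based chunks (a simple illustrative heuristic)
    words = text.split()
    chunks = [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]
    # Summarize each chunk with the pipeline defined above, then join the partial summaries
    partial = [summarizer(chunk, max_length=100, min_length=30)[0]["summary_text"] for chunk in chunks]
    return " ".join(partial)
```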
Training Details

Training Data

Datasets: fka/awesome-chatgpt-prompts, gopipasala/fka-awesome-chatgpt-prompts

Preprocessing: Data was tokenized and normalized for better model performance.
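The exact preprocessing script is not included in this card. The snippet below is a minimal sketch of what the tokenization step could look like, assuming the base model's tokenizer, the dataset's "prompt" column, simple whitespace normalization, and an illustrative max_length of 1024:

```python
from transformers import AutoTokenizer

# Assumption: preprocessing reused the base model's tokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-11B-Vision-Instruct")

def preprocess(example):
    # Collapse whitespace as a simple normalization step, then tokenize with truncation
    text = " ".join(example["prompt"].split())
    return tokenizer(text, truncation=True, max_length=1024)
```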
Training Procedure

Hardware: Trained on GPUs using Hugging Face resources.

Precision: Mixed precision (fp16) was used to improve training efficiency.

Training Hyperparameters

Batch Size: 16

Learning Rate: 5e-5

Epochs: 3

Optimizer: AdamW
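These hyperparameters map directly onto Hugging Face TrainingArguments; the configuration below is a minimal sketch, in which the output directory and any settings not listed above are assumptions rather than values from the original training run:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./summarization-finetune",  # illustrative path, not the original
    per_device_train_batch_size=16,         # Batch Size: 16
    learning_rate=5e-5,                     # Learning Rate: 5e-5
    num_train_epochs=3,                     # Epochs: 3
    optim="adamw_torch",                    # Optimizer: AdamW
    fp16=True,                              # mixed-precision training, as noted above
)
```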
Evaluation

Metrics

Metrics Used: ROUGE score, BLEU score

Evaluation Datasets: Evaluated on a subset of fka/awesome-chatgpt-prompts for summarization performance.
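The resulting scores are not reported in this card. As an illustration, ROUGE and BLEU can be computed with Hugging Face's evaluate library roughly as follows (the predictions and references shown are placeholders, not outputs of this model):

```python
import evaluate

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

# Placeholder examples; replace with model summaries and reference summaries
predictions = ["The report outlines quarterly revenue growth."]
references = ["The report describes revenue growth over the quarter."]

print(rouge.compute(predictions=predictions, references=references))
print(bleu.compute(predictions=predictions, references=[[r] for r in references]))
```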
Technical Specifications

Model Architecture

Based on the Llama 3 transformer architecture, fine-tuned for summarization.

Compute Infrastructure

Hardware: NVIDIA A100 GPUs were used for training.

Software: Hugging Face's transformers library.

Environmental Impact

Hardware Type: NVIDIA A100 GPUs

Training Duration: ~10 hours

Estimated Carbon Emissions: Approximate emissions were calculated using the Machine Learning Impact calculator.

Contact

For any questions or issues, please reach out to Rithu Paran via the Hugging Face Forum.