rithuparan07 committed • Commit e8ece45 • Parent(s): db0ca04
Update README.md

README.md CHANGED
@@ -13,66 +13,56 @@ library_name: diffusers
 tags:
 - legal
 ---
-Model Card for Rithu Paran's Summarization Model
 Model Details
 Model Description
 Base Model: Meta-Llama/Llama-3.2-11B-Vision-Instruct
-Finetuned Model
-Repository: Available on Hugging Face Hub under Rithu Paran
-Datasets Used: fka/awesome-chatgpt-prompts, gopipasala/fka-awesome-chatgpt-prompts
-Uses
-Direct Use
-This model can be used directly to summarize various types of content, such as news articles, reports, and other informational documents.
-Out-of-Scope Use
-It is not recommended for highly technical or specialized documents without additional fine-tuning or adaptation.
-Bias, Risks, and Limitations
-While this model was designed to be general-purpose, it may carry biases inherited from its training data. Users should be cautious when applying it to sensitive content or in applications where accuracy is crucial.
-How to Get Started with the Model
-Here's a quick example of how to start using the model for summarization:
-
-```python
-from transformers import pipeline
-
-summarizer = pipeline("summarization", model="rithu-paran/your-summarization-model")
-text = "Insert long-form text here."
-summary = summarizer(text, max_length=100, min_length=30)
-print(summary[0]["summary_text"])  # the pipeline returns a list of dicts
-```
-Training Details
-Training Data
-Datasets: fka/awesome-chatgpt-prompts, gopipasala/fka-awesome-chatgpt-prompts
-Preprocessing: Data was tokenized and normalized for better model performance.
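The card does not specify the preprocessing pipeline, so the sketch below is only an illustration of what "tokenized and normalized" might mean in plain Python. The lowercasing, whitespace collapsing, and whitespace tokenization are assumptions, not the card's actual code; real fine-tuning would use the model's own tokenizer.

```python
import re

def normalize(text: str) -> str:
    # Assumed normalization: trim, lowercase, collapse runs of whitespace.
    return re.sub(r"\s+", " ", text.strip().lower())

def tokenize(text: str) -> list[str]:
    # Simple whitespace tokenization; a real pipeline would call the
    # model's subword tokenizer instead.
    return normalize(text).split(" ")
```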
-Training Procedure
-Hardware: Trained on GPUs with Hugging Face API resources.
-Precision: Mixed-precision (fp16) was used to improve training efficiency.
-Training Hyperparameters
-Batch Size: 16
-Learning Rate: 5e-5
-Epochs: 3
-Optimizer: AdamW
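The card names AdamW without explaining it. As a minimal sketch, a single AdamW update for one scalar parameter looks like the following, using the listed learning rate of 5e-5; the betas, epsilon, and weight-decay values are common defaults assumed here, not taken from the card.

```python
import math

def adamw_step(theta, grad, m, v, t, lr=5e-5, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter (illustrative only)."""
    m = beta1 * m + (1 - beta1) * grad            # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                  # bias correction
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: applied to the parameter, not folded into the gradient.
    theta = theta - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * theta)
    return theta, m, v
```

The "decoupled" weight-decay term is what distinguishes AdamW from plain Adam with L2 regularization.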
-Evaluation
-Metrics
-Metrics Used: ROUGE Score, BLEU Score
-Evaluation Datasets: Evaluated on a subset of fka/awesome-chatgpt-prompts for summarization performance.
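The card mentions ROUGE without scores or a variant. To make the metric concrete, here is a plain-Python illustration of unigram ROUGE-1 F1; reported numbers would normally come from a standard implementation (e.g. the rouge-score package), so this helper is only a sketch.

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram ROUGE-1 F1 between a candidate summary and a reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```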
-Technical Specifications
-Model Architecture
-Based on the Llama-3 architecture, optimized for summarization through attention-based mechanisms.
-Compute Infrastructure
-Hardware: Nvidia A100 GPUs were used for training.
-Software: Hugging Face's transformers library along with the diffusers library.
-Environmental Impact
-Hardware Type: Nvidia A100 GPUs
-Training Duration: ~10 hours
-Estimated Carbon Emission: Approximate emissions calculated using the Machine Learning Impact calculator.
-Contact
-For any questions or issues, please reach out to Rithu Paran via the Hugging Face Forum.
 tags:
 - legal
 ---
+1. Model Overview Section:
+Add a brief paragraph summarizing the model's purpose, what makes it unique, and its intended users. For example:
+   This model, developed by Rithu Paran, is designed to provide high-quality text summarization, making it ideal for applications in content curation, news summarization, and document analysis. Leveraging the Meta-Llama architecture, it delivers accurate, concise summaries while maintaining key information, and is optimized for general-purpose use.
+2. Model Description:
+Under Model Type, clarify whether the model targets general text summarization or a specific summarization task (e.g., long-form content, news).
+Update Language(s) with more detail on the model's primary language capabilities.
+
3. Model Use Cases:
|
26 |
+
Expand Direct Use and Out-of-Scope Use with specific examples to guide users.
|
27 |
+
Direct Use: News article summarization, summarizing reports for quick insights, content summarization for educational purposes.
|
28 |
+
Out-of-Scope Use: Avoid using it for legal or medical content without specialized training.
|
+4. Bias, Risks, and Limitations:
+Include any known biases related to the datasets used. For example: "The model may reflect certain cultural or societal biases present in the training data."
+Add a note on limitations, such as reduced accuracy on complex technical summaries or occasional nonsensical output.
+5. How to Get Started with the Model:
+Add more usage tips, such as how to adjust parameters for different summary lengths. Example:
+```python
+summary = summarizer(text, max_length=150, min_length=50, do_sample=False)
+```
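One concrete tip along these lines is to scale `max_length`/`min_length` with the input size rather than hard-coding them. The helper below is hypothetical: the 10-30% ratios and the clamping bounds are assumptions for illustration, not part of the card.

```python
def summary_length_bounds(n_input_tokens: int,
                          max_ratio: float = 0.3,
                          min_ratio: float = 0.1,
                          floor: int = 20,
                          ceiling: int = 200) -> tuple[int, int]:
    """Pick (min_length, max_length) for a summarizer from the input size.

    Assumed heuristic: target roughly 10-30% of the input length,
    clamped to sensible absolute bounds.
    """
    max_len = min(ceiling, max(floor, int(n_input_tokens * max_ratio)))
    min_len = max(floor // 2, int(n_input_tokens * min_ratio))
    min_len = min(min_len, max_len - 1)  # keep min strictly below max
    return min_len, max_len
```

The result could then be passed along as `summarizer(text, min_length=mn, max_length=mx, do_sample=False)`.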
+6. Training Details:
+In Training Hyperparameters, provide a rationale for the chosen batch size and learning rate.
+If you have insight into why AdamW was chosen as the optimizer, include that too.
+7. Environmental Impact:
+Add a short sentence on any steps taken to minimize environmental impact, if applicable.
+8. Evaluation:
+If possible, include the exact ROUGE and BLEU scores to show the model's summarization performance.
+9. Additional Information:
+You could add a Future Work or Planned Improvements section if you plan to enhance the model further.
+In the Contact section, you might mention whether you are open to feedback, bug reports, or contributions.

+Here's a short sample revision for the Model Details section:
 Model Details
 Model Description
+This model by Rithu Paran focuses on text summarization, condensing lengthy content into concise summaries. Built on the Meta-Llama architecture, it has been finetuned to capture the key points of general text sources.
+Purpose: General-purpose text summarization
+Developer: Rithu Paran
+Architecture: Transformer-based Llama-3
+Language: Primarily English
+Model Versions
 Base Model: Meta-Llama/Llama-3.2-11B-Vision-Instruct
+Current Finetuned Model: Meta-Llama/Llama-3.1-8B-Instruct
+For the full model card, keep these ideas in mind and feel free to customize it further to fit your style! Let me know if you'd like more specific revisions.