Kimargin committed on
Commit
e47bb94
1 Parent(s): 9215450

Update README.md

Files changed (1)
  1. README.md +93 -76
README.md CHANGED
@@ -12,93 +12,110 @@ metrics:
12
  - accuracy
13
  ---
14
 
15
- Model Card for GPT-NEO-1.3B-wiki
16
- Model Details
17
- Model Description
18
- This model is a fine-tuned version of EleutherAI/gpt-neo-1.3B. It has been fine-tuned on the Wikipedia dataset for tasks such as text generation, summarization, and question-answering in the English language. The model uses a causal language modeling objective and is capable of generating contextually coherent text.
19
-
20
- Developed by: Kimargin
21
- Model type: Fine-tuned model
22
- Language(s): English
23
- License: Apache 2.0
24
- Finetuned from model: EleutherAI/gpt-neo-1.3B
25
- Model Sources
26
- Repository: Kimargin/GPT-NEO-1.3B-wiki
27
- Uses
28
- Direct Use
29
- This model can be used directly for generating text, summarizing documents, and answering factual questions based on context. It is suitable for general-purpose NLP tasks where coherent and fluent text generation is needed.
30
-
31
- Downstream Use
32
- Users can fine-tune this model further for specialized tasks such as summarization of domain-specific texts (e.g., legal or medical texts), generating code, or answering specific types of questions.
33
-
34
- Out-of-Scope Use
35
- The model is not suitable for real-time decision-making in critical applications, such as medical or legal advice. It may produce biased or inaccurate text if given ambiguous or politically sensitive input.
36
-
37
- Bias, Risks, and Limitations
38
- This model was trained on Wikipedia data, which could contain biases inherent in the dataset. The model may reflect those biases in its output. Additionally, the model may not handle very specialized knowledge domains accurately.
39
-
40
- Recommendations
41
- Users should carefully review and verify the text generated by the model before using it in any critical applications. The model should be used in scenarios where generated outputs can be reviewed by a human to mitigate any potential biases or inaccuracies.
42
-
43
- How to Get Started with the Model
44
- To use this model, you can load it with the following code:
45
-
46
- python
47
- Copy code
48
  from transformers import AutoModelForCausalLM, AutoTokenizer
49
 
50
  tokenizer = AutoTokenizer.from_pretrained("Kimargin/GPT-NEO-1.3B-wiki")
51
  model = AutoModelForCausalLM.from_pretrained("Kimargin/GPT-NEO-1.3B-wiki")
52
 
53
- input_text = "Explain the history of the internet."
54
  inputs = tokenizer(input_text, return_tensors="pt")
55
  outputs = model.generate(inputs["input_ids"], max_length=100)
56
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))
57
- Training Details
58
- Training Data
59
- The model was fine-tuned on a subset of the English Wikipedia dataset, which includes a broad range of topics and domains. The dataset is generally factual but may still contain biases.
60
-
61
- Training Procedure
62
- The model was trained using mixed precision (float16) on GPU hardware.
63
-
64
- Training Hyperparameters
65
- Learning rate: 5e-5
66
- Batch size: 16
67
- Epochs: 3
68
- Precision: float16 (mixed precision)
69
- Evaluation
70
- Testing Data
71
- The model was evaluated on a held-out validation subset of the Wikipedia dataset.
72
-
73
- Factors
74
- General domain knowledge: The model performs well on generating factual and coherent text on common knowledge topics covered in Wikipedia.
75
- Contextual understanding: The model can maintain coherence over relatively long text sequences but may struggle with very specialized or niche topics.
76
- Metrics
77
- Perplexity: The model achieved a perplexity of 25.3 on the validation set.
78
- Accuracy: Measured by manual evaluation of text generation for accuracy in answering factual questions.
79
- Results
80
- The model demonstrates strong capabilities in general-purpose text generation and answering factual questions. However, it can generate irrelevant or biased responses in edge cases, especially with ambiguous input.
81
-
82
- Environmental Impact
83
- Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
84
-
85
- Hardware Type: NVIDIA A100 GPUs
86
- Hours used: 20 hours
87
- Cloud Provider: Google Cloud
88
- Compute Region: US-Central
89
- Carbon Emitted: ~50 kg CO2
90
- Technical Specifications
91
- Model Architecture and Objective
92
- The model is a causal language model with 1.3 billion parameters based on the GPT-Neo architecture.
93
-
94
- Compute Infrastructure
95
  The model was trained on NVIDIA A100 GPUs using Google Cloud infrastructure.
96
 
97
- Citation
98
- If you use this model, please cite it as follows:
99
 
100
- bibtex
101
- Copy code
102
  @article{gpt-neo,
103
  author = {EleutherAI},
104
  title = {GPT-Neo: Large Scale Autoregressive Language Model},
 
12
  - accuracy
13
  ---
14
 
15
+ # Model Card for GPT-NEO-1.3B-wiki
16
+
17
+ ## Model Details
18
+
19
+ ### Model Description
20
+
21
+ This model is based on [EleutherAI/gpt-neo-1.3B](https://huggingface.co/EleutherAI/gpt-neo-1.3B) and has been fine-tuned on the Wikipedia dataset. It is designed for English text generation tasks such as summarization, question answering, and text completion, with the fine-tuning aimed at improving the fluency and factual accuracy of the generated text.
22
+
23
+ - **Developed by:** Kimargin
24
+ - **Model type:** Fine-tuned model
25
+ - **Language(s):** English
26
+ - **License:** Apache 2.0
27
+ - **Finetuned from model:** EleutherAI/gpt-neo-1.3B
28
+
29
+ ### Model Sources
30
+
31
+ - **Repository:** [Kimargin/GPT-NEO-1.3B-wiki](https://huggingface.co/Kimargin/GPT-NEO-1.3B-wiki)
32
+
33
+ ## Uses
34
+
35
+ ### Direct Use
36
+
37
+ This model can be used directly for tasks such as text generation, summarization, and question answering. It is useful for generating coherent and factual text based on English-language prompts.
38
+
39
+ ### Downstream Use
40
+
41
+ The model can be fine-tuned further for domain-specific applications such as legal or medical text generation, creating specialized question-answering systems, or generating structured content from prompts.
42
+
43
+ ### Out-of-Scope Use
44
+
45
+ The model should not be used in critical applications (e.g., legal, medical, or financial advice) as it may generate biased, inaccurate, or misleading information. It is also not suited for real-time decision-making.
46
+
47
+ ## Bias, Risks, and Limitations
48
+
49
+ Since the model was trained on Wikipedia data, it may inherit biases present in the dataset. Users should be cautious when using the model to generate sensitive or potentially biased content. The model may also produce inaccurate or misleading text when given ambiguous or leading prompts.
50
+
51
+ ### Recommendations
52
+
53
+ Users should review and verify the model's outputs, especially in critical use cases, and should not rely on them for factual accuracy without human oversight.
54
+
55
+ ## How to Get Started with the Model
56
+
57
+ To use the model, you can load it as follows:
58
+
59
+ ```python
60
  from transformers import AutoModelForCausalLM, AutoTokenizer
61
 
62
  tokenizer = AutoTokenizer.from_pretrained("Kimargin/GPT-NEO-1.3B-wiki")
63
  model = AutoModelForCausalLM.from_pretrained("Kimargin/GPT-NEO-1.3B-wiki")
64
 
65
+ input_text = "What happened during World War II?"
66
  inputs = tokenizer(input_text, return_tensors="pt")
67
  outputs = model.generate(inputs["input_ids"], max_length=100)
68
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))
69
+ ```
70
+
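+
+ The snippet above uses greedy decoding and reuses `model`, `tokenizer`, and `inputs` from that example. If more varied output is needed, sampling parameters can be passed to `generate`; the values below are illustrative and were not tuned for this model:
+
+ ```python
+ # Sampling-based generation (illustrative parameters, not settings recommended by the card)
+ outputs = model.generate(
+     inputs["input_ids"],
+     max_length=100,
+     do_sample=True,   # sample from the predicted distribution instead of greedy decoding
+     temperature=0.8,  # < 1.0 sharpens the distribution, > 1.0 flattens it
+     top_p=0.95,       # nucleus sampling: keep the smallest token set covering 95% probability
+ )
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```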
71
+ ## Training Details
72
+
73
+ ### Training Data
74
+ The model was fine-tuned on a subset of the Wikipedia dataset, which covers a broad range of general-knowledge topics. This dataset was chosen to improve the model's ability to generate accurate text on general-domain topics.
75
+
76
+ ### Training Procedure
77
+ The model was fine-tuned using mixed precision (float16) on multiple GPUs for three epochs, with the aim of reducing perplexity and improving the fluency of generated text. A comparable setup is sketched after the hyperparameters below.
78
+
79
+ ### Training Hyperparameters
80
+ - **Learning rate:** 5e-5
81
+ - **Batch size:** 16
82
+ - **Epochs:** 3
83
+ - **Precision:** float16 (mixed precision)
84
+
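+
+ The original training script is not included in this repository, so the following is only a minimal sketch of a comparable setup using the hyperparameters listed above. The Wikipedia dump (`20220301.en`), the `train[:1%]` slice, the 512-token truncation length, and the output directory are assumptions for illustration, not documented settings:
+
+ ```python
+ # Minimal fine-tuning sketch (assumed dataset slice and preprocessing, not the original script)
+ from datasets import load_dataset
+ from transformers import (
+     AutoModelForCausalLM,
+     AutoTokenizer,
+     DataCollatorForLanguageModeling,
+     Trainer,
+     TrainingArguments,
+ )
+
+ tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
+ tokenizer.pad_token = tokenizer.eos_token  # GPT-Neo defines no pad token by default
+ model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
+
+ # Assumed subset of English Wikipedia; the exact slice used for this model is not documented.
+ wiki = load_dataset("wikipedia", "20220301.en", split="train[:1%]")
+
+ def tokenize(batch):
+     return tokenizer(batch["text"], truncation=True, max_length=512)
+
+ tokenized = wiki.map(tokenize, batched=True, remove_columns=wiki.column_names)
+
+ args = TrainingArguments(
+     output_dir="gpt-neo-1.3b-wiki",     # illustrative output path
+     learning_rate=5e-5,                 # hyperparameters from the card
+     per_device_train_batch_size=16,
+     num_train_epochs=3,
+     fp16=True,                          # mixed precision (float16)
+ )
+
+ trainer = Trainer(
+     model=model,
+     args=args,
+     train_dataset=tokenized,
+     data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM labels
+ )
+ trainer.train()
+ ```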
85
+ ## Evaluation
86
+
87
+ ### Testing Data
88
+ The model was evaluated using a validation subset of the Wikipedia dataset to measure its performance on general text generation tasks.
89
+
90
+ ### Metrics
91
+ - **Perplexity:** The model achieved a perplexity of 25.3 on the validation set.
92
+ - **Accuracy:** The accuracy of the model in generating factual answers was evaluated qualitatively.
93
+
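+
+ The reported perplexity corresponds to exponentiating the average next-token cross-entropy loss on held-out text. The snippet below is a sketch of such an estimate, not the exact evaluation protocol used for the 25.3 figure; the validation texts are placeholders:
+
+ ```python
+ # Sketch of a perplexity estimate over held-out texts (placeholder data, assumed protocol)
+ import math
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("Kimargin/GPT-NEO-1.3B-wiki")
+ model = AutoModelForCausalLM.from_pretrained("Kimargin/GPT-NEO-1.3B-wiki")
+ model.eval()
+
+ texts = ["...held-out Wikipedia paragraphs go here..."]  # placeholder validation texts
+
+ losses = []
+ with torch.no_grad():
+     for text in texts:
+         enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
+         # With labels equal to the inputs, the model returns the mean
+         # next-token cross-entropy loss for the sequence.
+         out = model(**enc, labels=enc["input_ids"])
+         losses.append(out.loss.item())
+
+ perplexity = math.exp(sum(losses) / len(losses))
+ print(f"perplexity ~ {perplexity:.1f}")
+ ```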
94
+ ### Results
95
+ The model demonstrates good performance in generating coherent and contextually relevant text, but it may still struggle with niche or specialized topics that are underrepresented in its training data.
96
+
97
+ ## Environmental Impact
98
+ Training large models like GPT-Neo has a significant carbon footprint due to the computational resources required. The estimated environmental impact of fine-tuning this model is as follows:
99
+
100
+ - **Hardware Type:** NVIDIA A100 GPUs
101
+ - **Hours used:** 20 hours
102
+ - **Cloud Provider:** Google Cloud
103
+ - **Compute Region:** US-Central
104
+ - **Carbon Emitted:** ~50 kg CO2 (estimated using the [ML Impact calculator](https://mlco2.github.io/impact#compute))
105
+
106
+ ## Technical Specifications
107
+
108
+ ### Model Architecture and Objective
109
+ The model is a causal language model with 1.3 billion parameters, based on the GPT-Neo architecture. It generates text by predicting the next token in a sequence, making it suitable for text completion and generation tasks.
110
+
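+
+ Since the objective is next-token prediction, the model exposes a probability distribution over the vocabulary at every position. The short sketch below inspects the most likely continuations of a prompt; the prompt itself is only an example:
+
+ ```python
+ # Peek at the model's next-token distribution for a prompt (illustrative)
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("Kimargin/GPT-NEO-1.3B-wiki")
+ model = AutoModelForCausalLM.from_pretrained("Kimargin/GPT-NEO-1.3B-wiki")
+
+ inputs = tokenizer("The history of the internet began", return_tensors="pt")
+ with torch.no_grad():
+     logits = model(**inputs).logits              # shape: (batch, sequence_length, vocab_size)
+
+ probs = torch.softmax(logits[0, -1], dim=-1)     # distribution over the next token
+ top = torch.topk(probs, k=5)
+ for prob, token_id in zip(top.values, top.indices):
+     print(f"{tokenizer.decode(token_id.item())!r}: {prob.item():.3f}")
+ ```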
111
+ ### Compute Infrastructure
112
  The model was trained on NVIDIA A100 GPUs using Google Cloud infrastructure.
113
 
114
+ ## Citation
115
+
116
+ If you use this model, please cite the original GPT-Neo model as follows:
117
 
118
+ ```bibtex
119
  @article{gpt-neo,
120
  author = {EleutherAI},
121
  title = {GPT-Neo: Large Scale Autoregressive Language Model},