nicholasKluge committed
Commit: 6c5d3de
Parent(s): 9bb929e
Upload 12 files
Files changed:
- AIRA_FineTuning.ipynb +0 -0
- Aira_emissions.csv +1 -1
- README.md +8 -9
- config.json +1 -1
- generation_config.json +1 -1
- pytorch_model.bin +1 -1
- training_stats.parquet +1 -1
- vocab.json +0 -0
AIRA_FineTuning.ipynb
CHANGED
The diff for this file is too large to render.
See raw diff
Aira_emissions.csv
CHANGED
@@ -1,2 +1,2 @@
 timestamp,project_name,run_id,duration,emissions,emissions_rate,cpu_power,gpu_power,ram_power,cpu_energy,gpu_energy,ram_energy,energy_consumed,country_name,country_iso_code,region,cloud_provider,cloud_region,os,python_version,codecarbon_version,cpu_count,cpu_model,gpu_count,gpu_model,longitude,latitude,ram_total_size,tracking_mode,on_cloud,pue
-2023-06-
+2023-06-26T22:38:01,Aira_emissions,bd08affb-b1e2-4849-8513-a85a02cf0f84,3690.1905386447906,0.0009893192359507477,2.6809435057358087e-07,42.5,296.394,31.30528450012207,0.04356464091208248,0.34052867170535045,0.03207338637952947,0.41616669899696207,Canada,CAN,quebec,,,Linux-5.15.107+-x86_64-with-glibc2.31,3.10.12,2.2.4,12,Intel(R) Xeon(R) CPU @ 2.20GHz,1,1 x NVIDIA A100-SXM4-40GB,-71.2,46.8,83.48075866699219,machine,N,1.0
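The row above is standard CodeCarbon output (codecarbon 2.2.4, per the `codecarbon_version` column). A minimal sketch of how such a file is produced; the tracked `train()` function is a placeholder for the actual fine-tuning loop:

```python
# A minimal CodeCarbon sketch; train() is a hypothetical stand-in for the
# fine-tuning loop that produced the row above.
from codecarbon import EmissionsTracker

def train():
    pass  # placeholder for the actual fine-tuning loop

tracker = EmissionsTracker(project_name="Aira_emissions",
                           output_file="Aira_emissions.csv")
tracker.start()
try:
    train()
finally:
    emissions = tracker.stop()  # kg CO2eq; one row is appended to the CSV

print(f"Estimated emissions: {emissions:.4f} kg CO2eq")
```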
README.md
CHANGED
@@ -44,7 +44,6 @@ inference:
 
 The dataset used to train this model combines the following sources of data: the [`synthetic-instruct-gptj-pairwise`](https://huggingface.co/datasets/Dahoas/synthetic-instruct-gptj-pairwise) dataset, the [`databricks_dolly_15k`](https://huggingface.co/datasets/HuggingFaceH4/databricks_dolly_15k) dataset, the [`instruction-dataset`](https://huggingface.co/datasets/HuggingFaceH4/instruction-dataset) dataset, and a subset of [Aira's](https://github.com/Nkluge-correa/Aira-EXPERT) fine-tuning dataset, focused on Q&A related to Ethics, AI, AI safety, and other related topics. The dataset is available in both Portuguese and English.
 
-
 Check our gradio-demo in [Spaces](https://huggingface.co/spaces/nicholasKluge/Aira-Demo).
 
 ## Details
@@ -56,22 +55,22 @@ Check our gradio-demo in [Spaces](https://huggingface.co/spaces/nicholasKluge/Ai
 - **Batch size:** 32
 - **Optimizer:** `torch.optim.AdamW` (warmup_steps = 1e2, learning_rate = 5e-4, epsilon = 1e-8)
 - **GPU:** 1 NVIDIA A100-SXM4-40GB
-- **Emissions:** 0.
-- **Total Energy Consumption:** 0.
+- **Emissions:** 0.0009 KgCO2 (Canada)
+- **Total Energy Consumption:** 0.41 kWh
 
 | Epoch/Loss|Training|Validation|
 |---|---|---|
-| 1 |0.
-| 2 |0.
-| 3 |0.
-| 4 |0.
-| 5 |0.
+| 1 |0.947100|0.774946|
+| 2 |0.737357|0.730962|
+| 3 |0.657410|0.710232|
+| 4 |0.597437|0.705064|
+| 5 |0.551684|0.704830|
 
 This repository has the notebook used to train this model.
 
 ## Usage
 
-Two special tokens are used to mark the user side of the interaction and the model's response:
+Two special tokens are used to mark the user side of the interaction and the model's response:
 
 `<|startoftext|>`What is a language model?`<|endoftext|>`A language model is a probability distribution over a vocabulary.`<|endoftext|>`
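The Details list in the diff above pins the optimizer hyperparameters. A minimal sketch of that setup, assuming a linear warmup schedule and a GPT-2 stand-in for the base model (neither is named in this diff):

```python
# A sketch of the optimizer setup listed under Details (AdamW, lr = 5e-4,
# eps = 1e-8, 100 warmup steps). The base model ("gpt2"), the linear-warmup
# scheduler, and total_steps are assumptions, not taken from this diff.
import torch
from transformers import AutoModelForCausalLM, get_linear_schedule_with_warmup

model = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in base model
total_steps = 5 * 1000  # placeholder: epochs * batches_per_epoch

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, eps=1e-8)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,  # warmup_steps = 1e2
    num_training_steps=total_steps,
)
```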
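The Usage section added above wraps each user turn in the two special tokens. A minimal sketch of applying that format with `transformers`; the repository id `nicholasKluge/Aira` is a placeholder, since this diff does not name the model's hub path:

```python
# A sketch of the prompt format from the Usage section; the repo id is a
# placeholder for the model's actual hub path.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nicholasKluge/Aira")
model = AutoModelForCausalLM.from_pretrained("nicholasKluge/Aira")

# The user turn is framed by <|startoftext|> ... <|endoftext|>; generation
# then stops when the model emits <|endoftext|> again (eos_token_id = 50256).
prompt = "<|startoftext|>What is a language model?<|endoftext|>"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```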
config.json
CHANGED
@@ -33,7 +33,7 @@
     }
   },
   "torch_dtype": "float32",
-  "transformers_version": "4.30.
+  "transformers_version": "4.30.2",
   "use_cache": true,
   "vocab_size": 50259
 }
generation_config.json
CHANGED
@@ -2,5 +2,5 @@
   "_from_model_config": true,
   "bos_token_id": 50256,
   "eos_token_id": 50256,
-  "transformers_version": "4.30.
+  "transformers_version": "4.30.2"
 }
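This file supplies the model's default generation settings when loaded through `transformers`; a minimal sketch, again with a placeholder repo id:

```python
# Read the generation defaults shown above; the repo id is a placeholder.
from transformers import GenerationConfig

gen_config = GenerationConfig.from_pretrained("nicholasKluge/Aira")
print(gen_config.bos_token_id, gen_config.eos_token_id)  # 50256 50256
```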
pytorch_model.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:7f01da44af4eef5e609983099507e6c2e6c92bb149afa3723d555cdf3a32c4c5
 size 497813341
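This is a Git LFS pointer, not the weights themselves; the `oid` records the SHA-256 of the real 497 MB file. A minimal sketch of verifying a downloaded copy against it:

```python
# Check a downloaded pytorch_model.bin against the sha256 in the LFS pointer.
import hashlib

sha256 = hashlib.sha256()
with open("pytorch_model.bin", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
        sha256.update(chunk)

assert sha256.hexdigest() == (
    "7f01da44af4eef5e609983099507e6c2e6c92bb149afa3723d555cdf3a32c4c5"
)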
training_stats.parquet
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:63cec774a93f84808183ddf0dacaca250ff645a5e6883cdfd4ea3f96a0cce3fa
 size 3108
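The updated parquet presumably holds the per-epoch statistics behind the README's loss table. A minimal sketch of inspecting it, assuming `pandas` with a parquet engine (e.g. `pyarrow`) is installed:

```python
# Inspect the logged training statistics; assumes pandas + pyarrow.
import pandas as pd

stats = pd.read_parquet("training_stats.parquet")
print(stats.head())
```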
vocab.json
CHANGED
The diff for this file is too large to render.
See raw diff