leonardlin committed • Commit 093800e • Parent(s): 916206c
Update README.md

README.md CHANGED
datasets:
- augmxnt/ultra-orca-boros-en-ja-v1
---

# shisa-v2 Base Model ablation

This is a fine-tune of Llama 3 70B Instruct with the primary `shisa-v1` dataset to improve Japanese language capabilities.

This model uses an LR of 8e-6, which slightly improves performance vs the original 2e-5 tune (based on, and validating the predictive power of, the results of the Llama 3 8B LR ablations).

It also uses NEFTune, although the expected impact is negligible for this dataset.
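
Beyond the LR and NEFTune, this card doesn't publish the full training recipe, so as a rough sketch only, an equivalent SFT run with Hugging Face TRL might look like the following (the dataset split, NEFTune alpha, epoch count, and batch settings are all assumptions, not the settings actually used):

```python
# Minimal SFT sketch with TRL: LR 8e-6, NEFTune enabled.
# Everything besides the LR and the use of NEFTune is an illustrative assumption.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Assumes the dataset exposes a "train" split in a format SFTTrainer can
# consume (e.g. a conversational "messages" column); check the dataset card.
dataset = load_dataset("augmxnt/ultra-orca-boros-en-ja-v1", split="train")

config = SFTConfig(
    output_dir="shisa-v1-llama3-70b",
    learning_rate=8e-6,             # the 8e-6 LR described above
    neftune_noise_alpha=5,          # NEFTune; alpha value is an assumption
    num_train_epochs=3,             # assumption
    per_device_train_batch_size=1,  # assumption; 70B needs heavy sharding
    gradient_accumulation_steps=16, # assumption
    bf16=True,
)

trainer = SFTTrainer(
    model="meta-llama/Meta-Llama-3-70B-Instruct",
    args=config,
    train_dataset=dataset,
)
trainer.train()
```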

While the 2e-5 model matched gpt-3.5-turbo performance, this 8e-6 version consistently edges it out, so I think it's fair to say that this model "beats" it.

There is a selection of GGUF quants here: https://huggingface.co/shisa-ai/shisa-v1-llama3-70b-gguf
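
To try a quant locally, something like the following should work with `llama-cpp-python` (the quant filename pattern, context size, and prompt are assumptions; check the repo's file list for the actual GGUF names):

```python
# Sketch: run a GGUF quant locally with llama-cpp-python.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="shisa-ai/shisa-v1-llama3-70b-gguf",
    filename="*Q4_K_M.gguf",  # assumed quant level; glob matched against repo files
    n_ctx=8192,               # assumption
    n_gpu_layers=-1,          # offload all layers if VRAM allows
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "日本の首都はどこですか？"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```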

While this is merely a test ablation on the road to `shisa-v2`, it is also the strongest commercially usable open JA model I've tested so far, so it may be of general interest.

## Performance

Measured using a [fork](https://github.com/shisa-ai/shaberi) of [Lightblue's Shaberi benchmark framework](https://github.com/lightblue-tech/japanese_llm_eval):

| Model                                   | Average  | ELYZA-tasks-100 | MT-Bench | Rakuda   | Tengu-Bench |
|-----------------------------------------|----------|-----------------|----------|----------|-------------|
| **shisa-ai/shisa-v1-llama3-70b** (8e-6) | **7.30** | **7.34**        | **7.67** | **8.15** | **6.04**    |
| gpt-3.5-turbo-0125                      | 7.17     | 7.24            | 6.98     | 7.64     | 6.82        |
| **shisa-ai/shisa-v1-llama3-70b** (2e-5) | **7.17** | **7.16**        | **7.45** | **7.98** | **6.09**    |
| karakuri-ai/karakuri-lm-8x7b-chat-v0.1  | 7.00     | 7.18            | 6.30     | 7.98     | 6.55        |
| karakuri-ai/karakuri-lm-70b-chat-v0.1   | 6.84     | 6.86            | 6.43     | 7.85     | 6.23        |
| lightblue/ao-karasu-72B                 | 6.81     | 7.19            | 6.54     | 7.25     | 6.27        |
| **shisa-ai/shisa-v1-llama3-8b^**        | **6.29** | **6.62**        | **6.41** | **7.05** | **5.07**    |
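
For reference, the Average column is consistent with an unweighted mean of the four benchmark scores; a quick check against the top row:

```python
# Average = unweighted mean of the four benchmark scores (top row shown).
scores = [7.34, 7.67, 8.15, 6.04]  # ELYZA-tasks-100, MT-Bench, Rakuda, Tengu-Bench
print(f"{sum(scores) / len(scores):.2f}")  # -> 7.30
```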