leonardlin committed
Commit 35add1d
Parent(s): fc5c300
Update README.md
README.md CHANGED
@@ -10,6 +10,14 @@ model-index:
 
 shisa-v2 Base Model ablation
 
+This model uses a LR of 8e-6, which slightly improves performance vs the original 2e-5.
+It also uses NEFTune, although the expected impact may be negligible for this dataset.
+
+(this appears to validate the Llama 3 8B LR ablations for predicting an improved LR hyperparameter)
+
+While the last model matched gpt-3.5-turbo, I think it's fair to say that this model "beats" it.
+
+
 Using a [fork](https://github.com/shisa-ai/shaberi) of [Lightblue's Shaberi benchmark framework](https://github.com/lightblue-tech/japanese_llm_eval):
 
 | Model | Average | ELYZA-tasks-100 | MT-Bench | Rakuda | Tengu-Bench |
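For context on the NEFTune mention above: NEFTune adds uniform noise to the token embeddings during fine-tuning, scaled by alpha / sqrt(seq_len * hidden_dim). The sketch below illustrates that scaling rule in isolation; the `alpha=5.0` default and the `neftune_noise` helper name are assumptions for illustration, not the exact configuration used in this training run.

```python
import torch

def neftune_noise(embeddings: torch.Tensor, alpha: float = 5.0) -> torch.Tensor:
    """Add NEFTune-style uniform noise to token embeddings during training.

    Noise magnitude follows the NEFTune scaling rule:
        scale = alpha / sqrt(seq_len * hidden_dim)
    alpha=5.0 is a commonly cited default, assumed here for illustration.
    """
    _, seq_len, hidden_dim = embeddings.shape
    scale = alpha / (seq_len * hidden_dim) ** 0.5
    # Uniform noise in [-scale, scale], same shape/device/dtype as the input
    noise = torch.empty_like(embeddings).uniform_(-scale, scale)
    return embeddings + noise

# Example: apply noise to a dummy embedding tensor (batch=2, seq=128, dim=4096)
emb = torch.zeros(2, 128, 4096)
noisy = neftune_noise(emb, alpha=5.0)
print(noisy.shape)  # torch.Size([2, 128, 4096])
```

In practice this is applied only in the forward pass during training (trainers such as TRL expose it as a `neftune_noise_alpha` option), and is disabled at inference time.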