Update README.md
![image/png](https://cdn-uploads.huggingface.co/production/uploads/653cd3049107029eb004f968/pLcriXAfp3Y9Z0RGwwVUB.png)

Updated 240124: Dataset: 11300 rows. Rank/alpha: 32/64. Included a set of "summarize" tasks and longer "essay"-style input. Sadly, the dataset for the 240112 update had about 2000 duplicated rows.

Updated 240112: Bigger dataset. Validation set. Rank/alpha: 16/32. 2k context length. Please note that the unquantized version is NOT updated.

QLoRA trained for 2 epochs on 11300 rows of Q&A, around 100 Python questions, and examples from neph1/Alpaca-Lora-GPT4-Swedish-Refined (because I had spent so much time cleaning them and didn't want to throw them away). There are also a couple of hundred rows of manually gathered examples and some generated using ChatGPT. The dataset was otherwise generated using gpt-3.5-turbo and Mixtral 8x7b (about one third).
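
Since the 240124 update mentions that the earlier dataset contained about 2000 duplicated rows, here is a minimal sketch of the kind of dedup pass that catches this, assuming Alpaca-style records with instruction/input/output fields (the file names and field names are placeholders, not the actual dataset layout):

```python
import json

# Load an Alpaca-style dataset: a list of dicts with
# "instruction", "input" and "output" keys (assumed layout).
with open("dataset.json", encoding="utf-8") as f:
    rows = json.load(f)

seen = set()
unique_rows = []
for row in rows:
    # Key on the full (instruction, input, output) triple so that
    # only exact duplicates are dropped.
    key = (row.get("instruction"), row.get("input"), row.get("output"))
    if key not in seen:
        seen.add(key)
        unique_rows.append(row)

print(f"kept {len(unique_rows)} of {len(rows)} rows")

with open("dataset_dedup.json", "w", encoding="utf-8") as f:
    json.dump(unique_rows, f, ensure_ascii=False, indent=2)
```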

The goal is to improve knowledge of Swedish topics while improving the quality of the language.

As with any bard, what this model says should be taken with a grain of salt.

Configuration (see the sketch after this list):

Rank: 32

Alpha: 64

Dropout: 0.1

Learning rate (at start): 2e-5

Context length: 2048

Training length: about 2.1 epochs
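
As a rough illustration, these hyperparameters could be expressed with peft and bitsandbytes for a QLoRA run like this (a sketch under assumptions: the base model name and target modules are placeholders, not the actual training script):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder; the actual base model may differ

# QLoRA = LoRA adapters trained on top of a 4-bit quantized base model.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(base_model, quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)

# The values from the configuration list above.
lora_config = LoraConfig(
    r=32,              # Rank
    lora_alpha=64,     # Alpha
    lora_dropout=0.1,  # Dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# The starting learning rate (2e-5) and context length (2048) would then
# go into the trainer arguments and tokenization settings for the run.
```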

Prompt format: ```[INST]Hur bakar jag sockerkaka?[/INST]``` ("How do I bake a sponge cake?")
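
For local inference, a minimal llama-cpp-python sketch using this prompt format and the settings from the example below (the gguf file name is a placeholder):

```python
from llama_cpp import Llama

# Load a quantized gguf build of the model (placeholder file name),
# with the 2048-token context length from the configuration above.
llm = Llama(model_path="bellman.q8_0.gguf", n_ctx=2048)

# "Vem är statsminister i Sverige?" = "Who is the prime minister of Sweden?"
prompt = "[INST]Vem är statsminister i Sverige?[/INST]"

out = llm(prompt, max_tokens=256, temperature=0.7)
print(out["choices"][0]["text"])
```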

Example (240112 version). Sadly it's not always this good. (gguf q8, temp: 0.7, llama.cpp):
```
User: Vem är statsminister i Sverige?