Update README.md
README.md
datasets:
- Photolens/alpaca-cleaned-airoboros-2.1-no-code-oasst1-en-merged
language:
- en
---

## Model overview

This model is a finetune of the base model *[Marx-3b-V2](https://huggingface.co/acrastt/Marx-3B-V2)*, trained on *[a merged dataset of oasst1-en, alpaca-cleaned, and airoboros-2.1-no-code](https://huggingface.co/datasets/Photolens/alpaca-cleaned-airoboros-2.1-no-code-oasst1-en-merged)*.
- License: `Creative-Commons-Attribution-4.0`
- Language: `en`
- Size: `3.43B params`

## Prompt template

```
### SYSTEM:
<system_prompt_here>

### HUMAN:
<prompter_message_here>

### INPUT:
<input_text_here>

### RESPONSE:
<leave_a_blank_line_here>
```
*Note: If you don't have a system prompt or an input text, omit those sections from the prompt.*
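The template above can be assembled programmatically. A minimal sketch (the `build_prompt` helper name is illustrative, not part of the model's tooling), which omits the `SYSTEM` and `INPUT` sections when they are absent, as the note advises:

```python
def build_prompt(human, system=None, input_text=None):
    """Assemble a prompt in this model's template.

    Sections without content are omitted entirely, per the model card's note.
    """
    parts = []
    if system:
        parts.append(f"### SYSTEM:\n{system}\n")
    parts.append(f"### HUMAN:\n{human}\n")
    if input_text:
        parts.append(f"### INPUT:\n{input_text}\n")
    # The template ends with the RESPONSE header followed by a blank line,
    # where the model's completion begins.
    parts.append("### RESPONSE:\n")
    return "\n".join(parts)
```

For example, `build_prompt("Hello")` produces only the `### HUMAN:` and `### RESPONSE:` sections.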

## Training Details

This model took `2:40:54` to train with LoRA on a single `A100 40GB` GPU.

- *epochs*: `1`
- *train batch size*: `8`
- *eval batch size*: `8`
- *gradient accumulation steps*: `1`
- *maximum gradient norm*: `0.3`
- *learning rate*: `2e-4`
- *weight decay*: `0.001`
- *optimizer*: `paged_adamw_32bit`
- *learning rate schedule*: `cosine`
- *warmup ratio (linear)*: `0.03`
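The learning-rate schedule listed above (cosine decay with a linear warmup over the first 3% of steps, peaking at `2e-4`) can be sketched as follows. This is a simplified illustration of how those hyperparameters interact; the exact trainer implementation may differ slightly:

```python
import math

def lr_at_step(step, total_steps, base_lr=2e-4, warmup_ratio=0.03):
    """Cosine schedule with linear warmup (illustrative sketch).

    base_lr and warmup_ratio default to the values from the training
    configuration above.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from ~0 up to base_lr over the warmup window.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay from base_lr toward 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))
```

With 1,000 total steps, the rate ramps up over the first 30 steps, hits the `2e-4` peak at the end of warmup, and decays toward zero by the final step.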