---
license: apache-2.0
language:
- en
---

Finetunes Llama-13b + Alpaca to solve problems via stepwise reasoning, trained on OpenAI's [PRM800k dataset](https://github.com/openai/prm800k) (or rather, our postprocessed version, [`sl-alex/openai-prm800k-solutions-only`](https://huggingface.co/datasets/sl-alex/openai-prm800k-solutions-only)).

## Model description

This is a fork of [`llama-13b`](https://huggingface.co/huggyllama/llama-13b) + [`chansung/alpaca-lora-13b`](https://huggingface.co/chansung/alpaca-lora-13b).

That is: we loaded Llama-13b, applied the Alpaca LoRA, expanded the vocabulary, then QLoRA 4-bit finetuned from there.

Parts (composed at load time, as sketched after this list):

- base model [`llama-13b`](https://huggingface.co/huggyllama/llama-13b)
- LoRA 0 [`chansung/alpaca-lora-13b`](https://huggingface.co/chansung/alpaca-lora-13b)
- LoRA 1
  - `adapter_config.json`
  - `adapter_model.bin`
- tokenizer
  - `added_tokens.json`
  - `special_tokens_map.json`
  - `tokenizer.model`
  - `tokenizer_config.json`
- finetuned input/output embedding layers:
  - `embed_tokens.pt` (`state_dict` for `model.get_input_embeddings()`, `embed_tokens: Embedding`)
  - `lm_head.pt` (`state_dict` for `model.get_output_embeddings()`, `lm_head: Linear`)
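
Here is a minimal sketch of how these parts compose at load time, assuming standard `transformers` + `peft` APIs (the merge step and ordering are our assumption of a reasonable composition; `evaluate.py`, described under Usage below, is the supported path and handles this for you):

```python
# Minimal sketch (assumed composition; prefer evaluate.py, described under Usage below)
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# tokenizer: includes the added stepwise special tokens
tokenizer = AutoTokenizer.from_pretrained('sl-alex/llama-13b-alpaca-stepwise-lora-embtuned')

# base model + LoRA 0 (Alpaca), merged so that LoRA 1 can be applied on top
base = AutoModelForCausalLM.from_pretrained('huggyllama/llama-13b', torch_dtype=torch.bfloat16)
base = PeftModel.from_pretrained(base, 'chansung/alpaca-lora-13b').merge_and_unload()

# the vocabulary was expanded, so resize, then restore the finetuned embedding layers
base.resize_token_embeddings(len(tokenizer))
base.get_input_embeddings().load_state_dict(torch.load('embed_tokens.pt'))
base.get_output_embeddings().load_state_dict(torch.load('lm_head.pt'))

# LoRA 1: the stepwise-reasoning adapter in this repository
model = PeftModel.from_pretrained(base, 'sl-alex/llama-13b-alpaca-stepwise-lora-embtuned')
```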

## Training

Trained using [`qlora.py`](https://github.com/scottlogic-alex/qlora/blob/stepwise/qlora.py) from our [`stepwise`](https://github.com/scottlogic-alex/qlora/tree/stepwise) branch of [qlora](https://github.com/artidoro/qlora).
Known-good as of commit [`4755584`](https://github.com/scottlogic-alex/qlora/blob/4755584f294ce88b3aadd9bdd3088c095b543578/qlora.py).

```bash
python -m qlora \
  --model_name_or_path huggyllama/llama-13b \
  --lora_name_or_path chansung/alpaca-lora-13b \
  --dataset prm800k-solutions \
  --dataset_format prm800k-solutions \
  --bf16 \
  --max_memory_MB 24000 \
  --use_bos_token_in_prompt \
  --truncate_toward_center \
  --source_max_len 184 \
  --target_max_len 998 \
  --gradient_accumulation_steps 4 \
  --per_device_train_batch_size 4 \
  --per_device_eval_batch_size 4 \
  --learning_rate 0.0002 \
  --run_name 13b_alpaca_special_tokens_long \
  --report_to wandb \
  --save_steps 64 \
  --save_total_limit 3 \
  --max_steps 1664 \
  --evaluation_strategy steps \
  --eval_steps 64 \
  --generate_steps 16 \
  --register_process_supervision_tokens
```
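
The `--register_process_supervision_tokens` flag presumably registers the stepwise special tokens and expands the vocabulary. As an illustrative sketch only (the token list is inferred from this model's prompt format and example outputs; see `qlora.py` on the `stepwise` branch for the actual implementation), it is understood to do something like:

```python
# Illustrative sketch only; see qlora.py on the stepwise branch for the real implementation.
# Assumes `model` and `tokenizer` are already loaded (e.g. as in the sketch above).
STEP_TOKENS = ['<|step_start|>', '<|step_end|>', '<|answer_start|>', '<|answer_end|>']
tokenizer.add_special_tokens({'additional_special_tokens': STEP_TOKENS})
model.resize_token_embeddings(len(tokenizer))

# the new embedding rows start untrained, so the input/output embeddings are trained
# alongside the LoRA (hence the embed_tokens.pt and lm_head.pt artifacts in this repo)
for module in (model.get_input_embeddings(), model.get_output_embeddings()):
    for param in module.parameters():
        param.requires_grad = True
```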

## Usage

You can load the model using [`evaluate.py`](https://github.com/scottlogic-alex/qlora/blob/stepwise/evaluate.py#L209-L278) from our [`stepwise`](https://github.com/scottlogic-alex/qlora/tree/stepwise) branch of [qlora](https://github.com/artidoro/qlora).
Known-good as of commit [`4755584`](https://github.com/scottlogic-alex/qlora/blob/4755584f294ce88b3aadd9bdd3088c095b543578/evaluate.py).

You'll need to download `embed_tokens.pt` and `lm_head.pt` from this repository, save them to the root of the `qlora` repository, then run `evaluate.py` like so:

```bash
git clone -b stepwise https://github.com/scottlogic-alex/qlora.git
cd qlora
wget https://huggingface.co/sl-alex/llama-13b-alpaca-stepwise-lora-embtuned/resolve/main/embed_tokens.pt
wget https://huggingface.co/sl-alex/llama-13b-alpaca-stepwise-lora-embtuned/resolve/main/lm_head.pt
python -m evaluate \
  --model_name_or_path huggyllama/llama-13b \
  --base_lora_model_name_or_path chansung/alpaca-lora-13b \
  --tokenizer_model_name_or_path sl-alex/llama-13b-alpaca-stepwise-lora-embtuned \
  --lora_model_name_or_path sl-alex/llama-13b-alpaca-stepwise-lora-embtuned \
  --input_embedding_path embed_tokens.pt \
  --output_embedding_path lm_head.pt \
  --bf16 \
  --use_bos_token_in_prompt \
  --overrun_countermeasures False
```

## Prompt style

The model is finetuned for Alpaca-style instruction-response prompts:

`prompt = '<s>Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nHow do I feed a family of 5 without ingredients?\n\n### Response:\n<|step_start|>'`
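
Generation might then look something like the following minimal sketch (assumes a `model` and `tokenizer` loaded as in the Usage section; `evaluate.py` is the supported path and adds its own stopping countermeasures):

```python
# Minimal generation sketch; evaluate.py is the supported path
prompt = (
    'Below is an instruction that describes a task. '
    'Write a response that appropriately completes the request.\n\n'
    '### Instruction:\nHow do I feed a family of 5 without ingredients?\n\n'
    '### Response:\n<|step_start|>'
)
# the tokenizer prepends <s> (BOS) itself, matching --use_bos_token_in_prompt
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
# stop once the model emits its answer-closing special token
answer_end = tokenizer.convert_tokens_to_ids('<|answer_end|>')
output = model.generate(**inputs, max_new_tokens=512, eos_token_id=answer_end)
print(tokenizer.decode(output[0, inputs['input_ids'].shape[1]:]))
```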

## Example output

`$ How do I drink a matcha latte?`
`To drink a matcha latte, I should first remove the matcha from the bag. <|step_end|> <|step_start|> I can do that by opening the bag and shaking out the matcha into a glass or bowl. <|step_end|> <|step_start|> Then, I should add some hot water into the glass or bowl and start stirring the matcha. <|step_end|> <|step_start|> This is to make the matcha easier to mix with the water. <|step_end|> <|step_start|> Next, I should add milk into the glass or bowl and continue stirring. <|step_end|> <|step_start|> This is to make the matcha and the water form a thick mixture. <|step_end|> <|step_start|> Finally, I should add ice cubes into the glass or bowl and stir again. <|step_end|> <|step_start|> This is to make the mixture even thicker and chilled. <|step_end|> <|step_start|> So, a matcha latte is a drink that is a mixture of matcha, water, and milk, with some ice cubes. <|step_end|> <|step_start|> <|answer_start|> To make a matcha latte, I should remove the matcha from the bag, add some hot water into a glass or bowl and stir, add milk into the glass or bowl and stir as well, and finally add ice cubes and stir again. <|step_end|> <|step_start|> <|answer_start|> To drink a matcha latte, I should remove`

`$ What is the third Fibonacci number?`
`I know that the first two Fibonacci numbers are 1 and 1, since they are just the ones that appear in the Fibonacci sequence: 1, 1, 2, 3, 5, 8, 13, and so on, where each term is the sum of the previous two. <|step_end|> <|step_start|> To find the third number, I could try to figure out a pattern or a shortcut, but I'll just use the definition and do some calculations. <|step_end|> <|step_start|> The third Fibonacci number is the sum of the first two, so I have 1 + 1 = 2. <|step_end|> <|step_start|> <|answer_start|> 2 <|answer_end|>`

`$ How do I make a cup of tea?`
`Ok, the first step is to open the cup of tea. <|step_end|> <|step_start|> Then the second step is to jump in the tea. <|step_end|> <|step_start|> And finally the third step is to drink the jump. <|step_end|> <|step_start|> Let's reverse the order of the steps so that they are more intuitive. <|step_end|> <|step_start|> The first step is to jump in the tea. <|step_end|> <|step_start|> Then the second step is to open the cup of tea. <|step_end|> <|step_start|> And finally the third step is to drink the jump. <|step_end|> <|step_start|> So, I think this is a good way to make a cup of tea. <|step_end|> <|step_start|> Yes, it is. <|answer_start|> How do I make a cup of tea? <|answer_end|>`

`$ How do I open a jar?`
`To open a jar, I need to use a key that fits into the lock on the jar. <|step_end|> <|step_start|> I recognize that a common type of key is a skeleton key, which can open many different types of locks, because it has no ridges that would restrict its usage. <|step_end|> <|step_start|> To get a skeleton key, I can cut any old key into pieces and join them back together. <|step_end|> <|step_start|> Alternatively, I can use a special tool that can make cuts exactly perpendicular to each other, forming a key that opens any lock. <|step_end|> <|step_start|> Alternatively, I can use a special trick that can make any old key open the jar. <|step_end|> <|step_start|> This trick involves placing the key on top of the jar and banging it on a hard surface, such as the floor or a table. <|step_end|> <|step_start|> I need to make sure that the key and the jar are both upside-down, and that I am banging them as hard as I can. <|step_end|> <|step_start|> This will create a small hole in the glass on the jar, and then I can use the key to open it. <|step_end|> <|step_start|> I need to be careful not to get cut by the glass, and I need to check if the jar is empty first, otherwise I might poke my finger and`