justus27 committed on
Commit
cc20df7
1 Parent(s): d6e8b96

Update README.md

Files changed (1)
  1. README.md +0 -72
README.md CHANGED
@@ -1,72 +0,0 @@
- ---
- license: apache-2.0
- tags:
- - Composer
- - MosaicML
- - llm-foundry
- - StreamingDatasets
- datasets:
- - mc4
- - c4
- - togethercomputer/RedPajama-Data-1T
- - bigcode/the-stack
- - allenai/s2orc
- inference: false
- ---
-
- This is MPT-7B patched so that it can be used with a LoRA. I tested that it works and produces reasonable results, but it is quite possible that the model isn't being trained correctly. The model code explicitly says that left padding is not supported; I forced it anyway and got decent results.
-
- Note that when using a LoRA, a strange quirk prevents generation with an empty prompt.
-
- I also included a model-agnostic `export_hf_checkpoint.py` script, which you can use to merge your LoRA back into a new full model (see the GitHub link at the end). Once you do this, you no longer need the patched version of the model code, unless you want to load the model in 8-bit, in which case you still need it. The usage is `python export_hf_checkpoint.py <source> <lora> <dest>`.
-
- If you would like to use this with text-generation-webui, apply the following patch:
- ```
- --- a/modules/training.py
- +++ b/modules/training.py
- @@ -28,12 +28,13 @@ try:
-      MODEL_CLASSES = {v: k for k, v in MODEL_FOR_CAUSAL_LM_MAPPING_NAMES}
-  except:
-      standard_modules = ["q_proj", "v_proj"]
- -    model_to_lora_modules = {"llama": standard_modules, "opt": standard_modules, "gptj": standard_modules, "gpt_neox": ["query_key_value"]}
- +    model_to_lora_modules = {"llama": standard_modules, "opt": standard_modules, "gptj": standard_modules, "gpt_neox": ["query_key_value"], "mpt": ["Wqkv"]}
-      MODEL_CLASSES = {
-          "LlamaForCausalLM": "llama",
-          "OPTForCausalLM": "opt",
-          "GPTJForCausalLM": "gptj",
- -        "GPTNeoXForCausalLM": "gpt_neox"
- +        "GPTNeoXForCausalLM": "gpt_neox",
- +        "MPTForCausalLM": "mpt"
-      }
-
-  WANT_INTERRUPT = False
- ```
- You will need to run the webui with these options:
- ```
- python server.py --model mosaicml_mpt-7b-instruct --trust-remote-code --load-in-8bit
- ```
- You may also need to patch `bitsandbytes/nn/modules.py` to prevent running out of VRAM when saving the LoRA:
- ```
- --- a/modules.py
- +++ b/modules.py
- @@ -259,13 +259,13 @@
-          if not self.state.has_fp16_weights and self.state.CB is None and self.state.CxB is not None:
-              # reorder weight layout back from ampere/turing to row
-              reorder_layout = True
- -            weight_clone = self.weight.data.clone()
- +            weight_clone = self.weight.data
-          else:
-              reorder_layout = False
-
-          try:
-              if reorder_layout:
- -                self.weight.data = undo_layout(self.state.CxB, self.state.tile_indices)
- +                self.weight.data = undo_layout(self.state.CxB.cpu(), self.state.tile_indices.cpu())
-
-              super()._save_to_state_dict(destination, prefix, keep_vars)
- ```
- (For me, it resides in `miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/nn/modules.py`.)
-
- The alterations are based on the source code for the llama model from HF Transformers.
-
- Big thanks to "iwalton3" for making this possible. You can find `export_hf_checkpoint.py` here: https://github.com/iwalton3/mpt-lora-patch/blob/master/export_hf_checkpoint.py
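
For readers skimming the removed README, the text-generation-webui patch boils down to two new dictionary entries: one mapping the `MPTForCausalLM` class name to the model type `mpt`, and one mapping `mpt` to the LoRA target module `Wqkv`. The sketch below reproduces the patched fallback dictionaries and shows the lookup they enable. The helper `lora_targets_for` is a hypothetical name introduced here for illustration only; it is not part of text-generation-webui, and the real code may resolve targets differently.

```python
# Sketch only: copies the fallback dictionaries from the webui patch.
# `lora_targets_for` is a hypothetical helper, not a text-generation-webui API.

standard_modules = ["q_proj", "v_proj"]

# Model type -> names of the modules that LoRA adapters attach to.
model_to_lora_modules = {
    "llama": standard_modules,
    "opt": standard_modules,
    "gptj": standard_modules,
    "gpt_neox": ["query_key_value"],
    "mpt": ["Wqkv"],  # entry added by the patch
}

# Transformers class name -> model type.
MODEL_CLASSES = {
    "LlamaForCausalLM": "llama",
    "OPTForCausalLM": "opt",
    "GPTJForCausalLM": "gptj",
    "GPTNeoXForCausalLM": "gpt_neox",
    "MPTForCausalLM": "mpt",  # entry added by the patch
}


def lora_targets_for(class_name: str) -> list:
    """Return the LoRA target module names for a model class name."""
    model_type = MODEL_CLASSES.get(class_name)
    if model_type is None:
        raise KeyError(f"no LoRA mapping for model class {class_name!r}")
    # Return a copy so callers cannot mutate the shared lists.
    return list(model_to_lora_modules[model_type])


print(lora_targets_for("MPTForCausalLM"))
```

Without the two added entries, the same lookup for `MPTForCausalLM` would find no model type, which is presumably why the unpatched webui cannot train a LoRA on this model.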