---
license: bigcode-openrail-m
datasets:
- bigcode/self-oss-instruct-sc2-exec-filter-50k
library_name: transformers
pipeline_tag: text-generation
tags:
- code
---
# README
## Model Summary
This is an instruction-tuned version of the [Starcoder2-3B model](https://huggingface.co/bigcode/starcoder2-3b). It was trained with the same [repository](https://github.com/bigcode-project/starcoder2-self-align) and [dataset](https://huggingface.co/datasets/bigcode/self-oss-instruct-sc2-exec-filter-50k) that were used for Starcoder2-15B. Because it uses the same prompt format as the Starcoder2-15B instruct model, it can be used as a drop-in replacement by simply changing the model path (see the brief sketch below).
* [Paper](https://arxiv.org/abs/2402.19173)
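As a minimal illustration of the drop-in swap (assuming you were previously pointing the pipeline at the 15B instruct checkpoint, `bigcode/starcoder2-15b-instruct-v0.1`, and that `outputs_starcoder3b_4e` is the local path of this fine-tune, as in the example further down), only the model path changes:
```python
import transformers
import torch

# Previously: model="bigcode/starcoder2-15b-instruct-v0.1"
# The prompt format and generation code stay the same; only the path changes.
pipeline = transformers.pipeline(
    task="text-generation",
    model="outputs_starcoder3b_4e",  # this 3B fine-tune (local checkpoint path)
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```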
## Intended Use
Running code language models locally. This model can easily run on:
* 8 GB and 10 GB VRAM machines with FP16
* 6 GB VRAM machines with INT8
* 4 GB VRAM machines with INT4
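As a rough back-of-the-envelope check (weights only, ignoring activations and the KV cache): 3B parameters take about 3B × 2 bytes ≈ 6 GB in FP16/BF16, roughly 3 GB in INT8, and roughly 1.5 GB in INT4, which is consistent with the figures above. A quantized-loading sketch follows the FP16 example below.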
## Example
**Using FP16/BF16**
```python
import transformers
import torch

pipeline = transformers.pipeline(
    model="outputs_starcoder3b_4e",
    task="text-generation",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

def respond(instruction: str, response_prefix: str) -> str:
    # Build the prompt with the model's chat template and append the optional response prefix.
    messages = [{"role": "user", "content": instruction}]
    prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False)
    prompt += response_prefix

    # Stop on the EOS token or on the "###" delimiter used by the prompt format.
    terminators = [
        pipeline.tokenizer.eos_token_id,
        pipeline.tokenizer.convert_tokens_to_ids("###"),
    ]

    result = pipeline(
        prompt,
        max_length=1024,
        num_return_sequences=1,
        do_sample=False,
        eos_token_id=terminators,
        pad_token_id=pipeline.tokenizer.eos_token_id,
        truncation=True,
    )
    # Drop the echoed prompt and cut the completion at the next "###" delimiter.
    response = response_prefix + result[0]["generated_text"][len(prompt):].split("###")[0].rstrip()
    return response

instruction = "Write the Transformer encoder in PyTorch."
response_prefix = ""
print(respond(instruction, response_prefix))
```
*Output:*
````
```python
import torch
import torch.nn as nn
class TransformerEncoder(nn.Module):
    def __init__(self, d_model, nhead, num_layers, dim_feedforward=2048, dropout=0.1):
        super(TransformerEncoder, self).__init__()
        self.encoder_layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward, dropout)
        self.transformer_encoder = nn.TransformerEncoder(self.encoder_layer, num_layers)

    def forward(self, src):
        return self.transformer_encoder(src)
```
````
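**Using INT8 or INT4**

A minimal sketch of loading the model quantized via `bitsandbytes` through `transformers`, matching the lower-VRAM configurations in the Intended Use section. It assumes `bitsandbytes` is installed, a CUDA GPU is available, and `outputs_starcoder3b_4e` is the local checkpoint path used above; the `respond` function from the FP16 example works unchanged with this pipeline.
```python
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# INT8 roughly halves the FP16 footprint (~6 GB VRAM machines).
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
# For ~4 GB VRAM machines, use INT4 instead:
# quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained("outputs_starcoder3b_4e")
model = AutoModelForCausalLM.from_pretrained(
    "outputs_starcoder3b_4e",
    quantization_config=quantization_config,
    device_map="auto",
)

pipeline = transformers.pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
)
```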
## Training
* 4 epochs
* Training type: Full fine-tuning
* Training time: ~4 hours
* Batch size: 2 (per device)
* Gradient accumulation steps: 256 (effective batch size of 512 on a single GPU)
* Sequence length: 1280
### Exact Training Command Used
**See the [repository](https://github.com/bigcode-project/starcoder2-self-align) for setup details.**
```bash
MODEL_KEY=bigcode/starcoder2-3b
LR=1e-5
EPOCH=4
SEQ_LEN=1280
WARMUP_RATIO=0.05
OUTPUT_DIR=outputs_starcoder3b_4e
DATASET_FILE=train_data.jsonl
accelerate launch -m star_align.train \
--model_key $MODEL_KEY \
--model_name_or_path $MODEL_KEY \
--use_flash_attention True \
--datafile_paths $DATASET_FILE \
--output_dir $OUTPUT_DIR \
--bf16 True \
--num_train_epochs $EPOCH \
--max_training_seq_length $SEQ_LEN \
--pad_to_max_length False \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 256 \
--group_by_length False \
--ddp_find_unused_parameters False \
--logging_steps 1 \
--log_level info \
--optim adafactor \
--max_grad_norm -1 \
--warmup_ratio $WARMUP_RATIO \
--learning_rate $LR \
--lr_scheduler_type linear \
--attention_dropout 0.0 \
--residual_dropout 0.0 \
--embedding_dropout 0.0
```
### Hardware
* 40 GB NVIDIA A100
## Attributions
* [Starcoder2 Self Align codebase](https://github.com/bigcode-project/starcoder2-self-align)
* [Starcoder2 Self Align dataset](https://huggingface.co/datasets/bigcode/self-oss-instruct-sc2-exec-filter-50k)
* [Starcoder2 paper](https://arxiv.org/abs/2402.19173)
## License
The model is licensed under the BigCode OpenRAIL-M v1 license agreement. You can find the full agreement [here](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement). |