---
license: bigcode-openrail-m
datasets:
- bigcode/self-oss-instruct-sc2-exec-filter-50k
library_name: transformers
pipeline_tag: text-generation
tags:
- code
---

# README

## Model Summary

This is an instruction-tuned version of the [Starcoder2-3B model](https://huggingface.co/bigcode/starcoder2-3b). It was trained using the same [repository](https://github.com/bigcode-project/starcoder2-self-align) and [dataset](https://huggingface.co/datasets/bigcode/self-oss-instruct-sc2-exec-filter-50k) used for Starcoder2-15B, and it uses the same prompt-generation technique as the Starcoder2-15B model, so it can be used as a drop-in replacement by simply changing the model path.

* [Paper](https://arxiv.org/abs/2402.19173)

## Intended Use

Running code language models locally. This model can easily run on:

* 8 GB and 10 GB VRAM machines with FP16
* 6 GB VRAM machines with INT8
* 4 GB VRAM machines with INT4 (see the quantized-loading sketch below)
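
For the INT8 and INT4 rows above, one option is quantized loading via `bitsandbytes`. The following is a minimal sketch, not this card's own recipe: it assumes the `bitsandbytes` and `accelerate` packages are installed, a CUDA GPU is available, and it reuses the model path from the example below.

```python
import torch
import transformers

# Minimal sketch: INT4 loading via bitsandbytes (assumes bitsandbytes
# and accelerate are installed and a CUDA GPU is available).
quantization_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,                      # use load_in_8bit=True for INT8 instead
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute dtype for dequantized matmuls
)

model = transformers.AutoModelForCausalLM.from_pretrained(
    "outputs_starcoder3b_4e",  # same model path as the example below
    quantization_config=quantization_config,
    device_map="auto",
)
tokenizer = transformers.AutoTokenizer.from_pretrained("outputs_starcoder3b_4e")
pipeline = transformers.pipeline("text-generation", model=model, tokenizer=tokenizer)
```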

## Example

**Using BF16**

```python
import transformers
import torch

# Load the fine-tuned model (the local training output directory here;
# substitute the Hub path if loading from the Hugging Face Hub).
pipeline = transformers.pipeline(
    model="outputs_starcoder3b_4e",
    task="text-generation",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)


def respond(instruction: str, response_prefix: str) -> str:
    messages = [{"role": "user", "content": instruction}]
    prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False)
    prompt += response_prefix

    # Stop at either the EOS token or the "###" turn separator used by
    # the self-align prompt format.
    terminators = [
        pipeline.tokenizer.eos_token_id,
        pipeline.tokenizer.convert_tokens_to_ids("###"),
    ]

    result = pipeline(
        prompt,
        max_length=1024,
        num_return_sequences=1,
        do_sample=False,
        eos_token_id=terminators,
        pad_token_id=pipeline.tokenizer.eos_token_id,
        truncation=True,
    )
    # Drop the prompt from the output and truncate at the "###" separator.
    response = response_prefix + result[0]["generated_text"][len(prompt) :].split("###")[0].rstrip()
    return response


instruction = "Write the Transformer encoder in PyTorch."
response_prefix = ""

print(respond(instruction, response_prefix))
```

*Output:*

````
```python
import torch
import torch.nn as nn


class TransformerEncoder(nn.Module):
    def __init__(self, d_model, nhead, num_layers, dim_feedforward=2048, dropout=0.1):
        super(TransformerEncoder, self).__init__()
        self.encoder_layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward, dropout)
        self.transformer_encoder = nn.TransformerEncoder(self.encoder_layer, num_layers)

    def forward(self, src):
        return self.transformer_encoder(src)
```
````

## Training

* Epochs: 4
* Training type: full fine-tuning
* Training time: ~4 hours
* Per-device batch size: 2
* Gradient accumulation steps: 256 (effective batch size: 512)
* Sequence length: 1280

### Exact Training Command Used

**See the [repository](https://github.com/bigcode-project/starcoder2-self-align) for setup details.**

```bash
MODEL_KEY=bigcode/starcoder2-3b
LR=1e-5
EPOCH=4
SEQ_LEN=1280
WARMUP_RATIO=0.05
OUTPUT_DIR=outputs_starcoder3b_4e
DATASET_FILE=train_data.jsonl
accelerate launch -m star_align.train \
    --model_key $MODEL_KEY \
    --model_name_or_path $MODEL_KEY \
    --use_flash_attention True \
    --datafile_paths $DATASET_FILE \
    --output_dir $OUTPUT_DIR \
    --bf16 True \
    --num_train_epochs $EPOCH \
    --max_training_seq_length $SEQ_LEN \
    --pad_to_max_length False \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 256 \
    --group_by_length False \
    --ddp_find_unused_parameters False \
    --logging_steps 1 \
    --log_level info \
    --optim adafactor \
    --max_grad_norm -1 \
    --warmup_ratio $WARMUP_RATIO \
    --learning_rate $LR \
    --lr_scheduler_type linear \
    --attention_dropout 0.0 \
    --residual_dropout 0.0 \
    --embedding_dropout 0.0
```
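
The command reads `train_data.jsonl` from disk. A hedged sketch of materializing the dataset above into that file (the exact preprocessing `star_align.train` expects may differ; the repository linked above is authoritative):

```python
from datasets import load_dataset

# Sketch only: dump the self-align dataset to JSON Lines. Verify against
# the starcoder2-self-align repository that this matches the schema
# star_align.train expects for --datafile_paths.
ds = load_dataset("bigcode/self-oss-instruct-sc2-exec-filter-50k", split="train")
ds.to_json("train_data.jsonl")
```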

### Hardware

* NVIDIA A100 (40 GB)

## Attributions

* [Starcoder2 Self Align codebase](https://github.com/bigcode-project/starcoder2-self-align)
* [Starcoder2 Self Align dataset](https://huggingface.co/datasets/bigcode/self-oss-instruct-sc2-exec-filter-50k)
* [Starcoder2 paper](https://arxiv.org/abs/2402.19173)

## License

The model is licensed under the BigCode OpenRAIL-M v1 license agreement. You can find the full agreement [here](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement).