---
license: apache-2.0
datasets:
- bigcode/self-oss-instruct-sc2-exec-filter-50k
library_name: transformers
pipeline_tag: text-generation
tags:
- code
---

# README

## Model Summary

This is an instruction-tuned version of the [Starcoder2-3B model](https://huggingface.co/bigcode/starcoder2-3b). It has been trained using the same [repository](https://github.com/bigcode-project/starcoder2-self-align) and [dataset](https://huggingface.co/datasets/bigcode/self-oss-instruct-sc2-exec-filter-50k) used for Starcoder2-15B. It uses the same prompt generation technique as the Starcoder2-15B model, so it can be used as a drop-in replacement by just changing the model path.

* [Paper](https://arxiv.org/abs/2402.19173)

## Intended Use

Running code language models locally. This model can easily run on:

* 8 GB and 10 GB VRAM machines with FP16
* 6 GB VRAM machines with INT8
* 4 GB VRAM machines with INT4 (see the quantized loading sketch below)

## Example

**Using FP16**

```python
import transformers
import torch

pipeline = transformers.pipeline(
    model="outputs_starcoder3b_4e",
    task="text-generation",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)


def respond(instruction: str, response_prefix: str) -> str:
    messages = [{"role": "user", "content": instruction}]
    prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False)
    prompt += response_prefix

    terminators = [
        pipeline.tokenizer.eos_token_id,
        pipeline.tokenizer.convert_tokens_to_ids("###"),
    ]

    result = pipeline(
        prompt,
        max_length=1024,
        num_return_sequences=1,
        do_sample=False,
        eos_token_id=terminators,
        pad_token_id=pipeline.tokenizer.eos_token_id,
        truncation=True,
    )
    response = (
        response_prefix
        + result[0]["generated_text"][len(prompt):].split("###")[0].rstrip()
    )
    return response


instruction = "Write the Transformer encoder in PyTorch."
response_prefix = ""
print(respond(instruction, response_prefix))
```

*Output:*

````
```python
import torch
import torch.nn as nn

class TransformerEncoder(nn.Module):
    def __init__(self, d_model, nhead, num_layers, dim_feedforward=2048, dropout=0.1):
        super(TransformerEncoder, self).__init__()
        self.encoder_layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward, dropout)
        self.transformer_encoder = nn.TransformerEncoder(self.encoder_layer, num_layers)

    def forward(self, src):
        return self.transformer_encoder(src)
```
````
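**Using INT8 or INT4 (quantized)**

The example above loads the half-precision weights. The INT8 and INT4 footprints listed under Intended Use can be reached with `bitsandbytes` quantization through `transformers`; the snippet below is a minimal sketch under assumed quantization settings, not part of the original training or evaluation setup. It reuses the local checkpoint path `outputs_starcoder3b_4e` from the example above, and the `respond` helper works unchanged with the resulting pipeline.

```python
import torch
import transformers

# Assumed quantization settings: 8-bit loading via bitsandbytes (~6 GB VRAM).
# For ~4 GB VRAM machines, switch to 4-bit loading instead.
quant_config = transformers.BitsAndBytesConfig(
    load_in_8bit=True,
    # load_in_4bit=True,
    # bnb_4bit_compute_dtype=torch.bfloat16,
)

model = transformers.AutoModelForCausalLM.from_pretrained(
    "outputs_starcoder3b_4e",  # same local checkpoint as the FP16 example
    quantization_config=quant_config,
    device_map="auto",
)
tokenizer = transformers.AutoTokenizer.from_pretrained("outputs_starcoder3b_4e")

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)
```

The quantization config is the only change relative to the FP16 path; prompt construction and the chat template handling stay the same.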
## Training

* 4 epochs
* Training type: Full fine-tuning
* Training time: ~4 hours
* Batch size: 2
* Gradient accumulation steps: 256
* Sequence length: 1280

### Exact Training Command Used

**See the [repository](https://github.com/bigcode-project/starcoder2-self-align) for setup details.**

```
MODEL_KEY=bigcode/starcoder2-3b
LR=1e-5
EPOCH=4
SEQ_LEN=1280
WARMUP_RATIO=0.05
OUTPUT_DIR=outputs_starcoder3b_4e
DATASET_FILE=train_data.jsonl

accelerate launch -m star_align.train \
    --model_key $MODEL_KEY \
    --model_name_or_path $MODEL_KEY \
    --use_flash_attention True \
    --datafile_paths $DATASET_FILE \
    --output_dir $OUTPUT_DIR \
    --bf16 True \
    --num_train_epochs $EPOCH \
    --max_training_seq_length $SEQ_LEN \
    --pad_to_max_length False \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 256 \
    --group_by_length False \
    --ddp_find_unused_parameters False \
    --logging_steps 1 \
    --log_level info \
    --optim adafactor \
    --max_grad_norm -1 \
    --warmup_ratio $WARMUP_RATIO \
    --learning_rate $LR \
    --lr_scheduler_type linear \
    --attention_dropout 0.0 \
    --residual_dropout 0.0 \
    --embedding_dropout 0.0
```

### Hardware

* 40 GB NVIDIA A100

## Attributions

* [Starcoder2 Self Align codebase](https://github.com/bigcode-project/starcoder2-self-align)
* [Starcoder2 Self Align dataset](https://huggingface.co/datasets/bigcode/self-oss-instruct-sc2-exec-filter-50k)
* [Starcoder2 paper](https://arxiv.org/abs/2402.19173)

## License

The model is licensed under the BigCode OpenRAIL-M v1 license agreement. You can find the full agreement [here](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement).