---
license: apache-2.0
datasets:
- bigcode/self-oss-instruct-sc2-exec-filter-50k
library_name: transformers
pipeline_tag: text-generation
tags:
- code
---
# README
## Model Summary
This is an instruction-tuned version of the [Starcoder2-3B model](https://huggingface.co/bigcode/starcoder2-3b). It was trained with the same [repository](https://github.com/bigcode-project/starcoder2-self-align) and [dataset](https://huggingface.co/datasets/bigcode/self-oss-instruct-sc2-exec-filter-50k) used for Starcoder2-15B, and it uses the same prompt generation technique as the Starcoder2-15B model, so it can be used as a drop-in replacement by simply changing the model path.
* [Paper](https://arxiv.org/abs/2402.19173)
## Intended Use
Running code language models locally. The model can run on the following hardware (a quantized loading sketch follows the list):
* 8 GB and 10 GB VRAM machines with FP16
* 6 GB VRAM machines with INT8
* 4 GB VRAM machines with INT4
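
One way to hit the INT8 and INT4 footprints is on-the-fly weight quantization with `bitsandbytes` through `transformers.BitsAndBytesConfig`. The sketch below is illustrative: the local model path mirrors the FP16 example, and the 4-bit compute dtype is an assumption, not a requirement.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_path = "outputs_starcoder3b_4e"  # placeholder: local fine-tuned checkpoint or hub ID

# INT4 weight-only quantization for ~4 GB VRAM; use load_in_8bit=True instead for ~6 GB.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    device_map="auto",
)
```

The quantized model and tokenizer can then be wrapped in the same `transformers.pipeline` call used in the FP16 example below.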
## Example
**Using FP16**
```python
import transformers
import torch
# Path to the fine-tuned model (the OUTPUT_DIR produced by the training command below).
pipeline = transformers.pipeline(
    model="outputs_starcoder3b_4e",
    task="text-generation",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

def respond(instruction: str, response_prefix: str) -> str:
    messages = [{"role": "user", "content": instruction}]
    prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False)
    prompt += response_prefix

    terminators = [
        pipeline.tokenizer.eos_token_id,
        pipeline.tokenizer.convert_tokens_to_ids("###"),
    ]

    result = pipeline(
        prompt,
        max_length=1024,
        num_return_sequences=1,
        do_sample=False,
        eos_token_id=terminators,
        pad_token_id=pipeline.tokenizer.eos_token_id,
        truncation=True,
    )
    # Strip the prompt and cut the generation at the "###" stop marker.
    response = response_prefix + result[0]["generated_text"][len(prompt) :].split("###")[0].rstrip()
    return response

instruction = "Write the Transformer encoder in PyTorch."
response_prefix = ""
print(respond(instruction, response_prefix))
```
*Output:*
````
```python
import torch
import torch.nn as nn
class TransformerEncoder(nn.Module):
    def __init__(self, d_model, nhead, num_layers, dim_feedforward=2048, dropout=0.1):
        super(TransformerEncoder, self).__init__()
        self.encoder_layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward, dropout)
        self.transformer_encoder = nn.TransformerEncoder(self.encoder_layer, num_layers)

    def forward(self, src):
        return self.transformer_encoder(src)
```
````
## Training
* 4 epochs
* Training type: Full fine-tuning
* Training time: ~4 hours
* Batch size: 2
* Gradient accumulation steps: 256 (effective batch size: 2 × 256 = 512)
* Sequence length: 1280
### Exact Training Command Used
**See the [repository](https://github.com/bigcode-project/starcoder2-self-align) for setup details.** A sketch for preparing `train_data.jsonl` follows the command.
```bash
MODEL_KEY=bigcode/starcoder2-3b
LR=1e-5
EPOCH=4
SEQ_LEN=1280
WARMUP_RATIO=0.05
OUTPUT_DIR=outputs_starcoder3b_4e
DATASET_FILE=train_data.jsonl
accelerate launch -m star_align.train \
--model_key $MODEL_KEY \
--model_name_or_path $MODEL_KEY \
--use_flash_attention True \
--datafile_paths $DATASET_FILE \
--output_dir $OUTPUT_DIR \
--bf16 True \
--num_train_epochs $EPOCH \
--max_training_seq_length $SEQ_LEN \
--pad_to_max_length False \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 256 \
--group_by_length False \
--ddp_find_unused_parameters False \
--logging_steps 1 \
--log_level info \
--optim adafactor \
--max_grad_norm -1 \
--warmup_ratio $WARMUP_RATIO \
--learning_rate $LR \
--lr_scheduler_type linear \
--attention_dropout 0.0 \
--residual_dropout 0.0 \
--embedding_dropout 0.0
```
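
The `DATASET_FILE` above points at a local `train_data.jsonl`, which is not distributed with this model. A minimal sketch for exporting the published dataset to JSON Lines is shown below; whether `star_align.train` accepts the dataset's columns as-is is an assumption, so check the starcoder2-self-align repository for the expected schema.

```python
from datasets import load_dataset

# Download the 50k execution-filtered self-instruct split and write it as JSON Lines.
ds = load_dataset("bigcode/self-oss-instruct-sc2-exec-filter-50k", split="train")
ds.to_json("train_data.jsonl")  # one JSON object per record, matching DATASET_FILE above
```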
### Hardware
* 40 GB NVIDIA A100
## Attributions
* [Starcoder2 Self Align codebase](https://github.com/bigcode-project/starcoder2-self-align)
* [Starcoder2 Self Align dataset](https://huggingface.co/datasets/bigcode/self-oss-instruct-sc2-exec-filter-50k)
* [Starcoder2 paper](https://arxiv.org/abs/2402.19173)
## License
The model is licensed under the BigCode OpenRAIL-M v1 license agreement. You can find the full agreement [here](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement).