	---
	license: bigcode-openrail-m
	datasets:
	- bigcode/guanaco-commits
	metrics:
	- code_eval
	library_name: peft
	tags:
	- code
	---
	# Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models
	<p align="center" width="100%">
	<a ><img src="https://github.com/bigcode-project/astraios/blob/main/visuals/banner.png?raw=true" alt="Astraios" style="width: 20%; min-width: 300px; display: block; margin: auto;"></a>
	</p>

	# Table of Contents

	1. [Model Summary](#model-summary)
	2. [Use](#use)
	3. [Training](#training)
	4. [Citation](#citation)

	# Model Summary

	> Astraios-1B-FFT is an instruction-tuned model with 1B parameters, created by finetuning StarCoderBase-1B on CommitPackFT & OASST as described in the Astraios paper.

	- Repository: [bigcode-project/astraios](https://github.com/bigcode-project/astraios)
	- Paper: [Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models]()
	- Languages: 80+ Programming languages
	- ✨Astraios:
	<table>
	<tr>
	<th>Data</th>
	<td><a href=https://huggingface.co/datasets/bigcode/guanaco-commits>CommitPackFT+OASST</a></td>
	<td>Filtered version of CommitPack and OASST for high-quality commit messages that resemble instructions</td>
	</tr>
	<tr>
	<th>Model</th>
	<td><a href=https://huggingface.co/collections/bigcode/astraios-1b-6576ff1b8e449026ae327c1c>Astraios-1B</a></td>
	<td>Collection of StarCoderBase-1B models instruction tuned on CommitPackFT + OASST with different tuning methods</td>
	</tr>
	<tr>
	<th></th>
	<td><a href=https://huggingface.co/collections/bigcode/astraios-3b-6577127317ee44ff547252d3>Astraios-3B</a></td>
	<td>Collection of StarCoderBase-3B (3B parameters) models instruction tuned on CommitPackFT + OASST with different tuning methods</td>
	</tr>
	<tr>
	<th></th>
	<td><a href=https://huggingface.co/collections/starpeft/starcoderbase-7b-650c1f028b45cfec8e72c265>Astraios-7B</a></td>
	<td>Collection of StarCoderBase-7B (7B parameters) models instruction tuned on CommitPackFT + OASST with different tuning methods</td>
	</tr>
	<tr>
	<th></th>
	<td><a href=https://huggingface.co/collections/bigcode/astraios-16b-65788b7476b6de79781054cc>Astraios-16B</a></td>
	<td>Collection of StarCoderBase-16B (16B parameters) models instruction tuned on CommitPackFT + OASST with different tuning methods</td>
	</tr>
	<tr>
	<th>Evaluation</th>
	<td><a href=https://huggingface.co/datasets/code_x_glue_cc_clone_detection_big_clone_bench>BigCloneBench</a></td>
	<td>Dataset for clone detection; we use 2,000 samples for evaluation</td>
	</tr>
	<tr>
	<th></th>
	<td><a href=https://huggingface.co/datasets/code_x_glue_cc_defect_detection>Devign</a></td>
	<td>Dataset for defect detection; we use 2,000 samples for evaluation</td>
	</tr>
	<tr>
	<th></th>
	<td><a href=https://huggingface.co/datasets/bigcode/humanevalpack>HumanEvalPack</a></td>
	<td>Extension of OpenAI's HumanEval to cover 3 scenarios across 6 languages</td>
	</tr>
	<tr>
	<th></th>
	<td><a href=https://huggingface.co/datasets/RaymondLi/perturbed_humaneval>ReCode</a></td>
	<td>Dataset for the robustness of code generation, covering 4 variants</td>
	</tr>
	<tr>
	<th></th>
	<td><a href=https://huggingface.co/datasets/moyix/asleep_keyboard>Asleep At The Keyboard</a></td>
	<td>Dataset for the security of code generation; we use DoW for evaluation</td>
	</tr>
	</table>


	# Use

	## Intended use

	The model follows instructions provided in the input. You should always preface your input with "Question: " and finish it with "Answer:", for example: "Question: Please write a function in Python that performs bubble sort.

	Answer:"
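The prompt format above can be wrapped in a small helper. This is a minimal sketch: `build_prompt` is a hypothetical name, not part of the Astraios repository, and it assumes the question and the `Answer:` marker are separated by one blank line, as in the example.

```python
def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the Question/Answer format described above."""
    # Hypothetical helper: strips surrounding whitespace, then adds the
    # "Question: " prefix and the trailing "Answer:" marker.
    return f"Question: {instruction.strip()}\n\nAnswer:"


prompt = build_prompt("Please write a function in Python that performs bubble sort.")
print(prompt)
```

The returned string can be passed directly to the tokenizer in place of a hand-written prompt.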

	Feel free to share your generations in the Community tab!

	## Generation
	```python
	# pip install -q transformers
	from transformers import AutoModelForCausalLM, AutoTokenizer

	checkpoint = "bigcode/astraios-1b-fft"
	device = "cuda"  # use "cpu" to run without a GPU

	tokenizer = AutoTokenizer.from_pretrained(checkpoint)
	# Load the model once and move it to the target device
	model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

	# The prompt must start with "Question: " and end with "Answer:"
	prompt = "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"
	inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
	outputs = model.generate(inputs)
	print(tokenizer.decode(outputs[0]))
	```
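The decoded output generally contains the prompt followed by the completion, since a causal language model returns the input tokens along with the generated ones. Below is a minimal sketch for keeping only the text after the `Answer:` marker. `extract_answer` is a hypothetical helper, and the `<|endoftext|>` end-of-sequence string is an assumption that should be checked against `tokenizer.eos_token`.

```python
def extract_answer(decoded: str, marker: str = "Answer:", eos: str = "<|endoftext|>") -> str:
    """Return the completion that follows the first `marker` in `decoded`."""
    # Keep only what follows the first occurrence of the marker
    _, found, completion = decoded.partition(marker)
    if not found:
        # Marker missing: fall back to the full text rather than guessing
        completion = decoded
    # Drop the end-of-sequence token (assumed string) and anything after it
    completion = completion.split(eos, 1)[0]
    return completion.strip()


sample = "Question: Say hi.\n\nAnswer: print('hi')<|endoftext|>"
print(extract_answer(sample))
```

In practice, `sample` would be replaced by the string returned from `tokenizer.decode(outputs[0])`.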

	# Training

	## Model

	- Architecture: GPT-2 model with multi-query attention and Fill-in-the-Middle objective
	- Steps: 250k pretraining & 200 instruction tuning
	- Precision: fp32

	## Hardware

	- Pretraining:
	  - GPUs: 512 Tesla A100
	  - Training time: 24 days
	- Instruction tuning:
	  - GPUs: 8 Tesla A100
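For a rough sense of the pretraining compute, the figures above can be turned into GPU-hours. This back-of-the-envelope sketch assumes all 512 GPUs ran continuously for the full 24 days, which this card does not state. Instruction tuning is left out because no training time is listed for it.

```python
PRETRAIN_GPUS = 512   # from the hardware list above
PRETRAIN_DAYS = 24    # from the hardware list above
HOURS_PER_DAY = 24


def gpu_hours(num_gpus: int, days: int) -> int:
    """GPU-hours under the assumption of continuous, full utilization."""
    return num_gpus * days * HOURS_PER_DAY


pretrain_gpu_hours = gpu_hours(PRETRAIN_GPUS, PRETRAIN_DAYS)
print(pretrain_gpu_hours)
```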

	## Software

	- Orchestration: [Megatron-LM/Transformers](https://github.com/bigcode-project/octopack#training)
	- Neural networks: [PyTorch](https://github.com/pytorch/pytorch)

	# Citation

	```bibtex
	```