	---
	pipeline_tag: text-generation
	inference: true
	widget:
	- text: 'def print_hello_world():'
	  example_title: Hello world
	  group: Python
	license: bigcode-openrail-m
	datasets:
	- bigcode/commitpackft
	- bigcode/oasst-octopack
	metrics:
	- code_eval
	library_name: transformers
	tags:
	- code
	model-index:
	- name: OctoCoder
	  results:
	  - task:
	      type: text-generation
	    dataset:
	      type: bigcode/humanevalpack
	      name: HumanEvalSynthesize Python
	    metrics:
	    - name: pass@1
	      type: pass@1
	      value: 46.2
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: bigcode/humanevalpack
	      name: HumanEvalSynthesize JavaScript
	    metrics:
	    - name: pass@1
	      type: pass@1
	      value: 39.2
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: bigcode/humanevalpack
	      name: HumanEvalSynthesize Java
	    metrics:
	    - name: pass@1
	      type: pass@1
	      value: 38.2
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: bigcode/humanevalpack
	      name: HumanEvalSynthesize Go
	    metrics:
	    - name: pass@1
	      type: pass@1
	      value: 30.4
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: bigcode/humanevalpack
	      name: HumanEvalSynthesize C++
	    metrics:
	    - name: pass@1
	      type: pass@1
	      value: 35.6
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: bigcode/humanevalpack
	      name: HumanEvalSynthesize Rust
	    metrics:
	    - name: pass@1
	      type: pass@1
	      value: 23.4
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: bigcode/humanevalpack
	      name: HumanEvalSynthesize Average
	    metrics:
	    - name: pass@1
	      type: pass@1
	      value: 35.5
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: bigcode/humanevalpack
	      name: HumanEvalFix Python
	    metrics:
	    - name: pass@1
	      type: pass@1
	      value: 30.4
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: bigcode/humanevalpack
	      name: HumanEvalFix JavaScript
	    metrics:
	    - name: pass@1
	      type: pass@1
	      value: 28.4
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: bigcode/humanevalpack
	      name: HumanEvalFix Java
	    metrics:
	    - name: pass@1
	      type: pass@1
	      value: 30.6
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: bigcode/humanevalpack
	      name: HumanEvalFix Go
	    metrics:
	    - name: pass@1
	      type: pass@1
	      value: 30.2
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: bigcode/humanevalpack
	      name: HumanEvalFix C++
	    metrics:
	    - name: pass@1
	      type: pass@1
	      value: 26.1
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: bigcode/humanevalpack
	      name: HumanEvalFix Rust
	    metrics:
	    - name: pass@1
	      type: pass@1
	      value: 16.5
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: bigcode/humanevalpack
	      name: HumanEvalFix Average
	    metrics:
	    - name: pass@1
	      type: pass@1
	      value: 27.0
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: bigcode/humanevalpack
	      name: HumanEvalExplain Python
	    metrics:
	    - name: pass@1
	      type: pass@1
	      value: 35.1
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: bigcode/humanevalpack
	      name: HumanEvalExplain JavaScript
	    metrics:
	    - name: pass@1
	      type: pass@1
	      value: 24.5
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: bigcode/humanevalpack
	      name: HumanEvalExplain Java
	    metrics:
	    - name: pass@1
	      type: pass@1
	      value: 27.3
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: bigcode/humanevalpack
	      name: HumanEvalExplain Go
	    metrics:
	    - name: pass@1
	      type: pass@1
	      value: 21.1
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: bigcode/humanevalpack
	      name: HumanEvalExplain C++
	    metrics:
	    - name: pass@1
	      type: pass@1
	      value: 24.1
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: bigcode/humanevalpack
	      name: HumanEvalExplain Rust
	    metrics:
	    - name: pass@1
	      type: pass@1
	      value: 14.8
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: bigcode/humanevalpack
	      name: HumanEvalExplain Average
	    metrics:
	    - name: pass@1
	      type: pass@1
	      value: 24.5
	      verified: false
	---

	![Octopack](https://github.com/bigcode-project/octopack/blob/31f3320f098703c7910e43492c39366eeea68d83/banner.png?raw=true)

	# Table of Contents

	1. [Model Summary](#model-summary)
	2. [Use](#use)
	3. [Training](#training)
	4. [Citation](#citation)

	# Model Summary

	OctoCoder is an instruction-tuned model with 15.5B parameters, created by finetuning StarCoder on CommitPackFT and OASST as described in the OctoPack paper.

	- Repository: [bigcode/octopack](https://github.com/bigcode-project/octopack)
	- Paper: [TODO]()
	- Languages: 80+ programming languages
	- OctoPack🐙🎒:
	<table>
	<tr>
	<th>Data</th>
	<th><a href=https://huggingface.co/datasets/bigcode/commitpack>CommitPack</a></th>
	<td>4TB of GitHub commits across 350 programming languages</td>
	</tr>
	<tr>
	<th></th>
	<th><a href=https://huggingface.co/datasets/bigcode/commitpackft>CommitPackFT</a></th>
	<td>Filtered version of CommitPack for high-quality commit messages that resemble instructions</td>
	</tr>
	<tr>
	<th>Model</th>
	<th><a href=https://huggingface.co/bigcode/octocoder>OctoCoder</a></th>
	<td>StarCoder (16B parameters) instruction tuned on CommitPackFT + OASST</td>
	</tr>
	<tr>
	<th></th>
	<th><a href=https://huggingface.co/bigcode/octogeex>OctoGeeX</a></th>
	<td>CodeGeeX2 (6B parameters) instruction tuned on CommitPackFT + OASST</td>
	</tr>
	<tr>
	<th>Evaluation</th>
	<th><a href=https://huggingface.co/datasets/bigcode/humanevalpack>HumanEvalPack</a></th>
	<td>Extension of OpenAI's HumanEval to cover 3 scenarios across 6 languages</td>
	</tr>
	</table>
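
The metadata of this card reports pass@1 for each HumanEvalPack scenario and language, plus an "Average" entry per scenario. The sketch below recomputes those averages from the per-language scores. It assumes each average is the unweighted mean over the six languages; the card does not state this, although the result matches the reported values after rounding to one decimal.

```python
from statistics import mean

# pass@1 scores copied from the model-index metadata of this card.
SCORES = {
    "HumanEvalSynthesize": {
        "Python": 46.2, "JavaScript": 39.2, "Java": 38.2,
        "Go": 30.4, "C++": 35.6, "Rust": 23.4,
    },
    "HumanEvalFix": {
        "Python": 30.4, "JavaScript": 28.4, "Java": 30.6,
        "Go": 30.2, "C++": 26.1, "Rust": 16.5,
    },
    "HumanEvalExplain": {
        "Python": 35.1, "JavaScript": 24.5, "Java": 27.3,
        "Go": 21.1, "C++": 24.1, "Rust": 14.8,
    },
}

# Assumption: "Average" is the unweighted mean over languages, rounded to one decimal.
averages = {
    scenario: round(mean(by_language.values()), 1)
    for scenario, by_language in SCORES.items()
}

for scenario, value in averages.items():
    print(f"{scenario}: {value}")
```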


	# Use

	## Intended use

	The model follows instructions provided in the input. We recommend prefacing your input with "Question: " and finishing with "Answer:", for example: "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"

	Feel free to share your generations in the Community tab!
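
The recommended prompt format can be wrapped in a small helper so that every request is phrased the same way. This is a minimal sketch; the name `build_prompt` is hypothetical and not part of any OctoCoder tooling.

```python
def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the "Question: ... Answer:" format recommended above."""
    # Strip stray whitespace so the blank line before "Answer:" stays consistent.
    return f"Question: {instruction.strip()}\n\nAnswer:"


prompt = build_prompt("Please write a function in Python that performs bubble sort.")
print(prompt)
```

The returned string can be passed to the tokenizer in place of a hand-written prompt.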

	## Generation
	```python
	# pip install -q transformers
	from transformers import AutoModelForCausalLM, AutoTokenizer

	checkpoint = "bigcode/octocoder"
	device = "cuda" # for GPU usage or "cpu" for CPU usage

	tokenizer = AutoTokenizer.from_pretrained(checkpoint)
	model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

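	inputs = tokenizer.encode("Question: Please write a function in Python that performs bubble sort.\n\nAnswer:", return_tensors="pt").to(device)
	# Without an explicit budget, generate() uses the checkpoint's default length limit,
	# which may cut the answer short; 128 new tokens is an illustrative value.
	outputs = model.generate(inputs, max_new_tokens=128)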
	print(tokenizer.decode(outputs[0]))
	```
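
For a causal language model, `generate` returns the prompt tokens followed by the completion, so the decoded string starts with the question. The sketch below keeps only the text after the `Answer:` marker. It assumes the prompt format recommended above; `extract_answer` is a hypothetical helper, and `decoded` is a hand-written stand-in string rather than real model output.

```python
def extract_answer(decoded: str, marker: str = "Answer:") -> str:
    """Return the text after the first `marker`, or the whole string if it is absent."""
    _, found, tail = decoded.partition(marker)
    return tail.strip() if found else decoded.strip()


# Stand-in for tokenizer.decode(outputs[0]); not real model output.
decoded = "Question: Say hi.\n\nAnswer: print('hi')"
print(extract_answer(decoded))
```

Because the split happens at the first occurrence of the marker, an instruction that itself contains "Answer:" would need a different strategy, such as slicing off the prompt tokens before decoding.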

	# Training

	## Model

	- Architecture: GPT-2 model with multi-query attention and Fill-in-the-Middle objective
	- Steps: 250k pretraining & 30 instruction tuning
	- Tokens: 1 trillion pretraining & 2M instruction tuning
	- Precision: bfloat16
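
The parameter count and precision together give a rough idea of how much memory the weights need. The following back-of-the-envelope sketch assumes the 15.5B parameter figure from the Model Summary and 2 bytes per bfloat16 value (4 for float32). It covers weights only and ignores activations, the key-value cache, and framework overhead.

```python
PARAMS = 15.5e9  # parameter count from the Model Summary
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_gigabytes(dtype: str, params: float = PARAMS) -> float:
    """Approximate size of the weights alone, in GB (10**9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_gigabytes(dtype):.0f} GB")
```

Under these assumptions the weights take roughly 31 GB in bfloat16 and 62 GB in float32. When loading with Transformers, passing `torch_dtype=torch.bfloat16` to `from_pretrained` is one way to request the smaller footprint.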

	## Hardware

	- Pretraining:
	  - GPUs: 512 Tesla A100
	  - Training time: 24 days
	- Instruction tuning:
	  - GPUs: 8 Tesla A100
	  - Training time: 4 hours
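
To put the two training stages on a common scale, the hardware figures above can be converted to GPU-hours. This is a rough sketch that assumes every GPU was in use for the full wall-clock duration, which the card does not state; `gpu_hours` is a hypothetical helper.

```python
def gpu_hours(num_gpus: int, wall_clock_hours: float) -> float:
    """GPU-hours, assuming all GPUs are busy for the whole wall-clock time."""
    return num_gpus * wall_clock_hours


pretraining = gpu_hours(512, 24 * 24)  # 512 GPUs for 24 days
instruction_tuning = gpu_hours(8, 4)   # 8 GPUs for 4 hours

print(f"Pretraining:        {pretraining:,.0f} GPU-hours")
print(f"Instruction tuning: {instruction_tuning:,.0f} GPU-hours")
print(f"Tuning share:       {instruction_tuning / pretraining:.4%}")
```

Under that assumption, pretraining comes to 294,912 GPU-hours and instruction tuning to 32, so the tuning stage is a very small fraction of the total compute.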

	## Software

	- Orchestration: [Megatron-LM/Transformers](https://github.com/bigcode-project/octopack#training)
	- Neural networks: [PyTorch](https://github.com/pytorch/pytorch)

	# Citation

	TODO